November 30, 2017

What is TDA?

Topological Data Analysis (TDA) is a way of looking at the topology of data, rather than linear fits or probability models. What's the shape of data?

Pros: shape persists however you "rotate" your data (point of view matters less).

Today: TDA with R

  • Example of a circle with TDA and TDAmapper packages
  • Example of an infinity symbol
  • Stock data (messy, real life)

Packages

library(BatchGetSymbols)
library(reshape2)
library(TDA)
library(TDAmapper)
library(igraph)

Circle first

circle1 <- circleUnif(60)
circle2 <- circleUnif(60,r=2)+3
circles <- rbind(circle1,circle2)
plot(circles)

Circle persistence diagram

Persistence diagram: birth and death of homology classes

TDAmapper for the circle

circlemapper <- mapper1D(distance_matrix=dist(circles), filter_values = circles[,1], 
  num_intervals =  12, 
  percent_overlap = 50,
  num_bins_when_clustering =12)
g2 <- graph.adjacency(circlemapper$adjacency,   
  mode="undirected")

Plot result from TDAmapper for circles

plot(g2, layout = layout.auto(g2) )

Infinity symbol

infinitydf<- cbind(2*cos(0.5*(1:100)),sin(1:100))
plot(infinitydf)

Persistence diagram

TDAmapper

infinitymapper <- mapper1D(distance_matrix=dist(infinitydf), 
  filter_values = infinitydf[,1], 
  num_intervals =  12, 
  percent_overlap = 50,
  num_bins_when_clustering =12)

g2 <- graph.adjacency(infinitymapper$adjacency, 
  mode="undirected")

TDAmapper for infinity symbol

Financial data!

  • With students Jacqueline Cai and Hao Li at the University of Minnesota
  • Inspired by paper "Topology Data Analysis Of Critical Transitions in Financial Networks" by Marian Gidea
  • Visualize transition in stock correlations before crisis (?)

Get stock data and make log returns matrix

first.date <- as.Date('2004/01/01')
last.date <- as.Date('2009/09/30')
tickers <- c('MMM','AXP','AAPL','BA','CAT','CVX','CSCO','KO','DWDP','XOM','GE','GS','HD','IBM','INTC','JNJ','JPM','MCD','MRK','MSFT','NKE','PFE','PG','TRV','UNH','UTX','VZ','WMT','DIS')
data <- BatchGetSymbols(tickers = tickers,
                        first.date = first.date,
                        last.date = last.date)
## 
## Running BatchGetSymbols for:
##    tickers = MMM, AXP, AAPL, BA, CAT, CVX, CSCO, KO, DWDP, XOM, GE, GS, HD, IBM, INTC, JNJ, JPM, MCD, MRK, MSFT, NKE, PFE, PG, TRV, UNH, UTX, VZ, WMT, DIS
##    Downloading data for benchmark ticker
## Downloading Data for MMM from yahoo (1|29) - Nice!
## Downloading Data for AXP from yahoo (2|29) - Well done!
## Downloading Data for AAPL from yahoo (3|29) - Well done!
## Downloading Data for BA from yahoo (4|29) - Good job!
## Downloading Data for CAT from yahoo (5|29) - Good job!
## Downloading Data for CVX from yahoo (6|29) - Got it!
## Downloading Data for CSCO from yahoo (7|29) - Boa!
## Downloading Data for KO from yahoo (8|29) - Got it!
## Downloading Data for DWDP from yahoo (9|29) - Nice!
## Downloading Data for XOM from yahoo (10|29) - OK!
## Downloading Data for GE from yahoo (11|29) - OK!
## Downloading Data for GS from yahoo (12|29) - Nice!
## Downloading Data for HD from yahoo (13|29) - Well done!
## Downloading Data for IBM from yahoo (14|29) - Well done!
## Downloading Data for INTC from yahoo (15|29) - OK!
## Downloading Data for JNJ from yahoo (16|29) - Good stuff!
## Downloading Data for JPM from yahoo (17|29) - Good stuff!
## Downloading Data for MCD from yahoo (18|29) - Got it!
## Downloading Data for MRK from yahoo (19|29) - Well done!
## Downloading Data for MSFT from yahoo (20|29) - Good stuff!
## Downloading Data for NKE from yahoo (21|29) - Well done!
## Downloading Data for PFE from yahoo (22|29) - Got it!
## Downloading Data for PG from yahoo (23|29) - Boa!
## Downloading Data for TRV from yahoo (24|29) - Well done!
## Downloading Data for UNH from yahoo (25|29) - Good job!
## Downloading Data for UTX from yahoo (26|29) - Good stuff!
## Downloading Data for VZ from yahoo (27|29) - Well done!
## Downloading Data for WMT from yahoo (28|29) - OK!
## Downloading Data for DIS from yahoo (29|29) - Good job!
newdata <- BatchGetSymbols(tickers = tickers,
            first.date = as.Date('2017/05/01'),
            last.date = as.Date('2017/10/01'))
## 
## Running BatchGetSymbols for:
##    tickers = MMM, AXP, AAPL, BA, CAT, CVX, CSCO, KO, DWDP, XOM, GE, GS, HD, IBM, INTC, JNJ, JPM, MCD, MRK, MSFT, NKE, PFE, PG, TRV, UNH, UTX, VZ, WMT, DIS
##    Downloading data for benchmark ticker
## Downloading Data for MMM from yahoo (1|29) - Nice!
## Downloading Data for AXP from yahoo (2|29) - Good stuff!
## Downloading Data for AAPL from yahoo (3|29) - Nice!
## Downloading Data for BA from yahoo (4|29) - Good job!
## Downloading Data for CAT from yahoo (5|29) - Good stuff!
## Downloading Data for CVX from yahoo (6|29) - OK!
## Downloading Data for CSCO from yahoo (7|29) - Boa!
## Downloading Data for KO from yahoo (8|29) - OK!
## Downloading Data for DWDP from yahoo (9|29) - OK!
## Downloading Data for XOM from yahoo (10|29) - Nice!
## Downloading Data for GE from yahoo (11|29) - Well done!
## Downloading Data for GS from yahoo (12|29) - Well done!
## Downloading Data for HD from yahoo (13|29) - Nice!
## Downloading Data for IBM from yahoo (14|29) - Boa!
## Downloading Data for INTC from yahoo (15|29) - Nice!
## Downloading Data for JNJ from yahoo (16|29) - Got it!
## Downloading Data for JPM from yahoo (17|29) - Nice!
## Downloading Data for MCD from yahoo (18|29) - Got it!
## Downloading Data for MRK from yahoo (19|29) - OK!
## Downloading Data for MSFT from yahoo (20|29) - Nice!
## Downloading Data for NKE from yahoo (21|29) - Good stuff!
## Downloading Data for PFE from yahoo (22|29) - Good job!
## Downloading Data for PG from yahoo (23|29) - Got it!
## Downloading Data for TRV from yahoo (24|29) - Good job!
## Downloading Data for UNH from yahoo (25|29) - Boa!
## Downloading Data for UTX from yahoo (26|29) - Good stuff!
## Downloading Data for VZ from yahoo (27|29) - Well done!
## Downloading Data for WMT from yahoo (28|29) - Got it!
## Downloading Data for DIS from yahoo (29|29) - Good stuff!
widestock <- dcast(data$df.tickers[,6:8], ref.date ~ ticker, value.var="price.adjusted")
logrets <- apply(widestock[,2:30], 2, 
                function(x) diff(log(x), lag=1))

Make a correlation matrix for the log returns

Correlation matrices are really easy!

first100cor <- cor(logrets[1:100,], method = "pearson")
second100cor <- cor(logrets[101:200,], method = "pearson")

Graphing the correlation networks

Let's make a matrix by looking at a "super-level set," basically only showing edges when stocks have highly correlated log returns. Notice I'm making a "distance" matrix out of the correlation matrix.

highcor100mat <- sqrt(2*(first100cor+1))
highcorsuperlevel100 <- ifelse (highcor100mat>1.8, highcor100mat,0)
network=graph_from_adjacency_matrix(highcorsuperlevel100, 
        weighted=TRUE, 
        mode="undirected", 
        diag=F)

Plot for first 100 days

plot(network)

Plot for second 100 days

Financial crisis 100 days

Look now at 100 days starting December 21, 2007.

Financial crisis 1200 days

And now October-November-December 2008

Persistence diagram for stock data

We can make a persistence diagram for the stock data as well, to see if any "circles" exist or persist.

first100persistence <- ripsDiag(highcor100mat,maxscale=2,maxdimension = 1,dist="arbitrary")

plot(first100persistence[["diagram"]])

The next 200 days

Persistence diagram for the next 200 days, still in 2004-2005.

1001-1100 days

Persistence diagram for the network from 2007.

1001-1100 days

Persistence diagram for the network from 2007.

What about now?