November 30, 2017

What is TDA?

Topological Data Analysis (TDA) is a way of looking at the topology of data, rather than linear fits or probability models. What's the shape of data?

Pros: shape persists however you "rotate" your data (point of view matters less).

Today: TDA with R

  • Example of a circle with TDA and TDAmapper packages
  • Example of an infinity symbol
  • Stock data (messy, real life)

Packages

library(BatchGetSymbols)
library(reshape2)
library(TDA)
library(TDAmapper)
library(igraph)

Circle first

circle1 <- circleUnif(60)
circle2 <- circleUnif(60,r=2)+3
circles <- rbind(circle1,circle2)
plot(circles)

Circle persistence diagram

Persistence diagram: birth and death of homology classes

TDAmapper for the circle

circlemapper <- mapper1D(distance_matrix=dist(circles), filter_values = circles[,1], 
  num_intervals =  12, 
  percent_overlap = 50,
  num_bins_when_clustering =12)
g2 <- graph.adjacency(circlemapper$adjacency,   
  mode="undirected")

Plot result from TDAmapper for circles

plot(g2, layout = layout.auto(g2) )

Infinity symbol

infinitydf<- cbind(2*cos(0.5*(1:100)),sin(1:100))
plot(infinitydf)

Persistence diagram

TDAmapper

infinitymapper <- mapper1D(distance_matrix=dist(infinitydf), 
  filter_values = infinitydf[,1], 
  num_intervals =  12, 
  percent_overlap = 50,
  num_bins_when_clustering =12)

g2 <- graph.adjacency(infinitymapper$adjacency, 
  mode="undirected")

TDAmapper for infinity symbol

Financial data!

  • With students Jacqueline Cai and Hao Li at the University of Minnesota
  • Inspired by paper "Topology Data Analysis Of Critical Transitions in Financial Networks" by Marian Gidea
  • Visualize transition in stock correlations before crisis (?)

Get stock data and make log returns matrix

first.date <- as.Date('2004/01/01')
last.date <- as.Date('2009/09/30')
tickers <- c('MMM','AXP','AAPL','BA','CAT','CVX','CSCO','KO','DWDP','XOM','GE','GS','HD','IBM','INTC','JNJ','JPM','MCD','MRK','MSFT','NKE','PFE','PG','TRV','UNH','UTX','VZ','WMT','DIS')
data <- BatchGetSymbols(tickers = tickers,
                        first.date = first.date,
                        last.date = last.date)
## 
## Running BatchGetSymbols for:
##    tickers = MMM, AXP, AAPL, BA, CAT, CVX, CSCO, KO, DWDP, XOM, GE, GS, HD, IBM, INTC, JNJ, JPM, MCD, MRK, MSFT, NKE, PFE, PG, TRV, UNH, UTX, VZ, WMT, DIS
##    Downloading data for benchmark ticker
## Downloading Data for MMM from yahoo (1|29) - Nice!
## Downloading Data for AXP from yahoo (2|29) - Well done!
## Downloading Data for AAPL from yahoo (3|29) - Well done!
## Downloading Data for BA from yahoo (4|29) - Good job!
## Downloading Data for CAT from yahoo (5|29) - Good job!
## Downloading Data for CVX from yahoo (6|29) - Got it!
## Downloading Data for CSCO from yahoo (7|29) - Boa!
## Downloading Data for KO from yahoo (8|29) - Got it!
## Downloading Data for DWDP from yahoo (9|29) - Nice!
## Downloading Data for XOM from yahoo (10|29) - OK!
## Downloading Data for GE from yahoo (11|29) - OK!
## Downloading Data for GS from yahoo (12|29) - Nice!
## Downloading Data for HD from yahoo (13|29) - Well done!
## Downloading Data for IBM from yahoo (14|29) - Well done!
## Downloading Data for INTC from yahoo (15|29) - OK!
## Downloading Data for JNJ from yahoo (16|29) - Good stuff!
## Downloading Data for JPM from yahoo (17|29) - Good stuff!
## Downloading Data for MCD from yahoo (18|29) - Got it!
## Downloading Data for MRK from yahoo (19|29) - Well done!
## Downloading Data for MSFT from yahoo (20|29) - Good stuff!
## Downloading Data for NKE from yahoo (21|29) - Well done!
## Downloading Data for PFE from yahoo (22|29) - Got it!
## Downloading Data for PG from yahoo (23|29) - Boa!
## Downloading Data for TRV from yahoo (24|29) - Well done!
## Downloading Data for UNH from yahoo (25|29) - Good job!
## Downloading Data for UTX from yahoo (26|29) - Good stuff!
## Downloading Data for VZ from yahoo (27|29) - Well done!
## Downloading Data for WMT from yahoo (28|29) - OK!
## Downloading Data for DIS from yahoo (29|29) - Good job!
newdata <- BatchGetSymbols(tickers = tickers,
            first.date = as.Date('2017/05/01'),
            last.date = as.Date('2017/10/01'))
## 
## Running BatchGetSymbols for:
##    tickers = MMM, AXP, AAPL, BA, CAT, CVX, CSCO, KO, DWDP, XOM, GE, GS, HD, IBM, INTC, JNJ, JPM, MCD, MRK, MSFT, NKE, PFE, PG, TRV, UNH, UTX, VZ, WMT, DIS
##    Downloading data for benchmark ticker
## Downloading Data for MMM from yahoo (1|29) - Nice!
## Downloading Data for AXP from yahoo (2|29) - Good stuff!
## Downloading Data for AAPL from yahoo (3|29) - Nice!
## Downloading Data for BA from yahoo (4|29) - Good job!
## Downloading Data for CAT from yahoo (5|29) - Good stuff!
## Downloading Data for CVX from yahoo (6|29) - OK!
## Downloading Data for CSCO from yahoo (7|29) - Boa!
## Downloading Data for KO from yahoo (8|29) - OK!
## Downloading Data for DWDP from yahoo (9|29) - OK!
## Downloading Data for XOM from yahoo (10|29) - Nice!
## Downloading Data for GE from yahoo (11|29) - Well done!
## Downloading Data for GS from yahoo (12|29) - Well done!
## Downloading Data for HD from yahoo (13|29) - Nice!
## Downloading Data for IBM from yahoo (14|29) - Boa!
## Downloading Data for INTC from yahoo (15|29) - Nice!
## Downloading Data for JNJ from yahoo (16|29) - Got it!
## Downloading Data for JPM from yahoo (17|29) - Nice!
## Downloading Data for MCD from yahoo (18|29) - Got it!
## Downloading Data for MRK from yahoo (19|29) - OK!
## Downloading Data for MSFT from yahoo (20|29) - Nice!
## Downloading Data for NKE from yahoo (21|29) - Good stuff!
## Downloading Data for PFE from yahoo (22|29) - Good job!
## Downloading Data for PG from yahoo (23|29) - Got it!
## Downloading Data for TRV from yahoo (24|29) - Good job!
## Downloading Data for UNH from yahoo (25|29) - Boa!
## Downloading Data for UTX from yahoo (26|29) - Good stuff!
## Downloading Data for VZ from yahoo (27|29) - Well done!
## Downloading Data for WMT from yahoo (28|29) - Got it!
## Downloading Data for DIS from yahoo (29|29) - Good stuff!
widestock <- dcast(data$df.tickers[,6:8], ref.date ~ ticker, value.var="price.adjusted")
logrets <- apply(widestock[,2:30], 2, 
                function(x) diff(log(x), lag=1))

Make a correlation matrix for the log returns

Correlation matrices are really easy!

first100cor <- cor(logrets[1:100,], method = "pearson")
second100cor <- cor(logrets[101:200,], method = "pearson")

Graphing the correlation networks

Let's make a matrix by looking at a "super-level set," basically only showing edges when stocks have highly correlated log returns. Notice I'm making a "distance" matrix out of the correlation matrix.

highcor100mat <- sqrt(2*(first100cor+1))
highcorsuperlevel100 <- ifelse (highcor100mat>1.8, highcor100mat,0)
network=graph_from_adjacency_matrix(highcorsuperlevel100, 
        weighted=TRUE, 
        mode="undirected", 
        diag=F)

Plot for first 100 days

plot(network)