Topological Data Analysis (TDA) is a way of looking at the topology of data, rather than linear fits or probability models. What's the shape of data?
Pros: shape persists however you "rotate" your data (point of view matters less).
November 30, 2017
Topological Data Analysis (TDA) is a way of looking at the topology of data, rather than linear fits or probability models. What's the shape of data?
Pros: shape persists however you "rotate" your data (point of view matters less).
library(BatchGetSymbols) library(reshape2) library(TDA) library(TDAmapper) library(igraph)
circle1 <- circleUnif(60) circle2 <- circleUnif(60,r=2)+3 circles <- rbind(circle1,circle2) plot(circles)
Persistence diagram: birth and death of homology classes
circlemapper <- mapper1D(distance_matrix=dist(circles), filter_values = circles[,1], num_intervals = 12, percent_overlap = 50, num_bins_when_clustering =12) g2 <- graph.adjacency(circlemapper$adjacency, mode="undirected")
plot(g2, layout = layout.auto(g2) )
infinitydf<- cbind(2*cos(0.5*(1:100)),sin(1:100)) plot(infinitydf)
infinitymapper <- mapper1D(distance_matrix=dist(infinitydf), filter_values = infinitydf[,1], num_intervals = 12, percent_overlap = 50, num_bins_when_clustering =12) g2 <- graph.adjacency(infinitymapper$adjacency, mode="undirected")
first.date <- as.Date('2004/01/01') last.date <- as.Date('2009/09/30') tickers <- c('MMM','AXP','AAPL','BA','CAT','CVX','CSCO','KO','DWDP','XOM','GE','GS','HD','IBM','INTC','JNJ','JPM','MCD','MRK','MSFT','NKE','PFE','PG','TRV','UNH','UTX','VZ','WMT','DIS') data <- BatchGetSymbols(tickers = tickers, first.date = first.date, last.date = last.date)
## ## Running BatchGetSymbols for: ## tickers = MMM, AXP, AAPL, BA, CAT, CVX, CSCO, KO, DWDP, XOM, GE, GS, HD, IBM, INTC, JNJ, JPM, MCD, MRK, MSFT, NKE, PFE, PG, TRV, UNH, UTX, VZ, WMT, DIS ## Downloading data for benchmark ticker ## Downloading Data for MMM from yahoo (1|29) - Nice! ## Downloading Data for AXP from yahoo (2|29) - Well done! ## Downloading Data for AAPL from yahoo (3|29) - Well done! ## Downloading Data for BA from yahoo (4|29) - Good job! ## Downloading Data for CAT from yahoo (5|29) - Good job! ## Downloading Data for CVX from yahoo (6|29) - Got it! ## Downloading Data for CSCO from yahoo (7|29) - Boa! ## Downloading Data for KO from yahoo (8|29) - Got it! ## Downloading Data for DWDP from yahoo (9|29) - Nice! ## Downloading Data for XOM from yahoo (10|29) - OK! ## Downloading Data for GE from yahoo (11|29) - OK! ## Downloading Data for GS from yahoo (12|29) - Nice! ## Downloading Data for HD from yahoo (13|29) - Well done! ## Downloading Data for IBM from yahoo (14|29) - Well done! ## Downloading Data for INTC from yahoo (15|29) - OK! ## Downloading Data for JNJ from yahoo (16|29) - Good stuff! ## Downloading Data for JPM from yahoo (17|29) - Good stuff! ## Downloading Data for MCD from yahoo (18|29) - Got it! ## Downloading Data for MRK from yahoo (19|29) - Well done! ## Downloading Data for MSFT from yahoo (20|29) - Good stuff! ## Downloading Data for NKE from yahoo (21|29) - Well done! ## Downloading Data for PFE from yahoo (22|29) - Got it! ## Downloading Data for PG from yahoo (23|29) - Boa! ## Downloading Data for TRV from yahoo (24|29) - Well done! ## Downloading Data for UNH from yahoo (25|29) - Good job! ## Downloading Data for UTX from yahoo (26|29) - Good stuff! ## Downloading Data for VZ from yahoo (27|29) - Well done! ## Downloading Data for WMT from yahoo (28|29) - OK! ## Downloading Data for DIS from yahoo (29|29) - Good job!
newdata <- BatchGetSymbols(tickers = tickers, first.date = as.Date('2017/05/01'), last.date = as.Date('2017/10/01'))
## ## Running BatchGetSymbols for: ## tickers = MMM, AXP, AAPL, BA, CAT, CVX, CSCO, KO, DWDP, XOM, GE, GS, HD, IBM, INTC, JNJ, JPM, MCD, MRK, MSFT, NKE, PFE, PG, TRV, UNH, UTX, VZ, WMT, DIS ## Downloading data for benchmark ticker ## Downloading Data for MMM from yahoo (1|29) - Nice! ## Downloading Data for AXP from yahoo (2|29) - Good stuff! ## Downloading Data for AAPL from yahoo (3|29) - Nice! ## Downloading Data for BA from yahoo (4|29) - Good job! ## Downloading Data for CAT from yahoo (5|29) - Good stuff! ## Downloading Data for CVX from yahoo (6|29) - OK! ## Downloading Data for CSCO from yahoo (7|29) - Boa! ## Downloading Data for KO from yahoo (8|29) - OK! ## Downloading Data for DWDP from yahoo (9|29) - OK! ## Downloading Data for XOM from yahoo (10|29) - Nice! ## Downloading Data for GE from yahoo (11|29) - Well done! ## Downloading Data for GS from yahoo (12|29) - Well done! ## Downloading Data for HD from yahoo (13|29) - Nice! ## Downloading Data for IBM from yahoo (14|29) - Boa! ## Downloading Data for INTC from yahoo (15|29) - Nice! ## Downloading Data for JNJ from yahoo (16|29) - Got it! ## Downloading Data for JPM from yahoo (17|29) - Nice! ## Downloading Data for MCD from yahoo (18|29) - Got it! ## Downloading Data for MRK from yahoo (19|29) - OK! ## Downloading Data for MSFT from yahoo (20|29) - Nice! ## Downloading Data for NKE from yahoo (21|29) - Good stuff! ## Downloading Data for PFE from yahoo (22|29) - Good job! ## Downloading Data for PG from yahoo (23|29) - Got it! ## Downloading Data for TRV from yahoo (24|29) - Good job! ## Downloading Data for UNH from yahoo (25|29) - Boa! ## Downloading Data for UTX from yahoo (26|29) - Good stuff! ## Downloading Data for VZ from yahoo (27|29) - Well done! ## Downloading Data for WMT from yahoo (28|29) - Got it! ## Downloading Data for DIS from yahoo (29|29) - Good stuff!
widestock <- dcast(data$df.tickers[,6:8], ref.date ~ ticker, value.var="price.adjusted") logrets <- apply(widestock[,2:30], 2, function(x) diff(log(x), lag=1))
Correlation matrices are really easy!
first100cor <- cor(logrets[1:100,], method = "pearson") second100cor <- cor(logrets[101:200,], method = "pearson")
Let's make a matrix by looking at a "super-level set," basically only showing edges when stocks have highly correlated log returns. Notice I'm making a "distance" matrix out of the correlation matrix.
highcor100mat <- sqrt(2*(first100cor+1)) highcorsuperlevel100 <- ifelse (highcor100mat>1.8, highcor100mat,0) network=graph_from_adjacency_matrix(highcorsuperlevel100, weighted=TRUE, mode="undirected", diag=F)
plot(network)
Look now at 100 days starting December 21, 2007.
And now October-November-December 2008
We can make a persistence diagram for the stock data as well, to see if any "circles" exist or persist.
first100persistence <- ripsDiag(highcor100mat,maxscale=2,maxdimension = 1,dist="arbitrary") plot(first100persistence[["diagram"]])
Persistence diagram for the next 200 days, still in 2004-2005.
Persistence diagram for the network from 2007.
Persistence diagram for the network from 2007.