Jared Cohen 5/7/2019

Introduction

Diversification is a cornerstone of Modern Portfolio Theory. By spreading capital across different categories of financial assets, an investor can reduce risk for a given level of expected return. One layer of diversification involves holding separate asset classes, such as equities, bonds, and precious metals. Beyond asset class diversification, it is important to diversify within an asset class. The global equities market, for example, offers myriad types of investments that vary by sector, industry, geography, size, and other factors. By spreading money across a greater variety of investment types, investors can gain exposure to specific domains of the economy without taking on undue risk.

As a result of this desire to access the returns of specific areas of the economy while limiting risk, exchange-traded funds (ETFs) have grown massively in popularity. An ETF invests in many different securities, and investors trade shares of the fund itself. This gives investors exposure to types of investments that would otherwise be too risky to hold individually.

A popular trend within the realm of funds is the low-cost fund, or index fund. These ETFs track broad indices or follow broad strategies and are passively managed. Because there is no active management, fees are lower, which allows investors to keep more of their returns.

An important element of diversification is the correlation between different assets. In a perfectly diversified portfolio, losses in some positions would be offset by gains in others. For this to work effectively, different securities should have low or negative correlations. This paper analyzes a selection of low-cost ETFs with different strategies in order to understand the patterns of correlation between them.

Data

The dataset contains historical daily closing prices from 4/1/2014 until 3/31/2019 for 13 different ETFs. The data was retrieved from Yahoo Finance. A separate CSV file was downloaded for each ETF, and the column of closing prices from each file was copied into a new CSV file. This CSV file was uploaded as “ETF_prices.csv”.

An additional CSV file, “ETF_edges.csv”, was created. This file contains an edge list in which the nodes are the ETFs. Each edge represents the correlation in closing price between two ETFs, and every pair of nodes is connected. The CSV file has a column containing the correlation coefficients, and the edge weights are equal to the correlation coefficients multiplied by 100 and then rounded to the nearest integer.
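
As a rough illustration, an edge list with this layout could also be generated directly in R rather than assembled by hand. The sketch below is an assumption about the process, not the procedure actually used to build ETF_edges.csv. The helper name cor_to_edges and the toy prices are hypothetical; the column names ETF1, ETF2, Weight, and Correlation follow the edge files read later in this report.

```r
# Hypothetical helper: turn a table of prices (one column per ETF) into an
# edge list with one row per unordered pair of ETFs.
cor_to_edges <- function(prices) {
  cm <- cor(prices)                             # Pearson correlation matrix
  idx <- which(upper.tri(cm), arr.ind = TRUE)   # each pair once, no self-edges
  data.frame(
    ETF1 = rownames(cm)[idx[, "row"]],
    ETF2 = colnames(cm)[idx[, "col"]],
    Weight = round(100 * cm[idx]),              # integer edge weight
    Correlation = cm[idx],
    stringsAsFactors = FALSE
  )
}

# Toy prices for three made-up tickers (random walks, not real data)
set.seed(1)
toy_prices <- data.frame(
  AAA = 100 + cumsum(rnorm(250)),
  BBB = 50 + cumsum(rnorm(250)),
  CCC = 75 + cumsum(rnorm(250))
)
toy_edges <- cor_to_edges(toy_prices)
print(toy_edges)
```

Applied to the 13 ETFs in this report, the same approach would yield choose(13, 2) = 78 rows, one per pair.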

The 13 ETFs are the following:

  1. SPDR S&P 500 ETF (SPY): This fund tracks the performance of the S&P 500 index.
  2. Vanguard S&P 500 ETF (VOO): This fund also tracks the performance of the S&P 500 index.
  3. Invesco QQQ Trust (QQQ): This fund tracks the performance of the NASDAQ-100 index.
  4. Vanguard Total Stock Market Index Fund ETF (VTI): This fund tracks the performance of the entirety of the US stock market.
  5. Vanguard Total International Stock Index Fund ETF (VXUS): This fund tracks the overall performance of stock markets around the world.
  6. Vanguard FTSE Developed Markets Index Fund ETF (VEA): This fund tracks the performance of the stock markets of developed economies outside North America, such as Japan and Western European nations.
  7. Vanguard FTSE Emerging Markets Index Fund ETF (VWO): This fund tracks the overall performance of the stock markets in many developing economies.
  8. Vanguard High Dividend Yield Index Fund ETF (VYM): This fund focuses on US large-cap stocks that pay dividends.
  9. Vanguard Dividend Appreciation Index Fund ETF (VIG): This fund tracks the performance of the NASDAQ US Dividend Achievers Select Index, which is composed of US large-cap dividend-paying stocks that exhibit characteristics of growth.
  10. Vanguard Value Index Fund ETF (VTV): This fund tracks the performance of the MSCI US Prime Market Value Index, focusing on US stocks that adhere to the value style of investing.
  11. Vanguard Growth Index Fund ETF (VUG): This fund tracks the MSCI US Prime Market Growth Index, focusing on large-cap US stocks that adhere to the growth style of investing.
  12. Vanguard Mid-Cap Value Index Fund ETF (VOE): This fund tracks the performance of mid-cap US stocks, following a value investing style.
  13. Vanguard Small-Cap Growth Index Fund ETF (VBK): This fund tracks the performance of small-cap US stocks, following a growth investing style.

All of these ETFs are low-cost, passively managed funds that hold hundreds of positions. The purpose of these types of funds is to provide investors with wide exposure to a specific area of the market while diversifying across different equities within that area. Some of these funds simply track major indices like the S&P 500 or the NASDAQ-100, while others don’t track a specific index but rather employ a broad strategy in order to generate returns that are representative of that segment of the market. For example, the strategy of VWO is to invest in foreign, emerging markets that are capable of growing more quickly than more established markets. VTV, however, focuses on large US value companies. A value company may or may not have very high growth rates; the core of the value investing style is to buy stocks at a discount to their intrinsic value. In fact, long-term stock market analyses have revealed the merits of value investing. One study at Yale University demonstrated that buying a stock above a fair valuation materially decreased total returns for up to 30 years into the future (reference to Yale article).

(ETF data from ETFdb.com)

The last row was removed from ETF_prices because an empty row was automatically added to the CSV file when it was saved. For the same reason, ETF_prices was trimmed to its first 14 columns (the date plus the 13 ETFs). This ensures that every price column contains only numeric values, which the following calculations require.

library(readr)
library(igraph)

Attaching package: ‘igraph’

The following objects are masked from ‘package:stats’:

    decompose, spectrum

The following object is masked from ‘package:base’:

    union
ETF_prices <- read_csv("ETF_prices.csv")
Missing column names filled in: 'X15' [15], 'X16' [16]
Parsed with column specification:
cols(
  Date = col_character(),
  SPY = col_double(),
  VOO = col_double(),
  QQQ = col_double(),
  VTI = col_double(),
  VXUS = col_double(),
  VEA = col_double(),
  VWO = col_double(),
  VYM = col_double(),
  VIG = col_double(),
  VTV = col_double(),
  VUG = col_double(),
  VOE = col_double(),
  VBK = col_double(),
  X15 = col_logical(),
  X16 = col_logical()
)
ETF_prices <- ETF_prices[-nrow(ETF_prices),1:14]

Methods

Correlations were calculated between the vectors of historical closing price data for each pair of ETFs. The correlations were used to give proportional edge weights in a complete network. The edge weight was set to the nearest integer to 100 times the correlation coefficient.

A heat map was created to visualize the correlation coefficients between each pair of ETFs. A histogram was also plotted to visualize the distribution of correlation coefficients between ETFs. The graph of the ETFs was plotted using a force-directed layout, so that nodes with heavier edges between them were closer together. Edge thickness was also set proportionally to edge weight in order to visualize the relative strengths of the correlations.

The network was then run through a community detection algorithm that optimizes modularity. The algorithm was also run on an unweighted version of the graph to establish a baseline. Another plot was then generated to visualize the different communities in the ETF network.
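
To make the modularity score concrete, the following is a minimal sketch of the standard weighted modularity calculation, Q = (1/2m) * sum over node pairs in the same community of (w_ij - k_i * k_j / 2m), where k_i is the total weight attached to node i and 2m is the total of all k_i. It is written in base R for illustration and is not the code igraph runs internally; the function name weighted_modularity is hypothetical. The weights are the rounded five-year correlations (times 100) for six of the ETFs, taken from the correlation matrix in the Results section.

```r
# Edge weights = round(100 * r) for six of the ETFs in this report
etfs <- c("SPY", "VOO", "QQQ", "VXUS", "VEA", "VWO")
W <- matrix(c(
    0, 100,  99,  58,  61,  49,
  100,   0,  99,  58,  61,  49,
   99,  99,   0,  52,  55,  43,
   58,  58,  52,   0,  99,  96,
   61,  61,  55,  99,   0,  92,
   49,  49,  43,  96,  92,   0
), nrow = 6, byrow = TRUE, dimnames = list(etfs, etfs))

# Hypothetical helper: modularity of a partition of a weighted, undirected
# graph given as a symmetric weight matrix with a zero diagonal.
weighted_modularity <- function(W, membership) {
  two_m <- sum(W)                                 # twice the total edge weight
  k <- rowSums(W)                                 # weighted degree of each node
  same <- outer(membership, membership, "==")     # TRUE where i and j share a community
  sum((W - outer(k, k) / two_m) * same) / two_m
}

q_split <- weighted_modularity(W, c(1, 1, 1, 2, 2, 2))  # US funds vs. international funds
q_one   <- weighted_modularity(W, rep(1, 6))            # everything in one community
print(c(split = q_split, one_community = q_one))
```

Placing every node in a single community always gives a modularity of zero under this definition, so a positive score for the US/international split means that the split captures more within-group weight than would be expected from the node strengths alone.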

These methods were then repeated with price data from each individual year within the five-year sample in order to compare the results.
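
One way the per-year split could be done in R is sketched below. This is an assumption for illustration: the yearly edge files used in this report were prepared separately, and the helper names sample_year and yearly_correlations, the toy tickers, and the month/day/year date format are all hypothetical choices rather than details confirmed by the data files.

```r
# Hypothetical helper: label each date with its sample year, where year 1
# starts on `start` and each later year starts on the anniversary of `start`.
sample_year <- function(dates, start = as.Date("2014-04-01")) {
  d <- as.POSIXlt(dates)
  s <- as.POSIXlt(start)
  before_anniversary <- (d$mon < s$mon) | (d$mon == s$mon & d$mday < s$mday)
  (d$year - s$year) - before_anniversary + 1
}

# Hypothetical helper: one correlation matrix per sample year.
# `prices` is assumed to have a character date column first, then one
# numeric column per ETF.
yearly_correlations <- function(prices, date_format = "%m/%d/%Y") {
  dates <- as.Date(prices[[1]], format = date_format)
  lapply(split(prices[, -1], sample_year(dates)), cor)
}

# Toy example with two made-up tickers over two sample years (not real data)
toy_dates <- seq(as.Date("2014-04-01"), as.Date("2016-03-31"), by = "day")
set.seed(2)
toy <- data.frame(
  Date = format(toy_dates, "%m/%d/%Y"),
  AAA = 100 + cumsum(rnorm(length(toy_dates))),
  BBB = 50 + cumsum(rnorm(length(toy_dates))),
  stringsAsFactors = FALSE
)
toy_cor <- yearly_correlations(toy)
print(names(toy_cor))
```

Each element of the resulting list is a correlation matrix for one 4/1 to 3/31 window, which could then be converted to an edge list in the same way as the five-year matrix.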

Results

summary(ETF_prices[,-1])
      SPY             VOO             QQQ              VTI              VXUS            VEA       
 Min.   :181.5   Min.   :166.3   Min.   : 84.11   Min.   : 92.56   Min.   :39.73   Min.   :32.23  
 1st Qu.:205.0   1st Qu.:188.1   1st Qu.:105.86   1st Qu.:105.16   1st Qu.:47.09   1st Qu.:37.44  
 Median :216.6   Median :198.8   Median :117.11   Median :111.21   Median :50.70   Median :40.15  
 Mean   :229.8   Mean   :211.0   Mean   :128.77   Mean   :118.28   Mean   :50.58   Mean   :40.05  
 3rd Qu.:259.8   3rd Qu.:238.6   3rd Qu.:155.85   3rd Qu.:133.71   3rd Qu.:53.82   3rd Qu.:42.39  
 Max.   :293.6   Max.   :269.8   Max.   :186.74   Max.   :151.31   Max.   :61.17   Max.   :47.88  
      VWO             VYM             VIG              VTV              VUG              VOE        
 Min.   :28.55   Min.   :60.85   Min.   : 72.10   Min.   : 73.67   Min.   : 89.98   Min.   : 75.33  
 1st Qu.:37.36   1st Qu.:68.00   1st Qu.: 79.61   1st Qu.: 82.91   1st Qu.:105.43   1st Qu.: 87.95  
 Median :40.80   Median :72.93   Median : 84.77   Median : 87.50   Median :111.52   Median : 93.25  
 Mean   :40.18   Mean   :74.61   Mean   : 88.60   Mean   : 91.87   Mean   :120.53   Mean   : 96.75  
 3rd Qu.:43.33   3rd Qu.:82.46   3rd Qu.: 99.55   3rd Qu.:102.13   3rd Qu.:139.24   3rd Qu.:106.14  
 Max.   :50.98   Max.   :90.91   Max.   :112.45   Max.   :113.26   Max.   :161.48   Max.   :117.78  
      VBK       
 Min.   :100.6  
 1st Qu.:124.5  
 Median :134.5  
 Mean   :141.0  
 3rd Qu.:158.6  
 Max.   :190.0  

Calculate correlation matrix

ETF_Cor_matrix <- cor(ETF_prices[,-1])
ETF_Cor_matrix
           SPY       VOO       QQQ       VTI      VXUS       VEA       VWO       VYM       VIG
SPY  1.0000000 0.9999616 0.9917854 0.9992553 0.5787891 0.6075434 0.4949917 0.9863730 0.9934929
VOO  0.9999616 1.0000000 0.9917455 0.9992486 0.5782746 0.6069869 0.4946115 0.9866153 0.9935335
QQQ  0.9917854 0.9917455 1.0000000 0.9893916 0.5211175 0.5546232 0.4288681 0.9633972 0.9822217
VTI  0.9992553 0.9992486 0.9893916 1.0000000 0.5966961 0.6249407 0.5129306 0.9848334 0.9918898
VXUS 0.5787891 0.5782746 0.5211175 0.5966961 1.0000000 0.9929997 0.9573807 0.5804221 0.5673828
VEA  0.6075434 0.6069869 0.5546232 0.6249407 0.9929997 1.0000000 0.9188415 0.6021841 0.5911180
VWO  0.4949917 0.4946115 0.4288681 0.5129306 0.9573807 0.9188415 1.0000000 0.5146943 0.4933169
VYM  0.9863730 0.9866153 0.9633972 0.9848334 0.5804221 0.6021841 0.5146943 1.0000000 0.9818726
VIG  0.9934929 0.9935335 0.9822217 0.9918898 0.5673828 0.5911180 0.4933169 0.9818726 1.0000000
VTV  0.9932698 0.9933849 0.9729177 0.9934355 0.6021431 0.6248710 0.5310698 0.9951071 0.9884243
VUG  0.9953226 0.9952348 0.9965622 0.9947169 0.5664017 0.5992106 0.4738464 0.9679061 0.9873790
VOE  0.9709235 0.9710278 0.9434570 0.9751733 0.6573828 0.6827102 0.5823031 0.9792913 0.9528102
VBK  0.9721199 0.9719318 0.9639486 0.9787801 0.6414844 0.6682942 0.5549126 0.9376058 0.9669227
           VTV       VUG       VOE       VBK
SPY  0.9932698 0.9953226 0.9709235 0.9721199
VOO  0.9933849 0.9952348 0.9710278 0.9719318
QQQ  0.9729177 0.9965622 0.9434570 0.9639486
VTI  0.9934355 0.9947169 0.9751733 0.9787801
VXUS 0.6021431 0.5664017 0.6573828 0.6414844
VEA  0.6248710 0.5992106 0.6827102 0.6682942
VWO  0.5310698 0.4738464 0.5823031 0.5549126
VYM  0.9951071 0.9679061 0.9792913 0.9376058
VIG  0.9884243 0.9873790 0.9528102 0.9669227
VTV  1.0000000 0.9779297 0.9810755 0.9593473
VUG  0.9779297 1.0000000 0.9544401 0.9773931
VOE  0.9810755 0.9544401 1.0000000 0.9458348
VBK  0.9593473 0.9773931 0.9458348 1.0000000

Create a heat map to visualize the relative strengths of correlation between ETFs.

library(ggplot2)
library(reshape2)
# Use the correlation matrix computed above
cormat <- ETF_Cor_matrix
# Get lower triangle of the correlation matrix
get_lower_tri <- function(cormat) {
  cormat[upper.tri(cormat)] <- NA
  return(cormat)
}
# Get upper triangle of the correlation matrix
get_upper_tri <- function(cormat) {
  cormat[lower.tri(cormat)] <- NA
  return(cormat)
}
# Keep each pair once, then reshape to long format for ggplot
upper_tri <- get_upper_tri(cormat)
melted_cormat <- melt(upper_tri, na.rm = TRUE)
ggplot(data = melted_cormat, aes(Var2, Var1, fill = value)) +
  geom_tile(color = "white") +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white",
    midpoint = 0, limits = c(-1, 1), space = "Lab",
    name = "Pearson\nCorrelation") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1,
    size = 12, hjust = 1)) +
  coord_fixed()

The summary statistics and histogram of the edge list illustrate the distribution of correlation coefficients between ETFs. The majority of ETF pairs have an r value above 0.9, which is a strong correlation. However, there is a group of pairs with r values between 0.4 and 0.7, which is a moderate, positive correlation. This does not include self-edges. It would be possible to remove the less correlated pairs in order to gain insight into the differences among the pairs in the 0.9 to 1.0 range; however, any differences on that small a scale would likely be practically insignificant due to random market fluctuations. The primary insight is that the majority of ETFs are extremely highly correlated with each other while a few ETFs are only moderately correlated with the rest, meaning there was a greater divergence in price patterns, which may reflect differences in overall performance.

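# Load the edge list described in the Data section (file name as given there)
ETF_edges <- read_csv("ETF_edges.csv")
summary(ETF_edges$Correlation)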
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.4289  0.5999  0.9637  0.8217  0.9872  1.0000 
hist(ETF_edges$Correlation, main="Frequency Of Correlation Coefficients between Edges",xlab="Correlation Coefficient")

#Create an unweighted, complete iGraph object and a weighted, complete iGraph object with each ETF as a node
ETFgraph <- graph.data.frame(ETF_edges, directed = FALSE)
ETFgraph.weighted <- ETFgraph
E(ETFgraph.weighted)$weight <- E(ETFgraph.weighted)$Weight

This plot illustrates the relationships between these ETFs (nodes). Thicker edges indicate a stronger correlation. The use of a force layout causes the nodes to be placed such that the nodes connected with greater weight (more highly correlated) are closer together. This plot gives the impression that VXUS, VWO, and VEA are loosely connected to all other nodes while the remaining nodes are tightly interconnected.

set.seed(50)
E(ETFgraph.weighted)$width <- E(ETFgraph.weighted)$weight/42
plot(ETFgraph.weighted, layout=layout_with_fr)

These calculations detect communities in the unweighted version of the graph in order to provide a baseline for comparison to the weighted graph. The unweighted graph is essentially one large group, with only VBK occupying a separate community. Because every node is connected to every other node with equal weight, there is nothing to distinguish one group from another, and the modularity of essentially zero indicates that the unweighted graph has no meaningful community structure.

ETFcommunity <- fastgreedy.community(ETFgraph)
length(ETFcommunity)
[1] 2
sizes(ETFcommunity)
Community sizes
 1  2 
12  1 
membership(ETFcommunity)
 SPY  VOO  QQQ  VTI VXUS  VEA  VWO  VYM  VIG  VTV  VUG  VOE  VBK 
   1    1    1    1    1    1    1    1    1    1    1    1    2 
modularity(ETFcommunity)
[1] -1.908196e-17

These calculations detect communities in the weighted graph. The graph is split into two groups. One group has VXUS, VEA, and VWO, and the other group has the other 10 nodes. The modularity of 0.026, while still small, is clearly above the baseline modularity of essentially zero, indicating that the relative edge weights based on correlation coefficients did have a material impact on the community structure of the ETFs.

ETFcommunity.weighted <- fastgreedy.community(ETFgraph.weighted)
length(ETFcommunity.weighted)
[1] 2
sizes(ETFcommunity.weighted)
Community sizes
 1  2 
10  3 
membership(ETFcommunity.weighted)
 SPY  VOO  QQQ  VTI VXUS  VEA  VWO  VYM  VIG  VTV  VUG  VOE  VBK 
   1    1    1    1    2    2    2    1    1    1    1    1    1 
modularity(ETFcommunity.weighted)
[1] 0.02603699

This plot illustrates the separation of the two communities. VEA, VXUS, and VWO occupy their own small community apart from the majority of nodes. In the earlier plot, these three ETFs appeared to be more peripheral, and the community detection algorithm confirms that impression.

set.seed(50)
plot(ETFcommunity.weighted,ETFgraph.weighted)

Looking at just April 1, 2014 until March 31, 2015, the majority of the correlation coefficients between ETFs are above 0.8; however, instead of there being a group of moderately, positively correlated ETF pairs, there is a group of weakly to moderately, negatively correlated pairs. The presence of these negative correlations suggests that ETFs can diverge more over shorter time periods, while their prices tend to move together over longer time periods.

ETF_edges_Y1 <- read_csv("/Users/jaredmacbook/Documents/R/MATH 190 data/MATH 190 Final Project/ETF_edges_Y1.csv")
Parsed with column specification:
cols(
  ETF1 = col_character(),
  ETF2 = col_character(),
  Weight = col_double(),
  Correlation = col_double()
)
summary(ETF_edges_Y1$Correlation)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-0.5276 -0.2285  0.8733  0.4601  0.9704  0.9998 
hist(ETF_edges_Y1$Correlation, main="Frequency Of Correlation Coefficients between Edges",xlab="Correlation Coefficient")

#Create an unweighted, complete iGraph object and a weighted, complete iGraph object with each ETF as a node
#Add 100 to each edge weight so that negative weights become positive; this preserves the ordering of, and differences between, the weights
ETFgraph.Y1 <- graph.data.frame(ETF_edges_Y1, directed = FALSE)
E(ETFgraph.Y1)$weight <- (E(ETFgraph.Y1)$Weight + 100)
#Set edge widths on the first-year graph and plot that graph (not the five-year graph)
E(ETFgraph.Y1)$width <- E(ETFgraph.Y1)$weight/42
set.seed(50)
plot(ETFgraph.Y1, layout=layout_with_fr)

The correlation network from just the first year of the five-year sample splits into exactly the same two communities as the network that includes all five years. The modularity for this network is also about double that of the five-year network, indicating that these same communities are more sharply separated over this single year. This could mean that:

  1. VXUS, VEA, and VWO were more correlated with each other during this period than overall
  2. The other ten ETFs were more correlated with each other during this period than overall
  3. VXUS, VEA, and VWO were less correlated with the other ten ETFs during this period than overall

The third seems most likely, as it explains the meaningful number of negative correlations.

ETFcommunity.Y1 <- fastgreedy.community(ETFgraph.Y1)
length(ETFcommunity.Y1)
[1] 2
sizes(ETFcommunity.Y1)
Community sizes
 1  2 
10  3 
membership(ETFcommunity.Y1)
 SPY  VOO  QQQ  VTI VXUS  VEA  VWO  VYM  VIG  VTV  VUG  VOE  VBK 
   1    1    1    1    2    2    2    1    1    1    1    1    1 
modularity(ETFcommunity.Y1)
[1] 0.05622614
set.seed(50)
plot(ETFcommunity.Y1,ETFgraph.Y1)

Discussion

VXUS, VEA, and VWO emerged as a community both over the entire five-year span and over the first year within it. These funds are Vanguard Total International Stock Index Fund ETF (VXUS), Vanguard FTSE Developed Markets Index Fund ETF (VEA), and Vanguard FTSE Emerging Markets Index Fund ETF (VWO). It makes sense that these are highly correlated because they all focus on equities outside of North America while the other ten ETFs focus on American equities. One possible conclusion is that there is not much of a difference between the three international ETFs despite one being focused on developed economies and another on emerging economies. The correlation coefficient for those two ETFs (VEA and VWO) over all five years was 0.92, indicating a very strong, positive correlation. Additionally, the fact that the other ten ETFs formed a community, combined with the fact that the majority of correlation coefficients were greater than 0.9, indicates that the overall performance of these funds was quite similar despite the funds varying in their approaches, whether that be growth, value, high-yield, dividend growth, large-cap, mid-cap, or small-cap. This high correlation implies that investment performance would not differ much between these ten funds, which means that attempting to diversify by investing in several of them would not be effective. It’s important to note, however, that the five-year period being analyzed is entirely encompassed by a bull market. It is thus difficult to determine whether these conclusions would hold true in a bear market, which is perhaps more important because a bear market is when reducing risk matters most.

Future Analysis

This analysis can be taken further in two ways. First, the parameters of this analysis could be altered. For example, including more ETFs, including different time periods, covering a longer time period, using opening prices, or using daily average prices could provide more insight. Analyzing a greater number of funds would also allow for community detection at a higher resolution. In addition, instead of creating a complete graph with edge weights corresponding to correlation strength, it may be interesting to create an edge only if the r value is above 0.5 or even 0.8. This would likely have a material effect on the nature and community structure of the graph. Second, the ETFs could be compared in ways other than price correlations. For example, correlations in yearly total returns over time or similarity of holdings could also be analyzed. It would also be interesting to determine how accurately price correlation predicts relative capital appreciation.
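
The thresholding idea can be sketched as follows. This is a minimal, hypothetical illustration rather than part of the analysis above: the helper name threshold_edges is invented here, and the matrix holds the five-year correlations for four of the ETFs, copied from the Results section and rounded to four decimal places.

```r
# Five-year correlations for four of the ETFs, rounded to four decimals
etfs <- c("SPY", "QQQ", "VEA", "VWO")
r <- matrix(c(
  1.0000, 0.9918, 0.6075, 0.4950,
  0.9918, 1.0000, 0.5546, 0.4289,
  0.6075, 0.5546, 1.0000, 0.9188,
  0.4950, 0.4289, 0.9188, 1.0000
), nrow = 4, byrow = TRUE, dimnames = list(etfs, etfs))

# Hypothetical helper: keep only the pairs whose correlation exceeds `cutoff`,
# giving an unweighted edge list instead of a complete weighted graph.
threshold_edges <- function(cm, cutoff) {
  keep <- which(upper.tri(cm) & cm > cutoff, arr.ind = TRUE)
  data.frame(
    ETF1 = rownames(cm)[keep[, "row"]],
    ETF2 = colnames(cm)[keep[, "col"]],
    Correlation = cm[keep],
    stringsAsFactors = FALSE
  )
}

edges_05 <- threshold_edges(r, 0.5)   # drops SPY-VWO and QQQ-VWO
edges_08 <- threshold_edges(r, 0.8)   # keeps only SPY-QQQ and VEA-VWO
print(edges_05)
print(edges_08)
```

In this four-ETF subset, a cutoff of 0.8 leaves the US funds and the international funds with no edges between them, which shows how strongly the choice of threshold could shape the resulting graph.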

Sources

ETFdb.com

Yahoo Finance

