# In finance and stock trading, **correlation** measures the extent to which two equities move together.
# 
# Below, we build the correlation matrix from `final_df` and store it in the variable `price_corr`. The matrix gives the **correlation coefficients (-1.0 to +1.0)** for every pair of stocks in our list of companies. [Straightforward](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.corr.html), isn't it?

# In[12]:

price_corr = final_df.corr()


# When two assets are *positively correlated*, their general trends are similar: when one goes up, the other tends to go up as well. When two assets are *negatively correlated*, their trends go in opposite directions.
# 
# I made a quick sketch below to illustrate these relationships.
# 
# ![Positive vs. negative correlation](./img/corr.png "Correlation")
# 
# For a more concrete example, let us look at three pairs: (`$ABS`, `$ACR`), (`$FMETF`, `$ALI`) and (`$BLOOM`, `$BDO`). The `$` sign is the "cashtag" used for stock quotes. I chose these securities based on the correlation values of the pairs in `price_corr`. Can you tell which pairs are positively correlated? Which are negatively correlated?

# In[13]:

## ABS and ACR
plt.figure(figsize=(8,5))
plt.plot(final_df["ABS"], final_df["ACR"], ".", markersize=10)
plt.xlabel("Price of $ABS (PhP)")
plt.ylabel("Price of $ACR (PhP)")
_ = plt.show()


# In[14]:

## FMETF and ALI
plt.figure(figsize=(8,5))
plt.plot(final_df["FMETF"], final_df["ALI"], ".", markersize=10)
plt.xlabel("Price of $FMETF (PhP)")
plt.ylabel("Price of $ALI (PhP)")
_ = plt.show()


# In[15]:

## BLOOM and BDO
plt.figure(figsize=(8,5))
plt.plot(final_df["BLOOM"], final_df["BDO"], ".", markersize=10)
plt.xlabel("Price of $BLOOM (PhP)")
plt.ylabel("Price of $BDO (PhP)")
_ = plt.show()


# Below, we draw the heatmap of the resulting correlation matrix.
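# Before drawing it, a quick look at what `corr()` computes. By default, `DataFrame.corr()` returns the Pearson coefficient for each pair of columns and excludes missing values pairwise:
# 
# \begin{equation}c_{ij} = \frac{\sum_t (p_{i,t} - \bar{p}_i)(p_{j,t} - \bar{p}_j)}{\sqrt{\sum_t (p_{i,t} - \bar{p}_i)^2}\,\sqrt{\sum_t (p_{j,t} - \bar{p}_j)^2}}\end{equation}
# 
# Here $p_{i,t}$ is the price of stock $i$ at time $t$, and $\bar{p}_i$ is its mean over the period. The sketch below is a minimal check of this formula on a tiny, made-up price table. The tickers `AAA`, `BBB`, `CCC`, their prices, and the helper `pearson` are hypothetical and are not part of this notebook's data.

```python
import numpy as np
import pandas as pd


def pearson(x, y):
    """Pearson correlation of two equal-length 1-D sequences (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from each series' mean
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


# Hypothetical closing prices (PhP) for three made-up tickers; not PSE data
toy = pd.DataFrame({
    "AAA": [10.0, 10.5, 11.0, 11.2, 11.8],
    "BBB": [20.0, 20.9, 22.1, 22.3, 23.5],  # tends to move with AAA
    "CCC": [50.0, 48.7, 47.9, 47.1, 45.6],  # tends to move against AAA
})

toy_corr = toy.corr()  # Pearson is the default method
print(toy_corr.round(3))
print(round(pearson(toy["AAA"], toy["BBB"]), 3), round(pearson(toy["AAA"], toy["CCC"]), 3))
```

`BBB` comes out close to +1 and `CCC` close to -1 relative to `AAA`. The hand-rolled values should match `toy_corr` up to floating-point rounding.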
# In[16]:

## Source: https://stanford.edu/~mwaskom/software/seaborn/examples/many_pairwise_correlations.html

## Generate a mask for the upper triangle (the matrix is symmetric, so it is redundant)
mask = np.zeros_like(price_corr, dtype=bool)
mask[np.triu_indices_from(mask)] = True

## Set up the matplotlib figure
f, ax = plt.subplots(figsize=(15, 15))

## Generate a custom diverging colormap
cmap = sns.diverging_palette(220, 10, as_cmap=True)

## Draw the heatmap with the mask and correct aspect ratio;
## the color scale spans the full correlation range [-1, 1], centered at 0
sns.heatmap(price_corr, mask=mask, cmap=cmap, vmin=-1, vmax=1, center=0,
            square=True, xticklabels=2, yticklabels=2,
            linewidths=.5, cbar_kws={"shrink": .5}, ax=ax)
_ = plt.show()


# ---
# ### Step 2: Build Distance Matrix

# As mentioned in the blog post, we will use two distance metrics to build the distance matrices. The first metric is from Bonanno et al.:
# 
# \begin{equation}d_{ij} = \sqrt{2 \times (1 - c_{ij})}\end{equation}
# 
# where $c_{ij}$ is the correlation coefficient of stocks $i$ and $j$. When $c_{ij}=1$, $d_{ij}=0$, and when $c_{ij}=-1$, $d_{ij}=2$. In other words, a perfectly positive correlation (+1) gives a distance of 0, and a perfectly negative correlation (-1) gives the largest possible distance, 2. An uncorrelated pair ($c_{ij}=0$) sits in between at $\sqrt{2} \approx 1.41$.
# 
# The second distance measure is from [SliceMatrix](http://SliceMatrix.com) (mktstk):
# 
# \begin{equation}d_{ij} = 1 - |c_{ij}|.\end{equation}
# 
# This metric does not distinguish between positively and negatively correlated pairs. As long as two stocks are strongly correlated in either direction, the distance is small. It is 0 when $|c_{ij}|=1$ and reaches its maximum of 1 when $c_{ij}=0$.
# 
# Here, we define the distance matrices as `dist_bonanno` and `dist_mktstk`.

# In[17]:

dist_bonanno = np.sqrt(2 * (1 - price_corr))
dist_mktstk = 1 - price_corr.abs()

## Stock symbols, in the same order as the rows/columns of the distance matrices
labs_bonanno = list(dist_bonanno.index)
labs_mktstk = list(dist_mktstk.index)


# ---
# ### Step 3: Build Minimum Spanning Tree (MST)
# 
# Now, we are ready to build the minimum spanning tree. The idea is to link all the stocks into a single tree (no cycles) whose total distance is as small as possible. Because short distances correspond to strong correlations, the links that survive tend to be those between highly correlated stocks. Let's first build the "weighted" networks `G_bonanno` and `G_mktstk` from the distance matrices `dist_bonanno` and `dist_mktstk`, respectively. With the Python package NetworkX, that's pretty straightforward.

# In[18]:

## Nonzero entries become edges whose "weight" attribute is the distance;
## the zero diagonal adds no self-loops
G_bonanno = nx.from_numpy_array(dist_bonanno.to_numpy())
G_mktstk = nx.from_numpy_array(dist_mktstk.to_numpy())


# Once we have the distance networks, we can build the minimum spanning trees (MSTs). Here, we use Kruskal's algorithm. Below is the pseudo-code from the Wikipedia entry on the algorithm.

#     KRUSKAL(G):
#         A = ∅
#         foreach v ∈ G.V:
#             MAKE-SET(v)
#         foreach (u, v) in G.E ordered by weight(u, v), increasing:
#             if FIND-SET(u) ≠ FIND-SET(v):
#                 A = A ∪ {(u, v)}
#                 UNION(u, v)
#         return A
# 
# Again, we can use NetworkX to build the MSTs with the graphs as inputs. Kruskal's algorithm is the default in `nx.minimum_spanning_tree`; we pass it explicitly here.

# In[19]:

MST_b = nx.minimum_spanning_tree(G_bonanno, algorithm="kruskal")
MST_m = nx.minimum_spanning_tree(G_mktstk, algorithm="kruskal")


# Finally, let's add more attributes to the nodes (i.e., the stocks). The attributes that I want to include here are:
# 
# - `label` (the stock symbol)
# - `sector` (the sector the stock belongs to)
# - `color` (a bucket based on the $\%$ change of the stock over the period under study: red below $-10\%$, green above $+10\%$, blue in between, and black if the change is undefined)
# 
# This way, when we draw the MSTs, we can choose to color the nodes by either `sector` or the $\%$-change bucket stored in `color`.

# In[20]:

## Percent change of each stock from the first to the last row of final_df
change = (final_df.iloc[-1] - final_df.iloc[0]) * 100 / final_df.iloc[0]


# In[21]:

def annotate_nodes(mst, labels):
    """Attach sector, label, and a %-change color bucket to each node of an MST."""
    for node in mst.nodes():
        symbol = labels[node]
        sector = pse_companies[pse_companies["Stock Symbol"] == symbol].Sector.iloc[0]
        mst.nodes[node]["sector"] = sector
        mst.nodes[node]["label"] = symbol
        pct = change[symbol]
        if math.isnan(pct):
            mst.nodes[node]["color"] = "black"  # change undefined (e.g., missing first/last price)
        elif pct < -10:
            mst.nodes[node]["color"] = "red"    # fell by more than 10%
        elif pct > 10:
            mst.nodes[node]["color"] = "green"  # rose by more than 10%
        else:
            mst.nodes[node]["color"] = "blue"   # within +/-10%

annotate_nodes(MST_b, labs_bonanno)


# In[22]:

annotate_nodes(MST_m, labs_mktstk)


# ### Drawing the MSTs

# In[23]:

plt.figure(figsize=(10,10))
## Label nodes with their stock symbols and color them by their %-change bucket
nx.draw_networkx(MST_b, labels=nx.get_node_attributes(MST_b, "label"),
                 node_color=[MST_b.nodes[n]["color"] for n in MST_b])


# In[24]:

plt.figure(figsize=(10,10))
nx.draw_networkx(MST_m, labels=nx.get_node_attributes(MST_m, "label"),
                 node_color=[MST_m.nodes[n]["color"] for n in MST_m])


# ### Write out MSTs

# Below, we write the MSTs as `gexf` files so we can use them in [Gephi](https://gephi.org/) (open-source and free) to generate much prettier networks/trees.

# In[25]:

nx.write_gexf(MST_b, "corrmat_bonanno.gexf")
nx.write_gexf(MST_m, "corrmat_mktstk.gexf")


# Below is the resulting network (`MST_b`) drawn using Gephi.
# ![MST Bonanno](./img/mst_bonanno.png "MST Bonanno")
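# For reference, the Kruskal pseudo-code from Step 3 can be sketched in plain Python. A union-find (disjoint-set) structure plays the roles of `MAKE-SET`, `FIND-SET` and `UNION`. This is a minimal sketch, assuming a small, hypothetical 4-by-4 correlation matrix (not data from this notebook). It builds both the Bonanno and the SliceMatrix distance matrices and shows that the two metrics can produce different trees: with $1 - |c_{ij}|$, the negatively correlated pair (0, 3) ends up linked. The names `DisjointSet`, `kruskal_mst` and `corr_demo` are illustrative only and are not part of NetworkX. In the notebook itself, `nx.minimum_spanning_tree` does this work.

```python
import numpy as np


class DisjointSet:
    """Union-find with path halving and union by rank (hypothetical helper)."""

    def __init__(self, n):
        self.parent = list(range(n))  # MAKE-SET for every vertex
        self.rank = [0] * n

    def find(self, v):
        # FIND-SET: walk to the root, halving the path as we go
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]
            v = self.parent[v]
        return v

    def union(self, u, v):
        # UNION: merge two trees; return False if u and v were already connected
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return False
        if self.rank[ru] < self.rank[rv]:
            ru, rv = rv, ru
        self.parent[rv] = ru
        if self.rank[ru] == self.rank[rv]:
            self.rank[ru] += 1
        return True


def kruskal_mst(dist):
    """MST edges (i, j, weight) of the complete graph given by a symmetric distance matrix."""
    n = dist.shape[0]
    # Edges of the upper triangle, ordered by increasing weight
    edges = sorted((dist[i, j], i, j) for i in range(n) for j in range(i + 1, n))
    ds = DisjointSet(n)
    tree = []
    for w, i, j in edges:
        if ds.union(i, j):  # endpoints were in different sets: keep the edge
            tree.append((i, j, float(w)))
            if len(tree) == n - 1:  # a spanning tree has n - 1 edges
                break
    return tree


# Hypothetical correlation matrix for four stocks
corr_demo = np.array([
    [1.0, 0.9, 0.2, -0.5],
    [0.9, 1.0, 0.4, -0.3],
    [0.2, 0.4, 1.0, 0.6],
    [-0.5, -0.3, 0.6, 1.0],
])

mst_bonanno = kruskal_mst(np.sqrt(2 * (1 - corr_demo)))  # Bonanno et al. distance
mst_mktstk = kruskal_mst(1 - np.abs(corr_demo))          # SliceMatrix (mktstk) distance

for name, tree in [("bonanno", mst_bonanno), ("mktstk", mst_mktstk)]:
    print(name, [(i, j, round(w, 3)) for i, j, w in tree])
```

With this toy matrix, the Bonanno tree keeps edges (0, 1), (2, 3) and (1, 2). The mktstk tree keeps (0, 1), (2, 3) and (0, 3), because $1 - |c_{ij}|$ treats the -0.5 correlation between stocks 0 and 3 as a fairly short distance.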