PSY 381D Brain Connectivity, Spring 2019
Dynamic connectivity and brain states
April 8, 2019
The materials presented in this note require Scikit-learn (referred to as sklearn), a popular machine learning library for Python. Unfortunately I forgot to include it in the docker image sathayas/python-fsl-bundle, so you have to install it right after you start your docker container, before you start the Jupyter notebook. You only need to run this command:
pip3 install scikit-learn
The data for today's demo is a resting-state fMRI data set from Leiden. The fMRI time series was acquired with TR=2.2s. There are 212 time points in the preprocessed data set. From the preprocessed fMRI data, mean ROI time series data were extracted with Craddock's Rt 2-level atlas with K=200.
All the files are located under the DataDynamicConn directory.
Dynamic connectivity, unlike static connectivity, describes temporally varying connectivity over the duration of an experiment. One popular way to determine dynamic connectivity is the sliding window method. In a sliding window method, a segment of the fMRI time series is extracted (e.g., 45 time points out of 212 time points). The window width is chosen so that 60-100s of data are included in each window. Connectivity is then calculated based on the extracted time series data, and the process is repeated by selecting additional time segments. This results in a number of overlapping segments.
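As a minimal sketch of the windowing arithmetic on toy data (the shapes follow the Leiden data described above, and the window convention matches the extraction loop later in these notes):

```python
import numpy as np

nTime, winSize = 212, 45              # time points and window width (in TRs)
X = np.random.randn(nTime, 3)         # toy time series: 3 ROIs
# one window ending at each time point t = winSize, ..., nTime-1
windows = [X[t - winSize + 1 : t + 1] for t in range(winSize, nTime)]
print(len(windows), windows[0].shape)
```

Each entry is one (45, 3) segment, and there are 167 overlapping windows in total.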
A simple sliding window has a boxcar shape, giving all time points the same weight. One shortcoming of this approach is that a distant observation at the edge of a window can still influence the connectivity estimate. To overcome this problem, it is common practice to use tapered windows, with distant observations given less weight. Details can be found in the Betzel et al. paper.
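The exponentially tapered weights used below are normalized by a constant w0 so that they sum to 1. A quick standalone check of that formula, with the same parameters as the demo (window width 45, decay constant θ = 45/3):

```python
import numpy as np

L, th = 45, 45 / 3                                  # window width, decay constant
w0 = (1 - np.exp(-1/th)) / (1 - np.exp(-L/th))      # normalizing constant
wt = w0 * np.exp((np.arange(1, L + 1) - L) / th)    # weights for tau = 1..L
print(np.isclose(wt.sum(), 1.0))                    # the weights sum to 1
```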
Within each window, you can calculate a connectivity matrix describing association between different ROIs. This can be done by calculating a correlation matrix of the weighted data. Another popular approach is to use graphical LASSO to estimate the inverse correlation matrix (a.k.a., ICOV).
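The demo below uses correlation, but as a sketch of the graphical-LASSO alternative (on toy data, not the Leiden set), Scikit-learn's GraphicalLassoCV can estimate a sparse inverse covariance (ICOV) from one windowed segment:

```python
import numpy as np
from sklearn.covariance import GraphicalLassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
winX = rng.normal(size=(45, 10))              # toy window: 45 time points, 10 ROIs
winX = StandardScaler().fit_transform(winX)   # standardize within the window
gl = GraphicalLassoCV().fit(winX)             # regularization chosen by cross-validation
ICOV = gl.precision_                          # estimated sparse inverse covariance
print(ICOV.shape)                             # one (10, 10) matrix per window
```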
Here is an example from the Leiden resting-state fMRI data, where the window width is 100s, or 45 time points.
<CalcR.py>
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from sklearn.covariance import GraphicalLassoCV
from sklearn.preprocessing import StandardScaler
##### Function to extract time series data
def extract_winTS(X, timeStart, winSize):
    # extracting the window
    winX = X[(timeStart-winSize+1):(timeStart+1)]
    # calculating the weights
    th = winSize / 3
    w0 = (1-np.exp(-1/th))/(1-np.exp(-winSize/th))
    wt = []
    for iT in range(1,winSize+1):
        w = w0 * np.exp((iT - winSize)/th)
        wt.append(w)
    wt = np.array(wt).reshape(winSize,1)
    # weighting with wt
    wwinX = wt * winX
    return wwinX
##### Parameters
TR = 2.2 # TR
nTime = 212 # number of time points
winSizeS = 100 # window width in S
winSizeTR = round(winSizeS / TR) # window width in TR
##### loading the data
infile = np.load('DataDynamicConn/Leiden_sub39335_Rt2_K200.npz')
ts = infile['ts']
nodes = infile['nodes']
xyz = infile['xyz']
ts_std = StandardScaler().fit_transform(ts) # scaling the data
The original time series data are standardized by the StandardScaler from Scikit-learn so that all variables have zero mean and unit variance.
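A quick check of what StandardScaler does, on toy data (not the Leiden series): after fit_transform, every column has (approximately) zero mean and unit variance.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.random.default_rng(0).normal(5, 3, size=(212, 4))  # toy data: 4 ROIs
X_std = StandardScaler().fit_transform(X)
print(np.allclose(X_std.mean(axis=0), 0))   # column means ~ 0
print(np.allclose(X_std.std(axis=0), 1))    # column SDs ~ 1
```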
##### plotting the weight and weighted data
# calculating weights
th = winSizeTR / 3
w0 = (1-np.exp(-1/th))/(1-np.exp(-winSizeTR/th))
wt = []
for iT in range(1,winSizeTR+1):
    w = w0 * np.exp((iT - winSizeTR)/th)
    wt.append(w)
wt = np.array(wt).reshape(winSizeTR,1)
# extracting weighted time series
wts = extract_winTS(ts_std, winSizeTR, winSizeTR)
# plotting them
plt.figure(figsize=[6,3])
plt.subplot(121)
plt.plot(wt)
plt.xlabel('Time points')
plt.xticks([0,winSizeTR],['tau = 1', 'tau = L'])
plt.ylabel('Weight')
plt.title('Weights')
plt.subplot(122)
for iNode in range(len(nodes)):
    plt.plot(wts[:,iNode])
plt.xlabel('Time points')
plt.xticks([0,winSizeTR],['t - L', 't'])
plt.ylabel('Weighted time series')
plt.title('Weighted time series')
plt.subplots_adjust(left=0.125, right=0.975,
                    bottom=0.15, top=0.90, wspace=0.4)
plt.show()
Here, the left panel shows the shape of the tapered window used in this calculation. The right panel shows the time series data weighted with the tapered weights. Based on these weighted data, a correlation matrix is calculated.
There are 212 time points in the original fMRI data, and each window contains 45 time points. Thus it is possible to form 167 windows, and hence 167 correlation matrices. These correlation matrices are stored in an .npz file to be used later.
###### Calculating correlation across windows
Rmat = []
for iTime in range(winSizeTR,nTime):
    # extracting weighted time series
    wts = extract_winTS(ts_std, iTime, winSizeTR)
    # correlation matrix
    R = np.corrcoef(wts, rowvar=False)
    # zeroing the main diagonal
    for i in range(R.shape[0]):
        R[i,i] = 0
    # keeping R
    Rmat.append(R)
Rmat = np.array(Rmat)
###### Saving the Rmat for future use
np.savez('DataDynamicConn/Leiden_sub39335_Rt2_K200_Rmat.npz',
         Rmat = Rmat,
         winSize = winSizeTR,
         nodes = nodes,
         xyz = xyz)
Now we have a series of correlation matrices. We form a network at each time point (i.e., from each correlation matrix) with a target average degree of 20. Then we will examine how network properties change over time. To do so, we calculate the local and global efficiencies at each node over time. Unfortunately networkx does not support calculating efficiencies at the nodal level, so I have written some functions to calculate nodal efficiencies (eloc_node and eglob_node; see the code below).
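As a sanity check on the definitions (a standalone toy example, not part of the scripts below): the nodal global efficiency at a node x is the mean inverse shortest-path distance from x to all other nodes, and averaging it over all nodes recovers the usual graph-level global efficiency.

```python
import numpy as np
import networkx as nx

G = nx.path_graph(5)                 # toy graph: 0-1-2-3-4
x = 2                                # the middle node
Dx = nx.single_source_shortest_path_length(G, x)
inv = [1.0 / d for n, d in Dx.items() if n != x]
Eglob_x = sum(inv) / (len(G) - 1)    # (1 + 1 + 1/2 + 1/2) / 4
print(Eglob_x)                       # 0.75

# averaging the nodal values recovers networkx's global efficiency
nodal = []
for u in G.nodes():
    D = nx.single_source_shortest_path_length(G, u)
    nodal.append(sum(1.0 / d for v, d in D.items() if v != u) / (len(G) - 1))
print(np.isclose(np.mean(nodal), nx.global_efficiency(G)))
```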
<Efficiency.py>
import networkx as nx
import matplotlib.pyplot as plt
import numpy as np
###### Network construction functions, based on correlation matrix
def net_builder_HardTh(R, NodeInd, K, cType=1):
    '''
    A function to construct a network by hard-thresholding.
    input parameters:
          R:        A dense correlation matrix array.
          NodeInd:  A list of nodes in the network.
          K:        The target K, the average number of connections at each node.
          cType:    Type of functional connectivity.
                       1:  Positive correlations only
                       0:  Both positive and negative correlations
                      -1:  Negative correlations only
                    The default is 1 (i.e., positive correlations only).
    returns:
          G:        The resulting graph (networkx format).
    '''
    # first, initialize the graph
    G = nx.Graph()
    G.add_nodes_from(NodeInd)
    NNodes = R.shape[0]
    # upper triangle of the correlation matrix only
    I,J = np.triu_indices(NNodes,1)
    # creating a vector of correlation coefficients, depending on cType
    if cType==1:
        VecR = np.squeeze(np.array(R[I,J]))
    elif cType==0:
        VecR = np.squeeze(np.array(abs(R[I,J])))
    elif cType==-1:
        VecR = np.squeeze(np.array(-R[I,J]))
    # the number of elements is too big, so we truncate it
    # first, find the appropriate threshold for R
    NthR = 0
    tmpRth = 0.95
    StepTh = 0.05
    while NthR<K*NNodes/2.0:
        tmpRth -= StepTh
        #print('Threshold = %.2f' % tmpRth)
        NthR = len(np.nonzero(VecR>tmpRth)[0])
    # second, truncate the variables
    IndVecR = np.nonzero(VecR>tmpRth)
    thVecR = VecR[IndVecR]
    thI = I[IndVecR]
    thJ = J[IndVecR]
    # sort the correlation values
    zipR = zip(thVecR, thI, thJ)
    zipsR = sorted(zipR, key = lambda t: t[0], reverse=True)
    sVecR, sI, sJ = zip(*zipsR)
    # then adding edges
    m = int(np.ceil(K*NNodes/2.0))  # the number of edges
    trI = np.array(sI[:m])
    trJ = np.array(sJ[:m])
    Elist = np.vstack((NodeInd[trI], NodeInd[trJ])).T
    G.add_edges_from(Elist)
    RTh = sVecR[m-1]  # the resulting threshold (not returned)
    # finally, returning the resultant graph
    return G
####### functions to calculate nodal local efficiencies
def eloc_node(G, xNode):
    '''
    A function to calculate the nodal local efficiency
    at a node xNode.
    input parameters:
          G:      A graph in networkx format.
          xNode:  The node where the nodal local efficiency is calculated.
    returns:
          Eloc:   The nodal local efficiency at node xNode.
    '''
    subG = subgraph(G, xNode)
    NNodes = len(subG.nodes())
    if NNodes>1:
        Dij = nx.floyd_warshall(subG)
        D = [Dij[i].values() for i in subG.nodes()]
        cD = []
        for irow in D:
            cD += irow
        NZD = np.array(cD)[np.nonzero(cD)]
        if len(NZD)>0:
            Eloc = (1.0/(NNodes*(NNodes-1.0))) * np.sum(1.0/NZD)
        else:
            Eloc = 0
    else:
        Eloc = 0
    return Eloc
def subgraph(G, xNode):
    '''
    A function to extract the subgraph induced by the neighbors of a node xNode.
    input parameters:
          G:      A graph in networkx format.
          xNode:  The node whose neighbors form the subgraph.
    returns:
          subG:   A subgraph of G, containing the neighbors of xNode but not
                  xNode itself.
    '''
    subNodes = list(nx.all_neighbors(G, xNode))
    Edges = G.edges()
    subEdges = []   # list of subgraph edges
    for iEdge in Edges:
        if (iEdge[0] in subNodes and iEdge[1] in subNodes):
            subEdges.append(iEdge)
    subG = nx.Graph()                # create the subgraph
    subG.add_nodes_from(subNodes)    # populate it with nodes
    subG.add_edges_from(subEdges)    # populate it with edges
    return subG
def eglob_node(G, xNode):
    '''
    A function to calculate the nodal global efficiency
    at a node.
    input parameters:
          G:      A graph in networkx format.
          xNode:  The node where the nodal global efficiency is calculated.
    returns:
          Eglob:  The nodal global efficiency at xNode.
    '''
    NNodes = len(G.nodes())
    Dx = list(nx.single_source_shortest_path_length(G, xNode).values())
    indZ = np.nonzero(np.array(Dx)==0)[0]
    nzDx = np.delete(Dx, indZ)
    if len(nzDx)>0:
        Eglob = (1.0/(NNodes-1.0)) * np.sum(1.0/nzDx)
    else:
        Eglob = 0
    # returning the nodal global efficiency
    return Eglob
##### Parameters
targetDeg = 20
##### Loading the data
infile = np.load('DataDynamicConn/Leiden_sub39335_Rt2_K200_Rmat.npz')
Rmat = infile['Rmat']
nodes = infile['nodes']
xyz = infile['xyz']
Here is a loop to construct a network and calculate nodal local and global efficiencies at each time point.
###### Loop over time points
ElocMat = np.zeros((len(nodes),Rmat.shape[0]))
EglobMat = np.zeros((len(nodes),Rmat.shape[0]))
for iTime in range(Rmat.shape[0]):
    print('Working on time point %d' % iTime)
    ###### extracting a network
    G = net_builder_HardTh(Rmat[iTime,:,:], nodes, targetDeg)
    ###### calculating local and global efficiency
    ElocList = []
    EglobList = []
    for iNode in nodes:
        Eloc = eloc_node(G,iNode)
        ElocList.append(Eloc)
        Eglob = eglob_node(G,iNode)
        EglobList.append(Eglob)
    # saving them for later
    ElocMat[:,iTime] = ElocList
    EglobMat[:,iTime] = EglobList
##### saving to a file so that we don't have to recalculate
np.savez('DataDynamicConn/Leiden_sub39335_Rt2_K200_Efficiency.npz',
         ElocMat = ElocMat,
         EglobMat = EglobMat,
         nodes = nodes,
         xyz = xyz)
But it takes a few minutes to run, so I just load the pre-cooked results.
##### loading from the file to save time
f = np.load('DataDynamicConn/Leiden_sub39335_Rt2_K200_Efficiency.npz')
ElocMat = f['ElocMat']
EglobMat = f['EglobMat']
nodes = f['nodes']
xyz = f['xyz']
And let's plot the efficiencies over time.
##### plotting efficiency over time
plt.figure(figsize=[9,5])
plt.subplot(121)
plt.imshow(ElocMat, cmap=plt.cm.rainbow)
plt.title('Local efficiency')
plt.xlabel('Time')
plt.ylabel('Nodes')
plt.colorbar()
plt.subplot(122)
plt.imshow(EglobMat, cmap=plt.cm.rainbow)
plt.title('Global efficiency')
plt.xlabel('Time')
plt.ylabel('Nodes')
plt.colorbar()
plt.show()
Here, rows represent different nodes, and columns represent time. As you can see, there are some time points where efficiencies change dramatically (manifesting as a vertical edge, more apparent in global efficiency).
So there seem to be system-wide changes occurring at different points during this experiment. In fact, the so-called resting state is made up of distinct states, each with a similar connectivity pattern. These states are discrete, and the brain transitions from one state to another within a short period of time.
We can identify different brain states by K-means clustering, an unsupervised machine learning method.
K-means clustering is a machine learning method that groups observations into K distinct groups according to their similarities. Here, we cluster time points into different clusters of similar connectivity patterns. To do so, we re-organize the correlation matrix at each time point into a single vector.
<Clustering.py>
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
####### Loading the data
infile = np.load('DataDynamicConn/Leiden_sub39335_Rt2_K200_Rmat.npz')
Rmat = infile['Rmat']
nNodes = Rmat.shape[-1]
nTime = Rmat.shape[0]
indR = np.triu_indices(nNodes,1) # indices for the upper triangle of R
# initializing the data array, rows = correlation, cols = time
Rdata = np.zeros((nTime,len(indR[0])))
for iTime in range(nTime):
    R = Rmat[iTime,:,:]
    Rdata[iTime,:] = R[indR]
One downside of K-means clustering is that you have to know K, the number of clusters, before you run the clustering algorithm. If K is unknown, you have to estimate it. One way to do this is to generate a Scree plot: we perform K-means clustering with different values of K, then plot the sum of squared distances between the cluster centroids and the observations (known as the inertia) against K.
###### Clustering --- Figuring out the number of clusters
# determining the number of clusters (up to 30 clusters)
SSE = []
for iClus in range(1,31):
    #print('Number of clusters: %d' % iClus)
    # K-means clustering
    km = KMeans(n_clusters=iClus)  # K-means with a given number of clusters
    km.fit(Rdata)                  # fitting the correlation data
    SSE.append(km.inertia_)        # recording the sum of squared distances
# plotting the sum of square distance
plt.plot(np.arange(1,31),SSE,marker = "o")
plt.xlabel('Number of clusters')
plt.ylabel('Sum of sq distances')
plt.show()
Once you have a Scree plot, you look for an elbow. The elbow is defined as the point where the rate of decrease stabilizes. This criterion is somewhat subjective. In our case, we will go with K=6 as the elbow.
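Because the elbow choice is subjective, one simple heuristic (not used in these notes, and shown here with made-up inertia values) is to pick the K where the discrete curvature of the inertia curve, i.e. its second difference, is largest:

```python
import numpy as np

# made-up inertia values for K = 1..10 (illustrative only)
SSE = np.array([500., 300., 200., 150., 130., 120., 115., 112., 110., 109.])
second_diff = SSE[:-2] - 2 * SSE[1:-1] + SSE[2:]   # discrete curvature
elbow_K = int(np.argmax(second_diff)) + 2          # second differences start at K = 2
print(elbow_K)   # 2 for these toy values
```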
###### Clustering --- with K=6
km = KMeans(n_clusters=6) # defining the clustering object
km.fit(Rdata) # actually fitting the data
y_clus = km.labels_ # clustering info resulting from K-means
y_cent = km.cluster_centers_ # centroid coordinates
We have the cluster assignments stored in an array y_clus. We can plot y_clus and the global efficiency over time.
####### plotting cluster over time
f = np.load('DataDynamicConn/Leiden_sub39335_Rt2_K200_Efficiency.npz')
EglobMat = f['EglobMat']
xyz = f['xyz']
nodes = f['nodes']
plt.figure(figsize=[4,7])
plt.subplot(211)
plt.plot(y_clus)
plt.title('States over time')
plt.xlabel('Time')
plt.ylabel('State')
plt.xlim(1,nTime)
plt.subplot(212)
plt.imshow(EglobMat, cmap=plt.cm.rainbow, aspect='auto')
plt.title('Global efficiency')
plt.xlabel('Time')
plt.ylabel('Nodes')
plt.xlim(1,nTime)
plt.subplots_adjust(left=0.15, right=0.975, top=0.95, bottom=0.075,
                    hspace=0.4)
plt.show()
As you may notice, state changes correspond to vertical edges in the global efficiency plot.
We shall save the cluster assignment, as well as the cluster centroids (y_cent), for later use.
####### Saving the centroid information
np.savez('DataDynamicConn/Leiden_sub39335_Rt2_K200_Cluster.npz',
         y_clus = y_clus,
         y_cent = y_cent,
         nodes = nodes,
         xyz = xyz)
y_cent.shape
(6, 17578)
The centroid of each cluster from K-means clustering represents the connectivity pattern of that particular brain state. We can transform each centroid back into a connectivity (correlation) matrix, generate a network representative of that state, and determine the modular organization of that state.
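A minimal sketch (toy 4-node example) of the vector-to-matrix round trip this involves: the centroid stores only the upper triangle, which can be written back into a symmetric matrix.

```python
import numpy as np

n = 4
ind = np.triu_indices(n, 1)              # upper-triangle indices (6 pairs)
v = np.array([.1, .2, .3, .4, .5, .6])   # toy "centroid" vector
R = np.zeros((n, n))
R[ind] = v                               # fill the upper triangle
R = R + R.T                              # symmetrize; the diagonal stays 0
print(np.allclose(R, R.T), np.allclose(R[ind], v))
```

The actual script fills both triangles via a transpose-and-reassign; that is equivalent to R + R.T here because the diagonal is zero.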
<EigenModules.py>
import numpy as np
import matplotlib.pyplot as plt
import networkx as nx
import community # Louvain method
from sklearn.metrics import adjusted_rand_score
##### Custom distinct color function --- to be used later
def get_cmap(n, name='hsv'):
    '''Returns a function that maps each index in 0, 1, ..., n-1 to a distinct
    RGB color; the keyword argument name must be a standard mpl colormap name.'''
    return plt.cm.get_cmap(name, n)
###### Network construction functions, based on correlation matrix
def net_builder_HardTh(R, NodeInd, K, cType=1):
    '''
    A function to construct a network by hard-thresholding.
    input parameters:
          R:        A dense correlation matrix array.
          NodeInd:  A list of nodes in the network.
          K:        The target K, the average number of connections at each node.
          cType:    Type of functional connectivity.
                       1:  Positive correlations only
                       0:  Both positive and negative correlations
                      -1:  Negative correlations only
                    The default is 1 (i.e., positive correlations only).
    returns:
          G:        The resulting graph (networkx format).
    '''
    # first, initialize the graph
    G = nx.Graph()
    G.add_nodes_from(NodeInd)
    NNodes = R.shape[0]
    # upper triangle of the correlation matrix only
    I,J = np.triu_indices(NNodes,1)
    # creating a vector of correlation coefficients, depending on cType
    if cType==1:
        VecR = np.squeeze(np.array(R[I,J]))
    elif cType==0:
        VecR = np.squeeze(np.array(abs(R[I,J])))
    elif cType==-1:
        VecR = np.squeeze(np.array(-R[I,J]))
    # the number of elements is too big, so we truncate it
    # first, find the appropriate threshold for R
    NthR = 0
    tmpRth = 0.95
    StepTh = 0.05
    while NthR<K*NNodes/2.0:
        tmpRth -= StepTh
        #print('Threshold = %.2f' % tmpRth)
        NthR = len(np.nonzero(VecR>tmpRth)[0])
    # second, truncate the variables
    IndVecR = np.nonzero(VecR>tmpRth)
    thVecR = VecR[IndVecR]
    thI = I[IndVecR]
    thJ = J[IndVecR]
    # sort the correlation values
    zipR = zip(thVecR, thI, thJ)
    zipsR = sorted(zipR, key = lambda t: t[0], reverse=True)
    sVecR, sI, sJ = zip(*zipsR)
    # then adding edges
    m = int(np.ceil(K*NNodes/2.0))  # the number of edges
    trI = np.array(sI[:m])
    trJ = np.array(sJ[:m])
    Elist = np.vstack((NodeInd[trI], NodeInd[trJ])).T
    G.add_edges_from(Elist)
    RTh = sVecR[m-1]  # the resulting threshold (not returned)
    # finally, returning the resultant graph
    return G
####### Loading the cluster data
infile = np.load('DataDynamicConn/Leiden_sub39335_Rt2_K200_Cluster.npz')
y_clus = infile['y_clus']
y_cent = infile['y_cent']
xyz = infile['xyz']
nodes = infile['nodes']
nNode = len(nodes)
nClus = max(y_clus) + 1
####### modules for centroid connectivity
targetDeg = 20
partition_list = []
GC_list = []
for i in np.unique(y_clus):
    # reconstructing the connectivity matrix for the centroid
    R = np.zeros((nNode,nNode))
    indR = np.triu_indices(nNode,1)
    R[indR] = y_cent[i]
    R = R.T
    R[indR] = y_cent[i]
    # constructing a network based on the centroid connectivity
    G = net_builder_HardTh(R, nodes, targetDeg)
    # finding the giant component
    GC_nodes = max(nx.connected_components(G), key=len)
    GC = G.subgraph(GC_nodes)
    GC_list.append(GC)   # keep the giant component (the partition below is defined on it)
    ###### modular partition by Louvain
    partition = community.best_partition(GC)
    partition_list.append(partition)
###### drawing the graph
# dictionary of xy-coordinates
pos = {}
for iROI in range(len(nodes)):
    pos[nodes[iROI]] = xyz[iROI,:2]
# Loop over states for visualization
plt.figure(figsize=[9,7])
for iState in np.unique(y_clus):
    # finally, graph with communities in different colors (Louvain)
    plt.subplot(2,3,iState+1)
    nComm = max([comm for comm in partition_list[iState].values()]) + 1
    node_color_list = get_cmap(nComm+1,'rainbow')
    for iComm in range(nComm):
        nodeList = [iNode for iNode,Comm in partition_list[iState].items()
                    if Comm==iComm]
        nx.draw_networkx_nodes(GC_list[iState], pos,
                               nodelist=nodeList,
                               node_color=np.array([node_color_list(iComm)]),
                               node_size=30)
    nx.draw_networkx_edges(GC_list[iState], pos, width=0.5,
                           edge_color='lightblue')
    plt.title('State %d' % iState)
    plt.axis('off')
plt.subplots_adjust(left=0.01, right=0.99, wspace=0.05,
                    bottom=0.025, top=0.925)
plt.show()
As you can see, the modular organization is different under different states.
Modularity for different states. Calculate the modularity measure for the modular partition for the six states. (Post both code & numbers).
for iState in range(len(partition_list)):
    print('State %d:' % iState +
          '\t\tModularity %6.4f' % community.modularity(partition_list[iState],
                                                        GC_list[iState]))
State 0:		Modularity 0.5000
State 1:		Modularity 0.4733
State 2:		Modularity 0.4896
State 3:		Modularity 0.4109
State 4:		Modularity 0.4884
State 5:		Modularity 0.4631
Allen et al. 2014. Tracking Whole-Brain Connectivity Dynamics in the Resting State. Cerebral Cortex 24:663–676
Preti et al. 2017. The dynamic functional connectome: State-of-the-art and perspective. NeuroImage 160:41–54
Zalesky et al. 2014. Time-resolved resting-state brain networks. PNAS 111:10341–10346