PSY 381D Brain Connectivity, Spring 2019
Dynamic connectivity and brain states
April 8, 2019
The materials presented in this note require Scikit-learn (referred to as sklearn), a popular machine learning library for Python. Unfortunately I forgot to include it in the docker image sathayas/python-fsl-bundle, so you have to install it right after you start your docker container, before you start the Jupyter notebook. You only need to run this command:
pip3 install scikit-learn
The data for today's demo is a resting-state fMRI data set from Leiden. The fMRI time series was acquired with TR=2.2s. There are 212 time points in the preprocessed data set. From the preprocessed fMRI data, mean ROI time series data were extracted with Craddock's Rt 2-level atlas with K=200.
All the files are located under the DataDynamicConn directory.
Dynamic connectivity, unlike static connectivity, describes temporally varying connectivity over the duration of an experiment. One popular way to determine dynamic connectivity is the sliding window method. In a sliding window method, a segment of the fMRI time series is extracted (e.g., 45 time points out of 212 time points). The window width is chosen so that 60-100s of data are included in each window. Connectivity is then calculated based on the extracted time series data, and the process is repeated by selecting additional time segments. This results in a number of overlapping segments.
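As a minimal sketch of the windowing arithmetic on toy data (the shapes follow the Leiden data described above, and the window convention matches the extraction loop later in these notes):

```python
import numpy as np

nTime, winSize = 212, 45              # time points and window width (in TRs)
X = np.random.randn(nTime, 3)         # toy time series: 3 ROIs
# one window ending at each time point t = winSize, ..., nTime-1
windows = [X[t - winSize + 1 : t + 1] for t in range(winSize, nTime)]
print(len(windows), windows[0].shape)
```

Each entry is one (45, 3) segment, and there are 167 overlapping windows in total.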
A simple sliding window has a boxcar shape, giving all time points the same weight. One shortcoming of this approach is that a distant observation at the edge of a window can still influence the connectivity estimate. To overcome this problem, it is common practice to use tapered windows, with distant observations given less weight. Details can be found in the Betzel et al. paper.
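The exponentially tapered weights used below are normalized by a constant w0 so that they sum to 1. A quick standalone check of that formula, with the same parameters as the demo (window width 45, decay constant θ = 45/3):

```python
import numpy as np

L, th = 45, 45 / 3                                  # window width, decay constant
w0 = (1 - np.exp(-1/th)) / (1 - np.exp(-L/th))      # normalizing constant
wt = w0 * np.exp((np.arange(1, L + 1) - L) / th)    # weights for tau = 1..L
print(np.isclose(wt.sum(), 1.0))                    # the weights sum to 1
```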
Within each window, you can calculate a connectivity matrix describing association between different ROIs. This can be done by calculating a correlation matrix of the weighted data. Another popular approach is to use graphical LASSO to estimate the inverse correlation matrix (a.k.a., ICOV).
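The demo below uses correlation, but as a sketch of the graphical-LASSO alternative (on toy data, not the Leiden set), Scikit-learn's GraphicalLassoCV can estimate a sparse inverse covariance (ICOV) from one windowed segment:

```python
import numpy as np
from sklearn.covariance import GraphicalLassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
winX = rng.normal(size=(45, 10))              # toy window: 45 time points, 10 ROIs
winX = StandardScaler().fit_transform(winX)   # standardize within the window
gl = GraphicalLassoCV().fit(winX)             # regularization chosen by cross-validation
ICOV = gl.precision_                          # estimated sparse inverse covariance
print(ICOV.shape)                             # one (10, 10) matrix per window
```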
Here is an example from the Leiden resting-state fMRI data, where the window width is 100s, or 45 time points.
<CalcR.py>
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from sklearn.covariance import GraphicalLassoCV
from sklearn.preprocessing import StandardScaler
##### Function to extract time series data
def extract_winTS(X, timeStart, winSize):
    # extracting the window
    winX = X[(timeStart-winSize+1):(timeStart+1)]
    # calculating the weights
    th = winSize / 3
    w0 = (1-np.exp(-1/th))/(1-np.exp(-winSize/th))
    wt = []
    for iT in range(1,winSize+1):
        w = w0 * np.exp((iT - winSize)/th)
        wt.append(w)
    wt = np.array(wt).reshape(winSize,1)
    # weighting with wt
    wwinX = wt * winX
    return wwinX
##### Parameters
TR = 2.2 # TR
nTime = 212 # number of time points
winSizeS = 100 # window width in S
winSizeTR = round(winSizeS / TR) # window width in TR
##### loading the data
infile = np.load('DataDynamicConn/Leiden_sub39335_Rt2_K200.npz')
ts = infile['ts']
nodes = infile['nodes']
xyz = infile['xyz']
ts_std = StandardScaler().fit_transform(ts) # scaling the data
The original time series data are standardized by the StandardScaler from Scikit-learn so that all variables have zero mean and unit variance.
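A quick check of what StandardScaler does, on toy data (not the Leiden series): after fit_transform, every column has (approximately) zero mean and unit variance.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.random.default_rng(0).normal(5, 3, size=(212, 4))  # toy data: 4 ROIs
X_std = StandardScaler().fit_transform(X)
print(np.allclose(X_std.mean(axis=0), 0))   # column means ~ 0
print(np.allclose(X_std.std(axis=0), 1))    # column SDs ~ 1
```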
##### plotting the weight and weighted data
# calculating weights
th = winSizeTR / 3
w0 = (1-np.exp(-1/th))/(1-np.exp(-winSizeTR/th))
wt = []
for iT in range(1,winSizeTR+1):
    w = w0 * np.exp((iT - winSizeTR)/th)
    wt.append(w)
wt = np.array(wt).reshape(winSizeTR,1)
# extracting weighted time series
wts = extract_winTS(ts_std, winSizeTR, winSizeTR)
# plotting them
plt.figure(figsize=[6,3])
plt.subplot(121)
plt.plot(wt)
plt.xlabel('Time points')
plt.xticks([0,winSizeTR],['tau = 1', 'tau = L'])
plt.ylabel('Weight')
plt.title('Weights')
plt.subplot(122)
for iNode in range(len(nodes)):
    plt.plot(wts[:,iNode])
plt.xlabel('Time points')
plt.xticks([0,winSizeTR],['t - L', 't'])
plt.ylabel('Weighted time series')
plt.title('Weighted time series')
plt.subplots_adjust(left=0.125, right=0.975,
                    bottom=0.15, top=0.90, wspace=0.4)
plt.show()
Here, the left panel shows the shape of the tapered window used in this calculation. The right panel shows the time series data weighted with the tapered weights. Based on these weighted data, a correlation matrix is calculated.
There are 212 time points in the original fMRI data, and each window contains 45 time points. Thus it is possible to form 167 windows, and hence 167 correlation matrices. These correlation matrices are stored in an .npz file to be used later.
###### Calculating correlation across windows
Rmat = []
for iTime in range(winSizeTR,nTime):
    # extracting weighted time series
    wts = extract_winTS(ts_std, iTime, winSizeTR)
    # correlation matrix
    R = np.corrcoef(wts, rowvar=False)
    # zeroing the main diagonal
    for i in range(R.shape[0]):
        R[i,i] = 0
    # keeping R
    Rmat.append(R)
Rmat = np.array(Rmat)
###### Saving the Rmat for future use
np.savez('DataDynamicConn/Leiden_sub39335_Rt2_K200_Rmat.npz',
         Rmat = Rmat,
         winSize = winSizeTR,
         nodes = nodes,
         xyz = xyz)
Now we have a series of correlation matrices. We form a network at each time point (i.e., from each correlation matrix) with a target average degree of 20. Then we will examine how network properties change over time. To do so, we calculate the local and global efficiencies at each node over time. Unfortunately networkx does not support calculating efficiencies at the nodal level, so I have written some functions to calculate nodal efficiencies (eloc_node and eglob_node; see the code below).
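As a sanity check on the definitions (a standalone toy example, not part of the scripts below): the nodal global efficiency at a node x is the mean inverse shortest-path distance from x to all other nodes, and averaging it over all nodes recovers the usual graph-level global efficiency.

```python
import numpy as np
import networkx as nx

G = nx.path_graph(5)                 # toy graph: 0-1-2-3-4
x = 2                                # the middle node
Dx = nx.single_source_shortest_path_length(G, x)
inv = [1.0 / d for n, d in Dx.items() if n != x]
Eglob_x = sum(inv) / (len(G) - 1)    # (1 + 1 + 1/2 + 1/2) / 4
print(Eglob_x)                       # 0.75

# averaging the nodal values recovers networkx's global efficiency
nodal = []
for u in G.nodes():
    D = nx.single_source_shortest_path_length(G, u)
    nodal.append(sum(1.0 / d for v, d in D.items() if v != u) / (len(G) - 1))
print(np.isclose(np.mean(nodal), nx.global_efficiency(G)))
```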
<Efficiency.py>
import networkx as nx
import matplotlib.pyplot as plt
import numpy as np
###### Network construction functions, based on correlation matrix
def net_builder_HardTh(R, NodeInd, K, cType=1):
    '''
    A function to construct a network by hard-thresholding.
    input parameters:
          R:        A dense correlation matrix array.
          NodeInd:  A list of nodes in the network.
          K:        The target K, the average number of connections at each node.
          cType:    Type of functional connectivity.
                       1:  Positive correlations only
                       0:  Both positive and negative correlations
                      -1:  Negative correlations only
                    The default is 1 (i.e., positive correlations only).
    returns:
          G:        The resulting graph (networkx format).
    '''
    # first, initialize the graph
    G = nx.Graph()
    G.add_nodes_from(NodeInd)
    NNodes = R.shape[0]
    # upper triangle of the correlation matrix only
    I,J = np.triu_indices(NNodes,1)
    # creating a vector of correlation coefficients, depending on cType
    if cType==1:
        VecR = np.squeeze(np.array(R[I,J]))
    elif cType==0:
        VecR = np.squeeze(np.array(abs(R[I,J])))
    elif cType==-1:
        VecR = np.squeeze(np.array(-R[I,J]))
    # the number of elements is too big, so we truncate it
    # first, find the appropriate threshold for R
    NthR = 0
    tmpRth = 0.95
    StepTh = 0.05
    while NthR<K*NNodes/2.0:
        tmpRth -= StepTh
        #print('Threshold = %.2f' % tmpRth)
        NthR = len(np.nonzero(VecR>tmpRth)[0])
    # second, truncate the variables
    IndVecR = np.nonzero(VecR>tmpRth)
    thVecR = VecR[IndVecR]
    thI = I[IndVecR]
    thJ = J[IndVecR]
    # sort the correlation values
    zipR = zip(thVecR, thI, thJ)
    zipsR = sorted(zipR, key = lambda t: t[0], reverse=True)
    sVecR, sI, sJ = zip(*zipsR)
    # then adding edges
    m = int(np.ceil(K*NNodes/2.0))  # the number of edges
    trI = np.array(sI[:m])
    trJ = np.array(sJ[:m])
    Elist = np.vstack((NodeInd[trI], NodeInd[trJ])).T
    G.add_edges_from(Elist)
    RTh = sVecR[m-1]  # the resulting threshold (not returned)
    # finally, returning the resultant graph
    return G
####### functions to calculate nodal local efficiencies
def eloc_node(G, xNode):
    '''
    A function to calculate the nodal local efficiency
    at a node xNode.
    input parameters:
          G:      A graph in networkx format.
          xNode:  The node where the nodal local efficiency is calculated.
    returns:
          Eloc:   The nodal local efficiency at node xNode.
    '''
    subG = subgraph(G, xNode)
    NNodes = len(subG.nodes())
    if NNodes>1:
        Dij = nx.floyd_warshall(subG)
        D = [Dij[i].values() for i in subG.nodes()]
        cD = []
        for irow in D:
            cD += irow
        NZD = np.array(cD)[np.nonzero(cD)]
        if len(NZD)>0:
            Eloc = (1.0/(NNodes*(NNodes-1.0))) * np.sum(1.0/NZD)
        else:
            Eloc = 0
    else:
        Eloc = 0
    return Eloc
def subgraph(G, xNode):
    '''
    A function to extract the subgraph induced by the neighbors of a node xNode.
    input parameters:
          G:      A graph in networkx format.
          xNode:  The node whose neighbors form the subgraph.
    returns:
          subG:   A subgraph of G, containing the neighbors of xNode but not
                  xNode itself.
    '''
    subNodes = list(nx.all_neighbors(G, xNode))
    Edges = G.edges()
    subEdges = []   # list of subgraph edges
    for iEdge in Edges:
        if (iEdge[0] in subNodes and iEdge[1] in subNodes):
            subEdges.append(iEdge)
    subG = nx.Graph()                # create the subgraph
    subG.add_nodes_from(subNodes)    # populate it with nodes
    subG.add_edges_from(subEdges)    # populate it with edges
    return subG
def eglob_node(G, xNode):
    '''
    A function to calculate the nodal global efficiency
    at a node.
    input parameters:
          G:      A graph in networkx format.
          xNode:  The node where the nodal global efficiency is calculated.
    returns:
          Eglob:  The nodal global efficiency at xNode.
    '''
    NNodes = len(G.nodes())
    Dx = list(nx.single_source_shortest_path_length(G, xNode).values())
    indZ = np.nonzero(np.array(Dx)==0)[0]
    nzDx = np.delete(Dx, indZ)
    if len(nzDx)>0:
        Eglob = (1.0/(NNodes-1.0)) * np.sum(1.0/nzDx)
    else:
        Eglob = 0
    # returning the nodal global efficiency
    return Eglob
##### Parameters
targetDeg = 20
##### Loading the data
infile = np.load('DataDynamicConn/Leiden_sub39335_Rt2_K200_Rmat.npz')
Rmat = infile['Rmat']
nodes = infile['nodes']
xyz = infile['xyz']
Here is a loop to construct a network and calculate nodal local and global efficiencies at each time point.
###### Loop over time points
ElocMat = np.zeros((len(nodes),Rmat.shape[0]))
EglobMat = np.zeros((len(nodes),Rmat.shape[0]))
for iTime in range(Rmat.shape[0]):
    print('Working on time point %d' % iTime)
    ###### extracting a network
    G = net_builder_HardTh(Rmat[iTime,:,:], nodes, targetDeg)
    ###### calculating local and global efficiency
    ElocList = []
    EglobList = []
    for iNode in nodes:
        Eloc = eloc_node(G,iNode)
        ElocList.append(Eloc)
        Eglob = eglob_node(G,iNode)
        EglobList.append(Eglob)
    # saving them for later
    ElocMat[:,iTime] = ElocList
    EglobMat[:,iTime] = EglobList
##### saving to a file so that we don't have to recalculate
np.savez('DataDynamicConn/Leiden_sub39335_Rt2_K200_Efficiency.npz',
         ElocMat = ElocMat,
         EglobMat = EglobMat,
         nodes = nodes,
         xyz = xyz)
But it takes a few minutes to run, so I just load the pre-cooked results.
##### loading from the file to save time
f = np.load('DataDynamicConn/Leiden_sub39335_Rt2_K200_Efficiency.npz')
ElocMat = f['ElocMat']
EglobMat = f['EglobMat']
nodes = f['nodes']
xyz = f['xyz']
And let's plot the efficiencies over time.
##### plotting efficiency over time
plt.figure(figsize=[9,5])
plt.subplot(121)
plt.imshow(ElocMat, cmap=plt.cm.rainbow)
plt.title('Local efficiency')
plt.xlabel('Time')
plt.ylabel('Nodes')
plt.colorbar()
plt.subplot(122)
plt.imshow(EglobMat, cmap=plt.cm.rainbow)
plt.title('Global efficiency')
plt.xlabel('Time')
plt.ylabel('Nodes')
plt.colorbar()
plt.show()
Here, rows represent different nodes, and columns represent time. As you can see, there are some time points where efficiencies change dramatically (manifesting as a vertical edge, more apparent in global efficiency).
So there seem to be system-wide changes occurring at different points during this experiment. In fact, the so-called resting state is made up of distinct states, each with a similar connectivity pattern. These states are discrete, and the brain transitions from one state to another within a short period of time.
We can identify different brain states by K-means clustering, an unsupervised machine learning method.
K-means clustering is a machine learning method that groups observations into K distinct groups according to their similarities. Here, we cluster time points into different clusters of similar connectivity patterns. To do so, we re-organize the correlation matrix at each time point into a single vector.
<Clustering.py>
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
####### Loading the data
infile = np.load('DataDynamicConn/Leiden_sub39335_Rt2_K200_Rmat.npz')
Rmat = infile['Rmat']
nNodes = Rmat.shape[-1]
nTime = Rmat.shape[0]
indR = np.triu_indices(nNodes,1) # indices for the upper triangle of R
# initializing the data array, rows = correlation, cols = time
Rdata = np.zeros((nTime,len(indR[0])))
for iTime in range(nTime):
    R = Rmat[iTime,:,:]
    Rdata[iTime,:] = R[indR]
One downside of K-means clustering is that you have to know K, the number of clusters, before you run the clustering algorithm. If K is unknown, you have to estimate it. One way to do this is to generate a Scree plot: we perform K-means clustering with different values of K, then plot the sum of squared distances between the cluster centroids and the observations (known as the inertia) against K.
###### Clustering --- Figuring out the number of clusters
# determining the number of clusters (up to 30 clusters)
SSE = []
for iClus in range(1,31):
    #print('Number of clusters: %d' % iClus)
    # K-means clustering
    km = KMeans(n_clusters=iClus)  # K-means with a given number of clusters
    km.fit(Rdata)                  # fitting the correlation data
    SSE.append(km.inertia_)        # recording the sum of squared distances
# plotting the sum of square distance
plt.plot(np.arange(1,31),SSE,marker = "o")
plt.xlabel('Number of clusters')
plt.ylabel('Sum of sq distances')
plt.show()
Once you have a Scree plot, you look for an elbow. The elbow is defined as the point where the rate of decrease stabilizes. This criterion is somewhat subjective. In our case, we will go with K=6 as the elbow.
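Because the elbow choice is subjective, one simple heuristic (not used in these notes, and shown here with made-up inertia values) is to pick the K where the discrete curvature of the inertia curve, i.e. its second difference, is largest:

```python
import numpy as np

# made-up inertia values for K = 1..10 (illustrative only)
SSE = np.array([500., 300., 200., 150., 130., 120., 115., 112., 110., 109.])
second_diff = SSE[:-2] - 2 * SSE[1:-1] + SSE[2:]   # discrete curvature
elbow_K = int(np.argmax(second_diff)) + 2          # second differences start at K = 2
print(elbow_K)   # 2 for these toy values
```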
###### Clustering --- with K=6
km = KMeans(n_clusters=6) # defining the clustering object
km.fit(Rdata) # actually fitting the data
y_clus = km.labels_ # clustering info resulting from K-means
y_cent = km.cluster_centers_ # centroid coordinates
We have the cluster assignments stored in an array y_clus. We can plot y_clus and the global efficiency over time.
####### plotting cluster over time
f = np.load('DataDynamicConn/Leiden_sub39335_Rt2_K200_Efficiency.npz')
EglobMat = f['EglobMat']
xyz = f['xyz']
nodes = f['nodes']
plt.figure(figsize=[4,7])
plt.subplot(211)
plt.plot(y_clus)
plt.title('States over time')
plt.xlabel('Time')
plt.ylabel('State')
plt.xlim(1,nTime)
plt.subplot(212)
plt.imshow(EglobMat, cmap=plt.cm.rainbow, aspect='auto')
plt.title('Global efficiency')
plt.xlabel('Time')
plt.ylabel('Nodes')
plt.xlim(1,nTime)
plt.subplots_adjust(left=0.15, right=0.975, top=0.95, bottom=0.075,
                    hspace=0.4)
plt.show()
As you may notice, state changes correspond to vertical edges in the global efficiency plot.
We shall save the cluster assignment, as well as the cluster centroids (y_cent), for later use.
####### Saving the centroid information
np.savez('DataDynamicConn/Leiden_sub39335_Rt2_K200_Cluster.npz',
         y_clus = y_clus,
         y_cent = y_cent,
         nodes = nodes,
         xyz = xyz)
y_cent.shape
(6, 17578)
The centroid of each cluster from K-means clustering represents the connectivity pattern of that particular brain state. We can transform each centroid back into a connectivity (correlation) matrix, generate a network representative of that state, and determine the modular organization of that state.
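A minimal sketch (toy 4-node example) of the vector-to-matrix round trip this involves: the centroid stores only the upper triangle, which can be written back into a symmetric matrix.

```python
import numpy as np

n = 4
ind = np.triu_indices(n, 1)              # upper-triangle indices (6 pairs)
v = np.array([.1, .2, .3, .4, .5, .6])   # toy "centroid" vector
R = np.zeros((n, n))
R[ind] = v                               # fill the upper triangle
R = R + R.T                              # symmetrize; the diagonal stays 0
print(np.allclose(R, R.T), np.allclose(R[ind], v))
```

The actual script fills both triangles via a transpose-and-reassign; that is equivalent to R + R.T here because the diagonal is zero.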
<EigenModules.py>
import numpy as np
import matplotlib.pyplot as plt
import networkx as nx
import community # Louvain method
from sklearn.metrics import adjusted_rand_score
##### Custom distinct color function --- to be used later
def get_cmap(n, name='hsv'):
    '''Returns a function that maps each index in 0, 1, ..., n-1 to a distinct
    RGB color; the keyword argument name must be a standard mpl colormap name.'''
    return plt.cm.get_cmap(name, n)
###### Network construction functions, based on correlation matrix
def net_builder_HardTh(R, NodeInd, K, cType=1):
    '''
    A function to construct a network by hard-thresholding.
    input parameters:
          R:        A dense correlation matrix array.
          NodeInd:  A list of nodes in the network.
          K:        The target K, the average number of connections at each node.
          cType:    Type of functional connectivity.
                       1:  Positive correlations only
                       0:  Both positive and negative correlations
                      -1:  Negative correlations only
                    The default is 1 (i.e., positive correlations only).
    returns:
          G:        The resulting graph (networkx format).
    '''
    # first, initialize the graph
    G = nx.Graph()
    G.add_nodes_from(NodeInd)
    NNodes = R.shape[0]
    # upper triangle of the correlation matrix only
    I,J = np.triu_indices(NNodes,1)
    # creating a vector of correlation coefficients, depending on cType
    if cType==1:
        VecR = np.squeeze(np.array(R[I,J]))
    elif cType==0:
        VecR = np.squeeze(np.array(abs(R[I,J])))
    elif cType==-1:
        VecR = np.squeeze(np.array(-R[I,J]))
    # the number of elements is too big, so we truncate it
    # first, find the appropriate threshold for R
    NthR = 0
    tmpRth = 0.95
    StepTh = 0.05
    while NthR<K*NNodes/2.0:
        tmpRth -= StepTh
        #print('Threshold = %.2f' % tmpRth)
        NthR = len(np.nonzero(VecR>tmpRth)[0])
    # second, truncate the variables
    IndVecR = np.nonzero(VecR>tmpRth)
    thVecR = VecR[IndVecR]
    thI = I[IndVecR]
    thJ = J[IndVecR]
    # sort the correlation values
    zipR = zip(thVecR, thI, thJ)
    zipsR = sorted(zipR, key = lambda t: t[0], reverse=True)
    sVecR, sI, sJ = zip(*zipsR)
    # then adding edges
    m = int(np.ceil(K*NNodes/2.0))  # the number of edges
    trI = np.array(sI[:m])
    trJ = np.array(sJ[:m])
    Elist = np.vstack((NodeInd[trI], NodeInd[trJ])).T
    G.add_edges_from(Elist)
    RTh = sVecR[m-1]  # the resulting threshold (not returned)
    # finally, returning the resultant graph
    return G
####### Loading the cluster data
infile = np.load('DataDynamicConn/Leiden_sub39335_Rt2_K200_Cluster.npz')
y_clus = infile['y_clus']
y_cent = infile['y_cent']
xyz = infile['xyz']
nodes = infile['nodes']
nNode = len(nodes)
nClus = max(y_clus) + 1
####### modules for centroid connectivity
targetDeg = 20
partition_list = []
GC_list = []
for i in np.unique(y_clus):
    # reconstructing the connectivity matrix for the centroid
    R = np.zeros((nNode,nNode))
    indR = np.triu_indices(nNode,1)
    R[indR] = y_cent[i]
    R = R.T
    R[indR] = y_cent[i]
    # constructing a network based on the centroid connectivity
    G = net_builder_HardTh(R, nodes, targetDeg)
    # finding the giant component
    GC_nodes = max(nx.connected_components(G), key=len)
    GC = G.subgraph(GC_nodes)
    GC_list.append(GC)   # keep the giant component (the partition below is defined on it)
    ###### modular partition by Louvain
    partition = community.best_partition(GC)
    partition_list.append(partition)
###### drawing the graph
# dictionary of xy-coordinates
pos = {}
for iROI in range(len(nodes)):
    pos[nodes[iROI]] = xyz[iROI,:2]
# Loop over states for visualization
plt.figure(figsize=[9,7])
for iState in np.unique(y_clus):
    # finally, graph with communities in different colors (Louvain)
    plt.subplot(2,3,iState+1)
    nComm = max([comm for comm in partition_list[iState].values()]) + 1
    node_color_list = get_cmap(nComm+1,'rainbow')
    for iComm in range(nComm):
        nodeList = [iNode for iNode,Comm in partition_list[iState].items()
                    if Comm==iComm]
        nx.draw_networkx_nodes(GC_list[iState], pos,
                               nodelist=nodeList,
                               node_color=np.array([node_color_list(iComm)]),
                               node_size=30)
    nx.draw_networkx_edges(GC_list[iState], pos, width=0.5,
                           edge_color='lightblue')
    plt.title('State %d' % iState)
    plt.axis('off')
plt.subplots_adjust(left=0.01, right=0.99, wspace=0.05,
                    bottom=0.025, top=0.925)
plt.show()
As you can see, the modular organization is different under different states.
Modularity for different states. Calculate the modularity measure for the modular partition for the six states. (Post both code & numbers).
for iState in range(len(partition_list)):
    print('State %d:' % iState +
          '\t\tModularity %6.4f' % community.modularity(partition_list[iState],
                                                        GC_list[iState]))
State 0:		Modularity 0.5000
State 1:		Modularity 0.4733
State 2:		Modularity 0.4896
State 3:		Modularity 0.4109
State 4:		Modularity 0.4884
State 5:		Modularity 0.4631
Allen et al. 2014. Tracking Whole-Brain Connectivity Dynamics in the Resting State. Cerebral Cortex 24:663–676
Preti et al. 2017. The dynamic functional connectome: State-of-the-art and perspective. NeuroImage 160:41–54
Zalesky et al. 2014. Time-resolved resting-state brain networks. PNAS 111:10341–10346