(Modified from https://github.com/oercompbiomed/CBM101/tree/master/D_Network_analysis by Peder Lillebostad)
Performing statistical tests on brain networks.
Data and software availability (layer fMRI):
Anonymized MRI data that are presented in this article can be anonymously downloaded from OpenNeuro. Volume maps of data presented in Fig. 2 can be downloaded at http://doi.org/10.18112/openneuro.ds002274.v1.0.2 (go to actualfiles/V1_LAYERING). The raw and processed data of multiple participants of the study shown in Fig. 4, Fig. 6 are available here: www.doi.org/10.18112/openneuro.ds001547.v1.1.0. Data shown in Fig. 5, Fig. 8 are publicly available from previous published studies (Huber et al., 2017). Underlying tables and lists used for Fig. 10 are available on https://layerfmri.com/VASOworldwide. All custom written software (source code) and evaluation scripts are available on Github (https://github.com/layerfMRI/repository). The authors are happy to share the MAGEC-VASO 3D-EPI MR sequence upon request via a SIEMENS C2P agreement. A complete list of scan parameters used in this study is available on Github (https://github.com/layerfMRI/Sequence_Github). The source code of the layer-specific analysis software is available on Github (https://github.com/layerfMRI/LAYNII) with analysis pipelines explained on www.layerfmri.com. All stimulation presentations were implemented in Psychopy and can be downloaded here: https://github.com/layerfMRI/Phychopy_git.
Unfortunately for legal reasons, in order to complete this notebook you must gain access to make an account on https://db.humanconnectome.org/ to access the data. This requires IBM Aspera Connect
, a plugin to access the data (you will be prompted by a pop-up from the website).
https://db.humanconnectome.org/data/projects/HCP_1200#HcpOpen
Go under HCP1200 Parcellation+Timeseries+Netmats (PTN) and download the 10 GB 812 Subjects, recon r227 only, PTN Release
zip file.
Under Quick Downloads (left hand side) download the Behavioural data (csv file).
- Once downloaded, unzip the folder.
- untar the move the file
netmats_3T_HCP1200_MSMAll_ICAd100_ts2.tar.gz
in git bash usingtar -xzvf <filename>
- this creates a directory
netmats
. Open this and select the file netmats1.txt and move it to the data subdirectory and rename it toICA_d100_netmats.txt
- rename the behavioural data to
behavioural.csv
.- Move all the relevant files plus
subjectsID_recon2.txt
to the data subdirectory.
Network-based statistic is a non-parametric permutation test used to assess whether the connectomes of two populations are different.
What all statistical tests have in common is that they calculate some number (a test statistic) from the data, and compare it to the null distribution. If the test statistic deviates more from this than what we can reasonably expect under the null hypothesis, we say there is an effect.
The only difference in a permutation test is that we don't need to make assumptions about the null distribution (i.e. non-parametric), but we find it using random assignments of the labels. The basic steps is to compute the test-statistic, then randomly shuffle the labels (e.g. sick versus patient) and recompute the statistic. You repeat this reshuffling/recomputing a large number of times to get the null distribution. If the first ("real") test-statistic significantly differs from the expected null distribution, we likely have a population difference.
The steps of NBS are straight forward:
For each edge of the network, perform a statistical test (two-sided t-test) for whether it is different between the two populations. This yields a single number for each edge.
Construct a pseudo-network where the above t-stats are used as the edges of the pseudo-network. The resulting graph can be thought of the "difference network" between the two populations.
Choose a threshold (usually you try multiple) and discard all the weights weaker than this threshold in the pseudo-network. This results in a fragmented network with components of variable size. Identify the largest connected component and write down its size (number of edges).
Identify the LARGEST connected component and write down its size (number of edges). This is your test-statistic.
Randomly reassign the participants, and repeat step 1-3 K times (usually K=1000). Check for significance.
This image reviews the steps in a single iteration to compute the test statistic.
bct-cpp is a C++ port of the Brain Connectivity Toolbox (BCT), a set of MATLAB functions and neuroanatomical data sets useful in the analysis of structural or functional brain networks.
%reload_ext autoreload
%autoreload 2
%matplotlib inline
import sys
import os
import bct
import numpy as np
import pandas as pd
import networkx as nx
from bct import nbs
from matplotlib import pyplot as plt
from numpy.linalg import norm
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import MinMaxScaler as scaler
from nilearn.plotting import plot_matrix
sys.executable
'/Users/arvid/opt/anaconda3/envs/bmed360v2021/bin/python'
NBS is implemented in the Brain Connectivity Toolbox for Python bctpy
. If you don't have it already installed, uncomment and run the cell below.
import bct
from bct import nbs
#!{sys.executable} -m pip install bctpy
#verify installation
!conda list bctpy
# packages in environment at /Users/arvid/opt/anaconda3/envs/bmed360v2021: # # Name Version Build Channel bctpy 0.5.2 pypi_0 pypi
all_subjects = np.random.rand(15,15,50) # N=50, nodes=15
group1 = all_subjects[:,:,:25]
group2 = all_subjects[:,:,25:]
sub0 = group1[:,:,0]
plt.imshow(sub0, cmap=plt.cm.Reds, vmin=0, vmax=1)
plt.colorbar()
<matplotlib.colorbar.Colorbar at 0x7ffc595bd8e0>
Since we drew both groups out of the same population all_subjects
, there should be no group difference between the two.
def do_nbs(g1, g2, kappa=0.8, plot=True, k=1000):
pval, adj, null = nbs.nbs_bct(g1, g2, kappa, k=k)
if plot:
plt.hist(null, bins=20); plt.show()
return pval, adj, null
pval, adj, null = do_nbs(group1, group2, plot=False)
max component size is 47 estimating null distribution with 1000 permutations permutation 0 of 1000. p-value so far is 1.000 permutation 100 of 1000. p-value so far is 0.297 permutation 200 of 1000. p-value so far is 0.353 permutation 300 of 1000. p-value so far is 0.346 permutation 400 of 1000. p-value so far is 0.347 permutation 500 of 1000. p-value so far is 0.327 permutation 600 of 1000. p-value so far is 0.326 permutation 700 of 1000. p-value so far is 0.321 permutation 800 of 1000. p-value so far is 0.322 permutation 900 of 1000. p-value so far is 0.333 permutation 999 of 1000. p-value so far is 0.337
plt.hist(null, bins=20)
plt.axvline(41, c='red', )
<matplotlib.lines.Line2D at 0x7ffc48bb78b0>
Conclusion: the test statistic is well within the expected range under the null hypothesis, so we have no reason to assume a group difference. This is expected because we drew both groups from the same simulated population. Next we can use a real dataset from the Human Connectome Project (HCP1200) with some behavioural data to see if we have a difference based on certain traits.
We start by loading the behavioural data (metadata from which we can test variables of interest and filter based on exclusion criteria).
behav = pd.read_csv('data/behavioural.csv')
behav.head()
Subject | Release | Acquisition | Gender | Age | 3T_Full_MR_Compl | T1_Count | T2_Count | 3T_RS-fMRI_Count | 3T_RS-fMRI_PctCompl | ... | Noise_Comp | Odor_Unadj | Odor_AgeAdj | PainIntens_RawScore | PainInterf_Tscore | Taste_Unadj | Taste_AgeAdj | Mars_Log_Score | Mars_Errs | Mars_Final | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 100004 | S900 | Q06 | M | 22-25 | False | 0 | 0 | 0 | 0.0 | ... | 5.2 | 101.12 | 86.45 | 2.0 | 45.9 | 107.17 | 105.31 | 1.80 | 0.0 | 1.80 |
1 | 100206 | S900 | Q11 | M | 26-30 | True | 1 | 1 | 4 | 100.0 | ... | 6.0 | 108.79 | 97.19 | 1.0 | 49.7 | 72.63 | 72.03 | 1.84 | 0.0 | 1.84 |
2 | 100307 | Q1 | Q01 | F | 26-30 | True | 1 | 1 | 4 | 100.0 | ... | 3.6 | 101.12 | 86.45 | 0.0 | 38.6 | 71.69 | 71.76 | 1.76 | 0.0 | 1.76 |
3 | 100408 | Q3 | Q03 | M | 31-35 | True | 1 | 1 | 4 | 100.0 | ... | 2.0 | 108.79 | 98.04 | 2.0 | 52.6 | 114.01 | 113.59 | 1.76 | 2.0 | 1.68 |
4 | 100610 | S900 | Q08 | M | 26-30 | True | 2 | 1 | 4 | 100.0 | ... | 2.0 | 122.25 | 110.45 | 0.0 | 38.6 | 84.84 | 85.31 | 1.92 | 1.0 | 1.88 |
5 rows × 582 columns
for col in behav.columns: print(col)
Subject Release Acquisition Gender Age 3T_Full_MR_Compl T1_Count T2_Count 3T_RS-fMRI_Count 3T_RS-fMRI_PctCompl 3T_Full_Task_fMRI 3T_tMRI_PctCompl fMRI_WM_PctCompl fMRI_Gamb_PctCompl fMRI_Mot_PctCompl fMRI_Lang_PctCompl fMRI_Soc_PctCompl fMRI_Rel_PctCompl fMRI_Emo_PctCompl 3T_dMRI_Compl 3T_dMRI_PctCompl dMRI_3T_ReconVrs fMRI_3T_ReconVrs 7T_Full_MR_Compl 7T_RS-fMRI_Count 7T_RS-fMRI_PctCompl 7T_Full_Task_fMRI 7T_tMRI_PctCompl fMRI_Movie_Compl fMRI_Movie_PctCompl fMRI_Ret_Compl fMRI_Ret_PctCompl 7T_dMRI_Compl 7T_dMRI_PctCompl 7T_fMRI_Mov_Vrs MEG_AnyData MEG_FullProt_Compl MEG_HeadModel_Avail MEG_CortRibn_Avail MEG_Anatomy_Avail MEG_Anatomy_Compl MEG_Noise_Avail MEG_Noise_Compl MEG_RS_Avail MEG_RS_Compl MEG_WM_Avail MEG_WM_Compl MEG_StoryMath_Avail MEG_StoryMath_Compl MEG_Motor_Avail MEG_Motor_Compl Non-TB_Compl VisProc_Compl DelDisc_Compl SCPT_Compl IWRD_Compl PMAT_Compl VSPLOT_Compl EmoRecog_Compl NEO-FFI_Compl ASR-Syn_Compl ASR-DSM_Compl Toolbox_Compl MMSE_Compl PSQI_Compl Alert_Compl ASQ_Compl FamPsychNeuro_Compl SSAGA_Compl SSAGA_Demo_Compl SSAGA_Mental_Compl SSAGA_Alc_Compl SSAGA_Illicit_Compl SSAGA_Tob_Compl SSAGA_Mj_Compl QC_Issue MRsession_Scanner_3T MRsession_Scans_3T MRsession_Label_3T MRsession_Scanner_7T MRsession_Scans_7T MRsession_Label_7T MEGsession_Scanner MEGsession_Scans MEGsession_Label Alpha_Peak Beta_Peak MMSE_Score PSQI_Score PSQI_Comp1 PSQI_Comp2 PSQI_Comp3 PSQI_Comp4 PSQI_Comp5 PSQI_Comp6 PSQI_Comp7 PSQI_BedTime PSQI_Min2Asleep PSQI_GetUpTime PSQI_AmtSleep PSQI_Latency30Min PSQI_WakeUp PSQI_Bathroom PSQI_Breathe PSQI_Snore PSQI_TooCold PSQI_TooHot PSQI_BadDream PSQI_Pain PSQI_Other PSQI_Quality PSQI_SleepMeds PSQI_DayStayAwake PSQI_DayEnthusiasm PSQI_BedPtnrRmate PicSeq_Unadj PicSeq_AgeAdj CardSort_Unadj CardSort_AgeAdj Flanker_Unadj Flanker_AgeAdj PMAT24_A_CR PMAT24_A_SI PMAT24_A_RTCR ReadEng_Unadj ReadEng_AgeAdj PicVocab_Unadj PicVocab_AgeAdj ProcSpeed_Unadj ProcSpeed_AgeAdj DDisc_SV_1mo_200 DDisc_SV_6mo_200 DDisc_SV_1yr_200 DDisc_SV_3yr_200 DDisc_SV_5yr_200 DDisc_SV_10yr_200 DDisc_SV_1mo_40K DDisc_SV_6mo_40K DDisc_SV_1yr_40K DDisc_SV_3yr_40K DDisc_SV_5yr_40K DDisc_SV_10yr_40K DDisc_AUC_200 DDisc_AUC_40K VSPLOT_TC VSPLOT_CRTE VSPLOT_OFF SCPT_TP SCPT_TN SCPT_FP SCPT_FN SCPT_TPRT SCPT_SEN SCPT_SPEC SCPT_LRNR IWRD_TOT IWRD_RTC ListSort_Unadj ListSort_AgeAdj CogFluidComp_Unadj CogFluidComp_AgeAdj CogEarlyComp_Unadj CogEarlyComp_AgeAdj CogTotalComp_Unadj CogTotalComp_AgeAdj CogCrystalComp_Unadj CogCrystalComp_AgeAdj ER40_CR ER40_CRT ER40ANG ER40FEAR ER40HAP ER40NOE ER40SAD AngAffect_Unadj AngHostil_Unadj AngAggr_Unadj FearAffect_Unadj FearSomat_Unadj Sadness_Unadj LifeSatisf_Unadj MeanPurp_Unadj PosAffect_Unadj Friendship_Unadj Loneliness_Unadj PercHostil_Unadj PercReject_Unadj EmotSupp_Unadj InstruSupp_Unadj PercStress_Unadj SelfEff_Unadj FS_IntraCranial_Vol FS_BrainSeg_Vol FS_BrainSeg_Vol_No_Vent FS_BrainSeg_Vol_No_Vent_Surf FS_LCort_GM_Vol FS_RCort_GM_Vol FS_TotCort_GM_Vol FS_SubCort_GM_Vol FS_Total_GM_Vol FS_SupraTentorial_Vol FS_L_WM_Vol FS_R_WM_Vol FS_Tot_WM_Vol FS_Mask_Vol FS_BrainSegVol_eTIV_Ratio FS_MaskVol_eTIV_Ratio FS_LH_Defect_Holes FS_RH_Defect_Holes FS_Total_Defect_Holes FS_L_LatVent_Vol FS_L_InfLatVent_Vol FS_L_Cerebellum_WM_Vol FS_L_Cerebellum_Cort_Vol FS_L_ThalamusProper_Vol FS_L_Caudate_Vol FS_L_Putamen_Vol FS_L_Pallidum_Vol FS_3rdVent_Vol FS_4thVent_Vol FS_BrainStem_Vol FS_L_Hippo_Vol FS_L_Amygdala_Vol FS_CSF_Vol FS_L_AccumbensArea_Vol FS_L_VentDC_Vol FS_L_Vessel_Vol FS_L_ChoroidPlexus_Vol FS_R_LatVent_Vol FS_R_InfLatVent_Vol FS_R_Cerebellum_WM_Vol FS_R_Cerebellum_Cort_Vol FS_R_ThalamusProper_Vol FS_R_Caudate_Vol FS_R_Putamen_Vol FS_R_Pallidum_Vol FS_R_Hippo_Vol FS_R_Amygdala_Vol FS_R_AccumbensArea_Vol FS_R_VentDC_Vol FS_R_Vessel_Vol FS_R_ChoroidPlexus_Vol FS_5thVent_Vol FS_WM_Hypointens_Vol FS_L_WM_Hypointens_Vol FS_R_WM_Hypointens_Vol FS_L_Non-WM_Hypointens_Vol FS_R_Non-WM_Hypointens_Vol FS_OpticChiasm_Vol FS_CC_Posterior_Vol FS_CC_MidPosterior_Vol FS_CC_Central_Vol FS_CC_MidAnterior_Vol FS_CC_Anterior_Vol FS_L_Bankssts_Thck FS_L_Caudalanteriorcingulate_Thck FS_L_Caudalmiddlefrontal_Thck FS_L_Cuneus_Thck FS_L_Entorhinal_Thck FS_L_Fusiform_Thck FS_L_Inferiorparietal_Thck FS_L_Inferiortemporal_Thck FS_L_Isthmuscingulate_Thck FS_L_Lateraloccipital_Thck FS_L_Lateralorbitofrontal_Thck FS_L_Lingual_Thck FS_L_Medialorbitofrontal_Thck FS_L_Middletemporal_Thck FS_L_Parahippocampal_Thck FS_L_Paracentral_Thck FS_L_Parsopercularis_Thck FS_L_Parsorbitalis_Thck FS_L_Parstriangularis_Thck FS_L_Pericalcarine_Thck FS_L_Postcentral_Thck FS_L_Posteriorcingulate_Thck FS_L_Precentral_Thck FS_L_Precuneus_Thck FS_L_Rostralanteriorcingulate_Thck FS_L_Rostralmiddlefrontal_Thck FS_L_Superiorfrontal_Thck FS_L_Superiorparietal_Thck FS_L_Superiortemporal_Thck FS_L_Supramarginal_Thck FS_L_Frontalpole_Thck FS_L_Temporalpole_Thck FS_L_Transversetemporal_Thck FS_L_Insula_Thck FS_R_Bankssts_Thck FS_R_Caudalanteriorcingulate_Thck FS_R_Caudalmiddlefrontal_Thck FS_R_Cuneus_Thck FS_R_Entorhinal_Thck FS_R_Fusiform_Thck FS_R_Inferiorparietal_Thck FS_R_Inferiortemporal_Thck FS_R_Isthmuscingulate_Thck FS_R_Lateraloccipital_Thck FS_R_Lateralorbitofrontal_Thck FS_R_Lingual_Thck FS_R_Medialorbitofrontal_Thck FS_R_Middletemporal_Thck FS_R_Parahippocampal_Thck FS_R_Paracentral_Thck FS_R_Parsopercularis_Thck FS_R_Parsorbitalis_Thck FS_R_Parstriangularis_Thck FS_R_Pericalcarine_Thck FS_R_Postcentral_Thck FS_R_Posteriorcingulate_Thck FS_R_Precentral_Thck FS_R_Precuneus_Thck FS_R_Rostralanteriorcingulate_Thck FS_R_Rostralmiddlefrontal_Thck FS_R_Superiorfrontal_Thck FS_R_Superiorparietal_Thck FS_R_Superiortemporal_Thck FS_R_Supramarginal_Thck FS_R_Frontalpole_Thck FS_R_Temporalpole_Thck FS_R_Transversetemporal_Thck FS_R_Insula_Thck FS_L_Bankssts_Area FS_L_Caudalanteriorcingulate_Area FS_L_Caudalmiddlefrontal_Area FS_L_Cuneus_Area FS_L_Entorhinal_Area FS_L_Fusiform_Area FS_L_Inferiorparietal_Area FS_L_Inferiortemporal_Area FS_L_Isthmuscingulate_Area FS_L_Lateraloccipital_Area FS_L_Lateralorbitofrontal_Area FS_L_Lingual_Area FS_L_Medialorbitofrontal_Area FS_L_Middletemporal_Area FS_L_Parahippocampal_Area FS_L_Paracentral_Area FS_L_Parsopercularis_Area FS_L_Parsorbitalis_Area FS_L_Parstriangularis_Area FS_L_Pericalcarine_Area FS_L_Postcentral_Area FS_L_Posteriorcingulate_Area FS_L_Precentral_Area FS_L_Precuneus_Area FS_L_Rostralanteriorcingulate_Area FS_L_Rostralmiddlefrontal_Area FS_L_Superiorfrontal_Area FS_L_Superiorparietal_Area FS_L_Superiortemporal_Area FS_L_Supramarginal_Area FS_L_Frontalpole_Area FS_L_Temporalpole_Area FS_L_Transversetemporal_Area FS_L_Insula_Area FS_R_Bankssts_Area FS_R_Caudalanteriorcingulate_Area FS_R_Caudalmiddlefrontal_Area FS_R_Cuneus_Area FS_R_Entorhinal_Area FS_R_Fusiform_Area FS_R_Inferiorparietal_Area FS_R_Inferiortemporal_Area FS_R_Isthmuscingulate_Area FS_R_Lateraloccipital_Area FS_R_Lateralorbitofrontal_Area FS_R_Lingual_Area FS_R_Medialorbitofrontal_Area FS_R_Middletemporal_Area FS_R_Parahippocampal_Area FS_R_Paracentral_Area FS_R_Parsopercularis_Area FS_R_Parsorbitalis_Area FS_R_Parstriangularis_Area FS_R_Pericalcarine_Area FS_R_Postcentral_Area FS_R_Posteriorcingulate_Area FS_R_Precentral_Area FS_R_Precuneus_Area FS_R_Rostralanteriorcingulate_Area FS_R_Rostralmiddlefrontal_Area FS_R_Superiorfrontal_Area FS_R_Superiorparietal_Area FS_R_Superiortemporal_Area FS_R_Supramarginal_Area FS_R_Frontalpole_Area FS_R_Temporalpole_Area FS_R_Transversetemporal_Area FS_R_Insula_Area Emotion_Task_Acc Emotion_Task_Median_RT Emotion_Task_Face_Acc Emotion_Task_Face_Median_RT Emotion_Task_Shape_Acc Emotion_Task_Shape_Median_RT Gambling_Task_Perc_Larger Gambling_Task_Perc_Smaller Gambling_Task_Perc_NLR Gambling_Task_Median_RT_Larger Gambling_Task_Median_RT_Smaller Gambling_Task_Reward_Perc_Larger Gambling_Task_Reward_Median_RT_Larger Gambling_Task_Reward_Perc_Smaller Gambling_Task_Reward_Median_RT_Smaller Gambling_Task_Reward_Perc_NLR Gambling_Task_Punish_Perc_Larger Gambling_Task_Punish_Median_RT_Larger Gambling_Task_Punish_Perc_Smaller Gambling_Task_Punish_Median_RT_Smaller Gambling_Task_Punish_Perc_NLR Language_Task_Acc Language_Task_Median_RT Language_Task_Story_Acc Language_Task_Story_Median_RT Language_Task_Story_Avg_Difficulty_Level Language_Task_Math_Acc Language_Task_Math_Median_RT Language_Task_Math_Avg_Difficulty_Level Relational_Task_Acc Relational_Task_Median_RT Relational_Task_Match_Acc Relational_Task_Match_Median_RT Relational_Task_Rel_Acc Relational_Task_Rel_Median_RT Social_Task_Perc_Random Social_Task_Perc_TOM Social_Task_Perc_Unsure Social_Task_Perc_NLR Social_Task_Median_RT_Random Social_Task_Median_RT_TOM Social_Task_Median_RT_Unsure Social_Task_Random_Perc_Random Social_Task_Random_Median_RT_Random Social_Task_Random_Perc_TOM Social_Task_Random_Median_RT_TOM Social_Task_Random_Perc_Unsure Social_Task_Random_Median_RT_Unsure Social_Task_Random_Perc_NLR Social_Task_TOM_Perc_Random Social_Task_TOM_Median_RT_Random Social_Task_TOM_Perc_TOM Social_Task_TOM_Median_RT_TOM Social_Task_TOM_Perc_Unsure Social_Task_TOM_Median_RT_Unsure Social_Task_TOM_Perc_NLR WM_Task_Acc WM_Task_Median_RT WM_Task_2bk_Acc WM_Task_2bk_Median_RT WM_Task_0bk_Acc WM_Task_0bk_Median_RT WM_Task_0bk_Body_Acc WM_Task_0bk_Body_Acc_Target WM_Task_0bk_Body_Acc_Nontarget WM_Task_0bk_Face_Acc WM_Task_0bk_Face_Acc_Target WM_Task_0bk_Face_ACC_Nontarget WM_Task_0bk_Place_Acc WM_Task_0bk_Place_Acc_Target WM_Task_0bk_Place_Acc_Nontarget WM_Task_0bk_Tool_Acc WM_Task_0bk_Tool_Acc_Target WM_Task_0bk_Tool_Acc_Nontarget WM_Task_2bk_Body_Acc WM_Task_2bk_Body_Acc_Target WM_Task_2bk_Body_Acc_Nontarget WM_Task_2bk_Face_Acc WM_Task_2bk_Face_Acc_Target WM_Task_2bk_Face_Acc_Nontarget WM_Task_2bk_Place_Acc WM_Task_2bk_Place_Acc_Target WM_Task_2bk_Place_Acc_Nontarget WM_Task_2bk_Tool_Acc WM_Task_2bk_Tool_Acc_Target WM_Task_2bk_Tool_Acc_Nontarget WM_Task_0bk_Body_Median_RT WM_Task_0bk_Body_Median_RT_Target WM_Task_0bk_Body_Median_RT_Nontarget WM_Task_0bk_Face_Median_RT WM_Task_0bk_Face_Median_RT_Target WM_Task_0bk_Face_Median_RT_Nontarget WM_Task_0bk_Place_Median_RT WM_Task_0bk_Place_Median_RT_Target WM_Task_0bk_Place_Median_RT_Nontarget WM_Task_0bk_Tool_Median_RT WM_Task_0bk_Tool_Median_RT_Target WM_Task_0bk_Tool_Median_RT_Nontarget WM_Task_2bk_Body_Median_RT WM_Task_2bk_Body_Median_RT_Target WM_Task_2bk_Body_Median_RT_Nontarget WM_Task_2bk_Face_Median_RT WM_Task_2bk_Face_Median_RT_Target WM_Task_2bk_Face_Median_RT_Nontarget WM_Task_2bk_Place_Median_RT WM_Task_2bk_Place_Median_RT_Target WM_Task_2bk_Place_Median_RT_Nontarget WM_Task_2bk_Tool_Median_RT WM_Task_2bk_Tool_Median_RT_Target WM_Task_2bk_Tool_Median_RT_Nontarget Endurance_Unadj Endurance_AgeAdj GaitSpeed_Comp Dexterity_Unadj Dexterity_AgeAdj Strength_Unadj Strength_AgeAdj NEOFAC_A NEOFAC_O NEOFAC_C NEOFAC_N NEOFAC_E NEORAW_01 NEORAW_02 NEORAW_03 NEORAW_04 NEORAW_05 NEORAW_06 NEORAW_07 NEORAW_08 NEORAW_09 NEORAW_10 NEORAW_11 NEORAW_12 NEORAW_13 NEORAW_14 NEORAW_15 NEORAW_16 NEORAW_17 NEORAW_18 NEORAW_19 NEORAW_20 NEORAW_21 NEORAW_22 NEORAW_23 NEORAW_24 NEORAW_25 NEORAW_26 NEORAW_27 NEORAW_28 NEORAW_29 NEORAW_30 NEORAW_31 NEORAW_32 NEORAW_33 NEORAW_34 NEORAW_35 NEORAW_36 NEORAW_37 NEORAW_38 NEORAW_39 NEORAW_40 NEORAW_41 NEORAW_42 NEORAW_43 NEORAW_44 NEORAW_45 NEORAW_46 NEORAW_47 NEORAW_48 NEORAW_49 NEORAW_50 NEORAW_51 NEORAW_52 NEORAW_53 NEORAW_54 NEORAW_55 NEORAW_56 NEORAW_57 NEORAW_58 NEORAW_59 NEORAW_60 Noise_Comp Odor_Unadj Odor_AgeAdj PainIntens_RawScore PainInterf_Tscore Taste_Unadj Taste_AgeAdj Mars_Log_Score Mars_Errs Mars_Final
We select only those who has completed all scanning sessions. We will use these indices to select the relevant matrices later.
ids = pd.read_csv('data/subjectIDs_recon2.txt', names=['Subject'])
print(ids.shape)
ids.head()
(812, 1)
Subject | |
---|---|
0 | 100206 |
1 | 100610 |
2 | 101006 |
3 | 101107 |
4 | 101309 |
print(behav.shape)
behav.head()
(1206, 582)
Subject | Release | Acquisition | Gender | Age | 3T_Full_MR_Compl | T1_Count | T2_Count | 3T_RS-fMRI_Count | 3T_RS-fMRI_PctCompl | ... | Noise_Comp | Odor_Unadj | Odor_AgeAdj | PainIntens_RawScore | PainInterf_Tscore | Taste_Unadj | Taste_AgeAdj | Mars_Log_Score | Mars_Errs | Mars_Final | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 100004 | S900 | Q06 | M | 22-25 | False | 0 | 0 | 0 | 0.0 | ... | 5.2 | 101.12 | 86.45 | 2.0 | 45.9 | 107.17 | 105.31 | 1.80 | 0.0 | 1.80 |
1 | 100206 | S900 | Q11 | M | 26-30 | True | 1 | 1 | 4 | 100.0 | ... | 6.0 | 108.79 | 97.19 | 1.0 | 49.7 | 72.63 | 72.03 | 1.84 | 0.0 | 1.84 |
2 | 100307 | Q1 | Q01 | F | 26-30 | True | 1 | 1 | 4 | 100.0 | ... | 3.6 | 101.12 | 86.45 | 0.0 | 38.6 | 71.69 | 71.76 | 1.76 | 0.0 | 1.76 |
3 | 100408 | Q3 | Q03 | M | 31-35 | True | 1 | 1 | 4 | 100.0 | ... | 2.0 | 108.79 | 98.04 | 2.0 | 52.6 | 114.01 | 113.59 | 1.76 | 2.0 | 1.68 |
4 | 100610 | S900 | Q08 | M | 26-30 | True | 2 | 1 | 4 | 100.0 | ... | 2.0 | 122.25 | 110.45 | 0.0 | 38.6 | 84.84 | 85.31 | 1.92 | 1.0 | 1.88 |
5 rows × 582 columns
Subject
column, call the new dataframe merged
.¶This effectively selects only the 812 subjects from which we have data
# %load solutions/ex2_1.py
merged = pd.merge(ids, behav, on='Subject')
print(merged.shape)
merged.head()
(812, 582)
Subject | Release | Acquisition | Gender | Age | 3T_Full_MR_Compl | T1_Count | T2_Count | 3T_RS-fMRI_Count | 3T_RS-fMRI_PctCompl | ... | Noise_Comp | Odor_Unadj | Odor_AgeAdj | PainIntens_RawScore | PainInterf_Tscore | Taste_Unadj | Taste_AgeAdj | Mars_Log_Score | Mars_Errs | Mars_Final | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 100206 | S900 | Q11 | M | 26-30 | True | 1 | 1 | 4 | 100.0 | ... | 6.0 | 108.79 | 97.19 | 1.0 | 49.7 | 72.63 | 72.03 | 1.84 | 0.0 | 1.84 |
1 | 100610 | S900 | Q08 | M | 26-30 | True | 2 | 1 | 4 | 100.0 | ... | 2.0 | 122.25 | 110.45 | 0.0 | 38.6 | 84.84 | 85.31 | 1.92 | 1.0 | 1.88 |
2 | 101006 | S500 | Q06 | F | 31-35 | True | 2 | 2 | 4 | 100.0 | ... | 6.0 | 122.25 | 111.41 | 0.0 | 38.6 | 123.80 | 123.31 | 1.80 | 0.0 | 1.80 |
3 | 101107 | S500 | Q06 | M | 22-25 | True | 2 | 2 | 4 | 100.0 | ... | 6.8 | 108.79 | 97.19 | 1.0 | 50.1 | 134.65 | 131.38 | 1.84 | 0.0 | 1.84 |
4 | 101309 | S500 | Q06 | M | 26-30 | True | 1 | 1 | 4 | 100.0 | ... | 5.2 | 122.25 | 110.45 | 0.0 | 38.6 | 106.39 | 104.39 | 1.80 | 0.0 | 1.80 |
5 rows × 582 columns
netmats = np.loadtxt('data/ICA_d100_netmats.txt')
netmats.shape # each row is a flattened adjacency matrix.
(812, 10000)
Rescale the data to 0 mean and 1 variance
scaler = StandardScaler()
netmats = scaler.fit_transform(netmats)
i.e. we want 812 100x100 matrices
# %load solutions/ex2_2.py
d = np.sqrt(netmats.shape[1]).astype(np.int)
print(d)
netmats = netmats.reshape(netmats.shape[0], d,d)
100
print(netmats.shape)
plot_matrix(netmats[60])
plt.show()
(812, 100, 100)
We will stick to gender for the sake of simplicity
# %load solutions/ex2_3.py
merged = merged.loc[:,['Subject','Gender']]
merged.head()
Subject | Gender | |
---|---|---|
0 | 100206 | M |
1 | 100610 | M |
2 | 101006 | F |
3 | 101107 | M |
4 | 101309 | M |
merged.shape
(812, 2)
netmats.shape
(812, 100, 100)
merged
.¶# %load solutions/ex2_4.py
# any NaNs?
np.any(merged.isna())
False
merged.Gender
0 M 1 M 2 F 3 M 4 M .. 807 M 808 F 809 F 810 M 811 F Name: Gender, Length: 812, dtype: object
female = netmats[merged.Gender=='F'].T
male = netmats[merged.Gender=='M'].T
print('M: ', male.shape, '\nF: ', female.shape)
M: (100, 100, 402) F: (100, 100, 410)
# %load solutions/ex2_6.py
# must be equal size
female = female[:,:,:male.shape[-1]]
print('M: ', male.shape, '\nF: ', female.shape)
male.shape == female.shape
M: (100, 100, 402) F: (100, 100, 402)
True
male.T.shape
(402, 100, 100)
help(do_nbs)
Help on function do_nbs in module __main__: do_nbs(g1, g2, kappa=0.8, plot=True, k=1000)
# https://stackoverflow.com/questions/16571150/how-to-capture-stdout-output-from-a-python-function-call
from io import StringIO
import sys
class Capturing(list):
def __enter__(self):
self._stdout = sys.stdout
sys.stdout = self._stringio = StringIO()
return self
def __exit__(self, *args):
self.extend(self._stringio.getvalue().splitlines())
del self._stringio # free up some memory
sys.stdout = self._stdout
Usage:
with Capturing() as output:
do_something(my_object)
output
is now a list containing the lines printed by the function call.
%%time
# after correcting
with Capturing() as output:
pval, adj, null = do_nbs(male, female, kappa=0.8)
CPU times: user 5min 8s, sys: 389 ms, total: 5min 9s Wall time: 5min 9s
max component size is 3876
estimating null distribution with 1000 permutations
permutation 0 of 1000. p-value so far is 0.000
permutation 100 of 1000. p-value so far is 0.000
permutation 200 of 1000. p-value so far is 0.000
permutation 300 of 1000. p-value so far is 0.000
permutation 400 of 1000. p-value so far is 0.000
permutation 500 of 1000. p-value so far is 0.000
permutation 600 of 1000. p-value so far is 0.000
permutation 700 of 1000. p-value so far is 0.000
permutation 800 of 1000. p-value so far is 0.000
permutation 900 of 1000. p-value so far is 0.000
permutation 999 of 1000. p-value so far is 0.000
output
['max component size is 3876', 'estimating null distribution with 1000 permutations', 'permutation 0 of 1000. p-value so far is 0.000', 'permutation 100 of 1000. p-value so far is 0.000', 'permutation 200 of 1000. p-value so far is 0.000', 'permutation 300 of 1000. p-value so far is 0.000', 'permutation 400 of 1000. p-value so far is 0.000', 'permutation 500 of 1000. p-value so far is 0.000', 'permutation 600 of 1000. p-value so far is 0.000', 'permutation 700 of 1000. p-value so far is 0.000', 'permutation 800 of 1000. p-value so far is 0.000', 'permutation 900 of 1000. p-value so far is 0.000', 'permutation 999 of 1000. p-value so far is 0.000']
# read from the print-out above
# max_comp_sz = 3876
max_comp_sz = int(output[0][22:])
plt.hist(null, bins=50)
plt.axvline(max_comp_sz, c='r')
plt.show()
Exact p-values are inaccurate in permutation tests when very small, so we can increase the number of permutations (and append to the null distribution above) if you want a more precise estimate. However, from the results we can say that the p-value at least is < 0.001 (it might be much lower, but we only did 1000 permutations). It should also be specified that the p-value for any statistical test is not concerned with the magnitude of the difference, but rather a measure of the certainty of whether there is a difference at all, irrespective of its size. We used a large dataset (N=400), so discovering a difference was not surprising.
The identified difference is not pertained to a single locus but is based on a subnetwork of edges in the connectome, specifically:
plt.imshow(adj, cmap='gray')
plt.show()
For those interested, one could download the cortical maps corresponding to the 100 regions from https://db.humanconnectome.org/, and plot the edges in question. However, the image above shows that the edges (white) cover the majority (80%) of the edges.
print('Graph density:')
bct.density_und(adj)
Graph density:
(0.7830303030303031, 100, 3876)
adj.sum() / (100*100-100)
0.7830303030303031
# read from the print-out above
#max_comp_sz = 3876
Furthermore, every node in the graph is included in the subnetwork, which can be dissapointing. More interesting is to redo the NBS at a stricter threshold, narrowing down the subnetwork to its most important "core". However, beware that this kappa value has to be predefined to be statistically valid. Commonly one would decide on a range of e.g. 4 values, and run the test for each.
%%time
# after correcting
with Capturing() as output2:
pval2, adj2, null2 = do_nbs(male, female, kappa=6, k=500)
CPU times: user 2min 46s, sys: 404 ms, total: 2min 46s Wall time: 2min 47s
output2
['max component size is 170', 'estimating null distribution with 500 permutations', 'permutation 0 of 500. p-value so far is 0.000', 'permutation 50 of 500. p-value so far is 0.000', 'permutation 100 of 500. p-value so far is 0.000', 'permutation 150 of 500. p-value so far is 0.000', 'permutation 200 of 500. p-value so far is 0.000', 'permutation 250 of 500. p-value so far is 0.000', 'permutation 300 of 500. p-value so far is 0.000', 'permutation 350 of 500. p-value so far is 0.000', 'permutation 400 of 500. p-value so far is 0.000', 'permutation 450 of 500. p-value so far is 0.000', 'permutation 499 of 500. p-value so far is 0.000']
plt.imshow(adj2)
plt.show()
bct.
bct.density_dir(adj2)
(0.03434343434343434, 100, 340)
plt.imshow(adj2, cmap='gray')
# number of nodes included:
print(adj2.sum(axis=0))
# number of nodes included in the sub-network
n_nodes = np.sum(adj2.sum(axis=0) > 0)
n_nodes
[ 0. 2. 10. 1. 6. 12. 0. 6. 2. 4. 7. 1. 13. 1. 5. 3. 4. 0. 6. 3. 3. 5. 2. 1. 1. 2. 10. 2. 2. 6. 3. 0. 1. 7. 9. 9. 0. 5. 1. 1. 2. 2. 7. 0. 6. 2. 1. 3. 3. 2. 4. 1. 1. 1. 1. 3. 13. 2. 0. 2. 1. 0. 1. 10. 5. 7. 0. 2. 7. 7. 12. 3. 9. 2. 6. 0. 11. 5. 0. 0. 0. 1. 0. 1. 0. 1. 1. 0. 0. 13. 0. 0. 7. 0. 8. 1. 1. 3. 2. 1.]
80
Degree functions allow to determine nodes with a large number of connections ("degree centrality"), while distance functions allow to determine nodes which are close to other nodes ("closeness centrality").
nbs.degree_centrality(adj2)
--------------------------------------------------------------------------- AttributeError Traceback (most recent call last) <ipython-input-106-6862bf16dd94> in <module> ----> 1 nbs.degree_centrality(adj2) AttributeError: module 'bct.nbs' has no attribute 'degree_centrality'
networkx.algorithms.centrality.degree_centrality
degree_centrality(G)[source]
Compute the degree centrality for nodes.
The degree centrality for a node v is the fraction of nodes it is connected to.
Parameters
G (graph) – A networkx graph
Returns
nodes – Dictionary of nodes with degree centrality as the value.
Return type - dictionary
Notes
The degree centrality values are normalized by dividing by the maximum possible degree in a simple graph n-1 where n is the number of nodes in G.
G2 = nx.from_numpy_matrix(adj2)
D2 = nx.degree_centrality(G2)
plt.figure(figsize=(20,4))
plt.bar(range(len(D2)), list(D2.values()), align='center')
plt.xticks(range(len(D2)), list(D2.keys()), fontsize=7)
plt.show()
%%time
test3 = do_nbs(male, female, kappa=7, k=500)
max component size is 38 estimating null distribution with 500 permutations permutation 0 of 500. p-value so far is 0.000 permutation 50 of 500. p-value so far is 0.000 permutation 100 of 500. p-value so far is 0.000 permutation 150 of 500. p-value so far is 0.000 permutation 200 of 500. p-value so far is 0.000 permutation 250 of 500. p-value so far is 0.000 permutation 300 of 500. p-value so far is 0.000 permutation 350 of 500. p-value so far is 0.000 permutation 400 of 500. p-value so far is 0.000 permutation 450 of 500. p-value so far is 0.000 permutation 499 of 500. p-value so far is 0.000
CPU times: user 2min 41s, sys: 1.19 s, total: 2min 42s Wall time: 2min 42s
adj3 = test3[1]
plt.imshow(adj3, cmap='gray')
plt.show()
test4 = do_nbs(male, female, kappa=7.4, k=500)
max component size is 13 estimating null distribution with 500 permutations permutation 0 of 500. p-value so far is 0.000 permutation 50 of 500. p-value so far is 0.000 permutation 100 of 500. p-value so far is 0.000 permutation 150 of 500. p-value so far is 0.000 permutation 200 of 500. p-value so far is 0.000 permutation 250 of 500. p-value so far is 0.000 permutation 300 of 500. p-value so far is 0.000 permutation 350 of 500. p-value so far is 0.000 permutation 400 of 500. p-value so far is 0.000 permutation 450 of 500. p-value so far is 0.000 permutation 499 of 500. p-value so far is 0.000
adj4 = test4[1]
plt.imshow(adj4, cmap='gray')
plt.show()
print(adj4.sum(axis=0))
n_nodes = np.sum(adj4.sum(axis=0) > 0)
n_nodes
[0. 0. 6. 0. 0. 3. 0. 3. 0. 0. 9. 0. 9. 0. 6. 3. 3. 0. 0. 0. 0. 2. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 6. 0. 0. 0. 0. 0. 2. 0. 0. 0. 0. 0. 4. 0. 0. 0. 0. 0. 0. 0. 0. 2. 0. 0. 0. 0. 0. 0. 6. 0. 2. 0. 1. 1. 2. 6. 0. 3. 0. 0. 0. 9. 3. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 6. 0. 0. 0. 0. 0. 0. 0. 4. 0. 0.]
25
Now we have boiled it down into 25 nodes. These nodes are based on an ICA decomposition of the brain activity, so we would have to download an additional dataset from the connectome database in order to visualize them. However, we will not do it here due to the size of the data.
Tip: you can read more about the behavioural variables here and what the column names are in the dataframe: https://wiki.humanconnectome.org/display/PublicData/HCP+Data+Dictionary+Public-+Updated+for+the+1200+Subject+Release
## YOUR CODE HERE
The below code use the nilearn.datasets
module to download fMRI scans from the Autism Brain Imaging Data Exchange (ABIDE) study and process it in a standard network-modelling pipeline using the Yeo-2011 atlas for parcellating the cortex into 17 regions. We then proceed to perform NBS to compare the autism group to matching controls.
However, this loads raw fMRI data and not the network matrices. fMRI data is notoriously large, and it is also fairly time consuming to process (despite preprocessing has already been done, so beware of having to wait some minutes.
%%time
n_subjects = 20
import os
import numpy as np
from nilearn import datasets
yeo = datasets.fetch_atlas_yeo_2011()
autism = datasets.fetch_abide_pcp(n_subjects=n_subjects, DX_GROUP=1)
control = datasets.fetch_abide_pcp(n_subjects=n_subjects, DX_GROUP=2)
from nilearn.input_data import NiftiLabelsMasker
from nilearn.connectome import ConnectivityMeasure
masker = NiftiLabelsMasker(labels_img = yeo['thick_17'], standardize=True,
memory='nilearn_cache')
measure = ConnectivityMeasure(kind='correlation')
ts_au=[]
for func in autism.func_preproc:
ts_au.append(masker.fit_transform(func))
ts_hc=[]
for func in control.func_preproc:
ts_hc.append(masker.fit_transform(func))
corr_au = measure.fit_transform(ts_au)
corr_hc = measure.fit_transform(ts_hc)
for i, m in enumerate(corr_au):
fn = f"abide_au{i+101}.txt"
fn = "data/abide/" + fn
np.savetxt(fn, m)
for i, m in enumerate(corr_hc):
fn = f"abide_hc{i+101}.txt"
fn = "data/abide/" + fn
np.savetxt(fn, m)
/Users/arvid/opt/anaconda3/envs/bmed360v2021/lib/python3.8/site-packages/numpy/lib/npyio.py:2349: VisibleDeprecationWarning: Reading unicode strings without specifying the encoding argument is deprecated. Set the encoding, use None for the system default. output = genfromtxt(fname, **kwargs)
CPU times: user 59.9 s, sys: 4.13 s, total: 1min 4s Wall time: 1min 3s
datapath = 'data/abide'
mats = [np.loadtxt(fn) for fn in os.scandir(path=datapath) if fn.name.startswith('abide')]
[fn.name for fn in os.scandir(datapath)]
['abide_au112.txt', 'abide_au106.txt', 'abide_hc115.txt', 'abide_hc101.txt', 'abide_hc114.txt', 'abide_au107.txt', 'abide_au113.txt', 'abide_au105.txt', 'abide_au111.txt', 'abide_hc102.txt', 'abide_hc116.txt', 'abide_hc117.txt', 'abide_hc103.txt', 'abide_au110.txt', 'abide_au104.txt', 'abide_au114.txt', 'abide_hc107.txt', 'abide_hc113.txt', 'abide_hc112.txt', 'abide_hc106.txt', 'abide_au115.txt', 'abide_au101.txt', 'abide_au117.txt', 'abide_au103.txt', 'abide_hc110.txt', 'abide_hc104.txt', 'abide_hc105.txt', 'abide_hc111.txt', 'abide_au102.txt', 'abide_au116.txt', 'abide_hc108.txt', 'abide_hc120.txt', 'abide_hc109.txt', 'abide_au118.txt', 'abide_au119.txt', 'abide_au109.txt', 'abide_au120.txt', 'abide_au108.txt', 'abide_hc119.txt', 'abide_hc118.txt']
X = np.array(mats)
y = np.array(['au']*20 + ['hc']*20)
y==['au']
array([ True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False])
print(y.shape); print(X.shape)
(40,) (40, 17, 17)
au = X[y=='au',:]
hc = X[y=='hc',:]
au.T.shape
(17, 17, 20)
kappa = 0.8
pval, adj, null = nbs.nbs_bct(au.T, hc.T, kappa)
max component size is 83 estimating null distribution with 1000 permutations permutation 0 of 1000. p-value so far is 0.000 permutation 100 of 1000. p-value so far is 0.198 permutation 200 of 1000. p-value so far is 0.244 permutation 300 of 1000. p-value so far is 0.249 permutation 400 of 1000. p-value so far is 0.224 permutation 500 of 1000. p-value so far is 0.232 permutation 600 of 1000. p-value so far is 0.221 permutation 700 of 1000. p-value so far is 0.208 permutation 800 of 1000. p-value so far is 0.210 permutation 900 of 1000. p-value so far is 0.203 permutation 999 of 1000. p-value so far is 0.204
plt.hist(null)
plt.axvline(36, c='red')
plt.show()
pval, adj, null = do_nbs(au.T, hc.T, kappa=0.3)
print(pval)
max component size is 112 estimating null distribution with 1000 permutations permutation 0 of 1000. p-value so far is 0.000 permutation 100 of 1000. p-value so far is 0.287 permutation 200 of 1000. p-value so far is 0.274 permutation 300 of 1000. p-value so far is 0.272 permutation 400 of 1000. p-value so far is 0.292 permutation 500 of 1000. p-value so far is 0.309 permutation 600 of 1000. p-value so far is 0.309 permutation 700 of 1000. p-value so far is 0.305 permutation 800 of 1000. p-value so far is 0.302 permutation 900 of 1000. p-value so far is 0.311 permutation 999 of 1000. p-value so far is 0.310
[0.31]
plt.hist(null)
plt.axvline(110, c='red')
plt.show()
We could identify no difference in the connectome between those with ADHD and controls with this setup. It might be unexpected that we don't observe any difference, but we must consider three factors: i) our data set is only 40 (compared to 800 in the first dataset), making it harder to detect any effect, ii) we used a very rough parcellation of only 17 nodes (compared to 100), again compromising sensitivity, and iii) the parcellation is based on predefined regions anatomically rather than mined from the brain activity itself (ICA). This scheme is simpler but vulnerable to functionally meaningless segmentation.
Pros
It let's you easily test hypotheses about group differences in networks (e.g. connectomes between sick and healthy).
You can identify edges disturbed by disease
There is no difficult math (non-parametric)
It deals naturally with the graph structure
Cons