fMRIflows is a consortium of many (dependent) fMRI analysis pipelines, including anatomical and functional pre-processing, univariate 1st and 2nd-level analysis, as well as multivariate pattern analysis.
This notebook will help you set up the JSON specification files to run the individual analyses.
First, let's see what we've got:
from bids.layout import BIDSLayout
layout = BIDSLayout("/data/")
subject_list = sorted(layout.get_subjects())
subject_list
session_list = layout.get_sessions()
session_list
runs = layout.get_runs()
runs
task_id = sorted(layout.get_tasks())
task_id
To get information such as the TR and voxel resolution of the functional images, let's first collect the functional images of all subjects (of the first task).
# List of functional images
func_files = layout.get(datatype='func', return_type='file', extension='.nii.gz',
                        suffix='bold', task=task_id[0])
func_files
Next, let's collect the TR and voxel resolution of all functional images (of the first task), over all subjects, sessions and runs.
import numpy as np
import nibabel as nb
resolution = np.array([nb.load(f).header.get_zooms() for f in func_files])
resolution
# Get median TR of all collected functional images
TR = np.median(resolution[:, 3])
TR
# Get median voxel resolution of all collected functional images
vox_res = [round(r, 3) for r in np.median(resolution[:, :3], axis=0)]
vox_res
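As an optional cross-check, the TR can also be read from the BIDS JSON sidecar metadata via pybids. This is just a sketch and assumes the sidecar files include a RepetitionTime field:
# Optional cross-check: read the TR (in seconds) from the BIDS sidecar metadata
TR_meta = [layout.get_metadata(f).get('RepetitionTime') for f in func_files]
TR_meta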
Now that we know the content of our dataset, let's write the specification file for the anatomical and functional preprocessing workflow.
For the anatomical preprocessing workflow, we need only a few parameters:
# Create an empty dictionary
content_anat = {}
# List of subject identifiers
content_anat['subject_list_anat'] = subject_list
# List of session identifiers
content_anat['session_list_anat'] = session_list
# T1w image identifier (default: T1w)
content_anat['T1w_id'] = 'T1w'
# Voxel resolution of reference template
content_anat['res_norm'] = [1.0, 1.0, 1.0]
# Should ANTs normalization use a 'fast' or a 'precise' approach (default: 'precise')
content_anat['norm_accuracy'] = 'precise'
To make sure that everything is as we want it, let's print the parameters for the anatomical preprocessing pipeline again:
content_anat
For the functional preprocessing workflow, we need a few more parameters:
# Create an empty dictionary
content_func = {}
# List of subject identifiers
content_func['subject_list_func'] = subject_list
# List of session identifiers
content_func['session_list_func'] = session_list
# List of task identifiers
content_func['task_list'] = task_id
# List of run identifiers
content_func['run_list'] = runs
# Reference timepoint for slice time correction (in ms)
content_func['ref_timepoint'] = int(round((TR * 1000.) / 2.0, 3))
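# e.g. with a TR of 2.0 s this gives a reference timepoint of 1000 ms (middle of the volume)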
# Requested isometric voxel resolution after coregistration of functional images
content_func['res_func'] = round(np.median(vox_res).astype('float64'), 3)
# List of spatial filters (smoothing) to apply (separately, i.e. with iterables)
# Values are given in mm
content_func['filters_spatial'] = [["LP", 3. * content_func['res_func']]]
# List of temporal filters to apply (separately, i.e. with iterables)
# Values are given in seconds
content_func['filters_temporal'] = [[None, 100.]]
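Because these filter lists are iterables, each entry is processed as a separate branch. Purely as a hypothetical illustration (the values below are not recommendations), multiple filter settings could be listed like this:
# Hypothetical example: two spatial smoothing kernels (in mm) and two
# temporal filter settings (in seconds), each processed separately
# content_func['filters_spatial'] = [["LP", 6.0], ["LP", 9.0]]
# content_func['filters_temporal'] = [[None, 100.], [None, 128.]]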
And for the confound sub-workflow, we need to specify the number of CompCor components that should be computed, the thresholds to detect outliers in FD, DVARS, TV, GM, WM and CSF, as well as the number of independent components that should be extracted from the preprocessed signal.
# Number of CompCor components to compute
content_func['n_compcor_confounds'] = 5
# Threshold for outlier detection (3.27 represents a threshold of 99.9%)
content_func['outlier_thresholds'] = [3.27, 3.27, 3.27, None, None, None]
# Number of independent components to compute
content_func['n_independent_components'] = 10
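As a purely illustrative check, the thresholds can be paired with the outlier metrics listed above to make their order explicit:
# Pair each outlier metric with its threshold
# (order as described above: FD, DVARS, TV, GM, WM, CSF)
dict(zip(['FD', 'DVARS', 'TV', 'GM', 'WM', 'CSF'],
         content_func['outlier_thresholds']))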
To make sure that everything is as we want it, let's print the parameters for the functional preprocessing pipeline again:
content_func
JSON specification file
We will be using one common JSON file for the specifications for the anatomical and functional preprocessing pipelines. The creation of this file is rather simple:
content = {}
content.update(content_anat)
content.update(content_func)
The only thing that we're still missing is the number of parallel processes that we want to allow during the execution of the preprocessing workflows:
# Number of parallel jobs to use during the preprocessing
import multiprocessing
n_proc = multiprocessing.cpu_count() - 1
n_proc
content['n_parallel_jobs'] = n_proc
Now, we're ready to write the content to a JSON file. By default the filename is fmriflows_spec_preproc.json:
import json
file_id = 'fmriflows_spec_preproc'
with open('/data/%s.json' % file_id, 'w') as f:
json.dump(content, f, indent=4)
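If you want to verify what was written, the file can simply be loaded again. This is an optional check (spec_check is just a throwaway variable introduced here), assuming the file was written to /data/:
# Optional check: read the specification file back in to verify its content
with open('/data/%s.json' % file_id, 'r') as f:
    spec_check = json.load(f)
spec_check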
Once the anatomical and functional preprocessing pipelines have been run, we can move on to the 1st-level univariate and multivariate analyses. For this we need to collect the different conditions and corresponding contrasts for each task.
So let's explore the dataset:
# Create an empty dictionary
content_analysis = {}
content_analysis['tasks'] = {}
for t in task_id:

    # Collect the event files of a given task and find the first one with conditions
    event_files = layout.get(return_type='file', suffix='events', task=t)
    idx = 0
    condition_names = []
    while len(condition_names) == 0 and idx < len(event_files):
        event_file = event_files[idx]

        # Collect unique condition names
        import pandas as pd
        df = pd.read_csv(event_file, sep='\t')
        condition_names = [str(c) for c in np.unique(df['trial_type'])]
        idx += 1

    # Create contrasts (unique and versus rest)
    contrasts = []

    # Adding unique contrasts, i.e. [1, 0, 0, 0, ...]
    eye_matrix = np.eye(len(condition_names)).tolist()
    contrasts += [[c, eye_matrix[i], 'T'] for i, c in enumerate(condition_names)]

    # Add one vs. rest contrasts, i.e. [1, -0.25, -0.25, -0.25, -0.25]
    try:
        rest_matrix = np.copy(eye_matrix)
        rest_matrix[rest_matrix == 0] = np.round(-1. / (len(condition_names) - 1), 4)
        contrasts += [['%s_vs_rest' % c, rest_matrix[i].tolist(), 'T']
                      for i, c in enumerate(condition_names)]
    except ZeroDivisionError:
        pass

    # Store the task specific information in a dictionary
    content_task = {}
    content_task['condition_names'] = condition_names

    # Add contrasts to task dictionary
    content_task['contrasts'] = contrasts

    # Store everything in the analysis dictionary
    content_analysis['tasks'][t] = content_task

    # Processing feedback
    print('Task \'%s\' finished.' % t)
So what do we have so far?
from pprint import pprint
pprint(content_analysis['tasks'])
If you want to add some additional contrasts, either add them to content_analysis['tasks'][task_id]['contrasts'] or directly adapt the content of the fmriflows_spec_analysis.json file once this section has been executed.
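For example, a differential contrast between two conditions could be appended in the same [name, weights, type] format used above. The snippet below is only a hypothetical sketch: 'cond_A' and 'cond_B' are made-up names assumed to be the first two of three conditions of the first task:
# Hypothetical example: add a differential contrast 'cond_A > cond_B'
# (the weight vector must have one entry per condition of that task)
# content_analysis['tasks'][task_id[0]]['contrasts'].append(
#     ['cond_A_vs_cond_B', [1, -1, 0], 'T'])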
The next section specifies additional model parameters. The assumption is that all tasks use the same parameters. If this is not the case, specify multiple fmriflows_spec_analysis.json files and run them individually in the 03_analysis_1st-level notebook.
# List of subject identifiers
content_analysis['subject_list'] = subject_list
# List of session identifiers
content_analysis['session_list'] = session_list
# List of spatial filters (smoothing) that were used during functional preprocessing
content_analysis['filters_spatial'] = content_func['filters_spatial']
# List of temporal filters that were used during functional preprocessing
content_analysis['filters_temporal'] = content_func['filters_temporal']
# Nuisance identifiers that should be included in the GLM
content_analysis['nuisance_regressors'] = ['Rotation', 'Translation', 'FD', 'DVARS']
# Whether outliers detected during functional preprocessing should be included in the GLM
content_analysis['use_outliers'] = True
# Serial Correlation Model to use: 'AR(1)', 'FAST' or 'none'
content_analysis['model_serial_correlations'] = 'AR(1)'
# Model bases to use: 'hrf', 'fourier', 'fourier_han', 'gamma' or 'fir'
# If 'hrf', also specify if time and dispersion derivatives should be used
content_analysis['model_bases'] = {'hrf': {'derivs': [0, 0]}}
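# (A hypothetical alternative: include both temporal and dispersion derivatives
#  via content_analysis['model_bases'] = {'hrf': {'derivs': [1, 1]}})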
# Estimation Method to use: 'Classical', 'Bayesian' or 'Bayesian2'
content_analysis['estimation_method'] = {'Classical': 1}
# Specify if contrasts should be normalized to template space after estimation
content_analysis['normalize'] = True
# Specify voxel resolution of normalized contrasts
content_analysis['norm_res'] = [1, 1, 1]
# Specify if a contrast should be computed for each stimulus category per run
content_analysis['con_per_run'] = True
# Specify voxel resolution of normalized contrasts for multivariate analysis
content_analysis['norm_res_multi'] = [float(v) for v in vox_res]
# Specify a particular analysis postfix
content_analysis['analysis_postfix'] = ''
Some of the parameters for the 2nd-level analysis can be taken directly from the 1st-level analysis parameters, e.g. subject names, filters, etc. Here we specify some additional parameters unique to the 2nd-level analysis. They will be stored together with the 1st-level parameters in a file called fmriflows_spec_analysis.json. To run the 2nd-level analysis, run the 04_analysis_2nd-level notebook.
# Specify value to threshold gray matter probability template to create 2nd-level mask
content_analysis['gm_mask_thr'] = 0.1
# Value for initial thresholding to define clusters
content_analysis['height_threshold'] = 0.001
# Whether to use FWE (Bonferroni) correction for initial threshold
content_analysis['use_fwe_correction'] = False
# Minimum cluster size in voxels
content_analysis['extent_threshold'] = 5
# Whether to use FDR correction over cluster extent probabilities
content_analysis['use_topo_fdr'] = True
# P-threshold to use on FDR-corrected cluster size probabilities
content_analysis['extent_fdr_p_threshold'] = 0.05
# Name of atlases to use for creation of output tables
content_analysis['atlasreader_names'] = 'default'
# Probability threshold to use for output tables
content_analysis['atlasreader_prob_thresh'] = 5
JSON specification file
As in the preprocessing example, we will be using one common JSON file for the specifications of the 1st-level and 2nd-level analysis. The creation of this file is as before:
# Number of parallel jobs to use during the analysis
import multiprocessing
n_proc = multiprocessing.cpu_count() - 1
content_analysis['n_parallel_jobs'] = n_proc
import json
file_id = 'fmriflows_spec_analysis'
with open('/data/%s.json' % file_id, 'w') as f:
json.dump(content_analysis, f, indent=4)
content_analysis
Once the 1st-level analysis has been run with the parameter con_per_run set to True, you can run a multivariate searchlight analysis. Keep in mind that this might take some time, especially if you want to perform a group analysis as proposed in Stelzer et al. (2013).
# Create an empty dictionary
content_multivariate = {}
# List of subject identifiers
content_multivariate['subject_list'] = subject_list
# List of session identifiers
content_multivariate['session_list'] = session_list
# List of spatial filters (smoothing) that were used during functional preprocessing
content_multivariate['filters_spatial'] = content_func['filters_spatial']
# List of temporal filters that were used during functional preprocessing
content_multivariate['filters_temporal'] = content_func['filters_temporal']
# Specify a particular multivariate postfix
content_multivariate['multivariate_postfix'] = ''
# Specify which classifier to use, options are one or many of:
# 'LinearCSVMC', 'LinearNuSVMC', 'RbfCSVMC', 'RbfNuSVMC', 'SMLR', 'kNN', 'GNB'
content_multivariate['clf_names'] = ['LinearNuSVMC']
# Searchlight sphere radius (in voxels), i.e. number of additional voxels
# next to the center voxel. E.g. sphere_radius = 3 means radius = 3.5*voxelsize
content_multivariate['sphere_radius'] = 3
# Comment 01: ~10mm radius seems to be a good starting point
# Comment 02: How many voxels are in a sphere of a given radius?
# Radius = 2 -> Voxel in Sphere = 33
# Radius = 3 -> Voxel in Sphere = 123
# Radius = 4 -> Voxel in Sphere = 257
# Radius = 5 -> Voxel in Sphere = 515
# Radius = 6 -> Voxel in Sphere = 925
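# Illustrative check of the voxel counts listed above (not part of the original
# specification): count the lattice points within a Euclidean distance of `r`
# voxels from the sphere center
for r in range(2, 7):
    n_vox = sum(1 for x in range(-r, r + 1) for y in range(-r, r + 1)
                for z in range(-r, r + 1) if x**2 + y**2 + z**2 <= r**2)
    print('Radius = %d -> Voxels in sphere = %d' % (r, n_vox))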
# Step size for defining sphere centers, i.e. a value of 5 means
# that only every 5th voxel is used as a searchlight sphere center
content_multivariate['sphere_steps'] = 3
# Number of chunks to use for the N-Fold crossvalidation
# (needs to divide the number of labels without remainder)
content_multivariate['n_chunks'] = len(runs)
# Which classifications should be performed? (separated by task)
# Each classification target is a tuple of two tuples, indicating
# the target classification to train on and the one to test on
content_multivariate['tasks'] = {}

for t in task_id:

    # Collect first event file of a given task
    event_file = layout.get(return_type='file', suffix='events', task=t)[0]

    # Collect unique condition names
    import pandas as pd
    df = pd.read_csv(event_file, sep='\t')
    condition_names = [str(c) for c in np.unique(df['trial_type'])]

    # Create all pairwise combinations of conditions as classification targets
    import itertools
    target_list = list(itertools.combinations(condition_names, 2))

    # Store everything in the multivariate dictionary
    if len(target_list) != 0:
        content_multivariate['tasks'][t] = [z for z in zip(target_list, target_list)]
# Show the proposed classification targets
from pprint import pprint
pprint(content_multivariate['tasks'])
# Number of permutations to indicate group-analysis strategy:
# n_perm <= 1: group analysis is classical 2nd-level GLM analysis
# n_perm > 1: group analysis is multivariate analysis according to Stelzer et al. (2013)
content_multivariate['n_perm'] = 100
# Number of bootstrap samples to be generated
content_multivariate['n_bootstrap'] = 100000
# Number of elements per segment used to compute the feature-wise NULL distributions
# A low number means low memory demand but slow computation time
content_multivariate['block_size'] = 1000
# Feature-wise probability threshold per voxel
content_multivariate['threshold'] = 0.001
# Strategy for multiple comparison correction, options are: 'bonferroni', 'sidak',
# 'holm-sidak', 'holm', 'simes-hochberg', 'hommel', 'fdr_bh', 'fdr_by', None
content_multivariate['multicomp_correction'] = 'fdr_bh'
# Family-wise error rate threshold for multiple comparison correction
content_multivariate['fwe_rate'] = 0.05
# Name of atlases to use for creation of output tables
content_multivariate['atlasreader_names'] = 'default'
# Probability threshold to use for output tables
content_multivariate['atlasreader_prob_thresh'] = 5
JSON specification file
The multivariate analysis parameters will be stored in a common JSON file, as before:
# Number of parallel jobs to use during the multivariate analysis
import multiprocessing
n_proc = multiprocessing.cpu_count() - 1
content_multivariate['n_parallel_jobs'] = n_proc
import json
file_id = 'fmriflows_spec_multivariate'
with open('/data/%s.json' % file_id, 'w') as f:
json.dump(content_multivariate, f, indent=4)
content_multivariate