fMRIflows is a consortium of many (dependent) fMRI analysis pipelines, including anatomical and functional pre-processing, univariate 1st and 2nd-level analysis, as well as multivariate pattern analysis.
This notebook will help you set up the JSON specification files to run the individual analyses.
First, let's see what we've got:
from bids.layout import BIDSLayout
layout = BIDSLayout("/data/")
subject_list = sorted(layout.get_subjects())
subject_list
session_list = layout.get_sessions()
session_list
runs = layout.get_runs()
runs
task_id = sorted(layout.get_tasks())
task_id
To get information such as the TR and voxel resolution of the functional images, let's first collect the functional images of all subjects (of the first task).
# List of functional images
func_files = layout.get(datatype='func', return_type='file', extension='.nii.gz',
                        suffix='bold', task=task_id[0])
func_files
Next, let's collect the TR and voxel resolution of all functional images (of the first task), over all subjects, sessions and runs.
import numpy as np
import nibabel as nb
resolution = np.array([nb.load(f).header.get_zooms() for f in func_files])
resolution
# Get median TR of all collected functional images
TR = np.median(resolution[:, 3])
TR
# Get median voxel resolution of all collected functional images
vox_res = [round(r, 3) for r in np.median(resolution[:, :3], axis=0)]
vox_res
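As an optional cross-check, the TR can also be read from the BIDS JSON sidecar metadata via pybids. This is just a sketch and assumes the sidecar files include a RepetitionTime field:
# Optional cross-check: read the TR (in seconds) from the BIDS sidecar metadata
TR_meta = [layout.get_metadata(f).get('RepetitionTime') for f in func_files]
TR_meta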
Now that we know the content of our dataset, let's write the specification file for the anatomical and functional preprocessing workflow.
For the anatomical preprocessing workflow, we need only a few parameters:
# Create an empty dictionary
content_anat = {}
# List of subject identifiers
content_anat['subject_list_anat'] = subject_list
# List of session identifiers
content_anat['session_list_anat'] = session_list
# T1w image identifier (default: T1w)
content_anat['T1w_id'] = 'T1w'
# Voxel resolution of reference template
content_anat['res_norm'] = [1.0, 1.0, 1.0]
# Should ANTs normalization use a 'fast' or a 'precise' approach (default: 'precise')
content_anat['norm_accuracy'] = 'precise'
To make sure that everything is as we want it, let's print the parameters for the anatomical preprocessing pipeline again:
content_anat
For the functional preprocessing workflow, we need a few more parameters:
# Create an empty dictionary
content_func = {}
# List of subject identifiers
content_func['subject_list_func'] = subject_list
# List of session identifiers
content_func['session_list_func'] = session_list
# List of task identifiers
content_func['task_list'] = task_id
# List of run identifiers
content_func['run_list'] = runs
# Reference timepoint for slice time correction (in ms)
content_func['ref_timepoint'] = int(round((TR * 1000.) / 2.0, 3))
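# e.g. with a TR of 2.0 s this gives a reference timepoint of 1000 ms (middle of the volume)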
# Requested isometric voxel resolution after coregistration of functional images
content_func['res_func'] = round(np.median(vox_res).astype('float64'), 3)
# List of spatial filters (smoothing) to apply (separately, i.e. with iterables)
# Values are given in mm
content_func['filters_spatial'] = [["LP", 3. * content_func['res_func']]]
# List of temporal filters to apply (separately, i.e. with iterables)
# Values are given in seconds
content_func['filters_temporal'] = [[None, 100.]]
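Because these filter lists are iterables, each entry is processed as a separate branch. Purely as a hypothetical illustration (the values below are not recommendations), multiple filter settings could be listed like this:
# Hypothetical example: two spatial smoothing kernels (in mm) and two
# temporal filter settings (in seconds), each processed separately
# content_func['filters_spatial'] = [["LP", 6.0], ["LP", 9.0]]
# content_func['filters_temporal'] = [[None, 100.], [None, 128.]]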
And for the confound sub-workflow, we need to specify the number of CompCor components that should be computed, the thresholds to detect outliers in FD, DVARS, TV, GM, WM and CSF, as well as the number of independent components that should be extracted from the preprocessed signal.
# Number of CompCor components to compute
content_func['n_compcor_confounds'] = 5
# Threshold for outlier detection (3.27 represents a threshold of 99.9%)
content_func['outlier_thresholds'] = [3.27, 3.27, 3.27, None, None, None]
# Number of independent components to compute
content_func['n_independent_components'] = 10
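As a purely illustrative check, the thresholds can be paired with the outlier metrics listed above to make their order explicit:
# Pair each outlier metric with its threshold
# (order as described above: FD, DVARS, TV, GM, WM, CSF)
dict(zip(['FD', 'DVARS', 'TV', 'GM', 'WM', 'CSF'],
         content_func['outlier_thresholds']))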
To make sure that everything is as we want it, let's print the parameters for the functional preprocessing pipeline again:
content_func
JSON specification file
We will be using one common JSON file for the specifications for the anatomical and functional preprocessing pipelines. The creation of this file is rather simple:
content = {}
content.update(content_anat)
content.update(content_func)
The only thing that we're still missing is the number of parallel processes that we want to allow during the execution of the preprocessing workflows:
# Number of parallel jobs to use during the preprocessing
import multiprocessing
n_proc = multiprocessing.cpu_count() - 1
n_proc
content['n_parallel_jobs'] = n_proc
Now, we're ready to write the content to a JSON file. By default the filename is fmriflows_spec_preproc.json:
import json
file_id = 'fmriflows_spec_preproc'
with open('/data/%s.json' % file_id, 'w') as f:
json.dump(content, f, indent=4)
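If you want to verify what was written, the file can simply be loaded again. This is an optional check (spec_check is just a throwaway variable introduced here), assuming the file was written to /data/:
# Optional check: read the specification file back in to verify its content
with open('/data/%s.json' % file_id, 'r') as f:
    spec_check = json.load(f)
spec_check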
Once the anatomical and functional preprocessing pipelines have been run, we can move on to the 1st-level univariate and multivariate analyses. For this we need to collect the different conditions and corresponding contrasts for each task.
So let's explore the dataset:
# Create an empty dictionary
content_analysis = {}
content_analysis['tasks'] = {}
for t in task_id:

    # Collect the event files of a given task and find the first one with conditions
    event_files = layout.get(return_type='file', suffix='events', task=t)
    idx = 0
    condition_names = []
    while len(condition_names) == 0 and idx < len(event_files):
        event_file = event_files[idx]

        # Collect unique condition names
        import pandas as pd
        df = pd.read_csv(event_file, sep='\t')
        condition_names = [str(c) for c in np.unique(df['trial_type'])]
        idx += 1

    # Create contrasts (unique and versus rest)
    contrasts = []

    # Adding unique contrasts, i.e. [1, 0, 0, 0, ...]
    eye_matrix = np.eye(len(condition_names)).tolist()
    contrasts += [[c, eye_matrix[i], 'T'] for i, c in enumerate(condition_names)]

    # Add one vs. rest contrasts, i.e. [1, -0.25, -0.25, -0.25, -0.25]
    try:
        rest_matrix = np.copy(eye_matrix)
        rest_matrix[rest_matrix == 0] = np.round(-1. / (len(condition_names) - 1), 4)
        contrasts += [['%s_vs_rest' % c, rest_matrix[i].tolist(), 'T']
                      for i, c in enumerate(condition_names)]
    except ZeroDivisionError:
        pass

    # Store the task specific information in a dictionary
    content_task = {}
    content_task['condition_names'] = condition_names

    # Add contrasts to task dictionary
    content_task['contrasts'] = contrasts

    # Store everything in the analysis dictionary
    content_analysis['tasks'][t] = content_task

    # Processing feedback
    print('Task \'%s\' finished.' % t)
So what do we have so far?
from pprint import pprint
pprint(content_analysis['tasks'])
If you want to add some additional contrasts, either add them to content_analysis['tasks'][task_id]['contrasts'] or directly adapt the content of the fmriflows_spec_analysis.json file once this section has been executed.
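For example, a differential contrast between two conditions could be appended in the same [name, weights, type] format used above. The snippet below is only a hypothetical sketch: 'cond_A' and 'cond_B' are made-up names assumed to be the first two of three conditions of the first task:
# Hypothetical example: add a differential contrast 'cond_A > cond_B'
# (the weight vector must have one entry per condition of that task)
# content_analysis['tasks'][task_id[0]]['contrasts'].append(
#     ['cond_A_vs_cond_B', [1, -1, 0], 'T'])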
The next section specifies additional model parameters. The assumption is that all tasks use the same parameters. If this is not the case, specify multiple fmriflows_spec_analysis.json files and run them individually in the 03_analysis_1st-level notebook.
# List of subject identifiers
content_analysis['subject_list'] = subject_list
# List of session identifiers
content_analysis['session_list'] = session_list
# List of spatial filters (smoothing) that were used during functional preprocessing
content_analysis['filters_spatial'] = content_func['filters_spatial']
# List of temporal filters that were used during functional preprocessing
content_analysis['filters_temporal'] = content_func['filters_temporal']
# Nuisance identifiers that should be included in the GLM
content_analysis['nuisance_regressors'] = ['Rotation', 'Translation', 'FD', 'DVARS']
# Whether outliers detected during functional preprocessing should be included in the GLM
content_analysis['use_outliers'] = True
# Serial Correlation Model to use: 'AR(1)', 'FAST' or 'none'
content_analysis['model_serial_correlations'] = 'AR(1)'
# Model bases to use: 'hrf', 'fourier', 'fourier_han', 'gamma' or 'fir'
# If 'hrf', also specify if time and dispersion derivatives should be used
content_analysis['model_bases'] = {'hrf': {'derivs': [0, 0]}}
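# (A hypothetical alternative: include both temporal and dispersion derivatives
#  via content_analysis['model_bases'] = {'hrf': {'derivs': [1, 1]}})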
# Estimation Method to use: 'Classical', 'Bayesian' or 'Bayesian2'
content_analysis['estimation_method'] = {'Classical': 1}
# Specify if contrasts should be normalized to template space after estimation
content_analysis['normalize'] = True
# Specify voxel resolution of normalized contrasts
content_analysis['norm_res'] = [1, 1, 1]
# Specify if a contrast should be computed for each stimulus category per run
content_analysis['con_per_run'] = True
# Specify voxel resolution of normalized contrasts for multivariate analysis
content_analysis['norm_res_multi'] = [float(v) for v in vox_res]
# Specify a particular analysis postfix
content_analysis['analysis_postfix'] = ''
Some of the parameters for the 2nd-level analysis can be taken directly from the 1st-level analysis parameters, e.g. subject names, filters, etc. Here we specify some additional parameters unique to the 2nd-level analysis. They will be stored together with the 1st-level parameters in a file called fmriflows_spec_analysis.json. To run the 2nd-level analysis, run the 04_analysis_2nd-level notebook.
# Specify value to threshold gray matter probability template to create 2nd-level mask
content_analysis['gm_mask_thr'] = 0.1
# Value for initial thresholding to define clusters
content_analysis['height_threshold'] = 0.001
# Whether to use FWE (Bonferroni) correction for initial threshold
content_analysis['use_fwe_correction'] = False
# Minimum cluster size in voxels
content_analysis['extent_threshold'] = 5
# Whether to use FDR correction over cluster extent probabilities
content_analysis['use_topo_fdr'] = True
# P-threshold to use on FDR-corrected cluster size probabilities
content_analysis['extent_fdr_p_threshold'] = 0.05
# Name of atlases to use for creation of output tables
content_analysis['atlasreader_names'] = 'default'
# Probability threshold to use for output tables
content_analysis['atlasreader_prob_thresh'] = 5
JSON specification file
As in the preprocessing example, we will be using one common JSON file for the specifications of the 1st-level and 2nd-level analysis. The creation of this file is as before:
# Number of parallel jobs to use during the analysis
import multiprocessing
n_proc = multiprocessing.cpu_count() - 1
content_analysis['n_parallel_jobs'] = n_proc
import json
file_id = 'fmriflows_spec_analysis'
with open('/data/%s.json' % file_id, 'w') as f:
json.dump(content_analysis, f, indent=4)
content_analysis
Once the 1st-level analysis has been run with the parameter con_per_run set to True, you can run a multivariate searchlight analysis. Keep in mind that this might take some time, especially if you want to perform a group analysis as proposed in Stelzer et al. (2013).
# Create an empty dictionary
content_multivariate = {}
# List of subject identifiers
content_multivariate['subject_list'] = subject_list
# List of session identifiers
content_multivariate['session_list'] = session_list
# List of spatial filters (smoothing) that were used during functional preprocessing
content_multivariate['filters_spatial'] = content_func['filters_spatial']
# List of temporal filters that were used during functional preprocessing
content_multivariate['filters_temporal'] = content_func['filters_temporal']
# Specify a particular multivariate postfix
content_multivariate['multivariate_postfix'] = ''
# Specify which classifier to use, options are one or many of:
# 'LinearCSVMC', 'LinearNuSVMC', 'RbfCSVMC', 'RbfNuSVMC', 'SMLR', 'kNN', 'GNB'
content_multivariate['clf_names'] = ['LinearNuSVMC']
# Searchlight sphere radius (in voxels), i.e. number of additional voxels
# next to the center voxel. E.g. sphere_radius = 3 means radius = 3.5*voxelsize
content_multivariate['sphere_radius'] = 3
# Comment 01: ~10mm radius seems to be a good starting point
# Comment 02: How many voxels are in a sphere of a given radius?
# Radius = 2 -> Voxel in Sphere = 33
# Radius = 3 -> Voxel in Sphere = 123
# Radius = 4 -> Voxel in Sphere = 257
# Radius = 5 -> Voxel in Sphere = 515
# Radius = 6 -> Voxel in Sphere = 925
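# Illustrative check of the voxel counts listed above (not part of the original
# specification): count the lattice points within a Euclidean distance of `r`
# voxels from the sphere center
for r in range(2, 7):
    n_vox = sum(1 for x in range(-r, r + 1) for y in range(-r, r + 1)
                for z in range(-r, r + 1) if x**2 + y**2 + z**2 <= r**2)
    print('Radius = %d -> Voxels in sphere = %d' % (r, n_vox))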
# Step size for defining sphere centers, i.e. a value of 5 means
# that only every 5th voxel is used as a searchlight sphere center
content_multivariate['sphere_steps'] = 3
# Number of chunks to use for the N-Fold crossvalidation
# (needs to divide the number of labels without remainder)
content_multivariate['n_chunks'] = len(runs)
# Which classifications should be performed? (separated by task)
# Each classification target is a tuple of two tuples, indicating
# the target classification to train on and the one to test on
content_multivariate['tasks'] = {}

for t in task_id:

    # Collect first event file of a given task
    event_file = layout.get(return_type='file', suffix='events', task=t)[0]

    # Collect unique condition names
    import pandas as pd
    df = pd.read_csv(event_file, sep='\t')
    condition_names = [str(c) for c in np.unique(df['trial_type'])]

    # Create all pairwise combinations of conditions as classification targets
    import itertools
    target_list = list(itertools.combinations(condition_names, 2))

    # Store everything in the multivariate dictionary
    if len(target_list) != 0:
        content_multivariate['tasks'][t] = [z for z in zip(target_list, target_list)]
# Show the proposed classification targets
from pprint import pprint
pprint(content_multivariate['tasks'])
# Number of permutations to indicate group-analysis strategy:
# n_perm <= 1: group analysis is classical 2nd-level GLM analysis
# n_perm > 1: group analysis is multivariate analysis according to Stelzer et al. (2013)
content_multivariate['n_perm'] = 100
# Number of bootstrap samples to be generated
content_multivariate['n_bootstrap'] = 100000
# Number of elements per segment used to compute the feature-wise NULL distributions
# A low number means low memory demand but slow computation time
content_multivariate['block_size'] = 1000
# Feature-wise probability threshold per voxel
content_multivariate['threshold'] = 0.001
# Strategy for multiple comparison correction, options are: 'bonferroni', 'sidak',
# 'holm-sidak', 'holm', 'simes-hochberg', 'hommel', 'fdr_bh', 'fdr_by', None
content_multivariate['multicomp_correction'] = 'fdr_bh'
# Family-wise error rate threshold for multiple comparison correction
content_multivariate['fwe_rate'] = 0.05
# Name of atlases to use for creation of output tables
content_multivariate['atlasreader_names'] = 'default'
# Probability threshold to use for output tables
content_multivariate['atlasreader_prob_thresh'] = 5
JSON specification file
The multivariate analysis parameters will be stored in a common JSON file, as before:
# Number of parallel jobs to use during the multivariate analysis
import multiprocessing
n_proc = multiprocessing.cpu_count() - 1
content_multivariate['n_parallel_jobs'] = n_proc
import json
file_id = 'fmriflows_spec_multivariate'
with open('/data/%s.json' % file_id, 'w') as f:
json.dump(content_multivariate, f, indent=4)
content_multivariate