Objectives
To get a grasp of common utilities for loading and saving data in common formats. Upon completion of this class you should be able to read and write CSV files, pickle Python objects, load tabular data with NumPy, and open neuroimaging files with NiBabel.
Tabular data is often stored in CSV files, of which we have a small fake example:
!head files/exp_res.csv
The built-in csv module can be a big help in loading them:
import csv
with open('files/exp_res.csv') as f:
    entries = list(csv.DictReader(f))
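Note that DictReader yields one dict per row, keyed by the header fields, and every value comes back as a string. A minimal self-contained sketch (the file path and column names Subject, Question, RT are assumed here for illustration, matching the dtype used later):

```python
import csv

# Write a tiny CSV in the assumed format of files/exp_res.csv
with open('/tmp/exp_res_demo.csv', 'w') as f:
    f.write("Subject,Question,RT\n1,1,0.95\n1,2,1.30\n")

# Each row becomes a dict keyed by the header; values are strings
with open('/tmp/exp_res_demo.csv') as f:
    entries = list(csv.DictReader(f))

print(entries[0])  # {'Subject': '1', 'Question': '1', 'RT': '0.95'}
```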
These entries can later be saved back:
with open('files/exp_res_saved.csv', 'w') as f:
    writer = csv.DictWriter(f, entries[0].keys())
    writer.writeheader()
    writer.writerows(entries)
!head files/exp_res_saved.csv
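If your rows are plain sequences rather than dicts, the simpler csv.writer does the job; a small sketch with made-up values:

```python
import csv

# Header row followed by data rows, as plain tuples
rows = [('Subject', 'Question', 'RT'), (1, 1, 0.95), (1, 2, 1.30)]

# csv.writer works on any sequence of sequences -- no dicts needed
with open('/tmp/exp_res_plain.csv', 'w', newline='') as f:
    csv.writer(f).writerows(rows)

with open('/tmp/exp_res_plain.csv') as f:
    content = f.read()
print(content)
```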
Exercise
In HW2/HW3 we stored responses and response times in our custom format. Revisit your HW3 (you must have submitted it by now; if not, HW2) and add a command-line option to store the same information in CSV format. Hint: you don't have to use DictWriter
;-)
And for any Python data type, no matter how deeply nested, we can pickle the data structure (the fancy word is "serialize") and store it in a file:
import pickle
with open("files/exp_res.pickle", "wb") as f:
    pickle.dump(entries, f)
so the data can be stored in a file and reloaded later on:
with open("files/exp_res.pickle", "rb") as f:
    entries_in_winter = pickle.load(f)
print(entries_in_winter == entries)
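pickle also round-trips arbitrarily nested structures, and dumps/loads let you serialize to bytes in memory without touching a file; a quick sketch with made-up data:

```python
import pickle

# A nested structure: dict of lists and of tuples-keyed dicts
data = {'subjects': [1, 2], 'rt': {(1, 'q1'): [0.95, 1.30]}}

blob = pickle.dumps(data)      # serialize to a bytes object
restored = pickle.loads(blob)  # deserialize it back
print(restored == data)        # True
```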
Pickling is convenient, but the format is Python-specific, not human-readable, and unpickling untrusted data is unsafe. For numeric tabular data, NumPy offers alternatives:
import numpy as np
NumPy's loadtxt provides a helper to load tabular numeric data from a text file:
a = np.loadtxt('files/exp_res.csv', delimiter=',', skiprows=1)
print(a[:2])
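loadtxt has a counterpart, np.savetxt, for writing arrays back out; a self-contained sketch with made-up values (column names assumed):

```python
import numpy as np

a = np.array([[1, 1, 0.95], [1, 2, 1.30]])

# header writes the column line; comments='' keeps it from being
# prefixed with '#', so loadtxt(skiprows=1) can read it back
np.savetxt('/tmp/exp_res_out.csv', a, delimiter=',',
           header='Subject,Question,RT', comments='', fmt='%g')

b = np.loadtxt('/tmp/exp_res_out.csv', delimiter=',', skiprows=1)
print(np.allclose(a, b))  # True
```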
And gurus can use structured data types (which I successfully avoided describing in the previous lecture):
a = np.loadtxt('files/exp_res.csv', delimiter=',', skiprows=1,
               dtype=[('Subject', int), ('Question', int), ('RT', float)])
print(a[:2])
print(a["Question"])
print(a[["Question", "RT"]])
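To see what structured arrays buy you without reading a file, you can also build one directly; a sketch with made-up values:

```python
import numpy as np

# Each row is a tuple matching the named fields of the dtype
a = np.array([(1, 1, 0.95), (1, 2, 1.30)],
             dtype=[('Subject', int), ('Question', int), ('RT', float)])

rt_mean = a['RT'].mean()     # field access gives a plain float array
print(rt_mean)               # 1.125
print(a[['Subject', 'RT']])  # multi-field selection keeps the structure
```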
Exercise
Choose any method from above (csv, numpy) to load the sample data and compute the per-question mean/std. Do not assume that the entries are ordered in any way. You might find the "np.unique" function useful.
NiBabel is the holy grail library for anyone doing neuroimaging in Python. It provides I/O for a variety (though not yet all) of neuroimaging formats, making access to data and meta-information very easy.
import nibabel as nib
The commands below use git-annex to download the tutorial data for PyMVPA. If you don't have git-annex installed, you would benefit from installing it. Otherwise, the same data is available as a tarball from http://data.pymvpa.org/datasets/tutorial_data -- download and extract it.
!git clone http://data.pymvpa.org/datasets/tutorial_data/.git
!cd tutorial_data/data; git annex get anat.nii.gz mask_* bold.nii.gz
To open any neuroimaging data file, use nib.load:
anat = nib.load('tutorial_data/data/anat.nii.gz')
which opens the file and loads the meta-information but not the actual data, which you can load on demand as a numpy array:
anat_data = anat.get_fdata()
You can now analyze the data like any other numpy array and, whenever you are done, save it as another NIfTI image. E.g., let's perform a poor man's skull stripping,
with which we remove all voxels carrying values less than 50:
anat_data[anat_data < 50] = 0
nib.Nifti1Image(anat_data, anat.affine, anat.header).to_filename('/tmp/anat_betted.nii.gz')
Exercise
Load the functional volume (bold.nii.gz) and one of the masks. Apply the mask (zero out non-mask voxels) and save the resulting masked volume as a new bold_masked.nii.gz.