Objectives
To get a grasp of common utilities for loading and saving data in common formats. Upon completion of this class you should be able to read and write CSV files, pickle Python objects, load tabular data with NumPy, and open neuroimaging files with NiBabel.
Tabular data is often stored in CSV files, of which we have a small fake example:
!head files/exp_res.csv
The built-in csv module can be a big help in loading them:
import csv
with open('files/exp_res.csv') as f:
    entries = list(csv.DictReader(f))
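Note that DictReader yields one dict per row, keyed by the header fields, and every value comes back as a string. A minimal self-contained sketch (the file path and column names Subject, Question, RT are assumed here for illustration, matching the dtype used later):

```python
import csv

# Write a tiny CSV in the assumed format of files/exp_res.csv
with open('/tmp/exp_res_demo.csv', 'w') as f:
    f.write("Subject,Question,RT\n1,1,0.95\n1,2,1.30\n")

# Each row becomes a dict keyed by the header; values are strings
with open('/tmp/exp_res_demo.csv') as f:
    entries = list(csv.DictReader(f))

print(entries[0])  # {'Subject': '1', 'Question': '1', 'RT': '0.95'}
```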
These entries can later be saved back:
with open('files/exp_res_saved.csv', 'w') as f:
    writer = csv.DictWriter(f, entries[0].keys())
    writer.writeheader()
    writer.writerows(entries)
!head files/exp_res_saved.csv
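If your rows are plain sequences rather than dicts, the simpler csv.writer does the job; a small sketch with made-up values:

```python
import csv

# Header row followed by data rows, as plain tuples
rows = [('Subject', 'Question', 'RT'), (1, 1, 0.95), (1, 2, 1.30)]

# csv.writer works on any sequence of sequences -- no dicts needed
with open('/tmp/exp_res_plain.csv', 'w', newline='') as f:
    csv.writer(f).writerows(rows)

with open('/tmp/exp_res_plain.csv') as f:
    content = f.read()
print(content)
```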
Exercise
In HW2/HW3 we stored responses and response times in our custom format. Revisit your HW3 (you must have submitted it by now; if not, HW2) and add a command-line option to store the same information in CSV format. Hint: you don't have to use DictWriter
;-)
And for any Python data type, no matter how deeply nested, we can pickle the data structure (the fancy word is "serialize") and store it in a file:
import pickle
with open("files/exp_res.pickle", "wb") as f:
    pickle.dump(entries, f)
so the data can be stored in a file and reloaded later on:
with open("files/exp_res.pickle", "rb") as f:
    entries_in_winter = pickle.load(f)
print(entries_in_winter == entries)
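pickle also round-trips arbitrarily nested structures, and dumps/loads let you serialize to bytes in memory without touching a file; a quick sketch with made-up data:

```python
import pickle

# A nested structure: dict of lists and of tuples-keyed dicts
data = {'subjects': [1, 2], 'rt': {(1, 'q1'): [0.95, 1.30]}}

blob = pickle.dumps(data)      # serialize to a bytes object
restored = pickle.loads(blob)  # deserialize it back
print(restored == data)        # True
```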
Pickling is convenient, but the format is Python-specific, not human-readable, and unpickling untrusted data is unsafe. For numeric tabular data, NumPy offers alternatives:
import numpy as np
NumPy's loadtxt provides a helper to load tabular numeric data from a text file:
a = np.loadtxt('files/exp_res.csv', delimiter=',', skiprows=1)
print(a[:2])
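loadtxt has a counterpart, np.savetxt, for writing arrays back out; a self-contained sketch with made-up values (column names assumed):

```python
import numpy as np

a = np.array([[1, 1, 0.95], [1, 2, 1.30]])

# header writes the column line; comments='' keeps it from being
# prefixed with '#', so loadtxt(skiprows=1) can read it back
np.savetxt('/tmp/exp_res_out.csv', a, delimiter=',',
           header='Subject,Question,RT', comments='', fmt='%g')

b = np.loadtxt('/tmp/exp_res_out.csv', delimiter=',', skiprows=1)
print(np.allclose(a, b))  # True
```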
And gurus can use structured data types (which I successfully avoided describing in the previous lecture):
a = np.loadtxt('files/exp_res.csv', delimiter=',', skiprows=1,
               dtype=[('Subject', int), ('Question', int), ('RT', float)])
print(a[:2])
print(a["Question"])
print(a[["Question", "RT"]])
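To see what structured arrays buy you without reading a file, you can also build one directly; a sketch with made-up values:

```python
import numpy as np

# Each row is a tuple matching the named fields of the dtype
a = np.array([(1, 1, 0.95), (1, 2, 1.30)],
             dtype=[('Subject', int), ('Question', int), ('RT', float)])

rt_mean = a['RT'].mean()     # field access gives a plain float array
print(rt_mean)               # 1.125
print(a[['Subject', 'RT']])  # multi-field selection keeps the structure
```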
Exercise
Choose any method from above (csv, numpy) to load the sample data and compute the per-question mean/std. Do not assume that the entries are ordered in any way. You might find the "np.unique" function useful.
NiBabel is the holy grail library for anyone doing neuroimaging in Python. It provides I/O for a variety (though not yet all) of neuroimaging formats, making access to data and meta-information very easy.
import nibabel as nib
The commands below use git-annex to download the tutorial data for PyMVPA. If you don't have git-annex installed, you would benefit from installing it. Otherwise, the same data is available as a tarball from http://data.pymvpa.org/datasets/tutorial_data -- download and extract it.
!git clone http://data.pymvpa.org/datasets/tutorial_data/.git
!cd tutorial_data/data; git annex get anat.nii.gz mask_* bold.nii.gz
To open any neuroimaging data file, use nib.load:
anat = nib.load('tutorial_data/data/anat.nii.gz')
which opens the file and loads the meta-information but not the actual data, which you can load on demand as a numpy array:
anat_data = anat.get_fdata()
You can now analyze the data like any other numpy array and, whenever you are done, save it as another NIfTI image. E.g., let's perform a poor man's skull stripping,
with which we remove all voxels carrying values less than 50:
anat_data[anat_data < 50] = 0
nib.Nifti1Image(anat_data, anat.affine, anat.header).to_filename('/tmp/anat_betted.nii.gz')
Exercise
Load the functional volume (bold.nii.gz) and one of the masks. Apply the mask (zero out non-mask voxels) and save the resulting masked volume as a new bold_masked.nii.gz.