Notebook

In [1]:

# Jupyter Layout
from IPython.core.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))

In [2]:

# Standard Python imports
from tempfile import NamedTemporaryFile
import os
import glob
from scipy.io import loadmat
import numpy as np
import datetime
import numpy as np

In [3]:

# PyNWB imports
from pynwb import NWBFile, get_manager
from pynwb.base import ProcessingModule
from pynwb.misc import UnitTimes, SpikeUnit
from pynwb.ecephys import ElectrodeGroup, Device
from pynwb.epoch import EpochTimeSeries
from pynwb.image import ImageSeries
from pynwb.core import set_parents
from pynwb import NWBHDF5IO

/Users/oruebel/anaconda2/envs/py35/lib/python3.5/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters

In [4]:

# Import helper function to render the HDF5 file
interactive = False

try:
    import sys
    sys.path.append('/Users/oruebel/Devel/nwb/nwb-schema/docs')
    from utils.render import NXGraphHierarchyDescription, HierarchyDescription
    if interactive:
        %matplotlib notebook
        %config InlineBackend.figure_format = 'svg'
        autostop_interact = True
    else:
        %matplotlib inline
    import matplotlib.pyplot as plt    
    VIS_AVAILABLE = True
except ImportError:
    VIS_AVAILABLE = False
    print('DISABLING VISUALIZATION OF FILE HIERARCHY')

Notes:¶

This example is based on https://github.com/NeurodataWithoutBorders/api-python/blob/master/examples/create_scripts/crcns_ret-1.py from H5Gate.

Compared to the NWB files generated by the original example we here use the extension mechanism to add custom data fields rather than adding unspecified custom data directly to the file, i.e., all objects (datasets, attributes, groups etc.) are governed by a formal specification.

Previously pixle_size, meister_x, meister_y, meister_dx, meister_dy were stored as custom datasets in ImageSeries. Here we create an extensions MeisterImageSeries which extens ImageSeries and stores that values as attributes pixel_size, x, y, dx, dy. We here chosse attributes instead of datasets simply because these are small, single int and float metadata values for which attributes are more approbirate.

Compared to the NWB files generated by the original example the files generated here contain the following additional main changes:

Change si_unit attribute to unit for compliance with the spec of ImageSeries
Moved 'source' attribute from the ProcessingModule to the NWBContainer
Added missing 'source' for SpikeUnit
Added missing tags and description for epochs
Added /general/devices/... to describe the device
Added neurodata_type and namespace attributes for format compliance
Added neurodata_type and namespace attributes for format compliance
NWBContainer is now a base type of all core neurodata_types and as such help and source attributes have been added to all core types
The original script reused iterator variables in nested loops. We have updated those occurrence to avoid consusion and avoid possible errors.
The following custom metadata fields---i.e., datasets that were originally added to the file without being part of the NWB specification and without creation of corresponding extensions---have not yet been integrated with the NWB files:
- /general custom metdata: /notes, /random_number_generation, /related_publications. This will require extension of NWBFile to extend the spec of /general. Improvements to make this easier have been proposed for discussion at the upcoming hackathon.
- SpikeUnit custom datasets with additional copies of the per-stimulus spike times (i.e., /processing/Cells/UnitTimes/cell_/stim_ in the original version). This will require an extension for SpikeUnit.
- /subject, subject/genotype, subject/species : See Issue https://github.com/NeurodataWithoutBorders/pynwb/issues/45 support for subject metadata is upcoming in PyNWB
- /specifications, /specifications/nwb_core.py : See Issue https://github.com/NeurodataWithoutBorders/pynwb/issues/44 will be added by PyNWB automatically

For readability and to ease comparison, we provide in https://github.com/NeurodataWithoutBorders/pynwb/blob/dev/docs/notebooks/convert-crcns-ret-1-meisterlab-compare-nwb-1.0.6.ipynb a notebook that shows the original script for converting to NWW 1.0.x.

Step 1: Download the Example Data¶

A tar file with the example data is available for download from:

https://portal.nersc.gov/project/crcns/download/nwb-1/example_script_data/source_data_2.tar.gz

Please download and uncompress the data file.

Step 2: Define the location of the example data¶

In [5]:

indata_dir = "/Volumes/Data Drive/nwb_convert_tutorial/api-python/examples/source_data_2/crcns_ret-1"

Step 3: Define the location for the output NWB file¶

In [6]:

output_dir = '/Volumes/Data Drive/nwb_convert_tutorial/convert_test2/nwb2'     # Define name of the output file. 
stim_lib_dir = os.path.join(output_dir, 'stim_lib')
if not os.path.exists(stim_lib_dir):
    os.makedirs(stim_lib_dir)

Step 4: Reading the input data¶

The data read portion of this tutorial has been adapted from https://github.com/NeurodataWithoutBorders/api-python/blob/master/examples/create_scripts/crcns_ret-1.py (July 24.2017) . Large portions have been copied from that script (and modified as needed) to ease use of this example. We here separate the data read from the data write to be illustrate which parts of the code are actually relevant for reading the input data (this section) vs. which parts are specific to the data write (see Section 5). Alternatively, we could naturally also directly read the data to PyNWB NWBContainer classes.

4.1 Input data settings¶

In [7]:

# Create the list of file names
data_dir = os.path.join(indata_dir, 'Data')
file_list = glob.glob( data_dir + "/*.mat")

4.2 Read helper functions¶

In [8]:

def find_exp_time(fname, data_info):
    try:
        d = data_info["date"][0]
    except IndexError:
        print ("Warning: Unable to read recording date from " + fname)
        return ""
    try:
        rst = data_info["RecStartTime"][0]
    except IndexError:
        print ("Warning: Unable to read recording start time from " + fname)
        return ""
    time = d + str(rst[3]) + str(rst[4]) + str(rst[5])
    dt=datetime.datetime.strptime((time), "%Y%m%d%H%M%S")
    return dt.strftime("%a %b %d %Y %H:%M:%S")
    
def find_stim_times(fname, data_info):
    try:
        d = data_info["date"][0]
    except IndexError:
        print ("Error: Unable to read recording date from " + fname)
        assert False
    try:
        rst = data_info['RecStartTime']
    except IndexError:
        print ("Error: Unable to read recording start time from " + fname)
        assert False
    dt = []
    for i in range(len(rst)):
        time = d + str(rst[i][3]) + str(rst[i][4]) + str(rst[i][5])
        dt.append(datetime.datetime.strptime((time), "%Y%m%d%H%M%S"))
    for i in range(1,len(rst)):
        dt[i] = (dt[i] - dt[0]).total_seconds()
    dt[0] = 0.0
    return dt

def create_stim_ident(x, y, dx, dy):
    return "%dx%d_%dx%d" % (x, y, dx, dy)

def create_stimulus_file(fname, seed, x, y, dx, dy, indata_dir):
    import h5py
    print ("Creating stimulus file " + fname)
    ident = create_stim_ident(x, y, dx, dy)
    n_pixels = stim_pixel_cnt[ident]
    data = np.zeros(n_pixels)
    with open(indata_dir + "/ran1.bin", 'rb') as f:    # MODIFIED: Fix path to fit the notebook
        for i in range(int(n_pixels/8)):  #py3, added int
            byte = f.read(1)
            for j in reversed(range(8)):
                data[i*8+j] = 255*(ord(byte) >> j & 1)
            # for j in range(8):
            #     data[i*8+j] = 255 * ((ord(byte) >> (7-j)) & 1)
    nx = int(x/dx)  # py3, added int
    ny = int(y/dy)  # py3, added int
    n_frames = int(n_pixels / (nx * ny))  # py3, added int
    # reshape stimulus
    datar = np.reshape(data, (n_frames, ny, nx))
    h5 = h5py.File(fname, 'w')
    grp = h5.create_dataset("image_stack", data=datar, compression=True, dtype='uint8')
    grp.attrs["unit"] = "grayscale"
    grp.attrs["resolution"] = "1"
    grp.attrs["conversion"] = "1"
    h5.close()

def fetch_stimulus_link(seed, x, y, dx, dy, indata_dir, stim_lib_dir):
    # see if stack already exists. if not, create it
    fname = os.path.join(stim_lib_dir, "%s_%d.h5" % (create_stim_ident(x, y, dx, dy), seed))
    if not os.path.isfile(fname):
        # need to create file
        create_stimulus_file(fname, seed, x, y, dx, dy, indata_dir)
    return fname, "image_stack"

4.3 Create Electrode Metadata¶

In [9]:

channel_coordinates = []
device_name = "61-channel probe"
# using 61-channel configuration from Mesiter et al., 1994
# NOTE: this is an estimate and is used for example purposes only
for i in range(5): channel_coordinates.append([(140+70*i),   0, 0])
for i in range(6): channel_coordinates.append([(105+70*i),  60, 0])
for i in range(7): channel_coordinates.append([( 70+70*i), 120, 0])
for i in range(8): channel_coordinates.append([( 30+70*i), 180, 0])
for i in range(9): channel_coordinates.append([(    70*i), 240, 0])
for i in range(8): channel_coordinates.append([( 30+70*i), 300, 0])
for i in range(7): channel_coordinates.append([( 70+70*i), 360, 0])
for i in range(6): channel_coordinates.append([(105+70*i), 420, 0])
for i in range(5): channel_coordinates.append([(140+70*i), 480, 0])
channel_coordinates = np.asarray(channel_coordinates)   

electrode_meta = {
    'name': "61-channel_probe", 
    'channel_coordinates': channel_coordinates,
    'channel_description': ['none'] * 61,
    'channel_location': ['undefined'] * 61,
    'channel_filtering': ['undefined'] * 61,
    'channel_impedance': ['undefined'] * 61,
    'description': "Approximation of electrode array used in experiment based on Mester, et. al., 1994 (DEMO DATA)",
    'location': "Retina flatmount recording",
    'device': "----"  
}

4.4 Getting stimulus frame requirements¶

In [10]:

stim_pixel_cnt = {}
for filename in file_list:
    fname = os.path.basename(filename)
    mfile = loadmat(filename, struct_as_record=True)
    data_info = mfile['datainfo'][0,0]
    stim_offset = find_stim_times(fname, data_info)
    for i in range(len(stim_offset)):
        # create stimuli
        # read data from mat file
        stimulus = mfile["stimulus"][0,i]
        n_frames = stimulus["Nframes"][0,0]
        # figure out how many pixels per frame
        x = stimulus["param"]["x"][0,0][0,0]
        y = stimulus["param"]["y"][0,0][0,0]
        dx = stimulus["param"]["dx"][0,0][0,0]
        dy = stimulus["param"]["dy"][0,0][0,0]
        nx = int(x/dx)
        ny = int(y/dy)
        n_pix = n_frames * nx * ny
        # remember maximum pixel count for this stimulus type
        name = create_stim_ident(x, y, dx, dy)
        if name in stim_pixel_cnt:
            n = stim_pixel_cnt[name]
            if n_pix > n:
                stim_pixel_cnt[name] = n_pix
        else:
            stim_pixel_cnt[name] = n_pix

4.5 Read file metadata, spike data, and stimulus data¶

In [11]:

# The following dicts use filenames as keys and store for each file a dict with the 
# associated file metadta, stimulus data, and spike data, respectively
per_file_meta = {}              # File metadata : Store for each file a single dict with the metadata about that file 
stimulus_data = {}          # Stimulus data : Store for each file a list of dicts with the metadata about all stimuli
spike_unit_times_data = {}  # Spike data : Store for each file a list of dicts with the metadata about the SpikeUnits in UnitTimes

for filename in file_list:
    print("Processing: %s" % filename)
    fname = os.path.basename(filename)
    mfile = loadmat(filename, struct_as_record=True)
    data_info = mfile['datainfo'][0,0]
    stim_offset = find_stim_times(fname, data_info)
    
    ##################################
    #  Read file metadata
    ##################################
    rec_no = data_info["RecNo"][0]
    smpl_no = data_info["SmplNo"][0]
    animal = str(data_info["animal"][0])
    notes = str(data_info["description"][0])
    session_id = "RecNo: " + str(rec_no) + "; SmplNo: " + str(smpl_no)
    per_file_meta[filename] =  {
        "output_filename": os.path.join(output_dir, "%s.nwb" % str(fname[:-4])),
        "start_time": find_exp_time(os.path.basename(file_list[0]), data_info),
        "identifier": "Meister test data",
        "description": "Optical stimulation and extracellular recording of retina",
        "create_date": datetime.datetime.now(),
        "experimenter": "Yifeng Zhang",                       
        "experiment_description": None,
        "session_id": "RecNo: " + str(rec_no) + "; SmplNo: " + str(smpl_no),
        "institution": "Harvard University",
        "lab": "Markus Meister",
        "related_publications": "Yi-Feng Zhang, Hiroki Asari, and Markus Meister (2014); Multi-electrode recordings from retinal ganglion cells. CRCNS.org. http://dx.doi.org/10.6080/K0RF5RZT",
        "notes": notes,
        "species": "mouse",
        "genotype": animal
    }   
    
    ###################################################
    #  Read stimulus presentation image series data
    ###################################################
    stim_offsets = find_stim_times(fname, data_info)
    curr_stim_data = []
    # get stimulus offset times
    for i in range(len(stim_offsets)):
        stim_dat = {
            "file": filename,
            "index": i,
            "offset": stim_offsets[i]
        }
        # create stimuli
        # read data from mat file
        stimulus = mfile["stimulus"][0,i]
        stim_dat['n_frames'] = stimulus["Nframes"][0,0]
        stim_dat['frame'] = stimulus["frame"][0,0]
        stim_dat['onset'] = stimulus["onset"][0,0]
        stim_dat['type_s'] = stimulus["type"][0]
        stim_dat['seed'] = stimulus["param"]["seed"][0,0][0,0]
        stim_dat['x'] = stimulus["param"]["x"][0,0][0,0]
        stim_dat['y'] = stimulus["param"]["y"][0,0][0,0]
        stim_dat['dx'] = stimulus["param"]["dx"][0,0][0,0]
        stim_dat['dy'] = stimulus["param"]["dy"][0,0][0,0]
        stim_dat['pixel_size'] = stimulus["pixelsize"][0,0]
        stim_dat['timestamps'] = np.arange(stim_dat['n_frames']) * stim_dat['frame']  + stim_dat['onset'] + stim_dat['offset']
        stim_dat['end'] = 1.
        if stim_dat['timestamps'].size>0:
            stim_dat['end'] += stim_dat['timestamps'][-1] 
        stim_dat['rec_stim_name'] = "rec_stim_%d" % (i+1)
        stim_dat['description'] = "type = %s; seed = %s" % (str(stim_dat['type_s']), str(stim_dat['seed']))
        stim_dat['dimension'] = [int(stim_dat['x']/stim_dat['dx']), 
                                 int(stim_dat['y']/stim_dat['dy'])]
        stim_dat['data'] = fetch_stimulus_link(seed=stim_dat['seed'], 
                                               x=stim_dat['x'], 
                                               y=stim_dat['y'], 
                                               dx=stim_dat['dx'], 
                                               dy=stim_dat['dy'],
                                               indata_dir=indata_dir,
                                               stim_lib_dir=stim_lib_dir)
        curr_stim_data.append(stim_dat)
    stimulus_data[filename] = curr_stim_data
    
    ###########################################
    #  Read the spike data
    ############################################
    spikes_mat = mfile["spikes"]
    num_cells = spikes_mat.shape[0]
    num_stims =  spikes_mat.shape[1]
    spike_units = []
    for cell_index in range(num_cells):
        curr_spike_unit = {
            'name': "cell_%d" %(cell_index+1),
            'custom_stim_spikes': {}
        }
        spike_list = []
        for j in range(num_stims):
           
            stim_name = "stim_%d" % (j+1)
            spikes = np.hstack(spikes_mat[cell_index,j]) + stim_offsets [j]
            spike_list.extend(spikes)
            curr_spike_unit['custom_stim_spikes'][stim_name] = spikes
            
        curr_spike_unit['times'] = spike_list
        curr_spike_unit['unit_description'] = 'none'  # This was also 'none' in the original version 
        curr_spike_unit['source'] = "Data as reported in the original crcns file"
        spike_units.append(curr_spike_unit)
    
    spike_unit_times_data[filename] = spike_units

Processing: /Volumes/Data Drive/nwb_convert_tutorial/api-python/examples/source_data_2/crcns_ret-1/Data/20080516_R1.mat
Creating stimulus file /Volumes/Data Drive/nwb_convert_tutorial/convert_test2/nwb2/stim_lib/640x480_1x480_-10000.h5
Creating stimulus file /Volumes/Data Drive/nwb_convert_tutorial/convert_test2/nwb2/stim_lib/640x480_10x10_-10000.h5
Creating stimulus file /Volumes/Data Drive/nwb_convert_tutorial/convert_test2/nwb2/stim_lib/640x480_640x1_-10000.h5
Processing: /Volumes/Data Drive/nwb_convert_tutorial/api-python/examples/source_data_2/crcns_ret-1/Data/20080516_R2.mat
Processing: /Volumes/Data Drive/nwb_convert_tutorial/api-python/examples/source_data_2/crcns_ret-1/Data/20080606_R1.mat
Processing: /Volumes/Data Drive/nwb_convert_tutorial/api-python/examples/source_data_2/crcns_ret-1/Data/20080606_R2.mat
Processing: /Volumes/Data Drive/nwb_convert_tutorial/api-python/examples/source_data_2/crcns_ret-1/Data/20080606_R3.mat
Processing: /Volumes/Data Drive/nwb_convert_tutorial/api-python/examples/source_data_2/crcns_ret-1/Data/20080624_R1.mat
Creating stimulus file /Volumes/Data Drive/nwb_convert_tutorial/convert_test2/nwb2/stim_lib/640x480_2x480_-10000.h5
Creating stimulus file /Volumes/Data Drive/nwb_convert_tutorial/convert_test2/nwb2/stim_lib/640x480_640x2_-10000.h5
Creating stimulus file /Volumes/Data Drive/nwb_convert_tutorial/convert_test2/nwb2/stim_lib/640x300_10x10_-10000.h5
Processing: /Volumes/Data Drive/nwb_convert_tutorial/api-python/examples/source_data_2/crcns_ret-1/Data/20080624_R2.mat
Processing: /Volumes/Data Drive/nwb_convert_tutorial/api-python/examples/source_data_2/crcns_ret-1/Data/20080624_R3.mat
Creating stimulus file /Volumes/Data Drive/nwb_convert_tutorial/convert_test2/nwb2/stim_lib/640x480_5x480_-10000.h5
Processing: /Volumes/Data Drive/nwb_convert_tutorial/api-python/examples/source_data_2/crcns_ret-1/Data/20080628_R2.mat
Processing: /Volumes/Data Drive/nwb_convert_tutorial/api-python/examples/source_data_2/crcns_ret-1/Data/20080628_R3.mat
Creating stimulus file /Volumes/Data Drive/nwb_convert_tutorial/convert_test2/nwb2/stim_lib/640x320_8x8_-10000.h5
Processing: /Volumes/Data Drive/nwb_convert_tutorial/api-python/examples/source_data_2/crcns_ret-1/Data/20080628_R4.mat
Processing: /Volumes/Data Drive/nwb_convert_tutorial/api-python/examples/source_data_2/crcns_ret-1/Data/20080628_R5.mat
Processing: /Volumes/Data Drive/nwb_convert_tutorial/api-python/examples/source_data_2/crcns_ret-1/Data/20080628_R6.mat
Processing: /Volumes/Data Drive/nwb_convert_tutorial/api-python/examples/source_data_2/crcns_ret-1/Data/20080628_R7.mat
Processing: /Volumes/Data Drive/nwb_convert_tutorial/api-python/examples/source_data_2/crcns_ret-1/Data/20080702_R1.mat
Processing: /Volumes/Data Drive/nwb_convert_tutorial/api-python/examples/source_data_2/crcns_ret-1/Data/20080702_R4.mat

Step 5: Creating the NWB data using PyNWB¶

5.1 Create extensions for storing custom metadata¶

In [12]:

from pynwb.spec import NWBNamespaceBuilder, NWBGroupSpec, NWBAttributeSpec, NWBDatasetSpec
from pynwb import get_class, load_namespaces
import os

# Define where our extensions should be stored
ext_output_dir  = os.path.join(output_dir, "extensions")
ns_name = 'crcnsret1'
ns_path  = os.path.join(ext_output_dir, '%s.namespace.yaml' % ns_name)
ext_path = '%s.extensions.yaml' % ns_name

# Check to make sure that our storage location for the extensions exists
if not os.path.exists(ext_output_dir):
    os.makedirs(ext_output_dir)

# Build the namespace
ns_builder = NWBNamespaceBuilder('Extension for use in my Lab', ns_name)

# Create a custom ImageSeries to add our custom attributes and add our extenions to the namespace
mis_ext = NWBGroupSpec('A custom ImageSeries to add MeisterLab custom metadata',
                       attributes=[NWBAttributeSpec('x' , 'int', 'meister x', required=False),
                                   NWBAttributeSpec('y' , 'int', 'meister y', required=False),
                                   NWBAttributeSpec('dx', 'int', 'meister dx', required=False),
                                   NWBAttributeSpec('dy', 'int', 'meister dy', required=False),
                                   NWBAttributeSpec('pixel_size', 'float', 'meister pixel size', required=False)],
                       neurodata_type_inc='ImageSeries',
                       neurodata_type_def='MeisterImageSeries') 
ns_builder.add_spec(ext_path, mis_ext)

# Export the YAML files for our namespace and extensions
ns_builder.export(os.path.basename(ns_path), outdir=os.path.dirname(ns_path))

# Load the namespace to register it with PyNWB
load_namespaces(ns_path)

Out[12]:

('crcnsret1',)

For illustration purposes, let's also have a quick look at the contents of our YAML namespace and extension file.

In [13]:

print(os.path.basename(ns_path))
print("------------------------")
with open(ns_path, 'r') as fin:
    print(fin.read())
print("")
print(ext_path)
print("-------------------------")
with open(os.path.join(ext_output_dir, ext_path), 'r') as fin:
    print(fin.read())

crcnsret1.namespace.yaml
------------------------
namespaces:
- doc: Extension for use in my Lab
  name: crcnsret1
  schema:
  - namespace: core
  - source: crcnsret1.extensions.yaml


crcnsret1.extensions.yaml
-------------------------
groups:
- attributes:
  - doc: int
    dtype: meister x
    name: x
    required: false
  - doc: int
    dtype: meister y
    name: y
    required: false
  - doc: int
    dtype: meister dx
    name: dx
    required: false
  - doc: int
    dtype: meister dy
    name: dy
    required: false
  - doc: float
    dtype: meister pixel size
    name: pixel_size
    required: false
  doc: A custom ImageSeries to add MeisterLab custom metadata
  neurodata_type_def: MeisterImageSeries
  neurodata_type_inc: ImageSeries

In [14]:

# Get our custom container classes for our extensions
MeisterImageSeries = get_class('MeisterImageSeries', ns_name)

We can now inspect our container class using the usual mechanisms, e.g., help. For illustration purposes, let's call help on our class. Here we can see that:

Our custom attributes have been added to the constructor with approbriate documention describing the type and purpose we indicated in the spec for our attributes
From the "Method resolution order" documentationw we can see that our MeisterImageSeries inherits from pynwb.image.ImageSeries so that interaction mechanism from the base class are also available in our class

In [15]:

help(MeisterImageSeries)

Help on class MeisterImageSeries in module abc:

class MeisterImageSeries(pynwb.image.ImageSeries)
 |  General image data that is common between acquisition and stimulus time series.
 |  The image data can be stored in the HDF5 file or it will be stored as an external image file.
 |  
 |  Method resolution order:
 |      MeisterImageSeries
 |      pynwb.image.ImageSeries
 |      pynwb.base.TimeSeries
 |      pynwb.core.NWBDataInterface
 |      pynwb.core.NWBContainer
 |      pynwb.core.NWBBaseType
 |      pynwb.form.container.Container
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __init__(*args, **kwargs)
 |      __init__(name, source, data, unit='None', format='None', external_file=None, starting_frame=None, bits_per_pixel=None, dimension=[nan], resolution=0.0, conversion=1.0, timestamps=None, starting_time=None, rate=None, comments='no comments', description='no description', control=None, control_description=None, parent=None, pixel_size=None, x=None, y=None, dx=None, dy=None)
 |      
 |      
 |      
 |      Args:
 |          name (str): The name of this TimeSeries dataset
 |          source (str): Name of TimeSeries or Modules that serve as the source for the data contained here. It can also be the name of a device, for stimulus or acquisition data
 |          data (ndarray or list or tuple or Dataset or DataChunkIterator or DataIO or TimeSeries): The data this TimeSeries dataset stores. Can also store binary data e.g. image frames
 |          unit (str): The base unit of measurement (should be SI unit)
 |          format (str): Format of image. Three types: 1) Image format; tiff, png, jpg, etc. 2) external 3) raw.
 |          external_file (Iterable): Path or URL to one or more external file(s). Field only present if format=external.              Either external_file or data must be specified, but not both.
 |          starting_frame (Iterable): Each entry is the frame number in the corresponding external_file variable.              This serves as an index to what frames each file contains.
 |          bits_per_pixel (int): Number of bit per image pixel
 |          dimension (Iterable): Number of pixels on x, y, (and z) axes.
 |          resolution (float): The smallest meaningful difference (in specified unit) between values in data
 |          conversion (float): Scalar to multiply each element by to conver to volts
 |          timestamps (ndarray or list or tuple or Dataset or DataChunkIterator or DataIO or TimeSeries): Timestamps for samples stored in data
 |          starting_time (float): The timestamp of the first sample
 |          rate (float): Sampling rate in Hz
 |          comments (str): Human-readable comments about this TimeSeries dataset
 |          description (str): Description of this TimeSeries dataset
 |          control (Iterable): Numerical labels that apply to each element in data
 |          control_description (Iterable): Description of each control value
 |          parent (NWBContainer): The parent NWBContainer for this NWBContainer
 |          pixel_size (None): float
 |          x (None): int
 |          y (None): int
 |          dx (None): int
 |          dy (None): int
 |  
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |  
 |  __abstractmethods__ = frozenset()
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors inherited from pynwb.image.ImageSeries:
 |  
 |  bits_per_pixel
 |  
 |  dimension
 |  
 |  external_file
 |  
 |  format
 |  
 |  starting_frame
 |  
 |  ----------------------------------------------------------------------
 |  Data and other attributes inherited from pynwb.image.ImageSeries:
 |  
 |  __nwbfields__ = ('source', 'help', 'comments', 'description', 'data', ...
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors inherited from pynwb.base.TimeSeries:
 |  
 |  ancestry
 |  
 |  comments
 |  
 |  control
 |  
 |  control_description
 |  
 |  conversion
 |  
 |  data
 |  
 |  data_link
 |  
 |  description
 |  
 |  help
 |  
 |  interval
 |  
 |  neurodata_type
 |  
 |  num_samples
 |  
 |  rate
 |  
 |  rate_unit
 |  
 |  resolution
 |  
 |  starting_time
 |  
 |  time_unit
 |  
 |  timestamp_link
 |  
 |  timestamps
 |  
 |  timestamps_unit
 |  
 |  unit
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors inherited from pynwb.core.NWBContainer:
 |  
 |  source
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors inherited from pynwb.core.NWBBaseType:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)
 |  
 |  container_source
 |      The source of this Container e.g. file name or table
 |  
 |  fields
 |  
 |  name
 |  
 |  parent
 |      The parent NWBContainer of this NWBContainer
 |  
 |  ----------------------------------------------------------------------
 |  Class methods inherited from pynwb.form.container.Container:
 |  
 |  type_hierarchy() from pynwb.form.utils.ExtenderMeta

Step 5.2: Write Single NWB file¶

This example generates a series of NWB HDF5 files. We here define first a function to convert a single file.

In [16]:

def convert_single_file(file_stimulus_data, file_meta, spike_units, electrode_meta):
    import h5py
    #########################################
    # Create the NWBFile containter
    ##########################################
    nwbfile = NWBFile(session_description=file_meta['description'],
                      identifier=file_meta['identifier'],
                      session_start_time=file_meta['start_time'],
                      file_create_date=file_meta['create_date'],
                      experimenter=file_meta['experimenter'],
                      experiment_description=file_meta['experiment_description'],
                      session_id=file_meta['session_id'],
                      institution=file_meta['institution'],
                      lab=file_meta['lab'],
                      source='MeisterLab')
    
    ################################################
    # Convert the stimuls data for the file
    ################################################
    for i, curr_stimulus in enumerate(file_stimulus_data):
        
        ####################################################
        # Get the external dataset with the stimulus data
        ####################################################
        # To create an external link to the HDF5 dataset from our stimulus library we simply get the dataset
        # and assign it to the data input parameter of our ImageSeries (i.e., here our extension MeisterImageSeries).
        # PyNWB recognizes the h5py.Dataset and automatically creates an ExternalLink or SoftLink for the 
        # data, depending on whether the data comes from the same file or an external HDF5 file. 
        stim_data_file = h5py.File(curr_stimulus['data'][0], 'r')
        stim_ext_data = stim_data_file[curr_stimulus['data'][1]]
        
        #####################################################
        # Create the image timeseries for the stimulus
        #####################################################
        img = MeisterImageSeries(name=curr_stimulus['rec_stim_name'],  # The following are inherited from ImageSeries
                                 source='Undefined',
                                 data=stim_ext_data, # By assigning the existing HDF5 dataset from the external file, PyNWB automatically creates an ExternalLink for data
                                 unit="grayscale",
                                 format='raw',
                                 bits_per_pixel=8,
                                 dimension=curr_stimulus['dimension'],
                                 timestamps=curr_stimulus['timestamps'],
                                 description=curr_stimulus['description'],  
                                 x=int(curr_stimulus['x']),   # The following are custom attribute values we added to our MeiterImageSeries extension
                                 y=int(curr_stimulus['y']),
                                 dx=int(curr_stimulus['dx']),
                                 dy=int(curr_stimulus['dy']),
                                 pixel_size=curr_stimulus['pixel_size'],
                             )
        nwbfile.add_stimulus(img)  # Add the stimulus timeseries to the file
        
        ##############################################
        # Create the epoch for the stimulus
        ##############################################    
        epoch = nwbfile.add_epoch(name="stim_%d"%(i+1), 
                                     start=curr_stimulus['offset'],
                                     stop=curr_stimulus['end'],
                                     tags=[],
                                     description="",
                                     source="MeisterLab")
        # Create the EpochTimeSeries
        epoch.add_timeseries(timeseries=img,
                             in_epoch_name="stimulus")
            
    
    ####################################################################################################
    #  Create the module 'Cells' with a single 'UnitTimes' interfaces storing a set of 'SpikeUnites'
    #####################################################################################################
    cells_module = nwbfile.create_processing_module(name="Cells", 
                                                    description="Spike times for the individual cells and stimuli",
                                                    source="MeisterLab")
    # Create the unit times object
    unit_times = UnitTimes(source="Data as reported in the original crcns file",
                           spike_units=[])
    # Create the spike units for the unit times object
    for spike_unit_data in spike_units:
        unit_times.create_spike_unit(name=spike_unit_data['name'],
                                     times=spike_unit_data['times'],
                                     unit_description=spike_unit_data['unit_description'],
                                     source=spike_unit_data['source'])
        # TODO Add custom data'custom_stim_spikes' data to the SpikeUnit
    
    # Add the unit times to the processing module
    cells_module.add_container(unit_times)
    
    #####################################################
    # Create the electrode group
    #####################################################
    device = nwbfile.create_device(electrode_meta['device'], source='MeisterLab')
    electrode_group = nwbfile.create_electrode_group(name=electrode_meta['name'],
                                                     channel_coordinates=electrode_meta['channel_coordinates'],
                                                     channel_description=electrode_meta['channel_description'],
                                                     channel_location=electrode_meta['channel_location'],
                                                     channel_filtering=electrode_meta['channel_filtering'],
                                                     channel_impedance=electrode_meta['channel_impedance'],
                                                     description=electrode_meta['description'],
                                                     location=electrode_meta['location'],
                                                     device=device,
                                                     source='MeisterLab')
    
    # TODO Add metadata
    # file_meta['related_publications']
    # file_meta['notes']
    # file_meta['species'] 
    # file_meta['genotype']
    
    ##################################
    # Write the NWB file
    ##################################
    manager = get_manager()
    io = NWBHDF5IO(file_meta['output_filename'], manager)
    io.write(nwbfile)
    io.close()
    stim_data_file.close()

Step 5.2: Convert all files¶

Convert all the files by iteating over the files and calling convert_single_file function for each of the file

In [18]:

for filename in file_list:
    convert_single_file(file_stimulus_data=stimulus_data[filename],
                        file_meta=per_file_meta[filename], 
                        spike_units=spike_unit_times_data[filename], 
                        electrode_meta=electrode_meta)

6 Basic Validation and Visualization¶

In [19]:

%matplotlib inline
%config InlineBackend.figure_format = 'png'
if VIS_AVAILABLE:
    # Plotting settings
    show_bar_plot = False    # Change setting to plot distribution of object sizes in the HDF5 file
    plot_single_file = True # Plot all files or a single example file
    output_filenames = [fm['output_filename'] for fi, fm in per_file_meta.items()] 
    
    # Select the files to plot
    if plot_single_file:
        for fi in output_filenames:
            if fi.endswith('20080624_R1.nwb'):
                filenames =[fi,]
                break
    else:
        filenames = output_filenames
        
    # Create the plots for all files
    for filename in filenames:
        file_hierarchy = HierarchyDescription.from_hdf5(filename)  
        file_graph = NXGraphHierarchyDescription(file_hierarchy)          
        fig = file_graph.draw(show_plot=False,
                              figsize=(12,16),
                              label_offset=(0.0, 0.0065),
                              label_font_size=10)
        plot_title = filename + " \n " + "#Datasets=%i, #Attributes=%i, #Groups=%i, #Links=%i" % (len(file_hierarchy['datasets']), len(file_hierarchy['attributes']), len(file_hierarchy['groups']), len(file_hierarchy['links']))
        plt.title(plot_title)
        plt.show()
        
        # Show a sorted bar plot with the sizes of all datasets in the file
        if show_bar_plot:
            d = {i['name']: np.prod(i['size']) for i in file_hierarchy['datasets']}
            l = [w for w in sorted(d, key=d.get, reverse=True)]
            s = [d[w] for w in l]   
            p = np.arange(len(l))   
            fig,ax = plt.subplots(figsize=(16,7))
            ax.set_title(filename)
            ax.bar(p, s, width=1, color='r')
            ax.set_xticks(p+1)  
            ax.set_xticklabels(l)  
            ax.set_yscale("log", nonposy='clip')
            fig.autofmt_xdate(bottom=0.2, rotation=90, ha='right')
            plt.show()

In [ ]: