## HDF5 (Hierarchical Data Format)

Link to Kyle's [Bitbucket Repo](https://bitbucket.org/yingkaisha/python-in-remote-sensing/src/tip/_libs/) and some [testing data-sets](https://bitbucket.org/yingkaisha/python-in-remote-sensing/src/tip/_data/_demos/).
### Linking HDF5 & Python

To use HDF5 with Python, you'll need to install 2 libraries/packages from the Anaconda distribution. Type the following commands into the terminal prompt:
```
conda install hdf5
conda install h5py
```
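If the install succeeded, a quick sanity check will confirm the bindings import cleanly (a minimal sketch; version numbers will differ on your machine):

```python
import h5py

# Confirm the Python bindings import and report the
# underlying HDF5 C-library version they were built against.
print("h5py version:", h5py.__version__)
print("HDF5 library version:", h5py.version.hdf5_version)
```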
### HDF5 File Structure

The 3 components of an HDF5 file (files appended with ".h5"):

- Datasets
- Groups: hold datasets and other groups
- Attributes: metadata attached to datasets (and groups)
Primary benefits of HDF5 files: Groups and Attributes. Groups act like folders, allowing related datasets to be stored together. Attributes allow the direct attachment of metadata to the actual data they describe. The following example (taken from the book 'Python and HDF5') shows the organizational structure of an HDF5 file...

```python
>>> import h5py
>>> f = h5py.File("weather_data.h5")
>>> f["/15/Temperature"] = temperature_station15
>>> f["/15/Temperature"].attrs["dt"] = 10.0
>>> f["/15/Temperature"].attrs["startTime"] = 1375204299
>>> f["/15/Wind"] = wind
>>> f["/15/Wind"].attrs["dt"] = 5.0
>>> f["/20/Temperature"] = temperature_station20
```
In the above code-chunk:

- The path `/15/...` is similar to a folder system on a computer.
- Each dataset (e.g. `Temperature` & `Wind`) is stored under the group `/15/`.

Similarly, accessing the metadata attributes can be done in the following way...
```python
>>> dataset = f["15/Temperature"]
>>> for key, value in dataset.attrs.iteritems():
...     print "%s: %s" % (key, value)
dt: 10.0
startTime: 1375204299
```
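The same pattern can be tried end-to-end with a small sketch (this rebuilds the weather-station example in memory rather than using the book's actual data, and uses Python 3 syntax, where `.iteritems()` becomes `.items()`):

```python
import h5py
import numpy as np

# driver='core' with backing_store=False keeps the file in RAM;
# nothing is written to disk.
f = h5py.File("weather_demo.h5", "w", driver="core", backing_store=False)
f["/15/Temperature"] = np.random.random(100)  # stand-in for station data
f["/15/Temperature"].attrs["dt"] = 10.0
f["/15/Temperature"].attrs["startTime"] = 1375204299

# In Python 3 / current h5py, iterate attributes with .items().
dataset = f["/15/Temperature"]
for key, value in dataset.attrs.items():
    print("%s: %s" % (key, value))
f.close()
```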
`attrs` is an attribute of a dataset (or group), and behaves like a dictionary with key-value pairs.

Next, import the `h5dump()` module from Kyle's Bitbucket Repo and load `example.h5` into the notebook:
into the notebookimport h5py as h5
from h5lib import h5dump, print_attrs
f = h5.File('example.h5')
print '\nHDF5 file \'example.h5\' just loaded:\n\n%r' % f
print '\nWe see that the file has been opened in read-mode, hence the \'mode +r\'...'
HDF5 file 'example.h5' just loaded: <HDF5 file "example.h5" (mode r+)> We see that the file has been opened in read-mode, hence the 'mode +r'...
Checking the type of the loaded .h5 file:

```python
print '\nThe loaded .h5 file is of type...\n%s' % type(f)
print '\nLoaded file is of type \'File\'.'
```

```
The loaded .h5 file is of type...
<class 'h5py._hl.files.File'>

Loaded file is of type 'File'.
```
### Viewing the metadata stored in HDF5 files

There are a few ways of viewing the stored metadata. One is to use `h5dump()` (a contribution from Kyle). This routine essentially prints out all of the material stored within the .h5 file.

Running `h5dump` on "example.h5":

```python
h5dump(f)
```
```
item name: Example SDS <HDF5 dataset "Example SDS": shape (16, 5), type ">i2">
    HDF4_OBJECT_TYPE: SDS
    HDF4_OBJECT_NAME: Example SDS
    HDF4_REF_NUM: 2
item name: Example Vdata <HDF5 dataset "Example Vdata": shape (10,), type "|V6">
    TITLE: Example Vdata
    CLASS: TABLE
    VERSION: 1.0
    FIELD_0_NAME: Idx
    FIELD_1_NAME: Temp
    FIELD_2_NAME: Dewpt
    HDF4_OBJECT_TYPE: Vdata
    HDF4_OBJECT_NAME: Example Vdata
    HDF4_REF_NUM: 11
item name: Example Vdata_t <HDF5 named type "Example Vdata_t" (dtype |V6)>
item name: HDF4_DIMGROUP <HDF5 group "/HDF4_DIMGROUP" (0 members)>
item name: MonthlyRain <HDF5 group "/MonthlyRain" (2 members)>
    HDF4_OBJECT_TYPE: Vgroup
    HDF4_OBJECT_NAME: MonthlyRain
    HDF4_REF_NUM: 12
item name: MonthlyRain/Data Fields <HDF5 group "/MonthlyRain/Data Fields" (2 members)>
    HDF4_OBJECT_TYPE: Vgroup
    HDF4_OBJECT_NAME: Data Fields
    HDF4_REF_NUM: 13
item name: MonthlyRain/Data Fields/RrLandRain <HDF5 dataset "RrLandRain": shape (28, 72), type ">f4">
    HDF4_OBJECT_TYPE: SDS
    HDF4_OBJECT_NAME: RrLandRain
    HDF4_REF_NUM: 16
    DIMENSION_NAMELIST: ['/HDF4_DIMGROUP/YDim:MonthlyRain' '/HDF4_DIMGROUP/XDim:MonthlyRain']
item name: MonthlyRain/Data Fields/TbOceanRain <HDF5 dataset "TbOceanRain": shape (28, 72), type ">f4">
    HDF4_OBJECT_TYPE: SDS
    DIMENSION_NAMELIST: ['/HDF4_DIMGROUP/YDim:MonthlyRain' '/HDF4_DIMGROUP/XDim:MonthlyRain']
    HDF4_OBJECT_NAME: TbOceanRain
    HDF4_REF_NUM: 15
item name: MonthlyRain/Grid Attributes <HDF5 group "/MonthlyRain/Grid Attributes" (0 members)>
    HDF4_OBJECT_TYPE: Vgroup
    HDF4_OBJECT_NAME: Grid Attributes
    HDF4_REF_NUM: 14
------------------- attributes for the root file -------------------
attribute name: HDFEOSVersion_GLOSDS --- value: HDFEOS_V2.16
attribute name: StructMetadata.0_GLOSDS --- value:
GROUP=SwathStructure
END_GROUP=SwathStructure
GROUP=GridStructure
  GROUP=GRID_1
    GridName="MonthlyRain"
    XDim=72
    YDim=28
    UpperLeftPointMtrs=(0.000000,70000000.000000)
    LowerRightMtrs=(360000000.000000,-70000000.000000)
    Projection=GCTP_GEO
    GROUP=Dimension
    END_GROUP=Dimension
    GROUP=DataField
      OBJECT=DataField_1
        DataFieldName="TbOceanRain"
        DataType=DFNT_FLOAT32
        DimList=("YDim","XDim")
      END_OBJECT=DataField_1
      OBJECT=DataField_2
        DataFieldName="RrLandRain"
        DataType=DFNT_FLOAT32
        DimList=("YDim","XDim")
      END_OBJECT=DataField_2
    END_GROUP=DataField
    GROUP=MergedFields
    END_GROUP=MergedFields
  END_GROUP=GRID_1
END_GROUP=GridStructure
GROUP=PointStructure
END_GROUP=PointStructure
END
```
The loaded .h5 file is organized as an [OrderedDict](https://docs.python.org/2/library/collections.html#collections.OrderedDict) data-container (recall [Python dictionaries](https://docs.python.org/2/tutorial/datastructures.html)).

### Viewing the key-value pairs (i.e. all groups and datasets) stored in the .h5 file

Accessing the `.items()` method of an .h5 file will give you all of the stored datasets and groups.
```python
print '\nStored groups and datasets within .h5 file:\n'.upper()
for each_item in f.items():
    print each_item
```

```
STORED GROUPS AND DATASETS WITHIN .H5 FILE:

(u'Example SDS', <HDF5 dataset "Example SDS": shape (16, 5), type ">i2">)
(u'Example Vdata', <HDF5 dataset "Example Vdata": shape (10,), type "|V6">)
(u'Example Vdata_t', <HDF5 named type "Example Vdata_t" (dtype |V6)>)
(u'HDF4_DIMGROUP', <HDF5 group "/HDF4_DIMGROUP" (0 members)>)
(u'MonthlyRain', <HDF5 group "/MonthlyRain" (2 members)>)
```
If only the keys are desired, then simply use the `.keys()` method.
```python
print '\nPrinting out all the keys in the imported hdf5 file.\n'
for n, each_key in enumerate(f.keys()):
    print 'Key %d:\t%s' % (n + 1, each_key)
print '\n'
```

```
Printing out all the keys in the imported hdf5 file.

Key 1:  Example SDS
Key 2:  Example Vdata
Key 3:  Example Vdata_t
Key 4:  HDF4_DIMGROUP
Key 5:  MonthlyRain
```
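This dictionary-like behaviour is easy to verify on a file you build yourself. The sketch below uses an in-memory file with made-up names that mimic `example.h5` (Python 3 syntax):

```python
import h5py
import numpy as np

# Hypothetical in-memory file standing in for example.h5.
f = h5py.File("demo.h5", "w", driver="core", backing_store=False)
f.create_dataset("Example SDS", data=np.zeros((16, 5), dtype=">i2"))
f.create_group("MonthlyRain")

# .keys() gives the names, .items() gives (name, object) pairs,
# and `in` tests membership -- just like a dictionary.
print(list(f.keys()))      # -> ['Example SDS', 'MonthlyRain']
print("MonthlyRain" in f)  # -> True
for name, obj in f.items():
    print(name, obj)
f.close()
```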
### Viewing the metadata (attributes) of a dataset or group

Can use the `print_attrs()` routine (thanks Kyle). A look at the routine...

```python
def print_attrs(name, obj):
    print("item name: ", name, repr(obj))
    for key, val in obj.attrs.iteritems():
        print("    %s: %s" % (key, val))
```
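Note that `print_attrs` has exactly the `(name, object)` signature that h5py's `visititems()` expects, so the whole-file dump above can be reproduced by passing it as a callback. A minimal sketch on a small in-memory stand-in file (Python 3, so `.iteritems()` becomes `.items()`):

```python
import h5py
import numpy as np

def print_attrs(name, obj):
    # Callback with the (name, object) signature h5py expects.
    print("item name: ", name, repr(obj))
    for key, val in obj.attrs.items():
        print("    %s: %s" % (key, val))

# Small in-memory file to walk (stand-in for example.h5).
f = h5py.File("walk.h5", "w", driver="core", backing_store=False)
grp = f.create_group("MonthlyRain")
dset = grp.create_dataset("RrLandRain", data=np.zeros((2, 2)))
dset.attrs["HDF4_OBJECT_TYPE"] = "SDS"

# visititems() recursively visits every group and dataset in the
# file, which is essentially what h5dump() does.
f.visititems(print_attrs)
f.close()
```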
Attributes of a dataset:

```python
print_attrs('Example SDS', f['Example SDS'])
```

```
item name: Example SDS <HDF5 dataset "Example SDS": shape (16, 5), type ">i2">
    HDF4_OBJECT_TYPE: SDS
    HDF4_OBJECT_NAME: Example SDS
    HDF4_REF_NUM: 2
```
Attributes of a group:

```python
print_attrs('MonthlyRain', f['MonthlyRain'])
print '''\n
Here,
- we see 'MonthlyRain' is a group (of HDF4 object type 'Vgroup').
- ...also see there are 2 additional members (or subgroups) attached to
  this group.
- the subgroups can be accessed with the .keys() method.
\n'''
```

```
item name: MonthlyRain <HDF5 group "/MonthlyRain" (2 members)>
    HDF4_OBJECT_TYPE: Vgroup
    HDF4_OBJECT_NAME: MonthlyRain
    HDF4_REF_NUM: 12

Here,
- we see 'MonthlyRain' is a group (of HDF4 object type 'Vgroup').
- ...also see there are 2 additional members (or subgroups) attached to
  this group.
- the subgroups can be accessed with the .keys() method.
```
```python
for n, each_key in enumerate(f['MonthlyRain']):
    print 'Key %d:\t%s' % (n+1, each_key)
print '''\n
We can check/confirm what object type each group/dataset is, by using the 'type()'
function.\n
'''
```

```
Key 1:  Data Fields
Key 2:  Grid Attributes

We can check/confirm what object type each group/dataset is, by using the 'type()'
function.
```
```python
print 'f[\'MonthlyRain\'] is of type:\n\n%s\n' % type(f['MonthlyRain'])
```

```
f['MonthlyRain'] is of type:

<class 'h5py._hl.group.Group'>
```
Viewing the Data Fields subgroup in f['MonthlyRain'] & accessing its metadata:

```python
print_attrs('Data Fields Metadata', f['MonthlyRain']['Data Fields'])
print '\nHere we see 2 members of the Data Fields group, so we can access \n\
additional fields with the .keys() method.\n'
print 'Additional groups/datasets:\n'
for n, each_key in enumerate(f['MonthlyRain']['Data Fields']):
    print 'Key %d:\t%s' % (n+1, each_key)
```

```
item name: Data Fields Metadata <HDF5 group "/MonthlyRain/Data Fields" (2 members)>
    HDF4_OBJECT_TYPE: Vgroup
    HDF4_OBJECT_NAME: Data Fields
    HDF4_REF_NUM: 13

Here we see 2 members of the Data Fields group, so we can access
additional fields with the .keys() method.

Additional groups/datasets:

Key 1:  RrLandRain
Key 2:  TbOceanRain
```
```python
type(f['MonthlyRain']['Data Fields']['TbOceanRain'])
```

```
h5py._hl.dataset.Dataset
```
The actual data values can be accessed with the .value attribute:

```python
print '\n', f['MonthlyRain']['Data Fields']['TbOceanRain'].value
```

```
[[ -1.  -1.  -1. ...,  -1.  -1.  -1.]
 [ -1.  -1.  -1. ...,  -1.  -1.  -1.]
 [ 93.  -1.  -1. ...,  58.  -1.  -1.]
 ...,
 [ -1.  -1.  -1. ...,  -1.  -1.  -1.]
 [ -1.  -1.  -1. ...,  -1.  -1.  -1.]
 [ -1.  -1.  -1. ...,  -1.  -1.  -1.]]
```
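Be aware that `.value` was deprecated and eventually removed in h5py 3.0; in current h5py you slice the dataset to get a NumPy array instead. A minimal sketch, using an in-memory stand-in dataset with the same shape and fill value as `TbOceanRain`:

```python
import h5py
import numpy as np

# Hypothetical stand-in for the TbOceanRain dataset.
f = h5py.File("vals.h5", "w", driver="core", backing_store=False)
f["TbOceanRain"] = np.full((28, 72), -1.0, dtype=">f4")

# .value is gone in h5py >= 3.0; slice the dataset instead.
data = f["TbOceanRain"][:]     # whole array as a NumPy ndarray
row = f["TbOceanRain"][0, :5]  # or read just a slice of it
print(data.shape)              # -> (28, 72)
f.close()
```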