This example shows how to use the ODM2 Python API (odm2api) to connect to an ODM2 database, retrieve data, and analyze and visualize the data. The database (iUTAHGAMUT_waterquality_measurementresults_ODM2.sqlite) contains "measurement"-type results.
This example uses SQLite for the database because it doesn't require a server. However, the ODM2 Python API demonstrated here can also be used with ODM2 databases implemented in MySQL, PostgreSQL or Microsoft SQL Server.
More details on the ODM2 Python API and its source code and latest development can be found at https://github.com/ODM2/ODM2PythonAPI
Adapted from notebook https://github.com/BiG-CZ/wshp2017_tutorial_content/blob/master/notebooks/ODM2_Example3.ipynb, based in part on earlier code and an ODM2 database from Jeff Horsburgh's group at Utah State University.
import os
import datetime
import matplotlib.pyplot as plt
%matplotlib inline
from shapely.geometry import Point
import pandas as pd
import geopandas as gpd
import folium
from folium.plugins import MarkerCluster
import odm2api
from odm2api.ODMconnection import dbconnection
import odm2api.services.readService as odm2rs
from odm2api.models import SamplingFeatures
/home/mayorga/miniconda/envs/odm2client/lib/python2.7/site-packages/folium/__init__.py:59: UserWarning: This version of folium is the last to support Python 2. Transition to Python 3 to be able to receive updates and fixes. Check out https://python3statement.org/ for more info. UserWarning
"{} UTC".format(datetime.datetime.utcnow())
'2019-05-12 01:58:17.943605 UTC'
pd.__version__, gpd.__version__, folium.__version__
(u'0.24.2', u'0.5.0', u'0.8.3')
odm2api version used to run this notebook:
odm2api.__version__
u'0.7.2'
This example uses an ODM2 SQLite database file loaded with water quality sample data from multiple monitoring sites in the iUTAH Gradients Along Mountain to Urban Transitions (GAMUT) water quality monitoring network. Water quality samples have been collected and analyzed for nitrogen, phosphorus, total coliform, E-coli, and some water isotopes. The database (iUTAHGAMUT_waterquality_measurementresults_ODM2.sqlite) contains "measurement"-type results.
The example database is located in the data sub-directory.
# Assign directory paths and SQLite file name
dbname_sqlite = "iUTAHGAMUT_waterquality_measurementresults_ODM2.sqlite"
sqlite_pth = os.path.join("data", dbname_sqlite)
try:
    session_factory = dbconnection.createConnection('sqlite', sqlite_pth)
    read = odm2rs.ReadODM2(session_factory)
    print("Database connection successful!")
except Exception as e:
    print("Unable to establish connection to the database: ", e)
Database connection successful!
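Before handing the file path to the API, it can help to confirm that the path really points to an existing SQLite database, because Python's sqlite3 module creates a new, empty file when asked to open a path that does not exist. The following is a minimal sketch using only the standard library. The helper name list_sqlite_tables is hypothetical and is not part of odm2api.

```python
import os
import sqlite3


def list_sqlite_tables(path):
    """Return the sorted table names in the SQLite file at `path`.

    Raises IOError if the file does not exist, so a mistyped path is
    reported instead of silently creating a new, empty database.
    """
    if not os.path.isfile(path):
        raise IOError("SQLite file not found: {}".format(path))
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]
```

In this notebook the call would be list_sqlite_tables(sqlite_pth). For an ODM2 database one would expect the list to include tables such as SamplingFeatures, Variables and Results, though the exact names depend on how the file was built.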
This section shows some examples of how to use the API to run both simple and more advanced queries on the ODM2 database, as well as how to examine the query output in convenient ways thanks to Python tools.
Simple query functions like getVariables( ) return objects similar to the entities in ODM2, and individual attributes can then be retrieved from the objects returned.
A simple query with simple output.
# Get all of the Variables from the ODM2 database then read the records
# into a Pandas DataFrame to make it easy to view and manipulate
allVars = read.getVariables()
variables_df = pd.DataFrame.from_records([vars(variable) for variable in allVars],
index='VariableID')
variables_df.head(10)
VariableID | NoDataValue | SpeciationCV | VariableCode | VariableDefinition | VariableNameCV | VariableTypeCV | _sa_instance_state |
---|---|---|---|---|---|---|---|
1 | -9999.0000000000 | N | TN | None | Nitrogen, total | Water quality | <sqlalchemy.orm.state.InstanceState object at ... |
2 | -9999.0000000000 | P | TP | None | Phosphorus, total | Water quality | <sqlalchemy.orm.state.InstanceState object at ... |
3 | -9999.0000000000 | N | Nitrate | None | Nitrogen, dissolved nitrite (NO2) + nitrate (NO3) | Water quality | <sqlalchemy.orm.state.InstanceState object at ... |
4 | -9999.0000000000 | N | Ammonia | None | Nitrogen, NH4 | Water quality | <sqlalchemy.orm.state.InstanceState object at ... |
5 | -9999.0000000000 | P | Phosphate | None | Phosphorus, orthophosphate dissolved | Water quality | <sqlalchemy.orm.state.InstanceState object at ... |
6 | -9999.0000000000 | Not Applicable | Tcoliform | None | Coliform, total | Water quality | <sqlalchemy.orm.state.InstanceState object at ... |
7 | -9999.0000000000 | Not Applicable | E-Coli | None | E-coli | Water quality | <sqlalchemy.orm.state.InstanceState object at ... |
8 | -9999.0000000000 | C | DOC | None | Carbon, dissolved organic | Water quality | <sqlalchemy.orm.state.InstanceState object at ... |
9 | -9999.0000000000 | N | TDN | None | Nitrogen, total dissolved | Water quality | <sqlalchemy.orm.state.InstanceState object at ... |
10 | -9999.0000000000 | Not Applicable | Abs254 | None | Absorbance | Water quality | <sqlalchemy.orm.state.InstanceState object at ... |
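The _sa_instance_state column in the table above is SQLAlchemy bookkeeping that vars() exposes along with the real attributes. One way to keep it out of the DataFrame is to filter out underscore-prefixed names before building the records. This is a sketch only: public_attributes and FakeVariable are hypothetical names, and FakeVariable merely stands in for an object returned by the API.

```python
def public_attributes(obj):
    """Return a dict of an object's attributes, skipping names that start with '_'."""
    return {name: value for name, value in vars(obj).items()
            if not name.startswith('_')}


class FakeVariable(object):
    """Stand-in for an API object; hypothetical, for illustration only."""
    def __init__(self, **attrs):
        self.__dict__.update(attrs)
        # Mimics the bookkeeping attribute SQLAlchemy attaches to mapped objects
        self._sa_instance_state = object()


tp = FakeVariable(VariableID=2, VariableCode='TP', VariableNameCV='Phosphorus, total')
record = public_attributes(tp)
print(sorted(record))  # ['VariableCode', 'VariableID', 'VariableNameCV']
```

With the real objects, pd.DataFrame.from_records([public_attributes(v) for v in allVars], index='VariableID') should give the same table without the extra column.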
Another simple query.
allPeople = read.getPeople()
pd.DataFrame.from_records([vars(person) for person in allPeople]).head()
PersonFirstName | PersonID | PersonLastName | PersonMiddleName | _sa_instance_state | |
---|---|---|---|---|---|
0 | Nancy | 1 | Mesner | | <sqlalchemy.orm.state.InstanceState object at ... |
1 | Dane | 2 | Brophy | | <sqlalchemy.orm.state.InstanceState object at ... |
2 | Ben | 3 | Rider | | <sqlalchemy.orm.state.InstanceState object at ... |
3 | Michelle | 4 | Baker | | <sqlalchemy.orm.state.InstanceState object at ... |
4 | Erin | 5 | Jones | | <sqlalchemy.orm.state.InstanceState object at ... |
Some of the API functions accept arguments that let you subset what is returned. For example, I can query the database using the getSamplingFeatures( ) function and pass it a SamplingFeatureType of "Site" to return a list of those SamplingFeatures that are Sites.
# Get all of the SamplingFeatures from the ODM2 database that are Sites
siteFeatures = read.getSamplingFeatures(sftype='Site')
# Read Sites records into a Pandas DataFrame
# ("if sf.Latitude" is used only to instantiate/read Site attributes)
df = pd.DataFrame.from_records([vars(sf) for sf in siteFeatures if sf.Latitude])
Since we know this is a geospatial dataset (Sites, which have latitude and longitude), we can use more specialized Python tools like GeoPandas (geospatially enabled Pandas) and Folium interactive maps.
# Create a GeoPandas GeoDataFrame from Sites DataFrame
ptgeom = [Point(xy) for xy in zip(df['Longitude'], df['Latitude'])]
gdf = gpd.GeoDataFrame(df, geometry=ptgeom, crs={'init': 'epsg:4326'})
gdf.head(5)
ElevationDatumCV | Elevation_m | FeatureGeometryWKT | Latitude | Longitude | SamplingFeatureCode | SamplingFeatureDescription | SamplingFeatureGeotypeCV | SamplingFeatureID | SamplingFeatureName | SamplingFeatureTypeCV | SamplingFeatureUUID | SiteTypeCV | SpatialReferenceID | _sa_instance_state | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | EGM96 | 1356.0 | None | 40.745078 | -111.854449 | RB_1300E | None | None | 1 | Red Butte Creek at 1300E (downstream of spring) | Site | 0DDE8EF6-EC2F-42C0-AB50-20C6C02E89B2 | Stream | 1 | <sqlalchemy.orm.state.InstanceState object at ... | POINT (-111.854449 40.745078) |
1 | EGM96 | 1356.0 | None | 40.745106 | -111.854389 | RB_1300ESpring | None | None | 2 | Spring that enters Red Butte Creek at 1300E | Site | 9848BBFE-EA3F-4918-A324-13E8EDE5381C | Spring | 1 | <sqlalchemy.orm.state.InstanceState object at ... | POINT (-111.854389 40.745106) |
2 | EGM96 | 1289.0 | None | 40.741583 | -111.917667 | RB_900W_BA | None | None | 3 | Red Butte Creek terminus at Jordan River at 13... | Site | 688017BC-9E02-4444-A21D-270366BE2348 | Stream | 1 | <sqlalchemy.orm.state.InstanceState object at ... | POINT (-111.917667 40.741583) |
3 | EGM96 | 1519.0 | None | 40.766134 | -111.826530 | RB_Amphitheater | None | None | 4 | Red Butte Creek below Red Butte Garden Amphith... | Site | 9CFE685B-5CDA-4E38-98D9-406D645C7D21 | Stream | 1 | <sqlalchemy.orm.state.InstanceState object at ... | POINT (-111.82653 40.766134) |
4 | EGM96 | 1648.0 | None | 40.779602 | -111.806669 | RB_ARBR_AA | None | None | 5 | Red Butte Creek above Red Butte Reservoir Adan... | Site | 98C7F63A-FDFB-4898-87C6-5AA8EC34D1E4 | Stream | 1 | <sqlalchemy.orm.state.InstanceState object at ... | POINT (-111.806669 40.779602) |
# Number of records (features) in GeoDataFrame
len(gdf)
25
# A trivial but easy-to-generate GeoPandas plot
gdf.plot();
A site has a SiteTypeCV. Let's examine the site type distribution, and use that information to create a new GeoDataFrame column to specify a map marker color by SiteTypeCV.
gdf['SiteTypeCV'].value_counts()
Stream    24
Spring     1
Name: SiteTypeCV, dtype: int64
gdf["color"] = gdf.apply(lambda feat: 'green' if feat['SiteTypeCV'] == 'Stream' else 'red', axis=1)
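The lambda above colors every site that is not a Stream red, which is fine while only two site types are present. If more site types were added, a lookup table with an explicit fallback might be easier to maintain. The sketch below is one way to do that. The names SITE_TYPE_COLORS and marker_color, the 'gray' fallback, and the 'Lake' test value are all assumptions made for illustration.

```python
# Hypothetical mapping from SiteTypeCV term to a marker color name
SITE_TYPE_COLORS = {'Stream': 'green', 'Spring': 'red'}


def marker_color(site_type, default='gray'):
    """Return the marker color for a site type, or `default` if unmapped."""
    return SITE_TYPE_COLORS.get(site_type, default)


# 'Lake' is just an example of a value with no entry in the mapping
colors = [marker_color(t) for t in ['Stream', 'Spring', 'Lake']]
print(colors)  # ['green', 'red', 'gray']
```

Applied to the GeoDataFrame, this could be written as gdf['color'] = gdf['SiteTypeCV'].map(marker_color).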
Note: While the database holds a copy of the ODM2 Controlled Vocabularies, the complete description of each CV term is available from a web request to the CV API at http://vocabulary.odm2.org. Want to know more about how a "spring" is defined? Here's one simple way, using Pandas to access and parse the CSV web service response.
sitetype = 'spring'
pd.read_csv("http://vocabulary.odm2.org/api/v1/sitetype/{}/?format=csv".format(sitetype))
term | name | definition | category | provenance | provenance_uri | note | |
---|---|---|---|---|---|---|---|
0 | spring | Spring | A location at which the water table intersects... | Spring Sites | Adapted from USGS Site Types. | NaN | http://wdr.water.usgs.gov/nwisgmap/help/sitety... |
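The request URL used above follows a simple pattern: API root, vocabulary name, term, and response format. A small helper makes that pattern explicit. This is a sketch that assumes other vocabularies and terms follow the same URL layout as the sitetype example, which is the only case shown here. The name cv_term_url is hypothetical.

```python
CV_API_ROOT = "http://vocabulary.odm2.org/api/v1"


def cv_term_url(vocabulary, term, fmt="csv"):
    """Build the request URL for one term of one controlled vocabulary."""
    return "{root}/{vocab}/{term}/?format={fmt}".format(
        root=CV_API_ROOT, vocab=vocabulary, term=term, fmt=fmt)


url = cv_term_url("sitetype", "spring")
print(url)  # http://vocabulary.odm2.org/api/v1/sitetype/spring/?format=csv
```

The result can be passed straight to pd.read_csv, as in the cell above.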
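Now we'll create an interactive Folium map of the sites. As the code below shows, this map features: centering on the centroid of the sites, marker clustering, marker colors set by SiteTypeCV, and popups giving each site's code, type and name.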
c = gdf.unary_union.centroid
m = folium.Map(location=[c.y, c.x], tiles='CartoDB positron', zoom_start=11)
marker_cluster = MarkerCluster().add_to(m)
for idx, feature in gdf.iterrows():
    folium.Marker(location=[feature.geometry.y, feature.geometry.x],
                  icon=folium.Icon(color=feature['color']),
                  popup="{0} ({1}): {2}".format(
                      feature['SamplingFeatureCode'], feature['SiteTypeCV'],
                      feature['SamplingFeatureName'])
                  ).add_to(marker_cluster)
# Done with setup. Now render the map
m
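The map above is centered on the centroid of the union of all site points. For a set of distinct points, that centroid should coincide with the arithmetic mean of the coordinates, so the center can also be estimated without any geospatial library. The sketch below assumes that equivalence and uses the coordinates of the first three sites from the table shown earlier. The name mean_center is hypothetical.

```python
def mean_center(points):
    """Return (mean_latitude, mean_longitude) for a list of (lat, lon) pairs."""
    if not points:
        raise ValueError("at least one point is required")
    lats = [lat for lat, lon in points]
    lons = [lon for lat, lon in points]
    return sum(lats) / float(len(lats)), sum(lons) / float(len(lons))


# Latitude/longitude of the first three sites listed in the GeoDataFrame above
sites = [(40.745078, -111.854449),
         (40.745106, -111.854389),
         (40.741583, -111.917667)]
center_lat, center_lon = mean_center(sites)
print(center_lat, center_lon)
```

The pair could then be used as folium.Map(location=[center_lat, center_lon], ...).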
This example just illustrates how to add a new entry. We won't "commit" (save) the sampling feature to the database.
sitesf0 = siteFeatures[0]
try:
    newsf = SamplingFeatures()
    session = session_factory.getSession()
    newsf.FeatureGeometryWKT = "POINT(-111.946 41.718)"
    newsf.Elevation_m = 100
    newsf.ElevationDatumCV = sitesf0.ElevationDatumCV
    newsf.SamplingFeatureCode = "TestSF"
    newsf.SamplingFeatureDescription = "this is a test to add a sampling feature"
    newsf.SamplingFeatureGeotypeCV = "Point"
    newsf.SamplingFeatureTypeCV = sitesf0.SamplingFeatureTypeCV
    newsf.SamplingFeatureUUID = sitesf0.SamplingFeatureUUID+"2"
    session.add(newsf)
    # To save the new sampling feature, do session.commit()
    print("New sampling feature created, but not saved to database.\n")
    print(newsf)
except Exception as e:
    print("error adding a sampling feature: {}".format(e))
New sampling feature created, but not saved to database.
<SamplingFeatures({'SamplingFeatureDescription': 'this is a test to add a sampling feature', 'SamplingFeatureGeotypeCV': 'Point', 'ElevationDatumCV': u'EGM96', 'Elevation_m': 100, 'SamplingFeatureUUID': u'0DDE8EF6-EC2F-42C0-AB50-20C6C02E89B22', 'SamplingFeatureTypeCV': u'Site', 'SamplingFeatureCode': 'TestSF', 'FeatureGeometryWKT': 'POINT(-111.946 41.718)'})>
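The difference between adding a record and committing it can be illustrated with the standard-library sqlite3 module. This is only an analogy: it does not use the odm2api session, and the one-column SamplingFeatures table here is a toy stand-in for the real one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy one-column table; the real SamplingFeatures table has many more columns
conn.execute("CREATE TABLE SamplingFeatures (SamplingFeatureCode TEXT)")
conn.commit()

# The INSERT opens a transaction; the new row is pending, not yet committed
conn.execute("INSERT INTO SamplingFeatures VALUES ('TestSF')")
pending = conn.execute("SELECT COUNT(*) FROM SamplingFeatures").fetchone()[0]

# Rolling back discards the pending row; commit() would have saved it instead
conn.rollback()
remaining = conn.execute("SELECT COUNT(*) FROM SamplingFeatures").fetchone()[0]
conn.close()
```

Here pending is 1 and remaining is 0: the row existed only inside the open transaction. In the notebook, the session object presumably offers the same choice between commit() and rollback(), assuming it is a standard SQLAlchemy session.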
This code shows some examples of how objects and related objects can be retrieved using the API. In the following, we use the getSamplingFeatures( ) function to return a particular sampling feature by passing in its SamplingFeatureCode. This function returns a list of SamplingFeature objects, so just get the first one in the returned list.
# Get the SamplingFeature object for a particular SamplingFeature by passing its SamplingFeatureCode
sf = read.getSamplingFeatures(codes=['RB_1300E'])[0]
type(sf)
odm2api.models.Sites
# Simple way to examine the content (properties) of a Python object, as if it were a dictionary
vars(sf)
{'ElevationDatumCV': u'EGM96', 'Elevation_m': 1356.0, 'FeatureGeometryWKT': None, 'Latitude': 40.745078, 'Longitude': -111.854449, 'SamplingFeatureCode': u'RB_1300E', 'SamplingFeatureDescription': None, 'SamplingFeatureGeotypeCV': None, 'SamplingFeatureID': 1, 'SamplingFeatureName': u'Red Butte Creek at 1300E (downstream of spring)', 'SamplingFeatureTypeCV': u'Site', 'SamplingFeatureUUID': u'0DDE8EF6-EC2F-42C0-AB50-20C6C02E89B2', 'SiteTypeCV': u'Stream', 'SpatialReferenceID': 1, '_sa_instance_state': <sqlalchemy.orm.state.InstanceState at 0x7efede22fc10>}
You can also drill down and get objects linked by foreign keys. The API returns related objects in a nested hierarchy so they can be interrogated in an object oriented way. So, if I use the getResults( ) function to return a Result from the database (e.g., a "Measurement" Result), I also get the associated Action that created that Result (e.g., a "Specimen analysis" Action).
try:
    # Call getResults, but return only the first Result
    firstResult = read.getResults()[0]
    frfa = firstResult.FeatureActionObj
    frfaa = firstResult.FeatureActionObj.ActionObj
    print("The FeatureAction object for the Result is: ", frfa)
    print("The Action object for the Result is: ", frfaa)
    # Print some Action attributes in a more human readable form
    print("\nThe following are some of the attributes for the Action that created the Result: ")
    print("ActionTypeCV: {}".format(frfaa.ActionTypeCV))
    print("ActionDescription: {}".format(frfaa.ActionDescription))
    print("BeginDateTime: {}".format(frfaa.BeginDateTime))
    print("EndDateTime: {}".format(frfaa.EndDateTime))
    print("MethodName: {}".format(frfaa.MethodObj.MethodName))
    print("MethodDescription: {}".format(frfaa.MethodObj.MethodDescription))
except Exception as e:
    print("Unable to demo Foreign Key Example: {}".format(e))
('The FeatureAction object for the Result is: ', <FeatureActions({'FeatureActionID': 1, 'SamplingFeatureID': 26, 'ActionID': 1})>)
('The Action object for the Result is: ', <Actions({'MethodID': 2, 'ActionDescription': None, 'ActionFileLink': None, 'EndDateTime': None, 'BeginDateTime': datetime.datetime(2014, 10, 30, 0, 0), 'EndDateTimeUTCOffset': None, 'ActionTypeCV': u'Specimen analysis', 'ActionID': 1, 'BeginDateTimeUTCOffset': -7})>)
The following are some of the attributes for the Action that created the Result:
ActionTypeCV: Specimen analysis
ActionDescription: None
BeginDateTime: 2014-10-30 00:00:00
EndDateTime: None
MethodName: Astoria Total Phosphorus
MethodDescription: Determination of total phosphorus by persulphate oxidation digestion and ascorbic acid method
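Chains such as firstResult.FeatureActionObj.ActionObj.MethodObj fail with an AttributeError if any link in the chain is None. A small helper that walks a dotted path and falls back to a default can make this kind of drill-down more forgiving. The sketch below uses plain stand-in objects rather than real API objects. The names get_nested and Stub are hypothetical.

```python
from functools import reduce


class Stub(object):
    """Plain attribute holder standing in for an API object (illustration only)."""
    def __init__(self, **attrs):
        self.__dict__.update(attrs)


def get_nested(obj, dotted_path, default=None):
    """Follow a dotted attribute path; return `default` if any link is missing or None."""
    def step(current, name):
        return None if current is None else getattr(current, name, None)
    value = reduce(step, dotted_path.split('.'), obj)
    return default if value is None else value


# Stand-in objects mimicking Result -> FeatureAction -> Action -> Method
method = Stub(MethodName='Astoria Total Phosphorus')
action = Stub(ActionTypeCV='Specimen analysis', MethodObj=method, EndDateTime=None)
result = Stub(FeatureActionObj=Stub(ActionObj=action))

name = get_nested(result, 'FeatureActionObj.ActionObj.MethodObj.MethodName')
end = get_nested(result, 'FeatureActionObj.ActionObj.EndDateTime', default='not recorded')
print(name)  # Astoria Total Phosphorus
print(end)   # not recorded
```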
Because all of the objects are returned in a nested form, if you retrieve a result, you can interrogate it to get all of its related attributes. When a Result object is returned, it includes objects that contain information about Variable, Units, ProcessingLevel, and the related Action that created that Result.
print("------- Example of Retrieving Attributes of a Result -------")
try:
    firstResult = read.getResults()[0]
    frfa = firstResult.FeatureActionObj
    print("The following are some of the attributes for the Result retrieved: ")
    print("ResultID: {}".format(firstResult.ResultID))
    print("ResultTypeCV: {}".format(firstResult.ResultTypeCV))
    print("ValueCount: {}".format(firstResult.ValueCount))
    print("ProcessingLevel: {}".format(firstResult.ProcessingLevelObj.Definition))
    print("SampledMedium: {}".format(firstResult.SampledMediumCV))
    print("Variable: {}: {}".format(firstResult.VariableObj.VariableCode,
                                    firstResult.VariableObj.VariableNameCV))
    print("Units: {}".format(firstResult.UnitsObj.UnitsName))
    print("SamplingFeatureID: {}".format(frfa.SamplingFeatureObj.SamplingFeatureID))
    print("SamplingFeatureCode: {}".format(frfa.SamplingFeatureObj.SamplingFeatureCode))
except Exception as e:
    print("Unable to demo example of retrieving Attributes of a Result: {}".format(e))
------- Example of Retrieving Attributes of a Result -------
The following are some of the attributes for the Result retrieved:
ResultID: 1
ResultTypeCV: Measurement
ValueCount: 1
ProcessingLevel: Raw Data
SampledMedium: Liquid aqueous
Variable: TP: Phosphorus, total
Units: milligrams per liter
SamplingFeatureID: 26
SamplingFeatureCode: 3
The last block of code returns a particular Measurement Result. From that I can get the SamplingFeatureID (in this case 26) for the Specimen from which the Result was generated. But, if I want to figure out which Site the Specimen was collected at, I need to query the database to get the related Site SamplingFeature. I can use getRelatedSamplingFeatures( ) for this. Once I've got the SamplingFeature for the Site, I could get the rest of the SamplingFeature attributes.
# Pass the Sampling Feature ID of the specimen, and the relationship type
relatedSite = read.getRelatedSamplingFeatures(sfid=26, relationshiptype='Was Collected at')[0]
vars(relatedSite)
{'ElevationDatumCV': u'EGM96', 'Elevation_m': 1356.0, 'FeatureGeometryWKT': None, 'Latitude': 40.745078, 'Longitude': -111.854449, 'SamplingFeatureCode': u'RB_1300E', 'SamplingFeatureDescription': None, 'SamplingFeatureGeotypeCV': None, 'SamplingFeatureID': 1, 'SamplingFeatureName': u'Red Butte Creek at 1300E (downstream of spring)', 'SamplingFeatureTypeCV': u'Site', 'SamplingFeatureUUID': u'0DDE8EF6-EC2F-42C0-AB50-20C6C02E89B2', 'SiteTypeCV': u'Stream', 'SpatialReferenceID': 1, '_sa_instance_state': <sqlalchemy.orm.state.InstanceState at 0x7efede22fc10>}
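Conceptually, the lookup above follows a relationship record from the Specimen to its Site. The sketch below reproduces that idea with the standard-library sqlite3 module and two toy tables loosely modeled on the ODM2 SamplingFeatures and RelatedFeatures tables. The column subset is a simplifying assumption, and this is not how the API itself is implemented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SamplingFeatures (SamplingFeatureID INTEGER PRIMARY KEY,
                               SamplingFeatureTypeCV TEXT,
                               SamplingFeatureCode TEXT);
CREATE TABLE RelatedFeatures (SamplingFeatureID INTEGER,
                              RelationshipTypeCV TEXT,
                              RelatedFeatureID INTEGER);
INSERT INTO SamplingFeatures VALUES (1, 'Site', 'RB_1300E');
INSERT INTO SamplingFeatures VALUES (26, 'Specimen', '3');
INSERT INTO RelatedFeatures VALUES (26, 'Was Collected at', 1);
""")

# Follow the relationship from the specimen (ID 26) to the site it came from
row = conn.execute("""
    SELECT site.SamplingFeatureID, site.SamplingFeatureCode
    FROM RelatedFeatures AS rel
    JOIN SamplingFeatures AS site
      ON site.SamplingFeatureID = rel.RelatedFeatureID
    WHERE rel.SamplingFeatureID = ? AND rel.RelationshipTypeCV = ?
""", (26, 'Was Collected at')).fetchone()
conn.close()
```

The returned row holds the site's ID (1) and code (RB_1300E), matching the relatedSite output above.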
From the list of Variables returned above and the information about the SamplingFeature I queried above, I know that VariableID = 2 for Total Phosphorus and SiteID = 1 for the Red Butte Creek site at 1300E. I can use the getResults( ) function to get all of the Total Phosphorus results for this site by passing in the VariableID and the SiteID.
siteID = 1 # Red Butte Creek at 1300 E (obtained from the getRelatedSamplingFeatures query)
v = variables_df[variables_df['VariableCode'] == 'TP']
variableID = v.index[0]
results = read.getResults(siteid=siteID, variableid=variableID, restype="Measurement")
# Get the list of ResultIDs so I can retrieve the data values associated with all of the results
resultIDList = [x.ResultID for x in results]
len(resultIDList)
18
Now I can retrieve all of the data values associated with the list of Results I just retrieved. In ODM2, water chemistry measurements are stored as "Measurement" results. Each "Measurement" Result has a single data value associated with it. So, for convenience, the getResultValues( ) function allows you to pass in a list of ResultIDs so you can get the data values for all of them back in a Pandas data frame object, which is easier to work with. Once I've got the data in a Pandas data frame object, I can use the plot( ) function directly on the data frame to create a quick visualization.
# Get all of the data values for the Results in the list created above
# Call getResultValues, which returns a Pandas Data Frame with the data
resultValues = read.getResultValues(resultids=resultIDList, lowercols=False)
resultValues.head()
ValueID | ResultID | DataValue | ValueDateTime | ValueDateTimeUTCOffset | |
---|---|---|---|---|---|
0 | 1 | 1 | 0.0100 | 2015-10-27 13:26:24 | -7 |
1 | 100 | 1 | 0.0100 | 2015-11-17 13:55:12 | -7 |
2 | 109 | 10 | 0.0574 | 2015-05-12 14:24:00 | -7 |
3 | 10 | 10 | 0.0574 | 2015-06-18 12:43:12 | -7 |
4 | 198 | 99 | 0.0424 | 2015-10-27 13:55:12 | -7 |
# Plot the time sequence of Measurement Result Values
ax = resultValues.plot(x='ValueDateTime', y='DataValue', title=relatedSite.SamplingFeatureName,
kind='line', use_index=True, linestyle='solid', style='o')
ax.set_ylabel("{0} ({1})".format(results[0].VariableObj.VariableNameCV,
results[0].UnitsObj.UnitsAbbreviation))
ax.set_xlabel('Date/Time')
ax.legend().set_visible(False)
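The head() output above lists values out of chronological order, and a line plot generally connects points in the order the rows appear. If the rows are not already sorted, sorting by time first avoids a zigzag line. The sketch below shows the idea in plain Python using the five rows printed above. The name sort_by_time is hypothetical.

```python
from datetime import datetime

# (ValueDateTime, DataValue) pairs copied from the five rows shown above
rows = [
    ("2015-10-27 13:26:24", 0.0100),
    ("2015-11-17 13:55:12", 0.0100),
    ("2015-05-12 14:24:00", 0.0574),
    ("2015-06-18 12:43:12", 0.0574),
    ("2015-10-27 13:55:12", 0.0424),
]


def sort_by_time(pairs, fmt="%Y-%m-%d %H:%M:%S"):
    """Parse the timestamp strings and return (datetime, value) pairs in time order."""
    parsed = [(datetime.strptime(stamp, fmt), value) for stamp, value in pairs]
    return sorted(parsed, key=lambda pair: pair[0])


ordered = sort_by_time(rows)
dates = [stamp.strftime("%Y-%m-%d") for stamp, _ in ordered]
print(dates)  # ['2015-05-12', '2015-06-18', '2015-10-27', '2015-10-27', '2015-11-17']
```

With the actual DataFrame, the pandas equivalent would be resultValues.sort_values('ValueDateTime') before calling plot.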
If I'm going to reuse a series of steps, it's always helpful to write little generic functions that can be called to quickly and consistently get what we need. To conclude this demo, here's one such function that encapsulates the VariableID, getResults and getResultValues queries we showed above. Then we leverage it to create a nice 2-variable (2-axis) plot of TP and TN vs time, and conclude with a reminder that we have ready access to related metadata about analytical lab methods and such.
def get_results_and_values(siteid, variablecode):
    v = variables_df[variables_df['VariableCode'] == variablecode]
    variableID = v.index[0]
    results = read.getResults(siteid=siteid, variableid=variableID, restype="Measurement")
    resultIDList = [x.ResultID for x in results]
    resultValues = read.getResultValues(resultids=resultIDList, lowercols=False)
    return resultValues, results
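In get_results_and_values, v.index[0] raises a bare IndexError when the variable code matches no row. A lookup that fails with a descriptive message can make typos easier to spot. The sketch below uses a plain dictionary in place of variables_df, filled with the first five code/ID pairs from the Variables table shown earlier. The names VARIABLE_IDS and lookup_variable_id are hypothetical.

```python
# VariableCode -> VariableID pairs taken from the Variables table shown earlier
VARIABLE_IDS = {'TN': 1, 'TP': 2, 'Nitrate': 3, 'Ammonia': 4, 'Phosphate': 5}


def lookup_variable_id(variablecode, variable_ids=VARIABLE_IDS):
    """Return the VariableID for a code, or raise a descriptive ValueError."""
    try:
        return variable_ids[variablecode]
    except KeyError:
        raise ValueError("Unknown VariableCode {!r}; expected one of {}".format(
            variablecode, sorted(variable_ids)))


tp_id = lookup_variable_id('TP')
print(tp_id)  # 2
```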
Fancy plotting, leveraging the Pandas plot method and matplotlib.
# Plot figure and axis set up
f, ax = plt.subplots(1, figsize=(13, 6))
# First plot (left axis)
VariableCode = 'TP'
resultValues_TP, results_TP = get_results_and_values(siteID, VariableCode)
resultValues_TP.plot(x='ValueDateTime', y='DataValue', label=VariableCode,
style='o-', kind='line', ax=ax)
ax.set_ylabel("{0}: {1} ({2})".format(VariableCode, results_TP[0].VariableObj.VariableNameCV,
results_TP[0].UnitsObj.UnitsAbbreviation))
# Second plot (right axis)
VariableCode = 'TN'
resultValues_TN, results_TN = get_results_and_values(siteID, VariableCode)
resultValues_TN.plot(x='ValueDateTime', y='DataValue', label=VariableCode,
style='^-', kind='line', ax=ax,
secondary_y=True)
ax.right_ax.set_ylabel("{0}: {1} ({2})".format(VariableCode, results_TN[0].VariableObj.VariableNameCV,
results_TN[0].UnitsObj.UnitsAbbreviation))
# Tweak the figure
ax.legend(loc='upper left')
ax.right_ax.legend(loc='upper right')
ax.grid(True)
ax.set_xlabel('')
ax.set_title(relatedSite.SamplingFeatureName);
Finally, let's show some useful metadata. Use the Results records and their relationship to Actions (via FeatureActions) to extract and print out the Specimen Analysis methods used for TN and TP. Or at least for the first result for each of the two variables; methods may have varied over time, but the specific method associated with each result is stored in ODM2 and available.
results_faam = lambda results, i: results[i].FeatureActionObj.ActionObj.MethodObj
print("TP METHOD: {0} ({1})".format(results_faam(results_TP, 0).MethodName,
results_faam(results_TP, 0).MethodDescription))
print("TN METHOD: {0} ({1})".format(results_faam(results_TN, 0).MethodName,
results_faam(results_TN, 0).MethodDescription))
TP METHOD: Astoria Total Phosphorus (Determination of total phosphorus by persulphate oxidation digestion and ascorbic acid method)
TN METHOD: Astoria Total Nitrogen (Determination of total Nitrogen by persulphate oxidation digestion and cadmium reduction method)
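Since methods may have varied over time, it can be useful to list every distinct method across all results rather than only the first. The sketch below does this with stand-in objects that mimic the Result, FeatureAction, Action and Method nesting. The names Stub, make_result and distinct_method_names are hypothetical, and the second method name is invented purely for illustration.

```python
class Stub(object):
    """Plain attribute holder standing in for an API object (illustration only)."""
    def __init__(self, **attrs):
        self.__dict__.update(attrs)


def make_result(method_name):
    """Build a stand-in Result whose nested objects end in a Method with the given name."""
    return Stub(FeatureActionObj=Stub(ActionObj=Stub(MethodObj=Stub(MethodName=method_name))))


def distinct_method_names(results):
    """Return the sorted, distinct MethodName values reachable from a list of results."""
    return sorted({r.FeatureActionObj.ActionObj.MethodObj.MethodName for r in results})


# Two results share a method; the third uses an invented name for illustration
fake_results = [make_result('Astoria Total Phosphorus'),
                make_result('Astoria Total Phosphorus'),
                make_result('Hypothetical later method')]
names = distinct_method_names(fake_results)
print(names)  # ['Astoria Total Phosphorus', 'Hypothetical later method']
```

With the real data, distinct_method_names(results_TP) would show whether more than one method was used for TP at this site.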