Xarray is a powerful Python package that provides N-dimensional
labeled arrays and datasets following the Common Data Model. MetPy's suite of meteorological
calculations are designed to integrate with xarray DataArrays as one of its two primary data
models (the other being Pint Quantities). MetPy also provides DataArray and Dataset
accessors (collections of methods and properties attached to the .metpy
property) for
coordinate/CRS and unit operations.
Full information on MetPy's accessors is available in the appropriate section of MetPy's documentation, otherwise, continue on in this notebook for a demonstration of the three main components of MetPy's integration with xarray (coordinates/coordinate reference systems, units, and calculations), as well as instructive examples for both CF-compliant and non-compliant datasets.
This training notebook a lengthy one, and is based off of the xarray with MetPy Tutorial in MetPy's documentation. Also, there are several other training notebooks that expand upon the features of xarray (see the see also section below). Feel free to use this notebook as the big-picture overview of working with gridded data in MetPy's ecosystem (and then going to the specific xarray notebooks where needed), or, look through the xarray notebooks individually first, since this notebook covers extensions to those behaviors to make them easier to use with MetPy.
import numpy as np
import xarray as xr
# Any import of metpy will activate the accessors
import metpy.calc as mpcalc
from metpy.cbook import get_test_data
from metpy.units import units
While xarray can handle a wide variety of n-dimensional data (essentially anything that can be stored in a netCDF file), a common use case is working with gridded model output. For more details on accessing data with xarray, see the xarray data access notebook.
For now, we are just going to load a local file from MetPy's test data collection:
# Open the netCDF file as a xarray Dataset
data = xr.open_dataset(get_test_data('irma_gfs_example.nc', False))
# View a summary of the Dataset
data
Take a look at the output of this cell. The data
variable here is an xarray Dataset
, and it consists of dimensions and their associated coordinates, which in turn make up the axes along which the data variables are defined. The dataset also has a dictionary-like collection of attributes. If you've worked with NetCDF data in the past, hopefully these should be familiar to you.
What happens if we look at just a single data variable?
temperature = data['Temperature_isobaric']
temperature
This is a DataArray
, which stores just a single data variable with its associated
coordinates and attributes. These individual DataArray
s are the kinds of objects that
MetPy's calculations take as input (more on that in Calculations section below).
If you are more interested in learning about xarray's terminology and data structures, see the terminology section of xarray's documenation.
MetPy's first set of helpers comes with identifying coordinate types. In a given dataset, coordinates can have a variety of different names and yet refer to the same type (such as the "isobaric1" and "isobaric3" coordinates seen above both referring to vertical isobaric coordinates). Following CF conventions, as well as using some fall-back regular expressions, MetPy can systematically identify coordinates of the following types:
When identifying a single coordinate, it is best to use the property directly associated with that type
temperature.metpy.time
When accessing multiple coordinate types simultaneously, you can use the .coordinates()
method to yield a generator for the respective coordinates
x, y = temperature.metpy.coordinates('x', 'y')
If you're indexing into your data with xarray, and want to use these shortcut aliases instead of the direct coordinate names, MetPy provides its own version of .sel
and .loc
to handle these! For example, to use MetPy's wrapped .sel
and .loc
to access 500 hPa heights at 1800Z, you can do the following:
heights = data['Geopotential_height_isobaric'].metpy.sel(
time='2017-09-05 18:00',
vertical=50000.
)
(Notice how we specified 50000 here without units...we'll go over a better alternative in the next section on units.)
One point of warning: xarray's selection and indexing only works if these coordinates are
dimension coordinates, meaning that they are 1D and share the name of their associated
dimension. In practice, this means that you can't index a dataset that has 2D latitude and
longitude coordinates by latitudes and longitudes, instead, you must index by the 1D y and x
dimension coordinates. (What if these coordinates are missing, you may ask? MetPy has the .assign_y_x
helper for that.)
Beyond just the coordinates themselves, a common need for both calculations with and plots
of geospatial data is knowing the coordinate reference system (CRS) on which the horizontal
spatial coordinates are defined. MetPy follows the CF Conventions
for its CRS definitions, which it then caches on the metpy_crs
coordinate in order for
it to persist through calculations and other array operations. There are two ways to do so
in MetPy:
First, if your dataset is already conforming to the CF Conventions, it will have a grid
mapping variable that is associated with the other data variables by the grid_mapping
attribute. This is automatically parsed via the .parse_cf()
method:
# Parse full dataset
data_parsed = data.metpy.parse_cf()
# Parse subset of dataset
data_subset = data.metpy.parse_cf([
'u-component_of_wind_isobaric',
'v-component_of_wind_isobaric',
'Vertical_velocity_pressure_isobaric'
])
# Parse single variable
relative_humidity = data.metpy.parse_cf('Relative_humidity_isobaric')
If your dataset doesn't have a CF-conforming grid mapping variable, you can manually specify
the CRS using the .assign_crs()
method:
temperature = data['Temperature_isobaric'].metpy.assign_crs(
grid_mapping_name='latitude_longitude',
earth_radius=6371229.0
)
temperature
Notice the newly added metpy_crs
non-dimension coordinate. Now how can we use this in
practice? For individual DataArrays
s, we can access the cartopy and pyproj objects
corresponding to this CRS:
# Cartopy CRS, useful for plotting
relative_humidity.metpy.cartopy_crs
# pyproj CRS, useful for projection transformations and forward/backward azimuth and great
# circle calculations
temperature.metpy.pyproj_crs
Finally, there are times when a certain horizontal coordinate type is missing from your
dataset, and you need the other, that is, you have latitude/longitude and need y/x, or visa
versa. This is where the .assign_y_x
and .assign_latitude_longitude
methods come in
handy. Our current GFS sample won't work to demonstrate this (since, on its
latitude-longitude grid, y is latitude and x is longitude), so for more information, take
a look at the Non-Compliant Dataset Example below.
Since unit-aware calculations are a major part of the MetPy library, unit support is a major part of MetPy's xarray integration!
One very important point of consideration is that xarray data variables (in both
Dataset
s and DataArray
s) can store both unit-aware and unit-naive array types.
Unit-naive array types will be used by default in xarray, so we need to convert to a
unit-aware type if we want to use xarray operations while preserving unit correctness (What are these xarray operations you may ask? See the xarray aggregations and xarray calculations training notebooks!). MetPy
provides the .quantify()
method for this conversion (named since we are turning the data stored
inside the xarray object into a Pint Quantity
object...and if you want to learn more about those, see the MetPy with units training notebook).
heights = heights.metpy.quantify()
heights
Notice how the units are now represented in the data itself, rather than as a text attribute. Now, even if we perform some kind of xarray operation (such as taking the zonal mean, which is an aggregation operation), the units are preserved
heights_mean = heights.mean('longitude')
heights_mean
However, this "quantification" is not without its consequences. By default, xarray loads its
data lazily to conserve memory usage. Unless your data is chunked into a Dask array (using
the chunks
argument), this .quantify()
method will load data into memory, which
could slow your script or even cause your process to run out of memory. And so, we recommend
subsetting your data before quantifying it.
Also, these Pint Quantity
data objects are not properly handled by xarray when writing
to disk. And so, if you want to safely export your data, you will need to undo the
quantification with the .dequantify()
method, which converts your data back to a
unit-naive array with the unit as a text attribute
heights_mean_str_units = heights_mean.metpy.dequantify()
heights_mean_str_units
Other useful unit integration features include:
Unit-based selection/indexing:
heights_at_45_north = data['Geopotential_height_isobaric'].metpy.sel(
latitude=45 * units.degrees_north,
vertical=300 * units.hPa
)
heights_at_45_north
Unit conversion:
temperature_degC = temperature[0].metpy.convert_units('degC')
temperature_degC
Unit conversion for coordinates:
heights_on_hPa_levels = heights.metpy.convert_coordinate_units('isobaric3', 'hPa')
heights_on_hPa_levels['isobaric3']
Accessing just the underlying unit array:
heights_unit_array = heights.metpy.unit_array
heights_unit_array
Accessing just the underlying units:
height_units = heights.metpy.units
height_units
MetPy's xarray integration extends to its calcuation suite as well. Most grid-capable
calculations (such as thermodynamics, kinematics, and smoothers) fully support xarray
DataArray
s by accepting them as inputs, returning them as outputs, and automatically
using the attached coordinate data/metadata to determine grid arguments.
As before, if you'd like more information on MetPy's calculations, see the basic overview training notebook or look through MetPy's documentation.
As an example, let's calculate geostrophic wind from our geopotential height data. Note that we only have to specify the height field, as all coordinate/grid information is automatically extracted from the attached metadata.
heights = data_parsed.metpy.parse_cf('Geopotential_height_isobaric').metpy.sel(
time='2017-09-05 18:00',
vertical=500 * units.hPa
)
u_g, v_g = mpcalc.geostrophic_wind(heights)
u_g
Now, not every MetPy calculation will always return a DataArray
. For profile-based calculations (and most remaining calculations in the metpy.calc
module not in the categories mentioned above), xarray DataArray
s are accepted as inputs, but the outputs remain Pint Quantities (typically scalars), like in this below example with CAPE:
data_at_point = data.metpy.sel(
time1='2017-09-05 12:00',
latitude=40 * units.degrees_north,
longitude=260 * units.degrees_east
)
dewpoint = mpcalc.dewpoint_from_relative_humidity(
data_at_point['Temperature_isobaric'],
data_at_point['Relative_humidity_isobaric']
)
cape, cin = mpcalc.surface_based_cape_cin(
data_at_point['isobaric3'],
data_at_point['Temperature_isobaric'],
dewpoint
)
cape
A few remaining portions of MetPy's calculations (mainly the interpolation module and a few
other functions) do not fully support xarray, and so, use of .values
may be needed to
convert to a bare NumPy array.
Now let's examine some other datasets so that we can go over the two most common workflows needed to handle any kind of gridded dataset you may have in xarray.
The GFS sample used throughout this tutorial so far has been an example of a CF-compliant dataset. These kinds of datasets are easiest to work with it MetPy, since most of the "xarray magic" uses CF metadata. For this kind of dataset, a typical workflow looks like the following
# Load data, parse it for a CF grid mapping, and promote lat/lon data variables to coordinates
data = xr.open_dataset(
get_test_data('narr_example.nc', False)
).metpy.parse_cf().set_coords(['lat', 'lon'])
# Subset to only the data you need to save on memory usage
subset = data.metpy.sel(isobaric=500 * units.hPa)
# Quantify if you plan on performing xarray operations that need to maintain unit correctness
subset = subset.metpy.quantify()
# Perform calculations
heights = mpcalc.smooth_gaussian(subset['Geopotential_height'], 5)
subset['u_geo'], subset['v_geo'] = mpcalc.geostrophic_wind(heights)
# Plot
heights.plot()
# Save output
subset.metpy.dequantify().drop_vars('metpy_crs').to_netcdf('500hPa_analysis.nc')
When CF metadata (such as grid mapping, coordinate attributes, etc.) are missing, a bit more work is required to manually supply the required information, for example,
nonstandard = xr.Dataset({
'temperature': (('y', 'x'), np.arange(0, 9).reshape(3, 3) * units.degC),
'y': ('y', np.arange(0, 3) * 1e5, {'units': 'km'}),
'x': ('x', np.arange(0, 3) * 1e5, {'units': 'km'})
})
# Add both CRS and then lat/lon coords using chained methods
data = nonstandard.metpy.assign_crs(
grid_mapping_name='lambert_conformal_conic',
latitude_of_projection_origin=38.5,
longitude_of_central_meridian=262.5,
standard_parallel=38.5,
earth_radius=6371229.0
).metpy.assign_latitude_longitude()
# Preview the changes
data
Once the CRS and additional coordinates are assigned, you can generally proceed as you would for a CF-compliant dataset.
There was a lot going on in this notebook! But, that's just because there is so much that you can do with xarray, and MetPy has a fair number of utilities to make it all go more smoothly when working with atmospheric science datasets. Since we couldn't cover the details of everything xarray here, be sure to take a look at the other xarray training notebooks linked below: