This notebooks shows how to query the catalog and load the data using python
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr
import zarr
import fsspec
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
plt.rcParams['figure.figsize'] = 12, 6
The data catatalog is stored as a CSV file. Here we read it with Pandas.
df = pd.read_csv('https://storage.googleapis.com/cmip6/cmip6-zarr-consolidated-stores.csv')
df.head()
The columns of the dataframe correspond to the CMI6 controlled vocabulary. A beginners' guide to these terms is available in this document.
Here we filter the data to find monthly surface air temperature for historical experiments.
df_ta = df.query("activity_id=='CMIP' & table_id == 'Amon' & variable_id == 'tas' & experiment_id == 'historical'")
df_ta
Now we do further filtering to find just the models from NCAR.
df_ta_ncar = df_ta.query('institution_id == "NCAR"')
df_ta_ncar
Now we will load a single store using fsspec, zarr, and xarray.
# get the path to a specific zarr store (the first one from the dataframe above)
zstore = df_ta_ncar.zstore.values[-1]
print(zstore)
# create a mutable-mapping-style interface to the store
mapper = fsspec.get_mapper(zstore)
# open it using xarray and zarr
ds = xr.open_zarr(mapper, consolidated=True)
ds
Plot a map from a specific date.
ds.tas.sel(time='1950-01').squeeze().plot()
Create a timeseries of global-average surface air temperature. For this we need the area weighting factor for each gridpoint.
df_area = df.query("variable_id == 'areacella' & source_id == 'CESM2'")
ds_area = xr.open_zarr(fsspec.get_mapper(df_area.zstore.values[0]), consolidated=True)
ds_area
total_area = ds_area.areacella.sum(dim=['lon', 'lat'])
ta_timeseries = (ds.tas * ds_area.areacella).sum(dim=['lon', 'lat']) / total_area
ta_timeseries
By default the data are loaded lazily, as Dask arrays. Here we trigger computation explicitly.
%time ta_timeseries.load()
ta_timeseries.plot(label='monthly')
ta_timeseries.rolling(time=12).mean().plot(label='12 month rolling mean')
plt.legend()
plt.title('Global Mean Surface Air Temperature')