This notebook shows how to query the catalog and load the data using Python.
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr
import zarr
import fsspec
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
plt.rcParams['figure.figsize'] = 12, 6
The data catalog is stored as a CSV file. Here we read it with pandas.
df = pd.read_csv('https://storage.googleapis.com/cmip6/cmip6-zarr-consolidated-stores.csv')
df.head()
/srv/conda/envs/notebook/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3146: DtypeWarning: Columns (10) have mixed types.Specify dtype option on import or set low_memory=False. interactivity=interactivity, compiler=compiler, result=result)
| | activity_id | institution_id | source_id | experiment_id | member_id | table_id | variable_id | grid_label | zstore | dcpp_init_year | version |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | AerChemMIP | AS-RCEC | TaiESM1 | histSST | r1i1p1f1 | AERmon | od550aer | gn | gs://cmip6/AerChemMIP/AS-RCEC/TaiESM1/histSST/... | NaN | 20200310 |
1 | AerChemMIP | BCC | BCC-ESM1 | histSST | r1i1p1f1 | AERmon | mmrbc | gn | gs://cmip6/AerChemMIP/BCC/BCC-ESM1/histSST/r1i... | NaN | 20190718 |
2 | AerChemMIP | BCC | BCC-ESM1 | histSST | r1i1p1f1 | AERmon | mmrdust | gn | gs://cmip6/AerChemMIP/BCC/BCC-ESM1/histSST/r1i... | NaN | 20191127 |
3 | AerChemMIP | BCC | BCC-ESM1 | histSST | r1i1p1f1 | AERmon | mmroa | gn | gs://cmip6/AerChemMIP/BCC/BCC-ESM1/histSST/r1i... | NaN | 20190809 |
4 | AerChemMIP | BCC | BCC-ESM1 | histSST | r1i1p1f1 | AERmon | mmrso4 | gn | gs://cmip6/AerChemMIP/BCC/BCC-ESM1/histSST/r1i... | NaN | 20191127 |
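The DtypeWarning in the output above is raised when pandas infers column types chunk by chunk and finds inconsistent results. The following is a minimal sketch of two ways to avoid it, run on a tiny in-memory CSV rather than the real catalog. The sample rows, the model name `MODEL-B`, and the choice to read `version` as a string are assumptions made for illustration only.

```python
import io

import pandas as pd

# Tiny stand-in for the catalog CSV (hypothetical rows, a subset of the real column names).
sample_csv = io.StringIO(
    "activity_id,source_id,variable_id,dcpp_init_year,version\n"
    "CMIP,CESM2,tas,,20190308\n"
    "DCPP,MODEL-B,tas,1960.0,20191007\n"
)

# Option 1: read the whole file before inferring dtypes, as the warning suggests.
df_demo = pd.read_csv(sample_csv, low_memory=False)

# Option 2: state the dtype of a column explicitly instead of letting pandas guess.
sample_csv.seek(0)
df_demo_str = pd.read_csv(sample_csv, dtype={"version": str})

print(df_demo.dtypes)
print(df_demo_str["version"].tolist())
```

With the real catalog, the same keyword arguments would be passed to the `pd.read_csv` call that takes the catalog URL.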
The columns of the dataframe correspond to the CMIP6 controlled vocabulary. A beginners' guide to these terms is available in this document.
Here we filter the data to find monthly surface air temperature for historical experiments.
df_ta = df.query("activity_id=='CMIP' & table_id == 'Amon' & variable_id == 'tas' & experiment_id == 'historical'")
df_ta
| | activity_id | institution_id | source_id | experiment_id | member_id | table_id | variable_id | grid_label | zstore | dcpp_init_year | version |
---|---|---|---|---|---|---|---|---|---|---|---|
5769 | CMIP | AS-RCEC | TaiESM1 | historical | r1i1p1f1 | Amon | tas | gn | gs://cmip6/CMIP/AS-RCEC/TaiESM1/historical/r1i... | NaN | 20200623 |
5953 | CMIP | AWI | AWI-CM-1-1-MR | historical | r1i1p1f1 | Amon | tas | gn | gs://cmip6/CMIP/AWI/AWI-CM-1-1-MR/historical/r... | NaN | 20200720 |
6013 | CMIP | AWI | AWI-CM-1-1-MR | historical | r2i1p1f1 | Amon | tas | gn | gs://cmip6/CMIP/AWI/AWI-CM-1-1-MR/historical/r... | NaN | 20200720 |
6067 | CMIP | AWI | AWI-CM-1-1-MR | historical | r3i1p1f1 | Amon | tas | gn | gs://cmip6/CMIP/AWI/AWI-CM-1-1-MR/historical/r... | NaN | 20200720 |
6126 | CMIP | AWI | AWI-CM-1-1-MR | historical | r4i1p1f1 | Amon | tas | gn | gs://cmip6/CMIP/AWI/AWI-CM-1-1-MR/historical/r... | NaN | 20200720 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
75408 | CMIP | THU | CIESM | historical | r1i1p1f1 | Amon | tas | gr | gs://cmip6/CMIP/THU/CIESM/historical/r1i1p1f1/... | NaN | 20200417 |
75446 | CMIP | THU | CIESM | historical | r2i1p1f1 | Amon | tas | gr | gs://cmip6/CMIP/THU/CIESM/historical/r2i1p1f1/... | NaN | 20200417 |
75473 | CMIP | THU | CIESM | historical | r3i1p1f1 | Amon | tas | gr | gs://cmip6/CMIP/THU/CIESM/historical/r3i1p1f1/... | NaN | 20200417 |
75598 | CMIP | UA | MCM-UA-1-0 | historical | r1i1p1f1 | Amon | tas | gn | gs://cmip6/CMIP/UA/MCM-UA-1-0/historical/r1i1p... | NaN | 20190731 |
75639 | CMIP | UA | MCM-UA-1-0 | historical | r1i1p1f2 | Amon | tas | gn | gs://cmip6/CMIP/UA/MCM-UA-1-0/historical/r1i1p... | NaN | 20190731 |
546 rows × 11 columns
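The query string used above is shorthand for a boolean mask over the catalog columns. The sketch below shows both forms side by side on a hypothetical four-row catalog. The rows are made up, and only the column names come from the real catalog.

```python
import pandas as pd

# Hypothetical mini-catalog with the same column names as the real one.
catalog = pd.DataFrame({
    "activity_id": ["CMIP", "CMIP", "ScenarioMIP", "CMIP"],
    "table_id": ["Amon", "Omon", "Amon", "Amon"],
    "variable_id": ["tas", "tos", "tas", "pr"],
    "experiment_id": ["historical", "historical", "ssp585", "historical"],
})

# The selection written as an explicit boolean mask.
mask = (
    (catalog["activity_id"] == "CMIP")
    & (catalog["table_id"] == "Amon")
    & (catalog["variable_id"] == "tas")
    & (catalog["experiment_id"] == "historical")
)
by_mask = catalog[mask]

# The same selection in the query-string form used in this notebook.
by_query = catalog.query(
    "activity_id == 'CMIP' & table_id == 'Amon' "
    "& variable_id == 'tas' & experiment_id == 'historical'"
)

print(by_mask.equals(by_query))
```

The mask form is more verbose but easier to build programmatically, for example when the facet values come from a dictionary.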
Now we do further filtering to find just the models from NCAR.
df_ta_ncar = df_ta.query('institution_id == "NCAR"')
df_ta_ncar
| | activity_id | institution_id | source_id | experiment_id | member_id | table_id | variable_id | grid_label | zstore | dcpp_init_year | version |
---|---|---|---|---|---|---|---|---|---|---|---|
61014 | CMIP | NCAR | CESM2-FV2 | historical | r1i1p1f1 | Amon | tas | gn | gs://cmip6/CMIP/NCAR/CESM2-FV2/historical/r1i1... | NaN | 20191120 |
61332 | CMIP | NCAR | CESM2-FV2 | historical | r2i1p1f1 | Amon | tas | gn | gs://cmip6/CMIP/NCAR/CESM2-FV2/historical/r2i1... | NaN | 20200226 |
61453 | CMIP | NCAR | CESM2-FV2 | historical | r3i1p1f1 | Amon | tas | gn | gs://cmip6/CMIP/NCAR/CESM2-FV2/historical/r3i1... | NaN | 20200226 |
61865 | CMIP | NCAR | CESM2-WACCM-FV2 | historical | r1i1p1f1 | Amon | tas | gn | gs://cmip6/CMIP/NCAR/CESM2-WACCM-FV2/historica... | NaN | 20191120 |
62163 | CMIP | NCAR | CESM2-WACCM-FV2 | historical | r2i1p1f1 | Amon | tas | gn | gs://cmip6/CMIP/NCAR/CESM2-WACCM-FV2/historica... | NaN | 20200226 |
62290 | CMIP | NCAR | CESM2-WACCM-FV2 | historical | r3i1p1f1 | Amon | tas | gn | gs://cmip6/CMIP/NCAR/CESM2-WACCM-FV2/historica... | NaN | 20200226 |
63238 | CMIP | NCAR | CESM2-WACCM | historical | r1i1p1f1 | Amon | tas | gn | gs://cmip6/CMIP/NCAR/CESM2-WACCM/historical/r1... | NaN | 20190227 |
63636 | CMIP | NCAR | CESM2-WACCM | historical | r2i1p1f1 | Amon | tas | gn | gs://cmip6/CMIP/NCAR/CESM2-WACCM/historical/r2... | NaN | 20190227 |
63942 | CMIP | NCAR | CESM2-WACCM | historical | r3i1p1f1 | Amon | tas | gn | gs://cmip6/CMIP/NCAR/CESM2-WACCM/historical/r3... | NaN | 20190227 |
65381 | CMIP | NCAR | CESM2 | historical | r10i1p1f1 | Amon | tas | gn | gs://cmip6/CMIP/NCAR/CESM2/historical/r10i1p1f... | NaN | 20190313 |
65666 | CMIP | NCAR | CESM2 | historical | r11i1p1f1 | Amon | tas | gn | gs://cmip6/CMIP/NCAR/CESM2/historical/r11i1p1f... | NaN | 20190514 |
65998 | CMIP | NCAR | CESM2 | historical | r1i1p1f1 | Amon | tas | gn | gs://cmip6/CMIP/NCAR/CESM2/historical/r1i1p1f1... | NaN | 20190308 |
66325 | CMIP | NCAR | CESM2 | historical | r2i1p1f1 | Amon | tas | gn | gs://cmip6/CMIP/NCAR/CESM2/historical/r2i1p1f1... | NaN | 20190308 |
66652 | CMIP | NCAR | CESM2 | historical | r3i1p1f1 | Amon | tas | gn | gs://cmip6/CMIP/NCAR/CESM2/historical/r3i1p1f1... | NaN | 20190308 |
66979 | CMIP | NCAR | CESM2 | historical | r4i1p1f1 | Amon | tas | gn | gs://cmip6/CMIP/NCAR/CESM2/historical/r4i1p1f1... | NaN | 20190308 |
67290 | CMIP | NCAR | CESM2 | historical | r5i1p1f1 | Amon | tas | gn | gs://cmip6/CMIP/NCAR/CESM2/historical/r5i1p1f1... | NaN | 20190308 |
67601 | CMIP | NCAR | CESM2 | historical | r6i1p1f1 | Amon | tas | gn | gs://cmip6/CMIP/NCAR/CESM2/historical/r6i1p1f1... | NaN | 20190308 |
67913 | CMIP | NCAR | CESM2 | historical | r7i1p1f1 | Amon | tas | gn | gs://cmip6/CMIP/NCAR/CESM2/historical/r7i1p1f1... | NaN | 20190311 |
68223 | CMIP | NCAR | CESM2 | historical | r8i1p1f1 | Amon | tas | gn | gs://cmip6/CMIP/NCAR/CESM2/historical/r8i1p1f1... | NaN | 20190311 |
68533 | CMIP | NCAR | CESM2 | historical | r9i1p1f1 | Amon | tas | gn | gs://cmip6/CMIP/NCAR/CESM2/historical/r9i1p1f1... | NaN | 20190311 |
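The table above lists several ensemble members for each NCAR model. As a quick check before picking a store, one might count them per model. The sketch below rebuilds only the `source_id` and `member_id` columns from the rows shown above; the name `demo` is introduced here for illustration.

```python
import pandas as pd

# source_id / member_id pairs copied from the NCAR table above.
rows = (
    [("CESM2-FV2", f"r{i}i1p1f1") for i in (1, 2, 3)]
    + [("CESM2-WACCM-FV2", f"r{i}i1p1f1") for i in (1, 2, 3)]
    + [("CESM2-WACCM", f"r{i}i1p1f1") for i in (1, 2, 3)]
    + [("CESM2", f"r{i}i1p1f1") for i in range(1, 12)]
)
demo = pd.DataFrame(rows, columns=["source_id", "member_id"])

# Number of distinct ensemble members available for each model.
members_per_model = demo.groupby("source_id")["member_id"].nunique()
print(members_per_model)
```

On the real dataframe the same count would be `df_ta_ncar.groupby("source_id")["member_id"].nunique()`.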
Now we will load a single store using fsspec, zarr, and xarray.
# get the path to a specific zarr store (the last one from the dataframe above)
zstore = df_ta_ncar.zstore.values[-1]
print(zstore)
# create a mutable-mapping-style interface to the store
mapper = fsspec.get_mapper(zstore)
# open it using xarray and zarr
ds = xr.open_zarr(mapper, consolidated=True)
ds
gs://cmip6/CMIP/NCAR/CESM2/historical/r9i1p1f1/Amon/tas/gn/
<xarray.Dataset>
Dimensions:    (lat: 192, lon: 288, nbnd: 2, time: 1980)
Coordinates:
  * lat        (lat) float64 -90.0 -89.06 -88.12 -87.17 ... 88.12 89.06 90.0
    lat_bnds   (lat, nbnd) float32 dask.array<chunksize=(192, 2), meta=np.ndarray>
  * lon        (lon) float64 0.0 1.25 2.5 3.75 5.0 ... 355.0 356.2 357.5 358.8
    lon_bnds   (lon, nbnd) float32 dask.array<chunksize=(288, 2), meta=np.ndarray>
  * time       (time) object 1850-01-15 12:00:00 ... 2014-12-15 12:00:00
    time_bnds  (time, nbnd) object dask.array<chunksize=(1980, 2), meta=np.ndarray>
Dimensions without coordinates: nbnd
Data variables:
    tas        (time, lat, lon) float32 dask.array<chunksize=(600, 192, 288), meta=np.ndarray>
Attributes:
    Conventions: CF-1.7 CMIP-6.2
    activity_id: CMIP
    branch_method: standard
    branch_time_in_child: 674885.0
    branch_time_in_parent: 295650.0
    case_id: 23
    cesm_casename: b.e21.BHIST.f09_g17.CMIP6-historical.009
    contact: cesm_cmip6@ucar.edu
    creation_date: 2019-01-27T10:42:54Z
    data_specs_version: 01.00.29
    experiment: all-forcing simulation of the recent past
    experiment_id: historical
    external_variables: areacella
    forcing_index: 1
    frequency: mon
    further_info_url: https://furtherinfo.es-doc.org/CMIP6.NCAR.CESM2.h...
    grid: native 0.9x1.25 finite volume grid (192x288 latxlon)
    grid_label: gn
    history: none
    initialization_index: 1
    institution: National Center for Atmospheric Research, Climate...
    institution_id: NCAR
    license: CMIP6 model data produced by <The National Center...
    mip_era: CMIP6
    model_doi_url: https://doi.org/10.5065/D67H1H0V
    nominal_resolution: 100 km
    parent_activity_id: CMIP
    parent_experiment_id: piControl
    parent_mip_era: CMIP6
    parent_source_id: CESM2
    parent_time_units: days since 0001-01-01 00:00:00
    parent_variant_label: r1i1p1f1
    physics_index: 1
    product: model-output
    realization_index: 9
    realm: atmos
    source: CESM2 (2017): atmosphere: CAM6 (0.9x1.25 finite v...
    source_id: CESM2
    source_type: AOGCM BGC
    sub_experiment: none
    sub_experiment_id: none
    table_id: Amon
    tracking_id: hdl:21.14100/4b164514-1627-4deb-a8d5-93d4c5166d41...
    variable_id: tas
    variant_info: CMIP6 20th century experiments (1850-2014) with C...
    variant_label: r9i1p1f1
    status: 2019-10-25;created;by nhn2@columbia.edu
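The store path printed above appears to mirror the catalog columns, in the order activity, institution, model, experiment, member, table, variable and grid. The sketch below splits such a path into a dictionary. The helper name `parse_zstore` is hypothetical, and the layout is an observation from this single path, not a documented guarantee for every store.

```python
def parse_zstore(zstore):
    """Split a catalog zstore URL into named path components.

    Assumes the layout seen in the path printed above:
    gs://<bucket>/<activity>/<institution>/<source>/<experiment>/
    <member>/<table>/<variable>/<grid>/
    """
    keys = [
        "activity_id", "institution_id", "source_id", "experiment_id",
        "member_id", "table_id", "variable_id", "grid_label",
    ]
    # Drop the URL scheme and any leading or trailing slashes.
    path = zstore.split("://", 1)[1].strip("/")
    # The first path element is the bucket name; the rest are the facets.
    parts = path.split("/")[1:]
    return dict(zip(keys, parts))


facets = parse_zstore("gs://cmip6/CMIP/NCAR/CESM2/historical/r9i1p1f1/Amon/tas/gn/")
print(facets)
```

In practice the catalog dataframe already holds these values as columns, so a parser like this is mainly useful as a sanity check on a path obtained elsewhere.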
Plot a map for a specific month (January 1950).
ds.tas.sel(time='1950-01').squeeze().plot()
<matplotlib.collections.QuadMesh at 0x7f4e2e378390>
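Create a timeseries of global-average surface air temperature. For this we need the area weighting factor for each gridpoint. Cell area (areacella) is a time-invariant field, so the query below takes the first CESM2 areacella store in the catalog, whichever experiment it belongs to.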
df_area = df.query("variable_id == 'areacella' & source_id == 'CESM2'")
ds_area = xr.open_zarr(fsspec.get_mapper(df_area.zstore.values[0]), consolidated=True)
ds_area
<xarray.Dataset>
Dimensions:    (lat: 192, lon: 288, nbnd: 2)
Coordinates:
  * lat        (lat) float64 -90.0 -89.06 -88.12 -87.17 ... 88.12 89.06 90.0
    lat_bnds   (lat, nbnd) float64 dask.array<chunksize=(192, 2), meta=np.ndarray>
  * lon        (lon) float64 0.0 1.25 2.5 3.75 5.0 ... 355.0 356.2 357.5 358.8
    lon_bnds   (lon, nbnd) float64 dask.array<chunksize=(288, 2), meta=np.ndarray>
Dimensions without coordinates: nbnd
Data variables:
    areacella  (lat, lon) float32 dask.array<chunksize=(192, 288), meta=np.ndarray>
Attributes:
    Conventions: CF-1.7 CMIP-6.2
    activity_id: CFMIP
    branch_method: no parent
    branch_time_in_child: 721970.0
    branch_time_in_parent: 0.0
    case_id: 965
    cesm_casename: f.e21.FHIST_BGC.f09_f09_mg17.CFMIP-amip-4xCO2.001
    contact: cesm_cmip6@ucar.edu
    creation_date: 2019-03-13T22:41:31Z
    data_specs_version: 01.00.29
    experiment: Continuation of CFMIP-2 AMIP experiments and CMIP...
    experiment_id: amip-4xCO2
    forcing_index: 1
    frequency: fx
    further_info_url: https://furtherinfo.es-doc.org/CMIP6.NCAR.CESM2.a...
    grid: native 0.9x1.25 finite volume grid (192x288 latxlon)
    grid_label: gn
    initialization_index: 1
    institution: National Center for Atmospheric Research, Climate...
    institution_id: NCAR
    license: CMIP6 model data produced by <The National Center...
    mip_era: CMIP6
    model_doi_url: https://doi.org/10.5065/D67H1H0V
    nominal_resolution: 100 km
    parent_activity_id: no parent
    parent_experiment_id: no parent
    parent_mip_era: no parent
    parent_source_id: no parent
    parent_time_units: no parent
    parent_variant_label: no parent
    physics_index: 1
    product: model-output
    realization_index: 1
    realm: atmos land
    source: CESM2 (2017): atmosphere: CAM6 (0.9x1.25 finite v...
    source_id: CESM2
    source_type: AGCM AER
    sub_experiment: none
    sub_experiment_id: none
    table_id: fx
    tracking_id: hdl:21.14100/35b69461-eb69-4a58-b2f7-413dbf2cd8fe
    variable_id: areacella
    variant_info: The same as the amip experiment within DECK, exce...
    variant_label: r1i1p1f1
    status: 2019-11-04;created;by nhn2@columbia.edu
total_area = ds_area.areacella.sum(dim=['lon', 'lat'])
ta_timeseries = (ds.tas * ds_area.areacella).sum(dim=['lon', 'lat']) / total_area
ta_timeseries
<xarray.DataArray (time: 1980)>
dask.array<truediv, shape=(1980,), dtype=float32, chunksize=(600,), chunktype=numpy.ndarray>
Coordinates:
  * time     (time) object 1850-01-15 12:00:00 ... 2014-12-15 12:00:00
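The global average computed above is an area-weighted mean: the sum of field times cell area, divided by the total area. The sketch below shows why the weighting matters, using a hypothetical 3x4 grid in plain NumPy. Here cos(latitude) stands in for `areacella` and the temperature values are made up; neither comes from the model output.

```python
import numpy as np

# Hypothetical 3x4 lat-lon grid; cos(latitude) stands in for areacella.
lat = np.array([-60.0, 0.0, 60.0])
area = np.cos(np.deg2rad(lat))[:, np.newaxis] * np.ones((3, 4))

# Made-up temperature field that varies only with latitude.
field = np.array([270.0, 300.0, 270.0])[:, np.newaxis] * np.ones((3, 4))

# Same formula as in the notebook: sum(field * area) / sum(area).
weighted_mean = (field * area).sum() / area.sum()

# A plain mean treats the small high-latitude cells like the large tropical ones.
unweighted_mean = field.mean()

print(weighted_mean, unweighted_mean)
```

Because cells shrink toward the poles on a regular lat-lon grid, the unweighted mean gives cold high-latitude cells too much influence and comes out lower in this toy case.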
By default the data are loaded lazily, as Dask arrays. Here we trigger computation explicitly.
%time ta_timeseries.load()
CPU times: user 2.16 s, sys: 998 ms, total: 3.15 s
Wall time: 2.26 s
<xarray.DataArray (time: 1980)>
array([285.6408 , 285.57397, 286.33856, ..., 288.6992 , 287.76303, 287.0621 ], dtype=float32)
Coordinates:
  * time     (time) object 1850-01-15 12:00:00 ... 2014-12-15 12:00:00
ta_timeseries.plot(label='monthly')
ta_timeseries.rolling(time=12).mean().plot(label='12 month rolling mean')
plt.legend()
plt.title('Global Mean Surface Air Temperature')
Text(0.5, 1.0, 'Global Mean Surface Air Temperature')
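The smoothed curve in the plot above is a trailing 12-month rolling mean. The sketch below uses pandas on a hypothetical two-year monthly series to show what the window does at the start of the record. To the best of my knowledge, xarray's `rolling` likewise requires a full window by default, so the first 11 points of the smoothed curve should be missing.

```python
import pandas as pd

# Hypothetical monthly series: two years of values 1..24.
monthly = pd.Series(range(1, 25), dtype=float)

# Trailing 12-point window: entries without 12 preceding values are NaN.
rolling = monthly.rolling(12).mean()

# Count the leading gaps and show the first complete window (mean of 1..12).
n_missing = int(rolling.isna().sum())
first_valid = rolling.iloc[11]

print(n_missing, first_valid)
```

Passing `center=True` to `rolling` would instead align each mean with the middle of its window, which shifts the gaps to both ends of the series.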