Tutorial: Python Methodology to Derive a Climatology from netCDF Daily Data. Includes method for automating multiple years.


In this tutorial, we will open and subset a temporal range of values for the maximum temperature variables of a dataset that contains daily meteorologic data. We will create a summer climatology and save the climatology into a new netCDF file.

Source Data

The source data is a subsest of the North American spatial extent of the Daymet dataset daily data. Data are available from https://daymet.ornl.gov . Obtain spatial or temporal subsets from the netCDF Subset Service (NCSS). A tutorial of the NCSS is available here: https://daymet.ornl.gov/web_services.html

1. Load python modules for working with netCDF files.

NumPy works with N-dimensional array objects. netCDF4 is a Python interface to the netCDF version 4 library. netCDF4 relies on NumPy arrays and requires Python 2.7 or later.

In [1]:
import numpy as np  
from netCDF4 import Dataset
from datetime import datetime

2. Read and create netCDF files.

Import netCDF file information. Assume the nc files are in the python script location. Create a new netCDF file using the netCDF4 Dataset() method. Open one of the subsets and use some of the file information to populate the new netCDF file.

In [2]:
daymet_ds = Dataset("../data/indata/tmax_2015GSMNPsubset.nc", 'r') # read the netCDF file
tmaxJJA_write = Dataset('../data/outdata/tmaxJJA_GSMNPsubset.nc', 'w', format='NETCDF4') # creating a file in which to write new climatology

3. Write the dimension, variables, and attributes into the new netCDF file.

For the output netCDF file, create dimensions and variables to write the JJA climatology into. netCDF defines the sizes of variables in terms of dimensions. Before any variables can be created, the dimensions they use must first be created. In this case, the x and y dimension will be the same size as the input dataset, so we'll call the Python len function to copy the size from the input file. Use the createDimension method to create new dimensions. The first argument supplies the dimension name. The second argument is the size of the dimension. Time will be unlimited, so set to None. Unlimited dimensions can be appended to. Write the new netCDF file using some information from the existing file. Use the createVariable method to create variablez in the new netCDF file to hold the tmax_JJA_mean climatology and other netCDF variables. The first argument supplies the variable name, the second arguement is the datatype (eg. f4 is 32 bit float), the third argument sets the shape. Variable names can be changed using the renameVariable method of a Dataset instance. Add attributes to the tmax_JJA_mean variable. Attributes can be deleted from a netCDF Dataset, or Variable using the python del statement.

In [3]:
# get dimension size from existing dimensions
x_len = len(daymet_ds.dimensions['x'])
y_len = len(daymet_ds.dimensions['y'])
# create dimensions in new file
tmaxJJA_write.createDimension('x', x_len)
tmaxJJA_write.createDimension('y', y_len)
tmaxJJA_write.createDimension('time', None)
# read in lat/lon and x/y variable values from existing file
lat_daymet_ds = daymet_ds.variables['lat'][:]
lon_daymet_ds = daymet_ds.variables['lon'][:]
x_daymet_ds = daymet_ds.variables['x'][:]
y_daymet_ds = daymet_ds.variables['y'][:]
# create variables in new netCDF file
tmax_JJA_mean_var = tmaxJJA_write.createVariable('tmax_JJA_mean', 'f4', ('time','y','x'), fill_value=-9999)
lat_var = tmaxJJA_write.createVariable('lat', 'f4', ('y','x'))
lon_var = tmaxJJA_write.createVariable('lon', 'f4', ('y','x'))
x_var = tmaxJJA_write.createVariable('x', 'f4', ('x'))
y_var = tmaxJJA_write.createVariable('y', 'f4', ('y'))
proj_var = tmaxJJA_write.createVariable('lambert_conformal_conic', 'u2') # u2:16bit unsigned integer, no dimensions
time_var = tmaxJJA_write.createVariable('time', 'f4', ('time'))

# add attributes for each variable in the new netCDF file
tmax_JJA_mean_var.units = 'degrees C'
tmax_JJA_mean_var.coordinate = 'y x'
tmax_JJA_mean_var.grid_mapping = 'lambert_conformal_conic'
tmax_JJA_mean_var.cell_methods = 'area: mean time: minimum within days time: mean over days'
tmax_JJA_mean_var.long_name = 'summer mean of daily maximum temperature'
lat_var.units = 'degrees_north'
lat_var.long_name = 'latitude coordinate'
lat_var.standard_name = 'latitude'

lon_var.units = 'degrees_east'
lon_var.long_name = 'longitude coordinate'
lon_var.standard_name = 'longitude'

x_var.units = 'km'
x_var.long_name = 'x coordinate of projection'
x_var.standard_name = 'projection_x_coordinate'

y_var.units = 'km'
y_var.long_name = 'y coordinate of projection'
y_var.standard_name = 'projection_y_coordinate'

proj_var.grid_mapping_name = 'lambert_conformal_conic'
proj_var.longitude_of_central_meridian = -100.0
proj_var.latitude_of_projection_origin = 42.5
proj_var.false_easting = 0.0
proj_var.false_northing = 0.0
proj_var.standard_parallel = (25.,  60.)
proj_var.semi_major_axis = 6378137.0
proj_var.inverse_flattening = 298.257223563

time_var.long_name = 'time'
time_var.calendar = 'standard'
time_var.units = 'days since 1980_01_01 00:00:00 UTC'
# write lat/lon and x/y values into the new netCDF file
lat_var[:,:] = lat_daymet_ds
lon_var[:,:] = lon_daymet_ds
x_var[:] = x_daymet_ds
y_var[:] = y_daymet_ds
In [4]:

4. Create a climatology of the GSMNP spatial subset and write the data into the new netCDF file.

To create a temporal climatology from the daily data, subset the tmax variable to a time set of interest. For example, a summer, or JuneJulyAugust, subset would be yearday 151 - 243.

In [6]:
for year in range(2015, 2017):
    startYear = 2015
    nc_in = "../data/indata/tmax_" + str(year) + "GSMNPsubset.nc"
    daymet_JJA = Dataset(nc_in, 'r') # read the netCDF file
    tmax_JJA = daymet_JJA.variables['tmax'][151:243 , :, :]  # We're not subsetting the x or y dimension.
    tmax_JJA_mean_comp = np.mean(tmax_JJA, axis=0, keepdims=True)
    tmax_JJA_mean_var[year-startYear, :, :] = tmax_JJA_mean_comp
    # Determine the time value as days since Jan 1, 1980 and write into the proper time index
    date_format = "%m/%d/%Y"
    startdate = datetime.strptime('1/1/1980', date_format)
    stopdate = datetime.strptime('7/15/' + str(year), date_format)
    delta = stopdate - startdate
    days = delta.days
    time_var[year-startYear] = days
In [7]:
In [8]:
# Close the files