Notebook

Working with Satellite Data

Unidata Python Workshop

Overview:¶

Teaching: 60 minutes
Exercises: 30 minutes

Questions¶

How can satellite data be obtained with Siphon?
What format is satellite data distributed in?
How can MetPy be used to open and explore satellite data?
How can maps of satellite data be made?
How can other datasets be integrated with satellite data?

Objectives¶

Download satellite data with Siphon
Parse out netCDF file
Make a plot of the data
Use colortables and annotations
Bonus: Animations

1. Download satellite data with Siphon¶

The first step is to find the satellite data. Normally, we would browse over to http://thredds.ucar.edu/thredds/ and find the top-level THREDDS Data Server (TDS) catalog. From there we can drill down to find satellite data products.

For this tutorial, we want to utilize the new GOES-16 imagery. This imagery is still considered experimental and not for operational use, so it is found on the test dataserver at http://thredds-test.unidata.ucar.edu/. Navigate to the Test Datasets directory, then GOES-16 Products and GOES-16. Here you'll find a folder for each day of the archive. In the daily folders there are subfolders for the CONUS, full disk, and mesoscale sector images. In each of these there is a folder for each channel. As you can see, there is a massive amount of data coming down from GOES-16!

We could download the files to our computers from here, but that can become tedious for downloading many files, requires us to store them on our computer, and does not lend itself to automation.

We can use Unidata's Siphon package to parse the catalog from the TDS. This provides us a nice programmatic way of accessing the data. We start by importing the TDSCatalog class from siphon and giving it the URL to the catalog we just surfed to manually. Note: Instead of giving it the link to the HTML catalog, we change the extension to XML, which asks the TDS for the XML version of the catalog. This is much better to work with in code.

Next we'll build a URL to get to the current day's data for a specified channel:

In [1]:

from datetime import datetime
from siphon.catalog import TDSCatalog

date = datetime.strftime(datetime.utcnow(), '%Y%m%d')
channel = 8

cat = TDSCatalog('http://thredds-test.unidata.ucar.edu/thredds/catalog/satellite/goes16/GOES16/'
                 '{}/CONUS/Channel{:02d}/catalog.xml'.format(date, channel))

We now have a TDSCatalog object called cat that we can examine and use to get handles to work with the data. To find the latest file, we can look at the cat.datasets attribute. This is a Python dictionary, mapping the name of the dataset to a Python Dataset object (which came from more XML supplied by the TDS). Since this is a dictionary, we can look at a list of keys and see what datasets are available. Let’s look at the last five keys:

In [2]:

list(cat.datasets)[-5:]

Out[2]:

['GOES16_20170407_015224_6.19_2km_32.3N_91.4W.nc4',
 'GOES16_20170407_015724_6.19_2km_32.3N_91.4W.nc4',
 'GOES16_20170407_020224_6.19_2km_32.3N_91.4W.nc4',
 'GOES16_20170407_020724_6.19_2km_32.3N_91.4W.nc4',
 'GOES16_20170407_021224_6.19_2km_32.3N_91.4W.nc4']

What we really want is the most recent data, which is the last item in the list. We can pull that out, and use its name to get the actual Python Dataset object:

In [3]:

dataset_name = list(cat.datasets)[-2]
dataset = cat.datasets[dataset_name]
print(dataset)

<siphon.catalog.Dataset object at 0x106e51ac8>

The catalog.Dataset class provides access to a lot of information about a dataset, like metadata (e.g. time range, spatial extent). What we want most however, is to know how to access the data. This is provided by the dataset.access_urls attribute:

In [4]:

dataset.access_urls

Out[4]:

{'CdmRemote': 'http://thredds-test.unidata.ucar.edu/thredds/cdmremote/satellite/goes16/GOES16/20170407/CONUS/Channel08/GOES16_20170407_020724_6.19_2km_32.3N_91.4W.nc4',
 'CdmrFeature': 'http://thredds-test.unidata.ucar.edu/thredds/cdmrfeature/grid/satellite/goes16/GOES16/20170407/CONUS/Channel08/GOES16_20170407_020724_6.19_2km_32.3N_91.4W.nc4',
 'DAP4': 'http://thredds-test.unidata.ucar.edu/thredds/dap4/satellite/goes16/GOES16/20170407/CONUS/Channel08/GOES16_20170407_020724_6.19_2km_32.3N_91.4W.nc4',
 'HTTPServer': 'http://thredds-test.unidata.ucar.edu/thredds/fileServer/satellite/goes16/GOES16/20170407/CONUS/Channel08/GOES16_20170407_020724_6.19_2km_32.3N_91.4W.nc4',
 'ISO': 'http://thredds-test.unidata.ucar.edu/thredds/iso/satellite/goes16/GOES16/20170407/CONUS/Channel08/GOES16_20170407_020724_6.19_2km_32.3N_91.4W.nc4',
 'NCML': 'http://thredds-test.unidata.ucar.edu/thredds/ncml/satellite/goes16/GOES16/20170407/CONUS/Channel08/GOES16_20170407_020724_6.19_2km_32.3N_91.4W.nc4',
 'NetcdfSubset': 'http://thredds-test.unidata.ucar.edu/thredds/ncss/grid/satellite/goes16/GOES16/20170407/CONUS/Channel08/GOES16_20170407_020724_6.19_2km_32.3N_91.4W.nc4',
 'OPENDAP': 'http://thredds-test.unidata.ucar.edu/thredds/dodsC/satellite/goes16/GOES16/20170407/CONUS/Channel08/GOES16_20170407_020724_6.19_2km_32.3N_91.4W.nc4',
 'UDDC': 'http://thredds-test.unidata.ucar.edu/thredds/uddc/satellite/goes16/GOES16/20170407/CONUS/Channel08/GOES16_20170407_020724_6.19_2km_32.3N_91.4W.nc4',
 'WCS': 'http://thredds-test.unidata.ucar.edu/thredds/wcs/satellite/goes16/GOES16/20170407/CONUS/Channel08/GOES16_20170407_020724_6.19_2km_32.3N_91.4W.nc4',
 'WMS': 'http://thredds-test.unidata.ucar.edu/thredds/wms/satellite/goes16/GOES16/20170407/CONUS/Channel08/GOES16_20170407_020724_6.19_2km_32.3N_91.4W.nc4'}

These different URLs provide access to the data in different ways: some support different protocols (like OPeNDAP or CDMRemote), others allow harvesting metadata (e.g. ISO). We will use the netCDF4 package to read the data directly from the server using OPeNDAP. You could of course download the files locally if you needed to archive them for research, but there's no need to do that to make a 'real-time' display.

In [5]:

from netCDF4 import Dataset
ds = Dataset(dataset.access_urls['OPENDAP'])

Exercise: Find your own data¶

2. Parse out the netCDF File¶

We have a netCDF4-python dataset object now that contains all of the data for a given satellite image. We need to explore the variables available in the file and pull out the useful parts that we need to make a map.

In [6]:

list(ds.variables)

Out[6]:

['y', 'x', 'lambert_projection', 'Sectorized_CMI']

Our goal is to plot the imagery, so we'll be using the Sectorized_CMI variable from the .variables dictionary. Rather than just giving back the raw array of data, this gives back a Variable object; from here not only can we get the raw data values, but there is useful metadata as well. We can see just what additional information is present by printing out the Variable object:

In [7]:

data_var = ds.variables['Sectorized_CMI']
print(data_var)

<class 'netCDF4._netCDF4.Variable'>
int16 Sectorized_CMI(y, x)
    _FillValue: 0
    standard_name: brightness_temperature
    units: kelvin
    grid_mapping: lambert_projection
    add_offset: 173.15
    scale_factor: 0.0393162
    valid_min: 0
    valid_max: 4095
    _ChunkSizes: [1024 1024]
unlimited dimensions: 
current shape = (2560, 3072)
filling off

This reveals several useful pieces of information (such as a longer description of the variable), but we're going to focus on the grid_mapping data. This attribute is defined by the NetCDF Climate and Forecast (CF) Metadata Conventions and specifies a variable that contains information about the grid's projection.

We also need to get the x and y coordinates of the image from the dataset object. We use an empty slice ([:]) to copy the actual numeric values out of the variables (for easier use with matplotlib and cartopy).

In [8]:

x = ds.variables['x'][:]
y = ds.variables['y'][:]

We will also grab the variable corresponding to the grid_mapping attribute so that we can have a look at the projection information. Rather than hard coding the name of the variable (in this case Lambert_Conformal), we just directly pass the grid_mapping attribute to the .variables dictionary; this makes it easier to re-use the code in the future with different data.

In [9]:

proj_var = ds.variables[data_var.grid_mapping]
print(proj_var)

<class 'netCDF4._netCDF4.Variable'>
int32 lambert_projection()
    grid_mapping_name: lambert_conformal_conic
    standard_parallel: 25.0
    longitude_of_central_meridian: -95.0
    latitude_of_projection_origin: 25.0
    false_easting: 0.0
    false_northing: 0.0
    semi_major: 6371200.0
    semi_minor: 6371200.0
unlimited dimensions: 
current shape = ()
filling off

This shows that the projection is Lambert conformal conic. The variable also includes a few parameters (such as the latitude and longitude of the origin) needed to properly set up the projection to match what was used to create the image. This variable also has information about the assumed shape of the earth, which in this case is spherical with a radius of 6371.2 km.

Exercise: Parse your data¶

3. Make a plot of the data¶

We are finally ready to plot the satellite data! Using imshow() we can get an image, but there is a lot of work to do here. The data need to be projected to a meaningful representation, map outlines added, and annotations added.

In [10]:

# Make sure the notebook puts figures inline
%matplotlib inline
import matplotlib.pyplot as plt

In [11]:

plt.imshow(data_var[:])
plt.show()

Projections¶

The mapping will be accomplished with a combination of Cartopy and Matplotlib. To begin, we will use Cartopy’s crs (Coordinate Reference System) module. With the crs, we will create a Globe object that contains information about the assumed planet shape used in the projection.

In [12]:

proj_var

Out[12]:

<class 'netCDF4._netCDF4.Variable'>
int32 lambert_projection()
    grid_mapping_name: lambert_conformal_conic
    standard_parallel: 25.0
    longitude_of_central_meridian: -95.0
    latitude_of_projection_origin: 25.0
    false_easting: 0.0
    false_northing: 0.0
    semi_major: 6371200.0
    semi_minor: 6371200.0
unlimited dimensions: 
current shape = ()
filling off

In [13]:

import cartopy.crs as ccrs

# Create a Globe specifying a spherical earth with the correct radius
globe = ccrs.Globe(ellipse='sphere', semimajor_axis=proj_var.semi_major,
                   semiminor_axis=proj_var.semi_minor)

Now that we have a globe of the appropriate shape, we need to create our projection. Knowing that that data are using the Lambert conformal conic projection, we will use the LambertConformal class. This class uses the globe model we just created, along with the projection information from the file, namely the central latitude, central longitude, and standard parallels.

In [14]:

proj = ccrs.LambertConformal(central_longitude=proj_var.longitude_of_central_meridian,
                             central_latitude=proj_var.latitude_of_projection_origin,
                             standard_parallels=[proj_var.standard_parallel],
                             globe=globe)

Plotting¶

Now that we know how to properly reference the imagery data, we can plot the data. CartoPy's projections are designed to interface with matplotlib, so they can just be passed as the projection keyword argument when creating an Axes using the add_subplot method. Since the x and y coordinates, as well as the image data, are referenced in the lambert conformal projection, we can pass all of them directly to plotting methods (such as imshow) with no additional information. The extent keyword argument to imshow is used to specify the bounds of the image data being plotted. It is especially important to specify that the origin is at the the upper left of the image (standard practice in imagery). If your forget this, your image will be flipped. Try it!

In [15]:

print(x[0], x[-1], y[0], y[-1])

-2779168.60944 3460374.02806 3417470.07785 -1781810.15965

In [16]:

# Create a new figure with size 10" by 10"
fig = plt.figure(figsize=(10, 10))

# Put a single axes on this figure; set the projection for the axes to be our
# Lambert conformal projection
ax = fig.add_subplot(1, 1, 1, projection=proj)

# Plot the data with mostly the defaults
# Note, we save the image returned by imshow for later...
im = ax.imshow(data_var[:], extent=(x.min(), x.max(), y.min(), y.max()), origin='upper')

# Add high-resolution coastlines to the plot
ax.coastlines(resolution='50m', color='black')

Out[16]:

<cartopy.mpl.feature_artist.FeatureArtist at 0x114db6f98>

This is a nice start, but it would be nice to have better geographic references for the image. For example, where are the states? Cartopy's feature module has support for adding geographic features to plots ,with many features are built in. The BORDERS built-in feature contains country borders. There is also support for creating "custom" features from the Natural Earth set of free vector and raster map data (CartoPy will automatically download the necessary data and cache it locally). Here we create a feature for states/provinces.

In [17]:

import cartopy.feature as cfeat

# Add country borders with a thick line.
ax.add_feature(cfeat.BORDERS, linewidth='2', edgecolor='black')

# Set up a feature for the state/province lines. Tell cartopy not to fill in the polygons
state_boundaries = cfeat.NaturalEarthFeature(category='cultural',
                                             name='admin_1_states_provinces_lakes',
                                             scale='50m', facecolor='none')

# Add the feature with dotted lines, denoted by ':'
ax.add_feature(state_boundaries, linestyle=':', edgecolor='black')

# Redisplay modified figure
fig

Out[17]:

Exercise: Plot your data¶

4. Use colortables and annotations¶

The map is much improved now, but it would look much better with a different color scheme.

Colormapping in matplotlib (which backs CartoPy) is handled through two pieces:

The colormap controls how values are converted from floating point values in the range [0, 1] to colors (think colortable)
The norm (normalization) controls how data values are converted to floating point values in the range [0, 1]

Let's start by setting the colormap to be black and white and normalizing the data to get the best contrast. As with many satellite data fields, the water vapor values range from 0-255. We clip that range down to enhance contrasts in the data. We'll make a histogram to see the most common values in the data. Note: cmap and norm can also be set during the imshow call as keyword arguments.

That's great, but all of the zero values don't let us really see the range of values. Do they stop at 200? Maybe, or maybe there are just very few data points there. It's best to remove the masked elements of the array and plot the histogram again.

In [19]:

plt.hist(data_var[:].compressed().flatten(), bins=255);

In [20]:

# Set colormap
im.set_cmap('Greys_r')

# Set norm
im.set_norm(plt.Normalize(200,255))

# Show figure again
fig

Out[20]:

In meteorology, we have many ‘standard’ colortables that have been used for certain types of data. We have included these in Metpy in the metpy.plots.ctables module. By importing the ColortableRegistry we gain access to the colortables, as well as handy normalizations to go with them. We can see the colortables available by converting the dictionary to a list.

In [21]:

from metpy.plots.ctables import registry

print(list(registry))

['Carbone42', 'ir_bd', 'ir_drgb', 'ir_rgbv', 'ir_tpc', 'ir_tv1', 'NWS8bitVel', 'NWSReflectivity', 'NWSReflectivityExpanded', 'NWSSpectrumWidth', 'NWSStormClearReflectivity', 'NWSVelocity', 'rainbow', 'test', 'viridis', 'wv_tpc', 'WVCIMSS']

Let’s use the WVCIMSS colormap, a direct conversion of the GEMPAK colormap. The code below asks for the colormap, as well as a normalization that starts at 0 and increases by a value of 1 for each color in the table. We then apply it to the existing image we have been working with.

In [22]:

wv_norm, wv_cmap = registry.get_with_range('WVCIMSS', 170, 255)
im.set_cmap(wv_cmap)
im.set_norm(wv_norm)
fig

Out[22]:

One more thing that would be nice is putting the date and time on the image, so let's do that. First grab the start_date_time attribute from the dataset object:

In [23]:

ds.start_date_time

Out[23]:

'2017097020724'

This timestamp contains the year, Julian day, hour, minute, and second that the image was collected. We will use datetime.strptime to parse this out into a Python datetime object.

In [24]:

timestamp = datetime.strptime(ds.start_date_time, '%Y%j%H%M%S')
print(timestamp)

2017-04-07 02:07:24

A Python datetime object is easy to work with. Let's add it to our plot.

We use the text method to draw text on our plot. In this case, we call it with a transform keyword argument, which allows us to tell matplotlib how to interpret the x and y coordinates. In this case, we set the transform to ax.transAxes, which means "interpret x and y as being in axes space". The axes space has x and y in the range [0, 1] across the entire plotting area (e.g. (0, 0) is lower left, (1, 1) is upper right). Using this, we can put text in the lower right corner (x = 0.99, y = 0.01) regardless of the range of x and y (or longitude and latitude) in the plot. We also need to make sure to right-align the text so that the text ends at the specified point.

We use the strftime method to format the datetime as a string. The details of that format string are described here.

The code below uses matplotlib's path effects to make the text have an outline effect as well. We won't go into detail on that here, but details can be found in the documentation if you want to know more.

We also need to add an annotation telling us what channel is being plotted, as well as a statement that the data is experimental only.

For completeness, the code below replicates the entirety of the plotting code from above.

In [25]:

# Same as before, except we call imshow with our colormap and norm.
fig = plt.figure(figsize=(10, 10))
ax = fig.add_subplot(1, 1, 1, projection=proj)

im = ax.imshow(data_var[:], extent=(x[0], x[-1], y[-1], y[0]), origin='upper',
               cmap=wv_cmap, norm=wv_norm)
ax.coastlines(resolution='50m', color='black')
ax.add_feature(state_boundaries, linestyle=':', edgecolor='black')
ax.add_feature(cfeat.BORDERS, linewidth='2', edgecolor='black')

# Add text (aligned to the right); save the returned object so we can manipulate it.
text_time = ax.text(0.99, 0.01, timestamp.strftime('%d %B %Y %H%MZ'),
               horizontalalignment='right', transform=ax.transAxes,
               color='white', fontsize='x-large', weight='bold')

text_channel = ax.text(0.5, 0.97, 'Experimental GOES-16 Ch.{}'.format(channel),
               horizontalalignment='center', transform=ax.transAxes,
               color='white', fontsize='large', weight='bold')

# Make the text stand out even better using matplotlib's path effects
from matplotlib import patheffects
text_time.set_path_effects([patheffects.Stroke(linewidth=2, foreground='black'),
                       patheffects.Normal()])
text_channel.set_path_effects([patheffects.Stroke(linewidth=2, foreground='black'),
                       patheffects.Normal()])
ax.set_extent([240., 290., 20., 49.])

Exercise: Enhance your plot¶

Bonus: Animations¶

NOTE: This is just a quick taste of producing an animation using matplotlib. The animation support in matplotlib is robust, but sometimes installation of the underlying tools (mencoder/ffmpeg) can be a little tricky. In order to make sure we get don't get bogged down, this is really more of a demo than something expected to work out of the box.

For windows builds, you might try:

For ffmpeg
For mencoder

On OSX and linux, conda-forge has packages, so it may be as easy as:

In [26]:

#!conda install -y -n unidata-workshop -c http://conda.anaconda.org/conda-forge ffmpeg

First we'll import the animation support from matplotlib. We also tell it that we want it to render the animations to HTML using the HTML5 video tag:

In [27]:

import os.path
import sys
from matplotlib import rcParams
from IPython.display import HTML
from matplotlib.animation import ArtistAnimation

Then we create an empty figure which will serve as the basis for all of the frames of the animation:

In [28]:

fig = plt.figure(figsize=(10,10))
ax = fig.add_subplot(1, 1, 1, projection=proj)
ax.coastlines(zorder=2)
ax.coastlines(resolution='50m', color='black')
ax.add_feature(state_boundaries, linestyle=':', edgecolor='black')
ax.add_feature(cfeat.BORDERS, linewidth='2', edgecolor='black')

Out[28]:

<cartopy.mpl.feature_artist.FeatureArtist at 0x114eca160>

Then we loop over a bunch of the datasets. For each one we pull out the data and plot both the timestamp and the image. The ArtistAnimation class takes the Figure instance and a list as required arguments. The contents of this list is a collection of matplotlib artists for each frame of the animation. In the loop below, we populate this list with the Text instance created when adding the timestamp as well as the image that results from plotting the data.

In [29]:

# List used to store the contents of all frames. Each item in the list is a tuple of
# (image, text)
artists = []

# How much to downsample
downsample = 2

# Loop over the last 20 satellite images in the catalog
for ds in list(cat.datasets.values())[-20:]:

    # Open the data using the HTTPServer access url and pass to GINIFile. Then convert
    # it to a netcdf-like dataset
    nc = Dataset(ds.access_urls['OPENDAP'])

    # Pull out the image data, x and y coordinates, and the time. Also go ahead and
    # convert the time to a python datetime
    x = nc.variables['x'][::downsample]
    y = nc.variables['y'][::downsample]
    timestamp = datetime.strptime(nc.start_date_time, '%Y%j%H%M%S')
    img_data = nc.variables['Sectorized_CMI'][::downsample]

    # Plot the image and the timestamp. We save the results of these plotting functions
    # so that we can tell the animation that these two things should be drawn as one
    # frame in the animation
    im = ax.imshow(img_data, extent=(x.min(), x.max(), y.min(), y.max()), origin='upper',
                   cmap=wv_cmap, norm=wv_norm)

    # Add text (aligned to the right); save the returned object so we can manipulate it.
    text_time = ax.text(0.99, 0.01, timestamp.strftime('%d %B %Y %H%MZ'),
                   horizontalalignment='right', transform=ax.transAxes,
                   color='white', fontsize='x-large', weight='bold', animated=True)

    text_channel = ax.text(0.5, 0.97, 'Experimental GOES-16 Ch.{}'.format(channel),
                   horizontalalignment='center', transform=ax.transAxes,
                   color='white', fontsize='large', weight='bold', animated=True)

    # Make the text stand out even better using matplotlib's path effects
    from matplotlib import patheffects
    text_time.set_path_effects([patheffects.Stroke(linewidth=2, foreground='black'),
                           patheffects.Normal()])
    text_channel.set_path_effects([patheffects.Stroke(linewidth=2, foreground='black'),
                           patheffects.Normal()])
    ax.set_extent([240., 290., 20., 49.])

    # Stuff them in a tuple and add to the list of things to animate
    artists.append((im, text_time, text_channel))

# Create the animation--in addition to the required args, we also state that each
# frame should last 200 milliseconds
anim = ArtistAnimation(fig, artists, interval=200., blit=False)
HTML(anim.to_html5_video())

Out[29]:

In [ ]: