Catalog

Use the Descartes Labs Catalog to discover existing raster products, search the images contained in them, and manage your own products and images. The Catalog Python client is mainly for discovering and managing data. For data analysis and rastering, use Scenes.

You can run the following cells using Shift-Enter.

Note: The Catalog Python object-oriented client provides the functionality previously covered by the more low-level, now deprecated Metadata and Catalog Python clients.

Concepts

The Descartes Labs Catalog is a repository for georeferenced images. Commonly these images are either acquired by Earth observation platforms such as satellites or derived from other georeferenced images. The catalog is modeled on the following core concepts, each of which is represented by its own class in the API.

Images

An image (represented by class Image in the API) contains data for a shape on earth, as specified by its georeferencing. An image references one or more files (commonly TIFF or JPEG files) that contain the binary data conforming to the band declaration of its product.

Bands

A band (represented by class Band) is a 2-dimensional slice of raster data in an image. A product must have at least one band and all images in the product must conform to the declared band structure. For example, an optical sensor will commonly have bands that correspond to the red, blue and green visible light spectrum, which you could raster together to create an RGB image.

Products

A product (represented by class Product) is a collection of images that share the same band structure. Images in a product can generally be used jointly in a data analysis, as they are expected to have been uniformly processed with respect to data correction, georegistration and so on. For example, you can composite multiple images from a product to run an algorithm over a large geographic region.

Searching the catalog

All objects support the same search interface. Let’s look at two of the most commonly searched for types of objects: products and images.

Finding products

Product.search() is the entry point for searching products. It returns a query builder that you can use to refine your search and can iterate over to retrieve search results.

Count all products with some data before 2016 using filter():

In [1]:
from descarteslabs.catalog import Product, properties as p
search = Product.search().filter(p.start_datetime < "2016-01-01")
search.count()
Out[1]:
124

You can apply multiple filters. To restrict this search to products with data after 2000:

In [2]:
search = search.filter(p.end_datetime > "2000-01-01")
search.count()
Out[2]:
50

Of these, get the 3 products with the oldest data, using sort() and limit(). The search is not executed until you start retrieving results by iterating over it:

In [3]:
oldest_search = search.sort("start_datetime").limit(3)
for result in oldest_search:
    print(result.id)
e7e4fc5b0afd45d32c623697cb5a4437d3f09ade:daily-weather:cfsr-v1
fcd57a7bf668c5b65d49c5e32130a2c0a5281322:daily-weather:cfsr-v1:dev-v0
daily-weather:gsod-interpolated:v0
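
The deferred execution described above can be pictured with a toy query builder. This is only a sketch of the general pattern: LazySearch and its executions counter are hypothetical names, not part of descarteslabs.catalog, and the records are made up. Each refinement returns a new object, and nothing is evaluated until iteration starts.

```python
# Toy illustration only: LazySearch is a hypothetical class, not the
# Catalog client's implementation. It mimics "build now, run on iteration".
class LazySearch:
    def __init__(self, records, predicates=(), sort_key=None, limit=None):
        self._records = records
        self._predicates = tuple(predicates)
        self._sort_key = sort_key
        self._limit = limit
        self.executions = 0  # how many times this search was actually run

    def filter(self, predicate):
        # Return a refined copy; nothing is evaluated yet.
        return LazySearch(self._records, self._predicates + (predicate,),
                          self._sort_key, self._limit)

    def sort(self, key):
        return LazySearch(self._records, self._predicates, key, self._limit)

    def limit(self, n):
        return LazySearch(self._records, self._predicates, self._sort_key, n)

    def __iter__(self):
        # The query only runs here, when results are requested.
        self.executions += 1
        rows = [r for r in self._records
                if all(pred(r) for pred in self._predicates)]
        if self._sort_key is not None:
            rows.sort(key=lambda r: r[self._sort_key])
        if self._limit is not None:
            rows = rows[: self._limit]
        return iter(rows)


# Made-up product records for the demonstration
products = [
    {"id": "a", "start_datetime": "1979-01-01"},
    {"id": "b", "start_datetime": "2013-04-01"},
    {"id": "c", "start_datetime": "1999-06-01"},
    {"id": "d", "start_datetime": "2017-01-01"},
]

search = LazySearch(products).filter(lambda r: r["start_datetime"] < "2016-01-01")
oldest = search.sort("start_datetime").limit(2)
executions_before = oldest.executions  # still 0: only the query was built
ids = [r["id"] for r in oldest]        # iteration triggers the evaluation
print(executions_before, ids, oldest.executions)
```

Because each call returns a refined copy in this sketch, the intermediate search object stays reusable, which matches how the guide reuses search across cells.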

All attributes are documented in the Product API reference, which also spells out which ones can be used to filter or sort.

Lookup by id and object relationships

If you know a product’s id, look it up directly with Product.get().

Wherever there are relationships between objects, expect methods such as bands() to find related objects. The cells below look up the Landsat 8 product by its id and then show its first four bands:

In [4]:
landsat8_collection1 = Product.get("landsat:LC08:01:RT:TOAR")
landsat8_collection1
Out[4]:
Product: Landsat 8 Collection 1 Real-Time
  id: landsat:LC08:01:RT:TOAR
In [5]:
list(landsat8_collection1.bands().limit(4))
Out[5]:
[
 SpectralBand: coastal-aerosol
   id: landsat:LC08:01:RT:TOAR:coastal-aerosol
   product: landsat:LC08:01:RT:TOAR, 
 SpectralBand: blue
   id: landsat:LC08:01:RT:TOAR:blue
   product: landsat:LC08:01:RT:TOAR, 
 SpectralBand: green
   id: landsat:LC08:01:RT:TOAR:green
   product: landsat:LC08:01:RT:TOAR, 
 SpectralBand: red
   id: landsat:LC08:01:RT:TOAR:red
   product: landsat:LC08:01:RT:TOAR]

bands() returns a search object that can be further refined. This shows all class bands of this Landsat 8 product, sorted by name:

In [6]:
from descarteslabs.catalog import BandType
list(landsat8_collection1.bands().filter(p.type == BandType.CLASS).sort("name"))
Out[6]:
[
 ClassBand: qa_cirrus
   id: landsat:LC08:01:RT:TOAR:qa_cirrus
   product: landsat:LC08:01:RT:TOAR, 
 ClassBand: qa_cloud
   id: landsat:LC08:01:RT:TOAR:qa_cloud
   product: landsat:LC08:01:RT:TOAR, 
 ClassBand: qa_cloud_shadow
   id: landsat:LC08:01:RT:TOAR:qa_cloud_shadow
   product: landsat:LC08:01:RT:TOAR, 
 ClassBand: qa_saturated
   id: landsat:LC08:01:RT:TOAR:qa_saturated
   product: landsat:LC08:01:RT:TOAR, 
 ClassBand: qa_snow
   id: landsat:LC08:01:RT:TOAR:qa_snow
   product: landsat:LC08:01:RT:TOAR, 
 ClassBand: valid-cloudfree
   id: landsat:LC08:01:RT:TOAR:valid-cloudfree
   product: landsat:LC08:01:RT:TOAR]

Finding images

Image filters

Search for images by their most common attributes, namely product, intersection with a geometry, and date range:

In [7]:
from descarteslabs.catalog import Image, Product, properties as p
geometry = {
     "type": "Polygon",
     "coordinates": [[
         [2.915496826171875, 42.044193618165224],
         [2.838592529296875, 41.92475971933975],
         [3.043212890625, 41.929868314485795],
         [2.915496826171875, 42.044193618165224]
     ]]
 }

search = Product.get("landsat:LC08:01:RT:TOAR").images()
search = search.intersects(geometry)
search = search.filter((p.acquired > "2017-01-01") & (p.acquired < "2018-01-01"))
search.count()
Out[7]:
14

There are other attributes useful to filter by, documented in the API reference for Image. For example, exclude images with too much cloud cover:

In [8]:
search = search.filter(p.cloud_fraction < 0.2)
search.count()
Out[8]:
7

Filtering by cloud_fraction is only reasonable when the product sets this attribute on images. Images that don’t set the attribute never match the filter, so they are excluded from the results.
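
The effect of a missing attribute can be sketched with plain dictionaries. The records and the helper cloud_fraction_below are hypothetical and only illustrate the semantics stated above; they are not how the catalog service evaluates filters.

```python
# Hypothetical image records; the last one has no cloud_fraction at all.
images = [
    {"name": "img-1", "cloud_fraction": 0.05},
    {"name": "img-2", "cloud_fraction": 0.60},
    {"name": "img-3"},  # the product did not set cloud_fraction
]


def cloud_fraction_below(records, threshold):
    # A record without the attribute cannot satisfy the comparison,
    # so it is dropped rather than treated as cloud-free.
    return [
        r for r in records
        if r.get("cloud_fraction") is not None and r["cloud_fraction"] < threshold
    ]


matches = cloud_fraction_below(images, 0.2)
print([r["name"] for r in matches])
```

Only img-1 survives: img-2 is too cloudy and img-3 has nothing to compare against.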

The created timestamp is added to all objects in the catalog when they are created and is immutable. Restrict the search to results created before some time in the past, to make sure that the image results are stable:

In [9]:
from datetime import datetime
search = search.filter(p.created < datetime(2019, 1, 1))
search.count()
Out[9]:
7

Note that for all timestamps we can use datetime instances or strings that can reasonably be parsed as a timestamp. If a timestamp has no explicit timezone, it’s assumed to be in UTC.
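
The UTC rule can be sketched with the standard library. The helper to_utc is a hypothetical name, and it only accepts ISO 8601 strings, whereas the client may parse more formats; treat it as an illustration of the rule, not of the client's parser.

```python
from datetime import datetime, timezone


def to_utc(value):
    """Return a timezone-aware datetime in UTC; naive inputs are assumed to be UTC."""
    if isinstance(value, str):
        # Assumption: ISO 8601 input only, e.g. "2019-01-01"
        value = datetime.fromisoformat(value)
    if value.tzinfo is None:
        # No explicit timezone: interpret the value as UTC
        return value.replace(tzinfo=timezone.utc)
    # Explicit timezone: convert to UTC
    return value.astimezone(timezone.utc)


print(to_utc("2019-01-01"))
print(to_utc(datetime(2019, 1, 1)))
print(to_utc("2019-01-01T02:00:00+02:00"))
```

All three calls describe the same instant, midnight UTC on 1 January 2019.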

Image summaries

Any image query supports a summary via the summary() method, which returns a SummaryResult with aggregate statistics beyond just the number of results:

In [10]:
from descarteslabs.catalog import Image, properties as p
search = Image.search().filter(p.product_id == "landsat:LC08:01:T1:TOAR")
search.summary()
Out[10]:
Summary for 490473 images:
 - Total bytes: 57,212,159,637,881
 - Products: landsat:LC08:01:T1:TOAR

These summaries can also be bucketed by time intervals with summary_interval() to create a time series:

In [11]:
search.summary_interval(interval="month", start_datetime="2017-01-01", end_datetime="2017-06-01")
Out[11]:
[
 Summary for 9872 images:
  - Total bytes: 1,230,379,744,242
  - Interval start: 2017-01-01 00:00:00+00:00, 
 Summary for 10185 images:
  - Total bytes: 1,288,400,404,886
  - Interval start: 2017-02-01 00:00:00+00:00, 
 Summary for 12426 images:
  - Total bytes: 1,556,107,514,684
  - Interval start: 2017-03-01 00:00:00+00:00, 
 Summary for 12492 images:
  - Total bytes: 1,476,030,969,986
  - Interval start: 2017-04-01 00:00:00+00:00, 
 Summary for 13768 images:
  - Total bytes: 1,571,780,442,608
  - Interval start: 2017-05-01 00:00:00+00:00]
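
Conceptually, an interval summary groups images into time buckets and aggregates each bucket. The sketch below does this locally for monthly buckets, assuming the buckets are keyed on acquisition time; the records and the helper summarize_by_month are hypothetical, and the real aggregation runs in the catalog service.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical (acquired, size_in_bytes) pairs
images = [
    (datetime(2017, 1, 3, tzinfo=timezone.utc), 100),
    (datetime(2017, 1, 20, tzinfo=timezone.utc), 150),
    (datetime(2017, 2, 7, tzinfo=timezone.utc), 120),
]


def summarize_by_month(records):
    buckets = defaultdict(lambda: {"count": 0, "bytes": 0})
    for acquired, size in records:
        # Truncate the timestamp to the first instant of its month
        start = acquired.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
        buckets[start]["count"] += 1
        buckets[start]["bytes"] += size
    # Return the buckets ordered by interval start
    return dict(sorted(buckets.items()))


monthly = summarize_by_month(images)
for start, stats in monthly.items():
    print(start, stats)
```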

Managing products

Creating and updating a product

Before uploading images to the catalog, you need to create a product and declare its bands. The only required attributes are a unique id, passed in the constructor, and a name:

In [12]:
from descarteslabs.catalog import Product
import random 

# We append a random number to the product ID so users who run this example multiple times do not get a "Product with that ID exists" error
product_id = "guide-example-product_" + str(random.randint(1000,9999))
product = Product(id=product_id)
product.name = "Example product"
product.save()
product.id
Out[12]:
'descarteslabs:guide-example-product_7005'
In [13]:
product.created
Out[13]:
datetime.datetime(2020, 1, 7, 18, 51, 47, 437294, tzinfo=<UTC>)

save() saves the product to the catalog in the cloud. You choose the id for your product, but it must be unique within your organization (you get an exception if it is not). On save, the id is prefixed with your organization id, which makes it globally unique as well; this example assumes the user is in the “descarteslabs” organization. If you are not part of an organization, the prefix is your unique user id.

Every object has a read-only created attribute with the timestamp from when it was first saved.

There are a few more attributes that you can set (see the Product API reference). You can update the product to define the timespan that it covers. This is as simple as assigning attributes and then saving again:

In [14]:
product.start_datetime = "2012-01-01"
product.end_datetime = "2015-01-01"
product.save()
product.start_datetime
Out[14]:
datetime.datetime(2012, 1, 1, 0, 0, tzinfo=<UTC>)
In [15]:
product.modified
Out[15]:
datetime.datetime(2020, 1, 7, 18, 52, 47, 97574, tzinfo=<UTC>)

A read-only modified attribute exists on all objects and is updated on every save.

Note that all timestamp attributes are represented as datetime instances in UTC. You may assign strings to timestamp attributes if they can be reasonably parsed as timestamps. Once the object is saved the attributes will appear as parsed datetime instances. If a timestamp has no explicit timezone, it’s assumed to be in UTC.

Creating bands

Before adding any images to a product you should create bands that declare the structure of the data shared among all images in a product.

In [16]:
from descarteslabs.catalog import SpectralBand, DataType, Resolution, ResolutionUnit
band = SpectralBand(name="blue", product=product)
band.data_type = DataType.UINT16
band.data_range = (0, 10000)
band.display_range = (0, 4000)
band.resolution = Resolution(unit=ResolutionUnit.METERS, value=60)
band.band_index = 0
band.save()
band.id
Out[16]:
'descarteslabs:guide-example-product_7005:blue'

A band is uniquely identified by its name and product. The full id of the band is composed of the product id and the name.
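
The id pattern visible in the outputs of this guide can be sketched as simple string handling. The helpers band_id and split_band_id are hypothetical, and the split assumes that band names never contain a colon, which holds for every band shown here but is not stated by the guide.

```python
def band_id(product_id, band_name):
    # Mirrors the ids shown in this guide: "<product id>:<band name>"
    return f"{product_id}:{band_name}"


def split_band_id(full_id):
    # Product ids contain colons themselves, so split on the LAST colon.
    # Assumption: the band name has no colon in it.
    product_id, _, name = full_id.rpartition(":")
    return product_id, name


print(band_id("descarteslabs:guide-example-product_7005", "blue"))
print(split_band_id("landsat:LC08:01:RT:TOAR:coastal-aerosol"))
```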

The band defines where its data is found in the files attached to images in the product. In this example, band_index = 0 indicates that blue is the first band in the image file, and that first band is expected to be represented by unsigned 16-bit integers (DataType.UINT16).

This band is specifically a SpectralBand, with pixel values representing measurements somewhere in the visible/NIR/SWIR electro-optical wavelength spectrum, so you can also set additional attributes to locate it on the spectrum:

In [17]:
# These values are in nanometers (nm)
band.wavelength_nm_min = 452
band.wavelength_nm_max = 512
band.save()

Bands are created and updated in the same way as products and all other Catalog objects.

Band types

It’s common for many products to have an alpha band, which masks pixels in the image that don’t have valid data:

In [18]:
from descarteslabs.catalog import MaskBand
alpha = MaskBand(name="alpha", product=product)
alpha.is_alpha = True
alpha.data_type = DataType.UINT16
alpha.resolution = band.resolution
alpha.band_index = 1
alpha.save()

Here the alpha band is created as a MaskBand, which is by definition a binary band with a data range from 0 to 1, so there is no need to set the data_range and display_range attributes.

Setting is_alpha to True enables special behavior for this band during rastering. If this band appears as the last band in a raster operation (such as SceneCollection.mosaic or SceneCollection.stack in the scenes client) pixels with a value of 0 in this band will be treated as transparent.
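
The transparency rule can be illustrated with a small compositing function over nested lists. This is a conceptual sketch only: mosaic here is a hypothetical helper, not SceneCollection.mosaic, and the pixel values are made up.

```python
def mosaic(top, top_alpha, bottom):
    """Overlay `top` on `bottom`; pixels where top_alpha == 0 are transparent."""
    return [
        [t if a != 0 else b for t, a, b in zip(top_row, alpha_row, bottom_row)]
        for top_row, alpha_row, bottom_row in zip(top, top_alpha, bottom)
    ]


top = [[10, 11], [12, 13]]
top_alpha = [[1, 0], [0, 1]]  # 0 marks pixels without valid data
bottom = [[20, 21], [22, 23]]

result = mosaic(top, top_alpha, bottom)
print(result)
```

Where the alpha value is 0, the pixel from the lower layer shows through; everywhere else the upper layer wins.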

There are five band types, each of which may have some attributes specific to it. The type of a band does not necessarily affect how it is rastered; it mainly conveys useful information about the data it contains.

  • SpectralBand: A band that lies somewhere on the visible/NIR/SWIR electro-optical wavelength spectrum. Specific attributes: wavelength_nm_center, wavelength_nm_min, wavelength_nm_max, wavelength_nm_fwhm

  • MicrowaveBand: A band that lies in the microwave spectrum, often from SAR or passive radar sensors. Specific attributes: frequency, bandwidth

  • MaskBand: A binary band where by convention a 0 means masked and 1 means non-masked. The data_range and display_range for masks is implicitly [0, 1]. Specific attributes: is_alpha

  • ClassBand: A band that maps a finite set of values that may not be continuous to classification categories (e.g. a land use classification). A visualization with straight pixel values is typically not useful, so commonly a colormap is used. Specific attributes: colormap, colormap_name, class_labels

  • GenericBand: A generic type for bands that are not represented by the other band types, e.g., mapping physical values like temperature or angles. Specific attributes: colormap, colormap_name, physical_range, physical_range_unit

Managing images

Apart from searching and discovering data available to you, the main use case of the catalog is to let you upload new images. In the following example, we will upload data with a single band representing the blue light spectrum. In addition to the example below, users can upload image files in GeoTIFF format through the catalog.

Uploading ndarrays

Often, when creating derived products (for example, by running a classification model on existing data) you’ll have a NumPy array (often referred to as an “ndarray”) in memory instead of a file written to disk. In that case, you can use Image.upload_ndarray. This method behaves like Image.upload, with one key difference: you must provide georeferencing attributes for the ndarray.

Georeferencing attributes are used to map between geospatial coordinates (such as latitude and longitude) and their corresponding pixel coordinates in the array. The required attributes are:

  • An affine geotransform in GDAL format (the geotrans attribute)

  • A coordinate reference system definition, preferably as an EPSG code (the cs_code attribute) or alternatively as a string in PROJ.4 or WKT format (the projection attribute)
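
To make the geotransform concrete, the sketch below applies the six GDAL coefficients to convert between pixel and geospatial coordinates. The helper names and the example values (a north-up grid with 60 m pixels) are hypothetical; only the coefficient order follows the GDAL convention.

```python
def pixel_to_geo(geotrans, col, row):
    """Map pixel (col, row) to the coordinates of that pixel's top-left corner."""
    x0, px_w, row_rot, y0, col_rot, px_h = geotrans
    x = x0 + col * px_w + row * row_rot
    y = y0 + col * col_rot + row * px_h
    return x, y


def geo_to_pixel(geotrans, x, y):
    """Inverse mapping, returning fractional (col, row)."""
    x0, px_w, row_rot, y0, col_rot, px_h = geotrans
    det = px_w * px_h - row_rot * col_rot
    dx, dy = x - x0, y - y0
    col = (px_h * dx - row_rot * dy) / det
    row = (-col_rot * dx + px_w * dy) / det
    return col, row


# Hypothetical north-up raster: origin at (500000, 4650000), 60 m pixels.
# The pixel height is negative because row numbers grow southwards.
geotrans = (500000.0, 60.0, 0.0, 4650000.0, 0.0, -60.0)

x, y = pixel_to_geo(geotrans, col=10, row=20)
col, row = geo_to_pixel(geotrans, x, y)
print((x, y), (col, row))
```

The coordinates are expressed in whatever coordinate reference system accompanies the geotransform, which is why both attributes are required together.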

If the ndarray you’re uploading was rastered through the platform, this information is easy to get. When rastering you also receive a dictionary of metadata that includes both of these parameters. With Scene.ndarray, you have to set raster_info=True; with Raster.ndarray, it’s always returned.

The following example puts these pieces together. This extracts the blue band from a Landsat 8 scene at a lower resolution and uploads it to our product:

In [21]:
from descarteslabs.catalog import OverviewResampler
from descarteslabs.scenes import Scene
scene, geoctx = Scene.from_id("landsat:LC08:01:T1:TOAR:meta_LC08_L1TP_163068_20181025_20181025_01_T1_v1")
ndarray, raster_meta = scene.ndarray(
     "blue",
     geoctx.assign(resolution=60),
     # return georeferencing info we need to re-upload
     raster_info=True
 )

image2 = Image(product=product, name="scene2")
image2.acquired = "2012-01-02"
upload2 = image2.upload_ndarray(
    ndarray,
    raster_meta=raster_meta,
    # create overviews for 120m and 240m resolution
    overviews=[2, 4],
    overview_resampler=OverviewResampler.AVERAGE
)
    
upload2.wait_for_completion()
upload2.status
Out[21]:
'success'
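
The overview factors in this upload relate to resolution as follows: a factor of 2 on 60 m data gives 120 m, and a factor of 4 gives 240 m. The sketch below also shows one plausible meaning of average resampling, namely averaging non-overlapping blocks of pixels. Both helpers are hypothetical, and how the platform actually builds overviews is not described in this guide.

```python
def overview_resolutions(base_resolution, factors):
    # Each overview is coarser than the base image by its factor
    return [base_resolution * f for f in factors]


def downsample_average(pixels, factor):
    """Average non-overlapping factor x factor blocks.

    Assumes both dimensions are divisible by `factor`.
    """
    rows, cols = len(pixels), len(pixels[0])
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            block = [
                pixels[rr][cc]
                for rr in range(r, r + factor)
                for cc in range(c, c + factor)
            ]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


resolutions = overview_resolutions(60, [2, 4])

# Made-up 4 x 4 tile of pixel values
tile = [
    [1, 3, 5, 7],
    [1, 3, 5, 7],
    [2, 2, 6, 6],
    [2, 2, 6, 6],
]
half = downsample_average(tile, 2)
print(resolutions, half)
```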