The cell below loads the visual style of the notebook when run.
In [ ]:
from IPython.core.display import HTML
css_file = './styles/styles.css'
HTML(open(css_file, "r").read())

Calibrating CCD data

Learning Objectives

  • How to use the astropy library to read in FITS images
  • How to display FITS images
  • How to access FITS headers
  • How to add, subtract divide and combine images
  • Bias subtract, dark subtract and flat-field images

In the lectures we covered the basics of how CCDs work. Once we have a CCD image we'd like to use it for some scientific purpose. For example we might like to perform absolute photometry on the stars in the images.

Usually, before doing that, we must perform some additional processing on the image. We'll look at why later in this practical. Before we do that, let's examine the file format that CCD images come in, and how to deal with it in Python.

FITS images

The FITS (Flexible Image Transport System) file format is the most common data format for astronomical images and data. We need to know a little bit about it to deal with it.


A FITS file contains one or more Header/Data Units (HDUs). Each HDU contains a header, which contains some information about the file, and an optional data unit. For example, a FITS file containing a CCD image would have one HDU - the header would describe the data, and the data would be the CCD image itself.

Reading FITS images

The astropy library is a huge Python library with many, many useful goodies for astronomers to use. Amongst those is a library purely for reading and writing FITS files. We import it like so:

In [ ]:
from import fits
# let's also import some standard modules
import numpy as np
# display plots in notebook
%matplotlib inline
import matplotlib.pyplot as plt

We'll also need some FITS files to play with! We can use another astropy library to download a file over the internet. The argument cache=True means we won't re-download the file if we've already downloaded it once. Once downloaded, the variable image_file is a simple string containing the path to the file.

In [ ]:
from import download_file
image_file = download_file('',cache=True)

For most FITS files, particularly for simple images, we can get the header and the data from the file like so:

In [ ]:
image_data = fits.getdata(image_file)
image_hdr  = fits.getheader(image_file)

For more complicated FITS files, with many HDUs you might need to do something more advanced - for those cases look at the documentation for help!

Image Data

The image_data is just a 2D numpy array, so we can use all the wonderful numpy goodness with them:

In [ ]:

We can use the same code we saw in the plotting practical to display the image data. If you find the representation and scaling of image data a bit confusing you may want to look at this companion notebook.

In [ ]:
fig, axis = plt.subplots(figsize=(10,10))
implot = axis.imshow(image_data,cmap='gray',origin='lower')

Header Data

Let's take a look at the image header we read in earlier

In [ ]:

You can see that the image header is typically used to store information about the image. The header can include information about the object, the exposure time, the filter used etc. All the items in the header follow a name = value pattern. The header can be accessed like a Python dictionary. I've avoided discussing dictionaries in the course so far. They are extremely useful. Dictionaries are a bit like lists, but instead of accessing elements by index (e.g list[0]) we can access them by name.

Thus, if we want the EXPOSURE item from the fits header we can access it like so:

In [ ]:
print( image_hdr['EXPOSURE'] )

Writing FITS images

So, that's how you read image data from a FITS file. How do we write our own data?

The key to writing our own FITS files is to know that all FITS files must contain a Primary HDU - i.e the first HDU. We write a FITS file by creating a Primary HDU object, and using it's writeto method:

In [ ]:
hdu  = fits.PrimaryHDU(data=image_data)

The clobber=True argument means that the file will be over-written if it already exists. Otherwise it will raise an exception. If we want to provide a header for the FITS file we can, but note that this is optional:

In [ ]:
hdu  = fits.PrimaryHDU(data=image_data, header=image_hdr)

An introduction to Reducer

Reducer is a Python package which has been written with the notebook in mind. The idea is to make grouping and automatic processing of CCD images easy. We'll be using reducer to process all the data in this tutorial.

If you are running on the managed desktop or your own laptop, reducer and it's dependencies will not be installed. You need to start a terminal (linux, OS X), or an Anaconda command prompt (windows) and install ccdproc, msumastro and reducer. These are each installed by typing pip install name, where you replace name with the package name (e.g pip install ccdproc). Put a red sticky note on your machine if you have problems!

Below we import the necessary modules

In [ ]:
import reducer.gui
import reducer.astro_gui as astro_gui
from reducer.image_browser import ImageBrowser
import msumastro

from reducer import __version__
print ("Using reducer version {}".format(__version__))

Reducer has a really nice graphical browser which displays in the notebook and allows you to look at what FITS files are in a directory, grouped by information in the FITS headers. First we need to set two variables - the directory where the data is located, and the directory where we want to store the calibrated data:

In [ ]:
data_dir = 'raw_data'
destination_dir = 'calib_data'

import os
#make destination directory if it does not exist
if not os.path.exists(destination_dir):

Load the dataset

In all the examples below, don't worry about understanding all the details of the code that runs reducer - the idea is to present you with a template you can use to reduce data in future...

In [ ]:
images = msumastro.ImageFileCollection(location=data_dir, keywords='*')

With the dataset loaded, we can get a look at what files are in the data dir as shown below. The ['imagetype','exposure'] argument in the first line sets how the files are grouped in the browser. You can group files on any keyword in the FITS header by changing this argument. Feel free to play around and see for yourself:

In [ ]:
tt = msumastro.TableTree(images.summary_info, ['imagetyp', 'exposure'], 'file')
fits_browser = ImageBrowser(tt, demo=False, directory=data_dir)
fits_browser.padding = '10px'

How does reducer group images?

How does the browser above know how to group our images? The answer lies in the IMAGETYP entry in the FITS header. For the data used in this spreadsheet this header entry labels the files as DARK, BIAS, FLAT or (for our science data) LIGHT frames.

Data you take on the 16-inch telescope for your project may not be like this. Often there is no indication in the FITS header what kind of image the file contains. If this is the case for your data, the first step you may need to take is to edit the FITS header of your data files, following the instructions in this notebook.

Calibrating CCD Images

OK, now we know how to read and write FITS files. Let's look at what calibrations we need to perform on CCD images - why we need to do them and how to do them in Python.

Bias Frames

Let's recall how a CCD measures the number of electrons $N_e$ in each pixel. These electrons have a total charge $Q=eN_e$. We measure this charge by dumping it onto a capacitor with capacitance $C$ and measuring the voltage $V=Q/C$. We can re-write the number of electrons in terms of this tiny, analog voltage as

$$ N_e = CV/e .$$

In other words, the voltage is proportional to the number of electrons. Because we need to store the data in digital format, the analog voltage is converted to a digital number of counts $N_c$, by an analog-to-digital converter (ADC). Since the value in counts is proportional to the voltage $N_c \propto V$, it follows that the number of counts is proportional to the number of photo-electrons, i.e $N_e=GN_c$, where $G$ is the gain, measured in e-/ADU. The number of bits used by the ADC to store the count values limits the number of different count values that can be represented. For a 16-bit ADC, we can represent count values from 0 to 65,535.

Now, imagine a relatively short exposure, taken from a dark astronomical site. Suppose that the gain, $G=1$ e-/ADU and that in our short exposure we create, on average, two photo-electrons from the sky in each pixel. Because of readout noise, we will NOT have 2 counts in each pixel. Instead, the pixel values will follow a Gaussian distribution, with a mean of 2 counts, and a standard deviation given by the readout noise, which may be of the order of 3 counts. It should be obvious that this implies that many pixels should contain negative count values. However, our ADC cannot represent numbers less than 0! This means our data has been corrupted by the digitisation process. If we didn't fix this, it would cause all sorts of problems: in this case it would lead us to over-estimate the sky level.

The solution is to apply a bias voltage. This is a constant offset voltage applied to the capacitor before analog-to-digital conversion. The result is that, even if the pixel contains no photo-electrons, the ADC returns a value of a few hundred counts, nicely solving the issue of negative counts. However, it does mean that we must correct for the bias level when doing photometry! Each pixel in our image contains counts from stars, from the sky background and from the bias level. We must subtract the bias level before performing photometry.

Measuring the bias level

How do we know what the bias level is? The easiest way to do this is to take a series of images with zero exposure time. Because there is no exposure time, these images contain no photo-electrons, and no thermally excited electrons. These images, known as bias frames, allow us to measure the bias level, and subtract it from our science data. Several bias frames are needed because the value of any pixel in a given bias frame will differ from the bias level due to readout noise. Averaging several frames together reduces the impact of readout noise and gives a more accurate estimate of the bias level. The master bias frame produced from this averaging can be subtracted from all science images to remove the bias level from each pixel.

Calculating the master bias frame

There are three bias frames in the data_dir directory. The command below will create a combine files dialog that allows us to combine them. Notice how the image_source argument is set to the ImageFileCollection we created above? Also notice how the apply_to arguments selects which type of files will be combined?

Run this cell, then click the "Master Bias Settings" box that appears. A standard set of options for combining images is available - we can do some clipping on the images, group images to combine by FITS header keyword, and choose how to combine them.

Click the "Combine images?" option, make sure "Average" is selected, then click "Lock settings and Go!"

In [ ]:
bias_settings = astro_gui.Combiner(description="Master Bias Settings",
                                   apply_to={'imagetyp': 'BIAS'},

If all that worked successfully, we will have created a master bias frame in our destination_dir. Let's use the built-in os module to take a look!

In [ ]:

Now that we've made and saved our masterbias - let's take a look at it.

In [ ]:
# use os library to glue directories and filenames together easily
filepath = os.path.join(destination_dir, '')

# open the file
masterbias = fits.getdata(filepath)

# What's the mean level in the bias image?

print('The masterbias has mean level of {:.1f} counts'.format( masterbias.mean() ))

# display it
fig, axis = plt.subplots(figsize=(10,10))
# custom scaling - see companion notebook
vmin,vmax = np.percentile(masterbias, [1,99])
implot = axis.imshow(masterbias,cmap='gray',origin='lower',vmin=vmin,vmax=vmax)

Averaging and read noise

Calculate the mean and standard deviation of the master bias, and a single bias frame. Note how averaging the bias frames has reduced the standard deviation around the mean level.

Why are the pixels in a single bias frame scattered around the mean?

In [ ]:

Dark Frames

Recall that photo-electrons are produced in CCDs by photons exciting electrons from the valence band to the conduction band. However, this is not the only way to excite electrons into the conduction band. Thermal excitation is also capable of producing electrons in the conduction band. Thermal excitation of electrons is known as dark current and the electrons produced by it are indistinguishable from photo-electrons.

Dark current can be very substantial. At room temperature, the dark current of a standard CCD is typically 100,000 e-/pixel/s, which is sufficient to saturate most CCDs in only a few seconds! The solution is to cool the CCD. The typical operating temperatures of CCDs are in the range 150 to 263 K (i.e. -123 to -10$^{\circ}$C). At major observatories, most CCDs are cooled to the bottom end of this range, generally using liquid nitrogen. The resulting dark current can be as low as a few electrons per pixel per hour. The CCDs on the Hicks telescopes are air-cooled to a few degrees below zero, and have dark currents of around 40 e-/pixel/hour.

Neither of these values are negligible, especially for short exposures. Thus, every pixel in our image contains contributions from stars, sky background, bias level and dark current. The dark current must be measured, and subtracted from our science images for the same reasons as the bias level. For this purpose dark frames are taken. These are long exposures, taken with the shutter closed. These frames will have no contribution from photo-electrons, but they will contain dark current and the bias level. This means that dark frames must have the bias subtracted from them before use. Once the bias level has been subtracted off, several dark frames can be combined to make a master dark frame, which can be subtracted from your images.

It is best to combine the dark frames using the median, rather than the mean. Dark frames are often long exposures, which can be affected by cosmic rays. Cosmic rays hitting the CCD will excite also excite electrons. Taking the median of the master dark will help remove cosmic rays from the master dark frame.

Because the dark frame increases with time, it is easiest if the dark frames have the same exposure time as your science images. If they do not, it is possible to scale the dark frame by the ratio of exposure times, since dark current increases (roughly) linearly with time. Dark current is also a strong function of temperature - it is essential that dark frames are taken with the CCD at the same temperature as your science frame.

Calculating the master dark

In the directory "../../data/M51" are three dark frames. Each one needs to be bias subtracted, and then combined with a median.

Subtract bias from darks

Using the code cell below, subtract the bias from the darks. You should just be able to run the cell, tick the 'Subtract bias' box and lock the settings and go.

Optionally, try and understand why the various arguments have the values they do.

In [ ]:
reduced_collection = msumastro.ImageFileCollection(location=destination_dir, keywords='*')
dark_reduction = astro_gui.Reduction(description='Reduce dark frames',
                                     apply_to={'imagetyp': 'dark'})


Make a master dark

Using the code cell below, create and save a master dark frame. You shouldn't need to change any of the code. Just make sure you select the "combine images" box and select "median" for the combination method.

Note the Group by option in the controls that appear after you execute the cell below. reducer will make a master for each value of the FITS keyword listed in Group by. By default this keyword is named exposure for darks, so if you have darks with exposure times of 10 sec, 15 sec and 120 sec you will get three master darks, one for each exposure time.

In [ ]:
reduced_collection = msumastro.ImageFileCollection(location=destination_dir, keywords='*')

dark = astro_gui.Combiner(description="Make Master Dark",
                          apply_to={'imagetyp': 'dark'},

Listing the contents of the destination_dir we should have three bias subtracted darks (, etc) and master dark for each exposure time:

In [ ]:

Having made a masterdark image - let's plot it:

In [ ]:
filepath = os.path.join(destination_dir,"")

masterdark_im = fits.getdata(filepath)

# What's the mean level in the dark image?
print('The masterdark has mean level of {:.1f} counts'.format( masterdark_im.mean() ))

# display it
fig, axis = plt.subplots(figsize=(10,10))
# custom scaling - see companion notebook
vmin,vmax = np.percentile(masterdark_im, [1,99])
implot = axis.imshow(masterdark_im,cmap='gray',origin='lower',vmin=vmin,vmax=vmax)

The average level should be about 50 counts. What is the exposure time? We can read in the header of the dark frames to find out:

In [ ]:
img_hdr = fits.getheader('calib_data/')

So there's a mean dark current of 50 counts in a 300 second exposure. Note that not every pixel has the same value. Some of this is read noise, but it is also true that different pixels in a CCD show different levels of dark current. Some pixels show very high levels of dark current - these so called hot pixels can have a very serious effect on your photometry if your target star happens to lie on top of one.

Flat Fields

Suppose we use our telescope and CCD to take an image of a perfectly uniform light source. Would every pixel have the same number of counts in it? No - as we have seen each pixel will have a varying contribution from dark current and readout noise. What if we ignored these effects? The answer is still no. Various effects combine to mean that the count level can vary significantly across the image.

Below is an image of the twilight sky taken with the ROSA telescope on the roof of the Hicks building. On the small image scale of a telescope, the twilight sky is an excellent approximation to a uniformly illuminated light source. However, the image below is far from uniform.

There are three main reasons for the structure in this image:

  1. Vignetting

    Consider the design of the Newtonian telescope. If the secondary mirror is exactly the right size to fit the beam produced by an on-axis source, some fraction of the beam produced by an off-axis source will miss the secondary mirror. This light will be lost, and off-axis sources will appear dimmer than on-axis sources. This is vignetting, and its effect is clearly visible in the figure above.

  2. Pixel-to-Pixel variations

    Each pixel in a CCD is not exactly the same size; manufacturing tolerances mean that some pixels are larger than others. If each CCD pixel is exposed to a constant flux, the variation in pixel area means that some pixels will capture more photons than others. This effect can also be seen in the figure above if you look closely, and is often called flat field grain.

  3. Dust grains on optical surfaces

    Dust grains on the window of the CCD, or on the filters will block out a small fraction of the light falling on the CCD. These grains appear as dark donuts, with the size of the donut depending on how far from the focal plane the dust grain is. Two such donuts are visible inthe figure above.

Flat Field Frames

It is essential to correct our images for these effects. If we did not, the number of counts from an object would vary depending upon its location in the image, ruining our photometry. Fortunately, we can correct these effects using flat field frames. These are images taken of the twilight sky, which is assumed to be uniform (this is a good assumption for most instruments). Any variation in the flat field is therefore due to the effects above. Because vignetting and the contribution from dust grains can depend on the filter, flat fields must be taken in the same filters as your science data.

For small telescopes, it can be more convenient to use a specially constructed flat-field panel. These panels are carefully designed to provide a uniform light source. The advantage of a flat field panel over the twilight sky is that flat fields can be taken at any time, whereas twilight flats can only be taken in a narrow window after sunset. The 16" Hicks telescope has a flat field panel for taking flat fields.

How should the flat field be used? First of all, we must realise that the flat field image must be bias subtracted. Dark frame subtraction is not normally necessary, since exposure times are short and the dark current will be very small. The count level of pixel $(i,j)$ in the bias-subtracted flat field image can be written as

$$F_{ij} = \alpha_{ij} F,$$

where $F$ is the uniform flux of the twilight sky, or flat field panel. The quantity $\alpha_{ij}$ represents the fraction of light lost to vignetting, pixel-to-pixel variations and dust grains. If we normalise our image, i.e. divide the flat field by the mean flux, $F$, we get an image whose brightness $f$ is given by $f_{ij}=\alpha_{ij}$. Our science image is given by the product of the flux falling on each pixel $G_{ij}$, and the flat field effects, giving an image in which the brightness of pixel $(i,j)$ is given by

$$ G'_{ij} = G_{ij}\alpha_{ij}.$$

Dividing our science image, $G′_{ij}$ by the normalised flat field image, $\alpha_{ij}$, gives the actual flux falling on each pixel $G_{ij}$, as desired.

Calculating the master flat frame

Usually we take several flat fields in each filter. These must all be bias subtracted and averaged together. The average frame produced must be normalised: divided by its mean.

Bias subtract the flat fields

In the data directory are three twilight flat fields taken through the I-filter. These are called "", "" and "".

You should be able to run the code cell below to bias subtract the flat fields:

In [ ]:
reduced_collection = msumastro.ImageFileCollection(location=destination_dir, keywords='*')
flat_reduction = astro_gui.Reduction(description='Reduce flat frames',
                                     apply_to={'imagetyp': 'flat'})


Make a master flat

This should be pretty similar to when we procude a master dark. Again, note the presence of Group by. This allows us to produce one flat field per filter observed.

In [ ]:
reduced_collection = msumastro.ImageFileCollection(location=destination_dir, keywords='*')

flat = astro_gui.Combiner(description="Make Master Flat",
                          apply_to={'imagetyp': 'flat'},

As we have done before - let's plot the image and look at the mean level.

In [ ]:
filepath = os.path.join(destination_dir,"")

masterflat_im = fits.getdata(filepath)

# What's the mean level in the dark image?
print('The masterflat has mean level of {:.1f} counts'.format( masterflat_im.mean() ))

# display it
fig, axis = plt.subplots(figsize=(10,10))
# custom scaling - see companion notebook
vmin,vmax = np.percentile(masterflat_im, [1,99])
implot = axis.imshow(masterflat_im,cmap='gray',origin='lower',vmin=vmin,vmax=vmax)

Notice how the flat field has a mean of ~25,000 counts. If we were doing this by hand we would normalise the flat field before using it. However, in this case reducer will take care of the normalisation when we use the flat field to correct our data. The structure due to vignetting, flat field grain and "donuts" from dust on the optical surfaces are clearly visible.

Using our calibration frames

Now we have our master calibration frames, we can use them to calibrate our science CCD data! We will do this in the homework!

Homework #5

Calibrating Science Frames

If you have run through the notebook above, you should have three files in the same directory as the notebook - a master bias, master dark (300s exposure) and a master flat field.

In the data directory (../../data/M51) are located 18 images of M51 taken in the I-band. Our task in this homework is to apply the master calibration frames to the data. As a reminder the steps we have to carry out on each image are:

  1. Subtract masterbias from image
  2. Subtract masterdark from image, correctly scaled for exposure times of image and dark
  3. Divide image by masterflat
  4. Save calibrated image

The questions in the homework will guide you through this step by step.

Q1: Calibrating the science files (5 points)

The reducer code below will calibrate all of our science frames:

There is some clever processing going on here being the scenes:

  • If darks are subtracted a dark of the same edxposure time will be used, if available. If not, and dark scaling is enabled, the dark with the closest exposure time will be scaled to match the science image.
  • If the dark you want to scale appears not to be bias-subtracted an error will be raised.
  • Flats are matched by filter.

Make sure you select "subtract bias", "subtract dark" and "flat correct".

If you get the settings right, the tests in the cell after the next one will pass. remember, these test cells form part of your grade!

In [ ]:
reduced_collection = msumastro.ImageFileCollection(location=destination_dir, keywords='*')
light_reduction = astro_gui.Reduction(description='Reduce light frames',
                                      apply_to={'imagetyp': 'light'})

In [ ]:
from import assert_almost_equal, assert_equal

# check all files are present
for filenum in range(1,19):
    filename = 'calib_data/M51_I_{:03d}.fit'.format(filenum)
    if not os.path.exists(filename):
        raise IOError('output file not found')
# see if the first file is properly processed

# you havent done any silly trimming, have you?

Q2: Stacking the calibrated files (5 points)

There will be many occasions when you will want to average several calibrated images of the same target. Usually you will want to perform photometry on a single image with an improved signal-to-noise ratio.

This can be done within Python using external libraries, such as montage-wrapper, for example. But it's a pain and installing the software is difficult. Luckily, the astroImageJ software we saw in the photometry practicals can also stack software.

Follow the instructions here to stack the 18 images of M51. Using the "upload" button in the main Jupyter notebook page, upload your stacked file to the destination directory.

Finally, in the code cell below, read in and plot your image.

In [ ]:
raise NotImplementedError()