A quick - and highly biased - tour of Python for X-ray Astronomy

Doug Burke, Chandra X-ray Center

Presented at the COSPAR Capacity Building Workshop, 'X-ray Astrophysics School', Ensenada, Mexico, November 2014

This is a talk presented using IPython notebooks - now re-branded as Project Jupyter - which allows you to embed code and documentation, and is being enthusiastically endorsed by the "Open Science" movement. In particular, you can send notebooks to collaborators to allow "reproducible" science, as well as being used for demonstrations, as we will see below.

The contents of this talk - where applicable, so not the images - are placed in the Public Domain.

What is Python?

Let's ask Wolfram Alpha:

Python: snake

Oops; let's focus on the programming language:

Python: language


Why Python?

There are a lot of Astronomers - and people in related (i.e. useful) fields - who provide tutorials, code, modules, workshops, and help. For instance, I saw this on Monday:

"Frequentism and Bayesianism: A Python-driven Primer", Jake VanderPlas, November 18, 2014:

This paper presents a brief, semi-technical comparison of the essential features of the frequentist and Bayesian approaches to statistical inference, with several illustrative examples implemented in Python. The differences between frequentism and Bayesianism fundamentally stem from differing definitions of probability, a philosophical divide which leads to distinct approaches to the solution of statistical problems as well as contrasting ways of asking and answering questions about unknown parameters. After an example-driven discussion of these differences, we briefly compare several leading Python statistical packages which implement frequentist inference using classical methods and Bayesian inference using Markov Chain Monte Carlo.

http://arxiv.org/abs/1411.5018


There are many examples; here is one I found that I thought would be relevant for this workshop:

Bayesian

http://allendowney.blogspot.mx/2014/11/the-world-cup-problem-part-2-germany-v.html


Yes, yes, I get it, how do I start Python?

Starting python for interactive use, use ipython (you can use python for interactive analysis, but it's not as nice):

unix% ipython
Python 2.7.6 (default, Apr 18 2014, 17:48:56) 
Type "copyright", "credits" or "license" for more information.

IPython 2.0.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import antigravity

In [2]: import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

If you want to "run a script" - i.e. something that you don't want to run interactively - then you can just say

python my-wonderful-script-for-suzaku-analysis.py

and you can even make scripts executable and use a "shebang" line to avoid the need to say python (e.g. the CIAO chandra_repro script is written in Python).

Note that there are several versions to be aware of:

  • the Python version ("big" changes between Python 2.x and Python 3.y)

  • the IPython version

  • the versions of various packages/systems we use (e.g. NumPy and Matplotlib; these versions aren't shown here)

What about R?

You should also be aware of R - The R project for Statistical Computing - which is not Python (and you can combine both if you realy want to), but is important as Astronomy moves into the era of

big data

courtesy of https://medium.com/@wtrsld/big-data-made-me-do-it-5bfc3f46871c

Terminology

  • Object Oriented

  • Dynamic Typing

  • Strongly Typed

I have a brief introduction at another IPython notebook but there are many other, better, examples around.

Computer Science has its own terminology, just like X-ray Astronomy.

In [1]:
from IPython.display import Image
Image('question.jpg')
Out[1]:

Thanks to Steve for the photo opportunity. This photograph is copyright Douglas Burke, 2014 and is not for re-use without express permission.

  • XMM has patterns, Chandra has ???

  • Optical has blue and red, X-ray Astronomers use ??? and ???

Important words and phrases

Google(*) for "python for Astronomy".

(* not Wolfram Alpha!)

Note: many of these packages are changing fast.

searching for Astronomy

  • Here's the AstroPy home page:

AstroPy home page

  • self promotion, as I'm an author of this paper...

Acknowledging or Citing Astropy In Publications

If you use Astropy for work/research presented in a publication (whether directly, or as a dependency to another package), we would be grateful if you could include the following acknowledgment:

This research made use of Astropy, a community-developed core Python package for Astronomy (Astropy Collaboration, 2013).

where (Astropy Collaboration, 2013) is a citation to the Astropy Paper (ADS - BibTeX). If you wish, you can also include a link to http://www.astropy.org (if the journal allows this) in addition to the above text.

  • and just to re-iterate this; here's the paper on ADS

AstroPy citation on ADS

Quick question: who uses the "new" ADS interface?

  • here's the Machine Learning book; or at least its cover

AstroML book

  • there are plenty of tutorials using IPython notebooks

IPython, notebooks, and Astronomy

Matplotlib and XKXD

Installation

It's as easy as CIAO, SAS, and FTOOLS, all rolled up as one ...

CIAO has its own Python installation (works well in most cases, but it does not play nicely with other systems). Radio and HST have something similar.

A popular installation is Anaconda https://store.continuum.io/cshop/anaconda/ since it is a binary installation (SciPy can be a tad awkward to build) with a lot of scientific software provided, including astropy.

Your computer almost-certainly has its own Python installation.

Mixing and matching the different installations will lead to

Sad Panda

The image is from http://knowyourmeme.com/memes/sad-panda


Using (I)Python

For the notebook I need to tell Matplotlib to add the images "into" the notebook, rather than throw up a window.

Err, I don't think I thought that sentence through...

In [2]:
%matplotlib inline

The main reason for using Python is because there's code out there that is useful to us; here's some useful imports - if you say

ipython --pylab

then this is done for you (note that this depends on the version of IPython you are using; e.g. ipython -pylab or ipython --matplotlib qt).

In [3]:
import numpy as np
import matplotlib.pyplot as plt
In [4]:
plt.rcParams['figure.figsize'] = (8, 6)

I am going to use AstroPy - remember: http://www.astropy.org/ - to read in FITS data:

In [5]:
from astropy.io import fits

You can use help(fits), but the output is a bit long for this example, so I've picked a symbol somewhat at random:

In [6]:
help(plt.plot)
Help on function plot in module matplotlib.pyplot:

plot(*args, **kwargs)
    Plot lines and/or markers to the
    :class:`~matplotlib.axes.Axes`.  *args* is a variable length
    argument, allowing for multiple *x*, *y* pairs with an
    optional format string.  For example, each of the following is
    legal::
    
        plot(x, y)        # plot x and y using default line style and color
        plot(x, y, 'bo')  # plot x and y using blue circle markers
        plot(y)           # plot y using x as index array 0..N-1
        plot(y, 'r+')     # ditto, but with red plusses
    
    If *x* and/or *y* is 2-dimensional, then the corresponding columns
    will be plotted.
    
    An arbitrary number of *x*, *y*, *fmt* groups can be
    specified, as in::
    
        a.plot(x1, y1, 'g^', x2, y2, 'g-')
    
    Return value is a list of lines that were added.
    
    By default, each line is assigned a different color specified by a
    'color cycle'.  To change this behavior, you can edit the
    axes.color_cycle rcParam.
    
    The following format string characters are accepted to control
    the line style or marker:
    
    ================    ===============================
    character           description
    ================    ===============================
    ``'-'``             solid line style
    ``'--'``            dashed line style
    ``'-.'``            dash-dot line style
    ``':'``             dotted line style
    ``'.'``             point marker
    ``','``             pixel marker
    ``'o'``             circle marker
    ``'v'``             triangle_down marker
    ``'^'``             triangle_up marker
    ``'<'``             triangle_left marker
    ``'>'``             triangle_right marker
    ``'1'``             tri_down marker
    ``'2'``             tri_up marker
    ``'3'``             tri_left marker
    ``'4'``             tri_right marker
    ``'s'``             square marker
    ``'p'``             pentagon marker
    ``'*'``             star marker
    ``'h'``             hexagon1 marker
    ``'H'``             hexagon2 marker
    ``'+'``             plus marker
    ``'x'``             x marker
    ``'D'``             diamond marker
    ``'d'``             thin_diamond marker
    ``'|'``             vline marker
    ``'_'``             hline marker
    ================    ===============================
    
    
    The following color abbreviations are supported:
    
    ==========  ========
    character   color
    ==========  ========
    'b'         blue
    'g'         green
    'r'         red
    'c'         cyan
    'm'         magenta
    'y'         yellow
    'k'         black
    'w'         white
    ==========  ========
    
    In addition, you can specify colors in many weird and
    wonderful ways, including full names (``'green'``), hex
    strings (``'#008000'``), RGB or RGBA tuples (``(0,1,0,1)``) or
    grayscale intensities as a string (``'0.8'``).  Of these, the
    string specifications can be used in place of a ``fmt`` group,
    but the tuple forms can be used only as ``kwargs``.
    
    Line styles and colors are combined in a single format string, as in
    ``'bo'`` for blue circles.
    
    The *kwargs* can be used to set line properties (any property that has
    a ``set_*`` method).  You can use this to set a line label (for auto
    legends), linewidth, anitialising, marker face color, etc.  Here is an
    example::
    
        plot([1,2,3], [1,2,3], 'go-', label='line 1', linewidth=2)
        plot([1,2,3], [1,4,9], 'rs',  label='line 2')
        axis([0, 4, 0, 10])
        legend()
    
    If you make multiple lines with one plot command, the kwargs
    apply to all those lines, e.g.::
    
        plot(x1, y1, x2, y2, antialised=False)
    
    Neither line will be antialiased.
    
    You do not need to use format strings, which are just
    abbreviations.  All of the line properties can be controlled
    by keyword arguments.  For example, you can set the color,
    marker, linestyle, and markercolor with::
    
        plot(x, y, color='green', linestyle='dashed', marker='o',
             markerfacecolor='blue', markersize=12).
    
    See :class:`~matplotlib.lines.Line2D` for details.
    
    The kwargs are :class:`~matplotlib.lines.Line2D` properties:
    
      agg_filter: unknown
      alpha: float (0.0 transparent through 1.0 opaque)         
      animated: [True | False]         
      antialiased or aa: [True | False]         
      axes: an :class:`~matplotlib.axes.Axes` instance         
      clip_box: a :class:`matplotlib.transforms.Bbox` instance         
      clip_on: [True | False]         
      clip_path: [ (:class:`~matplotlib.path.Path`,         :class:`~matplotlib.transforms.Transform`) |         :class:`~matplotlib.patches.Patch` | None ]         
      color or c: any matplotlib color         
      contains: a callable function         
      dash_capstyle: ['butt' | 'round' | 'projecting']         
      dash_joinstyle: ['miter' | 'round' | 'bevel']         
      dashes: sequence of on/off ink in points         
      drawstyle: ['default' | 'steps' | 'steps-pre' | 'steps-mid' |                   'steps-post']         
      figure: a :class:`matplotlib.figure.Figure` instance         
      fillstyle: ['full' | 'left' | 'right' | 'bottom' | 'top' | 'none']         
      gid: an id string         
      label: string or anything printable with '%s' conversion.         
      linestyle or ls: [``'-'`` | ``'--'`` | ``'-.'`` | ``':'`` | ``'None'`` |                   ``' '`` | ``''``]         and any drawstyle in combination with a linestyle, e.g., ``'steps--'``.         
      linewidth or lw: float value in points         
      lod: [True | False]         
      marker: unknown
      markeredgecolor or mec: any matplotlib color         
      markeredgewidth or mew: float value in points         
      markerfacecolor or mfc: any matplotlib color         
      markerfacecoloralt or mfcalt: any matplotlib color         
      markersize or ms: float         
      markevery: unknown
      path_effects: unknown
      picker: float distance in points or callable pick function         ``fn(artist, event)``         
      pickradius: float distance in points         
      rasterized: [True | False | None]         
      sketch_params: unknown
      snap: unknown
      solid_capstyle: ['butt' | 'round' |  'projecting']         
      solid_joinstyle: ['miter' | 'round' | 'bevel']         
      transform: a :class:`matplotlib.transforms.Transform` instance         
      url: a url string         
      visible: [True | False]         
      xdata: 1D array         
      ydata: 1D array         
      zorder: any number         
    
    kwargs *scalex* and *scaley*, if defined, are passed on to
    :meth:`~matplotlib.axes.Axes.autoscale_view` to determine
    whether the *x* and *y* axes are autoscaled; the default is
    *True*.
    
    
    
    Additional kwargs: hold = [True|False] overrides default hold state

The reason for all the :class... and extra chatacters, such as * and ~ - is to make a "nice" HTML page:

on the web it looks nicer

http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.plot

You can find out what symbols - i.e. the name of a "thing" - are attached to another symbol using the dir command:

In [7]:
dir(fits)
Out[7]:
['BinTableHDU',
 'Card',
 'CardList',
 'ColDefs',
 'Column',
 'CompImageHDU',
 'Conf',
 'DELAYED',
 'Delayed',
 'ENABLE_RECORD_VALUED_KEYWORD_CARDS',
 'EXTENSION_NAME_CASE_SENSITIVE',
 'FITSDiff',
 'FITS_rec',
 'FITS_record',
 'FitsHDU',
 'Group',
 'GroupData',
 'GroupsHDU',
 'HDUDiff',
 'HDUList',
 'Header',
 'HeaderDiff',
 'ImageDataDiff',
 'ImageHDU',
 'PrimaryHDU',
 'RawDataDiff',
 'STRIP_HEADER_WHITESPACE',
 'Section',
 'StreamingHDU',
 'TableDataDiff',
 'TableHDU',
 'USE_MEMMAP',
 'Undefined',
 'VerifyError',
 '__all__',
 '__builtins__',
 '__doc__',
 '__file__',
 '__name__',
 '__package__',
 '__path__',
 '_config',
 'append',
 'card',
 'column',
 'compression',
 'conf',
 'convenience',
 'create_card',
 'create_card_from_string',
 'delval',
 'diff',
 'file',
 'fitsrec',
 'getdata',
 'getheader',
 'getval',
 'hdu',
 'header',
 'info',
 'new_table',
 'open',
 'py3compat',
 'register_hdu',
 'setval',
 'tabledump',
 'tableload',
 'tcreate',
 'tdump',
 'unregister_hdu',
 'update',
 'upper_key',
 'util',
 'verify',
 'writeto']

Don't forget to talk about tab completion!


I want to read in a FITS file, so open looks a good place to start (plus, I've used this module before, so I know what I want to do ;-).

In [8]:
help(fits.open)
Help on function fitsopen in module astropy.io.fits.hdu.hdulist:

fitsopen(name, mode='readonly', memmap=None, save_backup=False, **kwargs)
    Factory function to open a FITS file and return an `HDUList` object.
    
    Parameters
    ----------
    name : file path, file object or file-like object
        File to be opened.
    
    mode : str
        Open mode, 'readonly' (default), 'update', 'append', 'denywrite', or
        'ostream'.
    
        If ``name`` is a file object that is already opened, ``mode`` must
        match the mode the file was opened with, readonly (rb), update (rb+),
        append (ab+), ostream (w), denywrite (rb)).
    
    memmap : bool
        Is memory mapping to be used?
    
    save_backup : bool
        If the file was opened in update or append mode, this ensures that a
        backup of the original file is saved before any changes are flushed.
        The backup has the same name as the original file with ".bak" appended.
        If "file.bak" already exists then "file.bak.1" is used, and so on.
    
    kwargs : dict
        optional keyword arguments, possible values are:
    
        - **uint** : bool
    
            Interpret signed integer data where ``BZERO`` is the
            central value and ``BSCALE == 1`` as unsigned integer
            data.  For example, ``int16`` data with ``BZERO = 32768``
            and ``BSCALE = 1`` would be treated as ``uint16`` data.
    
            Note, for backward compatibility, the kwarg **uint16** may
            be used instead.  The kwarg was renamed when support was
            added for integers of any size.
    
        - **ignore_missing_end** : bool
    
            Do not issue an exception when opening a file that is
            missing an ``END`` card in the last header.
    
        - **checksum** : bool, str
    
            If `True`, verifies that both ``DATASUM`` and
            ``CHECKSUM`` card values (when present in the HDU header)
            match the header and data of all HDU's in the file.  Updates to a
            file that already has a checksum will preserve and update the
            existing checksums unless this argument is given a value of
            'remove', in which case the CHECKSUM and DATASUM values are not
            checked, and are removed when saving changes to the file.
    
        - **disable_image_compression** : bool
    
            If `True`, treates compressed image HDU's like normal
            binary table HDU's.
    
        - **do_not_scale_image_data** : bool
    
            If `True`, image data is not scaled using BSCALE/BZERO values
            when read.
    
        - **ignore_blank** : bool
           If `True`, the BLANK keyword is ignored if present.
    
        - **scale_back** : bool
    
            If `True`, when saving changes to a file that contained scaled
            image data, restore the data to the original type and reapply the
            original BSCALE/BZERO values.  This could lead to loss of accuracy
            if scaling back to integer values after performing floating point
            operations on the data.
    
    Returns
    -------
        hdulist : an `HDUList` object
            `HDUList` containing all of the header data units in the
            file.

Let's look in the data directory:

In [9]:
%ls data
evt2.fits  lc.fits  mystery-picture.fits  swift.evt

For some common commands you do not need the % character; so I could have said:

In [10]:
ls data
evt2.fits  lc.fits  mystery-picture.fits  swift.evt

Let's read in some data:

In [11]:
hdus = fits.open('data/mystery-picture.fits')

Aside:

import astropy.io.fits
hdus = astropy.io.fits.open(...)

import astropy.io.fits as keith
hdus = keith.open(...)

import astropy.io.fits
keith = astropy.io.fits.open
hdus = keith.open(...)

# do *NOT* do the following, as it will overwrite the 
# standard `open` function, leading to "amusing" erros
# at some time in the future
from astropy.io.fits import open
hdus = open(...)

# Similarly, don't use the name time for a variable.


If you are wondering what a symbol "means", you can try just entering it at the prompt and seeing what gets returned (remember, this is IPython, which automatically prints out anything that is "returned", as long as it is not the special value None).

In [12]:
hdus
Out[12]:
[<astropy.io.fits.hdu.image.PrimaryHDU at 0x7f7d93333310>]

So, it is an array of "things", and there's only one of these "things". So, let's access the first element, which is labelled 0 in Python:

In [13]:
ihdu = hdus[0]
ihdu
Out[13]:
<astropy.io.fits.hdu.image.PrimaryHDU at 0x7f7d93333310>

If you want to find out how long something is, use len:

In [14]:
len(hdus)
Out[14]:
1
In [15]:
len("Carlos é magnífico")
Out[15]:
20

but not everything has a "length":


A quick aside on errors:

In [16]:
len(ihdu)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-16-729b550a1ce2> in <module>()
----> 1 len(ihdu)

TypeError: object of type 'PrimaryHDU' has no len()

As with using SAS, FTOOLS, CIAO, ... - it is helpful to know how to "read" error messages!


So, you can have multiple lines in each "cell" of the notebook that get executed, and it's the output from the last command that will get displayed. That means that

In [17]:
2 + 3
"David Hasselhoff en alemán significa impresionante"
(-3.4 + 2j) * (4.1 - 0.8j)
Out[17]:
(-12.339999999999998+10.92j)

This is only important if you are using IPython notebooks (so I don't know why I mentioned it).

Alternatively you can use the print command (in Python 2.x it doesn't need the () around what you want to print but it is needed in 3.x and above).

In [18]:
print(ihdu)
<astropy.io.fits.hdu.image.PrimaryHDU object at 0x7f7d93333310>
In [19]:
print ihdu
<astropy.io.fits.hdu.image.PrimaryHDU object at 0x7f7d93333310>

Some Python classes have written a "human-readable" output for print (or for when you just get IPython to display the value), as we will see very soon.

In [20]:
dir(ihdu)
Out[20]:
['ImgCode',
 'NumCode',
 '_EXCLUDE',
 '_MASK',
 '__class__',
 '__delattr__',
 '__dict__',
 '__doc__',
 '__format__',
 '__getattribute__',
 '__hash__',
 '__init__',
 '__module__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_axes',
 '_bitpix',
 '_blank',
 '_bscale',
 '_buffer',
 '_bzero',
 '_calculate_checksum',
 '_calculate_datasum',
 '_char_encode',
 '_compute_checksum',
 '_compute_hdu_checksum',
 '_convert_pseudo_unsigned',
 '_datLoc',
 '_datSpan',
 '_data_loaded',
 '_data_needs_rescale',
 '_data_offset',
 '_data_replaced',
 '_data_size',
 '_default_name',
 '_do_not_scale_image_data',
 '_dtype_for_bitpix',
 '_encode_byte',
 '_file',
 '_gcount',
 '_get_raw_data',
 '_get_scaled_image_data',
 '_get_timestamp',
 '_has_data',
 '_hdrLoc',
 '_hdu_registry',
 '_header',
 '_header_offset',
 '_modified',
 '_new',
 '_orig_bitpix',
 '_orig_blank',
 '_orig_bscale',
 '_orig_bzero',
 '_output_checksum',
 '_padding_byte',
 '_pcount',
 '_postwriteto',
 '_prewriteto',
 '_readfrom_internal',
 '_scale_back',
 '_standard',
 '_summary',
 '_uint',
 '_update_checksum',
 '_update_header_scale_info',
 '_update_uint_scale_keywords',
 '_verify',
 '_verify_checksum_datasum',
 '_writedata',
 '_writedata_direct_copy',
 '_writedata_internal',
 '_writeheader',
 '_writeto',
 'add_checksum',
 'add_datasum',
 'copy',
 'data',
 'filebytes',
 'fileinfo',
 'fromstring',
 'header',
 'is_image',
 'level',
 'match_header',
 'name',
 'readfrom',
 'register_hdu',
 'req_cards',
 'run_option',
 'scale',
 'section',
 'shape',
 'size',
 'standard_keyword_comments',
 'unregister_hdu',
 'update_ext_name',
 'update_ext_version',
 'update_header',
 'ver',
 'verify',
 'verify_checksum',
 'verify_datasum',
 'writeto']

So now I want to look at the pixel data, which I happen to know is stored in the data attribute of the PrimaryHDU object.


Aside: when do I use ()?

In [21]:
carlos = ihdu.data()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-21-2656497dcf52> in <module>()
----> 1 carlos = ihdu.data()

TypeError: 'numpy.ndarray' object is not callable

Note that you can refer to a function - i.e. something that needs () - without them and not get an error; in fact, this can be vitally important in some cases. It does mean potential confusion for new users who don't know what is a function (thing that can get called) and what is an attribute (thing that contains information).

In reality it's more complex, since attributes can be made to call routines automatically when accessed, but that's all hidden behind the scenes and you don't need to really know about this until you start writing your own classes.

Oh, you can also tell IPython to allow you to write

symbol value

and get it to add in the brackets for you; e.g.

In [999]: open "swift-is-awesome.dat"
-------> open("swift-is-awesome.dat")
IOError: [Errno 2] No such file or directory: 'swift-is-awesome.dat'

which we use in ChIPS and Sherpa, but it's not standard, as it makes writing code hard (or at least, copying and pasting code hard).


In [22]:
carlos = ihdu.data
In [23]:
carlos == hdus[0].data
Out[23]:
array([[ True,  True,  True, ...,  True,  True,  True],
       [ True,  True,  True, ...,  True,  True,  True],
       [ True,  True,  True, ...,  True,  True,  True],
       ..., 
       [ True,  True,  True, ...,  True,  True,  True],
       [ True,  True,  True, ...,  True,  True,  True],
       [ True,  True,  True, ...,  True,  True,  True]], dtype=bool)
In [24]:
np.all(carlos == hdus[0].data)
Out[24]:
True

What this is saying is that I can use either the symbol carlos or hdus[0].data to get at the data. It also shows off some basic NumPy functionality (storing arrays of data; doing things with the data).

Let's see what this mysterios carlos is:

In [25]:
carlos
Out[25]:
array([[116, 116, 115, ..., 114, 116, 112],
       [111, 112, 114, ..., 113, 113, 113],
       [113, 114, 115, ..., 111, 112, 113],
       ..., 
       [ 68,  68,  69, ..., 126, 128, 125],
       [ 69,  71,  70, ..., 125, 124, 124],
       [ 70,  69,  68, ..., 124, 124, 124]], dtype=uint8)
In [26]:
print(carlos)
[[116 116 115 ..., 114 116 112]
 [111 112 114 ..., 113 113 113]
 [113 114 115 ..., 111 112 113]
 ..., 
 [ 68  68  69 ..., 126 128 125]
 [ 69  71  70 ..., 125 124 124]
 [ 70  69  68 ..., 124 124 124]]

Note that the output of these two are slightly different from each other, and that they try to be "nice" and not display every element (this can actually be tweaked if you want, as it's a part of NumPy).

In [27]:
carlos.shape
Out[27]:
(1200, 800)

And before Carlos complains, this is listed as number of Y pixels (so height) and then the number of X pixels (width). This is the "C-like" style of indexing array data, rather than the "Fortran-like" style.

In [28]:
carlos.dtype
Out[28]:
dtype('uint8')
In [29]:
plt.imshow(carlos)
Out[29]:
<matplotlib.image.AxesImage at 0x7f7d92a87110>

Dog waiting to be scanned at MR Research Center in Budapest. Image Credit: Borbala Ferenczy.

http://mic.com/articles/104474/brain-scans-reveal-what-dogs-really-think-of-us

In [30]:
plt.set_cmap('grey')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-30-83e946b4f45d> in <module>()
----> 1 plt.set_cmap('grey')

/home/dburke/anaconda/lib/python2.7/site-packages/matplotlib/pyplot.pyc in set_cmap(cmap)
   2195     :func:`matplotlib.cm.get_cmap`.
   2196     """
-> 2197     cmap = cm.get_cmap(cmap)
   2198 
   2199     rc('image', cmap=cmap.name)

/home/dburke/anaconda/lib/python2.7/site-packages/matplotlib/cm.pyc in get_cmap(name, lut)
    161         raise ValueError(
    162             "Colormap %s is not recognized. Possible values are: %s"
--> 163             % (name, ', '.join(cmap_d.keys())))
    164 
    165 

ValueError: Colormap grey is not recognized. Possible values are: Spectral, summer, coolwarm, Wistia_r, pink_r, Set1, Set2, Set3, brg_r, Dark2, prism, PuOr_r, afmhot_r, terrain_r, PuBuGn_r, RdPu, gist_ncar_r, gist_yarg_r, Dark2_r, YlGnBu, RdYlBu, hot_r, gist_rainbow_r, gist_stern, PuBu_r, cool_r, cool, gray, copper_r, Greens_r, GnBu, gist_ncar, spring_r, gist_rainbow, gist_heat_r, Wistia, OrRd_r, CMRmap, bone, gist_stern_r, RdYlGn, Pastel2_r, spring, terrain, YlOrRd_r, Set2_r, winter_r, PuBu, RdGy_r, spectral, rainbow, flag_r, jet_r, RdPu_r, gist_yarg, BuGn, Paired_r, hsv_r, bwr, cubehelix, Greens, PRGn, gist_heat, spectral_r, Paired, hsv, Oranges_r, prism_r, Pastel2, Pastel1_r, Pastel1, gray_r, jet, Spectral_r, gnuplot2_r, gist_earth, YlGnBu_r, copper, gist_earth_r, Set3_r, OrRd, gnuplot_r, ocean_r, brg, gnuplot2, PuRd_r, bone_r, BuPu, Oranges, RdYlGn_r, PiYG, CMRmap_r, YlGn, binary_r, gist_gray_r, Accent, BuPu_r, gist_gray, flag, bwr_r, RdBu_r, BrBG, Reds, Set1_r, summer_r, GnBu_r, BrBG_r, Reds_r, RdGy, PuRd, Accent_r, Blues, autumn_r, autumn, cubehelix_r, nipy_spectral_r, ocean, PRGn_r, Greys_r, pink, binary, winter, gnuplot, RdYlBu_r, hot, YlOrBr, coolwarm_r, rainbow_r, Purples_r, PiYG_r, YlGn_r, Blues_r, YlOrBr_r, seismic, Purples, seismic_r, RdBu, Greys, BuGn_r, YlOrRd, PuOr, PuBuGn, nipy_spectral, afmhot

As mentioned earlier, the error messages can be useful, if you know where to look...

In [31]:
plt.imshow(carlos)
plt.set_cmap('gray')

Dog waiting to be scanned at MR Research Center in Budapest. Image Credit: Borbala Ferenczy.

Note that matplotlib can be run in a more interactive manner, in which case I could have just called set_cmap and it would have changed the existing plot. This doesn't work in the notebook environment that I am using, so we had to re-create the image here.

In [32]:
mariano = carlos[400:,:]
In [33]:
mariano.shape
Out[33]:
(800, 800)

If you know what you are doing, you can sometimes combine steps:

In [34]:
plt.imshow(mariano, cmap='gray')
Out[34]:
<matplotlib.image.AxesImage at 0x7f7d90e01a90>

Dog waiting to be scanned at MR Research Center in Budapest. Image Credit: Borbala Ferenczy.

In [35]:
from skimage.filter import canny
In [36]:
matteo = canny(mariano)

Remember you can use help(canny) to get more information. This information is also available on the web, since the Python documentation/help system makes it easy to convert the code documentation into other forms.

In [37]:
matteo.shape
Out[37]:
(800, 800)
In [38]:
matteo.dtype
Out[38]:
dtype('bool')
In [39]:
plt.imshow(matteo)
Out[39]:
<matplotlib.image.AxesImage at 0x7f7d9279c250>

Dog waiting to be scanned at MR Research Center in Budapest. Image Credit: Borbala Ferenczy.

I don't expect you to know that canny is the routine to use here (or in fact, whether it is remotely useful for your analysis); the point is to say that there's lots of functionality available to you in NumPy, SciPy, ... so you don't always have to go and write your own code.

A brief foray into timing analysis

Ack, variability

Bill the Cat, created by Berkeley Breathed: http://en.wikipedia.org/wiki/Bill_the_Cat

In [40]:
hdus = fits.open("data/evt2.fits")
print(hdus)
[<astropy.io.fits.hdu.image.PrimaryHDU object at 0x7f7d928132d0>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x7f7d7df00110>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x7f7d7df00890>]
In [41]:
hdus.info()
Filename: data/evt2.fits
No.    Name         Type      Cards   Dimensions   Format
0    PRIMARY     PrimaryHDU      32   ()              
1    EVENTS      BinTableHDU    731   35782R x 2C   [1D, 1E]   
2    GTI         BinTableHDU     28   4R x 2C      [1D, 1D]   
In [42]:
thdu = hdus[1]
In [43]:
print(thdu.columns)
ColDefs(
    name = 'time'; format = '1D'; unit = 's'
    name = 'energy'; format = '1E'; unit = 'eV'
)
In [44]:
coldata = thdu.data
In [45]:
type(coldata)
Out[45]:
astropy.io.fits.fitsrec.FITS_rec
In [46]:
coldata.size
Out[46]:
35782
In [47]:
coldata.shape
Out[47]:
(35782,)
In [48]:
coldata[0]
Out[48]:
(186103828.51639557, 1936.8479)
In [49]:
times = coldata.field('time')
In [50]:
times.shape
Out[50]:
(35782,)
In [51]:
times[0]
Out[51]:
186103828.51639557
In [52]:
dt = times - times[0]

So, dt is the arrival times of the events; since this is Chandra data the TIME column is in seconds in the Modified Julian Date system (with an "arbitrary" offset), dt is in seconds too (in this case relative to the first event).

In [53]:
dt[0:10]
Out[53]:
array([ 0.        ,  0.        ,  1.76416001,  2.64622009,  3.96931016,
        4.41035017,  6.17448026,  8.37966037,  8.82070038,  8.82070038])

Note that the times are discrete: why is this?

In [54]:
print(thdu.header['TIMEDEL'], thdu.header['EXPTIME'])
(0.44104, 0.4)

As an aside: if you need to create a string out of numbers and text, use format:

In [55]:
print("timedel={} exptime={}".format(thdu.header['TIMEDEL'], thdu.header['EXPTIME']))
timedel=0.44104 exptime=0.4

You will also see the use of % - as below - which is "old Python" and shouldn't really be used anymore (to make upgrading to Python 3 easier).

In [56]:
print "timedel=%f exptime=%f" % (thdu.header['TIMEDEL'], thdu.header['EXPTIME'])
timedel=0.441040 exptime=0.400000
In [57]:
dt[:20] / thdu.header['EXPTIME']
Out[57]:
array([  0.        ,   0.        ,   4.41040002,   6.61555022,
         9.92327541,  11.02587543,  15.43620065,  20.94915092,
        22.05175094,  22.05175094,  27.56467611,  28.66727613,
        31.97502635,  34.1802264 ,  38.59055154,  46.30862705,
        47.41122708,  49.61642705,  51.82157725,  56.23190247])
In [58]:
dt[0:20] / thdu.header['TIMEDEL']
Out[58]:
array([  0.        ,   0.        ,   4.00000002,   5.99995485,
         8.999887  ,   9.99988703,  13.9998192 ,  18.9997741 ,
        19.99977412,  19.99977412,  24.99970625,  25.99970627,
        28.99966112,  30.99966116,  34.99959327,  41.99948037,
        42.99948039,  44.99948036,  46.9994352 ,  50.99936737])
In [59]:
(dt[:20] / thdu.header['TIMEDEL']).astype(np.int)
Out[59]:
array([ 0,  0,  4,  5,  8,  9, 13, 18, 19, 19, 24, 25, 28, 30, 34, 41, 42,
       44, 46, 50])
In [60]:
Image(filename='wat.jpg')
Out[60]:

Let' break this down:

(dt[:20] / thdu.header['TIMEDEL']).astype(np.int)

diego = dt[:20]
diego /= thdu.header['TIMEDEL']
diego = diego.astype(np.int)

We can also access "from the end" of the array: here we get the last value:

In [61]:
dt[-1]
Out[61]:
41300.67773014307

Let's bin the data up, so we can view it easier. Here I pick a single binning factor (in this case ~ 400 seconds), but remember Diego's talks (in fact, all our talks), and explore your data (so in this case look at different binning values). You might to also want to use histogram in a slightly-different way (e.g. giving the bins to use; see help(np.histogram)), or other functions.

In [62]:
vals = np.histogram(dt, bins=100)

The histogram routine in NumPy returns both the binned values and the bin edges (always read the help to find out what is returned!) as a tuple (you can think of it as an array which you can not change).

In [63]:
vals
Out[63]:
(array([377, 400, 336, 353, 350, 326, 393, 374, 324, 381, 324, 348, 350,
        381, 376, 334, 317, 370, 367, 375, 354, 351, 338, 370, 343, 346,
        352, 324, 350, 355, 373, 349, 363, 348, 356, 408, 350, 363, 361,
        348, 343, 415, 344, 387, 346, 374, 368, 345, 373, 336, 364, 314,
        339, 354, 359, 350, 338, 344, 332, 310, 341, 351, 354, 383, 360,
        347, 336, 379, 328, 367, 368, 340, 351, 376, 364, 372, 342, 382,
        358, 361, 378, 362, 358, 390, 364, 358, 348, 372, 364, 360, 355,
        329, 387, 372, 380, 351, 345, 363, 388, 405]),
 array([     0.        ,    413.0067773 ,    826.0135546 ,   1239.0203319 ,
          1652.02710921,   2065.03388651,   2478.04066381,   2891.04744111,
          3304.05421841,   3717.06099571,   4130.06777301,   4543.07455032,
          4956.08132762,   5369.08810492,   5782.09488222,   6195.10165952,
          6608.10843682,   7021.11521412,   7434.12199143,   7847.12876873,
          8260.13554603,   8673.14232333,   9086.14910063,   9499.15587793,
          9912.16265523,  10325.16943254,  10738.17620984,  11151.18298714,
         11564.18976444,  11977.19654174,  12390.20331904,  12803.21009634,
         13216.21687365,  13629.22365095,  14042.23042825,  14455.23720555,
         14868.24398285,  15281.25076015,  15694.25753745,  16107.26431476,
         16520.27109206,  16933.27786936,  17346.28464666,  17759.29142396,
         18172.29820126,  18585.30497856,  18998.31175587,  19411.31853317,
         19824.32531047,  20237.33208777,  20650.33886507,  21063.34564237,
         21476.35241967,  21889.35919698,  22302.36597428,  22715.37275158,
         23128.37952888,  23541.38630618,  23954.39308348,  24367.39986078,
         24780.40663809,  25193.41341539,  25606.42019269,  26019.42696999,
         26432.43374729,  26845.44052459,  27258.44730189,  27671.4540792 ,
         28084.4608565 ,  28497.4676338 ,  28910.4744111 ,  29323.4811884 ,
         29736.4879657 ,  30149.494743  ,  30562.50152031,  30975.50829761,
         31388.51507491,  31801.52185221,  32214.52862951,  32627.53540681,
         33040.54218411,  33453.54896142,  33866.55573872,  34279.56251602,
         34692.56929332,  35105.57607062,  35518.58284792,  35931.58962522,
         36344.59640253,  36757.60317983,  37170.60995713,  37583.61673443,
         37996.62351173,  38409.63028903,  38822.63706633,  39235.64384364,
         39648.65062094,  40061.65739824,  40474.66417554,  40887.67095284,
         41300.67773014]))
In [64]:
Image('question.jpg')
Out[64]:

This photograph is copyright Douglas Burke, 2014 and is not for re-use without express permission.

Question: what have I done wrong here?

Perhaps it's an "instrumental effect" that I've forgotten about...

In [65]:
(y,x) = vals
print(y.shape)
print(x.shape)
(100,)
(101,)

Question: why are these values not the same?

In [66]:
plt.plot(x[:-1], y)
Out[66]:
[<matplotlib.lines.Line2D at 0x7f7d7ddb86d0>]

Look around; there may be other useful routines! For instance, matplotlib has a routine that will bin up data using histogram and then plot it as a histogram:

In [67]:
plt.hist(dt, bins=100)
Out[67]:
(array([ 377.,  400.,  336.,  353.,  350.,  326.,  393.,  374.,  324.,
         381.,  324.,  348.,  350.,  381.,  376.,  334.,  317.,  370.,
         367.,  375.,  354.,  351.,  338.,  370.,  343.,  346.,  352.,
         324.,  350.,  355.,  373.,  349.,  363.,  348.,  356.,  408.,
         350.,  363.,  361.,  348.,  343.,  415.,  344.,  387.,  346.,
         374.,  368.,  345.,  373.,  336.,  364.,  314.,  339.,  354.,
         359.,  350.,  338.,  344.,  332.,  310.,  341.,  351.,  354.,
         383.,  360.,  347.,  336.,  379.,  328.,  367.,  368.,  340.,
         351.,  376.,  364.,  372.,  342.,  382.,  358.,  361.,  378.,
         362.,  358.,  390.,  364.,  358.,  348.,  372.,  364.,  360.,
         355.,  329.,  387.,  372.,  380.,  351.,  345.,  363.,  388.,  405.]),
 array([     0.        ,    413.0067773 ,    826.0135546 ,   1239.0203319 ,
          1652.02710921,   2065.03388651,   2478.04066381,   2891.04744111,
          3304.05421841,   3717.06099571,   4130.06777301,   4543.07455032,
          4956.08132762,   5369.08810492,   5782.09488222,   6195.10165952,
          6608.10843682,   7021.11521412,   7434.12199143,   7847.12876873,
          8260.13554603,   8673.14232333,   9086.14910063,   9499.15587793,
          9912.16265523,  10325.16943254,  10738.17620984,  11151.18298714,
         11564.18976444,  11977.19654174,  12390.20331904,  12803.21009634,
         13216.21687365,  13629.22365095,  14042.23042825,  14455.23720555,
         14868.24398285,  15281.25076015,  15694.25753745,  16107.26431476,
         16520.27109206,  16933.27786936,  17346.28464666,  17759.29142396,
         18172.29820126,  18585.30497856,  18998.31175587,  19411.31853317,
         19824.32531047,  20237.33208777,  20650.33886507,  21063.34564237,
         21476.35241967,  21889.35919698,  22302.36597428,  22715.37275158,
         23128.37952888,  23541.38630618,  23954.39308348,  24367.39986078,
         24780.40663809,  25193.41341539,  25606.42019269,  26019.42696999,
         26432.43374729,  26845.44052459,  27258.44730189,  27671.4540792 ,
         28084.4608565 ,  28497.4676338 ,  28910.4744111 ,  29323.4811884 ,
         29736.4879657 ,  30149.494743  ,  30562.50152031,  30975.50829761,
         31388.51507491,  31801.52185221,  32214.52862951,  32627.53540681,
         33040.54218411,  33453.54896142,  33866.55573872,  34279.56251602,
         34692.56929332,  35105.57607062,  35518.58284792,  35931.58962522,
         36344.59640253,  36757.60317983,  37170.60995713,  37583.61673443,
         37996.62351173,  38409.63028903,  38822.63706633,  39235.64384364,
         39648.65062094,  40061.65739824,  40474.66417554,  40887.67095284,
         41300.67773014]),
 <a list of 100 Patch objects>)

FFT ahoy

In [68]:
cvals = np.fft.fft(y)
print("shape={}".format(cvals.shape))
print("data type={}".format(cvals.dtype))
shape=(100,)
data type=complex128
In [69]:
avals = np.abs(cvals)**2
print("data type={}".format(avals.dtype))
data type=float64
In [70]:
plt.plot(avals)
plt.yscale('LOG')
plt.xlim(-5, 105)
Out[70]:
(-5, 105)
In [71]:
Image('question.jpg')
Out[71]:

This photograph is copyright Douglas Burke, 2014 and is not for re-use without express permission.

Question: how do you think I know to use -5 and 105 for the x limits?

In [72]:
tstep = thdu.header['TIMEDEL']
freqs = np.fft.fftfreq(y.size, tstep)
idx = np.argsort(freqs)

The above was found via random googling: you will find that many questions are asked on StackOverflow; some are even answered.

Some of these answers are even useful!

In [73]:
plt.plot(freqs[idx], avals[idx])
plt.yscale('LOG')

We can get rid of that peak by subtracting the mean of the signal; note that we don't need to recalculate freqs (although there is no harm in doing so, as it is not a long calculation).

Actually, I'm getting bored of this calculation, so let's automate it, since isn't that the whole point of computers?

In [74]:
def diego(times, tstep):
    """What Would Diego Do?"""
    
    fft = np.fft.fft(times)
    real = np.abs(fft)**2
    freqs = np.fft.fftfreq(times.size, tstep)
    idx = np.argsort(freqs)
    return (freqs[idx], real[idx])
In [75]:
(a,b) = diego(y - y.mean(), tstep)
plt.plot(a, b)
Out[75]:
[<matplotlib.lines.Line2D at 0x7f7d7d761950>]

Moar light curves

In [76]:
stimes = fits.open("data/swift.evt")[1].data.field('time')
In [77]:
Image(filename='wat.jpg')
Out[77]: