Simulating a 2D image and error analysis using the object API

Thanks to some inspired thinking by a colleague - left anonymous, to protect the innocent - I am going to try the name "the object API" - and variants thereof - to describe what I've previously called the "low-level API". That is, the API you get from using the Sherpa classes directly, rather than the routines in sherpa.ui or sherpa.astro.ui.

To celebrate this new name, I'm going to show how to simulate an image - i.e. model a 2D data set and then add in some noise - then fit it, calculating errors on the fit parameters, and displaying the results.

As well as Sherpa and matplotlib, this notebook requires that scipy is also installed (but it's not a major component of the analysis so it can be skipped if you don't have it). If you have anaconda or miniconda installed then you can set up an environment to run this notebook with the following commands (assuming a bash shell):

% conda create -n=image-test -c sherpa sherpa python=3.5 astropy matplotlib scipy ipython-notebook
% source activate image-test
% git clone
% cd sherpa-standalone-notebooks
% jupyter notebook

and then select the "simulating a 2D image and a bit of error analysis" option from the web page that should be displayed in your web browser.

Author and disclaimer

This was written by Douglas Burke on June 19 2015. This notebook, and others that may be of interest, can be found on GitHub at

The information in this document is placed into the Publc Domain. It is not an official product of the Chandra X-ray Center, and I make no guarantee that it is not without bugs or embarassing typos. Please contact me via the GitHub repository or on Twitter - at @doug_burke - if you have any questions.

Follow up

I have updated the notebook for the Sherpa 4.8.2 release in September 2016; there are no changes to the notebook, but you can now use Python 3.5 as well as 2.7. Actually, there's one minor change to the notebook, but that is due to a change in the IPython/Jupyter code rather than Sherpa: it is no longer necessary to say

import logging
logging.getLogger('sherpa').propagate = False

to stop seeing double messages from Sherpa.

Last run

When was this notebook last run?

In [1]:
import datetime"%Y-%m-%d %H:%M")
'2016-09-27 10:41'

Setting up the environment

In [2]:
import numpy as np
from matplotlib import pyplot as plt
In [3]:
%matplotlib inline
In [4]:

Creating a 2D model

As I am going to simulate an image, I need to define the coordinates to be used. I could use "pixel" coordinates (e.g. 0 to n-1), but that's not very interesting, so I have decided to pick a grid with X values between 3000 and 4000, and Y values 4000 to 4800. To keep things simple, I am going to use square pixels, of side 5, and I am also going to use the Sherpa naming scheme for the independent axes, so that X is represented by x0, and Y is represented by x1 (for Sherpa, y is the dependent axis). The choice of variable names is not relevant to Sherpa, but I wanted to keep a consistent scheme since I end up using x0, x1, and y attributes later on in the notebook.

This all means that I want:

In [5]:
x1low = 4000
x1high = 4800

x0low = 3000
x0high = 4000

dx = 5

x1,x0 = np.mgrid[x1low:x1high:dx, x0low:x0high:dx]

This turns into a 200 by 160 pixel grid - as shown by

In [6]:
(160, 200)

which is a reasonable size for testing and development (there's only so long I can re-run these commands when writing the notebook!). The coordinates are going to be the points at which the model is evaluated; I am going to consider them to be the center of each pixel - so that the X range of the image is actually 2997.5 to 4002.5, going from left edge to right edge - but it doesn't really matter as long as you remain consistent in your interpretation$^\dagger$. Note that I am using the support for "unbinned" images in sherpa - that is, evaluate the model at a single point, rather than integrate it over the pixel - so the interpretation of the coordinate positions is important to understand.

$^\dagger$ if I were using ds9 image visualizer to display images - using the optional DS9 image backend in Sherpa - then the converntion would matter, since Sherpa/DS9 make the same assumption on coordinates that I do.

Now that I have a grid, I can define one or more Sherpa model components, and pass them the coordinate grid to create a simulated model. For this notebook I am going to use the sum of two "2D beta" profiles, which is an Astronomy specific name for a profile with a radial variation of

$$f(r) \propto (1 + (r/r_0)^2)^{-\alpha}$$

The model supports an elliptical profile, but I am going to use the circular form, and have the centers of the two components be the same. The important parameters are therefore: the core radius (r0), which sets the scale of the emission; the alpha parameter which determines the slope at large radii (where it's $f(r) \sim r^{-2\alpha}$); and the normalisation (ampl):

In [7]:
from sherpa.astro.models import Beta2D

cpt1 = Beta2D('cpt1')
cpt2 = Beta2D('cpt2')
model = cpt1 + cpt2

(cpt1 + cpt2)
   Param        Type          Value          Min          Max      Units
   -----        ----          -----          ---          ---      -----
   cpt1.r0      thawed           10  1.17549e-38  3.40282e+38           
   cpt1.xpos    thawed            0 -3.40282e+38  3.40282e+38           
   cpt1.ypos    thawed            0 -3.40282e+38  3.40282e+38           
   cpt1.ellip   frozen            0            0        0.999           
   cpt1.theta   frozen            0     -6.28319      6.28319    radians
   cpt1.ampl    thawed            1 -3.40282e+38  3.40282e+38           
   cpt1.alpha   thawed            1          -10           10           
   cpt2.r0      thawed           10  1.17549e-38  3.40282e+38           
   cpt2.xpos    thawed            0 -3.40282e+38  3.40282e+38           
   cpt2.ypos    thawed            0 -3.40282e+38  3.40282e+38           
   cpt2.ellip   frozen            0            0        0.999           
   cpt2.theta   frozen            0     -6.28319      6.28319    radians
   cpt2.ampl    thawed            1 -3.40282e+38  3.40282e+38           
   cpt2.alpha   thawed            1          -10           10           

As of the Sherpa 4.8.2 release (September 2016), model classes now contain documentation, as shown below:

In [8]:
Help on class Beta2D in module sherpa.astro.models:

class Beta2D(sherpa.models.model.ArithmeticModel)
 |  Two-dimensional beta model function.
 |  The beta model is a Lorentz model with a varying power law.
 |  Attributes
 |  ----------
 |  r0
 |      The core radius.
 |  xpos
 |      The center of the model on the x0 axis.
 |  ypos
 |      The center of the model on the x1 axis.
 |  ellip
 |      The ellipticity of the model.
 |  theta
 |      The angle of the major axis. It is in radians, measured
 |      counter-clockwise from the X0 axis (i.e. the line X1=0).
 |  ampl
 |      The amplitude refers to the maximum peak of the model.
 |  alpha
 |      This parameter controls the slope of the profile at large
 |      radii.
 |  See Also
 |  --------
 |  Beta1D, DeVaucouleurs2D, HubbleReynolds, Lorentz2D, Sersic2D
 |  Notes
 |  -----
 |  The functional form of the model for points is::
 |      f(x0,x1) = ampl * (1 + r(x0,x1)^2)^(-alpha)
 |      r(x0,x1)^2 = xoff(x0,x1)^2 * (1-ellip)^2 + yoff(x0,x1)^2
 |                   -------------------------------------------
 |                                r0^2 * (1-ellip)^2
 |      xoff(x0,x1) = (x0 - xpos) * cos(theta) + (x1 - ypos) * sin(theta)
 |      yoff(x0,x1) = (x1 - ypos) * cos(theta) - (x0 - xpos) * sin(theta)
 |  The grid version is evaluated by adaptive multidimensional
 |  integration scheme on hypercubes using cubature rules, based
 |  on code from HIntLib ([1]_) and GSL ([2]_).
 |  References
 |  ----------
 |  .. [1] HIntLib - High-dimensional Integration Library
 |  .. [2] GSL - GNU Scientific Library
 |  Method resolution order:
 |      Beta2D
 |      sherpa.models.model.ArithmeticModel
 |      sherpa.models.model.Model
 |      sherpa.utils.NoNewAttributesAfterInit
 |      builtins.object
 |  Methods defined here:
 |  __init__(self, name='beta2d')
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  calc(self, *args, **kwargs)
 |  get_center(self)
 |  guess(self, dep, *args, **kwargs)
 |  set_center(self, xpos, ypos, *args, **kwargs)
 |  ----------------------------------------------------------------------
 |  Methods inherited from sherpa.models.model.ArithmeticModel:
 |  __abs__ = func(self)
 |  __add__ = func(self, rhs)
 |  __div__ = func(self, rhs)
 |  __floordiv__ = func(self, rhs)
 |  __getitem__(self, filter)
 |  __mod__ = func(self, rhs)
 |  __mul__ = func(self, rhs)
 |  __neg__ = func(self)
 |  __pow__ = func(self, rhs)
 |  __radd__ = rfunc(self, lhs)
 |  __rdiv__ = rfunc(self, lhs)
 |  __rfloordiv__ = rfunc(self, lhs)
 |  __rmod__ = rfunc(self, lhs)
 |  __rmul__ = rfunc(self, lhs)
 |  __rpow__ = rfunc(self, lhs)
 |  __rsub__ = rfunc(self, lhs)
 |  __rtruediv__ = rfunc(self, lhs)
 |  __setstate__(self, state)
 |  __sub__ = func(self, rhs)
 |  __truediv__ = func(self, rhs)
 |  apply(self, outer, *otherargs, **otherkwargs)
 |  startup(self)
 |  teardown(self)
 |  ----------------------------------------------------------------------
 |  Methods inherited from sherpa.models.model.Model:
 |  __call__(self, *args, **kwargs)
 |      Call self as a function.
 |  __getattr__(self, name)
 |      # Make parameter access case insensitive
 |  __iter__(self)
 |      # This allows all models to be used in iteration contexts, whether or
 |      # not they're composite
 |  __repr__(self)
 |      Return repr(self).
 |  __setattr__(self, name, val)
 |      Implement setattr(self, name, value).
 |  __str__(self)
 |      Return str(self).
 |  reset(self)
 |  ----------------------------------------------------------------------
 |  Data descriptors inherited from sherpa.models.model.Model:
 |  thawedparhardmaxes
 |  thawedparhardmins
 |  thawedparmaxes
 |  thawedparmins
 |  thawedpars
 |  ----------------------------------------------------------------------
 |  Methods inherited from sherpa.utils.NoNewAttributesAfterInit:
 |  __delattr__(self, name)
 |      Implement delattr(self, name).
 |  ----------------------------------------------------------------------
 |  Data descriptors inherited from sherpa.utils.NoNewAttributesAfterInit:
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  __weakref__
 |      list of weak references to the object (if defined)

I am going to place the source near the center of the grid, but offset slightly, with the first component representing a bright core (large ampl, small r0, and large alpha, so that the emission is centrally concentrated), and the second component is an extended component (larger r0, smaller alpha so that the emission is "flatter", and a lower amplitude):

In [9]:
cpt1.xpos, cpt1.ypos = 3512, 4418
cpt2.xpos, cpt2.ypos = cpt1.xpos, cpt1.ypos

cpt1.r0, cpt2.r0 = 30, 120
cpt1.alpha, cpt2.alpha = 4.2, 2.1

cpt1.ampl, cpt2.ampl = 45, 10

(cpt1 + cpt2)
   Param        Type          Value          Min          Max      Units
   -----        ----          -----          ---          ---      -----
   cpt1.r0      thawed           30  1.17549e-38  3.40282e+38           
   cpt1.xpos    thawed         3512 -3.40282e+38  3.40282e+38           
   cpt1.ypos    thawed         4418 -3.40282e+38  3.40282e+38           
   cpt1.ellip   frozen            0            0        0.999           
   cpt1.theta   frozen            0     -6.28319      6.28319    radians
   cpt1.ampl    thawed           45 -3.40282e+38  3.40282e+38           
   cpt1.alpha   thawed          4.2          -10           10           
   cpt2.r0      thawed          120  1.17549e-38  3.40282e+38           
   cpt2.xpos    linked         3512          expr: cpt1.xpos           
   cpt2.ypos    linked         4418          expr: cpt1.ypos           
   cpt2.ellip   frozen            0            0        0.999           
   cpt2.theta   frozen            0     -6.28319      6.28319    radians
   cpt2.ampl    thawed           10 -3.40282e+38  3.40282e+38           
   cpt2.alpha   thawed          2.1          -10           10           

Oops - after going on above that Sherpa uses x0 and x1 for the independent axes, I then go and show you a model where the position is labelled xpos and ypos. Oh well!

Evaluating the model is just a case of passing in the coordinate grid (so the X and Y axis values) to the model. The axes are assumed to be in 1D format (since you do not have to evaluate the full grid, if you only want a non-rectangular set of points), which means converting back and forth between 1D and 2D data sets (flatten to go from 2D to 1D and resize to go the other way). I've called the model mexp as it is the expected values for this model (as I'm about to go and add noise to it):

In [10]:
mexp = model(x0.flatten(), x1.flatten())

If I were using the UI level of Sherpa, and not using an IPython notebook, then I would use the sherpa.ui.image_data command to view the data. This brings up the data in an external viewer - the DS9 imaging and data visualization application - if installed, which allows you to adjust and interact with the data (e.g. to change the color scale and dynamic scaling of the data). For this notebook I am going to use the imaging support in matplotlib. Here's a view of the evaluated model, using a linear scale (left) and arcsinh scaling (right; based on the work of Lupton, Gunn & Szalay (1999)). A central source can be seen, that extends out across the image, but the dynamic range is such that it's hard to see the data at the edges and the core.

In [11]:
plt.imshow(mexp, origin='lower', cmap='cubehelix')

plt.imshow(np.arcsinh(mexp), origin='lower', cmap='cubehelix')
_ = plt.title('arcsinh')

I use the form _ = plt.XXX on the last line of the cells in these notebooks just to stop the notebook from displaying the string representation of a matplotlib object, as I find it distracting.

From the lefthand plot I can see that the data ranges from 0 to about 50, but what's the exact range?

In [12]:
print("Stats: min = {}  max = {}".format(mexp.min(), mexp.max()))
Stats: min = 0.007215205702565074  max = 53.34646802646546

Simulating data

I want to simulate this image using Poisson statistics, which is fortunately very easy to do, using numpy.random.poisson:

In [13]:
msim = np.random.poisson(mexp)
print("Stats: min = {}  max = {}".format(msim.min(), msim.max()))
Stats: min = 0  max = 64

Unsurprisingly, the result isn't quite as smooth as the model! The quantized nature of the data (all pixels have integer values) does mean that it's a bit easier to see the extended emission than in the earlier plots.

In [14]:
plt.imshow(np.arcsinh(msim), origin='lower', cmap='cubehelix')
plt.title('Simulated image')
_ = plt.colorbar()

As I have coordinates for the two axes, let's try and include them in the plots (I'm also already tired of setting the origin and cmap keywords). I'm not particularly proud of these routines, as they do rely on hidden state - e.g. the extent array - but they work well in this notebook!

In [15]:
x0min, x0max = x0.min(), x0.max()
x1min, x1max = x1.min(), x1.max()
extent = [x0min - dx/2, x0max + dx/2, x1min - dx/2, x1max + dx/2]

def image_data(idata, transform=np.arcsinh, **kwargs):
    """Create an image of idata, which is assumed to be an array of
    the same size as x and y. The data is passed through transform,
    which defaults to np.arcsinh, if not None.
    if transform is not None:
        idata = transform(idata)
    plt.imshow(idata, extent=extent, origin='lower', cmap='cubehelix', **kwargs)
def image_model(imodel, transform=np.arcsinh, **kwargs):
    """Create an image of the given model, using the x and y grids.
    The transform argument is applied to the data before display, if not None.
    The default is np.arcsinh.
    idata = imodel(x0.ravel(), x1.ravel()).reshape(x0.shape)
    image_data(idata, transform=transform, **kwargs)

# because I'm going to want them later on
def vline(x):
    (yl,yh) = plt.ylim()
    plt.ylim(yl, yh) # force limits
    plt.vlines(x, yl, yh)

def hline(y):
    (xl,xh) = plt.xlim()
    plt.xlim(xl, xh)
    plt.hlines(y, xl, xh)

Using these, I can display the simulated data:

In [16]:

I can also compare the simulated data (top left) to the full model (bottom left) and the individual components (right column):

In [17]:
image_data(msim, vmin=0, vmax=4.5)

image_model(model, vmin=0, vmax=4.5)

image_model(cpt1, vmin=0, vmax=4.5)

image_model(cpt2, vmin=0, vmax=4.5)

Fitting the data

To fit a model to the simulated data I need a data set (Data2D), statistic (in this case the Cash statistic, a Maximum Likelihood statistic for Poisson data, from "Parameter estimation in astronomy through application of the likelihood ratio", Cash, W. 1979, ApJ 228, 939), optimisation method (NelderMead), and the Fit object to control it all.

In [18]:
from import Data2D
from sherpa.stats import Cash
from sherpa.optmethods import NelderMead
from import Fit

To start off, I create the data object. It needs a name, the independent axes (x0 and x1), the dependent axis (msim), and then I also give the shape of the image. The shape isn't required here, but I do take advantage of it later on- in cell 29 - when creating images. Although in this case the grid is rectangular, with no missing data, Sherpa will happily accept sparse, or irregularly gridded, data.

In [19]:
d = Data2D('sim', x0.flatten(), x1.flatten(), msim.flatten(), shape=x0.shape)

I am going to fit the simulated image with the same model used to create it. I access the components using the parts attribute of the combined model - mainly to show that it can be done, since in this case there's no real benefit over writing the more-obvious form:

m1 = Beta2D('m1')
m2 = Beta2D('m2')
mdl_fit = m1 + m2
In [20]:
mdl_fit = Beta2D('m1') + Beta2D('m2')
(m1, m2) =

The default parameters - which are the same as the first time that I called print(model) in this notebook, in cell 7 - are not close to the best-fit parameters. Even worse, the two components have the same starting values, which means that the optimiser is going to have a tough time disentangling the two components. There's a number of ways I could calculate starting values for these parameters, and if I were doing this on real data I'd be interested to see how sensitive the fit is to the starting position, to make sure I had found the global minimum$^\dagger$.

I am going to set the peak position to the peak of the distribution - formed by summing the data up over the rows and columns, respectively - then use a completely ad-hoc guess for the core radii and amplitudes, based on the image extent and data values. First, I create the profiles along the X and Y axes.

$^\dagger$ In early versions of this notebook I did try starting from the default values. It required several fits and tweaks to nudge the fit into a good solution. Since my aim here is not to discuss the fine details of model fitting, and also because the previous approach took a long time to converge, I've decided to "help out" the optimiser.

In [21]:
ysum = msim.sum(axis=0)
xsum = msim.sum(axis=1)

With these arrays, I can find the coordinate of the maximum point:

In [22]:
xguess = x0[0, np.argmax(ysum)]
yguess = x1[np.argmax(xsum), 0]

This can perhaps be seen better graphically (note that the actual center is 3512,4418, but the grid is evaluated only every 5 pixels, which is why the guesses are both divisible by 5).

In [23]:
plt.subplot(2, 1, 1)
plt.plot(x0[0,:], ysum)
plt.legend(['data', 'x={}'.format(xguess)])

plt.subplot(2, 1, 2)
plt.plot(x1[:,0], xsum)
_ = plt.legend(['data', 'y={}'.format(yguess)])

I use this value for the position, and link the second component so that it uses the same values:

In [24]:
m1.xpos = xguess
m1.ypos = yguess
m2.xpos = m1.xpos
m2.ypos = m1.ypos

I am going to define the first component to have an amplitude equal to the maximum pixel value in the data, and the second one to be one-tenth that (this is completely arbitrary).

If I had said

m2.ampl = m1.ampl / 10.0

then the amplitude of the second component would have been fixed to be one-tenth that of the first (that is, parameters can be linked by an expression, not just a value).

In [25]:
m1.ampl = msim.max()
m2.ampl = msim.max() / 10.0

The core radii are taken from the height of the image (since it's the smallest axis):

In [26]:
m1.r0 = (x1max - x1min) / 10.0
m2.r0 = (x1max - x1min) / 4.0

The resulting set of parameters is:

In [27]:
(m1 + m2)
   Param        Type          Value          Min          Max      Units
   -----        ----          -----          ---          ---      -----
   m1.r0        thawed         79.5  1.17549e-38  3.40282e+38           
   m1.xpos      thawed         3515 -3.40282e+38  3.40282e+38           
   m1.ypos      thawed         4415 -3.40282e+38  3.40282e+38           
   m1.ellip     frozen            0            0        0.999           
   m1.theta     frozen            0     -6.28319      6.28319    radians
   m1.ampl      thawed           64 -3.40282e+38  3.40282e+38           
   m1.alpha     thawed            1          -10           10           
   m2.r0        thawed       198.75  1.17549e-38  3.40282e+38           
   m2.xpos      linked         3515            expr: m1.xpos           
   m2.ypos      linked         4415            expr: m1.ypos           
   m2.ellip     frozen            0            0        0.999           
   m2.theta     frozen            0     -6.28319      6.28319    radians
   m2.ampl      thawed          6.4 -3.40282e+38  3.40282e+38           
   m2.alpha     thawed            1          -10           10           

Note that the m2.xpos and m2.ypos parameters are marked as linked; that is, they are not free parameters themselves but are calculated from an expression based on other parameters (in this case the position of the first component).

The fit is started by creating a Fit object, giving the data, model, statistic, and optimization method (LevMar is generally faster than NelderMead but it isn't really suited to this type of statistic, and will often stop prematurely).

In [28]:
f = Fit(d, mdl_fit, Cash(), NelderMead())

This has not performed a fit yet! For that, I need to call the fit method, which can take some time:

In [29]:
res1 =
datasets       = None
itermethodname = none
methodname     = neldermead
statname       = cash
succeeded      = True
parnames       = ('m1.r0', 'm1.xpos', 'm1.ypos', 'm1.ampl', 'm1.alpha', 'm2.r0', 'm2.ampl', 'm2.alpha')
parvals        = (43.424902385688533, 3511.6336079725506, 4418.2589557453121, 43.080687358517046, 8.2941850069076022, 116.6307439503573, 10.384319151369729, 2.0694160143503075)
statval        = 1968.0454602658456
istatval       = 387737.2298887874
dstatval       = 385769.184429
numpoints      = 32000
dof            = 31992
qval           = None
rstat          = None
message        = Optimization terminated successfully
nfev           = 2451

I am going to repeat the fit, to check that the optimiser hasn't stopped early, and also to show how to change the format of the return value of the fit method:

In [30]:
res2 =
Method                = neldermead
Statistic             = cash
Initial fit statistic = 1968.05
Final fit statistic   = 1968.05 at function evaluation 589
Data points           = 32000
Degrees of freedom    = 31992
Change in statistic   = 3.07536e-07
   m1.r0          43.4305     
   m1.xpos        3511.63     
   m1.ypos        4418.26     
   m1.ampl        43.0803     
   m1.alpha       8.29622     
   m2.r0          116.63      
   m2.ampl        10.3844     
   m2.alpha       2.06941     

The second fit only changed the statistic by a very small amount ($\ll 10^{-5}$) which suggests that the fit is in a local minimum.

How can I visualize this? First, I evaluate the model, using the eval_model method of the data set:

In [31]:
meval = d.eval_model(mdl_fit).reshape(d.shape)

I could have given the grid to mdl_fit directly - as I did when creating msim and mexp - but wanted to show how to use the dataset. This also takes advantage of the shape attribute I set up when calling Data2D in cell 19.

I can use the routines I set up earlier to display the best-fit model:

In [32]:
_ = plt.title('Best fit')

I can also view the residuals - i.e. data - model - this time with no transform as the range of values is small and is centered on zero:

In [33]:
image_data(msim - meval, transform=None)
plt.title('Residuals: data - model')
_ = plt.colorbar()

Interpreting this residual image can be hard, since the variation is larger where the signal is larger, but is there any statistically-significant structure here? I would normally try things like smoothing the residuals, looking for structure, and comparing to the model and data. For now, let's just look at a heavily-smoothed version if the residuals (taking advantage of SciPy's gaussian_filter function).

In [34]:
from scipy import ndimage
In [35]:
smoothed = ndimage.gaussian_filter(msim - meval, sigma=(5, 5), order=0)
In [36]:
image_data(smoothed, transform=None)
plt.title('Smoothed residuals')
_ = plt.colorbar()

Note that, due to the heavy smoothing, the scale (the dynamic range of the image) is significantly smaller than the previous residual plot. We can compare the two with the same scale, but the result isn't particularly compelling (I think it's saying there's not much structure in the residuals, which is unsurprising given that this is a simualation).

In [37]:
plt.subplot(1, 2, 1)
image_data(msim - meval, transform=None, vmin=-5, vmax=5)

plt.subplot(1, 2, 2)
image_data(smoothed, transform=None, vmin=-5, vmax=5)