Thanks to some inspired thinking by a colleague - left anonymous, to protect the innocent - I am going to try the name "the object API" - and variants thereof - to describe what I've previously called the "low-level API". That is, the API you get from using the Sherpa classes directly, rather than the routines in `sherpa.ui` or `sherpa.astro.ui`.

To celebrate this new name, I'm going to show how to simulate an image - i.e. model a 2D data set and then add in some noise - then fit it, calculating errors on the fit parameters, and displaying the results.

As well as `Sherpa` and `matplotlib`, this notebook requires that `scipy` is installed (but it's not a major component of the analysis, so it can be skipped if you don't have it). If you have anaconda or miniconda installed, then you can set up an environment to run this notebook with the following commands (assuming a bash shell):

```
% conda create -n image-test -c sherpa sherpa python=3.5 astropy matplotlib scipy ipython-notebook
...
% source activate image-test
% git clone https://github.com/DougBurke/sherpa-standalone-notebooks
...
% cd sherpa-standalone-notebooks
% jupyter notebook
```

and then select the "simulating a 2D image and a bit of error analysis" option from the web page that should be displayed in your web browser.

This was written by Douglas Burke on June 19 2015. This notebook, and others that may be of interest, can be found on GitHub at https://github.com/DougBurke/sherpa-standalone-notebooks.

The information in this document is placed into the Public Domain. It is not an official product of the Chandra X-ray Center, and I make no guarantee that it is free of bugs or embarrassing typos. Please contact me via the GitHub repository or on Twitter - at @doug_burke - if you have any questions.

I have updated the notebook for the Sherpa 4.8.2 release in September 2016; you can now use Python 3.5 as well as 2.7. There is one minor change to the notebook, but that is due to a change in the IPython/Jupyter code rather than Sherpa: it is no longer necessary to say

```
import logging
logging.getLogger('sherpa').propagate = False
```

to stop seeing double messages from Sherpa.

When was this notebook last run?

In [1]:

```
import datetime
datetime.datetime.now().strftime("%Y-%m-%d %H:%M")
```

Out[1]:

In [2]:

```
import numpy as np
from matplotlib import pyplot as plt
```

In [3]:

```
%matplotlib inline
```

In [4]:

```
np.random.seed(1)
```

As I am going to simulate an image, I need to define the coordinates to be used. I could use "pixel" coordinates (e.g. 0 to n-1), but that's not very interesting, so I have decided to pick a grid with X values between 3000 and 4000, and Y values between 4000 and 4800. To keep things simple, I am going to use square pixels of side 5, and I am also going to use the Sherpa naming scheme for the independent axes, so that X is represented by `x0`, and Y is represented by `x1` (for Sherpa, `y` is the dependent axis). The choice of variable names is not relevant to Sherpa, but I wanted to keep a consistent scheme, since I end up using the `x0`, `x1`, and `y` attributes later on in the notebook.

This all means that I want:

In [5]:

```
x1low = 4000
x1high = 4800
x0low = 3000
x0high = 4000
dx = 5
x1,x0 = np.mgrid[x1low:x1high:dx, x0low:x0high:dx]
```

This turns into a 200 by 160 pixel grid - as shown by

In [6]:

```
x0.shape
```

Out[6]:

which is a reasonable size for testing and development (there's only so long I can re-run these commands when writing the notebook!). The coordinates are going to be the points at which the model is evaluated; I am going to consider them to be the center of each pixel - so that the X range of the image is actually 2997.5 to 3997.5, going from the left edge of the first pixel to the right edge of the last one (whose center is at 3995, since the grid excludes the end point) - but it doesn't really matter as long as you remain consistent in your interpretation$^\dagger$. Note that I am using the support for "unbinned" images in Sherpa - that is, the model is evaluated at a single point, rather than integrated over the pixel - so it is important to understand the interpretation of the coordinate positions.

$^\dagger$ If I were using the DS9 image visualizer to display images - using the optional DS9 image backend in Sherpa - then the convention would matter, since both Sherpa and DS9 make the same assumption about coordinates that I do.
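As a quick check of the pixel-center convention, the outer edges of the image can be computed directly from the grid. This is a minimal sketch that re-creates the coordinate grid with NumPy, rather than relying on the earlier cells:

```python
import numpy as np

# Re-create the coordinate grid used above: square pixels of side 5,
# with x0 (X) running from 3000 to 4000 and x1 (Y) from 4000 to 4800.
# Note that np.mgrid excludes the end point, so the last X center is 3995.
dx = 5
x1, x0 = np.mgrid[4000:4800:dx, 3000:4000:dx]

# Treating each coordinate as the *center* of its pixel, the image
# extends half a pixel beyond the extreme centers on each side.
left_edge = x0.min() - dx / 2
right_edge = x0.max() + dx / 2
print(left_edge, right_edge)  # 2997.5 3997.5
```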

Now that I have a grid, I can define one or more Sherpa model components, and pass them the coordinate grid to create a simulated model. For this notebook I am going to use the sum of two "2D beta" profiles, which is an Astronomy specific name for a profile with a radial variation of

$$f(r) \propto (1 + (r/r_0)^2)^{-\alpha}$$

The model supports an elliptical profile, but I am going to use the circular form, and have the centers of the two components be the same. The important parameters are therefore: the core radius (`r0`), which sets the scale of the emission; the `alpha` parameter, which determines the slope at large radii (where $f(r) \sim r^{-2\alpha}$); and the normalisation (`ampl`):
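Before turning to the Sherpa class, here is a minimal NumPy sketch of the radial profile above; the function name `beta2d_radial` and its parameter order are my own, chosen for illustration, and not part of the Sherpa API:

```python
import numpy as np

def beta2d_radial(r, r0, alpha, ampl):
    """The circular "2D beta" profile as a function of radius:
    f(r) = ampl * (1 + (r / r0)**2) ** (-alpha)
    """
    return ampl * (1 + (r / r0) ** 2) ** (-alpha)

# The profile peaks at ampl for r = 0 and falls off as r**(-2 * alpha)
# at radii well beyond the core radius r0.
r = np.array([0.0, 30.0, 300.0])
print(beta2d_radial(r, r0=30, alpha=4.2, ampl=45))
```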

In [7]:

```
from sherpa.astro.models import Beta2D
cpt1 = Beta2D('cpt1')
cpt2 = Beta2D('cpt2')
model = cpt1 + cpt2
print(model)
```

As of the Sherpa 4.8.2 release (September 2016), model classes now contain documentation, as shown below:

In [8]:

```
help(Beta2D)
```

I am going to place the source near the center of the grid, but offset slightly, with the first component representing a bright core (large `ampl`, small `r0`, and large `alpha`, so that the emission is centrally concentrated), and the second component an extended one (larger `r0`, smaller `alpha` so that the emission is "flatter", and a lower amplitude):

In [9]:

```
cpt1.xpos, cpt1.ypos = 3512, 4418
cpt2.xpos, cpt2.ypos = cpt1.xpos, cpt1.ypos
cpt1.r0, cpt2.r0 = 30, 120
cpt1.alpha, cpt2.alpha = 4.2, 2.1
cpt1.ampl, cpt2.ampl = 45, 10
print(model)
```

Oops - after going on above about Sherpa using `x0` and `x1` for the independent axes, I then go and show you a model where the position is labelled `xpos` and `ypos`. Oh well!

Evaluating the model is just a case of passing the coordinate grid (the X and Y axis values) to the model. The axes are assumed to be in 1D format (since you do not have to evaluate the full grid, if you only want a non-rectangular set of points), which means converting back and forth between 1D and 2D data sets (`flatten` to go from 2D to 1D and `resize` to go the other way). I've called the result `mexp` as it holds the expected values for this model (I'm about to add noise to it):

In [10]:

```
mexp = model(x0.flatten(), x1.flatten())
mexp.resize(x0.shape)
```

If I were using the UI level of Sherpa, and not using an IPython notebook, then I would use the `sherpa.ui.image_data` command to view the data. This brings up the data in an external viewer - the DS9 imaging and data visualization application, if installed - which allows you to adjust and interact with the display (e.g. to change the color scale and dynamic scaling of the data). For this notebook I am going to use the imaging support in `matplotlib`. Here's a view of the evaluated model, using a linear scale (left) and arcsinh scaling (right; based on the work of Lupton, Gunn & Szalay (1999)). A central source can be seen that extends out across the image, but the dynamic range is such that it's hard to see the data at the edges and in the core.
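To see why arcsinh scaling helps with dynamic range: it is roughly linear for small values and logarithmic for large ones, so faint structure keeps its contrast while bright pixels are compressed. A quick NumPy check, independent of the image data:

```python
import numpy as np

# Near zero, arcsinh(x) ~ x, so faint pixels are left almost unchanged...
print(np.arcsinh(0.01))  # close to 0.01

# ...while for large x it behaves like log(2 x), compressing bright pixels.
print(np.arcsinh(1000.0), np.log(2 * 1000.0))
```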

In [11]:

```
plt.subplot(1,2,1)
plt.imshow(mexp, origin='lower', cmap='cubehelix')
plt.colorbar(orientation='horizontal')
plt.title('linear')
plt.subplot(1,2,2)
plt.imshow(np.arcsinh(mexp), origin='lower', cmap='cubehelix')
plt.colorbar(orientation='horizontal')
_ = plt.title('arcsinh')
```

I use the form `_ = plt.XXX` on the last line of the cells in these notebooks just to stop the notebook from displaying the string representation of a matplotlib object, as I find it distracting.

From the left-hand plot I can see that the data ranges from 0 to about 50, but what's the exact range?

In [12]:

```
print("Stats: min = {} max = {}".format(mexp.min(), mexp.max()))
```

I want to simulate this image using Poisson statistics, which is fortunately very easy to do using `numpy.random.poisson`:

In [13]:

```
msim = np.random.poisson(mexp)
print("Stats: min = {} max = {}".format(msim.min(), msim.max()))
```

Unsurprisingly, the result isn't quite as smooth as the model! The quantized nature of the data (all pixels have integer values) does mean that it's a bit easier to see the extended emission than in the earlier plots.

In [14]:

```
plt.imshow(np.arcsinh(msim), origin='lower', cmap='cubehelix')
plt.title('Simulated image')
_ = plt.colorbar()
```

As I have coordinates for the two axes, let's try to include them in the plots (I'm also already tired of setting the `origin` and `cmap` keywords). I'm not particularly proud of these routines, as they rely on hidden state - e.g. the `extent` array - but they work well in this notebook!

In [15]:

```
x0min, x0max = x0.min(), x0.max()
x1min, x1max = x1.min(), x1.max()
extent = [x0min - dx/2, x0max + dx/2, x1min - dx/2, x1max + dx/2]

def image_data(idata, transform=np.arcsinh, **kwargs):
    """Create an image of idata, which is assumed to be an array of
    the same size as x0 and x1. The data is passed through transform,
    which defaults to np.arcsinh, if not None.
    """
    if transform is not None:
        idata = transform(idata)
    plt.imshow(idata, extent=extent, origin='lower', cmap='cubehelix', **kwargs)

def image_model(imodel, transform=np.arcsinh, **kwargs):
    """Create an image of the given model, using the x0 and x1 grids.
    The transform argument is applied to the data before display, if not None.
    The default is np.arcsinh.
    """
    idata = imodel(x0.ravel(), x1.ravel()).reshape(x0.shape)
    image_data(idata, transform=transform, **kwargs)

# because I'm going to want them later on
def vline(x):
    (yl, yh) = plt.ylim()
    plt.ylim(yl, yh)  # force the limits
    plt.vlines(x, yl, yh)

def hline(y):
    (xl, xh) = plt.xlim()
    plt.xlim(xl, xh)  # force the limits
    plt.hlines(y, xl, xh)
```

Using these, I can display the simulated data:

In [16]:

```
image_data(msim)
```

I can also compare the simulated data (top left) to the full model (bottom left) and the individual components (right column):

In [17]:

```
plt.subplot(2,2,1)
image_data(msim, vmin=0, vmax=4.5)
plt.subplot(2,2,3)
image_model(model, vmin=0, vmax=4.5)
plt.subplot(2,2,2)
image_model(cpt1, vmin=0, vmax=4.5)
plt.subplot(2,2,4)
image_model(cpt2, vmin=0, vmax=4.5)
```

To fit a model to the simulated data I need a data set (`Data2D`), a statistic (in this case the `Cash` statistic, a maximum-likelihood statistic for Poisson data, from "Parameter estimation in astronomy through application of the likelihood ratio", Cash, W. 1979, ApJ 228, 939), an optimisation method (`NelderMead`), and the `Fit` object to control it all.

In [18]:

```
from sherpa.data import Data2D
from sherpa.stats import Cash
from sherpa.optmethods import NelderMead
from sherpa.fit import Fit
```

To start off, I create the data object. It needs a name, the independent axes (`x0` and `x1`), the dependent axis (`msim`), and then I also give the shape of the image. The shape isn't required here, but I do take advantage of it later on - in cell 31 - when creating images. Although in this case the grid is rectangular, with no missing data, Sherpa will happily accept sparse, or irregularly gridded, data.

In [19]:

```
d = Data2D('sim', x0.flatten(), x1.flatten(), msim.flatten(), shape=x0.shape)
```

I am going to fit the simulated image with the same model used to create it. I access the components using the `parts` attribute of the combined model - mainly to show that it can be done, since in this case there's no real benefit over writing the more-obvious form:

```
m1 = Beta2D('m1')
m2 = Beta2D('m2')
mdl_fit = m1 + m2
```

In [20]:

```
mdl_fit = Beta2D('m1') + Beta2D('m2')
(m1, m2) = mdl_fit.parts
```

The default parameters - which are the same as the first time I called `print(model)` in this notebook, in cell 7 - are not close to the best-fit values. Even worse, the two components have the same starting values, which means that the optimiser is going to have a tough time disentangling them. There are a number of ways I could calculate starting values for these parameters, and if I were doing this on real data I'd be interested to see how sensitive the fit is to the starting position, to make sure I had found the global minimum$^\dagger$.

I am going to set the peak position to the peak of the distribution - formed by summing the data up over the rows and columns, respectively - then use a completely ad-hoc guess for the core radii and amplitudes, based on the image extent and data values. First, I create the profiles along the X and Y axes.

$^\dagger$ In early versions of this notebook I did try starting from the default values. It required several fits and tweaks to nudge the fit into a good solution. Since my aim here is not to discuss the fine details of model fitting, and also because the previous approach took a long time to converge, I've decided to "help out" the optimiser.

In [21]:

```
ysum = msim.sum(axis=0)  # collapse along the Y axis: a profile along X
xsum = msim.sum(axis=1)  # collapse along the X axis: a profile along Y
```

With these arrays, I can find the coordinate of the maximum point:

In [22]:

```
xguess = x0[0, np.argmax(ysum)]
yguess = x1[np.argmax(xsum), 0]
```

This can perhaps be seen better graphically (note that the actual center is 3512,4418, but the grid is evaluated only every 5 pixels, which is why the guesses are both divisible by 5).

In [23]:

```
plt.subplot(2, 1, 1)
plt.plot(x0[0,:], ysum)
vline(xguess)
plt.legend(['data', 'x={}'.format(xguess)])
plt.subplot(2, 1, 2)
plt.plot(x1[:,0], xsum)
vline(yguess)
_ = plt.legend(['data', 'y={}'.format(yguess)])
```

I use these values for the position, and link the second component so that it uses the same values:

In [24]:

```
m1.xpos = xguess
m1.ypos = yguess
m2.xpos = m1.xpos
m2.ypos = m1.ypos
```

I am going to define the first component to have an amplitude equal to the maximum pixel value in the data, and the second one to be one-tenth that (this is completely arbitrary).

If I had said

```
m2.ampl = m1.ampl / 10.0
```

then the amplitude of the second component would have been fixed to be one-tenth that of the first (that is, parameters can be linked by an expression, not just a value).

In [25]:

```
m1.ampl = msim.max()
m2.ampl = msim.max() / 10.0
```

The core radii are taken from the height of the image (since it's the smallest axis):

In [26]:

```
m1.r0 = (x1max - x1min) / 10.0
m2.r0 = (x1max - x1min) / 4.0
```

The resulting set of parameters is:

In [27]:

```
print(mdl_fit)
```

Note that the `m2.xpos` and `m2.ypos` parameters are marked as `linked`; that is, they are not free parameters themselves but are calculated from an expression based on other parameters (in this case the position of the first component).

The fit is started by creating a `Fit` object, giving it the data, model, statistic, and optimization method (`LevMar` is generally faster than `NelderMead`, but it isn't really suited to this type of statistic, and will often stop prematurely).

In [28]:

```
f = Fit(d, mdl_fit, Cash(), NelderMead())
```

This has not performed a fit yet! For that, I need to call the `fit` method, which can take some time:

In [29]:

```
res1 = f.fit()
print(res1)
```

I am going to repeat the fit, to check that the optimiser hasn't stopped early, and also to show how to change the format of the return value of the `fit` method:

In [30]:

```
res2 = f.fit()
print(res2.format())
```

The second fit only changed the statistic by a very small amount ($\ll 10^{-5}$), which suggests that the fit has converged (to at least a local minimum).

How can I visualize this? First, I evaluate the model using the `eval_model` method of the data set:

In [31]:

```
meval = d.eval_model(mdl_fit).reshape(d.shape)
```

I could have given the grid to `mdl_fit` directly - as I did when creating `msim` and `mexp` - but wanted to show how to use the data set. This also takes advantage of the `shape` attribute I set up when calling `Data2D` in cell 19.

I can use the routines I set up earlier to display the best-fit model:

In [32]:

```
image_data(meval)
_ = plt.title('Best fit')
```

I can also view the residuals - i.e. `data - model` - this time with no transform, as the range of values is small and centered on zero:

In [33]:

```
image_data(msim - meval, transform=None)
plt.title('Residuals: data - model')
_ = plt.colorbar()
```

Interpreting this residual image can be hard, since the variation is larger where the signal is larger; but is there any statistically-significant structure here? I would normally try things like smoothing the residuals, looking for structure, and comparing to the model and data. For now, let's just look at a heavily-smoothed version of the residuals (taking advantage of SciPy's `gaussian_filter` function).

In [34]:

```
from scipy import ndimage
```

In [35]:

```
smoothed = ndimage.gaussian_filter(msim - meval, sigma=(5, 5), order=0)
```

In [36]:

```
image_data(smoothed, transform=None)
plt.title('Smoothed residuals')
_ = plt.colorbar()
```

Note that, due to the heavy smoothing, the scale (the dynamic range of the image) is significantly smaller than in the previous residual plot. We can compare the two with the same scale, but the result isn't particularly compelling (I think it's saying there's not much structure in the residuals, which is unsurprising given that this is a simulation).

In [37]:

```
plt.subplot(1, 2, 1)
image_data(msim - meval, transform=None, vmin=-5, vmax=5)
plt.subplot(1, 2, 2)
image_data(smoothed, transform=None, vmin=-5, vmax=5)
```