I was recently working on a way to characterize the draws from a MCMC chain - in this case generated with pyBLoCXS but that's only relevant here because we are interested in Poisson-distributed data, hence the Gamma function - and came up with a simple user model for Sherpa. I thought it would be make a good example case, and decided to include an example of using the "low-level" API of Sherpa to fit data.

For this example I'm writing everything in an IPython notebook, and the easiest way to do that is to use the binary form of the standalone Sherpa release, using Anaconda Python:

```
% conda create -n sherpa_usermodel -c sherpa python=3.5 sherpa astropy matplotlib scipy ipython-notebook
% source activate sherpa_usermodel
% git clone https://github.com/DougBurke/sherpa-standalone-notebooks
% cd sherpa-standalone-notebooks
% jupyter notebook
```

The code will also run if you built the code directly from https://github.com/sherpa/sherpa or
if you use the Sherpa provided with CIAO - although
in the latter case you would use
ChIPS
rather than matplotlib for plotting and would not have access to `scipy.stats`

for
the simulation (try
`numpy.random.gamma`

instead).

I have written a follow-up to this, available at an integrated user model. This was written on May 4 2015.

I then had a brain wave and decided to talk about plotting in Sherpa when using the low-level API. This was written on May 5 2015.

I have updated the notebook for the Sherpa 4.8.2 release in September 2016; there are no changes to the notebook, but you can now use Python 3.5 as well as 2.7. Actually, there's one minor change to the notebook, but that is due to a change in the IPython/Jupyter code rather than Sherpa: it is no longer necessary to say

```
import logging
logging.getLogger('sherpa').propagate = False
```

to stop seeing double messages from Sherpa.

This was written by Douglas Burke on May 26 2015. This notebook, and others that may be of interest, can be found on GitHub at https://github.com/DougBurke/sherpa-standalone-notebooks.

The information in this document is placed into the Publc Domain. It is not an official product of the Chandra X-ray Center, and I make no guarantee that it is not without bugs or embarassing typos. Please contact me via the GitHub repository or on Twitter - at @doug_burke - if you have any questions.

When was this notebook last run?

In [1]:

```
import datetime
datetime.datetime.now().strftime("%Y-%m-%d %H:%M")
```

Out[1]:

In [2]:

```
import scipy.stats
import matplotlib.pyplot as plt
import numpy as np
```

In [3]:

```
%matplotlib inline
```

In [4]:

```
np.random.seed(1)
```

For this example, I am going to simulate values from a Gamma distribution using
`scipy.stats.gamma`

.
Following the
Wikipedia page I use the names `k`

and `theta`

for the parameters of the distribution; fortunately they map onto the `a`

and `scale`

values used
by `scipy.stats.gamma`

.

The mean of the distribution is just `k * theta`

, so `2.86`

here:

In [5]:

```
k_orig = 1.1
theta_orig = 2.6
ysim = scipy.stats.gamma.rvs(a=k_orig, scale=theta_orig, size=1000)
```

Let's have a quick look at the data:

In [6]:

```
plt.plot(ysim)
```

Out[6]:

In [7]:

```
plt.hist(ysim)
```

Out[7]:

`ysim`

):

In [8]:

```
xcdf = ysim.copy()
xcdf.sort()
ycdf = np.arange(1, xcdf.size+1) * 1.0 / xcdf.size
plt.plot(xcdf, ycdf)
plt.xlabel('$\Gamma$')
plt.ylabel('CDF')
```

Out[8]:

`igam`

routine
which calculates what we need, so I can write the following (I could
have also used `scipy.stats`

routines, but I wanted this to be usable
from Sherpa in CIAO):

In [9]:

```
import sherpa.utils
```

In [10]:

```
def calc_gamma_cdf(x, k, theta):
"""Return the CDF of a gamma distribution.
The cumulative distribution function (CDF) of the gamma
distribution is defined in [1]_ as::
cdf(x;k,theta) = incomplete_gamma(k, x/theta)
----------------------------
gamma(k)
Parameters
----------
x : array
The values at which to evaluate the CDF.
k : number
The shape parameter, which must be greater than 0.
theta : number
The scale parameter.
Returns
-------
cdf : array
The CDF evaluated at each element of x.
Notes
-----
The mean of the distribution is given by ``k * theta``,
and the rate is ``1 / theta``.
References
----------
.. [1] http://en.wikipedia.org/wiki/Gamma_distribution
"""
# Unfortunately igam does not accept a Numpy array as the
# second argument, so need to map over the input array.
#
tval = theta * 1.0
kval = k * 1.0
x = np.asarray(x)
out = np.zeros_like(x)
for i,xi in enumerate(x):
# igam is the "regularized" incomplete gamma (lower)
# form, so already has the normalization by gamma(a)
# included.
out[i] = sherpa.utils.igam(kval, xi/tval)
return out
```

I can use this to plot the actual distribution on the simulated one:

In [11]:

```
plt.plot(xcdf, ycdf)
plt.plot(xcdf, calc_gamma_cdf(xcdf, k_orig, theta_orig))
```

Out[11]:

The Quantile-Quantile plot also looks good (as it should here ;-):

In [12]:

```
plt.plot(ycdf, calc_gamma_cdf(xcdf,k_orig,theta_orig))
plt.plot([0,1],[0,1], 'k--')
plt.title('Q-Q plot')
```

Out[12]:

Using `calc_gamma_cdf`

I can now create a Sherpa model that can be used to
fit the CDF data. Rather than use the
`load_user_model`

function,
I have decided to use a class-based approach. The advantage is that this model then
behaves in the same manner as existing Sherpa models - so that you can create multiple
instances of it directly - at the expense of a little-more set-up work.

To do this, I need to import the Sherpa `sherpa.models.model`

module.
In this case I am being explicit about the module names, to make it easier to see where
symbols are defined, but normally I would say something like

`from sherpa.models import model`

In [13]:

```
import sherpa.models.model
```

With this, I can create the `GammaCDF`

class. The model has two parameters, which
I name `k`

and `theta`

, and set initial parameters and limits (the limit is inclusive,
which is not ideal here because both should be `> 0`

rather than `>= 0`

, but let's
see how this works). The important parts are `__init__`

, which sets everything
up, and `calc`

which is used to evaluate the model at a set of points. For this
example I am going to assume that if it is used with an "integrated" 1D data set - that
is, a data set where the signal is integrated between bin limits, rather than just
a value at a point - then the code should just use the mid-point of the bin. The
`modelCacher1d`

function decorator tells the system that the results of the model can
be cached when all the parameter values are frozen (e.g. during a fit).

In "production" code, I'd include checks to make sure it's not used with a 2D data set!

In [14]:

```
class GammaCDF(sherpa.models.model.ArithmeticModel):
"""A Gamma CDF.
The cumulative distribution function (CDF) for the Gamma
distribution, as defined by [1]_, is::
cdf(x;k,theta) = incomplete_gamma(k, x/theta)
----------------------------
gamma(k)
The model parameters are:
k
The shape parameter, which must be greater than 0.
theta
The scale parameter.
Notes
-----
The mean of the distribution is given by ``k * theta``,
and the rate is ``1 / theta``.
References
----------
.. [1] http://en.wikipedia.org/wiki/Gamma_distribution
"""
def __init__(self, name='gammacdf'):
# It would be nice to force > 0 rather than >=0.
# Perhaps should just use a small value, e.g.
# sherpa.models.parameter.tinyval?
self.k = sherpa.models.model.Parameter(name, 'k', 5, min=0, hard_min=0)
self.theta = sherpa.models.model.Parameter(name, 'theta', 2, min=0, hard_min=0)
sherpa.models.model.ArithmeticModel.__init__(self, name, (self.k, self.theta))
@sherpa.models.model.modelCacher1d
def calc(self, pars, x, *args, **kwargs):
(k, theta) = pars
if len(args) == 1:
x = (x + args[0]) / 2.0
return calc_gamma_cdf(x, k, theta)
```

This model can be used to fit the data. I'll show this two ways

using the low-level API, where data management is not automatic

using the high-level UI which most users of Sherpa are used to.

In [15]:

```
from sherpa.data import Data1D
from sherpa.stats import LeastSq
from sherpa.optmethods import LevMar
from sherpa.fit import Fit
```

`cdf`

.

In [16]:

```
d = Data1D('cdf', xcdf, ycdf)
```

`GammaCDF`

model, so I need an instance (the name need not be the same as the variable name, but it makes tracking
things a bit easier!):

In [17]:

```
cdf = GammaCDF('cdf')
```

`Fit`

object, selecting the statistic - I use the least-squared
version (`LeastSq`

) since I have no errors - and the optimiser, for which I use the
Levenberg-Marquardt method (as provided by the `LevMar`

class):

In [18]:

```
f = Fit(d, cdf, LeastSq(), LevMar())
```

The fit can then be "run" and the output stored away. For this case the optimiser converged.

In [19]:

```
res = f.fit()
res.succeeded
```

Out[19]:

`fit`

method returns a lot if useful information, including the best-fit parameter values:

In [20]:

```
print(res)
```

However, we can also access these values directly from the model:

In [21]:

```
print(cdf)
```

and compare them to the input values:

In [22]:

```
print(cdf.k.val, cdf.theta.val)
print(k_orig, theta_orig)
```

In [23]:

```
plt.plot(xcdf, ycdf - cdf(xcdf))
plt.axhline()
plt.ylabel('$\Delta$ CDF')
plt.xlabel('$\Gamma$')
```

Out[23]:

The QQ plot looks good, as might be expected from the residuals:

In [24]:

```
plt.plot(ycdf, cdf(xcdf))
plt.plot([0,1], [0,1], 'k--')
plt.xlabel('Measured')
plt.ylabel('Predicted')
plt.title('QQ plot')
```

Out[24]:

`sherpa.ui`

module,
but the "Astronomy-specific" module `sherpa.astro.ui`

can also be used.

In [25]:

```
import sherpa.ui
```

The model needs to be added to Sherpa using `add_model`

:

In [26]:

```
sherpa.ui.add_model(GammaCDF)
```

Let's select the least-square statistic, by name:

In [27]:

```
sherpa.ui.set_stat('leastsq')
```

Rather than create a `Data1D`

object, here I use the default data set - which is
labelled `1`

:

In [28]:

```
sherpa.ui.load_arrays(1, xcdf, ycdf)
```

`leastsq`

,
so I change this behavior before displaying the data (although, even with this setting, the code still
complains to you, which is something I need to send a bug report about):

In [29]:

```
pprefs = sherpa.ui.get_data_plot_prefs()
pprefs['yerrorbars'] = False
sherpa.ui.plot_data()
```

`set_source`

. Since `add_model`

was used,
I can create an instance of the `GammaCDF`

model using the syntax `modelname.cptname`

, where
`modelname`

is in lower case. To allow comparison with the eariler fit, I name the component
`cpt2`

:

In [30]:

```
sherpa.ui.set_source(gammacdf.cdf2)
```

In [31]:

```
print(cdf2)
```

The defailt optimiser is `LevMar`

, so I do not need to set it - which would have been the following
call

```
set_method('levmar')
```

Instead I can just go straight to fitting the data:

In [32]:

```
sherpa.ui.fit()
```

and view the results:

In [33]:

```
sherpa.ui.plot_fit()
```

Unfortunately it's not that obvious what is going on here, so let's look at the residual plot:

In [34]:

```
sherpa.ui.plot_resid()
```

In [35]:

```
resid = sherpa.ui.get_resid_plot()
```

and this can be used to create the residual plot, which should look familiar:

In [36]:

```
plt.plot(resid.x, resid.y)
plt.axhline()
plt.ylabel('$\Delta$ (CDF)')
plt.xlabel('$\Gamma$')
```

Out[36]:

In [37]:

```
print(cdf)
print(cdf2)
```

`add_model`

command, and direct access to the low-level API provided by Sherpa.