This notebook shows how the most common use cases for IACT analysis with Gammapy would be done with the proposed high-level interface for PIG 12.
The basic design is an Analysis class which is the "driver" or "manager" for one analysis, with a layer of "Tools" underneath that do the actual work.
TODO: design how these interact; it is not yet clear who drives what.
Very important: what is the responsibility of Analysis vs Observations vs Datasets vs Fit in the different cases?
Example config file. We will develop a schema for it, validate on read, and give good error messages (see the sketch after the config example).
analysis:
  process:
    # add options to allow either in-memory or disk-based processing
    out_folder: "."  # default is the current working directory
    store_per_obs: {true, false}
  reduce:
    type: {"1D", "3D"}
    stacked: {true, false}
    background: {"irf", "reflected", "ring"}
    roi: max_offset
    exclusion: exclusion.fits
  fit:
    energy_min, energy_max
  logging:
    level: debug
dataspace:
  spatial: center, width, binsz
  energy: min, max, nbins
  time: min, max
  # PSF RAD and EDISP MIGRA not exposed for now
  # Per-obs energy_safe and roi_max?
model:
  # Model configuration will mostly be designed in a different PIG
  sources:
    source_1: {spectrum: powerlaw, spatial: shell}
    diffuse: gal_diffuse.fits
  background:
    IRF
  # If Gammapy has configurable maker classes, there could be a generic
  # way to overwrite their parameters from the config file.
  # Make every "maker" class in Gammapy a sub-class of a "Tool" or "Maker"
  # base class that does config handling in a uniform way?
  reflected_background_maker:
    n_regions: 99
meta:
  # User comments, not used by the framework
  target_name:
  description:
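To make the "validate and give good error messages" point concrete, here is a minimal sketch of config reading, assuming PyYAML and jsonschema; the schema fragment and the read_config helper are made up for illustration, not a settled API:

import jsonschema
import yaml

# Made-up schema fragment covering a small part of the example config
CONFIG_SCHEMA = {
    "type": "object",
    "properties": {
        "analysis": {
            "type": "object",
            "properties": {
                "reduce": {
                    "type": "object",
                    "properties": {
                        "type": {"enum": ["1D", "3D"]},
                        "stacked": {"type": "boolean"},
                        "background": {"enum": ["irf", "reflected", "ring"]},
                    },
                },
            },
        },
    },
}

def read_config(filename="gammapy_analysis_config.yaml"):
    """Read the analysis config and fail early with a readable message."""
    with open(filename) as f:
        config = yaml.safe_load(f)
    try:
        jsonschema.validate(config, CONFIG_SCHEMA)
    except jsonschema.ValidationError as err:
        raise ValueError(f"Invalid config in {filename!r}: {err.message}") from err
    return config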
Typically you would start like this:
$ mkdir analysis99
$ cd analysis99
$ edit gammapy_analysis_config.yaml
Then you would type ipython or jupyter and put the code below.
If we change observation selection to also be config-file driven, we could add gammapy analysis data_reduction or gammapy analysis optimise commands that do the slow parts before going to ipython or jupyter, using all the information from the config file.
For example, gammapy analysis data_reduction would do this:
from gammapy import Analysis
analysis = Analysis() # uses the default config filename, or could look for it or take it as an argument for the CLI
analysis.reduce_data()
analysis.write()
and gammapy analysis optimise
would do this:
from gammapy import Analysis
analysis = Analysis.read()  # recover the state after data reduction
# Assumes we have serialisation and e.g. "analysis_state.yaml"
analysis.optimise()
analysis.write()
To generate the config file, we could add a gammapy analysis config command
which dumps the config file with all lines commented out, so the user
can then fill in the numbers they care about. Alternative: users
copy & paste from a config file example in the docs.
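A possible sketch of that dump, assuming the defaults are available as a Python dict; dump_commented_config and default_config are hypothetical names:

import yaml

def dump_commented_config(default_config, filename="gammapy_analysis_config.yaml"):
    """Write the default config with every line commented out."""
    text = yaml.safe_dump(default_config, sort_keys=False)
    commented = "\n".join("# " + line for line in text.splitlines())
    with open(filename, "w") as f:
        f.write(commented + "\n")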
For now, we don't put observation selection in the config file. It's something the user does beforehand; keep as-is for now.
There are some advantages to doing observation selection automatically, but it's hard to make it flexible enough to satisfy all the ways users might want to select observations (e.g. "near the Crab nebula", "from 2017", "only good quality", by zenith angle, ...).
Some observation selection / discarding will happen automatically, e.g. we will not process runs that are 100 deg away from the target position.
from gammapy.data import DataStore
data_store = DataStore.from_dir("$GAMMAPY_DATA/cta-1dc/index/gps/")
obs_ids = [110380, 111140, 111159]
observations = data_store.get_observations(obs_ids)
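As an illustration of what such hand-rolled selection looks like, runs near a target can be picked from the observation index table (RA_PNT / DEC_PNT are the standard pointing columns; the target position and the 2 deg cut are made up for this example):

import astropy.units as u
from astropy.coordinates import SkyCoord
from gammapy.data import DataStore

data_store = DataStore.from_dir("$GAMMAPY_DATA/cta-1dc/index/gps/")
table = data_store.obs_table
target = SkyCoord(83.633, 22.014, unit="deg")  # Crab nebula position
pointing = SkyCoord(table["RA_PNT"], table["DEC_PNT"], unit="deg")
mask = target.separation(pointing) < 2 * u.deg
observations = data_store.get_observations(table["OBS_ID"][mask].tolist())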
The basic idea is to have one high-level Analysis class for the end user.
from gammapy import Analysis
analysis = Analysis(config, observations)
# analysis.observations: gammapy.data.Observations
analysis.reduce_data() # often slow, can be hours
# analysis.datasets: gammapy.datasets.Datasets
# If the user wants they can save all results from data reduction
# and re-start later. This stores config, datasets, ... all the
# analysis class state.
# analysis.write()
# analysis = Analysis.read()
analysis.optimise() # often slow, can be hours
# Again, we could write and read, do the slow things only once.
# e.g. supervisor comes in and asks about significance of some
# model component or whatever
# Everything is accessible via the "analysis"
# It's like the Analysis in Fermipy or Sherpa "session" or "HESSArray" in HAP
# a global object that gives you access to everything.
# Method calls modify data members (e.g. models), but in between method calls
# advanced users can do a lot of custom processing.
# Many advanced use cases can be done with the Analysis API.
profile = analysis.model("source_42").spectrum.plot()
# Do we need energy_binning for the SED points in config or only here?
sed = analysis.spectral_points("source_42", energy_binning)
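If the binning is passed here rather than set in the config, it could be built like this (the bin edges are purely illustrative):

import astropy.units as u
import numpy as np

# 10 logarithmic bins between 100 GeV and 100 TeV (made-up values)
energy_binning = np.geomspace(0.1, 100, 11) * u.TeV
sed = analysis.spectral_points("source_42", energy_binning)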
One example of what analysis.reduce_data could do:
class Analysis:
    def reduce_data(self):
        maker = DataReduction.from_config(self.config)
        maker.run()
        self.datasets = maker.datasets
A different way would be like this:
class Analysis:
    def reduce_data(self):
        config = self.make_data_reduction_config(self.config)
        maker = self.make_data_reduction_class(config)
        maker.run()
        self.datasets = maker.datasets
Either way, we will need "registries" mapping options to classes, e.g.:
DATA_REDUCERS = {
    "1d": "gammapy.spectrum.SpectrumExtraction",
    "3d": "gammapy.cube.MapMaker",
}
By having registries (i.e. Python dicts) that both Gammapy itself and users can add entries to, we become a "framework" that is extensible even at runtime.
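A sketch of how such a registry could be resolved and extended at runtime; get_data_reducer is a hypothetical helper, not an existing Gammapy function:

import importlib

def get_data_reducer(key):
    """Resolve a registry entry like "3d" to the corresponding class."""
    module_name, class_name = DATA_REDUCERS[key].rsplit(".", 1)
    return getattr(importlib.import_module(module_name), class_name)

# Users can extend the framework at runtime with their own reducer:
DATA_REDUCERS["my_case"] = "my_package.MyReducer"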
For registries to work, we need a limited number of data containers (e.g. Dataset subclasses), because "makers" require a certain structure of containers, and either modify them in-place or make new containers.
As an example, let's see what the MapMaker would look like.
TODO: how do we compose lower-level Tools into Analysis chains? Who creates the Tool objects when?
Analysis needs to know how to turn config entries into Python objects: e.g. for geom, but also for exclusion_mask: exclusion.fits it has to make a Map object.
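For instance, the exclusion entry from the example config could be turned into a Map like this (make_exclusion_mask is a made-up helper; Map.read is the existing Gammapy API):

from gammapy.maps import Map

def make_exclusion_mask(config):
    """Build the exclusion Map from the config entry exclusion: exclusion.fits."""
    filename = config["analysis"]["reduce"]["exclusion"]
    return Map.read(filename)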
# There's some base class, similar to what ctapipe has
# Base class: uniform scheme (ABC?), parameter handling / validation, logging, provenance
from gammapy import Tool
from gammapy.datasets import Datasets

class MapMaker(Tool):
    config_spec = {
        "offset_max": "2 deg",
        "geom": "???",  # is this part of the parameters, or passed to run?
    }

    def __init__(self, config):
        self.config = config

    def run(self, observations):
        datasets = Datasets()
        for observation in observations:
            dataset = self.process(observation)
            datasets.append(dataset)
        return datasets
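Usage could then look like this, assuming the observations from the DataStore example above:

maker = MapMaker(config={"offset_max": "2 deg"})
datasets = maker.run(observations)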
The read and write would work like this, for example:
class Analysis:
    def write(self):
        out_path = self.out_path
        self.config.write(out_path)
        if self.datasets is not None:
            self.datasets.write(out_path)
        if self.the_other_thing is not None:
            self.the_other_thing.write(out_path)
        self.analysis_state.write("analysis_state.yaml")

    def read(self, out_path="."):
        state = AnalysisState.read("analysis_state.yaml")
        if state.has_config():
            self.config = state.read_config()
        if state.has_datasets():
            self.datasets = state.read_datasets()
        if state.has_the_other_thing():
            self.the_other_thing = state.read_the_other_thing()
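The AnalysisState used above does not exist yet; a minimal sketch of what it could be (all names hypothetical), assuming a YAML file that records what was written:

import yaml

class AnalysisState:
    """Track which parts of an Analysis have been written to disk."""

    def __init__(self, data):
        self.data = data

    @classmethod
    def read(cls, filename):
        with open(filename) as f:
            return cls(yaml.safe_load(f))

    def has_config(self):
        return "config" in self.data

    def has_datasets(self):
        return "datasets" in self.data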
A lot of this relies on Dataset, Datasets and Model; serialisation would always call down to the parts of composite objects.
Serialisation will be a mix of YAML (for models and "index" files) and FITS (for maps etc.).
In v1.0, we will not have a framework supporting different serialisation backends; we will have one way.
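For datasets, that one way could be a YAML index file pointing at one FITS file per dataset; a sketch under that assumption (write_datasets and the filenames are made up):

import yaml

def write_datasets(datasets, out_path="."):
    """Write a YAML index plus one FITS file per dataset."""
    index = []
    for i, dataset in enumerate(datasets):
        filename = f"dataset_{i}.fits"
        dataset.write(f"{out_path}/{filename}")  # maps etc. go to FITS
        index.append({"filename": filename})
    with open(f"{out_path}/datasets.yaml", "w") as f:
        yaml.safe_dump(index, f)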