This notebook shows how the most common use cases for IACT analysis with Gammapy would be done with the proposed high-level interface for PIG 12.
The basic design is an Analysis class which is the "driver" or "manager" for one analysis, with a layer of "Tools" underneath that do the actual work.
TODO: design how these interact; it is not yet clear who drives what.
Very important: what is the responsibility of Analysis vs Observations vs Datasets vs Fit in the different cases?
Example config file. We will develop a schema for it, validate on read, and give good error messages (see the sketch after the config example).
analysis:
  process:
    # add options to allow either in-memory or disk-based processing
    out_folder: "."  # default is the current working directory
    store_per_obs: {true, false}
  reduce:
    type: {"1D", "3D"}
    stacked: {true, false}
    background: {"irf", "reflected", "ring"}
    roi: max_offset
    exclusion: exclusion.fits
  fit:
    energy_min, energy_max
  logging:
    level: debug
dataspace:
  spatial: center, width, binsz
  energy: min, max, nbins
  time: min, max
  # PSF RAD and EDISP MIGRA not exposed for now
  # Per-obs energy_safe and roi_max?
model:
  # Model configuration will mostly be designed in a different PIG
  sources:
    source_1: {spectrum: powerlaw, spatial: shell}
    diffuse: gal_diffuse.fits
  background:
    IRF
  # If Gammapy has configurable maker classes, there could be a generic
  # way to overwrite their parameters from the config file.
  # Make every "maker" class in Gammapy a sub-class of a "Tool" or "Maker"
  # base class that does config handling in a uniform way?
  reflected_background_maker:
    n_regions: 99
meta:
  # User comments, not used by the framework
  target_name:
  description:
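To make the "validate and give good error messages" point concrete, here is a minimal sketch of config reading, assuming PyYAML and jsonschema; the schema fragment and the read_config helper are made up for illustration, not a settled API:

import jsonschema
import yaml

# Made-up schema fragment covering a small part of the example config
CONFIG_SCHEMA = {
    "type": "object",
    "properties": {
        "analysis": {
            "type": "object",
            "properties": {
                "reduce": {
                    "type": "object",
                    "properties": {
                        "type": {"enum": ["1D", "3D"]},
                        "stacked": {"type": "boolean"},
                        "background": {"enum": ["irf", "reflected", "ring"]},
                    },
                },
            },
        },
    },
}

def read_config(filename="gammapy_analysis_config.yaml"):
    """Read the analysis config and fail early with a readable message."""
    with open(filename) as f:
        config = yaml.safe_load(f)
    try:
        jsonschema.validate(config, CONFIG_SCHEMA)
    except jsonschema.ValidationError as err:
        raise ValueError(f"Invalid config in {filename!r}: {err.message}") from err
    return config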
Typically you would start like this:
$ mkdir analysis99
$ cd analysis99
$ edit gammapy_analysis_config.yaml
Then you would type ipython or jupyter and put the code below.
If we change observation selection to also be config-file driven, we could add gammapy analysis data_reduction or gammapy analysis optimise commands that do the slow parts before going to ipython or jupyter, using all the information from the config file.
For example, gammapy analysis data_reduction would do this:
from gammapy import Analysis
analysis = Analysis() # uses the default config filename, or could look for it or take it as an argument for the CLI
analysis.reduce_data()
analysis.write()
and gammapy analysis optimise
would do this:
from gammapy import Analysis
analysis = Analysis.read()  # recover the state after data reduction
# Assumes we have serialisation and e.g. "analysis_state.yaml"
analysis.optimise()
analysis.write()
To generate the config file, we could add a gammapy analysis config command
which dumps the config file with all lines commented out, so the user
can then fill in the numbers they care about. Alternative: users
copy & paste from a config file example in the docs.
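A possible sketch of that dump, assuming the defaults are available as a Python dict; dump_commented_config and default_config are hypothetical names:

import yaml

def dump_commented_config(default_config, filename="gammapy_analysis_config.yaml"):
    """Write the default config with every line commented out."""
    text = yaml.safe_dump(default_config, sort_keys=False)
    commented = "\n".join("# " + line for line in text.splitlines())
    with open(filename, "w") as f:
        f.write(commented + "\n")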
For now, we don't put observation selection in the config file. It's something the user does beforehand; keep as-is for now.
There are some advantages to doing observation selection automatically, but it's hard to make it flexible enough to satisfy all the ways users might want to select observations (e.g. "near the Crab nebula", "from 2017", "only good quality", by zenith angle, ...).
Some observation selection / discarding will happen automatically, e.g. we will not process runs that are 100 deg away from the target position.
from gammapy.data import DataStore
data_store = DataStore.from_dir("$GAMMAPY_DATA/cta-1dc/index/gps/")
obs_ids = [110380, 111140, 111159]
observations = data_store.get_observations(obs_ids)
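As an illustration of what such hand-rolled selection looks like, runs near a target can be picked from the observation index table (RA_PNT / DEC_PNT are the standard pointing columns; the target position and the 2 deg cut are made up for this example):

import astropy.units as u
from astropy.coordinates import SkyCoord
from gammapy.data import DataStore

data_store = DataStore.from_dir("$GAMMAPY_DATA/cta-1dc/index/gps/")
table = data_store.obs_table
target = SkyCoord(83.633, 22.014, unit="deg")  # Crab nebula position
pointing = SkyCoord(table["RA_PNT"], table["DEC_PNT"], unit="deg")
mask = target.separation(pointing) < 2 * u.deg
observations = data_store.get_observations(table["OBS_ID"][mask].tolist())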
The basic idea is to have one high-level Analysis class for the end user.
from gammapy import Analysis
analysis = Analysis(config, observations)
# analysis.observations: gammapy.data.Observations
analysis.reduce_data() # often slow, can be hours
# analysis.datasets: gammapy.datasets.Datasets
# If the user wants they can save all results from data reduction
# and re-start later. This stores config, datasets, ... all the
# analysis class state.
# analysis.write()
# analysis = Analysis.read()
analysis.optimise() # often slow, can be hours
# Again, we could write and read, do the slow things only once.
# e.g. supervisor comes in and asks about significance of some
# model component or whatever
# Everything is accessible via the "analysis"
# It's like the Analysis in Fermipy or Sherpa "session" or "HESSArray" in HAP
# a global object that gives you access to everything.
# Method calls modify data members (e.g. models), but in between method calls
# advanced users can do a lot of custom processing.
# Many advanced use cases can be done with the Analysis API.
profile = analysis.model("source_42").spectrum.plot()
# Do we need energy_binning for the SED points in config or only here?
sed = analysis.spectral_points("source_42", energy_binning)
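If the binning is passed here rather than set in the config, it could be built like this (the bin edges are purely illustrative):

import astropy.units as u
import numpy as np

# 10 logarithmic bins between 100 GeV and 100 TeV (made-up values)
energy_binning = np.geomspace(0.1, 100, 11) * u.TeV
sed = analysis.spectral_points("source_42", energy_binning)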
One example of what analysis.reduce_data could do:
class Analysis:
    def reduce_data(self):
        maker = DataReduction.from_config(self.config)
        maker.run()
        self.datasets = maker.datasets
A different way would be like this:
class Analysis:
    def reduce_data(self):
        config = self.make_data_reduction_config(self.config)
        maker = self.make_data_reduction_class(config)
        maker.run()
        self.datasets = maker.datasets
Either way, we will need "registries" mapping options to classes, e.g.:
DATA_REDUCERS = {
    "1d": "gammapy.spectrum.SpectrumExtraction",
    "3d": "gammapy.cube.MapMaker",
}
By having registries (i.e. Python dicts) that both Gammapy itself and users can add entries to, we become a "framework" that is extensible even at runtime.
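A sketch of how such a registry could be resolved and extended at runtime; get_data_reducer is a hypothetical helper, not an existing Gammapy function:

import importlib

def get_data_reducer(key):
    """Resolve a registry entry like "3d" to the corresponding class."""
    module_name, class_name = DATA_REDUCERS[key].rsplit(".", 1)
    return getattr(importlib.import_module(module_name), class_name)

# Users can extend the framework at runtime with their own reducer:
DATA_REDUCERS["my_case"] = "my_package.MyReducer"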
For registries to work, we need a limited number of data containers (e.g. Dataset subclasses), because "makers" require a certain structure of containers, and either modify them in-place or make new containers.
As an example, let's see what the MapMaker would look like.
TODO: how do we compose lower-level Tools into Analysis chains? Who creates the Tool objects when?
Analysis needs to know how to turn config entries into Python objects: e.g. for geom, but also for exclusion_mask: exclusion.fits it has to make a Map object.
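For instance, the exclusion entry from the example config could be turned into a Map like this (make_exclusion_mask is a made-up helper; Map.read is the existing Gammapy API):

from gammapy.maps import Map

def make_exclusion_mask(config):
    """Build the exclusion Map from the config entry exclusion: exclusion.fits."""
    filename = config["analysis"]["reduce"]["exclusion"]
    return Map.read(filename)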
# There's some base class, similar to what ctapipe has
# Base class: uniform scheme (ABC?), parameter handling / validation, logging, provenance
from gammapy import Tool
from gammapy.datasets import Datasets

class MapMaker(Tool):
    config_spec = {
        "offset_max": "2 deg",
        "geom": "???",  # is this part of the parameters, or passed to run?
    }

    def __init__(self, config):
        self.config = config

    def run(self, observations):
        datasets = Datasets()
        for observation in observations:
            dataset = self.process(observation)
            datasets.append(dataset)
        return datasets
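Usage could then look like this, assuming the observations from the DataStore example above:

maker = MapMaker(config={"offset_max": "2 deg"})
datasets = maker.run(observations)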
The read and write would work like this, for example:
class Analysis:
    def write(self):
        out_path = self.out_path
        self.config.write(out_path)
        if self.datasets is not None:
            self.datasets.write(out_path)
        if self.the_other_thing is not None:
            self.the_other_thing.write(out_path)
        self.analysis_state.write("analysis_state.yaml")

    def read(self, out_path="."):
        state = AnalysisState.read("analysis_state.yaml")
        if state.has_config():
            self.config = state.read_config()
        if state.has_datasets():
            self.datasets = state.read_datasets()
        if state.has_the_other_thing():
            self.the_other_thing = state.read_the_other_thing()
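The AnalysisState used above does not exist yet; a minimal sketch of what it could be (all names hypothetical), assuming a YAML file that records what was written:

import yaml

class AnalysisState:
    """Track which parts of an Analysis have been written to disk."""

    def __init__(self, data):
        self.data = data

    @classmethod
    def read(cls, filename):
        with open(filename) as f:
            return cls(yaml.safe_load(f))

    def has_config(self):
        return "config" in self.data

    def has_datasets(self):
        return "datasets" in self.data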
A lot of this relies on Dataset, Datasets and Model; serialisation would always call down to the parts of composite objects.
Serialisation will be a mix of YAML (for models and "index" files) and FITS (for maps etc.).
In v1.0, we will not have a framework supporting different serialisation backends; we will have one way.
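For datasets, that one way could be a YAML index file pointing at one FITS file per dataset; a sketch under that assumption (write_datasets and the filenames are made up):

import yaml

def write_datasets(datasets, out_path="."):
    """Write a YAML index plus one FITS file per dataset."""
    index = []
    for i, dataset in enumerate(datasets):
        filename = f"dataset_{i}.fits"
        dataset.write(f"{out_path}/{filename}")  # maps etc. go to FITS
        index.append({"filename": filename})
    with open(f"{out_path}/datasets.yaml", "w") as f:
        yaml.safe_dump(index, f)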