This notebook uses ATLAS Open Data http://opendata.atlas.cern to show you the steps to apply Machine Learning in the search for the Higgs boson!
ATLAS Open Data provides open access to proton-proton collision data at the LHC for educational purposes. ATLAS Open Data resources are ideal for high-school, undergraduate and postgraduate students.
Notebooks are web applications that allow you to create and share documents that can contain, for example:
1. live code
2. visualisations
3. narrative text
This notebook builds on HZZAnalysis.ipynb in the same folder as this notebook.
HZZAnalysis.ipynb loosely follows the discovery of the Higgs boson by ATLAS (mostly Sections 4 and 4.1).
Notebooks are a perfect platform to develop Machine Learning for your work, since you'll need exactly those 3 things: code, visualisations and narrative text!
We're interested in Machine Learning because we can design an algorithm to figure out for itself how to do various analyses, potentially saving us countless human-hours of design and analysis work.
Machine Learning use within ATLAS includes:
This notebook will focus on signal/background classification.
By the end of this notebook you will be able to:
Feynman diagram pictures are borrowed from our friends at https://www.particlezoo.net
Contents:
Running a Jupyter notebook
First time setup on your computer (no need on mybinder)
To set up every time
Lumi, fraction, file path
Samples
Changing a cut
Applying a cut
Optimisation
Boosted Decision Tree (BDT)
Training and Testing split
Training Decision Trees
Assessing a Classifier's Performance
Receiver Operating Characteristic (ROC) curve
Overtraining check
Optimisation
Going further
To run the whole Jupyter notebook, in the top menu click Cell -> Run All.
To propagate a change you've made to a piece of code, click Cell -> Run All Below.
You can also run a single code cell, by clicking Cell -> Run Cells, or using the keyboard shortcut Shift+Enter.
This first cell only needs to be run the first time you open this notebook on your computer.
If you close Jupyter and re-open on the same computer, you won't need to run this first cell again.
If you open on mybinder, you don't need to run this cell.
import sys
!{sys.executable} -m pip install --upgrade --user pip # update the pip package installer
!{sys.executable} -m pip install uproot3 pandas numpy matplotlib scikit-learn --user # install required packages
Cell -> Run All Below needs to be done every time you re-open this notebook.
We're going to be using a number of tools to help us:
import uproot3 # for reading .root files
import pandas as pd # to store data as dataframe
import time # to measure time to analyse
import math # for mathematical functions such as square root
import numpy as np # for numerical calculations such as histogramming
import matplotlib.pyplot as plt # for plotting
from matplotlib.ticker import AutoMinorLocator # for minor ticks
import infofile # local file containing info on cross-sections, sums of weights, dataset IDs
General definitions of the fraction of data used and where to access the input files.
lumi = 10 # fb-1 # data_A+B+C+D
fraction = 0.03 # reduce this if you want the code to run quicker
#tuple_path = "Input/4lep/" # local
tuple_path = "https://atlas-opendata.web.cern.ch/atlas-opendata/samples/2020/4lep/" # web address
In this notebook we only process the signal H->ZZ and the main background ZZ, for illustration purposes. You can add the data and the Z and ttbar backgrounds afterwards if you wish.
samples = {
'ZZ' : {
'list' : ['llll']
},
r'$H \rightarrow ZZ \rightarrow \ell\ell\ell\ell$' : { # H -> ZZ -> llll
'list' : ['ggH125_ZZ4lep'] # gluon-gluon fusion
}
}
Define function to get data from files.
The datasets used in this notebook have already been filtered to include at least 4 leptons per event, so that processing is quicker.
def get_data_from_files():
data = {} # define empty dictionary to hold dataframes
for s in samples: # loop over samples
print('Processing '+s+' samples') # print which sample
frames = [] # define empty list to hold data
for val in samples[s]['list']: # loop over each file
if s == 'data': prefix = "Data/" # Data prefix
else: # MC prefix
prefix = "MC/mc_"+str(infofile.infos[val]["DSID"])+"."
fileString = tuple_path+prefix+val+".4lep.root" # file name to open
temp = read_file(fileString,val) # call the function read_file defined below
frames.append(temp) # append dataframe returned from read_file to list of dataframes
data[s] = pd.concat(frames) # dictionary entry is concatenated dataframes
return data # return dictionary of dataframes
Define function to get the cross-section weight.
def get_xsec_weight(sample):
info = infofile.infos[sample] # open infofile
xsec_weight = (lumi*1000*info["xsec"])/(info["sumw"]*info["red_eff"]) #*1000 to go from fb-1 to pb-1
return xsec_weight # return cross-section weight
Define function to calculate the weight of a MC event.
def calc_weight(xsec_weight, mcWeight, scaleFactor_PILEUP,
scaleFactor_ELE, scaleFactor_MUON,
scaleFactor_LepTRIGGER ):
return xsec_weight*mcWeight*scaleFactor_PILEUP*scaleFactor_ELE*scaleFactor_MUON*scaleFactor_LepTRIGGER
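As a quick illustrative cross-check, you can evaluate these weight functions directly. The scale factors of 1.0 below are made up for illustration, not taken from any file:
example_xsec_weight = get_xsec_weight('ggH125_ZZ4lep') # cross-section weight for the signal sample
print(example_xsec_weight) # print the cross-section weight
print(calc_weight(example_xsec_weight, 1.0, 1.0, 1.0, 1.0, 1.0)) # with all weights and scale factors set to 1, this just returns the cross-section weight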
We define a function to return an individual lepton's transverse momentum, in GeV.
def calc_lep_pt_i(lep_pt,i):
return lep_pt[i]/1000 # /1000 to go from MeV to GeV
We apply 'cuts' to throw away collisions that have properties different to the signal we're looking for.
If you change a cut: Cell -> Run All Below
If you change a cut here, you also need to make sure the cut is applied in the "Applying a cut" cell.
# cut on lepton charge
# paper: "selecting two pairs of isolated leptons, each of which is comprised of two leptons with the same flavour and opposite charge"
def cut_lep_charge(lep_charge):
# throw away when sum of lepton charges is not equal to 0
# first lepton is [0], 2nd lepton is [1] etc
return lep_charge[0] + lep_charge[1] + lep_charge[2] + lep_charge[3] != 0
# cut on lepton type
# paper: "selecting two pairs of isolated leptons, each of which is comprised of two leptons with the same flavour and opposite charge"
def cut_lep_type(lep_type):
# for an electron lep_type is 11
# for a muon lep_type is 13
# throw away when none of eeee, mumumumu, eemumu
sum_lep_type = lep_type[0] + lep_type[1] + lep_type[2] + lep_type[3]
return (sum_lep_type != 44) and (sum_lep_type != 48) and (sum_lep_type != 52)
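To see how these cut functions behave, here's a quick sanity check on some made-up lepton charges and types (illustrative values, not real events):
print(cut_lep_charge([1,-1,1,-1])) # False: charges sum to 0, so the event is kept
print(cut_lep_charge([1,1,1,-1])) # True: charges sum to 2, so the event is thrown away
print(cut_lep_type([11,11,13,13])) # False: eemumu (sum 48), so the event is kept
print(cut_lep_type([11,11,11,13])) # True: eeemu (sum 46), so the event is thrown away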
If you add a cut: Cell -> Run All Below
def read_file(path,sample):
start = time.time() # start the clock
print("\tProcessing: "+sample) # print which sample is being processed
data_all = pd.DataFrame() # define empty pandas DataFrame to hold all data for this sample
tree = uproot3.open(path)["mini"] # open the tree called mini
numevents = uproot3.numentries(path, "mini") # number of events
if 'data' not in sample: xsec_weight = get_xsec_weight(sample) # get cross-section weight
for data in tree.iterate(['lep_charge','lep_type','lep_pt',
# uncomment these variables if you want to calculate masses
#,'lep_eta','lep_phi','lep_E',
# add more variables here if you make cuts on them
'mcWeight','scaleFactor_PILEUP',
'scaleFactor_ELE','scaleFactor_MUON',
'scaleFactor_LepTRIGGER'
], # variables to calculate Monte Carlo weight
outputtype=pd.DataFrame, # choose output type as pandas DataFrame
entrystop=numevents*fraction): # process up to numevents*fraction
nIn = len(data.index) # number of events in this batch
if 'data' not in sample: # only do this for Monte Carlo simulation files
# multiply all Monte Carlo weights and scale factors together to give total weight
data['totalWeight'] = np.vectorize(calc_weight)(xsec_weight,
data.mcWeight,
data.scaleFactor_PILEUP,
data.scaleFactor_ELE,
data.scaleFactor_MUON,
data.scaleFactor_LepTRIGGER)
# cut on lepton charge using the function cut_lep_charge defined above
fail = data[ np.vectorize(cut_lep_charge)(data.lep_charge) ].index
data.drop(fail, inplace=True)
# cut on lepton type using the function cut_lep_type defined above
fail = data[ np.vectorize(cut_lep_type)(data.lep_type) ].index
data.drop(fail, inplace=True)
# return the individual lepton transverse momenta in GeV (indices start at 0, so lep_pt_1 and lep_pt_2 here are the 2nd and 3rd leptons)
data['lep_pt_1'] = np.vectorize(calc_lep_pt_i)(data.lep_pt,1)
data['lep_pt_2'] = np.vectorize(calc_lep_pt_i)(data.lep_pt,2)
# dataframe contents can be printed at any stage like this
#print(data)
# dataframe column can be printed at any stage like this
#print(data['lep_pt'])
# multiple dataframe columns can be printed at any stage like this
#print(data[['lep_pt','lep_eta']])
nOut = len(data.index) # number of events passing cuts in this batch
data_all = pd.concat([data_all, data]) # append dataframe from this batch to the dataframe for the whole sample
elapsed = time.time() - start # time taken to process
print("\t\t nIn: "+str(nIn)+",\t nOut: \t"+str(nOut)+"\t in "+str(round(elapsed,1))+"s") # events before and after
return data_all # return dataframe containing events passing all cuts
This is where the processing happens (this may take a few minutes).
start = time.time() # time at start of whole processing
data = get_data_from_files() # process all files
elapsed = time.time() - start # time after whole processing
print("Time taken: "+str(round(elapsed,1))+"s") # print total time taken to process every file
Processing ZZ samples
    Processing: llll
        nIn: 16628,  nOut: 15740  in 105.9s
Processing $H \rightarrow ZZ \rightarrow \ell\ell\ell\ell$ samples
    Processing: ggH125_ZZ4lep
        nIn: 4941,  nOut: 4841  in 250.9s
Time taken: 356.7s
Here we define histograms for the variables that we'll look to optimise
lep_pt_2 = { # dictionary containing plotting parameters for the lep_pt_2 histogram
# change plotting parameters
'bin_width':1, # width of each histogram bin
'num_bins':13, # number of histogram bins
'xrange_min':7, # minimum on x-axis
'xlabel':r'$lep\_pt$[2] [GeV]', # x-axis label
}
lep_pt_1 = { # dictionary containing plotting parameters for the lep_pt_1 histogram
# change plotting parameters
'bin_width':1, # width of each histogram bin
'num_bins':28, # number of histogram bins
'xrange_min':7, # minimum on x-axis
'xlabel':r'$lep\_pt$[1] [GeV]', # x-axis label
}
SoverB_hist_dict = {'lep_pt_2':lep_pt_2,'lep_pt_1':lep_pt_1}
# add a histogram here if you want it plotted
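As a hypothetical example of adding another histogram, the parameters below describe a third-lepton pT plot (the binning choices are illustrative). To actually plot it you would first need to add a lep_pt_3 column in read_file, e.g. with np.vectorize(calc_lep_pt_i)(data.lep_pt,3), and then register it with SoverB_hist_dict['lep_pt_3'] = lep_pt_3.
lep_pt_3 = { # hypothetical plotting parameters for a lep_pt_3 histogram
'bin_width':1, # width of each histogram bin
'num_bins':20, # number of histogram bins (illustrative choice)
'xrange_min':0, # minimum on x-axis (illustrative choice)
'xlabel':r'$lep\_pt$[3] [GeV]', # x-axis label
}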
Here we define a function to illustrate the optimum cut value on individual variables, based on signal to background ratio.
def plot_SoverB(data):
signal = r'$H \rightarrow ZZ \rightarrow \ell\ell\ell\ell$' # which sample is the signal
# *******************
# general definitions (shouldn't need to change)
for x_variable,hist in SoverB_hist_dict.items(): # access the dictionary of histograms defined in the cell above
h_bin_width = hist['bin_width'] # get the bin width defined in the cell above
h_num_bins = hist['num_bins'] # get the number of bins defined in the cell above
h_xrange_min = hist['xrange_min'] # get the x-range minimum defined in the cell above
h_xlabel = hist['xlabel'] # get the x-axis label defined in the cell above
bin_edges = [ h_xrange_min + x*h_bin_width for x in range(h_num_bins+1) ] # bin limits
bin_centres = [ h_xrange_min+h_bin_width/2 + x*h_bin_width for x in range(h_num_bins) ] # bin centres
signal_x = data[signal][x_variable] # histogram the signal
mc_x = [] # define list to hold the Monte Carlo histogram entries
for s in samples: # loop over samples
if s not in ['data', signal]: # if not data nor signal
mc_x = [*mc_x, *data[s][x_variable] ] # append to the list of Monte Carlo histogram entries
# *************
# Signal and background distributions
# *************
distributions_axes = plt.gca() # get current axes
mc_heights = distributions_axes.hist(mc_x, bins=bin_edges, color='red',
label='Total background',
histtype='step', # lineplot that's unfilled
density=True ) # normalize to form probability density
signal_heights = distributions_axes.hist(signal_x, bins=bin_edges, color='blue',
label=signal,
histtype='step', # lineplot that's unfilled
density=True, # normalize to form probability density
linestyle='--' ) # dashed line
distributions_axes.set_xlim( left=bin_edges[0], right=bin_edges[-1] ) # x-limits of the distributions axes
distributions_axes.set_ylabel('Arbitrary units' ) # y-axis label for distributions axes
distributions_axes.set_ylim( top=max(signal_heights[0])*1.3 ) # set y-axis limits
plt.title('Signal and background '+x_variable+' distributions') # add title
distributions_axes.legend() # draw the legend
distributions_axes.set_xlabel( h_xlabel ) # x-axis label
# Add text 'ATLAS Open Data' on plot
plt.text(0.05, # x
0.93, # y
'ATLAS Open Data', # text
transform=distributions_axes.transAxes, # coordinate system used is that of distributions_axes
fontsize=13 )
# Add text 'for education' on plot
plt.text(0.05, # x
0.88, # y
'for education', # text
transform=distributions_axes.transAxes, # coordinate system used is that of distributions_axes
style='italic',
fontsize=8 )
plt.show() # show the Signal and background distributions
# *************
# Signal to background ratio
# *************
plt.figure() # start new figure
SoverB = [] # list to hold S/B values
for cut_value in bin_edges: # loop over bins
signal_weights_passing_cut = sum(data[signal][data[signal][x_variable]>cut_value].totalWeight)
background_weights_passing_cut = 0 # start counter for background weights passing cut
for s in samples: # loop over samples
if s not in ['data', signal]: # if not data nor signal
background_weights_passing_cut += sum(data[s][data[s][x_variable]>cut_value].totalWeight)
if background_weights_passing_cut!=0: # some background passes cut
SoverB_value = signal_weights_passing_cut/background_weights_passing_cut
SoverB_percent = 100*SoverB_value # multiply by 100 for percentage
SoverB.append(SoverB_percent) # append to list of S/B values
SoverB_axes = plt.gca() # get current axes
SoverB_axes.plot( bin_edges[:len(SoverB)], SoverB ) # plot the data points
SoverB_axes.set_xlim( left=bin_edges[0], right=bin_edges[-1] ) # set the x-limit of the main axes
SoverB_axes.set_ylabel( 'S/B (%)' ) # write y-axis label for main axes
plt.title('Signal to background ratio for different '+x_variable+' cut values', family='sans-serif')
SoverB_axes.set_xlabel( h_xlabel ) # x-axis label
plt.show() # show S/B plot
return
Here we call our function to illustrate the optimum cut value on individual variables, based on signal to background ratio.
We're not doing any Machine Learning yet! We're looking at the variables we'll later use for Machine Learning.
Let's talk through the lep_pt_2 plots.
The same logic applies to lep_pt_1.
plot_SoverB(data)
In the ATLAS Higgs discovery paper, there are a number of numerical cuts applied, not just on lep_pt_1 and lep_pt_2.
Imagine having to separately optimise about 7 variables! Not to mention that applying a cut on one variable could change the distribution of another, which would mean you'd have to re-optimise... Nightmare.
This is where a Machine Learning algorithm such as a Boosted Decision Tree (BDT) can come to the rescue. A BDT can optimise all variables at the same time.
A BDT not only optimises cuts, but can find correlations in many dimensions that will give better signal/background classification than individual cuts ever could.
That's the end of the introduction to why one might want to use a BDT. If you'd like to try using one, just keep reading below!
Choose variables for use in the BDT
data_for_BDT = {} # define empty dictionary to hold dataframes that will be used to train the BDT
BDT_inputs = ['lep_pt_1','lep_pt_2'] # list of features for BDT
for key in data: # loop over the different keys in the dictionary of dataframes
data_for_BDT[key] = data[key][BDT_inputs].copy()
data_for_BDT
{'ZZ':          lep_pt_1   lep_pt_2
 entry
 0      61.677957  48.666441
 1      41.498750  18.562252
 2      78.250461  56.973090
 3      26.851668  13.466778
 4      98.566398  74.528453
 ...          ...        ...
 16623  40.231020  14.359087
 16624  71.017078  24.197906
 16625  54.766832  27.539174
 16626  50.042633  35.911793
 16627  35.250531  26.751711

 [15740 rows x 2 columns],

 '$H \\rightarrow ZZ \\rightarrow \\ell\\ell\\ell\\ell$':
         lep_pt_1   lep_pt_2
 entry
 0      41.248570  16.397670
 1      40.307168  16.133789
 2      27.313271  20.035949
 3      27.845740  17.726541
 4      53.367754  25.596689
 ...          ...        ...
 4935   29.323988  28.241137
 4936   58.395027  43.535012
 4937   32.114463  26.019475
 4938   55.594645  22.598514
 4940   36.496094  18.893633

 [4841 rows x 2 columns]}
Organise data ready for the BDT
# for sklearn data is usually organised
# into one 2D array of shape (n_samples x n_features)
# containing all the data and one array of categories
# of length n_samples
all_MC = [] # define empty list that will contain all features for the MC
for key in data: # loop over the different keys in the dictionary of dataframes
if key!='data': # only MC should pass this
all_MC.append(data_for_BDT[key]) # append the MC dataframe to the list containing all MC features
X = np.concatenate(all_MC) # concatenate the list of MC dataframes into a single 2D array of features, called X
all_y = [] # define empty list that will contain labels of whether an event is signal or background
for key in data: # loop over the different keys in the dictionary of dataframes
if key!=r'$H \rightarrow ZZ \rightarrow \ell\ell\ell\ell$' and key!='data': # only background MC should pass this
all_y.append(np.zeros(data_for_BDT[key].shape[0])) # background events are labelled with 0
all_y.append(np.ones(data_for_BDT[r'$H \rightarrow ZZ \rightarrow \ell\ell\ell\ell$'].shape[0])) # signal events are labelled with 1
y = np.concatenate(all_y) # concatenate the list of labels into a single 1D array of labels, called y
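Before going further, it's worth sanity-checking the shapes of X and y (the exact numbers printed will depend on the fraction of data you processed):
print(X.shape) # (n_samples, n_features): one row per event, one column per BDT input variable
print(y.shape) # (n_samples,): 1 for signal, 0 for background
print('fraction of signal events:', y.mean()) # mean of the 0/1 labels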
One of the first things to do is split your data into a training and testing set: here 67% for training and 33% for testing. train_test_split also shuffles the entries, so you won't simply get the first 67% of X for training and the last 33% for testing. This is particularly important here, since we loaded all the background events first and then all the signal events.
Here we split our data into two independent samples: the first will be used for training the classifier, and the second to evaluate its performance.
We don't want to test on events that we used to train on: a classifier can overfit to its training data, so that it looks good on the data it was trained on but performs much worse on any new data it sees.
from sklearn.model_selection import train_test_split
# make train and test sets
X_train,X_test, y_train,y_test = train_test_split(X, y,
test_size=0.33,
random_state=492 )
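A quick check that the split behaved as expected:
print(len(X_train), len(X_test)) # number of training and testing events
print('training fraction:', len(X_train)/(len(X_train)+len(X_test))) # should be close to 0.67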
We'll use scikit-learn (sklearn) in this tutorial. Other possible tools include Keras and PyTorch.
Here we set several hyper-parameters to non-default values.
After instantiating our AdaBoostClassifier, call the fit() method with the training sample as an argument. This trains the tree; we are then ready to evaluate its performance on the held-out testing set.
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
dt = DecisionTreeClassifier(max_depth=2) # maximum depth of the tree
bdt = AdaBoostClassifier(dt,
algorithm='SAMME', # SAMME discrete boosting algorithm
n_estimators=12, # max number of estimators at which boosting is terminated
learning_rate=0.5) # shrinks the contribution of each classifier by learning_rate
start = time.time() # time at start of BDT fit
bdt.fit(X_train, y_train) # fit BDT to training set
elapsed = time.time() - start # time after fitting BDT
print("Time taken to fit BDT: "+str(round(elapsed,1))+"s") # print total time taken to fit BDT
print(bdt)
Time taken to fit BDT: 0.2s
AdaBoostClassifier(algorithm='SAMME',
                   base_estimator=DecisionTreeClassifier(class_weight=None,
                                                         criterion='gini',
                                                         max_depth=2,
                                                         max_features=None,
                                                         max_leaf_nodes=None,
                                                         min_impurity_decrease=0.0,
                                                         min_impurity_split=None,
                                                         min_samples_leaf=1,
                                                         min_samples_split=2,
                                                         min_weight_fraction_leaf=0.0,
                                                         presort=False,
                                                         random_state=None,
                                                         splitter='best'),
                   learning_rate=0.5, n_estimators=12, random_state=None)
The fit() method returns the trained classifier. When printed out, all the hyper-parameters are listed.
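If you'd like to see how sensitive the performance is to these hyper-parameters, one option is scikit-learn's GridSearchCV, which trains one BDT per parameter combination using cross-validation. This is a minimal sketch; the parameter grid below is an illustrative guess that you should adapt:
from sklearn.model_selection import GridSearchCV
param_grid = {'n_estimators':[12,50,100], # illustrative values to scan
'learning_rate':[0.1,0.5,1.0]} # illustrative values to scan
grid = GridSearchCV(AdaBoostClassifier(DecisionTreeClassifier(max_depth=2),
algorithm='SAMME'),
param_grid, scoring='roc_auc', cv=3) # optimise area under ROC curve with 3-fold cross-validation
grid.fit(X_train, y_train) # train one BDT per hyper-parameter combination
print(grid.best_params_, grid.best_score_) # best hyper-parameters and their cross-validated score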
Next let's create a quick report on how well our classifier is doing. It is important to make sure you use samples not seen by the classifier to get an unbiased estimate of its performance.
from sklearn.metrics import classification_report, roc_auc_score
y_predicted = bdt.predict(X_test) # get predicted y for test set
print (classification_report(y_test, y_predicted,
target_names=["background", "signal"]))
print ("Area under ROC curve for test data: %.4f"%(roc_auc_score(y_test,
bdt.decision_function(X_test))) )
              precision    recall  f1-score   support

  background       0.85      0.91      0.88      5194
      signal       0.63      0.48      0.55      1598

    accuracy                           0.81      6792
   macro avg       0.74      0.70      0.71      6792
weighted avg       0.80      0.81      0.80      6792

Area under ROC curve for test data: 0.8495
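To see concretely what precision and recall mean, here's a by-hand cross-check of the signal row of the report, using precision = TP/(TP+FP) and recall = TP/(TP+FN):
true_positives = np.sum((y_predicted==1) & (y_test==1)) # signal events classified as signal
false_positives = np.sum((y_predicted==1) & (y_test==0)) # background events classified as signal
false_negatives = np.sum((y_predicted==0) & (y_test==1)) # signal events classified as background
print('signal precision:', true_positives/(true_positives+false_positives))
print('signal recall:', true_positives/(true_positives+false_negatives))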
To illustrate that point, here are the same performance metrics evaluated on the training set instead. The performance estimates are more optimistic than on an unseen set of events.
y_predicted = bdt.predict(X_train) # get predicted y for train set
print (classification_report(y_train, y_predicted,
target_names=["background", "signal"]))
print ("Area under ROC curve for training data: %.4f"%(roc_auc_score(y_train,
bdt.decision_function(X_train))))
              precision    recall  f1-score   support

  background       0.85      0.92      0.88     10546
      signal       0.64      0.49      0.56      3243

    accuracy                           0.82     13789
   macro avg       0.75      0.70      0.72     13789
weighted avg       0.80      0.82      0.81     13789

Area under ROC curve for training data: 0.8563
Another useful way to judge the performance of a classifier is to look at the ROC curve directly.
# we first plot the BDT output
signal_decisions = bdt.decision_function(X[y>0.5]).ravel() # get decisions on signal
background_decisions = bdt.decision_function(X[y<0.5]).ravel() # get decisions on background
plt.hist(background_decisions, color='red', label='background',
histtype='step', # lineplot that's unfilled
density=True ) # normalize to form a probability density
plt.hist(signal_decisions, color='blue', label='signal',
histtype='step', # lineplot that's unfilled
density=True, # normalize to form a probability density
linestyle='--' ) # dashed line
plt.xlabel('BDT output') # add x-axis label
plt.ylabel('Arbitrary units') # add y-axis label
plt.legend() # add legend
# we then plot the ROC
plt.figure() # make new figure
from sklearn.metrics import roc_curve, auc
decisions = bdt.decision_function(X_test).ravel() # get decision-function values on test set
# Compute ROC curve and area under the curve
fpr, tpr, _ = roc_curve(y_test, # actual
decisions ) # predicted
# Compute area under the curve for the test set
roc_auc = auc(fpr, # false positive rate
tpr) # true positive rate
plt.plot(fpr, tpr, label='ROC (area = %0.2f)'%(roc_auc)) # plot test ROC curve
plt.plot([0, 1], # x from 0 to 1
[0, 1], # y from 0 to 1
'--', # dashed line
color='grey', label='Luck')
plt.xlabel('False Positive Rate') # x-axis label
plt.ylabel('True Positive Rate') # y-axis label
plt.title('Receiver operating characteristic (ROC) curve') # title
plt.legend() # add legend
plt.grid() # add grid
Sliding the cut threshold on the BDT output (upper plot) from right to left builds up the ROC curve (lower plot) from bottom to top.
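To make that concrete, here's a single working point: one threshold on the BDT output (the value 0.0 below is an arbitrary illustrative choice) gives one (false positive rate, true positive rate) point on the ROC curve.
threshold = 0.0 # illustrative cut value on the decision function
predicted_signal = decisions > threshold # events passing the cut
print('true positive rate:', np.mean(predicted_signal[y_test==1])) # fraction of signal passing the cut
print('false positive rate:', np.mean(predicted_signal[y_test==0])) # fraction of background passing the cut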
Comparing the BDT's output distribution for the training and testing sets is a popular way in HEP to check for overtraining. The compare_train_test() function below plots the shape of the BDT's decision function for each class in the training set, overlaid with the same distributions evaluated on the testing set.
There are techniques to prevent overtraining, for example limiting the maximum depth of the trees or the number of boosting stages.
def compare_train_test(clf, X_train, y_train, X_test, y_test):
decisions = [] # list to hold decisions of classifier
for X,y in ((X_train, y_train), (X_test, y_test)): # train and test
d1 = clf.decision_function(X[y<0.5]).ravel() # background
d2 = clf.decision_function(X[y>0.5]).ravel() # signal
decisions += [d1, d2] # add to list of classifier decision
highest_decision = max(np.max(d) for d in decisions) # get maximum score
bin_edges = [] # list to hold bin edges
bin_edge = -1.1 # start counter for bin_edges
while bin_edge < highest_decision: # up to highest score
bin_edge += 0.1 # increment
bin_edges.append(bin_edge)
plt.hist(decisions[0], # background in train set
bins=bin_edges, # lower and upper range of the bins
density=True, # area under the histogram will sum to 1
histtype='stepfilled', # lineplot that's filled
color='red', label='B (train)', # Background (train)
alpha=0.5 ) # half transparency
plt.hist(decisions[1], # signal in train set
bins=bin_edges, # lower and upper range of the bins
density=True, # area under the histogram will sum to 1
histtype='stepfilled', # lineplot that's filled
color='blue', label='S (train)', # Signal (train)
alpha=0.5 ) # half transparency
hist_background, bin_edges = np.histogram(decisions[2], # background test
bins=bin_edges, # number of bins in function definition
density=True ) # area under the histogram will sum to 1
scale = len(decisions[2]) / sum(hist_background) # between raw and normalised
err_background = np.sqrt(hist_background * scale) / scale # error on test background
width = 0.1 # histogram bin width
center = (bin_edges[:-1] + bin_edges[1:]) / 2 # bin centres
plt.errorbar(x=center, y=hist_background, yerr=err_background, fmt='o', # circles
c='red', label='B (test)' ) # Background (test)
hist_signal, bin_edges = np.histogram(decisions[3], # signal test
bins=bin_edges, # number of bins in function definition
density=True ) # area under the histogram will sum to 1
scale = len(decisions[3]) / sum(hist_signal) # between raw and normalised
err_signal = np.sqrt(hist_signal * scale) / scale # error on test signal
plt.errorbar(x=center, y=hist_signal, yerr=err_signal, fmt='o', # circles
c='blue', label='S (test)' ) # Signal (test)
plt.xlabel("BDT output") # write x-axis label
plt.ylabel("Arbitrary units") # write y-axis label
plt.legend() # add legend
compare_train_test(bdt, X_train, y_train, X_test, y_test) # call compare_train_test
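If you want a quantitative companion to this visual check, one common approach is a two-sample Kolmogorov-Smirnov test comparing the train and test BDT output distributions for each class; a very small p-value would hint at overtraining. This is a sketch; scipy is available here since scikit-learn depends on it:
from scipy.stats import ks_2samp # two-sample Kolmogorov-Smirnov test
for label,name in [(0,'background'),(1,'signal')]: # loop over the two classes
    d_train = bdt.decision_function(X_train[y_train==label]) # BDT output on the train set
    d_test = bdt.decision_function(X_test[y_test==label]) # BDT output on the test set
    print(name, ks_2samp(d_train, d_test)) # KS statistic and p-value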
Here we get the BDT's decision function for every event that was processed at the beginning (so could be data, signal, background...). The higher the decision function, the more the BDT thinks that event looks like signal.
y_predicted = bdt.decision_function(X)
y_predicted
array([-0.95262722, 0.11473984, -1. , ..., -0.05901893, -0.56423857, 0.11473984])
In this cell we save the BDT output to our dataframes.
cumulative_events = 0 # start counter for total number of events for which output is saved
for key in data: # loop over samples
data[key]['BDT_output'] = y_predicted[cumulative_events:cumulative_events+len(data[key])]
cumulative_events += len(data[key]) # increment counter for total number of events
print(data[key]['BDT_output']) # print the dataframe column BDT_output
entry
0       -0.952627
1        0.114740
2       -1.000000
3       -0.059019
4       -1.000000
           ...
16623    0.114740
16624   -0.664655
16625   -0.737997
16626   -0.899583
16627   -0.059019
Name: BDT_output, Length: 15740, dtype: float64
entry
0       0.114740
1       0.114740
2      -0.059019
3      -0.059019
4      -0.737997
          ...
4935   -0.059019
4936   -0.952627
4937   -0.059019
4938   -0.564239
4940    0.114740
Name: BDT_output, Length: 4841, dtype: float64
Here we define parameters to plot the BDT output
BDT_output = { # dictionary containing plotting parameters for the BDT_output histogram
# change plotting parameters
'bin_width':0.1, # width of each histogram bin
'num_bins':14, # number of histogram bins
'xrange_min':-1, # minimum on x-axis
'xlabel':'BDT output', # x-axis label
}
SoverB_hist_dict = {'BDT_output':BDT_output}
Here we call the function defined above to illustrate the optimum cut value on BDT output, based on signal to background ratio.
plot_SoverB(data)
Putting everything into a BDT means we only have 1 variable to optimise. The signal and background distributions are separated much better when looking at BDT output, compared to individual variables. Cutting on BDT output also achieves much higher S/B values than on individual variables.
BDTs can achieve better S/B ratios because they find correlations in many dimensions that will give better signal/background classification.
Hopefully you've enjoyed this discussion on optimising for signal to background ratio, and in particular how a BDT can be used to facilitate this.
If you want to go further, there are a number of things you could try:
With each change, keep an eye on the:
Notice that we've trained and tested our BDT on simulated data. We would then apply it to real experimental data. Once you're happy with your BDT, you may want to put it back into a full analysis to run over all data.
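When you do take your BDT into a full analysis, you'll want to train it once and reuse it rather than retrain every time. Here's a minimal sketch using joblib (installed as a dependency of scikit-learn); the filename is an illustrative choice:
import joblib # for saving and loading trained scikit-learn models
joblib.dump(bdt, 'bdt_HZZ4lep.joblib') # save the trained BDT to disk
bdt_loaded = joblib.load('bdt_HZZ4lep.joblib') # load it back in a later session
print(bdt_loaded.decision_function(X_test[:5])) # check it gives the same outputs as before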