The MAGIC telescope is a Cherenkov telescope situated on La Palma, one of the Canary Islands. The MAGIC machine learning dataset can be obtained from the UC Irvine Machine Learning Repository.
The task is to separate signal events (gamma showers) from background events (hadron showers) based on the features of a measured Cherenkov shower.
The features of a shower are:
1. fLength: continuous # major axis of ellipse [mm]
2. fWidth: continuous # minor axis of ellipse [mm]
3. fSize: continuous # 10-log of sum of content of all pixels [in #phot]
4. fConc: continuous # ratio of sum of two highest pixels over fSize [ratio]
5. fConc1: continuous # ratio of highest pixel over fSize [ratio]
6. fAsym: continuous # distance from highest pixel to center, projected onto major axis [mm]
7. fM3Long: continuous # 3rd root of third moment along major axis [mm]
8. fM3Trans: continuous # 3rd root of third moment along minor axis [mm]
9. fAlpha: continuous # angle of major axis with vector to origin [deg]
10. fDist: continuous # distance from origin to center of ellipse [mm]
11. class: g,h # gamma (signal), hadron (background)
Class distribution: g = gamma (signal): 12332; h = hadron (background): 6688
For technical reasons, the number of h events is underestimated. In the real data, the h class represents the majority of the events.
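The class counts above imply that a trivial classifier labelling every event as gamma would already reach roughly 65% accuracy on this sample, so accuracy alone can be a misleading figure of merit here. A minimal arithmetic sketch (the variable names are illustrative only):

```python
# Class counts quoted in the dataset description above
n_gamma = 12332   # signal
n_hadron = 6688   # background

n_total = n_gamma + n_hadron
gamma_fraction = n_gamma / n_total

# Accuracy of a hypothetical classifier that always predicts "gamma"
print(f"total events: {n_total}")
print(f"gamma fraction (= always-gamma accuracy): {gamma_fraction:.3f}")
```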
You can find further information about the MAGIC telescope and the data discrimination studies in the following paper: R. K. Bock et al., "Methods for multidimensional event classification: a case study using images from a Cherenkov gamma-ray telescope", NIM A 516 (2004) 511-528. (You need to be within the university network to get free access.)
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
filename = "https://www.physi.uni-heidelberg.de/~reygers/lectures/2021/ml/data/magic04_data.txt"
df = pd.read_csv(filename, engine='python')
# use categories 1 and 0 instead of "g" and "h"
df['class'] = df['class'].map({'g': 1, 'h': 0})
df.head()
import matplotlib.pyplot as plt
df0 = df[df['class'] == 0] # hadron data set
df1 = df[df['class'] == 1] # gamma data set
print(len(df0),len(df1))
### YOUR CODE ###
y = df['class'].values
X = df.drop(columns="class")  # all feature columns, without the label
### YOUR CODE ###
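The step above presumably splits the features and labels into training and test sets with `train_test_split`, which was imported earlier. The following is a minimal sketch on synthetic stand-in data, not the intended solution; the array shapes, `test_size=0.3`, `random_state=42`, and the use of `stratify` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the MAGIC data: 1000 events, 10 features (assumption)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 10))
y_demo = (rng.random(1000) < 0.65).astype(int)  # roughly 65% "gamma" labels

# Hold out 30% of the events for testing; stratify keeps the class ratio
# similar in both subsets. test_size and random_state are illustrative choices.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

print(X_tr.shape, X_te.shape)
```

With the real data, `X` and `y` would take the place of `X_demo` and `y_demo`.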
from sklearn import linear_model
# define logistic regressor
### YOUR CODE ###
# fit training data
### YOUR CODE ###
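In scikit-learn, a logistic regressor is provided by `linear_model.LogisticRegression`. The sketch below illustrates the define-and-fit pattern on synthetic data; the data, the `StandardScaler` preprocessing step, and `max_iter=1000` are assumptions rather than part of the exercise.

```python
import numpy as np
from sklearn import linear_model
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic two-class data (assumption): class 1 is shifted along every feature
rng = np.random.default_rng(1)
n = 500
X0 = rng.normal(loc=0.0, size=(n, 3))
X1 = rng.normal(loc=1.5, size=(n, 3))
X_demo = np.vstack([X0, X1])
y_demo = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])

# Scaling the features first usually helps the solver converge
logreg_demo = make_pipeline(
    StandardScaler(),
    linear_model.LogisticRegression(max_iter=1000),
)
logreg_demo.fit(X_demo, y_demo)

# predict_proba returns one column per class; column 1 is P(class = 1)
proba = logreg_demo.predict_proba(X_demo)
print(proba.shape, logreg_demo.score(X_demo, y_demo))
```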
from sklearn.metrics import roc_auc_score
### YOUR CODE ###
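The area under the ROC curve, as computed by `roc_auc_score(y_true, y_score)`, can be read as the probability that a randomly chosen signal event receives a higher score than a randomly chosen background event, with ties counted as one half. The from-scratch sketch below illustrates that interpretation on four toy events; `pairwise_auc` is a hypothetical helper, not a scikit-learn function.

```python
def pairwise_auc(y_true, scores):
    """AUC as the fraction of (signal, background) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))

y_toy = [0, 0, 1, 1]                 # 1 = signal, 0 = background
scores_toy = [0.1, 0.4, 0.35, 0.8]   # toy classifier scores
auc_toy = pairwise_auc(y_toy, scores_toy)
print(auc_toy)  # 3 of the 4 signal/background pairs are ordered correctly
```

The double loop scales quadratically with the number of events, so it is meant for understanding only; for real data the library function is the practical choice.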
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve
%matplotlib inline
y_pred_prob = logreg.predict_proba(X_test) # predicted probabilities, shape (n_samples, 2); column 1 is P(class = 1)
### YOUR CODE ###
### YOUR CODE ###
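`roc_curve` returns the false positive rates, the true positive rates, and the score thresholds at which they were evaluated. A minimal sketch on toy scores, assuming class 1 is the signal; the toy arrays are made up for illustration, and plotting is indicated only in a comment to keep the example small.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_toy = np.array([0, 0, 1, 1])
# Stand-in for the class-1 column of predict_proba, i.e. proba[:, 1]
scores_toy = np.array([0.1, 0.4, 0.35, 0.8])

fpr, tpr, thresholds = roc_curve(y_toy, scores_toy)
auc_toy = roc_auc_score(y_toy, scores_toy)

for f, t in zip(fpr, tpr):
    print(f"FPR = {f:.2f}, TPR = {t:.2f}")
print(f"AUC = {auc_toy:.2f}")

# To draw the curve, one could use for example:
#   plt.plot(fpr, tpr)
#   plt.xlabel("false positive rate"); plt.ylabel("true positive rate")
```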