I'm sure many of us love the TV series Prison Break, watching the brothers Michael Scofield and Lincoln Burrows using all sorts of means to break out of prison again and again.
However, in real life, there have been actual helicopter prison escapes since 1971.
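In this project, we will analyze the data from Wikipedia and attempt to answer the following questions:
- In which year did the most helicopter prison break attempts occur?
- In which countries did the most helicopter prison break attempts occur?
- In which countries do helicopter prison breaks have the highest success rate?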
# Imports
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.style as style
style.use('fivethirtyeight')
sns.set(style="darkgrid")
# parse url to get data table
url = 'https://en.wikipedia.org/wiki/List_of_helicopter_prison_escapes'
df = pd.read_html(url, match='Prison name')
# convert list to DataFrame
df = df[0]
print(df.head(3))
print(df.tail(3))
               Date                         Prison name        Country  \
0   August 19, 1971               Santa Martha Acatitla         Mexico
1  October 31, 1973                       Mountjoy Jail        Ireland
2      May 24, 1978  United States Penitentiary, Marion  United States

  Succeeded                                         Escapee(s)  \
0       Yes  Joel David Kaplan Carlos Antonio Contreras Castro
1       Yes               JB O'Hagan Seamus TwomeyKevin Mallon
2        No  Garrett Brock TrapnellMartin Joseph McNallyJam...

                                             Details
0  Joel David Kaplan was a New York businessman w...
1  On October 31, 1973, an IRA member hijacked a ...
2  43-year-old Barbara Ann Oswald hijacked a Sain...

                  Date              Prison name  Country Succeeded  \
45   February 22, 2016                    Thiva   Greece        No
46        July 1, 2018         Réau, near Paris   France       Yes
47  September 25, 2020  Forest prison, Brussels  Belgium        No

                  Escapee(s)  \
45  Pola RoupaNikos Maziotis
46              Rédoine Faïd
47                Kristel A.

                                              Details
45  A helicopter pilot foiled an attempted hijack ...
46  Faïd was helped by several heavily armed men w...
47  Three armed men hijacked a Eurocopter AS355 he...
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48 entries, 0 to 47
Data columns (total 6 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   Date         48 non-null     object
 1   Prison name  48 non-null     object
 2   Country      48 non-null     object
 3   Succeeded    48 non-null     object
 4   Escapee(s)   48 non-null     object
 5   Details      48 non-null     object
dtypes: object(6)
memory usage: 2.4+ KB
df.describe()
| | Date | Prison name | Country | Succeeded | Escapee(s) | Details |
|---|---|---|---|---|---|---|
| count | 48 | 48 | 48 | 48 | 48 | 48 |
| unique | 48 | 45 | 15 | 2 | 40 | 48 |
| top | May 24, 1978 | Touraine Central Prison, Tours | France | Yes | — | Three detainees awaiting trial for murder esca... |
| freq | 1 | 2 | 15 | 34 | 7 | 1 |
The `Date` column should be a datetime object.
The `Details` column is irrelevant for this project.
# drop `Details` column
df.drop('Details', axis=1, inplace=True)
# convert `Date` column to datetime (pandas infers the format;
# `infer_datetime_format` is deprecated since pandas 2.0)
df['Date'] = pd.to_datetime(df['Date'])
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48 entries, 0 to 47
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   Date         48 non-null     datetime64[ns]
 1   Prison name  48 non-null     object
 2   Country      48 non-null     object
 3   Succeeded    48 non-null     object
 4   Escapee(s)   48 non-null     object
dtypes: datetime64[ns](1), object(4)
memory usage: 2.0+ KB
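The date conversion above relies on pandas inferring the format. As a minimal sketch, the same parsing can be made explicit with a format string, assuming every entry follows the "Month day, year" pattern seen in the first rows of the table. The `sample_dates` series is a stand-in for illustration, not the full column.

```python
import pandas as pd

# Sample date strings copied from the first three rows of the table.
sample_dates = pd.Series(['August 19, 1971', 'October 31, 1973', 'May 24, 1978'])

# '%B %d, %Y' = full month name, day of month, four-digit year.
parsed = pd.to_datetime(sample_dates, format='%B %d, %Y')

print(parsed.dtype)
print(parsed.dt.year.tolist())
```

An explicit format raises an error on any entry that does not match, which can be a useful check when the source table changes.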
# rename columns
df.rename(columns={'Prison name': 'Prison',
                   'Escapee(s)': 'Escapees'},
          inplace=True)
df.columns
Index(['Date', 'Prison', 'Country', 'Succeeded', 'Escapees'], dtype='object')
# create a new Series with the number of attempts per year
df_year = df['Date'].dt.year.value_counts()
df_year.sort_index(inplace=True)
# years with the most occurrences
years_with_max_occurences = df_year[df_year.values == df_year.max()].index.to_list()
years_with_max_occurences
[1986, 2001, 2007, 2009]
# create a palette to highlight the maximum value
def set_maximum_palette(series, max_color='turquoise', other_color='lightgray'):
    return [max_color if item == series.max() else other_color for item in series]
palette = set_maximum_palette(df_year)
sns.barplot(x=df_year.index, y=df_year.values, palette=palette)
plt.title('Helicopter Prison Breaks By Year')
plt.xticks(rotation=90)
plt.show()
The most attempts at breaking out of prison with a helicopter occurred in 1986, 2001, 2007 and 2009, with 3 attempts each.
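The yearly counts only contain years with at least one attempt, so years without any attempt do not appear on the chart's x-axis. The following is a minimal sketch of one way to fill those gaps with zeros before plotting. The `years` series is a hypothetical sample for illustration, not the Wikipedia data.

```python
import pandas as pd

# Hypothetical sample of attempt years (illustration only, not the real table).
years = pd.Series([1971, 1973, 1978, 1986, 1986, 1986, 1988])

# Count attempts per year and order the result chronologically.
counts = years.value_counts().sort_index()

# Reindex over every year in the observed range; missing years get a count of 0.
full_range = range(counts.index.min(), counts.index.max() + 1)
counts_full = counts.reindex(full_range, fill_value=0)

# Years tied for the maximum number of attempts.
top_years = counts_full[counts_full == counts_full.max()].index.to_list()
print(top_years)
```

Whether to show the empty years is a presentation choice: the gaps make quiet periods visible, at the cost of a wider chart.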
# create a new Series with the number of attempts per country
df_country = df['Country'].value_counts()
# countries ordered by number of attempts
df_country
France            15
United States      8
Greece             4
Canada             4
Belgium            4
Australia          2
United Kingdom     2
Brazil             2
Mexico             1
Ireland            1
Italy              1
Russia             1
Netherlands        1
Puerto Rico        1
Chile              1
Name: Country, dtype: int64
palette = set_maximum_palette(df_country)
sns.barplot(x=df_country.values, y=df_country.index, palette=palette)
plt.title('Helicopter Prison Breaks By Country')
plt.show()
The most attempts at breaking out of prison with a helicopter occurred in France, with 15 attempts. The runner-up is the United States, with 8 attempts.
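To put these counts in proportion, France accounts for 15 of the 48 recorded attempts (about 31%) and the United States for 8 of 48 (about 17%). A short sketch of that arithmetic follows, with the counts typed in from the per-country output so that it runs without fetching the page. The name `share_pct` is introduced here for illustration.

```python
import pandas as pd

# Attempt counts copied from the per-country value_counts() output.
attempts = pd.Series({
    'France': 15, 'United States': 8, 'Greece': 4, 'Canada': 4, 'Belgium': 4,
    'Australia': 2, 'United Kingdom': 2, 'Brazil': 2, 'Mexico': 1, 'Ireland': 1,
    'Italy': 1, 'Russia': 1, 'Netherlands': 1, 'Puerto Rico': 1, 'Chile': 1,
})

# Each country's share of all recorded attempts, as a percentage.
share_pct = attempts / attempts.sum() * 100
print(share_pct.head(2))
```

On the full DataFrame, `df['Country'].value_counts(normalize=True)` should give the same shares as fractions.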
# add an `IsSuccess` column: 1 if `Succeeded` is 'Yes', otherwise 0
df['IsSuccess'] = (df['Succeeded'] == 'Yes').astype(int)
# get the success rate by country
success_rate_by_country = df.groupby('Country')['IsSuccess'].mean()
success_rate_by_country.sort_values(ascending=False, inplace=True)
success_rate_by_country[success_rate_by_country == success_rate_by_country.max()]
Country
Brazil         1.0
Chile          1.0
Ireland        1.0
Italy          1.0
Mexico         1.0
Puerto Rico    1.0
Russia         1.0
Name: IsSuccess, dtype: float64
palette = set_maximum_palette(success_rate_by_country)
sns.barplot(x=success_rate_by_country.values, y=success_rate_by_country.index, palette=palette)
plt.title('Helicopter Prison Breaks Success Rate By Country')
plt.show()
Helicopter prison breaks have a 100% success rate in Brazil, Chile, Ireland, Italy, Mexico, Puerto Rico, and Russia, although each of these countries has only one or two recorded attempts.
It is interesting to see that France has the highest number of helicopter escape attempts. At the same time, its success rate is also rather high, at more than 70%.
The United States, with the second-highest number of attempts, also has a high success rate.
Perhaps France and the United States should strengthen their prisons against helicopter escapes.
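Because a success rate based on one attempt says little, it can help to read the rate next to the number of attempts. The following is a minimal sketch of computing both in one pass with a named aggregation. The rows in `sample` are hypothetical and only illustrate the shape of the data; they are not the Wikipedia figures.

```python
import pandas as pd

# Hypothetical rows for illustration only; not the real Wikipedia data.
sample = pd.DataFrame({
    'Country': ['France', 'France', 'France',
                'United States', 'United States', 'Mexico'],
    'Succeeded': ['Yes', 'Yes', 'No', 'Yes', 'No', 'Yes'],
})

# 1 for a successful escape, 0 otherwise.
sample['IsSuccess'] = (sample['Succeeded'] == 'Yes').astype(int)

# Number of attempts and mean success rate per country, most attempts first.
summary = (
    sample.groupby('Country')['IsSuccess']
    .agg(attempts='count', success_rate='mean')
    .sort_values('attempts', ascending=False)
)
print(summary)
```

Applied to the full DataFrame, the same aggregation would show the attempt count behind each country's success rate in a single table.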