Analyzing Data¶

Prison Helicopter Escapes¶

We will try to answer the following questions by analysing the data and drawing conclusions:

In which year did the most attempts at breaking out of prison with a helicopter occur?
In which countries do the most attempted helicopter prison escapes occur?
In which countries do helicopter prison breaks have a higher chance of success?

We begin by importing some helper functions.

In [131]:

from helper import *

Get the Data¶

Now, let's get the data from the List of helicopter prison escapes Wikipedia article.

In [132]:

url = "https://en.wikipedia.org/wiki/List_of_helicopter_prison_escapes"
data_with_description = data_from_url(url)
data = data_with_description

Processing the data¶

Let's remove the 'Details' column from our list.

In [133]:

# Drop the last column ('Details') from every row.
for index, row in enumerate(data):
    data[index] = row[:-1]

Checking the progress:

In [134]:

print(data[:3])

[['August 19, 1971', 'Santa Martha Acatitla', 'Mexico', 'Yes', 'Joel David Kaplan Carlos Antonio Contreras Castro'], ['October 31, 1973', 'Mountjoy Jail', 'Ireland', 'Yes', "JB O'Hagan Seamus TwomeyKevin Mallon"], ['May 24, 1978', 'United States Penitentiary, Marion', 'United States', 'No', 'Garrett Brock TrapnellMartin Joseph McNallyJames Kenneth Johnson']]
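
One thing worth keeping in mind about the cells above: `data = data_with_description` does not copy the table, so overwriting elements of `data` also changes `data_with_description`. The sketch below illustrates this with a made-up two-row table (the rows and the names `rows`, `alias` and `trimmed` are hypothetical) and shows a list comprehension that builds a trimmed copy instead.

```python
# Hypothetical two-row table; the last element stands in for the 'Details' text.
rows = [
    ["August 19, 1971", "Mexico", "Yes", "details text A"],
    ["October 31, 1973", "Ireland", "Yes", "details text B"],
]

alias = rows  # plain assignment: both names refer to the same list object

# Build a new list of shortened rows instead of overwriting elements in place.
trimmed = [row[:-1] for row in rows]

print(alias is rows)    # True
print(len(rows[0]))     # 4, the original rows are untouched
print(len(trimmed[0]))  # 3, the last column is gone in the copy
```

Whether the in-place version or the copying version is preferable depends on whether the descriptions are needed again later in the analysis.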

Extracting only the year from the date column:

In [135]:

for row in data:
    year = fetch_year(row[0])
    row[0] = year

Checking progress:

In [136]:

print(data[:3])

[[1971, 'Santa Martha Acatitla', 'Mexico', 'Yes', 'Joel David Kaplan Carlos Antonio Contreras Castro'], [1973, 'Mountjoy Jail', 'Ireland', 'Yes', "JB O'Hagan Seamus TwomeyKevin Mallon"], [1978, 'United States Penitentiary, Marion', 'United States', 'No', 'Garrett Brock TrapnellMartin Joseph McNallyJames Kenneth Johnson']]
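
`fetch_year` comes from the `helper` module, whose source is not shown in this notebook. As a rough illustration only, a function with similar behaviour could look like the sketch below. The name `fetch_year_sketch` is hypothetical, and the sketch assumes every date string contains exactly one four-digit year.

```python
import re


def fetch_year_sketch(date_string):
    """Return the first four-digit number found in date_string as an int."""
    match = re.search(r"\d{4}", date_string)
    if match is None:
        raise ValueError(f"no four-digit year found in {date_string!r}")
    return int(match.group())


print(fetch_year_sketch("August 19, 1971"))  # 1971
```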

Creating a list of every year from the earliest to the latest in the table, including years with no attempts:

In [137]:

min_year = min(data, key=lambda x: x[0])[0]
max_year = max(data, key=lambda x: x[0])[0]

years = list(range(min_year, max_year + 1))

In [138]:

attempts_per_year = []
for year in years:
    attempts_per_year.append([year, 0])

Checking progress:

In [139]:

print(attempts_per_year)

[[1971, 0], [1972, 0], [1973, 0], [1974, 0], [1975, 0], [1976, 0], [1977, 0], [1978, 0], [1979, 0], [1980, 0], [1981, 0], [1982, 0], [1983, 0], [1984, 0], [1985, 0], [1986, 0], [1987, 0], [1988, 0], [1989, 0], [1990, 0], [1991, 0], [1992, 0], [1993, 0], [1994, 0], [1995, 0], [1996, 0], [1997, 0], [1998, 0], [1999, 0], [2000, 0], [2001, 0], [2002, 0], [2003, 0], [2004, 0], [2005, 0], [2006, 0], [2007, 0], [2008, 0], [2009, 0], [2010, 0], [2011, 0], [2012, 0], [2013, 0], [2014, 0], [2015, 0], [2016, 0], [2017, 0], [2018, 0], [2019, 0], [2020, 0]]

Counting the number of attempts per year and adding them to the list of years:

In [140]:

for row in data:
    for year_attempt in attempts_per_year:
        year = year_attempt[0]
        if row[0] == year:
            year_attempt[1] += 1

print(attempts_per_year)

[[1971, 1], [1972, 0], [1973, 1], [1974, 0], [1975, 0], [1976, 0], [1977, 0], [1978, 1], [1979, 0], [1980, 0], [1981, 2], [1982, 0], [1983, 1], [1984, 0], [1985, 2], [1986, 3], [1987, 1], [1988, 1], [1989, 2], [1990, 1], [1991, 1], [1992, 2], [1993, 1], [1994, 0], [1995, 0], [1996, 1], [1997, 1], [1998, 0], [1999, 1], [2000, 2], [2001, 3], [2002, 2], [2003, 1], [2004, 0], [2005, 2], [2006, 1], [2007, 3], [2008, 0], [2009, 3], [2010, 1], [2011, 0], [2012, 1], [2013, 2], [2014, 1], [2015, 0], [2016, 1], [2017, 0], [2018, 1], [2019, 0], [2020, 1]]
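
The nested loop above compares every row with every year. An alternative, sketched here on a made-up list of years (`sample_years` is hypothetical, not the real table), is to keep the counts in a dictionary keyed by year so that each attempt is counted with a single lookup.

```python
# Hypothetical list of years, one entry per attempt (not the real table).
sample_years = [1971, 1973, 1973, 1975]

# Start every year in the range at zero so quiet years still appear.
counts = {year: 0 for year in range(min(sample_years), max(sample_years) + 1)}

# One pass over the attempts; each update is a direct dictionary access.
for year in sample_years:
    counts[year] += 1

# Convert back to the [year, count] pairs used elsewhere in this notebook.
attempts = [[year, count] for year, count in counts.items()]
print(attempts)  # [[1971, 1], [1972, 0], [1973, 2], [1974, 0], [1975, 1]]
```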

Visualisation of attempts per year and conclusions:¶

In [141]:

%matplotlib inline
barplot(attempts_per_year)

Answer to Q1:¶

The most attempts at breaking out of prison with a helicopter occurred in 1986, 2001, 2007 and 2009, with 3 attempts in each of these years.
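
The peak years can also be read off the counts programmatically rather than from the bar plot. The sketch below uses the non-zero `[year, count]` pairs copied from the printed `attempts_per_year` output above; the names `nonzero_attempts`, `most` and `peak_years` are introduced here for illustration.

```python
# Non-zero [year, count] pairs copied from the printed output above.
nonzero_attempts = [
    [1971, 1], [1973, 1], [1978, 1], [1981, 2], [1983, 1], [1985, 2],
    [1986, 3], [1987, 1], [1988, 1], [1989, 2], [1990, 1], [1991, 1],
    [1992, 2], [1993, 1], [1996, 1], [1997, 1], [1999, 1], [2000, 2],
    [2001, 3], [2002, 2], [2003, 1], [2005, 2], [2006, 1], [2007, 3],
    [2009, 3], [2010, 1], [2012, 1], [2013, 2], [2014, 1], [2016, 1],
    [2018, 1], [2020, 1],
]

# Highest number of attempts in any single year.
most = max(count for _, count in nonzero_attempts)

# Every year that reaches that maximum.
peak_years = [year for year, count in nonzero_attempts if count == most]

print(most, peak_years)  # 3 [1986, 2001, 2007, 2009]
```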

Visualisation of frequency by country and conclusions:¶

In [142]:

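# df is not defined in this notebook; it is assumed to be a pandas DataFrame
# of the escapes table made available by `from helper import *`.
countries_frequency = df["Country"].value_counts()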
print_pretty_table(countries_frequency)

Country	Number of Occurrences
France	15
United States	8
Belgium	4
Greece	4
Canada	4
United Kingdom	2
Australia	2
Brazil	2
Mexico	1
Netherlands	1
Russia	1
Ireland	1
Chile	1
Puerto Rico	1
Italy	1
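
For readers without the helper's DataFrame, a similar frequency table can be produced in plain Python with `collections.Counter`. This is a sketch on a made-up list of countries (`sample_countries` is hypothetical), not the code that produced the table above.

```python
from collections import Counter

# Hypothetical country column, one entry per attempt (not the real table).
sample_countries = ["France", "Mexico", "France", "Ireland", "France", "Mexico"]

# most_common() returns (value, count) pairs sorted by descending count.
ranking = Counter(sample_countries).most_common()

for country, occurrences in ranking:
    print(country, occurrences)
```

With the real data, the country column would be `row[2]` of each row in `data`.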

Answer to Q2:¶

France has the most attempted helicopter prison escapes, with 15 attempts, followed by the United States with 8.

Processing the data further¶

Creating a list of countries without repetition:

In [143]:

countries_repetition = []
for row in data:
    countries_repetition.append(row[2])

countries = list(dict.fromkeys(countries_repetition))

Checking progress:

In [144]:

print(countries)

['Mexico', 'Ireland', 'United States', 'France', 'Canada', 'Australia', 'Brazil', 'Italy', 'United Kingdom', 'Puerto Rico', 'Chile', 'Netherlands', 'Greece', 'Belgium', 'Russia']

Counting the successful and failed attempts for each country:

In [145]:

outcome = []

for country in countries:
    outcome.append([country, 0, 0])
    
for row in data:
    for country in outcome:
        if row[2] == country[0]:
            if row[3] == 'Yes':
                country[1] += 1
            elif row[3] == 'No':
                country[-1] += 1

Checking progress:

In [146]:

print(outcome)

[['Mexico', 1, 0], ['Ireland', 1, 0], ['United States', 6, 2], ['France', 11, 4], ['Canada', 3, 1], ['Australia', 1, 1], ['Brazil', 2, 0], ['Italy', 1, 0], ['United Kingdom', 1, 1], ['Puerto Rico', 1, 0], ['Chile', 1, 0], ['Netherlands', 0, 1], ['Greece', 2, 2], ['Belgium', 2, 2], ['Russia', 1, 0]]

Calculating the success rate as a percentage of all attempts and checking progress:

In [147]:

success_rate = []

for country in outcome:
    rate = 100 * float(country[1]) / (float(country[1]) + float(country[2]))
    success_rate.append([country[0], int(rate)])

print(success_rate)

[['Mexico', 100], ['Ireland', 100], ['United States', 75], ['France', 73], ['Canada', 75], ['Australia', 50], ['Brazil', 100], ['Italy', 100], ['United Kingdom', 50], ['Puerto Rico', 100], ['Chile', 100], ['Netherlands', 0], ['Greece', 50], ['Belgium', 50], ['Russia', 100]]
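
As a worked example of the formula above, take France with 11 successful and 4 failed attempts. Note that `int()` truncates rather than rounds, which is why France shows as 73. The second case (2 successes, 1 failure) is hypothetical and only illustrates where truncation and rounding differ.

```python
# France, from the outcome list above: 11 successes, 4 failures.
successes, failures = 11, 4
rate = 100 * successes / (successes + failures)

print(int(rate))       # 73, int() drops the fractional part
print(round(rate, 1))  # 73.3

# Hypothetical case where truncating and rounding disagree.
other_rate = 100 * 2 / (2 + 1)
print(int(other_rate))    # 66
print(round(other_rate))  # 67
```

Using `round()` instead of `int()` would be a small change to the cell above, at the cost of slightly different printed values for some inputs.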

Visualising and concluding:

In [148]:

import matplotlib.pyplot as plt

countries = []
rates = []
# Unpack each [country, rate] pair; this also avoids shadowing the built-in `set`.
for country, rate in success_rate:
    countries.append(country)
    rates.append(rate)

plt.barh(countries, rates)
plt.title('Rate of success by country')
plt.ylabel('Countries')
plt.xlabel('Percentage rate of successes')
plt.show()

Answer to Q3:¶

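All recorded helicopter prison break attempts in Russia, Chile, Puerto Rico, Italy, Brazil, Ireland and Mexico were successful, so based solely on the success rate these countries offer the highest chance of success. Each of them has only one or two recorded attempts, however, so the 100% figure rests on very little data. Among countries with more attempts, the United States and Canada (75%) and France (73%) have the highest success rates.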
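
To judge how much weight each rate can bear, it may help to list the number of attempts next to it. The sketch below reuses the `outcome` values printed earlier and sorts by success rate, then by number of attempts; the name `summary` and the sort order are choices made here for illustration.

```python
# [country, successes, failures], copied from the printed outcome list above.
outcome = [
    ['Mexico', 1, 0], ['Ireland', 1, 0], ['United States', 6, 2],
    ['France', 11, 4], ['Canada', 3, 1], ['Australia', 1, 1],
    ['Brazil', 2, 0], ['Italy', 1, 0], ['United Kingdom', 1, 1],
    ['Puerto Rico', 1, 0], ['Chile', 1, 0], ['Netherlands', 0, 1],
    ['Greece', 2, 2], ['Belgium', 2, 2], ['Russia', 1, 0],
]

summary = []
for country, successes, failures in outcome:
    total = successes + failures
    summary.append((country, total, int(100 * successes / total)))

# Highest rate first; among equal rates, more attempts first.
summary.sort(key=lambda item: (-item[2], -item[1]))

for country, total, rate in summary:
    print(f"{country}: {rate}% of {total} attempt(s)")
```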