In this analysis we ask in which years and in which countries the most helicopter prison escape attempts occurred, and in which countries those attempts succeeded most often.
We begin by importing some helper functions.
from helper import *
Now, let's get the data from the List of helicopter prison escapes Wikipedia article.
url = "https://en.wikipedia.org/wiki/List_of_helicopter_prison_escapes"
data_with_description = data_from_url(url)
data = data_with_description
Let's remove the 'Details' column from our list.
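# 'Details' is the last column, so we keep everything except the last element.
# data and data_with_description are the same list object, so both are trimmed.
for index, row in enumerate(data):
    data[index] = row[:-1]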
Checking the progress:
print(data[:3])
[['August 19, 1971', 'Santa Martha Acatitla', 'Mexico', 'Yes', 'Joel David Kaplan Carlos Antonio Contreras Castro'], ['October 31, 1973', 'Mountjoy Jail', 'Ireland', 'Yes', "JB O'Hagan Seamus TwomeyKevin Mallon"], ['May 24, 1978', 'United States Penitentiary, Marion', 'United States', 'No', 'Garrett Brock TrapnellMartin Joseph McNallyJames Kenneth Johnson']]
Extracting only the year from the date column:
for row in data:
    year = fetch_year(row[0])
    row[0] = year
Checking progress:
print(data[:3])
[[1971, 'Santa Martha Acatitla', 'Mexico', 'Yes', 'Joel David Kaplan Carlos Antonio Contreras Castro'], [1973, 'Mountjoy Jail', 'Ireland', 'Yes', "JB O'Hagan Seamus TwomeyKevin Mallon"], [1978, 'United States Penitentiary, Marion', 'United States', 'No', 'Garrett Brock TrapnellMartin Joseph McNallyJames Kenneth Johnson']]
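The `fetch_year` function comes from `helper`, whose source is not shown here. As a rough idea of what such a function might do, here is a minimal sketch that assumes the date strings always contain a four-digit year. The name `fetch_year_sketch` is hypothetical and is not the helper's actual implementation.

```python
import re


def fetch_year_sketch(date_string):
    """Hypothetical stand-in for helper.fetch_year: return the first 4-digit number as an int."""
    match = re.search(r"\b(\d{4})\b", date_string)
    if match is None:
        raise ValueError(f"no four-digit year found in {date_string!r}")
    return int(match.group(1))


print(fetch_year_sketch("August 19, 1971"))  # 1971
```

Raising an error on a date without a year makes a malformed row visible immediately instead of silently producing a wrong count later.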
Creating a list of every year from the earliest to the latest in the table, including years with no attempts:
min_year = min(data, key=lambda x: x[0])[0]
max_year = max(data, key=lambda x: x[0])[0]
years = list(range(min_year, max_year + 1))
# Each entry is [year, number of attempts], starting at zero.
attempts_per_year = [[year, 0] for year in years]
Checking progress:
print(attempts_per_year)
[[1971, 0], [1972, 0], [1973, 0], [1974, 0], [1975, 0], [1976, 0], [1977, 0], [1978, 0], [1979, 0], [1980, 0], [1981, 0], [1982, 0], [1983, 0], [1984, 0], [1985, 0], [1986, 0], [1987, 0], [1988, 0], [1989, 0], [1990, 0], [1991, 0], [1992, 0], [1993, 0], [1994, 0], [1995, 0], [1996, 0], [1997, 0], [1998, 0], [1999, 0], [2000, 0], [2001, 0], [2002, 0], [2003, 0], [2004, 0], [2005, 0], [2006, 0], [2007, 0], [2008, 0], [2009, 0], [2010, 0], [2011, 0], [2012, 0], [2013, 0], [2014, 0], [2015, 0], [2016, 0], [2017, 0], [2018, 0], [2019, 0], [2020, 0]]
Counting the number of attempts per year and adding them to the list of years:
for row in data:
    for year_attempt in attempts_per_year:
        year = year_attempt[0]
        if row[0] == year:
            year_attempt[1] += 1
print(attempts_per_year)
[[1971, 1], [1972, 0], [1973, 1], [1974, 0], [1975, 0], [1976, 0], [1977, 0], [1978, 1], [1979, 0], [1980, 0], [1981, 2], [1982, 0], [1983, 1], [1984, 0], [1985, 2], [1986, 3], [1987, 1], [1988, 1], [1989, 2], [1990, 1], [1991, 1], [1992, 2], [1993, 1], [1994, 0], [1995, 0], [1996, 1], [1997, 1], [1998, 0], [1999, 1], [2000, 2], [2001, 3], [2002, 2], [2003, 1], [2004, 0], [2005, 2], [2006, 1], [2007, 3], [2008, 0], [2009, 3], [2010, 1], [2011, 0], [2012, 1], [2013, 2], [2014, 1], [2015, 0], [2016, 1], [2017, 0], [2018, 1], [2019, 0], [2020, 1]]
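The nested loop above compares every row with every year. If the counts are keyed by year, a single pass over the rows is enough. The following is a minimal sketch of that idea, run on the three rows printed earlier (trimmed to four columns). The function name `count_attempts_per_year` is hypothetical and not part of `helper`.

```python
# Three rows from the output shown earlier, without the escapee names.
sample = [
    [1971, "Santa Martha Acatitla", "Mexico", "Yes"],
    [1973, "Mountjoy Jail", "Ireland", "Yes"],
    [1978, "United States Penitentiary, Marion", "United States", "No"],
]


def count_attempts_per_year(rows):
    """Return [[year, count], ...] for every year between the first and last attempt."""
    years = [row[0] for row in rows]
    # Start every year in the range at zero so that empty years are kept.
    counts = dict.fromkeys(range(min(years), max(years) + 1), 0)
    for year in years:
        counts[year] += 1
    return [[year, count] for year, count in counts.items()]


result = count_attempts_per_year(sample)
print(result[:3])  # [[1971, 1], [1972, 0], [1973, 1]]
```

The result has the same shape as `attempts_per_year`, so it could be passed to `barplot` in the same way.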
%matplotlib inline
barplot(attempts_per_year)
The most attempts at breaking out of prison with a helicopter occurred in 1986, 2001, 2007 and 2009, with 3 attempts in each of these years.
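The peak years can also be read directly from the counts instead of from the bar plot. The sketch below rebuilds `attempts_per_year` from the printed output (the `counts` list is copied from it, decade by decade) and then picks every year that reaches the maximum.

```python
# Attempts per year for 1971-2020, copied from the printed attempts_per_year output.
counts = [1, 0, 1, 0, 0, 0, 0, 1, 0, 0,   # 1971-1980
          2, 0, 1, 0, 2, 3, 1, 1, 2, 1,   # 1981-1990
          1, 2, 1, 0, 0, 1, 1, 0, 1, 2,   # 1991-2000
          3, 2, 1, 0, 2, 1, 3, 0, 3, 1,   # 2001-2010
          0, 1, 2, 1, 0, 1, 0, 1, 0, 1]   # 2011-2020
attempts_per_year = [[1971 + offset, count] for offset, count in enumerate(counts)]

# Highest number of attempts in any single year, and all years that reach it.
max_attempts = max(count for _, count in attempts_per_year)
peak_years = [year for year, count in attempts_per_year if count == max_attempts]

print(max_attempts)  # 3
print(peak_years)    # [1986, 2001, 2007, 2009]
```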
countries_frequency = df["Country"].value_counts()
print_pretty_table(countries_frequency)
Country | Number of Occurrences |
---|---|
France | 15 |
United States | 8 |
Belgium | 4 |
Greece | 4 |
Canada | 4 |
United Kingdom | 2 |
Australia | 2 |
Brazil | 2 |
Mexico | 1 |
Netherlands | 1 |
Russia | 1 |
Ireland | 1 |
Chile | 1 |
Puerto Rico | 1 |
Italy | 1 |
France has the most attempted helicopter prison escapes (15), nearly twice as many as the United States (8).
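The `df` object used for the frequency table is not defined in the cells shown here, so it presumably comes from `helper`. The same count can be made from the plain list of rows with `collections.Counter`. This is a minimal sketch on the three rows printed earlier, so every country appears exactly once; on the full `data` list the totals would correspond to the table above.

```python
from collections import Counter

# Three rows from the output shown earlier, without the escapee names.
sample = [
    [1971, "Santa Martha Acatitla", "Mexico", "Yes"],
    [1973, "Mountjoy Jail", "Ireland", "Yes"],
    [1978, "United States Penitentiary, Marion", "United States", "No"],
]

# The country is the third column (index 2) of each row.
country_counts = Counter(row[2] for row in sample)

# most_common() sorts by count, highest first.
for country, occurrences in country_counts.most_common():
    print(f"{country} | {occurrences}")
```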
Creating a list of countries without repetition:
countries_repetition = []
for row in data:
    countries_repetition.append(row[2])
countries = list(dict.fromkeys(countries_repetition))
Checking progress:
print(countries)
['Mexico', 'Ireland', 'United States', 'France', 'Canada', 'Australia', 'Brazil', 'Italy', 'United Kingdom', 'Puerto Rico', 'Chile', 'Netherlands', 'Greece', 'Belgium', 'Russia']
# Each entry is [country, successes, failures].
outcome = []
for country in countries:
    outcome.append([country, 0, 0])
for row in data:
    for country in outcome:
        if row[2] == country[0]:
            if row[3] == 'Yes':
                country[1] += 1
            elif row[3] == 'No':
                country[2] += 1
Checking progress:
print(outcome)
[['Mexico', 1, 0], ['Ireland', 1, 0], ['United States', 6, 2], ['France', 11, 4], ['Canada', 3, 1], ['Australia', 1, 1], ['Brazil', 2, 0], ['Italy', 1, 0], ['United Kingdom', 1, 1], ['Puerto Rico', 1, 0], ['Chile', 1, 0], ['Netherlands', 0, 1], ['Greece', 2, 2], ['Belgium', 2, 2], ['Russia', 1, 0]]
Calculating the success rate as a percentage of all attempts and checking progress:
success_rate = []
for country in outcome:
    # Successes divided by all attempts; int() truncates the decimals.
    rate = 100 * float(country[1]) / (float(country[1]) + float(country[2]))
    success_rate.append([country[0], int(rate)])
print(success_rate)
[['Mexico', 100], ['Ireland', 100], ['United States', 75], ['France', 73], ['Canada', 75], ['Australia', 50], ['Brazil', 100], ['Italy', 100], ['United Kingdom', 50], ['Puerto Rico', 100], ['Chile', 100], ['Netherlands', 0], ['Greece', 50], ['Belgium', 50], ['Russia', 100]]
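The calculation above truncates the percentage with `int` and would fail with a division by zero for a country with no 'Yes' or 'No' outcome. A slightly more defensive variant might look like the sketch below. The function name `success_rate_percent` is hypothetical, and the three rows are copied from the `outcome` output.

```python
def success_rate_percent(successes, failures):
    """Return the success rate in percent, rounded to one decimal, or None without attempts."""
    total = successes + failures
    if total == 0:
        return None
    return round(100 * successes / total, 1)


# [country, successes, failures], copied from the outcome output above.
outcome_sample = [["France", 11, 4], ["United States", 6, 2], ["Netherlands", 0, 1]]
rates = [[country, success_rate_percent(s, f)] for country, s, f in outcome_sample]

print(rates)  # [['France', 73.3], ['United States', 75.0], ['Netherlands', 0.0]]
```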
Visualising and concluding:
import matplotlib.pyplot as plt
countries = []
rates = []
# 'entry' instead of 'set', which would shadow the built-in type.
for entry in success_rate:
    countries.append(entry[0])
    rates.append(entry[-1])
plt.barh(countries, rates)
plt.title('Rate of success by country')
plt.ylabel('Countries')
plt.xlabel('Success rate (%)')
plt.show()
All recorded helicopter prison break attempts succeeded in Russia, Chile, Puerto Rico, Italy, Brazil, Ireland and Mexico. On success rate alone these countries rank highest, but each of them has only one or two recorded attempts, so the 100% figure says little about the actual chance of success there.
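To see how much weight each rate can carry, it helps to put the number of attempts next to it. The sketch below uses the `outcome` values printed earlier. The threshold `MIN_ATTEMPTS = 4` is an arbitrary choice for illustration and is not part of the original analysis.

```python
# [country, successes, failures], copied from the outcome output above.
outcome = [
    ["Mexico", 1, 0], ["Ireland", 1, 0], ["United States", 6, 2],
    ["France", 11, 4], ["Canada", 3, 1], ["Australia", 1, 1],
    ["Brazil", 2, 0], ["Italy", 1, 0], ["United Kingdom", 1, 1],
    ["Puerto Rico", 1, 0], ["Chile", 1, 0], ["Netherlands", 0, 1],
    ["Greece", 2, 2], ["Belgium", 2, 2], ["Russia", 1, 0],
]

# Countries with a 100% success rate, together with their number of attempts.
perfect = [(country, s + f) for country, s, f in outcome if f == 0 and s > 0]
print(perfect)

# Countries with enough attempts for the rate to be somewhat more informative.
MIN_ATTEMPTS = 4  # assumed threshold, chosen only for illustration
larger = [
    (country, s + f, int(100 * s / (s + f)))
    for country, s, f in outcome
    if s + f >= MIN_ATTEMPTS
]
print(larger)
```

Among the countries with at least four attempts, the United States and Canada (75%) and France (73%) have the highest success rates.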