My First Data Science Project¶

Helicopter Escapes¶

We begin by importing some helper functions.

In [1]:

from helper import *

Get the Data¶

Now, let's get the data from the [List of helicopter prison escapes] (https://en.wikipedia.org/wiki/List_of_helicopter_prison_escapes) Wikipedia article.

Let's print the first three rows.

In [2]:

url = "https://en.wikipedia.org/wiki/List_of_helicopter_prison_escapes"
data = data_from_url(url)

for row in data[:3]:
    print(row)

['August 19, 1971', 'Santa Martha Acatitla', 'Mexico', 'Yes', 'Joel David Kaplan Carlos Antonio Contreras Castro', "Joel David Kaplan was a New York businessman who had been arrested for murder in 1962 in Mexico City and was incarcerated at the Santa Martha Acatitla prison in the Iztapalapa borough of Mexico City. Joel's sister, Judy Kaplan, arranged the means to help Kaplan escape, and on August 19, 1971, a helicopter landed in the prison yard. The guards mistakenly thought this was an official visit. In two minutes, Kaplan and his cellmate Carlos Antonio Contreras, a Venezuelan counterfeiter, were able to board the craft and were piloted away, before any shots were fired.[9] Both men were flown to Texas and then different planes flew Kaplan to California and Castro to Guatemala.[3] The Mexican government never initiated extradition proceedings against Kaplan.[9] The escape is told in a book, The 10-Second Jailbreak: The Helicopter Escape of Joel David Kaplan.[4] It also inspired the 1975 action movie Breakout, which starred Charles Bronson and Robert Duvall.[9]"]
['October 31, 1973', 'Mountjoy Jail', 'Ireland', 'Yes', "JB O'Hagan Seamus TwomeyKevin Mallon", 'On October 31, 1973 an IRA member hijacked a helicopter and forced the pilot to land in the exercise yard of Dublin\'s Mountjoy Jail\'s D Wing at 3:40\xa0p.m., October 31, 1973. Three members of the IRA were able to escape: JB O\'Hagan, Seamus Twomey and Kevin Mallon. Another prisoner who also was in the prison was quoted as saying, "One shamefaced screw apologised to the governor and said he thought it was the new Minister for Defence (Paddy Donegan) arriving. I told him it was our Minister of Defence leaving." The Mountjoy helicopter escape became Republican lore and was immortalized by "The Helicopter Song", which contains the lines "It\'s up like a bird and over the city. There\'s three men a\'missing I heard the warder say".[1]']
['May 24, 1978', 'United States Penitentiary, Marion', 'United States', 'No', 'Garrett Brock TrapnellMartin Joseph McNallyJames Kenneth Johnson', "43-year-old Barbara Ann Oswald hijacked a Saint Louis-based charter helicopter and forced the pilot to land in the yard at USP Marion. While landing the aircraft, the pilot, Allen Barklage, who was a Vietnam War veteran, struggled with Oswald and managed to wrestle the gun away from her. Barklage then shot and killed Oswald, thwarting the escape.[10] A few months later Oswald's daughter hijacked TWA Flight 541 in an effort to free Trapnell."]

Remove the detail column that takes up most of the dataset to make it much more manageable and readable.

In [3]:

index = 0
for row in data:
    data[index] = row[:-1]
    index += 1
    
print(data[:3])

[['August 19, 1971', 'Santa Martha Acatitla', 'Mexico', 'Yes', 'Joel David Kaplan Carlos Antonio Contreras Castro'], ['October 31, 1973', 'Mountjoy Jail', 'Ireland', 'Yes', "JB O'Hagan Seamus TwomeyKevin Mallon"], ['May 24, 1978', 'United States Penitentiary, Marion', 'United States', 'No', 'Garrett Brock TrapnellMartin Joseph McNallyJames Kenneth Johnson']]

Replace the month/date/year first column with only the year for each entry.

In [4]:

for row in data:
    row[0] = fetch_year(row[0])
    
print(data[:3])

[[1971, 'Santa Martha Acatitla', 'Mexico', 'Yes', 'Joel David Kaplan Carlos Antonio Contreras Castro'], [1973, 'Mountjoy Jail', 'Ireland', 'Yes', "JB O'Hagan Seamus TwomeyKevin Mallon"], [1978, 'United States Penitentiary, Marion', 'United States', 'No', 'Garrett Brock TrapnellMartin Joseph McNallyJames Kenneth Johnson']]

How many prison breaks occured each year?¶

Create list of lists with each year and zero placeholder.

In [5]:

min_year = min(data, key=lambda x: x[0])[0]
max_year = max(data, key=lambda x: x[0])[0]

years = []
for y in range(min_year, max_year + 1):
    years.append(y)
    
attempts_per_year = []
for y in years:
    attempts_per_year.append([y,0])
    
print(attempts_per_year)

[[1971, 0], [1972, 0], [1973, 0], [1974, 0], [1975, 0], [1976, 0], [1977, 0], [1978, 0], [1979, 0], [1980, 0], [1981, 0], [1982, 0], [1983, 0], [1984, 0], [1985, 0], [1986, 0], [1987, 0], [1988, 0], [1989, 0], [1990, 0], [1991, 0], [1992, 0], [1993, 0], [1994, 0], [1995, 0], [1996, 0], [1997, 0], [1998, 0], [1999, 0], [2000, 0], [2001, 0], [2002, 0], [2003, 0], [2004, 0], [2005, 0], [2006, 0], [2007, 0], [2008, 0], [2009, 0], [2010, 0], [2011, 0], [2012, 0], [2013, 0], [2014, 0], [2015, 0], [2016, 0], [2017, 0], [2018, 0], [2019, 0], [2020, 0]]

Update frequency list with each entry, so that the number of prison break attempts each year is the second column.

In [6]:

for row in data:
    for y in attempts_per_year:
        if row[0] == y[0]:
            y[1] += 1

print(attempts_per_year)

[[1971, 1], [1972, 0], [1973, 1], [1974, 0], [1975, 0], [1976, 0], [1977, 0], [1978, 1], [1979, 0], [1980, 0], [1981, 2], [1982, 0], [1983, 1], [1984, 0], [1985, 2], [1986, 3], [1987, 1], [1988, 1], [1989, 2], [1990, 1], [1991, 1], [1992, 2], [1993, 1], [1994, 0], [1995, 0], [1996, 1], [1997, 1], [1998, 0], [1999, 1], [2000, 2], [2001, 3], [2002, 2], [2003, 1], [2004, 0], [2005, 2], [2006, 1], [2007, 3], [2008, 0], [2009, 3], [2010, 1], [2011, 0], [2012, 1], [2013, 2], [2014, 1], [2015, 0], [2016, 1], [2017, 0], [2018, 1], [2019, 0], [2020, 1]]

Create a bar plot visualization to display data in frequency list.

In [7]:

%matplotlib inline
barplot(attempts_per_year)

From the bar plot above, it is evident that the most prison breaks attempts using helicopters occured in the years 1986, 2001, 2007, and 2009. It would be interesting to see if this trend is due to any social or political conditions.

Use code to justify the result obtained via the barplot.

In [8]:

max_attempts = 0
for y in attempts_per_year:
    if y[1] > max_attempts:
        max_attempts = y[1]

years_max = []
for y in attempts_per_year:
    if y[1] == max_attempts:
        years_max.append(y[0])
        
print(years_max)
        

[1986, 2001, 2007, 2009]

The code above confirms the conclusion drawn in the boxplot: the most prison break attempts using helicopters did occur in 1986, 2001, 2007, and 2009.

How many prison break attempts occured in each country?¶

Create a table with number of prison break attempt occurences per country.

In [9]:

countries_frequency = df["Country"].value_counts()
print_pretty_table(countries_frequency)

Country	Number of Occurrences
France	15
United States	8
Canada	4
Greece	4
Belgium	4
Brazil	2
Australia	2
United Kingdom	2
Russia	1
Puerto Rico	1
Mexico	1
Netherlands	1
Chile	1
Italy	1
Ireland	1

France had the highest number of prison break occurences of any country. Would be interesting to see this table reorganized per capita and per prison.

In which countries do helicopter prison breaks have a higher chance of success?¶

Create two dictionaries with countries as keys, one with number of successes, other with total number of attempts.

In [10]:

success_dict = {}
for row in data:
    if row[2] in success_dict and row[3] == "Yes":
        success_dict[row[2]] += 1
    elif row[2] not in success_dict:
        if row[3] == "Yes":
            success_dict[row[2]] = 1
        elif row[3] == "No":
            success_dict[row[2]] = 0
            
print(success_dict)

{'Mexico': 1, 'Ireland': 1, 'United States': 6, 'France': 11, 'Canada': 3, 'Australia': 1, 'Brazil': 2, 'Italy': 1, 'United Kingdom': 1, 'Puerto Rico': 1, 'Chile': 1, 'Netherlands': 0, 'Greece': 2, 'Belgium': 2, 'Russia': 1}

In [11]:

total_dict = {}
country_list = []
for row in data:
    if row[2] in total_dict:
        total_dict[row[2]] += 1
    else:
        total_dict[row[2]] = 1
        country_list.append(row[2])
        
print(total_dict)

{'Mexico': 1, 'Ireland': 1, 'United States': 8, 'France': 15, 'Canada': 4, 'Australia': 2, 'Brazil': 2, 'Italy': 1, 'United Kingdom': 2, 'Puerto Rico': 1, 'Chile': 1, 'Netherlands': 1, 'Greece': 4, 'Belgium': 4, 'Russia': 1}

Create dictionary with country and rate of success.

In [12]:

percent_dict = {}
for country in country_list:
    percent_dict[country] = str(100*round(success_dict[country]/total_dict[country], 2)) + "%"
    
print(percent_dict)

{'Mexico': '100.0%', 'Ireland': '100.0%', 'United States': '75.0%', 'France': '73.0%', 'Canada': '75.0%', 'Australia': '50.0%', 'Brazil': '100.0%', 'Italy': '100.0%', 'United Kingdom': '50.0%', 'Puerto Rico': '100.0%', 'Chile': '100.0%', 'Netherlands': '0.0%', 'Greece': '50.0%', 'Belgium': '50.0%', 'Russia': '100.0%'}

From the output above, Mexico, Ireland, Brazil, Italy, Puerto Rico, Chile, and Russia need to improve their helicopter prison escape security procedure!

How does the number of escapees affect the success?¶

Create a column for each entry with the number of escapees.

In [17]:

#note: to answer the question, it is difficult to identify the start and end of each person's name, so the total
#number of words in each entry's name column will be used

for row in data:
    name_list = row[4].split()
    row.append(len(name_list))
    
print(data[:3])

[[1971, 'Santa Martha Acatitla', 'Mexico', 'Yes', 'Joel David Kaplan Carlos Antonio Contreras Castro', 7, 7], [1973, 'Mountjoy Jail', 'Ireland', 'Yes', "JB O'Hagan Seamus TwomeyKevin Mallon", 5, 5], [1978, 'United States Penitentiary, Marion', 'United States', 'No', 'Garrett Brock TrapnellMartin Joseph McNallyJames Kenneth Johnson', 7, 7]]

Create two dictionaries with the number of escapees as the key, one with the number of successes and the other with the total number of attempts.

In [18]:

es_num_success = {}
for row in data:
    if row[5] in es_num_success and row[3] == "Yes":
        es_num_success[row[5]] += 1
    elif row[5] not in es_num_success:
        if row[3] == "Yes":
            es_num_success[row[5]] = 1
        elif row[3] == "No":
            es_num_success[row[5]] = 0
            
print(es_num_success)

{7: 3, 5: 2, 3: 7, 2: 11, 4: 5, 1: 3, 8: 1, 10: 1, 6: 1}

In [22]:

es_num_total = {}
num_list = []
for row in data:
    if row[5] in es_num_total:
        es_num_total[row[5]] += 1
    else:
        es_num_total[row[5]] = 1
        num_list.append(row[5])

num_list.sort()
print(es_num_total)

{7: 4, 5: 3, 3: 8, 2: 18, 4: 5, 1: 7, 8: 1, 10: 1, 6: 1}

Create a dictionary with the key as the number of escapees, and the value as the success percentage.

In [24]:

es_num_percent = {}
for num in num_list:
    es_num_percent[num] = str(round(100*es_num_success[num]/es_num_total[num], 2)) + "%"
    
print(es_num_percent)
print(len(data))

{1: '42.86%', 2: '61.11%', 3: '87.5%', 4: '100.0%', 5: '66.67%', 6: '100.0%', 7: '75.0%', 8: '100.0%', 10: '100.0%'}
48

From the dictionary above, the data suggests that the more escapees present during a prison break the higher the chance of success is. Prison escapes with 4, 6, 8, and 10 escapees all had 100% success rates. However, it is worth noting that the data set is small, with n = 48.

Which escapees have done it more than once?¶

Create list of escapees that have tried to escape prison via helicopter more than once.

In [29]:

entry_names_dict = {}
for row in data:
    list_names = row[4].split()
    for name in list_names:
        if name in entry_names_dict:
            entry_names_dict[name] += 1
        else:
            entry_names_dict[name] = 1
            
entry_repeat = []
for name in entry_names_dict:
    if entry_names_dict[name] > 1 and len(name) > 1:
        entry_repeat.append(name)
        
for name in entry_repeat:
    print(name)

David
Carlos
Michel
Vaujour
Rodriguez
Pascal
Payet
Jose
Diaz
Eric

This is a list of the names of the escapees who have tried to escape prison via helicopter more than once.