#!/usr/bin/env python
# coding: utf-8

# # My First Data Science Project - Helicopter Prison Break Escapes

# ### Background
# There have been multiple prison escapes where an inmate escapes by means of a helicopter. One of the earliest instances was the escape of Joel David Kaplan, nicknamed "Man Fan", on August 19, 1971, from the Santa Martha Acatitla in Mexico.
#
# **The following questions are to be answered**
# 1: In which year did the most helicopter prison break attempts occur?
# 2: In which countries do the most attempted helicopter prison breaks occur?
# 3: In which countries do helicopter prison breaks have a higher chance of success?
# 4: How does the number of escapees affect the success?
# 5: Which escapees have done it more than once?
#
# To answer the above questions, data from this [Wikipedia page](https://en.wikipedia.org/wiki/List_of_helicopter_prison_escapes) was scraped and analyzed. It covers the details of attempted helicopter prison escapes over a period of 50 years (1971-2020).
#
# **Exploring the dataset**
# The dataset contains six data fields:
#
# 1- Date: the date of the attempted prison break
# 2- Prison name: the name of the prison
# 3- Country: where the prison break happened
# 4- Status: whether the attempt was successful or not
# 5- Names: the names of the escapees
# 6- More details: other details of the prison break
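#
# As a rough illustration of how the six fields line up with list positions, here is a minimal sketch. It assumes each record is a plain Python list in the field order listed above; `FIELDS` and `example_row` are hypothetical names, and the row is an illustrative stand-in built from the Kaplan escape mentioned in the background, not a record copied from the scraped table.

```python
# Hypothetical field names, in the order listed above
FIELDS = ["Date", "Prison name", "Country", "Status", "Names", "More details"]

# Illustrative stand-in row (assumed layout, not the scraped record)
example_row = [
    "August 19, 1971",
    "Santa Martha Acatitla",
    "Mexico",
    "Yes",
    "Joel David Kaplan",
    "(free-text description of the escape)",
]

# Pair each field name with its value to see which index holds what
for position, (field, value) in enumerate(zip(FIELDS, example_row)):
    print(position, field, "->", value)

# The date is the first entry; slicing with [:-1] drops the last entry (the details)
first_field = example_row[0]
without_details = example_row[:-1]
```

# Under this layout, `row[0]` is the date and `row[:-1]` is the row without its details field.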
# ## We begin by importing some helper functions

# In[1]:


# Import the needed helper functions
from helper import *


# ## Get the Data

# In[2]:


# Get the needed data from the data source
url = "https://en.wikipedia.org/wiki/List_of_helicopter_prison_escapes"
data = data_from_url(url)


# ## Let's print the first three rows

# In[3]:


# Loop through the first 3 records and print each one
for row in data[:3]:
    print(row)


# ## Removing the details
#
# We initialize an index variable with the value of 0. The purpose of this variable is to help us track which row we're modifying.

# In[4]:


# Remove the details column, as it is not needed in the analysis
index = 0
for row in data:
    data[index] = row[:-1]
    index += 1


# In[5]:


# Print the first 3 rows to confirm that the details column is gone
print(data[:3])


# ### Extracting the Year
# In the code cell below, we iterate over `data` using the loop variable `row` and:
#
# * With every occurrence of `row[0]`, we refer to the first entry of `row`, i.e., the date.
# * Thus, with `date = fetch_year(row[0])`, we're extracting the year out of the date in `row[0]` and assigning it to the variable `date`.
# * We then replace the value of `row[0]` with the year that we just extracted.

# In[6]:


# We don't need the month and day, so keep only the year
for row in data:
    date = fetch_year(row[0])
    row[0] = date


# In[7]:


# Print the first 3 rows to confirm that the year has been extracted
print(data[:3])


# ### Number of attempted prison breaks per year

# In[8]:


# Earliest and latest year in the dataset (the year is the first entry of each row)
min_year = min(data, key=lambda x: x[0])[0]
max_year = max(data, key=lambda x: x[0])[0]


# Before continuing, let's check the earliest and latest years in our dataset.

# In[9]:


print(min_year)  # the minimum year
print(max_year)  # the maximum year


# Now we'll create a list of all the years ranging from min_year to max_year. Our goal is then to determine how many prison break attempts there were in each year. Years without any prison break attempt do not appear in the dataset, so building the full range ensures they are still included, with a count of zero.
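#
# The zero-filled counting idea described above can be sketched compactly on toy data. This is only an illustration using `collections.Counter` from the standard library, not the approach taken in the cells that follow; `toy_year_rows` and the other names are hypothetical.

```python
from collections import Counter

# Toy stand-in for the cleaned rows: only the year (first entry) matters here
toy_year_rows = [[1971, "..."], [1973, "..."], [1973, "..."], [1975, "..."]]

# Count how many rows fall in each year that is present
toy_counts = Counter(row[0] for row in toy_year_rows)
toy_first_year = min(toy_counts)
toy_last_year = max(toy_counts)

# A Counter returns 0 for missing keys, so gap years are kept with a count of 0
toy_attempts_per_year = [
    [year, toy_counts[year]] for year in range(toy_first_year, toy_last_year + 1)
]
print(toy_attempts_per_year)
```

# The cells below build the same `[year, count]` structure step by step with plain lists and loops.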
# In[10]:


years = []
for y in range(min_year, max_year + 1):
    years.append(y)


# Check `years` to see if it gives what we want

# In[11]:


print(years)


# The years look as expected.

# Now we create a list where each element looks like `[<year>, 0]`.

# In[12]:


attempts_per_year = []
for y in years:
    attempts_per_year.append([y, 0])


# In[13]:


# For every attempt, increment the counter of the matching year
for row in data:
    for ya in attempts_per_year:
        y = ya[0]
        if row[0] == y:
            ya[1] += 1

print(attempts_per_year)


# ### Visualize the number of attempts per year

# In[14]:


get_ipython().run_line_magic('matplotlib', 'inline')


# In[15]:


barplot(attempts_per_year)


# The years in which the most helicopter prison break attempts occurred were 1986, 2001, 2007 and 2009, with a total of three attempts each.

# In[16]:


# NOTE: `df` is not defined in this notebook; it is assumed to be a pandas DataFrame
# of the same table, made available by `from helper import *`.
frequency_by_countries = df["Country"].value_counts()
print_pretty_table(frequency_by_countries)


# ### Visualize the number of occurrences of attempted helicopter prison escapes per country

# In[17]:


# The `x`/`y` arguments are only used when plotting a DataFrame, so for this Series
# the axis labels are set on the returned Axes instead
ax = df['Country'].value_counts().plot(kind='bar', title='Prison Breaks per Country', figsize=(12, 6))
ax.set_xlabel('Country')
ax.set_ylabel('No. of Escapes')


# France has the highest number of attempted helicopter prison escapes, followed by the United States.

# ### Number of successful and unsuccessful attempts of helicopter prison escapes

# In[18]:


df["Succeeded"].value_counts()


# In[19]:


# Counting the number of successful and failed attempts
success_count = df.pivot_table(index="Succeeded", values="Country", aggfunc="count")
success_count.columns = ["Counts"]
success_count.sort_index(inplace=True, ascending=False)
success_count


# In[20]:


import matplotlib.pyplot as plt  # in case `plt` is not already exposed by `from helper import *`

plt.pie(data=success_count, x="Counts", labels=("Successful", "Not Successful"), autopct='%1.f%%')
plt.title("Percentages of Successful and Unsuccessful Helicopter Prison Breaks")


# As depicted in the chart above, 71% (34) of the helicopter prison breaks were recorded as successful, with the prisoners able to escape by helicopter, while 29% were recorded as unsuccessful.

# ## Conclusion
# **This project analysed the data of helicopter prison escapes from 1971 to 2020.**
# 1: It can be deduced that the years 1986, 2001, 2007 and 2009 had the highest number of helicopter prison escape attempts, with a total of three attempts each.
# 2: France has the highest number of attempted helicopter prison escapes with 15 attempts, followed by the United States with 8 attempts.
# 3: Helicopter prison breaks have a higher chance of success in France.
# 4: 71% (34) of the helicopter prison breaks were recorded as successful, with the prisoners able to escape by helicopter, while 29% were recorded as unsuccessful.
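#
# Conclusion 3 compares success rates across countries. One way such a rate could be computed from rows in the cleaned layout is sketched below. The rows are made up for illustration, the `"Yes"`/`"No"` status encoding is an assumption, and `toy_country_rows`, `tally` and `success_rate` are hypothetical names.

```python
from collections import defaultdict

# Made-up toy rows in the cleaned layout: [year, prison, country, status, names]
toy_country_rows = [
    [2001, "Prison 1", "Country A", "Yes", "..."],
    [2003, "Prison 2", "Country A", "No", "..."],
    [2005, "Prison 1", "Country A", "Yes", "..."],
    [2007, "Prison 3", "Country B", "Yes", "..."],
    [2009, "Prison 3", "Country B", "No", "..."],
]

# country -> [number of successes, number of attempts]
tally = defaultdict(lambda: [0, 0])
for row in toy_country_rows:
    country, status = row[2], row[3]
    tally[country][1] += 1
    if status == "Yes":  # assumed encoding of a successful attempt
        tally[country][0] += 1

# Share of successful attempts per country
success_rate = {country: wins / total for country, (wins, total) in tally.items()}

for country, rate in success_rate.items():
    print(f"{country}: {rate:.0%} of {tally[country][1]} attempts")
```

# Printing the number of attempts next to each rate helps when reading the result, since a country with only one or two attempts can only show a rate of 0%, 50% or 100%.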
#
# **QUESTION:** Which of these is correct when narrating the steps taken: using **We** or **I**? E.g., "Now **we** will create a list of all the years ranging from min_year to max_year".
#
# Thank you for reading, and feel free to let me know your observations, corrections or suggestions.