Given some data from the Wikipedia list of helicopter prison escapes, we answer some basic questions.
In which year did the most helicopter prison break attempts occur?
In which countries do the most attempted helicopter prison escapes occur?
We begin by importing some helper functions.
from helper import *
Now, let's get the data from the List of helicopter prison escapes Wikipedia article.
url = "https://en.wikipedia.org/wiki/List_of_helicopter_prison_escapes"
data = data_from_url(url)
Let's print the first three rows:
for item in data[0:3]:
    print(item)
['August 19, 1971', 'Santa Martha Acatitla', 'Mexico', 'Yes', 'Joel David Kaplan Carlos Antonio Contreras Castro', "Joel David Kaplan was a New York businessman who had been arrested for murder in 1962 in Mexico City and was incarcerated at the Santa Martha Acatitla prison in the Iztapalapa borough of Mexico City. Joel's sister, Judy Kaplan, arranged the means to help Kaplan escape, and on August 19, 1971, a helicopter landed in the prison yard. The guards mistakenly thought this was an official visit. In two minutes, Kaplan and his cellmate Carlos Antonio Contreras, a Venezuelan counterfeiter, were able to board the craft and were piloted away, before any shots were fired.[9] Both men were flown to Texas and then different planes flew Kaplan to California and Castro to Guatemala.[3] The Mexican government never initiated extradition proceedings against Kaplan.[9] The escape is told in a book, The 10-Second Jailbreak: The Helicopter Escape of Joel David Kaplan.[4] It also inspired the 1975 action movie Breakout, which starred Charles Bronson and Robert Duvall.[9]"] ['October 31, 1973', 'Mountjoy Jail', 'Ireland', 'Yes', "JB O'Hagan Seamus TwomeyKevin Mallon", 'On October 31, 1973 an IRA member hijacked a helicopter and forced the pilot to land in the exercise yard of Dublin\'s Mountjoy Jail\'s D Wing at 3:40\xa0p.m., October 31, 1973. Three members of the IRA were able to escape: JB O\'Hagan, Seamus Twomey and Kevin Mallon. Another prisoner who also was in the prison was quoted as saying, "One shamefaced screw apologised to the governor and said he thought it was the new Minister for Defence (Paddy Donegan) arriving. I told him it was our Minister of Defence leaving." The Mountjoy helicopter escape became Republican lore and was immortalized by "The Helicopter Song", which contains the lines "It\'s up like a bird and over the city. 
There\'s three men a\'missing I heard the warder say".[1]'] ['May 24, 1978', 'United States Penitentiary, Marion', 'United States', 'No', 'Garrett Brock TrapnellMartin Joseph McNallyJames Kenneth Johnson', "43-year-old Barbara Ann Oswald hijacked a Saint Louis-based charter helicopter and forced the pilot to land in the yard at USP Marion. While landing the aircraft, the pilot, Allen Barklage, who was a Vietnam War veteran, struggled with Oswald and managed to wrestle the gun away from her. Barklage then shot and killed Oswald, thwarting the escape.[10] A few months later Oswald's daughter hijacked TWA Flight 541 in an effort to free Trapnell."]
In order to move ahead with the data analysis, we need to retain just the necessary information from the lists of strings of data above.
We do that by recognising that the last entry in each sub-list holds the description of the escape, which we can safely ignore for our purposes.
# Replace each row with a copy that excludes the last column
for index, row in enumerate(data):
    data[index] = row[:-1]
print(data[:3])  # print the first 3 rows to check the new values
[['August 19, 1971', 'Santa Martha Acatitla', 'Mexico', 'Yes', 'Joel David Kaplan Carlos Antonio Contreras Castro'], ['October 31, 1973', 'Mountjoy Jail', 'Ireland', 'Yes', "JB O'Hagan Seamus TwomeyKevin Mallon"], ['May 24, 1978', 'United States Penitentiary, Marion', 'United States', 'No', 'Garrett Brock TrapnellMartin Joseph McNallyJames Kenneth Johnson']]
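The same trimming can also be written as a list comprehension that builds a new list instead of overwriting rows in place. A minimal sketch, using a two-row stand-in for `data` (the names `sample` and `trimmed`, and the shortened row contents, are illustrative and not from the notebook):

```python
# Illustrative stand-in for `data`: each row ends with a long description.
sample = [
    ["August 19, 1971", "Santa Martha Acatitla", "Mexico", "Yes", "Joel David Kaplan", "details ..."],
    ["October 31, 1973", "Mountjoy Jail", "Ireland", "Yes", "JB O'Hagan", "details ..."],
]

# Build a new list in which every row drops its last element.
trimmed = [row[:-1] for row in sample]

print(trimmed[0])
```

Because the comprehension leaves `sample` untouched, the original rows remain available if the descriptions are needed later.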
# Using fetch_year from helper.py, keep just the year from the date
# row[0] holds the date
for row in data:
    row[0] = fetch_year(row[0])
print(data[:3])
[[1971, 'Santa Martha Acatitla', 'Mexico', 'Yes', 'Joel David Kaplan Carlos Antonio Contreras Castro'], [1973, 'Mountjoy Jail', 'Ireland', 'Yes', "JB O'Hagan Seamus TwomeyKevin Mallon"], [1978, 'United States Penitentiary, Marion', 'United States', 'No', 'Garrett Brock TrapnellMartin Joseph McNallyJames Kenneth Johnson']]
#looking at the earliest and latest years in the dataset
min_year = min(data, key=lambda x: x[0])[0]
max_year = max(data, key=lambda x: x[0])[0]
print(f"Earliest year: {min_year}, and Latest year: {max_year}") #utilising f-strings to fill up data
Earliest year: 1971, and Latest year: 2020
# List of all years from 1971 to 2020
years = []
for y in range(min_year, max_year + 1):
    years.append(y)
years
[1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020]
Next we build a list of pairs, each of the form [<year>, 0], to hold the number of attempts per year.
attempts_per_year = []
for y in years:
    attempts_per_year.append([y, 0])
attempts_per_year
[[1971, 0], [1972, 0], [1973, 0], [1974, 0], [1975, 0], [1976, 0], [1977, 0], [1978, 0], [1979, 0], [1980, 0], [1981, 0], [1982, 0], [1983, 0], [1984, 0], [1985, 0], [1986, 0], [1987, 0], [1988, 0], [1989, 0], [1990, 0], [1991, 0], [1992, 0], [1993, 0], [1994, 0], [1995, 0], [1996, 0], [1997, 0], [1998, 0], [1999, 0], [2000, 0], [2001, 0], [2002, 0], [2003, 0], [2004, 0], [2005, 0], [2006, 0], [2007, 0], [2008, 0], [2009, 0], [2010, 0], [2011, 0], [2012, 0], [2013, 0], [2014, 0], [2015, 0], [2016, 0], [2017, 0], [2018, 0], [2019, 0], [2020, 0]]
We then increment the second entry of a pair, initially 0, by 1 each time its year appears in the data.
for row in data:
    for ya in attempts_per_year:
        y = ya[0]
        if row[0] == y:
            ya[1] += 1
print(attempts_per_year)
[[1971, 1], [1972, 0], [1973, 1], [1974, 0], [1975, 0], [1976, 0], [1977, 0], [1978, 1], [1979, 0], [1980, 0], [1981, 2], [1982, 0], [1983, 1], [1984, 0], [1985, 2], [1986, 3], [1987, 1], [1988, 1], [1989, 2], [1990, 1], [1991, 1], [1992, 2], [1993, 1], [1994, 0], [1995, 0], [1996, 1], [1997, 1], [1998, 0], [1999, 1], [2000, 2], [2001, 3], [2002, 2], [2003, 1], [2004, 0], [2005, 2], [2006, 1], [2007, 3], [2008, 0], [2009, 3], [2010, 1], [2011, 0], [2012, 1], [2013, 2], [2014, 1], [2015, 0], [2016, 1], [2017, 0], [2018, 1], [2019, 0], [2020, 1]]
%matplotlib inline
barplot(attempts_per_year)
The years in which the most helicopter prison break attempts occurred were 1986, 2001, 2007 and 2009, with a total of three attempts each.
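The peak years can also be found programmatically rather than read off the plot. A minimal sketch using `collections.Counter` from the standard library; `sample_years` is an illustrative stand-in for the year column of `data`, not the full dataset:

```python
from collections import Counter

# Illustrative stand-in for the year column (row[0]) of `data`.
sample_years = [1971, 1973, 1981, 1981, 1986, 1986, 1986, 2001, 2001, 2001]

# Count how often each year appears.
counts = Counter(sample_years)

# Find the highest count, then every year that reaches it.
most = max(counts.values())
peak_years = sorted(year for year, n in counts.items() if n == most)

print(most, peak_years)
```

A `Counter` only holds years that actually occur, so the zero-filled list built earlier is still useful when the bar plot should show the years without any attempts.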
# Count how many times each country appears in the dataset.
# `df` is not defined in this notebook; it is assumed to be a DataFrame provided by helper.py.
countries_frequency = df["Country"].value_counts()
print_pretty_table(countries_frequency)
Country | Number of Occurrences |
---|---|
France | 15 |
United States | 8 |
Canada | 4 |
Greece | 4 |
Belgium | 4 |
Australia | 2 |
Brazil | 2 |
United Kingdom | 2 |
Mexico | 1 |
Ireland | 1 |
Italy | 1 |
Puerto Rico | 1 |
Chile | 1 |
Netherlands | 1 |
Russia | 1 |
We see that the most helicopter prison escape attempts have happened in France.
# Build the list of all countries, with successful attempts initialised to 0
countries = []
for i in data:
    if i[2] not in countries:
        countries.append(i[2])
country_succ_attempt = []
for i in countries:
    country_succ_attempt.append([i, 0])
# For successful attempts, count how many times "Yes" appears for each country
for i in data:
    for n in country_succ_attempt:
        if i[2] == n[0]:
            if i[3] == "Yes":
                n[1] += 1
print(country_succ_attempt)
[['Mexico', 1], ['Ireland', 1], ['United States', 6], ['France', 11], ['Canada', 3], ['Australia', 1], ['Brazil', 2], ['Italy', 1], ['United Kingdom', 1], ['Puerto Rico', 1], ['Chile', 1], ['Netherlands', 0], ['Greece', 2], ['Belgium', 2], ['Russia', 1]]
# Pair each country with its total number of attempts
list_country = list(countries_frequency.index)
list_attempts = list(countries_frequency.values)
country_tot_attempt = []
index = 0
for i in list_attempts:
    country_tot_attempt.append([list_country[index], i])
    index += 1
country_tot_attempt
[['France', 15], ['United States', 8], ['Canada', 4], ['Greece', 4], ['Belgium', 4], ['Australia', 2], ['Brazil', 2], ['United Kingdom', 2], ['Mexico', 1], ['Ireland', 1], ['Italy', 1], ['Puerto Rico', 1], ['Chile', 1], ['Netherlands', 1], ['Russia', 1]]
# Success ratio = successful attempts / total attempts, per country
for i in country_tot_attempt:
    for n in country_succ_attempt:
        if n[0] == i[0]:
            succ_ratio = n[1] / i[1]
            print(i[0], succ_ratio)
France 0.7333333333333333
United States 0.75
Canada 0.75
Greece 0.5
Belgium 0.5
Australia 0.5
Brazil 1.0
United Kingdom 0.5
Mexico 1.0
Ireland 1.0
Italy 1.0
Puerto Rico 1.0
Chile 1.0
Netherlands 0.0
Russia 1.0
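We can see that Brazil, Mexico, Ireland, Italy, Puerto Rico, Chile, and Russia have the highest success rate (1.0). Note, however, that apart from Brazil (two attempts), each of these countries has only a single recorded attempt.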
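The success rate can also be computed in a single pass, keeping the attempt count next to each rate so that small samples are easy to spot. A minimal sketch, assuming rows reduced to (country, outcome) pairs; `sample`, `tally`, and `rates` are illustrative names and the rows are made-up stand-ins, not the real counts:

```python
from collections import defaultdict

# Illustrative stand-in for the (country, succeeded) columns of `data`.
sample = [
    ("France", "Yes"), ("France", "No"), ("France", "Yes"),
    ("Netherlands", "No"), ("Brazil", "Yes"), ("Brazil", "Yes"),
]

# country -> [successful attempts, total attempts]
tally = defaultdict(lambda: [0, 0])
for country, succeeded in sample:
    tally[country][1] += 1
    if succeeded == "Yes":
        tally[country][0] += 1

# Report the rate together with the attempt count it is based on.
rates = {c: (wins / total, total) for c, (wins, total) in tally.items()}
for country, (rate, total) in rates.items():
    print(f"{country}: {rate:.2f} ({total} attempts)")
```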
# Keep just the outcome and the escapee names from each row
escape = []
for i in data:
    escape.append([i[3], i[4]])
escape
[['Yes', 'Joel David Kaplan Carlos Antonio Contreras Castro'], ['Yes', "JB O'Hagan Seamus TwomeyKevin Mallon"], ['No', 'Garrett Brock TrapnellMartin Joseph McNallyJames Kenneth Johnson'], ['Yes', 'Gérard DupréDaniel Beaumont'], ['No', 'Marina Paquet (hijacker)Giles Arseneault (prisoner)'], ['No', 'David McMillan'], ['Yes', 'James Rodney LeonardWilliam Douglas BallewJesse Glenn Smith'], ['Yes', 'José Carlos dos Reis Encina, a.k.a. "Escadinha"'], ['Yes', 'Michel Vaujour'], ['Yes', 'Samantha Lopez'], ['Yes', 'André BellaïcheGianluigi EspositoLuciano Cipollari'], ['Yes', 'Sydney DraperJohn Kendall'], ['Yes', 'Mahoney Danny Francis MitchellRandy Lackey'], ['No', 'Ben Kramer'], ['Yes', 'Ralph BrownFreddie Gonzales'], ['Yes', 'Robert FordDavid Thomas'], ['Yes', 'William Lane'], ['Yes', '—'], ['No', '—'], ['No', 'Michel Vaujour'], ['Yes', 'Four members of the Manuel Rodriguez Patriotic Front'], ['No', '—'], ['Yes', 'John Killick'], ['Yes', 'Steven Whitsett'], ['Yes', '—'], ['Yes', 'Pascal Payet'], ['Yes', 'Abdelhamid CarnousEmile Forma-SariJean-Philippe Lecase'], ['No', '—'], ['Yes', '—'], ['Yes', 'Orlando Cartagena Jose Rodriguez Victor Diaz Hector Diaz Jose Tapia'], ['Yes', 'Eric AlboreoFranck PerlettoMichel Valero'], ['No', '—'], ['Yes', 'Hubert SellesJean-Claude MorettiMohamed Bessame'], ['Yes', 'Vassilis Paleokostas'], ['Yes', 'Eric Ferdinand'], ['Yes', 'Pascal Payet'], ['No', 'Nordin Benallal'], ['Yes', 'Vasilis PaleokostasAlket Rizai'], ['Yes', 'Alexin JismyFabrice Michel'], ['Yes', 'Ashraf Sekkaki plus three other criminals'], ['No', 'Brian Lawrence'], ['Yes', 'Alexey Shestakov'], ['No', 'Panagiotis Vlastos'], ['Yes', 'Benjamin Hudon-BarbeauDanny Provençal'], ['Yes', 'Yves DenisDenis LefebvreSerge Pomerleau'], ['No', 'Pola RoupaNikos Maziotis'], ['Yes', 'Rédoine Faïd'], ['No', 'Kristel A.']]
Each name consists of at least a first and a last name, and some include middle names.
As a rough estimate of the number of escapees, we divide the number of whitespace-separated words in each entry by 2 and round down.
esc_num_succ = []
for i in escape:
    esc_num_succ.append([i[0], len(i[1].split()) // 2])
a = list(enumerate(esc_num_succ))
print(a)
[(0, ['Yes', 3]), (1, ['Yes', 2]), (2, ['No', 3]), (3, ['Yes', 1]), (4, ['No', 2]), (5, ['No', 1]), (6, ['Yes', 3]), (7, ['Yes', 3]), (8, ['Yes', 1]), (9, ['Yes', 1]), (10, ['Yes', 2]), (11, ['Yes', 1]), (12, ['Yes', 2]), (13, ['No', 1]), (14, ['Yes', 1]), (15, ['Yes', 1]), (16, ['Yes', 1]), (17, ['Yes', 0]), (18, ['No', 0]), (19, ['No', 1]), (20, ['Yes', 4]), (21, ['No', 0]), (22, ['Yes', 1]), (23, ['Yes', 1]), (24, ['Yes', 0]), (25, ['Yes', 1]), (26, ['Yes', 2]), (27, ['No', 0]), (28, ['Yes', 0]), (29, ['Yes', 5]), (30, ['Yes', 2]), (31, ['No', 0]), (32, ['Yes', 2]), (33, ['Yes', 1]), (34, ['Yes', 1]), (35, ['Yes', 1]), (36, ['No', 1]), (37, ['Yes', 1]), (38, ['Yes', 1]), (39, ['Yes', 3]), (40, ['No', 1]), (41, ['Yes', 1]), (42, ['No', 1]), (43, ['Yes', 1]), (44, ['Yes', 2]), (45, ['No', 1]), (46, ['Yes', 1]), (47, ['No', 1])]
for i in a:
    print(i[0], i[1][1])
0 3 1 2 2 3 3 1 4 2 5 1 6 3 7 3 8 1 9 1 10 2 11 1 12 2 13 1 14 1 15 1 16 1 17 0 18 0 19 1 20 4 21 0 22 1 23 1 24 0 25 1 26 2 27 0 28 0 29 5 30 2 31 0 32 2 33 1 34 1 35 1 36 1 37 1 38 1 39 3 40 1 41 1 42 1 43 1 44 2 45 1 46 1 47 1
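To see where the word-count estimate goes wrong, it helps to run it on a few entries in isolation. A small sketch (the function name `estimate_escapees` is illustrative; the strings are copied from the escapee column above):

```python
def estimate_escapees(names):
    """Estimate the head count as half the number of whitespace-separated words."""
    return len(names.split()) // 2

examples = [
    "David McMillan",                                     # one person
    "Joel David Kaplan Carlos Antonio Contreras Castro",  # two people
    "—",                                                  # names not listed
]
estimates = [estimate_escapees(e) for e in examples]
print(estimates)  # [1, 3, 0]
```

Middle names inflate the count and placeholder entries give zero, which is why the estimates need to be checked by hand.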
# Manually override the estimated number of escapees, by row index
esc_num_succ[0][1] = 2
esc_num_succ[1][1] = 3
esc_num_succ[3][1] = 2
esc_num_succ[4][1] = 2
esc_num_succ[7][1] = 2
esc_num_succ[11][1] = 2
esc_num_succ[10][1] = 3
esc_num_succ[12][1] = 3
esc_num_succ[14][1] = 2
esc_num_succ[15][1] = 2
esc_num_succ[17][1] = 2
esc_num_succ[18][1] = 1
esc_num_succ[21][1] = 1
esc_num_succ[24][1] = 3
esc_num_succ[26][1] = 3
esc_num_succ[27][1] = 2
esc_num_succ[28][1] = 4
esc_num_succ[30][1] = 3
esc_num_succ[31][1] = 1
esc_num_succ[32][1] = 3
esc_num_succ[37][1] = 2
esc_num_succ[38][1] = 2
esc_num_succ[39][1] = 4
esc_num_succ[43][1] = 3
esc_num_succ[44][1] = 3
esc_num_succ[45][1] = 2
esc_num_succ
[['Yes', 2], ['Yes', 3], ['No', 3], ['Yes', 2], ['No', 2], ['No', 1], ['Yes', 3], ['Yes', 2], ['Yes', 1], ['Yes', 1], ['Yes', 3], ['Yes', 2], ['Yes', 3], ['No', 1], ['Yes', 2], ['Yes', 2], ['Yes', 1], ['Yes', 2], ['No', 1], ['No', 1], ['Yes', 4], ['No', 1], ['Yes', 1], ['Yes', 1], ['Yes', 3], ['Yes', 1], ['Yes', 3], ['No', 2], ['Yes', 4], ['Yes', 5], ['Yes', 3], ['No', 1], ['Yes', 3], ['Yes', 1], ['Yes', 1], ['Yes', 1], ['No', 1], ['Yes', 2], ['Yes', 2], ['Yes', 4], ['No', 1], ['Yes', 1], ['No', 1], ['Yes', 3], ['Yes', 3], ['No', 2], ['Yes', 1], ['No', 1]]
one_esc = 0
one_noesc = 0
two_esc = 0
two_noesc = 0
more_esc = 0
more_noesc = 0
# Tally successes and failures for one, two, and three or more escapees
for i in esc_num_succ:
    if i[1] == 1 and i[0] == "Yes":
        one_esc += 1
    elif i[1] == 2 and i[0] == "Yes":
        two_esc += 1
    elif i[1] >= 3 and i[0] == "Yes":
        more_esc += 1
    if i[1] == 1 and i[0] == "No":
        one_noesc += 1
    elif i[1] == 2 and i[0] == "No":
        two_noesc += 1
    elif i[1] >= 3 and i[0] == "No":
        more_noesc += 1
import matplotlib.pyplot as plt
plt.bar("one_success",one_esc)
plt.bar("one_fail",one_noesc)
<BarContainer object of 1 artists>
plt.bar("two_success",two_esc)
plt.bar("two_fail",two_noesc)
<BarContainer object of 1 artists>
plt.bar("more_success",more_esc)
plt.bar("more_fail",more_noesc)
<BarContainer object of 1 artists>
Looking at the bar graphs, escapees failed almost as often as they succeeded when there was just one of them (11 successes, 10 failures); they succeeded three times out of four when two escapees were involved (9 of 12), and nearly always when more than two were involved (14 of 15).
Three's not a crowd when you want to escape a prison in a helicopter!
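The six counters can also be replaced by one tally keyed on group and outcome. A minimal sketch with `collections.Counter`; the helper `size_group` and the short `sample` list are illustrative stand-ins, not the notebook's data:

```python
from collections import Counter

def size_group(n):
    """Map a head count to 'one', 'two', or 'more' (assumes n >= 1)."""
    if n == 1:
        return "one"
    if n == 2:
        return "two"
    return "more"

# Illustrative stand-in for `esc_num_succ`: [outcome, number of escapees].
sample = [["Yes", 2], ["Yes", 3], ["No", 3], ["No", 1], ["Yes", 1], ["Yes", 4]]

# Count (group, outcome) pairs in a single pass.
tally = Counter((size_group(n), outcome) for outcome, n in sample)

for group in ("one", "two", "more"):
    yes, no = tally[(group, "Yes")], tally[(group, "No")]
    total = yes + no
    if total:
        print(f"{group}: {yes}/{total} succeeded")
```

Printing the counts alongside the plot makes it easier to judge how much weight each group's success rate can bear.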
# Printing the names from the data
for i in data:
    print(i[4])
Joel David Kaplan Carlos Antonio Contreras Castro JB O'Hagan Seamus TwomeyKevin Mallon Garrett Brock TrapnellMartin Joseph McNallyJames Kenneth Johnson Gérard DupréDaniel Beaumont Marina Paquet (hijacker)Giles Arseneault (prisoner) David McMillan James Rodney LeonardWilliam Douglas BallewJesse Glenn Smith José Carlos dos Reis Encina, a.k.a. "Escadinha" Michel Vaujour Samantha Lopez André BellaïcheGianluigi EspositoLuciano Cipollari Sydney DraperJohn Kendall Mahoney Danny Francis MitchellRandy Lackey Ben Kramer Ralph BrownFreddie Gonzales Robert FordDavid Thomas William Lane — — Michel Vaujour Four members of the Manuel Rodriguez Patriotic Front — John Killick Steven Whitsett — Pascal Payet Abdelhamid CarnousEmile Forma-SariJean-Philippe Lecase — — Orlando Cartagena Jose Rodriguez Victor Diaz Hector Diaz Jose Tapia Eric AlboreoFranck PerlettoMichel Valero — Hubert SellesJean-Claude MorettiMohamed Bessame Vassilis Paleokostas Eric Ferdinand Pascal Payet Nordin Benallal Vasilis PaleokostasAlket Rizai Alexin JismyFabrice Michel Ashraf Sekkaki plus three other criminals Brian Lawrence Alexey Shestakov Panagiotis Vlastos Benjamin Hudon-BarbeauDanny Provençal Yves DenisDenis LefebvreSerge Pomerleau Pola RoupaNikos Maziotis Rédoine Faïd Kristel A.
escapees_all = []
for i in data:
    escapees_all.append(i[4])
print(escapees_all)
['Joel David Kaplan Carlos Antonio Contreras Castro', "JB O'Hagan Seamus TwomeyKevin Mallon", 'Garrett Brock TrapnellMartin Joseph McNallyJames Kenneth Johnson', 'Gérard DupréDaniel Beaumont', 'Marina Paquet (hijacker)Giles Arseneault (prisoner)', 'David McMillan', 'James Rodney LeonardWilliam Douglas BallewJesse Glenn Smith', 'José Carlos dos Reis Encina, a.k.a. "Escadinha"', 'Michel Vaujour', 'Samantha Lopez', 'André BellaïcheGianluigi EspositoLuciano Cipollari', 'Sydney DraperJohn Kendall', 'Mahoney Danny Francis MitchellRandy Lackey', 'Ben Kramer', 'Ralph BrownFreddie Gonzales', 'Robert FordDavid Thomas', 'William Lane', '—', '—', 'Michel Vaujour', 'Four members of the Manuel Rodriguez Patriotic Front', '—', 'John Killick', 'Steven Whitsett', '—', 'Pascal Payet', 'Abdelhamid CarnousEmile Forma-SariJean-Philippe Lecase', '—', '—', 'Orlando Cartagena Jose Rodriguez Victor Diaz Hector Diaz Jose Tapia', 'Eric AlboreoFranck PerlettoMichel Valero', '—', 'Hubert SellesJean-Claude MorettiMohamed Bessame', 'Vassilis Paleokostas', 'Eric Ferdinand', 'Pascal Payet', 'Nordin Benallal', 'Vasilis PaleokostasAlket Rizai', 'Alexin JismyFabrice Michel', 'Ashraf Sekkaki plus three other criminals', 'Brian Lawrence', 'Alexey Shestakov', 'Panagiotis Vlastos', 'Benjamin Hudon-BarbeauDanny Provençal', 'Yves DenisDenis LefebvreSerge Pomerleau', 'Pola RoupaNikos Maziotis', 'Rédoine Faïd', 'Kristel A.']
esc_all_str = ["".join(escapees_all)]
str(esc_all_str)
'[\'Joel David Kaplan Carlos Antonio Contreras CastroJB O\\\'Hagan Seamus TwomeyKevin MallonGarrett Brock TrapnellMartin Joseph McNallyJames Kenneth JohnsonGérard DupréDaniel BeaumontMarina Paquet (hijacker)Giles Arseneault (prisoner)David McMillanJames Rodney LeonardWilliam Douglas BallewJesse Glenn SmithJosé Carlos dos Reis Encina, a.k.a. "Escadinha"Michel VaujourSamantha LopezAndré BellaïcheGianluigi EspositoLuciano CipollariSydney DraperJohn KendallMahoney Danny Francis MitchellRandy LackeyBen KramerRalph BrownFreddie GonzalesRobert FordDavid ThomasWilliam Lane——Michel VaujourFour members of the Manuel Rodriguez Patriotic Front—John KillickSteven Whitsett—Pascal PayetAbdelhamid CarnousEmile Forma-SariJean-Philippe Lecase——Orlando Cartagena Jose Rodriguez Victor Diaz Hector Diaz Jose TapiaEric AlboreoFranck PerlettoMichel Valero—Hubert SellesJean-Claude MorettiMohamed BessameVassilis PaleokostasEric FerdinandPascal PayetNordin BenallalVasilis PaleokostasAlket RizaiAlexin JismyFabrice MichelAshraf Sekkaki plus three other criminalsBrian LawrenceAlexey ShestakovPanagiotis VlastosBenjamin Hudon-BarbeauDanny ProvençalYves DenisDenis LefebvreSerge PomerleauPola RoupaNikos MaziotisRédoine FaïdKristel A.\']'
# Remove punctuation (and spaces) using a function
def punctuation(string):
    # Characters to strip: punctuation marks, dashes, and the space
    punctuations = r'''!()-[]{};:'"\,<>./?@#$%^&*_~—— —'''
    # Traverse the string and replace every occurrence of such
    # a character with the empty string
    for x in string:
        if x in punctuations:
            string = string.replace(x, "")
    # Return the string without punctuation
    return string
esc_onestring = punctuation(str(esc_all_str))
esc_onestring
'JoelDavidKaplanCarlosAntonioContrerasCastroJBOHaganSeamusTwomeyKevinMallonGarrettBrockTrapnellMartinJosephMcNallyJamesKennethJohnsonGérardDupréDanielBeaumontMarinaPaquethijackerGilesArseneaultprisonerDavidMcMillanJamesRodneyLeonardWilliamDouglasBallewJesseGlennSmithJoséCarlosdosReisEncinaakaEscadinhaMichelVaujourSamanthaLopezAndréBellaïcheGianluigiEspositoLucianoCipollariSydneyDraperJohnKendallMahoneyDannyFrancisMitchellRandyLackeyBenKramerRalphBrownFreddieGonzalesRobertFordDavidThomasWilliamLaneMichelVaujourFourmembersoftheManuelRodriguezPatrioticFrontJohnKillickStevenWhitsettPascalPayetAbdelhamidCarnousEmileFormaSariJeanPhilippeLecaseOrlandoCartagenaJoseRodriguezVictorDiazHectorDiazJoseTapiaEricAlboreoFranckPerlettoMichelValeroHubertSellesJeanClaudeMorettiMohamedBessameVassilisPaleokostasEricFerdinandPascalPayetNordinBenallalVasilisPaleokostasAlketRizaiAlexinJismyFabriceMichelAshrafSekkakiplusthreeothercriminalsBrianLawrenceAlexeyShestakovPanagiotisVlastosBenjaminHudonBarbeauDannyProvençalYvesDenisDenisLefebvreSergePomerleauPolaRoupaNikosMaziotisRédoineFaïdKristelA'
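An alternative to the character-by-character loop is `str.translate`, which deletes a whole set of characters in one pass. A minimal sketch; `to_remove` is a shortened, illustrative subset of the characters stripped above:

```python
# Characters to delete; a short illustrative subset of the set used above.
to_remove = "()'\",.- —"

# str.maketrans with three arguments maps every character in the third
# argument to None, so translate() deletes them in one pass.
table = str.maketrans("", "", to_remove)

cleaned = "Marina Paquet (hijacker)Giles Arseneault (prisoner)".translate(table)
print(cleaned)  # MarinaPaquethijackerGilesArseneaultprisoner
```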
#splitting at uppercase letters using re.findall() method
import re
esc_list = []
esc_list = re.findall('[A-Z][^A-Z]*', esc_onestring)
print(esc_list)
['Joel', 'David', 'Kaplan', 'Carlos', 'Antonio', 'Contreras', 'Castro', 'J', 'B', 'O', 'Hagan', 'Seamus', 'Twomey', 'Kevin', 'Mallon', 'Garrett', 'Brock', 'Trapnell', 'Martin', 'Joseph', 'Mc', 'Nally', 'James', 'Kenneth', 'Johnson', 'Gérard', 'Dupré', 'Daniel', 'Beaumont', 'Marina', 'Paquethijacker', 'Giles', 'Arseneaultprisoner', 'David', 'Mc', 'Millan', 'James', 'Rodney', 'Leonard', 'William', 'Douglas', 'Ballew', 'Jesse', 'Glenn', 'Smith', 'José', 'Carlosdos', 'Reis', 'Encinaaka', 'Escadinha', 'Michel', 'Vaujour', 'Samantha', 'Lopez', 'André', 'Bellaïche', 'Gianluigi', 'Esposito', 'Luciano', 'Cipollari', 'Sydney', 'Draper', 'John', 'Kendall', 'Mahoney', 'Danny', 'Francis', 'Mitchell', 'Randy', 'Lackey', 'Ben', 'Kramer', 'Ralph', 'Brown', 'Freddie', 'Gonzales', 'Robert', 'Ford', 'David', 'Thomas', 'William', 'Lane', 'Michel', 'Vaujour', 'Fourmembersofthe', 'Manuel', 'Rodriguez', 'Patriotic', 'Front', 'John', 'Killick', 'Steven', 'Whitsett', 'Pascal', 'Payet', 'Abdelhamid', 'Carnous', 'Emile', 'Forma', 'Sari', 'Jean', 'Philippe', 'Lecase', 'Orlando', 'Cartagena', 'Jose', 'Rodriguez', 'Victor', 'Diaz', 'Hector', 'Diaz', 'Jose', 'Tapia', 'Eric', 'Alboreo', 'Franck', 'Perletto', 'Michel', 'Valero', 'Hubert', 'Selles', 'Jean', 'Claude', 'Moretti', 'Mohamed', 'Bessame', 'Vassilis', 'Paleokostas', 'Eric', 'Ferdinand', 'Pascal', 'Payet', 'Nordin', 'Benallal', 'Vasilis', 'Paleokostas', 'Alket', 'Rizai', 'Alexin', 'Jismy', 'Fabrice', 'Michel', 'Ashraf', 'Sekkakiplusthreeothercriminals', 'Brian', 'Lawrence', 'Alexey', 'Shestakov', 'Panagiotis', 'Vlastos', 'Benjamin', 'Hudon', 'Barbeau', 'Danny', 'Provençal', 'Yves', 'Denis', 'Denis', 'Lefebvre', 'Serge', 'Pomerleau', 'Pola', 'Roupa', 'Nikos', 'Maziotis', 'Rédoine', 'Faïd', 'Kristel', 'A']
# Join each pair of adjacent words into a candidate "FirstLast" name
rough_name_list = []
for i in range(len(esc_list) - 1):
    rough_name_list.append(esc_list[i] + esc_list[i + 1])
print(rough_name_list)
['JoelDavid', 'DavidKaplan', 'KaplanCarlos', 'CarlosAntonio', 'AntonioContreras', 'ContrerasCastro', 'CastroJ', 'JB', 'BO', 'OHagan', 'HaganSeamus', 'SeamusTwomey', 'TwomeyKevin', 'KevinMallon', 'MallonGarrett', 'GarrettBrock', 'BrockTrapnell', 'TrapnellMartin', 'MartinJoseph', 'JosephMc', 'McNally', 'NallyJames', 'JamesKenneth', 'KennethJohnson', 'JohnsonGérard', 'GérardDupré', 'DupréDaniel', 'DanielBeaumont', 'BeaumontMarina', 'MarinaPaquethijacker', 'PaquethijackerGiles', 'GilesArseneaultprisoner', 'ArseneaultprisonerDavid', 'DavidMc', 'McMillan', 'MillanJames', 'JamesRodney', 'RodneyLeonard', 'LeonardWilliam', 'WilliamDouglas', 'DouglasBallew', 'BallewJesse', 'JesseGlenn', 'GlennSmith', 'SmithJosé', 'JoséCarlosdos', 'CarlosdosReis', 'ReisEncinaaka', 'EncinaakaEscadinha', 'EscadinhaMichel', 'MichelVaujour', 'VaujourSamantha', 'SamanthaLopez', 'LopezAndré', 'AndréBellaïche', 'BellaïcheGianluigi', 'GianluigiEsposito', 'EspositoLuciano', 'LucianoCipollari', 'CipollariSydney', 'SydneyDraper', 'DraperJohn', 'JohnKendall', 'KendallMahoney', 'MahoneyDanny', 'DannyFrancis', 'FrancisMitchell', 'MitchellRandy', 'RandyLackey', 'LackeyBen', 'BenKramer', 'KramerRalph', 'RalphBrown', 'BrownFreddie', 'FreddieGonzales', 'GonzalesRobert', 'RobertFord', 'FordDavid', 'DavidThomas', 'ThomasWilliam', 'WilliamLane', 'LaneMichel', 'MichelVaujour', 'VaujourFourmembersofthe', 'FourmembersoftheManuel', 'ManuelRodriguez', 'RodriguezPatriotic', 'PatrioticFront', 'FrontJohn', 'JohnKillick', 'KillickSteven', 'StevenWhitsett', 'WhitsettPascal', 'PascalPayet', 'PayetAbdelhamid', 'AbdelhamidCarnous', 'CarnousEmile', 'EmileForma', 'FormaSari', 'SariJean', 'JeanPhilippe', 'PhilippeLecase', 'LecaseOrlando', 'OrlandoCartagena', 'CartagenaJose', 'JoseRodriguez', 'RodriguezVictor', 'VictorDiaz', 'DiazHector', 'HectorDiaz', 'DiazJose', 'JoseTapia', 'TapiaEric', 'EricAlboreo', 'AlboreoFranck', 'FranckPerletto', 'PerlettoMichel', 'MichelValero', 'ValeroHubert', 'HubertSelles', 'SellesJean', 
'JeanClaude', 'ClaudeMoretti', 'MorettiMohamed', 'MohamedBessame', 'BessameVassilis', 'VassilisPaleokostas', 'PaleokostasEric', 'EricFerdinand', 'FerdinandPascal', 'PascalPayet', 'PayetNordin', 'NordinBenallal', 'BenallalVasilis', 'VasilisPaleokostas', 'PaleokostasAlket', 'AlketRizai', 'RizaiAlexin', 'AlexinJismy', 'JismyFabrice', 'FabriceMichel', 'MichelAshraf', 'AshrafSekkakiplusthreeothercriminals', 'SekkakiplusthreeothercriminalsBrian', 'BrianLawrence', 'LawrenceAlexey', 'AlexeyShestakov', 'ShestakovPanagiotis', 'PanagiotisVlastos', 'VlastosBenjamin', 'BenjaminHudon', 'HudonBarbeau', 'BarbeauDanny', 'DannyProvençal', 'ProvençalYves', 'YvesDenis', 'DenisDenis', 'DenisLefebvre', 'LefebvreSerge', 'SergePomerleau', 'PomerleauPola', 'PolaRoupa', 'RoupaNikos', 'NikosMaziotis', 'MaziotisRédoine', 'RédoineFaïd', 'FaïdKristel', 'KristelA']
# Keep the candidate names that occur more than once
repeat_names = []
for i in rough_name_list:
    if rough_name_list.count(i) > 1:
        repeat_names.append(i)
repeat_names
['MichelVaujour', 'MichelVaujour', 'PascalPayet', 'PascalPayet']
#for unique repeated names, we can use a set
repeat_names=list(set(repeat_names))
repeat_names
['MichelVaujour', 'PascalPayet']
So we find that two escapees appear more than once in the list: Michel Vaujour and Pascal Payet.
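A simpler cross-check is to count whole entries of the escapee column directly, skipping the dash placeholder used when no names are listed. A minimal sketch with `collections.Counter`; `sample` is an illustrative subset of the column, not the full list:

```python
from collections import Counter

# Illustrative subset of the escapee column.
sample = [
    "Michel Vaujour", "—", "Pascal Payet", "—", "Michel Vaujour",
    "John Killick", "Pascal Payet", "Vassilis Paleokostas",
    "Vasilis PaleokostasAlket Rizai",
]

# Count whole entries, skipping the placeholder for unnamed escapees.
counts = Counter(name for name in sample if name != "—")
repeated = sorted(name for name, n in counts.items() if n > 1)
print(repeated)  # ['Michel Vaujour', 'Pascal Payet']
```

This only catches people who appear alone and with identical spelling. The last two entries of `sample` look like the same person spelled two ways (Vassilis and Vasilis Paleokostas), and neither this check nor the adjacent-word approach above would flag that.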