This snippet was written by Chris R. Albon and is part of his collection of well-documented Python snippets. All code is written in Python 3 in an IPython notebook and offered under the Creative Commons Attribution-ShareAlike 4.0 International License.
# import the pandas package and call it "pd".
import pandas as pd
# Create a list of strings (i.e. text), each one a headline.
headlines = ['Germany Attacks Poland Through The Mountains.',
             'Britan Declares War On Germany',
             'Britan Asks US For Help',
             'Japan Attacks US Base In Hawaii',
             'US Declares War On Germany']
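This might look complicated, but it isn't. The basic idea is that we tell Python to "loop" through each headline and, whenever it finds the name of a certain country, record that it appeared. This particular loop actually does two things: it marks a "yes" or "no" for each country depending on whether the headline mentions it, and it adds every country it finds to that headline's dyad (the pair of countries involved). Note that countries are added to the dyad in the order the checks run, not the order in which they appear in the headline.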
I've added comments.
# Create a list variable (hence the square brackets) called poland
poland = []
# Create a list variable called germany
germany = []
# Create a list variable called britan
britan = []
# Create a list variable called japan
japan = []
# Create a list variable called US
US = []
# Create a list variable called dyad
dyad = []
# Now for the loop
# For each row in the variable called headlines,
for row in headlines:
    # create a variable called dyad_member, and then,
    dyad_member = []
    # if Poland is in the headline,
    if 'Poland' in row:
        # append 'yes' to the poland list variable, and,
        poland.append('yes')
        # append 'Poland' to the dyad_member variable
        dyad_member.append('Poland')
    # otherwise,
    else:
        # just append 'no' to the poland list variable
        poland.append('no')
    # the code below just does the exact same thing that
    # we just did with poland, but with each other country
    if 'Germany' in row:
        germany.append('yes')
        dyad_member.append('Germany')
    else:
        germany.append('no')
    if 'Britan' in row:
        britan.append('yes')
        dyad_member.append('britan')
    else:
        britan.append('no')
    if 'Japan' in row:
        japan.append('yes')
        dyad_member.append('Japan')
    else:
        japan.append('no')
    if 'US' in row:
        US.append('yes')
        dyad_member.append('US')
    else:
        US.append('no')
    # append the variable dyad_member to the dyad variable
    dyad.append(dyad_member)
# view the dyad variable just to make sure we did everything right
dyad
[['Poland', 'Germany'], ['Germany', 'britan'], ['britan', 'US'], ['Japan', 'US'], ['Germany', 'US']]
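The five near-identical if/else blocks above can also be driven by a list of country names. What follows is a minimal sketch of that idea, not the author's code: the names `countries`, `flags`, `dyads`, and `members` are hypothetical, and it records every country with the same spelling it searches for, so 'Britan' appears where the original loop appended 'britan'. Like the original, it relies on `in`, which is a case-sensitive substring match.

```python
headlines = ['Germany Attacks Poland Through The Mountains.',
             'Britan Declares War On Germany',
             'Britan Asks US For Help',
             'Japan Attacks US Base In Hawaii',
             'US Declares War On Germany']

# Hypothetical name: the countries to search for, listed in the same
# order as the if/else blocks in the original loop.
countries = ['Poland', 'Germany', 'Britan', 'Japan', 'US']

# One 'yes'/'no' list per country, keyed by the country name.
flags = {country: [] for country in countries}
# One list of matched countries per headline.
dyads = []

for headline in headlines:
    members = []
    for country in countries:
        if country in headline:
            flags[country].append('yes')
            members.append(country)
        else:
            flags[country].append('no')
    dyads.append(members)

print(dyads)
print(flags['US'])
```

Adding a country then means adding one string to `countries` rather than another if/else block.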
# Create list variables, country1 and country2
country1 = []
country2 = []
# For each row in the variable, dyad
for row in dyad:
    # append the first country listed to country1
    country1.append(row[0])
    # append the second country listed to country2
    country2.append(row[1])
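Indexing with `row[0]` and `row[1]` works here because every headline names exactly two countries; a headline naming fewer would raise an `IndexError`. As a hedged sketch of one way to guard against that, the hypothetical helper `split_dyad` below pads missing entries with `None`. The `dyad` list in this example is made-up sample input chosen to include short rows, not the data from the headlines above.

```python
def split_dyad(members):
    """Return the first two entries of members, padding with None."""
    first_two = list(members[:2])
    padded = first_two + [None] * (2 - len(first_two))
    return padded[0], padded[1]

# Made-up sample input: the last two rows have fewer than two countries.
dyad = [['Poland', 'Germany'], ['Germany', 'britan'], ['Japan'], []]

country1 = []
country2 = []
for row in dyad:
    first, second = split_dyad(row)
    country1.append(first)
    country2.append(second)

print(country1)
print(country2)
```

Whether `None` is the right placeholder depends on how the columns are used later; it is only an assumption for this sketch.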
# Now to turn our work into a dataframe with rows and columns
# create a dataframe called df
df = pd.DataFrame()
# create a column called headlines, from the variable headlines
df['headlines'] = headlines
# create a column called country1, from the variable country1
df['country1'] = country1
# create a column called country2, from the variable country2
df['country2'] = country2
# create a column called poland, from the variable poland
df['poland'] = poland
# create a column called germany, from the variable germany
df['germany'] = germany
# create a column called britan, from the variable britan
df['britan'] = britan
# create a column called japan, from the variable japan
df['japan'] = japan
# create a column called US, from the variable US
df['US'] = US
# View the dataframe
df
| | headlines | country1 | country2 | poland | germany | britan | japan | US |
|---|---|---|---|---|---|---|---|---|
| 0 | Germany Attacks Poland Through The Mountains. | Poland | Germany | yes | yes | no | no | no |
| 1 | Britan Declares War On Germany | Germany | britan | no | yes | yes | no | no |
| 2 | Britan Asks US For Help | britan | US | no | no | yes | no | yes |
| 3 | Japan Attacks US Base In Hawaii | Japan | US | no | no | no | yes | yes |
| 4 | US Declares War On Germany | Germany | US | no | yes | no | no | yes |
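The yes/no columns can also be built directly in pandas, without intermediate lists. This is a rough sketch of an alternative, not the method used above: it uses `Series.str.contains` with `regex=False` for a plain substring match and `Series.map` to turn the booleans into 'yes' and 'no'. The dictionary `search_terms` is a hypothetical name that maps each column name from the table to the text searched for.

```python
import pandas as pd

headlines = ['Germany Attacks Poland Through The Mountains.',
             'Britan Declares War On Germany',
             'Britan Asks US For Help',
             'Japan Attacks US Base In Hawaii',
             'US Declares War On Germany']

df = pd.DataFrame({'headlines': headlines})

# Hypothetical mapping: column name -> text to look for in each headline.
search_terms = {'poland': 'Poland', 'germany': 'Germany',
                'britan': 'Britan', 'japan': 'Japan', 'US': 'US'}

for column, term in search_terms.items():
    # True where the headline contains the term (case-sensitive substring).
    found = df['headlines'].str.contains(term, regex=False)
    # Convert True/False into the 'yes'/'no' strings used in the table.
    df[column] = found.map({True: 'yes', False: 'no'})

print(df)
```

This sketch does not build the country1 and country2 columns; those still need the per-headline dyad lists.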