Young Scientist Conference 2018-05-12

David Hay, Elk Island Public Schools

We'll use Python in a Jupyter notebook to do some data science:

  • Climate Change with Open Data
  • Drawing with Turtles
  • Experimental Probability
  • Text Analysis of Shakespeare

To get started, click on the Cell menu above, and choose Run All. Or you can run each cell individually using Ctrl-Enter.

Climate Change

World Temperature Data

We can download NASA's Global Temperature data file and graph how temperature has changed over time.

In [44]:
import pandas as pd
url = 'https://climate.nasa.gov/system/internal_resources/details/original/647_Global_Temperature_Data_File.txt'
# the file has no header row and its columns are separated by whitespace
global_temperature_df = pd.read_table(url, sep=r'\s+', header=None, names=['Year','Annual Mean', 'Lowess smoothing'])

Let's see the last ten years of data:

In [45]:
global_temperature_df.tail(10)
Out[45]:
     Year  Annual Mean  Lowess smoothing
128  2008         0.52              0.62
129  2009         0.63              0.62
130  2010         0.70              0.62
131  2011         0.57              0.63
132  2012         0.61              0.67
133  2013         0.64              0.71
134  2014         0.73              0.77
135  2015         0.86              0.83
136  2016         0.99              0.89
137  2017         0.90              0.95

Now let's make a graph of those data.

In [46]:
import plotly.graph_objs as go
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)

trace0 = go.Scatter(
    x = global_temperature_df['Year'],
    y = global_temperature_df['Annual Mean'],
    mode = 'lines+markers',
    name = 'Annual Mean'
)

trace1 = go.Scatter(
    x = global_temperature_df['Year'],
    y = global_temperature_df['Lowess smoothing'],
    mode = 'lines+markers',
    name = 'Lowess smoothing'
)

data = [trace0]

layout = go.Layout(
    title='Global Land Ocean Temperature Index',
    xaxis=dict(title='Year'),
    yaxis=dict(title='Temperature Anomaly')
)
fig = go.Figure(data=data,layout=layout)
iplot(fig)

If you want to see a smoothed-out version of the data, change the line data = [trace0] in the above code to be data = [trace1] or data = [trace0,trace1] and run the cell again.
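
The 'Lowess smoothing' column comes already calculated in the NASA file. To get a feel for what smoothing does, here is a minimal sketch that uses a simple moving average instead, which is a different and much cruder technique than LOWESS. The function name moving_average and the window size of 3 are assumptions chosen for illustration. The sample values are the ten annual means from the table above.

```python
def moving_average(values, window=3):
    """Return the mean of each run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Annual means for 2008-2017, copied from the table above
annual_mean = [0.52, 0.63, 0.70, 0.57, 0.61, 0.64, 0.73, 0.86, 0.99, 0.90]

# Each smoothed value averages three neighbouring years,
# so the result is two items shorter than the input
smoothed = moving_average(annual_mean, window=3)
print([round(value, 2) for value in smoothed])
```

A larger window gives a smoother line but hides more of the year-to-year changes.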

Canadian Temperature Data

Let's grab some data from Canada's Data Portal.

In [47]:
ca_url = 'https://www.canada.ca/content/dam/eccc/migration/main/indicateurs-indicators/64c9e931-fcc6-4df6-ad44-c2b0236fe255/temperaturechange_en.csv'
# skip the two title rows at the top of the CSV file
ca_df = pd.read_csv(ca_url, skiprows=[0,1])
ca_df.head(10)
Out[47]:
   Year  Temperature departure (degrees Celsius)  Warmest year ranking
0  1948                                     -0.2                  49.0
1  1949                                     -0.2                  50.0
2  1950                                     -1.3                  66.0
3  1951                                     -0.6                  61.0
4  1952                                      0.8                  16.0
5  1953                                      0.8                  17.0
6  1954                                      0.0                  40.0
7  1955                                     -0.2                  48.0
8  1956                                     -0.8                  64.0
9  1957                                     -0.3                  56.0

Now to make a graph of those data:

In [48]:
init_notebook_mode(connected=True)

trace0 = go.Scatter(
    x = ca_df['Year'],
    y = ca_df['Temperature departure (degrees Celsius)'],
    mode = 'lines+markers',
    name = 'Annual Mean'
)

layout = go.Layout(
    title='Annual Average Temperature Departures in Canada',
    xaxis=dict(title='Year'),
    yaxis=dict(title='Temperature Anomaly')
)
fig = go.Figure(data=[trace0],layout=layout)
iplot(fig)

Do you see a trend in these data sets?
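
One way to check for a trend with numbers rather than by eye is to fit a straight line through the points and look at its slope. The sketch below does this with ordinary least squares, using only the ten NASA values printed earlier. The function name linear_trend is an assumption made for this example, and ten years is a very short sample, so treat the result as an illustration rather than a finding.

```python
def linear_trend(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # how x and y vary together, divided by how much x varies on its own
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Years and annual means for 2008-2017, copied from the NASA table above
years = list(range(2008, 2018))
annual_mean = [0.52, 0.63, 0.70, 0.57, 0.61, 0.64, 0.73, 0.86, 0.99, 0.90]

slope, intercept = linear_trend(years, annual_mean)
print(f"Change in temperature anomaly per year: {slope:.3f}")
```

A positive slope means the values rise over time and a negative slope means they fall. In the notebook you could pass list(global_temperature_df['Year']) and list(global_temperature_df['Annual Mean']) to the same function to use every year in the file.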

Turtle Drawing

To draw with turtles, we first need to install a package called mobilechelonian. This only needs to be done once.

In [49]:
!pip install --user mobilechelonian

Then we can import that package, create a turtle (named t in this case), and move the turtle around.

In [50]:
from mobilechelonian import Turtle

t = Turtle()
t.speed(10)
t.pencolor("red")
t.penup()
t.pendown()
t.forward(50)
t.backward(50)
t.right(90)
t.left(90)
t.circle(15)
t.home()

Using examples from the code above, have a new turtle draw a rectangle. Feel free to change the turtle's name, but make sure you spell it exactly the same way each time.

In [51]:
venus = Turtle()
venus.forward(50)

Then try drawing some other shapes, or perhaps writing your name.
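
If you get stuck on the rectangle, here is one possible approach, as a sketch only. A rectangle is two long sides and two short sides with a 90 degree turn after each one, so a loop that runs twice will do it. The function name draw_rectangle and the sizes 80 and 40 are assumptions made for this example. The function only uses the forward and right commands shown earlier.

```python
def draw_rectangle(turtle, width, height):
    """Draw a rectangle using only forward() and right()."""
    # each pass through the loop draws one long side and one short side
    for _ in range(2):
        turtle.forward(width)
        turtle.right(90)
        turtle.forward(height)
        turtle.right(90)

# To try it in the notebook (it needs the turtle display to work):
#   from mobilechelonian import Turtle
#   venus = Turtle()
#   draw_rectangle(venus, 80, 40)
```

After the fourth turn the turtle is back where it started and facing the same way, so you can call the function again to draw another rectangle.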

Experimental Probability

We can use a function called randint, from Python's random module, to generate random integers. This allows us to simulate experiments where we roll dice or flip coins.

Let's say we wanted to flip a coin 20 times. We could generate a random integer that is 1 or 2 and decide that "heads" is 1 and "tails" is 2.

In [52]:
from random import randint
howMany = 20
sides = 2

# create an empty list that we will store results in
results = []

# create a loop that will run once for each flip
for x in range(0,howMany):
    # generate a random number between 1 and 2
    result = randint(1,sides)
    # add the result to our list
    results += [result]
results
Out[52]:
[1, 2, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 2, 1, 1, 2, 1]

Each time you run the above code you should get a different list of numbers.
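
Sometimes you want the same "random" results every time, for example so that a friend can check your work. Random number generators start from a value called a seed, and the same seed gives the same sequence. The sketch below shows this. The function name flip_coins and the seed value 2018 are assumptions made for this example.

```python
import random

def flip_coins(how_many, seed=None):
    """Return a list of 1s (heads) and 2s (tails)."""
    # a separate generator; seed=None means a different sequence on every run
    rng = random.Random(seed)
    return [rng.randint(1, 2) for _ in range(how_many)]

first = flip_coins(20, seed=2018)
second = flip_coins(20, seed=2018)
print(first == second)  # prints True: same seed, same sequence
```

Leave the seed out, as in flip_coins(20), to go back to getting a different list each time.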

Now let's do the same thing, but create a bar graph of the frequency of each number.

In [53]:
from random import randint
howMany = 20
sides = 2

results = []
for x in range(0,howMany):
    result = randint(1,sides)
    results += [result]

# count how many times we got each result
from collections import Counter
counts = Counter(results)

# create a bar graph from that count
import matplotlib.pyplot as plot
plot.bar(counts.keys(),counts.values())
plot.show()

The same code works for dice. We just need to change the number of sides.

In [54]:
from random import randint
howMany = 20
sides = 6

results = []
for x in range(0,howMany):
    result = randint(1,sides)
    results += [result]

from collections import Counter
counts = Counter(results)
import matplotlib.pyplot as plot
plot.bar(counts.keys(),counts.values())
plot.show()

In theory each number should come up equally often, but 20 rolls is too small a sample for that to show. Let's try it with more rolls.

In [55]:
from random import randint
howMany = 1000
sides = 6

results = []
for x in range(0,howMany):
    result = randint(1,sides)
    results += [result]

from collections import Counter
counts = Counter(results)
import matplotlib.pyplot as plot
plot.bar(counts.keys(),counts.values())
plot.show()

That's pretty close to the theoretical probability.
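
To put a number on "pretty close", we can turn the counts into relative frequencies and compare each one with the theoretical probability of 1/6. The sketch below does that for three sample sizes. The function name relative_frequencies, the seed value and the choice of sample sizes are assumptions made for this example.

```python
import random
from collections import Counter

def relative_frequencies(how_many, sides, seed=None):
    """Roll a die how_many times and return the share of rolls for each face."""
    rng = random.Random(seed)
    counts = Counter(rng.randint(1, sides) for _ in range(how_many))
    # include every face, even one that was never rolled
    return {face: counts[face] / how_many for face in range(1, sides + 1)}

expected = 1 / 6
for rolls in (20, 1000, 100000):
    freqs = relative_frequencies(rolls, 6, seed=1)
    worst = max(abs(freq - expected) for freq in freqs.values())
    print(f"{rolls:>6} rolls: largest gap from 1/6 is {worst:.3f}")
```

The gap usually shrinks as the number of rolls grows, which is why the bar graph looks flatter with 1000 rolls than with 20.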

For one more statistics experiment, let's roll two dice and add them together, five thousand times.

In [56]:
from random import randint
howMany = 5000
sides = 6

results = []
for x in range(0,howMany):
    number1 = randint(1,sides)
    number2 = randint(1,sides)
    result = number1 + number2
    results += [result]

from collections import Counter
counts = Counter(results)
import matplotlib.pyplot as plot
plot.bar(counts.keys(),counts.values())
plot.show()

The most common sum is 7, because more combinations add up to 7 than to any other sum (six of the 36 possible combinations). The least commonly rolled sums are 2 and 12, because there is only one way to make each of them (1+1 and 6+6 respectively).
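
We can check this reasoning without rolling anything, by listing every possible pair of dice and counting how many pairs give each sum. This is a short sketch. The variable names ways and total_pairs are assumptions made for this example.

```python
from collections import Counter
from itertools import product

sides = 6
faces = range(1, sides + 1)

# product(faces, repeat=2) gives every (first die, second die) pair
ways = Counter(first + second for first, second in product(faces, repeat=2))
total_pairs = sides * sides

for dice_sum in sorted(ways):
    share = ways[dice_sum] / total_pairs
    print(f"{dice_sum:>2}: {ways[dice_sum]} of {total_pairs} pairs ({share:.3f})")
```

The counts rise by one from a sum of 2 up to a sum of 7 and then fall by one down to 12, which matches the shape of the bar graph above.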

Text Analysis

If you're interested in automated text analysis, check out this notebook.