David Hay, Elk Island Public Schools

We'll use Python in a Jupyter notebook to do some data science:

- Climate Change with Open Data
- Drawing with Turtles
- Experimental Probability
- Text Analysis of Shakespeare

*To get started, click on the* Cell *menu above, and choose* Run All. *Or you can run each cell individually using Ctrl-Enter.*

We can get data from NASA Global Temperature and graph how temperature has changed over time.

In [44]:

```
import pandas as pd
url = 'https://climate.nasa.gov/system/internal_resources/details/original/647_Global_Temperature_Data_File.txt'
# the file is whitespace-separated and has no header row, so we supply column names
global_temperature_df = pd.read_table(url, sep=r'\s+', header=None, names=['Year','Annual Mean', 'Lowess smoothing'])
```

Let's see the last ten years of data:

In [45]:

```
global_temperature_df.tail(10)
```

Now let's make a graph of those data.

In [46]:

```
import plotly.graph_objs as go
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)
# one trace for the raw annual means
trace0 = go.Scatter(
    x = global_temperature_df['Year'],
    y = global_temperature_df['Annual Mean'],
    mode = 'lines+markers',
    name = 'Annual Mean'
)
# a second trace for the smoothed values
trace1 = go.Scatter(
    x = global_temperature_df['Year'],
    y = global_temperature_df['Lowess smoothing'],
    mode = 'lines+markers',
    name = 'Lowess smoothing'
)
# choose which traces to plot
data = [trace0]
layout = go.Layout(
    title='Global Land Ocean Temperature Index',
    xaxis=dict(title='Year'),
    yaxis=dict(title='Temperature Anomaly')
)
fig = go.Figure(data=data,layout=layout)
iplot(fig)
```

If you want to see a smoothed-out version of the data, change the line `data = [trace0]` in the above code to

`data = [trace1]`

or, to show both series on the same graph, to

`data = [trace0,trace1]`
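To get a feel for what smoothing does, here is a minimal sketch of a centred moving average. This is a simpler technique than Lowess and is only a stand-in for it; the function name `moving_average` and the sample values are made up for illustration and are not NASA data.

```python
def moving_average(values, window=5):
    """Return a centred moving average; the window shrinks at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        # take up to `half` neighbours on each side of position i
        neighbours = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(neighbours) / len(neighbours))
    return smoothed

# made-up anomaly values, for illustration only (not NASA data)
sample = [0.1, 0.3, 0.2, 0.5, 0.4, 0.6]
smoothed_sample = moving_average(sample, window=3)
```

In the notebook you could try it on the real data with `moving_average(list(global_temperature_df['Annual Mean']))`. A larger `window` gives a smoother curve but hides more of the year-to-year variation.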

Let's grab some data from Canada's Data Portal.

In [47]:

```
ca_url = 'https://www.canada.ca/content/dam/eccc/migration/main/indicateurs-indicators/64c9e931-fcc6-4df6-ad44-c2b0236fe255/temperaturechange_en.csv'
ca_df = pd.read_table(ca_url, sep=",", skiprows=[0,1])
ca_df.head(10)
```

Now to make a graph of those data:

In [48]:

```
init_notebook_mode(connected=True)
# plot the temperature departure for each year
trace0 = go.Scatter(
    x = ca_df['Year'],
    y = ca_df['Temperature departure (degrees Celsius)'],
    mode = 'lines+markers',
    name = 'Annual Mean'
)
layout = go.Layout(
    title='Annual Average Temperature Departures in Canada',
    xaxis=dict(title='Year'),
    yaxis=dict(title='Temperature Anomaly')
)
fig = go.Figure(data=[trace0],layout=layout)
iplot(fig)
```

Do you see a trend in these data sets?
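One way to put a number on a trend is to fit a straight line through the points and look at its slope. The sketch below computes a least-squares slope in plain Python. The function name `trend_per_decade` is hypothetical, and the example values are placeholders that rise by exactly 0.02 per year; they are not real measurements.

```python
def trend_per_decade(years, values):
    """Least-squares slope of values against years, scaled to change per decade."""
    n = len(years)
    mean_year = sum(years) / n
    mean_value = sum(values) / n
    # how years and values vary together, divided by how much the years vary
    covariance = sum((x - mean_year) * (y - mean_value) for x, y in zip(years, values))
    variance = sum((x - mean_year) ** 2 for x in years)
    return 10 * covariance / variance

# placeholder numbers that rise by exactly 0.02 per year (not real measurements)
example_years = list(range(2000, 2010))
example_values = [0.40 + 0.02 * (year - 2000) for year in example_years]
example_trend = trend_per_decade(example_years, example_values)
```

With the data loaded above, something like `trend_per_decade(list(ca_df['Year']), list(ca_df['Temperature departure (degrees Celsius)']))` should give the average change per decade; a positive result means the values are rising over time.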

In order to draw with turtles, we first need to install a package called `mobilechelonian`. This only needs to be done once.

In [49]:

```
!pip install --user mobilechelonian
```

Then we can import that package, create a turtle (named `t` in this case), and move the turtle around.

In [50]:

```
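from mobilechelonian import Turtle
t = Turtle()
t.speed(10)          # set the animation speed
t.pencolor("red")    # draw in red
t.penup()            # lift the pen so moving does not draw
t.pendown()          # lower the pen so moving draws again
t.forward(50)        # move forward 50 units
t.backward(50)       # move backward 50 units
t.right(90)          # turn right 90 degrees
t.left(90)           # turn left 90 degrees
t.circle(15)         # draw a circle of radius 15
t.home()             # return to the starting position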

Using examples from the code above, have a new turtle draw a rectangle. Feel free to change the turtle's name; just make sure you spell it exactly the same each time.

In [51]:

```
venus = Turtle()
venus.forward(50)
```
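If you get stuck on the rectangle, one possible approach is to notice that a rectangle is just "long side, turn, short side, turn" done twice. The sketch below wraps that idea in a hypothetical helper, `draw_rectangle`. So that it can run anywhere, it uses a made-up `RecordingTurtle` stand-in that only records the moves; it assumes the real turtle offers the same `forward` and `right` methods used earlier.

```python
class RecordingTurtle:
    """Hypothetical stand-in that records moves instead of drawing them."""
    def __init__(self):
        self.moves = []

    def forward(self, distance):
        self.moves.append(("forward", distance))

    def right(self, angle):
        self.moves.append(("right", angle))

def draw_rectangle(turtle, width, height):
    """Draw a rectangle: two long sides and two short sides."""
    for _ in range(2):
        turtle.forward(width)
        turtle.right(90)
        turtle.forward(height)
        turtle.right(90)

recorder = RecordingTurtle()
draw_rectangle(recorder, 100, 50)
```

In the notebook, you could pass a real turtle instead, for example `draw_rectangle(Turtle(), 100, 50)`. The four 90-degree turns add up to 360 degrees, so the turtle ends up facing the way it started.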

Then try drawing some other shapes, or perhaps writing your name.

We can use a Python function called `randint`, from the `random` module, to generate random integers. This allows us to simulate experiments where we roll dice or flip coins.

Let's say we wanted to flip a coin 20 times. We could generate a random integer that is 1 or 2 and decide that "heads" is 1 and "tails" is 2.

In [52]:

```
from random import randint
howMany = 20
sides = 2
# create an empty list that we will store results in
results = []
# create a loop that will run once for each flip
for x in range(0,howMany):
    # generate a random integer from 1 to sides (here, 1 or 2)
    result = randint(1,sides)
    # add the result to our list
    results.append(result)
# display the list of results
results
```

Each time you run the above code you should get a different list of numbers.
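Sometimes it is useful to get the same "random" results every time, for example when comparing answers with a classmate. A minimal sketch of how that might be done is to give the random number generator a fixed seed. The function name `flip_coins` and the seed value 42 are arbitrary choices for illustration.

```python
import random

def flip_coins(how_many, seed=None):
    """Return a list of 1s (heads) and 2s (tails)."""
    # a private generator, so the seed only affects this function call
    rng = random.Random(seed)
    return [rng.randint(1, 2) for _ in range(how_many)]

# the same seed gives the same sequence of flips
first_run = flip_coins(20, seed=42)
second_run = flip_coins(20, seed=42)
```

Leaving `seed` as `None` goes back to getting a different list on each run.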

Now let's do the same thing, but create a bar graph of the frequency of each number.

In [53]:

```
from random import randint
howMany = 20
sides = 2
results = []
for x in range(0,howMany):
    result = randint(1,sides)
    results.append(result)
# count how many times we got each number
from collections import Counter
counts = Counter(results)
# create a bar graph from that count
import matplotlib.pyplot as plot
plot.bar(counts.keys(),counts.values())
plot.show()
```

The same code would work if we wanted to use dice; we just need to change the number of sides.

In [54]:

```
from random import randint
howMany = 20
sides = 6
results = []
# roll a six-sided die once per loop
for x in range(0,howMany):
    result = randint(1,sides)
    results.append(result)
from collections import Counter
counts = Counter(results)
import matplotlib.pyplot as plot
plot.bar(counts.keys(),counts.values())
plot.show()
```

Theoretically we expect each number to be rolled with the same frequency, but 20 is not a large enough sample size. Let's try it with more rolls.

In [55]:

```
from random import randint
howMany = 1000
sides = 6
results = []
# roll a six-sided die once per loop
for x in range(0,howMany):
    result = randint(1,sides)
    results.append(result)
from collections import Counter
counts = Counter(results)
import matplotlib.pyplot as plot
plot.bar(counts.keys(),counts.values())
plot.show()
```

That's pretty close to the theoretical probability of 1/6 for each number, which would be about 167 rolls out of 1000.
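To check "pretty close" with numbers rather than by eye, we could turn the counts into relative frequencies and compare each one with 1/6. This is a rough sketch; the function name `relative_frequencies` and the seed value are assumptions made for illustration.

```python
import random
from collections import Counter

def relative_frequencies(how_many, sides=6, seed=None):
    """Roll a die how_many times and return the fraction of rolls for each face."""
    rng = random.Random(seed)
    counts = Counter(rng.randint(1, sides) for _ in range(how_many))
    return {face: counts[face] / how_many for face in range(1, sides + 1)}

frequencies = relative_frequencies(1000, seed=1)
# how far the worst face is from the theoretical 1/6
largest_gap = max(abs(freq - 1 / 6) for freq in frequencies.values())
```

Try increasing the number of rolls: `largest_gap` will usually shrink as the sample gets bigger.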

One more statistics experiment: let's roll two dice and add them together, five thousand times.

In [56]:

```
from random import randint
howMany = 5000
sides = 6
results = []
for x in range(0,howMany):
    # roll two dice and record their sum
    number1 = randint(1,sides)
    number2 = randint(1,sides)
    result = number1 + number2
    results.append(result)
from collections import Counter
counts = Counter(results)
import matplotlib.pyplot as plot
plot.bar(counts.keys(),counts.values())
plot.show()
```

The most common sum is 7, since it can be made by the most combinations (six of the 36 possible). The least commonly rolled are 2 and 12, since there is only one way to make each of them (1+1 and 6+6 respectively).
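The number of ways to make each sum can be counted directly instead of simulated. Here is a small sketch that lists every combination of two six-sided dice and counts the sums; the variable names `ways` and `probabilities` are just illustrative choices.

```python
from collections import Counter
from itertools import product

sides = 6
# every ordered pair (first die, second die), 36 pairs in total
ways = Counter(a + b for a, b in product(range(1, sides + 1), repeat=2))
total = sides * sides
# theoretical probability of each sum
probabilities = {dice_sum: ways[dice_sum] / total for dice_sum in sorted(ways)}
```

Here `ways[7]` is 6 while `ways[2]` and `ways[12]` are each 1, which matches the shape of the bar graph from the simulation.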

If you're interested in automated text analysis, check out this notebook.