Earth temperature over time

Is global temperature rising? How much? This is a question of burning importance in today's world!

Data about global temperatures are available from several sources: NASA, the National Climatic Data Center (NCDC) and the University of East Anglia in the UK. Check out the University Corporation for Atmospheric Research (UCAR) for an in-depth discussion.

The NASA Goddard Space Flight Center is one of our sources of global climate data. They produced this video showing a color map of the changing global surface temperature anomalies from 1880 to 2015.

The term global temperature anomaly means the difference in temperature with respect to a reference value or a long-term average. It is a very useful way of looking at the problem and, in many ways, better than absolute temperature. For example, a winter month may be colder than average in both Washington DC and Miami, yet the absolute temperatures in the two cities will be very different; the anomaly captures the departure from normal that they have in common.
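To make the idea concrete, here is a minimal sketch of how an anomaly is computed: subtract a baseline (here, the mean over an assumed reference period) from each value. The temperatures below are made-up, illustrative numbers, not real measurements, and the choice of baseline is our own assumption; each data provider defines its own reference period.

```python
import numpy

# Hypothetical January mean temperatures (°C) for five years; illustrative only.
washington = numpy.array([1.0, 2.0, 0.5, 1.5, -1.0])
miami = numpy.array([20.0, 21.0, 19.5, 20.5, 18.0])

def anomaly(values, baseline):
    """Difference of each value with respect to a reference value."""
    return values - baseline

# Assumed baseline: the mean of the first four years, standing in for a long-term average.
wash_anom = anomaly(washington, washington[:4].mean())
miami_anom = anomaly(miami, miami[:4].mean())

# The last year is 19 °C apart in absolute terms, but equally anomalous in both cities.
print(wash_anom[-1], miami_anom[-1])  # → -2.25 -2.25
```

Both cities end up 2.25 °C below their own baseline, even though their absolute temperatures differ by 19 °C.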

In [1]:
from IPython.display import YouTubeVideo
YouTubeVideo('gGOzHVUQCw0')

How would we go about understanding the trends from the data on global temperature?

A good first step in analyzing unfamiliar data is to make some simple plots with matplotlib. We are going to look at the temperature-anomaly history, stored in a file, and make a first plot to explore this data.

We are going to smooth the data and then we'll fit a line to it to find a trend, plotting along the way to see how it all looks.

Let's get started!

Step 1: Read a data file.

We took the data from the NOAA (National Oceanic and Atmospheric Administration) webpage. Feel free to play around with the webpage and analyze data on your own, but for now, let's make sure we're working with the same dataset.

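Go to the data folder in the Caminos GitHub repository, select the file land_global_temperature_anomaly-1880-2016.csv, click "Raw", and then right-click and select Save Page As to save it on your local drive. The code below reads it from data/land_global_temperature_anomaly-1880-2016.csv, so save it in a folder named data inside the directory where the notebook runs.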

This file contains the year in the first column and the average land temperature anomaly for that year in the second column, from 1880 to 2016. We will read the file and then make an initial plot to see what it looks like.

In [2]:
import numpy
In [3]:
year, temp_anomaly = numpy.loadtxt(fname='data/land_global_temperature_anomaly-1880-2016.csv', delimiter=',', 
                              skiprows=5, unpack=True)
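If you want to see what delimiter, skiprows, and unpack do without the real file, here is a small sketch that feeds numpy.loadtxt an in-memory CSV through io.StringIO. The five header lines and the numbers are invented for illustration; we are only assuming that the real file likewise starts with five lines of text followed by year,value rows, which is what skiprows=5 implies.

```python
import io
import numpy

# Made-up stand-in for the CSV file: five header lines, then "year,anomaly" rows.
fake_csv = io.StringIO(
    "Description line 1\n"
    "Description line 2\n"
    "Description line 3\n"
    "Description line 4\n"
    "Year,Value\n"
    "1880,-0.5\n"
    "1881,-0.25\n"
    "1882,-0.75\n"
)

# delimiter=',' splits the columns, skiprows=5 skips the header lines,
# and unpack=True returns one array per column instead of a 2D array.
year, temp_anomaly = numpy.loadtxt(fake_csv, delimiter=',', skiprows=5, unpack=True)

print(year)
print(temp_anomaly)
```

Without unpack=True, loadtxt would return a single array of shape (3, 2), one row per line of data.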

Exercise: Inspect the data by printing year and temp_anomaly.

Step 2: Plot the data.

Let's first load our plotting library, called matplotlib. To get the plots inside the notebook (rather than as popups), we use a special "magic" command, %matplotlib inline:

In [4]:
from matplotlib import pyplot
%matplotlib inline

Adding a semicolon at the end of the last plotting command suppresses the text that would otherwise appear above the figure, that Out[x]: [<matplotlib.lines.Line2D at ...>] ugliness. Try it.

In [5]:
pyplot.plot(year, temp_anomaly);

Now we have a plot, but if I gave you this plot without any other information, you would not be able to tell what kind of data it shows! We need labels on the axes, a title and, why not, a better color, font, and tick size. Publication-quality plots should always be your standard: how you present your data determines how well others (and probably you, in the future) will understand your work.

Let's set a specific font family and size. We don't want to repeat these settings every time we create a plot, so instead we set them once: the next few lines of code apply to all the plots we create from now on.

In [6]:
from matplotlib import rcParams
rcParams['font.family'] = 'serif'
rcParams['font.size'] = 16
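Settings in rcParams are global for the whole session. If you ever want different settings for a single figure only, Matplotlib's rc_context() context manager changes them temporarily and restores the previous values on exit. A minimal sketch (the font size 10 is just an example value):

```python
import matplotlib
from matplotlib import rcParams

# Global settings, as in the cell above: they apply to every later plot.
rcParams['font.family'] = 'serif'
rcParams['font.size'] = 16

# Temporary override: inside the with-block the font size is 10 ...
with matplotlib.rc_context({'font.size': 10}):
    size_inside = rcParams['font.size']

# ... and on exit the previous global value is restored.
size_after = rcParams['font.size']
print(size_inside, size_after)
```

Any figure created inside the with-block would use the temporary settings; everything else keeps the global ones.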

Now let's make the same plot as before, adding a few things to make it prettier and publication quality.

In [7]:
#You can set the size of the figure by doing:
pyplot.figure(figsize=(10,5))

#Plotting
pyplot.plot(year, temp_anomaly, color='#2929a3', linestyle='-', linewidth=1) 
pyplot.title('Land global temperature anomalies. \n')
pyplot.xlabel('Year')
pyplot.ylabel('Land temperature anomaly [°C]')
pyplot.grid();

Better, right? Feel free to play around with the parameters and see how the plot changes.
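The introduction mentioned smoothing the data before looking for a trend. The notebook does not show a particular smoothing method, so the sketch below uses one common option, a centered moving average built with numpy.convolve. The function name moving_average and the window size are our own illustrative choices, and the input is a small made-up array rather than the real data.

```python
import numpy

def moving_average(values, window):
    """Moving average over `window` points; returns len(values) - window + 1 values."""
    # Equal weights summing to 1, so each output is the mean of `window` consecutive values.
    weights = numpy.ones(window) / window
    # mode='valid' keeps only positions where the window fits entirely inside the data.
    return numpy.convolve(values, weights, mode='valid')

# Small illustrative input (not the temperature data).
values = numpy.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
smooth = moving_average(values, 3)
print(smooth)
```

The smoothed array is shorter than the input. To plot it against the data with an odd window w, pair it with the years at the middle of each window, year[w//2 : len(year) - w//2].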

Step 3: Apply regression

Let's now fit a straight line through the temperature-anomaly data, to see the trends. We need to perform a least-squares linear regression to find the slope and intercept of a line

$$y = mx+b$$

that fits our data. Thankfully, Python and NumPy are here to help! With polyfit(), we get the slope and y-intercept of the line that best fits the data. With poly1d(), we can build the linear function from its slope and y-intercept.
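For a straight line, the least-squares slope and intercept have a closed form, and this is the problem numpy.polyfit(x, y, 1) solves:

$$m = \frac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sum_i (x_i-\bar{x})^2}, \qquad b = \bar{y} - m\,\bar{x}$$

The sketch below implements these two formulas (the helper least_squares_line is our own, hypothetical name) and checks them against numpy.polyfit on a few made-up points, not the temperature data.

```python
import numpy

def least_squares_line(x, y):
    """Slope m and intercept b of the least-squares line y = m*x + b."""
    x_mean, y_mean = x.mean(), y.mean()
    m = numpy.sum((x - x_mean) * (y - y_mean)) / numpy.sum((x - x_mean) ** 2)
    b = y_mean - m * x_mean
    return m, b

# Made-up points scattered around the line y = 2x + 1.
x = numpy.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = numpy.array([1.1, 2.9, 5.2, 6.8, 9.1])

m, b = least_squares_line(x, y)
m_np, b_np = numpy.polyfit(x, y, 1)  # degree 1: returns [slope, intercept]
print(m, b)
print(m_np, b_np)
```

Both approaches give a slope close to 1.99 and an intercept close to 1.04, up to floating-point rounding.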

In [8]:
m, b = numpy.polyfit(year, temp_anomaly, 1)

f_linear = numpy.poly1d((m, b)) 
In [9]:
print(f_linear)
 
0.01037 x - 20.15
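The printed polynomial reads y = 0.01037 x - 20.15, so the slope is in °C per year. A quick back-of-the-envelope check, using the rounded coefficients exactly as printed (so the results are approximate), turns it into a rate per century and into the total rise that this single line implies between 1880 and 2016:

```python
import numpy

# Rounded coefficients copied from the printed output above.
f = numpy.poly1d((0.01037, -20.15))

per_century = f.coeffs[0] * 100    # slope in °C per 100 years
total_change = f(2016) - f(1880)   # rise along the fitted line over the record

print(round(per_century, 2), round(total_change, 2))  # → 1.04 1.41
```

The intercept cancels in the difference, so the total change is just the slope times 136 years.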
In [10]:
pyplot.figure(figsize=(10, 5))

pyplot.plot(year, temp_anomaly, color='#2929a3', linestyle='-', linewidth=1, alpha=0.5) 
pyplot.plot(year, f_linear(year), 'k--', linewidth=2, label='Linear regression')
pyplot.xlabel('Year')
pyplot.ylabel('Land temperature anomaly [°C]')
pyplot.legend(loc='best', fontsize=15)
pyplot.grid();

Split regression

We now have the straight line that best fits our data in the least-squares sense, but it does not look like a good approximation. Looking at the plot, you may have noticed that around 1970 the temperature anomaly starts increasing faster.

What if we split the data in two (before and after 1970) and perform a linear regression on each segment? To do that, we need to find the position of the year 1970 in our year array. Thankfully, NumPy has a function called numpy.where() that can help: we pass it a condition, and it tells us where in the array the condition is True.

In [11]:
numpy.where(year==1970)
Out[11]:
(array([90]),)
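numpy.where() returns a tuple with one array of indices per dimension, which is why the output looks like (array([90]),). To get a plain integer to slice with, take the first array and then its first element. A sketch, using a year array rebuilt with numpy.arange as a stand-in for the one read from the file (assuming one row per year from 1880 to 2016):

```python
import numpy

# Stand-in for the year array loaded from the CSV: 1880, 1881, ..., 2016.
year = numpy.arange(1880, 2017, dtype=float)

matches = numpy.where(year == 1970)   # tuple holding one index array
idx = int(matches[0][0])              # first dimension, first match

# Sanity check: the element at that index really is 1970.
print(idx, year[idx])
```

Comparing floats with == is safe here only because the years are whole numbers, which floating point represents exactly.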

To split the data, we can use a powerful tool: slicing with the colon notation. A colon between two indices indicates a range of values from a start to an end. The rule is that [start:end] includes the element at index start but excludes the one at index end. For example, to grab the first 3 years in our year array, we do:

In [12]:
year[0:3]
Out[12]:
array([ 1880.,  1881.,  1882.])
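Because the end index is excluded, year[0:90] stops just before index 90, which is the position of 1970. The quick check below, again using an arange stand-in for the loaded year array, confirms that the two slices used next cover 1880 to 1969 and 1970 to 2016, with no overlap and no gap.

```python
import numpy

year = numpy.arange(1880, 2017, dtype=float)   # stand-in for the loaded data

first = year[0:90]    # indices 0..89, i.e. 1880..1969
second = year[90:]    # index 90 to the end, i.e. 1970..2016

print(first[0], first[-1], len(first))       # first year, last year, count
print(second[0], second[-1], len(second))
# Together the two slices contain every year exactly once.
print(len(first) + len(second) == len(year))
```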

Now we know how to split our data into two sets and fit a regression line to each:

In [13]:
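# Index 90 is the position of 1970 (found with numpy.where above)
year_1, temp_anomaly_1 = year[0:90], temp_anomaly[0:90]   # 1880-1969
year_2, temp_anomaly_2 = year[90:], temp_anomaly[90:]     # 1970-2016

# Least-squares line (degree 1) for each segment
m1, b1 = numpy.polyfit(year_1, temp_anomaly_1, 1)
m2, b2 = numpy.polyfit(year_2, temp_anomaly_2, 1)

# Callable linear functions, used for plotting below
f_linear_1 = numpy.poly1d((m1, b1))
f_linear_2 = numpy.poly1d((m2, b2))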
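Hard-coding the index 90 works for this file, but a boolean mask expresses the split directly in terms of years and keeps working if the file starts at a different year. Here is a sketch on synthetic, noise-free data with a known change of slope at 1970; the slopes 0.005 and 0.03 °C per year are invented for the check, not estimates from the real record.

```python
import numpy

# Synthetic anomalies: slope 0.005 before 1970, 0.03 from 1970 on (made-up values).
year = numpy.arange(1880, 2017, dtype=float)
temp_anomaly = numpy.where(year < 1970,
                           0.005 * (year - 1970),
                           0.03 * (year - 1970))

before = year < 1970   # boolean mask: True for 1880..1969
after = ~before        # True for 1970..2016

# Indexing with a mask keeps only the elements where the mask is True.
m1, b1 = numpy.polyfit(year[before], temp_anomaly[before], 1)
m2, b2 = numpy.polyfit(year[after], temp_anomaly[after], 1)

print(m1, m2)
```

Because the synthetic data are exactly linear on each side, the fits recover the chosen slopes up to rounding, which is a handy way to check the splitting logic before applying it to real data.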
In [14]:
pyplot.figure(figsize=(10, 5))

pyplot.plot(year, temp_anomaly, color='#2929a3', linestyle='-', linewidth=1, alpha=0.5) 
pyplot.plot(year_1, f_linear_1(year_1), 'g--', linewidth=2, label='1880-1969')
pyplot.plot(year_2, f_linear_2(year_2), 'r--', linewidth=2, label='1970-2016')

pyplot.xlabel('Year')
pyplot.ylabel('Land temperature anomaly [°C]')
pyplot.legend(loc='best', fontsize=15)
pyplot.grid();

We learned:

  • applying linear regression to data
  • making our plots more beautiful
  • accessing chunks of an array
  • The Earth is warming up!!!