Analysis of the NASA GISTEMP Datasets

by
Andreas Putz
In [1]:
import IPython
from IPython.display import HTML, display

The purpose of this notebook is to reproduce and improve on some graphs and animations I have recently encountered on social media such as LinkedIn. One of the most recent was a LinkedIn post (https://www.linkedin.com/feed/update/urn:li:activity:6300917710327029760/) showing this video:

Originally I liked this graph, until I looked a bit closer:

  1. The bars should start at the 0 ring
  2. The global graph does not have any scales
  3. No definition of temperature anomaly
  4. How do I reproduce this from the data source cited?

The data source cited is **Land-Ocean Temperature Index, ERSSTv4, 1200km smoothing**, which can be found at https://data.giss.nasa.gov/gistemp/.

In this notebook, we will take a first look at the GISTEMP temperature anomaly datasets. These datasets give the temperature deviation with respect to a reference time period. For the sake of this notebook, we will not concern ourselves with how the anomaly data is prepared. That is a story for another article.
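
As a minimal sketch of what "deviation with respect to a reference time period" means, the snippet below computes anomalies for a made-up series of annual mean temperatures. The numbers and the helper name `anomalies` are illustrative assumptions, not GISTEMP values, and the real GISTEMP procedure is considerably more involved.

```python
from statistics import mean

# Hypothetical annual mean temperatures in degC (made-up numbers, not GISTEMP data)
temps = {1951: 14.0, 1952: 14.2, 1953: 13.8, 2016: 15.0}

def anomalies(series, ref_start, ref_end):
    """Return each value minus the mean over the reference period [ref_start, ref_end]."""
    baseline = mean(v for year, v in series.items() if ref_start <= year <= ref_end)
    return {year: round(v - baseline, 2) for year, v in series.items()}

# Use 1951-1953 as a (shortened, hypothetical) reference period
result = anomalies(temps, 1951, 1953)
print(result)
```

With these made-up numbers the reference mean is 14.0 degC, so the 2016 entry comes out as an anomaly of 1.0 degC.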

Notebook Setup

This section handles the module imports for this notebook. It must run to completion for the rest of the notebook to work correctly.

My Anaconda setup:

  • Python 3.5 environment
  • Anaconda notebook extension + community notebook extensions
  • conda-forge channel activated
In [2]:
import IPython
from IPython.display import HTML, display
import datetime

from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets

import scipy as sp
import pandas as pd

import pylab as plt
import matplotlib.mlab as mlab
from matplotlib.dates import drange, MonthLocator
%matplotlib notebook

# Modules to create nice colormaps and earth projections
try:
    from mpl_toolkits.basemap import Basemap
    import cmocean
except Exception as ex:
    print('Modules basemap and/or cmocean not installed; please use pip or conda to install them')
    print(ex)

import urllib.request
import os
import sys
import gzip
import zipfile
import shutil


try:
    import plotly
    import plotly.plotly as py
    from plotly.graph_objs import *
    plotly.offline.init_notebook_mode(connected=True)
    #plotly.offline.init_notebook_mode()
    print('Plotly Version: ', plotly.__version__)
except Exception as ex:
    print('Plotly not installed correctly')
    print(ex)
    
try:
    import netCDF4
except Exception as ex:
    print('netCDF4 is not correctly installed. Please use pip or conda to install it.')
    print(ex)
Plotly Version:  2.0.12
In [3]:
%%html
<style>
.output_wrapper, .output {
    height:auto !important;
    max-height:4000px;  /* your desired max-height here */
}
.output_scroll {
    box-shadow:none !important;
    webkit-box-shadow:none !important;
}
</style>
In [4]:
print('Notebook Executed:\t ' + str(datetime.datetime.now()))
print('='*80)
print('Python Version:')
print('-'*80)
print(sys.version)
print('='*80)
Notebook Executed:	 2017-08-30 14:21:54.072342
================================================================================
Python Version:
--------------------------------------------------------------------------------
3.5.3 | packaged by conda-forge | (default, May 12 2017, 16:16:49) [MSC v.1900 64 bit (AMD64)]
================================================================================

Global Data: Combined Land-Surface Air and Sea-Surface Water Temperature Anomalies (Land-Ocean Temperature Index, LOTI)

The LOTI (Land-Ocean Temperature Index) uses a more complicated averaging scheme between the ocean and land data to account for the higher heat capacity of water. According to NASA, this leads to a slight underestimation of cooling and warming trends due to the damping effect of the oceans. The two input datasets for the LOTI calculations are:

  • Surface Air Temperatures (SATs) measured by weather stations worldwide, and
  • Sea Surface Temperatures (SSTs) measured by ships, buoys and, more recently, satellites.

The LOTI is given in degC and denotes the temperature deviation with respect to a reference temperature. Currently the reference temperature is the mean temperature from 1951 to 1980. NASA calls this quantity the temperature anomaly.
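
Because an anomaly is only meaningful relative to its reference period, moving a single time series to a different reference period shifts every value by the same constant. A minimal sketch with made-up numbers follows; the helper name `rebaseline` and the 1981 to 1983 window are illustrative assumptions, not part of the GISTEMP files.

```python
from statistics import mean

# Hypothetical anomalies in degC relative to some original baseline (made-up numbers)
anoms = {1981: 0.25, 1982: 0.5, 1983: 0.75, 2016: 1.5}

def rebaseline(series, new_start, new_end):
    """Shift anomalies so that their mean over [new_start, new_end] becomes zero."""
    offset = mean(v for year, v in series.items() if new_start <= year <= new_end)
    return {year: v - offset for year, v in series.items()}

shifted = rebaseline(anoms, 1981, 1983)
print(shifted)
```

The shape of the curve is unchanged; only the zero line moves, which is worth keeping in mind when comparing graphs that use different reference periods.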

NASA runs the averaging procedure every month and makes the resulting data files available. For this section, three files were analyzed:

  • The global mean for each month since 1880: GLB.Ts+dSST.csv
  • The northern hemisphere mean for each month since 1880: NH.Ts+dSST.csv
  • The southern hemisphere mean for each month since 1880: SH.Ts+dSST.csv

Source: https://data.giss.nasa.gov/gistemp/faq

Download and parse the files

In [5]:
months = {}
months['all']= ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']
months['Q1'] = ['Jan','Feb','Mar']
months['Q2'] = ['Apr','May','Jun']
months['Q3'] = ['Jul','Aug','Sep']
months['Q4'] = ['Oct','Nov','Dec']
In [6]:
try:
    urllib.request.urlretrieve('https://data.giss.nasa.gov/gistemp/tabledata_v3/GLB.Ts+dSST.csv',
                               'GLB.Ts+dSST.csv')
    urllib.request.urlretrieve('https://data.giss.nasa.gov/gistemp/tabledata_v3/NH.Ts+dSST.csv',
                               'NH.Ts+dSST.csv')
    urllib.request.urlretrieve('https://data.giss.nasa.gov/gistemp/tabledata_v3/SH.Ts+dSST.csv',
                               'SH.Ts+dSST.csv')
    print('Download successful')
except Exception as ex:
    print('Data could not be retrieved!')
    print(ex)
Download successful
In [7]:
LOTI = {}
LOTI["GLOBAL"] = pd.read_csv('GLB.Ts+dSST.csv',header=1,na_values='***',index_col='Year')
LOTI["NH"] = pd.read_csv('NH.Ts+dSST.csv',header=1,na_values='***',index_col='Year')
LOTI["SH"] = pd.read_csv('SH.Ts+dSST.csv',header=1,na_values='***',index_col='Year')
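
The `header=1` and `na_values='***'` arguments used above are easier to follow on a small in-memory example. The sketch below assumes that the downloaded files start with one title line above the column header and mark missing months with `***`, which is what the calls above imply; the sample text itself is made up and heavily truncated.

```python
import io
import pandas as pd

# Made-up, truncated stand-in for a GISTEMP table file (assumed layout:
# one title line, then the column header, then one row per year)
sample = """Hypothetical title line
Year,Jan,Feb,Mar
2001,0.10,0.20,0.30
2002,0.40,0.50,***
"""

# header=1 takes the column names from the second line and skips the title line;
# na_values='***' converts the missing-value marker into NaN
demo = pd.read_csv(io.StringIO(sample), header=1, na_values='***', index_col='Year')

print(demo)
print(demo.dtypes)
```

Without `na_values='***'` the `Mar` column would be read as strings, and the row-wise means used later in this notebook would not work on it.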

Initial Data Inspection

In [8]:
df = LOTI['GLOBAL']
df[-10:]
Out[8]:
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec J-D D-N DJF MAM JJA SON
Year
2008 0.23 0.34 0.73 0.51 0.47 0.46 0.58 0.43 0.63 0.63 0.65 0.53 0.52 0.51 0.34 0.57 0.49 0.64
2009 0.61 0.50 0.52 0.59 0.64 0.64 0.71 0.66 0.69 0.64 0.76 0.65 0.64 0.62 0.55 0.58 0.67 0.70
2010 0.73 0.80 0.92 0.84 0.73 0.62 0.59 0.63 0.59 0.69 0.77 0.46 0.70 0.71 0.73 0.83 0.61 0.68
2011 0.48 0.51 0.62 0.62 0.50 0.57 0.71 0.71 0.54 0.63 0.55 0.53 0.58 0.58 0.48 0.58 0.66 0.57
2012 0.45 0.47 0.56 0.68 0.74 0.62 0.54 0.61 0.72 0.75 0.73 0.52 0.62 0.62 0.48 0.66 0.59 0.74
2013 0.66 0.55 0.66 0.52 0.58 0.64 0.57 0.66 0.77 0.67 0.78 0.65 0.64 0.63 0.58 0.59 0.63 0.74
2014 0.73 0.52 0.76 0.78 0.85 0.66 0.56 0.81 0.88 0.82 0.66 0.78 0.73 0.72 0.63 0.80 0.68 0.78
2015 0.81 0.87 0.90 0.74 0.76 0.78 0.71 0.79 0.82 1.07 1.05 1.12 0.87 0.84 0.82 0.80 0.76 0.98
2016 1.18 1.35 1.32 1.09 0.92 0.78 0.82 0.99 0.87 0.89 0.90 0.84 1.00 1.02 1.22 1.11 0.86 0.89
2017 0.98 1.13 1.14 0.94 0.89 0.68 0.83 NaN NaN NaN NaN NaN NaN NaN 0.98 0.99 NaN NaN
In [9]:
plt.figure(1,figsize=(10,6))
plt.plot(df.index,df[months['all']].mean(axis=1),'k',label='mean')
plt.plot(df.index,df[months['all']].max(axis=1),'r--',label='max',alpha=0.3)
plt.plot(df.index,df[months['all']].min(axis=1),'b--',label='min',alpha=0.3)
ax = plt.gca()
ax.fill_between(df.index,df[months['all']].min(axis=1),df[months['all']].max(axis=1),facecolor='k',alpha=0.3)
plt.ylabel('LOTI [degC]')
plt.xlabel('Year')
plt.title('Global LOTI Data')
plt.legend()
Out[9]:
<matplotlib.legend.Legend at 0x28eb84d8630>

Interactive LOTI plots

Impact of Smoothing

Plots of the LOTI with different smoothing intervals for various seasonal means.
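
A minimal sketch of the smoothing applied in these plots: a rolling mean over a window of years. The series below is made up. The point is that `center=False`, as used in this notebook, assigns each average to the last year of its window, so the smoothed curve lags the data, whereas `center=True` assigns it to the middle year.

```python
import pandas as pd

# Made-up annual values indexed by year (not GISTEMP data)
s = pd.Series([0.0, 1.0, 2.0, 3.0, 4.0], index=range(2000, 2005))

# Trailing window: the value at year t averages years t-2 .. t
trailing = s.rolling(window=3, center=False).mean()

# Centered window: the value at year t averages years t-1 .. t+1
centered = s.rolling(window=3, center=True).mean()

print(pd.DataFrame({'raw': s, 'trailing': trailing, 'centered': centered}))
```

In both cases the years without a full window are NaN, which is why the smoothed curves are shorter than the raw data.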

In [24]:
#widget_sel_file = widgets.Dropdown(description='File')
#widget_sel_file.options = list(LOTI.keys())
#display(widget_sel_file)

widget_sel_avg = widgets.Dropdown(description='Yearly Average Type')
widget_sel_avg.options = list(months.keys())
widget_sel_avg.value = 'all'
display(widget_sel_avg)

widget_sel_window = widgets.IntSlider(description='Averaging Window Size [Years]',min=1,max=20,value=10)
display(widget_sel_window)

widget_graph = widgets.Button(description = 'Make Graph')
display(widget_graph)

isfirst = True

def on_button_clicked(button):
    
    global isfirst
    plt.figure('LOTI - Averaging Study')
    i = 1
    windowsize = widget_sel_window.value
    avg_time = months[widget_sel_avg.value]
    
    ymax = 0; ymin = 0
    for key, df in LOTI.items():
        ymax=max(max(df[avg_time].mean(axis=1)),ymax)
        ymin=min(min(df[avg_time].mean(axis=1)),ymin)
    
    for key in LOTI.keys():
        df = LOTI[key]
        
        plt.subplot(1,3,i)
        if isfirst:
            plt.plot(df.index,df[avg_time].mean(axis=1),'k',label='Ann. Mean')


        # Compute the rolling statistics once and reuse them below
        rolling = df[avg_time].mean(axis=1).rolling(window=windowsize,center=False)
        roll_mean = rolling.mean()
        roll_std = rolling.std()

        plt.plot(df.index,
                 roll_mean,
                 label=str(windowsize) + ' yr, ' + widget_sel_avg.value)

        # Shade one rolling standard deviation around the rolling mean
        ax = plt.gca()
        ax.fill_between(df.index,
                        roll_mean-roll_std,
                        roll_mean+roll_std,
                        alpha=0.2)

        ax.axvspan(1951,1980,facecolor='green',alpha=0.3)
        
        if i==1:
            plt.ylabel('LOTI [degC]')
        plt.xlabel('Year')
        plt.legend()
        plt.title('Dataset: ' + key)
        plt.ylim(ymin,ymax)
        i += 1
    isfirst = False
    
   
plt.figure('LOTI - Averaging Study',figsize=(9.5,4)) 
widget_graph.on_click(on_button_clicked)

Seasonal data

In addition to the averaged data, we can also look at the monthly temperature variations.

In [11]:
plt.figure('Linegraphs - LOTI Monthly',figsize=(9.5,3))
i=1

for key, df in LOTI.items():
    plt.subplot(1,3,i)
    for year in df.index:
        plt.plot(df.loc[year][months['all']].tolist())
    
    i += 1