In [1]:
from urllib.request import urlopen
import pandas as pd
import matplotlib.pyplot as plt
In [2]:
%matplotlib inline

Climate data from NOAA

This notebook shows how to access NOAA's servers and extract average climate data at the station level. I am mainly interested in yearly averages, but the same logic applies to any other metric in the dataset.

The data come from NOAA's 1981-2010 Climate Normals. I will extract average values that characterize the climate at each weather station, and plot the stations at the end.

The dataset can be accessed via FTP at the following URL:

ftp://ftp.ncdc.noaa.gov/pub/data/normals/1981-2010/products/temperature/

Information about the data is in the README file:

In [3]:
# urlopen handles ftp:// URLs and returns bytes, so decode before slicing
readme = urlopen('ftp://ftp.ncdc.noaa.gov/pub/data/normals/1981-2010/readme.txt')
print(readme.read().decode('utf-8', errors='replace')[:200])
README FILE FOR NOAA'S 1981-2010 CLIMATE NORMALS

OUTLINE

I.   CONTENTS
II.  FILENAMING
III. FILE FORMATS
IV.  UNITS
V.   SPECIAL VALUES
VI.  FLAGS

I.  CONTENTS

    readme.txt - this file
    statu

We will extract the annual (ann) averages (normal). The dictionary below maps the four-letter variable codes used in the file names to readable descriptions:

In [4]:
vars2xtr = {
    'cldd': 'cooling degree days',
    'cldh': 'cooling degree hours',
    'clod': 'clouds',
    'dewp': 'dew point temperature',
    'dutr': 'diurnal temperature range',
    'hidx': 'heat index',
    'htdd': 'heating degree days',
    'htdh': 'heating degree hours',
    'prcp': 'precipitation',
    'pres': 'sea level pressure',
    'snow': 'snowfall',
    'snwd': 'snow depth',
    'tavg': 'mean temperature',
    'temp': 'temperature',
    'tmax': 'maximum temperature',
    'tmin': 'minimum temperature',
    'wchl': 'wind chill',
    'wind': 'wind',
}

Locating files

The directory listing at source_path, copied by hand, is stored below as a string:

In [5]:
source_path = 'ftp://ftp.ncdc.noaa.gov/pub/data/normals/1981-2010/products/temperature/'
In [6]:
files = '''File:ann-cldd-base45.txt 	184 KB 	30/06/11 	00:00:00
File:ann-cldd-base50.txt 	184 KB 	30/06/11 	00:00:00
File:ann-cldd-base55.txt 	184 KB 	30/06/11 	00:00:00
File:ann-cldd-base57.txt 	184 KB 	30/06/11 	00:00:00
File:ann-cldd-base60.txt 	184 KB 	30/06/11 	00:00:00
File:ann-cldd-base70.txt 	184 KB 	30/06/11 	00:00:00
File:ann-cldd-base72.txt 	184 KB 	30/06/11 	00:00:00
File:ann-cldd-normal.txt 	184 KB 	30/06/11 	00:00:00
File:ann-dutr-normal.txt 	184 KB 	30/06/11 	00:00:00
File:ann-htdd-base40.txt 	184 KB 	30/06/11 	00:00:00
File:ann-htdd-base45.txt 	184 KB 	30/06/11 	00:00:00
File:ann-htdd-base50.txt 	184 KB 	30/06/11 	00:00:00
File:ann-htdd-base55.txt 	184 KB 	30/06/11 	00:00:00
File:ann-htdd-base57.txt 	184 KB 	30/06/11 	00:00:00
File:ann-htdd-base60.txt 	184 KB 	30/06/11 	00:00:00
File:ann-htdd-normal.txt 	184 KB 	30/06/11 	00:00:00
File:ann-tavg-normal.txt 	184 KB 	30/06/11 	00:00:00
File:ann-tmax-avgnds-grth040.txt 	184 KB 	30/06/11 	00:00:00
File:ann-tmax-avgnds-grth050.txt 	184 KB 	30/06/11 	00:00:00
File:ann-tmax-avgnds-grth060.txt 	184 KB 	30/06/11 	00:00:00
File:ann-tmax-avgnds-grth070.txt 	184 KB 	30/06/11 	00:00:00
File:ann-tmax-avgnds-grth080.txt 	184 KB 	30/06/11 	00:00:00
File:ann-tmax-avgnds-grth090.txt 	184 KB 	30/06/11 	00:00:00
File:ann-tmax-avgnds-grth100.txt 	184 KB 	30/06/11 	00:00:00
File:ann-tmax-avgnds-lsth032.txt 	184 KB 	30/06/11 	00:00:00
File:ann-tmax-normal.txt 	184 KB 	30/06/11 	00:00:00
File:ann-tmin-avgnds-lsth000.txt 	184 KB 	30/06/11 	00:00:00
File:ann-tmin-avgnds-lsth010.txt 	184 KB 	30/06/11 	00:00:00
File:ann-tmin-avgnds-lsth020.txt 	184 KB 	30/06/11 	00:00:00
File:ann-tmin-avgnds-lsth032.txt 	184 KB 	30/06/11 	00:00:00
File:ann-tmin-avgnds-lsth040.txt 	184 KB 	30/06/11 	00:00:00
File:ann-tmin-avgnds-lsth050.txt 	184 KB 	30/06/11 	00:00:00
File:ann-tmin-avgnds-lsth060.txt 	184 KB 	30/06/11 	00:00:00
File:ann-tmin-avgnds-lsth070.txt 	184 KB 	30/06/11 	00:00:00
File:ann-tmin-normal.txt 	184 KB 	30/06/11 	00:00:00
File:djf-cldd-base45.txt 	184 KB 	30/06/11 	00:00:00
File:djf-cldd-base50.txt 	184 KB 	30/06/11 	00:00:00
File:djf-cldd-base55.txt 	184 KB 	30/06/11 	00:00:00
File:djf-cldd-base57.txt 	184 KB 	30/06/11 	00:00:00
File:djf-cldd-base60.txt 	184 KB 	30/06/11 	00:00:00
File:djf-cldd-base70.txt 	184 KB 	30/06/11 	00:00:00
File:djf-cldd-base72.txt 	184 KB 	30/06/11 	00:00:00
File:djf-cldd-normal.txt 	184 KB 	30/06/11 	00:00:00
File:djf-dutr-normal.txt 	184 KB 	30/06/11 	00:00:00
File:djf-htdd-base40.txt 	184 KB 	30/06/11 	00:00:00
File:djf-htdd-base45.txt 	184 KB 	30/06/11 	00:00:00
File:djf-htdd-base50.txt 	184 KB 	30/06/11 	00:00:00
File:djf-htdd-base55.txt 	184 KB 	30/06/11 	00:00:00
File:djf-htdd-base57.txt 	184 KB 	30/06/11 	00:00:00
File:djf-htdd-base60.txt 	184 KB 	30/06/11 	00:00:00
File:djf-htdd-normal.txt 	184 KB 	30/06/11 	00:00:00
File:djf-tavg-normal.txt 	184 KB 	30/06/11 	00:00:00
File:djf-tmax-avgnds-grth040.txt 	184 KB 	30/06/11 	00:00:00
File:djf-tmax-avgnds-grth050.txt 	184 KB 	30/06/11 	00:00:00
File:djf-tmax-avgnds-grth060.txt 	184 KB 	30/06/11 	00:00:00
File:djf-tmax-avgnds-grth070.txt 	184 KB 	30/06/11 	00:00:00
File:djf-tmax-avgnds-grth080.txt 	184 KB 	30/06/11 	00:00:00
File:djf-tmax-avgnds-grth090.txt 	184 KB 	30/06/11 	00:00:00
File:djf-tmax-avgnds-grth100.txt 	184 KB 	30/06/11 	00:00:00
File:djf-tmax-avgnds-lsth032.txt 	184 KB 	30/06/11 	00:00:00
File:djf-tmax-normal.txt 	184 KB 	30/06/11 	00:00:00
File:djf-tmin-avgnds-lsth000.txt 	184 KB 	30/06/11 	00:00:00
File:djf-tmin-avgnds-lsth010.txt 	184 KB 	30/06/11 	00:00:00
File:djf-tmin-avgnds-lsth020.txt 	184 KB 	30/06/11 	00:00:00
File:djf-tmin-avgnds-lsth032.txt 	184 KB 	30/06/11 	00:00:00
File:djf-tmin-avgnds-lsth040.txt 	184 KB 	30/06/11 	00:00:00
File:djf-tmin-avgnds-lsth050.txt 	184 KB 	30/06/11 	00:00:00
File:djf-tmin-avgnds-lsth060.txt 	184 KB 	30/06/11 	00:00:00
File:djf-tmin-avgnds-lsth070.txt 	184 KB 	30/06/11 	00:00:00
File:djf-tmin-normal.txt 	184 KB 	30/06/11 	00:00:00
File:dly-cldd-base45.txt 	20658 KB 	30/06/11 	00:00:00
File:dly-cldd-base50.txt 	20658 KB 	30/06/11 	00:00:00
File:dly-cldd-base55.txt 	20658 KB 	30/06/11 	00:00:00
File:dly-cldd-base57.txt 	20658 KB 	30/06/11 	00:00:00
File:dly-cldd-base60.txt 	20658 KB 	30/06/11 	00:00:00
File:dly-cldd-base70.txt 	20658 KB 	30/06/11 	00:00:00
File:dly-cldd-base72.txt 	20658 KB 	30/06/11 	00:00:00
File:dly-cldd-normal.txt 	20658 KB 	30/06/11 	00:00:00
File:dly-dutr-normal.txt 	20658 KB 	30/06/11 	00:00:00
File:dly-dutr-stddev.txt 	17535 KB 	30/06/11 	00:00:00
File:dly-htdd-base40.txt 	20658 KB 	30/06/11 	00:00:00
File:dly-htdd-base45.txt 	20658 KB 	30/06/11 	00:00:00
File:dly-htdd-base50.txt 	20658 KB 	30/06/11 	00:00:00
File:dly-htdd-base55.txt 	20658 KB 	30/06/11 	00:00:00
File:dly-htdd-base57.txt 	20658 KB 	30/06/11 	00:00:00
File:dly-htdd-base60.txt 	20658 KB 	30/06/11 	00:00:00
File:dly-htdd-normal.txt 	20658 KB 	30/06/11 	00:00:00
File:dly-tavg-normal.txt 	20658 KB 	30/06/11 	00:00:00
File:dly-tavg-stddev.txt 	17535 KB 	30/06/11 	00:00:00
File:dly-tmax-normal.txt 	20658 KB 	30/06/11 	00:00:00
File:dly-tmax-stddev.txt 	17535 KB 	30/06/11 	00:00:00
File:dly-tmin-normal.txt 	20658 KB 	30/06/11 	00:00:00
File:dly-tmin-stddev.txt 	17535 KB 	30/06/11 	00:00:00
File:jja-cldd-base45.txt 	184 KB 	30/06/11 	00:00:00
File:jja-cldd-base50.txt 	184 KB 	30/06/11 	00:00:00
File:jja-cldd-base55.txt 	184 KB 	30/06/11 	00:00:00
File:jja-cldd-base57.txt 	184 KB 	30/06/11 	00:00:00
File:jja-cldd-base60.txt 	184 KB 	30/06/11 	00:00:00
File:jja-cldd-base70.txt 	184 KB 	30/06/11 	00:00:00
File:jja-cldd-base72.txt 	184 KB 	30/06/11 	00:00:00
File:jja-cldd-normal.txt 	184 KB 	30/06/11 	00:00:00
File:jja-dutr-normal.txt 	184 KB 	30/06/11 	00:00:00
File:jja-htdd-base40.txt 	184 KB 	30/06/11 	00:00:00
File:jja-htdd-base45.txt 	184 KB 	30/06/11 	00:00:00
File:jja-htdd-base50.txt 	184 KB 	30/06/11 	00:00:00
File:jja-htdd-base55.txt 	184 KB 	30/06/11 	00:00:00
File:jja-htdd-base57.txt 	184 KB 	30/06/11 	00:00:00
File:jja-htdd-base60.txt 	184 KB 	30/06/11 	00:00:00
File:jja-htdd-normal.txt 	184 KB 	30/06/11 	00:00:00
File:jja-tavg-normal.txt 	184 KB 	30/06/11 	00:00:00
File:jja-tmax-avgnds-grth040.txt 	184 KB 	30/06/11 	00:00:00
File:jja-tmax-avgnds-grth050.txt 	184 KB 	30/06/11 	00:00:00
File:jja-tmax-avgnds-grth060.txt 	184 KB 	30/06/11 	00:00:00
File:jja-tmax-avgnds-grth070.txt 	184 KB 	30/06/11 	00:00:00
File:jja-tmax-avgnds-grth080.txt 	184 KB 	30/06/11 	00:00:00
File:jja-tmax-avgnds-grth090.txt 	184 KB 	30/06/11 	00:00:00
File:jja-tmax-avgnds-grth100.txt 	184 KB 	30/06/11 	00:00:00
File:jja-tmax-avgnds-lsth032.txt 	184 KB 	30/06/11 	00:00:00
File:jja-tmax-normal.txt 	184 KB 	30/06/11 	00:00:00
File:jja-tmin-avgnds-lsth000.txt 	184 KB 	30/06/11 	00:00:00
File:jja-tmin-avgnds-lsth010.txt 	184 KB 	30/06/11 	00:00:00
File:jja-tmin-avgnds-lsth020.txt 	184 KB 	30/06/11 	00:00:00
File:jja-tmin-avgnds-lsth032.txt 	184 KB 	30/06/11 	00:00:00
File:jja-tmin-avgnds-lsth040.txt 	184 KB 	30/06/11 	00:00:00
File:jja-tmin-avgnds-lsth050.txt 	184 KB 	30/06/11 	00:00:00
File:jja-tmin-avgnds-lsth060.txt 	184 KB 	30/06/11 	00:00:00
File:jja-tmin-avgnds-lsth070.txt 	184 KB 	30/06/11 	00:00:00
File:jja-tmin-normal.txt 	184 KB 	30/06/11 	00:00:00
File:mam-cldd-base45.txt 	184 KB 	30/06/11 	00:00:00
File:mam-cldd-base50.txt 	184 KB 	30/06/11 	00:00:00
File:mam-cldd-base55.txt 	184 KB 	30/06/11 	00:00:00
File:mam-cldd-base57.txt 	184 KB 	30/06/11 	00:00:00
File:mam-cldd-base60.txt 	184 KB 	30/06/11 	00:00:00
File:mam-cldd-base70.txt 	184 KB 	30/06/11 	00:00:00
File:mam-cldd-base72.txt 	184 KB 	30/06/11 	00:00:00
File:mam-cldd-normal.txt 	184 KB 	30/06/11 	00:00:00
File:mam-dutr-normal.txt 	184 KB 	30/06/11 	00:00:00
File:mam-htdd-base40.txt 	184 KB 	30/06/11 	00:00:00
File:mam-htdd-base45.txt 	184 KB 	30/06/11 	00:00:00
File:mam-htdd-base50.txt 	184 KB 	30/06/11 	00:00:00
File:mam-htdd-base55.txt 	184 KB 	30/06/11 	00:00:00
File:mam-htdd-base57.txt 	184 KB 	30/06/11 	00:00:00
File:mam-htdd-base60.txt 	184 KB 	30/06/11 	00:00:00
File:mam-htdd-normal.txt 	184 KB 	30/06/11 	00:00:00
File:mam-tavg-normal.txt 	184 KB 	30/06/11 	00:00:00
File:mam-tmax-avgnds-grth040.txt 	184 KB 	30/06/11 	00:00:00
File:mam-tmax-avgnds-grth050.txt 	184 KB 	30/06/11 	00:00:00
File:mam-tmax-avgnds-grth060.txt 	184 KB 	30/06/11 	00:00:00
File:mam-tmax-avgnds-grth070.txt 	184 KB 	30/06/11 	00:00:00
File:mam-tmax-avgnds-grth080.txt 	184 KB 	30/06/11 	00:00:00
File:mam-tmax-avgnds-grth090.txt 	184 KB 	30/06/11 	00:00:00
File:mam-tmax-avgnds-grth100.txt 	184 KB 	30/06/11 	00:00:00
File:mam-tmax-avgnds-lsth032.txt 	184 KB 	30/06/11 	00:00:00
File:mam-tmax-normal.txt 	184 KB 	30/06/11 	00:00:00
File:mam-tmin-avgnds-lsth000.txt 	184 KB 	30/06/11 	00:00:00
File:mam-tmin-avgnds-lsth010.txt 	184 KB 	30/06/11 	00:00:00
File:mam-tmin-avgnds-lsth020.txt 	184 KB 	30/06/11 	00:00:00
File:mam-tmin-avgnds-lsth032.txt 	184 KB 	30/06/11 	00:00:00
File:mam-tmin-avgnds-lsth040.txt 	184 KB 	30/06/11 	00:00:00
File:mam-tmin-avgnds-lsth050.txt 	184 KB 	30/06/11 	00:00:00
File:mam-tmin-avgnds-lsth060.txt 	184 KB 	30/06/11 	00:00:00
File:mam-tmin-avgnds-lsth070.txt 	184 KB 	30/06/11 	00:00:00
File:mam-tmin-normal.txt 	184 KB 	30/06/11 	00:00:00
File:mly-cldd-base45.txt 	748 KB 	30/06/11 	00:00:00
File:mly-cldd-base50.txt 	748 KB 	30/06/11 	00:00:00
File:mly-cldd-base55.txt 	748 KB 	30/06/11 	00:00:00
File:mly-cldd-base57.txt 	748 KB 	30/06/11 	00:00:00
File:mly-cldd-base60.txt 	748 KB 	30/06/11 	00:00:00
File:mly-cldd-base70.txt 	748 KB 	30/06/11 	00:00:00
File:mly-cldd-base72.txt 	748 KB 	30/06/11 	00:00:00
File:mly-cldd-normal.txt 	748 KB 	30/06/11 	00:00:00
File:mly-dutr-normal.txt 	748 KB 	30/06/11 	00:00:00
File:mly-dutr-stddev.txt 	635 KB 	30/06/11 	00:00:00
File:mly-htdd-base40.txt 	748 KB 	30/06/11 	00:00:00
File:mly-htdd-base45.txt 	748 KB 	30/06/11 	00:00:00
File:mly-htdd-base50.txt 	748 KB 	30/06/11 	00:00:00
File:mly-htdd-base55.txt 	748 KB 	30/06/11 	00:00:00
File:mly-htdd-base57.txt 	748 KB 	30/06/11 	00:00:00
File:mly-htdd-base60.txt 	748 KB 	30/06/11 	00:00:00
File:mly-htdd-normal.txt 	748 KB 	30/06/11 	00:00:00
File:mly-tavg-normal.txt 	748 KB 	30/06/11 	00:00:00
File:mly-tavg-stddev.txt 	635 KB 	30/06/11 	00:00:00
File:mly-tmax-avgnds-grth040.txt 	748 KB 	30/06/11 	00:00:00
File:mly-tmax-avgnds-grth050.txt 	748 KB 	30/06/11 	00:00:00
File:mly-tmax-avgnds-grth060.txt 	748 KB 	30/06/11 	00:00:00
File:mly-tmax-avgnds-grth070.txt 	748 KB 	30/06/11 	00:00:00
File:mly-tmax-avgnds-grth080.txt 	748 KB 	30/06/11 	00:00:00
File:mly-tmax-avgnds-grth090.txt 	748 KB 	30/06/11 	00:00:00
File:mly-tmax-avgnds-grth100.txt 	748 KB 	30/06/11 	00:00:00
File:mly-tmax-avgnds-lsth032.txt 	748 KB 	30/06/11 	00:00:00
File:mly-tmax-normal.txt 	748 KB 	30/06/11 	00:00:00
File:mly-tmax-stddev.txt 	635 KB 	30/06/11 	00:00:00
File:mly-tmin-avgnds-lsth000.txt 	748 KB 	30/06/11 	00:00:00
File:mly-tmin-avgnds-lsth010.txt 	748 KB 	30/06/11 	00:00:00
File:mly-tmin-avgnds-lsth020.txt 	748 KB 	30/06/11 	00:00:00
File:mly-tmin-avgnds-lsth032.txt 	748 KB 	30/06/11 	00:00:00
File:mly-tmin-avgnds-lsth040.txt 	748 KB 	30/06/11 	00:00:00
File:mly-tmin-avgnds-lsth050.txt 	748 KB 	30/06/11 	00:00:00
File:mly-tmin-avgnds-lsth060.txt 	748 KB 	30/06/11 	00:00:00
File:mly-tmin-avgnds-lsth070.txt 	748 KB 	30/06/11 	00:00:00
File:mly-tmin-normal.txt 	748 KB 	30/06/11 	00:00:00
File:mly-tmin-stddev.txt 	635 KB 	30/06/11 	00:00:00
File:son-cldd-base45.txt 	184 KB 	30/06/11 	00:00:00
File:son-cldd-base50.txt 	184 KB 	30/06/11 	00:00:00
File:son-cldd-base55.txt 	184 KB 	30/06/11 	00:00:00
File:son-cldd-base57.txt 	184 KB 	30/06/11 	00:00:00
File:son-cldd-base60.txt 	184 KB 	30/06/11 	00:00:00
File:son-cldd-base70.txt 	184 KB 	30/06/11 	00:00:00
File:son-cldd-base72.txt 	184 KB 	30/06/11 	00:00:00
File:son-cldd-normal.txt 	184 KB 	30/06/11 	00:00:00
File:son-dutr-normal.txt 	184 KB 	30/06/11 	00:00:00
File:son-htdd-base40.txt 	184 KB 	30/06/11 	00:00:00
File:son-htdd-base45.txt 	184 KB 	30/06/11 	00:00:00
File:son-htdd-base50.txt 	184 KB 	30/06/11 	00:00:00
File:son-htdd-base55.txt 	184 KB 	30/06/11 	00:00:00
File:son-htdd-base57.txt 	184 KB 	30/06/11 	00:00:00
File:son-htdd-base60.txt 	184 KB 	30/06/11 	00:00:00
File:son-htdd-normal.txt 	184 KB 	30/06/11 	00:00:00
File:son-tavg-normal.txt 	184 KB 	30/06/11 	00:00:00
File:son-tmax-avgnds-grth040.txt 	184 KB 	30/06/11 	00:00:00
File:son-tmax-avgnds-grth050.txt 	184 KB 	30/06/11 	00:00:00
File:son-tmax-avgnds-grth060.txt 	184 KB 	30/06/11 	00:00:00
File:son-tmax-avgnds-grth070.txt 	184 KB 	30/06/11 	00:00:00
File:son-tmax-avgnds-grth080.txt 	184 KB 	30/06/11 	00:00:00
File:son-tmax-avgnds-grth090.txt 	184 KB 	30/06/11 	00:00:00
File:son-tmax-avgnds-grth100.txt 	184 KB 	30/06/11 	00:00:00
File:son-tmax-avgnds-lsth032.txt 	184 KB 	30/06/11 	00:00:00
File:son-tmax-normal.txt 	184 KB 	30/06/11 	00:00:00
File:son-tmin-avgnds-lsth000.txt 	184 KB 	30/06/11 	00:00:00
File:son-tmin-avgnds-lsth010.txt 	184 KB 	30/06/11 	00:00:00
File:son-tmin-avgnds-lsth020.txt 	184 KB 	30/06/11 	00:00:00
File:son-tmin-avgnds-lsth032.txt 	184 KB 	30/06/11 	00:00:00
File:son-tmin-avgnds-lsth040.txt 	184 KB 	30/06/11 	00:00:00
File:son-tmin-avgnds-lsth050.txt 	184 KB 	30/06/11 	00:00:00
File:son-tmin-avgnds-lsth060.txt 	184 KB 	30/06/11 	00:00:00
File:son-tmin-avgnds-lsth070.txt 	184 KB 	30/06/11 	00:00:00
File:son-tmin-normal.txt 	184 KB 	30/06/11 	00:00:00
'''
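Pasting the listing by hand works, but it goes stale if the directory changes. A minimal sketch of fetching it with Python's standard ftplib instead, assuming the server still accepts anonymous FTP logins (the helper name list_remote_dir is hypothetical):

```python
from ftplib import FTP


def list_remote_dir(host, path):
    """Return the bare file names in an FTP directory, using an anonymous login."""
    ftp = FTP(host)
    try:
        ftp.login()  # no arguments means an anonymous login
        names = ftp.nlst(path)
    finally:
        ftp.quit()
    # Some servers return full paths from NLST, others bare names
    return [name.rsplit('/', 1)[-1] for name in names]
```

For this directory the call would be list_remote_dir('ftp.ncdc.noaa.gov', '/pub/data/normals/1981-2010/products/temperature/'). The returned names carry neither the 'File:' prefix nor the size and date columns, so they can be filtered directly.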
In [7]:
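paths = []
for line in files.splitlines():
    if not line.strip():
        continue
    # Lines look like 'File:ann-cldd-base45.txt \t184 KB ...'; slice off the prefix
    # (str.strip('File:') removes a set of characters, not a prefix)
    fname = line.split()[0][len('File:'):]
    # Keep annual ('ann') normals ('normal') only
    if fname.startswith('ann-') and fname.endswith('-normal.txt'):
        paths.append(fname)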
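A looser substring test (any name containing 'ann' and 'normal') would also pick up unrelated names. A stricter alternative parses each name against the pattern every entry in the listing follows, <period>-<variable>-<product>.txt. The regular expression and the helper names below are my own assumptions, inferred from the listing rather than taken from NOAA documentation:

```python
import re

# Assumed pattern, inferred from the listing: e.g. 'ann-tmax-avgnds-grth040.txt'
FNAME_RE = re.compile(r'^(?P<period>[a-z]{3})-(?P<var>[a-z]{4})-(?P<product>.+)\.txt$')


def parse_filename(fname):
    """Split a normals file name into period, variable and product, or return None."""
    m = FNAME_RE.match(fname)
    return m.groupdict() if m else None


def select_files(fnames, period='ann', product='normal'):
    """Return the file names whose period and product match exactly."""
    out = []
    for fname in fnames:
        parts = parse_filename(fname)
        if parts and parts['period'] == period and parts['product'] == product:
            out.append(fname)
    return out
```

The var group gives the key for vars2xtr directly, and passing period='djf', 'mam', 'jja' or 'son' would select the seasonal files instead.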

This leaves the files holding annual averages, together with what each one measures:

In [8]:
for fname in paths:
    # The variable code is the second dash-separated field, e.g. 'tavg'
    name = vars2xtr[fname.split('-')[1]]
    print('%s | %s' % (fname, name))
ann-cldd-normal.txt | cooling degree days
ann-dutr-normal.txt | diurnal temperature range
ann-htdd-normal.txt | heating degree days
ann-tavg-normal.txt | mean temperature
ann-tmax-normal.txt | maximum temperature
ann-tmin-normal.txt | minimum temperature

Extracting the data

In [9]:
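def parse_line(line):
    # Annual files: 11-character station id, then the value, then a one-character flag
    return {'stnid': line[:11], 'val': line[11:-1].strip(), 'flag': line[-1:]}

def load_txt_file(url):
    # read_table splits on tabs, so each fixed-width line lands in a single column
    lines = pd.read_table(url, header=None, dtype=str)[0]
    return pd.DataFrame([parse_line(line) for line in lines])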
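parse_line is called once per line. For larger inputs such as the dly-* files, a vectorised version using the pandas .str accessor avoids the per-row Python call. The sketch below keeps the same layout assumption as parse_line (11-character station id, then the value, then a one-character flag) and additionally converts the value to a number; parse_lines is a hypothetical name, and the sample record in the usage note is made up to mimic that layout:

```python
import pandas as pd


def parse_lines(lines):
    """Vectorised counterpart of parse_line, slicing every line with the .str accessor."""
    s = pd.Series(lines, dtype=str)
    return pd.DataFrame({
        'stnid': s.str[:11],
        # Anything that is not a number becomes NaN instead of raising
        'val': pd.to_numeric(s.str[11:-1].str.strip(), errors='coerce'),
        'flag': s.str[-1:],
    })
```

For example, parse_lines(['USC00011084   670C']) yields stnid 'USC00011084', val 670 and flag 'C'.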
In [10]:
db_path = 'weather_data_stations.csv'
try:
    db = pd.read_csv(db_path)
except FileNotFoundError:
    # No cached copy yet: download every annual file and stack them
    frames = []
    for var in paths:
        print('Downloading:', var)
        dat = load_txt_file(source_path + var)
        dat['var'] = var
        frames.append(dat)
    db = pd.concat(frames, ignore_index=True)
    db.to_csv(db_path, index=False)
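The cell above treats a failed read of the CSV as the signal to download everything again. The same caching pattern can be written as a small reusable helper that checks for the file explicitly, so a malformed cache raises an error instead of being mistaken for a missing one. This is a sketch; cached_frame and its build argument are hypothetical names:

```python
import os

import pandas as pd


def cached_frame(path, build):
    """Load a DataFrame from a CSV cache, or build it and write the cache.

    `build` is any zero-argument callable that returns a DataFrame.
    """
    if os.path.exists(path):
        return pd.read_csv(path)
    df = build()
    df.to_csv(path, index=False)
    return df
```

With it, the download loop could be wrapped in a function and passed in as build, e.g. db = cached_frame(db_path, download_all), where download_all is a hypothetical function returning the concatenated frame.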

Geo-referencing the stations

Station information is given in a separate file, also accessible via FTP. Its structure, as copied from the README file, is:

       ------------------------------
       Variable   Columns   Type
       ------------------------------
       ID            1-11   Character
       LATITUDE     13-20   Real
       LONGITUDE    22-30   Real
       ELEVATION    32-37   Real
       STATE        39-40   Character
       NAME         42-71   Character
       GSNFLAG      73-75   Character
       HCNFLAG      77-79   Character
       WMOID        81-85   Character
       METHOD*      87-99   Character
       ------------------------------
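Because the layout is fixed-width, pandas can parse it directly with read_fwf instead of slicing each line by hand. The only subtlety is converting the README's 1-based, inclusive column ranges into the 0-based, half-open (start, end) pairs that colspecs expects. A sketch, where read_stations, LAYOUT, COLSPECS and NAMES are hypothetical names and keeping the identifier columns as strings is my own choice:

```python
import pandas as pd

# Column ranges from the README table: (name, first column, last column), 1-based inclusive
LAYOUT = [
    ('stnid', 1, 11), ('lat', 13, 20), ('lon', 22, 30), ('ele', 32, 37),
    ('state', 39, 40), ('name', 42, 71), ('gsnflag', 73, 75),
    ('hcnflag', 77, 79), ('wmoid', 81, 85), ('method', 87, 99),
]
# read_fwf wants 0-based, half-open (start, end) pairs
COLSPECS = [(first - 1, last) for _, first, last in LAYOUT]
NAMES = [name for name, _, _ in LAYOUT]


def read_stations(source):
    """Parse the fixed-width station inventory; padding spaces are stripped."""
    return pd.read_fwf(source, colspecs=COLSPECS, names=NAMES, header=None,
                       dtype={'stnid': str, 'state': str, 'wmoid': str})
```

Since pandas readers accept ftp:// URLs as well as local files, read_stations(stations_url) should return the same columns as the manual parser.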

We can parse this file and link it to db, keeping only values whose flag is C, S or R (see the FLAGS section of the README).

In [11]:
def parse_stn(line):
    # Slices follow the 1-based, inclusive column ranges in the table above
    return {
        'stnid': line[:11],
        'lat': float(line[12:20]),
        'lon': float(line[21:30]),
        'ele': float(line[31:37]),
        'state': line[38:40],
        'name': line[41:71].strip(),
        'gsnflag': line[72:75],
        'hcnflag': line[76:79],
        'wmoid': line[80:85],
        'method': line[86:99],
    }

def load_stations(url):
    # Each fixed-width line is read as a single string column, then parsed
    lines = pd.read_table(url, header=None, dtype=str)[0]
    return pd.DataFrame([parse_stn(line) for line in lines])
In [14]:
stations_url = 'ftp://ftp.ncdc.noaa.gov/pub/data/normals/1981-2010/station-inventories/allstations.txt'
try:
    geodb = pd.read_csv('weather_geo.csv')
except FileNotFoundError:
    stations = load_stations(stations_url)
    # Attach each station's coordinates to its values through the station id
    coords = stations.set_index('stnid')[['lon', 'lat', 'ele']]
    geodb = db.join(coords, on='stnid')
    # Keep only values with an accepted quality flag
    geodb = geodb[geodb['flag'].isin(['C', 'S', 'R'])]
    geodb.to_csv('weather_geo.csv', index=False)
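The join-and-filter step above can also be written with DataFrame.merge, which adds a useful safety check: validate='many_to_one' raises an error if a station id appears more than once in the station table, instead of silently duplicating values. A sketch, with attach_coordinates as a hypothetical helper name:

```python
import pandas as pd

# Flags accepted in this notebook
GOOD_FLAGS = ['C', 'S', 'R']


def attach_coordinates(values, stations, good_flags=GOOD_FLAGS):
    """Keep values with an accepted flag and add lon/lat/ele from the station table."""
    kept = values[values['flag'].isin(good_flags)]
    coords = stations[['stnid', 'lon', 'lat', 'ele']]
    # Raises pandas.errors.MergeError if stnid is not unique in `stations`
    return kept.merge(coords, on='stnid', how='left', validate='many_to_one')
```

Applied here it would read geodb = attach_coordinates(db, stations).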

Each value in the resulting table is now linked to its station's coordinates through the station id:

In [15]:
geodb.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 38047 entries, 0 to 38046
Data columns (total 8 columns):
Unnamed: 0    38047 non-null int64
flag          38047 non-null object
stnid         38047 non-null object
val           38047 non-null int64
var           38047 non-null object
lon           38047 non-null float64
lat           38047 non-null float64
ele           38047 non-null float64
dtypes: float64(3), int64(2), object(3)
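geodb is in long format, with one row per station and variable. To characterize each station's climate it is often handier to have one row per station and one column per variable. A sketch with pivot_table, assuming the var column holds file names such as 'ann-tavg-normal.txt' (as it does here) and using the second dash-separated field as the column name; station_table is a hypothetical name, and the values keep the units described in the UNITS section of the README:

```python
import pandas as pd


def station_table(long_df):
    """Reshape long (stnid, var, val) rows into one row per station.

    Column names come from the file name, e.g. 'ann-tavg-normal.txt' -> 'tavg'.
    """
    df = long_df.assign(code=long_df['var'].str.split('-').str[1])
    wide = df.pivot_table(index='stnid', columns='code', values='val', aggfunc='first')
    coords = df.groupby('stnid')[['lon', 'lat', 'ele']].first()
    # Stations lacking a variable get NaN in that column
    return coords.join(wide)
```

station_table(geodb) should give lon, lat and ele plus one column for each of the annual variables listed earlier.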

We can use PySAL's plotting helpers for a quick map of the station locations (the pysal.contrib.viz module imported here belongs to PySAL 1.x):

In [17]:
import pysal as ps
from pysal.contrib.viz import mapping as maps
In [26]:
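f, ax = plt.subplots(figsize=(12, 6))
# Plot one point per station in geodb; `stations` is only defined when
# weather_geo.csv had to be rebuilt, whereas geodb always exists here
pts = geodb.drop_duplicates('stnid')
sc = ax.scatter(pts['lon'], pts['lat'], linewidth=0, c='blue', s=0.2, marker='.')
ax = maps.setup_ax([sc], ax)
plt.title('Location of NOAA weather stations')
plt.show()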
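The map above only shows where the stations are. To show the extracted averages as well, the points can be coloured by one variable. A sketch, assuming a per-station table with lon and lat columns plus one column per variable (for instance built by pivoting geodb on stnid and var); plot_stations is a hypothetical helper and the coolwarm colormap is an arbitrary choice:

```python
import matplotlib.pyplot as plt


def plot_stations(wide, column, ax=None):
    """Scatter station locations coloured by one per-station variable."""
    if ax is None:
        _, ax = plt.subplots(figsize=(12, 6))
    # Stations without a value for `column` are left out
    data = wide.dropna(subset=[column])
    sc = ax.scatter(data['lon'], data['lat'], c=data[column],
                    s=2, linewidth=0, cmap='coolwarm')
    ax.figure.colorbar(sc, ax=ax, label=column)
    ax.set_xlabel('Longitude')
    ax.set_ylabel('Latitude')
    return sc
```

For example, plot_stations(wide, 'tavg') would map annual mean temperature, where wide stands for such a per-station table.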