In [2]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from IPython.display import Image
import pandas as pd

Lecture 26:

  • Take a look at data with respect to time (time series)

  • Learn a bit about time series analysis.

Time series in Earth Science

Earth's orbit - pacemaker of the Ice Ages

One of the classic problems in Earth Science is, "What controlled the coming and going of the great ice sheets?" It has long been known that there were several, perhaps many, ice ages. In the European Alps, several keen observers (among them the German author/poet/gentleman scientist Goethe) had noticed that Alpine glaciers must have been larger in the past.

Between 1837 and 1840, a Swiss-American geologist named Jean Louis Agassiz used the moraines, striations, and glacial erratics deposited by moving glaciers as evidence that much of Europe was once covered with a vast ice sheet like that still on Greenland today. That said, the causes of these profound climatic changes remained a mystery.

In 1920, Milutin Milankovitch explained the coming and going of ice ages as a response to changes in the Earth's insolation (the amount of energy received from the sun). He argued that insolation is controlled by changes in the Earth's orbit around the sun. This idea has now been widely embraced by the paleoclimate community, largely because of the very strong coherence between cycles in Earth's orbit and evidence for changes in ice volume from geochemical proxies like oxygen isotopic ratios.

The orbital cycles can be calculated from the configuration of the planets. Milankovitch famously took the first stab at it from his prison cell during WWI. Nowadays they are calculated with fancy computers. The key parameters are the eccentricity (the 'ovalness' of the orbit around the sun), the obliquity (tilt) of the spin axis, and the precession of the spin axis.

In [4]:
Image(filename='Figures/Milankovitch_Cycles.jpg')
Out[4]:

[Figure from http://www.jamstec.go.jp/e/about/press_release/20141027/; see also http://www.sciencecourseware.org/eec/GlobalWarming/Tutorials/Milankovitch/].

The Earth's orbital parameters of eccentricity, obliquity and precession vary in predictable ways. One commonly used model for their variations over the last few hundred million years was published by Laskar et al. (2004; http://dx.doi.org/10.1051/0004-6361:20041335).

Let's take a look at the behavior over the last few million years using the data file from the Laskar et al. (2004) paper.

In [16]:
# Read in the datafile into a Pandas DataFrame
cycles=pd.read_csv('Datasets/IceAges/INSOLN.LA2004.BTL.100.csv')
print (cycles.columns)
Index(['Age (ka)', 'Eccentricity', 'Obliquity', 'Precession'], dtype='object')

First, we filter the data for the last million years and then we plot it as a time series.

In [17]:
cycles_1Ma=cycles[cycles['Age (ka)']<1000] # only look at the last 1000 ka (1 Myr)

# set up the plot as three rows
# start with Eccentricity
fig=plt.figure(1,(8,8)) # make a nice big plot
fig.add_subplot(311) # notice how you do not need the commas!
plt.plot(cycles_1Ma['Age (ka)'],cycles_1Ma['Eccentricity'])
plt.ylabel('Eccentricity')

# add obliquity
fig.add_subplot(312)
plt.plot(cycles_1Ma['Age (ka)'],cycles_1Ma['Obliquity'])
plt.ylabel('Obliquity')

# add precession
fig.add_subplot(313)
plt.plot(cycles_1Ma['Age (ka)'],cycles_1Ma['Precession'])
plt.ylabel('Precession')
plt.xlabel('Age (ka)');

You can see a lot of cycles on different time scales. The question is how these relate to the amount of insolation. In the literature, there are many attempts to convert orbital parameters like those in the plot above into the amount of insolation received by the Earth's atmosphere as a function of latitude and age. One recent treatment is by Huybers (2006, http://dx.doi.org/10.1126/science.1125249). You can get the data (credited to Huybers and our own Professor Ian Eisenman) from here: https://www.ncdc.noaa.gov/paleo-search/study/5792.

It is traditional to consider the amount of insolation received at 65$^{\circ}$N. So let's take a look.

In [18]:
#read the data into a pandas DataFrame
fig=plt.figure(2,(8,3))
insol=pd.read_csv('Datasets/IceAges/j_65north.csv')
print (insol.columns)
plt.plot(insol['Age (ka)'],insol['Insolation (W/m^2)'])
plt.xlim(0,1000)
plt.ylabel('Insolation (W m$^{-2}$)')
plt.xlabel('Age (ka)');
Index(['Age (ka)', 'Insolation (W/m^2)'], dtype='object')

But how do we relate these wiggles to ice ages? Glaciers tend to override the evidence of their older siblings, so how do we know the timing and extent of past glaciations? The answer is marine fossils. Marine fossils, for example foraminifera, are made of calcium carbonate and retain a record of the oxygen isotopic ratios of the sea water in which they lived. And "So?", you say. What does that have to do with ice volume? Here is the answer in a nutshell:

  • There are several isotopes of oxygen, two of which are fairly common: $^{18}$O and $^{16}$O.
  • During evaporation, the lighter isotope is preferentially removed, leaving the water body enriched in the heavier isotope. When the water condenses, the process is reversed and the heavier isotope is preferentially removed. So nothing should happen over time, right? What goes up must come down. But it isn't that simple.
  • Evaporation occurs mostly in the tropics (because it is hotter there), and the clouds then move north, raining and snowing out as they go. This process, called Rayleigh distillation, results in an enrichment of the light oxygen isotope in the rain and snow. If the snow gets trapped on land, for example in a glacier or ice sheet, then the sea water becomes enriched in the heavier isotope. The ratio of the two isotopes is therefore a proxy for the volume of ice. (A small numerical sketch follows this list.)
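
To make the Rayleigh distillation idea concrete, here is a minimal sketch of how the isotopic composition of the remaining vapor evolves as it rains out. The fractionation factor alpha below is an assumed, purely illustrative value, not a measured one:

In [ ]:
# minimal Rayleigh distillation sketch (illustrative numbers only)
alpha=1.01                  # assumed liquid-vapor fractionation factor
f=np.linspace(1,0.05,100)   # fraction of the vapor remaining
R=f**(alpha-1.)             # 18O/16O of the vapor relative to its starting value
delta=(R-1.)*1000.          # expressed as a delta value, in per mil
plt.plot(1.-f,delta)
plt.xlabel('Fraction of vapor rained out')
plt.ylabel('$\delta ^{18}$O of remaining vapor');

The vapor (and hence the high-latitude snow made from it) gets isotopically lighter and lighter as the heavy isotope preferentially rains out along the way.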

We will need a measure of the isotopic ratios in foraminifera recovered from deep sea sediment cores and their DATES. The latter was done using magnetic stratigraphy (see, e.g., Shackleton and Opdyke, 1973, Quaternary Research, 3, 39-55, https://doi.org/10.1016/0033-5894(73)90052-5). [Magnetic stratigraphy uses records of reversals of the magnetic field recorded in sequences of rocks to date them.]

A modern version of these data was published by Lisiecki and Raymo (2005, http://dx.doi.org/10.1029/2004PA001071), called the LR04 stack. This is a stack of 58 records of oxygen isotopic variations, several of which were independently dated using magnetostratigraphy, from all over the world's oceans.

Let's take a look at that record:

In [19]:
d18O=pd.read_csv('Datasets/IceAges/LR04stack.csv')
print (d18O.columns)
Index(['Age (ka)', 'd18O', 'uncertainty'], dtype='object')

The data are cast as $\delta ^{18}O$, defined as:

$$\delta^{18}O=\Biggl(\frac{\bigl(\frac{^{18}O}{^{16}O}\bigr)_{sample}}{\bigl(\frac{^{18}O}{^{16}O}\bigr)_{standard}}-1\Biggr)\times 1000\ ^{o}\!/\!_{oo}$$
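
As a quick sanity check, here is that definition translated into a hypothetical one-line helper; the sample ratio in the example is invented for illustration, and the default standard is the commonly used VSMOW $^{18}O/^{16}O$ ratio:

In [ ]:
def delta18O(R_sample,R_standard=0.0020052): # default standard: the VSMOW 18O/16O ratio
    return (R_sample/R_standard-1.)*1000.    # in per mil
print (delta18O(0.0020092)) # a sample slightly enriched in 18O -> about +2 per mil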

And it is traditional to plot them with the heavier isotope down.

Let's plot them up for the last million years.

In [20]:
fig=plt.figure(3,(8,3))
d18O_1Ma=d18O[d18O['Age (ka)']<1000] # filter data for last 1Ma
plt.plot(d18O_1Ma['Age (ka)'],d18O_1Ma['d18O'])
plt.xlabel('Age (ka)')
plt.ylabel('$\delta ^{18}$O')
plt.ylim(5,3);

Both insolation and $\delta ^{18}$O have a lot of wiggles over the last million years, but are they the SAME wiggles? One way to look at this is using time series analysis. There are entire courses devoted to this subject and a complete treatment is WAY beyond the scope of this class, but we can begin to answer the basic question posed by the isotope/insolation story. The big question is: Do the two data sets have wiggles with the same frequencies?

The analysis boils down to this:

  • According to Fourier, any periodic function $f(t)$ can be represented as the sum of a series of sines and cosines:
$$f(t)=a_0+\sum_{r=1}^{\infty}\Bigl[a_r\cos\Bigl(\frac{2\pi r}{T}t\Bigr)+b_r\sin\Bigl(\frac{2\pi r}{T}t\Bigr)\Bigr]$$
  • You can represent data as a series in time (in the time domain), as we have been doing, OR you can represent the data in terms of frequency, looking at the power in the data as a function of frequency. This is known as the power spectrum. (A synthetic example follows this list.)
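
To see what a power spectrum does, here is a minimal synthetic example: we build a signal out of two sine waves with known periods (made-up numbers chosen to mimic the orbital ones), add some noise, and check that the periodogram puts the power back at the frequencies we put in:

In [ ]:
from scipy import signal # we will import this again below

np.random.seed(0) # make the noise reproducible
t=np.arange(0,1000.) # 1000 'kyr' of synthetic time, one sample per kyr
x=np.sin(2.*np.pi*t/100.)+0.5*np.sin(2.*np.pi*t/41.) # 100 kyr and 41 kyr cycles
x+=0.2*np.random.randn(len(t)) # plus a little noise
freqs,power=signal.periodogram(x) # fs defaults to 1 sample per kyr
plt.semilogy(freqs,power)
plt.axvline(x=1./100.,color='red') # the frequencies we put in
plt.axvline(x=1./41.,color='black')
plt.xlim(.001,.06)
plt.xlabel('Frequency (cycles/kyr)')
plt.ylabel('Power');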

There is much more to learn about this than we can cover here (mostly having to do with how you massage the data). If you are curious, there are some very nice Python wrappers for beautiful code for doing cool state-of-the-art time series analysis (see, e.g., http://krischer.github.io/mtspec/ for how to install the code and examples of how to use it; let me know if you want help with this, for example for a final project).

Let's do a zero-order analysis that produces a periodogram, which is a plot of power versus frequency. We will do this with a minimum of massaging.

Let us take advantage of the signal.periodogram function in the scipy package. The signal module has functions that allow us to calculate the power spectral density of a time series.

Also, we know that eccentricity is supposed to have a dominant period at 100 kyr, obliquity at 41 kyr and precession at 23 and 19 kyr. Remember that these numbers are expressed in terms of the period, which is the inverse of the frequency.
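
So, for example, converting those periods into the frequencies we will mark on the plots below is just a matter of taking reciprocals:

In [ ]:
periods=np.array([100.,41.,23.,19.]) # Milankovitch periods in kyr
print (1./periods) # frequencies in cycles/kyr: 0.01, ~0.0244, ~0.0435, ~0.0526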

We can see how our minimalist treatment works.

Let's start with precession:

In [21]:
# first we import the necessary function:
from scipy import signal 

help(signal.periodogram)
Help on function periodogram in module scipy.signal.spectral:

periodogram(x, fs=1.0, window='boxcar', nfft=None, detrend='constant', return_onesided=True, scaling='density', axis=-1)
    Estimate power spectral density using a periodogram.
    
    Parameters
    ----------
    x : array_like
        Time series of measurement values
    fs : float, optional
        Sampling frequency of the `x` time series. Defaults to 1.0.
    window : str or tuple or array_like, optional
        Desired window to use. If `window` is a string or tuple, it is
        passed to `get_window` to generate the window values, which are
        DFT-even by default. See `get_window` for a list of windows and
        required parameters. If `window` is array_like it will be used
        directly as the window and its length must be nperseg. Defaults
        to 'boxcar'.
    nfft : int, optional
        Length of the FFT used. If `None` the length of `x` will be
        used.
    detrend : str or function or `False`, optional
        Specifies how to detrend each segment. If `detrend` is a
        string, it is passed as the `type` argument to the `detrend`
        function. If it is a function, it takes a segment and returns a
        detrended segment. If `detrend` is `False`, no detrending is
        done. Defaults to 'constant'.
    return_onesided : bool, optional
        If `True`, return a one-sided spectrum for real data. If
        `False` return a two-sided spectrum. Note that for complex
        data, a two-sided spectrum is always returned.
    scaling : { 'density', 'spectrum' }, optional
        Selects between computing the power spectral density ('density')
        where `Pxx` has units of V**2/Hz and computing the power
        spectrum ('spectrum') where `Pxx` has units of V**2, if `x`
        is measured in V and `fs` is measured in Hz. Defaults to
        'density'
    axis : int, optional
        Axis along which the periodogram is computed; the default is
        over the last axis (i.e. ``axis=-1``).
    
    Returns
    -------
    f : ndarray
        Array of sample frequencies.
    Pxx : ndarray
        Power spectral density or power spectrum of `x`.
    
    Notes
    -----
    .. versionadded:: 0.12.0
    
    See Also
    --------
    welch: Estimate power spectral density using Welch's method
    lombscargle: Lomb-Scargle periodogram for unevenly sampled data
    
    Examples
    --------
    >>> from scipy import signal
    >>> import matplotlib.pyplot as plt
    >>> np.random.seed(1234)
    
    Generate a test signal, a 2 Vrms sine wave at 1234 Hz, corrupted by
    0.001 V**2/Hz of white noise sampled at 10 kHz.
    
    >>> fs = 10e3
    >>> N = 1e5
    >>> amp = 2*np.sqrt(2)
    >>> freq = 1234.0
    >>> noise_power = 0.001 * fs / 2
    >>> time = np.arange(N) / fs
    >>> x = amp*np.sin(2*np.pi*freq*time)
    >>> x += np.random.normal(scale=np.sqrt(noise_power), size=time.shape)
    
    Compute and plot the power spectral density.
    
    >>> f, Pxx_den = signal.periodogram(x, fs)
    >>> plt.semilogy(f, Pxx_den)
    >>> plt.ylim([1e-7, 1e2])
    >>> plt.xlabel('frequency [Hz]')
    >>> plt.ylabel('PSD [V**2/Hz]')
    >>> plt.show()
    
    If we average the last half of the spectral density, to exclude the
    peak, we can recover the noise power on the signal.
    
    >>> np.mean(Pxx_den[25000:])
    0.00099728892368242854
    
    Now compute and plot the power spectrum.
    
    >>> f, Pxx_spec = signal.periodogram(x, fs, 'flattop', scaling='spectrum')
    >>> plt.figure()
    >>> plt.semilogy(f, np.sqrt(Pxx_spec))
    >>> plt.ylim([1e-4, 1e1])
    >>> plt.xlabel('frequency [Hz]')
    >>> plt.ylabel('Linear spectrum [V RMS]')
    >>> plt.show()
    
    The peak height in the power spectrum is an estimate of the RMS
    amplitude.
    
    >>> np.sqrt(Pxx_spec.max())
    2.0077340678640727

We can call the function, which returns frequencies and power.

In [22]:
# make an array out of the desired data series
prec=np.array(cycles_1Ma['Precession'])

# and calculate the frequencies 
prec_freqs,prec_power=signal.periodogram(prec)
# plot on a linear x, log y plot (using semilogy( ))
plt.semilogy(prec_freqs,prec_power)
plt.ylim(.01,1000) # truncate the Y axis
plt.xlim(.001,.06) # truncate the X axis
# put on the precessional frequencies
plt.axvline(x=1./23.,color='red')  # use a vertical line
plt.axvline(x=1./19.,color='black')
plt.xlabel('Frequency') # label the axes
plt.ylabel('Power')
plt.text(1./23.,1000,'23 kyr',ha='center',va='bottom')
plt.text(1./19.,1000,'19 kyr',ha='center',va='bottom');

It is a little rough, but you can clearly see the two peaks near 23 and 19 kyr. We could smooth out the diagram by exploring windowing options, but for now let us just plow on, looking at obliquity next.

In [23]:
# make an array out of the desired data series
obl=np.array(cycles_1Ma['Obliquity'])

# and calculate the frequencies 
obl_freqs,obl_power=signal.periodogram(obl)

plt.semilogy(obl_freqs,obl_power)
plt.ylim(.00001,.1)
plt.xlim(.001,.06)
# put on the obliquity frequencies
plt.axvline(x=1./41.,color='red') 
plt.xlabel('Frequency')
plt.ylabel('Power')
plt.text(1./41.,.1,'41 kyr',ha='center',va='bottom');

Are there orbital frequencies in the isotopic data?

At last we can look at our isotope time series and see if the same frequencies are there. If so, that would support Milankovitch's famous hypothesis regarding orbits and ice ages.

We will also throw in an example of a window in the periodogram calculation. What a window does is multiply the time series by a function (called a taper) that weights the information, suppressing data at the edges of the window and focusing on its center. The simplest window is a boxcar, which gives equal weight to everything inside the window. In the following, we will use a Blackman window, which looks more like a bell curve.

You can find out more about windowing (a.k.a. tapering) from this website:

https://en.wikipedia.org/wiki/Window_function
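
To get a feel for what these two tapers look like, here is a quick sketch comparing them with the get_window function in scipy.signal (both window names are the same strings accepted by signal.periodogram):

In [ ]:
# compare the boxcar (no taper) and Blackman windows
for name in ['boxcar','blackman']:
    plt.plot(signal.get_window(name,101),label=name)
plt.xlabel('Sample number')
plt.ylabel('Weight')
plt.legend();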

Let's take a look at the LR04 data set.

In [24]:
plt.figure(4,(6,5))
iso=d18O_1Ma['d18O'] # pick out the data series
# set the window with the keyword 'window'
iso_freqs,iso_power=signal.periodogram(iso,window='blackman')
plt.semilogy(iso_freqs,iso_power,label='LR04 stack',linewidth=2)
plt.ylim(.01,100) # put bounds on the axes
plt.xlim(.001,.06)
plt.axvline(x=1./100.,color='red',label='Eccentricity',linewidth=3) 
plt.axvline(x=1./41.,color='orange',label='Obliquity',linewidth=3) 
plt.axvline(x=1./23.,color='green',label='Precession (23 kyr)',linewidth=3) 
plt.axvline(x=1./19.,color='blue',label='Precession (19 kyr)',linewidth=3)
plt.axvline(x=1./29.,color='black',label='Mystery (29 kyr)',linewidth=3,linestyle='dotted')
plt.legend(bbox_to_anchor=(1.05, 1), loc=2) # notice this
# way of putting the legend to the right of the plot


plt.xlabel('Frequency')
plt.ylabel('Power'); 

And... of course there is lots of power at the Milankovitch periods. But there is an extra 'hump' at 29 kyr which Lisiecki and Raymo say is "commonly observed and is thought to be a nonlinear climate response", citing a paper by Huybers and Wunsch (2004). Hmmm. Room for more research.

Of course there are still many questions on this subject, like "How do the orbital parameters control the ice volume?" and "Why aren't we heading into an ice age now, as Milankovitch theory would predict?"

For more on time series analysis in Earth Science:

What is covered in this lecture is just a conceptual start. There are no uncertainty estimates, for example. To get realistic uncertainties, it is necessary to subsample the data with different windows (tapers) that use independent subsets of the data, a technique known as multi-taper spectral estimation. One excellent reference for that is:

Prieto, G. A., R. L. Parker, F. L. Vernon. (2009), A Fortran 90 library for multitaper spectrum analysis, Computers and Geosciences, 35, pp. 1701-1710. doi:10.1016/j.cageo.2008.06.007

And to make things super nice, there is a python wrapper for their code, which you can investigate here: http://krischer.github.io/mtspec/tutorial.html

It works very well on Linux and MacOS systems and gives you time series power spectra with error bars!

But first, you have to install some stuff (if it isn't already installed):

on the command line on a Mac:

  • conda config --add channels conda-forge

  • conda install mtspec

  • brew install gcc

for help, see http://krischer.github.io/mtspec/

Here is an example of how it works. You read in the data and analyze it:

In [33]:
from mtspec import mtspec
from mtspec.util import _load_mtdata
plt.style.use("ggplot")


data = _load_mtdata('v22_174_series.dat.gz')

# Calculate the spectral estimation.
spec, freq, jackknife, _, _ = mtspec(
    data=data, delta=4930.0, time_bandwidth=3.5,
    number_of_tapers=5, nfft=312, statistics=True)

fig = plt.figure()
ax1 = fig.add_subplot(2, 1, 1)
# Plot in thousands of years.
ax1.plot(np.arange(len(data)) * 4.930, data, color='black')
ax1.set_xlim(0, 800)
ax1.set_ylim(-1.0, 1.0)
ax1.set_xlabel("Time [1000 years]")
ax1.set_ylabel("Change in $\delta^{18}O$")

ax2 = fig.add_subplot(2, 1, 2)
ax2.set_yscale('log')
# Convert frequency to Ma.
freq *= 1E6
ax2.plot(freq, spec, color='black')
ax2.fill_between(freq, jackknife[:, 0], jackknife[:, 1],
                 color="red", alpha=0.3)
ax2.set_xlim(freq[0], freq[-1])
ax2.set_ylim(0.1E1, 1E5)
ax2.set_xlabel("Frequency [c Ma$^{-1}]$")
ax2.set_ylabel("PSD ($\delta^{18}O/ca^{-1}$)");

Cool, no?