In [1]:

```
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from IPython.display import Image
import pandas as pd
```

Learn how to plot histograms and cumulative distributions

Learn how to get lists of random numbers

Learn about the topography of the Earth (hypsometric curve)

The hypsometric curve for the Earth was very important in that it demonstrated that the Earth's crust comes in two flavors: continental and oceanic.

In [2]:

```
Image(filename='Figures/hypsometric.v2_400.jpg')
```

Out[2]:

*Hypsometric curve for the Earth. The x-axis is the percentage of Earth's surface that falls above the elevation on the y-axis. For example, ~30% of the Earth's surface is above sea level. [Figure from this web site: https://serc.carleton.edu/mathyouneed/hypsometric/index.html ]*.

The insight from the hypsometric curve is that there are two plateaus in crustal elevations with the continents above sea level but another large fraction of crust below sea level. These are the two 'flavors' of continental and oceanic crust.

Oceanic crust must be denser than continental crust - that is why it "floats" lower which is the basis for *isostacy*. [For more on isostacy, see this link: https://en.wikipedia.org/wiki/Isostasy. ]

Today we will recreate a hypsometric from our knowledge of the Earth's topography. Nowadays, we can measure the topography from satellites, so there is extensive coverage over the surface of the Earth.

Here is a map projection of the Earth's topography. In the coming weeks, we will learn how to make this map, but for now, let's just take a look at it. Notice the mountain ranges, including the ridge crests.

In [3]:

```
Image(filename='Figures/etopo20.jpg')
```

Out[3]:

We are interested in the hypsometric curve, which is the distribution of elevations. One very straight forward way to do this is to look at the elevation data as a histogram. A histogram breaks the data into 'bins', sums the data points within each bin and then plots the totals in each bin as a series of bars.

To plot a histogram, we must first read in the data that were plotted on the map. We'll use a set of elevation data associated with a latitude/longitude grid around the earth called ETOPO20 (20-minute resolution).

The file is a compressed version (gnu-zip or gz) of the data and can be read directly using NumPy's **loadtxt( )** function:

In [4]:

```
topo=np.loadtxt('Datasets/Etopo/etopo20data.gz')
```

It is usually a good idea to take a peek at the data before you try plotting them, so let's just look at the first 10 cells in the array 'topo':

In [5]:

```
print (topo[0:10])
```

Our original hypsometric plot was in kilometers, whereas these data are in meters. We can convert the elevations to km by dividing *topo* by 1000 (because it is a NumPy array and not a list).

In [6]:

```
topo_km=topo/1000. # convert to kilometers
print (topo_km[0:10])
```

The elevations are in many rows that pair with the latitude and longitude data stored in two other files: etopo20lats.gz and etopo20lons.gz. For now, we're only concerned with the distribution of the topography, so we can "flatten" the data into a single row using the handy NumPy **flatten( )** method you already met:

In [7]:

```
flat_topo=topo_km.flatten()
print (flat_topo)
```

Now we can plot the data as a histogram using **plt.hist( )**. (Try **help(plt.hist)** for more info.)

**plt.hist( )** takes some keyword arguments, and we will use **bins** and **density** to set the number of bins to 20 and normalize the plot to sum to unity.

In [8]:

```
help(plt.hist)
```

In [9]:

```
plt.hist(flat_topo,bins=20,density=True)
plt.xlabel("Elevation (km)")
plt.ylabel("Fraction");
```

There are two peaks, one near sea-level and one below sea-level (the ocean basins). We saw this same pattern in the hypsometric curve.

But the hypsometric curve is not a histogram. It is in fact a kind of *cumulative distribution function (CDF)*. Instead of binning the data and finding the number of points in each bin as in a histogram, we find the cumulative distribution for each value of x. We plot on the y axis the proportion of values less than or equal to x for each x. So the value of y on the hypsometric curve represents the percentage of Earth's surface with an elevation LOWER than the elevation at the corresponding value of x.

Let's create a CDF from our topographic data. To do this, we must sort the data by elevation, then plot each elevation against a number from 1 to 100%.

To sort the data, we can use the **sorted( )** function on our **flat_topo** array. (This is similar to, but has a different syntax from, the way we sorted Panda DataFrames.)

In [10]:

```
topo_sorted=sorted(flat_topo) # sort the array in increasing numbers
```

For the y axis we need the percentage of Earth's surface that lies below a given elevation. For this, we just need an array with $N$ values equally spaced between 0 and 100. This sounds like a job for **np.linspace( )** [Lecture 7].

In [11]:

```
percent=np.linspace(0,100,len(topo_sorted))
```

And now for the plot:

In [12]:

```
plt.plot(topo_sorted,percent)
plt.xlabel('Elevation (km)')
plt.ylabel('Percent');
```

Compare this plot with the histogram. The same two peaks from the histogram correspond to the abrupt increases in percent values at ~-4km and ~0 km. The advantage of a CDF over a histogram is that you don't have to choose some arbitrary bin size.

If you look closer, you can see that this CDF is similar to our hypsometric curve but turned on its side and flipped left to right. Once we plot the data with the percent on the x-axis and the elevation on the y-axis, we'll get closer to our hypsometric curve:

In [13]:

```
plt.plot(percent,topo_sorted)
plt.xlabel('Percent')
plt.ylabel('Elevation (km)');
```

Our CDF is still different from the hypsometric curve we saw at the beginning of class - it is more like the mirror image of what we are after.

In [14]:

```
Image(filename='Figures/hypsometric.v2_400.jpg')
```

Out[14]:

Our CDF represents the percentage (x-axis) of Earth's surface that is LOWER than the corresponding elevation (y-axis), whereas the hypsometric curve represents the percentage of Earth's surface HIGHER than the corresponding elevation. So let's start over and sort the data by decreasing values instead of increasing. We do that with the keyword argument **reverse** in the function **sorted( )**.

In [15]:

```
topo_sorted=sorted(flat_topo,reverse=True) # sort the data by decreasing numbers
percent=np.linspace(0,100,len(topo_sorted)) # and percent just like before
plt.plot(percent,topo_sorted) # plot
plt.xlabel('Percent') # label
plt.ylabel('Elevation (km)');
```

In [16]:

```
Image(filename='Figures/hypsometric.v2_400.jpg')
```

Out[16]:

NOW we are getting somewhere! But, the dimensions of the plot are different. We can change this by using the **plt.figure( )** function, which as we already learned takes a tuple of the desired dimensions and a figure number as arguments. Here we use 1 as the figure number and make a 6 x 5 figure

In [17]:

```
plt.figure(1,(6,5)) #figure number and dimensions
plt.plot(percent,topo_sorted)
plt.ylabel('Elevation (km)')
plt.title("Hypsometric Curve")
plt.xlabel("Percent");
```

We're close but still not exactly right.

Recall that the data are binned into 20 minute grids of latitude and longitude. The problem with that approach is illustrated in the figure below. On the left is the way our data are constructed with equal weight at every line of latitude. On the right I replotted the data as a polar projection, with the North pole in the center. And yes you will learn how to do that too! But for now, you can see that there is a lot less surface area per line of latitude near the north pole than at the equator!

In [18]:

```
Image(filename='Figures/etopo20_polar.jpg')
```

Out[18]:

Because the area of one cell at the pole is MUCH less than that for an equatorial cell, we have to normalize the number of data points in each cell by its surface area. We can accomplish the same thing by re-sampling each grid and not use ALL the data but just select a number that is proportional to the surface area it covers. Each point would then represent the same percentage of Earth's surface. We would sample the most (say take all the grid cells) from the equator and few as we get to the poles.

To do this, we'll need to know how many cells are in each latitudinal bin. Let's use the NumPy **shape** attribute for that:

In [19]:

```
topo.shape
```

Out[19]:

This means that there are 540 latitudinal bins with 1081 longitudinal cells in each latitude bin.

Let's read in the latitude/longitude data files that defines our grid and take a look at them.

In [20]:

```
lats=np.loadtxt('Datasets/Etopo/etopo20lats.gz') #read in the data
lons=np.loadtxt('Datasets/Etopo/etopo20lons.gz')
print (lats[0:20])
print (lons[0:20])
print (lats.shape)
print (lons.shape)
```

There are 540 latitudinal bins and for each latitude and 1081 longitudinal bins.

The trick here is that the surface area $S$ of a lat/lon box varies as the cosine of the latitude ($\phi$ - see coordinate system in the figure below). In fact:

In [21]:

```
Image(filename='Figures/Sphere_wireframe.png',width=400)
```

Out[21]: