The purpose of this notebook is to provide the data and the plotting / analysis scripts that were used in the paper: Sylvester, Z., Pirmez, C., Cantelli, A., & Jobe, Z. R. (2013). Global (latitudinal) variation in submarine channel sinuosity: COMMENT. Geology, 41(5), e287–e287. doi:10.1130/G33548C.1. If you have comments or questions, please send them to zoltan_dot_sylvester_at_gmail_dot_com.
The original article that is discussed in the comment and the reply are as follows:
Peakall, J., Kane, I. A., Masson, D. G., Keevil, G., McCaffrey, W., & Corney, R. (2011). Global (latitudinal) variation in submarine channel sinuosity. Geology, 40(1), 11–14. doi:10.1130/G32295.1.
Peakall, J., Wells, M. G., Cossu, R., Kane, I. A., Masson, D. G., Keevil, G. M., et al. (2013). Global (latitudinal) variation in submarine channel sinuosity: REPLY. Geology, 41(5), e288–e288. doi:10.1130/G34319Y.1.
import pandas as pd
from scipy import stats
pd.set_option('display.mpl_style', 'default') # nicer plots than the default matplotlib style
fileurl = 'https://dl.dropboxusercontent.com/u/25694950/channel_sinuosities.csv'
data = pd.read_csv(fileurl)
The majority of the data comes from the book by Clark and Pickering (1996), which also forms the basis of the analysis by Peakall et al. (2011); graphs in the book were digitized using Graphclick. Data for the Amazon Channel comes from Pirmez and Flood (1995) (provided by Carlos Pirmez); and Popescu et al. (2001) (centerline digitized from map in paper and half-wavelength sinuosities calculated from centerline). See full references in paper.
Let's have a look at the first 20 lines of the data table:
data[:20]
Name | Slope | Sinuosity | Latitude | Large_river | |
---|---|---|---|---|---|
0 | DeSoto | 0.00148 | 1.24586 | 28 | 0 |
1 | DeSoto | 0.00228 | 1.31765 | 28 | 0 |
2 | DeSoto | 0.00181 | 2.04757 | 28 | 0 |
3 | DeSoto | 0.00187 | 2.00569 | 28 | 0 |
4 | DeSoto | 0.00251 | 2.23903 | 28 | 0 |
5 | Arguello | 0.00440 | 1.45093 | 33 | 0 |
6 | Arguello | 0.00464 | 1.36212 | 33 | 0 |
7 | Arguello | 0.00537 | 1.09572 | 33 | 0 |
8 | Arguello | 0.01455 | 1.56045 | 33 | 0 |
9 | Arguello | 0.01575 | 1.07500 | 33 | 0 |
10 | Arguello | 0.06686 | 1.10460 | 33 | 0 |
11 | Monterey | 0.00092 | 1.02485 | 35 | 0 |
12 | Monterey | 0.00136 | 1.06459 | 35 | 0 |
13 | Monterey | 0.00145 | 1.05607 | 35 | 0 |
14 | Monterey | 0.00143 | 1.13271 | 35 | 0 |
15 | Monterey | 0.00223 | 1.01634 | 35 | 0 |
16 | Monterey | 0.00249 | 1.08446 | 35 | 0 |
17 | Monterey | 0.00308 | 1.30300 | 35 | 0 |
18 | Monterey | 0.00320 | 1.17812 | 35 | 0 |
19 | Monterey | 0.00334 | 1.04472 | 35 | 0 |
20 rows × 5 columns
figure(figsize=(10,8))
data['Name'].value_counts().plot(kind='bar');
figure(figsize=(10,8))
groups = data.groupby('Large_river')
labels = ('no large river','large river')
colors = (plt.rcParams['axes.color_cycle'][4],plt.rcParams['axes.color_cycle'][0])
for name, group in groups:
plot(group.Slope, group.Sinuosity, marker='o', linestyle='', ms=8, label=labels[name], color=colors[name])
legend(loc='best')
xlabel('valley slope', fontsize=18)
ylabel('sinuosity', fontsize=18)
axis([0.0,0.03,1.0,3.0]);
figure(figsize=(10,8))
groups = data.groupby('Large_river')
labels = ('no large river','large river')
colors = (plt.rcParams['axes.color_cycle'][4],plt.rcParams['axes.color_cycle'][0])
for name, group in groups:
plot(group.Latitude, group.Sinuosity, marker='o', linestyle='', ms=8, label=labels[name], color=colors[name])
legend(loc='best')
xlabel('latitude', fontsize=18)
ylabel('sinuosity', fontsize=18)
axis([0.0,70,1.0,3.0]);
def bootstrap_regression(x,y,nit):
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
r = np.zeros(nit)
n = len(x)
for i in range(1,nit):
x1 = x[np.random.randint(n, size=n)]
y1 = y[np.random.randint(n, size=n)]
slope, intercept, r_value2, p_value2, std_err = stats.linregress(x1,y1)
r[i] = r_value2
if r_value>0:
p_value_boot = len(r[r >= r_value])/float(len(r))
else:
p_value_boot = len(r[r <= r_value])/float(len(r))
return r_value, p_value, p_value_boot
r_values = np.zeros([3,1])
p_values = np.zeros([3,1])
p_values_boot = np.zeros([3,1])
r_values[0], p_values[0], p_values_boot[0] = bootstrap_regression(data.Latitude,data.Sinuosity,1000)
# for variable pairs that involve slope, we can only use those data points that have slope measurements:
r_values[1], p_values[1], p_values_boot[1] = bootstrap_regression(data.Slope[isnan(data.Slope)==0],data.Sinuosity[isnan(data.Slope)==0],1000)
r_values[2], p_values[2], p_values_boot[2] = bootstrap_regression(data.Latitude[isnan(data.Slope)==0],data.Slope[isnan(data.Slope)==0],1000)
Create a dataframe that contains the result of the regression:
df = pd.DataFrame(np.hstack(([[292],[268],[268]],r_values,r_values**2,
p_values,p_values_boot)),columns=['n','Pearson R','R squared','p-value',
'p-value bootstrap'],index=('latitude and sinuosity','slope and sinuosity','latitude and slope'))
df
n | Pearson R | R squared | p-value | p-value bootstrap | |
---|---|---|---|---|---|
latitude and sinuosity | 292 | -0.347308 | 0.120623 | 1.058797e-09 | 0 |
slope and sinuosity | 268 | -0.277373 | 0.076936 | 4.020125e-06 | 0 |
latitude and slope | 268 | 0.202860 | 0.041152 | 8.370798e-04 | 0 |
3 rows × 5 columns