This notebook goes with the blog post of the same name, published on 10 August 2017.
We're going to load a shapefile containing some well data. Then we'll change its CRS, make a new attribute, and save a new shapefile.
We'll lean on geopandas
for help, but we'll also inspect the file with fiona
, a lower-level library that geopandas
uses under the hood.
Install geopandas
and its dependencies (like gdal
, proj
, and fiona
) with
conda install geopandas
conda install fiona # I had to do this too to get fiona to work.
import numpy as np
import fiona
import matplotlib.pyplot as plt
import folium
import pprint
with fiona.open('../data/offshore_wells_2011_Geographic_NAD27.shp') as src:
pprint.pprint(src[0])
{'geometry': {'coordinates': (-59.9170497222, 43.9346961111), 'type': 'Point'}, 'id': '0', 'properties': OrderedDict([('Well_No_', 1.0), ('D__', 1.0), ('Well_Name', 'Sable Island'), ('Well_Nam_1', 'C-67'), ('Company', 'Mobil et al'), ('Drilling_U', 'Bawden Rig 18'), ('Spud_Date', '1967-06-07'), ('Well_Termi', '1968-01-02'), ('RT_Elevati', 8.2), ('Water_Dept', 3.9), ('Total_Dept', 4604.0), ('Total_De_1', 15106.0), ('Well_Type', 'Exploratory'), ('Well_Symb', 'Plugged dry hole')]), 'type': 'Feature'}
import geopandas as gpd
Load our data into a GeoDataFrame (gdf):
gdf = gpd.read_file('../data/offshore_wells_2011_Geographic_NAD27.shp')
gdf.head()
Company | D__ | Drilling_U | RT_Elevati | Spud_Date | Total_De_1 | Total_Dept | Water_Dept | Well_Nam_1 | Well_Name | Well_No_ | Well_Symb | Well_Termi | Well_Type | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Mobil et al | 1.0 | Bawden Rig 18 | 8.2 | 1967-06-07 | 15106.0 | 4604.0 | 3.9 | C-67 | Sable Island | 1.0 | Plugged dry hole | 1968-01-02 | Exploratory | POINT (-59.9170497222 43.9346961111) |
1 | Shell | 2.0 | Sedneth 1 | 25.9 | 1969-09-01 | 13085.0 | 3988.0 | 57.9 | E-84 | Onondaga | 2.0 | Plugged gas well | 1969-11-11 | Exploratory | POINT (-60.2214388889 43.721147222222) |
2 | Shell | 3.0 | Sedneth 1 | 25.9 | 1969-11-16 | 13516.0 | 4120.0 | 82.3 | O-25 | Oneida | 3.0 | Plugged dry hole | 1970-02-10 | Exploratory | POINT (-61.5601366667 43.2492655556) |
3 | Shell | 4.0 | Sedneth 1 | 26.0 | 1970-02-16 | 7235.0 | 2205.0 | 95.1 | N-30 | Naskapi | 4.0 | Plugged dry hole | 1970-03-19 | Exploratory | POINT (-62.5665408333 43.4963302778) |
4 | Shell | 5.0 | Sedco H | 31.4 | 1970-05-03 | 6975.0 | 2126.0 | 117.0 | B-93 | Mohawk | 5.0 | Plugged dry hole | 1970-05-23 | Exploratory | POINT (-64.7315288889 42.7029227778) |
gdf.plot()
<matplotlib.axes._subplots.AxesSubplot at 0x7fe08ce2d0f0>
Let's look at the first 5 rows of the geometry attribute.
gdf.geometry[:5]
0 POINT (-59.9170497222 43.9346961111) 1 POINT (-60.2214388889 43.721147222222) 2 POINT (-61.5601366667 43.2492655556) 3 POINT (-62.5665408333 43.4963302778) 4 POINT (-64.7315288889 42.7029227778) Name: geometry, dtype: object
Notice we're in lat, lon:
print(gdf.crs)
{'init': 'epsg:4267'}
Visiting EPSG 4267 tells us the datum is NAD27.
Let's cast the SHP to a new CRS: EPSG 26920:
gdf = gdf.to_crs({'init': 'epsg:26920'})
Now we're in a UTM coordinate system: UTM Zone 20N, with a NAD83 datum:
gdf.geometry[:5]
0 POINT (747510.4372073817 4869250.818941217) 1 POINT (723875.2971043108 4844664.069700127) 2 POINT (616945.850614059 4789511.343230169) 3 POINT (535101.4182156855 4816032.927423127) 4 POINT (358242.7522037996 4729291.710525603) Name: geometry, dtype: object
Now we'll also add a new attribute with the two-way seismic travel time to the seafloor (in milliseconds).
gdf['seafl_twt'] = 2 * 1000 * gdf.Water_Dept / 1485
gdf.head()
Company | D__ | Drilling_U | RT_Elevati | Spud_Date | Total_De_1 | Total_Dept | Water_Dept | Well_Nam_1 | Well_Name | Well_No_ | Well_Symb | Well_Termi | Well_Type | geometry | seafl_twt | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Mobil et al | 1.0 | Bawden Rig 18 | 8.2 | 1967-06-07 | 15106.0 | 4604.0 | 3.9 | C-67 | Sable Island | 1.0 | Plugged dry hole | 1968-01-02 | Exploratory | POINT (747510.4372073817 4869250.818941217) | 5.252525 |
1 | Shell | 2.0 | Sedneth 1 | 25.9 | 1969-09-01 | 13085.0 | 3988.0 | 57.9 | E-84 | Onondaga | 2.0 | Plugged gas well | 1969-11-11 | Exploratory | POINT (723875.2971043108 4844664.069700127) | 77.979798 |
2 | Shell | 3.0 | Sedneth 1 | 25.9 | 1969-11-16 | 13516.0 | 4120.0 | 82.3 | O-25 | Oneida | 3.0 | Plugged dry hole | 1970-02-10 | Exploratory | POINT (616945.850614059 4789511.343230169) | 110.841751 |
3 | Shell | 4.0 | Sedneth 1 | 26.0 | 1970-02-16 | 7235.0 | 2205.0 | 95.1 | N-30 | Naskapi | 4.0 | Plugged dry hole | 1970-03-19 | Exploratory | POINT (535101.4182156855 4816032.927423127) | 128.080808 |
4 | Shell | 5.0 | Sedco H | 31.4 | 1970-05-03 | 6975.0 | 2126.0 | 117.0 | B-93 | Mohawk | 5.0 | Plugged dry hole | 1970-05-23 | Exploratory | POINT (358242.7522037996 4729291.710525603) | 157.575758 |
We can also get a statistical summary of the data frame:
gdf.describe()
D__ | RT_Elevati | Total_De_1 | Total_Dept | Water_Dept | Well_No_ | seafl_twt | |
---|---|---|---|---|---|---|---|
count | 207.000000 | 207.000000 | 207.000000 | 207.000000 | 207.000000 | 207.000000 | 207.000000 |
mean | 237.246377 | 34.719952 | 13026.159662 | 3969.690821 | 142.545894 | 104.043478 | 191.981002 |
std | 127.765494 | 11.185937 | 4081.887038 | 1262.667988 | 350.268990 | 59.936723 | 471.742748 |
min | 0.000000 | 6.400000 | 1503.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 |
25% | 129.500000 | 25.900000 | 9984.000000 | 3043.000000 | 39.500000 | 52.500000 | 53.198653 |
50% | 267.000000 | 31.400000 | 13202.000000 | 4024.000000 | 53.300000 | 104.000000 | 71.784512 |
75% | 349.500000 | 41.500000 | 15683.000000 | 4800.500000 | 82.650000 | 156.500000 | 111.313131 |
max | 403.000000 | 82.000000 | 21404.000000 | 6676.000000 | 2091.500000 | 207.000000 | 2816.835017 |
gdf.to_file('../data/offshore_wells_2011_UTM20_NAD83.shp')
ls ../data/*.shp
../data/offshore_wells_2011_Geographic_NAD27.shp*
../data/offshore_wells_2011_UTM20_NAD83.shp
GeoJSON is a standard format for encoding geospatial data. Think of it as a more web-friendly shapefile. (It's friendly because it's all contained in a single file, and the file is in JSON format, which JavaScript can process natively and other languages can easily consume).
# Must be geographic coords, so casting to WGS84.
gdf = gdf.to_crs({'init': 'epsg:4326'})
with open('../data/offshore_wells.geojson', 'w') as f:
f.write(gdf.to_json())
You can get slippy maps right in Jupyter Notebook with folium
. Install it with:
conda install folium
import folium
# Must be geographic coords, so casting to WGS84.
gdf = gdf.to_crs({'init': 'epsg:4326'})
# Make the map, add the features via GeoJSON.
mymap = folium.Map(location=[45, -62], zoom_start=7)
features = folium.features.GeoJson(gdf.to_json())
mymap.add_child(features)
mymap
© Agile Geoscience 2016