This notebook uses data covering a few kilometres around the township of Bungendore in NSW, Australia.
For convenience compressed data for this notebook is available for download from this link. This data will be imported in the section 'Importing data' in this notebook. Once you adjust the data path to your liking, this notebook should run fine.
Sample data was derived from the Australian Bureau of Meteorology National Groundwater Information System and from ELVIS - Elevation and Depth - Foundation Spatial Data. You can check the licensing for these data; the short version is that use for learning purposes is fine.
This notebook is a realistic example of one case study the pyela package addresses. What we obtain at the end of the workflow is a 3D gridded model of primary lithologies. A visual outcome (normally interactive) is:
import os
import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import rasterio
from rasterio.plot import show
import geopandas as gpd
# Only set to True for co-dev of ela from this use case:
ela_from_source = False
# ela_from_source = True
if ela_from_source:
if ('ELA_SRC' in os.environ):
root_src_dir = os.environ['ELA_SRC']
elif sys.platform == 'win32':
root_src_dir = r'C:\src\github_jm\pyela'
else:
username = os.environ['USER']
root_src_dir = os.path.join('/home', username, 'src/github_jm/pyela')
pkg_src_dir = root_src_dir
sys.path.append(pkg_src_dir)
from ela.textproc import *
from ela.utils import *
from ela.classification import *
from ela.visual import *
from ela.spatial import SliceOperation
There are two main sets of information we need: the borehole lithology logs, and the spatial information in the surface elevation (DEM) and geolocation of a subset of bores around Bungendore.
data_path = None
You probably want to explicitly set data_path
to the location where you put the 'Bungendore' folder e.g:
#data_path = '/home/myusername/data' # On Linux, if you now have the folder /home/myusername/data/Bungendore
#data_path = r'C:\data\Lithology' # windows, if you have C:\data\Lithology\Bungendore
Otherwise a fallback for the pyela developer(s)
if data_path is None:
if ('ELA_DATA' in os.environ):
data_path = os.environ['ELA_DATA']
elif sys.platform == 'win32':
data_path = r'C:\data\Lithology'
else:
username = os.environ['USER']
data_path = os.path.join('/home', username, 'data/Lithology')
bungendore_datadir = os.path.join(data_path, 'Bungendore')
bidgee_path = os.path.join(bungendore_datadir, 'gw_shp_murrumbidgee_river/shp_murrumbidgee_river')
bungendore_bore_locations_fn = os.path.join(bungendore_datadir,'gw_shp_murrumbidgee_river/shp_bungendore.shp')
Read the lithology logs from the whole Murrumbidgee catchment - we will subset later.
lithology_logs = pd.read_csv(os.path.join(bidgee_path, 'NGIS_LithologyLog.csv'))
The DEM raster and the bore location shapefile do not use the same projection (coordinate reference system) so we reproject one of them. We choose the raster's UTM.
bungendore_raster = rasterio.open(os.path.join(bungendore_datadir,'CLIP_66949','CLIP_reproj_UTM55.tif'))
bore_locations_raw = gpd.read_file(bungendore_bore_locations_fn)
bore_locations_raw.crs, bungendore_raster.crs
({'init': 'epsg:3577'}, CRS.from_epsg(32755))
bore_locations = bore_locations_raw.to_crs(bungendore_raster.crs)
bore_locations.head()
HydroID | HydroCode | StateBoreI | StatePipeI | StateTerri | Agency | WCode | BoreDepth | DrilledDep | Status | ... | Hydrostrat | WaterLevel | Salinity | WaterCount | WaterDateM | WaterDat_1 | SalinityCo | SalinityDa | Salinity_1 | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 10055899 | GW061345.1.1 | GW061345 | 1.1 | 1 | 1 | 8 | 53.3 | 53.3 | UNK | ... | 0 | 0 | 0 | 0 | None | None | 0 | None | None | POINT (713783.3445196336 6099618.639519438) |
1 | 10094010 | GW404937.1.1 | GW404937 | 1.1 | 1 | 1 | 8 | 96.0 | 96.0 | ABN | ... | 0 | 0 | 0 | 0 | None | None | 0 | None | None | POINT (718150.9998919906 6093237.005439439) |
2 | 10114667 | GW414316.1.1 | GW414316 | 1.1 | 1 | 1 | 8 | 42.0 | 42.0 | USE | ... | 0 | 0 | 0 | 0 | None | None | 0 | None | None | POINT (713568.9940640796 6106887.001336516) |
3 | 10114672 | GW414581.1.1 | GW414581 | 1.1 | 1 | 1 | 8 | 42.0 | 0.0 | USE | ... | 0 | 0 | 0 | 0 | None | None | 0 | None | None | POINT (715642.0000155731 6097858.004591262) |
4 | 10122533 | GW402017.1.1 | GW402017 | 1.1 | 1 | 1 | 8 | 90.0 | 90.0 | UNK | ... | 0 | 0 | 0 | 0 | None | None | 0 | None | None | POINT (712620.6561813344 6102894.13739373) |
5 rows × 52 columns
# backup, but likely obsolete:
# raster_projstring = "+proj=utm +zone=55 +south +datum=WGS84 +units=m +no_defs"
# bore_locations = bore_locations_raw.to_crs(raster_projstring)
# bore_locations.head()
fig, ax = plt.subplots(figsize=(12, 12))
show(bungendore_raster,title='East of Bungendore, AU', cmap='terrain', ax=ax)
bore_locations.plot(ax=ax, facecolor='black')
<matplotlib.axes._subplots.AxesSubplot at 0x7f4cbda66cc0>
lithology_logs.head(n=10)
OBJECTID | BoreID | HydroCode | RefElev | RefElevDesc | FromDepth | ToDepth | TopElev | BottomElev | MajorLithCode | MinorLithCode | Description | Source | LogType | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 3295968 | 10096490 | GW403458.1.1 | 332.05 | UNK | 29.0 | 30.0 | 303.05 | 302.05 | None | NaN | Sump | UNK | 1 |
1 | 3295974 | 10117707 | GW400023.1.1 | 108.41 | UNK | 69.0 | 71.0 | 39.41 | 37.41 | None | NaN | NaN | UNK | 1 |
2 | 3295982 | 10094368 | GW405126.1.1 | 223.24 | UNK | 0.0 | 0.6 | 223.24 | 222.64 | TPSL | NaN | Topsoil | UNK | 1 |
3 | 3296095 | 10113483 | GW415991.1.1 | None | UNK | 0.4 | 10.0 | None | None | CLAY | NaN | Clay | UNK | 1 |
4 | 3296209 | 10144702 | GW401440.1.1 | 76.17 | UNK | 128.0 | 139.0 | -51.83 | -62.83 | None | NaN | Clay | UNK | 1 |
5 | 3296429 | 10108243 | GW416031.1.1 | None | UNK | 1.0 | 4.0 | None | None | CLAY | NaN | Clay | UNK | 1 |
6 | 3296436 | 10130964 | GW416560.1.1 | None | UNK | 5.1 | 7.0 | None | None | FSND | NaN | Fine sand, dry | UNK | 1 |
7 | 3296437 | 10126252 | GW415984.1.1 | None | UNK | 0.0 | 0.2 | None | None | FILL | NaN | Fill | UNK | 1 |
8 | 3296441 | 10106582 | GW416266.1.1 | None | UNK | 1.0 | 5.0 | None | None | CLAY | NaN | Clay | UNK | 1 |
9 | 3296577 | 10004408 | GW030348.1.1 | 115.9 | NGS | 4.58 | 6.1 | 111.32 | 109.8 | CLAY | NaN | Clay - grey to yellow brown; quartz grains fin... | WC&IC | 2 |
Note that around 2500 data points do not have their AHD height but do have their depth. This will be remediated when we calculate our own AHD heights.
The lithology logs are for all of the Murrumbidgee catchment, which is much larger than the area of interest and for which we have the geolocation of boreholes. We subset to the location of interest
df = lithology_logs
Let's subset the logs based on spatial locations, to keep only those "nearby" the DEM we have near the locality of Bungendore in this case study. First define a few constants:
df.columns
Index(['OBJECTID', 'BoreID', 'HydroCode', 'RefElev', 'RefElevDesc', 'FromDepth', 'ToDepth', 'TopElev', 'BottomElev', 'MajorLithCode', 'MinorLithCode', 'Description', 'Source', 'LogType'], dtype='object')
DEPTH_FROM_COL = 'FromDepth'
DEPTH_TO_COL = 'ToDepth'
TOP_ELEV_COL = 'TopElev'
BOTTOM_ELEV_COL = 'BottomElev'
LITHO_DESC_COL = 'Description'
HYDRO_CODE_COL = 'HydroCode'
The geopandas data frame has a column geometry listing POINT
objects. 'ela' includes get_coords_from_gpd_shape
to extrace the coordinates to a simpler structure. 'ela' has predefined column names (e.g. EASTING_COL) defined for easting/northing information, that we can use to name our coordinate information.
geoloc = get_coords_from_gpd_shape(bore_locations, colname='geometry', out_colnames=[EASTING_COL, NORTHING_COL])
geoloc[HYDRO_CODE_COL] = bore_locations[HYDRO_CODE_COL]
geoloc.head()
Easting | Northing | HydroCode | |
---|---|---|---|
0 | 713783.344520 | 6.099619e+06 | GW061345.1.1 |
1 | 718150.999892 | 6.093237e+06 | GW404937.1.1 |
2 | 713568.994064 | 6.106887e+06 | GW414316.1.1 |
3 | 715642.000016 | 6.097858e+06 | GW414581.1.1 |
4 | 712620.656181 | 6.102894e+06 | GW402017.1.1 |
With this data frame we can perform two operations in one go: subsetting the lithology records to only the 640 bores of interest, and adding to the result the x/y geolocations to the data frame.
len(geoloc), len(df)
(640, 58944)
df = pd.merge(df, geoloc, how='inner', on=HYDRO_CODE_COL, sort=False, copy=True, indicator=False, validate=None)
len(df)
2454
df.head()
OBJECTID | BoreID | HydroCode | RefElev | RefElevDesc | FromDepth | ToDepth | TopElev | BottomElev | MajorLithCode | MinorLithCode | Description | Source | LogType | Easting | Northing | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 3297041 | 10037380 | GW058949.1.1 | 691.43 | UNK | 8.8 | 11.0 | 682.63 | 680.43 | SHLE | NaN | Shale broken | UNK | 1 | 720769.160247 | 6.095597e+06 |
1 | 3297109 | 10037380 | GW058949.1.1 | 691.43 | UNK | 0.0 | 0.9 | 691.43 | 690.53 | TPSL | NaN | Topsoil | UNK | 1 | 720769.160247 | 6.095597e+06 |
2 | 3297126 | 10037380 | GW058949.1.1 | 691.43 | UNK | 0.9 | 3.3 | 690.53 | 688.13 | CLAY | NaN | Clay | UNK | 1 | 720769.160247 | 6.095597e+06 |
3 | 3297127 | 10037380 | GW058949.1.1 | 691.43 | UNK | 5.2 | 7.3 | 686.23 | 684.13 | SAND | NaN | Sand clay | UNK | 1 | 720769.160247 | 6.095597e+06 |
4 | 3297180 | 10037380 | GW058949.1.1 | 691.43 | UNK | 7.3 | 8.8 | 684.13 | 682.63 | CLAY | NaN | Clay | UNK | 1 | 720769.160247 | 6.095597e+06 |
We round the depth related columns to the upper integer value and drop the entries where the resulting depths have degenerated to 0. ela
has a class DepthsRounding
to facilitate this operations on lithology records with varying column names.
We first clean up height/depths columns to make sure they are numeric.
def as_numeric(x):
if isinstance(x, float):
return x
if x == 'None':
return np.nan
elif x is None:
return np.nan
elif isinstance(x, str):
return float(x)
else:
return float(x)
df[DEPTH_FROM_COL] = df[DEPTH_FROM_COL].apply(as_numeric)
df[DEPTH_TO_COL] = df[DEPTH_TO_COL].apply(as_numeric)
df[TOP_ELEV_COL] = df[TOP_ELEV_COL].apply(as_numeric)
df[BOTTOM_ELEV_COL] = df[BOTTOM_ELEV_COL].apply(as_numeric)
dr = DepthsRounding(DEPTH_FROM_COL, DEPTH_TO_COL)
"Before rounding heights we have " + str(len(df)) + " records"
'Before rounding heights we have 2454 records'
df = dr.round_to_metre_depths(df, np.round, True)
"After removing thin sliced entries of less than a metre, we are left with " + str(len(df)) + " records left"
'After removing thin sliced entries of less than a metre, we are left with 2222 records left'
descs = df[LITHO_DESC_COL]
descs = descs.reset_index()
descs = descs[LITHO_DESC_COL]
descs.head()
0 Shale broken 1 Topsoil 2 Clay 3 Sand clay 4 Clay Name: Description, dtype: object
The description column as read seems to be objects. Other columns seem to be objects when they should be numeric. We define two functions to clean these.
def clean_desc(x):
if isinstance(x, float):
return u''
elif x is None:
return u''
else:
# python2 return unicode(x)
return x
y = [clean_desc(x) for x in descs]
from striplog import Lexicon
lex = Lexicon.default()
y = clean_lithology_descriptions(y, lex)
We get a flat list of all the "tokens" but remove stop words ('s', 'the' and the like)
y = v_lower(y)
vt = v_word_tokenize(y)
flat = np.concatenate(vt)
import nltk
from nltk.corpus import stopwords
stoplist = stopwords.words('english')
exclude = stoplist + ['.',',',';',':','(',')','-']
flat = [word for word in flat if word not in exclude]
len(set(flat))
271
df_most_common= token_freq(flat, 50)
plot_freq(df_most_common)
<matplotlib.axes._subplots.AxesSubplot at 0x7f4cb8092668>
There are terms such as 'sandy', 'clayey', 'silty' and so on. Let's define functions to detect terms derived from lithology classes, and their frequency. Given the likely skewness, we use a y log scale.
plot_freq_for_root(flat, 'sand')
<matplotlib.axes._subplots.AxesSubplot at 0x7f4cb3e8a6d8>
plot_freq_for_root(flat, 'clay')
<matplotlib.axes._subplots.AxesSubplot at 0x7f4cb3d52080>
# TODO: add a section that defines additional clean up e.g. 'sand/clay/soil' or dashed union composite terms
# split_composite_term('sand/clay/soil', '/').replace('/', ' / ')
df_most_common
token | frequency | |
---|---|---|
0 | shale | 833 |
1 | clay | 711 |
2 | grey | 345 |
3 | gravel | 270 |
4 | brown | 259 |
5 | water | 250 |
6 | supply | 218 |
7 | yellow | 206 |
8 | black | 187 |
9 | soft | 182 |
10 | sand | 155 |
11 | hard | 109 |
12 | light | 101 |
13 | topsoil | 93 |
14 | white | 87 |
15 | slate | 79 |
16 | bands | 78 |
17 | sandy | 75 |
18 | weathered | 73 |
19 | quartz | 67 |
20 | soil | 66 |
21 | granite | 65 |
22 | blue | 65 |
23 | red | 60 |
24 | broken | 53 |
25 | decomposed | 52 |
26 | coarse | 49 |
27 | fractured | 47 |
28 | shales | 42 |
29 | siltstone | 32 |
30 | bearing | 31 |
31 | sandstone | 29 |
32 | fine | 27 |
33 | dark | 26 |
34 | rock | 26 |
35 | plastic | 24 |
36 | basalt | 22 |
37 | pink | 19 |
38 | clayey | 18 |
39 | clays | 16 |
40 | grained | 16 |
41 | interbedded | 16 |
42 | sticky | 16 |
43 | medium | 15 |
44 | orange | 15 |
45 | & | 15 |
46 | large | 15 |
47 | silt | 14 |
48 | loamy | 13 |
49 | top | 13 |
From the list of most common tokens, we may want to define lithology classes as follows:
df[LITHO_DESC_COL] = y
lithologies = ['clay','sand','gravel','granite','shale','silt','topsoil','loam','soil','slate','sandstone']
And to capture any of these we devise a regular expression:
any_litho_markers_re = r'sand|clay|ston|shale|silt|granit|soil|gravel|loam|slate'
regex = re.compile(any_litho_markers_re)
my_lithologies_numclasses = create_numeric_classes(lithologies)
lithologies_dict = dict([(x,x) for x in lithologies])
lithologies_dict['sands'] = 'sand'
lithologies_dict['clays'] = 'clay'
lithologies_dict['shales'] = 'shale'
lithologies_dict['claystone'] = 'clay'
lithologies_dict['siltstone'] = 'silt'
lithologies_dict['limesand'] = 'sand' # ??
lithologies_dict['calcarenite'] = 'limestone' # ??
lithologies_dict['calcitareous'] = 'limestone' # ??
lithologies_dict['mudstone'] = 'silt' # ??
lithologies_dict['capstone'] = 'limestone' # ??
lithologies_dict['ironstone'] = 'sandstone' # ??
#lithologies_dict['topsoil'] = 'soil' # ??
lithologies_adjective_dict = {
'sandy' : 'sand',
'clayey' : 'clay',
'clayish' : 'clay',
'shaley' : 'shale',
'silty' : 'silt',
'gravelly' : 'gravel'
}
v_tokens = v_word_tokenize(y)
litho_terms_detected = v_find_litho_markers(v_tokens, regex=regex)
Let's see if we detect these lithology markers in each bore log entries
zero_mark = [x for x in litho_terms_detected if len(x) == 0 ]
at_least_one_mark = [x for x in litho_terms_detected if len(x) >= 1]
at_least_two_mark = [x for x in litho_terms_detected if len(x) >= 2]
print('There are %s entries with no marker, %s entries with at least one, %s with at least two'%(len(zero_mark),len(at_least_one_mark),len(at_least_two_mark)))
There are 148 entries with no marker, 2074 entries with at least one, 465 with at least two
Note: probably need to think of precanned facilities in ela to assess the detection rate in such EDA. Maybe wordcloud not such a bad idea too.
descs_zero_mark = [y[i] for i in range(len(litho_terms_detected)) if len(litho_terms_detected[i]) == 0 ]
descs_zero_mark[1:20]
['quartz water supply', 'humus', 'quartz', 'aquifer water supply', 'none', 'none', 'black rock', 'rock water supply', 'yellow rock', 'basalt', 'coal lignitic', 'driller', 'dacite water supply', 'surface rock', 'dacite, yellowibrown', 'dacite, blue with black', 'quartz feldspar porphry - quartz veins', 'sediment, light browns-white', 'sediment - fine']
primary_litho = v_find_primary_lithology(litho_terms_detected, lithologies_dict)
secondary_litho = v_find_secondary_lithology(litho_terms_detected, primary_litho, lithologies_adjective_dict, lithologies_dict)
df[PRIMARY_LITHO_COL]=primary_litho
df[SECONDARY_LITHO_COL]=secondary_litho
df[PRIMARY_LITHO_NUM_COL] = v_to_litho_class_num(primary_litho, my_lithologies_numclasses)
df[SECONDARY_LITHO_NUM_COL] = v_to_litho_class_num(secondary_litho, my_lithologies_numclasses)
df.head()
OBJECTID | BoreID | HydroCode | RefElev | RefElevDesc | FromDepth | ToDepth | TopElev | BottomElev | MajorLithCode | MinorLithCode | Description | Source | LogType | Easting | Northing | Lithology_1 | Lithology_2 | Lithology_1_num | Lithology_2_num | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 3297041 | 10037380 | GW058949.1.1 | 691.43 | UNK | 9.0 | 11.0 | 682.63 | 680.43 | SHLE | NaN | shale broken | UNK | 1 | 720769.160247 | 6.095597e+06 | shale | 4.0 | NaN | |
1 | 3297109 | 10037380 | GW058949.1.1 | 691.43 | UNK | 0.0 | 1.0 | 691.43 | 690.53 | TPSL | NaN | topsoil | UNK | 1 | 720769.160247 | 6.095597e+06 | topsoil | 6.0 | NaN | |
2 | 3297126 | 10037380 | GW058949.1.1 | 691.43 | UNK | 1.0 | 3.0 | 690.53 | 688.13 | CLAY | NaN | clay | UNK | 1 | 720769.160247 | 6.095597e+06 | clay | 0.0 | NaN | |
3 | 3297127 | 10037380 | GW058949.1.1 | 691.43 | UNK | 5.0 | 7.0 | 686.23 | 684.13 | SAND | NaN | sand clay | UNK | 1 | 720769.160247 | 6.095597e+06 | sand | clay | 1.0 | 0.0 |
4 | 3297180 | 10037380 | GW058949.1.1 | 691.43 | UNK | 7.0 | 9.0 | 684.13 | 682.63 | CLAY | NaN | clay | UNK | 1 | 720769.160247 | 6.095597e+06 | clay | 0.0 | NaN |
While the bore entries have columns for AHD elevations, many appear to be missing data. Since we have a DEM of the region we can correct this.
cd = HeightDatumConverter(bungendore_raster)
df = cd.add_height(df,
depth_from_col=DEPTH_FROM_COL, depth_to_col=DEPTH_TO_COL,
depth_from_ahd_col=DEPTH_FROM_AHD_COL, depth_to_ahd_col=DEPTH_TO_AHD_COL,
easting_col=EASTING_COL, northing_col=NORTHING_COL, drop_na=False)
df.head()
OBJECTID | BoreID | HydroCode | RefElev | RefElevDesc | FromDepth | ToDepth | TopElev | BottomElev | MajorLithCode | ... | Source | LogType | Easting | Northing | Lithology_1 | Lithology_2 | Lithology_1_num | Lithology_2_num | Depth From (AHD) | Depth To (AHD) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 3297041 | 10037380 | GW058949.1.1 | 691.43 | UNK | 9.0 | 11.0 | 682.63 | 680.43 | SHLE | ... | UNK | 1 | 720769.160247 | 6.095597e+06 | shale | 4.0 | NaN | 682.26709 | 680.26709 | |
1 | 3297109 | 10037380 | GW058949.1.1 | 691.43 | UNK | 0.0 | 1.0 | 691.43 | 690.53 | TPSL | ... | UNK | 1 | 720769.160247 | 6.095597e+06 | topsoil | 6.0 | NaN | 691.26709 | 690.26709 | |
2 | 3297126 | 10037380 | GW058949.1.1 | 691.43 | UNK | 1.0 | 3.0 | 690.53 | 688.13 | CLAY | ... | UNK | 1 | 720769.160247 | 6.095597e+06 | clay | 0.0 | NaN | 690.26709 | 688.26709 | |
3 | 3297127 | 10037380 | GW058949.1.1 | 691.43 | UNK | 5.0 | 7.0 | 686.23 | 684.13 | SAND | ... | UNK | 1 | 720769.160247 | 6.095597e+06 | sand | clay | 1.0 | 0.0 | 686.26709 | 684.26709 |
4 | 3297180 | 10037380 | GW058949.1.1 | 691.43 | UNK | 7.0 | 9.0 | 684.13 | 682.63 | CLAY | ... | UNK | 1 | 720769.160247 | 6.095597e+06 | clay | 0.0 | NaN | 684.26709 | 682.26709 |
5 rows × 22 columns
# to be reused in experimental notebooks:
classified_logs_filename = os.path.join(data_path, 'Bungendore','classified_logs.pkl')
df.to_pickle(classified_logs_filename)
# max/min bounds
shp_bbox = get_bbox(bore_locations)
shp_bbox
(706394.2929625576, 6089546.817226024, 729010.1694019751, 6109176.439475171)
raster_bbox = bungendore_raster.bounds
raster_bbox
BoundingBox(left=698072.3317566791, bottom=6081052.435896559, right=743160.9813157511, top=6117034.783610981)
x_min = max(shp_bbox[0], raster_bbox[0])
x_max = min(shp_bbox[2], raster_bbox[2])
y_min = max(shp_bbox[1], raster_bbox[1])
y_max = min(shp_bbox[3], raster_bbox[3])
grid_res = 150
m = create_meshgrid_cartesian(x_min, x_max, y_min, y_max, grid_res)
[x.shape for x in m]
[(151, 131), (151, 131)]
dem_array = surface_array(bungendore_raster, x_min, y_min, x_max, y_max, grid_res)
dem_array.shape
(151, 131)
plt.imshow(to_carto(dem_array), cmap='terrain')
<matplotlib.image.AxesImage at 0x7f4cb3b4dd30>
dem_array_data = {'bounds': (x_min, x_max, y_min, y_max), 'grid_res': grid_res, 'mesh_xy': m, 'dem_array': dem_array}
import pickle
fp = os.path.join(bungendore_datadir, 'dem_array_data.pkl')
if not os.path.exists(fp):
with open(fp, 'wb') as handle:
pickle.dump(dem_array_data, handle, protocol=pickle.HIGHEST_PROTOCOL)
df.head(10)
OBJECTID | BoreID | HydroCode | RefElev | RefElevDesc | FromDepth | ToDepth | TopElev | BottomElev | MajorLithCode | ... | Source | LogType | Easting | Northing | Lithology_1 | Lithology_2 | Lithology_1_num | Lithology_2_num | Depth From (AHD) | Depth To (AHD) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 3297041 | 10037380 | GW058949.1.1 | 691.43 | UNK | 9.0 | 11.0 | 682.63 | 680.43 | SHLE | ... | UNK | 1 | 720769.160247 | 6.095597e+06 | shale | 4.0 | NaN | 682.267090 | 680.267090 | |
1 | 3297109 | 10037380 | GW058949.1.1 | 691.43 | UNK | 0.0 | 1.0 | 691.43 | 690.53 | TPSL | ... | UNK | 1 | 720769.160247 | 6.095597e+06 | topsoil | 6.0 | NaN | 691.267090 | 690.267090 | |
2 | 3297126 | 10037380 | GW058949.1.1 | 691.43 | UNK | 1.0 | 3.0 | 690.53 | 688.13 | CLAY | ... | UNK | 1 | 720769.160247 | 6.095597e+06 | clay | 0.0 | NaN | 690.267090 | 688.267090 | |
3 | 3297127 | 10037380 | GW058949.1.1 | 691.43 | UNK | 5.0 | 7.0 | 686.23 | 684.13 | SAND | ... | UNK | 1 | 720769.160247 | 6.095597e+06 | sand | clay | 1.0 | 0.0 | 686.267090 | 684.267090 |
4 | 3297180 | 10037380 | GW058949.1.1 | 691.43 | UNK | 7.0 | 9.0 | 684.13 | 682.63 | CLAY | ... | UNK | 1 | 720769.160247 | 6.095597e+06 | clay | 0.0 | NaN | 684.267090 | 682.267090 | |
5 | 3302117 | 10037380 | GW058949.1.1 | 691.43 | UNK | 11.0 | 39.0 | 680.43 | 652.73 | CLAY | ... | UNK | 1 | 720769.160247 | 6.095597e+06 | clay | shale | 0.0 | 4.0 | 680.267090 | 652.267090 |
6 | 3302166 | 10037380 | GW058949.1.1 | 691.43 | UNK | 39.0 | 56.0 | 652.73 | 635.43 | SAND | ... | UNK | 1 | 720769.160247 | 6.095597e+06 | sand | 1.0 | NaN | 652.267090 | 635.267090 | |
7 | 3302167 | 10037380 | GW058949.1.1 | 691.43 | UNK | 56.0 | 57.0 | 635.43 | 634.43 | CLAY | ... | UNK | 1 | 720769.160247 | 6.095597e+06 | clay | shale | 0.0 | 4.0 | 635.267090 | 634.267090 |
8 | 3302256 | 10037380 | GW058949.1.1 | 691.43 | UNK | 3.0 | 5.0 | 688.13 | 686.23 | SHLE | ... | UNK | 1 | 720769.160247 | 6.095597e+06 | shale | 4.0 | NaN | 688.267090 | 686.267090 | |
9 | 3297085 | 10027294 | GW058023.1.1 | 804.98 | UNK | 1.0 | 18.0 | 803.98 | 786.98 | SLTE | ... | UNK | 1 | 716921.554268 | 6.100716e+06 | slate | 9.0 | NaN | 807.349976 | 790.349976 |
10 rows × 22 columns
We need to define min and max heights on the Z axis for which we interoplate. We use the KNN algorithm with 10 neighbours. We should use a domain such that there are enough points for each height. Let's find visually heights with at least 10 records
n_bins=100
p = df.hist(column=[DEPTH_FROM_AHD_COL,DEPTH_TO_AHD_COL], sharex=True, sharey=True, bins=n_bins, figsize=(15,5))
for axes in p:
axes[0].set_yscale("log", nonposy='clip')
n_neighbours=10
ahd_min=620
ahd_max=830
z_ahd_coords = np.arange(ahd_min,ahd_max,1)
dim_x,dim_y = m[0].shape
dim_z = len(z_ahd_coords)
dims = (dim_x,dim_y,dim_z)
lithology_3d_array=np.empty(dims)
gi = GridInterpolation(easting_col=EASTING_COL, northing_col=NORTHING_COL)
gi.interpolate_volume(lithology_3d_array, df, PRIMARY_LITHO_NUM_COL, z_ahd_coords, n_neighbours, m)
# Burn DEM into grid
z_index_for_ahd = z_index_for_ahd_functor(b=-ahd_min)
# check that we get the expected indexes for an AHD height
z_index_for_ahd(800), z_index_for_ahd(+630)
(180, 10)
dem_array.shape, m[0].shape, lithology_3d_array.shape
((151, 131), (151, 131), (151, 131, 210))
burn_volume(lithology_3d_array, dem_array, z_index_for_ahd, below=False)
#lithologies = ['clay', 'sand', 'gravel', 'granite', 'shale', 'silt', 'topsoil', 'loam', 'soil', 'slate', 'sandstone']
lithology_color_names = ['olive', 'yellow', 'lightgrey', 'dimgray', 'teal', 'cornsilk', 'saddlebrown', 'rosybrown', 'chocolate', 'lightslategrey', 'gold']
lithology_cmap = discrete_classes_colormap(lithology_color_names) # Later for exporting to RGB geotiffs??
litho_legend_display_info = [(lithology_cmap[i], lithologies[i], lithology_color_names[i]) for i in range(len(lithologies))]
litho_legend = legend_fig(litho_legend_display_info)
cms = cartopy_color_settings(lithology_color_names)
imgplot = plt.imshow(to_carto(lithology_3d_array[:, :, z_index_for_ahd(660)]), cmap=cms['cmap'])
title = plt.title('Primary litho at +660mAHD')
imgplot = plt.imshow(to_carto(lithology_3d_array[:, :, z_index_for_ahd(700)]), cmap=cms['cmap'])
title = plt.title('Primary litho at +700mAHD')
from ela.visual3d import *
xx, yy = dem_array_data['mesh_xy']
from mayavi import mlab
vis_litho = LithologiesClassesVisual3d(lithologies, lithology_color_names, 'black')
# TODO: problematic with this data - investigate
# vis_litho.render_classes_planar(lithology_3d_array, 'Primary lithology')
# vis_litho.render_class(lithology_3d_array, 0)
ela has facilities to visualise overlaid information: DEM, classified bore logs, and volumes of interpolated lithologies. This is important to convey .
First a bit of data filling for visual purposes, as NaN lithology class codes may cause issues.
df_infilled = df.fillna({PRIMARY_LITHO_NUM_COL: -1.0})
# df_2 = df_1[(df_1[DEPTH_TO_AHD_COL] > (ahd_min-20))]
# A factor to apply to Z coordinates, otherwise things would be squashed visually along the heights.
# Would prefer a visual only scaling factor, but could not find a way to do so.
Z_SCALING = 20.0
z_coords = np.arange(ahd_min,ahd_max,1)
overlay_vis_litho = LithologiesClassesOverlayVisual3d(lithologies, lithology_color_names, 'black', dem_array_data, z_coords, Z_SCALING, df_infilled, PRIMARY_LITHO_NUM_COL)
def view_class(value):
f = overlay_vis_litho.view_overlay(value, lithology_3d_array)
return f
# f = view_class(0.0)
f = view_class(1.0)
The interpolated volume is naturally using AHD coordinates for height. We may want to slice data such that we have maps of lithologies at particular depths below ground level. SliceOperation
is here for this.
sops = SliceOperation(dem_array, z_index_for_ahd)
depths = sops.from_ahd_to_depth_below_ground_level(lithology_3d_array, from_depth = 0, to_depth = 10)
depths.shape
(151, 131, 11)
def plt_litho(depth_bgl):
imgplot = plt.imshow(to_carto(depths[:,:,10-depth_bgl]), cmap=cms['cmap'])
title = plt.title('Primary litho at %s m BGL'%depth_bgl)
plt_litho(0)
Although the DEM had no missing data, there are some areas with no interpolated lithology classification. These correspond to elevations where there were not enough records at a given AHD to interpolate from.
plt_litho(10)
'ela' includes facilities to export slices to the GeoTIFF format. Features are available through the GeotiffExporter
class
from rasterio.transform import from_origin
transform = from_origin(x_min, y_max, grid_res, grid_res)
from ela.io import GeotiffExporter
ge = GeotiffExporter(bungendore_raster.crs, transform)
litho_depth = to_carto(depths[:,:,0])
lithology_cmap = discrete_classes_colormap(lithology_color_names)
# ge.export_rgb_geotiff(litho_depth, '/home/xxxyyy/tmp/test.tif', lithology_cmap)
The resulting file (RGB channels) should be visible from most explorer windows (Windows or Linux). Other functions export the georeference values.