Table of Contents

Madonna di Canneto

Can you help preserve "blue gold" using data to predict water availability?
Acea Smart Water Analytics

https://www.kaggle.com/c/acea-water-prediction/data

Data Description

This competition uses nine different datasets, completely independent and not linked to each other. Each dataset can represent a different kind of waterbody. As each waterbody is different from the other, the related features as well are different from each other. So, if for instance we consider a water spring we notice that its features are different from the lake’s one. This is correct and reflects the behavior and characteristics of each waterbody. The Acea Group deals with four different type of waterbodies: water spring (for which three datasets are provided), lake (for which a dataset is provided), river (for which a dataset is provided) and aquifers (for which four datasets are provided).

Let’s see how these nine waterbodies differ from each other.

Aquifer

Auser
  • This waterbody consists of two subsystems, called NORTH and SOUTH, where the former partly influences the behavior of the latter. Indeed, the north subsystem is a water table (or unconfined) aquifer while the south subsystem is an artesian (or confined) groundwater. The levels of the NORTH sector are represented by the values of the SAL, PAG, CoS and DIEC wells, while the levels of the SOUTH sector by the LT2 well.
Petrignano Aquifer
  • The wells field of the alluvial plain between Ospedalicchio di Bastia Umbra and Petrignano is fed by three underground aquifers separated by low permeability septa. The aquifer can be considered a water table groundwater and is also fed by the Chiascio river. The groundwater levels are influenced by the following parameters: rainfall, depth to groundwater, temperatures and drainage volumes, level of the Chiascio river.
Doganella Aquifer
  • The wells field Doganella is fed by two underground aquifers not fed by rivers or lakes but fed by meteoric infiltration. The upper aquifer is a water table with a thickness of about 30m. The lower aquifer is a semi-confined artesian aquifer with a thickness of 50m and is located inside lavas and tufa products. These aquifers are accessed through wells called Well 1, ..., Well 9. Approximately 80 \% of the drainage volumes come from the artesian aquifer. The aquifer levels are influenced by the following parameters: rainfall, humidity, subsoil, temperatures and drainage volumes.
Luco Aquifer
  • The Luco wells field is fed by an underground aquifer. This aquifer not fed by rivers or lakes but by meteoric infiltration at the extremes of the impermeable sedimentary layers. Such aquifer is accessed through wells called Well 1, Well 3 and Well 4 and is influenced by the following parameters: rainfall, depth to groundwater, temperature and drainage volumes.

Water spring

Amiata
  • The Amiata waterbody is composed of a volcanic aquifer not fed by rivers or lakes but fed by meteoric infiltration. This aquifer is accessed through Ermicciolo, Arbure, Bugnano and Galleria Alta water springs. The levels and volumes of the four sources are influenced by the parameters: rainfall, depth to groundwater, hydrometry, temperatures and drainage volumes.
Madonna di Canneto
  • The Madonna di Canneto spring is situated at an altitude of 1010m above sea level in the Canneto valley. It does not consist of an aquifer and its source is supplied by the water catchment area of the river Melfa.
  • Settefrati is a commune with an altitude of 784 m and surface area of 50,6 km².
Lupa
  • this water spring is located in the Rosciano Valley, on the left side of the Nera river. The waters emerge at an altitude of about 375 meters above sea level through a long draining tunnel that crosses, in its final section, lithotypes and essentially calcareous rocks. It provides drinking water to the city of Terni and the towns around it.

River

Arno
  • Arno is the second largest river in peninsular Italy and the main waterway in Tuscany and it has a relatively torrential regime, due to the nature of the surrounding soils (marl and impermeable clays). Arno results to be the main source of water supply of the metropolitan area of Florence-Prato-Pistoia. The availability of water for this waterbody is evaluated by checking the hydrometric level of the river at the section of Nave di Rosano.

Lake

Bilancino
  • Bilancino lake is an artificial lake located in the municipality of Barberino di Mugello (about 50 km from Florence). It is used to refill the Arno river during the summer months. Indeed, during the winter months, the lake is filled up and then, during the summer months, the water of the lake is poured into the Arno river.

Each waterbody has its own different features to be predicted. The table below shows the expected feature to forecast for each waterbody.

It is of the utmost importance to notice that some features like rainfall and temperature, which are present in each dataset, don’t go alongside the date. Indeed, both rainfall and temperature affect features like level, flow, depth to groundwater and hydrometry some time after it fell down. This means, for instance, that rain fell on 1st January doesn’t affect the mentioned features right the same day but some time later. As we don’t know how many days/weeks/months later rainfall affects these features, this is another aspect to keep into consideration when analyzing the dataset.

A short, tabular description of the waterbodies is available also downloading all datasets.

More information about the behavior of each kind of waterbody can be found at the following links:

Acea-Input.png

In [2]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
sns.set()
sns.set_context('notebook')
np.set_printoptions(threshold=20, precision=2, suppress=True)
pd.options.display.max_rows = 57
pd.options.display.max_columns = 38
pd.set_option('precision', 2)

from IPython.display import Image, display
%config InlineBackend.figure_format = 'retina';
from IPython.core.display import display, HTML, SVG
display(HTML("<style>.container {width:70% !important;}</style>"))
sns.axes_style()
Out[2]:
{'axes.facecolor': '#EAEAF2',
 'axes.edgecolor': 'white',
 'axes.grid': True,
 'axes.axisbelow': True,
 'axes.labelcolor': '.15',
 'figure.facecolor': 'white',
 'grid.color': 'white',
 'grid.linestyle': '-',
 'text.color': '.15',
 'xtick.color': '.15',
 'ytick.color': '.15',
 'xtick.direction': 'out',
 'ytick.direction': 'out',
 'lines.solid_capstyle': 'round',
 'patch.edgecolor': 'w',
 'patch.force_edgecolor': True,
 'image.cmap': 'rocket',
 'font.family': ['sans-serif'],
 'font.sans-serif': ['Arial',
  'DejaVu Sans',
  'Liberation Sans',
  'Bitstream Vera Sans',
  'sans-serif'],
 'xtick.bottom': False,
 'xtick.top': False,
 'ytick.left': False,
 'ytick.right': False,
 'axes.spines.left': True,
 'axes.spines.bottom': True,
 'axes.spines.right': True,
 'axes.spines.top': True}
In [4]:
imgValle= Image( url="http://vanoproy.be/css/Valle_Canneto-Irto-Bellaveduta-RoccaAltiera.jpg", width=1300) # http://vanoproy.be/css filename
imgInfil= Image( url="http://vanoproy.be/python/figs/evaporation_infiltration_en.png", width=500)
imgValle
Out[4]:

This map of VdC I found on a website reporting a daylong hike, which trail is imaged by the blue trace. The water spring itself surfaces on the right side of the valley, near the 1025 m notation.
South of the peak Monti della Meta there is a "plateau", a quasi horizontal surface with some local depressions, which can serve as a supplementory infiltration area for the water spring.
I can imaging there is some underground connection as dissolving calcium could have made conduits, while earthquakes and tectonics could have made cracks in the rocks.

Introduction: the study of springs

In recent years, long and frequent droughts have affected many countries in the world. These events require an ever more careful and rational management of water resources. Most of the globe’s unfrozen freshwater reserves are stored in aquifers. Groundwater is generally a renewable resource that shows good quality and resilience to fluctuations. Thus, if properly managed, groundwater could ensure long-term supply in order to meet increasing water demand.

For this purpose, it is of crucial importance to be able to predict the flow rates provided by springs. These represent the transitions from groundwater to surface water and reflect the dynamics of the aquifer, with the whole flow system behind. Moreover, spring influences water bodies into which they discharge. The importance of springs in groundwater research is highlighted in some significant contributions. In-depth studies on springs started only after the concept of sustainability was introduced in the management of water resources.

A spring hydrograph is the consequence of several processes governing the transformation of precipitation in the spring recharge area into the single output discharge at the spring. A water balance states that the change rate in water stored in the feeding aquifer is balanced by the rate at which water flows into and out of the aquifer. A quantitative water balance generally has to take the following terms into account: precipitation, infiltration, surface runoff, evapotranspiration, groundwater recharge, soil moisture deficit, spring discharge, lateral inflow to the aquifer, leakage between the aquifer and the underlying aquitard, well pumpage from the aquifer, and change of the storage in the aquifer.

In many cases, the evaluation of the terms of the water balance is very complicated. The complexity of the problem arises from many factors: hydrologic, hydrographic, and hydrogeological features, geologic and geomorphologic characteristics, land use, land cover, water withdrawals, and climatic conditions.

Even more complicated would be to estimate future spring discharges by using a model based on the balance equations. Therefore, simplified approaches are frequently pursued for practical purposes.

Many authors have addressed the problem of correlating the spring discharges to the rainfall through different approaches...

Prologue

Earthquakes 2019 - 2020

Let's see what a quake of magnitude 3- 4.5 can do to the flow rate. Approximation of the distance from epicenter by the coordinates.

Note: you can skip the earthquake chapter, as these phenomenon are unpredictable. However, closeby earthquakes have an influence: they do distort the data.

In [3]:
Earthquakes =pd.read_csv(r"earthquake_15_20_EMSC.csv",header=0, sep=",", engine='python', encoding="UTF-8",parse_dates=True  ); 
Earthquakes["Date"] = pd.to_datetime( Earthquakes.Date, format='%Y/%m/%d' , ) # euro dates utc=True
Earthquakes.set_index("Date", inplace=True)
In [8]:
Earthquakes["Dst"] = np.sqrt(np.abs((41.683364-Earthquakes.Latitude ) )+ np.abs((13.910494 -Earthquakes.Longitude)))#**2 **2
Earthquakes.nlargest(19, "Magnitude")
Out[8]:
Time UTC Latitude Longitude Depth Depth Type Magnitude Type Magnitude Region Name Last Update Eqid Dst
Date
2019-06-23 20:43:48 41.86 12.77 9 f ML 3.7 SOUTHERN ITALY 2019-06-24 03:07 772989 1.15
2017-02-03 21:33:03 41.05 12.72 2 ML 3.7 SOUTHERN ITALY 2017-02-04 13:22 564948 1.35
2017-02-03 23:08:05 41.01 12.69 2 ML 3.5 SOUTHERN ITALY 2017-02-04 13:21 564959 1.38
2018-12-29 23:52:48 41.87 12.78 10 f ML 2.9 SOUTHERN ITALY 2018-12-30 19:51 735792 1.15
2017-10-25 13:35:55 41.56 13.19 11 f ML 2.9 SOUTHERN ITALY 2017-10-25 13:46 626212 0.92
2017-09-28 06:57:15 41.53 12.95 5 ML 2.8 SOUTHERN ITALY 2017-09-28 07:10 620966 1.06
2017-02-04 02:10:42 41.07 12.68 11 f ML 2.8 SOUTHERN ITALY 2017-02-04 02:29 565004 1.36
2017-02-04 00:20:14 41.08 12.80 10 f ML 2.8 SOUTHERN ITALY 2017-02-04 00:34 564975 1.31
2016-12-27 21:56:24 41.97 13.43 10 ML 2.7 SOUTHERN ITALY 2016-12-27 22:28 555582 0.88
2019-03-22 02:04:14 41.55 13.39 10 ML 2.6 SOUTHERN ITALY 2019-03-22 02:13 752723 0.81
2019-01-01 07:33:55 41.69 13.37 11 ML 2.5 SOUTHERN ITALY 2019-01-01 07:48 736231 0.74
2018-11-15 07:35:55 41.68 12.76 11 ML 2.5 SOUTHERN ITALY 2018-11-15 07:57 725618 1.07
2018-03-22 02:30:37 41.65 13.19 9 ML 2.5 SOUTHERN ITALY 2018-03-22 02:48 655561 0.87
2017-07-18 08:13:28 41.81 13.21 9 ML 2.5 SOUTHERN ITALY 2017-07-18 08:20 605876 0.91
2017-01-14 07:33:01 41.72 12.68 10 ML 2.5 SOUTHERN ITALY 2017-01-14 08:09 559767 1.13
2016-03-02 06:12:33 41.95 12.64 10 ML 2.5 SOUTHERN ITALY 2016-03-02 06:25 491805 1.24
2015-07-07 17:47:27 41.93 12.79 11 ML 2.5 SOUTHERN ITALY 2015-07-07 17:56 449766 1.17
2015-04-07 07:37:23 41.93 12.70 12 ML 2.5 SOUTHERN ITALY 2015-04-07 07:51 435480 1.21
2019-11-15 00:59:07 41.87 12.77 10 ML 2.4 SOUTHERN ITALY 2019-11-15 01:08 805396 1.15
Earthquakes.info()
In [ ]:
Earthquakes.head(9)
In [6]:
Distances =pd.read_csv(r"http://vanoproy.be/css/Distances.csv",header=0, sep=",", engine='python', encoding="UTF-8",parse_dates=True ); 
Distances["Date"] = pd.to_datetime( Distances.Date, format='%Y/%m/%d', ) # euro dates
Distances.set_index("Date", inplace=True)
In [7]:
Distances.head()
Out[7]:
Dist
Date
2019-11-15 79.98
2019-11-15 96.82
2019-11-13 96.60
2019-11-09 96.39
2019-06-23 96.60

Distances.csv was made to calculate the correct distance of 2 points on a globe, before I found out the haversine def. in Datasist package. Therefore you find the 2 methods for distances here.

In [16]:
Earthquakes2= pd.merge(Earthquakes, Distances,left_index=True,right_index=True,how="left" )
Earthquakes2.head()
Out[16]:
Time UTC Latitude Longitude Depth Depth Type Magnitude Type Magnitude Region Name Last Update Eqid Dst Dist
Date
2014-11-26 21:55:22 41.86 12.74 10 ML 2.0 SOUTHERN ITALY 2014-11-29 09:25 410944 1.18 NaN
2015-02-04 09:36:59 41.92 12.67 16 ML 2.4 SOUTHERN ITALY 2015-02-04 09:48 425738 1.26 106.14
2015-02-28 19:21:04 41.75 13.23 9 ML 2.2 SOUTHERN ITALY 2015-02-28 19:36 429768 0.68 NaN
2015-03-07 15:14:35 41.64 12.98 20 ML 2.0 SOUTHERN ITALY 2015-03-07 15:37 430751 0.93 NaN
2015-03-20 05:06:40 41.87 13.27 8 ML 2.4 SOUTHERN ITALY 2015-03-20 05:16 432811 0.67 57.02
!pip install datasist # haversine
In [12]:
from datasist.feature_engineering import haversine_distance #afstanden
In [13]:
haversine_distance(41.683364, 13.910494, 41.87, 12.77)
Out[13]:
0    96.82
dtype: float64

afstanden=[(41.87, 12.77), (41.86, 12.77), (41.67, 12.75), (41.86, 12.77), (41.55, 13.39), (41.69, 13.37), (41.87, 12.78), (41.69, 12.67), (41.68, 12.76), (41.65, 13.19), (41.53, 13.48), (41.56, 13.19), (41.53, 12.95), (41.47, 12.97), (41.82, 13.21), (41.81, 13.21), (41.98, 13.05), (41.72, 13.21), (41.99, 12.93), (41.07, 12.68), (41.08, 12.8), (41.01, 12.69), (41.05, 12.72), (41.72, 12.68), (41.73, 12.68), (41.97, 13.43), (41.95, 12.64), (41.52, 12.94), (41.93, 12.79), (41.65, 12.8), (41.93, 12.7), (41.93, 12.72), (41.87, 13.27), (41.92, 12.67)]

In [ ]:
 

Earthquakes["Dist"]=pd.Series(afstanden)

In [13]:
EQpiv =Earthquakes.pivot("Eqid", "Depth","Magnitude")
In [14]:
fig, ax = plt.subplots(1,1, figsize=(19, 8.5))
sns.heatmap( data=EQpiv, annot=True, fmt=".1f",cmap="magma" ); #rows="Magnitude", columns="Dst",Earthquakes .corr()

Earthquakes with magnitude 2.5 - 4

Intensity = Dist² * original Magnitude

In [17]:
fig, ax = plt.subplots(1,1, figsize=(15, 5))
plt.xscale( "log")
sns.scatterplot (x=Earthquakes2["Dist"], y=Earthquakes2["Magnitude"] ,data=Earthquakes2,hue=Earthquakes2.Depth, label=Earthquakes2.Eqid,palette="magma"); 
In [21]:
Earthquakes2["ProxiMagnit"]=Earthquakes2["Magnitude"] /np.sqrt(Earthquakes2["Dist"])# Intensity = Dist²* original Magnitude
In [19]:
fig, ax = plt.subplots(1,1,  figsize=(14, 5)) 
sns.scatterplot( data= Earthquakes2,x=Earthquakes2["ProxiMagnit"], y=Earthquakes2["Dist"],hue=Earthquakes2["Magnitude"],size= Earthquakes2.Depth); 
fig, ax = plt.subplots(1,1, figsize=(14, 5)) sns.scatterplot( data= Earthquakes,x=Earthquakes["ProxiMagnit"], y=Earthquakes["Dst"],hue=Earthquakes["Magnitude"],size= Earthquakes.Depth);
In [23]:
Earthquakes2.nlargest(9, "ProxiMagnit")
Out[23]:
Time UTC Latitude Longitude Depth Depth Type Magnitude Type Magnitude Region Name Last Update Eqid Dst Dist ProxiMagnit
Date
2019-03-22 02:04:14 41.55 13.39 10 ML 2.6 SOUTHERN ITALY 2019-03-22 02:13 752723 0.54 45.74 0.38
2016-12-27 21:56:24 41.97 13.43 10 ML 2.7 SOUTHERN ITALY 2016-12-27 22:28 555582 0.56 51.00 0.38
2019-06-23 20:43:48 41.86 12.77 9 f ML 3.7 SOUTHERN ITALY 2019-06-24 03:07 772989 1.15 96.60 0.38
2019-01-01 07:33:55 41.69 13.37 11 ML 2.5 SOUTHERN ITALY 2019-01-01 07:48 736231 0.54 44.89 0.37
2017-10-25 13:35:55 41.56 13.19 11 f ML 2.9 SOUTHERN ITALY 2017-10-25 13:46 626212 0.73 61.44 0.37
2017-12-23 12:17:20 41.53 13.48 10 ML 2.3 SOUTHERN ITALY 2017-12-23 12:25 637193 0.46 39.65 0.37
2017-02-03 21:33:03 41.05 12.72 2 ML 3.7 SOUTHERN ITALY 2017-02-04 13:22 564948 1.35 121.78 0.34
2017-02-03 21:33:03 41.05 12.72 2 ML 3.7 SOUTHERN ITALY 2017-02-04 13:22 564948 1.35 126.44 0.33
2017-07-18 08:13:28 41.81 13.21 9 ML 2.5 SOUTHERN ITALY 2017-07-18 08:20 605876 0.71 59.80 0.32
In [10]:
import csv
Water_Spring_Madonna_di_Canneto =pd.read_csv(r"Water_Spring_Madonna_di_Canneto.csv",sep=",",encoding="UTF-8",decimal="." ,header=0,error_bad_lines=False,warn_bad_lines=False,quoting=csv.QUOTE_NONE)
Water_Spring_Madonna_di_Canneto["Date"] = pd.to_datetime( Water_Spring_Madonna_di_Canneto.Date, format='%d/%m/%Y' ) # euro dates
Water_Spring_Madonna_di_Canneto.set_index("Date", inplace=True) #Water_Spring_Madonna_di_Canneto.head() 
In [24]:
dfC= Water_Spring_Madonna_di_Canneto["2015":"2020"] # > quake plot
In [25]:
import datetime
sns.set_style("whitegrid", {'grid.linestyle': '--'})
fig, ax = plt.subplots(1,1,  figsize=(21, 8), sharex=True) #, Earthquakes2 2019 - 2020

Earthquakes2= Earthquakes2[Earthquakes2.Magnitude >= 2.25]
vert= Earthquakes2.index.date ##ax.vline( datetime.date(2015, 2, 28), [0] ,1) #list( )
for v in vert:
    plt.axvline( v, 0 ,1 , color="darkorange", lw=0.6) # datetime.date(2015, 2, 28)
sns.scatterplot(x=dfC.index, y= dfC.Flow_Rate_Madonna_di_Canneto, data=dfC, s=4) 
sns.scatterplot(x="Date", y= Earthquakes2.Magnitude*100, data=Earthquakes2, hue="Dist", label="Magnitude$ML*100$") #kind="line" ax=ax

plt.xticks(rotation=90); #axe.setxaxis(rotation=90)
plt.xlim(pd.to_datetime( "2015-01-01"),pd.to_datetime("2019-12-29" ) ); plt.ylim(180,380); 

The quake of 2018-03-22 02:30:37 with Magnitude 2.5 might have caused a short period of outflow. (depth:9 and eqid:655561 distance:0.71)

In [ ]:
vert

Earthquakes magnitude > 5

Some more energetic quakes, but less distorting because they originated further away from the spring.

In [8]:
#!pip install install openpyxl 
Earthquake15 =pd.read_excel(r"http://vanoproy.be/css/Water_Spring_Madonna_di_Canneto.xlsx",header=0, parse_dates=True, sheet_name='seismic' ); 
Earthquake15["Date_event"] = pd.to_datetime( Earthquake15.Date_event, format='%B %d %Y' ) # euro dates
Earthquake15.set_index("Date_event", inplace=True)
In [9]:
Earthquake15["Dst"] = np.sqrt(np.abs((41.683364-Earthquake15.Latitude )**2 )+ np.abs((13.910494-Earthquake15.Longitude)**2 ))
Earthquake15.head(21)
Out[9]:
Date_time_UTC Mw Epicenter_location Depth_km Latitude Longitude Dst
Date_event
2016-08-24 August 24 2016 01:36:32 6.0 Accumoli 8.1 42.70 13.23 1.22
2016-08-24 August 24 2016 02:33:28 5.3 Norcia 8.0 42.79 13.15 1.34
2016-10-26 October 26 2016 17:10:36 5.4 Castelsantangelo sul Nera 8.7 42.88 13.13 1.43
2016-10-26 October 26 2016 19:18:05 5.9 Castelsantangelo sul Nera 7.5 42.91 13.13 1.45
2016-10-30 October 30 2016 06:40:17 6.5 Norcia 9.2 42.83 13.11 1.40
2017-01-18 January 18 2017 09:25:40 5.1 Capitignano 10.0 42.55 13.28 1.07
2017-01-18 January 18 2017 10:14:09 5.5 Capitignano 9.6 42.53 13.28 1.05
2017-01-18 January 18 2017 10:25:23 5.4 Capitignano 9.4 42.50 13.28 1.04
2017-01-18 January 18 2017 13:33:36 5.0 Barete 9.5 42.47 13.27 1.01
In [10]:
Earthquake15.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 9 entries, 2016-08-24 to 2017-01-18
Data columns (total 7 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Date_time_UTC       9 non-null      object 
 1   Mw                  9 non-null      float64
 2   Epicenter_location  9 non-null      object 
 3   Depth_km            9 non-null      float64
 4   Latitude            9 non-null      float64
 5   Longitude           9 non-null      float64
 6   Dst                 9 non-null      float64
dtypes: float64(5), object(2)
memory usage: 576.0+ bytes
In [ ]:
Lati= Earthquake15["Latitude"].values ; Longi=Earthquake15["Longitude"].values #.values .values# .items()
print(Lati)
hvs1=[]; hvs2=[]
for La in Lati:#
    for Lon in Longi:
        print(La, Lon, type(hvs1)) #Lo 
        Earthquake15hvs =  haversine_distance(41.683364, 13.910494 ,float(La), 50 )  # havers gives 2 values
        print(Earthquake15hvs )
        #hvs1= hvs1.append(Earthquake15hvs[0]);# hvs2= hvs2.append(Earthquake15hvs[1])
        # Earthquake15["hvs"]
In [ ]:
Earthquake15["hvs"].values
In [27]:
Earthquake15hvs
Out[27]:
0    2957.21
dtype: float64
import datetime sns.set_style("whitegrid", {'grid.linestyle': '--'}) fig, ax = plt.subplots(1,1, figsize=(21, 8)) # #Earthquake15= Earthquake15[Earthquake15.Mw >= 5.0] vert= Earthquake15.index.date ##ax.vline( datetime.date(2015, 2, 28), [0] ,1) #list( ) for v in vert: plt.axvline( v, 0 ,1 , color="darkorange", lw=0.6) # datetime.date(2015, 2, 28) sns.scatterplot(x=dfC.index, y= dfC.Flow_Rate_Madonna_di_Canneto, data=dfC, s=4) sns.scatterplot(x=Earthquake15.index, y= Earthquake15.Mw *50, data=Earthquake15, label="Magnitude$ML*50$") #kind="line" ax=ax plt.xticks(rotation=90); #axe.setxaxis(rotation=90) plt.xlim(pd.to_datetime( "2015-01-01" ),pd.to_datetime("2019-12-29" ) );

Merge the Earthquakes data with Water_Spring_Madonna_di_Canneto...

dfCE= pd.merge(Water_Spring_Madonna_di_Canneto, Earthquakes2.Dist,left_index=True,right_index=True,how="left" ) dfCE

dfCE.Dist= dfCE.Dist.replace( np.nan,10000000)

In [5]:
import datetime
start = datetime.datetime(2004, 1, 1); end = datetime.datetime(2019, 1, 1)

In [6]:
pd.options.display.max_rows = 57
In [7]:
%precision 4
Out[7]:
'%.4f'

Groundwater discharge

From Wikipedia, the free encyclopedia https://en.wikipedia.org/wiki/Groundwater_discharge

Groundwater discharge is the volumetric flow rate of groundwater through an aquifer.

Total groundwater discharge, as reported through a specified area, is similarly expressed as:

${\displaystyle Q={\frac{dh}{dl}}KA}$

where

Q is the total groundwater discharge ( m3/s),
K is the hydraulic conductivity of the aquifer (L/s),
dh/dl is the hydraulic gradient ([L·L−1]; unitless), and
A is the area which the groundwater is flowing through ( m2)

For example, this can be used to determine the flow rate of water flowing along a plane with known geometry. The discharge potential ${\textstyle \Phi }$

The discharge potential is a potential in groundwater mechanics which links the physical properties, hydraulic head, with a mathematical formulation for the energy as a function of position. The discharge potential,${\textstyle \Phi }$ L3·T−1, is defined in such way that its gradient equals the discharge vector.

${\displaystyle Q_{x}=-{\frac {\partial \Phi }{\partial x}}}$ & ${\displaystyle Q_{y}=-{\frac {\partial \Phi }{\partial y}}}$

Thus the hydraulic head may be calculated in terms of the discharge potential, for confined flow as

${\displaystyle \Phi =K. H\phi }$

and for unconfined shallow flow as

$${\displaystyle \Phi ={\frac {1}{2}}K\phi^{2}+C}$$

where

$H$ is the thickness of the aquifer [L],
${ \phi }$ is the hydraulic head [L], and
${\textstyle C}$ is an arbitrary constant [L3·T−1] given by the boundary conditions.

As mentioned the discharge potential may also be written in terms of position. The discharge potential is a function of the Laplace's equation

$${\displaystyle {\frac {\partial ^{2}\Phi }{\partial x^{2}}}+{\frac {\partial^{2}\Phi }{\partial y^{2}}}=0}$$

which solution is a linear differential equation. Because the solution is a linear differential equation for which superposition principle holds, it may be combined with other solutions for the discharge potential, e.g. uniform flow, multiple wells, analytical elements (analytic element method). See also Groundwater flow equation


Water spring Madonna di Canneto

The Madonna di Canneto spring is situated at an altitude of 1010m above sea level in the Canneto valley. It does not consist of an aquifer and its source is supplied by the water catchment area of the river Melfa.
The surface area of Settefrati itself is 50,6 km².

Some interesting info about the valley

Valle di Canneto has been named after the bamboo plants which once grew there. The valley consists of several smaller streams and several springs, of which only few bear a name. Nonetheless, a spring called "Madonna" cannot be found on maps, perhaps cos it runs dry so regularly. Anyhow, the sanctuary bears this name, and attracts pelgrims since centuries.

Location 41° 40' 41.88'' N 13° 54' 29.772'' E   Soils of Italy

References

  • SPRINGS (CLASSIFICATION, FUNCTION,CAPTURING), Soulios G., Dep. of Geology, Aristotle University of Thessaloniki, Bulletin of the Geological Society of Greece, Vol. 43, (2010), DOI: 10.12681/bgsg.11174
  • Urban hydrology for small watersheds. Tech. Release 55., U.S. Department of Agriculture, Soil Conservation Service. 1986.
  • RUNOFF CURVE NUMBER METHOD: EXAMINATION OF THE INITIAL ABSTRACTION RATIO, Richard H. Hawkins, Professor, University of Arizona, Tucson, (USDA, Natural Resources Conservation Service, 2002)
  • Chapter 20 - Watershed Yield, Part 630 Hydrology, National Engineering Handbook, 2009, USDA
  • Groundwater - Hydrology of springs, Chapter 4 - Spring discharge hydrograph, Neven Kresic, Amec Foster Wheeler, Bonacci o., Univ Split Croatia, (2010)
  • Karst Groundwater Availability and Sustainable Development, F. Fiorillo, V. Ristić Vakanjac, I. Jemcov, S. P Milanovic, University of Belgrade, February 2015), DOI: 10.1007/978-3-319-12850-4_15
  • Predicted Maps for Soil Organic Matter Evaluation: The Case of Abruzzo Region (Italy). Piccini, C.; Francaviglia, R.; Marchetti, A., Land 2020, 9, 349. https://doi.org/10.3390/land9100349
  • The "APEX" Agricultural Policy Environmental eXtender Model theoretical documentation, Version 0604, BREC Report # 2008-17.
In [46]:
import csv
Water_Spring_Madonna_di_Canneto =pd.read_csv(r"Water_Spring_Madonna_di_Canneto.csv",sep=",",encoding="UTF-8",decimal="." ,header=0,error_bad_lines=False,warn_bad_lines=False,quoting=csv.QUOTE_NONE)
Water_Spring_Madonna_di_Canneto.head()  #  engine='python',
Out[46]:
Date Rainfall_Settefrati Temperature_Settefrati Flow_Rate_Madonna_di_Canneto
0 01/01/2012 0.0 5.25 NaN
1 02/01/2012 5.6 6.65 NaN
2 03/01/2012 10.0 8.85 NaN
3 04/01/2012 0.0 6.75 NaN
4 05/01/2012 1.0 5.55 NaN
In [47]:
Water_Spring_Madonna_di_Canneto["Date"] = pd.to_datetime( Water_Spring_Madonna_di_Canneto.Date, format='%d/%m/%Y' ) # euro dates, utc=True
Water_Spring_Madonna_di_Canneto.set_index("Date", inplace=True) 
Water_Spring_Madonna_di_Canneto["doy"] = Water_Spring_Madonna_di_Canneto.index.dayofyear 
Water_Spring_Madonna_di_Canneto["Month"] = Water_Spring_Madonna_di_Canneto.index.month 
Water_Spring_Madonna_di_Canneto.head(6)
Out[47]:
Rainfall_Settefrati Temperature_Settefrati Flow_Rate_Madonna_di_Canneto doy Month
Date
2012-01-01 0.0 5.25 NaN 1 1
2012-01-02 5.6 6.65 NaN 2 1
2012-01-03 10.0 8.85 NaN 3 1
2012-01-04 0.0 6.75 NaN 4 1
2012-01-05 1.0 5.55 NaN 5 1
2012-01-06 5.2 5.05 NaN 6 1
#Water_Spring_Madonna_di_Canneto.set_index("Date", inplace=True) #Water_Spring_Madonna_di_Canneto.drop("Flow_Rate_Madonna_di_Canneto", inplace=True) Water_Spring_Madonna_di_Canneto.index = pd.to_datetime( Water_Spring_Madonna_di_Canneto.index, format='%Y-%m-%d' ) # %H:%M:%S
In [7]:
Water_Spring_Madonna_di_Canneto.tail(6)
Out[7]:
Rainfall_Settefrati Temperature_Settefrati Flow_Rate_Madonna_di_Canneto doy Month date date_check
Date
2020-06-25 NaN NaN 224.01 177 6 2020-06-25 1 days
2020-06-26 NaN NaN 223.92 178 6 2020-06-26 1 days
2020-06-27 NaN NaN 223.86 179 6 2020-06-27 1 days
2020-06-28 NaN NaN 223.76 180 6 2020-06-28 1 days
2020-06-29 NaN NaN 223.77 181 6 2020-06-29 1 days
2020-06-30 NaN NaN 223.75 182 6 2020-06-30 1 days
In [49]:
Water_Spring_Madonna_di_Canneto.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 3104 entries, 2012-01-01 to 2020-06-30
Data columns (total 5 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   Rainfall_Settefrati           2557 non-null   float64
 1   Temperature_Settefrati        2557 non-null   float64
 2   Flow_Rate_Madonna_di_Canneto  1387 non-null   float64
 3   doy                           3104 non-null   int64  
 4   Month                         3104 non-null   int64  
dtypes: float64(3), int64(2)
memory usage: 145.5 KB
In [50]:
Water_Spring_Madonna_di_Canneto_original= Water_Spring_Madonna_di_Canneto.copy()
import os for dirname, _, filenames in os.walk('acea-water-prediction'): # /kaggle/input for filename in filenames: print(os.path.join(dirname, filename)) continue
In [62]:
imgCapo=Image( filename="capodacqua-in-val-canneto.jpg"); imgCapo
Out[62]:

Rainfall

In [15]:
Water_Spring_Madonna_di_Canneto.Rainfall_Settefrati.plot(figsize=(20, 4)); # ["2019"]

This series stops end 2018, and there's nada from 2019 on... however there is flow rate data provided after this moment up to 30/6/2020, which I don't want to ignore nor leave untrained.
I could use the averages distilled from the other years. It is a balancing act between the scarcity of data and the weight of the margin of error when inserting averages.


In [51]:
average36= Water_Spring_Madonna_di_Canneto.loc["2013-01-01":"2015-12-31"] #  00:00:00 00:00:00
average5= Water_Spring_Madonna_di_Canneto.loc["2017-01-01":"2019-12-31"] #  
average3650=average36.append( average5)
average365= average3650.groupby('{:%m-%d}'.format).mean()
In [53]:
averageTemp= Water_Spring_Madonna_di_Canneto.groupby('{:%m-%d}'.format).mean()
averageTemp.doy=  averageTemp.doy.astype("int32")    # correction for doy 361.25
averageTemp2020 = averageTemp                       # copy of df for 2020
averageTemp#.index #.Rainfall_Settefrati
Out[53]:
Rainfall_Settefrati Temperature_Settefrati Flow_Rate_Madonna_di_Canneto doy Month
01-01 1.49 4.54 268.39 1 1
01-02 3.17 5.24 268.14 2 1
01-03 6.59 6.67 268.43 3 1
01-04 3.47 6.89 270.83 4 1
01-05 6.29 5.87 270.67 5 1
... ... ... ... ... ...
12-27 15.26 5.96 269.19 361 12
12-28 3.29 6.19 268.90 362 12
12-29 1.51 4.71 269.00 363 12
12-30 1.49 3.54 269.14 364 12
12-31 0.11 3.94 268.76 365 12

366 rows × 5 columns

type(averageTemp)

The series contains a leap year 2016: so we have to drop leap day "02-29" for 2019, but we leave it in for the averages of the year 2020.

averageTemp.Rainfall_Settefrati.indexaverageTemp2019 = "2019-" + averageTemp.index.astype(str) #replace( ) df['col'] = 'str' + df['col'].astype(str)

averageTemp2019

In [54]:
averageTemp2019= averageTemp.drop(averageTemp.index[59])  # leap years: drop leap day "02-29"
In [55]:
dr2019 =pd.date_range(start='1/1/2019', end='31/12/2019'); dr2019
Out[55]:
DatetimeIndex(['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04',
               '2019-01-05', '2019-01-06', '2019-01-07', '2019-01-08',
               '2019-01-09', '2019-01-10',
               ...
               '2019-12-22', '2019-12-23', '2019-12-24', '2019-12-25',
               '2019-12-26', '2019-12-27', '2019-12-28', '2019-12-29',
               '2019-12-30', '2019-12-31'],
              dtype='datetime64[ns]', length=365, freq='D')
In [56]:
averageTemp2019.index= dr2019 
In [57]:
averageTemp2019 
Out[57]:
Rainfall_Settefrati Temperature_Settefrati Flow_Rate_Madonna_di_Canneto doy Month
2019-01-01 1.49 4.54 268.39 1 1
2019-01-02 3.17 5.24 268.14 2 1
2019-01-03 6.59 6.67 268.43 3 1
2019-01-04 3.47 6.89 270.83 4 1
2019-01-05 6.29 5.87 270.67 5 1
... ... ... ... ... ...
2019-12-27 15.26 5.96 269.19 361 12
2019-12-28 3.29 6.19 268.90 362 12
2019-12-29 1.51 4.71 269.00 363 12
2019-12-30 1.49 3.54 269.14 364 12
2019-12-31 0.11 3.94 268.76 365 12

365 rows × 5 columns

In [58]:
dr2020 =pd.date_range(start='1/1/2020', end='31/12/2020'); dr2020
Out[58]:
DatetimeIndex(['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04',
               '2020-01-05', '2020-01-06', '2020-01-07', '2020-01-08',
               '2020-01-09', '2020-01-10',
               ...
               '2020-12-22', '2020-12-23', '2020-12-24', '2020-12-25',
               '2020-12-26', '2020-12-27', '2020-12-28', '2020-12-29',
               '2020-12-30', '2020-12-31'],
              dtype='datetime64[ns]', length=366, freq='D')
In [59]:
averageTemp2020.index= dr2020
#averageTemp2020 = averageTemp # copy of df averageTemp2020.index = "2020-" + averageTemp2020.index.astype(str) #.replace( "2019-","2020-") #d
In [60]:
averageTemp2020
Out[60]:
Rainfall_Settefrati Temperature_Settefrati Flow_Rate_Madonna_di_Canneto doy Month
2020-01-01 1.49 4.54 268.39 1 1
2020-01-02 3.17 5.24 268.14 2 1
2020-01-03 6.59 6.67 268.43 3 1
2020-01-04 3.47 6.89 270.83 4 1
2020-01-05 6.29 5.87 270.67 5 1
... ... ... ... ... ...
2020-12-27 15.26 5.96 269.19 361 12
2020-12-28 3.29 6.19 268.90 362 12
2020-12-29 1.51 4.71 269.00 363 12
2020-12-30 1.49 3.54 269.14 364 12
2020-12-31 0.11 3.94 268.76 365 12

366 rows × 5 columns

averageTemp2019= pd.DatetimeIndex(averageTemp2019, yearfirst= True) #['Date'] .index format="%Y-%m-%d"

Water_Spring_Madonna_di_Canneto_original["2019-01-01":"2019-12-31"].index

averageTemp.Rainfall_Settefrati.index = Water_Spring_Madonna_di_Canneto_original["2019-01-01":"2019-12-31"].index Water_Spring_Madonna_di_Canneto["2019-01-01":"2019-12-31"].loc["Rainfall_Settefrati"] = averageTemp["01-01":"12-31"].Rainfall_Settefrati

In [22]:
averageTemp.Rainfall_Settefrati["2019-01-01":"2020-12-31"].plot(figsize=(14, 4)); #