In this notebook I describe how to predict wind and solar generation from weather data using a simple linear regression algorithm and a dataset containing energy production and weather information for Germany during 2016.
More information can be found in the post in my blog.
# import necessary modules
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
The data I used in this analysis comes from Open Power System Data, a free-of-charge platform with data data on installed generation capacity by country/technology, individual power plants (conventional and renewable), and time series data. It can be found in:
https://open-power-system-data.org/
The platform contains data for 37 European countries, but in this project I’m going to focus on data for Germany in 2016 as an example. In particular, I’m going to use two datasets:
First, we read the csv file containing the renewable energy production data relative to Germany. For that, we read the column containing the timestamp (whose column name starts with 'utc') and the columns containing German data (whose columns start with 'DE').
production = pd.read_csv("data/time_series_60min_singleindex.csv",
usecols=(lambda s: s.startswith('utc') | s.startswith('DE')),
parse_dates=[0], index_col=0)
production.head(3)
DE_load_entsoe_power_statistics | DE_load_entsoe_transparency | DE_price_day_ahead | DE_solar_capacity | DE_solar_generation_actual | DE_solar_profile | DE_wind_capacity | DE_wind_generation_actual | DE_wind_profile | DE_wind_offshore_capacity | ... | DE_tennet_wind_generation_actual | DE_tennet_wind_generation_forecast | DE_tennet_wind_offshore_generation_actual | DE_tennet_wind_onshore_generation_actual | DE_transnetbw_load_entsoe_transparency | DE_transnetbw_solar_generation_actual | DE_transnetbw_solar_generation_forecast | DE_transnetbw_wind_generation_actual | DE_transnetbw_wind_generation_forecast | DE_transnetbw_wind_onshore_generation_actual | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
utc_timestamp | |||||||||||||||||||||
2005-12-31 23:00:00 | 47823.0 | NaN | 32.10 | 2028.0 | NaN | NaN | NaN | NaN | NaN | NaN | ... | 1707.0 | 1864.0 | 0.0 | 1707.0 | NaN | NaN | NaN | NaN | NaN | NaN |
2006-01-01 00:00:00 | 47823.0 | NaN | 30.07 | 2028.0 | NaN | NaN | NaN | NaN | NaN | NaN | ... | 1755.0 | 1813.0 | 0.0 | 1755.0 | NaN | NaN | NaN | NaN | NaN | NaN |
2006-01-01 01:00:00 | 43444.0 | NaN | 27.94 | 2028.0 | NaN | NaN | NaN | NaN | NaN | NaN | ... | 1591.0 | 1726.0 | 0.0 | 1591.0 | NaN | NaN | NaN | NaN | NaN | NaN |
3 rows × 48 columns
production.tail(3)
DE_load_entsoe_power_statistics | DE_load_entsoe_transparency | DE_price_day_ahead | DE_solar_capacity | DE_solar_generation_actual | DE_solar_profile | DE_wind_capacity | DE_wind_generation_actual | DE_wind_profile | DE_wind_offshore_capacity | ... | DE_tennet_wind_generation_actual | DE_tennet_wind_generation_forecast | DE_tennet_wind_offshore_generation_actual | DE_tennet_wind_onshore_generation_actual | DE_transnetbw_load_entsoe_transparency | DE_transnetbw_solar_generation_actual | DE_transnetbw_solar_generation_forecast | DE_transnetbw_wind_generation_actual | DE_transnetbw_wind_generation_forecast | DE_transnetbw_wind_onshore_generation_actual | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
utc_timestamp | |||||||||||||||||||||
2017-12-31 21:00:00 | NaN | 46566.0 | 1.86 | 41951.0 | 0.0 | 0.0 | 55790.0 | 30355.0 | 0.5441 | 5323.0 | ... | 12111.0 | 13904.0 | 3752.0 | 8359.0 | 5376.0 | 0.0 | 0.0 | 964.0 | 970.0 | 964.0 |
2017-12-31 22:00:00 | NaN | 45159.0 | -0.92 | 41951.0 | 0.0 | 0.0 | 55790.0 | 30594.0 | 0.5484 | 5323.0 | ... | 12249.0 | 13937.0 | 2800.0 | 9450.0 | 5195.0 | 0.0 | 0.0 | 1031.0 | 1026.0 | 1031.0 |
2017-12-31 23:00:00 | NaN | 43803.0 | -5.27 | 41951.0 | 0.0 | 0.0 | 55790.0 | 31578.0 | 0.5660 | 5323.0 | ... | 13035.0 | 13378.0 | 2723.0 | 10312.0 | 4902.0 | 0.0 | 0.0 | 1086.0 | 1029.0 | 1086.0 |
3 rows × 48 columns
The data file contains information between 2006 and 2017. We will only be interested in the data for 2016.
production = production.loc[production.index.year == 2016, :]
production.head(3)
DE_load_entsoe_power_statistics | DE_load_entsoe_transparency | DE_price_day_ahead | DE_solar_capacity | DE_solar_generation_actual | DE_solar_profile | DE_wind_capacity | DE_wind_generation_actual | DE_wind_profile | DE_wind_offshore_capacity | ... | DE_tennet_wind_generation_actual | DE_tennet_wind_generation_forecast | DE_tennet_wind_offshore_generation_actual | DE_tennet_wind_onshore_generation_actual | DE_transnetbw_load_entsoe_transparency | DE_transnetbw_solar_generation_actual | DE_transnetbw_solar_generation_forecast | DE_transnetbw_wind_generation_actual | DE_transnetbw_wind_generation_forecast | DE_transnetbw_wind_onshore_generation_actual | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
utc_timestamp | |||||||||||||||||||||
2016-01-01 00:00:00 | 41913.0 | 39396.0 | 22.39 | 39021.0 | 0.0 | 0.0 | 44325.0 | 8579.0 | 0.1936 | 3283.0 | ... | 5615.0 | 4740.0 | 2263.0 | 3352.0 | 4652.0 | 0.0 | 0.0 | 24.0 | 0.0 | 24.0 |
2016-01-01 01:00:00 | 40331.0 | 37718.0 | 20.59 | 39021.0 | 0.0 | 0.0 | 44325.0 | 8542.0 | 0.1928 | 3283.0 | ... | 5498.0 | 4735.0 | 2218.0 | 3280.0 | 4431.0 | 0.0 | 0.0 | 24.0 | 0.0 | 24.0 |
2016-01-01 02:00:00 | 39162.0 | 36692.0 | 16.81 | 39021.0 | 0.0 | 0.0 | 44325.0 | 8443.0 | 0.1905 | 3283.0 | ... | 5439.0 | 4795.0 | 2005.0 | 3434.0 | 4325.0 | 0.0 | 0.0 | 17.0 | 0.0 | 17.0 |
3 rows × 48 columns
production.tail(3)
DE_load_entsoe_power_statistics | DE_load_entsoe_transparency | DE_price_day_ahead | DE_solar_capacity | DE_solar_generation_actual | DE_solar_profile | DE_wind_capacity | DE_wind_generation_actual | DE_wind_profile | DE_wind_offshore_capacity | ... | DE_tennet_wind_generation_actual | DE_tennet_wind_generation_forecast | DE_tennet_wind_offshore_generation_actual | DE_tennet_wind_onshore_generation_actual | DE_transnetbw_load_entsoe_transparency | DE_transnetbw_solar_generation_actual | DE_transnetbw_solar_generation_forecast | DE_transnetbw_wind_generation_actual | DE_transnetbw_wind_generation_forecast | DE_transnetbw_wind_onshore_generation_actual | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
utc_timestamp | |||||||||||||||||||||
2016-12-31 21:00:00 | 50383.0 | 47205.0 | 33.71 | 40293.0 | 0.0 | 0.0 | 49569.0 | 14848.0 | 0.2995 | 4131.0 | ... | 8741.0 | 9438.0 | 2974.0 | 5766.0 | 6208.0 | 0.0 | 0.0 | 38.0 | 58.0 | 38.0 |
2016-12-31 22:00:00 | 48988.0 | 45548.0 | 27.95 | 40293.0 | 0.0 | 0.0 | 49569.0 | 15056.0 | 0.3037 | 4131.0 | ... | 8750.0 | 9444.0 | 2976.0 | 5774.0 | 6071.0 | 0.0 | 0.0 | 54.0 | 68.0 | 54.0 |
2016-12-31 23:00:00 | 47641.0 | 44512.0 | 20.96 | 40294.0 | 0.0 | 0.0 | 49569.0 | 15422.0 | 0.3111 | 4131.0 | ... | 8792.0 | 9543.0 | 2971.0 | 5821.0 | 5953.0 | 0.0 | 0.0 | 70.0 | 44.0 | 70.0 |
3 rows × 48 columns
production.info()
<class 'pandas.core.frame.DataFrame'> DatetimeIndex: 8784 entries, 2016-01-01 00:00:00 to 2016-12-31 23:00:00 Data columns (total 48 columns): DE_load_entsoe_power_statistics 8784 non-null float64 DE_load_entsoe_transparency 8784 non-null float64 DE_price_day_ahead 8784 non-null float64 DE_solar_capacity 8784 non-null float64 DE_solar_generation_actual 8784 non-null float64 DE_solar_profile 8784 non-null float64 DE_wind_capacity 8784 non-null float64 DE_wind_generation_actual 8784 non-null float64 DE_wind_profile 8784 non-null float64 DE_wind_offshore_capacity 8784 non-null float64 DE_wind_offshore_generation_actual 8784 non-null float64 DE_wind_offshore_profile 8784 non-null float64 DE_wind_onshore_capacity 8784 non-null float64 DE_wind_onshore_generation_actual 8784 non-null float64 DE_wind_onshore_profile 8784 non-null float64 DE_50hertz_load_entsoe_transparency 8784 non-null float64 DE_50hertz_solar_generation_actual 8784 non-null float64 DE_50hertz_solar_generation_forecast 8784 non-null float64 DE_50hertz_wind_generation_actual 8784 non-null float64 DE_50hertz_wind_generation_forecast 8784 non-null float64 DE_50hertz_wind_offshore_generation_actual 8784 non-null float64 DE_50hertz_wind_offshore_generation_forecast 8784 non-null float64 DE_50hertz_wind_onshore_generation_actual 8784 non-null float64 DE_50hertz_wind_onshore_generation_forecast 8784 non-null float64 DE_AT_LU_load_entsoe_transparency 8784 non-null float64 DE_AT_LU_price_day_ahead 8784 non-null float64 DE_AT_LU_solar_generation_actual 8736 non-null float64 DE_AT_LU_wind_offshore_generation_actual 8733 non-null float64 DE_AT_LU_wind_onshore_generation_actual 8736 non-null float64 DE_amprion_load_entsoe_transparency 8784 non-null float64 DE_amprion_solar_generation_actual 8784 non-null float64 DE_amprion_solar_generation_forecast 8784 non-null float64 DE_amprion_wind_generation_actual 8784 non-null float64 DE_amprion_wind_generation_forecast 8784 non-null float64 DE_amprion_wind_onshore_generation_actual 8784 non-null float64 DE_tennet_load_entsoe_transparency 8783 non-null float64 DE_tennet_solar_generation_actual 8784 non-null float64 DE_tennet_solar_generation_forecast 8784 non-null float64 DE_tennet_wind_generation_actual 8784 non-null float64 DE_tennet_wind_generation_forecast 8784 non-null float64 DE_tennet_wind_offshore_generation_actual 8784 non-null float64 DE_tennet_wind_onshore_generation_actual 8784 non-null float64 DE_transnetbw_load_entsoe_transparency 8784 non-null float64 DE_transnetbw_solar_generation_actual 8784 non-null float64 DE_transnetbw_solar_generation_forecast 8784 non-null float64 DE_transnetbw_wind_generation_actual 8784 non-null float64 DE_transnetbw_wind_generation_forecast 8784 non-null float64 DE_transnetbw_wind_onshore_generation_actual 8784 non-null float64 dtypes: float64(48) memory usage: 3.3 MB
There 8784 entries, which correspond to the number of hours in a leap year like 2016.
Let's plot the time series for the wind and solar generation to get a better idea of the properties of the dataset.
# create plot
plt.plot(production.index, production['DE_wind_generation_actual'])
plt.title('Actual wind generation in Germany in MW')
plt.xlim(pd.Timestamp('2016-01-01'), pd.Timestamp('2017-01-01'))
plt.ylim(0, 35000)
# save plot
plt.savefig("figs/wind.png", dpi=200)
# create plot
plt.plot(production.index, production['DE_solar_generation_actual'], c='OrangeRed')
plt.title('Actual solar generation in Germany in MW')
plt.xlim(pd.Timestamp('2016-01-01'), pd.Timestamp('2017-01-01'))
plt.ylim(0, 27000)
# save plot
plt.savefig("figs/solar.png", dpi=200)
production_wind_solar = production[['DE_wind_generation_actual', 'DE_solar_generation_actual']]
Next, we read the weather data for Germany in 2016 by reading the full csv file.
The data in the file contains the following:
weather = pd.read_csv("data/weather_data_GER_2016.csv",
parse_dates=[0], index_col=0)
weather.head(3)
cumulated hours | lat | lon | v1 | v2 | v_50m | h1 | h2 | z0 | SWTDN | SWGDN | T | rho | p | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
timestamp | ||||||||||||||
2016-01-01 00:00:00 | 0 | 47.5 | 5.625 | 0.81 | 1.88 | 3.36 | 2 | 10 | 0.052526 | 0.0 | 0.0 | 277.350159 | 1.236413 | 99282.710938 |
2016-01-01 01:00:00 | 1 | 47.5 | 5.625 | 0.77 | 1.61 | 2.63 | 2 | 10 | 0.052510 | 0.0 | 0.0 | 277.025665 | 1.239390 | 99300.164062 |
2016-01-01 02:00:00 | 2 | 47.5 | 5.625 | 0.66 | 1.22 | 1.89 | 2 | 10 | 0.052495 | 0.0 | 0.0 | 277.223755 | 1.243861 | 99310.992188 |
weather.tail(3)
cumulated hours | lat | lon | v1 | v2 | v_50m | h1 | h2 | z0 | SWTDN | SWGDN | T | rho | p | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
timestamp | ||||||||||||||
2016-12-31 21:00:00 | 8781 | 55.0 | 15.0 | 10.49 | 12.65 | 14.38 | 2 | 10 | 0.001014 | 0.0 | 0.0 | 278.634644 | 1.270996 | 101653.398438 |
2016-12-31 22:00:00 | 8782 | 55.0 | 15.0 | 10.47 | 12.63 | 14.41 | 2 | 10 | 0.001009 | 0.0 | 0.0 | 278.678162 | 1.270252 | 101609.484375 |
2016-12-31 23:00:00 | 8783 | 55.0 | 15.0 | 10.39 | 12.53 | 14.33 | 2 | 10 | 0.000993 | 0.0 | 0.0 | 278.720184 | 1.269202 | 101545.859375 |
weather.info()
<class 'pandas.core.frame.DataFrame'> DatetimeIndex: 2248704 entries, 2016-01-01 00:00:00 to 2016-12-31 23:00:00 Data columns (total 14 columns): cumulated hours int64 lat float64 lon float64 v1 float64 v2 float64 v_50m float64 h1 int64 h2 int64 z0 float64 SWTDN float64 SWGDN float64 T float64 rho float64 p float64 dtypes: float64(11), int64(3) memory usage: 257.3 MB
Note that the data is given by chuncks of the German geographical territory, each determined by its latitute ('lat') and longitute ('lon'). Each chunck corresponds to the resolution of the MERRA-2 grid (which was used to obtain the data), which is 0.5º x 0.625º.
For example, the data for the time 2016-01-01 00:00:00 is the following:
weather.loc[weather.index == '2016-01-01 00:00:00', :]
cumulated hours | lat | lon | v1 | v2 | v_50m | h1 | h2 | z0 | SWTDN | SWGDN | T | rho | p | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
timestamp | ||||||||||||||
2016-01-01 | 0 | 47.5 | 5.625 | 0.81 | 1.88 | 3.36 | 2 | 10 | 0.052526 | 0.0 | 0.0 | 277.350159 | 1.236413 | 99282.710938 |
2016-01-01 | 0 | 47.5 | 6.250 | 0.85 | 1.77 | 3.69 | 2 | 10 | 0.163823 | 0.0 | 0.0 | 277.609924 | 1.226037 | 98194.710938 |
2016-01-01 | 0 | 47.5 | 6.875 | 0.94 | 1.94 | 3.79 | 3 | 11 | 0.286626 | 0.0 | 0.0 | 276.850159 | 1.205101 | 96186.710938 |
2016-01-01 | 0 | 47.5 | 7.500 | 0.84 | 1.68 | 2.82 | 2 | 10 | 0.109472 | 0.0 | 0.0 | 276.414612 | 1.200341 | 95874.710938 |
2016-01-01 | 0 | 47.5 | 8.125 | 0.74 | 1.56 | 2.51 | 2 | 10 | 0.066503 | 0.0 | 0.0 | 276.039612 | 1.202233 | 96050.710938 |
2016-01-01 | 0 | 47.5 | 8.750 | 0.76 | 1.49 | 2.37 | 2 | 10 | 0.051061 | 0.0 | 0.0 | 275.898987 | 1.201012 | 95842.710938 |
2016-01-01 | 0 | 47.5 | 9.375 | 0.75 | 1.27 | 1.83 | 2 | 10 | 0.082921 | 0.0 | 0.0 | 275.578674 | 1.181908 | 94178.710938 |
2016-01-01 | 0 | 47.5 | 10.000 | 0.88 | 1.59 | 2.32 | 4 | 12 | 0.398931 | 0.0 | 0.0 | 274.123596 | 1.146264 | 90998.710938 |
2016-01-01 | 0 | 47.5 | 10.625 | 0.81 | 1.38 | 1.93 | 6 | 14 | 0.813970 | 0.0 | 0.0 | 273.670471 | 1.117516 | 88518.710938 |
2016-01-01 | 0 | 47.5 | 11.250 | 0.56 | 0.86 | 1.09 | 10 | 18 | 1.666021 | 0.0 | 0.0 | 274.074768 | 1.118920 | 88486.710938 |
2016-01-01 | 0 | 47.5 | 11.875 | 0.33 | 0.47 | 0.59 | 11 | 19 | 1.786626 | 0.0 | 0.0 | 274.301331 | 1.123864 | 88834.710938 |
2016-01-01 | 0 | 47.5 | 12.500 | 0.58 | 0.98 | 1.32 | 11 | 19 | 1.786138 | 0.0 | 0.0 | 272.826721 | 1.114037 | 87786.710938 |
2016-01-01 | 0 | 47.5 | 13.125 | 0.26 | 0.90 | 1.44 | 10 | 18 | 1.750005 | 0.0 | 0.0 | 272.853088 | 1.122155 | 88246.710938 |
2016-01-01 | 0 | 47.5 | 13.750 | 0.08 | 0.97 | 2.32 | 10 | 18 | 1.767095 | 0.0 | 0.0 | 271.089905 | 1.115258 | 87398.710938 |
2016-01-01 | 0 | 47.5 | 14.375 | 0.06 | 1.10 | 3.07 | 10 | 18 | 1.743657 | 0.0 | 0.0 | 271.062561 | 1.127038 | 88282.710938 |
2016-01-01 | 0 | 47.5 | 15.000 | 0.12 | 1.39 | 3.48 | 10 | 18 | 1.716802 | 0.0 | 0.0 | 271.574768 | 1.149498 | 90066.710938 |
2016-01-01 | 0 | 48.0 | 5.625 | 1.90 | 2.90 | 4.12 | 2 | 10 | 0.052312 | 0.0 | 0.0 | 276.928284 | 1.234093 | 98314.710938 |
2016-01-01 | 0 | 48.0 | 6.250 | 0.65 | 1.67 | 3.36 | 4 | 12 | 0.424810 | 0.0 | 0.0 | 277.461487 | 1.224938 | 98010.710938 |
2016-01-01 | 0 | 48.0 | 6.875 | 0.38 | 1.68 | 3.16 | 7 | 15 | 1.153814 | 0.0 | 0.0 | 277.238831 | 1.210351 | 96674.710938 |
2016-01-01 | 0 | 48.0 | 7.500 | 0.88 | 1.79 | 2.89 | 3 | 11 | 0.238103 | 0.0 | 0.0 | 276.432190 | 1.224205 | 98114.710938 |
2016-01-01 | 0 | 48.0 | 8.125 | 0.31 | 1.19 | 2.32 | 9 | 17 | 1.438970 | 0.0 | 0.0 | 275.115784 | 1.185936 | 94314.710938 |
2016-01-01 | 0 | 48.0 | 8.750 | 0.74 | 1.49 | 3.19 | 2 | 10 | 0.100133 | 0.0 | 0.0 | 275.313049 | 1.188744 | 94274.710938 |
2016-01-01 | 0 | 48.0 | 9.375 | 0.48 | 0.73 | 1.07 | 2 | 10 | 0.067174 | 0.0 | 0.0 | 275.361877 | 1.204918 | 95594.710938 |
2016-01-01 | 0 | 48.0 | 10.000 | 0.61 | 1.11 | 1.69 | 2 | 10 | 0.054891 | 0.0 | 0.0 | 275.195862 | 1.198937 | 95202.710938 |
2016-01-01 | 0 | 48.0 | 10.625 | 0.63 | 1.06 | 1.56 | 2 | 10 | 0.056585 | 0.0 | 0.0 | 275.238831 | 1.192711 | 94738.710938 |
2016-01-01 | 0 | 48.0 | 11.250 | 0.63 | 1.06 | 1.52 | 7 | 15 | 1.058110 | 0.0 | 0.0 | 275.305237 | 1.197228 | 95130.710938 |
2016-01-01 | 0 | 48.0 | 11.875 | 0.55 | 1.00 | 1.50 | 7 | 15 | 1.027349 | 0.0 | 0.0 | 275.055237 | 1.200097 | 95234.710938 |
2016-01-01 | 0 | 48.0 | 12.500 | 0.67 | 1.28 | 1.97 | 4 | 12 | 0.526128 | 0.0 | 0.0 | 274.383362 | 1.204918 | 95554.710938 |
2016-01-01 | 0 | 48.0 | 13.125 | 0.67 | 1.55 | 2.73 | 5 | 13 | 0.620366 | 0.0 | 0.0 | 273.934143 | 1.210289 | 96058.710938 |
2016-01-01 | 0 | 48.0 | 13.750 | 0.33 | 1.30 | 3.03 | 8 | 16 | 1.231939 | 0.0 | 0.0 | 273.258362 | 1.203820 | 95066.710938 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2016-01-01 | 0 | 54.5 | 6.875 | 9.24 | 11.03 | 12.56 | 2 | 10 | 0.000708 | 0.0 | 0.0 | 281.645081 | 1.259362 | 101834.710938 |
2016-01-01 | 0 | 54.5 | 7.500 | 8.39 | 9.97 | 11.32 | 2 | 10 | 0.000560 | 0.0 | 0.0 | 281.047424 | 1.263146 | 101930.710938 |
2016-01-01 | 0 | 54.5 | 8.125 | 7.17 | 8.62 | 9.96 | 2 | 10 | 0.006656 | 0.0 | 0.0 | 279.953674 | 1.268517 | 102026.710938 |
2016-01-01 | 0 | 54.5 | 8.750 | 5.23 | 6.72 | 8.21 | 2 | 10 | 0.030370 | 0.0 | 0.0 | 278.254456 | 1.276574 | 102074.710938 |
2016-01-01 | 0 | 54.5 | 9.375 | 3.80 | 5.22 | 6.54 | 2 | 10 | 0.048879 | 0.0 | 0.0 | 276.445862 | 1.284508 | 102018.710938 |
2016-01-01 | 0 | 54.5 | 10.000 | 3.80 | 4.95 | 5.94 | 2 | 10 | 0.037725 | 0.0 | 0.0 | 276.432190 | 1.285363 | 102026.710938 |
2016-01-01 | 0 | 54.5 | 10.625 | 4.45 | 5.26 | 6.01 | 2 | 10 | 0.011167 | 0.0 | 0.0 | 277.459534 | 1.282677 | 102154.710938 |
2016-01-01 | 0 | 54.5 | 11.250 | 4.06 | 4.81 | 5.89 | 2 | 10 | 0.001355 | 0.0 | 0.0 | 277.416565 | 1.282067 | 102202.710938 |
2016-01-01 | 0 | 54.5 | 11.875 | 3.88 | 4.54 | 5.43 | 2 | 10 | 0.001585 | 0.0 | 0.0 | 276.705627 | 1.286705 | 102234.710938 |
2016-01-01 | 0 | 54.5 | 12.500 | 3.36 | 4.02 | 4.82 | 2 | 10 | 0.011331 | 0.0 | 0.0 | 275.990784 | 1.289879 | 102250.710938 |
2016-01-01 | 0 | 54.5 | 13.125 | 2.55 | 3.23 | 4.09 | 2 | 10 | 0.018911 | 0.0 | 0.0 | 275.254456 | 1.291955 | 102282.710938 |
2016-01-01 | 0 | 54.5 | 13.750 | 3.00 | 3.47 | 3.94 | 2 | 10 | 0.009179 | 0.0 | 0.0 | 275.674377 | 1.293175 | 102314.710938 |
2016-01-01 | 0 | 54.5 | 14.375 | 3.59 | 3.98 | 4.21 | 2 | 10 | 0.002818 | 0.0 | 0.0 | 275.875549 | 1.295250 | 102378.710938 |
2016-01-01 | 0 | 54.5 | 15.000 | 4.12 | 4.54 | 4.68 | 2 | 10 | 0.001652 | 0.0 | 0.0 | 275.430237 | 1.299767 | 102418.710938 |
2016-01-01 | 0 | 55.0 | 5.625 | 10.90 | 13.21 | 15.47 | 2 | 10 | 0.001072 | 0.0 | 0.0 | 282.254456 | 1.251915 | 101562.710938 |
2016-01-01 | 0 | 55.0 | 6.250 | 10.31 | 12.43 | 14.46 | 2 | 10 | 0.000929 | 0.0 | 0.0 | 282.164612 | 1.253869 | 101658.710938 |
2016-01-01 | 0 | 55.0 | 6.875 | 9.57 | 11.49 | 13.30 | 2 | 10 | 0.000770 | 0.0 | 0.0 | 281.973206 | 1.256188 | 101762.710938 |
2016-01-01 | 0 | 55.0 | 7.500 | 8.70 | 10.42 | 12.16 | 2 | 10 | 0.000620 | 0.0 | 0.0 | 281.586487 | 1.258629 | 101858.710938 |
2016-01-01 | 0 | 55.0 | 8.125 | 7.18 | 8.72 | 10.69 | 2 | 10 | 0.004010 | 0.0 | 0.0 | 280.070862 | 1.264611 | 101938.710938 |
2016-01-01 | 0 | 55.0 | 8.750 | 5.17 | 6.84 | 9.00 | 2 | 10 | 0.032689 | 0.0 | 0.0 | 278.238831 | 1.272057 | 101898.710938 |
2016-01-01 | 0 | 55.0 | 9.375 | 4.04 | 5.65 | 7.59 | 2 | 10 | 0.045995 | 0.0 | 0.0 | 277.008362 | 1.277794 | 101842.710938 |
2016-01-01 | 0 | 55.0 | 10.000 | 4.22 | 5.46 | 6.57 | 2 | 10 | 0.031728 | 0.0 | 0.0 | 277.092346 | 1.280602 | 101922.710938 |
2016-01-01 | 0 | 55.0 | 10.625 | 4.15 | 5.15 | 6.26 | 2 | 10 | 0.017499 | 0.0 | 0.0 | 277.609924 | 1.279137 | 102026.710938 |
2016-01-01 | 0 | 55.0 | 11.250 | 3.55 | 4.44 | 5.91 | 2 | 10 | 0.011171 | 0.0 | 0.0 | 277.205627 | 1.280236 | 102082.710938 |
2016-01-01 | 0 | 55.0 | 11.875 | 3.08 | 3.94 | 5.53 | 2 | 10 | 0.016507 | 0.0 | 0.0 | 276.850159 | 1.281945 | 102106.710938 |
2016-01-01 | 0 | 55.0 | 12.500 | 3.71 | 4.36 | 5.06 | 2 | 10 | 0.006501 | 0.0 | 0.0 | 277.781799 | 1.280602 | 102218.710938 |
2016-01-01 | 0 | 55.0 | 13.125 | 3.82 | 4.34 | 4.69 | 2 | 10 | 0.002311 | 0.0 | 0.0 | 277.668518 | 1.283288 | 102282.710938 |
2016-01-01 | 0 | 55.0 | 13.750 | 3.75 | 4.23 | 4.61 | 2 | 10 | 0.003382 | 0.0 | 0.0 | 276.957581 | 1.287438 | 102290.710938 |
2016-01-01 | 0 | 55.0 | 14.375 | 3.82 | 4.30 | 4.64 | 2 | 10 | 0.000446 | 0.0 | 0.0 | 276.529846 | 1.290246 | 102298.710938 |
2016-01-01 | 0 | 55.0 | 15.000 | 4.15 | 4.66 | 4.99 | 2 | 10 | 0.000250 | 0.0 | 0.0 | 276.201721 | 1.293541 | 102346.710938 |
256 rows × 14 columns
Therefore, for any given hour, we have data for 256 'chuncks' of Germany. Dividing the total number of entries by the number of 'chuncks' returns the expected number of hours of a leap year.
2248704/256
8784.0
Averaging over all the 'chuncks':
weather_by_day = weather.groupby(weather.index).mean()
weather_by_day.head(24)
cumulated hours | lat | lon | v1 | v2 | v_50m | h1 | h2 | z0 | SWTDN | SWGDN | T | rho | p | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
timestamp | ||||||||||||||
2016-01-01 00:00:00 | 0.0 | 51.25 | 10.3125 | 2.616758 | 3.714414 | 5.132500 | 2.546875 | 10.546875 | 0.164370 | 0.000000 | 0.000000 | 275.296747 | 1.253185 | 99281.007812 |
2016-01-01 01:00:00 | 1.0 | 51.25 | 10.3125 | 2.564375 | 3.631055 | 4.976094 | 2.546875 | 10.546875 | 0.164331 | 0.000000 | 0.000000 | 275.255473 | 1.253778 | 99288.492188 |
2016-01-01 02:00:00 | 2.0 | 51.25 | 10.3125 | 2.480859 | 3.516328 | 4.794688 | 2.546875 | 10.546875 | 0.164288 | 0.000000 | 0.000000 | 275.217089 | 1.254304 | 99296.398438 |
2016-01-01 03:00:00 | 3.0 | 51.25 | 10.3125 | 2.397148 | 3.404961 | 4.664023 | 2.546875 | 10.546875 | 0.164239 | 0.000000 | 0.000000 | 275.211715 | 1.254562 | 99306.695312 |
2016-01-01 04:00:00 | 4.0 | 51.25 | 10.3125 | 2.305703 | 3.286602 | 4.566289 | 2.546875 | 10.546875 | 0.164199 | 0.000000 | 0.000000 | 275.193369 | 1.254733 | 99319.750000 |
2016-01-01 05:00:00 | 5.0 | 51.25 | 10.3125 | 2.207227 | 3.160312 | 4.445078 | 2.546875 | 10.546875 | 0.164153 | 0.000000 | 0.000000 | 275.184228 | 1.254969 | 99336.710938 |
2016-01-01 06:00:00 | 6.0 | 51.25 | 10.3125 | 2.099336 | 3.027148 | 4.291758 | 2.546875 | 10.546875 | 0.164116 | 0.029503 | 0.002406 | 275.151314 | 1.255247 | 99357.078125 |
2016-01-01 07:00:00 | 7.0 | 51.25 | 10.3125 | 2.001992 | 2.903750 | 4.116836 | 2.542969 | 10.542969 | 0.164068 | 32.701265 | 4.885147 | 275.175961 | 1.255487 | 99383.765625 |
2016-01-01 08:00:00 | 8.0 | 51.25 | 10.3125 | 2.061836 | 2.915977 | 3.855273 | 2.542969 | 10.542969 | 0.164031 | 167.523132 | 34.330536 | 275.612767 | 1.255295 | 99415.406250 |
2016-01-01 09:00:00 | 9.0 | 51.25 | 10.3125 | 2.112539 | 2.905078 | 3.583516 | 2.542969 | 10.542969 | 0.163995 | 287.432617 | 78.543579 | 276.160692 | 1.254251 | 99436.265625 |
2016-01-01 10:00:00 | 10.0 | 51.25 | 10.3125 | 2.047969 | 2.776602 | 3.340352 | 2.542969 | 10.542969 | 0.163951 | 358.452148 | 116.500275 | 276.698255 | 1.252322 | 99428.968750 |
2016-01-01 11:00:00 | 11.0 | 51.25 | 10.3125 | 1.911523 | 2.582539 | 3.092773 | 2.542969 | 10.542969 | 0.163912 | 375.738281 | 134.623901 | 277.156940 | 1.250017 | 99399.437500 |
2016-01-01 12:00:00 | 12.0 | 51.25 | 10.3125 | 1.731445 | 2.361875 | 2.867461 | 2.542969 | 10.542969 | 0.163870 | 338.113037 | 124.632446 | 277.413811 | 1.248167 | 99369.289062 |
2016-01-01 13:00:00 | 13.0 | 51.25 | 10.3125 | 1.426953 | 2.038047 | 2.688203 | 2.542969 | 10.542969 | 0.163833 | 248.132080 | 87.318741 | 277.306795 | 1.247254 | 99346.148438 |
2016-01-01 14:00:00 | 14.0 | 51.25 | 10.3125 | 1.091406 | 1.761680 | 2.742617 | 2.542969 | 10.542969 | 0.163795 | 113.059875 | 33.667662 | 276.545491 | 1.247368 | 99328.453125 |
2016-01-01 15:00:00 | 15.0 | 51.25 | 10.3125 | 1.112891 | 1.874336 | 3.110781 | 2.542969 | 10.542969 | 0.163760 | 8.461687 | 1.855964 | 275.706389 | 1.248087 | 99311.734375 |
2016-01-01 16:00:00 | 16.0 | 51.25 | 10.3125 | 1.371055 | 2.213437 | 3.549258 | 2.542969 | 10.542969 | 0.163721 | 0.000000 | 0.000000 | 275.486617 | 1.249090 | 99289.906250 |
2016-01-01 17:00:00 | 17.0 | 51.25 | 10.3125 | 1.671875 | 2.600195 | 4.012930 | 2.542969 | 10.542969 | 0.163687 | 0.000000 | 0.000000 | 275.280497 | 1.250165 | 99255.312500 |
2016-01-01 18:00:00 | 18.0 | 51.25 | 10.3125 | 1.950234 | 2.967305 | 4.468164 | 2.542969 | 10.542969 | 0.163660 | 0.000000 | 0.000000 | 275.041161 | 1.251209 | 99210.710938 |
2016-01-01 19:00:00 | 19.0 | 51.25 | 10.3125 | 2.194336 | 3.288164 | 4.867109 | 2.542969 | 10.542969 | 0.163624 | 0.000000 | 0.000000 | 274.800643 | 1.252238 | 99158.867188 |
2016-01-01 20:00:00 | 20.0 | 51.25 | 10.3125 | 2.412812 | 3.566445 | 5.198477 | 2.542969 | 10.542969 | 0.163597 | 0.000000 | 0.000000 | 274.572911 | 1.253178 | 99098.531250 |
2016-01-01 21:00:00 | 21.0 | 51.25 | 10.3125 | 2.611875 | 3.809375 | 5.486484 | 2.542969 | 10.542969 | 0.163561 | 0.000000 | 0.000000 | 274.356602 | 1.253980 | 99038.382812 |
2016-01-01 22:00:00 | 22.0 | 51.25 | 10.3125 | 2.840469 | 4.101445 | 5.783359 | 2.542969 | 10.542969 | 0.163531 | 0.000000 | 0.000000 | 274.167320 | 1.254680 | 98987.726562 |
2016-01-01 23:00:00 | 23.0 | 51.25 | 10.3125 | 3.075664 | 4.406797 | 6.074883 | 2.542969 | 10.542969 | 0.163504 | 0.000000 | 0.000000 | 273.999309 | 1.255226 | 98935.304688 |
Let's plot the various weather-related quantities during 2016 to get a better idea of the properties of the dataset.
# create plot
plt.plot(weather_by_day.index, weather_by_day['v2'])
plt.title('Wind velocity 10 m above displacement height (m/s)')
plt.xlim(pd.Timestamp('2016-01-01'), pd.Timestamp('2017-01-01'))
plt.ylim(0, 12)
# save plot
plt.savefig("figs/v1.png", dpi=200)
# create plot
plt.plot(weather_by_day.index, weather_by_day['SWGDN'], c='OrangeRed')
plt.title('Ground horizontal radiation (W/m²)')
plt.xlim(pd.Timestamp('2016-01-01'), pd.Timestamp('2017-01-01'))
plt.ylim(0, 1000)
# save plot
plt.savefig("figs/radiation.png", dpi=200)
# create plot
plt.plot(weather_by_day.index, weather_by_day['T'], c='OrangeRed')
plt.title('Temperature (K)')
plt.xlim(pd.Timestamp('2016-01-01'), pd.Timestamp('2017-01-01'))
plt.ylim(263, 303)
# save plot
plt.savefig("figs/temperature_K.png", dpi=200)
Let's use the more common Celsius units for the temperature.
weather_by_day['T (C)'] = weather_by_day['T'] - 273.15
# create plot
plt.plot(weather_by_day.index, weather_by_day['T (C)'], c='OrangeRed')
plt.title('Temperature (ºC)')
plt.xlim(pd.Timestamp('2016-01-01'), pd.Timestamp('2017-01-01'))
plt.ylim(-10, 30)
# save plot
plt.savefig("figs/temperature_C.png", dpi=200)
# create plot
plt.plot(weather_by_day.index, weather_by_day['rho'])
plt.title('Air density at the surface (kg/m³)')
plt.xlim(pd.Timestamp('2016-01-01'), pd.Timestamp('2017-01-01'))
(735964.0, 736330.0)
# create plot
plt.plot(weather_by_day.index, weather_by_day['p'])
plt.title('Air pressure at the surface (Pa)')
plt.xlim(pd.Timestamp('2016-01-01'), pd.Timestamp('2017-01-01'))
(735964.0, 736330.0)
As suggested by the above plots from both datasets (and by common sense), there seems to be some correlation between the wind and solar generation and some of the measured weather quantities.
First, let's merge the two DataFrames into a single DataFrame.
# merge production_wind_solar and weather_by_day DataFrames
combined = pd.merge(production_wind_solar, weather_by_day, how='left', left_index=True, right_index=True)
# drop redundant 'T (C)' column
combined = combined.drop('T (C)', axis=1)
combined.head()
DE_wind_generation_actual | DE_solar_generation_actual | cumulated hours | lat | lon | v1 | v2 | v_50m | h1 | h2 | z0 | SWTDN | SWGDN | T | rho | p | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
utc_timestamp | ||||||||||||||||
2016-01-01 00:00:00 | 8579.0 | 0.0 | 0.0 | 51.25 | 10.3125 | 2.616758 | 3.714414 | 5.132500 | 2.546875 | 10.546875 | 0.164370 | 0.0 | 0.0 | 275.296747 | 1.253185 | 99281.007812 |
2016-01-01 01:00:00 | 8542.0 | 0.0 | 1.0 | 51.25 | 10.3125 | 2.564375 | 3.631055 | 4.976094 | 2.546875 | 10.546875 | 0.164331 | 0.0 | 0.0 | 275.255473 | 1.253778 | 99288.492188 |
2016-01-01 02:00:00 | 8443.0 | 0.0 | 2.0 | 51.25 | 10.3125 | 2.480859 | 3.516328 | 4.794688 | 2.546875 | 10.546875 | 0.164288 | 0.0 | 0.0 | 275.217089 | 1.254304 | 99296.398438 |
2016-01-01 03:00:00 | 8295.0 | 0.0 | 3.0 | 51.25 | 10.3125 | 2.397148 | 3.404961 | 4.664023 | 2.546875 | 10.546875 | 0.164239 | 0.0 | 0.0 | 275.211715 | 1.254562 | 99306.695312 |
2016-01-01 04:00:00 | 7319.0 | 0.0 | 4.0 | 51.25 | 10.3125 | 2.305703 | 3.286602 | 4.566289 | 2.546875 | 10.546875 | 0.164199 | 0.0 | 0.0 | 275.193369 | 1.254733 | 99319.750000 |
Further evidence for the apparent correlation between certain quantities can be obtained from the following plots, in which the wind and solar generation is shown as a function of the several weather quantities.
sns.pairplot(combined, x_vars=['v1', 'v2', 'v_50m', 'z0'], y_vars=['DE_wind_generation_actual'])
<seaborn.axisgrid.PairGrid at 0x160d415cd68>
sns.pairplot(combined, x_vars=['SWTDN', 'SWGDN', 'T', 'rho', 'p'], y_vars=['DE_wind_generation_actual'])
<seaborn.axisgrid.PairGrid at 0x160d33f75c0>
sns.pairplot(combined, x_vars=['v1', 'v2', 'v_50m', 'z0'], y_vars=['DE_solar_generation_actual'])
<seaborn.axisgrid.PairGrid at 0x160d35d15c0>
sns.pairplot(combined, x_vars=['SWTDN', 'SWGDN', 'T', 'rho', 'p'], y_vars=['DE_solar_generation_actual'])
<seaborn.axisgrid.PairGrid at 0x160d3561128>
There seems to be a linear relation between the wind generation and the wind velocities v1, v2 and v_50m, but not the other quantities.
sns.jointplot(x='v1', y='DE_wind_generation_actual', data=combined, kind='reg')
sns.jointplot(x='v2', y='DE_wind_generation_actual', data=combined, kind='reg')
sns.jointplot(x='v_50m', y='DE_wind_generation_actual', data=combined, kind='reg')
C:\ProgramData\Anaconda3\envs\renewable-energy-weather\lib\site-packages\matplotlib\axes\_axes.py:6462: UserWarning: The 'normed' kwarg is deprecated, and has been replaced by the 'density' kwarg. warnings.warn("The 'normed' kwarg is deprecated, and has been "
<seaborn.axisgrid.JointGrid at 0x160d33ffcc0>
Similarly, there seems to be a linear relation between the solar generation and the top-of-the-atmosphere and ground radiation.
sns.jointplot(x='SWTDN', y='DE_solar_generation_actual', data=combined, kind='reg')
sns.jointplot(x='SWGDN', y='DE_solar_generation_actual', data=combined, kind='reg')
C:\ProgramData\Anaconda3\envs\renewable-energy-weather\lib\site-packages\matplotlib\axes\_axes.py:6462: UserWarning: The 'normed' kwarg is deprecated, and has been replaced by the 'density' kwarg. warnings.warn("The 'normed' kwarg is deprecated, and has been "
<seaborn.axisgrid.JointGrid at 0x160d46c2a90>
Given the observations above, I’m going to try a linear regression algorithm in order to predict the wind and solar generation from some of the above weather quantities.
# import necessary modules
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
# instantiate LinearRegression
lr = LinearRegression()
To predict the wind generation, we construct the features matrix X_wind
with the features v1, v2 and v_50m, and the target Y_wind
with actual wind generation.
X_wind = combined[['v1', 'v2', 'v_50m', 'z0']]
y_wind = combined['DE_wind_generation_actual']
Then, we implement a 5-fold CV procedure.
scores_wind = cross_val_score(lr, X_wind, y_wind, cv=5)
print(scores_wind, "\naverage =", np.mean(scores_wind))
[0.83709573 0.89114316 0.81583076 0.87216318 0.85405042] average = 0.8540566509869567
The first line contains the five values of $R^2$ for each of the 5 folds in the cross validation procedure, whereas the second line is their average. We see that our linear model has an $R^2$ of approximately 0.87, which is quite good for such a simple model! We can make good predictions about the wind generation in Germany in 2016 given only the wind velocities at different heights.
To predict the solar generation, we follow a very similar procedure. We again construct the features matrix X_solar
, but now with the features SWTDN, SWGDN and T, and the target Y_solar
with actual solar generation.
X_solar = combined[['SWTDN', 'SWGDN', 'T']]
y_solar = combined['DE_solar_generation_actual']
Then, we implement a 5-fold CV procedure:
scores_solar = cross_val_score(lr, X_solar, y_solar, cv=5)
print(scores_solar, "\naverage =", np.mean(scores_solar))
[0.8901974 0.95027431 0.95982151 0.95090201 0.8715077 ] average = 0.9245405855731855
We get an even better value of $R^2$! We can make good predictions about the solar generation in Germany in 2016 given only the temperature and top-of-the-atmosphere and ground radiation.