This notebook pulls historical temperature data from the DWD server and formats it for future use in other projects. The data is delivered in a hourly frequencs in a .zip file for each of the available weather stations. To use the data, we need everythin in a single .csv-file, all stations side-by-side. Also, we need the daily average.
To reduce computing time, we also crop all data earlier than 2007.
Files should be executed in the following pipeline:
We load in the data that has been saved in the last step, so we don't need to calculate everything again it we pause the project and come back later.
The data contains some errors, which need to be cleaned. You can see, by looking at the output of main_df.describe() in the last cell, that the minimum teperature on some stations is -999. That means that there is no plausible measurement for this particular hour. We change this to np.nan, so that we can safely calculate the avarage values.
Finally we resample the data to daily means.
import numpy as np
import pandas as pd
from pathlib import Path
# Import and export paths
pkl_file = Path.cwd() / "export_uncleaned" / "to_clean.pkl"
cleaned_file = Path.cwd() / "export_cleaned" / "cleaned.csv"
# Read in the pickle file from the last cell
cleaning_df = pd.read_pickle(pkl_file)
# Replace all values with "-999", which indicate missing data
cleaning_df.replace(to_replace=-999, value=np.nan, inplace=True)
# Resample to daily frequency
cleaning_df = cleaning_df.resample('D').mean().round(decimals=2)
# Save as .csv
cleaning_df.to_csv(cleaned_file, sep=";", decimal=",")
display(cleaning_df.loc['2011-12-31':'2012-01-04'])
display(cleaning_df.describe())
display(cleaning_df)
TT_TU | |||||||||
---|---|---|---|---|---|---|---|---|---|
STATIONS_ID | 3 | 44 | 71 | 73 | 78 | 91 | 96 | 102 | 125 |
MESS_DATUM | |||||||||
2011-12-31 | NaN | 3.88 | 2.76 | 1.19 | 4.30 | 2.43 | NaN | 3.80 | NaN |
2012-01-01 | NaN | 10.90 | 8.14 | 4.03 | 10.96 | 10.27 | NaN | 9.01 | NaN |
2012-01-02 | NaN | 7.41 | 6.18 | 4.77 | 7.57 | 7.77 | NaN | 6.48 | 4.66 |
2012-01-03 | NaN | 6.14 | 3.61 | 4.46 | 6.38 | 5.28 | NaN | 5.63 | 3.51 |
2012-01-04 | NaN | 5.80 | 2.48 | 4.45 | 5.46 | 4.57 | NaN | 5.85 | 1.94 |
TT_TU | |||||||||
---|---|---|---|---|---|---|---|---|---|
STATIONS_ID | 3 | 44 | 71 | 73 | 78 | 91 | 96 | 102 | 125 |
count | 1551.000000 | 4629.000000 | 3683.000000 | 4652.000000 | 4748.000000 | 4748.000000 | 267.000000 | 4490.000000 | 3935.000000 |
mean | 10.103939 | 10.088153 | 8.411244 | 9.686855 | 9.872342 | 9.208837 | 13.193633 | 10.220345 | 8.466612 |
std | 6.742460 | 6.653983 | 7.511708 | 7.849776 | 6.658399 | 7.124324 | 6.762327 | 6.076649 | 7.711229 |
min | -10.870000 | -10.710000 | -14.940000 | -14.320000 | -12.390000 | -15.710000 | -0.970000 | -8.170000 | -16.420000 |
25% | 5.410000 | 5.250000 | 2.620000 | 3.397500 | 5.090000 | 3.870000 | 7.575000 | 5.790000 | 2.365000 |
50% | 10.140000 | 10.320000 | 8.570000 | 9.900000 | 9.900000 | 9.230000 | 13.770000 | 10.200000 | 8.540000 |
75% | 15.350000 | 15.380000 | 14.070000 | 16.080000 | 15.122500 | 14.820000 | 18.195000 | 15.260000 | 14.545000 |
max | 28.410000 | 28.450000 | 27.190000 | 26.940000 | 29.890000 | 27.550000 | 26.980000 | 27.330000 | 28.030000 |
TT_TU | |||||||||
---|---|---|---|---|---|---|---|---|---|
STATIONS_ID | 3 | 44 | 71 | 73 | 78 | 91 | 96 | 102 | 125 |
MESS_DATUM | |||||||||
2007-01-01 | 7.38 | NaN | NaN | NaN | 7.42 | 6.55 | NaN | 8.32 | NaN |
2007-01-02 | 4.67 | NaN | NaN | NaN | 4.49 | 2.88 | NaN | 6.73 | 0.51 |
2007-01-03 | 6.19 | NaN | NaN | NaN | 4.87 | 4.25 | NaN | 7.12 | 0.91 |
2007-01-04 | 7.69 | NaN | NaN | NaN | 7.82 | 5.85 | NaN | 8.34 | 4.43 |
2007-01-05 | 7.78 | NaN | NaN | NaN | 7.47 | 6.03 | NaN | 8.20 | 3.92 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2019-12-27 | NaN | 2.03 | 3.95 | 2.27 | 2.36 | 1.41 | 2.21 | 3.79 | 2.78 |
2019-12-28 | NaN | 0.38 | -0.59 | -0.27 | -0.07 | -2.10 | -0.05 | 2.32 | -1.29 |
2019-12-29 | NaN | 0.68 | -2.04 | -3.63 | 0.07 | -2.41 | -0.97 | 2.81 | -4.40 |
2019-12-30 | NaN | 5.92 | 1.88 | -2.46 | 5.57 | -1.26 | 3.78 | 5.97 | -1.32 |
2019-12-31 | NaN | 5.54 | 1.92 | -0.41 | 4.05 | -0.46 | 5.56 | 7.66 | 1.91 |
4748 rows × 9 columns