This notebook pulls historical temperature data from the DWD server and formats it for use in other projects. The data is delivered at an hourly frequency, as one .zip file per available weather station. To use the data, we need everything in a single .csv file with all stations side by side. We also need the daily average.
To reduce computing time, we also drop all data recorded before 2007.
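The target format described above (daily averages, nothing before 2007, all stations side by side) can be sketched with the standard library alone. This is a minimal sketch, not the code of the later notebooks. It assumes the DWD `produkt*` layout: semicolon-separated, a `MESS_DATUM` column in `YYYYMMDDHH` form, the hourly temperature in `TT_TU`, and `-999` marking missing values. The function names `daily_means` and `merge_stations` are hypothetical.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

MISSING = -999.0   # assumed DWD missing-value marker
FIRST_YEAR = 2007  # drop everything recorded before this year


def daily_means(lines):
    """Average the hourly TT_TU values per day for one station file.

    `lines` is any iterable of text lines from a produkt file (assumed
    header: STATIONS_ID;MESS_DATUM;QN_9;TT_TU;RF_TU;eor).
    Returns a dict mapping 'YYYY-MM-DD' to the daily mean temperature.
    """
    per_day = defaultdict(list)
    for row in csv.DictReader(lines, delimiter=';'):
        # Fields in the raw files may be padded with spaces
        row = {k.strip(): (v or '').strip() for k, v in row.items() if k is not None}
        stamp = row['MESS_DATUM']          # e.g. '2007010100'
        if int(stamp[:4]) < FIRST_YEAR:
            continue                        # crop data before 2007
        value = float(row['TT_TU'])
        if value == MISSING:
            continue                        # skip missing measurements
        day = f'{stamp[:4]}-{stamp[4:6]}-{stamp[6:8]}'
        per_day[day].append(value)
    return {day: mean(values) for day, values in per_day.items()}


def merge_stations(station_means):
    """Put stations side by side: one row per day, one column per station.

    `station_means` maps a station id to the dict returned by daily_means.
    Days without data for a station are left empty.
    """
    stations = sorted(station_means)
    days = sorted({day for means in station_means.values() for day in means})
    out = io.StringIO()
    writer = csv.writer(out, lineterminator='\n')
    writer.writerow(['date', *stations])
    for day in days:
        row = [day]
        for station in stations:
            value = station_means[station].get(day)
            row.append('' if value is None else f'{value:.2f}')
        writer.writerow(row)
    return out.getvalue()


# Usage with a tiny hand-written sample (not real DWD data)
sample = [
    'STATIONS_ID;MESS_DATUM;QN_9;TT_TU;RF_TU;eor',
    '        44;2006123123;    3;   2.0;  90.0;eor',
    '        44;2007010100;    3;   1.0;  90.0;eor',
    '        44;2007010101;    3;   3.0;  90.0;eor',
    '        44;2007010102;    3;-999.0;  90.0;eor',
]
print(merge_stations({'44': daily_means(sample)}))
```

Averaging per calendar day in this way keeps one value per station and day, so the merged table stays small even though the hourly input is several gigabytes.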
The notebooks should be run in the following order:
In this next step, we extract one file, the 'produkt' data file, from each downloaded .zip file and save it to the 'import' folder. Beware: this produces a lot of data (~6 GB of .csv files).
from pathlib import Path
import re
from zipfile import ZipFile
# Folder definitions
download_folder = Path.cwd() / 'download'
import_folder = Path.cwd() / 'import'
# Find all hourly temperature .zip files in the download folder
unzip_files = sorted(download_folder.glob('stundenwerte_TU_*_hist.zip'))
# Set the name pattern of the file we need
regex_name = re.compile(r'produkt.*')
# Open each archive, look for the file that matches the regex pattern, extract it to 'import'
for file in unzip_files:
    with ZipFile(file, 'r') as zipObj:
        list_of_filenames = zipObj.namelist()
        matches = [name for name in list_of_filenames if regex_name.match(name)]
        if not matches:
            # Skip archives without a data file instead of failing with an IndexError
            print(f'No data file found in {file.name}, skipping')
            continue
        zipObj.extract(matches[0], import_folder)
display('Done')