This notebook pulls historical temperature data from the DWD server and formats it for future use in other projects. The data is delivered at an hourly frequency, as one .zip file per available weather station. To use the data, we need everything in a single .csv file with all stations side by side, and we also need the daily average.
To reduce computing time, we also crop all data earlier than 2007.
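The formatting itself happens further down the pipeline, but the core transformation is simple. Here is a minimal sketch of the daily averaging and the 2007 crop, assuming the hourly values have already been read into a pandas DataFrame with a DatetimeIndex (the column layout and the helper name daily_mean are illustrative, not the raw DWD file format):

import pandas as pd

def daily_mean(hourly: pd.DataFrame) -> pd.DataFrame:
    # Drop everything before 2007, then reduce hourly values to daily averages
    cropped = hourly.loc['2007-01-01':]
    return cropped.resample('D').mean()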
Files should be executed in the following pipeline:
Here we download all relevant files from the DWD server. The server is HTTP-based, so we scrape the download page for all links matching 'stundenwerte_TU_.*_hist.zip' and download them to the folder 'download'.
Link to the relevant DWD-page: https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/hourly/air_temperature/historical/
import requests
import re
from bs4 import BeautifulSoup
from pathlib import Path
# Set base values
download_folder = Path.cwd() / 'download'
download_folder.mkdir(exist_ok=True)  # make sure the target folder exists before writing into it
base_url = 'https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/hourly/air_temperature/historical/'
# Initiate session and get the index page
with requests.Session() as s:
    resp = s.get(base_url)

# Parse the index page for all relevant <a href>
soup = BeautifulSoup(resp.content, 'html.parser')  # explicit parser avoids a warning
links = soup.find_all("a", href=re.compile(r"stundenwerte_TU_.*_hist\.zip"))
# For testing, only download 10 files
file_max = 10
dl_count = 0
# Download the .zip files to the download_folder
for link in links:
    # Limit the downloads while testing; check before requesting so we
    # do not fetch one file more than file_max
    dl_count += 1
    if dl_count > file_max:
        break
    zip_response = requests.get(base_url + link['href'], stream=True)
    with open(download_folder / link['href'], 'wb') as file:
        for chunk in zip_response.iter_content(chunk_size=128):
            file.write(chunk)
print('Done')
Done
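As a quick sanity check, one of the fetched archives can be opened directly. The sketch below assumes the DWD convention that each station archive contains a semicolon-separated data file whose name starts with 'produkt'; if that assumption does not hold, inspect zf.namelist() first.

import zipfile
import pandas as pd

# Pick any downloaded archive and locate the data file inside it
archive = next(download_folder.glob('stundenwerte_TU_*_hist.zip'))
with zipfile.ZipFile(archive) as zf:
    data_name = next(n for n in zf.namelist() if n.startswith('produkt'))
    with zf.open(data_name) as f:
        df = pd.read_csv(f, sep=';')
print(df.head())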