Time series: Processing Notebook

This Notebook is part of the Time series Data Package of Open Power System Data.

# 1. About Open Power System Data

This notebook is part of the project Open Power System Data. Open Power System Data develops a platform for free and open data for electricity system modeling. We collect, check, process, document, and provide data that are publicly available but currently inconvenient to use. More info on Open Power System Data:

# 2. About Jupyter Notebooks and GitHub

This file is a Jupyter Notebook. A Jupyter Notebook is a file that combines executable programming code with visualizations and comments in markdown format, allowing for intuitive documentation of the code. We use Jupyter Notebooks for combined coding and documentation, with Python 3 as the programming language. All Notebooks are stored on GitHub, a platform for software development, and are publicly available. More information on our IT concept can be found here. See also our step-by-step manual on how to use the data platform.

We provide data in different chunks, or data packages. The one you are looking at right now, Time series, contains various kinds of time series data:

• Electricity consumption (load): forecast and actual values
• Wind and solar power: generation forecast, actual generation, installed capacity, capacity factors (profiles)

The resolution in which the data are published depends on the "market time unit" applied in the respective jurisdiction as well as on the type of data. For most data types, the following mapping applies:

• 15 minutes: Austria, Belgium, Germany, Hungary, Luxembourg, Netherlands
• 30 minutes: Cyprus, Ireland, United Kingdom
• 60 minutes: All other European countries

For data that are originally available in 15 or 30 minute resolution, hourly averages are included in the 60 minute dataset. The original-resolution data are provided in a separate file. The time series become available at different points in time depending on the source. The full dataset is only available from 2015 onwards.
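The hourly averaging described above can be sketched with pandas as follows. This is a minimal illustration, not necessarily the method used in this notebook: the series name `load_15min` and all values are hypothetical.

```python
import pandas as pd

# Hypothetical quarter-hourly load values in MW (illustrative numbers only)
index = pd.date_range("2015-01-01 00:00", periods=8, freq="15min", tz="UTC")
load_15min = pd.Series(
    [40.0, 42.0, 44.0, 46.0, 50.0, 50.0, 52.0, 48.0],
    index=index,
    name="load",
)

# Average the four quarter-hourly values that fall within each hour
load_60min = load_15min.resample("60min").mean()

print(load_60min)
```

Each hourly value is the mean of the quarter-hourly values whose timestamps fall within that hour. The same call would apply to 30 minute data, where two values are averaged per hour.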

There are two sources for load data: ENTSO-E Power Statistics (PS) and the ENTSO-E Transparency Platform (TP). Both report "total load", which is defined as follows:

$$\text{total load} = \text{total generation} - \text{auxiliary/self-consumption in power plants} + \text{imports} - \text{exports} - \text{consumption by storages}$$
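To make the sign of each term explicit, the definition can be written as a small function. This is only a sketch: the function name, parameter names and the input values (in MW) are hypothetical and not taken from either ENTSO-E source.

```python
def total_load(generation, auxiliary_consumption, imports, exports,
               storage_consumption):
    """Total load following the definition above (all inputs in the same unit)."""
    return (generation
            - auxiliary_consumption   # auxiliary/self-consumption in power plants
            + imports
            - exports
            - storage_consumption)    # consumption by storages, e.g. pumping


# Hypothetical values in MW for a single hour (illustrative numbers only)
example = total_load(
    generation=60000.0,
    auxiliary_consumption=3000.0,
    imports=5000.0,
    exports=8000.0,
    storage_consumption=1000.0,
)
print(example)
```

With these illustrative inputs, generation is reduced by self-consumption, net exports and storage consumption, giving a total load below total generation.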

The two sources differ: values on the PS (~500 TWh annually in Germany) are usually slightly higher than on the TP (~490 TWh). The reason probably lies in different reporting deadlines: values on the TP have to be reported "no later than one hour after the end of the operating period", whereas PS data are published with a delay of up to 3 months, which might allow for more accurate metering. For a comparison of the two sources, see Hirth et al. (2018).

For some countries, the PS report a "representativity factor" (91% in Germany until 2014, 97% since then), indicating that the reported values would have to be scaled up by this factor, resulting in ~520 TWh annually in Germany.
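As a rough check of this arithmetic, the sketch below assumes that upscaling means dividing the reported value by the representativity factor. That reading is our assumption and is not stated explicitly by the source. Variable names are illustrative, and the inputs are the rounded figures quoted above.

```python
# Rounded figures quoted in the text above
reported_twh = 500.0        # annual total load for Germany as reported on the PS
representativity = 0.97     # representativity factor for Germany since 2014

# Assumption: the reported value covers only this share of the actual load,
# so the full value is obtained by dividing by the factor
upscaled_twh = reported_twh / representativity

print(round(upscaled_twh, 1))
```

Under this assumption the result is a little over 515 TWh, which is of the same order as the ~520 TWh mentioned above given the rounding of the inputs.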

Schuhmacher & Hirth (2015) compare the German hourly total load values to monthly and yearly aggregate consumption statistics for Germany. They find considerable differences, part of which may be explained by the fact that none of the ENTSO-E data cover industrial auto-generation that is not transported over the transmission grid.

# 4. Data sources

The main data sources are the various European Transmission System Operators (TSOs), the ENTSO-E Power Statistics and the ENTSO-E Transparency Platform. A complete list of data sources is provided on the datapackage information website. They are also contained in the JSON file that contains all metadata.

# 5. Naming conventions

The table headers specify each data column according to 3 categories: region, variable and attribute. region specifies the geographical scope according to the ISO 3166 codes. variable distinguishes consumption, generation and prices. attribute gives further properties of the data that are specific to the respective variable. See the table below for the set of possible combinations.
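One way to work with such a three-level header in pandas is a column MultiIndex with the levels region, variable and attribute. The sketch below is illustrative only: the attribute names are taken from the notation table, while the choice of regions, the values and the variable names are hypothetical, and the actual data files may encode the header differently.

```python
import pandas as pd

# Three-level column header: (region, variable, attribute)
columns = pd.MultiIndex.from_tuples(
    [
        ("AT", "load", "actual_entsoe_transparency"),
        ("DE", "load", "actual_entsoe_power_statistics"),
        ("DE", "load", "actual_entsoe_transparency"),
    ],
    names=["region", "variable", "attribute"],
)

# Hypothetical hourly values in GW (illustrative numbers only)
index = pd.date_range("2015-01-01", periods=2, freq="60min", tz="UTC")
df = pd.DataFrame(
    [[7.0, 50.0, 49.0], [7.5, 51.0, 50.0]],
    index=index,
    columns=columns,
)

# All load columns for Germany, keyed by the remaining level (attribute)
de_load = df.xs(("DE", "load"), axis=1, level=["region", "variable"])

# One attribute across all regions
transparency = df.xs("actual_entsoe_transparency", axis=1, level="attribute")

print(de_load.columns.tolist())
print(transparency.columns.tolist())
```

Keeping the three categories as separate index levels lets a single selection cut across regions or across sources without parsing column names.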

In [1]:
import pandas as pd
pd.read_csv('input/notation.csv', index_col=list(range(4)))

Out[1]:
| region | variable | attribute | Explanation |
|---|---|---|---|
| ISO 3166 area code and name or control area or bidding zone | load | actual_entsoe_power_statistics | Total load as published on ENTSO-E Data Portal |
| | | actual_entsoe_transparency | Total load as published on ENTSO-E Data Portal/Power Statistics |
| | | actual_net_consumption_tso | Total load excluding transmission losses as published by the TSO |
| | | actual_gross_generation_tso | Total power generation from national TSO |