National generation capacity: Processing notebook
This Notebook is part of the National Generation Capacity Datapackage of Open Power System Data. |
The script processes the compiled nationally aggregated generation capacity for European countries. Due to varying formats and data specifications of references for national generation capacity, the script focuses on rearranging the manually compiled data. Thus, the script itself does not collect, select, download or manage data from original sources.
import json
import logging
import os.path
import shutil
import sqlite3
import pandas as pd
import yaml # http://pyyaml.org/, pip install pyyaml, conda install pyyaml
#create output folder if it does not exist
os.makedirs('output', exist_ok=True)
os.makedirs(os.path.join('output', 'original_data'), exist_ok=True)
The manually compiled dataset is imported and rearranged to a DataFrame for further processing. The dataset comprises for each European country and specified generation technology different data entries, which are based on different sources. As these sources differ by country and year, information on the corresponding reference are directly given with the data entry.
data_file = 'National_Generation_Capacities.xlsx'
filepath = os.path.join('input', data_file)
# Read data into pandas
data_raw = pd.read_excel(filepath,
sheetname='Summary',
header=None,
na_values=['-'],
skiprows=0)
# Deal with merged cells from Excel: fill first three rows with information
data_raw.iloc[0:2] = data_raw.iloc[0:2].fillna(method='ffill', axis=1)
# Set index for rows
data_raw = data_raw.set_index([0])
data_raw.index.name = 'technology'
# Extract energylevels from raw data
energylevels = data_raw.ix[:, 0:5]
# Delete definition of energy levels from raw data
data_raw = data_raw.drop(data_raw.columns[[0, 1, 2, 3, 4]], axis=1)
# Set multiindex column names
data_raw.columns = pd.MultiIndex.from_arrays(data_raw[:6].values,
names=['country', 'type', 'year',
'source', 'source_type',
'capacity_definition'])
# Remove 3 rows which are already used as column names
data_raw = data_raw[pd.notnull(data_raw.index)]
data_raw
The initial raw data is given as a cross-table format. To ensure the compatibility of standard data formats, we convert the initial crosstab format of the input data to a list.
# Reshape dataframe to list
data = pd.DataFrame(data_raw.stack(level=['source', 'source_type', 'year',
'type', 'country',
'capacity_definition']))
# Reset index for dataframe
data = data.reset_index()
data['technology'] = data['technology'].str.replace('- ', '')
data = data.set_index('technology')
# Delete entries with missing source
data = data[data['source'].isnull() == False]
data = data[data['source'] != 0]
data = data.rename(columns={0: 'capacity'})
data['capacity'] = pd.to_numeric(data['capacity'], errors='coerce')
data.head()
Due to varying categorizations in the country-specific raw data, a revised categorization of energy sources and conversion technologies is applied with the aim to reflect the individual categorization of the different national references at the best. We specify the following four different energy source levels, which can be seen as general levels of the classification:
# Drop first rows
energylevels = energylevels[5:]
# Define multiindex for DataFrame
energylevels.columns = pd.MultiIndex.from_arrays(energylevels[:1].values,
names=['level'])
# Delete double rows due to multiindex defintion
energylevels = energylevels[pd.notnull(energylevels.index)]
# Reset index for Dataframe
energylevels = energylevels.reset_index()
energylevels['technology'] = energylevels['technology'].str.replace('- ', '')
energylevels = energylevels.set_index('technology')
# Rename column headings
energylevels.rename(columns={'Level 0': 'energy_source_level_0',
'Level 1': 'energy_source_level_1',
'Level 2': 'energy_source_level_2',
'Level 3': 'energy_source_level_3',
'Technology level': 'technology_level'},
inplace=True)
# Convert 0/1 assignment to boolean
energylevels = energylevels.replace({0: False, 1: True})
# Apply technology levels to data by merging both DataFrames
data = data.merge(energylevels,
left_index=True,
right_index=True,
how='left')
data = data.reset_index()
data.head()
We document the data packages meta data in the specific format JSON as proposed by the Open Knowledge Foundation. See the Frictionless Data project by OKFN (http://data.okfn.org/) and the Data Package specifications (http://dataprotocols.org/data-packages/) for more details.
In order to keep the notebook more readable, we first formulate the metadata in the human-readable YAML format using a multi-line string. We then parse the string into a Python dictionary and save that to disk as a JSON file.
# Here we define meta data of the resulting data package.
# The meta data follows the specification at:
# http://dataprotocols.org/data-packages/
metadata = """
name: opsd-national_generation_capacity
title: National generation capacity
description: Aggregated generation capacity by technology and country
long_description: This data package comprises technology-specific aggregated generation capacities for European countries. The generation capacities are consistently categorized based on fuel and technology. For each European country, various references are used ranging from international (e.g. ENTSOE or EUROSTAT) to national sources from e.g. regulatory authorities. The input data is processed in the script linked below.
version: "2016-10-27"
keywords:
- generation capacity
- conventional generation
- renewable generation
- Europe
- Open Power System Data
homepage: http://data.open-power-system-data.org/national_generation_capacity/
documentation: https://github.com/Open-Power-System-Data/national_generation_capacity/blob/2016-10-27/main.ipynb
geographical-scope: 28 European countries
last_changes: Revised technology classification, restructured input file format, adjusted input data to final version
resources:
- path: national_generation_capacity.xlsx
format: xlsx
mediatype: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
- path: national_generation_capacity_stacked.csv
format: csv
mediatype: text/csv
schema:
fields:
- name: id
description: ID for data entries
type: integer
- name: technology
description: Generation technology defined by fuel and conversion technology
type: string
opsd-contentfilter: "true"
- name: source
description: Source of data entry
type: string
opsd-contentfilter: "true"
- name: source_type
description: Type of data source
type: string
opsd-contentfilter: "true"
- name: year
description: Year of data entry
type: integer
format: YYYY
opsd-contentfilter: "true"
- name: type
description: Type of capacity (e.g. installed capacity)
type: string
- name: country
description: Country ISO code
type: string
opsd-contentfilter: "true"
- name: capacity_definition
description: Capacity definition used in the relevant source (net, gross, or unknown)
type: string
- name: capacity
description: Installed capacity in MW
type: float
- name: energy_source_level_0
description: Energy source level 0 (total aggregated capacity)
type: boolean
- name: energy_source_level_1
description: Energy source level 1 (aggregation or classification by type of fuel)
type: boolean
- name: energy_source_level_2
description: Energy source level 2 (aggregation or classification by fuel)
type: boolean
- name: energy_source_level_3
description: Energy source level 3 (aggregation or classification by fuel refined for bioenergy)
type: boolean
- name: technology_level
description: Technology (aggregation or classification by fuel and technology)
type: boolean
- path: national_generation_capacity_stacked.xlsx
format: xlsx
mediatype: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
schema:
fields:
- name: id
description: ID for data entries
type: integer
- name: technology
description: Generation technology defined by fuel and conversion technology
type: string
- name: source
description: Source of data entry
type: string
- name: source_type
description: Type of data source
type: string
- name: year
description: Year of data entry
type: integer
format: YYYY
- name: type
description: Type of capacity (e.g. installed capacity)
type: string
- name: country
description: Country ISO code
type: string
- name: capacity_definition
description: Capacity definition used in the relevant source (net, gross, or unknown)
type: string
- name: capacity
description: Installed capacity in MW
type: float
- name: energy_source_level_0
description: Energy source level 0 (total aggregated capacity)
type: boolean
- name: energy_source_level_1
description: Energy source level 1 (aggregation or classification by type of fuel)
type: boolean
- name: energy_source_level_2
description: Energy source level 2 (aggregation or classification by fuel)
type: boolean
- name: energy_source_level_3
description: Energy source level 3 (aggregation or classification by fuel refined for bioenergy)
type: boolean
- name: technology_level
description: Technology (aggregation or classification by fuel and technology)
type: boolean
sources:
- name: ENTSOE
web: https://www.entsoe.eu/publications/system-development-reports/adequacy-forecasts/Pages/default.aspx
- name: EUROSTAT
web: http://ec.europa.eu/energy/en/statistics/country
- name: e-control
web: http://www.e-control.at/statistik/strom/bestandsstatistik
- name: ELIA
web: http://www.elia.be/en/grid-data/power-generation/generating-facilities
- name: UN Statistical Office
web: http://data.un.org/Data.aspx?d=EDATA&f=cmID%3AEC
- name: BFE
web: http://www.bfe.admin.ch/themen/00526/00541/00542/00630/index.html?dossier_id=00765
- name: ERU
web: http://www.eru.cz/en/elektrina/statistika-a-sledovani-kvality/rocni-zpravy-o-provozu
- name: BMWi
web: http://www.bmwi.de/BMWi/Redaktion/Binaer/Energiedaten/energietraeger10-stromerzeugungskapazitaeten-bruttostromerzeugung,property=blob,bereich=bmwi2012,sprache=de,rwb=true.xls
- name: DEA
web: http://www.ens.dk/en/info/facts-figures/energy-statistics-indicators-energy-efficiency/annual-energy-statistics
- name: REE
web: http://www.ree.es/en/publications/statistical-data-of-spanish-electrical-system/national-indicators/
- name: RTE 2014
web: http://www.rte-france.com/en/document/overview-electrical-energy-france-march-2014
- name: RTE 2015
web: http://clients.rte-france.com/lang/an/visiteurs/vie/prod/parc_reference.jsp
- name: Terna 2013
web: http://download.terna.it/terna/0000/0216/17.XLSX
- name: Terna 2014
web: http://download.terna.it/terna/0000/0216/16.XLSX
- name: ILR
web: http://www.ilr.public.lu/electricite/statistiques/index.html
- name: Tennet NL
web: http://energieinfo.tennet.org/dataexport/exporteerdatacountry.aspx?id=InstalledCapacity
- name: CIRE
web: http://www.rynek-energii-elektrycznej.cire.pl/st,33,207,tr,75,0,0,0,0,0,podstawowe-dane.html
- name: TSO Bulgaria
web: http://www.tso.bg/uploads/file/Profile/en/ESO_Annual_Report_2012_en.pdf
- name: Statistics Estonia
web: http://pub.stat.ee/px-web.2001/Dialog/varval.asp?ma=FE032&ti=CAPACITY+AND+PRODUCTION+OF+POWER+PLANTS&path=../I_Databas/Economy/07Energy/02Energy_consumption_and_production/01Annual_statistics/&lang=1
- name: Statistics Finland
web: http://pxnet2.stat.fi/PXWeb/pxweb/en/StatFin/StatFin__ene__ehk/240_ehk_tau_112_en.px/table/tableViewLayout1/?rxid=31077c25-37e4-480e-81e6-49a66cbe4dc2
- name: Department of Energy & Climate Change UK
web: https://www.gov.uk/government/statistics/electricity-chapter-5-digest-of-united-kingdom-energy-statistics-dukes
- name: Regulatory Authority for Energy Greece
web: http://www.rae.gr/site/file/system/docs/ActionReports/national_2012
- name: Croation Transmission System operator (HOPS)
web: https://www.hops.hr/wps/wcm/connect/fbb3e297-dbfc-437a-bd36-458e02b9e7e4/Temeljni+podaci+2013.pdf?MOD=AJPERES
- name: Mavir 2014
web: http://www.mavir.hu/documents/10262/188569160/BT_terv_2014/9946a7a2-38ec-4794-9d7f-96a7a927d1b9
- name: Mavir 2013
web: http://www.mavir.hu/documents/10262/188569160/BT_terv_2013_11_12_EN/ea873e22-bf88-4ee4-8a00-db09030bbb34
- name: Eirgrid
web: http://www.soni.ltd.uk/media/documents/Operations/CapacityStatements/All%20Island%20Generation%20Capacity%20Statement%202015.%20-%202024..pdf
- name: Litgrid
web: http://www.litgrid.eu/index.php/power-system/power-system-information/generation-capacity/546
- name: Central Statistical Bureau of Latvia
web: http://data.csb.gov.lv/pxweb/en/vide/vide__ikgad__energetika/EN0130.px/table/tableViewLayout1/?rxid=a79839fe-11ba-4ecd-8cc3-4035692c5fc8
- name: Energy Ministry NO 2013
web: https://www.regjeringen.no/globalassets/upload/oed/faktaheftet/facts_energy_water.pdf
- name: Energy Ministry NO 2015
web: https://www.regjeringen.no/contentassets/fd89d9e2c39a4ac2b9c9a95bf156089a/facts_2015_energy_and_water_web.pdf
- name: REN
web: http://www.ren.pt/files/2015-05/2015-05-04145306_f7664ca7-3a1a-4b25-9f46-2056eef44c33$$72f445d4-8e31-416a-bd01-d7b980134d0f$$ee3c56e5-6d14-4aa0-ac1f-ca5006917e03$$storage_image$$pt$$1.pdf
- name: Anre
web: http://www.anre.ro/download.php?f=ga%2BCig%3D%3D&t=vdeyut7dlcecrLbbvbY%3D
- name: Svensk Energi
web: http://www.svenskenergi.se/Global/Statistik/El%C3%A5ret/El%C3%A5ret%202014_slututg%C3%A5va.pdf
- name: Agencija za energijo 2014
web: http://www.agen-rs.si/documents/10926/38704/Poro%C4%8Dilo/54b1b378-1e76-4d40-8e0d-c30339baa248
- name: Agencija za energijo 2013
web: http://www.agen-rs.si/documents/10926/0/Agencija-za-energijo---Energetika-SLO-za-2013-3.pdf/b63d191d-ecbc-4efe-8b91-1e0f80d3272b
- name: Statistical Office of Slovakia 2013
web: https://slovak.statistics.sk/PortalTraffic/fileServlet?Dokument=bcc9ac82-9eb4-4320-b460-1f5c726db355
- name: Statistical Office of Slovakia 2014
web: https://slovak.statistics.sk/PortalTraffic/fileServlet?Dokument=6d8bdb1f-528c-41b3-9564-0ff365c98bb8
contributors:
- name: Friedrich Kunz
email: fkunz@diw.de
web: http://open-power-system-data.org/
"""
metadata = yaml.load(metadata)
datapackage_json = json.dumps(metadata, indent=4, separators=(',', ': '))
output_path = 'output'
# Copy original source file to folder original_data
shutil.copy2(os.path.join('input', data_file),
os.path.join('output', 'original_data', data_file))
# Write the result to file
data.to_csv(os.path.join(output_path, 'national_generation_capacity_stacked.csv'),
encoding='utf-8', index_label='ID')
# Write the results to excel file
data.to_excel(os.path.join(output_path, 'national_generation_capacity_stacked.xlsx'),
sheet_name='output', index_label='ID')
shutil.copy2(os.path.join('input', data_file),
os.path.join('output', 'national_generation_capacity.xlsx'))
# Write the results to sql database
data.to_sql('national_generation_capacity_stacked',
sqlite3.connect(os.path.join(output_path,
'national_generation_capacity.sqlite')),
if_exists="replace",index_label='ID')
# Write the information of the metadata
with open(os.path.join(output_path, 'datapackage.json'), 'w') as f:
f.write(datapackage_json)