Prepared by Mattia Girardi (EMOS).
This notebook aims at reproducing the illustrations in the Statistics Explained article on comparative price levels of consumer goods and services. In particular, the goal of this article is to carry out an analysis of the price levels for consumer goods and services in the European Union (EU).
Data are based on Price Level Indices (PLIs), which provide a comparison of countries' price levels relative to the EU average and are calculated using Purchasing Power Parities.
In this work we will use three main packages: eurostat, pandas and plotly.
# Importing the packages that are used across the project
import os
import eurostat
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.io as pio
Interactive representations are generated using the plotly.io.write_html method in the rest of the notebook.
These plots will be saved in a dedicated directory:
_DOMAIN_ = "economy/comparative-price-consumer-goods-services"
_SAVDIR_ = "../../docs/%s" % _DOMAIN_
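Writing an HTML file does not create missing folders, so saving a figure is likely to fail if the target directory is not there yet. The following is a minimal sketch of how the directory could be prepared beforehand; the helper name ensure_dir is hypothetical, and a temporary folder is used here only so that the example does not touch the project tree.

```python
import os
import tempfile


def ensure_dir(path):
    """Create `path` (and any missing parent folders) if it does not exist yet."""
    os.makedirs(path, exist_ok=True)
    return path


# Demonstration in a temporary location (an assumption made for this example).
base = tempfile.mkdtemp()
domain = "economy/comparative-price-consumer-goods-services"
save_dir = ensure_dir(os.path.join(base, "docs", domain))

# Path that a later call to write_html could use.
figure_path = os.path.join(save_dir, "figure1.html")
print(figure_path)
```

In the notebook itself, the same idea would amount to calling os.makedirs(_SAVDIR_, exist_ok=True) once before the first figure is saved.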
In this section, the countries analyzed are the 27 EU Member States, the United Kingdom, three EFTA countries (Iceland, Norway and Switzerland), five candidate countries (Albania, Montenegro, North Macedonia, Serbia and Turkey) and one potential candidate country (Bosnia and Herzegovina), as well as the United States and Japan (to have an extra-EU comparison).
In the following code, data are retrieved using the Python package eurostat, which collects bulk data from the online page (and not from the API).
First of all, we create a variable that stores all the available datasets (in dataframe format); this will help us in picking up the required dataset.
# We retrieve and save all the available datasets in a dataframe
toc_df = eurostat.get_toc_df()
In toc_df, metadata are stored for each dataset listed, not the data. To extract the data, a code is required, which is contained in the second column of the toc_df dataframe.
So, we select the dataset that we need for this section, whose title contains "Comparative price levels". To pick up this information, we use the function eurostat.subset_toc_df, which filters the table of contents for a specific string, here "comparative price".
# Filtering for the dataset we search for developing the project
toc_df_subset = eurostat.subset_toc_df(toc_df, 'comparative price')
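To give an idea of what this kind of keyword filter does, here is a small pandas sketch on a made-up table of contents. It does not show the internals of the eurostat package: the column names title and code, the placeholder codes and the helper filter_titles are all assumptions made for illustration, as is the case-insensitive matching.

```python
import pandas as pd

# Hypothetical miniature table of contents (placeholder codes, not real dataset codes).
toy_toc = pd.DataFrame({
    "title": ["Comparative price levels", "Purchasing power parities", "Unemployment rate"],
    "code": ["code_a", "code_b", "code_c"],
})


def filter_titles(toc, keyword):
    """Keep the rows whose title contains `keyword`, ignoring upper/lower case."""
    mask = toc["title"].str.contains(keyword, case=False, regex=False)
    return toc.loc[mask]


subset = filter_titles(toy_toc, "comparative price")
print(subset)
```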
Now, we can get the code required for extracting data by printing toc_df_subset.
print(toc_df_subset)
The required code is reported in the second column; therefore, by passing it to the function eurostat.get_data_df we can extract the bulk data:
# Retrieving the dataset and saving it in a variable
df = eurostat.get_data_df('tec00120')
print(df.head())
df1 = df.drop(index = [10,15,17,44])
print(df1.head())
In order to have a clear understanding of the data we will use, we print out the dictionary for the data category: data are codified through alpha-numeric format and categorized according to their scope. The dictionary explains the meaning of each category code.
This procedure will be particularly useful for the sections in which we will deal with multiple categories.
# Saving a dictionary for understanding categories
dic = eurostat.get_dic('ppp_cat')
print(dic)
We want to reproduce the first image in the article.
We are going to draw a plot which uses different colours for the countries' categories, that is, their status with respect to the EU. To do this, we use a for loop:
# Writing the for loop
area = []
for i in df1['geo\\time']:
    if i in ('JP', 'US', 'TR', 'MK', 'RS', 'ME', 'AL', 'BA'):
        area.append('Etxra-EU')
    elif i == 'UK':
        area.append('Former EU')
    elif i == 'EA19' or i == 'EU27_2020':
        area.append('Aggregate EU')
    else:
        area.append('EU')
# Inserting the list to the dataset
df1['area'] = area
# check the dataset
print(df1.head())
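The same classification can also be written as a small function, which is easier to test on its own. This is only a sketch: the function name classify_area and the two sets are introduced here for illustration, while the geo codes and labels are copied from the loop above.

```python
# Geo codes copied from the loop above; the grouping is the notebook's own.
EXTRA_EU = {"JP", "US", "TR", "MK", "RS", "ME", "AL", "BA"}
AGGREGATES = {"EA19", "EU27_2020"}


def classify_area(geo_code):
    """Return the area label used in the notebook for one geo code."""
    if geo_code in EXTRA_EU:
        # Spelling kept exactly as in the notebook, since later cells compare against it.
        return "Etxra-EU"
    if geo_code == "UK":
        return "Former EU"
    if geo_code in AGGREGATES:
        return "Aggregate EU"
    return "EU"


sample_codes = ["EU27_2020", "IT", "UK", "JP"]
sample_areas = [classify_area(code) for code in sample_codes]
print(sample_areas)
```

With such a function, the column could be filled in one step with df1['geo\\time'].map(classify_area).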
Now, we can represent the barplot.
fig1 = px.bar(df1, x = 'geo\\time', y = 2019, labels = dict(area = 'Area'),
color = "area", color_discrete_sequence = ["orange", "blue","crimson", "steelblue"], opacity = 0.8)
fig1.update_xaxes(tickangle = 45)
fig1.update_layout(title = 'Price Level Index for Final Household Consumption Expenditure 2019',
xaxis = {'categoryorder':'array', 'categoryarray':['EU27_2020','EA19', 'AT', 'BE', 'BG', 'CH','CY','CZ', 'DE',
'DK', 'EE', 'EL', 'ES',
'FI', 'FR',
'HR', 'HU', 'IE', 'IS', 'IT', 'LT',
'LU', 'LV', 'MT', 'NL','NO', 'PL',
'PT', 'RO', 'SE', 'SI', 'SK','UK','AL','BA','ME','MK','RS','TR','US','JP']},
showlegend = False)
fig1.layout.template = 'plotly_white'
fig1.update_xaxes(title="Countries")
fig1.show()
pio.write_html(fig1, file=os.path.join(_SAVDIR_,'figure1.html'), auto_open=True)
We now create a graph that also includes the data for 2018, so that the two years can be compared; it is similar to the previous one.
This time we use a for loop to assign a colour to each area; in this way, we can make a barplot with pre-defined colours for each area.
# Comparative price level for 2018 & 2019
colors = []
for i in df1['area']:
    if i == 'Etxra-EU':
        colors.append('orange')
    elif i == 'Former EU':
        colors.append('steelblue')
    elif i == 'Aggregate EU':
        colors.append('crimson')
    else:
        colors.append('blue')
df2 = df1
df2['color'] = colors
fig2 = go.Figure()
fig2.add_trace(go.Bar(x = df2['geo\\time'], y = df2[2018],
name = '2018', marker_color = df2['color']))
fig2.add_trace(go.Bar(x = df2['geo\\time'], y = df2[2019],
name = '2019', marker_color = df2['color'], opacity = 0.6))
fig2.update_xaxes(tickangle = 45)
fig2.layout.template = 'plotly_white'
fig2.update_layout(
title = 'Price Level Index for Final Household Consumption Expenditure for 2019 and 2018\
<br>(the opaque colors are for 2018)',
xaxis = {'categoryorder':'array', 'categoryarray':['EU27_2020', 'EA19', 'AT', 'BE', 'BG', 'CH', 'CY', 'CZ', 'DE',
'DK', 'EE', 'EL', 'ES',
'FI', 'FR',
'HR', 'HU', 'IE', 'IS', 'IT', 'LT',
'LU', 'LV', 'MT', 'NL', 'NO', 'PL',
'PT', 'RO', 'SE', 'SI', 'SK', 'UK', 'AL', 'BA', 'ME', 'MK', 'RS', 'TR', 'US', 'JP']},
    yaxis = dict(
        title = dict(text = 'Values (EU-27 = 100)', font = dict(size = 15)),
        tickfont_size = 14,
    ), showlegend = True
)
fig2.update_xaxes(title="Countries")
fig2.show()
pio.write_html(fig2, file=os.path.join(_SAVDIR_,'figure2.html'), auto_open=True)
Now, we want to represent the data for HCFE in increasing order, through a plot similar to Figure 1.
Instead of reporting values at the top of the bars, we will plot an interactive graph.
In the following code we define one colour for all of the countries, then we assign a different colour to "EA19". After that, we just have to create the plot, defining the ascending order.
# Overall price level
# Defining the colors
colors2 = ['purple'] * 38
colors2[10] = 'orange'
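# Remove the rows that will not be useful for our purposes
df2 = df1.drop(index = [16,25,43])
# Plotting the graph (color_discrete_map = 'identity' makes plotly use the colour names in colors2 as they are)
fig3 = px.bar(df2, x = 'geo\\time', y = 2019, labels = dict(countries = 'Countries'), color = colors2,
              color_discrete_map = 'identity', opacity = 0.8)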
fig3.update_xaxes(tickangle = 45)
fig3.layout.template = 'plotly_white'
fig3.update_layout(title ='Price Level Index for Final Household Consumption Expenditure 2019 (EU-27=100)',
xaxis ={'categoryorder':'total ascending'},
showlegend = False)
fig3.update_xaxes(title="Countries")
fig3.update_yaxes(title="Values")
fig3.show()
pio.write_html(fig3, file=os.path.join(_SAVDIR_,'figure3.html'), auto_open=True)
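Choosing the highlighted bar by its position in the list only works as long as the row order does not change. As a hedged alternative, the colour list can be built from the geo codes themselves; the helper colors_by_label and the short list of codes below are hypothetical and stand in for the geo column of the dataset.

```python
def colors_by_label(geo_codes, highlight_code="EA19",
                    base_color="purple", highlight_color="orange"):
    """Return one colour per geo code, using `highlight_color` for `highlight_code`."""
    return [highlight_color if code == highlight_code else base_color
            for code in geo_codes]


# Hypothetical short list of geo codes, for illustration only.
sample_codes = ["AT", "BE", "EA19", "IT"]
sample_colors = colors_by_label(sample_codes)
print(sample_colors)
```

The resulting list always has the same length as the input, so it stays aligned with the bars even if rows are added or dropped.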
We retrieve data using the eurostat.get_data_df method, which extracts data by code (prc_ppp_ind), and store it in a variable.
We then select the data referring to the Price Level Indices (PLIs) for the 27 EU Member States (updated in 2020), filtering with the .loc method. We use the same command to keep only the countries listed at the beginning of the article.
# Retrieving data for tables' plots
table_1 = eurostat.get_data_df('prc_ppp_ind')
# filtering data by index
tab3_2 = table_1.loc[table_1['na_item'] == "PLI_EU27_2020"]
# filtering data by countries
countries_list = ['EA19', 'AT', 'BE', 'BG', 'CH', 'CY', 'CZ', 'DE', 'DK', 'EE', 'EL', 'ES',
                  'FI', 'FR', 'HR', 'HU', 'IE', 'IS', 'IT', 'LT', 'LU', 'LV', 'MT', 'NL',
                  'NO', 'PL', 'PT', 'RO', 'SE', 'SI', 'SK', 'UK', 'AL', 'BA', 'ME', 'MK',
                  'RS', 'TR']
tab3_3 = tab3_2.loc[tab3_2['geo\\time'].isin(countries_list)]
We represent Table 1. To do this, we filter data for the goods'/services' categories (using the codes provided in the dictionary shown above). The codes are:
"A0101", indicating Food and non-alcoholic beverages;
"A0102", indicating Alcoholic beverages, tobacco and narcotics;
"A010301", indicating Clothing;
"A010302", indicating Footwear.
Then, with a for loop, we assign the string that allows interpreting each category code.
# Filtering for categories
final1 = tab3_3.loc[(tab3_3['ppp_cat'] == "A0101") | (tab3_3['ppp_cat'] == "A0102") |
(tab3_3['ppp_cat'] == "A010301") | (tab3_3['ppp_cat'] == "A010302")].copy()
# Explicating categories' codes
category1 = []
for i in final1['ppp_cat']:
    if i == 'A0101':
        category1.append('Food and non-alcoholic beverages')
    elif i == 'A0102':
        category1.append('Alcoholic beverages, tobacco and narcotics')
    elif i == 'A010301':
        category1.append('Clothing')
    else:
        category1.append('Footwear')
final1['ppp_cat_expl1'] = category1
print(final1.head())
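The translation from category codes to readable labels can also be sketched with a dictionary and the pandas Series.map method. The small dataframe below is made up and only mimics the ppp_cat column; the names CATEGORY_LABELS and ppp_cat_label are introduced for this example.

```python
import pandas as pd

# Codes and labels as listed in the text above.
CATEGORY_LABELS = {
    "A0101": "Food and non-alcoholic beverages",
    "A0102": "Alcoholic beverages, tobacco and narcotics",
    "A010301": "Clothing",
    "A010302": "Footwear",
}

# Hypothetical miniature frame standing in for the filtered dataset.
toy = pd.DataFrame({"ppp_cat": ["A0101", "A010302", "A0102"],
                    "geo": ["IT", "IT", "FR"]})

# Series.map looks every code up in the dictionary; unknown codes would become NaN.
toy["ppp_cat_label"] = toy["ppp_cat"].map(CATEGORY_LABELS)
print(toy)
```

Unlike the final else branch of a loop, a dictionary lookup does not silently label an unexpected code as 'Footwear'.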
After editing the dataset, we adjust it for the five subplots (one for each category previously mentioned and one for HCFE).
First of all, we subset the main dataset by category.
Secondly, we make the subplot for HCFE (filtering for its code, "E011"). This subplot will be used across all three graphs in this section. We use the technique already described for filtering and explicating the category code. Besides, we assign the bar colour to a variable. Furthermore, since we want to highlight the highest and lowest values, we reset the index of the dataset, using .reset_index. We then find the maximum and minimum values with the .max and .min functions and, from their results, we locate the corresponding index with final1hcfe.index[final1hcfe[2019] == maxhcfe], converting the result to a list by appending .tolist() to the command. The first element of this list is an integer that is saved in a variable and used to assign a specific colour to that bar.
# Defining the dataset for each subplot, by filtering the category
s1final1=final1.loc[(final1['ppp_cat'] == "A0101")].copy()
s2final1=final1.loc[(final1['ppp_cat'] == "A0102")].copy()
s3final1=final1.loc[(final1['ppp_cat'] == "A010301")].copy()
s4final1=final1.loc[(final1['ppp_cat'] == "A010302")].copy()
## SUBPLOT for HCFE
# Making a subset (.copy() so that a new column can be added safely)
final1hcfe = tab3_3.loc[(tab3_3['ppp_cat'] == "E011")].copy()
# Explicating the category code for HCFE
hcfe = []
for i in final1hcfe['ppp_cat']:
    if i == 'E011':
        hcfe.append('HCFE')
final1hcfe['hcfe'] = hcfe
# Defining the color for the bars
colorsub_hcfe = ['royalblue'] * 38
# Resetting indexes to easily find the country corresponding to the max and min value, selecting that index for
# making its bar differently-coloured
final1hcfe = final1hcfe.reset_index(drop=True)
# Identifying the highest and lowest values in the subset
maxhcfe = final1hcfe[2019].max()
minhcfe = final1hcfe[2019].min()
# Identifying the highest and lowest values in the subset by index and storing the results in variables
max_fhcfe = final1hcfe.index[final1hcfe[2019] == maxhcfe].tolist()
max_fhcfe = max_fhcfe[0]
min_fhcfe = final1hcfe.index[final1hcfe[2019] == minhcfe].tolist()
min_fhcfe = min_fhcfe[0]
# Assigning different color to max and min values (a plain colour name, not a list)
colorsub_hcfe[max_fhcfe] = 'midnightblue'
colorsub_hcfe[min_fhcfe] = 'midnightblue'
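Since the same steps are repeated for every subplot, they could be collected in one helper. The sketch below is one possible way of doing it, not the notebook's own code: the name highlight_extremes is hypothetical, the sample values are made up, and the function assumes a non-empty sequence without missing values.

```python
def highlight_extremes(values, base_color, highlight_color="midnightblue"):
    """Return one colour per value, marking the largest and the smallest value.

    Assumes `values` is non-empty and contains no missing entries.
    """
    values = list(values)
    colors = [base_color] * len(values)
    # list.index returns the first position at which the extreme value occurs.
    colors[values.index(max(values))] = highlight_color
    colors[values.index(min(values))] = highlight_color
    return colors


# Made-up index values, for illustration only.
sample_values = [98.0, 141.3, 55.2, 100.0]
sample_colors = highlight_extremes(sample_values, "royalblue")
print(sample_colors)
```

Because the list is sized from the data, there is no need to hard-code the number of countries; a call such as highlight_extremes(final1hcfe[2019], 'royalblue') would give a list that can be passed to marker_color.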
We use the same method for the other categories' subplots:
# Resetting indexes to find the country corresponding to the max and min value, selecting that index for
# making its bar differently-coloured
s1final1 = s1final1.reset_index(drop=True)
s2final1 = s2final1.reset_index(drop=True)
s3final1 = s3final1.reset_index(drop=True)
s4final1 = s4final1.reset_index(drop=True)
## SUBPLOT1
# Defining the color
colorsub_s1final1 = ['slategrey'] * 38
# Defining the highest value
max_s1final1 = s1final1[2019].max()
f_s1final1 = s1final1.index[s1final1[2019] == max_s1final1].tolist()
f_s1final1 = f_s1final1[0]
# Defining the lowest value
min_s1final1 = s1final1[2019].min()
f_s1final1_min = s1final1.index[s1final1[2019] == min_s1final1].tolist()
f_s1final1_min = f_s1final1_min[0]
# Defining the colors for max and min values
colorsub_s1final1[f_s1final1] = 'midnightblue'
colorsub_s1final1[f_s1final1_min] = 'midnightblue'
## SUBPLOT2
# Defining the color
colorsub_s2final1 = ['dodgerblue'] * 38
# Defining the highest value
max_s2final1 = s2final1[2019].max()
f_s2final1 = s2final1.index[s2final1[2019] == max_s2final1].tolist()
f_s2final1 = f_s2final1[0]
# Defining the lowest value
min_s2final1 = s2final1[2019].min()
f_s2final1_min = s2final1.index[s2final1[2019] == min_s2final1].tolist()
f_s2final1_min = f_s2final1_min[0]
# Defining the colors for max and min values
colorsub_s2final1[f_s2final1] = 'midnightblue'
colorsub_s2final1[f_s2final1_min] = 'midnightblue'
## SUBPLOT3
# Defining the color
colorsub_s3final1 = ['lightsteelblue'] * 38
# Defining the highest value
max_s3final1 = s3final1[2019].max()
f_s3final1 = s3final1.index[s3final1[2019] == max_s3final1].tolist()
f_s3final1 = f_s3final1[0]
# Defining the lowest value
min_s3final1 = s3final1[2019].min()
f_s3final1_min = s3final1.index[s3final1[2019] == min_s3final1].tolist()
f_s3final1_min = f_s3final1_min[0]
# Defining the colors for max and min values
colorsub_s3final1[f_s3final1] = 'midnightblue'
colorsub_s3final1[f_s3final1_min] = 'midnightblue'
## SUBPLOT4
# Defining the color
colorsub_s4final1 = ['steelblue'] * 38
# Defining the highest value
max_s4final1 = s4final1[2019].max()
f_s4final1 = s4final1.index[s4final1[2019] == max_s4final1].tolist()
f_s4final1 = f_s4final1[0]
# Defining the lowest value
min_s4final1 = s4final1[2019].min()
f_s4final1_min = s4final1.index[s4final1[2019] == min_s4final1].tolist()
f_s4final1_min = f_s4final1_min[0]
# Defining the colors for max and min values
colorsub_s4final1[f_s4final1] = 'midnightblue'
colorsub_s4final1[f_s4final1_min] = 'midnightblue'
Finally, we plot the subplots:
# Plotting
fig4 = make_subplots(rows=5, cols=1,
x_title='Countries',
                     y_title='Values (EU-27 = 100)', shared_xaxes=True, shared_yaxes=True)
fig4.add_bar(x = final1hcfe['geo\\time'], y = final1hcfe[2019], name='HCFE', marker_color = colorsub_hcfe, row = 1, col = 1)
fig4.add_bar(x = s1final1['geo\\time'], y = s1final1[2019], name='Food', marker_color = colorsub_s1final1, row = 2, col = 1)
fig4.add_bar(x = s2final1['geo\\time'], y = s2final1[2019], name='Beverages', marker_color = colorsub_s2final1, row = 3, col = 1)
fig4.add_bar(x = s3final1['geo\\time'], y = s3final1[2019], name='Clothing', marker_color = colorsub_s3final1, row = 4, col = 1)
fig4.add_bar(x = s4final1['geo\\time'], y = s4final1[2019], name='Footwear', marker_color = colorsub_s4final1, row = 5, col = 1)
fig4.update_xaxes(tickangle = 45)
fig4.layout.template = 'plotly_white'
fig4.update_layout(
title = "Price levels for food, beverages, tobacco, clothing and footwear 2019\
<br> (darker bars represent the highest and lowest values)", showlegend = True
)
fig4.show()
pio.write_html(fig4, file=os.path.join(_SAVDIR_,'figure4.html'), auto_open=True)
We represent Table 2. The methodology is the same used in the previous section. There will not be any code referring to HCFE since everything was previously defined. Here, the codes analysed refer to the consumption of:
"A010405", Electricity;
"A010501", Furniture;
"A010503", Household appliances;
"A050102", Electronics.
# Filtering for categories
final2 = tab3_3.loc[(tab3_3['ppp_cat'] == "A010405") | (tab3_3['ppp_cat'] == "A010501") |
(tab3_3['ppp_cat'] == "A010503") | (tab3_3['ppp_cat']== "A050102")].copy()
# Explicating categories'codes
category2 = []
for i in final2['ppp_cat']:
    if i == 'A010405':
        category2.append('Electricity')
    elif i == 'A010501':
        category2.append('Furniture')
    elif i == 'A010503':
        category2.append('Households appliances')
    else:
        category2.append('Consumer electronics')
final2['ppp_cat_expl'] = category2
print(final2.head())
# Defining the dataset for each subplot, by filtering the category
s1final2=final2.loc[(final2['ppp_cat'] == "A010405")]
s2final2=final2.loc[(final2['ppp_cat'] == "A010501")]
s3final2=final2.loc[(final2['ppp_cat'] == "A010503")]
s4final2=final2.loc[(final2['ppp_cat'] == "A050102")]
# Resetting indexes to find the country corresponding to the max and min value and selecting that index for
# coloring its bar differently
s1final2 = s1final2.reset_index(drop=True)
s2final2 = s2final2.reset_index(drop=True)
s3final2 = s3final2.reset_index(drop=True)
s4final2 = s4final2.reset_index(drop=True)
## SUBPLOT1
# Defining the color
colorsub_s1final2 = ['slategrey'] * 38
# Defining the highest value
max_s1final2 = s1final2[2019].max()
f_s1final2 = s1final2.index[s1final2[2019] == max_s1final2].tolist()
f_s1final2 = f_s1final2[0]
# Defining the lowest value
min_s1final2 = s1final2[2019].min()
f_s1final2_min = s1final2.index[s1final2[2019] == min_s1final2].tolist()
f_s1final2_min = f_s1final2_min[0]
# Defining the colors for max and min values
colorsub_s1final2[f_s1final2] = 'midnightblue'
colorsub_s1final2[f_s1final2_min] = 'midnightblue'
## SUBPLOT2
# Defining the color
colorsub_s2final2 = ['dodgerblue'] * 38
# Defining the highest value
max_s2final2 = s2final2[2019].max()
f_s2final2 = s2final2.index[s2final2[2019] == max_s2final2].tolist()
f_s2final2 = f_s2final2[0]
# Defining the lowest value
min_s2final2 = s2final2[2019].min()
f_s2final2_min = s2final2.index[s2final2[2019] == min_s2final2].tolist()
f_s2final2_min = f_s2final2_min[0]
# Defining the colors for max and min values
colorsub_s2final2[f_s2final2] = 'midnightblue'
colorsub_s2final2[f_s2final2_min] = 'midnightblue'
## SUBPLOT3
# Defining the color
colorsub_s3final2 = ['lightsteelblue'] * 38
# Defining the highest value
max_s3final2 = s3final2[2019].max()
f_s3final2 = s3final2.index[s3final2[2019] == max_s3final2].tolist()
f_s3final2 = f_s3final2[0]
# Defining the lowest value
min_s3final2 = s3final2[2019].min()
f_s3final2_min = s3final2.index[s3final2[2019] == min_s3final2].tolist()
f_s3final2_min = f_s3final2_min[0]
# Defining the colors for max and min values
colorsub_s3final2[f_s3final2] = 'midnightblue'
colorsub_s3final2[f_s3final2_min] = 'midnightblue'
## SUBPLOT4
# Defining the color
colorsub_s4final2 = ['steelblue'] * 38
# Defining the highest value
max_s4final2 = s4final2[2019].max()
f_s4final2 = s4final2.index[s4final2[2019] == max_s4final2].tolist()
f_s4final2 = f_s4final2[0]
# Defining the lowest value
min_s4final2 = s4final2[2019].min()
f_s4final2_min = s4final2.index[s4final2[2019] == min_s4final2].tolist()
f_s4final2_min = f_s4final2_min[0]
# Defining the colors for max and min values
colorsub_s4final2[f_s4final2] = 'midnightblue'
colorsub_s4final2[f_s4final2_min] = 'midnightblue'
# Plotting
fig5 = make_subplots(rows=5, cols=1,
x_title='Countries',
                     y_title='Values (EU-27 = 100)', shared_xaxes=True, shared_yaxes=True)
fig5.add_bar(x = final1hcfe['geo\\time'], y = final1hcfe[2019], name='HCFE', marker_color = colorsub_hcfe, row = 1, col = 1)
fig5.add_bar(x = s1final2['geo\\time'], y = s1final2[2019], name='Electricity', marker_color = colorsub_s1final2, row = 2, col = 1)
fig5.add_bar(x = s2final2['geo\\time'], y = s2final2[2019], name='Furniture', marker_color = colorsub_s2final2, row = 3, col = 1)
fig5.add_bar(x = s3final2['geo\\time'], y = s3final2[2019], name='Households appliances', marker_color = colorsub_s3final2, row = 4, col = 1)
fig5.add_bar(x = s4final2['geo\\time'], y = s4final2[2019], name='Consumer electronics', marker_color = colorsub_s4final2, row = 5, col = 1)
fig5.update_xaxes(tickangle = 45)
fig5.layout.template = 'plotly_white'
fig5.update_layout(
title = "Price levels for energy, furniture, household appliances and consumer electronics 2019\
<br> (Midnightblue bars represent the highest and lowest values)", showlegend = True
)
fig5.show()
pio.write_html(fig5, file=os.path.join(_SAVDIR_,'figure5.html'), auto_open=True)
We represent Table 3. The methodology is the same used in the previous sections. Again, there will not be any code referring to HCFE since everything was previously defined. Here, the codes analysed refer to the consumption of:
"A010701", Personal Transport Equipment;
"A010703", Transport Services;
"A0108", Communication;
"A0111", Hotels & Restaurants.
# Filtering for categories
final3 = tab3_3.loc[(tab3_3['ppp_cat'] == "A010701") | (tab3_3['ppp_cat'] == "A010703") |
(tab3_3['ppp_cat'] == "A0108") | (tab3_3['ppp_cat'] == "A0111")].copy()
# Explicating categories' codes
category3=[]
for i in final3['ppp_cat']:
    if i == 'A010701':
        category3.append('Personal Transport Equipment')
    elif i == 'A010703':
        category3.append('Transport Services')
    elif i == 'A0108':
        category3.append('Communication')
    else:
        category3.append('Hotels & Restaurants')
final3['ppp_cat_expl3'] = category3
print(final3.head())
# Defining the dataset for each subplot, by filtering the category
s1final3=final3.loc[(final3['ppp_cat'] == "A010701")]
s2final3=final3.loc[(final3['ppp_cat'] == "A010703")]
s3final3=final3.loc[(final3['ppp_cat'] == "A0108")]
s4final3=final3.loc[(final3['ppp_cat'] == "A0111")]
# Resetting indexes to find the country corresponding to the max and min value and selecting that index for
# coloring its bar differently
s1final3 = s1final3.reset_index(drop=True)
s2final3 = s2final3.reset_index(drop=True)
s3final3 = s3final3.reset_index(drop=True)
s4final3 = s4final3.reset_index(drop=True)
## SUBPLOT1
# Defining the color
colorsub_s1final3 = ['slategrey'] * 38
# Defining the highest value
max_s1final3 = s1final3[2019].max()
f_s1final3 = s1final3.index[s1final3[2019] == max_s1final3].tolist()
f_s1final3 = f_s1final3[0]
# Defining the lowest value
min_s1final3 = s1final3[2019].min()
f_s1final3_min = s1final3.index[s1final3[2019] == min_s1final3].tolist()
f_s1final3_min = f_s1final3_min[0]
# Defining the colors for max and min values
colorsub_s1final3[f_s1final3] = 'midnightblue'
colorsub_s1final3[f_s1final3_min] = 'midnightblue'
## SUBPLOT2
# Defining the color
colorsub_s2final3 = ['dodgerblue'] * 38
# Defining the highest value
max_s2final3 = s2final3[2019].max()
f_s2final3 = s2final3.index[s2final3[2019] == max_s2final3].tolist()
f_s2final3 = f_s2final3[0]
# Defining the lowest value
min_s2final3 = s2final3[2019].min()
f_s2final3_min = s2final3.index[s2final3[2019] == min_s2final3].tolist()
f_s2final3_min = f_s2final3_min[0]
# Defining the colors for max and min values
colorsub_s2final3[f_s2final3] = 'midnightblue'
colorsub_s2final3[f_s2final3_min] = 'midnightblue'
## SUBPLOT3
# Defining the color
colorsub_s3final3 = ['lightsteelblue'] * 38
# Defining the highest value
max_s3final3 = s3final3[2019].max()
f_s3final3 = s3final3.index[s3final3[2019] == max_s3final3].tolist()
f_s3final3 = f_s3final3[0]
# Defining the lowest value
min_s3final3 = s3final3[2019].min()
f_s3final3_min = s3final3.index[s3final3[2019] == min_s3final3].tolist()
f_s3final3_min = f_s3final3_min[0]
# Defining the colors for max and min values
colorsub_s3final3[f_s3final3] = 'midnightblue'
colorsub_s3final3[f_s3final3_min] = 'midnightblue'
## SUBPLOT4
# Defining the color
colorsub_s4final3 = ['steelblue'] * 38
# Defining the highest value
max_s4final3 = s4final3[2019].max()
f_s4final3 = s4final3.index[s4final3[2019] == max_s4final3].tolist()
f_s4final3 = f_s4final3[0]
# Defining the lowest value
min_s4final3 = s4final3[2019].min()
f_s4final3_min = s4final3.index[s4final3[2019] == min_s4final3].tolist()
f_s4final3_min = f_s4final3_min[0]
# Defining the colors for max and min values
colorsub_s4final3[f_s4final3] = 'midnightblue'
colorsub_s4final3[f_s4final3_min] = 'midnightblue'
# Plotting
fig6 = make_subplots(rows=5, cols=1,
x_title='Countries',
                     y_title='Values (EU-27 = 100)', shared_xaxes=True, shared_yaxes=True)
fig6.add_bar(x = final1hcfe['geo\\time'], y = final1hcfe[2019], name='HCFE', marker_color = colorsub_hcfe, row = 1, col = 1)
fig6.add_bar(x = s1final3['geo\\time'], y = s1final3[2019], name='Personal Transport Equipment', marker_color = colorsub_s1final3, row = 2, col = 1)
fig6.add_bar(x = s2final3['geo\\time'], y = s2final3[2019], name='Transport Services', marker_color = colorsub_s2final3, row = 3, col = 1)
fig6.add_bar(x = s3final3['geo\\time'], y = s3final3[2019], name='Communication', marker_color = colorsub_s3final3, row = 4, col = 1)
fig6.add_bar(x = s4final3['geo\\time'], y = s4final3[2019], name='Hotels & Restaurants', marker_color = colorsub_s4final3, row = 5, col = 1)
fig6.update_xaxes(tickangle = 45)
fig6.layout.template = 'plotly_white'
fig6.update_layout(
title = "Price levels for personal transport equipment, transport services,\
<br>communication, restaurants and hotels 2019 (darker bars represent the highest and lowest values)", showlegend = True
)
fig6.show()
pio.write_html(fig6, file=os.path.join(_SAVDIR_,'figure6.html'), auto_open=True)
In this section we want to reproduce the graph of Figure 2.
This graph depicts the Coefficients of Variation (CV) of the PLI for total HCFE over time. A decreasing CV signals price convergence; an increasing CV signals divergence.
However, instead of displaying the values of 'All 27', we will use the value for the group formed by the five candidate countries (Albania, Montenegro, North Macedonia, Serbia and Turkey) and one potential candidate country (CPC1), due to data availability.
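As a reminder of what the indicator measures, the coefficient of variation is the standard deviation of a set of values divided by their mean, here expressed as a percentage. The sketch below computes it for two made-up sets of price level indices; using the population standard deviation is an assumption of this example, and the numbers are not Eurostat data.

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation divided by the mean, as a percentage.

    The population standard deviation is an assumption of this sketch.
    """
    mean = statistics.fmean(values)
    return statistics.pstdev(values) / mean * 100


# Made-up price level indices for two hypothetical years.
plis_earlier = [60.0, 80.0, 100.0, 120.0, 140.0]
plis_later = [80.0, 90.0, 100.0, 110.0, 120.0]

cv_earlier = coefficient_of_variation(plis_earlier)
cv_later = coefficient_of_variation(plis_later)
print(round(cv_earlier, 1), round(cv_later, 1))
```

In this toy case the later values are closer to their mean, so the CV is smaller, which is the pattern read as price convergence.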
The script for extracting data is similar to the previous sections. So, we get data through eurostat.get_data_df, selecting prc_ppp_conv; we pick the values referring to the Coefficient of Variation for PLI with "CV_PLI", and we filter the data for HCFE with "E011".
After that, we select the country groups used here.
# Retrieving data
eu_conv = eurostat.get_data_df('prc_ppp_conv')
eu_conv_2 = eu_conv.loc[eu_conv['statinfo'] == "CV_PLI"]
final_conv = eu_conv_2.loc[eu_conv_2['ppp_cat'] == "E011"]
final_conv2 = final_conv.loc[(final_conv['geo\\time'] == "EU15") | (final_conv['geo\\time'] == 'EA19') |
(final_conv['geo\\time'] == 'EU27_2020') | (final_conv['geo\\time'] == 'CPC1')]
print(final_conv2.head())
Since we want to replicate the line chart of the article, we transpose the dataset with .T. This operation will allow us to use years as x-axis ticks:
final_conv2_2=final_conv2[[2019, 2018, 2017, 2016, 2015, 2014, 2013, 2012, 2011, 2010, 2009]]
final_conv2_t=final_conv2_2.T
print(final_conv2_t.head())
Before making the chart, we manipulate the dataset: we rename the variables (which were named with the index values), we reset the index and we add a column holding the years as a variable:
final_conv2_t.rename(columns = {20: 'EA19', 21:'EU15', 24:'EU27', 13:'CPC1'}, inplace=True)
final_conv2_t2 = final_conv2_t.reset_index(drop=True)
final_conv2_t2['Years'] = [2019, 2018, 2017, 2016, 2015, 2014, 2013, 2012, 2011, 2010, 2009]
print(final_conv2_t2)
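Renaming columns by row number depends on the row positions of the downloaded dataset, which may change between releases. A possible alternative, sketched here on a made-up dataframe with the same layout, is to set the geo codes as the index before transposing, so that the columns are named by group directly. The values and the variable names toy and by_year are hypothetical.

```python
import pandas as pd

# Hypothetical miniature frame with the same layout as the filtered dataset (values are made up).
toy = pd.DataFrame({
    "geo\\time": ["CPC1", "EA19"],
    2019: [30.0, 15.0],
    2018: [31.0, 16.0],
}, index=[13, 20])

# With the geo codes as index, the transposed frame has one column per group
# and one row per year.
by_year = toy.set_index("geo\\time").T
by_year.index.name = "Years"
by_year = by_year.reset_index()
print(by_year)
```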
We finally plot the graph:
# Plotting
fig7 = go.Figure()
fig7.add_trace(go.Scatter(x=final_conv2_t2['Years'], y=final_conv2_t2['EA19'],
mode='lines+markers',
name='EA19'))
fig7.add_trace(go.Scatter(x=final_conv2_t2['Years'], y=final_conv2_t2['EU15'],
mode='lines+markers',
name='EU15'))
fig7.add_trace(go.Scatter(x=final_conv2_t2['Years'], y=final_conv2_t2['EU27'],
mode='lines+markers',
name='EU27'))
fig7.add_trace(go.Scatter(x=final_conv2_t2['Years'], y=final_conv2_t2['CPC1'],
mode='lines+markers',
name='CPC1'))
fig7.layout.template = 'plotly_white'
fig7.update_xaxes(title = "Years", showline=True, linewidth=2, linecolor='black', tickangle = 45,tickmode='linear', showgrid=False)
fig7.update_yaxes(title = "Coefficient of Variation", showline=True, linewidth=2, linecolor='black', showgrid=False)
fig7.update_layout(
title = "Price Convergence (CVs)", showlegend = True)
fig7.show()
pio.write_html(fig7, file=os.path.join(_SAVDIR_,'figure7.html'), auto_open=True)