In this project, we will explore a dataset that describes Euro daily exchange rates between 1999 and 2021 and use data visualization to compare the exchange rates of American dollar($) and euros(€). If the exchange rate between euro and the US dollar is 1.2, that means you get 1.2 dollars for 1 euro. Thus, euro has more value than US dollar in that exchange rate.
The dataset is made available on Kaggle by Daria Chemkaeve.
For our data storytelling, we will:
Let's start by importing pandas, reading our data into a dataframe and inspecting the dataframe to understand its structure and some basic facts.
import pandas as pd
exchange_rates = pd.read_csv('euro-daily-hist_1999_2020.csv')
exchange_rates.head() # inspecting the first 5 rows
Period\Unit: | [Australian dollar ] | [Bulgarian lev ] | [Brazilian real ] | [Canadian dollar ] | [Swiss franc ] | [Chinese yuan renminbi ] | [Cypriot pound ] | [Czech koruna ] | [Danish krone ] | ... | [Romanian leu ] | [Russian rouble ] | [Swedish krona ] | [Singapore dollar ] | [Slovenian tolar ] | [Slovak koruna ] | [Thai baht ] | [Turkish lira ] | [US dollar ] | [South African rand ] | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2021-01-08 | 1.5758 | 1.9558 | 6.5748 | 1.5543 | 1.0827 | 7.9184 | NaN | 26.163 | 7.4369 | ... | 4.8708 | 90.8000 | 10.0510 | 1.6228 | NaN | NaN | 36.8480 | 9.0146 | 1.2250 | 18.7212 |
1 | 2021-01-07 | 1.5836 | 1.9558 | 6.5172 | 1.5601 | 1.0833 | 7.9392 | NaN | 26.147 | 7.4392 | ... | 4.8712 | 91.2000 | 10.0575 | 1.6253 | NaN | NaN | 36.8590 | 8.9987 | 1.2276 | 18.7919 |
2 | 2021-01-06 | 1.5824 | 1.9558 | 6.5119 | 1.5640 | 1.0821 | 7.9653 | NaN | 26.145 | 7.4393 | ... | 4.8720 | 90.8175 | 10.0653 | 1.6246 | NaN | NaN | 36.9210 | 9.0554 | 1.2338 | 18.5123 |
3 | 2021-01-05 | 1.5927 | 1.9558 | 6.5517 | 1.5651 | 1.0803 | 7.9315 | NaN | 26.227 | 7.4387 | ... | 4.8721 | 91.6715 | 10.0570 | 1.6180 | NaN | NaN | 36.7760 | 9.0694 | 1.2271 | 18.4194 |
4 | 2021-01-04 | 1.5928 | 1.9558 | 6.3241 | 1.5621 | 1.0811 | 7.9484 | NaN | 26.141 | 7.4379 | ... | 4.8713 | 90.3420 | 10.0895 | 1.6198 | NaN | NaN | 36.7280 | 9.0579 | 1.2296 | 17.9214 |
5 rows × 41 columns
exchange_rates.tail() # inspecting the last 5 rows
Period\Unit: | [Australian dollar ] | [Bulgarian lev ] | [Brazilian real ] | [Canadian dollar ] | [Swiss franc ] | [Chinese yuan renminbi ] | [Cypriot pound ] | [Czech koruna ] | [Danish krone ] | ... | [Romanian leu ] | [Russian rouble ] | [Swedish krona ] | [Singapore dollar ] | [Slovenian tolar ] | [Slovak koruna ] | [Thai baht ] | [Turkish lira ] | [US dollar ] | [South African rand ] | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
5694 | 1999-01-08 | 1.8406 | NaN | NaN | 1.7643 | 1.6138 | NaN | 0.58187 | 34.938 | 7.4433 | ... | 1.3143 | 27.2075 | 9.1650 | 1.9537 | 188.8400 | 42.560 | 42.5590 | 0.3718 | 1.1659 | 6.7855 |
5695 | 1999-01-07 | 1.8474 | NaN | NaN | 1.7602 | 1.6165 | NaN | 0.58187 | 34.886 | 7.4431 | ... | 1.3092 | 26.9876 | 9.1800 | 1.9436 | 188.8000 | 42.765 | 42.1678 | 0.3701 | 1.1632 | 6.8283 |
5696 | 1999-01-06 | 1.8820 | NaN | NaN | 1.7711 | 1.6116 | NaN | 0.58200 | 34.850 | 7.4452 | ... | 1.3168 | 27.4315 | 9.3050 | 1.9699 | 188.7000 | 42.778 | 42.6949 | 0.3722 | 1.1743 | 6.7307 |
5697 | 1999-01-05 | 1.8944 | NaN | NaN | 1.7965 | 1.6123 | NaN | 0.58230 | 34.917 | 7.4495 | ... | 1.3168 | 26.5876 | 9.4025 | 1.9655 | 188.7750 | 42.848 | 42.5048 | 0.3728 | 1.1790 | 6.7975 |
5698 | 1999-01-04 | 1.9100 | NaN | NaN | 1.8004 | 1.6168 | NaN | 0.58231 | 35.107 | 7.4501 | ... | 1.3111 | 25.2875 | 9.4696 | 1.9554 | 189.0450 | 42.991 | 42.6799 | 0.3723 | 1.1789 | 6.9358 |
5 rows × 41 columns
exchange_rates.info() # inspecting basic information about the dataset
<class 'pandas.core.frame.DataFrame'> RangeIndex: 5699 entries, 0 to 5698 Data columns (total 41 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Period\Unit: 5699 non-null object 1 [Australian dollar ] 5699 non-null object 2 [Bulgarian lev ] 5297 non-null object 3 [Brazilian real ] 5431 non-null object 4 [Canadian dollar ] 5699 non-null object 5 [Swiss franc ] 5699 non-null object 6 [Chinese yuan renminbi ] 5431 non-null object 7 [Cypriot pound ] 2346 non-null object 8 [Czech koruna ] 5699 non-null object 9 [Danish krone ] 5699 non-null object 10 [Estonian kroon ] 3130 non-null object 11 [UK pound sterling ] 5699 non-null object 12 [Greek drachma ] 520 non-null object 13 [Hong Kong dollar ] 5699 non-null object 14 [Croatian kuna ] 5431 non-null object 15 [Hungarian forint ] 5699 non-null object 16 [Indonesian rupiah ] 5699 non-null object 17 [Israeli shekel ] 5431 non-null object 18 [Indian rupee ] 5431 non-null object 19 [Iceland krona ] 3292 non-null float64 20 [Japanese yen ] 5699 non-null object 21 [Korean won ] 5699 non-null object 22 [Lithuanian litas ] 4159 non-null object 23 [Latvian lats ] 3904 non-null object 24 [Maltese lira ] 2346 non-null object 25 [Mexican peso ] 5699 non-null object 26 [Malaysian ringgit ] 5699 non-null object 27 [Norwegian krone ] 5699 non-null object 28 [New Zealand dollar ] 5699 non-null object 29 [Philippine peso ] 5699 non-null object 30 [Polish zloty ] 5699 non-null object 31 [Romanian leu ] 5637 non-null float64 32 [Russian rouble ] 5699 non-null object 33 [Swedish krona ] 5699 non-null object 34 [Singapore dollar ] 5699 non-null object 35 [Slovenian tolar ] 2085 non-null object 36 [Slovak koruna ] 2608 non-null object 37 [Thai baht ] 5699 non-null object 38 [Turkish lira ] 5637 non-null float64 39 [US dollar ] 5699 non-null object 40 [South African rand ] 5699 non-null object dtypes: float64(3), object(38) memory usage: 1.8+ MB
Period/Unit
and [US dollar]
columns).Before visualizing our data, let's do some basic data cleaning. In the following cell, we will do:
[US dollar]
to US_dollar
and Period\Unit
to Date
.Date
column to a datetime object.Date
in ascending order.exchange_rates.rename(columns={'[US dollar ]': 'US_dollar',
'Period\\Unit:': 'Date'},
inplace=True)
exchange_rates['Date'] = pd.to_datetime(exchange_rates['Date'])
exchange_rates.sort_values('Date', inplace=True) # from oldest to newest
exchange_rates.reset_index(drop=True, inplace=True)
Let's extract the US_dollar
and Date
columns from our dataset.
euro_to_dollar = exchange_rates[['Date', 'US_dollar']].copy()
Let's explore the US_dollar
column in our new dataset to make sure it does not contain any non-numeric character.
euro_to_dollar['US_dollar'].value_counts()
- 62 1.2276 9 1.1215 8 1.1305 7 1.1797 6 .. 1.2571 1 1.2610 1 1.2651 1 1.2632 1 1.2193 1 Name: US_dollar, Length: 3528, dtype: int64
US_dollar
column contains 62 (-) values.Let's remove them and convert the column to a float.
euro_to_dollar = euro_to_dollar[euro_to_dollar['US_dollar'] != '-']
euro_to_dollar['US_dollar'] = euro_to_dollar['US_dollar'].astype(float)
euro_to_dollar['US_dollar'].value_counts()
1.2276 9 1.1215 8 1.1305 7 1.3373 6 1.3532 6 .. 1.2571 1 1.2610 1 1.2651 1 1.2632 1 1.2193 1 Name: US_dollar, Length: 3527, dtype: int64
euro_to_dollar.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 5637 entries, 0 to 5698 Data columns (total 2 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Date 5637 non-null datetime64[ns] 1 US_dollar 5637 non-null float64 dtypes: datetime64[ns](1), float64(1) memory usage: 132.1 KB
Now we are ready to start our data visualization.
Let's use a line chart to represent the evolution of euro to dollar exchange rates.
import matplotlib.pyplot as plt
%matplotlib inline
plt.plot(euro_to_dollar['Date'],
euro_to_dollar['US_dollar'])
plt.show()
To reduce the wiggles, we will use rolling mean. The rolling mean can be used to find the averages of rates per week, month or year. Let's see its working.
plt.figure(figsize=(9,6))
plt.subplot(3,2,1)
plt.plot(euro_to_dollar['Date'], euro_to_dollar['US_dollar'])
plt.title('Original values')
for i, rolling_mean in zip([2,3,4,5,6], [7,30,50,100,365]):
plt.subplot(3,2,i)
plt.plot(euro_to_dollar['Date'], euro_to_dollar['US_dollar'].rolling(rolling_mean).mean())
plt.title('Rolling window: ' + str(rolling_mean))
plt.tight_layout() # to automatically adjust paddings between subplots
plt.show()
For our data visualization, we will use a rolling mean of 30 days. Thus, we will plot the monthly means of exchange rates.
euro_to_dollar['rolling_mean'] = euro_to_dollar['US_dollar'].rolling(30).mean()
euro_to_dollar
Date | US_dollar | rolling_mean | |
---|---|---|---|
0 | 1999-01-04 | 1.1789 | NaN |
1 | 1999-01-05 | 1.1790 | NaN |
2 | 1999-01-06 | 1.1743 | NaN |
3 | 1999-01-07 | 1.1632 | NaN |
4 | 1999-01-08 | 1.1659 | NaN |
... | ... | ... | ... |
5694 | 2021-01-04 | 1.2296 | 1.211170 |
5695 | 2021-01-05 | 1.2271 | 1.212530 |
5696 | 2021-01-06 | 1.2338 | 1.213987 |
5697 | 2021-01-07 | 1.2276 | 1.215357 |
5698 | 2021-01-08 | 1.2250 | 1.216557 |
5637 rows × 3 columns
plt.plot(euro_to_dollar['Date'],
euro_to_dollar['rolling_mean'])
plt.show()
In this section, we will create a line plot using 'fivethirtyeight'
style and other matplotlib functions to tell the story of how euro-dollar rate changes during the coronavirus pandemic of 2020.
Thus, we will focus our attention to only few years before and after the pandemic.
Let's use 2018-2021 as our base and highlight the 2020-2021 part.
covid = euro_to_dollar.copy()[(euro_to_dollar['Date'].dt.year >= 2018) &
(euro_to_dollar['Date'].dt.year <= 2021)]
after_covid = euro_to_dollar.copy()[(euro_to_dollar['Date'].dt.year >= 2020) &
(euro_to_dollar['Date'].dt.year <= 2021)]
after_covid['rolling_mean'].min() # minimum rate during the pandemic
1.0849966666666666
# Adding the FiveThirtyEight style
import matplotlib.style as style
style.use('fivethirtyeight')
fig, ax = plt.subplots(figsize=(11,6))
ax.plot(covid['Date'], covid['rolling_mean'], linewidth=1, color='#0000FF')
ax.plot(after_covid['Date'], after_covid['rolling_mean'], linewidth=3, color='#FF0000')
# Adding separate tick labels
ax.set_xticklabels([])
ax.set_yticklabels([])
# the x and y values below were obtained through trial and error
x = 17500.0
for year in ['2018', '2019', '2020', '2021']:
ax.text(x, 1.07, year, alpha=0.5, fontsize=12)
x += 370
y = 1.078
for rate in ['1.08', '1.10', '1.12', '1.14', '1.16', '1.18', '1.20', '1.22', '1.24']:
ax.text(17430.0, y, rate, alpha=0.5, fontsize=12)
y += 0.02
# highlighting peak rate during the crisis
ax.axvspan(xmin=18360.0, xmax=18425.0, ymin=0.018, ymax=0.5,
alpha=0.5, color='grey')
# adding title and subtitle
ax.text(17430.0, 1.26, "Euro-USD rate dipped to 1.085 during 2020 coronavirus pandemic",
weight='bold', size=20)
ax.text(17430.0, 1.25, 'Euro-USD exchange rates between 2018 and 2021',
size=16)
# making the grid lines transparent
ax.grid(alpha=0.3)
In this section, we will create a line plot using 'fivethirtyeight'
style and other matplotlib functions to tell the story of how euro-dollar rate changes during the financial crisis of 2007-2008.
Thus, we will focus our attention to only few years before and after the crisis.
Let's use 2006-2010 as our base and highlight the 2007-2008 part.
financial_crisis = euro_to_dollar.copy()[(euro_to_dollar['Date'].dt.year >= 2006) &
(euro_to_dollar['Date'].dt.year <= 2009 )]
financial_crisis_7_8 = euro_to_dollar.copy()[(euro_to_dollar['Date'].dt.year >= 2007) &
(euro_to_dollar['Date'].dt.year <= 2008)]
financial_crisis_7_8['rolling_mean'].max() # maximu rate during the financial crisis
1.5743333333333331
# Adding the plot
fig, ax = plt.subplots(figsize=(11,6))
ax.plot(financial_crisis['Date'],
financial_crisis['rolling_mean'],
linewidth=1, color='#A6D785')
# Highlighting the 2007-2008 period
ax.plot(financial_crisis_7_8['Date'],
financial_crisis_7_8['rolling_mean'],
linewidth=3, color='#e23d28')
# Adding separate tick labels
ax.set_xticklabels([])
ax.set_yticklabels([])
# the x and y values below were obtained through trial and error
x = 13120.0
for year in ['2006', '2007', '2008', '2009', '2010']:
ax.text(x, 1.15, year, alpha=0.5, fontsize=12)
x += 365
y = 1.196
for rate in ['1.2', '1.3', '1.4', '1.5']:
ax.text(13040, y, rate, alpha=0.5, fontsize=12)
y += 0.1
# highlighting peak rate during the crisis
ax.axvspan(xmin=13965.0, xmax=14145.0, ymin=0.085,
alpha=0.5, color='grey')
# adding title and subtitle
ax.text(13040.0, 1.64, "Euro-USD rate peaked at 1.57 during 2007-2008's financial crisis",
weight='bold', size=20)
ax.text(13040.0, 1.61, 'Euro-USD exchange rates between 2006 and 2010',
size=16)
# making the grid lines transparent
ax.grid(alpha=0.3)
#ax.text(13100, 1.14, '2006')
In this project we have used data storytelling to find out that the Euro-USD
rate dipped down to 1.085 during the 2020 coronavirus pandemic and peaked at 1.57 during the 2007-2008 financial crisis.