Our focus in the guided part of the project will be on the exchange rate between the euro and the American dollar.
Daria Chemkaeva put together the data set and made it available on Kaggle — the data source is the European Central Bank. Note that the dataset gets regular updates — we downloaded it on January 2021.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import datetime as dt
# Enables Jupyter to display graphs
%matplotlib inline
exchange_rates = pd.read_csv('euro-daily-hist_1999_2020.csv')
#learn some basic facts about the dataset:
exchange_rates.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 5699 entries, 0 to 5698 Data columns (total 41 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Period\Unit: 5699 non-null object 1 [Australian dollar ] 5699 non-null object 2 [Bulgarian lev ] 5297 non-null object 3 [Brazilian real ] 5431 non-null object 4 [Canadian dollar ] 5699 non-null object 5 [Swiss franc ] 5699 non-null object 6 [Chinese yuan renminbi ] 5431 non-null object 7 [Cypriot pound ] 2346 non-null object 8 [Czech koruna ] 5699 non-null object 9 [Danish krone ] 5699 non-null object 10 [Estonian kroon ] 3130 non-null object 11 [UK pound sterling ] 5699 non-null object 12 [Greek drachma ] 520 non-null object 13 [Hong Kong dollar ] 5699 non-null object 14 [Croatian kuna ] 5431 non-null object 15 [Hungarian forint ] 5699 non-null object 16 [Indonesian rupiah ] 5699 non-null object 17 [Israeli shekel ] 5431 non-null object 18 [Indian rupee ] 5431 non-null object 19 [Iceland krona ] 3292 non-null float64 20 [Japanese yen ] 5699 non-null object 21 [Korean won ] 5699 non-null object 22 [Lithuanian litas ] 4159 non-null object 23 [Latvian lats ] 3904 non-null object 24 [Maltese lira ] 2346 non-null object 25 [Mexican peso ] 5699 non-null object 26 [Malaysian ringgit ] 5699 non-null object 27 [Norwegian krone ] 5699 non-null object 28 [New Zealand dollar ] 5699 non-null object 29 [Philippine peso ] 5699 non-null object 30 [Polish zloty ] 5699 non-null object 31 [Romanian leu ] 5637 non-null float64 32 [Russian rouble ] 5699 non-null object 33 [Swedish krona ] 5699 non-null object 34 [Singapore dollar ] 5699 non-null object 35 [Slovenian tolar ] 2085 non-null object 36 [Slovak koruna ] 2608 non-null object 37 [Thai baht ] 5699 non-null object 38 [Turkish lira ] 5637 non-null float64 39 [US dollar ] 5699 non-null object 40 [South African rand ] 5699 non-null object dtypes: float64(3), object(38) memory usage: 1.8+ MB
exchange_rates.head()
Period\Unit: | [Australian dollar ] | [Bulgarian lev ] | [Brazilian real ] | [Canadian dollar ] | [Swiss franc ] | [Chinese yuan renminbi ] | [Cypriot pound ] | [Czech koruna ] | [Danish krone ] | ... | [Romanian leu ] | [Russian rouble ] | [Swedish krona ] | [Singapore dollar ] | [Slovenian tolar ] | [Slovak koruna ] | [Thai baht ] | [Turkish lira ] | [US dollar ] | [South African rand ] | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2021-01-08 | 1.5758 | 1.9558 | 6.5748 | 1.5543 | 1.0827 | 7.9184 | NaN | 26.163 | 7.4369 | ... | 4.8708 | 90.8000 | 10.0510 | 1.6228 | NaN | NaN | 36.8480 | 9.0146 | 1.2250 | 18.7212 |
1 | 2021-01-07 | 1.5836 | 1.9558 | 6.5172 | 1.5601 | 1.0833 | 7.9392 | NaN | 26.147 | 7.4392 | ... | 4.8712 | 91.2000 | 10.0575 | 1.6253 | NaN | NaN | 36.8590 | 8.9987 | 1.2276 | 18.7919 |
2 | 2021-01-06 | 1.5824 | 1.9558 | 6.5119 | 1.5640 | 1.0821 | 7.9653 | NaN | 26.145 | 7.4393 | ... | 4.8720 | 90.8175 | 10.0653 | 1.6246 | NaN | NaN | 36.9210 | 9.0554 | 1.2338 | 18.5123 |
3 | 2021-01-05 | 1.5927 | 1.9558 | 6.5517 | 1.5651 | 1.0803 | 7.9315 | NaN | 26.227 | 7.4387 | ... | 4.8721 | 91.6715 | 10.0570 | 1.6180 | NaN | NaN | 36.7760 | 9.0694 | 1.2271 | 18.4194 |
4 | 2021-01-04 | 1.5928 | 1.9558 | 6.3241 | 1.5621 | 1.0811 | 7.9484 | NaN | 26.141 | 7.4379 | ... | 4.8713 | 90.3420 | 10.0895 | 1.6198 | NaN | NaN | 36.7280 | 9.0579 | 1.2296 | 17.9214 |
5 rows × 41 columns
exchange_rates.tail()
Period\Unit: | [Australian dollar ] | [Bulgarian lev ] | [Brazilian real ] | [Canadian dollar ] | [Swiss franc ] | [Chinese yuan renminbi ] | [Cypriot pound ] | [Czech koruna ] | [Danish krone ] | ... | [Romanian leu ] | [Russian rouble ] | [Swedish krona ] | [Singapore dollar ] | [Slovenian tolar ] | [Slovak koruna ] | [Thai baht ] | [Turkish lira ] | [US dollar ] | [South African rand ] | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
5694 | 1999-01-08 | 1.8406 | NaN | NaN | 1.7643 | 1.6138 | NaN | 0.58187 | 34.938 | 7.4433 | ... | 1.3143 | 27.2075 | 9.1650 | 1.9537 | 188.8400 | 42.560 | 42.5590 | 0.3718 | 1.1659 | 6.7855 |
5695 | 1999-01-07 | 1.8474 | NaN | NaN | 1.7602 | 1.6165 | NaN | 0.58187 | 34.886 | 7.4431 | ... | 1.3092 | 26.9876 | 9.1800 | 1.9436 | 188.8000 | 42.765 | 42.1678 | 0.3701 | 1.1632 | 6.8283 |
5696 | 1999-01-06 | 1.8820 | NaN | NaN | 1.7711 | 1.6116 | NaN | 0.58200 | 34.850 | 7.4452 | ... | 1.3168 | 27.4315 | 9.3050 | 1.9699 | 188.7000 | 42.778 | 42.6949 | 0.3722 | 1.1743 | 6.7307 |
5697 | 1999-01-05 | 1.8944 | NaN | NaN | 1.7965 | 1.6123 | NaN | 0.58230 | 34.917 | 7.4495 | ... | 1.3168 | 26.5876 | 9.4025 | 1.9655 | 188.7750 | 42.848 | 42.5048 | 0.3728 | 1.1790 | 6.7975 |
5698 | 1999-01-04 | 1.9100 | NaN | NaN | 1.8004 | 1.6168 | NaN | 0.58231 | 35.107 | 7.4501 | ... | 1.3111 | 25.2875 | 9.4696 | 1.9554 | 189.0450 | 42.991 | 42.6799 | 0.3723 | 1.1789 | 6.9358 |
5 rows × 41 columns
Basic Facts
Data Cleaning
#We rename the [US dollar ] and Period\Unit: columns to something easier to type — US_dollar and Time.
exchange_rates.rename(columns={'[US dollar ]': 'US_dollar','Period\\Unit:': 'Time'},inplace=True)
#We change the Time column to a datetime data type.
exchange_rates['Time'] = pd.to_datetime(exchange_rates['Time'])
#We sort the values by Time in ascending order.
exchange_rates.sort_values('Time', inplace=True)
#We reset the index (and drop the initial index).
exchange_rates.reset_index(drop=True, inplace=True)
#Isolate the Time and the US_dollar columns. Assign them to a different variable named euro_to_dollar.
euro_to_dollar = exchange_rates[['Time','US_dollar']]
#Run the Series.value_counts() method on the US_dollar column, and see if you notice anything wrong.
euro_to_dollar['US_dollar'].value_counts()
#Drop all the rows where the - character appears in the US_dollar column(# - 62)
euro_to_dollar = euro_to_dollar[euro_to_dollar['US_dollar'] != '-']
#Convert the US_dollar column to a float data type.
euro_to_dollar["US_dollar"] = euro_to_dollar["US_dollar"].astype(float)
euro_to_dollar.head(20)
Time | US_dollar | |
---|---|---|
0 | 1999-01-04 | 1.1789 |
1 | 1999-01-05 | 1.1790 |
2 | 1999-01-06 | 1.1743 |
3 | 1999-01-07 | 1.1632 |
4 | 1999-01-08 | 1.1659 |
5 | 1999-01-11 | 1.1569 |
6 | 1999-01-12 | 1.1520 |
7 | 1999-01-13 | 1.1744 |
8 | 1999-01-14 | 1.1653 |
9 | 1999-01-15 | 1.1626 |
10 | 1999-01-18 | 1.1612 |
11 | 1999-01-19 | 1.1616 |
12 | 1999-01-20 | 1.1575 |
13 | 1999-01-21 | 1.1572 |
14 | 1999-01-22 | 1.1567 |
15 | 1999-01-25 | 1.1584 |
16 | 1999-01-26 | 1.1582 |
17 | 1999-01-27 | 1.1529 |
18 | 1999-01-28 | 1.1410 |
19 | 1999-01-29 | 1.1384 |
Generated line plot to visualize the evolution of the euro-dollar exchange rate.
plt.style.use('default')
plt.plot(euro_to_dollar['Time'],
euro_to_dollar['US_dollar'])
plt.show()
Rolling mean
#Calculate the rolling means for the US_dollar(30 days) column using a moving window of 30 days. Add the rolling means to a new column named rolling_mean.
#plt.style.use('default')
euro_to_dollar['rolling_mean'] = euro_to_dollar['US_dollar'].rolling(2).mean()
#NOTE: FOR LOOP SYNTAX IS MORE CONSISE TO IMPLEMENT GRID
plt.figure(figsize=(10, 12))#controls size of grid
#1st subplot
plt.subplot(3, 2, 1)
plt.plot(euro_to_dollar['Time'],
euro_to_dollar['US_dollar'])
plt.title('Orginal value',fontweight="bold")
#2nd
plt.subplot(3, 2, 2)
plt.plot(euro_to_dollar['Time'],euro_to_dollar['US_dollar'].rolling(7).mean())
plt.title('Rolling window 2 days',fontweight="bold")
#3rd
plt.subplot(3, 2, 3)
plt.plot(euro_to_dollar['Time'],euro_to_dollar['US_dollar'].rolling(30).mean())
plt.title('rolling window 7 days',fontweight="bold")
#4rd
plt.subplot(3, 2, 4)
plt.plot(euro_to_dollar['Time'],euro_to_dollar['US_dollar'].rolling(50).mean())
plt.title('rolling window 50 days',fontweight="bold")
#5th
plt.subplot(3, 2,5)
plt.plot(euro_to_dollar['Time'],euro_to_dollar['US_dollar'].rolling(100).mean())
plt.title('rolling window 100 days', fontweight="bold")
#6th
plt.subplot(3, 2,6)
plt.plot(euro_to_dollar['Time'],euro_to_dollar['US_dollar'].rolling(365).mean())
plt.title('rolling mean 365 days',fontweight="bold")
plt.show()
Story telling
We show how the euro-dollar rate changed during the 2007-2008 financial crisis. We can also show the data for 2006 and 2009 for comparison. We can use a line plot.
Project: Euro-USD rate peaked at 1.59 during 2007-2008 financial crisis.
#get dates for usd exchange rates 2006-2010
exchange_rate_06_10 = euro_to_dollar.copy()[(euro_to_dollar['Time'].dt.year >= 2006) & (euro_to_dollar['Time'].dt.year <= 2009)]
#get dates for usd exchange rates 2007-2009
financial_07_08 = euro_to_dollar.copy()[(euro_to_dollar['Time'].dt.year > 2007) & (euro_to_dollar['Time'].dt.year < 2009)]
#financial_07_08.describe()
#exchange_rate_06_10['Time']
euro_to_dollar.columns
Index(['Time', 'US_dollar', 'rolling_mean'], dtype='object')
#line graph to show changes
import matplotlib.style as style
#style.available - check available styles for graph
style.use('fivethirtyeight')#style of graph
#adding the graph
fig, ax = plt.subplots(figsize=(12, 8))#controls size of figure
#2006 - 10 finanacial rate
ax.plot(exchange_rate_06_10['Time'],
exchange_rate_06_10['rolling_mean'],color = 'g')
#2007- 8 financial crisis
ax.plot(financial_07_08['Time'],
financial_07_08['rolling_mean'],color = 'red')
#add title and suptitle
plt.suptitle('Euro-USD rate peaked at 1.59 during 2007-2008\'s financial crisis',
fontweight = 'bold',fontsize = 22)
plt.title('Euro-USD rate peaked between 2006 - 2010',y = 1
,x = 0.245)
#highlight peek exchange rate with vertical line
#ax.axvspan([exchange_rate_06_10(2007), exchange_rate_06_10(2007)],facecolor='#2ca02c')
#modify xticks and yticks
#add signature
#ax.text(-0.1, -1,
#'©DATAQUEST Source: P. Cortez et al.',
# color = '#f0f0f0',
# backgroundcolor = '#4d4d4d',
#size=12)
plt.show()