The data set was downloaded from Kaggle on January 2021. It was put together by Daria Chemkaeva using data from the European Central Bank.
The data set is a collection of historic exchange rates from 1999-2021 of the EUR against other currencies, including the USD.
USD = USD exchange rate * EUR
Let's see what we are working with here!
'Period\Unit:' column:
'[US dollar ]' column:
import pandas as pd
x_rates = pd.read_csv("euro-daily-hist_1999_2020.csv")
x_rates.head()
Period\Unit: | [Australian dollar ] | [Bulgarian lev ] | [Brazilian real ] | [Canadian dollar ] | [Swiss franc ] | [Chinese yuan renminbi ] | [Cypriot pound ] | [Czech koruna ] | [Danish krone ] | ... | [Romanian leu ] | [Russian rouble ] | [Swedish krona ] | [Singapore dollar ] | [Slovenian tolar ] | [Slovak koruna ] | [Thai baht ] | [Turkish lira ] | [US dollar ] | [South African rand ] | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2021-01-08 | 1.5758 | 1.9558 | 6.5748 | 1.5543 | 1.0827 | 7.9184 | NaN | 26.163 | 7.4369 | ... | 4.8708 | 90.8000 | 10.0510 | 1.6228 | NaN | NaN | 36.8480 | 9.0146 | 1.2250 | 18.7212 |
1 | 2021-01-07 | 1.5836 | 1.9558 | 6.5172 | 1.5601 | 1.0833 | 7.9392 | NaN | 26.147 | 7.4392 | ... | 4.8712 | 91.2000 | 10.0575 | 1.6253 | NaN | NaN | 36.8590 | 8.9987 | 1.2276 | 18.7919 |
2 | 2021-01-06 | 1.5824 | 1.9558 | 6.5119 | 1.5640 | 1.0821 | 7.9653 | NaN | 26.145 | 7.4393 | ... | 4.8720 | 90.8175 | 10.0653 | 1.6246 | NaN | NaN | 36.9210 | 9.0554 | 1.2338 | 18.5123 |
3 | 2021-01-05 | 1.5927 | 1.9558 | 6.5517 | 1.5651 | 1.0803 | 7.9315 | NaN | 26.227 | 7.4387 | ... | 4.8721 | 91.6715 | 10.0570 | 1.6180 | NaN | NaN | 36.7760 | 9.0694 | 1.2271 | 18.4194 |
4 | 2021-01-04 | 1.5928 | 1.9558 | 6.3241 | 1.5621 | 1.0811 | 7.9484 | NaN | 26.141 | 7.4379 | ... | 4.8713 | 90.3420 | 10.0895 | 1.6198 | NaN | NaN | 36.7280 | 9.0579 | 1.2296 | 17.9214 |
5 rows × 41 columns
x_rates.tail()
Period\Unit: | [Australian dollar ] | [Bulgarian lev ] | [Brazilian real ] | [Canadian dollar ] | [Swiss franc ] | [Chinese yuan renminbi ] | [Cypriot pound ] | [Czech koruna ] | [Danish krone ] | ... | [Romanian leu ] | [Russian rouble ] | [Swedish krona ] | [Singapore dollar ] | [Slovenian tolar ] | [Slovak koruna ] | [Thai baht ] | [Turkish lira ] | [US dollar ] | [South African rand ] | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
5694 | 1999-01-08 | 1.8406 | NaN | NaN | 1.7643 | 1.6138 | NaN | 0.58187 | 34.938 | 7.4433 | ... | 1.3143 | 27.2075 | 9.1650 | 1.9537 | 188.8400 | 42.560 | 42.5590 | 0.3718 | 1.1659 | 6.7855 |
5695 | 1999-01-07 | 1.8474 | NaN | NaN | 1.7602 | 1.6165 | NaN | 0.58187 | 34.886 | 7.4431 | ... | 1.3092 | 26.9876 | 9.1800 | 1.9436 | 188.8000 | 42.765 | 42.1678 | 0.3701 | 1.1632 | 6.8283 |
5696 | 1999-01-06 | 1.8820 | NaN | NaN | 1.7711 | 1.6116 | NaN | 0.58200 | 34.850 | 7.4452 | ... | 1.3168 | 27.4315 | 9.3050 | 1.9699 | 188.7000 | 42.778 | 42.6949 | 0.3722 | 1.1743 | 6.7307 |
5697 | 1999-01-05 | 1.8944 | NaN | NaN | 1.7965 | 1.6123 | NaN | 0.58230 | 34.917 | 7.4495 | ... | 1.3168 | 26.5876 | 9.4025 | 1.9655 | 188.7750 | 42.848 | 42.5048 | 0.3728 | 1.1790 | 6.7975 |
5698 | 1999-01-04 | 1.9100 | NaN | NaN | 1.8004 | 1.6168 | NaN | 0.58231 | 35.107 | 7.4501 | ... | 1.3111 | 25.2875 | 9.4696 | 1.9554 | 189.0450 | 42.991 | 42.6799 | 0.3723 | 1.1789 | 6.9358 |
5 rows × 41 columns
x_rates.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 5699 entries, 0 to 5698 Data columns (total 41 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Period\Unit: 5699 non-null object 1 [Australian dollar ] 5699 non-null object 2 [Bulgarian lev ] 5297 non-null object 3 [Brazilian real ] 5431 non-null object 4 [Canadian dollar ] 5699 non-null object 5 [Swiss franc ] 5699 non-null object 6 [Chinese yuan renminbi ] 5431 non-null object 7 [Cypriot pound ] 2346 non-null object 8 [Czech koruna ] 5699 non-null object 9 [Danish krone ] 5699 non-null object 10 [Estonian kroon ] 3130 non-null object 11 [UK pound sterling ] 5699 non-null object 12 [Greek drachma ] 520 non-null object 13 [Hong Kong dollar ] 5699 non-null object 14 [Croatian kuna ] 5431 non-null object 15 [Hungarian forint ] 5699 non-null object 16 [Indonesian rupiah ] 5699 non-null object 17 [Israeli shekel ] 5431 non-null object 18 [Indian rupee ] 5431 non-null object 19 [Iceland krona ] 3292 non-null float64 20 [Japanese yen ] 5699 non-null object 21 [Korean won ] 5699 non-null object 22 [Lithuanian litas ] 4159 non-null object 23 [Latvian lats ] 3904 non-null object 24 [Maltese lira ] 2346 non-null object 25 [Mexican peso ] 5699 non-null object 26 [Malaysian ringgit ] 5699 non-null object 27 [Norwegian krone ] 5699 non-null object 28 [New Zealand dollar ] 5699 non-null object 29 [Philippine peso ] 5699 non-null object 30 [Polish zloty ] 5699 non-null object 31 [Romanian leu ] 5637 non-null float64 32 [Russian rouble ] 5699 non-null object 33 [Swedish krona ] 5699 non-null object 34 [Singapore dollar ] 5699 non-null object 35 [Slovenian tolar ] 2085 non-null object 36 [Slovak koruna ] 2608 non-null object 37 [Thai baht ] 5699 non-null object 38 [Turkish lira ] 5637 non-null float64 39 [US dollar ] 5699 non-null object 40 [South African rand ] 5699 non-null object dtypes: float64(3), object(38) memory usage: 1.8+ MB
To be able to work the the data more easily I'm going to change a a few column names and convert some values into different data types.
time column will contain datetime values
# rename the columns - I prefer all lowercase and snakecase
new_col_names={'[US dollar ]':'us_dollar',
'Period\\Unit:':'time'}
x_rates=x_rates.rename(columns=new_col_names)
#convert time string to datetime
x_rates['time'] = pd.to_datetime(x_rates['time'])
#sort the rows based on the time and re-index from the top
x_rates.sort_values('time', inplace=True)
x_rates.reset_index(drop=True, inplace=True)
I want to make changes so that:
us_dollar column will contain float values
Some rows have '-' as a value which need to be removed from the data set.
I noticed the rows had '-' values because of the error traceback when trying to convert the entire column to float.
I checked for placeholder values using value_count() of the us_dollar series and found 62 occurances.
Another good check, though, is the describe method. In the end I only included this check because it resulted in a more descriptive before and after column comparision.
# check for dummy data: '-' used for missing values
print("USD column values BEFORE")
print(x_rates['us_dollar'].describe())
print()
# remove the dummy data and create a dataframe of just time and us_dollar
x_usd = x_rates.loc[x_rates['us_dollar'] != '-', ["time", "us_dollar"]].copy()
x_usd['us_dollar'] = x_usd['us_dollar'].astype(float)
#check the result
print("USD column values AFTER")
print(x_usd['us_dollar'].describe())
USD column values BEFORE count 5699 unique 3528 top - freq 62 Name: us_dollar, dtype: object USD column values AFTER count 5637.000000 mean 1.199704 std 0.162328 min 0.825200 25% 1.098200 50% 1.202300 75% 1.320300 max 1.599000 Name: us_dollar, dtype: float64
Because I tried to work ahead of the guidance I ended up with different variable names.
I am using my names if they are accurately descriptive and shorter ("Thank you!" -my wrists).
In this analysis x_usd is equivalent to dataframe euro_to_dollar in the instructions
import matplotlib.pyplot as plt
%matplotlib inline
plt.plot(x_usd['time'], x_usd['us_dollar'])
plt.show()
Plotting each of the data points results in a rugged plot line. To smooth the plot line and increase the data-ink ratio, I will apply a rolling mean of 30 days. This will facilitate longer-term trends to emerge from the daily fluctuations.
x_usd['rolling_mean'] = x_usd['us_dollar'].rolling(30).mean()
plt.plot(x_usd['time'], x_usd['rolling_mean'])
plt.show()
Since the dataset includes multiple currencies, I would like to compare the historical USD-EUR exchange rates with CAD-EUR and visualize how closely the CAD performace follows the USD performance.
This is the focus of the data story I am interested in sharing insights about.
This dataset will be prepared similarly to the USD data set.
new_col_names={'[Canadian dollar ]':'ca_dollar'}
x_rates=x_rates.rename(columns=new_col_names)
print("CAD column values BEFORE")
print(x_rates['ca_dollar'].describe())
print()
# remove the dummy data and create a dataframe of just time and us_dollar
x_cad = x_rates.loc[x_rates['ca_dollar'] != '-', ["time", "ca_dollar"]].copy()
x_cad['ca_dollar'] = x_cad['ca_dollar'].astype(float)
#heck the result
print("CAD column values AFTER")
print(x_cad['ca_dollar'].describe())
CAD column values BEFORE count 5699 unique 2931 top - freq 62 Name: ca_dollar, dtype: object CAD column values AFTER count 5637.000000 mean 1.469395 std 0.099561 min 1.213900 25% 1.398700 50% 1.467600 75% 1.545300 max 1.812300 Name: ca_dollar, dtype: float64
The initial plot line of the CAD exchange rate rolling mean has a different shape than the USD plot. With the exception of a matching dip in 2002, no clear correlation is obvious.
x_cad['rolling_mean'] = x_cad['ca_dollar'].rolling(30).mean()
plt.plot(x_cad['time'], x_cad['rolling_mean'])
plt.show()
Initial focus is to compare the shape of each plot. To visualize this I'm going to superimpose the separate plots on the same plot.
plt.plot(x_usd['time'],x_usd['rolling_mean'], label="USD", color='green')
plt.plot(x_cad['time'],x_cad['rolling_mean'], label='CAD', color='red')
plt.legend(title="Destination\nCurrency", loc='lower center', bbox_to_anchor=(0.515, 0.05))
plt.xlabel("Time")
plt.ylabel("Exchange Rate From EUR")
plt.title("On Par EUR to USD and CAD Exchange Rates 2008-2012")
#x coordinate needs to be a datetime object to match x-axis values
# the rest of values came by fiddling
plt.axvline(x=pd.to_datetime("2010-06-01"), ymin=0.595, ymax=0.72, linewidth=80, color='grey', alpha=0.2)
plt.show()
OK well that was fun but now I need to redo it all as an axes to be able to remove the frame / border
fig, (ax, ax2) = plt.subplots(figsize=(6,8), nrows=2, ncols=1)
ax.plot(x_usd['time'],x_usd['rolling_mean'], label="USD", color='green')
ax.plot(x_cad['time'],x_cad['rolling_mean'], label='CAD', color='red')
ax.legend(title="Destination\nCurrency", loc='lower center', bbox_to_anchor=(0.515, 0.05), frameon=False, title_fontsize="small")
remove = ["top", "right"]
for side in remove :
ax.spines[side].set_visible(False)
# failed attempt at changing the date ticks ...
# abandoned due to lack of need
# year_ticks = [pd.to_datetime("2000-01-01"),
# pd.to_datetime("2004-01-01"),
# pd.to_datetime("2008-01-01"),
# pd.to_datetime("2012-01-01"),
# pd.to_datetime("2016-01-01"),
# pd.to_datetime("2012-01-01")]
#ax.set_xticks(year_ticks)
ax.set_xlabel("Time")
ax.set_ylabel("Exchange Rate From EUR")
#x coordinate needs to be a datetime object to match x-axis values
# the rest of values came by fiddling
ax.axvline(x=pd.to_datetime("2010-06-01"), ymin=0.595, ymax=0.72, linewidth=80, color='grey', alpha=0.2)
ax.set_title("On Par EUR to USD and CAD Exchange Rates 2008-2013")
Text(0.5, 1.0, 'On Par EUR to USD and CAD Exchange Rates 2008-2013')
I was originally comparing shape when I discovered this anomaly of EUR-CAD rate being on par with the EUR-USD rate.
To look at the shape similarities outside of the abnormal 2008-2012 period, I will re-baseline the CAD values so that the lowest CAD rate matches the lowest USD rate.
I am doing this in a very simple way - I'm taking the difference between the minimum values of the USD and CAD rates and adding it to all the CAD rates. The result will be all the CAD rate values being shifted up.
The overlapping plots may reveal something interesting about the usual correlation between these EUR exchange rate fluctuations.
min_usd = x_usd["rolling_mean"].min()
min_cad = x_cad["rolling_mean"].min()
#normally cad return is higher than usd return when changing euros
min_diff = round(min_cad-min_usd,2)
x_cad["baselined"] = x_cad["rolling_mean"] - min_diff
x_cad["baselined"].describe()
count 5608.000000 mean 1.088403 std 0.096622 min 0.851237 25% 1.018259 50% 1.086870 75% 1.166223 max 1.360690 Name: baselined, dtype: float64
x_usd["rolling_mean"].describe()
count 5608.000000 mean 1.199748 std 0.161895 min 0.849797 25% 1.101631 50% 1.204373 75% 1.319978 max 1.574333 Name: rolling_mean, dtype: float64
#start1 = x_usd['time'].iloc[0]
end1 = pd.to_datetime("2007-01-01")
start2 = pd.to_datetime("2014-01-01")
#end2 = x_usd['time'].iloc[-1]
#first section
sub_usd1 = x_usd.copy()
sub_usd1 = sub_usd1[sub_usd1['time'] <= end1]
sub_cad1 = x_cad.copy()
sub_cad1 = sub_cad1[sub_cad1['time'] <= end1]
#last section
sub_usd2 = x_usd.copy()
sub_usd2 = sub_usd2[sub_usd2['time'] >= start2]
sub_cad2 = x_cad.copy()
sub_cad2 = sub_cad2[sub_cad2['time'] >= start2]
#middle section
sub_usd3 = x_usd.copy()
sub_usd3 = sub_usd3[sub_usd3['time'] >= end1]
sub_usd3 = sub_usd3[sub_usd3['time'] <= start2]
sub_cad3 = x_cad.copy()
sub_cad3 = sub_cad3[sub_cad3['time'] >= end1]
sub_cad3 = sub_cad3[sub_cad3['time'] <= start2]
fig, (ax, ax2) = plt.subplots(figsize=(6,8), nrows=2, ncols=1)
ax.plot(x_usd['time'],x_usd['rolling_mean'],
# label="USD",
color='green', alpha=0.2)
ax.plot(x_cad['time'],x_cad['rolling_mean'],
# label='CAD',
color='red', alpha=0.2)
ax.plot(sub_cad3['time'], sub_cad3['rolling_mean'],
label="CAD",
color='red', alpha=0.8)
ax.plot(sub_usd3['time'], sub_usd3['rolling_mean'],
label="USD",
color='green', alpha=0.8)
ax.legend(title="Destination\nCurrency",
loc='lower center',
bbox_to_anchor=(0.515, 0.05),
#frameon=False,
framealpha=0.8,
title_fontsize="small")
remove = ["top", "right"]
for side in remove :
ax.spines[side].set_visible(False)
#ax.set_xlabel("Time")
ax.set_ylabel("Exchange Rate From EUR")
#ax.axvline(x=pd.to_datetime("2010-06-01"), ymin=0.595, ymax=0.72, linewidth=80, color='grey', alpha=0.2)
ax.set_title("On Par EUR to USD and CAD Exchange Rates 2008-2013")
ax2.plot(x_usd['time'],x_usd['rolling_mean'],
color='green', alpha=0.2)
ax2.plot(x_cad['time'],x_cad['baselined'],
color='red', alpha=0.2)
ax2.plot(sub_usd1['time'], sub_usd1['rolling_mean'],
label="USD",
color='green', alpha=0.8)
ax2.plot(sub_usd2['time'], sub_usd2['rolling_mean'],
color='green', alpha=0.8)
ax2.plot(sub_cad1['time'], sub_cad1['baselined'],
label="CAD Baselined",
color='red', alpha=0.8)
ax2.plot(sub_cad2['time'], sub_cad2['baselined'],
color='red', alpha=0.8)
ax2.legend(title="Destination\nCurrency",
bbox_to_anchor=(0.515, 0.85),
loc="upper center",
title_fontsize="small",
facecolor="white",
framealpha=0.8,
fontsize="small")
remove = ["top", "right"]
for side in remove :
ax2.spines[side].set_visible(False)
ax2.set_xlabel("Time")
ax2.set_ylabel("Exchange Rate From EUR")
#ax2.axvline(x=pd.to_datetime("2010-06-01"),
# ymin=0.05, ymax=.85,
# linewidth=94,
# color='grey', alpha=0.2)
ax2.set_title("USD and CAD Exchange Rates Otherwise Trend Together")
ax2.set_yticks([.9,1.1,1.3,1.5])
ax2.set_yticklabels([])
plt.plot()
[]
The EUR - USD rate is typically stronger than the EUR - CAD rate. Nevertheless, the EUR - CAD rate tends to fluctuate in parallel with the EUR - USD. This was visually depicted by moving the entire EUR-CAD plot up to superimpose it over the EUR - USD exchange rates.
An exception to their fluctuations being just in parallel occurred in 2008 when the US Housing Market crashed and during the restabiliztion years after.
For this period the EUR to USD rate significantly dropped to nearly match the EUR to CAD rate with the exception of a short-lived strong rebound in the immediate wake of the crisis.