USD - EUR: Exchange Rates 1999 - 2021¶

Data Set - About¶

The data set was downloaded from Kaggle on January 2021. It was put together by Daria Chemkaeva using data from the European Central Bank.

The data set is a collection of historic exchange rates from 1999-2021 of the EUR against other currencies, including the USD.

USD = USD exchange rate * EUR

Data Set - First Peek¶

Let's see what we are working with here!

'Period\Unit:' column:

could be used as the row index
contains a date in format YYYY-MM-DD
values are stored as strings

'[US dollar ]' column:

format is f.loat (ie. 1.2345)
has no missing values
values are stored as strings.

In [1]:

import pandas as pd
x_rates = pd.read_csv("euro-daily-hist_1999_2020.csv")
x_rates.head()

Out[1]:

	Period\Unit:	[Australian dollar ]	[Bulgarian lev ]	[Brazilian real ]	[Canadian dollar ]	[Swiss franc ]	[Chinese yuan renminbi ]	[Cypriot pound ]	[Czech koruna ]	[Danish krone ]	...	[Romanian leu ]	[Russian rouble ]	[Swedish krona ]	[Singapore dollar ]	[Slovenian tolar ]	[Slovak koruna ]	[Thai baht ]	[Turkish lira ]	[US dollar ]	[South African rand ]
0	2021-01-08	1.5758	1.9558	6.5748	1.5543	1.0827	7.9184	NaN	26.163	7.4369	...	4.8708	90.8000	10.0510	1.6228	NaN	NaN	36.8480	9.0146	1.2250	18.7212
1	2021-01-07	1.5836	1.9558	6.5172	1.5601	1.0833	7.9392	NaN	26.147	7.4392	...	4.8712	91.2000	10.0575	1.6253	NaN	NaN	36.8590	8.9987	1.2276	18.7919
2	2021-01-06	1.5824	1.9558	6.5119	1.5640	1.0821	7.9653	NaN	26.145	7.4393	...	4.8720	90.8175	10.0653	1.6246	NaN	NaN	36.9210	9.0554	1.2338	18.5123
3	2021-01-05	1.5927	1.9558	6.5517	1.5651	1.0803	7.9315	NaN	26.227	7.4387	...	4.8721	91.6715	10.0570	1.6180	NaN	NaN	36.7760	9.0694	1.2271	18.4194
4	2021-01-04	1.5928	1.9558	6.3241	1.5621	1.0811	7.9484	NaN	26.141	7.4379	...	4.8713	90.3420	10.0895	1.6198	NaN	NaN	36.7280	9.0579	1.2296	17.9214

5 rows × 41 columns

In [2]:

x_rates.tail()

Out[2]:

	Period\Unit:	[Australian dollar ]	[Bulgarian lev ]	[Brazilian real ]	[Canadian dollar ]	[Swiss franc ]	[Chinese yuan renminbi ]	[Cypriot pound ]	[Czech koruna ]	[Danish krone ]	...	[Romanian leu ]	[Russian rouble ]	[Swedish krona ]	[Singapore dollar ]	[Slovenian tolar ]	[Slovak koruna ]	[Thai baht ]	[Turkish lira ]	[US dollar ]	[South African rand ]
5694	1999-01-08	1.8406	NaN	NaN	1.7643	1.6138	NaN	0.58187	34.938	7.4433	...	1.3143	27.2075	9.1650	1.9537	188.8400	42.560	42.5590	0.3718	1.1659	6.7855
5695	1999-01-07	1.8474	NaN	NaN	1.7602	1.6165	NaN	0.58187	34.886	7.4431	...	1.3092	26.9876	9.1800	1.9436	188.8000	42.765	42.1678	0.3701	1.1632	6.8283
5696	1999-01-06	1.8820	NaN	NaN	1.7711	1.6116	NaN	0.58200	34.850	7.4452	...	1.3168	27.4315	9.3050	1.9699	188.7000	42.778	42.6949	0.3722	1.1743	6.7307
5697	1999-01-05	1.8944	NaN	NaN	1.7965	1.6123	NaN	0.58230	34.917	7.4495	...	1.3168	26.5876	9.4025	1.9655	188.7750	42.848	42.5048	0.3728	1.1790	6.7975
5698	1999-01-04	1.9100	NaN	NaN	1.8004	1.6168	NaN	0.58231	35.107	7.4501	...	1.3111	25.2875	9.4696	1.9554	189.0450	42.991	42.6799	0.3723	1.1789	6.9358

5 rows × 41 columns

In [3]:

x_rates.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5699 entries, 0 to 5698
Data columns (total 41 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Period\Unit:              5699 non-null   object 
 1   [Australian dollar ]      5699 non-null   object 
 2   [Bulgarian lev ]          5297 non-null   object 
 3   [Brazilian real ]         5431 non-null   object 
 4   [Canadian dollar ]        5699 non-null   object 
 5   [Swiss franc ]            5699 non-null   object 
 6   [Chinese yuan renminbi ]  5431 non-null   object 
 7   [Cypriot pound ]          2346 non-null   object 
 8   [Czech koruna ]           5699 non-null   object 
 9   [Danish krone ]           5699 non-null   object 
 10  [Estonian kroon ]         3130 non-null   object 
 11  [UK pound sterling ]      5699 non-null   object 
 12  [Greek drachma ]          520 non-null    object 
 13  [Hong Kong dollar ]       5699 non-null   object 
 14  [Croatian kuna ]          5431 non-null   object 
 15  [Hungarian forint ]       5699 non-null   object 
 16  [Indonesian rupiah ]      5699 non-null   object 
 17  [Israeli shekel ]         5431 non-null   object 
 18  [Indian rupee ]           5431 non-null   object 
 19  [Iceland krona ]          3292 non-null   float64
 20  [Japanese yen ]           5699 non-null   object 
 21  [Korean won ]             5699 non-null   object 
 22  [Lithuanian litas ]       4159 non-null   object 
 23  [Latvian lats ]           3904 non-null   object 
 24  [Maltese lira ]           2346 non-null   object 
 25  [Mexican peso ]           5699 non-null   object 
 26  [Malaysian ringgit ]      5699 non-null   object 
 27  [Norwegian krone ]        5699 non-null   object 
 28  [New Zealand dollar ]     5699 non-null   object 
 29  [Philippine peso ]        5699 non-null   object 
 30  [Polish zloty ]           5699 non-null   object 
 31  [Romanian leu ]           5637 non-null   float64
 32  [Russian rouble ]         5699 non-null   object 
 33  [Swedish krona ]          5699 non-null   object 
 34  [Singapore dollar ]       5699 non-null   object 
 35  [Slovenian tolar ]        2085 non-null   object 
 36  [Slovak koruna ]          2608 non-null   object 
 37  [Thai baht ]              5699 non-null   object 
 38  [Turkish lira ]           5637 non-null   float64
 39  [US dollar ]              5699 non-null   object 
 40  [South African rand ]     5699 non-null   object 
dtypes: float64(3), object(38)
memory usage: 1.8+ MB

Data Set - Clean Up 1¶

To be able to work the the data more easily I'm going to change a a few column names and convert some values into different data types.

time column will contain datetime values

In [4]:

# rename the columns - I prefer all lowercase and snakecase
new_col_names={'[US dollar ]':'us_dollar',
              'Period\\Unit:':'time'}
x_rates=x_rates.rename(columns=new_col_names)

#convert time string to datetime
x_rates['time'] = pd.to_datetime(x_rates['time'])

#sort the rows based on the time and re-index from the top
x_rates.sort_values('time', inplace=True)
x_rates.reset_index(drop=True, inplace=True)

Data Set - Clean Up 2¶

I want to make changes so that:

us_dollar column will contain float values

Some rows have '-' as a value which need to be removed from the data set.

I noticed the rows had '-' values because of the error traceback when trying to convert the entire column to float.

I checked for placeholder values using value_count() of the us_dollar series and found 62 occurances.

Another good check, though, is the describe method. In the end I only included this check because it resulted in a more descriptive before and after column comparision.

In [5]:

# check for dummy data: '-' used for missing values
print("USD column values BEFORE")
print(x_rates['us_dollar'].describe())
print()

# remove the dummy data and create a dataframe of just time and us_dollar
x_usd = x_rates.loc[x_rates['us_dollar'] != '-', ["time", "us_dollar"]].copy()
x_usd['us_dollar'] = x_usd['us_dollar'].astype(float)

#check the result
print("USD column values AFTER")
print(x_usd['us_dollar'].describe())

USD column values BEFORE
count     5699
unique    3528
top          -
freq        62
Name: us_dollar, dtype: object

USD column values AFTER
count    5637.000000
mean        1.199704
std         0.162328
min         0.825200
25%         1.098200
50%         1.202300
75%         1.320300
max         1.599000
Name: us_dollar, dtype: float64

Disclaimer on Variable Names¶

Because I tried to work ahead of the guidance I ended up with different variable names.

I am using my names if they are accurately descriptive and shorter ("Thank you!" -my wrists).

In this analysis x_usd is equivalent to dataframe euro_to_dollar in the instructions

Data Visualizations¶

Initial Plot Line¶

To start I will plot the us_dollar exchange rate against each time value (date).

In [6]:

import matplotlib.pyplot as plt
%matplotlib inline

plt.plot(x_usd['time'], x_usd['us_dollar'])
plt.show()

Plot Line of the Rolling Mean¶

Plotting each of the data points results in a rugged plot line. To smooth the plot line and increase the data-ink ratio, I will apply a rolling mean of 30 days. This will facilitate longer-term trends to emerge from the daily fluctuations.

In [7]:

x_usd['rolling_mean'] = x_usd['us_dollar'].rolling(30).mean()

plt.plot(x_usd['time'], x_usd['rolling_mean'])
plt.show()

USD - EUR vs CAD - EUR Exchange Rate Comparision¶

Since the dataset includes multiple currencies, I would like to compare the historical USD-EUR exchange rates with CAD-EUR and visualize how closely the CAD performace follows the USD performance.

This is the focus of the data story I am interested in sharing insights about.

CAD - EUR Data Prep¶

This dataset will be prepared similarly to the USD data set.

In [8]:

new_col_names={'[Canadian dollar ]':'ca_dollar'}
x_rates=x_rates.rename(columns=new_col_names)

print("CAD column values BEFORE")
print(x_rates['ca_dollar'].describe())
print()

# remove the dummy data and create a dataframe of just time and us_dollar
x_cad = x_rates.loc[x_rates['ca_dollar'] != '-', ["time", "ca_dollar"]].copy()
x_cad['ca_dollar'] = x_cad['ca_dollar'].astype(float)

#heck the result
print("CAD column values AFTER")
print(x_cad['ca_dollar'].describe())

CAD column values BEFORE
count     5699
unique    2931
top          -
freq        62
Name: ca_dollar, dtype: object

CAD column values AFTER
count    5637.000000
mean        1.469395
std         0.099561
min         1.213900
25%         1.398700
50%         1.467600
75%         1.545300
max         1.812300
Name: ca_dollar, dtype: float64

CAD - EUR Rolling Mean Plot Line¶

The initial plot line of the CAD exchange rate rolling mean has a different shape than the USD plot. With the exception of a matching dip in 2002, no clear correlation is obvious.

In [9]:

x_cad['rolling_mean'] = x_cad['ca_dollar'].rolling(30).mean()

plt.plot(x_cad['time'], x_cad['rolling_mean'])
plt.show()

USD - and CAD - EUR Exchange Rates Plotted Together¶

Initial focus is to compare the shape of each plot. To visualize this I'm going to superimpose the separate plots on the same plot.

In [10]:

plt.plot(x_usd['time'],x_usd['rolling_mean'], label="USD", color='green')
plt.plot(x_cad['time'],x_cad['rolling_mean'], label='CAD', color='red')
plt.legend(title="Destination\nCurrency", loc='lower center', bbox_to_anchor=(0.515, 0.05))
plt.xlabel("Time")
plt.ylabel("Exchange Rate From EUR")
plt.title("On Par EUR to USD and CAD Exchange Rates 2008-2012")

#x coordinate needs to be a datetime object to match x-axis values
# the rest of values came by fiddling
plt.axvline(x=pd.to_datetime("2010-06-01"), ymin=0.595, ymax=0.72, linewidth=80, color='grey', alpha=0.2)
plt.show()

Gestalt Principles on Axes Plots¶

OK well that was fun but now I need to redo it all as an axes to be able to remove the frame / border

In [11]:

fig, (ax, ax2) = plt.subplots(figsize=(6,8), nrows=2, ncols=1)

ax.plot(x_usd['time'],x_usd['rolling_mean'], label="USD", color='green')
ax.plot(x_cad['time'],x_cad['rolling_mean'], label='CAD', color='red')
ax.legend(title="Destination\nCurrency", loc='lower center', bbox_to_anchor=(0.515, 0.05), frameon=False, title_fontsize="small")

remove = ["top", "right"]
for side in remove :
    ax.spines[side].set_visible(False)

# failed attempt at changing the date ticks ... 
# abandoned due to lack of need
# year_ticks = [pd.to_datetime("2000-01-01"), 
#               pd.to_datetime("2004-01-01"), 
#               pd.to_datetime("2008-01-01"), 
#               pd.to_datetime("2012-01-01"), 
#               pd.to_datetime("2016-01-01"), 
#               pd.to_datetime("2012-01-01")]
#ax.set_xticks(year_ticks)

ax.set_xlabel("Time")
ax.set_ylabel("Exchange Rate From EUR")
#x coordinate needs to be a datetime object to match x-axis values
# the rest of values came by fiddling
ax.axvline(x=pd.to_datetime("2010-06-01"), ymin=0.595, ymax=0.72, linewidth=80, color='grey', alpha=0.2)
ax.set_title("On Par EUR to USD and CAD Exchange Rates 2008-2013")

Out[11]:

Text(0.5, 1.0, 'On Par EUR to USD and CAD Exchange Rates 2008-2013')

Examining Plot Shapes¶

I was originally comparing shape when I discovered this anomaly of EUR-CAD rate being on par with the EUR-USD rate.

To look at the shape similarities outside of the abnormal 2008-2012 period, I will re-baseline the CAD values so that the lowest CAD rate matches the lowest USD rate.

I am doing this in a very simple way - I'm taking the difference between the minimum values of the USD and CAD rates and adding it to all the CAD rates. The result will be all the CAD rate values being shifted up.

The overlapping plots may reveal something interesting about the usual correlation between these EUR exchange rate fluctuations.

In [12]:

min_usd = x_usd["rolling_mean"].min()
min_cad = x_cad["rolling_mean"].min()
#normally cad return is higher than usd return when changing euros
min_diff = round(min_cad-min_usd,2)
x_cad["baselined"] = x_cad["rolling_mean"] - min_diff
x_cad["baselined"].describe()

Out[12]:

count    5608.000000
mean        1.088403
std         0.096622
min         0.851237
25%         1.018259
50%         1.086870
75%         1.166223
max         1.360690
Name: baselined, dtype: float64

In [13]:

x_usd["rolling_mean"].describe()

Out[13]:

count    5608.000000
mean        1.199748
std         0.161895
min         0.849797
25%         1.101631
50%         1.204373
75%         1.319978
max         1.574333
Name: rolling_mean, dtype: float64

Create Subsets for Plotting¶

In [14]:

#start1 = x_usd['time'].iloc[0]
end1 = pd.to_datetime("2007-01-01")
start2 = pd.to_datetime("2014-01-01")
#end2 = x_usd['time'].iloc[-1]

#first section
sub_usd1 = x_usd.copy()
sub_usd1 = sub_usd1[sub_usd1['time'] <= end1]
sub_cad1 = x_cad.copy()
sub_cad1 = sub_cad1[sub_cad1['time'] <= end1]

#last section
sub_usd2 = x_usd.copy()
sub_usd2 = sub_usd2[sub_usd2['time'] >= start2]
sub_cad2 = x_cad.copy()
sub_cad2 = sub_cad2[sub_cad2['time'] >= start2]

#middle section
sub_usd3 = x_usd.copy()
sub_usd3 = sub_usd3[sub_usd3['time'] >= end1]
sub_usd3 = sub_usd3[sub_usd3['time'] <= start2]
sub_cad3 = x_cad.copy()
sub_cad3 = sub_cad3[sub_cad3['time'] >= end1]
sub_cad3 = sub_cad3[sub_cad3['time'] <= start2]

In [17]:

fig, (ax, ax2) = plt.subplots(figsize=(6,8), nrows=2, ncols=1)

ax.plot(x_usd['time'],x_usd['rolling_mean'], 
#        label="USD", 
        color='green', alpha=0.2)
ax.plot(x_cad['time'],x_cad['rolling_mean'], 
#        label='CAD', 
        color='red', alpha=0.2)

ax.plot(sub_cad3['time'], sub_cad3['rolling_mean'],
         label="CAD", 
         color='red', alpha=0.8)

ax.plot(sub_usd3['time'], sub_usd3['rolling_mean'], 
         label="USD", 
         color='green', alpha=0.8)

ax.legend(title="Destination\nCurrency", 
          loc='lower center', 
          bbox_to_anchor=(0.515, 0.05), 
          #frameon=False, 
          framealpha=0.8, 
          title_fontsize="small")

remove = ["top", "right"]
for side in remove :
    ax.spines[side].set_visible(False)

#ax.set_xlabel("Time")
ax.set_ylabel("Exchange Rate From EUR")

#ax.axvline(x=pd.to_datetime("2010-06-01"), ymin=0.595, ymax=0.72, linewidth=80, color='grey', alpha=0.2)
ax.set_title("On Par EUR to USD and CAD Exchange Rates 2008-2013")

ax2.plot(x_usd['time'],x_usd['rolling_mean'], 
         color='green', alpha=0.2)
ax2.plot(x_cad['time'],x_cad['baselined'], 
         color='red', alpha=0.2)

ax2.plot(sub_usd1['time'], sub_usd1['rolling_mean'], 
         label="USD", 
         color='green', alpha=0.8)
ax2.plot(sub_usd2['time'], sub_usd2['rolling_mean'], 
         color='green', alpha=0.8)

ax2.plot(sub_cad1['time'], sub_cad1['baselined'],
         label="CAD Baselined", 
         color='red', alpha=0.8)
ax2.plot(sub_cad2['time'], sub_cad2['baselined'], 
         color='red', alpha=0.8)

ax2.legend(title="Destination\nCurrency", 
           bbox_to_anchor=(0.515, 0.85), 
           loc="upper center",
           title_fontsize="small", 
           facecolor="white", 
           framealpha=0.8, 
           fontsize="small")

remove = ["top", "right"]
for side in remove :
    ax2.spines[side].set_visible(False)

ax2.set_xlabel("Time")
ax2.set_ylabel("Exchange Rate From EUR")
#ax2.axvline(x=pd.to_datetime("2010-06-01"), 
#            ymin=0.05, ymax=.85, 
#            linewidth=94, 
#            color='grey', alpha=0.2)
ax2.set_title("USD and CAD Exchange Rates Otherwise Trend Together")
ax2.set_yticks([.9,1.1,1.3,1.5])
ax2.set_yticklabels([])

plt.plot()

Out[17]:

[]

Conclusions¶

The EUR - USD rate is typically stronger than the EUR - CAD rate. Nevertheless, the EUR - CAD rate tends to fluctuate in parallel with the EUR - USD. This was visually depicted by moving the entire EUR-CAD plot up to superimpose it over the EUR - USD exchange rates.

An exception to their fluctuations being just in parallel occurred in 2008 when the US Housing Market crashed and during the restabiliztion years after.

For this period the EUR to USD rate significantly dropped to nearly match the EUR to CAD rate with the exception of a short-lived strong rebound in the immediate wake of the crisis.

Areas for Improvement¶

Improve application of Gestalt principles, particularly whether enclosure is needed to isolate the area of interest on the graphs
Figure out why ax.text() method would prevent output and replace .title() with a title using ax.text() method
Add space between the 2 plots
Reconsider how to baseline the CAD data against the USD. Might involve creating USD baseline values
Cut the top off the 2nd plot where only faint points reach these values. There is too much dead space there right now.