This notebook explains how to create a moving average in pandas
.
This notebook will use gold and silver price data from rdatasets
for this tutorial
The documentation for each package used in this tutorial is linked below:
import statsmodels.api as sm
import pandas as pd
The data is from rdatasets
imported using the Python package statsmodels
.
df = sm.datasets.get_rdataset('GoldSilver', 'AER').data.reset_index().rename(columns={'index': 'date'})
df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 9132 entries, 0 to 9131 Data columns (total 3 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 date 9132 non-null object 1 gold 9132 non-null float64 2 silver 9132 non-null float64 dtypes: float64(2), object(1) memory usage: 214.2+ KB
df['date'] = pd.to_datetime(df.date)
To create a moving average, a rolling window first needs to be created using the pandas
function rolling
. Then any aggregation function, sum
, mean
, std
, etc.
df.sort_values('date', inplace=True)
df['silver_moving_average_5'] = df['silver'].rolling(5).mean()
df.head(10).tail()
date | gold | silver | silver_moving_average_5 | |
---|---|---|---|---|
5 | 1978-01-06 | 100.00 | 227.19 | 226.604 |
6 | 1978-01-09 | 101.23 | 229.62 | 227.844 |
7 | 1978-01-10 | 100.95 | 228.97 | 227.670 |
8 | 1978-01-11 | 102.25 | 231.22 | 228.998 |
9 | 1978-01-12 | 100.88 | 227.89 | 228.978 |
Alternatively, a rolling window could be created and multiple aggregations applied to it.
rolling_5 = df.rolling(5)
df['silver_moving_average_5_two'] = rolling_5['silver'].mean()
df['silver_moving_sum_5'] = rolling_5['silver'].sum()
df.head(10).tail()
date | gold | silver | silver_moving_average_5 | silver_moving_average_5_two | silver_moving_sum_5 | |
---|---|---|---|---|---|---|
5 | 1978-01-06 | 100.00 | 227.19 | 226.604 | 226.604 | 1133.02 |
6 | 1978-01-09 | 101.23 | 229.62 | 227.844 | 227.844 | 1139.22 |
7 | 1978-01-10 | 100.95 | 228.97 | 227.670 | 227.670 | 1138.35 |
8 | 1978-01-11 | 102.25 | 231.22 | 228.998 | 228.998 | 1144.99 |
9 | 1978-01-12 | 100.88 | 227.89 | 228.978 | 228.978 | 1144.89 |
This creates a moving average based on the last five observations, but rolling
can also take an offset to specify the time to use in the rolling window. For example, 5D can be used as an offset for 5 days.
First, a datetime index must be created from date.
df.set_index('date', inplace=True)
rolling_5d = df.rolling('5D')
df['silver_moving_average_5D'] = rolling_5d['silver'].mean()
df['silver_moving_sum_5D'] = rolling_5d['silver'].sum()
df.head(10).tail()
gold | silver | silver_moving_average_5 | silver_moving_average_5_two | silver_moving_sum_5 | silver_moving_average_5D | silver_moving_sum_5D | |
---|---|---|---|---|---|---|---|
date | |||||||
1978-01-06 | 100.00 | 227.19 | 226.604 | 226.604 | 1133.02 | 226.604000 | 1133.02 |
1978-01-09 | 101.23 | 229.62 | 227.844 | 227.844 | 1139.22 | 228.266667 | 684.80 |
1978-01-10 | 100.95 | 228.97 | 227.670 | 227.670 | 1138.35 | 228.593333 | 685.78 |
1978-01-11 | 102.25 | 231.22 | 228.998 | 228.998 | 1144.99 | 229.936667 | 689.81 |
1978-01-12 | 100.88 | 227.89 | 228.978 | 228.978 | 1144.89 | 229.425000 | 917.70 |