Author: Mariya Mansurova, Analyst & developer in Yandex.Metrics team. Translated by Ivan Zakharov, ML enthusiast.
This material is subject to the terms and conditions of the Creative Commons CC BY-NC-SA 4.0 license. Free use is permitted for any non-commercial purpose.
Same assignment as a Kaggle Kernel + solution.
Fill cells marked with "Your code here" and submit your answers to the questions through the web form.
import os
import warnings
warnings.filterwarnings("ignore")
import numpy as np
import pandas as pd
import requests
from plotly import __version__
from plotly import graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot, plot
print(__version__) # need 1.9.0 or greater
init_notebook_mode(connected=True)
def plotly_df(df, title=""):
    data = []
    for column in df.columns:
        trace = go.Scatter(x=df.index, y=df[column], mode="lines", name=column)
        data.append(trace)
    layout = dict(title=title)
    fig = dict(data=data, layout=layout)
    iplot(fig, show_link=False)
df = pd.read_csv("../../data/wiki_machine_learning.csv", sep=" ")
df = df[df["count"] != 0]
df.head()
df.shape
We will train on the first 5 months and predict the number of page views for June.
df.date = pd.to_datetime(df.date)
plotly_df(df.set_index("date")[["count"]])
from fbprophet import Prophet
predictions = 30
df = df[["date", "count"]]
df.columns = ["ds", "y"]
df.tail()
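One way to set aside the last points for evaluation is a positional split on the sorted frame. The sketch below is only an illustration on a synthetic stand-in: the `demo` frame and the `holdout`, `train_df`, and `test_df` names are assumptions, not part of the assignment, and `holdout` plays the role of `predictions` above.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data: 100 daily points in the ds/y layout.
demo = pd.DataFrame(
    {
        "ds": pd.date_range("2015-01-01", periods=100, freq="D"),
        "y": np.arange(100, dtype=float),
    }
)

holdout = 30  # hypothetical name; same role as `predictions` in the notebook

# Rows are assumed to be sorted by date, so a positional split is chronological.
train_df = demo.iloc[:-holdout]  # everything except the last 30 rows
test_df = demo.iloc[-holdout:]   # the last 30 rows, kept for evaluation

print(len(train_df), len(test_df))
print(train_df["ds"].max() < test_df["ds"].min())
```

A model fitted on `train_df` alone never sees the held-out rows, so comparing its forecast against `test_df` gives an honest error estimate.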
Question 1: What is the prediction of the number of views of the wiki page on January 20? Round to the nearest integer.
# Your code here
Estimate the quality of the prediction with the last 30 points.
# Your code here
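For reference, the two error metrics asked about below can be written in a few lines of NumPy. This is a minimal sketch on made-up numbers, not the assignment's solution; the helper names `mae` and `mape` and the toy arrays are assumptions. MAPE divides by the actual values, so it is only defined when none of them is zero.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error, in the same units as the series."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent. Requires nonzero y_true."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)


# Toy values for illustration only.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]

print(mae(actual, forecast))   # absolute errors are 10, 20, 0
print(mape(actual, forecast))  # relative errors are 10%, 10%, 0%
```

In the assignment these helpers would be applied to the last 30 actual values and the matching 30 forecast values.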
Question 2: What is MAPE equal to?
Question 3: What is MAE equal to?
%matplotlib inline
import matplotlib.pyplot as plt
import statsmodels.api as sm
from scipy import stats
plt.rcParams["figure.figsize"] = (15, 10)
Question 4: Let's verify the stationarity of the series using the Dickey-Fuller test. Is the series stationary? What is the p-value?
# Your code here
Next, we turn to the construction of the SARIMAX model (sm.tsa.statespace.SARIMAX).
Question 5: What parameters are the best for the model according to the AIC criterion?
# Your code here
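Selecting by AIC usually means fitting every candidate parameter combination and keeping the one with the lowest score. The sketch below shows only that selection loop; `candidate_orders`, `select_by_aic`, and the toy scoring function are hypothetical names introduced here, and the parameter ranges are arbitrary. With a real series, the scoring function might wrap something like `sm.tsa.statespace.SARIMAX(y, order=..., seasonal_order=...).fit(disp=False).aic`.

```python
from itertools import product


def candidate_orders(ps, ds, qs):
    """All (p, d, q) combinations from the given ranges."""
    return list(product(ps, ds, qs))


def select_by_aic(candidates, fit):
    """Return (best_params, best_aic), where fit(params) returns an AIC value.

    Candidates whose fit raises an exception are skipped, since some
    parameter combinations can fail to estimate.
    """
    best_params, best_aic = None, float("inf")
    for params in candidates:
        try:
            aic = fit(params)
        except Exception:
            continue
        if aic < best_aic:
            best_params, best_aic = params, aic
    return best_params, best_aic


def toy_fit(params):
    # Stand-in for a real model fit: a made-up score minimised at (1, 0, 2).
    p, d, q = params
    return (p - 1) ** 2 + d + (q - 2) ** 2


grid = candidate_orders(range(3), [0, 1], range(3))
best_params, best_aic = select_by_aic(grid, toy_fit)
print(len(grid), best_params, best_aic)
```

Passing the fit as a function keeps the search loop independent of the model, so the same loop works once seasonal orders are added to each candidate.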