## mlcourse.ai – Open Machine Learning Course

Author: Mariya Mansurova, Analyst & developer in Yandex.Metrics team. Translated by Ivan Zakharov, ML enthusiast.
This material is subject to the terms and conditions of the Creative Commons CC BY-NC-SA 4.0 license. Free use is permitted for any non-commercial purpose.

# Assignment #9 (demo)

## Time series analysis

Same assignment as a Kaggle Kernel + solution.

Fill cells marked with "Your code here" and submit your answers to the questions through the web form.

In :
import os
import warnings

warnings.filterwarnings("ignore")
import numpy as np
import pandas as pd
import requests
from plotly import __version__
from plotly import graph_objs as go
from plotly.offline import init_notebook_mode, iplot

print(__version__)  # need 1.9.0 or greater
init_notebook_mode(connected=True)


def plotly_df(df, title=""):
    """Plot every column of df as a line against the index."""
    data = []

    for column in df.columns:
        trace = go.Scatter(x=df.index, y=df[column], mode="lines", name=column)
        data.append(trace)

    layout = dict(title=title)
    fig = dict(data=data, layout=layout)
    iplot(fig, show_link=False)  # render the figure inline in the notebook

3.2.1


## Data preparation

In :
df = pd.read_csv("../../data/wiki_machine_learning.csv", sep=" ")
df = df[df["count"] != 0]  # drop days with zero recorded views
df.head()

Out:
| | date | count | lang | page | rank | month | title |
|---|---|---|---|---|---|---|---|
| 81 | 2015-01-01 | 1414 | en | Machine_learning | 8708 | 201501 | Machine_learning |
| 80 | 2015-01-02 | 1920 | en | Machine_learning | 8708 | 201501 | Machine_learning |
| 79 | 2015-01-03 | 1338 | en | Machine_learning | 8708 | 201501 | Machine_learning |
| 78 | 2015-01-04 | 1404 | en | Machine_learning | 8708 | 201501 | Machine_learning |
| 77 | 2015-01-05 | 2264 | en | Machine_learning | 8708 | 201501 | Machine_learning |

In :
df.shape

Out:
(383, 7)

## Predicting with FB Prophet

We will hold out the last 30 days, train on the rest of the series, and predict the number of page views for the held-out days.

In :
df.date = pd.to_datetime(df.date)

In :
plotly_df(df.set_index("date")[["count"]])

In :
from fbprophet import Prophet  # releases from 1.0 on ship as `prophet`: from prophet import Prophet

In :
predictions = 30

df = df[["date", "count"]]
df.columns = ["ds", "y"]
df.tail()

Out:
| | ds | y |
|---|---|---|
| 382 | 2016-01-16 | 1644 |
| 381 | 2016-01-17 | 1836 |
| 376 | 2016-01-18 | 2983 |
| 375 | 2016-01-19 | 3389 |
| 372 | 2016-01-20 | 3559 |
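
One way to set up the evaluation is to keep the last `predictions` rows aside and fit only on the earlier ones. The sketch below uses a synthetic frame with the same `ds`/`y` column names. The helper name `split_holdout` and the generated values are illustrative assumptions, not part of the assignment data or its reference solution.

```python
import numpy as np
import pandas as pd


def split_holdout(frame, n_holdout):
    """Return (train, holdout); holdout is the last n_holdout rows by date."""
    if n_holdout <= 0:
        raise ValueError("n_holdout must be positive")
    ordered = frame.sort_values("ds").reset_index(drop=True)
    train = ordered.iloc[:-n_holdout]
    holdout = ordered.iloc[-n_holdout:]
    return train, holdout


# Synthetic daily series standing in for the real page-view counts.
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    {
        "ds": pd.date_range("2015-01-01", periods=100, freq="D"),
        "y": rng.integers(1000, 4000, size=100),
    }
)

train_df, holdout_df = split_holdout(demo, n_holdout=30)
print(len(train_df), len(holdout_df))
print(train_df["ds"].max() < holdout_df["ds"].min())
```

Sorting by `ds` before slicing matters here because the row index of the real frame is not in chronological order, as the `tail()` output shows.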

Question 1: What is the prediction of the number of views of the wiki page on January 20? Round to the nearest integer.

• 4947
• 3426
• 5229
• 2744
In :
# Your code here


Estimate the quality of the prediction with the last 30 points.

In :
# Your code here


Question 2: What is MAPE equal to?

• 34.5
• 42.42
• 5.39
• 65.91

Question 3: What is MAE equal to?

• 355
• 4007
• 600
• 903
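
For reference, both metrics in Questions 2 and 3 can be computed directly from the actual and predicted values. The sketch below is a minimal version with made-up numbers. The function names `mae` and `mape` are our own, and MAPE is reported in percent, which is an assumption about the intended scale.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error, in the units of the series."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent. Undefined if y_true has zeros."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100


# Made-up values, not the assignment data.
actual = [100, 200, 400]
forecast = [110, 180, 400]

print(mae(actual, forecast))   # 10.0
print(mape(actual, forecast))  # about 6.67
```

Because MAPE divides by the actual values, the zero-count days removed during data preparation would otherwise make it undefined.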

## Predicting with ARIMA

In :
%matplotlib inline
import matplotlib.pyplot as plt
import statsmodels.api as sm
from scipy import stats

plt.rcParams["figure.figsize"] = (15, 10)


Question 4: Let's verify the stationarity of the series using the Dickey-Fuller test. Is the series stationary? What is the p-value?

• Series is stationary, p_value = 0.107
• Series is not stationary, p_value = 0.107
• Series is stationary, p_value = 0.001
• Series is not stationary, p_value = 0.001
In :
# Your code here


Next, we turn to the construction of the SARIMAX model (sm.tsa.statespace.SARIMAX).
Question 5: What parameters are the best for the model according to the AIC criterion?

• D = 1, d = 0, Q = 0, q = 2, P = 3, p = 1
• D = 2, d = 1, Q = 1, q = 2, P = 3, p = 1
• D = 1, d = 1, Q = 1, q = 2, P = 3, p = 1
• D = 0, d = 0, Q = 0, q = 2, P = 3, p = 1
In :
# Your code here