#!/usr/bin/env python
# coding: utf-8

# # About this Notebook
# 
# Temporal Regularized Matrix Factorization (TRMF) is an effective tool for imputing missing data within a given multivariate time series and forecasting time series with missing values. This approach is from the following literature:
# 
# > Hsiang-Fu Yu, Nikhil Rao, Inderjit S. Dhillon, 2016. [**Temporal regularized matrix factorization for high-dimensional time series prediction**](http://www.cs.utexas.edu/~rofuyu/papers/tr-mf-nips.pdf). 30th Conference on Neural Information Processing Systems (*NIPS 2016*), Barcelona, Spain.
# 
# **Acknowledgement**: We would like to thank
# 
# - Antony Masso Lussier (HEC Montreal)
# 
# for providing helpful suggestion and discussion. Thank you!

# ## Quick Run
# 
# This notebook is publicly available for any usage at our data imputation project. Please click [**transdim**](https://github.com/xinychen/transdim).
# 

# ## Data Organization: Matrix Structure
# 
# In this post, we consider a dataset of $m$ discrete time series $\boldsymbol{y}_{i}\in\mathbb{R}^{f},i\in\left\{1,2,...,m\right\}$. The time series may have missing elements. We express spatio-temporal dataset as a matrix $Y\in\mathbb{R}^{m\times f}$ with $m$ rows (e.g., locations) and $f$ columns (e.g., discrete time intervals),
# 
# $$Y=\left[ \begin{array}{cccc} y_{11} & y_{12} & \cdots & y_{1f} \\ y_{21} & y_{22} & \cdots & y_{2f} \\ \vdots & \vdots & \ddots & \vdots \\ y_{m1} & y_{m2} & \cdots & y_{mf} \\ \end{array} \right]\in\mathbb{R}^{m\times f}.$$
# 

# 
# ## TRMF model
# 
# Temporal Regularized Matrix Factorization (TRMF) is an approach to incorporate temporal dependencies into commonly-used matrix factorization model. The temporal dependencies are described among ${\boldsymbol{x}_t}$ explicitly. Such approach takes the form:
# 
# $$\boldsymbol{x}_{t}\approx\sum_{l\in\mathcal{L}}\boldsymbol{\theta}_{l}\circledast\boldsymbol{x}_{t-l},$$
# where this autoregressive (AR) is specialized by a lag set $\mathcal{L}=\left\{l_1,l_2,...,l_d\right\}$ (e.g., $\mathcal{L}=\left\{1,2,144\right\}$) and weights $\boldsymbol{\theta}_{l}\in\mathbb{R}^{r},\forall l$, and we further define
# 
# $$\mathcal{R}_{AR}\left(X\mid \mathcal{L},\Theta,\eta\right)=\frac{1}{2}\sum_{t=l_d+1}^{f}\left(\boldsymbol{x}_{t}-\sum_{l\in\mathcal{L}}\boldsymbol{\theta}_{l}\circledast\boldsymbol{x}_{t-l}\right)^\top\left(\boldsymbol{x}_{t}-\sum_{l\in\mathcal{L}}\boldsymbol{\theta}_{l}\circledast\boldsymbol{x}_{t-l}\right)+\frac{\eta}{2}\sum_{t=1}^{f}\boldsymbol{x}_{t}^\top\boldsymbol{x}_{t}.$$
# 
# Thus, TRMF-AR is given by solving
# 
# $$\min_{W,X,\Theta}\frac{1}{2}\underbrace{\sum_{(i,t)\in\Omega}\left(y_{it}-\boldsymbol{w}_{i}^T\boldsymbol{x}_{t}\right)^2}_{\text{sum of squared residual errors}}+\lambda_{w}\underbrace{\mathcal{R}_{w}\left(W\right)}_{W-\text{regularizer}}+\lambda_{x}\underbrace{\mathcal{R}_{AR}\left(X\mid \mathcal{L},\Theta,\eta\right)}_{\text{AR-regularizer}}+\lambda_{\theta}\underbrace{\mathcal{R}_{\theta}\left(\Theta\right)}_{\Theta-\text{regularizer}}$$
# 
# where $\mathcal{R}_{w}\left(W\right)=\frac{1}{2}\sum_{i=1}^{m}\boldsymbol{w}_{i}^\top\boldsymbol{w}_{i}$ and $\mathcal{R}_{\theta}\left(\Theta\right)=\frac{1}{2}\sum_{l\in\mathcal{L}}\boldsymbol{\theta}_{l}^\top\boldsymbol{\theta}_{l}$ are regularization terms.

# ### Define TRMF model with `Numpy`
# 
# Observing the optimization problem of TRMF model as mentioned above, we categorize the parameters within this model as **parameters** (i.e., `init_para` in the TRMF function) and **hyperparameters** (i.e., `init_hyper`).
# 
# - **Parameters** include spatial matrix $W$, temporal matrix $X$, and AR coefficients $\Theta$.
# - **Hyperparameters** include weight parameters on some regularizers, i.e., $\lambda_w$, $\lambda_x$, $\lambda_\theta$, and $\eta$.

# ### How to understand Python code of TRMF?
# 
# #### Update spatial matrix $W$
# 
# We write Python code for updating spatial matrix as follows,
# 
# ```python
# for i in range(dim1):
#     pos0 = np.where(sparse_mat[i, :] != 0)
#     Xt = X[pos0[0], :]
#     vec0 = Xt.T @ sparse_mat[i, pos0[0]]
#     mat0 = inv(Xt.T @ Xt + lambda_w * np.eye(rank))
#     W[i, :] = mat0 @ vec0
# ```
# 
# For your better understanding of these codes, let us see what happened in each line. Recall that the equation for updating $W$ is
# $$\boldsymbol{w}_{i} \Leftarrow\left(\sum_{t:(i, t) \in \Omega} \boldsymbol{x}_{t} \boldsymbol{x}_{t}^{T}+\lambda_{w} I\right)^{-1} \sum_{t:(i, t) \in \Omega} y_{i t} \boldsymbol{x}_{t}$$
# from the optimizization problem:
# $$\min _{W} \frac{1}{2} \underbrace{\sum_{(i, t) \in \Omega}\left(y_{i t}-\boldsymbol{w}_{i}^{T} \boldsymbol{x}_{t}\right)^{2}}_{\text {sum of squared residual errors }}+\frac{1}{2} \lambda_{w} \underbrace{\sum_{i=1}^{m} \boldsymbol{w}_{i}^{T} \boldsymbol{w}_{i}}_{\text{sum of squared entries}}.$$
# 
# As can be seen,
# 
# - `vec0 = Xt.T @ sparse_mat[i, pos0[0]])` corresponds to $$\sum_{t:(i, t) \in \Omega} y_{i t} \boldsymbol{x}_{t}.$$
# 
# - `mat0 = inv(Xt.T @ Xt + lambda_w * np.eye(rank))` corresponds to $$\left(\sum_{t:(i, t) \in \Omega} \boldsymbol{x}_{t} \boldsymbol{x}_{t}^{T}+\lambda_{w} I\right)^{-1}.$$
# 
# - `W[i, :] = mat0 @ vec0` corresponds to the update:
# $$\boldsymbol{w}_{i} \Leftarrow\left(\sum_{t:(i, t) \in \Omega} \boldsymbol{x}_{t} \boldsymbol{x}_{t}^{T}+\lambda_{w} I\right)^{-1} \sum_{t:(i, t) \in \Omega} y_{i t} \boldsymbol{x}_{t}.$$

# #### Update temporal matrix $X$
# 
# We write Python code for updating temporal matrix as follows,
# 
# ```python
# for t in range(dim2):
#     pos0 = np.where(sparse_mat[:, t] != 0)
#     Wt = W[pos0[0], :]
#     Mt = np.zeros((rank, rank))
#     Nt = np.zeros(rank)
#     if t < np.max(time_lags):
#         Pt = np.zeros((rank, rank))
#         Qt = np.zeros(rank)
#     else:
#         Pt = np.eye(rank)
#         Qt = np.einsum('ij, ij -> j', theta, X[t - time_lags, :])
#     if t < dim2 - np.min(time_lags):
#         if t >= np.max(time_lags) and t < dim2 - np.max(time_lags):
#             index = list(range(0, d))
#         else:
#             index = list(np.where((t + time_lags >= np.max(time_lags)) & (t + time_lags < dim2)))[0]
#         for k in index:
#             Ak = theta[k, :]
#             Mt += np.diag(Ak ** 2)
#             theta0 = theta.copy()
#             theta0[k, :] = 0
#             Nt += np.multiply(Ak, X[t + time_lags[k], :]
#                               - np.einsum('ij, ij -> j', theta0, X[t + time_lags[k] - time_lags, :]))
#     vec0 = Wt.T @ sparse_mat[pos0[0], t] + lambda_x * Nt + lambda_x * Qt
#     mat0 = inv(Wt.T @ Wt + lambda_x * Mt + lambda_x * Pt + lambda_x * eta * np.eye(rank))
#     X[t, :] = mat0 @ vec0
# ```

# These codes seem to be very complicated. Let us first see the optimization problem for getting a closed-form update of $X$:
# $$\min_{W,X,\Theta}\frac{1}{2}\underbrace{\sum_{(i,t)\in\Omega}\left(y_{it}-\boldsymbol{w}_{i}^T\boldsymbol{x}_{t}\right)^2}_{\text{sum of squared residual errors}}+\underbrace{\frac{1}{2}\lambda_{x}\sum_{t=l_d+1}^{f}\left(\boldsymbol{x}_{t}-\sum_{l\in\mathcal{L}}\boldsymbol{\theta}_{l}\circledast\boldsymbol{x}_{t-l}\right)^\top\left(\boldsymbol{x}_{t}-\sum_{l\in\mathcal{L}}\boldsymbol{\theta}_{l}\circledast\boldsymbol{x}_{t-l}\right)+\frac{1}{2}\lambda_{x}\eta\sum_{t=1}^{f}\boldsymbol{x}_{t}^\top\boldsymbol{x}_{t}}_{\text{AR-term}}+\underbrace{\frac{1}{2}\lambda_{\theta}\sum_{l\in\mathcal{L}}\boldsymbol{\theta}_{l}^\top\boldsymbol{\theta}_{l}}_{\Theta-\text{term}}.$$
# 
# - For $t=1,...,l_d$, update of $X$ is
# $$\boldsymbol{x}_{t} \Leftarrow\left(\sum_{i:(i, t) \in \Omega} \boldsymbol{w}_{i} \boldsymbol{w}_{i}^{T}+\lambda_{x} \eta I\right)^{-1} \sum_{i:(i, t) \in \Omega} y_{i t} \boldsymbol{w}_{i}.$$
# - For $t=l_d+1,...,f$, update of $X$ is
# $${\boldsymbol{x}_{t}\Leftarrow\left(\sum_{i:(i,t)\in\Omega}\boldsymbol{w}_{i}\boldsymbol{w}_{i}^{T}+\lambda_xI+\lambda_x\sum_{h\in\mathcal{L},t+h \leq T}\text{diag}(\boldsymbol{\theta}_{h}\circledast\boldsymbol{\theta}_{h})+\lambda_x\eta I\right)^{-1}}{\left(\sum_{i:(i,t)\in\Omega}y_{it}\boldsymbol{w}_{i}+\lambda_x\sum_{l\in\mathcal{L}}\boldsymbol{\theta}_{l}\circledast\boldsymbol{x}_{t-l}+\lambda_x\sum_{h\in\mathcal{L},t+h \leq T}\boldsymbol{\theta}_{h}\circledast\boldsymbol{\psi}_{t+h}\right)}.$$
# 
# Then, as can be seen,
# 
# - `Mt += np.diag(Ak ** 2)` corresponds to $$\sum_{h\in\mathcal{L},t+h \leq T}\text{diag}(\boldsymbol{\theta}_{h}\circledast\boldsymbol{\theta}_{h}).$$
# 
# - `Nt += np.multiply(Ak, X[t + time_lags[k], :] - np.einsum('ij, ij -> j', theta0, X[t + time_lags[k] - time_lags, :]))` corresponds to $$\sum_{h\in\mathcal{L},t+h \leq T}\boldsymbol{\theta}_{h}\circledast\boldsymbol{\psi}_{t+h}.$$
# 
# - `Qt = np.einsum('ij, ij -> j', theta, X[t - time_lags, :])` corresponds to $$\sum_{l\in\mathcal{L}}\boldsymbol{\theta}_{l}\circledast\boldsymbol{x}_{t-l}.$$
# -`X[t, :] = mat0 @ vec0` corresponds to the update of $X$.

# #### Update AR coefficients $\Theta$
# 
# We write Python code for updating temporal matrix as follows,
# 
# ```python
# for k in range(d):
#     theta0 = theta.copy()
#     theta0[k, :] = 0
#     mat0 = np.zeros((dim2 - np.max(time_lags), rank))
#     for L in range(d):
#         mat0 += X[np.max(time_lags) - time_lags[L] : dim2 - time_lags[L] , :] @ np.diag(theta0[L, :])
#     VarPi = X[np.max(time_lags) : dim2, :] - mat0
#     var1 = np.zeros((rank, rank))
#     var2 = np.zeros(rank)
#     for t in range(np.max(time_lags), dim2):
#         B = X[t - time_lags[k], :]
#         var1 += np.diag(np.multiply(B, B))
#         var2 += np.diag(B) @ VarPi[t - np.max(time_lags), :]
#     theta[k, :] = inv(var1 + lambda_theta * np.eye(rank) / lambda_x) @ var2
# ```
# 
# For your better understanding of these codes, let us see what happened in each line. Recall that the equation for updating $\theta$ is
# $$
# \color{red} {\boldsymbol{\theta}_{h}\Leftarrow\left(\sum_{t=l_d+1}^{f}\text{diag}(\boldsymbol{x}_{t-h}\circledast \boldsymbol{x}_{t-h})+\frac{\lambda_{\theta}}{\lambda_x}I\right)^{-1}\left(\sum_{t=l_d+1}^{f}{\boldsymbol{\pi}_{t}^{h}}\circledast \boldsymbol{x}_{t-h}\right)}
# $$
# where $\boldsymbol{\pi}_{t}^{h}=\boldsymbol{x}_{t}-\sum_{l\in\mathcal{L},l\neq h}\boldsymbol{\theta}_{l}\circledast\boldsymbol{x}_{t-l}$ from the optimizization problem:
# $$
# \min_{\Theta}\frac{1}{2}\lambda_{x}\underbrace{\sum_{t=l_d+1}^{f}\left(\boldsymbol{x}_{t}-\sum_{l\in\mathcal{L}}\boldsymbol{\theta}_{l}\circledast\boldsymbol{x}_{t-l}\right)^\top\left(\boldsymbol{x}_{t}-\sum_{l\in\mathcal{L}}\boldsymbol{\theta}_{l}\circledast\boldsymbol{x}_{t-l}\right)}_{\text{sum of squared residual errors}}+\frac{1}{2}\lambda_{\theta}\underbrace{\sum_{l\in\mathcal{L}}\boldsymbol{\theta}_{l}^\top\boldsymbol{\theta}_{l}}_{\text{sum of squared entries}}
# $$
# 
# As can be seen,
# - `mat0 += X[np.max(time_lags) - time_lags[L] : dim2 - time_lags[L] , :] @ np.diag(theta0[L, :])` corresponds to $$\sum_{l\in\mathcal{L},l\neq h}\boldsymbol{\theta}_{l}\circledast\boldsymbol{x}_{t-l}$$.
# 
# - `var1 += np.diag(np.multiply(B, B))` corresponds to $$\sum_{t=l_d+1}^{f}\text{diag}(\boldsymbol{x}_{t-h}\circledast \boldsymbol{x}_{t-h}).$$
# 
# - `var2 += np.diag(B) @ VarPi[t - np.max(time_lags), :]` corresponds to $$\sum_{t=l_d+1}^{f}{\boldsymbol{\pi}_{t}^{h}}\circledast \boldsymbol{x}_{t-h}.$$

# In[1]:


import numpy as np
from numpy.linalg import inv as inv


def TRMF(dense_mat, sparse_mat, init_para, init_hyper, time_lags, maxiter):
    """Temporal Regularized Matrix Factorization, TRMF."""
    
    ## Initialize parameters
    W = init_para["W"]
    X = init_para["X"]
    theta = init_para["theta"]
    
    ## Set hyperparameters
    lambda_w = init_hyper["lambda_w"]
    lambda_x = init_hyper["lambda_x"]
    lambda_theta = init_hyper["lambda_theta"]
    eta = init_hyper["eta"]
    
    dim1, dim2 = sparse_mat.shape
    pos_train = np.where(sparse_mat != 0)
    pos_test = np.where((dense_mat != 0) & (sparse_mat == 0))
    binary_mat = sparse_mat.copy()
    binary_mat[pos_train] = 1
    d, rank = theta.shape
    
    for it in range(maxiter):
        ## Update spatial matrix W
        for i in range(dim1):
            pos0 = np.where(sparse_mat[i, :] != 0)
            Xt = X[pos0[0], :]
            vec0 = Xt.T @ sparse_mat[i, pos0[0]]
            mat0 = inv(Xt.T @ Xt + lambda_w * np.eye(rank))
            W[i, :] = mat0 @ vec0
        ## Update temporal matrix X
        for t in range(dim2):
            pos0 = np.where(sparse_mat[:, t] != 0)
            Wt = W[pos0[0], :]
            Mt = np.zeros((rank, rank))
            Nt = np.zeros(rank)
            if t < np.max(time_lags):
                Pt = np.zeros((rank, rank))
                Qt = np.zeros(rank)
            else:
                Pt = np.eye(rank)
                Qt = np.einsum('ij, ij -> j', theta, X[t - time_lags, :])
            if t < dim2 - np.min(time_lags):
                if t >= np.max(time_lags) and t < dim2 - np.max(time_lags):
                    index = list(range(0, d))
                else:
                    index = list(np.where((t + time_lags >= np.max(time_lags)) & (t + time_lags < dim2)))[0]
                for k in index:
                    Ak = theta[k, :]
                    Mt += np.diag(Ak ** 2)
                    theta0 = theta.copy()
                    theta0[k, :] = 0
                    Nt += np.multiply(Ak, X[t + time_lags[k], :]
                                      - np.einsum('ij, ij -> j', theta0, X[t + time_lags[k] - time_lags, :]))
            vec0 = Wt.T @ sparse_mat[pos0[0], t] + lambda_x * Nt + lambda_x * Qt
            mat0 = inv(Wt.T @ Wt + lambda_x * Mt + lambda_x * Pt + lambda_x * eta * np.eye(rank))
            X[t, :] = mat0 @ vec0
        ## Update AR coefficients theta
        for k in range(d):
            theta0 = theta.copy()
            theta0[k, :] = 0
            mat0 = np.zeros((dim2 - np.max(time_lags), rank))
            for L in range(d):
                mat0 += X[np.max(time_lags) - time_lags[L] : dim2 - time_lags[L] , :] @ np.diag(theta0[L, :])
            VarPi = X[np.max(time_lags) : dim2, :] - mat0
            var1 = np.zeros((rank, rank))
            var2 = np.zeros(rank)
            for t in range(np.max(time_lags), dim2):
                B = X[t - time_lags[k], :]
                var1 += np.diag(np.multiply(B, B))
                var2 += np.diag(B) @ VarPi[t - np.max(time_lags), :]
            theta[k, :] = inv(var1 + lambda_theta * np.eye(rank) / lambda_x) @ var2

        mat_hat = W @ X.T
        mape = np.sum(np.abs(dense_mat[pos_test] - mat_hat[pos_test]) 
                      / dense_mat[pos_test]) / dense_mat[pos_test].shape[0]
        rmse = np.sqrt(np.sum((dense_mat[pos_test] - mat_hat[pos_test]) ** 2)/dense_mat[pos_test].shape[0])
        
        if (it + 1) % 100 == 0:
            print('Iter: {}'.format(it + 1))
            print('Imputation MAPE: {:.6}'.format(mape))
            print('Imputation RMSE: {:.6}'.format(rmse))
            print()
    
    return mat_hat


# ## Missing Data Imputation
# 
# In the following, we apply the above defined TRMF function to the task of missing data imputation task on the following spatiotemporal multivariate time series datasets/matrices:
# 
# - **Guangzhou data set**: [Guangzhou urban traffic speed data set](https://doi.org/10.5281/zenodo.1205228).
# - **Birmingham data set**: [Birmingham parking data set](https://archive.ics.uci.edu/ml/datasets/Parking+Birmingham).
# - **Hangzhou data set**: [Hangzhou metro passenger flow data set](https://doi.org/10.5281/zenodo.3145403).
# - **Settle data set**: [Seattle freeway traffic speed data set](https://github.com/zhiyongc/Seattle-Loop-Data).
# 
# The original data sets have been adapted into our experiments, and it is now available at the fold of `datasets`.

# ## Evaluation on Guangzhou Speed Data
# 
# **Scenario setting**:
# 
# - Tensor size: $214\times 61\times 144$ (road segment, day, time of day)
# - Non-random missing (NM)
# - 40% missing rate
# 

# In[2]:


import scipy.io
import numpy as np
np.random.seed(1000)

dense_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')['tensor']
dim = dense_tensor.shape
missing_rate = 0.4 # Non-random missing (NM)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim[0], dim[1])[:, :, np.newaxis] + 0.5 - missing_rate)
dense_mat = dense_tensor.reshape([dim[0], dim[1] * dim[2]])
sparse_mat = sparse_tensor.reshape([dim[0], dim[1] * dim[2]])
del dense_tensor, sparse_tensor


# **Model setting**:
# 
# - Low rank: 10
# - Time lags: {1, 2, 144}
# - The number of iterations: 200

# In[3]:


import time
start = time.time()
dim1, dim2 = sparse_mat.shape
rank = 10
time_lags = np.array([1, 2, 144])
d = time_lags.shape[0]
## Initialize parameters
W = 0.1 * np.random.rand(dim1, rank)
X = 0.1 * np.random.rand(dim2, rank)
theta = 0.1 * np.random.rand(d, rank)
init_para = {"W": W, "X": X, "theta": theta}
## Set hyparameters
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 0.03
init_hyper = {"lambda_w": lambda_w, "lambda_x": lambda_x, "lambda_theta": lambda_theta, "eta": eta}
maxiter = 200
TRMF(dense_mat, sparse_mat, init_para, init_hyper, time_lags, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))


# **Scenario setting**:
# 
# - Tensor size: $214\times 61\times 144$ (road segment, day, time of day)
# - Random missing (RM)
# - 40% missing rate
# 

# In[4]:


import scipy.io
import numpy as np
np.random.seed(1000)

dense_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')['tensor']
dim = dense_tensor.shape
missing_rate = 0.4 # Random missing (RM)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim[0], dim[1], dim[2]) + 0.5 - missing_rate)
dense_mat = dense_tensor.reshape([dim[0], dim[1] * dim[2]])
sparse_mat = sparse_tensor.reshape([dim[0], dim[1] * dim[2]])
del dense_tensor, sparse_tensor


# **Model setting**:
# 
# - Low rank: 80
# - Time lags: {1, 2, 144}
# - The number of iterations: 200

# In[5]:


import time
start = time.time()
dim1, dim2 = sparse_mat.shape
rank = 80
time_lags = np.array([1, 2, 144])
d = time_lags.shape[0]
## Initialize parameters
W = 0.1 * np.random.rand(dim1, rank)
X = 0.1 * np.random.rand(dim2, rank)
theta = 0.1 * np.random.rand(d, rank)
init_para = {"W": W, "X": X, "theta": theta}
## Set hyparameters
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 0.03
init_hyper = {"lambda_w": lambda_w, "lambda_x": lambda_x, "lambda_theta": lambda_theta, "eta": eta}
maxiter = 200
TRMF(dense_mat, sparse_mat, init_para, init_hyper, time_lags, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))


# **Scenario setting**:
# 
# - Tensor size: $214\times 61\times 144$ (road segment, day, time of day)
# - Random missing (RM)
# - 60% missing rate
# 

# In[6]:


import scipy.io
import numpy as np
np.random.seed(1000)

dense_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')['tensor']
dim = dense_tensor.shape
missing_rate = 0.6 # Random missing (RM)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim[0], dim[1], dim[2]) + 0.5 - missing_rate)
dense_mat = dense_tensor.reshape([dim[0], dim[1] * dim[2]])
sparse_mat = sparse_tensor.reshape([dim[0], dim[1] * dim[2]])
del dense_tensor, sparse_tensor


# **Model setting**:
# 
# - Low rank: 80
# - Time lags: {1, 2, 144}
# - The number of iterations: 200

# In[7]:


import time
start = time.time()
dim1, dim2 = sparse_mat.shape
rank = 80
time_lags = np.array([1, 2, 144])
d = time_lags.shape[0]
## Initialize parameters
W = 0.1 * np.random.rand(dim1, rank)
X = 0.1 * np.random.rand(dim2, rank)
theta = 0.1 * np.random.rand(d, rank)
init_para = {"W": W, "X": X, "theta": theta}
## Set hyparameters
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 0.03
init_hyper = {"lambda_w": lambda_w, "lambda_x": lambda_x, "lambda_theta": lambda_theta, "eta": eta}
maxiter = 200
TRMF(dense_mat, sparse_mat, init_para, init_hyper, time_lags, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))


# ## Evaluation on Hangzhou Flow Data
# 
# **Scenario setting**:
# 
# - Tensor size: $80\times 25\times 108$ (metro station, day, time of day)
# - Non-random missing (NM)
# - 40% missing rate
# 

# In[10]:


import scipy.io
import numpy as np
np.random.seed(1000)

dense_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')['tensor']
dim = dense_tensor.shape
missing_rate = 0.4 # Non-random missing (NM)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim[0], dim[1])[:, :, np.newaxis] + 0.5 - missing_rate)
dense_mat = dense_tensor.reshape([dim[0], dim[1] * dim[2]])
sparse_mat = sparse_tensor.reshape([dim[0], dim[1] * dim[2]])
del dense_tensor, sparse_tensor


# **Model setting**:
# 
# - Low rank: 30
# - Time lags: {1, 2, 108}
# - The number of iterations: 200

# In[11]:


import time
start = time.time()
dim1, dim2 = sparse_mat.shape
rank = 30
time_lags = np.array([1, 2, 108])
d = time_lags.shape[0]
## Initialize parameters
W = 0.1 * np.random.rand(dim1, rank)
X = 0.1 * np.random.rand(dim2, rank)
theta = 0.1 * np.random.rand(d, rank)
init_para = {"W": W, "X": X, "theta": theta}
## Set hyparameters
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 0.03
init_hyper = {"lambda_w": lambda_w, "lambda_x": lambda_x, "lambda_theta": lambda_theta, "eta": eta}
maxiter = 200
TRMF(dense_mat, sparse_mat, init_para, init_hyper, time_lags, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))


# **Scenario setting**:
# 
# - Tensor size: $80\times 25\times 108$ (metro station, day, time of day)
# - Random missing (RM)
# - 40% missing rate
# 

# In[12]:


import scipy.io
import numpy as np
np.random.seed(1000)

dense_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')['tensor']
dim = dense_tensor.shape
missing_rate = 0.4 # Random missing (RM)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim[0], dim[1], dim[2]) + 0.5 - missing_rate)
dense_mat = dense_tensor.reshape([dim[0], dim[1] * dim[2]])
sparse_mat = sparse_tensor.reshape([dim[0], dim[1] * dim[2]])
del dense_tensor, sparse_tensor


# **Model setting**:
# 
# - Low rank: 30
# - Time lags: {1, 2, 108}
# - The number of iterations: 200

# In[13]:


import time
start = time.time()
dim1, dim2 = sparse_mat.shape
rank = 30
time_lags = np.array([1, 2, 108])
d = time_lags.shape[0]
## Initialize parameters
W = 0.1 * np.random.rand(dim1, rank)
X = 0.1 * np.random.rand(dim2, rank)
theta = 0.1 * np.random.rand(d, rank)
init_para = {"W": W, "X": X, "theta": theta}
## Set hyparameters
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 0.03
init_hyper = {"lambda_w": lambda_w, "lambda_x": lambda_x, "lambda_theta": lambda_theta, "eta": eta}
maxiter = 200
TRMF(dense_mat, sparse_mat, init_para, init_hyper, time_lags, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))


# **Scenario setting**:
# 
# - Tensor size: $80\times 25\times 108$ (metro station, day, time of day)
# - Random missing (RM)
# - 60% missing rate
# 

# In[14]:


import scipy.io
import numpy as np
np.random.seed(1000)

dense_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')['tensor']
dim = dense_tensor.shape
missing_rate = 0.6 # Random missing (RM)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim[0], dim[1], dim[2]) + 0.5 - missing_rate)
dense_mat = dense_tensor.reshape([dim[0], dim[1] * dim[2]])
sparse_mat = sparse_tensor.reshape([dim[0], dim[1] * dim[2]])
del dense_tensor, sparse_tensor


# **Model setting**:
# 
# - Low rank: 30
# - Time lags: {1, 2, 108}
# - The number of iterations: 200

# In[15]:


import time
start = time.time()
dim1, dim2 = sparse_mat.shape
rank = 30
time_lags = np.array([1, 2, 108])
d = time_lags.shape[0]
## Initialize parameters
W = 0.1 * np.random.rand(dim1, rank)
X = 0.1 * np.random.rand(dim2, rank)
theta = 0.1 * np.random.rand(d, rank)
init_para = {"W": W, "X": X, "theta": theta}
## Set hyparameters
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 0.03
init_hyper = {"lambda_w": lambda_w, "lambda_x": lambda_x, "lambda_theta": lambda_theta, "eta": eta}
maxiter = 200
TRMF(dense_mat, sparse_mat, init_para, init_hyper, time_lags, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))


# ## Evaluation on Seattle Speed Data
# 
# **Scenario setting**:
# 
# - Tensor size: $323\times 28\times 288$ (road segment, day, time of day)
# - Non-random missing (NM)
# - 40% missing rate
# 

# In[16]:


import scipy.io
import numpy as np
np.random.seed(1000)

dense_tensor = scipy.io.loadmat('../datasets/Seattle-data-set/tensor.npz')['arr_0']
dim = dense_tensor.shape
missing_rate = 0.4 # Non-random missing (NM)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim[0], dim[1])[:, :, np.newaxis] + 0.5 - missing_rate)
dense_mat = dense_tensor.reshape([dim[0], dim[1] * dim[2]])
sparse_mat = sparse_tensor.reshape([dim[0], dim[1] * dim[2]])
del dense_tensor, sparse_tensor


# **Model setting**:
# 
# - Low rank: 10
# - Time lags: {1, 2, 288}
# - The number of iterations: 200

# In[17]:


import time
start = time.time()
dim1, dim2 = sparse_mat.shape
rank = 10
time_lags = np.array([1, 2, 288])
d = time_lags.shape[0]
## Initialize parameters
W = 0.1 * np.random.rand(dim1, rank)
X = 0.1 * np.random.rand(dim2, rank)
theta = 0.1 * np.random.rand(d, rank)
init_para = {"W": W, "X": X, "theta": theta}
## Set hyparameters
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 0.03
init_hyper = {"lambda_w": lambda_w, "lambda_x": lambda_x, "lambda_theta": lambda_theta, "eta": eta}
maxiter = 200
TRMF(dense_mat, sparse_mat, init_para, init_hyper, time_lags, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))


# **Scenario setting**:
# 
# - Tensor size: $323\times 28\times 288$ (road segment, day, time of day)
# - Random missing (RM)
# - 40% missing rate
# 

# In[18]:


import scipy.io
import numpy as np
np.random.seed(1000)

dense_tensor = scipy.io.loadmat('../datasets/Seattle-data-set/tensor.npz')['arr_0']
dim = dense_tensor.shape
missing_rate = 0.4 # Random missing (RM)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim[0], dim[1], dim[2]) + 0.5 - missing_rate)
dense_mat = dense_tensor.reshape([dim[0], dim[1] * dim[2]])
sparse_mat = sparse_tensor.reshape([dim[0], dim[1] * dim[2]])
del dense_tensor, sparse_tensor


# **Model setting**:
# 
# - Low rank: 50
# - Time lags: {1, 2, 288}
# - The number of iterations: 200

# In[19]:


import time
start = time.time()
dim1, dim2 = sparse_mat.shape
rank = 50
time_lags = np.array([1, 2, 288])
d = time_lags.shape[0]
## Initialize parameters
W = 0.1 * np.random.rand(dim1, rank)
X = 0.1 * np.random.rand(dim2, rank)
theta = 0.1 * np.random.rand(d, rank)
init_para = {"W": W, "X": X, "theta": theta}
## Set hyparameters
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 0.03
init_hyper = {"lambda_w": lambda_w, "lambda_x": lambda_x, "lambda_theta": lambda_theta, "eta": eta}
maxiter = 200
TRMF(dense_mat, sparse_mat, init_para, init_hyper, time_lags, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))


# **Scenario setting**:
# 
# - Tensor size: $323\times 28\times 288$ (road segment, day, time of day)
# - Random missing (RM)
# - 60% missing rate
# 

# In[20]:


import scipy.io
import numpy as np
np.random.seed(1000)

dense_tensor = scipy.io.loadmat('../datasets/Seattle-data-set/tensor.npz')['arr_0']
dim = dense_tensor.shape
missing_rate = 0.6 # Random missing (RM)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim[0], dim[1], dim[2]) + 0.5 - missing_rate)
dense_mat = dense_tensor.reshape([dim[0], dim[1] * dim[2]])
sparse_mat = sparse_tensor.reshape([dim[0], dim[1] * dim[2]])
del dense_tensor, sparse_tensor


# **Model setting**:
# 
# - Low rank: 50
# - Time lags: {1, 2, 288}
# - The number of iterations: 200

# In[21]:


import time
start = time.time()
dim1, dim2 = sparse_mat.shape
rank = 50
time_lags = np.array([1, 2, 288])
d = time_lags.shape[0]
## Initialize parameters
W = 0.1 * np.random.rand(dim1, rank)
X = 0.1 * np.random.rand(dim2, rank)
theta = 0.1 * np.random.rand(d, rank)
init_para = {"W": W, "X": X, "theta": theta}
## Set hyparameters
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 0.03
init_hyper = {"lambda_w": lambda_w, "lambda_x": lambda_x, "lambda_theta": lambda_theta, "eta": eta}
maxiter = 200
TRMF(dense_mat, sparse_mat, init_para, init_hyper, time_lags, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))


# ## Evaluation on London Movement Speed Data
# 
# London movement speed data set is is a city-wide hourly traffic speeddataset collected in London.
# 
# - Collected from 200,000+ road segments.
# - 720 time points in April 2019.
# - 73% missing values in the original data.
# 
# |  Observation rate | $>90\%$ | $>80\%$ | $>70\%$ | $>60\%$ | $>50\%$ |
# |:------------------|--------:|--------:|--------:|--------:|--------:|
# |**Number of roads**|  17,666 |  27,148 |  35,912 |  44,352 |  52,727 |
# 
# 
# If want to test on the full dataset, you could consider the following setting for masking observations as missing values. 
# 
# ```python
# import numpy as np
# np.random.seed(1000)
# mask_rate = 0.20
# 
# dense_mat = np.load('../datasets/London-data-set/hourly_speed_mat.npy')
# pos_obs = np.where(dense_mat != 0)
# num = len(pos_obs[0])
# sample_ind = np.random.choice(num, size = int(mask_rate * num), replace = False)
# sparse_mat = dense_mat.copy()
# sparse_mat[pos_obs[0][sample_ind], pos_obs[1][sample_ind]] = 0
# ```
# 
# Notably, you could also consider to evaluate the model on a subset of the data with the following setting.

# In[22]:


import numpy as np
np.random.seed(1000)

missing_rate = 0.4

dense_mat = np.load('../datasets/London-data-set/hourly_speed_mat.npy')
binary_mat = dense_mat.copy()
binary_mat[binary_mat != 0] = 1
pos = np.where(np.sum(binary_mat, axis = 1) > 0.7 * binary_mat.shape[1])
dense_mat = dense_mat[pos[0], :]

## Non-random missing (NM)
binary_mat = np.zeros(dense_mat.shape)
random_mat = np.random.rand(dense_mat.shape[0], 30)
for i1 in range(dense_mat.shape[0]):
    for i2 in range(30):
        binary_mat[i1, i2 * 24 : (i2 + 1) * 24] = np.round(random_mat[i1, i2] + 0.5 - missing_rate)
sparse_mat = dense_mat * binary_mat


# **Model setting**:
# 
# - Low rank: 20
# - Time lags: {1, 2, 24}
# - The number of iterations: 200

# In[23]:


import time
start = time.time()
dim1, dim2 = sparse_mat.shape
rank = 20
time_lags = np.array([1, 2, 24])
d = time_lags.shape[0]
## Initialize parameters
W = 0.1 * np.random.rand(dim1, rank)
X = 0.1 * np.random.rand(dim2, rank)
theta = 0.1 * np.random.rand(d, rank)
init_para = {"W": W, "X": X, "theta": theta}
## Set hyparameters
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 0.03
init_hyper = {"lambda_w": lambda_w, "lambda_x": lambda_x, "lambda_theta": lambda_theta, "eta": eta}
maxiter = 200
TRMF(dense_mat, sparse_mat, init_para, init_hyper, time_lags, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))


# In[24]:


import numpy as np
np.random.seed(1000)

missing_rate = 0.4

dense_mat = np.load('../datasets/London-data-set/hourly_speed_mat.npy')
binary_mat = dense_mat.copy()
binary_mat[binary_mat != 0] = 1
pos = np.where(np.sum(binary_mat, axis = 1) > 0.7 * binary_mat.shape[1])
dense_mat = dense_mat[pos[0], :]

## Random missing (RM)
random_mat = np.random.rand(dense_mat.shape[0], dense_mat.shape[1])
binary_mat = np.round(random_mat + 0.5 - missing_rate)
sparse_mat = dense_mat * binary_mat


# **Model setting**:
# 
# - Low rank: 20
# - Time lags: {1, 2, 24}
# - The number of iterations: 200

# In[25]:


import time
start = time.time()
dim1, dim2 = sparse_mat.shape
rank = 20
time_lags = np.array([1, 2, 24])
d = time_lags.shape[0]
## Initialize parameters
W = 0.1 * np.random.rand(dim1, rank)
X = 0.1 * np.random.rand(dim2, rank)
theta = 0.1 * np.random.rand(d, rank)
init_para = {"W": W, "X": X, "theta": theta}
## Set hyparameters
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 0.03
init_hyper = {"lambda_w": lambda_w, "lambda_x": lambda_x, "lambda_theta": lambda_theta, "eta": eta}
maxiter = 200
TRMF(dense_mat, sparse_mat, init_para, init_hyper, time_lags, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))


# In[26]:


import numpy as np
np.random.seed(1000)

missing_rate = 0.6

dense_mat = np.load('../datasets/London-data-set/hourly_speed_mat.npy')
binary_mat = dense_mat.copy()
binary_mat[binary_mat != 0] = 1
pos = np.where(np.sum(binary_mat, axis = 1) > 0.7 * binary_mat.shape[1])
dense_mat = dense_mat[pos[0], :]

## Random missing (RM)
random_mat = np.random.rand(dense_mat.shape[0], dense_mat.shape[1])
binary_mat = np.round(random_mat + 0.5 - missing_rate)
sparse_mat = dense_mat * binary_mat


# **Model setting**:
# 
# - Low rank: 20
# - Time lags: {1, 2, 24}
# - The number of iterations: 200

# In[27]:


import time
start = time.time()
dim1, dim2 = sparse_mat.shape
rank = 20
time_lags = np.array([1, 2, 24])
d = time_lags.shape[0]
## Initialize parameters
W = 0.1 * np.random.rand(dim1, rank)
X = 0.1 * np.random.rand(dim2, rank)
theta = 0.1 * np.random.rand(d, rank)
init_para = {"W": W, "X": X, "theta": theta}
## Set hyparameters
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 0.03
init_hyper = {"lambda_w": lambda_w, "lambda_x": lambda_x, "lambda_theta": lambda_theta, "eta": eta}
maxiter = 200
TRMF(dense_mat, sparse_mat, init_para, init_hyper, time_lags, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))


# ### License
# 
# <div class="alert alert-block alert-danger">
# <b>This work is released under the MIT license.</b>
# </div>