Temporal Regularized Matrix Factorization (TRMF) is an effective tool for imputing missing data within a given multivariate time series and for forecasting time series with missing values. The approach comes from the following literature:
Hsiang-Fu Yu, Nikhil Rao, Inderjit S. Dhillon, 2016. Temporal regularized matrix factorization for high-dimensional time series prediction. 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.
Acknowledgement: We would like to thank
for providing helpful suggestions and discussions. Thank you!
In this post, we consider a dataset of $m$ discrete time series $\boldsymbol{y}_{i}\in\mathbb{R}^{f},i\in\left\{1,2,...,m\right\}$, which may have missing elements. We express this spatio-temporal dataset as a matrix $Y\in\mathbb{R}^{m\times f}$ with $m$ rows (e.g., locations) and $f$ columns (e.g., discrete time intervals),
$$Y=\left[ \begin{array}{cccc} y_{11} & y_{12} & \cdots & y_{1f} \\ y_{21} & y_{22} & \cdots & y_{2f} \\ \vdots & \vdots & \ddots & \vdots \\ y_{m1} & y_{m2} & \cdots & y_{mf} \\ \end{array} \right]\in\mathbb{R}^{m\times f}.$$Temporal Regularized Matrix Factorization (TRMF) incorporates temporal dependencies into the commonly used matrix factorization model by describing the dependencies among $\{\boldsymbol{x}_{t}\}$ explicitly. The approach takes the form:
$$\boldsymbol{x}_{t}\approx\sum_{l\in\mathcal{L}}\boldsymbol{\theta}_{l}\circledast\boldsymbol{x}_{t-l},$$where this autoregressive (AR) model is specified by a lag set $\mathcal{L}=\left\{l_1,l_2,...,l_d\right\}$ (e.g., $\mathcal{L}=\left\{1,2,144\right\}$) and weight vectors $\boldsymbol{\theta}_{l}\in\mathbb{R}^{r},\forall l$, with $\circledast$ denoting the elementwise (Hadamard) product. We further define
$$\mathcal{R}_{AR}\left(X\mid \mathcal{L},\Theta,\eta\right)=\frac{1}{2}\sum_{t=l_d+1}^{f}\left(\boldsymbol{x}_{t}-\sum_{l\in\mathcal{L}}\boldsymbol{\theta}_{l}\circledast\boldsymbol{x}_{t-l}\right)^\top\left(\boldsymbol{x}_{t}-\sum_{l\in\mathcal{L}}\boldsymbol{\theta}_{l}\circledast\boldsymbol{x}_{t-l}\right)+\frac{\eta}{2}\sum_{t=1}^{f}\boldsymbol{x}_{t}^\top\boldsymbol{x}_{t}.$$Thus, TRMF-AR is given by solving
$$\min_{W,X,\Theta}\frac{1}{2}\underbrace{\sum_{(i,t)\in\Omega}\left(y_{it}-\boldsymbol{w}_{i}^T\boldsymbol{x}_{t}\right)^2}_{\text{sum of squared residual errors}}+\lambda_{w}\underbrace{\mathcal{R}_{w}\left(W\right)}_{W-\text{regularizer}}+\lambda_{x}\underbrace{\mathcal{R}_{AR}\left(X\mid \mathcal{L},\Theta,\eta\right)}_{\text{AR-regularizer}}+\lambda_{\theta}\underbrace{\mathcal{R}_{\theta}\left(\Theta\right)}_{\Theta-\text{regularizer}}$$where $\mathcal{R}_{w}\left(W\right)=\frac{1}{2}\sum_{i=1}^{m}\boldsymbol{w}_{i}^\top\boldsymbol{w}_{i}$ and $\mathcal{R}_{\theta}\left(\Theta\right)=\frac{1}{2}\sum_{l\in\mathcal{L}}\boldsymbol{\theta}_{l}^\top\boldsymbol{\theta}_{l}$ are regularization terms.
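Before moving to code, it may help to evaluate this objective directly on toy data. The sizes, lag set, and hyperparameter values below are arbitrary assumptions for illustration only; the alternating updates derived below then optimize this objective one block of variables at a time.

```python
import numpy as np

np.random.seed(0)
m, f, r = 5, 30, 3                   # locations, time steps, rank (toy sizes)
time_lags = np.array([1, 2])         # toy lag set L
Y = np.random.rand(m, f)
mask = np.random.rand(m, f) < 0.8    # observed entries Omega
W = 0.1 * np.random.randn(m, r)
X = 0.1 * np.random.randn(f, r)
theta = 0.1 * np.random.randn(len(time_lags), r)
lambda_w = lambda_x = lambda_theta = 500
eta = 1

## sum of squared residual errors over observed entries
loss = 0.5 * np.sum(((Y - W @ X.T) * mask) ** 2)
## W- and Theta-regularizers (squared Frobenius norms)
loss += 0.5 * lambda_w * np.sum(W ** 2) + 0.5 * lambda_theta * np.sum(theta ** 2)
## AR-regularizer: residual of x_t against its lagged combination, plus ridge term
tmax = np.max(time_lags)
res = X[tmax:, :] - sum(theta[k, :] * X[tmax - time_lags[k] : f - time_lags[k], :]
                        for k in range(len(time_lags)))
loss += 0.5 * lambda_x * (np.sum(res ** 2) + eta * np.sum(X ** 2))
print(loss)
```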
NumPy
Observing the optimization problem of the TRMF model above, we categorize the variables within this model into parameters (i.e., init_para in the TRMF function) and hyperparameters (i.e., init_hyper).
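For a toy problem, the two dictionaries might look as follows. All sizes here are illustrative assumptions, while the dictionary keys match those used by the TRMF function below:

```python
import numpy as np

m, f, rank, d = 5, 30, 3, 2   # toy sizes (assumed for illustration)
init_para = {"W": 0.1 * np.random.randn(m, rank),
             "X": 0.1 * np.random.randn(f, rank),
             "theta": 0.1 * np.random.randn(d, rank)}
init_hyper = {"lambda_w": 500, "lambda_x": 500, "lambda_theta": 500, "eta": 1}
```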
We write Python code for updating the spatial matrix $W$ as follows,
for i in range(dim1):
    pos0 = np.where(sparse_mat[i, :] != 0)
    Xt = X[pos0[0], :]
    vec0 = Xt.T @ sparse_mat[i, pos0[0]]
    mat0 = inv(Xt.T @ Xt + lambda_w * np.eye(rank))
    W[i, :] = mat0 @ vec0
To better understand this code, let us see what happens in each line. Recall that the equation for updating $W$ is $$\boldsymbol{w}_{i} \Leftarrow\left(\sum_{t:(i, t) \in \Omega} \boldsymbol{x}_{t} \boldsymbol{x}_{t}^{T}+\lambda_{w} I\right)^{-1} \sum_{t:(i, t) \in \Omega} y_{i t} \boldsymbol{x}_{t}$$ from the optimization problem: $$\min _{W} \frac{1}{2} \underbrace{\sum_{(i, t) \in \Omega}\left(y_{i t}-\boldsymbol{w}_{i}^{T} \boldsymbol{x}_{t}\right)^{2}}_{\text {sum of squared residual errors }}+\frac{1}{2} \lambda_{w} \underbrace{\sum_{i=1}^{m} \boldsymbol{w}_{i}^{T} \boldsymbol{w}_{i}}_{\text{sum of squared entries}}.$$
As can be seen,
vec0 = Xt.T @ sparse_mat[i, pos0[0]]
corresponds to $$\sum_{t:(i, t) \in \Omega} y_{i t} \boldsymbol{x}_{t}.$$
mat0 = inv(Xt.T @ Xt + lambda_w * np.eye(rank))
corresponds to $$\left(\sum_{t:(i, t) \in \Omega} \boldsymbol{x}_{t} \boldsymbol{x}_{t}^{T}+\lambda_{w} I\right)^{-1}.$$
W[i, :] = mat0 @ vec0
corresponds to the closed-form update of $\boldsymbol{w}_{i}$.
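Putting these lines together, here is a self-contained sketch of the $W$ update on synthetic data (the sizes and $\lambda_w$ are arbitrary assumptions). The final assertion checks that each updated row satisfies the ridge normal equations over its observed entries:

```python
import numpy as np
from numpy.linalg import inv

np.random.seed(0)
dim1, dim2, rank = 4, 20, 3       # toy sizes (assumed for illustration)
lambda_w = 1.0
X = np.random.randn(dim2, rank)
W = np.zeros((dim1, rank))
sparse_mat = np.random.rand(dim1, dim2)
sparse_mat[np.random.rand(dim1, dim2) < 0.3] = 0   # zeros mark missing entries

for i in range(dim1):
    pos0 = np.where(sparse_mat[i, :] != 0)
    Xt = X[pos0[0], :]
    vec0 = Xt.T @ sparse_mat[i, pos0[0]]
    mat0 = inv(Xt.T @ Xt + lambda_w * np.eye(rank))
    W[i, :] = mat0 @ vec0

## each row solves the ridge normal equations over its observed entries
i = 0
pos0 = np.where(sparse_mat[i, :] != 0)
Xt = X[pos0[0], :]
lhs = (Xt.T @ Xt + lambda_w * np.eye(rank)) @ W[i, :]
rhs = Xt.T @ sparse_mat[i, pos0[0]]
assert np.allclose(lhs, rhs)
```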
We write Python code for updating the temporal matrix $X$ as follows,
for t in range(dim2):
    pos0 = np.where(sparse_mat[:, t] != 0)
    Wt = W[pos0[0], :]
    Mt = np.zeros((rank, rank))
    Nt = np.zeros(rank)
    if t < np.max(time_lags):
        Pt = np.zeros((rank, rank))
        Qt = np.zeros(rank)
    else:
        Pt = np.eye(rank)
        Qt = np.einsum('ij, ij -> j', theta, X[t - time_lags, :])
    if t < dim2 - np.min(time_lags):
        if t >= np.max(time_lags) and t < dim2 - np.max(time_lags):
            index = list(range(0, d))
        else:
            index = list(np.where((t + time_lags >= np.max(time_lags)) & (t + time_lags < dim2)))[0]
        for k in index:
            Ak = theta[k, :]
            Mt += np.diag(Ak ** 2)
            theta0 = theta.copy()
            theta0[k, :] = 0
            Nt += np.multiply(Ak, X[t + time_lags[k], :]
                              - np.einsum('ij, ij -> j', theta0, X[t + time_lags[k] - time_lags, :]))
    vec0 = Wt.T @ sparse_mat[pos0[0], t] + lambda_x * Nt + lambda_x * Qt
    mat0 = inv(Wt.T @ Wt + lambda_x * Mt + lambda_x * Pt + lambda_x * eta * np.eye(rank))
    X[t, :] = mat0 @ vec0
This code looks rather involved at first. Let us first see the optimization problem that yields a closed-form update of $X$: $$\min_{W,X,\Theta}\frac{1}{2}\underbrace{\sum_{(i,t)\in\Omega}\left(y_{it}-\boldsymbol{w}_{i}^T\boldsymbol{x}_{t}\right)^2}_{\text{sum of squared residual errors}}+\underbrace{\frac{1}{2}\lambda_{x}\sum_{t=l_d+1}^{f}\left(\boldsymbol{x}_{t}-\sum_{l\in\mathcal{L}}\boldsymbol{\theta}_{l}\circledast\boldsymbol{x}_{t-l}\right)^\top\left(\boldsymbol{x}_{t}-\sum_{l\in\mathcal{L}}\boldsymbol{\theta}_{l}\circledast\boldsymbol{x}_{t-l}\right)+\frac{1}{2}\lambda_{x}\eta\sum_{t=1}^{f}\boldsymbol{x}_{t}^\top\boldsymbol{x}_{t}}_{\text{AR-term}}+\underbrace{\frac{1}{2}\lambda_{\theta}\sum_{l\in\mathcal{L}}\boldsymbol{\theta}_{l}^\top\boldsymbol{\theta}_{l}}_{\Theta-\text{term}}.$$
Then, as can be seen,
Mt += np.diag(Ak ** 2)
corresponds to $$\sum_{h\in\mathcal{L},t+h \leq f}\text{diag}(\boldsymbol{\theta}_{h}\circledast\boldsymbol{\theta}_{h}).$$
Nt += np.multiply(Ak, X[t + time_lags[k], :] - np.einsum('ij, ij -> j', theta0, X[t + time_lags[k] - time_lags, :]))
corresponds to $$\sum_{h\in\mathcal{L},t+h \leq f}\boldsymbol{\theta}_{h}\circledast\boldsymbol{\psi}_{t+h},$$ where $\boldsymbol{\psi}_{t+h}=\boldsymbol{x}_{t+h}-\sum_{l\in\mathcal{L},l\neq h}\boldsymbol{\theta}_{l}\circledast\boldsymbol{x}_{t+h-l}$.
Qt = np.einsum('ij, ij -> j', theta, X[t - time_lags, :])
corresponds to $$\sum_{l\in\mathcal{L}}\boldsymbol{\theta}_{l}\circledast\boldsymbol{x}_{t-l}.$$
X[t, :] = mat0 @ vec0
corresponds to the update of $X$.
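One detail worth verifying is the einsum pattern: `np.einsum('ij, ij -> j', theta, X[t - time_lags, :])` sums the elementwise products over the lag axis, which is exactly $\sum_{l\in\mathcal{L}}\boldsymbol{\theta}_{l}\circledast\boldsymbol{x}_{t-l}$. A quick check on random toy data (sizes assumed):

```python
import numpy as np

np.random.seed(0)
rank, d = 4, 3
time_lags = np.array([1, 2, 5])
theta = np.random.randn(d, rank)
X = np.random.randn(20, rank)
t = 10

## einsum over the lag axis ...
Qt = np.einsum('ij, ij -> j', theta, X[t - time_lags, :])
## ... equals the explicit sum of elementwise (Hadamard) products
Qt_explicit = sum(theta[k, :] * X[t - time_lags[k], :] for k in range(d))
assert np.allclose(Qt, Qt_explicit)
```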
We write Python code for updating the AR coefficient matrix $\Theta$ as follows,
for k in range(d):
    theta0 = theta.copy()
    theta0[k, :] = 0
    mat0 = np.zeros((dim2 - np.max(time_lags), rank))
    for L in range(d):
        mat0 += X[np.max(time_lags) - time_lags[L] : dim2 - time_lags[L], :] @ np.diag(theta0[L, :])
    VarPi = X[np.max(time_lags) : dim2, :] - mat0
    var1 = np.zeros((rank, rank))
    var2 = np.zeros(rank)
    for t in range(np.max(time_lags), dim2):
        B = X[t - time_lags[k], :]
        var1 += np.diag(np.multiply(B, B))
        var2 += np.diag(B) @ VarPi[t - np.max(time_lags), :]
    theta[k, :] = inv(var1 + lambda_theta * np.eye(rank) / lambda_x) @ var2
To better understand this code, let us see what happens in each line. Recall that the equation for updating $\boldsymbol{\theta}_{h}$ is $$ \color{red} {\boldsymbol{\theta}_{h}\Leftarrow\left(\sum_{t=l_d+1}^{f}\text{diag}(\boldsymbol{x}_{t-h}\circledast \boldsymbol{x}_{t-h})+\frac{\lambda_{\theta}}{\lambda_x}I\right)^{-1}\left(\sum_{t=l_d+1}^{f}{\boldsymbol{\pi}_{t}^{h}}\circledast \boldsymbol{x}_{t-h}\right)} $$ where $\boldsymbol{\pi}_{t}^{h}=\boldsymbol{x}_{t}-\sum_{l\in\mathcal{L},l\neq h}\boldsymbol{\theta}_{l}\circledast\boldsymbol{x}_{t-l}$ from the optimization problem: $$ \min_{\Theta}\frac{1}{2}\lambda_{x}\underbrace{\sum_{t=l_d+1}^{f}\left(\boldsymbol{x}_{t}-\sum_{l\in\mathcal{L}}\boldsymbol{\theta}_{l}\circledast\boldsymbol{x}_{t-l}\right)^\top\left(\boldsymbol{x}_{t}-\sum_{l\in\mathcal{L}}\boldsymbol{\theta}_{l}\circledast\boldsymbol{x}_{t-l}\right)}_{\text{sum of squared residual errors}}+\frac{1}{2}\lambda_{\theta}\underbrace{\sum_{l\in\mathcal{L}}\boldsymbol{\theta}_{l}^\top\boldsymbol{\theta}_{l}}_{\text{sum of squared entries}} $$
As can be seen,
mat0 += X[np.max(time_lags) - time_lags[L] : dim2 - time_lags[L] , :] @ np.diag(theta0[L, :])
corresponds to $$\sum_{l\in\mathcal{L},l\neq h}\boldsymbol{\theta}_{l}\circledast\boldsymbol{x}_{t-l}.$$
var1 += np.diag(np.multiply(B, B))
corresponds to $$\sum_{t=l_d+1}^{f}\text{diag}(\boldsymbol{x}_{t-h}\circledast \boldsymbol{x}_{t-h}).$$
var2 += np.diag(B) @ VarPi[t - np.max(time_lags), :]
corresponds to $$\sum_{t=l_d+1}^{f}{\boldsymbol{\pi}_{t}^{h}}\circledast \boldsymbol{x}_{t-h}.$$
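Note that multiplying a vector by `np.diag(B)` is just an elementwise (Hadamard) product in disguise, which is how these lines realize the $\circledast$ operations in the update. A quick sanity check:

```python
import numpy as np

B = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])
## multiplying by diag(B) is an elementwise product
assert np.allclose(np.diag(B) @ v, B * v)
## diag(B * B) stores x_{t-h} ⊛ x_{t-h} on the diagonal of var1
assert np.allclose(np.diag(np.multiply(B, B)).diagonal(), B ** 2)
```

The same update could therefore be written as `var2 += B * VarPi[t - np.max(time_lags), :]` without forming any diagonal matrix; the diagonal form simply mirrors the notation of the closed-form update.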
def ar4cast(theta, X, time_lags, multi_step):
    dim, rank = X.shape
    d = time_lags.shape[0]
    X_new = np.append(X, np.zeros((multi_step, rank)), axis = 0)
    for t in range(multi_step):
        X_new[dim + t, :] = np.einsum('kr, kr -> r', theta, X_new[dim + t - time_lags, :])
    return X_new
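To see ar4cast in action, here is a small usage example (the function is repeated so the snippet is self-contained; the toy sizes are assumptions). With $\mathcal{L}=\{1,2\}$, the first forecast step is $\boldsymbol{\theta}_{1}\circledast\boldsymbol{x}_{T}+\boldsymbol{\theta}_{2}\circledast\boldsymbol{x}_{T-1}$:

```python
import numpy as np

def ar4cast(theta, X, time_lags, multi_step):
    ## roll the AR model forward multi_step steps (as defined above)
    dim, rank = X.shape
    X_new = np.append(X, np.zeros((multi_step, rank)), axis = 0)
    for t in range(multi_step):
        X_new[dim + t, :] = np.einsum('kr, kr -> r', theta, X_new[dim + t - time_lags, :])
    return X_new

np.random.seed(0)
rank = 2
time_lags = np.array([1, 2])
X = np.random.randn(10, rank)
theta = 0.5 * np.random.randn(2, rank)

X_new = ar4cast(theta, X, time_lags, multi_step = 3)
assert X_new.shape == (13, rank)   # three forecast rows appended
## the first forecast combines the two most recent temporal factors
assert np.allclose(X_new[10, :], theta[0, :] * X[9, :] + theta[1, :] * X[8, :])
```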
def compute_mape(var, var_hat):
    return np.sum(np.abs(var - var_hat) / var) / var.shape[0]

def compute_rmse(var, var_hat):
    return np.sqrt(np.sum((var - var_hat) ** 2) / var.shape[0])
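These two metrics can be checked by hand on a tiny example (the functions are repeated so the snippet is self-contained). Note that compute_mape assumes var contains no zeros, which is why the evaluation code only indexes observed, nonzero entries:

```python
import numpy as np

def compute_mape(var, var_hat):
    return np.sum(np.abs(var - var_hat) / var) / var.shape[0]

def compute_rmse(var, var_hat):
    return np.sqrt(np.sum((var - var_hat) ** 2) / var.shape[0])

var = np.array([2.0, 4.0])
var_hat = np.array([1.0, 2.0])
assert np.isclose(compute_mape(var, var_hat), 0.5)           # (1/2 + 2/4) / 2
assert np.isclose(compute_rmse(var, var_hat), np.sqrt(2.5))  # sqrt((1 + 4) / 2)
```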
import numpy as np
from numpy.linalg import inv as inv

def TRMF(dense_mat, sparse_mat, init_para, init_hyper, time_lags, maxiter):
    """Temporal Regularized Matrix Factorization, TRMF."""

    ## Initialize parameters
    W = init_para["W"]
    X = init_para["X"]
    theta = init_para["theta"]

    ## Set hyperparameters
    lambda_w = init_hyper["lambda_w"]
    lambda_x = init_hyper["lambda_x"]
    lambda_theta = init_hyper["lambda_theta"]
    eta = init_hyper["eta"]

    dim1, dim2 = sparse_mat.shape
    pos_train = np.where(sparse_mat != 0)
    pos_test = np.where((dense_mat != 0) & (sparse_mat == 0))
    binary_mat = sparse_mat.copy()
    binary_mat[pos_train] = 1
    d, rank = theta.shape

    for it in range(maxiter):
        ## Update spatial matrix W
        for i in range(dim1):
            pos0 = np.where(sparse_mat[i, :] != 0)
            Xt = X[pos0[0], :]
            vec0 = Xt.T @ sparse_mat[i, pos0[0]]
            mat0 = inv(Xt.T @ Xt + lambda_w * np.eye(rank))
            W[i, :] = mat0 @ vec0
        ## Update temporal matrix X
        for t in range(dim2):
            pos0 = np.where(sparse_mat[:, t] != 0)
            Wt = W[pos0[0], :]
            Mt = np.zeros((rank, rank))
            Nt = np.zeros(rank)
            if t < np.max(time_lags):
                Pt = np.zeros((rank, rank))
                Qt = np.zeros(rank)
            else:
                Pt = np.eye(rank)
                Qt = np.einsum('ij, ij -> j', theta, X[t - time_lags, :])
            if t < dim2 - np.min(time_lags):
                if t >= np.max(time_lags) and t < dim2 - np.max(time_lags):
                    index = list(range(0, d))
                else:
                    index = list(np.where((t + time_lags >= np.max(time_lags)) & (t + time_lags < dim2)))[0]
                for k in index:
                    Ak = theta[k, :]
                    Mt += np.diag(Ak ** 2)
                    theta0 = theta.copy()
                    theta0[k, :] = 0
                    Nt += np.multiply(Ak, X[t + time_lags[k], :]
                                      - np.einsum('ij, ij -> j', theta0, X[t + time_lags[k] - time_lags, :]))
            vec0 = Wt.T @ sparse_mat[pos0[0], t] + lambda_x * Nt + lambda_x * Qt
            mat0 = inv(Wt.T @ Wt + lambda_x * Mt + lambda_x * Pt + lambda_x * eta * np.eye(rank))
            X[t, :] = mat0 @ vec0
        ## Update AR coefficients theta
        for k in range(d):
            theta0 = theta.copy()
            theta0[k, :] = 0
            mat0 = np.zeros((dim2 - np.max(time_lags), rank))
            for L in range(d):
                mat0 += X[np.max(time_lags) - time_lags[L] : dim2 - time_lags[L], :] @ np.diag(theta0[L, :])
            VarPi = X[np.max(time_lags) : dim2, :] - mat0
            var1 = np.zeros((rank, rank))
            var2 = np.zeros(rank)
            for t in range(np.max(time_lags), dim2):
                B = X[t - time_lags[k], :]
                var1 += np.diag(np.multiply(B, B))
                var2 += np.diag(B) @ VarPi[t - np.max(time_lags), :]
            theta[k, :] = inv(var1 + lambda_theta * np.eye(rank) / lambda_x) @ var2
        ## Forecast multi_step steps ahead
        ## (note: multi_step is a global variable set in the experiment scripts below)
        X_new = ar4cast(theta, X, time_lags, multi_step)
        mat_new = W @ X_new[- multi_step :, :].T
        mat_hat = W @ X.T
        mape = np.sum(np.abs(dense_mat[pos_test] - mat_hat[pos_test])
                      / dense_mat[pos_test]) / dense_mat[pos_test].shape[0]
        rmse = np.sqrt(np.sum((dense_mat[pos_test] - mat_hat[pos_test]) ** 2) / dense_mat[pos_test].shape[0])
        if (it + 1) % 100 == 0:
            print('Iter: {}'.format(it + 1))
            print('Imputation MAPE: {:.6}'.format(mape))
            print('Imputation RMSE: {:.6}'.format(rmse))
            print()

    mat_hat = np.append(mat_hat, mat_new, axis = 1)
    return mat_hat, W, X_new, theta
def update_x_partial(sparse_mat, W, X, theta, lambda_x, eta, time_lags, back_step):
    dim2, rank = X.shape
    d = time_lags.shape[0]
    tmax = np.max(time_lags)
    for t in range(dim2 - back_step, dim2):
        pos0 = np.where(sparse_mat[:, t] != 0)
        Wt = W[pos0[0], :]
        Mt = np.zeros((rank, rank))
        Nt = np.zeros(rank)
        if t < tmax:
            Pt = np.zeros((rank, rank))
            Qt = np.zeros(rank)
        else:
            Pt = np.eye(rank)
            Qt = np.einsum('ij, ij -> j', theta, X[t - time_lags, :])
        if t < dim2 - np.min(time_lags):
            if t >= tmax and t < dim2 - tmax:
                index = list(range(0, d))
            else:
                index = list(np.where((t + time_lags >= tmax) & (t + time_lags < dim2)))[0]
            for k in index:
                Ak = theta[k, :]
                Mt += np.diag(Ak ** 2)
                theta0 = theta.copy()
                theta0[k, :] = 0
                Nt += np.multiply(Ak, X[t + time_lags[k], :]
                                  - np.einsum('ij, ij -> j', theta0, X[t + time_lags[k] - time_lags, :]))
        vec0 = Wt.T @ sparse_mat[pos0[0], t] + lambda_x * Nt + lambda_x * Qt
        mat0 = inv(Wt.T @ Wt + lambda_x * Mt + lambda_x * Pt + lambda_x * eta * np.eye(rank))
        X[t, :] = mat0 @ vec0
    return X
def TRMF_partial(dense_mat, sparse_mat, init_para, init_hyper, time_lags, maxiter):

    ## Initialize parameters
    W = init_para["W"]
    X = init_para["X"]
    theta = init_para["theta"]

    ## Set hyperparameters
    lambda_x = init_hyper["lambda_x"]
    eta = init_hyper["eta"]

    ## Only re-estimate the most recent temporal factors
    ## (note: multi_step is a global variable set in the experiment scripts below)
    back_step = 10 * multi_step
    for it in range(maxiter):
        X = update_x_partial(sparse_mat, W, X, theta, lambda_x, eta, time_lags, back_step)
    X_new = ar4cast(theta, X, time_lags, multi_step)
    mat_hat = W @ X_new[- multi_step :, :].T
    mat_hat[mat_hat < 0] = 0
    return mat_hat, W, X_new, theta
from ipywidgets import IntProgress
from IPython.display import display

def TRMF_forecast(dense_mat, sparse_mat, init_hyper, pred_step, multi_step, rank, time_lags, maxiter):
    dim1, T = dense_mat.shape
    d = time_lags.shape[0]
    start_time = T - pred_step
    max_count = int(np.ceil(pred_step / multi_step))
    mat_hat = np.zeros((dim1, max_count * multi_step))
    f = IntProgress(min = 0, max = max_count) # instantiate the bar
    display(f) # display the bar
    for t in range(max_count):
        if t == 0:
            init_para = {"W": 0.1 * np.random.randn(dim1, rank),
                         "X": 0.1 * np.random.randn(start_time, rank),
                         "theta": 0.1 * np.random.randn(d, rank)}
            mat, W, X_new, theta = TRMF(dense_mat[:, 0 : start_time], sparse_mat[:, 0 : start_time],
                                        init_para, init_hyper, time_lags, maxiter)
        else:
            init_para = {"W": W, "X": X_new, "theta": theta}
            mat, W, X_new, theta = TRMF_partial(dense_mat[:, 0 : start_time + t * multi_step],
                                                sparse_mat[:, 0 : start_time + t * multi_step],
                                                init_para, init_hyper, time_lags, maxiter)
        mat_hat[:, t * multi_step : (t + 1) * multi_step] = mat[:, - multi_step :]
        f.value = t + 1 # advance the bar
    small_dense_mat = dense_mat[:, start_time : T]
    pos = np.where(small_dense_mat != 0)
    print('Prediction MAPE: {:.6}'.format(compute_mape(small_dense_mat[pos], mat_hat[pos])))
    print('Prediction RMSE: {:.6}'.format(compute_rmse(small_dense_mat[pos], mat_hat[pos])))
    print()
    return mat_hat
Scenario setting:
import time
import scipy.io
import numpy as np
np.random.seed(1000)
dense_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')['tensor']
dim = dense_tensor.shape
missing_rate = 0.4 # Non-random missing (NM)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim[0], dim[1])[:, :, np.newaxis] + 0.5 - missing_rate)
dense_mat = dense_tensor.reshape([dim[0], dim[1] * dim[2]])
sparse_mat = sparse_tensor.reshape([dim[0], dim[1] * dim[2]])
del dense_tensor, sparse_tensor
Model setting:
import time
start = time.time()
rank = 10
pred_step = 7 * 144
time_lags = np.array([1, 2, 3, 144, 145, 146, 7 * 144, 7 * 144 + 1, 7 * 144 + 2])
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 1
init_hyper = {"lambda_w": lambda_w, "lambda_x": lambda_x, "lambda_theta": lambda_theta, "eta": eta}
maxiter = 200
for multi_step in [2, 4, 6]:
    start = time.time()
    print('Prediction time horizon (delta) = {}.'.format(multi_step))
    mat_hat = TRMF_forecast(dense_mat, sparse_mat, init_hyper, pred_step, multi_step, rank, time_lags, maxiter)
    end = time.time()
    print('Running time: %d seconds'%(end - start))
    print()
Prediction time horizon (delta) = 2.
IntProgress(value=0, max=504)
Iter: 100
Imputation MAPE: 0.115188
Imputation RMSE: 4.6092

Iter: 200
Imputation MAPE: 0.115174
Imputation RMSE: 4.60879

Prediction MAPE: 0.127976
Prediction RMSE: 4.87849

Running time: 1011 seconds

Prediction time horizon (delta) = 4.
IntProgress(value=0, max=252)
Iter: 100
Imputation MAPE: 0.115176
Imputation RMSE: 4.60879

Iter: 200
Imputation MAPE: 0.115166
Imputation RMSE: 4.60855

Prediction MAPE: 0.128447
Prediction RMSE: 4.88498

Running time: 1033 seconds

Prediction time horizon (delta) = 6.
IntProgress(value=0, max=168)
Iter: 100
Imputation MAPE: 0.115173
Imputation RMSE: 4.60868

Iter: 200
Imputation MAPE: 0.115167
Imputation RMSE: 4.60854

Prediction MAPE: 0.133175
Prediction RMSE: 5.11059

Running time: 991 seconds
Scenario setting:
import time
import scipy.io
import numpy as np
np.random.seed(1000)
dense_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')['tensor']
dim = dense_tensor.shape
missing_rate = 0.4 # Random missing (RM)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim[0], dim[1], dim[2]) + 0.5 - missing_rate)
dense_mat = dense_tensor.reshape([dim[0], dim[1] * dim[2]])
sparse_mat = sparse_tensor.reshape([dim[0], dim[1] * dim[2]])
del dense_tensor, sparse_tensor
Model setting:
import time
start = time.time()
rank = 10
pred_step = 7 * 144
time_lags = np.array([1, 2, 3, 144, 145, 146, 7 * 144, 7 * 144 + 1, 7 * 144 + 2])
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 1
init_hyper = {"lambda_w": lambda_w, "lambda_x": lambda_x, "lambda_theta": lambda_theta, "eta": eta}
maxiter = 200
for multi_step in [2, 4, 6]:
    start = time.time()
    print('Prediction time horizon (delta) = {}.'.format(multi_step))
    mat_hat = TRMF_forecast(dense_mat, sparse_mat, init_hyper, pred_step, multi_step, rank, time_lags, maxiter)
    end = time.time()
    print('Running time: %d seconds'%(end - start))
    print()
Prediction time horizon (delta) = 2.
IntProgress(value=0, max=504)
Iter: 100
Imputation MAPE: 0.113995
Imputation RMSE: 4.56227

Iter: 200
Imputation MAPE: 0.113972
Imputation RMSE: 4.5614

Prediction MAPE: 0.127169
Prediction RMSE: 4.83963

Running time: 1064 seconds

Prediction time horizon (delta) = 4.
IntProgress(value=0, max=252)
Iter: 100
Imputation MAPE: 0.113994
Imputation RMSE: 4.56227

Iter: 200
Imputation MAPE: 0.11398
Imputation RMSE: 4.56178

Prediction MAPE: 0.12824
Prediction RMSE: 4.85103

Running time: 1061 seconds

Prediction time horizon (delta) = 6.
IntProgress(value=0, max=168)
Iter: 100
Imputation MAPE: 0.113997
Imputation RMSE: 4.56235

Iter: 200
Imputation MAPE: 0.113969
Imputation RMSE: 4.56124

Prediction MAPE: 0.134086
Prediction RMSE: 5.14488

Running time: 982 seconds
Scenario setting:
import time
import scipy.io
import numpy as np
np.random.seed(1000)
dense_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')['tensor']
dim = dense_tensor.shape
missing_rate = 0.6 # Random missing (RM)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim[0], dim[1], dim[2]) + 0.5 - missing_rate)
dense_mat = dense_tensor.reshape([dim[0], dim[1] * dim[2]])
sparse_mat = sparse_tensor.reshape([dim[0], dim[1] * dim[2]])
del dense_tensor, sparse_tensor
Model setting:
import time
start = time.time()
rank = 10
pred_step = 7 * 144
time_lags = np.array([1, 2, 3, 144, 145, 146, 7 * 144, 7 * 144 + 1, 7 * 144 + 2])
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 1
init_hyper = {"lambda_w": lambda_w, "lambda_x": lambda_x, "lambda_theta": lambda_theta, "eta": eta}
maxiter = 200
for multi_step in [2, 4, 6]:
    start = time.time()
    print('Prediction time horizon (delta) = {}.'.format(multi_step))
    mat_hat = TRMF_forecast(dense_mat, sparse_mat, init_hyper, pred_step, multi_step, rank, time_lags, maxiter)
    end = time.time()
    print('Running time: %d seconds'%(end - start))
    print()
Prediction time horizon (delta) = 2.
IntProgress(value=0, max=504)
Iter: 100
Imputation MAPE: 0.125244
Imputation RMSE: 4.92197

Iter: 200
Imputation MAPE: 0.125242
Imputation RMSE: 4.92202

Prediction MAPE: 0.138505
Prediction RMSE: 5.18799

Running time: 978 seconds

Prediction time horizon (delta) = 4.
IntProgress(value=0, max=252)
Iter: 100
Imputation MAPE: 0.125256
Imputation RMSE: 4.92244

Iter: 200
Imputation MAPE: 0.125242
Imputation RMSE: 4.92198

Prediction MAPE: 0.139365
Prediction RMSE: 5.18961

Running time: 979 seconds

Prediction time horizon (delta) = 6.
IntProgress(value=0, max=168)
Iter: 100
Imputation MAPE: 0.125255
Imputation RMSE: 4.92228

Iter: 200
Imputation MAPE: 0.125252
Imputation RMSE: 4.92228

Prediction MAPE: 0.142974
Prediction RMSE: 5.32899

Running time: 981 seconds
import scipy.io
import warnings
warnings.simplefilter('ignore')
tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')['tensor']
dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])
sparse_mat = dense_mat.copy()
import time
start = time.time()
rank = 10
pred_step = 7 * 144
time_lags = np.array([1, 2, 3, 144, 145, 146, 7 * 144, 7 * 144 + 1, 7 * 144 + 2])
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 1
init_hyper = {"lambda_w": lambda_w, "lambda_x": lambda_x, "lambda_theta": lambda_theta, "eta": eta}
maxiter = 200
for multi_step in [2, 4, 6]:
    start = time.time()
    print('Prediction time horizon (delta) = {}.'.format(multi_step))
    mat_hat = TRMF_forecast(dense_mat, sparse_mat, init_hyper, pred_step, multi_step, rank, time_lags, maxiter)
    end = time.time()
    print('Running time: %d seconds'%(end - start))
    print()
Prediction time horizon (delta) = 2.
IntProgress(value=0, max=504)
Iter: 100
Imputation MAPE: nan
Imputation RMSE: nan

Iter: 200
Imputation MAPE: nan
Imputation RMSE: nan

Prediction MAPE: 0.115997
Prediction RMSE: 4.54619

Running time: 998 seconds

Prediction time horizon (delta) = 4.
IntProgress(value=0, max=252)
Iter: 100
Imputation MAPE: nan
Imputation RMSE: nan

Iter: 200
Imputation MAPE: nan
Imputation RMSE: nan

Prediction MAPE: 0.119562
Prediction RMSE: 4.61194

Running time: 1013 seconds

Prediction time horizon (delta) = 6.
IntProgress(value=0, max=168)
Iter: 100
Imputation MAPE: nan
Imputation RMSE: nan

Iter: 200
Imputation MAPE: nan
Imputation RMSE: nan

Prediction MAPE: 0.122818
Prediction RMSE: 4.90498

Running time: 1002 seconds
Scenario setting:
import time
import scipy.io
import numpy as np
np.random.seed(1000)
dense_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')['tensor']
dim = dense_tensor.shape
missing_rate = 0.4 # Non-random missing (NM)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim[0], dim[1])[:, :, np.newaxis] + 0.5 - missing_rate)
dense_mat = dense_tensor.reshape([dim[0], dim[1] * dim[2]])
sparse_mat = sparse_tensor.reshape([dim[0], dim[1] * dim[2]])
del dense_tensor, sparse_tensor
Model setting:
import time
start = time.time()
rank = 10
pred_step = 7 * 108
time_lags = np.array([1, 2, 3, 108, 109, 110, 7 * 108, 7 * 108 + 1, 7 * 108 + 2])
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 1
init_hyper = {"lambda_w": lambda_w, "lambda_x": lambda_x, "lambda_theta": lambda_theta, "eta": eta}
maxiter = 200
for multi_step in [2, 4, 6]:
    start = time.time()
    print('Prediction time horizon (delta) = {}.'.format(multi_step))
    mat_hat = TRMF_forecast(dense_mat, sparse_mat, init_hyper, pred_step, multi_step, rank, time_lags, maxiter)
    end = time.time()
    print('Running time: %d seconds'%(end - start))
    print()
Prediction time horizon (delta) = 2.
IntProgress(value=0, max=378)
Iter: 100
Imputation MAPE: 0.251544
Imputation RMSE: 57.7424

Iter: 200
Imputation MAPE: 0.252459
Imputation RMSE: 57.1047

Prediction MAPE: 0.255831
Prediction RMSE: 38.6273

Running time: 367 seconds

Prediction time horizon (delta) = 4.
IntProgress(value=0, max=189)
Iter: 100
Imputation MAPE: 0.251771
Imputation RMSE: 57.7999

Iter: 200
Imputation MAPE: 0.252862
Imputation RMSE: 57.1221

Prediction MAPE: 0.28039
Prediction RMSE: 40.4237

Running time: 366 seconds

Prediction time horizon (delta) = 6.
IntProgress(value=0, max=126)
Iter: 100
Imputation MAPE: 0.251875
Imputation RMSE: 57.5126

Iter: 200
Imputation MAPE: 0.25278
Imputation RMSE: 57.1189

Prediction MAPE: 0.276867
Prediction RMSE: 42.6404

Running time: 365 seconds
Scenario setting:
import time
import scipy.io
import numpy as np
np.random.seed(1000)
dense_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')['tensor']
dim = dense_tensor.shape
missing_rate = 0.4 # Random missing (RM)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim[0], dim[1], dim[2]) + 0.5 - missing_rate)
dense_mat = dense_tensor.reshape([dim[0], dim[1] * dim[2]])
sparse_mat = sparse_tensor.reshape([dim[0], dim[1] * dim[2]])
del dense_tensor, sparse_tensor
Model setting:
import time
start = time.time()
rank = 10
pred_step = 7 * 108
time_lags = np.array([1, 2, 3, 108, 109, 110, 7 * 108, 7 * 108 + 1, 7 * 108 + 2])
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 1
init_hyper = {"lambda_w": lambda_w, "lambda_x": lambda_x, "lambda_theta": lambda_theta, "eta": eta}
maxiter = 200
for multi_step in [2, 4, 6]:
    start = time.time()
    print('Prediction time horizon (delta) = {}.'.format(multi_step))
    mat_hat = TRMF_forecast(dense_mat, sparse_mat, init_hyper, pred_step, multi_step, rank, time_lags, maxiter)
    end = time.time()
    print('Running time: %d seconds'%(end - start))
    print()
Prediction time horizon (delta) = 2.
IntProgress(value=0, max=378)
Iter: 100
Imputation MAPE: 0.231354
Imputation RMSE: 50.8092

Iter: 200
Imputation MAPE: 0.231429
Imputation RMSE: 50.6602

Prediction MAPE: 0.238024
Prediction RMSE: 35.7663

Running time: 365 seconds

Prediction time horizon (delta) = 4.
IntProgress(value=0, max=189)
Iter: 100
Imputation MAPE: 0.231151
Imputation RMSE: 50.6426

Iter: 200
Imputation MAPE: 0.231282
Imputation RMSE: 50.5112

Prediction MAPE: 0.273416
Prediction RMSE: 38.3588

Running time: 371 seconds

Prediction time horizon (delta) = 6.
IntProgress(value=0, max=126)
Iter: 100
Imputation MAPE: 0.230971
Imputation RMSE: 50.5664

Iter: 200
Imputation MAPE: 0.231013
Imputation RMSE: 50.4308

Prediction MAPE: 0.273411
Prediction RMSE: 40.3949

Running time: 368 seconds
Scenario setting:
import time
import scipy.io
import numpy as np
np.random.seed(1000)
dense_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')['tensor']
dim = dense_tensor.shape
missing_rate = 0.6 # Random missing (RM)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim[0], dim[1], dim[2]) + 0.5 - missing_rate)
dense_mat = dense_tensor.reshape([dim[0], dim[1] * dim[2]])
sparse_mat = sparse_tensor.reshape([dim[0], dim[1] * dim[2]])
del dense_tensor, sparse_tensor
Model setting:
import time
start = time.time()
rank = 10
pred_step = 7 * 108
time_lags = np.array([1, 2, 3, 108, 109, 110, 7 * 108, 7 * 108 + 1, 7 * 108 + 2])
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 1
init_hyper = {"lambda_w": lambda_w, "lambda_x": lambda_x, "lambda_theta": lambda_theta, "eta": eta}
maxiter = 200
for multi_step in [2, 4, 6]:
    start = time.time()
    print('Prediction time horizon (delta) = {}.'.format(multi_step))
    mat_hat = TRMF_forecast(dense_mat, sparse_mat, init_hyper, pred_step, multi_step, rank, time_lags, maxiter)
    end = time.time()
    print('Running time: %d seconds'%(end - start))
    print()
Prediction time horizon (delta) = 2.
IntProgress(value=0, max=378)
Iter: 100
Imputation MAPE: 0.245918
Imputation RMSE: 58.3018

Iter: 200
Imputation MAPE: 0.245755
Imputation RMSE: 58.3847

Prediction MAPE: 0.258238
Prediction RMSE: 41.9281

Running time: 368 seconds

Prediction time horizon (delta) = 4.
IntProgress(value=0, max=189)
Iter: 100
Imputation MAPE: 0.245739
Imputation RMSE: 58.1084

Iter: 200
Imputation MAPE: 0.24575
Imputation RMSE: 58.2141

Prediction MAPE: 0.25099
Prediction RMSE: 44.2074

Running time: 367 seconds

Prediction time horizon (delta) = 6.
IntProgress(value=0, max=126)
Iter: 100
Imputation MAPE: 0.245843
Imputation RMSE: 58.1553

Iter: 200
Imputation MAPE: 0.24597
Imputation RMSE: 58.2217

Prediction MAPE: 0.268117
Prediction RMSE: 46.4623

Running time: 368 seconds
import scipy.io
import warnings
warnings.simplefilter('ignore')
tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')['tensor']
dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])
sparse_mat = dense_mat.copy()
import time
start = time.time()
rank = 10
pred_step = 7 * 108
time_lags = np.array([1, 2, 3, 108, 109, 110, 7 * 108, 7 * 108 + 1, 7 * 108 + 2])
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 1
init_hyper = {"lambda_w": lambda_w, "lambda_x": lambda_x, "lambda_theta": lambda_theta, "eta": eta}
maxiter = 200
for multi_step in [2, 4, 6]:
    start = time.time()
    print('Prediction time horizon (delta) = {}.'.format(multi_step))
    mat_hat = TRMF_forecast(dense_mat, sparse_mat, init_hyper, pred_step, multi_step, rank, time_lags, maxiter)
    end = time.time()
    print('Running time: %d seconds'%(end - start))
    print()
Prediction time horizon (delta) = 2.
IntProgress(value=0, max=378)
Iter: 100
Imputation MAPE: nan
Imputation RMSE: nan

Iter: 200
Imputation MAPE: nan
Imputation RMSE: nan

Prediction MAPE: 0.224683
Prediction RMSE: 30.5755

Running time: 372 seconds

Prediction time horizon (delta) = 4.
IntProgress(value=0, max=189)
Iter: 100
Imputation MAPE: nan
Imputation RMSE: nan

Iter: 200
Imputation MAPE: nan
Imputation RMSE: nan

Prediction MAPE: 0.241361
Prediction RMSE: 32.6289

Running time: 370 seconds

Prediction time horizon (delta) = 6.
IntProgress(value=0, max=126)
Iter: 100
Imputation MAPE: nan
Imputation RMSE: nan

Iter: 200
Imputation MAPE: nan
Imputation RMSE: nan

Prediction MAPE: 0.25415
Prediction RMSE: 33.9146

Running time: 370 seconds
Scenario setting:
import time
import scipy.io
import numpy as np
np.random.seed(1000)
dense_tensor = np.load('../datasets/Seattle-data-set/tensor.npz')['arr_0']
dim = dense_tensor.shape
missing_rate = 0.4 # Non-random missing (NM)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim[0], dim[1])[:, :, np.newaxis] + 0.5 - missing_rate)
dense_mat = dense_tensor.reshape([dim[0], dim[1] * dim[2]])
sparse_mat = sparse_tensor.reshape([dim[0], dim[1] * dim[2]])
del dense_tensor, sparse_tensor
Model setting:
import time
start = time.time()
rank = 10
pred_step = 7 * 288
time_lags = np.array([1, 2, 3, 288, 289, 290, 7 * 288, 7 * 288 + 1, 7 * 288 + 2])
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 1
init_hyper = {"lambda_w": lambda_w, "lambda_x": lambda_x, "lambda_theta": lambda_theta, "eta": eta}
maxiter = 200
for multi_step in [2, 4, 6]:
    start = time.time()
    print('Prediction time horizon (delta) = {}.'.format(multi_step))
    mat_hat = TRMF_forecast(dense_mat, sparse_mat, init_hyper, pred_step, multi_step, rank, time_lags, maxiter)
    end = time.time()
    print('Running time: %d seconds'%(end - start))
    print()
Prediction time horizon (delta) = 2.
IntProgress(value=0, max=1008)
Iter: 100
Imputation MAPE: 0.101807
Imputation RMSE: 5.56864

Iter: 200
Imputation MAPE: 0.10178
Imputation RMSE: 5.5675

Prediction MAPE: 0.116346
Prediction RMSE: 5.98404

Running time: 1119 seconds

Prediction time horizon (delta) = 4.
IntProgress(value=0, max=504)
Iter: 100
Imputation MAPE: 0.101811
Imputation RMSE: 5.56841

Iter: 200
Imputation MAPE: 0.101781
Imputation RMSE: 5.56744

Prediction MAPE: 0.116778
Prediction RMSE: 6.02036

Running time: 1123 seconds

Prediction time horizon (delta) = 6.
IntProgress(value=0, max=336)
Iter: 100
Imputation MAPE: 0.101812
Imputation RMSE: 5.56882

Iter: 200
Imputation MAPE: 0.10178
Imputation RMSE: 5.56738

Prediction MAPE: 0.119966
Prediction RMSE: 6.17256

Running time: 1141 seconds
Scenario setting:
import time
import scipy.io
import numpy as np
np.random.seed(1000)
dense_tensor = np.load('../datasets/Seattle-data-set/tensor.npz')['arr_0']
dim = dense_tensor.shape
missing_rate = 0.4 # Random missing (RM)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim[0], dim[1], dim[2]) + 0.5 - missing_rate)
dense_mat = dense_tensor.reshape([dim[0], dim[1] * dim[2]])
sparse_mat = sparse_tensor.reshape([dim[0], dim[1] * dim[2]])
del dense_tensor, sparse_tensor
Model setting:
import time
start = time.time()
rank = 10
pred_step = 7 * 288
time_lags = np.array([1, 2, 3, 288, 289, 290, 7 * 288, 7 * 288 + 1, 7 * 288 + 2])
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 1
init_hyper = {"lambda_w": lambda_w, "lambda_x": lambda_x, "lambda_theta": lambda_theta, "eta": eta}
maxiter = 200
for multi_step in [2, 4, 6]:
start = time.time()
print('Prediction time horizon (delta) = {}.'.format(multi_step))
mat_hat = TRMF_forecast(dense_mat, sparse_mat, init_hyper, pred_step, multi_step, rank, time_lags, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))
print()
Prediction time horizon (delta) = 2.
Iter: 100  Imputation MAPE: 0.0962245  Imputation RMSE: 5.28646
Iter: 200  Imputation MAPE: 0.096197  Imputation RMSE: 5.28516
Prediction MAPE: 0.113667  Prediction RMSE: 5.85433
Running time: 1113 seconds

Prediction time horizon (delta) = 4.
Iter: 100  Imputation MAPE: 0.096216  Imputation RMSE: 5.28609
Iter: 200  Imputation MAPE: 0.0961997  Imputation RMSE: 5.28534
Prediction MAPE: 0.115298  Prediction RMSE: 5.95917
Running time: 1125 seconds

Prediction time horizon (delta) = 6.
Iter: 100  Imputation MAPE: 0.0962624  Imputation RMSE: 5.28802
Iter: 200  Imputation MAPE: 0.0962058  Imputation RMSE: 5.28551
Prediction MAPE: 0.119754  Prediction RMSE: 6.14716
Running time: 1135 seconds
Scenario setting:
import time
import numpy as np
np.random.seed(1000)
dense_tensor = np.load('../datasets/Seattle-data-set/tensor.npz')['arr_0']
dim = dense_tensor.shape
missing_rate = 0.6 # Random missing (RM)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim[0], dim[1], dim[2]) + 0.5 - missing_rate)
dense_mat = dense_tensor.reshape([dim[0], dim[1] * dim[2]])
sparse_mat = sparse_tensor.reshape([dim[0], dim[1] * dim[2]])
del dense_tensor, sparse_tensor
Model setting:
import time
start = time.time()
rank = 10
pred_step = 7 * 288
time_lags = np.array([1, 2, 3, 288, 289, 290, 7 * 288, 7 * 288 + 1, 7 * 288 + 2])
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 1
init_hyper = {"lambda_w": lambda_w, "lambda_x": lambda_x, "lambda_theta": lambda_theta, "eta": eta}
maxiter = 200
for multi_step in [2, 4, 6]:
start = time.time()
print('Prediction time horizon (delta) = {}.'.format(multi_step))
mat_hat = TRMF_forecast(dense_mat, sparse_mat, init_hyper, pred_step, multi_step, rank, time_lags, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))
print()
Prediction time horizon (delta) = 2.
Iter: 100  Imputation MAPE: 0.106593  Imputation RMSE: 5.66466
Iter: 200  Imputation MAPE: 0.106592  Imputation RMSE: 5.66458
Prediction MAPE: 0.123772  Prediction RMSE: 6.19303
Running time: 1090 seconds

Prediction time horizon (delta) = 4.
Iter: 100  Imputation MAPE: 0.106616  Imputation RMSE: 5.66563
Iter: 200  Imputation MAPE: 0.106596  Imputation RMSE: 5.66472
Prediction MAPE: 0.126678  Prediction RMSE: 6.36595
Running time: 1103 seconds

Prediction time horizon (delta) = 6.
Iter: 100  Imputation MAPE: 0.106616  Imputation RMSE: 5.66558
Iter: 200  Imputation MAPE: 0.106592  Imputation RMSE: 5.66445
Prediction MAPE: 0.128849  Prediction RMSE: 6.41523
Running time: 1105 seconds
import pandas as pd
import warnings
warnings.simplefilter('ignore')
dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0)
dense_mat = dense_mat.values
sparse_mat = dense_mat.copy()
import time
start = time.time()
rank = 10
pred_step = 7 * 288
time_lags = np.array([1, 2, 3, 288, 289, 290, 7 * 288, 7 * 288 + 1, 7 * 288 + 2])
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 1
init_hyper = {"lambda_w": lambda_w, "lambda_x": lambda_x, "lambda_theta": lambda_theta, "eta": eta}
maxiter = 200
for multi_step in [2, 4, 6]:
start = time.time()
print('Prediction time horizon (delta) = {}.'.format(multi_step))
mat_hat = TRMF_forecast(dense_mat, sparse_mat, init_hyper, pred_step, multi_step, rank, time_lags, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))
print()
Prediction time horizon (delta) = 2.
Iter: 100  Imputation MAPE: nan  Imputation RMSE: nan
Iter: 200  Imputation MAPE: nan  Imputation RMSE: nan
Prediction MAPE: 0.104488  Prediction RMSE: 5.58347
Running time: 1157 seconds

Prediction time horizon (delta) = 4.
Iter: 100  Imputation MAPE: nan  Imputation RMSE: nan
Iter: 200  Imputation MAPE: nan  Imputation RMSE: nan
Prediction MAPE: 0.106473  Prediction RMSE: 5.70034
Running time: 1162 seconds

Prediction time horizon (delta) = 6.
Iter: 100  Imputation MAPE: nan  Imputation RMSE: nan
Iter: 200  Imputation MAPE: nan  Imputation RMSE: nan
Prediction MAPE: 0.111521  Prediction RMSE: 5.91872
Running time: 1181 seconds

The imputation MAPE/RMSE are `nan` here because `sparse_mat` is an exact copy of `dense_mat`: no entries were masked, so there is no held-out set on which to evaluate imputation.
The London movement speed data set is a city-wide hourly traffic speed dataset collected in London.
| Observation rate | $>90\%$ | $>80\%$ | $>70\%$ | $>60\%$ | $>50\%$ |
|---|---|---|---|---|---|
| Number of roads | 17,666 | 27,148 | 35,912 | 44,352 | 52,727 |
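The counts in the table above follow from per-road observation rates. A hedged sketch of how such counts could be derived (the function name and the small synthetic demo are ours; reproducing the table's actual numbers requires the full dataset):

```python
import numpy as np

# Hypothetical helper: given a (roads x hours) speed matrix where zeros mark
# missing values, count roads whose observation rate exceeds each threshold.
def count_roads(dense_mat, thresholds=(0.9, 0.8, 0.7, 0.6, 0.5)):
    obs_rate = np.mean(dense_mat != 0, axis=1)  # per-road fraction observed
    return {t: int(np.sum(obs_rate > t)) for t in thresholds}

# Small synthetic demo: observation rates are [0.75, 0.25, 1.0]
demo = np.array([[1, 2, 0, 4], [0, 0, 0, 5], [1, 1, 1, 1]])
print(count_roads(demo, thresholds=(0.5, 0.9)))  # {0.5: 2, 0.9: 1}
```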
If you want to test the model on the full dataset, you could use the following setting to mask observed values as missing.
import numpy as np
np.random.seed(1000)
mask_rate = 0.20
dense_mat = np.load('../datasets/London-data-set/hourly_speed_mat.npy')
pos_obs = np.where(dense_mat != 0)
num = len(pos_obs[0])
sample_ind = np.random.choice(num, size = int(mask_rate * num), replace = False)
sparse_mat = dense_mat.copy()
sparse_mat[pos_obs[0][sample_ind], pos_obs[1][sample_ind]] = 0
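Because `np.random.choice` samples distinct indices (`replace = False`) and only positions that were observed to begin with, this masking removes exactly `int(mask_rate * num)` entries. A self-contained sketch on synthetic data (the real file is not loaded here) confirming that count:

```python
import numpy as np

np.random.seed(1000)
mask_rate = 0.20

# Synthetic stand-in for the speed matrix; zeros denote already-missing entries
dense_mat = np.random.rand(200, 300)
dense_mat[np.random.rand(200, 300) < 0.1] = 0

pos_obs = np.where(dense_mat != 0)   # indices of observed entries
num = len(pos_obs[0])
sample_ind = np.random.choice(num, size=int(mask_rate * num), replace=False)
sparse_mat = dense_mat.copy()
sparse_mat[pos_obs[0][sample_ind], pos_obs[1][sample_ind]] = 0

# Exactly int(mask_rate * num) observed entries were zeroed out
removed = num - np.count_nonzero(sparse_mat)
print(removed == int(mask_rate * num))  # True
```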
Alternatively, you could evaluate the model on a subset of the data with the following setting.
import numpy as np
np.random.seed(1000)
missing_rate = 0.4
dense_mat = np.load('../datasets/London-data-set/hourly_speed_mat.npy')
binary_mat = dense_mat.copy()
binary_mat[binary_mat != 0] = 1
pos = np.where(np.sum(binary_mat, axis = 1) > 0.7 * binary_mat.shape[1])
dense_mat = dense_mat[pos[0], :]
## Non-random missing (NM)
binary_mat = np.zeros(dense_mat.shape)
random_mat = np.random.rand(dense_mat.shape[0], 30)
for i1 in range(dense_mat.shape[0]):
for i2 in range(30):
binary_mat[i1, i2 * 24 : (i2 + 1) * 24] = np.round(random_mat[i1, i2] + 0.5 - missing_rate)
sparse_mat = dense_mat * binary_mat
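In this NM scenario, missingness is decided per (road, day) rather than per hour: one random draw covers an entire 24-hour block. The double loop above can be written as a single vectorized expression; a sketch with small hypothetical dimensions:

```python
import numpy as np

np.random.seed(1000)
missing_rate = 0.4
num_roads, num_days, hours = 50, 30, 24

# One Bernoulli draw per (road, day), repeated across the 24 hours of that day
random_mat = np.random.rand(num_roads, num_days)
binary_mat = np.repeat(np.round(random_mat + 0.5 - missing_rate), hours, axis=1)
print(binary_mat.shape)  # (50, 720)
```

`np.repeat(..., hours, axis=1)` places column `i2` of the rounded matrix into columns `i2 * 24` through `(i2 + 1) * 24 - 1`, exactly as the explicit loop does.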
Model setting:
import time
start = time.time()
rank = 10
pred_step = 7 * 24
time_lags = np.array([1, 2, 3, 24, 25, 26, 7 * 24, 7 * 24 + 1, 7 * 24 + 2])
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 1
init_hyper = {"lambda_w": lambda_w, "lambda_x": lambda_x, "lambda_theta": lambda_theta, "eta": eta}
maxiter = 200
for multi_step in [2, 4, 6]:
start = time.time()
print('Prediction time horizon (delta) = {}.'.format(multi_step))
mat_hat = TRMF_forecast(dense_mat, sparse_mat, init_hyper, pred_step, multi_step, rank, time_lags, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))
print()
Prediction time horizon (delta) = 2.
Iter: 100  Imputation MAPE: 0.0993481  Imputation RMSE: 2.40038
Iter: 200  Imputation MAPE: 0.0993169  Imputation RMSE: 2.39941
Prediction MAPE: 0.116247  Prediction RMSE: 2.72127
Running time: 1184 seconds

Prediction time horizon (delta) = 4.
Iter: 100  Imputation MAPE: 0.0993366  Imputation RMSE: 2.40001
Iter: 200  Imputation MAPE: 0.0993201  Imputation RMSE: 2.39946
Prediction MAPE: 0.127815  Prediction RMSE: 3.14071
Running time: 1188 seconds

Prediction time horizon (delta) = 6.
Iter: 100  Imputation MAPE: 0.0993521  Imputation RMSE: 2.40041
Iter: 200  Imputation MAPE: 0.0993202  Imputation RMSE: 2.39952
Prediction MAPE: 0.123015  Prediction RMSE: 2.86905
Running time: 1188 seconds
import numpy as np
np.random.seed(1000)
missing_rate = 0.4
dense_mat = np.load('../datasets/London-data-set/hourly_speed_mat.npy')
binary_mat = dense_mat.copy()
binary_mat[binary_mat != 0] = 1
pos = np.where(np.sum(binary_mat, axis = 1) > 0.7 * binary_mat.shape[1])
dense_mat = dense_mat[pos[0], :]
## Random missing (RM)
random_mat = np.random.rand(dense_mat.shape[0], dense_mat.shape[1])
binary_mat = np.round(random_mat + 0.5 - missing_rate)
sparse_mat = dense_mat * binary_mat
Model setting:
import time
start = time.time()
rank = 10
pred_step = 7 * 24
time_lags = np.array([1, 2, 3, 24, 25, 26, 7 * 24, 7 * 24 + 1, 7 * 24 + 2])
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 1
init_hyper = {"lambda_w": lambda_w, "lambda_x": lambda_x, "lambda_theta": lambda_theta, "eta": eta}
maxiter = 200
for multi_step in [2, 4, 6]:
start = time.time()
print('Prediction time horizon (delta) = {}.'.format(multi_step))
mat_hat = TRMF_forecast(dense_mat, sparse_mat, init_hyper, pred_step, multi_step, rank, time_lags, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))
print()
Prediction time horizon (delta) = 2.
Iter: 100  Imputation MAPE: 0.0972497  Imputation RMSE: 2.34772
Iter: 200  Imputation MAPE: 0.0972339  Imputation RMSE: 2.34705
Prediction MAPE: 0.114902  Prediction RMSE: 2.70622
Running time: 1168 seconds

Prediction time horizon (delta) = 4.
Iter: 100  Imputation MAPE: 0.0972483  Imputation RMSE: 2.34754
Iter: 200  Imputation MAPE: 0.0972287  Imputation RMSE: 2.34688
Prediction MAPE: 0.117015  Prediction RMSE: 2.76467
Running time: 1207 seconds

Prediction time horizon (delta) = 6.
Iter: 100  Imputation MAPE: 0.0972588  Imputation RMSE: 2.34771
Iter: 200  Imputation MAPE: 0.0972342  Imputation RMSE: 2.34695
Prediction MAPE: 0.125986  Prediction RMSE: 2.94896
Running time: 1239 seconds
import numpy as np
np.random.seed(1000)
missing_rate = 0.6
dense_mat = np.load('../datasets/London-data-set/hourly_speed_mat.npy')
binary_mat = dense_mat.copy()
binary_mat[binary_mat != 0] = 1
pos = np.where(np.sum(binary_mat, axis = 1) > 0.7 * binary_mat.shape[1])
dense_mat = dense_mat[pos[0], :]
## Random missing (RM)
random_mat = np.random.rand(dense_mat.shape[0], dense_mat.shape[1])
binary_mat = np.round(random_mat + 0.5 - missing_rate)
sparse_mat = dense_mat * binary_mat
Model setting:
import time
start = time.time()
rank = 10
pred_step = 7 * 24
time_lags = np.array([1, 2, 3, 24, 25, 26, 7 * 24, 7 * 24 + 1, 7 * 24 + 2])
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 1
init_hyper = {"lambda_w": lambda_w, "lambda_x": lambda_x, "lambda_theta": lambda_theta, "eta": eta}
maxiter = 200
for multi_step in [2, 4, 6]:
start = time.time()
print('Prediction time horizon (delta) = {}.'.format(multi_step))
mat_hat = TRMF_forecast(dense_mat, sparse_mat, init_hyper, pred_step, multi_step, rank, time_lags, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))
print()
Prediction time horizon (delta) = 2.
Iter: 100  Imputation MAPE: 0.102163  Imputation RMSE: 2.45197
Iter: 200  Imputation MAPE: 0.102118  Imputation RMSE: 2.45057
Prediction MAPE: 0.119481  Prediction RMSE: 2.7947
Running time: 1061 seconds

Prediction time horizon (delta) = 4.
Iter: 100  Imputation MAPE: 0.102112  Imputation RMSE: 2.45041
Iter: 200  Imputation MAPE: 0.102092  Imputation RMSE: 2.44987
Prediction MAPE: 0.120207  Prediction RMSE: 2.79563
Running time: 1072 seconds

Prediction time horizon (delta) = 6.
Iter: 100  Imputation MAPE: 0.102122  Imputation RMSE: 2.45049
Iter: 200  Imputation MAPE: 0.1021  Imputation RMSE: 2.44998
Prediction MAPE: 0.126814  Prediction RMSE: 2.94801
Running time: 1066 seconds
import numpy as np
import warnings
warnings.simplefilter('ignore')
dense_mat = np.load('../datasets/London-data-set/hourly_speed_mat.npy')
binary_mat = dense_mat.copy()
binary_mat[binary_mat != 0] = 1
pos = np.where(np.sum(binary_mat, axis = 1) > 0.7 * binary_mat.shape[1])
dense_mat = dense_mat[pos[0], :]
sparse_mat = dense_mat.copy()
import time
start = time.time()
rank = 10
pred_step = 7 * 24
time_lags = np.array([1, 2, 3, 24, 25, 26, 7 * 24, 7 * 24 + 1, 7 * 24 + 2])
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 1
init_hyper = {"lambda_w": lambda_w, "lambda_x": lambda_x, "lambda_theta": lambda_theta, "eta": eta}
maxiter = 200
for multi_step in [2, 4, 6]:
start = time.time()
print('Prediction time horizon (delta) = {}.'.format(multi_step))
mat_hat = TRMF_forecast(dense_mat, sparse_mat, init_hyper, pred_step, multi_step, rank, time_lags, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))
print()
Prediction time horizon (delta) = 2.
Iter: 100  Imputation MAPE: nan  Imputation RMSE: nan
Iter: 200  Imputation MAPE: nan  Imputation RMSE: nan
Prediction MAPE: 0.113515  Prediction RMSE: 2.67703
Running time: 1452 seconds

Prediction time horizon (delta) = 4.
Iter: 100  Imputation MAPE: nan  Imputation RMSE: nan
Iter: 200  Imputation MAPE: nan  Imputation RMSE: nan
Prediction MAPE: 0.116352  Prediction RMSE: 2.79261
Running time: 1420 seconds

Prediction time horizon (delta) = 6.
Iter: 100  Imputation MAPE: nan  Imputation RMSE: nan
Iter: 200  Imputation MAPE: nan  Imputation RMSE: nan
Prediction MAPE: 0.127962  Prediction RMSE: 3.04552
Running time: 1420 seconds

As in the Seattle case, the imputation metrics are `nan` because `sparse_mat` equals `dense_mat` here: nothing was masked, so imputation cannot be evaluated.