# Data Organization: Matrix Structure¶

Reference: Hsiang-Fu Yu, Nikhil Rao, Inderjit S. Dhillon, 2016. Temporal regularized matrix factorization for high-dimensional time series prediction. 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.

We consider a dataset of $m$ discrete time series $\boldsymbol{y}_{i}\in\mathbb{R}^{f},i\in\left\{1,2,...,m\right\}$. The time series may have missing elements. We express spatio-temporal dataset as a matrix $Y\in\mathbb{R}^{m\times f}$ with $m$ rows (e.g., locations) and $f$ columns (e.g., discrete time intervals),

$$Y=\left[ \begin{array}{cccc} y_{11} & y_{12} & \cdots & y_{1f} \\ y_{21} & y_{22} & \cdots & y_{2f} \\ \vdots & \vdots & \ddots & \vdots \\ y_{m1} & y_{m2} & \cdots & y_{mf} \\ \end{array} \right]\in\mathbb{R}^{m\times f}.$$

# Temporal Regularized Matrix Factorization(TRMF)¶

Temporal Regularized Matrix Factorization (TRMF) framework is an approach to incorporate temporal dependencies into matrix factorization models which use well-studied time series models to describe temporal dependencies among ${\boldsymbol{x}_t}$ explicitly.Such models take the form:

$$\boldsymbol{x}_{t}\approx\sum_{l\in\mathcal{L}}\boldsymbol{\theta}_{l}\circledast\boldsymbol{x}_{t-l}$$

where this autoregressive (AR) is specialized by a lag set $\mathcal{L}=\left\{l_1,l_2,...,l_d\right\}$ (e.g., $\mathcal{L}=\left\{1,2,144\right\}$) and weights $\boldsymbol{\theta}_{l}\in\mathbb{R}^{r},\forall l$, and we further define

$$\mathcal{R}_{AR}\left(X\mid \mathcal{L},\Theta,\eta\right)=\frac{1}{2}\sum_{t=l_d+1}^{f}\left(\boldsymbol{x}_{t}-\sum_{l\in\mathcal{L}}\boldsymbol{\theta}_{l}\circledast\boldsymbol{x}_{t-l}\right)^T\left(\boldsymbol{x}_{t}-\sum_{l\in\mathcal{L}}\boldsymbol{\theta}_{l}\circledast\boldsymbol{x}_{t-l}\right)+\frac{\eta}{2}\sum_{t=1}^{f}\boldsymbol{x}_{t}^T\boldsymbol{x}_{t}.$$

Thus, TRMF-AR is given by solving

$$\min_{W,X,\Theta}\frac{1}{2}\underbrace{\sum_{(i,t)\in\Omega}\left(y_{it}-\boldsymbol{w}_{i}^T\boldsymbol{x}_{t}\right)^2}_{\text{sum of squared residual errors}}+\lambda_{w}\underbrace{\mathcal{R}_{w}\left(W\right)}_{W-\text{regularizer}}+\lambda_{x}\underbrace{\mathcal{R}_{AR}\left(X\mid \mathcal{L},\Theta,\eta\right)}_{\text{AR-regularizer}}+\lambda_{\theta}\underbrace{\mathcal{R}_{\theta}\left(\Theta\right)}_{\Theta-\text{regularizer}}$$

where $\mathcal{R}_{w}\left(W\right)=\frac{1}{2}\sum_{i=1}^{m}\boldsymbol{w}_{i}^T\boldsymbol{w}_{i}$ and $\mathcal{R}_{\theta}\left(\Theta\right)=\frac{1}{2}\sum_{l\in\mathcal{L}}\boldsymbol{\theta}_{l}^T\boldsymbol{\theta}_{l}$ are regularization terms.

In [1]:
import numpy as np
from numpy.linalg import inv as inv


# Matrix Computation Concepts¶

## Kronecker product¶

• Definition:

Given two matrices $A\in\mathbb{R}^{m_1\times n_1}$ and $B\in\mathbb{R}^{m_2\times n_2}$, then, the Kronecker product between these two matrices is defined as

$$A\otimes B=\left[ \begin{array}{cccc} a_{11}B & a_{12}B & \cdots & a_{1m_2}B \\ a_{21}B & a_{22}B & \cdots & a_{2m_2}B \\ \vdots & \vdots & \ddots & \vdots \\ a_{m_11}B & a_{m_12}B & \cdots & a_{m_1m_2}B \\ \end{array} \right]$$

where the symbol $\otimes$ denotes Kronecker product, and the size of resulted $A\otimes B$ is $(m_1m_2)\times (n_1n_2)$ (i.e., $m_1\times m_2$ columns and $n_1\times n_2$ rows).

• Example:

If $A=\left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \\ \end{array} \right]$ and $B=\left[ \begin{array}{ccc} 5 & 6 & 7\\ 8 & 9 & 10 \\ \end{array} \right]$, then, we have

$$A\otimes B=\left[ \begin{array}{cc} 1\times \left[ \begin{array}{ccc} 5 & 6 & 7\\ 8 & 9 & 10\\ \end{array} \right] & 2\times \left[ \begin{array}{ccc} 5 & 6 & 7\\ 8 & 9 & 10\\ \end{array} \right] \\ 3\times \left[ \begin{array}{ccc} 5 & 6 & 7\\ 8 & 9 & 10\\ \end{array} \right] & 4\times \left[ \begin{array}{ccc} 5 & 6 & 7\\ 8 & 9 & 10\\ \end{array} \right] \\ \end{array} \right]$$$$=\left[ \begin{array}{cccccc} 5 & 6 & 7 & 10 & 12 & 14 \\ 8 & 9 & 10 & 16 & 18 & 20 \\ 15 & 18 & 21 & 20 & 24 & 28 \\ 24 & 27 & 30 & 32 & 36 & 40 \\ \end{array} \right]\in\mathbb{R}^{4\times 6}.$$

## Khatri-Rao product (kr_prod)¶

• Definition:

Given two matrices $A=\left( \boldsymbol{a}_1,\boldsymbol{a}_2,...,\boldsymbol{a}_r \right)\in\mathbb{R}^{m\times r}$ and $B=\left( \boldsymbol{b}_1,\boldsymbol{b}_2,...,\boldsymbol{b}_r \right)\in\mathbb{R}^{n\times r}$ with same number of columns, then, the Khatri-Rao product (or column-wise Kronecker product) between $A$ and $B$ is given as follows,

$$A\odot B=\left( \boldsymbol{a}_1\otimes \boldsymbol{b}_1,\boldsymbol{a}_2\otimes \boldsymbol{b}_2,...,\boldsymbol{a}_r\otimes \boldsymbol{b}_r \right)\in\mathbb{R}^{(mn)\times r}$$

where the symbol $\odot$ denotes Khatri-Rao product, and $\otimes$ denotes Kronecker product.

• Example:

If $A=\left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \\ \end{array} \right]=\left( \boldsymbol{a}_1,\boldsymbol{a}_2 \right)$ and $B=\left[ \begin{array}{cc} 5 & 6 \\ 7 & 8 \\ 9 & 10 \\ \end{array} \right]=\left( \boldsymbol{b}_1,\boldsymbol{b}_2 \right)$, then, we have

$$A\odot B=\left( \boldsymbol{a}_1\otimes \boldsymbol{b}_1,\boldsymbol{a}_2\otimes \boldsymbol{b}_2 \right)$$$$=\left[ \begin{array}{cc} \left[ \begin{array}{c} 1 \\ 3 \\ \end{array} \right]\otimes \left[ \begin{array}{c} 5 \\ 7 \\ 9 \\ \end{array} \right] & \left[ \begin{array}{c} 2 \\ 4 \\ \end{array} \right]\otimes \left[ \begin{array}{c} 6 \\ 8 \\ 10 \\ \end{array} \right] \\ \end{array} \right]$$$$=\left[ \begin{array}{cc} 5 & 12 \\ 7 & 16 \\ 9 & 20 \\ 15 & 24 \\ 21 & 32 \\ 27 & 40 \\ \end{array} \right]\in\mathbb{R}^{6\times 2}.$$
In [2]:
def kr_prod(a, b):
return np.einsum('ir, jr -> ijr', a, b).reshape(a.shape[0] * b.shape[0], -1)

In [3]:
import numpy as np
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8], [9, 10]])
print(kr_prod(A, B))

[[ 5 12]
[ 7 16]
[ 9 20]
[15 24]
[21 32]
[27 40]]

In [4]:
def TRMF(dense_mat, sparse_mat, init, time_lags, lambda_w, lambda_x, lambda_theta, eta, maxiter):
W = init["W"]
X = init["X"]
theta = init["theta"]

dim1, dim2 = sparse_mat.shape
binary_mat = np.zeros((dim1,dim2))
position = np.where((sparse_mat != 0))
binary_mat[position] = 1
pos = np.where((dense_mat != 0) & (sparse_mat == 0))
d = len(time_lags)
r = theta.shape[1]

mape = np.zeros(maxiter)
rmse = np.zeros(maxiter)
for iter in range(maxiter):
var1 = X.T
var2 = kr_prod(var1,var1)
var3 = np.matmul(var2,binary_mat.T)
var4 = np.matmul(var1,sparse_mat.T)
for i in range(dim1):
W[i,:] = np.matmul(inv((var3[:,i].reshape([r,r]))+lambda_w * np.eye(r)), var4[:,i])

var1 = W.T
var2 = kr_prod(var1,var1)
var3 = np.matmul(var2, binary_mat)
var4 = np.matmul(var1, sparse_mat)
for t in range(dim2):
Mt = np.zeros((r,r))
Nt = np.zeros(r)
if t < max(time_lags):
Pt = np.zeros((r,r))
Qt = np.zeros(r)
else:
Pt = np.eye(r)
Qt = np.einsum('ij, ij -> j', theta, X[t - time_lags, :])
if t < dim2 - np.min(time_lags):
if t >= np.max(time_lags) and t < dim2 - np.max(time_lags):
index = list(range(0, d))
else:
index = list(np.where((t + time_lags >= np.max(time_lags)) & (t + time_lags < dim2)))[0]
for k in index:
theta0 = theta.copy()
theta0[k, :] = 0
Mt = Mt + np.diag(theta[k, :]**2);
Nt = Nt + np.multiply(theta[k,:],(X[t+time_lags[k], :]
- np.einsum('ij, ij -> j', theta0,
X[t + time_lags[k] - time_lags, :])))
X[t,:] = np.matmul(inv(var3[:, t].reshape([r,r])
+ lambda_x * Pt + lambda_x * Mt + lambda_x * eta * np.eye(r)),
(var4[:, t] + lambda_x * Qt + lambda_x * Nt))
elif t >= dim2 - np.min(time_lags):
X[t, :] = np.matmul(inv(var3[:, t].reshape([r, r]) + lambda_x * Pt
+ lambda_x * eta * np.eye(r)), (var4[:, t] + Qt))
for k in range(d):
var1 = X[np.max(time_lags) - time_lags[k] : dim2 - time_lags[k], :]
var2 = inv(np.diag(np.einsum('ij, ij -> j', var1, var1)) + (lambda_theta / lambda_x) * np.eye(r))
var3 = np.zeros(r)
for t in range(np.max(time_lags) - time_lags[k], dim2 - time_lags[k]):
var3 = var3 + np.multiply(X[t, :],
(X[t + time_lags[k], :]
- np.einsum('ij, ij -> j', theta, X[t + time_lags[k] - time_lags, :])
+np.multiply(theta[k, :], X[t,:])))
theta[k, :] = np.matmul(var2,var3)

mat_hat = np.matmul(W, X.T)
mape[iter] = np.sum(np.abs(dense_mat[pos] - mat_hat[pos]) / dense_mat[pos]) / dense_mat[pos].shape[0]
rmse[iter] = np.sqrt(np.sum((dense_mat[pos] - mat_hat[pos])**2)/dense_mat[pos].shape[0])
return W, X, theta

In [5]:
def OnlineTRMF(sparse_vec, init, lambda_x, time_lags):
W = init["W"]
X = init["X"]
theta = init["theta"]
dim = sparse_vec.shape[0]
t, rank = X.shape
position = np.where(sparse_vec != 0)
binary_vec = np.zeros(dim)
binary_vec[position] = 1

xt_tilde = np.einsum('ij, ij -> j', theta, X[t - 1 - time_lags, :])
var1 = W.T
var2 = kr_prod(var1, var1)
var_mu = np.matmul(var1, sparse_vec) + lambda_x * xt_tilde
inv_var_Lambda = inv(np.matmul(var2, binary_vec).reshape([rank, rank]) + lambda_x * np.eye(rank))
X[t - 1, :] = np.matmul(inv_var_Lambda, var_mu)
mat_hat = np.matmul(W, X.T)
return X

In [6]:
def st_prediction(dense_mat, sparse_mat, time_lags, lambda_w, lambda_x, lambda_theta, eta,
rank, pred_time_steps, maxiter):
start_time = dense_mat.shape[1] - pred_time_steps
dense_mat0 = dense_mat[:, 0 : start_time]
sparse_mat0 = sparse_mat[:, 0 : start_time]
dim1 = sparse_mat0.shape[0]
dim2 = sparse_mat0.shape[1]
mat_hat = np.zeros((dim1, pred_time_steps))

for t in range(pred_time_steps):
if t == 0:
init = {"W": 0.1 * np.random.rand(dim1, rank), "X": 0.1 * np.random.rand(dim2, rank),
"theta": 0.1 * np.random.rand(time_lags.shape[0], rank)}
W, X, theta = TRMF(dense_mat0, sparse_mat0, init, time_lags,
lambda_w, lambda_x, lambda_theta, eta, maxiter)
X0 = np.zeros((dim2 + t + 1, rank))
X0[0 : dim2 + t, :] = X.copy()
X0[dim2 + t, :] = np.einsum('ij, ij -> j', theta, X0[dim2 + t - time_lags, :])
else:
sparse_vec = sparse_mat[:, start_time + t - 1]
if np.where(sparse_vec > 0)[0].shape[0] > rank:
init = {"W": W, "X": X0[- np.max(time_lags) - 1 :, :], "theta": theta}
X = OnlineTRMF(sparse_vec, init, lambda_x/dim2, time_lags)
X0 = np.zeros((np.max(time_lags) + 1, rank))
X0[0 : np.max(time_lags), :] = X[1 :, :].copy()
X0[np.max(time_lags), :] = np.einsum('ij, ij -> j', theta, X0[np.max(time_lags) - time_lags, :])
else:
X0 = np.zeros((np.max(time_lags) + 1, rank))
X0[0 : np.max(time_lags), :] = X[1 :, :]
X0[np.max(time_lags), :] = np.einsum('ij, ij -> j', theta, X0[np.max(time_lags) - time_lags, :])
mat_hat[:, t] = np.matmul(W, X0[-1, :])
if (t + 1) % 40 == 0:
print('Time step: {}'.format(t + 1))

small_dense_mat = dense_mat[:, start_time : dense_mat.shape[1]]
pos = np.where(small_dense_mat != 0)
final_mape = np.sum(np.abs(small_dense_mat[pos] -
mat_hat[pos])/small_dense_mat[pos])/small_dense_mat[pos].shape[0]
final_rmse = np.sqrt(np.sum((small_dense_mat[pos] -
mat_hat[pos]) ** 2)/small_dense_mat[pos].shape[0])
print('Final MAPE: {:.6}'.format(final_mape))
print('Final RMSE: {:.6}'.format(final_rmse))
print()
return mat_hat

In [7]:
import scipy.io

tensor = tensor['tensor']
random_matrix = random_matrix['random_matrix']
random_tensor = random_tensor['random_tensor']

dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])
missing_rate = 0.0

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
binary_mat = np.round(random_tensor + 0.5 - missing_rate).reshape([random_tensor.shape[0],
random_tensor.shape[1]
* random_tensor.shape[2]])
# =============================================================================

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
# binary_tensor = np.zeros(tensor.shape)
# for i1 in range(tensor.shape[0]):
#     for i2 in range(tensor.shape[1]):
#         binary_tensor[i1,i2,:] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)
# binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1]
#                                     * binary_tensor.shape[2]])
# =============================================================================

sparse_mat = np.multiply(dense_mat, binary_mat)

In [8]:
import time
start = time.time()
pred_time_steps = 144 * 5
time_lags = np.array([1, 2, 144])
dim1, dim2 = sparse_mat.shape
rank = 30
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 0.03
d = time_lags.shape[0]

maxiter = 200
mat_hat = st_prediction(dense_mat, dense_mat, time_lags, lambda_w, lambda_x, lambda_theta,
eta, rank, pred_time_steps, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:67: RuntimeWarning: invalid value encountered in double_scalars
/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:68: RuntimeWarning: invalid value encountered in double_scalars

Time step: 40
Time step: 80
Time step: 120
Time step: 160
Time step: 200
Time step: 240
Time step: 280
Time step: 320
Time step: 360
Time step: 400
Time step: 440
Time step: 480
Time step: 520
Time step: 560
Time step: 600
Time step: 640
Time step: 680
Time step: 720
Final MAPE: 0.106524
Final RMSE: 4.29955

Running time: 417 seconds

In [9]:
import scipy.io

tensor = tensor['tensor']
random_matrix = random_matrix['random_matrix']
random_tensor = random_tensor['random_tensor']

dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])
missing_rate = 0.2

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
binary_mat = np.round(random_tensor + 0.5 - missing_rate).reshape([random_tensor.shape[0],
random_tensor.shape[1]
* random_tensor.shape[2]])
# =============================================================================

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
# binary_tensor = np.zeros(tensor.shape)
# for i1 in range(tensor.shape[0]):
#     for i2 in range(tensor.shape[1]):
#         binary_tensor[i1,i2,:] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)
# binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1]
#                                     * binary_tensor.shape[2]])
# =============================================================================

sparse_mat = np.multiply(dense_mat, binary_mat)

In [10]:
import time
start = time.time()
pred_time_steps = 144 * 5
time_lags = np.array([1, 2, 144])
dim1, dim2 = sparse_mat.shape
rank = 30
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 0.03
d = time_lags.shape[0]

maxiter = 200
mat_hat = st_prediction(dense_mat, dense_mat, time_lags, lambda_w, lambda_x, lambda_theta,
eta, rank, pred_time_steps, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:67: RuntimeWarning: invalid value encountered in double_scalars
/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:68: RuntimeWarning: invalid value encountered in double_scalars

Time step: 40
Time step: 80
Time step: 120
Time step: 160
Time step: 200
Time step: 240
Time step: 280
Time step: 320
Time step: 360
Time step: 400
Time step: 440
Time step: 480
Time step: 520
Time step: 560
Time step: 600
Time step: 640
Time step: 680
Time step: 720
Final MAPE: 0.106232
Final RMSE: 4.3062

Running time: 404 seconds

In [11]:
import scipy.io

tensor = tensor['tensor']
random_matrix = random_matrix['random_matrix']
random_tensor = random_tensor['random_tensor']

dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])
missing_rate = 0.4

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
binary_mat = np.round(random_tensor + 0.5 - missing_rate).reshape([random_tensor.shape[0],
random_tensor.shape[1]
* random_tensor.shape[2]])
# =============================================================================

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
# binary_tensor = np.zeros(tensor.shape)
# for i1 in range(tensor.shape[0]):
#     for i2 in range(tensor.shape[1]):
#         binary_tensor[i1,i2,:] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)
# binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1]
#                                     * binary_tensor.shape[2]])
# =============================================================================

sparse_mat = np.multiply(dense_mat, binary_mat)

In [12]:
import time
start = time.time()
pred_time_steps = 144 * 5
time_lags = np.array([1, 2, 144])
dim1, dim2 = sparse_mat.shape
rank = 30
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 0.03
d = time_lags.shape[0]

maxiter = 200
mat_hat = st_prediction(dense_mat, dense_mat, time_lags, lambda_w, lambda_x, lambda_theta,
eta, rank, pred_time_steps, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:67: RuntimeWarning: invalid value encountered in double_scalars
/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:68: RuntimeWarning: invalid value encountered in double_scalars

Time step: 40
Time step: 80
Time step: 120
Time step: 160
Time step: 200
Time step: 240
Time step: 280
Time step: 320
Time step: 360
Time step: 400
Time step: 440
Time step: 480
Time step: 520
Time step: 560
Time step: 600
Time step: 640
Time step: 680
Time step: 720
Final MAPE: 0.10619
Final RMSE: 4.30295

Running time: 386 seconds

In [13]:
import scipy.io

tensor = tensor['tensor']
random_matrix = random_matrix['random_matrix']
random_tensor = random_tensor['random_tensor']

dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])
missing_rate = 0.2

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
# binary_mat = np.round(random_tensor + 0.5 - missing_rate).reshape([random_tensor.shape[0],
#                                                                    random_tensor.shape[1]
#                                                                    * random_tensor.shape[2]])
# =============================================================================

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros(tensor.shape)
for i1 in range(tensor.shape[0]):
for i2 in range(tensor.shape[1]):
binary_tensor[i1,i2,:] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)
binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1]
* binary_tensor.shape[2]])
# =============================================================================

sparse_mat = np.multiply(dense_mat, binary_mat)

In [14]:
import time
start = time.time()
pred_time_steps = 144 * 5
time_lags = np.array([1, 2, 144])
dim1, dim2 = sparse_mat.shape
rank = 30
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 0.03
d = time_lags.shape[0]

maxiter = 200
mat_hat = st_prediction(dense_mat, dense_mat, time_lags, lambda_w, lambda_x, lambda_theta,
eta, rank, pred_time_steps, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:67: RuntimeWarning: invalid value encountered in double_scalars
/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:68: RuntimeWarning: invalid value encountered in double_scalars

Time step: 40
Time step: 80
Time step: 120
Time step: 160
Time step: 200
Time step: 240
Time step: 280
Time step: 320
Time step: 360
Time step: 400
Time step: 440
Time step: 480
Time step: 520
Time step: 560
Time step: 600
Time step: 640
Time step: 680
Time step: 720
Final MAPE: 0.106395
Final RMSE: 4.29308

Running time: 385 seconds

In [15]:
import scipy.io

tensor = tensor['tensor']
random_matrix = random_matrix['random_matrix']
random_tensor = random_tensor['random_tensor']

dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])
missing_rate = 0.4

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
# binary_mat = np.round(random_tensor + 0.5 - missing_rate).reshape([random_tensor.shape[0],
#                                                                    random_tensor.shape[1]
#                                                                    * random_tensor.shape[2]])
# =============================================================================

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros(tensor.shape)
for i1 in range(tensor.shape[0]):
for i2 in range(tensor.shape[1]):
binary_tensor[i1,i2,:] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)
binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1]
* binary_tensor.shape[2]])
# =============================================================================

sparse_mat = np.multiply(dense_mat, binary_mat)

In [16]:
import time
start = time.time()
pred_time_steps = 144 * 5
time_lags = np.array([1, 2, 144])
dim1, dim2 = sparse_mat.shape
rank = 30
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 0.03
d = time_lags.shape[0]

maxiter = 200
mat_hat = st_prediction(dense_mat, dense_mat, time_lags, lambda_w, lambda_x, lambda_theta,
eta, rank, pred_time_steps, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:67: RuntimeWarning: invalid value encountered in double_scalars
/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:68: RuntimeWarning: invalid value encountered in double_scalars

Time step: 40
Time step: 80
Time step: 120
Time step: 160
Time step: 200
Time step: 240
Time step: 280
Time step: 320
Time step: 360
Time step: 400
Time step: 440
Time step: 480
Time step: 520
Time step: 560
Time step: 600
Time step: 640
Time step: 680
Time step: 720
Final MAPE: 0.107114
Final RMSE: 4.32297

Running time: 385 seconds


Experiment results of spatial-temporal data prediction using TRMF:

scenario rank Lambda_w Lambda_x Lambda_theta eta maxiter mape rmse
Original data 30 500 500 500 0.03 200 0.1065 4.30
20%, RM 30 500 500 500 0.03 200 0.1062 4.31
40%, RM 30 500 500 500 0.03 200 0.1062 4.30
20%, NM 30 500 500 500 0.03 200 0.1064 4.29
40%, NM 30 500 500 500 0.03 200 0.1071 4.32
In [17]:
import scipy.io

tensor = tensor['tensor']
random_matrix = random_matrix['random_matrix']
random_tensor = random_tensor['random_tensor']

dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])
missing_rate = 0.0

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
# binary_mat = np.round(random_tensor + 0.5 - missing_rate).reshape([random_tensor.shape[0],
#                                                                    random_tensor.shape[1]
#                                                                    * random_tensor.shape[2]])
# =============================================================================

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros(tensor.shape)
for i1 in range(tensor.shape[0]):
for i2 in range(tensor.shape[1]):
binary_tensor[i1,i2,:] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)
binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1]
* binary_tensor.shape[2]])
# =============================================================================

sparse_mat = np.multiply(dense_mat, binary_mat)

In [18]:
import time
start = time.time()
pred_time_steps = 18 * 7
time_lags = np.array([1, 2, 18])
dim1, dim2 = sparse_mat.shape
rank = 10
lambda_w = 100
lambda_x = 100
lambda_theta = 100
eta = 0.01
d = time_lags.shape[0]

maxiter = 200
mat_hat = st_prediction(dense_mat, dense_mat, time_lags, lambda_w, lambda_x, lambda_theta,
eta, rank, pred_time_steps, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:67: RuntimeWarning: invalid value encountered in double_scalars
/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:68: RuntimeWarning: invalid value encountered in double_scalars

Time step: 40
Time step: 80
Time step: 120
Final MAPE: 0.326294
Final RMSE: 174.248

Running time: 42 seconds

In [19]:
import scipy.io

tensor = tensor['tensor']
random_matrix = random_matrix['random_matrix']
random_tensor = random_tensor['random_tensor']

dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])
missing_rate = 0.1

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
binary_mat = np.round(random_tensor + 0.5 - missing_rate).reshape([random_tensor.shape[0],
random_tensor.shape[1]
* random_tensor.shape[2]])
# =============================================================================

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
# binary_tensor = np.zeros(tensor.shape)
# for i1 in range(tensor.shape[0]):
#     for i2 in range(tensor.shape[1]):
#         binary_tensor[i1,i2,:] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)
# binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1]
#                                     * binary_tensor.shape[2]])
# =============================================================================

sparse_mat = np.multiply(dense_mat, binary_mat)

In [20]:
import time
start = time.time()
pred_time_steps = 18 * 7
time_lags = np.array([1, 2, 18])
dim1, dim2 = sparse_mat.shape
rank = 10
lambda_w = 100
lambda_x = 100
lambda_theta = 100
eta = 0.01
d = time_lags.shape[0]

maxiter = 200
mat_hat = st_prediction(dense_mat, dense_mat, time_lags, lambda_w, lambda_x, lambda_theta,
eta, rank, pred_time_steps, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:67: RuntimeWarning: invalid value encountered in double_scalars
/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:68: RuntimeWarning: invalid value encountered in double_scalars

Time step: 40
Time step: 80
Time step: 120
Final MAPE: 0.326723
Final RMSE: 171.685

Running time: 37 seconds

In [21]:
import scipy.io

tensor = tensor['tensor']
random_matrix = random_matrix['random_matrix']
random_tensor = random_tensor['random_tensor']

dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])
missing_rate = 0.3

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
binary_mat = np.round(random_tensor + 0.5 - missing_rate).reshape([random_tensor.shape[0],
random_tensor.shape[1]
* random_tensor.shape[2]])
# =============================================================================

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
# binary_tensor = np.zeros(tensor.shape)
# for i1 in range(tensor.shape[0]):
#     for i2 in range(tensor.shape[1]):
#         binary_tensor[i1,i2,:] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)
# binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1]
#                                     * binary_tensor.shape[2]])
# =============================================================================

sparse_mat = np.multiply(dense_mat, binary_mat)

In [22]:
import time
start = time.time()
pred_time_steps = 18 * 7
time_lags = np.array([1, 2, 18])
dim1, dim2 = sparse_mat.shape
rank = 10
lambda_w = 100
lambda_x = 100
lambda_theta = 100
eta = 0.01
d = time_lags.shape[0]

maxiter = 200
mat_hat = st_prediction(dense_mat, dense_mat, time_lags, lambda_w, lambda_x, lambda_theta,
eta, rank, pred_time_steps, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:67: RuntimeWarning: invalid value encountered in double_scalars
/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:68: RuntimeWarning: invalid value encountered in double_scalars

Time step: 40
Time step: 80
Time step: 120
Final MAPE: 0.344155
Final RMSE: 181.166

Running time: 37 seconds

In [23]:
import scipy.io

tensor = tensor['tensor']
random_matrix = random_matrix['random_matrix']
random_tensor = random_tensor['random_tensor']

dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])
missing_rate = 0.1

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
# binary_mat = np.round(random_tensor + 0.5 - missing_rate).reshape([random_tensor.shape[0],
#                                                                    random_tensor.shape[1]
#                                                                    * random_tensor.shape[2]])
# =============================================================================

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros(tensor.shape)
for i1 in range(tensor.shape[0]):
for i2 in range(tensor.shape[1]):
binary_tensor[i1,i2,:] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)
binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1]
* binary_tensor.shape[2]])
# =============================================================================

sparse_mat = np.multiply(dense_mat, binary_mat)

In [24]:
import time
start = time.time()
pred_time_steps = 18 * 7
time_lags = np.array([1, 2, 18])
dim1, dim2 = sparse_mat.shape
rank = 10
lambda_w = 100
lambda_x = 100
lambda_theta = 100
eta = 0.01
d = time_lags.shape[0]

maxiter = 200
mat_hat = st_prediction(dense_mat, dense_mat, time_lags, lambda_w, lambda_x, lambda_theta,
eta, rank, pred_time_steps, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:67: RuntimeWarning: invalid value encountered in double_scalars
/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:68: RuntimeWarning: invalid value encountered in double_scalars

Time step: 40
Time step: 80
Time step: 120
Final MAPE: 0.319528
Final RMSE: 169.295

Running time: 39 seconds

In [25]:
import scipy.io

tensor = tensor['tensor']
random_matrix = random_matrix['random_matrix']
random_tensor = random_tensor['random_tensor']

dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])
missing_rate = 0.3

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
# binary_mat = np.round(random_tensor + 0.5 - missing_rate).reshape([random_tensor.shape[0],
#                                                                    random_tensor.shape[1]
#                                                                    * random_tensor.shape[2]])
# =============================================================================

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros(tensor.shape)
for i1 in range(tensor.shape[0]):
for i2 in range(tensor.shape[1]):
binary_tensor[i1,i2,:] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)
binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1]
* binary_tensor.shape[2]])
# =============================================================================

sparse_mat = np.multiply(dense_mat, binary_mat)

In [26]:
import time
start = time.time()
pred_time_steps = 18 * 7
time_lags = np.array([1, 2, 18])
dim1, dim2 = sparse_mat.shape
rank = 10
lambda_w = 100
lambda_x = 100
lambda_theta = 100
eta = 0.01
d = time_lags.shape[0]

maxiter = 200
mat_hat = st_prediction(dense_mat, dense_mat, time_lags, lambda_w, lambda_x, lambda_theta,
eta, rank, pred_time_steps, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:67: RuntimeWarning: invalid value encountered in double_scalars
/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:68: RuntimeWarning: invalid value encountered in double_scalars

Time step: 40
Time step: 80
Time step: 120
Final MAPE: 0.33093
Final RMSE: 175.635

Running time: 38 seconds


Experiment results of spatial-temporal data prediction using TRMF:

scenario rank Lambda_w Lambda_x Lambda_theta eta back step mape rmse
Original data 10 100 100 100 0.01 200 0.3263 174.25
20%, RM 10 100 100 100 0.01 200 0.3267 171.69
40%, RM 10 100 100 100 0.01 200 0.3442 181.17
20%, NM 10 100 100 100 0.01 200 0.3195 169.30
40%, NM 10 100 100 100 0.01 200 0.3309 175.64
In [27]:
import scipy.io

tensor = tensor['tensor']
random_matrix = random_matrix['random_matrix']
random_tensor = random_tensor['random_tensor']

dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])
missing_rate = 0.0

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
# binary_mat = np.round(random_tensor + 0.5 - missing_rate).reshape([random_tensor.shape[0],
#                                                                    random_tensor.shape[1]
#                                                                    * random_tensor.shape[2]])
# =============================================================================

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros(tensor.shape)
for i1 in range(tensor.shape[0]):
for i2 in range(tensor.shape[1]):
binary_tensor[i1,i2,:] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)
binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1]
* binary_tensor.shape[2]])
# =============================================================================

sparse_mat = np.multiply(dense_mat, binary_mat)

In [28]:
import time
start = time.time()
pred_time_steps = 108 * 5
time_lags = np.array([1, 2, 108])
dim1, dim2 = sparse_mat.shape
rank = 10
lambda_w = 1000
lambda_x = 1000
lambda_theta = 1000
eta = 0.05
d = time_lags.shape[0]

maxiter = 200
mat_hat = st_prediction(dense_mat, dense_mat, time_lags, lambda_w, lambda_x, lambda_theta,
eta, rank, pred_time_steps, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:67: RuntimeWarning: invalid value encountered in double_scalars
/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:68: RuntimeWarning: invalid value encountered in double_scalars

Time step: 40
Time step: 80
Time step: 120
Time step: 160
Time step: 200
Time step: 240
Time step: 280
Time step: 320
Time step: 360
Time step: 400
Time step: 440
Time step: 480
Time step: 520
Final MAPE: 0.277726
Final RMSE: 39.9873

Running time: 65 seconds

In [29]:
import scipy.io

tensor = tensor['tensor']
random_matrix = random_matrix['random_matrix']
random_tensor = random_tensor['random_tensor']

dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])
missing_rate = 0.2

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
binary_mat = np.round(random_tensor + 0.5 - missing_rate).reshape([random_tensor.shape[0],
random_tensor.shape[1]
* random_tensor.shape[2]])
# =============================================================================

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
# binary_tensor = np.zeros(tensor.shape)
# for i1 in range(tensor.shape[0]):
#     for i2 in range(tensor.shape[1]):
#         binary_tensor[i1,i2,:] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)
# binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1]
#                                     * binary_tensor.shape[2]])
# =============================================================================

sparse_mat = np.multiply(dense_mat, binary_mat)

In [30]:
import time
start = time.time()
pred_time_steps = 108 * 5
time_lags = np.array([1, 2, 108])
dim1, dim2 = sparse_mat.shape
rank = 10
lambda_w = 1000
lambda_x = 1000
lambda_theta = 1000
eta = 0.03
d = time_lags.shape[0]

maxiter = 200
mat_hat = st_prediction(dense_mat, dense_mat, time_lags, lambda_w, lambda_x, lambda_theta,
eta, rank, pred_time_steps, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:67: RuntimeWarning: invalid value encountered in double_scalars
/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:68: RuntimeWarning: invalid value encountered in double_scalars

Time step: 40
Time step: 80
Time step: 120
Time step: 160
Time step: 200
Time step: 240
Time step: 280
Time step: 320
Time step: 360
Time step: 400
Time step: 440
Time step: 480
Time step: 520
Final MAPE: 0.275878
Final RMSE: 40.7251

Running time: 64 seconds

In [31]:
import scipy.io

tensor = tensor['tensor']
random_matrix = random_matrix['random_matrix']
random_tensor = random_tensor['random_tensor']

dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])
missing_rate = 0.4

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
binary_mat = np.round(random_tensor + 0.5 - missing_rate).reshape([random_tensor.shape[0],
random_tensor.shape[1]
* random_tensor.shape[2]])
# =============================================================================

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
# binary_tensor = np.zeros(tensor.shape)
# for i1 in range(tensor.shape[0]):
#     for i2 in range(tensor.shape[1]):
#         binary_tensor[i1,i2,:] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)
# binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1]
#                                     * binary_tensor.shape[2]])
# =============================================================================

sparse_mat = np.multiply(dense_mat, binary_mat)

In [32]:
import time
start = time.time()
pred_time_steps = 108 * 5
time_lags = np.array([1, 2, 108])
dim1, dim2 = sparse_mat.shape
rank = 10
lambda_w = 1000
lambda_x = 1000
lambda_theta = 1000
eta = 0.03
d = time_lags.shape[0]

maxiter = 200
mat_hat = st_prediction(dense_mat, dense_mat, time_lags, lambda_w, lambda_x, lambda_theta,
eta, rank, pred_time_steps, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:67: RuntimeWarning: invalid value encountered in double_scalars
/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:68: RuntimeWarning: invalid value encountered in double_scalars

Time step: 40
Time step: 80
Time step: 120
Time step: 160
Time step: 200
Time step: 240
Time step: 280
Time step: 320
Time step: 360
Time step: 400
Time step: 440
Time step: 480
Time step: 520
Final MAPE: 0.266774
Final RMSE: 47.8046

Running time: 64 seconds

In [33]:
import scipy.io

tensor = tensor['tensor']
random_matrix = random_matrix['random_matrix']
random_tensor = random_tensor['random_tensor']

dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])
missing_rate = 0.2

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
# binary_mat = np.round(random_tensor + 0.5 - missing_rate).reshape([random_tensor.shape[0],
#                                                                    random_tensor.shape[1]
#                                                                    * random_tensor.shape[2]])
# =============================================================================

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros(tensor.shape)
for i1 in range(tensor.shape[0]):
for i2 in range(tensor.shape[1]):
binary_tensor[i1,i2,:] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)
binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1]
* binary_tensor.shape[2]])
# =============================================================================

sparse_mat = np.multiply(dense_mat, binary_mat)

In [34]:
import time
start = time.time()
pred_time_steps = 108 * 5
time_lags = np.array([1, 2, 108])
dim1, dim2 = sparse_mat.shape
rank = 10
lambda_w = 1000
lambda_x = 1000
lambda_theta = 1000
eta = 0.03
d = time_lags.shape[0]

maxiter = 200
mat_hat = st_prediction(dense_mat, dense_mat, time_lags, lambda_w, lambda_x, lambda_theta,
eta, rank, pred_time_steps, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:67: RuntimeWarning: invalid value encountered in double_scalars
/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:68: RuntimeWarning: invalid value encountered in double_scalars

Time step: 40
Time step: 80
Time step: 120
Time step: 160
Time step: 200
Time step: 240
Time step: 280
Time step: 320
Time step: 360
Time step: 400
Time step: 440
Time step: 480
Time step: 520
Final MAPE: 0.265751
Final RMSE: 45.2281

Running time: 75 seconds

In [35]:
import scipy.io

tensor = tensor['tensor']
random_matrix = random_matrix['random_matrix']
random_tensor = random_tensor['random_tensor']

dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])
missing_rate = 0.4

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
# binary_mat = np.round(random_tensor + 0.5 - missing_rate).reshape([random_tensor.shape[0],
#                                                                    random_tensor.shape[1]
#                                                                    * random_tensor.shape[2]])
# =============================================================================

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros(tensor.shape)
for i1 in range(tensor.shape[0]):
for i2 in range(tensor.shape[1]):
binary_tensor[i1,i2,:] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)
binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1]
* binary_tensor.shape[2]])
# =============================================================================

sparse_mat = np.multiply(dense_mat, binary_mat)

In [36]:
import time
start = time.time()
pred_time_steps = 108 * 5
time_lags = np.array([1, 2, 108])
dim1, dim2 = sparse_mat.shape
rank = 10
lambda_w = 1000
lambda_x = 1000
lambda_theta = 1000
eta = 0.03
d = time_lags.shape[0]

maxiter = 200
mat_hat = st_prediction(dense_mat, dense_mat, time_lags, lambda_w, lambda_x, lambda_theta,
eta, rank, pred_time_steps, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:67: RuntimeWarning: invalid value encountered in double_scalars
/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:68: RuntimeWarning: invalid value encountered in double_scalars

Time step: 40
Time step: 80
Time step: 120
Time step: 160
Time step: 200
Time step: 240
Time step: 280
Time step: 320
Time step: 360
Time step: 400
Time step: 440
Time step: 480
Time step: 520
Final MAPE: 0.287804
Final RMSE: 41.0237

Running time: 75 seconds


Experiment results of spatial-temporal data prediction using TRMF:

scenario rank Lambda_w Lambda_x Lambda_theta eta maxiter mape rmse
Original data 10 1000 1000 1000 0.03 200 0.2777 39.99
20%, RM 10 1000 1000 1000 0.03 200 0.2759 40.73
40%, RM 10 1000 1000 1000 0.03 200 0.2668 47.80
20%, NM 10 1000 1000 1000 0.03 200 0.2658 45.23
40%, NM 10 1000 1000 1000 0.03 200 0.2878 41.02
In [7]:
import pandas as pd

dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0)
RM_mat = pd.read_csv('../datasets/Seattle-data-set/RM_mat.csv', index_col = 0)
dense_mat = dense_mat.values
RM_mat = RM_mat.values

missing_rate = 0.2
# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
binary_mat = np.round(RM_mat + 0.5 - missing_rate)
# =============================================================================

sparse_mat = np.multiply(dense_mat, binary_mat)

In [8]:
import time
start = time.time()
pred_time_steps = 288 * 5
time_lags = np.array([1, 2, 288])
dim1, dim2 = sparse_mat.shape
rank = 30
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 0.03
d = time_lags.shape[0]

maxiter = 200
mat_hat = st_prediction(dense_mat, dense_mat, time_lags, lambda_w, lambda_x, lambda_theta,
eta, rank, pred_time_steps, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:67: RuntimeWarning: invalid value encountered in double_scalars
/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:68: RuntimeWarning: invalid value encountered in double_scalars

Time step: 40
Time step: 80
Time step: 120
Time step: 160
Time step: 200
Time step: 240
Time step: 280
Time step: 320
Time step: 360
Time step: 400
Time step: 440
Time step: 480
Time step: 520
Time step: 560
Time step: 600
Time step: 640
Time step: 680
Time step: 720
Time step: 760
Time step: 800
Time step: 840
Time step: 880
Time step: 920
Time step: 960
Time step: 1000
Time step: 1040
Time step: 1080
Time step: 1120
Time step: 1160
Time step: 1200
Time step: 1240
Time step: 1280
Time step: 1320
Time step: 1360
Time step: 1400
Time step: 1440
Final MAPE: 0.0795357
Final RMSE: 4.89963

Running time: 324 seconds

In [9]:
import pandas as pd

dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0)
RM_mat = pd.read_csv('../datasets/Seattle-data-set/RM_mat.csv', index_col = 0)
dense_mat = dense_mat.values
RM_mat = RM_mat.values

missing_rate = 0.4
# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
binary_mat = np.round(RM_mat + 0.5 - missing_rate)
# =============================================================================

sparse_mat = np.multiply(dense_mat, binary_mat)

In [10]:
import time
start = time.time()
pred_time_steps = 288 * 5
time_lags = np.array([1, 2, 288])
dim1, dim2 = sparse_mat.shape
rank = 30
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 0.03
d = time_lags.shape[0]

maxiter = 200
mat_hat = st_prediction(dense_mat, dense_mat, time_lags, lambda_w, lambda_x, lambda_theta,
eta, rank, pred_time_steps, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:67: RuntimeWarning: invalid value encountered in double_scalars
/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:68: RuntimeWarning: invalid value encountered in double_scalars

Time step: 40
Time step: 80
Time step: 120
Time step: 160
Time step: 200
Time step: 240
Time step: 280
Time step: 320
Time step: 360
Time step: 400
Time step: 440
Time step: 480
Time step: 520
Time step: 560
Time step: 600
Time step: 640
Time step: 680
Time step: 720
Time step: 760
Time step: 800
Time step: 840
Time step: 880
Time step: 920
Time step: 960
Time step: 1000
Time step: 1040
Time step: 1080
Time step: 1120
Time step: 1160
Time step: 1200
Time step: 1240
Time step: 1280
Time step: 1320
Time step: 1360
Time step: 1400
Time step: 1440
Final MAPE: 0.0795249
Final RMSE: 4.89829

Running time: 298 seconds

In [11]:
import pandas as pd

dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0)
NM_mat = pd.read_csv('../datasets/Seattle-data-set/NM_mat.csv', index_col = 0)
dense_mat = dense_mat.values
NM_mat = NM_mat.values

missing_rate = 0.2
# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros((dense_mat.shape[0], 28, 288))
for i1 in range(binary_tensor.shape[0]):
for i2 in range(binary_tensor.shape[1]):
binary_tensor[i1, i2, :] = np.round(NM_mat[i1, i2] + 0.5 - missing_rate)
binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1] * binary_tensor.shape[2]])
# =============================================================================

sparse_mat = np.multiply(dense_mat, binary_mat)

In [12]:
import time
start = time.time()
pred_time_steps = 288 * 5
time_lags = np.array([1, 2, 288])
dim1, dim2 = sparse_mat.shape
rank = 30
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 0.03
d = time_lags.shape[0]

maxiter = 200
mat_hat = st_prediction(dense_mat, dense_mat, time_lags, lambda_w, lambda_x, lambda_theta,
eta, rank, pred_time_steps, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:67: RuntimeWarning: invalid value encountered in double_scalars
/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:68: RuntimeWarning: invalid value encountered in double_scalars

Time step: 40
Time step: 80
Time step: 120
Time step: 160
Time step: 200
Time step: 240
Time step: 280
Time step: 320
Time step: 360
Time step: 400
Time step: 440
Time step: 480
Time step: 520
Time step: 560
Time step: 600
Time step: 640
Time step: 680
Time step: 720
Time step: 760
Time step: 800
Time step: 840
Time step: 880
Time step: 920
Time step: 960
Time step: 1000
Time step: 1040
Time step: 1080
Time step: 1120
Time step: 1160
Time step: 1200
Time step: 1240
Time step: 1280
Time step: 1320
Time step: 1360
Time step: 1400
Time step: 1440
Final MAPE: 0.0794329
Final RMSE: 4.89428

Running time: 303 seconds

In [13]:
import pandas as pd

dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0)
NM_mat = pd.read_csv('../datasets/Seattle-data-set/NM_mat.csv', index_col = 0)
dense_mat = dense_mat.values
NM_mat = NM_mat.values

missing_rate = 0.4
# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros((dense_mat.shape[0], 28, 288))
for i1 in range(binary_tensor.shape[0]):
for i2 in range(binary_tensor.shape[1]):
binary_tensor[i1, i2, :] = np.round(NM_mat[i1, i2] + 0.5 - missing_rate)
binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1] * binary_tensor.shape[2]])
# =============================================================================

sparse_mat = np.multiply(dense_mat, binary_mat)

In [14]:
import time
start = time.time()
pred_time_steps = 288 * 5
time_lags = np.array([1, 2, 288])
dim1, dim2 = sparse_mat.shape
rank = 30
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 0.03
d = time_lags.shape[0]

maxiter = 200
mat_hat = st_prediction(dense_mat, dense_mat, time_lags, lambda_w, lambda_x, lambda_theta,
eta, rank, pred_time_steps, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))

/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:67: RuntimeWarning: invalid value encountered in double_scalars
/Users/xinyuchen/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:68: RuntimeWarning: invalid value encountered in double_scalars

Time step: 40
Time step: 80
Time step: 120
Time step: 160
Time step: 200
Time step: 240
Time step: 280
Time step: 320
Time step: 360
Time step: 400
Time step: 440
Time step: 480
Time step: 520
Time step: 560
Time step: 600
Time step: 640
Time step: 680
Time step: 720
Time step: 760
Time step: 800
Time step: 840
Time step: 880
Time step: 920
Time step: 960
Time step: 1000
Time step: 1040
Time step: 1080
Time step: 1120
Time step: 1160
Time step: 1200
Time step: 1240
Time step: 1280
Time step: 1320
Time step: 1360
Time step: 1400
Time step: 1440
Final MAPE: 0.0795817
Final RMSE: 4.89878

Running time: 310 seconds


Experiment results of spatial-temporal data prediction using TRMF:

scenario rank Lambda_w Lambda_x Lambda_theta eta maxiter mape rmse
Original data 30 500 500 500 0.03 200 0.0796 4.90
20%, RM 30 500 500 500 0.03 200 0.0795 4.90
40%, RM 30 500 500 500 0.03 200 0.0795 4.90
20%, NM 30 500 500 500 0.03 200 0.0794 4.89
40%, NM 30 500 500 500 0.03 200 0.0796 4.90