Bayesian temporal matrix factorization (BTMF) is a Bayesian matrix factorization model that achieves state-of-the-art results on challenging imputation and prediction problems. In the following, we will discuss:

• What is the proposed Bayesian temporal matrix factorization (BTMF)?

• How to implement BTMF efficiently, mainly using Python and NumPy?

• How to impute missing values in real-world spatiotemporal datasets?

If you want to understand BTMF and its modeling tricks in detail, our paper is for you:

Xinyu Chen, Lijun Sun (2019). Bayesian temporal factorization for multidimensional time series prediction.

## Quick Run

This notebook is publicly available for any use as part of our data imputation project. Please check out transdim.

In [1]:
import numpy as np
from numpy.random import multivariate_normal as mvnrnd
from scipy.stats import wishart
from scipy.stats import invwishart
from numpy.linalg import inv as inv


# Part 1: Matrix Computation Concepts

## 1) Kronecker product

• Definition:

Given two matrices $A\in\mathbb{R}^{m_1\times n_1}$ and $B\in\mathbb{R}^{m_2\times n_2}$, the Kronecker product of these two matrices is defined as

$$A\otimes B=\left[ \begin{array}{cccc} a_{11}B & a_{12}B & \cdots & a_{1n_1}B \\ a_{21}B & a_{22}B & \cdots & a_{2n_1}B \\ \vdots & \vdots & \ddots & \vdots \\ a_{m_11}B & a_{m_12}B & \cdots & a_{m_1n_1}B \\ \end{array} \right]$$

where the symbol $\otimes$ denotes the Kronecker product, and the size of the resulting $A\otimes B$ is $(m_1m_2)\times (n_1n_2)$ (i.e., $m_1m_2$ rows and $n_1n_2$ columns).

• Example:

If $A=\left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \\ \end{array} \right]$ and $B=\left[ \begin{array}{ccc} 5 & 6 & 7\\ 8 & 9 & 10 \\ \end{array} \right]$, then, we have

$$A\otimes B=\left[ \begin{array}{cc} 1\times \left[ \begin{array}{ccc} 5 & 6 & 7\\ 8 & 9 & 10\\ \end{array} \right] & 2\times \left[ \begin{array}{ccc} 5 & 6 & 7\\ 8 & 9 & 10\\ \end{array} \right] \\ 3\times \left[ \begin{array}{ccc} 5 & 6 & 7\\ 8 & 9 & 10\\ \end{array} \right] & 4\times \left[ \begin{array}{ccc} 5 & 6 & 7\\ 8 & 9 & 10\\ \end{array} \right] \\ \end{array} \right]$$$$=\left[ \begin{array}{cccccc} 5 & 6 & 7 & 10 & 12 & 14 \\ 8 & 9 & 10 & 16 & 18 & 20 \\ 15 & 18 & 21 & 20 & 24 & 28 \\ 24 & 27 & 30 & 32 & 36 & 40 \\ \end{array} \right]\in\mathbb{R}^{4\times 6}.$$
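NumPy provides this operation directly as np.kron, so we can verify the example above in one line:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6, 7], [8, 9, 10]])

# Kronecker product: each entry a_ij of A scales a full copy of B
C = np.kron(A, B)
print(C.shape)  # (4, 6)
print(C)
```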

## 2) Khatri-Rao product (kr_prod)

• Definition:

Given two matrices $A=\left( \boldsymbol{a}_1,\boldsymbol{a}_2,...,\boldsymbol{a}_r \right)\in\mathbb{R}^{m\times r}$ and $B=\left( \boldsymbol{b}_1,\boldsymbol{b}_2,...,\boldsymbol{b}_r \right)\in\mathbb{R}^{n\times r}$ with the same number of columns, the Khatri-Rao product (or column-wise Kronecker product) of $A$ and $B$ is given by

$$A\odot B=\left( \boldsymbol{a}_1\otimes \boldsymbol{b}_1,\boldsymbol{a}_2\otimes \boldsymbol{b}_2,...,\boldsymbol{a}_r\otimes \boldsymbol{b}_r \right)\in\mathbb{R}^{(mn)\times r}$$

where the symbol $\odot$ denotes Khatri-Rao product, and $\otimes$ denotes Kronecker product.

• Example:

If $A=\left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \\ \end{array} \right]=\left( \boldsymbol{a}_1,\boldsymbol{a}_2 \right)$ and $B=\left[ \begin{array}{cc} 5 & 6 \\ 7 & 8 \\ 9 & 10 \\ \end{array} \right]=\left( \boldsymbol{b}_1,\boldsymbol{b}_2 \right)$, then, we have

$$A\odot B=\left( \boldsymbol{a}_1\otimes \boldsymbol{b}_1,\boldsymbol{a}_2\otimes \boldsymbol{b}_2 \right)$$$$=\left[ \begin{array}{cc} \left[ \begin{array}{c} 1 \\ 3 \\ \end{array} \right]\otimes \left[ \begin{array}{c} 5 \\ 7 \\ 9 \\ \end{array} \right] & \left[ \begin{array}{c} 2 \\ 4 \\ \end{array} \right]\otimes \left[ \begin{array}{c} 6 \\ 8 \\ 10 \\ \end{array} \right] \\ \end{array} \right]$$$$=\left[ \begin{array}{cc} 5 & 12 \\ 7 & 16 \\ 9 & 20 \\ 15 & 24 \\ 21 & 32 \\ 27 & 40 \\ \end{array} \right]\in\mathbb{R}^{6\times 2}.$$
In [2]:
def kr_prod(a, b):
    return np.einsum('ir, jr -> ijr', a, b).reshape(a.shape[0] * b.shape[0], -1)

In [3]:
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8], [9, 10]])
print(kr_prod(A, B))

[[ 5 12]
 [ 7 16]
 [ 9 20]
 [15 24]
 [21 32]
 [27 40]]


## 3) Computing Covariance Matrix (cov_mat)

For any matrix $X\in\mathbb{R}^{m\times n}$, cov_mat returns the $n\times n$ scatter matrix $\sum_{i=1}^{m}\left(\boldsymbol{x}_{i}-\bar{\boldsymbol{x}}\right)\left(\boldsymbol{x}_{i}-\bar{\boldsymbol{x}}\right)^{T}$ (an unnormalized covariance matrix), which is used repeatedly in the Gibbs sampler below.

In [4]:
def cov_mat(mat):
    dim1, dim2 = mat.shape
    new_mat = np.zeros((dim2, dim2))
    mat_bar = np.mean(mat, axis = 0)
    for i in range(dim1):
        new_mat += np.einsum('i, j -> ij', mat[i, :] - mat_bar, mat[i, :] - mat_bar)
    return new_mat
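As a quick sanity check (not in the original notebook), cov_mat agrees with NumPy's np.cov up to the normalization factor $m-1$:

```python
import numpy as np

def cov_mat(mat):
    # scatter matrix: sum of outer products of row deviations from the column mean
    dim1, dim2 = mat.shape
    new_mat = np.zeros((dim2, dim2))
    mat_bar = np.mean(mat, axis=0)
    for i in range(dim1):
        new_mat += np.outer(mat[i, :] - mat_bar, mat[i, :] - mat_bar)
    return new_mat

mat = np.arange(12.0).reshape(4, 3) ** 2  # an arbitrary 4 x 3 matrix
S = cov_mat(mat)
# cov_mat is the unnormalized covariance: S == (m - 1) * np.cov(mat, rowvar=False)
print(np.allclose(S, (mat.shape[0] - 1) * np.cov(mat, rowvar=False)))  # True
```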


# Part 2: Bayesian Temporal Matrix Factorization (BTMF)

In [5]:
def BTMF(dense_mat, sparse_mat, init, rank, time_lags, maxiter1, maxiter2):
    """Bayesian Temporal Matrix Factorization, BTMF."""
    W = init["W"]
    X = init["X"]
    theta = init["theta"]

    d = theta.shape[0]
    dim1, dim2 = sparse_mat.shape
    pos = np.where((dense_mat != 0) & (sparse_mat == 0))  # held-out (masked) entries
    position = np.where(sparse_mat != 0)                  # observed entries
    binary_mat = np.zeros((dim1, dim2))
    binary_mat[position] = 1

    # Hyperparameters of the conjugate priors.
    beta0 = 1
    nu0 = rank
    mu0 = np.zeros((rank))
    W0 = np.eye(rank)
    tau = 1
    alpha = 1e-6
    beta = 1e-6

    # Accumulators for averaging the last maxiter2 Gibbs samples.
    mat_hat = np.zeros((dim1, dim2 + 1))
    W_plus = np.zeros((dim1, rank))
    X_plus = np.zeros((dim2, rank))
    X_new = np.zeros((dim2 + 1, rank))
    X_new_plus = np.zeros((dim2 + 1, rank))
    theta_plus = np.zeros((d, rank))
    mat_hat_plus = np.zeros((dim1, dim2 + 1))
    for iters in range(maxiter1):
        # Sample the hyperparameters (mu_w, Lambda_w) of the spatial factors W.
        W_bar = np.mean(W, axis = 0)
        var_mu_hyper = (dim1 * W_bar + beta0 * mu0)/(dim1 + beta0)
        var_W_hyper = inv(inv(W0) + cov_mat(W) + dim1 * beta0/(dim1 + beta0)
                          * np.outer(W_bar - mu0, W_bar - mu0))
        var_W_hyper = (var_W_hyper + var_W_hyper.T)/2
        var_Lambda_hyper = wishart(df = dim1 + nu0, scale = var_W_hyper, seed = None).rvs()
        var_mu_hyper = mvnrnd(var_mu_hyper, inv((dim1 + beta0) * var_Lambda_hyper))

        # Sample the spatial factor matrix W row by row.
        var1 = X.T
        var2 = kr_prod(var1, var1)
        var3 = (tau * np.matmul(var2, binary_mat.T).reshape([rank, rank, dim1])
                + np.dstack([var_Lambda_hyper] * dim1))
        var4 = (tau * np.matmul(var1, sparse_mat.T)
                + np.dstack([np.matmul(var_Lambda_hyper, var_mu_hyper)] * dim1)[0, :, :])
        for i in range(dim1):
            var_Lambda = var3[:, :, i]
            inv_var_Lambda = inv((var_Lambda + var_Lambda.T)/2)
            var_mu = np.matmul(inv_var_Lambda, var4[:, i])
            W[i, :] = mvnrnd(var_mu, inv_var_Lambda)
        if iters + 1 > maxiter1 - maxiter2:
            W_plus += W

        # Sample the precision matrix Lambda_x of the temporal factors.
        mat0 = X[0 : np.max(time_lags), :]
        mat = np.matmul(mat0.T, mat0)
        new_mat = np.zeros((dim2 - np.max(time_lags), rank))
        for t in range(dim2 - np.max(time_lags)):
            new_mat[t, :] = (X[t + np.max(time_lags), :]
                             - np.einsum('ij, ij -> j', theta, X[t + np.max(time_lags) - time_lags, :]))
        mat += np.matmul(new_mat.T, new_mat)
        var_W_hyper = inv(inv(W0) + mat)
        var_W_hyper = (var_W_hyper + var_W_hyper.T)/2
        Lambda_x = wishart(df = dim2 + nu0, scale = var_W_hyper, seed = None).rvs()

        # Sample the temporal factor matrix X time point by time point.
        var1 = W.T
        var2 = kr_prod(var1, var1)
        var3 = tau * np.matmul(var2, binary_mat).reshape([rank, rank, dim2]) + np.dstack([Lambda_x] * dim2)
        var4 = tau * np.matmul(var1, sparse_mat)
        for t in range(dim2):
            Mt = np.zeros((rank, rank))
            Nt = np.zeros(rank)
            if t < np.max(time_lags):
                Qt = np.zeros(rank)
            else:
                Qt = np.matmul(Lambda_x, np.einsum('ij, ij -> j', theta, X[t - time_lags, :]))
            if t < dim2 - np.min(time_lags):
                if t >= np.max(time_lags) and t < dim2 - np.max(time_lags):
                    index = list(range(0, d))
                else:
                    index = list(np.where((t + time_lags >= np.max(time_lags)) & (t + time_lags < dim2)))[0]
                for k in index:
                    Ak = theta[k, :]
                    Mt += np.multiply(np.outer(Ak, Ak), Lambda_x)
                    theta0 = theta.copy()
                    theta0[k, :] = 0
                    var5 = (X[t + time_lags[k], :]
                            - np.einsum('ij, ij -> j', theta0, X[t + time_lags[k] - time_lags, :]))
                    Nt += np.matmul(np.matmul(np.diag(Ak), Lambda_x), var5)
            var_mu = var4[:, t] + Nt + Qt
            var_Lambda = var3[:, :, t] + Mt
            inv_var_Lambda = inv((var_Lambda + var_Lambda.T)/2)
            X[t, :] = mvnrnd(np.matmul(inv_var_Lambda, var_mu), inv_var_Lambda)
        if iters + 1 > maxiter1 - maxiter2:
            X_plus += X

        # One-step-ahead prediction of X from the AR model (collected after burn-in).
        mat_hat = np.matmul(W, X.T)
        if iters + 1 > maxiter1 - maxiter2:
            X_new[0 : dim2, :] = X.copy()
            X_new[dim2, :] = np.einsum('ij, ij -> j', theta, X_new[dim2 - time_lags, :])
            X_new_plus += X_new
            mat_hat_plus += np.matmul(W, X_new.T)
        rmse = np.sqrt(np.sum((dense_mat[pos] - mat_hat[pos]) ** 2)/dense_mat[pos].shape[0])

        # Sample the precision tau of the observation noise.
        var_alpha = alpha + 0.5 * sparse_mat[position].shape[0]
        error = sparse_mat - mat_hat
        var_beta = beta + 0.5 * np.sum(error[position] ** 2)
        tau = np.random.gamma(var_alpha, 1/var_beta)

        # Sample the hyperparameters of the AR coefficients theta.
        theta_bar = np.mean(theta, axis = 0)
        var_mu_hyper = (d * theta_bar + beta0 * mu0)/(d + beta0)
        var_W_hyper = inv(inv(W0) + cov_mat(theta) + d * beta0/(d + beta0)
                          * np.outer(theta_bar - mu0, theta_bar - mu0))
        var_W_hyper = (var_W_hyper + var_W_hyper.T)/2
        var_Lambda_hyper = wishart(df = d + nu0, scale = var_W_hyper, seed = None).rvs()
        var_mu_hyper = mvnrnd(var_mu_hyper, inv((d + beta0) * var_Lambda_hyper))

        # Sample the AR coefficient matrix theta lag by lag.
        for k in range(d):
            theta0 = theta.copy()
            theta0[k, :] = 0
            mat0 = np.zeros((dim2 - np.max(time_lags), rank))
            for L in range(d):
                mat0 += np.matmul(X[np.max(time_lags) - time_lags[L] : dim2 - time_lags[L], :],
                                  np.diag(theta0[L, :]))
            VarPi = X[np.max(time_lags) : dim2, :] - mat0
            var1 = np.zeros((rank, rank))
            var2 = np.zeros(rank)
            for t in range(np.max(time_lags), dim2):
                B = X[t - time_lags[k], :]
                var1 += np.multiply(np.outer(B, B), Lambda_x)
                var2 += np.matmul(np.matmul(np.diag(B), Lambda_x), VarPi[t - np.max(time_lags), :])
            var_Lambda = var1 + var_Lambda_hyper
            inv_var_Lambda = inv((var_Lambda + var_Lambda.T)/2)
            var_mu = np.matmul(inv_var_Lambda, var2 + np.matmul(var_Lambda_hyper, var_mu_hyper))
            theta[k, :] = mvnrnd(var_mu, inv_var_Lambda)
        if iters + 1 > maxiter1 - maxiter2:
            theta_plus += theta

        if (iters + 1) % 200 == 0 and iters < maxiter1 - maxiter2:
            print('Iter: {}'.format(iters + 1))
            print('RMSE: {:.6}'.format(rmse))
            print()

    # Average the samples collected after burn-in.
    W = W_plus/maxiter2
    X = X_plus/maxiter2
    X_new = X_new_plus/maxiter2
    theta = theta_plus/maxiter2
    mat_hat = mat_hat_plus/maxiter2
    if maxiter1 >= 100:
        final_mape = np.sum(np.abs(dense_mat[pos] - mat_hat[pos])/dense_mat[pos])/dense_mat[pos].shape[0]
        final_rmse = np.sqrt(np.sum((dense_mat[pos] - mat_hat[pos]) ** 2)/dense_mat[pos].shape[0])
        print('Imputation MAPE: {:.6}'.format(final_mape))
        print('Imputation RMSE: {:.6}'.format(final_rmse))
        print()

    return mat_hat, W, X_new, theta


# Part 3: Data Organization

## 1) Matrix Structure

We consider a dataset of $m$ discrete time series $\boldsymbol{y}_{i}\in\mathbb{R}^{f},i\in\left\{1,2,...,m\right\}$. The time series may have missing elements. We express the spatiotemporal dataset as a matrix $Y\in\mathbb{R}^{m\times f}$ with $m$ rows (e.g., locations) and $f$ columns (e.g., discrete time intervals),

$$Y=\left[ \begin{array}{cccc} y_{11} & y_{12} & \cdots & y_{1f} \\ y_{21} & y_{22} & \cdots & y_{2f} \\ \vdots & \vdots & \ddots & \vdots \\ y_{m1} & y_{m2} & \cdots & y_{mf} \\ \end{array} \right]\in\mathbb{R}^{m\times f}.$$

## 2) Tensor Structure

We consider a dataset of $m$ discrete time series $\boldsymbol{y}_{i}\in\mathbb{R}^{nf},i\in\left\{1,2,...,m\right\}$. The time series may have missing elements. We partition each time series into intervals of predefined length $f$ and express each partitioned time series as a matrix $Y_{i}$ with $n$ rows (e.g., days) and $f$ columns (e.g., discrete time intervals per day),

$$Y_{i}=\left[ \begin{array}{cccc} y_{11} & y_{12} & \cdots & y_{1f} \\ y_{21} & y_{22} & \cdots & y_{2f} \\ \vdots & \vdots & \ddots & \vdots \\ y_{n1} & y_{n2} & \cdots & y_{nf} \\ \end{array} \right]\in\mathbb{R}^{n\times f},i=1,2,...,m,$$

therefore, the resulting structure is a tensor $\mathcal{Y}\in\mathbb{R}^{m\times n\times f}$.
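The experiments below flatten the tensor $\mathcal{Y}\in\mathbb{R}^{m\times n\times f}$ into the matrix $Y\in\mathbb{R}^{m\times nf}$ with a single reshape, concatenating the $n$ partitions of each series along the time axis. A toy sketch (with made-up sizes):

```python
import numpy as np

m, n, f = 2, 3, 4
Y_tensor = np.arange(m * n * f).reshape(m, n, f)  # toy tensor of shape (m, n, f)

# Unfolding used throughout the experiments: each of the m series becomes
# one row of length n * f (its n partitions laid out consecutively).
Y_mat = Y_tensor.reshape(m, n * f)
print(Y_mat.shape)  # (2, 12)
```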

# Part 4: Experiments on Guangzhou Data Set

In [6]:
import scipy.io

# Load the data (the .mat paths below assume the transdim repository layout).
tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')
tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']
random_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])
missing_rate = 0.4

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
# binary_mat = (np.round(random_tensor + 0.5 - missing_rate)
#               .reshape([random_tensor.shape[0], random_tensor.shape[1] * random_tensor.shape[2]]))
# =============================================================================

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros(tensor.shape)
for i1 in range(tensor.shape[0]):
    for i2 in range(tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1] * binary_tensor.shape[2]])
# =============================================================================

sparse_mat = np.multiply(dense_mat, binary_mat)
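The masking trick above relies on the fact that, for $u$ drawn uniformly from $[0,1]$, $\mathrm{round}(u + 0.5 - r)$ equals 0 exactly when $u < r$, so each entry is hidden with probability equal to the missing rate $r$. A quick check on toy data (not the Guangzhou set):

```python
import numpy as np

rng = np.random.default_rng(0)
missing_rate = 0.4
u = rng.random(100000)

# round(u + 0.5 - r) is 0 iff u < r: entries are masked with probability r
mask = np.round(u + 0.5 - missing_rate)
print(mask.mean())  # roughly 1 - missing_rate = 0.6
```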

In [7]:
import time
start = time.time()
dim1, dim2 = sparse_mat.shape
rank = 10
time_lags = np.array([1, 2, 144])
init = {"W": 0.1 * np.random.rand(dim1, rank),
        "X": 0.1 * np.random.rand(dim2, rank),
        "theta": 0.1 * np.random.rand(time_lags.shape[0], rank)}
maxiter1 = 1100
maxiter2 = 100
BTMF(dense_mat, sparse_mat, init, rank, time_lags, maxiter1, maxiter2)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 200
RMSE: 4.54723

Iter: 400
RMSE: 4.55557

Iter: 600
RMSE: 4.55508

Iter: 800
RMSE: 4.56203

Iter: 1000
RMSE: 4.55163

Imputation MAPE: 0.104271
Imputation RMSE: 4.49679

Running time: 4171 seconds


Experiment results of missing data imputation using BTMF:

| scenario | rank | maxiter1 | maxiter2 | mape   | rmse   |
|----------|------|----------|----------|--------|--------|
| 0.2, RM  | 80   | 1100     | 100      | 0.0737 | 3.1399 |
| 0.4, RM  | 80   | 1100     | 100      | 0.0756 | 3.2403 |
| 0.2, NM  | 10   | 1100     | 100      | 0.1021 | 4.2718 |
| 0.4, NM  | 10   | 1100     | 100      | 0.1043 | 4.4968 |

# Part 5: Experiments on Birmingham Data Set

In [8]:
import scipy.io

# Load the data (the .mat paths below assume the transdim repository layout).
tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/tensor.mat')
tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Birmingham-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']
random_tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])
missing_rate = 0.1

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
# binary_mat = (np.round(random_tensor + 0.5 - missing_rate)
#               .reshape([random_tensor.shape[0], random_tensor.shape[1] * random_tensor.shape[2]]))
# =============================================================================

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros(tensor.shape)
for i1 in range(tensor.shape[0]):
    for i2 in range(tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1] * binary_tensor.shape[2]])
# =============================================================================

sparse_mat = np.multiply(dense_mat, binary_mat)

In [9]:
import time
start = time.time()
dim1, dim2 = sparse_mat.shape
rank = 10
time_lags = np.array([1, 2, 18])
init = {"W": 0.1 * np.random.rand(dim1, rank),
        "X": 0.1 * np.random.rand(dim2, rank),
        "theta": 0.1 * np.random.rand(time_lags.shape[0], rank)}
maxiter1 = 1100
maxiter2 = 100
BTMF(dense_mat, sparse_mat, init, rank, time_lags, maxiter1, maxiter2)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 200
RMSE: 30.3232

Iter: 400
RMSE: 30.3908

Iter: 600
RMSE: 30.9868

Iter: 800
RMSE: 31.5588

Iter: 1000
RMSE: 30.8343

Imputation MAPE: 0.121032
Imputation RMSE: 28.1819

Running time: 666 seconds


Experiment results of missing data imputation using BTMF:

| scenario | rank | maxiter1 | maxiter2 | mape   | rmse    |
|----------|------|----------|----------|--------|---------|
| 10%, RM  | 30   | 1100     | 100      | 0.0164 | 6.4044  |
| 30%, RM  | 30   | 1100     | 100      | 0.0231 | 13.7682 |
| 10%, NM  | 10   | 1100     | 100      | 0.1210 | 28.1819 |
| 30%, NM  | 10   | 1100     | 100      | 0.1510 | 59.661  |

# Part 6: Experiments on Hangzhou Data Set

In [10]:
import scipy.io

# Load the data (the .mat paths below assume the transdim repository layout).
tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')
tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']
random_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

dense_mat = tensor.reshape([tensor.shape[0], tensor.shape[1] * tensor.shape[2]])
missing_rate = 0.4

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
# binary_mat = (np.round(random_tensor + 0.5 - missing_rate)
#               .reshape([random_tensor.shape[0], random_tensor.shape[1] * random_tensor.shape[2]]))
# =============================================================================

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros(tensor.shape)
for i1 in range(tensor.shape[0]):
    for i2 in range(tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
binary_mat = binary_tensor.reshape([binary_tensor.shape[0], binary_tensor.shape[1] * binary_tensor.shape[2]])
# =============================================================================

sparse_mat = np.multiply(dense_mat, binary_mat)

In [11]:
import time
start = time.time()
dim1, dim2 = sparse_mat.shape
rank = 10
time_lags = np.array([1, 2, 108])
init = {"W": 0.1 * np.random.rand(dim1, rank),
        "X": 0.1 * np.random.rand(dim2, rank),
        "theta": 0.1 * np.random.rand(time_lags.shape[0], rank)}
maxiter1 = 1100
maxiter2 = 100
BTMF(dense_mat, sparse_mat, init, rank, time_lags, maxiter1, maxiter2)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 200
RMSE: 76.8435

Iter: 400
RMSE: 78.4871

Iter: 600
RMSE: 80.5467

Iter: 800
RMSE: 77.3786

Iter: 1000
RMSE: 77.0848

Imputation MAPE: 0.281268
Imputation RMSE: 71.872

Running time: 1289 seconds


Experiment results of missing data imputation using BTMF:

| scenario | rank | maxiter1 | maxiter2 | mape   | rmse    |
|----------|------|----------|----------|--------|---------|
| 20%, RM  | 50   | 1100     | 100      | 0.2240 | 30.3205 |
| 40%, RM  | 50   | 1100     | 100      | 0.2387 | 32.7297 |
| 20%, NM  | 10   | 1100     | 100      | 0.2680 | 74.265  |
| 40%, NM  | 10   | 1100     | 100      | 0.2813 | 71.872  |

# Part 7: Experiments on Seattle Data Set

In [8]:
import pandas as pd

dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0)
RM_mat = pd.read_csv('../datasets/Seattle-data-set/RM_mat.csv', index_col = 0)
dense_mat = dense_mat.values
RM_mat = RM_mat.values

missing_rate = 0.4

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
binary_mat = np.round(RM_mat + 0.5 - missing_rate)
# =============================================================================

sparse_mat = np.multiply(dense_mat, binary_mat)

In [9]:
import time
start = time.time()
dim1, dim2 = sparse_mat.shape
rank = 50
time_lags = np.array([1, 2, 288])
init = {"W": 0.1 * np.random.rand(dim1, rank),
        "X": 0.1 * np.random.rand(dim2, rank),
        "theta": 0.1 * np.random.rand(time_lags.shape[0], rank)}
maxiter1 = 1100
maxiter2 = 100
BTMF(dense_mat, sparse_mat, init, rank, time_lags, maxiter1, maxiter2)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 200
RMSE: 4.03846

Iter: 400
RMSE: 4.02591

Iter: 600
RMSE: 4.0148

Iter: 800
RMSE: 4.01372

Iter: 1000
RMSE: 4.00902

Imputation MAPE: 0.0608192
Imputation RMSE: 3.78664

Running time: 12317 seconds

In [10]:
import pandas as pd

dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0)
NM_mat = pd.read_csv('../datasets/Seattle-data-set/NM_mat.csv', index_col = 0)
dense_mat = dense_mat.values
NM_mat = NM_mat.values

missing_rate = 0.2

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros((dense_mat.shape[0], 28, 288))
for i1 in range(binary_tensor.shape[0]):
    for i2 in range(binary_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(NM_mat[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_mat = np.multiply(dense_mat, binary_tensor.reshape([dense_mat.shape[0], dense_mat.shape[1]]))

In [11]:
import time
start = time.time()
dim1, dim2 = sparse_mat.shape
rank = 10
time_lags = np.array([1, 2, 288])
init = {"W": 0.1 * np.random.rand(dim1, rank),
        "X": 0.1 * np.random.rand(dim2, rank),
        "theta": 0.1 * np.random.rand(time_lags.shape[0], rank)}
maxiter1 = 1100
maxiter2 = 100
BTMF(dense_mat, sparse_mat, init, rank, time_lags, maxiter1, maxiter2)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 200
RMSE: 5.32059

Iter: 400
RMSE: 5.3231

Iter: 600
RMSE: 5.32197

Iter: 800
RMSE: 5.31929

Iter: 1000
RMSE: 5.3221

Imputation MAPE: 0.0912485
Imputation RMSE: 5.27615

Running time: 3593 seconds

In [12]:
import pandas as pd

dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0)
NM_mat = pd.read_csv('../datasets/Seattle-data-set/NM_mat.csv', index_col = 0)
dense_mat = dense_mat.values
NM_mat = NM_mat.values

missing_rate = 0.4

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros((dense_mat.shape[0], 28, 288))
for i1 in range(binary_tensor.shape[0]):
    for i2 in range(binary_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(NM_mat[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_mat = np.multiply(dense_mat, binary_tensor.reshape([dense_mat.shape[0], dense_mat.shape[1]]))

In [13]:
import time
start = time.time()
dim1, dim2 = sparse_mat.shape
rank = 10
time_lags = np.array([1, 2, 288])
init = {"W": 0.1 * np.random.rand(dim1, rank),
        "X": 0.1 * np.random.rand(dim2, rank),
        "theta": 0.1 * np.random.rand(time_lags.shape[0], rank)}
maxiter1 = 1100
maxiter2 = 100
BTMF(dense_mat, sparse_mat, init, rank, time_lags, maxiter1, maxiter2)
end = time.time()
print('Running time: %d seconds'%(end - start))

Iter: 200
RMSE: 5.39469

Iter: 400
RMSE: 5.3884

Iter: 600
RMSE: 5.3911

Iter: 800
RMSE: 5.38716

Iter: 1000
RMSE: 5.38702

Imputation MAPE: 0.0920702
Imputation RMSE: 5.33163

Running time: 3490 seconds


Experiment results of missing data imputation using BTMF:

| scenario | rank | maxiter1 | maxiter2 | mape   | rmse   |
|----------|------|----------|----------|--------|--------|
| 20%, RM  | 50   | 1100     | 100      | 0.0592 | 3.7129 |
| 40%, RM  | 50   | 1100     | 100      | 0.0618 | 3.7866 |
| 20%, NM  | 10   | 1100     | 100      | 0.0912 | 5.2762 |
| 40%, NM  | 10   | 1100     | 100      | 0.0921 | 5.3316 |