Temporal Regularized Matrix Factorization (TRMF) is an effective tool for imputing missing data in multivariate time series and for forecasting time series with missing values. This approach was proposed in the following literature:
Hsiang-Fu Yu, Nikhil Rao, Inderjit S. Dhillon, 2016. Temporal regularized matrix factorization for high-dimensional time series prediction. 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.
Acknowledgement: We would like to thank ... for providing helpful suggestions and discussion.
In this post, we consider a dataset of $m$ discrete time series $\boldsymbol{y}_{i}\in\mathbb{R}^{f},i\in\left\{1,2,...,m\right\}$. The time series may have missing elements. We express this spatio-temporal dataset as a matrix $Y\in\mathbb{R}^{m\times f}$ with $m$ rows (e.g., locations) and $f$ columns (e.g., discrete time intervals),
$$Y=\left[ \begin{array}{cccc} y_{11} & y_{12} & \cdots & y_{1f} \\ y_{21} & y_{22} & \cdots & y_{2f} \\ \vdots & \vdots & \ddots & \vdots \\ y_{m1} & y_{m2} & \cdots & y_{mf} \\ \end{array} \right]\in\mathbb{R}^{m\times f}.$$Temporal Regularized Matrix Factorization (TRMF) is an approach that incorporates temporal dependencies into the commonly used matrix factorization model $Y\approx WX^\top$, where the rows $\boldsymbol{w}_{i}\in\mathbb{R}^{r}$ of $W$ are spatial factors and the rows $\boldsymbol{x}_{t}\in\mathbb{R}^{r}$ of $X$ are temporal factors. The temporal dependencies among $\left\{\boldsymbol{x}_{t}\right\}$ are described explicitly by an autoregressive model of the form:
$$\boldsymbol{x}_{t}\approx\sum_{l\in\mathcal{L}}\boldsymbol{\theta}_{l}\circledast\boldsymbol{x}_{t-l},$$where this autoregressive (AR) model is specified by a lag set $\mathcal{L}=\left\{l_1,l_2,...,l_d\right\}$ (e.g., $\mathcal{L}=\left\{1,2,144\right\}$) and weight vectors $\boldsymbol{\theta}_{l}\in\mathbb{R}^{r},\forall l$, with $\circledast$ denoting the element-wise (Hadamard) product. We further define
$$\mathcal{R}_{AR}\left(X\mid \mathcal{L},\Theta,\eta\right)=\frac{1}{2}\sum_{t=l_d+1}^{f}\left(\boldsymbol{x}_{t}-\sum_{l\in\mathcal{L}}\boldsymbol{\theta}_{l}\circledast\boldsymbol{x}_{t-l}\right)^\top\left(\boldsymbol{x}_{t}-\sum_{l\in\mathcal{L}}\boldsymbol{\theta}_{l}\circledast\boldsymbol{x}_{t-l}\right)+\frac{\eta}{2}\sum_{t=1}^{f}\boldsymbol{x}_{t}^\top\boldsymbol{x}_{t}.$$Thus, TRMF-AR is given by solving
$$\min_{W,X,\Theta}\frac{1}{2}\underbrace{\sum_{(i,t)\in\Omega}\left(y_{it}-\boldsymbol{w}_{i}^T\boldsymbol{x}_{t}\right)^2}_{\text{sum of squared residual errors}}+\lambda_{w}\underbrace{\mathcal{R}_{w}\left(W\right)}_{W-\text{regularizer}}+\lambda_{x}\underbrace{\mathcal{R}_{AR}\left(X\mid \mathcal{L},\Theta,\eta\right)}_{\text{AR-regularizer}}+\lambda_{\theta}\underbrace{\mathcal{R}_{\theta}\left(\Theta\right)}_{\Theta-\text{regularizer}}$$where $\mathcal{R}_{w}\left(W\right)=\frac{1}{2}\sum_{i=1}^{m}\boldsymbol{w}_{i}^\top\boldsymbol{w}_{i}$ and $\mathcal{R}_{\theta}\left(\Theta\right)=\frac{1}{2}\sum_{l\in\mathcal{L}}\boldsymbol{\theta}_{l}^\top\boldsymbol{\theta}_{l}$ are regularization terms.
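To make the objective concrete, here is a small sketch that evaluates it for given factors. The helper trmf_objective is hypothetical (it is not part of the implementation below), and it assumes, as the code below does, that zero entries of $Y$ mark missing values:
import numpy as np

def trmf_objective(Y, W, X, theta, time_lags, lambda_w, lambda_x, lambda_theta, eta):
    """Hypothetical sketch: evaluate the TRMF-AR objective for given factors."""
    pos = np.where(Y != 0)                                  # observed index set Omega
    resid = Y[pos] - np.einsum('ij, ij -> i', W[pos[0], :], X[pos[1], :])
    obj = 0.5 * np.sum(resid ** 2)                          # sum of squared residual errors
    ld = int(np.max(time_lags))                             # largest lag l_d
    ar = X[ld:, :] - sum(theta[k, :] * X[ld - time_lags[k] : X.shape[0] - time_lags[k], :]
                         for k in range(len(time_lags)))    # AR residuals for t > l_d
    obj += lambda_x * (0.5 * np.sum(ar ** 2) + 0.5 * eta * np.sum(X ** 2))
    obj += 0.5 * lambda_w * np.sum(W ** 2) + 0.5 * lambda_theta * np.sum(theta ** 2)
    return obj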
Numpy
Observing the optimization problem of the TRMF model above, we categorize the variables of this model into parameters (i.e., init_para in the TRMF function below) and hyperparameters (i.e., init_hyper).
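As a quick orientation, these two dictionaries are expected to look like the following sketch; the shapes follow the notation above ($W\in\mathbb{R}^{m\times r}$, $X\in\mathbb{R}^{f\times r}$, $\Theta\in\mathbb{R}^{d\times r}$), while the concrete sizes and values here are illustrative assumptions:
import numpy as np

m, f, rank = 50, 1000, 10            # illustrative sizes: series, time steps, latent rank
time_lags = np.array([1, 2, 144])    # example lag set L
d = time_lags.shape[0]
## Parameters: the factor matrices that TRMF estimates
init_para = {"W": 0.1 * np.random.rand(m, rank),      # spatial factors, one row per series
             "X": 0.1 * np.random.rand(f, rank),      # temporal factors, one row per time step
             "theta": 0.1 * np.random.rand(d, rank)}  # AR coefficients, one row per lag
## Hyperparameters: regularization weights, fixed before training
init_hyper = {"lambda_w": 500, "lambda_x": 500, "lambda_theta": 500, "eta": 0.03}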
We write Python code for updating the spatial factor matrix $W$ as follows,
for i in range(dim1):
    pos0 = np.where(sparse_mat[i, :] != 0)           # observed time indices for series i
    Xt = X[pos0[0], :]                               # temporal factors at those indices
    vec0 = Xt.T @ sparse_mat[i, pos0[0]]             # right-hand side: sum_t y_it * x_t
    mat0 = inv(Xt.T @ Xt + lambda_w * np.eye(rank))  # inverse of the regularized Gram matrix
    W[i, :] = mat0 @ vec0                            # closed-form ridge update of w_i
To better understand this code, let us look at what happens in each line. Recall that the equation for updating $W$ is $$\boldsymbol{w}_{i} \Leftarrow\left(\sum_{t:(i, t) \in \Omega} \boldsymbol{x}_{t} \boldsymbol{x}_{t}^{T}+\lambda_{w} I\right)^{-1} \sum_{t:(i, t) \in \Omega} y_{i t} \boldsymbol{x}_{t}$$ which follows from the optimization problem: $$\min _{W} \frac{1}{2} \underbrace{\sum_{(i, t) \in \Omega}\left(y_{i t}-\boldsymbol{w}_{i}^{T} \boldsymbol{x}_{t}\right)^{2}}_{\text {sum of squared residual errors }}+\frac{1}{2} \lambda_{w} \underbrace{\sum_{i=1}^{m} \boldsymbol{w}_{i}^{T} \boldsymbol{w}_{i}}_{\text{sum of squared entries}}.$$
As can be seen,
vec0 = Xt.T @ sparse_mat[i, pos0[0]]
corresponds to $$\sum_{t:(i, t) \in \Omega} y_{i t} \boldsymbol{x}_{t}.$$
mat0 = inv(Xt.T @ Xt + lambda_w * np.eye(rank))
corresponds to $$\left(\sum_{t:(i, t) \in \Omega} \boldsymbol{x}_{t} \boldsymbol{x}_{t}^{T}+\lambda_{w} I\right)^{-1}.$$
W[i, :] = mat0 @ vec0
corresponds to the closed-form update of $\boldsymbol{w}_{i}$.
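As a side note, forming the explicit inverse is acceptable here because the matrix is only rank × rank, but the same update can be computed with np.linalg.solve, which avoids the explicit inverse and is usually faster and more numerically stable. A sketch of the equivalent loop body, reusing the names from the loop above:
## Sketch: equivalent w_i update via a linear solve (no explicit inverse)
pos0 = np.where(sparse_mat[i, :] != 0)
Xt = X[pos0[0], :]
W[i, :] = np.linalg.solve(Xt.T @ Xt + lambda_w * np.eye(rank),
                          Xt.T @ sparse_mat[i, pos0[0]])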
We write Python code for updating the temporal factor matrix $X$ as follows,
for t in range(dim2):
    pos0 = np.where(sparse_mat[:, t] != 0)   # observed series at time t
    Wt = W[pos0[0], :]
    Mt = np.zeros((rank, rank))              # accumulates AR terms from future residuals
    Nt = np.zeros(rank)
    if t < np.max(time_lags):                # x_t is not yet covered by a complete AR residual
        Pt = np.zeros((rank, rank))
        Qt = np.zeros(rank)
    else:                                    # the AR residual at time t involves x_t directly
        Pt = np.eye(rank)
        Qt = np.einsum('ij, ij -> j', theta, X[t - time_lags, :])
    if t < dim2 - np.min(time_lags):
        if t >= np.max(time_lags) and t < dim2 - np.max(time_lags):
            index = list(range(0, d))        # interior: every lag contributes
        else:                                # boundary: keep only lags with a valid t + h
            index = list(np.where((t + time_lags >= np.max(time_lags)) & (t + time_lags < dim2)))[0]
        for k in index:
            Ak = theta[k, :]
            Mt += np.diag(Ak ** 2)
            theta0 = theta.copy()
            theta0[k, :] = 0
            Nt += np.multiply(Ak, X[t + time_lags[k], :]
                              - np.einsum('ij, ij -> j', theta0, X[t + time_lags[k] - time_lags, :]))
    vec0 = Wt.T @ sparse_mat[pos0[0], t] + lambda_x * Nt + lambda_x * Qt
    mat0 = inv(Wt.T @ Wt + lambda_x * Mt + lambda_x * Pt + lambda_x * eta * np.eye(rank))
    X[t, :] = mat0 @ vec0
This code looks complicated, so let us first look at the optimization problem that yields the closed-form update of $X$: $$\min_{X}\frac{1}{2}\underbrace{\sum_{(i,t)\in\Omega}\left(y_{it}-\boldsymbol{w}_{i}^T\boldsymbol{x}_{t}\right)^2}_{\text{sum of squared residual errors}}+\underbrace{\frac{1}{2}\lambda_{x}\sum_{t=l_d+1}^{f}\left(\boldsymbol{x}_{t}-\sum_{l\in\mathcal{L}}\boldsymbol{\theta}_{l}\circledast\boldsymbol{x}_{t-l}\right)^\top\left(\boldsymbol{x}_{t}-\sum_{l\in\mathcal{L}}\boldsymbol{\theta}_{l}\circledast\boldsymbol{x}_{t-l}\right)+\frac{1}{2}\lambda_{x}\eta\sum_{t=1}^{f}\boldsymbol{x}_{t}^\top\boldsymbol{x}_{t}}_{\text{AR-term}}.$$
Then, as can be seen,
Mt += np.diag(Ak ** 2)
corresponds to $$\sum_{h\in\mathcal{L},\,t+h \leq f}\text{diag}(\boldsymbol{\theta}_{h}\circledast\boldsymbol{\theta}_{h}).$$
Nt += np.multiply(Ak, X[t + time_lags[k], :] - np.einsum('ij, ij -> j', theta0, X[t + time_lags[k] - time_lags, :]))
corresponds to $$\sum_{h\in\mathcal{L},\,t+h \leq f}\boldsymbol{\theta}_{h}\circledast\boldsymbol{\psi}_{t+h},$$ where $\boldsymbol{\psi}_{t+h}=\boldsymbol{x}_{t+h}-\sum_{l\in\mathcal{L},l\neq h}\boldsymbol{\theta}_{l}\circledast\boldsymbol{x}_{t+h-l}$ is the AR residual at time $t+h$ with the lag-$h$ term excluded.
Qt = np.einsum('ij, ij -> j', theta, X[t - time_lags, :])
corresponds to $$\sum_{l\in\mathcal{L}}\boldsymbol{\theta}_{l}\circledast\boldsymbol{x}_{t-l}.$$
X[t, :] = mat0 @ vec0
corresponds to the closed-form update of $\boldsymbol{x}_{t}$.
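The trickiest part is the boundary handling: a lag $h$ contributes to the update of $\boldsymbol{x}_{t}$ only if the AR residual at time $t+h$ actually exists, i.e., $t+h\geq l_d$ and $t+h<f$ (0-based indexing). The toy sketch below, with assumed values unrelated to the experiments, prints which lags enter the update at a few time points:
import numpy as np

time_lags = np.array([1, 2, 144])   # toy lag set
dim2 = 300                          # toy horizon f
d = time_lags.shape[0]
for t in [0, 100, 150, 298, 299]:
    if t < dim2 - np.min(time_lags):
        if t >= np.max(time_lags) and t < dim2 - np.max(time_lags):
            index = list(range(0, d))          # interior: every lag contributes
        else:                                   # boundary: keep only valid t + h
            index = list(np.where((t + time_lags >= np.max(time_lags))
                                  & (t + time_lags < dim2)))[0]
    else:
        index = []                              # final step: no future residual involves x_t
    print(t, '->', [int(time_lags[k]) for k in index])
## prints: 0 -> [144], 100 -> [144], 150 -> [1, 2, 144], 298 -> [1], 299 -> []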
We write Python code for updating the AR coefficient matrix $\Theta$ as follows,
for k in range(d):
    theta0 = theta.copy()
    theta0[k, :] = 0                                # exclude lag h = time_lags[k] to form pi_t^h
    mat0 = np.zeros((dim2 - np.max(time_lags), rank))
    for L in range(d):
        mat0 += X[np.max(time_lags) - time_lags[L] : dim2 - time_lags[L], :] @ np.diag(theta0[L, :])
    VarPi = X[np.max(time_lags) : dim2, :] - mat0   # pi_t^h stacked over t
    var1 = np.zeros((rank, rank))
    var2 = np.zeros(rank)
    for t in range(np.max(time_lags), dim2):
        B = X[t - time_lags[k], :]
        var1 += np.diag(np.multiply(B, B))          # diag(x_{t-h} * x_{t-h})
        var2 += np.diag(B) @ VarPi[t - np.max(time_lags), :]
    theta[k, :] = inv(var1 + lambda_theta * np.eye(rank) / lambda_x) @ var2
To better understand this code, let us look at what happens in each line. Recall that the equation for updating $\Theta$ is $$\boldsymbol{\theta}_{h}\Leftarrow\left(\sum_{t=l_d+1}^{f}\text{diag}(\boldsymbol{x}_{t-h}\circledast \boldsymbol{x}_{t-h})+\frac{\lambda_{\theta}}{\lambda_x}I\right)^{-1}\left(\sum_{t=l_d+1}^{f}{\boldsymbol{\pi}_{t}^{h}}\circledast \boldsymbol{x}_{t-h}\right)$$ where $\boldsymbol{\pi}_{t}^{h}=\boldsymbol{x}_{t}-\sum_{l\in\mathcal{L},l\neq h}\boldsymbol{\theta}_{l}\circledast\boldsymbol{x}_{t-l}$, which follows from the optimization problem: $$ \min_{\Theta}\frac{1}{2}\lambda_{x}\underbrace{\sum_{t=l_d+1}^{f}\left(\boldsymbol{x}_{t}-\sum_{l\in\mathcal{L}}\boldsymbol{\theta}_{l}\circledast\boldsymbol{x}_{t-l}\right)^\top\left(\boldsymbol{x}_{t}-\sum_{l\in\mathcal{L}}\boldsymbol{\theta}_{l}\circledast\boldsymbol{x}_{t-l}\right)}_{\text{sum of squared residual errors}}+\frac{1}{2}\lambda_{\theta}\underbrace{\sum_{l\in\mathcal{L}}\boldsymbol{\theta}_{l}^\top\boldsymbol{\theta}_{l}}_{\text{sum of squared entries}} $$
As can be seen,
mat0 += X[np.max(time_lags) - time_lags[L] : dim2 - time_lags[L], :] @ np.diag(theta0[L, :])
corresponds to $$\sum_{l\in\mathcal{L},l\neq h}\boldsymbol{\theta}_{l}\circledast\boldsymbol{x}_{t-l}.$$
var1 += np.diag(np.multiply(B, B))
corresponds to $$\sum_{t=l_d+1}^{f}\text{diag}(\boldsymbol{x}_{t-h}\circledast \boldsymbol{x}_{t-h}).$$
var2 += np.diag(B) @ VarPi[t - np.max(time_lags), :]
corresponds to $$\sum_{t=l_d+1}^{f}{\boldsymbol{\pi}_{t}^{h}}\circledast \boldsymbol{x}_{t-h}.$$
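Since $\circledast$ is element-wise, var1 is diagonal, so the $\boldsymbol{\theta}_{h}$ update actually decouples into $r$ independent scalar ridge regressions and needs no matrix inverse at all. A sketch of this view, with a hypothetical helper update_theta_k that is equivalent to the loop above for the same inputs:
import numpy as np

def update_theta_k(X, theta, time_lags, k, lambda_x, lambda_theta):
    """Hypothetical sketch: the theta_h update as r scalar ridge regressions."""
    tmax = int(np.max(time_lags))
    f = X.shape[0]
    theta0 = theta.copy()
    theta0[k, :] = 0                       # exclude lag h = time_lags[k] from the AR sum
    # pi_t^h for all t = tmax..f-1 at once, one row per time step
    Pi = X[tmax:, :] - sum(theta0[L, :] * X[tmax - time_lags[L] : f - time_lags[L], :]
                           for L in range(len(time_lags)))
    B = X[tmax - time_lags[k] : f - time_lags[k], :]   # x_{t-h} stacked over t
    # var1 is diagonal, so the solve reduces to element-wise division
    return np.sum(B * Pi, axis=0) / (np.sum(B ** 2, axis=0) + lambda_theta / lambda_x)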
import numpy as np
from numpy.linalg import inv as inv

def TRMF(dense_mat, sparse_mat, init_para, init_hyper, time_lags, maxiter):
    """Temporal Regularized Matrix Factorization, TRMF."""

    ## Initialize parameters
    W = init_para["W"]
    X = init_para["X"]
    theta = init_para["theta"]

    ## Set hyperparameters
    lambda_w = init_hyper["lambda_w"]
    lambda_x = init_hyper["lambda_x"]
    lambda_theta = init_hyper["lambda_theta"]
    eta = init_hyper["eta"]

    dim1, dim2 = sparse_mat.shape
    pos_train = np.where(sparse_mat != 0)
    pos_test = np.where((dense_mat != 0) & (sparse_mat == 0))
    binary_mat = sparse_mat.copy()
    binary_mat[pos_train] = 1
    d, rank = theta.shape

    for it in range(maxiter):
        ## Update spatial matrix W
        for i in range(dim1):
            pos0 = np.where(sparse_mat[i, :] != 0)
            Xt = X[pos0[0], :]
            vec0 = Xt.T @ sparse_mat[i, pos0[0]]
            mat0 = inv(Xt.T @ Xt + lambda_w * np.eye(rank))
            W[i, :] = mat0 @ vec0
        ## Update temporal matrix X
        for t in range(dim2):
            pos0 = np.where(sparse_mat[:, t] != 0)
            Wt = W[pos0[0], :]
            Mt = np.zeros((rank, rank))
            Nt = np.zeros(rank)
            if t < np.max(time_lags):
                Pt = np.zeros((rank, rank))
                Qt = np.zeros(rank)
            else:
                Pt = np.eye(rank)
                Qt = np.einsum('ij, ij -> j', theta, X[t - time_lags, :])
            if t < dim2 - np.min(time_lags):
                if t >= np.max(time_lags) and t < dim2 - np.max(time_lags):
                    index = list(range(0, d))
                else:
                    index = list(np.where((t + time_lags >= np.max(time_lags)) & (t + time_lags < dim2)))[0]
                for k in index:
                    Ak = theta[k, :]
                    Mt += np.diag(Ak ** 2)
                    theta0 = theta.copy()
                    theta0[k, :] = 0
                    Nt += np.multiply(Ak, X[t + time_lags[k], :]
                                      - np.einsum('ij, ij -> j', theta0, X[t + time_lags[k] - time_lags, :]))
            vec0 = Wt.T @ sparse_mat[pos0[0], t] + lambda_x * Nt + lambda_x * Qt
            mat0 = inv(Wt.T @ Wt + lambda_x * Mt + lambda_x * Pt + lambda_x * eta * np.eye(rank))
            X[t, :] = mat0 @ vec0
        ## Update AR coefficients theta
        for k in range(d):
            theta0 = theta.copy()
            theta0[k, :] = 0
            mat0 = np.zeros((dim2 - np.max(time_lags), rank))
            for L in range(d):
                mat0 += X[np.max(time_lags) - time_lags[L] : dim2 - time_lags[L], :] @ np.diag(theta0[L, :])
            VarPi = X[np.max(time_lags) : dim2, :] - mat0
            var1 = np.zeros((rank, rank))
            var2 = np.zeros(rank)
            for t in range(np.max(time_lags), dim2):
                B = X[t - time_lags[k], :]
                var1 += np.diag(np.multiply(B, B))
                var2 += np.diag(B) @ VarPi[t - np.max(time_lags), :]
            theta[k, :] = inv(var1 + lambda_theta * np.eye(rank) / lambda_x) @ var2

        mat_hat = W @ X.T
        mape = np.sum(np.abs(dense_mat[pos_test] - mat_hat[pos_test])
                      / dense_mat[pos_test]) / dense_mat[pos_test].shape[0]
        rmse = np.sqrt(np.sum((dense_mat[pos_test] - mat_hat[pos_test]) ** 2) / dense_mat[pos_test].shape[0])
        if (it + 1) % 100 == 0:
            print('Iter: {}'.format(it + 1))
            print('Imputation MAPE: {:.6}'.format(mape))
            print('Imputation RMSE: {:.6}'.format(rmse))
            print()

    return mat_hat
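Before applying TRMF to real data, it can be smoke-tested on a small synthetic low-rank matrix; the sizes, lags, and hyperparameters below are illustrative assumptions, not tuned values:
import numpy as np

np.random.seed(1000)
dim1, dim2, rank = 30, 400, 5                  # illustrative sizes
time_lags = np.array([1, 2, 10])
d = time_lags.shape[0]
## Synthetic positive low-rank data; roughly 20% of entries masked at random
dense_mat = np.random.rand(dim1, rank) @ np.random.rand(rank, dim2) + 1
sparse_mat = dense_mat * np.round(np.random.rand(dim1, dim2) + 0.5 - 0.2)
init_para = {"W": 0.1 * np.random.rand(dim1, rank),
             "X": 0.1 * np.random.rand(dim2, rank),
             "theta": 0.1 * np.random.rand(d, rank)}
init_hyper = {"lambda_w": 500, "lambda_x": 500, "lambda_theta": 500, "eta": 0.03}
mat_hat = TRMF(dense_mat, sparse_mat, init_para, init_hyper, time_lags, 100)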
In the following, we apply the TRMF function defined above to missing data imputation on several spatio-temporal multivariate time series data sets: Guangzhou, Hangzhou, Seattle, and London (see the scenario settings below).
The original data sets have been adapted for our experiments and are now available in the datasets folder.
Scenario setting:
import scipy.io
import numpy as np
np.random.seed(1000)
dense_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')['tensor']
dim = dense_tensor.shape
missing_rate = 0.4 # Non-random missing (NM)
## NM pattern: entire (road, day) blocks are dropped, since round(u + 0.5 - rate) is 0 when u < rate
sparse_tensor = dense_tensor * np.round(np.random.rand(dim[0], dim[1])[:, :, np.newaxis] + 0.5 - missing_rate)
dense_mat = dense_tensor.reshape([dim[0], dim[1] * dim[2]])
sparse_mat = sparse_tensor.reshape([dim[0], dim[1] * dim[2]])
del dense_tensor, sparse_tensor
Model setting:
import time
start = time.time()
dim1, dim2 = sparse_mat.shape
rank = 10
time_lags = np.array([1, 2, 144])
d = time_lags.shape[0]
## Initialize parameters
W = 0.1 * np.random.rand(dim1, rank)
X = 0.1 * np.random.rand(dim2, rank)
theta = 0.1 * np.random.rand(d, rank)
init_para = {"W": W, "X": X, "theta": theta}
## Set hyperparameters
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 0.03
init_hyper = {"lambda_w": lambda_w, "lambda_x": lambda_x, "lambda_theta": lambda_theta, "eta": eta}
maxiter = 200
TRMF(dense_mat, sparse_mat, init_para, init_hyper, time_lags, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))
Iter: 100
Imputation MAPE: 0.103494
Imputation RMSE: 4.35683

Iter: 200
Imputation MAPE: 0.103585
Imputation RMSE: 4.36487

Running time: 438 seconds
Scenario setting:
import scipy.io
import numpy as np
np.random.seed(1000)
dense_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')['tensor']
dim = dense_tensor.shape
missing_rate = 0.4 # Random missing (RM)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim[0], dim[1], dim[2]) + 0.5 - missing_rate)
dense_mat = dense_tensor.reshape([dim[0], dim[1] * dim[2]])
sparse_mat = sparse_tensor.reshape([dim[0], dim[1] * dim[2]])
del dense_tensor, sparse_tensor
Model setting:
import time
start = time.time()
dim1, dim2 = sparse_mat.shape
rank = 80
time_lags = np.array([1, 2, 144])
d = time_lags.shape[0]
## Initialize parameters
W = 0.1 * np.random.rand(dim1, rank)
X = 0.1 * np.random.rand(dim2, rank)
theta = 0.1 * np.random.rand(d, rank)
init_para = {"W": W, "X": X, "theta": theta}
## Set hyperparameters
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 0.03
init_hyper = {"lambda_w": lambda_w, "lambda_x": lambda_x, "lambda_theta": lambda_theta, "eta": eta}
maxiter = 200
TRMF(dense_mat, sparse_mat, init_para, init_hyper, time_lags, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))
Iter: 100
Imputation MAPE: 0.0782638
Imputation RMSE: 3.27797

Iter: 200
Imputation MAPE: 0.0776718
Imputation RMSE: 3.25724

Running time: 888 seconds
Scenario setting:
import scipy.io
import numpy as np
np.random.seed(1000)
dense_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')['tensor']
dim = dense_tensor.shape
missing_rate = 0.6 # Random missing (RM)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim[0], dim[1], dim[2]) + 0.5 - missing_rate)
dense_mat = dense_tensor.reshape([dim[0], dim[1] * dim[2]])
sparse_mat = sparse_tensor.reshape([dim[0], dim[1] * dim[2]])
del dense_tensor, sparse_tensor
Model setting:
import time
start = time.time()
dim1, dim2 = sparse_mat.shape
rank = 80
time_lags = np.array([1, 2, 144])
d = time_lags.shape[0]
## Initialize parameters
W = 0.1 * np.random.rand(dim1, rank)
X = 0.1 * np.random.rand(dim2, rank)
theta = 0.1 * np.random.rand(d, rank)
init_para = {"W": W, "X": X, "theta": theta}
## Set hyperparameters
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 0.03
init_hyper = {"lambda_w": lambda_w, "lambda_x": lambda_x, "lambda_theta": lambda_theta, "eta": eta}
maxiter = 200
TRMF(dense_mat, sparse_mat, init_para, init_hyper, time_lags, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))
Iter: 100
Imputation MAPE: 0.0867978
Imputation RMSE: 3.59856

Iter: 200
Imputation MAPE: 0.0847537
Imputation RMSE: 3.51839

Running time: 857 seconds
Scenario setting:
import scipy.io
import numpy as np
np.random.seed(1000)
dense_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')['tensor']
dim = dense_tensor.shape
missing_rate = 0.4 # Non-random missing (NM)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim[0], dim[1])[:, :, np.newaxis] + 0.5 - missing_rate)
dense_mat = dense_tensor.reshape([dim[0], dim[1] * dim[2]])
sparse_mat = sparse_tensor.reshape([dim[0], dim[1] * dim[2]])
del dense_tensor, sparse_tensor
Model setting:
import time
start = time.time()
dim1, dim2 = sparse_mat.shape
rank = 30
time_lags = np.array([1, 2, 108])
d = time_lags.shape[0]
## Initialize parameters
W = 0.1 * np.random.rand(dim1, rank)
X = 0.1 * np.random.rand(dim2, rank)
theta = 0.1 * np.random.rand(d, rank)
init_para = {"W": W, "X": X, "theta": theta}
## Set hyperparameters
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 0.03
init_hyper = {"lambda_w": lambda_w, "lambda_x": lambda_x, "lambda_theta": lambda_theta, "eta": eta}
maxiter = 200
TRMF(dense_mat, sparse_mat, init_para, init_hyper, time_lags, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))
Iter: 100
Imputation MAPE: 0.257291
Imputation RMSE: 38.1766

Iter: 200
Imputation MAPE: 0.260397
Imputation RMSE: 37.864

Running time: 179 seconds
Scenario setting:
import scipy.io
import numpy as np
np.random.seed(1000)
dense_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')['tensor']
dim = dense_tensor.shape
missing_rate = 0.4 # Random missing (RM)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim[0], dim[1], dim[2]) + 0.5 - missing_rate)
dense_mat = dense_tensor.reshape([dim[0], dim[1] * dim[2]])
sparse_mat = sparse_tensor.reshape([dim[0], dim[1] * dim[2]])
del dense_tensor, sparse_tensor
Model setting:
import time
start = time.time()
dim1, dim2 = sparse_mat.shape
rank = 30
time_lags = np.array([1, 2, 108])
d = time_lags.shape[0]
## Initialize parameters
W = 0.1 * np.random.rand(dim1, rank)
X = 0.1 * np.random.rand(dim2, rank)
theta = 0.1 * np.random.rand(d, rank)
init_para = {"W": W, "X": X, "theta": theta}
## Set hyperparameters
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 0.03
init_hyper = {"lambda_w": lambda_w, "lambda_x": lambda_x, "lambda_theta": lambda_theta, "eta": eta}
maxiter = 200
TRMF(dense_mat, sparse_mat, init_para, init_hyper, time_lags, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))
Iter: 100
Imputation MAPE: 0.232536
Imputation RMSE: 35.868

Iter: 200
Imputation MAPE: 0.232085
Imputation RMSE: 36.6294

Running time: 153 seconds
Scenario setting:
import scipy.io
import numpy as np
np.random.seed(1000)
dense_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')['tensor']
dim = dense_tensor.shape
missing_rate = 0.6 # Random missing (RM)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim[0], dim[1], dim[2]) + 0.5 - missing_rate)
dense_mat = dense_tensor.reshape([dim[0], dim[1] * dim[2]])
sparse_mat = sparse_tensor.reshape([dim[0], dim[1] * dim[2]])
del dense_tensor, sparse_tensor
Model setting:
import time
start = time.time()
dim1, dim2 = sparse_mat.shape
rank = 30
time_lags = np.array([1, 2, 108])
d = time_lags.shape[0]
## Initialize parameters
W = 0.1 * np.random.rand(dim1, rank)
X = 0.1 * np.random.rand(dim2, rank)
theta = 0.1 * np.random.rand(d, rank)
init_para = {"W": W, "X": X, "theta": theta}
## Set hyperparameters
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 0.03
init_hyper = {"lambda_w": lambda_w, "lambda_x": lambda_x, "lambda_theta": lambda_theta, "eta": eta}
maxiter = 200
TRMF(dense_mat, sparse_mat, init_para, init_hyper, time_lags, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))
Iter: 100
Imputation MAPE: 0.242496
Imputation RMSE: 40.4021

Iter: 200
Imputation MAPE: 0.244181
Imputation RMSE: 41.1514

Running time: 138 seconds
Scenario setting:
import scipy.io
import numpy as np
np.random.seed(1000)
dense_tensor = scipy.io.loadmat('../datasets/Seattle-data-set/tensor.npz')['arr_0']
dim = dense_tensor.shape
missing_rate = 0.4 # Non-random missing (NM)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim[0], dim[1])[:, :, np.newaxis] + 0.5 - missing_rate)
dense_mat = dense_tensor.reshape([dim[0], dim[1] * dim[2]])
sparse_mat = sparse_tensor.reshape([dim[0], dim[1] * dim[2]])
del dense_tensor, sparse_tensor
Model setting:
import time
start = time.time()
dim1, dim2 = sparse_mat.shape
rank = 10
time_lags = np.array([1, 2, 288])
d = time_lags.shape[0]
## Initialize parameters
W = 0.1 * np.random.rand(dim1, rank)
X = 0.1 * np.random.rand(dim2, rank)
theta = 0.1 * np.random.rand(d, rank)
init_para = {"W": W, "X": X, "theta": theta}
## Set hyperparameters
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 0.03
init_hyper = {"lambda_w": lambda_w, "lambda_x": lambda_x, "lambda_theta": lambda_theta, "eta": eta}
maxiter = 200
TRMF(dense_mat, sparse_mat, init_para, init_hyper, time_lags, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))
Iter: 100
Imputation MAPE: 0.0919508
Imputation RMSE: 5.29839

Iter: 200
Imputation MAPE: 0.0919374
Imputation RMSE: 5.29846

Running time: 391 seconds
Scenario setting:
import scipy.io
import numpy as np
np.random.seed(1000)
dense_tensor = scipy.io.loadmat('../datasets/Seattle-data-set/tensor.npz')['arr_0']
dim = dense_tensor.shape
missing_rate = 0.4 # Random missing (RM)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim[0], dim[1], dim[2]) + 0.5 - missing_rate)
dense_mat = dense_tensor.reshape([dim[0], dim[1] * dim[2]])
sparse_mat = sparse_tensor.reshape([dim[0], dim[1] * dim[2]])
del dense_tensor, sparse_tensor
Model setting:
import time
start = time.time()
dim1, dim2 = sparse_mat.shape
rank = 50
time_lags = np.array([1, 2, 288])
d = time_lags.shape[0]
## Initialize parameters
W = 0.1 * np.random.rand(dim1, rank)
X = 0.1 * np.random.rand(dim2, rank)
theta = 0.1 * np.random.rand(d, rank)
init_para = {"W": W, "X": X, "theta": theta}
## Set hyperparameters
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 0.03
init_hyper = {"lambda_w": lambda_w, "lambda_x": lambda_x, "lambda_theta": lambda_theta, "eta": eta}
maxiter = 200
TRMF(dense_mat, sparse_mat, init_para, init_hyper, time_lags, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))
Iter: 100
Imputation MAPE: 0.0618055
Imputation RMSE: 3.79776

Iter: 200
Imputation MAPE: 0.0617049
Imputation RMSE: 3.79509

Running time: 615 seconds
Scenario setting:
import scipy.io
import numpy as np
np.random.seed(1000)
dense_tensor = scipy.io.loadmat('../datasets/Seattle-data-set/tensor.npz')['arr_0']
dim = dense_tensor.shape
missing_rate = 0.6 # Random missing (RM)
sparse_tensor = dense_tensor * np.round(np.random.rand(dim[0], dim[1], dim[2]) + 0.5 - missing_rate)
dense_mat = dense_tensor.reshape([dim[0], dim[1] * dim[2]])
sparse_mat = sparse_tensor.reshape([dim[0], dim[1] * dim[2]])
del dense_tensor, sparse_tensor
Model setting:
import time
start = time.time()
dim1, dim2 = sparse_mat.shape
rank = 50
time_lags = np.array([1, 2, 288])
d = time_lags.shape[0]
## Initialize parameters
W = 0.1 * np.random.rand(dim1, rank)
X = 0.1 * np.random.rand(dim2, rank)
theta = 0.1 * np.random.rand(d, rank)
init_para = {"W": W, "X": X, "theta": theta}
## Set hyperparameters
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 0.03
init_hyper = {"lambda_w": lambda_w, "lambda_x": lambda_x, "lambda_theta": lambda_theta, "eta": eta}
maxiter = 200
TRMF(dense_mat, sparse_mat, init_para, init_hyper, time_lags, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))
Iter: 100
Imputation MAPE: 0.0651363
Imputation RMSE: 3.93597

Iter: 200
Imputation MAPE: 0.0647982
Imputation RMSE: 3.92264

Running time: 620 seconds
The London movement speed data set is a city-wide hourly traffic speed data set collected in London. The number of roads available at each minimum observation rate is summarized below:
| Observation rate | $>90\%$ | $>80\%$ | $>70\%$ | $>60\%$ | $>50\%$ |
| --- | --- | --- | --- | --- | --- |
| Number of roads | 17,666 | 27,148 | 35,912 | 44,352 | 52,727 |
If you want to test on the full data set, you could consider the following setting for masking observations as missing values.
import numpy as np
np.random.seed(1000)
mask_rate = 0.20
dense_mat = np.load('../datasets/London-data-set/hourly_speed_mat.npy')
pos_obs = np.where(dense_mat != 0)
num = len(pos_obs[0])
sample_ind = np.random.choice(num, size = int(mask_rate * num), replace = False)
sparse_mat = dense_mat.copy()
sparse_mat[pos_obs[0][sample_ind], pos_obs[1][sample_ind]] = 0
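A quick sanity check (a sketch) confirms that roughly mask_rate of the observed entries have been hidden:
## Fraction of observed entries that were masked; should be close to mask_rate
print('Masked fraction: {:.4f}'.format(np.sum((dense_mat != 0) & (sparse_mat == 0)) / num))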
Notably, you could also evaluate the model on a subset of the data (roads with observation rate above 70%) with the following setting.
import numpy as np
np.random.seed(1000)
missing_rate = 0.4
dense_mat = np.load('../datasets/London-data-set/hourly_speed_mat.npy')
binary_mat = dense_mat.copy()
binary_mat[binary_mat != 0] = 1
pos = np.where(np.sum(binary_mat, axis = 1) > 0.7 * binary_mat.shape[1])
dense_mat = dense_mat[pos[0], :]
## Non-random missing (NM)
binary_mat = np.zeros(dense_mat.shape)
random_mat = np.random.rand(dense_mat.shape[0], 30)
for i1 in range(dense_mat.shape[0]):
    for i2 in range(30):
        binary_mat[i1, i2 * 24 : (i2 + 1) * 24] = np.round(random_mat[i1, i2] + 0.5 - missing_rate)
sparse_mat = dense_mat * binary_mat
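The double loop above simply stretches each daily indicator across its 24 hourly slots, so the same binary mask can also be built in one vectorized step; a sketch, assuming the same 30-day horizon:
## Sketch: equivalent vectorized NM mask (each daily draw repeated over 24 hours)
binary_mat = np.repeat(np.round(random_mat + 0.5 - missing_rate), 24, axis = 1)
sparse_mat = dense_mat * binary_mat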
Model setting:
import time
start = time.time()
dim1, dim2 = sparse_mat.shape
rank = 20
time_lags = np.array([1, 2, 24])
d = time_lags.shape[0]
## Initialize parameters
W = 0.1 * np.random.rand(dim1, rank)
X = 0.1 * np.random.rand(dim2, rank)
theta = 0.1 * np.random.rand(d, rank)
init_para = {"W": W, "X": X, "theta": theta}
## Set hyperparameters
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 0.03
init_hyper = {"lambda_w": lambda_w, "lambda_x": lambda_x, "lambda_theta": lambda_theta, "eta": eta}
maxiter = 200
TRMF(dense_mat, sparse_mat, init_para, init_hyper, time_lags, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))
Iter: 100
Imputation MAPE: 0.0958622
Imputation RMSE: 2.3367

Iter: 200
Imputation MAPE: 0.0957049
Imputation RMSE: 2.33306

Running time: 960 seconds
import numpy as np
np.random.seed(1000)
missing_rate = 0.4
dense_mat = np.load('../datasets/London-data-set/hourly_speed_mat.npy')
binary_mat = dense_mat.copy()
binary_mat[binary_mat != 0] = 1
pos = np.where(np.sum(binary_mat, axis = 1) > 0.7 * binary_mat.shape[1])
dense_mat = dense_mat[pos[0], :]
## Random missing (RM)
random_mat = np.random.rand(dense_mat.shape[0], dense_mat.shape[1])
binary_mat = np.round(random_mat + 0.5 - missing_rate)
sparse_mat = dense_mat * binary_mat
Model setting:
import time
start = time.time()
dim1, dim2 = sparse_mat.shape
rank = 20
time_lags = np.array([1, 2, 24])
d = time_lags.shape[0]
## Initialize parameters
W = 0.1 * np.random.rand(dim1, rank)
X = 0.1 * np.random.rand(dim2, rank)
theta = 0.1 * np.random.rand(d, rank)
init_para = {"W": W, "X": X, "theta": theta}
## Set hyperparameters
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 0.03
init_hyper = {"lambda_w": lambda_w, "lambda_x": lambda_x, "lambda_theta": lambda_theta, "eta": eta}
maxiter = 200
TRMF(dense_mat, sparse_mat, init_para, init_hyper, time_lags, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))
Iter: 100
Imputation MAPE: 0.0920514
Imputation RMSE: 2.23467

Iter: 200
Imputation MAPE: 0.0919819
Imputation RMSE: 2.23439

Running time: 1123 seconds
import numpy as np
np.random.seed(1000)
missing_rate = 0.6
dense_mat = np.load('../datasets/London-data-set/hourly_speed_mat.npy')
binary_mat = dense_mat.copy()
binary_mat[binary_mat != 0] = 1
pos = np.where(np.sum(binary_mat, axis = 1) > 0.7 * binary_mat.shape[1])
dense_mat = dense_mat[pos[0], :]
## Random missing (RM)
random_mat = np.random.rand(dense_mat.shape[0], dense_mat.shape[1])
binary_mat = np.round(random_mat + 0.5 - missing_rate)
sparse_mat = dense_mat * binary_mat
Model setting:
import time
start = time.time()
dim1, dim2 = sparse_mat.shape
rank = 20
time_lags = np.array([1, 2, 24])
d = time_lags.shape[0]
## Initialize parameters
W = 0.1 * np.random.rand(dim1, rank)
X = 0.1 * np.random.rand(dim2, rank)
theta = 0.1 * np.random.rand(d, rank)
init_para = {"W": W, "X": X, "theta": theta}
## Set hyperparameters
lambda_w = 500
lambda_x = 500
lambda_theta = 500
eta = 0.03
init_hyper = {"lambda_w": lambda_w, "lambda_x": lambda_x, "lambda_theta": lambda_theta, "eta": eta}
maxiter = 200
TRMF(dense_mat, sparse_mat, init_para, init_hyper, time_lags, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))
Iter: 100
Imputation MAPE: 0.0936975
Imputation RMSE: 2.27472

Iter: 200
Imputation MAPE: 0.0936174
Imputation RMSE: 2.27413

Running time: 951 seconds