About this Notebook


Bayesian Gaussian CP decomposition (BGCP for short) is a Bayesian tensor decomposition that achieves state-of-the-art results on the challenging missing data imputation problem. In the following, we will discuss:

  • What the Bayesian Gaussian CP decomposition is.

  • How to implement BGCP efficiently, mainly using Python's numpy.

  • How to make imputations with real-world spatiotemporal datasets.

If you want to understand BGCP and its modeling tricks in detail, our paper is for you:

Xinyu Chen, Zhaocheng He, Lijun Sun (2019). A Bayesian tensor decomposition approach for spatiotemporal traffic data imputation. Transportation Research Part C: Emerging Technologies, 98: 73-84. [data] [Matlab code]

Quick Run

This notebook is publicly available for any usage as part of our data imputation project. Please check out transdim - GitHub.

We start by importing the necessary dependencies. We will make use of numpy and scipy.

In [1]:
import numpy as np
from numpy.random import multivariate_normal as mvnrnd
from scipy.stats import wishart
from numpy.linalg import inv as inv

Part 1: Matrix Computation Concepts

1) Kronecker product

  • Definition:

Given two matrices $A\in\mathbb{R}^{m_1\times n_1}$ and $B\in\mathbb{R}^{m_2\times n_2}$, the Kronecker product of the two is defined as

$$A\otimes B=\left[ \begin{array}{cccc} a_{11}B & a_{12}B & \cdots & a_{1n_1}B \\ a_{21}B & a_{22}B & \cdots & a_{2n_1}B \\ \vdots & \vdots & \ddots & \vdots \\ a_{m_11}B & a_{m_12}B & \cdots & a_{m_1n_1}B \\ \end{array} \right]$$

where the symbol $\otimes$ denotes the Kronecker product, and the resulting $A\otimes B$ has size $(m_1m_2)\times (n_1n_2)$ (i.e., $m_1m_2$ rows and $n_1n_2$ columns).

  • Example:

If $A=\left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \\ \end{array} \right]$ and $B=\left[ \begin{array}{ccc} 5 & 6 & 7\\ 8 & 9 & 10 \\ \end{array} \right]$, then, we have

$$A\otimes B=\left[ \begin{array}{cc} 1\times \left[ \begin{array}{ccc} 5 & 6 & 7\\ 8 & 9 & 10\\ \end{array} \right] & 2\times \left[ \begin{array}{ccc} 5 & 6 & 7\\ 8 & 9 & 10\\ \end{array} \right] \\ 3\times \left[ \begin{array}{ccc} 5 & 6 & 7\\ 8 & 9 & 10\\ \end{array} \right] & 4\times \left[ \begin{array}{ccc} 5 & 6 & 7\\ 8 & 9 & 10\\ \end{array} \right] \\ \end{array} \right]$$$$=\left[ \begin{array}{cccccc} 5 & 6 & 7 & 10 & 12 & 14 \\ 8 & 9 & 10 & 16 & 18 & 20 \\ 15 & 18 & 21 & 20 & 24 & 28 \\ 24 & 27 & 30 & 32 & 36 & 40 \\ \end{array} \right]\in\mathbb{R}^{4\times 6}.$$
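
numpy provides the Kronecker product directly as np.kron, so the example above is easy to verify. The following check is for illustration only and is not used by the BGCP code below:

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6, 7], [8, 9, 10]])
print(np.kron(A, B))        # prints the 4 x 6 matrix shown above
print(np.kron(A, B).shape)  # (4, 6)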

2) Khatri-Rao product (kr_prod)

  • Definition:

Given two matrices $A=\left( \boldsymbol{a}_1,\boldsymbol{a}_2,...,\boldsymbol{a}_r \right)\in\mathbb{R}^{m\times r}$ and $B=\left( \boldsymbol{b}_1,\boldsymbol{b}_2,...,\boldsymbol{b}_r \right)\in\mathbb{R}^{n\times r}$ with the same number of columns, the Khatri-Rao product (or column-wise Kronecker product) of $A$ and $B$ is given by

$$A\odot B=\left( \boldsymbol{a}_1\otimes \boldsymbol{b}_1,\boldsymbol{a}_2\otimes \boldsymbol{b}_2,...,\boldsymbol{a}_r\otimes \boldsymbol{b}_r \right)\in\mathbb{R}^{(mn)\times r},$$

where the symbol $\odot$ denotes Khatri-Rao product, and $\otimes$ denotes Kronecker product.

  • Example:

If $A=\left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \\ \end{array} \right]=\left( \boldsymbol{a}_1,\boldsymbol{a}_2 \right) $ and $B=\left[ \begin{array}{cc} 5 & 6 \\ 7 & 8 \\ 9 & 10 \\ \end{array} \right]=\left( \boldsymbol{b}_1,\boldsymbol{b}_2 \right) $, then, we have

$$A\odot B=\left( \boldsymbol{a}_1\otimes \boldsymbol{b}_1,\boldsymbol{a}_2\otimes \boldsymbol{b}_2 \right) $$$$=\left[ \begin{array}{cc} \left[ \begin{array}{c} 1 \\ 3 \\ \end{array} \right]\otimes \left[ \begin{array}{c} 5 \\ 7 \\ 9 \\ \end{array} \right] & \left[ \begin{array}{c} 2 \\ 4 \\ \end{array} \right]\otimes \left[ \begin{array}{c} 6 \\ 8 \\ 10 \\ \end{array} \right] \\ \end{array} \right]$$$$=\left[ \begin{array}{cc} 5 & 12 \\ 7 & 16 \\ 9 & 20 \\ 15 & 24 \\ 21 & 32 \\ 27 & 40 \\ \end{array} \right]\in\mathbb{R}^{6\times 2}.$$
In [2]:
def kr_prod(a, b):
    """Khatri-Rao product (column-wise Kronecker product) of a (m x r) and b (n x r)."""
    return np.einsum('ir, jr -> ijr', a, b).reshape(a.shape[0] * b.shape[0], -1)
In [3]:
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8], [9, 10]])
print(kr_prod(A, B))
[[ 5 12]
 [ 7 16]
 [ 9 20]
 [15 24]
 [21 32]
 [27 40]]

3) CP decomposition

CP Combination (cp_combine)

  • Definition:

The CP decomposition factorizes a tensor into a sum of outer products of vectors. For example, for a third-order tensor $\mathcal{Y}\in\mathbb{R}^{m\times n\times f}$, the CP decomposition can be written as

$$\hat{\mathcal{Y}}=\sum_{s=1}^{r}\boldsymbol{u}_{s}\circ\boldsymbol{v}_{s}\circ\boldsymbol{x}_{s},$$

or element-wise,

$$\hat{y}_{ijt}=\sum_{s=1}^{r}u_{is}v_{js}x_{ts},\forall (i,j,t),$$

where vectors $\boldsymbol{u}_{s}\in\mathbb{R}^{m},\boldsymbol{v}_{s}\in\mathbb{R}^{n},\boldsymbol{x}_{s}\in\mathbb{R}^{f}$ are columns of factor matrices $U\in\mathbb{R}^{m\times r},V\in\mathbb{R}^{n\times r},X\in\mathbb{R}^{f\times r}$, respectively. The symbol $\circ$ denotes vector outer product.

  • Example:

Given matrices $U=\left[ \begin{array}{cc} 1 & 2 \\ 3 & 4 \\ \end{array} \right]\in\mathbb{R}^{2\times 2}$, $V=\left[ \begin{array}{cc} 1 & 3 \\ 2 & 4 \\ 5 & 6 \\ \end{array} \right]\in\mathbb{R}^{3\times 2}$ and $X=\left[ \begin{array}{cc} 1 & 5 \\ 2 & 6 \\ 3 & 7 \\ 4 & 8 \\ \end{array} \right]\in\mathbb{R}^{4\times 2}$, and letting $\hat{\mathcal{Y}}=\sum_{s=1}^{r}\boldsymbol{u}_{s}\circ\boldsymbol{v}_{s}\circ\boldsymbol{x}_{s}$, we have

$$\hat{Y}_1=\hat{\mathcal{Y}}(:,:,1)=\left[ \begin{array}{ccc} 31 & 42 & 65 \\ 63 & 86 & 135 \\ \end{array} \right],$$$$\hat{Y}_2=\hat{\mathcal{Y}}(:,:,2)=\left[ \begin{array}{ccc} 38 & 52 & 82 \\ 78 & 108 & 174 \\ \end{array} \right],$$$$\hat{Y}_3=\hat{\mathcal{Y}}(:,:,3)=\left[ \begin{array}{ccc} 45 & 62 & 99 \\ 93 & 130 & 213 \\ \end{array} \right],$$$$\hat{Y}_4=\hat{\mathcal{Y}}(:,:,4)=\left[ \begin{array}{ccc} 52 & 72 & 116 \\ 108 & 152 & 252 \\ \end{array} \right].$$
In [4]:
def cp_combine(U, V, X):
    """Combine factor matrices U, V, X into the estimated tensor of the CP model."""
    return np.einsum('is, js, ts -> ijt', U, V, X)
In [5]:
U = np.array([[1, 2], [3, 4]])
V = np.array([[1, 3], [2, 4], [5, 6]])
X = np.array([[1, 5], [2, 6], [3, 7], [4, 8]])
print(cp_combine(U, V, X))
print()
print('tensor size:')
print(cp_combine(U, V, X).shape)
[[[ 31  38  45  52]
  [ 42  52  62  72]
  [ 65  82  99 116]]

 [[ 63  78  93 108]
  [ 86 108 130 152]
  [135 174 213 252]]]

tensor size:
(2, 3, 4)

4) Tensor Unfolding (ten2mat)

We use numpy's moveaxis and reshape to perform the mode-wise unfolding of a third-order tensor. [link]

In [6]:
def ten2mat(tensor, mode):
    """Unfold a third-order tensor into a matrix along the given mode."""
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order = 'F')
In [7]:
X = np.array([[[1, 2, 3, 4], [3, 4, 5, 6]], 
              [[5, 6, 7, 8], [7, 8, 9, 10]], 
              [[9, 10, 11, 12], [11, 12, 13, 14]]])
print('tensor size:')
print(X.shape)
print('original tensor:')
print(X)
print()
print('(1) mode-1 tensor unfolding:')
print(ten2mat(X, 0))
print()
print('(2) mode-2 tensor unfolding:')
print(ten2mat(X, 1))
print()
print('(3) mode-3 tensor unfolding:')
print(ten2mat(X, 2))
tensor size:
(3, 2, 4)
original tensor:
[[[ 1  2  3  4]
  [ 3  4  5  6]]

 [[ 5  6  7  8]
  [ 7  8  9 10]]

 [[ 9 10 11 12]
  [11 12 13 14]]]

(1) mode-1 tensor unfolding:
[[ 1  3  2  4  3  5  4  6]
 [ 5  7  6  8  7  9  8 10]
 [ 9 11 10 12 11 13 12 14]]

(2) mode-2 tensor unfolding:
[[ 1  5  9  2  6 10  3  7 11  4  8 12]
 [ 3  7 11  4  8 12  5  9 13  6 10 14]]

(3) mode-3 tensor unfolding:
[[ 1  5  9  3  7 11]
 [ 2  6 10  4  8 12]
 [ 3  7 11  5  9 13]
 [ 4  8 12  6 10 14]]

5) Computing Covariance Matrix (cov_mat)

For any matrix $X\in\mathbb{R}^{m\times n}$, cov_mat returns the $n\times n$ unnormalized covariance (scatter) matrix $\sum_{i=1}^{m}\left(\boldsymbol{x}_{i}-\boldsymbol{\bar{x}}\right)\left(\boldsymbol{x}_{i}-\boldsymbol{\bar{x}}\right)^{T}$, where $\boldsymbol{x}_{i}$ denotes the $i$-th row of $X$; it is used in the hyperparameter updates below.

In [8]:
def cov_mat(mat):
    """Unnormalized covariance (scatter) matrix of the rows of mat."""
    dim1, dim2 = mat.shape
    new_mat = np.zeros((dim2, dim2))
    mat_bar = np.mean(mat, axis = 0)
    for i in range(dim1):
        new_mat += np.einsum('i, j -> ij', mat[i, :] - mat_bar, mat[i, :] - mat_bar)
    return new_mat
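
As a quick sanity check (for illustration only), the result of cov_mat equals the sample covariance returned by np.cov scaled back by its $m-1$ denominator:

mat = np.random.rand(5, 3)
# np.cov with rowvar = False treats rows as observations and divides by (m - 1),
# so multiplying by (m - 1) recovers the unnormalized scatter matrix computed by cov_mat.
print(np.allclose(cov_mat(mat), np.cov(mat, rowvar = False) * (mat.shape[0] - 1)))  # True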

Part 2: Bayesian Gaussian CP decomposition (BGCP)

1) Model Description

Gaussian assumption

Given a tensor $\mathcal{Y}\in\mathbb{R}^{m\times n\times f}$ that suffers from missing values, the factorization can be applied to reconstruct the missing values within $\mathcal{Y}$ by assuming

$$y_{ijt}\sim\mathcal{N}\left(\sum_{s=1}^{r}u_{is} v_{js} x_{ts},\tau^{-1}\right),\forall (i,j,t),$$

where vectors $\boldsymbol{u}_{s}\in\mathbb{R}^{m},\boldsymbol{v}_{s}\in\mathbb{R}^{n},\boldsymbol{x}_{s}\in\mathbb{R}^{f}$ are columns of the latent factor matrices, and $u_{is},v_{js},x_{ts}$ are their elements. The precision $\tau$ is the inverse of the Gaussian variance.

Bayesian framework

Based on the Gaussian assumption over the tensor elements $y_{ijt},(i,j,t)\in\Omega$ (where $\Omega$ is the index set of observed tensor elements), the conjugate priors over the model parameters (i.e., latent factors and precision) and hyperparameters are given as

$$\boldsymbol{u}_{i}\sim\mathcal{N}\left(\boldsymbol{\mu}_{u},\Lambda_{u}^{-1}\right),\forall i,$$$$\boldsymbol{v}_{j}\sim\mathcal{N}\left(\boldsymbol{\mu}_{v},\Lambda_{v}^{-1}\right),\forall j,$$$$\boldsymbol{x}_{t}\sim\mathcal{N}\left(\boldsymbol{\mu}_{x},\Lambda_{x}^{-1}\right),\forall t,$$$$\tau\sim\text{Gamma}\left(a_0,b_0\right),$$$$\boldsymbol{\mu}_{u}\sim\mathcal{N}\left(\boldsymbol{\mu}_0,\left(\beta_0\Lambda_u\right)^{-1}\right),\Lambda_u\sim\mathcal{W}\left(W_0,\nu_0\right),$$$$\boldsymbol{\mu}_{v}\sim\mathcal{N}\left(\boldsymbol{\mu}_0,\left(\beta_0\Lambda_v\right)^{-1}\right),\Lambda_v\sim\mathcal{W}\left(W_0,\nu_0\right),$$$$\boldsymbol{\mu}_{x}\sim\mathcal{N}\left(\boldsymbol{\mu}_0,\left(\beta_0\Lambda_x\right)^{-1}\right),\Lambda_x\sim\mathcal{W}\left(W_0,\nu_0\right).$$

2) Posterior Inference

In the following, we apply Gibbs sampling to carry out Bayesian inference for this tensor factorization task.

- Sampling latent factors $\boldsymbol{u}_{i},i\in\left\{1,2,...,m\right\}$

Draw $\boldsymbol{u}_{i}\sim\mathcal{N}\left(\boldsymbol{\mu}_i^{*},(\Lambda_{i}^{*})^{-1}\right)$ with the following parameters:

$$\boldsymbol{\mu}_{i}^{*}=\left(\Lambda_{i}^{*}\right)^{-1}\left\{\tau\sum_{j,t:(i,j,t)\in\Omega}y_{ijt}\left(\boldsymbol{v}_{j}\circledast\boldsymbol{x}_{t}\right)+\Lambda_u\boldsymbol{\mu}_u\right\},$$$$\Lambda_{i}^{*}=\tau\sum_{j,t:(i,j,t)\in\Omega}\left(\boldsymbol{v}_{j}\circledast\boldsymbol{x}_{t}\right)\left(\boldsymbol{v}_{j}\circledast\boldsymbol{x}_{t}\right)^{T}+\Lambda_u.$$

- Sampling latent factors $\boldsymbol{v}_{j},j\in\left\{1,2,...,n\right\}$

Draw $\boldsymbol{v}_{j}\sim\mathcal{N}\left(\boldsymbol{\mu}_j^{*},(\Lambda_{j}^{*})^{-1}\right)$ with the following parameters:

$$\boldsymbol{\mu}_{j}^{*}=\left(\Lambda_{j}^{*}\right)^{-1}\left\{\tau\sum_{i,t:(i,j,t)\in\Omega}y_{ijt}\left(\boldsymbol{u}_{i}\circledast\boldsymbol{x}_{t}\right)+\Lambda_v\boldsymbol{\mu}_v\right\}$$$$\Lambda_{j}^{*}=\tau\sum_{i,t:(i,j,t)\in\Omega}\left(\boldsymbol{u}_{i}\circledast\boldsymbol{x}_{t}\right)\left(\boldsymbol{u}_{i}\circledast\boldsymbol{x}_{t}\right)^{T}+\Lambda_v.$$

- Sampling latent factors $\boldsymbol{x}_{t},t\in\left\{1,2,...,f\right\}$

Draw $\boldsymbol{x}_{t}\sim\mathcal{N}\left(\boldsymbol{\mu}_t^{*},(\Lambda_{t}^{*})^{-1}\right)$ with the following parameters:

$$\boldsymbol{\mu}_{t}^{*}=\left(\Lambda_{t}^{*}\right)^{-1}\left\{\tau\sum_{i,j:(i,j,t)\in\Omega}y_{ijt}\left(\boldsymbol{u}_{i}\circledast\boldsymbol{v}_{j}\right)+\Lambda_x\boldsymbol{\mu}_x\right\}$$$$\Lambda_{t}^{*}=\tau\sum_{i,j:(i,j,t)\in\Omega}\left(\boldsymbol{u}_{i}\circledast\boldsymbol{v}_{j}\right)\left(\boldsymbol{u}_{i}\circledast\boldsymbol{v}_{j}\right)^{T}+\Lambda_x.$$
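
To make the link between these three sampling steps and the vectorized implementation below explicit, here is a deliberately naive sketch of the update for a single row $\boldsymbol{u}_{i}$ (the updates for $\boldsymbol{v}_{j}$ and $\boldsymbol{x}_{t}$ are analogous). The helper name and its arguments are hypothetical; the BGCP function later performs the same computation for all rows at once using kr_prod and ten2mat:

def sample_factor_u_row(i, sparse_tensor, binary_tensor, V, X, tau, Lambda_u, mu_u):
    """Naive sketch: draw one row u_i from its conditional Gaussian posterior."""
    Lambda_star = Lambda_u.copy()
    mu_part = Lambda_u @ mu_u
    _, n, f = sparse_tensor.shape
    for j in range(n):
        for t in range(f):
            if binary_tensor[i, j, t] == 1:            # observed entry (i, j, t)
                w = V[j, :] * X[t, :]                  # Hadamard (element-wise) product of v_j and x_t
                Lambda_star += tau * np.outer(w, w)
                mu_part += tau * sparse_tensor[i, j, t] * w
    mu_star = np.linalg.solve(Lambda_star, mu_part)    # (Lambda_i^*)^{-1} {...}
    return mvnrnd(mu_star, inv(Lambda_star))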

- Sampling precision term $\tau$

Draw $\tau\sim\text{Gamma}\left(a^{*},b^{*}\right)$ with the following parameters:

$$a^{*}=a_0+\frac{1}{2}|\Omega|,~b^{*}=b_0+\frac{1}{2}\sum_{(i,j,t)\in\Omega}\left(y_{ijt}-\sum_{s=1}^{r}u_{is}v_{js}x_{ts}\right)^2.$$
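
Note that numpy's Gamma sampler is parameterized by shape and scale rather than shape and rate, so the draw uses $1/b^{*}$ as the scale argument. A minimal sketch with toy values (a0, b0 and residuals here are illustrative placeholders):

a0, b0 = 1e-6, 1e-6                      # noninformative Gamma hyperparameters, as in the BGCP code below
residuals = np.array([0.5, -0.3, 0.1])   # toy residuals y_ijt - yhat_ijt over the observed entries
a_star = a0 + 0.5 * residuals.size
b_star = b0 + 0.5 * np.sum(residuals ** 2)
tau = np.random.gamma(a_star, 1.0 / b_star)   # scale = 1 / rate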

- Sampling hyperparameters $\left(\boldsymbol{\mu}_{u},\Lambda_{u}\right)$

Draw

  • $\Lambda_{u}\sim\mathcal{W}\left(W_u^{*},\nu_u^{*}\right)$
  • $\boldsymbol{\mu}_{u}\sim\mathcal{N}\left(\boldsymbol{\mu}_{u}^{*},\left(\beta_u^{*}\Lambda_u\right)^{-1}\right)$

with the following parameters:

$$\boldsymbol{\mu}_{u}^{*}=\frac{m\boldsymbol{\bar{u}}+\beta_0\boldsymbol{\mu}_0}{m+\beta_0},~\beta_u^{*}=m+\beta_0,~\nu_u^{*}=m+\nu_0,$$$$\left(W_u^{*}\right)^{-1}=W_0^{-1}+mS_u+\frac{m\beta_0}{m+\beta_0}\left(\boldsymbol{\bar{u}}-\boldsymbol{\mu}_0\right)\left(\boldsymbol{\bar{u}}-\boldsymbol{\mu}_0\right)^T,$$

where $\boldsymbol{\bar{u}}=\frac{1}{m}\sum_{i=1}^{m}\boldsymbol{u}_{i},~S_u=\frac{1}{m}\sum_{i=1}^{m}\left(\boldsymbol{u}_{i}-\boldsymbol{\bar{u}}\right)\left(\boldsymbol{u}_{i}-\boldsymbol{\bar{u}}\right)^T$.
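
In code, this corresponds to first drawing $\Lambda_u$ from the Wishart posterior and then $\boldsymbol{\mu}_{u}$ from the conditional Gaussian. A minimal sketch with a toy factor matrix (the variable names are illustrative; the BGCP function below performs this same update for each factor matrix in turn):

U = np.random.rand(6, 3)                       # toy factor matrix: m = 6 rows, rank = 3
m, rank = U.shape
beta0, nu0, mu0, W0 = 1, rank, np.zeros(rank), np.eye(rank)
U_bar = np.mean(U, axis = 0)
# cov_mat(U) is the unnormalized scatter matrix, i.e. m * S_u in the formula above.
W_star_inv = inv(W0) + cov_mat(U) + m * beta0 / (m + beta0) * np.outer(U_bar - mu0, U_bar - mu0)
Lambda_u = wishart(df = m + nu0, scale = inv(W_star_inv)).rvs()
mu_u = mvnrnd((m * U_bar + beta0 * mu0) / (m + beta0), inv((m + beta0) * Lambda_u))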

- Sampling hyperparameters $\left(\boldsymbol{\mu}_{v},\Lambda_{v}\right)$

Draw

  • $\Lambda_{v}\sim\mathcal{W}\left(W_v^{*},\nu_v^{*}\right)$
  • $\boldsymbol{\mu}_{v}\sim\mathcal{N}\left(\boldsymbol{\mu}_{v}^{*},\left(\beta_v^{*}\Lambda_v\right)^{-1}\right)$

with the following parameters:

$$\boldsymbol{\mu}_{v}^{*}=\frac{n\boldsymbol{\bar{v}}+\beta_0\boldsymbol{\mu}_0}{n+\beta_0},~\beta_v^{*}=n+\beta_0,~\nu_v^{*}=n+\nu_0,$$$$\left(W_v^{*}\right)^{-1}=W_0^{-1}+nS_v+\frac{n\beta_0}{n+\beta_0}\left(\boldsymbol{\bar{v}}-\boldsymbol{\mu}_0\right)\left(\boldsymbol{\bar{v}}-\boldsymbol{\mu}_0\right)^T,$$

where $\boldsymbol{\bar{v}}=\frac{1}{n}\sum_{j=1}^{n}\boldsymbol{v}_{j},~S_v=\frac{1}{n}\sum_{j=1}^{n}\left(\boldsymbol{v}_{j}-\boldsymbol{\bar{v}}\right)\left(\boldsymbol{v}_{j}-\boldsymbol{\bar{v}}\right)^T$.

- Sampling hyperparameters $\left(\boldsymbol{\mu}_{x},\Lambda_{x}\right)$

Draw

  • $\Lambda_{x}\sim\mathcal{W}\left(W_x^{*},\nu_x^{*}\right)$
  • $\boldsymbol{\mu}_{x}\sim\mathcal{N}\left(\boldsymbol{\mu}_{x}^{*},\left(\beta_x^{*}\Lambda_x\right)^{-1}\right)$

with the following parameters:

$$\boldsymbol{\mu}_{x}^{*}=\frac{f\boldsymbol{\bar{x}}+\beta_0\boldsymbol{\mu}_0}{f+\beta_0},~\beta_x^{*}=f+\beta_0,~\nu_x^{*}=f+\nu_0,$$$$\left(W_x^{*}\right)^{-1}=W_0^{-1}+fS_x+\frac{f\beta_0}{f+\beta_0}\left(\boldsymbol{\bar{x}}-\boldsymbol{\mu}_0\right)\left(\boldsymbol{\bar{x}}-\boldsymbol{\mu}_0\right)^T,$$

where $\boldsymbol{\bar{x}}=\frac{1}{f}\sum_{t=1}^{f}\boldsymbol{x}_{t},~S_x=\frac{1}{f}\sum_{t=1}^{f}\left(\boldsymbol{x}_{t}-\boldsymbol{\bar{x}}\right)\left(\boldsymbol{x}_{t}-\boldsymbol{\bar{x}}\right)^T$.

In [9]:
def BGCP(dense_tensor, sparse_tensor, init, rank, maxiter1, maxiter2):
    """Bayesian Gaussian CP (BGCP) decomposition."""
    dim1, dim2, dim3 = sparse_tensor.shape
    binary_tensor = np.zeros((dim1, dim2, dim3))
    dim = np.array([dim1, dim2, dim3])
    pos = np.where((dense_tensor != 0) & (sparse_tensor == 0))  # missing entries with known ground truth (for evaluation)
    position = np.where(sparse_tensor != 0)                     # observed entries
    binary_tensor[position] = 1
    
    U = init["U"]
    V = init["V"]
    X = init["X"]

    beta0 = 1          # Normal-Wishart hyperparameters (beta_0, nu_0, mu_0, W_0)
    nu0 = rank
    mu0 = np.zeros((rank))
    W0 = np.eye(rank)
    tau = 1            # initial precision
    alpha = 1e-6       # Gamma hyperparameters (a_0, b_0) of the precision prior
    beta = 1e-6
    
    U_plus = np.zeros((dim1, rank))  # accumulators for averaging the last maxiter2 Gibbs samples
    V_plus = np.zeros((dim2, rank))
    X_plus = np.zeros((dim3, rank))
    tensor_hat_plus = np.zeros((dim1, dim2, dim3))
    for iters in range(maxiter1):
        for order in range(dim.shape[0]):
            if order == 0:
                mat = U.copy()
            elif order == 1:
                mat = V.copy()
            else:
                mat = X.copy()
            mat_bar = np.mean(mat, axis = 0)
            # Sample the factor hyperparameters (mu, Lambda) from their Normal-Wishart posterior.
            var_mu_hyper = (dim[order] * mat_bar + beta0 * mu0)/(dim[order] + beta0)
            var_W_hyper = inv(inv(W0) + cov_mat(mat) + dim[order] * beta0/(dim[order] + beta0)
                              * np.outer(mat_bar - mu0, mat_bar - mu0))
            var_Lambda_hyper = wishart(df = dim[order] + nu0, scale = var_W_hyper, seed = None).rvs()
            var_mu_hyper = mvnrnd(var_mu_hyper, inv((dim[order] + beta0) * var_Lambda_hyper))

            # Compute the posterior precision matrices (var3) and mean terms (var4) for all rows of the current factor matrix.
            if order == 0:
                var1 = kr_prod(X, V).T
            elif order == 1:
                var1 = kr_prod(X, U).T
            else:
                var1 = kr_prod(V, U).T
            var2 = kr_prod(var1, var1)
            var3 = (tau * np.matmul(var2, ten2mat(binary_tensor, order).T).reshape([rank, rank, dim[order]])
                    + np.dstack([var_Lambda_hyper] * dim[order]))
            var4 = (tau * np.matmul(var1, ten2mat(sparse_tensor, order).T) 
                    + np.dstack([np.matmul(var_Lambda_hyper, var_mu_hyper)] * dim[order])[0, :, :])
            # Draw each row of the current factor matrix from its conditional Gaussian posterior.
            for i in range(dim[order]):
                var_Lambda = var3[ :, :, i]
                inv_var_Lambda = inv((var_Lambda + var_Lambda.T)/2)
                vec = mvnrnd(np.matmul(inv_var_Lambda, var4[:, i]), inv_var_Lambda)
                if order == 0:
                    U[i, :] = vec.copy()
                elif order == 1:
                    V[i, :] = vec.copy()
                else:
                    X[i, :] = vec.copy()

        # After burn-in (the last maxiter2 iterations), accumulate factor samples for averaging.
        if iters + 1 > maxiter1 - maxiter2:
            U_plus += U
            V_plus += V
            X_plus += X

        tensor_hat = cp_combine(U, V, X)
        if iters + 1 > maxiter1 - maxiter2:
            tensor_hat_plus += tensor_hat
        rmse = np.sqrt(np.sum((dense_tensor[pos] - tensor_hat[pos]) ** 2)/dense_tensor[pos].shape[0])
        
        # Sample the precision tau from its Gamma posterior over the observed entries.
        var_alpha = alpha + 0.5 * sparse_tensor[position].shape[0]
        error = sparse_tensor - tensor_hat
        var_beta = beta + 0.5 * np.sum(error[position] ** 2)
        tau = np.random.gamma(var_alpha, 1/var_beta)
        
        if (iters + 1) % 200 == 0 and iters < maxiter1 - maxiter2:
            print('Iter: {}'.format(iters + 1))
            print('RMSE: {:.6}'.format(rmse))
            print()
        
    # Average the collected samples to obtain the final factor matrices and the imputed tensor.
    U = U_plus/maxiter2
    V = V_plus/maxiter2
    X = X_plus/maxiter2
    tensor_hat = tensor_hat_plus/maxiter2
    final_mape = np.sum(np.abs(dense_tensor[pos] - tensor_hat[pos])/dense_tensor[pos])/dense_tensor[pos].shape[0]
    final_rmse = np.sqrt(np.sum((dense_tensor[pos] - tensor_hat[pos]) ** 2)/dense_tensor[pos].shape[0])
    print('Final MAPE: {:.6}'.format(final_mape))
    print('Final RMSE: {:.6}'.format(final_rmse))
    print()

Part 3: Data Organization

1) Matrix Structure

We consider a dataset of $m$ discrete time series $\boldsymbol{y}_{i}\in\mathbb{R}^{f},i\in\left\{1,2,...,m\right\}$. The time series may have missing elements. We express the spatiotemporal dataset as a matrix $Y\in\mathbb{R}^{m\times f}$ with $m$ rows (e.g., locations) and $f$ columns (e.g., discrete time intervals),

$$Y=\left[ \begin{array}{cccc} y_{11} & y_{12} & \cdots & y_{1f} \\ y_{21} & y_{22} & \cdots & y_{2f} \\ \vdots & \vdots & \ddots & \vdots \\ y_{m1} & y_{m2} & \cdots & y_{mf} \\ \end{array} \right]\in\mathbb{R}^{m\times f}.$$

2) Tensor Structure

We consider a dataset of $m$ discrete time series $\boldsymbol{y}_{i}\in\mathbb{R}^{nf},i\in\left\{1,2,...,m\right\}$. The time series may have missing elements. We partition each time series into intervals of predefined length $f$. We express each partitioned time series as a matrix $Y_{i}$ with $n$ rows (e.g., days) and $f$ columns (e.g., discrete time intervals per day),

$$Y_{i}=\left[ \begin{array}{cccc} y_{11} & y_{12} & \cdots & y_{1f} \\ y_{21} & y_{22} & \cdots & y_{2f} \\ \vdots & \vdots & \ddots & \vdots \\ y_{n1} & y_{n2} & \cdots & y_{nf} \\ \end{array} \right]\in\mathbb{R}^{n\times f},i=1,2,...,m,$$

therefore, the resulting structure is a tensor $\mathcal{Y}\in\mathbb{R}^{m\times n\times f}$.

How do we transform a raw data set into this tensor structure for time series imputation?
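
A single numpy reshape is enough, as long as each row of the matrix stores its $n$ days back to back with $f$ intervals per day. A minimal sketch with hypothetical dimensions (3 locations, 2 days, 4 intervals per day):

mat = np.arange(24).reshape(3, 8)     # location x (day * interval) matrix, 8 = 2 days * 4 intervals
tensor = mat.reshape(3, 2, 4)         # location x day x interval tensor
print(tensor.shape)                   # (3, 2, 4)

This is exactly the transformation applied to the Seattle data set in Part 8, where dense_mat.reshape([dense_mat.shape[0], 28, 288]) folds 28 days of 288 five-minute intervals into a third-order tensor.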

Part 4: Experiments on Guangzhou Data Set

In [10]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']
random_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.4

# =============================================================================
### Random missing (RM) scenario:
# binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
# =============================================================================

# =============================================================================
### Non-random missing (NM) scenario:
binary_tensor = np.zeros(dense_tensor.shape)
for i1 in range(dense_tensor.shape[0]):
    for i2 in range(dense_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)

Question: Given only the partially observed data $\mathcal{Y}\in\mathbb{R}^{m\times n\times f}$, how can we impute the unknown missing values?

The main influential factors for this imputation model are:

  • rank.

  • maxiter1.

  • maxiter2.

In [11]:
import time
start = time.time()
dim1, dim2, dim3 = sparse_tensor.shape
rank = 10
init = {"U": 0.1 * np.random.rand(dim1, rank),
        "V": 0.1 * np.random.rand(dim2, rank),
       "X": 0.1 * np.random.rand(dim3, rank)}
maxiter1 = 1100
maxiter2 = 100
BGCP(dense_tensor, sparse_tensor, init, rank, maxiter1, maxiter2)
end = time.time()
print('Running time: %d seconds'%(end - start))
Iter: 200
RMSE: 4.34597

Iter: 400
RMSE: 4.34085

Iter: 600
RMSE: 4.33664

Iter: 800
RMSE: 4.32873

Iter: 1000
RMSE: 4.33022

Final MAPE: 0.10226
Final RMSE: 4.32141

Running time: 279 seconds

Experiment results of missing data imputation using Bayesian Gaussian CP decomposition (BGCP):

scenario   rank   maxiter1   maxiter2   mape     rmse
20%, RM    80     1100       100        0.0828   3.5729
40%, RM    80     1100       100        0.0829   3.5869
20%, NM    10     1100       100        0.1023   4.2756
40%, NM    10     1100       100        0.1023   4.3214

Part 5: Experiments on Birmingham Data Set

In [12]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Birmingham-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']
random_tensor = scipy.io.loadmat('../datasets/Birmingham-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.1

# =============================================================================
### Random missing (RM) scenario:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
# =============================================================================

# =============================================================================
### Non-random missing (NM) scenario:
# binary_tensor = np.zeros(dense_tensor.shape)
# for i1 in range(dense_tensor.shape[0]):
#     for i2 in range(dense_tensor.shape[1]):
#         binary_tensor[i1,i2,:] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)
In [13]:
import time
start = time.time()
dim1, dim2, dim3 = sparse_tensor.shape
rank = 30
init = {"U": 0.1 * np.random.rand(dim1, rank),
        "V": 0.1 * np.random.rand(dim2, rank),
       "X": 0.1 * np.random.rand(dim3, rank)}
maxiter1 = 1100
maxiter2 = 100
BGCP(dense_tensor, sparse_tensor, init, rank, maxiter1, maxiter2)
end = time.time()
print('Running time: %d seconds'%(end - start))
Iter: 200
RMSE: 22.9929

Iter: 400
RMSE: 22.202

Iter: 600
RMSE: 20.9082

Iter: 800
RMSE: 20.7856

Iter: 1000
RMSE: 20.5334

Final MAPE: 0.0649825
Final RMSE: 19.6926

Running time: 72 seconds

Experiment results of missing data imputation using Bayesian Gaussian CP decomposition (BGCP):

scenario   rank   maxiter1   maxiter2   mape     rmse
10%, RM    30     1100       100        0.0650   19.6926
30%, RM    30     1100       100        0.0623   19.982
10%, NM    10     1100       100        0.1364   43.1498
30%, NM    10     1100       100        0.1593   57.0697

Part 6: Experiments on Hangzhou Data Set

In [14]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']
random_tensor = scipy.io.loadmat('../datasets/Hangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

missing_rate = 0.4

# =============================================================================
### Random missing (RM) scenario:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
# =============================================================================

# =============================================================================
### Non-random missing (NM) scenario:
# binary_tensor = np.zeros(dense_tensor.shape)
# for i1 in range(dense_tensor.shape[0]):
#     for i2 in range(dense_tensor.shape[1]):
#         binary_tensor[i1, i2, :] = np.round(random_matrix[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)
In [15]:
import time
start = time.time()
dim1, dim2, dim3 = sparse_tensor.shape
rank = 50
init = {"U": 0.1 * np.random.rand(dim1, rank),
        "V": 0.1 * np.random.rand(dim2, rank),
       "X": 0.1 * np.random.rand(dim3, rank)}
maxiter1 = 1100
maxiter2 = 100
BGCP(dense_tensor, sparse_tensor, init, rank, maxiter1, maxiter2)
end = time.time()
print('Running time: %d seconds'%(end - start))
Iter: 200
RMSE: 36.6636

Iter: 400
RMSE: 33.6852

Iter: 600
RMSE: 33.7971

Iter: 800
RMSE: 33.6267

Iter: 1000
RMSE: 34.1026

Final MAPE: 0.195865
Final RMSE: 32.7057

Running time: 403 seconds

Experiment results of missing data imputation using Bayesian Gaussian CP decomposition (BGCP):

scenario   rank   maxiter1   maxiter2   mape     rmse
20%, RM    50     1100       100        0.1901   41.1558
40%, RM    50     1100       100        0.1959   32.7057
20%, NM    10     1100       100        0.2557   35.9867
40%, NM    10     1100       100        0.2437   49.6438

Part 7: Experiments on New York Data Set

In [16]:
import scipy.io

tensor = scipy.io.loadmat('../datasets/NYC-data-set/tensor.mat')
dense_tensor = tensor['tensor']
rm_tensor = scipy.io.loadmat('../datasets/NYC-data-set/rm_tensor.mat')
rm_tensor = rm_tensor['rm_tensor']
nm_tensor = scipy.io.loadmat('../datasets/NYC-data-set/nm_tensor.mat')
nm_tensor = nm_tensor['nm_tensor']

missing_rate = 0.3

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
binary_tensor = np.round(rm_tensor + 0.5 - missing_rate)
# =============================================================================

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
# binary_tensor = np.zeros(dense_tensor.shape)
# for i1 in range(dense_tensor.shape[0]):
#     for i2 in range(dense_tensor.shape[1]):
#         for i3 in range(61):
#             binary_tensor[i1, i2, i3 * 24 : (i3 + 1) * 24] = np.round(nm_tensor[i1, i2, i3] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)
In [17]:
import time
start = time.time()
dim1, dim2, dim3 = sparse_tensor.shape
rank = 30
init = {"U": 0.1 * np.random.rand(dim1, rank),
        "V": 0.1 * np.random.rand(dim2, rank),
       "X": 0.1 * np.random.rand(dim3, rank)}
maxiter1 = 1100
maxiter2 = 100
BGCP(dense_tensor, sparse_tensor, init, rank, maxiter1, maxiter2)
end = time.time()
print('Running time: %d seconds'%(end - start))
Iter: 200
RMSE: 5.34766

Iter: 400
RMSE: 5.26448

Iter: 600
RMSE: 5.19236

Iter: 800
RMSE: 5.16759

Iter: 1000
RMSE: 5.12773

Final MAPE: 0.525225
Final RMSE: 4.82181

Running time: 1274 seconds

Experiment results of missing data imputation using Bayesian Gaussian CP decomposition (BGCP):

scenario   rank   maxiter1   maxiter2   mape     rmse
10%, RM    30     1100       100        0.5202   4.7106
30%, RM    30     1100       100        0.5252   4.8218
10%, NM    30     1100       100        0.5295   4.7879
30%, NM    30     1100       100        0.5282   4.8664

Part 8: Experiments on Seattle Data Set

In [29]:
import pandas as pd

dense_mat = pd.read_csv('../datasets/Seattle-data-set/mat.csv', index_col = 0)
RM_mat = pd.read_csv('../datasets/Seattle-data-set/RM_mat.csv', index_col = 0)
NM_mat = pd.read_csv('../datasets/Seattle-data-set/NM_mat.csv', index_col = 0)
dense_mat = dense_mat.values
RM_mat = RM_mat.values
NM_mat = NM_mat.values
dense_tensor = dense_mat.reshape([dense_mat.shape[0], 28, 288])

missing_rate = 0.2

# =============================================================================
### Random missing (RM) scenario
### Set the RM scenario by:
# binary_tensor = np.round(RM_mat.reshape([RM_mat.shape[0], 28, 288]) + 0.5 - missing_rate)
# =============================================================================

# =============================================================================
### Non-random missing (NM) scenario
### Set the NM scenario by:
binary_tensor = np.zeros((dense_mat.shape[0], 28, 288))
for i1 in range(binary_tensor.shape[0]):
    for i2 in range(binary_tensor.shape[1]):
        binary_tensor[i1, i2, :] = np.round(NM_mat[i1, i2] + 0.5 - missing_rate)
# =============================================================================

sparse_tensor = np.multiply(dense_tensor, binary_tensor)
In [30]:
import time
start = time.time()
dim1, dim2, dim3 = sparse_tensor.shape
rank = 10
init = {"U": 0.1 * np.random.rand(dim1, rank),
        "V": 0.1 * np.random.rand(dim2, rank),
       "X": 0.1 * np.random.rand(dim3, rank)}
maxiter1 = 1100
maxiter2 = 100
BGCP(dense_tensor, sparse_tensor, init, rank, maxiter1, maxiter2)
end = time.time()
print('Running time: %d seconds'%(end - start))
Iter: 200
RMSE: 5.737

Iter: 400
RMSE: 5.70642

Iter: 600
RMSE: 5.67171

Iter: 800
RMSE: 5.66189

Iter: 1000
RMSE: 5.66045

Final MAPE: 0.0993156
Final RMSE: 5.65238

Running time: 412 seconds

Experiment results of missing data imputation using Bayesian Gaussian CP decomposition (BGCP):

scenario   rank   maxiter1   maxiter2   mape     rmse
20%, RM    50     1100       100        0.0745   4.50
40%, RM    50     1100       100        0.0758   4.54
20%, NM    10     1100       100        0.0993   5.65
40%, NM    10     1100       100        0.0994   5.68