#!/usr/bin/env python
# coding: utf-8
# # Lecture 17: Tensor decompositions
# ## Recap of the previous lecture
# * Wavelets
# * L1-norm minimization
# * Compressed sensing
# ## Today lecture
#
# * Tensor decompositions and applications
# * Practice
# # Tensors
#
# By **tensor** we imply nothing, but a **multidimensional array**:
# $$
# A(i_1, \dots, i_d), \quad 1\leq i_k\leq n_k,
# $$
# where $d$ is called dimension, $n_k$ - mode size.
# This is standard definition in applied mathematics community. For details see [[1]](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.153.2059&rep=rep1&type=pdf), [[2]](http://arxiv.org/pdf/1302.7121.pdf), [[3]](http://epubs.siam.org/doi/abs/10.1137/090752286).
#
# * $d=2$ (matrices) $\Rightarrow$ classic theory (SVD, LU, QR, $\dots$)
#
# * $d\geq 3$ (tensors) $\Rightarrow$ under development. Generalization of standard matrix results is not always straightforward
# ## Curse of dimensionality
#
# The problem with multidimensional data is that number of parameters grows exponentially with $d$:
#
#
# $$
# \text{storage} = n^d.
# $$
# For instance, for $n=2$ and $d=500$
# $$
# n^d = 2^{500} \gg 10^{83} - \text{ number of atoms in the Universe}
# $$
#
# Why do we care? It seems that we are living in the 3D World :)
# ## Applications
#
# #### Quantum chemistry
#
# Stationary Schroedinger equation for system with $N_{el}$ electrons
# $$
# \hat H \Psi = E \Psi,
# $$
# where
# $$
# \Psi = \Psi(\{{\bf r_1},\sigma_1\},\dots, \{{\bf r_{N_{el}}},\sigma_{N_{el}}\})
# $$
# 3$N_{el}$ spatial variables and $N_{el}$ spin variable.
#
#
# * Drug and material design
# * Predicting physical experiments
# #### Uncertainty quantification
#
# Example: oil reservoir modeling. Model may depend on parameters $p_1,\dots,p_d$ (like measured experimentally procity or temperature) with uncertainty
#
# $$
# u = u(t,{\bf r},\,{p_1,\dots,p_d})
# $$
#
# #### And many more
#
# * Signal processing
# * Recommender systems
# * Neural networks
# * Language models
# * Financial mathematics
# * ...
# ## Working with many dimensions
#
# * **Monte-Carlo**: class of methods based on random sampling. Convergence issues
# * **Sparse grids**: special types of grids with small number of grid points. Strong regularity conditions
# * Young and promising approach based on tensor decompositions
# ## Tensor decompositions
#
# ## 2D
#
# Skeleton decomposition:
# $$
# A = UV^T
# $$
# or elementwise:
# $$
# a_{ij} = \sum_{\alpha=1}^r u_{i\alpha} v_{j\alpha}
# $$
# leads us to the idea of **separation of variables.**
#
# **Properties:**
# * Not unique: $A = U V^T = UBB^{-1}V^T = \tilde U \tilde V^T$
# * Can be calculated in a stable way by **SVD**
# ## 1. Canonical decomposition
#
# The most straightforward way to generize separation of variables to many dimensions is the **canonical decomposition**: (CP/CANDECOMP/PARAFAC)
# $$
# a_{ijk} = \sum_{\alpha=1}^r u_{i\alpha} v_{j\alpha} w_{k\alpha}
# $$
# minimal possible $r$ is called the **canonical rank**. Matrices $U$, $V$ and $W$ are called **canonical factors**. This decomposition was proposed in 1927 by Hitchcock.
# **Properties**:
#
# * For a $d$ dimensional tensor memory is $nrd$
# * Unique under mild conditions
# * Set of tensors with rank$\leq r$ is not closed (by contrast to matrices):
# $a_{ijk} = i+j+k$, $\text{rank}(A) = 3$, but
# $$
# a^\epsilon_{ijk} = \frac{(1+\epsilon i)(1+\epsilon j)(1+\epsilon k) - 1}{\epsilon}\to i+j+k=a_{ijk},\quad \epsilon\to 0
# $$
# and $\text{rank}(A^{\epsilon}) = 2$
# * No stable algorithms to compute best rank-$r$ approximation
# **Alternating Least Squares algorithm**
#
# 0. Intialize random $U,V,W$
# 1. fix $V,W$, solve least squares for $U$
# 2. fix $U,W$, solve least squares for $V$
# 3. fix $U,V$, solve least squares for $W$
# 4. go to 2.
# ## 2. Tucker decomposition
#
# Next attempt is the decomposition proposed by Tucker in 1963 in chemometrics journal:
# $$
# a_{ijk} = \sum_{\alpha_1,\alpha_2,\alpha_3=1}^{r_1,r_2,r_3}g_{\alpha_1\alpha_2\alpha_3} u_{i\alpha_1} v_{j\alpha_2} w_{k\alpha_3}.
# $$
# Here we have several different ranks. Minimal possible $r_1,r_2,r_3$ are called **Tucker ranks**.
#
# **Properties**:
#
# * For a $d$ dimensional tensor memory is $r^d$ $+ nrd$. Still **curse of dimensionality**
# * Stable SVD-based algorithm:
# 1. $U =$ principal components of the unfolding `A.reshape(n1, n2*n3)`
# 2. $V =$ principal components of the unfolding `A.transpose([1,0,2]).reshape(n2, n1*n3)`
# 3. $W =$ principal components of the unfolding `A.transpose([2,0,1]).reshape(n3, n1*n2)`
# 4. $g_{\alpha_1\alpha_2\alpha_3} = \sum_{i,j,k=1}^{n_1,n_2,n_3} a_{ijk} u_{i\alpha_1} v_{j\alpha_2} w_{k\alpha_3}$.
# ## 3. Tensor Train decomposition
#
# * Calculation of the canonical decomposition is unstable
# * Tucker decomposition suffers from the curse of dimensionality
#
# Tensor Train (**TT**) decomposition (Oseledets, Tyrtyshnikov 2009) is both stable and contains linear in $d$ number of parameters:
# $$
# a_{i_1 i_2 \dots i_d} = \sum_{\alpha_1,\dots,\alpha_{d-1}}
# g_{i_1\alpha_1} g_{\alpha_1 i_2\alpha_2}\dots g_{\alpha_{d-2} i_{d-1}\alpha_{d-1}} g_{\alpha_{d-1} i_{d}}
# $$
# or in the matrix form
# $$
# a_{i_1 i_2 \dots i_d} = G_1 (i_1)G_2 (i_2)\dots G_d(i_d)
# $$
# * The storage is $\mathcal{O}(dnr^2)$
# * Stable TT-SVD algorithm
# **Example**
# $$a_{i_1\dots i_d} = i_1 + \dots + i_d$$
# Canonical rank is $d$. At the same time TT-ranks are $2$:
# $$
# i_1 + \dots + i_d = \begin{pmatrix} i_1 & 1 \end{pmatrix}
# \begin{pmatrix} 1 & 0 \\ i_2 & 1 \end{pmatrix}
# \dots
# \begin{pmatrix} 1 & 0 \\ i_{d-1} & 1 \end{pmatrix}
# \begin{pmatrix} 1 \\ i_d \end{pmatrix}
# $$
# ## 4. Quantized Tensor Train
#
# Consider a 1D array $a_k = f(x_k)$, $k=1,\dots,2^d$ where $f$ is some 1D function calculated on grid points $x_k$.
#
# Let $$k = {2^{d-1} i_1 + 2^{d-2} i_2 + \dots + 2^0 i_{d}}\quad i_1,\dots,i_d = 0,1 $$
# be binary representation of $k$, then
# $$
# a_k = a_{2^{d-1} i_1 + 2^{d-2} i_2 + \dots + 2^0 i_{d}} \equiv \tilde a_{i_1,\dots,i_d},
# $$
# where $\tilde a$ is nothing, but a reshaped tensor $a$. TT decomposition of $\tilde a$ is called **Quantized Tensor Train (QTT)** decomposition. Interesting fact is that the QTT decomposition has relation to wavelets.
#
# Contains $\mathcal{O}(\log n r^2)$ elements!
# ## Cross approximation method
#
# If decomposition of a tensor is given, then there is no problem to do basic operations fast.
#
# However, the question is if it is possible to find decomposition taking into account that typically tensors even can not be stored.
#
# **Cross approximation** method allows to find the decomposition using only few of its elements.
# ## Summary
#
# * Tensor decompositions - useful tool to work with multidimensional data
# * Canonical, Tucker, TT, QTT decompositions
# * Cross approximation
# ## Next time
#
# Ping-Pong test
# ## Now lets have some practice
# In[1]:
from IPython.core.display import HTML
def css_styling():
styles = open("./styles/custom.css", "r").read()
return HTML(styles)
css_styling()