Prof. Ivan Oseledets, i.oseledets@skoltech.ru, http://oseledets.github.io, joint with the group members.
Below is the list of projects. Some of them can later evolve into the master theses. They can be also taken by a group of students (2-3).
The main prerequisite is the motivation work with us: our group is big, and we are always quite helpful, even if you are not a master in the particular topic. However it is a good idea that you know linear algebra (or at least learning it). For machine-learning related projects, machine learning experience is a good idea, for PDE-related it not bad to have a basic knowledge about finite differences and/or finite elements. Many projects are related to tensor train and quantized tensor train decompositions: this is our "special feature", which you will surely need to learn. Also, all the projects are related to programming and Python is the preferred language.
The tasks are of course specific for the particular projects, however the pipeline is similar.
Do a literature survey. Google Scholar will help you. Read all the papers you can find on the topic, not only the particular ones that the advisor has given to you. Follow the references in the papers, find keywords, define key ideas.
Find appropriate software and baseline algorithm. Typically, you are not the one solving a particular problem: look for the same projects, what was their approach and try to identify possible improvements
Start writing in $\LaTeX$ a short summary of what you are doing (something like a journal)
Try to formulate questions: a good question is the core to the answer.
Ask the questions to me or group members.
Write code, do experiments, get new results (or find that the approach is not working at all - this is also a scientific result).
Deep learning is now extremely popular for many different tasks, including image processing, speech recognition and many others. On the other hand, PDEs require a lot of computing power. The goal of this project is to study the applicability of deep learning methods (based on standard software) to approximate the functionals of solution of PDEs.
Consider a thermal conductivity problem $$ \nabla \cdot k \nabla u = f, $$ where $k$ is a given coefficient. Then we want to approximate the functional $F(u)$ of the solution (i.e., maximal temperature, mean temperature). The goal is to design a neural network that will be able to approximate this quantity to sufficient accuracy.
We propose the following classifier. Given a feature vector $p$, the classification is performed by the selection of the path $i_1, \ldots, i_d$ that maximizes the ``softmax'' criteria $$ F(p, i_1, \ldots, i_d) = \frac{p W_1(i_1) \ldots W_d(i_d)}{ p W_1(i_1) \ldots W_{d-1}(i_{d-1}) W}, $$ where $W_d = \sum_{i_d = 1}^c W_d(i_d)$ and the last variable $i_d$ is the class variable (which we want to discover). The learning of the matrices $W_k(i_k)$ can be done by a gradient-descent type algorithm: for each training sample $(p_s, c_s)$ given $W$ we update the best path $i^{(s)}_1, \ldots, i^{(s)}_{d-1}$ by maximizing the classification, and then for a given set $(p_s, i_1, \ldots, i_d)$ we minimize the loss (for example, logistic regression on the softmax criteria, or just the softmax itself) by taking a Riemanian gradient descent.
This can be viewed as a \textbf{multiple linear classificator}, but with a special structure of the classificator matrices, otherwise the overfitting would be obvious. For small number of layers the inference the classification would not be hard (i.e. $10$ layers each of size $2$).
It is also close to the highway networks.
A typical linear regression uses the following classifier. Given a feature vector $x$ and number of classes $c$, we learn an $F \times C$ weight matrix $W$ and the probabilities for different classes are given by the vector $$ p = \frac{1}{Z} \exp(W x), $$ where $Z$ is the normalization constant. For large number of features and large number of classes (maybe infinite, when the class variable is continious), storing matrix $W$ is difficult, and we propose to constraint it to the rank-$r$ case.
A machine learning task with many features can be difficult to solve; however, many features are not so important, and only important features can be used for learning larger and more accurate classifiers. But how to select those features? The idea that we want to test is based on the maximal volume submatrices. Represent the data as $N_{obj} \times N_F$, and find there the submatrix with maximal determinant.
Many stock prices can modelled using Brownian motion, i.e. the next state depends on the previous state plus some random event. Important statistical options (for example, option calls) can be then estimated as expectations of random process. The straightforward way is to do Monte-Carlo simulation, but it can not give high accuracy. Another way is to write down the expectation as a path integral and then approximate it by a multi-dimensional integral. For similar integrals we have recently developed a new technique, http://arxiv.org/abs/1504.06149. The idea is to test it for examples from risk analysis.
Sum-product-networks is a new deep architecture. One of its interesting property is that there a constructive algorithm for learning the structure of such network from the observations. On the other hand, one can look at SPN as an approximation of multivariate functions. The idea is then to use Metropolis-type algorithm that samples the given function as it was the probability density, and then apply the SPN learning to recover the SPN structure.
Sparse matrices are important and have $\mathcal{O}(N)$ parameters. However solving a linear system is much more difficult. The goal of this project is to get familiar with the subject, and implement the fast direct solver based on so-called HSS structure.
Boltzmann collission integral is an integral transform that takes two functions from $L^2(\mathbb{R}^3)$ to a one function in $L^2(\mathbb{R}^3)$. For example, see http://www.sam.math.ethz.ch/sam_reports/reports_final/reports2012/2012-28.pdf. Discretizated on a grid, it can be defined as a $N^3 \times N^3 \times N^3$ array, and evaluation requires $\mathcal{O}(N^9)$ complexity. However, it has many symmetries. Our goal is to compare different method for the evaluation of the Boltzmann integral.
TT-format (Tensor Train) is a representation of $d$-dimensional tensors of the form $$A(i_1, \ldots, i_d) = G_1(i_1) \ldots G_d(i_d),$$ where $G_k(i_k)$ has size $r_{k-1} \times r_k$ for fixed $i_k$. A typical task to find the TT-representation is to optimize over the cores $G_k$ in a sequential manner (all fixed, update one and so on). Our goal is to present an asynchonous versions of such algorithm, based on the connection with loopy belief propagation
The "mega-goal" is given a functional $F$ find a topology that optimizes this functional given some constaints (weight of the structure, for example). One of the typical task is linear elasticity, and it is being solved by iteration over the material properties. One can formulate the problem as a minimization problem with constraints (regularization).
The "mega-goal" is given a functional $F$ find a topology that optimizes this functional given some constaints (for example, weight of the structure). One of the ways to do so is to use "ball and sticks" method to create the structure: a set of primitives that can be connected in different ways and also parametrized with several parameters. Checking all possible parameters has exponential complexity, and each computation requires a solution of a PDE. The idea is to use TT-cross method to adaptively sample the parameter space.
The "mega-goal" is given a functional $F$ find a topology that optimizes this functional given some constaints (weights). One of the typical task is linear elasticity, and it is being solved by iteration over the material properties. One can formulate the problem as a minimization problem with constraints (regularization). Our goal here is to use the weirdest regularization: contraint the material properties to low-QTT rank case. However, you can be smarter, and come up with other ideas, for example coming from deep networks as well (that would allow for very curved shapes to be generated). However to work on this project you should get familiar both with the FEM solvers for a topology optimization codes, and with optimization with low-rank constraints.
Approximation of multivariate functions is a notoriously difficult tasks. One of the most efficient approaches, based on so-called sparse grids is based on using the set polynomials with bounded total degree. This provides a sparse basis set in the set of multidimensional functions. Interpolation is the simplest task, and in the standard setting gives rise to the so-called sparse grids. However, given the basis, the problem of selecting optimal interpolation points can be solved by the maximum volume algorithm=. The goal of this project is to compute those interpolation sets and evaluate their efficiency.
QTT (Quantized Tensor Train) is a very elegant & efficient representation of functions, which in many cases allows to achieve complexity logarithmic in the number of samples. However, even for 1d-functions using the basic software (http://github.com/oseledets/ttpy) it is not that straightforward to compute integrals, find roots. This is all implemented in the Chebfun system in a very nice and lucrative way. The goal of this project is implement a QTT alternative to the Chebfun project, and it should be much more efficient.
TT-Toolbox (http://github.com/oseledets/ttpy), there is also a MATLAB version, is our basic software for working with tensors, and it is actively used in many groups around the world. However it is not well-documented and the number of examples is limited. The codebase should be also cleaned as well.
Tensor train is a very efficient representation of multidimensional tensors. However, there are use-cases when not only an asymptotic complexity is needed, but the actual computation speed matters. Our main application in mind is machine learning with low-rank constraints, when someone has to do many iterations, and if it is possible to speedup the computations on GPU, it also improved the results. The TT-Toolbox (http://github.com/oseledets/ttpy) is implemented in Python with no GPU support. The goal of this project is two-fold: get familiar with modern GPU programming techniques, including pyopencl and Loo.py and based upon those, build a GPU version of basic tensor algorithms
Tensor train is a very efficient representation of multidimensional tensors. There are use-cases, when the representation is so large that is does not fit into the memory (i.e., large TT-ranks) and parallelization is required. The goal of this project is to try to figure out how the algorithms implemented in the TT-Toolbox (http://github.com/oseledets/ttpy) can be parallelized (for example, using mpi4py) and test the resulting prototype. Moreover,the actual parametrization of TT is very sequential, and it would be really interesting to consider asyncronous approaches to TT optimization.
Probably you have some understanding, what a partial differential equation is. However, there are cases, when volume integral equations (i.e., the system has the form $K u = f$, and $K$ is a non-local operator, for example, $$ \int \frac{q(y) dy}{\Vert x - y \Vert} = f. $$ After discretization, we are left with $N^3 \times N^3$ dense matrix. For a uniform grid, the complexity is then greatly reduced by using Fast Fourier Transform (FFT) to $\mathcal{O}(N^3 \log N)$ for one matrix-by-vector product. The goal of this project is to reduce this complexity much further, to $\mathcal{O}(N)$ or even $\mathcal{O}(\log^{\alpha}(N)$ using quantized tensor train representation.
Low-rank matrix factorization of the form $$ A \approx U V^{\top}, $$ where $U$ is $n \times r$ and $V$ is $m \times r$, appears in many applications, for example in recommender systems, where the rows enumerate the users, and the columns - the product. However, the typical norm used is the Frobenius norm, and it actually does not fit very well: we are interested only in the prediction of like/dislike, but not the actual numerical value. Thus, the optimization should be formulated in a non-standard way. Logistic matrix factorization is one of the possible ways to do so. Optimization of non-quadratic functionals can be done using Riemannian optimization in a very efficient way. This is the goal of this project.