#!/usr/bin/env python
# coding: utf-8

# # Empirical Approximation overview
#
# For most models we use MCMC sampling algorithms like Metropolis or NUTS. In PyMC3 we are used to storing traces of MCMC samples and then doing analysis with them. There is a similar concept for the variational inference submodule in PyMC3: *Empirical*. This type of approximation stores particles for the SVGD sampler, and there is no difference between independent SVGD particles and MCMC samples. *Empirical* acts as a bridge between MCMC sampling output and full-fledged VI utils like `apply_replacements` or `sample_node`. For the interface description, see [variational_api_quickstart](variational_api_quickstart.ipynb). Here we will just focus on `Empirical` and give an overview of things specific to the *Empirical* approximation.

# In[1]:


import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pymc3 as pm
import theano

print('Running on PyMC3 v{}'.format(pm.__version__))


# In[2]:


get_ipython().run_line_magic('config', "InlineBackend.figure_format = 'retina'")
az.style.use('arviz-darkgrid')
np.random.seed(42)
pm.set_tt_rng(42)


# ## Multimodal density
# Let's recall the problem from [variational_api_quickstart](variational_api_quickstart.ipynb) where we first got a NUTS trace

# In[3]:


w = pm.floatX([.2, .8])
mu = pm.floatX([-.3, .5])
sd = pm.floatX([.1, .1])

with pm.Model() as model:
    x = pm.NormalMixture('x', w=w, mu=mu, sigma=sd, dtype=theano.config.floatX)
    trace = pm.sample(50000)


# In[4]:


pm.traceplot(trace);


# Great. Now that we have a trace, we can create an `Empirical` approximation

# In[5]:


print(pm.Empirical.__doc__)


# In[6]:


with model:
    approx = pm.Empirical(trace)


# In[7]:


approx


# This type of approximation has its own underlying storage for samples, which is a `theano.shared` tensor itself

# In[8]:


approx.histogram


# In[9]:


approx.histogram.get_value()[:10]


# In[10]:


approx.histogram.get_value().shape


# It has exactly the same number of samples as the trace we started with, in this case 50k. Note that if your multitrace has **more than one chain**, **more samples** will be stored at once: all chains are flattened when creating the `Empirical` approximation.
#
# This *histogram* describes *how* the samples are stored. The structure is pretty simple: `(n_samples, n_dim)`. The ordering of the variables is stored internally in the class and in most cases is not needed by the end user

# In[11]:


approx.ordering


# Sampling from the posterior is done uniformly with replacement. Call `approx.sample(1000)` and you'll get a trace again, but the ordering of the samples is not preserved; there is no way to reconstruct the original underlying trace with `approx.sample`.

# In[12]:


new_trace = approx.sample(50000)


# In[13]:


get_ipython().run_line_magic('timeit', 'new_trace = approx.sample(50000)')


# Once the sampling function is compiled, sampling becomes really fast

# In[14]:


pm.traceplot(new_trace);


# You can see that the ordering is lost, but the reconstructed density is the same.
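# As an illustrative sketch (not PyMC3's internal implementation), the uniform resampling
# with replacement that `approx.sample` performs can be mimicked directly on the stored
# histogram with plain NumPy:

# In[ ]:


hist = approx.histogram.get_value()                    # stored particles, shape (n_samples, n_dim)
idx = np.random.randint(0, hist.shape[0], size=1000)   # uniform row indices, drawn with replacement
resampled = hist[idx]                                  # rows come back in random order, duplicates possible
print(resampled.shape)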
# ## 2d density

# In[15]:


mu = pm.floatX([0., 0.])
cov = pm.floatX([[1, .5], [.5, 1.]])
with pm.Model() as model:
    pm.MvNormal('x', mu=mu, cov=cov, shape=2)
    trace = pm.sample(1000)


# In[16]:


with model:
    approx = pm.Empirical(trace)


# In[17]:


pm.traceplot(approx.sample(10000));


# In[18]:


import seaborn as sns


# In[19]:


sns.kdeplot(approx.sample(1000)['x'])


# Previously we had the `trace_cov` function

# In[20]:


with model:
    print(pm.trace_cov(trace))


# Now we can estimate the same covariance using `Empirical`

# In[21]:


print(approx.cov)


# That's a tensor itself

# In[22]:


print(approx.cov.eval())


# The estimates are very close and differ only due to precision error. We can get the mean in the same way

# In[23]:


print(approx.mean.eval())


# In[24]:


get_ipython().run_line_magic('load_ext', 'watermark')
get_ipython().run_line_magic('watermark', '-n -u -v -iv -w')
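# As a final cross-check (a sketch using plain NumPy, not PyMC3 internals), `approx.mean`
# and `approx.cov` should agree with the sample mean and covariance computed directly from
# the stored particles, up to the 1/N vs 1/(N-1) normalization of the covariance:

# In[ ]:


particles = approx.histogram.get_value()      # stored particles, shape (n_samples, n_dim)
print(particles.mean(axis=0))                 # compare with approx.mean.eval()
print(np.cov(particles, rowvar=False))        # compare with approx.cov.eval()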