This notebook is an interactive version of the companion webpage for the article, Deep Probabilistic Programming (Tran et al., 2017). See Edward's API for details on how to interact with data, models, inference, and criticism.
The code snippets assume the following versions.
pip install edward==1.3.1
pip install tensorflow==1.1.0 # alternatively, tensorflow-gpu==1.1.0
pip install keras==2.0.0
Figure 1. Beta-Bernoulli program.
import tensorflow as tf
from edward.models import Bernoulli, Beta
theta = Beta(1.0, 1.0)
x = Bernoulli(probs=tf.ones(50) * theta)
For an example of it in use, see examples/beta_bernoulli.py in the Github repository.
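The snippet above only specifies the model. As a minimal sketch (not part of the figure), one could fit theta with variational inference, assuming a toy array of 50 binary observations and a Beta variational approximation:

import edward as ed
import numpy as np

x_data = np.random.binomial(1, 0.7, size=50).astype('int32')  # toy observations (assumed)

# Variational approximation q(theta); softplus keeps both shape parameters positive.
qtheta = Beta(tf.nn.softplus(tf.Variable(0.0)),
              tf.nn.softplus(tf.Variable(0.0)))

inference = ed.KLqp({theta: qtheta}, data={x: x_data})
inference.run(n_iter=500)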
Figure 2. Variational auto-encoder for a data set of 28 x 28 pixel images (Kingma & Welling, 2014; Rezende, Mohamed, & Wierstra, 2014).
import tensorflow as tf
from edward.models import Bernoulli, Normal
from keras.layers import Dense
N = 55000 # number of data points
d = 50 # latent dimension
# Probabilistic model
z = Normal(loc=tf.zeros([N, d]), scale=tf.ones([N, d]))
h = Dense(256, activation='relu')(z)
x = Bernoulli(logits=Dense(28 * 28, activation=None)(h))
# Variational model
qx = tf.placeholder(tf.float32, [N, 28 * 28])
qh = Dense(256, activation='relu')(qx)
qz = Normal(loc=Dense(d, activation=None)(qh),
            scale=Dense(d, activation='softplus')(qh))
For an example of it in use, see examples/vae.py in the Github repository.
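The figure defines the model and an amortized variational family but leaves out how they are bound to data. As a rough sketch (assuming a binarized array x_train standing in for, e.g., MNIST), one can feed the same array to both the observed variable x and the inference network's input qx, then run variational inference:

import edward as ed
import numpy as np

x_train = np.random.binomial(1, 0.5, size=[N, 28 * 28]).astype('int32')  # toy binary images (assumed)

inference = ed.KLqp({z: qz}, data={x: x_train, qx: x_train})
inference.run(n_iter=100)

In practice one would subsample batches of images per update; see the data subsampling program below.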
Figure 3. Bayesian recurrent neural network (Radford M Neal, 2012).
The program has an unspecified number of time steps; it uses a symbolic for loop (tf.scan).
import edward as ed
import tensorflow as tf
from edward.models import Normal
H = 50 # number of hidden units
D = 10 # number of features
def rnn_cell(hprev, xt):
  return tf.tanh(ed.dot(hprev, Wh) + ed.dot(xt, Wx) + bh)
Wh = Normal(loc=tf.zeros([H, H]), scale=tf.ones([H, H]))
Wx = Normal(loc=tf.zeros([D, H]), scale=tf.ones([D, H]))
Wy = Normal(loc=tf.zeros([H, 1]), scale=tf.ones([H, 1]))
bh = Normal(loc=tf.zeros(H), scale=tf.ones(H))
by = Normal(loc=tf.zeros(1), scale=tf.ones(1))
x = tf.placeholder(tf.float32, [None, D])
h = tf.scan(rnn_cell, x, initializer=tf.zeros(H))
y = Normal(loc=tf.matmul(h, Wy) + by, scale=1.0)
Figure 5. Hierarchical model (Gelman & Hill, 2006). It is a mixture of Gaussians over $D$-dimensional data $\{x_n\}\in\mathbb{R}^{N\times D}$. There are $K$ latent cluster means $\beta\in\mathbb{R}^{K\times D}$.
import tensorflow as tf
from edward.models import Categorical, Normal
N = 10000 # number of data points
D = 2 # data dimension
K = 5 # number of clusters
beta = Normal(loc=tf.zeros([K, D]), scale=tf.ones([K, D]))
z = Categorical(logits=tf.zeros([N, K]))
x = Normal(loc=tf.gather(beta, z), scale=tf.ones([N, D]))
It is used below in Figure 6 (left/right) and Figure * (variational EM).
Figure 6 (left). Variational inference (Jordan, Ghahramani, Jaakkola, & Saul, 1999). It performs inference on the model defined in Figure 5.
import edward as ed
import numpy as np
import tensorflow as tf
from edward.models import Categorical, Normal
x_train = np.zeros([N, D])
qbeta = Normal(loc=tf.Variable(tf.zeros([K, D])),
               scale=tf.exp(tf.Variable(tf.zeros([K, D]))))
qz = Categorical(logits=tf.Variable(tf.zeros([N, K])))
inference = ed.VariationalInference({beta: qbeta, z: qz}, data={x: x_train})
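ed.VariationalInference here stands in for a particular algorithm. A minimal concrete sketch would instantiate, for example, the KL(q || p) objective:

inference = ed.KLqp({beta: qbeta, z: qz}, data={x: x_train})
inference.run(n_iter=1000)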
Figure 6 (right). Monte Carlo (Robert & Casella, 1999). It performs inference on the model defined in Figure 5.
import edward as ed
import numpy as np
import tensorflow as tf
from edward.models import Empirical
x_train = np.zeros([N, D])
T = 10000 # number of samples
qbeta = Empirical(params=tf.Variable(tf.zeros([T, K, D])))
qz = Empirical(params=tf.Variable(tf.zeros([T, N])))
inference = ed.MonteCarlo({beta: qbeta, z: qz}, data={x: x_train})
Figure 7. Generative adversarial network (Goodfellow et al., 2014).
import edward as ed
import numpy as np
import tensorflow as tf
from edward.models import Normal
from keras.layers import Dense
N = 55000 # number of data points
d = 50 # latent dimension
def generative_network(eps):
  h = Dense(256, activation='relu')(eps)
  return Dense(28 * 28, activation=None)(h)
def discriminative_network(x):
  h = Dense(28 * 28, activation='relu')(x)
  return Dense(1, activation=None)(h)
# DATA
x_train = np.zeros([N, 28 * 28])
# MODEL
eps = Normal(loc=tf.zeros([N, d]), scale=tf.ones([N, d]))
x = generative_network(eps)
# INFERENCE
inference = ed.GANInference(data={x: x_train},
                            discriminator=discriminative_network)
For an example of it in use, see the generative adversarial networks tutorial.
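As a brief sketch of how this might be run (not part of the figure), one can call run() and then draw outputs from the generative network with fresh noise:

inference.run(n_iter=1000)

# Outputs of the generative network for new draws of eps.
sess = ed.get_session()
images = sess.run(x)  # array of shape [N, 28 * 28]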
Figure *. Variational EM (Radford M. Neal & Hinton, 1993). It performs inference on the model defined in Figure 5.
import edward as ed
import numpy as np
import tensorflow as tf
from edward.models import Categorical, PointMass
# DATA
x_train = np.zeros([N, D])
# INFERENCE
qbeta = PointMass(params=tf.Variable(tf.zeros([K, D])))
qz = Categorical(logits=tf.Variable(tf.zeros([N, K])))
inference_e = ed.VariationalInference({z: qz}, data={x: x_train, beta: qbeta})
inference_m = ed.MAP({beta: qbeta}, data={x: x_train, z: qz})
inference_e.initialize()
inference_m.initialize()
tf.global_variables_initializer().run()
for _ in range(10000):
  inference_e.update()
  inference_m.update()
For more details, see the inference compositionality webpage. See examples/factor_analysis.py in the Github repository for a version that performs Monte Carlo EM for logistic factor analysis; it leverages Hamiltonian Monte Carlo for the E-step to perform maximum marginal a posteriori estimation.
Figure *. Data subsampling.
import edward as ed
import tensorflow as tf
from edward.models import Categorical, Normal
N = 10000 # number of data points
M = 128 # batch size during training
D = 2 # data dimension
K = 5 # number of clusters
# DATA
x_batch = tf.placeholder(tf.float32, [M, D])
# MODEL
beta = Normal(loc=tf.zeros([K, D]), scale=tf.ones([K, D]))
z = Categorical(logits=tf.zeros([M, K]))
x = Normal(loc=tf.gather(beta, z), scale=tf.ones([M, D]))
# INFERENCE
qbeta = Normal(loc=tf.Variable(tf.zeros([K, D])),
               scale=tf.nn.softplus(tf.Variable(tf.zeros([K, D]))))
qz = Categorical(logits=tf.Variable(tf.zeros([M, K])))
inference = ed.VariationalInference({beta: qbeta, z: qz}, data={x: x_batch})
inference.initialize(scale={x: float(N) / M, z: float(N) / M})
For more details, see the data subsampling webpage.
Figure 9. Bayesian logistic regression with Hamiltonian Monte Carlo.
import edward as ed
import numpy as np
import tensorflow as tf
from edward.models import Bernoulli, Empirical, Normal
N = 581012 # number of data points
D = 54 # number of features
T = 100 # number of empirical samples
# DATA
x_data = np.zeros([N, D], np.float32)
y_data = np.zeros([N], np.int32)
# MODEL
x = tf.Variable(x_data, trainable=False)
beta = Normal(loc=tf.zeros(D), scale=tf.ones(D))
y = Bernoulli(logits=ed.dot(x, beta))
# INFERENCE
qbeta = Empirical(params=tf.Variable(tf.zeros([T, D])))
inference = ed.HMC({beta: qbeta}, data={y: y_data})
inference.run(step_size=0.5 / N, n_steps=10)
For an example of it in use, see examples/bayesian_logistic_regression.py in the Github repository.
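As a sketch of criticism on this model (not in the figure), one can form the posterior predictive by swapping the prior on beta for its inferred posterior and evaluate a metric on the data:

# Posterior predictive distribution over labels.
y_post = ed.copy(y, {beta: qbeta})
print(ed.evaluate('binary_accuracy', data={y_post: y_data}))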
Figure 10. Bayesian neural network for classification (Denker, Schwartz, Wittner, & Solla, 1987).
import tensorflow as tf
from edward.models import Bernoulli, Normal
N = 1000 # number of data points
D = 528 # number of features
H = 256 # hidden layer size
W_0 = Normal(loc=tf.zeros([D, H]), scale=tf.ones([D, H]))
W_1 = Normal(loc=tf.zeros([H, 1]), scale=tf.ones([H, 1]))
b_0 = Normal(loc=tf.zeros(H), scale=tf.ones(H))
b_1 = Normal(loc=tf.zeros(1), scale=tf.ones(1))
x = tf.placeholder(tf.float32, [N, D])
y = Bernoulli(logits=tf.matmul(tf.nn.tanh(tf.matmul(x, W_0) + b_0), W_1) + b_1)
For an example of it in use, see examples/getting_started_example.py in the Github repository.
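The figure stops at the model. A minimal sketch of inference, assuming toy arrays x_train and y_train and a fully factorized normal approximation over the weights and biases:

import edward as ed
import numpy as np

x_train = np.random.randn(N, D).astype('float32')  # toy features (assumed)
y_train = np.random.binomial(1, 0.5, size=[N, 1]).astype('int32')  # toy labels (assumed)

# Fully factorized normal approximations; softplus keeps scales positive.
qW_0 = Normal(loc=tf.Variable(tf.zeros([D, H])),
              scale=tf.nn.softplus(tf.Variable(tf.zeros([D, H]))))
qW_1 = Normal(loc=tf.Variable(tf.zeros([H, 1])),
              scale=tf.nn.softplus(tf.Variable(tf.zeros([H, 1]))))
qb_0 = Normal(loc=tf.Variable(tf.zeros(H)),
              scale=tf.nn.softplus(tf.Variable(tf.zeros(H))))
qb_1 = Normal(loc=tf.Variable(tf.zeros(1)),
              scale=tf.nn.softplus(tf.Variable(tf.zeros(1))))

inference = ed.KLqp({W_0: qW_0, W_1: qW_1, b_0: qb_0, b_1: qb_1},
                    data={x: x_train, y: y_train})
inference.run(n_iter=1000)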
Figure 11. Latent Dirichlet allocation (D. M. Blei, Ng, & Jordan, 2003).
import tensorflow as tf
from edward.models import Categorical, Dirichlet
D = 4 # number of documents
N = [11502, 213, 1523, 1351] # words per doc
K = 10 # number of topics
V = 100000 # vocabulary size
theta = Dirichlet(tf.zeros([D, K]) + 0.1)
phi = Dirichlet(tf.zeros([K, V]) + 0.05)
z = [[0] * N[d] for d in range(D)]
w = [[0] * N[d] for d in range(D)]
for d in range(D):
  for n in range(N[d]):
    z[d][n] = Categorical(probs=theta[d, :])
    w[d][n] = Categorical(probs=phi[z[d][n], :])
Figure 12. Gaussian matrix factorization (Salakhutdinov & Mnih, 2011).
import tensorflow as tf
from edward.models import Normal
N = 10
M = 10
K = 5 # latent dimension
U = Normal(loc=tf.zeros([M, K]), scale=tf.ones([M, K]))
V = Normal(loc=tf.zeros([N, K]), scale=tf.ones([N, K]))
Y = Normal(loc=tf.matmul(U, V, transpose_b=True), scale=tf.ones([N, M]))
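As a sketch, point estimates of U and V can be obtained with MAP inference; passing a list of latent variables lets Edward construct point-mass approximations automatically (Y_train below is an assumed array of observations):

import edward as ed
import numpy as np

Y_train = np.random.randn(N, M).astype('float32')  # assumed observed matrix

inference = ed.MAP([U, V], data={Y: Y_train})
inference.run(n_iter=500)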
Figure 13. Dirichlet process mixture model (Antoniak, 1974).
import tensorflow as tf
from edward.models import DirichletProcess, Normal
N = 1000 # number of data points
D = 5 # data dimensionality
dp = DirichletProcess(alpha=1.0, base=Normal(loc=tf.zeros(D), scale=tf.ones(D)))
mu = dp.sample(N)
x = Normal(loc=mu, scale=tf.ones([N, D]))
To see the essential component defining the DirichletProcess, see examples/pp_dirichlet_process.py in the Github repository. Its source implementation can be found at edward/models/dirichlet_process.py.
Figure *. Stochastic variational inference (M. D. Hoffman, Blei, Wang, & Paisley, 2013). For more details, see the data subsampling webpage.
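No code figure accompanies this entry here; as a sketch, combining the data subsampling program above with a concrete variational algorithm gives stochastic variational inference. next_batch below is a hypothetical helper returning an [M, D] array of data:

inference = ed.KLqp({beta: qbeta, z: qz}, data={x: x_batch})
inference.initialize(scale={x: float(N) / M, z: float(N) / M})

sess = ed.get_session()
sess.run(tf.global_variables_initializer())
for _ in range(inference.n_iter):
  x_b = next_batch(M)  # hypothetical minibatch loader (assumed)
  inference.update(feed_dict={x_batch: x_b})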
Figure 15. Variational auto-encoder (Kingma & Welling, 2014; Rezende et al., 2014). See the script examples/vae.py in the Github repository.
Figure 16. Exponential family embedding (Rudolph, Ruiz, Mandt, & Blei, 2016). A Github repository with comprehensive features is available at mariru/exponential_family_embeddings.