We begin with a motivating example that requires "intelligent" goal-directed decision making: assume that you are an owl and that you're hungry. What are you going to do?
Have a look at Prof. Karl Friston's answer in this video segment on the cost function for intelligent behavior. (Do watch the video!)
Friston argues that intelligent decision making (behavior, action making) by an agent requires minimization of a functional of beliefs.
Friston further argues (later in the lecture and his papers) that this functional is a (variational) free energy (to be defined below), thus linking decision-making and acting to Bayesian inference.
In fact, Friston's Free Energy Principle (FEP) claims that all biological self-organizing processes (including brain processes) can be described as Free Energy minimization in a probabilistic model.
Taking inspiration from FEP, if we want to develop synthetic "intelligent" agents, we have (only) two issues to consider:
Agents that follow the FEP are said to be involved in Active Inference (AIF). An AIF agent updates its states and parameters (and ultimately its model structure) solely by FE minimization, and selects its actions through (expected) FE minimization (to be explained below).
The agent is embedded in an environment with "external states" $\tilde{s}_t$. The dynamics of the environment are driven by actions.
Actions $a_t$ are selected by the agent. Actions affect the environment and consequently affect future observations.
In pseudo-code, an AIF agent executes the following algorithm:
ACTIVE INFERENCE (AIF) AGENT ALGORITHM
SPECIFY generative model $p(x,s,u)$
ASSUME/SPECIFY environmental process $R$
FORALL t DO
- $(x_t, \tilde{s}_t) = R(a_t, \tilde{s}_{t-1})$ % environment generates new observation
- $q(s_t) = \arg\min_q F[q]$ % update internal states (process observation)
- $q(u_{t+1}) = \arg\min_q H[q]$ % update control states (plan next action)
- $\hat{u}_{t+1} \sim q(u_{t+1})$; $a_{t+1} = \hat{u}_{t+1}$ % sample next action and push to environment
END
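To make the control flow concrete, here is a minimal executable sketch of this loop in plain Python (not ForneyLab/RxInfer code). The names `environment`, `minimize_F`, and `minimize_H` are hypothetical stand-ins for the environmental process $R$ and the two FE minimization steps:

```python
import random

def run_aif_agent(environment, minimize_F, minimize_H, T=10):
    """Sketch of the AIF agent loop: observe, update state beliefs by
    FE minimization, update control beliefs, then sample and emit an action."""
    s_env = 0.0          # hidden environmental state (simulated world only)
    a = 0.0              # initial action
    actions, observations = [], []
    for t in range(T):
        x, s_env = environment(a, s_env)   # environment generates new observation
        q_s = minimize_F(x)                # q(s_t) = argmin_q F[q]  (perception)
        q_u = minimize_H(q_s)              # q(u_{t+1}) = argmin_q H[q]  (planning)
        a = random.choices(q_u["support"], weights=q_u["probs"])[0]  # a ~ q(u)
        actions.append(a)
        observations.append(x)
    return actions, observations

# Toy stand-ins, just to make the loop executable:
def environment(a, s_env):           # random-walk world pushed by actions
    s_new = s_env + a
    return s_new + random.gauss(0.0, 0.1), s_new

def minimize_F(x):                   # pretend posterior: belief equals observation
    return {"mean": x}

def minimize_H(q_s):                 # pretend policy posterior over two controls
    return {"support": [-1.0, 1.0], "probs": [0.5, 0.5]}

actions, observations = run_aif_agent(environment, minimize_F, minimize_H, T=5)
```

The two `minimize_*` stubs are placeholders; in the Julia demo later in this lesson, both are realized by message passing on a factor graph.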
What should the agent's model $p(x,s,u)$ be modeling? This question was (already) answered by Conant and Ashby (1970) as the good regulator theorem: every good regulator of a system must be a model of that system. See the OPTIONAL SLIDE for more information.
Conant and Ashby state: "The theorem has the interesting corollary that the living brain, so far as it is to be successful and efficient as a regulator for survival, must proceed, in learning, by the formation of a model (or models) of its environment."
Indeed, perception in brains is clearly affected by predictions about sensory inputs by the brain's own generative model.
For example, in the image below, on the left you will likely see a bowl of vegetables, while the same picture upside down elicits in most people the perception of a gardener's face rather than an upside-down vegetable bowl.
Here, $a_t$ are actions (generated by the agent), $x_t$ are outcomes (the agent's observations), and $\tilde{s}_t$ holds the latent environmental states.
Note that $R$ only needs to be specified for simulated environments. If we were to deploy the agent in a real-world environment, we would not need to specify $R$.
The agent's knowledge about environmental process $R$ is expressed by its generative model $p(x_t,s_t,u_t|s_{t-1})$.
Note that we distinguish between control states and actions. Control states $u_t$ are latent variables in the agent's generative model. An action $a_t$ is a realization of a control state as observed by the environment.
Observations $x_t=\hat{x}_t$ are generated by the environment and observed by the agent. Vice versa, actions $a_t = \hat{u}_t$ are generated by the agent and observed by the environment.
After the agent makes a new observation $x_t=\hat{x}_t$, it will update beliefs over its latent variables. First the internal state variables $s$.
Assume the following at time step $t$:
The state updating task is to infer $q(s_{t}|\hat{x}_{1:t})$, based on the previous estimate $q(s_{t-1}|\hat{x}_{1:t-1})$, the new data $\{a_t=\hat{u}_t,x_t=\hat{x}_{t}\}$, and the agent's generative model.
Technically, this is a Bayesian filtering task. In a real brain, this process is called perception.
We specify the following FE functional
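The functional itself did not survive in this copy of the notes. A plausible reconstruction (an assumption on our part, consistent with the generative model $p(x_t,s_t,u_t|s_{t-1})$ and the filtering recursion described above) is

$$F[q] = \sum_{s_{t-1},s_t} q(s_{t-1}|\hat{x}_{1:t-1})\, q(s_t)\, \log \frac{q(s_t)}{p(\hat{x}_t, s_t, \hat{u}_t \mid s_{t-1})}\,,$$

where the posterior from the previous time step, $q(s_{t-1}|\hat{x}_{1:t-1})$, serves as the prior for the current step.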
In case the generative model is a Linear Gaussian Dynamical System, minimization of the FE can be solved analytically in closed-form and leads to the standard Kalman filter.
In case these (linear Gaussian) conditions are not met, we can still minimize the FE by other means and arrive at some approximation of the Kalman filter, see for example Baltieri and Isomura (2021) for a Laplace approximation to variational Kalman filtering.
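To illustrate the linear Gaussian case, here is a minimal scalar Kalman filter step (a generic textbook sketch in Python, not RxInfer code), assuming the model $s_t = s_{t-1} + u_t + w_t$, $x_t = s_t + v_t$ with known noise variances:

```python
def kalman_step(m_prev, v_prev, u, x, q_var, r_var):
    """One scalar Kalman filter step for s_t = s_{t-1} + u_t + N(0, q_var),
    x_t = s_t + N(0, r_var). Returns posterior mean and variance of s_t."""
    # Predict: roll the state belief forward through the transition model
    m_pred = m_prev + u
    v_pred = v_prev + q_var
    # Correct: fuse the prediction with the new observation
    k = v_pred / (v_pred + r_var)          # Kalman gain
    m_post = m_pred + k * (x - m_pred)
    v_post = (1.0 - k) * v_pred
    return m_post, v_post

# One step: prior N(0, 1), action u = 1, observation x = 1.2
m, v = kalman_step(0.0, 1.0, 1.0, 1.2, q_var=0.1, r_var=0.5)
```

The posterior mean lands between the model prediction (1.0) and the observation (1.2), and the posterior variance shrinks relative to the prior; exactly this update is what closed-form FE minimization yields in the linear Gaussian case.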
Our toolbox RxInfer specializes in automated execution of this minimization task.
Once the agent has updated its internal states, it will turn to inferring the next action.
In order to select a good next action, we need to investigate and compare consequences of a sequence of future actions.
A sequence of future actions $a= (a_{t+1}, a_{t+2}, \ldots, a_{t+T})$ is called a policy. Since relevant consequences are usually the result of a future action sequence rather than a single action, we will be interested in updating beliefs over policies.
In order to assess the consequences of a selected policy, we will, as a function of that policy, run the generative model forward-in-time to make predictions about future observations $x_{t+1:T}$.
Note that perception (state updating) precedes policy updating. In order to accurately predict the future, the agent first needs to understand the current state of the world.
Consider an AIF agent at time step $t$ with (future) observations $x = (x_{t+1}, x_{t+2}, \ldots, x_{t+T})$, latent future internal states $s= (s_t, s_{t+1}, \ldots, s_{t+T})$, and latent future control variables $u= (u_{t+1}, u_{t+2}, \ldots, u_{t+T})$.
From the agent's viewpoint, the evolution of these future variables are constrained by its generative model, rolled out into the future:
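The rolled-out model and the corresponding functional were dropped from this copy; a plausible sketch (our reconstruction, built from the one-step factors of the generative model) is

$$p(x,s,u) = q(s_t) \prod_{k=t+1}^{t+T} p(x_k|s_k)\, p(s_k|s_{k-1},u_k)\, p(u_k)\,,$$

with the FE functional over this rolled-out model given by

$$H[q] = \sum_{x,s,u} q(x,s,u) \log \frac{q(x,s,u)}{p(x,s,u)}\,.$$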
In principle, this is a regular FE functional, with one difference from previous versions: since future observations $x$ have not yet occurred, $H[q]$ marginalizes not only over latent states $s$ and policies $u$, but also over future observations $x$.
We will update the beliefs over policies by minimizing the Free Energy functional $H[q]$. In the optional slides below, we prove that the solution to this optimization task is given by (see AIF Algorithm, step 3, above)
$$q^*(u) = \arg\min_q H[q] \propto p(u)\exp\big(-G(u)\big)\,,$$
where the factor $p(u)$ is a prior over admissible policies, and the factor $\exp(-G(u))$ updates the prior with information about the future consequences of a selected policy $u$.
The factor
$$G(u) \triangleq \sum_{x,s} q(x,s|u) \log \frac{q(s|u)}{p(x)\, p(s|x)}$$
is called the Expected Free Energy (EFE) for policy $u$.
The FEP takes the following stance: if FE minimization is all that an agent does, then the only consistent and appropriate behavior for an agent is to select actions that minimize the expected Free Energy in the future (where expectation is taken over current beliefs about future observations).
Note that, since $q^*(u) \propto p(u)\exp(-G(u))$, the probability $q^*(u)$ for selecting a policy $u$ increases when EFE $G(u)$ gets smaller.
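A tiny numerical illustration of $q^*(u) \propto p(u)\exp(-G(u))$ (generic Python; the EFE values here are hypothetical numbers, chosen only to show the effect):

```python
import math

def policy_posterior(prior, G):
    """Compute q*(u) ∝ p(u) exp(-G(u)) over a discrete set of policies."""
    w = [p * math.exp(-g) for p, g in zip(prior, G)]
    z = sum(w)                      # normalization constant
    return [wi / z for wi in w]

# Three candidate policies with a uniform prior and made-up EFE values
prior = [1/3, 1/3, 1/3]
G = [2.0, 1.0, 3.0]                 # policy 1 has the lowest EFE
q = policy_posterior(prior, G)
```

As expected, the policy with the lowest EFE receives the highest posterior probability.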
Once the policy (control) variables have been updated, in simulated environments it is common to assume that the next action $a_{t+1}$ (an action is the control variable as observed by the environment) gets selected in proportion to the probability of the related control state (see AIF Agent Algorithm, step 4, above), i.e., $\hat{u}_{t+1} \sim q^*(u_{t+1})$, followed by $a_{t+1} = \hat{u}_{t+1}$.
Apparently, minimization of EFE leads to selection of policies that balance the following two imperatives:
minimization of the first term of $G(u)$, i.e. minimizing $\sum_x q(x|u) \log \frac{1}{p(x)}$, leads to policies $u$ that align the predicted future observations $q(x|u)$ under policy $u$ with the prior $p(x)$ on future observations. We are free to choose any prior $p(x)$, and usually we choose a prior that assigns high probability to desired (goal) observations. Hence, policies with low EFE lead to goal-seeking behavior (a.k.a. pragmatic behavior or exploitation). In the OPTIONAL SLIDES, we derive an alternative (perhaps clearer) expression to support this interpretation.
minimization of $G(u)$ maximizes the second term
$$\sum_{x,s} q(x,s|u) \log \frac{q(s|x)}{q(s|u)}\,,$$
which is the (conditional) mutual information between (posteriors on) future observations and states, for a given policy $u$. Thus, maximizing this term leads to actions that maximize the statistical dependency between future observations and states. In other words, a policy with low EFE also leads to information-seeking behavior (a.k.a. epistemic behavior or exploration).
(The third term $\sum_{x,s} q(x,s|u) \log \frac{q(s|x)}{p(s|x)}$ is an (expected) KL divergence between posterior and prior on the states. This can be interpreted as a complexity/regularization term and $G(u)$ minimization will drive this term to zero.)
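The three-term decomposition above can be checked numerically on a toy model (binary state and observation; the EFE is taken here as $G(u)=\sum_{x,s} q(x,s|u)\log\frac{q(s|u)}{p(x)\,p(s|x)}$, consistent with the three terms just listed; all numbers below are made up):

```python
import math

# Toy discrete model: binary state s and observation x, for one fixed policy u.
q_s = [0.7, 0.3]                         # q(s|u): predicted state belief
lik = [[0.9, 0.1], [0.2, 0.8]]           # q(x|s): observation model, rows are s
p_x = [0.5, 0.5]                         # p(x): goal prior over observations
p_s_given_x = [[0.6, 0.4], [0.4, 0.6]]   # p(s|x): state prior, rows are x

q_xs = [[lik[s][x] * q_s[s] for s in range(2)] for x in range(2)]  # q(x,s|u)
q_x = [sum(q_xs[x]) for x in range(2)]                             # q(x|u)
q_s_given_x = [[q_xs[x][s] / q_x[x] for s in range(2)] for x in range(2)]

# EFE and its three components
G = sum(q_xs[x][s] * math.log(q_s[s] / (p_x[x] * p_s_given_x[x][s]))
        for x in range(2) for s in range(2))
goal = sum(q_x[x] * math.log(1.0 / p_x[x]) for x in range(2))       # pragmatic
mutual_info = sum(q_xs[x][s] * math.log(q_s_given_x[x][s] / q_s[s])
                  for x in range(2) for s in range(2))              # epistemic
kl = sum(q_xs[x][s] * math.log(q_s_given_x[x][s] / p_s_given_x[x][s])
         for x in range(2) for s in range(2))                       # complexity
```

Numerically, `G == goal - mutual_info + kl`, so minimizing $G(u)$ indeed pushes the goal-seeking term down and the mutual information up.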
Seeking actions that balance goal-seeking behavior (exploitation) and information-seeking behavior (exploration) is a fundamental problem in the Reinforcement Learning literature.
Active Inference solves the exploration-exploitation dilemma. Both objectives are served by EFE minimization without need for any tuning parameters.
We highlight another great feature of FE minimizing agents. Consider an AIF agent ($m$) with generative model $p(x,s,u|m)$.
Consider the Divergence-Evidence decomposition of the FE again:
$$F[q] = \underbrace{-\log p(x|m)}_{\text{(negative log-) evidence}} + \underbrace{\sum_{s,u} q(s,u) \log \frac{q(s,u)}{p(s,u|x,m)}}_{\text{divergence (inference cost)}}$$
The first term, $-\log p(x|m)$, is the (negative log-) evidence for model $m$, given recorded data $x$.
Minimization of FE maximizes the evidence for the given model. The model captures the problem representation. A model with high evidence predicts the data well and therefore "understands the world".
The second term scores the cost of inference. In almost all cases, the solution to a problem can be phrased as an inference task on the generative model. Hence, the second term scores the accuracy of the inferred solution, for the given model.
FE minimization optimizes a balanced trade-off between a good-enough problem representation and a good-enough solution proposal for that model. Since FE comprises both a cost for solution and problem representation, it is a neutral criterion that applies across a very wide set of problems.
A good solution to the wrong problem is not good enough. A poor solution to a great problem statement is not sufficient either. In order to solve a problem well, we need both to represent the problem correctly (high model evidence) and we need to solve it well (low inference costs).
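This Divergence-Evidence split is easy to verify numerically on a toy model (generic Python; a binary latent state, one fixed observed $x$, and an arbitrary variational posterior):

```python
import math

# Joint model p(x, s) evaluated at the one observed x, over latent s in {0, 1}
p_joint = [0.3, 0.1]                        # p(x, s=0), p(x, s=1)
evidence = sum(p_joint)                     # p(x|m)
p_post = [p / evidence for p in p_joint]    # exact posterior p(s|x,m)

q = [0.6, 0.4]                              # an arbitrary variational posterior q(s)

# Free energy and its divergence term
F = sum(q[s] * math.log(q[s] / p_joint[s]) for s in range(2))
kl = sum(q[s] * math.log(q[s] / p_post[s]) for s in range(2))
```

Numerically, `F == -log(evidence) + kl`, so $F[q]$ upper-bounds the negative log-evidence, and the gap is exactly the inference cost.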
The above derivations are not trivial, but we have just shown that FE minimizing agents accomplish variational Bayesian perception (a la Kalman filtering), and a balanced exploration-exploitation trade-off for policy selection.
Moreover, the FE by itself serves as a proper objective across a very wide range of problems, since it scores both the cost of the problem statement and the cost of inferring the solution.
The current FEP theory claims that minimization of FE (and EFE) is all that brains do, i.e., FE minimization leads to perception, policy selection, learning, structure adaptation, attention, learning of problems and solutions, etc.
So we have here an AI framework (minimization of FE and associated EFE) that accomplishes perception, action selection, and a balanced exploration-exploitation trade-off through a single mechanism: Free Energy minimization over a generative model.
Clearly, the FEP, and synthetic AIF agents as a realization of FEP, comprise a very attractive framework for all things relating to AI and AI agents.
A current big AI challenge is to design synthetic AIF agents based solely on FE/EFE minimization.
Executing a synthetic AIF agent often poses a large computational problem: the generative model is rolled out over a future horizon, so it contains very many latent variables; the FE functional changes with every new observation; and actions must be selected under real-time constraints. So, in practice, executing a synthetic AIF agent may lead to the task of minimizing a time-varying FE function of thousands of variables in real time!
How to specify and execute a synthetic AIF agent is an active area of research.
There is no definitive solution approach to AIF agent modeling yet; we (BIASlab) think that (reactive) message passing in a factor graph representation provides a promising path.
After selecting an action $a_t$ and making an observation $x_t$, the rolled-out generative model is represented by the following Forney-style factor graph (FFG):
The open red nodes for $p(x_{t+k})$ specify desired future observations, whereas the open black boxes for $p(s_k|s_{k-1},u_k)$ and $p(x_k|s_k)$ reflect the agent's beliefs about how the world actually evolves (i.e., the veridical model).
The (brown) dashed box is the agent's Markov blanket. Given the states on the Markov blanket, the internal states of the agent are independent of the state of the world.
Here we solve the cart parking problem as stated at the beginning of this lesson. We first specify a generative model for the agent's environment (whose output is the observed, noisy position of the cart), and then constrain future observations by a prior distribution that is located on the target parking spot. Next, we schedule a message-passing-based inference algorithm for the next action. This is followed by executing the act-execute-observe --> infer --> slide procedure to infer a sequence of consecutive actions. Finally, the position of the cart over time is plotted. Note that the cart converges to the target spot.
using Pkg; Pkg.activate("probprog/workspace"); Pkg.instantiate()
IJulia.clear_output();
using PyPlot, LinearAlgebra, ForneyLab
# Load helper functions. Feel free to explore these
include("ai_agent/environment_1d.jl")
include("ai_agent/helpers_1d.jl")
include("ai_agent/agent_1d.jl")
# Internal model parameters
gamma = 100.0 # Transition precision
phi = 10.0 # Observation precision
upsilon = 1.0 # Control prior variance
sigma = 1.0 # Goal prior variance
T = 10 # Lookahead
# Build internal model
fg = FactorGraph()
o = Vector{Variable}(undef, T) # Observed states
s = Vector{Variable}(undef, T) # Internal states
u = Vector{Variable}(undef, T) # Control states
@RV s_t_min ~ GaussianMeanVariance(placeholder(:m_s_t_min),
placeholder(:v_s_t_min)) # Prior state
u_t = placeholder(:u_t)
@RV u[1] ~ GaussianMeanVariance(u_t, tiny)
@RV s[1] ~ GaussianMeanPrecision(s_t_min + u[1], gamma)
@RV o[1] ~ GaussianMeanPrecision(s[1], phi)
placeholder(o[1], :o_t)
s_k_min = s[1]
for k=2:T
@RV u[k] ~ GaussianMeanVariance(0.0, upsilon) # Control prior
@RV s[k] ~ GaussianMeanPrecision(s_k_min + u[k], gamma) # State transition model
@RV o[k] ~ GaussianMeanPrecision(s[k], phi) # Observation model
GaussianMeanVariance(o[k],
placeholder(:m_o, var_id=:m_o_*k, index=k-1),
placeholder(:v_o, var_id=:v_o_*k, index=k-1)) # Goal prior
s_k_min = s[k]
end
# Schedule message passing algorithm
algo = messagePassingAlgorithm(u[2]) # Infer the next control state
source_code = algorithmSourceCode(algo)
eval(Meta.parse(source_code)) # Loads the step!() function for inference
s_0 = 2.0 # Initial State
N = 20 # Total simulation time
(execute, observe) = initializeWorld() # Let there be a world
(infer, act, slide) = initializeAgent() # Let there be an agent
# Step through action-perception loop
u_hat = Vector{Float64}(undef, N) # Actions
o_hat = Vector{Float64}(undef, N) # Observations
for t=1:N
u_hat[t] = act() # Evoke an action from the agent
execute(u_hat[t]) # The action influences hidden external states
o_hat[t] = observe() # Observe the current environmental outcome (update p)
infer(u_hat[t], o_hat[t]) # Infer beliefs from current model state (update q)
slide() # Prepare for next iteration
end
# Plot active inference results
plotTrajectory(u_hat, o_hat)
;
If interested, here is a link to a more detailed version of the 1D parking problem.
We also have a 2D version of this cart parking problem implemented on a Raspberry Pi-based robot. (Credits for this implementation to Thijs van de Laar and Burak Ergul).
Just to be sure: you don't need to memorize all FE/EFE decompositions, nor are you expected to derive them on the spot. We present these decompositions only to provide insight into the multitude of forces that underlie FE-minimization-based action selection.
In a sense, the FEP is an umbrella for describing the mechanics and self-organization of intelligent behavior, in man and machines. Lots of sub-fields in AI, such as reinforcement learning, can be interpreted as a special case of active inference under the FEP, see e.g., Friston et al., 2009.
Is EFE minimization really different from "regular" FE minimization? Not really, it appears that EFE minimization can be reformulated as a special case of FE minimization. In other words, FE minimization is still the only game in town.
Active inference also completes the "scientific loop" picture. Under the FEP, experimental/trial design is driven by EFE minimization. Bayesian probability theory (and FEP) contains all the equations for running scientific inquiry.
In the end, all the state inference, parameter estimation, etc., in this lecture series could have been implemented by FE minimization in an appropriately specified generative probabilistic model. However, the Free Energy Principle extends beyond state and parameter estimation. Driven by FE minimization, brains change their structure as well over time. In fact, the FEP extends beyond brains to a general theory for biological self-organization, e.g., Darwin's natural selection process may be interpreted as a FE minimization-driven model optimization process, and here's an article on FEP for predictive processing in plants. Moreover, Constrained-FE minimization (rephrased as the Principle of Maximum Relative Entropy) provides an elegant framework to derive most (if not all) physical laws, as Caticha exposes in his brilliant monograph on Entropic Physics. Indeed, the framework of FE minimization is known in the physics community as the very fundamental Principle of Least Action that governs the equations-of-motion in nature.
So, the FEP is very fundamental and extends way beyond applications to machine learning. At our research lab at TU/e, we work on developing FEP-based intelligent agents that go out into the world and autonomously learn to accomplish a pre-determined task, such as learning-to-walk or learning-to-process-noisy-speech-signals. Feel free to approach us if you want to know more about that effort.
In the derivations above, we decomposed the EFE into an upper bound on the sum of a goal-seeking and an information-seeking term. Here, we derive an alternative (exact) decomposition that more clearly reveals the goal-seeking objective.
We consider again the EFE and factorize the generative model $p(x,s|u) = p^\prime(x) p(s|x,u)$ as a product of a target prior $p^\prime(x)$ on observations and a veridical state model $p(s|x,u)$.
Through the target prior $p^\prime(x)$, the agent declares which observations it wants to observe in the future. (The prime is just to distinguish the semantics of a desired future from the model for the actual future).
Through the veridical state model $p(s|x,u)$ , the agent implicitly declares its beliefs about how the world will actually generate observations.
Since, by Bayes rule, $p(s|x,u) \propto p(x|s)\, p(s|u)$, it follows that in practice the agent may specify $p(s|x,u)$ implicitly by explicitly specifying a state transition model $p(s|u)$ and an observation model $p(x|s)$.
Hence, an AIF agent holds both a model for its beliefs about how the world will actually evolve AND a model for its beliefs about how it desires the world to evolve!!
To highlight the role of these two models in the EFE, consider the following alternative EFE decomposition:
$$G(u) = \underbrace{\sum_s q(s|u)\, H\big[p(x|s)\big]}_{\text{ambiguity}} + \underbrace{\sum_x p(x|u) \log \frac{p(x|u)}{p^\prime(x)}}_{\text{risk}}$$
where $H[p(x|s)] = -\sum_x p(x|s)\log p(x|s)$ denotes the entropy of the observation model.
In this derivation, we have assumed that we can use the generative model to make inferences in the "forward" direction. Hence, $q(s|u)=p(s|u)$ and $q(x|s)=p(x|s)$.
The terms "ambiguity" and "risk" have their origin in utility theory and behavioral economics. Minimization of EFE leads to minimizing both ambiguity and risk.
Ambiguous (future) states are states that map to large uncertainties about (future) observations. We want to avoid ambiguous states, since ambiguity implies that the model is not capable of predicting how the world evolves. Ambiguity can be resolved by selecting information-seeking (epistemic) actions.
Minimization of the second term (risk) leads to choosing actions ($u$) that align predicted future observations (represented by $p(x|u)$) with desired future observations (represented by $p^\prime(x)$). Agents minimize risk by selecting pragmatic (goal-seeking) actions.
$\Rightarrow$ Actions fulfill desired expectations about the future!
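The ambiguity-plus-risk decomposition can be verified numerically on a toy model (generic Python; the forward assumptions $q(s|u)=p(s|u)$ and $q(x|s)=p(x|s)$ from above are built in, and all numbers are made up):

```python
import math

p_s = [0.6, 0.4]                         # p(s|u): predicted states under policy u
p_x_given_s = [[0.8, 0.2], [0.3, 0.7]]   # p(x|s): observation model, rows are s
p_goal = [0.1, 0.9]                      # p'(x): target prior (we desire x = 1)

# Predicted observation distribution p(x|u)
p_x = [sum(p_x_given_s[s][x] * p_s[s] for s in range(2)) for x in range(2)]

# Ambiguity: expected entropy of the observation model under predicted states
ambiguity = sum(p_s[s] * -sum(p_x_given_s[s][x] * math.log(p_x_given_s[s][x])
                              for x in range(2))
                for s in range(2))
# Risk: KL divergence between predicted and desired observations
risk = sum(p_x[x] * math.log(p_x[x] / p_goal[x]) for x in range(2))

# Direct EFE, with p(s|x,u) obtained from Bayes rule, for comparison
G = sum(p_x_given_s[s][x] * p_s[s] *
        math.log(p_s[s] / (p_goal[x] * p_x_given_s[s][x] * p_s[s] / p_x[x]))
        for x in range(2) for s in range(2))
```

Numerically, `G == ambiguity + risk`, and since risk is a KL divergence it is never negative: the only way to drive it to zero is to make predicted observations match desired ones.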
According to Friston, an "intelligent" agent like a brain minimizes a variational free energy functional, which, in general, is a functional of a probability distribution $p$ and a variational posterior $q$.
What should the agent's model $p$ be modeling? This question was (already) answered by Conant and Ashby (1970) as the Good Regulator Theorem: every good regulator of a system must be a model of that system.
A Quote from Conant and Ashby's paper (this statement was later finessed by Friston (2013)):
"The theorem has the interesting corollary that the living brain, so far as it is to be successful and efficient as a regulator for survival, must proceed, in learning, by the formation of a model (or models) of its environment."
open("../../styles/aipstyle.html") do f
display("text/html", read(f,String))
end