- Goal
- Introduction to Active Inference and application to the design of synthetic intelligent agents

Materials

- Mandatory
- These lecture notes
- Karl Friston - 2016 - The Free Energy Principle (video)

Optional

- Raviv (2018), The Genius Neuroscientist Who Might Hold the Key to True AI.
- Interesting article on Karl Friston, who is a leading theoretical neuroscientist working on a theory that relates life and intelligent behavior to physics (and Free Energy minimization). (**highly recommended**)

- Kirsch (2019), Theories of Intelligence (2/2): Active Inference
- A nice tutorial blog on active inference.

- Van de Laar and De Vries (2019), Simulating Active Inference Processes by Message Passing
- How to implement active inference by message passing in a Forney-style factor graph.

References

- Friston (2013), Life as we know it
- Conant and Ashby (1970), Every good regulator of a system must be a model of that system


LET'S DO THE MOUNTAIN CAR TASK HERE (THIJS CODE?) or BATMAN PARKING. ANY SUGGESTIONS?

- In the previous lessons we assumed that a data set was given.
- In this lesson we consider *agents*. An agent is a system that *interacts* with its environment through both sensors and actuators.
- Crucially, by acting on the environment, the agent is able to affect the data that it will sense in the future.
- As an example, by changing the direction in which I look, I can affect the sensory data that will be sensed by my retina.

- With this definition of an agent, (biological) organisms are agents, and so are robots, self-driving cars, etc.
- In an engineering context, we are particularly interested in agents that behave with a *purpose* (with a goal in mind), e.g. to drive a car or to design a speech recognition algorithm.
- In this lesson, we will describe how **goal-directed behavior** by biological (and synthetic) agents can also be interpreted as minimization of a free energy functional $F[q]$.

We begin with a motivating example that requires "intelligent" goal-directed decision making: assume that you are an owl and that you're hungry. What are you going to do?

Have a look at Prof. Karl Friston's answer in this video segment on the cost function for intelligent behavior. (**Do watch the video!**)

Friston argues that intelligent decision making (behavior, action making) by an agent requires *minimization of a functional of beliefs*.

Friston further argues that this functional is a (variational) free energy (to be defined below), thus linking decision making to Bayesian inference.

In fact, Friston's **Free Energy Principle** (FEP) claims that all biological self-organizing processes (including brain processes) can be described as Free Energy minimization in a probabilistic model.

- This includes perception, learning, attention mechanisms, recall, action and decision making, etc.

Taking inspiration from FEP, if we want to develop synthetic "intelligent" agents, we have (only) two issues to consider:

- The specification of the FE functional (includes specification of generative model and constraints on the approximate posterior, a.k.a. the "recognition" model).
- *How* to minimize the FE functional?

What should the agent's model be modeling? This question was (already) answered by Conant and Ashby (1970) as the *good regulator theorem*: **every good regulator of a system must be a model of that system**.

From Conant and Ashby's paper (this statement was later finessed by Friston (2013)):

"The theory has the interesting corollary that the living brain, insofar as it is successful and efficient as a regulator for survival, *must* proceed, in learning, by the formation of a model (or models) of its environment."

We will follow the idea that an agent needs to hold a generative model for its environment, which is observed through sensory channels. The environmental dynamics can be affected through actions onto the environment.

Agents that follow the FEP and infer actions by inference in a generative model of the environment are engaged in a process called **active inference**. Let's draw a diagram to show the interactions between an active inference agent and its environment.

An active inference-based agent comprises

- A free energy functional $F[q] = \mathbb{E}_q\left[ \log\frac{q(z)}{p(x,z)}\right]$, where
  - $p(x,z) = \prod_k p(x_k,z_k|z_{k-1})$ is a *generative* model with observations $\{x_k\}$, latent variables $\{z_k\} = \left\{ \{s_k\}, \{u_k\}, \{\theta_k\}\right\}$ and $k$ is a time index.
  - $q(z)$ is a *recognition* model.
- A recipe to minimize the free energy $F[q]$

In the model above, the hidden variables $\{z_k\}$ of the agent comprise *internal* states $\{s_k\}$, *control* variables $\{u_k\}$ (which are "observed" by the environment as actions $\{a_k\}$), and *parameters* $\{\theta_k\}$.

We also assume that the agent interacts with an environment, which we represent by a dynamic model $$ (y_t,\tilde{s}_t) = R_t\left( a_t,\tilde{s}_{t-1}\right) $$ where $a_t$ are *actions*, $y_t$ are *outcomes* and $\tilde{s}_t$ holds the environmental *states*.

In the above equations, $u_t$ and $x_t$ are owned by the agent model, whereas $a_t$ and $y_t$ are variables in the environment model.

The agent can push actions $a_t$ onto the environment and measure responses $y_t$, but has no access to the environmental states $\tilde{s}_t$.

Interactions between the agent and environment are described by $$\begin{align*} a_t &\sim q(u_t) \\ x_t &= y_t \end{align*}$$ In other words, actions are drawn from the posterior over control signals, and the agent's observations are the outcomes produced by the environment.

- Note that this system implies a recursive dependency since the agent's future observations depend on the agent's current (and past) actions: $$x_{t+1} = x_{t+1} \left( a_{t+1} \right) = x_{t+1} \left( a_{t+1} \left( u_{t+1}\left( x_t \left( a_t \left( \cdots \right) \right) \right)\right) \right)$$
- $\Rightarrow$ As a result, **the agent actively engages in selecting its own data set!**

In biotic parlance,

- *behavior* is inference for the control signals ($u$),
- *perception* is inference for the internal states ($s$),
- *learning* is inference for the parameters ($\theta$).

The complexity-accuracy (CA) decomposition of free energy shows that *actions* aim to maximize accuracy, since model complexity is not a function of the observations (and $x = x(a)$): $$ F[q]= \underbrace{\sum_z q(z)\log\frac{q(z)}{p(z)}}_{\text{complexity}} - \underbrace{\sum_z q(z) \log p(x|z)}_{\text{accuracy}}$$

The divergence-evidence (DE) decomposition reveals that *perception* and *learning* minimize inference costs, since log-evidence is not affected by inference (not a function of $q$): $$F[q] = \underbrace{\sum_z q(z) \log \frac{q(z)}{p(z|x)}}_{\substack{\text{divergence}\\ \text{"inference costs"}}} - \underbrace{\log p(x)}_{\text{log-evidence}}$$
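The two decompositions above are rearrangements of the same functional, which can be checked numerically. The following is a minimal sketch for a toy model with a binary latent variable and a single fixed observation; all probability values are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy model: binary latent z and one fixed observation x.
p_z = [0.7, 0.3]            # prior p(z)
p_x_given_z = [0.2, 0.9]    # likelihood p(x|z) of the observed x, for z = 0, 1
q = [0.4, 0.6]              # an arbitrary recognition distribution q(z)

# Evidence p(x) and exact posterior p(z|x) by Bayes' rule.
p_x = sum(pz * lik for pz, lik in zip(p_z, p_x_given_z))
posterior = [pz * lik / p_x for pz, lik in zip(p_z, p_x_given_z)]


def kl(a, b):
    """Kullback-Leibler divergence between two discrete distributions."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)


# Complexity-accuracy (CA) form.
complexity = kl(q, p_z)
accuracy = sum(qi * math.log(lik) for qi, lik in zip(q, p_x_given_z))
F_ca = complexity - accuracy

# Divergence-evidence (DE) form.
divergence = kl(q, posterior)
F_de = divergence - math.log(p_x)

# Direct definition F[q] = E_q[log q(z) / p(x,z)].
F_direct = sum(
    qi * math.log(qi / (pz * lik)) for qi, pz, lik in zip(q, p_z, p_x_given_z)
)

# Free energy at the exact posterior, where the divergence vanishes.
F_at_posterior = kl(posterior, posterior) - math.log(p_x)

print(F_ca, F_de, F_direct, F_at_posterior)
```

All three expressions for $F[q]$ agree up to floating-point error, and setting $q$ equal to the exact posterior reduces the free energy to the surprise $-\log p(x)$.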

Biological agents select their observations by controlling their environment. Perception (and learning) serve to improve this data selection process by updating beliefs about the state of the world.

This process raises the question: if a (biological) agent seeks out observations, then which observations is the agent interested in? In other words, does the agent have a **goal** "in mind" when it engages in active data selection?

Yes! Agents set preferences for observations by prior distributions on *future* sensations!

- E.g., a self-driving agent in a car expects to observe no collisions.

We assume that agents live in a dynamic environment and consider the following generative model for the agent (omitting parameters $\theta$), and assuming the current time is $t$: $$\begin{align*} p^\prime(x,s,u) &= p(s_{t-1}) \prod_{k=t}^{t+T} \underbrace{p(x_k|s_k) \cdot p(s_k | s_{k-1}, u_k)}_{\text{internal dynamics}} \cdot\underbrace{p(u_k)}_{\substack{\text{control prior}}} \end{align*}$$

Note that the generative model at time $t$ can be run to make predictions (beliefs) about future observations $x_{t+1:t+T}$.

In order to infer *goal-driven* (i.e., purposeful) behavior, we now add prior beliefs $p^+(x)$ about desired future observations, leading to an *extended* agent model: $$\begin{align*} p(x,s,u) &= \frac{p^\prime(x,s,u) p^+(x)}{\int_x p^\prime(x,s,u) p^+(x) \mathrm{d}x} \\ &\propto p(s_{t-1}) \prod_{k=t}^{t+T} \underbrace{p(x_k|s_k) p(s_k | s_{k-1}, u_k) p(u_k)}_{\text{original generative model}} \underbrace{p^+(x_k)}_{\substack{\text{extension}\\\text{"goal prior"}}} \end{align*}$$

Goal-directed behavior follows from inference for controls (actions) at $t$, based on expectations (encoded by priors) about future ($>t$) observations.

$\Rightarrow$ Actions fulfill expectations about the future!

- After selecting an action $a_t$ and making an observation $y_t$, the extended generative model is represented by the following Forney-style factor graph (FFG):

- The (brown) dashed box is the agent's Markov blanket. Given the states on the Markov blanket, the internal states of the agent are independent of the state of the world.

- Online active inference proceeds by iteratively executing three stages: (1) act-execute-observe, (2) infer the next control/action, (3) slide forward
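The three stages can be sketched in a few lines for a toy problem. The example below is only an illustration under strong simplifying assumptions that are not part of these notes: a hypothetical one-dimensional world with 7 positions, a one-step look-ahead ($T=1$), an unconstrained recognition model (so that free energy minimization reduces to exact Bayesian inference), and selection of the most probable control instead of sampling $a_t \sim q(u_t)$, to keep the run reproducible. All names (`environment`, `infer_control`, `update_belief`, and so on) are made up for this sketch.

```python
import math

N = 7                    # hypothetical world with positions 0..N-1
TARGET = 5               # position that the goal prior favors
CONTROLS = [-1, 0, 1]    # move left, stay, move right


def clip(pos):
    return max(0, min(N - 1, pos))


def environment(action, true_state):
    """Environment R_t: maps (a_t, previous hidden state) to (y_t, new hidden state)."""
    new_state = clip(true_state + action)
    return new_state, new_state


def predict_state(belief, u):
    """Agent model: sum over s_prev of p(s | s_prev, u) p(s_prev) for a deterministic shift."""
    pred = [0.0] * N
    for s_prev, b in enumerate(belief):
        pred[clip(s_prev + u)] += b
    return pred


def likelihood(x, s):
    """Agent model p(x | s): the observation is assumed to be mostly correct."""
    return 0.9 if x == s else 0.1 / (N - 1)


# Goal prior p+(x): prefers observations close to TARGET.
_raw = [math.exp(-abs(x - TARGET)) for x in range(N)]
goal_prior = [g / sum(_raw) for g in _raw]
control_prior = {u: 1.0 / len(CONTROLS) for u in CONTROLS}


def infer_control(belief):
    """q(u) proportional to p(u) * sum_{x,s} p+(x) p(x|s) p(s|s_prev,u) p(s_prev)."""
    scores = {}
    for u in CONTROLS:
        pred = predict_state(belief, u)
        z_u = sum(
            goal_prior[x] * likelihood(x, s) * pred[s]
            for x in range(N)
            for s in range(N)
        )
        scores[u] = control_prior[u] * z_u
    total = sum(scores.values())
    return {u: v / total for u, v in scores.items()}


def update_belief(belief, u, y):
    """State posterior after executing control u and observing y."""
    pred = predict_state(belief, u)
    post = [likelihood(y, s) * pred[s] for s in range(N)]
    total = sum(post)
    return [p / total for p in post]


true_state = 0                                        # hidden from the agent
belief = [1.0 if s == 0 else 0.0 for s in range(N)]   # agent's p(s_{t-1})
q_u = infer_control(belief)
action = max(q_u, key=q_u.get)                        # mode of q(u), not a sample
trajectory = [true_state]

for t in range(10):
    # (1) act-execute-observe
    y, true_state = environment(action, true_state)
    trajectory.append(true_state)
    # (2) infer the state posterior and the next control
    posterior = update_belief(belief, action, y)
    q_u = infer_control(posterior)
    next_action = max(q_u, key=q_u.get)
    # (3) slide forward: the posterior becomes the prior for the next step
    belief, action = posterior, next_action

print(trajectory)
```

In this sketch the agent never reads `true_state` directly; it only pushes actions into `environment` and receives outcomes, in line with the agent-environment separation described earlier. The goal prior alone is enough to drive the agent toward the target position and keep it there.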

IMPLEMENT THE MOUNTAIN CAR/BATMAN WITH FORNEYLAB

Consider the agent's inference task at time step $t$, right after having selected an action $a_t$ and having made an observation $y_t$.

As usual, we record actions and observations by substituting the values into the generative model (in the act-execute-observe phase): $$\begin{align*} p(x,s,u) &\propto \underbrace{p(x_t=y_t|s_t)}_{\text{observation}} p(s_t|s_{t-1},u_t) p(s_{t-1}) \underbrace{p(u_t=a_t)}_{\text{action}} \\ & \quad \cdot \underbrace{\prod_{k=t+1}^{t+T} p(x_k|s_k) p(s_k | s_{k-1}, u_k) p(u_k) p^+(x_k)}_{\text{future}} \end{align*}$$

Note that (future) $x$ is also a latent variable and hence we include $x$ in the recognition model.

This leads to the following free energy functional $$\begin{align*} F[q] &\propto \sum_{x,s,u} q(x,s,u) \log \frac{q(x,s,u)}{p(x,s,u)} \end{align*}$$

Lots of interesting FE decompositions are possible again. For instance $$\begin{align*} F[q] &\propto \sum_{x,s,u} q(x,s,u) \log \frac{q(x,s,u)}{p(x,s,u)} \\ &= \sum_{u} q(u) \underbrace{\sum_{x,s} q(x,s|u)\log \frac{q(x,s|u)}{p(x,s|u)}}_{F_u[q]} + \underbrace{\sum_{u} q(u) \log \frac{q(u)}{p(u)}}_{\text{complexity}} \end{align*}$$ breaks the FE into a complexity term and a term $F_u[q]$ that is conditioned on the policy $u$.

It can be shown (exercise) that the optimal posterior for the policy is now given by $$ q^*(u) \propto p(u) \exp \left( -F^*_u \right) $$
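As a numerical illustration of this result, the sketch below computes $q^*(u)$ for three candidate policies and checks that it attains a lower free energy than another choice of $q(u)$. The values of $F^*_u$ and the uniform prior are hypothetical, and the helper names are made up for this example.

```python
import math


def policy_posterior(prior, F_star):
    """Return q*(u) proportional to p(u) * exp(-F*_u), normalized over policies."""
    # Subtract the smallest free energy before exponentiating for numerical stability.
    f_min = min(F_star)
    weights = [p * math.exp(-(f - f_min)) for p, f in zip(prior, F_star)]
    total = sum(weights)
    return [w / total for w in weights]


def free_energy(q, prior, F_star):
    """F[q] = sum_u q(u) F*_u + sum_u q(u) log(q(u) / p(u))."""
    return sum(
        qu * (f + math.log(qu / pu))
        for qu, pu, f in zip(q, prior, F_star)
        if qu > 0
    )


prior = [1 / 3, 1 / 3, 1 / 3]   # p(u): uniform prior over three policies
F_star = [2.0, 0.5, 1.0]        # hypothetical minimized free energies F*_u

q_star = policy_posterior(prior, F_star)
F_min = free_energy(q_star, prior, F_star)

# At the optimum, the free energy equals -log sum_u p(u) exp(-F*_u).
log_Z = math.log(sum(p * math.exp(-f) for p, f in zip(prior, F_star)))

# Any other choice, e.g. q(u) = p(u), gives a higher free energy.
F_uniform = free_energy(prior, prior, F_star)

print(q_star, F_min, -log_Z, F_uniform)
```

The policy with the lowest $F^*_u$ receives the most posterior mass, and the complexity term keeps $q^*(u)$ from collapsing entirely onto that policy.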

Let's consider a break-up $x=(x_t,x_{>t})$ with $x_{>t} = (x_{t+1},\ldots,x_{t+T})$ that recognizes the distinction between already observed and future data. Then $$\begin{align*} F_u[q] &= \underbrace{-\log p(x_t)}_{\substack{-\log(\text{evidence}) \\ \text{(surprise)}}} + \underbrace{\sum_{x,s} q(x_{>t},s|u)\log \frac{q(x_{>t},s|u)}{p(x_{>t},s|u)}}_{\substack{\text{divergence}\\ \text{(inference costs)}}}\,. \end{align*}$$

The inference costs (divergence term) can be further decomposed to $$\begin{align*} \underbrace{-\sum_{x} q(x_{>t}) \log p(x_{>t})}_{\substack{\text{expected surprise} \\ \text{(goal-directed, pragmatic costs)}}} + \underbrace{\sum_{x,s} q(x_{>t},s|u) \log \frac{q(x_{>t},s|u)}{p(s|x_{>t},u)}}_{\text{epistemic costs}} \end{align*}$$

Minimizing goal-directed costs selects actions that (expect to) fulfill the priors over future observations. Minimization of epistemic ("knowledge seeking") costs leads to actions that maximize information gain about the environmental dynamics. This can be seen by further decomposition of the epistemic costs into $$\begin{align*} &\sum_{x,s} q(x_{>t},s|u) \log \frac{q(s|u)}{p(s|x_{>t},u)} + \sum_{x,s} q(x_{>t},s|u) \log q(x_{>t}|s,u) \\ \approx &\underbrace{\sum_{x,s} q(x_{>t},s|u) \log \frac{q(s|u)}{q(s|x_{>t},u)}}_{-\text{mutual information}} - \underbrace{\mathbb{E}_{q(s|u)}\left[ H\left[ q(x_{>t}|s,u)\right]\right]}_{\text{ambiguity}} \end{align*}$$ where we used the approximation $q(s|x_{>t},u) \approx p(s|x_{>t},u)$ to illuminate the link to the mutual information.
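The algebra of this split can be verified numerically. The sketch below uses a hypothetical $2 \times 2$ joint distribution $q(x_{>t},s|u)$ for one fixed policy and treats the approximation $q(s|x_{>t},u) \approx p(s|x_{>t},u)$ as exact; the numbers carry no meaning beyond illustration.

```python
import math

# Hypothetical joint q(x, s | u) for one fixed policy u: rows index s, columns index x.
q_joint = [[0.30, 0.10],
           [0.05, 0.55]]
states = range(2)
outcomes = range(2)

q_s = [sum(q_joint[s][x] for x in outcomes) for s in states]   # q(s|u)
q_x = [sum(q_joint[s][x] for s in states) for x in outcomes]   # q(x|u)

# Mutual information between s and x under q.
mutual_information = sum(
    q_joint[s][x] * math.log(q_joint[s][x] / (q_s[s] * q_x[x]))
    for s in states
    for x in outcomes
)

# Ambiguity: expected entropy of q(x|s,u) under q(s|u).
ambiguity = -sum(
    q_joint[s][x] * math.log(q_joint[s][x] / q_s[s])
    for s in states
    for x in outcomes
)

# Epistemic costs sum_{x,s} q(x,s|u) log[q(x,s|u) / q(s|x,u)], with q(s|x,u) = q(x,s|u) / q(x|u).
epistemic_costs = sum(
    q_joint[s][x] * math.log(q_joint[s][x] / (q_joint[s][x] / q_x[x]))
    for s in states
    for x in outcomes
)

print(epistemic_costs, -mutual_information - ambiguity)
```

Both printed numbers coincide, confirming that the two terms in the displayed decomposition add up to the epistemic costs when the approximation holds with equality.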

Minimizing FE leads (approximately) to mutual information maximization between internal states $s$ and observations $x$. In other words, free energy minimization (FEM) leads to actions that aim to seek out observations that are maximally informative about the hidden causes of these observations.

Ambiguous states have uncertain mappings to observations. Minimizing FE leads to actions that try to avoid ambiguous states.

In short, if the generative model includes variables that represent (yet) unobserved future observations, then action selection by FEM leads to a very sophisticated behavioral strategy that is maximally consistent with

- Bayesian notions of model complexity
- evidence from past observations
- goal-directed imperatives by priors on future observations
- epistemic (knowledge seeking) value maximization, both in terms of MI maximization and avoidance of ambiguous states

All these imperatives are simultaneously represented and automatically balanced against each other in a single time-varying cost function (Free Energy) that needs no tuning parameters.

(Just to be sure, you don't need to memorize these derivations nor are you expected to derive them on-the-spot. We present these decompositions only to provide insight into the multitude of forces that underlie FEM-based action selection.)
