This notebook is part of the supplementary material for:

Genewein T., Leibfried F., Grau-Moya J., Braun D.A. (2015) *Bounded rationality, abstraction and hierarchical decision-making: an information-theoretic optimality principle*, Frontiers in Robotics and AI.

More information on how to run the notebook can be found in the accompanying github repository, where you can also find updated versions of the code and notebooks.

This notebook corresponds to Section 3 and reproduces Figures 5, 6, 7 and 8 of the paper.

Note that the notebook **is not limited to the serial case** as it computes the serial-case solutions through the general case by setting the inverse temperatures accordingly. The parallel or general case can thus be easily reproduced with different settings of the $\beta$-values.

Here, we design a two-stage serial information processing system that first processes a world-state $w$ into an internal percept $x$ and then uses this percept to compute an action $a$. We assume that the action is conditionally independent of the world-state given the percept. This simply means that there is no direct flow of information from $W$ to $A$ and corresponds to the following graphical model $$ W \rightarrow X \rightarrow A$$ that leads to $p(w,x,a)=p(w)p(x|w)p(a|x)$.
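As a quick sanity check on this factorization, here is a minimal NumPy sketch (the notebook itself uses Julia; the toy dimensions and random distributions are arbitrary choices for illustration) showing that the factored joint normalizes:

```python
import numpy as np

rng = np.random.default_rng(0)
nw, nx, na = 3, 4, 2  # toy dimensions, chosen arbitrarily

def rand_dist(shape):
    """Random distribution, normalized over the first axis."""
    m = rng.random(shape)
    return m / m.sum(axis=0, keepdims=True)

p_w = rand_dist((nw,))          # p(w)
p_xgw = rand_dist((nx, nw))     # p(x|w), columns sum to 1
p_agx = rand_dist((na, nx))     # p(a|x), columns sum to 1

# Markov chain W -> X -> A: p(w,x,a) = p(w) p(x|w) p(a|x)
p_wxa = (p_w[:, None, None]
         * p_xgw.T[:, :, None]
         * p_agx.T[None, :, :])
print(np.isclose(p_wxa.sum(), 1.0))  # a valid joint sums to one
```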

Each stage of the information processing system is characterized by an information-theoretic channel

- $W \rightarrow X$ with the rate $I(W;X)$
- $X \rightarrow A$ with the rate $I(X;A)$

We want to maximize the utility function $U(w,a)$ while at the same time not exceeding limits on the information processing rate of each stage. This can be formalized with the following variational problem $$\underset{p(x|w),p(a|x)}{\operatorname{arg~max}}~\mathbf{E}_{p(w,x,a)}[U(w,a)] - \frac{1}{\beta_1} I(W;X) - \frac{1}{\beta_2} I(X;A)$$

The solution to the optimization problem is given by $$\begin{align} p^*(x|w)&=\frac{1}{Z(w)}p(x)\exp\left(\beta_1 \Delta F_{\text{ser}}(w,x)\right)\\ p(x)&=\sum_w p(w)p^*(x|w)\\ p^*(a|x)&=\frac{1}{Z(x)} p(a) \exp \left(\beta_2 \sum_w p(w|x) U(w,a) \right)\\ p(a)&=\sum_{w,x} p(w)p^*(x|w)p^*(a|x), \end{align}$$ where $Z(w)$ and $Z(x)$ denote the corresponding normalization constants or partition sums. The conditional probability $p(w|x)$ is given by Bayes' rule $p(w|x) = \frac{p(w)p^*(x|w)}{p(x)}$ and $$\Delta F_{\text{ser}}(w,x):=\mathbf{E}_{p^*(a|x)}[U(w,a)] - \frac{1}{\beta_2} D_{\mathrm{KL}}(p^*(a|x)||p(a))$$ is the free energy difference of the action stage.
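These four equations are coupled and are typically solved by Blahut-Arimoto-type fixed-point iteration, which is what `threevarBAiterations` below implements for the general three-variable case ($\beta_3=0$ recovers the serial case). As a rough NumPy sketch of the serial-case iteration (the notebook itself is in Julia; the function `serial_ba` and its signature are my own illustrative construction, with no convergence check for brevity):

```python
import numpy as np

def serial_ba(U, p_w, beta1, beta2, nx, iters=200, seed=0):
    """Fixed-point iteration of the four serial-case equations.
    U[w, a]: utility matrix; returns p*(x|w) with shape [x, w]
    and p*(a|x) with shape [a, x]. Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    nw, na = U.shape
    # random (normalized) initializations of both channels
    p_xgw = rng.random((nx, nw)); p_xgw /= p_xgw.sum(0, keepdims=True)
    p_agx = rng.random((na, nx)); p_agx /= p_agx.sum(0, keepdims=True)
    for _ in range(iters):
        p_x = p_xgw @ p_w                          # p(x) = Σ_w p(w) p(x|w)
        p_a = p_agx @ p_x                          # p(a) = Σ_x p(x) p(a|x)
        # Bayes' rule: p(w|x) ∝ p(w) p(x|w)
        p_wgx = (p_xgw * p_w[None, :]).T           # [w, x]
        p_wgx /= p_wgx.sum(0, keepdims=True) + 1e-16
        # action stage: p*(a|x) ∝ p(a) exp(β2 Σ_w p(w|x) U(w,a))
        U_ax = U.T @ p_wgx                         # [a, x]
        logits = np.log(p_a + 1e-16)[:, None] + beta2 * U_ax
        p_agx = np.exp(logits - logits.max(0, keepdims=True))
        p_agx /= p_agx.sum(0, keepdims=True)
        # ΔF_ser(w,x) = E_{p(a|x)}[U(w,a)] - (1/β2) KL(p(a|x)||p(a))
        EU_wx = U @ p_agx                          # [w, x]
        KL = (p_agx * (np.log(p_agx + 1e-16)
                       - np.log(p_a + 1e-16)[:, None])).sum(0)
        dF = EU_wx - KL[None, :] / beta2
        # perception stage: p*(x|w) ∝ p(x) exp(β1 ΔF_ser(w,x))
        logits = np.log(p_x + 1e-16)[:, None] + beta1 * dF.T
        p_xgw = np.exp(logits - logits.max(0, keepdims=True))
        p_xgw /= p_xgw.sum(0, keepdims=True)
    return p_xgw, p_agx
```

Note that each update uses the current estimates of the marginals $p(x)$ and $p(a)$, which are recomputed from the conditionals at the start of every sweep.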

Interestingly, the solutions above can be used for constructing bounded-optimal perception-action systems. Perception-action systems need to perform inference over a (hidden) world-state $w$ and then use the gathered information $x$ to drive a decision-maker. Classically, inference and decision-making are separated: $$\begin{align} &\text{Bayesian inference:} & p(w|x)&=\frac{p(w)p(x|w)}{\sum_w p(w)p(x|w)} \\ &\text{Bounded rational decision:} & p^*(a|x)&=\frac{1}{Z(x)}p(a)e^{\beta_2 U(x,a)} \end{align}$$ where the last equation describes a bounded rational decision-maker as in Section 2 of the paper. The problem is that in such a case there is no principled way to specify the likelihood model $p(x|w)$. Often it is chosen to represent the world-state as faithfully as possible.
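For contrast, the classical two-step pipeline can be sketched as follows (a hedged NumPy illustration, not the notebook's Julia code; here the likelihood $p(x|w)$ must be supplied by hand):

```python
import numpy as np

def bayes_posterior(p_w, p_xgw):
    """Bayesian inference: p(w|x) = p(w) p(x|w) / p(x).
    p_xgw[x, w] = p(x|w); returns an array of shape [w, x]."""
    joint = (p_xgw * p_w[None, :]).T            # [w, x] = p(w) p(x|w)
    return joint / joint.sum(axis=0, keepdims=True)

def bounded_rational_policy(p_a, U_xa, beta2):
    """Bounded rational decision: p*(a|x) ∝ p(a) exp(β2 U(x,a)).
    U_xa[a, x]; returns an array of shape [a, x]."""
    logits = np.log(p_a)[:, None] + beta2 * U_xa
    p = np.exp(logits - logits.max(axis=0, keepdims=True))  # stable softmax
    return p / p.sum(axis=0, keepdims=True)
```

In the limit $\beta_2 \to \infty$ the policy approaches the greedy maximizer of $U$; for $\beta_2 \to 0$ it falls back to the prior $p(a)$.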

Now compare the two equations for Bayesian inference and bounded rational decision-making with the four equations obtained as the solution to our variational problem: Bayesian inference and bounded rational decision-making drop out naturally. Additionally, the bounded-optimal likelihood model $p^*(x|w)$ is well-defined. Interestingly, it does not try to represent $w$ as faithfully as possible but rather optimizes the downstream free-energy trade-off of the action part of the system. This leads to a tight coupling between perception and action, which is illustrated in the example below.

Below, we compare the bounded-optimal perception-action system against a system that uses a handcrafted likelihood $p_\lambda(x|w)$. The model has a tunable precision parameter $\lambda$ that affects the information processing rate of the perceptual channel $I(X;W)$; this parameter corresponds to the level of perceptual noise and influences the likelihood of "confusing" one particular $w$ with similar ones.

The observation or percept $x$ is a discretized noisy version of $w$ with precision $\lambda$:

$$x|w,\lambda \sim \text{round}(\mathcal{N}(w,1/\lambda))$$

By this construction it is already obvious that neither the action part of the system nor the utility function has any influence on the perceptual part of the system. We will highlight this further in the example below.
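For illustration, such a discretized-Gaussian likelihood can be sketched as follows (my NumPy reconstruction; the notebook's `pogw_handcrafted` is in Julia and may treat the grid boundaries differently):

```python
import numpy as np
from math import erf, sqrt

def handcrafted_likelihood(x_values, w_values, lam):
    """p(x|w) for x = round(N(w, 1/lam)), renormalized over the finite x grid.
    Illustrative reconstruction, not the notebook's exact implementation."""
    sd = 1.0 / sqrt(lam)
    Phi = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))  # standard normal CDF
    p = np.zeros((len(x_values), len(w_values)))
    for j, w in enumerate(w_values):
        for i, x in enumerate(x_values):
            # Gaussian mass falling into the rounding interval [x-0.5, x+0.5)
            p[i, j] = Phi((x + 0.5 - w) / sd) - Phi((x - 0.5 - w) / sd)
        p[:, j] /= p[:, j].sum()  # renormalize: the finite grid truncates the tails
    return p
```

For large $\lambda$ each column of $p(x|w)$ approaches a delta at $x=w$; for small $\lambda$ the mass spreads over neighboring (and eventually across-group) sizes.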

In [16]:

```
#only run this once
include("RateDistortionDecisionMaking.jl")
```

Out[16]:

In [2]:

```
using RateDistortionDecisionMaking, Distances, DataFrames, Colors, Gadfly, Distributions, Interact, Reactive
#make the default plot size a bit larger
set_default_plot_size(15cm, 12cm)
```

To illustrate perception-action systems we design the following example. The agent is an animal that encounters nine other kinds of animals, which come in three different size groups - each group has three members.

- G1: Small: 2,3,4
- G2: Medium: 6,7,8
- G3: Large: 10,11,12

The agent has sensors to observe $x$, which is a noisy version of $w$. The rate $I(W;X)$ governs how precise or noisy the percept is (for the hand-crafted model the precision is directly controlled by $\lambda$). Medium precision mostly leads to within-group confusion, whereas low precision leads to strong across-group confusion.
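The rate $I(W;X)$ can be computed directly from the likelihood and the world-state prior; a small NumPy helper (in bits), for illustration only (the notebook computes these quantities in Julia):

```python
import numpy as np

def mutual_information(p_xgw, p_w):
    """I(W;X) in bits; p_xgw[x, w] = p(x|w), p_w[w] = p(w)."""
    p_x = p_xgw @ p_w                      # marginal p(x)
    joint = p_xgw * p_w[None, :]           # p(x, w)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = joint * np.log2(p_xgw / p_x[:, None])
    return float(np.nansum(terms))         # 0·log 0 terms contribute nothing
```

A noiseless identity channel with a uniform prior over $n$ states gives $I = \log_2 n$ bits; a constant channel that ignores $w$ gives $I = 0$.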

The agent can hunt the animals of the small and medium-sized groups for food, but it must flee from the large animals as it could fall prey to them. Alternatively, you can change the utility function to a mating scenario, where the agent must try to find animals of a specific size within the medium-sized group to mate with (see the description below the plot of the utility function for more details).

You can change the utility function by commenting/uncommenting in the code block below. If all other parameters are kept the same, we can observe how the bounded-optimal percept $p^*(x|w)$ is coupled to the utility function.

In [3]:

```
#set up predator-prey example
include("PredatorPreyExample.jl")
###### Predator-prey scenario #############
#same utility as used in the main paper
w_values, w_strings, a_values, a_strings, p_w, U = setup_predator_prey_example()
###### Mating scenario ####################
#alternative utility - mating scenario
#w_values, w_strings, a_values, a_strings, p_w, U = setup_predator_prey_example(mating_utility=true)
numa = length(a_strings)
a_vec = collect(1:numa)
numw = length(w_strings)
w_vec = collect(1:numw)
#pre-compute utility
U_pre, Umax = setuputilityarrays(a_values,w_values,U)
#visualize utility
plt_utility = visualizeMatrix(U_pre, w_values, a_values, w_strings, a_strings, xlabel="animal size w",
ylabel="Action a", legendlabel="U(a,w)")
```

Out[3]:

Possible kinds of **animals**:

- Small animals (w={2,3,4})
- Can't hear well - sneaking up on them is very likely successful
- The generic sneak-up pattern works very well
- An ambush also works but bears the risk that the animal is not moving towards you
- For each of w2, w3, w4 there is a specific sneak-up pattern that increases the chances of success for that particular kind of animal; however, when used on the wrong animal, the success rate is 20% lower

- Medium-sized animals (w={6,7,8})
- Can hear well - sneaking up on them has lower chances of success compared to ambush
- The generic ambush pattern works well but bears the risk that the animal is not moving towards you
- A sneak up might also work but chances are quite low
- There are specific ambush patterns for w6, w7, w8, but using them has no advantage over the generic ambush; when used on the wrong animal, however, the success rate is 20% lower

- Large animals (w={10,11,12})
- Can potentially kill you, your chances of survival are low if you sneak up or ambush them (no matter with which kind of pattern)
- If you flee, your chances of survival are quite good

Possible kinds of **actions**:

- Generic *Ambush*: wait for the animal to get close and then strike
  - advantage: causes no noise
  - disadvantage: animal might not come towards you

- Generic *Sneak up*: slowly move closer to the animal and then strike
  - advantage: works even if the animal is not moving towards you
  - disadvantage: animal might hear you and flee

- Generic *Flee*: run away from the animal
  - advantage: if the animal could kill you, your chances of survival are significantly increased
  - disadvantage: if the animal was potential prey, you missed out on food

- Specific *Sneak up* for w={2,3,4}: a specific sneak-up pattern that increases success when applied to exactly the right kind of animal, but decreases success when applied to another animal within the small-animal group. When applied to an animal from the medium-sized group, the success rate equals that of the generic sneak-up pattern.

- Specific *Ambush* for w={6,7,8}: does not increase success compared to the generic ambush pattern, but decreases success when applied to another animal within the medium-sized group. When applied to an animal from the small group, the success rate equals that of the generic ambush pattern. As there is no advantage of the specific pattern over the generic one (though it may be disadvantageous when applied to the wrong animal), any bounded-rational decision-maker should assign no probability mass to these specific ambush patterns: using them would not increase the expected utility but would increase the informational cost.

Compared to the generic sneak-up and ambush patterns, the specific patterns require more computational resources, that is they require a larger rate on the action-channel, or in other words: they require good motor skills and a good motor hardware (which is more costly than a cheap motor system).

If you have poor motor hardware, you will never be able to use the specific hunting patterns; it therefore makes no sense to waste resources on an expensive perceptual system that lets you precisely distinguish individual animals. Rather, a system that allows you to tell the groups apart suffices. Dually, if your perceptual system does not allow you to distinguish between individual animals, it would be wasteful to have an expensive motor system that can accurately execute very specific hunting patterns.

**Mating scenario** (alternative utility): you are a medium-sized animal (w=6) and try to find a mate - potential mates are w=8.

Possible actions

- Display
- Get the attention of the other animal which could be a potential mate

- Flee
- Quickly flee from the other animal

Possible kinds of **animals**:

- Small animals (w={2,3,4})
- Uninteresting. Fleeing from them will cost resources, displaying to them will also waste a similar amount of resources.

- Medium-sized animals
- w=6 rival, same size as you: displaying to them might drive them away, but has a risk of confrontation. Fleeing significantly decreases your chances of mating.
- w=7 rival, bigger than you. Displaying to them will most probably get you injured, fleeing from them is the better option.
- w=8 potential mate. Displaying to them will significantly increase your chances to mate, fleeing from them has a low utility.

- Large animals (w={10,11,12})
- Can potentially kill you, fleeing from them is good, displaying to them will almost surely get you killed.

Below, the hand-crafted perceptual model can be inspected by changing the precision parameter $\lambda$. Note that this will also directly influence the rate $I(W;X)$.

In [4]:

```
#set up hand-crafted likelihood model and p(x|w)
x_values = collect(1:13)
x_vec = collect(1:length(x_values))
x_strings = map((x)->string(x), x_values)
#slider for selecting λ
λ_vals = 0.1:0.1:5
slider_l = slider(λ_vals,label="Perceptual precision λ")
#use lift to connect the actual plotting-code to the slider
plt_pxgw_sl = lift(λ_vis->begin
        p_xgw_hc = pogw_handcrafted(x_values, w_values, λ_vis)
        visualizeBAconditional(p_xgw_hc, x_vec, w_vec, x_strings, w_strings,
            wlabel="animal size w", alabel="observed size x", legendlabel="p(x|w,λ)")
    end, slider_l)
display(slider_l)
display(plt_pxgw_sl)
```

Drag the slider above to change the perceptual precision and see how observations get increasingly "blurry" with decreasing precision. Note that low perceptual precision incurs a low mutual information $I(X;W)$, whereas high precision incurs a large mutual information between the actual size and the observed size.

If you see the slider but not the plot and get a `"Javascript error adding output!"` message, try moving the slider once; the plot should then appear.

If this does not work (i.e. the kernel is busy with this cell but nothing ever happens), try checking out the most current version of Interact with `Pkg.checkout("Interact")` in a Julia console. You can undo this and go back to the latest released version with `Pkg.free("Interact")`.

When changing the parameters, make sure to re-run the corresponding cells below; in case of doubt, simply re-run all cells below. Some interesting settings to try:

- $\lambda=1.65$, $\beta_1 = 8$, $\beta_2 = 10$ (same $I(X;W)$ for both models - almost unbounded actor)
- $\lambda=1.65$, $\beta_1 = 8$, $\beta_2 = 1$ (same EU - bounded actor, optimal perception changes, given the limits on the action channel)
- $\lambda=0.4$, $\beta_1 = 1$, $\beta_2 = 10$ (same $I(X;W)$ for both models - bounded actor, optimal action changes, given the limits on the perceptual channel)

- $\lambda=10$, $\beta_1 = 10$, $\beta_2 = 5$, mating-utility (optimal perception changes under a different utility function - compare against case above with identical parameters but different utility function)

Change the temperature on the action-channel (with very large resources on the observation channel) and see how the perception is affected by the action.

Change the temperature on the perception-channel (with very large resources on the action channel) and see how the action is affected by perception.

Change the utility function (mating-utility) and see how the percept is affected by the new utility.

In [5]:

```
#precision of the hand-crafted perceptual model
λ = 1.65 #0.4 #1.65
#the "price" for I(X;W) in the handcrafted case will be β1 (as given below)
#to make the two cases comparable. If you want I(X;W) to be the same in both
#cases, set β1 and run the sequential case and then tune λ until the mutual
#information terms are equal.
#
#Note that β2 will also be used for the bounded rational decision-maker that
#uses the hand-crafted perception. This should allow for easy comparison of
#the action-channels in both cases (hand-crafted vs. sequential case)
#inverse temperatures for sequential case
#β1: perceptual channel -> price for I(X;W)
#β2: action channel -> price for I(A;X)
#β3=0 sequential case (otherwise the general three-variable case is specified)
β1 = 8 #1
β2 = 10
β3 = 0;
```

**Note that the simulation implements the solutions of the general case. By setting $\beta_3=0$, the serial case is recovered (which corresponds to the setting in the paper). However, any of the other cases (parallel or general) can be simulated by simply setting the inverse temperatures accordingly (in the code cell above).**

The code cells below simulate a bounded-rational decision-maker using the hand-crafted perceptual model and visualize the solution.

In [6]:

```
#compute hand-crafted perception, then feed it into bounded rational decision-maker
#with inv. temperature β2 and compute optimal p(a|x) using hand-crafted p(x|w)
p_xgw_hc = pogw_handcrafted(x_values, w_values, λ)
numx = length(x_vec)
numa = length(a_vec)
#compute expected utility under posterior over w given x
#
#posterior p(w|x)
p_wgx = zeros(numw, numx)
for k in 1:numx
    p_wgx[:,k] = p_xgw_hc[k,:]' .* p_w
    p_wgx[:,k] += eps() #add small epsilon to prevent numerical problems for large λ
    p_wgx[:,k] /= sum(p_wgx[:,k])
end
#U(a,x)=∑_w p(w|x)U(a,w)
U_ax = zeros(numa, numx)
for k in 1:numx
    U_ax[:,k] = U_pre * p_wgx[:,k]
end
#p(x)=∑_w p(w)p(x|w)
p_x_hc = p_xgw_hc * p_w
#make sure that p(x) has non-zero entries (otherwise numerical problems arise in I(X;W))
p_x_hc += eps()
p_x_hc /= sum(p_x_hc)
#solve the decision-making part with one-step Blahut-Arimoto
ε = 0.0001 #convergence criterion for BAiterations
maxiter = 10000 #maximum number of BA iterations
#initialize p(a) uniformly
num_acts = length(a_vec)
pa_init = rand(num_acts)
pa_init /= sum(pa_init)
#BA iterations
p_agx_hc, p_a_hc, perf_df_hc = BAiterations(pa_init, β2, U_ax, p_x_hc, ε, maxiter,
compute_performance=true, performance_as_dataframe=true)
; #suppress output
```

In [7]:

```
#visualize solution p(x|w), p(a|x), p(a|w)
#p(x|w)
plt_pxgw_hc = visualizeBAconditional(p_xgw_hc, x_vec, w_vec, x_strings, w_strings,
wlabel="animal size w", alabel="observed size x", legendlabel="p(x|w,λ)")
#p(a|x)
plt_pa_hc, plt_pagx_hc = visualizeBAsolution(p_a_hc, p_agx_hc, a_vec, x_vec, a_strings, x_strings,
wlabel="observed size x", alabel="action a",
legendlabel_marginal="p(a)", legendlabel_conditional="p(a|x,λ)",
suppress_vis=true)
#compute p(a|w)
#p(a|w) = ∑_x p(x|w)p(a|x)
p_agw_hc = p_agx_hc * p_xgw_hc
plt_pagw_hc = visualizeBAconditional(p_agw_hc, a_vec, w_vec, a_strings, w_strings,
wlabel="animal size w", alabel="action a", legendlabel="p(a|w,λ)")
#visualize p(x|w)
display(plt_pxgw_hc)
#visualize p(a|x)
display(plt_pagx_hc)
#visualize p(a|w)
display(plt_pagw_hc)
```

The plots above show $p_\lambda(x|w)$, $p_\lambda(a|x)$ and the overall behavior $p_\lambda(a|w)$ respectively.

Below we compute the information terms of the hand-crafted solution and visualize them.

**Note that in the legends below, $O$ is used instead of $X$.**

In [8]:

```
#compute p(a|x,w) = p(a|x) ∀w
p_agxw_hc = zeros(numa, numx, numw)
for j in 1:numw
    p_agxw_hc[:,:,j] = p_agx_hc
end
#compute performance measures of solution
res_hc = analyze_three_var_BAsolution(p_w, p_x_hc, p_a_hc, p_xgw_hc, p_agx_hc, p_agxw_hc, p_agw_hc, U_pre, β1, β2, β3)
perf_df_hc = performancemeasures2DataFrame(res_hc...)
#visualize
p_MI, p_composed, p_perf = plot_three_var_performancemeasures(perf_df_hc, maximum(U_pre), β1, β2, β3)
display(hstack(p_MI, p_composed))
display(p_perf)
```

The code below simulates and visualizes a bounded optimal perception action system.

In [9]:

```
ε = 0.0001 #convergence criterion for BAiterations
maxiter = 10000 #maximum number of BA iterations
#This function performs Blahut-Arimoto iterations for the three-variable general case
px, pa, pxgw, pagx, pagxw, pagw, performance_df = threevarBAiterations(numx, β1, β2, β3, U_pre, p_w, ε, maxiter,
compute_performance=true, performance_as_dataframe=true,
init_pogw_sparse = true, init_pogw_uniformly = false,
init_pagow_uniformly = true);
```

In [10]:

```
#call the routine that creates plots for all the probability distributions
plt_px, plt_pa, plt_pxgw, plt_pagx, plt_pagw, dpdown, plt_pagow_vis = visualize_three_var_BAsolution(px, pa,
pxgw, pagx, pagxw, pagw,
x_vec, a_vec, w_vec,
x_strings, a_strings, w_strings,
olabel_string="x", alabel_string="a", wlabel_string="w")
#visualize p(x|w)
display(plt_pxgw)
#visualize p(a|x)
display(plt_pagx)
#visualize p(a|w)
display(plt_pagw)
```