# 1 - Thermodynamics and bounded rational decision-making¶

This notebook is part of the supplementary material for:
Genewein T., Leibfried F., Grau-Moya J., Braun D.A. (2015) Bounded rationality, abstraction and hierarchical decision-making: an information-theoretic optimality principle, Frontiers in Robotics and AI.

More information on how to run the notebook on the accompanying github repsitory where you can also find updated versions of the code and notebooks.

This notebook corresponds to Section 2.1 and reproduces Figure 1 of the paper.

## The problem with utility-maximization¶

Goal: design an agent that picks an action $a^*$ such that the utility $U(w,a)$ is maximized

$$a^* = \underset{a}{\operatorname{arg max}}~U(w,a),$$

where $w$ denotes a particular world-state (or context). An agent that deterministically maximizes its utility is called a (fully) rational agent.

Problem: if the set of possible actions $a$ is very large and the computational resources of the agent are limited, then finding the best action $a^*$ within a certain time-limit can become impossible.

Bounded rationality: because strictly maximizing the utility is (often) infeasible, use a satisficing action $\hat{a}$ that is "good enough" and computable with the limited resources of the agent. This basic idea of bounded rational decision-making can be implemented in many different ways. For instance, the agent could start searching through the space of possible actions (according to a prior $p_0(a)$) and once the time-limit is up it simply picks the best action that it found so far. Another alternative would be to set a certain utility-goal and pick the first action that satisfies this goal. In general this will lead to stochastic policies $p(a|w)$.

## Free energy optimization¶

One way to formalize bounded rational decision-making is by considering any change in behavior as computationally costly and then maximizing expected utility under the constraint that the computational effort does not exceed a certain bound. Change in behavior is quantified as the amount of information processing required to compute the desired behavior $p(a|w)$ given the prior $p_0(a)$. The prior $p_0(a)$ can be thought of as the agent's search strategy through the space of possible actions and $p(a|w)$ is the desired behavior, that maximizes expected utility $\sum_a p(a|w)U(a,w)$ under the constraint that the amount of computation stays bounded $D_\mathrm{KL}(p(a|w)||p_0(a)) \leq K$.

The question then is how to know the desired behavior? The answer to this question is found through a constrained optimization problem that can be turned into an unconstrained optimization problem using the method of Lagrange multipliers

$$p^*(a|w)=\underset{p(a|w)}{\operatorname{arg max}}~\sum_a p(a|w)U(w,a)-\frac{1}{\beta} \underbrace{\sum_a p(a|w) \log \frac{p(a|w)}{p_0(a)}}_{D_\mathrm{KL}(p(a|w)||p_0(a))},$$

where each $K$ translates into a particular value of $\beta$ and vice versa. Interestingly, this problem is very related to the free energy ($\Delta F$) variational problem in thermodynmics (strictly speaking the negative free energy difference)

$$\Delta F(w) = \underset{p(a|w)}{\operatorname{max}}\sum_a p(a|w)U(w,a)-\frac{1}{\beta} \sum_a p(a|w)\log \frac{p(a|w)}{p_0(a)}$$

Building upon the relations to thermodynamics (which are more than just a mathematical coincidence), the term $\beta$ is called the inverse temperature. It translates computational effort (usually in nats or bits) into the units of the utility function (utils). Since the whole second term has a negative sign it is considered a computational cost.

Through the free energy variational problem we have formalized information-theoretic bounded rationality as a problem of finding a desired behavior $p^*(a|w)$ that optimally trades off having large expected utility and low computational costs simultaneously. Computational costs are measured as the deviation of the desired behavior from the initial behavior $p_0(a)$. The trade-off is governed through the inverse temperature $\beta$.

## Solution to free energy optimization¶

Fortunately, the free energy variational problem has a closed-form analytic solution

$$p^*(a|w)=\frac{1}{Z(w)}p_0(a)e^{\beta U(w,a)},$$

where $Z(w)=\sum_a p_0(a|w)e^{\beta U(w,a)}$ is known as the partition sum. The inverse temperature also appears in the solution and has the following limit-cases

• No computational resources $\beta \rightarrow 0:~~~p^*(a|w)=p_0(a)$, posterior policy is equal to prior
• Unlimited computational resources $\beta \rightarrow \infty:~~~p^*(a|w)=\delta_{aa*}$, the policy becomes a delta-function for the optimal action $a^* = \underset{a}{\operatorname{arg max}}~U(w,a)$

For the cases in-between $0 \leq \beta \leq \infty$ the behavior $p^*(a|w)$ deviates towards large-utility outcomes but is bound by the maximally allowed KL-divergence from the prior.

# Example¶

The agent has to graps a cup from a table and has four potential actions to do so: $a_1, a_2, a_3, a_4$. Actions $a_1$ and $a_4$ correspond to successful grasps where $a_4$ is slightly better than $a_1$. Action $a_3$ is a successful grasp but it spills half the contents of the cup and action $a_2$ corresponds to an unsuccessful grasp. This is summarized in the utility function $U(w,a)$ below the next code-block. Additionally, we specify the prior $p_0(a)$ to be uniform, implying no prior preference for any of the actions (also shown after the next code-block).

### [Interact]: you can change the prior to a non-uniform prior in lines 11 to 16 below!¶

In [8]:
#only run this once
include("RateDistortionDecisionMaking.jl")

WARNING: replacing module RateDistortionDecisionMaking

Out[8]:
RateDistortionDecisionMaking
In [9]:
using RateDistortionDecisionMaking

#create n discrete actions a
a = collect(1:4)
na = length(a)
a_strings = ["a$aval" for aval in a] #string representation for plotting #--------- choose either one of the p0s (by commenting/uncommenting) --------- #uniform p0(a) p0 = ones(na)/na #skewed p0(a) #p0 = ones(na) #p0[1]=2 #p0 = p0/sum(p0) #--------------------- #U(a,o) - utility with two peaks that are almost identical in value U = ones(na) U[1]=3 U[2]=0.02 U[end]=3.1 #create (bar) plots for utility and prior plt_u = plot(x=a_strings,y=U,Geom.bar, Scale.x_discrete, Guide.xlabel("a", orientation=:horizontal),Guide.ylabel("U(w,a)",orientation=:vertical), Guide.title("Utility"),BAtheme()) plt_p0 = plot(x=a_strings,y=p0,Geom.bar, Scale.x_discrete, Scale.y_continuous(minvalue=0, maxvalue=1), Guide.xlabel("a", orientation=:horizontal),Guide.ylabel("p0(a)",orientation=:vertical), Guide.title("Prior"),BAtheme()) #show both plots stacked horizontally hstack(plt_u,plt_p0)  WARNING: using RateDistortionDecisionMaking.boltzmanndist in module Main conflicts with an existing identifier. WARNING: using RateDistortionDecisionMaking.BAtheme in module Main conflicts with an existing identifier.  Out[9]: ## Computing the posterior¶ In the code-block below, we compute and plot the bounded optimal posterior$p^*(a|w)$using the closed-form analytical solution (see function boltzmanndist in the file BlahutArimoto.jl). Additionally, we wrap the whole computation inside a slider which lets you easily change the value of$\beta$without having to (manually) re-run the code. ### [Interact] Evaluate the cell below and drag the slider up and down¶ ... to see how the posterior changes with different inverse temperatures (that correspond to different limits on the KL divergence between prior and posterior). Additionally try changing the prior to a non-uniform prior in the code-block above and see how that affects the posterior. In [17]: #compute the posterior p(a|w), using the inverse temperature β as a parameter #using a slider to interactively change the inverse temperature using Reactive, Interact println("Drag the slider and see how the inverse temperature influences the posterior.") sl_β = slider(0:0.1:50, label="inverse temp. β") display(sl_β) #displays the slider at the output of the current cell #tie the slider with the plotting routine (this will also show the figure, since it's the last command in this cell) plt_pa = lift(β->begin pagw = boltzmanndist(p0,β,U) #compute posterior plt_pagw = plot(x=a_strings,y=pagw,Geom.bar, Scale.x_discrete, Scale.y_continuous(minvalue=0, maxvalue=1), Guide.xlabel("a", orientation=:horizontal),Guide.ylabel("p*(a|w)",orientation=:vertical), Guide.title("Posterior, β=$β"),BAtheme())
end,sl_β)

Drag the slider and see how the inverse temperature influences the posterior.


If you see the slider but not the plot and "Javascript error adding output!", try moving the slider once, then the plot should appear.

If this does not work (i.e. kernel is busy with this cell but nothing ever happens), try ckecking out the most current version of Interact with: Pkg.checkout("Interact") in a Julia console. You can undo this and go back to the latest released version with Pkg.free("Interact")

When using the uniform prior:
For $\beta=0$, corresponding to no computational resources, the posterior is identical to the prior. For low $\beta$ (close to 1), the posterior is almost indifferent between $a1$ and $a4$, but at the same time has a very low chance of picking one of the "bad" options. This is a trade-off between using low computational resources (given the uniform prior) versus achieving a high expected utility. For large $\beta$, corresponding to unlimited computational resources, the agent (almost) deterministically picks $a4$ which will yield the highest expected utility, but also incurs large computational cost (see below).

## Trade-off as a function of $\beta$¶

In the code-block below, we sweep through a range of different $\beta$ values and plot the resulting expected utility and KL-divergence as a function of $\beta$.

In [11]:
#Sweep through β values and compute expected utility and KL-divergence

β_sweep = collect(0:0.1:50)
nβ = length(β_sweep)

EU = zeros(nβ)
DKL = zeros(nβ)

#inverse temp. sweep
for i=1:nβ
#compute posterior p(a|w) using the current β
pagw_s = boltzmanndist(p0,β_sweep[i],U)
#compute expected utility = ∑ p(a)U(a)
EUvec = (pagw_s') * U  #this will create a Float vector with only one element
#(which is still different from a scalar in Julia)
EU[i] = EUvec[1]  #this will actually extract the Float scalar
DKL[i] = kl_divergence(pagw_s,p0)/log(2) #divide by log(2) for bits
end

#plot expected utility and KL divergence as a function of β
plt_EU = plot(x=β_sweep,y=EU,Geom.line,
Guide.xlabel("β", orientation=:horizontal),Guide.ylabel("E[U]",orientation=:vertical),
Guide.title("Expected utility"),BAtheme())

plt_DKL = plot(x=β_sweep,y=DKL,Geom.line,
Guide.xlabel("β", orientation=:horizontal),Guide.ylabel("D<sub>KL</sub>(p||p0) [bits]",
orientation=:vertical),Guide.title("KL-divergence"),BAtheme())

hstack(plt_EU,plt_DKL)

Out[11]:

Bounded rationality: using limited computational resources as efficiently as possible
In the plots above (using the uniform prior) note that if the performance goal is set for instance to achieving $\approx 96\%$ of the maximum utility on average (which translates to an expected utility of roughly $3$ compared to the maximum of $3.1$), then an agent which is able to process roughly $1$ bit suffices. If, however the performance goal is set to achieving $99.999\%$ of the maximum utility on average, then an agent with twice the computational capacity of almost $2$ bits is needed. This means that an agent that does pretty well can be built with half the computational resources than a classical maximum expected utility agent.

Biological agents: humans and animals
It sounds a bit constructed to argue that selecting one of four options is a computationally hard problem that requires a bounded rational approach. But the example illustrates a deeper issue - imagine a biological agent (a human or an animal) that has to grasp a cup: there are infinitely many was to grasp the cup, there are also infinitely many ways to fail to grasp it or to grasp it with spilling some of its contents. It is impossible to internally evaluate all these possibilities. Additionally, our brains have limited information processing capacity but it seems that we can solve the task of grasping a cup almost effortless. It is plausible that we have (acquired) a good prior for searching the space of possible trajectories to grasp a cup and that we can easily find a solution that works "well enough" rather than the single best trajectory. The principle of information-theoretic bounded rationality provides a formal approach to solving computationally hard problems optimally, given the constraint of limited computational resources. In contrast, plain utility maximization easily breaks down completely under computational constraints.

## Plots for the paper¶

In [12]:
#for final figure: compute two different posteriors (low and high inverse temp.)

β_lo = 1  #inv. temp. for posterior with low computational resources (will stick close to prior)
β_hi = 100   #inv. temp. posterior with high computational resources (can deviate a lot from prior)

#compute posteriors
pagw_lo = boltzmanndist(p0,β_lo,U)
pagw_hi = boltzmanndist(p0,β_hi,U)

#create plots
plt_pagw_lo = plot(x=a_strings,y=pagw_lo,Geom.bar, Scale.x_discrete, Scale.y_continuous(minvalue=0, maxvalue=1),
Guide.xlabel("a", orientation=:horizontal),
Guide.ylabel("p*(a|w)",orientation=:vertical),Guide.title("Posterior, β=$β_lo"),BAtheme()) plt_pagw_hi = plot(x=a_strings,y=pagw_hi,Geom.bar, Scale.x_discrete, Scale.y_continuous(minvalue=0, maxvalue=1), Guide.xlabel("a", orientation=:horizontal),Guide.ylabel("p*(a|w)",orientation=:vertical), Guide.title("Posterior, β=$β_hi"),BAtheme())

hstack(plt_pagw_lo, plt_pagw_hi)

Out[12]:
In [13]:
#compose final figure - stacking all the plots
plt_final = vstack(hstack(plt_u,plt_p0,plt_EU),hstack(plt_pagw_lo, plt_pagw_hi,plt_DKL))

#show figure
display(plt_final)

#store figure (make sure that the folder 'Figures' exists or change path)
w = 18cm
h = 12cm
#draw(SVG("Figures/FEOptimization.svg", w, h), plt_final) #uncomment to store figure