In [ ]:

```
http://nbviewer.jupyter.org/gist/sgttwld/1a31c9f739fbf8c374e9f460fce19fe6
```

In [1]:

```
from IPython.core.display import HTML
import urllib.request

# fetch the notebook's CSS style sheet and apply it
response = urllib.request.urlopen(
    "https://gist.githubusercontent.com/sgttwld/c060b18a9d6ce7c3a10e3c6dce2c0d3a/raw"
)
css = response.read().decode("utf-8")
HTML("<style type='text/css'>" + css + "</style>")
```

Out[1]:

**Sebastian Gottwald, 2018**

In thermodynamics, the work required to compress an ideal gas isothermally from a volume $V$ to a smaller volume $V'$ is given by

$$ W = kT \ln \frac{V}{V'} $$

with the convention that work done *on* the system by its surroundings is positive. Here $T$ denotes the temperature and $k$ the Boltzmann constant. Since the compression is isothermal, the molecules of the gas have on average the same kinetic energy as before, and there are just as many of them as before, i.e. the internal energy $U$, which is the sum of the energies of its constituent parts, does not change:

$$ \Delta U = 0 $$

By the first law of thermodynamics, $\Delta U = W + Q$, i.e. the work done on the system was converted into heat $Q=-W<0$, which is negative because it was released by the system into its surroundings, i.e. drained off into the thermal bath that keeps the gas at constant temperature. What has changed is the state of the system: from finding the gas in volume $V$ to finding it in $V'$. In thermodynamics, such state changes are recorded by the Free Energy

$$ F = U - TS $$

where $S$ denotes the entropy of the system. Since $\Delta U = \Delta F + T\Delta S$ and $\Delta S \geqslant Q/T$ (2nd law), the Free Energy $F$ is the part of the total energy of the system that can be used to perform work (hence the name):

$$ \Delta F = \Delta U - T\Delta S = W + (Q - T\Delta S) \leqslant W $$

with equality for reversible processes. In particular, if the isothermal compression from above is reversible, then

\begin{equation}\tag{$\Delta$} \Delta F = kT \ln \frac{V}{V'} \end{equation}
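As a quick numerical sanity check of $(\Delta)$, we can evaluate $\Delta F = kT\ln(V/V')$ for a hypothetical reversible isothermal compression to half the volume (the temperature and volumes below are example values, not taken from any experiment):

```
import math

k = 1.380649e-23        # Boltzmann constant in J/K
T = 300.0               # temperature in K (example value)
V, V_prime = 2.0, 1.0   # volumes before/after, in consistent units

# free energy change for halving the volume: ΔF = kT ln(V/V')
dF = k * T * math.log(V / V_prime)
dF   # positive: work must be done on the gas
```

Halving the volume costs $kT\ln 2$, which at room temperature is on the order of $10^{-21}$ joules.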

If we consider a gas consisting of only one molecule, the thermodynamic concepts of temperature, pressure, volume, free energy, and entropy still make sense as long as we regard them as time averages. Changing the volume from $V$ to $V'$ is then related to a reduction of uncertainty about the position of the particle, which, averaged over time, requires the physical work given by equation $(\Delta)$.

In terms of the probabilities of where to find the particle inside of a bigger volume $\Omega$, i.e. $p:= V/\Omega$ to find it in $V$ and $p':= V'/\Omega$ to find it in $V'$, the work required to reduce the uncertainty from $p$ to $p'$ is

$$ W = kT \ln \frac{V/\Omega}{V'/\Omega} = kT \ln \frac{p}{p'} $$
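Since the enclosing volume $\Omega$ cancels, the volume form and the probability form of the work agree exactly. A short check with made-up values ($\Omega$, $V$, $V'$ are arbitrary, and we measure work in units of $kT$):

```
import math

kT = 1.0                   # measure work in units of kT
Omega = 10.0               # enclosing volume (example value)
V, V_prime = 4.0, 1.5      # volumes before/after (example values)
p, p_prime = V / Omega, V_prime / Omega   # probabilities of finding the particle

W_volumes = kT * math.log(V / V_prime)    # volume form
W_probs = kT * math.log(p / p_prime)      # probability form: the Ω's cancel
```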

A change in volume from $V$ to $V'$ can be viewed as the result of volume changes in multiple smaller compartments of the system. Consider, for example, $N$ pistons in a row with volumes $\{V_i\}_{i=1}^N$ that are changed to $\{V_i'\}_{i=1}^N$, so that $V=\sum_i V_i$ and $V'=\sum_i V_i'$. Then, for each $i=1,\dots,N$, $(\Delta)$ can be decomposed into three terms:

\begin{equation} \tag{$\Delta\Delta$} kT \ln\frac{V}{V'} = \underbrace{kT \ln \frac{V}{V_i}}_{(i)} + \underbrace{kT \ln \frac{V_i}{V_i'}}_{(ii)} - \underbrace{kT \ln \frac{V'}{V_i'}}_{(iii)} \end{equation}

which have the following interpretation:

- $(i)$: Required work to reduce the uncertainty from $V$ to $V_i$
- $(ii)$: Required work to reduce the uncertainty from $V_i$ to $V_i'$
- $(iii)$: Work that can be extracted to increase the uncertainty from $V_i'$ to $V'$
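The decomposition $(\Delta\Delta)$ holds for every compartment $i$, since the $\ln V_i$ and $\ln V_i'$ contributions cancel. A numerical check with hypothetical compartment volumes (three pistons, $kT=1$):

```
import math

kT = 1.0
V_parts = [3.0, 2.0, 5.0]       # compartment volumes V_i before (example values)
V_parts_new = [1.0, 1.5, 2.5]   # compartment volumes V_i' after (example values)
V, V_new = sum(V_parts), sum(V_parts_new)

total = kT * math.log(V / V_new)                  # left side of (ΔΔ)
for Vi, Vi_new in zip(V_parts, V_parts_new):
    term_i = kT * math.log(V / Vi)                # (i): reduce uncertainty V -> V_i
    term_ii = kT * math.log(Vi / Vi_new)          # (ii): reduce uncertainty V_i -> V_i'
    term_iii = kT * math.log(V_new / Vi_new)      # (iii): extractable, V_i' -> V'
    assert abs(total - (term_i + term_ii - term_iii)) < 1e-12
```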

In the single-particle interpretation, the quotients $q_i := \frac{V_i}{V}$ and $p_i := \frac{V_i'}{V'}$ are the probabilities of finding the particle in compartment $i$ before and after the compression, respectively. With these definitions, we can write the decomposition $(\Delta\Delta)$ as

$$ \Delta F = kT \ln \frac{V}{V'} = \sum_{i=1}^N p_i \, kT \ln \frac{V}{V'} = \sum_{i=1}^N p_i \, \Big[ \underbrace{kT \ln \frac{V_i}{V_i'}}_{(1)} + \underbrace{kT \ln \frac{p_i}{q_i}}_{(2)}\Big] $$

where $(1)$ is work corresponding to the change in **uncertainty of the particle's position inside of one piston** and $(2)$ is the work corresponding to the change in **uncertainty about the piston the particle might be in**.

Hence, if $W_i = kT \ln \frac{V_i}{V_i'}$ denotes the work required locally in piston $i$, we have

$$ \Delta F = \sum_i p_i \, W_i + kT \, D_{KL}(p\|q) $$
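We can verify this identity numerically: with $q_i = V_i/V$, $p_i = V_i'/V'$ and $W_i = kT\ln(V_i/V_i')$, the expected local work plus $kT\,D_{KL}(p\|q)$ recovers the global $\Delta F$. The compartment volumes below are again example values, with $kT=1$:

```
import math

kT = 1.0
V_parts = [3.0, 2.0, 5.0]       # V_i before compression (example values)
V_parts_new = [1.0, 1.5, 2.5]   # V_i' after compression (example values)
V, V_new = sum(V_parts), sum(V_parts_new)

q = [Vi / V for Vi in V_parts]              # q_i = V_i / V   (before)
p = [Vi / V_new for Vi in V_parts_new]      # p_i = V_i' / V' (after)
W = [kT * math.log(Vi / Vi_new)             # local work W_i in piston i
     for Vi, Vi_new in zip(V_parts, V_parts_new)]

local = sum(pi * Wi for pi, Wi in zip(p, W))                  # Σ_i p_i W_i
dkl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))     # D_KL(p‖q)
dF = kT * math.log(V / V_new)
assert abs(dF - (local + kT * dkl)) < 1e-12
```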

**Notation**
We write $p(z) = P_Z(z) = \mathbb P(Z=z) = \mathbb P\circ Z^{-1}(z)$ and thus we may write $p(Z) := \mathbb P\circ Z^{-1}$, so that for all measurable functions $f$ on $Ran(Z)$

$$\sum_z p(z) f(z) = \mathbb E[f(Z)] = \mathbb E_{\mathbb P}[f(Z)] = \mathbb E_{\mathbb P\circ Z^{-1}}[f] = \mathbb E_{p(Z)}[f] = \mathbb E_{p(Z)}[f(\cdot)]\, .$$ For better readability we may abuse notation and write $\mathbb E_{p(Z)}[f(Z)]:=\mathbb E_{p(Z)}[f]$. Similarly, for conditional probabilities we write $p(X|y):=P_X(\,\cdot\,|y)$, so that for each $y$ $$\sum_{x} p(x|y) \, f(x) = \mathbb E_{p(X|y)}[f] = \mathbb E_{p(X|y)}[f(X)]\,.$$

For $\alpha>0$ and a discrete random variable $Z$ with values $z\in Ran(Z)$ and probabilities $p = \{p(z)\}_z$, define $F_p:Ran(Z)\to \{-\infty\}\cup (-\infty,0]$ by

$$F_p(z) := \frac{1}{\alpha} \log p(z)$$ and similarly $F_p(z|y) := \frac{1}{\alpha} \log p(z|y)$.

Assume that there exist random variables $X$, $Y$, such that $Z=(X,Y)$. We may view $Y$ as a label for a coarse level/fibre, and $X$ as the variable running through the fibre given by $Y=y$.

**Claim I:** *For all probability distributions $p$ and $q$ on $Ran(Z)$, and all $y\in Ran(Y)$, we have*

$$F_q(y) = \mathbb E_{p(X|y)} \big[ F_q(X,y) \big] - \mathbb E_{p(X|y)}\big[ F_q(X|y)\big] \, .\tag{$\ast$}$$

*Proof:* The claim follows from
$$\alpha\, F_q(y) = \log q(y) = \Big( \sum_x p(x|y)\Big) \log q(y) = \sum_x p(x|y) \log \Big( q(y) \frac{q(x|y)}{q(x|y)} \Big)\\ = \sum_x p(x|y) \Big(\log q(x,y) - \log q(x|y)\Big).$$
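Claim I can be verified numerically for arbitrary distributions $p$ and $q$ on a small product space. The joint distributions below are randomly generated example data, with $\alpha=1$:

```
import math
import random

random.seed(0)
alpha = 1.0
nx, ny = 4, 3   # sizes of Ran(X) and Ran(Y) (example values)

def rand_joint(n, m):
    """Random strictly positive joint distribution over (x, y), rows indexed by y."""
    w = [[random.random() + 0.01 for _ in range(n)] for _ in range(m)]
    s = sum(sum(row) for row in w)
    return [[v / s for v in row] for row in w]

p = rand_joint(nx, ny)
q = rand_joint(nx, ny)

def F(prob):
    return math.log(prob) / alpha   # F(z) = (1/α) log prob(z)

for y in range(ny):
    py = sum(p[y])   # marginal p(y)
    qy = sum(q[y])   # marginal q(y)
    lhs = F(qy)      # F_q(y)
    # E_{p(X|y)}[F_q(X,y)] - E_{p(X|y)}[F_q(X|y)], with q(x|y) = q(x,y)/q(y)
    rhs = sum((p[y][x] / py) * (F(q[y][x]) - F(q[y][x] / qy)) for x in range(nx))
    assert abs(lhs - rhs) < 1e-12
```

Note that $p$ only enters through $p(x|y)$, which sums to one over $x$, so $(\ast)$ holds for any choice of $p$, exactly as in the proof.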

**Corollary:** *For $q=p$, equation $(\ast)$ reads*

$$F_p(y) = \mathbb E_{p(X|y)} \big[ F_p(X,y) \big] + \frac{1}{\alpha}H\big[p(X|y)\big] \, ,\tag{$\ast\ast$}$$

*where $H[p]:= -\sum_x p(x) \log p(x)$ denotes the Shannon entropy of a discrete distribution $p$. Adding and subtracting $\mathbb E_{p(X|y)}[\frac{1}{\alpha}\log p(X)]$ on the right side of $(\ast\ast)$ gives*

$$F_p(y) = \mathbb E_{p(X|y)}\big[F_p(y|X)\big] - \frac{1}{\alpha} D_{KL}\big(p(X|y)\big|\big|p(X)\big)\, .$$
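This last identity can also be checked numerically. The joint distribution below is a made-up example (it sums to one), again with $\alpha=1$, and $p(y|X)$ is obtained via Bayes' rule as $p(y|x) = p(x,y)/p(x)$:

```
import math

alpha = 1.0
# joint p(x, y), rows indexed by y (example values summing to 1)
p = [[0.10, 0.20, 0.05],
     [0.15, 0.05, 0.10],
     [0.05, 0.20, 0.10]]
nx = len(p[0])
px = [sum(row[x] for row in p) for x in range(nx)]   # marginal p(x)

for row in p:
    py = sum(row)                                    # marginal p(y)
    p_x_given_y = [v / py for v in row]              # p(x|y) = p(x,y)/p(y)
    # E_{p(X|y)}[F_p(y|X)] with p(y|x) = p(x,y)/p(x)
    exp_term = sum(p_x_given_y[x] * math.log(row[x] / px[x]) / alpha
                   for x in range(nx))
    # D_KL(p(X|y) ‖ p(X))
    dkl = sum(p_x_given_y[x] * math.log(p_x_given_y[x] / px[x])
              for x in range(nx))
    assert abs(math.log(py) / alpha - (exp_term - dkl / alpha)) < 1e-12
```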