Abstract: Physical landscapes are shaped by elevation, valleys, and peaks. We might expect that information landscapes are molded by entropy, precision, and capacity constraints. To explore how these ideas might manifest, we introduce Jaynes’ world, an entropy game that maximises instantaneous entropy production.
In this talk we’ll argue that this landscape has a precision/capacity trade-off that suggests the underlying configuration requires a density matrix representation.
This game explores how structure, time, causality, and locality might emerge within a system governed solely by internal information-theoretic constraints. The hope is that it can serve as a minimal model for studying how such structure emerges.
Let $Z = \{Z_1, Z_2, \dots, Z_n\}$ be the full set of system variables. At game turn $t$, define a partition in which $X(t) \subseteq Z$ are the active variables (currently contributing to entropy) and $M(t) = Z \setminus X(t)$ are the latent or frozen variables, stored in the form of an information reservoir (Barato and Seifert (2014), Parrondo et al. (2015)).
We’ll argue that the configuration space must be represented by a density matrix,
$$
\rho(\theta) = \frac{1}{Z(\theta)} \exp\left(\sum_i \theta_i H_i\right),
$$
where the $H_i$ are Hermitian generators and the $\theta_i$ are natural parameters.
From this we can see that the log-partition function, which has an interpretation as the cumulant generating function, is $A(\theta) = \log Z(\theta)$.
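Under this interpretation, the derivatives of $A(\theta)$ generate the cumulants of the generators; in particular (a standard exponential-family identity, stated here for reference),
$$
\nabla_\theta A(\theta) = \mathbb{E}_{\rho(\theta)}[H], \qquad \nabla^2_\theta A(\theta) = \mathrm{Cov}_{\rho(\theta)}(H) = G(\theta),
$$
where $G(\theta)$ is the Fisher information matrix that appears throughout what follows.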
We define our system to have a maximum entropy of $N$ bits. If the dimension $d$ of the parameter space is fixed, this implies a minimum detectable resolution in natural parameter space, $\varepsilon \sim \frac{1}{2^N}$.
Note that if the dimension $d$ scales with $N$ (e.g., $d = \alpha N$ for some constant $\alpha$), then the resolution constraint becomes more complex. In this case, the volume of distinguishable states $\varepsilon^d$ must equal $2^N$, which leads to $\varepsilon = 2^{1/\alpha}$, a constant independent of $N$. This suggests that as the system’s entropy capacity grows, it maintains a constant resolution while exponentially increasing the number of distinguishable states.
Each variable $Z_i$ is associated with a generator $H_i$ and a natural parameter $\theta_i$. When we say a parameter $\theta_i \in X(t)$, we mean that the component of the system associated with $H_i$ is active at time $t$ and its parameter is evolving with $|\dot{\theta}_i| \geq \varepsilon$. This reflects the duality between variables, observables, and natural parameters that we find in exponential family representations and that we also see in a density matrix representation.
Our core axiom is that the system evolves by steepest ascent in entropy. The gradient of the entropy with respect to the natural parameters is given by
$$
\nabla_\theta S[\rho] = -G(\theta)\theta,
$$
where $G(\theta)$ is the Fisher information matrix.
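As a quick sanity check of this identity, here is a minimal sketch (not part of the game code itself) that compares $-G(\theta)\theta$ against a finite-difference gradient of the entropy for an illustrative four-state categorical exponential family; the variable names and $\theta$ values are arbitrary.

```python
import numpy as np

def probs(theta):
    """Categorical exponential family: p_i proportional to exp(theta_i)."""
    w = np.exp(theta - theta.max())
    return w / w.sum()

def entropy(theta):
    p = probs(theta)
    return -np.sum(p * np.log(p))

def fisher(theta):
    """Fisher information G(theta) = diag(p) - p p^T (covariance of the generators)."""
    p = probs(theta)
    return np.diag(p) - np.outer(p, p)

theta = np.array([1.2, -0.3, 0.4, -1.3])

# Entropy gradient from the identity dS/dtheta = -G(theta) theta
analytic = -fisher(theta) @ theta

# Finite-difference gradient for comparison
eps = 1e-6
numeric = np.array([
    (entropy(theta + eps * e) - entropy(theta - eps * e)) / (2 * eps)
    for e in np.eye(len(theta))
])

print("analytic:", analytic)
print("numeric: ", numeric)   # the two should agree to high precision
```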
import numpy as np
First we write some helper code to plot the histogram and compute its entropy.
import matplotlib.pyplot as plt
import mlai.plot as plot
def plot_histogram(ax, p, max_height=None):
heights = p
if max_height is None:
max_height = 1.25*heights.max()
# Safe entropy calculation that handles zeros
nonzero_p = p[p > 0] # Filter out zeros
S = - (nonzero_p*np.log2(nonzero_p)).sum()
# Define bin edges
bins = [1, 2, 3, 4, 5] # Bin edges
# Create the histogram
if ax is None:
fig, ax = plt.subplots(figsize=(6, 4)) # Adjust figure size
ax.hist(bins[:-1], bins=bins, weights=heights, align='left', rwidth=0.8, edgecolor='black') # Use weights for probabilities
# Customize the plot for better slide presentation
ax.set_xlabel("Bin")
ax.set_ylabel("Probability")
ax.set_title(f"Four Bin Histogram (Entropy {S:.3f})")
ax.set_xticks(bins[:-1]) # Show correct x ticks
ax.set_ylim(0,max_height) # Set y limit for visual appeal
We can compute the entropy of any given histogram.
# Define probabilities
p = np.zeros(4)
p[0] = 4/13
p[1] = 3/13
p[2] = 3.7/13
p[3] = 1 - p.sum()
# Safe entropy calculation
nonzero_p = p[p > 0] # Filter out zeros
entropy = - (nonzero_p*np.log2(nonzero_p)).sum()
print(f"The entropy of the histogram is {entropy:.3f}.")
import matplotlib.pyplot as plt
import mlai.plot as plot
import mlai
fig, ax = plt.subplots(figsize=plot.big_wide_figsize)
fig.tight_layout()
plot_histogram(ax, p)
ax.set_title(f"Four Bin Histogram (Entropy {entropy:.3f})")
mlai.write_figure(filename='four-bin-histogram.svg',
directory = './information-game')
Figure: The entropy of a four bin histogram.
We can play the entropy game by starting with a histogram with all the probability mass in the first bin and then ascending the gradient of the entropy function.
The simplest possible example of Jaynes’ World is a two-bin histogram with probabilities p and 1−p. This minimal system allows us to visualize the entire entropy landscape.
The natural parameter is the log odds, $\theta = \log\frac{p}{1-p}$, and the update given by the entropy gradient is
$$
\Delta\theta_{\text{steepest}} = \eta \frac{\text{d}S}{\text{d}\theta} = \eta\, p(1-p)\big(\log(1-p) - \log p\big).
$$
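The gradient follows from the chain rule. Working in nats for brevity,
$$
S(p) = -p\log p - (1-p)\log(1-p), \qquad p = \sigma(\theta) = \frac{1}{1+e^{-\theta}},
$$
and since $\frac{\text{d}S}{\text{d}p} = \log\frac{1-p}{p}$ and $\frac{\text{d}p}{\text{d}\theta} = p(1-p)$, we have
$$
\frac{\text{d}S}{\text{d}\theta} = p(1-p)\big(\log(1-p) - \log p\big).
$$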
import numpy as np
# Python code for gradients
p_values = np.linspace(0.000001, 0.999999, 10000)
theta_values = np.log(p_values/(1-p_values))
entropy = -p_values * np.log(p_values) - (1-p_values) * np.log(1-p_values)
fisher_info = p_values * (1-p_values)
gradient = fisher_info * (np.log(1-p_values) - np.log(p_values))
import matplotlib.pyplot as plt
import mlai.plot as plot
import mlai
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=plot.big_wide_figsize)
ax1.plot(theta_values, entropy)
ax1.set_xlabel('$\\theta$')
ax1.set_ylabel('Entropy $S(p)$')
ax1.set_title('Entropy Landscape')
ax2.plot(theta_values, gradient)
ax2.set_xlabel('$\\theta$')
ax2.set_ylabel('$\\nabla_\\theta S(p)$')
ax2.set_title('Entropy Gradient vs. Position')
mlai.write_figure(filename='two-bin-histogram-entropy-gradients.svg',
directory = './information-game')
Figure: Entropy gradients of the two bin histogram against position.
This example reveals the entropy extrema at $p=0$, $p=0.5$, and $p=1$. At minimal entropy ($p\approx 0$ or $p\approx 1$), the gradient approaches zero, creating natural information reservoirs. The dynamics slow dramatically near these points: this critical slowing is what creates the information reservoirs.
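A small standalone sketch (with an illustrative set of $\theta$ values) makes the slow-down concrete: the gradient magnitude collapses as $\theta$ moves away from zero towards the low-entropy corners.

```python
import numpy as np

def entropy_gradient(theta):
    """dS/dtheta (in bits) for the two-bin histogram with p = sigmoid(theta)."""
    p = 1.0 / (1.0 + np.exp(-theta))
    return p * (1 - p) * (np.log2(1 - p) - np.log2(p))

# The gradient shrinks rapidly at large |theta| (p near 0 or 1),
# which is the critical slowing that forms information reservoirs.
for theta in [-1.0, -3.0, -6.0, -9.0, -12.0]:
    print(f"theta = {theta:6.1f}, |dS/dtheta| = {abs(entropy_gradient(theta)):.2e}")
```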
We can visualize the entropy maximization process by performing gradient ascent in the natural parameter space θ. Starting from a low-entropy state, we follow the gradient of entropy with respect to θ to reach the maximum entropy state.
import numpy as np
# Helper functions for two-bin histogram
def theta_to_p(theta):
"""Convert natural parameter theta to probability p"""
return 1.0 / (1.0 + np.exp(-theta))
def p_to_theta(p):
"""Convert probability p to natural parameter theta"""
# Add small epsilon to avoid numerical issues
p = np.clip(p, 1e-10, 1-1e-10)
return np.log(p/(1-p))
def entropy(theta):
"""Compute entropy for given theta"""
p = theta_to_p(theta)
# Safe entropy calculation
return -p * np.log2(p) - (1-p) * np.log2(1-p)
def entropy_gradient(theta):
"""Compute gradient of entropy with respect to theta"""
p = theta_to_p(theta)
return p * (1-p) * (np.log2(1-p) - np.log2(p))
def plot_histogram(ax, theta, max_height=None):
"""Plot two-bin histogram for given theta"""
p = theta_to_p(theta)
heights = np.array([p, 1-p])
if max_height is None:
max_height = 1.25
# Compute entropy
S = entropy(theta)
# Create the histogram
bins = [1, 2, 3] # Bin edges
if ax is None:
fig, ax = plt.subplots(figsize=(6, 4))
ax.hist(bins[:-1], bins=bins, weights=heights, align='left', rwidth=0.8, edgecolor='black')
# Customize the plot
ax.set_xlabel("Bin")
ax.set_ylabel("Probability")
ax.set_title(f"Two-Bin Histogram (Entropy {S:.3f})")
ax.set_xticks(bins[:-1])
ax.set_ylim(0, max_height)
# Parameters for gradient ascent
theta_initial = -9.0 # Start with low entropy
learning_rate = 1
num_steps = 1500
# Initialize
theta_current = theta_initial
theta_history = [theta_current]
p_history = [theta_to_p(theta_current)]
entropy_history = [entropy(theta_current)]
# Perform gradient ascent in theta space
for step in range(num_steps):
# Compute gradient
grad = entropy_gradient(theta_current)
# Update theta
theta_current = theta_current + learning_rate * grad
# Store history
theta_history.append(theta_current)
p_history.append(theta_to_p(theta_current))
entropy_history.append(entropy(theta_current))
if step % 100 == 0:
print(f"Step {step+1}: θ = {theta_current:.4f}, p = {p_history[-1]:.4f}, Entropy = {entropy_history[-1]:.4f}")
import matplotlib.pyplot as plt
import mlai.plot as plot
import mlai
# Create a figure showing the evolution
fig, axes = plt.subplots(2, 3, figsize=(15, 8))
fig.tight_layout(pad=3.0)
# Select steps to display
steps_to_show = [0, 300, 600, 900, 1200, 1500]
# Plot histograms for selected steps
for i, step in enumerate(steps_to_show):
row, col = i // 3, i % 3
plot_histogram(axes[row, col], theta_history[step])
axes[row, col].set_title(f"Step {step}: θ = {theta_history[step]:.2f}, p = {p_history[step]:.3f}")
mlai.write_figure(filename='two-bin-histogram-evolution.svg',
directory = './information-game')
# Plot entropy evolution
plt.figure(figsize=(10, 6))
plt.plot(range(num_steps+1), entropy_history, 'o-')
plt.xlabel('Gradient Ascent Step')
plt.ylabel('Entropy')
plt.title('Entropy Evolution During Gradient Ascent')
plt.grid(True)
mlai.write_figure(filename='two-bin-entropy-evolution.svg',
directory = './information-game')
# Plot trajectory in theta space
plt.figure(figsize=(10, 6))
theta_range = np.linspace(-5, 5, 1000)
entropy_curve = [entropy(t) for t in theta_range]
plt.plot(theta_range, entropy_curve, 'b-', label='Entropy Landscape')
plt.plot(theta_history, entropy_history, 'ro-', label='Gradient Ascent Path')
plt.xlabel('Natural Parameter θ')
plt.ylabel('Entropy')
plt.title('Gradient Ascent Trajectory in Natural Parameter Space')
plt.axvline(x=0, color='k', linestyle='--', alpha=0.3)
plt.legend()
plt.grid(True)
mlai.write_figure(filename='two-bin-trajectory.svg',
directory = './information-game')
Figure: Evolution of the two-bin histogram during gradient ascent in natural parameter space.
Figure: Entropy evolution during gradient ascent for the two-bin histogram.
Figure: Gradient ascent trajectory in the natural parameter space for the two-bin histogram.
The gradient ascent visualization shows how the system evolves in the natural parameter space $\theta$. Starting from a negative $\theta$ (corresponding to a low-entropy state with $p \ll 0.5$), the system follows the gradient of entropy with respect to $\theta$ until it reaches $\theta = 0$ (corresponding to $p = 0.5$), which is the maximum entropy state.
Note that the maximum entropy occurs at $\theta = 0$, which corresponds to $p = 0.5$. The gradient of entropy with respect to $\theta$ is zero at this point, making it a stable equilibrium for the gradient ascent process.
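A short check of this claim (in nats, differentiating the gradient expression above once more with respect to $\theta$):
$$
\frac{\text{d}^2 S}{\text{d}\theta^2} = p(1-p)\left[(1-2p)\log\frac{1-p}{p} - 1\right],
$$
which evaluates to $-\tfrac{1}{4}$ at $\theta = 0$ (i.e. $p = 0.5$), confirming that this fixed point is a maximum of the entropy rather than a saddle.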
import numpy as np
# Define the entropy function
def entropy(lambdas):
p = lambdas**2/(lambdas**2).sum()
# Safe entropy calculation
nonzero_p = p[p > 0]
nonzero_lambdas = lambdas[p > 0]
return np.log2(np.sum(lambdas**2))-np.sum(nonzero_p * np.log2(nonzero_lambdas**2))
# Define the gradient of the entropy function
def entropy_gradient(lambdas):
denominator = np.sum(lambdas**2)
p = lambdas**2/denominator
# Safe log calculation
log_terms = np.zeros_like(lambdas)
nonzero_idx = lambdas != 0
log_terms[nonzero_idx] = np.log2(np.abs(lambdas[nonzero_idx]))
p_times_lambda_entropy = -2*log_terms/denominator
const = (p*p_times_lambda_entropy).sum()
gradient = 2*lambdas*(p_times_lambda_entropy - const)
return gradient
# Numerical gradient check
def numerical_gradient(func, lambdas, h=1e-5):
numerical_grad = np.zeros_like(lambdas)
for i in range(len(lambdas)):
temp_lambda_plus = lambdas.copy()
temp_lambda_plus[i] += h
temp_lambda_minus = lambdas.copy()
temp_lambda_minus[i] -= h
numerical_grad[i] = (func(temp_lambda_plus) - func(temp_lambda_minus)) / (2 * h)
return numerical_grad
We can then ascend the gradient of the entropy function. Starting at a parameter setting where the mass is placed in the first bin, we take $\lambda_1 = 100$ and $\lambda_2 = \lambda_3 = \lambda_4 = 0.01$.
First to check our code we compare our numerical and analytic gradients.
import numpy as np
# Initial parameters (lambda)
initial_lambdas = np.array([100, 0.01, 0.01, 0.01])
# Gradient check
numerical_grad = numerical_gradient(entropy, initial_lambdas)
analytical_grad = entropy_gradient(initial_lambdas)
print("Numerical Gradient:", numerical_grad)
print("Analytical Gradient:", analytical_grad)
print("Gradient Difference:", np.linalg.norm(numerical_grad - analytical_grad)) # Check if close to zero
Now we can run the steepest ascent algorithm.
import numpy as np
# Steepest ascent algorithm
lambdas = initial_lambdas.copy()
learning_rate = 1
turns = 15000
entropy_values = []
lambdas_history = []
for _ in range(turns):
grad = entropy_gradient(lambdas)
lambdas += learning_rate * grad # update lambda for steepest ascent
entropy_values.append(entropy(lambdas))
lambdas_history.append(lambdas.copy())
We can plot the histogram at a set of chosen turn numbers to see the progress of the algorithm.
import matplotlib.pyplot as plt
import mlai.plot as plot
import mlai
fig, ax = plt.subplots(figsize=plot.big_wide_figsize)
plot_at = [0, 100, 1000, 2500, 5000, 7500, 10000, 12500, turns-1]
for i, turn in enumerate(plot_at):
    ax.clear()  # start from a fresh axis so each saved figure shows a single turn
    plot_histogram(ax, lambdas_history[turn]**2/(lambdas_history[turn]**2).sum(), 1)
# write the figure,
mlai.write_figure(filename=f'four-bin-histogram-turn-{i:02d}.svg',
directory = './information-game')
import notutils as nu
from ipywidgets import IntSlider
nu.display_plots('four-bin-histogram-turn-{sample:0>2}.svg',
                 './information-game',
                 sample=IntSlider(0, 0, len(plot_at)-1, 1))
Figure: Intermediate stages of the four-bin histogram entropy game, showing the histogram after 0, 100, 1000, 2500, 5000, 7500, 10000 and 12500 turns and at the final turn.
And we can also plot the changing entropy as a function of the number of game turns.
fig, ax = plt.subplots(figsize=plot.big_wide_figsize)
ax.plot(range(turns), entropy_values)
ax.set_xlabel("turns")
ax.set_ylabel("entropy")
ax.set_title("Entropy vs. turns (Steepest Ascent)")
mlai.write_figure(filename='four-bin-histogram-entropy-vs-turns.svg',
directory = './information-game')
Figure: Four bin histogram entropy game. The plot shows the increasing entropy against the number of turns across 15000 iterations of gradient ascent.
Note that the entropy starts at a saddle point, increases rapidly, and then levels off towards the maximum entropy, with the gradient decreasing slowly in the manner of Zeno’s paradox.
We partition the Fisher information matrix $G(\theta)$ according to the active variables $X(t)$ and the latent information reservoir $M(t)$:
$$
G(\theta) = \begin{bmatrix} G_{XX} & G_{XM} \\ G_{MX} & G_{MM} \end{bmatrix}.
$$
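As a small sketch of what this partition looks like in code (the split into active and latent index sets here is an arbitrary illustration, not something the game has determined), we can slice the Fisher information of the four-bin system into its four blocks:

```python
import numpy as np

def fisher_information(theta):
    """Fisher information G(theta) = diag(p) - p p^T for p_i proportional to exp(theta_i)."""
    w = np.exp(theta - theta.max())
    p = w / w.sum()
    return np.diag(p) - np.outer(p, p)

# Same illustrative theta as the four-bin initialisation used later in this section
theta = np.array([1.0, -0.5, -0.2, -0.3])
G = fisher_information(theta)

# Illustrative choice of active (X) and latent/reservoir (M) index sets
active, latent = np.array([0, 1]), np.array([2, 3])

G_XX = G[np.ix_(active, active)]
G_XM = G[np.ix_(active, latent)]
G_MX = G[np.ix_(latent, active)]
G_MM = G[np.ix_(latent, latent)]
print(G_XX, G_MM, sep="\n")
```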
The minimal-entropy state compatible with the system’s resolution constraint and regularity condition is represented by a density matrix of the exponential form,
$$
\rho(\theta_o) = \frac{1}{Z(\theta_o)} \exp\left(\sum_i \theta_{o,i} H_i\right).
$$
To illustrate saddle points and information reservoirs, we need at least a 4-bin system. This creates a 3-dimensional parameter space where we can observe genuine saddle points.
Consider a 4-bin system parameterized by natural parameters $\theta_1$, $\theta_2$, and $\theta_3$ (with one constraint). A saddle point occurs where the gradient $\nabla_\theta S = 0$ but the Hessian has mixed eigenvalues: some positive, some negative.
At these points, the eigendecomposition of the Fisher information matrix $G(\theta)$ separates the parameter space into fast directions (large eigenvalues) and slow directions (small eigenvalues).
The eigenvectors of G(θ) at the saddle point determine which parameter combinations form information reservoirs.
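To make this concrete, here is a small sketch that evaluates $G(\theta)$ for the four-bin exponential family at an assumed low-entropy configuration and inspects its spectrum; the particular $\theta$ is illustrative only.

```python
import numpy as np

# Fisher information of the 4-bin exponential family at an assumed
# low-entropy configuration (theta chosen for illustration; it satisfies
# the sum-to-zero constraint used in this section)
theta = np.array([3.0, -1.0, -1.0, -1.0])
p = np.exp(theta - theta.max())
p = p / p.sum()
G = np.diag(p) - np.outer(p, p)

# Eigendecomposition: large eigenvalues mark fast directions, small
# eigenvalues mark slow, near-frozen combinations (candidate reservoirs)
eigvals, eigvecs = np.linalg.eigh(G)
for lam, v in zip(eigvals, eigvecs.T):
    print(f"eigenvalue {lam:.4f}, eigenvector {np.round(v, 3)}")
```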
import numpy as np
# Exponential family entropy functions for 4-bin system
def exponential_family_entropy(theta):
"""
Compute entropy of a 4-bin exponential family distribution
parameterized by natural parameters theta
"""
# Compute the log-partition function (normalization constant)
log_Z = np.log(np.sum(np.exp(theta)))
# Compute probabilities
p = np.exp(theta - log_Z)
# Compute entropy: -sum(p_i * log(p_i))
entropy = -np.sum(p * np.log(p), where=p>0)
return entropy
def entropy_gradient(theta):
"""
Compute the gradient of the entropy with respect to theta
"""
# Compute the log-partition function (normalization constant)
log_Z = np.log(np.sum(np.exp(theta)))
# Compute probabilities
p = np.exp(theta - log_Z)
    # Entropy gradient dS/dtheta = -G(theta) theta, where G = diag(p) - p p^T
    # is the Fisher information (second derivative of the log-partition function)
return -p*theta + p*(np.dot(p, theta))
# Add a gradient check function
def check_gradient(theta, epsilon=1e-6):
"""
Check the analytical gradient against numerical gradient
"""
# Compute analytical gradient
analytical_grad = entropy_gradient(theta)
# Compute numerical gradient
numerical_grad = np.zeros_like(theta)
for i in range(len(theta)):
theta_plus = theta.copy()
theta_plus[i] += epsilon
entropy_plus = exponential_family_entropy(theta_plus)
theta_minus = theta.copy()
theta_minus[i] -= epsilon
entropy_minus = exponential_family_entropy(theta_minus)
numerical_grad[i] = (entropy_plus - entropy_minus) / (2 * epsilon)
# Compare
print("Analytical gradient:", analytical_grad)
print("Numerical gradient:", numerical_grad)
print("Difference:", np.abs(analytical_grad - numerical_grad))
return analytical_grad, numerical_grad
# Project gradient to respect constraints (sum of theta is constant)
def project_gradient(theta, grad):
"""
Project gradient to ensure sum constraint is respected
"""
# Project to space where sum of components is zero
return grad - np.mean(grad)
# Perform gradient ascent on entropy
def gradient_ascent_four_bin(theta_init, steps=100, learning_rate=1):
"""
Perform gradient ascent on entropy for 4-bin system
"""
theta = theta_init.copy()
theta_history = [theta.copy()]
entropy_history = [exponential_family_entropy(theta)]
for _ in range(steps):
# Compute gradient
grad = entropy_gradient(theta)
proj_grad = project_gradient(theta, grad)
# Update parameters
theta += learning_rate * proj_grad
# Store history
theta_history.append(theta.copy())
entropy_history.append(exponential_family_entropy(theta))
return np.array(theta_history), np.array(entropy_history)
# Test the gradient calculation
test_theta = np.array([0.5, -0.3, 0.1, -0.3])
test_theta = test_theta - np.mean(test_theta) # Ensure constraint is satisfied
print("Testing gradient calculation:")
analytical_grad, numerical_grad = check_gradient(test_theta)
# Verify if we're ascending or descending
entropy_before = exponential_family_entropy(test_theta)
step_size = 0.01
test_theta_after = test_theta + step_size * analytical_grad
entropy_after = exponential_family_entropy(test_theta_after)
print(f"Entropy before step: {entropy_before}")
print(f"Entropy after step: {entropy_after}")
print(f"Change in entropy: {entropy_after - entropy_before}")
if entropy_after > entropy_before:
print("We are ascending the entropy gradient")
else:
print("We are descending the entropy gradient")
# Initialize with asymmetric distribution (away from saddle point)
theta_init = np.array([1.0, -0.5, -0.2, -0.3])
theta_init = theta_init - np.mean(theta_init) # Ensure constraint is satisfied
# Run gradient ascent
theta_history, entropy_history = gradient_ascent_four_bin(theta_init, steps=100, learning_rate=1.0)
# Create a grid for visualization
x = np.linspace(-2, 2, 100)
y = np.linspace(-2, 2, 100)
X, Y = np.meshgrid(x, y)
# Compute entropy at each grid point (with constraint on theta3 and theta4)
Z = np.zeros_like(X)
for i in range(X.shape[0]):
for j in range(X.shape[1]):
# Create full theta vector with constraint that sum is zero
theta1, theta2 = X[i,j], Y[i,j]
theta3 = -0.5 * (theta1 + theta2)
theta4 = -0.5 * (theta1 + theta2)
theta = np.array([theta1, theta2, theta3, theta4])
Z[i,j] = exponential_family_entropy(theta)
# Compute gradient field
dX = np.zeros_like(X)
dY = np.zeros_like(Y)
for i in range(X.shape[0]):
for j in range(X.shape[1]):
# Create full theta vector with constraint
theta1, theta2 = X[i,j], Y[i,j]
theta3 = -0.5 * (theta1 + theta2)
theta4 = -0.5 * (theta1 + theta2)
theta = np.array([theta1, theta2, theta3, theta4])
# Get full gradient and project
grad = entropy_gradient(theta)
proj_grad = project_gradient(theta, grad)
# Store first two components
dX[i,j] = proj_grad[0]
dY[i,j] = proj_grad[1]
# Normalize gradient vectors for better visualization
norm = np.sqrt(dX**2 + dY**2)
# Avoid division by zero
norm = np.where(norm < 1e-10, 1e-10, norm)
dX_norm = dX / norm
dY_norm = dY / norm
# A few gradient vectors for visualization
stride = 10
import matplotlib.pyplot as plt
import mlai.plot as plot
import mlai
fig = plt.figure(figsize=plot.big_wide_figsize)
# Create contour lines only (no filled contours)
contours = plt.contour(X, Y, Z, levels=15, colors='black', linewidths=0.8)
plt.clabel(contours, inline=True, fontsize=8, fmt='%.2f')
# Add gradient vectors (normalized for direction, but scaled by magnitude for visibility)
plt.quiver(X[::stride, ::stride], Y[::stride, ::stride],
dX_norm[::stride, ::stride], dY_norm[::stride, ::stride],
color='r', scale=30, width=0.003, scale_units='width')
# Plot the gradient ascent trajectory
plt.plot(theta_history[:, 0], theta_history[:, 1], 'b-', linewidth=2,
label='Gradient Ascent Path')
plt.scatter(theta_history[0, 0], theta_history[0, 1], color='green', s=100,
marker='o', label='Start')
plt.scatter(theta_history[-1, 0], theta_history[-1, 1], color='purple', s=100,
marker='*', label='End')
# Add labels and title
plt.xlabel('$\\theta_1$')
plt.ylabel('$\\theta_2$')
plt.title('Entropy Contours with Gradient Field')
# Mark the saddle point (approximately at origin for this system)
plt.scatter([0], [0], color='yellow', s=100, marker='*',
edgecolor='black', zorder=10, label='Saddle Point')
plt.legend()
mlai.write_figure(filename='simplified-saddle-point-example.svg',
directory = './information-game')
# Plot entropy evolution during gradient ascent
plt.figure(figsize=plot.big_figsize)
plt.plot(entropy_history)
plt.xlabel('Gradient Ascent Step')
plt.ylabel('Entropy')
plt.title('Entropy Evolution During Gradient Ascent')
plt.grid(True)
mlai.write_figure(filename='four-bin-entropy-evolution.svg',
directory = './information-game')
Figure: Visualisation of a saddle point projected down to two dimensions.
Figure: Entropy evolution during gradient ascent on the four-bin system.
An animation of the system’s evolution would show initial rapid movement along high-eigenvalue directions, progressive slowing in directions with low eigenvalues, and the formation of information reservoirs in the critically slowed directions. Parameter-capacity uncertainty emerges naturally at the saddle point.
$\tau(t)$ increases monotonically, preventing time reversal globally.
At points where the latent-to-active flow functional is locally extremal, the system may exhibit critical slowing, where information reservoir variables are slow relative to active variables. It may then be possible to separate the system entropy into an active-variable part, $I = S[\rho_X]$, and an “intrinsic information”, $J = S[\rho_{X|M}]$, allowing us to create an information analogue of B. Roy Frieden’s extreme physical information (Frieden (1998)), which allows derivation of locally valid differential equations that depend on the information topography.
We will determine constraints on the Fisher Information Matrix G(θ) that are consistent with the system’s unfolding rules and internal information geometry. We follow Jaynes (Jaynes, 1957) in solving a variational problem that captures the allowed structure of the system’s origin (minimal entropy) state.
Hirschman Jr (1957) established a connection between entropy and the Fourier transform, showing that the entropy of a function and its Fourier transform cannot both be arbitrarily small. This result, known as the Hirschman uncertainty principle, was later strengthened by Beckner (Beckner, 1975) who derived the optimal constant in the inequality. Białynicki-Birula and Mycielski (1975) extended these ideas to derive uncertainty relations for information entropy in wave mechanics.
From these results we know that there are fundamental limits to how we express the entropy of position and its conjugate space simultaneously. These limits inspire us to focus on the von Neumann entropy so that our system respects the Hirschman uncertainty principle.
A density matrix has the form
$$
\rho(\theta) = \frac{1}{Z(\theta)} \exp\left(\sum_i \theta_i H_i\right).
$$
The von Neumann entropy is given by
$$
S[\rho] = -\mathrm{tr}(\rho \log \rho).
$$
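As a concrete numerical illustration (a minimal sketch; the single-qubit Pauli generators and $\theta$ values are arbitrary choices for demonstration only), we can build $\rho(\theta)$ from Hermitian generators and compute its von Neumann entropy from the eigenvalues:

```python
import numpy as np

# Pauli matrices as example Hermitian generators H_i (illustrative choice)
H = [np.array([[0, 1], [1, 0]], dtype=complex),   # sigma_x
     np.array([[0, -1j], [1j, 0]]),               # sigma_y
     np.array([[1, 0], [0, -1]], dtype=complex)]  # sigma_z

def density_matrix(theta):
    """rho(theta) = exp(sum_i theta_i H_i) / Z(theta), via eigendecomposition."""
    A = sum(t * h for t, h in zip(theta, H))
    evals, evecs = np.linalg.eigh(A)
    unnorm = (evecs * np.exp(evals)) @ evecs.conj().T
    return unnorm / np.trace(unnorm).real

def von_neumann_entropy(rho):
    """S[rho] = -tr(rho log rho), computed from the eigenvalues of rho."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log(evals)))

theta = np.array([0.3, 0.0, -0.8])   # illustrative natural parameters
rho = density_matrix(theta)
print(von_neumann_entropy(rho))      # between 0 (pure) and log 2 (maximally mixed)
```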
We now derive the minimal entropy configuration inspired by Jaynes’s free-form variational approach. This enables us to derive the form of the density matrix directly from information-theoretic constraints (Jaynes, 1963).
[edit]
Jaynes suggested that statistical mechanics problems should be treated as problems of inference. Assign the probability distribution (or density matrix) that is maximally noncommittal with respect to missing information, subject to known constraints.
While Jaynes applied this idea to derive the maximum entropy configuration given constraints, here we adapt it to derive the minimum entropy configuration, under an assumption of zero initial entropy bounded by a maximum entropy of N bits.
Let $\rho$ be a density matrix describing the state of a system. The von Neumann entropy is
$$
S[\rho] = -\mathrm{tr}(\rho \log \rho).
$$
In the game we assume that the system begins in a state of minimal entropy, that the state cannot be a delta function (no singularities, so it must obey a resolution constraint $\varepsilon$), and that the entropy is bounded above by $N$ bits: $S[\rho] \leq N$.
We apply a variational principle in which we minimise
$$
S[\rho] = -\mathrm{tr}(\rho \log \rho)
$$
subject to the following constraints. The first constraint is normalization, $\mathrm{tr}(\rho) = 1$. The resolution constraint is motivated by the entropy bound, $S[\rho] \leq N$: the state cannot be arbitrarily sharp, which we impose as lower bounds on the variances of the conjugate observables $\hat{Z}$ and $\hat{P}$.
We introduce Lagrange multipliers $\lambda_0$, $\lambda_z$, $\lambda_p$ for these constraints and define the Lagrangian
$$
\mathcal{L}[\rho] = -\mathrm{tr}(\rho \log \rho) + \lambda_0\big(\mathrm{tr}(\rho) - 1\big) - \lambda_z\,\mathrm{tr}(\rho \hat{Z}^2) - \lambda_p\,\mathrm{tr}(\rho \hat{P}^2).
$$
Taking the functional derivative with respect to $\rho$ and setting it to zero,
$$
\frac{\delta \mathcal{L}}{\delta \rho} = -\log \rho - 1 - \lambda_z \hat{Z}^2 - \lambda_p \hat{P}^2 + \lambda_0 = 0,
$$
and solving for $\rho$ gives
$$
\rho = \frac{1}{Z}\exp\left(-\lambda_z \hat{Z}^2 - \lambda_p \hat{P}^2\right),
$$
where the normalization $Z$ absorbs the constant $\exp(\lambda_0 - 1)$.
The Lagrange multipliers $\lambda_z, \lambda_p$ enforce lower bounds on variance. These define the natural parameters as $\theta_z = -\lambda_z$ and $\theta_p = -\lambda_p$ in the exponential family form $\rho(\theta) \propto \exp(\theta \cdot H)$. The form of $\rho$ is a density matrix. The curvature (second derivative) of $\log Z(\theta)$ gives the Fisher information matrix $G(\theta)$. Steepest ascent trajectories in $\theta$ space will trace the system’s entropy dynamics.
Next we compute $G(\theta)$ from $\log Z(\theta)$ to explore the information geometry. From this we should verify that the following conditions hold,
$$
\left|\,[G(\theta)\theta]_i\,\right| < \varepsilon \quad \text{for all } i.
$$
The Hermitian generators must include at least one non-commuting observable pair, $[H_i, H_j] \neq 0$.
We can then use ε and N to define initial thresholds and maximum resolution and examine how variables decouple and how saddle-like regions emerge as the landscape unfolds through gradient ascent.
This constrained minimization problem yields the structure of the initial density matrix $\rho(\theta_o)$, the permissible curvature geometry $G(\theta_o)$, and a constraint-consistent basis of observables $\{H_i\}$ that have a quadratic form. This ensures the system begins in a regular, latent, low-entropy state.
This is the configuration from which entropy ascent and symmetry-breaking transitions emerge.
Barato, A.C., Seifert, U., 2014. Stochastic thermodynamics with information reservoirs. Physical Review E 90, 042150. https://doi.org/10.1103/PhysRevE.90.042150
Beckner, W., 1975. Inequalities in Fourier analysis. Annals of Mathematics 102, 159–182. https://doi.org/10.2307/1970980
Białynicki-Birula, I., Mycielski, J., 1975. Uncertainty relations for information entropy in wave mechanics. Communications in Mathematical Physics 44, 129–132. https://doi.org/10.1007/BF01608825
Frieden, B.R., 1998. Physics from Fisher information: A unification. Cambridge University Press, Cambridge, UK. https://doi.org/10.1017/CBO9780511622670
Hirschman Jr, I.I., 1957. A note on entropy. American Journal of Mathematics 79, 152–156. https://doi.org/10.2307/2372390
Jaynes, E.T., 1963. Information theory and statistical mechanics, in: Ford, K.W. (Ed.), Brandeis University Summer Institute Lectures in Theoretical Physics, Vol. 3: Statistical Physics. W. A. Benjamin, Inc., New York, pp. 181–218.
Jaynes, E.T., 1957. Information theory and statistical mechanics. Physical Review 106, 620–630. https://doi.org/10.1103/PhysRev.106.620
Parrondo, J.M.R., Horowitz, J.M., Sagawa, T., 2015. Thermodynamics of information. Nature Physics 11, 131–139. https://doi.org/10.1038/nphys3230