For references, see https://github.com/cheuktingli/psitip#references
%matplotlib inline
from psitip import *
PsiOpts.setting(
    solver = "ortools.GLOP",      # Set linear programming solver
    str_style = "std",            # Conventional notations in output
    proof_note_color = "blue",    # Reasons in proofs are blue
    solve_display_reg = True,     # Display claims in solve commands
    random_seed = 4321            # Random seed for example searching
)
X, Y, Z, W, U, V, M, S = rv("X, Y, Z, W, U, V, M, S") # Declare random variables
H(X+Y) - H(X) - H(Y) # Simplify H(X,Y) - H(X) - H(Y)
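# A hedged sanity check of the identity behind the line above: by the chain
# rule, the difference should equal -I(X;Y)
bool(H(X+Y) - H(X) - H(Y) == -I(X & Y))   # True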
bool(H(X) + I(Y & Z | X) >= I(Y & Z)) # Check H(X) + I(Y;Z|X) >= I(Y;Z)
# Output: True
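# Conversely, a claim that is not always true evaluates to False; a minimal
# hedged sketch (counterexample: Y = Z = X with H(X) > 0)
bool(I(X & Y) <= H(X | Z))   # False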
# Prove an implication
(markov(X+W, Y, Z) >> (I(X & W | Y) / 2 <= H(X | Z))).solve(full=True)
# Information diagram that shows the above implication
(markov(X+W, Y, Z) >> (I(X & W | Y) / 2 <= H(X | Z))).venn()
# Disprove an implication by a counterexample
(markov(X+W, Y, Z) >> (I(X & W | Y) * 3 / 2 <= H(X | Z))).solve(full=True)
# The condition "there exists Y independent of X such that
# X-Y-Z forms a Markov chain" can be simplified to "X,Z independent"
(markov(X, Y, Z) & indep(X, Y)).exists(Y).simplified()
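# A hedged check of one direction of the simplification above: the
# existential region should imply that X, Z are independent
((markov(X, Y, Z) & indep(X, Y)).exists(Y) >> indep(X, Z)).solve()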
A, B, C = rv("A, B, C", alg="abelian") # Abelian-group-valued RVs
# Entropy of sum (or product) is submodular [Madiman 2008]
(indep(A, B, C) >> (H(A*B*C) + H(B) <= H(A*B) + H(B*C))).solve(full=True)
# Entropy form of Ruzsa triangle inequality [Ruzsa 1996], [Tao 2010]
(indep(A, B, C) >> (H(A/C) <= H(A/B) + H(B/C) - H(B))).solve(full=True)
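# A simpler hedged consequence of the same group structure (assuming the
# solver handles it like the inequalities above): the sum of two independent
# group-valued RVs has entropy at least that of each summand
(indep(A, B) >> (H(A*B) >= emax(H(A), H(B)))).solve()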
# Define Gács-Körner common information [Gács-Körner 1973]
gkci = ((H(V|X) == 0) & (H(V|Y) == 0)).maximum(H(V), V)
# Define Wyner's common information [Wyner 1975]
wci = markov(X, U, Y).minimum(I(U & X+Y), U)
# Define common entropy [Kumar-Li-El Gamal 2014]
eci = markov(X, U, Y).minimum(H(U), U)
(gkci <= I(X & Y)).solve() # Gács-Körner <= I(X;Y)
(I(X & Y) <= wci).solve() # I(X;Y) <= Wyner
(wci <= emin(H(X), H(Y))).solve() # Wyner <= min(H(X),H(Y))
(gkci <= wci).solve(full=True) # Output proof of Gács-Körner <= Wyner
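# Since I(U;X,Y) <= H(U) pointwise, Wyner's CI is at most the common
# entropy; a hedged extra check in the same style as the lines above
(wci <= eci).solve()   # Wyner <= common entropy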
# Automatically discover inequalities among quantities
universe().discover([X, Y, gkci, wci, eci])
X, Y, Z = rv("X, Y, Z")
M1, M2 = rv_array("M", 1, 3)
R1, R2 = real_array("R", 1, 3)
model = CodingModel()
model.add_node(M1+M2, X, label="Enc") # Encoder maps M1,M2 to X
model.add_edge(X, Y) # Channel X -> Y -> Z
model.add_edge(Y, Z)
model.add_node(Y, M1, label="Dec 1") # Decoder1 maps Y to M1
model.add_node(Z, M2, label="Dec 2") # Decoder2 maps Z to M2
model.set_rate(M1, R1) # Rate of M1 is R1
model.set_rate(M2, R2) # Rate of M2 is R2
model.graph() # Draw diagram
# Inner bound via [Lee-Chung 2015]; gives the superposition coding region [Bergmans 1973], [Gallager 1974]
r = model.get_inner(is_proof=True) # Display codebook, encoding and decoding info
r.display(note=True)
# Automatic outer bound with 1 auxiliary, gives superposition region
model.get_outer(1)
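# Allowing more auxiliaries can only tighten the outer bound, at higher
# computational cost; a hedged variant of the call above
model.get_outer(2)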
# Converse proof, print auxiliary random variables
(model.get_outer() >> r).solve(display_reg=False)
# Output the converse proof
(model.get_outer(is_proof=True) >> r).proof()
r.maximum(R1 + R2, [R1, R2]) # Max sum rate
r.maximum(emin(R1, R2), [R1, R2]) # Max symmetric rate
r.exists(R1) # Eliminate R1, same as r.projected(R2)
# Eliminate Z, i.e., taking union of the region over all choices of Z
# The program correctly deduces that it suffices to consider Z = Y
r.exists(Z).simplified()
# Zhang-Yeung inequality [Zhang-Yeung 1998] cannot be proved by Shannon-type inequalities
(2*I(Z&W) <= I(X&Y) + I(X & Z+W) + 3*I(Z&W | X) + I(Z&W | Y)).solve()
# Using copy lemma [Zhang-Yeung 1998], [Dougherty-Freiling-Zeger 2011]
# You may use the built-in "with copylem().assumed():" instead of the below
with eqdist([X, Y, U], [X, Y, Z]).exists(U).forall(X+Y+Z).assumed():
    # Prove Zhang-Yeung inequality, and print how the copy lemma is used
    display((2*I(Z&W) <= I(X&Y) + I(X & Z+W) + 3*I(Z&W | X) + I(Z&W | Y)).solve())
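# Equivalently, using the built-in copy lemma mentioned above
with copylem().assumed():
    display((2*I(Z&W) <= I(X&Y) + I(X & Z+W) + 3*I(Z&W | X) + I(Z&W | Y)).solve())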
# State the copy lemma
r = eqdist([X, Y, U], [X, Y, Z]).exists(U)
# Automatically discover non-Shannon-type inequalities using copy lemma
r.discover([X, Y, Z, W]).simplified()
References (see also https://github.com/cheuktingli/psitip#references)
Madiman, Mokshay. "On the entropy of sums." 2008 IEEE Information Theory Workshop. IEEE, 2008.
Ruzsa, Imre Z. "Sums of finite sets." Number Theory: New York Seminar 1991-1995. Springer, New York, NY, 1996.
Tao, Terence. "Sumset and inverse sumset theory for Shannon entropy." Combinatorics, Probability and Computing 19.4 (2010): 603-639.
Gács, Peter, and János Körner. "Common information is far less than mutual information." Problems of Control and Information Theory 2.2 (1973): 149-162.
Wyner, A. D. "The common information of two dependent random variables." IEEE Transactions on Information Theory 21.2 (1975): 163-179.
Kumar, G. R., C. T. Li, and A. El Gamal. "Exact common information." 2014 IEEE International Symposium on Information Theory. IEEE, 2014. 161-165.
Lee, Si-Hyeon, and Sae-Young Chung. "A unified approach for network information theory." 2015 IEEE International Symposium on Information Theory (ISIT). IEEE, 2015.
Bergmans, P. "Random coding theorem for broadcast channels with degraded components." IEEE Transactions on Information Theory 19.2 (1973): 197-207.
Gallager, Robert G. "Capacity and coding for degraded broadcast channels." Problemy Peredachi Informatsii 10.3 (1974): 3-14.
Zhang, Zhen, and Raymond W. Yeung. "On characterization of entropy function via information inequalities." IEEE Transactions on Information Theory 44.4 (1998): 1440-1452.
Dougherty, Randall, Chris Freiling, and Kenneth Zeger. "Non-Shannon information inequalities in four random variables." arXiv preprint arXiv:1104.3602 (2011).