This tutorial defines and illustrates reasoning in LTN. It assumes basic familiarity with the other parts of LTN.
The essence of reasoning is to determine if a closed formula $\phi$ is the logical consequence of a knowledgebase $(\mathcal{K},\mathcal{G}_\theta,\Theta)$, where $\mathcal{K}$ denotes the set of rules in the knowledgebase and $\mathcal{G}_\theta$ denotes a grounding that depends on some parameters $\theta \in \Theta$.
The notion of logical consequence is adapted to Real Logic as follows: $\phi$ is a logical consequence of $(\mathcal{K},\mathcal{G}_\theta,\Theta)$ at a satisfaction threshold $q$ (with $\frac{1}{2} < q < 1$), written $(\mathcal{K},\mathcal{G}_\theta,\Theta) \models_q \phi$, if and only if every grounding $\mathcal{G}_\theta$, $\theta \in \Theta$, that satisfies the knowledgebase at level $q$ (that is, $\mathcal{G}_\theta(\mathcal{K}) \geq q$) also satisfies $\phi$ at level $q$ (that is, $\mathcal{G}_\theta(\phi) \geq q$).
Checking logical consequence by direct application of this definition would require querying the truth value of $\phi$ under a potentially infinite set of groundings, which is intractable. In practice, one therefore resorts to approximate approaches.
In this tutorial, we illustrate one such approach, reasoning by refutation: searching for a counterexample, that is, a grounding that satisfies $\mathcal{K}$ at level $q$ but does not satisfy $\phi$ at level $q$.
import ltn
import tensorflow as tf
We illustrate reasoning on the following toy example: $$ (A \lor B) \models_q A \ ? $$ where $A$ and $B$ are two propositional variables, and $\frac{1}{2} < q < 1$ is the satisfaction threshold.
We define $\mathcal{K}=\{A \lor B\}$ and $\phi=A$.
# two trainable propositional variables, initialized with truth value 0
A = ltn.Proposition(0.,trainable=True)
B = ltn.Proposition(0.,trainable=True)
# disjunction grounded as the probabilistic sum: u OR v = u + v - u*v
Or = ltn.Wrapper_Connective(ltn.fuzzy_ops.Or_ProbSum())
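Before setting up the search, note that a counterexample can already be exhibited by hand. With the probabilistic sum used for the disjunction, $\mathcal{G}(A \lor B) = \mathcal{G}(A) + \mathcal{G}(B) - \mathcal{G}(A)\,\mathcal{G}(B)$, so the grounding $\mathcal{G}(A)=0$, $\mathcal{G}(B)=1$ gives
$$ \mathcal{G}(A \lor B) = 0 + 1 - 0 \cdot 1 = 1 \geq q, \qquad \mathcal{G}(A) = 0 < q, $$
that is, $A \lor B \nvDash_q A$. The optimization below recovers such a counterexample automatically.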
def axioms():
    # satisfaction level of the knowledgebase K = {A or B}
    kb = Or(A,B)
    sat = kb.tensor
    return sat

def phi():
    # satisfaction level of the queried formula phi = A
    sat = A.tensor
    return sat
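As a quick optional check, the two functions can already be queried before any training to inspect the satisfaction levels of the untrained grounding:

# satisfaction levels of the untrained grounding
print("Sat Level Knowledgebase: %.3f" % axioms())
print("Sat Level phi: %.3f" % phi())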
The goal is to find a grounding that satisfies $\mathcal{K}$ but does not satisfy $\phi$. One can perform a directed search for such a counterexample by minimizing $\mathcal{G}_\theta(\phi)$ while imposing a constraint that invalidates results where $\mathcal{G}_\theta(\mathcal{K})<q$.
Let us define $\mathrm{penalty}(\mathcal{G}_\theta,q)=\begin{cases} c \ \text{if}\ \mathcal{G}_\theta(\mathcal{K}) < q,\\ 0 \ \text{otherwise}, \end{cases}$ where $c>1$ and set the objective: $$ \mathcal{G}^\ast = \mathrm{argmin}_{\mathcal{G}_\theta} (\mathcal{G}_\theta(\phi) + \mathrm{penalty}(\mathcal{G}_\theta,q)) $$
Since the penalty $c>1$ is larger than any potential reduction in $\mathcal{G}_\theta(\phi)$ (which is at most $1$), the optimal grounding $\mathcal{G}^\ast$ will prioritize satisfying $\mathcal{G}^\ast(\mathcal{K}) \geq q$ before reducing $\mathcal{G}^\ast(\phi)$.
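To make this priority explicit, compare the objective values on the two sides of the constraint:
$$ \mathcal{G}_\theta(\mathcal{K}) \geq q \;\Rightarrow\; \mathcal{G}_\theta(\phi) + \mathrm{penalty}(\mathcal{G}_\theta,q) = \mathcal{G}_\theta(\phi) \leq 1, \qquad \mathcal{G}_\theta(\mathcal{K}) < q \;\Rightarrow\; \mathcal{G}_\theta(\phi) + \mathrm{penalty}(\mathcal{G}_\theta,q) = \mathcal{G}_\theta(\phi) + c > 1. $$
Hence every grounding that satisfies the knowledgebase at level $q$ achieves a strictly lower objective than any grounding that does not.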
However, $\mathrm{penalty}(\mathcal{G}_\theta,q)$ is piecewise constant in $\theta$ (its gradients are zero almost everywhere), so it cannot be minimized directly by gradient descent. Instead, one approximates the penalty with a soft constraint.
We use $\mathtt{elu}(\alpha,\beta (q-\mathcal{G}_\theta(\mathcal{K})))=\begin{cases} \beta (q-\mathcal{G}_\theta(\mathcal{K}))\ &\text{if}\ \mathcal{G}_\theta(\mathcal{K}) \leq q,\\ \alpha \left(e^{\beta(q-\mathcal{G}_\theta(\mathcal{K}))}-1\right) \ &\text{otherwise}, \end{cases}$ where $\alpha \geq 0$ and $\beta \geq 0$ are two hyper-parameters:
- Setting $\beta$ high ensures that the gradients with respect to $\mathcal{G}_\theta(\mathcal{K})$ are large in absolute value when the knowledgebase is not satisfied; the minimizer then prioritizes increasing the satisfaction of the knowledgebase.
- Setting $\alpha$ low but non-zero ensures that, while the penalty plays an insignificant role once the knowledgebase is satisfied, the gradients do not vanish entirely.
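The short snippet below is a minimal sketch of this soft penalty, evaluated for a few satisfaction levels of the knowledgebase and using the same hyper-parameter values as the training cell below: the penalty is large and positive when $\mathcal{G}_\theta(\mathcal{K})$ is well below $q$, and small (bounded below by $-\alpha$) once the threshold is reached.

import tensorflow as tf

alpha, beta, q = 0.05, 10, 0.95  # same values as in the training cell below

# soft penalty: elu(beta*(q - sat_KB), alpha)
for sat_KB in [0.2, 0.8, 0.95, 1.0]:
    penalty = tf.keras.activations.elu(beta * (q - tf.constant(sat_KB)), alpha=alpha)
    print("sat_KB = %.2f -> penalty = %.3f" % (sat_KB, float(penalty)))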
trainable_variables = ltn.as_tensors([A,B])
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
# hyperparameters of the soft constraint
alpha = 0.05
beta = 10
# satisfaction threshold
q = 0.95
for epoch in range(4000):
    with tf.GradientTape() as tape:
        sat_KB = axioms()
        sat_phi = phi()
        # soft-constraint penalty: large when the knowledgebase is not satisfied at level q
        penalty = tf.keras.activations.elu(beta*(q-sat_KB),alpha=alpha)
        # minimize the satisfaction of phi while keeping the knowledgebase satisfied
        loss = sat_phi + penalty
    grads = tape.gradient(loss, trainable_variables)
    optimizer.apply_gradients(zip(grads, trainable_variables))
    if epoch%400 == 0:
        print("Epoch %d: Sat Level Knowledgebase %.3f Sat Level phi %.3f"%(epoch, axioms(), phi()))
print("Training finished at Epoch %d with Sat Level Knowledgebase %.3f Sat Level phi %.3f"%(epoch, axioms(), phi()))
Epoch 0: Sat Level Knowledgebase 0.002 Sat Level phi 0.001
Epoch 400: Sat Level Knowledgebase 0.591 Sat Level phi 0.358
Epoch 800: Sat Level Knowledgebase 0.862 Sat Level phi 0.610
Epoch 1200: Sat Level Knowledgebase 0.950 Sat Level phi 0.705
Epoch 1600: Sat Level Knowledgebase 0.950 Sat Level phi 0.615
Epoch 2000: Sat Level Knowledgebase 0.951 Sat Level phi 0.485
Epoch 2400: Sat Level Knowledgebase 0.962 Sat Level phi 0.319
Epoch 2800: Sat Level Knowledgebase 0.998 Sat Level phi 0.116
Epoch 3200: Sat Level Knowledgebase 1.000 Sat Level phi 0.000
Epoch 3600: Sat Level Knowledgebase 1.000 Sat Level phi 0.000
Training finished at Epoch 3999 with Sat Level Knowledgebase 1.000 Sat Level phi 0.000
At the end of training, the optimizer has found a grounding that satisfies $A \lor B$ but does not satisfy $A$ (given the satisfaction threshold $q=0.95$). This grounding is a counterexample to the logical consequence, proving that $A \lor B \nvDash_q A$.
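As a final check, one can read off the counterexample grounding found by the optimizer. Given the final satisfaction levels above and the probabilistic-sum semantics of the disjunction, one expects $\mathcal{G}(A)$ close to $0$ and $\mathcal{G}(B)$ close to $1$:

# truth values of the two propositions after training
print("G(A) = %.3f, G(B) = %.3f" % (A.tensor, B.tensor))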