MNIST Digit Addition Problem

Consider the task of learning a classifier $\mathtt{addition(X,Y,N)}$, where $\mathtt{X}$ and $\mathtt{Y}$ are images of digits (the MNIST data set will be used) and $\mathtt{N}$ is a natural number corresponding to the sum of these digits. The classifier should return an estimate of the validity of the addition ($0$ for invalid, $1$ for valid).

For instance, if $\mathtt{X}$ is an image of a 0 and $\mathtt{Y}$ is an image of a 9:

  • if $\mathtt{N} = 9$, then the addition is valid;
  • if $\mathtt{N} = 4$, then the addition is not valid.

A natural approach is to first 1) learn a single-digit classifier, and then 2) benefit from knowledge readily available about the properties of addition. For instance, assuming a predicate $\mathrm{digit}(x,d)$ that gives the likelihood of an image $x$ depicting the digit $d$, one could query with LTN:
$$ \exists d_1,d_2 : d_1+d_2= \mathtt{N} \ (\mathrm{digit}(\mathtt{X},d_1)\land \mathrm{digit}(\mathtt{Y},d_2)) $$ and use the satisfaction of this query as the output of $\mathtt{addition(X,Y,N)}$.

The challenge is the following:

  • We provide, in the data, pairs of images $\mathtt{X}$, $\mathtt{Y}$ and the result of the addition $\mathtt{N}$ (the final label);
  • We do not provide the intermediate labels, i.e. the correct digits for $d_1$, $d_2$.

Regardless, it is possible to use the formula above as background knowledge to train $\mathrm{digit}$ with LTN, as sketched below. In contrast, a standard neural network baseline cannot incorporate such intermediate components as naturally.
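To make the idea concrete, here is a minimal sketch (plain TensorFlow, not the LTN API) of how the satisfaction of such a query could be computed from the softmax outputs of a digit classifier. The function addition_validity and the inputs digit_probs_x, digit_probs_y are hypothetical names; the notebook below instead uses LTN's fuzzy connectives and aggregators for the conjunction and the existential quantifier.

import tensorflow as tf

def addition_validity(digit_probs_x, digit_probs_y, n):
    """Sketch: score of addition(X, Y, N) from two 10-way softmax outputs."""
    scores = []
    for d1 in range(10):
        d2 = n - d1
        if 0 <= d2 <= 9:
            # conjunction of the two digit hypotheses as a product of likelihoods
            scores.append(digit_probs_x[d1] * digit_probs_y[d2])
    # existential quantification approximated here by a sum over the
    # mutually exclusive digit assignments (LTN uses a pMean aggregator instead)
    return tf.reduce_sum(tf.stack(scores))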

In [1]:
import tensorflow as tf
import logictensornetworks as ltn
import baselines, data, commons
import matplotlib.pyplot as plt

Data

Dataset of images for the digits X and Y, and their label Z such that X+Y=Z (Z plays the role of $\mathtt{N}$ above).

In [2]:
ds_train, ds_test = data.get_mnist_op_dataset(
        count_train=3000,
        count_test=1000,
        buffer_size=3000,
        batch_size=16,
        n_operands=2,
        op=lambda args: args[0]+args[1])

# Visualize one example
x, y, z = next(ds_train.as_numpy_iterator())
plt.subplot(121)
plt.imshow(x[0][:,:,0])
plt.subplot(122)
plt.imshow(y[0][:,:,0])
print("Result label is %i" % z[0])
Result label is 7
[Matplotlib output: the two operand digit images x[0] and y[0], shown side by side]
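For reference, here is a rough sketch of how a comparable dataset of operand images and sum labels could be assembled directly with tf.data. This is an assumption about the shape of the data, not the implementation of data.get_mnist_op_dataset, which may differ (e.g. in pairing, shuffling or normalisation).

import tensorflow as tf

# Rough sketch (assumption): pair up MNIST images and label each pair with the digit sum.
(imgs, labels), _ = tf.keras.datasets.mnist.load_data()
imgs = imgs[..., None].astype("float32") / 255.0        # (60000, 28, 28, 1), values in [0, 1]
x_imgs, y_imgs = imgs[0::2][:3000], imgs[1::2][:3000]   # two operand images per example
z = labels[0::2][:3000] + labels[1::2][:3000]           # result label: sum of the two digits
ds_sketch = tf.data.Dataset.from_tensor_slices((x_imgs, y_imgs, z)).shuffle(3000).batch(16)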

LTN

In [3]:
logits_model = baselines.SingleDigit()
Digit = ltn.Predicate(ltn.utils.LogitsToPredicateModel(logits_model))

d1 = ltn.Variable("digits1", range(10))
d2 = ltn.Variable("digits2", range(10))

Not = ltn.Wrapper_Connective(ltn.fuzzy_ops.Not_Std())
And = ltn.Wrapper_Connective(ltn.fuzzy_ops.And_Prod())
Or = ltn.Wrapper_Connective(ltn.fuzzy_ops.Or_ProbSum())
Implies = ltn.Wrapper_Connective(ltn.fuzzy_ops.Implies_Reichenbach())
Forall = ltn.Wrapper_Quantifier(ltn.fuzzy_ops.Aggreg_pMeanError(),semantics="forall")
Exists = ltn.Wrapper_Quantifier(ltn.fuzzy_ops.Aggreg_pMean(),semantics="exists")
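As a rough scalar illustration (plain Python, not the ltn.fuzzy_ops implementations, which operate element-wise on tensors), the chosen connectives correspond to the following fuzzy semantics on truth degrees in [0, 1]:

a, b = 0.8, 0.6
not_std = 1 - a                       # standard negation, ≈ 0.2
and_prod = a * b                      # product t-norm, ≈ 0.48
or_probsum = a + b - a * b            # probabilistic sum, ≈ 0.92
implies_reichenbach = 1 - a + a * b   # Reichenbach implication, ≈ 0.68
print(not_std, and_prod, or_probsum, implies_reichenbach)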

Notice the use of Diag: when grounding $x$, $y$, $n$ with three sequences of values, the $i$-th examples of each variable are matched together. That is, (images_x[i], images_y[i], labels[i]) is a tuple from our dataset of valid additions. With diagonal quantification, LTN aggregates over pairs of images and their corresponding result, rather than over every combination of images and results.

Notice also the guarded quantification: by quantifying only over the "intermediate labels" (not given during training) that can add up to the result label (given during training), we incorporate symbolic information into the system.
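As a plain-Python illustration (not part of the LTN API), for a given result label the guard keeps only the digit pairs consistent with it:

# Toy illustration of the guard: for result label 7, only these (d1, d2)
# pairs survive the mask d1 + d2 == label.
label = 7
valid_pairs = [(d1, d2) for d1 in range(10) for d2 in range(10) if d1 + d2 == label]
print(valid_pairs)  # [(0, 7), (1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1), (7, 0)]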

In [4]:
# Operations used in the guarded quantification mask: d1 + d2 must equal the result label
add = ltn.Function.Lambda(lambda inputs: inputs[0]+inputs[1])
equals = ltn.Predicate.Lambda(lambda inputs: inputs[0] == inputs[1])

### Axioms
@tf.function
def axioms(images_x, images_y, labels_z, p_schedule=tf.constant(2.)):
    images_x = ltn.Variable("x", images_x)
    images_y = ltn.Variable("y", images_y)
    labels_z = ltn.Variable("z", labels_z)
    axiom = Forall(
            ltn.diag(images_x,images_y,labels_z),
            Exists(
                (d1,d2),
                And(Digit([images_x,d1]),Digit([images_y,d2])),
                mask=equals([add([d1,d2]), labels_z]),
                p=p_schedule
            ),
            p=2
        )
    sat = axiom.tensor
    return sat

images_x, images_y, labels_z = next(ds_train.as_numpy_iterator())
axioms(images_x, images_y, labels_z)
Out[4]:
<tf.Tensor: shape=(), dtype=float32, numpy=0.010769486>

Optimizer, training steps and metrics

In [5]:
optimizer = tf.keras.optimizers.Adam(0.001)
metrics_dict = {
    'train_loss': tf.keras.metrics.Mean(name="train_loss"),
    'train_accuracy': tf.keras.metrics.Mean(name="train_accuracy"),
    'test_loss': tf.keras.metrics.Mean(name="test_loss"),
    'test_accuracy': tf.keras.metrics.Mean(name="test_accuracy")    
}

@tf.function
def train_step(images_x, images_y, labels_z, **parameters):
    # loss
    with tf.GradientTape() as tape:
        loss = 1.- axioms(images_x, images_y, labels_z, **parameters)
    gradients = tape.gradient(loss, logits_model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, logits_model.trainable_variables))
    metrics_dict['train_loss'](loss)
    # accuracy
    predictions_x = tf.argmax(logits_model(images_x),axis=-1)
    predictions_y = tf.argmax(logits_model(images_y),axis=-1)
    predictions_z = predictions_x + predictions_y
    match = tf.equal(predictions_z,tf.cast(labels_z,predictions_z.dtype))
    metrics_dict['train_accuracy'](tf.reduce_mean(tf.cast(match,tf.float32)))
    
@tf.function
def test_step(images_x, images_y, labels_z, **parameters):
    # loss
    loss = 1.- axioms(images_x, images_y, labels_z, **parameters)
    metrics_dict['test_loss'](loss)
    # accuracy
    predictions_x = tf.argmax(logits_model(images_x),axis=-1)
    predictions_y = tf.argmax(logits_model(images_y),axis=-1)
    predictions_z = predictions_x + predictions_y
    match = tf.equal(predictions_z,tf.cast(labels_z,predictions_z.dtype))
    metrics_dict['test_accuracy'](tf.reduce_mean(tf.cast(match,tf.float32)))
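For completeness, a trained single-digit classifier can also be used directly at inference time to decide the addition task, mirroring the accuracy computation above. The helper name predict_addition_is_valid is hypothetical and not part of the notebook's modules.

def predict_addition_is_valid(images_x, images_y, labels_z):
    # Predict each digit independently, then check whether the predicted sum
    # matches the candidate result label (same logic as the accuracy metrics).
    pred_x = tf.argmax(logits_model(images_x), axis=-1)
    pred_y = tf.argmax(logits_model(images_y), axis=-1)
    return tf.equal(pred_x + pred_y, tf.cast(labels_z, pred_x.dtype))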

Training

In [6]:
from collections import defaultdict

scheduled_parameters = defaultdict(lambda: {})
for epoch in range(0,4):
    scheduled_parameters[epoch] = {"p_schedule":tf.constant(1.)}
for epoch in range(4,8):
    scheduled_parameters[epoch] = {"p_schedule":tf.constant(2.)}
for epoch in range(8,12):
    scheduled_parameters[epoch] = {"p_schedule":tf.constant(4.)}
for epoch in range(12,20):
    scheduled_parameters[epoch] = {"p_schedule":tf.constant(6.)}
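The schedule gradually increases the exponent p of the existential aggregator. Assuming the usual generalised-mean definition of pMean, a larger p pushes the aggregation towards the maximum: early epochs spread gradients over all digit pairs allowed by the guard, while later epochs focus on the most promising pair. A small NumPy sketch of this behaviour:

import numpy as np

def pmean(truth_values, p):
    # Generalised mean (assumed definition of the pMean aggregator).
    xs = np.asarray(truth_values, dtype=np.float64)
    return np.mean(xs ** p) ** (1.0 / p)

candidates = [0.1, 0.2, 0.9]  # truth degrees of the candidate digit pairs
for p in [1, 2, 4, 6]:
    print(f"p={p}: pMean={pmean(candidates, p):.3f}")
# The aggregate moves from the arithmetic mean (p=1) towards max(candidates) as p grows.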
In [7]:
commons.train(
    20,
    metrics_dict,
    ds_train,
    ds_test,
    train_step,
    test_step,
    scheduled_parameters=scheduled_parameters
)
Epoch 0, train_loss: 0.9904, train_accuracy: 0.0522, test_loss: 0.9914, test_accuracy: 0.0466
Epoch 1, train_loss: 0.9904, train_accuracy: 0.0522, test_loss: 0.9916, test_accuracy: 0.0456
Epoch 2, train_loss: 0.9703, train_accuracy: 0.1918, test_loss: 0.9457, test_accuracy: 0.3284
Epoch 3, train_loss: 0.9326, train_accuracy: 0.4192, test_loss: 0.9338, test_accuracy: 0.3760
Epoch 4, train_loss: 0.8543, train_accuracy: 0.4292, test_loss: 0.8611, test_accuracy: 0.3909
Epoch 5, train_loss: 0.8462, train_accuracy: 0.4511, test_loss: 0.8611, test_accuracy: 0.3899
Epoch 6, train_loss: 0.8377, train_accuracy: 0.4664, test_loss: 0.8620, test_accuracy: 0.3829
Epoch 7, train_loss: 0.8362, train_accuracy: 0.4691, test_loss: 0.8589, test_accuracy: 0.3978
Epoch 8, train_loss: 0.7728, train_accuracy: 0.4774, test_loss: 0.8083, test_accuracy: 0.3899
Epoch 9, train_loss: 0.7698, train_accuracy: 0.4761, test_loss: 0.8011, test_accuracy: 0.4008
Epoch 10, train_loss: 0.7600, train_accuracy: 0.4927, test_loss: 0.7937, test_accuracy: 0.4127
Epoch 11, train_loss: 0.7565, train_accuracy: 0.4977, test_loss: 0.7985, test_accuracy: 0.4067
Epoch 12, train_loss: 0.7305, train_accuracy: 0.5017, test_loss: 0.7805, test_accuracy: 0.4028
Epoch 13, train_loss: 0.7299, train_accuracy: 0.5073, test_loss: 0.7729, test_accuracy: 0.4157
Epoch 14, train_loss: 0.7295, train_accuracy: 0.4993, test_loss: 0.7763, test_accuracy: 0.4157
Epoch 15, train_loss: 0.7179, train_accuracy: 0.5196, test_loss: 0.7700, test_accuracy: 0.4206
Epoch 16, train_loss: 0.7170, train_accuracy: 0.5213, test_loss: 0.7727, test_accuracy: 0.4127
Epoch 17, train_loss: 0.7166, train_accuracy: 0.5209, test_loss: 0.7745, test_accuracy: 0.4097
Epoch 18, train_loss: 0.7096, train_accuracy: 0.5336, test_loss: 0.7771, test_accuracy: 0.4028
Epoch 19, train_loss: 0.7001, train_accuracy: 0.5465, test_loss: 0.7755, test_accuracy: 0.4067