Tracking inconsistencies in Jupyter notebooks

This is the first in a series of posts covering different aspects of Jupyter notebooks, specifically focused on the issues that arise from the interaction between an invisible runtime (the "kernel") and a source that can arbitrarily be executed out of order.

One of the major criticisms of notebooks (see for example Joel Grus' talk at JupyterCon 2018 and this follow-up article by Yihui Xie) is that they do not enforce execution order, i.e. you can run the first cell, then the third, and finally the second, and end up in an inconsistent state.

Several ideas have surfaced to alleviate this concern, for example turning notebooks into a dataflow programming model which adds named identifiers to the outputs, in effect creating a DAG of cells. Alternatively, nodebook serializes the entire state after the execution of each cell, and enforces that re-executing a prior cell picks up from that serialized state and invalidates anything under it.

The latter is interesting, in that it brings notebooks closer to the way people think about Python scripts in general, but the constraint that the entire state be serialized is too heavy, especially considering notebooks are mostly used to analyze large amounts of data, which would be prohibitively costly to store multiple copies of.
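To make the comparison concrete, here is a minimal sketch of that snapshot-and-restore idea using stdlib pickle. This is a toy model of my own, not nodebook's actual implementation or API:

```python
import pickle

class CheckpointingKernel:
    """Toy model of per-cell state snapshots (not nodebook's actual API)."""

    def __init__(self) -> None:
        # snapshots[i] is the serialized namespace as it was *before* cell i ran
        self.snapshots = [pickle.dumps({})]

    def run(self, index: int, source: str) -> dict:
        # restore the state that preceded this cell, then execute on top of it
        namespace = pickle.loads(self.snapshots[index])
        exec(source, namespace)
        namespace.pop("__builtins__", None)
        # everything downstream of this cell is now stale: drop those snapshots
        del self.snapshots[index + 1:]
        self.snapshots.append(pickle.dumps(namespace))
        return namespace

kernel = CheckpointingKernel()
kernel.run(0, "x = 1")
namespace = kernel.run(1, "y = 2 * x")  # sees x == 1
kernel.run(0, "x = 5")  # invalidates cell 1's snapshot
```

The cost is exactly the concern raised above: every single cell execution serializes the whole namespace.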

This however gave me an idea. Instead of strictly enforcing that everything be executed from the top, what if we could use a similar technique to detect which past executions end up being inconsistent with the current state of the kernel? Conceptually, we can treat each execution as a function of the variables it reads from the environment, and mark executions as inconsistent if they operate on variables that have since changed. This notebook investigates how this idea might work in practice, and what limitations it might have, using a toy example. As we'll see, this isn't a silver bullet and comes with several caveats which I'll get into towards the end.

Tracking changes

The first step in implementing the above is to have a solid strategy for checking whether values have changed between executions. To that effect, I'm using dill as a hashing function, because it supports throwing a bunch of stuff at it:

In [1]:
import dill
from typing import Any, NewType, Optional

DillHash = NewType("DillHash", Optional[int])

def dill_hash(value: Any) -> DillHash:
    try:
        return hash(dill.dumps(value))
    except TypeError:
        return None

This will handle numbers, strings, lists, functions, etc., and is stable across calls:

In [2]:
values = [123, "foo", lambda x: x]

for value in values:
    assert dill_hash(value) == dill_hash(value)

And it will fall back on returning None for objects that dill cannot pickle.

In [3]:
assert dill_hash(y for y in [1, 2, 3]) is None

A spherical kernel

Since this is not meant to be a full-blown implementation of the concept, but rather an exploration of what is possible with this technique, we'll implement something that resembles an IPython kernel for illustration purposes. At its simplest level, this boils down to the following interface:

In [4]:
import abc

class AbstractKernel(metaclass=abc.ABCMeta):
    execution_count: int

    @abc.abstractmethod
    def execute(self, source: str) -> "ExecutedCode":
        """Execute `source` within the context of this kernel."""

Consistency Tracking Kernel

In order to track which variables are required during code execution, we'll need several building blocks:

  1. A way of describing what state variables and cells are in.
  2. An environment to store global variables and a way to check what state they are in.
  3. An abstraction to represent code that has been executed within the kernel.
  4. Some mechanism to capture which global variables a piece of code has read.
  5. Finally, a kernel that ties all of the above together.


Let's start with the state. This is a ternary value because, as stated above, in some cases where variables aren't hashable we might not know whether they have changed or not. We'll also want to pretty print those, so let's abuse the Enum to store some ANSI colors in there too.

In [5]:
from enum import Enum

class State(Enum):
    Consistent = "\033[30m"  # black
    Unknown = "\033[33m"  # yellow
    Inconsistent = "\033[31m"  # red

RESET = "\033[0m"

for state in State:
    print(state.value + state.name + RESET)

Assuming we store the environment in a namespace dictionary, the following method will be useful to check which state a variable is in:

In [6]:
class Namespace(dict):
    def get_state(self, name: str, previous_hash: DillHash) -> State:
        """Compare the current hash of a given item name with the one provided.

        If either hash is None, we can't establish whether the variable is
        consistent and return Unknown. Otherwise we return Consistent if both
        hashes are identical.
        """
        if name not in self:
            return State.Inconsistent

        current_hash = dill_hash(self[name])
        if None in [current_hash, previous_hash]:
            return State.Unknown

        if current_hash == previous_hash:
            return State.Consistent
        return State.Inconsistent

Executed Code

Next, we'll need something representing a piece of code executed within the context of a kernel:

In [7]:
from textwrap import dedent
from typing import Dict

class ExecutedCode:
    """Abstraction capturing the code that was run in a given call to
    `execute`, alongside its dependencies and at which index it was executed.
    """

    source: str
    dependencies: Dict[str, DillHash]
    execution_order: int

    def __init__(self, source: str, execution_order: int) -> None:
        self.source = dedent(source).strip()
        self.execution_order = execution_order
        self.dependencies = {}

    def add_dependency(self, name: str, value: Any) -> None:
        self.dependencies[name] = dill_hash(value)

    def get_state(self, namespace: Namespace) -> State:
        """Aggregates variable-level state into code-level state:
        Inconsistent wins over Unknown wins over Consistent.
        """
        if not self.dependencies:
            return State.Consistent

        return max(
            (
                namespace.get_state(name, previous_hash=value_hash)
                for name, value_hash in self.dependencies.items()
            ),
            key=list(State).index,
        )

Namespace Proxy

In order to know which variables are accessed when code is being executed, we have to wrap our Namespace in some kind of observer that can keep track of reads and accordingly update the relevant input code's dependencies.

In [8]:
class NamespaceProxy(dict):
    """A dictionary subclass that tracks when variables are read and updates
    the relevant `ExecutedCode`'s dependencies.
    """

    namespace: Namespace
    executed_code: ExecutedCode

    def __init__(self, namespace: Namespace, executed_code: ExecutedCode) -> None:
        self.namespace = namespace
        self.executed_code = executed_code

    def __getitem__(self, name: str) -> Any:
        value = self.namespace[name]
        self.executed_code.add_dependency(name, value)
        return value

    def __setitem__(self, name: str, value: Any) -> None:
        self.namespace[name] = value

    def __delitem__(self, name: str) -> None:
        del self.namespace[name]


Finally we have all the ingredients to define our kernel as follows:

In [9]:
from typing import Dict, List, NamedTuple, Tuple

class Kernel(AbstractKernel):
    """A Kernel implementation that keeps track of inconsistent code executions."""

    execution_count: int
    namespace: Namespace

    def __init__(self) -> None:
        self.execution_count = 0
        self.namespace = Namespace()

    def execute(self, source: str) -> ExecutedCode:
        executed_code = ExecutedCode(source, execution_order=self.execution_count)

        exec(executed_code.source, NamespaceProxy(self.namespace, executed_code))
        self.execution_count += 1

        return executed_code


Let's consider the following very simple example: we first assign x, assign twice its value to y, and then update x. As one would expect, once the third line has been executed, the second one becomes inconsistent: at that point x == 2 and y == 2, which is different from 2 * x.

In [10]:
kernel = Kernel()

for executed_code in [
    kernel.execute("x = 1"),
    kernel.execute("y = 2 * x"),
    kernel.execute("x = 2"),
]:
    state = executed_code.get_state(kernel.namespace)
    print(state.value + executed_code.source + RESET)

x = 1
y = 2 * x
x = 2

Tying things into a "Notebook"

The confusing part isn't so much the fact that a line above might become inconsistent; after all, the above is idiomatic Python. What usually trips users up is when cells are executed out of order, for example:

[0] x = 1
[1] print(x)

And the user goes back and edits/executes a previous cell, such as:

[2] x = 2
[1] print(x)

In order to illustrate this, let's come up with a very simple model of a notebook:

In [11]:
from textwrap import indent

class Cell:
    executed_code: ExecutedCode
    state: State

    def __init__(self, executed_code: ExecutedCode) -> None:
        self.executed_code = executed_code
        self.state = State.Consistent

    def display(self) -> None:
        execution_order_prefix = f"[{self.executed_code.execution_order}] "
        formatted_source = indent(
            self.executed_code.source, len(execution_order_prefix) * " "
        ).strip()
        print(self.state.value + execution_order_prefix + formatted_source + RESET)

Which we can use to render a cell containing a code block:

In [12]:
Cell(
    executed_code=kernel.execute(
        """
        def square(x: int) -> int:
            return x * x
        """
    )
).display()

[2] def square(x: int) -> int:
        return x * x

Finally, we can assemble several such cells in a notebook which has an associated kernel where we'll execute their code:

In [13]:
class Notebook:
    """A notebook is a collection of cells and a kernel."""

    cells: List[Cell]
    kernel: Kernel

    def __init__(self, *sources: str) -> None:
        self.cells = []
        self.kernel = Kernel()

        for source in sources:
            self.add_cell(source)

    def add_cell(self, source: str) -> None:
        self.cells.append(Cell(executed_code=self.kernel.execute(source)))

    def update_cell(self, index: int, source: str) -> None:
        self.cells[index] = Cell(executed_code=self.kernel.execute(source))

        for cell in self.cells[index + 1 :]:
            cell.state = cell.executed_code.get_state(self.kernel.namespace)

    def reexecute_cell(self, index: int) -> None:
        self.update_cell(index, self.cells[index].executed_code.source)

    def display(self) -> None:
        for cell in self.cells:
            cell.display()

Going back to the example above:

In [14]:
nb = Notebook(
    "x = 1",
    "y = x",
)
nb.display()

[0] x = 1
[1] y = x

If we update the first cell to x = 2, then we end up with an inconsistent second cell, because y != x after the execution:

In [15]:
nb.update_cell(0, "x = 2")
nb.display()

[2] x = 2
[1] y = x

Similarly, if instead of starting with a number we start with x being a generator, which we can't serialize, we can surface that we're now in an unknown state (i.e. it might be consistent, but we can't ensure it):

In [16]:
nb2 = Notebook(
    "x = (y for y in [1, 2, 3])",
    "z = x",
)
nb2.reexecute_cell(0)
nb2.display()

[2] x = (y for y in [1, 2, 3])
[1] z = x

Here's a slightly more convoluted example, where the variable itself doesn't change, but its value does.

In [17]:
nb3 = Notebook(
    "counters = {'a': 0, 'b': 1}",
    "counters['a'] += 1",
    "x = counters['a']",
    "y = 1 + 1",
)
nb3.display()

[0] counters = {'a': 0, 'b': 1}
[1] counters['a'] += 1
[2] x = counters['a']
[3] y = 1 + 1

Suppose we now execute cell 1 again: cell 2 then becomes inconsistent, because x == 1 whereas counters['a'] == 2.

In [18]:
nb3.reexecute_cell(1)
nb3.display()

print("x:", nb3.kernel.namespace["x"])
print("counters['a']:", nb3.kernel.namespace["counters"]["a"])
[0] counters = {'a': 0, 'b': 1}
[4] counters['a'] += 1
[2] x = counters['a']
[3] y = 1 + 1

x: 1
counters['a']: 2

Caveats and limitations

As I mentioned earlier, there are quite a few limitations to this approach:

  • It requires objects to be picklable to reliably assess whether cells are consistent or not. Introducing a concept of Unknown consistency somewhat mitigates that, in that there is at least a fallback behavior, even if it's not totally satisfactory.

  • It assumes that cells are deterministic, which excludes a large swath of use cases where randomness is involved.
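To illustrate, here is a stdlib-only stand-in for dill_hash (pickle instead of dill, my own substitution for self-containment). Two runs of the same nondeterministic "cell" read no changed inputs, yet hash differently, so hash comparison can't tell intended randomness apart from a genuine inconsistency:

```python
import pickle
import random

def stable_hash(value):
    # stand-in for the post's dill_hash, using stdlib pickle for illustration
    try:
        return hash(pickle.dumps(value))
    except Exception:
        return None

# Same source, unchanged environment, yet two different hashes:
first = stable_hash(random.random())
second = stable_hash(random.random())
```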

  • This method will err on the side of inconsistency. Consider the following example: because the dictionary has changed in between calls, we're now marking the last cell as inconsistent, even though its output is the same. I don't claim to have a solution for this, but I would still argue in favor of at least warning the user that the cell might be in an inconsistent state because d changed.
In [19]:
nb4 = Notebook(
    "d = {1: 2}",
    "d[2] = 3",
    "x = d[1]",
)
nb4.update_cell(1, "d[2] = 4")
nb4.display()

[0] d = {1: 2}
[3] d[2] = 4
[2] x = d[1]
  • Because we are using dill.dumps to compute the value hash, the overhead on larger data structures might be prohibitively expensive (think gigabyte-sized dataframes).
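One possible mitigation, sketched below with stdlib pickle, is to hash a structural sample of large values instead of their full serialization. This is a lossy heuristic of my own (a change outside the sampled positions goes unnoticed), not something the implementation above does:

```python
import pickle

def cheap_fingerprint(value):
    """Hash a structural sample of large sequences instead of the full bytes.

    Lossy: changes outside the sampled positions go unnoticed, in exchange
    for near-constant-time hashing of huge values.
    """
    if isinstance(value, (list, tuple)) and len(value) > 10_000:
        sample = (len(value), value[0], value[len(value) // 2], value[-1])
        return hash(pickle.dumps(sample))
    return hash(pickle.dumps(value))

big = list(range(1_000_000))
fp_before = cheap_fingerprint(big)

big[-1] = -1  # change at a sampled position: detected
fp_after = cheap_fingerprint(big)

big[1] = -1  # change at an unsampled position: missed
fp_missed = cheap_fingerprint(big)
```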
  • Some might prefer a different definition of consistency, for example that inconsistencies should propagate; that is, in the following case, both cells 1 and 2 should be inconsistent. This is slightly out of scope as this notebook is getting rather long, but it could be addressed by having ExecutedCode track both its inputs and outputs, and propagating state along those edges.
In [20]:
nb5 = Notebook(
    "x = 1",
    "y = x + 1",
    "z = y + 1",
)
nb5.update_cell(0, "x = 2")
nb5.display()

[3] x = 2
[1] y = x + 1
[2] z = y + 1
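That propagation variant can be sketched as pure bookkeeping over each cell's inputs and outputs. This is a hypothetical standalone model, separate from the classes above:

```python
# Each "cell" records which global names it reads (inputs) and writes (outputs),
# mirroring the nb5 example: cell 0 writes x, cell 1 reads x and writes y, etc.
cells = {
    0: {"inputs": set(), "outputs": {"x"}},
    1: {"inputs": {"x"}, "outputs": {"y"}},
    2: {"inputs": {"y"}, "outputs": {"z"}},
}

def stale_after_rerun(changed_cell: int) -> set:
    """Return the cells whose inputs are (transitively) tainted by a re-run."""
    stale = set()
    changed_vars = set(cells[changed_cell]["outputs"])
    for index in sorted(i for i in cells if i != changed_cell):
        if cells[index]["inputs"] & changed_vars:
            stale.add(index)
            # this cell's outputs are now tainted too, so taint its readers
            changed_vars |= cells[index]["outputs"]
    return stale

stale = stale_after_rerun(0)  # re-running cell 0 taints both downstream cells
```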


We've explored using hashing to keep track of dependencies between cells in a notebook and its associated kernel, and we've shown this method to be effective on toy examples. While implementing this in Jupyter could provide value to the user (one could imagine building a plugin surfacing the likely state of a cell right next to it), we've also surfaced a number of limitations that might prevent it from being a complete solution to the problem of keeping track of out-of-order executions in a notebook.

In a subsequent note, we'll take a look at a different strategy, leveraging the OS itself to take care of some of the grunt work for us.


Are you excited about this kind of work? Then you should reach out, I'm starting mlon to build better tools for data science and AI.