Typically a notebook's author will begin an idea from a blank documents in an editable state. Through cycles of interactive computing an author will transform the notebook's data by adding narrative, code, and metadata. The notebook's cells are parts of a whole computable document described by the notebook format.
The interactive in-memory editing mode is a critical, but fleeting stage in the life of a computable document. Notebooks spend most of their existence as whole & static files on disk. The static state of notebooks are reusable; and for notebooks to be reusable they must be reused.
Procedural notebooks are readable and reusable literate documents that can be executed successfully in other contexts like documention, module development, or external jobs. This notebook explores the reusability of procedural notebooks that successfully Restart and Run All.
Procedural notebooks are inspired by Paco Nathan's Oriole: a new learning medium based on Jupyter + Docker given at Jupyter Day Atlanta 2016. In Paco's unofficial styleguide for authoring Jupyter notebooks he suggests:
clear all output then "Run All" -- or it didn't happen
structured, and literate programming actions.
Its cells Restart and Run All to create a module and python package called particles:
particles is inspired by the New York Times R&D The Future of News is Not an Article. particles treat elements of computable documents as data and modular components.
readme.ipynb generates the particles module either in interactive mode, or procedurally from a converted Python script.
attach
is a callable used by readme to append the recent In
put as cell source to particles.ipynb; it is erroneous to the particles module. _If the readme.ipynb cells are run out of order then particles.ipynb could be created incorrectly.
from nbformat import v4, NotebookNode
nb, particles = 'particles.ipynb', v4.new_notebook();
def attach(nb:NotebookNode=particles)->None:
"""attach an input to another notebook removing attach statements.
>>> nb = v4.new_notebook();
>>> assert attach(nb) or ('doctest' in nb.cells[-1].source)"""
'In' in globals() and nb.cells.append(v4.new_code_cell('\n'.join(
str for str in In[-1].splitlines() if not str.startswith('attach'))))
%%file requirements.txt
pandas
matplotlib
Overwriting requirements.txt
particles.ipynb
¶Many cells in readme.ipynb have lived and died before you read this line.
The code cell below will be appended to particles.ipynb. It import__s tools into __readme.ipynb's interactive mode. It now becomes quite easy to iteratively develop and test parts of the procedural document.
attach(particles)
"""particles treat notebooks as data"""
'particles treat notebooks as data'
attach(particles)
from nbformat import reads, v4
from pandas import concat, DataFrame, to_datetime
from pathlib import Path
particles
¶Create two main functions for particles to export
attach()
def read_notebooks(dir:str='.')->DataFrame:
"""Read a directory of notebooks into a pandas.DataFrame
>>> df = read_notebooks('.')
>>> assert len(df) and isinstance(df, DataFrame)"""
return concat({
file: DataFrame(reads(file.read_text(), 4)['cells'])
for file in Path(dir).glob('*.ipynb')
}).reset_index(-1, drop=True)
The read_notebooks
index is a pathlib object containing extra metadata. files_to_data
extracts the stat
properties for each file.
attach()
def files_to_data(df:DataFrame)->DataFrame:
"""Transform an index of Path's to a dataframe of os_stat.
>>> df = files_to_data(read_notebooks())
"""
stats, index = [], df.index.unique()
for file in index:
stat = file.stat()
stats.append({
key: to_datetime(
getattr(stat, key), unit=key.endswith('s') and key.rsplit('_')[-1] or 's'
) if 'time' in key else getattr(stat, key)
for key in dir(stat) if not key.startswith('_') and not callable(getattr(stat, key))})
# Append the change in time to the dataframe.
return DataFrame(stats, index).pipe(lambda df: df.join((df.st_mtime - df.st_birthtime).rename('dt')))
A procedural notebooks will use clues from a namespace to decide what statements to execute in different contexts.
if __name__ != '__main__': assert __name__+'.py' == __file__
__name__
=='__main__'
, but nothing is known about the python object__file__
.
__name__
=='__main__'
andassert __file__
.
__name__ + '.py'
==__file__
.
get_ipython
¶The get_ipython
context must be manually imported to use magics in converted notebooks.
from IPython import get_ipython
Introspect the interactive Jupyter namespace to control expressions in procedural notebooks.
thing = get_ipython().user_ns.get('thing', 42):
readme
procedures to make particles
¶Below are the procedures to test and create the particles
package.
doctest
s were declared in each of our functions. doctest
can be run in an interactive notebook session, unittest
cannot.
doctest
catches a lot of errors when it is in the Restart and Run All pipeline. It is a great place to stash repeatedly typed statements.
When the tests pass write the particles.ipynb notebook.
if __name__ == '__main__':
print(__import__('doctest').testmod())
Path(nb).write_text(__import__('nbformat').writes(particles))
TestResults(failed=0, attempted=5)
if __name__ == '__main__' and '__file__' not in globals():
!jupyter nbconvert --to python --TemplateExporter.exclude_input_prompt=True particles.ipynb readme.ipynb
!autopep8 --in-place --aggressive readme.py particles.ipynb
!python -m doctest particles.py & echo "success"
!jupyter nbconvert --to markdown --TemplateExporter.exclude_input_prompt=True readme.ipynb
[NbConvertApp] Converting notebook particles.ipynb to python [NbConvertApp] Writing 1234 bytes to particles.py [NbConvertApp] Converting notebook readme.ipynb to python [NbConvertApp] Writing 10488 bytes to readme.py success [NbConvertApp] Converting notebook readme.ipynb to markdown [NbConvertApp] Writing 10407 bytes to readme.md
setuptools
will install the particles package using the conditions for setup mode.
Install the particles package
python readme.py develop
if __name__ == '__main__' and '__file__' in globals():
__import__('setuptools').setup(
name="particles",
py_modules=['particles'],
install_requires=['notebook', 'pandas'])
particles
¶A notebook that can be imported is reusable.
Particles can now be imported into the current scope. particles allow the user to explore notebooks and their cells as data.
import particles
assert particles.__file__.endswith('.py')
%matplotlib inline
from matplotlib import pyplot as plt
df = particles.read_notebooks()
df.sample(5)
cell_type | execution_count | metadata | outputs | source | |
---|---|---|---|---|---|
readme.ipynb | code | NaN | {} | [] | if __name__ == '__main__':\n print(__import... |
readme.ipynb | markdown | NaN | {'slideshow': {'slide_type': '-'}} | NaN | ### In Jupyter mode\n\n> **`__name__`** == **`... |
readme.ipynb | code | NaN | {'collapsed': True} | [] | from IPython import get_ipython |
particles.ipynb | code | NaN | {} | [] | def files_to_data(df:DataFrame)->DataFrame:\n ... |
readme.ipynb | markdown | NaN | {} | NaN | > __particles__ is inspired by the New York T... |
df.source.str.split('\n').apply(len).groupby([df.index, df.cell_type]).sum().to_frame('lines of ...').unstack(-1)
lines of ... | ||
---|---|---|
cell_type | code | markdown |
particles.ipynb | 26.0 | NaN |
readme.ipynb | 66.0 | 108.0 |
df.cell_type.groupby(df.index).value_counts().unstack('cell_type').apply(lambda df: df.plot.pie() and plt.show());
This document must Restart and Run All to acheive the goals of creating the particles module.