Notebook Splitter

A simple routine for splitting Jupyter notebooks.

Simply add one or more code or markdown cells containing just the line:

# --SPLIT HERE--

and run the code cells below.

The script will split your long_notebook.ipynb at the defined split points, creating new notebooks long_notebook_PART 0.ipynb, long_notebook_PART 1.ipynb, and so on. Parts are numbered from zero, and the marker cells themselves are not copied into any part.
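The splitting rule can be sketched on plain cell dictionaries, without reading or writing any files. The helper names `is_split_marker` and `split_cells` are hypothetical, introduced here for illustration only. The marker test mirrors the tolerant comparison used in the code further down, which ignores case and spaces.

```python
def is_split_marker(source):
    """Return True if a cell's source is just the split marker."""
    return source.upper().strip().replace(' ', '') == '#--SPLITHERE--'


def split_cells(cells):
    """Group cells into parts, dropping the marker cells themselves."""
    parts = [[]]
    for cell in cells:
        if is_split_marker(cell['source']):
            # A marker closes the current part and opens a new one
            parts.append([])
        else:
            parts[-1].append(cell)
    return parts


# Stand-in cells: only the 'source' key matters for this sketch
demo_cells = [
    {'cell_type': 'markdown', 'source': 'Intro'},
    {'cell_type': 'code', 'source': '# --SPLIT HERE--'},
    {'cell_type': 'code', 'source': 'x = 1'},
    {'cell_type': 'markdown', 'source': '# --split here--'},
    {'cell_type': 'code', 'source': 'print(x)'},
]

demo_parts = split_cells(demo_cells)
print([len(part) for part in demo_parts])
```

Two marker cells give three parts. A marker placed at the very start or end of a notebook would produce an empty part.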

Note that this script could be elaborated further, for example by providing an option to clean the new notebook parts by clearing the code cell outputs and resetting the code cell execution counts.
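As a rough sketch of what such a cleaning option might look like: `clean_cells` is a hypothetical helper, not part of the script. It assumes the cells follow notebook format version 4, where code cells carry `outputs` and `execution_count` keys. It is written against plain dictionaries; since nbformat's notebook nodes are dict-like, the same function should also work on cells loaded with `nbformat.read`.

```python
def clean_cells(cells):
    """Clear outputs and execution counts from code cells, in place."""
    for cell in cells:
        if cell.get('cell_type') == 'code':
            cell['outputs'] = []
            cell['execution_count'] = None
    return cells


# Stand-in for one part's cells, shaped like notebook format version 4
demo_part = [
    {'cell_type': 'markdown', 'metadata': {}, 'source': 'Some notes'},
    {'cell_type': 'code', 'metadata': {}, 'source': 'print(1 + 1)',
     'execution_count': 7,
     'outputs': [{'output_type': 'stream', 'name': 'stdout', 'text': '2\n'}]},
]

clean_cells(demo_part)
print(demo_part[1]['outputs'], demo_part[1]['execution_count'])
```

The helper modifies the cells it is given, so if the cleaned cells are shared with the notebook held in memory, that notebook loses its outputs too.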

In [114]:
#Enter the file name for the file you want to split
fn = 'Notebook Splitter.ipynb'

#Prevent the overwriting of files that already exist
overwrite = False
In [115]:
# --SPLIT HERE--
In [116]:
#The nbformat package provides an API for working with notebook documents
#https://nbformat.readthedocs.io/en/latest/api.html
import nbformat
nb = nbformat.read(fn, as_version=4)

#Make a copy of the notebook, just in case...
nb2 = nb.copy()

if overwrite:
    print('If notebook parts pre-exist, they will be overwritten...\nSet: overwrite=False to guard against this.')

# --SPLIT HERE--

In [119]:
#We are going to see if we can split the notebook into separate parts
parts=[]

#Each part will contain the cells for the part
#The rest of the notebook structure, (notebook metadata etc) will be copied from the original notebook
partcells = []

#Iterate through all the cells
for cell in nb2['cells']:
    #Check for a splitline marker - go defensive!
    splitline = cell['source'].upper().strip().replace(' ','') == '#--SPLITHERE--'
    #If we're not at a split line,
    if not splitline:
        #Append the cell to the cell list for this part
        partcells.append(cell)
    else:
        #Otherwise, save the cells to the part...
        parts.append(partcells)
        #...and create a new part cells list
        partcells=[]

#Commit the final set of cells to the final part
parts.append(partcells)

#This is a precautionary step...
# Try not to clobber files we may have created previously
import os
import warnings

#We can provide more warning notices...
caution = False

for ix, cells in enumerate(parts):
    part_fn = fn.replace('.ipynb','_PART {}.ipynb'.format(ix))
    #Only complain or warn about files that actually exist
    if os.path.isfile(part_fn):
        if not overwrite:
            raise Exception('File {} already exists. Set overwrite=True to replace it. Exiting...'.format(part_fn))
        elif caution:
            warnings.warn("You were warned... The following file will be overwritten: {}".format(part_fn))

#If you want to go really defensive, put the next bit of code into its own cell so you can react
# to the file overwriting warning...
        
#For each part, write out a separate notebook containing just the cells in that part.
for ix, cells in enumerate(parts):
    part_fn = fn.replace('.ipynb','_PART {}.ipynb'.format(ix))   
    nb_out=nb.copy()
    nb_out['cells']=cells
    print('Writing {}...'.format(part_fn))
    nbformat.write(nb_out, part_fn)
Writing Notebook Splitter_PART 0.ipynb...
Writing Notebook Splitter_PART 1.ipynb...
Writing Notebook Splitter_PART 2.ipynb...
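
The part file names above are built with `str.replace`, which swaps every occurrence of `.ipynb` in the path, including one inside a directory name such as `.ipynb_checkpoints`. A minimal alternative sketch, using `os.path.splitext` so that only the final extension is touched, is given here. `part_filename` is a hypothetical helper, not part of the original script.

```python
import os


def part_filename(fn, ix):
    """Build the file name for part ix, keeping the '_PART n' naming."""
    # splitext only separates the extension of the last path component
    root, ext = os.path.splitext(fn)
    return '{}_PART {}{}'.format(root, ix, ext)


print(part_filename('Notebook Splitter.ipynb', 0))
print(part_filename('.ipynb_checkpoints/demo.ipynb', 1))
```

For a file name with a single `.ipynb` in it, this gives the same result as the `replace` call used in the script.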