This is one of the 100 recipes of the IPython Cookbook, the definitive guide to high-performance scientific computing and data science in Python.

3.2. Converting an IPython notebook to other formats with nbconvert

You need pandoc, a LaTeX distribution, and the Notebook dataset from the book's website. On Windows, you also need pywin32 (conda install pywin32 if you use Anaconda).

  1. Let's open the test notebook in the data folder. A notebook is just a plain text file (JSON), so we open it in text mode (r mode).
In [ ]:
with open('data/test.ipynb', 'r') as f:
    contents = f.read()
print(len(contents))
In [ ]:
print(contents[:345] + '...' + contents[-33:])
  1. Now that we have loaded the notebook as a string, let's parse it with the json module.
In [ ]:
import json
nb = json.loads(contents)
  1. Let's have a look at the keys in the notebook dictionary.
In [ ]:
print(nb.keys())
print('nbformat ' + str(nb['nbformat']) + 
      '.' + str(nb['nbformat_minor']))

The version of the notebook format is indicated in nbformat and nbformat_minor.

  1. The main field is worksheets: there is only one by default. A worksheet contains a list of cells, and some metadata.
In [ ]:
nb['worksheets'][0].keys()
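The worksheets field belongs to version 3 of the notebook format. From nbformat 4 on, the cells are stored directly under a top-level cells key. Here is a minimal sketch of a helper that copes with both layouts. The name get_cells and the two toy dictionaries are hypothetical and only serve as an illustration; they are not part of the recipe's test notebook.

```python
def get_cells(nb):
    """Return the list of cells of a parsed notebook dictionary.

    Hypothetical helper: assumes nbformat 3 (cells inside worksheets)
    or nbformat 4 and later (cells at the top level).
    """
    if nb.get('nbformat', 0) >= 4:
        return nb.get('cells', [])
    # nbformat 3: concatenate the cells of all worksheets.
    cells = []
    for worksheet in nb.get('worksheets', []):
        cells.extend(worksheet.get('cells', []))
    return cells


# Toy notebooks, not the test notebook of this recipe.
nb_v3 = {'nbformat': 3, 'nbformat_minor': 0,
         'worksheets': [{'cells': [{'cell_type': 'markdown'}],
                         'metadata': {}}]}
nb_v4 = {'nbformat': 4, 'nbformat_minor': 0,
         'cells': [{'cell_type': 'code'}, {'cell_type': 'markdown'}]}

print(len(get_cells(nb_v3)), len(get_cells(nb_v4)))
```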
  1. Each cell has a type, optional metadata, some contents (text or code), possibly one or several outputs, and other information. Let's look at a Markdown cell and a code cell.
In [ ]:
nb['worksheets'][0]['cells'][1]
In [ ]:
nb['worksheets'][0]['cells'][2]
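To read the contents of a cell regardless of its type, a small accessor can help. The sketch below assumes the nbformat 3 layout, where code cells store their code under the input key and Markdown cells store their text under source, and it also accepts nbformat 4 cells, which use source in both cases. The name cell_text and the toy cells are hypothetical.

```python
def cell_text(cell):
    """Return the text of a cell as a single string.

    Hypothetical helper: looks for 'source' first (Markdown cells, and
    all cells in nbformat 4), then for 'input' (nbformat 3 code cells).
    """
    text = cell.get('source', cell.get('input', ''))
    # The text may be stored as a list of lines or as a single string.
    if isinstance(text, list):
        text = ''.join(text)
    return text


# Toy cells mimicking the nbformat 3 layout.
md_cell = {'cell_type': 'markdown', 'metadata': {},
           'source': ['# Title\n', 'Some text.']}
code_cell = {'cell_type': 'code', 'language': 'python',
             'input': ['print(1)'], 'outputs': []}

print(cell_text(md_cell))
print(cell_text(code_cell))
```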
  1. Once parsed, the notebook is represented as a Python dictionary. Manipulating it is therefore quite convenient in Python. Here, we count the number of Markdown and code cells.
In [ ]:
cells = nb['worksheets'][0]['cells']
nm = len([cell for cell in cells
          if cell['cell_type'] == 'markdown'])
nc = len([cell for cell in cells
          if cell['cell_type'] == 'code'])
print(("There are {nm} Markdown cells and "
       "{nc} code cells.").format(
        nm=nm, nc=nc))
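The same count can be obtained in a single pass with collections.Counter, which also reports any other cell type present in the notebook. This is only a sketch on a toy list of cells (toy_cells is a hypothetical name); with the recipe's notebook, the cells variable defined above could be used instead.

```python
from collections import Counter

# Toy list of cells; only the cell_type key matters here.
toy_cells = [{'cell_type': 'markdown'},
             {'cell_type': 'code'},
             {'cell_type': 'code'},
             {'cell_type': 'heading'}]

# Count the cells of each type in one pass.
counts = Counter(cell['cell_type'] for cell in toy_cells)

# A Counter returns 0 for missing keys, so absent types are safe.
print(("There are {nm} Markdown cells and "
       "{nc} code cells.").format(
        nm=counts['markdown'], nc=counts['code']))
```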
  1. Let's have a closer look at the image output of the cell with the matplotlib figure.
In [ ]:
png = cells[2]['outputs'][0]['png']
cells[2]['outputs'][0]['png'] = png[:20] + '...' + png[-20:]
cells[2]['outputs'][0]

In general, a cell can have zero, one, or multiple outputs. In addition, each output can have multiple representations. Here, the matplotlib figure has a PNG representation (the base64-encoded image) and a text representation (the internal representation of the figure).
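Since the PNG representation is a base64-encoded string, the image bytes can be recovered with the base64 module. The following sketch assumes the nbformat 3 layout used in this recipe, where the encoded image is stored under the png key of the output. The name decode_png and the toy output are hypothetical, and the toy data is a PNG signature followed by placeholder bytes rather than a real image.

```python
import base64

# Every PNG file starts with these eight bytes.
PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def decode_png(output):
    """Decode the base64 PNG representation of a cell output.

    Hypothetical helper: returns None if the output has no 'png' key.
    """
    encoded = output.get('png')
    if encoded is None:
        return None
    data = base64.b64decode(encoded)
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("The decoded data is not a PNG image.")
    return data


# Toy output: a PNG signature plus placeholder bytes, not a real figure.
fake_png = PNG_SIGNATURE + b'placeholder'
toy_output = {'output_type': 'display_data',
              'png': base64.b64encode(fake_png).decode('ascii'),
              'text': 'toy text representation'}

data = decode_png(toy_output)
print(len(data))
```

Note that the code cell above replaces the png string with a truncated version, so decoding the actual figure would require the original string.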

  1. Now, we are going to use nbconvert to convert our text notebook to other formats. This tool can be used from the command line (if you are using a version of IPython older than 4.0, replace the jupyter command with ipython). Here, we convert the notebook to an HTML document.
In [ ]:
!jupyter nbconvert --to html data/test.ipynb
  1. Let's display this document in an <iframe> (a small window showing an external HTML document within the notebook).
In [ ]:
from IPython.display import IFrame
IFrame('test.html', 600, 200)
  1. We can also convert the notebook to LaTeX and PDF. In order to specify the title and author of the document, we need to extend the default LaTeX template. First, we create a file mytemplate.tplx that extends the default article.tplx template provided by nbconvert. We specify the contents of the author and title blocks here.
In [ ]:
%%writefile mytemplate.tplx
((*- extends 'article.tplx' -*))

((* block author *))
\author{Cyrille Rossant}
((* endblock author *))

((* block title *))
\title{My document}
((* endblock title *))
  1. Then, we can run nbconvert by specifying our custom template.
In [ ]:
!jupyter nbconvert --to latex --template mytemplate data/test.ipynb
!pdflatex test.tex
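The nbconvert invocations used in this recipe can also be assembled from Python code, for instance to convert several notebooks in a loop. The sketch below only builds the argument list; the helper name nbconvert_command and its parameters are hypothetical, and the flags are the ones used above (--to and --template). Actually running the command requires nbconvert and the notebook file, so that line is left commented out.

```python
import subprocess


def nbconvert_command(path, to='html', template=None, program='jupyter'):
    """Build the argument list of an nbconvert call.

    Hypothetical helper: `program` can be set to 'ipython' for
    versions of IPython older than 4.0.
    """
    cmd = [program, 'nbconvert', '--to', to]
    if template is not None:
        cmd += ['--template', template]
    cmd.append(path)
    return cmd


html_cmd = nbconvert_command('data/test.ipynb')
latex_cmd = nbconvert_command('data/test.ipynb', to='latex',
                              template='mytemplate')
print(' '.join(html_cmd))
print(' '.join(latex_cmd))

# Uncomment to run the conversion (needs nbconvert and the notebook):
# subprocess.run(html_cmd, check=True)
```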

You'll find all the explanations, figures, references, and much more in the book (to be released later this summer).

IPython Cookbook, by Cyrille Rossant, Packt Publishing, 2014 (500 pages).