jupyter-diff

From Jupyter Lab Open Studio! (at Bloomberg HQ)

Kunal Marwaha and Saul Shanabrook // June 22 2019

This is an experiment in version control for Jupyter notebooks: the tool notedown automatically stores notebooks as Markdown.

In [2]:
print("I ❤ Jupyter!")
I ❤ Jupyter!

Why

Jupyter is a great tool for creating notebooks (.ipynb files). With notebooks, one can explore new computational ideas and share narratives with code.

Ideas can change, and many turn to a form of version control to track changes.

[Image: PhD Comics strip by Jorge Cham]

However, the underlying format of notebooks is JSON, and changes to JSON objects are relatively hard to read in classic tools like diff. There is still no consensus on how to put Jupyter notebooks in version control. See here, here, and here, with varied solutions like nbdime, nbstripout, or jupytext.
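To make the JSON problem concrete, here is a toy illustration that is not part of the original experiment. The helpers make_notebook and changed_lines are hypothetical names, and the notebook dict is a pared-down stand-in for the real .ipynb schema. It compares a line diff of the same small edit stored as notebook JSON and as plain text.

```python
import difflib
import json


def make_notebook(source, execution_count):
    """Build a minimal notebook-like dict with a single code cell (simplified)."""
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": execution_count,
                "metadata": {},
                "outputs": [],
                "source": [source],
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 2,
    }


def changed_lines(before, after):
    """Return the added and removed lines from a unified diff of two texts."""
    diff = list(
        difflib.unified_diff(before.splitlines(), after.splitlines(), lineterm="")
    )
    # The first two entries are the '---' and '+++' file headers; skip them.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


# The same edit: the code changes and the cell is re-run.
json_before = json.dumps(make_notebook('print("hello")', 1), indent=1)
json_after = json.dumps(make_notebook('print("goodbye")', 2), indent=1)
md_before = 'print("hello")'
md_after = 'print("goodbye")'

json_changes = changed_lines(json_before, json_after)
md_changes = changed_lines(md_before, md_after)

print(len(json_changes), "changed lines in the JSON diff")
print(len(md_changes), "changed lines in the plain-text diff")
```

Even in this tiny case the JSON diff picks up the execution counter alongside the real change. Real notebooks also carry outputs and metadata, which this sketch leaves empty.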

Here, I will use notedown to work with .md files in Jupyter Lab, and use git for version control.

In [5]:
bool("Markdown") and bool("Jupyter")
Out[5]:
True

Setup

First, boot up Jupyter Lab.

Let's install notedown directly from the notebook:

In [3]:
import sys
!{sys.executable} -m pip install notedown


Now, we'll add the line below to your Jupyter configuration file:

In [5]:
LINE_TO_ADD = "c.NotebookApp.contents_manager_class = 'notedown.NotedownContentsManager'"
In [8]:
%store LINE_TO_ADD >> ~/.jupyter/jupyter_notebook_config.py
Writing 'LINE_TO_ADD' (str) to file '/Users/kmarwaha/.jupyter/jupyter_notebook_config.py'.
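One caveat: since >> appends, running that cell twice should leave the line in the config file twice. Here is a minimal sketch of an alternative that only appends when the line is missing. The function name ensure_line is hypothetical, and the demo writes to a temporary file rather than the real ~/.jupyter/jupyter_notebook_config.py.

```python
import tempfile
from pathlib import Path

LINE_TO_ADD = "c.NotebookApp.contents_manager_class = 'notedown.NotedownContentsManager'"


def ensure_line(config_path, line):
    """Append `line` to the file at `config_path` unless it is already present.

    Returns True if the file was modified, False otherwise.
    """
    path = Path(config_path).expanduser()
    existing = path.read_text() if path.exists() else ""
    if line in existing.splitlines():
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a") as config_file:
        # Start on a fresh line if the file does not end with a newline.
        if existing and not existing.endswith("\n"):
            config_file.write("\n")
        config_file.write(line + "\n")
    return True


# Demonstrated on a throwaway file so the real config is untouched.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_config = Path(tmp_dir) / "jupyter_notebook_config.py"
    first_call = ensure_line(demo_config, LINE_TO_ADD)
    second_call = ensure_line(demo_config, LINE_TO_ADD)
    demo_contents = demo_config.read_text()

print(first_call, second_call)
```

To use it for real, you would pass "~/.jupyter/jupyter_notebook_config.py" as config_path.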


If you're following along, you can relaunch Jupyter and create a new Markdown file. Once you create the file, right-click it and choose Open With -> Notebook.

[Image: the "Open With" menu]

Issues

Not everything is perfect. Here are the issues I've seen:

  1. Links wrap, and occasionally get spaces added to them. This comes from the 80-character line limit applied to the Markdown file, and some links break because of it. One partial solution is to use reference-style links (see examples [here][mdcheat]). You can also use a URL shortening service like http://tiny.cc.

  2. Sometimes, Markdown cells get pasted together. It can be easier to edit Markdown cells in smaller chunks. To separate them, I sometimes add little code snippets.

  3. GitHub doesn't render the markdown as a notebook! Maybe if more people use it, they can try it out. As a compromise, I converted to notebook-output.ipynb for you to read online :) This was done with notedown notebook.md > notebook-output.ipynb

[mdcheat]: https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet
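For the first issue, here is a rough sketch of how inline links could be rewritten as reference-style links, so each URL sits alone on its own definition line. The function to_reference_links is a hypothetical helper, not something notedown provides. The regular expression is deliberately simple and will miss links with titles or with parentheses in the URL.

```python
import re

# Matches [text](url) where the URL contains no whitespace or ")".
INLINE_LINK = re.compile(r"\[([^\]]+)\]\((\S+?)\)")


def to_reference_links(markdown_text):
    """Rewrite inline links as numbered reference-style links.

    Returns the rewritten text with the link definitions appended at the end.
    """
    urls = []

    def replace(match):
        text, url = match.group(1), match.group(2)
        if url not in urls:
            urls.append(url)
        # Reuse the same number when a URL appears more than once.
        return "[{}][{}]".format(text, urls.index(url) + 1)

    body = INLINE_LINK.sub(replace, markdown_text)
    if not urls:
        return body
    definitions = ["[{}]: {}".format(i, url) for i, url in enumerate(urls, start=1)]
    return body.rstrip("\n") + "\n\n" + "\n".join(definitions) + "\n"


example = (
    "See the [cheatsheet]"
    "(https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet)"
    " for examples."
)
converted = to_reference_links(example)
print(converted)
```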

In [26]:
for i in range(3):
    print("~"*9 + (":-)" if i==1 else "~~~") + "~"*9)
~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~:-)~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~

Sample project

Let's look at some GeoJSON data in San Francisco.

In [57]:
import json
# Load the GeoJSON file into a plain Python dict
with open('san-francisco.geojson') as json_file:
    data = json.load(json_file)

We can see a list of neighborhoods (at least, according to this dataset):

In [58]:
# Print the name of each neighborhood
for feature in data['features']:
    print(feature['properties']['name'])
Seacliff
Marina
Pacific Heights
Nob Hill
Presidio Heights
Downtown/Civic Center
Excelsior
Bernal Heights
Western Addition
Chinatown
North Beach
Haight Ashbury
Outer Mission
Crocker Amazon
West of Twin Peaks
South of Market
Potrero Hill
Inner Richmond
Bayview
Noe Valley
Inner Sunset
Diamond Heights
Lakeshore
Russian Hill
Treasure Island/YBI
Twin Peaks
Outer Richmond
Visitacion Valley
Golden Gate Park
Parkside
Financial District
Ocean View
Mission
Presidio
Castro/Upper Market
Outer Sunset
Glen Park
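If you don't have the file handy, the nesting used above can be sketched with a tiny stand-in. The two features below borrow their names from the list, but the Point geometries are a simplification for illustration (the real dataset stores polygons), and sample_geojson is a made-up variable name.

```python
import json

# A two-feature stand-in with the same nesting as san-francisco.geojson.
# Assumption: Point geometries replace the real polygons to keep this short.
sample_geojson = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"name": "Seacliff", "cartodb_id": 1},
      "geometry": {"type": "Point", "coordinates": [-122.484089, 37.78791]}
    },
    {
      "type": "Feature",
      "properties": {"name": "Marina", "cartodb_id": 20},
      "geometry": {"type": "Point", "coordinates": [-122.446806, 37.805401]}
    }
  ]
}
"""

sample = json.loads(sample_geojson)

# Each feature keeps its attributes under 'properties' and its shape under 'geometry'.
names = [feature["properties"]["name"] for feature in sample["features"]]
print(names)
```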

Now let's install GeoPandas!

In [75]:
import sys
!{sys.executable} -m pip install geopandas descartes
Requirement already satisfied: geopandas in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (0.5.0)
Requirement already satisfied: descartes in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (1.1.0)
Requirement already satisfied: pyproj in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (from geopandas) (2.2.0)
Requirement already satisfied: fiona in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (from geopandas) (1.8.6)
Requirement already satisfied: pandas in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (from geopandas) (0.24.1)
Requirement already satisfied: shapely in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (from geopandas) (1.6.4.post2)
Requirement already satisfied: matplotlib in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (from descartes) (2.2.3)
Requirement already satisfied: aenum; python_version < "3.6" in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (from pyproj->geopandas) (2.1.2)
Requirement already satisfied: attrs>=17 in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (from fiona->geopandas) (18.1.0)
Requirement already satisfied: click-plugins>=1.0 in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (from fiona->geopandas) (1.1.1)
Requirement already satisfied: click<8,>=4.0 in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (from fiona->geopandas) (7.0)
Requirement already satisfied: munch in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (from fiona->geopandas) (2.3.2)
Requirement already satisfied: six>=1.7 in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (from fiona->geopandas) (1.11.0)
Requirement already satisfied: cligj>=0.5 in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (from fiona->geopandas) (0.5.0)
Requirement already satisfied: python-dateutil>=2.5.0 in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (from pandas->geopandas) (2.7.3)
Requirement already satisfied: numpy>=1.12.0 in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (from pandas->geopandas) (1.15.1)
Requirement already satisfied: pytz>=2011k in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (from pandas->geopandas) (2018.5)
Requirement already satisfied: cycler>=0.10 in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (from matplotlib->descartes) (0.10.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (from matplotlib->descartes) (2.2.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (from matplotlib->descartes) (1.0.1)
Requirement already satisfied: setuptools in /Users/kmarwaha/miniconda3/lib/python3.5/site-packages (from kiwisolver>=1.0.1->matplotlib->descartes) (40.2.0)
You are using pip version 19.0.1, however version 19.1.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

Ok! Let's use it!

In [79]:
import descartes
import geopandas as gpd
import matplotlib.pyplot as plt
file = gpd.read_file('san-francisco.geojson')
file.head()
Out[79]:
   name              cartodb_id  created_at           updated_at           geometry
0  Seacliff                   1  2013-02-10T05:44:04  2013-02-10T05:44:04  (POLYGON ((-122.484089 37.78791, -122.484346 3...
1  Marina                    20  2013-02-10T05:44:04  2013-02-10T05:44:04  (POLYGON ((-122.446806 37.805401, -122.44678 3...
2  Pacific Heights           23  2013-02-10T05:44:04  2013-02-10T05:44:04  (POLYGON ((-122.446825 37.787251, -122.447228 ...
3  Nob Hill                  25  2013-02-10T05:44:04  2013-02-10T05:44:04  (POLYGON ((-122.418609 37.78891, -122.421954 3...
4  Presidio Heights          29  2013-02-10T05:44:04  2013-02-10T05:44:04  (POLYGON ((-122.4626 37.789041, -122.460923 37...

We can also see nice maps :-)

In [82]:
ax = file.plot(color='blue')

Does it work?

You can decide for yourself. Explore the diffs on GitHub, for example here or here.

In my opinion, this is still a prototype, but a cool possibility for using Jupyter and Git together.

Thanks!

Check out the latest Jupyter Lab interface for working with notebooks in a friendly, responsive way.