Project Jupyter (including IPython) is transforming interactive development and data exploration across multiple industries.
It is used in:
Jupyter is the product of a long process of scientific programming tool development that began as an "enhanced interactive shell" called IPython (mind the upper-case "I"), developed by a CU Boulder graduate student:
IPython creator Fernando Perez:
I started using Python in 2001 and liked the language, but its interactive prompt felt like a crippled toy compared to the systems mentioned above or to a Unix shell. When I found out about sys.displayhook, I realized that by putting in a callable object, I would be able to hold state and capture previous results for reuse. I then wrote a python startup file to provide these features and some other niceties such as loading Numeric and Gnuplot, giving me a 'mini-mathematica' in Python (femto- might be a better description, in fairness). Thus was my 'ipython-0.0.1' born, a mere 259 lines to be loaded as $PYTHONSTARTUP.
IPython (Interactive Python) is an enhanced Python shell which provides a more robust and productive development environment for users. There are several key features that set it apart from the standard Python shell.
While Jupyter runs code in many programming languages, Python is a requirement (Python 3.3 or greater, or Python 2.7) for installing the Jupyter Notebook.
For most new users, we highly recommend installing Anaconda, which conveniently installs Python, the Jupyter Notebook, and other commonly used packages for scientific computing and data science. Follow the instructions on the Anaconda download page. If the Jupyter Notebook is not already installed, it can be installed via the `conda` tool:
conda install notebook
Congratulations, you have installed Jupyter Notebook! To run the notebook:
jupyter notebook
As an existing Python user, you may wish to install Jupyter using Python's package manager, `pip`, instead of Anaconda:
pip install jupyter
Over time, the IPython project grew to include several components, including:
As each component evolved, several grew to the point that they warranted projects of their own. For example, pieces like the notebook and protocol are not even specific to Python. As a result, the IPython team created Project Jupyter, which is the new home of language-agnostic projects that began as part of IPython, such as the notebook in which you are reading this text.
The HTML notebook that is part of the Jupyter project supports interactive data visualization and easy high-performance parallel computing.
Interface tour:
.ipynb files
Jupyter notebook files are simple JSON documents containing text, source code, rich media output, and metadata. Each segment of the document is stored in a cell.
Notebooks can be populated with cells of content. They all have the same structure in the JSON notebook file:
{
  "cell_type" : "name",
  "metadata" : {},
  "source" : "single string or [list, of, strings]"
}
There are three types of cells:
Markdown cells are used for body-text, and contain markdown, as defined in GitHub-flavored markdown.
Code cells are the primary content of Jupyter notebooks. They contain source code in the language of the document’s associated kernel, and a list of outputs associated with executing that code.
Raw NBConvert cells contain content that should be included unmodified in nbconvert output. For example, this cell could include raw LaTeX for nbconvert to pdf via LaTeX. These are not rendered by the notebook.
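As a rough illustration of this structure, the sketch below builds a tiny notebook by hand with Python's standard `json` module and reads the cell types back. The cell contents are made up for the example, and the top-level keys (`nbformat`, `nbformat_minor`, `metadata`, `cells`) follow version 4 of the notebook format. In practice the `nbformat` package is the safer way to create and validate notebooks.

```python
import json

# A hand-written notebook with one cell of each type (contents are illustrative).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# A title"},
        {
            "cell_type": "code",
            "metadata": {},
            "source": "1 + 1",
            "execution_count": None,  # the cell has not been run yet
            "outputs": [],            # so it has no outputs either
        },
        {"cell_type": "raw", "metadata": {}, "source": "\\newpage"},
    ],
}

# Round-trip through JSON text, which is what an .ipynb file contains.
text = json.dumps(notebook, indent=1)
loaded = json.loads(text)

cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(cell_types)  # ['markdown', 'code', 'raw']
```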
The text you are reading now is a markdown cell. Here's a simple example calculation in a code cell:
import numpy as np
np.random.random(5)
array([0.3213146 , 0.28337549, 0.89938275, 0.15538227, 0.61234138])
The notebook user interface is modal. This means that the keyboard behaves differently depending upon the current mode of the notebook. A notebook has two modes: edit and command.
Edit mode is indicated by a green cell border and a prompt showing in the editor area. When a cell is in edit mode, you can type into the cell, like a normal text editor.
Command mode is indicated by a grey cell border. When in command mode, the structure of the notebook can be modified as a whole, but the text in individual cells cannot be changed. Most importantly, the keyboard is mapped to a set of shortcuts for efficiently performing notebook and cell actions. For example, pressing `c` in command mode copies the current cell; no modifier key is needed.
The Jupyter architecture consists of four components:
Engine The IPython engine is a Python instance that accepts Python commands over a network connection. When multiple engines are started, parallel and distributed computing becomes possible. An important property of an IPython engine is that it blocks while user code is being executed.
Hub The hub keeps track of engine connections, schedulers, and clients, and persists all task requests and results in a database for later use.
Schedulers All actions that can be performed on the engine go through a Scheduler. While the engines themselves block when user code is run, the schedulers hide that from the user to provide a fully asynchronous interface to a set of engines.
Client The primary object for connecting to a cluster.
(courtesy Min Ragan-Kelley)
This architecture is implemented using the ØMQ messaging library and the associated Python bindings in `pyzmq`.
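The division of labor between blocking engines and asynchronous schedulers can be mimicked with the standard library. The sketch below is only an analogy: `engine_execute` is a hypothetical stand-in for an engine, and a thread pool plays the role of the scheduler. It does not use ØMQ or the actual IPython parallel API.

```python
from concurrent.futures import ThreadPoolExecutor


def engine_execute(source):
    """Hypothetical engine: blocks until the submitted code has finished."""
    namespace = {}
    exec(source, namespace)
    return namespace.get("result")


# The pool hands back futures immediately, hiding the fact that each
# worker blocks while it runs the code it was given.
with ThreadPoolExecutor(max_workers=2) as scheduler:
    futures = [
        scheduler.submit(engine_execute, f"result = {n} ** 2") for n in range(4)
    ]
    results = [future.result() for future in futures]

print(results)  # [0, 1, 4, 9]
```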
The notebook lets you document your workflow using either HTML or Markdown.
The Jupyter Notebook consists of two related components:
Starting the Notebook server with the command:
$ jupyter notebook
initiates an IPython engine, which is a Python instance that takes Python commands over a network connection.
The IPython controller provides an interface for working with a set of engines, to which one or more IPython clients can connect.
The Notebook gives you everything that a browser gives you. For example, you can embed images, videos, or entire websites.
from IPython.display import IFrame
IFrame('https://jupyter.org', width='100%', height=350)
from IPython.display import YouTubeVideo
YouTubeVideo("Rc4JQWowG5I")
In IPython, all your inputs and outputs are saved. There are two variables named `In` and `Out` which are assigned as you work with your results. All outputs are saved automatically to variables of the form `_N`, where `N` is the prompt number, and inputs to `_iN`. This allows you to quickly recover the result of a prior computation by referring to its number, even if you forgot to store it as a variable.
np.sin(4)**2
0.5727500169043067
_4
0.5727500169043067
exec(_i4)
_4 / 4.
0.14318750422607668
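The output cache described above can be approximated in plain Python with `sys.displayhook`, the hook mentioned in Fernando Perez's account near the top of this document. This is a minimal sketch, not IPython's implementation: the class name `CachingDisplayHook` and its attributes are invented for the example.

```python
import sys


class CachingDisplayHook:
    """Callable that remembers every non-None result it is asked to display."""

    def __init__(self):
        self.out = {}   # maps prompt number -> result, similar in spirit to Out
        self.count = 0

    def __call__(self, value):
        if value is None:  # the interactive prompt stays silent for None
            return
        self.count += 1
        self.out[self.count] = value
        print(f"Out[{self.count}]: {value!r}")


hook = CachingDisplayHook()
sys.displayhook = hook  # at an interactive prompt, results now pass through hook

# Scripts do not invoke the display hook, so call it directly to simulate
# two expressions being evaluated at the prompt.
hook(2 ** 10)
hook("spam".upper())

# Reuse an earlier result by its number, as with _N in IPython.
print(hook.out[1] / 4)

sys.displayhook = sys.__displayhook__  # restore the default hook
```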
If you want details regarding the properties and functionality of any Python object currently loaded into IPython, you can use `?` to reveal any details that are available:
some_dict = {}
some_dict?
Type: dict String form: {} Length: 0 Docstring: dict() -> new empty dictionary dict(mapping) -> new dictionary initialized from a mapping object's (key, value) pairs dict(iterable) -> new dictionary initialized as if via: d = {} for k, v in iterable: d[k] = v dict(**kwargs) -> new dictionary initialized with the name=value pairs in the keyword argument list. For example: dict(one=1, two=2)
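Outside IPython, much of the same information is available from the standard library. The sketch below gathers a similar summary with `inspect.getdoc`; the field names simply mirror the output shown above, and it is not how IPython assembles its report.

```python
import inspect

some_dict = {}

# Collect the same kind of fields that `some_dict?` reports.
summary = {
    "Type": type(some_dict).__name__,
    "String form": str(some_dict),
    "Length": len(some_dict),
    "Docstring": inspect.getdoc(some_dict),
}

for field, value in summary.items():
    print(f"{field}: {value}")
```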
If available, additional detail is provided with two question marks, including the source code of the object itself.
from numpy.linalg import cholesky
cholesky
<function numpy.linalg.linalg.cholesky(a)>
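In plain Python, `inspect.getsource` retrieves source code in a similar way. The example below uses `textwrap.dedent` purely as a convenient pure-Python function to inspect; functions implemented in compiled code have no Python source to show.

```python
import inspect
import textwrap

# Fetch the source text of a function written in Python.
source = inspect.getsource(textwrap.dedent)

# Show only the first line (the function signature) to keep the output short.
print(source.splitlines()[0])
```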
This syntax can also be used to search namespaces with wildcards (`*`).
np.random.rand*?
np.random.rand np.random.randint np.random.randn np.random.random np.random.random_integers np.random.random_sample
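A comparable wildcard search can be sketched in plain Python by filtering the names in a namespace with `fnmatch`. The `math` module and the pattern `is*` are arbitrary choices for the example, and this is an approximation rather than IPython's own search code.

```python
import fnmatch
import math

# List every public name in the math module that starts with "is".
matches = fnmatch.filter(dir(math), "is*")
print(matches)
```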
Because IPython supports introspection, it can tab-complete commands that have been partially typed. Press the `<tab>` key at any point while typing a command:
np.arccos
<ufunc 'arccos'>
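The standard library's `rlcompleter` module shows the same idea on a small scale: it inspects live objects to propose completions. The helper `completions` below is a hypothetical convenience wrapper written for this sketch, and IPython's own completer is considerably more capable.

```python
import math
import rlcompleter

# Complete names against a namespace that contains only the math module.
completer = rlcompleter.Completer({"math": math})


def completions(prefix):
    """Collect every completion the Completer offers for the given prefix."""
    found = []
    state = 0
    while True:
        candidate = completer.complete(prefix, state)
        if candidate is None:  # no more candidates
            break
        found.append(candidate)
        state += 1
    return found


print(completions("math.sq"))
```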
In IPython, you can type `ls` to see your files or `cd` to change directories, just like you would at a regular system prompt:
ls /Users/fonnescj/Dropbox/Notes/
ls: cannot access '/Users/fonnescj/Dropbox/Notes/': No such file or directory
Virtually any system command can be accessed by prepending `!`, which passes any subsequent command directly to the OS.
!locate python | grep pdf
/bin/sh: 1: locate: not found
You can even use Python variables in commands sent to the OS:
file_type = 'png'
!ls images/*$file_type
ls: cannot access 'images/*png': No such file or directory
The output of a system command using the exclamation point syntax can be assigned to a Python variable.
data_files = !ls images/
data_files
["ls: cannot access 'images/': No such file or directory"]
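For comparison, the same capture of command output as a list of lines can be done in plain Python with `subprocess`. To stay portable, the sketch runs the current Python interpreter as the "system command"; the file names it prints are invented for the example.

```python
import subprocess
import sys

# Run a child process and capture what it writes to standard output.
command = [sys.executable, "-c", "print('alpha.png'); print('beta.png')"]
completed = subprocess.run(command, capture_output=True, text=True, check=True)

# Split the captured text into one list entry per line of output.
data_files = completed.stdout.splitlines()
print(data_files)  # ['alpha.png', 'beta.png']
```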
%qtconsole
If you type at the system prompt:
$ ipython qtconsole
instead of opening in a terminal, IPython will start a graphical console that at first sight appears just like a terminal but is in fact much more capable than a text-only terminal. This specialized terminal is designed for interactive scientific work: it supports full multi-line editing with color highlighting and graphical calltips for functions, it can keep multiple IPython sessions open simultaneously in tabs, and when scripts run it can display figures inline, directly in the work area.
Any plots generated by the Python kernel can be redirected to appear inline, in an output cell. Here is an example using Matplotlib:
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
def f(x):
    return (x-3)*(x-5)*(x-7)+85
import numpy as np
x = np.linspace(0, 10, 200)
y = f(x)
plt.plot(x,y)
[<matplotlib.lines.Line2D at 0x78693282af28>]
Matplotlib will be your workhorse for creating plots in notebooks. But it's not the only game in town! A recent new player is Bokeh, a visualization library to make amazing interactive plots and share them online. It can also handle very large data sets with excellent performance.
If you installed Anaconda on your system, you will probably already have Bokeh. You can check if it's there by running the `conda list` command. If you installed Miniconda, you will need to install it with `conda install bokeh`.
After installing Bokeh, we have many modules available: `bokeh.plotting` gives you the ability to create interactive figures with zoom, pan, resize, save, and other tools.
from bokeh import plotting as bplotting
Bokeh integrates with Jupyter notebooks by calling the `output_notebook` function, as follows:
bplotting.output_notebook()  # direct Bokeh plots to render inline in the notebook
import numpy
# Get an array of 100 evenly spaced points from 0 to 2*pi
x = np.linspace(0.0, 2.0 * numpy.pi, 100)
# Make a pointwise function of x with exp(sin(x))
y = np.exp(np.sin(x))
deriv_exact = y * np.cos(x) # analytical derivative
def forward_diff(y, x):
    """Compute derivative by forward differencing."""
    # Use numpy.empty to make an empty array to put our derivatives in
    deriv = numpy.empty(y.size - 1)
    # Use a for-loop to go through each point and compute the derivative.
    for i in range(deriv.size):
        deriv[i] = (y[i+1] - y[i]) / (x[i+1] - x[i])
    # Return the derivative (a NumPy array)
    return deriv
# Call the function to perform finite differencing
deriv = forward_diff(y, x)
# create a new Bokeh plot with axis labels, name it "bop"
bop = bplotting.figure(x_axis_label='x', y_axis_label='dy/dx')
# add a title, change the font
bop.title.text = "Derivative of exp(sin(x))"
bop.title.text_font = "palatino"
# add a line with legend and line thickness to "bop"
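# (legend_label replaces the legend keyword, which recent Bokeh releases no longer accept)
bop.line(x, deriv_exact, legend_label="analytical", line_width=2)
# add circle markers with legend, specify color
bop.scatter((x[1:] + x[:-1]) / 2.0, deriv, marker="circle", legend_label="numerical", fill_color="gray", size=8, line_color=None)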
bop.grid.grid_line_alpha=0.3
bplotting.show(bop);
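The explicit loop in `forward_diff` can also be written as a single vectorized expression. The sketch below assumes the same grid and function as above, uses `np.diff` in place of the loop, and checks the result against the analytical derivative evaluated at the interval midpoints (the same x-positions used for the markers in the plot).

```python
import numpy as np

# Same grid and function as in the example above.
x = np.linspace(0.0, 2.0 * np.pi, 100)
y = np.exp(np.sin(x))

# Vectorized forward difference: one value per interval between grid points.
deriv_vectorized = np.diff(y) / np.diff(x)

# Compare with the analytical derivative at the midpoint of each interval.
midpoints = (x[1:] + x[:-1]) / 2.0
deriv_exact_mid = np.exp(np.sin(midpoints)) * np.cos(midpoints)
max_error = np.max(np.abs(deriv_vectorized - deriv_exact_mid))

print(f"maximum absolute error at the midpoints: {max_error:.2e}")
```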
Markdown is a simple markup language that allows plain text to be converted into HTML.
The advantages of using Markdown over HTML (and LaTeX):
For example, instead of writing:
<p>In order to create valid
<a href="http://en.wikipedia.org/wiki/HTML">HTML</a>, you
need properly coded syntax that can be cumbersome for
“non-programmers” to write. Sometimes, you
just want to easily make certain words <strong>bold
</strong>, and certain words <em>italicized</em> without
having to remember the syntax. Additionally, for example,
creating lists:</p>
<ul>
<li>should be easy</li>
<li>should not involve programming</li>
</ul>
we can write the following in Markdown:
In order to create valid [HTML], you need properly
coded syntax that can be cumbersome for
"non-programmers" to write. Sometimes, you just want
to easily make certain words **bold**, and certain
words *italicized* without having to remember the
syntax. Additionally, for example, creating lists:
* should be easy
* should not involve programming
Markdown uses `*` (asterisk) and `_` (underscore) characters as indicators of emphasis.
*italic*, _italic_
**bold**, __bold__
***bold-italic***, ___bold-italic___
italic, italic
bold, bold
bold-italic, bold-italic
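To make the conversion to HTML concrete, here is a toy converter for just these emphasis markers, written with regular expressions. The function name `emphasis_to_html` is invented for this sketch, and it is far from a conforming Markdown parser: it ignores nesting, escapes, and every other Markdown feature.

```python
import re


def emphasis_to_html(text):
    """Convert *, **, *** (and _ equivalents) emphasis markers to HTML tags."""
    # Handle the longest markers first so *** is not mistaken for ** or *.
    text = re.sub(r"(\*\*\*|___)(.+?)\1", r"<strong><em>\2</em></strong>", text)
    text = re.sub(r"(\*\*|__)(.+?)\1", r"<strong>\2</strong>", text)
    text = re.sub(r"(\*|_)(.+?)\1", r"<em>\2</em>", text)
    return text


print(emphasis_to_html("*italic*, _italic_"))
print(emphasis_to_html("**bold**, __bold__"))
print(emphasis_to_html("***bold-italic***, ___bold-italic___"))
```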
Markdown supports both unordered and ordered lists. Unordered lists can use `*`, `-`, or `+` to define a list. This is an unordered list:
* Apples
* Bananas
* Oranges
Ordered lists are numbered lists in plain text:
1. Bryan Ferry
2. Brian Eno
3. Andy Mackay
4. Paul Thompson
5. Phil Manzanera
Markdown inline links are equivalent to HTML `<a href='foo.com'>` links; they just have a different syntax.
[Biostatistics home page](http://biostat.mc.vanderbilt.edu "Visit Biostat!")
Block quotes are denoted by a `>` (greater than) character before each line of the block quote.
> Sometimes a simple model will outperform a more complex model . . .
> Nevertheless, I believe that deliberately limiting the complexity
> of the model is not fruitful when the problem is evidently complex.
Sometimes a simple model will outperform a more complex model . . . Nevertheless, I believe that deliberately limiting the complexity of the model is not fruitful when the problem is evidently complex.
Images look an awful lot like Markdown links; they just have an extra `!` (exclamation mark) in front of them.
![Python logo](images/python-logo-master-v3-TM.png)
Use `%load` to add remote code:
# %load http://matplotlib.org/mpl_examples/statistics/boxplot_demo.py
"""
=========================================
Demo of artist customization in box plots
=========================================
This example demonstrates how to use the various kwargs
to fully customize box plots. The first figure demonstrates
how to remove and add individual components (note that the
mean is the only value not shown by default). The second
figure demonstrates how the styles of the artists can
be customized. It also demonstrates how to set the limit
of the whiskers to specific percentiles (lower right axes)
A good general reference on boxplots and their history can be found
here: http://vita.had.co.nz/papers/boxplots.pdf
"""
import numpy as np
import matplotlib.pyplot as plt
# fake data
np.random.seed(937)
data = np.random.lognormal(size=(37, 4), mean=1.5, sigma=1.75)
labels = list('ABCD')
fs = 10 # fontsize
# demonstrate how to toggle the display of different elements:
fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(6, 6), sharey=True)
axes[0, 0].boxplot(data, labels=labels)
axes[0, 0].set_title('Default', fontsize=fs)
axes[0, 1].boxplot(data, labels=labels, showmeans=True)
axes[0, 1].set_title('showmeans=True', fontsize=fs)
axes[0, 2].boxplot(data, labels=labels, showmeans=True, meanline=True)
axes[0, 2].set_title('showmeans=True,\nmeanline=True', fontsize=fs)
axes[1, 0].boxplot(data, labels=labels, showbox=False, showcaps=False)
tufte_title = 'Tufte Style \n(showbox=False,\nshowcaps=False)'
axes[1, 0].set_title(tufte_title, fontsize=fs)
axes[1, 1].boxplot(data, labels=labels, notch=True, bootstrap=10000)
axes[1, 1].set_title('notch=True,\nbootstrap=10000', fontsize=fs)
axes[1, 2].boxplot(data, labels=labels, showfliers=False)
axes[1, 2].set_title('showfliers=False', fontsize=fs)
for ax in axes.flatten():
    ax.set_yscale('log')
    ax.set_yticklabels([])
fig.subplots_adjust(hspace=0.4)
plt.show()
# demonstrate how to customize the display different elements:
boxprops = dict(linestyle='--', linewidth=3, color='darkgoldenrod')
flierprops = dict(marker='o', markerfacecolor='green', markersize=12,
                  linestyle='none')
medianprops = dict(linestyle='-.', linewidth=2.5, color='firebrick')
meanpointprops = dict(marker='D', markeredgecolor='black',
                      markerfacecolor='firebrick')
meanlineprops = dict(linestyle='--', linewidth=2.5, color='purple')
fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(6, 6), sharey=True)
axes[0, 0].boxplot(data, boxprops=boxprops)
axes[0, 0].set_title('Custom boxprops', fontsize=fs)
axes[0, 1].boxplot(data, flierprops=flierprops, medianprops=medianprops)
axes[0, 1].set_title('Custom medianprops\nand flierprops', fontsize=fs)
# whiskers spanning the full data range (recent Matplotlib no longer accepts whis='range')
axes[0, 2].boxplot(data, whis=(0, 100))
axes[0, 2].set_title('whis=(0, 100)', fontsize=fs)
axes[1, 0].boxplot(data, meanprops=meanpointprops, meanline=False,
                   showmeans=True)
axes[1, 0].set_title('Custom mean\nas point', fontsize=fs)
axes[1, 1].boxplot(data, meanprops=meanlineprops, meanline=True,
                   showmeans=True)
axes[1, 1].set_title('Custom mean\nas line', fontsize=fs)
axes[1, 2].boxplot(data, whis=[15, 85])
axes[1, 2].set_title('whis=[15, 85]\n#percentiles', fontsize=fs)
for ax in axes.flatten():
    ax.set_yscale('log')
    ax.set_yticklabels([])
fig.suptitle("I never said they'd be pretty")
fig.subplots_adjust(hspace=0.4)
plt.show()
# %load http://matplotlib.org/mpl_examples/shapes_and_collections/scatter_demo.py
"""
Simple demo of a scatter plot.
"""
import numpy as np
import matplotlib.pyplot as plt
N = 50
x = np.random.rand(N)
y = np.random.rand(N)
colors = np.random.rand(N)
area = np.pi * (15 * np.random.rand(N))**2 # 0 to 15 point radii
plt.scatter(x, y, s=area, c=colors, alpha=0.5)
plt.show()
MathJax is a JavaScript implementation of LaTeX that allows equations, such as an inline $\alpha$, to be embedded into HTML. For example, this markup:
"""$$ \int_{a}^{b} f(x)\, dx \approx \frac{1}{2} \sum_{k=1}^{N} \left( x_{k} - x_{k-1} \right) \left( f(x_{k}) + f(x_{k-1}) \right). $$"""
becomes this:
$$ \int_{a}^{b} f(x)\, dx \approx \frac{1}{2} \sum_{k=1}^{N} \left( x_{k} - x_{k-1} \right) \left( f(x_{k}) + f(x_{k-1}) \right). $$
Code cells can also display their output as LaTeX:
from IPython.display import Latex
Latex(r"""\begin{eqnarray}
\nabla \times \vec{\mathbf{B}} -\, \frac1c\, \frac{\partial\vec{\mathbf{E}}}{\partial t} & = \frac{4\pi}{c}\vec{\mathbf{j}} \\
\nabla \cdot \vec{\mathbf{E}} & = 4 \pi \rho \\
\nabla \times \vec{\mathbf{E}}\, +\, \frac1c\, \frac{\partial\vec{\mathbf{B}}}{\partial t} & = \vec{\mathbf{0}} \\
\nabla \cdot \vec{\mathbf{B}} & = 0
\end{eqnarray}""")
SymPy is a Python library for symbolic mathematics. It supports:
from sympy import *
init_printing()
x, y = symbols("x y")
eq = ((x+y)**2 * (x+1))
eq
expand(eq)
(1/cos(x)).series(x, 0, 6)
limit((sin(x)-x)/x**3, x, 0)
diff(cos(x**2)**2 / (1+x), x)
IPython has a set of predefined ‘magic functions’ that you can call with a command line style syntax. These include:
%run
%edit
%debug
%timeit
%paste
%load_ext
%lsmagic
Available line magics: %alias %alias_magic %autoawait %autocall %automagic %autosave %bookmark %cat %cd %clear %colors %config %connect_info %cp %debug %dhist %dirs %doctest_mode %ed %edit %env %gui %hist %history %killbgscripts %ldir %less %lf %lk %ll %load %load_ext %loadpy %logoff %logon %logstart %logstate %logstop %ls %lsmagic %lx %macro %magic %man %matplotlib %mkdir %more %mv %notebook %page %pastebin %pdb %pdef %pdoc %pfile %pinfo %pinfo2 %popd %pprint %precision %prun %psearch %psource %pushd %pwd %pycat %pylab %qtconsole %quickref %recall %rehashx %reload_ext %rep %rerun %reset %reset_selective %rm %rmdir %run %save %sc %set_env %store %sx %system %tb %time %timeit %unalias %unload_ext %who %who_ls %whos %xdel %xmode Available cell magics: %%! %%HTML %%SVG %%bash %%capture %%debug %%file %%html %%javascript %%js %%latex %%markdown %%perl %%prun %%pypy %%python %%python2 %%python3 %%ruby %%script %%sh %%svg %%sx %%system %%time %%timeit %%writefile Automagic is ON, % prefix IS NOT needed for line magics.
Timing the execution of code; the `timeit` magic exists both in line and cell form:
%timeit np.linalg.eigvals(np.random.rand(100,100))
26.2 ms ± 1.32 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%%timeit a = np.random.rand(100, 100)
np.linalg.eigvals(a)
30.9 ms ± 2.44 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
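The magic is built on the standard library's `timeit` module, which can also be called directly. The sketch below mirrors the "7 runs, 10 loops each" shape of the output above; the statement being timed (sorting a list of random numbers) is an arbitrary stand-in.

```python
import timeit

# Time the statement 10 times per run, and repeat the whole run 7 times.
# The setup code runs once per run and is not included in the timing.
timings = timeit.repeat(
    stmt="sorted(data)",
    setup="import random; data = [random.random() for _ in range(1000)]",
    repeat=7,
    number=10,
)

# Each entry is the total time for 10 loops, so divide to get per-loop time.
best_per_loop = min(timings) / 10
print(f"best of 7 runs: {best_per_loop * 1e6:.1f} µs per loop")
```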
IPython also creates aliases for a few common interpreters, such as bash, ruby, perl, etc.
These are all equivalent to %%script <name>
%%ruby
puts "Hello from Ruby #{RUBY_VERSION}"
Hello from Ruby 2.3.3
%%bash
echo "hello from $BASH"
hello from /bin/bash
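Roughly speaking, a `%%script <name>` cell starts the named program and feeds it the cell body on standard input. The sketch below imitates that with `subprocess`, using the current Python interpreter as the program so that it runs anywhere; the real magic supports further options that are not shown here.

```python
import subprocess
import sys

# The text that would follow the %%script line in a notebook cell.
cell_body = 'print("hello from a child interpreter")\n'

# Start the interpreter ("-" means: read the program from standard input)
# and send the cell body to it.
completed = subprocess.run(
    [sys.executable, "-"],
    input=cell_body,
    capture_output=True,
    text=True,
    check=True,
)

output = completed.stdout
print(output, end="")
```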
IPython has an `rmagic` extension that contains some magic functions for working with R via rpy2. This extension can be loaded using the `%load_ext` magic as follows:
%load_ext rpy2.ipython
If the above generates an error, it is likely that you do not have the `rpy2` module installed. You can install this now via:
!pip install rpy2
Requirement already satisfied: rpy2 in ./anaconda3/envs/dev/lib/python3.6/site-packages (2.9.4) Requirement already satisfied: six in ./anaconda3/envs/dev/lib/python3.6/site-packages (from rpy2) (1.12.0) Requirement already satisfied: jinja2 in ./anaconda3/envs/dev/lib/python3.6/site-packages (from rpy2) (2.10) Requirement already satisfied: MarkupSafe>=0.23 in ./anaconda3/envs/dev/lib/python3.6/site-packages (from jinja2->rpy2) (1.1.0)
x,y = np.arange(10), np.random.normal(size=10)
%R print(lm(rnorm(10)~rnorm(10)))
Call: lm(formula = rnorm(10) ~ rnorm(10)) Coefficients: (Intercept) -0.6027
%%R -i x,y -o XYcoef
lm.fit <- lm(y~x)
par(mfrow=c(2,2))
print(summary(lm.fit))
plot(lm.fit)
XYcoef <- coef(lm.fit)
Call: lm(formula = y ~ x) Residuals: Min 1Q Median 3Q Max -0.6616 -0.4986 -0.1731 0.5322 1.1252 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 0.95079 0.40765 2.332 0.048 * x -0.10710 0.07636 -1.403 0.198 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 0.6936 on 8 degrees of freedom Multiple R-squared: 0.1974, Adjusted R-squared: 0.09703 F-statistic: 1.967 on 1 and 8 DF, p-value: 0.1983
XYcoef
0.950793 | -0.107098 |
As we have seen already, IPython has cell and line magics. You can define your own magics using any Python function and the `register_magic_function` method or, as in this example, the `register_line_magic` decorator:
from IPython.core.magic import (register_line_magic, register_cell_magic,
                                register_line_cell_magic)
@register_line_magic
def countdown(line):
    """Print a countdown before executing cell"""
    import time
    for i in range(int(line)):
        time.sleep(1)
        print(i+1, '... ', end='')
    print('Go!')
%countdown 5
np.random.random(5)
1 ... 2 ... 3 ... 4 ... 5 ... Go!
array([0.87612941, 0.01246525, 0.76656015, 0.73752183, 0.33195304])
It's very easy to run Jupyter on top of languages other than Python, using a language kernel that is either native or wrapped in IPython. Many kernels already exist.
The exact procedure to install a kernel for a different language depends on the specifics of each language, though there is a common set of steps to follow.
While a kernel is usually thought of as a specific language, a kernel may be:
Installing multiple kernels does not automatically allow one notebook to use many languages at once, but this is also possible.
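One step that kernel installations generally share is registering a kernel spec: a directory containing a `kernel.json` file that tells Jupyter how to launch the kernel. The sketch below writes such a file for the standard Python kernel. The directory name `example-kernel` and the display name are hypothetical, and the file is written to a temporary directory rather than to a real Jupyter kernels directory.

```python
import json
import tempfile
from pathlib import Path

# The three core fields of a kernel spec: how to start the kernel, the name
# shown in the notebook interface, and the language it runs.
kernel_spec = {
    "argv": ["python", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
    "display_name": "Python 3 (example)",
    "language": "python",
}

with tempfile.TemporaryDirectory() as tmp:
    spec_dir = Path(tmp) / "example-kernel"  # hypothetical kernel name
    spec_dir.mkdir()
    spec_file = spec_dir / "kernel.json"
    spec_file.write_text(json.dumps(kernel_spec, indent=2))

    # Read the file back, as Jupyter would when listing available kernels.
    loaded = json.loads(spec_file.read_text())

print(loaded["display_name"], "->", " ".join(loaded["argv"]))
```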
The `%debug` magic can be used to trigger the IPython debugger (`ipdb`) for a cell that raises an exception. The debugger allows you to step through code line by line, inspect variables, and execute code.
def div(x, y):
    return x/y
div(1,0)
--------------------------------------------------------------------------- ZeroDivisionError Traceback (most recent call last) <ipython-input-48-a6dbe3c69c60> in <module> 2 return x/y 3 ----> 4 div(1,0) <ipython-input-48-a6dbe3c69c60> in div(x, y) 1 def div(x, y): ----> 2 return x/y 3 4 div(1,0) ZeroDivisionError: division by zero
%debug
> <ipython-input-48-a6dbe3c69c60>(2)div() 1 def div(x, y): ----> 2 return x/y 3 4 div(1,0)
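A post-mortem debugger starts from the traceback attached to the exception. The non-interactive sketch below uses the standard `traceback` module to locate the frame where the error occurred, which is the same frame the debugger prompt above opens in. In plain Python, `pdb.post_mortem()` provides the interactive equivalent.

```python
import traceback


def div(x, y):
    return x / y


try:
    div(1, 0)
except ZeroDivisionError as error:
    # The traceback lists frames from the outermost call to the innermost one.
    frames = traceback.extract_tb(error.__traceback__)
    last_frame = frames[-1]  # the frame in which the exception was raised
    message = str(error)

print(f"{message!r} raised in {last_frame.name}() at line {last_frame.lineno}")
```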
In Jupyter, one can convert an `.ipynb` notebook document file into various static formats via the `nbconvert` tool. Currently, nbconvert is a command line tool, run as a script using Jupyter.
!jupyter nbconvert --to html "IPython and Jupyter.ipynb"
[NbConvertApp] Converting notebook IPython and Jupyter.ipynb to html [NbConvertApp] Writing 551579 bytes to IPython and Jupyter.html
Currently, `nbconvert` supports HTML (default), LaTeX, Markdown, reStructuredText, Python and HTML5 slides for presentations. Some types can be post-processed, such as LaTeX to PDF (this requires Pandoc to be installed, however).
!jupyter nbconvert --to pdf "IPython and Jupyter.ipynb"
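Because a notebook is plain JSON, a simple conversion can be sketched without any tooling. The function below, loosely in the spirit of `--to python`, turns code cells into script text and markdown cells into comments. The name `notebook_to_script` and the example notebook are invented for this sketch; it is not nbconvert's code and does not reproduce its exact output.

```python
def notebook_to_script(notebook):
    """Return the notebook's cells as the text of a Python script."""
    chunks = []
    for cell in notebook["cells"]:
        source = cell["source"]
        if isinstance(source, list):  # source may be stored as a list of lines
            source = "".join(source)
        if cell["cell_type"] == "code":
            chunks.append(source)
        elif cell["cell_type"] == "markdown":
            # Keep the prose, but as comments so the script still runs.
            chunks.append("\n".join("# " + line for line in source.splitlines()))
    return "\n\n".join(chunks) + "\n"


example_notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "Compute a sum"},
        {
            "cell_type": "code",
            "metadata": {},
            "source": ["total = 1 + 2\n", "print(total)"],
            "execution_count": None,
            "outputs": [],
        },
    ]
}

script = notebook_to_script(example_notebook)
print(script)
```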
A very useful online service is the IPython Notebook Viewer which allows you to display your notebook as a static HTML page, which is useful for sharing with others:
IFrame("http://nbviewer.ipython.org/2352771", width='100%', height=350)
Since 2015, GitHub has supported rendering of Jupyter notebooks stored in its repositories.
reproducing conclusions from a single experiment based on the measurements from that experiment
The most basic form of reproducibility is a complete description of the data and associated analyses (including code!) so the results can be exactly reproduced by others.
Reproducing calculations can be onerous, even with one's own work!
Scientific data are becoming larger and more complex, making simple descriptions inadequate for reproducibility. As a result, most modern research is irreproducible without tremendous effort.
***Reproducible research is not yet part of the culture of science in general, or scientific computing in particular.***
There are a number of steps to scientific endeavors that involve computing:
Many of the standard tools impose barriers between one or more of these steps. This can make it difficult to iterate on and reproduce work.
The Jupyter notebook eliminates or reduces these barriers to reproducibility.
Since a Jupyter Notebook is just a JSON file, it can be readily added to a version control system, such as Git. However, JSON is not inherently conducive to resolving merge conflicts or other tasks that involve comparing different versions of the file.
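One reason raw JSON diffs are noisy is that fields such as execution counts change on every run even when the code does not. The sketch below compares only the cell sources of two small, made-up notebooks with `difflib`. It illustrates the idea of a content-aware diff and is not how any particular notebook diffing tool is implemented.

```python
import difflib


def cell_sources(notebook):
    """Return each cell's source as a single string."""
    sources = []
    for cell in notebook["cells"]:
        source = cell["source"]
        sources.append("".join(source) if isinstance(source, list) else source)
    return sources


# Two versions of a notebook: one line of code changed, and every
# execution count changed because the notebook was re-run.
old = {"cells": [
    {"cell_type": "code", "source": "x = 1", "execution_count": 1, "outputs": []},
    {"cell_type": "code", "source": "print(x)", "execution_count": 2, "outputs": []},
]}
new = {"cells": [
    {"cell_type": "code", "source": "x = 2", "execution_count": 7, "outputs": []},
    {"cell_type": "code", "source": "print(x)", "execution_count": 8, "outputs": []},
]}

# Diff the sources only, so the execution counts never appear in the output.
diff = list(difflib.unified_diff(
    cell_sources(old), cell_sources(new),
    fromfile="old", tofile="new", lineterm="",
))
print("\n".join(diff))
```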
`nbdime` provides tools for diffing and merging of notebooks.
* `nbdiff`: compare notebooks in a terminal-friendly way
* `nbmerge`: three-way merge of notebooks with automatic conflict resolution
* `nbdiff-web`: shows you a rich rendered diff of notebooks
* `nbmerge-web`: gives you a web-based three-way merge tool for notebooks
* `nbshow`: present a single notebook in a terminal-friendly way
There are a growing number of services for hosting and running notebooks remotely.
Have a repository full of Jupyter notebooks? With Binder, you can add a badge that opens those notebooks in an executable environment, making your code immediately reproducible by anyone, anywhere.
My PyCon 2017 tutorial in mybinder
If you followed the instructions above, you launched the Jupyter Notebook App on a cloud service called `tmpnb`. As the prefix `tmp` indicates, it gives you a temporary demo: as soon as you close the browser tab (or after a few minutes of inactivity), the kernel dies and the content you wrote is not saved anywhere. This is a free service, sponsored by the company Rackspace.
In June 2016, Microsoft announced notebooks hosted on Azure cloud. You need to create a free account to be able to save your work.
Though you can set up your own Amazon EC2 instances and run notebooks, Domino Data lab does most of the work in setting up and provisioning Amazon services with Jupyter notebooks (and other cloud software) ready to launch. You can choose the compute size, and pay on a sliding scale.
In addition to this, you can host your own temporary notebooks using tmpnb, which uses Docker images to run notebooks on-demand, that automatically shut down when they are idle.
IPython Notebook Viewer Displays static HTML versions of notebooks, and includes a gallery of notebook examples.
A Reference-Free Algorithm for Computational Normalization of Shotgun Sequencing Data A landmark example of reproducible research in genomics: Git repo, IPython notebook, data and scripts.
Jacques Ravel and K. Eric Wommack. 2014. All Hail Reproducibility in Microbiome Research. Microbiome, 2:8.
Benjamin Ragan-Kelley et al. 2013. Collaborative cloud-enabled tools allow rapid, reproducible biological insights. The ISME Journal, 7, 461–464. doi:10.1038/ismej.2012.123