#!/usr/bin/env python
# coding: utf-8

# # IPython and Jupyter
#
# ![](http://jupyter.org/assets/nav_logo.svg)
#
# Project Jupyter (including IPython) is transforming interactive development and data exploration across multiple industries.
#
# It is used widely:
#
# - educational materials at Software and Data Carpentries
# - courses at many universities, including Bryn Mawr College, Cal Poly, Clemson University, George Washington University, Michigan State University, Vanderbilt University, New York University, Northwestern University, UC Berkeley, and University of Sheffield
# - GitHub has built-in support for rendering notebooks
# - news organizations are writing data-driven news articles using Jupyter
# - traditionally published books have been written using notebooks, such as at O'Reilly Media
# - Google is using it in their cloud data lab
# - Brookhaven National Laboratory uses Jupyter in analysis of data from scientific instruments
# - Netflix uses Jupyter internally
# - NASA uses it for data analysis
# - IBM uses it in their Data Science experience
# - Quantopian has hosted notebooks for financial modeling
# - Lawrence Berkeley National Laboratory uses Jupyter in multiple scientific projects
# - It is also in use at the NBA, Bloomberg, and Microsoft.
#
#
# ### A brief history of interactive scientific computing in Python
#
# Jupyter is the product of a long process of scientific programming tool development that began with an "enhanced interactive shell" called IPython (mind the upper case "I"), developed by a CU Boulder graduate student.
#
# IPython creator Fernando Perez:
#
# > I started using Python in 2001 and liked the language, but its interactive prompt felt like a crippled toy compared to the systems mentioned above or to a Unix shell. When I found out about sys.displayhook, I realized that by putting in a callable object, I would be able to hold state and capture previous results for reuse.
# > I then wrote a python startup file to provide these features and some other niceties such as loading Numeric and Gnuplot, giving me a 'mini-mathematica' in Python (femto- might be a better description, in fairness). Thus was my 'ipython-0.0.1' born, a mere 259 lines to be loaded as $PYTHONSTARTUP.
#
# **IPython** (Interactive Python) is an enhanced Python shell which provides a more robust and productive development environment for users. Several key features set it apart from the standard Python shell:
#
# * Interactive data analysis and visualization
# * Python kernel for Jupyter notebooks
# * Easy parallel computation
# * Flexible, embeddable interpreters to load into your own projects

# ## Installation
#
# While Jupyter runs code in many programming languages, **Python is a requirement** (Python 3.3 or greater, or Python 2.7) for installing the Jupyter Notebook.
#
# For ~~new~~ *most* users, we highly recommend installing Anaconda, which conveniently installs Python, the Jupyter Notebook, and other commonly used packages for scientific computing and data science. Follow the instructions on the Anaconda download page.
#
# If it is not installed already, the Jupyter Notebook can be installed via the `conda` tool:
#
#     conda install notebook
#
# Congratulations, you have installed Jupyter Notebook! To run the notebook:
#
#     jupyter notebook
#
# As an existing Python user, you may wish to install Jupyter using Python’s package manager, `pip`, instead of Anaconda.
#
#     pip install jupyter

# # Jupyter Notebook
#
# Over time, the IPython project grew to include several components, including:
#
# * an interactive shell
# * a REPL protocol
# * a notebook document format
# * a notebook document conversion tool
# * a web-based notebook authoring tool
# * tools for building interactive UI (widgets)
# * interactive parallel Python
#
# As each component evolved, several grew to the point that they warranted projects of their own. For example, pieces like the notebook and protocol are not even specific to Python. As a result, the IPython team created Project Jupyter, which is the new home of language-agnostic projects that began as part of IPython, such as the notebook in which you are reading this text.
#
# The HTML notebook that is part of the Jupyter project supports **interactive data visualization** and easy high-performance **parallel computing**.
#
# ## The Basics
#
# Interface tour:
#
# - Home page and tabs
# - Starting a new kernel
# - Menubar and toolbar
#
# ### `.ipynb` files
#
# Jupyter notebook files are simple JSON documents, containing text, source code, rich media output, and metadata. Each segment of the document is stored in a cell.
#
# ### Notebook cells
#
# Notebooks can be populated with **cells** of content. They all have the same structure in the JSON notebook file:
#
# ```json
# {
#   "cell_type" : "name",
#   "metadata" : {},
#   "source" : "single string or [list, of, strings]"
# }
# ```
#
#
# There are three types of cells:
#
# **Markdown cells** are used for body-text, and contain markdown, as defined in GitHub-flavored markdown.
#
# **Code cells** are the primary content of Jupyter notebooks. They contain source code in the language of the document’s associated kernel, and a list of outputs associated with executing that code.
#
# **Raw NBConvert cells** contain content that should be included unmodified in nbconvert output.
# For example, this cell could include raw LaTeX for nbconvert to pdf via LaTeX. These are not rendered by the notebook.
#
# The text you are reading now is a markdown cell. Here's a simple example calculation in a code cell:

# In[1]:

import numpy as np
np.random.random(5)

# ### Modality
#
# The notebook user interface is *modal*. This means that the keyboard behaves differently depending upon the current mode of the notebook. A notebook has two modes: **edit** and **command**.
#
# **Edit mode** is indicated by a green cell border and a prompt showing in the editor area. When a cell is in edit mode, you can type into the cell, like a normal text editor.
#
# ![](images/edit_mode.png)
#
# **Command mode** is indicated by a grey cell border. When in command mode, the structure of the notebook can be modified as a whole, but the text in individual cells cannot be changed. Most importantly, the keyboard is mapped to a set of shortcuts for efficiently performing notebook and cell actions. For example, pressing **`c`** in command mode copies the current cell; no modifier is needed.
#
# ![](images/command_mode.png)
#
#
# Enter edit mode by pressing `Enter` or using the mouse to click on a cell's editor area. #
#
# Enter command mode by pressing `Esc` or using the mouse to click *outside* a cell's editor area. #
#
# Do not attempt to type into a cell when in command mode; unexpected things will happen! #
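# The cell structure described under "Notebook cells" can be sketched with nothing but the standard `json` module. This is a minimal sketch, assuming version 4 of the notebook format (top-level `cells`, `metadata`, `nbformat`, and `nbformat_minor` keys); `make_cell` is a hypothetical helper written for this illustration, not part of any Jupyter library.

```python
import json


def make_cell(cell_type, source):
    """Build a cell dictionary with the three fields every cell shares."""
    cell = {"cell_type": cell_type, "metadata": {}, "source": source}
    if cell_type == "code":
        # Code cells additionally carry an execution counter and a list of outputs.
        cell["execution_count"] = None
        cell["outputs"] = []
    return cell


# A tiny notebook with one cell of each type (assumed nbformat version 4).
notebook = {
    "cells": [
        make_cell("markdown", "# A tiny notebook"),
        make_cell("code", "import numpy as np\nnp.random.random(5)"),
        make_cell("raw", "\\newpage"),
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 2,
}

# Serialize to the text that would be stored in an .ipynb file, then read it back.
text = json.dumps(notebook, indent=1)
round_trip = json.loads(text)

cell_types = [cell["cell_type"] for cell in round_trip["cells"]]
print(cell_types)
```

# Writing `text` to a file with an `.ipynb` extension should give a document the notebook interface can open, although in practice the `nbformat` package is the safer way to create and validate notebooks.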
# ## Jupyter Architecture
#
# The Jupyter architecture consists of four components:
#
# 1. **Engine** The IPython engine is a Python instance that accepts Python commands over a network connection. When multiple engines are started, parallel and distributed computing becomes possible. An important property of an IPython engine is that it blocks while user code is being executed.
#
# 2. **Hub** The hub keeps track of engine connections, schedulers, and clients, and persists all task requests and results in a database for later use.
#
# 3. **Schedulers** All actions that can be performed on the engine go through a Scheduler. While the engines themselves block when user code is run, the schedulers hide that from the user to provide a fully asynchronous interface to a set of engines.
#
# 4. **Client** The primary object for connecting to a cluster.
#
# ![IPython architecture](images/ipython_architecture.png)
# (courtesy Min Ragan-Kelley)
#
# This architecture is implemented using the ØMQ messaging library and the associated Python bindings in `pyzmq`.

# The notebook lets you document your workflow using either HTML or Markdown.
#
# The Jupyter Notebook consists of two related components:
#
# * A JSON-based notebook document format for recording and distributing Python code and rich text.
# * A web-based user interface for authoring and running notebook documents.
#
# Starting the Notebook server with the command:
#
#     $ jupyter notebook
#
# initiates an **IPython engine**, which is a Python instance that takes Python commands over a network connection.
#
# The **IPython controller** provides an interface for working with a set of engines, to which one or more **IPython clients** can connect.
#
# The Notebook gives you everything that a browser gives you. For example, you can embed images, videos, or entire websites.
# In[2]:

from IPython.display import IFrame
IFrame('https://jupyter.org', width='100%', height=350)

# In[3]:

from IPython.display import YouTubeVideo
YouTubeVideo("Rc4JQWowG5I")

# ## Code Cells
#
# ### Command history
#
# In IPython, all your inputs and outputs are saved. There are two variables named `In` and `Out` which are filled in as you work. All outputs are saved automatically to variables of the form `_N`, where `N` is the prompt number, and inputs to `_iN`. This allows you to quickly recover the result of a prior computation by referring to its number, even if you forgot to store it in a variable.

# In[4]:

np.sin(4)**2

# In[5]:

_4

# In[6]:

exec(_i4)

# In[7]:

_4 / 4.

# ### Introspection
#
# If you want details regarding the properties and functionality of any Python object currently loaded into IPython, you can append `?` to its name to reveal any details that are available:

# In[8]:

some_dict = {}
get_ipython().run_line_magic('pinfo', 'some_dict')

# If available, additional detail is provided with two question marks, including the source code of the object itself.

# In[9]:

from numpy.linalg import cholesky
cholesky

# This syntax can also be used to search namespaces with wildcards (`*`).

# In[10]:

get_ipython().run_line_magic('psearch', 'np.random.rand*')

# ### Tab completion
#
# Because IPython allows for introspection, it can tab-complete commands that have been partially typed. This is done by pressing the `Tab` key at any point while typing a command:

# In[11]:

np.arccos

# ### System commands
#
# In IPython, you can type `ls` to see your files or `cd` to change directories, just like you would at a regular system prompt:

# In[12]:

ls /Users/fonnescj/Dropbox/Notes/

# Virtually any system command can be accessed by prepending `!`, which passes any subsequent command directly to the OS.
# In[13]:

get_ipython().system('locate python | grep pdf')

# You can even use Python variables in commands sent to the OS:

# In[14]:

file_type = 'png'
get_ipython().system('ls images/*$file_type')

# The output of a system command using the exclamation point syntax can be assigned to a Python variable.

# In[15]:

data_files = get_ipython().getoutput('ls images/')

# In[16]:

data_files

# In[19]:

get_ipython().run_line_magic('qtconsole', '')

# ## Qt Console
#
# If you type at the system prompt:
#
#     $ jupyter qtconsole
#
# instead of opening in a terminal, IPython will start a graphical console that at first sight appears just like a terminal, but which is in fact much more capable than a text-only terminal. This is a specialized terminal designed for interactive scientific work: it supports full multi-line editing with color highlighting and graphical calltips for functions, it can keep multiple IPython sessions open simultaneously in tabs, and when scripts run it can display the figures inline directly in the work area.
#
# ![qtconsole](https://d.pr/i/ypFkU2+)

# ## Interactive Plotting
#
# Any plots generated by the Python kernel can be redirected to appear **inline**, in an output cell. Here is an example using Matplotlib:

# In[20]:

get_ipython().run_line_magic('matplotlib', 'inline')
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

def f(x):
    return (x-3)*(x-5)*(x-7)+85

import numpy as np
x = np.linspace(0, 10, 200)
y = f(x)
plt.plot(x,y)

# ### Bokeh
#
# Matplotlib will be your workhorse for creating plots in notebooks. But it's not the only game in town! A more recent player is [**Bokeh**](http://nbviewer.jupyter.org/github/bokeh/bokeh-notebooks/blob/master/index.ipynb), a visualization library to make amazing interactive plots and share them online. It can also handle very large data sets with excellent performance.
#
# If you installed **Anaconda** in your system, you will probably already have **Bokeh**.
# You can check if it's there by running the `conda list` command. If you installed **Miniconda**, you will need to install it with `conda install bokeh`.
#
# After installing **Bokeh**, we have many modules available: [`bokeh.plotting`](http://bokeh.pydata.org/en/latest/docs/reference/plotting.html#bokeh-plotting) gives you the ability to create interactive figures with zoom, pan, resize, save, and other tools.

# In[21]:

from bokeh import plotting as bplotting

# **Bokeh** integrates with Jupyter notebooks by calling the output function, as follows:

# In[22]:

bplotting.output_notebook()

# In[23]:

import numpy as np
# Get an array of 100 evenly spaced points from 0 to 2*pi
x = np.linspace(0.0, 2.0 * np.pi, 100)
# Make a pointwise function of x with exp(sin(x))
y = np.exp(np.sin(x))
deriv_exact = y * np.cos(x)  # analytical derivative

# In[24]:

def forward_diff(y, x):
    """Compute derivative by forward differencing."""
    # Use np.empty to make an empty array to put our derivatives in
    deriv = np.empty(y.size - 1)
    # Use a for-loop to go through each point and compute the derivative.
    for i in range(deriv.size):
        deriv[i] = (y[i+1] - y[i]) / (x[i+1] - x[i])
    # Return the derivative (a NumPy array)
    return deriv

# Call the function to perform finite differencing
deriv = forward_diff(y, x)

# In[25]:

# create a new Bokeh plot with axis labels, name it "bop"
bop = bplotting.figure(x_axis_label='x', y_axis_label='dy/dx')
# add a title, change the font
bop.title.text = "Derivative of exp(sin(x))"
bop.title.text_font = "palatino"
# add a line with legend and line thickness to "bop"
bop.line(x, deriv_exact, legend_label="analytical", line_width=2)
# add circle markers at the interval midpoints with legend, specify color
bop.circle((x[1:] + x[:-1]) / 2.0, deriv, legend_label="numerical", fill_color="gray", size=8, line_color=None)
bop.grid.grid_line_alpha=0.3
bplotting.show(bop);

# ## Markdown cells
#
# Markdown is a simple *markup* language that allows plain text to be converted into HTML.
#
# The advantages of using Markdown over HTML (and LaTeX):
#
# - it's a **human-readable** format
# - allows writers to focus on content rather than formatting and layout
# - easier to learn and use
#
# For example, instead of writing:
#
# ```html
#

# <p>In order to create valid
# <a href="...">HTML</a>, you
# need properly coded syntax that can be cumbersome for
# “non-programmers” to write. Sometimes, you
# just want to easily make certain words <strong>bold
# </strong>, and certain words <em>italicized</em> without
# having to remember the syntax. Additionally, for example,
# creating lists:</p>
# <ul>
# <li>should be easy</li>
# <li>should not involve programming</li>
# </ul>

#
# ```
#
# we can write the following in Markdown:
#
# ```markdown
# In order to create valid [HTML], you need properly
# coded syntax that can be cumbersome for
# "non-programmers" to write. Sometimes, you just want
# to easily make certain words **bold**, and certain
# words *italicized* without having to remember the
# syntax. Additionally, for example, creating lists:
#
# * should be easy
# * should not involve programming
# ```
#
# ### Emphasis
#
# Markdown uses `*` (asterisk) and `_` (underscore) characters as
# indicators of emphasis.
#
#     *italic*, _italic_
#     **bold**, __bold__
#     ***bold-italic***, ___bold-italic___
#
# *italic*, _italic_
# **bold**, __bold__
# ***bold-italic***, ___bold-italic___
#
# ### Lists
#
# Markdown supports both unordered and ordered lists. Unordered lists can use `*`, `-`, or
# `+` to define a list. This is an unordered list:
#
#     * Apples
#     * Bananas
#     * Oranges
#
# * Apples
# * Bananas
# * Oranges
#
# Ordered lists are numbered lists in plain text:
#
#     1. Bryan Ferry
#     2. Brian Eno
#     3. Andy Mackay
#     4. Paul Thompson
#     5. Phil Manzanera
#
# 1. Bryan Ferry
# 2. Brian Eno
# 3. Andy Mackay
# 4. Paul Thompson
# 5. Phil Manzanera
#
# ### Links
#
# Markdown inline links are equivalent to HTML `<a>`
# links; they just have a different syntax.
#
#     [Biostatistics home page](http://biostat.mc.vanderbilt.edu "Visit Biostat!")
#
# [Biostatistics home page](http://biostat.mc.vanderbilt.edu "Visit Biostat!")
#
# ### Block quotes
#
# Block quotes are denoted by a `>` (greater than) character
# before each line of the block quote.
#
#     > Sometimes a simple model will outperform a more complex model . . .
#     > Nevertheless, I believe that deliberately limiting the complexity
#     > of the model is not fruitful when the problem is evidently complex.
#
# > Sometimes a simple model will outperform a more complex model . . .
# > Nevertheless, I believe that deliberately limiting the complexity
# > of the model is not fruitful when the problem is evidently complex.
#
# ### Images
#
# Images look an awful lot like Markdown links; they just have an extra
# `!` (exclamation mark) in front of them.
#
#     ![Python logo](images/python-logo-master-v3-TM.png)
#
# ![Python logo](https://www.python.org/static/community_logos/python-logo.png)

# ### Remote Code
#
# Use `%load` to add remote code

# In[26]:

# %load http://matplotlib.org/mpl_examples/statistics/boxplot_demo.py
"""
=========================================
Demo of artist customization in box plots
=========================================

This example demonstrates how to use the various kwargs
to fully customize box plots. The first figure demonstrates
how to remove and add individual components (note that the
mean is the only value not shown by default). The second
figure demonstrates how the styles of the artists can
be customized. It also demonstrates how to set the limit
of the whiskers to specific percentiles (lower right axes)

A good general reference on boxplots and their history can be found
here: http://vita.had.co.nz/papers/boxplots.pdf

"""

import numpy as np
import matplotlib.pyplot as plt

# fake data
np.random.seed(937)
data = np.random.lognormal(size=(37, 4), mean=1.5, sigma=1.75)
labels = list('ABCD')
fs = 10  # fontsize

# demonstrate how to toggle the display of different elements:
fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(6, 6), sharey=True)
axes[0, 0].boxplot(data, labels=labels)
axes[0, 0].set_title('Default', fontsize=fs)

axes[0, 1].boxplot(data, labels=labels, showmeans=True)
axes[0, 1].set_title('showmeans=True', fontsize=fs)

axes[0, 2].boxplot(data, labels=labels, showmeans=True, meanline=True)
axes[0, 2].set_title('showmeans=True,\nmeanline=True', fontsize=fs)

axes[1, 0].boxplot(data, labels=labels, showbox=False, showcaps=False)
tufte_title = 'Tufte Style \n(showbox=False,\nshowcaps=False)'
axes[1, 0].set_title(tufte_title, fontsize=fs)

axes[1, 1].boxplot(data, labels=labels, notch=True, bootstrap=10000)
axes[1, 1].set_title('notch=True,\nbootstrap=10000', fontsize=fs)

axes[1, 2].boxplot(data, labels=labels, showfliers=False)
axes[1, 2].set_title('showfliers=False', fontsize=fs)

for ax in axes.flatten():
    ax.set_yscale('log')
    ax.set_yticklabels([])

fig.subplots_adjust(hspace=0.4)
plt.show()


# demonstrate how to customize the display different elements:
boxprops = dict(linestyle='--', linewidth=3, color='darkgoldenrod')
flierprops = dict(marker='o', markerfacecolor='green', markersize=12,
                  linestyle='none')
medianprops = dict(linestyle='-.', linewidth=2.5, color='firebrick')
meanpointprops = dict(marker='D', markeredgecolor='black',
                      markerfacecolor='firebrick')
meanlineprops = dict(linestyle='--', linewidth=2.5, color='purple')

fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(6, 6), sharey=True)
axes[0, 0].boxplot(data, boxprops=boxprops)
axes[0, 0].set_title('Custom boxprops', fontsize=fs)

axes[0, 1].boxplot(data, flierprops=flierprops, medianprops=medianprops)
axes[0, 1].set_title('Custom medianprops\nand flierprops', fontsize=fs)

axes[0, 2].boxplot(data, whis='range')
axes[0, 2].set_title('whis="range"', fontsize=fs)

axes[1, 0].boxplot(data, meanprops=meanpointprops, meanline=False,
                   showmeans=True)
axes[1, 0].set_title('Custom mean\nas point', fontsize=fs)

axes[1, 1].boxplot(data, meanprops=meanlineprops, meanline=True,
                   showmeans=True)
axes[1, 1].set_title('Custom mean\nas line', fontsize=fs)

axes[1, 2].boxplot(data, whis=[15, 85])
axes[1, 2].set_title('whis=[15, 85]\n#percentiles', fontsize=fs)

for ax in axes.flatten():
    ax.set_yscale('log')
    ax.set_yticklabels([])

fig.suptitle("I never said they'd be pretty")
fig.subplots_adjust(hspace=0.4)
plt.show()

# In[27]:

# %load http://matplotlib.org/mpl_examples/shapes_and_collections/scatter_demo.py
"""
Simple demo of a scatter plot.
"""
import numpy as np
import matplotlib.pyplot as plt


N = 50
x = np.random.rand(N)
y = np.random.rand(N)
colors = np.random.rand(N)
area = np.pi * (15 * np.random.rand(N))**2  # 0 to 15 point radii

plt.scatter(x, y, s=area, c=colors, alpha=0.5)
plt.show()

# ### MathJax Support
#
# MathJax is a JavaScript implementation $\alpha$ of LaTeX that allows equations to be embedded into HTML. For example, this markup:
#
#     """$$ \int_{a}^{b} f(x)\, dx \approx \frac{1}{2} \sum_{k=1}^{N} \left( x_{k} - x_{k-1} \right) \left( f(x_{k}) + f(x_{k-1}) \right). $$"""
#
# becomes this:
#
# $$
# \int_{a}^{b} f(x)\, dx \approx \frac{1}{2} \sum_{k=1}^{N} \left( x_{k} - x_{k-1} \right) \left( f(x_{k}) + f(x_{k-1}) \right).
# $$

# The output of a code cell can also be displayed as LaTeX:

# In[28]:

from IPython.display import Latex
Latex(r"""\begin{eqnarray}
\nabla \times \vec{\mathbf{B}} -\, \frac1c\, \frac{\partial\vec{\mathbf{E}}}{\partial t} & = \frac{4\pi}{c}\vec{\mathbf{j}} \\
\nabla \cdot \vec{\mathbf{E}} & = 4 \pi \rho \\
\nabla \times \vec{\mathbf{E}}\, +\, \frac1c\, \frac{\partial\vec{\mathbf{B}}}{\partial t} & = \vec{\mathbf{0}} \\
\nabla \cdot \vec{\mathbf{B}} & = 0
\end{eqnarray}""")

# ## SymPy Support
#
# SymPy is a Python library for symbolic mathematics. It supports:
#
# * polynomials
# * calculus
# * solving equations
# * discrete math
# * matrices

# In[29]:

from sympy import *
init_printing()
x, y = symbols("x y")

# In[30]:

eq = ((x+y)**2 * (x+1))
eq

# In[31]:

expand(eq)

# In[32]:

(1/cos(x)).series(x, 0, 6)

# In[33]:

limit((sin(x)-x)/x**3, x, 0)

# In[34]:

diff(cos(x**2)**2 / (1+x), x)

# ### Magic functions
#
# IPython has a set of predefined ‘magic functions’ that you can call with a command line style syntax.
# These include:
#
# * `%run`
# * `%edit`
# * `%debug`
# * `%timeit`
# * `%paste`
# * `%load_ext`
#
#

# In[35]:

get_ipython().run_line_magic('lsmagic', '')

# Timing the execution of code; the `timeit` magic exists both in line and cell form:

# In[36]:

get_ipython().run_line_magic('timeit', 'np.linalg.eigvals(np.random.rand(100,100))')

# In[37]:

get_ipython().run_cell_magic('timeit', 'a = np.random.rand(100, 100)', 'np.linalg.eigvals(a)\n')

# IPython also creates aliases for a few common interpreters, such as bash, ruby, perl, etc.
#
# These are all equivalent to `%%script <name>`.

# In[38]:

get_ipython().run_cell_magic('ruby', '', 'puts "Hello from Ruby #{RUBY_VERSION}"\n')

# In[39]:

get_ipython().run_cell_magic('bash', '', 'echo "hello from $BASH"\n')

# The `rpy2` package provides an IPython extension (formerly IPython's own `rmagic`) that contains some magic functions for working with R. This extension can be loaded using the `%load_ext` magic as follows:

# In[40]:

get_ipython().run_line_magic('load_ext', 'rpy2.ipython')

# If the above generates an error, it is likely that you do not have the `rpy2` module installed. You can install this now via:

# In[41]:

get_ipython().system('pip install rpy2')

# In[42]:

x,y = np.arange(10), np.random.normal(size=10)
get_ipython().run_line_magic('R', 'print(lm(rnorm(10)~rnorm(10)))')

# In[43]:

get_ipython().run_cell_magic('R', '-i x,y -o XYcoef', 'lm.fit <- lm(y~x)\npar(mfrow=c(2,2))\nprint(summary(lm.fit))\nplot(lm.fit)\nXYcoef <- coef(lm.fit)\n')

# In[44]:

XYcoef

# ### Custom magics
#
# As we have seen already, IPython has cell and line magics. You can define your own magics from any Python function using the `register_line_magic`, `register_cell_magic`, and `register_line_cell_magic` decorators (or the shell's `register_magic_function` method):

# In[45]:

from IPython.core.magic import (register_line_magic, register_cell_magic,
                                register_line_cell_magic)

# In[46]:

@register_line_magic
def countdown(line):
    """Print a countdown before executing cell"""
    import time
    for i in range(int(line)):
        time.sleep(1)
        print(i+1, '... ', end='')
    print('Go!')

# In[47]:

get_ipython().run_line_magic('countdown', '5')
np.random.random(5)

# ## Language kernels
#
# It's very easy to run Jupyter on top of languages other than Python, using a language kernel that is either **native** or **wrapped** in IPython. [Many kernels already exist](https://github.com/jupyter/jupyter/wiki/Jupyter-kernels).
#
# The exact procedure to install a kernel for a different language will depend on the specifics of each language.
# There is, however, a common set of steps to follow.
#
# - Install the language stack you are interested in.
# - Install the kernel for this language (often using the language's own package manager).
# - Register the kernel globally with Jupyter.
#
# While a kernel is usually thought of as a specific language, a kernel may also be:
#
# - A virtual environment (or equivalent)
# - A set of configuration/environment variables.
# - A physical location (for remote kernels)
#
# Installing multiple kernels does not automatically allow one notebook to use many languages at once, but this is also possible.

# ## Debugging
#
# The `%debug` magic can be used to trigger the IPython debugger (`ipdb`) for a cell that raises an exception. The debugger allows you to step through code line by line, inspect variables, and execute code.

# In[48]:

def div(x, y):
    return x/y

div(1,0)

# In[49]:

get_ipython().run_line_magic('debug', '')

# ## Exporting and Converting Notebooks
#
# In Jupyter, one can convert an `.ipynb` notebook document file into various static formats via the `nbconvert` tool. Currently, nbconvert is a command line tool, run as a script using Jupyter.

# In[50]:

get_ipython().system('jupyter nbconvert --to html "IPython and Jupyter.ipynb"')

# Currently, `nbconvert` supports HTML (default), LaTeX, Markdown, reStructuredText, Python and HTML5 slides for presentations. Some types can be post-processed, such as LaTeX to PDF (this requires [Pandoc](http://johnmacfarlane.net/pandoc/) to be installed, however).
# In[ ]:

get_ipython().system('jupyter nbconvert --to pdf "IPython and Jupyter.ipynb"')

# A very useful online service is the [IPython Notebook Viewer](http://nbviewer.ipython.org), which displays your notebook as a static HTML page for sharing with others:

# In[52]:

IFrame("http://nbviewer.ipython.org/2352771", width='100%', height=350)

# GitHub also supports the [rendering of Jupyter Notebooks](https://github.com/fonnesbeck/Bios8366/blob/master/notebooks/Section1_2-Programming-with-Python.ipynb) stored in its repositories.

# ## Reproducible Research
#
# > reproducing conclusions from a single experiment based on the measurements from that experiment
#
# The most basic form of reproducibility is a complete description of the data and associated analyses (including code!) so the results can be *exactly* reproduced by others.
#
# Reproducing calculations can be onerous, even with one's own work!
#
# Scientific data are becoming larger and more complex, making simple descriptions inadequate for reproducibility. As a result, most modern research is irreproducible without tremendous effort.
#
# ***Reproducible research is not yet part of the culture of science in general, or scientific computing in particular.***

# ## Scientific Computing Workflow
#
# There are a number of steps to scientific endeavors that involve computing:
#
# ![workflow](images/workflow.png)
#
#
# Many of the standard tools impose barriers between one or more of these steps. This can make it difficult to iterate on or reproduce work.
#
# The Jupyter notebook eliminates or reduces these barriers to reproducibility.

# ## Version control
#
# Since a Jupyter Notebook is just a JSON file, it can be readily added to a version control system, such as Git. However, JSON is not inherently conducive to resolving merge conflicts or other tasks that involve comparing different versions of the file.
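# One reason notebook diffs are noisy is that outputs and execution counters change on every run, even when the source does not. The sketch below illustrates this with the standard library only. It assumes the version 4 notebook layout (code cells with `outputs` and `execution_count` fields); `strip_outputs`, `make_run`, and `serialize` are hypothetical helpers for this illustration, not part of Jupyter or Git.

```python
import copy
import json


def strip_outputs(notebook):
    """Return a copy of a notebook dict with outputs and execution counts cleared."""
    cleaned = copy.deepcopy(notebook)  # leave the caller's notebook untouched
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


def make_run(output_text, count):
    """Build a one-cell notebook as it might look after a particular run."""
    return {
        "cells": [{
            "cell_type": "code",
            "execution_count": count,
            "metadata": {},
            "outputs": [{"name": "stdout", "output_type": "stream",
                         "text": [output_text]}],
            "source": ["print(random.random())"],
        }],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 2,
    }


def serialize(notebook):
    """Serialize with sorted keys so equal notebooks give identical text."""
    return json.dumps(notebook, indent=1, sort_keys=True)


# Two runs of the same source that differ only in output and prompt number.
first_run = make_run("0.4170\n", 1)
second_run = make_run("0.7203\n", 7)

same_before = serialize(first_run) == serialize(second_run)
same_after = (serialize(strip_outputs(first_run))
              == serialize(strip_outputs(second_run)))
print(same_before, same_after)  # prints: False True
```

# Stripping outputs before committing keeps diffs focused on source changes, at the cost of losing the stored results; whether that trade-off is acceptable depends on how the notebook is used.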
#
#
# ### Jupyter Notebook Diff and Merge tools
#
# `nbdime` provides [tools for diffing and merging of notebooks](https://nbdime.readthedocs.io/en/latest/).
#
# - `nbdiff` compares notebooks in a terminal-friendly way
# - `nbmerge` performs a three-way merge of notebooks with automatic conflict resolution
# - `nbdiff-web` shows you a rich rendered diff of notebooks
# - `nbmerge-web` gives you a web-based three-way merge tool for notebooks
# - `nbshow` presents a single notebook in a terminal-friendly way
#
# ![](https://github.com/jupyter/nbdime/raw/master/docs/source/images/nbmerge-web.png)

# ## Jupyter cloud services
#
# There are a growing number of services for hosting and running notebooks remotely.
#
# ### mybinder.org
#
# Have a repository full of Jupyter notebooks? With Binder, you can add a badge that opens those notebooks in an executable environment, making your code immediately reproducible by anyone, anywhere.
#
# [My PyCon 2017 tutorial in mybinder](https://beta.mybinder.org/v2/gh/fonnesbeck/intro_stat_modeling_2017/master)
#
# A related cloud service, **`tmpnb`**, launches the Jupyter Notebook App as a temporary demo. As the prefix `tmp` indicates, as soon as you close the browser tab (or after a few minutes of inactivity), the kernel dies and the content you wrote is not saved anywhere. This is a free service, sponsored by the company Rackspace.
#
# ### Azure
#
# In June 2016, Microsoft announced [notebooks hosted on Azure](https://notebooks.azure.com) cloud. You need to create a free account to be able to save your work.
#
# ### Domino Data Lab
#
# Though you can set up your own Amazon EC2 instances and run notebooks, [Domino Data Lab](https://www.dominodatalab.com/) does most of the work in setting up and provisioning Amazon services with Jupyter notebooks (and other cloud software) ready to launch. You can choose the compute size, and pay on a sliding scale.
#
# In addition to this, you can host your own temporary notebooks using [tmpnb](https://github.com/jupyter/tmpnb), which uses Docker images to run notebooks on demand and automatically shuts them down when they are idle.
#
# ---
#
# ## Links and References
#
# * [IPython Notebook Viewer](http://nbviewer.ipython.org) Displays static HTML versions of notebooks, and includes a gallery of notebook examples.
#
# * [A Reference-Free Algorithm for Computational Normalization of Shotgun Sequencing Data](http://ged.msu.edu/papers/2012-diginorm/) A landmark example of reproducible research in genomics: Git repo, IPython notebook, data and scripts.
#
# * Jacques Ravel and K Eric Wommack. 2014. [All Hail Reproducibility in Microbiome Research](http://www.microbiomejournal.com/content/pdf/2049-2618-2-8.pdf). Microbiome, 2:8.
#
# * Benjamin Ragan-Kelley et al. 2013. [Collaborative cloud-enabled tools allow rapid, reproducible biological insights](http://www.nature.com/ismej/journal/v7/n3/full/ismej2012123a.html). The ISME Journal, 7, 461–464; doi:10.1038/ismej.2012.123