#!/usr/bin/env python # coding: utf-8 # # Jupyter Notebooks # ## for Collaborative and Reproducible Research # ## Reproducible Research # # > reproducing conclusions from a single experiment based on the measurements from that experiment # # The most basic form of reproducibility is a complete description of the data and associated analyses (including code!) so the results can be *exactly* reproduced by others. # # Reproducing calculations can be onerous, even with one's own work! # # Scientific data are becoming larger and more complex, making simple descriptions inadequate for reproducibility. As a result, most modern research is irreproducible without tremendous effort. # # ***Reproducible research is not yet part of the culture of science in general, or of scientific computing in particular.*** # ## Scientific Computing Workflow # # There are a number of steps to scientific endeavors that involve computing: # # ![workflow](images/workflow.png) # # # Many of the standard tools impose barriers between one or more of these steps. This can make it difficult to iterate on and reproduce work. # # The Jupyter notebook [eliminates or reduces these barriers to reproducibility](http://www.nature.com/news/interactive-notebooks-sharing-the-code-1.16261). # # Jupyter/IPython notebooks have already motivated the generation of [reproducible publications](https://github.com/ipython/ipython/wiki/A-gallery-of-interesting-IPython-Notebooks#reproducible-academic-publications) and an [open source statistics textbook](http://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/). # ## Jupyter Notebook # # The Jupyter Notebook is an **interactive computing environment** that enables users to author notebook documents that include: # - Live code # - Interactive widgets # - Plots # - Narrative text # - Equations # - Images # - Video # # These documents provide a **complete and self-contained record of a computation** that can be converted to various formats and shared with others using email, [Dropbox](http://dropbox.com), version control systems (like git/[GitHub](http://github.com)) or [nbviewer.ipython.org](http://nbviewer.ipython.org). # ### Components # # The Jupyter Notebook combines three components: # # * **The notebook web application**: An interactive web application for writing and running code and for authoring notebook documents. # * **Kernels**: Separate processes started by the notebook web application that run users' code in a given language and return output back to the notebook web application. The kernel also handles things like computations for interactive widgets, tab completion, and introspection. # * **Notebook documents**: Self-contained documents that contain a representation of all content visible in the notebook web application, including inputs and outputs of the computations, narrative # text, equations, images, and rich media representations of objects. Each notebook document has its own kernel. # ## Kernels # # Through IPython's kernel and messaging architecture, the Notebook allows code to be run in a range of different programming languages. For each notebook document that a user opens, the web application starts a kernel that runs the code for that notebook.
Each kernel is capable of running code in a single programming language, and there are kernels available in the following languages: # # * [Python](https://github.com/ipython/ipython) # * [Julia](https://github.com/JuliaLang/IJulia.jl) # * [R](https://github.com/takluyver/IRkernel) # * [Ruby](https://github.com/minrk/iruby) # * [Haskell](https://github.com/gibiansky/IHaskell) # * [Scala](https://github.com/Bridgewater/scala-notebook) # * [node.js](https://github.com/n-riesco/ijavascript) # * [Go](https://github.com/takluyver/igo) # # The default kernel runs Python code. IPython 3.0 provides a simple way for users to pick which of these kernels is used for a given notebook. # # Each of these kernels communicates with the notebook web application and web browser using a JSON over ZeroMQ/WebSockets message protocol that is described [here](http://ipython.org/ipython-doc/dev/development/messaging.html). Most users don't need to know about these details, but it helps to understand that "kernels run code." # ## Notebook Documents # # Notebook documents contain the **inputs and outputs** of an interactive session as well as **narrative text** that accompanies the code but is not meant for execution. **Rich output** generated by running code, including HTML, images, video, and plots, is embedded in the notebook, which makes it a complete and self-contained record of a computation. # # When you run the notebook web application on your computer, notebook documents are just **files on your local filesystem with a `.ipynb` extension**. This allows you to use familiar workflows for organizing your notebooks into folders and sharing them with others. # # Notebooks consist of a **linear sequence of cells**. There are three basic cell types: # # * **Code cells:** Input and output of live code that is run in the kernel # * **Markdown cells:** Narrative text with embedded LaTeX equations # * **Raw cells:** Unformatted text that is included, without modification, when notebooks are converted to different formats using nbconvert # # Internally, notebook documents are **[JSON](http://en.wikipedia.org/wiki/JSON) data** with **binary values [base64](http://en.wikipedia.org/wiki/Base64)** encoded. This allows them to be **read and manipulated programmatically** by any programming language. Because JSON is a text format, notebook documents are version-control friendly. # # **Notebooks can be exported** to different static formats including HTML, reStructuredText, LaTeX, PDF, and slide shows ([reveal.js](http://lab.hakim.se/reveal-js/#/)) using IPython's `nbconvert` utility. # # Furthermore, any notebook document available from a **public URL** can be shared via [nbviewer](http://nbviewer.ipython.org). This service loads the notebook document from the URL and renders it as a static web page. The resulting web page may thus be shared with others **without their needing to install IPython**. # ## Installation and Configuration # # # While Jupyter runs code in many different programming languages, Python is a prerequisite for installing the Jupyter Notebook. # # Perhaps the easiest way to get a feature-complete version of Python on your system is to install the [Anaconda](http://continuum.io/downloads.html) distribution by Continuum Analytics. Anaconda is a completely free Python environment that includes almost 200 of the best Python packages for science and data analysis. It's simply a matter of downloading the installer (either graphical or command line) and running it on your system.
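# If you already have a Python installation and are not sure which version it is, a quick check from within Python itself is shown below (a minimal sketch; the exact version string and interpreter path will of course differ from system to system):
#
# ```python
# import sys
#
# print(sys.version)     # full version string of the running interpreter
# print(sys.executable)  # path to the interpreter, useful for checking which install is active
# ```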
# # Be sure to download the Python 3.5 installer by following the **Python 3.5 link** for your computing platform (Mac OS X example shown below). # # ![get Python 3](http://fonnesbeck-dropshare.s3.amazonaws.com/687474703a2f2f666f6e6e65736265636b2d64726f7073686172652e73332e616d617a6f6e6177732e636f6d2f53637265656e2d53686f742d323031362d30332d31382d61742d332e32342e32362d504d2e706e67.png) # # Once Python is installed, installing Jupyter is a matter of running a single command: # # conda install jupyter # # If you prefer to install Jupyter from source, or you did not use Anaconda to install Python, you can also use `pip`: # # pip install jupyter # ## Installing Kernels # # Individual language kernels must be installed from within each respective language. We will show the R kernel installation as an example. # # Setting up the R kernel involves two commands from within the R shell. The first installs the packages: # # ```r # install.packages(c('repr', 'IRkernel', 'IRdisplay'), # repos = c('http://irkernel.github.io/', getOption('repos'))) # ``` # # and the second links the kernel to Jupyter: # # ```r # IRkernel::installspec() # ``` # ## Running Jupyter Notebooks # # Once installed, a notebook session can be initiated from the command line via: # # jupyter notebook # # If you installed Jupyter via Anaconda, you will also have a graphical launcher available. # ## IPython # # **IPython** (Interactive Python) is an enhanced Python shell that provides a more robust and productive development environment for users. There are several key features that set it apart from the standard Python shell. # # * Interactive data analysis and visualization # * Python kernel for Jupyter notebooks # * Easy parallel computation # # Over time, the IPython project grew to include several components: # # * an interactive shell # * a REPL protocol # * a notebook document format # * a notebook document conversion tool # * a web-based notebook authoring tool # * tools for building interactive UI (widgets) # * interactive parallel Python # # As each component has evolved, several have grown to the point that they warrant projects of their own. For example, pieces like the notebook and protocol are not even specific to Python. As a result, the IPython team created Project Jupyter, which is the new home of language-agnostic projects that began as part of IPython, such as the notebook in which you are reading this text. # # The HTML notebook that is part of the Jupyter project supports **interactive data visualization** and easy high-performance **parallel computing**. # # In[1]: get_ipython().run_line_magic('matplotlib', 'inline') import matplotlib.pyplot as plt plt.style.use('fivethirtyeight') def f(x): return (x-3)*(x-5)*(x-7)+85 import numpy as np x = np.linspace(0, 10, 200) y = f(x) plt.plot(x,y) # The Notebook gives you everything that a browser gives you. For example, you can embed images, videos, or entire websites. # In[2]: from IPython.display import IFrame IFrame('http://biostat.mc.vanderbilt.edu/wiki', width='100%', height=350) # In[3]: from IPython.display import YouTubeVideo YouTubeVideo("rl5DaFbLc60") # # Running Code # First and foremost, the IPython Notebook is an interactive environment for writing and running code. IPython is capable of running code in a wide range of languages. However, this notebook, and the default kernel in IPython 3, runs Python code.
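# If you are curious which kernels are registered on your own machine, you can query the kernel spec manager from Python (a minimal sketch; the output depends entirely on what you have installed, and `jupyter kernelspec list` from the command line gives the same information):
#
# ```python
# from jupyter_client.kernelspec import KernelSpecManager
#
# # Map of kernel names to the directories holding their kernel.json definitions
# specs = KernelSpecManager().find_kernel_specs()
# for name, path in specs.items():
#     print(name, '->', path)
# ```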
# ## Code cells allow you to enter and run Python code # Run a code cell using `Shift-Enter` or by pressing the Run button in the toolbar above: # In[4]: a = 10 # In[5]: print(a) # There are three keyboard shortcuts for running code: # # * `Shift-Enter` runs the current cell, enters command mode, and selects the next cell. # * `Ctrl-Enter` runs the current cell and enters command mode. # * `Alt-Enter` runs the current cell, inserts a new one below, and enters edit mode. # # These keyboard shortcuts work in both command and edit mode. # ## Managing the IPython Kernel # Code is run in a separate process called the IPython Kernel. The Kernel can be interrupted or restarted. Try running the following cell and then hit the interrupt (stop) button in the toolbar above. # In[6]: import time time.sleep(10) # If the Kernel dies, it will be restarted automatically up to 3 times. # If it cannot be restarted automatically, you will be prompted to try again or abort. # Here we call the low-level system libc.time routine with the wrong argument via # ctypes to segfault the Python interpreter: # In[ ]: import sys from ctypes import CDLL # This will crash a Linux or Mac system # equivalent calls can be made on Windows dll = 'dylib' if sys.platform == 'darwin' else 'so.6' libc = CDLL("libc.%s" % dll) libc.time(-1) # BOOM!! # ## Cell menu # The "Cell" menu has a number of menu items for running code in different ways. These include: # # * Run # * Run and Select Below # * Run and Insert Below # * Run All # * Run All Above # * Run All Below # ## Restarting the kernel # The kernel maintains the state of a notebook's computations. You can reset this state by restarting the kernel. This is done by clicking on the restart button in the toolbar above, or by using the `00` (press 0 twice) shortcut in command mode. # ## Output is asynchronous # All output is displayed asynchronously as it is generated in the Kernel. If you execute the next cell, you will see the output one piece at a time, not all at the end. # In[7]: import time, sys for i in range(8): print(i) time.sleep(0.5) # ## Large outputs # To better handle large outputs, the output area can be collapsed. Run the following cell and then single- or double-click on the active area to the left of the output: # In[8]: for i in range(50): print(i) # Beyond a certain point, output will scroll automatically: # In[9]: for i in range(500): print(2**i - 1) # ## Markdown cells # # Markdown is a simple *markup* language that allows plain text to be converted into HTML. # # The advantages of using Markdown over HTML (and LaTeX): # # - it is a **human-readable** format # - it allows writers to focus on content rather than formatting and layout # - it is easier to learn and use # # For example, instead of writing: # # ```html #

<p>In order to create valid # <a href="http://en.wikipedia.org/wiki/HTML">HTML</a>, you # need properly coded syntax that can be cumbersome for # “non-programmers” to write. Sometimes, you # just want to easily make certain words <strong>bold</strong>, # and certain words <em>italicized</em> without # having to remember the syntax. Additionally, for example, # creating lists:</p> # <ul> # <li>should be easy</li> # <li>should not involve programming</li> # </ul>

# # ``` # # we can write the following in Markdown: # # ```markdown # In order to create valid [HTML], you need properly # coded syntax that can be cumbersome for # "non-programmers" to write. Sometimes, you just want # to easily make certain words **bold**, and certain # words *italicized* without having to remember the # syntax. Additionally, for example, creating lists: # # * should be easy # * should not involve programming # ``` # # ### Emphasis # # Markdown uses `*` (asterisk) and `_` (underscore) characters as # indicators of emphasis. # # *italic*, _italic_ # **bold**, __bold__ # ***bold-italic***, ___bold-italic___ # # *italic*, _italic_ # **bold**, __bold__ # ***bold-italic***, ___bold-italic___ # # ### Lists # # Markdown supports both unordered and ordered lists. Unordered lists can use `*`, `-`, or # `+` to define a list. This is an unordered list: # # * Apples # * Bananas # * Oranges # # * Apples # * Bananas # * Oranges # # Ordered lists are numbered lists in plain text: # # 1. Bryan Ferry # 2. Brian Eno # 3. Andy Mackay # 4. Paul Thompson # 5. Phil Manzanera # # 1. Bryan Ferry # 2. Brian Eno # 3. Andy Mackay # 4. Paul Thompson # 5. Phil Manzanera # # ### Links # # Markdown inline links are equivalent to HTML `<a href="...">` # links; they just have a different syntax. # # [Biostatistics home page](http://biostat.mc.vanderbilt.edu "Visit Biostat!") # # [Biostatistics home page](http://biostat.mc.vanderbilt.edu "Visit Biostat!") # # ### Block quotes # # Block quotes are denoted by a `>` (greater than) character # before each line of the block quote. # # > Sometimes a simple model will outperform a more complex model . . . # > Nevertheless, I believe that deliberately limiting the complexity # > of the model is not fruitful when the problem is evidently complex. # # > Sometimes a simple model will outperform a more complex model . . . # > Nevertheless, I believe that deliberately limiting the complexity # > of the model is not fruitful when the problem is evidently complex. # # ### Images # # Images look an awful lot like Markdown links; they just have an extra # `!` (exclamation mark) in front of them. # # ![Python logo](images/python-logo-master-v3-TM.png) # # ![Python logo](images/python-logo-master-v3-TM.png) # ### MathJax Support # # MathJax is a JavaScript implementation of LaTeX that allows equations to be embedded into HTML. For example, this markup: # # """$$ \int_{a}^{b} f(x)\, dx \approx \frac{1}{2} \sum_{k=1}^{N} \left( x_{k} - x_{k-1} \right) \left( f(x_{k}) + f(x_{k-1}) \right). $$""" # # becomes this: # # $$ # \int_{a}^{b} f(x)\, dx \approx \frac{1}{2} \sum_{k=1}^{N} \left( x_{k} - x_{k-1} \right) \left( f(x_{k}) + f(x_{k-1}) \right). # $$ # ## Running other Kernels # # The kernel of a Jupyter session can be switched from the menu. [Here is an example of a notebook running R code](rtutorial.ipynb). # ## IPython in Jupyter Notebooks # # Running IPython within a Jupyter Notebook provides an enhanced interactive scientific computing environment. # # ### SymPy # # SymPy is a Python library for symbolic mathematics. It supports: # # * polynomials # * calculus # * solving equations # * discrete math # * matrices # In[10]: from sympy import * init_printing() x, y = symbols("x y") # In[11]: eq = ((x+y)**2 * (x+1)) eq # In[12]: expand(eq) # In[13]: (1/cos(x)).series(x, 0, 6) # In[14]: limit((sin(x)-x)/x**3, x, 0) # In[15]: diff(cos(x**2)**2 / (1+x), x) # ### Magic functions # # IPython has a set of predefined ‘magic functions’ that you can call with a command-line-style syntax.
These include: # # * `%run` # * `%edit` # * `%debug` # * `%timeit` # * `%paste` # * `%load_ext` # # # In[16]: get_ipython().run_line_magic('lsmagic', '') # IPython also creates aliases for a few common interpreters, such as bash, ruby, perl, etc. # # These are all equivalent to `%%script <name>` # In[17]: get_ipython().run_cell_magic('ruby', '', 'puts "Hello from Ruby #{RUBY_VERSION}"\n') # In[18]: get_ipython().run_cell_magic('bash', '', 'echo "hello from $BASH"\n') # IPython has an `rmagic` extension that contains some magic functions for working with R via rpy2. This extension can be loaded using the `%load_ext` magic as follows: # In[19]: get_ipython().run_line_magic('load_ext', 'rpy2.ipython') # If the above generates an error, it is likely that you do not have the `rpy2` module installed. You can install this now via: # In[20]: get_ipython().system('pip install rpy2') # or, if you are running Anaconda, via `conda`: # In[21]: get_ipython().system('conda install rpy2') # In[22]: get_ipython().run_line_magic('R', 'print(lm(rnorm(10)~rnorm(10)))') print('i am python') # In[23]: import numpy as np x,y = np.arange(10), np.random.normal(size=10) # In[24]: get_ipython().run_cell_magic('R', '-i x,y -o XYcoef', 'lm.fit <- lm(y~x)\npar(mfrow=c(2,2))\nprint(summary(lm.fit))\nplot(lm.fit)\nXYcoef <- coef(lm.fit)\n') # In[25]: XYcoef # ### Remote Code # # Use `%load` to add remote code: # In[26]: # %load http://matplotlib.org/mpl_examples/shapes_and_collections/scatter_demo.py """ Simple demo of a scatter plot. """ import numpy as np import matplotlib.pyplot as plt N = 50 x = np.random.rand(N) y = np.random.rand(N) colors = np.random.rand(N) area = np.pi * (15 * np.random.rand(N))**2 # 0 to 15 point radiuses plt.scatter(x, y, s=area, c=colors, alpha=0.5) plt.show() # ### Debugging and Profiling # # The `%debug` magic can be used to trigger the IPython debugger (`ipdb`) for a cell that raises an exception. The debugger allows you to step through code line by line, inspect variables, and execute code. The `abc` function below contains a bug (the first comparison is made against the list `epsilon` rather than against `epsilon[0]`), so calling it raises an exception that we can then inspect with `%debug`. # In[27]: import numpy def abc(y, N, epsilon=[0.2, 0.8]): trace = [] while len(trace) < N: # Simulate from priors mu = numpy.random.normal(0, 10) sigma = numpy.random.uniform(0, 20) x = numpy.random.normal(mu, sigma, 50) #if (np.linalg.norm(y - x) < epsilon): if ((abs(x.mean() - y.mean()) < epsilon) & (abs(x.std() - y.std()) < epsilon[1])): trace.append([mu, sigma]) return trace # In[28]: y = numpy.random.normal(4, 2, 50) abc(y, 10) # In[29]: get_ipython().run_line_magic('debug', '') # Timing the execution of code is easy with the `timeit` magic: # In[30]: get_ipython().run_line_magic('timeit', '[i**2 for i in range(1000)]') # In[31]: get_ipython().run_line_magic('timeit', 'numpy.arange(1000)**2') # ## Exporting and Converting Notebooks # # In Jupyter, one can convert an `.ipynb` notebook document file into various static formats via the `nbconvert` tool. Currently, nbconvert is a command-line tool, run as a script using Jupyter. # In[33]: get_ipython().system('jupyter nbconvert --to html "Introduction to Jupyter Notebooks.ipynb"') # Currently, `nbconvert` supports HTML (default), LaTeX, Markdown, reStructuredText, Python, and HTML5 slides for presentations. Some types can be post-processed, such as LaTeX to PDF (this requires [Pandoc](http://johnmacfarlane.net/pandoc/) to be installed, however).
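# `nbconvert` can also be driven from Python rather than the command line, which is handy when conversion is part of a larger script. Here is a minimal sketch using the `HTMLExporter` class (the filename is simply this notebook's; adjust as needed):
#
# ```python
# from nbconvert import HTMLExporter
#
# exporter = HTMLExporter()
# # from_filename returns the converted document plus a dict of extracted resources
# body, resources = exporter.from_filename("Introduction to Jupyter Notebooks.ipynb")
#
# with open("Introduction to Jupyter Notebooks.html", "w") as f:
#     f.write(body)
# ```
#
# For one-off conversions, though, the command-line interface used below is usually simpler.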
# In[35]: get_ipython().system('jupyter nbconvert --to pdf "Introduction to Jupyter Notebooks.ipynb"') # A very useful online service is the [IPython Notebook Viewer](http://nbviewer.ipython.org), which displays your notebook as a static HTML page that can easily be shared with others: # In[36]: IFrame("http://nbviewer.ipython.org/2352771", width='100%', height=350) # GitHub supports the [rendering of Jupyter Notebooks](https://gist.github.com/fonnesbeck/670e777406a2f2bfb67e) stored in its repositories. # ## Parallel IPython # # The IPython architecture consists of four components, which reside in the `ipyparallel` package: # # 1. **Engine** The IPython engine is a Python instance that accepts Python commands over a network connection. When multiple engines are started, parallel and distributed computing becomes possible. An important property of an IPython engine is that it blocks while user code is being executed. # # 2. **Hub** The hub keeps track of engine connections, schedulers, and clients, and persists all task requests and results in a database for later use. # # 3. **Schedulers** All actions that can be performed on the engine go through a Scheduler. While the engines themselves block when user code is run, the schedulers hide that from the user to provide a fully asynchronous interface to a set of engines. # # 4. **Client** The primary object for connecting to a cluster. # # ![IPython architecture](images/ipython_architecture.png) # (courtesy Min Ragan-Kelley) # # This architecture is implemented using the ØMQ messaging library and the associated Python bindings in `pyzmq`. # # ### Running parallel IPython # # To enable the IPython Clusters tab in Jupyter Notebook: # # ipcluster nbextension enable # # When you then start a Jupyter session, you should see the following in your **IPython Clusters** tab: # # ![parallel tab](images/parallel_tab.png) # Before running the next cell, make sure you have first started your cluster; you can use the [clusters tab in the dashboard](/#tab2) to do so. # # Select the number of IPython engines (nodes) that you want to use, then click **Start**. # In[37]: from ipyparallel import Client client = Client() dv = client.direct_view() # In[38]: len(dv) # In[39]: def where_am_i(): import os import socket return "In process with pid {0} on host: '{1}'".format( os.getpid(), socket.gethostname()) # In[40]: where_am_i_direct_results = dv.apply(where_am_i) where_am_i_direct_results.get() # Let's now consider a useful function that we might want to run in parallel. Here is a version of the approximate Bayesian computation (ABC) algorithm. # In[41]: import numpy def abc(y, N, epsilon=[0.2, 0.8]): trace = [] while len(trace) < N: # Simulate from priors mu = numpy.random.normal(0, 10) sigma = numpy.random.uniform(0, 20) x = numpy.random.normal(mu, sigma, 50) #if (np.linalg.norm(y - x) < epsilon): if ((abs(x.mean() - y.mean()) < epsilon[0]) & (abs(x.std() - y.std()) < epsilon[1])): trace.append([mu, sigma]) return trace # In[42]: y = numpy.random.normal(4, 2, 50) # Let's try running this on one of the cluster engines: # In[43]: dv0 = client[0] dv0.block = True dv0.apply(abc, y, 10) # This fails with a NameError because NumPy has not been imported on the engine to which we sent the task.
Each engine has its own namespace, so we need to import whatever modules we will need prior to running our code: # In[44]: dv0.execute("import numpy") # In[45]: dv0.apply(abc, y, 10) # An easier approach is to use the `%%px` parallel cell magic to run the import on every engine at once: # In[46]: get_ipython().run_cell_magic('px', '', 'import numpy\n') # This magic can be used to execute the same code on all nodes. # In[47]: get_ipython().run_cell_magic('px', '', 'import os\nprint(os.getpid())\n') # In[48]: get_ipython().run_cell_magic('px', '', "%matplotlib inline\nimport matplotlib.pyplot as plt\nimport os\ntsamples = numpy.random.randn(100)\nplt.hist(tsamples)\n_ = plt.title('PID %i' % os.getpid())\n") # ## JupyterHub # # [JupyterHub](https://github.com/jupyterhub/jupyterhub) is a server that gives multiple users access to Jupyter notebooks, running an independent Jupyter notebook server for each user. # # To use JupyterHub, you need a Unix server (typically Linux) running somewhere that is accessible to your team on the network. The JupyterHub server can be on an internal network at your organisation, or it can run on the public internet (in which case, take care with security). Users access JupyterHub in a web browser by going to the IP address or domain name of the server. # # Three actors: # # - multi-user Hub (tornado process) # - configurable HTTP proxy (node-http-proxy) # - multiple single-user IPython notebook servers (Python/IPython/tornado) # # Basic principles: # # - Hub spawns proxy # - Proxy forwards requests to Hub by default # - Hub handles login, and spawns single-user servers on demand # - Hub configures proxy to forward URL prefixes to single-user servers # # To start the server, run the command: # # jupyterhub # # and then visit http://localhost:8000 and sign in with your Unix credentials. # # To allow multiple users to sign into the server, you will need to run the jupyterhub command as a privileged user, such as root. The wiki describes how to run the server as a less privileged user, which requires more configuration of the system. # # ![jupyterhub animation](https://657cea1304d5d92ee105-33ee89321dddef28209b83f19f06774f.ssl.cf1.rackcdn.com/jupyterhub-00f39fede1ec8780cdc3163052f632fb6980b72d43a72b3c3650257a6b9ed02d.gif) # # *(animation courtesy of Jessica Hamrick)* # ## Links and References # * [IPython Notebook Viewer](http://nbviewer.ipython.org) Displays static HTML versions of notebooks, and includes a gallery of notebook examples. # # * [A Reference-Free Algorithm for Computational Normalization of Shotgun Sequencing Data](http://ged.msu.edu/papers/2012-diginorm/) A landmark example of reproducible research in genomics: Git repo, IPython notebook, data, and scripts. # # * Jacques Ravel and K. Eric Wommack. 2014. [All Hail Reproducibility in Microbiome Research](http://www.microbiomejournal.com/content/pdf/2049-2618-2-8.pdf). Microbiome, 2:8. # # * Benjamin Ragan-Kelley et al. 2013. [Collaborative cloud-enabled tools allow rapid, reproducible biological insights](http://www.nature.com/ismej/journal/v7/n3/full/ismej2012123a.html). The ISME Journal, 7, 461–464. doi:10.1038/ismej.2012.123.