Jupyter and conda for R

This notebook explains the benefits that Jupyter, IRkernel, and conda offer data scientists working in R.

Jupyter (formerly the IPython Notebook) is already widely adopted by data scientists, researchers, and analysts. Its notebook interface lets you mix executable code with narrative text, equations, interactive visualizations, and images, which makes team collaboration easier and supports reproducible research and training. Jupyter began with Python and now has kernels for 50 different languages; IRkernel is the native R kernel for Jupyter.

Data scientists, researchers, and analysts use the conda package manager to install and organize project dependencies. With conda they can easily build and share metapackages, which are downloadable bundles of packages. Conda runs on Linux, OS X, and Windows and is language-agnostic, so it can be used with any programming language and with projects that depend on several languages.

Let's use conda and Jupyter to start a data science project in R.

"R Essentials" setup

The Anaconda team has created an "R Essentials" bundle that includes IRkernel and over 80 of the most widely used R packages for data science, including dplyr, shiny, ggplot2, tidyr, caret, and nnet.

Downloading "R Essentials" requires conda. Miniconda includes conda, Python, and a few other necessary packages, while Anaconda includes all of this plus over 200 of the most popular Python packages for science, math, engineering, and data analysis. You can install all of Anaconda at once, or install Miniconda first and then use conda to add any other packages you need, including any of the packages in Anaconda.

Once you have conda, you may install "R Essentials" into the current environment:

conda install -c r r-essentials

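or create a new environment just for "R Essentials":

conda create -n my-r-env -c r r-essentials

If you create a new environment, activate it before starting Jupyter (older versions of conda use source activate my-r-env on Linux and OS X instead):

conda activate my-r-env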

Jupyter

Jupyter provides a great notebook interface to write your analysis and share it with your peers. Open a shell and run this command to start the Jupyter notebook interface in your browser:

jupyter notebook

From the "New" menu in the Jupyter dashboard, choose "R" to start a new R notebook:

[Image: create an R notebook with Jupyter]

You can immediately write and run R code in the notebook cells.
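A useful first cell checks that the notebook is running the R you installed with conda and that the packages used below are available. This is a sketch, not part of the original notebook: the list of package names in needed is an assumption chosen to match this example, and the comment about where a conda R lives describes the typical layout rather than a guarantee.

```r
# Which R is this kernel running? With a conda-installed R, R.home()
# typically points inside the conda environment (e.g. <env>/lib/R).
print(R.version.string)
print(R.home())

# Library paths R searches for installed packages
print(.libPaths())

# Check, without loading them, whether the packages used in this notebook
# are installed. The names below are an assumption for this example.
needed <- c("IRkernel", "dplyr", "ggplot2")
available <- needed %in% rownames(installed.packages())
names(available) <- needed
print(available)
```

If any entry prints FALSE, installing the corresponding conda package (for example r-dplyr) into the same environment keeps the dependency under conda's control.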

An R notebook example

Now you can:

  • load the data-wrangling package dplyr:
In [2]:
library(dplyr)
  • explore one of R's built-in datasets, such as iris:
In [3]:
iris
Out[3]:
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
1            5.1         3.5          1.4         0.2    setosa
2            4.9         3.0          1.4         0.2    setosa
3            4.7         3.2          1.3         0.2    setosa
4            4.6         3.1          1.5         0.2    setosa
5            5.0         3.6          1.4         0.2    setosa
6            5.4         3.9          1.7         0.4    setosa
...
145          6.7         3.3          5.7         2.5 virginica
146          6.7         3.0          5.2         2.3 virginica
147          6.3         2.5          5.0         1.9 virginica
148          6.5         3.0          5.2         2.0 virginica
149          6.2         3.4          5.4         2.3 virginica
150          5.9         3.0          5.1         1.8 virginica
(150 rows in total: 50 each of setosa, versicolor, and virginica; rows 7 to 144 omitted here)
  • calculate the average sepal width by species, sorted in ascending order:
In [4]:
iris %>%
 group_by(Species) %>%
 summarise(Sepal.Width.Avg = mean(Sepal.Width)) %>%
 arrange(Sepal.Width.Avg)
Out[4]:
     Species Sepal.Width.Avg
1 versicolor           2.770
2  virginica           2.974
3     setosa           3.428
  • load the R visualization package ggplot2:
In [5]:
library(ggplot2)
  • plot Sepal.Width against Sepal.Length, colored by species:
In [6]:
ggplot(data=iris, aes(x=Sepal.Length, y=Sepal.Width, color=Species)) + geom_point(size=3)
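Printing iris at the R console shows all 150 rows, which is rarely what you want in a notebook. The sketch below uses only base R and the built-in iris dataset (no dplyr) to take a quicker look at the data and to cross-check the dplyr summary from In [4]. The variable name avg_width is illustrative and not part of the original notebook.

```r
# Structure: 150 observations of 5 variables (4 numeric, 1 factor)
str(iris)

# Only the first six rows instead of the whole data frame
head(iris)

# Number of rows per species
table(iris$Species)

# Base R equivalent of group_by() + summarise(): mean Sepal.Width per species
avg_width <- aggregate(Sepal.Width ~ Species, data = iris, FUN = mean)
names(avg_width)[2] <- "Sepal.Width.Avg"

# Equivalent of arrange(): sort by the average, lowest first
avg_width <- avg_width[order(avg_width$Sepal.Width.Avg), ]
print(avg_width)
```

The averages should match the dplyr output: versicolor 2.770, virginica 2.974, setosa 3.428. The dplyr pipeline reads top to bottom, while the base R version needs no extra packages.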

Creating custom R bundles

For our users' convenience, we have packaged some of the most widely used R packages for data science in "R Essentials". You can also create your own custom set of R packages to share with peers by using the conda metapackage command, which is provided by the conda-build package (install it with conda install conda-build if needed). For example, to provide a download called custom-r-bundle containing only the libraries used in the example notebook above, create the metapackage:

conda metapackage custom-r-bundle 0.1.0 --dependencies r-irkernel jupyter r-ggplot2 r-dplyr --summary "My custom R bundle"
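The dependencies in the command above follow a naming convention: the conda package for an R package is the R name lowercased with an r- prefix (IRkernel becomes r-irkernel). Assuming that convention holds for the packages you use, a small R helper can generate the metapackage command from the package names in your notebook. metapackage_cmd() is a hypothetical helper, not part of conda or any R package; it only builds the command string and does not run it.

```r
# Hypothetical helper: build a `conda metapackage` command line from R package
# names. Assumes each R package is available on conda as "r-" + lowercase name.
metapackage_cmd <- function(name, version, r_pkgs, extra = "jupyter",
                            summary = "My custom R bundle") {
  deps <- c(paste0("r-", tolower(r_pkgs)), extra)
  paste("conda metapackage", name, version,
        "--dependencies", paste(deps, collapse = " "),
        # type = "cmd" wraps the summary in double quotes
        "--summary", shQuote(summary, type = "cmd"))
}

cmd <- metapackage_cmd("custom-r-bundle", "0.1.0",
                       c("IRkernel", "ggplot2", "dplyr"))
cat(cmd, "\n")
```

The generated command lists the same dependencies as the one above, only in a different order, which does not matter to conda.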

Share it with colleagues by uploading it to Anaconda.org:

conda install anaconda-client
anaconda login
anaconda upload custom-r-bundle-0.1.0-0.tar.bz2

Anyone can now install the bundle, together with its packages and dependencies, by running the following command with "anacondauser" replaced by your Anaconda.org username:

conda install -c anacondauser custom-r-bundle

From notebook to slides

Jupyter can convert a notebook into an online slide deck for talks and tutorials.

To convert a notebook into a reveal.js presentation, open the View menu and set "Cell Toolbar" to "Slideshow":

[Image: Convert R notebooks to slides, step 1]

Then use the "Slide Type" menu that appears on each cell to organize the cells into slides and subslides:

[Image: Convert R notebooks to slides, step 2]

Then convert the notebook and serve the slides locally:

jupyter nbconvert my_r_notebook.ipynb --to slides --post serve

This opens a browser showing the slide deck:

[Image: Convert R notebooks to slides]