This is one of the 100 recipes of the IPython Cookbook, the definitive guide to high-performance scientific computing and data science in Python.

7.8. Analyzing data with R in the IPython notebook

UPDATE (2014-09-29): in newer versions of rpy2, the IPython extension that provides the R magic is rpy2.ipython, not rmagic as stated in the book.

Using R from IPython takes three steps. First, install R. Second, install rpy2, the R-to-Python interface. You only need to do these two steps once. Third, in every IPython session where you want to use R, load the IPython R extension.

  1. Download and install R for your operating system. (http://cran.r-project.org/mirrors.html)
  2. Download and install rpy2. Windows users can try an experimental installer from Christoph Gohlke's webpage. (http://www.lfd.uci.edu/~gohlke/pythonlibs/#rpy2)
  3. To execute R code in an IPython notebook, first run %load_ext rpy2.ipython.

rpy2 does not appear to work well on Windows. We recommend using Linux or Mac OS X.

To install R and rpy2 on Ubuntu, run the following commands:

sudo apt-get install r-base-dev
sudo apt-get install python-rpy2
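Once R and rpy2 are installed, a quick sanity check from Python can save some debugging inside the notebook. The following is a minimal sketch using only the standard library; the helper name check_r_setup is hypothetical, and it assumes the R executable is named R and is on the PATH (true for the Ubuntu packages above, not necessarily for Windows installers). It only locates the packages without importing them, so a problem initializing R would still only surface when you run %load_ext rpy2.ipython.

```python
import importlib.util
import shutil


def check_r_setup():
    """Report which pieces of the R-from-IPython toolchain are visible.

    Hypothetical helper, not part of rpy2. Assumes R is installed as an
    executable called "R" that is reachable through the PATH.
    """
    return {
        "R executable": shutil.which("R") is not None,
        # find_spec only locates the package; it does not import it or start R.
        "rpy2": importlib.util.find_spec("rpy2") is not None,
        # statsmodels is used below to load the longley dataset.
        "statsmodels": importlib.util.find_spec("statsmodels") is not None,
    }


if __name__ == "__main__":
    for name, found in check_r_setup().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

If any entry reports missing, revisit the corresponding installation step before loading the extension.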

Here, we will use the following workflow. First, we load data from Python. Then, we use R to design and fit a model, and to make some plots in the IPython notebook. We could also load data from R, or design and fit a statistical model with Python's statsmodels package, etc. In particular, the analysis we do here could be done entirely in Python, without resorting to the R language. This recipe just shows the basics of R and illustrates how R and Python can play together within an IPython session.

  1. Let's load the longley dataset with the statsmodels package. This dataset contains a few economic indicators in the US from 1947 to 1962. We also load the IPython R extension.
In [ ]:
import statsmodels.datasets as sd
In [ ]:
data = sd.longley.load_pandas()
In [ ]:
%load_ext rpy2.ipython
  2. We define x and y as the exogenous (independent) and endogenous (dependent) variables, respectively. The endogenous variable quantifies the total employment in the country.
In [ ]:
data.endog_name, data.exog_name
In [ ]:
y, x = data.endog, data.exog
  3. For convenience, we add the endogenous variable to the x DataFrame.
In [ ]:
x['TOTEMP'] = y
In [ ]:
x
  4. We will make a simple plot in R. First, we need to pass Python variables to R. We can use the %R -i var1,var2 magic. Then, we can call R's plot command.
In [ ]:
gnp = x['GNP']
totemp = x['TOTEMP']
In [ ]:
%R -i totemp,gnp plot(gnp, totemp)
  5. Now that the data has been passed to R, we can fit a linear model to the data. The lm function performs a linear regression. Here, we want to express totemp (total employment) as a function of the country's GNP.
In [ ]:
%%R
fit <- lm(totemp ~ gnp)  # Least-squares regression
print(fit$coefficients)  # Display the coefficients of the fit.
plot(gnp, totemp)  # Plot the data points.
abline(fit)  # And plot the linear regression.
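For a single predictor, the least-squares fit computed by lm(totemp ~ gnp) has a closed form, which makes it easy to cross-check R's output from Python. The sketch below is a plain-Python illustration; the function name simple_ols is hypothetical, and R's lm actually solves the problem through a QR decomposition rather than these explicit formulas, so the two should agree only up to floating-point rounding. It is checked here on synthetic points lying exactly on y = 2 + 3x, not on the longley data.

```python
def simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Hypothetical helper for illustration. For one predictor, the solution is
        slope     = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
        intercept = y_mean - slope * x_mean
    which is the same model that R's lm(y ~ x) fits.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Synthetic check (not the longley data): points exactly on y = 2 + 3x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 + 3.0 * v for v in xs]
print(simple_ols(xs, ys))  # (2.0, 3.0)
```

In the notebook above, simple_ols(list(gnp), list(totemp)) should return the same intercept and slope as print(fit$coefficients). In everyday Python work, statsmodels is the more idiomatic tool: with import statsmodels.api as sm, sm.OLS(totemp, sm.add_constant(gnp)).fit().params gives the same two coefficients.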

You'll find all the explanations, figures, references, and much more in the book (to be released later this summer).

IPython Cookbook, by Cyrille Rossant, Packt Publishing, 2014 (500 pages).