In [1]:
from functools import partial
from rpy2.ipython import html
# Render R data frames as HTML tables that use the "docutils" CSS class.
html.html_rdataframe = partial(html.html_rdataframe, table_class="docutils")

R and pandas data frames

R data.frame and :class:pandas.DataFrame objects share many conceptual similarities, and :mod:pandas named its DataFrame class after the R objects.

In a nutshell, both are sequences of vectors (or arrays) that all have the same length, or the same size along the first dimension (the "number of rows"). For readers coming from the database world, another way to look at them is as column-oriented data tables, or as a data table API.
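The "sequence of equal-length vectors" idea can be illustrated without either library. The following is a minimal conceptual sketch using a plain dictionary of lists; the helper name `n_rows` is hypothetical, and this is not how pandas or R store data frames internally.

```python
def n_rows(columns):
    """Return the length shared by all columns, or raise if they disagree."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError(f"columns have inconsistent lengths: {sorted(lengths)}")
    return lengths.pop() if lengths else 0


# Column-oriented layout: one named vector per column.
table = {"int_values": [1, 2, 3],
         "str_values": ["abc", "def", "ghi"]}

rows = n_rows(table)

# A row is rebuilt by reading the same position in every column.
second_row = {name: values[1] for name, values in table.items()}
```

Here `rows` is the number of rows shared by both columns, and `second_row` pairs each column name with its value at position 1.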

rpy2 provides an interface between Python and R, and a convenience conversion layer between :class:rpy2.robjects.vectors.DataFrame and :class:pandas.DataFrame objects, implemented in :mod:rpy2.robjects.pandas2ri.

In [2]:
import pandas as pd
import rpy2.robjects as ro
from rpy2.robjects.packages import importr 
from rpy2.robjects import pandas2ri

from rpy2.robjects.conversion import localconverter

From pandas to R

Pandas data frame:

In [3]:
pd_df = pd.DataFrame({'int_values': [1,2,3],
                      'str_values': ['abc', 'def', 'ghi']})

pd_df

   int_values str_values
0           1        abc
1           2        def
2           3        ghi

R data frame converted from a pandas data frame:

In [4]:
with localconverter(ro.default_converter + pandas2ri.converter):
  r_from_pd_df = ro.conversion.py2ro(pd_df)

r_from_pd_df

R/rpy2 DataFrame (3 x 2)
int_values str_values
1 'abc'
2 'def'

The conversion happens automatically when calling R functions. For example, when calling the R function base::summary:

In [5]:
base = importr('base')

with localconverter(ro.default_converter + pandas2ri.converter):
  df_summary = base.summary(pd_df)
print(df_summary)

   int_values   str_values       
 Min.   :1.0   Length:3          
 1st Qu.:1.5   Class :character  
 Median :2.0   Mode  :character  
 Mean   :2.0                     
 3rd Qu.:2.5                     
 Max.   :3.0                     

Note that a context manager is used to limit the scope of the conversion. Outside of it, rpy2 does not know how to convert a pandas data frame:

In [6]:
try:
  df_summary = base.summary(pd_df)
except NotImplementedError as nie:
  print(nie)

Conversion 'py2ri' not defined for objects of type '<class 'pandas.core.frame.DataFrame'>'

From R to pandas

Starting from an R data frame this time:

In [7]:
r_df = ro.DataFrame({'int_values': ro.IntVector([1,2,3]),
                     'str_values': ro.StrVector(['abc', 'def', 'ghi'])})

r_df

R/rpy2 DataFrame (3 x 2)
int_values str_values
1 'abc'
2 'def'

It can be converted to a pandas data frame using the same converter:

In [8]:
with localconverter(ro.default_converter + pandas2ri.converter):
  pd_from_r_df = ro.conversion.ri2py(r_df)

pd_from_r_df

   int_values str_values
0           1        abc
1           2        def
2           3        ghi