Build a Tableau TDE using the Tableau Python API and the RPY2 Interface to R

About

This notebook demonstrates a super quick way to score a dataset in R, and then bring in the saved .Rdata file into python. Then, we use pandas to store the dataframe and leverage the metadata to (somewhat) intellgently build a Tableau Data Extract.

There are bound to be bugs and this is not a comprehensive code set. Also, please reference some of the URLs below, as others have demonstrated more robust ways of using Tableau API in python.

At some point, I would like to greatly improve the code, but this is more of a proof of concept than anything else.

Github repo

In [1]:
# import the modules
# ASSUMES THAT THESE ARE ALREADY PROPERLY SETUP, which may not be easy
import dataextract as tde
import pandas as pd
import pandas.rpy.common as com
from rpy2 import robjects as r
import os
import sys
from datetime import datetime

Before using python and pandas with the Tableau Data Extract API

The code below assumes that you have already ran the R script Model-in-R.R to score a basic dataset and save out the dataframe as a .Rdata file.

Grab the R Dataset that has been scored

In [2]:
# read in the data from the R session within python
r.r("load('test-data.rdata')")
Out[2]:
<StrVector - Python:0x056BFDC8 / R:0x0811E670>
['df']
In [3]:
# check that its there
r.r('dim(df)')
Out[3]:
<IntVector - Python:0x056C4CB0 / R:0x08121160>
[     150,        6]
In [4]:
# use comm to bring in as pandas data frame
# http://pandas.pydata.org/pandas-docs/stable/r_interface.html
df = com.load_data('df')
df.head()
Out[4]:
sepal.length sepal.width petal.length petal.width species pred
1 5.1 3.5 1.4 0.2 setosa setosa
2 4.9 3.0 1.4 0.2 setosa setosa
3 4.7 3.2 1.3 0.2 setosa setosa
4 4.6 3.1 1.5 0.2 setosa setosa
5 5.0 3.6 1.4 0.2 setosa setosa

Create our Tableau Data Extract

In [5]:
# Reference the Make Order example that comes with the Tableau API
# Also help from http://ryrobes.com/python/build-tableau-data-extracts-out-of-csv-files-more-python-tde-api-madness/

# create the extract, replace if it already exists
fname = 'use-R.tde'
try:  
    tdefile = tde.Extract(fname) 
except: 
    os.system('del ' + fname)
    os.system('del DataExtract.log') 
    tdefile = tde.Extract(fname)
In [6]:
# define the table definition
tableDef = tde.TableDefinition()
In [7]:
# create a list of column names
colnames = df.columns
# create a list of column types
coltypes = df.dtypes
In [8]:
# create a dict for the field maps
# Define type maps
# Caveat: I am not including all of the possibilities here
fieldMap = { 
    'float64' :     tde.Type.DOUBLE,
    'float32' :     tde.Type.DOUBLE,
    'int64' :       tde.Type.DOUBLE,
    'int32' :       tde.Type.DOUBLE,
    'object':       tde.Type.UNICODE_STRING,
    'bool' :        tde.Type.BOOLEAN
}
In [9]:
# for each column, add the appropriate info the Table Definition
for i in range(0, len(colnames)):
    cname = colnames[i]
    ctype = fieldMap.get(str(coltypes[i]))
    tableDef.addColumn(cname, ctype)        
In [10]:
# create the extract from the Table Definition
# not sure why this is the proper syntax - Tableau API documentation seems limited
# Super Hacky, but legible 
# for each row, add the data to the table
# Again, not accounting for every type or errors
with tdefile as extract:
    table = extract.addTable("Extract", tableDef)
    for r in range(0, df.shape[0]):
        row = tde.Row(tableDef)
        for c in range(0, len(coltypes)):
            if str(coltypes[c]) == 'float64':
                row.setDouble(c, df.iloc[r,c])
            elif str(coltypes[c]) == 'float32':
                row.setDouble(c, df.iloc[r,c])
            elif str(coltypes[c]) == 'int64':
                row.setDouble(c, df.iloc[r,c])   
            elif str(coltypes[c]) == 'int32':
                row.setDouble(c, df.iloc[r,c])
            elif str(coltypes[c]) == 'object':
                row.setString(c, df.iloc[r,c]) 
            elif str(coltypes[c]) == 'bool':
                row.setBoolean(c, df.iloc[r,c])
            else:
                row.setNull(c)
        # insert the row
        table.insert(row)