The following is an example of how to use ggplot2 inside an IPython notebook.
For the data I will use the results of some evolutionary simulations I ran. As the main point here is to demonstrate the use of R and ggplot2 in the IPython notebook, I will not explain the data.
First I need to parse the filename for the simulation parameters.
The regular expression was written using the Python regular expression testing tool.
import re

# raw strings keep the backslashes intact; the dot before the extension is escaped
filename_pattern = re.compile(
    r'^pop_(?P<pop>\d+)_G_(?P<G>\d+)'
    r'_s_(?P<s>\d\.?\d*)_H_(?P<H>\d\.?\d*)_U_(?P<U>\d\.?\d*)'
    r'_beta_(?P<beta>\d\.?\d*)_pi_(?P<pi>\d\.?\d*)_tau_(?P<tau>\d\.?\d*)'
    r'_(?P<date>\d{4}-\w{3}-\d{1,2})_(?P<time>\d{2}-\d{2}-\d{2}-\d{6})'
    r'\.(?P<extension>\w+)$')

def parse_filename(fname):
    # return the simulation parameters as a dict of strings, or an empty dict
    m = filename_pattern.match(fname)
    if m:
        return m.groupdict()
    else:
        return dict()
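To show what the parser returns, here is a minimal, self-contained sketch. The file name is made up for illustration and only follows the naming scheme implied by the regular expression; the pattern is the one above, written with raw strings and an escaped dot before the extension.

```python
import re

# Same structure as the pattern above.
filename_pattern = re.compile(
    r'^pop_(?P<pop>\d+)_G_(?P<G>\d+)'
    r'_s_(?P<s>\d\.?\d*)_H_(?P<H>\d\.?\d*)_U_(?P<U>\d\.?\d*)'
    r'_beta_(?P<beta>\d\.?\d*)_pi_(?P<pi>\d\.?\d*)_tau_(?P<tau>\d\.?\d*)'
    r'_(?P<date>\d{4}-\w{3}-\d{1,2})_(?P<time>\d{2}-\d{2}-\d{2}-\d{6})'
    r'\.(?P<extension>\w+)$')

def parse_filename(fname):
    m = filename_pattern.match(fname)
    return m.groupdict() if m else {}

# Hypothetical file name, not one of the real simulation files.
example = ('pop_100000_G_10_s_0.01_H_2_U_0.003_beta_0.0002'
           '_pi_0_tau_10_2013-Jul-15_12-30-45-123456.data')
params = parse_filename(example)

print(params['pop'])        # 100000
print(params['s'])          # 0.01
print(params['extension'])  # data
print(parse_filename('notes.txt'))  # {}
```

All values come back as strings, which is why the parameters are converted to numbers later on.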
import ast
import gzip
import ujson as json

folder = 'output/fixation/'

def process_data_file(fname):
    fpath = folder + fname
    params = parse_filename(fname)
    if not params:
        print("Failed parsing file name %s" % fpath)
        return {}, [], []
    with gzip.open(fpath) as f:
        data = json.load(f, precise_float=True)
    if not data:
        print("Failed reading data %s" % fpath)
        return {}, [], []
    # parameters parsed from the file name override those stored in the file
    data.update(params)
    W = data.pop('W')
    p = data.pop('p')
    data['fname'] = fname
    # file name parameters are strings; convert them to numbers
    for k in ['tau', 'G', 'H', 'pop', 'beta', 'U', 'T', 'pop_size', 's', 'pi']:
        if isinstance(data[k], str):
            data[k] = ast.literal_eval(data[k])
    return data, W, p
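The last loop of the function turns string parameters into numbers. A minimal sketch of that conversion in isolation, using ast.literal_eval, which only accepts Python literals and so cannot run arbitrary code taken from a file name. The helper name to_number and the sample values are assumptions made for this example.

```python
import ast

def to_number(value):
    """Convert a numeric string such as '10' or '0.01' to int or float."""
    if isinstance(value, str):
        return ast.literal_eval(value)
    # values that are already numbers are returned unchanged
    return value

# Hypothetical parameters: two strings from a file name, one number from the data.
params = {'pop': '100000', 's': '0.01', 'T': 1523}
converted = {k: to_number(v) for k, v in params.items()}

print(converted['pop'])  # 100000
print(converted['s'])    # 0.01
print(converted['T'])    # 1523
```

Integer-looking strings stay integers and decimal strings become floats, so the types match what the simulation stored.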
Process all files into a list, each item in the list is a dict containing the results of a single simulation:
import glob, os, time

tic = time.time()
# names (without the folder) of all data files
file_list = [os.path.basename(path) for path in glob.glob(os.path.join(folder, '*.data'))]
all_data = [None] * len(file_list)
print("processing %d data files" % len(file_list))
for i, fname in enumerate(file_list):
    data, W, p = process_data_file(fname)
    all_data[i] = data
toc = time.time()
print("processed all files in %.2f seconds" % (toc - tic))
processing 316 data files
processed all files in 0.34 seconds
Next I create a matrix of the values I want to plot:
# skip files that failed to parse or load (empty dicts)
df = [[data['T'], data['tau'], data['s'], data['pi']] for data in all_data if data]
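A small sketch of this step with made-up values, assuming (as in process_data_file) that a file that failed to parse or load is represented by an empty dict. The numbers are placeholders, not simulation results.

```python
# Hypothetical results; the real values come from the simulation files.
all_data = [
    {'T': 1200, 'tau': 10, 's': 0.01, 'pi': 0, 'fname': 'a.data'},
    {},  # a file that failed to parse or load
    {'T': 800, 'tau': 100, 's': 0.1, 'pi': 0.5, 'fname': 'b.data'},
]

columns = ['T', 'tau', 's', 'pi']
# one row per simulation, one column per plotted variable
df = [[data[c] for c in columns] for data in all_data if data]

print(df)  # [[1200, 10, 0.01, 0], [800, 100, 0.1, 0.5]]
```

Each inner list is one row, in the same column order that is later used when naming the columns in R.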
ggplot2
I call the rmagic extension of the IPython notebook. Make sure you install rpy2, for example run: pip install rpy2.
%load_ext rmagic
The final step is to send the df to R and plot the data using ggplot2. The input to R is defined by using the -i option:
%%R -i df
df <- as.data.frame(df)
names(df) <- c("T","tau","s","pi")
library(ggplot2)
p <- ggplot(df, aes(tau, T))
p <- p +
    geom_point(alpha=I(0.3)) +
    scale_x_log10() + scale_y_log10() +
    facet_grid(facets=s~pi, labeller=function(variable,value) {paste0(variable,'=',as.character(value))}) +
    labs(y="Adaptation time", x=expression(tau))
print(p)
The code is free (CC0). The data and results are currently not available for reuse.