Nyaplot Tutorial 2: Interaction with DataFrame

I already introduced Nyaplot::DataFrame in Tutorial 1, but that introduction did not show how useful it is. This notebook walks through two use cases built on DataFrame.

In [1]:
gem 'nyaplot', '0.1.5'
require 'nyaplot'
Out[1]:
true

Case 1: Scatter plot with custom tooltips

First, prepare sample data and put it into a DataFrame. Then build a scatter plot from it.

In [2]:
samples = Array.new(10) { |i| "cat#{i}" }
x = Array.new(10) { 5 * rand }
y = Array.new(10) { 5 * rand }
df = Nyaplot::DataFrame.new({x:x,y:y,name:samples})
df
Out[2]:
x                   y                     name
0.3776941903756781  3.150631669472884     cat0
1.16523539526362    3.636084983829032     cat1
3.4024015690223557  1.5265781962652962    cat2
0.6478786205183784  4.697932060531303     cat3
1.502303692675978   0.23551205104727635   cat4
0.6186835562297704  1.5712039108775393    cat5
2.360417118852578   3.037560696785177     cat6
0.9958851328846746  3.3965329595925136    cat7
1.0207006596250006  0.020049783384994968  cat8
1.8874015058395726  2.973976083790996     cat9
In [3]:
plot = Nyaplot::Plot.new
plot.x_label("weight [kg]")
plot.y_label("height [m]")
sc = plot.add_with_df(df, :scatter, :x, :y)
Out[3]:
#<Nyaplot::Diagram:0x000000011704e8 @properties={:type=>:scatter, :options=>{:x=>:x, :y=>:y}, :data=>"4415e395-2c04-483a-a794-91e437f82ff9"}, @xrange=[0.3776941903756781, 3.4024015690223557], @yrange=[0.020049783384994968, 4.697932060531303]>
In [4]:
plot.show
Out[4]:

The plot above does not show the name column, so add it to the tooltip. Use tooltip_contents to choose which columns appear in the tooltip.

In [5]:
sc.tooltip_contents([:name])
plot.show
Out[5]:

A tooltip can show multiple lines, but the DataFrame has only three columns, which leaves little to add. Let's add a home column to it.

In [6]:
address = ['London', 'Kyoto', 'Los Angeles', 'Puretoria']
home = Array.new(10) { address.sample }
df.home = home
df
Out[6]:
x                   y                     name  home
0.3776941903756781  3.150631669472884     cat0  Kyoto
1.16523539526362    3.636084983829032     cat1  London
3.4024015690223557  1.5265781962652962    cat2  Los Angeles
0.6478786205183784  4.697932060531303     cat3  Kyoto
1.502303692675978   0.23551205104727635   cat4  Puretoria
0.6186835562297704  1.5712039108775393    cat5  Los Angeles
2.360417118852578   3.037560696785177     cat6  London
0.9958851328846746  3.3965329595925136    cat7  Puretoria
1.0207006596250006  0.020049783384994968  cat8  Kyoto
1.8874015058395726  2.973976083790996     cat9  Los Angeles
In [7]:
sc.tooltip_contents([:name, :home])
plot.show
Out[7]:

Next, fill the points of the scatter plot with different colors according to the home column. To do so, pass the column name to the fill_by method.

In [8]:
colors = Nyaplot::Colors.qual
Out[8]:
rgb(251,180,174) rgb(179,205,227) rgb(204,235,197) rgb(222,203,228) rgb(254,217,166) rgb(255,255,204) rgb(229,216,189) rgb(253,218,236) rgb(242,242,242)

In [9]:
sc.color(colors)
sc.fill_by(:home)
plot.show
Out[9]:
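
To make the idea behind fill_by concrete, here is a minimal sketch in plain Ruby of assigning one color per distinct value of a column. It is not Nyaplot's implementation: color_map is a hypothetical helper, and the assignment order (first appearance, cycling when there are more categories than colors) is an assumption made for illustration.

```ruby
# Hypothetical helper, not part of Nyaplot: pair each distinct category
# with a color, reusing colors from the start if the palette runs out.
def color_map(categories, palette)
  categories.uniq.each_with_index.map do |category, i|
    [category, palette[i % palette.size]]
  end.to_h
end

# The first three colors printed by Nyaplot::Colors.qual above.
palette = ['rgb(251,180,174)', 'rgb(179,205,227)', 'rgb(204,235,197)']
homes   = ['Kyoto', 'London', 'Los Angeles', 'Kyoto', 'Puretoria']

mapping      = color_map(homes, palette)
point_colors = homes.map { |home| mapping[home] }
```

Rows that share a home value end up with the same color, which is the visual effect the plot relies on.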

Use the shape_by method to change the marker shape according to the values in a column.

In [10]:
sc.color(colors)
sc.shape_by(:home)
plot.show
Out[10]:

Case 2: Multiple panes

DataFrame is also useful when visualizing data in multiple panes. Let's create plots from mutation data.
First, load the data from a tab-separated file. (All data used in this tutorial is included in Nyaplot's repository: /examples/notebook/data/*)

In [11]:
path = File.expand_path("../data/first.tab", __FILE__)
df = Nyaplot::DataFrame.from_csv(path, sep="\t")
Out[11]:
mutation  blood  set1                  set2                 set3                  set12                set21                set31
G>A       0.0    0.019230769230769232  0.0                  0.48214285714285715   0.0                  0.0                  0.4782608695652174
C>T       0.0    0.42592592592592593   0.0                  0.0                   0.375                0.0                  0.0
C>G       0.0    0.0                   0.0                  0.0                   0.0                  0.525                0.0
C>A       0.0    0.0                   0.1935483870967742   0.0                   0.0                  0.4666666666666667   0.0
C>A       0.0    0.0                   0.08333333333333333  0.0                   0.0                  0.5161290322580645   0.0
G>T       0.0    0.0                   0.0                  0.0                   0.4444444444444444   0.0                  0.0
C>G       0.0    0.0                   0.0                  0.0                   0.0                  0.0                  0.4
C>A       0.0    0.0                   0.03333333333333333  0.0                   0.0                  0.42857142857142855  0.0
A>C       0.0    0.6153846153846154    0.0                  0.0                   0.5925925925925926   0.0                  0.0
C>A       0.0    0.0                   0.0                  0.0                   0.0                  0.32                 0.0
C>A       0.0    0.0                   0.0                  0.0                   0.0                  0.4230769230769231   0.0
T>A       0.0    0.525                 0.0                  0.0                   0.5277777777777778   0.0                  0.0
C>T       0.0    0.42857142857142855   0.0                  0.0                   0.6666666666666666   0.0                  0.0
G>A       0.0    0.0                   0.0                  0.0                   0.0                  0.5                  0.03225806451612903
T>C       0.0    0.0                   0.0                  0.0                   0.0                  0.0                  0.3793103448275862
C>T       0.0    0.0                   0.0                  0.45714285714285713   0.0                  0.0                  0.46875
...       ...    ...                   ...                  ...                   ...                  ...                  ...
A>T       0.0    0.4716981132075472    0.0                  0.014492753623188406  0.3142857142857143   0.0                  0.0
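
For readers who want to look at a tab-separated file without Nyaplot, the following is a rough sketch using Ruby's standard csv library. It is not how from_csv works internally; the inline sample is a hypothetical cut-down copy of the first three columns and first two rows shown above, used so that the snippet runs on its own.

```ruby
require 'csv'

# Hypothetical inline sample: first three columns of the first two rows above.
tsv = [
  %w[mutation blood set1],
  %w[G>A 0.0 0.019230769230769232],
  %w[C>T 0.0 0.42592592592592593]
].map { |fields| fields.join("\t") }.join("\n")

# col_sep selects the tab delimiter, headers: true treats the first line
# as column names, and the :numeric converter turns "0.0" into a Float.
table = CSV.parse(tsv, col_sep: "\t", headers: true, converters: :numeric)

set1 = table['set1'] # all values of one column, as an Array
```

Reading a real file would use CSV.read(path, ...) with the same options.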

Now I want to plot the set1 column, but many of its cells are zero, so filter those rows out.

In [12]:
df.filter! {|row| row[:set1] != 0.0}
df
Out[12]:
mutation  blood  set1                  set2                 set3                  set12                set21                set31
G>A       0.0    0.019230769230769232  0.0                  0.48214285714285715   0.0                  0.0                  0.4782608695652174
C>T       0.0    0.42592592592592593   0.0                  0.0                   0.375                0.0                  0.0
A>C       0.0    0.6153846153846154    0.0                  0.0                   0.5925925925925926   0.0                  0.0
T>A       0.0    0.525                 0.0                  0.0                   0.5277777777777778   0.0                  0.0
C>T       0.0    0.42857142857142855   0.0                  0.0                   0.6666666666666666   0.0                  0.0
G>A       0.0    0.45652173913043476   0.0                  0.0                   0.43478260869565216  0.0                  0.0
C>T       0.0    0.09803921568627451   0.0                  0.0                   0.37142857142857144  0.0                  0.0
T>A       0.0    0.5769230769230769    0.0                  0.0                   0.4782608695652174   0.0                  0.0
A>G       0.0    0.43859649122807015   0.0                  0.0                   0.5142857142857142   0.0                  0.0
T>C       0.0    0.5806451612903226    0.0                  0.0                   0.6341463414634146   0.0                  0.0
G>A       0.0    0.014705882352941176  0.0                  0.0                   0.0                  0.0                  0.41935483870967744
G>A       0.0    0.6666666666666666    0.0                  0.0                   0.5                  0.0                  0.0
C>T       0.0    0.532258064516129     0.0                  0.0                   0.5227272727272727   0.0                  0.0
G>A       0.0    0.5652173913043478    0.0                  0.0                   0.3870967741935484   0.0                  0.0
G>A       0.0    0.42857142857142855   0.0                  0.0                   0.3333333333333333   0.0                  0.0
C>A       0.0    0.4888888888888889    0.0                  0.0                   0.43243243243243246  0.0                  0.0
...       ...    ...                   ...                  ...                   ...                  ...                  ...
A>T       0.0    0.4716981132075472    0.0                  0.014492753623188406  0.3142857142857143   0.0                  0.0
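
The block passed to filter! receives one row at a time and keeps the rows for which it returns a true value. As a minimal sketch of the same idea in plain Ruby, assume the rows are an Array of Hashes (a stand-in for the DataFrame, not its real internals); the values are taken from the first rows of the unfiltered table.

```ruby
# Stand-in for DataFrame rows: one Hash per row, keyed by column name.
rows = [
  { mutation: 'G>A', set1: 0.019230769230769232 },
  { mutation: 'C>T', set1: 0.42592592592592593 },
  { mutation: 'C>G', set1: 0.0 },
  { mutation: 'C>A', set1: 0.0 }
]

# Same predicate as in the cell above; select returns a new Array,
# while select! would modify rows in place, like filter! does.
kept = rows.select { |row| row[:set1] != 0.0 }
```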

Next, prepare instances of Nyaplot::Plot as usual. Nyaplot::Plot#filter is a method for adding a 'filter box' to the plot.

In [13]:
plot4=Nyaplot::Plot.new
plot4.add_with_df(df, :histogram, :set1)
plot4.configure do
  height(400)
  x_label('PNR')
  y_label('Frequency')
  filter({target:'x'})
  yrange([0,130])
end

plot5=Nyaplot::Plot.new
plot5.add_with_df(df, :bar, :mutation)
plot5.configure do
  height(400)
  x_label('Mutation types')
  y_label('Frequency')
  yrange([0,100])
end
Out[13]:
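
The second plot is built from the mutation column alone and its y axis is labeled 'Frequency', which suggests that each bar shows how often a mutation type occurs. Under that assumption, the numbers behind the bars can be sketched in plain Ruby as follows; the sample values are the first six mutation types of the filtered table.

```ruby
# First six values of the mutation column after filtering.
mutations = ['G>A', 'C>T', 'A>C', 'T>A', 'C>T', 'G>A']

# Count occurrences per mutation type; Hash.new(0) makes missing keys start at 0.
counts = mutations.each_with_object(Hash.new(0)) do |mutation, tally|
  tally[mutation] += 1
end
```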

Then create an instance of Nyaplot::Frame. It can hold multiple plots and lets them interact with each other.

In [14]:
frame = Nyaplot::Frame.new
frame.add(plot4)
frame.add(plot5)
frame.show
Out[14]: