I already introduced Nyaplot::DataFrame in tutorial 1, but that was not enough to show how useful it is. This notebook walks through two use cases for DataFrame.
gem 'nyaplot', '0.1.5'
require 'nyaplot'
true
First, prepare sample data and put it into a DataFrame. Then build a scatter plot from it.
# Ten labels ("cat0" .. "cat9") and ten random coordinates in [0, 5)
samples = Array.new(10) { |i| "cat#{i}" }
x = Array.new(10) { 5 * rand }
y = Array.new(10) { 5 * rand }
df = Nyaplot::DataFrame.new({x:x,y:y,name:samples})
df
x | y | name |
---|---|---|
0.3776941903756781 | 3.150631669472884 | cat0 |
1.16523539526362 | 3.636084983829032 | cat1 |
3.4024015690223557 | 1.5265781962652962 | cat2 |
0.6478786205183784 | 4.697932060531303 | cat3 |
1.502303692675978 | 0.23551205104727635 | cat4 |
0.6186835562297704 | 1.5712039108775393 | cat5 |
2.360417118852578 | 3.037560696785177 | cat6 |
0.9958851328846746 | 3.3965329595925136 | cat7 |
1.0207006596250006 | 0.020049783384994968 | cat8 |
1.8874015058395726 | 2.973976083790996 | cat9 |
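To make the column-oriented layout concrete, here is a minimal plain-Ruby sketch of how a hash of equal-length columns corresponds to the rows shown in the table above. It does not use Nyaplot, and `columns_to_rows` is a hypothetical helper written only for illustration, not part of the library.

```ruby
# Hypothetical helper (not part of Nyaplot): turn a hash of columns
# into an array of row hashes, one per record.
def columns_to_rows(columns)
  names = columns.keys
  lengths = columns.values.map(&:length).uniq
  raise ArgumentError, 'columns must have equal length' if lengths.size > 1

  # transpose turns [[x0, x1], [y0, y1]] into [[x0, y0], [x1, y1]]
  columns.values.transpose.map { |values| names.zip(values).to_h }
end

columns = { x: [0.5, 1.5], y: [3.0, 4.0], name: %w[cat0 cat1] }
rows = columns_to_rows(columns)
rows.each { |row| puts row.values.join(' | ') }
```

Each row hash carries every column for one record, which is why a single DataFrame can later feed the x and y positions and the tooltip text at the same time.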
plot = Nyaplot::Plot.new
plot.x_label("weight [kg]")
plot.y_label("height [m]")
sc = plot.add_with_df(df, :scatter, :x, :y)
#<Nyaplot::Diagram:0x000000011704e8 @properties={:type=>:scatter, :options=>{:x=>:x, :y=>:y}, :data=>"4415e395-2c04-483a-a794-91e437f82ff9"}, @xrange=[0.3776941903756781, 3.4024015690223557], @yrange=[0.020049783384994968, 4.697932060531303]>
plot.show
The plot above does not show the name information, so add it to the tooltip. Use tooltip_contents to choose which columns appear in the tooltip.
sc.tooltip_contents([:name])
plot.show
A tooltip can include multiple lines, but the DataFrame has only three columns, which leaves little to add. Let's add a home column to it.
address = ['London', 'Kyoto', 'Los Angeles', 'Puretoria']
home = Array.new(10) { address.sample }
df.home = home
df
x | y | name | home |
---|---|---|---|
0.3776941903756781 | 3.150631669472884 | cat0 | Kyoto |
1.16523539526362 | 3.636084983829032 | cat1 | London |
3.4024015690223557 | 1.5265781962652962 | cat2 | Los Angeles |
0.6478786205183784 | 4.697932060531303 | cat3 | Kyoto |
1.502303692675978 | 0.23551205104727635 | cat4 | Puretoria |
0.6186835562297704 | 1.5712039108775393 | cat5 | Los Angeles |
2.360417118852578 | 3.037560696785177 | cat6 | London |
0.9958851328846746 | 3.3965329595925136 | cat7 | Puretoria |
1.0207006596250006 | 0.020049783384994968 | cat8 | Kyoto |
1.8874015058395726 | 2.973976083790996 | cat9 | Los Angeles |
sc.tooltip_contents([:name, :home])
plot.show
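Conceptually, passing a list of column names means each point's tooltip is built from those columns of its row. The plain-Ruby sketch below illustrates that idea only. `tooltip_lines` is a hypothetical helper, and the "name: value" layout is an assumption, so the exact rendering in Nyaplot may differ.

```ruby
# Hypothetical helper (not Nyaplot's implementation): build the text
# lines a tooltip could show for one row, given the chosen columns.
def tooltip_lines(row, columns)
  # fetch raises KeyError if a requested column is missing from the row
  columns.map { |col| "#{col}: #{row.fetch(col)}" }
end

row = { x: 1.16, y: 3.63, name: 'cat1', home: 'London' }
lines = tooltip_lines(row, %i[name home])
puts lines
```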
Next, fill the points of the scatter plot with different colors according to the home column. To do so, pass the column name to the fill_by method.
colors = Nyaplot::Colors.qual
rgb(251,180,174) | rgb(179,205,227) | rgb(204,235,197) | rgb(222,203,228) | rgb(254,217,166) | rgb(255,255,204) | rgb(229,216,189) | rgb(253,218,236) | rgb(242,242,242) |
---|---|---|---|---|---|---|---|---|
sc.color(colors)
sc.fill_by(:home)
plot.show
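The idea behind coloring by a column can be sketched in plain Ruby: every distinct value gets one color from the palette, and rows sharing a value share a color. This is an illustration under that assumption, not Nyaplot's actual implementation. `color_map` is a hypothetical helper, and the palette entries are the first four colors from the output above.

```ruby
# Hypothetical helper (not Nyaplot's implementation): assign one color
# per distinct category, reusing colors if categories outnumber them.
def color_map(categories, palette)
  categories.uniq.each_with_index.map do |category, i|
    [category, palette[i % palette.length]]
  end.to_h
end

palette = ['rgb(251,180,174)', 'rgb(179,205,227)',
           'rgb(204,235,197)', 'rgb(222,203,228)']
home = ['Kyoto', 'London', 'Los Angeles', 'Kyoto']

mapping = color_map(home, palette)
fills = home.map { |city| mapping[city] }
puts fills
```

The same mapping idea applies to shapes: swap the palette for a list of shape names and the lookup stays unchanged.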
Use the shape_by method to change the shape of each point according to the value in a column.
sc.color(colors)
sc.shape_by(:home)
plot.show
DataFrame is also useful when visualizing data in multiple panes. Let's create plots from mutation data.
First, load the data from a tab-separated file with from_csv. (All data used in this tutorial is included in Nyaplot's repository: /examples/notebook/data/*)
path = File.expand_path("../data/first.tab", __FILE__)
# The separator is passed positionally; writing sep="\t" would only create a stray local variable.
df = Nyaplot::DataFrame.from_csv(path, "\t")
mutation | blood | set1 | set2 | set3 | set12 | set21 | set31 |
---|---|---|---|---|---|---|---|
G>A | 0.0 | 0.019230769230769232 | 0.0 | 0.48214285714285715 | 0.0 | 0.0 | 0.4782608695652174 |
C>T | 0.0 | 0.42592592592592593 | 0.0 | 0.0 | 0.375 | 0.0 | 0.0 |
C>G | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.525 | 0.0 |
C>A | 0.0 | 0.0 | 0.1935483870967742 | 0.0 | 0.0 | 0.4666666666666667 | 0.0 |
C>A | 0.0 | 0.0 | 0.08333333333333333 | 0.0 | 0.0 | 0.5161290322580645 | 0.0 |
G>T | 0.0 | 0.0 | 0.0 | 0.0 | 0.4444444444444444 | 0.0 | 0.0 |
C>G | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.4 |
C>A | 0.0 | 0.0 | 0.03333333333333333 | 0.0 | 0.0 | 0.42857142857142855 | 0.0 |
A>C | 0.0 | 0.6153846153846154 | 0.0 | 0.0 | 0.5925925925925926 | 0.0 | 0.0 |
C>A | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.32 | 0.0 |
C>A | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.4230769230769231 | 0.0 |
T>A | 0.0 | 0.525 | 0.0 | 0.0 | 0.5277777777777778 | 0.0 | 0.0 |
C>T | 0.0 | 0.42857142857142855 | 0.0 | 0.0 | 0.6666666666666666 | 0.0 | 0.0 |
G>A | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.5 | 0.03225806451612903 |
T>C | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.3793103448275862 |
C>T | 0.0 | 0.0 | 0.0 | 0.45714285714285713 | 0.0 | 0.0 | 0.46875 |
... | ... | ... | ... | ... | ... | ... | ... |
A>T | 0.0 | 0.4716981132075472 | 0.0 | 0.014492753623188406 | 0.3142857142857143 | 0.0 | 0.0 |
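For readers who want to see what reading tab-separated data involves, here is a minimal sketch that uses Ruby's standard CSV library directly rather than Nyaplot. It parses a two-row excerpt with three of the columns above from a string. It assumes the first line of the file is a header row, as the column names in the table suggest.

```ruby
require 'csv'

# A two-row excerpt in the same tab-separated layout as the table above.
text = "mutation\tblood\tset1\n" \
       "G>A\t0.0\t0.019230769230769232\n" \
       "C>T\t0.0\t0.42592592592592593\n"

# col_sep selects the tab separator; the converters turn numeric
# strings into numbers and header names into symbols.
table = CSV.parse(text, col_sep: "\t", headers: true,
                        converters: :numeric, header_converters: :symbol)
rows = table.map(&:to_h)
p rows.first
```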
Next, I want to plot the set1 column, but it contains many zero cells, so filter those rows out first.
df.filter! {|row| row[:set1] != 0.0}
df
mutation | blood | set1 | set2 | set3 | set12 | set21 | set31 |
---|---|---|---|---|---|---|---|
G>A | 0.0 | 0.019230769230769232 | 0.0 | 0.48214285714285715 | 0.0 | 0.0 | 0.4782608695652174 |
C>T | 0.0 | 0.42592592592592593 | 0.0 | 0.0 | 0.375 | 0.0 | 0.0 |
A>C | 0.0 | 0.6153846153846154 | 0.0 | 0.0 | 0.5925925925925926 | 0.0 | 0.0 |
T>A | 0.0 | 0.525 | 0.0 | 0.0 | 0.5277777777777778 | 0.0 | 0.0 |
C>T | 0.0 | 0.42857142857142855 | 0.0 | 0.0 | 0.6666666666666666 | 0.0 | 0.0 |
G>A | 0.0 | 0.45652173913043476 | 0.0 | 0.0 | 0.43478260869565216 | 0.0 | 0.0 |
C>T | 0.0 | 0.09803921568627451 | 0.0 | 0.0 | 0.37142857142857144 | 0.0 | 0.0 |
T>A | 0.0 | 0.5769230769230769 | 0.0 | 0.0 | 0.4782608695652174 | 0.0 | 0.0 |
A>G | 0.0 | 0.43859649122807015 | 0.0 | 0.0 | 0.5142857142857142 | 0.0 | 0.0 |
T>C | 0.0 | 0.5806451612903226 | 0.0 | 0.0 | 0.6341463414634146 | 0.0 | 0.0 |
G>A | 0.0 | 0.014705882352941176 | 0.0 | 0.0 | 0.0 | 0.0 | 0.41935483870967744 |
G>A | 0.0 | 0.6666666666666666 | 0.0 | 0.0 | 0.5 | 0.0 | 0.0 |
C>T | 0.0 | 0.532258064516129 | 0.0 | 0.0 | 0.5227272727272727 | 0.0 | 0.0 |
G>A | 0.0 | 0.5652173913043478 | 0.0 | 0.0 | 0.3870967741935484 | 0.0 | 0.0 |
G>A | 0.0 | 0.42857142857142855 | 0.0 | 0.0 | 0.3333333333333333 | 0.0 | 0.0 |
C>A | 0.0 | 0.4888888888888889 | 0.0 | 0.0 | 0.43243243243243246 | 0.0 | 0.0 |
... | ... | ... | ... | ... | ... | ... | ... |
A>T | 0.0 | 0.4716981132075472 | 0.0 | 0.014492753623188406 | 0.3142857142857143 | 0.0 | 0.0 |
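As the output shows, only rows for which the block returns true are kept. The same in-place filtering can be sketched on a plain array of row hashes with `Array#select!`. The three rows below are a small excerpt in the spirit of the data above.

```ruby
rows = [
  { mutation: 'G>A', set1: 0.019230769230769232 },
  { mutation: 'C>G', set1: 0.0 },
  { mutation: 'C>T', set1: 0.42592592592592593 }
]

# Keep only rows whose set1 value is non-zero, modifying the array in place.
rows.select! { |row| row[:set1] != 0.0 }
puts rows.map { |row| row[:mutation] }.join(', ')
```

Comparing against exactly 0.0 works here because the zeros come straight from the file. Values produced by arithmetic would usually call for a small tolerance instead.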
Next, prepare instances of Nyaplot::Plot as usual. Nyaplot::Plot#filter adds a 'filter box' to the plot.
plot4 = Nyaplot::Plot.new
plot4.add_with_df(df, :histogram, :set1)
plot4.configure do
  height(400)
  x_label('PNR')
  y_label('Frequency')
  filter({target:'x'})
  yrange([0,130])
end
plot5 = Nyaplot::Plot.new
plot5.add_with_df(df, :bar, :mutation)
plot5.configure do
  height(400)
  x_label('Mutation types')
  y_label('Frequency')
  yrange([0,100])
end
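Both plots put 'Frequency' on the y axis, so they presumably show counts: values per bin for the histogram and rows per category for the bar chart. The plain-Ruby sketch below computes both kinds of count on made-up sample values. The bin width is an illustrative choice, not a Nyaplot default.

```ruby
# Count how often each category occurs: the quantity a bar chart of a
# categorical column would plot.
mutations = %w[G>A C>T A>C C>T G>A G>A]
bar_counts = mutations.tally

# Count values per fixed-width bin: the quantity a histogram plots.
# Bin n covers the interval [n * bin_width, (n + 1) * bin_width).
values = [0.02, 0.43, 0.62, 0.53, 0.43, 0.46]
bin_width = 0.2
hist_counts = values.group_by { |v| (v / bin_width).floor }
                    .transform_values(&:length)
                    .sort.to_h

p bar_counts
p hist_counts
```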
Then create an instance of Nyaplot::Frame. A frame holds multiple plots and lets them interact with each other.
frame = Nyaplot::Frame.new
frame.add(plot4)
frame.add(plot5)
frame.show
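One way to picture the interaction between the two plots is sketched below, under the assumption that selecting an x range in the histogram's filter box restricts the rows counted by the bar chart. This is plain Ruby on made-up rows, `counts_in_range` is a hypothetical helper, and none of it reflects how Nyaplot implements the link internally.

```ruby
rows = [
  { mutation: 'G>A', set1: 0.02 },
  { mutation: 'C>T', set1: 0.43 },
  { mutation: 'A>C', set1: 0.62 },
  { mutation: 'C>T', set1: 0.43 },
  { mutation: 'G>A', set1: 0.46 }
]

# Hypothetical helper: counts per mutation type for the rows whose
# set1 value falls inside the selected x range.
def counts_in_range(rows, range)
  rows.select { |row| range.cover?(row[:set1]) }
      .map { |row| row[:mutation] }
      .tally
end

all_counts = counts_in_range(rows, 0.0..1.0)  # nothing filtered out
selected   = counts_in_range(rows, 0.4..0.5)  # a narrower selection
p all_counts
p selected
```

Because both plots are built from the same DataFrame, a selection made in one view can be translated into a row filter that any other view understands.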