Creating Visualizations with DataFrame

Daru::View uses nyaplot in the background to generate interactive plots, which can be viewed in your browser.

In this tutorial we'll see how to create some interesting plots from a Daru::DataFrame using the Daru::View::Plot class.

In [1]:
require 'daru/view'
Install the spreadsheet gem version ~>1.1.1 for using spreadsheet functions.

Install the mechanize gem version ~>2.7.5 for using mechanize functions.
Out[1]:
true
In [2]:
# Set a default plotting library
Daru::View.plotting_library = :nyaplot
Out[2]:
:nyaplot

Scatter Plot

In [3]:
df = Daru::DataFrame.new({
  a: Array.new(100) {|i| i}, 
  b: 100.times.map{rand}
})
Out[3]:
Daru::DataFrame(100x2)
a b
0 0 0.5901045999837067
1 1 0.436683497259194
2 2 0.8201904794946736
3 3 0.7412154680988353
4 4 0.7918037303341074
5 5 0.75465991314003
6 6 0.001594846966242347
7 7 0.8452403857618761
8 8 0.8315826505233574
9 9 0.9898896370377346
10 10 0.19305512177494977
11 11 0.471331281205303
12 12 0.2727093997406749
13 13 0.9974328400097253
14 14 0.32827550306051856
15 15 0.9325995560620807
16 16 0.7035670210595155
17 17 0.7262435775960362
18 18 0.9429472518587119
19 19 0.5115635843253195
20 20 0.25640688913675325
21 21 0.8651112057349126
22 22 0.44472174755120386
23 23 0.1696831523121588
24 24 0.18008052430086519
25 25 0.7870035406333422
26 26 0.145610247050024
27 27 0.8604271138888401
28 28 0.9581902890007646
29 29 0.6406388227168683
... ... ...
99 99 0.2551244214639039
In [4]:
scatter_1 = Daru::View::Plot.new(df, type: :scatter, x: :a, y: :b)
Out[4]:
#<Daru::View::Plot:0x007ff4512724f8 @data=#<Daru::DataFrame(100x2)>
                     a          b
          0          0 0.59010459
          1          1 0.43668349
          2          2 0.82019047
          3          3 0.74121546
          4          4 0.79180373
          5          5 0.75465991
          6          6 0.00159484
          7          7 0.84524038
          8          8 0.83158265
          9          9 0.98988963
         10         10 0.19305512
         11         11 0.47133128
         12         12 0.27270939
         13         13 0.99743284
         14         14 0.32827550
        ...        ...        ..., @options={:type=>:scatter, :x=>:a, :y=>:b}, @adapter=Daru::View::Adapter::NyaplotAdapter, @chart=#<Nyaplot::Plot:0x007ff451272278 @properties={:diagrams=>[#<Nyaplot::Diagram:0x007ff45185ae60 @properties={:type=>:scatter, :options=>{:x=>:a, :y=>:b}, :data=>"f4063d0d-fc9a-4aa1-b6ea-336fbba2d1a0"}, @xrange=[0, 99], @yrange=[2.5766515678427027e-05, 0.9974328400097253]>], :options=>{}}>>
In [5]:
scatter_1.show_in_iruby

Just specifying the options to plot yields a very simple graph without much customization.

But what if you want to enhance your scatter plot with colors, add tooltips for each point and change the labels of the X and Y axes? You may also want to see two different scatter plots on the same graph, each with a different color.

All this can be done by calling #chart on the Daru::View::Plot object and customizing the result in a block. #chart returns the underlying Nyaplot::Plot, whose #diagrams method returns the Nyaplot::Diagram objects for the graph; both can be used for many varied customizations. Let's see some examples:

In [6]:
# DataFrame denoting Ice Cream sales of a particular food chain in a city
# according to the maximum recorded temperature in that city. It also lists
# the staff strength present in each city.

df = Daru::DataFrame.new({
  :temperature => [30.4, 23.5, 44.5, 20.3, 34, 24, 31.45, 28.34, 37, 24],
  :sales       => [350, 150, 500, 200, 480, 250, 330, 400, 420, 560],
  :city        => ['Pune', 'Delhi']*5,
  :staff       => [15,20]*5
  })
df
Out[6]:
Daru::DataFrame(10x4)
city sales staff temperature
0 Pune 350 15 30.4
1 Delhi 150 20 23.5
2 Pune 500 15 44.5
3 Delhi 200 20 20.3
4 Pune 480 15 34
5 Delhi 250 20 24
6 Pune 330 15 31.45
7 Delhi 400 20 28.34
8 Pune 420 15 37
9 Delhi 560 20 24
In [7]:
# Generating a scatter plot with tool tips, colours and different shapes.
scatter_2 = Daru::View::Plot.new(df, type: :scatter, x: :temperature, y: :sales).chart
scatter_2.tap do |plot|
  plot.x_label "Temperature"
  plot.y_label "Sales"
  plot.yrange [100, 600]
  plot.xrange [15, 50]
  plot.diagrams[0].tooltip_contents([:city, :staff])
  plot.diagrams[0].color(Nyaplot::Colors.qual) # set the color scheme for this diagram. See Nyaplot::Colors for more info.
  plot.diagrams[0].fill_by(:city) # Color each point according to the city that it belongs to.
  plot.diagrams[0].shape_by(:city) # Shape each point according to the city that it belongs to.
end

# Move the mouse pointer over the points to see the tool tips.
Out[7]:
In [8]:
# Array of diagrams
scatter_2.diagrams
Out[8]:
[#<Nyaplot::Diagram:0x007ff451bd1a58 @properties={:type=>:scatter, :options=>{:x=>:temperature, :y=>:sales, :tooltip_contents=>[:city, :staff], :color=>#<Nyaplot::Color:0x007ff451bd0b80 @source=["rgb(127,201,127)", "rgb(190,174,212)", "rgb(253,192,134)", "rgb(255,255,153)", "rgb(56,108,176)", "rgb(240,2,127)", "rgb(191,91,23)", "rgb(102,102,102)"]>, :fill_by=>:city, :shape_by=>:city}, :data=>"f78d4dd9-8e28-4504-8ded-1d5a75d7ce2c"}, @xrange=[20.3, 44.5], @yrange=[150, 560]>]
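
To make the idea behind fill_by(:city) concrete, here is a minimal plain-Ruby sketch of mapping a categorical column to palette colors. It is only an illustration: the variable names are made up for this example, the palette is the first three entries of the color list printed above, and the order in which Nyaplot itself assigns colors is not something this sketch claims to reproduce.

```ruby
# Plain-Ruby sketch (no gems). Illustrates the idea of coloring by category;
# this is NOT Nyaplot's implementation.
palette = ["rgb(127,201,127)", "rgb(190,174,212)", "rgb(253,192,134)"]

cities = ['Pune', 'Delhi'] * 5

# Assign each distinct city the next palette color, in order of first
# appearance, wrapping around if there are more cities than colors.
color_for = cities.uniq.each_with_index.map do |city, i|
  [city, palette[i % palette.size]]
end.to_h

# One color per data point, looked up from the point's city.
point_colors = cities.map { |city| color_for[city] }

color_for.each { |city, color| puts "#{city} -> #{color}" }
```

Every row with the same city ends up with the same color, which is what makes the two groups easy to tell apart on the plot.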

Bar Graph

Generating a bar graph requires passing :bar into the :type option.

In [9]:
# A Bar Graph denoting the age at which various Indian Kings died.

df = Daru::DataFrame.new({
  name: ['Emperor Asoka', 'Akbar The Great', 'Rana Pratap', 'Shivaji Maharaj', 'Krishnadevaraya'],
  age:  [72,63,57,53,58] 
  }, order: [:name, :age])

df.sort!([:age])
Out[9]:
Daru::DataFrame(5x2)
name age
3 Shivaji Maharaj 53
2 Rana Pratap 57
4 Krishnadevaraya 58
1 Akbar The Great 63
0 Emperor Asoka 72
In [10]:
Daru::View::Plot.new(df, type: :bar, x: :name, y: :age).chart.tap do |plot|
  plot.x_label "Name"
  plot.y_label "Age"
  plot.yrange [20,80]
end
Out[10]:

It is also possible to pass in only the :x parameter if you want to plot the frequency of occurrence of each element in a Vector.

In [11]:
a = ['A', 'C', 'G', 'T']
v = 1000.times.map { a.sample }

puts "v : ", v

df = Daru::DataFrame.new({
  a: v
  })
Daru::View::Plot.new(df, type: :bar, x: :a).chart.tap do |plot|
  plot.yrange [0,350]
  plot.y_label "Frequency"
  plot.x_label "Letter"
end
v : 
["T", "T", "G", "G", "A", "A", "C", "T", "C", "A", "T", "G", "T", "A", "A", "G", "A", "T", "G", "G", "G", "T", "C", "G", "G", "A", "T", "A", "T", "T", "T", "A", "G", "A", "A", "C", "A", "A", "C", "A", "G", "C", "A", "A", "A", "A", "A", "A", "T", "T", "C", "G", "T", "A", "C", "C", "A", "A", "A", "T", "C", "G", "T", "T", "T", "T", "G", "T", "T", "T", "T", "T", "T", "C", "G", "G", "A", "G", "C", "A", "A", "G", "G", "G", "G", "G", "C", "C", "A", "A", "G", "C", "A", "C", "T", "T", "A", "G", "A", "A", "T", "C", "A", "C", "C", "G", "T", "T", "C", "T", "A", "C", "T", "G", "A", "T", "A", "C", "A", "A", "A", "C", "T", "T", "G", "C", "G", "T", "A", "C", "A", "T", "T", "A", "C", "C", "T", "A", "C", "C", "G", "C", "T", "A", "C", "G", "T", "A", "T", "A", "C", "C", "A", "T", "G", "C", "C", "A", "G", "G", "G", "C", "T", "T", "C", "G", "G", "T", "G", "T", "G", "G", "A", "G", "C", "G", "G", "C", "A", "C", "T", "C", "A", "A", "C", "A", "T", "A", "T", "T", "T", "A", "C", "T", "G", "C", "G", "C", "G", "G", "C", "A", "A", "C", "A", "A", "T", "C", "A", "T", "T", "T", "G", "A", "T", "C", "A", "A", "A", "A", "T", "T", "A", "A", "G", "C", "A", "T", "C", "T", "T", "C", "C", "T", "C", "T", "A", "C", "G", "G", "C", "T", "T", "T", "A", "T", "T", "T", "C", "A", "C", "C", "A", "T", "T", "C", "G", "G", "A", "G", "T", "G", "T", "A", "A", "C", "C", "T", "C", "G", "G", "G", "T", "A", "A", "G", "C", "G", "A", "T", "C", "G", "A", "G", "A", "A", "C", "C", "A", "T", "C", "T", "A", "T", "C", "A", "A", "A", "A", "C", "T", "A", "A", "A", "G", "G", "C", "A", "A", "T", "C", "G", "A", "C", "G", "G", "G", "C", "G", "T", "T", "G", "T", "A", "C", "C", "G", "A", "A", "A", "T", "T", "C", "A", "G", "A", "T", "T", "C", "A", "T", "G", "A", "G", "G", "T", "A", "T", "G", "A", "T", "T", "G", "C", "T", "T", "C", "C", "G", "A", "T", "A", "G", "A", "G", "T", "G", "T", "C", "G", "G", "T", "A", "C", "C", "A", "G", "T", "G", "G", "G", "C", "G", "C", "T", "G", "G", "G", "A", "A", "C", "A", "G", "G", "C", "C", "C", "T", "T", 
"G", "G", "G", "G", "C", "A", "A", "T", "C", "G", "T", "G", "G", "T", "C", "T", "T", "A", "A", "G", "G", "A", "G", "T", "T", "G", "C", "G", "G", "C", "T", "T", "T", "G", "C", "T", "G", "G", "T", "A", "G", "A", "A", "A", "G", "A", "T", "T", "G", "A", "G", "G", "A", "C", "A", "T", "A", "A", "G", "A", "C", "T", "T", "G", "G", "A", "G", "C", "A", "A", "C", "T", "T", "C", "A", "C", "C", "T", "C", "T", "T", "A", "T", "T", "A", "C", "C", "A", "A", "G", "A", "G", "T", "C", "C", "T", "C", "A", "A", "C", "G", "G", "G", "C", "A", "G", "T", "G", "G", "A", "T", "C", "T", "T", "A", "A", "T", "C", "T", "C", "C", "A", "C", "T", "T", "G", "A", "C", "C", "A", "C", "T", "G", "A", "T", "G", "G", "T", "A", "G", "T", "G", "A", "T", "C", "A", "G", "G", "G", "C", "T", "T", "G", "C", "G", "C", "G", "C", "T", "T", "C", "G", "A", "G", "G", "T", "T", "C", "A", "A", "T", "G", "G", "G", "T", "C", "G", "G", "G", "T", "G", "A", "T", "A", "A", "G", "A", "A", "C", "C", "C", "G", "T", "C", "T", "C", "G", "G", "G", "T", "A", "G", "T", "G", "T", "T", "T", "C", "G", "T", "A", "A", "G", "A", "A", "T", "C", "G", "A", "C", "T", "C", "A", "C", "A", "A", "C", "C", "A", "A", "G", "C", "A", "G", "T", "A", "A", "A", "G", "G", "C", "G", "C", "C", "T", "C", "A", "T", "C", "G", "G", "G", "C", "T", "C", "T", "A", "T", "C", "A", "G", "G", "T", "A", "A", "C", "A", "G", "C", "A", "T", "C", "A", "G", "C", "T", "G", "C", "G", "T", "T", "A", "C", "T", "C", "T", "A", "C", "T", "A", "C", "T", "T", "A", "A", "T", "C", "A", "C", "T", "T", "G", "T", "A", "T", "A", "T", "G", "C", "C", "G", "C", "G", "T", "T", "T", "G", "A", "C", "C", "A", "C", "C", "T", "G", "G", "C", "C", "C", "G", "A", "C", "G", "C", "T", "G", "C", "G", "G", "G", "T", "A", "C", "C", "T", "A", "A", "C", "T", "T", "G", "T", "C", "T", "G", "A", "C", "G", "G", "G", "A", "C", "G", "C", "C", "G", "C", "A", "G", "G", "A", "T", "A", "C", "C", "T", "A", "A", "A", "T", "C", "T", "A", "T", "A", "G", "A", "C", "G", "T", "C", "C", "G", "G", "G", "G", "C", "A", "G", "T", 
"G", "A", "C", "T", "T", "G", "G", "T", "T", "T", "C", "G", "A", "T", "A", "C", "G", "G", "A", "G", "T", "G", "G", "C", "G", "T", "T", "C", "A", "G", "A", "G", "G", "C", "A", "A", "G", "G", "G", "A", "A", "A", "A", "A", "A", "A", "G", "T", "A", "C", "G", "A", "T", "A", "T", "C", "A", "C", "G", "C", "G", "T", "T", "G", "G", "T", "A", "A", "C", "C", "T", "A", "G", "C", "T", "G", "A", "T", "C", "A", "A", "G", "C", "T", "C", "C", "C", "T", "A", "G", "A", "A", "G", "C", "T", "T", "C", "A", "A", "C", "C", "A", "C", "G", "T", "A", "C", "C", "A", "T", "C", "T", "T", "C", "G", "T", "T", "A", "A", "G", "A", "G", "C", "G", "A", "T", "G", "G", "A", "T", "T", "A", "G", "A", "G", "A", "G", "G", "T", "G", "C", "G", "C", "G", "A", "T", "C", "C", "C", "A", "C", "A", "C", "T", "G", "C", "G", "G", "G", "T", "A", "T", "T", "G", "C", "T", "T", "T", "A", "G", "A", "T", "A", "C", "G", "A", "C", "C", "C", "G", "G", "C", "A", "G", "C", "A", "T", "C", "G", "T", "T", "T", "C", "C", "T", "C", "C", "T", "C", "T", "A"]
Out[11]:
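
The frequencies that such a bar graph displays can be checked by hand. The sketch below counts the letters in plain Ruby, without Daru; the fixed seed and the variable names are assumptions made for this example so that the run is repeatable.

```ruby
# Plain-Ruby sketch: count how often each letter occurs in a random sample.
letters = %w[A C G T]
rng     = Random.new(42)                       # fixed seed, chosen arbitrarily
sample  = Array.new(1000) { letters.sample(random: rng) }

# Group identical letters together, then count the size of each group.
counts = sample.group_by { |l| l }
               .map { |letter, group| [letter, group.size] }
               .sort
               .to_h

counts.each { |letter, n| puts "#{letter}: #{n}" }
```

With 1000 draws from four equally likely letters, each count is expected to be around 250, which is why a yrange of [0,350] leaves comfortable headroom.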

Box Plots

A box plot of the numerical vectors in the DataFrame can be generated by simply passing :box to the :type argument.

To demonstrate, I'll prepare some data using the distribution gem to get a bunch of normally distributed random variables. We'll then create a DataFrame with the data and plot it as a box plot.

In [19]:
require 'distribution'
rng = Distribution::Normal.rng
# Daru.lazy_update = false

arr = []
1000.times {arr.push(rng.call)}

arr1 = arr.map{|val| val/0.8-2}
arr2 = arr.map{|val| val*1.1+0.3}
arr3 = arr.map{|val| val*1.3+0.3}

df = Daru::DataFrame.new({ a: arr, b: arr1, c: arr2, d: arr3 })
box_1 = Daru::View::Plot.new(df, type: :box)
box_1.show_in_iruby
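
A box plot summarises each vector by its minimum, quartiles and maximum. The following plain-Ruby sketch computes such a five-number summary. The helper names quantile and five_number_summary are made up for this example, and the quartile convention (linear interpolation between the closest ranks) is an assumption; Nyaplot may use a different one.

```ruby
# Plain-Ruby sketch of a five-number summary (no gems).
# Assumption: quartiles use linear interpolation between closest ranks.
def quantile(sorted, q)
  pos   = q * (sorted.size - 1)      # fractional index into the sorted data
  lower = sorted[pos.floor]
  upper = sorted[pos.ceil]
  lower + (upper - lower) * (pos - pos.floor)
end

def five_number_summary(values)
  sorted = values.sort
  {
    min:    sorted.first,
    q1:     quantile(sorted, 0.25),
    median: quantile(sorted, 0.5),
    q3:     quantile(sorted, 0.75),
    max:    sorted.last
  }
end

values  = [1, 2, 3, 4, 5, 6, 7, 8, 9]
summary = five_number_summary(values)

# The same kind of rescaling that is applied to arr1 in the cell above.
scaled_summary = five_number_summary(values.map { |v| v / 0.8 - 2 })

puts summary.inspect
puts scaled_summary.inspect
```

Because val/0.8-2 is an increasing linear transformation, every statistic in the summary is transformed the same way, so the box for :b is a stretched and shifted copy of the box for :a.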

Line Graphs

Line graphs can be easily generated by passing :line to the :type option.

For example, let's plot a simple line graph showing the temperature of New York City over a week.

In [24]:
df = Daru::DataFrame.new({
  temperature: [43,53,50,57,59,47],
  day:         [1,2,3,4,5,6]
})

line_1 = Daru::View::Plot.new(df,type: :line, x: :day, y: :temperature).chart
line_1.tap do |plot|
  plot.x_label "Day"
  plot.y_label "Temperature"
  plot.yrange [20,60]
  plot.xrange [1,6]
  plot.legend true
  plot.diagrams[0].title "Temperature in NYC"
end
Out[24]:

Histogram

Passing :histogram to the :type option makes a histogram from the data.

Histograms don't need a Y vector (the Y axis shows the frequency of elements in each bin), so you only need to specify the vector you want to plot by passing its name into the :x option.

In [26]:
v = 1000.times.map { rand }

df = Daru::DataFrame.new({
  a: v
  })

Daru::View::Plot.new(df, type: :histogram, x: :a).chart.tap do |plot|
  plot.yrange [0,150]
  plot.y_label "Frequency"
  plot.x_label "Bins"
end
Out[26]:
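
To see what "the frequency of elements in each bin" means, here is a small plain-Ruby sketch of histogram binning. The bin_counts helper, the fixed seed and the choice of 10 equal-width bins are all assumptions made for this illustration; Nyaplot chooses its own binning.

```ruby
# Plain-Ruby sketch of equal-width histogram binning (no gems).
# Assumption: 10 bins over [0, 1]; Nyaplot's own bin count may differ.
def bin_counts(values, bins:, min:, max:)
  width  = (max - min).to_f / bins
  counts = Array.new(bins, 0)
  values.each do |v|
    idx = ((v - min) / width).floor
    idx = bins - 1 if idx >= bins    # a value equal to max goes in the last bin
    counts[idx] += 1
  end
  counts
end

rng    = Random.new(7)                         # fixed seed, chosen arbitrarily
values = Array.new(1000) { rng.rand }          # uniform floats in [0, 1)
counts = bin_counts(values, bins: 10, min: 0.0, max: 1.0)

counts.each_with_index do |count, i|
  puts format('[%.1f, %.1f): %d', i * 0.1, (i + 1) * 0.1, count)
end
```

With 1000 uniform values spread over 10 bins, each bin is expected to hold roughly 100 values.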

Multiple Diagrams on the same Plot

Scatter Diagrams on the same Plot

Daru allows you to plot as many columns of your dataframe as you want on the same plot.

This can allow you to plot data from the dataframe onto the same graph and visually compare results from observations. You can individually set the color or point shape of each diagram on the plot.

As a first demonstration, let's create a DataFrame of the temperatures of three different cities over the period of a week. Then, we'll plot them all on the same graph by passing options to Daru::View::Plot.new that tell it which Vectors are to be used for each of the diagrams.

In [28]:
df = Daru::DataFrame.new({
  nyc_temp:     [43,53,50,57,59,47],
  chicago_temp: [23,30,35,20,26,38],
  sf_temp:      [60,65,73,67,55,52],
  day:          [1 ,2 ,3 ,4 ,5 , 6]
  })

# As you can see, the options passed denote the x and y axes that are to be used by each diagram.
# You can add as many x and y axes as you want; just make sure the relevant vectors are present
# in your DataFrame!
#
# Here's an explanation of all the options passed:
#
# * type - The type of graph to be drawn. All the diagrams will be of the same type in this case.
# * x1/x2/x3 - The Vector from the DataFrame that is to be treated as the X axis for each of the
# three diagrams. In this case all of them need the :day Vector.
# * y1/y2/y3 - The Vector from the DataFrame that is to be treated as the Y axis for each of the
# three diagrams. As you can see, the 1st diagram will plot nyc_temp, the 2nd chicago_temp and
# the 3rd sf_temp.
#
# The block is also slightly different in this case.
# It still receives the Nyaplot::Plot ('plot'), as in all the above examples, but
# plot.diagrams now holds three Nyaplot::Diagram objects. Each of the elements
# in the Array represents one of the diagrams that you want to plot, ordered according to the
# sorting sequence of the options specifying the axes.

graph = Daru::View::Plot.new(df, type: :scatter, x1: :day, y1: :nyc_temp, x2: :day, y2: :chicago_temp, x3: :day, y3: :sf_temp) 

graph.chart.tap do |plot|
  nyc     = plot.diagrams[0]
  chicago = plot.diagrams[1]
  sf      = plot.diagrams[2]
    
  nyc.title "Temperature in NYC"
  nyc.color "#00FF00"
  
  chicago.title "Temperature in Chicago"
  chicago.color "#FFFF00"
  
  sf.title "Temperature in SF"
  sf.color "#0000FF"
  
  plot.legend true
  plot.yrange [0,100]
  plot.x_label "Day"
  plot.y_label "Temperature"
end
Out[28]:
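
The way numbered options line up with diagrams can be pictured with a short plain-Ruby sketch. The axis_pairs helper below is hypothetical and is not part of daru-view; it only shows how x1/y1, x2/y2, ... can be grouped into one (x, y) pair per diagram, sorted by their numeric suffix.

```ruby
# Hypothetical helper, NOT part of daru-view: groups numbered axis options
# into one [x, y] pair per diagram, ordered by the numeric suffix.
def axis_pairs(options)
  x_keys = options.keys.select { |k| k.to_s =~ /\Ax\d+\z/ }
  x_keys.sort_by { |k| k.to_s[1..-1].to_i }.map do |x_key|
    n = x_key.to_s[1..-1]                       # "1", "2", "3", ...
    [options.fetch(x_key), options.fetch(:"y#{n}")]
  end
end

opts = {
  type: :scatter,
  x1: :day, y1: :nyc_temp,
  x2: :day, y2: :chicago_temp,
  x3: :day, y3: :sf_temp
}

pairs = axis_pairs(opts)
pairs.each_with_index do |(x, y), i|
  puts "diagram #{i}: x=#{x.inspect} y=#{y.inspect}"
end
```

Under this reading, the first pair corresponds to plot.diagrams[0], the second to plot.diagrams[1], and so on, which matches how nyc, chicago and sf are picked out in the cell above.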

Scatter and Line Diagram on the same Plot

It is also possible to plot two different kinds of diagrams on the same plot. To show you how this works, I will plot a scatter graph and a line graph on the same plot.

To elaborate, we will be plotting a set of points on a scatter plot along with their line of best fit.

In [31]:
df = Daru::DataFrame.new({
  burger: ["Hamburger","Cheeseburger","Quarter Pounder","Quarter Pounder with Cheese","Big Mac","Arch Sandwich Special","Arch Special with Bacon","Crispy Chicken","Fish Fillet","Grilled Chicken","Grilled Chicken Light"],
  fat: [9,13 ,21 ,30 ,31 ,31 ,34 ,25 ,28 ,20 ,5],
  calories: [260,320,420,530,560,550,590,500,560,440,300]
  },
  order: [:burger, :fat, :calories])
Out[31]:
Daru::DataFrame(11x3)
burger fat calories
0 Hamburger 9 260
1 Cheeseburger 13 320
2 Quarter Pounder 21 420
3 Quarter Pounder with Cheese 30 530
4 Big Mac 31 560
5 Arch Sandwich Special 31 550
6 Arch Special with Bacon 34 590
7 Crispy Chicken 25 500
8 Fish Fillet 28 560
9 Grilled Chicken 20 440
10 Grilled Chicken Light 5 300

We'll now write a small algorithm to compute the slope and Y intercept of the line of best fit, using the fat content as the X co-ordinates and the calories as the Y co-ordinates.

The line of best fit will be drawn as a red line graph, and the fat and calorie contents will be plotted as usual using a scatter plot.

In [34]:
# Algorithm for computing the line of best fit

sum_x  = df[:fat].sum
sum2_x = (df[:fat]*df[:fat]).sum 
sum_xy = (df[:fat]*df[:calories]).sum
mean_x = df[:fat].mean
mean_y = df[:calories].mean

slope = (sum_xy - sum_x * mean_y) / (sum2_x - sum_x * mean_x)
yint  = mean_y - slope * mean_x

# Assign the computed Y co-ordinates of the line of best fit to a column 
# in the DataFrame called :y_coords
df[:y_coords] = df[:fat].map {|f| f*slope + yint }

# As you can see, the options passed to Daru::View::Plot.new are slightly different this time.
#
# Instead of passing Vector names into :x1, :x2... separately, this time we pass
# the relevant names of the X and Y axes co-ordinates as an Array into the :x and
# :y options. This is a simpler and easier way to plot multiple diagrams.
#
# As in the previous example, the block receives a Nyaplot::Plot object, and
# plot.diagrams returns an Array of Nyaplot::Diagram objects. The diagrams are ordered
# according to the types specified in the `:type` option.

graph = Daru::View::Plot.new(df, type: [:scatter, :line], x: [:fat, :fat], y: [:calories, :y_coords]) 
graph.chart.tap do |plot|
  plot.x_label "Fat"
  plot.y_label "Calories"
  plot.xrange [0,50]
  
  scatter = plot.diagrams[0]
  line    = plot.diagrams[1]
  
  line.color "#FF0000" #set color of the line to 'red'
  scatter.tooltip_contents [:burger] # set tool tip to :burger
end
Out[34]:
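
The least-squares arithmetic used above can be double-checked without Daru. The sketch below recomputes the slope in plain Ruby in two ways: from deviations about the means, and from the raw sums used in the cell above. The variable names are chosen for this example only.

```ruby
# Plain-Ruby check of the line of best fit (ordinary least squares), no gems.
fat      = [9, 13, 21, 30, 31, 31, 34, 25, 28, 20, 5]
calories = [260, 320, 420, 530, 560, 550, 590, 500, 560, 440, 300]

n      = fat.size
mean_x = fat.sum.to_f / n
mean_y = calories.sum.to_f / n

# Form 1: slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
sxy   = fat.zip(calories).sum { |x, y| (x - mean_x) * (y - mean_y) }
sxx   = fat.sum { |x| (x - mean_x)**2 }
slope = sxy / sxx
yint  = mean_y - slope * mean_x

# Form 2: the raw-sum expression used in the notebook cell.
sum_x     = fat.sum
sum2_x    = fat.sum { |x| x * x }
sum_xy    = fat.zip(calories).sum { |x, y| x * y }
slope_raw = (sum_xy - sum_x * mean_y) / (sum2_x - sum_x * mean_x)

# For a least-squares line with an intercept, the residuals sum to zero.
residuals = fat.zip(calories).map { |x, y| y - (slope * x + yint) }

puts format('slope=%.3f intercept=%.3f', slope, yint)
puts format('difference between the two forms: %.2e', (slope - slope_raw).abs)
```

Both forms agree up to floating-point rounding, because sum_x * mean_y equals n * mean_x * mean_y and sum_x * mean_x equals n * mean_x squared.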