Visualizing data from Daru containers

DARU (Data Analysis in RUby) is a library for storage, analysis, manipulation and visualization of data. You can find information about daru in its repository.

GnuplotRB uses the name of a Daru::Vector or Daru::DataFrame as the dataset's title and its index as the x-axis tick labels (xtics). Example:

In [1]:
require 'daru'
require 'gnuplotrb'
include GnuplotRB
include GnuplotRB::Fit

df = Daru::DataFrame.new({
  Build: [312, 630, 315, 312],
  Test: [525, 1050, 701, 514],
  Deploy: [215, 441, 370, 220]
  },
  index: ['Run A', 'Run B', 'Run C', 'Run D']
)
df[:Overall] = df[:Build] + df[:Test] + df[:Deploy]
df
Out[1]:
Daru::DataFrame:33875020 rows: 4 cols: 4
        Build  Deploy  Test  Overall
Run A     312     215   525     1052
Run B     630     441  1050     2121
Run C     315     370   701     1386
Run D     312     220   514     1046
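
The Overall column is the element-wise sum of the three stage columns. As a quick check that needs neither gem, the same arithmetic can be sketched with plain arrays (the variable names here are illustrative, not part of Daru):

```ruby
# Stage timings copied from the DataFrame above, one entry per run.
build   = [312, 630, 315, 312]
testing = [525, 1050, 701, 514]
deploy  = [215, 441, 370, 220]

# Element-wise sum across the three stages gives one total per run.
overall = build.zip(testing, deploy).map(&:sum)
p overall # prints [1052, 2121, 1386, 1046]
```

The totals match the Overall column in the output above.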

When you pass a DataFrame to Plot.new, every column of the DataFrame becomes a separate dataset, with the column name as the dataset's title:

In [2]:
from_daru = Plot.new(
    df,
    style_data: 'lines',
    yrange: 0..2200,
    xlabel: 'Number of test',
    ylabel: 'Time, s',
    title: 'Time spent to run deploy pipeline'
)
Out[2]:
[Plot: "Time spent to run deploy pipeline". x axis: "Number of test" (Run A to Run D); y axis: "Time, s" (0 to 2000); datasets drawn as lines: Build, Deploy, Test, Overall]
In [3]:
from_daru.options(
    style_data: 'histograms',
    style_fill: 'pattern border'
)
Out[3]:
[Plot: "Time spent to run deploy pipeline". x axis: "Number of test" (Run A to Run D); y axis: "Time, s" (0 to 2000); datasets drawn as histograms: Build, Deploy, Test, Overall]
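
To make the "one dataset per column, index as xtics" behaviour concrete, here is a minimal plain-Ruby sketch of the mapping. It is an assumption about what the conversion amounts to in gnuplot terms, not GnuplotRB's actual code: column_to_dataset is a hypothetical helper, and a Hash stands in for the DataFrame so the snippet runs without daru or gnuplotrb.

```ruby
# Plain-Ruby stand-in for the DataFrame above (no daru required).
index = ['Run A', 'Run B', 'Run C', 'Run D']
columns = {
  Build:  [312, 630, 315, 312],
  Test:   [525, 1050, 701, 514],
  Deploy: [215, 441, 370, 220]
}

# Hypothetical helper, not part of GnuplotRB: turns one column into
# a dataset description whose rows are "<index label>" <value>.
def column_to_dataset(title, index, values)
  rows = index.zip(values).map { |label, v| %("#{label}" #{v}) }
  # 'using 2:xtic(1)' is gnuplot syntax: plot column 2 and take the
  # tick labels from column 1.
  { title: title.to_s, using: '2:xtic(1)', data: rows.join("\n") }
end

datasets = columns.map { |name, values| column_to_dataset(name, index, values) }

datasets.each do |ds|
  puts "# #{ds[:title]} (using #{ds[:using]})"
  puts ds[:data]
  puts
end
```

Each printed block corresponds to one curve (or one group of bars) in the plots above, with the run names serving as tick labels.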

A dataset may be initialized from either a Daru::Vector or a Daru::DataFrame:

In [4]:
Plot.new([df[:Overall], with: 'lines'])
Out[4]:
[Plot: dataset "Overall" drawn as a line. x axis: Run A to Run D; y axis: 1000 to 2200]
In [5]:
rows = (1..30).map do |i|
  [i**2 * (rand(4) + 3) / 5, rand(70)]
end
df = Daru::DataFrame.rows(rows, order: [:Value, :Error], name: 'Confidence interval')

random_points = Plot.new(
  [df[:Value], with: 'lines', title: 'Average value'],
  [df, with: 'err']
)
Out[5]:
[Plot: datasets "Average value" (line) and "Confidence interval" (error bars). x axis: 0 to 30; y axis: -100 to 900]
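
Because the rows are generated with rand, each run of the cell above produces a different plot. A minimal sketch of a reproducible variant, assuming a fixed seed is acceptable (the seed 42 and the rng variable are arbitrary choices, not part of the original notebook):

```ruby
# Fixed seed so every run yields the same rows (42 is arbitrary).
rng = Random.new(42)

rows = (1..30).map do |i|
  value = i**2 * (rng.rand(4) + 3) / 5 # integer arithmetic, as in the cell above
  error = rng.rand(70)                 # error bar size in 0..69
  [value, error]
end

puts "#{rows.size} rows, first: #{rows.first.inspect}"
```

The resulting rows can be passed to Daru::DataFrame.rows exactly as in the cell above.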

Now let's try to fit it with a polynomial:

In [6]:
poly = fit_poly(df, degree: 5)
random_points.add_dataset(poly[:formula_ds])
Out[6]:
[Plot: datasets "Fit formula", "Average value" (line) and "Confidence interval" (error bars). x axis: 0 to 30; y axis: -100 to 900]
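
For readers curious what fitting a polynomial computes, the following plain-Ruby sketch performs an unweighted least-squares fit by solving the normal equations. It is not how fit_poly works: GnuplotRB hands the fitting to gnuplot, and this sketch ignores the error column. The names poly_fit and poly_eval are hypothetical, and the approach is only suitable for low degrees and small x values, since the normal equations become ill-conditioned quickly.

```ruby
# Least-squares polynomial fit via the normal equations (A^T A) c = A^T y.
# Returns coefficients [a0, a1, ..., a_degree]. Illustrative only.
def poly_fit(xs, ys, degree)
  n = degree + 1
  # Normal-equation matrix: m[i][j] = sum over points of x^(i+j)
  m = Array.new(n) { |i| Array.new(n) { |j| xs.sum { |x| x.to_f**(i + j) } } }
  # Right-hand side: b[i] = sum over points of y * x^i
  b = Array.new(n) { |i| xs.zip(ys).sum { |x, y| y * x.to_f**i } }

  # Gaussian elimination with partial pivoting
  n.times do |col|
    pivot = (col...n).max_by { |r| m[r][col].abs }
    m[col], m[pivot] = m[pivot], m[col]
    b[col], b[pivot] = b[pivot], b[col]
    ((col + 1)...n).each do |r|
      factor = m[r][col] / m[col][col]
      (col...n).each { |c| m[r][c] -= factor * m[col][c] }
      b[r] -= factor * b[col]
    end
  end

  # Back substitution
  coeffs = Array.new(n, 0.0)
  (n - 1).downto(0) do |i|
    tail = ((i + 1)...n).sum { |j| m[i][j] * coeffs[j] }
    coeffs[i] = (b[i] - tail) / m[i][i]
  end
  coeffs
end

# Evaluate the polynomial at x using Horner's method.
def poly_eval(coeffs, x)
  coeffs.reverse.reduce(0.0) { |acc, c| acc * x + c }
end

# Usage: points taken from y = 3 + 2x + x^2 should recover roughly [3, 2, 1].
xs = (0..5).to_a
ys = xs.map { |x| 3 + 2 * x + x**2 }
coeffs = poly_fit(xs, ys, 2)
puts coeffs.map { |c| c.round(6) }.inspect
```

With noisy data such as the random points above, the recovered coefficients describe the curve that minimizes the sum of squared vertical distances to the points.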
In [7]:
df = Daru::DataFrame.new({
    a: (0...100).to_a,
    b: Array.new(100) { rand }
  },
  name: 'Scatter example'
)

Plot.new([df, pt: 6, ps: 1, using: '2:3'], xrange: -10..110, yrange: -0.1..1.1)
Out[7]:
[Plot: dataset "Scatter example" drawn as points. x axis: 0 to 100; y axis: 0 to 1]
In [8]:
frames = 100.times.map do |i|
  Plot.new([df.row[0..i], using: '2:3', pt: 6, ps: 1])
end

Animation.new(*frames, xrange: -10..110, yrange: -0.1..1.1)
Out[8]: