In [1]:
require 'mikon'
Out[1]:
true

Analyzing the Iris dataset with Mikon and Statsample

In [2]:
# Build an absolute path to iris.csv and load it into a Mikon::DataFrame
path = File.expand_path("../iris.csv", __FILE__)
df = Mikon::DataFrame.from_csv(path)
Out[2]:
     sepal_length  sepal_width  petal_length  petal_width  species
0    5.1           3.5          1.4           0.2          setosa
1    4.9           3.0          1.4           0.2          setosa
2    4.7           3.2          1.3           0.2          setosa
3    4.6           3.1          1.5           0.2          setosa
4    5.0           3.6          1.4           0.2          setosa
5    5.4           3.9          1.7           0.4          setosa
6    4.6           3.4          1.4           0.3          setosa
7    5.0           3.4          1.5           0.2          setosa
8    4.4           2.9          1.4           0.2          setosa
9    4.9           3.1          1.5           0.1          setosa
10   5.4           3.7          1.5           0.2          setosa
11   4.8           3.4          1.6           0.2          setosa
12   4.8           3.0          1.4           0.1          setosa
13   4.3           3.0          1.1           0.1          setosa
14   5.8           4.0          1.2           0.2          setosa
15   5.7           4.4          1.5           0.4          setosa
16   5.4           3.9          1.3           0.4          setosa
17   5.1           3.5          1.4           0.3          setosa
18   5.7           3.8          1.7           0.3          setosa
19   5.1           3.8          1.5           0.3          setosa
20   5.4           3.4          1.7           0.2          setosa
21   5.1           3.7          1.5           0.4          setosa
22   4.6           3.6          1.0           0.2          setosa
23   5.1           3.3          1.7           0.5          setosa
24   4.8           3.4          1.9           0.2          setosa
25   5.0           3.0          1.6           0.2          setosa
26   5.0           3.4          1.6           0.4          setosa
27   5.2           3.5          1.5           0.2          setosa
28   5.2           3.4          1.4           0.2          setosa
29   4.7           3.2          1.6           0.2          setosa
30   4.8           3.1          1.6           0.2          setosa
31   5.4           3.4          1.5           0.4          setosa
32   5.2           4.1          1.5           0.1          setosa
33   5.5           4.2          1.4           0.2          setosa
34   4.9           3.1          1.5           0.1          setosa
35   5.0           3.2          1.2           0.2          setosa
36   5.5           3.5          1.3           0.2          setosa
37   4.9           3.1          1.5           0.1          setosa
38   4.4           3.0          1.3           0.2          setosa
39   5.1           3.4          1.5           0.2          setosa
40   5.0           3.5          1.3           0.3          setosa
41   4.5           2.3          1.3           0.3          setosa
42   4.4           3.2          1.3           0.2          setosa
43   5.0           3.5          1.6           0.6          setosa
44   5.1           3.8          1.9           0.4          setosa
45   4.8           3.0          1.4           0.3          setosa
46   5.1           3.8          1.6           0.2          setosa
47   4.6           3.2          1.4           0.2          setosa
48   5.3           3.7          1.5           0.2          setosa
49   5.0           3.3          1.4           0.2          setosa
50   7.0           3.2          4.7           1.4          versicolor
...  ...           ...          ...           ...          ...
149  5.9           3.0          5.1           1.8          virginica
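Mikon::DataFrame.from_csv handles the parsing here. As a rough illustration of the column-oriented shape such a DataFrame holds, the plain-Ruby sketch below uses the standard csv library to turn CSV text into a Hash of column arrays. The helper name csv_to_columns is hypothetical, and the three sample rows are copied from rows 0, 50 and 149 of the table above. This is not how Mikon is implemented.

```ruby
require 'csv'

# Hypothetical helper: parse CSV text into a Hash of column arrays.
# converters: :float turns numeric-looking fields into Float and leaves
# other fields (like species names) as strings.
def csv_to_columns(text)
  table = CSV.parse(text, headers: true, converters: :float)
  table.headers.each_with_object({}) do |name, cols|
    cols[name.to_sym] = table.map { |row| row[name] }
  end
end

sample = <<~TEXT
  sepal_length,sepal_width,petal_length,petal_width,species
  5.1,3.5,1.4,0.2,setosa
  7.0,3.2,4.7,1.4,versicolor
  5.9,3.0,5.1,1.8,virginica
TEXT

cols = csv_to_columns(sample)
p cols[:sepal_length]  # prints [5.1, 7.0, 5.9]
p cols[:species]       # prints ["setosa", "versicolor", "virginica"]
```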
In [3]:
# Scatter plot of sepal_length vs. petal_length, colored by species
plot = df.plot(type: :scatter, x: :sepal_length, y: :petal_length, fill_by: :species, color: :qual)
Out[3]:
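The fill_by: :species option gives each species its own color. Conceptually, this means the (x, y) points must first be split into one series per category. The following minimal plain-Ruby sketch shows that grouping step on four rows copied from the table above (rows 0, 1, 50 and 149). It reflects an assumption about what such an option has to do and is not Mikon's code.

```ruby
# Four rows copied from the table above (rows 0, 1, 50 and 149).
rows = [
  { sepal_length: 5.1, petal_length: 1.4, species: "setosa" },
  { sepal_length: 4.9, petal_length: 1.4, species: "setosa" },
  { sepal_length: 7.0, petal_length: 4.7, species: "versicolor" },
  { sepal_length: 5.9, petal_length: 5.1, species: "virginica" }
]

# Group rows by species, then keep only the (x, y) pair each point needs.
# Hypothetical sketch of the grouping behind a fill_by-style option.
series = rows.group_by { |r| r[:species] }
             .transform_values { |rs| rs.map { |r| [r[:sepal_length], r[:petal_length]] } }

series.each { |name, points| puts "#{name}: #{points.inspect}" }
# setosa: [[5.1, 1.4], [4.9, 1.4]]
# versicolor: [[7.0, 4.7]]
# virginica: [[5.9, 5.1]]
```

Each group could then be drawn as its own scatter layer with a distinct color.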
In [11]:
require 'statsample'
Out[11]:
false

Next, fit a simple linear regression of petal_length on sepal_length with Statsample::Regression, then draw the fitted line on the scatter plot.
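For reference, these are the standard ordinary-least-squares formulas for the fitted line y = a + b x, with x = sepal_length and y = petal_length. They define the quantities r, a and b in the summary table that follows. Reading s.e as the standard error of the estimate with n - 2 degrees of freedom is an assumption about Statsample's output.

```latex
\hat{y}_i = a + b\,x_i, \qquad
b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad
a = \bar{y} - b\,\bar{x}

r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}, \qquad
s_e = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}}
```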

In [13]:
# Fit petal_length = a + b * sepal_length by least squares
lr = Statsample::Regression.simple(df[:sepal_length], df[:petal_length])
puts lr.summary
a, b = lr.a, lr.b   # intercept and slope
= Regression of sepal_length over petal_length
  Table 2
+----------+--------+
| Variable | Value  |
+----------+--------+
| r        | 0.872  |
| r^2      | 0.760  |
| a        | -7.095 |
| b        | 1.858  |
| s.e      | 0.867  |
+----------+--------+


Out[13]:
[-7.095381478279314, 1.8575096654214456]
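The summary values can be made less of a black box with a plain-Ruby sketch of ordinary least squares for y = a + b*x that computes the same five quantities (r, r^2, a, b, s.e). The method name simple_ols and the toy data are illustrative. Computing s.e as sqrt(SSE / (n - 2)) is an assumption about what Statsample reports, and this is not Statsample's source.

```ruby
# Hypothetical OLS helper for y = a + b*x using only core Ruby.
# s.e is assumed to be sqrt(SSE / (n - 2)), the standard error of estimate.
def simple_ols(x, y)
  raise ArgumentError, "x and y must have the same length" unless x.size == y.size
  raise ArgumentError, "need at least 3 points" if x.size < 3

  n  = x.size
  mx = x.sum / n.to_f
  my = y.sum / n.to_f

  # Centered sums of cross-products and squares
  sxy = x.zip(y).sum { |xi, yi| (xi - mx) * (yi - my) }
  sxx = x.sum { |xi| (xi - mx)**2 }
  syy = y.sum { |yi| (yi - my)**2 }

  b = sxy / sxx            # slope
  a = my - b * mx          # intercept
  r = sxy / Math.sqrt(sxx * syy)

  # Residual sum of squares around the fitted line
  sse = x.zip(y).sum { |xi, yi| (yi - (a + b * xi))**2 }

  { a: a, b: b, r: r, r2: r**2, se: Math.sqrt(sse / (n - 2)) }
end

fit = simple_ols([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
printf("a=%.3f b=%.3f r=%.3f se=%.3f\n", fit[:a], fit[:b], fit[:r], fit[:se])
# prints a=2.200 b=0.600 r=0.775 se=0.894
```

Passing the full sepal_length and petal_length columns as plain arrays should give values close to the table above, if the s.e assumption holds.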
In [14]:
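# Evaluate the fitted line at integer points spanning the sepal_length range
x = (df[:sepal_length].min.round..df[:sepal_length].max.round).to_a
y = x.map { |v| a + b * v }
plot.add(:line, x, y)   # overlay the regression line on the scatter plot
plot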
Out[14]:
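A straight line needs only two points. The cell above samples it at rounded integer x values, and rounding can extend the line slightly past (or short of) the observed data. The small sketch below instead evaluates y = a + b*x at the exact minimum and maximum x. The helper name line_points is hypothetical, and the x values are a few sepal_length entries from the table above, not the full column.

```ruby
# Hypothetical helper: evaluate y = a + b*x at the smallest and largest x,
# returning [xs, ys] in the shape a line layer typically expects.
def line_points(xs, a, b)
  lo, hi = xs.minmax
  [[lo, hi], [a + b * lo, a + b * hi]]
end

# Coefficients as printed in Out[13].
a, b = -7.095381478279314, 1.8575096654214456

# Illustrative sepal_length values (rows 0, 1, 50 and 149 of the table above).
x_line, y_line = line_points([5.1, 4.9, 7.0, 5.9], a, b)
printf("x: %.1f..%.1f  y: %.3f..%.3f\n", x_line[0], x_line[1], y_line[0], y_line[1])
```

With the full column, the same two arrays could be passed to plot.add(:line, ...) exactly as in In [14].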