In [1]:
require '~/workspace/daru/lib/daru.rb'

Out[1]:
true

## Categorical Vector Visualization

In [2]:
dv = Daru::Vector.new ['III']*10 + ['II']*5 + ['I']*5, type: :category, categories: ['I', 'II', 'III']
dv.type

Out[2]:
:category

### Bar graph

#### 1. Frequency (count)

In [3]:
dv.plot(type: :bar) do |p, d|
  p.x_label 'Categories'
  p.y_label 'Frequency'
end


#### 2. Percentage

In [4]:
dv.plot(type: :bar, method: :percentage) do |p, d|
  p.x_label 'Categories'
  p.y_label 'Percentage (%)'
end


#### 3. Fraction

In [5]:
dv.plot(type: :bar, method: :fraction) do |p, d|
  p.x_label 'Categories'
  p.y_label 'Fraction'
end
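
The three bar-graph variants differ only in how the bar heights are scaled. As a rough illustration, the heights for the vector above can be computed in plain Ruby without Daru. This is a sketch of the arithmetic, not Daru's implementation; the variable names are made up for the example, and `tally` assumes Ruby 2.7 or later.

```ruby
# Plain-Ruby sketch of the bar heights; no Daru required.
values     = ['III'] * 10 + ['II'] * 5 + ['I'] * 5
categories = ['I', 'II', 'III']

# Count occurrences, keeping the declared category order.
tally  = values.tally
counts = categories.to_h { |cat| [cat, tally.fetch(cat, 0)] }

# Scale the counts to fractions (sum to 1) and percentages (sum to 100).
total       = values.size
fractions   = counts.transform_values { |n| n.to_f / total }
percentages = counts.transform_values { |n| 100.0 * n / total }

categories.each do |cat|
  puts format('%-3s count=%2d fraction=%.2f percentage=%5.1f',
              cat, counts[cat], fractions[cat], percentages[cat])
end
```

All three plots therefore have the same shape; only the y-axis scale changes.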


## Categorical data visualization in DataFrame

### Bar graph

In [6]:
df = Daru::DataFrame.new({
  a: [1, 2, 4, -2, 5, 23, 0],
  b: [3, 1, 3, -6, 2, 1, 0],
  c: ['I', 'II', 'I', 'III', 'I', 'III', 'II']
})
df.to_category :c
df[:c].type

Out[6]:
:category

In [7]:
df.plot(type: :bar, x: :c)


### Scatter plot categorized by categorical variable

Points in a scatter plot can be categorized by:

• Color
• Size
• Shape

In [8]:
df = Daru::DataFrame.new({
  a: [1, 2, 4, -2, 5, 23, 0],
  b: [3, 1, 3, -6, 2, 1, 0],
  c: ['I', 'II', 'I', 'III', 'I', 'III', 'II']
})
df.to_category :c
df[:c].type

Out[8]:
:category

Below are a few examples:

In [9]:
df.plot(type: :scatter, x: :a, y: :b, categorized: {by: :c, method: :color}) do |p, d|
  p.xrange [-10, 10]
  p.yrange [-10, 10]
end

In [10]:
df.plot(type: :scatter, x: :a, y: :b, categorized: {by: :c, method: :shape}) do |p, d|
  p.xrange [-10, 10]
  p.yrange [-10, 10]
end


Custom colors, sizes and shapes can also be specified. For example:

In [11]:
df.plot(type: :scatter, x: :a, y: :b, categorized: {by: :c, method: :color, color: [:red, :blue, :green]}) do |p, d|
  p.xrange [-10, 10]
  p.yrange [-10, 10]
end

In [12]:
df.plot(type: :scatter, x: :a, y: :b, categorized: {by: :c, method: :size, size: [300, 600, 900]}) do |p, d|
  p.xrange [-10, 10]
  p.yrange [-10, 10]
end
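
To make the idea of categorization concrete, here is a plain-Ruby sketch of the grouping that a categorized scatter plot implies: rows are split by the value of `c`, and each group gets its own visual attribute. It does not use Daru and is not Daru's implementation. In particular, pairing the custom values with the categories in order ('I', 'II', 'III') is an assumption made for the illustration, and the names `palette`, `color_for` and `points_by_category` are hypothetical.

```ruby
# Plain-Ruby sketch of grouping points by category; no Daru required.
a = [1, 2, 4, -2, 5, 23, 0]
b = [3, 1, 3, -6, 2, 1, 0]
c = ['I', 'II', 'I', 'III', 'I', 'III', 'II']

palette = [:red, :blue, :green]

# Assumption: custom values are paired with categories in order.
# Here the order of first appearance in c is 'I', 'II', 'III'.
categories = c.uniq
color_for  = categories.zip(palette).to_h

# Split the (x, y) pairs into one series per category.
points_by_category = a.zip(b, c)
                      .group_by { |_x, _y, cat| cat }
                      .transform_values { |rows| rows.map { |x, y, _cat| [x, y] } }

points_by_category.each do |cat, points|
  puts "#{cat} (#{color_for[cat]}): #{points.inspect}"
end
```

The same grouping applies when the attribute is a size or a shape instead of a color; only the list paired with the categories changes.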


### Line plot categorized by categorical variable

Line plots work like the scatter plots above and accept the same options, with one exception: there is no categorization by size. Instead, line plots can be categorized by stroke_width.

In [13]:
df = Daru::DataFrame.new({
  a: [1, 2, 3, 4, 5, 6, 7, 8, 9],
  b: [2, 4, 6, 1, 3, 5, 6, 4, 3],
  c: ['I']*3 + ['II']*3 + ['III']*3
})
df.to_category :c
df[:c].type

Out[13]:
:category

In [14]:
df.plot type: :line, x: :a, y: :b, categorized: {by: :c, method: :color} do end

In [15]:
df.plot type: :line, x: :a, y: :b, categorized: {by: :c, method: :stroke_width} do |p, d|
  p.xrange [-10, 10]
  p.yrange [-10, 10]
end
