# A Glimpse of Daru::Vector

In daru, Daru::Vector is a one-dimensional array with axis labels.

Labels should be unique. The object supports both integer- and label-based indexing and provides a host of methods for performing operations involving the index. Statistical methods automatically exclude missing data (currently represented by default as nil).

Operations between Vectors (+, -, *, /, **) align values based on their associated index values. The Vectors need not be of the same length. The result index will be the sorted union of the two indexes.

Daru::Vector is similar to pandas.Series.

The examples below demonstrate how a simple Daru::Vector can be created and its data viewed.

This first example shows very basic creation of a Vector with missing data (represented by nil).

In [1]:
require 'daru'

Out[1]:
true

The most basic way to create a Vector is to pass an Array of values to the constructor.

Index labels can be specified using the :index option, and the Vector can be named using the :name option. If :index isn't specified, the Vector is assigned an integer index starting from 0.

In [2]:
a = Daru::Vector.new([1,2,3,4,5], index: [:a, :b, :c, :d, :e], name: :bazinga)

Out[2]:
Daru::Vector:30420060 size: 5
bazinga
a  1
b  2
c  3
d  4
e  5

Values can be accessed using their labels with the #[] operator.

In [3]:
a[:b]

Out[3]:
2

You can also specify a range of labels.

In [4]:
a[:b..:d]

Out[4]:
Daru::Vector:29850660 size: 3
bazinga
b  2
c  3
d  4

Values can be assigned with the #[]= operator.

In [5]:
a[:b] = 999
a

Out[5]:
Daru::Vector:30420060 size: 5
bazinga
a  1
b  999
c  3
d  4
e  5

If you want to treat values apart from nil as missing, you can specify them using the :missing_values option.

The #only_valid method can then be used for obtaining all the non-missing values of the Vector. Notice that only_valid preserves the indexes (labels) of the data.
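As a rough illustration of these semantics, the following plain-Ruby sketch drops the values listed as missing while keeping each surviving value's original position as its label. It does not use daru and is not daru's implementation; the names `values`, `missing` and `valid` are made up for this sketch.

```ruby
# Plain-Ruby sketch of "keep only valid values, preserve their index".
values  = [1, 2, 3, 5, 5, 4, 6, nil, nil]
missing = [5, nil] # values to treat as missing (an assumption mirroring the example)

# Pair each value with its position, drop the missing ones,
# and store the survivors as position => value.
valid = values.each_with_index
              .reject { |value, _index| missing.include?(value) }
              .map { |value, index| [index, value] }
              .to_h

valid.each { |index, value| puts "#{index}  #{value}" }
```

The surviving labels are 0, 1, 2, 5 and 6: the positions of the dropped values are skipped rather than renumbered.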

In [6]:
a = Daru::Vector.new([1,2,3,5,5,4,6,nil,nil], missing_values: [5,nil])
a.only_valid

Out[6]:
Daru::Vector:29284260 size: 5
nil
0  1
1  2
2  3
5  4
6  6

The Vector.[] class method creates a Vector from almost any object that has a #to_a method defined on it. It is similar to R's c() function.
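To make the idea concrete, here is a small plain-Ruby sketch of how arguments that respond to #to_a (such as a Range) could be expanded into one flat list of values. The helper `flatten_args` is hypothetical and only illustrates the behaviour described above; how daru actually implements Vector.[] may differ.

```ruby
# Hypothetical helper: expand every argument that responds to #to_a,
# and keep any other argument as a single element.
def flatten_args(*args)
  args.flat_map { |arg| arg.respond_to?(:to_a) ? arg.to_a : [arg] }
end

flat = flatten_args(1, 2, 3, 4, 6..10)
p flat
```

Here the Range 6..10 contributes five elements, so the flattened list has nine values in total.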

In [7]:
b = Daru::Vector[1,2,3,4,6..10]

Out[7]:
Daru::Vector:28825140 size: 9
nil
0  1
1  2
2  3
3  4
4  6
5  7
6  8
7  9
8  10

The new_with_size class method lets you create a Daru::Vector by specifying the size as the argument. The optional block, if supplied, is run once for populating each element in the Vector.

The result of each run of the block is the value that is ultimately assigned to that position in the Vector.
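The same "run the block once per position" pattern exists in plain Ruby as `Array.new(size) { ... }`. The sketch below uses it only to illustrate how the block's return value fills each slot; it does not use daru, and the variable names are made up for this sketch.

```ruby
# Plain-Ruby analogue: the block runs once per element, and its
# return value becomes the element at that position.
size = 1000
values = Array.new(size) do
  r = rand(5)      # random integer in 0..4
  r == 4 ? nil : r # treat 4 as "missing"
end

puts values.first(10).inspect
puts "missing: #{values.count(&:nil?)}"
```

Because the values are random, the printed output differs between runs.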

In [8]:
a = Daru::Vector.new_with_size(1000, name: :new_vector) { r = rand(5); r == 4 ? nil : r }

Out[8]:
Daru::Vector:28500640 size: 1000
new_vector
0  2
1  3
2  0
3  nil
4  nil
5  2
6  2
7  2
8  1
9  1
10  3
11  0
12  2
13  0
14  3
15  1
16  3
17  1
18  3
19  0
20  1
21  1
22  2
23  2
24  2
25  3
26  3
27  2
28  0
29  nil
30  3
31  3
...  ...
999  3

Use the #head method to obtain the first 10 values of the Vector.

In [9]:
a.head

Out[9]:
Daru::Vector:27175540 size: 10
new_vector
0  2
1  3
2  0
3  nil
4  nil
5  2
6  2
7  2
8  1
9  1

### Sorting

The Daru::Vector#sort method will sort the Vector and preserve the indexes.
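"Preserving the indexes" means each value keeps its original label as it moves to its sorted position. A minimal plain-Ruby sketch of that idea, not daru's implementation and with made-up variable names, is:

```ruby
# Pair every value with its original position, then sort the pairs by value.
values = [23, 144, 332, 11, 2, 5, 6765, 3]
sorted_pairs = values.each_with_index.sort_by { |value, _index| value }

# Print "original index  value" in sorted order.
sorted_pairs.each { |value, index| puts "#{index}  #{value}" }
```

The smallest value, 2, still carries its original index 4, so the index column of the sorted result is no longer in ascending order.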

In [10]:
a = Daru::Vector.new([23,144,332,11,2,5,6765,3])

Out[10]:
Daru::Vector:25317760 size: 8
nil
0  23
1  144
2  332
3  11
4  2
5  5
6  6765
7  3

In [11]:
a.sort

Out[11]:
Daru::Vector:24840120 size: 8
nil
4  2
7  3
5  5
3  11
0  23
1  144
2  332
6  6765

### Basic Math

Arithmetic operations done between two vectors will always perform the arithmetic on corresponding elements of the same index.

The vectors involved need not have the same size or even the same index. In case of a mismatch, the sorted union of both Vectors' indexes is used as the index of the resulting Vector.

In case a particular index exists in one vector but not in the other, the result Vector has a nil placed in that index position.

Daru::Vector supports +, -, *, / and ** operators.
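The alignment rule can be sketched in plain Ruby by modelling each vector as a Hash of label => value. This is an illustration of the semantics described above, not daru's implementation; the helper `aligned_op` is hypothetical.

```ruby
# Hypothetical helper: apply a binary operation label by label.
def aligned_op(left, right)
  labels = (left.keys | right.keys).sort # sorted union of both indexes
  labels.map do |label|
    l = left[label]
    r = right[label]
    # A label missing from either side yields nil in the result.
    [label, l.nil? || r.nil? ? nil : yield(l, r)]
  end.to_h
end

a = { a: 1, b: 2, c: 3, d: 4, five: 5, f: 6 }
b = { a: 1, b: 2, c: 3, ff: 4, five: 5 }

sum = aligned_op(a, b) { |x, y| x + y }
sum.each { |label, value| puts "#{label}  #{value.inspect}" }
```

Labels :d, :f and :ff each exist in only one input, so they map to nil, while :five is present in both and is added normally.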

In [12]:
a = Daru::Vector.new([1,2,3,4,5,6], index: [:a, :b, :c, :d, :five, :f])
b = Daru::Vector.new([1,2,3,4,5], index: [:a, :b, :c, :ff,:five])

a + b

Out[12]:
Daru::Vector:24525720 size: 7
nil
a  2
b  4
c  6
d  nil
f  nil
ff  nil
five  10

In [13]:
a ** b

Out[13]:
Daru::Vector:24243560 size: 7
nil
a  1
b  4
c  27
d  nil
f  nil
ff  nil
five  3125

Performing arithmetic with a single number will perform the operation on each element in the Vector and return the resultant Vector.

In [14]:
a * 5

Out[14]:
Daru::Vector:23813900 size: 6
nil
a  5
b  10
c  15
d  20
five  25
f  30

### Statistics

Daru::Vector defines a host of statistics methods, which are useful for computing quick summary statistics on numeric data. All the statistics methods ignore missing values and work only on the valid data.

For a complete list of statistics functions see the Daru::Maths::Statistics::Vector module in the docs.
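To show what "ignoring missing values" means numerically, the plain-Ruby sketch below drops the nils first and then computes the mean, variance and median by hand. It assumes the variance is the sample variance (dividing by n - 1); this is an assumption of the sketch, not a statement about daru's internals.

```ruby
values = [1, 2, 3, 4, 5, nil, 6, nil, 7]
valid  = values.compact # drop nil before computing anything

mean = valid.sum.to_f / valid.size

# Sample variance: squared deviations divided by (n - 1).
variance = valid.sum { |x| (x - mean)**2 } / (valid.size - 1)

# Median: middle element, or the average of the two middle elements.
sorted = valid.sort
mid    = sorted.size / 2
median = sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0

puts "mean: #{mean}, variance: #{variance}, median: #{median}"
```

Only the seven non-nil values enter each calculation, so the mean is 28 / 7 and the variance is 28 / 6.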

In [15]:
v = Daru::Vector.new([1,2,3,4,5,nil,6,nil,7])
v.mean

Out[15]:
4.0

In [16]:
v.variance

Out[16]:
4.666666666666667

In [17]:
v.median

Out[17]:
4

### Plotting

Daru uses nyaplot internally for generating interactive plots.

You can also use rubyvis through statsample for quickly generating scatter plots, histograms and box plots.

A scatter plot can be generated by simply calling the #plot method on a Daru::Vector. Feel free to interact with the generated plot.

In [18]:
v = Daru::Vector.new((0..360).step(7).map { |i| Math.sin((i*Math::PI)/180) })
v.plot


Now, let's take some dummy survey data showing the number of people in each age group who took part in the survey. We want to plot the number of people from each age group as a bar graph.

For this purpose we use the #plot method again, but this time supply it with the :type option set to :bar. The plot method yields the corresponding Nyaplot::Plot object to the block, which can then be used for setting different parameters of the final plot. For more configuration methods see the Nyaplot::Plot documentation.

In [19]:
v = Daru::Vector.new([40,50,20,70,10], index: ['18-24', '24-30', 'Under 18', '30-40', '40-50'], name: "Age Range")
v.plot(type: :bar) do |plt|
  plt.x_label "Age Groups"
  plt.y_label "Number of People Surveyed"
end


The third kind of plot that Daru::Vector can easily generate from nyaplot is the histogram.

To demonstrate, we'll prepare some sample data using the rnorm function from the statsample Ruby gem. The rnorm function generates normally distributed random values (1000 in this case) and returns a Daru::Vector object containing them (stored in variable a).

A histogram of the normally distributed data is generated below.

In [20]:
require 'statsample'
include Statsample::Shorthand

a = rnorm(1000)
a.plot type: :histogram do |p|
  p.yrange [0,200]
  p.y_label "Frequency"
  p.x_label "Bins"
end


#### More plotting support

Apart from interfacing with nyaplot, Daru::Vector also works out of the box with rubyvis through statsample. To see plot generation with statsample and rubyvis in action, check out the following notebooks: