# Time Series Functionality in daru

This notebook describes the time series functionality of daru. We'll go through some examples of creating and interacting with a time series, and look at the functionality offered by DateTimeIndex, the specialized index for time series data. We'll also demonstrate a few functions that are particularly useful for analyzing time-based data.

At the end we'll see how a time series can be visualized using the excellent GNU plot gem.

In [1]:
require 'daru'
require 'awesome_print'

Out[1]:
true

For a Daru::Vector or Daru::DataFrame to qualify as a time series, it must be indexed with the Daru::DateTimeIndex class. A DateTimeIndex can be created either with the .date_range method or by calling the class constructor directly.

## Creating DateTimeIndex with .date_range

The DateTimeIndex.date_range function accepts the following options as parameters:

• :start - A DateTime object or date-like string that defines the start of the date range.
• :end - A DateTime object or date-like string that defines the end of the date range.
• :freq - The interval between each date in the index. This can either be a string specifying the frequency (i.e. one of the frequency aliases) or a Daru::Offset object.
• :periods - The number of periods that should go into this index. Takes precedence over :end.
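The interplay between :end and :periods can be illustrated without daru. The following is a minimal plain-Ruby sketch, not daru's implementation: the hypothetical helper `simple_range` steps forward from a start date by a fixed number of days and stops either after `periods` dates or once the end date is passed, with `periods` winning when both are given.

```ruby
require 'date'

# Hypothetical helper (not part of daru): collect DateTimes from `start`, advancing
# by `step` days each time (pass a Rational for sub-day steps). When `periods` is
# given it takes precedence over `stop`, mirroring the :periods/:end rule above.
def simple_range(start:, step:, stop: nil, periods: nil)
  raise ArgumentError, 'pass stop or periods' if stop.nil? && periods.nil?

  dates = []
  current = start
  if periods
    periods.times do
      dates << current
      current += step
    end
  else
    while current <= stop
      dates << current
      current += step
    end
  end
  dates
end

daily = simple_range(start: DateTime.new(2012, 4, 4), stop: DateTime.new(2012, 4, 19), step: 1)
puts daily.size            # 16

capped = simple_range(start: DateTime.new(2012, 4, 4), stop: DateTime.new(2012, 4, 19),
                      step: 1, periods: 3)
puts capped.last.to_date   # 2012-04-06, because periods wins over stop
```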

If you specify the :start and :end options as strings, they can be complete or partial dates, and daru will infer the full date from the string. The string must follow the format YYYY-MM-DD HH:MM:SS, though trailing components may be omitted (for example '2012', '2012-4' or '2012-4-4'). Currently the precision of DateTimeIndex is limited to seconds, though this will improve in the future.

In [2]:
# The code below creates a DateTimeIndex from 2012-4-4 to 2012-4-19 with a daily
# frequency. The 'D' supplied to the :freq argument specifies a daily frequency; it can
# be any of the supported string offset aliases. See the section below for a complete
# overview of date offsets.

index = Daru::DateTimeIndex.date_range(:start => '2012-4-4', :end => '2012-4-19', :freq => 'D')

Out[2]:
#<DateTimeIndex:20856340 offset=D periods=16 data=[2012-04-04T00:00:00+00:00...2012-04-19T00:00:00+00:00]>

As you can see above, .date_range has created a DateTimeIndex with 16 dates (or periods), spaced one day apart.

Converting this index to an Array shows that this is true:

In [3]:
ap index.to_a; nil

[
[ 0] #<DateTime: 2012-04-04T00:00:00+00:00 ((2456022j,0s,0n),+0s,2299161j)>,
[ 1] #<DateTime: 2012-04-05T00:00:00+00:00 ((2456023j,0s,0n),+0s,2299161j)>,
[ 2] #<DateTime: 2012-04-06T00:00:00+00:00 ((2456024j,0s,0n),+0s,2299161j)>,
[ 3] #<DateTime: 2012-04-07T00:00:00+00:00 ((2456025j,0s,0n),+0s,2299161j)>,
[ 4] #<DateTime: 2012-04-08T00:00:00+00:00 ((2456026j,0s,0n),+0s,2299161j)>,
[ 5] #<DateTime: 2012-04-09T00:00:00+00:00 ((2456027j,0s,0n),+0s,2299161j)>,
[ 6] #<DateTime: 2012-04-10T00:00:00+00:00 ((2456028j,0s,0n),+0s,2299161j)>,
[ 7] #<DateTime: 2012-04-11T00:00:00+00:00 ((2456029j,0s,0n),+0s,2299161j)>,
[ 8] #<DateTime: 2012-04-12T00:00:00+00:00 ((2456030j,0s,0n),+0s,2299161j)>,
[ 9] #<DateTime: 2012-04-13T00:00:00+00:00 ((2456031j,0s,0n),+0s,2299161j)>,
[10] #<DateTime: 2012-04-14T00:00:00+00:00 ((2456032j,0s,0n),+0s,2299161j)>,
[11] #<DateTime: 2012-04-15T00:00:00+00:00 ((2456033j,0s,0n),+0s,2299161j)>,
[12] #<DateTime: 2012-04-16T00:00:00+00:00 ((2456034j,0s,0n),+0s,2299161j)>,
[13] #<DateTime: 2012-04-17T00:00:00+00:00 ((2456035j,0s,0n),+0s,2299161j)>,
[14] #<DateTime: 2012-04-18T00:00:00+00:00 ((2456036j,0s,0n),+0s,2299161j)>,
[15] #<DateTime: 2012-04-19T00:00:00+00:00 ((2456037j,0s,0n),+0s,2299161j)>
]


Putting a number before the alias in the :freq option sets the frequency to that multiple of the offset. For example, '6H' means every six hours, so you can create date ranges whose spacing is any multiple of a supported offset.

In [4]:
# The following code creates a range from 2014-5-1 00:00:00 to 2014-5-2 00:00:00,
# with 6 hours between consecutive dates.

index = Daru::DateTimeIndex.date_range(:start => DateTime.new(2014,5,1), :end => DateTime.new(2014,5,2), :freq => '6H')
ap index.to_a; nil

[
[0] #<DateTime: 2014-05-01T00:00:00+00:00 ((2456779j,0s,0n),+0s,2299161j)>,
[1] #<DateTime: 2014-05-01T06:00:00+00:00 ((2456779j,21600s,0n),+0s,2299161j)>,
[2] #<DateTime: 2014-05-01T12:00:00+00:00 ((2456779j,43200s,0n),+0s,2299161j)>,
[3] #<DateTime: 2014-05-01T18:00:00+00:00 ((2456779j,64800s,0n),+0s,2299161j)>,
[4] #<DateTime: 2014-05-02T00:00:00+00:00 ((2456780j,0s,0n),+0s,2299161j)>
]
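To make the "number before the alias" rule concrete, here is a hedged sketch of how a string such as '6H' could be split into a multiplier and a unit and turned into a step. The helper `parse_freq` and the `UNIT_IN_DAYS` table are hypothetical and cover only the fixed-length aliases seen in this notebook ('S', 'H', 'D'); daru's own parsing and its full list of aliases may differ.

```ruby
require 'date'

# Hypothetical lookup: length of one unit as a fraction of a day.
# Calendar-based units such as month ends are deliberately left out.
UNIT_IN_DAYS = {
  'S' => Rational(1, 86_400),
  'H' => Rational(1, 24),
  'D' => Rational(1)
}.freeze

# Split a frequency string like "6H" into [6, "H"]; a bare "H" means a multiplier of 1.
def parse_freq(freq)
  match = /\A(\d*)([A-Z]+)\z/.match(freq)
  raise ArgumentError, "unrecognised frequency #{freq.inspect}" unless match

  multiplier = match[1].empty? ? 1 : match[1].to_i
  [multiplier, match[2]]
end

multiplier, unit = parse_freq('6H')
step = multiplier * UNIT_IN_DAYS.fetch(unit)   # 6/24 of a day

start = DateTime.new(2014, 5, 1)
stop  = DateTime.new(2014, 5, 2)
dates = []
current = start
while current <= stop
  dates << current
  current += step
end
dates.each { |d| puts d.iso8601 }   # five timestamps, six hours apart
```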


The frequency strings that you just saw are translated under the hood into objects from the Daru::Offsets namespace. These offsets determine the distance by which the dates are shifted. See this blog post for detailed coverage of date offsets and their string aliases.

:freq can also accept a Daru::DateOffset object or any of the objects in the Daru::Offsets namespace. For example, to create a date range with a frequency of 6 seconds:

In [5]:
offset = Daru::Offsets::Second.new(6)
index  = Daru::DateTimeIndex.date_range(:start => '2012-5-6', :end => '2012-5-6 20:00:00', :freq => offset)

Out[5]:
#<DateTimeIndex:18324260 offset=6S periods=12001 data=[2012-05-06T00:00:00+00:00...2012-05-06T20:00:00+00:00]>
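The 12001 periods reported above follow from simple arithmetic: 20 hours is 72,000 seconds, which is 12,000 six-second steps, plus one because both endpoints are included. A quick plain-Ruby check using only the standard library:

```ruby
require 'date'

start = DateTime.new(2012, 5, 6, 0, 0, 0)
stop  = DateTime.new(2012, 5, 6, 20, 0, 0)

# DateTime subtraction yields a Rational number of days; convert it to seconds.
elapsed_seconds = ((stop - start) * 86_400).to_i   # 72000
step_seconds    = 6

# Both endpoints are part of the range, hence the +1.
periods = elapsed_seconds / step_seconds + 1
puts periods   # 12001
```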

Another way to specify the extent of the date index is the :periods option. It determines exactly how many dates go into the DateTimeIndex and takes precedence over any date given in the :end option.

So, to create an index of 50 periods starting from '2012-5-2' with a month-end ('ME') frequency:

In [6]:
index = Daru::DateTimeIndex.date_range(:start => '2012-5-2', :periods => 50, :freq => 'ME')

Out[6]:
#<DateTimeIndex:14838220 offset=ME periods=50 data=[2012-05-31T00:00:00+00:00...2016-06-30T00:00:00+00:00]>
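A month-end frequency cannot be expressed as a fixed number of days, which is why the first date above is 2012-05-31 rather than the 2012-05-02 start. The following hedged sketch reproduces the same 50 dates with the standard library: `Date.new(year, month, -1)` gives the last day of a month and `Date#>>` advances by whole months. It illustrates the calendar arithmetic only and is not daru's offset code.

```ruby
require 'date'

start = Date.new(2012, 5, 2)

# Roll the start date forward to the end of its own month (2012-05-31).
first_month_end = Date.new(start.year, start.month, -1)

# Each subsequent month end: jump n months ahead, then take that month's last day.
month_ends = (0...50).map do |n|
  shifted = first_month_end >> n
  Date.new(shifted.year, shifted.month, -1)
end

puts month_ends.first   # 2012-05-31
puts month_ends.last    # 2016-06-30
```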

You can ask for the frequency of an index with the #frequency method.

In [7]:
index.frequency

Out[7]:
"ME"

## Creating a DateTimeIndex with the DateTimeIndex constructor

The DateTimeIndex constructor lets you create a DateTimeIndex even when the dates are not separated by a regular frequency.

In [8]:
index = Daru::DateTimeIndex.new(
[DateTime.new(2012,4,5), DateTime.new(2012,4,6), DateTime.new(2012,4,7), DateTime.new(2012,4,8)])

Out[8]:
#<DateTimeIndex:13228320 offset=nil periods=4 data=[2012-04-05T00:00:00+00:00...2012-04-08T00:00:00+00:00]>

The constructor also accepts an optional :freq option, which takes either a frequency string alias or an offset object. If you want daru to infer the frequency of your data, pass freq: :infer; if no frequency can be inferred, it will be set to nil.

In [9]:
index = Daru::DateTimeIndex.new([
DateTime.new(2012,4,5), DateTime.new(2012,4,6), DateTime.new(2012,4,7), DateTime.new(2012,4,8),
DateTime.new(2012,4,9), DateTime.new(2012,4,10), DateTime.new(2012,4,11), DateTime.new(2012,4,12)
], freq: :infer)

Out[9]:
#<DateTimeIndex:13119080 offset=D periods=8 data=[2012-04-05T00:00:00+00:00...2012-04-12T00:00:00+00:00]>
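One simple way to infer a frequency is to check whether every pair of consecutive dates is separated by the same amount of time. The sketch below does exactly that with a hypothetical `infer_step` helper and returns `nil` when the gaps differ, echoing the behaviour described above. daru's actual inference logic may be more elaborate (for example, recognising month ends, whose gaps vary in length).

```ruby
require 'date'

# Hypothetical helper: returns the common gap (in days, as a Rational) between
# consecutive dates, or nil if the dates are not evenly spaced.
def infer_step(dates)
  return nil if dates.size < 2

  gaps = dates.each_cons(2).map { |a, b| b - a }
  gaps.uniq.size == 1 ? gaps.first : nil
end

daily = (5..12).map { |d| DateTime.new(2012, 4, d) }
puts infer_step(daily)      # 1/1, i.e. one day

irregular = [DateTime.new(2012, 4, 5), DateTime.new(2012, 4, 6), DateTime.new(2012, 4, 9)]
p infer_step(irregular)     # nil
```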

## DateTimeIndex methods

DateTimeIndex offers a host of methods for manipulating the index and learning more about the dates it contains. Let us demonstrate them on a sample DateTimeIndex:

In [10]:
index = Daru::DateTimeIndex.date_range(:start => '2012', :periods => 10, :freq => 'YEAR')

Out[10]:
#<DateTimeIndex:11607200 offset=YEAR periods=10 data=[2012-01-01T00:00:00+00:00...2021-01-01T00:00:00+00:00]>

You can get a Ruby Array containing the year of each date in the index with the #year method:

In [11]:
index.year

Out[11]:
[2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021]

Similarly you can query for #month, #day, #hour, #min or #sec using the respective methods.
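Conceptually, #year and its siblings are a per-date component lookup. A plain-Ruby equivalent over an Array of DateTimes (a sketch of the idea, not daru's implementation) is just a `map`:

```ruby
require 'date'

# Ten yearly dates starting 2012-01-01, built by advancing 12 months at a time.
dates = (0...10).map { |n| DateTime.new(2012, 1, 1) >> (12 * n) }

years  = dates.map(&:year)    # 2012 through 2021
months = dates.map(&:month)   # every date falls in January
hours  = dates.map(&:hour)    # every date is at midnight

p years
```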

To move all the dates in a DateTimeIndex forward in time, use the #shift method; to move them all backward, use the #lag method.

Passing an offset to #shift moves each date by the offset value:

In [12]:
index.shift(Daru::Offsets::Hour.new(3))

Out[12]:
#<DateTimeIndex:12276600 offset=nil periods=10 data=[2012-01-01T03:00:00+00:00...2021-01-01T03:00:00+00:00]>

Passing a positive integer to #shift moves each date by that many multiples of the offset the index was created with (here, years):

In [13]:
index.shift(4) # Shift by 4 years

Out[13]:
#<DateTimeIndex:12095520 offset=YEAR periods=10 data=[2016-01-01T00:00:00+00:00...2025-01-01T00:00:00+00:00]>

#lag works in a similar manner:

In [14]:
index.lag(Daru::DateOffset.new(days: 4))

Out[14]:
#<DateTimeIndex:11703700 offset=nil periods=10 data=[2011-12-28T00:00:00+00:00...2020-12-28T00:00:00+00:00]>

In [15]:
index.lag(2)

Out[15]:
#<DateTimeIndex:10628980 offset=YEAR periods=10 data=[2010-01-01T00:00:00+00:00...2019-01-01T00:00:00+00:00]>
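The difference between passing an integer and passing an offset can be sketched with plain DateTimes. In the hypothetical `shift_dates` helper below, an Integer means "that many steps of the index's own yearly frequency" (12 months each, via `Date#>>`), while a Rational is treated as a fixed offset in days; a lag is simply a shift by the negated amount. This mirrors the outputs above but is an illustration under those assumptions, not daru's code.

```ruby
require 'date'

# Hypothetical helper. `by` is either an Integer (number of yearly periods) or a
# Rational number of days (a fixed offset such as 3 hours = Rational(3, 24)).
def shift_dates(dates, by)
  if by.is_a?(Integer)
    dates.map { |d| d >> (12 * by) }   # move by whole years
  else
    dates.map { |d| d + by }           # move by a fixed span of time
  end
end

# A lag is a shift in the opposite direction.
def lag_dates(dates, by)
  shift_dates(dates, -by)
end

index = (0...10).map { |n| DateTime.new(2012, 1, 1) >> (12 * n) }

puts shift_dates(index, Rational(3, 24)).first.iso8601  # 2012-01-01T03:00:00+00:00
puts shift_dates(index, 4).first.iso8601                # 2016-01-01T00:00:00+00:00
puts lag_dates(index, Rational(4)).first.iso8601        # 2011-12-28T00:00:00+00:00
puts lag_dates(index, 2).first.iso8601                  # 2010-01-01T00:00:00+00:00
```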

## Using DateTimeIndex with Vector and DataFrame

When used with Daru::Vector or Daru::DataFrame, DateTimeIndex functions exactly like any other index. You can query individual dates, slices, etc. and retrieve the relevant data by specifying the date either completely or partially.

One of the salient features of indexing time-based data with DateTimeIndex is that it lets you retrieve the data for a given time period by specifying just a partial date. Let's see how this works with some examples.

To start, let's create a basic Daru::Vector indexed with a DateTimeIndex.

In [16]:
index = Daru::DateTimeIndex.date_range(:start => '2012-3-4', :periods => 50000, :freq => 'H')
vector = Daru::Vector.new([1,2,3,4,5]*10000, index: index)

Out[16]:
Daru::Vector:36599420 size: 10
nil
2012-03-04T00:00:00+00:00  1
2012-03-04T01:00:00+00:00  2
2012-03-04T02:00:00+00:00  3
2012-03-04T03:00:00+00:00  4
2012-03-04T04:00:00+00:00  5
2012-03-04T05:00:00+00:00  1
2012-03-04T06:00:00+00:00  2
2012-03-04T07:00:00+00:00  3
2012-03-04T08:00:00+00:00  4
2012-03-04T09:00:00+00:00  5

You can retrieve data by specifying the date completely or partially. Specifying it partially retrieves all the data that falls within that time period. For example, to retrieve all the data that falls in April 2012 (that's '2012-4'):

In [17]:
vector['2012-4']

Out[17]:
Daru::Vector:37106300 size: 720
nil
2012-04-01T00:00:00+00:00  3
2012-04-01T01:00:00+00:00  4
2012-04-01T02:00:00+00:00  5
2012-04-01T03:00:00+00:00  1
2012-04-01T04:00:00+00:00  2
2012-04-01T05:00:00+00:00  3
2012-04-01T06:00:00+00:00  4
2012-04-01T07:00:00+00:00  5
2012-04-01T08:00:00+00:00  1
2012-04-01T09:00:00+00:00  2
2012-04-01T10:00:00+00:00  3
2012-04-01T11:00:00+00:00  4
2012-04-01T12:00:00+00:00  5
2012-04-01T13:00:00+00:00  1
2012-04-01T14:00:00+00:00  2
2012-04-01T15:00:00+00:00  3
2012-04-01T16:00:00+00:00  4
2012-04-01T17:00:00+00:00  5
2012-04-01T18:00:00+00:00  1
2012-04-01T19:00:00+00:00  2
2012-04-01T20:00:00+00:00  3
2012-04-01T21:00:00+00:00  4
2012-04-01T22:00:00+00:00  5
2012-04-01T23:00:00+00:00  1
2012-04-02T00:00:00+00:00  2
2012-04-02T01:00:00+00:00  3
2012-04-02T02:00:00+00:00  4
2012-04-02T03:00:00+00:00  5
2012-04-02T04:00:00+00:00  1
2012-04-02T05:00:00+00:00  2
2012-04-02T06:00:00+00:00  3
2012-04-02T07:00:00+00:00  4
...  ...
2012-04-30T23:00:00+00:00  2

As you can see, only the data indexed in April 2012 was retrieved.
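One way to think about partial-date lookup is that each partial string names a span of time: '2012-4' covers the half-open interval from 2012-04-01 up to 2012-05-01, '2013' covers the whole of 2013, and so on. The plain-Ruby sketch below builds the same hourly index and counts how many timestamps fall inside each span. The `period_bounds` and `count_in` helpers are hypothetical, handle only the "YYYY", "YYYY-M", "YYYY-M-D" and "YYYY-M-D H" forms, and are not daru's parser; the counts match the sizes of the lookups in this section.

```ruby
require 'date'

# Hypothetical helper: turn "YYYY", "YYYY-M", "YYYY-M-D" or "YYYY-M-D H" into a
# half-open [from, to) interval whose width matches the precision given.
def period_bounds(str)
  date_part, hour_part = str.split(' ')
  year, month, day = date_part.split('-').map(&:to_i)

  if hour_part
    from = DateTime.new(year, month, day, hour_part.to_i)
    [from, from + Rational(1, 24)]   # one hour
  elsif day
    from = DateTime.new(year, month, day)
    [from, from + 1]                 # one day
  elsif month
    from = DateTime.new(year, month, 1)
    [from, from >> 1]                # one calendar month
  else
    from = DateTime.new(year, 1, 1)
    [from, from >> 12]               # one calendar year
  end
end

# Count the timestamps of `index` that fall inside the span named by `str`.
def count_in(index, str)
  from, to = period_bounds(str)
  index.count { |t| t >= from && t < to }
end

# The same hourly index as in the example: 50,000 periods from 2012-03-04.
start = DateTime.new(2012, 3, 4)
index = Array.new(50_000) { |i| start + Rational(i, 24) }

puts count_in(index, '2012-4')       # 720
puts count_in(index, '2013')         # 8760
puts count_in(index, '2013-2-4')     # 24
puts count_in(index, '2013-2-4 22')  # 1
```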

Now, say you want all the data under the year 2013. You can just specify the year as a string:

In [18]:
vector['2013']

Out[18]:
Daru::Vector:38535340 size: 8760
nil
2013-01-01T00:00:00+00:00  3
2013-01-01T01:00:00+00:00  4
2013-01-01T02:00:00+00:00  5
2013-01-01T03:00:00+00:00  1
2013-01-01T04:00:00+00:00  2
2013-01-01T05:00:00+00:00  3
2013-01-01T06:00:00+00:00  4
2013-01-01T07:00:00+00:00  5
2013-01-01T08:00:00+00:00  1
2013-01-01T09:00:00+00:00  2
2013-01-01T10:00:00+00:00  3
2013-01-01T11:00:00+00:00  4
2013-01-01T12:00:00+00:00  5
2013-01-01T13:00:00+00:00  1
2013-01-01T14:00:00+00:00  2
2013-01-01T15:00:00+00:00  3
2013-01-01T16:00:00+00:00  4
2013-01-01T17:00:00+00:00  5
2013-01-01T18:00:00+00:00  1
2013-01-01T19:00:00+00:00  2
2013-01-01T20:00:00+00:00  3
2013-01-01T21:00:00+00:00  4
2013-01-01T22:00:00+00:00  5
2013-01-01T23:00:00+00:00  1
2013-01-02T00:00:00+00:00  2
2013-01-02T01:00:00+00:00  3
2013-01-02T02:00:00+00:00  4
2013-01-02T03:00:00+00:00  5
2013-01-02T04:00:00+00:00  1
2013-01-02T05:00:00+00:00  2
2013-01-02T06:00:00+00:00  3
2013-01-02T07:00:00+00:00  4
...  ...
2013-12-31T23:00:00+00:00  2

Passing a string to #[] evaluates it to the greatest possible accuracy and then retrieves the relevant data. Now say you want the data that happens to be on 4th February 2013. Just specify this as a string:

In [19]:
vector['2013-2-4']

Out[19]:
Daru::Vector:40565680 size: 24
nil
2013-02-04T00:00:00+00:00  4
2013-02-04T01:00:00+00:00  5
2013-02-04T02:00:00+00:00  1
2013-02-04T03:00:00+00:00  2
2013-02-04T04:00:00+00:00  3
2013-02-04T05:00:00+00:00  4
2013-02-04T06:00:00+00:00  5
2013-02-04T07:00:00+00:00  1
2013-02-04T08:00:00+00:00  2
2013-02-04T09:00:00+00:00  3
2013-02-04T10:00:00+00:00  4
2013-02-04T11:00:00+00:00  5
2013-02-04T12:00:00+00:00  1
2013-02-04T13:00:00+00:00  2
2013-02-04T14:00:00+00:00  3
2013-02-04T15:00:00+00:00  4
2013-02-04T16:00:00+00:00  5
2013-02-04T17:00:00+00:00  1
2013-02-04T18:00:00+00:00  2
2013-02-04T19:00:00+00:00  3
2013-02-04T20:00:00+00:00  4
2013-02-04T21:00:00+00:00  5
2013-02-04T22:00:00+00:00  1
2013-02-04T23:00:00+00:00  2

Passing a string accurate up to the hour returns precisely one data point, because the index has an hourly frequency, so an hour is the finest period it distinguishes.

In [20]:
vector['2013-2-4 22']

Out[20]:
1

For specifying dates precisely, it is even possible to pass a DateTime object into #[]:

In [21]:
vector[DateTime.new(2012,5,1)]

Out[21]:
3
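Because this index has a fixed hourly frequency, the position of any exact timestamp can also be computed arithmetically rather than searched for. The sketch below is a hypothetical illustration, not necessarily how daru performs the lookup: it converts the elapsed time since the first timestamp into an hour count and uses that to read from the repeating `[1, 2, 3, 4, 5]` data, reproducing the two single values returned above.

```ruby
require 'date'

START   = DateTime.new(2012, 3, 4)
VALUES  = [1, 2, 3, 4, 5].freeze   # the pattern repeated 10,000 times in the example vector
PERIODS = 50_000

# Hypothetical lookup for a regular hourly index: elapsed days * 24 = position.
def value_at(time)
  elapsed_hours = (time - START) * 24
  raise KeyError, "#{time} is not on the hourly grid" unless elapsed_hours.denominator == 1

  position = elapsed_hours.to_i
  raise KeyError, "#{time} is outside the index" unless (0...PERIODS).cover?(position)

  VALUES[position % VALUES.size]
end

puts value_at(DateTime.new(2012, 5, 1))       # 3
puts value_at(DateTime.new(2013, 2, 4, 22))   # 1
```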

DateTimeIndex can be used with a DataFrame the same way it was used with a Vector. Both the rows and the columns of a DataFrame can be indexed with a DateTimeIndex; here we index the rows:

In [22]:
index = Daru::DateTimeIndex.date_range(:start => '2012-4-5', :periods => 50, :freq => 'D')
df = Daru::DataFrame.new({
a: [1,2,3,4,5]*10,
b: ['a','b','c','d','e']*10,
c: ['foo', 'bar','baz','razz','jazz']*10
}, index: index)

Out[22]:
Daru::DataFrame:40773500 rows: 50 cols: 3
a  b  c
2012-04-05T00:00:00+00:00  1  a  foo
2012-04-06T00:00:00+00:00  2  b  bar
2012-04-07T00:00:00+00:00  3  c  baz
2012-04-08T00:00:00+00:00  4  d  razz
2012-04-09T00:00:00+00:00  5  e  jazz
2012-04-10T00:00:00+00:00  1  a  foo
2012-04-11T00:00:00+00:00  2  b  bar
2012-04-12T00:00:00+00:00  3  c  baz
2012-04-13T00:00:00+00:00  4  d  razz
2012-04-14T00:00:00+00:00  5  e  jazz
2012-04-15T00:00:00+00:00  1  a  foo
2012-04-16T00:00:00+00:00  2  b  bar
2012-04-17T00:00:00+00:00  3  c  baz
2012-04-18T00:00:00+00:00  4  d  razz
2012-04-19T00:00:00+00:00  5  e  jazz
2012-04-20T00:00:00+00:00  1  a  foo
2012-04-21T00:00:00+00:00  2  b  bar
2012-04-22T00:00:00+00:00  3  c  baz
2012-04-23T00:00:00+00:00  4  d  razz
2012-04-24T00:00:00+00:00  5  e  jazz
2012-04-25T00:00:00+00:00  1  a  foo
2012-04-26T00:00:00+00:00  2  b  bar
2012-04-27T00:00:00+00:00  3  c  baz
2012-04-28T00:00:00+00:00  4  d  razz
2012-04-29T00:00:00+00:00  5  e  jazz
2012-04-30T00:00:00+00:00  1  a  foo
2012-05-01T00:00:00+00:00  2  b  bar
2012-05-02T00:00:00+00:00  3  c  baz
2012-05-03T00:00:00+00:00  4  d  razz
2012-05-04T00:00:00+00:00  5  e  jazz
2012-05-05T00:00:00+00:00  1  a  foo
2012-05-06T00:00:00+00:00  2  b  bar
...  ...  ...  ...
2012-05-24T00:00:00+00:00  5  e  jazz

Rows can be retrieved using a syntax similar to that of Daru::Vector:

In [23]:
df.row['2012-5']

Out[23]:
Daru::DataFrame:40777240 rows: 24 cols: 3
a  b  c
2012-05-01T00:00:00+00:00  2  b  bar
2012-05-02T00:00:00+00:00  3  c  baz
2012-05-03T00:00:00+00:00  4  d  razz
2012-05-04T00:00:00+00:00  5  e  jazz
2012-05-05T00:00:00+00:00  1  a  foo
2012-05-06T00:00:00+00:00  2  b  bar
2012-05-07T00:00:00+00:00  3  c  baz
2012-05-08T00:00:00+00:00  4  d  razz
2012-05-09T00:00:00+00:00  5  e  jazz
2012-05-10T00:00:00+00:00  1  a  foo
2012-05-11T00:00:00+00:00  2  b  bar
2012-05-12T00:00:00+00:00  3  c  baz
2012-05-13T00:00:00+00:00  4  d  razz
2012-05-14T00:00:00+00:00  5  e  jazz
2012-05-15T00:00:00+00:00  1  a  foo
2012-05-16T00:00:00+00:00  2  b  bar
2012-05-17T00:00:00+00:00  3  c  baz
2012-05-18T00:00:00+00:00  4  d  razz
2012-05-19T00:00:00+00:00  5  e  jazz
2012-05-20T00:00:00+00:00  1  a  foo
2012-05-21T00:00:00+00:00  2  b  bar
2012-05-22T00:00:00+00:00  3  c  baz
2012-05-23T00:00:00+00:00  4  d  razz
2012-05-24T00:00:00+00:00  5  e  jazz
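Row selection on a DataFrame follows the same span logic as the Vector examples. As a hedged, dependency-free illustration (not daru's DataFrame), the sketch below stores the three columns as plain Arrays alongside a daily Array of Dates and keeps the rows whose date falls in May 2012. This yields the same 24 rows as `df.row['2012-5']`.

```ruby
require 'date'

# Plain-Ruby stand-in for the example DataFrame: one Array per column.
dates = Array.new(50) { |i| Date.new(2012, 4, 5) + i }
columns = {
  a: [1, 2, 3, 4, 5] * 10,
  b: %w[a b c d e] * 10,
  c: %w[foo bar baz razz jazz] * 10
}

# Row positions whose date lies in May 2012.
may_rows = dates.each_index.select { |i| dates[i].year == 2012 && dates[i].month == 5 }

# Collect each selected row as [date, a, b, c].
selected = may_rows.map { |i| [dates[i], columns[:a][i], columns[:b][i], columns[:c][i]] }

puts selected.size          # 24
p selected.first.drop(1)    # [2, "b", "bar"]
```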