Prerequisites
The `datatable` package should be upgraded to the latest version (or v1.0.0+).
!python3 -m pip install -U pip
!python3 -m pip install -U datatable
!wget https://raw.githubusercontent.com/vopani/datatableton/main/data/datatableton_sample.csv
import datatable as dt
data = dt.fread('datatableton_sample.csv')
del data[:, ['age', 'category']]
data
| | timestamp | user | product | price | quantity |
|---|---|---|---|---|---|
| | ▪▪▪▪▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪ | ▪▪▪▪▪▪▪▪ | ▪▪▪▪ |
| 0 | 2017-01-01T13:22:41 | U1 | Eggs | 2.3 | 2 |
| 1 | 2017-05-22T09:54:21 | U2 | Bread | 0.6 | 2 |
| 2 | 2018-11-09T15:00:01 | U3 | Banana | 0.7 | 1 |
| 3 | 2018-12-24T23:03:33 | U1 | Water | 0.2 | 1 |
| 4 | 2019-03-03T06:21:58 | U4 | Eggs | 2.3 | 2 |
| 5 | 2019-06-17T16:13:39 | U5 | Grapes | 1.5 | 1 |
| 6 | 2019-07-28T21:03:11 | U5 | Bread | 0.6 | 1 |
| 7 | 2019-12-05T04:15:42 | U1 | Banana | 0.7 | 2 |
| 8 | 2020-02-02T03:45:34 | U3 | Banana | 0.7 | 1 |
| 9 | 2020-03-05T07:09:12 | U4 | Grapes | 1.5 | 1 |
| 10 | 2020-03-22T19:29:38 | U1 | Water | 0.2 | 1 |
| 11 | 2020-03-30T09:44:30 | U4 | Water | 0.2 | 1 |
| 12 | 2020-04-01T13:21:41 | U1 | Banana | 0.7 | 2 |
| 13 | 2020-07-08T11:45:25 | U2 | Grapes | 1.5 | 1 |
| 14 | 2020-11-19T18:51:22 | U5 | Water | 0.2 | 1 |
| 15 | 2020-12-03T16:23:48 | U3 | Banana | 0.5 | 3 |
| 16 | 2021-02-03T01:14:40 | U5 | Eggs | 2.1 | 4 |
| 17 | 2021-05-26T22:42:15 | U3 | Bread | 0.6 | 1 |
| 18 | 2021-06-14T15:49:28 | U4 | Eggs | 2.3 | 2 |
| 19 | 2021-07-01T04:37:31 | U4 | Water | 0.3 | 1 |
Exercise 71: Extract the date from `timestamp` and add it as a column `date` in `data`
Exercise 72: Extract the year, month, day and day-of-week from `date` and add them as new columns `year`, `month`, `day` and `day_of_week` in `data`
Exercise 73: Extract the hour, minute and second from `timestamp` and add them as new columns `hour`, `minute` and `second` in `data`
Exercise 74: Create a date column `date_new` in `data` using the `year`, `month` and `day` columns
Exercise 75: Create a time column `time_new` in `data` using the `year`, `month`, `day`, `hour`, `minute` and `second` columns
Exercise 76: Create a lead variable of `product` with shift of 2 called `next_product_2` in `data`
Exercise 77: Create a lag variable of `date` with single shift called `previous_date` in `data`
Exercise 78: Create a lag variable of `timestamp` by `user` with single shift called `previous_user_time` in `data`
Exercise 79: Create a column `date_diff` which is the difference between `date` and `previous_date` in days
Exercise 80: Create a column `time_diff` which is the difference between `timestamp` and `previous_user_time` in seconds