Tutorial hidrokit.prep.timeseries

  • Category: data preparation
  • Purpose: Manipulate a timeseries dataset for use in machine learning / ANN
  • Documentation: readthedocs

Notebook information

  • notebook name: taruma_hidrokit_prep_timeseries
  • notebook version/date : 1.0.1/20190713
  • notebook server: Google Colab
  • hidrokit version: 0.2.0
  • python version: 3.7

Installing hidrokit

In [0]:
### Install via PyPI
!pip install hidrokit

### Install via GitHub
# !pip install git+https://github.com/taruma/hidrokit.git

### Install via GitHub (latest)
# !pip install git+https://github.com/taruma/[email protected]
Collecting hidrokit
  Downloading https://files.pythonhosted.org/packages/43/9d/343d2a413a07463a21dd13369e31d664d6733bbfd46276abef5d804c83d1/hidrokit-0.2.0-py2.py3-none-any.whl
Requirement already satisfied: numpy in /usr/local/lib/python3.6/dist-packages (from hidrokit) (1.16.4)
Requirement already satisfied: pandas in /usr/local/lib/python3.6/dist-packages (from hidrokit) (0.24.2)
Requirement already satisfied: matplotlib in /usr/local/lib/python3.6/dist-packages (from hidrokit) (3.0.3)
Requirement already satisfied: python-dateutil>=2.5.0 in /usr/local/lib/python3.6/dist-packages (from pandas->hidrokit) (2.5.3)
Requirement already satisfied: pytz>=2011k in /usr/local/lib/python3.6/dist-packages (from pandas->hidrokit) (2018.9)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib->hidrokit) (2.4.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib->hidrokit) (1.1.0)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.6/dist-packages (from matplotlib->hidrokit) (0.10.0)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.6/dist-packages (from python-dateutil>=2.5.0->pandas->hidrokit) (1.12.0)
Requirement already satisfied: setuptools in /usr/local/lib/python3.6/dist-packages (from kiwisolver>=1.0.1->matplotlib->hidrokit) (41.0.1)
Installing collected packages: hidrokit
Successfully installed hidrokit-0.2.0

Import libraries

In [0]:
import numpy as np
import pandas as pd

Dataset

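The dataset has seven features (a, b, c, d, e, f, g), with one row per day of 2019 (365 rows) filled with random values from 0 to 100, rounded to whole numbers.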

In [0]:
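# Build the dataset with numpy: one row per day of 2019, seven random columns
np.random.seed(110891)  # fixed seed so the values are reproducible
date_index = pd.date_range('20190101', '20191231')  # 365 daily dates
data = np.random.rand(len(date_index), 7) * 100  # uniform values in [0, 100)
columns = 'a b c d e f g'.split()
dataset = pd.DataFrame(
    data=data.round(),  # round to whole numbers
    columns=columns,
    index=date_index.strftime('%Y-%b-%d')  # string labels such as 2019-Jan-01
)
dataset.head(10)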
Out[0]:
                 a      b      c      d      e      f      g
2019-Jan-01   29.0   32.0   26.0   61.0    5.0   22.0   78.0
2019-Jan-02   86.0   34.0   80.0   32.0   16.0   17.0   76.0
2019-Jan-03   50.0   52.0   72.0   46.0    3.0   18.0   81.0
2019-Jan-04    5.0    2.0   86.0   36.0   19.0    9.0   97.0
2019-Jan-05    9.0   93.0    7.0   32.0   55.0   62.0   31.0
2019-Jan-06   94.0   38.0   87.0   87.0   51.0  100.0   18.0
2019-Jan-07   54.0   13.0   23.0   59.0   43.0   66.0   68.0
2019-Jan-08   61.0   41.0   96.0   73.0   57.0   44.0   77.0
2019-Jan-09   89.0   54.0   40.0   77.0   66.0   51.0   76.0
2019-Jan-10   60.0   87.0   62.0   35.0   42.0   47.0   62.0
In [0]:
# Dataset information
dataset.info()
<class 'pandas.core.frame.DataFrame'>
Index: 365 entries, 2019-Jan-01 to 2019-Dec-31
Data columns (total 7 columns):
a    365 non-null float64
b    365 non-null float64
c    365 non-null float64
d    365 non-null float64
e    365 non-null float64
f    365 non-null float64
g    365 non-null float64
dtypes: float64(7)
memory usage: 22.8+ KB

Function timeseries.timestep_table()

  • Purpose: Build a timestep table from a DataFrame
  • Syntax: prep.timeseries.timestep_table(dataframe, columns=None, timesteps=2, keep_first=True)
  • Return: DataFrame
  • Documentation: readthedocs
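The semantics of this function can be sketched in plain pandas. The function below, `timestep_table_sketch`, is a hypothetical illustration built on `Series.shift`, not hidrokit's source code. It follows the column naming literally (`<name>_tmin<k>` is the value k rows earlier). With `keep_first=True` its layout and values agree with the tables in this notebook, but compare its `keep_first=False` results with hidrokit's own output before treating the two as interchangeable.

```python
import pandas as pd


def timestep_table_sketch(dataframe, columns=None, timesteps=2, keep_first=True):
    """Hypothetical re-implementation of timestep_table() for illustration.

    Column <name>_tmin<k> holds the value k rows before the current row.
    Not hidrokit's source code.
    """
    if columns is None:
        columns = list(dataframe.columns)
    elif isinstance(columns, str):
        columns = [columns]  # allow a single column name, e.g. columns='a'

    first_lag = 0 if keep_first else 1
    parts = []
    for name in dataframe.columns:
        if name in columns:
            # Expand the selected column into one lagged copy per timestep
            for k in range(first_lag, timesteps + 1):
                parts.append(dataframe[name].shift(k).rename(f"{name}_tmin{k}"))
        else:
            # Unselected columns are passed through with their t0 values
            parts.append(dataframe[name])

    table = pd.concat(parts, axis=1)
    # The first `timesteps` rows lack a full history, so they are dropped
    return table.iloc[timesteps:]


# Small usage example on a four-row frame
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [10.0, 20.0, 30.0, 40.0]})
print(timestep_table_sketch(df, columns="a"))
```

Using `iloc[timesteps:]` rather than `dropna()` removes only the rows without a full history, so NaN values already present in the data are not silently discarded.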
In [0]:
from hidrokit.prep import timeseries

Argument: none (default values)

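If no arguments are given, the default values are used: every column is expanded into timestep columns and the column at time $t_{0}$ is kept. The default `timesteps` is two previous rows (in this case, the two previous days). Each column `<name>_tmin<k>` holds the value from k rows earlier, and because the first two rows have no complete history, the table starts at 2019-Jan-03.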

In [0]:
tabel_ts = timeseries.timestep_table(dataset)
tabel_ts.head()
Out[0]:
             a_tmin0  a_tmin1  a_tmin2  b_tmin0  b_tmin1  b_tmin2  c_tmin0  c_tmin1  c_tmin2  d_tmin0  d_tmin1  d_tmin2  e_tmin0  e_tmin1  e_tmin2  f_tmin0  f_tmin1  f_tmin2  g_tmin0  g_tmin1  g_tmin2
2019-Jan-03     50.0     86.0     29.0     52.0     34.0     32.0     72.0     80.0     26.0     46.0     32.0     61.0      3.0     16.0      5.0     18.0     17.0     22.0     81.0     76.0     78.0
2019-Jan-04      5.0     50.0     86.0      2.0     52.0     34.0     86.0     72.0     80.0     36.0     46.0     32.0     19.0      3.0     16.0      9.0     18.0     17.0     97.0     81.0     76.0
2019-Jan-05      9.0      5.0     50.0     93.0      2.0     52.0      7.0     86.0     72.0     32.0     36.0     46.0     55.0     19.0      3.0     62.0      9.0     18.0     31.0     97.0     81.0
2019-Jan-06     94.0      9.0      5.0     38.0     93.0      2.0     87.0      7.0     86.0     87.0     32.0     36.0     51.0     55.0     19.0    100.0     62.0      9.0     18.0     31.0     97.0
2019-Jan-07     54.0     94.0      9.0     13.0     38.0     93.0     23.0     87.0      7.0     59.0     87.0     32.0     43.0     51.0     55.0     66.0    100.0     62.0     68.0     18.0     31.0
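The `_tmin<k>` naming and the missing first rows can be illustrated with pandas `shift` on the first three values of `a`. This is a minimal sketch of the idea, not necessarily how hidrokit computes it: rows without a full two-day history contain NaN, which is why the default table starts at 2019-Jan-03. The one complete row reproduces the first three values of that table (50.0, 86.0, 29.0).

```python
import pandas as pd

# First three values of column "a" from the dataset above
a = pd.Series([29.0, 86.0, 50.0],
              index=["2019-Jan-01", "2019-Jan-02", "2019-Jan-03"])

# a_tmin<k> = value k rows earlier; shift(k) leaves NaN where no history exists
lags = pd.DataFrame({f"a_tmin{k}": a.shift(k) for k in range(3)})
print(lags)

# Only 2019-Jan-03 has a complete two-day history
complete = lags.dropna()
print(complete)
```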

Argument: columns=

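Select the specific columns to expand. Columns that are not listed (here `b`, `e`, `f` and `g`) are kept unchanged, with their $t_0$ values, in their original position.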

In [0]:
tabel_ts_columns = timeseries.timestep_table(dataset, columns=['a', 'c', 'd'])
tabel_ts_columns.head()
Out[0]:
             a_tmin0  a_tmin1  a_tmin2        b  c_tmin0  c_tmin1  c_tmin2  d_tmin0  d_tmin1  d_tmin2        e        f        g
2019-Jan-03     50.0     86.0     29.0     52.0     72.0     80.0     26.0     46.0     32.0     61.0      3.0     18.0     81.0
2019-Jan-04      5.0     50.0     86.0      2.0     86.0     72.0     80.0     36.0     46.0     32.0     19.0      9.0     97.0
2019-Jan-05      9.0      5.0     50.0     93.0      7.0     86.0     72.0     32.0     36.0     46.0     55.0     62.0     31.0
2019-Jan-06     94.0      9.0      5.0     38.0     87.0      7.0     86.0     87.0     32.0     36.0     51.0    100.0     18.0
2019-Jan-07     54.0     94.0      9.0     13.0     23.0     87.0      7.0     59.0     87.0     32.0     43.0     66.0     68.0
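Because expanded columns follow the `<name>_tmin<k>` pattern, lagged and passed-through columns can be separated by name with `DataFrame.filter`. A minimal sketch on a one-row frame whose columns and values are copied from the first row of the output above; in the notebook itself you would apply the same calls to `tabel_ts_columns` directly.

```python
import pandas as pd

# Column layout and first row of tabel_ts_columns, copied from the output above
cols = ["a_tmin0", "a_tmin1", "a_tmin2", "b",
        "c_tmin0", "c_tmin1", "c_tmin2",
        "d_tmin0", "d_tmin1", "d_tmin2", "e", "f", "g"]
row = [50.0, 86.0, 29.0, 52.0, 72.0, 80.0, 26.0, 46.0, 32.0, 61.0, 3.0, 18.0, 81.0]
table = pd.DataFrame([row], columns=cols, index=["2019-Jan-03"])

# Expanded columns end in _tmin<k>; everything else was passed through at t0
lagged = table.filter(regex=r"_tmin\d+$")
passthrough = table.drop(columns=lagged.columns)
print(list(passthrough.columns))  # ['b', 'e', 'f', 'g']

# The lagged columns of a single variable, e.g. "a"
print(list(table.filter(regex=r"^a_tmin").columns))
```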

Argument: keep_first=

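If set to False, the $t_0$ column of each selected column is not included; unselected columns (here `d`, `e`, `f` and `g`) are unaffected. Note that in the output below (hidrokit 0.2.0), the values under `a_tmin1` and `a_tmin2` for 2019-Jan-03 are 50.0 and 86.0, the same as `a_tmin0` and `a_tmin1` in the default table, so check the lag alignment against the original data before using these columns.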

In [0]:
tabel_ts_keep = timeseries.timestep_table(dataset, columns=['a', 'b', 'c'], keep_first=False)
tabel_ts_keep.head()
Out[0]:
             a_tmin1  a_tmin2  b_tmin1  b_tmin2  c_tmin1  c_tmin2        d        e        f        g
2019-Jan-03     50.0     86.0     52.0     34.0     72.0     80.0     46.0      3.0     18.0     81.0
2019-Jan-04      5.0     50.0      2.0     52.0     86.0     72.0     36.0     19.0      9.0     97.0
2019-Jan-05      9.0      5.0     93.0      2.0      7.0     86.0     32.0     55.0     62.0     31.0
2019-Jan-06     94.0      9.0     38.0     93.0     87.0      7.0     87.0     51.0    100.0     18.0
2019-Jan-07     54.0     94.0     13.0     38.0     23.0     87.0     59.0     43.0     66.0     68.0
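Dropping the $t_0$ column is the usual layout for the inputs of a forecasting model (the ML / ANN use case this tutorial targets) whose target is the current value. A minimal pure-pandas sketch of such a split, without hidrokit, where each `<col>_tmin<k>` column holds the value exactly k rows before the target row. The task (predicting `a` from the two previous days of `a`, `b` and `c`), the random data and the names `X` and `y` are all hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical task: predict "a" at t0 from the two previous days of a, b and c
np.random.seed(0)
df = pd.DataFrame(np.random.rand(10, 3).round(2), columns=["a", "b", "c"])

timesteps = 2
# Inputs: lagged values only, k = 1..timesteps (no t0 column)
X = pd.DataFrame({
    f"{col}_tmin{k}": df[col].shift(k)
    for col in df.columns
    for k in range(1, timesteps + 1)
}).iloc[timesteps:]
# Target: the value of "a" at t0 on the same rows
y = df["a"].iloc[timesteps:]
print(X.shape, y.shape)
```

Keeping `X` and `y` on the same index makes it easy to confirm that no input column contains the target's own $t_0$ value.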

Argument: timesteps=

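Sets how many rows are included in the timestep columns. Example: a table with `timesteps=4`, which gives four timestep columns for `a` (four days of history in this daily data); the first four rows are dropped, so the table starts at 2019-Jan-05.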

In [0]:
tabel_ts_time = timeseries.timestep_table(dataset, columns='a', keep_first=False, timesteps=4)
tabel_ts_time.head()
Out[0]:
             a_tmin1  a_tmin2  a_tmin3  a_tmin4        b        c        d        e        f        g
2019-Jan-05      9.0      5.0     50.0     86.0     93.0      7.0     32.0     55.0     62.0     31.0
2019-Jan-06     94.0      9.0      5.0     50.0     38.0     87.0     87.0     51.0    100.0     18.0
2019-Jan-07     54.0     94.0      9.0      5.0     13.0     23.0     59.0     43.0     66.0     68.0
2019-Jan-08     61.0     54.0     94.0      9.0     41.0     96.0     73.0     57.0     44.0     77.0
2019-Jan-09     89.0     61.0     54.0     94.0     54.0     40.0     77.0     66.0     51.0     76.0
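The size of the result follows from the parameters: each selected column becomes `timesteps + 1` columns (or `timesteps` with `keep_first=False`), and the first `timesteps` rows are dropped. A small hypothetical helper (not part of hidrokit) that encodes this arithmetic. Its column counts match the headers of the four tables above; the row counts assume the tables run to 2019-Dec-31, since only `head()` is shown.

```python
def timestep_table_shape(n_rows, n_cols, n_selected, timesteps=2, keep_first=True):
    """Hypothetical helper (not part of hidrokit): expected (rows, columns)
    of a timestep table, inferred from the outputs in this notebook."""
    per_column = timesteps + 1 if keep_first else timesteps
    rows = n_rows - timesteps  # first `timesteps` rows are dropped
    cols = (n_cols - n_selected) + n_selected * per_column
    return rows, cols


# The four calls in this notebook (365 rows, 7 columns)
print(timestep_table_shape(365, 7, 7))                                 # (363, 21)
print(timestep_table_shape(365, 7, 3))                                 # (363, 13)
print(timestep_table_shape(365, 7, 3, keep_first=False))               # (363, 10)
print(timestep_table_shape(365, 7, 1, timesteps=4, keep_first=False))  # (361, 10)
```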

Changelog

- 20190713 - 1.0.1 - Notebook information
- 20190713 - 1.0.0 - Initial

Source code in this notebook is licensed under the MIT License. Data in this notebook is licensed under a Creative Commons Attribution 4.0 International license.