Tutorial hidrokit.prep.excel

  • Kategori: data preparation
  • Tujuan: Memperoleh dataframe dari excel file.
  • Dokumentasi: readthedocs

Informasi notebook

  • notebook name: taruma_hidrokit_prep_excel
  • notebook version/date: 1.0.2/20190713
  • notebook server: Google Colab
  • hidrokit version: 0.2.0
  • python version: 3.7

CATATAN

Modul .prep.excel masih dalam tahap pengembangan sehingga fungsi yang akan ditampilkan pada tutorial ini adalah private function yang ditandai dengan diawali _ pada setiap fungsinya. Fungsi tersebut seharusnya tidak digunakan oleh users dan hanya bertugas sebagai fungsi di balik layar. Notebook ini hanya melengkapi tutorial hidrokit.

Disarankan tidak menggunakan private function, karena private function akan mengalami perubahan signifikan selama pengembangan.

Instalasi hidrokit

In [0]:
### Instalasi melalui PyPI
!pip install hidrokit[excel] # Digunakan [excel] karena membutuhkan paket xlrd

### Instalasi melalui Github
# !pip install git+https://github.com/taruma/hidrokit.git

### Instalasi melalui Github (Latest)
# !pip install git+https://github.com/taruma/[email protected]
Requirement already satisfied: hidrokit[excel] in /usr/local/lib/python3.6/dist-packages (0.2.0)
Requirement already satisfied: matplotlib in /usr/local/lib/python3.6/dist-packages (from hidrokit[excel]) (3.0.3)
Requirement already satisfied: pandas in /usr/local/lib/python3.6/dist-packages (from hidrokit[excel]) (0.24.2)
Requirement already satisfied: numpy in /usr/local/lib/python3.6/dist-packages (from hidrokit[excel]) (1.16.4)
Requirement already satisfied: openpyxl; extra == "excel" in /usr/local/lib/python3.6/dist-packages (from hidrokit[excel]) (2.5.9)
Requirement already satisfied: xlwt; extra == "excel" in /usr/local/lib/python3.6/dist-packages (from hidrokit[excel]) (1.3.0)
Requirement already satisfied: xlrd; extra == "excel" in /usr/local/lib/python3.6/dist-packages (from hidrokit[excel]) (1.1.0)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.6/dist-packages (from matplotlib->hidrokit[excel]) (0.10.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib->hidrokit[excel]) (1.1.0)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib->hidrokit[excel]) (2.4.0)
Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib->hidrokit[excel]) (2.5.3)
Requirement already satisfied: pytz>=2011k in /usr/local/lib/python3.6/dist-packages (from pandas->hidrokit[excel]) (2018.9)
Requirement already satisfied: jdcal in /usr/local/lib/python3.6/dist-packages (from openpyxl; extra == "excel"->hidrokit[excel]) (1.4.1)
Requirement already satisfied: et-xmlfile in /usr/local/lib/python3.6/dist-packages (from openpyxl; extra == "excel"->hidrokit[excel]) (1.0.1)
Requirement already satisfied: six in /usr/local/lib/python3.6/dist-packages (from cycler>=0.10->matplotlib->hidrokit[excel]) (1.12.0)
Requirement already satisfied: setuptools in /usr/local/lib/python3.6/dist-packages (from kiwisolver>=1.0.1->matplotlib->hidrokit[excel]) (41.0.1)

Import Library

In [0]:
import numpy as np
import pandas as pd

Dataset

In [0]:
# Ambil dataset dari data test hidrokit
!wget -O '2006 data hujan stasiun A.xls' "https://github.com/taruma/hidrokit/blob/master/tests/data/excel/2006%20HUJAN%20DISNEY%20LAND.xls?raw=true"
--2019-07-13 02:17:07--  https://github.com/taruma/hidrokit/blob/master/tests/data/excel/2006%20HUJAN%20DISNEY%20LAND.xls?raw=true
Resolving github.com (github.com)... 140.82.114.3
Connecting to github.com (github.com)|140.82.114.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://github.com/taruma/hidrokit/raw/master/tests/data/excel/2006%20HUJAN%20DISNEY%20LAND.xls [following]
--2019-07-13 02:17:08--  https://github.com/taruma/hidrokit/raw/master/tests/data/excel/2006%20HUJAN%20DISNEY%20LAND.xls
Reusing existing connection to github.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/taruma/hidrokit/master/tests/data/excel/2006%20HUJAN%20DISNEY%20LAND.xls [following]
--2019-07-13 02:17:08--  https://raw.githubusercontent.com/taruma/hidrokit/master/tests/data/excel/2006%20HUJAN%20DISNEY%20LAND.xls
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 36864 (36K) [application/octet-stream]
Saving to: ‘2006 data hujan stasiun A.xls’

2006 data hujan sta 100%[===================>]  36.00K  --.-KB/s    in 0.01s   

2019-07-13 02:17:08 (2.72 MB/s) - ‘2006 data hujan stasiun A.xls’ saved [36864/36864]

In [0]:
# Menampilkan file excel dalam bentuk dataframe
filepath = '2006 data hujan stasiun A.xls'
year = 2006
pd.read_excel(filepath)
Out[0]:
Unnamed: 0 Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4 Unnamed: 5 Unnamed: 6 Unnamed: 7 Unnamed: 8 Unnamed: 9 Unnamed: 10 Unnamed: 11 Unnamed: 12 Unnamed: 13
0 NaN DATA CURAH HUJAN BIASA DALAM MILIMETER NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 NaN NO. POS HUJAN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 NaN NAMA POS HUJAN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 NaN NO. KADASTER NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
5 NaN DAERAH ALIRAN SUNGAI NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
6 NaN LOKASI GEOGRAFI NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
7 NaN KECAMATAN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
8 NaN DESA NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
9 NaN TINGGI DARI PERMUKAAN LAUT (m) NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
10 NaN SATUAN WILAYAH SUNGAI NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
11 NaN SUNGAI NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
12 NaN DIBANGUN OLEH NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
13 NaN TAHUN PENDIRIAN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
14 NaN TAHUN DATA NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
15 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
16 NaN Tgl. Bulan NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
17 NaN NaN Jan Feb Mar Apr Mei Juni Juli Agt Sept Okt Nov Des
18 NaN 1 - 28.3 28.3 8.4 - - - - - - - 159
19 NaN 2 9.8 16.2 - - - 29 - 0.7 - - 63.4 52.5
20 NaN 3 - 9.3 16.5 13.9 - - - - - 0.3 - 11.6
21 NaN 4 66.3 - 11.3 - - 13.2 - - - - - 12.3
22 NaN 5 2.7 - - - - - - - - 0.6 - -
23 NaN 6 - 11.8 - - - - - - - - 27.5 0.6
24 NaN 7 - 15.2 - - - - - - - - - 4.3
25 NaN 8 31.4 - 32.8 - - 12.8 - - - - - 0.8
26 NaN 9 - - - - - - 88.5 - - - 38.6 -
27 NaN 10 - - 17.2 0.7 92.6 - - - - 11.6 - -
28 NaN 11 0.9 - - 13.3 - - - - - - - 46
29 NaN 12 - 48.2 - - - - - - - - - -
30 NaN 13 16.3 27.8 - 36.7 - - 43.5 - - - - 0.6
31 NaN 14 28.2 31.6 - 20.2 - - - - - 23.2 - 18.3
32 NaN 15 36.3 - 36.2 15.3 - - - - 0.7 - - -
33 NaN 16 - 11.3 18.7 - - - - - - - - 15.2
34 NaN 17 0.7 - - - 13.2 - - - - 16.5 - -
35 NaN 18 - - - - - - 37.8 13.2 - - - -
36 NaN 19 - - 21.8 - - - - - - - - -
37 NaN 20 - - - 0.6 - - - - - - - 0.8
38 NaN 21 35.3 21.7 - - - 31.5 - - - - - -
39 NaN 22 - 17.3 7.3 92 - - 16.4 - - - 13.8 -
40 NaN 23 18.9 - 14.6 13.5 - 62.5 - - 0.9 - 26.7 14.5
41 NaN 24 40.3 - - - - - - - - - 15.3 -
42 NaN 25 - 28.3 31.3 - 33.8 - 11.8 11.3 - 42.9 - 105.5
43 NaN 26 - 15.5 20.7 11.6 51.5 - - - - - 11.6 -
44 NaN 27 - - - - - - - - - 21.4 - 0.6
45 NaN 28 - 0.7 - - - 0.8 - - - - 14.8 79
46 NaN 29 - - 11.8 17.3 - - 24.8 - 9.2 45.9 27.3 -
47 NaN 30 0.6 - 15.3 - - - - - - - 6.5 0.3
48 NaN 31 - - - - - - 57.3 - - - - 44
49 NaN Jumlah 287.7 283.2 283.8 243.5 191.1 149.8 280.1 25.2 10.8 162.4 245.5 565.9
50 NaN Rerata 22.1308 20.2286 20.2714 20.2917 47.775 24.9667 40.0143 8.4 3.6 20.3 24.55 31.4389
51 NaN Max 66.3 48.2 36.2 92 92.6 62.5 88.5 13.2 9.2 45.9 63.4 159
52 NaN Min 0.6 0.7 7.3 0.6 13.2 0.8 11.8 0.7 0.7 0.3 6.5 0.3
53 NaN Keterangan : 0.0 = Tidak ada hujan NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

Fungsi private prep.excel()

  • Tujuan: Memperoleh data dari excel file.
  • Dokumentasi: readthedocs
In [0]:
from hidrokit.prep import excel

_file_single_pivot()

Memperoleh tabel pivot dari excel file

In [0]:
tabel_pivot = excel._file_single_pivot(filepath)
tabel_pivot
Out[0]:
2 3 4 5 6 7 8 9 10 11 12 13
19 - 28.3 28.3 8.4 - - - - - - - 159
20 9.8 16.2 - - - 29 - 0.7 - - 63.4 52.5
21 - 9.3 16.5 13.9 - - - - - 0.3 - 11.6
22 66.3 - 11.3 - - 13.2 - - - - - 12.3
23 2.7 - - - - - - - - 0.6 - -
24 - 11.8 - - - - - - - - 27.5 0.6
25 - 15.2 - - - - - - - - - 4.3
26 31.4 - 32.8 - - 12.8 - - - - - 0.8
27 - - - - - - 88.5 - - - 38.6 -
28 - - 17.2 0.7 92.6 - - - - 11.6 - -
29 0.9 - - 13.3 - - - - - - - 46
30 - 48.2 - - - - - - - - - -
31 16.3 27.8 - 36.7 - - 43.5 - - - - 0.6
32 28.2 31.6 - 20.2 - - - - - 23.2 - 18.3
33 36.3 - 36.2 15.3 - - - - 0.7 - - -
34 - 11.3 18.7 - - - - - - - - 15.2
35 0.7 - - - 13.2 - - - - 16.5 - -
36 - - - - - - 37.8 13.2 - - - -
37 - - 21.8 - - - - - - - - -
38 - - - 0.6 - - - - - - - 0.8
39 35.3 21.7 - - - 31.5 - - - - - -
40 - 17.3 7.3 92 - - 16.4 - - - 13.8 -
41 18.9 - 14.6 13.5 - 62.5 - - 0.9 - 26.7 14.5
42 40.3 - - - - - - - - - 15.3 -
43 - 28.3 31.3 - 33.8 - 11.8 11.3 - 42.9 - 105.5
44 - 15.5 20.7 11.6 51.5 - - - - - 11.6 -
45 - - - - - - - - - 21.4 - 0.6
46 - 0.7 - - - 0.8 - - - - 14.8 79
47 - - 11.8 17.3 - - 24.8 - 9.2 45.9 27.3 -
48 0.6 - 15.3 - - - - - - - 6.5 0.3
49 - - - - - - 57.3 - - - - 44

_dataframe_table()

Mengubah tabel pivot ke dataframe kolom tunggal.

In [0]:
tabel_tunggal = excel._dataframe_table(tabel_pivot, year, name='sta_a')
tabel_tunggal
Out[0]:
sta_a
2006-01-01 -
2006-01-02 9.8
2006-01-03 -
2006-01-04 66.3
2006-01-05 2.7
2006-01-06 -
2006-01-07 -
2006-01-08 31.4
2006-01-09 -
2006-01-10 -
2006-01-11 0.9
2006-01-12 -
2006-01-13 16.3
2006-01-14 28.2
2006-01-15 36.3
2006-01-16 -
2006-01-17 0.7
2006-01-18 -
2006-01-19 -
2006-01-20 -
2006-01-21 35.3
2006-01-22 -
2006-01-23 18.9
2006-01-24 40.3
2006-01-25 -
2006-01-26 -
2006-01-27 -
2006-01-28 -
2006-01-29 -
2006-01-30 0.6
... ...
2006-12-02 52.5
2006-12-03 11.6
2006-12-04 12.3
2006-12-05 -
2006-12-06 0.6
2006-12-07 4.3
2006-12-08 0.8
2006-12-09 -
2006-12-10 -
2006-12-11 46
2006-12-12 -
2006-12-13 0.6
2006-12-14 18.3
2006-12-15 -
2006-12-16 15.2
2006-12-17 -
2006-12-18 -
2006-12-19 -
2006-12-20 0.8
2006-12-21 -
2006-12-22 -
2006-12-23 14.5
2006-12-24 -
2006-12-25 105.5
2006-12-26 -
2006-12-27 0.6
2006-12-28 79
2006-12-29 -
2006-12-30 0.3
2006-12-31 44

365 rows × 1 columns

Changelog

- 20190713 - 1.0.2 - Informasi notebook
- 20190713 - 1.0.1 - Fix typo
- 20190713 - 1.0.0 - Initial

Source code in this notebook is licensed under a MIT License. Data in this notebook is licensed under a Creative Common Attribution 4.0 International.