9. Other Utilities

Python's versatility comes from its rich standard library. This chapter introduces the following commonly used utility modules.

9.1 Date and Time
9.2 Python Object Serialization
9.3 JSON
9.4 Random Numbers
9.5 Math Functions
9.6 File System Paths
9.7 Type Hints

9.1 Date and Time

The datetime module in the Python standard library handles date- and time-related data. It provides the date, time, datetime, timedelta, and timezone types.

§ timedelta

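A timedelta object represents a duration, that is, the distance between two points in time rather than a specific date or time of day. It supports arithmetic: timedeltas can be added and subtracted, and multiplied or divided by numbers.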

In [1]:

# Import the timedelta class
from datetime import timedelta

delta = timedelta(
    days=50,
    seconds=27,
    microseconds=10,
    milliseconds=29000,
    minutes=5,
    hours=8,
    weeks=2
)
# Only days, seconds and microseconds are stored internally
delta

Out[1]:

datetime.timedelta(days=64, seconds=29156, microseconds=10)

In [2]:

ten_years = timedelta(days=365) * 10
ten_years

Out[2]:

datetime.timedelta(days=3650)
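
The arithmetic operations mentioned above can be sketched as follows. The variable names are chosen only for illustration.

```python
from datetime import timedelta

one_day = timedelta(days=1)
half_day = one_day / 2                   # dividing by a number gives a timedelta
week = one_day * 7                       # multiplying by an integer
remaining = week - timedelta(hours=36)   # subtracting one timedelta from another
ratio = week / one_day                   # dividing two timedeltas gives a float

print(half_day)                   # 12:00:00
print(remaining)                  # 5 days, 12:00:00
print(ratio)                      # 7.0
print(remaining.total_seconds())  # 475200.0
```

total_seconds() is handy when a duration has to be passed to code that expects a plain number.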

In [3]:

ten_years.days // 365

Out[3]:

10

§ date

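A date has year, month, and day attributes and no notion of hours, minutes, or seconds. When a timedelta is added to or subtracted from a date, timedelta.seconds and timedelta.microseconds are ignored.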

Supported operations:
`date2 = date1 + timedelta`
`date2 = date1 - timedelta`
`timedelta = date1 - date2`
`date1 < date2`

In [4]:

# Import the date class
from datetime import date

today = date.today()
today

Out[4]:

datetime.date(2021, 3, 17)

In [5]:

today.replace(year=2022)

Out[5]:

datetime.date(2022, 3, 17)

In [6]:

(today - ten_years).isoformat()

Out[6]:

'2011-03-20'

In [7]:

# Remember: timedelta counts only days; it has no notion of years or months
date.today() - date(2020, 3, 16)

Out[7]:

datetime.timedelta(days=366)
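
Because timedelta counts only days, shifting a date by whole calendar years has to be done on the calendar fields instead. The sketch below uses a hypothetical helper, add_years, built on date.replace(); falling back to February 28 when the source date is February 29 is one possible convention, assumed here for illustration.

```python
from datetime import date

def add_years(d: date, years: int) -> date:
    """Hypothetical helper: shift a date by whole calendar years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Feb 29 does not exist in the target year; fall back to Feb 28.
        return d.replace(year=d.year + years, day=28)

start = date(2020, 3, 16)
print(add_years(start, 1))                 # 2021-03-16
print((add_years(start, 1) - start).days)  # 365
print(add_years(date(2020, 2, 29), 1))     # 2021-02-28
```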

§ time

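A time object represents a time of day, independent of any particular date, optionally with time zone information. Its attributes are hour, minute, second, microsecond, and tzinfo.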
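
A minimal sketch of a time object and its attributes follows. The UTC+8 offset is an assumption chosen purely for illustration.

```python
from datetime import time, timedelta, timezone

# A naive time: no time zone attached
t = time(hour=11, minute=39, second=4, microsecond=937520)
print(t.hour, t.minute, t.second, t.microsecond)  # 11 39 4 937520
print(t.tzinfo)                                   # None
print(t.isoformat())                              # 11:39:04.937520

# An aware time: attach a fixed UTC+8 offset (illustrative assumption)
tz_utc8 = timezone(timedelta(hours=8))
t_aware = t.replace(tzinfo=tz_utc8)
print(t_aware.isoformat())                        # 11:39:04.937520+08:00
```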

§ datetime

A datetime object combines the information of a date and a time.

Supported operations:
`datetime2 = datetime1 + timedelta`
`datetime2 = datetime1 - timedelta`
`timedelta = datetime1 - datetime2`
`datetime1 < datetime2`

In [8]:

# Import the datetime class
from datetime import datetime

# The current date and time, returned as a datetime
t1 = datetime.now()
t1

Out[8]:

datetime.datetime(2021, 3, 17, 11, 39, 4, 937520)

In [9]:

# Convert to a str
str(t1)

Out[9]:

'2021-03-17 11:39:04.937520'

In [10]:

# Format the datetime as a string in a given format
t1.strftime('%m/%d/%Y %H:%M:%S')

Out[10]:

'03/17/2021 11:39:04'

In [11]:

# Parse a date-time string into a datetime
t2 = datetime.strptime('2020-10-28 15:10:00', '%Y-%m-%d %H:%M:%S')

# Compare two datetimes
if t1 > t2:
    print('t1 is later than t2 by', t1 - t2)
else:
    print('t2 is later than t1 by', t2 - t1)

t1 is later than t2 by 139 days, 20:29:04.937520

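The standard library also has a separate time module, which provides time-related functions. Most of them are lower-level wrappers around functions from the platform's C library.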

In [12]:

# The time module also has strftime for formatting time strings
import time
time.strftime('%Y%m%d %H%M%S')

Out[12]:

'20210317 113905'
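
Besides formatting, the time module is often used to read the system clock and to measure elapsed time. The sketch below shows three of its functions; the summation is only a stand-in workload, and the printed values depend on when and where the code runs.

```python
import time

epoch_seconds = time.time()             # seconds since the epoch, as a float
local = time.localtime(epoch_seconds)   # struct_time in the local time zone
print(time.strftime('%Y-%m-%d %H:%M:%S', local))

# perf_counter() is a monotonic clock suited to measuring short durations
start = time.perf_counter()
total = sum(range(100_000))             # stand-in workload
elapsed = time.perf_counter() - start
print('summing took {:.6f} seconds'.format(elapsed))
```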

9.2 Python Object Serialization

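The pickle module in the Python standard library serializes and de-serializes Python objects. Serializing converts an object hierarchy into a byte stream, which makes the object easy to store, to send over a network, and to exchange between platforms. De-serializing is the reverse operation: it converts a byte stream back into an object hierarchy.

pickle can save objects to a file and load them back; the file must be opened in binary mode.
The serialization format of pickle is specific to Python. The standard library also includes json, a general-purpose serialization module that works across platforms and programming languages, but json supports fewer of Python's built-in object types.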

In [13]:

# Import the pickle module
import pickle

In [14]:

# Build a list of measurement records
# Keys: 時間 = time, 體溫 = body temperature, 速度 = speed, 心率 = heart rate
tformat = '%Y-%m-%d %H:%M:%S'
record = [
    {'時間':datetime.strptime('2019-04-03 10:35:58', tformat), '體溫':37.0, '速度':35.0, '心率':92},
    {'時間':datetime.strptime('2019-04-03 10:37:00', tformat), '體溫':37.1, '速度':33.8, '心率':97},
    {'時間':datetime.strptime('2019-04-03 10:37:59', tformat), '體溫':37.4, '速度':35.5, '心率':99}
]

In [15]:

# Open a new binary file and serialize the record object with pickle
# Note: pickle files are in a binary format
pfile = open('record.pkl', 'wb')
pickle.dump(record, pfile)
pfile.close()

In [16]:

# Now that context managers have been covered, this is the better way to write it
with open('record.pkl', 'wb') as pfile:
    pickle.dump(record, pfile)

In [17]:

# Read the file and de-serialize the record object
pfile = open('record.pkl', 'rb')
record2 = pickle.load(pfile)
pfile.close()

In [18]:

# Loading the pickled object, rewritten with a context manager
with open('record.pkl', 'rb') as pfile:
    record2 = pickle.load(pfile)

record2

Out[18]:

[{'時間': datetime.datetime(2019, 4, 3, 10, 35, 58),
  '體溫': 37.0,
  '速度': 35.0,
  '心率': 92},
 {'時間': datetime.datetime(2019, 4, 3, 10, 37),
  '體溫': 37.1,
  '速度': 33.8,
  '心率': 97},
 {'時間': datetime.datetime(2019, 4, 3, 10, 37, 59),
  '體溫': 37.4,
  '速度': 35.5,
  '心率': 99}]
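
The difference between pickle and json described at the start of this section can be seen in a short sketch. It uses the in-memory variants pickle.dumps() and pickle.loads() instead of a file. The sample record and the ISO 8601 workaround are assumptions for illustration, not the only way to handle dates in JSON.

```python
import json
import pickle
from datetime import datetime

sample = {'time': datetime(2019, 4, 3, 10, 35, 58), 'heart_rate': 92}

# pickle: serialize to bytes in memory and restore an equal object
blob = pickle.dumps(sample)
restored = pickle.loads(blob)
print(type(blob).__name__, restored == sample)  # bytes True

# json: datetime is not one of the supported built-in types
try:
    json.dumps(sample)
except TypeError as err:
    print('json cannot encode it:', err)

# One possible workaround: store the datetime as ISO 8601 text
as_text = json.dumps({**sample, 'time': sample['time'].isoformat()})
print(as_text)  # {"time": "2019-04-03T10:35:58", "heart_rate": 92}
```

Unpickling can execute arbitrary code, so pickle.load() and pickle.loads() should only be used on data from a trusted source.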

9.3 JSON

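JSON (JavaScript Object Notation) is a widely used, openly specified data interchange format; by convention its files use the .json extension. The json module in the Python standard library provides an interface similar to pickle for writing built-in object types to a JSON file and for loading JSON-encoded objects back. The supported types and their JSON counterparts are listed below: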

JSON	Python
object	dict
array	list
string	str
number (int)	int
number (real)	float
true	True
false	False
null	None
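
The mapping in the table can be checked with a small round trip through json.dumps() and json.loads(). The sample dictionary is made up for illustration; it also shows two conversions that are easy to overlook: a tuple is written as a JSON array, and a non-string key is written as a string.

```python
import json

original = {'point': (1, 2), 'ratio': 0.5, 'ok': True, 'nothing': None, 1: 'int key'}
text = json.dumps(original)
print(text)
# {"point": [1, 2], "ratio": 0.5, "ok": true, "nothing": null, "1": "int key"}

decoded = json.loads(text)
print(decoded['point'])     # [1, 2]  (the tuple came back as a list)
print('1' in decoded)       # True    (the int key became a string)
print(decoded == original)  # False
```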

In [19]:

# Prepare an object to save as a JSON file
card = {
    "image": {
        "Width":  600,
        "Height": 800,
        "Title":  "Portrait",
    },
    
    "person": {
        "firstName": "John",
        "lastName": "Doe",
        "isAlive": True,
        "age": 27,
        "phoneNumbers": [
            {
                "type": "home",
                "number": "212 555-1234"
            },
            {
                "type": "office",
                "number": "646 555-4567"
            }
        ],
        "spouse": None
    }
}

card

Out[19]:

{'image': {'Width': 600, 'Height': 800, 'Title': 'Portrait'},
 'person': {'firstName': 'John',
  'lastName': 'Doe',
  'isAlive': True,
  'age': 27,
  'phoneNumbers': [{'type': 'home', 'number': '212 555-1234'},
   {'type': 'office', 'number': '646 555-4567'}],
  'spouse': None}}

In [20]:

# Import the json module
import json

In [21]:

# Open a new text file and write the Python object to it encoded as JSON
# Note: .json files are in a text format
with open('card.json', 'w') as jfile:
    json.dump(card, jfile)

In [22]:

# Open the .json file and load the JSON-encoded object as a Python object
with open('card.json', 'r') as jfile:
    card2 = json.load(jfile)

card2

Out[22]:

{'image': {'Width': 600, 'Height': 800, 'Title': 'Portrait'},
 'person': {'firstName': 'John',
  'lastName': 'Doe',
  'isAlive': True,
  'age': 27,
  'phoneNumbers': [{'type': 'home', 'number': '212 555-1234'},
   {'type': 'office', 'number': '646 555-4567'}],
  'spouse': None}}

9.4 Random Numbers

The random module in the Python standard library provides pseudo-random number generation.

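random() - returns the next random float in the interval [0.0, 1.0).
randrange(start, stop[, step]) - returns a random integer in the interval [start, stop).
randint(a, b) - returns a random integer in the interval [a, b]; same as randrange(a, b+1).
choice(seq) - picks one element at random from the sequence seq.
shuffle(seq) - randomly reorders the elements of seq in place; the sequence must be mutable.
sample(seq, k) - returns a list of k elements chosen from the sequence seq without replacement (sets are no longer accepted as of Python 3.11).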
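
The functions listed above can be tried together in a short sketch. It uses a separate random.Random instance with a fixed seed so that repeated runs produce the same sequence; the seed value 42 and the sample data are arbitrary choices for illustration.

```python
import random

rng = random.Random(42)   # a separate, seeded generator; 42 is an arbitrary seed

x = rng.random()                 # float in [0.0, 1.0)
r = rng.randrange(0, 10, 2)      # one of 0, 2, 4, 6, 8
d = rng.randint(1, 6)            # integer in [1, 6], both ends included
c = rng.choice(['red', 'green', 'blue'])

deck = list(range(10))
rng.shuffle(deck)                # shuffles in place and returns None
hand = rng.sample(deck, 3)       # 3 distinct elements; deck is unchanged

print(x, r, d, c, deck, hand)
```

The module-level functions such as random.random() share one hidden generator; calling random.seed() fixes its sequence in the same way.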

In [23]:

# Import the random module
import random

In [24]:

# Generate a list of 100 random floats
Lr = [random.random() for x in range(100)]
print(Lr)

[0.49356032291668317, 0.2411096099519041, 0.24274807400132448, 0.6806583376348618, 0.4007445154008329, 0.5020029135428243, 0.8753881901813494, 0.2064575923259282, 0.5471457366330188, 0.6100255703399993, 0.6754079019577277, 0.028047998071924818, 0.9113825389832221, 0.12481777147365836, 0.9197373983810999, 0.4972709758446857, 0.3109592944098044, 0.6135724808834165, 0.9030384725303868, 0.4016312744745454, 0.4224038103832404, 0.4288471001262948, 0.466593180358435, 0.47192722041625734, 0.23088689632757775, 0.5080219321975132, 0.23119624893044544, 0.766038585063923, 0.9432255999781156, 0.4438950384837139, 0.008875158371981162, 0.7747935214178607, 0.8328097496865488, 0.03820973930946581, 0.2425404000214182, 0.20378756529358255, 0.9011823798074147, 0.9429615434171732, 0.008365499494094153, 0.30710204474646563, 0.7714685577914125, 0.44228662030116717, 0.47293711004645833, 0.9272615793168927, 0.6274046238104688, 0.06124138720195915, 0.20303402087805467, 0.00551746765636052, 0.29608613992703825, 0.8392178754932461, 0.4092118352756213, 0.5371365032419317, 0.45466517861888456, 0.07374675487497973, 0.38185686745620595, 0.0329699613561929, 0.15696658530359375, 0.8553618168471122, 0.3398270023737717, 0.7099281747926457, 0.23799987535976463, 0.8074101269935077, 0.7577429606338424, 0.06977378136798007, 0.38924769663238856, 0.13725757006274264, 0.9314593644916109, 0.5800782709115695, 0.5442005381571936, 0.09078629592206988, 0.33982237325614884, 0.5233793070412616, 0.24834424239295672, 0.10656032584693331, 0.6222531212266836, 0.11026076444867894, 0.5767222541627232, 0.8213605421082407, 0.4042220342914117, 0.5086261360293176, 0.10830352652922781, 0.31460119979844836, 0.29420699858731425, 0.27102235776003614, 0.7215244001345396, 0.003757854865288124, 0.13378381296717023, 0.4004733598153881, 0.508240336008106, 0.9004693485487324, 0.5161319869840322, 0.44929251677640736, 0.0816625073655195, 0.40020492559186704, 0.305598752686734, 0.20873123907155022, 0.3384601272052107, 
0.22090273575138908, 0.7775569232073181, 0.8853173389504151]

In [25]:

# Generate a list of 100 random integers
Li = [random.randint(1, 100) for x in range(100)]
print(Li)

[11, 57, 31, 91, 59, 91, 10, 56, 74, 13, 38, 5, 94, 73, 37, 86, 90, 5, 14, 56, 63, 74, 69, 51, 43, 22, 10, 57, 98, 47, 60, 82, 34, 77, 41, 33, 15, 95, 70, 95, 48, 48, 17, 28, 40, 44, 100, 17, 58, 93, 100, 16, 91, 82, 85, 96, 40, 3, 86, 92, 33, 36, 60, 63, 53, 30, 76, 93, 23, 84, 100, 2, 73, 47, 90, 67, 67, 46, 16, 15, 6, 74, 32, 46, 28, 24, 47, 27, 7, 65, 19, 9, 27, 70, 49, 79, 36, 6, 87, 4]

In [26]:

# Draw 10 samples from each list and multiply them pairwise into a new random list
[x * y for x, y in zip(random.sample(Lr, 10), random.sample(Li, 10))]

Out[26]:

[1.5547964720490222,
 30.879379115434432,
 26.426500783069663,
 1.2249376104827925,
 2.030760763231264,
 77.75569232073181,
 23.323987785256936,
 42.76530392264297,
 68.92799297237922,
 13.824802801220837]

9.5 Math Functions

The math module in the Python standard library provides common functions for real-number arithmetic.

In [27]:

# Import the math module
import math

In [28]:

# The built-in sum() accumulates floating-point rounding error
print(sum([.1, .1, .1, .1, .1, .1, .1, .1, .1, .1]))

0.9999999999999999

In [29]:

# math.fsum() avoids that loss of precision
print(math.fsum([.1, .1, .1, .1, .1, .1, .1, .1, .1, .1]))

1.0

In [30]:

# cosine of 180 degrees
print('cosine(pi) =', math.cos(math.pi))

cosine(pi) = -1.0

In [31]:

# sine of 90 degrees
print('sine(pi/2) =', math.sin(math.radians(90)))

sine(pi/2) = 1.0

In [32]:

# Import only the function needed from the math module
from math import sqrt

# Compute the Euclidean distance in N dimensions
def EuclideanDist(p1, p2):
    return sqrt(sum((x1 - x2) ** 2 for x1, x2 in zip(p1, p2)))

m, n = (1, 3, 5, 7, 9), (2, 4, 6, 8, 10)
print('Euclidean distance between {}, {} = {}'.format(m, n, EuclideanDist(m, n)))

Euclidean distance between (1, 3, 5, 7, 9), (2, 4, 6, 8, 10) = 2.23606797749979
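
For comparison, the math module has a built-in for the same calculation, math.dist(), available from Python 3.8. The sketch below reuses the two points from the cell above and checks the result with math.isclose(), which is the usual way to compare floats given the rounding behaviour shown earlier.

```python
import math

m, n = (1, 3, 5, 7, 9), (2, 4, 6, 8, 10)

# math.dist() requires Python 3.8+; both points must have the same dimension
d = math.dist(m, n)
print(d)

# Every coordinate differs by 1, so the distance should be sqrt(5)
print(math.isclose(d, math.sqrt(5)))  # True
```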

9.6 File System Paths

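The pathlib module in the Python standard library provides file system path operations that work across platforms. Path objects can be compared, broken into their components, and joined into new paths. The main attributes are:

Path.drive - the drive letter of the path, if any
Path.root - the root of the path
Path.parent - the parent directory of the path
Path.name - the final component of the path
Path.suffix - the file extension of the final component
Path.stem - the final component without its extension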

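Commonly used Path methods:

Path.cwd() - the current working directory.
Path.home() - the home directory of the logged-in user.
Path(str) - creates a path object from the string str.
Path.exists() - whether the file or directory at the path exists.
Path.glob(pattern) - returns a generator that yields every file or directory under the path matching pattern.
Path.is_dir() - checks whether the path points to a directory.
Path.is_file() - checks whether the path points to a file.
Path.iterdir() - when the path is a directory, iterates over all entries (files and subdirectories) in it.
Path.mkdir() - creates a directory at the path.
Path.rename(new_name) - renames the file or directory.
Path.open(mode) - same as the built-in open(); opens the file in the given mode and returns a file object.
Path.rmdir() - removes the directory, which must be empty.
Path.unlink() - removes the file or symbolic link.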
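
Several of the attributes and methods listed above are exercised in the sketch below. It works inside a temporary directory created with tempfile so that no real files are touched; the file and directory names are hypothetical.

```python
import tempfile
from pathlib import Path

# Work inside a throwaway directory so nothing in the real tree is touched
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    data_dir = base / 'data'            # the / operator joins path components
    data_dir.mkdir()

    report = data_dir / 'report.txt'    # hypothetical file name
    with report.open('w') as f:
        f.write('hello')

    print(report.name, report.stem, report.suffix)   # report.txt report .txt
    print(report.parent == data_dir)                 # True
    print(report.exists(), report.is_file(), data_dir.is_dir())  # True True True

    renamed = data_dir / 'summary.txt'
    report.rename(renamed)
    print([p.name for p in data_dir.glob('*.txt')])  # ['summary.txt']

    renamed.unlink()    # delete the file
    data_dir.rmdir()    # the directory is empty now, so it can be removed
    print(data_dir.exists())                         # False
```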

In [33]:

# Import the Path class
from pathlib import Path

In [34]:

# Get the current working directory
pwd = Path.cwd()
print('Current working directory: ', pwd)

# List all files and directories in the current working directory
for f in pwd.iterdir():
    print(f.name)

Current working directory:  D:\Users\James\Documents\Code\Lecture\Python-Machine-Learning\Lecture-Notes
.directory
.git
.gitignore
.ipynb_checkpoints
01-Getting_Started.ipynb
02-Syntax_Overview_1.ipynb
03-Syntax_Overview_2.ipynb
04-String_Operations.ipynb
05-List_Operations.ipynb
06-Tuple_Operations.ipynb
07-Dict_Operations.ipynb
08-File_Operations.ipynb
09-Other_Utilities.ipynb
10-Coding_Project.ipynb
11-Numpy_Vectorized_Computation.ipynb
12-Matplotlib_Data_Visualization.ipynb
13-Pandas_Data_Processing.ipynb
14-Sklearn_Building_A_Machine_Learning_Model.ipynb
15-Sklearn_Data_Preprocessing.ipynb
16-Sklearn_Best_Practice_Techniques.ipynb
17-Artificial_Neural_Network_with_tf_Keras.ipynb
18-ANN_Case_Studies.ipynb
19-Practical_Autoencoders.ipynb
20-CNN_Fundamental.ipynb
card.json
dataset
QuickStart
README.md
record.pkl

In [35]:

# A Path object can be constructed in more than one way
file2remove = [Path(pwd, 'card.json'), Path(pwd / 'record.pkl')]

# Remove the test files created earlier
for path in file2remove:
    if path.exists():
        path.unlink()
        print('File "{}" {} removed.'.format(path.name, 'is not' if path.exists() else 'is'))

File "card.json" is removed.
File "record.pkl" is removed.

In [36]:

# Build a dict that maps each file's stem to its suffix
{f.stem:f.suffix for f in pwd.iterdir() if f.is_file()}

Out[36]:

{'.directory': '',
 '.gitignore': '',
 '01-Getting_Started': '.ipynb',
 '02-Syntax_Overview_1': '.ipynb',
 '03-Syntax_Overview_2': '.ipynb',
 '04-String_Operations': '.ipynb',
 '05-List_Operations': '.ipynb',
 '06-Tuple_Operations': '.ipynb',
 '07-Dict_Operations': '.ipynb',
 '08-File_Operations': '.ipynb',
 '09-Other_Utilities': '.ipynb',
 '10-Coding_Project': '.ipynb',
 '11-Numpy_Vectorized_Computation': '.ipynb',
 '12-Matplotlib_Data_Visualization': '.ipynb',
 '13-Pandas_Data_Processing': '.ipynb',
 '14-Sklearn_Building_A_Machine_Learning_Model': '.ipynb',
 '15-Sklearn_Data_Preprocessing': '.ipynb',
 '16-Sklearn_Best_Practice_Techniques': '.ipynb',
 '17-Artificial_Neural_Network_with_tf_Keras': '.ipynb',
 '18-ANN_Case_Studies': '.ipynb',
 '19-Practical_Autoencoders': '.ipynb',
 '20-CNN_Fundamental': '.ipynb',
 'README': '.md'}

In [37]:

# List all files in the current working directory whose extension is .ipynb
[f.name for f in pwd.glob('*.ipynb')]

Out[37]:

['01-Getting_Started.ipynb',
 '02-Syntax_Overview_1.ipynb',
 '03-Syntax_Overview_2.ipynb',
 '04-String_Operations.ipynb',
 '05-List_Operations.ipynb',
 '06-Tuple_Operations.ipynb',
 '07-Dict_Operations.ipynb',
 '08-File_Operations.ipynb',
 '09-Other_Utilities.ipynb',
 '10-Coding_Project.ipynb',
 '11-Numpy_Vectorized_Computation.ipynb',
 '12-Matplotlib_Data_Visualization.ipynb',
 '13-Pandas_Data_Processing.ipynb',
 '14-Sklearn_Building_A_Machine_Learning_Model.ipynb',
 '15-Sklearn_Data_Preprocessing.ipynb',
 '16-Sklearn_Best_Practice_Techniques.ipynb',
 '17-Artificial_Neural_Network_with_tf_Keras.ipynb',
 '18-ANN_Case_Studies.ipynb',
 '19-Practical_Autoencoders.ipynb',
 '20-CNN_Fundamental.ipynb']

9.7 Type Hints

Python 3.5 and later support type hint syntax in the language itself, so no special module needs to be imported to use it.

def function(arg: arg_type) -> return_type:
    statements
    return value

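Python is a dynamically typed language and does not require variables or function parameters to declare their types in advance. In large projects, however, type hints make the program's structure more readable. More advanced typing features are available from the typing module.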

In [38]:

# def function_name(parameter: type) -> return_type
def jiume(who: str) -> str:
    return who + ' >.^ '

jiume('Mary')

Out[38]:

'Mary >.^ '

In [39]:

def addmyself(myself: int) -> int:
    return myself + myself

addmyself(5)

Out[39]:

10

In [40]:

from typing import Any

def triple(what: Any) -> Any:
    return what * 3

print(triple(jiume('Mary')))
print(triple(addmyself(5)))

Mary >.^ Mary >.^ Mary >.^ 
30

In [41]:

from math import fsum

# dot_product() is annotated to take two lists
def dot_product(vec1: list, vec2: list) -> float:
    return fsum(c1 * c2 for c1, c2 in zip(vec1, vec2))

# Type hints are only hints: passing two tuples still works
vector1, vector2 = (1, 2, 3), (4, 5, 6)
print('{} dot {} = {}'.format(vector1, vector2, dot_product(vector1, vector2)))

(1, 2, 3) dot (4, 5, 6) = 32.0
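
The typing module mentioned above offers more precise annotations than the bare list used in dot_product(). The sketch below is an illustration under assumptions: the function mean is hypothetical, List[float] states the element type, and Optional[float] says the result may be None. typing.get_type_hints() reads the annotations back; its printed form varies between Python versions.

```python
from typing import List, Optional, get_type_hints

def mean(values: List[float]) -> Optional[float]:
    """Return the arithmetic mean, or None for an empty list."""
    if not values:
        return None
    return sum(values) / len(values)

print(get_type_hints(mean))
print(mean([1.0, 2.0, 3.0]))   # 2.0
print(mean([]))                # None
print(mean((1, 2, 3)))         # 2.0  (a tuple of ints is accepted at run time)
```

A static type checker (mypy is one example) would report the last call as a mismatch, while the interpreter runs it without complaint.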

§ Type Aliases

When a type is defined deep inside a package's modules, giving it an alias keeps the code concise.

In [42]:

from typing import Tuple
from math import hypot

point3d = Tuple[float, float, float]

def distance3d(p1: point3d, p2: point3d) -> float:
    return hypot(*[(x1 - x2) for x1, x2 in zip(p1, p2)])

In [43]:

a = (1, 2, 3)
b = (3., 2., 1.)
print('distance between points', a, 'and', b, '=', distance3d(a, b))

distance between points (1, 2, 3) and (3.0, 2.0, 1.0) = 2.8284271247461903
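
Aliases can also be built on top of one another, which helps most when annotations become nested. The names Point3D, Trajectory, TrajectoryBook, and longest below are hypothetical, introduced only for this sketch.

```python
from typing import Dict, List, Tuple

# Hypothetical aliases for illustration
Point3D = Tuple[float, float, float]
Trajectory = List[Point3D]
TrajectoryBook = Dict[str, Trajectory]

def longest(book: TrajectoryBook) -> str:
    """Return the name of the trajectory with the most points."""
    return max(book, key=lambda name: len(book[name]))

book: TrajectoryBook = {
    'short': [(0.0, 0.0, 0.0)],
    'long': [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (2.0, 2.0, 2.0)],
}
print(longest(book))                                   # long
print(Trajectory == List[Tuple[float, float, float]])  # True
```

An alias is just another name for the same type, which is why the last comparison holds.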