Python's versatility comes largely from its rich standard library. This chapter introduces the following commonly used utility modules.
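A timedelta object represents a duration, that is, the distance between two points in time, rather than a specific date and time of day. timedelta objects support arithmetic: addition, subtraction, multiplication, and division.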
# Import the timedelta class from the datetime module
from datetime import timedelta
delta = timedelta(
    days=50,
    seconds=27,
    microseconds=10,
    milliseconds=29000,
    minutes=5,
    hours=8,
    weeks=2
)
# Only days, seconds and microseconds are stored internally
delta
datetime.timedelta(days=64, seconds=29156, microseconds=10)
ten_years = timedelta(days=365) * 10
ten_years
datetime.timedelta(days=3650)
ten_years.days // 365
10
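The example above only multiplies a timedelta. A minimal sketch of the other arithmetic operations follows; the 90-minute interval is a hypothetical value chosen for illustration.

```python
from datetime import timedelta

# Hypothetical 90-minute interval used for illustration
interval = timedelta(hours=1, minutes=30)

doubled = interval * 2                     # timedelta * int -> timedelta
half = interval / 2                        # timedelta / number -> timedelta
ratio = timedelta(days=1) / interval       # timedelta / timedelta -> float
whole, rest = divmod(timedelta(hours=4), interval)  # int quotient, timedelta remainder

print(doubled)                   # 3:00:00
print(half)                      # 0:45:00
print(ratio)                     # 16.0
print(whole, rest)               # 2 1:00:00
print(interval.total_seconds())  # 5400.0
```

Dividing one timedelta by another is a convenient way to ask "how many of these fit in that".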
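A date object has year, month, and day attributes and no notion of hours, minutes, or seconds. When a timedelta is added to or subtracted from a date, only timedelta.days is used; timedelta.seconds and timedelta.microseconds are ignored.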
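| Operation | Result |
|---|---|
| date2 = date1 + timedelta | date2 is date1 moved by timedelta.days days (forward if positive, backward if negative) |
| date2 = date1 - timedelta | date2 is date1 moved by -timedelta.days days |
| timedelta = date1 - date2 | The difference between the two dates, in whole days |
| date1 < date2 | True if date1 comes before date2 |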
# Import the date class
from datetime import date
today = date.today()
today
datetime.date(2021, 3, 17)
today.replace(year=2022)
datetime.date(2022, 3, 17)
(today - ten_years).isoformat()
'2011-03-20'
# Remember: timedelta only counts days; it has no notion of years or months
date.today() - date(2020, 3, 16)
datetime.timedelta(days=366)
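To see the rule in the table concretely, here is a small sketch (the date 2021-03-17 simply mirrors the example above) showing that only the day part of a timedelta moves a date.

```python
from datetime import date, timedelta

d = date(2021, 3, 17)

# 23 hours is stored as days=0, seconds=82800, so the date does not move
same_day = d + timedelta(hours=23)
# 25 hours is stored as days=1, seconds=3600; only the day part counts
next_day = d + timedelta(hours=25)
# Subtracting two dates always yields whole days
gap = date(2021, 3, 1) - date(2021, 2, 1)
# Dates compare chronologically
earlier = date(2020, 3, 16) < d

print(same_day)  # 2021-03-17
print(next_day)  # 2021-03-18
print(gap)       # 28 days, 0:00:00
print(earlier)   # True
```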
# Import the datetime class from the datetime module
from datetime import datetime
# The current local date and time, returned as a datetime object
t1 = datetime.now()
t1
datetime.datetime(2021, 3, 17, 11, 39, 4, 937520)
# Convert to a str
str(t1)
'2021-03-17 11:39:04.937520'
# Format the current date and time as a string with a given format
t1.strftime('%m/%d/%Y %H:%M:%S')
'03/17/2021 11:39:04'
# Parse a date-time string into a datetime object
t2 = datetime.strptime('2020-10-28 15:10:00', '%Y-%m-%d %H:%M:%S')
# Compare two datetime objects
if t1 > t2:
    print('t1 is later than t2 by', t1 - t2)
else:
    print('t2 is later than t1 by', t2 - t1)
t1 is later than t2 by 139 days, 20:29:04.937520
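strftime() and strptime() are inverses when given the same format string. A minimal sketch of the round trip, plus datetime.fromisoformat() as a format-free alternative for ISO 8601 strings; the sample string reuses t2's value from above.

```python
from datetime import datetime

fmt = '%Y-%m-%d %H:%M:%S'
text = '2020-10-28 15:10:00'

parsed = datetime.strptime(text, fmt)   # str -> datetime
round_trip = parsed.strftime(fmt)       # datetime -> str with the same format
print(round_trip)                       # 2020-10-28 15:10:00

# For ISO 8601 strings such as this one, fromisoformat() needs no format string
iso = datetime.fromisoformat(text)
print(iso == parsed)                    # True

# A string that does not match the format raises ValueError
try:
    datetime.strptime('28/10/2020', fmt)
except ValueError as err:
    print('ValueError:', err)
```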
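The standard library also has a separate time module that provides functions dedicated to time handling; most of them are relatively low-level wrappers around the platform's C library.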
# The time module also has a strftime() function for formatting time strings
import time
time.strftime('%Y%m%d %H%M%S')
'20210317 113905'
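A short sketch of a few other low-level time functions. The summation being timed is an arbitrary placeholder workload; time.gmtime(0) is used because the Unix epoch in UTC gives a result that does not depend on the local time zone.

```python
import time

# Seconds since the Unix epoch as a float (wall-clock time)
now = time.time()

# Convert epoch seconds to a struct_time in UTC, then format it
formatted = time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(0))
print(formatted)  # 1970-01-01 00:00:00

# perf_counter() is a monotonic, high-resolution clock intended for measuring durations
start = time.perf_counter()
total = sum(range(1_000_000))  # placeholder workload
elapsed = time.perf_counter() - start
print(f'summed in {elapsed:.4f} s')
```

For timing code, perf_counter() is usually preferable to time.time(), because the wall clock can be adjusted while the program runs.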
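The pickle module in the Python standard library provides methods for serializing and de-serializing Python objects. Serializing converts an object hierarchy into a byte stream so that the objects can be stored, sent over a network, or exchanged between platforms; de-serializing is the reverse operation, converting a byte stream back into an object hierarchy.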
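The pickle module can save objects to a file or load them from one; the file must be opened in binary mode. pickle's serialization format is specific to Python. The standard library also offers the json module, a general-purpose serialization format that works across platforms and programming languages, but json supports fewer of Python's built-in object types.
# Import the pickle module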
import pickle
# Build a data-record structure (keys: 時間 = time, 體溫 = body temperature, 速度 = speed, 心率 = heart rate)
tformat = '%Y-%m-%d %H:%M:%S'
record = [
    {'時間':datetime.strptime('2019-04-03 10:35:58', tformat), '體溫':37.0, '速度':35.0, '心率':92},
    {'時間':datetime.strptime('2019-04-03 10:37:00', tformat), '體溫':37.1, '速度':33.8, '心率':97},
    {'時間':datetime.strptime('2019-04-03 10:37:59', tformat), '體溫':37.4, '速度':35.5, '心率':99}
]
# Open a new binary file and serialize the record object with pickle
# Note: pickle files are in binary format
pfile = open('record.pkl', 'wb')
pickle.dump(record, pfile)
pfile.close()
# Now that we have covered context managers, this is the better way to write it
with open('record.pkl', 'wb') as pfile:
    pickle.dump(record, pfile)
# Read the file and de-serialize it back into a record object
pfile = open('record.pkl', 'rb')
record2 = pickle.load(pfile)
pfile.close()
# Loading the pickled object, rewritten with a context manager
with open('record.pkl', 'rb') as pfile:
    record2 = pickle.load(pfile)
record2
[{'時間': datetime.datetime(2019, 4, 3, 10, 35, 58), '體溫': 37.0, '速度': 35.0, '心率': 92}, {'時間': datetime.datetime(2019, 4, 3, 10, 37), '體溫': 37.1, '速度': 33.8, '心率': 97}, {'時間': datetime.datetime(2019, 4, 3, 10, 37, 59), '體溫': 37.4, '速度': 35.5, '心率': 99}]
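pickle can also serialize to an in-memory bytes object instead of a file, which is handy for caching or sending data over a network. A minimal sketch with pickle.dumps() and pickle.loads(); the record here is a hypothetical, smaller stand-in for the one above. Only unpickle data from sources you trust: the pickle documentation warns that loading a pickle can execute arbitrary code.

```python
import pickle
from datetime import datetime

# Hypothetical record, similar in shape to the one above
record = [{'time': datetime(2019, 4, 3, 10, 35, 58), 'temp': 37.0}]

# dumps() serializes to bytes in memory instead of writing to a file
blob = pickle.dumps(record)
print(type(blob))          # <class 'bytes'>

# loads() rebuilds an equal but separate object hierarchy
restored = pickle.loads(blob)
print(restored == record)  # True
print(restored is record)  # False
```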
# An object to be saved as a JSON file
card = {
    "image": {
        "Width": 600,
        "Height": 800,
        "Title": "Portrait",
    },
    "person": {
        "firstName": "John",
        "lastName": "Doe",
        "isAlive": True,
        "age": 27,
        "phoneNumbers": [
            {
                "type": "home",
                "number": "212 555-1234"
            },
            {
                "type": "office",
                "number": "646 555-4567"
            }
        ],
        "spouse": None
    }
}
card
{'image': {'Width': 600, 'Height': 800, 'Title': 'Portrait'}, 'person': {'firstName': 'John', 'lastName': 'Doe', 'isAlive': True, 'age': 27, 'phoneNumbers': [{'type': 'home', 'number': '212 555-1234'}, {'type': 'office', 'number': '646 555-4567'}], 'spouse': None}}
# Import the json module
import json
# Open a new text file and write the Python object to it, encoded as JSON
# Note: .json files are text files
with open('card.json', 'w') as jfile:
    json.dump(card, jfile)
# Open the .json file and decode its JSON content into a Python object
with open('card.json', 'r') as jfile:
    card2 = json.load(jfile)
card2
{'image': {'Width': 600, 'Height': 800, 'Title': 'Portrait'}, 'person': {'firstName': 'John', 'lastName': 'Doe', 'isAlive': True, 'age': 27, 'phoneNumbers': [{'type': 'home', 'number': '212 555-1234'}, {'type': 'office', 'number': '646 555-4567'}], 'spouse': None}}
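The card round-trips cleanly because it contains only types JSON understands. A sketch of what happens with types it does not, such as the datetime values used in the pickle example; converting datetimes with isoformat() via default= is one common workaround, not the only one.

```python
import json
from datetime import datetime

data = {'time': datetime(2019, 4, 3, 10, 35, 58), 'point': (1, 2)}

# datetime is not one of the types json knows how to encode
encode_failed = False
try:
    json.dumps(data)
except TypeError as err:
    encode_failed = True
    print('TypeError:', err)

# default= is called for unsupported objects; here it turns datetimes into ISO strings
text = json.dumps(data, default=lambda obj: obj.isoformat())
print(text)  # {"time": "2019-04-03T10:35:58", "point": [1, 2]}

# Decoding does not restore the original types: the string stays a string, the tuple becomes a list
back = json.loads(text)
print(back)  # {'time': '2019-04-03T10:35:58', 'point': [1, 2]}
```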
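The random module in the Python standard library generates pseudo-random numbers. Commonly used functions:
- random() - Returns the next random float in the interval [0.0, 1.0).
- randrange(start, stop[, step]) - Returns a random integer from range(start, stop, step), i.e. from [start, stop) when step is omitted.
- randint(a, b) - Returns a random integer in the closed interval [a, b]; equivalent to randrange(a, b+1).
- choice(seq) - Returns one randomly chosen element of the non-empty sequence seq.
- shuffle(seq) - Shuffles the elements of seq in place; seq must be a mutable sequence such as a list.
- sample(seq, k) - Returns a new list of k elements chosen from seq without replacement. Since Python 3.11, seq must be a sequence; a set must first be converted, for example with sorted() or list().
# Import the random module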
import random
# Generate a list of 100 random floats
Lr = [random.random() for _ in range(100)]
print(Lr)
[0.49356032291668317, 0.2411096099519041, 0.24274807400132448, 0.6806583376348618, 0.4007445154008329, 0.5020029135428243, 0.8753881901813494, 0.2064575923259282, 0.5471457366330188, 0.6100255703399993, 0.6754079019577277, 0.028047998071924818, 0.9113825389832221, 0.12481777147365836, 0.9197373983810999, 0.4972709758446857, 0.3109592944098044, 0.6135724808834165, 0.9030384725303868, 0.4016312744745454, 0.4224038103832404, 0.4288471001262948, 0.466593180358435, 0.47192722041625734, 0.23088689632757775, 0.5080219321975132, 0.23119624893044544, 0.766038585063923, 0.9432255999781156, 0.4438950384837139, 0.008875158371981162, 0.7747935214178607, 0.8328097496865488, 0.03820973930946581, 0.2425404000214182, 0.20378756529358255, 0.9011823798074147, 0.9429615434171732, 0.008365499494094153, 0.30710204474646563, 0.7714685577914125, 0.44228662030116717, 0.47293711004645833, 0.9272615793168927, 0.6274046238104688, 0.06124138720195915, 0.20303402087805467, 0.00551746765636052, 0.29608613992703825, 0.8392178754932461, 0.4092118352756213, 0.5371365032419317, 0.45466517861888456, 0.07374675487497973, 0.38185686745620595, 0.0329699613561929, 0.15696658530359375, 0.8553618168471122, 0.3398270023737717, 0.7099281747926457, 0.23799987535976463, 0.8074101269935077, 0.7577429606338424, 0.06977378136798007, 0.38924769663238856, 0.13725757006274264, 0.9314593644916109, 0.5800782709115695, 0.5442005381571936, 0.09078629592206988, 0.33982237325614884, 0.5233793070412616, 0.24834424239295672, 0.10656032584693331, 0.6222531212266836, 0.11026076444867894, 0.5767222541627232, 0.8213605421082407, 0.4042220342914117, 0.5086261360293176, 0.10830352652922781, 0.31460119979844836, 0.29420699858731425, 0.27102235776003614, 0.7215244001345396, 0.003757854865288124, 0.13378381296717023, 0.4004733598153881, 0.508240336008106, 0.9004693485487324, 0.5161319869840322, 0.44929251677640736, 0.0816625073655195, 0.40020492559186704, 0.305598752686734, 0.20873123907155022, 0.3384601272052107, 
0.22090273575138908, 0.7775569232073181, 0.8853173389504151]
# Generate a list of 100 random integers in [1, 100]
Li = [random.randint(1, 100) for _ in range(100)]
print(Li)
[11, 57, 31, 91, 59, 91, 10, 56, 74, 13, 38, 5, 94, 73, 37, 86, 90, 5, 14, 56, 63, 74, 69, 51, 43, 22, 10, 57, 98, 47, 60, 82, 34, 77, 41, 33, 15, 95, 70, 95, 48, 48, 17, 28, 40, 44, 100, 17, 58, 93, 100, 16, 91, 82, 85, 96, 40, 3, 86, 92, 33, 36, 60, 63, 53, 30, 76, 93, 23, 84, 100, 2, 73, 47, 90, 67, 67, 46, 16, 15, 6, 74, 32, 46, 28, 24, 47, 27, 7, 65, 19, 9, 27, 70, 49, 79, 36, 6, 87, 4]
# Draw 10 random samples from each list and multiply them pairwise into a new random list
[x * y for x, y in zip(random.sample(Lr, 10), random.sample(Li, 10))]
[1.5547964720490222, 30.879379115434432, 26.426500783069663, 1.2249376104827925, 2.030760763231264, 77.75569232073181, 23.323987785256936, 42.76530392264297, 68.92799297237922, 13.824802801220837]
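The outputs above change on every run. A sketch of making results reproducible with random.seed(), plus the choice/shuffle/sample/randrange functions from the list; the seeds 42 and 7 and the deck list are arbitrary assumptions. Note that the random module is not meant for security purposes; the standard library's secrets module covers that case.

```python
import random

# Seeding makes the pseudo-random sequence reproducible (42 is arbitrary)
random.seed(42)
first = [random.randint(1, 100) for _ in range(5)]
random.seed(42)
second = [random.randint(1, 100) for _ in range(5)]
print(first == second)          # True

rng = random.Random(7)          # an independent generator with its own state

deck = list(range(10))          # hypothetical data to draw from
rng.shuffle(deck)               # shuffles the list in place, returns None
picked = rng.choice(deck)       # one element of the sequence
hand = rng.sample(deck, 3)      # 3 elements chosen without replacement
even = rng.randrange(0, 10, 2)  # one of 0, 2, 4, 6, 8
print(deck, picked, hand, even)
```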
# Import the math module
import math
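# The built-in sum() accumulates floating-point rounding error here (output from Python 3.11 or earlier; since 3.12 sum() uses a more accurate algorithm for floats)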
print(sum([.1, .1, .1, .1, .1, .1, .1, .1, .1, .1]))
0.9999999999999999
# math.fsum() tracks intermediate partial sums to avoid this loss of precision
print(math.fsum([.1, .1, .1, .1, .1, .1, .1, .1, .1, .1]))
1.0
# Cosine of 180 degrees (pi radians)
print('cosine(pi) =', math.cos(math.pi))
cosine(pi) = -1.0
# Sine of 90 degrees; math.radians() converts degrees to radians
print('sine(pi/2) =', math.sin(math.radians(90)))
sine(pi/2) = 1.0
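The two results above happen to come out exact, but not every angle does, because math.pi is only an approximation of pi. A sketch of comparing floats with math.isclose() instead of ==; the tolerance 1e-12 is an arbitrary choice for this example.

```python
import math

# sin(pi) is mathematically 0, but the computed value is a tiny non-zero number
print(math.sin(math.pi))  # a tiny value near 1.22e-16

# Compare floats with a tolerance instead of ==
print(math.sin(math.pi) == 0.0)                              # False
print(math.isclose(math.sin(math.pi), 0.0, abs_tol=1e-12))   # True
# Default relative tolerance (1e-9) absorbs the rounding error seen with sum()
print(math.isclose(sum([.1] * 10), 1.0))                     # True
```

abs_tol is needed when comparing against 0.0, since a purely relative tolerance around zero is itself zero.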
# Import just the function we need from math
from math import sqrt
# Compute the Euclidean distance between two N-dimensional points
def EuclideanDist(p1, p2):
    return sqrt(sum((x1 - x2) ** 2 for x1, x2 in zip(p1, p2)))
m, n = (1, 3, 5, 7, 9), (2, 4, 6, 8, 10)
print('Euclidean distance between {}, {} = {}'.format(m, n, EuclideanDist(m, n)))
Euclidean distance between (1, 3, 5, 7, 9), (2, 4, 6, 8, 10) = 2.23606797749979
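Since Python 3.8, the math module can compute the same distance without a hand-written helper. A sketch comparing math.dist() and the multi-argument math.hypot() against the points used above.

```python
import math

m, n = (1, 3, 5, 7, 9), (2, 4, 6, 8, 10)

# math.dist() (Python 3.8+) takes two points of equal dimension
d1 = math.dist(m, n)
# math.hypot() accepts any number of coordinates since Python 3.8
d2 = math.hypot(*(a - b for a, b in zip(m, n)))
print(d1, d2)  # both approximately 2.2360679...
```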
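The pathlib module in the Python standard library provides filesystem path operations that work across platforms. Path objects can be compared, split into their components, and joined together. Their main attributes are: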
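- Path.drive - The drive letter or name, if any (e.g. 'C:' on Windows; empty on POSIX).
- Path.root - The root of the path, if any.
- Path.parent - The parent directory of the path.
- Path.name - The final component of the path.
- Path.suffix - The file extension of the final component.
- Path.stem - The final component without its suffix.

Commonly used Path methods: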
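- Path.cwd() - Returns the current working directory.
- Path.home() - Returns the home directory of the logged-in user.
- Path(str) - Creates a path object from the string str.
- Path.exists() - Whether a file or directory exists at the path.
- Path.glob(pattern) - Returns a generator that yields every file or directory under the path matching pattern.
- Path.is_dir() - Whether the path points to a directory.
- Path.is_file() - Whether the path points to a regular file.
- Path.iterdir() - When the path is a directory, iterates over its contents (files and subdirectories).
- Path.mkdir() - Creates a new directory at the path.
- Path.rename(new_name) - Renames the file or directory.
- Path.open(mode) - Works like the built-in open(): opens the file in the given mode and returns a file object.
- Path.rmdir() - Removes the directory, which must be empty.
- Path.unlink() - Removes the file or symbolic link.
# Import the Path class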
from pathlib import Path
# Get the current working directory
pwd = Path.cwd()
print('Current working directory:', pwd)
# List all files and directories in the current working directory
for f in pwd.iterdir():
    print(f.name)
Current working directory: D:\Users\James\Documents\Code\Lecture\Python-Machine-Learning\Lecture-Notes
.directory
.git
.gitignore
.ipynb_checkpoints
01-Getting_Started.ipynb
02-Syntax_Overview_1.ipynb
03-Syntax_Overview_2.ipynb
04-String_Operations.ipynb
05-List_Operations.ipynb
06-Tuple_Operations.ipynb
07-Dict_Operations.ipynb
08-File_Operations.ipynb
09-Other_Utilities.ipynb
10-Coding_Project.ipynb
11-Numpy_Vectorized_Computation.ipynb
12-Matplotlib_Data_Visualization.ipynb
13-Pandas_Data_Processing.ipynb
14-Sklearn_Building_A_Machine_Learning_Model.ipynb
15-Sklearn_Data_Preprocessing.ipynb
16-Sklearn_Best_Practice_Techniques.ipynb
17-Artificial_Neural_Network_with_tf_Keras.ipynb
18-ANN_Case_Studies.ipynb
19-Practical_Autoencoders.ipynb
20-CNN_Fundamental.ipynb
card.json
dataset
QuickStart
README.md
record.pkl
# A Path object can be built from several segments, or by joining with the / operator
file2remove = [Path(pwd, 'card.json'), Path(pwd / 'record.pkl')]
# Delete the test files created earlier
for path in file2remove:
    if path.exists():
        path.unlink()
    print('File "{}" {} removed.'.format(path.name, 'is not' if path.exists() else 'is'))
File "card.json" is removed.
File "record.pkl" is removed.
# Build a dict mapping each file's stem (name without extension) to its suffix
{f.stem:f.suffix for f in pwd.iterdir() if f.is_file()}
{'.directory': '', '.gitignore': '', '01-Getting_Started': '.ipynb', '02-Syntax_Overview_1': '.ipynb', '03-Syntax_Overview_2': '.ipynb', '04-String_Operations': '.ipynb', '05-List_Operations': '.ipynb', '06-Tuple_Operations': '.ipynb', '07-Dict_Operations': '.ipynb', '08-File_Operations': '.ipynb', '09-Other_Utilities': '.ipynb', '10-Coding_Project': '.ipynb', '11-Numpy_Vectorized_Computation': '.ipynb', '12-Matplotlib_Data_Visualization': '.ipynb', '13-Pandas_Data_Processing': '.ipynb', '14-Sklearn_Building_A_Machine_Learning_Model': '.ipynb', '15-Sklearn_Data_Preprocessing': '.ipynb', '16-Sklearn_Best_Practice_Techniques': '.ipynb', '17-Artificial_Neural_Network_with_tf_Keras': '.ipynb', '18-ANN_Case_Studies': '.ipynb', '19-Practical_Autoencoders': '.ipynb', '20-CNN_Fundamental': '.ipynb', 'README': '.md'}
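The stem/suffix split above is one use of the path attributes. A sketch showing all six attributes on two hypothetical paths; PureWindowsPath and PurePosixPath only manipulate strings and never touch the filesystem, so the paths need not exist and the example behaves the same on any operating system.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical paths; pure paths never access the filesystem
win = PureWindowsPath(r'C:\Users\James\record.pkl')
posix = PurePosixPath('/home/james/card.json')

for p in (win, posix):
    print(repr(p.drive), repr(p.root), p.parent, p.name, p.suffix, p.stem)
# 'C:' '\\' C:\Users\James record.pkl .pkl record
# '' '/' /home/james card.json .json card
```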
# List all files in the current working directory with the .ipynb extension
[f.name for f in pwd.glob('*.ipynb')]
['01-Getting_Started.ipynb', '02-Syntax_Overview_1.ipynb', '03-Syntax_Overview_2.ipynb', '04-String_Operations.ipynb', '05-List_Operations.ipynb', '06-Tuple_Operations.ipynb', '07-Dict_Operations.ipynb', '08-File_Operations.ipynb', '09-Other_Utilities.ipynb', '10-Coding_Project.ipynb', '11-Numpy_Vectorized_Computation.ipynb', '12-Matplotlib_Data_Visualization.ipynb', '13-Pandas_Data_Processing.ipynb', '14-Sklearn_Building_A_Machine_Learning_Model.ipynb', '15-Sklearn_Data_Preprocessing.ipynb', '16-Sklearn_Best_Practice_Techniques.ipynb', '17-Artificial_Neural_Network_with_tf_Keras.ipynb', '18-ANN_Case_Studies.ipynb', '19-Practical_Autoencoders.ipynb', '20-CNN_Fundamental.ipynb']
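Several methods in the list (mkdir, open, rename, iterdir, unlink, rmdir) were not exercised above. A sketch of a full create/rename/delete cycle, run inside a tempfile.TemporaryDirectory so nothing in the real working directory is touched; the names demo, note.txt, and note.md are hypothetical.

```python
import tempfile
from pathlib import Path

# Work inside a throwaway directory that is cleaned up automatically
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / 'demo'                  # hypothetical directory name
    base.mkdir()                               # create the directory
    note = base / 'note.txt'
    with note.open('w') as f:                  # Path.open() works like open()
        f.write('hello')
    renamed = note.rename(base / 'note.md')    # returns the new Path (Python 3.8+)
    names = sorted(p.name for p in base.iterdir())
    renamed_is_file = renamed.is_file()
    old_exists = note.exists()
    print(names)                               # ['note.md']
    print(renamed_is_file, old_exists)         # True False
    renamed.unlink()                           # delete the file
    base.rmdir()                               # works only because the directory is now empty
    dir_gone = not base.exists()
    print(dir_gone)                            # True
```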
# def function_name(parameter: type) -> return_type
def jiume(who: str) -> str:
    return who + ' >.^ '
jiume('Mary')
'Mary >.^ '
def addmyself(myself: int) -> int:
    return myself + myself
addmyself(5)
10
from typing import Any
def triple(what: Any) -> Any:
    return what * 3
print(triple(jiume('Mary')))
print(triple(addmyself(5)))
Mary >.^ Mary >.^ Mary >.^ 
30
from math import fsum
# dot_product() takes two lists as arguments
def dot_product(vec1: list, vec2: list) -> float:
    return fsum(c1 * c2 for c1, c2 in zip(vec1, vec2))
# Type hints are only hints: Python does not enforce them at runtime, so passing two tuples still works
vector1, vector2 = (1, 2, 3), (4, 5, 6)
print('{} dot {} = {}'.format(vector1, vector2, dot_product(vector1, vector2)))
(1, 2, 3) dot (4, 5, 6) = 32.0
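Where do the hints go if Python ignores them? A sketch (repeating dot_product from above) showing that annotations are simply stored on the function object in __annotations__, and that arguments of other types, here range objects, are accepted at call time. A static type checker such as mypy would typically flag such calls; Python itself does not.

```python
from math import fsum

def dot_product(vec1: list, vec2: list) -> float:
    return fsum(c1 * c2 for c1, c2 in zip(vec1, vec2))

# Annotations are stored as metadata on the function object
print(dot_product.__annotations__)
# {'vec1': <class 'list'>, 'vec2': <class 'list'>, 'return': <class 'float'>}

# ...but they are not checked when the function is called
result = dot_product(range(3), range(1, 4))  # 0*1 + 1*2 + 2*3
print(result)  # 8.0
```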
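A type alias keeps code concise, for example when a type is defined in a deeply nested package module or, as below, when it is a long composite type.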
from typing import Tuple
from math import hypot
point3d = Tuple[float, float, float]
def distance3d(p1: point3d, p2: point3d) -> float:
    return hypot(*[(x1 - x2) for x1, x2 in zip(p1, p2)])
a = (1, 2, 3)
b = (3., 2., 1.)
print('distance between points', a, 'and', b, '=', distance3d(a, b))
distance between points (1, 2, 3) and (3.0, 2.0, 1.0) = 2.8284271247461903
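On Python 3.9 and later, the built-in tuple can be subscripted directly, so the alias can be written without importing typing.Tuple (which the Python documentation now lists as deprecated). A sketch of the same distance function using that form; the name Point3D is an illustrative choice.

```python
from math import hypot

# Alias built from the built-in tuple (Python 3.9+)
Point3D = tuple[float, float, float]

def distance3d(p1: Point3D, p2: Point3D) -> float:
    return hypot(*(x1 - x2 for x1, x2 in zip(p1, p2)))

print(Point3D)                              # tuple[float, float, float]
print(distance3d((1, 2, 3), (3., 2., 1.)))  # 2.8284271247461903
```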