from benchmarkit import benchmark, benchmark_analyze, benchmark_run
N = 10000
seq_list = list(range(N))
seq_set = set(range(N))
SAVE_PATH = '/tmp/benchmark_time.jsonl'
@benchmark(num_iters=100, save_params=True)
def search_in_list(num_items=N):
    return num_items - 1 in seq_list
@benchmark(num_iters=100, save_params=True)
def search_in_set(num_items=N):
    return num_items - 1 in seq_set
benchmark_results = benchmark_run([search_in_list, search_in_set], SAVE_PATH, comment='initial benchmark search', extra_fields=['num_items'])
File: benchmark_time.jsonl
Function: search_in_list

| | date | time | branch | commit | commit_date | comment | best_time | mean_time | num_items | mean_time_diff |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-05-21 | 18:50:44 | master | 3a1e755 | 2019-05-20 | initial benchmark search | 0.1022 | 0.1125 | 10000 | NaN |

File: benchmark_time.jsonl
Function: search_in_set

| | date | time | branch | commit | commit_date | comment | best_time | mean_time | num_items | mean_time_diff |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-05-21 | 18:50:47 | master | 3a1e755 | 2019-05-20 | initial benchmark search | 0.003 | 0.0041 | 10000 | NaN |

File: benchmark_time.jsonl
Total

| | date | time | branch | commit | commit_date | comment | best_time | mean_time | mean_time_diff |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-05-21 | 18:50:44 | master | 3a1e755 | 2019-05-20 | initial benchmark search | 0.1052 | 0.1166 | NaN |
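The `best_time` and `mean_time` columns summarize repeated calls of each decorated function (here `num_iters=100`). Below is a minimal sketch of that idea. It is not benchmarkit's implementation. The decorator name `timed` and its return format are hypothetical, and it assumes wall-clock times measured with `time.perf_counter` and reported in milliseconds.

```python
import time
from functools import wraps


def timed(num_iters=100):
    """Hypothetical decorator: call the function num_iters times and
    report the best and mean wall-clock time in milliseconds."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            timings_ms = []
            for _ in range(num_iters):
                start = time.perf_counter()
                func(*args, **kwargs)
                timings_ms.append((time.perf_counter() - start) * 1000)
            return {
                'name': func.__name__,
                'best_time': round(min(timings_ms), 4),  # fastest single call
                'mean_time': round(sum(timings_ms) / len(timings_ms), 4),  # average call
            }
        return wrapper
    return decorator


seq_list = list(range(10000))


@timed(num_iters=100)
def search_in_list(num_items=10000):
    return num_items - 1 in seq_list


result = search_in_list()
print(result)
```

Reporting the best time alongside the mean is a common design choice. The minimum is less sensitive to interference from other processes, while the mean shows typical behavior.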
Change `N` and repeat the benchmark. It is enough to change `N` in cell 2 and re-run cells 3 and 4:
N = 1000000
seq_list = list(range(N))
seq_set = set(range(N))
@benchmark(num_iters=100, save_params=True)
def search_in_list(num_items=N):
    return num_items - 1 in seq_list
@benchmark(num_iters=100, save_params=True)
def search_in_set(num_items=N):
    return num_items - 1 in seq_set
benchmark_results = benchmark_run([search_in_list, search_in_set], SAVE_PATH, comment='million items', extra_fields=['num_items'])
File: benchmark_time.jsonl
Function: search_in_list

| | date | time | branch | commit | commit_date | comment | best_time | mean_time | num_items | mean_time_diff |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-05-21 | 18:50:44 | master | 3a1e755 | 2019-05-20 | initial benchmark search | 0.1022 | 0.1125 | 10000 | NaN |
| 1 | 2019-05-21 | 18:51:02 | master | 3a1e755 | 2019-05-20 | million items | 9.6100 | 10.1782 | 1000000 | 10.0657 |

File: benchmark_time.jsonl
Function: search_in_set

| | date | time | branch | commit | commit_date | comment | best_time | mean_time | num_items | mean_time_diff |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-05-21 | 18:50:47 | master | 3a1e755 | 2019-05-20 | initial benchmark search | 0.0030 | 0.0041 | 10000 | NaN |
| 1 | 2019-05-21 | 18:51:06 | master | 3a1e755 | 2019-05-20 | million items | 0.0032 | 0.0043 | 1000000 | 0.0002 |

File: benchmark_time.jsonl
Total

| | date | time | branch | commit | commit_date | comment | best_time | mean_time | mean_time_diff |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-05-21 | 18:50:44 | master | 3a1e755 | 2019-05-20 | initial benchmark search | 0.1052 | 0.1166 | NaN |
| 1 | 2019-05-21 | 18:51:02 | master | 3a1e755 | 2019-05-20 | million items | 9.6132 | 10.1825 | 10.0659 |
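With 100 times more items, the mean time of `search_in_list` rises from about 0.11 to 10.18, while `search_in_set` stays near 0.004. The data structures predict this. `x in list` scans the elements one by one (O(n) in the worst case), whereas `x in set` is a hash lookup (O(1) on average). The standalone sketch below reproduces the trend with the standard-library `timeit` module, independent of benchmarkit. The `timings` dictionary is illustrative.

```python
import timeit

timings = {}  # n -> mean time per lookup in milliseconds, for list and set
for n in (10_000, 1_000_000):
    seq_list = list(range(n))
    seq_set = set(range(n))
    target = n - 1  # worst case for the list: the last element
    # timeit.timeit returns the total seconds taken by `number` calls
    list_ms = timeit.timeit(lambda: target in seq_list, number=20) / 20 * 1000
    set_ms = timeit.timeit(lambda: target in seq_set, number=20) / 20 * 1000
    timings[n] = {'list': list_ms, 'set': set_ms}
    print(f'n={n:>9,}: list {list_ms:.4f} ms, set {set_ms:.6f} ms')
```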
`benchmark_results` contains the benchmark data for the last run:
benchmark_results
[{'name': 'search_in_list', 'best_time': 9.61, 'mean_time': 10.1782, 'date': '2019-05-21', 'time': '18:51:02', 'branch': 'master', 'commit': '3a1e755', 'commit_date': '2019-05-20', 'num_items': 1000000, 'comment': 'million items', '_id': 'b443bfb9-78ef-4894-9fa5-aaa41025731c'}, {'name': 'search_in_set', 'best_time': 0.0032, 'mean_time': 0.0043, 'date': '2019-05-21', 'time': '18:51:06', 'branch': 'master', 'commit': '3a1e755', 'commit_date': '2019-05-20', 'num_items': 1000000, 'comment': 'million items', '_id': 'b443bfb9-78ef-4894-9fa5-aaa41025731c'}]
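`benchmark_results` is a plain list of dicts, so it can be post-processed directly. The minimal sketch below computes the mean-time speed-up of the set over the list. It uses values copied from the output above, and the names `by_name` and `speedup` are illustrative.

```python
# A subset of the fields printed above for the "million items" run
benchmark_results = [
    {'name': 'search_in_list', 'best_time': 9.61, 'mean_time': 10.1782, 'num_items': 1000000},
    {'name': 'search_in_set', 'best_time': 0.0032, 'mean_time': 0.0043, 'num_items': 1000000},
]

# Index the records by function name for easy lookup
by_name = {record['name']: record for record in benchmark_results}
speedup = by_name['search_in_list']['mean_time'] / by_name['search_in_set']['mean_time']
print(f'set lookup is ~{speedup:.0f}x faster on mean_time')  # prints: set lookup is ~2367x faster on mean_time
```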
Run `benchmark_run` from the command line (omit the `!` in a real terminal):
!benchmark_run ../test_data/time/benchmark_functions.py --save_dir /tmp/ --comment "million items" --extra_fields num_items
File: benchmark_functions.jsonl
Function: search_in_list

| | date | time | branch | commit | commit_date | comment | best_time | mean_time | num_items | mean_time_diff |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-05-21 | 18:51:16 | master | 3a1e755 | 2019-05-20 | million items | 9.591 | 10.0661 | 1000000 | NaN |

File: benchmark_functions.jsonl
Function: search_in_set

| | date | time | branch | commit | commit_date | comment | best_time | mean_time | num_items | mean_time_diff |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-05-21 | 18:51:19 | master | 3a1e755 | 2019-05-20 | million items | 0.0022 | 0.0032 | 1000000 | NaN |

File: benchmark_functions.jsonl
Total

| | date | time | branch | commit | commit_date | comment | best_time | mean_time | mean_time_diff |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-05-21 | 18:51:16 | master | 3a1e755 | 2019-05-20 | million items | 9.5932 | 10.0693 | NaN |
`benchmark_analyze` prints the benchmark results stored in the file:
benchmark_df = benchmark_analyze(SAVE_PATH, extra_fields=['num_items'])
File: benchmark_time.jsonl
Function: search_in_list

| | date | time | branch | commit | commit_date | comment | best_time | mean_time | num_items | mean_time_diff |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-05-21 | 18:50:44 | master | 3a1e755 | 2019-05-20 | initial benchmark search | 0.1022 | 0.1125 | 10000 | NaN |
| 1 | 2019-05-21 | 18:51:02 | master | 3a1e755 | 2019-05-20 | million items | 9.6100 | 10.1782 | 1000000 | 10.0657 |

File: benchmark_time.jsonl
Function: search_in_set

| | date | time | branch | commit | commit_date | comment | best_time | mean_time | num_items | mean_time_diff |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-05-21 | 18:50:47 | master | 3a1e755 | 2019-05-20 | initial benchmark search | 0.0030 | 0.0041 | 10000 | NaN |
| 1 | 2019-05-21 | 18:51:06 | master | 3a1e755 | 2019-05-20 | million items | 0.0032 | 0.0043 | 1000000 | 0.0002 |

File: benchmark_time.jsonl
Total

| | date | time | branch | commit | commit_date | comment | best_time | mean_time | mean_time_diff |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-05-21 | 18:50:44 | master | 3a1e755 | 2019-05-20 | initial benchmark search | 0.1052 | 0.1166 | NaN |
| 1 | 2019-05-21 | 18:51:02 | master | 3a1e755 | 2019-05-20 | million items | 9.6132 | 10.1825 | 10.0659 |
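The `mean_time_diff` column matches the change in `mean_time` from the previous run of the same function. For `search_in_list`, 10.1782 - 0.1125 = 10.0657. The first run has nothing to compare against, hence `NaN`. The minimal standard-library sketch below makes two assumptions: each line of the `.jsonl` file holds one JSON record, and each record has the same shape as the dicts returned by `benchmark_run`. The records and the temporary file are made up for illustration, and this is not benchmarkit's code.

```python
import json
import os
import tempfile
from collections import defaultdict

# Hypothetical records, assumed to mirror the dicts returned by benchmark_run
records = [
    {'name': 'search_in_list', 'date': '2019-05-21', 'time': '18:50:44', 'mean_time': 0.1125},
    {'name': 'search_in_set', 'date': '2019-05-21', 'time': '18:50:47', 'mean_time': 0.0041},
    {'name': 'search_in_list', 'date': '2019-05-21', 'time': '18:51:02', 'mean_time': 10.1782},
    {'name': 'search_in_set', 'date': '2019-05-21', 'time': '18:51:06', 'mean_time': 0.0043},
]

path = os.path.join(tempfile.mkdtemp(), 'benchmark_time.jsonl')
with open(path, 'w') as f:
    for record in records:
        f.write(json.dumps(record) + '\n')  # one JSON object per line

# Read the file back and group the runs by function name, keeping file order
runs = defaultdict(list)
with open(path) as f:
    for line in f:
        record = json.loads(line)
        runs[record['name']].append(record)

# Difference from the previous run of the same function (None for the first run)
diffs = {}
for name, history in runs.items():
    diffs[name] = [None] + [
        round(cur['mean_time'] - prev['mean_time'], 4)
        for prev, cur in zip(history, history[1:])
    ]
print(diffs)
```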
`benchmark_df` contains a pandas DataFrame with the results:
benchmark_df
| | date | time | name | branch | commit | commit_date | comment | best_time | mean_time | num_items | mean_time_diff |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-05-21 | 18:50:44 | search_in_list | master | 3a1e755 | 2019-05-20 | initial benchmark search | 0.1022 | 0.1125 | 10000.0 | NaN |
| 1 | 2019-05-21 | 18:51:02 | search_in_list | master | 3a1e755 | 2019-05-20 | million items | 9.6100 | 10.1782 | 1000000.0 | 10.0657 |
| 0 | 2019-05-21 | 18:50:47 | search_in_set | master | 3a1e755 | 2019-05-20 | initial benchmark search | 0.0030 | 0.0041 | 10000.0 | NaN |
| 1 | 2019-05-21 | 18:51:06 | search_in_set | master | 3a1e755 | 2019-05-20 | million items | 0.0032 | 0.0043 | 1000000.0 | 0.0002 |
| 0 | 2019-05-21 | 18:50:44 | total | master | 3a1e755 | 2019-05-20 | initial benchmark search | 0.1052 | 0.1166 | NaN | NaN |
| 1 | 2019-05-21 | 18:51:02 | total | master | 3a1e755 | 2019-05-20 | million items | 9.6132 | 10.1825 | NaN | 10.0659 |
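The `total` rows are consistent with summing each run's times over all benchmarked functions. For the first run, 0.1022 + 0.0030 = 0.1052 and 0.1125 + 0.0041 = 0.1166. Below is a short check on the values from the table. The nested `runs` layout is only an illustration.

```python
# best_time and mean_time per function and per run, copied from the table
runs = {
    'initial benchmark search': {
        'search_in_list': {'best_time': 0.1022, 'mean_time': 0.1125},
        'search_in_set': {'best_time': 0.0030, 'mean_time': 0.0041},
    },
    'million items': {
        'search_in_list': {'best_time': 9.6100, 'mean_time': 10.1782},
        'search_in_set': {'best_time': 0.0032, 'mean_time': 0.0043},
    },
}

# Sum each timing column across the functions of the same run
totals = {}
for comment, functions in runs.items():
    totals[comment] = {
        key: round(sum(f[key] for f in functions.values()), 4)
        for key in ('best_time', 'mean_time')
    }
print(totals)
```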
Run `benchmark_analyze` from the command line (omit the `!` in a real terminal):
!benchmark_analyze /tmp/benchmark_time.jsonl --extra_fields num_items
File: benchmark_time.jsonl
Function: search_in_list

| | date | time | branch | commit | commit_date | comment | best_time | mean_time | num_items | mean_time_diff |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-05-21 | 18:50:44 | master | 3a1e755 | 2019-05-20 | initial benchmark search | 0.1022 | 0.1125 | 10000 | NaN |
| 1 | 2019-05-21 | 18:51:02 | master | 3a1e755 | 2019-05-20 | million items | 9.6100 | 10.1782 | 1000000 | 10.0657 |

File: benchmark_time.jsonl
Function: search_in_set

| | date | time | branch | commit | commit_date | comment | best_time | mean_time | num_items | mean_time_diff |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-05-21 | 18:50:47 | master | 3a1e755 | 2019-05-20 | initial benchmark search | 0.0030 | 0.0041 | 10000 | NaN |
| 1 | 2019-05-21 | 18:51:06 | master | 3a1e755 | 2019-05-20 | million items | 0.0032 | 0.0043 | 1000000 | 0.0002 |

File: benchmark_time.jsonl
Total

| | date | time | branch | commit | commit_date | comment | best_time | mean_time | mean_time_diff |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-05-21 | 18:50:44 | master | 3a1e755 | 2019-05-20 | initial benchmark search | 0.1052 | 0.1166 | NaN |
| 1 | 2019-05-21 | 18:51:02 | master | 3a1e755 | 2019-05-20 | million items | 9.6132 | 10.1825 | 10.0659 |
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
MODEL_BENCHMARK_SAVE_FILE = '/tmp/benchmark_model.jsonl'
x, y = load_iris(return_X_y=True)
Only the parameters passed to the decorated function `log_regression` are saved (the regularization parameter `C` and `fit_intercept`).
To save results, the decorated function `log_regression` should return a `dict` containing the results to be saved.
@benchmark(save_params=True, save_output=True)
def log_regression(C=1.0, fit_intercept=True):
    clf = LogisticRegression(
        random_state=0,
        solver='lbfgs',
        multi_class='multinomial',
        C=C,
        fit_intercept=fit_intercept,
    )
    clf.fit(x, y)
    score = clf.score(x, y)
    return {'score': score}
model_benchmark_results = benchmark_run(
log_regression,
MODEL_BENCHMARK_SAVE_FILE,
comment='baseline model',
extra_fields=['C', 'fit_intercept'],
metric='score',
bigger_is_better=True,
)
File: benchmark_model.jsonl
Function: log_regression

| | date | time | branch | commit | commit_date | comment | best_time | mean_time | C | fit_intercept | score | score_diff |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-05-21 | 18:51:35 | master | 3a1e755 | 2019-05-20 | baseline model | 26.2866 | 26.2866 | 1.0 | True | 0.973333 | NaN |
/mnt/ubuntu_storage/home/vitaliy/scoutbee/benchmarkit/env/lib/python3.6/site-packages/sklearn/linear_model/logistic.py:947: ConvergenceWarning: lbfgs failed to converge. Increase the number of iterations. "of iterations.", ConvergenceWarning)
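In the output above, the `score` column comes from the dict that `log_regression` returns, while `C` and `fit_intercept` come from its parameters. The hypothetical decorator below shows one way to build such a record. It uses `inspect.signature` so that parameters are captured together with their defaults. The name `with_output`, the stand-in function `fake_model`, and the record layout are assumptions for illustration, not benchmarkit's implementation.

```python
import inspect
import time
from functools import wraps


def with_output(func):
    """Hypothetical decorator (not benchmarkit's code): time one call,
    then merge the call parameters and the returned dict into one record."""
    signature = inspect.signature(func)

    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        output = func(*args, **kwargs)
        elapsed_ms = (time.perf_counter() - start) * 1000
        if not isinstance(output, dict):
            raise TypeError(f'{func.__name__} must return a dict of results')
        # Bind the actual arguments and fill in defaults, so C=1.0 is
        # recorded even when the caller does not pass it explicitly
        bound = signature.bind(*args, **kwargs)
        bound.apply_defaults()
        record = {'name': func.__name__, 'best_time': elapsed_ms, 'mean_time': elapsed_ms}
        record.update(bound.arguments)  # parameters, e.g. C and fit_intercept
        record.update(output)           # results, e.g. score
        return record

    return wrapper


@with_output
def fake_model(C=1.0, fit_intercept=True):
    # Hypothetical stand-in for log_regression that returns a fixed score
    return {'score': 0.9}


print(fake_model())
print(fake_model(C=0.5))
```

With a single timed call, best and mean time coincide. This is consistent with `best_time` equaling `mean_time` for `log_regression` in the output above.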
Change the hyperparameter `C`:
@benchmark(save_params=True, save_output=True)
def log_regression(C=0.5, fit_intercept=True):
    clf = LogisticRegression(
        random_state=0,
        solver='lbfgs',
        multi_class='multinomial',
        C=C,
        fit_intercept=fit_intercept,
    )
    clf.fit(x, y)
    score = clf.score(x, y)
    return {'score': score}
model_benchmark_results = benchmark_run(
log_regression,
MODEL_BENCHMARK_SAVE_FILE,
comment='stronger regularization',
extra_fields=['C', 'fit_intercept'],
metric='score',
bigger_is_better=True,
)
File: benchmark_model.jsonl
Function: log_regression

| | date | time | branch | commit | commit_date | comment | best_time | mean_time | C | fit_intercept | score | score_diff |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-05-21 | 18:51:35 | master | 3a1e755 | 2019-05-20 | baseline model | 26.2866 | 26.2866 | 1.0 | True | 0.973333 | NaN |
| 1 | 2019-05-21 | 18:51:37 | master | 3a1e755 | 2019-05-20 | stronger regularization | 22.2981 | 22.2981 | 0.5 | True | 0.966667 | -0.006667 |
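The comment `'stronger regularization'` reflects how scikit-learn parameterizes `LogisticRegression`. `C` is the inverse of the regularization strength, so lowering it from 1.0 to 0.5 strengthens the penalty. As a sketch with the intercept terms omitted, the L2-penalized multinomial objective has the form

$$
\min_{W}\; \frac{1}{2}\lVert W\rVert_F^2 \;-\; C\sum_{i=1}^{n}\log \hat{p}_{y_i}(x_i),
\qquad
\hat{p}_k(x) = \frac{\exp(w_k^\top x)}{\sum_{j}\exp(w_j^\top x)}
$$

where $w_k$ is the $k$-th row of the coefficient matrix $W$. Halving $C$ doubles the relative weight of the penalty term, which pulls the coefficients toward zero. Here that coincides with a small drop in training accuracy (`score` falls from 0.973333 to 0.966667).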
model_benchmark_results
[{'name': 'log_regression', 'best_time': 22.2981, 'mean_time': 22.2981, 'date': '2019-05-21', 'time': '18:51:37', 'branch': 'master', 'commit': '3a1e755', 'commit_date': '2019-05-20', 'C': 0.5, 'fit_intercept': True, 'score': 0.9666666666666667, 'comment': 'stronger regularization', '_id': 'f3c3f2a0-fd11-4790-a58a-34aac1f44f5a'}]
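The arguments `metric='score'` and `bigger_is_better=True` name the result column to track (reported with a `score_diff` column) and indicate that higher values are better. The sketch below illustrates that direction flag by selecting the best of the two runs. The helper `best_run` is hypothetical and not part of benchmarkit.

```python
def best_run(records, metric, bigger_is_better=False):
    """Hypothetical helper: return the record with the best value of `metric`."""
    pick = max if bigger_is_better else min  # times: lower is better; scores: higher
    return pick(records, key=lambda record: record[metric])


# Values copied from the two runs in the output above
runs = [
    {'comment': 'baseline model', 'C': 1.0, 'mean_time': 26.2866, 'score': 0.973333},
    {'comment': 'stronger regularization', 'C': 0.5, 'mean_time': 22.2981, 'score': 0.966667},
]

print(best_run(runs, 'score', bigger_is_better=True)['comment'])  # baseline model
print(best_run(runs, 'mean_time')['comment'])                     # stronger regularization
```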
model_benchmark_df = benchmark_analyze(MODEL_BENCHMARK_SAVE_FILE, metric='score', bigger_is_better=True, extra_fields=['C', 'fit_intercept'])
File: benchmark_model.jsonl
Function: log_regression

| | date | time | branch | commit | commit_date | comment | best_time | mean_time | C | fit_intercept | score | score_diff |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-05-21 | 18:51:35 | master | 3a1e755 | 2019-05-20 | baseline model | 26.2866 | 26.2866 | 1.0 | True | 0.973333 | NaN |
| 1 | 2019-05-21 | 18:51:37 | master | 3a1e755 | 2019-05-20 | stronger regularization | 22.2981 | 22.2981 | 0.5 | True | 0.966667 | -0.006667 |
model_benchmark_df
| | date | time | name | branch | commit | commit_date | comment | best_time | mean_time | C | fit_intercept | score | score_diff |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-05-21 | 18:51:35 | log_regression | master | 3a1e755 | 2019-05-20 | baseline model | 26.2866 | 26.2866 | 1.0 | True | 0.973333 | NaN |
| 1 | 2019-05-21 | 18:51:37 | log_regression | master | 3a1e755 | 2019-05-20 | stronger regularization | 22.2981 | 22.2981 | 0.5 | True | 0.966667 | -0.006667 |
Run `benchmark_analyze` from the command line (omit the `!` in a real terminal):
!benchmark_analyze /tmp/benchmark_model.jsonl --metric score --bigger_is_better --extra_fields C fit_intercept
File: benchmark_model.jsonl
Function: log_regression

| | date | time | branch | commit | commit_date | comment | best_time | mean_time | C | fit_intercept | score | score_diff |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019-05-21 | 18:51:35 | master | 3a1e755 | 2019-05-20 | baseline model | 26.2866 | 26.2866 | 1.0 | True | 0.973333 | NaN |
| 1 | 2019-05-21 | 18:51:37 | master | 3a1e755 | 2019-05-20 | stronger regularization | 22.2981 | 22.2981 | 0.5 | True | 0.966667 | -0.006667 |