!pip3 install seaborn scikit-learn matplotlib ax-platform ray tabulate torchvision
Requirement already satisfied: seaborn in /usr/local/lib/python3.8/site-packages (0.10.1) Requirement already satisfied: scikit-learn in /usr/local/lib/python3.8/site-packages (0.23.2) Requirement already satisfied: matplotlib in /usr/local/lib/python3.8/site-packages (3.3.1) Requirement already satisfied: ax-platform in /usr/local/lib/python3.8/site-packages (0.1.14) Requirement already satisfied: ray in /usr/local/lib/python3.8/site-packages (0.8.7) Requirement already satisfied: tabulate in /usr/local/lib/python3.8/site-packages (0.8.7) Requirement already satisfied: torchvision in /usr/local/lib/python3.8/site-packages (0.7.0) Requirement already satisfied: scipy>=1.0.1 in /usr/local/lib/python3.8/site-packages (from seaborn) (1.5.2) Requirement already satisfied: numpy>=1.13.3 in /usr/local/lib/python3.8/site-packages (from seaborn) (1.16.0) Requirement already satisfied: pandas>=0.22.0 in /usr/local/lib/python3.8/site-packages (from seaborn) (1.0.0) Requirement already satisfied: joblib>=0.11 in /usr/local/lib/python3.8/site-packages (from scikit-learn) (0.16.0) Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.8/site-packages (from scikit-learn) (2.1.0) Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.8/site-packages (from matplotlib) (0.10.0) Requirement already satisfied: certifi>=2020.06.20 in /usr/local/lib/python3.8/site-packages (from matplotlib) (2020.6.20) Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.8/site-packages (from matplotlib) (2.8.1) Requirement already satisfied: pillow>=6.2.0 in /usr/local/lib/python3.8/site-packages (from matplotlib) (7.2.0) Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3 in /usr/local/lib/python3.8/site-packages (from matplotlib) (2.4.7) Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.8/site-packages (from matplotlib) (1.2.0) Requirement already satisfied: botorch>=0.2.2 in /usr/local/lib/python3.8/site-packages (from ax-platform) (0.3.0) Requirement already satisfied: plotly in /usr/local/lib/python3.8/site-packages (from ax-platform) (4.9.0) Requirement already satisfied: jinja2 in /usr/local/lib/python3.8/site-packages (from ax-platform) (2.11.2) Requirement already satisfied: click>=7.0 in /usr/local/lib/python3.8/site-packages (from ray) (7.1.2) Requirement already satisfied: protobuf>=3.8.0 in /usr/local/lib/python3.8/site-packages (from ray) (3.13.0) Requirement already satisfied: pyyaml in /usr/local/lib/python3.8/site-packages (from ray) (5.3.1) Requirement already satisfied: gpustat in /usr/local/lib/python3.8/site-packages (from ray) (0.6.0) Requirement already satisfied: msgpack<2.0.0,>=1.0.0 in /usr/local/lib/python3.8/site-packages (from ray) (1.0.0) Requirement already satisfied: colorama in /usr/local/lib/python3.8/site-packages (from ray) (0.4.3) Requirement already satisfied: google in /usr/local/lib/python3.8/site-packages (from ray) (3.0.0) Requirement already satisfied: opencensus in /usr/local/lib/python3.8/site-packages (from ray) (0.7.10) Requirement already satisfied: aioredis in /usr/local/lib/python3.8/site-packages (from ray) (1.3.1) Requirement already satisfied: aiohttp in /usr/local/lib/python3.8/site-packages (from ray) (3.6.2) Requirement already satisfied: grpcio>=1.28.1 in /usr/local/lib/python3.8/site-packages (from ray) (1.31.0) Requirement already satisfied: jsonschema in /usr/local/lib/python3.8/site-packages (from ray) (3.2.0) Requirement already satisfied: colorful in /usr/local/lib/python3.8/site-packages (from ray) (0.5.4) Requirement already satisfied: py-spy>=0.2.0 in /usr/local/lib/python3.8/site-packages (from ray) (0.3.3) Requirement already satisfied: requests in /usr/local/lib/python3.8/site-packages (from ray) (2.24.0) Requirement already satisfied: filelock in /usr/local/lib/python3.8/site-packages (from ray) (3.0.12) Requirement already satisfied: redis<3.5.0,>=3.3.2 in /usr/local/lib/python3.8/site-packages (from ray) (3.4.1) Requirement already satisfied: prometheus-client>=0.7.1 in /usr/local/lib/python3.8/site-packages (from ray) (0.8.0) Requirement already satisfied: torch==1.6.0 in /usr/local/lib/python3.8/site-packages (from torchvision) (1.6.0) Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.8/site-packages (from pandas>=0.22.0->seaborn) (2020.1) Requirement already satisfied: six in /usr/local/lib/python3.8/site-packages (from cycler>=0.10->matplotlib) (1.15.0) Requirement already satisfied: gpytorch>=1.1.1 in /usr/local/lib/python3.8/site-packages (from botorch>=0.2.2->ax-platform) (1.2.0) Requirement already satisfied: retrying>=1.3.3 in /usr/local/lib/python3.8/site-packages (from plotly->ax-platform) (1.3.3) Requirement already satisfied: MarkupSafe>=0.23 in /usr/local/lib/python3.8/site-packages (from jinja2->ax-platform) (1.1.1) Requirement already satisfied: setuptools in /usr/local/lib/python3.8/site-packages (from protobuf>=3.8.0->ray) (49.2.0) Requirement already satisfied: blessings>=1.6 in /usr/local/lib/python3.8/site-packages (from gpustat->ray) (1.7) Requirement already satisfied: psutil in /usr/local/lib/python3.8/site-packages (from gpustat->ray) (5.7.2) Requirement already satisfied: nvidia-ml-py3>=7.352.0 in /usr/local/lib/python3.8/site-packages (from gpustat->ray) (7.352.0) Requirement already satisfied: beautifulsoup4 in /usr/local/lib/python3.8/site-packages (from google->ray) (4.9.1) Requirement already satisfied: google-api-core<2.0.0,>=1.0.0 in /usr/local/lib/python3.8/site-packages (from opencensus->ray) (1.22.1) Requirement already satisfied: opencensus-context==0.1.1 in /usr/local/lib/python3.8/site-packages (from opencensus->ray) (0.1.1) Requirement already satisfied: hiredis in /usr/local/lib/python3.8/site-packages (from aioredis->ray) (1.1.0) Requirement already satisfied: async-timeout in /usr/local/lib/python3.8/site-packages (from aioredis->ray) (3.0.1) Requirement already satisfied: multidict<5.0,>=4.5 in /usr/local/lib/python3.8/site-packages (from aiohttp->ray) (4.7.6) Requirement already satisfied: chardet<4.0,>=2.0 in /usr/local/lib/python3.8/site-packages (from aiohttp->ray) (3.0.4) Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.8/site-packages (from aiohttp->ray) (19.3.0) Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.8/site-packages (from aiohttp->ray) (1.5.1) Requirement already satisfied: pyrsistent>=0.14.0 in /usr/local/lib/python3.8/site-packages (from jsonschema->ray) (0.16.0) Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.8/site-packages (from requests->ray) (1.25.9) Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.8/site-packages (from requests->ray) (2.10) Requirement already satisfied: future in /usr/local/lib/python3.8/site-packages (from torch==1.6.0->torchvision) (0.18.2) Requirement already satisfied: soupsieve>1.2 in /usr/local/lib/python3.8/site-packages (from beautifulsoup4->google->ray) (2.0.1) Requirement already satisfied: google-auth<2.0dev,>=1.19.1 in /usr/local/lib/python3.8/site-packages (from google-api-core<2.0.0,>=1.0.0->opencensus->ray) (1.21.0) Requirement already satisfied: googleapis-common-protos<2.0dev,>=1.6.0 in /usr/local/lib/python3.8/site-packages (from google-api-core<2.0.0,>=1.0.0->opencensus->ray) (1.52.0) Requirement already satisfied: rsa<5,>=3.1.4; python_version >= "3.5" in /usr/local/lib/python3.8/site-packages (from google-auth<2.0dev,>=1.19.1->google-api-core<2.0.0,>=1.0.0->opencensus->ray) (4.6) Requirement already satisfied: cachetools<5.0,>=2.0.0 in /usr/local/lib/python3.8/site-packages (from google-auth<2.0dev,>=1.19.1->google-api-core<2.0.0,>=1.0.0->opencensus->ray) (4.1.1) Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.8/site-packages (from google-auth<2.0dev,>=1.19.1->google-api-core<2.0.0,>=1.0.0->opencensus->ray) (0.2.8) Requirement already satisfied: pyasn1>=0.1.3 in /usr/local/lib/python3.8/site-packages (from rsa<5,>=3.1.4; python_version >= "3.5"->google-auth<2.0dev,>=1.19.1->google-api-core<2.0.0,>=1.0.0->opencensus->ray) (0.4.8)
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split, cross_validate, learning_curve
from sklearn.metrics import confusion_matrix
from sklearn.metrics import roc_curve, auc, classification_report
%matplotlib inline
df = pd.read_csv('churn.csv', usecols=[
'price',
'room_type',
'accommodates',
'review_scores_cleanliness',
'review_scores_location',
'host_response_rate',
'availability_365',
'number_of_reviews',
'reviews_per_month',
'churn'
])
labels = df.churn
features = df.drop('churn', axis=1)
features.head()
price | room_type | accommodates | review_scores_cleanliness | review_scores_location | host_response_rate | availability_365 | number_of_reviews | reviews_per_month | |
---|---|---|---|---|---|---|---|---|---|
0 | 200.0 | Entire home/apt | 3 | 10.0 | 10.0 | 0.0 | 243.0 | 4 | 0.0 |
1 | 450.0 | Entire home/apt | 4 | 10.0 | 10.0 | 0.0 | 208.0 | 7 | 0.0 |
2 | 28.0 | Shared room | 1 | 0.0 | 0.0 | 0.0 | 364.0 | 0 | 0.0 |
3 | 48.0 | Private room | 2 | 8.0 | 6.0 | 0.0 | 143.0 | 1 | 0.0 |
4 | 160.0 | Private room | 2 | 0.0 | 0.0 | 0.0 | 365.0 | 0 | 0.0 |
feature_matrix = pd.get_dummies(features).values
feature_matrix
array([[200., 3., 10., ..., 1., 0., 0.], [450., 4., 10., ..., 1., 0., 0.], [ 28., 1., 0., ..., 0., 0., 1.], ..., [200., 3., 10., ..., 1., 0., 0.], [235., 2., 10., ..., 1., 0., 0.], [ 49., 5., 10., ..., 0., 1., 0.]])
X_train, X_test, y_train, y_test = train_test_split(feature_matrix, labels, test_size=0.3)
print(len(X_train))
print(len(X_test))
54040 23161
# create model (estimator) object
model = LogisticRegression(penalty='l2')
model
LogisticRegression()
# fit model to training data
model.fit(X_train, y_train)
/usr/local/lib/python3.8/site-packages/sklearn/linear_model/_logistic.py:762: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result(
LogisticRegression()
model.get_params()
{'C': 1.0, 'class_weight': None, 'dual': False, 'fit_intercept': True, 'intercept_scaling': 1, 'l1_ratio': None, 'max_iter': 100, 'multi_class': 'auto', 'n_jobs': None, 'penalty': 'l2', 'random_state': None, 'solver': 'lbfgs', 'tol': 0.0001, 'verbose': 0, 'warm_start': False}
model.coef_
array([[-1.14667640e-04, -2.89031320e-02, -1.19389395e-01, 5.97402377e-02, -1.82766351e-03, 4.30085341e-03, -1.94983882e-02, 3.06824947e-02, 2.54465715e-01, -1.58843037e-01, -3.63911327e-03]])
# make predictions
predictions = model.predict(X_test)
predictions
array([False, True, True, ..., True, False, True])
print(classification_report(y_test, predictions))
precision recall f1-score support False 0.62 0.51 0.56 9797 True 0.68 0.77 0.72 13364 accuracy 0.66 23161 macro avg 0.65 0.64 0.64 23161 weighted avg 0.66 0.66 0.66 23161
# evaluate model
accuracy = (predictions == y_test).sum() / len(y_test)
accuracy
0.6617158153793015
model.score(X_test, y_test)
0.6617158153793015
df.churn.value_counts()
True 44626 False 32575 Name: churn, dtype: int64
from sklearn.model_selection import validation_curve
def plot_validation_curve(clf, X, y, param_name, param_range):
train_scores, test_scores = validation_curve(
clf, X, y, param_name=param_name, param_range=param_range,
scoring="accuracy", n_jobs=-1)
### matplotlib
train_scores_mean = np.mean(train_scores, axis=1)
train_scores_std = np.std(train_scores, axis=1)
test_scores_mean = np.mean(test_scores, axis=1)
test_scores_std = np.std(test_scores, axis=1)
plt.title("Validation Curve")
plt.xlabel(param_name)
plt.ylabel("Score")
plt.ylim(0.0, 1.1)
lw = 2
plt.semilogx(param_range, train_scores_mean, label="Training score",
color="darkorange", lw=lw)
plt.fill_between(param_range, train_scores_mean - train_scores_std,
train_scores_mean + train_scores_std, alpha=0.2,
color="darkorange", lw=lw)
plt.semilogx(param_range, test_scores_mean, label="Cross-validation score",
color="navy", lw=lw)
plt.fill_between(param_range, test_scores_mean - test_scores_std,
test_scores_mean + test_scores_std, alpha=0.2,
color="navy", lw=lw)
plt.legend(loc="best")
plt.show()
clf = LogisticRegression(penalty='l1', solver='liblinear', max_iter=1000)
param_range = [0.00001, 0.0001, 0.001, 0.01, 0.1, 1.0]
plot_validation_curve(clf, feature_matrix[:, :], labels[:], "C", param_range = param_range)
from sklearn.svm import SVC
%%timeit -n 1 -r 1
plot_validation_curve(SVC(), X_train[:5000, :], y_train[:5000], "gamma", param_range = np.logspace(-10, 1, 20))
17.1 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
# from https://scikit-learn.org/stable/auto_examples/model_selection/plot_learning_curve.html
def plot_learning_curve(estimator, title, X, y, ylim=None, cv=3,
n_jobs=None, train_sizes=np.linspace(.1, 1.0, 5)):
train_sizes, train_scores, test_scores = learning_curve(
estimator, X, y, cv=cv, n_jobs=n_jobs, train_sizes=train_sizes)
# plotting functions
plt.figure(figsize=(20, 10))
plt.title(title)
if ylim is not None:
plt.ylim(*ylim)
plt.xlabel("Training examples")
plt.ylabel("Score")
train_scores_mean = np.mean(train_scores, axis=1)
train_scores_std = np.std(train_scores, axis=1)
test_scores_mean = np.mean(test_scores, axis=1)
test_scores_std = np.std(test_scores, axis=1)
plt.grid()
plt.fill_between(train_sizes, train_scores_mean - train_scores_std,
train_scores_mean + train_scores_std, alpha=0.1,
color="r")
plt.fill_between(train_sizes, test_scores_mean - test_scores_std,
test_scores_mean + test_scores_std, alpha=0.1, color="g")
plt.plot(train_sizes, train_scores_mean, 'o-', color="r",
label="Training score")
plt.plot(train_sizes, test_scores_mean, 'o-', color="g",
label="Cross-validation score")
plt.legend(loc="best")
return plt, learning_curve
clf = SVC() #LogisticRegression(penalty='l2', solver='lbfgs', max_iter=2000)
plot, curve = plot_learning_curve(clf, 'Learning Curve', X_train[:10000], y_train[:10000], train_sizes=np.linspace(.1, 1.0, 5), cv=3)
def plot_confusion_matrix(cm, title='Confusion matrix', cmap=plt.cm.Blues):
plt.imshow(cm, interpolation='nearest', cmap=cmap)
plt.title(title)
plt.colorbar()
plt.tight_layout()
plt.ylabel('True label')
plt.xlabel('Predicted label')
def plot_roc(y_test, y_score):
fpr, tpr, thresholds = roc_curve(y_test, y_score[:, 1])
plt.figure(figsize=(10,10))
plt.plot(fpr, tpr, label='ROC curve (area = %0.2f)' % auc(fpr, tpr))
plt.plot([0, 1], [0, 1], 'k--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC plot for Logistic Regression for Airbnb Churn')
plt.legend(loc="lower right")
plt.show()
y_pred = pipeline.predict(X_test)
y_score = pipeline.predict_proba(X_test)
# Compute confusion matrix
cm = confusion_matrix(y_test, y_pred)
np.set_printoptions(precision=2)
print('Confusion matrix')
print(cm)
plt.figure(figsize=(10,10))
plot_confusion_matrix(cm)
Confusion matrix [[4569 2074] [2349 6449]]
plot_roc(y_test, y_score)
X_train, X_test, y_train, y_test = train_test_split(feature_matrix, labels, test_size=0.2)
len(10.0 ** -np.arange(1, 15))
14
%%time
# See also https://scikit-learn.org/stable/modules/grid_search.html
scores = np.zeros((2, 14))
kernels = ['rbf', 'sigmoid']
gamma = (10.0 ** -np.arange(1, 15))[::-1]
for i, kernel in enumerate(kernels):
for j, g in enumerate(gamma):
# train_evaluate(penalty, C)
clf = SVC(gamma = g, kernel=kernel) #LogisticRegression(penalty=penalty, C=C, solver='liblinear', max_iter=1500)
scores[i, j] = cross_val_score(clf, X_train[:10000, :], y_train[:10000], cv=3).mean()
CPU times: user 2min 24s, sys: 1.84 s, total: 2min 26s Wall time: 2min 27s
sns.heatmap(scores, xticklabels=gamma, yticklabels=kernels)
<AxesSubplot:>
scores[0].argmax()
4
# Create model with best performing hyperparameters
clf = SVC(gamma=0.0001, kernel='rbf')
# Fit on all of your (training) data
clf.fit(X_train, y_train)
# Evaluate on Holdout set
print(clf.score(X_test, y_test))
0.7130367204196619
import numpy as np
import time
import ray
from ray import tune
from ray.tune.schedulers import AsyncHyperBandScheduler
from ray.tune.suggest.ax import AxSearch
import torch
import torchvision
import torchvision.transforms as transforms
# test everything is working
from ax import optimize
best_parameters, best_values, experiment, model = optimize(
parameters=[
{
"name": "x1",
"type": "range",
"bounds": [-10.0, 10.0],
},
{
"name": "x2",
"type": "range",
"bounds": [-10.0, 10.0],
},
],
# Booth function
evaluation_function=lambda p: (p["x1"] + 2*p["x2"] - 7)**2 + (2*p["x1"] + p["x2"] - 5)**2,
minimize=True,
)
best_parameters
[INFO 09-10 14:45:17] ax.modelbridge.dispatch_utils: Using Bayesian Optimization generation strategy: GenerationStrategy(name='Sobol+GPEI', steps=[Sobol for 5 trials, GPEI for subsequent trials]). Iterations after 5 will take longer to generate due to model-fitting. [INFO 09-10 14:45:17] ax.service.managed_loop: Started full optimization with 20 steps. [INFO 09-10 14:45:17] ax.service.managed_loop: Running optimization trial 1... [INFO 09-10 14:45:17] ax.service.managed_loop: Running optimization trial 2... [INFO 09-10 14:45:17] ax.service.managed_loop: Running optimization trial 3... [INFO 09-10 14:45:17] ax.service.managed_loop: Running optimization trial 4... [INFO 09-10 14:45:17] ax.service.managed_loop: Running optimization trial 5... [INFO 09-10 14:45:17] ax.service.managed_loop: Running optimization trial 6... [INFO 09-10 14:45:18] ax.service.managed_loop: Running optimization trial 7... [INFO 09-10 14:45:18] ax.service.managed_loop: Running optimization trial 8... [INFO 09-10 14:45:19] ax.service.managed_loop: Running optimization trial 9... [INFO 09-10 14:45:20] ax.service.managed_loop: Running optimization trial 10... [INFO 09-10 14:45:20] ax.service.managed_loop: Running optimization trial 11... [INFO 09-10 14:45:21] ax.service.managed_loop: Running optimization trial 12... [INFO 09-10 14:45:21] ax.service.managed_loop: Running optimization trial 13... [INFO 09-10 14:45:22] ax.service.managed_loop: Running optimization trial 14... [INFO 09-10 14:45:23] ax.service.managed_loop: Running optimization trial 15... [INFO 09-10 14:45:23] ax.service.managed_loop: Running optimization trial 16... [INFO 09-10 14:45:24] ax.service.managed_loop: Running optimization trial 17... [INFO 09-10 14:45:24] ax.service.managed_loop: Running optimization trial 18... [INFO 09-10 14:45:25] ax.service.managed_loop: Running optimization trial 19... [INFO 09-10 14:45:26] ax.service.managed_loop: Running optimization trial 20...
{'x1': 0.9397810290289481, 'x2': 3.0414953429269413}
import numpy as np
from ax.modelbridge.cross_validation import cross_validate
from ax.modelbridge import NumpyModelBridge
from ax.modelbridge.registry import Models
from ax.plot.contour import interact_contour, plot_contour
from ax.plot.diagnostic import interact_cross_validation
from ax.plot.scatter import(
interact_fitted,
plot_objective_vs_constraints,
tile_fitted,
)
from ax.plot.slice import plot_slice
from ax.utils.notebook.plotting import render, init_notebook_plotting
from scipy import stats
init_notebook_plotting()
[INFO 09-10 14:46:48] ipy_plotting: Injecting Plotly library into cell. Do not overwrite or delete cell.
def train_evaluate(parameterization):
clf = SVC(kernel=parameterization.get('kernel'), gamma=parameterization.get('gamma'))
scores = cross_val_score(clf, X_train[:10000, :], y_train[:10000], cv=3)
return {"accuracy": (scores.mean(), stats.sem(scores))}
%%time
kernels = ['rbf', 'sigmoid']
gamma = (10.0 ** -np.arange(1, 15))[::-1]
best_parameters, values, experiment, model = optimize(
parameters=[
{"name": "kernel", "type": "choice", "values": ['rbf', 'sigmoid']},
{"name": "gamma", "type": "range", "bounds": [1e-8, 0.1], "log_scale": True},
],
evaluation_function=train_evaluate,
objective_name='accuracy',
total_trials=10 # Optional.
)
[INFO 09-10 15:20:47] ax.modelbridge.dispatch_utils: Using Sobol generation strategy. [INFO 09-10 15:20:47] ax.service.managed_loop: Started full optimization with 10 steps. [INFO 09-10 15:20:47] ax.service.managed_loop: Running optimization trial 1... [INFO 09-10 15:20:53] ax.service.managed_loop: Running optimization trial 2... [INFO 09-10 15:20:58] ax.service.managed_loop: Running optimization trial 3... [INFO 09-10 15:21:04] ax.service.managed_loop: Running optimization trial 4... [INFO 09-10 15:21:10] ax.service.managed_loop: Running optimization trial 5... [INFO 09-10 15:21:15] ax.service.managed_loop: Running optimization trial 6... [INFO 09-10 15:21:18] ax.service.managed_loop: Running optimization trial 7... [INFO 09-10 15:21:26] ax.service.managed_loop: Running optimization trial 8... [INFO 09-10 15:21:31] ax.service.managed_loop: Running optimization trial 9... [INFO 09-10 15:21:39] ax.service.managed_loop: Running optimization trial 10...
CPU times: user 56.1 s, sys: 607 ms, total: 56.7 s Wall time: 57.1 s
plt.figure(figsize=(5,5))
sns.heatmap(scores, xticklabels=gamma, yticklabels=kernels)
<AxesSubplot:>
best_parameters
{'gamma': 0.0001269977587076426, 'kernel': 'rbf'}
values[0]
{'accuracy': 0.6942002538586193}
from sklearn.neural_network import MLPClassifier
def train_evaluate(parameterization):
clf = MLPClassifier(solver='sgd', \
learning_rate_init=parameterization.get('lr'), \
momentum=parameterization.get('momentum'), \
hidden_layer_sizes=eval(parameterization.get('layer')), max_iter=5000)
scores = cross_val_score(clf, X_train[:10000], y_train[:10000], cv=3)
return {"accuracy": (scores.mean(), stats.sem(scores))}
best_parameters, values, experiment, model = optimize(
parameters=[
{"name": "lr", "type": "range", "bounds": [1e-6, 1.0], "log_scale": True},
{"name": "momentum", "type": "range", "bounds": [0.0, 1.0]},
{"name": "layer", "type": "fixed", "value": '(100,)'},
],
evaluation_function=train_evaluate,
objective_name='accuracy',
total_trials=15 # Optional.
)
[INFO 09-10 15:09:16] ax.modelbridge.dispatch_utils: Using Bayesian Optimization generation strategy: GenerationStrategy(name='Sobol+GPEI', steps=[Sobol for 5 trials, GPEI for subsequent trials]). Iterations after 5 will take longer to generate due to model-fitting. [INFO 09-10 15:09:16] ax.service.managed_loop: Started full optimization with 15 steps. [INFO 09-10 15:09:16] ax.service.managed_loop: Running optimization trial 1... [INFO 09-10 15:09:20] ax.service.managed_loop: Running optimization trial 2... [INFO 09-10 15:09:21] ax.service.managed_loop: Running optimization trial 3... /usr/local/lib/python3.8/site-packages/sklearn/neural_network/_multilayer_perceptron.py:587: UserWarning: Training interrupted by user. [INFO 09-10 15:10:32] ax.service.managed_loop: Running optimization trial 4... [INFO 09-10 15:10:34] ax.service.managed_loop: Running optimization trial 5... [INFO 09-10 15:12:05] ax.service.managed_loop: Running optimization trial 6... [INFO 09-10 15:12:17] ax.service.managed_loop: Running optimization trial 7... [INFO 09-10 15:12:20] ax.service.managed_loop: Running optimization trial 8... [INFO 09-10 15:12:26] ax.service.managed_loop: Running optimization trial 9... [INFO 09-10 15:12:30] ax.service.managed_loop: Running optimization trial 10... [INFO 09-10 15:12:54] ax.service.managed_loop: Running optimization trial 11... [INFO 09-10 15:12:59] ax.service.managed_loop: Running optimization trial 12... [INFO 09-10 15:13:02] ax.service.managed_loop: Running optimization trial 13... [INFO 09-10 15:13:05] ax.service.managed_loop: Running optimization trial 14... [INFO 09-10 15:13:18] ax.service.managed_loop: Running optimization trial 15...
# sklearn default lr=0.001, momentum=0.9
best_parameters
{'lr': 9.194136515259465e-06, 'momentum': 0.6134162649047076, 'layer': '(100,)'}
render(plot_contour(model=model, param_x='lr', param_y='momentum', metric_name='accuracy'))
# sklearn default lr=0.001
render(plot_slice(model, "lr", "accuracy"))
# sklearn default momentum=0.9
render(plot_slice(model, "momentum", "accuracy"))
from ax.plot.trace import optimization_trace_single_method
best_objectives = np.array([[trial.objective_mean*100 for trial in experiment.trials.values()]])
best_objective_plot = optimization_trace_single_method(
y=np.maximum.accumulate(best_objectives, axis=1),
title="Model performance vs. # of iterations",
ylabel="Classification Accuracy, %",
)
render(best_objective_plot)
cv_results = cross_validate(model)
render(interact_cross_validation(cv_results))
best_parameters, values, experiment, model = optimize(
parameters=[
{"name": "lr", "type": "range", "bounds": [1e-6, 1.0], "log_scale": True},
{"name": "momentum", "type": "range", "bounds": [0.0, 1.0]},
{"name": "layer", "type": "choice", "values": ['(100,)','(128,64,32,16,4,2)']},
],
evaluation_function=train_evaluate,
objective_name='accuracy',
total_trials=10 # Optional.
)
[INFO 09-10 15:06:23] ax.modelbridge.dispatch_utils: Using Bayesian Optimization generation strategy: GenerationStrategy(name='Sobol+GPEI', steps=[Sobol for 5 trials, GPEI for subsequent trials]). Iterations after 5 will take longer to generate due to model-fitting. [INFO 09-10 15:06:23] ax.service.managed_loop: Started full optimization with 10 steps. [INFO 09-10 15:06:23] ax.service.managed_loop: Running optimization trial 1... [INFO 09-10 15:06:25] ax.service.managed_loop: Running optimization trial 2... [INFO 09-10 15:06:25] ax.service.managed_loop: Running optimization trial 3... [INFO 09-10 15:06:32] ax.service.managed_loop: Running optimization trial 4... [INFO 09-10 15:06:32] ax.service.managed_loop: Running optimization trial 5... [INFO 09-10 15:06:33] ax.service.managed_loop: Running optimization trial 6... [INFO 09-10 15:07:03] ax.service.managed_loop: Running optimization trial 7... [INFO 09-10 15:07:04] ax.service.managed_loop: Running optimization trial 8... [INFO 09-10 15:07:11] ax.service.managed_loop: Running optimization trial 9... [INFO 09-10 15:07:30] ax.service.managed_loop: Running optimization trial 10...
# sklearn default lr=0.001, momentum=0.9, layer=(100,)
best_parameters
{'lr': 0.0007868271339151282, 'momentum': 0.9069598317146301, 'layer': '(10,10,10)'}
values[0]
{'accuracy': 0.6509893126659594}
import numpy as np
import time
import ray
from ray import tune
from ray.tune.schedulers import AsyncHyperBandScheduler
from ray.tune.suggest.ax import AxSearch
import logging
from ray import tune
from ray.tune import report
from ray.tune.suggest.ax import AxSearch
logger = logging.getLogger(tune.__name__)
logger.setLevel(level=logging.CRITICAL)
from ax.plot.contour import plot_contour
from ax.plot.trace import optimization_trace_single_method
from ax.service.ax_client import AxClient
from ax.utils.notebook.plotting import render, init_notebook_plotting
from ax.utils.tutorials.cnn_utils import CNN, load_mnist, train, evaluate
init_notebook_plotting()
[INFO 09-10 15:29:50] ipy_plotting: Injecting Plotly library into cell. Do not overwrite or delete cell.
ray.shutdown()
ray.init()
2020-09-10 15:44:33,484 INFO resource_spec.py:223 -- Starting Ray with 9.72 GiB memory available for workers and up to 4.88 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
2020-09-10 15:44:33,905 INFO services.py:1191 -- View the Ray dashboard at localhost:8265
{'node_ip_address': '192.168.1.107', 'raylet_ip_address': '192.168.1.107', 'redis_address': '192.168.1.107:6379', 'object_store_address': '/tmp/ray/session_2020-09-10_15-44-33_456876_60061/sockets/plasma_store', 'raylet_socket_name': '/tmp/ray/session_2020-09-10_15-44-33_456876_60061/sockets/raylet', 'webui_url': 'localhost:8265', 'session_dir': '/tmp/ray/session_2020-09-10_15-44-33_456876_60061'}
ax = AxClient(enforce_sequential_optimization=False, verbose_logging=False)
ax.create_experiment(
name="nn_experiment",
parameters=[
{"name": "lr", "type": "range", "bounds": [1e-6, 1.0], "log_scale": True},
{"name": "momentum", "type": "range", "bounds": [0.0, 1.0]},
{"name": "layer", "type": "fixed", "value": '(100,)'},
],
objective_name="accuracy",
)
[INFO 09-10 15:35:58] ax.modelbridge.dispatch_utils: Using Bayesian Optimization generation strategy: GenerationStrategy(name='Sobol+GPEI', steps=[Sobol for 5 trials, GPEI for subsequent trials]). Iterations after 5 will take longer to generate due to model-fitting.
from sklearn.neural_network import MLPClassifier
def train_evaluate(parameterization):
clf = MLPClassifier(solver='sgd', \
learning_rate_init=parameterization.get('lr'), \
momentum=parameterization.get('momentum'), \
hidden_layer_sizes=eval(parameterization.get('layer')), max_iter=5000)
for i in range(5000):
clf.partial_fit(X_train[:2000], y_train[:2000], np.unique(y_train))
if i % 100 == 0:
report(accuracy=clf.score(X_test, y_test))
scheduler = AsyncHyperBandScheduler(metric="accuracy", mode="max")
tune.run(
train_evaluate,
num_samples=500,
search_alg=AxSearch(ax), # Note that the argument here is the `AxClient`.
verbose=1, # Set this level to 1 to see status updates and to 2 to also see trial results.
scheduler=scheduler
)
# just sequential Bayesian Opt.
{'lr': 9.194136515259465e-06,
'momentum': 0.6134162649047076,
'layer': '(100,)'}
best_parameters, values = ax.get_best_parameters()
best_parameters
{'lr': 0.012067419606835588, 'momentum': 0.26883253641426563, 'layer': '(100,)'}
means, covariances = values
print(means)
print(covariances)
{'accuracy': 0.700343242018004} {'accuracy': {'accuracy': 0.0}}