First of all, we need to pin the Azure Storage package to version 0.30.0, which provides the BlockBlobService module
we will use when creating the web service and deploying the model.
Note: Restart the kernel after updating the package.
# restart the kernel each time the package is updated
# pin azure-storage to 0.30.0 in order to use BlockBlobService
!pip install azure-storage==0.30.0
Collecting azure-storage==0.30.0
  Downloading https://files.pythonhosted.org/packages/76/28/e74b38107b3c087e4de18dc20bfb15f6c3d9b766ae827bf42fc79170ffe2/azure-storage-0.30.0.zip (153kB)
    100% |████████████████████████████████| 163kB 8.6MB/s eta 0:00:01
Requirement already satisfied: azure-nspkg in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from azure-storage==0.30.0) (2.0.0)
Requirement already satisfied: azure-common in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from azure-storage==0.30.0) (1.1.15)
Requirement already satisfied: python-dateutil in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from azure-storage==0.30.0) (2.7.3)
Requirement already satisfied: requests in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from azure-storage==0.30.0) (2.19.1)
Requirement already satisfied: six>=1.5 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from python-dateutil->azure-storage==0.30.0) (1.11.0)
Requirement already satisfied: urllib3<1.24,>=1.21.1 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from requests->azure-storage==0.30.0) (1.23)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from requests->azure-storage==0.30.0) (3.0.4)
Requirement already satisfied: idna<2.8,>=2.5 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from requests->azure-storage==0.30.0) (2.6)
Requirement already satisfied: certifi>=2017.4.17 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from requests->azure-storage==0.30.0) (2017.7.27.1)
Building wheels for collected packages: azure-storage
  Running setup.py bdist_wheel for azure-storage ... done
  Stored in directory: /home/nbuser/.cache/pip/wheels/8c/00/22/879600d3b3e5d10fa31312d498f9ae8ac6cc6d2c59aac5acbb
Successfully built azure-storage
Installing collected packages: azure-storage
Successfully installed azure-storage-0.30.0
import os
import numpy as np
import pandas as pd
import azureml
from azureml.core import Workspace, Run
from azureml.core.model import Model
# check core SDK version number
print("Azure ML SDK Version: ", azureml.core.VERSION)
Azure ML SDK Version: 0.1.59
# load workspace configuration from the config.json file in the current folder.
ws = Workspace.from_config()
print(ws.name, ws.location, ws.resource_group, ws.location, sep = '\t')
Found the config file in: /home/nbuser/library/aml_config/config.json Performing interactive authentication. Please follow the instructions on the terminal.
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code F5QHYXE84 to authenticate.
Interactive authentication successfully completed.
Xiangzhe-WS	westeurope	Xiangzhe-ML	westeurope
We have already registered the model in our workspace; now we retrieve it.
Note: once the model has been downloaded locally, the cell below does not need to be run again.
model = Model(ws, 'nyc_taxi_model')
# If the model has already been downloaded, we don't have to do this step.
model.download(target_dir = '.')
# verify the downloaded model file
os.stat('./nyc_taxi_model.pkl')
os.stat_result(st_mode=33188, st_ino=7, st_dev=49, st_nlink=1, st_uid=1000, st_gid=1000, st_size=940, st_atime=0, st_mtime=1538653134, st_ctime=1538653134)
Once we've tested the model and are satisfied with the results, we can deploy it as a web service.
Create the scoring script, called score.py, which the web service call uses to invoke the model.
We must include two required functions in the scoring script:
The init() function, which typically loads the model into a global object. It runs only once, when the Docker container is started.
The run(input_data) function, which uses the model to predict a value based on the input data. Inputs and outputs typically use JSON for serialization and deserialization, but other formats are supported.
%%writefile score.py
import os
import json
import pickle
import numpy as np
from sklearn.externals import joblib
from sklearn.linear_model import LinearRegression
from azureml.core.model import Model
def init():
    global model
    # retrieve the path to the model file using the model name
    model_path = Model.get_model_path('nyc_taxi_model')
    model = joblib.load(model_path)

def run(raw_data):
    data = np.array(json.loads(raw_data)['data'])
    # make prediction
    y_hat = model.predict(data)
    return json.dumps(y_hat.tolist())
Overwriting score.py
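Before deploying, we can sanity-check the JSON contract that run() implements. This is a minimal local sketch: the StubModel below is a hypothetical stand-in for the registered model (the real nyc_taxi_model.pkl stays on the notebook VM), but the serialization round-trip mirrors score.py exactly.

```python
import json
import numpy as np

# Hypothetical stand-in for the deployed model: predicts the row mean.
class StubModel:
    def predict(self, X):
        return X.mean(axis=1)

model = StubModel()

def run(raw_data):
    # mirrors run() in score.py: JSON string in, JSON string out
    data = np.array(json.loads(raw_data)['data'])
    y_hat = model.predict(data)
    return json.dumps(y_hat.tolist())

payload = json.dumps({"data": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]})
print(run(payload))  # prints "[2.0, 5.0]"
```

The same payload shape ({"data": [[...], ...]}) is what we send to the deployed endpoint later in this notebook.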
Next, create an environment file, called myenv.yml, that specifies all of the script's package dependencies. This file is used to ensure that all of those dependencies are installed in the Docker image.
from azureml.core.conda_dependencies import CondaDependencies
myenv = CondaDependencies()
myenv.add_conda_package("scikit-learn")
myenv.add_conda_package("numpy")
myenv.add_conda_package("pandas")
myenv.add_pip_package("pynacl==1.2.1")
with open("myenv.yml","w") as f:
f.write(myenv.serialize_to_string())
Review the content of the myenv.yml file.
with open("myenv.yml","r") as f:
print(f.read())
# Conda environment specification. The dependencies defined in this file will
# be automatically provisioned for runs with userManagedDependencies=False.

# Details about the Conda environment file format:
# https://conda.io/docs/user-guide/tasks/manage-environments.html#create-env-file-manually

name: project_environment
dependencies:
  # The python interpreter version.
  # Currently Azure ML only supports 3.5.2 and later.
- python=3.6.2
- pip:
  # Required packages for AzureML execution, history, and data preparation.
  - azureml-defaults
  - pynacl==1.2.1
- scikit-learn
- numpy
- pandas
Create a deployment configuration and specify the number of CPU cores and gigabytes of RAM needed for our ACI container. While this depends on the model, the default of 1 core and 1 GB of RAM is sufficient for many models.
from azureml.core.webservice import AciWebservice
aciconfig = AciWebservice.deploy_configuration(cpu_cores=1,
memory_gb=1,
tags={"data": "nyc-taxi-trip", "method" : "sklearn"},
description='NYC taxi trip duration prediction')
Configure the image and deploy. Normally, it will take 7-8 minutes to deploy the model.
%%time
from azureml.core.webservice import Webservice
from azureml.core.image import ContainerImage
# configure the image
image_config = ContainerImage.image_configuration(execution_script="score.py",
runtime="python",
conda_file="myenv.yml")
service = Webservice.deploy_from_model(workspace=ws,
name='nyc-taxi-dsvm',
deployment_config=aciconfig,
models=[model],
image_config=image_config)
service.wait_for_deployment(show_output=True)
Creating image
Image creation operation finished for image nyc-taxi-dsvm:1, operation "Succeeded"
Creating service
Running.....................................
Succeeded
ACI service creation operation finished, operation "Succeeded"
CPU times: user 4.91 s, sys: 308 ms, total: 5.22 s
Wall time: 9min 28s
Get the scoring web service's HTTP endpoint, which accepts REST client calls.
This endpoint can be shared with anyone who wants to test the web service or integrate it into an application.
print(service.scoring_uri)
http://40.115.23.210:80/score
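Any REST client can call this endpoint. The request body must be JSON with a top-level "data" key holding a list of feature rows, the same format run() in score.py parses. A minimal sketch of building such a payload, with made-up feature values:

```python
import json

# Hypothetical feature row; the real model expects one value per
# feature column of the training data.
row = [0.5, -1.2, 3.3]
payload = json.dumps({"data": [row]})
print(payload)  # {"data": [[0.5, -1.2, 3.3]]}
```

The payload can then be POSTed to service.scoring_uri with a Content-Type: application/json header, from curl or any HTTP library, as demonstrated with the requests library below.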
First, load the test sub-dataset.
from sklearn import preprocessing
pd_dataframe = pd.read_pickle("./data/sub_data_after_prep.pkl")
y_test = np.array(pd_dataframe["trip_duration"]).astype(float)
y_test = np.log(y_test)
X_test = np.array(pd_dataframe.drop(["trip_duration"],axis = 1))
# normalize input
# (note: ideally the scaler would be fit on the training data;
#  here it is fit on the test subset for simplicity)
scaler = preprocessing.StandardScaler().fit(X_test)
X_test = scaler.transform(X_test)
/home/nbuser/anaconda3_501/lib/python3.6/site-packages/sklearn/utils/validation.py:475: DataConversionWarning: Data with input dtype object was converted to float64 by StandardScaler.
  warnings.warn(msg, DataConversionWarning)
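The DataConversionWarning above is harmless here: the dataframe columns come out with dtype object, and StandardScaler silently casts them to float64. Casting explicitly before scaling avoids the warning. A minimal sketch with made-up values:

```python
import numpy as np
from sklearn import preprocessing

# Hypothetical object-dtype array, standing in for the values
# extracted from the dataframe above.
X = np.array([[1, 2], [3, 4], [5, 6]], dtype=object)

# Cast explicitly so StandardScaler does not have to convert (and warn).
X = X.astype(np.float64)
scaler = preprocessing.StandardScaler().fit(X)
X_scaled = scaler.transform(X)
print(X_scaled.mean(axis=0))  # columns are centered at ~0
```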
Secondly, test the deployed web service with the test sub-dataset.
With the first method, we can score a batch of rows at once. With the second method (a raw HTTP request), we can only score one row at a time.
import json
from sklearn.metrics import mean_squared_error
# find 30 random samples from test set
n = 30
sample_indices = np.random.permutation(X_test.shape[0])[0:n]
test_samples = json.dumps({"data": X_test[sample_indices].tolist()})
test_samples = bytes(test_samples, encoding = 'utf8')
# predict using the deployed model
y_pred = json.loads(service.run(input_data = test_samples))
mse = mean_squared_error(y_test[sample_indices], y_pred)
print("Mean Squared Error for Linear Regression: {}".format(mse))
Mean Squared Error for Linear Regression: 0.4408346330734582
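Because y_test holds log(trip_duration), this MSE is in log units. Taking its square root and exponentiating gives the typical multiplicative error of the predictions, which can be easier to interpret:

```python
import math

mse = 0.4408346330734582  # value reported above
# RMSE in log space, exponentiated, approximates the typical
# factor by which a prediction differs from the true duration
ratio = math.exp(math.sqrt(mse))
print(round(ratio, 2))  # prints 1.94
```

So a typical prediction from this linear model is off by roughly a factor of two in either direction.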
We can also send a raw HTTP request to test the web service.
import requests
import json
from sklearn.metrics import mean_squared_error
# send a random row from the test set to score
random_index = np.random.randint(0, len(X_test)-1)
input_data = "{\"data\": [" + str(list(X_test[random_index])) + "]}"
headers = {'Content-Type':'application/json'}
y_pred = requests.post(service.scoring_uri, input_data, headers = headers)
print("POST to url", service.scoring_uri)
#print("input data:", input_data)
print("label:", y_test[random_index])
print("prediction:", y_pred.text)
POST to url http://40.115.23.210:80/score
label: 6.963189985870238
prediction: "[6.647143813339028]"
service.delete()