First of all, we need to pin the Azure Storage package to version 0.30.0, which provides the BlockBlobService module
we will use when creating the web service and deploying the model.
Note: Restart the kernel after updating the package.
# restart the kernel each time the package is updated
# pin azure-storage to 0.30.0 in order to use BlockBlobService
!pip install azure-storage==0.30.0
Collecting azure-storage==0.30.0
  Downloading https://files.pythonhosted.org/packages/76/28/e74b38107b3c087e4de18dc20bfb15f6c3d9b766ae827bf42fc79170ffe2/azure-storage-0.30.0.zip (153kB)
    100% |████████████████████████████████| 163kB 8.6MB/s eta 0:00:01
Requirement already satisfied: azure-nspkg in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from azure-storage==0.30.0) (2.0.0)
Requirement already satisfied: azure-common in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from azure-storage==0.30.0) (1.1.15)
Requirement already satisfied: python-dateutil in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from azure-storage==0.30.0) (2.7.3)
Requirement already satisfied: requests in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from azure-storage==0.30.0) (2.19.1)
Requirement already satisfied: six>=1.5 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from python-dateutil->azure-storage==0.30.0) (1.11.0)
Requirement already satisfied: urllib3<1.24,>=1.21.1 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from requests->azure-storage==0.30.0) (1.23)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from requests->azure-storage==0.30.0) (3.0.4)
Requirement already satisfied: idna<2.8,>=2.5 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from requests->azure-storage==0.30.0) (2.6)
Requirement already satisfied: certifi>=2017.4.17 in /home/nbuser/anaconda3_501/lib/python3.6/site-packages (from requests->azure-storage==0.30.0) (2017.7.27.1)
Building wheels for collected packages: azure-storage
  Running setup.py bdist_wheel for azure-storage ... done
  Stored in directory: /home/nbuser/.cache/pip/wheels/8c/00/22/879600d3b3e5d10fa31312d498f9ae8ac6cc6d2c59aac5acbb
Successfully built azure-storage
Installing collected packages: azure-storage
Successfully installed azure-storage-0.30.0
import os
import numpy as np
import pandas as pd
import azureml
from azureml.core import Workspace, Run
from azureml.core.model import Model
# check core SDK version number
print("Azure ML SDK Version: ", azureml.core.VERSION)
Azure ML SDK Version: 0.1.59
# load workspace configuration from the config.json file in the current folder.
ws = Workspace.from_config()
print(ws.name, ws.location, ws.resource_group, ws.location, sep = '\t')
Found the config file in: /home/nbuser/library/aml_config/config.json Performing interactive authentication. Please follow the instructions on the terminal.
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code F5QHYXE84 to authenticate.
Interactive authentication successfully completed.
Xiangzhe-WS	westeurope	Xiangzhe-ML	westeurope
We have already registered the model in our workspace; now we retrieve it.
Note: once the model has been downloaded locally, the cell below does not need to be run again.
model = Model(ws, 'nyc_taxi_model')
# If the model has already been downloaded, we don't have to do this step.
model.download(target_dir = '.')
# verify the downloaded model file
os.stat('./nyc_taxi_model.pkl')
os.stat_result(st_mode=33188, st_ino=7, st_dev=49, st_nlink=1, st_uid=1000, st_gid=1000, st_size=940, st_atime=0, st_mtime=1538653134, st_ctime=1538653134)
Once we've tested the model and are satisfied with the results, we can deploy it as a web service.
Create the scoring script, called score.py, which the web service call uses to invoke the model.
We must include two required functions in the scoring script:
The init() function, which typically loads the model into a global object. It runs only once, when the Docker container is started.
The run(input_data) function, which uses the model to predict a value based on the input data. Inputs and outputs typically use JSON for serialization and deserialization, but other formats are supported.
%%writefile score.py
import os
import json
import pickle
import numpy as np
from sklearn.externals import joblib
from sklearn.linear_model import LinearRegression
from azureml.core.model import Model
def init():
    global model
    # retrieve the path to the model file using the model name
    model_path = Model.get_model_path('nyc_taxi_model')
    model = joblib.load(model_path)

def run(raw_data):
    data = np.array(json.loads(raw_data)['data'])
    # make prediction
    y_hat = model.predict(data)
    return json.dumps(y_hat.tolist())
Overwriting score.py
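Before deploying, we can sanity-check the JSON contract that run() implements. This is a minimal local sketch: the StubModel below is a hypothetical stand-in for the registered model (the real nyc_taxi_model.pkl stays on the notebook VM), but the serialization round-trip mirrors score.py exactly.

```python
import json
import numpy as np

# Hypothetical stand-in for the deployed model: predicts the row mean.
class StubModel:
    def predict(self, X):
        return X.mean(axis=1)

model = StubModel()

def run(raw_data):
    # mirrors run() in score.py: JSON string in, JSON string out
    data = np.array(json.loads(raw_data)['data'])
    y_hat = model.predict(data)
    return json.dumps(y_hat.tolist())

payload = json.dumps({"data": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]})
print(run(payload))  # prints "[2.0, 5.0]"
```

The same payload shape ({"data": [[...], ...]}) is what we send to the deployed endpoint later in this notebook.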
Next, create an environment file, called myenv.yml, that specifies all of the script's package dependencies. This file is used to ensure that all of those dependencies are installed in the Docker image.
from azureml.core.conda_dependencies import CondaDependencies
myenv = CondaDependencies()
myenv.add_conda_package("scikit-learn")
myenv.add_conda_package("numpy")
myenv.add_conda_package("pandas")
myenv.add_pip_package("pynacl==1.2.1")
with open("myenv.yml","w") as f:
f.write(myenv.serialize_to_string())
Review the content of the myenv.yml file.
with open("myenv.yml","r") as f:
print(f.read())
# Conda environment specification. The dependencies defined in this file will
# be automatically provisioned for runs with userManagedDependencies=False.

# Details about the Conda environment file format:
# https://conda.io/docs/user-guide/tasks/manage-environments.html#create-env-file-manually

name: project_environment
dependencies:
  # The python interpreter version.
  # Currently Azure ML only supports 3.5.2 and later.
- python=3.6.2
- pip:
  # Required packages for AzureML execution, history, and data preparation.
  - azureml-defaults
  - pynacl==1.2.1
- scikit-learn
- numpy
- pandas
Create a deployment configuration and specify the number of CPU cores and gigabytes of RAM needed for our ACI container. While this depends on the model, the default of 1 core and 1 GB of RAM is sufficient for many models.
from azureml.core.webservice import AciWebservice
aciconfig = AciWebservice.deploy_configuration(cpu_cores=1,
memory_gb=1,
tags={"data": "nyc-taxi-trip", "method" : "sklearn"},
description='NYC taxi trip duration prediction')
Configure the image and deploy. Normally, it will take 7-8 minutes to deploy the model.
%%time
from azureml.core.webservice import Webservice
from azureml.core.image import ContainerImage
# configure the image
image_config = ContainerImage.image_configuration(execution_script="score.py",
runtime="python",
conda_file="myenv.yml")
service = Webservice.deploy_from_model(workspace=ws,
name='nyc-taxi-dsvm',
deployment_config=aciconfig,
models=[model],
image_config=image_config)
service.wait_for_deployment(show_output=True)
Creating image
Image creation operation finished for image nyc-taxi-dsvm:1, operation "Succeeded"
Creating service
Running.....................................
Succeeded
ACI service creation operation finished, operation "Succeeded"
CPU times: user 4.91 s, sys: 308 ms, total: 5.22 s
Wall time: 9min 28s
Get the scoring web service's HTTP endpoint, which accepts REST client calls.
This endpoint can be shared with anyone who wants to test the web service or integrate it into an application.
print(service.scoring_uri)
http://40.115.23.210:80/score
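Any REST client can call this endpoint. The request body must be JSON with a top-level "data" key holding a list of feature rows, the same format run() in score.py parses. A minimal sketch of building such a payload, with made-up feature values:

```python
import json

# Hypothetical feature row; the real model expects one value per
# feature column of the training data.
row = [0.5, -1.2, 3.3]
payload = json.dumps({"data": [row]})
print(payload)  # {"data": [[0.5, -1.2, 3.3]]}
```

The payload can then be POSTed to service.scoring_uri with a Content-Type: application/json header, from curl or any HTTP library, as demonstrated with the requests library below.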
First, load the test sub-dataset.
from sklearn import preprocessing
pd_dataframe = pd.read_pickle("./data/sub_data_after_prep.pkl")
y_test = np.array(pd_dataframe["trip_duration"]).astype(float)
y_test = np.log(y_test)
X_test = np.array(pd_dataframe.drop(["trip_duration"],axis = 1))
# normalize input
# (note: ideally the scaler would be fit on the training data;
#  here it is fit on the test subset for simplicity)
scaler = preprocessing.StandardScaler().fit(X_test)
X_test = scaler.transform(X_test)
/home/nbuser/anaconda3_501/lib/python3.6/site-packages/sklearn/utils/validation.py:475: DataConversionWarning: Data with input dtype object was converted to float64 by StandardScaler.
  warnings.warn(msg, DataConversionWarning)
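The DataConversionWarning above is harmless here: the dataframe columns come out with dtype object, and StandardScaler silently casts them to float64. Casting explicitly before scaling avoids the warning. A minimal sketch with made-up values:

```python
import numpy as np
from sklearn import preprocessing

# Hypothetical object-dtype array, standing in for the values
# extracted from the dataframe above.
X = np.array([[1, 2], [3, 4], [5, 6]], dtype=object)

# Cast explicitly so StandardScaler does not have to convert (and warn).
X = X.astype(np.float64)
scaler = preprocessing.StandardScaler().fit(X)
X_scaled = scaler.transform(X)
print(X_scaled.mean(axis=0))  # columns are centered at ~0
```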
Secondly, test the deployed web service with the test sub-dataset.
With the first method, we can score a batch of rows at once. With the second method (a raw HTTP request), we can only score one row at a time.
import json
from sklearn.metrics import mean_squared_error
# find 30 random samples from test set
n = 30
sample_indices = np.random.permutation(X_test.shape[0])[0:n]
test_samples = json.dumps({"data": X_test[sample_indices].tolist()})
test_samples = bytes(test_samples, encoding = 'utf8')
# predict using the deployed model
y_pred = json.loads(service.run(input_data = test_samples))
mse = mean_squared_error(y_test[sample_indices], y_pred)
print("Mean Squared Error for Linear Regression: {}".format(mse))
Mean Squared Error for Linear Regression: 0.4408346330734582
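Because y_test holds log(trip_duration), this MSE is in log units. Taking its square root and exponentiating gives the typical multiplicative error of the predictions, which can be easier to interpret:

```python
import math

mse = 0.4408346330734582  # value reported above
# RMSE in log space, exponentiated, approximates the typical
# factor by which a prediction differs from the true duration
ratio = math.exp(math.sqrt(mse))
print(round(ratio, 2))  # prints 1.94
```

So a typical prediction from this linear model is off by roughly a factor of two in either direction.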
We can also send a raw HTTP request to test the web service.
import requests
import json
from sklearn.metrics import mean_squared_error
# send a random row from the test set to score
random_index = np.random.randint(0, len(X_test)-1)
input_data = "{\"data\": [" + str(list(X_test[random_index])) + "]}"
headers = {'Content-Type':'application/json'}
y_pred = requests.post(service.scoring_uri, input_data, headers = headers)
print("POST to url", service.scoring_uri)
#print("input data:", input_data)
print("label:", y_test[random_index])
print("prediction:", y_pred.text)
POST to url http://40.115.23.210:80/score
label: 6.963189985870238
prediction: "[6.647143813339028]"
service.delete()