Model Development

This example shows how to build, train, evaluate, and deploy a model that runs on an FPGA. Only Windows is supported. We use TensorFlow and Keras to build the model, applying transfer learning with ResNet50 as a featurizer: instead of using ResNet50's final layer, we add and train our own classification layer.

We will use the Kaggle Cats and Dogs dataset to train the classifier. The dataset can be downloaded from https://www.microsoft.com/en-us/download/details.aspx?id=54765. Download the zip and extract it to a directory named 'catsanddogs' under your user directory ("~/catsanddogs").
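If you download the zip manually, you can extract it with Python's zipfile module. This is a minimal sketch; the archive path below is a guess that you should adjust to match your actual download.

In [ ]:
import os
import zipfile

# Hypothetical download path -- adjust to wherever you saved the zip.
archive = os.path.expanduser('~/Downloads/kagglecatsanddogs.zip')
with zipfile.ZipFile(archive) as zf:
    zf.extractall(os.path.expanduser('~/catsanddogs'))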

Please set up your environment as described in the quick start.

In [ ]:
import os
import sys
import tensorflow as tf
import numpy as np
import amlrealtimeai
from amlrealtimeai import resnet50

Model Construction

Load the files we are going to use for training and testing. By default this notebook uses only a very small subset of the Cats and Dogs dataset. That makes it run quickly, but doesn't create a very accurate classifier. You can improve the classifier by using more of the dataset.

In [ ]:
import glob
import imghdr
datadir = os.path.expanduser("~/catsanddogs")

cat_files = glob.glob(os.path.join(datadir, 'PetImages', 'Cat', '*.jpg'))
dog_files = glob.glob(os.path.join(datadir, 'PetImages', 'Dog', '*.jpg'))

# Limit the data set to make the notebook execute quickly.
# Increase (or remove) these limits to train a more accurate classifier.
cat_files = cat_files[:64]
dog_files = dog_files[:64]

# The data set has a few images that are not jpeg. Remove them.
cat_files = [f for f in cat_files if imghdr.what(f) == 'jpeg']
dog_files = [f for f in dog_files if imghdr.what(f) == 'jpeg']

if not cat_files or not dog_files:
    print("Please download the Kaggle Cats and Dogs dataset from https://www.microsoft.com/en-us/download/details.aspx?id=54765 and extract the zip to " + datadir)
    raise ValueError("Data not found")
else:
    print(cat_files[0])
    print(dog_files[0])
In [ ]:
# Construct a numpy array of labels: 0 for cat images, 1 for dog images.
image_paths = cat_files + dog_files
total_files = len(cat_files) + len(dog_files)
labels = np.zeros(total_files)
labels[len(cat_files):] = 1

We need to preprocess the input files to get them into the form expected by ResNet50. We've provided a default implementation of the preprocessing that you can use.

In [ ]:
# Input images as a two-dimensional tensor containing an arbitrary number of images represented as strings
import amlrealtimeai.resnet50.utils
in_images = tf.placeholder(tf.string)
image_tensors = resnet50.utils.preprocess_array(in_images)
print(image_tensors.shape)

Alternatively, if you would like to customize the preprocessing, you can write your own preprocessor using TensorFlow operations.
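For example, a custom preprocessor might look like the following. This is a minimal sketch, not the implementation behind preprocess_array: the 224x224 size, BGR channel order, and ImageNet channel means are assumptions based on the standard ResNet50 input format, so verify they match what the featurizer expects.

In [ ]:
# Minimal sketch of a custom preprocessor, assuming the same contract as
# the default implementation: a 1-D tensor of JPEG-encoded strings in,
# a batch of float images out. Size, channel order, and channel means
# are assumptions based on the standard ResNet50 input format.
def custom_preprocess_array(in_images):
    def decode_and_preprocess(image_bytes):
        image = tf.image.decode_jpeg(image_bytes, channels=3)
        image = tf.image.resize_images(image, [224, 224])
        image = tf.reverse(image, axis=[-1])       # RGB -> BGR
        return image - [103.939, 116.779, 123.68]  # subtract channel means
    return tf.map_fn(decode_and_preprocess, in_images, dtype=tf.float32)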

The input to the classifier we are training is the set of features produced by ResNet50. To train the classifier we need to featurize the images using ResNet50; we do this below with a featurizer running on an FPGA. You can also run the featurizer locally on a CPU or GPU.
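If you want the local path instead, a hypothetical sketch follows. We assume here that the package exposes a LocalQuantizedResNet50 class mirroring the remote variant used below; check the repo for the exact class name before running this.

In [ ]:
# Hypothetical CPU/GPU alternative -- assumes the package provides a
# LocalQuantizedResNet50 class mirroring RemoteQuantizedResNet50; consult
# the repo's docs for the exact class name before relying on this.
from amlrealtimeai.resnet50.model import LocalQuantizedResNet50
local_featurizer = LocalQuantizedResNet50(os.path.expanduser('~/models'))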

See the "docs" folder in our GitHub repo to learn how to create a Model Management Account and to find the information required below.

In [ ]:
subscription_id = "<Your Azure Subscription Id>"
resource_group = "<Your Azure Resource Group Name>"
model_management_account = "<Your AzureML Model Management Account Name>"

from amlrealtimeai.resnet50.model import RemoteQuantizedResNet50
model_path = os.path.expanduser('~/models')
featurizer = RemoteQuantizedResNet50(subscription_id, resource_group, model_management_account, model_path)
print(featurizer.version)

Calling import_graph_def on the featurizer will create a service that runs the featurizer on an FPGA.

In [ ]:
featurizer.import_graph_def(include_top=False, input_tensor=image_tensors)
features = featurizer.featurizer_output

Pre-compute features

Load the data set and compute the features. These can be precomputed because they don't change during training.

In [ ]:
from tqdm import tqdm

def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]

def read_files(files):
    contents = []
    for path in files:
        with open(path, 'rb') as f:
            contents.append(f.read())
    return contents
        
feature_list = []
with tf.Session() as sess:
    for chunk in tqdm(chunks(image_paths, 5)):
        contents = read_files(chunk)
        result = sess.run([features], feed_dict={in_images: contents})
        feature_list.extend(result[0])

feature_results = np.array(feature_list)
print(feature_results.shape)

Remove the remote service

The features are now precomputed, so we no longer need the remote featurizer service.

In [ ]:
featurizer.cleanup_remote_service()

Add and Train the classifier

We use Keras to define and train a simple classifier.

In [ ]:
from keras.models import Sequential
from keras.layers import Dropout, Dense, Flatten
from keras import optimizers

FC_SIZE = 1024
NUM_CLASSES = 2

model = Sequential()
model.add(Dropout(0.2, input_shape=(1, 1, 2048)))  # featurizer output is (1, 1, 2048) per image
model.add(Dense(FC_SIZE, activation='relu'))
model.add(Flatten())
model.add(Dense(NUM_CLASSES, activation='sigmoid'))

model.compile(optimizer=optimizers.SGD(lr=1e-4, momentum=0.9), loss='binary_crossentropy', metrics=['accuracy'])

Prepare the train and test data.

In [ ]:
from sklearn.model_selection import train_test_split
onehot_labels = np.array([[0,1] if i else [1,0] for i in labels])
X_train, X_test, y_train, y_test = train_test_split(feature_results, onehot_labels, random_state=42, shuffle=True)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

Train the classifier.

In [ ]:
model.fit(X_train, y_train, epochs=16, batch_size=32)

Test the Classifier

Let's test the classifier and see how well it does. Since we only trained on a few images, we are not expecting to win a Kaggle competition, but it will likely get most of the images correct.

In [ ]:
y_probs = model.predict(X_test)
y_prob_max = np.argmax(y_probs, 1)
y_test_max = np.argmax(y_test, 1)
print(y_prob_max)
print(y_test_max)
In [ ]:
from sklearn.metrics import confusion_matrix, roc_auc_score, accuracy_score, precision_score, recall_score, f1_score
import itertools
import matplotlib
from matplotlib import pyplot as plt

# compute a bunch of classification metrics 
def classification_metrics(y_true, y_pred, y_prob):
    cm_dict = {}
    cm_dict['Accuracy'] = accuracy_score(y_true, y_pred)
    cm_dict['Precision'] =  precision_score(y_true, y_pred)
    cm_dict['Recall'] =  recall_score(y_true, y_pred)
    cm_dict['F1'] =  f1_score(y_true, y_pred) 
    cm_dict['AUC'] = roc_auc_score(y_true, y_prob[:,1])  # column 1 is the probability of the positive class (dog)
    cm_dict['Confusion Matrix'] = confusion_matrix(y_true, y_pred).tolist()
    return cm_dict

def plot_confusion_matrix(cm, classes, normalize=False, title='Confusion matrix', cmap=plt.cm.Blues):
    """Plots a confusion matrix.
    Source: http://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html
    New BSD License - see appendix
    """
    cm_max = cm.max()
    cm_min = cm.min()
    if cm_min > 0: cm_min = 0
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        cm_max = 1
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)
    thresh = cm_max / 2.
    plt.clim(cm_min, cm_max)

    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i,
                 round(cm[i, j], 3),  # round to 3 decimals if they are float
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    plt.show()
    
cm_dict = classification_metrics(y_test_max, y_prob_max, y_probs)
for m in cm_dict:
    print(m, cm_dict[m])
cm = np.asarray(cm_dict['Confusion Matrix'])
plot_confusion_matrix(cm, ['cat', 'dog'], normalize=False)

Service Definition

As in the QuickStart notebook, our service definition pipeline consists of three stages. Here we use the Keras classifier as the final stage.

In [ ]:
from amlrealtimeai.pipeline import ServiceDefinition, TensorflowStage, BrainWaveStage, KerasStage

service_def = ServiceDefinition()
service_def.pipeline.append(TensorflowStage(tf.Session(), in_images, image_tensors))
service_def.pipeline.append(BrainWaveStage(featurizer))
service_def.pipeline.append(KerasStage(model))

service_def_path = os.path.join(datadir, 'save', 'service_def')
service_def.save(service_def_path)
print(service_def_path)

Deploy

In [ ]:
from amlrealtimeai import DeploymentClient

model_name = "catsanddogs-model"
service_name = "modelbuild-service"

deployment_client = DeploymentClient(subscription_id, resource_group, model_management_account)

The first time the code below runs, it will create a new service running your model. If you want to change the model, you can make changes above in this notebook and save a new service definition; this code will then update the running service in place to run the new model.

In [ ]:
service = deployment_client.get_service_by_name(service_name)
model_id = deployment_client.register_model(model_name, service_def_path)
In [ ]:
if service is None:
    service = deployment_client.create_service(service_name, model_id)
else:
    service = deployment_client.update_service(service.id, model_id)

The service is now running in Azure and ready to serve requests. We can check the address and port.

In [ ]:
print(service.ipAddress + ':' + str(service.port))

Client

There is a simple client, amlrealtimeai.PredictionClient, that can be used for testing. We'll use this client to score an image with our new service.

In [ ]:
from amlrealtimeai import PredictionClient
client = PredictionClient(service.ipAddress, service.port)

You can adapt the client code to meet your needs. There is also an example C# client.

The service provides an API that is compatible with TensorFlow Serving, so you can also send requests with a standard TensorFlow Serving client.
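As a rough sketch, a raw TensorFlow Serving gRPC request might look like the following. This requires the tensorflow-serving-api package, and the model name ('default') and input tensor name ('images') below are assumptions; check the sample client for the names your service actually expects.

In [ ]:
# Hedged sketch of a raw TensorFlow Serving gRPC request; the model and
# input tensor names below are assumptions, not confirmed by this notebook.
import grpc
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

channel = grpc.insecure_channel('{}:{}'.format(service.ipAddress, service.port))
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = 'default'  # assumed model name
with open(cat_files[0], 'rb') as f:
    # Assumed input tensor name; shape [1] batch of encoded image bytes.
    request.inputs['images'].CopyFrom(tf.contrib.util.make_tensor_proto([f.read()]))
result = stub.Predict(request, 10.0)  # 10-second timeout
print(result)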

Request

Let's see how our service does on a few images. It may get a few wrong.

In [ ]:
# Score a few images from each class and report whether each prediction is correct.
print('CATS')
for image_file in cat_files[:8]:
    results = client.score_image(image_file)
    result = 'CORRECT ' if results[0] > results[1] else 'WRONG '
    print(result + str(results))
print('DOGS')
for image_file in dog_files[:8]:
    results = client.score_image(image_file)
    result = 'CORRECT ' if results[1] > results[0] else 'WRONG '
    print(result + str(results))

Cleanup

Run the cell below to delete your service.

In [ ]:
services = deployment_client.list_services()

for service in filter(lambda x: x.name == service_name, services):
    print(service.id)
    deployment_client.delete_service(service.id)
    
models = deployment_client.list_models()

for model in filter(lambda x: x.name == model_name, models):
    print(model.id)
    deployment_client.delete_model(model.id)

Appendix

License for plot_confusion_matrix:

New BSD License

Copyright (c) 2007–2018 The scikit-learn developers. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

a. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

b. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

c. Neither the name of the Scikit-learn Developers nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.