This is a getting-started tutorial for those who are new to ML-ModelCI. By the end of this tutorial, you will be able to install ModelCI, register a model, convert it to optimized formats, deploy it as a service, and profile it.
Here are some prerequisites before installation.
We can install the ModelCI Python package following https://github.com/cap-ntu/ML-Model-CI/#installation-using--pip:
%%bash
git clone https://github.com/cap-ntu/ML-Model-CI.git
cd ML-Model-CI
pip install -q .
The first time you start the ModelCI service, you need to set up all the environment variables in a single .env
file with the following script:
# set environment variables
!python scripts/generate_env.py
Read env-backend.env ... Read env-mongodb.env ... Read env-frontend.env ...
Write .env for backend with setup:
{ "PROJECT_NAME": "modelci", "SERVER_HOST": "localhost", "SERVER_PORT": "8000", "SECRET_KEY": "2a6c03b9ca06cd8fc3cf506f0ba924cb735f15918d54758426fd7282366a5e19", "MONGO_HOST": "localhost", "MONGO_PORT": "27017", "MONGO_USERNAME": "modelci", "MONGO_PASSWORD": "modelci@2020", "MONGO_DB": "modelci", "MONGO_AUTH_SOURCE": "modelci", "BACKEND_CORS_ORIGINS": "localhost:3333" }
Write .env for frontend with setup:
{ "HOST": "localhost", "PORT": "3333", "REACT_APP_BACKEND_URL": "localhost:8000" }
Then start the ModelCI service with the following command (it is recommended to run it in your own terminal):
%%bash
modelci service init
2021-05-27 15:04:51.579933: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-05-27 15:05:19,828 - ml-modelci Docker Container Manager - INFO - Container name=mongo-75712 stared
2021-05-27 15:05:28,807 - ml-modelci Docker Container Manager - INFO - Container name=cadvisor-21447 started.
2021-05-27 15:05:37,027 - ml-modelci Docker Container Manager - INFO - Container name=dcgm-exporter-49902 started.
2021-05-27 15:05:47,032 - ml-modelci Docker Container Manager - INFO - gpu-metrics-exporter-66880 stared
/home/shanshan/miniconda3/envs/test/lib/python3.7/site-packages/modelci/hub/client/trt_client.py:22: UserWarning: Module `tensorrtserver` not installed. You are not able to use TRT Client.
  warnings.warn('Module `tensorrtserver` not installed. You are not able to use TRT Client.')
2021-05-27 15:05:47,992 - modelci backend - INFO - Uvicorn server listening on http://localhost:8000, check full log at /home/shanshan/tmp/modelci.log
First, we load a pre-trained ResNet50 model from torchvision and save the whole model file. You can refer to https://pytorch.org/docs/stable/torchvision/models.html for more examples of pretrained models.
from pathlib import Path

import torch
from torchvision import models

# load a pre-trained ResNet50 and save the whole model (architecture + weights)
model = models.resnet50(pretrained=True)
torch_model_path = Path.home() / '.modelci/ResNet50/PyTorch-PYTORCH/Image_Classification/1.pth'
torch_model_path.parent.mkdir(parents=True, exist_ok=True)
torch.save(model, torch_model_path)
We can specify an MLModel instance with a YAML file. Here is an example:
weight: "~/.modelci/ResNet50/PyTorch-PYTORCH/Image_Classification/1.pth"
architecture: ResNet50
framework: PyTorch
engine: PYTORCH
version: 1
dataset: ImageNet
task: Image_Classification
metric:
  acc: 0.76
inputs:
  - name: "input"
    shape: [ -1, 3, 224, 224 ]
    dtype: TYPE_FP32
outputs:
  - name: "output"
    shape: [ -1, 1000 ]
    dtype: TYPE_FP32
convert: true
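If you want to check a spec file like this before publishing, you can parse it with PyYAML. This is a plain YAML sanity check, not a ModelCI API:

```python
import yaml  # PyYAML

# a trimmed copy of the spec above, parsed and sanity-checked
spec_text = """
weight: "~/.modelci/ResNet50/PyTorch-PYTORCH/Image_Classification/1.pth"
architecture: ResNet50
framework: PyTorch
engine: PYTORCH
version: 1
metric:
  acc: 0.76
inputs:
  - name: "input"
    shape: [ -1, 3, 224, 224 ]
    dtype: TYPE_FP32
convert: true
"""

spec = yaml.safe_load(spec_text)
assert spec['architecture'] == 'ResNet50'
assert spec['inputs'][0]['shape'] == [-1, 3, 224, 224]
print(spec['metric'])  # {'acc': 0.76}
```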
Then we can use the modelci publish
API to register this model into the model hub; a list of generated IDs will be returned. In this case, we get three IDs, because ModelCI automatically converts registered models into optimized formats: the PyTorch model is converted to TorchScript and ONNX.
!modelci modelhub publish -f ../resnet50_explicit_path.yml
2021-05-27 15:08:22.003806: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
{'data': {'id': ['60af456c681d3e02a0099500', '60af456f681d3e02a009968b', '60af456f681d3e02a0099816']}, 'status': True}
We can list all models published in MLModelCI by the following command:
!modelci modelhub ls
2021-05-29 09:37:34.023488: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1 ┏━━━━━━┳━━━━━━┳━━━━━━┳━━━━━━┳━━━━━━┳━━━━━┳━━━━━━┳━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━┓ ┃ ID ┃ ARCH ┃ FRA… ┃ ENG… ┃ VER… ┃ DA… ┃ MET… ┃ SC… ┃ TASK ┃ STA… ┃ ┃ ┃ NAME ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┡━━━━━━╇━━━━━━╇━━━━━━╇━━━━━━╇━━━━━━╇━━━━━╇━━━━━━╇━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━┩ │ 60b… │ Res… │ PyT… │ PYT… │ 1 │ Im… │ acc │ 0.… │ Image_Classifi… │ 💔 │ │ │ │ │ │ │ │ │ │ │ Unk… │ │ │ │ │ │ │ │ │ │ │ │ │ 60b… │ Res… │ PyT… │ TOR… │ 1 │ Im… │ acc │ 0.… │ Image_Classifi… │ 💔 │ │ │ │ │ │ │ │ │ │ │ Unk… │ │ │ │ │ │ │ │ │ │ │ │ │ 60b… │ Res… │ PyT… │ ONNX │ 1 │ Im… │ acc │ 0.… │ Image_Classifi… │ 💔 │ │ │ │ │ │ │ │ │ │ │ Unk… │ └──────┴──────┴──────┴──────┴──────┴─────┴──────┴─────┴─────────────────┴──────┘
We can view detailed information about a single model by its ID:
!modelci modelhub detail 60af456c681d3e02a0099500
2021-05-27 15:08:50.873389: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1 ID Architec… Framework Version Pretrained Metric Score Task Dataset 60af456c… ResNet50 PyTorch 1 ImageNet acc 0.76 Image Classifica… Conve… Model Info Servi… PYTOR… Status 💔 Creat… shans… Creat… 25 Engine Unkno… secon… ago Inputs Name input Shape [-1, Data FP32 Format FORMAT_… 3, Type 224, 224] Outpu… Name output Shape [-1, Data FP32 Format FORMAT_… 1000] Type Profi… Resul… N.A.
!modelci modelhub detail 60af456f681d3e02a009968b
2021-05-27 15:09:03.627985: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1 ID Architec… Framework Version Pretrained Metric Score Task Dataset 60af456f… ResNet50 PyTorch 1 ImageNet acc 0.76 Image Classifica… Conve… Model Info Servi… TORCH… Status 💔 Creat… shans… Creat… 37 Engine Unkno… secon… ago Inputs Name input Shape [-1, Data FP32 Format FORMAT_… 3, Type 224, 224] Outpu… Name output Shape [-1, Data FP32 Format FORMAT_… 1000] Type Profi… Resul… N.A.
!modelci modelhub detail 60af456f681d3e02a0099816
2021-05-27 15:09:18.770745: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1 ID Architec… Framework Version Pretrained Metric Score Task Dataset 60af456f… ResNet50 PyTorch 1 ImageNet acc 0.76 Image Classifica… Conve… Model Info Servi… ONNX Status 💔 Creat… shansh… Creat… 52 Engine Unkno… seconds ago Inputs Name input Shape [-1, 3, Data FP32 Format FORMAT_… 224, Type 224] Outpu… Name output Shape [-1, Data FP32 Format FORMAT_… 1000] Type Profi… Resul… N.A.
We can also convert models manually.
You can refer to https://github.com/cap-ntu/ML-Model-CI/blob/master/docs/tutorial/convert.md for more details.
In the following example, we convert the ONNX model into TensorRT format.
from modelci.types.bo import IOShape
from modelci.types.trtis_objects import ModelInputFormat
# get ONNX model saved path
onnx_model_path = Path.home()/'.modelci/ResNet50/PyTorch-ONNX/Image_Classification/1.onnx'
trt_model_path = Path.home()/'.modelci/ResNet50/PyTorch-TRT/Image_Classification/1'
# specify inputs and outputs shape
inputs = [IOShape([-1, 3, 224, 224], dtype=float, name='INPUT__0', format=ModelInputFormat.FORMAT_NCHW)]
outputs = [IOShape([-1, 1000], dtype=float, name='probs')]
from modelci.hub.converter import convert
convert(
model=onnx_model_path,
src_framework='onnx',
dst_framework='trt',
save_path=trt_model_path,
inputs=inputs,
outputs=outputs
)
Loading ONNX file from path /home/shanshan/.modelci/ResNet50/PyTorch-ONNX/Image_Classification/1.onnx...
True
ModelCI also supports model deployment. In this case, we deploy an ONNX model; the following script starts a Docker container of onnx-serving. Port 8001 serves the HTTP endpoint, while port 8002 serves gRPC.
from modelci.hub.deployer.dispatcher import serve
batch_size = 8
server_name = 'onnx'
serve(save_path=onnx_model_path, device='cuda:0', name=server_name, batch_size=batch_size)
<Container: 651b81eb03>
!docker ps | grep onnx
651b81eb034f mlmodelci/onnx-serving:latest-gpu "/bin/sh -c 'python …" 19 seconds ago Up 12 seconds 0.0.0.0:8001->8000/tcp, 0.0.0.0:8002->8001/tcp onnx
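Before profiling, you can verify that the published ports from the `docker ps` output above (8001 for HTTP, 8002 for gRPC) are accepting connections. This is a plain TCP socket check, independent of ModelCI:

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# 8001 -> HTTP endpoint, 8002 -> gRPC endpoint of the onnx-serving container
print(port_open('localhost', 8001), port_open('localhost', 8002))
```

Both should print `True` once the container from the previous step is up.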
Once a model is registered and deployed as a service, we can profile our model to get information such as memory usage and response latency.
First, we need to retrieve the ONNX model.
from modelci.persistence.service import ModelService

model = ModelService.get_model_by_id("60af456f681d3e02a0099816")
model.inputs[0].dtype = 11  # set the input data type to FP32 (enum value 11)
Then we build a client for the ONNX serving platform:
from modelci.hub.client.onnx_client import CVONNXClient
test_img_bytes = torch.rand(3, 224, 224)
onnx_client = CVONNXClient(test_img_bytes, model, batch_num=20, batch_size=8, asynchronous=False)
Finally, we can initialize a profiler. One thing to note: the server name must be the same as the serving container's.
from modelci.hub.profiler import Profiler
profiler = Profiler(model_info=model, server_name='onnx', inspector=onnx_client)
dps = profiler.diagnose(device='cuda:0')
latency: 0.3299 sec throughput: 24.2529 req/sec latency: 0.3296 sec throughput: 24.2684 req/sec latency: 0.3288 sec throughput: 24.3295 req/sec latency: 0.3320 sec throughput: 24.0994 req/sec latency: 0.3295 sec throughput: 24.2803 req/sec latency: 0.3291 sec throughput: 24.3056 req/sec latency: 0.3339 sec throughput: 23.9628 req/sec latency: 0.3312 sec throughput: 24.1563 req/sec latency: 0.3297 sec throughput: 24.2679 req/sec latency: 0.3295 sec throughput: 24.2761 req/sec latency: 0.3298 sec throughput: 24.2566 req/sec latency: 0.3299 sec throughput: 24.2480 req/sec latency: 0.3340 sec throughput: 23.9506 req/sec latency: 0.3302 sec throughput: 24.2280 req/sec latency: 0.3297 sec throughput: 24.2668 req/sec latency: 0.3293 sec throughput: 24.2967 req/sec latency: 0.3304 sec throughput: 24.2139 req/sec latency: 0.3289 sec throughput: 24.3206 req/sec latency: 0.3293 sec throughput: 24.2927 req/sec latency: 0.3286 sec throughput: 24.3448 req/sec testing device: GeForce MX110 total batches: 20, batch_size: 8 total latency: 6.623807668685913 s total throughput: 24.15529073351584 req/sec 50th-percentile latency: 0.32966148853302 s 95th-percentile latency: 0.3338588953018189 s 99th-percentile latency: 0.3339882707595825 s total GPU memory: 2101870592.0 bytes average GPU memory usage percentage: 0.2840 average GPU memory used: 596901888.0 bytes average GPU utilization: 47.8333% completed at 2021-05-29 09:39:27.955909
dps
<modelci.types.bo.dynamic_profile_result_bo.DynamicProfileResultBO at 0x7f9244126e10>
After the experiments, we can stop the ModelCI service:
!conda activate test && modelci service stop
2021-04-28 00:59:10.251530: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-04-28 00:59:12,892 - ml-modelci Docker Container Manager - INFO - Container name=gpu-metrics-exporter-31125 stopped.
2021-04-28 00:59:13,535 - ml-modelci Docker Container Manager - INFO - Container name=cadvisor-12065 stopped.
2021-04-28 00:59:14,859 - ml-modelci Docker Container Manager - INFO - Container name=mongo-49205 stopped.
2021-04-28 00:59:14,994 - modelci backend - WARNING - No process is listening on port 8000
Alternatively, you can stop the service and remove all the stopped Docker containers with the following command:
!conda activate test && modelci service clean
2021-05-27 15:04:21.636005: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-05-27 15:04:23,508 - ml-modelci Docker Container Manager - INFO - Container name=gpu-metrics-exporter-78925 stopped.
2021-05-27 15:04:25,534 - ml-modelci Docker Container Manager - INFO - Container name=cadvisor-4409 stopped.
2021-05-27 15:04:33,859 - ml-modelci Docker Container Manager - INFO - Container name=mongo-97086 stopped.
2021-05-27 15:04:33,971 - modelci backend - WARNING - No process is listening on port 8000
2021-05-27 15:04:34,541 - ml-modelci Docker Container Manager - INFO - Container 8cd47d49c1f594684ba9d61cb399c55fc66190c6f714b74014e377c9cee9e9e3 is removed.
2021-05-27 15:04:35,250 - ml-modelci Docker Container Manager - INFO - Container ef64e035053f4b9d04bf2c2ae465e1ddce30a44bc97623ea145a8919e6900020 is removed.
2021-05-27 15:04:35,995 - ml-modelci Docker Container Manager - INFO - Container 4f1b2203ec42d2f83ccaa0b970bd3de9e4b55c3229e79536edbad50497e4da27 is removed.
2021-05-27 15:04:36,750 - ml-modelci Docker Container Manager - INFO - Container 4e6fb506ce97619521459209b690658ba2fac1319b7460d4a1ddfa5f079d6897 is removed.
Copyright 2020 Nanyang Technological University, Singapore
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.