NB: The Caffe results are released with approval from General Motors. The TensorRT 1.0 EA results are released with approval from NVIDIA.
This Jupyter Notebook compares the performance (execution time, memory consumption) of CNN benchmarks on the following platform:
$ uname -a
Linux tegra-ubuntu 3.10.96-tegra #1 SMP PREEMPT Wed Nov 9 19:42:57 PST 2016 aarch64 aarch64 aarch64 GNU/Linux
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.1 LTS"
using 6 Caffe libraries (tag: branch (revision hash, date), with math library):

- `cpu`: Master (24d2f67, 28/Nov/2016), with OpenBLAS 0.2.19;
- `nvidia-cuda`: NVIDIA 0.15 (1024d34, 17/Nov/2016), with cuBLAS (part of CUDA Toolkit 8.0.33);
- `nvidia-cudnn`: NVIDIA 0.15 (1024d34, 17/Nov/2016), with cuDNN 5.1;
- `nvidia-fp16-cuda`: NVIDIA experimental/fp16 (fca1cf4, 11/Jul/2016), with cuBLAS (part of CUDA Toolkit 8.0.33);
- `nvidia-fp16-cudnn`: NVIDIA experimental/fp16 (fca1cf4, 11/Jul/2016), with cuDNN 5.1;
- `libdnn-cuda`: OpenCL (b735c2d, 23/Nov/2016), with libDNN, and with cuBLAS (part of CUDA Toolkit 8.0.33) for fully connected layers;

using 2 configurations of the NVIDIA TensorRT 1.0.0 EA engine:

- `tensorrt-fp16`: NVIDIA TensorRT 1.0.0 EA with fp16 enabled;
- `tensorrt-fp32`: NVIDIA TensorRT 1.0.0 EA with fp16 disabled;

using 4 CNN models:

- `bvlc-alexnet` (AlexNet);
- `bvlc-googlenet` (GoogleNet);
- `deepscale-squeezenet-1.0` (SqueezeNet 1.0);
- `deepscale-squeezenet-1.1` (SqueezeNet 1.1);
with the batch size varying from 2 to 16 with step 2.
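The `batch_sizes` variable referenced by later cells is not defined in the code shown here; presumably it enumerates the explored sizes, which could be generated as follows:

```python
# Even batch sizes from 2 to 16 inclusive, as explored in the experiments.
batch_sizes = list(range(2, 17, 2))
print(batch_sizes)
```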
fw = [ 'forward' ]
fwbw = [ 'forward', 'backward' ]
# Set to fw for inference; to fwbw for training.
direction = fw
direction
['forward']
if direction==fw:
time_ms = 'time_fw_ms'
else: # direction==fwbw
time_ms = 'time_fwbw_ms'
time_ms
'time_fw_ms'
def images_per_second(time_in_milliseconds):
return 1000.0 / time_in_milliseconds
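As a sanity check (restating the helper so the snippet is self-contained), a hypothetical per-image forward time of 40 ms corresponds to 25 images per second:

```python
def images_per_second(time_in_milliseconds):
    return 1000.0 / time_in_milliseconds

# A 40 ms forward pass per image maps to 25 images/s.
print(images_per_second(40.0))
```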
NB: Please ignore this section if you are not interested in re-running or modifying this notebook.
import os
import sys
import json
import re
If some of the scientific packages are missing, please install them using:
# pip install jupyter pandas numpy matplotlib
import IPython as ip
import pandas as pd
import numpy as np
import matplotlib as mp
print ('IPython version: %s' % ip.__version__)
print ('Pandas version: %s' % pd.__version__)
print ('NumPy version: %s' % np.__version__)
print ('Matplotlib version: %s' % mp.__version__)
IPython version: 4.1.1
Pandas version: 0.19.1
NumPy version: 1.11.2
Matplotlib version: 1.5.3
from IPython.display import Image
from IPython.core.display import HTML
import matplotlib.pyplot as plt
from matplotlib import cm
%matplotlib inline
default_title = 'NVIDIA Jetson TX1'
default_ylabel = 'Execution time (ms)'
default_colormap = cm.autumn
default_fontsize = 16
default_figsize = [16, 8]
If CK is not installed, please install it using:
# pip install ck
import ck.kernel as ck
print ('CK version: %s' % ck.__version__)
CK version: 1.8.7.1
repo_uoa = 'ck-caffe-nvidia-tx1'
def get_experimental_results(repo_uoa, tags):
module_uoa = 'experiment'
r = ck.access({'action':'search', 'repo_uoa':repo_uoa, 'module_uoa':module_uoa, 'tags':tags})
if r['return']>0:
print ("Error: %s" % r['error'])
exit(1)
experiments = r['lst']
dfs = []
for experiment in experiments:
data_uoa = experiment['data_uoa']
r = ck.access({'action':'list_points', 'repo_uoa':repo_uoa, 'module_uoa':module_uoa, 'data_uoa':data_uoa})
if r['return']>0:
print ("Error: %s" % r['error'])
exit(1)
# Get (lib_tag, model_tag) from the list of tags available in r['dict']['tags'].
# Each experiment carries two of the irrelevant tags listed below, a model tag and a lib tag.
# NB: Since it's easier to list all model tags than all lib tags, the latter list is not explicitly specified.
tags = r['dict']['tags']
irrelevant_tags = [ 'explore-batch-size-libs-models','time_gpu','time_cpu','time_gpu_fp16' ]
model_tags = [ 'bvlc-alexnet','bvlc-googlenet','deepscale-squeezenet-1.0','deepscale-squeezenet-1.1' ]
lib_model_tags = [ tag for tag in tags if tag not in irrelevant_tags ]
model_tags = [ tag for tag in lib_model_tags if tag in model_tags ]
lib_tags = [ tag for tag in lib_model_tags if tag not in model_tags ]
if len(lib_tags)==1 and len(model_tags)==1:
(lib, model) = (lib_tags[0], model_tags[0])
else:
continue
for point in r['points']:
with open(os.path.join(r['path'], 'ckp-%s.0001.json' % point)) as point_file:
point_data_raw = json.load(point_file)
# Obtain column data.
characteristics = [
{
'time (ms)' : characteristics['run'].get(time_ms,+1e9), # "positive infinity"
'memory (MB)' : characteristics['run'].get('memory_mbytes',-1),
'success?' : characteristics['run'].get('run_success','n/a'),
'per layer info' : characteristics['run'].get('per_layer_info',[])
}
for characteristics in point_data_raw['characteristics_list']
]
# Deal with missing column data (resulting from failed runs).
if len(characteristics)==1:
repetitions = point_data_raw['features'].get('statistical_repetitions',1)
characteristics = characteristics * repetitions
# Construct a DataFrame.
df = pd.DataFrame(characteristics)
# Set columns and index names.
df.columns.name = 'run characteristic'
df.index.name = 'repetition'
# Set indices.
if lib=='tensorrt-1.0.0':
enable_fp16 = (point_data_raw['choices']['env']['CK_TENSORRT_ENABLE_FP16'] != 0)
df['lib'] = 'tensorrt-fp%d' % (16 if enable_fp16 else 32)
else:
df['lib'] = lib
df['model'] = model
df['batch size'] = point_data_raw['choices']['env']['CK_CAFFE_BATCH_SIZE']
df = df.set_index(['lib', 'model', 'batch size'], append=True)
df = df.reorder_levels(('model', 'lib', 'batch size', 'repetition'))
# Append to the list of similarly constructed DataFrames.
dfs.append(df)
# Concatenate all constructed DataFrames (i.e. stack on top of each other).
result = pd.concat(dfs)
return result.sortlevel(result.index.names)
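The index construction above can be sketched on made-up timings; note that with pandas 0.20 and later, `sort_index()` replaces the `sortlevel()` call used in this notebook (which targets pandas 0.19):

```python
import pandas as pd

# Build one DataFrame per experiment, indexed as in get_experimental_results.
def make_df(model, lib, times):
    df = pd.DataFrame({'time (ms)': times})
    df.index.name = 'repetition'
    df['model'], df['lib'], df['batch size'] = model, lib, 2
    df = df.set_index(['model', 'lib', 'batch size'], append=True)
    return df.reorder_levels(['model', 'lib', 'batch size', 'repetition'])

dfs = [make_df('bvlc-alexnet', 'nvidia-cudnn', [30.9, 31.2]),
       make_df('bvlc-alexnet', 'cpu', [429.4, 432.7])]
# Stack the per-experiment frames and sort by the full index.
result = pd.concat(dfs).sort_index()
print(result)
```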
def plot(mean, std, title=default_title, ylabel=default_ylabel, rot=0, ymax=0):
ymax = mean.max().max() if ymax==0 else ymax
ax = mean.plot(kind='bar', yerr=std, grid=True, legend=True, rot=rot, ylim=[0,ymax*1.05],
fontsize=default_fontsize, figsize=default_figsize, colormap=default_colormap)
ax.set_title(title, fontsize=default_fontsize)
ax.set_xlabel(mean.index.name, fontsize=default_fontsize)
ax.set_ylabel(ylabel, fontsize=default_fontsize)
return ax
pretty_print_libs = {
'cpu': 'OpenBLAS (CPU)',
'libdnn-cuda':'libDNN-fp32',
'nvidia-cuda':'cuBLAS-fp32',
'nvidia-fp16-cuda':'cuBLAS-fp16',
'nvidia-cudnn':'cuDNN-fp32',
'nvidia-fp16-cudnn':'cuDNN-fp16',
'tensorrt-fp32':'TensorRT-fp32',
'tensorrt-fp16':'TensorRT-fp16'
}
pretty_print_models = {
'bvlc-alexnet':'AlexNet',
'bvlc-googlenet':'GoogleNet',
'deepscale-squeezenet-1.0':'SqueezeNet 1.0',
'deepscale-squeezenet-1.1':'SqueezeNet 1.1'
}
speedup_sort_models = [
'OpenBLAS (CPU)',
'libDNN-fp32',
'cuBLAS-fp32',
'cuBLAS-fp16',
'cuDNN-fp32',
'cuDNN-fp16',
'TensorRT-fp32',
'TensorRT-fp16'
]
# ['cuda', 'cudnn'] are roughly equivalent to ['nvidia-cuda', 'nvidia-cudnn'], so can be dropped.
def plot_max_num_images_per_second(df_mean_time_per_image, libs_to_drop=['cuda', 'cudnn'], rot=0, fontsize=None):
min_time_per_image = df_mean_time_per_image.min(axis=1).unstack('lib')
max_num_images_per_second = images_per_second(min_time_per_image) \
.drop(libs_to_drop, axis=1) \
.rename(columns=pretty_print_libs, index=pretty_print_models) \
.reindex(columns=speedup_sort_models)
ax = max_num_images_per_second \
.plot(kind='bar', rot=rot, width=0.95, grid=True, legend=True,
fontsize=default_fontsize, figsize=default_figsize, colormap=default_colormap)
ax.set_title(default_title, fontsize=default_fontsize)
ax.set_xlabel(max_num_images_per_second.index.name, fontsize=default_fontsize)
ax.set_ylabel('Images/s (with the best even batch size between 2 and 16)', fontsize=default_fontsize)
for patch in ax.patches:
ax.annotate(str(int(patch.get_height()+0.5)), (patch.get_x()*1.00, patch.get_height()*1.01), fontsize=fontsize)
# ['cuda', 'cudnn'] are roughly equivalent to ['nvidia-cuda', 'nvidia-cudnn'], so can be dropped.
def plot_speedup_over_baseline(df_mean_time_per_image, baseline='cpu', libs_to_drop=['cuda', 'cudnn'], rot=0, fontsize=None):
speedup_over_baseline = df_mean_time_per_image.min(axis=1).unstack('model').ix[baseline] / \
df_mean_time_per_image.min(axis=1).unstack('model')
speedup_over_baseline = speedup_over_baseline.T \
.drop(libs_to_drop, axis=1) \
.rename(index=pretty_print_models, columns=pretty_print_libs) \
.reindex(columns=speedup_sort_models)
ax = speedup_over_baseline \
.plot(kind='bar', rot=rot, width=0.95, grid=True, legend=True,
fontsize=default_fontsize, figsize=default_figsize, colormap=default_colormap)
ax.set_title(default_title, fontsize=default_fontsize)
ax.set_xlabel(speedup_over_baseline.index.name, fontsize=default_fontsize)
ax.set_ylabel('Speedup over the given baseline (%s)' % pretty_print_libs[baseline], fontsize=default_fontsize)
for patch in ax.patches:
ax.annotate('{0:.2f}'.format(patch.get_height())[0:4], (patch.get_x()*1.00, patch.get_height()*1.01),
fontsize=fontsize)
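The speedup computation above divides the baseline row by the whole frame; a minimal sketch with hypothetical best times per image (the numbers and labels are illustrative only):

```python
import pandas as pd

# Hypothetical best times per image (ms): rows are libs, columns are models.
best = pd.DataFrame({'AlexNet': [429.4, 29.7]},
                    index=pd.Index(['cpu', 'nvidia-cudnn'], name='lib'))
# Dividing the baseline row (a Series) by the DataFrame broadcasts over libs.
speedup = best.loc['cpu'] / best
print(speedup)
```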
# This transformation is time consuming, hence only call it once for multiple plots.
def get_per_layer_info(df_all):
df_per_layer_info = df_all['per layer info']
row_dfs = []
for (row_info, row_id) in zip(df_per_layer_info, range(len(df_per_layer_info))):
# Skip constructing a DataFrame when no layer info is available.
if not row_info: continue
# Augment each layer info with the row index: (model, lib, batch size, repetition).
for layer_info in row_info:
layer_info.update({ k : v for k, v in zip(df_per_layer_info.index.names, df_per_layer_info.index[row_id]) })
# Construct a DataFrame and move the row index to where it belongs.
row_df = pd.DataFrame(data=row_info).set_index(df_per_layer_info.index.names)
row_dfs.append(row_df)
return pd.concat(row_dfs)
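Each iteration of the loop above augments a run's layer records with the run's index keys and moves those keys into the index; a self-contained sketch with illustrative values:

```python
import pandas as pd

# One run's layer records, augmented with the run's index keys (values made up).
index_names = ['model', 'lib', 'batch size', 'repetition']
row_info = [
    {'label': 'conv1', 'direction': 'forward', 'time_ms': 3.1},
    {'label': 'fc8',   'direction': 'forward', 'time_ms': 1.2},
]
for layer_info in row_info:
    layer_info.update(dict(zip(index_names, ('bvlc-alexnet', 'nvidia-cudnn', 2, 0))))
# Move the run keys to the index, leaving per-layer fields as columns.
row_df = pd.DataFrame(row_info).set_index(index_names)
print(row_df)
```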
def plot_time_per_image_per_layer(df_per_layer_info, model, libs, batch_sizes,
direction=['forward'], lower=0.0, upper=1.0, ymax=0, rot=90):
df_time_per_batch = df_per_layer_info.loc[model, libs, batch_sizes] \
.set_index(['direction', 'label'], append=True) \
.reorder_levels(['direction', 'label', 'model', 'lib', 'batch size', 'repetition' ]) \
.ix[direction] \
.reorder_levels(['label', 'model', 'lib', 'batch size', 'repetition', 'direction' ]) \
.groupby(level=['label', 'model', 'lib', 'batch size', 'repetition']).sum() \
['time_ms']
df_time_per_image = df_time_per_batch.unstack('batch size') / batch_sizes
df = df_time_per_image.unstack(['lib', 'model'])
df = df.reorder_levels(['model', 'lib', 'batch size'], axis=1)
mean = df.groupby(level='label').mean()
std = df.groupby(level='label').std()
select = (lower*mean.sum() <= mean).any(axis=1) & (mean <= upper*mean.sum()).any(axis=1)
ymax = mean[select].max().max() if ymax==0 else ymax
ax = plot(mean=mean[select], std=std[select], ylabel='Execution time per image per layer (ms)', ymax=ymax, rot=rot)
ax.set_xlabel('Layer', fontsize=default_fontsize)
# The ideal adaptive solution for each layer selects the best performing library from the 'libs_for_adaptation' list.
# FIXME: add batch_sizes as explicit parameter.
def get_ideal_adaptive_solution(df_per_layer_info, libs_for_adaptation, direction):
df_for_adaptation = df_per_layer_info \
.set_index(['direction', 'label'], append=True) \
.reorder_levels(['direction', 'lib', 'model', 'label', 'batch size', 'repetition']) \
.ix[direction] \
.reorder_levels(['lib', 'model', 'label', 'batch size', 'repetition', 'direction']) \
.ix[libs_for_adaptation] \
.reorder_levels(['model', 'label', 'lib', 'batch size', 'repetition', 'direction']) \
['time_ms']
# With every step, reduce the rightmost dimension until the min time per model is reached.
df_cum_time_per_repetition = df_for_adaptation.groupby(level=df_for_adaptation.index.names[:-1]).sum()
df_min_time_per_repetition = df_cum_time_per_repetition.groupby(level=df_cum_time_per_repetition.index.names[:-1]).min()
df_min_time_per_batch = df_min_time_per_repetition.unstack('batch size') / batch_sizes
df_min_time_per_image = df_min_time_per_batch.min(axis=1)
df_min_time_per_layer = df_min_time_per_image.groupby(level=df_min_time_per_image.index.names[:-1]).min()
#df_min_time_per_model = df_min_time_per_layer.groupby(level=df_min_time_per_layer.index.names[:-1]).sum()
# Transform to get the models in the index and the libs in the columns.
df_min_time_per_layer_idx = df_min_time_per_image.groupby(level=df_min_time_per_image.index.names[:-1]).idxmin()
df_ideal = df_min_time_per_image[df_min_time_per_layer_idx] \
.reorder_levels(['model', 'lib', 'label']) \
.groupby(level=['model', 'lib']).sum() \
.unstack('lib')
# Sort in the order of increasing time per model.
df_ideal_sorted = df_ideal.ix[df_ideal.sum(axis=1).sort_values(ascending=True).index]
return df_ideal_sorted
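The layer-wise selection at the heart of the function (pick the fastest library per layer via `idxmin`, then sum the winners per library) can be illustrated on a toy Series; the layer names and timings are made up:

```python
import pandas as pd

# Hypothetical per-layer times (ms) for two libraries.
s = pd.Series({('alexnet', 'conv1', 'cuBLAS'): 4.0,
               ('alexnet', 'conv1', 'cuDNN'):  3.0,
               ('alexnet', 'fc8',   'cuBLAS'): 1.0,
               ('alexnet', 'fc8',   'cuDNN'):  1.5})
s.index.names = ['model', 'label', 'lib']
# Index of the fastest lib per (model, layer), then total time per winning lib.
best_idx = s.groupby(level=['model', 'label']).idxmin()
ideal = s[best_idx].groupby(level=['model', 'lib']).sum().unstack('lib')
print(ideal)
```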
def plot_ideal_adaptive_solution(df_ideal, df_real, tag=""):
figsize=[15, 3]
if not tag=="": figsize=[10, 2] # good for dumping png (e.g. 3 graphs fit well onto a slide).
for model in df_ideal.index:
df_data = {}; df_data['adaptive'] = df_ideal.ix[model]
for lib in df_ideal.columns:
df_data[lib] = pd.Series(index=df_ideal.columns)
df_data[lib][lib] = df_real.ix[model, lib]
df = pd.DataFrame(df_data).T \
.rename(index={'cpu': 'OpenBLAS only', 'nvidia-cuda':'cuBLAS only', 'nvidia-cudnn':'cuDNN only',
'libdnn-cuda': 'libDNN only'},
columns={'cpu': 'OpenBLAS', 'nvidia-cuda':'cuBLAS', 'nvidia-cudnn':'cuDNN', 'libdnn-cuda': 'libDNN'})
ax = df.ix[df.sum(axis=1).sort_values(ascending=True).index] \
.plot(kind='barh', stacked=True, width=0.9, grid=True, legend=True,
fontsize=default_fontsize, figsize=figsize, colormap=cm.summer_r)
#.legend(loc='lower right')
ax.set_title('%s - execution time per image (ms)' % model, fontsize=default_fontsize)
if not tag=="": ax.get_figure().savefig('%s.%s.png' % (tag, model))
def plot_time_per_image_and_memory_consumption(df_all, model, lib):
df = df_all[['time (ms)', 'memory (MB)']] \
.groupby(level=df_all.index.names[:-1]).mean() \
.loc[model, lib]
df['time per image (ms)'] = df['time (ms)'].divide(df.index, axis=0)
df['memory per image (MB)'] = df['memory (MB)'].divide(df.index, axis=0)
df = df.drop('time (ms)', axis=1).sortlevel(axis=1)
ax = df.plot(secondary_y=['memory (MB)', 'memory per image (MB)'], mark_right=False, grid=True,
figsize=[12, 8], fontsize=default_fontsize, colormap=cm.winter)
ax.set_title('%s w/ %s' % (model, lib), fontsize=default_fontsize)
ax.set_xlabel(df.index.name, fontsize=default_fontsize)
ax.set_ylabel('execution time (ms)', fontsize=default_fontsize); ax.legend(loc='center left'); ax.set_ylim(0)
ax.right_ax.set_ylabel('memory consumption (MB)', fontsize=default_fontsize); ax.right_ax.legend(loc='center right')
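The per-image columns above come from dividing each per-batch measurement by its batch size, which sits in the index after the `.loc[model, lib]` selection; a minimal sketch with hypothetical timings:

```python
import pandas as pd

# Hypothetical mean times (ms) indexed by batch size, as inside the function above.
df = pd.DataFrame({'time (ms)': [40.0, 64.0]},
                  index=pd.Index([2, 4], name='batch size'))
# Dividing by the index turns per-batch time into per-image time.
df['time per image (ms)'] = df['time (ms)'].divide(df.index, axis=0)
print(df)
```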
NB: Please ignore this section if you are not interested in re-running or modifying this notebook.
The Caffe experimental data was collected on the experimental platform (after installing all Caffe libraries and models of interest) as follows:
$ cd `ck find ck-caffe:script:explore-batch-size-libs-models`
$ python explore-batch-size-libs-models-benchmark.py
It can be downloaded from GitHub via CK as follows:
$ ck pull repo:ck-caffe-nvidia-tx1 --url=https://github.com/dividiti/ck-caffe-nvidia-tx1
df_all = get_experimental_results(repo_uoa=repo_uoa, tags='explore-batch-size-libs-models')
pd.options.display.max_columns = len(df_all.columns)
pd.options.display.max_rows = len(df_all.index)
df_all
model | lib | batch size | repetition | memory (MB) | per layer info | success? | time (ms)
---|---|---|---|---|---|---|---
bvlc-alexnet | cpu | 2 | 0 | 16.646488 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 429.404000 |
1 | 16.646488 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 432.713000 | |||
2 | 16.646488 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 438.301000 | |||
4 | 0 | 33.292976 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 657.945000 | ||
1 | 33.292976 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 659.096000 | |||
2 | 33.292976 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 654.448000 | |||
6 | 0 | 49.939464 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 926.569000 | ||
1 | 49.939464 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 923.892000 | |||
2 | 49.939464 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 927.562000 | |||
8 | 0 | 66.585952 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 1058.140000 | ||
1 | 66.585952 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 1052.840000 | |||
2 | 66.585952 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 1050.140000 | |||
10 | 0 | 83.232440 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 1293.970000 | ||
1 | 83.232440 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 1279.210000 | |||
2 | 83.232440 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 1294.980000 | |||
12 | 0 | 99.878928 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 1530.730000 | ||
1 | 99.878928 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 1518.820000 | |||
2 | 99.878928 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 1527.610000 | |||
14 | 0 | 116.525416 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 1768.870000 | ||
1 | 116.525416 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 1750.410000 | |||
2 | 116.525416 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 1767.410000 | |||
16 | 0 | 133.171904 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 1983.670000 | ||
1 | 133.171904 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 2001.910000 | |||
2 | 133.171904 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 1982.130000 | |||
libdnn-cuda | 2 | 0 | 16.646488 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 43.455800 | |
1 | 16.646488 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 45.209400 | |||
2 | 16.646488 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 45.086200 | |||
4 | 0 | 33.292976 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 65.658900 | ||
1 | 33.292976 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 65.170500 | |||
2 | 33.292976 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 65.972100 | |||
6 | 0 | 49.939464 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 93.980300 | ||
1 | 49.939464 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 93.099900 | |||
2 | 49.939464 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 91.359400 | |||
8 | 0 | 66.585952 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 120.833000 | ||
1 | 66.585952 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 118.322000 | |||
2 | 66.585952 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 118.151000 | |||
10 | 0 | 83.232440 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 147.654000 | ||
1 | 83.232440 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 147.449000 | |||
2 | 83.232440 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 149.036000 | |||
12 | 0 | 99.878928 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 179.296000 | ||
1 | 99.878928 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 177.986000 | |||
2 | 99.878928 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 177.760000 | |||
14 | 0 | 116.525416 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 208.372000 | ||
1 | 116.525416 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 208.853000 | |||
2 | 116.525416 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 208.871000 | |||
16 | 0 | 133.171904 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 233.930000 | ||
1 | 133.171904 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 232.266000 | |||
2 | 133.171904 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 235.624000 | |||
nvidia-cuda | 2 | 0 | 16.646488 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 32.655300 | |
1 | 16.646488 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 34.419300 | |||
2 | 16.646488 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 32.973400 | |||
4 | 0 | 33.292976 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 44.510100 | ||
1 | 33.292976 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 44.684900 | |||
2 | 33.292976 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 47.259100 | |||
6 | 0 | 49.939464 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 63.047100 | ||
1 | 49.939464 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 62.730400 | |||
2 | 49.939464 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 62.857800 | |||
8 | 0 | 66.585952 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 82.230200 | ||
1 | 66.585952 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 81.707200 | |||
2 | 66.585952 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 81.749900 | |||
10 | 0 | 83.232440 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 104.912000 | ||
1 | 83.232440 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 102.048000 | |||
2 | 83.232440 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 102.393000 | |||
12 | 0 | 99.878928 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 123.494000 | ||
1 | 99.878928 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 122.632000 | |||
2 | 99.878928 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 123.046000 | |||
14 | 0 | 116.525416 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 146.433000 | ||
1 | 116.525416 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 145.012000 | |||
2 | 116.525416 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 144.910000 | |||
16 | 0 | 133.171904 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 161.290000 | ||
1 | 133.171904 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 159.418000 | |||
2 | 133.171904 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 163.115000 | |||
nvidia-cudnn | 2 | 0 | 16.646488 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 30.962000 | |
1 | 16.646488 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 31.267200 | |||
2 | 16.646488 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 29.678100 | |||
4 | 0 | 33.292976 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 36.172400 | ||
1 | 33.292976 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 35.930800 | |||
2 | 33.292976 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 35.934400 | |||
6 | 0 | 49.939464 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 49.516800 | ||
1 | 49.939464 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 47.077900 | |||
2 | 49.939464 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 48.064000 | |||
8 | 0 | 66.585952 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 61.372700 | ||
1 | 66.585952 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 61.781200 | |||
2 | 66.585952 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 62.266000 | |||
10 | 0 | 83.232440 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 77.890600 | ||
1 | 83.232440 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 76.912900 | |||
2 | 83.232440 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 77.028600 | |||
12 | 0 | 99.878928 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 90.423300 | ||
1 | 99.878928 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 91.181600 | |||
2 | 99.878928 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 90.351700 | |||
14 | 0 | 116.525416 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 108.288000 | ||
1 | 116.525416 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 108.049000 | |||
2 | 116.525416 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 110.727000 | |||
16 | 0 | 133.171904 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 118.699000 | ||
1 | 133.171904 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 118.891000 | |||
2 | 133.171904 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 118.251000 | |||
nvidia-fp16-cuda | 2 | 0 | 7.704896 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 24.763800 | |
1 | 7.704896 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 31.361200 | |||
2 | 7.704896 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 24.386800 | |||
4 | 0 | 15.409792 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 35.685400 | ||
1 | 15.409792 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 35.958800 | |||
2 | 15.409792 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 35.573300 | |||
6 | 0 | 23.114688 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 54.463000 | ||
1 | 23.114688 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 52.272600 | |||
2 | 23.114688 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 52.196600 | |||
8 | 0 | 30.819584 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 69.339700 | ||
1 | 30.819584 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 70.808600 | |||
2 | 30.819584 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 69.177900 | |||
10 | 0 | 38.524480 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 87.918800 | ||
1 | 38.524480 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 87.469800 | |||
2 | 38.524480 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 87.699700 | |||
12 | 0 | 46.229376 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 103.192000 | ||
1 | 46.229376 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 102.738000 | |||
2 | 46.229376 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 103.069000 | |||
14 | 0 | 53.934272 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 128.973000 | ||
1 | 53.934272 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 129.737000 | |||
2 | 53.934272 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 128.318000 | |||
16 | 0 | 61.639168 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 138.174000 | ||
1 | 61.639168 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 136.887000 | |||
2 | 61.639168 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 138.502000 | |||
nvidia-fp16-cudnn | 2 | 0 | 7.704896 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 18.756000 | |
1 | 7.704896 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 18.588600 | |||
2 | 7.704896 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 18.794500 | |||
4 | 0 | 15.409792 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 23.403500 | ||
1 | 15.409792 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 23.671800 | |||
2 | 15.409792 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 23.406400 | |||
6 | 0 | 23.114688 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 32.481200 | ||
1 | 23.114688 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 32.601000 | |||
2 | 23.114688 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 32.558900 | |||
8 | 0 | 30.819584 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 41.800500 | ||
1 | 30.819584 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 41.906000 | |||
2 | 30.819584 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 41.736200 | |||
10 | 0 | 38.524480 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 56.119700 | ||
1 | 38.524480 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 55.608400 | |||
2 | 38.524480 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 56.036900 | |||
12 | 0 | 46.229376 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 62.911300 | ||
1 | 46.229376 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 62.975500 | |||
2 | 46.229376 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 63.190200 | |||
14 | 0 | 53.934272 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 83.796800 | ||
1 | 53.934272 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 83.486300 | |||
2 | 53.934272 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 84.114500 | |||
16 | 0 | 61.639168 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 81.327700 | ||
1 | 61.639168 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 80.877900 | |||
2 | 61.639168 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 80.907600 | |||
tensorrt-fp16 | 2 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 12.620625 | |
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 12.564573 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 12.914520 | |||
4 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 16.090150 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 15.953150 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 15.990307 | |||
6 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 27.783512 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 26.235513 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 26.120513 | |||
8 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 30.959194 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 30.995985 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 30.967563 | |||
10 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 34.830140 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 34.651298 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 34.610877 | |||
12 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 39.653559 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 39.588243 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 39.469033 | |||
14 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 43.443609 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 43.248030 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 43.646662 | |||
16 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 47.710660 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 47.555765 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 47.390291 | |||
tensorrt-fp32 | 2 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 29.353353 | |
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 20.505832 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 20.473990 | |||
4 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 28.293512 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 28.337670 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 28.287670 | |||
6 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 40.110348 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 40.261453 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 40.066032 | |||
8 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 52.178815 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 52.142868 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 52.086184 | |||
10 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 63.779229 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 63.143178 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 63.218810 | |||
12 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 71.943542 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 72.634436 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 71.585332 | |||
14 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 79.590329 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 79.380643 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 78.854327 | |||
16 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 104.076155 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 103.862054 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 103.895998 | |||
bvlc-googlenet | cpu | 2 | 0 | 110.306688 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 726.576000 |
1 | 110.306688 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 721.600000 | |||
2 | 110.306688 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 721.741000 | |||
4 | 0 | 220.613376 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 1433.930000 | ||
1 | 220.613376 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 1435.080000 | |||
2 | 220.613376 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 1452.440000 | |||
6 | 0 | 330.920064 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 2161.430000 | ||
1 | 330.920064 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 2169.870000 | |||
2 | 330.920064 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 2165.080000 | |||
8 | 0 | 441.226752 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 2872.770000 | ||
1 | 441.226752 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 2895.270000 | |||
2 | 441.226752 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 2882.010000 | |||
10 | 0 | 551.533440 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 3577.710000 | ||
1 | 551.533440 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 3592.950000 | |||
2 | 551.533440 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 3600.860000 | |||
12 | 0 | 661.840128 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 4307.930000 | ||
1 | 661.840128 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 4308.880000 | |||
2 | 661.840128 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 4362.740000 | |||
14 | 0 | 772.146816 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 5008.490000 | ||
1 | 772.146816 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 5067.070000 | |||
2 | 772.146816 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 5023.240000 | |||
16 | 0 | 882.453504 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 5737.780000 | ||
1 | 882.453504 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 5730.080000 | |||
2 | 882.453504 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 5731.630000 | |||
libdnn-cuda | 2 | 0 | 110.306688 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 95.832100 | |
1 | 110.306688 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 98.095700 | |||
2 | 110.306688 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 107.846000 | |||
4 | 0 | 220.613376 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 150.999000 | ||
1 | 220.613376 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 152.185000 | |||
2 | 220.613376 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 153.025000 | |||
6 | 0 | 330.920064 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 215.748000 | ||
1 | 330.920064 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 216.037000 | |||
2 | 330.920064 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 216.775000 | |||
8 | 0 | 441.226752 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 275.590000 | ||
1 | 441.226752 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 275.458000 | |||
2 | 441.226752 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 276.615000 | |||
10 | 0 | 551.533440 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 338.860000 | ||
1 | 551.533440 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 349.021000 | |||
2 | 551.533440 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 349.793000 | |||
12 | 0 | 661.840128 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 414.334000 | ||
1 | 661.840128 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 411.590000 | |||
2 | 661.840128 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 401.385000 | |||
14 | 0 | 772.146816 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 462.678000 | ||
1 | 772.146816 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 468.287000 | |||
2 | 772.146816 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 472.324000 | |||
16 | 0 | 882.453504 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 523.130000 | ||
1 | 882.453504 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 522.837000 | |||
2 | 882.453504 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 522.547000 | |||
nvidia-cuda | 2 | 0 | 110.306688 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 72.503000 | |
1 | 110.306688 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 67.808100 | |||
2 | 110.306688 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 67.142200 | |||
4 | 0 | 220.613376 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 114.992000 | ||
1 | 220.613376 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 113.434000 | |||
2 | 220.613376 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 113.910000 | |||
6 | 0 | 330.920064 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 158.350000 | ||
1 | 330.920064 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 158.556000 | |||
2 | 330.920064 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 159.451000 | |||
8 | 0 | 441.226752 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 203.071000 | ||
1 | 441.226752 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 204.409000 | |||
2 | 441.226752 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 203.029000 | |||
10 | 0 | 551.533440 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 248.787000 | ||
1 | 551.533440 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 248.164000 | |||
2 | 551.533440 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 248.568000 | |||
12 | 0 | 661.840128 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 298.789000 | ||
1 | 661.840128 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 294.533000 | |||
2 | 661.840128 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 300.100000 | |||
14 | 0 | 772.146816 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 342.822000 | ||
1 | 772.146816 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 339.103000 | |||
2 | 772.146816 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 347.853000 | |||
16 | 0 | 882.453504 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 383.233000 | ||
1 | 882.453504 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 386.597000 | |||
2 | 882.453504 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 385.031000 | |||
nvidia-cudnn | 2 | 0 | 110.306688 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 55.022400 | |
1 | 110.306688 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 58.155200 | |||
2 | 110.306688 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 55.736300 | |||
4 | 0 | 220.613376 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 86.435100 | ||
1 | 220.613376 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 85.662900 | |||
2 | 220.613376 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 86.069500 | |||
6 | 0 | 330.920064 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 117.502000 | ||
1 | 330.920064 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 123.356000 | |||
2 | 330.920064 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 117.326000 | |||
8 | 0 | 441.226752 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 148.409000 | ||
1 | 441.226752 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 151.150000 | |||
2 | 441.226752 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 149.771000 | |||
10 | 0 | 551.533440 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 182.625000 | ||
1 | 551.533440 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 178.184000 | |||
2 | 551.533440 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 177.301000 | |||
12 | 0 | 661.840128 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 211.654000 | ||
1 | 661.840128 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 209.654000 | |||
2 | 661.840128 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 211.429000 | |||
14 | 0 | 772.146816 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 243.337000 | ||
1 | 772.146816 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 244.806000 | |||
2 | 772.146816 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 242.718000 | |||
16 | 0 | 882.453504 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 272.625000 | ||
1 | 882.453504 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 274.295000 | |||
2 | 882.453504 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 274.347000 | |||
nvidia-fp16-cuda | 2 | 0 | 54.551232 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 49.495300 | |
1 | 54.551232 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 49.407600 | |||
2 | 54.551232 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 51.201900 | |||
4 | 0 | 109.102464 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 91.893600 | ||
1 | 109.102464 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 93.751800 | |||
2 | 109.102464 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 91.854400 | |||
6 | 0 | 163.653696 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 134.170000 | ||
1 | 163.653696 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 134.373000 | |||
2 | 163.653696 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 133.610000 | |||
8 | 0 | 218.204928 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 176.348000 | ||
1 | 218.204928 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 176.455000 | |||
2 | 218.204928 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 176.885000 | |||
10 | 0 | 272.756160 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 219.665000 | ||
1 | 272.756160 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 219.154000 | |||
2 | 272.756160 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 221.270000 | |||
12 | 0 | 327.307392 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 263.124000 | ||
1 | 327.307392 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 262.079000 | |||
2 | 327.307392 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 263.535000 | |||
14 | 0 | 381.858624 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 306.364000 | ||
1 | 381.858624 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 305.265000 | |||
2 | 381.858624 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 306.503000 | |||
16 | 0 | 436.409856 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 348.909000 | ||
1 | 436.409856 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 349.635000 | |||
2 | 436.409856 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 348.834000 | |||
nvidia-fp16-cudnn | 2 | 0 | 54.551232 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 29.643400 | |
1 | 54.551232 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 29.601300 | |||
2 | 54.551232 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 29.663000 | |||
4 | 0 | 109.102464 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 54.761200 | ||
1 | 109.102464 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 54.610200 | |||
2 | 109.102464 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 54.659300 | |||
6 | 0 | 163.653696 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 80.573500 | ||
1 | 163.653696 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 80.484700 | |||
2 | 163.653696 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 80.399300 | |||
8 | 0 | 218.204928 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 90.605100 | ||
1 | 218.204928 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 90.459300 | |||
2 | 218.204928 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 90.586500 | |||
10 | 0 | 272.756160 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 128.923000 | ||
1 | 272.756160 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 129.616000 | |||
2 | 272.756160 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 131.376000 | |||
12 | 0 | 327.307392 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 153.502000 | ||
1 | 327.307392 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 153.667000 | |||
2 | 327.307392 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 153.424000 | |||
14 | 0 | 381.858624 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 183.439000 | ||
1 | 381.858624 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 181.383000 | |||
2 | 381.858624 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 183.069000 | |||
16 | 0 | 436.409856 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 178.709000 | ||
1 | 436.409856 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 177.023000 | |||
2 | 436.409856 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 178.549000 | |||
tensorrt-fp16 | 2 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 14.485308 | |
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 14.576308 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 14.585309 | |||
4 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 24.238514 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 24.137461 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 24.254303 | |||
6 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 34.025298 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 34.076772 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 34.172509 | |||
8 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 44.013294 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 43.854451 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 43.804714 | |||
10 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 52.217868 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 52.229500 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 52.229815 | |||
12 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 64.078599 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 64.333019 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 64.391072 | |||
14 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 74.021542 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 73.949804 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 74.062541 | |||
16 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 83.507063 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 83.629536 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 83.874746 | |||
tensorrt-fp32 | 2 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 24.654566 | |
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 24.706514 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 24.812513 | |||
4 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 46.024871 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 46.235344 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 46.118398 | |||
6 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 68.209123 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 68.337544 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 68.308860 | |||
8 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 89.629849 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 88.754744 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 89.036270 | |||
10 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 108.358576 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 109.368577 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 108.333629 | |||
12 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 132.795089 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 132.970562 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 132.811038 | |||
14 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 153.733922 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 153.970342 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 153.733921 | |||
16 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 176.017908 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 177.000647 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 176.663119 | |||
deepscale-squeezenet-1.0 | cpu | 2 | 0 | 106.120408 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 253.702000 |
1 | 106.120408 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 255.276000 | |||
2 | 106.120408 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 252.807000 | |||
4 | 0 | 212.240816 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 502.686000 | ||
1 | 212.240816 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 504.233000 | |||
2 | 212.240816 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 504.196000 | |||
6 | 0 | 318.361224 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 756.378000 | ||
1 | 318.361224 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 752.891000 | |||
2 | 318.361224 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 755.365000 | |||
8 | 0 | 424.481632 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 1002.240000 | ||
1 | 424.481632 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 1003.510000 | |||
2 | 424.481632 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 1011.790000 | |||
10 | 0 | 530.602040 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 1253.520000 | ||
1 | 530.602040 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 1260.400000 | |||
2 | 530.602040 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 1256.570000 | |||
12 | 0 | 636.722448 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 1504.420000 | ||
1 | 636.722448 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 1505.160000 | |||
2 | 636.722448 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 1505.980000 | |||
14 | 0 | 742.842856 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 1754.850000 | ||
1 | 742.842856 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 1763.890000 | |||
2 | 742.842856 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 1752.510000 | |||
16 | 0 | 848.963264 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 2017.420000 | ||
1 | 848.963264 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 2005.910000 | |||
2 | 848.963264 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 2027.760000 | |||
libdnn-cuda | 2 | 0 | 106.120408 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 48.651200 | |
1 | 106.120408 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 50.457500 | |||
2 | 106.120408 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 55.726400 | |||
4 | 0 | 212.240816 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 88.424500 | ||
1 | 212.240816 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 86.170400 | |||
2 | 212.240816 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 89.805400 | |||
6 | 0 | 318.361224 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 126.186000 | ||
1 | 318.361224 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 133.083000 | |||
2 | 318.361224 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 123.676000 | |||
8 | 0 | 424.481632 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 166.935000 | ||
1 | 424.481632 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 162.627000 | |||
2 | 424.481632 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 163.530000 | |||
10 | 0 | 530.602040 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 196.310000 | ||
1 | 530.602040 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 223.921000 | |||
2 | 530.602040 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 197.104000 | |||
12 | 0 | 636.722448 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 233.976000 | ||
1 | 636.722448 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 234.237000 | |||
2 | 636.722448 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 235.000000 | |||
14 | 0 | 742.842856 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 270.913000 | ||
1 | 742.842856 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 276.680000 | |||
2 | 742.842856 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 271.152000 | |||
16 | 0 | 848.963264 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 314.305000 | ||
1 | 848.963264 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 307.690000 | |||
2 | 848.963264 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 314.576000 | |||
nvidia-cuda | 2 | 0 | 106.120408 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 40.542600 | |
1 | 106.120408 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 41.271000 | |||
2 | 106.120408 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 41.370500 | |||
4 | 0 | 212.240816 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 77.570300 | ||
1 | 212.240816 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 72.845500 | |||
2 | 212.240816 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 71.853300 | |||
6 | 0 | 318.361224 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 103.481000 | ||
1 | 318.361224 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 104.214000 | |||
2 | 318.361224 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 107.186000 | |||
8 | 0 | 424.481632 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 133.831000 | ||
1 | 424.481632 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 133.928000 | |||
2 | 424.481632 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 136.559000 | |||
10 | 0 | 530.602040 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 166.995000 | ||
1 | 530.602040 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 166.171000 | |||
2 | 530.602040 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 163.876000 | |||
12 | 0 | 636.722448 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 195.475000 | ||
1 | 636.722448 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 194.790000 | |||
2 | 636.722448 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 197.505000 | |||
14 | 0 | 742.842856 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 225.449000 | ||
1 | 742.842856 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 224.973000 | |||
2 | 742.842856 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 227.388000 | |||
16 | 0 | 848.963264 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 255.899000 | ||
1 | 848.963264 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 256.939000 | |||
2 | 848.963264 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 257.944000 | |||
nvidia-cudnn | 2 | 0 | 106.120408 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 34.387000 | |
1 | 106.120408 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 36.602200 | |||
2 | 106.120408 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 35.238500 | |||
4 | 0 | 212.240816 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 59.645700 | ||
1 | 212.240816 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 60.952400 | |||
2 | 212.240816 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 58.919300 | |||
6 | 0 | 318.361224 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 83.164300 | ||
1 | 318.361224 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 83.634100 | |||
2 | 318.361224 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 83.703200 | |||
8 | 0 | 424.481632 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 108.407000 | ||
1 | 424.481632 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 110.041000 | |||
2 | 424.481632 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 112.536000 | |||
10 | 0 | 530.602040 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 134.133000 | ||
1 | 530.602040 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 132.381000 | |||
2 | 530.602040 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 134.470000 | |||
12 | 0 | 636.722448 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 156.495000 | ||
1 | 636.722448 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 156.894000 | |||
2 | 636.722448 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 158.866000 | |||
14 | 0 | 742.842856 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 185.548000 | ||
1 | 742.842856 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 183.169000 | |||
2 | 742.842856 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 185.909000 | |||
16 | 0 | 848.963264 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 211.621000 | ||
1 | 848.963264 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 206.880000 | |||
2 | 848.963264 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 205.637000 | |||
nvidia-fp16-cuda | 2 | 0 | 52.441856 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 27.853600 | |
1 | 52.441856 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 27.492600 | |||
2 | 52.441856 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 27.647800 | |||
4 | 0 | 104.883712 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 53.218000 | ||
1 | 104.883712 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 53.337200 | |||
2 | 104.883712 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 54.811700 | |||
6 | 0 | 157.325568 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 78.116900 | ||
1 | 157.325568 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 78.267300 | |||
2 | 157.325568 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 78.620300 | |||
8 | 0 | 209.767424 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 104.016000 | ||
1 | 209.767424 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 103.641000 | |||
2 | 209.767424 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 103.442000 | |||
10 | 0 | 262.209280 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 128.267000 | ||
1 | 262.209280 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 128.886000 | |||
2 | 262.209280 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 128.613000 | |||
12 | 0 | 314.651136 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 155.909000 | ||
1 | 314.651136 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 156.209000 | |||
2 | 314.651136 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 154.058000 | |||
14 | 0 | 367.092992 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 179.802000 | ||
1 | 367.092992 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 179.570000 | |||
2 | 367.092992 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 181.353000 | |||
16 | 0 | 419.534848 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 206.533000 | ||
1 | 419.534848 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 204.565000 | |||
2 | 419.534848 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 204.765000 | |||
nvidia-fp16-cudnn | 2 | 0 | 52.441856 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 20.285600 | |
1 | 52.441856 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 20.263700 | |||
2 | 52.441856 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 20.176900 | |||
4 | 0 | 104.883712 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 39.222300 | ||
1 | 104.883712 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 39.080000 | |||
2 | 104.883712 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 39.138000 | |||
6 | 0 | 157.325568 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 58.987900 | ||
1 | 157.325568 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 59.029000 | |||
2 | 157.325568 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 59.102300 | |||
8 | 0 | 209.767424 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 75.810200 | ||
1 | 209.767424 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 72.273200 | |||
2 | 209.767424 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 71.923500 | |||
10 | 0 | 262.209280 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 98.192500 | ||
1 | 262.209280 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 97.878600 | |||
2 | 262.209280 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 97.853600 | |||
12 | 0 | 314.651136 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 116.559000 | ||
1 | 314.651136 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 114.888000 | |||
2 | 314.651136 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 117.108000 | |||
14 | 0 | 367.092992 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 136.638000 | ||
1 | 367.092992 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 136.398000 | |||
2 | 367.092992 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 136.875000 | |||
16 | 0 | 419.534848 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 142.985000 | ||
1 | 419.534848 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 143.718000 | |||
2 | 419.534848 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 143.840000 | |||
tensorrt-fp16 | 2 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 8.757838 | |
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 8.738153 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 8.700311 | |||
4 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 15.879676 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 15.922834 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 15.959466 | |||
6 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 23.154199 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 23.167041 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 23.222724 | |||
8 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 30.145195 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 30.219143 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 30.188668 | |||
10 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 37.440718 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 37.072928 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 37.368454 | |||
12 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 45.281240 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 44.407293 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 44.368872 | |||
14 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 51.842710 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 51.829131 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 52.408868 | |||
16 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 58.747601 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 58.934180 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 58.617443 | |||
tensorrt-fp32 | 2 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 15.774518 | |
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 15.716466 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 15.731940 | |||
4 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 30.450721 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 30.406879 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 30.546563 | |||
6 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 45.192555 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 44.934661 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 45.055346 | |||
8 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 59.673022 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 60.149180 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 59.989180 | |||
10 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 75.474067 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 74.971699 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 74.683594 | |||
12 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 90.008849 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 89.615848 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 89.681428 | |||
14 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 104.018262 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 103.730684 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 104.923155 | |||
16 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 119.386255 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 118.910782 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 119.409730 | |||
deepscale-squeezenet-1.1 | cpu | 2 | 0 | 64.629592 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 136.270000 |
1 | 64.629592 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 136.626000 | |||
2 | 64.629592 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 137.120000 | |||
4 | 0 | 129.259184 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 270.304000 | ||
1 | 129.259184 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 273.057000 | |||
2 | 129.259184 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 270.515000 | |||
6 | 0 | 193.888776 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 408.191000 | ||
1 | 193.888776 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 404.040000 | |||
2 | 193.888776 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 406.419000 | |||
8 | 0 | 258.518368 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 536.611000 | ||
1 | 258.518368 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 538.226000 | |||
2 | 258.518368 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 540.875000 | |||
10 | 0 | 323.147960 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 669.522000 | ||
1 | 323.147960 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 672.894000 | |||
2 | 323.147960 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 671.978000 | |||
12 | 0 | 387.777552 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 808.527000 | ||
1 | 387.777552 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 807.042000 | |||
2 | 387.777552 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 804.237000 | |||
14 | 0 | 452.407144 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 937.107000 | ||
1 | 452.407144 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 944.726000 | |||
2 | 452.407144 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 938.090000 | |||
16 | 0 | 517.036736 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 1078.190000 | ||
1 | 517.036736 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 1072.820000 | |||
2 | 517.036736 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 1086.130000 | |||
libdnn-cuda | 2 | 0 | 64.629592 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 32.291500 | |
1 | 64.629592 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 28.682600 | |||
2 | 64.629592 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 43.390000 | |||
4 | 0 | 129.259184 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 58.585900 | ||
1 | 129.259184 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 49.974100 | |||
2 | 129.259184 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 49.609600 | |||
6 | 0 | 193.888776 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 76.880300 | ||
1 | 193.888776 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 67.614700 | |||
2 | 193.888776 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 69.939300 | |||
8 | 0 | 258.518368 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 94.871500 | ||
1 | 258.518368 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 92.338400 | |||
2 | 258.518368 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 96.141600 | |||
10 | 0 | 323.147960 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 113.236000 | ||
1 | 323.147960 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 113.252000 | |||
2 | 323.147960 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 105.192000 | |||
12 | 0 | 387.777552 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 125.267000 | ||
1 | 387.777552 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 124.526000 | |||
2 | 387.777552 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 133.248000 | |||
14 | 0 | 452.407144 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 151.743000 | ||
1 | 452.407144 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 145.605000 | |||
2 | 452.407144 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 150.738000 | |||
16 | 0 | 517.036736 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 170.049000 | ||
1 | 517.036736 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 174.297000 | |||
2 | 517.036736 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 165.273000 | |||
nvidia-cuda | 2 | 0 | 64.629592 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 30.304500 | |
1 | 64.629592 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 28.741000 | |||
2 | 64.629592 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 28.350700 | |||
4 | 0 | 129.259184 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 53.818100 | ||
1 | 129.259184 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 55.737700 | |||
2 | 129.259184 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 49.688400 | |||
6 | 0 | 193.888776 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 64.211500 | ||
1 | 193.888776 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 64.305500 | |||
2 | 193.888776 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 68.016700 | |||
8 | 0 | 258.518368 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 86.360600 | ||
1 | 258.518368 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 85.752100 | |||
2 | 258.518368 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 83.783000 | |||
10 | 0 | 323.147960 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 100.881000 | ||
1 | 323.147960 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 104.219000 | |||
2 | 323.147960 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 102.938000 | |||
12 | 0 | 387.777552 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 122.704000 | ||
1 | 387.777552 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 120.847000 | |||
2 | 387.777552 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 119.593000 | |||
14 | 0 | 452.407144 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 138.405000 | ||
1 | 452.407144 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 140.048000 | |||
2 | 452.407144 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 138.398000 | |||
16 | 0 | 517.036736 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 156.537000 | ||
1 | 517.036736 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 157.109000 | |||
2 | 517.036736 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 159.291000 | |||
nvidia-cudnn | 2 | 0 | 64.629592 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 24.500500 | |
1 | 64.629592 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 25.696100 | |||
2 | 64.629592 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 25.903400 | |||
4 | 0 | 129.259184 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 37.777800 | ||
1 | 129.259184 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 38.335900 | |||
2 | 129.259184 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 38.896700 | |||
6 | 0 | 193.888776 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 51.485400 | ||
1 | 193.888776 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 51.531400 | |||
2 | 193.888776 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 51.437000 | |||
8 | 0 | 258.518368 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 65.522900 | ||
1 | 258.518368 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 64.851700 | |||
2 | 258.518368 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 64.468300 | |||
10 | 0 | 323.147960 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 80.538000 | ||
1 | 323.147960 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 78.080400 | |||
2 | 323.147960 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 78.383700 | |||
12 | 0 | 387.777552 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 91.085900 | ||
1 | 387.777552 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 94.015300 | |||
2 | 387.777552 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 91.812000 | |||
14 | 0 | 452.407144 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 109.639000 | ||
1 | 452.407144 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 107.549000 | |||
2 | 452.407144 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 109.073000 | |||
16 | 0 | 517.036736 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 119.433000 | ||
1 | 517.036736 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 118.776000 | |||
2 | 517.036736 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 129.538000 | |||
nvidia-fp16-cuda | 2 | 0 | 31.696448 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 18.812500 | |
1 | 31.696448 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 19.924600 | |||
2 | 31.696448 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 18.920100 | |||
4 | 0 | 63.392896 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 34.724900 | ||
1 | 63.392896 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 34.908400 | |||
2 | 63.392896 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 35.160200 | |||
6 | 0 | 95.089344 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 50.611900 | ||
1 | 95.089344 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 52.941600 | |||
2 | 95.089344 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 52.888400 | |||
8 | 0 | 126.785792 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 66.761300 | ||
1 | 126.785792 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 66.974200 | |||
2 | 126.785792 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 67.070000 | |||
10 | 0 | 158.482240 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 82.873300 | ||
1 | 158.482240 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 85.610100 | |||
2 | 158.482240 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 82.638900 | |||
12 | 0 | 190.178688 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 98.593900 | ||
1 | 190.178688 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 100.466000 | |||
2 | 190.178688 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 98.875200 | |||
14 | 0 | 221.875136 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 114.812000 | ||
1 | 221.875136 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 116.996000 | |||
2 | 221.875136 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 114.821000 | |||
16 | 0 | 253.571584 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 140.382000 | ||
1 | 253.571584 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 131.114000 | |||
2 | 253.571584 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 131.181000 | |||
nvidia-fp16-cudnn | 2 | 0 | 31.696448 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 14.147600 | |
1 | 31.696448 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 12.577200 | |||
2 | 31.696448 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 12.529400 | |||
4 | 0 | 63.392896 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 23.465500 | ||
1 | 63.392896 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 23.520600 | |||
2 | 63.392896 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 23.469500 | |||
6 | 0 | 95.089344 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 35.551900 | ||
1 | 95.089344 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 35.491700 | |||
2 | 95.089344 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 35.629900 | |||
8 | 0 | 126.785792 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 40.318700 | ||
1 | 126.785792 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 40.286800 | |||
2 | 126.785792 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 40.327100 | |||
10 | 0 | 158.482240 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 58.051000 | ||
1 | 158.482240 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 58.168500 | |||
2 | 158.482240 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 58.068100 | |||
12 | 0 | 190.178688 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 67.803500 | ||
1 | 190.178688 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 67.663000 | |||
2 | 190.178688 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 69.536100 | |||
14 | 0 | 221.875136 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 81.022400 | ||
1 | 221.875136 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 81.185500 | |||
2 | 221.875136 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 80.984100 | |||
16 | 0 | 253.571584 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 79.594500 | ||
1 | 253.571584 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 80.570500 | |||
2 | 253.571584 | [{u'index': 0, u'direction': u'forward', u'tim... | yes | 80.427800 | |||
tensorrt-fp16 | 2 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 5.377155 | |
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 5.414945 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 5.411576 | |||
4 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 9.027626 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 9.028205 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 8.962101 | |||
6 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 12.251047 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 12.367309 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 12.304204 | |||
8 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 15.868360 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 15.945623 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 15.835097 | |||
10 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 18.976411 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 18.924622 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 19.173148 | |||
12 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 23.086462 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 23.001304 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 23.013410 | |||
14 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 26.214039 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 26.271934 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 26.415460 | |||
16 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 30.113932 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 30.275458 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 30.134406 | |||
tensorrt-fp32 | 2 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 8.265153 | |
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 8.238364 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 8.144259 | |||
4 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 15.685308 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 15.143676 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 15.116308 | |||
6 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 22.320673 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 22.175621 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 22.011515 | |||
8 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 29.065248 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 29.047458 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 29.676406 | |||
10 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 36.717139 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 36.331982 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 36.023297 | |||
12 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 43.034188 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 43.000188 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 42.970767 | |||
14 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 49.826817 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 49.500343 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 49.993396 | |||
16 | 0 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 57.256970 | ||
1 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 57.296445 | |||
2 | 0.000000 | [{u'index': 0, u'direction': u'forward', u'nam... | yes | 58.138023 |
df_time = df_all['time (ms)'].unstack(df_all.index.names[:-1])
pd.options.display.max_columns = len(df_time.columns)
pd.options.display.max_rows = len(df_time.index)
df_time
model | bvlc-alexnet | bvlc-googlenet | deepscale-squeezenet-1.0 | deepscale-squeezenet-1.1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
lib | cpu | libdnn-cuda | nvidia-cuda | nvidia-cudnn | nvidia-fp16-cuda | nvidia-fp16-cudnn | tensorrt-fp16 | tensorrt-fp32 | cpu | libdnn-cuda | nvidia-cuda | nvidia-cudnn | nvidia-fp16-cuda | nvidia-fp16-cudnn | tensorrt-fp16 | tensorrt-fp32 | cpu | libdnn-cuda | nvidia-cuda | nvidia-cudnn | nvidia-fp16-cuda | nvidia-fp16-cudnn | tensorrt-fp16 | tensorrt-fp32 | cpu | libdnn-cuda | nvidia-cuda | nvidia-cudnn | nvidia-fp16-cuda | nvidia-fp16-cudnn | tensorrt-fp16 | tensorrt-fp32 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
batch size | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 |
repetition | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
0 | 429.404 | 657.945 | 926.569 | 1058.14 | 1293.97 | 1530.73 | 1768.87 | 1983.67 | 43.4558 | 65.6589 | 93.9803 | 120.833 | 147.654 | 179.296 | 208.372 | 233.930 | 32.6553 | 44.5101 | 63.0471 | 82.2302 | 104.912 | 123.494 | 146.433 | 161.290 | 30.9620 | 36.1724 | 49.5168 | 61.3727 | 77.8906 | 90.4233 | 108.288 | 118.699 | 24.7638 | 35.6854 | 54.4630 | 69.3397 | 87.9188 | 103.192 | 128.973 | 138.174 | 18.7560 | 23.4035 | 32.4812 | 41.8005 | 56.1197 | 62.9113 | 83.7968 | 81.3277 | 12.620625 | 16.090150 | 27.783512 | 30.959194 | 34.830140 | 39.653559 | 43.443609 | 47.710660 | 29.353353 | 28.293512 | 40.110348 | 52.178815 | 63.779229 | 71.943542 | 79.590329 | 104.076155 | 726.576 | 1433.93 | 2161.43 | 2872.77 | 3577.71 | 4307.93 | 5008.49 | 5737.78 | 95.8321 | 150.999 | 215.748 | 275.590 | 338.860 | 414.334 | 462.678 | 523.130 | 72.5030 | 114.992 | 158.350 | 203.071 | 248.787 | 298.789 | 342.822 | 383.233 | 55.0224 | 86.4351 | 117.502 | 148.409 | 182.625 | 211.654 | 243.337 | 272.625 | 49.4953 | 91.8936 | 134.170 | 176.348 | 219.665 | 263.124 | 306.364 | 348.909 | 29.6434 | 54.7612 | 80.5735 | 90.6051 | 128.923 | 153.502 | 183.439 | 178.709 | 14.485308 | 24.238514 | 34.025298 | 44.013294 | 52.217868 | 64.078599 | 74.021542 | 83.507063 | 24.654566 | 46.024871 | 68.209123 | 89.629849 | 108.358576 | 132.795089 | 153.733922 | 176.017908 | 253.702 | 502.686 | 756.378 | 1002.24 | 1253.52 | 1504.42 | 1754.85 | 2017.42 | 48.6512 | 88.4245 | 126.186 | 166.935 | 196.310 | 233.976 | 270.913 | 314.305 | 40.5426 | 77.5703 | 103.481 | 133.831 | 166.995 | 195.475 | 225.449 | 255.899 | 34.3870 | 59.6457 | 83.1643 | 108.407 | 134.133 | 156.495 | 185.548 | 211.621 | 27.8536 | 53.2180 | 78.1169 | 104.016 | 128.267 | 155.909 | 179.802 | 206.533 | 20.2856 | 39.2223 | 58.9879 | 75.8102 | 98.1925 | 116.559 | 136.638 | 142.985 | 8.757838 | 15.879676 | 23.154199 | 30.145195 | 37.440718 | 45.281240 | 51.842710 | 58.747601 | 15.774518 | 30.450721 | 45.192555 | 59.673022 | 75.474067 | 90.008849 | 104.018262 | 119.386255 | 136.270 | 270.304 | 408.191 | 536.611 | 669.522 | 808.527 | 937.107 | 1078.19 | 32.2915 | 58.5859 | 76.8803 | 94.8715 | 113.236 | 125.267 | 151.743 | 170.049 | 30.3045 | 53.8181 | 64.2115 | 86.3606 | 100.881 | 122.704 | 138.405 | 156.537 | 24.5005 | 37.7778 | 51.4854 | 65.5229 | 80.5380 | 91.0859 | 109.639 | 119.433 | 18.8125 | 34.7249 | 50.6119 | 66.7613 | 82.8733 | 98.5939 | 114.812 | 140.382 | 14.1476 | 23.4655 | 35.5519 | 40.3187 | 58.0510 | 67.8035 | 81.0224 | 79.5945 | 5.377155 | 9.027626 | 12.251047 | 15.868360 | 18.976411 | 23.086462 | 26.214039 | 30.113932 | 8.265153 | 15.685308 | 22.320673 | 29.065248 | 36.717139 | 43.034188 | 49.826817 | 57.256970 |
1 | 432.713 | 659.096 | 923.892 | 1052.84 | 1279.21 | 1518.82 | 1750.41 | 2001.91 | 45.2094 | 65.1705 | 93.0999 | 118.322 | 147.449 | 177.986 | 208.853 | 232.266 | 34.4193 | 44.6849 | 62.7304 | 81.7072 | 102.048 | 122.632 | 145.012 | 159.418 | 31.2672 | 35.9308 | 47.0779 | 61.7812 | 76.9129 | 91.1816 | 108.049 | 118.891 | 31.3612 | 35.9588 | 52.2726 | 70.8086 | 87.4698 | 102.738 | 129.737 | 136.887 | 18.5886 | 23.6718 | 32.6010 | 41.9060 | 55.6084 | 62.9755 | 83.4863 | 80.8779 | 12.564573 | 15.953150 | 26.235513 | 30.995985 | 34.651298 | 39.588243 | 43.248030 | 47.555765 | 20.505832 | 28.337670 | 40.261453 | 52.142868 | 63.143178 | 72.634436 | 79.380643 | 103.862054 | 721.600 | 1435.08 | 2169.87 | 2895.27 | 3592.95 | 4308.88 | 5067.07 | 5730.08 | 98.0957 | 152.185 | 216.037 | 275.458 | 349.021 | 411.590 | 468.287 | 522.837 | 67.8081 | 113.434 | 158.556 | 204.409 | 248.164 | 294.533 | 339.103 | 386.597 | 58.1552 | 85.6629 | 123.356 | 151.150 | 178.184 | 209.654 | 244.806 | 274.295 | 49.4076 | 93.7518 | 134.373 | 176.455 | 219.154 | 262.079 | 305.265 | 349.635 | 29.6013 | 54.6102 | 80.4847 | 90.4593 | 129.616 | 153.667 | 181.383 | 177.023 | 14.576308 | 24.137461 | 34.076772 | 43.854451 | 52.229500 | 64.333019 | 73.949804 | 83.629536 | 24.706514 | 46.235344 | 68.337544 | 88.754744 | 109.368577 | 132.970562 | 153.970342 | 177.000647 | 255.276 | 504.233 | 752.891 | 1003.51 | 1260.40 | 1505.16 | 1763.89 | 2005.91 | 50.4575 | 86.1704 | 133.083 | 162.627 | 223.921 | 234.237 | 276.680 | 307.690 | 41.2710 | 72.8455 | 104.214 | 133.928 | 166.171 | 194.790 | 224.973 | 256.939 | 36.6022 | 60.9524 | 83.6341 | 110.041 | 132.381 | 156.894 | 183.169 | 206.880 | 27.4926 | 53.3372 | 78.2673 | 103.641 | 128.886 | 156.209 | 179.570 | 204.565 | 20.2637 | 39.0800 | 59.0290 | 72.2732 | 97.8786 | 114.888 | 136.398 | 143.718 | 8.738153 | 15.922834 | 23.167041 | 30.219143 | 37.072928 | 44.407293 | 51.829131 | 58.934180 | 15.716466 | 30.406879 | 44.934661 | 60.149180 | 74.971699 | 89.615848 | 103.730684 | 118.910782 | 136.626 | 273.057 | 404.040 | 538.226 | 672.894 | 807.042 | 944.726 | 1072.82 | 28.6826 | 49.9741 | 67.6147 | 92.3384 | 113.252 | 124.526 | 145.605 | 174.297 | 28.7410 | 55.7377 | 64.3055 | 85.7521 | 104.219 | 120.847 | 140.048 | 157.109 | 25.6961 | 38.3359 | 51.5314 | 64.8517 | 78.0804 | 94.0153 | 107.549 | 118.776 | 19.9246 | 34.9084 | 52.9416 | 66.9742 | 85.6101 | 100.4660 | 116.996 | 131.114 | 12.5772 | 23.5206 | 35.4917 | 40.2868 | 58.1685 | 67.6630 | 81.1855 | 80.5705 | 5.414945 | 9.028205 | 12.367309 | 15.945623 | 18.924622 | 23.001304 | 26.271934 | 30.275458 | 8.238364 | 15.143676 | 22.175621 | 29.047458 | 36.331982 | 43.000188 | 49.500343 | 57.296445 |
2 | 438.301 | 654.448 | 927.562 | 1050.14 | 1294.98 | 1527.61 | 1767.41 | 1982.13 | 45.0862 | 65.9721 | 91.3594 | 118.151 | 149.036 | 177.760 | 208.871 | 235.624 | 32.9734 | 47.2591 | 62.8578 | 81.7499 | 102.393 | 123.046 | 144.910 | 163.115 | 29.6781 | 35.9344 | 48.0640 | 62.2660 | 77.0286 | 90.3517 | 110.727 | 118.251 | 24.3868 | 35.5733 | 52.1966 | 69.1779 | 87.6997 | 103.069 | 128.318 | 138.502 | 18.7945 | 23.4064 | 32.5589 | 41.7362 | 56.0369 | 63.1902 | 84.1145 | 80.9076 | 12.914520 | 15.990307 | 26.120513 | 30.967563 | 34.610877 | 39.469033 | 43.646662 | 47.390291 | 20.473990 | 28.287670 | 40.066032 | 52.086184 | 63.218810 | 71.585332 | 78.854327 | 103.895998 | 721.741 | 1452.44 | 2165.08 | 2882.01 | 3600.86 | 4362.74 | 5023.24 | 5731.63 | 107.8460 | 153.025 | 216.775 | 276.615 | 349.793 | 401.385 | 472.324 | 522.547 | 67.1422 | 113.910 | 159.451 | 203.029 | 248.568 | 300.100 | 347.853 | 385.031 | 55.7363 | 86.0695 | 117.326 | 149.771 | 177.301 | 211.429 | 242.718 | 274.347 | 51.2019 | 91.8544 | 133.610 | 176.885 | 221.270 | 263.535 | 306.503 | 348.834 | 29.6630 | 54.6593 | 80.3993 | 90.5865 | 131.376 | 153.424 | 183.069 | 178.549 | 14.585309 | 24.254303 | 34.172509 | 43.804714 | 52.229815 | 64.391072 | 74.062541 | 83.874746 | 24.812513 | 46.118398 | 68.308860 | 89.036270 | 108.333629 | 132.811038 | 153.733921 | 176.663119 | 252.807 | 504.196 | 755.365 | 1011.79 | 1256.57 | 1505.98 | 1752.51 | 2027.76 | 55.7264 | 89.8054 | 123.676 | 163.530 | 197.104 | 235.000 | 271.152 | 314.576 | 41.3705 | 71.8533 | 107.186 | 136.559 | 163.876 | 197.505 | 227.388 | 257.944 | 35.2385 | 58.9193 | 83.7032 | 112.536 | 134.470 | 158.866 | 185.909 | 205.637 | 27.6478 | 54.8117 | 78.6203 | 103.442 | 128.613 | 154.058 | 181.353 | 204.765 | 20.1769 | 39.1380 | 59.1023 | 71.9235 | 97.8536 | 117.108 | 136.875 | 143.840 | 8.700311 | 15.959466 | 23.222724 | 30.188668 | 37.368454 | 44.368872 | 52.408868 | 58.617443 | 15.731940 | 30.546563 | 45.055346 | 59.989180 | 74.683594 | 89.681428 | 104.923155 | 119.409730 | 137.120 | 270.515 | 406.419 | 540.875 | 671.978 | 804.237 | 938.090 | 1086.13 | 43.3900 | 49.6096 | 69.9393 | 96.1416 | 105.192 | 133.248 | 150.738 | 165.273 | 28.3507 | 49.6884 | 68.0167 | 83.7830 | 102.938 | 119.593 | 138.398 | 159.291 | 25.9034 | 38.8967 | 51.4370 | 64.4683 | 78.3837 | 91.8120 | 109.073 | 129.538 | 18.9201 | 35.1602 | 52.8884 | 67.0700 | 82.6389 | 98.8752 | 114.821 | 131.181 | 12.5294 | 23.4695 | 35.6299 | 40.3271 | 58.0681 | 69.5361 | 80.9841 | 80.4278 | 5.411576 | 8.962101 | 12.304204 | 15.835097 | 19.173148 | 23.013410 | 26.415460 | 30.134406 | 8.144259 | 15.116308 | 22.011515 | 29.676406 | 36.023297 | 42.970767 | 49.993396 | 58.138023 |
df_mean_time_per_batch = df_time.describe().ix['mean'].unstack(level='batch size')
pd.options.display.max_columns = len(df_mean_time_per_batch.columns)
pd.options.display.max_rows = len(df_mean_time_per_batch.index)
df_mean_time_per_batch
batch size | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | |
---|---|---|---|---|---|---|---|---|---|
model | lib | ||||||||
bvlc-alexnet | cpu | 433.472667 | 657.163000 | 926.007667 | 1053.706667 | 1289.386667 | 1525.720000 | 1762.230000 | 1989.236667 |
libdnn-cuda | 44.583800 | 65.600500 | 92.813200 | 119.102000 | 148.046333 | 178.347333 | 208.698667 | 233.940000 | |
nvidia-cuda | 33.349333 | 45.484700 | 62.878433 | 81.895767 | 103.117667 | 123.057333 | 145.451667 | 161.274333 | |
nvidia-cudnn | 30.635767 | 36.012533 | 48.219567 | 61.806633 | 77.277367 | 90.652200 | 109.021333 | 118.613667 | |
nvidia-fp16-cuda | 26.837267 | 35.739167 | 52.977400 | 69.775400 | 87.696100 | 102.999667 | 129.009333 | 137.854333 | |
nvidia-fp16-cudnn | 18.713033 | 23.493900 | 32.547033 | 41.814233 | 55.921667 | 63.025667 | 83.799200 | 81.037733 | |
tensorrt-fp16 | 12.699906 | 16.011202 | 26.713179 | 30.974247 | 34.697438 | 39.570278 | 43.446100 | 47.552239 | |
tensorrt-fp32 | 23.444392 | 28.306284 | 40.145944 | 52.135956 | 63.380406 | 72.054437 | 79.275100 | 103.944736 | |
bvlc-googlenet | cpu | 723.305667 | 1440.483333 | 2165.460000 | 2883.350000 | 3590.506667 | 4326.516667 | 5032.933333 | 5733.163333 |
libdnn-cuda | 100.591267 | 152.069667 | 216.186667 | 275.887667 | 345.891333 | 409.103000 | 467.763000 | 522.838000 | |
nvidia-cuda | 69.151100 | 114.112000 | 158.785667 | 203.503000 | 248.506333 | 297.807333 | 343.259333 | 384.953667 | |
nvidia-cudnn | 56.304633 | 86.055833 | 119.394667 | 149.776667 | 179.370000 | 210.912333 | 243.620333 | 273.755667 | |
nvidia-fp16-cuda | 50.034933 | 92.499933 | 134.051000 | 176.562667 | 220.029667 | 262.912667 | 306.044000 | 349.126000 | |
nvidia-fp16-cudnn | 29.635900 | 54.676900 | 80.485833 | 90.550300 | 129.971667 | 153.531000 | 182.630333 | 178.093667 | |
tensorrt-fp16 | 14.548975 | 24.210093 | 34.091526 | 43.890820 | 52.225728 | 64.267563 | 74.011296 | 83.670448 | |
tensorrt-fp32 | 24.724531 | 46.126204 | 68.285176 | 89.140288 | 108.686927 | 132.858896 | 153.812728 | 176.560558 | |
deepscale-squeezenet-1.0 | cpu | 253.928333 | 503.705000 | 754.878000 | 1005.846667 | 1256.830000 | 1505.186667 | 1757.083333 | 2017.030000 |
libdnn-cuda | 51.611700 | 88.133433 | 127.648333 | 164.364000 | 205.778333 | 234.404333 | 272.915000 | 312.190333 | |
nvidia-cuda | 41.061367 | 74.089700 | 104.960333 | 134.772667 | 165.680667 | 195.923333 | 225.936667 | 256.927333 | |
nvidia-cudnn | 35.409233 | 59.839133 | 83.500533 | 110.328000 | 133.661333 | 157.418333 | 184.875333 | 208.046000 | |
nvidia-fp16-cuda | 27.664667 | 53.788967 | 78.334833 | 103.699667 | 128.588667 | 155.392000 | 180.241667 | 205.287667 | |
nvidia-fp16-cudnn | 20.242067 | 39.146767 | 59.039733 | 73.335633 | 97.974900 | 116.185000 | 136.637000 | 143.514333 | |
tensorrt-fp16 | 8.732101 | 15.920659 | 23.181321 | 30.184335 | 37.294033 | 44.685802 | 52.026903 | 58.766408 | |
tensorrt-fp32 | 15.740975 | 30.468054 | 45.060854 | 59.937127 | 75.043120 | 89.768708 | 104.224034 | 119.235589 | |
deepscale-squeezenet-1.1 | cpu | 136.672000 | 271.292000 | 406.216667 | 538.570667 | 671.464667 | 806.602000 | 939.974333 | 1079.046667 |
libdnn-cuda | 34.788033 | 52.723200 | 71.478100 | 94.450500 | 110.560000 | 127.680333 | 149.362000 | 169.873000 | |
nvidia-cuda | 29.132067 | 53.081400 | 65.511233 | 85.298567 | 102.679333 | 121.048000 | 138.950333 | 157.645667 | |
nvidia-cudnn | 25.366667 | 38.336800 | 51.484600 | 64.947633 | 79.000700 | 92.304400 | 108.753667 | 122.582333 | |
nvidia-fp16-cuda | 19.219067 | 34.931167 | 52.147300 | 66.935167 | 83.707433 | 99.311700 | 115.543000 | 134.225667 | |
nvidia-fp16-cudnn | 13.084733 | 23.485200 | 35.557833 | 40.310867 | 58.095867 | 68.334200 | 81.064000 | 80.197600 | |
tensorrt-fp16 | 5.401225 | 9.005977 | 12.307520 | 15.883027 | 19.024727 | 23.033725 | 26.300478 | 30.174599 | |
tensorrt-fp32 | 8.215925 | 15.315097 | 22.169270 | 29.263037 | 36.357473 | 43.001714 | 49.773519 | 57.563813 |
batch_sizes = df_mean_time_per_batch.columns.tolist()
# batch_sizes
df_mean_time_per_image = df_mean_time_per_batch / batch_sizes
pd.options.display.max_columns = len(df_mean_time_per_image.columns)
pd.options.display.max_rows = len(df_mean_time_per_image.index)
df_mean_time_per_image
batch size | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | |
---|---|---|---|---|---|---|---|---|---|
model | lib | ||||||||
bvlc-alexnet | cpu | 216.736333 | 164.290750 | 154.334611 | 131.713333 | 128.938667 | 127.143333 | 125.873571 | 124.327292 |
libdnn-cuda | 22.291900 | 16.400125 | 15.468867 | 14.887750 | 14.804633 | 14.862278 | 14.907048 | 14.621250 | |
nvidia-cuda | 16.674667 | 11.371175 | 10.479739 | 10.236971 | 10.311767 | 10.254778 | 10.389405 | 10.079646 | |
nvidia-cudnn | 15.317883 | 9.003133 | 8.036594 | 7.725829 | 7.727737 | 7.554350 | 7.787238 | 7.413354 | |
nvidia-fp16-cuda | 13.418633 | 8.934792 | 8.829567 | 8.721925 | 8.769610 | 8.583306 | 9.214952 | 8.615896 | |
nvidia-fp16-cudnn | 9.356517 | 5.873475 | 5.424506 | 5.226779 | 5.592167 | 5.252139 | 5.985657 | 5.064858 | |
tensorrt-fp16 | 6.349953 | 4.002801 | 4.452197 | 3.871781 | 3.469744 | 3.297523 | 3.103293 | 2.972015 | |
tensorrt-fp32 | 11.722196 | 7.076571 | 6.690991 | 6.516994 | 6.338041 | 6.004536 | 5.662507 | 6.496546 | |
bvlc-googlenet | cpu | 361.652833 | 360.120833 | 360.910000 | 360.418750 | 359.050667 | 360.543056 | 359.495238 | 358.322708 |
libdnn-cuda | 50.295633 | 38.017417 | 36.031111 | 34.485958 | 34.589133 | 34.091917 | 33.411643 | 32.677375 | |
nvidia-cuda | 34.575550 | 28.528000 | 26.464278 | 25.437875 | 24.850633 | 24.817278 | 24.518524 | 24.059604 | |
nvidia-cudnn | 28.152317 | 21.513958 | 19.899111 | 18.722083 | 17.937000 | 17.576028 | 17.401452 | 17.109729 | |
nvidia-fp16-cuda | 25.017467 | 23.124983 | 22.341833 | 22.070333 | 22.002967 | 21.909389 | 21.860286 | 21.820375 | |
nvidia-fp16-cudnn | 14.817950 | 13.669225 | 13.414306 | 11.318787 | 12.997167 | 12.794250 | 13.045024 | 11.130854 | |
tensorrt-fp16 | 7.274488 | 6.052523 | 5.681921 | 5.486352 | 5.222573 | 5.355630 | 5.286521 | 5.229403 | |
tensorrt-fp32 | 12.362265 | 11.531551 | 11.380863 | 11.142536 | 10.868693 | 11.071575 | 10.986623 | 11.035035 | |
deepscale-squeezenet-1.0 | cpu | 126.964167 | 125.926250 | 125.813000 | 125.730833 | 125.683000 | 125.432222 | 125.505952 | 126.064375 |
libdnn-cuda | 25.805850 | 22.033358 | 21.274722 | 20.545500 | 20.577833 | 19.533694 | 19.493929 | 19.511896 | |
nvidia-cuda | 20.530683 | 18.522425 | 17.493389 | 16.846583 | 16.568067 | 16.326944 | 16.138333 | 16.057958 | |
nvidia-cudnn | 17.704617 | 14.959783 | 13.916756 | 13.791000 | 13.366133 | 13.118194 | 13.205381 | 13.002875 | |
nvidia-fp16-cuda | 13.832333 | 13.447242 | 13.055806 | 12.962458 | 12.858867 | 12.949333 | 12.874405 | 12.830479 | |
nvidia-fp16-cudnn | 10.121033 | 9.786692 | 9.839956 | 9.166954 | 9.797490 | 9.682083 | 9.759786 | 8.969646 | |
tensorrt-fp16 | 4.366050 | 3.980165 | 3.863554 | 3.773042 | 3.729403 | 3.723817 | 3.716207 | 3.672901 | |
tensorrt-fp32 | 7.870487 | 7.617014 | 7.510142 | 7.492141 | 7.504312 | 7.480726 | 7.444574 | 7.452224 | |
deepscale-squeezenet-1.1 | cpu | 68.336000 | 67.823000 | 67.702778 | 67.321333 | 67.146467 | 67.216833 | 67.141024 | 67.440417 |
libdnn-cuda | 17.394017 | 13.180800 | 11.913017 | 11.806312 | 11.056000 | 10.640028 | 10.668714 | 10.617063 | |
nvidia-cuda | 14.566033 | 13.270350 | 10.918539 | 10.662321 | 10.267933 | 10.087333 | 9.925024 | 9.852854 | |
nvidia-cudnn | 12.683333 | 9.584200 | 8.580767 | 8.118454 | 7.900070 | 7.692033 | 7.768119 | 7.661396 | |
nvidia-fp16-cuda | 9.609533 | 8.732792 | 8.691217 | 8.366896 | 8.370743 | 8.275975 | 8.253071 | 8.389104 | |
nvidia-fp16-cudnn | 6.542367 | 5.871300 | 5.926306 | 5.038858 | 5.809587 | 5.694517 | 5.790286 | 5.012350 | |
tensorrt-fp16 | 2.700613 | 2.251494 | 2.051253 | 1.985378 | 1.902473 | 1.919477 | 1.878606 | 1.885912 | |
tensorrt-fp32 | 4.107963 | 3.828774 | 3.694878 | 3.657880 | 3.635747 | 3.583476 | 3.555251 | 3.597738 |
df_mean_time_per_image.min(axis=1)
model                     lib
bvlc-alexnet              cpu                  124.327292
                          libdnn-cuda           14.621250
                          nvidia-cuda           10.079646
                          nvidia-cudnn           7.413354
                          nvidia-fp16-cuda       8.583306
                          nvidia-fp16-cudnn      5.064858
                          tensorrt-fp16          2.972015
                          tensorrt-fp32          5.662507
bvlc-googlenet            cpu                  358.322708
                          libdnn-cuda           32.677375
                          nvidia-cuda           24.059604
                          nvidia-cudnn          17.109729
                          nvidia-fp16-cuda      21.820375
                          nvidia-fp16-cudnn     11.130854
                          tensorrt-fp16          5.222573
                          tensorrt-fp32         10.868693
deepscale-squeezenet-1.0  cpu                  125.432222
                          libdnn-cuda           19.493929
                          nvidia-cuda           16.057958
                          nvidia-cudnn          13.002875
                          nvidia-fp16-cuda      12.830479
                          nvidia-fp16-cudnn      8.969646
                          tensorrt-fp16          3.672901
                          tensorrt-fp32          7.444574
deepscale-squeezenet-1.1  cpu                   67.141024
                          libdnn-cuda           10.617063
                          nvidia-cuda            9.852854
                          nvidia-cudnn           7.661396
                          nvidia-fp16-cuda       8.253071
                          nvidia-fp16-cudnn      5.012350
                          tensorrt-fp16          1.878606
                          tensorrt-fp32          3.555251
dtype: float64
plot_max_num_images_per_second(df_mean_time_per_image, libs_to_drop=[], fontsize=14)
# What is the batch size that gives the minimum time per image (or the maximum number of images per second)?
df_mean_time_per_image.idxmin(axis=1)
model                     lib
bvlc-alexnet              cpu                  16
                          libdnn-cuda          16
                          nvidia-cuda          16
                          nvidia-cudnn         16
                          nvidia-fp16-cuda     12
                          nvidia-fp16-cudnn    16
                          tensorrt-fp16        16
                          tensorrt-fp32        14
bvlc-googlenet            cpu                  16
                          libdnn-cuda          16
                          nvidia-cuda          16
                          nvidia-cudnn         16
                          nvidia-fp16-cuda     16
                          nvidia-fp16-cudnn    16
                          tensorrt-fp16        10
                          tensorrt-fp32        10
deepscale-squeezenet-1.0  cpu                  12
                          libdnn-cuda          14
                          nvidia-cuda          16
                          nvidia-cudnn         16
                          nvidia-fp16-cuda     16
                          nvidia-fp16-cudnn    16
                          tensorrt-fp16        16
                          tensorrt-fp32        14
deepscale-squeezenet-1.1  cpu                  14
                          libdnn-cuda          16
                          nvidia-cuda          16
                          nvidia-cudnn         16
                          nvidia-fp16-cuda     14
                          nvidia-fp16-cudnn    16
                          tensorrt-fp16        14
                          tensorrt-fp32        14
dtype: int64
# Focus on e.g. nvidia-fp16-cuda, for which the batch size of 16 is not always the best.
df_mean_time_per_image.idxmin(axis=1).reorder_levels(['lib', 'model']).loc['nvidia-fp16-cuda']
model
bvlc-alexnet                12
bvlc-googlenet              16
deepscale-squeezenet-1.0    16
deepscale-squeezenet-1.1    14
dtype: int64
# # Is the same answer as via .min(axis=1).values?
# df_mean_time_per_image.lookup(df_mean_time_per_image.index, df_mean_time_per_image.idxmin(axis=1)) \
# == df_mean_time_per_image.min(axis=1).values
df_time_per_image = df_time / (batch_sizes*(len(df_time.columns)/len(batch_sizes)))
df_min_time_per_image_index = pd.DataFrame(df_mean_time_per_image.idxmin(axis=1)).set_index(0, append=True).index.values
df_model_lib = df_time_per_image[df_min_time_per_image_index] \
.stack(['model', 'lib']).reorder_levels(['model','lib','repetition']).sum(axis=1)
df_model_lib_mean = df_model_lib.groupby(level=['model', 'lib']).mean()
df_model_lib_std = df_model_lib.groupby(level=['model', 'lib']).std()
# Zero out (effectively infinite) mean times above 1e5 ms, so that
# failed or missing configurations do not distort the plots.
zero_positive_infinity = df_model_lib_mean > 1e5
df_model_lib_mean[zero_positive_infinity] = 0
# exclude_positive_infinity = df_model_lib_mean < 1e6
# df_model_lib_mean = df_model_lib_mean[exclude_positive_infinity]
# df_model_lib_std = df_model_lib_std[exclude_positive_infinity]
mean = df_model_lib_mean.unstack('lib')
std = df_model_lib_std.unstack('lib')
plot(mean, std)
<matplotlib.axes._subplots.AxesSubplot at 0xabb69f70>
mean = df_model_lib_mean.unstack('lib').drop('cpu', axis=1)
std = df_model_lib_std.unstack('lib').drop('cpu', axis=1)
plot(mean, std)
<matplotlib.axes._subplots.AxesSubplot at 0xa7c7ed50>
cuda_level_performance = ['nvidia-cuda', 'nvidia-cudnn', 'libdnn-cuda']
mean = df_model_lib_mean.reorder_levels(['lib', 'model'])[cuda_level_performance].unstack('lib')
std = df_model_lib_std.reorder_levels(['lib', 'model'])[cuda_level_performance].unstack('lib')
plot(mean, std)
<matplotlib.axes._subplots.AxesSubplot at 0xa5e8d970>
cublas_libs = ['nvidia-cuda', 'nvidia-fp16-cuda']
mean = df_model_lib_mean.reorder_levels(['lib', 'model'])[cublas_libs].unstack('lib')
std = df_model_lib_std.reorder_levels(['lib', 'model'])[cublas_libs].unstack('lib')
plot(mean, std)
<matplotlib.axes._subplots.AxesSubplot at 0xa562abd0>
# With cuBLAS, NVIDIA's fp16 branch is up to 20% faster than NVIDIA's fp32 mainline.
nvidia_fp16_cuda_vs_nvidia_fp32_cuda = mean['nvidia-fp16-cuda'] / mean['nvidia-cuda']
nvidia_fp16_cuda_vs_nvidia_fp32_cuda
model
bvlc-alexnet                0.851548
bvlc-googlenet              0.906930
deepscale-squeezenet-1.0    0.799011
deepscale-squeezenet-1.1    0.837633
dtype: float64
cudnn_libs = ['nvidia-cudnn', 'nvidia-fp16-cudnn']
mean = df_model_lib_mean.reorder_levels(['lib', 'model'])[cudnn_libs].unstack('lib')
std = df_model_lib_std.reorder_levels(['lib', 'model'])[cudnn_libs].unstack('lib')
plot(mean, std)
<matplotlib.axes._subplots.AxesSubplot at 0xa5665f90>
# With cuDNN, NVIDIA's fp16 branch is up to 35% (roughly one third) faster than NVIDIA's fp32 mainline.
nvidia_fp16_cudnn_vs_nvidia_fp32_cudnn = mean['nvidia-fp16-cudnn'] / mean['nvidia-cudnn']
nvidia_fp16_cudnn_vs_nvidia_fp32_cudnn
model
bvlc-alexnet                0.683207
bvlc-googlenet              0.650557
deepscale-squeezenet-1.0    0.689820
deepscale-squeezenet-1.1    0.654235
dtype: float64
libs = [ 'nvidia-cudnn', 'nvidia-fp16-cudnn', 'tensorrt-fp32', 'tensorrt-fp16' ]
mean = df_model_lib_mean.reorder_levels(['lib', 'model'])[libs].unstack('lib')
std = df_model_lib_std.reorder_levels(['lib', 'model'])[libs].unstack('lib')
plot(mean, std)
<matplotlib.axes._subplots.AxesSubplot at 0xa59cc550>
# TensorRT using fp16 is roughly twice as fast as TensorRT using fp32.
tensorrt_fp16_vs_tensorrt_fp32 = mean['tensorrt-fp16'] / mean['tensorrt-fp32']
tensorrt_fp16_vs_tensorrt_fp32
model
bvlc-alexnet                0.524858
bvlc-googlenet              0.480515
deepscale-squeezenet-1.0    0.493366
deepscale-squeezenet-1.1    0.528403
dtype: float64
# TensorRT using fp32 is up to 54% faster than cuDNN using fp32.
tensorrt_fp32_vs_cudnn_fp32 = mean['tensorrt-fp32'] / mean['nvidia-cudnn']
tensorrt_fp32_vs_cudnn_fp32
model
bvlc-alexnet                0.763825
bvlc-googlenet              0.635235
deepscale-squeezenet-1.0    0.572533
deepscale-squeezenet-1.1    0.464047
dtype: float64
# TensorRT using fp16 is up to 63% faster than cuDNN using fp16.
tensorrt_fp16_vs_cudnn_fp16 = mean['tensorrt-fp16'] / mean['nvidia-fp16-cudnn']
tensorrt_fp16_vs_cudnn_fp16
model
bvlc-alexnet                0.586791
bvlc-googlenet              0.469198
deepscale-squeezenet-1.0    0.409481
deepscale-squeezenet-1.1    0.374795
dtype: float64
# TensorRT using fp16 is up to 4 times faster than cuDNN using fp32.
tensorrt_fp16_vs_cudnn_fp32 = mean['tensorrt-fp16'] / mean['nvidia-cudnn']
tensorrt_fp16_vs_cudnn_fp32
model
bvlc-alexnet                0.400900
bvlc-googlenet              0.305240
deepscale-squeezenet-1.0    0.282468
deepscale-squeezenet-1.1    0.245204
dtype: float64
mean = df_model_lib_mean.unstack('model')
std = df_model_lib_std.unstack('model')
plot(mean, std, rot=10)
<matplotlib.axes._subplots.AxesSubplot at 0xa9bc7450>
mean = df_model_lib_mean.unstack('model').drop('cpu', axis=0)
std = df_model_lib_std.unstack('model').drop('cpu', axis=0)
plot(mean, std, rot=10)
<matplotlib.axes._subplots.AxesSubplot at 0xa572dc70>
alexnet_level_accuracy = ['bvlc-alexnet','deepscale-squeezenet-1.0','deepscale-squeezenet-1.1']
# On this platform with all the libraries, SqueezeNet 1.0 is always slower than AlexNet
# despite a 50x reduction in weights (5 MB vs. 250 MB).
mean = df_model_lib_mean[alexnet_level_accuracy].unstack('model')
std = df_model_lib_std[alexnet_level_accuracy].unstack('model')
plot(mean, std, rot=10)
<matplotlib.axes._subplots.AxesSubplot at 0xa7eaf230>
# SqueezeNet 1.1 is 46% faster than AlexNet with OpenBLAS (on the CPU).
mean = df_model_lib_mean[alexnet_level_accuracy].unstack('model').ix[['cpu']]
std = df_model_lib_std[alexnet_level_accuracy].unstack('model').ix[['cpu']]
plot(mean, std)
<matplotlib.axes._subplots.AxesSubplot at 0xa4e82e50>
mean['deepscale-squeezenet-1.1'] / mean['bvlc-alexnet']
lib
cpu    0.540034
dtype: float64
# SqueezeNet 1.0 is slower than AlexNet. SqueezeNet 1.1 is 28% faster than AlexNet with
# libDNN-CUDA, and roughly equivalent to AlexNet with cuBLAS and cuDNN.
mean = df_model_lib_mean[alexnet_level_accuracy].unstack('model').ix[cuda_level_performance]
std = df_model_lib_std[alexnet_level_accuracy].unstack('model').ix[cuda_level_performance]
plot(mean, std)
<matplotlib.axes._subplots.AxesSubplot at 0xa4e45430>
mean['deepscale-squeezenet-1.1'] / mean['bvlc-alexnet']
lib
nvidia-cuda     0.977500
nvidia-cudnn    1.033459
libdnn-cuda     0.726139
dtype: float64
# SqueezeNet 1.1 achieves > 500 inferences per second with TensorRT using fp16.
mean = df_model_lib_mean[alexnet_level_accuracy].unstack('model').ix[['nvidia-fp16-cudnn', 'tensorrt-fp16']]
std = df_model_lib_std[alexnet_level_accuracy].unstack('model').ix[['nvidia-fp16-cudnn', 'tensorrt-fp16']]
plot(mean, std)
<matplotlib.axes._subplots.AxesSubplot at 0xa4c1e1d0>
mean['deepscale-squeezenet-1.1'] / mean['bvlc-alexnet']
lib
nvidia-fp16-cudnn    0.989633
tensorrt-fp16        0.632098
dtype: float64
# GoogleNet achieves nearly 200 inferences per second with TensorRT using fp16.
mean = df_model_lib_mean[['bvlc-googlenet']].unstack('model').ix[['nvidia-fp16-cudnn', 'tensorrt-fp16']]
std = df_model_lib_std[['bvlc-googlenet']].unstack('model').ix[['nvidia-fp16-cudnn', 'tensorrt-fp16']]
plot(mean, std)
<matplotlib.axes._subplots.AxesSubplot at 0xa498feb0>
# TensorRT using fp16 is more than twice as fast as cuDNN using fp16.
mean.ix['tensorrt-fp16'] / mean.ix['nvidia-fp16-cudnn']
model
bvlc-googlenet    0.469198
dtype: float64
images_per_second(mean['bvlc-googlenet'])
lib
nvidia-fp16-cudnn     89.840365
tensorrt-fp16        191.476509
Name: bvlc-googlenet, dtype: float64
# Our results (with the batch size of 16) are very close to NVIDIA's results (with the batch size of 64).
Image(url="https://devblogs.nvidia.com/parallelforall/wp-content/uploads/2016/09/Figure_2-1.png")
df_per_layer_info = get_per_layer_info(df_all)
# pd.options.display.max_columns = len(df_per_layer_info.columns)
# pd.options.display.max_rows = len(df_per_layer_info.index)
# df_per_layer_info
# Plot for a list of batch sizes.
# NB: This suggests that the fully connected layers benefit the most from larger batch sizes.
plot_time_per_image_per_layer(df_per_layer_info, model='bvlc-alexnet', libs='nvidia-cuda',
batch_sizes=[2, 8, 16], direction=direction)
# Plot for a list of batch sizes. Only plot layers that consume at least 10% of the total execution time.
plot_time_per_image_per_layer(df_per_layer_info, model='bvlc-alexnet', libs='nvidia-cudnn',
batch_sizes=[8, 16], direction=direction, lower=0.10, rot=0)
# Plot for a list of libs.
# NB: cuDNN and cuBLAS perform about the same on the fully connected layers (which suggests that
# cuDNN falls back to cuBLAS for these).
# Unsurprisingly, cuDNN performs better than cuBLAS on the convolution layers.
# Surprisingly, cuBLAS performs a bit better than cuDNN on the relu layers.
plot_time_per_image_per_layer(df_per_layer_info, model='bvlc-alexnet', libs=['nvidia-cuda','nvidia-cudnn'],
batch_sizes=16, direction=direction)
# Plot for a list of libs.
# NB: This suggests that libDNN is faster than cuDNN on the conv1 and expand1x1 layers, but slower on the squeeze1x1,
# expand3x3, conv/pool10 layers. (Recall that libDNN is not yet tuned for TX1 but uses parameters optimal for GTX 1080.)
plot_time_per_image_per_layer(df_per_layer_info, model='deepscale-squeezenet-1.1', libs=['nvidia-cudnn', 'libdnn-cuda'],
batch_sizes=16, direction=direction, ymax=0.65)
# Plot for a list of libs. Only plot layers that consume between 5% and 10% of the total execution time.
# NB: libDNN is slower than cuDNN on the expand3x3 layers and conv10 layers, but a bit faster on the conv1 layer.
plot_time_per_image_per_layer(df_per_layer_info, model='deepscale-squeezenet-1.1', libs=['nvidia-cudnn', 'libdnn-cuda'],
batch_sizes=16, direction=direction, lower=0.05, upper=0.10, rot=10)
# Plot for a list of libs and a list of batch sizes. (This works but might not be terribly legible).
plot_time_per_image_per_layer(df_per_layer_info, model='bvlc-alexnet', libs=['nvidia-cudnn', 'nvidia-cuda'],
batch_sizes=[4,6], direction=direction)
Overall, using cuDNN typically results in the minimum execution time. For some layers, however, other libraries may outperform cuDNN (e.g. libDNN from the OpenCL branch of Caffe). As we show below, selecting the best performing library per layer reduces the execution time by up to 17% compared with using cuDNN alone. For other models and on other platforms, such adaptation can potentially result in even higher savings (e.g. up to 22% on the GTX 1080).
NB: Currently, the savings are hypothetical. However, Caffe allows for manual adaptation: the user can specify the engine to use for each layer in the model file (*.prototxt). We are working on generating the optimized model file automatically from the obtained ideal adaptive solution. Please contact us if you are interested.
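The per-layer selection can be sketched as picking, for each layer, the library with the minimum mean execution time. This is a minimal illustration with made-up per-layer timings (the layer names and numbers below are not measurements; the notebook's actual get_ideal_adaptive_solution() operates on the real df_per_layer_info):

```python
import pandas as pd

# Hypothetical per-layer mean times in ms (illustrative values only).
per_layer_ms = pd.DataFrame({
    'nvidia-cudnn': {'conv1': 0.80, 'fc6': 1.20, 'relu1': 0.05},
    'libdnn-cuda':  {'conv1': 0.70, 'fc6': 1.50, 'relu1': 0.06},
    'nvidia-cuda':  {'conv1': 0.95, 'fc6': 1.10, 'relu1': 0.04},
})
per_layer_ms.index.name = 'layer'

best_lib = per_layer_ms.idxmin(axis=1)         # best library per layer
ideal_ms = per_layer_ms.min(axis=1).sum()      # ideal adaptive total time
cudnn_ms = per_layer_ms['nvidia-cudnn'].sum()  # best single-library time here
print(best_lib.to_dict())
print('ideal / cuDNN: %.3f' % (ideal_ms / cudnn_ms))
```

With these illustrative numbers, the adaptive solution spends conv1 on libDNN and fc6/relu1 on cuBLAS, beating cuDNN alone by roughly 10%.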
all_libs = df_per_layer_info.index.get_level_values('lib').drop_duplicates() \
.drop(['nvidia-fp16-cuda', 'nvidia-fp16-cudnn', 'tensorrt-fp16', 'tensorrt-fp32'])
all_libs
Index([u'cpu', u'libdnn-cuda', u'nvidia-cuda', u'nvidia-cudnn'], dtype='object', name=u'lib')
Each row specifies an ideal adaptive solution for a model. Each column specifies the execution time (in ms per image) that the ideal adaptive solution would cumulatively spend using a particular library.
df_ideal_all = get_ideal_adaptive_solution(df_per_layer_info, all_libs, direction)
df_ideal_all
lib | cpu | libdnn-cuda | nvidia-cuda | nvidia-cudnn |
---|---|---|---|---|
model | ||||
deepscale-squeezenet-1.1 | 0.002339 | 0.825522 | 2.406508 | 3.063189 |
bvlc-alexnet | 0.000500 | 1.656031 | 0.616434 | 4.804996 |
deepscale-squeezenet-1.0 | 0.002098 | 0.861213 | 3.959106 | 6.824531 |
bvlc-googlenet | 0.003723 | 0.166052 | 4.076741 | 11.693360 |
plot_ideal_adaptive_solution(df_ideal_all, df_model_lib_mean)
# Up to 17% execution time reduction compared to the best non-adaptive solution (i.e. cuDNN).
df_best_lib = df_model_lib_mean.reorder_levels(['lib', 'model'])[cuda_level_performance].unstack('lib')
df_ideal_all.sum(axis=1) / df_best_lib.min(axis=1)
model
bvlc-alexnet                0.954758
bvlc-googlenet              0.931626
deepscale-squeezenet-1.0    0.895721
deepscale-squeezenet-1.1    0.821986
dtype: float64
df_ideal_cuda = get_ideal_adaptive_solution(df_per_layer_info, cuda_level_performance, direction)
df_ideal_cuda
lib | nvidia-cuda | nvidia-cudnn | libdnn-cuda |
---|---|---|---|
model | |||
deepscale-squeezenet-1.1 | 2.414011 | 3.066857 | 0.825522 |
bvlc-alexnet | 0.616434 | 4.806335 | 1.656031 |
deepscale-squeezenet-1.0 | 3.962985 | 6.830830 | 0.861213 |
bvlc-googlenet | 4.077942 | 11.704643 | 0.166052 |
plot_ideal_adaptive_solution(df_ideal_cuda, df_model_lib_mean)
# Hypothetical execution time reduction compared to the best non-adaptive solution (i.e. cuDNN).
df_best_lib = df_model_lib_mean.reorder_levels(['lib', 'model'])[cuda_level_performance].unstack('lib')
df_ideal_cuda.sum(axis=1) / df_best_lib.min(axis=1)
model
bvlc-alexnet                0.954871
bvlc-googlenet              0.932139
deepscale-squeezenet-1.0    0.896342
deepscale-squeezenet-1.1    0.823139
dtype: float64
# Up to 0.1% worse performance when using the CUDA-level performance libs only.
df_ideal_cuda.sum(axis=1) / df_ideal_all.sum(axis=1)
model
deepscale-squeezenet-1.1    1.001402
bvlc-alexnet                1.000119
deepscale-squeezenet-1.0    1.000694
bvlc-googlenet              1.000550
dtype: float64
df_ideal_cudnn_cublas = get_ideal_adaptive_solution(df_per_layer_info, ['nvidia-cudnn', 'nvidia-cuda'], direction)
df_ideal_cudnn_cublas
lib | nvidia-cudnn | nvidia-cuda |
---|---|---|
model | ||
deepscale-squeezenet-1.1 | 4.323794 | 2.474317 |
bvlc-alexnet | 6.469998 | 0.616434 |
deepscale-squeezenet-1.0 | 8.072759 | 3.962985 |
bvlc-googlenet | 11.917162 | 4.077942 |
plot_ideal_adaptive_solution(df_ideal_cudnn_cublas, df_model_lib_mean)
# Hypothetical execution time reduction compared to the best non-adaptive solution (i.e. cuDNN).
df_best_lib = df_model_lib_mean.reorder_levels(['lib', 'model'])[cuda_level_performance].unstack('lib')
df_ideal_cudnn_cublas.sum(axis=1) / df_best_lib.min(axis=1)
model
bvlc-alexnet                0.955901
bvlc-googlenet              0.934854
deepscale-squeezenet-1.0    0.925622
deepscale-squeezenet-1.1    0.887320
dtype: float64
# Up to 8% worse performance when using cuDNN+cuBLAS only.
df_ideal_cudnn_cublas.sum(axis=1) / df_ideal_all.sum(axis=1)
model
deepscale-squeezenet-1.1    1.079484
bvlc-alexnet                1.001197
deepscale-squeezenet-1.0    1.033382
bvlc-googlenet              1.003465
dtype: float64
df_ideal_cudnn_libdnn = get_ideal_adaptive_solution(df_per_layer_info, ['nvidia-cudnn', 'libdnn-cuda'], direction)
df_ideal_cudnn_libdnn
lib | nvidia-cudnn | libdnn-cuda |
---|---|---|
model | ||
deepscale-squeezenet-1.1 | 4.672342 | 1.735223 |
bvlc-alexnet | 5.180496 | 1.925741 |
deepscale-squeezenet-1.0 | 9.148563 | 2.605430 |
bvlc-googlenet | 14.880228 | 1.208790 |
plot_ideal_adaptive_solution(df_ideal_cudnn_libdnn, df_model_lib_mean)
# Hypothetical execution time reduction compared to the best non-adaptive solution (i.e. cuDNN).
df_best_lib = df_model_lib_mean.reorder_levels(['lib', 'model'])[cuda_level_performance].unstack('lib')
df_ideal_cudnn_libdnn.sum(axis=1) / df_best_lib.min(axis=1)
model
bvlc-alexnet                0.958573
bvlc-googlenet              0.940343
deepscale-squeezenet-1.0    0.903953
deepscale-squeezenet-1.1    0.836344
dtype: float64
# Less than 2% worse performance when using cuDNN+libDNN only.
df_ideal_cudnn_libdnn.sum(axis=1) / df_ideal_all.sum(axis=1)
model
deepscale-squeezenet-1.1    1.017468
bvlc-alexnet                1.003995
deepscale-squeezenet-1.0    1.009191
bvlc-googlenet              1.009356
dtype: float64
df_memory = df_all['memory (MB)']
# Batch size of 4; repetition 0 (should always be available).
df_memory = df_memory.unstack(['model','lib']).loc[4].loc[0].unstack('lib')
plot(mean=df_memory, std=pd.DataFrame(), ylabel='Memory size (MB)')
<matplotlib.axes._subplots.AxesSubplot at 0xa3a88e30>
The above, however, does not tell the full story. The memory consumption, as reported by Caffe, increases linearly with the batch size; in other words, the memory consumption per image is constant. (Note that extra memory may be required, e.g. for GPU buffers in host memory.)
The execution time per image, however, decreases asymptotically with the batch size. Since minimizing the execution time should almost always be balanced against minimizing the memory consumption, we should select the smallest batch size that results in "good enough" performance.
We give several examples below. Note that the execution time per batch is omitted to make the execution time per image more pronounced.
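One way to make "good enough" concrete is to pick the smallest batch size whose time per image falls within some tolerance of the best observed time per image. The timings and the 10% tolerance below are illustrative choices, not taken from the measurements:

```python
# Hypothetical milliseconds per image, keyed by batch size (illustrative).
ms_per_image = {2: 12.7, 4: 9.0, 6: 8.0, 8: 7.7, 10: 7.7, 12: 7.6, 14: 7.8, 16: 7.4}

best = min(ms_per_image.values())
# Smallest batch size within 10% of the best per-image time.
good_enough = min(b for b, t in ms_per_image.items() if t <= 1.10 * best)
print(good_enough)
```

With these numbers, a batch size of 6 already comes within 10% of the best per-image time, at a fraction of the memory required by a batch size of 16.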
# Is the batch size of 8 "good enough"?
plot_time_per_image_and_memory_consumption(df_all, 'bvlc-alexnet', 'nvidia-cudnn')
# Is the batch size of 2 "good enough"?
plot_time_per_image_and_memory_consumption(df_all, 'deepscale-squeezenet-1.1', 'cpu')
# Is the batch size of 10 "good enough"?
plot_time_per_image_and_memory_consumption(df_all, 'bvlc-googlenet', 'tensorrt-fp32')
# Is the batch size of 8 "good enough" (below 2 ms per image)?
plot_time_per_image_and_memory_consumption(df_all, 'deepscale-squeezenet-1.1', 'tensorrt-fp16')
# SqueezeNet consumes about 4 times more memory than AlexNet.
df_memory.ix[['bvlc-alexnet', 'deepscale-squeezenet-1.1']].iloc[1] / \
df_memory.ix[['bvlc-alexnet', 'deepscale-squeezenet-1.1']].iloc[0]
lib
cpu                  3.882476
libdnn-cuda          3.882476
nvidia-cuda          3.882476
nvidia-cudnn         3.882476
nvidia-fp16-cuda     4.113806
nvidia-fp16-cudnn    4.113806
tensorrt-fp16             NaN
tensorrt-fp32             NaN
dtype: float64
# Using TensorRT, SqueezeNet is over 1.5 times faster than AlexNet.
mean = df_model_lib_mean[['bvlc-alexnet', 'deepscale-squeezenet-1.1']].unstack('lib')
std = df_model_lib_std[['bvlc-alexnet', 'deepscale-squeezenet-1.1']].unstack('lib')
plot(mean, std)
<matplotlib.axes._subplots.AxesSubplot at 0xa3ac7b30>
# Using TensorRT, SqueezeNet is over 1.5 times faster than AlexNet.
df_model_lib_mean[['bvlc-alexnet', 'deepscale-squeezenet-1.1']].unstack('lib').iloc[1] / \
df_model_lib_mean[['bvlc-alexnet', 'deepscale-squeezenet-1.1']].unstack('lib').iloc[0]
lib
cpu                  0.540034
libdnn-cuda          0.726139
nvidia-cuda          0.977500
nvidia-cudnn         1.033459
nvidia-fp16-cuda     0.961526
nvidia-fp16-cudnn    0.989633
tensorrt-fp16        0.632098
tensorrt-fp32        0.627858
dtype: float64
# TensorRT-fp16 is up to 69x faster than the CPU.
plot_speedup_over_baseline(df_mean_time_per_image, baseline='cpu', libs_to_drop=[], fontsize=12)
# cuDNN-fp16 is about 2x faster than cuBLAS-fp32. TensorRT-fp16 is up to 5.2x faster than cuBLAS-fp32.
plot_speedup_over_baseline(df_mean_time_per_image, baseline='nvidia-cuda', libs_to_drop=[], fontsize=12)
# TensorRT-fp16 is up to 2.7x faster than cuDNN-fp16.
plot_speedup_over_baseline(df_mean_time_per_image, baseline='nvidia-fp16-cudnn', libs_to_drop=[], fontsize=12)
# AlexNet and SqueezeNet 1.1 have very similar performance with cuBLAS and cuDNN. (They also have very similar accuracy!)
# SqueezeNet, however, benefits much more from TensorRT optimizations, becoming over 1.5 times faster than AlexNet.
# At the same time, SqueezeNet requires about 4 times more memory than AlexNet (at least, with Caffe), so it's a trade-off.
plot_speedup_over_baseline(df_mean_time_per_image.ix[['bvlc-alexnet', 'deepscale-squeezenet-1.1']],
baseline='nvidia-cuda', libs_to_drop=[], fontsize=18)
In Nov/2015, NVIDIA published a whitepaper entitled "GPU-Based Deep Learning Inference: A Performance and Power Analysis", which presented the throughput of inference (images per second) on AlexNet and GoogleNet using small and large batch sizes.
Several points of difference complicate direct comparison:
nvidia_data = []
nvidia_data.append({
'source':'nvidia', 'model':'bvlc-alexnet', 'batch size':1, 'clock':'690 MHz',
'fp32 (images/s)':47, 'fp16 (images/s)':67
})
nvidia_data.append({
'source':'nvidia', 'model':'bvlc-alexnet', 'batch size':128, 'clock':'690 MHz',
'fp32 (images/s)':155, 'fp16 (images/s)':258
})
nvidia_data.append({
'source':'nvidia', 'model':'bvlc-googlenet', 'batch size':1, 'clock':'690 MHz',
'fp32 (images/s)':33, 'fp16 (images/s)':33
})
nvidia_data.append({
'source':'nvidia', 'model':'bvlc-googlenet', 'batch size':64, 'clock':'690 MHz',
'fp32 (images/s)':52, 'fp16 (images/s)':75
})
dividiti_data = []
dividiti_data.append({
'source':'dividiti', 'model':'bvlc-alexnet', 'batch size':2, 'clock':'998 MHz',
'fp32 (images/s)':(1000/(df_time['bvlc-alexnet','nvidia-cudnn',2].mean()/2)),
'fp16 (images/s)':(1000/(df_time['bvlc-alexnet','nvidia-fp16-cudnn',2].mean()/2)),
})
dividiti_data.append({
'source':'dividiti', 'model':'bvlc-alexnet', 'batch size':16, 'clock':'998 MHz',
'fp32 (images/s)':(1000/(df_time['bvlc-alexnet','nvidia-cudnn',16].mean()/16)),
'fp16 (images/s)':(1000/(df_time['bvlc-alexnet','nvidia-fp16-cudnn',16].mean()/16)),
})
dividiti_data.append({
'source':'dividiti', 'model':'bvlc-googlenet', 'batch size':2, 'clock':'998 MHz',
'fp32 (images/s)':(1000/(df_time['bvlc-googlenet','nvidia-cudnn',2].mean()/2)),
'fp16 (images/s)':(1000/(df_time['bvlc-googlenet','nvidia-fp16-cudnn',2].mean()/2)),
})
dividiti_data.append({
'source':'dividiti', 'model':'bvlc-googlenet', 'batch size':16, 'clock':'998 MHz',
'fp32 (images/s)':(1000/(df_time['bvlc-googlenet','nvidia-cudnn',16].mean()/16)),
'fp16 (images/s)':(1000/(df_time['bvlc-googlenet','nvidia-fp16-cudnn',16].mean()/16)),
})
pd.DataFrame(nvidia_data+dividiti_data).set_index(['model','batch size','source','clock']).sortlevel()
| model | batch size | source | clock | fp16 (images/s) | fp32 (images/s) |
|---|---|---|---|---|---|
| bvlc-alexnet | 1 | nvidia | 690 MHz | 67.000000 | 47.000000 |
| bvlc-alexnet | 2 | dividiti | 998 MHz | 106.877381 | 65.283171 |
| bvlc-alexnet | 16 | dividiti | 998 MHz | 197.438889 | 134.891707 |
| bvlc-alexnet | 128 | nvidia | 690 MHz | 258.000000 | 155.000000 |
| bvlc-googlenet | 1 | nvidia | 690 MHz | 33.000000 | 33.000000 |
| bvlc-googlenet | 2 | dividiti | 998 MHz | 67.485718 | 35.521055 |
| bvlc-googlenet | 16 | dividiti | 998 MHz | 89.840365 | 58.446279 |
| bvlc-googlenet | 64 | nvidia | 690 MHz | 75.000000 | 52.000000 |
# Scale dividiti's data.
pd.concat([
pd.DataFrame(nvidia_data).set_index(['model','batch size','source']).drop(labels='clock',axis=1),
pd.DataFrame(dividiti_data).set_index(['model','batch size','source']).drop(labels='clock',axis=1)*(690.0/998.0)
]).sort_index()
| model | batch size | source | fp16 (images/s) | fp32 (images/s) |
|---|---|---|---|---|
| bvlc-alexnet | 1 | nvidia | 67.000000 | 47.000000 |
| bvlc-alexnet | 2 | dividiti | 73.893179 | 45.135659 |
| bvlc-alexnet | 16 | dividiti | 136.505845 | 93.261802 |
| bvlc-alexnet | 128 | nvidia | 258.000000 | 155.000000 |
| bvlc-googlenet | 1 | nvidia | 33.000000 | 33.000000 |
| bvlc-googlenet | 2 | dividiti | 46.658463 | 24.558646 |
| bvlc-googlenet | 16 | dividiti | 62.114080 | 40.408750 |
| bvlc-googlenet | 64 | nvidia | 75.000000 | 52.000000 |
With dividiti's data scaled down to NVIDIA's lower clock frequency of 690 MHz, NVIDIA's fp16 performance figures look plausible. NVIDIA's fp32 figure for GoogleNet at batch size 1 (33 images/s), however, is about 50% higher than expected (possibly a copy-and-paste error from the fp16 figure). NVIDIA's AlexNet figures at batch size 128 also seem to be on the high side, given that the observed improvements tail off fairly rapidly as the batch size grows.
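The scaling applied above multiplies dividiti's throughput by the clock ratio 690/998, i.e. it assumes throughput scales linearly with GPU frequency (a simplifying assumption, reasonable for roughly compute-bound workloads). As a sketch:

```python
def scale_to_clock(images_per_second, measured_mhz, target_mhz):
    """Linearly rescale a measured throughput from one GPU clock to another."""
    return images_per_second * (target_mhz / measured_mhz)

# AlexNet fp32 at batch size 2, measured at 998 MHz, rescaled to 690 MHz:
print(round(scale_to_clock(65.283171, 998.0, 690.0), 6))
```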
According to Movidius, using fp16 their Myriad2 processor runs GoogleNet at 15 images per second (~67 ms per image), while consuming about 1 Watt of power.
According to the NVIDIA 2015 whitepaper, using fp16 their TX1 processor runs GoogleNet at up to 75 images per second, while consuming up to 6 Watts of power. This would make the TX1 and the Myriad2 roughly comparable in terms of images per second per unit power: ~13 vs ~15 images per second per Watt.
However, the ~2x performance improvement that TensorRT brings over cuDNN swings the comparison in NVIDIA's favour even for small batch sizes: ~18-22 images per second per Watt, according to their blog post introducing TensorRT.
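The efficiency figures quoted above follow from simple division of throughput by power draw. A sketch using the round numbers from the text (the 6 W and 1 W figures are the vendors' stated upper bounds, not measurements from this notebook):

```python
# Images per second per Watt, using the figures quoted above.
tx1_cudnn = 75 / 6.0   # TX1, GoogleNet fp16 (cuDNN): ~12.5 images/s/W
myriad2   = 15 / 1.0   # Myriad2, GoogleNet fp16:     ~15.0 images/s/W
print(round(tx1_cudnn, 1), round(myriad2, 1))
```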
from IPython.display import Image
Image(url="https://devblogs.nvidia.com/parallelforall/wp-content/uploads/2016/09/Figure_3-1.png")
A suite of open-source tools for collecting knowledge on optimising AI: