H2O.ai XGBoost GPU Benchmarks

In this notebook, we benchmark the latest version of XGBoost, the well-known Kaggle-winning gradient boosting library, with a particular focus on its GPU plugin. We also showcase the integration of XGBoost (including the GPU version) into H2O.
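
The main API difference between the two interfaces is how GPU training is requested. As a quick preview (a minimal sketch; the exact parameter dictionaries appear in the cells below):

## Native XGBoost: request the GPU histogram updater
gpu_param = {"tree_method": "exact", "updater": "grow_gpu_hist", "n_gpus": 1}

## H2O XGBoost: request the GPU backend instead
h2o_gpu_param = {"backend": "gpu", "tree_method": "hist"}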

In [1]:
## For comparison between 1 GPU and 1 CPU, we use only 1 CPU:
#numactl -C 0 -N 0 -m 0 jupyter notebook

## This will ensure that we only use the first CPU on multi-CPU systems

1CPU
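
To verify the binding from inside the notebook, one can inspect the process affinity (a minimal sketch; os.sched_getaffinity is Linux-only):

import os
## Number of logical CPUs this process is allowed to run on
print(len(os.sched_getaffinity(0)))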

In [2]:
## First time only: install xgboost and H2O, and restart the kernel afterwards
if False:
    ## Build XGBoost from source and install its Python module
    import os
    os.system("mkdir -p tmp && cd tmp && git clone https://github.com/h2oai/xgboost --recursive && cd xgboost && mkdir build && cd build && cmake .. -DPLUGIN_UPDATER_GPU=ON -DCUB_DIRECTORY=../cub -DCUDA_NVCC_FLAGS=\"--expt-extended-lambda -arch=sm_30\" && make -j; make; cd ../python-package && python3.6 setup.py install")

    ## Download and install H2O and its Python module
    os.system("cd tmp && wget http://h2o-release.s3.amazonaws.com/h2o/rel-vajda/1/h2o-3.10.5.1.zip && unzip h2o-3.10.5.1.zip")
    os.system("python3.6 -m pip install h2o-3.10.5.1/python/h2o-3.10.5.1-py2.py3-none-any.whl --upgrade")
    
    ## restart the kernel!
In [3]:
%matplotlib inline
import xgboost as xgb
import pandas as pd
import numpy as np
import scipy as sp
import os
import time
from sklearn import metrics
In [4]:
path = "/opt/higgs_head_2M.csv"
if not os.path.exists(path):
    os.system("cd /opt/ && wget https://s3.amazonaws.com/h2o-public-test-data/bigdata/laptop/higgs_head_2M.csv")
num_class = 2
num_round = 100
learn_rate = 0.02
max_depth = 10

## Parse the data into a pandas DataFrame
df = pd.read_csv(path, header=None)
In [5]:
df_target = df.iloc[:,0]
df.drop(df.iloc[:,0], axis=1, inplace=True)
cols = df.columns.values
df.shape
Out[5]:
(2000000, 27)
In [6]:
train = df
In [7]:
train_target = df_target
In [8]:
print(train.shape)
(2000000, 27)
In [9]:
!lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                80
On-line CPU(s) list:   0-79
Thread(s) per core:    2
Core(s) per socket:    20
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz
Stepping:              1
CPU MHz:               2201.000
CPU max MHz:           2201.0000
CPU min MHz:           1200.0000
BogoMIPS:              4391.41
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              51200K
NUMA node0 CPU(s):     0-19,40-59
NUMA node1 CPU(s):     20-39,60-79
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts
In [10]:
!cat /proc/meminfo | grep MemTotal
MemTotal:       528278376 kB
In [11]:
!nvidia-smi -L
GPU 0: Tesla P100-SXM2-16GB (UUID: GPU-77864ed9-e817-3463-e5be-bf4c4c563a4c)
GPU 1: Tesla P100-SXM2-16GB (UUID: GPU-e8ece262-f677-038f-5b40-6cc86f978487)
GPU 2: Tesla P100-SXM2-16GB (UUID: GPU-7d996bf3-876b-1439-e447-ec8fb235fd98)
GPU 3: Tesla P100-SXM2-16GB (UUID: GPU-151287c0-bb97-6f82-fedb-a88273ee5447)
GPU 4: Tesla P100-SXM2-16GB (UUID: GPU-99c5cdb5-54b9-ccd2-eb51-a32850d37ebf)
GPU 5: Tesla P100-SXM2-16GB (UUID: GPU-d2d22332-f462-061e-606a-3d6a5ea89848)
GPU 6: Tesla P100-SXM2-16GB (UUID: GPU-5cc8b757-ee04-3bbb-5866-1f9f8c404a54)
GPU 7: Tesla P100-SXM2-16GB (UUID: GPU-3016922d-1da7-46cd-17f3-cb6b3ab17bc3)
In [12]:
def runXGBoost(param):
    have_updater = "updater" in param.keys()
    label = "XGBoost " \
        + ("GPU hist" if have_updater and param["updater"]=="grow_gpu_hist" else "GPU exact" if have_updater and param["updater"]=="grow_gpu" else "CPU") \
        + " " + (param["tree_method"] if "updater" not in param.keys() else "")
    print(label)
    print("=====================")
    for k, v in param.items():
        print(k, v)
    print("=====================")
    
    t_start = time.time()
    dtrain = xgb.DMatrix(train.values, label = train_target.values, feature_names=[str(c) for c in cols])
    tt = time.time() - t_start
    print("Time to create DMatrix (sec): ", tt)
    dmatrix_times.append(tt)
    
    t_start = time.time()
    bst = xgb.train(param, dtrain, num_round)
    tt = time.time() - t_start
    print("Time to train (sec): ", tt)
    train_times.append(tt)

    t_start = time.time()
    preds = bst.predict(dtrain)
    tt = time.time() - t_start
    print("Time to predict (sec): ", tt)
    score_times.append(tt)

    labels = dtrain.get_label()
    auc = metrics.roc_auc_score(labels, preds)
    print("Training AUC:", auc)
    valid_aucs.append(auc)
    plot_labels.append(label)
    
    fs = bst.get_fscore()
    
    # Optional: Uncomment to show variable importance
    #varimp = pd.DataFrame({'Importance': list(fs.values()), 'Feature': list(fs.keys())})
    #varimp.sort_values(by = 'Importance', inplace = True, ascending = False)
    #varimp.head(10).plot(label='importance',kind="barh",x="Feature",y="Importance").invert_yaxis()
In [13]:
valid_aucs = []
dmatrix_times = []
train_times = []
score_times = []
plot_labels = []
In [14]:
param = {
    "objective":('reg:logistic' if num_class>1 else 'reg:linear')
    , "max_depth":max_depth
    , "eta":learn_rate
    , "tree_method":"exact"
    , "subsample":0.7
    , "colsample_bytree":0.9
    , "min_child_weight":5
    , "seed":12345
}
runXGBoost(param)
XGBoost CPU exact
=====================
objective reg:logistic
max_depth 10
eta 0.02
tree_method exact
subsample 0.7
colsample_bytree 0.9
min_child_weight 5
seed 12345
=====================
Time to create DMatrix (sec):  1.2587285041809082
Time to train (sec):  160.4102680683136
Time to predict (sec):  0.009805917739868164
Training AUC: 0.814825032969
In [15]:
param = {
    "objective":('reg:logistic' if num_class>1 else 'reg:linear')
    , "max_depth":max_depth
    , "eta":learn_rate
    , "tree_method":"approx"
    , "subsample":0.7
    , "colsample_bytree":0.9
    , "min_child_weight":5
    , "seed":12345
}
runXGBoost(param)
XGBoost CPU approx
=====================
objective reg:logistic
max_depth 10
eta 0.02
tree_method approx
subsample 0.7
colsample_bytree 0.9
min_child_weight 5
seed 12345
=====================
Time to create DMatrix (sec):  1.2151424884796143
Time to train (sec):  96.42245173454285
Time to predict (sec):  0.003930091857910156
Training AUC: 0.812860299622
In [16]:
param = {
    "objective":('reg:logistic' if num_class>1 else 'reg:linear')
    , "max_depth":max_depth
    , "eta":learn_rate
    , "tree_method":"hist"
    , "subsample":0.7
    , "colsample_bytree":0.9
    , "min_child_weight":5
    , "seed":12345
}
runXGBoost(param)
XGBoost CPU hist
=====================
objective reg:logistic
max_depth 10
eta 0.02
tree_method hist
subsample 0.7
colsample_bytree 0.9
min_child_weight 5
seed 12345
=====================
Time to create DMatrix (sec):  1.0287103652954102
Time to train (sec):  73.14299082756042
Time to predict (sec):  0.004008293151855469
Training AUC: 0.813941675296
In [17]:
param = {
    "objective":('reg:logistic' if num_class>1 else 'reg:linear')
    , "max_depth":max_depth
    , "eta":learn_rate
    , "tree_method":"exact"
    , "updater":"grow_gpu"
    , "subsample":0.7
    , "colsample_bytree":0.9
    , "min_child_weight":5
    , "seed":12345
}
runXGBoost(param)
XGBoost GPU exact 
=====================
objective reg:logistic
max_depth 10
eta 0.02
tree_method exact
updater grow_gpu
subsample 0.7
colsample_bytree 0.9
min_child_weight 5
seed 12345
=====================
Time to create DMatrix (sec):  1.0280582904815674
Time to train (sec):  73.03138780593872
Time to predict (sec):  0.007105827331542969
Training AUC: 0.814253361342
In [18]:
param = {
    "objective":('reg:logistic' if num_class>1 else 'reg:linear')
    , "max_depth":max_depth
    , "eta":learn_rate
    , "tree_method":"exact"
    , "updater":"grow_gpu_hist"
    , "n_gpus":1
    , "subsample":0.7
    , "colsample_bytree":0.9
    , "min_child_weight":5
    , "seed":12345
}
runXGBoost(param)
XGBoost GPU hist 
=====================
objective reg:logistic
max_depth 10
eta 0.02
tree_method exact
updater grow_gpu_hist
n_gpus 1
subsample 0.7
colsample_bytree 0.9
min_child_weight 5
seed 12345
=====================
Time to create DMatrix (sec):  1.0200557708740234
Time to train (sec):  5.28957724571228
Time to predict (sec):  0.00944375991821289
Training AUC: 0.813648642801
In [19]:
data = pd.DataFrame({'algorithm'  :plot_labels,
                     'dmatrix time':dmatrix_times,
                     'training time':train_times,
                     'scoring time':score_times,
                     'training AUC' :valid_aucs}).sort_values(by="training time")
data
Out[19]:
   algorithm            dmatrix time  scoring time  training AUC  training time
4  XGBoost GPU hist          1.020056      0.009444      0.813649       5.289577
3  XGBoost GPU exact         1.028058      0.007106      0.814253      73.031388
2  XGBoost CPU hist          1.028710      0.004008      0.813942      73.142991
1  XGBoost CPU approx        1.215142      0.003930      0.812860      96.422452
0  XGBoost CPU exact         1.258729      0.009806      0.814825     160.410268
In [20]:
data.plot(label="training time",kind='barh',x='algorithm',y='training time')
data.plot(title="training AUC",kind='barh',x='algorithm',y='training AUC',legend=False)
Out[20]:
<matplotlib.axes._subplots.AxesSubplot at 0x2ab48bd1d320>

Now call XGBoost from H2O

In [35]:
import h2o
from h2o.estimators import H2OXGBoostEstimator
h2o.init()

t_start = time.time()
df_hex = h2o.import_file(path)
print("Time to parse by H2O (sec): ", time.time() - t_start)

trainhex = df_hex
trainhex[0] = (trainhex[0]).asfactor()
Checking whether there is an H2O instance running at http://localhost:54321..... not found.
Attempting to start a local H2O server...
  Java Version: java version "1.8.0_131"; Java(TM) SE Runtime Environment (build 1.8.0_131-b11); Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
  Starting server from /usr/local/lib/python3.6/dist-packages/h2o/backend/bin/h2o.jar
  Ice root: /tmp/tmpbz9cy2k1
  JVM stdout: /tmp/tmpbz9cy2k1/h2o_nimbix_started_from_python.out
  JVM stderr: /tmp/tmpbz9cy2k1/h2o_nimbix_started_from_python.err
  Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321... successful.
H2O cluster uptime: 01 secs
H2O cluster version: 3.10.5.1
H2O cluster version age: 6 days
H2O cluster name: H2O_from_python_nimbix_6vso2s
H2O cluster total nodes: 1
H2O cluster free memory: 26.67 Gb
H2O cluster total cores: 40
H2O cluster allowed cores: 40
H2O cluster status: accepting new members, healthy
H2O connection url: http://127.0.0.1:54321
H2O connection proxy: None
H2O internal security: False
Python version: 3.6.1 final
Parse progress: |█████████████████████████████████████████████████████████| 100%
Time to parse by H2O (sec):  3.3415310382843018
In [23]:
def runH2OXGBoost(param):
    label = "H2O XGBoost " \
    + ("GPU" if "backend" in param.keys() and "gpu"==param["backend"] else "CPU") \
    + (" " + param["tree_method"] if "tree_method" in param.keys() else "")
    print(label)
    print("=====================")
    for k, v in param.items():
        print(k, v)
    print("=====================")
        
    t_start = time.time()
    model = H2OXGBoostEstimator(**param)
    model.train(x = list(range(1,trainhex.shape[1])), y = 0, training_frame = trainhex)
    tt = time.time() - t_start
    print("Time to train (sec): ", tt)
    h2o_train_times.append(tt)

    t_start = time.time()
    preds = model.predict(trainhex)[:,2]
    tt = time.time() - t_start
    print("Time to predict (sec): ", tt)
    h2o_score_times.append(tt)

    preds = h2o.as_list(preds)
    labels = train_target.values
    auc = metrics.roc_auc_score(labels, preds)
    print("Training AUC:", auc)

    h2o_valid_aucs.append(auc)
    h2o_plot_labels.append(label)
    
    #pd.DataFrame(model.varimp(),columns=["Feature","","Importance",""]).head(10).plot(label='importance',kind="barh",x="Feature",y="Importance").invert_yaxis()
In [24]:
h2o_valid_aucs = []
h2o_train_times = []
h2o_score_times = []
h2o_plot_labels = []
In [25]:
param = {
    "ntrees":num_round
    , "max_depth":max_depth
    , "eta":learn_rate
    , "subsample":0.7
    , "colsample_bytree":0.9
    , "min_child_weight":5
    , "seed":12345
    , "score_tree_interval":num_round
    , "backend":"cpu"
    , "tree_method":"exact"
}
runH2OXGBoost(param)
H2O XGBoost CPU exact
=====================
ntrees 100
max_depth 10
eta 0.02
subsample 0.7
colsample_bytree 0.9
min_child_weight 5
seed 12345
score_tree_interval 100
backend cpu
tree_method exact
=====================
xgboost Model Build progress: |███████████████████████████████████████████| 100%
Time to train (sec):  192.16322207450867
xgboost prediction progress: |████████████████████████████████████████████| 100%
Time to predict (sec):  7.761167287826538
Training AUC: 0.819896987781
In [26]:
param = {
    "ntrees":num_round
    , "max_depth":max_depth
    , "eta":learn_rate
    , "subsample":0.7
    , "colsample_bytree":0.9
    , "min_child_weight":5
    , "seed":12345
    , "score_tree_interval":num_round
    , "backend":"cpu"
    , "tree_method":"approx"
}
runH2OXGBoost(param)
H2O XGBoost CPU approx
=====================
ntrees 100
max_depth 10
eta 0.02
subsample 0.7
colsample_bytree 0.9
min_child_weight 5
seed 12345
score_tree_interval 100
backend cpu
tree_method approx
=====================
xgboost Model Build progress: |███████████████████████████████████████████| 100%
Time to train (sec):  166.70660161972046
xgboost prediction progress: |████████████████████████████████████████████| 100%
Time to predict (sec):  7.733324766159058
Training AUC: 0.818158055647
In [27]:
param = {
    "ntrees":num_round
    , "max_depth":max_depth
    , "eta":learn_rate
    , "subsample":0.7
    , "colsample_bytree":0.9
    , "min_child_weight":5
    , "seed":12345
    , "score_tree_interval":num_round
    , "backend":"cpu"
    , "tree_method":"hist"
}
runH2OXGBoost(param)
H2O XGBoost CPU hist
=====================
ntrees 100
max_depth 10
eta 0.02
subsample 0.7
colsample_bytree 0.9
min_child_weight 5
seed 12345
score_tree_interval 100
backend cpu
tree_method hist
=====================
xgboost Model Build progress: |███████████████████████████████████████████| 100%
Time to train (sec):  114.9106297492981
xgboost prediction progress: |████████████████████████████████████████████| 100%
Time to predict (sec):  7.72308611869812
Training AUC: 0.819105186679
In [28]:
param = {
    "ntrees":num_round
    , "max_depth":max_depth
    , "learn_rate":learn_rate
    , "sample_rate":0.7
    , "col_sample_rate_per_tree":0.9
    , "min_rows":5
    , "seed":12345
    , "score_tree_interval":num_round
    , "backend":"gpu"
    , "tree_method":"exact"
}
runH2OXGBoost(param)
H2O XGBoost GPU exact
=====================
ntrees 100
max_depth 10
learn_rate 0.02
sample_rate 0.7
col_sample_rate_per_tree 0.9
min_rows 5
seed 12345
score_tree_interval 100
backend gpu
tree_method exact
=====================
xgboost Model Build progress: |███████████████████████████████████████████| 100%
Time to train (sec):  85.09905505180359
xgboost prediction progress: |████████████████████████████████████████████| 100%
Time to predict (sec):  7.728891849517822
Training AUC: 0.819862038742
In [29]:
param = {
    "ntrees":num_round
    , "max_depth":max_depth
    , "learn_rate":learn_rate
    , "sample_rate":0.7
    , "col_sample_rate_per_tree":0.9
    , "min_rows":5
    , "seed":12345
    , "score_tree_interval":num_round
    , "backend":"gpu"
    , "tree_method":"hist"
}
runH2OXGBoost(param)
H2O XGBoost GPU hist
=====================
ntrees 100
max_depth 10
learn_rate 0.02
sample_rate 0.7
col_sample_rate_per_tree 0.9
min_rows 5
seed 12345
score_tree_interval 100
backend gpu
tree_method hist
=====================
xgboost Model Build progress: |███████████████████████████████████████████| 100%
Time to train (sec):  17.322280406951904
xgboost prediction progress: |████████████████████████████████████████████| 100%
Time to predict (sec):  7.722412109375
Training AUC: 0.819402793669

H2O GBM (CPU)

In [30]:
from h2o.estimators.gbm import H2OGradientBoostingEstimator
param = {
      "ntrees":num_round
    , "max_depth":max_depth
    , "learn_rate":learn_rate
    , "sample_rate":0.7
    , "col_sample_rate_per_tree":0.9
    , "min_rows":5
    , "seed":12345
    , "score_tree_interval":num_round
}

t_start = time.time()
model = H2OGradientBoostingEstimator(**param)
model.train(x = list(range(1,trainhex.shape[1])), y = 0, training_frame = trainhex)
tt = time.time() - t_start
print("Time to train (sec): ", tt)
h2o_train_times.append(tt)

t_start = time.time()
preds = model.predict(trainhex)[:,2]
tt = time.time() - t_start
print("Time to predict (sec): ", tt)
h2o_score_times.append(tt)

preds = h2o.as_list(preds)
labels = train_target.values
auc = metrics.roc_auc_score(labels, preds)
print("AUC:", auc)

h2o_valid_aucs.append(auc)
h2o_plot_labels.append("H2O GBM CPU")
gbm Model Build progress: |███████████████████████████████████████████████| 100%
Time to train (sec):  50.22540855407715
gbm prediction progress: |████████████████████████████████████████████████| 100%
Time to predict (sec):  3.781240463256836
AUC: 0.818376121653
In [31]:
data = pd.DataFrame({'algorithm'  :h2o_plot_labels,
                     'training time':h2o_train_times,
                     'scoring time':h2o_score_times,
                     'training AUC' :h2o_valid_aucs}).sort_values(by="training time")
data
Out[31]:
   algorithm                scoring time  training AUC  training time
4  H2O XGBoost GPU hist         7.722412      0.819403      17.322280
5  H2O GBM CPU                  3.781240      0.818376      50.225409
3  H2O XGBoost GPU exact        7.728892      0.819862      85.099055
2  H2O XGBoost CPU hist         7.723086      0.819105     114.910630
1  H2O XGBoost CPU approx       7.733325      0.818158     166.706602
0  H2O XGBoost CPU exact        7.761167      0.819897     192.163222
In [32]:
data.plot(label="DMatrix + training time",kind='barh',x='algorithm',y='training time')
data.plot(title="training AUC",kind='barh',x='algorithm',y='training AUC',legend=False)
Out[32]:
<matplotlib.axes._subplots.AxesSubplot at 0x2ab5a77bc898>

Summary: The fastest GPU algorithm (native XGBoost with the histogram method) trains in about 5 seconds, while the fastest CPU algorithm (H2O GBM) takes about 50 seconds.

Note: H2O's XGBoost integration still carries some internal overhead (DMatrix creation is single-threaded, and some parameters have different default values), which explains the slightly slower training speed and slightly higher training AUC. This does not affect the summary conclusion.
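
For a single side-by-side view of both benchmarks, the timing lists collected above can be combined (a minimal sketch, assuming the native XGBoost lists plot_labels, train_times, valid_aucs and their H2O counterparts are still in memory):

## Combine native XGBoost and H2O results into one comparison frame
combined = pd.DataFrame({
    'algorithm':     plot_labels + h2o_plot_labels,
    'training time': train_times + h2o_train_times,
    'training AUC':  valid_aucs + h2o_valid_aucs
}).sort_values(by="training time")
combined.plot(title="training time (sec)", kind='barh', x='algorithm', y='training time', legend=False)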