This is a short example of converting an Isolation Forest model generated through the isotree library to treelite format, which can then be passed to the tl2cgen library to compile the trees into a standalone runtime library that is often faster at making predictions.
from sklearn.datasets import fetch_california_housing
X, y = fetch_california_housing(return_X_y=True)
print(X.shape)
(20640, 8)
Note: only models that use ndim=1 can be exported to treelite format.
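The restriction above can be checked before attempting the export. The following is a minimal sketch, assuming the fitted model exposes its constructor argument as an `ndim` attribute (scikit-learn convention); `check_treelite_exportable` is a hypothetical helper, not part of isotree, and the stand-in objects exist only so the sketch runs without isotree installed.

```python
from types import SimpleNamespace


def check_treelite_exportable(model):
    """Raise early if the model was not built with ndim=1 (hypothetical helper)."""
    # Assumption: the model stores its 'ndim' constructor argument as an attribute.
    ndim = getattr(model, "ndim", None)
    if ndim != 1:
        raise ValueError(
            f"treelite export needs ndim=1, but this model has ndim={ndim!r}"
        )
    return model


# Stand-in objects; with a real model, pass the IsolationForest instance instead.
single_variable = SimpleNamespace(ndim=1)
extended = SimpleNamespace(ndim=2)

check_treelite_exportable(single_variable)  # passes silently
try:
    check_treelite_exportable(extended)
    rejected = False
except ValueError as err:
    rejected = True
    print(err)
```

Failing with a clear message before the export makes the cause obvious when a model was fitted with a different `ndim`.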
from isotree import IsolationForest
iso = IsolationForest(ndim=1, ntrees=100, sample_size=256,
                      missing_action="impute", max_depth=8)
iso.fit(X)
### Now convert
treelite_model = iso.to_treelite()
### OPTIONAL: add annotations for better branch prediction
import tl2cgen
tl2cgen.annotate_branch(
    model=treelite_model,
    dmat=tl2cgen.DMatrix(X),
    verbose=False,
    path="iso_branches_annotation.json"
)
These models need to be compiled into a shared library in order to be used:
%%capture
import tl2cgen
import multiprocessing
tl2cgen.export_lib(
    model=treelite_model,
    toolchain="clang",
    libpath='./predictor.so',
    params={
        "parallel_comp": multiprocessing.cpu_count(),
        "annotate_in": "iso_branches_annotation.json"
    }
)
treelite_predictor = tl2cgen.Predictor("predictor.so")
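The compilation step above hard-codes toolchain="clang", which only works if clang is installed. As a rough sketch, the compiler could be picked from whatever is available on the PATH; `pick_toolchain` is a hypothetical helper, and it assumes that "gcc" is also an acceptable value for the `toolchain` argument on this platform.

```python
import shutil


def pick_toolchain(candidates=("clang", "gcc")):
    """Return the first compiler name found on PATH, or None (hypothetical helper)."""
    for name in candidates:
        if shutil.which(name) is not None:
            return name
    return None


toolchain = pick_toolchain()
print(toolchain)
```

If the result is not None, it could be passed as the `toolchain` argument in place of the literal "clang".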
Now verify that they make the same predictions:
iso.predict(X[:10])
array([0.47006444, 0.47770081, 0.4910637 , 0.42605826, 0.41548625, 0.41730139, 0.41699421, 0.43228664, 0.40877799, 0.41800632])
treelite_predictor.predict(tl2cgen.DMatrix(X[:10]))
array([[0.47006445], [0.47770081], [0.4910637 ], [0.42605827], [0.41548626], [0.41730139], [0.41699421], [0.43228664], [0.40877799], [0.41800632]])
Note: some small disagreement between the two is expected due to loss of precision when converting. See the documentation in isotree for more details.
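Rather than comparing the printed arrays by eye, the agreement can be checked numerically. The sketch below reuses the ten scores shown in the outputs above; the tolerance of 1e-6 is an assumption chosen for illustration, not a value taken from the isotree documentation. Note that the compiled predictor returns a column of shape (10, 1), so it is flattened before comparing.

```python
import numpy as np

# Scores copied from the 'iso.predict(X[:10])' output above
iso_scores = np.array([0.47006444, 0.47770081, 0.4910637, 0.42605826, 0.41548625,
                       0.41730139, 0.41699421, 0.43228664, 0.40877799, 0.41800632])

# Scores copied from the compiled predictor output above, shape (10, 1)
tl_scores = np.array([[0.47006445], [0.47770081], [0.4910637], [0.42605827],
                      [0.41548626], [0.41730139], [0.41699421], [0.43228664],
                      [0.40877799], [0.41800632]])

# Flatten the column so both arrays have shape (10,)
tl_flat = tl_scores.ravel()

# Largest absolute disagreement between the two sets of scores
max_abs_diff = np.max(np.abs(iso_scores - tl_flat))

# Assumed tolerance: treat differences below 1e-6 as conversion noise
agree = np.allclose(iso_scores, tl_flat, rtol=0, atol=1e-6)
print(max_abs_diff, agree)
```

With the live objects, the same comparison would apply to `iso.predict(X)` and the flattened output of `treelite_predictor.predict(tl2cgen.DMatrix(X))`.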
%%timeit
import joblib
### see docs for 'IsolationForest.predict' about this part
iso.set_params(nthreads=joblib.cpu_count(only_physical_cores=True))
iso.predict(X)
20.4 ms ± 423 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%%timeit
treelite_predictor.predict(tl2cgen.DMatrix(X))
4.14 ms ± 620 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
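For a quick sense of scale, the ratio of the two mean timings above can be computed directly. This uses only the means reported by %%timeit on this particular run and ignores the reported standard deviations, so it is a rough indication rather than a benchmark.

```python
# Mean timings copied from the two %%timeit outputs above (milliseconds)
isotree_ms = 20.4
compiled_ms = 4.14

# How many times faster the compiled predictor was on this run
speedup = isotree_ms / compiled_ms
print(f"{speedup:.1f}x")
```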