Stacking Guide¶

In [1]:

import numpy as np
np.random.seed(0)

import kts
from kts import *

DASHBOARD

features

simple_feature

FEATURE CONSTRUCTOR

name

simple_feature

source

@feature
def simple_feature(df):
    res = stl.empty_like(df)
    res['is_male'] = (df.Sex == 'male') + 0
    return res

columns

is_male

interactions

GENERIC FEATURE

name

interactions

source

@feature
@generic(left="Pclass", right="SibSp")
def interactions(df):
    res = stl.empty_like(df)
    res[f"{left}_add_{right}"] = df[left] + df[right]
    res[f"{left}_sub_{right}"] = df[left] - df[right]
    res[f"{left}_mul_{right}"] = df[left] * df[right]
    return res

num_aggs

GENERIC FEATURE

name

num_aggs

description

Descriptions are also supported.

source

@feature
@generic(col="Parch")
def num_aggs(df):
    """Descriptions are also supported."""
    res = pd.DataFrame(index=df.index)
    mean = df[col].mean()
    std = df[col].std()
    res[f"{col}_div_mean"] = df[col] / mean
    res[f"{col}_sub_div_mean"] = (df[col] - mean) / mean
    res[f"{col}_div_std"] = df[col] / std
    return res

tfidf

GENERIC FEATURE

name

tfidf

source

@feature
@generic(col='Name')
def tfidf(df):
    if df.train:
        enc = TfidfVectorizer(analyzer='char', ngram_range=(1, 3), max_features=5)
        res = enc.fit_transform(df[col])
        df.state['enc'] = enc
    else:
        enc = df.state['enc']
        res = enc.transform(df[col])
    return res.todense()

requirements

sklearn==0.20.2

helpers

You've got no helpers so far.

In [2]:

train = kts.load('train')
test = kts.load('test')

stl.stack¶

To stack models, stl.stack is used:

In [3]:

stl.stack

Out[3]:

STACK DOCS

signature

stack(experiment_id, noise_level, random_state)

description

Returns predictions of specified experiment as features

For indices used for fitting the experiment returns OOF predictions.
For unseen indices returns predictions obtained via experiment.predict().

params

experiment_id

id of the experiment at the leaderboard

noise_level

range of noise added to predictions during train stage.
If specified, then uniformly distributed value from range [-noise_level/2, noise_level/2]
is added to each prediction.

random_state

random state for random noise generator

returns

A feature constructor returning predictions of the experiment.

examples

>>> stl.stack('ABCDEF')
>>> stl.stack('ABCDEF', noise_level=0.3, random_state=42)
>>> stl.concat([stl.stack('ABCDEF'), stl.stack('GHIJKL')])

If we pass a slice of the train set, it simply returns OOF predictions.

Note that stl.stack cannot be used in parallel features, so the preview below is run with parallel=False:

In [4]:

@preview(train, 10, parallel=False)
def preview_stack(df):
    return stl.stack('KPBVAI')(df)

COMPUTING FEATURES

feature	took
preview_stack	1s

	KPBVAI
PassengerId
1	0.109578
2	0.985013
3	0.635773
4	0.906962
5	0.132054
6	0.124416
7	0.322831
8	0.599516
9	0.520683
10	0.922890

For the test set, however, inference is run:

In [5]:

@preview(test, 10, parallel=False)
def preview_stack(df):
    return stl.stack('KPBVAI')(df)

COMPUTING FEATURES

feature	took
preview_stack
num_aggs__Fare	0s
simple_feature	0s
interactions__Pclass_Age	0s
tfidf__Name	0s

INFERENCE

id	took
KPBVAI	2s

	KPBVAI
PassengerId
892	0.150248
893	0.589290
894	0.107736
895	0.229384
896	0.756447
897	0.191558
898	0.758887
899	0.271479
900	0.785848
901	0.282044
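The difference between the two previews above can be illustrated without KTS. The following is a minimal sketch under stated assumptions: the toy group-mean model, the unshuffled K-fold and the helper names (`kfold_indices`, `fit_with_oof`, `predict_unseen`) are all hypothetical, and averaging the fold models for unseen rows is an assumption rather than a documented KTS behaviour.

```python
from statistics import mean

def kfold_indices(n, k):
    """Split range(n) into k contiguous validation folds (no shuffling)."""
    bounds = [round(i * n / k) for i in range(k + 1)]
    return [list(range(bounds[i], bounds[i + 1])) for i in range(k)]

def fit_group_mean(xs, ys):
    """Toy model: predict the mean target of the training rows with the same x."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    table = {x: mean(values) for x, values in groups.items()}
    fallback = mean(ys)
    return lambda x: table.get(x, fallback)

def fit_with_oof(xs, ys, k):
    """Fit one model per fold; each row is predicted by the model that did not see it."""
    oof = [None] * len(xs)
    models = []
    for valid in kfold_indices(len(xs), k):
        held_out = set(valid)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit_group_mean([xs[i] for i in train], [ys[i] for i in train])
        for i in valid:
            oof[i] = model(xs[i])  # out-of-fold prediction for a training row
        models.append(model)
    return oof, models

def predict_unseen(models, x):
    """Assumption: unseen rows get the average of the fold models' predictions."""
    return mean(model(x) for model in models)

# Made-up toy data, not the Titanic frame used in this guide.
xs = ["male", "female", "male", "female", "male", "female"]
ys = [0, 1, 0, 1, 1, 1]

oof, models = fit_with_oof(xs, ys, k=3)
unseen = predict_unseen(models, "male")
```

Here `oof` plays the role of the stacked feature for rows used in fitting, while `unseen` corresponds to what a real experiment would produce through inference on new rows.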

Anti-overfitting¶

KTS provides two basic ways to prevent overfitting during stacking.

Noise¶

The first is adding uniform random noise to the first-level model predictions during the training stage:

In [6]:

@preview(train, 4, 4, parallel=False)
def preview_stack(df):
    return stl.stack('KPBVAI', noise_level=0.1, random_state=None)(df)

COMPUTING FEATURES

feature	took
preview_stack	0s

	KPBVAI
PassengerId
1	0.112647
2	0.977061
3	0.683239
4	0.921687

	KPBVAI
PassengerId
1	0.077990
2	1.005335
3	0.609979
4	0.933971
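The noise rule described in the stl.stack docs (a uniformly distributed value from [-noise_level/2, noise_level/2] added to each prediction) can be sketched in plain Python. This is only an illustration of the formula; the helper `add_uniform_noise` is hypothetical and KTS may generate its noise differently.

```python
import random

def add_uniform_noise(predictions, noise_level, random_state=None):
    """Add noise drawn uniformly from [-noise_level/2, noise_level/2] to each prediction."""
    rng = random.Random(random_state)  # random_state=None seeds from system entropy
    half = noise_level / 2
    return [p + rng.uniform(-half, half) for p in predictions]

# First four OOF predictions from the preview of the train set above.
oof = [0.109578, 0.985013, 0.635773, 0.906962]

noisy_a = add_uniform_noise(oof, noise_level=0.1, random_state=42)
noisy_b = add_uniform_noise(oof, noise_level=0.1, random_state=42)
noisy_c = add_uniform_noise(oof, noise_level=0.1)  # unseeded: differs from run to run
```

With a fixed `random_state` the noisy features are reproducible, whereas with `random_state=None` every computation draws new noise, which would explain why the two previews above differ. Note also that noisy predictions are not clipped in this sketch, so a value close to 1 can end up slightly above it.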

Refiner¶

The second option is a special splitter called Refiner, which splits each fold of an outer splitter using an inner splitter. This makes it possible to train a second-level model without even indirect leaks, because each second-level model is trained on the validation set of the corresponding first-level model.

In [7]:

from kts.validation.split import Refiner
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score

outer_skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)  # splitter used to train the first-level model
inner_skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)  # splitter used to split its folds

refiner = Refiner(outer_skf, inner_skf)
val_stack = Validator(refiner, roc_auc_score)
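The idea behind such a nested split can be sketched without KTS or scikit-learn. The code below is one reading of the description above, not the actual Refiner implementation: `kfold` and `refined_splits` are hypothetical helpers, and the folds are plain and unshuffled rather than stratified.

```python
def kfold(indices, k):
    """Yield (train, valid) index lists for a plain, unshuffled K-fold."""
    n = len(indices)
    bounds = [round(i * n / k) for i in range(k + 1)]
    for i in range(k):
        valid = indices[bounds[i]:bounds[i + 1]]
        train = indices[:bounds[i]] + indices[bounds[i + 1]:]
        yield train, valid

def refined_splits(n_rows, n_outer, n_inner):
    """Split every outer validation fold again with an inner K-fold.

    The second-level model only ever sees rows from the validation set of one
    first-level fold model, so its inputs are predictions that model made on
    data it was not trained on.
    """
    all_rows = list(range(n_rows))
    for _, outer_valid in kfold(all_rows, n_outer):
        for inner_train, inner_valid in kfold(outer_valid, n_inner):
            yield outer_valid, inner_train, inner_valid

splits = list(refined_splits(n_rows=30, n_outer=5, n_inner=3))
```

With 5 outer and 3 inner folds this produces 5 × 3 = 15 train/validation pairs, each confined to a single outer validation fold.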

In [8]:

fs = FeatureSet([stl.stack('KPBVAI'), stl.stack('FYCMDA'), tfidf()], 
                train_frame=train,
                targets='Survived')

In [9]:

from kts.models.binary import *

In [10]:

model = LogisticRegression(solver='lbfgs', C=11)

val_stack.score(model, fs)

FITTING

fold	metric
1	0.927
2	0.791
3	0.872
4	0.791
5	0.833
6	0.856
7	0.780
8	0.857
9	0.828
10	0.867
11	0.894
12	0.818
13	0.774
14	0.859
15	0.774

Out[10]:

{'score': 0.8348066445892532, 'id': 'CBUQLG'}

Stacked experiments behave exactly like ordinary experiments:

In [11]:

lb.CBUQLG.predict(test)[:10]

COMPUTING FEATURES

feature	took
simple_feature	0s
interactions__Pclass_Age	0s
num_aggs__Fare	0s
tfidf__Name	0s

INFERENCE

id	took
KPBVAI	2s
FYCMDA	1s
CBUQLG	2s

Out[11]:

array([0.09516588, 0.46953291, 0.09342506, 0.15601523, 0.7111576 ,
       0.11536574, 0.56980286, 0.16907218, 0.66821078, 0.13896083])

In [12]:

lb.CBUQLG.feature_importances(estimator=Permutation(train, n_iters=10))

COMPUTING IMPORTANCES

took	6s

FEATURE IMPORTANCES

feature	mean
KPBVAI	0.281
FYCMDA	0.056
tfidf__Name_1	0.015
tfidf__Name_0	0.014
tfidf__Name_2	0.013
tfidf__Name_3	4.01e-03
tfidf__Name_4	1.79e-03

In [13]:

lb.CBUQLG.feature_importances(estimator=PermutationBlind(test, n_iters=20))

COMPUTING IMPORTANCES

took	3s

FEATURE IMPORTANCES

feature	mean
KPBVAI	0.364
FYCMDA	0.137
tfidf__Name_2	0.101
tfidf__Name_1	0.091
tfidf__Name_0	0.067
tfidf__Name_4	0.066
tfidf__Name_3	0.048

Deep Stacking¶

Since stl.stack is just an ordinary feature constructor, you can build stackings of arbitrary complexity simply by adding it to feature sets.

Let's write a five-level stacking with residual connections. In this demo we don't care about overfitting or model performance; we just show that:

Stacking is as easy as adding stl.stack(id) to a feature set
Stacking inference is no different from inference of ordinary experiments
If two or more next-level models need predictions from model A, model A is still run only once
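The last point, that a shared lower-level model is run only once, amounts to caching predictions while walking the dependency graph. The sketch below illustrates that idea only: the graph `DEPENDS_ON`, the model ids and the arithmetic stand-in for inference are all hypothetical, and nothing here reflects how KTS implements it internally.

```python
# Hypothetical dependency graph with residual connections:
# each model uses the predictions of all earlier models as features.
DEPENDS_ON = {
    "A": [],
    "B": ["A"],
    "C": ["A", "B"],
    "D": ["A", "B", "C"],
}

cache = {}       # model id -> predictions already computed in this inference pass
run_counts = {}  # model id -> how many times inference was actually executed

def predict(model_id):
    """Return predictions of model_id, running every required model at most once."""
    if model_id in cache:
        return cache[model_id]
    inputs = [predict(dep) for dep in DEPENDS_ON[model_id]]
    run_counts[model_id] = run_counts.get(model_id, 0) + 1
    result = 1 + sum(inputs)  # stand-in for real model inference
    cache[model_id] = result
    return result

final = predict("D")
```

Although model A is needed by B, C and D, `run_counts` shows a single run for it, because later requests are served from `cache`.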

In [14]:

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
val = Validator(skf, roc_auc_score)

In [15]:

current_features = [tfidf(), num_aggs('SibSp'), num_aggs('Parch')]

for i in range(5):
    model = RandomForestClassifier(n_estimators=50)
    fs = FeatureSet(current_features, train_frame=train, targets='Survived')

    summary = val.score(model, fs, leaderboard='deepstack')
    current_features.append(stl.stack(summary['id']))

COMPUTING FEATURES

feature

progress

num_aggs__SibSp

0s

num_aggs__Parch

0s

FITTING (metric per fold)

iteration	fold 1	fold 2	fold 3	fold 4	fold 5
1	0.673	0.638	0.654	0.707	0.614
2	0.774	0.682	0.757	0.785	0.703
3	0.751	0.696	0.735	0.782	0.706
4	0.745	0.682	0.723	0.773	0.698
5	0.753	0.704	0.693	0.777	0.698

In [16]:

lbs.deepstack

So, BBFXPT is a fifth-level model.

In [17]:

lb.BBFXPT.predict(test.head(21))

COMPUTING FEATURES

feature	took
tfidf__Name	0s
num_aggs__SibSp	0s
num_aggs__Parch	0s

INFERENCE

id	took
BESHSV	0s
EKLVJO	1s
TWVZZK	1s
HJMHZG	1s
BBFXPT	1s

Out[17]:

array([0.07853333, 0.848     , 0.065     , 0.11357619, 0.508     ,
       0.008     , 0.68731746, 0.37766667, 0.42224762, 0.24766667,
       0.14748889, 0.14460952, 0.58071429, 0.33666667, 0.632     ,
       0.768     , 0.542     , 0.0744    , 0.50438095, 0.46020952,
       0.27      ])

In [18]:

lb.BBFXPT.feature_importances()

Out[18]:

FEATURE IMPORTANCES

feature	mean
EKLVJO	0.141
TWVZZK	0.129
HJMHZG	0.120
BESHSV	0.103
tfidf__Name_4	0.095
tfidf__Name_3	0.082
tfidf__Name_0	0.080
tfidf__Name_2	0.080
tfidf__Name_1	0.078
SibSp_sub_div_mean	0.018
SibSp_div_std	0.016
Parch_sub_div_mean	0.015
SibSp_div_mean	0.015
Parch_div_std	0.014
Parch_div_mean	0.013
