!pip install fastai fastdot -q
In this notebook we'll compare the TabNet architecture to our regular fastai fully connected models. We'll use Michael Grankin's fast_tabnet wrapper to bring in the model.
!pip install fast_tabnet==0.0.8 pytorch_tabnet==1.0.6 -q
TabNet is an attention-based network for tabular data, originating here. Let's first look at our fastai architecture and then compare it with TabNet using the fastdot library.
First let's quickly build our data so we know just what we're visualizing. We'll use the ADULT dataset again:
from fastai.tabular.all import *
path = untar_data(URLs.ADULT_SAMPLE)
df = pd.read_csv(path/'adult.csv')
We'll build our TabularPandas object and the DataLoaders:
cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race']
cont_names = ['age', 'fnlwgt', 'education-num']
procs = [Categorify, FillMissing, Normalize]
splits = RandomSplitter()(range_of(df))
to = TabularPandas(df, procs, cat_names, cont_names, y_names="salary", splits=splits)
dls = to.dataloaders(bs=1)
And let's look at one batch to understand how the data is coming into the model:
dls.one_batch()
So we can see that the first element is our categoricals, the second is our continuous variables, and the third is our y. With this in mind, let's make a TabularModel with layer sizes of 200 and 100:
learn = tabular_learner(dls, layers=[200,100], metrics=accuracy)
Now a basic visualization of this model can be made with fastdot
like below:
from fastdot import *
layers_cat = ['Embedding Matrix', 'Dropout']
inp = ['Input']
cont_bn = ['BatchNorm1d']
lin_bn_drop = ['BatchNorm1d', 'Dropout', 'Linear', 'ReLU']
full_lin = ['LinBnDrop\n(ni, 200)', 'LinBnDrop\n(200,100)', 'LinBnDrop\n(100,2)']
block1, block2, block3, block4, block5 = ['Preprocessed Input', 'Categorical\nEmbeddings',
'Continuous\nBatch Normalization', 'Fully\nConnected Layers',
'LinBnDrop']
conns = ((block1, block2),
(block1, block3),
(block2, block4),
(block3, block4))
def color(o):
    if o == 'Embedding Matrix': return 'white'
    if o == 'Input': return 'gray'
    if 'Dropout' in o: return 'gold'
    if 'BatchNorm' in o: return 'pink'
    if 'Lin' in o: return 'lightblue'
    if 'ReLU' in o: return 'gray'
node_defaults['fillcolor'] = color
model = [
seq_cluster(inp, block1),
seq_cluster(layers_cat, block2),
seq_cluster(cont_bn, block3),
seq_cluster(full_lin, block4)]
fcc = seq_cluster(lin_bn_drop, block5)
g = graph_items(*model)
g.add_items(fcc)
g.add_items(*object_connections(conns))
g
How does this compare to TabNet? This is TabNet:
block1, block2, block3, block4, block5 = ['Preprocessed Input', 'Categorical\nEmbeddings',
'Continuous\nBatch Normalization', 'TabNet', 'Output']
conns = ((block1, block2),
(block1, block3),
(block2, block4),
(block3, block4),
(block4, block5))
fcc = ['Linear']
tabnet = ['Attention\nTransformer', 'Feature\nTransformer', 'Final\nMapping (Linear)']
def color(o):
    if o == 'Embedding Matrix': return 'white'
    if o == 'Input': return 'gray'
    if 'Dropout' in o: return 'white'
    if 'BatchNorm' in o: return 'pink'
    if 'Att' in o: return 'lightblue'
    if 'Feat' in o: return 'darkseagreen2'
    if 'Mask' in o: return 'lightgray'
    if 'Lin' in o: return 'gold2'
    if 'Out' in o: return 'lightgray'
    return 'white'
node_defaults['fillcolor'] = color
def cluster_color(o):
    if 'Attention' in o: return 'lightblue'
    if 'Feat' in o: return 'darkseagreen2'
    return 'lightgray'
cluster_defaults['fillcolor'] = cluster_color
out = ['Output', 'Mask_Loss', 'Mask_Explain', 'Masks']
model = [
seq_cluster(inp, block1),
seq_cluster(layers_cat, block2),
seq_cluster(cont_bn, block3),
seq_cluster(tabnet, block4),
*Cluster(block5).add_items(*out),
]
conns = ((block1, block2),
(block1, block3),
(block2, block4),
(block3, block4),
(block4, out[0]),
(block4, out[1]),
(block4, out[2]),
(block4, out[3]))
g = graph_items(*model)
g.add_items(*object_connections(conns))
att_tfmer = ['Linear', 'GhostBatchNorm', 'torch.mul(x, prior)', 'Sparsemax']
shared = ['Linear\n(ni, 80)', 'Linear\n(ni-2, 80)']
specifics = ['GLU Block']
feat_tfmer = [*shared, *specifics]
att_clus = seq_cluster(att_tfmer, 'Attention Transformer')
feat_clus = seq_cluster(feat_tfmer, 'Feature Transformer')
g
graph_items(feat_clus)
graph_items(att_clus)
So, a few things to note: we now have two transformers, one that transforms the features and one that handles the attention. We could call the attention transformer the encoder and the feature transformer the decoder. What this attention lets us do is see exactly how our model is behaving, more so than feature importance and other techniques that merely "guess".
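The reason those attention masks are so readable is that the attentive transformer ends in a Sparsemax rather than a softmax, which produces exact zeros in the mask. As a rough illustration, here is a minimal NumPy sketch of sparsemax (Martins & Astudillo, 2016) — an assumption-laden toy version, not pytorch_tabnet's implementation:

```python
import numpy as np

def sparsemax(z):
    "Toy sparsemax: project logits `z` onto the probability simplex."
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]              # sort descending
    k = np.arange(1, len(z) + 1)
    cumsum = np.cumsum(z_sorted)
    support = 1 + k * z_sorted > cumsum      # which logits stay nonzero
    k_z = support.sum()
    tau = (cumsum[k_z - 1] - 1) / k_z        # threshold to subtract
    return np.maximum(z - tau, 0.0)

sparsemax([2.0, 1.0, -1.0])  # → array([1., 0., 0.]): exact zeros, unlike softmax
```

Because low-scoring features get an exactly-zero weight, each step's mask cleanly selects a subset of features, which is what makes the explainability plots at the end of this notebook possible.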
Now that we have this done, how do we make a model?
I have found in my experiments that TabNet isn't quite as good as fastai's tabular model, but since attention can be important and is a hot topic, we'll use it here. Another downside of this model is that it takes many epochs to reach a decent accuracy, as we will see:
from fast_tabnet.core import *
from fast_tabnet.core import TabNetNoEmbeddings
(If your version of fast_tabnet doesn't expose TabNetNoEmbeddings, it should be importable directly from pytorch_tabnet instead: from pytorch_tabnet.tab_network import TabNetNoEmbeddings.)
First we need to grab the embedding matrix sizes:
emb_szs = get_emb_sz(to); emb_szs
[(10, 6), (17, 8), (8, 5), (16, 8), (7, 5), (6, 4), (3, 3)]
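These widths come from fastai's embedding-size heuristic. A sketch of the rule (matching fastai's emb_sz_rule, applied here to the cardinalities shown as the first element of each tuple above):

```python
def emb_sz_rule(n_cat):
    "fastai's heuristic: embedding width from a category's cardinality"
    return min(600, round(1.6 * n_cat**0.56))

# cardinalities taken from the first element of each tuple printed above
[emb_sz_rule(n) for n in (10, 17, 8, 16, 7, 6, 3)]  # → [6, 8, 5, 8, 5, 4, 3]
```

So each categorical column gets an embedding roughly proportional to a fractional power of its number of levels, capped at 600.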
And now we can make use of our model! There's many different values we can pass in, here's a brief summary:
- n_d: Dimensions of the prediction layer (usually between 4 and 64)
- n_a: Dimensions of the attention layer (similar to n_d)
- n_steps: Number of successive steps in our network (usually 3 to 10)
- gamma: A scaling factor for updating attention (usually between 1.0 and 2.0)
- momentum: Momentum in all batch normalization
- n_independent: Number of independent GLU layers in each block (default is 2)
- n_shared: Number of shared GLU layers in each block (default is 2)
- epsilon: Should be kept very low (to avoid log(0))
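To make gamma concrete: at each step, the attention prior is scaled by (gamma - mask), so features a step has already attended to are down-weighted for later steps. A small numeric sketch of this update rule from the TabNet paper (illustrative values, not library code):

```python
import numpy as np

gamma = 1.3
prior = np.ones(4)                     # every feature starts equally available
mask = np.array([1.0, 0.0, 0.0, 0.0])  # step 1 attends fully to feature 0
prior = prior * (gamma - mask)         # TabNet's prior update between steps
# prior is now [0.3, 1.3, 1.3, 1.3]: feature 0 is discouraged (not forbidden);
# with gamma=1.0 its prior would drop to 0, forcing each feature to be used once
```

Larger gamma therefore lets the model reuse informative features across steps, while gamma near 1.0 pushes each step toward disjoint feature sets.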
Let's build one similar to the model we showed in the above. To do so we'll set the dimensions of the prediction layer to 8, the number of attention layer dimensions to 32, and our steps to 4:
class TabNetModel(Module):
    "Attention model for tabular data."
    def __init__(self, emb_szs, n_cont, out_sz, embed_p=0., y_range=None,
                 n_d=8, n_a=8,
                 n_steps=3, gamma=1.3,
                 n_independent=2, n_shared=2, epsilon=1e-15,
                 virtual_batch_size=128, momentum=0.02):
        self.embeds = nn.ModuleList([Embedding(ni, nf) for ni,nf in emb_szs])
        self.emb_drop = nn.Dropout(embed_p)
        self.bn_cont = nn.BatchNorm1d(n_cont)
        n_emb = sum(e.embedding_dim for e in self.embeds)
        self.n_emb,self.n_cont,self.y_range = n_emb,n_cont,y_range
        self.tab_net = TabNetNoEmbeddings(n_emb + n_cont, out_sz, n_d, n_a, n_steps,
                                          gamma, n_independent, n_shared, epsilon,
                                          virtual_batch_size, momentum)

    def forward(self, x_cat, x_cont, att=False):
        if self.n_emb != 0:
            x = [e(x_cat[:,i]) for i,e in enumerate(self.embeds)]
            x = torch.cat(x, 1)
            x = self.emb_drop(x)
        if self.n_cont != 0:
            x_cont = self.bn_cont(x_cont)
            x = torch.cat([x, x_cont], 1) if self.n_emb != 0 else x_cont
        # Recent pytorch_tabnet versions return (output, M_loss) from forward;
        # the explain masks come from a separate `forward_masks` call
        out, m_loss = self.tab_net(x)
        if self.y_range is not None:
            out = (self.y_range[1]-self.y_range[0]) * torch.sigmoid(out) + self.y_range[0]
        if att:
            m_explain, masks = self.tab_net.forward_masks(x)
            return out, m_loss, m_explain, masks
        return out
First we need to make new DataLoaders, because we currently have a batch size of 1:
dls = to.dataloaders(bs=1024)
Then build the model:
net = TabNetModel(emb_szs, len(to.cont_names), dls.c, n_d=8, n_a=32, n_steps=1);
Finally we'll build our Learner
and use the ranger
optimizer:
learn = Learner(dls, net, CrossEntropyLossFlat(), metrics=accuracy, opt_func=ranger)
learn.fit_flat_cos(5, 1e-1)
As you can see, it didn't take that long to get to ~83% accuracy. In my other tests I wasn't able to do quite as well, but try it out! The code is here for you to use and play with.
dl = learn.dls.test_dl(df.iloc[:20], bs=1)
batch = dl.one_batch()
batch
(tensor([[5, 8, 3, 0, 6, 5, 1]]), tensor([[ 0.7629, -0.8397, 0.7556]]), tensor([[1]], dtype=torch.int8))
from pytorch_tabnet.tab_model import *
cat_dims = [emb_szs[i][1] for i in range(len(emb_szs))]
cat_dims
[6, 8, 5, 8, 5, 4, 3]
cat_dims are the embedding output sizes, the second element of each tuple in emb_szs. cat_idxs are the indices in the batch where our categorical variables sit; in our case that's everything after the three continuous variables, indices 3 through 9:
cat_idxs = [3,4,5,6,7,8, 9]
cat_idxs
[3, 4, 5, 6, 7, 8, 9]
tot = len(to.cont_names) + len(to.cat_names)
The 42 is the width of the input after the embeddings: the sum of the embedding output sizes plus our three continuous variables:
matrix = create_explain_matrix(tot,
cat_dims,
cat_idxs,
42)
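We can sanity-check where the 42 comes from using the embedding sizes printed earlier:

```python
# embedding sizes as printed by get_emb_sz above
emb_szs = [(10, 6), (17, 8), (8, 5), (16, 8), (7, 5), (6, 4), (3, 3)]
n_cont = 3  # age, fnlwgt, education-num

# width of the tensor entering TabNet: 39 embedding outputs + 3 continuous
post_emb_width = sum(nf for _, nf in emb_szs) + n_cont
post_emb_width  # → 42
```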
dl = learn.dls.test_dl(df.iloc[:20], bs=1)
Now let's patch an explain function into Learner. It uses scipy's sparse matrix dot product, so we need csc_matrix imported:
from scipy.sparse import csc_matrix

@patch
def explain(x:Learner, dl:TabDataLoader):
    "Get explain values for a set of predictions"
    dec_y = []
    x.model.eval()
    for batch_nb, data in enumerate(dl):
        with torch.no_grad():
            out, M_loss, M_explain, masks = x.model(data[0], data[1], True)
        for key, value in masks.items():
            masks[key] = csc_matrix.dot(value.numpy(), matrix)
        if batch_nb == 0:
            res_explain = csc_matrix.dot(M_explain.numpy(), matrix)
            res_masks = masks
        else:
            res_explain = np.vstack([res_explain,
                                     csc_matrix.dot(M_explain.numpy(), matrix)])
            for key, value in masks.items():
                res_masks[key] = np.vstack([res_masks[key], value])
        dec_y.append(int(x.loss_func.decodes(out)))
    return dec_y, res_masks, res_explain
We'll pass in a DataLoader:
dec_y, res_masks, res_explain = learn.explain(dl)
And now we can visualize them with plot_explain:
def plot_explain(masks, lbls, figsize=(12,12)):
    "Plots masks with `lbls` (`dls.x_names`)"
    fig = plt.figure(figsize=figsize)
    ax = fig.add_axes([0.1, 0.1, 0.8, 0.8])
    plt.yticks(np.arange(0, len(masks[0]), 1.0))
    plt.xticks(np.arange(0, len(masks[0][0]), 1.0))
    ax.set_xticklabels(lbls, rotation=90)
    plt.ylabel('Sample Number')
    plt.xlabel('Variable')
    plt.imshow(masks[0])
We pass in the masks and the x_names
and we can see for each input how it affected the output:
lbls = dls.x_names
plot_explain(res_masks, lbls)