# NBDT: Neural-Backed Decision Trees

Step A: Load the weights of the last fully-connected layer.
Step B: Load the weights as different nodes, one leaf node per class.
Steps C, D: The weight of each parent is the average of its children's weights.

The nodes in the same layer form a classifier. Parent labels come from WordNet. For example, if the leaf node is "dog", its parent node can be "animal".
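
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the weight values, the `hierarchy` dictionary, and the helper names `node_weight` and `predict` are all made up for the example. It also assumes that, at inference time, each node picks the child whose weight vector has the largest inner product with the input features.

```python
# Step A: toy final-layer weights, one row per class (values are made up).
fc_weights = {
    "dog":   [1.0, 0.0],
    "cat":   [0.0, 1.0],
    "car":   [-1.0, -1.0],
    "truck": [-1.0, -3.0],
}

# Hypothetical hierarchy (parent -> children); in NBDT the parent labels
# would come from WordNet. Any name not listed as a key is a leaf.
hierarchy = {
    "root": ["animal", "vehicle"],
    "animal": ["dog", "cat"],
    "vehicle": ["car", "truck"],
}

def node_weight(node):
    """Steps B-D: a leaf uses its own row; a parent averages its children."""
    if node not in hierarchy:
        return fc_weights[node]
    child_weights = [node_weight(child) for child in hierarchy[node]]
    return [sum(col) / len(child_weights) for col in zip(*child_weights)]

def predict(features):
    """Walk from the root, choosing the child with the largest inner product
    (an assumed decision rule for this sketch)."""
    node = "root"
    while node in hierarchy:
        node = max(
            hierarchy[node],
            key=lambda c: sum(w * x for w, x in zip(node_weight(c), features)),
        )
    return node
```

With these toy numbers, `node_weight("animal")` is the mean of the "dog" and "cat" rows, and `predict` descends one level at a time until it reaches a leaf class.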

After Step D, the authors fine-tune the model with a loss $\mathcal{L}$:

$$\mathcal{L}=\beta_{t} \underbrace{\text { CROSSENTROPY }\left(\mathcal{D}_{\text {pred }}, \mathcal{D}_{\text {label }}\right)}_{\mathcal{L}_{\text {original }}}+\omega_{t} \underbrace{\text { CROSSENTROPY }\left(\mathcal{D}_{\text {nbdt }}, \mathcal{D}_{\text {label }}\right)}_{\mathcal{L}_{\text {soft }}}$$

$\mathcal{D}_{\text{nbdt}}$ is defined in Section 3.1 of the NBDT paper.
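
As a rough sketch of how the two terms combine, the snippet below computes $\mathcal{L}$ for a single example. It assumes both $\mathcal{D}_{\text{pred}}$ and $\mathcal{D}_{\text{nbdt}}$ are already available as lists of class probabilities and that the label is a class index; the function names and the scalar arguments `beta_t` and `omega_t` are hypothetical, and the computation of $\mathcal{D}_{\text{nbdt}}$ itself is not shown.

```python
import math

def cross_entropy(probs, label):
    """Cross-entropy of a probability vector against a single class index."""
    return -math.log(probs[label])

def nbdt_loss(pred_probs, nbdt_probs, label, beta_t, omega_t):
    """Weighted sum of the original loss and the tree-based (soft) loss."""
    loss_original = cross_entropy(pred_probs, label)  # network output vs. label
    loss_soft = cross_entropy(nbdt_probs, label)      # tree distribution vs. label
    return beta_t * loss_original + omega_t * loss_soft
```

Setting `omega_t` to zero recovers the ordinary cross-entropy loss, so the second term can be read as an extra penalty when the tree's distribution disagrees with the label.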

