*Sebastian Raschka*

last updated: **03/29/2014**

Link to this IPython Notebook on GitHub

Executed in Python 3.4.0

One of the biggest challenges in designing a good classifier for a **statistical pattern classification** problem is to estimate the underlying parameters that fit the model - given that the forms of the underlying probability distributions are known. The larger the number of parameters becomes, the more difficult it naturally is to estimate those parameters accurately.

In order to avoid the **Curse of Dimensionality**, pattern classification is often accompanied by a dimensionality reduction step, typically a projection of the feature space onto a lower-dimensional subspace (e.g., via Principal Component Analysis).

An alternative to the projection-based approach is the so-called **feature selection** approach, where we keep a subset of the original features instead of constructing new ones.

Therefore, the goal of the presented **sequential selection algorithms** is to reduce the feature space from $d$ dimensions to a $k$-dimensional feature subset (where $k < d$) by selecting features rather than transforming them.

The goal is to select a "sufficiently reduced" subset from the feature space without significantly degrading the performance of the classifier.

F. Ferri, P. Pudil, M. Hatef, and J. Kittler investigated the performance of different **sequential selection algorithms** for problems of varying dimensionality in their paper "Comparative Study of Techniques for Large-Scale Feature Selection" (1994).

Choosing an "appropriate" algorithm really depends on the problem: the dataset size, the desired recognition rate, and the available computational resources. Thus, I want to encourage you to take (at least) a brief look at their paper and the results they obtained from experiments with different problems and feature space dimensions.

In order to evaluate the performance of our selected **feature subset** (typically the recognition rate of the classifier), we need to define a **criterion function**.

For the sake of simplicity, and in order to get an intuitive idea of whether our algorithm returns an "appropriate" result, let us define a very simple criterion function here.

The criterion function defined below simply returns the sum of numerical values in a list.

In [1]:

```
def simple_crit_func(feat_sub):
    """Returns sum of numerical values of an input list."""
    return sum(feat_sub)

# Example:
simple_crit_func([1, 2, 4])
```

Out[1]:

```
7
```

The **Sequential Forward Selection (SFS)** algorithm is one of the simplest and probably fastest feature selection algorithms.

Let's summarize its mechanics in words:

Note that included features are never removed, which is one of the biggest downsides of this algorithm.
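This "nesting" effect can be demonstrated with a small, hypothetical criterion in which two features only pay off together (the `pair_crit_func` and minimal greedy loop below are made up purely for this illustration; they are not part of the implementations that follow):

```python
from itertools import combinations

def pair_crit_func(feat_sub):
    """Hypothetical criterion: features 3 and 4 interact strongly."""
    if 3 in feat_sub and 4 in feat_sub:
        return 100
    return sum(feat_sub)

def greedy_sfs(features, k, crit):
    """Minimal SFS loop: repeatedly add the single best feature."""
    subset, pool = [], list(features)
    while len(subset) < k:
        best = max(pool, key=lambda x: crit(subset + [x]))
        subset.append(best)
        pool.remove(best)
    return subset

features = [3, 4, 5]
greedy = greedy_sfs(features, 2, pair_crit_func)              # picks 5 first, then 4
optimal = max(combinations(features, 2), key=pair_crit_func)  # the pair (3, 4)
print(greedy, pair_crit_func(greedy))
print(list(optimal), pair_crit_func(optimal))
```

Because SFS commits to the individually best feature (5) in the first step and never removes it, it can never reach the jointly optimal pair {3, 4}.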

**Input:**

the set of all features,

- $Y = \{y_1, y_2, ..., y_d\}$

The **SFS** algorithm takes the whole feature set as input; if our feature space consists of, e.g., 10 dimensions, then $d = 10$.

**Output:**

a subset of features,

- $X_k = \{x_j \; | \; j = 1, 2, ..., k; \; x_j \in Y\}$,

where $k = (0, 1, 2, ..., d)$

The returned output of the algorithm is a subset of the feature space of a specified size. E.g., a subset of 5 features from a 10-dimensional feature space ($k = 5, \; d = 10$).

**Initialization:**

- $ X_0 = \emptyset, \; k = 0 $

We initialize the algorithm with an empty set ("null set") so that $k = 0$ (where $k$ is the size of the subset).

**Step 1 (Inclusion):**

- $x^+ = arg \; max \; J(X_k + x), \quad where \; x \in Y - X_k$
- $X_{k+1} = X_k + x^+$
- $k = k + 1$
- $Go \; to \; Step \; 1$

We go through the **feature space** and look for the feature $x^+$ which maximizes our criterion if we add it to the current feature subset $X_k$; we then include this feature and repeat.

**Termination:**

- stop when $k$ equals the number of desired features

We add features to the new feature subset $X_k$ until we reach the number of specified features for our final subset. E.g., if our desired number of features is 5 and we start with the "null set" (*Initialization*), we would add features to the subset until it contains 5 features.

In [2]:

```
# Sebastian Raschka
# last updated: 03/29/2014
# Sequential Forward Selection (SFS)

def seq_forw_select(features, max_k, criterion_func, print_steps=False):
    """
    Implementation of a Sequential Forward Selection algorithm.

    Keyword Arguments:
        features (list): The feature space as a list of features.
        max_k: Termination criterion; the size of the returned feature subset.
        criterion_func (function): Function that is used to evaluate the
            performance of the feature subset.
        print_steps (bool): Prints the algorithm procedure if True.

    Returns the selected feature subset, a list of features of length max_k.

    """
    # Initialization
    feat_sub = []
    k = 0
    d = len(features)
    if max_k > d:
        max_k = d

    while True:

        # Inclusion step
        if print_steps:
            print('\nInclusion from feature space', features)
        crit_func_max = criterion_func(feat_sub + [features[0]])
        best_feat = features[0]
        for x in features[1:]:
            crit_func_eval = criterion_func(feat_sub + [x])
            if crit_func_eval > crit_func_max:
                crit_func_max = crit_func_eval
                best_feat = x
        feat_sub.append(best_feat)
        if print_steps:
            print('include: {} -> feature subset: {}'.format(best_feat, feat_sub))
        features.remove(best_feat)

        # Termination condition
        k = len(feat_sub)
        if k == max_k:
            break

    return feat_sub
```

In this example, we take a look at the individual steps of the **SFS** algorithm to select a feature subset.

The input feature space consists of 10 integers: 6, 3, 1, 6, 8, 2, 3, 7, 9, 1,

and our criterion is to find a subset of size 3 in this feature space.

In [3]:

```
def example_seq_forw_select():
    ex_features = [6, 3, 1, 6, 8, 2, 3, 7, 9, 1]
    res_forw = seq_forw_select(features=ex_features, max_k=3,
                               criterion_func=simple_crit_func, print_steps=True)
    return res_forw

# Run example
res_forw = example_seq_forw_select()
print('\nRESULT: [6, 3, 1, 6, 8, 2, 3, 7, 9, 1] ->', res_forw)
```

The returned result is definitely what we would expect: the 3 highest values (note that we defined 3 as the number of desired features for our subset) in the input feature list, since our **criterion** is to select the numerical values (= features) that yield the maximum mathematical sum in the feature subset.
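As a quick sanity check (not part of the original example cells), we can confirm by brute force that the sum criterion over all 3-element subsets is indeed maximized by the three largest values:

```python
from itertools import combinations

def simple_crit_func(feat_sub):
    """Sum criterion, repeated here so the snippet is self-contained."""
    return sum(feat_sub)

features = [6, 3, 1, 6, 8, 2, 3, 7, 9, 1]

# Exhaustively score all 3-element subsets and keep the best one.
best = max(combinations(features, 3), key=simple_crit_func)
print(sorted(best), simple_crit_func(best))
```

Exhaustive search is only feasible here because the feature space is tiny; its cost grows combinatorially with $d$, which is exactly why the greedy sequential algorithms are used instead.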

The **Sequential Backward Selection (SBS)** algorithm is very similar to the **SFS** algorithm, except that it works in the opposite direction: instead of adding features, it removes them.

Note that features are never added back once they have been removed, which (similar to **SFS**) is one of the biggest downsides of this algorithm.

**Input:**

the set of all features,

- $Y = \{y_1, y_2, ..., y_d\}$

The **SBS** algorithm takes the whole feature set as input; if our feature space consists of, e.g., 10 dimensions, then $d = 10$.

**Output:**

a subset of features,

- $X_k = \{x_j \; | \; j = 1, 2, ..., k; \; x_j \in Y\}, \quad where \; k = (0, 1, 2, ..., d)$

The returned output of the algorithm is a subset of the feature space of a specified size. E.g., a subset of 5 features from a 10-dimensional feature space ($k = 5, d = 10$).

**Initialization:**

- $X_0 = Y, \; k = d$

We initialize the algorithm with the complete feature set so that $k = d$ (where $k$ equals the size $d$ of the full feature set).

**Step 1 (Exclusion):**

- $x^- = arg \; max \; J(X_k - x), \quad where \; x \in X_k$
- $X_{k-1} = X_k - x^-$
- $k = k - 1$
- $Go \; to \; Step \; 1$

We go through the current **feature subset** and look for the feature $x^-$ whose removal decreases our criterion the least (i.e., the removal that maximizes $J(X_k - x)$); we then remove it from the subset and repeat.

**Termination:**

- stop when $k$ equals the number of desired features

We remove features from the feature subset $X_k$ until we reach the number of specified features for our final subset. E.g., if our desired number of features is 5 and we start with the entire feature space (*Initialization*), we would remove features from the subset until it contains 5 features.

In [4]:

```
# Sebastian Raschka
# last updated: 03/29/2014
# Sequential Backward Selection (SBS)

from copy import deepcopy

def seq_backw_select(features, max_k, criterion_func, print_steps=False):
    """
    Implementation of a Sequential Backward Selection algorithm.

    Keyword Arguments:
        features (list): The feature space as a list of features.
        max_k: Termination criterion; the size of the returned feature subset.
        criterion_func (function): Function that is used to evaluate the
            performance of the feature subset.
        print_steps (bool): Prints the algorithm procedure if True.

    Returns the selected feature subset, a list of features of length max_k.

    """
    # Initialization
    feat_sub = deepcopy(features)
    k = len(feat_sub)

    while True:

        # Exclusion step
        if print_steps:
            print('\nExclusion from feature subset', feat_sub)
        worst_feat = len(feat_sub) - 1
        worst_feat_val = feat_sub[worst_feat]
        crit_func_max = criterion_func(feat_sub[:-1])
        for i in reversed(range(0, len(feat_sub) - 1)):
            crit_func_eval = criterion_func(feat_sub[:i] + feat_sub[i+1:])
            if crit_func_eval > crit_func_max:
                worst_feat, crit_func_max = i, crit_func_eval
                worst_feat_val = feat_sub[worst_feat]
        del feat_sub[worst_feat]
        if print_steps:
            print('exclude: {} -> feature subset: {}'.format(worst_feat_val, feat_sub))

        # Termination condition
        k = len(feat_sub)
        if k == max_k:
            break

    return feat_sub
```
```

Like we did for the **SFS example** above, we take a look at the individual steps of the **SBS** algorithm to select a feature subset.

The input feature space consists of 10 integers: 6, 3, 1, 6, 8, 2, 3, 7, 9, 1,

and our criterion is to find a subset of size 3 in this feature space.

In [5]:

```
def example_seq_backw_select():
    ex_features = [6, 3, 1, 6, 8, 2, 3, 7, 9, 1]
    res_backw = seq_backw_select(features=ex_features, max_k=3,
                                 criterion_func=simple_crit_func, print_steps=True)
    return res_backw

# Run example
res_backw = example_seq_backw_select()
print('\nRESULT: [6, 3, 1, 6, 8, 2, 3, 7, 9, 1] ->', res_backw)
```

The returned **feature subset** is equivalent to the result of the **SFS** algorithm above: it contains the same 3 largest values, just in a different order.

The **"Plus L take away R" (+L -R)** algorithm is basically a combination of the **SFS** and **SBS** algorithms.

**Variant 1: L > R**

If *L > R*, the algorithm starts with an empty **feature subset** and adds *L* features (via consecutive **SFS**-like inclusion steps) before it removes *R* features again (via consecutive **SBS**-like exclusion steps).

Those steps are repeated until the desired number of features is reached.

Else, if *R > L*, the algorithm starts with the complete feature set, first removes *R* features, and then adds *L* features back.

Again, those steps are repeated until the desired number of features is reached.

**Input:**

the set of all features,

- $Y = \{y_1, y_2, ..., y_d\}$

The **+L -R** algorithm takes the whole feature set as input; if our feature space consists of, e.g., 10 dimensions, then $d = 10$.

**Output:**

a subset of features,

- $X_k = \{x_j \; | \; j = 1, 2, ..., k; \; x_j \in Y\}, \quad where \; k = (0, 1, 2, ..., d)$

The returned output of the algorithm is a subset of the feature space of a specified size. E.g., a subset of 5 features from a 10-dimensional feature space ($k = 5, \; d = 10$).

**Initialization:**

- $X_0 = \emptyset, \; k = 0 \quad (if \; L > R)$
- $X_0 = Y, \; k = d \quad (if \; L < R)$

If $L > R$, we initialize the algorithm with an empty set ("null set"); otherwise ($L < R$), we start with the complete feature set, matching the order of steps discussed above.

**Step 1 (Inclusion):**

*repeat L times:*

- $x^+ = arg \; max \; J(X_k + x), \quad where \; x \in Y - X_k$
- $X_{k+1} = X_k + x^+$
- $k = k + 1$

- $Go \; to \; Step \; 2$

**Step 2 (Exclusion):**

*repeat R-times:*- $x^- = arg max J(x_k - x), \quad where \; x ∈ X_k$
- $X_k-1 = X_k - x^-$
- $k = k - 1$

- $Go \; to \; Step 1$

In step 1, we repeat the inclusion *L* times: in each pass, we go through the **feature space** and look for the feature $x^+$ which maximizes our criterion if we add it to the current feature subset.

In step 2, we repeat the exclusion *R* times: in each pass, we go through the current feature subset and remove the feature $x^-$ whose removal decreases our criterion the least.

Note that this order of steps only applies if $L > R$; in the opposite case ($R > L$), we have to start with the **Exclusion** step (step 2) and the complete feature set.

**Termination:**

- stop when $k$ equals the number of desired features

We add and remove features from the feature subset $X_k$ until we reach the number of specified features for our final subset. E.g., if our desired number of features is 5 and we start with the entire feature space (*Initialization*), we would remove features from the subset until it contains 5 features.

In [6]:

```
# Sebastian Raschka
# last updated: 03/29/2014
# "Plus L take away R" (+L -R)

from copy import deepcopy

def plus_L_minus_R(features, max_k, criterion_func, L=3, R=2, print_steps=False):
    """
    Implementation of a "Plus l take away r" algorithm.

    Keyword Arguments:
        features (list): The feature space as a list of features.
        max_k: Termination criterion; the size of the returned feature subset.
        criterion_func (function): Function that is used to evaluate the
            performance of the feature subset.
        L (int): Number of features added per iteration.
        R (int): Number of features removed per iteration.
        print_steps (bool): Prints the algorithm procedure if True.

    Returns the selected feature subset, a list of features of length max_k.

    """
    assert L != R, 'L must be != R to avoid an infinite loop'

    ############################
    ### +L -R for case L > R ###
    ############################
    if L > R:
        # Initialization
        feat_sub = []
        k = 0

        while True:

            # +L (Inclusion)
            if print_steps:
                print('\nInclusion from features', features)
            for i in range(L):
                if len(features) > 0:
                    crit_func_max = criterion_func(feat_sub + [features[0]])
                    best_feat = features[0]
                    if len(features) > 1:
                        for x in features[1:]:
                            crit_func_eval = criterion_func(feat_sub + [x])
                            if crit_func_eval > crit_func_max:
                                crit_func_max = crit_func_eval
                                best_feat = x
                    features.remove(best_feat)
                    feat_sub.append(best_feat)
                    if print_steps:
                        print('include: {} -> feature_subset: {}'.format(best_feat, feat_sub))

            # -R (Exclusion)
            if print_steps:
                print('\nExclusion from feature_subset', feat_sub)
            for i in range(R):
                if len(features) + len(feat_sub) > max_k:
                    worst_feat = len(feat_sub) - 1
                    worst_feat_val = feat_sub[worst_feat]
                    crit_func_max = criterion_func(feat_sub[:-1])
                    for j in reversed(range(0, len(feat_sub) - 1)):
                        crit_func_eval = criterion_func(feat_sub[:j] + feat_sub[j+1:])
                        if crit_func_eval > crit_func_max:
                            worst_feat, crit_func_max = j, crit_func_eval
                            worst_feat_val = feat_sub[worst_feat]
                    del feat_sub[worst_feat]
                    if print_steps:
                        print('exclude: {} -> feature subset: {}'.format(worst_feat_val, feat_sub))

            # Termination condition
            k = len(feat_sub)
            if k == max_k:
                break
        return feat_sub

    ############################
    ### +L -R for case L < R ###
    ############################
    else:
        # Initialization
        feat_sub = deepcopy(features)
        k = len(feat_sub)
        count = 0

        while True:
            count += 1

            # Exclusion step
            removed_feats = []
            if print_steps:
                print('\nExclusion from feature subset', feat_sub)
            for i in range(R):
                if len(feat_sub) > max_k:
                    worst_feat = len(feat_sub) - 1
                    worst_feat_val = feat_sub[worst_feat]
                    crit_func_max = criterion_func(feat_sub[:-1])
                    for i in reversed(range(0, len(feat_sub) - 1)):
                        crit_func_eval = criterion_func(feat_sub[:i] + feat_sub[i+1:])
                        if crit_func_eval > crit_func_max:
                            worst_feat, crit_func_max = i, crit_func_eval
                            worst_feat_val = feat_sub[worst_feat]
                    removed_feats.append(feat_sub.pop(worst_feat))
            if print_steps:
                print('exclude: {} -> feature subset: {}'.format(removed_feats, feat_sub))

            # +L (Inclusion)
            included_feats = []
            if len(feat_sub) != max_k:
                for i in range(L):
                    if len(removed_feats) > 0:
                        crit_func_max = criterion_func(feat_sub + [removed_feats[0]])
                        best_feat = removed_feats[0]
                        if len(removed_feats) > 1:
                            for x in removed_feats[1:]:
                                crit_func_eval = criterion_func(feat_sub + [x])
                                if crit_func_eval > crit_func_max:
                                    crit_func_max = crit_func_eval
                                    best_feat = x
                        removed_feats.remove(best_feat)
                        feat_sub.append(best_feat)
                        included_feats.append(best_feat)
                if print_steps:
                    print('\nInclusion from removed features', removed_feats)
                    print('include: {} -> feature_subset: {}'.format(included_feats, feat_sub))

            # Termination condition
            k = len(feat_sub)
            if k == max_k:
                break
            if count >= 30:  # safety cap to avoid endless cycling
                break
        return feat_sub
```

Like we did for the **SFS** example above, let's have a look at the individual steps of the **+L -R** algorithm to select a feature subset.

Again, the input feature space consists of the 10 integers: 6, 3, 1, 6, 8, 2, 3, 7, 9, 1,

and our criterion is to find a subset of size 3 in this feature space.

In [7]:

```
def example_plus_L_minus_R():
    ex_features = [6, 3, 1, 6, 8, 2, 3, 7, 9, 1]
    res_plmr = plus_L_minus_R(features=ex_features, max_k=3,
                              criterion_func=simple_crit_func,
                              L=3, R=2, print_steps=True)
    return res_plmr

# Run example
res = example_plus_L_minus_R()
print('\nRESULT: [6, 3, 1, 6, 8, 2, 3, 7, 9, 1] ->', res)
```

In [8]:

```
def example_plus_L_minus_R():
    ex_features = [6, 3, 1, 6, 8, 2, 3, 7, 9, 1]
    res_plmr = plus_L_minus_R(features=ex_features, max_k=3,
                              criterion_func=simple_crit_func,
                              L=2, R=3, print_steps=True)
    return res_plmr

# Run example
res = example_plus_L_minus_R()
print('\nRESULT: [6, 3, 1, 6, 8, 2, 3, 7, 9, 1] ->', res)
```

The returned **feature subset** really is suboptimal in this particular example for L > R (Example 1: L > R). This is due to the fact that we add multiple features to our subset before the first exclusion step takes place, and because - in this implementation - excluded features are not returned to the feature pool: strong features that are excluded early are lost for all later inclusion steps.

**Modifying the "Plus L take away R" algorithm**

This algorithm can be tweaked by conditionally adding back removed features to the **feature subset** after each **Exclusion** step, i.e., only when doing so improves the criterion; this idea of conditional steps leads to the "floating" variants discussed next.
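As a minimal sketch of this idea (a hypothetical helper, not part of the implementations above): after an exclusion round, previously removed features could be re-added one at a time, but only while doing so improves the criterion:

```python
def conditional_reinclude(feat_sub, removed_feats, criterion_func, max_k):
    """Hypothetical tweak: after an exclusion round, re-add removed
    features one at a time, but only while this improves the criterion
    and the subset is still smaller than max_k."""
    while removed_feats and len(feat_sub) < max_k:
        # Pick the removed feature whose re-inclusion scores best.
        best = max(removed_feats, key=lambda x: criterion_func(feat_sub + [x]))
        if criterion_func(feat_sub + [best]) <= criterion_func(feat_sub):
            break  # no removed feature improves the subset anymore
        removed_feats.remove(best)
        feat_sub.append(best)
    return feat_sub

# Example with the sum criterion: only improving features come back.
print(conditional_reinclude([1], [5, 2], sum, 2))
```

With a sum criterion and the subset `[1]`, only the improving feature `5` is re-included before the size limit is reached; features that would not raise the criterion stay out.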

The **Sequential Floating Forward Selection (SFFS)** algorithm can be considered an extension of the simpler **SFS** algorithm: after each forward (inclusion) step, it performs a conditional backward (exclusion) step if that improves the criterion.

**Input:**

the set of all features,

- $Y = \{y_1, y_2, ..., y_d\}$

The **SFFS** algorithm takes the whole feature set as input; if our feature space consists of, e.g., 10 dimensions, then $d = 10$.

**Output:**

a subset of features,

- $X_k = \{x_j \; | \; j = 1, 2, ..., k; \; x_j \in Y\}, \quad where \; k = (0, 1, 2, ..., d)$

The returned output of the algorithm is a subset of the feature space of a specified size. E.g., a subset of 5 features from a 10-dimensional feature space ($k = 5, \; d = 10$).

**Initialization:**

- $X_0 = \emptyset, \quad k = 0$

We initialize the algorithm with an empty set ("null set") so that $k = 0$ (where $k$ is the size of the subset).

**Step 1 (Inclusion):**

- $x^+ = arg \; max \; J(X_k + x), \quad where \; x \in Y - X_k$
- $X_{k+1} = X_k + x^+$
- $k = k + 1$
- $Go \; to \; Step \; 2$

**Step 2 (Conditional Exclusion):**

- $x^- = arg \; max \; J(X_k - x), \quad where \; x \in X_k$
- $if \; J(X_k - x^-) > J(X_k):$
    - $X_{k-1} = X_k - x^-$
    - $k = k - 1$

- $Go \; to \; Step \; 1$

In step 1, we include the feature from the **feature space** that leads to the best performance increase for our **feature subset** (assuming that $k$ features have already been selected); then we proceed to step 2.

In step 2, we only remove a feature if the resulting subset would gain an increase in performance. We go back to step 1.

Steps 1 and 2 are repeated until the **Termination** criterion is reached.

**Termination:**

- stop when $k$ equals the number of desired features

In [9]:

```
# Sebastian Raschka
# last updated: 03/29/2014
# Sequential Floating Forward Selection (SFFS)

def seq_float_forw_select(features, max_k, criterion_func, print_steps=False):
    """
    Implementation of Sequential Floating Forward Selection.

    Keyword Arguments:
        features (list): The feature space as a list of features.
        max_k: Termination criterion; the size of the returned feature subset.
        criterion_func (function): Function that is used to evaluate the
            performance of the feature subset.
        print_steps (bool): Prints the algorithm procedure if True.

    Returns the selected feature subset, a list of features of length max_k.

    """
    # Initialization
    feat_sub = []
    k = 0

    while True:

        # Step 1: Inclusion
        if print_steps:
            print('\nInclusion from features', features)
        if len(features) > 0:
            crit_func_max = criterion_func(feat_sub + [features[0]])
            best_feat = features[0]
            if len(features) > 1:
                for x in features[1:]:
                    crit_func_eval = criterion_func(feat_sub + [x])
                    if crit_func_eval > crit_func_max:
                        crit_func_max = crit_func_eval
                        best_feat = x
            features.remove(best_feat)
            feat_sub.append(best_feat)
            if print_steps:
                print('include: {} -> feature_subset: {}'.format(best_feat, feat_sub))

        # Step 2: Conditional Exclusion
        worst_feat_val = None
        if len(features) + len(feat_sub) > max_k:
            crit_func_max = criterion_func(feat_sub)
            for i in reversed(range(0, len(feat_sub))):
                crit_func_eval = criterion_func(feat_sub[:i] + feat_sub[i+1:])
                if crit_func_eval > crit_func_max:
                    worst_feat, crit_func_max = i, crit_func_eval
                    worst_feat_val = feat_sub[worst_feat]
        if worst_feat_val is not None:  # only exclude if it improved the criterion
            del feat_sub[worst_feat]
            if print_steps:
                print('exclude: {} -> feature subset: {}'.format(worst_feat_val, feat_sub))

        # Termination condition
        k = len(feat_sub)
        if k == max_k:
            break

    return feat_sub
```
```

Since the **Exclusion** step in the **SFFS** algorithm is conditional, it would never be triggered by our deterministic `simple_crit_func()`, which returns the integer sum of a feature subset: removing a positive value can never increase the sum. Thus, let us define another simple criterion function for testing our **SFFS** implementation.

Just as we did for the previous examples above, let's take a look at the individual steps of the **SFFS** algorithm to select a feature subset.

Also here, the input feature space consists of the 10 integers: 6, 3, 1, 6, 8, 2, 3, 7, 9, 1,

and our criterion is to find a subset of size 3 in this feature space.

The criterion function we define below also calculates the sum of a subset, similar to the `simple_crit_func()` we used before. However, here we add a random integer ranging from -15 to 15 to the returned sum. Therefore, on some occasions, our criterion function can return a larger value for a smaller subset - after we remove a feature from the subset following the **Inclusion** step - which triggers the conditional **Exclusion** step.

In [10]:

```
from random import randint

def simple_rand_crit_func(feat_sub):
    """
    Returns sum of numerical values of an input list plus
    a random integer ranging from -15 to 15.
    """
    return sum(feat_sub) + randint(-15, 15)

# Example:
simple_rand_crit_func([1, 2, 4])
```

Out[10]:

In [11]:

```
def example_seq_float_forw_select():
    ex_features = [6, 3, 1, 6, 8, 2, 3, 7, 9, 1]
    res_seq_flforw = seq_float_forw_select(features=ex_features, max_k=3,
                                           criterion_func=simple_rand_crit_func,
                                           print_steps=True)
    return res_seq_flforw

# Run example
res_seq_flforw = example_seq_float_forw_select()
print('\nRESULT: [6, 3, 1, 6, 8, 2, 3, 7, 9, 1] ->', res_seq_flforw)
```

Just as in the **SFFS** algorithm, the **Sequential Floating Backward Selection (SFBS)** algorithm has a conditional step: here, we start with the whole feature set and exclude features sequentially. Only if adding one of the previously excluded features back to the current subset would improve the performance is it included again.

**Input:**

the set of all features,

- $Y = \{y_1, y_2, ..., y_d\}$

The **SFBS** algorithm takes the whole feature set as input; if our feature space consists of, e.g., 10 dimensions, then $d = 10$.

**Output:**

a subset of features,

- $X_k = \{x_j \; | \; j = 1, 2, ..., k; \; x_j \in Y\}, \quad where \; k = (0, 1, 2, ..., d)$

The returned output of the algorithm is a subset of the feature space of a specified size. E.g., a subset of 5 features from a 10-dimensional feature space ($k = 5,\; d = 10$).

**Initialization:**

- $X_0 = Y, \quad k = d$

We initialize the algorithm with the complete feature set so that $k = d$ (where $k$ equals the size $d$ of the full feature set).

**Step 1 (Exclusion):**

- $x^- = arg \; max \; J(X_k - x), \quad where \; x \in X_k$
- $X_{k-1} = X_k - x^-$
- $k = k - 1$
- $Go \; to \; Step \; 2$

**Step 2 (Conditional Inclusion):**

- $x^+ = arg \; max \; J(X_k + x), \quad where \; x \in Y - X_k$
- $if \; J(X_k + x^+) > J(X_k):$
    - $X_{k+1} = X_k + x^+$
    - $k = k + 1$

- $Go \; to \; Step \; 1$

In step 1, we exclude the feature from the current **feature subset** whose removal yields the best performance for the remaining subset; then we proceed to step 2.

In step 2, we only include one of the removed features if the resulting subset would gain an increase in performance. We go back to step 1.

Steps 1 and 2 are repeated until the **Termination** criterion is reached.

In [16]:

```
# Sebastian Raschka
# last updated: 03/29/2014
# Sequential Floating Backward Selection (SFBS)

from copy import deepcopy

def seq_float_backw_select(features, max_k, criterion_func, print_steps=False):
    """
    Implementation of Sequential Floating Backward Selection.

    Keyword Arguments:
        features (list): The feature space as a list of features.
        max_k: Termination criterion; the size of the returned feature subset.
        criterion_func (function): Function that is used to evaluate the
            performance of the feature subset.
        print_steps (bool): Prints the algorithm procedure if True.

    Returns the selected feature subset, a list of features of length max_k.

    """
    # Initialization
    feat_sub = deepcopy(features)
    k = len(feat_sub)
    excluded_features = []

    while True:

        # Termination condition
        k = len(feat_sub)
        if k == max_k:
            break

        # Step 1: Exclusion
        if print_steps:
            print('\nExclusion from feature subset', feat_sub)
        worst_feat = len(feat_sub) - 1
        worst_feat_val = feat_sub[worst_feat]
        crit_func_max = criterion_func(feat_sub[:-1])
        for i in reversed(range(0, len(feat_sub) - 1)):
            crit_func_eval = criterion_func(feat_sub[:i] + feat_sub[i+1:])
            if crit_func_eval > crit_func_max:
                worst_feat, crit_func_max = i, crit_func_eval
                worst_feat_val = feat_sub[worst_feat]
        excluded_features.append(feat_sub[worst_feat])
        del feat_sub[worst_feat]
        if print_steps:
            print('exclude: {} -> feature subset: {}'.format(worst_feat_val, feat_sub))

        # Step 2: Conditional Inclusion
        if len(excluded_features) > 0 and len(feat_sub) != max_k:
            best_feat = None
            best_feat_val = None
            crit_func_max = criterion_func(feat_sub)
            for i in range(len(excluded_features)):
                crit_func_eval = criterion_func(feat_sub + [excluded_features[i]])
                if crit_func_eval > crit_func_max:
                    best_feat, crit_func_max = i, crit_func_eval
                    best_feat_val = excluded_features[best_feat]
            if best_feat is not None:  # only include if it improved the criterion
                feat_sub.append(excluded_features[best_feat])
                del excluded_features[best_feat]
                if print_steps:
                    print('include: {} -> feature subset: {}'.format(best_feat_val, feat_sub))

    return feat_sub
```

Note that the **Inclusion** step in the **SFBS** algorithm is conditional, just like the **Exclusion** step in the **SFFS** algorithm.

Therefore, we have to be a little careful about the choice of the criterion function: if we used our deterministic `simple_crit_func()`, which returns the integer sum of a subset, we would trigger an infinite loop and never reach the termination criterion - assuming that our feature space consists of positive integers. The reason is that `simple_crit_func()` always returns a larger sum if we include another positive integer (our feature) back into the subset, so every exclusion would be undone by a conditional inclusion.

In order to reduce the number of iterations, we set the number of desired features for the returned feature subset (`max_k`) to 7.

In [18]:

```
def example_seq_float_backw_select():
    ex_features = [6, 3, 1, 6, 8, 2, 3, 7, 9, 1]
    res_seq_flbackw = seq_float_backw_select(features=ex_features, max_k=7,
                                             criterion_func=simple_rand_crit_func,
                                             print_steps=True)
    return res_seq_flbackw

# Run example
res_seq_flbackw = example_seq_float_backw_select()
print('\nRESULT: [6, 3, 1, 6, 8, 2, 3, 7, 9, 1] ->', res_seq_flbackw)
```
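A quick check (outside the original notebook cells) makes the infinite-loop hazard discussed above concrete: with the plain sum criterion and strictly positive features, re-including any excluded feature always raises the criterion, so the conditional **Inclusion** step would fire on every iteration and undo each **Exclusion**:

```python
def simple_crit_func(feat_sub):
    """Sum criterion, repeated here so the snippet is self-contained."""
    return sum(feat_sub)

feat_sub = [6, 3, 1]   # a current feature subset
excluded = [8, 9]      # features excluded in earlier steps

# With positive integers, putting any excluded feature back always
# increases the sum, i.e. the conditional inclusion always triggers.
always_improves = all(
    simple_crit_func(feat_sub + [x]) > simple_crit_func(feat_sub)
    for x in excluded
)
print(always_improves)
```

This is exactly why the randomized `simple_rand_crit_func()` is used for the floating examples: it occasionally scores a smaller subset higher, so the conditional steps fire only sometimes and the algorithm can terminate.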

*to be continued ...*