What makes you sound female or male

The data comes from Kaggle's Gender Recognition by Voice dataset.

In [1]:

import pandas as pd

In [2]:

xy = pd.read_csv('data/voice.csv')

X = xy.drop('label', axis='columns')
y = xy['label']

In [3]:

from sklearn.model_selection import train_test_split

In [4]:

X_train, X_test, y_train, y_test = train_test_split(X, y)
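By default `train_test_split` holds out 25% of the rows and shuffles them differently on every run, so the accuracy reported below will vary slightly between runs. Here is a minimal sketch of a reproducible, class-balanced split. It uses a small synthetic stand-in for `voice.csv`; the values, the row count, and the seed are illustrative assumptions, not part of the original analysis:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for data/voice.csv: 40 rows, two numeric features and a label.
# The values are random and only illustrate the API.
rng = np.random.default_rng(0)
xy = pd.DataFrame({
    "meanfun": rng.uniform(0.08, 0.20, size=40),
    "sd": rng.uniform(0.02, 0.08, size=40),
    "label": ["female"] * 20 + ["male"] * 20,
})

X = xy.drop("label", axis="columns")
y = xy["label"]

# random_state fixes the shuffle; stratify keeps the female/male ratio equal in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

print(len(X_train), len(X_test))                  # 30 10
print(y_test.value_counts().sort_index().tolist())  # [5, 5]
```

Fixing `random_state` makes the test-set accuracy repeatable. Passing `stratify=y` matters most when one class is much rarer than the other.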

We'll train a random forest classifier on the training set and evaluate it on the held-out test set.

In [5]:

from sklearn.ensemble import RandomForestClassifier

In [6]:

rf = RandomForestClassifier(n_estimators=100)

In [7]:

rf.fit(X_train, y_train)

Out[7]:

RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features='auto', max_leaf_nodes=None,
            min_impurity_split=1e-07, min_samples_leaf=1,
            min_samples_split=2, min_weight_fraction_leaf=0.0,
            n_estimators=100, n_jobs=1, oob_score=False, random_state=None,
            verbose=0, warm_start=False)

In [8]:

from sklearn.metrics import accuracy_score

In [9]:

accuracy_score(y_test, rf.predict(X_test))

Out[9]:

0.98232323232323238

Nice! The model reaches over 98% accuracy on the held-out test set.
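Accuracy is the fraction of test rows whose predicted label matches the true label, so a single number can hide which kind of mistake the model makes. The sketch below uses hypothetical toy labels (not this model's predictions) to show how `confusion_matrix` breaks the same result down per class:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true and predicted labels for eight speakers.
y_true = ["female", "female", "female", "female", "male", "male", "male", "male"]
y_pred = ["female", "female", "female", "male", "male", "male", "male", "male"]

# Accuracy = matching predictions / all predictions = 7 / 8.
acc = accuracy_score(y_true, y_pred)
print(acc)  # 0.875

# Rows are true classes and columns are predicted classes, in the order given by `labels`.
cm = confusion_matrix(y_true, y_pred, labels=["female", "male"])
print(cm)
# [[3 1]
#  [0 4]]
```

Here the single error is a female speaker predicted as male. Accuracy alone would not reveal that.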

Explaining the classifier
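A random forest already gives a global, whole-dataset explanation through its impurity-based `feature_importances_`, which complements LIME's per-example view. The sketch below uses synthetic data. By construction (an illustrative assumption) only the `meanfun` column determines the label, while `sd` and `centroid` are pure noise:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: the label depends only on meanfun; sd and centroid are pure noise.
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "meanfun": rng.uniform(0.08, 0.20, size=n),
    "sd": rng.uniform(0.02, 0.08, size=n),
    "centroid": rng.uniform(0.10, 0.25, size=n),
})
# Hypothetical rule used only to generate labels: low mean fundamental frequency -> "male".
y = np.where(X["meanfun"] < 0.14, "male", "female")

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X, y)

# Pair each column name with its importance (they sum to 1) and sort from most to least important.
importances = pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances)
```

On the real data you would read the same attribute from the `rf` fitted on `X_train`. Impurity-based importances can favor features with many distinct values, so treat them as a rough global ranking, not a per-prediction explanation.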

In [10]:

from lime.lime_tabular import LimeTabularExplainer

In [11]:

features = list(X_train.columns)
explainer = LimeTabularExplainer(X_train.values, feature_names=features, class_names=['female', 'male'])
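LIME matches `class_names` to the columns of `rf.predict_proba` by position. scikit-learn orders those columns by `rf.classes_`, which for string labels is sorted alphabetically. The hard-coded `['female', 'male']` therefore lines up, but taking the names from the fitted model avoids a silent mismatch. Here is a small check on synthetic labels (an illustrative assumption):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic one-feature data; "male" appears first in y on purpose.
X = np.array([[0.10], [0.11], [0.17], [0.18]])
y = np.array(["male", "male", "female", "female"])

rf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# predict_proba columns follow rf.classes_, which is sorted regardless of the order in y.
class_names = rf.classes_.tolist()
print(class_names)                       # ['female', 'male']
print(rf.predict_proba([[0.10]]).shape)  # (1, 2)
```

Passing `class_names=rf.classes_.tolist()` to `LimeTabularExplainer` keeps the names aligned even if the labels change.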

In [16]:

# randomly pick an example
example = X_train.sample(1).values[0]

In [17]:

exp = explainer.explain_instance(example, rf.predict_proba)

In [18]:

exp.show_in_notebook()

This person's mean fundamental frequency (`meanfun`) is below 0.12, and according to the LIME explanation this is the main reason the model classified the person as male.
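Roughly, `explain_instance` works like this: it perturbs the chosen example, asks the model for probabilities on the perturbed copies, weights each copy by its closeness to the original, and fits a weighted linear model whose coefficients serve as the explanation. The sketch below illustrates that idea with plain NumPy and scikit-learn. It is a simplified illustration, not the `lime` package's actual implementation. For example, `lime` discretizes continuous features into quartiles by default, which is why its tabular explanations read as threshold conditions. The synthetic model, the Gaussian noise scale, and the kernel width are assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

data_rng = np.random.default_rng(0)

# Synthetic training data: column 0 plays the role of meanfun, column 1 is noise.
X_train = data_rng.uniform([0.08, 0.02], [0.20, 0.08], size=(300, 2))
y_train = np.where(X_train[:, 0] < 0.14, "male", "female")
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)


def explain_locally(instance, predict_proba, X_ref, class_index, n_samples=2000, seed=0):
    """Simplified LIME-style local surrogate (hypothetical helper, not the lime API)."""
    noise_rng = np.random.default_rng(seed)
    scale = X_ref.std(axis=0)
    # 1. Perturb the instance with Gaussian noise scaled per feature.
    samples = instance + noise_rng.normal(size=(n_samples, len(instance))) * scale
    # 2. Ask the black-box model for the probability of the class being explained.
    target = predict_proba(samples)[:, class_index]
    # 3. Weight samples by closeness to the instance (exponential kernel on scaled distance).
    offsets = (samples - instance) / scale
    dist = np.linalg.norm(offsets, axis=1)
    width = 0.75 * np.sqrt(len(instance))
    weights = np.exp(-(dist ** 2) / width ** 2)
    # 4. Fit a weighted linear surrogate; its coefficients are the local explanation.
    surrogate = Ridge(alpha=1.0).fit(offsets, target, sample_weight=weights)
    return surrogate.coef_


male_index = model.classes_.tolist().index("male")
example = np.array([0.13, 0.05])  # just below the hypothetical 0.14 boundary
coef = explain_locally(example, model.predict_proba, X_train, male_index)
# Expect a clearly negative weight for column 0 (lower pitch -> more "male")
# and a near-zero weight for the noise column.
print(coef)
```

A negative coefficient on the pitch column plays the same role as the "mean fundamental frequency below 0.12" condition in the LIME output: it says that, near this example, lowering `meanfun` pushes the model toward "male".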

Reference

https://github.com/marcotcr/lime

dreamgonfly@gmail.com