Gilles Louppe, 2017.
We illustrate how adversarial networks can be used to build a classifier whose output is forced to be independent of some chosen attribute. We follow the adversarial training setup described in "Learning to Pivot with Adversarial Networks" (Louppe, Kagan and Cranmer, 2016, arXiv:1611.01046).
In this notebook, we show more specifically how to build a fair classifier whose decisions are made independently of gender.
@article{louppe2016pivot,
    author        = {{Louppe}, G. and {Kagan}, M. and {Cranmer}, K.},
    title         = "{Learning to Pivot with Adversarial Networks}",
    journal       = {ArXiv e-prints},
    archivePrefix = "arXiv",
    eprint        = {1611.01046},
    primaryClass  = "stat.ML",
    year          = 2016,
    month         = nov,
}
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython import display
%matplotlib inline
We use the UCI Adult dataset, where the task is to predict whether someone makes over $50,000 a year.
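If `adult.data.txt` is not already available locally, it can be fetched from the UCI repository. A minimal sketch, assuming the usual UCI mirror URL and saving under the filename expected by the next cell:

import urllib.request

# Assumed UCI mirror URL; the file is saved under the name used below
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data"
urllib.request.urlretrieve(url, "adult.data.txt")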
original_data = pd.read_csv(
    "adult.data.txt",
    names=["Age", "Workclass", "fnlwgt", "Education", "Education-Num",
           "Marital Status", "Occupation", "Relationship", "Race", "Sex",
           "Capital Gain", "Capital Loss", "Hours per week", "Country", "Target"],
    sep=r'\s*,\s*', engine='python', na_values="?")
original_data.head()
| | Age | Workclass | fnlwgt | Education | Education-Num | Marital Status | Occupation | Relationship | Race | Sex | Capital Gain | Capital Loss | Hours per week | Country | Target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 39 | State-gov | 77516 | Bachelors | 13 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174 | 0 | 40 | United-States | <=50K |
| 1 | 50 | Self-emp-not-inc | 83311 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 13 | United-States | <=50K |
| 2 | 38 | Private | 215646 | HS-grad | 9 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0 | 0 | 40 | United-States | <=50K |
| 3 | 53 | Private | 234721 | 11th | 7 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0 | 0 | 40 | United-States | <=50K |
| 4 | 28 | Private | 338409 | Bachelors | 13 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0 | 0 | 40 | Cuba | <=50K |
# One-hot encode the categorical variables, then extract the target
# and the sensitive attribute
data = pd.get_dummies(original_data)
target = data["Target_>50K"].values
gender = data["Sex_Male"].values

# Drop the (redundant) target columns from the inputs. Note that the
# gender indicator columns remain among the features: independence will
# be enforced by the training procedure, not by deleting the attribute.
del data["Target_<=50K"]
del data["Target_>50K"]
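Before training anything, it is instructive to compare the base rates of the target across genders; the gap between them is exactly what a classifier trained for accuracy alone will exploit. A quick check (output not shown):

# Fraction of individuals making over $50K, per gender (output not shown)
print("P(>50K | male)   =", target[gender == 1].mean())
print("P(>50K | female) =", target[gender == 0].mean())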
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
X_train, X_test, y_train, y_test, gender_train, gender_test = train_test_split(
    data, target, gender, train_size=0.5)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
We first train a standard neural network on the training data.
import keras.backend as K
from keras.layers import Input, Dense
from keras.models import Model
from keras.optimizers import SGD
# Classifier: a three-hidden-layer MLP with a sigmoid output
inputs = Input(shape=(X_train.shape[1],))
Dx = Dense(32, activation="relu")(inputs)
Dx = Dense(32, activation="relu")(Dx)
Dx = Dense(32, activation="relu")(Dx)
Dx = Dense(1, activation="sigmoid")(Dx)
D = Model(input=[inputs], output=[Dx])
D.compile(loss="binary_crossentropy", optimizer="adam")
D.fit(X_train, y_train, nb_epoch=10)
Epoch 1/10
16280/16280 [==============================] - 1s - loss: 0.3781
Epoch 2/10
16280/16280 [==============================] - 0s - loss: 0.3205
Epoch 3/10
16280/16280 [==============================] - 0s - loss: 0.3073
Epoch 4/10
16280/16280 [==============================] - 0s - loss: 0.3013
Epoch 5/10
16280/16280 [==============================] - 0s - loss: 0.2958
Epoch 6/10
16280/16280 [==============================] - 0s - loss: 0.2916
Epoch 7/10
16280/16280 [==============================] - 1s - loss: 0.2876
Epoch 8/10
16280/16280 [==============================] - 0s - loss: 0.2843
Epoch 9/10
16280/16280 [==============================] - 0s - loss: 0.2801
Epoch 10/10
16280/16280 [==============================] - 1s - loss: 0.2776
from sklearn.metrics import roc_auc_score
y_pred = D.predict(X_test)
roc_auc_score(y_test, y_pred)
0.89805904237036704
Performance is good, but as the plot below illustrates, the distribution of the classifier output differs markedly by gender. In particular, the classifier has learned that women are less likely than men to make over $50,000 a year.
plt.hist(y_pred[gender_test == 1], bins=50, histtype="step", normed=1, label="M")
plt.hist(y_pred[gender_test == 0], bins=50, histtype="step", normed=1, label="F")
plt.ylim(0, 5)
plt.legend()
plt.grid()
plt.show()
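The gap between the two distributions can also be quantified, e.g. with a two-sample Kolmogorov-Smirnov test; a quick check using scipy (output not shown):

from scipy.stats import ks_2samp

# KS statistic between the male and female output distributions;
# a large statistic (tiny p-value) confirms they differ
ks_2samp(y_pred[gender_test == 1].ravel(),
         y_pred[gender_test == 0].ravel())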
The Pearson correlation coefficient between gender and the classifier output also clearly highlights this dependence.
from scipy.stats import pearsonr
pearsonr(gender_test, D.predict(X_test).ravel())
(0.29867010783449688, 0.0)
Let us now jointly train our classifier with an adversarial network. The goal of this second network is to predict gender from the classifier output alone. If the adversary succeeds, then the classifier output clearly carries information about the attribute. Accordingly, one can force the classifier to distort its decisions so as to make the adversarial network perform worse. This is the strategy we use below.
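Formally, following the paper, let $f$ be the classifier and $r$ the adversary, with respective losses $\mathcal{L}_f$ (cross-entropy of $f$ on the target) and $\mathcal{L}_r$ (cross-entropy of $r$ for predicting gender from the output of $f$). They are trained jointly by solving the minimax problem

$$\hat{\theta}_f, \hat{\theta}_r = \arg \min_{\theta_f} \max_{\theta_r} \, E_\lambda(\theta_f, \theta_r), \qquad E_\lambda(\theta_f, \theta_r) = \mathcal{L}_f(\theta_f) - \lambda \mathcal{L}_r(\theta_f, \theta_r),$$

where the hyperparameter $\lambda \geq 0$ controls the trade-off between the accuracy of $f$ and the independence of its output from the attribute.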
def make_trainable(network, flag):
    # Freeze or unfreeze all layers of a network at once
    network.trainable = flag
    for l in network.layers:
        l.trainable = flag
# Classifier f: same architecture as above
inputs = Input(shape=(X_train.shape[1],))
Dx = Dense(32, activation="relu")(inputs)
Dx = Dense(32, activation="relu")(Dx)
Dx = Dense(32, activation="relu")(Dx)
Dx = Dense(1, activation="sigmoid")(Dx)
D = Model(input=[inputs], output=[Dx])

# Adversary r: stacked on top of the classifier output f(X),
# it tries to predict gender from that single value
Rx = Dx
Rx = Dense(32, activation="relu")(Rx)
Rx = Dense(32, activation="relu")(Rx)
Rx = Dense(32, activation="relu")(Rx)
Rx = Dense(1, activation="sigmoid")(Rx)
R = Model(input=[inputs], output=[Rx])
lam = 10.0  # controls the trade-off between classification performance and independence
def make_loss_D(c):
    def loss_D(y_true, y_pred):
        # Keras 1 argument order: binary_crossentropy(output, target)
        return c * K.binary_crossentropy(y_pred, y_true)
    return loss_D

def make_loss_R(c):
    def loss_R(z_true, z_pred):
        return c * K.binary_crossentropy(z_pred, z_true)
    return loss_R
# D alone: standard classification loss
opt_D = SGD()
D.compile(loss=[make_loss_D(c=1.0)], optimizer=opt_D)

# DRf: trains D (with R frozen) to minimize L_f - lam * L_r;
# the negative weight on R's loss pushes the classifier to fool the adversary
opt_DRf = SGD(momentum=0.0)
DRf = Model(input=[inputs], output=[D(inputs), R(inputs)])
make_trainable(R, False)
make_trainable(D, True)
DRf.compile(loss=[make_loss_D(c=1.0), make_loss_R(c=-lam)], optimizer=opt_DRf)

# DfR: trains R (with D frozen) to minimize L_r
opt_DfR = SGD(momentum=0.0)
DfR = Model(input=[inputs], output=[R(inputs)])
make_trainable(R, True)
make_trainable(D, False)
DfR.compile(loss=[make_loss_R(c=1.0)], optimizer=opt_DfR)
# Pretraining of D
make_trainable(R, False)
make_trainable(D, True)
D.fit(X_train, y_train, nb_epoch=10)
Epoch 1/10
16280/16280 [==============================] - 1s - loss: 0.5117
Epoch 2/10
16280/16280 [==============================] - 0s - loss: 0.3982
Epoch 3/10
16280/16280 [==============================] - 0s - loss: 0.3617
Epoch 4/10
16280/16280 [==============================] - 0s - loss: 0.3459
Epoch 5/10
16280/16280 [==============================] - 0s - loss: 0.3365
Epoch 6/10
16280/16280 [==============================] - 1s - loss: 0.3299
Epoch 7/10
16280/16280 [==============================] - 0s - loss: 0.3249
Epoch 8/10
16280/16280 [==============================] - 0s - loss: 0.3209
Epoch 9/10
16280/16280 [==============================] - 0s - loss: 0.3175
Epoch 10/10
16280/16280 [==============================] - 0s - loss: 0.3141
# Pretraining of R
make_trainable(R, True)
make_trainable(D, False)
DfR.fit(X_train, gender_train, nb_epoch=10)
Epoch 1/10
16280/16280 [==============================] - 0s - loss: 0.6385
Epoch 2/10
16280/16280 [==============================] - 0s - loss: 0.6056
Epoch 3/10
16280/16280 [==============================] - 0s - loss: 0.5878
Epoch 4/10
16280/16280 [==============================] - 0s - loss: 0.5755
Epoch 5/10
16280/16280 [==============================] - 0s - loss: 0.5689
Epoch 6/10
16280/16280 [==============================] - 0s - loss: 0.5661
Epoch 7/10
16280/16280 [==============================] - 0s - loss: 0.5647
Epoch 8/10
16280/16280 [==============================] - 0s - loss: 0.5636
Epoch 9/10
16280/16280 [==============================] - 1s - loss: 0.5629
Epoch 10/10
16280/16280 [==============================] - 0s - loss: 0.5624
def plot_losses(i, losses):
    display.clear_output(wait=True)
    display.display(plt.gcf())

    ax1 = plt.subplot(311)
    values = np.array(losses["L_f"])
    plt.plot(range(len(values)), values, label=r"$L_f$", color="blue")
    plt.legend(loc="upper right")
    plt.grid()

    ax2 = plt.subplot(312, sharex=ax1)
    values = np.array(losses["L_r"]) / lam  # undo the -lam weighting to recover the raw adversary loss
    plt.plot(range(len(values)), values, label=r"$L_r$", color="green")
    plt.legend(loc="upper right")
    plt.grid()

    ax3 = plt.subplot(313, sharex=ax1)
    values = np.array(losses["L_f - L_r"])
    plt.plot(range(len(values)), values, label=r"$L_f - \lambda L_r$", color="red")
    plt.legend(loc="upper right")
    plt.grid()

    plt.show()
losses = {"L_f": [], "L_r": [], "L_f - L_r": []}
batch_size = 128

# Adversarial training: alternate one gradient step on the classifier
# (through DRf, adversary frozen) with one epoch of the adversary
# (through DfR, classifier frozen)
for i in range(201):
    # Track the components of the value function on the test set
    l = DRf.evaluate(X_test, [y_test, gender_test], verbose=0)
    losses["L_f - L_r"].append(l[0])
    losses["L_f"].append(l[1])
    losses["L_r"].append(-l[2])
    print(losses["L_r"][-1] / lam)

    if i % 5 == 0:
        plot_losses(i, losses)

    # Fit D on a single minibatch
    make_trainable(R, False)
    make_trainable(D, True)
    indices = np.random.permutation(len(X_train))[:batch_size]
    DRf.train_on_batch(X_train[indices], [y_train[indices], gender_train[indices]])

    # Fit R for one full epoch
    make_trainable(R, True)
    make_trainable(D, False)
    DfR.fit(X_train, gender_train, batch_size=batch_size, nb_epoch=1, verbose=1)
Epoch 1/1
16280/16280 [==============================] - 0s - loss: 0.6343
y_pred = D.predict(X_test)
roc_auc_score(y_test, y_pred)
0.86212246538727966
Performance is slightly worse, but as the plot and the Pearson correlation coefficient below show, the distribution of the classifier output is now almost independent of gender. In this sense, the classifier is now fair.
plt.hist(y_pred[gender_test == 1], bins=50, histtype="step", normed=1, label="M")
plt.hist(y_pred[gender_test == 0], bins=50, histtype="step", normed=1, label="F")
plt.ylim(0, 5)
plt.legend()
plt.grid()
plt.show()
from scipy.stats import pearsonr
pearsonr(gender_test, D.predict(X_test).ravel())
(0.018571555562095266, 0.017802639756038973)
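As a final sanity check, one can train a fresh model to recover gender from the pivoted classifier output alone; if the output is really (almost) independent of gender, such a model should do barely better than chance (AUC close to 0.5). A minimal sketch using scikit-learn's LogisticRegression on a split of the test outputs (output not shown; a nonlinear model would make for an even stronger test):

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Fit a post-hoc adversary on one half of the test outputs and
# measure how well it recovers gender on the other half
n = len(y_pred) // 2
adversary = LogisticRegression()
adversary.fit(y_pred[:n], gender_test[:n])
roc_auc_score(gender_test[n:], adversary.predict_proba(y_pred[n:])[:, 1])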