Workshop // Exploring Gender Bias in Word Embedding

https://learn.responsibly.ai/word-embedding

Powered by responsibly - Toolkit for Auditing and Mitigating Bias and Fairness of Machine Learning Systems 🔎🤖🧰

Legend:

💎 Important

⚡ Be Aware - Debated issue / interpret carefully / simplicity over precision

🛠️ Setup/Technical (a.k.a "the code is not important, just run it!")

🧪 Methodological Issue

💻 Hands-On - Your turn! NO programming background

⌨️ ... Some programming background (in Python) is required

🦄 Out of Scope

Part One: Setup

1.1 - 🛠️ Install responsibly

In [ ]:
!pip install --user git+https://github.com/ResponsiblyAI/[email protected]
# !pip install --user responsibly

1.2 - 🛠️ Validate Installation of responsibly

In [ ]:
import responsibly

# You should get '0.1.2'
responsibly.__version__

⚠️

If you get an error of ModuleNotFoundError: No module named 'responsibly' after the installation, and you work on either Colab or Binder - this is normal.

Restart the Kernel/Runtime (use the menu on top or the button in the notebook), skip the installation cell (!pip install --user responsibly) and run the previous cell again (import responsibly).

Now it should all work fine!


Part Two: Motivation - Why Use Word Embeddings?

2.1 - NLP (Natural Language Processing) - Very partial list of tasks

1. Classification

  • Fake news classification
  • Toxic comment classification
  • Review rating (sentiment analysis)
  • Hiring decision making by CV
  • Automated essay scoring

2. Machine Translation

3. Information Retrieval

  • Search engine
  • Plagiarism detection

4. Conversational chatbot

5. Coreference Resolution

Source: [Stanford Natural Language Processing Group](https://nlp.stanford.edu/projects/coref.shtml)





2.2 - Machine Learning (NLP) Pipeline


Data → Representation → (Structured) Inference → Prediction

                          ↑

        Auxiliary Corpus/Model

Source: [Kai-Wei Chang (UCLA) - What It Takes to Control Societal Bias in Natural Language Processing](https://www.youtube.com/watch?v=RgcXD_1Cu18)

2.3 - Essential Question - How to Represent Language to a Machine?

We need some kind of dictionary 📖 to transform/encode

... from a human representation (words) 🗣 🔡

... to a machine representation (numbers) 🤖 🔢





First Try

Idea: Bag of Words (for a document)

Source: Zheng, A. & Casari, A. (2018). Feature Engineering for Machine Learning. O'Reilly Media.

One-Hot Representation - The Issue with Text

Source: [Tensorflow Documentation](https://www.tensorflow.org/tutorials/representation/word2vec)
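To make the idea concrete, here is a minimal sketch (not part of the original workshop code) of one-hot and bag-of-words encodings over a tiny, made-up vocabulary:

In [ ]:
import numpy as np

# a tiny, hypothetical vocabulary - for illustration only
vocab = ['cat', 'dog', 'sat', 'on', 'the', 'mat']
word_to_index = {word: index for index, word in enumerate(vocab)}

def one_hot(word):
    """Encode a single word as a one-hot vector over the toy vocabulary."""
    vector = np.zeros(len(vocab))
    vector[word_to_index[word]] = 1
    return vector

def bag_of_words(document):
    """Encode a document as word counts - a sum of one-hot vectors."""
    return sum(one_hot(word) for word in document.split())

print('cat =', one_hot('cat'))
print('doc =', bag_of_words('the cat sat on the mat'))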

Color Picker





2.4 - 💎 Idea: Embedding a word in an n-dimensional space

Distributional Hypothesis

"a word is characterized by the company it keeps" - John Rupert Firth

🦄 Training: using word-context relationships from a corpus. See: The Illustrated Word2vec by Jay Alammar

Distance ~ Meaning Similarity

🦄 Examples (algorithms and pre-trained models)

🦄 State of the Art

The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning)

Part Three: Let's play with Word2Vec word embedding...!

Word2Vec - Google News - 100B tokens, 3M vocab, cased, 300d vectors - only lowercase vocab extracted

Loaded using the responsibly package; the function responsibly.we.load_w2v_small returns a gensim KeyedVectors object.

3.1 - Basic Properties

In [ ]:
# 🛠️⚡ ignore warnings
# generally, you shouldn't do that, but for this tutorial we'll do so for the sake of simplicity

import warnings
warnings.filterwarnings('ignore')
In [ ]:
from responsibly.we import load_w2v_small

w2v_small = load_w2v_small()
In [ ]:
# vocabulary size

len(w2v_small.vocab)
In [ ]:
# get the vector of the word "home"

print('home =', w2v_small['home'])
In [ ]:
# the word embedding dimension, in this case, is 300

len(w2v_small['home'])
In [ ]:
# all the word vectors are normalized (i.e., their norm is equal to one)

from numpy.linalg import norm

norm(w2v_small['home'])
In [ ]:
# 🛠️ make sure that all the vectors are normalized!

from numpy.testing import assert_almost_equal

length_vectors = norm(w2v_small.vectors, axis=1)

assert_almost_equal(actual=length_vectors,
                    desired=1,
                    decimal=5)

3.2 - 💎 Demo - Measuring Distance between Words

Measure of Similarity: Cosine Similarity

Measures the cosine of the angle between two vectors.

Ranges from 1 (same direction) to -1 (opposite/antipode direction)

In Python, for normalized vectors (NumPy arrays), use the @ (at) operator!
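For reference, for two word vectors $u$ and $v$:

$\text{cosine-similarity}(u, v) = \cos(\theta) = \dfrac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}$

Since all the vectors here are normalized ($\lVert u \rVert = \lVert v \rVert = 1$), this reduces to the dot product $u \cdot v$ - which is exactly what the @ operator computes below.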

In [ ]:
w2v_small['cat'] @ w2v_small['cat']
In [ ]:
w2v_small['cat'] @ w2v_small['cats']
In [ ]:
from math import acos, degrees

degrees(acos(w2v_small['cat'] @ w2v_small['cats']))
In [ ]:
w2v_small['cat'] @ w2v_small['dog']
In [ ]:
degrees(acos(w2v_small['cat'] @ w2v_small['dog']))
In [ ]:
w2v_small['cat'] @ w2v_small['cow']
In [ ]:
degrees(acos(w2v_small['cat'] @ w2v_small['cow']))
In [ ]:
w2v_small['cat'] @ w2v_small['graduated']
In [ ]:
degrees(acos(w2v_small['cat'] @ w2v_small['graduated']))

💎 In general, using word embeddings to encode words as input for NLP systems (*) improves their performance compared to a one-hot representation.

  • Sometimes the embedding is learned as part of the NLP system.

3.3 - 🛠️ Demo - Visualizing the Word Embedding in 2D using t-SNE

Source: [Google's Seedbank](https://research.google.com/seedbank/seed/pretrained_word_embeddings)

In [ ]:
from sklearn.manifold import TSNE
from matplotlib import pylab as plt

# take the words ranked 200-600 by frequency in the corpus
words = w2v_small.index2word[200:600]

# convert the words to vectors
embeddings = [w2v_small[word] for word in words]

# perform T-SNE
words_embedded = TSNE(n_components=2).fit_transform(embeddings)

# ... and visualize!
plt.figure(figsize=(20, 20))
for i, label in enumerate(words):
    x, y = words_embedded[i, :]
    plt.scatter(x, y)
    plt.annotate(label, xy=(x, y), xytext=(5, 2), textcoords='offset points',
                 ha='right', va='bottom', size=11)
plt.show()

3.4 - Demo - Tensorflow Embedding Projector

⚡ Be cautious: It is easy to see "patterns".

3.5 - Demo - Most Similar

What are the most similar (= closest) words to a given word?

In [ ]:
w2v_small.most_similar('cat')

3.6 - [EXTRA] Demo - Doesn't Match

Given a list of words, which one doesn't match?

The word furthest away from the mean of all the words.

In [ ]:
w2v_small.doesnt_match('breakfast cereal dinner lunch'.split())
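As a rough sketch of the idea (not gensim's exact implementation), we can compute the mean of the given word vectors and pick the word least similar to it:

In [ ]:
import numpy as np
from numpy.linalg import norm

def doesnt_match_sketch(model, words):
    """Return the word whose vector is least similar to the mean of all the vectors."""
    vectors = np.array([model[word] for word in words])
    mean = vectors.mean(axis=0)
    mean /= norm(mean)
    similarities = vectors @ mean  # cosine similarities (the word vectors are normalized)
    return words[similarities.argmin()]

doesnt_match_sketch(w2v_small, 'breakfast cereal dinner lunch'.split())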

3.7 - Demo - Vector Arithmetic

Source: [Wikipedia](https://commons.wikimedia.org/wiki/File:Vector_add_scale.svg)

In [ ]:
# nature + science = ?

w2v_small.most_similar(positive=['nature', 'science'])

3.8 - 💎 More Vector Arithmetic

Source: [Tensorflow Documentation](https://www.tensorflow.org/tutorials/representation/word2vec)

3.9 - Demo - Vector Analogy

In [ ]:
# man:king :: woman:?
# king - man + woman = ?

w2v_small.most_similar(positive=['king', 'woman'],
                       negative=['man'])
In [ ]:
w2v_small.most_similar(positive=['big', 'smaller'],
                       negative=['small'])

3.10 - Think about a DIRECTION in the word embedding as a RELATION

$\overrightarrow{she} - \overrightarrow{he}$

$\overrightarrow{smaller} - \overrightarrow{small}$

$\overrightarrow{Spain} - \overrightarrow{Madrid}$

⚡ Direction is not a word vector by itself!

⚡ But it doesn't work all the time...

In [ ]:
w2v_small.most_similar(positive=['forward', 'up'],
                       negative=['down'])

It might be because the phrase "looking forward" is associated with "excitement" in the data.

⚡🦄 Keep in mind that the word embedding was generated by learning the co-occurrence of words, so the fact that it empirically exhibits "concept arithmetic" doesn't necessarily mean it learned these concepts! In fact, it seems it didn't.

See: king - man + woman is queen; but why? by Piotr Migdał

🦄 Demo - Word Analogies Visualizer by Julia Bazińska

⚡🦄 In fact, w2v_small.most_similar finds the closest word that is not one of the given ones. This is a real methodological issue. Nowadays, it is not common practice to evaluate word embeddings with analogies.

You can use from responsibly.we import most_similar for the unrestricted version.

Part Four: Gender Bias

⚡ We use the word bias merely as a technical term, without judgment of "good" or "bad". Later on, we will put the bias into human contexts to evaluate it.

Keep in mind that the data is from Google News, whose writers are professional journalists.

Bolukbasi, T., Chang, K. W., Zou, J. Y., Saligrama, V., & Kalai, A. T. (2016). Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. NIPS 2016.

4.1 - Gender appropriate he-she analogies

In [ ]:
# she:sister :: he:?
# sister - she + he = ?

w2v_small.most_similar(positive=['sister', 'he'],
                       negative=['she'])
queen-king
waitress-waiter
sister-brother
mother-father
ovarian_cancer-prostate_cancer
convent-monastery

4.2 - Gender stereotype he-she analogies

In [ ]:
w2v_small.most_similar(positive=['nurse', 'he'],
                       negative=['she'])
sewing-carpentry
nurse-doctor
blond-burly
giggle-chuckle
sassy-snappy
volleyball-football
register_nurse-physician
interior_designer-architect
feminism-conservatism
vocalist-guitarist
diva-superstar
cupcakes-pizzas
housewife-shopkeeper
softball-baseball
cosmetics-pharmaceuticals
petite-lanky
charming-affable
hairdresser-barber

But with the unrestricted version...

In [ ]:
from responsibly.we import most_similar
In [ ]:
most_similar(w2v_small,
             positive=['nurse', 'he'],
             negative=['she'])

⚡ Be Aware: According to a recent paper, it seems that the method of generating analogies itself forces the production of gender-stereotyped ones!

Nissim, M., van Noord, R., van der Goot, R. (2019). Fair is Better than Sensational: Man is to Doctor as Woman is to Doctor.

... and a Twitter thread between the authors of the two papers.

My takeaway (shared by other researchers): Analogies are not an appropriate method for observing bias in word embeddings.

🧪 What if our methodology introduces a bias?

4.3 - 💎 What can we take from analogies? Gender Direction!

$\overrightarrow{she} - \overrightarrow{he}$

In [ ]:
gender_direction = w2v_small['she'] - w2v_small['he']

gender_direction /= norm(gender_direction)
In [ ]:
gender_direction @ w2v_small['architect']
In [ ]:
gender_direction @ w2v_small['interior_designer']

⚡ Interpret carefully: The word architect appears in more contexts with he than with she, and vice versa for interior designer.

🦄 In practice, we calculate the gender direction using multiple definitional pairs of words for a better estimate (words may have more than one meaning) - see the sketch after the list:

  • woman - man
  • girl - boy
  • she - he
  • mother - father
  • daughter - son
  • gal - guy
  • female - male
  • her - his
  • herself - himself
  • Mary - John
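A minimal sketch of one way to combine such definitional pairs into a single direction. Note the assumptions: responsibly and Bolukbasi et al. use PCA over the pair differences, while the simpler sum of normalized differences below is just an illustrative approximation; pairs missing from the reduced (lowercase-only) vocabulary, such as Mary-John, are skipped.

In [ ]:
import numpy as np
from numpy.linalg import norm

definitional_pairs = [('woman', 'man'), ('girl', 'boy'), ('she', 'he'),
                      ('mother', 'father'), ('daughter', 'son'), ('gal', 'guy'),
                      ('female', 'male'), ('her', 'his'), ('herself', 'himself')]

# sum the normalized difference of every pair, then normalize the result
multi_pair_direction = np.zeros_like(w2v_small['she'])

for female_word, male_word in definitional_pairs:
    if female_word not in w2v_small.vocab or male_word not in w2v_small.vocab:
        continue  # skip pairs missing from the reduced vocabulary
    difference = w2v_small[female_word] - w2v_small[male_word]
    multi_pair_direction += difference / norm(difference)

multi_pair_direction /= norm(multi_pair_direction)

multi_pair_direction @ w2v_small['architect']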

4.4 - 💻 Try some words by yourself

⚡ Keep in mind: You are performing exploratory data analysis, not a systematic evaluation!

In [ ]:
# 💻 replace 'word' with a word of your choice, e.g. 'nurse' or 'architect'
gender_direction @ w2v_small['word']

4.5 - 💎 So What?

Downstream Application - Putting a system into a human context

Toy Example - Search Engine Ranking

  • "MIT computer science PhD student"
  • "doctoral candidate" ~ "PhD student"
  • John:computer programmer :: Mary:homemaker

Universal Embeddings

  • Pre-trained on a large corpus
  • Plugged into downstream task models (sentiment analysis, classification, translation …)
  • Improved performance

4.6 - Measuring Bias in Word Embedding

Think-Pair-Share


Basic Idea: Use gender-neutral words!


Neutral Professions!

4.7 - Projections

In [ ]:
from responsibly.we import GenderBiasWE

w2v_small_gender_bias = GenderBiasWE(w2v_small, only_lower=True)
In [ ]:
w2v_small_gender_bias.positive_end, w2v_small_gender_bias.negative_end
In [ ]:
# gender direction
w2v_small_gender_bias.direction[:10]
In [ ]:
from responsibly.we.data import BOLUKBASI_DATA

neutral_profession_names = BOLUKBASI_DATA['gender']['neutral_profession_names']
In [ ]:
neutral_profession_names[:8]

Note: Why is actor in the neutral profession names list while actress is not?

  1. Due to the statistical nature of the method that is used to find the gender-specific and gender-neutral words
  2. It might also be because actor nowadays is much more gender-neutral, compared to waiter-waitress (see Wikipedia - The term Actress)
In [ ]:
len(neutral_profession_names)
In [ ]:
# the same as using the @ operator with the bias direction

w2v_small_gender_bias.project_on_direction(neutral_profession_names[0])

Let's visualize the projections of profession names (gender-neutral and gender-specific by orthography) on the gender direction

In [ ]:
import matplotlib.pylab as plt

f, ax = plt.subplots(1, figsize=(10, 10))

w2v_small_gender_bias.plot_projection_scores(n_extreme=20, ax=ax);

4.8 - Demo - Visualizing gender bias with Word Clouds

Let's take the percentage of females in various occupations from the Labor Force Statistics of the 2017 Current Population Survey.

Taken from: https://arxiv.org/abs/1804.06876

In [ ]:
from operator import itemgetter  # 🛠️ For idiomatic sorting in Python

from responsibly.we.data import OCCUPATION_FEMALE_PRECENTAGE

sorted(OCCUPATION_FEMALE_PRECENTAGE.items(), key=itemgetter(1))
In [ ]:
f, ax = plt.subplots(1, figsize=(10, 8))

w2v_small_gender_bias.plot_factual_association(ax=ax);

Also: Word embeddings quantify 100 years of gender stereotypes

Garg, N., Schiebinger, L., Jurafsky, D., & Zou, J. (2018). Word embeddings quantify 100 years of gender and ethnic stereotypes. Proceedings of the National Academy of Sciences, 115(16), E3635-E3644.

Data: Google Books/Corpus of Historical American English (COHA)

Word embeddings are sometimes used to analyze collections of text in the digital humanities - putting a system into a human context.

🧪 Quite a strong and interesting observation! We used "external" data which wasn't used directly to create the word embedding.

This leads us to think about the data generation process - in both cases it is the "world", but it would be difficult to argue for causality:

1. Text in newspapers
2. Employment by gender

4.9 - Direct Bias Measure

  1. Project each neutral profession name on the gender direction
  2. Calculate the absolute value of each projection
  3. Average them all (see the formula below the list)
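This is the direct bias measure as defined by Bolukbasi et al., over the set $N$ of gender-neutral words, the gender direction $g$, and a strictness parameter $c$ (taken as $1$ in the manual computation below):

$$\text{DirectBias}_c = \frac{1}{|N|} \sum_{w \in N} \left|\cos(\overrightarrow{w}, g)\right|^c$$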
In [ ]:
# using responsibly

w2v_small_gender_bias.calc_direct_bias()
In [ ]:
# what responsibly does:

neutral_profession_projections = [w2v_small[word] @ w2v_small_gender_bias.direction
                                  for word in neutral_profession_names]

abs_neutral_profession_projections = [abs(proj) for proj in neutral_profession_projections]

sum(abs_neutral_profession_projections) / len(abs_neutral_profession_projections)

🧪 What are the assumptions of the direct bias measure? How does the choice of neutral words affect the definition of the bias?

4.10 - [EXTRA] Indirect Bias Measure

Similarity due to shared "gender direction" projection

In [ ]:
w2v_small_gender_bias.generate_closest_words_indirect_bias('softball',
                                                           'football')

Part Five: Mitigating Bias

We intentionally do not reference the resulting embeddings as "debiased" or free from all gender bias, and prefer the term "mitigating bias" rather than "debiasing," to guard against the misconception that the resulting embeddings are entirely "safe" and need not be critically evaluated for bias in downstream tasks. - James-Sorenson, H., & Alvarez-Melis, D. (2019). [Probabilistic Bias Mitigation in Word Embeddings](https://arxiv.org/pdf/1910.14497.pdf). arXiv preprint arXiv:1910.14497.

5.1 - Neutralize

In this case, we will remove the gender projection from all the words except the gender-specific ones, and then re-normalize.

🦄 We need to "learn" which words in the vocabulary are gender-specific, starting from a seed set of gender-specific words (with a semi-automatic use of WordNet)
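Before calling responsibly's debias, here is a minimal sketch (not responsibly's implementation) of what neutralizing a single word does - remove the component along the gender direction and re-normalize:

In [ ]:
from numpy.linalg import norm

def neutralize_sketch(vector, direction):
    """Remove the projection on the (normalized) bias direction and re-normalize."""
    vector = vector - (vector @ direction) * direction
    return vector / norm(vector)

home_neutralized = neutralize_sketch(w2v_small['home'], w2v_small_gender_bias.direction)

# the projection on the gender direction is now (numerically) zero
home_neutralized @ w2v_small_gender_bias.direction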

In [ ]:
w2v_small_gender_debias = w2v_small_gender_bias.debias(method='neutralize', inplace=False)
In [ ]:
print('home:',
      'before =', w2v_small_gender_bias.model['home'] @ w2v_small_gender_bias.direction,
      'after = ', w2v_small_gender_debias.model['home'] @ w2v_small_gender_debias.direction)
In [ ]:
print('man:',
      'before =', w2v_small_gender_bias.model['man'] @ w2v_small_gender_bias.direction,
      'after = ', w2v_small_gender_debias.model['man'] @ w2v_small_gender_debias.direction)
In [ ]:
print('woman:',
      'before =', w2v_small_gender_bias.model['woman'] @ w2v_small_gender_bias.direction,
      'after = ', w2v_small_gender_debias.model['woman'] @ w2v_small_gender_debias.direction)
In [ ]:
w2v_small_gender_debias.calc_direct_bias()
In [ ]:
f, ax = plt.subplots(1, figsize=(10, 10))

w2v_small_gender_debias.plot_projection_scores(n_extreme=20, ax=ax);
In [ ]:
f, ax = plt.subplots(1, figsize=(10, 8))

w2v_small_gender_debias.plot_factual_association(ax=ax);

5.2 - [EXTRA] Equalize

  • Do you see that man and woman have different projections on the gender direction?

  • This might cause different similarities (distances) to gender-neutral words, such as kitchen - see the sketch at the end of this section

In [ ]:
w2v_small_gender_debias.model['man'] @ w2v_small_gender_debias.model['kitchen']
In [ ]:
w2v_small_gender_debias.model['woman'] @ w2v_small_gender_debias.model['kitchen']
In [ ]:
BOLUKBASI_DATA['gender']['equalize_pairs'][:10]
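A minimal sketch of the equalize step from Bolukbasi et al. for a single pair, assuming normalized word vectors and a normalized gender direction: after it, both words share the same gender-neutral component and differ only in the sign of their component along the gender direction.

In [ ]:
import numpy as np
from numpy.linalg import norm

def equalize_pair_sketch(u, v, direction):
    """Sketch of the equalize step (Bolukbasi et al., 2016) for one pair of word vectors."""
    mu = (u + v) / 2                           # midpoint of the pair
    mu_b = (mu @ direction) * direction        # gender component of the midpoint
    nu = mu - mu_b                             # gender-neutral part, shared by both words
    scale = np.sqrt(1 - norm(nu) ** 2)         # keeps the resulting vectors normalized
    equalized = []
    for w in (u, v):
        w_b = (w @ direction) * direction      # gender component of the word
        equalized.append(nu + scale * (w_b - mu_b) / norm(w_b - mu_b))
    return equalized

man_equalized, woman_equalized = equalize_pair_sketch(w2v_small['man'],
                                                      w2v_small['woman'],
                                                      w2v_small_gender_bias.direction)

# now the two words have projections of the same magnitude (and opposite signs)
man_equalized @ w2v_small_gender_bias.direction, woman_equalized @ w2v_small_gender_bias.direction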

5.3 - Hard Debias = Neutralize + Equalize

In [ ]:
w2v_small_gender_debias = w2v_small_gender_bias.debias(method='hard', inplace=False)
In [ ]:
print('home:',
      'before =', w2v_small_gender_bias.model['home'] @ w2v_small_gender_bias.direction,
      'after = ', w2v_small_gender_debias.model['home'] @ w2v_small_gender_debias.direction)
In [ ]:
print('man:',
      'before =', w2v_small_gender_bias.model['man'] @ w2v_small_gender_bias.direction,
      'after = ', w2v_small_gender_debias.model['man'] @ w2v_small_gender_debias.direction)
In [ ]:
print('woman:',
      'before =', w2v_small_gender_bias.model['woman'] @ w2v_small_gender_bias.direction,
      'after = ', w2v_small_gender_debias.model['woman'] @ w2v_small_gender_debias.direction)
In [ ]:
w2v_small_gender_debias.calc_direct_bias()
In [ ]:
w2v_small_gender_debias.model['man'] @ w2v_small_gender_debias.model['kitchen']
In [ ]:
w2v_small_gender_debias.model['woman'] @ w2v_small_gender_debias.model['kitchen']
In [ ]:
f, ax = plt.subplots(1, figsize=(10, 10))

w2v_small_gender_debias.plot_projection_scores(n_extreme=20, ax=ax);

5.4 - Comparing Performance

After debiasing, the performance of the word embedding on standard benchmarks gets only slightly worse!

⚠️ It might take a few minutes to run!

In [ ]:
w2v_small_gender_bias.evaluate_word_embedding()
In [ ]:
w2v_small_gender_debias.evaluate_word_embedding()

💎 Part Six: So What?

We removed the gender bias, as we defined it, in a word embedding - Is there any impact on a downstream application?

Zhao, J., Wang, T., Yatskar, M., Ordonez, V., & Chang, K. W. (2018). Gender bias in coreference resolution: Evaluation and debiasing methods. NAACL-HLT 2018.

WinoBias Dataset

Stereotypical Occupations (the source of responsibly.we.data.OCCUPATION_FEMALE_PRECENTAGE)

Results on UW End-to-end Neural Coreference Resolution System

No Intervention - Baseline

| Word Embedding | OntoNotes | Type 1 - Pro-stereotypical | Type 1 - Anti-stereotypical | Avg | Diff |
|---|---|---|---|---|---|
| Original | 67.7 | 76.0 | 49.4 | 62.7 | 26.6* |

Intervention: Named-entity anonymization

| Word Embedding | OntoNotes | Type 1 - Pro-stereotypical | Type 1 - Anti-stereotypical | Avg | Diff |
|---|---|---|---|---|---|
| Original | 66.4 | 73.5 | 51.2 | 62.6 | 21.3* |
| Hard Debiased | 66.5 | 67.2 | 59.3 | 63.2 | 7.9* |

Interventions: Named-entity anonymization + Gender swapping

| Word Embedding | OntoNotes | Type 1 - Pro-stereotypical | Type 1 - Anti-stereotypical | Avg | Diff |
|---|---|---|---|---|---|
| Original | 66.2 | 65.1 | 59.2 | 62.2 | 5.9* |
| Hard Debiased | 66.3 | 63.9 | 62.8 | 63.4 | 1.1 |

Zhao, J., Zhou, Y., Li, Z., Wang, W., & Chang, K. W. (2018). Learning gender-neutral word embeddings. EMNLP 2018.

Another bias mitigation method (tailor-made for GloVe training process)

💎💎 Part Seven: Meta "So What?" - I

How should we define "bias" in a word embedding?

1. Intrinsic (e.g., direct bias)

2. External - Downstream application (e.g., coreference resolution, classification)

💎 Part Eight: Have we really removed the bias?

Let's look at another metric, called WEAT (Word Embedding Association Test), which is inspired by the IAT (Implicit Association Test) from psychology.

Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183-186.

8.1 - Ingredients

  1. Target words (e.g., Science vs. Arts)

  2. Attribute words (e.g., Male vs. Female)

In [ ]:
from copy import deepcopy  # 🛠️ For copying a nested data structure in Python

from responsibly.we.weat import WEAT_DATA


# B. A. Nosek, M. R. Banaji, A. G. Greenwald, Math=male, me=female, therefore math≠me.,
# Journal of Personality and Social Psychology 83, 44 (2002).
weat_gender_science_arts = deepcopy(WEAT_DATA[7])
In [ ]:
# 🛠️ filter out words from the original IAT experiment that are not present in the reduced Word2Vec model

from responsibly.we.weat import _filter_by_model_weat_stimuli

_filter_by_model_weat_stimuli(weat_gender_science_arts, w2v_small)
In [ ]:
weat_gender_science_arts['first_attribute']
In [ ]:
weat_gender_science_arts['second_attribute']
In [ ]:
weat_gender_science_arts['first_target']
In [ ]:
weat_gender_science_arts['second_target']

8.2 - Recipe

➕ Male x Science

➖ Male x Arts

➖ Female x Science

➕ Female x Arts

In [ ]:
def calc_combination_similiarity(model, attribute, target):
    """Sum the cosine similarities between every attribute word and every target word."""
    score = 0

    for attribute_word in attribute['words']:

        for target_word in target['words']:

            score += model.similarity(attribute_word,
                                      target_word)

    return score
In [ ]:
male_science_score = calc_combination_similiarity(w2v_small,
                                                  weat_gender_science_arts['first_attribute'],
                                                  weat_gender_science_arts['first_target'])

male_science_score
In [ ]:
male_arts_score = calc_combination_similiarity(w2v_small,
                                               weat_gender_science_arts['first_attribute'],
                                               weat_gender_science_arts['second_target'])

male_arts_score
In [ ]:
female_science_score = calc_combination_similiarity(w2v_small,
                                                    weat_gender_science_arts['second_attribute'],
                                                    weat_gender_science_arts['first_target'])

female_science_score
In [ ]:
female_arts_score = calc_combination_similiarity(w2v_small,
                                                 weat_gender_science_arts['second_attribute'],
                                                 weat_gender_science_arts['second_target'])

female_arts_score
In [ ]:
male_science_score - male_arts_score - female_science_score + female_arts_score
In [ ]:
len(weat_gender_science_arts['first_attribute']['words'])
In [ ]:
(male_science_score - male_arts_score - female_science_score + female_arts_score) / 8

8.3 - All WEAT Tests

In [ ]:
from responsibly.we import calc_all_weat

calc_all_weat(w2v_small, [weat_gender_science_arts])

⚡ Important Note: Our results are a bit different because we use a reduced Word2Vec.

Results from the Paper (computed on the complete Word2Vec):

⚡ Caveats regarding comparing the WEAT to the IAT

  • Individuals (IAT) vs. Words (WEAT)
  • Therefore, the meanings of the effect size and p-value are totally different!

⚡🦄 The definition of the WEAT score in the paper is structured differently (but it is computationally equivalent). The original formulation, shown below, matters for computing the p-value. Refer to the paper for details.
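For reference, the original formulation from Caliskan et al., for target word sets $X$, $Y$ and attribute word sets $A$, $B$:

$s(w, A, B) = \text{mean}_{a \in A} \cos(\overrightarrow{w}, \overrightarrow{a}) - \text{mean}_{b \in B} \cos(\overrightarrow{w}, \overrightarrow{b})$

$s(X, Y, A, B) = \sum_{x \in X} s(x, A, B) - \sum_{y \in Y} s(y, A, B)$

$\text{effect size} = \dfrac{\text{mean}_{x \in X}\, s(x, A, B) - \text{mean}_{y \in Y}\, s(y, A, B)}{\text{std-dev}_{w \in X \cup Y}\, s(w, A, B)}$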

🧪 With the effect size, we can "compare" a human bias to a machine one. This raises the question of whether the baseline for measuring the bias/fairness of a machine should be human bias. If so, a well-performing machine wouldn't necessarily need to be unbiased, only less biased than humans (think about autonomous cars, or semi-structured vs. unstructured interviews).

8.4 - Let's go back to our question - did we remove the bias?

Gonen, H., & Goldberg, Y. (2019, June). Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (pp. 609-614).

They used multiple methods; we'll show only two:

  1. WEAT
  2. Neutral words clustering
In [ ]:
w2v_small_gender_bias.calc_direct_bias()
In [ ]:
w2v_small_gender_debias.calc_direct_bias()

8.4.1 - WEAT - before and after

See responsibly demo page on word embedding for a complete example.

8.4.2 - Clustering Neutral Gender Words

In [ ]:
w2v_vocab = set(w2v_small_gender_bias.model.vocab.keys())

# 🦄 how we got these words - read Bolukbasi's paper for the details
all_gender_specific_words = set(BOLUKBASI_DATA['gender']['specific_full_with_definitional_equalize'])

all_gender_neutral_words = w2v_vocab - all_gender_specific_words

print('#vocab =', len(w2v_vocab),
      '#specific =', len(all_gender_specific_words),
      '#neutral =', len(all_gender_neutral_words))
In [ ]:
neutral_words_gender_projections = [(w2v_small_gender_bias.project_on_direction(word), word)
                                    for word in all_gender_neutral_words]

neutral_words_gender_projections.sort()
In [ ]:
neutral_words_gender_projections[:-20:-1]
In [ ]:
neutral_words_gender_projections[:20]
In [ ]:
# Neutral words: top 500 male-biased and top 500 female-biased words

GenderBiasWE.plot_most_biased_clustering(w2v_small_gender_bias, w2v_small_gender_debias);

Note: In the paper, they got a stronger result - 92.5% accuracy for the debiased model. However, they performed the clustering on all the words of the reduced word embedding, both gender-neutral and gender-specific, and applied slightly different pre-processing.
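A rough sketch of such a clustering check (assuming scikit-learn's KMeans; not necessarily the exact pre-processing of the paper or of responsibly): cluster the 1,000 most biased neutral words into two groups and measure how well the clusters align with the original projection-based labels.

In [ ]:
import numpy as np
from sklearn.cluster import KMeans

# the 500 words with the most positive and the 500 with the most negative projections
positive_end_words = [word for _, word in neutral_words_gender_projections[-500:]]
negative_end_words = [word for _, word in neutral_words_gender_projections[:500]]

words = positive_end_words + negative_end_words
original_labels = np.array([1] * len(positive_end_words) + [0] * len(negative_end_words))

def clustering_alignment(model, words, original_labels):
    """Cluster the word vectors into two groups and measure the alignment with the
    original labels (the cluster numbering is arbitrary, so take the better of the two)."""
    vectors = np.array([model[word] for word in words])
    predicted = KMeans(n_clusters=2, random_state=0).fit_predict(vectors)
    accuracy = (predicted == original_labels).mean()
    return max(accuracy, 1 - accuracy)

print('before:', clustering_alignment(w2v_small_gender_bias.model, words, original_labels))
print('after: ', clustering_alignment(w2v_small_gender_debias.model, words, original_labels))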

8.4.3 - 💎 Strong words from the paper (emphasis mine):

The experiments ... reveal a systematic bias found in the embeddings, which is independent of the gender direction.

The implications are alarming: while suggested debiasing methods work well at removing the gender direction, the debiasing is mostly superficial. The bias stemming from world stereotypes and learned from the corpus is ingrained much more deeply in the embeddings space.

.. real concern from biased representations is not the association of a concept with words such as “he”, “she”, “boy”, “girl” nor being able to perform gender-stereotypical word analogies... algorithmic discrimination is more likely to happen by associating one implicitly gendered term with other implicitly gendered terms, or picking up on gender-specific regularities in the corpus by learning to condition on gender-biased words, and generalizing to other gender-biased words.

💎💎 Part Nine: Meta "So What?" - II

Can we debias a word embedding at all?

For some downstream use cases, might the bias in the word embedding even be desirable?

⌨️ Part Ten: Your Turn!

Explore bias in the word embedding for other groups (such as race and religion)

Task 1. Let's explore racial bias using Tolga's approach. We will use the responsibly.we.BiasWordEmbedding class. GenderBiasWE is a sub-class of BiasWordEmbedding.

In [ ]:
from responsibly.we import BiasWordEmbedding

w2v_small_racial_bias = BiasWordEmbedding(w2v_small, only_lower=True)

💎💎💎 Identify the racial direction using the sum method

In [ ]:
white_common_names = ['Emily', 'Anne', 'Jill', 'Allison', 'Laurie', 'Sarah', 'Meredith', 'Carrie',
                      'Kristen', 'Todd', 'Neil', 'Geoffrey', 'Brett', 'Brendan', 'Greg', 'Matthew',
                      'Jay', 'Brad']

black_common_names = ['Aisha', 'Keisha', 'Tamika', 'Lakisha', 'Tanisha', 'Latoya', 'Kenya', 'Latonya',
                      'Ebony', 'Rasheed', 'Tremayne', 'Kareem', 'Darnell', 'Tyrone', 'Hakim', 'Jamal',
                      'Leroy', 'Jermaine']

w2v_small_racial_bias._identify_direction('Whites', 'Blacks',
                                          definitional=(white_common_names, black_common_names),
                                          method='sum')

Use the neutral profession names to measure the racial bias

In [ ]:
neutral_profession_names = BOLUKBASI_DATA['gender']['neutral_profession_names']
In [ ]:
neutral_profession_names[:10]
In [ ]:
f, ax = plt.subplots(1, figsize=(10, 10))

w2v_small_racial_bias.plot_projection_scores(neutral_profession_names, n_extreme=20, ax=ax);

Calculate the direct bias measure

In [ ]:
# Your Code Here...

Keep exploring the racial bias

In [ ]:
# Your Code Here...

Task 2. Open the word embedding demo page in the responsibly documentation, and look at the use of the function calc_weat_pleasant_unpleasant_attribute. What was attempted in that experiment? What was the result? Can you come up with other experiments?

In [ ]:
from responsibly.we import calc_weat_pleasant_unpleasant_attribute
In [ ]:
# Your Code Here...

Part Eleven: Examples of Representation Bias in the Context of Gender

Source: Sun, T., Gaut, A., Tang, S., Huang, Y., ElSherief, M., Zhao, J., ... & Wang, W. Y. (2019). [Mitigating Gender Bias in Natural Language Processing: Literature Review](https://arxiv.org/pdf/1906.08976.pdf). arXiv preprint arXiv:1906.08976.

💎💎 Part Twelve: Takeaways

  1. Downstream application - putting a system into a human context

  2. Measurements (a.k.a "what is a good system?")

  3. Data (generation process, corpus building, selection bias, train vs. validation vs. test datasets)

  4. Impact of a system on individuals, groups, society, and humanity, both for long-term and scale-up

Resources

Non-Technical Overview with More Downstream Application Examples

  • Understanding Bias

    • Ethayarajh, K., Duvenaud, D., & Hirst, G. (2019, July). Understanding Undesirable Word Embedding Associations. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 1696-1705). - Including critical analysis of the current metrics and debiasing methods (quite technical)
  • Complete example of using responsibly with Word2Vec, GloVe and fastText: http://docs.responsibly.ai/notebooks/demo-gender-bias-words-embedding.html

Bias in NLP

Around a dozen papers in this field until 2019, but nowadays plenty of work is being done.

THE END!