In class today, we learned about "availability bias": people tend to think that things that are easy to recall are also more common. For example, are there more English words starting with "e" than words with "e" in the third position? Many people would guess that more words start with e, because such words are easier to think of than words with e in the third position. To check this, we wrote some Python scripts to count such words.
# Count the words starting with e
count = 0
# Open the word list in a 'with' block so the file is closed afterwards
with open('/usr/share/dict/words') as words:
    for word in words:
        if word.startswith("e") or word.startswith("E"):
            # print(word)
            count = count + 1
print(count)
8736
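# Count the words with e in the third position
count = 0
# Open the word list in a 'with' block so the file is closed afterwards
with open('/usr/share/dict/words') as words:
    for word in words:
        # Skip words too short to have a third letter
        if len(word) < 3:
            continue
        if word[2] == "e" or word[2] == "E":
            # print(word)
            count = count + 1
print(count)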
18351
# We define a function to count words with 'letter' in 'position'
def count_words(letter, position, wordlist='/usr/share/dict/words'):
    r"""
    Look through the words in 'wordlist',
    count the words with 'letter' in 'position'.
    If 'wordlist' is omitted, it's assumed to be at '/usr/share/dict/words'
    which is a word list on macOS.
    For example count_words("a", 1, r"c:\dict.txt") counts the number of words
    in the file 'c:\dict.txt' starting with A or a;
    count_words("b", 3) counts the number of words whose 3rd letter is B or b.
    """
    index = position - 1
    upcase = letter.upper()
    locase = letter.lower()
    count = 0
    # Open the word list in a 'with' block so the file is closed afterwards
    with open(wordlist) as words:
        for word in words:
            # Skip words too short to have a letter at 'position'
            if len(word) < position:
                continue
            if word[index] == upcase or word[index] == locase:
                count += 1
    return count
We try count_words() for e at positions 1, 2, and 3. The counts for positions 1 and 3 should match the previous results.
count_words("e", 1)
8736
count_words("e", 2)
33649
count_words("e", 3)
18351
Here, we count words with the letter e at position 1, 2, 3, ..., 10
for i in range(1, 11):
    print(i, count_words("e", i))
1 8736
2 33649
3 18351
4 25482
5 23685
6 22010
7 22663
8 20153
9 18357
10 14635
Here, we count words with the letter k at position 1, 2, 3, ..., 10
for i in range(1, 11):
    print(i, count_words("k", i))
1 2281
2 540
3 1149
4 3417
5 2427
6 1358
7 1591
8 1436
9 1119
10 498
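The loops above call count_words() once per position, so the word list is read ten times for each letter. As a possible alternative, here is a minimal sketch that reads the list once and tallies every position in a single pass. The function name count_by_position and its parameters lines and max_position are our own for illustration and are not part of the scripts above. It takes any iterable of lines (an open file or a plain list), and for ordinary letters it should give the same counts as count_words().

```python
from collections import Counter

def count_by_position(letter, lines, max_position=10):
    """
    Hypothetical helper (not from the scripts above): in one pass over
    'lines', count how often 'letter' appears at each position
    1..max_position, ignoring case.
    """
    target = letter.lower()
    counts = Counter()
    for line in lines:
        # Drop the trailing newline and fold case so "E" and "e" both match
        word = line.strip().lower()
        # Only the first 'max_position' characters matter
        for position, ch in enumerate(word[:max_position], start=1):
            if ch == target:
                counts[position] += 1
    return counts

# Usage with the same word list as above (assumes the file exists):
# with open('/usr/share/dict/words') as words:
#     counts = count_by_position("e", words)
# for i in range(1, 11):
#     print(i, counts[i])
```

A Counter returns 0 for positions where the letter never appears, so counts[i] is safe to print for every i.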