## Counting Words

In class today, we learned about "availability bias": people tend to judge things that are easy to recall as more common than they really are. For example, are there more English words starting with "e" than words with "e" in the third position? Many people would guess that more words start with "e", because such words are easier to think of than words with "e" in the third position. To check this, we write some Python scripts to count such words.

In [1]:
# Count the words starting with e
count = 0
with open('/usr/share/dict/words') as words:
    for word in words:
        if word.startswith("e") or word.startswith("E"):
            # print(word)
            count = count + 1
print(count)

8736

In [2]:
# Count the words with e in the third position
count = 0
with open('/usr/share/dict/words') as words:
    for word in words:
        if len(word) < 3:
            continue
        if word[2] == "e" or word[2] == "E":
            # print(word)
            count = count + 1
print(count)

18351

In [3]:
# We define a function to count words with 'letter' in 'position'
def count_words(letter, position, wordlist='/usr/share/dict/words'):
    r"""
    Look through the words in 'wordlist' and
    count the words with 'letter' in 'position' (1-based, case-insensitive).
    If 'wordlist' is omitted, it is assumed to be '/usr/share/dict/words',
    which is a word list on macOS.

    For example, count_words("a", 1, r"c:\dict.txt") counts the number of words
    in the file 'c:\dict.txt' starting with A or a;
    count_words("b", 3) counts the number of words whose 3rd letter is B or b.
    """
    index = position - 1
    upcase = letter.upper()
    locase = letter.lower()
    count = 0
    # Each line keeps its trailing newline, which never matches a letter.
    with open(wordlist) as words:
        for word in words:
            if len(word) < position:
                continue
            if word[index] == upcase or word[index] == locase:
                count += 1
    return count


We try count_words() for "e" at positions 1, 2, and 3 to check against the previous results.

In [4]:
count_words("e", 1)

Out[4]:
8736
In [5]:
count_words("e", 2)

Out[5]:
33649
In [6]:
count_words("e", 3)

Out[6]:
18351
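
The same counting rule can also be checked on a tiny hand-made list where the answer is easy to verify by eye. The following is a minimal sketch; the helper name `count_in_list` and the sample words are made up for illustration and are not part of the notebook above.

```python
def count_in_list(words, letter, position):
    """Count entries of `words` whose 1-based `position` holds `letter`,
    ignoring case. Words shorter than `position` are skipped."""
    index = position - 1
    target = letter.lower()
    return sum(1 for w in words if len(w) > index and w[index].lower() == target)

# Hypothetical sample list, small enough to count by hand.
sample = ["Eel", "the", "open", "tree", "k", "even"]

print(count_in_list(sample, "e", 1))  # "Eel", "even" -> 2
print(count_in_list(sample, "e", 3))  # "the", "open", "tree", "even" -> 4
```

Working on an in-memory list keeps the check independent of whichever word list happens to be installed on the machine.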

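The counts for positions 1 and 3 match the earlier results, and words with "e" in the third position (18351) outnumber words starting with "e" (8736) by roughly two to one. Next, we count words with the letter e at each of positions 1 through 10.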

In [7]:
for i in range(1, 11):
    print(i, count_words("e", i))

1 8736
2 33649
3 18351
4 25482
5 23685
6 22010
7 22663
8 20153
9 18357
10 14635


Here, we count words with the letter k at each of positions 1 through 10.

In [8]:
for i in range(1, 11):
    print(i, count_words("k", i))

1 2281
2 540
3 1149
4 3417
5 2427
6 1358
7 1591
8 1436
9 1119
10 498
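
Each loop above reopens and rereads the word list once per position. As a possible alternative, the counts for all positions can be tallied in a single pass. The sketch below assumes a hypothetical helper `position_profile` that accepts any iterable of words (a list, or an open file); the sample list is invented for illustration.

```python
from collections import Counter

def position_profile(words, letter, max_position=10):
    """Return a list whose item i-1 is the number of words with `letter`
    (case-insensitive) at 1-based position i, for i = 1..max_position."""
    target = letter.lower()
    tally = Counter()
    for word in words:
        word = word.strip()  # drop the trailing newline when reading a file
        for pos, ch in enumerate(word[:max_position], start=1):
            if ch.lower() == target:
                tally[pos] += 1
    return [tally[pos] for pos in range(1, max_position + 1)]

# Hypothetical sample list, small enough to count by hand.
sample = ["kick", "knock", "ask", "Kayak", "ink"]
print(position_profile(sample, "k", max_position=5))  # [3, 0, 2, 1, 2]
```

To run it on the system word list, an open file can be passed directly, for example `with open('/usr/share/dict/words') as f: position_profile(f, "k")`.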
