# Generating Concordances

This notebook shows how you can generate a concordance using lists.

First we see what text files we have.

In [1]:
ls *.txt

Hume Enquiry.txt   negative.txt       positive.txt
Hume Treatise.txt  obama_tweets.txt


We are going to use "Hume Treatise.txt", from Project Gutenberg. You can use whatever text you want. We print the number of characters and the first 50 characters to check.

In [2]:
theText2Use = "Hume Treatise.txt"


This string has 1344061 characters.
The Project Gutenberg EBook of A Treatise of Human
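The cell above shows only the file name, while the output reports the length of the string and its first 50 characters. A minimal sketch of the loading step that would produce such output follows. It assumes the file is UTF-8 encoded, and the helper names `loadText` and `describeText` are hypothetical, chosen here for illustration only.

```python
def loadText(fileName, encoding="utf-8"):
    """Read a whole text file into one string (the encoding is an assumption)."""
    with open(fileName, "r", encoding=encoding) as fileToRead:
        return fileToRead.read()

def describeText(theString, preview=50):
    """Report the character count, then the first `preview` characters."""
    return "This string has " + str(len(theString)) + " characters.\n" + theString[:preview]

# Usage with the file chosen above (it must be in the working directory):
# theString = loadText("Hume Treatise.txt")
# print(describeText(theString))
```

Reading the whole file into memory is fine for a text of this size; a much larger corpus might need to be read in pieces.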


## Tokenization

Now we tokenize the text, producing a list called "listOfTokens", and check the first 10 tokens. Tokenizing strips punctuation and lowercases the words.

In [3]:
import re
print(listOfTokens[:10])

['the', 'project', 'gutenberg', 'ebook', 'of', 'a', 'treatise', 'of', 'human', 'nature']
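The cell above does not show the tokenizing line itself. One way to get a lowercased, punctuation-free list is a regular expression that keeps runs of word characters. This is a sketch under that assumption, not necessarily the pattern the notebook used; the function name `tokenize` and the sample sentence are made up for illustration.

```python
import re

def tokenize(theString):
    """Lowercase the text and return its runs of word characters as a list."""
    # \w+ matches letters, digits and underscores, so spaces and punctuation
    # (including apostrophes and hyphens) act as separators between tokens.
    return re.findall(r"\w+", theString.lower())

sample = "The Project Gutenberg EBook of A Treatise of Human Nature, by David Hume"
print(tokenize(sample)[:10])
# ['the', 'project', 'gutenberg', 'ebook', 'of', 'a', 'treatise', 'of', 'human', 'nature']
```

With this pattern a possessive such as "Gutenberg's" becomes two tokens, 'gutenberg' and 's'. Keeping hyphenated words or contractions together would need a different pattern.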


## Input

Now we ask for the word you want a concordance for and for how many words of context to show on either side of it.

In [4]:
word2find = input("What word do you want a concordance for? ").lower()  # Ask for the word to search for
context = input("How much context do you want? ")  # Ask how many words to grab on either side

What word do you want a concordance for? truth
How much context do you want? 10

In [5]:
type(context)

Out[5]:
str
In [7]:
contextInt = int(context)
type(contextInt)

Out[7]:
int
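As the two cells above show, `input()` always returns a string, so the context size has to be converted before it can be used in arithmetic. Calling `int()` directly raises a `ValueError` if the user types something like "ten". The sketch below guards against that; `parseContext` and its default of 5 are hypothetical choices, not part of the original notebook.

```python
def parseContext(rawContext, default=5):
    """Convert the typed context size to a non-negative int, else use a default."""
    try:
        value = int(rawContext.strip())
    except ValueError:
        return default  # not a whole number, e.g. "ten" or an empty answer
    return value if value >= 0 else default  # a negative window makes no sense

print(parseContext("10"))   # 10
print(parseContext("ten"))  # 5
```

Falling back to a default keeps the notebook running; asking the user again would be an equally reasonable design.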
In [9]:
len(listOfTokens)

Out[9]:
228958

## Main function

Here is the main function, which does the work of populating a new list with the concordance lines. We then check the last 5 concordance lines.

In [10]:
def makeConc(word2conc, list2FindIn, context2Use, concList):

    end = len(list2FindIn)
    for location in range(end):
        if list2FindIn[location] == word2conc:
            # Here we check whether we are at the very beginning or end
            if (location - context2Use) < 0:
                beginCon = 0
            else:
                beginCon = location - context2Use

            if (location + context2Use) > end:
                endCon = end
            else:
                endCon = location + context2Use + 1

            theContext = (list2FindIn[beginCon:endCon])
            concordanceLine = ' '.join(theContext)
            # print(str(location) + ": " + concordanceLine)
            concList.append(str(location) + ": " + concordanceLine)

theConc = []
makeConc(word2find, listOfTokens, int(context), theConc)
theConc[-5:]

Out[10]:
['220330: a reason why the faculty of recalling past ideas with truth and clearness should not have as much merit in it',
'223214: confessing my errors and should esteem such a return to truth and reason to be more honourable than the most unerring',
'223680: from the other this therefore being regarded as an undoubted truth that belief is nothing but a peculiar feeling different from',
'224382: mind and he will evidently find this to be the truth secondly whatever may be the case with regard to this',
'225925: by their different feeling i should have been nearer the truth end of project gutenberg s a treatise of human nature']
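To see how the window behaves at the edges of a text, it can help to run the same idea on a tiny token list. The sketch below is a compact variant, not the notebook's own function: it returns a new list instead of filling one passed in, and it relies on `max()` and on slicing to stay inside the list. The name `concLines` and the toy sentence are made up for illustration.

```python
def concLines(word2conc, list2FindIn, context2Use):
    """Return 'position: context' lines for every occurrence of word2conc."""
    lines = []
    for location, token in enumerate(list2FindIn):
        if token == word2conc:
            beginCon = max(0, location - context2Use)  # clamp at the start of the text
            endCon = location + context2Use + 1         # a slice stops at the end by itself
            lines.append(str(location) + ": " + " ".join(list2FindIn[beginCon:endCon]))
    return lines

toyTokens = "truth is rare and the love of truth is rarer".split()
print(concLines("truth", toyTokens, 2))
# ['0: truth is rare', '7: love of truth is rarer']
```

The first hit sits at position 0, so it has no left context. The second has two words on the left and the two words that remain on the right.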

## Output

Finally, we output to a text file.

In [11]:
nameOfResults = word2find.capitalize() + ".Concordance.txt"

with open(nameOfResults, "w") as fileToWrite:
    for line in theConc:
        fileToWrite.write(line + "\n")

print("Done")

Done


Here we check that the file was created.

In [12]:
ls *.Concordance.txt

Truth.Concordance.txt
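The same check can be made from Python itself, which also confirms what was written. A minimal sketch, assuming UTF-8 output; the helper names `saveConc` and `checkConc` are hypothetical.

```python
from pathlib import Path

def saveConc(concList, fileName):
    """Write one concordance line per row and return how many were written."""
    with open(fileName, "w", encoding="utf-8") as fileToWrite:
        for line in concList:
            fileToWrite.write(line + "\n")
    return len(concList)

def checkConc(fileName):
    """Return the saved lines as a list, or None when the file does not exist."""
    path = Path(fileName)
    if not path.is_file():
        return None
    return path.read_text(encoding="utf-8").splitlines()

# Usage with the names from the cells above:
# saveConc(theConc, nameOfResults)
# print(len(checkConc(nameOfResults)), "lines saved")
```

Comparing the number of lines read back with `len(theConc)` is a quick way to confirm nothing was lost.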


## Next Steps

Onwards to our final utility example: Exploring a text with NLTK.

CC BY-SA From The Art of Literary Text Analysis by Stéfan Sinclair & Geoffrey Rockwell. Edited and revised by Melissa Mony.
Created September 30th, 2016 (Jupyter 4.2.1)
