Tutorial

SBLGNT and Text-Fabric

This tutorial introduces basic queries on the SBL Greek New Testament dataset using Text-Fabric.
It assumes at least a basic familiarity with the data model.
For documentation on Text-Fabric, see the Text-Fabric Wiki.


In [1]:
%load_ext autoreload
%autoreload 2
In [2]:
import collections

Loading Text-Fabric

Import the Fabric class from the tf.fabric module:

In [3]:
from tf.fabric import Fabric

instantiate text-fabric

Load the module with its path in the text-fabric-data directory.

In [6]:
TF = Fabric(modules='greek/sblgnt')
This is Text-Fabric 2.2.1
Api reference : https://github.com/ETCBC/text-fabric/wiki/Api
Tutorial      : https://github.com/ETCBC/text-fabric/blob/master/docs/tutorial.ipynb
Data sources  : https://github.com/ETCBC/text-fabric-data
Data docs     : https://etcbc.github.io/text-fabric-data
Shebanq docs  : https://shebanq.ancient-data.org/text
Slack team    : https://shebanq.slack.com/signup
Questions? Ask [email protected] for an invite to Slack
60 features found and 0 ignored

load sblgnt features

Select which features to load from the data. The available features are listed in the sblgnt features documentation. Features unique to text-fabric are written in lower case, while features native to sblgnt begin with an upper-case letter.

Features are loaded with the load method on the Fabric object. The method takes a string argument with all of the features. Features in the load string may be space or new-line separated.

In [7]:
api = TF.load('''
                Cat Gender Tense
                Unicode UnicodeLemma Mood
                book chapter verse
                otype function psp
                freq_occ freq_lex
                Head End
              ''')

api.makeAvailableIn(globals()) # optional line, but without it you must always prefix names with api.
  0.00s loading features ...
   |     0.01s B otype                from /Users/dirk/github/text-fabric-data/greek/sblgnt
   |     0.00s B book                 from /Users/dirk/github/text-fabric-data/greek/sblgnt
   |     0.00s B chapter              from /Users/dirk/github/text-fabric-data/greek/sblgnt
   |     0.00s B verse                from /Users/dirk/github/text-fabric-data/greek/sblgnt
   |     0.10s B Unicode              from /Users/dirk/github/text-fabric-data/greek/sblgnt
   |     0.07s B UnicodeLemma         from /Users/dirk/github/text-fabric-data/greek/sblgnt
   |     0.12s B Cat                  from /Users/dirk/github/text-fabric-data/greek/sblgnt
   |     0.04s B Gender               from /Users/dirk/github/text-fabric-data/greek/sblgnt
   |     0.02s B Tense                from /Users/dirk/github/text-fabric-data/greek/sblgnt
   |     0.03s B Mood                 from /Users/dirk/github/text-fabric-data/greek/sblgnt
   |     0.20s B function             from /Users/dirk/github/text-fabric-data/greek/sblgnt
   |     0.05s B psp                  from /Users/dirk/github/text-fabric-data/greek/sblgnt
   |     0.05s B freq_occ             from /Users/dirk/github/text-fabric-data/greek/sblgnt
   |     0.06s B freq_lex             from /Users/dirk/github/text-fabric-data/greek/sblgnt
   |     0.07s B Head                 from /Users/dirk/github/text-fabric-data/greek/sblgnt
   |     0.08s B End                  from /Users/dirk/github/text-fabric-data/greek/sblgnt
   |     0.00s Feature overview: 57 nodes; 2 edges; 1 configs; 7 computeds
  1.93s All features loaded/computed - for details use loadLog()
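
The load string above spreads the feature names over several lines. As a minimal sketch of why that layout is equivalent to a single space-separated line, the snippet below uses plain Python's str.split(); it does not call Text-Fabric, and the assumption that the load string is tokenized on whitespace in this way is ours.

```python
# Illustration only: plain str.split(), not a Text-Fabric call.
spaced = 'Cat Gender Tense book chapter verse'
multiline = '''
    Cat Gender Tense
    book chapter verse
'''

# split() with no argument splits on any run of whitespace, newlines included
names = multiline.split()
print(names)
print(spaced.split() == names)
```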

Intro to Nodes, Objects, and Features

TF uses nodes, objects, and features as pointers to the data.

what is a node?

A node is an arbitrary integer that TF uses to look up the data. Every datapoint in TF has its own unique node. We supply node numbers to TF python objects and get the value in return.

In [8]:
example_node = 137795

# What kind of data does example_node represent? 
# We can find out by supplying the node number to the otype feature object:

F.otype.v(example_node)
Out[8]:
'book'

Which book does example_node represent? We can find out by supplying it to another feature object:

In [9]:
F.book.v(example_node)  # the book feature returns the book's name
Out[9]:
'matthew'

Or, if we want the book name in Swahili:

In [10]:
T.sectionFromNode(example_node, lang='sw')
Out[10]:
('Mathayo', 1, 1)

To check the available languages:

In [11]:
T.languages
Out[11]:
{'am': {'language': 'ኣማርኛ', 'languageEnglish': 'amharic'},
 'ar': {'language': 'العَرَبِية', 'languageEnglish': 'arabic'},
 'bn': {'language': 'বাংলা', 'languageEnglish': 'bengali'},
 'da': {'language': 'Dansk', 'languageEnglish': 'danish'},
 'de': {'language': 'Deutsch', 'languageEnglish': 'german'},
 'el': {'language': 'Ελληνικά', 'languageEnglish': 'greek'},
 'en': {'language': 'English', 'languageEnglish': 'english'},
 'es': {'language': 'Español', 'languageEnglish': 'spanish'},
 'fa': {'language': 'فارسی', 'languageEnglish': 'farsi'},
 'fr': {'language': 'Français', 'languageEnglish': 'french'},
 'he': {'language': 'עברית', 'languageEnglish': 'hebrew'},
 'hi': {'language': 'हिन्दी', 'languageEnglish': 'hindi'},
 'id': {'language': 'Bahasa Indonesia', 'languageEnglish': 'indonesian'},
 'ja': {'language': '日本語', 'languageEnglish': 'japanese'},
 'ko': {'language': '한국어', 'languageEnglish': 'korean'},
 'la': {'language': 'Latina', 'languageEnglish': 'latin'},
 'nl': {'language': 'Nederlands', 'languageEnglish': 'dutch'},
 'pa': {'language': 'ਪੰਜਾਬੀ', 'languageEnglish': 'punjabi'},
 'pt': {'language': 'Português', 'languageEnglish': 'portuguese'},
 'ru': {'language': 'Русский', 'languageEnglish': 'russian'},
 'sw': {'language': 'Kiswahili', 'languageEnglish': 'swahili'},
 'syc': {'language': 'ܠܫܢܐ ܣܘܪܝܝܐ', 'languageEnglish': 'syriac'},
 'tr': {'language': 'Türkçe', 'languageEnglish': 'turkish'},
 'ur': {'language': 'اُردُو', 'languageEnglish': 'urdu'},
 'yo': {'language': 'èdè Yorùbá', 'languageEnglish': 'yoruba'},
 'zh': {'language': '中文', 'languageEnglish': 'chinese'}}

If we want all book names in Syriac:

In [16]:
for bn in F.otype.s('book'):
    print('{:<20} {:>40}'.format(
        T.bookName(bn, lang='en'),
        T.bookName(bn, lang='syc'),
    ))
Matthew                                             ܣܦܪܐ_ܕܡܬܝ
Mark                                              ܣܦܪܐ_ܕܡܪܩܘܣ
Luke                                              ܣܦܪܐ_ܕܠܘܩܘܣ
John                                              ܣܦܪܐ_ܕܝܘܚܢܢ
Acts                                            ܦܪܟܣܣ_ܕܫܠܝ̈ܚܐ
Romans                              ܐܓܪܬܐ_ܕܦܘܠܘܣ_ܕܠܘܬ_ܪ̈ܗܘܡܝܐ
1_Corinthians              ܐܓܪܬܐ_ܕܦܘܠܘܣ_ܕܠܘܬ_ܩܘܪ̈ܝܢܬܝܐ_ܩܕܡܝܬܐ
2_Corinthians              ܐܓܪܬܐ_ܕܦܘܠܘܣ_ܕܠܘܬ_ܩܘܪ̈ܝܢܬܝܐ_ܕܬܪܬܝܢ
Galatians                            ܐܓܪܬܐ_ܕܦܘܠܘܣ_ܕܠܘܬ_ܓܠܛܝ̈ܐ
Ephesians                            ܐܓܪܬܐ_ܕܦܘܠܘܣ_ܕܠܘܬ_ܐܦܣܝ̈ܐ
Philippians                       ܐܓܪܬܐ_ܕܦܘܠܘܣ_ܕܠܘܬ_ܦܝܠܝܦܣܝ̈ܐ
Colossians                          ܐܓܪܬܐ_ܕܦܘܠܘܣ_ܕܠܘܬ_ܩܘܠ̈ܣܝܐ
1_Thessalonians           ܐܓܪܬܐ_ܕܦܘܠܘܣ_ܕܠܘܬ_ܬܣܠ̈ܘܢܝܩܝܐ_ܩܕܡܝܬܐ
2_Thessalonians           ܐܓܪܬܐ_ܕܦܘܠܘܣ_ܕܠܘܬ_ܬܣܠ̈ܘܢܝܩܝܐ_ܕܬܪܬܝܢ
1_Timothy                    ܐܓܪܬܐ_ܕܦܘܠܘܣ_ܕܠܘܬ_ܛܝܡܬܐܘܣ_ܩܕܡܝܬܐ
2_Timothy                    ܐܓܪܬܐ_ܕܦܘܠܘܣ_ܕܠܘܬ_ܛܝܡܬܐܘܣ_ܕܬܪܬܝܢ
Titus                                 ܐܓܪܬܐ_ܕܦܘܠܘܣ_ܕܠܘܬ_ܛܝܛܘܣ
Philemon                            ܐܓܪܬܐ_ܕܦܘܠܘܣ_ܕܠܘܬ_ܦܝܠܝܡܘܢ
Hebrews                                     ܐܓܪܬܐ_ܕܠܘܬ_ܥܒܪ̈ܝܐ
James                                      ܐܓܪܬܐ_ܕܝܥܩܘܒ_ܫܠܝܚܐ
1_Peter                                    ܐܓܪܬܐ_ܕܦܛܪܘܣ_ܫܠܝܚܐ
2_Peter                                   ܐܓܪܬܐ_ܕܬܪܬܝܢ_ܕܦܛܪܘܣ
1_John                                     ܐܓܪܬܐ_ܕܝܘܚܢܢ_ܫܠܝܚܐ
2_John                                    ܐܓܪܬܐ_ܕܬܪܬܝܢ_ܕܝܘܚܢܢ
3_John                                      ܐܓܪܬܐ_ܕܬܠܬ_ܕܝܘܚܢܢ
Jude                                             ܐܓܪܬܐ_ܕܝܗܘܕܐ
Revelation                                       ܓܠܝܢܐ_ܕܝܘܚܢܢ

Let's try something else with this node. We'll supply example_node to a different kind of feature object...

In [8]:
print(F.Gender.v(example_node))
None

What happened here? Book nodes can't have gender features. But word nodes can:

In [9]:
word_node = 1231
print('word_node gender:', F.Gender.v(word_node))
print('word_node unicode:', F.Unicode.v(word_node))
word_node gender: Feminine
word_node unicode: ἔρημον

This is because different nodes represent different kinds of linguistic objects, and a feature only has a value for the object types it applies to.

what is an object?

Up to this point we've used the term 'object' in the usual Python sense. The sense we refer to from now on has no relation to programming objects. Rather, in the data model of TF, words, phrases, and clauses are defined as (linguistic) objects; likewise, sections like books, chapters, and verses are objects. For more information about how objects are defined, see the data model documentation. Every object has a type. As in the example above, some nodes are of the book object type, others are of the word object type, and so on.

what is a feature?

Features are named properties that provide information on the objects of a given type. book, Gender, Tense, and function are all examples of features that can be looked up for nodes of the corresponding object type. See the feature documentation for a reference to all of the features.
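
To make the relation between nodes and features concrete, here is a toy model in plain Python. It is a sketch of the idea only, not the Text-Fabric implementation: the ToyFeature class is hypothetical, and the two node numbers and their values are copied from the examples above.

```python
# Toy model, not Text-Fabric internals: a feature maps node numbers to values.
class ToyFeature:
    def __init__(self, data):
        self.data = dict(data)

    def v(self, node):
        """Return the value for a node, or None if the feature has no value for it."""
        return self.data.get(node)

    def s(self, value):
        """Return the nodes that carry the given value, in ascending order."""
        return tuple(sorted(n for n, val in self.data.items() if val == value))

# Node numbers and values taken from the cells above
otype = ToyFeature({1231: 'word', 137795: 'book'})
gender = ToyFeature({1231: 'Feminine'})

print(otype.v(137795))   # the object type of the book node
print(gender.v(137795))  # None: the book node has no gender value
print(gender.v(1231))    # the word node does
print(otype.s('word'))   # all nodes of type 'word' in this toy data
```

In this picture a lookup that returns None simply means the feature stores nothing for that node.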


Access Object Nodes

access nodes

We've seen what we can do with nodes. But how do we get the nodes we want?

iterate through all nodes with the node generator N():

In [10]:
node_count = 0 

for node in N():
    node_count += 1
    
print('total nodes: ', node_count)
total nodes:  428430

iterate through the nodes of a certain object type with the otype feature: F.otype.s()

In [11]:
book_count = 0

for book_node in F.otype.s('book'):
    book_count += 1
    
print('total books nodes: ', book_count)
total books nodes:  27

access embedded and embedding objects with the "level up" and "level down" locality functions: L.u() / L.d()

TF preserves embedding relationships between object types. For example, phrases are embedded in clauses. See the data model discussion on levels to understand how this is encoded. The TF term for these relationships is 'levels.'

In [12]:
from random import Random
randomizer = Random()

highest_word_node = F.otype.maxSlot
random_word = randomizer.randint(1, highest_word_node)

random_word
Out[12]:
7255
In [13]:
# the book lookup returns a tuple containing the embedding book node:
L.u(random_word,'book')
Out[13]:
(137795,)
In [14]:
# let's see all the above information for random_word
level_up = (
            F.Unicode.v(random_word),
            F.book.v(
                        L.u(random_word, otype='book')[0]
                     ),
    
            str(F.chapter.v(
                        L.u(random_word, otype='chapter')[0]
                     )),

            str(F.verse.v(
                        L.u(random_word, otype = 'verse')[0]
                     )),
            'phraseFunction: ' + F.function.v(
                                L.u(random_word, otype='phrase')[0]
                             ))

', '.join(level_up)
Out[14]:
'πολλοὶ, matthew, 13, 17, phraseFunction: adjp'
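
One way to picture these lookups is as set containment over word slots. The sketch below is a toy model under that assumption, not Text-Fabric's internals; the functions level_up and level_down and all node numbers in it are hypothetical.

```python
# Toy model, not Text-Fabric internals: each node covers a set of word slots.
# All node numbers here are hypothetical.
slots = {
    1: {1}, 2: {2}, 3: {3}, 4: {4},   # four word nodes
    10: {1, 2}, 11: {3, 4},           # two phrase nodes
    20: {1, 2, 3, 4},                 # one clause node
}
otype = {1: 'word', 2: 'word', 3: 'word', 4: 'word',
         10: 'phrase', 11: 'phrase', 20: 'clause'}

def level_up(node, kind):
    """Nodes of the given type whose slots contain all slots of node."""
    return tuple(n for n in sorted(slots)
                 if n != node and otype[n] == kind and slots[node] <= slots[n])

def level_down(node, kind):
    """Nodes of the given type whose slots all lie inside node."""
    return tuple(n for n in sorted(slots)
                 if n != node and otype[n] == kind and slots[n] <= slots[node])

print(level_up(3, 'phrase'))     # the phrase that embeds word 3
print(level_up(3, 'clause'))     # the clause that embeds word 3
print(level_down(20, 'word'))    # the words inside clause 20
```

As in the real lookups above, the toy functions return tuples, so a single embedder is retrieved with index [0].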

access section objects with Text: T.nodeFromSection() / T.sectionFromNode()

In [15]:
john316 = ('John',3,16)  # req. a tuple; verse/chapter optional
john316_node = T.nodeFromSection(john316)

john316_node
Out[15]:
422594

The Text API can conversely return section information from a given node (T.sectionFromNode()). It also provides T.text(), which formats a sequence of word nodes as Unicode text.

In the example below we do 3 things:

  1. Gather all of the word nodes in John 3:16 with an L.d() call (this returns a tuple).
  2. Feed the word nodes to T.text(), which requires an iterable of word nodes as an argument.
  3. Print the formatted text, and reverse the previous cell's step by recovering the section data from john316_node with T.sectionFromNode().
In [16]:
john316_words = L.d(john316_node, otype='word')

print(T.text(john316_words), T.sectionFromNode(john316_node))
γὰρ Οὕτως ἠγάπησεν ὁ θεὸς τὸν κόσμον ὥστε τὸν υἱὸν τὸν μονογενῆ ἔδωκεν, ἵνα πᾶς ὁ πιστεύων εἰς αὐτὸν μὴ ἀπόληται ἀλλὰ ἔχῃ ζωὴν αἰώνιον.  ('John', 3, 16)

count all object types

In [17]:
all_objects = F.otype.all # just a tuple of object types (not nodes!) in sblgnt
print(all_objects)
print(len(all_objects), 'object types in sblgnt')
('book', 'chapter', 'verse', 'sentence', 'clause', 'clause_atom', 'phrase', 'conjunction', 'wordx', 'word')
10 object types in sblgnt
In [18]:
# how many instances of each object type?

object_counts = collections.Counter() # a Counter tallies the instances of each type

for obj_type in all_objects:
    for otype_node in F.otype.s(obj_type): # F.otype.s() to iterate through the given otype nodes
        object_counts[obj_type] += 1

for otype, count in sorted(object_counts.items(), key = lambda k: k[1]):
    print('{:<15}{:>15}'.format(otype, count))
book                        27
conjunction                172
chapter                    260
wordx                      879
verse                     7939
sentence                  8014
clause                   54800
clause_atom              75967
word                    137794
phrase                  142578

count features and values

The freqList() method of a feature returns each value of that feature together with its frequency, most frequent first.

In [19]:
F.Gender.freqList()
Out[19]:
(('Masculine', 41418), ('Feminine', 18750), ('Neuter', 13813))
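
The shape of this result, a tuple of (value, count) pairs in descending order of count, is the same shape that collections.Counter.most_common() produces. The following is a minimal sketch on a handful of hypothetical values; it only imitates the output format and makes no claim about how freqList() is implemented.

```python
import collections

# Hypothetical gender values for six word nodes
values = ['Masculine', 'Feminine', 'Masculine', 'Neuter', 'Masculine', 'Feminine']

# most_common() returns (value, count) pairs sorted by descending count
freq = tuple(collections.Counter(values).most_common())
print(freq)
```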

Use Fall() to see all loaded features

In [20]:
Fall()
Out[20]:
['Cat',
 'End',
 'Gender',
 'Head',
 'Mood',
 'Tense',
 'Unicode',
 'UnicodeLemma',
 'book',
 'chapter',
 'freq_lex',
 'freq_occ',
 'function',
 'otype',
 'psp',
 'verse']
In [21]:
select_features = {'category' : F.Cat, 
                   'gender' : F.Gender, 
                   'tense' : F.Tense, 
                   'mood' : F.Mood,
                   'partOfSpeech' : F.psp,}

for feature,TFObject in select_features.items():
    
    counts = '\n'.join(list('{:10}{:>15}'.format(value, count) for value, count in TFObject.freqList()))
    print('{:>15}\n{:>15}'.format(feature,'-'*25))
    print(counts, '\n\n')
       category
-------------------------
np                  86102
CL                  54800
vp                  28339
noun                28277
verb                28112
V                   25142
det                 19806
ADV                 19523
S                   19180
conj                18422
pron                16132
pp                  11434
prep                11039
O                   10931
adjp                 9651
adj                  8906
advp                 6535
adv                  6314
P                    3685
IO                   2666
VC                   2590
ptcl                 1043
nump                  517
num                   477
intj                  317
O2                    264 


         gender
-------------------------
Masculine           41418
Feminine            18750
Neuter              13813 


          tense
-------------------------
Aorist              11596
Present             11552
Imperfect            1679
Future               1624
Perfect              1573
Pluperfect             88 


           mood
-------------------------
Indicative          15616
Participle           6659
Infinitive           2289
Subjunctive           1857
Imperative           1623
Optative               68 


   partOfSpeech
-------------------------
noun                28277
verb                28112
det                 19806
conj                18250
pron                16130
prep                10898
adj                  8670
adv                  6176
ptcl                  979
num                   477
intj                   19 



Example Query: Word Order

Word order is notoriously tricky in Greek. Can we find any tendencies throughout the different NT books?

For this search, we will look for clauses in which both a subject and a finite verb are present and measure which one comes first. The results will be presented on a book-by-book basis.

To accomplish this query, we have to first understand a bit about the highly nuanced clause structure of sblgnt with gbi trees. An iteration through clauses is not yet as simple in TF as it is for the etcbc Hebrew data:

excursus: clause structure in iteration

In [22]:
John1_1 = T.nodeFromSection(('John',1,1)) # pull the verse node for John 1.1

print('verse\t', T.text(L.d(John1_1, otype = 'word'))) # print the words contained in verse

for clause in L.d(John1_1, otype = 'clause'): # iterate through clauses contained in verse
    words = L.d(clause, otype='word')           # find the word nodes contained in each clause
    print('clause\t', T.text(words))            # print the words as unicode
verse	 Ἐν ἀρχῇ ἦν ὁ λόγος, καὶ ὁ λόγος ἦν πρὸς τὸν θεόν, καὶ θεὸς ἦν ὁ λόγος. 
clause	 Ἐν ἀρχῇ ἦν ὁ λόγος, καὶ ὁ λόγος ἦν πρὸς τὸν θεόν, καὶ θεὸς ἦν ὁ λόγος. 
clause	 Ἐν ἀρχῇ ἦν ὁ λόγος, 
clause	 ὁ λόγος ἦν πρὸς τὸν θεόν, 
clause	 θεὸς ἦν ὁ λόγος. 

We see that embedding and embedded clause objects are not yet distinguishable from each other with a simple iteration in Text-Fabric. One way to mitigate this problem is to keep only embedded clauses, in order to find those clauses that function at the lowest levels. But this solution only shifts the problem down one level, since embedded clauses can themselves embed other clauses:

In [23]:
John1_5 = T.nodeFromSection(('John',1,5))
for clause in L.d(John1_5, otype = 'clause'):
    if L.u(clause, otype = 'clause'):  # here's the qualification; i.e. ONLY pull embedded clauses
        print(T.text(L.d(clause, otype = 'word')))
τὸ φῶς ἐν τῇ σκοτίᾳ φαίνει, καὶ ἡ σκοτία αὐτὸ οὐ κατέλαβεν. 
τὸ φῶς ἐν τῇ σκοτίᾳ φαίνει, 
ἡ σκοτία αὐτὸ οὐ κατέλαβεν. 

If we add yet another qualification, that the clause itself cannot be an embedder (only clauses that are embedded but not embedding), we find that we end up losing important information:

In [24]:
John1_6 = T.nodeFromSection(('John',1,6))

print('verse\t', T.text(L.d(John1_6, otype = 'word')))

for clause in L.d(John1_6, otype = 'clause'):
    # now we add a second qualification: clause cannot be an embedder...
    if L.u(clause, otype = 'clause') and not L.d(clause, otype = 'clause'):
        print('clause\t', T.text(L.d(clause, otype = 'word')))
verse	 Ἐγένετο ἄνθρωπος ἀπεσταλμένος παρὰ θεοῦ, ὄνομα αὐτῷ Ἰωάννης· 
clause	 ἀπεσταλμένος παρὰ θεοῦ, 
clause	 ὄνομα αὐτῷ Ἰωάννης· 

Now we miss the clause, Ἐγένετο ἄνθρωπος, because of the second qualification.

In order to retrieve non-overlapping clauses at the most basic level, we need to keep only those clauses that are either:

  1. not embedding any other clause, OR
  2. embedding clauses with phrases not reflected in their "children" clauses.
    • for these, keep only those phrase nodes (currently clause_atom)
In [25]:
def clauseFilter(clNode):
    '''
    test whether clause is embedded but not embedding
    return True or False
    '''
    motherClause = L.u(clNode, otype = 'clause')
    daughterClause = L.d(clNode, otype = 'clause')
    return bool(motherClause and not daughterClause) # embedded but not embedding

def keepPhrases(clNode): 
    '''
    return the clause_atoms of a clause that are not
    also contained in one of its embedded clauses
    '''
    allPhrases = L.d(clNode, otype = 'clause_atom')
    daughterClauses = L.d(clNode, otype = 'clause')
    daughterPhrases = set(
                             ph for clause in daughterClauses
                             for ph in L.d(clause, otype = 'clause_atom')
                         )
    goodPhrases = tuple(ph for ph in allPhrases if ph not in daughterPhrases)
    return goodPhrases

Now we test it again on the sample passage John 1.6 that gave us trouble above:

In [26]:
for i, clause in enumerate(L.d(John1_6, otype = 'clause')):
    if clauseFilter(clause):
        words = L.d(clause, otype = 'word')
        phrases = L.d(clause, otype = 'clause_atom')
    elif keepPhrases(clause):
        words = (w for phrase in keepPhrases(clause) for w in L.d(phrase, otype = 'word'))
        phrases = keepPhrases(clause)
    else: continue 
    print('Clause{}'.format(i))
    print('words\t\t', T.text(L.d(clause, otype = 'word')))
    print('phrNodes\t',' '.join(str(ph) for ph in phrases)) # nodes
    print('phrFunctions\t','\t'.join(F.function.v(ph) for ph in phrases),'\n') # functions
Clause1
words		 Ἐγένετο ἄνθρωπος ἀπεσταλμένος παρὰ θεοῦ, 
phrNodes	 221521 221522
phrFunctions	 V	S 

Clause2
words		 ἀπεσταλμένος παρὰ θεοῦ, 
phrNodes	 221523 221524
phrFunctions	 V	ADV 

Clause3
words		 ὄνομα αὐτῷ Ἰωάννης· 
phrNodes	 221525 221526 221527
phrFunctions	 S	ADV	P 

Notice that the words of Clause1 still overlap with those of Clause2! But if we look closer at the phrase nodes (clause_atoms), we find that even though the words overlap, the phrases do not: Clause1 contains phrase nodes (221521, 221522) and Clause2 contains (221523, 221524). The roles are different too, with Clause1 consisting of verb-subject functions and Clause2 of verb-adverb.
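
The filtering idea can also be checked in isolation on a toy tree. The sketch below uses plain dictionaries instead of Text-Fabric lookups; the function own_atoms and the clause and atom numbers are hypothetical and stand in for an embedding clause with one embedded clause.

```python
# Toy tree with hypothetical node numbers: clause 100 embeds clause 101.
atoms_of = {100: (1, 2, 3, 4), 101: (3, 4)}   # clause -> clause_atoms it covers
daughters_of = {100: (101,), 101: ()}          # clause -> clauses embedded in it

def own_atoms(clause):
    """Atoms covered by clause but by none of its embedded clauses."""
    inherited = {a for d in daughters_of[clause] for a in atoms_of[d]}
    return tuple(a for a in atoms_of[clause] if a not in inherited)

print(own_atoms(100))  # only the atoms that belong to the embedding clause itself
print(own_atoms(101))  # a clause without daughters keeps all of its atoms
```

The two results share no atoms, which mirrors the non-overlapping phrase nodes seen for John 1.6.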

main code

Now we apply the functions in a large loop through all the books in the New Testament. For each clause we iterate through its clause_atoms (currently a phrase-type object in TF-sblgnt) and check whether each one has the 'S' (subject) or 'V' (verb) function.

In [27]:
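def getWordOrder(section, find = frozenset(), wordList = False):
    '''
    for each clause in section, yield the clause_atom functions
    listed in find, concatenated in clause order (e.g. 'SV');
    with wordList = True, yield (functions, word nodes) instead
    '''
    for clause in L.d(section, otype = 'clause'):
        if clauseFilter(clause):
            phrases = L.d(clause, otype = 'clause_atom')
        else:
            phrases = keepPhrases(clause)
        # concatenate the functions we are looking for, in clause order
        clauseLvlFunctions = ''.join(F.function.v(ph) for ph in phrases
                                     if F.function.v(ph) in find)
        if not clauseLvlFunctions:
            continue # none of the requested functions in this clause
        if wordList:
            yield (clauseLvlFunctions, tuple(w for ph in phrases for w in L.d(ph, otype = 'word')))
        else:
            yield clauseLvlFunctions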
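
The heart of getWordOrder is the concatenation of function labels in clause order. As a minimal sketch without any Text-Fabric data, the hypothetical helper order_label below does the same on hand-written label lists; the three lists are the phrase functions printed for John 1.6 above.

```python
def order_label(functions, find):
    """Concatenate, in order, the function labels that occur in find."""
    return ''.join(f for f in functions if f in find)

find = {'S', 'V'}

# Phrase functions of the three John 1.6 clauses shown earlier
labels = [order_label(fs, find) for fs in (['V', 'S'], ['V', 'ADV'], ['S', 'ADV', 'P'])]
print(labels)
```

Only the first clause yields a label with both a subject and a verb; the other two contain just one of the functions we are looking for.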

First a test...

In [28]:
john1 = T.nodeFromSection(('John',1))

for result in list(getWordOrder(john1, find = {'S','V'}, wordList = True))[:5]:
    print(T.text(result[1]), '\n', result[0], '\n')
Ἐν ἀρχῇ ἦν ὁ λόγος,  
 S 

ὁ λόγος ἦν πρὸς τὸν θεόν,  
 S 

θεὸς ἦν ὁ λόγος.  
 S 

οὗτος ἦν ἐν ἀρχῇ πρὸς τὸν θεόν.  
 S 

πάντα δι’ αὐτοῦ ἐγένετο,  
 SV 

Now we can apply the function to the whole NT. We'll only keep results that contain both a subject and a verb (exactly 'SV' or 'VS'). This is a good opportunity to use Text-Fabric's built-in info() function, which we have not discussed yet.

In [29]:
# demo
info('Use info!')
  2.77s Use info!

Now for the loop!

In [33]:
wordOrderCounts = collections.defaultdict(collections.Counter)
find = {'S','V'}

for book in F.otype.s('book'):
    info('Processing word order for {}'.format(F.book.v(book)))
    for result in getWordOrder(book, find=find):
        if result in {'SV','VS'}:
            wordOrderCounts[F.book.v(book)][result] += 1
            wordOrderCounts[F.book.v(book)]['total'] += 1      

allresults = sum(counts['total'] for counts in wordOrderCounts.values())

print()
info('Word order search complete with {} results'.format(allresults))
 1m 47s Processing word order for matthew
 1m 48s Processing word order for mark
 1m 48s Processing word order for luke
 1m 49s Processing word order for john
 1m 50s Processing word order for acts
 1m 50s Processing word order for romans
 1m 51s Processing word order for 1corinthians
 1m 51s Processing word order for 2corinthians
 1m 51s Processing word order for galatians
 1m 51s Processing word order for ephesians
 1m 51s Processing word order for philippians
 1m 51s Processing word order for colossians
 1m 51s Processing word order for 1thessalonians
 1m 51s Processing word order for 2thessalonians
 1m 52s Processing word order for 1timothy
 1m 52s Processing word order for 2timothy
 1m 52s Processing word order for titus
 1m 52s Processing word order for philemon
 1m 52s Processing word order for hebrews
 1m 52s Processing word order for james
 1m 52s Processing word order for 1peter
 1m 52s Processing word order for 2peter
 1m 52s Processing word order for 1john
 1m 52s Processing word order for 2john
 1m 52s Processing word order for 3john
 1m 52s Processing word order for jude
 1m 52s Processing word order for revelation

 1m 53s Word order search complete with 7865 results

results

In [35]:
def percent(amount, total):
    return round((amount/total)*100,2)

SV_total = 0
VS_total = 0

# Table Header
print('{:>15}{:>14}{:>19}'.format('Book','SV','VS'))
print('-'*55)

for book in (F.book.v(b) for b in F.otype.s('book')): # follow the canonical order
    total = wordOrderCounts[book]['total']
    SV = wordOrderCounts[book]['SV']
    VS = wordOrderCounts[book]['VS']
    SV_total += SV
    VS_total += VS
    print('{:>15}   {:5}   {:5}%    {:5}   {:5}%'.format( book, SV, percent(SV,total), VS, percent(VS,total)))
print('{:>15}   {:5}   {:5}%    {:5}   {:5}%'.format( 'TOTAL', 
                                                     SV_total, percent(SV_total,allresults), 
                                                     VS_total, percent(VS_total,allresults)))
           Book            SV                 VS
-------------------------------------------------------
        matthew     797   66.53%      401   33.47%
           mark     470   68.02%      221   31.98%
           luke     747   62.56%      447   37.44%
           john     735   59.51%      500   40.49%
           acts     695   60.54%      453   39.46%
         romans     224   69.78%       97   30.22%
   1corinthians     281   74.14%       98   25.86%
   2corinthians     122   68.93%       55   31.07%
      galatians      69   68.32%       32   31.68%
      ephesians      34   60.71%       22   39.29%
    philippians      35   77.78%       10   22.22%
     colossians      24   66.67%       12   33.33%
 1thessalonians      37   72.55%       14   27.45%
 2thessalonians      18    50.0%       18    50.0%
       1timothy      44   77.19%       13   22.81%
       2timothy      33   64.71%       18   35.29%
          titus      15    62.5%        9    37.5%
       philemon       5   83.33%        1   16.67%
        hebrews     137   61.43%       86   38.57%
          james      63   69.23%       28   30.77%
         1peter      42   65.62%       22   34.38%
         2peter      44   83.02%        9   16.98%
          1john      93   84.55%       17   15.45%
          2john       6   66.67%        3   33.33%
          3john      10   76.92%        3   23.08%
           jude       9   56.25%        7   43.75%
     revelation     312    65.0%      168    35.0%
          TOTAL    5101   64.86%     2764   35.14%
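
As a quick arithmetic check, the percentages can be recomputed from the counts printed in the table. The sketch below repeats the percent() helper so that it runs on its own, and uses only numbers copied from the TOTAL and matthew rows.

```python
def percent(amount, total):
    return round((amount / total) * 100, 2)

# Counts copied from the TOTAL row of the table
sv_total, vs_total = 5101, 2764
all_results = sv_total + vs_total

print(all_results)                     # should match the 7865 results reported by the loop
print(percent(sv_total, all_results))  # SV share of all results
print(percent(vs_total, all_results))  # VS share of all results
print(percent(797, 797 + 401))         # SV share in matthew
```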