Heads2TF

In this notebook, we produce three Text-Fabric features on the BHSA data using the get_heads method developed in getting_heads.ipynb. See that notebook for a detailed description of the motivation, the method, and the shortcomings of this data.

N.B. this data is experimental and a work in progress!

Production

Three features are produced herein:

  • heads.tf - an edge feature from a phrase (or phrase_atom) node to its head word and any words coordinated with that head.
  • prep_obj.tf - an edge feature from a preposition (the head word of a prepositional phrase) to its noun object.
  • noun_heads.tf - an edge feature from a phrase (or phrase_atom) node to its noun heads, regardless of whether the phrase is a prepositional phrase. For prepositional phrases these are the prepositional objects. "Noun" is meant loosely and includes adjectives and other parts of speech.
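To make the shape of these edge features concrete, here is a minimal sketch on hypothetical toy data. Each feature is a dict mapping a source node to a set of target nodes, which is the shape the production cell passes to `save_tf.save(edgeFeatures=...)`. The node numbers, `pdp` values, and the heads themselves are invented for illustration; the sketch does not call Text-Fabric or `get_heads`. It also accumulates noun heads per phrase, so that every object of a coordinated preposition is kept.

```python
from collections import defaultdict

# Hypothetical toy data, loosely modelled on "<object marker> the heavens
# and <object marker> the earth": word nodes 1-6 with a pdp value each.
pdp = {1: 'prep', 2: 'art', 3: 'subs', 4: 'conj', 5: 'prep', 6: 'subs'}
phrase_words = {100: (1, 2, 3, 4, 5, 6)}  # phrase node 100 and its words
phrase_typ = {100: 'PP'}
phrase_heads = {100: [1, 5]}  # assumed output of a get_heads-like routine

features = defaultdict(dict)

for phrase, heads in phrase_heads.items():
    features['heads'][phrase] = set(heads)
    if phrase_typ[phrase] != 'PP':
        # outside prepositional phrases, the noun heads are just the heads
        features['noun_heads'][phrase] = set(heads)
        continue
    words = set(phrase_words[phrase])
    for head in heads:
        # the object follows the preposition, skipping a single article
        obj = head + 2 if pdp.get(head + 1) == 'art' else head + 1
        if obj in words:
            features['prep_obj'][head] = {obj}
            # accumulate so that coordinated objects are all kept
            features['noun_heads'].setdefault(phrase, set()).add(obj)

for name in ('heads', 'prep_obj', 'noun_heads'):
    print(name, {node: sorted(targets) for node, targets in features[name].items()})
# heads {100: [1, 5]}
# prep_obj {1: [3], 5: [6]}
# noun_heads {100: [3, 6]}
```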

Export

Updates

06.11.18

Added a new feature, noun_heads, to pluck noun heads from both noun phrases and prepositional phrases.

23.10.18

New export for the updated 'c' version of the BHSA data.

21.04.18

A new function has been added to double-check phrase heads. Prepositional phrases whose objects are themselves prepositions resulted in some false heads being assigned. This is because prepositional objects receive no subphrase relations in BHSA, so they appeared to the algorithm as independent. An additional check is therefore required to make sure that such an embedded preposition is not taken as a head of its phrase. The new function, check_preposition, looks one word behind a candidate head noun (within the phrase boundaries) and accepts the candidate only if it is not immediately preceded by another preposition.
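The look-behind rule just described can be sketched as below. This is a hypothetical re-implementation for illustration, not the actual check_preposition from the heads module: it works on a plain tuple of word nodes and a dict of `pdp` values instead of the Text-Fabric API, and the toy phrase (a compound preposition such as "from" + "upon", then an article and a noun) is invented.

```python
def check_preposition_sketch(candidate, phrase_words, pdp):
    """Hypothetical version of the check_preposition idea.

    Accept `candidate` as a head unless the word immediately before it,
    inside the same phrase, is a preposition.
    """
    previous = candidate - 1
    if previous not in phrase_words:
        # candidate is the first word of the phrase: nothing precedes it
        return True
    return pdp.get(previous) != 'prep'


# Toy phrase: a compound preposition ("from" + "upon"), an article, a noun.
pdp = {10: 'prep', 11: 'prep', 12: 'art', 13: 'subs'}
phrase_words = (10, 11, 12, 13)

print(check_preposition_sketch(11, phrase_words, pdp))  # False: preceded by a preposition
print(check_preposition_sketch(13, phrase_words, pdp))  # True: preceded by the article
```

Staying inside the phrase boundaries matters: without that check, a phrase-initial preposition could be rejected because of a preposition ending the previous phrase.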

20.04.18

In discussion with Stephen Ku, I've decided to apply the quantifier algorithm (find_quantified) to prepositional objects so that we retrieve the head of the prepositional object's noun phrase rather than a quantifier. For good measure, I will also apply the attributed function, find_attributed (see getting_heads.ipynb for a description of both functions).
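In the production cell this refinement is written as `find_quantified(obj, api) or find_attributed(obj, api) or obj`: each helper is tried in turn, and the raw object is kept only if both return a falsy value. For example, in Genesis 1:21 the object of אֵת is reported as עֹוף rather than the quantifier כָּל. The sketch below reproduces only that fallback pattern, using hypothetical stand-in functions driven by invented lookup tables; the real helpers live in the heads module and their internals are not reproduced here.

```python
# Hypothetical stand-ins for find_quantified / find_attributed: each maps
# a word node to a "better" head, or returns None when its rule does not apply.
QUANTIFIED = {3: 4}  # e.g. word 3 is a quantifier ("all") governing noun 4
ATTRIBUTED = {7: 6}  # invented mapping, for illustration only


def find_quantified_stub(obj):
    return QUANTIFIED.get(obj)


def find_attributed_stub(obj):
    return ATTRIBUTED.get(obj)


def refine_object(obj):
    # Try each rule in order and fall back to the raw object node.
    # Safe only because Text-Fabric slot numbers start at 1: node 0
    # would be falsy and silently skipped by `or`.
    return find_quantified_stub(obj) or find_attributed_stub(obj) or obj


print(refine_object(3), refine_object(7), refine_object(5))  # 4 6 5
```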

In [1]:
import os, collections, random
from tf.fabric import Fabric
from tf.extra.bhsa import Bhsa
from heads import get_heads, find_quantified, find_attributed
In [2]:
# export heads.tf, prep_obj.tf & noun_heads.tf for all TF versions
for version in ['c', '2017', '2016']:
    
    print('processing version ', version, '\n')
    
    # load Text-Fabric and data
    TF = Fabric(locations='~/github/etcbc/bhsa/tf', modules=version)
    api = TF.load('''
                  book chapter verse
                  typ pdp rela mother 
                  function lex sp ls
                  ''')

    F, E, T, L = api.F, api.E, api.T, api.L # TF data methods
        
    # get heads
    heads_features = collections.defaultdict(dict)
    
    print('\nprocessing heads...')
    
    for phrase in list(F.otype.s('phrase')) + list(F.otype.s('phrase_atom')):
        
        heads = get_heads(phrase, api)
        
        if heads:
            heads_features['heads'][phrase] = set(heads)
            
        # make noun heads part 1
        if F.typ.v(phrase) != 'PP' and heads:    
            heads_features['noun_heads'][phrase] = set(heads)
            
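        # do prep objects and noun heads part 2
        if F.typ.v(phrase) == 'PP' and heads:
            phrase_bounds = set(L.d(phrase, 'word'))
            for head in heads:
                # the object follows the preposition, skipping a single article
                obj = head + 1 if F.pdp.v(head + 1) != 'art' else head + 2
                if obj in phrase_bounds:
                    # prefer the quantified or attributed noun over the raw object
                    obj = find_quantified(obj, api) or find_attributed(obj, api) or obj
                    heads_features['prep_obj'][head] = set([obj])
                    # accumulate, so coordinated prepositions keep all their objects
                    heads_features['noun_heads'].setdefault(phrase, set()).add(obj) # make noun heads part 2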
        
    # export TF data
    print('\nexporting TF...')
    meta = {
        '': {
            'created_by': 'Cody Kingham',
            'coreData': 'BHSA',
            'coreVersion': version,
        },
        'heads': {
            'source': 'see the notebook at https://github.com/etcbc/lingo/heads',
            'valueType': 'int',
            'edgeValues': False,
        },
        'prep_obj': {
            'source': 'see the notebook at https://github.com/etcbc/lingo/heads',
            'valueType': 'int',
            'edgeValues': False,
        },
        'noun_heads': {
            'source': 'see the notebook at https://github.com/etcbc/lingo/heads',
            'valueType': 'int',
            'edgeValues': False,
        },
    }

    save_tf = Fabric(locations='~/github/etcbc/lingo/heads/tf', modules=version, silent=True)
    save_api = save_tf.load('', silent=True)
    save_tf.save(nodeFeatures={}, edgeFeatures=heads_features, metaData=meta)
    
    print(f'\ndone with {version}')
processing version  c 

This is Text-Fabric 6.4.4
Api reference : https://dans-labs.github.io/text-fabric/Api/General/
Tutorial      : https://github.com/Dans-labs/text-fabric/blob/master/docs/tutorial.ipynb
Example data  : https://github.com/Dans-labs/text-fabric-data

114 features found and 0 ignored
  0.00s loading features ...
   |     0.01s B book                 from /Users/cody/github/etcbc/bhsa/tf/c
   |     0.00s B chapter              from /Users/cody/github/etcbc/bhsa/tf/c
   |     0.01s B verse                from /Users/cody/github/etcbc/bhsa/tf/c
   |     0.10s B lex                  from /Users/cody/github/etcbc/bhsa/tf/c
   |     0.18s B typ                  from /Users/cody/github/etcbc/bhsa/tf/c
   |     0.10s B pdp                  from /Users/cody/github/etcbc/bhsa/tf/c
   |     0.19s B rela                 from /Users/cody/github/etcbc/bhsa/tf/c
   |     0.17s B mother               from /Users/cody/github/etcbc/bhsa/tf/c
   |     0.06s B function             from /Users/cody/github/etcbc/bhsa/tf/c
   |     0.10s B sp                   from /Users/cody/github/etcbc/bhsa/tf/c
   |     0.10s B ls                   from /Users/cody/github/etcbc/bhsa/tf/c
  5.03s All features loaded/computed - for details use loadLog()

processing heads...
  0.00s Feature "otype" not available in
/Users/cody/github/etcbc/lingo/heads/tf/c
  0.00s Not all features could be loaded/computed
exporting TF...
   |     1.11s T heads                to /Users/cody/github/etcbc/lingo/heads/tf/c
   |     1.05s T noun_heads           to /Users/cody/github/etcbc/lingo/heads/tf/c
   |     0.14s T prep_obj             to /Users/cody/github/etcbc/lingo/heads/tf/c

done with c
processing version  2017 

This is Text-Fabric 6.4.4
Api reference : https://dans-labs.github.io/text-fabric/Api/General/
Tutorial      : https://github.com/Dans-labs/text-fabric/blob/master/docs/tutorial.ipynb
Example data  : https://github.com/Dans-labs/text-fabric-data

115 features found and 0 ignored
  0.00s loading features ...
   |     0.01s B book                 from /Users/cody/github/etcbc/bhsa/tf/2017
   |     0.00s B chapter              from /Users/cody/github/etcbc/bhsa/tf/2017
   |     0.01s B verse                from /Users/cody/github/etcbc/bhsa/tf/2017
   |     0.19s B typ                  from /Users/cody/github/etcbc/bhsa/tf/2017
   |     0.11s B pdp                  from /Users/cody/github/etcbc/bhsa/tf/2017
   |     0.19s B rela                 from /Users/cody/github/etcbc/bhsa/tf/2017
   |     0.12s B mother               from /Users/cody/github/etcbc/bhsa/tf/2017
   |     0.06s B function             from /Users/cody/github/etcbc/bhsa/tf/2017
   |     0.11s B lex                  from /Users/cody/github/etcbc/bhsa/tf/2017
   |     0.11s B sp                   from /Users/cody/github/etcbc/bhsa/tf/2017
   |     0.10s B ls                   from /Users/cody/github/etcbc/bhsa/tf/2017
  5.76s All features loaded/computed - for details use loadLog()

processing heads...
  0.00s Feature "otype" not available in
/Users/cody/github/etcbc/lingo/heads/tf/2017
  0.00s Not all features could be loaded/computed
exporting TF...
   |     1.13s T heads                to /Users/cody/github/etcbc/lingo/heads/tf/2017
   |     1.07s T noun_heads           to /Users/cody/github/etcbc/lingo/heads/tf/2017
   |     0.14s T prep_obj             to /Users/cody/github/etcbc/lingo/heads/tf/2017

done with 2017
processing version  2016 

This is Text-Fabric 6.4.4
Api reference : https://dans-labs.github.io/text-fabric/Api/General/
Tutorial      : https://github.com/Dans-labs/text-fabric/blob/master/docs/tutorial.ipynb
Example data  : https://github.com/Dans-labs/text-fabric-data

109 features found and 0 ignored
  0.00s loading features ...
   |     0.01s B book                 from /Users/cody/github/etcbc/bhsa/tf/2016
   |     0.00s B chapter              from /Users/cody/github/etcbc/bhsa/tf/2016
   |     0.00s B verse                from /Users/cody/github/etcbc/bhsa/tf/2016
   |     0.18s B typ                  from /Users/cody/github/etcbc/bhsa/tf/2016
   |     0.10s B pdp                  from /Users/cody/github/etcbc/bhsa/tf/2016
   |     0.18s B rela                 from /Users/cody/github/etcbc/bhsa/tf/2016
   |     0.63s B mother               from /Users/cody/github/etcbc/bhsa/tf/2016
   |     0.06s B function             from /Users/cody/github/etcbc/bhsa/tf/2016
   |     0.11s B lex                  from /Users/cody/github/etcbc/bhsa/tf/2016
   |     0.11s B sp                   from /Users/cody/github/etcbc/bhsa/tf/2016
   |     0.10s B ls                   from /Users/cody/github/etcbc/bhsa/tf/2016
  5.59s All features loaded/computed - for details use loadLog()

processing heads...
  0.00s Feature "otype" not available in
/Users/cody/github/etcbc/lingo/heads/tf/2016
  0.00s Not all features could be loaded/computed
exporting TF...
   |     1.14s T heads                to /Users/cody/github/etcbc/lingo/heads/tf/2016
   |     1.08s T noun_heads           to /Users/cody/github/etcbc/lingo/heads/tf/2016
   |     0.14s T prep_obj             to /Users/cody/github/etcbc/lingo/heads/tf/2016

done with 2016

Tests

In [3]:
data_locs = ['~/github/etcbc/bhsa/tf',
            '~/github/etcbc/lingo/heads/tf']

# load Text-Fabric and data
TF = Fabric(locations=data_locs, modules='c')

api = TF.load('''
              book chapter verse
              typ pdp rela mother 
              function lex sp ls
              heads prep_obj noun_heads
              ''')

F, E, T, L = api.F, api.E, api.T, api.L # TF data methods

B = Bhsa(api, name='Heads2TF', version='c')
This is Text-Fabric 6.4.4
Api reference : https://dans-labs.github.io/text-fabric/Api/General/
Tutorial      : https://github.com/Dans-labs/text-fabric/blob/master/docs/tutorial.ipynb
Example data  : https://github.com/Dans-labs/text-fabric-data

117 features found and 0 ignored
  0.00s loading features ...
   |     0.01s B book                 from /Users/cody/github/etcbc/bhsa/tf/c
   |     0.00s B chapter              from /Users/cody/github/etcbc/bhsa/tf/c
   |     0.01s B verse                from /Users/cody/github/etcbc/bhsa/tf/c
   |     0.10s B lex                  from /Users/cody/github/etcbc/bhsa/tf/c
   |     0.18s B typ                  from /Users/cody/github/etcbc/bhsa/tf/c
   |     0.10s B pdp                  from /Users/cody/github/etcbc/bhsa/tf/c
   |     0.18s B rela                 from /Users/cody/github/etcbc/bhsa/tf/c
   |     0.11s B mother               from /Users/cody/github/etcbc/bhsa/tf/c
   |     0.06s B function             from /Users/cody/github/etcbc/bhsa/tf/c
   |     0.11s B sp                   from /Users/cody/github/etcbc/bhsa/tf/c
   |     0.10s B ls                   from /Users/cody/github/etcbc/bhsa/tf/c
   |     2.09s T heads                from /Users/cody/github/etcbc/lingo/heads/tf/c
   |     0.20s T prep_obj             from /Users/cody/github/etcbc/lingo/heads/tf/c
   |     2.10s T noun_heads           from /Users/cody/github/etcbc/lingo/heads/tf/c
    11s All features loaded/computed - for details use loadLog()


noun_heads

In [4]:
results = B.search('''
phrase typ=PP
    -noun_heads> word
''')

B.show(results[:10])
  0.63s 45182 results

[B.show display of the first 10 results, covering Genesis 1:1-5: each verse is rendered as its clauses, phrases, and words, with the matched prepositional phrases and their noun heads highlighted. The rich display, including the Hebrew text, is not preserved in this plain-text export.]

prep_obj.tf

In [6]:
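test_prep = []

for ph in F.typ.s('PP'):
    heads = E.heads.f(ph)
    # collect the object of each prepositional head that has one
    objs = [E.prep_obj.f(prep)[0] for prep in heads if E.prep_obj.f(prep)]
    if objs:  # skip phrases without any recorded object
        test_prep.append(tuple(objs))

random.shuffle(test_prep)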
In [8]:
#B.show(test_prep[:50]) # uncomment me

See what the prepositional object looks like for Genesis 1:21:

In [9]:
gen_121_case = L.d(T.nodeFromSection(('Genesis', 1, 21)), 'phrase')[13]

print('example phrase', gen_121_case, 'phrase number 14 in verse')
print(T.text(L.d(gen_121_case, 'word')))

print('\nGen 1:21 phrase 14\'s heads, a preposition:')
heads = E.heads.f(gen_121_case)
print(T.text(heads))


print('\nGen 1:21 phrase 14\'s prepositional object:')
print(T.text(E.prep_obj.f(heads[0])))
example phrase 651768 phrase number 14 in verse
אֵ֨ת כָּל־עֹ֤וף כָּנָף֙ 

Gen 1:21 phrase 14's heads, a preposition:
אֵ֨ת 

Gen 1:21 phrase 14's prepositional object:
עֹ֤וף 

heads.tf

In [35]:
heads = [E.heads.f(ph) for ph in F.otype.s('phrase') if F.typ.v(ph) == 'NP']
random.shuffle(heads)
In [37]:
#B.show(heads[:50]) # uncomment me