Structure ins and outs

See also the docs

In [1]:
%load_ext autoreload
%autoreload 2
In [2]:
import os

from tf.fabric import Fabric
In [3]:
GH_BASE = os.path.expanduser('~/github')
ORG = 'annotation'
REPO = 'banks'
FOLDER = 'tf'
TF_DIR = f'{GH_BASE}/{ORG}/{REPO}/{FOLDER}'

VERSION = '0.2'

TF_PATH = f'{TF_DIR}/{VERSION}'
TF = Fabric(locations=TF_PATH)
This is Text-Fabric 7.7.2
Api reference : https://annotation.github.io/text-fabric/Api/Fabric/

10 features found and 0 ignored

We ask for a list of all features:

In [4]:
allFeatures = TF.explore(silent=True, show=True)
loadableFeatures = allFeatures['nodes'] + allFeatures['edges']
loadableFeatures
Out[4]:
('author',
 'gap',
 'letters',
 'number',
 'otype',
 'punc',
 'terminator',
 'title',
 'oslots')

We load all features:

In [5]:
api = TF.load(loadableFeatures, silent=False)
T = api.T
F = api.F
  0.00s loading features ...
   |     0.00s B otype                from /Users/dirk/github/annotation/banks/tf/0.2
   |     0.00s B oslots               from /Users/dirk/github/annotation/banks/tf/0.2
   |     0.00s B title                from /Users/dirk/github/annotation/banks/tf/0.2
   |     0.00s B number               from /Users/dirk/github/annotation/banks/tf/0.2
   |     0.00s B letters              from /Users/dirk/github/annotation/banks/tf/0.2
   |     0.00s B punc                 from /Users/dirk/github/annotation/banks/tf/0.2
   |     0.00s B terminator           from /Users/dirk/github/annotation/banks/tf/0.2
   |     0.00s B author               from /Users/dirk/github/annotation/banks/tf/0.2
   |     0.00s B gap                  from /Users/dirk/github/annotation/banks/tf/0.2
  0.03s All features loaded/computed - for details use loadLog()

Look at the structure definition in the otext feature:

In [6]:
TF.features['otext'].metaData
Out[6]:
{'compiler': 'Dirk Roorda',
 'fmt:line-default': '{letters:XXX}{terminator} ',
 'fmt:line-term': 'line#{terminator} ',
 'fmt:text-orig-full': '{letters}{punc} ',
 'name': 'Culture quotes from Iain Banks',
 'purpose': 'exposition',
 'sectionFeatures': 'title,number,number',
 'sectionTypes': 'book,chapter,sentence',
 'source': 'Good Reads',
 'status': 'with for similarities in a separate module',
 'structureFeatures': 'title,number,number,number',
 'structureTypes': 'book,chapter,sentence,line',
 'url': 'https://www.goodreads.com/work/quotes/14366-consider-phlebas',
 'version': '0.2',
 'writtenBy': 'Text-Fabric',
 'dateWritten': '2019-05-13T10:20:06Z'}

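The fields structureTypes and structureFeatures define which node types count as structural elements and which feature supplies the heading of each. The two comma-separated lists are paired by position: the first type goes with the first feature, and so on.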
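
To make the pairing concrete, here is a minimal sketch in plain Python that splits the two fields and zips them together. The dictionary copies the two values from the metadata output above; the names `meta` and `levels` are illustrative and not part of Text-Fabric.

```python
# Values copied from the otext metadata shown above.
meta = {
    'structureTypes': 'book,chapter,sentence,line',
    'structureFeatures': 'title,number,number,number',
}

# Pair each structural node type with its heading feature, by position.
levels = list(zip(
    meta['structureTypes'].split(','),
    meta['structureFeatures'].split(','),
))

for (nodeType, feature) in levels:
    print(f'{nodeType:<10} {feature}')
```

With a loaded corpus, the same dictionary could be taken from `TF.features['otext'].metaData` instead of the literal used here.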

But we do not have to read this raw configuration ourselves; we can simply interrogate the T API:

In [7]:
T.structureInfo()
A heading is a tuple of pairs (node type, feature value)
	of node types and features that have been configured as structural elements
These 4 structural elements have been configured
	node type book       with heading feature title
	node type chapter    with heading feature number
	node type sentence   with heading feature number
	node type line       with heading feature number
You can get them as a tuple with T.headings.

Structure API:
	T.structure(node=None)       gives the structure below node, or everything if node is None
	T.structurePretty(node=None) prints the structure below node, or everything if node is None
	T.top()                      gives all top-level nodes
	T.up(node)                   gives the (immediate) parent node
	T.down(node)                 gives the (immediate) children nodes
	T.headingFromNode(node)      gives the heading of a node
	T.nodeFromHeading(heading)   gives the node of a heading
	T.ndFromHd                   complete mapping from headings to nodes
	T.hdFromNd                   complete mapping from nodes to headings
	T.hdMult are all headings    with their nodes that occur multiple times

There are 18 structural elements in the dataset.

Print the top-level nodes:

In [8]:
T.top()
Out[8]:
(100,)

Print the heading of the top-level node:

In [9]:
top = T.top()[0]
T.headingFromNode(top)
Out[9]:
(('book', 'Consider Phlebas'),)

Get the node from the heading:

In [10]:
T.nodeFromHeading((('book', 'Consider Phlebas'),))
Out[10]:
100

Go a level down:

In [11]:
level2 = T.down(top)
level2
Out[11]:
(101, 102)

and print their headings:

In [12]:
for l2 in level2:
    print(T.headingFromNode(l2))
(('book', 'Consider Phlebas'), ('chapter', 1))
(('book', 'Consider Phlebas'), ('chapter', 2))

Go a level up:

In [13]:
for l2 in level2:
    print(T.up(l2))
100
100
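
In the outputs so far, the heading of a chapter is the heading of its book with one more (node type, value) pair appended. The sketch below uses that observation to mimic a two-way lookup between nodes and headings in plain Python. It is not how Text-Fabric implements `T.hdFromNd` or `T.ndFromHd`, and the prefix relation is an observation about the three nodes shown here, not a documented guarantee. The names `headingOf` and `nodeOf` are illustrative.

```python
# Nodes and headings copied from the outputs above.
headingOf = {
    100: (('book', 'Consider Phlebas'),),
    101: (('book', 'Consider Phlebas'), ('chapter', 1)),
    102: (('book', 'Consider Phlebas'), ('chapter', 2)),
}

# Headings are tuples, so they are hashable and can serve as dictionary keys.
nodeOf = {heading: node for (node, heading) in headingOf.items()}

for node in (101, 102):
    parentHeading = headingOf[node][:-1]  # drop the last (type, value) pair
    print(node, '->', nodeOf[parentHeading])
```

For both chapters the lookup lands on node 100, which agrees with the result of `T.up` above.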

The complete structure of the corpus as a tuple:

In [14]:
T.structure()
Out[14]:
((100,
  ((101,
    ((115, ((103, ()), (104, ()), (105, ()), (106, ()))),
     (116, ((107, ()), (108, ()), (109, ()))))),
   (102, ((117, ((110, ()), (111, ()), (112, ()), (113, ()), (114, ()))),)))),)

The structure of the first chapter as a tuple:

In [15]:
T.structure(node=101)
Out[15]:
(101,
 ((115, ((103, ()), (104, ()), (105, ()), (106, ()))),
  (116, ((107, ()), (108, ()), (109, ())))))
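
Both results are built from nested (node, children) pairs: `T.structure()` gives a tuple of such pairs, while `T.structure(node=101)` gives a single pair. As a sketch of how such a result could be traversed, the generator below walks the tuple copied from Out[14] and yields every node with its depth. The function name `walk` is hypothetical, and the code needs no Text-Fabric installation.

```python
# Output of T.structure() copied from above: a tuple of (node, children) pairs.
structure = (
    (100, (
        (101, (
            (115, ((103, ()), (104, ()), (105, ()), (106, ()))),
            (116, ((107, ()), (108, ()), (109, ()))),
        )),
        (102, (
            (117, ((110, ()), (111, ()), (112, ()), (113, ()), (114, ()))),
        )),
    )),
)


def walk(pairs, depth=0):
    """Yield (node, depth) for every node, parents before children."""
    for (node, children) in pairs:
        yield (node, depth)
        yield from walk(children, depth + 1)


nodes = list(walk(structure))
print(len(nodes))  # 18, matching the count reported by T.structureInfo()

# Print an indented outline of node numbers.
for (node, depth) in nodes:
    print('  ' * depth + str(node))
```

To walk a single subtree such as the one in Out[15], wrap the pair in a one-element tuple first, for example `walk((subtree,))`.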

Pretty-print the structure of the first chapter:

In [16]:
print(T.structurePretty(node=101))
  chapter:1
      sentence:1
          line:1
          line:2
          line:3
          line:4
      sentence:2
          line:6
          line:7
          line:8

Pretty-print the complete structure:

In [17]:
print(T.structurePretty())
    book:Consider Phlebas
        chapter:1
            sentence:1
                line:1
                line:2
                line:3
                line:4
            sentence:2
                line:6
                line:7
                line:8
        chapter:2
            sentence:1
                line:1
                line:2
                line:3
                line:4
                line:5

Pretty-print the complete structure with full headings:

In [18]:
print(T.structurePretty(fullHeading=True))
    book:Consider Phlebas
        book:Consider Phlebas-chapter:1
            book:Consider Phlebas-chapter:1-sentence:1
                book:Consider Phlebas-chapter:1-sentence:1-line:1
                book:Consider Phlebas-chapter:1-sentence:1-line:2
                book:Consider Phlebas-chapter:1-sentence:1-line:3
                book:Consider Phlebas-chapter:1-sentence:1-line:4
            book:Consider Phlebas-chapter:1-sentence:2
                book:Consider Phlebas-chapter:1-sentence:2-line:6
                book:Consider Phlebas-chapter:1-sentence:2-line:7
                book:Consider Phlebas-chapter:1-sentence:2-line:8
        book:Consider Phlebas-chapter:2
            book:Consider Phlebas-chapter:2-sentence:1
                book:Consider Phlebas-chapter:2-sentence:1-line:1
                book:Consider Phlebas-chapter:2-sentence:1-line:2
                book:Consider Phlebas-chapter:2-sentence:1-line:3
                book:Consider Phlebas-chapter:2-sentence:1-line:4
                book:Consider Phlebas-chapter:2-sentence:1-line:5
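
The full headings above look like the (node type, value) pairs of a heading joined into one string. A minimal sketch that reproduces that format from a heading tuple follows; `formatHeading` is a hypothetical helper written for this illustration, not a Text-Fabric function, and the separators are inferred from the printed output.

```python
def formatHeading(heading):
    """Join (node type, value) pairs in the style of the fullHeading output."""
    return '-'.join(f'{nodeType}:{value}' for (nodeType, value) in heading)


# Heading tuple as returned by T.headingFromNode(101) earlier in this notebook.
heading = (('book', 'Consider Phlebas'), ('chapter', 1))
print(formatHeading(heading))  # book:Consider Phlebas-chapter:1
```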