%load_ext autoreload
%autoreload 2
import os
from tf.fabric import Fabric
GH_BASE = os.path.expanduser('~/github')
ORG = 'annotation'
REPO = 'banks'
FOLDER = 'tf'
TF_DIR = f'{GH_BASE}/{ORG}/{REPO}/{FOLDER}'
VERSION = '0.2'
TF_PATH = f'{TF_DIR}/{VERSION}'
TF = Fabric(locations=TF_PATH)
This is Text-Fabric 7.7.2
Api reference : https://annotation.github.io/text-fabric/Api/Fabric/

10 features found and 0 ignored
We ask for a list of all features:
allFeatures = TF.explore(silent=True, show=True)
loadableFeatures = allFeatures['nodes'] + allFeatures['edges']
loadableFeatures
('author', 'gap', 'letters', 'number', 'otype', 'punc', 'terminator', 'title', 'oslots')
We load all features:
api = TF.load(loadableFeatures, silent=False)
T = api.T
F = api.F
  0.00s loading features ...
   |     0.00s B otype from /Users/dirk/github/annotation/banks/tf/0.2
   |     0.00s B oslots from /Users/dirk/github/annotation/banks/tf/0.2
   |     0.00s B title from /Users/dirk/github/annotation/banks/tf/0.2
   |     0.00s B number from /Users/dirk/github/annotation/banks/tf/0.2
   |     0.00s B letters from /Users/dirk/github/annotation/banks/tf/0.2
   |     0.00s B punc from /Users/dirk/github/annotation/banks/tf/0.2
   |     0.00s B terminator from /Users/dirk/github/annotation/banks/tf/0.2
   |     0.00s B author from /Users/dirk/github/annotation/banks/tf/0.2
   |     0.00s B gap from /Users/dirk/github/annotation/banks/tf/0.2
  0.03s All features loaded/computed - for details use loadLog()
Look at the structure definition in the otext feature:
TF.features['otext'].metaData
{'compiler': 'Dirk Roorda',
 'fmt:line-default': '{letters:XXX}{terminator} ',
 'fmt:line-term': 'line#{terminator} ',
 'fmt:text-orig-full': '{letters}{punc} ',
 'name': 'Culture quotes from Iain Banks',
 'purpose': 'exposition',
 'sectionFeatures': 'title,number,number',
 'sectionTypes': 'book,chapter,sentence',
 'source': 'Good Reads',
 'status': 'with for similarities in a separate module',
 'structureFeatures': 'title,number,number,number',
 'structureTypes': 'book,chapter,sentence,line',
 'url': 'https://www.goodreads.com/work/quotes/14366-consider-phlebas',
 'version': '0.2',
 'writtenBy': 'Text-Fabric',
 'dateWritten': '2019-05-13T10:20:06Z'}
The fields structureTypes and structureFeatures define the node types that correspond to structural elements and their heading features.
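To make the correspondence concrete, here is a minimal sketch in plain Python that pairs the two comma-separated fields position by position. The strings are copied from the metadata above; the variable names are only illustrative and this is not how Text-Fabric itself reads the configuration.

```python
# Values copied from the otext metadata shown above.
meta = {
    "structureTypes": "book,chapter,sentence,line",
    "structureFeatures": "title,number,number,number",
}

# Pair each structural node type with its heading feature, in order.
types = meta["structureTypes"].split(",")
features = meta["structureFeatures"].split(",")
pairs = list(zip(types, features))

for nodeType, feature in pairs:
    print(f"node type {nodeType} with heading feature {feature}")
```

The two lists must have the same length for the pairing to make sense, which is the case here: four types and four features.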
But we do not have to inspect this raw configuration ourselves; we can simply interrogate the T API:
T.structureInfo()
A heading is a tuple of pairs (node type, feature value)
of node types and features that have been configured as structural elements
These 4 structural elements have been configured
node type book with heading feature title
node type chapter with heading feature number
node type sentence with heading feature number
node type line with heading feature number
You can get them as a tuple with T.headings.

Structure API:
T.structure(node=None) gives the structure below node, or everything if node is None
T.structurePretty(node=None) prints the structure below node, or everything if node is None
T.top() gives all top-level nodes
T.up(node) gives the (immediate) parent node
T.down(node) gives the (immediate) children nodes
T.headingFromNode(node) gives the heading of a node
T.nodeFromHeading(heading) gives the node of a heading
T.ndFromHd complete mapping from headings to nodes
T.hdFromNd complete mapping from nodes to headings
T.hdMult are all headings with their nodes that occur multiple times

There are 18 structural elements in the dataset.
Print the top-level nodes:
T.top()
(100,)
Print the heading of the top-level node:
top = T.top()[0]
T.headingFromNode(top)
(('book', 'Consider Phlebas'),)
Get the node from the heading:
T.nodeFromHeading((('book', 'Consider Phlebas'),))
100
Go a level down:
level2 = T.down(top)
level2
(101, 102)
and print their headings:
for l2 in level2:
    print(T.headingFromNode(l2))
(('book', 'Consider Phlebas'),)
(('book', 'Consider Phlebas'), ('chapter', 1))
(('book', 'Consider Phlebas'), ('chapter', 2))
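Headings and nodes map onto each other in both directions. The sketch below imitates that with two plain dictionaries, filled with the three headings seen so far. The names hdFromNd and ndFromHd are borrowed from the structure API listing, but these are ordinary dicts built by hand for illustration, not the objects that Text-Fabric provides.

```python
# Nodes and headings copied from the outputs shown above.
hdFromNd = {
    100: (("book", "Consider Phlebas"),),
    101: (("book", "Consider Phlebas"), ("chapter", 1)),
    102: (("book", "Consider Phlebas"), ("chapter", 2)),
}

# Invert the mapping: headings are tuples, so they can serve as dict keys.
ndFromHd = {heading: node for (node, heading) in hdFromNd.items()}

# Going from node to heading and back should return the same node.
roundTrips = all(ndFromHd[hdFromNd[node]] == node for node in hdFromNd)
print(roundTrips)
```

Note that the heading of a chapter starts with the heading of its book: a heading records the whole path from the top-level node down.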
Go a level up:
for l2 in level2:
    print(T.up(l2))
100
100
The complete structure of the corpus as a tuple:
T.structure()
((100, ((101, ((115, ((103, ()), (104, ()), (105, ()), (106, ()))), (116, ((107, ()), (108, ()), (109, ()))))), (102, ((117, ((110, ()), (111, ()), (112, ()), (113, ()), (114, ()))),)))),)
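Each element of this tuple is a pair (node, children), where children is again a tuple of such pairs. As a rough sketch of how to work with that shape in plain Python, the code below walks the tuple recursively and derives a parent and a children mapping from it. The helper walk and the dictionaries parentOf and childrenOf are hypothetical names for this illustration; inside Text-Fabric you would simply use T.up and T.down.

```python
# The tuple printed by T.structure() above: each element is (node, children).
structure = (
    (100, (
        (101, (
            (115, ((103, ()), (104, ()), (105, ()), (106, ()))),
            (116, ((107, ()), (108, ()), (109, ()))),
        )),
        (102, (
            (117, ((110, ()), (111, ()), (112, ()), (113, ()), (114, ()))),
        )),
    )),
)


def walk(items, parent=None):
    """Yield (node, parent) for every node in a nested (node, children) tuple."""
    for node, children in items:
        yield node, parent
        yield from walk(children, parent=node)


# Parent of every node; top-level nodes get None.
parentOf = dict(walk(structure))

# Children of every node, in document order.
childrenOf = {}
for node, parent in parentOf.items():
    childrenOf.setdefault(parent, []).append(node)

print(len(parentOf))    # 18
print(parentOf[101])    # 100
print(childrenOf[100])  # [101, 102]
```

The count of 18 agrees with the number of structural elements reported by T.structureInfo().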
The structure of the first chapter as a tuple:
T.structure(node=101)
(101, ((115, ((103, ()), (104, ()), (105, ()), (106, ()))), (116, ((107, ()), (108, ()), (109, ())))))
Pretty-print the structure of the first chapter:
print(T.structurePretty(node=101))
chapter:1
  sentence:1
    line:1
    line:2
    line:3
    line:4
  sentence:2
    line:6
    line:7
    line:8
Pretty-print the complete structure:
print(T.structurePretty())
book:Consider Phlebas
  chapter:1
    sentence:1
      line:1
      line:2
      line:3
      line:4
    sentence:2
      line:6
      line:7
      line:8
  chapter:2
    sentence:1
      line:1
      line:2
      line:3
      line:4
      line:5
Pretty-print the complete structure with full headings:
print(T.structurePretty(fullHeading=True))
book:Consider Phlebas
book:Consider Phlebas-chapter:1
book:Consider Phlebas-chapter:1-sentence:1
book:Consider Phlebas-chapter:1-sentence:1-line:1
book:Consider Phlebas-chapter:1-sentence:1-line:2
book:Consider Phlebas-chapter:1-sentence:1-line:3
book:Consider Phlebas-chapter:1-sentence:1-line:4
book:Consider Phlebas-chapter:1-sentence:2
book:Consider Phlebas-chapter:1-sentence:2-line:6
book:Consider Phlebas-chapter:1-sentence:2-line:7
book:Consider Phlebas-chapter:1-sentence:2-line:8
book:Consider Phlebas-chapter:2
book:Consider Phlebas-chapter:2-sentence:1
book:Consider Phlebas-chapter:2-sentence:1-line:1
book:Consider Phlebas-chapter:2-sentence:1-line:2
book:Consider Phlebas-chapter:2-sentence:1-line:3
book:Consider Phlebas-chapter:2-sentence:1-line:4
book:Consider Phlebas-chapter:2-sentence:1-line:5
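The full headings above look like the heading tuples joined into one string: each pair becomes type:value and the pairs are separated by hyphens. The sketch below reproduces that format by hand. The function formatHeading is a hypothetical helper written for this illustration, not part of the Text-Fabric API, and the real implementation may differ.

```python
def formatHeading(heading):
    """Join (node type, value) pairs as type:value, separated by hyphens."""
    return "-".join(f"{nodeType}:{value}" for (nodeType, value) in heading)


# A heading as returned by T.headingFromNode for the first chapter.
heading = (("book", "Consider Phlebas"), ("chapter", 1))

# Full heading: every pair from the top-level node down.
print(formatHeading(heading))       # book:Consider Phlebas-chapter:1

# Short heading: only the last pair, as in the default pretty-print.
print(formatHeading(heading[-1:]))  # chapter:1
```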