You may want to work through the start tutorial first.

Short introductions to other TF datasets:

Sharing data features

Explore additional data

The ETCBC has a few other repositories with data that works in conjunction with the BHSA data. One of them you have already seen: phono, for phonetic transcriptions. There is also parallels for detecting parallel passages, and valence for studying patterns around verbs that determine their meanings.

Make your own data

If you study the additional data, you can observe how that data is created and how it is turned into a Text-Fabric data module. The last step is remarkably easy: any Python dictionary whose keys are node numbers and whose values are strings or numbers can be written out as a Text-Fabric feature. When you create data, you have already constructed such dictionaries, so writing them out takes just one method call. See for example how the flowchart notebook in valence writes out verb sense data.
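For illustration, here is a minimal sketch of that one method call. The node numbers and sense values below are made up, and the Fabric call itself is shown in comments rather than executed, since it writes files to disk:

```python
# A Text-Fabric node feature is just a dict mapping node numbers to values.
# Suppose you have computed sense labels for a few word nodes (made-up data):
senses = {1437: "d-", 1438: "--", 1450: "k."}

# Every feature needs a bit of metadata, at least a valueType:
metaData = {
    "sense": {
        "valueType": "str",
        "description": "verbal sense labels (hypothetical sample)",
    }
}

# Writing the feature out is then one method call (sketch, not executed here):
# from tf.fabric import Fabric
# TF = Fabric(locations="~/my-data/tf")
# TF.save(nodeFeatures={"sense": senses}, metaData=metaData)

print(sorted(senses))  # the annotated node numbers
```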

Share your new data

You can then easily share your new features on GitHub, so that your colleagues everywhere can try it out for themselves.

Here is how you draw in other data.

You can add such data on the fly, by passing a mod={org}/{repo}/{path} parameter, or a bunch of them separated by commas.

If the data is there, it will be auto-downloaded and stored on your machine.
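To see what such a parameter looks like, here is a sketch of how a comma-separated mod string decomposes into (org, repo, path) triples; the splitting logic is illustrative, not Text-Fabric's actual code:

```python
# A mod specifier has the shape {org}/{repo}/{path}; several of them
# are joined by commas. Conceptually they decompose like this:
modSpec = (
    'etcbc/valence/tf,'
    'etcbc/lingo/heads/tf,'
    'ch-jensen/Semantic-mapping-of-participants/actor/tf'
)
# Split on the first two slashes only: the path part may itself contain slashes.
triples = [spec.split('/', 2) for spec in modSpec.split(',')]
for org, repo, path in triples:
    print(org, repo, path)
```

Note that the path of the second module, `heads/tf`, keeps its internal slash: only the first two slashes separate org and repo.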

Let's do it.

In [1]:
%load_ext autoreload
%autoreload 2

Incantation

The ins and outs of installing Text-Fabric, getting the corpus, and initializing a notebook are explained in the start tutorial.

In [2]:
from tf.app import use
In [3]:
A = use(
    'bhsa',
    mod=(
        'etcbc/valence/tf,'
        'etcbc/lingo/heads/tf,'
        'ch-jensen/Semantic-mapping-of-participants/actor/tf'
    ),
    hoist=globals(),
)
rate limit is 5000 requests per hour, with 4820 left for this hour
	connecting to online GitHub repo annotation/app-bhsa ... connected
Using TF-app in /Users/dirk/text-fabric-data/annotation/app-bhsa/code:
	rv2.0.0=#7b3b9ffba7ee6dbc76a52b8d76475d17babf0daf (latest release)
rate limit is 5000 requests per hour, with 4815 left for this hour
	connecting to online GitHub repo etcbc/bhsa ... connected
Using data in /Users/dirk/text-fabric-data/etcbc/bhsa/tf/c:
	rv1.6 (latest release)
rate limit is 5000 requests per hour, with 4810 left for this hour
	connecting to online GitHub repo etcbc/phono ... connected
Using data in /Users/dirk/text-fabric-data/etcbc/phono/tf/c:
	r1.2 (latest release)
rate limit is 5000 requests per hour, with 4805 left for this hour
	connecting to online GitHub repo etcbc/parallels ... connected
Using data in /Users/dirk/text-fabric-data/etcbc/parallels/tf/c:
	r1.2 (latest release)
rate limit is 5000 requests per hour, with 4800 left for this hour
	connecting to online GitHub repo etcbc/valence ... connected
Using data in /Users/dirk/text-fabric-data/etcbc/valence/tf/c:
	r1.1=#e59d91795bd1ebd83a3453d37ad58a7f3ba7d675 (latest release)
rate limit is 5000 requests per hour, with 4795 left for this hour
	connecting to online GitHub repo etcbc/lingo ... connected
Using data in /Users/dirk/text-fabric-data/etcbc/lingo/heads/tf/c:
	r0.1=#6e632909160ab8f4486d6c6e88dfc522ea2b4de3 (latest release)
rate limit is 5000 requests per hour, with 4790 left for this hour
	connecting to online GitHub repo ch-jensen/Semantic-mapping-of-participants ... connected
Using data in /Users/dirk/text-fabric-data/ch-jensen/Semantic-mapping-of-participants/actor/tf/c:
	r1.7=#1c17398f92c0836c06de5e1798687c3fa18133cf (latest release)
   |     0.00s Dataset without structure sections in otext:no structure functions in the T-API

You see that the features from the etcbc/valence/tf, etcbc/lingo/heads/tf, and ch-jensen/Semantic-mapping-of-participants/actor/tf modules have been added to the mix.

If you want to check for data updates, you can add a check=True argument.

Note that edge features are in bold italic.

sense from valence

Let's find out about sense.

In [4]:
F.sense.freqList()
Out[4]:
(('--', 17999),
 ('d-', 9979),
 ('-p', 6193),
 ('-c', 4250),
 ('-i', 2869),
 ('dp', 1853),
 ('dc', 1073),
 ('di', 889),
 ('l.', 876),
 ('i.', 629),
 ('n.', 533),
 ('-b', 66),
 ('db', 61),
 ('c.', 57),
 ('k.', 54))
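As a sanity check (pure Python, using the frequencies printed above): freqList() returns (value, frequency) pairs sorted by descending frequency, and the counts sum to the total number of words that carry a sense value:

```python
# The (value, frequency) pairs as printed by F.sense.freqList() above:
freqs = (
    ('--', 17999), ('d-', 9979), ('-p', 6193), ('-c', 4250), ('-i', 2869),
    ('dp', 1853), ('dc', 1073), ('di', 889), ('l.', 876), ('i.', 629),
    ('n.', 533), ('-b', 66), ('db', 61), ('c.', 57), ('k.', 54),
)
total = sum(n for (value, n) in freqs)
print(total)  # 47381: the number of words with a sense value
```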

Which nodes have a sense feature?

In [5]:
{F.otype.v(n) for n in N() if F.sense.v(n)}
Out[5]:
{'word'}
In [6]:
results = A.search('''
word sense
''')
  0.32s 47381 results

Let's show some of the rarer sense values:

In [7]:
results = A.search('''
word sense=k.
''')
  0.39s 54 results
In [8]:
A.table(results, end=5)
n | p | word
1 | Genesis 4:17 | יִּקְרָא֙
2 | Genesis 13:16 | שַׂמְתִּ֥י
3 | Genesis 32:13 | שַׂמְתִּ֤י
4 | Genesis 34:31 | יַעֲשֶׂ֖ה
5 | Genesis 48:20 | יְשִֽׂמְךָ֣

If we do a pretty display, the sense feature shows up.

In [9]:
A.show(results, start=1, end=1, withNodes=True)

result 1

Genesis 4:17 
verse:1414450
sentence:1172573
clause:427940
phrase:652698
1943 וַ
phrase:652699
sense=d-
phrase:652700
phrase:652701
sentence:1172574
clause:427941
phrase:652702
1948 וַ
phrase:652703
sense=--
sentence:1172575
clause:427942
phrase:652704
1950 וַ
phrase:652705
sense=d-
phrase:652706
sentence:1172576
clause:427943
phrase:652707
1954 וַֽ
phrase:652708
sense=--
clause:427944
phrase:652709
sense=d-
phrase:652710
sentence:1172577
clause:427945
phrase:652711
1958 וַ
phrase:652712
sense=k.
phrase:652713
phrase:652714

actor from semantic

Let's find out about actor.

In [10]:
fl = F.actor.freqList()
len(fl)
Out[10]:
415
In [11]:
fl[0:10]
Out[11]:
(('JHWH', 358),
 ('BN JFR>L', 205),
 ('>JC', 101),
 ('2sm"YOUSgmas"', 67),
 ('MCH', 60),
 ('>RY', 58),
 ('>TM', 45),
 ('>X "YOUSgmas"', 36),
 ('JFR>L', 35),
 ('KHN', 33))

Which nodes have an actor feature?

In [12]:
{F.otype.v(n) for n in N() if F.actor.v(n)}
Out[12]:
{'phrase_atom', 'subphrase'}
In [13]:
results = A.search('''
phrase_atom actor
''')
  0.17s 2062 results

Let's show some of the rarer actor values:

In [14]:
results = A.search('''
phrase_atom actor=KHN
''')
  0.25s 30 results
In [15]:
A.table(results)
n | p | phrase_atom
1 | Leviticus 17:5 | אֶל־הַכֹּהֵ֑ן
2 | Leviticus 17:6 | זָרַ֨ק
3 | Leviticus 17:6 | הַכֹּהֵ֤ן
4 | Leviticus 17:6 | הִקְטִ֣יר
5 | Leviticus 19:22 | כִפֶּר֩
6 | Leviticus 19:22 | הַכֹּהֵ֜ן
7 | Leviticus 21:1 | אֶל־הַכֹּהֲנִ֖ים
8 | Leviticus 21:1 | בְּנֵ֣י אַהֲרֹ֑ן
9 | Leviticus 21:5 | יִקְרְח֤וּ
10 | Leviticus 21:5 | יְגַלֵּ֑חוּ
11 | Leviticus 21:5 | יִשְׂרְט֖וּ
12 | Leviticus 21:6 | קְדֹשִׁ֤ים
13 | Leviticus 21:6 | יִהְיוּ֙
14 | Leviticus 21:6 | יְחַלְּל֔וּ
15 | Leviticus 21:6 | הֵ֥ם
16 | Leviticus 21:6 | מַקְרִיבִ֖ם
17 | Leviticus 21:6 | הָ֥יוּ
18 | Leviticus 21:6 | קֹֽדֶשׁ׃
19 | Leviticus 21:7 | יִקָּ֔חוּ
20 | Leviticus 21:7 | יִקָּ֑חוּ
21 | Leviticus 22:11 | כֹהֵ֗ן
22 | Leviticus 22:11 | יִקְנֶ֥ה
23 | Leviticus 22:14 | לַכֹּהֵ֖ן
24 | Leviticus 23:10 | אֶל־הַכֹּהֵֽן׃
25 | Leviticus 23:11 | הֵנִ֧יף
26 | Leviticus 23:11 | יְנִיפֶ֖נּוּ
27 | Leviticus 23:11 | הַכֹּהֵֽן׃
28 | Leviticus 23:20 | הֵנִ֣יף
29 | Leviticus 23:20 | הַכֹּהֵ֣ן׀
30 | Leviticus 23:20 | לַכֹּהֵֽן׃

heads from lingo

Now, heads is an edge feature: we cannot make it directly visible in pretty displays, but we can use it in queries.

We also want to make the feature sense visible, so we mention the feature in the query, without restricting the results.

In [17]:
results = A.search('''
book book=Genesis
  chapter chapter=1
    clause
      phrase
      -heads> word sense*
'''
)
  1.08s 402 results

We make the feature sense visible:

In [18]:
A.show(results, start=1, end=3, withNodes=True, skipCols="1 2")

result 1

Genesis 1:1 
verse:1414354
book=Genesischapter=1
sentence:1172290
clause:427553
phrase:651542
phrase:651543
sense=d-
phrase:651544

result 2

Genesis 1:1 
verse:1414354
book=Genesischapter=1
sentence:1172290
clause:427553
phrase:651542
phrase:651543
sense=d-
phrase:651544

result 3

Genesis 1:1 
verse:1414354
book=Genesischapter=1
sentence:1172290
clause:427553
phrase:651542
phrase:651543
sense=d-
phrase:651544

Note how the words that are heads of their phrases are highlighted within their phrases.
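Besides its use in search templates, an edge feature can also be read off programmatically through the E API: with the corpus loaded, E.heads.f(phraseNode) yields the nodes the edge points to from that phrase. The following sketch mimics that behavior with a plain dict and made-up node numbers, since running the real call requires the loaded corpus:

```python
# With the corpus loaded you would write (sketch, not executed here):
#   headWords = E.heads.f(phraseNode)   # nodes reached via a heads edge
# Here we mimic that lookup with a plain dict (made-up node numbers):
headsEdge = {651543: (4,), 651544: (7, 8)}

def heads_of(phrase_node):
    """Return the head word nodes of a phrase, as E.heads.f would."""
    return headsEdge.get(phrase_node, ())

print(heads_of(651543))   # a one-element tuple of word nodes
print(heads_of(999999))   # empty tuple: no heads edge from this node
```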

All together!

Here is a query that shows results with all features.

In [19]:
results = A.search('''
book book=Leviticus
  phrase sense*
    phrase_atom actor=KHN
  -heads> word
''')
  0.87s 30 results
In [20]:
A.displaySetup(condensed=True, condenseType='verse')
A.show(results, start=8, end=8)
A.displaySetup()

verse 8

Leviticus 22:11 
verse
book=Leviticus
sentence
clause
phrase
clause
phrase
sentence
clause

All steps

  • start your first step in mastering the bible computationally
  • display become an expert in creating pretty displays of your text structures
  • search turbo charge your hand-coding with search templates
  • exportExcel make tailor-made spreadsheets out of your results
  • share draw in other people's data and let them use yours
  • export export your dataset as an Emdros database

CC-BY Dirk Roorda