You might want to consider the start of this tutorial.

In [1]:
%load_ext autoreload
%autoreload 2
In [2]:
from tf.app import use
In [3]:
VERSION = '2017'
In [4]:
A = use('bhsa', hoist=globals(), version=VERSION)
	connecting to online GitHub repo annotation/app-bhsa ... connected
Using TF-app in /Users/dirk/text-fabric-data/annotation/app-bhsa/code:
	#d3cf8f0c2ab5d690a0fda14ea31c33da5c5c8483 (latest commit)
	connecting to online GitHub repo etcbc/bhsa ... connected
Using data in /Users/dirk/text-fabric-data/etcbc/bhsa/tf/2017:
	rv1.6 (latest release)
	connecting to online GitHub repo etcbc/phono ... connected
Using data in /Users/dirk/text-fabric-data/etcbc/phono/tf/2017:
	r1.2=#1ac68e976ee4a7f23eb6bb4c6f401a033d0ec169 (latest release)
	connecting to online GitHub repo etcbc/parallels ... connected
Using data in /Users/dirk/text-fabric-data/etcbc/parallels/tf/2017:
	r1.2=#395dfe2cb69c261862fab9f0289e594a52121d5c (latest release)

Gaps and spans

Searches often do not deliver the results you expect. Besides typos, lack of familiarity with the template formalism and bugs in the system, there is another cause: difficult semantics of the data.

Most users reason about phrases, clauses and sentences as if they are consecutive blocks of words. But in the BHSA this is not the case: each of these objects may have gaps.

Most of the time, verse boundaries coincide with the boundaries of sentences, clauses, and phrases. But not always: there are verse-spanning sentences.

Note: These phenomena may wreak havoc with your intuitive reasoning about what search templates should deliver. Query templates do not require the objects to be consecutive, and they still make sense. But that might not be the sense you expect, unless you Mind the gap!

We are going to show these issues in depth.

Gaps

TF-search has no primitives to deal with gaps directly. Nodes correspond to textual objects such as words, phrases, clauses, verses, books. Usually these are consecutive sequences of one or more words, but in theory they can be arbitrary sets of slots.

And, as far as the BHSA corpus is concerned, in practice too. If we look at phrases, the overwhelming majority are consecutive, without gaps. But there is also a substantial number of phrases with gaps.

People who are familiar with MQL (see fromMQL) may remember that in MQL you can search for a gap. The MQL query

SELECT ALL OBJECTS WHERE

[phrase FOCUS
    [word lex='L']
    [gap]
]

looks for a phrase with a gap in it (i.e. one or more consecutive words between the start and the end of the phrase that do not belong to the phrase). The query then asks additionally for those gap-containing phrases that have a certain word in front of the gap.
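
To make this definition of a gap concrete, here is a minimal plain-Python sketch. It assumes that a phrase is given simply as a collection of slot numbers (word positions). The helper name `find_gaps` and the toy data are illustrative assumptions; they are not part of Text-Fabric or MQL.

```python
def find_gaps(slots):
    """Return the gaps in a set of slots as (first, last) pairs of missing slots.

    A gap is a maximal run of consecutive slots that lies strictly between
    the first and the last slot of the object but does not belong to it.
    """
    ordered = sorted(slots)
    gaps = []
    # Compare each slot with its successor; a jump of more than 1 is a gap.
    for prev, nxt in zip(ordered, ordered[1:]):
        if nxt - prev > 1:
            gaps.append((prev + 1, nxt - 1))
    return gaps


# Toy phrase occupying slots 10, 11, 14 and 17 (hypothetical data).
toy_phrase = {10, 11, 14, 17}
toy_gaps = find_gaps(toy_phrase)
print(toy_gaps)
```

A phrase without gaps yields an empty list, so the same helper can also be used as a yes/no test.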

We want this too!

Find the gap

We start with a query that aims to get the same results as the MQL query above.

In our template, we require a word wPreGap in the phrase that comes just before the gap, and a word wGap that comes right after it; wGap is in the gap and hence does not belong to the phrase. All of this must happen before the last word wLast of the phrase.
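
The intended meaning of these requirements can be sketched in plain Python on toy data. The sketch assumes a phrase is a set of slot numbers and that `lex` maps a slot to its lexeme; the function name `pre_gap_matches` and all data are hypothetical. It only illustrates the conditions and is not how Text-Fabric evaluates a template.

```python
def pre_gap_matches(phrase_slots, lex, wanted_lex=None):
    """List (wPreGap, wLast, wGap) triples for one phrase given as slot numbers."""
    slots = set(phrase_slots)
    w_last = max(slots)                 # the last word of the phrase
    matches = []
    for w_pre in sorted(slots):         # wPreGap is a word of the phrase
        w_gap = w_pre + 1               # wGap comes right after wPreGap
        if w_gap in slots:              # wGap must not belong to the phrase
            continue
        if not w_gap < w_last:          # wGap must come before wLast
            continue
        if wanted_lex is not None and lex.get(w_pre) != wanted_lex:
            continue                    # optional lexical condition on wPreGap
        matches.append((w_pre, w_last, w_gap))
    return matches


# Toy phrase with two gaps (slot 11 and slot 14); lexemes are made up.
toy_slots = {10, 12, 13, 15}
toy_lex = {10: "L", 12: "W", 13: "BJN/", 15: "ZR</"}
with_lex = pre_gap_matches(toy_slots, toy_lex, wanted_lex="L")
without_lex = pre_gap_matches(toy_slots, toy_lex)
print(with_lex)
print(without_lex)
```

Dropping the lexical condition yields one triple per gap, which is why a phrase with several gaps can show up several times in the results.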

In [65]:
query = '''
verse
    p:phrase
      wPreGap:word lex=L
      wLast:word
      :=

wGap:word
wPreGap <: wGap
wGap < wLast
p || wGap
'''
In [66]:
results = A.search(query)
  1.06s 13 results

Nice and quick. Let's see the results.

In [67]:
A.table(results)
n  p  verse  phrase  word  word  word
1 Genesis 17:7וַהֲקִמֹתִ֨י אֶת־בְּרִיתִ֜י בֵּינִ֣י וּבֵינֶ֗ךָ וּבֵ֨ין זַרְעֲךָ֧ אַחֲרֶ֛יךָ לְדֹרֹתָ֖ם לִבְרִ֣ית עֹולָ֑ם לִהְיֹ֤ות לְךָ֙ לֵֽאלֹהִ֔ים וּֽלְזַרְעֲךָ֖ אַחֲרֶֽיךָ׃ לְךָ֙ וּֽלְזַרְעֲךָ֖ אַחֲרֶֽיךָ׃ לְךָ֙ אַחֲרֶֽיךָ׃ לֵֽ
2 Genesis 28:4וְיִֽתֶּן־לְךָ֙ אֶת־בִּרְכַּ֣ת אַבְרָהָ֔ם לְךָ֖ וּלְזַרְעֲךָ֣ אִתָּ֑ךְ לְרִשְׁתְּךָ֙ אֶת־אֶ֣רֶץ מְגֻרֶ֔יךָ אֲשֶׁר־נָתַ֥ן אֱלֹהִ֖ים לְאַבְרָהָֽם׃ לְךָ֙ לְךָ֖ וּלְזַרְעֲךָ֣ אִתָּ֑ךְ לְךָ֙ אִתָּ֑ךְ אֶת־
3 Genesis 31:16כִּ֣י כָל־הָעֹ֗שֶׁר אֲשֶׁ֨ר הִצִּ֤יל אֱלֹהִים֙ מֵֽאָבִ֔ינוּ לָ֥נוּ ה֖וּא וּלְבָנֵ֑ינוּ וְעַתָּ֗ה כֹּל֩ אֲשֶׁ֨ר אָמַ֧ר אֱלֹהִ֛ים אֵלֶ֖יךָ עֲשֵֽׂה׃ לָ֥נוּ וּלְבָנֵ֑ינוּ לָ֥נוּ בָנֵ֑ינוּ ה֖וּא
4 Exodus 30:21וְרָחֲצ֛וּ יְדֵיהֶ֥ם וְרַגְלֵיהֶ֖ם וְלֹ֣א יָמֻ֑תוּ וְהָיְתָ֨ה לָהֶ֧ם חָק־עֹולָ֛ם לֹ֥ו וּלְזַרְעֹ֖ו לְדֹרֹתָֽם׃ פ לָהֶ֧ם לֹ֥ו וּלְזַרְעֹ֖ו לָהֶ֧ם זַרְעֹ֖ו חָק־
5 Leviticus 25:6וְ֠הָיְתָה שַׁבַּ֨ת הָאָ֤רֶץ לָכֶם֙ לְאָכְלָ֔ה לְךָ֖ וּלְעַבְדְּךָ֣ וְלַאֲמָתֶ֑ךָ וְלִשְׂכִֽירְךָ֙ וּלְתֹושָׁ֣בְךָ֔ הַגָּרִ֖ים עִמָּֽךְ׃ לָכֶם֙ לְךָ֖ וּלְעַבְדְּךָ֣ וְלַאֲמָתֶ֑ךָ וְלִשְׂכִֽירְךָ֙ וּלְתֹושָׁ֣בְךָ֔ לָכֶם֙ תֹושָׁ֣בְךָ֔ לְ
6 Numbers 20:15וַיֵּרְד֤וּ אֲבֹתֵ֨ינוּ֙ מִצְרַ֔יְמָה וַנֵּ֥שֶׁב בְּמִצְרַ֖יִם יָמִ֣ים רַבִּ֑ים וַיָּרֵ֥עוּ לָ֛נוּ מִצְרַ֖יִם וְלַאֲבֹתֵֽינוּ׃ לָ֛נוּ וְלַאֲבֹתֵֽינוּ׃ לָ֛נוּ אֲבֹתֵֽינוּ׃ מִצְרַ֖יִם
7 Numbers 32:33וַיִּתֵּ֣ן לָהֶ֣ם׀ מֹשֶׁ֡ה לִבְנֵי־גָד֩ וְלִבְנֵ֨י רְאוּבֵ֜ן וְלַחֲצִ֣י׀ שֵׁ֣בֶט׀ מְנַשֶּׁ֣ה בֶן־יֹוסֵ֗ף אֶת־מַמְלֶ֨כֶת֙ סִיחֹן֙ מֶ֣לֶךְ הָֽאֱמֹרִ֔י וְאֶת־מַמְלֶ֔כֶת עֹ֖וג מֶ֣לֶךְ הַבָּשָׁ֑ן הָאָ֗רֶץ לְעָרֶ֨יהָ֙ בִּגְבֻלֹ֔ת עָרֵ֥י הָאָ֖רֶץ סָבִֽיב׃ לָהֶ֣ם׀ לִבְנֵי־גָד֩ וְלִבְנֵ֨י רְאוּבֵ֜ן וְלַחֲצִ֣י׀ שֵׁ֣בֶט׀ מְנַשֶּׁ֣ה בֶן־יֹוסֵ֗ף לָהֶ֣ם׀ יֹוסֵ֗ף מֹשֶׁ֡ה
8 Deuteronomy 1:36זֽוּלָתִ֞י כָּלֵ֤ב בֶּן־יְפֻנֶּה֙ ה֣וּא יִרְאֶ֔נָּה וְלֹֽו־אֶתֵּ֧ן אֶת־הָאָ֛רֶץ אֲשֶׁ֥ר דָּֽרַךְ־בָּ֖הּ וּלְבָנָ֑יו יַ֕עַן אֲשֶׁ֥ר מִלֵּ֖א אַחֲרֵ֥י יְהוָֽה׃ לֹֽו־וּלְבָנָ֑יו לֹֽו־בָנָ֑יו אֶתֵּ֧ן
9 Deuteronomy 26:11וְשָׂמַחְתָּ֣ בְכָל־הַטֹּ֗וב אֲשֶׁ֧ר נָֽתַן־לְךָ֛ יְהוָ֥ה אֱלֹהֶ֖יךָ וּלְבֵיתֶ֑ךָ אַתָּה֙ וְהַלֵּוִ֔י וְהַגֵּ֖ר אֲשֶׁ֥ר בְּקִרְבֶּֽךָ׃ ס לְךָ֛ וּלְבֵיתֶ֑ךָ לְךָ֛ בֵיתֶ֑ךָ יְהוָ֥ה
10 1_Samuel 25:31וְלֹ֣א תִהְיֶ֣ה זֹ֣את׀ לְךָ֡ לְפוּקָה֩ וּלְמִכְשֹׁ֨ול לֵ֜ב לַאדֹנִ֗י וְלִשְׁפָּךְ־דָּם֙ חִנָּ֔ם וּלְהֹושִׁ֥יעַ אֲדֹנִ֖י לֹ֑ו וְהֵיטִ֤ב יְהוָה֙ לַֽאדֹנִ֔י וְזָכַרְתָּ֖ אֶת־אֲמָתֶֽךָ׃ ס לְךָ֡ לַאדֹנִ֗י לְךָ֡ אדֹנִ֗י לְ
11 2_Kings 25:24וַיִּשָּׁבַ֨ע לָהֶ֤ם גְּדַלְיָ֨הוּ֙ וּלְאַנְשֵׁיהֶ֔ם וַיֹּ֣אמֶר לָהֶ֔ם אַל־תִּֽירְא֖וּ מֵעַבְדֵ֣י הַכַּשְׂדִּ֑ים שְׁב֣וּ בָאָ֗רֶץ וְעִבְד֛וּ אֶת־מֶ֥לֶךְ בָּבֶ֖ל וְיִטַ֥ב לָכֶֽם׃ ס לָהֶ֤ם וּלְאַנְשֵׁיהֶ֔ם לָהֶ֤ם אַנְשֵׁיהֶ֔ם גְּדַלְיָ֨הוּ֙
12 Jeremiah 40:9וַיִּשָּׁבַ֨ע לָהֶ֜ם גְּדַלְיָ֨הוּ בֶן־אֲחִיקָ֤ם בֶּן־שָׁפָן֙ וּלְאַנְשֵׁיהֶ֣ם לֵאמֹ֔ר אַל־תִּֽירְא֖וּ מֵעֲבֹ֣וד הַכַּשְׂדִּ֑ים שְׁב֣וּ בָאָ֗רֶץ וְעִבְד֛וּ אֶת־מֶ֥לֶךְ בָּבֶ֖ל וְיִיטַ֥ב לָכֶֽם׃ לָהֶ֜ם וּלְאַנְשֵׁיהֶ֣ם לָהֶ֜ם אַנְשֵׁיהֶ֣ם גְּדַלְיָ֨הוּ
13 Daniel 9:8יְהוָ֗ה לָ֚נוּ בֹּ֣שֶׁת הַפָּנִ֔ים לִמְלָכֵ֥ינוּ לְשָׂרֵ֖ינוּ וְלַאֲבֹתֵ֑ינוּ אֲשֶׁ֥ר חָטָ֖אנוּ לָֽךְ׃ לָ֚נוּ לִמְלָכֵ֥ינוּ לְשָׂרֵ֖ינוּ וְלַאֲבֹתֵ֑ינוּ לָ֚נוּ אֲבֹתֵ֑ינוּ בֹּ֣שֶׁת

Let's color the word in the gap differently.

In [68]:
A.table(results, condensed=False, colorMap={2: 'blue', 3: 'yellow', 5: 'magenta'})
n  p  verse  phrase  word  word  word
1 Genesis 17:7וַהֲקִמֹתִ֨י אֶת־בְּרִיתִ֜י בֵּינִ֣י וּבֵינֶ֗ךָ וּבֵ֨ין זַרְעֲךָ֧ אַחֲרֶ֛יךָ לְדֹרֹתָ֖ם לִבְרִ֣ית עֹולָ֑ם לִהְיֹ֤ות לְךָ֙ לֵֽאלֹהִ֔ים וּֽלְזַרְעֲךָ֖ אַחֲרֶֽיךָ׃ לְךָ֙ וּֽלְזַרְעֲךָ֖ אַחֲרֶֽיךָ׃ לְךָ֙ אַחֲרֶֽיךָ׃ לֵֽ
2 Genesis 28:4וְיִֽתֶּן־לְךָ֙ אֶת־בִּרְכַּ֣ת אַבְרָהָ֔ם לְךָ֖ וּלְזַרְעֲךָ֣ אִתָּ֑ךְ לְרִשְׁתְּךָ֙ אֶת־אֶ֣רֶץ מְגֻרֶ֔יךָ אֲשֶׁר־נָתַ֥ן אֱלֹהִ֖ים לְאַבְרָהָֽם׃ לְךָ֙ לְךָ֖ וּלְזַרְעֲךָ֣ אִתָּ֑ךְ לְךָ֙ אִתָּ֑ךְ אֶת־
3 Genesis 31:16כִּ֣י כָל־הָעֹ֗שֶׁר אֲשֶׁ֨ר הִצִּ֤יל אֱלֹהִים֙ מֵֽאָבִ֔ינוּ לָ֥נוּ ה֖וּא וּלְבָנֵ֑ינוּ וְעַתָּ֗ה כֹּל֩ אֲשֶׁ֨ר אָמַ֧ר אֱלֹהִ֛ים אֵלֶ֖יךָ עֲשֵֽׂה׃ לָ֥נוּ וּלְבָנֵ֑ינוּ לָ֥נוּ בָנֵ֑ינוּ ה֖וּא
4 Exodus 30:21וְרָחֲצ֛וּ יְדֵיהֶ֥ם וְרַגְלֵיהֶ֖ם וְלֹ֣א יָמֻ֑תוּ וְהָיְתָ֨ה לָהֶ֧ם חָק־עֹולָ֛ם לֹ֥ו וּלְזַרְעֹ֖ו לְדֹרֹתָֽם׃ פ לָהֶ֧ם לֹ֥ו וּלְזַרְעֹ֖ו לָהֶ֧ם זַרְעֹ֖ו חָק־
5 Leviticus 25:6וְ֠הָיְתָה שַׁבַּ֨ת הָאָ֤רֶץ לָכֶם֙ לְאָכְלָ֔ה לְךָ֖ וּלְעַבְדְּךָ֣ וְלַאֲמָתֶ֑ךָ וְלִשְׂכִֽירְךָ֙ וּלְתֹושָׁ֣בְךָ֔ הַגָּרִ֖ים עִמָּֽךְ׃ לָכֶם֙ לְךָ֖ וּלְעַבְדְּךָ֣ וְלַאֲמָתֶ֑ךָ וְלִשְׂכִֽירְךָ֙ וּלְתֹושָׁ֣בְךָ֔ לָכֶם֙ תֹושָׁ֣בְךָ֔ לְ
6 Numbers 20:15וַיֵּרְד֤וּ אֲבֹתֵ֨ינוּ֙ מִצְרַ֔יְמָה וַנֵּ֥שֶׁב בְּמִצְרַ֖יִם יָמִ֣ים רַבִּ֑ים וַיָּרֵ֥עוּ לָ֛נוּ מִצְרַ֖יִם וְלַאֲבֹתֵֽינוּ׃ לָ֛נוּ וְלַאֲבֹתֵֽינוּ׃ לָ֛נוּ אֲבֹתֵֽינוּ׃ מִצְרַ֖יִם
7 Numbers 32:33וַיִּתֵּ֣ן לָהֶ֣ם׀ מֹשֶׁ֡ה לִבְנֵי־גָד֩ וְלִבְנֵ֨י רְאוּבֵ֜ן וְלַחֲצִ֣י׀ שֵׁ֣בֶט׀ מְנַשֶּׁ֣ה בֶן־יֹוסֵ֗ף אֶת־מַמְלֶ֨כֶת֙ סִיחֹן֙ מֶ֣לֶךְ הָֽאֱמֹרִ֔י וְאֶת־מַמְלֶ֔כֶת עֹ֖וג מֶ֣לֶךְ הַבָּשָׁ֑ן הָאָ֗רֶץ לְעָרֶ֨יהָ֙ בִּגְבֻלֹ֔ת עָרֵ֥י הָאָ֖רֶץ סָבִֽיב׃ לָהֶ֣ם׀ לִבְנֵי־גָד֩ וְלִבְנֵ֨י רְאוּבֵ֜ן וְלַחֲצִ֣י׀ שֵׁ֣בֶט׀ מְנַשֶּׁ֣ה בֶן־יֹוסֵ֗ף לָהֶ֣ם׀ יֹוסֵ֗ף מֹשֶׁ֡ה
8 Deuteronomy 1:36זֽוּלָתִ֞י כָּלֵ֤ב בֶּן־יְפֻנֶּה֙ ה֣וּא יִרְאֶ֔נָּה וְלֹֽו־אֶתֵּ֧ן אֶת־הָאָ֛רֶץ אֲשֶׁ֥ר דָּֽרַךְ־בָּ֖הּ וּלְבָנָ֑יו יַ֕עַן אֲשֶׁ֥ר מִלֵּ֖א אַחֲרֵ֥י יְהוָֽה׃ לֹֽו־וּלְבָנָ֑יו לֹֽו־בָנָ֑יו אֶתֵּ֧ן
9 Deuteronomy 26:11וְשָׂמַחְתָּ֣ בְכָל־הַטֹּ֗וב אֲשֶׁ֧ר נָֽתַן־לְךָ֛ יְהוָ֥ה אֱלֹהֶ֖יךָ וּלְבֵיתֶ֑ךָ אַתָּה֙ וְהַלֵּוִ֔י וְהַגֵּ֖ר אֲשֶׁ֥ר בְּקִרְבֶּֽךָ׃ ס לְךָ֛ וּלְבֵיתֶ֑ךָ לְךָ֛ בֵיתֶ֑ךָ יְהוָ֥ה
10 1_Samuel 25:31וְלֹ֣א תִהְיֶ֣ה זֹ֣את׀ לְךָ֡ לְפוּקָה֩ וּלְמִכְשֹׁ֨ול לֵ֜ב לַאדֹנִ֗י וְלִשְׁפָּךְ־דָּם֙ חִנָּ֔ם וּלְהֹושִׁ֥יעַ אֲדֹנִ֖י לֹ֑ו וְהֵיטִ֤ב יְהוָה֙ לַֽאדֹנִ֔י וְזָכַרְתָּ֖ אֶת־אֲמָתֶֽךָ׃ ס לְךָ֡ לַאדֹנִ֗י לְךָ֡ אדֹנִ֗י לְ
11 2_Kings 25:24וַיִּשָּׁבַ֨ע לָהֶ֤ם גְּדַלְיָ֨הוּ֙ וּלְאַנְשֵׁיהֶ֔ם וַיֹּ֣אמֶר לָהֶ֔ם אַל־תִּֽירְא֖וּ מֵעַבְדֵ֣י הַכַּשְׂדִּ֑ים שְׁב֣וּ בָאָ֗רֶץ וְעִבְד֛וּ אֶת־מֶ֥לֶךְ בָּבֶ֖ל וְיִטַ֥ב לָכֶֽם׃ ס לָהֶ֤ם וּלְאַנְשֵׁיהֶ֔ם לָהֶ֤ם אַנְשֵׁיהֶ֔ם גְּדַלְיָ֨הוּ֙
12 Jeremiah 40:9וַיִּשָּׁבַ֨ע לָהֶ֜ם גְּדַלְיָ֨הוּ בֶן־אֲחִיקָ֤ם בֶּן־שָׁפָן֙ וּלְאַנְשֵׁיהֶ֣ם לֵאמֹ֔ר אַל־תִּֽירְא֖וּ מֵעֲבֹ֣וד הַכַּשְׂדִּ֑ים שְׁב֣וּ בָאָ֗רֶץ וְעִבְד֛וּ אֶת־מֶ֥לֶךְ בָּבֶ֖ל וְיִיטַ֥ב לָכֶֽם׃ לָהֶ֜ם וּלְאַנְשֵׁיהֶ֣ם לָהֶ֜ם אַנְשֵׁיהֶ֣ם גְּדַלְיָ֨הוּ
13 Daniel 9:8יְהוָ֗ה לָ֚נוּ בֹּ֣שֶׁת הַפָּנִ֔ים לִמְלָכֵ֥ינוּ לְשָׂרֵ֖ינוּ וְלַאֲבֹתֵ֑ינוּ אֲשֶׁ֥ר חָטָ֖אנוּ לָֽךְ׃ לָ֚נוּ לִמְלָכֵ֥ינוּ לְשָׂרֵ֖ינוּ וְלַאֲבֹתֵ֑ינוּ לָ֚נוּ אֲבֹתֵ֑ינוּ בֹּ֣שֶׁת
In [69]:
A.show(results, end=3, condensed=False, colorMap={2: 'aqua', 3: 'yellow', 5: 'magenta'})

result 1

sentence 18|999
clause WQt0
phrase Conj CP
conj and lex=W
phrase Pred VP
verb arise hif perf lex=QWM[
phrase Objc PP
prep <object marker> lex=>T
subs covenant lex=BRJT/
phrase Cmpl PP
prep interval lex=BJN/
conj and lex=W
prep interval lex=BJN/
conj and lex=W
prep interval lex=BJN/
subs seed lex=ZR</
phrase Cmpl PP
prep after lex=>XR/
phrase Cmpl PP
prep to lex=L
subs generation lex=DWR/
phrase Cmpl PP
prep to lex=L
subs covenant lex=BRJT/
subs eternity lex=<WLM/
clause Adju InfC
phrase Pred VP
prep to lex=L
verb be qal infc lex=HJH[
phrase Cmpl PP
prep to lex=L
phrase PreC PP
prep to lex=L
subs god(s) lex=>LHJM/
phrase Cmpl PP|CP
conj and lex=W
phrase Cmpl PP
prep to lex=L
subs seed lex=ZR</
phrase Cmpl PP

result 2

sentence 13|2254
clause WYq0
phrase Conj CP
conj and lex=W
phrase Pred VP
verb give qal impf lex=NTN[
phrase Cmpl PP
prep to lex=L
phrase Objc PP
prep <object marker> lex=>T
subs blessing lex=BRKH/
nmpr Abraham lex=>BRHM/
phrase Cmpl PP
prep to lex=L
conj and lex=W
prep to lex=L
subs seed lex=ZR</
phrase Cmpl PP
prep together with lex=>T==
clause Adju InfC
phrase PreS VP
prep to lex=L
verb trample down qal infc lex=JRC[
phrase Objc PP
prep <object marker> lex=>T
subs earth lex=>RY/
subs neighbourhood lex=MGWRJM/
clause Attr xQtX
phrase Rela CP
conj <relative> lex=>CR
phrase Pred VP
verb give qal perf lex=NTN[
phrase Subj NP
subs god(s) lex=>LHJM/
phrase Cmpl PP
prep to lex=L
nmpr Abraham lex=>BRHM/

result 3

sentence 49|2625
clause CPen
phrase Conj CP
conj that lex=KJ
phrase Frnt NP
subs whole lex=KL/
art the lex=H
subs riches lex=<CR/
clause Attr xQtX
phrase Rela CP
conj <relative> lex=>CR
phrase Pred VP
verb deliver hif perf lex=NYL[
phrase Subj NP
subs god(s) lex=>LHJM/
phrase Cmpl PP
prep from lex=MN
subs father lex=>B/
clause Resu NmCl
phrase PreC PP
phrase Subj PPrP
prps he lex=HW>
phrase PreC PP|CP
conj and lex=W
phrase PreC PP
prep to lex=L
sentence 50|2626
clause MSyn
phrase Conj CP
conj and lex=W
phrase Time AdvP
advb now lex=<TH
sentence 51|2627
clause xIm0|Defc
phrase Objc NP
subs whole lex=KL/
clause Attr xQtX
phrase Rela CP
conj <relative> lex=>CR
phrase Pred VP
verb say qal perf lex=>MR[
phrase Subj NP
subs god(s) lex=>LHJM/
phrase Cmpl PP
clause xIm0|ZIm0
phrase Pred VP
verb make qal impv lex=<FH[

All gapped phrases

These were particular gaps. Now we want to get all gapped phrases.

We can simply lift the requirement that wPreGap has to satisfy a lexical condition.

In [70]:
query = '''
p:phrase
  wPreGap:word
  wLast:word
  :=

wGap:word
wPreGap <: wGap
wGap < wLast

p || wGap
'''
In [71]:
results = A.search(query)
  2.96s 715 results

Not too bad! We could wait for it. Here are some results.

In [72]:
A.table(results, end=10)
n  p  phrase  word  word  word
1 Genesis 1:7בֵּ֤ין הַמַּ֨יִם֙ וּבֵ֣ין הַמַּ֔יִם מַּ֨יִם֙ מַּ֔יִם אֲשֶׁר֙
2 Genesis 1:11דֶּ֔שֶׁא עֵ֚שֶׂב עֵ֣ץ פְּרִ֞י עֵ֚שֶׂב פְּרִ֞י מַזְרִ֣יעַ
3 Genesis 1:12דֶּ֠שֶׁא עֵ֣שֶׂב וְעֵ֧ץ עֵ֣שֶׂב עֵ֧ץ מַזְרִ֤יעַ
4 Genesis 1:29אֶת־כָּל־עֵ֣שֶׂב׀ וְאֶת־כָּל־הָעֵ֛ץ עֵ֣שֶׂב׀ עֵ֛ץ זֹרֵ֣עַ
5 Genesis 2:25שְׁנֵיהֶם֙ הָֽאָדָ֖ם וְאִשְׁתֹּ֑ו שְׁנֵיהֶם֙ אִשְׁתֹּ֑ו עֲרוּמִּ֔ים
6 Genesis 4:4הֶ֨בֶל גַם־ה֛וּא הֶ֨בֶל ה֛וּא הֵבִ֥יא
7 Genesis 7:8מִן־הַבְּהֵמָה֙ הַטְּהֹורָ֔ה וּמִן־הַ֨בְּהֵמָ֔ה וּמִ֨ן־הָעֹ֔וף וְכֹ֥ל בְּהֵמָ֔ה כֹ֥ל אֲשֶׁ֥ר
8 Genesis 7:14הֵ֜מָּה וְכָל־הַֽחַיָּ֣ה לְמִינָ֗הּ וְכָל־הַבְּהֵמָה֙ לְמִינָ֔הּ וְכָל־הָרֶ֛מֶשׂ לְמִינֵ֑הוּ וְכָל־הָעֹ֣וף לְמִינֵ֔הוּ כֹּ֖ל צִפֹּ֥ור כָּל־כָּנָֽף׃ רֶ֛מֶשׂ כָּנָֽף׃ הָ
9 Genesis 7:21כָּל־בָּשָׂ֣ר׀ בָּעֹ֤וף וּבַבְּהֵמָה֙ וּבַ֣חַיָּ֔ה וּבְכָל־הַשֶּׁ֖רֶץ וְכֹ֖ל הָאָדָֽם׃ בָּשָׂ֣ר׀ אָדָֽם׃ הָ
10 Genesis 7:21כָּל־בָּשָׂ֣ר׀ בָּעֹ֤וף וּבַבְּהֵמָה֙ וּבַ֣חַיָּ֔ה וּבְכָל־הַשֶּׁ֖רֶץ וְכֹ֖ל הָאָדָֽם׃ שֶּׁ֖רֶץ אָדָֽם׃ הַ

If a phrase has multiple gaps, we encounter it multiple times in our results.

We show the two condensed results in Genesis 7:21.

In [73]:
A.show(results, condensed=True, start=9, end=10, colorMap={1: 'lightyellow', 2: 'yellow', 4: 'magenta'})

verse 9

sentence 32|469
clause WayX
phrase Conj CP
conj and
phrase Pred VP
verb expire qal wayq
clause Attr Ptcp
phrase Rela CP
conj the
phrase PreC VP
verb creep qal ptca
phrase Cmpl PP
clause WayX|Defc
phrase Subj NP|PP
art the
conj and
prep in
art the
conj and
art the
subs wild animal
conj and
prep in
subs whole
art the
subs swarming creatures
clause Attr Ptcp
phrase Rela CP
conj the
phrase PreC VP
verb swarm qal ptca
phrase Cmpl PP
clause WayX|Defc
phrase Subj NP|CP
conj and
phrase Subj NP
art the
subs human, mankind

verse 10

sentence 6|532
clause Coor NmCl
phrase PreC PP
clause Attr xYq0
phrase Rela CP
conj <relative>
phrase Pred VP
verb creep qal impf
clause Coor NmCl|Defc
phrase PreC PP|CP
conj and
sentence 7|533
clause xQt0

If we want just the phrases, and each of them only once, we can run the query in shallow mode (see advanced):

In [74]:
gapQueryResults = A.search(query, shallow=True)
  3.04s 671 results
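
The drop from 715 results to 671 reflects phrases that were counted once per gap. A small sketch with made-up node numbers shows the same effect. It merely mimics the outcome of shallow mode by keeping the first member of each result tuple; it makes no claim about how Text-Fabric computes shallow results.

```python
from collections import Counter

# Hypothetical result tuples in template order: (phrase, wPreGap, wLast, wGap).
toy_results = [
    (500, 3, 9, 4),
    (500, 6, 9, 7),    # same phrase again, matched on its second gap
    (501, 12, 15, 13),
]

# How often does each phrase occur as first member of a result?
hits_per_phrase = Counter(result[0] for result in toy_results)

# Keeping each phrase once gives a set, comparable to a shallow result.
shallow_like = set(hits_per_phrase)

print(len(toy_results), len(shallow_like))
```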

A different query

We can make an equivalent query to get the gaps.

In [75]:
query = '''
p:phrase
    =: wFirst:word
    wLast:word
    :=

wGap:word
wFirst < wGap
wLast > wGap

p || wGap
'''

Experience has shown that this is a slow query, so we handle it with care.

In [76]:
S.study(query)
S.showPlan(details=True)
   |     0.00s Feature overview: 111 for nodes; 8 for edges; 2 configs; 8 computed
  0.00s Checking search template ...
  0.00s Setting up search space for 4 objects ...
  0.20s Constraining search space with 7 relations ...
  1.29s 	2 edges thinned
  1.29s Setting up retrieval plan ...
  1.33s Ready to deliver results from 1186145 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
Search with 4 objects and 7 relations
Results are instantiations of the following objects:
node  0-phrase                            (253187   choices)
node  1-word                              (253187   choices)
node  2-word                              (253187   choices)
node  3-word                              (426584   choices)
Performance parameters:
	yarnRatio            =     1.2
	tryLimitFrom         =     100
	tryLimitTo           =     100
Instantiations are computed along the following relations:
node                      0-phrase        (253187   choices)
edge  0-phrase        [[  2-word          (     1.0 choices)
edge  2-word          :=  0-phrase        (     1.0 choices (thinned))
edge  0-phrase        [[  1-word          (     1.0 choices)
edge  1-word          =:  0-phrase        (     1.0 choices (thinned))
edge  1-word          <   3-word          (213292.0 choices)
edge  3-word          <   2-word          (126593.5 choices)
edge  3-word          ||  0-phrase        (227868.3 choices)
  1.34s The results are connected to the original search template as follows:
 0     
 1 R0  p:phrase
 2 R1      =: wFirst:word
 3 R2      wLast:word
 4         :=
 5     
 6 R3  wGap:word
 7     wFirst < wGap
 8     wLast > wGap
 9     
10     p || wGap
11     
In [77]:
S.count(progress=1, limit=8)
  0.00s Counting results per 1 up to 8 ...
   |       27s 1
   |       27s 2
   |       27s 3
   |       27s 4
   |       27s 5
   |       27s 6
   |       48s 7
   |       48s 8
    48s Done: 8 results

This is a good example of a query that is slow to deliver even its first result. And that is bad, because it is such a straightforward query.

Why is this one so slow, while the previous one went so smoothly?

The crucial thing is the wGap word. In the latter template, wGap is not embedded in anything. It is constrained by wFirst < wGap and wGap < wLast. However, the way the search strategy works is by examining all possibilities for wFirst < wGap and only then checking whether wGap < wLast. The algorithm cannot check both conditions at the same time.

With embedding relations, things are better. Text-Fabric is heavily optimized to deal with embedding relationships.

In the former template, we see that the wGap is required to be adjacent to wPreGap, and this one is embedded in the phrase. Hence there are few cases to consider for wPreGap, and per instance there is only one wGap.
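
The difference in the amount of work can be illustrated with a toy count. The sketch below is an assumption-laden simplification: phrases are sets of slot numbers, and the two functions only imitate the order in which candidates for wGap are considered. It is not Text-Fabric's actual search algorithm, and the names and data are hypothetical.

```python
def gapped_via_neighbour(phrases):
    """Imitate the former template: step from each phrase word to the next slot."""
    examined = 0
    found = set()
    for name, slots in phrases.items():
        last = max(slots)
        for word in slots:
            examined += 1                        # one wGap candidate per wPreGap
            gap = word + 1                       # wPreGap <: wGap
            if gap not in slots and gap < last:  # p || wGap and wGap < wLast
                found.add(name)
    return found, examined


def gapped_via_free_word(phrases, n_slots):
    """Imitate the latter template: try every word after wFirst as wGap."""
    examined = 0
    found = set()
    for name, slots in phrases.items():
        first, last = min(slots), max(slots)
        for gap in range(first + 1, n_slots + 1):  # wFirst < wGap
            examined += 1
            if gap < last and gap not in slots:    # wGap < wLast and p || wGap
                found.add(name)
    return found, examined


# Three toy phrases in a toy corpus of 1000 slots.
toy_phrases = {"a": {1, 2, 3}, "b": {4, 6}, "c": {7, 8, 10}}
found_fast, work_fast = gapped_via_neighbour(toy_phrases)
found_slow, work_slow = gapped_via_free_word(toy_phrases, n_slots=1000)
print(sorted(found_fast), work_fast)
print(sorted(found_slow), work_slow)
```

Both functions find the same gapped phrases, but the second one examines a number of candidates that grows with the size of the corpus rather than with the size of the phrase.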

Lesson: Try to avoid free-floating nodes in your template that are constrained only by spatial relationships other than embedding.

To the rescue

The former template had it right. Can we rescue the latter template?

We can assume that the phrase and the gap each contain a word in one and the same verse. Note that phrase and gap may belong to different clauses and sentences. We assume that a phrase cannot belong to more than two verses, so either the first or the last word of the phrase is in the same verse as a word in the gap.

In [78]:
query = '''
p:phrase
    =: wFirst:word
    wLast:word
    :=

wGap:word
wFirst < wGap
wLast > wGap

p || wGap

v:verse

v [[ wFirst
v [[ wGap
'''
In [79]:
S.study(query)
S.showPlan(details=True)
S.count(progress=100, limit=3000)
   |     0.00s Feature overview: 111 for nodes; 8 for edges; 2 configs; 8 computed
  0.00s Checking search template ...
  0.00s Setting up search space for 5 objects ...
  0.21s Constraining search space with 9 relations ...
  1.30s 	2 edges thinned
  1.30s Setting up retrieval plan ...
  1.36s Ready to deliver results from 1209358 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
Search with 5 objects and 9 relations
Results are instantiations of the following objects:
node  0-phrase                            (253187   choices)
node  1-word                              (253187   choices)
node  2-word                              (253187   choices)
node  3-word                              (426584   choices)
node  4-verse                             ( 23213   choices)
Performance parameters:
	yarnRatio            =     1.2
	tryLimitFrom         =     100
	tryLimitTo           =     100
Instantiations are computed along the following relations:
node                      4-verse         ( 23213   choices)
edge  4-verse         [[  1-word          (    12.3 choices)
edge  1-word          ]]  0-phrase        (     1.0 choices)
edge  1-word          =:  0-phrase        (     1.0 choices (thinned))
edge  0-phrase        [[  2-word          (     1.0 choices)
edge  2-word          :=  0-phrase        (     1.0 choices (thinned))
edge  4-verse         [[  3-word          (    20.1 choices)
edge  3-word          <   2-word          (126593.5 choices)
edge  3-word          >   1-word          (126593.5 choices)
edge  3-word          ||  0-phrase        (227868.3 choices)
  1.37s The results are connected to the original search template as follows:
 0     
 1 R0  p:phrase
 2 R1      =: wFirst:word
 3 R2      wLast:word
 4         :=
 5     
 6 R3  wGap:word
 7     wFirst < wGap
 8     wLast > wGap
 9     
10     p || wGap
11     
12 R4  v:verse
13     
14     v [[ wFirst
15     v [[ wGap
16     
  0.00s Counting results per 100 up to 3000 ...
   |     0.47s 100
   |     0.82s 200
   |     1.06s 300
   |     1.19s 400
   |     1.29s 500
   |     1.57s 600
   |     1.63s 700
   |     1.94s 800
   |     2.04s 900
   |     2.20s 1000
   |     2.39s 1100
   |     2.52s 1200
   |     2.75s 1300
   |     3.41s 1400
   |     3.91s 1500
   |     4.21s 1600
   |     4.58s 1700
   |     4.75s 1800
   |     5.08s 1900
   |     5.41s 2000
   |     5.70s 2100
   |     5.91s 2200
   |     6.42s 2300
   |     7.82s 2400
   |     8.23s 2500
   |     8.60s 2600
   |     9.05s 2700
  9.07s Done: 2707 results
In [80]:
# ignore this
# S.tweakPerformance(yarnRatio=1)

We are going to run this query in shallow mode.

In [81]:
results = A.search(query, shallow=True)
    11s 671 results

Shallow mode tends to be quicker, but that does not always materialize. The number of results agrees with the first query. Yet we have been lucky, because we required the word in the gap to be in the same verse as the first word of the phrase. What if we require it to be in the same verse as the last word of the phrase instead?

In [82]:
query = '''
p:phrase
    =: wFirst:word
    wLast:word
    :=

wGap:word
wFirst < wGap
wLast > wGap

p || wGap

v:verse

v [[ wLast
v [[ wGap
'''
In [83]:
results = A.search(query, shallow=True)
    11s 660 results

Then we would not have found all results.
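
One way this can happen is a phrase that crosses a verse boundary while its gap lies entirely in the first verse. The following sketch uses made-up slots and verses to show the asymmetry; the function name `gap_seen_from` is hypothetical, and the data is not taken from the BHSA.

```python
def gap_seen_from(anchor, phrase_slots, verse_of):
    """True if some gap word shares a verse with the phrase's first or last word.

    anchor is 'first' or 'last'; verse_of maps a slot to its verse number.
    """
    first, last = min(phrase_slots), max(phrase_slots)
    anchor_slot = first if anchor == "first" else last
    return any(
        verse_of[slot] == verse_of[anchor_slot]
        for slot in range(first + 1, last)   # slots strictly inside the phrase span
        if slot not in phrase_slots          # only the gap words
    )


# Hypothetical corpus: verse 1 holds slots 1-5, verse 2 holds slots 6-10.
toy_verse_of = {slot: 1 if slot <= 5 else 2 for slot in range(1, 11)}
# Toy phrase crossing the verse boundary; its only gap (slot 4) is in verse 1.
toy_phrase = {3, 5, 6}

seen_from_first = gap_seen_from("first", toy_phrase, toy_verse_of)
seen_from_last = gap_seen_from("last", toy_phrase, toy_verse_of)
print(seen_from_first, seen_from_last)
```

Anchoring on the first word finds this toy phrase, anchoring on the last word misses it; a complete answer along this road would need the union of both queries.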

So, this road, although doable, is much less comfortable, performance-wise and logic-wise.

Check the gaps

In this misty landscape of gaps we need some corroboration that we found the right results.

  1. is every node in gapQueryResults a phrase?
  2. does every phrase in the gapQueryResults have a gap?
  3. is every gapped phrase contained in gapQueryResults?

We check all this by hand coding.

Here is a function that checks whether a phrase has a gap. If the distance between its end points is greater than the number of words it contains, it must have a gap.

In [84]:
def hasGap(p):
    words = L.d(p, otype='word')
    return words[-1] - words[0] + 1 > len(words)

Now we can perform the checks.

In [85]:
otypesGood = True
haveGaps = True

for p in gapQueryResults:
    otype = F.otype.v(p)
    if otype != 'phrase':
        print(f'Non phrase detected: {p} is a {otype}')
        otypesGood = False
        break

    if not hasGap(p):
        print(f'Phrase without a gap: {p}')
        A.pretty(p)
        haveGaps = False
        break

print(f'{len(gapQueryResults)} nodes in query result')
if otypesGood:
    print('1. all nodes are phrases')
if haveGaps:
    print('2. all nodes have gaps')

inResults = True
for p in F.otype.s('phrase'):
    if hasGap(p):
        if p not in gapQueryResults:
            print(f'Gapped phrase outside query results: {p}')
            A.pretty(p)
            inResults = False
            break
            
if inResults:
    print('3. all gapped phrases are contained in the results')
671 nodes in query result
1. all nodes are phrases
2. all nodes have gaps
3. all gapped phrases are contained in the results

Note that by hand coding we can get the gapped phrases much more quickly and securely!

Custom sets for (non-)gapped phrases

We have obtained a set with all gapped phrases, and we have paid a price:

  • either an expensive query,
  • or an inconvenient bit of hand coding.

It would be nice if we could kick-start our queries using this set as a given. And that is exactly what we are going to do now.

We make two custom sets and give them a name, gapphrase for gapped phrases and conphrase for non-gapped phrases (consecutive phrases).

In [86]:
customSets = dict(
    gapphrase=gapQueryResults,
    conphrase=set(F.otype.s('phrase')) - gapQueryResults,
)
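
The two sets are meant to split the phrases into two parts that do not overlap and together cover everything. A quick sanity check of that property, on made-up node numbers rather than real BHSA nodes, could look like this:

```python
# Hypothetical node numbers standing in for all phrases and the gapped ones.
toy_all_phrases = {101, 102, 103, 104, 105}
toy_gapped = {102, 105}

toy_sets = dict(
    gapphrase=toy_gapped,
    conphrase=toy_all_phrases - toy_gapped,
)

# No phrase is in both sets, and every phrase is in one of them.
is_disjoint = toy_sets["gapphrase"].isdisjoint(toy_sets["conphrase"])
is_complete = toy_sets["gapphrase"] | toy_sets["conphrase"] == toy_all_phrases
print(is_disjoint, is_complete)
```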

Suppose we want all verbs that occur in a gapped phrase.

In [87]:
query = '''
gapphrase
  word sp=verb
'''

Note that we have used the foreign name gapphrase in our search template, instead of phrase.

But we can still run search(), provided we tell it what we mean by gapphrase. We do that by passing the sets parameter to search(), which should be a dictionary of sets. Search will look up gapphrase in this dictionary, and will use its value, which should be a node set. That way, it understands that the expression gapphrase stands for the nodes in the given node set.

Here we go:

In [88]:
results = A.search(query, sets=customSets)
  0.57s 93 results
In [89]:
A.show(results, start=1, end=3)

result 1

sentence 64|2502
clause WayX
phrase Conj CP
conj and sp=conj
phrase Pred VP
verb say qal wayq sp=verb
phrase Subj PrNP
nmpr Leah sp=nmpr
sentence 65|2503
clause ZQtX
phrase PreO VP
verb bestow qal perf sp=verb
phrase Subj NP
subs god(s) sp=subs
phrase PreO VP|PP
prep <object marker> sp=prep
phrase Objc NP
subs endowment sp=subs
adjv good sp=adjv
sentence 66|2504
clause xYqX
phrase Modi NP
art the sp=art
subs foot sp=subs
phrase PreO VP
verb tolerate qal impf sp=verb
phrase Subj NP
subs man sp=subs
sentence 67|2505
clause xQt0
phrase Conj CP
conj that sp=conj
phrase Pred VP
verb bear qal perf sp=verb
phrase Cmpl PP
prep to sp=prep
phrase Objc NP
sentence 68|2506
clause Way0
phrase Conj CP
conj and sp=conj
phrase Pred VP
verb call qal wayq sp=verb
phrase Objc PP
prep <object marker> sp=prep
subs name sp=subs
phrase Objc PrNP
nmpr Zebulun sp=nmpr

result 2

sentence 118|2556
clause Way0
phrase Conj CP
conj and sp=conj
phrase Pred VP
verb turn aside hif wayq sp=verb
phrase Time PP
prep in sp=prep
art the sp=art
subs day sp=subs
art the sp=art
prde he sp=prps
phrase Objc PP
prep <object marker> sp=prep
art the sp=art
subs he-goat sp=subs
art the sp=art
adjv twisted sp=adjv
conj and sp=conj
art the sp=art
adjv patch qal ptcp sp=verb
phrase Objc PP|CP
conj and sp=conj
phrase Objc PP
prep <object marker> sp=prep
subs whole sp=subs
art the sp=art
subs goat sp=subs
art the sp=art
adjv speckled sp=adjv
conj and sp=conj
art the sp=art
adjv patch qal ptcp sp=verb
phrase Objc PP|NP
subs whole sp=subs
clause Attr NmCl
phrase Rela CP
conj <relative> sp=conj
phrase Subj NP
subs white sp=adjv
phrase PreC PP
prep in sp=prep
clause Way0|Defc
phrase Objc PP|CP
conj and sp=conj
phrase Objc PP|NP
subs whole sp=subs
adjv ruttish sp=adjv
phrase Objc PP
prep in sp=prep
art the sp=art
subs young ram sp=subs
sentence 119|2557
clause Way0
phrase Conj CP
conj and sp=conj
phrase Pred VP
verb give qal wayq sp=verb
phrase Cmpl PP
prep in sp=prep
subs hand sp=subs

result 3

sentence 118|2556
clause Way0
phrase Conj CP
conj and sp=conj
phrase Pred VP
verb turn aside hif wayq sp=verb
phrase Time PP
prep in sp=prep
art the sp=art
subs day sp=subs
art the sp=art
prde he sp=prps
phrase Objc PP
prep <object marker> sp=prep
art the sp=art
subs he-goat sp=subs
art the sp=art
adjv twisted sp=adjv
conj and sp=conj
art the sp=art
adjv patch qal ptcp sp=verb
phrase Objc PP|CP
conj and sp=conj
phrase Objc PP
prep <object marker> sp=prep
subs whole sp=subs
art the sp=art
subs goat sp=subs
art the sp=art
adjv speckled sp=adjv
conj and sp=conj
art the sp=art
adjv patch qal ptcp sp=verb
phrase Objc PP|NP
subs whole sp=subs
clause Attr NmCl
phrase Rela CP
conj <relative> sp=conj
phrase Subj NP
subs white sp=adjv
phrase PreC PP
prep in sp=prep
clause Way0|Defc
phrase Objc PP|CP
conj and sp=conj
phrase Objc PP|NP
subs whole sp=subs
adjv ruttish sp=adjv
phrase Objc PP
prep in sp=prep
art the sp=art
subs young ram sp=subs
sentence 119|2557
clause Way0
phrase Conj CP
conj and sp=conj
phrase Pred VP
verb give qal wayq sp=verb
phrase Cmpl PP
prep in sp=prep
subs hand sp=subs

That looks good.

We can also apply feature conditions to gapphrase:

In [90]:
query = '''
gapphrase function=Subj
'''
results = A.search(query, sets=customSets)
A.show(results, start=1, end=3)
  0.00s 176 results

result 1

sentence 59|151
clause WayX
sentence 60|152
clause WxY0
phrase Conj CP
conj and
phrase Nega NegP
phrase Pred VP

result 2

sentence 11|245
clause WXQt
phrase Conj CP
conj and
phrase Subj PrNP
phrase Pred VP
verb come hif perf
phrase Subj PrNP|PPrP
sentence 12|246
clause WayX

result 3

sentence 17|454
clause Ellp
phrase Subj PPrP
phrase Subj PPrP|PP
phrase Subj PPrP|CP
conj and
phrase Subj PPrP|NP
phrase Subj PPrP|PP
phrase Subj PPrP|CP
conj and
phrase Subj PPrP|NP
subs whole
art the
subs creeping animals
clause Attr Ptcp
phrase Rela CP
conj the
phrase PreC VP
verb creep qal ptca
phrase Cmpl PP
clause Ellp|Defc
phrase Subj PPrP|PP
phrase Subj PPrP|CP
conj and
phrase Subj PPrP|NP
phrase Subj PPrP|PP

Two-phrase clauses

We can find the gaps, but do our minds always reckon with gaps? Gaps cause unexpected semantics. Here is a little puzzle.

Suppose we want to count the clauses consisting of exactly two phrases.

Here follows a little journey. We use a query to find the clauses, check the result with hand-coding, scratch our heads, refine the query, the hand-coding and our question until we are satisfied.

Attempt 1

By query

The following template should do it: a clause, starting with a phrase, followed by an adjacent phrase, which terminates the clause.

In [91]:
query = '''
clause
    =: phrase
    <: phrase
    :=
'''
In [92]:
# ignore this
# S.tweakPerformance(yarnRatio=1.2)
In [93]:
S.study(query)
   |     0.00s Feature overview: 111 for nodes; 8 for edges; 2 configs; 8 computed
  0.00s Checking search template ...
  0.00s Setting up search space for 3 objects ...
  0.09s Constraining search space with 5 relations ...
  0.73s 	2 edges thinned
  0.73s Setting up retrieval plan ...
  0.75s Ready to deliver results from 264237 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results
In [94]:
S.showPlan(details=True)
Search with 3 objects and 5 relations
Results are instantiations of the following objects:
node  0-clause                            ( 88079   choices)
node  1-phrase                            ( 88079   choices)
node  2-phrase                            ( 88079   choices)
Performance parameters:
	yarnRatio            =     1.2
	tryLimitFrom         =     100
	tryLimitTo           =     100
Instantiations are computed along the following relations:
node                      0-clause        ( 88079   choices)
edge  0-clause        =:  1-phrase        (     0.3 choices)
edge  0-clause        [[  1-phrase        (     0.9 choices)
edge  0-clause        :=  2-phrase        (     1.0 choices (thinned))
edge  0-clause        [[  2-phrase        (     1.0 choices)
edge  1-phrase        <:  2-phrase        (     1.0 choices (thinned))
    10s The results are connected to the original search template as follows:
 0     
 1 R0  clause
 2 R1      =: phrase
 3 R2      <: phrase
 4         :=
 5     
In [95]:
results = A.search(query)
A.table(results, end=10)
  1.17s 23483 results
n  p  clause  phrase  phrase
1 Genesis 1:3יְהִ֣י אֹ֑ור יְהִ֣י אֹ֑ור
2 Genesis 1:4כִּי־טֹ֑וב כִּי־טֹ֑וב
3 Genesis 1:7אֲשֶׁר֙ מִתַּ֣חַת לָרָקִ֔יעַ אֲשֶׁר֙ מִתַּ֣חַת לָרָקִ֔יעַ
4 Genesis 1:7אֲשֶׁ֖ר מֵעַ֣ל לָרָקִ֑יעַ אֲשֶׁ֖ר מֵעַ֣ל לָרָקִ֑יעַ
5 Genesis 1:10כִּי־טֹֽוב׃ כִּי־טֹֽוב׃
6 Genesis 1:11מַזְרִ֣יעַ זֶ֔רַע מַזְרִ֣יעַ זֶ֔רַע
7 Genesis 1:12כִּי־טֹֽוב׃ כִּי־טֹֽוב׃
8 Genesis 1:14לְהַבְדִּ֕יל בֵּ֥ין הַיֹּ֖ום וּבֵ֣ין הַלָּ֑יְלָה לְהַבְדִּ֕יל בֵּ֥ין הַיֹּ֖ום וּבֵ֣ין הַלָּ֑יְלָה
9 Genesis 1:15לְהָאִ֖יר עַל־הָאָ֑רֶץ לְהָאִ֖יר עַל־הָאָ֑רֶץ
10 Genesis 1:17לְהָאִ֖יר עַל־הָאָֽרֶץ׃ לְהָאִ֖יר עַל־הָאָֽרֶץ׃

If we want to have the clauses only, we run it in shallow mode:

In [96]:
clausesByQuery = sorted(A.search(query, shallow=True))
  1.10s 23483 results

By hand

Let us check this with a piece of hand-written code. We want clauses that consist of exactly two phrases.

In [97]:
indent(reset=True)
info('counting ...')

clausesByHand = []
for clause in F.otype.s('clause'):
    phrases = L.d(clause, otype='phrase')
    if len(phrases) == 2:
        clausesByHand.append(clause)
clausesByHand = sorted(clausesByHand)
info(f'Done: found {len(clausesByHand)}')
  0.00s counting ...
  0.58s Done: found 23862

The difference

Strange, we end up with more cases. What is happening? Let us compare the results. We look at the first result where both methods diverge.

We put the difference finding in a little function.

In [98]:
def showDiff(queryResults, handResults):
    diff = [x for x in zip(queryResults, handResults) if x[0] != x[1]]
    if not diff:
        print(f'''
{len(queryResults):>6} queryResults
         are identical with
{len(handResults):>6} handResults
''')
        return
    (rQuery, rHand) = diff[0]
    if rQuery < rHand:
        print(f'clause {rQuery} is a query result but not found by hand')
        toShow = rQuery
    else:
        print(f'clause {rHand} is not a query result but has been found by hand')
        toShow = rHand
    colors = ['aqua', 'aquamarine', 'khaki', 'lavender', 'yellow']
    highlights = {}
    for (i, phrase) in enumerate(L.d(toShow, otype='phrase')):
        highlights[phrase] = colors[i % len(colors)]
        # for atom in L.d(phrase, otype='phrase_atom'):
        #     highlights[atom] = colors[i % len(colors)]
    A.pretty(toShow, withNodes=True, suppress={'lex', 'sp', 'vt', 'vs'}, highlights=highlights)
In [99]:
showDiff(clausesByQuery, clausesByHand)
clause 427931 is not a query result but has been found by hand

Lo and behold:

  • the hand-written code is right in a sense: this is a clause that consists of exactly two phrases.
  • the query is also right in a sense: the two phrases are not adjacent: there is a gap in the clause between them!
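The situation is easy to reproduce on toy data. The sketch below is not Text-Fabric code: it models a clause and its phrases as plain lists of slot numbers, and all names and numbers are made up for illustration. It shows how a clause can have exactly two phrases that are still not adjacent.

```python
# Toy model, not the TF API: every object is a sorted list of slot (word) numbers.
clause = [10, 11, 14, 15]        # slots 12 and 13 belong elsewhere: a gap in the clause
phrases = [[10, 11], [14, 15]]   # the two phrases of this hypothetical clause


def adjacent(first, second):
    """Mimic the `<:` relation: `second` starts right after `first` ends."""
    return first[-1] + 1 == second[0]


has_two_phrases = len(phrases) == 2
phrases_adjacent = adjacent(phrases[0], phrases[1])

print(has_two_phrases)   # the hand-written test accepts this clause
print(phrases_adjacent)  # the `<:` constraint in the template rejects it
```

Both answers are defensible; they simply answer different questions.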

Attempt 2

By hand

We modify the hand-written code so that a clause qualifies only if its two phrases are adjacent.

In [100]:
indent(reset=True)
info('counting ...')

clausesByHand2 = []
for clause in F.otype.s('clause'):
    phrases = L.d(clause, otype='phrase')
    if len(phrases) == 2:
        if L.d(phrases[0], otype='word')[-1] + 1 == L.d(phrases[1], otype='word')[0]:
            clausesByHand2.append(clause)
clausesByHand2 = sorted(clausesByHand2)
info(f'Done: found {len(clausesByHand2)}')
  0.00s counting ...
  0.68s Done: found 23399

The difference

Now we have fewer cases. What is going on?

In [101]:
showDiff(clausesByQuery, clausesByHand2)
clause 428692 is a query result but not found by hand

Observe:

This clause has three phrases, but the third one lies inside the second one.

  • the hand-written code is right in a sense: this clause has three phrases.
  • the query is right in a sense: it contains two adjacent phrases that together span the whole clause.
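Again a toy model may help to see why both are right. This is a sketch on made-up slot numbers, not TF code; the helper `matches_template` is a hypothetical stand-in for what the template `=: phrase`, `<: phrase`, `:=` asks for.

```python
# Toy model: a clause of six slots with three phrases,
# where the third phrase fills the gap inside the second one.
clause = [20, 21, 22, 23, 24, 25]
phrases = [[20, 21], [22, 25], [23, 24]]


def matches_template(clause, phrases):
    """True if some pair (p1, p2) satisfies:
    p1 starts where the clause starts, p2 starts right after p1 ends,
    and p2 ends where the clause ends."""
    return any(
        p1[0] == clause[0] and p1[-1] + 1 == p2[0] and p2[-1] == clause[-1]
        for p1 in phrases
        for p2 in phrases
    )


found_by_query = matches_template(clause, phrases)
found_by_hand = len(phrases) == 2

print(found_by_query)  # the pair [20, 21] and [22, 25] fits the template
print(found_by_hand)   # but the clause has three phrases, not two
```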

Attempt 3

By query

Can we adjust the pattern to exclude cases like this? Yes, with custom sets, see advanced.

Instead of looking through all phrases, we consider only the non-gapped phrases.

Earlier in this notebook we have constructed the set of non-gapped phrases and put it under the name conphrase in the custom sets.

In [102]:
query = '''
clause
    =: conphrase
    <: conphrase
    :=
'''

clausesByQuery2 = sorted(A.search(query, sets=customSets, shallow=True))
  1.11s 23327 results

The difference

There is still a difference.

In [103]:
showDiff(clausesByQuery2, clausesByHand2)
clause 428374 is not a query result but has been found by hand

Observe:

This clause has two phrases, the second one has a gap, which coincides with a gap in the clause.

  • the hand-written code is right in a sense: this clause has two phrases, adjacent, and they span the whole clause, nothing left out.
  • the query is right in a sense: the second phrase is not consecutive.

Attempt 4

By hand

We modify the hand-written code so that only clauses without gaps qualify.

In [104]:
indent(reset=True)
info('counting ...')

clausesByHand3 = []
for clause in F.otype.s('clause'):
    if hasGap(clause):
        continue
    phrases = L.d(clause, otype='phrase')
    if len(phrases) == 2:
        if L.d(phrases[0], otype='word')[-1] + 1 == L.d(phrases[1], otype='word')[0]:
            clausesByHand3.append(clause)
clausesByHand3 = sorted(clausesByHand3)
info(f'Done: found {len(clausesByHand3)}')
  0.00s counting ...
  0.95s Done: found 23327
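
The cell above relies on hasGap, which is not defined in this part of the notebook. As a reminder of the idea, here is a minimal stand-alone sketch that works on plain lists of slot numbers. The name `has_gap` and the sample data are hypothetical; a version for real use would first look up the slots of a node.

```python
def has_gap(slots):
    """Return True if the sorted slot numbers do not form one consecutive run."""
    return any(b != a + 1 for a, b in zip(slots, slots[1:]))


print(has_gap([4, 5, 6]))  # False: one consecutive run
print(has_gap([4, 5, 8]))  # True: slots 6 and 7 are missing
```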

The difference

Now the numbers of results agree. But are the results really the same?

In [105]:
showDiff(clausesByQuery2, clausesByHand3)
 23327 queryResults
         are identical with
 23327 handResults

Conclusion

It took four attempts to pin down precisely what we were looking for.

Sometimes the search template had to be modified, sometimes the hand-written code.

The interplay and systematic comparison between the attempts helped to spot all relevant configurations of phrases within clauses.
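
showDiff reports only the first point where two sorted result lists diverge. A complementary sketch, using nothing but Python sets, lists everything that one method found and the other did not. The function name `compare` and the node numbers are made up for illustration.

```python
def compare(query_results, hand_results):
    """Return (only in query results, only in hand results), both sorted."""
    query_set, hand_set = set(query_results), set(hand_results)
    return sorted(query_set - hand_set), sorted(hand_set - query_set)


only_query, only_hand = compare([1, 2, 4, 7], [1, 3, 4, 7, 9])
print(only_query)  # [2]
print(only_hand)   # [3, 9]
```

Looking at the full difference in one go can reveal several kinds of exceptions at once, at the price of a less focused display.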

Spans

Here is another cause of wrong query results: there are sentences that span multiple verses. Such sentences are not contained in any verse, so queries easily miss them.

We describe a scenario where that happens.

Mother clauses

A clause and its mother do not have to be in the same verse. We are going to fetch the cases where they are in different verses.

All mother clauses

But first we fetch all pairs of clauses connected by a mother edge.

In [106]:
query = '''
clause
-mother> clause
'''
allMotherPairs = A.search(query)
A.table(allMotherPairs, end=10)
  0.17s 13907 results

Mother in another verse

Now we modify the query to the effect that mother and daughter must sit in distinct verses.

In [107]:
query = '''
cm:clause
-mother> cd:clause

v1:verse
v2:verse
v1 # v2

cm ]] v1
cd ]] v2
'''
diffMotherPairs = A.search(query)
A.table(diffMotherPairs, end=10)
  0.35s 721 results

Mother in same verse

As a check, we modify the latter query and require v1 and v2 to be the same verse, to get the mother pairs of which both members are in the same verse.

In [108]:
query = '''
cm:clause
-mother> cd:clause

v1:verse
v2:verse
v1 = v2

cm ]] v1
cd ]] v2
'''
sameMotherPairs = A.search(query)
A.table(sameMotherPairs, end=10)
  0.36s 13160 results

The difference

Let's check if the numbers add up:

  • the first query asked for all pairs
  • the second query asked for pairs with members in different verses
  • the third query asked for pairs with members in the same verse

Then the results of the second and third query combined should equal the results of the first query.

That makes sense.

Still, let's check:

In [109]:
discrepancy = len(allMotherPairs) - len(diffMotherPairs) - len(sameMotherPairs)
print(discrepancy)
26

The numbers do not add up. We are missing cases. Why?

Clauses may cross verse boundaries. In that case they are not part of a verse, and hence our latter two queries do not detect them. Let's count how many verse boundary crossing clauses there are.
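
To see why such clauses slip through, consider a toy model in which verses and clauses are just collections of slot numbers. This is an illustrative sketch, not TF code: the verse names, the slot ranges and the helper `containing_verses` are all made up.

```python
# Toy model: two consecutive verses, given as ranges of slot numbers.
verses = {"v1": range(1, 6), "v2": range(6, 11)}  # slots 1-5 and 6-10


def containing_verses(clause_slots):
    """Names of the verses that contain every slot of the clause."""
    return [
        name
        for name, slots in verses.items()
        if all(s in slots for s in clause_slots)
    ]


print(containing_verses([2, 3, 4]))     # ['v1']
print(containing_verses([4, 5, 6, 7]))  # []: crosses the boundary, so no verse contains it
```

A query that demands that a clause is embedded in some verse can never match the second clause.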

In [110]:
query = '''
clause
/with/
v1:verse
&& ..
v2:verse
&& ..
v1 < v2
/-/
'''
results = A.search(query)
  2.43s 50 results

You might think we can speed up the query by requiring v1 <: v2 (both verses are adjacent). There are fewer possibilities to consider, so maybe we gain something.

In [111]:
query = '''
clause
/with/
v1:verse
&& ..
v2:verse
&& ..
v1 <: v2
/-/
'''
results = A.search(query)
  2.34s 49 results

Indeed, slightly faster, but one result fewer! How can that be?

There must be a clause that spans at least two verses and in doing so, skips at least one verse.
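
A toy model shows how that can happen. In this sketch (made-up slot numbers, hypothetical helper names, no TF involved) a clause has words in the first and third verse but none in the second, so it shares words with two distinct verses without sharing words with two adjacent ones.

```python
# Toy model: three consecutive verses, given as ranges of slot numbers.
verses = [range(1, 6), range(6, 11), range(11, 16)]


def touched_verses(clause_slots):
    """Indices of the verses that share at least one slot with the clause."""
    return [i for i, v in enumerate(verses) if any(s in v for s in clause_slots)]


def spans_any_pair(clause_slots):
    """Mimic `v1 < v2`: the clause shares slots with two distinct verses."""
    return len(touched_verses(clause_slots)) >= 2


def spans_adjacent_pair(clause_slots):
    """Mimic `v1 <: v2`: the clause shares slots with two adjacent verses."""
    t = touched_verses(clause_slots)
    return any(b == a + 1 for a, b in zip(t, t[1:]))


skipping = [4, 5, 11, 12]  # words in verses 0 and 2, none in verse 1
print(spans_any_pair(skipping))       # True
print(spans_adjacent_pair(skipping))  # False
```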

Let's find that one:

In [7]:
query = '''
clause
/with/
v1:verse
&& ..
v2:verse
|| ..
v3:verse
&& ..
v1 < v2
v2 < v3
v1 < v3
/-/
'''
resultsX = A.search(query)
  3.11s 1 result
In [113]:
A.table(resultsX)
A.show(resultsX)

result 1

sentence 92|866
clause WXYq|Defc
phrase Conj CP
conj and
clause Attr NmCl
phrase Rela CP
conj <relative>
phrase Nega NegP
phrase PreC PP
phrase PreC PP|PrNP
phrase Subj PPrP
clause Coor WQt0
phrase Conj CP
conj and
phrase Pred VP
verb come qal perf
sentence 92|868
clause WXYq|XYqt
sentence 94|869
clause WQt0
phrase Conj CP
conj and
phrase Pred VP
verb make qal perf
phrase Adju PP
clause Attr xYqX
phrase Rela CP
conj <relative>
phrase Pred VP
verb call qal impf
phrase Subj NP
sentence 95|870
clause xYqX
phrase Conj CP
phrase Pred VP
verb know qal impf
phrase Objc PP
prep <object marker>
clause Adju InfC
phrase Pred VP
prep to
verb fear qal infc
phrase Objc PP
prep <object marker>
phrase Adju PP|PrNP
clause Coor InfC
phrase Conj CP
conj and
phrase Pred VP
prep to
verb know qal infc
clause Objc XQtl
phrase Conj CP
phrase Pred VP
verb call nif perf
clause Attr xQt0
phrase Rela CP
conj <relative>
phrase Pred VP

A more roundabout way to find the same clauses:

In [114]:
query = '''
clause
    =: first:word
    last:word
    :=
v1:verse
    w1:word
v2:verse
    w2:word
    
first = w1
last = w2
v1 # v2
'''
results = A.search(query)
  2.48s 50 results

Some of these verse spanning clauses do not have mothers or are not mothers. Let's count the cases where two clauses are in a mother relation and at least one of them spans a verse.

We need two queries for that. These queries are nearly identical. One retrieves the clause pairs where the mother crosses verse boundaries, and the other where the daughter does so.

But we are programmers. We do not have to repeat ourselves:

In [115]:
queryCommon = '''
c1:clause
-mother> c2:clause

c3:clause
/with/
v1:verse
&& ..
v2:verse
&& ..
v1 < v2
/-/
'''

query1 = f'''
{queryCommon}
c1 = c3
'''
query2 = f'''
{queryCommon}
c2 = c3
'''

results1 = A.search(query1, silent=True)
results2 = A.search(query2, silent=True)
spannersByQuery = {(r[0], r[1]) for r in results1 + results2}
print(f'{len(spannersByQuery):>3} spanners are missing')
print(f'{discrepancy:>3} missing cases were detected before')
print(f'{discrepancy - len(spannersByQuery):>3} is the resulting disagreement')
 26 spanners are missing
 26 missing cases were detected before
  0 is the resulting disagreement

We can find the mother clause pairs in which at least one member spans verses more easily by hand-coding:

Starting with the set of all mother pairs, we keep every pair in which at least one member is not contained in a single verse.

In [116]:
spannersByHand = set()

for (c1, c2) in allMotherPairs:
    if not (
        L.u(c1, otype='verse')
        and
        L.u(c2, otype='verse')
    ):
        spannersByHand.add((c1, c2))
        
len(spannersByHand)
Out[116]:
26

And, to be completely sure:

In [117]:
spannersByHand == spannersByQuery
Out[117]:
True

By custom sets

If we are content with the clauses that do not span verses, we can put them in a set, and modify the queries by replacing clause by conclause and binding the right set to it.

Here we go. In one cell we run the queries to get all pairs, the mother-daughter-in-separate-verses pairs, and the mother-daughter-in-same-verse pairs, and we do the math of checking.

In [118]:
conClauses = {c for c in F.otype.s('clause') if L.u(c, otype='verse')}
customSets = dict(conclause=conClauses)

print('All pairs')
allPairs = A.search('''
conclause
-mother> conclause
''', 
    sets=customSets,
)

print('Different verse pairs')
diffPairs = A.search('''
cm:conclause
-mother> cd:conclause

v1:verse
v2:verse
v1 # v2

cm ]] v1
cd ]] v2
''',
    sets=customSets,
)

print('Same verse pairs')
samePairs = A.search('''
cm:conclause
-mother> cd:conclause

v1:verse
v2:verse
v1 = v2

cm ]] v1
cd ]] v2
''',
    sets=customSets,
)

allPairSet = set(allPairs)
diffPairSet = {(r[0], r[1]) for r in diffPairs}
samePairSet = {(r[0], r[1]) for r in samePairs}

print(f'Intersection same-verse/different-verse pairs: {samePairSet & diffPairSet}')
print(f'All pairs is union of same-verse/different-verse pairs: {allPairSet == (samePairSet | diffPairSet)}')
All pairs
  0.20s 13881 results
Different verse pairs
  0.27s 721 results
Same verse pairs
  0.30s 13160 results
Intersection same-verse/different-verse pairs: set()
All pairs is union of same-verse/different-verse pairs: True

Lessons

  • mix programming with composing queries;
  • a good way to do so is custom sets;
  • use programming for processing results;
  • find the balance between queries and hand-coding.

Next

You have now finished the search tutorial.

If you are interested in reproducing MQL queries in Text-Fabric search templates, see fromMQL.

