You might want to consider the start of this tutorial.
Search in Text-Fabric is a template-based way of looking for structural patterns in your dataset.
Within Text-Fabric we have the unique possibility to combine the ease of formulating search templates for complicated syntactical patterns with the power of programmatically processing the results.
This notebook will show you how to get up and running.
Search is a powerful feature for a wide range of purposes.
Quite a bit of the implementation work has gone into optimizing performance. Still, I do not claim to have found optimal strategies for all possible search templates. Some search tasks may turn out to be somewhat costly or even very costly.
That being said, I think search will prove helpful in many cases, especially by reducing the amount of hand-coding needed to work with special subsets of your data.
Search is as simple as saying (just an example):
results = A.search(template)
A.show(results)
See all ins and outs in the search template docs.
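To make "programmatically processing the results" a little more concrete, here is a minimal sketch in plain Python. It assumes that the results behave like a list of tuples of node numbers, one member per line of the template; the node numbers and the names `hits_per_clause` and `repeated` are invented for illustration and do not come from the corpus.

```python
from collections import Counter

# Hypothetical results: each tuple is (clause node, word node).
# The node numbers are made up for this illustration.
results = [(5001, 101), (5001, 103), (5001, 105), (5002, 120)]

# Count how many results fall in each clause (the first member of each tuple).
hits_per_clause = Counter(clause for (clause, word) in results)

# Keep only the clauses that were hit more than once.
repeated = sorted(c for (c, n) in hits_per_clause.items() if n > 1)

print(hits_per_clause)
print(repeated)
```

With real results, the same kind of loop could feed any further analysis you like, which is where search and hand-coding meet.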
%load_ext autoreload
%autoreload 2
The ins and outs of installing Text-Fabric, getting the corpus, and initializing a notebook are explained in the start tutorial.
from tf.app import use
A = use("etcbc/dhammapada", hoist=globals())
This is Text-Fabric 9.2.0
Api reference : https://annotation.github.io/text-fabric/tf/cheatsheet.html
16 features found and 0 ignored
We start with the simplest form of issuing a query.
Let's search for the word Māro in the Pali text.
We also want to show the clauses in which it occurs.
But first: how do you type that ā? To be honest: I don't know either.
Text-Fabric has a handy function to give you a palette of all the non-ASCII characters in the corpus:
A.specialCharacters()
Special characters in text-orig-full
â
ā
ḍ
ê
ë
ḥ
î
ī
ḷ
ṃ
ñ
ṅ
ṇ
ȏ
ṭ
û
ū
Now, if you click on a letter, it is stored on your clipboard, ready to paste. To help you remember where you clicked last, the letter becomes yellow.
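If you prefer not to depend on clicking, the same information can be gathered in plain Python. The sketch below is not how Text-Fabric builds its palette; it only assumes that a "special character" is any non-ASCII character, and it uses a fragment of the corpus text as sample input. The names `palette` and `query_word` are made up for this example.

```python
import unicodedata

# Sample text: a fragment of a Dhammapada clause containing the word we are after.
text = "taṃ ve pasahatī Māro vāto rukkhaṃ va dubbalaṃ."

# Normalise to composed form so that each accented letter is a single character.
text = unicodedata.normalize("NFC", text)

# Collect the distinct non-ASCII characters, sorted by code point.
palette = sorted({ch for ch in text if ord(ch) > 127})
print(palette)

# Build a character from its official Unicode name instead of typing it.
a_macron = unicodedata.lookup("LATIN SMALL LETTER A WITH MACRON")
query_word = f"M{a_macron}ro"
print(query_word)
```

Looking a character up by name is handy when you know what the letter is called but cannot find it on your keyboard.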
query = """
clause
  word pali=Māro
"""
results = A.search(query)
0.01s 5 results
We have the results. We only need to display them. Here they are in a table:
A.table(results)
n | p | clause | word |
---|---|---|---|
1 | 1 7 | subhānupassiṃ viharantaṃ indriyesu asaṃvutaṃ bhojanamhi câmattaññuṃ kusītaṃ hīnavīriyaṃ taṃ ve pasahatī Māro vāto rukkhaṃ va dubbalaṃ. | Māro |
2 | 1 8 | asubhānupassiṃ viharantaṃ indriyesu susaṃvutaṃ bhojanamhi ca mattaññuṃ saddhaṃ āraddhavīriyaṃ taṃ [ve] na-ppasahatī Māro vāto selaṃ va pabbataṃ. | Māro |
3 | 4 57 | tesaṃ sampannasīlānaṃ appamādavihārinaṃ sammadaññāvimuttānaṃ Māro maggaṃ na vindati. | Māro |
4 | 8 105 | n' eva devo na gandhabbo na Māro saha Brahmunā jitaṃ apajitaṃ kayrā tathārūpassa jantuno. | Māro |
5 | 24 337 | taṃ vo vadāmi bhaddaṃ vo yāvant' ettha samāgatā taṇhāya mūlaṃ khanatha usīrattho va bīraṇaṃ mā vo naḷaṃ va soto va Māro bhañji punappunaṃ. | Māro |
The hyperlinks in the p column point to the Tipitaka site, to the stanza most relevant to the individual results.
Here is the first one in a pretty display:
A.show(results, end=1)
result 1
We can also stop unravelling structure at the clause level:
A.show(results, end=2, baseTypes={"clause"})
result 1
result 2
There are two fundamentally different ways of presenting the results: condensed and uncondensed.
In uncondensed view, all results are listed individually. You can keep track of which parts belong to which results. The display can become unwieldy.
This is the default view, because it is the most direct and logical answer to your query.
In condensed view, all nodes of all results are grouped in containers first (e.g. stanzas), and then presented container by container. You lose the information of which parts belong to which result.
Here is an example of the difference.
query = """
clause
  word pali=maṃ
"""
results = A.search(query)
0.01s 7 results
A.table(results)
n | p | clause | word |
---|---|---|---|
1 | 1 3 | "akkocchi maṃ avadhi maṃ ajini maṃ ahāsi me", | maṃ |
2 | 1 3 | "akkocchi maṃ avadhi maṃ ajini maṃ ahāsi me", | maṃ |
3 | 1 3 | "akkocchi maṃ avadhi maṃ ajini maṃ ahāsi me", | maṃ |
4 | 1 4 | "akkocchi maṃ avadhi maṃ ajini maṃ ahāsi me", | maṃ |
5 | 1 4 | "akkocchi maṃ avadhi maṃ ajini maṃ ahāsi me", | maṃ |
6 | 1 4 | "akkocchi maṃ avadhi maṃ ajini maṃ ahāsi me", | maṃ |
7 | 26 414 | yo' maṃ palipathaṃ duggaṃ saṃsāraṃ moham accagā tiṇṇo pāragato jhāyī anejo akathaṃkathī anupādāya nibbuto tam - | maṃ |
There are multiple occurrences of maṃ in the clauses.
Now in condensed mode:
A.table(results, condensed=True)
Much more compact.
In a pretty display, the first two stanzas cover the first 6 hits:
A.show(results, end=2, condensed=True)
stanza 1
stanza 2
We can make it more compact by condensing into clauses instead of stanzas:
A.show(results, end=2, condensed=True, condenseType="clause")
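The grouping idea behind condensed view can be sketched in a few lines of plain Python. This is only an illustration of the principle, not Text-Fabric's implementation: the node numbers, the `stanza_of` lookup table and the `condense` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical results as (clause, word) tuples; node numbers are invented.
results = [(5001, 101), (5001, 103), (5002, 120)]

# Hypothetical lookup from any node to the stanza node that contains it.
stanza_of = {5001: 9001, 101: 9001, 103: 9001, 5002: 9002, 120: 9002}


def condense(results, container_of):
    """Group the nodes of all results by container.

    The link between a node and the result it came from is dropped.
    """
    grouped = defaultdict(set)
    for result in results:
        for node in result:
            grouped[container_of[node]].add(node)
    return {container: sorted(nodes) for container, nodes in sorted(grouped.items())}


condensed = condense(results, stanza_of)
print(condensed)
```

Three results shrink to two containers, and nothing in the output says which word belonged to which result. Swapping in a lookup to clauses instead of stanzas would give smaller containers, in the spirit of the condenseType parameter.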
We can apply different highlight colours to different parts of the result. The clause is member 1 of each result tuple, and the two words are members 2 and 3. Members that we do not map will not be highlighted. Members that we map to the empty string will be highlighted with the default colour.
NB: Choose your colours from the CSS specification.
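The mapping rules just described can be spelled out as a small plain-Python sketch. It is not Text-Fabric code: the `highlights` helper and the node numbers are hypothetical, and "yellow" is only an assumed stand-in for whatever the default highlight colour really is.

```python
# Assumption: a stand-in for the default highlight colour.
DEFAULT_COLOUR = "yellow"


def highlights(result, color_map):
    """Return a node -> colour mapping for one result tuple.

    Positions are counted from 1. Unmapped positions get no highlight;
    positions mapped to the empty string get the default colour.
    """
    node_colours = {}
    for position, node in enumerate(result, start=1):
        if position not in color_map:
            continue
        node_colours[node] = color_map[position] or DEFAULT_COLOUR
    return node_colours


# Hypothetical result tuple: (clause, word, word).
result = (5001, 101, 102)

full = highlights(result, {1: "", 2: "cyan", 3: "magenta"})
partial = highlights(result, {2: "cyan"})
print(full)
print(partial)
```

In the second call only position 2 is mapped, so the clause and the second word stay without highlight.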
query = """
clause
  word pali=maṃ
  word pali=avadhi
"""
results = A.search(query)
0.02s 6 results
A.table(results, condensed=False, colorMap={1: "", 2: "cyan", 3: "magenta"})
n | p | clause | word | word |
---|---|---|---|---|
1 | 1 3 | "akkocchi maṃ avadhi maṃ ajini maṃ ahāsi me", | maṃ | avadhi |
2 | 1 3 | "akkocchi maṃ avadhi maṃ ajini maṃ ahāsi me", | maṃ | avadhi |
3 | 1 3 | "akkocchi maṃ avadhi maṃ ajini maṃ ahāsi me", | maṃ | avadhi |
4 | 1 4 | "akkocchi maṃ avadhi maṃ ajini maṃ ahāsi me", | maṃ | avadhi |
5 | 1 4 | "akkocchi maṃ avadhi maṃ ajini maṃ ahāsi me", | maṃ | avadhi |
6 | 1 4 | "akkocchi maṃ avadhi maṃ ajini maṃ ahāsi me", | maṃ | avadhi |
Or with more glory:
A.show(results, end=2, condensed=False, condenseType="sentence", colorMap={1: "", 2: "cyan", 3: "magenta"})
result 1
result 2
Color mapping works best for uncondensed results. If you condense results, some nodes may occupy different positions in different results. It is unpredictable which color will be used for such nodes:
A.show(results, end=1, condensed=True, condenseType="sentence", colorMap={1: "", 2: "cyan", 3: "magenta"})
sentence 1
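Why the colour becomes unpredictable can be seen in a plain-Python sketch. It assumes two hypothetical results in which the same two word nodes swap positions; the node numbers and the names `candidates` and `ambiguous` are invented, and this is not Text-Fabric's own logic.

```python
from collections import defaultdict

# Hypothetical results: the same two word nodes occur in swapped positions.
results = [(5001, 101, 103), (5001, 103, 101)]
color_map = {2: "cyan", 3: "magenta"}

# Collect every colour that each node could receive across all results.
candidates = defaultdict(set)
for result in results:
    for position, node in enumerate(result, start=1):
        if position in color_map:
            candidates[node].add(color_map[position])

# Nodes with more than one candidate colour have no single right answer
# once the results are merged into one container.
ambiguous = sorted(node for node, colours in candidates.items() if len(colours) > 1)
print(ambiguous)
```

In uncondensed view each result is drawn separately, so the conflict never arises.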
You can stipulate an order on the words in your template.
You only have to put a relational operator between them.
Say we want only results where maṃ follows avadhi.
A.specialCharacters()
Special characters in text-orig-full
â
ā
ḍ
ê
ë
ḥ
î
ī
ḷ
ṃ
ñ
ṅ
ṇ
ȏ
ṭ
û
ū
query = """
clause
  word pali=maṃ
  > word pali=avadhi
"""
results = A.search(query)
0.02s 4 results
A.table(results, colorMap={1: "", 2: "cyan", 3: "magenta"})
We can also require the words to be adjacent.
query = """
clause
  word pali=maṃ
  :> word pali=avadhi
"""
results = A.search(query)
0.02s 2 results
A.table(results, colorMap={1: "", 2: "cyan", 3: "magenta"})
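The effect of the two operators can be checked by hand on a single clause. The sketch below is plain Python, not a Text-Fabric query: it treats word positions within the clause as a stand-in for slot numbers, reads `>` as "comes after" and `:>` as "comes immediately after", and ignores the punctuation of the original clause.

```python
# The clause from the results above, split into words (punctuation left out).
words = "akkocchi maṃ avadhi maṃ ajini maṃ ahāsi me".split()

# 1-based positions of each word form, standing in for slot numbers.
mam = [i for i, w in enumerate(words, start=1) if w == "maṃ"]
avadhi = [i for i, w in enumerate(words, start=1) if w == "avadhi"]

# No operator: every combination of a maṃ and an avadhi in the clause.
pairs = [(m, a) for m in mam for a in avadhi]

# Like ">": maṃ comes somewhere after avadhi.
after = [(m, a) for (m, a) in pairs if m > a]

# Like ":>": maṃ comes immediately after avadhi.
adjacent = [(m, a) for (m, a) in pairs if m == a + 1]

print(len(pairs), len(after), len(adjacent))
```

The clause appears in two stanzas in the results above, so these per-clause counts of 3, 2 and 1 are consistent with the 6, 4 and 2 results reported by the three queries.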
We would like to see the frequency of each word. To do that, we perform a display setup first. By the way, we can also include the highlight colours in the display setup.
A.displaySetup(
    extraFeatures="freq_occ", colorMap={2: "lightsalmon", 3: "mediumaquamarine"}
)
A.show(results, condensed=False, condenseType="sentence")
result 1
result 2
Now we completely reset the display customization.
A.displayReset()
As you see, you have total control.