This notebook gets you started with using Text-Fabric to compute with the letters of René Descartes.
Familiarity with the underlying data model is recommended.
For provenance, see the about page in the documentation.
See the installation instructions.
We will run computer code in the cells below, and this code makes use of the text-fabric library, tf for short.
We import some standard Python modules and then we import the use function from text-fabric.
import sys, os
from tf.app import use
Now we are going to use the use function.
We want to use a corpus, and if we specify which corpus, text-fabric will fetch the data for us.
If you have cloned the CLARIAH/descartes-tf repository to your local machine under the directory ~/github/CLARIAH/descartes-tf, then you already have the data. In that case, call the use command like this:
A = use("CLARIAH/descartes-tf:clone", checkout="clone", hoist=globals())
Below we give the command for the case where you have not cloned the repository. Text-Fabric will fetch the data from the internet and store it in your directory ~/text-fabric-data/github/CLARIAH/descartes-tf.
In both cases, the corpus data will be optimised for fast processing. This is a one-time job.
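If you are not sure which of the two situations applies on a given machine, you can let the notebook decide. The helper below is a hypothetical sketch, not part of Text-Fabric: it only checks whether the clone directory mentioned above exists and builds the matching arguments for use, mirroring the two commands in this notebook. It does not load anything itself.

```python
import os


def use_arguments(home=None):
    """Hypothetical helper (not part of Text-Fabric).

    Returns (app name, extra keyword arguments) for use(), depending on
    whether a clone exists under <home>/github/CLARIAH/descartes-tf.
    """
    home = home or os.path.expanduser("~")
    clone_dir = os.path.join(home, "github", "CLARIAH", "descartes-tf")
    if os.path.isdir(clone_dir):
        # A local clone exists: load the data from it.
        return "CLARIAH/descartes-tf:clone", {"checkout": "clone"}
    # No clone: Text-Fabric downloads into ~/text-fabric-data/github/...
    return "CLARIAH/descartes-tf", {}


app_name, extra = use_arguments()
print(app_name, extra)
# In the notebook you would then call:
# A = use(app_name, hoist=globals(), **extra)
```

Keeping the decision in one small function lets the same notebook run unchanged on machines with and without a clone.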
A = use("CLARIAH/descartes-tf", hoist=globals())
Locating corpus resources ...
Name | # of nodes | # slots/node | % coverage |
---|---|---|---|
volume | 8 | 85241.88 | 100 |
letter | 725 | 940.60 | 100 |
page | 2884 | 236.45 | 100 |
postscriptum | 56 | 46.79 | 0 |
opener | 545 | 1.97 | 0 |
closer | 541 | 13.10 | 1 |
address | 86 | 15.22 | 0 |
head | 725 | 23.37 | 2 |
p | 8438 | 80.82 | 100 |
sentence | 13074 | 50.14 | 96 |
hi | 5972 | 4.63 | 4 |
formula | 6200 | 1.21 | 1 |
figure | 319 | 1.00 | 0 |
word | 681935 | 1.00 | 100 |
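The columns of this table are related. Words are the slots (each word covers exactly 1.00 slot, and 8 volumes × 85241.88 slots gives back the 681935 words), so multiplying the number of nodes by the average slots per node gives the number of slots covered, and dividing by the total number of slots gives the coverage. A small plain-Python check, assuming that nodes of one type do not overlap; this is an approximation for reading the table, not Text-Fabric's own code:

```python
# Node-type statistics copied from the table above:
# name -> (number of nodes, average slots per node, reported % coverage)
STATS = {
    "volume": (8, 85241.88, 100),
    "letter": (725, 940.60, 100),
    "page": (2884, 236.45, 100),
    "postscriptum": (56, 46.79, 0),
    "opener": (545, 1.97, 0),
    "closer": (541, 13.10, 1),
    "address": (86, 15.22, 0),
    "head": (725, 23.37, 2),
    "p": (8438, 80.82, 100),
    "sentence": (13074, 50.14, 96),
    "hi": (5972, 4.63, 4),
    "formula": (6200, 1.21, 1),
    "figure": (319, 1.00, 0),
    "word": (681935, 1.00, 100),
}
TOTAL_SLOTS = 681935  # the number of words, which are the slots


def coverage(n_nodes, slots_per_node, total=TOTAL_SLOTS):
    """Percentage of all slots covered, assuming nodes of one type don't overlap."""
    return round(100 * n_nodes * slots_per_node / total)


for name, (n, spn, reported) in STATS.items():
    print(f"{name:<13} computed {coverage(n, spn):>3}%  reported {reported:>3}%")
```

For every node type the recomputed percentage agrees with the reported one, which is a quick way to convince yourself how to read the table.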
Just to show the results of the optimization step: if we give the same command again, the data is loaded much more quickly.
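To put a number on the difference, you can time both calls. The timed helper below is a hypothetical, minimal sketch using the standard time.perf_counter clock; it works for any function, including use.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# In the notebook, for example:
# A, seconds = timed(use, "CLARIAH/descartes-tf", hoist=globals())
# print(f"loaded in {seconds:.1f}s")
```

Run it once on a fresh machine and once more afterwards to see the effect of the one-time optimisation.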
A = use("CLARIAH/descartes-tf", hoist=globals())
The messages after loading the corpus contain a lot of information about it.
Tip: click the triangles and the links, and have a quick look.
The Text-Fabric line has various links to the API docs.
Under Node types you find statistics about the corpus.
Under Descartes = Descartes, all letters, you find the features of the corpus, with short descriptions.
This corpus has additional material: illustrations. They were downloaded automatically while loading, and the messages show how many there are.
query = """
formula notation=TeX
"""
results = A.search(query)
0.01s 219 results
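The query asks for all nodes of type formula whose notation feature has the value TeX. The same selection can be made with the feature API that use(..., hoist=globals()) puts in the notebook as F: F.otype.s gives the nodes of a type and F.notation.v gives a node's value for the notation feature. The sketch below wraps this in a hypothetical function that takes F as a parameter; it is an illustration of what the template expresses, not how the search engine is implemented.

```python
def tex_formulas(F):
    """Hypothetical helper: formula nodes whose `notation` feature equals "TeX".

    `F` is the Text-Fabric feature API hoisted into the notebook by use().
    """
    return [n for n in F.otype.s("formula") if F.notation.v(n) == "TeX"]


# In the notebook, after loading the corpus:
# tex = tex_formulas(F)
# print(len(tex), len(results))  # the two counts should agree
```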
Let's show a few.
A.table(results, end=3)
n | p | formula |
---|---|---|
1 | 1 1046:11 | 134916276481 |
2 | 1 1060:3 | 4.900x6 aequat −4.899x5+2.354x4+16.858x3+9.458xx+429x−4.900 |
3 | 1 1060:9 | 3xx−1x2 |
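Each search result is a tuple with one node per line of the query; since this query has a single line, each tuple holds just one formula node. A minimal sketch of getting at the text of those formulas yourself, assuming the T API that use(..., hoist=globals()) placed in the notebook (T.text is part of Text-Fabric; formula_texts is a hypothetical helper):

```python
def formula_texts(results, T, end=3):
    """Hypothetical helper: the text of the formula in each of the first `end` results."""
    # The query has one line, so every result tuple contains exactly one node.
    return [T.text(formula) for (formula,) in results[:end]]


# In the notebook:
# for text in formula_texts(results, T):
#     print(text)
```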
You can see them in context as well:
A.show(results, end=3)
result 1
result 2
result 3
By now you have an impression of how to orient yourself in this corpus. The next steps will show you how to become powerful: searching and computing.
After that it is time for collecting results, using them in new annotations, and sharing them.
Advanced
CC-BY Dirk Roorda