
The Banks example corpus as app

In [1]:
from tf.app import use

We load not only the main corpus data but also the additional sim (similarity) feature, which lives in a separate module.
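This notebook does not say how the sim values are computed. Purely as an illustration, the sketch below assumes a Jaccard-style measure on the sets of lowercased letters of two words, scaled to a percentage. The function name `letter_similarity` is hypothetical; it is not part of Text-Fabric, and the module may well compute sim differently.

```python
def letter_similarity(word_a: str, word_b: str) -> int:
    """Hypothetical measure: percentage overlap between the letter sets of two words."""
    # Assumption: only alphabetic characters count, and case is ignored.
    letters_a = {c for c in word_a.lower() if c.isalpha()}
    letters_b = {c for c in word_b.lower() if c.isalpha()}
    union = letters_a | letters_b
    if not union:
        # Two words without any letters: define the similarity as 0.
        return 0
    return round(100 * len(letters_a & letters_b) / len(union))


print(letter_similarity("know", "own"))
print(letter_similarity("Everything", "everything"))
```

Under this assumption, know and own share three of four distinct letters, and Everything and everything have identical letter sets once case is ignored.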

In [3]:
A = use('banks', mod='annotation/banks/sim/tf', hoist=globals())
# A = use('banks:clone', checkout="clone", mod='annotation/banks/sim/tf:clone', hoist=globals())
rate limit is 5000 requests per hour, with 4906 left for this hour
	connecting to online GitHub repo annotation/app-banks ... connected
Using TF-app in /Users/dirk/text-fabric-data/annotation/app-banks/code:
	rv2.0.0=#fb0df6a00a84c8b77d9283b3f3241c3d2d618dae (latest release)
rate limit is 5000 requests per hour, with 4901 left for this hour
	connecting to online GitHub repo annotation/banks ... connected
Using data in /Users/dirk/text-fabric-data/annotation/banks/tf/0.2:
	rv2.0=#9713e71c18fd296cf1860d6411312f9127710ba7 (latest release)
rate limit is 5000 requests per hour, with 4896 left for this hour
	connecting to online GitHub repo annotation/banks ... connected
Using data in /Users/dirk/text-fabric-data/annotation/banks/sim/tf/0.2:
	#8d87675fd02ee96ad6f4c3a5ce99e0bda8277a54 (latest commit)
   |     0.00s Dataset without structure sections in otext:no structure functions in the T-API
Documentation: BANKS, Character table, Feature docs, banks API, Text-Fabric API 8.0.1, Search Reference
Loaded features:

annotation/banks/sim/tf: sim

Two quotes from Consider Phlebas by Iain M. Banks: author gap letters number otype punc terminator title oslots

Use the similarity edge feature

We search for all pairs of words whose similarity is greater than 50%. Pairs that are 100% similar are still included at this point; we filter them out further below.

In [4]:
query = '''
word
<sim>50> word
'''
In [5]:
results = A.search(query)
  0.01s 170 results
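To make the query concrete: as we read the search template syntax, `<sim>50>` relates two words when a sim edge with a value above 50 connects them, in either direction. The plain-Python sketch below mimics that filter on a tiny hand-made edge table. The node numbers, the values, and the helper name `pairs_above` are invented for illustration and do not come from the corpus or from Text-Fabric.

```python
# Toy directed edges: (from_node, to_node) -> similarity percentage (made-up data).
toy_edges = {
    (1, 2): 100,
    (1, 3): 75,
    (2, 4): 40,
    (3, 4): 60,
}


def pairs_above(edges, threshold):
    """Return ordered pairs (a, b) linked in either direction by a value > threshold."""
    found = []
    for (a, b), value in edges.items():
        if value > threshold:
            # An edge in either direction counts, so report both orders.
            found.append((a, b))
            found.append((b, a))
    return sorted(found)


toy_results = pairs_above(toy_edges, 50)
print(len(toy_results), "results")
for pair in toy_results:
    print(pair)
```

Two things are visible in the toy output: every pair shows up in both orders, and the pair with value 100 is still included because the condition only asks for a value above 50.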
In [6]:
A.table(results, end=10, withPassage="1 2")
n  | word                            | word
1  | Consider Phlebas 1:1 Everything | Consider Phlebas 1:1 everything
2  | Consider Phlebas 1:1 Everything | Consider Phlebas 1:1 everything
3  | Consider Phlebas 1:1 us,        | Consider Phlebas 1:1 us,
4  | Consider Phlebas 1:1 everything | Consider Phlebas 1:1 Everything
5  | Consider Phlebas 1:1 everything | Consider Phlebas 1:1 everything
6  | Consider Phlebas 1:1 us,        | Consider Phlebas 1:1 us,
7  | Consider Phlebas 1:1 everything | Consider Phlebas 1:1 Everything
8  | Consider Phlebas 1:1 everything | Consider Phlebas 1:1 everything
9  | Consider Phlebas 1:1 we         | Consider Phlebas 1:2 we
10 | Consider Phlebas 1:1 we         | Consider Phlebas 1:2 we
In [7]:
A.show(results, end=5)

result 1

Consider Phlebas 1:1
sentence: Everything about us, everything around us, everything we know and can know of is composed ultimately of patterns of nothing; that’s the bottom line, the final truth.

result 2

Consider Phlebas 1:1
sentence: Everything about us, everything around us, everything we know and can know of is composed ultimately of patterns of nothing; that’s the bottom line, the final truth.

result 3

Consider Phlebas 1:1
sentence: Everything about us, everything around us, everything we know and can know of is composed ultimately of patterns of nothing; that’s the bottom line, the final truth.

result 4

Consider Phlebas 1:1
sentence: Everything about us, everything around us, everything we know and can know of is composed ultimately of patterns of nothing; that’s the bottom line, the final truth.

result 5

Consider Phlebas 1:1
sentence: Everything about us, everything around us, everything we know and can know of is composed ultimately of patterns of nothing; that’s the bottom line, the final truth.

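We sort the two words of each pair, so that a pair counts as the same regardless of order, and we keep track of the pairs we have already seen to avoid printing duplicates. Pairs that are 100% similar are skipped.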

In [8]:
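seen = set()
for (w1, w2) in results:
    # skip pairs that are 100% similar (E.sim.b looks at sim edges in both directions)
    if (w2, 100) in E.sim.b(w1):
        continue
    letters1 = F.letters.v(w1)
    letters2 = F.letters.v(w2)
    # order the two words so that (a, b) and (b, a) yield the same pair
    pair = tuple(sorted((letters1, letters2)))
    if pair in seen:
        continue
    seen.add(pair)
    print(' ~ '.join(pair))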
know ~ own
harness ~ patterns
nothing ~ things
that ~ that’s
the ~ those
bottom ~ most
life ~ line
societies ~ those
not ~ to
make ~ take
elegant ~ languages
mattered ~ terms
left ~ life
humans ~ mountains
care ~ romance
studying ~ things
impossible ~ problems
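The filtering and deduplication step can be tried out without a corpus. The sketch below is a plain-Python stand-in: `toy_letters` plays the role of F.letters.v, `toy_sim` plays the role of the sim edges, and `both_directions` imitates what we take E.sim.b to return, namely (node, value) pairs for edges in either direction. All names and data here are invented for illustration.

```python
# Made-up data: node -> word, and directed edges -> similarity percentage.
toy_letters = {1: "know", 2: "own", 3: "know", 4: "us", 5: "us"}
toy_sim = {(1, 2): 75, (2, 3): 75, (1, 3): 100, (4, 5): 100}


def both_directions(node):
    """Return the set of (other_node, value) for edges touching node, either way."""
    linked = set()
    for (a, b), value in toy_sim.items():
        if a == node:
            linked.add((b, value))
        elif b == node:
            linked.add((a, value))
    return linked


# Mimic search results: every linked pair appears in both orders.
toy_pairs = list(toy_sim) + [(b, a) for (a, b) in toy_sim]

toy_seen = set()
printed = []
for (w1, w2) in toy_pairs:
    # Drop pairs that are 100% similar.
    if (w2, 100) in both_directions(w1):
        continue
    # Sorting makes ("own", "know") and ("know", "own") the same key.
    pair = tuple(sorted((toy_letters[w1], toy_letters[w2])))
    if pair in toy_seen:
        continue
    toy_seen.add(pair)
    printed.append(" ~ ".join(pair))

print(printed)
```

Because the set holds the words rather than the node numbers, two different occurrences of the same word pair collapse into a single printed line.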


CC-BY Dirk Roorda