Start with convert

The Banks example corpus as app

In [1]:
from tf.app import use

We load not only the main corpus data but also the additional sim (similarity) feature, which lives in a separate module.

In [2]:
A = use('banks', mod='annotation/banks/sim/tf', hoist=globals())
	connecting to online GitHub repo annotation/app-banks ... connected
Using TF-app in /Users/dirk/text-fabric-data/annotation/app-banks/code:
	#f7d4ab9681130d9f7441b2e8ed893c90a38cb72f (latest commit)
	connecting to online GitHub repo annotation/banks ... connected
Using data in /Users/dirk/text-fabric-data/annotation/banks/tf/0.2:
	rv2.0=#9713e71c18fd296cf1860d6411312f9127710ba7 (latest release)
	connecting to online GitHub repo annotation/banks ... connected
Using data in /Users/dirk/text-fabric-data/annotation/banks/sim/tf/0.2:
	#0c148b3af1fb8801d1300866b0e72441a59b9548 (latest commit)
   |     0.00s No structure info in otext, the structure part of the T-API cannot be used
Documentation: BANKS Character table Feature docs banks API Text-Fabric API 7.8.8 Search Reference
Loaded features:

Two quotes from Consider Phlebas by Iain M. Banks: author gap letters number otype punc terminator title oslots

annotation/banks/sim/tf: sim

Use the similarity edge feature

We print all pairs of words that are at least 50% similar but not 100% similar.

In [3]:
query = '''
word
<sim>50> word
'''

In [4]:
results = A.search(query)
  0.01s 170 results

We skip pairs that are 100% similar and sort the two words of each remaining pair. We keep track of the pairs we have seen so that no pair is printed twice.

In [5]:
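seen = set()
for (w1, w2) in results:
  # skip pairs of words that are 100% similar
  if (w2, 100) in E.sim.b(w1):
    continue
  letters1 = F.letters.v(w1)
  letters2 = F.letters.v(w2)
  # sort the pair so that (a, b) and (b, a) count as the same pair
  pair = tuple(sorted((letters1, letters2)))
  if pair in seen:
    continue
  seen.add(pair)
  print(' ~ '.join(pair))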
know ~ own
harness ~ patterns
nothing ~ things
that ~ that’s
the ~ those
bottom ~ most
life ~ line
societies ~ those
not ~ to
make ~ take
elegant ~ languages
mattered ~ terms
left ~ life
humans ~ mountains
care ~ romance
studying ~ things
impossible ~ problems
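
The filtering and de-duplication logic above can be tried without Text-Fabric. The sketch below is only an illustration: the function names letter_similarity and similar_pairs, the word list, and the similarity measure (shared letters divided by all letters, as a percentage) are assumptions made for this example and are not necessarily how the sim feature of the corpus was computed.

```python
def letter_similarity(a, b):
    """Hypothetical measure: Jaccard similarity of the letter sets, in percent."""
    set_a, set_b = set(a), set(b)
    return round(100 * len(set_a & set_b) / len(set_a | set_b))


def similar_pairs(words, threshold=50):
    """Return unordered pairs with threshold < similarity < 100, without duplicates."""
    seen = set()
    pairs = []
    # Visit ordered pairs of distinct positions, so each pair shows up in both
    # directions, just like the results of a symmetric edge query.
    for i, w1 in enumerate(words):
        for j, w2 in enumerate(words):
            if i == j:
                continue
            sim = letter_similarity(w1, w2)
            # skip pairs that are not similar enough or that are 100% similar
            if sim <= threshold or sim == 100:
                continue
            # sort the pair so that (a, b) and (b, a) count as the same pair
            pair = tuple(sorted((w1, w2)))
            if pair in seen:
                continue
            seen.add(pair)
            pairs.append(pair)
    return pairs


# made-up word list; "life" occurs twice to show the 100% exclusion
words = ["make", "take", "life", "line", "left", "not", "to", "life"]
for pair in similar_pairs(words):
    print(" ~ ".join(pair))
```

Sorting each pair before looking it up in the set is what makes the two directions of a pair collapse into one entry.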
