from tf.app import use
We load not only the main corpus data but also the additional sim (similarity) feature, which lives in a separate module.
A = use('banks', mod='annotation/banks/sim/tf', hoist=globals())
connecting to online GitHub repo annotation/app-banks ... connected
Using TF-app in /Users/dirk/text-fabric-data/annotation/app-banks/code:
    #f7d4ab9681130d9f7441b2e8ed893c90a38cb72f (latest commit)
connecting to online GitHub repo annotation/banks ... connected
Using data in /Users/dirk/text-fabric-data/annotation/banks/tf/0.2:
    rv2.0=#9713e71c18fd296cf1860d6411312f9127710ba7 (latest release)
connecting to online GitHub repo annotation/banks ... connected
Using data in /Users/dirk/text-fabric-data/annotation/banks/sim/tf/0.2:
    #0c148b3af1fb8801d1300866b0e72441a59b9548 (latest commit)
   |     0.00s No structure info in otext, the structure part of the T-API cannot be used
We print all pairs of words that are more than 50% similar but less than 100%.
query = '''
word
<sim>50> word
'''
results = A.search(query)
0.01s 170 results
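To make explicit what the template asks for, here is a minimal sketch in plain Python that does not use Text-Fabric at all. The `sim_edges` dictionary, its node numbers and its values are invented for illustration; it only mimics the shape of what `E.sim.b(node)` returns, namely `(node, value)` pairs. The function `pairs_above` is a hypothetical helper, and the strict comparison assumes that `>50` in the template means "greater than 50".

```python
# Hypothetical stand-in for the sim edge feature (not real corpus data):
# node -> ((other_node, similarity_percentage), ...), listed in both directions
sim_edges = {
    1: ((2, 60), (3, 100)),
    2: ((1, 60),),
    3: ((1, 100), (4, 40)),
    4: ((3, 40),),
}


def pairs_above(edges, threshold):
    """Return (n, m) for every edge whose value is strictly above threshold."""
    return [
        (n, m)
        for n in sorted(edges)
        for (m, value) in edges[n]
        if value > threshold
    ]


hits = pairs_above(sim_edges, 50)
print(hits)
```

In this toy data every undirected pair shows up once per direction, and pairs with value 100 are still included, so a result list of this kind needs further filtering before it is printed.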
We sort the two words of each pair and keep track of the pairs we have already seen, so that no pair is printed twice.
seen = set()
for (w1, w2) in results:
    # skip pairs that are 100% similar
    if (w2, 100) in E.sim.b(w1):
        continue
    letters1 = F.letters.v(w1)
    letters2 = F.letters.v(w2)
    # sort so that (a, b) and (b, a) yield the same key
    pair = tuple(sorted((letters1, letters2)))
    if pair in seen:
        continue
    seen.add(pair)
    print(' ~ '.join(pair))
know ~ own
harness ~ patterns
nothing ~ things
that ~ that’s
the ~ those
bottom ~ most
life ~ line
societies ~ those
not ~ to
make ~ take
elegant ~ languages
mattered ~ terms
left ~ life
humans ~ mountains
care ~ romance
studying ~ things
impossible ~ problems
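The de-duplication step can be tried out in isolation. The following is a sketch under stated assumptions: the node numbers, the `letters` mapping and the `results` list are made up, and `unique_letter_pairs` is a hypothetical helper rather than part of Text-Fabric. It shows that sorting the two words gives `(a, b)` and `(b, a)` the same key, and that two different occurrences of the same word pair also collapse into one line.

```python
# Invented sample data: node -> letters, and (node, node) search results
letters = {1: "make", 2: "take", 3: "know", 4: "own", 5: "take"}
results = [(1, 2), (2, 1), (3, 4), (4, 3), (1, 5), (5, 1)]


def unique_letter_pairs(results, letters):
    """Return one 'a ~ b' line per distinct unordered pair of words."""
    seen = set()
    lines = []
    for (w1, w2) in results:
        # order-independent key: both directions map to the same tuple
        pair = tuple(sorted((letters[w1], letters[w2])))
        if pair in seen:
            continue
        seen.add(pair)
        lines.append(" ~ ".join(pair))
    return lines


lines = unique_letter_pairs(results, letters)
print("\n".join(lines))
```

Keying on the letters rather than on the node numbers is a design choice: it reports each pair of word forms once, no matter how many times those forms occur in the text.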