Notebook

In [1]:

%load_ext autoreload
%autoreload 2

In [2]:

from tf.fabric import Fabric
from tf.convert.walker import CV
import cProfile, pstats, io
from pstats import SortKey

In [3]:

TF_PATH = '_temp/tf'

Make test set

In [4]:

TF = Fabric(locations=TF_PATH, silent=True)

In [5]:

slotType = 'slot'
generic = {
    'name': 'test set for query strategy testing',
    'compiler': 'Dirk Roorda',
}
otext = {
    'fmt:text-orig-full': '{num}{cat} ',
    'sectionTypes': 'chunk',
    'sectionFeatures': 'num',
}
intFeatures = {
    'num',
}
featureMeta = {
    'num': {
        'description': 'node number',
    },
    'cat': {
        'description': 'category: m f n',
    },
}

nSlots = 400000
chunkSize = 4
cats = ['m', 'f', 'n']

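def director(cv):
    # Walk over all slots; start a new chunk node every chunkSize slots.
    c = None
    for n in range(nSlots):
        if n % chunkSize == 0:
            cv.terminate(c)  # close the previous chunk (c is None at the start)
            c = cv.node('chunk')
            cv.feature(c, num=n // chunkSize)
        s = cv.slot()
        cv.feature(s, num=n, cat=cats[n % 3])
    cv.terminate(c)  # close the last chunk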
    
cv = CV(TF)

good = cv.walk(
    director,
    slotType,
    otext=otext,
    generic=generic,
    intFeatures=intFeatures,
    featureMeta=featureMeta,
)

  0.00s Importing data from walking through the source ...
   |     0.00s Preparing metadata... 
   |     0.00s No structure nodes will be set up
   |   SECTION   TYPES:    chunk
   |   SECTION   FEATURES: num
   |   STRUCTURE TYPES:    
   |   STRUCTURE FEATURES: 
   |   TEXT      FEATURES:
   |      |   text-orig-full       cat, num
   |     0.01s OK
   |     0.00s Following director... 
   |     1.43s "edge" actions: 0
   |     1.44s "feature" actions: 500000
   |     1.44s "node" actions: 100000
   |     1.44s "resume" actions: 0
   |     1.44s "slot" actions: 400000
   |     1.44s "terminate" actions: 100001
   |     100000 x "chunk" node 
   |     400000 x "slot" node  = slot type
   |     500000 nodes of all types
   |     1.51s OK
   |     0.00s checking for nodes and edges ... 
   |     0.00s OK
   |     0.00s checking features ... 
   |     0.11s OK
   |     0.00s reordering nodes ...
   |     0.09s Sorting 100000 nodes of type "chunk"
   |     0.23s Max node = 500000
   |     0.24s OK
   |     0.00s reassigning feature values ...
   |      |     0.00s node feature "cat" with 400000 nodes
   |      |     0.09s node feature "num" with 500000 nodes
   |     0.30s OK
  0.00s Exporting 3 node and 1 edge and 1 config features to _temp/tf:
  0.00s VALIDATING oslots feature
  0.07s VALIDATING oslots feature
  0.07s maxSlot=     400000
  0.07s maxNode=     500000
  0.08s OK: oslots is valid
   |     0.56s T cat                  to _temp/tf
   |     0.69s T num                  to _temp/tf
   |     0.17s T otype                to _temp/tf
   |     0.24s T oslots               to _temp/tf
   |     0.00s M otext                to _temp/tf
  1.75s Exported 3 node features and 1 edge features and 1 config features to _temp/tf

Load test set

In [4]:

TF = Fabric(locations=TF_PATH, silent='deep')
api = TF.loadAll()
docs = api.makeAvailableIn(globals())
silentOff()

Main test 1

This query template consists of a chunk with its first slot a and its last slot c (where a < c), and an independent slot s that is constrained to lie between a and c.

In [5]:

query = '''
chunk
  =: a:slot
  < c:slot
  :=

s:slot

a < s
s < c
'''
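To make the meaning of this template concrete, here is a minimal plain-Python sketch that enumerates its matches directly. It assumes the node numbering that the output of this notebook shows: slots 1 to 400000, and chunk nodes numbered from 400001 upwards, each covering 4 consecutive slots. The helper names are hypothetical and this is not how Text-Fabric evaluates the template; it only spells out what the constraints require. Tuples come out as (chunk, a, c, s), in the same order as the search results.

```python
# Hypothetical sketch, assuming the synthetic test set built above:
# slots 1..N_SLOTS, chunk nodes N_SLOTS+1, N_SLOTS+2, ..., each chunk
# covering CHUNK_SIZE consecutive slots.
N_SLOTS = 400_000
CHUNK_SIZE = 4


def chunks(n_chunks):
    """Yield (chunk_node, first_slot, last_slot) for the first n_chunks chunks."""
    for i in range(n_chunks):
        first = i * CHUNK_SIZE + 1
        yield N_SLOTS + 1 + i, first, first + CHUNK_SIZE - 1


def match_test1(n_chunks):
    """Return (chunk, a, c, s) tuples: a is the first slot (=:), c the last
    slot (:=), a < c, and an independent slot s with a < s < c."""
    results = []
    for chunk, a, c in chunks(n_chunks):
        if not a < c:  # the "< c:slot" constraint of the template
            continue
        # Slots are numbered consecutively, so a < s < c leaves range(a + 1, c).
        for s in range(a + 1, c):
            results.append((chunk, a, c, s))
    return results


for r in match_test1(5):
    print(r)
```

For five chunks this yields two matches per chunk, the same ten tuples that `S.fetch(limit=10)` returns further down.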

First we run it with a few existing strategies. The strategies are not really documented, except in comments in the code, because they are an implementation detail. They are: small_choice_first, small_choice_multi and by_yarn_size.

The third one, by_yarn_size, behaves virtually the same for the kind of queries we are testing here, so we concentrate on the first two.

When we run the experiments, we take these steps:

study
show plan
fetch 10 results under a profiler and collect statistics
print those 10 results
count up to 50 results, reporting progress after each one
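The profiling step is the same cell each time. It could be factored into a small helper such as the hypothetical `profiled` function below, which uses only the standard library (`cProfile`, `pstats`, `io`). This is a sketch, assuming, as the profile output suggests, that `S.fetch(limit=...)` computes its results while it is being profiled.

```python
import cProfile
import io
import pstats
from pstats import SortKey


def profiled(fn, *args, sort_by=SortKey.CUMULATIVE, **kwargs):
    """Call fn(*args, **kwargs) under cProfile.

    Returns a pair (result, report), where report is the text that
    pstats prints, sorted by sort_by.
    """
    pr = cProfile.Profile()
    pr.enable()
    try:
        result = fn(*args, **kwargs)
    finally:
        pr.disable()  # stop profiling even if fn raises
    stream = io.StringIO()
    pstats.Stats(pr, stream=stream).sort_stats(sort_by).print_stats()
    return result, stream.getvalue()
```

In this notebook it would be used as `results, report = profiled(S.fetch, limit=10)` followed by `print(report)`.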

Strategy: small choice first

In [6]:

S.study(query, strategy='small_choice_first')

  0.00s Checking search template ...
  0.00s Setting up search space for 4 objects ...
  0.20s Constraining search space with 7 relations ...
  0.54s 	2 edges thinned
  0.54s Setting up retrieval plan with strategy small_choice_first ...
  0.56s Ready to deliver results from 700000 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results

In [7]:

S.showPlan(details=True)

Search with 4 objects and 7 relations
Results are instantiations of the following objects:
node  0-chunk                                         100000   choices
node  1-slot                                          100000   choices
node  2-slot                                          100000   choices
node  3-slot                                          400000   choices
Performance parameters:
	yarnRatio            =    1.25
	tryLimitFrom         =      40
	tryLimitTo           =      40
Instantiations are computed along the following relations:
node                                  0-chunk         100000   choices
edge        0-chunk            :=     2-slot               1.0 choices (thinned)
edge        0-chunk            [[     2-slot               0   choices
edge        0-chunk            [[     1-slot               1.0 choices
edge        1-slot             =:     0-chunk              0   choices
edge        1-slot             <      2-slot               0   choices
edge        1-slot             <      3-slot          200000.0 choices
edge        3-slot             <      2-slot               0   choices
  2.28s The results are connected to the original search template as follows:
 0     
 1 R0  chunk
 2 R1    =: a:slot
 3 R2    < c:slot
 4       :=
 5     
 6 R3  s:slot
 7     
 8     a < s
 9     s < c
10

In [8]:

pr = cProfile.Profile()
pr.enable()
results = S.fetch(limit=10)  # fetch the first 10 results while profiling
pr.disable()
s = io.StringIO()  # collect the profile report in a string
sortby = SortKey.CUMULATIVE
ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
ps.print_stats()
print(s.getvalue())

         6400243 function calls (4800149 primitive calls) in 1.832 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        2    0.000    0.000    1.832    0.916 /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3230(run_code)
        2    0.000    0.000    1.832    0.916 {built-in method builtins.exec}
        1    0.000    0.000    1.832    1.832 <ipython-input-8-87271d1c4549>:3(<module>)
        1    0.000    0.000    1.832    1.832 /Users/dirk/github/annotation/text-fabric/tf/search/search.py:151(fetch)
        1    0.000    0.000    1.832    1.832 /Users/dirk/github/annotation/text-fabric/tf/search/searchexe.py:89(fetch)
       11    0.000    0.000    1.832    0.167 /Users/dirk/github/annotation/text-fabric/tf/search/stitch.py:683(deliver)
1600105/11    1.503    0.000    1.832    0.167 /Users/dirk/github/annotation/text-fabric/tf/search/stitch.py:690(stitchOn)
  3199998    0.237    0.000    0.237    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/relations.py:73(<lambda>)
  1600027    0.091    0.000    0.091    0.000 {built-in method builtins.len}
        2    0.000    0.000    0.000    0.000 /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/codeop.py:132(__call__)
        2    0.000    0.000    0.000    0.000 {built-in method builtins.compile}
       50    0.000    0.000    0.000    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/stitch.py:693(<genexpr>)
        5    0.000    0.000    0.000    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/relations.py:383(<lambda>)
        2    0.000    0.000    0.000    0.000 /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/IPython/core/hooks.py:142(__call__)
       10    0.000    0.000    0.000    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/relations.py:303(<lambda>)
        1    0.000    0.000    0.000    0.000 <ipython-input-8-87271d1c4549>:4(<module>)
        2    0.000    0.000    0.000    0.000 /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/IPython/utils/ipstruct.py:125(__getattr__)
       10    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        5    0.000    0.000    0.000    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/relations.py:355(<lambda>)
        2    0.000    0.000    0.000    0.000 /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/IPython/core/interactiveshell.py:1258(user_global_ns)
        2    0.000    0.000    0.000    0.000 /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/IPython/core/hooks.py:207(pre_run_code_hook)
        1    0.000    0.000    0.000    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/stitch.py:684(<listcomp>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

In [9]:

print('\n'.join(str(r) for r in results))

(400001, 1, 4, 2)
(400001, 1, 4, 3)
(400002, 5, 8, 6)
(400002, 5, 8, 7)
(400003, 9, 12, 10)
(400003, 9, 12, 11)
(400004, 13, 16, 14)
(400004, 13, 16, 15)
(400005, 17, 20, 18)
(400005, 17, 20, 19)

In [10]:

S.count(progress=1, limit=50)

  0.00s Counting results per 1 up to 50 ...
   |     0.00s 1
   |     0.00s 2
   |     0.29s 3
   |     0.30s 4
   |     0.57s 5
   |     0.57s 6
   |     0.84s 7
   |     0.84s 8
   |     1.11s 9
   |     1.11s 10
   |     1.38s 11
   |     1.38s 12
   |     1.66s 13
   |     1.66s 14
   |     1.93s 15
   |     1.93s 16
   |     2.20s 17
   |     2.20s 18
   |     2.48s 19
   |     2.48s 20
   |     2.76s 21
   |     2.76s 22
   |     3.13s 23
   |     3.13s 24
   |     3.45s 25
   |     3.45s 26
   |     3.75s 27
   |     3.75s 28
   |     4.02s 29
   |     4.02s 30
   |     4.30s 31
   |     4.30s 32
   |     4.65s 33
   |     4.65s 34
   |     4.99s 35
   |     4.99s 36
   |     5.28s 37
   |     5.28s 38
   |     5.56s 39
   |     5.56s 40
   |     5.83s 41
   |     5.83s 42
   |     6.11s 43
   |     6.11s 44
   |     6.39s 45
   |     6.39s 46
   |     6.66s 47
   |     6.66s 48
   |     7.03s 49
   |     7.04s 50
  7.04s Done: 50 results

Strategy: small choice multi

In [11]:

S.study(query, strategy='small_choice_multi')

  0.00s Checking search template ...
  0.00s Setting up search space for 4 objects ...
  0.19s Constraining search space with 7 relations ...
  0.55s 	2 edges thinned
  0.55s Setting up retrieval plan with strategy small_choice_multi ...
  0.57s Ready to deliver results from 700000 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results

In [12]:

S.showPlan(details=True)

Search with 4 objects and 6 relations
Results are instantiations of the following objects:
node  0-chunk                                         100000   choices
node  1-slot                                          100000   choices
node  2-slot                                          100000   choices
node  3-slot                                          400000   choices
Performance parameters:
	yarnRatio            =    1.25
	tryLimitFrom         =      40
	tryLimitTo           =      40
Instantiations are computed along the following relations:
node                                  0-chunk         100000   choices
edge        0-chunk            :=     2-slot               1.0 choices (thinned)
edge        0-chunk            [[     2-slot               0   choices
edge        0-chunk            [[     1-slot               1.0 choices
edge        1-slot             =:     0-chunk              0   choices
edge        1-slot             <      2-slot               0   choices
edge      1,2-slot            <,>     3-slot           20000.0 choices
  2.89s The results are connected to the original search template as follows:
 0     
 1 R0  chunk
 2 R1    =: a:slot
 3 R2    < c:slot
 4       :=
 5     
 6 R3  s:slot
 7     
 8     a < s
 9     s < c
10

Observe how the two constraints 1-slot < 3-slot and 3-slot < 2-slot have been combined into a single edge, 1,2-slot <,> 3-slot. Both are tested in one pass.
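A toy illustration of why combining the two edges can save calls. This is hypothetical code, not Text-Fabric's stitch implementation. `nested` applies a < s first and recurses once per surviving candidate to test s < c, loosely mirroring the small_choice_first plan. `multi` tests a < s < c in a single pass, loosely mirroring the combined edge.

```python
# Hypothetical illustration, not Text-Fabric's stitch code: count how often a
# recursive step is entered when two constraints are applied one after the
# other, versus when both are checked in a single pass.


def nested(a, c, slots):
    """Edge by edge: a < s yields candidates; each is recursed into to test s < c."""
    calls = 0
    found = []

    def step(s):
        nonlocal calls
        calls += 1  # one recursive call per candidate
        if s < c:
            found.append(s)

    for s in slots:
        if a < s:  # first edge: a < s
            step(s)  # second edge is tested one level deeper
    return found, calls


def multi(a, c, slots):
    """Both constraints a < s and s < c tested together in one pass."""
    found = [s for s in slots if a < s < c]
    return found, 1  # a single call handles the combined edge


slots = range(1, 401)  # a small slot space for illustration
print(nested(1, 4, slots))
print(multi(1, 4, slots))
```

Both functions find the same slots, but `nested` enters its recursive step once for every slot after a, while `multi` does the work in one call that is more expensive per call.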

In [13]:

pr = cProfile.Profile()
pr.enable()
results = S.fetch(limit=10)
pr.disable()
s = io.StringIO()
sortby = SortKey.CUMULATIVE
ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
ps.print_stats()
print(s.getvalue())

         3200285 function calls (3200175 primitive calls) in 1.218 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        2    0.000    0.000    1.218    0.609 /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3230(run_code)
        2    0.000    0.000    1.218    0.609 {built-in method builtins.exec}
        1    0.000    0.000    1.218    1.218 <ipython-input-13-87271d1c4549>:3(<module>)
        1    0.000    0.000    1.218    1.218 /Users/dirk/github/annotation/text-fabric/tf/search/search.py:151(fetch)
        1    0.000    0.000    1.218    1.218 /Users/dirk/github/annotation/text-fabric/tf/search/searchexe.py:89(fetch)
       11    0.000    0.000    1.218    0.111 /Users/dirk/github/annotation/text-fabric/tf/search/stitch.py:683(deliver)
   121/11    0.981    0.008    1.218    0.111 /Users/dirk/github/annotation/text-fabric/tf/search/stitch.py:690(stitchOn)
  1600024    0.120    0.000    0.120    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/relations.py:73(<lambda>)
  1599974    0.117    0.000    0.117    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/relations.py:83(<lambda>)
        2    0.000    0.000    0.000    0.000 /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/codeop.py:132(__call__)
        2    0.000    0.000    0.000    0.000 {built-in method builtins.compile}
       50    0.000    0.000    0.000    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/stitch.py:693(<genexpr>)
        5    0.000    0.000    0.000    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/relations.py:383(<lambda>)
       53    0.000    0.000    0.000    0.000 {built-in method builtins.len}
        2    0.000    0.000    0.000    0.000 /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/IPython/core/hooks.py:142(__call__)
        5    0.000    0.000    0.000    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/relations.py:355(<lambda>)
        1    0.000    0.000    0.000    0.000 <ipython-input-13-87271d1c4549>:4(<module>)
       10    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
       10    0.000    0.000    0.000    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/relations.py:303(<lambda>)
        2    0.000    0.000    0.000    0.000 /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/IPython/core/interactiveshell.py:1258(user_global_ns)
        2    0.000    0.000    0.000    0.000 /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/IPython/utils/ipstruct.py:125(__getattr__)
        2    0.000    0.000    0.000    0.000 /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/IPython/core/hooks.py:207(pre_run_code_hook)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/stitch.py:684(<listcomp>)

In [14]:

print('\n'.join(str(r) for r in results))

(400001, 1, 4, 2)
(400001, 1, 4, 3)
(400002, 5, 8, 6)
(400002, 5, 8, 7)
(400003, 9, 12, 10)
(400003, 9, 12, 11)
(400004, 13, 16, 14)
(400004, 13, 16, 15)
(400005, 17, 20, 18)
(400005, 17, 20, 19)

In [15]:

S.count(progress=1, limit=50)

  0.00s Counting results per 1 up to 50 ...
   |     0.00s 1
   |     0.00s 2
   |     0.24s 3
   |     0.24s 4
   |     0.46s 5
   |     0.46s 6
   |     0.68s 7
   |     0.68s 8
   |     0.90s 9
   |     0.90s 10
   |     1.12s 11
   |     1.12s 12
   |     1.34s 13
   |     1.34s 14
   |     1.56s 15
   |     1.56s 16
   |     1.78s 17
   |     1.78s 18
   |     2.00s 19
   |     2.00s 20
   |     2.22s 21
   |     2.22s 22
   |     2.44s 23
   |     2.44s 24
   |     2.67s 25
   |     2.67s 26
   |     2.97s 27
   |     2.97s 28
   |     3.24s 29
   |     3.24s 30
   |     3.50s 31
   |     3.50s 32
   |     3.73s 33
   |     3.73s 34
   |     3.95s 35
   |     3.95s 36
   |     4.17s 37
   |     4.17s 38
   |     4.39s 39
   |     4.40s 40
   |     4.62s 41
   |     4.62s 42
   |     4.93s 43
   |     4.93s 44
   |     5.20s 45
   |     5.20s 46
   |     5.47s 47
   |     5.47s 48
   |     5.69s 49
   |     5.69s 50
  5.69s Done: 50 results

Observations:

small_choice_multi performs better.

It makes only 50% of the function calls that small_choice_first makes (3,200,285 against 6,400,243). It cuts out nearly all calls to stitchOn(), a recursive function that generates new candidates (121 calls instead of 1,600,105).

Counting only primitive calls, the gain is about 33% (3,200,175 against 4,800,149).

As for the time spent in stitchOn(), small_choice_first spends about 50% more time in it than small_choice_multi (1.832 s against 1.218 s cumulative).

N.B.

In small_choice_first, some 1,600,000 calls to stitchOn() take 1.5 seconds.

In small_choice_multi, 121 calls to stitchOn() take 1.0 seconds.

That is remarkable. Computing the multi-edge takes a lot of time per call (about 8 ms, against less than 1 µs per call in small_choice_first), but the net result is positive.
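The percentages above can be recomputed from the two cProfile reports of test 1. The sketch below only copies the reported figures; the dictionary layout is an arbitrary choice for illustration.

```python
# Figures copied from the two cProfile reports above (fetching 10 results, test 1).
first = {"calls": 6_400_243, "primitive": 4_800_149,
         "stitch_calls": 1_600_105, "stitch_tottime": 1.503, "stitch_cumtime": 1.832}
multi = {"calls": 3_200_285, "primitive": 3_200_175,
         "stitch_calls": 121, "stitch_tottime": 0.981, "stitch_cumtime": 1.218}

# Share of all function calls that small_choice_multi still makes.
call_ratio = multi["calls"] / first["calls"]
# Fraction of primitive (non-recursive) calls saved.
primitive_gain = 1 - multi["primitive"] / first["primitive"]
# Extra cumulative time small_choice_first spends in stitchOn().
extra_time = first["stitch_cumtime"] / multi["stitch_cumtime"] - 1
# Own time per stitchOn() call, in seconds.
per_call_first = first["stitch_tottime"] / first["stitch_calls"]
per_call_multi = multi["stitch_tottime"] / multi["stitch_calls"]

print(f"calls: {call_ratio:.0%} of small_choice_first")
print(f"primitive calls saved: {primitive_gain:.0%}")
print(f"extra time in stitchOn for small_choice_first: {extra_time:.0%}")
print(f"per stitchOn call: {per_call_first * 1e6:.2f} microseconds "
      f"vs {per_call_multi * 1e3:.2f} milliseconds")
```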

There is a price: the most time consuming bit is this line:

Main test 2

We leave out part of the query: the constraint that a comes before c.

In [46]:

query = '''
chunk
  =: a:slot
  c:slot
  :=

s:slot

a < s
s < c
'''

Omitting the a < c condition should not change the outcome: all chunks are longer than one slot, so the first slot of a chunk always comes before the last one (and is never identical to it).
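A small plain-Python check of this claim on the synthetic data. It assumes chunks of 4 consecutive slots numbered from 1, as in the test set above; the function name is hypothetical. It enumerates (a, c, s) per chunk once with and once without the a < c condition and compares the results.

```python
# Sketch, assuming the synthetic test set: chunks of CHUNK_SIZE consecutive
# slots numbered from 1. Enumerate (a, c, s) for every chunk, with and
# without the a < c condition, and compare.
CHUNK_SIZE = 4


def matches(n_chunks, require_a_before_c):
    out = []
    for i in range(n_chunks):
        a = i * CHUNK_SIZE + 1  # first slot of the chunk (=:)
        c = a + CHUNK_SIZE - 1  # last slot of the chunk (:=)
        if require_a_before_c and not a < c:
            continue
        out.extend((a, c, s) for s in range(a + 1, c))  # a < s < c
    return out


print(matches(1000, True) == matches(1000, False))
```

Note that a < s together with s < c already implies a < c by transitivity, so the two result sets agree whatever the chunk length.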

Strategy: small choice first

In [47]:

S.study(query, strategy='small_choice_first')

  0.00s Checking search template ...
  0.00s Setting up search space for 4 objects ...
  0.18s Constraining search space with 6 relations ...
  0.52s 	2 edges thinned
  0.52s Setting up retrieval plan with strategy small_choice_first ...
  0.54s Ready to deliver results from 700000 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results

In [48]:

S.showPlan(details=True)

Search with 4 objects and 6 relations
Results are instantiations of the following objects:
node  0-chunk                                         100000   choices
node  1-slot                                          100000   choices
node  2-slot                                          100000   choices
node  3-slot                                          400000   choices
Performance parameters:
	yarnRatio            =    1.25
	tryLimitFrom         =      40
	tryLimitTo           =      40
Instantiations are computed along the following relations:
node                                  0-chunk         100000   choices
edge        0-chunk            [[     2-slot               1.0 choices
edge        2-slot             :=     0-chunk              0   choices
edge        0-chunk            [[     1-slot               1.0 choices
edge        1-slot             =:     0-chunk              0   choices
edge        2-slot             >      3-slot          200000.0 choices
edge        3-slot             >      1-slot               0   choices
  1.36s The results are connected to the original search template as follows:
 0     
 1 R0  chunk
 2 R1    =: a:slot
 3 R2    c:slot
 4       :=
 5     
 6 R3  s:slot
 7     
 8     a < s
 9     s < c
10

In [49]:

pr = cProfile.Profile()
pr.enable()
results = S.fetch(limit=10)
pr.disable()
s = io.StringIO()
sortby = SortKey.CUMULATIVE
ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
ps.print_stats()
print(s.getvalue())

         1600461 function calls (1600301 primitive calls) in 0.334 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        2    0.000    0.000    0.334    0.167 /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3230(run_code)
        2    0.000    0.000    0.334    0.167 {built-in method builtins.exec}
        1    0.000    0.000    0.334    0.334 <ipython-input-49-87271d1c4549>:3(<module>)
        1    0.000    0.000    0.334    0.334 /Users/dirk/github/annotation/text-fabric/tf/search/search.py:151(fetch)
        1    0.000    0.000    0.334    0.334 /Users/dirk/github/annotation/text-fabric/tf/search/searchexe.py:89(fetch)
       11    0.000    0.000    0.334    0.030 /Users/dirk/github/annotation/text-fabric/tf/search/stitch.py:683(deliver)
   171/11    0.228    0.001    0.334    0.030 /Users/dirk/github/annotation/text-fabric/tf/search/stitch.py:690(stitchOn)
  1600074    0.106    0.000    0.106    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/relations.py:82(<lambda>)
        2    0.000    0.000    0.000    0.000 /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/codeop.py:132(__call__)
        2    0.000    0.000    0.000    0.000 {built-in method builtins.compile}
      103    0.000    0.000    0.000    0.000 {built-in method builtins.len}
       50    0.000    0.000    0.000    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/stitch.py:693(<genexpr>)
       10    0.000    0.000    0.000    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/relations.py:301(<lambda>)
        5    0.000    0.000    0.000    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/relations.py:353(<lambda>)
        2    0.000    0.000    0.000    0.000 /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/IPython/core/hooks.py:142(__call__)
        1    0.000    0.000    0.000    0.000 <ipython-input-49-87271d1c4549>:4(<module>)
       10    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        2    0.000    0.000    0.000    0.000 /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/IPython/utils/ipstruct.py:125(__getattr__)
        2    0.000    0.000    0.000    0.000 /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/IPython/core/interactiveshell.py:1258(user_global_ns)
        5    0.000    0.000    0.000    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/relations.py:378(<lambda>)
        2    0.000    0.000    0.000    0.000 /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/IPython/core/hooks.py:207(pre_run_code_hook)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/stitch.py:684(<listcomp>)

In [50]:

print('\n'.join(str(r) for r in results))

(400001, 1, 4, 2)
(400001, 1, 4, 3)
(400002, 5, 8, 6)
(400002, 5, 8, 7)
(400003, 9, 12, 10)
(400003, 9, 12, 11)
(400004, 13, 16, 14)
(400004, 13, 16, 15)
(400005, 17, 20, 18)
(400005, 17, 20, 19)

In [51]:

S.count(progress=1, limit=50)

  0.00s Counting results per 1 up to 50 ...
   |     0.00s 1
   |     0.00s 2
   |     0.06s 3
   |     0.06s 4
   |     0.11s 5
   |     0.12s 6
   |     0.16s 7
   |     0.16s 8
   |     0.21s 9
   |     0.21s 10
   |     0.25s 11
   |     0.25s 12
   |     0.30s 13
   |     0.30s 14
   |     0.34s 15
   |     0.34s 16
   |     0.39s 17
   |     0.39s 18
   |     0.43s 19
   |     0.43s 20
   |     0.47s 21
   |     0.47s 22
   |     0.52s 23
   |     0.52s 24
   |     0.56s 25
   |     0.56s 26
   |     0.61s 27
   |     0.61s 28
   |     0.65s 29
   |     0.65s 30
   |     0.69s 31
   |     0.69s 32
   |     0.74s 33
   |     0.74s 34
   |     0.78s 35
   |     0.78s 36
   |     0.83s 37
   |     0.83s 38
   |     0.87s 39
   |     0.87s 40
   |     0.92s 41
   |     0.92s 42
   |     0.97s 43
   |     0.98s 44
   |     1.03s 45
   |     1.04s 46
   |     1.10s 47
   |     1.10s 48
   |     1.15s 49
   |     1.15s 50
  1.16s Done: 50 results

Strategy: small choice multi

In [52]:

S.study(query, strategy='small_choice_multi')

  0.00s Checking search template ...
  0.00s Setting up search space for 4 objects ...
  0.19s Constraining search space with 6 relations ...
  0.53s 	2 edges thinned
  0.53s Setting up retrieval plan with strategy small_choice_multi ...
  0.54s Ready to deliver results from 700000 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results

In [53]:

S.showPlan(details=True)

Search with 4 objects and 5 relations
Results are instantiations of the following objects:
node  0-chunk                                         100000   choices
node  1-slot                                          100000   choices
node  2-slot                                          100000   choices
node  3-slot                                          400000   choices
Performance parameters:
	yarnRatio            =    1.25
	tryLimitFrom         =      40
	tryLimitTo           =      40
Instantiations are computed along the following relations:
node                                  0-chunk         100000   choices
edge        0-chunk            [[     2-slot               1.0 choices
edge        2-slot             :=     0-chunk              0   choices
edge        0-chunk            [[     1-slot               1.0 choices
edge        1-slot             =:     0-chunk              0   choices
edge      2,1-slot            >,<     3-slot           20000.0 choices
  1.39s The results are connected to the original search template as follows:
 0     
 1 R0  chunk
 2 R1    =: a:slot
 3 R2    c:slot
 4       :=
 5     
 6 R3  s:slot
 7     
 8     a < s
 9     s < c
10

Observe how the two constraints 2-slot > 3-slot and 3-slot > 1-slot have been combined into a single edge, 2,1-slot >,< 3-slot. Both are tested in one pass.

In [54]:

pr = cProfile.Profile()
pr.enable()
results = S.fetch(limit=10)
pr.disable()
s = io.StringIO()
sortby = SortKey.CUMULATIVE
ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
ps.print_stats()
print(s.getvalue())

         1600341 function calls (1600246 primitive calls) in 0.744 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        2    0.000    0.000    0.744    0.372 /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3230(run_code)
        2    0.000    0.000    0.744    0.372 {built-in method builtins.exec}
        1    0.000    0.000    0.744    0.744 <ipython-input-54-87271d1c4549>:3(<module>)
        1    0.000    0.000    0.744    0.744 /Users/dirk/github/annotation/text-fabric/tf/search/search.py:151(fetch)
        1    0.000    0.000    0.744    0.744 /Users/dirk/github/annotation/text-fabric/tf/search/searchexe.py:89(fetch)
       11    0.000    0.000    0.744    0.068 /Users/dirk/github/annotation/text-fabric/tf/search/stitch.py:683(deliver)
   106/11    0.634    0.006    0.744    0.068 /Users/dirk/github/annotation/text-fabric/tf/search/stitch.py:690(stitchOn)
  1600019    0.110    0.000    0.110    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/relations.py:82(<lambda>)
        2    0.000    0.000    0.000    0.000 /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/codeop.py:132(__call__)
        2    0.000    0.000    0.000    0.000 {built-in method builtins.compile}
       50    0.000    0.000    0.000    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/stitch.py:693(<genexpr>)
       55    0.000    0.000    0.000    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/relations.py:72(<lambda>)
       48    0.000    0.000    0.000    0.000 {built-in method builtins.len}
       10    0.000    0.000    0.000    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/relations.py:301(<lambda>)
        5    0.000    0.000    0.000    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/relations.py:353(<lambda>)
        2    0.000    0.000    0.000    0.000 /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/IPython/core/hooks.py:142(__call__)
        1    0.000    0.000    0.000    0.000 <ipython-input-54-87271d1c4549>:4(<module>)
        2    0.000    0.000    0.000    0.000 /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/IPython/utils/ipstruct.py:125(__getattr__)
       10    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        2    0.000    0.000    0.000    0.000 /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/IPython/core/interactiveshell.py:1258(user_global_ns)
        5    0.000    0.000    0.000    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/relations.py:378(<lambda>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        2    0.000    0.000    0.000    0.000 /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/IPython/core/hooks.py:207(pre_run_code_hook)
        1    0.000    0.000    0.000    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/stitch.py:684(<listcomp>)

In [55]:

print('\n'.join(str(r) for r in results))

(400001, 1, 4, 2)
(400001, 1, 4, 3)
(400002, 5, 8, 6)
(400002, 5, 8, 7)
(400003, 9, 12, 10)
(400003, 9, 12, 11)
(400004, 13, 16, 14)
(400004, 13, 16, 15)
(400005, 17, 20, 18)
(400005, 17, 20, 19)

In [56]:

S.count(progress=1, limit=50)

  0.00s Counting results per 1 up to 50 ...
   |     0.00s 1
   |     0.00s 2
   |     0.17s 3
   |     0.17s 4
   |     0.32s 5
   |     0.32s 6
   |     0.47s 7
   |     0.47s 8
   |     0.62s 9
   |     0.62s 10
   |     0.76s 11
   |     0.76s 12
   |     0.91s 13
   |     0.91s 14
   |     1.05s 15
   |     1.05s 16
   |     1.20s 17
   |     1.20s 18
   |     1.35s 19
   |     1.35s 20
   |     1.49s 21
   |     1.49s 22
   |     1.64s 23
   |     1.64s 24
   |     1.83s 25
   |     1.83s 26
   |     2.02s 27
   |     2.02s 28
   |     2.19s 29
   |     2.19s 30
   |     2.36s 31
   |     2.36s 32
   |     2.53s 33
   |     2.53s 34
   |     2.69s 35
   |     2.69s 36
   |     2.84s 37
   |     2.84s 38
   |     2.98s 39
   |     2.98s 40
   |     3.13s 41
   |     3.13s 42
   |     3.28s 43
   |     3.28s 44
   |     3.43s 45
   |     3.43s 46
   |     3.58s 47
   |     3.58s 48
   |     3.78s 49
   |     3.78s 50
  3.78s Done: 50 results

Main test 3

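We add something to the query: two more slots, b and d, inside the chunk, with a < b < d. Now s must lie between b and d instead of between a and c.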

In [57]:

query = '''
chunk
  =: a:slot
  < b:slot
  < d:slot
  c:slot
  :=

s:slot

b < s
s < d
'''

It becomes more difficult to constrain s within the chunk.

This is a heavy query.
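A brute-force reading of this template on a single chunk can show how tightly s is pinned. This is a hypothetical sketch, assuming chunks of 4 consecutive slots as in the test set, and it is not how Text-Fabric evaluates the query. Tuples are (a, b, d, c, s), matching the order R1 to R5 in the plan.

```python
from itertools import combinations

# Hypothetical brute-force reading of the third template on one chunk of the
# synthetic test set: CHUNK_SIZE consecutive slots starting at `first`.
CHUNK_SIZE = 4


def matches_test3(first):
    chunk_slots = range(first, first + CHUNK_SIZE)
    a, c = chunk_slots[0], chunk_slots[-1]  # =: a and := c
    out = []
    # b and d are slots embedded in the chunk; combinations() yields b < d.
    for b, d in combinations(chunk_slots, 2):
        if not a < b:
            continue
        # s is an independent slot with b < s < d.
        for s in range(b + 1, d):
            out.append((a, b, d, c, s))
    return out


print(matches_test3(1))
```

Within a chunk of four slots only one choice survives: b is the second slot, d the last, and s the third. The search engine, however, constrains s only through b < s and s < d, not through the chunk directly.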

Strategy: small choice first

In [58]:

S.study(query, strategy='small_choice_first')

  0.00s Checking search template ...
  0.00s Setting up search space for 6 objects ...
  0.29s Constraining search space with 10 relations ...
  0.96s 	2 edges thinned
  0.96s Setting up retrieval plan with strategy small_choice_first ...
  0.99s Ready to deliver results from 1500000 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results

In [59]:

S.showPlan(details=True)

Search with 6 objects and 10 relations
Results are instantiations of the following objects:
node  0-chunk                                         100000   choices
node  1-slot                                          100000   choices
node  2-slot                                          400000   choices
node  3-slot                                          400000   choices
node  4-slot                                          100000   choices
node  5-slot                                          400000   choices
Performance parameters:
	yarnRatio            =    1.25
	tryLimitFrom         =      40
	tryLimitTo           =      40
Instantiations are computed along the following relations:
node                                  0-chunk         100000   choices
edge        0-chunk            :=     4-slot               1.0 choices (thinned)
edge        4-slot             ]]     0-chunk              0   choices
edge        0-chunk            =:     1-slot               1.0 choices (thinned)
edge        1-slot             ]]     0-chunk              0   choices
edge        0-chunk            [[     3-slot               4.0 choices
edge        0-chunk            [[     2-slot               4.0 choices
edge        2-slot             >      1-slot               0   choices
edge        2-slot             <      3-slot               0   choices
edge        2-slot             <      5-slot          200000.0 choices
edge        5-slot             <      3-slot               0   choices
  2.81s The results are connected to the original search template as follows:
 0     
 1 R0  chunk
 2 R1    =: a:slot
 3 R2    < b:slot
 4 R3    < d:slot
 5 R4    c:slot
 6       :=
 7     
 8 R5  s:slot
 9     
10     b < s
11     s < d
12

In [60]:

pr = cProfile.Profile()
pr.enable()
results = S.fetch(limit=10)
pr.disable()
s = io.StringIO()
sortby = SortKey.CUMULATIVE
ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
ps.print_stats()
print(s.getvalue())

         44799866 function calls (33599883 primitive calls) in 12.853 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        2    0.000    0.000   12.853    6.426 /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3230(run_code)
        2    0.000    0.000   12.853    6.426 {built-in method builtins.exec}
        1    0.000    0.000   12.853   12.853 <ipython-input-60-87271d1c4549>:3(<module>)
        1    0.000    0.000   12.853   12.853 /Users/dirk/github/annotation/text-fabric/tf/search/search.py:151(fetch)
        1    0.000    0.000   12.853   12.853 /Users/dirk/github/annotation/text-fabric/tf/search/searchexe.py:89(fetch)
       11    0.000    0.000   12.853    1.168 /Users/dirk/github/annotation/text-fabric/tf/search/stitch.py:683(deliver)
11199994/11   10.608    0.000   12.853    1.168 /Users/dirk/github/annotation/text-fabric/tf/search/stitch.py:690(stitchOn)
 22399625    1.628    0.000    1.628    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/relations.py:72(<lambda>)
 11199886    0.618    0.000    0.618    0.000 {built-in method builtins.len}
        2    0.000    0.000    0.000    0.000 /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/codeop.py:132(__call__)
        2    0.000    0.000    0.000    0.000 {built-in method builtins.compile}
      158    0.000    0.000    0.000    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/relations.py:82(<lambda>)
       50    0.000    0.000    0.000    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/relations.py:301(<lambda>)
       10    0.000    0.000    0.000    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/relations.py:380(<lambda>)
       70    0.000    0.000    0.000    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/stitch.py:693(<genexpr>)
       20    0.000    0.000    0.000    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/relations.py:287(<lambda>)
       10    0.000    0.000    0.000    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/relations.py:355(<lambda>)
        2    0.000    0.000    0.000    0.000 /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/IPython/core/hooks.py:142(__call__)
        1    0.000    0.000    0.000    0.000 <ipython-input-60-87271d1c4549>:4(<module>)
       10    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        2    0.000    0.000    0.000    0.000 /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/IPython/utils/ipstruct.py:125(__getattr__)
        2    0.000    0.000    0.000    0.000 /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/IPython/core/hooks.py:207(pre_run_code_hook)
        1    0.000    0.000    0.000    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/stitch.py:684(<listcomp>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        2    0.000    0.000    0.000    0.000 /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/IPython/core/interactiveshell.py:1258(user_global_ns)
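The profiling cell is repeated unchanged for every strategy. A small helper can wrap it; profile_call is a hypothetical name, and the sketch only uses what the cell already uses: cProfile.Profile with enable/disable, and pstats.Stats sorted by SortKey.CUMULATIVE. The try/finally makes sure profiling stops even if the profiled call raises.

```python
import cProfile
import io
import pstats
from pstats import SortKey


def profile_call(fn, *args, top=None, **kwargs):
    """Run fn(*args, **kwargs) under cProfile.

    Returns (result, report): report is the pstats listing sorted by
    cumulative time; `top` optionally limits the number of rows printed.
    """
    pr = cProfile.Profile()
    pr.enable()
    try:
        result = fn(*args, **kwargs)
    finally:
        pr.disable()  # stop profiling even if fn raises
    stream = io.StringIO()
    stats = pstats.Stats(pr, stream=stream).sort_stats(SortKey.CUMULATIVE)
    if top is None:
        stats.print_stats()
    else:
        stats.print_stats(top)  # an int restricts the listing to that many rows
    return result, stream.getvalue()
```

In the notebook this would read `results, report = profile_call(S.fetch, limit=10)` followed by `print(report)`; the `limit` keyword is passed through to S.fetch, while `top` is consumed by the helper.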

In [61]:

print('\n'.join(str(r) for r in results))

(400001, 1, 2, 4, 4, 3)
(400002, 5, 6, 8, 8, 7)
(400003, 9, 10, 12, 12, 11)
(400004, 13, 14, 16, 16, 15)
(400005, 17, 18, 20, 20, 19)
(400006, 21, 22, 24, 24, 23)
(400007, 25, 26, 28, 28, 27)
(400008, 29, 30, 32, 32, 31)
(400009, 33, 34, 36, 36, 35)
(400010, 37, 38, 40, 40, 39)

Strategy: small choice multi

In [62]:

S.study(query, strategy='small_choice_multi')

  0.00s Checking search template ...
  0.00s Setting up search space for 6 objects ...
  0.29s Constraining search space with 10 relations ...
  0.96s 	2 edges thinned
  0.97s Setting up retrieval plan with strategy small_choice_multi ...
  1.00s Ready to deliver results from 1500000 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results

In [63]:

S.showPlan(details=True)

Search with 6 objects and 9 relations
Results are instantiations of the following objects:
node  0-chunk                                         100000   choices
node  1-slot                                          100000   choices
node  2-slot                                          400000   choices
node  3-slot                                          400000   choices
node  4-slot                                          100000   choices
node  5-slot                                          400000   choices
Performance parameters:
	yarnRatio            =    1.25
	tryLimitFrom         =      40
	tryLimitTo           =      40
Instantiations are computed along the following relations:
node                                  0-chunk         100000   choices
edge        0-chunk            :=     4-slot               1.0 choices (thinned)
edge        4-slot             ]]     0-chunk              0   choices
edge        0-chunk            =:     1-slot               1.0 choices (thinned)
edge        1-slot             ]]     0-chunk              0   choices
edge        0-chunk            [[     3-slot               4.0 choices
edge        0-chunk            [[     2-slot               4.0 choices
edge        2-slot             >      1-slot               0   choices
edge        2-slot             <      3-slot               0   choices
edge      2,3-slot            <,>     5-slot           20000.0 choices
  2.05s The results are connected to the original search template as follows:
 0     
 1 R0  chunk
 2 R1    =: a:slot
 3 R2    < b:slot
 4 R3    < d:slot
 5 R4    c:slot
 6       :=
 7     
 8 R5  s:slot
 9     
10     b < s
11     s < d
12

In [64]:

pr = cProfile.Profile()
pr.enable()
results = S.fetch(limit=10)
pr.disable()
s = io.StringIO()
sortby = SortKey.CUMULATIVE
ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
ps.print_stats()
print(s.getvalue())

         22400920 function calls (22400415 primitive calls) in 8.301 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        2    0.000    0.000    8.301    4.150 /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/IPython/core/interactiveshell.py:3230(run_code)
        2    0.000    0.000    8.301    4.150 {built-in method builtins.exec}
        1    0.000    0.000    8.301    8.301 <ipython-input-64-87271d1c4549>:3(<module>)
        1    0.000    0.000    8.301    8.301 /Users/dirk/github/annotation/text-fabric/tf/search/search.py:151(fetch)
        1    0.000    0.000    8.301    8.301 /Users/dirk/github/annotation/text-fabric/tf/search/searchexe.py:89(fetch)
       11    0.000    0.000    8.301    0.755 /Users/dirk/github/annotation/text-fabric/tf/search/stitch.py:683(deliver)
   516/11    6.658    0.013    8.301    0.755 /Users/dirk/github/annotation/text-fabric/tf/search/stitch.py:690(stitchOn)
 11200157    0.833    0.000    0.833    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/relations.py:72(<lambda>)
 11199626    0.809    0.000    0.809    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/relations.py:82(<lambda>)
        2    0.000    0.000    0.000    0.000 /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/codeop.py:132(__call__)
        2    0.000    0.000    0.000    0.000 {built-in method builtins.compile}
      418    0.000    0.000    0.000    0.000 {built-in method builtins.len}
       50    0.000    0.000    0.000    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/relations.py:301(<lambda>)
       70    0.000    0.000    0.000    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/stitch.py:693(<genexpr>)
       10    0.000    0.000    0.000    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/relations.py:380(<lambda>)
       20    0.000    0.000    0.000    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/relations.py:287(<lambda>)
       10    0.000    0.000    0.000    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/relations.py:355(<lambda>)
        2    0.000    0.000    0.000    0.000 /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/IPython/core/hooks.py:142(__call__)
        1    0.000    0.000    0.000    0.000 <ipython-input-64-87271d1c4549>:4(<module>)
        2    0.000    0.000    0.000    0.000 /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/IPython/utils/ipstruct.py:125(__getattr__)
        2    0.000    0.000    0.000    0.000 /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/IPython/core/interactiveshell.py:1258(user_global_ns)
       10    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        2    0.000    0.000    0.000    0.000 /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/IPython/core/hooks.py:207(pre_run_code_hook)
        1    0.000    0.000    0.000    0.000 /Users/dirk/github/annotation/text-fabric/tf/search/stitch.py:684(<listcomp>)

In [65]:

print('\n'.join(str(r) for r in results))

(400001, 1, 2, 4, 4, 3)
(400002, 5, 6, 8, 8, 7)
(400003, 9, 10, 12, 12, 11)
(400004, 13, 14, 16, 16, 15)
(400005, 17, 18, 20, 20, 19)
(400006, 21, 22, 24, 24, 23)
(400007, 25, 26, 28, 28, 27)
(400008, 29, 30, 32, 32, 31)
(400009, 33, 34, 36, 36, 35)
(400010, 37, 38, 40, 40, 39)

Observation

Here is a query where the amount of time spent in stitchOn() itself overtakes the time spent in the all() call. In this case small_choice_multi wins: fetching 10 results takes 8.30s instead of 12.85s, and stitchOn() is called 516 times instead of 11199994 times.

So we really have a mixed bag with these strategies.

For now, I turn on small_choice_multi, because it makes really long queries a bit more bearable and does not make much of a difference for shorter ones.
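A toy model may help to read the two profiles of main test 3. It is not the code in stitch.py; the names are hypothetical and the slot set is scaled down to 400 slots. In the small_choice_first plan, s (node 5) is bound through the single edge 2 < 5, and the edge 5 < 3 is checked as a separate, later step; the small_choice_multi plan merges both into the edge 2,3 <,> 5. If every candidate that passes the binding edge costs one step deeper into the search, the first approach descends for almost every slot after b, while the second descends only for the slots that satisfy both edges.

```python
from typing import Callable, List, Sequence, Tuple

Check = Callable[[int], bool]


def bind_one_edge(candidates: Sequence[int], via: Check,
                  later: List[Check]) -> Tuple[List[int], int]:
    """Bind s through one edge, then test the other edges one level deeper.

    Returns the matches and the number of times we descend a level.
    """
    matches: List[int] = []
    descents = 0
    for s in candidates:
        if via(s):
            descents += 1  # every candidate passing `via` costs a descent
            if all(check(s) for check in later):
                matches.append(s)
    return matches, descents


def bind_multi_edge(candidates: Sequence[int],
                    checks: List[Check]) -> Tuple[List[int], int]:
    """Test all edges into s together; descend only for survivors."""
    matches: List[int] = []
    descents = 0
    for s in candidates:
        if all(check(s) for check in checks):
            descents += 1
            matches.append(s)
    return matches, descents


b, d = 2, 4            # bindings of b and d in the first chunk
slots = range(1, 401)  # a scaled-down set of 400 slots


def after_b(s: int) -> bool:   # edge b < s
    return b < s


def before_d(s: int) -> bool:  # edge s < d
    return s < d


print(bind_one_edge(slots, after_b, [before_d]))    # ([3], 398)
print(bind_multi_edge(slots, [after_b, before_d]))  # ([3], 1)
```

Both variants find the same match; only the number of descents differs. The real profiles point the same way: 11199994 calls to stitchOn() under small_choice_first against 516 under small_choice_multi.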

Main test 4

A quite different query: a feature comparison that pairs each chunk with the slots that have the same num value.

In [66]:

query = '''
chunk
.num. slot
'''
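Reading .num. as "same value for feature num", this template pairs chunk k (num = k) with the slot whose num is k. Assuming slot nodes 1 to 400000 followed by the chunk nodes, that is slot node k + 1. One common way to evaluate such a feature-equality relation is a hash join: index one side by value and look up the other side. The sketch rebuilds num the way the director in this notebook sets it and joins on it; the helper names are hypothetical, and this illustrates the relation, not how Text-Fabric evaluates it.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

CHUNK_SIZE = 4  # chunkSize in the test set


def num_feature(n_slots: int) -> Tuple[Dict[int, int], Dict[int, int]]:
    """Rebuild num for chunks and slots as the director sets it.

    Assumption: slot nodes are 1..n_slots and chunk k is node n_slots + 1 + k.
    """
    slot_num = {n + 1: n for n in range(n_slots)}
    chunk_num = {n_slots + 1 + k: k for k in range(n_slots // CHUNK_SIZE)}
    return chunk_num, slot_num


def same_value(left: Dict[int, int],
               right: Dict[int, int]) -> Iterator[Tuple[int, int]]:
    """Yield (l, r) node pairs with equal feature values, via a value index."""
    index: Dict[int, List[int]] = defaultdict(list)
    for node, value in right.items():
        index[value].append(node)
    for node, value in left.items():
        for other in index.get(value, ()):
            yield (node, other)


chunk_num, slot_num = num_feature(40)
print(list(same_value(chunk_num, slot_num))[:3])  # [(41, 1), (42, 2), (43, 3)]
```

On the full test set, num_feature(400000), the join gives 100000 pairs, one per chunk.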

Strategy: small choice first

In [67]:

S.study(query, strategy='small_choice_first')

  0.00s Checking search template ...
  0.00s Setting up search space for 2 objects ...
  0.08s Constraining search space with 1 relations ...
  0.10s 	0 edges thinned
  0.10s Setting up retrieval plan with strategy small_choice_first ...
  0.13s Ready to deliver results from 500000 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results

In [68]:

S.showPlan(details=True)

Search with 2 objects and 1 relations
Results are instantiations of the following objects:
node  0-chunk                                         100000   choices
node  1-slot                                          400000   choices
Performance parameters:
	yarnRatio            =    1.25
	tryLimitFrom         =      40
	tryLimitTo           =      40
Instantiations are computed along the following relations:
node                                  0-chunk         100000   choices
edge        0-chunk          .num.    1-slot               0.0 choices
  3.24s The results are connected to the original search template as follows:
 0     
 1 R0  chunk
 2 R1  .num. slot
 3

In [69]:

S.count(progress=1, limit=10)

  0.00s Counting results per 1 up to 10 ...
   |     0.00s 1
   |     0.23s 2
   |     0.45s 3
   |     0.66s 4
   |     0.87s 5
   |     1.09s 6
   |     1.30s 7
   |     1.51s 8
   |     1.72s 9
   |     1.93s 10
  1.94s Done: 10 results

Strategy: small choice multi

In [70]:

S.study(query, strategy='small_choice_multi')

  0.00s Checking search template ...
  0.00s Setting up search space for 2 objects ...
  0.08s Constraining search space with 1 relations ...
  0.10s 	0 edges thinned
  0.10s Setting up retrieval plan with strategy small_choice_multi ...
  0.12s Ready to deliver results from 500000 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results

In [71]:

S.showPlan(details=True)

Search with 2 objects and 1 relations
Results are instantiations of the following objects:
node  0-chunk                                         100000   choices
node  1-slot                                          400000   choices
Performance parameters:
	yarnRatio            =    1.25
	tryLimitFrom         =      40
	tryLimitTo           =      40
Instantiations are computed along the following relations:
node                                  0-chunk         100000   choices
edge        0-chunk          .num.    1-slot               0.0 choices
  1.94s The results are connected to the original search template as follows:
 0     
 1 R0  chunk
 2 R1  .num. slot
 3

In [72]:

S.count(progress=1, limit=10)

  0.00s Counting results per 1 up to 10 ...
   |     0.00s 1
   |     0.23s 2
   |     0.44s 3
   |     0.66s 4
   |     0.87s 5
   |     1.08s 6
   |     1.29s 7
   |     1.50s 8
   |     1.71s 9
   |     1.92s 10
  1.92s Done: 10 results

Strategy: by yarn size

In [73]:

S.study(query, strategy='by_yarn_size')

  0.00s Checking search template ...
  0.03s Setting up search space for 2 objects ...
  0.11s Constraining search space with 1 relations ...
  0.13s 	0 edges thinned
  0.13s Setting up retrieval plan with strategy by_yarn_size ...
  0.15s Ready to deliver results from 500000 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results

In [74]:

S.showPlan(details=True)

Search with 2 objects and 1 relations
Results are instantiations of the following objects:
node  0-chunk                                         100000   choices
node  1-slot                                          400000   choices
Performance parameters:
	yarnRatio            =    1.25
	tryLimitFrom         =      40
	tryLimitTo           =      40
Instantiations are computed along the following relations:
node                                  0-chunk         100000   choices
edge        0-chunk          .num.    1-slot               0.0 choices
  1.17s The results are connected to the original search template as follows:
 0     
 1 R0  chunk
 2 R1  .num. slot
 3

In [75]:

S.count(progress=1, limit=10)

  0.00s Counting results per 1 up to 10 ...
   |     0.00s 1
   |     0.24s 2
   |     0.45s 3
   |     0.66s 4
   |     0.88s 5
   |     1.09s 6
   |     1.31s 7
   |     1.53s 8
   |     1.74s 9
   |     1.96s 10
  1.96s Done: 10 results

Main test 5

Yet another feature comparison query.

In [76]:

query = '''
a:chunk
  n:slot
< b:chunk
  m:slot

n .cat. m
'''
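Assuming that a < b orders the chunks, that n and m are slots embedded in a and b, and that .cat. requires equal cat values, the size of this result set can be checked on a scaled-down copy of the test set. Both counters are hypothetical helpers: a brute-force loop over the template, and one that combines per-chunk category histograms in a single pass over the chunks. Slot categories follow the director: slot node s gets cats[(s - 1) % 3], assuming slot nodes are numbered from 1.

```python
from collections import Counter
from itertools import combinations

CATS = ['m', 'f', 'n']  # cats in the test set
CHUNK_SIZE = 4          # chunkSize in the test set


def slot_cat(slot: int) -> str:
    """cat of a slot node: the director gives position n the value cats[n % 3]."""
    return CATS[(slot - 1) % 3]


def chunk_slots(k: int) -> range:
    start = k * CHUNK_SIZE + 1
    return range(start, start + CHUNK_SIZE)


def count_brute(n_chunks: int) -> int:
    """Count (a, n, b, m): chunk a < chunk b, n in a, m in b, same cat."""
    total = 0
    for ka, kb in combinations(range(n_chunks), 2):  # pairs with a before b
        for n in chunk_slots(ka):
            for m in chunk_slots(kb):
                if slot_cat(n) == slot_cat(m):
                    total += 1
    return total


def count_grouped(n_chunks: int) -> int:
    """Same count in one pass, using per-chunk category histograms."""
    seen = Counter()  # cat -> number of slots in all earlier chunks
    total = 0
    for k in range(n_chunks):
        hist = Counter(slot_cat(s) for s in chunk_slots(k))
        # every earlier slot with the same cat pairs with every slot here
        total += sum(seen[cat] * count for cat, count in hist.items())
        seen.update(hist)
    return total


print(count_brute(2), count_grouped(2))  # 5 5
```

The brute-force count grows quadratically with the number of chunks, while count_grouped(100000) covers the full test set in one pass and shows how far its result set exceeds the limit=1000000 used in the counts here.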

Strategy: small choice first

In [77]:

S.study(query, strategy='small_choice_first')

  0.00s Checking search template ...
  0.00s Setting up search space for 4 objects ...
  0.14s Constraining search space with 4 relations ...
  0.50s 	0 edges thinned
  0.50s Setting up retrieval plan with strategy small_choice_first ...
  0.54s Ready to deliver results from 1000000 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results

In [78]:

S.showPlan(details=True)

Search with 4 objects and 4 relations
Results are instantiations of the following objects:
node  0-chunk                                         100000   choices
node  1-slot                                          400000   choices
node  2-chunk                                         100000   choices
node  3-slot                                          400000   choices
Performance parameters:
	yarnRatio            =    1.25
	tryLimitFrom         =      40
	tryLimitTo           =      40
Instantiations are computed along the following relations:
node                                  0-chunk         100000   choices
edge        0-chunk            [[     1-slot               4.0 choices
edge        0-chunk            <      2-chunk          50000.0 choices
edge        2-chunk            [[     3-slot               4.0 choices
edge        3-slot           .cat.    1-slot               0   choices
  1.49s The results are connected to the original search template as follows:
 0     
 1 R0  a:chunk
 2 R1    n:slot
 3 R2  < b:chunk
 4 R3    m:slot
 5     
 6     n .cat. m
 7

In [79]:

S.count(progress=100000, limit=1000000)

  0.00s Counting results per 100000 up to 1000000 ...
   |     0.50s 100000
   |     0.99s 200000
   |     1.47s 300000
   |     1.95s 400000
   |     2.43s 500000
   |     2.92s 600000
   |     3.45s 700000
   |     4.04s 800000
   |     4.53s 900000
   |     5.18s 1000000
  5.18s Done: 1000000 results

Strategy: small choice multi

In [80]:

S.study(query, strategy='small_choice_multi')

  0.00s Checking search template ...
  0.00s Setting up search space for 4 objects ...
  0.14s Constraining search space with 4 relations ...
  0.50s 	0 edges thinned
  0.50s Setting up retrieval plan with strategy small_choice_multi ...
  0.54s Ready to deliver results from 1000000 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results

In [81]:

S.showPlan(details=True)

Search with 4 objects and 4 relations
Results are instantiations of the following objects:
node  0-chunk                                         100000   choices
node  1-slot                                          400000   choices
node  2-chunk                                         100000   choices
node  3-slot                                          400000   choices
Performance parameters:
	yarnRatio            =    1.25
	tryLimitFrom         =      40
	tryLimitTo           =      40
Instantiations are computed along the following relations:
node                                  0-chunk         100000   choices
edge        0-chunk            [[     1-slot               4.0 choices
edge        0-chunk            <      2-chunk          50000.0 choices
edge        2-chunk            [[     3-slot               4.0 choices
edge        1-slot           .cat.    3-slot               0   choices
  1.38s The results are connected to the original search template as follows:
 0     
 1 R0  a:chunk
 2 R1    n:slot
 3 R2  < b:chunk
 4 R3    m:slot
 5     
 6     n .cat. m
 7

In [82]:

S.count(progress=100000, limit=1000000)

  0.00s Counting results per 100000 up to 1000000 ...
   |     0.51s 100000
   |     1.00s 200000
   |     1.49s 300000
   |     1.97s 400000
   |     2.46s 500000
   |     2.94s 600000
   |     3.43s 700000
   |     3.92s 800000
   |     4.44s 900000
   |     4.93s 1000000
  4.93s Done: 1000000 results

Strategy: by yarn size

In [83]:

S.study(query, strategy='by_yarn_size')

  0.00s Checking search template ...
  0.00s Setting up search space for 4 objects ...
  0.13s Constraining search space with 4 relations ...
  0.50s 	0 edges thinned
  0.50s Setting up retrieval plan with strategy by_yarn_size ...
  0.55s Ready to deliver results from 1000000 nodes
Iterate over S.fetch() to get the results
See S.showPlan() to interpret the results

In [84]:

S.showPlan(details=True)

Search with 4 objects and 4 relations
Results are instantiations of the following objects:
node  0-chunk                                         100000   choices
node  1-slot                                          400000   choices
node  2-chunk                                         100000   choices
node  3-slot                                          400000   choices
Performance parameters:
	yarnRatio            =    1.25
	tryLimitFrom         =      40
	tryLimitTo           =      40
Instantiations are computed along the following relations:
node                                  0-chunk         100000   choices
edge        0-chunk            [[     1-slot               4.0 choices
edge        0-chunk            <      2-chunk          50000.0 choices
edge        2-chunk            [[     3-slot               4.0 choices
edge        1-slot           .cat.    3-slot               0   choices
  5.14s The results are connected to the original search template as follows:
 0     
 1 R0  a:chunk
 2 R1    n:slot
 3 R2  < b:chunk
 4 R3    m:slot
 5     
 6     n .cat. m
 7

In [85]:

S.count(progress=100000, limit=1000000)

  0.00s Counting results per 100000 up to 1000000 ...
   |     0.51s 100000
   |     0.99s 200000
   |     1.48s 300000
   |     1.97s 400000
   |     2.46s 500000
   |     2.94s 600000
   |     3.43s 700000
   |     3.91s 800000
   |     4.40s 900000
   |     4.88s 1000000
  4.88s Done: 1000000 results

Leftovers

Test the use of shallow

In [5]:

query = '''
chunk
  slot num=1
  < slot
'''

In [6]:

list(S.search(query))

Out[6]:

[(400001, 2, 3), (400001, 2, 4)]

In [7]:

list(S.search(query, shallow=True))

Out[7]:

[400001]

In [8]:

list(S.search(query, shallow=2))

Out[8]:

[(400001, 2)]
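The three outputs are consistent with shallow deduplicating the result tuples after truncating them to their first member (shallow=True) or to their first k members (shallow=k). The hypothetical helper below reproduces them from the full results. It is an emulation for reading the outputs, not Text-Fabric's code, and it assumes a set is returned, which the list(...) wrapping in the cells hides.

```python
from typing import Iterable, Tuple, Union

Result = Tuple[int, ...]


def shallow_view(results: Iterable[Result], shallow: Union[bool, int] = False):
    """Hypothetical emulation of S.search(..., shallow=...) on full results.

    False/0: the full tuples; True/1: the set of first members;
    k > 1: the set of tuples truncated to their first k members.
    """
    if not shallow:
        return list(results)
    if shallow == 1:  # True == 1 in Python, so this covers shallow=True too
        return {r[0] for r in results}
    return {r[:shallow] for r in results}


full = [(400001, 2, 3), (400001, 2, 4)]
print(shallow_view(full))        # [(400001, 2, 3), (400001, 2, 4)]
print(shallow_view(full, True))  # {400001}
print(shallow_view(full, 2))     # {(400001, 2)}
```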

In [8]:

query = '''
slot
<: slot
< slot
<: slot
< slot
<: slot
'''
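This template chains six slots with <: and <. Assuming that, for slots, n <: m means m directly follows n (m = n + 1) and n < m means n comes earlier, its matches are three adjacent slot pairs in increasing order, and their number has a closed form. If the pairs start at p1, p2, p3, then p2 >= p1 + 2, p3 >= p2 + 2 and p3 + 1 <= N; shifting p2 down by 1 and p3 down by 2 turns these into any 3 distinct positions in 1..N-3. The helpers are hypothetical, the check uses tiny slot ranges, and math.comb needs Python 3.8 or later.

```python
from math import comb
from typing import List, Tuple


def matches_brute(n_slots: int) -> List[Tuple[int, ...]]:
    """Enumerate s1 <: s2 < s3 <: s4 < s5 <: s6 over slots 1..n_slots."""
    found = []
    for s1 in range(1, n_slots + 1):
        s2 = s1 + 1                                # s1 <: s2
        for s3 in range(s2 + 1, n_slots + 1):      # s2 < s3
            s4 = s3 + 1                            # s3 <: s4
            for s5 in range(s4 + 1, n_slots + 1):  # s4 < s5
                s6 = s5 + 1                        # s5 <: s6
                if s6 <= n_slots:
                    found.append((s1, s2, s3, s4, s5, s6))
    return found


def closed_form(n_slots: int) -> int:
    """Choose 3 pair starts, after shifting them into 1..n_slots - 3."""
    return comb(max(n_slots - 3, 0), 3)


print(matches_brute(6))  # [(1, 2, 3, 4, 5, 6)]
print(closed_form(7))    # 4
```

closed_form(400000) then gives the size of the result set of this template on the 400000-slot test set, under the same reading of <: and <.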

In [20]:

from tf.app import use
A = use('bhsa', mod='cmerwich/bh-reference-system/tf')

	connecting to online GitHub repo annotation/app-bhsa ... connected
Using TF-app in /Users/dirk/text-fabric-data/annotation/app-bhsa/code:
	#d3cf8f0c2ab5d690a0fda14ea31c33da5c5c8483 (latest commit)
	connecting to online GitHub repo etcbc/bhsa ... connected
Using data in /Users/dirk/text-fabric-data/etcbc/bhsa/tf/c:
	rv1.6 (latest release)
	connecting to online GitHub repo etcbc/phono ... connected
Using data in /Users/dirk/text-fabric-data/etcbc/phono/tf/c:
	r1.2 (latest release)
	connecting to online GitHub repo etcbc/parallels ... connected
Using data in /Users/dirk/text-fabric-data/etcbc/parallels/tf/c:
	r1.2 (latest release)
	connecting to online GitHub repo cmerwich/bh-reference-system ... connected
	downloading https://github.com/cmerwich/bh-reference-system/releases/download/v1.0/tf-c.zip ... 
	unzipping ... 
	saving data
Using data in /Users/dirk/text-fabric-data/cmerwich/bh-reference-system/tf/c:
	rv1.0=#b9852739f705ab1e1bf53a60bbd68f16b4e20d90 (latest release)
   |     0.00s No structure info in otext, the structure part of the T-API cannot be used

Documentation: BHSA, Character table, Feature docs, bhsa API, Text-Fabric API 7.8.3, Search Reference

Loaded features:

BHSA = Biblia Hebraica Stuttgartensia Amstelodamensis: book book@ll chapter code det freq_lex function g_cons g_cons_utf8 g_lex g_lex_utf8 g_word g_word_utf8 gloss gn label language lex lex_utf8 ls nametype nu number otype pdp prs_gn prs_nu prs_ps ps qere qere_trailer qere_trailer_utf8 qere_utf8 rank_lex rela sp st trailer trailer_utf8 txt typ verse voc_lex voc_lex_utf8 vs vt mother oslots

cmerwich/bh-reference-system/tf: pgn_prde pgn_prps pgn_prs pgn_verb pgn_verb_prs

Parallel Passages: crossref

Phonetic Transcriptions: phono phono_trailer
