In this notebook we analyze the first results from the 50 experiment parameters defined in parameters.py. These parameters have been run and fed to experiment classes (experiments2.py) in data_validation.ipynb. That process is computationally intensive due to its use of search templates, requiring about 30 minutes of processing. A pickled version of the data is exported by that notebook and imported here.
The bulk of this analysis notebook takes place in the similarity analysis below, where we take the top 10 most common verbs and analyze their top 5 matches based on averaged scores across all applicable experiment parameters.
To properly see the Hebrew search results, view this notebook in Jupyter nbviewer
import numpy as np
from numpy.polynomial.polynomial import polyfit
import pandas as pd
import collections, os, sys, random, time, pickle, dill, copy, re
from IPython.display import clear_output
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from scipy.stats import iqr
from tf.fabric import Fabric
from tf.extra.bhsa import Bhsa
from project_code.experiments2 import Experiment
from project_code.semspace import SemSpace
bhsa_data_paths=['~/github/etcbc/bhsa/tf/c',
'~/github/verb_semantics/project_code/lingo/heads/tf/c',
'~/github/verb_semantics/project_code/sdbh']
TF = Fabric(bhsa_data_paths)
tf_api = TF.load('''
function lex lex_utf8 vs language
pdp freq_lex gloss domain ls
mother rela typ sp st code txt instruction
heads prep_obj
prs prs_gn prs_nu prs_ps
sem_domain sem_domain_code
''', silent=True)
tf_api.makeAvailableIn(globals())
B = Bhsa(api=tf_api, name='', version='c')
This is Text-Fabric 5.4.3 Api reference : https://dans-labs.github.io/text-fabric/Api/General/ Tutorial : https://github.com/Dans-labs/text-fabric/blob/master/docs/tutorial.ipynb Example data : https://github.com/Dans-labs/text-fabric-data 118 features found and 0 ignored
Documentation: BHSA Feature docs BHSA API Text-Fabric API 5.4.3 Search Reference
print('Loading cached experiments...')
with open('/Users/cody/Documents/experiments.dill', 'rb') as infile:
experiments = dill.load(infile)
print(f'{len(experiments)} experiments loaded.')
Loading cached experiments... 50 experiments loaded.
print('Loading semantic space data: adjusting counts with pmi, preparing plotter functions, etc...')
spaces = dict((exp_name, SemSpace(exp, info=False)) for exp_name, exp in experiments.items())
print(f'{len(spaces)} experiments successfully loaded into semantic spaces.')
Loading semantic space data: adjusting counts with pmi, preparing plotter functions, etc... 50 experiments successfully loaded into semantic spaces.
print('Experiments (co-occurring features x target words):\n')
for shape, exp in sorted((experiments[exp].data.shape, exp) for exp in experiments):
print(f'{exp}:\t{shape}')
Experiments (coocurring features x target words): vi_subj_animacy: (2, 180) vi_allarg_pa: (2, 694) vi_objc_pa: (2, 714) vi_adj+_pa: (2, 734) vi_cmpl_pa: (2, 734) vi_coad_pa: (2, 734) vf_obj_pa: (3, 694) vd_domain_simple: (3, 704) vf_cmpl_pa: (4, 725) vi_objc_animacy: (5, 173) vf_adju_pa: (7, 733) vf_obj_animacy: (8, 127) vf_coad_pa: (8, 734) vg_tense: (8, 734) vi_cmpl_animacy: (39, 174) vf_argAll_pa: (43, 703) vi_adj+_animacy: (46, 108) vi_coad_animacy: (51, 241) vd_domain_embed: (73, 646) vf_cmpl_animacy: (88, 158) vi_allarg_animacy: (92, 370) vf_adju_animacy: (96, 78) vf_coad_animacy: (200, 192) vi_subj_domain: (247, 231) vd_par_lex: (305, 365) vf_argAll_animacy: (378, 207) vi_objc_domain: (448, 245) vf_obj_domain: (584, 213) vi_cmpl_domain: (1033, 223) vf_cmpl_domain: (1128, 207) vi_adj+_domain: (1219, 217) vf_adju_domain: (1575, 180) vi_coad_domain: (1735, 386) vi_subj_lex: (1959, 290) vi_objc_lex: (2853, 305) vi_allarg_domain: (2902, 527) vf_obj_lex: (3012, 274) vf_coad_domain: (3055, 301) vi_adj+_lex: (3478, 295) vf_cmpl_lex: (4074, 267) vi_cmpl_lex: (4079, 281) vf_adju_lex: (4184, 263) vd_con_window: (4463, 790) vf_argAll_domain: (5122, 339) vd_con_clause: (5477, 900) vi_coad_lex: (6765, 475) vd_con_chain: (8308, 1218) vf_coad_lex: (9337, 418) vi_allarg_lex: (11180, 652) vf_argAll_lex: (14761, 482)
Experiment names are coded by their strategy, the verbal argument they are denoting, and the particular level of information they are recording for the given verbal argument:
`strategy_verbArgument_dataPoint`
There are 4 strategies:
- `vi` - Verb inventory (`vi`) experiments make inventories of verbal arguments based on their simple presence within the clause. The presence of multiple instances is counted multiple times.
- `vf` - Verb frame (`vf`) experiments take into account all elements at once within the clause. If multiple instances are present, they are all recorded into a single string or frame.
- `vd` - Verb domain (`vd`) experiments create an inventory of surrounding words within a given discourse or context.
- `vg` - The single verb grammar (`vg`) experiment simply measures how often a term occurs in a specific verbal tense, based on the assumption that similar verbs might also prefer similar tenses.

The arguments which are tested by the verb inventory and frame experiments are as follows:

- `subj` - subject, which is only counted by verb inventory experiments
- `objc` - object
- `cmpl` - complement (including lexicalized preposition info, e.g. L+COMPLEMENT)
- `adj+`/`adju` - adjunct, which includes tags for location, time, and predicate adjuncts (hence the + sign), including lexicalized preposition info
- `coad` - complements and adjuncts together (including lexicalized preposition info)
- `allArg` - object, complement, and adjunct arguments together (including lexicalized preposition info)

For verb domain we test the following contexts that enclose the verb: `N`, `Q`, `D` for narrative, quotation, and discourse.

The data points which are gathered for these categories are presence/absence (`pa`), animacy (`animacy`), semantic domain (`domain`), and lexeme (`lex`), as seen in the experiment names above.
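As a quick illustration of this naming scheme (not part of the pipeline; the function name is hypothetical), the components of a name can be recovered by splitting on underscores:

```python
def parse_exp_name(exp_name):
    '''Split an experiment name like "vi_objc_pa" into (strategy, argument, dataPoint).
    N.B. vg_tense has no argument component, and the vd names are looser
    (e.g. in vd_domain_simple the "argument" slot holds "domain").'''
    parts = exp_name.split('_')
    strategy = parts[0]
    data_point = parts[-1]
    argument = '_'.join(parts[1:-1]) or None
    return strategy, argument, data_point

parse_exp_name('vi_objc_pa')       # ('vi', 'objc', 'pa')
parse_exp_name('vf_cmpl_animacy')  # ('vf', 'cmpl', 'animacy')
```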
For experiments that include a lot of lexical data, we employ the positive pointwise mutual information score or PPMI to adjust the raw counts (following Levshina 2015, 327). But for experiments with small-scale, categorical variables this does not make sense. The goal with categorical data is not to weigh surprising lexical occurrences, but to measure tendencies. For these it is more appropriate to use a ratio normalization.
Animacy experiments contain mixtures of lexical and categorical data. But lexical data is minimal in experiments that do not strongly weigh prepositional phrases, such as the object experiments. Thus, for these experiments we normalize with a ratio rather than PPMI. By contrast with the vf_obj_animacy experiment, which has 8 unique values, the vf_adju_animacy and vf_coad_animacy experiments have 96 and 200 unique values, respectively. These are more suitable for PPMI adjustment.
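For concreteness, here is a minimal sketch of the two kinds of count adjustment, assuming a features x targets DataFrame of raw co-occurrence counts. The production versions live in SemSpace; the function names below are illustrative only:

```python
import numpy as np
import pandas as pd

def ppmi(counts):
    '''Positive pointwise mutual information over a feature x target count matrix.'''
    total = counts.values.sum()
    p_joint = counts.values / total                 # joint probabilities
    p_feat = p_joint.sum(axis=1, keepdims=True)     # feature (row) marginals
    p_targ = p_joint.sum(axis=0, keepdims=True)     # target (column) marginals
    with np.errstate(divide='ignore', invalid='ignore'):
        pmi = np.log2(p_joint / (p_feat @ p_targ))
    pmi[~np.isfinite(pmi)] = 0                      # zero counts contribute 0
    return pd.DataFrame(np.maximum(pmi, 0), index=counts.index, columns=counts.columns)

def ratio_norm(counts):
    '''Ratio normalization for categorical experiments: each verb's counts
    become proportions of that verb's total, measuring simple tendencies.'''
    return counts / counts.sum(axis=0)
```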
countType = {}
for exp_name, exp in experiments.items():
if exp.data.shape[0] < 9: # few unique feature values -> categorical experiment
countType[exp_name] = 'sim_rRatio_maxNorm' # ratio-normalized similarities
else:
countType[exp_name] = 'sim_pmi_maxNorm' # ppmi-adjusted similarities
Which verbs will comprise this study? This depends on which verbs are accounted for in the dataset. The presence/absence (pa) experiments count the most basic features, namely, the simple presence or absence of a given verbal argument: objects, complements, adjuncts. I will first try to derive a specimen set by taking the intersection of all the pa experiments.
specimens = set(experiments['vi_objc_pa'].data.columns)
freq_count = collections.Counter()
print(f'Starting with specimens count of {len(specimens)}')
for exp_name, experiment in experiments.items():
if not re.match('.*_pa', exp_name): # only process presence/absence (pa) experiments
continue
exp_lexemes = set(experiment.data.columns)
specimens = exp_lexemes & specimens
# make frequency count of lexemes
for lex in experiment.data.columns:
if lex in specimens:
freq_count[lex] += experiment.data[lex].sum()
print(f'Complete with specimen count of {len(specimens)}')
Starting with specimens count of 714 Complete with specimen count of 292
All verbs have their stem appended. How many plain lexemes are in the specimen set?
plain_lexs = set(lex.split('.')[0] for lex in specimens)
lex_nodes = [(next(l for l in F.otype.s('lex') if F.lex.v(l) == lex),) for lex in plain_lexs] # 1-tuples of matching lex nodes for B.show
print(f'Number of plain lexemes accounted for: {len(plain_lexs)}\n')
B.show(lex_nodes)
Number of plain lexemes accounted for: 238
We have a good and diverse dataset here. Let's see what the makeup of stems is.
stem_counts = collections.Counter(lex.split('.')[1] for lex in specimens)
stem_counts.most_common()
[('qal', 165), ('hif', 57), ('piel', 40), ('nif', 24), ('hit', 4), ('hof', 1), ('hsht', 1)]
There are 292 verb specimens present with 238 plain lexical forms. Note that many of the experiments have different numbers of specimens present, due to the various requirements per experiment. We intend to use as much data as is present to inform the verb clustering. But are there some verbs whose coverage throughout all 50 experiments is especially lacking?
Below we count per lexeme how many different experiments it occurs in. The number itself is then counted.
spread_count = collections.Counter()
accounted_exps = set()
for lex in specimens:
count = 0
for exp_name, exp in experiments.items():
if lex in exp.data.columns:
count += 1
accounted_exps.add(exp_name)
spread_count[count] += 1
spread_count.most_common()
[(50, 26), (17, 14), (48, 14), (25, 13), (44, 13), (41, 12), (21, 11), (19, 11), (40, 11), (42, 10), (49, 10), (32, 10), (18, 10), (29, 9), (39, 8), (23, 8), (34, 8), (37, 8), (22, 7), (33, 7), (46, 7), (36, 7), (30, 6), (24, 6), (31, 6), (47, 5), (27, 5), (43, 5), (26, 5), (38, 4), (20, 4), (45, 4), (35, 3), (28, 2), (16, 2), (15, 1)]
What is the lowest number of experiments in which a lexeme is accounted for, and how many such lexemes are there?
min(spread_count.most_common())
(15, 1)
sum([amount for count, amount in spread_count.most_common() if count<20])
38
38/297 # ratio of total specimens within 20 or less experiments
0.12794612794612795
Only 1 lexeme is accounted for in as few as 15 of the experiments. 38 of the 292 verb specimens (about 13%) have fewer than 20 applicable experiments. These are acceptable amounts. Verbs that do not have an attested object, for instance, will not appear in 3 of the 4 object experiments, but they will occur in the presence/absence object experiment. So it can be expected that certain verbs will be less accounted for in various experiments.
Presence/absence experiments test whether and how often a given verb occurs with a given argument. Each experiment tests a single argument or combined set of arguments. These are: objects, complements, adjuncts, complements + adjuncts, and all arguments. For each argument type, there is a simple binary variable: present or not. Below we normalize across verbs to observe the distribution of verbs with the given argument.
vi_datas = pd.DataFrame((experiments['vi_objc_pa'].data / experiments['vi_objc_pa'].data.sum()).loc['Objc']) # normalized objc data
vi_datas.columns = ('ratio of attestation',)
vi_datas['experiment'] = 'vi_objc_pa'
for exp_name, experiment in experiments.items():
if exp_name == 'vi_objc_pa' or not re.match('vi_.*_pa', exp_name): # skip non-pa experiments and the already-added objc
continue
exp = experiment.data / experiment.data.sum() # normalize
arg = next(i for i in exp.index if i != 'ø')
datas = pd.DataFrame(exp.loc[arg])
datas.columns = ('ratio of attestation',)
datas['experiment'] = exp_name
vi_datas = pd.concat((vi_datas, datas))
plt.figure(figsize=(15,8))
plt.title('Distributions of Features Per Presence/Absence Experiment', fontsize=20)
sns.stripplot(x='experiment', y='ratio of attestation', data=vi_datas, jitter=0.3)
plt.xlabel('experiment', fontsize=18)
plt.ylabel('% of argument attestation', fontsize=18)
plt.axhline(0.5, color='black', linestyle='dotted')
<matplotlib.lines.Line2D at 0x2f2361518>
Points on the graph are individual verbs, for each of which we look at the percentage of its overall uses with the argument in question (e.g. the blue points show how often verbs occur with an object).
The combined, complement + adjunct experiment seems to have the most even distribution of features. The object presence/absence experiment appears to show the most separation between verbs. The adjunct tends to have <50% attestation, while the opposite is true for the allarg experiment.
It is possible to venture a few generalizations based on this visual:
ratioedO = experiments['vi_objc_pa'].data / experiments['vi_objc_pa'].data.sum()
allO = ratioedO.loc['Objc'][ratioedO.loc['Objc'] == 1.0]
noO = ratioedO.loc['Objc'][ratioedO.loc['ø'] == 1.0]
print(f'{allO.shape[0]} verbs that have an object 100% of the time...')
print(f'{noO.shape[0]} verbs that never occur with an object...')
39 verbs that have an object 100% of the time... 148 verbs that never occur with an object...
#print(' | '.join(allO.index))
#print(' | '.join(noO.index))
# B.show(B.search('''
# clause
# phrase function=Pred
# word lex=BW>[ vs=qal
# phrase function=Objc
# '''))
BW> with objects is a good example of the shortcomings of the ETCBC "object" label, which does not sharply distinguish between "objects" and "complements." Note however the group of verbs just beneath the 20% marker in the strip chart. Are many of these motion verbs like BW>?
smallO = ratioedO.loc['Objc'][ratioedO.loc['Objc'] <= 0.20]
print(f'{smallO.shape[0]} verbs that have an object ≤20% of the time...')
289 verbs that have an object ≤20% of the time...
#print(' | '.join(smallO.index))
For each specimen, we calculate its similarity with all other specimens pairwise, based on all experiments which they have in common. The similarity scores are then averaged across all of the shared experiments. Raw counts for the lexically rich experiments have been adjusted with the positive pointwise mutual information (PPMI) score; counts for the small categorical experiments have been normalized with ratios, as assigned in countType above.
Average similarities across all experiments to derive a list of test cases.
sim_matrix = pd.DataFrame(np.zeros(shape=(len(specimens), len(specimens))), columns=specimens, index=specimens) # for pairwise similarities
common_matrix = pd.DataFrame(np.zeros(shape=(len(specimens), len(specimens))), columns=specimens, index=specimens) # counts how many experiments each (column) lexeme appears in, as the divisor for the average
for lex in specimens:
for space_name, space in spaces.items():
sim_measure = getattr(space, countType[space_name]) # e.g. space.sim_pmi_maxNorm
if lex not in sim_measure.columns:
continue
sim_matrix[lex] = sim_matrix[lex].add(sim_measure[lex], fill_value=0)
common_matrix[lex] += 1
sim = sim_matrix / common_matrix
#list(sim.columns)
sim['PYH[.qal']['PTX[.qal'] # random test
0.5903673728972471
sim['HJH[.qal'].sort_values(ascending=False).head(10)
NTN[.qal 0.612082 <FH[.qal 0.603892 BW>[.qal 0.547493 JY>[.qal 0.478234 QR>[.qal 0.471106 JCB[.qal 0.468395 HLK[.qal 0.467038 NPL[.qal 0.456240 <LH[.qal 0.455061 >MR[.qal 0.447714 Name: HJH[.qal, dtype: float64
sim['NTN[.qal'].sort_values(ascending=False).head(10)
FJM[.qal 0.743461 <FH[.qal 0.695756 BW>[.hif 0.649756 LQX[.qal 0.615713 HJH[.qal 0.612082 NF>[.qal 0.526348 QR>[.qal 0.519454 >MR[.qal 0.509996 CWB[.hif 0.502046 NGD[.hif 0.496463 Name: NTN[.qal, dtype: float64
One of the major goals of this project is to determine how verb meanings and classes are naturally distinguished by their context. היה is an interesting case of a verb which likely has no exact synonyms, but which exhibits similarity across all averaged experiments with words that are seemingly dissimilar, such as נתן, עשה, בוא, etc. (see above). The purpose of gathering various kinds of contexts via the experiment parameters is to discover which contexts in particular are similar or different in various kinds of verbs.
The arrangement of experiment parameters on the plot is important, since I hope to use it to identify patterns. So here I develop an ordering and color scheme to be used with all of the plots.
I can attempt to arrange elements from more to less specific parameters. An example of a "specific" parameter would be one with lexical arguments, while a "general" parameter might only count the presence or absence of a given argument. Below is a concept of priorities:
# make bar orders and map to colors
verb_grammar = ['vg_tense']
verb_domain = '''
vd_domain_simple
vd_domain_embed
vd_con_chain
vd_con_clause
vd_con_window
vd_par_lex
'''.strip().split('\n')
vi_pa = '''
vi_allarg_pa
vi_coad_pa
vi_adj+_pa
vi_cmpl_pa
vi_objc_pa
'''.strip().split('\n')
vi_animacy = '''
vi_subj_animacy
vi_allarg_animacy
vi_coad_animacy
vi_adj+_animacy
vi_cmpl_animacy
vi_objc_animacy
'''.strip().split('\n')
vi_domain = '''
vi_subj_domain
vi_allarg_domain
vi_coad_domain
vi_adj+_domain
vi_cmpl_domain
vi_objc_domain
'''.strip().split('\n')
vi_lex = '''
vi_allarg_lex
vi_coad_lex
vi_subj_lex
vi_adj+_lex
vi_cmpl_lex
vi_objc_lex
'''.strip().split('\n')
vf_pa = '''
vf_argAll_pa
vf_coad_pa
vf_adju_pa
vf_cmpl_pa
vf_obj_pa
'''.strip().split('\n')
vf_animacy = '''
vf_argAll_animacy
vf_coad_animacy
vf_adju_animacy
vf_cmpl_animacy
vf_obj_animacy
'''.strip().split('\n')
vf_domain = '''
vf_argAll_domain
vf_coad_domain
vf_adju_domain
vf_cmpl_domain
vf_obj_domain
'''.strip().split('\n')
vf_lex = '''
vf_argAll_lex
vf_coad_lex
vf_adju_lex
vf_cmpl_lex
vf_obj_lex
'''.strip().split('\n')
blues = sns.color_palette(palette='Blues')
reds = sns.color_palette(palette='Reds')
# map experiment groups to colors
color2experiment = (('gold', verb_grammar),
('purple', verb_domain),
(blues[1], vi_pa),
(blues[2], vi_animacy),
(blues[4], vi_domain),
(blues[5], vi_lex),
(reds[1], vf_pa),
(reds[2], vf_animacy),
(reds[4], vf_domain),
(reds[5], vf_lex))
# make ordered tuple of colors
expcolors = tuple((exp, color) for color, exp_list in color2experiment # make tuple of colors
for exp in exp_list)
Below I prepare visualizer functions for examining similarities and differences in similarity scores.
def get_sim_experiments(lex1, lex2, colors=expcolors, show=True, returnData=True, wordSet=None):
'''
Exports a barchart that illustrates the level
of similarity between two lexemes per all experiments
in the dataset.
'''
sims = dict() # temporarily hold similarity scores here
# gather rated similarities between the two terms for every provided experiment
for sp_name, space in spaces.items():
sim_measure = getattr(space, countType[sp_name]) # e.g. space.sim_pmi_maxNorm
wordSet = wordSet or {lex1, lex2}
if wordSet & set(sim_measure.columns) != wordSet: # ensure both terms are in the space
continue
sims[sp_name] = sim_measure[lex1][lex2] # get similarity
sims_ordered = collections.OrderedDict((exp[0], sims[exp[0]]) for exp in colors if exp[0] in sims) # customized order
sims = pd.DataFrame(list(sims_ordered.items())) # drop into Dataframe
sims.columns = ['experiment', 'score'] # set col names
if show:
# plot:
plot_colors = tuple(expcol[1] for expcol in colors if expcol[0] in sims['experiment'].values)
plt.figure(figsize=(18, 8))
sns.barplot(sims['experiment'], sims['score'], palette=plot_colors, orient='v')
plt.xticks(rotation='vertical', fontsize=12)
plt.xlabel('experiment', fontsize=18)
plt.ylabel('adjusted similarity score', fontsize=18)
plt.title(f'Scored Similarities between {lex1} and {lex2}', fontsize=20)
if returnData:
return sims
def compare_simPatterns(dataset1, dataset2, pairs='', colors=expcolors):
'''
Compare barchart similarity patterns across two barcharts.
The first dataset is the primary dataset and will be colored
normally. The second dataset will be colored grey for comparison.
'''
data1Sims = list(reversed(dataset1['score']))
data2Sims = list(reversed(dataset2['score']))
xLocations = np.arange(dataset1.shape[0])
barwidth = 0.8 # the width of the bars: can also be len(x) sequence
plt.figure(figsize=(10, 15))
p1 = plt.barh(xLocations, data1Sims, barwidth, color=sns.color_palette(palette='Reds')[3])
p2 = plt.barh(xLocations, data2Sims, barwidth, color=sns.color_palette(palette='Blues')[4])
plt.yticks(xLocations, list(reversed(dataset1['experiment'])), fontsize=12)
plt.ylabel('experiment', fontsize=18)
plt.xlabel('adjusted similarity scores', fontsize=18)
plt.title(f'Similarity Differences between {pairs[0]} and {pairs[1]}', fontsize=20)
plt.legend((p1[0], p2[0]), (f'{pairs[0]}', f'{pairs[1]}'), fontsize=12)
def compareSets(set1, set2):
'''
Compare two pairs of words and their similarity differences.
'''
dataset1 = get_sim_experiments(set1[0], set1[1], show=False, wordSet={set1[0], set1[1], set2[0], set2[1]})
dataset2 = get_sim_experiments(set2[0], set2[1], show=False, wordSet={set1[0], set1[1], set2[0], set2[1]})
compare_simPatterns(dataset1, dataset2, pairs=(f'{set1[0]} & {set1[1]}', f'{set2[0]} & {set2[1]}'))
def compareChange(set1, set2, colors=expcolors, returnData=False, showPlot=True):
'''
Compare two pairs of words and plot the positive or negative
difference in similarities from set1 to set2.
'''
dataset1 = get_sim_experiments(set1[0], set1[1], show=False, wordSet={set1[0], set1[1], set2[0], set2[1]}, colors=colors)
dataset2 = get_sim_experiments(set2[0], set2[1], show=False, wordSet={set1[0], set1[1], set2[0], set2[1]}, colors=colors)
change = dataset1.copy()
change['score'] = change['score'] - dataset2['score']
#change['score'] = change['score'] / dataset1['score'] # don't divide since all experiments are now fairly normal
datachange = list(reversed(change['score']))
blue = sns.color_palette(palette='Blues')[3]
red = sns.color_palette(palette='Reds')[4]
colors = [blue if score > 0 else red for score in datachange]
pairs = f'{set1[0]} & {set1[1]}', f'{set2[0]} & {set2[1]}'
if showPlot:
xLocations = np.arange(change.shape[0])
barwidth = 0.8 # the width of the bars: can also be len(x) sequence
plt.figure(figsize=(10, 15))
plt.barh(xLocations, datachange, barwidth, color=colors)
plt.yticks(xLocations, list(reversed(dataset1['experiment'])), fontsize=12)
plt.ylabel('experiment', fontsize=18)
plt.xlabel('adjusted similarity score differences', fontsize=18)
plt.title(f'Similarity Differences between {pairs[0]} and {pairs[1]}', fontsize=20)
if returnData:
return (dict(zip(list(reversed(dataset1['experiment'])), datachange)))
def plotTop(scores, lex):
'''
Plot the top most similar terms
using their rank and similarity score.
'''
plt.xticks(range(0, scores.shape[0]), scores.index)
plt.title(f'Plotted Similarity Scores to {lex}')
plt.ylabel('similarity score')
plt.xlabel('Most Similar Terms in Ranked Order')
plt.scatter(range(0, scores.shape[0]), scores)
plt.plot(range(0, scores.shape[0]), scores)
def plotPa(term1, term2, experiment):
'''
Plot side-by-side bar charts for presence
absence experiments to show the raw differences
in tendencies.
'''
width = 0.4
fig, ax = plt.subplots(figsize=(6, 5))
term1Dat = spaces[experiment].raw_norm[term1]
term2Dat = spaces[experiment].raw_norm[term2]
print(term1)
print(term1Dat, '\n')
print(term2)
print(term2Dat, '\n')
index = np.arange(term1Dat.shape[0], step=1)
ax.bar(index, term1Dat, width, label=term1)
ax.bar(index+width, term2Dat, width, label=term2)
ax.legend()
ax.set_xticks(index+width/2)
ax.set_xticklabels(term1Dat.index, fontsize=14)
ax.set_ylabel('attested proportion', fontsize=14)
ax.set_title(f'scores for {experiment}', fontsize=15)
def topCommon(term1, term2, experiment, count_type='pmi'):
'''
Find the top common values between two terms within
a given experiment.
'''
count_measure = getattr(spaces[experiment], count_type) # e.g. spaces[experiment].pmi
t1Top = count_measure[term1].sort_values(ascending=False)
t2Top = count_measure[term2].sort_values(ascending=False)
common = t1Top[t2Top > 0]
common = common[common > 0]
averaged_common = (common + t2Top[common.index]) / 2
averaged_common = pd.DataFrame(averaged_common, columns=['averaged'])
averaged_common[term1] = t1Top[common.index]
averaged_common[term2] = t2Top[common.index]
return averaged_common.sort_values(by='averaged', ascending=False)
def topUncommon(term1, term2, experiment, count_type='pmi', focus=''):
'''
Find the top uncommon values between two terms within
a given experiment.
'''
count_measure = getattr(spaces[experiment], count_type) # e.g. spaces[experiment].pmi
t1Top = count_measure[term1]
t2Top = count_measure[term2]
uncommon = abs(t1Top - t2Top) # absolute diff
uncommon = pd.DataFrame(uncommon, columns=['difference'])
uncommon[term1] = t1Top[uncommon.index]
uncommon[term2] = t2Top[uncommon.index]
if not focus:
return uncommon.sort_values(by='difference', ascending=False)
else:
return uncommon[uncommon[focus] > 0].sort_values(by='difference', ascending=False)
def sampleBases(targets, basis, exp_name):
'''
Returns examples of given bases
from the Hebrew Bible.
'''
results = list(sample for target in targets
for sample in experiments[exp_name].target2basis2result[target][basis])
random.shuffle(results)
return results
We will analyze the top ten words ranked by frequency in the experiment data. For each word we look at the five most similar words and analyze two categories: intuitive matches and surprising matches. Intuitive matches are those for which we can recognize a valid (as humanly judged) semantic similarity. We visualize the similarity scores per experiment with bar charts. The charts allow us to map specific areas of similarity and dissimilarity based on the test parameters of the experiments. In doing so, we explore how similarity is multifaceted and whether there are any perceptible differences between intuitively and surprisingly matched terms. Another bar chart is used to show the differences between intuitively and surprisingly matched terms. This helps to isolate which experiment parameters consistently yield differences between intuitive and surprising matches.
Further above we generated a frequency count for each lexeme in the specimen set. This allows us to select the most common lexemes for the case study, as shown below.
freq_count.most_common(10)
[('>MR[.qal', 47974.0), ('HJH[.qal', 34658.0), ('<FH[.qal', 22837.0), ('BW>[.qal', 19062.0), ('NTN[.qal', 18194.0), ('HLK[.qal', 13360.0), ('R>H[.qal', 10784.0), ('DBR[.piel', 10002.0), ('CM<[.qal', 9999.0), ('LQX[.qal', 9136.0)]
We can create a loose list of groups for these lexemes based on perceived intuitive similarities. These groups only serve the purpose of getting a sense of the lexical variety reflected in the most common set, and thus they do not go towards the analysis itself.
In this top ten set, we can see six general groups of verbs.
There is good variety in this top 10 set. We begin with these top ten, to analyze their top-rated most-similar terms, and to compare surprising similarities against intuitive ones.
We begin the word-by-word evaluation with אמר ("to say"). The top five most similar terms are based on the average of similarity scores across all available experiments for אמר. They are listed below along with the averaged similarity score.
say_top_sim = sim['>MR[.qal'].sort_values(ascending=False).head(5) # get top 5 elements
plotTop(say_top_sim, '>MR[.qal')
print('Similarity Scores with >MR\n', say_top_sim, '\n')
Similarity Scores with >MR QR>[.qal 0.643052 DBR[.piel 0.629496 NTN[.qal 0.531245 NGD[.hif 0.529970 <FH[.qal 0.510799 Name: >MR[.qal, dtype: float64
קרא is ranked most similar at 0.64. דבר is close at 0.63. There is a 10 point decrease at the third-ranked word, נתן, which the plot illustrates. נתן and עשה are surprising similarities which will need to be explored.
We can note that קרא, דבר, and נגד are good intuitive matches. The other two, נתן and עשה, are surprising matches. What is the cause of the similarity of נתן and עשה with אמר?
To answer these questions, we now turn to the analysis and comparison of similarity scores across the experiments.
say_call = get_sim_experiments('>MR[.qal', 'QR>[.qal')
Each score has been normalized according to the 75th-percentile maximum similarity value per experiment, so that scores are judged in relation to that experiment's overall performance.
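A minimal sketch of what such a normalization might look like, assuming each experiment's similarity matrix is simply scaled by the 75th percentile of its values (the real `*_maxNorm` measures are computed inside SemSpace and may differ in detail):

```python
import numpy as np

def quantile_max_norm(sim_df, q=0.75):
    '''Hypothetical sketch: scale an experiment's similarity scores by their
    qth quantile (capped at 1) so experiments become comparable to one another.'''
    scale = np.nanquantile(sim_df.values, q)
    return (sim_df / scale).clip(upper=1.0)
```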
Each color group represents a different experiment strategy, while different shades represent different data points, with darker shades for more specific data points. Thus, higher bars within darker-shaded groups will presumably show more specific similarities.
Starting with the `vg` and `vd` experiments, we see high agreement in verb tense as well as simple domain. For the other `vd` experiments there is decreasing similarity as the specificity of the data point increases toward the context of a window. There does not appear to be any parallelism shared between אמר and קרא (`vd_par_lex`).
In the inventory `pa` experiments we see high levels of agreement between the adju and cmpl experiments. The allarg experiment is probably higher due to the high agreement in adju and cmpl. The object experiment is lower (<0.4) in comparison to the adju and cmpl experiments.
In the inventory `animacy` experiments the subj and cmpl scores stand out above the others. There is a lot of agreement, then, on the animacy of the actant and the animacy of the complement element. For communicative verbs this is probably significant: a living subject communicating to a living complement. The coad experiment is also slightly higher, perhaps due to the increase in the cmpl agreement.
In the inventory `domain` experiments there is highest agreement in the coad and cmpl experiments. Allarg and subject are also high. Since allarg includes the cmpl and adju, it makes sense that it would also be slightly higher. The high level of agreement in subject domain corresponds with a high level of subject animacy agreement. The adjunct domain is the second lowest. Probably most notable of all, though, is the very small level of agreement in the object experiment. This corresponds with the lower levels of agreement seen in the `pa` experiments (N.B. objc is absent from the animacy experiment due to lack of data).
In the inventory `lex` experiments coad again stands out. Could it be that the cmpl and adju agreement combine to create even stronger levels of agreement in the coad experiment? Subject is the highest of the levels, just as it is in the animacy experiments. Which lexemes might be co-occurring in this slot regularly? The adjunct experiment is again second lowest, as in the domain experiments. But the object experiment again shows the least amount of agreement.
Moving into the frame `pa` experiments, the coad, adju, and cmpl experiments show high levels of agreement. The pattern between those three experiments is nearly the same as in the vi `pa` experiments. The allarg experiment, on the other hand, is much lower than its `pa` counterpart. This may be expected, since the allarg frame experiment requires similarity across more diverse categories. As with the vi experiments, objc has a much lower similarity at ~0.60. It is noteworthy that the adju experiment here is roughly equivalent to the cmpl (as in the inventory experiments), since the adju has consistently scored lower across the animacy, domain, and lex experiments. This shows that while >MR and QR> have a similar tendency to attract adjuncts, the adjuncts themselves are quite different. This could be an interesting area of investigation.
In the frame `animacy` experiments, we see that adju has essentially zero similarity. This needs to be investigated. It should be noted that the animacy experiments have much less data to work with, so a slight deficiency in one or another value could change this score. However, the lower level of animacy agreement in the adju experiment does correspond with its lower attestation in the vi experiments. The coad experiment is slightly higher than both argAll and cmpl, but in general all three are fairly close in agreement.
In the frame domain experiments the cmpl experiment shows the strongest similarity, which results in a higher score for the coad experiment. As in the inventory versions, the adju is second lowest while the object shows the most separation.
The lexical frame experiments show the same pattern and shape as the domain experiments: high agreement in cmpl with a corresponding highest agreement in the coad experiment. argAll, as a result, is also higher. The adju experiment is lower, while the objc experiment is lowest.
>MR and QR> share very similar contexts at the discourse domain and clause levels. They also have very similar preferences for attracting the presence of adjuncts and complements, but they differ in their simple preference for an object. >MR and QR> differ even more in the kind of objects which they prefer, both by semantic domain and lexeme. The two verbs also tend to select different adjuncts in the animacy, semantic domain, and lexeme experiments. However, >MR and QR> share a high similarity in their complements across animacy, semantic domain, and lexical categories. Finally, both >MR and QR> share similar kinds of subjects. Notably, they share similarity in subject and complement animacy, which may be a key factor in differentiating communication verbs like these.
The analysis of the similarity plot shows high agreement in the complement categories as well as subject animacy and lexemes. Here we dive deeper into the specific similarities with `topCommon`, which pulls out the top features in common for a given experiment.
First, we see high levels of similarity in the discourse context experiments. We look at the top rated similarities in the window experiment, which records the lexemes to the left and right of the verbs.
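To make the window idea concrete, here is a rough sketch of how such a context could be gathered with the Text-Fabric API loaded above. The window size and boundary handling are illustrative; the actual window definition lives in experiments2.py:

```python
def window_lexemes(verb_node, size=2):
    '''Gather lexemes within `size` word slots on either side of a verb.
    Word nodes in Text-Fabric are consecutive slot numbers, so simple
    slot arithmetic walks the immediate context.'''
    return [F.lex.v(w)
            for w in range(max(1, verb_node - size), verb_node + size + 1)
            if w != verb_node and F.otype.v(w) == 'word']
```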
topCommon('>MR[.qal', 'QR>[.qal', 'vd_con_window').head(10)
| | averaged | >MR[.qal | QR>[.qal |
|---|---|---|---|
L>_<MJ/ | 8.625709 | 8.625709 | 8.625709 |
J<BY/ | 7.803781 | 7.303781 | 8.303781 |
N<MJ=/ | 7.717750 | 8.510232 | 6.925269 |
CMCWN/ | 7.378714 | 8.247197 | 6.510232 |
GJXZJ/ | 7.333228 | 8.625709 | 6.040746 |
NPTLJ/ | 7.303781 | 7.303781 | 7.303781 |
BRWK/ | 7.303781 | 7.303781 | 7.303781 |
MJKJHW/ | 7.248265 | 8.040746 | 6.455784 |
>KJC/ | 7.217750 | 8.510232 | 5.925269 |
CM<J=/ | 7.110835 | 7.403316 | 6.818354 |
The frame, complement, animacy experiment yields important similarities in the shared complements of אמר and קרא:
topCommon('>MR[.qal', 'QR>[.qal', 'vf_cmpl_animacy')
| | averaged | >MR[.qal | QR>[.qal |
|---|---|---|---|
>L_animate | 4.417922 | 6.139394 | 2.696450 |
L_animate | 3.551028 | 4.445276 | 2.656780 |
L_inanimate | 2.891350 | 3.454115 | 2.328584 |
<L_inanimate | 1.054641 | 0.811928 | 1.297354 |
N.B. these are PPMI-adjusted counts. The complement also held high similarity in the frame domain experiment:
topCommon('>MR[.qal', 'QR>[.qal', 'vf_cmpl_domain').head(10) # show top 10
| | averaged | >MR[.qal | QR>[.qal |
|---|---|---|---|
>L_Professions | 6.371559 | 6.371559 | 6.371559 |
L_Gather | 5.901006 | 6.693487 | 5.108524 |
L_Control | 5.871559 | 6.371559 | 5.371559 |
L_Professions | 5.386132 | 5.886132 | 4.886132 |
>L_People | 5.168315 | 6.398031 | 3.938599 |
L_Waterbodies | 5.108524 | 5.108524 | 5.108524 |
L_Names of Landforms | 5.108524 | 5.108524 | 5.108524 |
L_Names of Groups | 4.898505 | 5.190987 | 4.606024 |
>L_Names of People | 4.866702 | 6.918107 | 2.815297 |
L_Names of People | 4.844022 | 5.447248 | 4.240797 |
These commonalities show the actions of both קרא and אמר being directed towards people (professions, gather[ings], people, names of groups). On a few occasions the action is directed towards performing an act of taking control (e.g. L_Control) using the verb שבה (Isa 61:1), אסר (Isa 49:9), or כבשׁ (as in 2 Chr 28:10). In the cases of L_Names of Landforms, the referents are Zion (Isa 52:7) and Hermon (Deut 3:9) as personified cities. In Ezek 6:3, YHWH speaks "to the channels" which is the source of L_Waterbodies.
The analysis also showed key differences in object and adjunct use. Here we will try to tease this out a bit more, and find out which object and adjunct instances in particular are different between the two terms. First we look at the raw counts in the inventory and frame presence/absence experiments.
plotPa('>MR[.qal', 'QR>[.qal', 'vi_objc_pa')
>MR[.qal Objc 0.921019 ø 0.078981 Name: >MR[.qal, dtype: float64 QR>[.qal Objc 0.489666 ø 0.510334 Name: QR>[.qal, dtype: float64
The data shows that >MR nearly always (92%) occurs alongside an object. Our export of the data counts the beginning of speech as an object. QR>, on the other hand, is evenly divided between its use with and without an object. Let's take a closer look at the cases where QR> occurs without an object.
B.show(experiments['vi_objc_pa'].target2basis2result['QR>[.qal']['ø'][:10])
[B.show output: 10 verse results, rendered only in the notebook]
We can see that a common reason קרא does not take the object is that it is used in conjunction with אמר, which takes the object instead. This happens most clearly in the formulaic statement קרא לאמר. The raw counts for this pattern are below.
print('Occurrences of QR> + L + >MR: \t', experiments['vi_adj+_lex'].data['QR>[.qal']['L_>MR['])
Occurrences of QR> + L + >MR: 23.0
We can make a further note here that this pattern gives some corroborating evidence as to the close meaning of קרא and אמר, especially since אמר often occurs with this same pattern:
print('Occurrences of >MR + L + >MR: \t', experiments['vi_adj+_lex'].data['>MR[.qal']['L_>MR['])
Occurrences of >MR + L + >MR: 113.0
Besides this formula, we see in the occurrences above that קרא is also frequently followed by אמר in the next sentence (i.e. call out -> say).
The experiments have thus noticed a subtle difference in meaning between אמר and קרא in their tendencies to attract objects. Namely, אמר almost always requires the beginning of a direct speech section (marked herein as the object), whereas קרא takes an object only about half of the time.
Another observation from the analysis above is that >MR and QR> frequently take different kinds of objects. This is seen in the relatively lower scores for object experiments in the remaining categories. Let's take a look at the top scores for the two terms to see if we can discern where major differences lie.
topUncommon('>MR[.qal', 'QR>[.qal', 'vi_objc_domain').head(20)
| | difference | >MR[.qal | QR>[.qal |
|---|---|---|---|
>M_Bless | 7.936638 | 7.936638 | 0.000000 |
L_Strong | 7.936638 | 7.936638 | 0.000000 |
Heat | 7.936638 | 0.000000 | 7.936638 |
>CR_Open | 7.936638 | 7.936638 | 0.000000 |
L_Speak | 7.936638 | 7.936638 | 0.000000 |
>CR_Ingest | 7.936638 | 7.936638 | 0.000000 |
L_Possess | 7.936638 | 7.936638 | 0.000000 |
L_Non-Exist | 7.936638 | 7.936638 | 0.000000 |
>CR_Flee | 7.936638 | 7.936638 | 0.000000 |
KJ_Witnesses | 7.936638 | 0.000000 | 7.936638 |
Soft | 7.936638 | 0.000000 | 7.936638 |
Names of Constructions | 7.351675 | 0.000000 | 7.351675 |
Names of Landforms | 7.003752 | 0.000000 | 7.003752 |
KJ_Sin | 6.936638 | 6.936638 | 0.000000 |
Contain | 6.936638 | 0.000000 | 6.936638 |
L_Space | 6.936638 | 6.936638 | 0.000000 |
L_Move | 6.936638 | 6.936638 | 0.000000 |
Search | 6.936638 | 6.936638 | 0.000000 |
Name | 6.745695 | 0.000000 | 6.745695 |
KJ_Birth | 6.351675 | 6.351675 | 0.000000 |
In the אמר data we see a greater tendency for the objects to consist of prepositional phrases belonging to object clauses. קרא, on the other hand, takes mostly simple noun phrases. We can also see that "Names" occur at least 3 times in קרא's top list. Keep in mind that these scores are the PPMI-adjusted scores, which weight surprising lexical occurrences in proportion to their overall sample sizes. If we look at the raw counts, we can see this "naming" tendency come out even more:
topUncommon('>MR[.qal', 'QR>[.qal', 'vi_objc_domain', count_type='raw').head(20)
| | difference | >MR[.qal | QR>[.qal |
|---|---|---|---|
Name | 106.0 | 0.0 | 106.0 |
Names of People | 73.0 | 0.0 | 73.0 |
Names of Locations | 34.0 | 0.0 | 34.0 |
Names of Landforms | 11.0 | 0.0 | 11.0 |
Happen | 10.0 | 2.0 | 12.0 |
Ingest | 8.0 | 0.0 | 8.0 |
Names of Deities | 7.0 | 0.0 | 7.0 |
Groups | 5.0 | 0.0 | 5.0 |
Names of Constructions | 4.0 | 0.0 | 4.0 |
Kinship | 4.0 | 0.0 | 4.0 |
Free | 4.0 | 0.0 | 4.0 |
Know | 3.0 | 3.0 | 0.0 |
People | 3.0 | 0.0 | 3.0 |
Land | 3.0 | 0.0 | 3.0 |
Festivals | 3.0 | 0.0 | 3.0 |
Towns | 3.0 | 0.0 | 3.0 |
Speak | 3.0 | 5.0 | 2.0 |
Universe | 2.0 | 0.0 | 2.0 |
Gather | 2.0 | 0.0 | 2.0 |
Priests | 2.0 | 0.0 | 2.0 |
If we look at the lexical experiments, we can see the same tendency reflected:
topUncommon('>MR[.qal', 'QR>[.qal', 'vi_objc_lex').head(20)
| | difference | >MR[.qal | QR>[.qal |
|---|---|---|---|
NPTLJ/ | 8.252665 | 0.000000 | 8.252665 |
YPNT_P<NX/ | 8.252665 | 0.000000 | 8.252665 |
JKJN/ | 8.252665 | 0.000000 | 8.252665 |
<MQ/ | 8.252665 | 0.000000 | 8.252665 |
<MNW_>L/ | 8.252665 | 0.000000 | 8.252665 |
KJ_CGGH/ | 8.252665 | 8.252665 | 0.000000 |
GRCM/ | 8.252665 | 0.000000 | 8.252665 |
XRVM/ | 8.252665 | 0.000000 | 8.252665 |
RXBWT/ | 8.252665 | 0.000000 | 8.252665 |
KJ_BL<==[ | 8.252665 | 8.252665 | 0.000000 |
FNJR/ | 8.252665 | 0.000000 | 8.252665 |
XRMH/ | 8.252665 | 0.000000 | 8.252665 |
JMJMH/ | 8.252665 | 0.000000 | 8.252665 |
XRB=/ | 8.252665 | 0.000000 | 8.252665 |
C_HBL/ | 8.252665 | 8.252665 | 0.000000 |
>BL_MYRJM/ | 8.252665 | 0.000000 | 8.252665 |
L>_<MJ/ | 8.252665 | 0.000000 | 8.252665 |
FRJN/ | 8.252665 | 0.000000 | 8.252665 |
>BN_H<ZR/ | 8.252665 | 0.000000 | 8.252665 |
L_DRC[ | 8.252665 | 8.252665 | 0.000000 |
In the אמר top list we see the content of speech with verbs such as לדרש (say -> to seek) or לכנס (say -> to gather). In the קרא top list we see nouns and proper nouns. We can say, then, that one of the primary differences between אמר and קרא is the tendency of קרא to take nouns as direct objects in a naming context; אמר's objects are instead distinguished by marking action.
One of the distinguishing similarities between אמר and קרא was in the subject animacy experiment. These two verbs have high preferences for animate subjects, as would be expected:
plotPa('>MR[.qal', 'QR>[.qal', 'vi_subj_animacy')
>MR[.qal animate 0.992513 inanimate 0.007487 Name: >MR[.qal, dtype: float64 QR>[.qal animate 0.991453 inanimate 0.008547 Name: QR>[.qal, dtype: float64
There are cases in our data where inanimate subjects are personified and begin speaking. We show a few of those cases below (results cut off at 5).
B.show(experiments['vi_subj_animacy'].target2basis2result['>MR[.qal']['inanimate'][:5], condenseType='clause')
[B.show output: 5 clause results, rendered only in the notebook]
The strong agreement in subject animacy similarity is paired with a high agreement of complement similarity. Here are the top common scores for both terms in the complement animacy frames:
topCommon('>MR[.qal', 'QR>[.qal', 'vf_cmpl_animacy')
| | averaged | >MR[.qal | QR>[.qal |
|---|---|---|---|
>L_animate | 4.417922 | 6.139394 | 2.696450 |
L_animate | 3.551028 | 4.445276 | 2.656780 |
L_inanimate | 2.891350 | 3.454115 | 2.328584 |
<L_inanimate | 1.054641 | 0.811928 | 1.297354 |
Both verbs often direct their action from an animate subject, with either the preposition אל or ל, towards an animate complement. We can note an interesting difference with קרא that ties back to its naming tendency: "animate" without a qualifying preposition is included, presumably the person which is being named.
Finally, אמר and קרא consistently had lower levels of similarity in the adjunct experiments. This happens even though both have relatively higher similarity in the presence/absence experiment (closer to the complement). Why might that be the case?
topUncommon('>MR[.qal', 'QR>[.qal', 'vi_adj+_domain').head(20)
| | difference | >MR[.qal | QR>[.qal |
|---|---|---|---|
>L_Shame | 7.761551 | 7.761551 | 0.000000 |
B<D/_Evil | 7.761551 | 0.000000 | 7.761551 |
J<N/>CR_Space | 7.761551 | 7.761551 | 0.000000 |
>L_Domestic Animals | 7.761551 | 0.000000 | 7.761551 |
L_Bald | 7.761551 | 0.000000 | 7.761551 |
MN_Shine | 7.761551 | 0.000000 | 7.761551 |
<L_Crops | 7.761551 | 0.000000 | 7.761551 |
K_Name | 6.761551 | 0.000000 | 6.761551 |
L_Hang | 6.761551 | 6.761551 | 0.000000 |
>L_Possess | 6.761551 | 0.000000 | 6.761551 |
>L_Leaders | 6.761551 | 6.761551 | 0.000000 |
>L_Kinship | 6.761551 | 6.761551 | 0.000000 |
B_Curse | 6.761551 | 6.761551 | 0.000000 |
>CR_Move | 6.761551 | 6.761551 | 0.000000 |
L_Event Referents: Time | 6.761551 | 0.000000 | 6.761551 |
L_Curse | 6.761551 | 0.000000 | 6.761551 |
L_Think | 6.761551 | 6.761551 | 0.000000 |
L_Small | 6.761551 | 0.000000 | 6.761551 |
B_Lament | 6.761551 | 6.761551 | 0.000000 |
LM<N_Move | 6.761551 | 6.761551 | 0.000000 |
The differences here are diverse. We can see a slight tendency for the preposition ב with אמר in conjunction with various kinds of "speaking" domains like curse and lament. קרא has more references to landscapes and time references (possibly from the ETCBC `Loca` and `Time` tags). At this point it is difficult to perceive where the differences are.
Diving into the raw scores helps:
topUncommon('>MR[.qal', 'QR>[.qal', 'vi_adj+_domain', count_type='raw').head(20)
| | difference | >MR[.qal | QR>[.qal |
|---|---|---|---|
L_Speak | 88.0 | 112.0 | 24.0 |
<D_Time | 8.0 | 1.0 | 9.0 |
L_Move | 8.0 | 10.0 | 2.0 |
MN_Orientation | 5.0 | 0.0 | 5.0 |
B_Name | 5.0 | 1.0 | 6.0 |
B_Perception | 5.0 | 0.0 | 5.0 |
L_Dead | 4.0 | 4.0 | 0.0 |
B_Time | 4.0 | 17.0 | 13.0 |
MN_Universe | 3.0 | 0.0 | 3.0 |
Event Referents: Location | 3.0 | 0.0 | 3.0 |
K_Speak | 3.0 | 3.0 | 0.0 |
Event Referents: Time | 3.0 | 6.0 | 3.0 |
L_Exist | 3.0 | 3.0 | 0.0 |
L_Know | 3.0 | 3.0 | 0.0 |
B_Vision | 3.0 | 3.0 | 0.0 |
Buildings | 3.0 | 0.0 | 3.0 |
B_Just | 2.0 | 0.0 | 2.0 |
B_Distress | 2.0 | 0.0 | 2.0 |
L_Dwell | 2.0 | 2.0 | 0.0 |
B_Afraid | 2.0 | 2.0 | 0.0 |
We see references to space and time amongst the קרא data (though time is also frequent with אמר). Adjuncts with מן appear in the קרא list (5 times with Orientation, 3 times with Universe) but not at all with אמר. Let's look closer at these instances.
B.show(experiments['vi_adj+_domain'].target2basis2result['QR>[.qal']['MN_Universe'], condenseType='clause')
[B.show output: 3 clause results, rendered only in the notebook]
B.show(experiments['vi_adj+_domain'].target2basis2result['QR>[.qal']['MN_Orientation'], condenseType='clause')
[B.show output: 5 clause results, rendered only in the notebook]
The code below looks for any cases where אמר occurs alongside מן in our dataset. It finds none:
[experiments['vi_adj+_domain'].target2basis2result['>MR[.qal'][b] for b in experiments['vi_adj+_domain'].target2basis2result['QR>[.qal'] if b.startswith('MN')]
[[], [], [], [], [], [], [], [], []]
This difference shows that קרא occurs with a preposition that indicates separation or source. We can tease this out further:
spaces['vf_argAll_animacy'].pmi['QR>[.qal'].sort_values(ascending=False).head(5) # N.B. these are ppmi adjusted scores
Cmpl.L_animate|Cmpl.L_inanimate 7.693487 Cmpl.L_animate|adj+.>L_animate|adj+.inanimate 7.693487 Cmpl.>L_animate|adj+.MN_inanimate 7.693487 Cmpl.L_inanimate|Objc.KJ_animate 7.693487 Cmpl.B_inanimate|Cmpl.L_animate 7.693487 Name: QR>[.qal, dtype: float64
Note the third-ranked frame with an animate complement and a MN+inanimate adjunct. This shows action directed towards an animate complement, a feature which קרא and אמר have in common, combined with an adjunct that they do not have in common. This is a nice distinction between these two verbs.
אמר and קרא share common contexts and frequently occur together. They are similar in their preference for animate subjects and complements, and they share similar prepositional phrases with אל or ל indicating action towards an animate complement. The verbs differ, though, in their object and adjunct preferences. אמר nearly always takes speech as a direct object, while קרא is more flexible, occurring with an object only about half of the time. When either verb does take an object, there are differences in the domains and lexemes of their objects. Namely, קרא more frequently takes proper noun phrases as objects, reflecting its use for naming. In the case of adjunct preference, קרא is also distinct in its use of מן to indicate the source of the action. It seems that קרא may also occur more often with adjuncts representing points in time or space, but this hypothesis needs to be more thoroughly tested.
say_speak = get_sim_experiments('>MR[.qal', 'DBR[.piel')
The highest levels of agreement between אמר and דבר are seen in the vd experiments, where the clause-chain context is highest, followed by the window contexts. The top common context words are shown below:
topCommon('>MR[.qal', 'DBR[.piel', 'vd_con_window').head(10)
| | averaged | >MR[.qal | DBR[.piel |
|---|---|---|---|
NTN/ | 7.803781 | 8.888743 | 6.718818 |
GJXZJ/ | 7.333228 | 8.625709 | 6.040746 |
MJKJHW/ | 7.248265 | 8.040746 | 6.455784 |
>XJTPL/ | 7.125709 | 7.625709 | 6.625709 |
>LJC</ | 7.060638 | 8.268157 | 5.853119 |
MRJ/ | 7.040746 | 7.040746 | 7.040746 |
NBWT/ | 7.040746 | 7.040746 | 7.040746 |
JRMJHW/ | 6.946828 | 7.835632 | 6.058024 |
NBL/ | 6.925269 | 6.925269 | 6.925269 |
CVR[ | 6.833228 | 6.040746 | 7.625709 |
The majority of these terms are proper nouns. We do see שׁטר (participle, "officer").
We see high levels of agreement in the pa inventory experiments; allarg is higher due to the combined value of those experiments. We can notice that the object experiment, though, is much lower in its similarity, as was the case between אמר and קרא. Let's look more closely at `vi_objc_pa`.
plotPa('>MR[.qal', 'DBR[.piel', 'vi_objc_pa')
>MR[.qal Objc 0.921019 ø 0.078981 Name: >MR[.qal, dtype: float64 DBR[.piel Objc 0.283664 ø 0.716336 Name: DBR[.piel, dtype: float64
דבר has an aversion to objects that is even stronger than what we saw for קרא (which was about 50/50). We can look at some examples:
B.show(experiments['vi_objc_pa'].target2basis2result['DBR[.piel']['ø'][:10])
[B.show output: 10 verse results, rendered only in the notebook]
Much like with קרא, we notice that when the content of speech is introduced, the verb אמר is used, often as לאמר. But the verb דבר itself prefers object-less clauses. Complements such as את ("with") appear with דבר. This dissimilarity in object preference is also reflected in the rest of the object experiments. The kinds of objects that דבר does appear with refer to speech or a kind of speech, not to the content of the speech.
experiments['vi_objc_lex'].data['DBR[.piel'].sort_values(ascending=False).head(10)
DBR/ 64.0 KL/ 11.0 MCPV/ 9.0 VWBH/ 6.0 VWB/ 6.0 CW>/ 6.0 CQR/ 6.0 KZB/ 5.0 R<H/ 5.0 SRH/ 4.0 Name: DBR[.piel, dtype: float64
We can also see that as compared to אמר, this verb occurs much less often with daughter clauses that describe an action:
topUncommon('>MR[.qal', 'DBR[.piel', 'vi_objc_lex').head(10)
| | difference | >MR[.qal | DBR[.piel |
|---|---|---|---|
KJ_CGGH/ | 8.252665 | 8.252665 | 0.000000 |
ZKR[ | 8.252665 | 8.252665 | 0.000000 |
C_HBL/ | 8.252665 | 8.252665 | 0.000000 |
>CR_>KL[ | 8.252665 | 8.252665 | 0.000000 |
KJ_BL<==[ | 8.252665 | 8.252665 | 0.000000 |
YX/ | 8.252665 | 0.000000 | 8.252665 |
>CR_NWS[ | 8.252665 | 8.252665 | 0.000000 |
BQC[ | 8.252665 | 8.252665 | 0.000000 |
XLQ=/ | 8.252665 | 0.000000 | 8.252665 |
L_<MD[ | 8.252665 | 8.252665 | 0.000000 |
As with אמר and קרא, subject animacy is highly similar between אמר and דבר. This, again, might be expected due to the meaning of these verbs. What of the complement animacy? We saw that similarity in complements plays an important role in the אמר & קרא pairing. In this pairing the complement experiments do not stand out as much. We can visualize the specific differences between the two pairings using `compareChange`, which plots the change in similarity when moving from one pairing to another. This can help us isolate major areas of difference between the two pairs.
compareChange(('>MR[.qal', 'DBR[.piel'),('>MR[.qal', 'QR>[.qal'))
Here we see that דבר has advantages over קרא in its adjunct similarity with אמר, primarily in the vi and vf domain experiments. All other similarities dip as compared with קרא. We see the biggest decreases in complement animacy and domain similarity in this pairing. There is a difference in preference for the complement between אמר and דבר, with the latter occurring with the complement more frequently:
plotPa('>MR[.qal', 'DBR[.piel', 'vi_cmpl_pa')
>MR[.qal Cmpl 0.375498 ø 0.624502 Name: >MR[.qal, dtype: float64 DBR[.piel Cmpl 0.648199 ø 0.351801 Name: DBR[.piel, dtype: float64
But it is also important to note that the complement experiments remain high when compared amongst the other experiments. Here are the similarities and differences in select complement experiments:
topCommon('>MR[.qal', 'DBR[.piel', 'vi_cmpl_animacy')
| | averaged | >MR[.qal | DBR[.piel |
|---|---|---|---|
>L_animate | 5.423023 | 5.999025 | 4.847022 |
<L_animate | 2.942553 | 2.476110 | 3.408996 |
Here we see that אמר and דבר share the similarity found with קרא, i.e. directed action towards a living complement. But why might the complement category be rated so much lower than in the previous pair?
topUncommon('>MR[.qal', 'DBR[.piel', 'vi_cmpl_animacy').head(9)
| | difference | >MR[.qal | DBR[.piel |
|---|---|---|---|
>T==_animate | 4.423044 | 0.000000 | 4.423044 |
L_animate | 4.388867 | 4.388867 | 0.000000 |
L_inanimate | 3.622353 | 3.622353 | 0.000000 |
<M_animate | 3.293196 | 0.000000 | 3.293196 |
B_animate | 2.144152 | 0.000000 | 2.144152 |
<L_inanimate | 1.205904 | 1.205904 | 0.000000 |
>L_animate | 1.152003 | 5.999025 | 4.847022 |
<L_animate | 0.932886 | 2.476110 | 3.408996 |
<D_animate | 0.000000 | 0.000000 | 0.000000 |
We can note key differences here in preposition preference. דבר takes complements with an accompaniment sense ("with"): >T, B, and <M. דבר also does not occur with ל, using אל instead. (It is important to note that דבר does occur with L 10 times in the raw counts, but these counts are apparently too negligible for the PPMI in comparison to their overall use.)
The most significant improvements in similarity between דבר and אמר happen in the adjunct category. Why might that be the case? We see that the biggest jump comes from the adjunct domain experiments. We look at the common values below.
topCommon('>MR[.qal', 'DBR[.piel', 'vi_adj+_domain').head(10)
| | averaged | >MR[.qal | DBR[.piel |
|---|---|---|---|
B_Confident | 5.176589 | 5.176589 | 5.176589 |
L_Speak | 5.138114 | 4.698541 | 5.577687 |
B_Vision | 5.094601 | 5.887082 | 4.302120 |
<L_Furnishings | 4.761551 | 4.761551 | 4.761551 |
Perception | 4.761551 | 4.761551 | 4.761551 |
B_Non-See | 4.454196 | 3.954196 | 4.954196 |
B_Name | 4.354661 | 2.854661 | 5.854661 |
K_Speak | 4.261380 | 3.324146 | 5.198615 |
B_Know | 4.061112 | 4.061112 | 4.061112 |
B_Names of Landforms | 3.617695 | 3.117695 | 4.117695 |
topCommon('>MR[.qal', 'DBR[.piel', 'vf_adju_domain').head(10)
| | averaged | >MR[.qal | DBR[.piel |
|---|---|---|---|
K_Speak|L_Speak | 6.406891 | 5.906891 | 6.906891 |
B_Vision | 5.962406 | 6.754888 | 5.169925 |
<L_Furnishings | 5.169925 | 5.169925 | 5.169925 |
L_Speak | 4.978092 | 4.558967 | 5.397216 |
B_Confident | 4.906891 | 4.906891 | 4.906891 |
B_Name | 4.310568 | 2.906891 | 5.714246 |
K_Speak | 3.791413 | 2.791413 | 4.791413 |
B_Ingest | 3.791413 | 3.791413 | 3.791413 |
L_Know | 3.134301 | 3.134301 | 3.134301 |
L_Dead | 3.111909 | 3.404390 | 2.819428 |
These domains are associated with communication (L_Speak, K_Speak) and human actions (B_Confident, B_Vision, Perception, B_Non-See).
B.show(experiments['vi_adj+_domain'].target2basis2result['DBR[.piel']['B_Confident'])
say_declare = get_sim_experiments('>MR[.qal', 'NGD[.hif')
compareChange(('>MR[.qal', 'NGD[.hif'),('>MR[.qal', 'QR>[.qal'))
There are a lot of decreases here across many categories. The biggest decreases come in the subject domain and lex experiments.
topUncommon('>MR[.qal', 'NGD[.hif', 'vi_subj_domain').head(10)
| | difference | >MR[.qal | NGD[.hif |
|---|---|---|---|
Compare | 7.851749 | 7.851749 | 0.0 |
Titles | 7.484764 | 7.484764 | 0.0 |
Lazy | 7.266787 | 7.266787 | 0.0 |
Old | 6.529821 | 6.529821 | 0.0 |
Names of Deities | 6.096671 | 6.096671 | 0.0 |
Modification | 5.851749 | 5.851749 | 0.0 |
Leaders | 5.540352 | 5.540352 | 0.0 |
Small Animals | 5.529821 | 5.529821 | 0.0 |
Sand | 5.529821 | 5.529821 | 0.0 |
Messengers | 5.436712 | 5.436712 | 0.0 |
There seems to be a lot more diversity in subjects for אמר, which is probably not surprising since it is a more common term. Below we look at differences from the perspective of NGD:
topUncommon('>MR[.qal', 'NGD[.hif', 'vi_subj_domain', focus='NGD[.hif').head(10)
| | difference | >MR[.qal | NGD[.hif |
|---|---|---|---|
Names of People | 5.044394 | 5.404654 | 0.360260 |
Shout | 4.392317 | 0.000000 | 4.392317 |
Tools | 3.459432 | 0.000000 | 3.459432 |
Classes | 3.321928 | 4.573764 | 1.251836 |
Universe | 1.000000 | 1.642296 | 2.642296 |
Wise | 1.000000 | 3.807355 | 2.807355 |
Perception | 0.000000 | 2.722466 | 2.722466 |
The difference in potential values between the two terms is highlighted by seeing how many more values there are for >MR than NGD:
experiments['vi_subj_domain'].data['>MR[.qal'][experiments['vi_subj_domain'].data['>MR[.qal'] > 0].shape # non-zero counts
(50,)
experiments['vi_subj_domain'].data['NGD[.hif'][experiments['vi_subj_domain'].data['NGD[.hif'] > 0].shape
(9,)
There are only 9 subject features attested with נגד. This is likely a cause of many of the differences in the lexical, animacy, and domain experiments.
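Since this non-zero check recurs below, here is a small convenience wrapper (my addition, using the same data access as the cells above):

def nonzero_count(exp_name, lex):
    """Number of basis elements with a non-zero count for lex in an experiment."""
    counts = experiments[exp_name].data[lex]
    return int((counts > 0).sum())

nonzero_count('vi_subj_domain', '>MR[.qal'), nonzero_count('vi_subj_domain', 'NGD[.hif')  # (50, 9), per the shapes above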
One key improvement in this pairing is in object preference, especially as compared with the previous two pairings, which differed considerably in their preference for the object.
plotPa('>MR[.qal', 'NGD[.hif', 'vi_objc_pa')
>MR[.qal
Objc    0.921019
ø       0.078981
Name: >MR[.qal, dtype: float64

NGD[.hif
Objc    0.537975
ø       0.462025
Name: NGD[.hif, dtype: float64
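For reference, the comparison a call like this involves can be sketched in a few lines. The sketch below assumes only that experiments[...].data is the (bases x targets) count table shown at the top of the notebook; plotPa itself is defined elsewhere in the project and may differ in details.

import matplotlib.pyplot as plt

def plot_pa_sketch(exp_name, lex_a, lex_b):
    """Normalize each verb's basis counts to proportions and bar-plot them (a sketch)."""
    data = experiments[exp_name].data
    props = data[[lex_a, lex_b]].apply(lambda col: col / col.sum())
    props.T.plot(kind='bar')  # one cluster of bars per verb
    plt.ylabel('proportion of clauses')
    plt.show()
    return props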
While אמר and נגד still differ, the latter's 54% preference represents an increase over-against קרא.
plotPa('>MR[.qal', 'NGD[.hif', 'vi_subj_animacy')
>MR[.qal
animate      0.992513
inanimate    0.007487
Name: >MR[.qal, dtype: float64

NGD[.hif
animate      0.903226
inanimate    0.096774
Name: NGD[.hif, dtype: float64
We can note that the complement experiments are lower, except in vf_cmpl_animacy, where there is an increase in similarity:
topCommon('>MR[.qal', 'NGD[.hif', 'vf_cmpl_animacy').head(10)
| | averaged | >MR[.qal | NGD[.hif |
|---|---|---|---|
L_animate | 3.564026 | 4.445276 | 2.682775 |
Here again is the important L + animate category that we have seen with the last two terms.
We can also see that ppmi has eliminated some otherwise similar terms from the frame experiment:
topCommon('>MR[.qal', 'NGD[.hif', 'vf_cmpl_animacy', count_type='raw').head(10)
| | averaged | >MR[.qal | NGD[.hif |
|---|---|---|---|
>L_animate | 305.5 | 609.0 | 2.0 |
L_animate | 123.0 | 190.0 | 56.0 |
L_inanimate | 12.5 | 24.0 | 1.0 |
B_animate | 3.5 | 2.0 | 5.0 |
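For orientation, here is a standard positive pointwise mutual information formulation of the kind of weighting that produces this effect; the actual computation lives in semspace.py and may differ in its details. A basis that is frequent overall earns little weight for a verb that rarely selects it, which is why >L_animate can dominate the raw counts yet drop out of the ppmi table above.

import numpy as np
import pandas as pd

def ppmi_sketch(counts):
    """Positive PMI over a (bases x targets) count table (a standard formulation):
    ppmi = max(0, log2(p(basis, target) / (p(basis) * p(target))))."""
    joint = counts / counts.values.sum()  # joint probabilities
    p_basis = joint.sum(axis=1)           # marginal over bases
    p_target = joint.sum(axis=0)          # marginal over targets
    with np.errstate(divide='ignore'):
        pmi = np.log2(joint.div(p_basis, axis=0).div(p_target, axis=1))
    return pmi.clip(lower=0).fillna(0.0)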
Why does the similarity take such a hit in the inventory experiment?
topUncommon('>MR[.qal', 'NGD[.hif', 'vi_cmpl_animacy').head(7)
| | difference | >MR[.qal | NGD[.hif |
|---|---|---|---|
>L_animate | 5.999025 | 5.999025 | 0.000000 |
L_inanimate | 3.622353 | 3.622353 | 0.000000 |
<L_animate | 2.476110 | 2.476110 | 0.000000 |
<L_inanimate | 1.205904 | 1.205904 | 0.000000 |
L_animate | 1.185032 | 4.388867 | 3.203835 |
B_animate | 1.006648 | 0.000000 | 1.006648 |
MN_inanimate | 0.000000 | 0.000000 | 0.000000 |
It would seem, again, that the higher frequency of אמר as compared to נגד results in fewer common values between these two terms.
This is the first surprising pair which we will investigate. In which areas does this pairing differ from the non-surprising matches?
say_give = get_sim_experiments('>MR[.qal', 'NTN[.qal')
We can see high levels of agreement across object experiments. Yet, it is significant that this high level of agreement in the presence/absence of an object is not accompanied by high levels of agreement in the domain and lexeme experiments. This is suspect, since the high levels in the complement experiments, by comparison, were also accompanied by high similarities across domain and lexeme. This may be an important observation.
And now, in comparison to the intuitively similar matches:
compareChange(('>MR[.qal', 'NTN[.qal'),('>MR[.qal', 'QR>[.qal'))
compareChange(('>MR[.qal', 'NTN[.qal'),('>MR[.qal', 'DBR[.piel'))
compareChange(('>MR[.qal', 'NTN[.qal'),('>MR[.qal', 'NGD[.hif'))
We can note a decrease across all three comparisons in the vd_con_clause experiment, indicating a different clause context for these three terms as compared with נתן.
Across all 3 intuitive matches we can see increased similarity in the object presence/absence experiments. This is reflected in the plot below.
plotPa('>MR[.qal', 'NTN[.qal', 'vi_objc_pa')
>MR[.qal
Objc    0.921019
ø       0.078981
Name: >MR[.qal, dtype: float64

NTN[.qal
Objc    0.875145
ø       0.124855
Name: NTN[.qal, dtype: float64
plotPa('>MR[.qal', 'NTN[.qal', 'vf_obj_pa')
>MR[.qal
Objc         0.917243
Objc|Objc    0.002377
ø            0.080380
Name: >MR[.qal, dtype: float64

NTN[.qal
Objc         0.799870
Objc|Objc    0.060233
ø            0.139896
Name: NTN[.qal, dtype: float64
None of the intuitive matches have such close similarity in the preference for objects. We also see that across all 3 pair comparisons there are decreases in the complement experiments.
plotPa('>MR[.qal', 'NTN[.qal', 'vi_cmpl_pa')
>MR[.qal
Cmpl    0.375498
ø       0.624502
Name: >MR[.qal, dtype: float64

NTN[.qal
Cmpl    0.827749
ø       0.172251
Name: NTN[.qal, dtype: float64
נתן takes a complement roughly 45 percentage points more often than אמר. Is its rate also higher than the other terms'?
plotPa('QR>[.qal', 'NTN[.qal', 'vi_cmpl_pa')
QR>[.qal
Cmpl    0.564615
ø       0.435385
Name: QR>[.qal, dtype: float64

NTN[.qal
Cmpl    0.827749
ø       0.172251
Name: NTN[.qal, dtype: float64
plotPa('NGD[.hif', 'NTN[.qal', 'vi_cmpl_pa')
NGD[.hif
Cmpl    0.776758
ø       0.223242
Name: NGD[.hif, dtype: float64

NTN[.qal
Cmpl    0.827749
ø       0.172251
Name: NTN[.qal, dtype: float64
Overall נתן prefers a complement much more often. Though it is close with נגד in the above experiment, this differs in the frame experiment where there is more distance.
plotPa('NGD[.hif', 'NTN[.qal', 'vf_cmpl_pa')
NGD[.hif
Cmpl              0.683333
Cmpl|Cmpl         0.012500
Cmpl|Cmpl|Cmpl    0.000000
ø                 0.304167
Name: NGD[.hif, dtype: float64

NTN[.qal
Cmpl              0.768451
Cmpl|Cmpl         0.045634
Cmpl|Cmpl|Cmpl    0.000563
ø                 0.185352
Name: NTN[.qal, dtype: float64
Overall, we can see that אמר and נתן share closer similarities in their preference for objects but differ in their preference for complements. Namely, נתן requires the complement more often than אמר. These same tendencies are seen when נתן is compared against the other terms that are intuitively similar to אמר.
In short summary, the primary difference between the intuitive matches and this surprising match is that נתן closely shares אמר's object preferences while diverging sharply in its complement preferences.
say_do = get_sim_experiments('>MR[.qal', '<FH[.qal')
As with נתן, we note high similarity in the object pa experiments with low agreement in the object domain and lexeme experiments. As compared with the previous example, though, we see higher levels of similarity in complement presence/absence for both the frame and inventory experiments.
The vi subject lexeme experiment stands out, alongside low levels of similarity in the allarg, coad, adj+, and complement experiments. This situation differs from what we saw with נתן.
compareChange(('>MR[.qal', '<FH[.qal'),('>MR[.qal', 'QR>[.qal'))
We see large decreases in similarity across animacy, domain, and lexeme experiments. The highest increases are in the object experiments and adjunct domain.
The decreases across lexeme, domain, and animacy experiments that are accompanied by higher presence/absence scores raise a question: could it be that surprising matches frequently have higher grammatical similarity than contextual similarity? If we rearrange the order of the bar chart, those differences may be more apparent. We do that here and then test the resulting outcome.
# make NEW bar orders to test grammatical/contextual hypothesis
# we reorder the plots so that grammar comes first and context second
# bar order is handled under the colors argument of my plotting functions
grammar = '''
vg_tense
vi_allarg_pa
vi_coad_pa
vi_adj+_pa
vi_cmpl_pa
vi_objc_pa
vf_argAll_pa
vf_coad_pa
vf_adju_pa
vf_cmpl_pa
vf_obj_pa
'''.strip().split('\n')
context = '''
vd_domain_simple
vd_domain_embed
vd_con_chain
vd_con_clause
vd_con_window
vd_par_lex
vi_subj_animacy
vi_allarg_animacy
vi_coad_animacy
vi_adj+_animacy
vi_cmpl_animacy
vi_objc_animacy
vi_subj_domain
vi_allarg_domain
vi_coad_domain
vi_adj+_domain
vi_cmpl_domain
vi_objc_domain
vi_allarg_lex
vi_coad_lex
vi_subj_lex
vi_adj+_lex
vi_cmpl_lex
vi_objc_lex
vf_argAll_animacy
vf_coad_animacy
vf_adju_animacy
vf_cmpl_animacy
vf_obj_animacy
vf_argAll_domain
vf_coad_domain
vf_adju_domain
vf_cmpl_domain
vf_obj_domain
vf_argAll_lex
vf_coad_lex
vf_adju_lex
vf_cmpl_lex
vf_obj_lex
'''.strip().split('\n')
# map experiment groups to colors
color2experiment2 = ((reds[3], grammar),
                     (blues[4], context))

# make an ordered tuple of (experiment, color) pairs for the plotting functions
expcolors2 = tuple((exp, color)
                   for color, exp_list in color2experiment2
                   for exp in exp_list)
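As a quick check on the regrouping (my addition), the two lists should partition all 50 experiments:

assert set(grammar) | set(context) == set(experiments)  # every experiment is covered
assert not set(grammar) & set(context)                  # and none is double-counted
len(grammar), len(context)  # (11, 39)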
compareChange(('>MR[.qal', '<FH[.qal'),('>MR[.qal', 'QR>[.qal'), colors=expcolors2)
Here we can see that the biggest differences are towards the top of the graph, where the grammatical experiments have now been moved, though there are smaller changes below for objects and adjuncts. The vi_adj+_domain is the biggest contextual gain. There are also slight gains in context in the chain and window, but not in vd_con_clause. This is significant because above we saw vd_con_clause as an experiment that was likewise consistently higher amongst the three intuitive matches. Looking back to נתן, do we see a similar grammar/context divide in the new plot order?
compareChange(('>MR[.qal', 'NTN[.qal'),('>MR[.qal', 'QR>[.qal'), colors=expcolors2)
Here too we see the biggest differences in the pa experiments, namely for the object. There are also two bigger differences in context at the adjunct (as seen above as well) and smaller increases in the object. We can again take note of the decrease in similarity in the vd_con_clause experiment.
Returning to עשׂה, do we see the same grammar/context division with דבר and נגד as the comparison point?
compareChange(('>MR[.qal', '<FH[.qal'),('>MR[.qal', 'DBR[.piel'), colors=expcolors2)
In this comparison the division is even more evident.
compareChange(('>MR[.qal', '<FH[.qal'),('>MR[.qal', 'NGD[.hif'), colors=expcolors2)
The tendency is less marked in the comparison with נגד. But we can note that the main gains in context are restricted to the subject, allarg, coad, and adj+ domains, with no gains in the complement or object domains. The same is true for the gains in subject and adjunct lexemes. Subject animacy is moderately increased. There is also a contextual increase in the domain_embed and con_chain categories. But we can see the tell-tale lower score for vd_con_clause. Overall, the most substantive increases in similarity come in the pa experiments, as we have seen above.
To test the hypothesis most thoroughly, we will link every pa experiment score difference to its corresponding contextual experiment. We can then plot all of the points and fit a line through them to look for a trend.
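Here is a minimal sketch of that test's logic. It assumes, hypothetically, that the per-pairing similarity scores are available as a mapping from experiment name to score (as the get_sim_experiments results above may provide), and the pa-to-context pairing shown is illustrative rather than the full mapping:

import numpy as np
import matplotlib.pyplot as plt
from numpy.polynomial.polynomial import polyfit

pa2context = {'vi_objc_pa': 'vi_objc_domain',    # illustrative pairing only
              'vi_cmpl_pa': 'vi_cmpl_domain',
              'vi_adj+_pa': 'vi_adj+_domain'}

def pa_context_points(sim_scores):
    """(pa score, context score) pairs for one verb pairing; sim_scores is
    assumed to map experiment names to similarity scores."""
    return np.array([(sim_scores[pa], sim_scores[ctx])
                     for pa, ctx in pa2context.items()])

# e.g., with the >MR/<FH scores from above:
# points = pa_context_points(say_do)
# b, m = polyfit(points[:, 0], points[:, 1], 1)  # intercept, slope
# plt.scatter(points[:, 0], points[:, 1])
# xs = np.linspace(points[:, 0].min(), points[:, 0].max(), 50)
# plt.plot(xs, b + m * xs)
# plt.xlabel('pa similarity'); plt.ylabel('context similarity')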
We've seen above that there seems to be a correlation between higher pa experiments, lower domain/lex experiments, and surprising matches. We've looked at the comparison of נתן with קרא above and saw that this was so. What about the last two intuitive matches, דבר and נגד?
compareChange(('>MR[.qal', 'NTN[.qal'),('>MR[.qal', 'DBR[.piel'), colors=expcolors2)
Note that the increases in the object presence/absence experiments here are accompanied by very slight increases in the object domain and lexeme experiments. This is a different situation from what we've seen in other surprising cases. What might be driving these improvements?
topCommon('>MR[.qal', 'NTN[.qal', 'vi_objc_lex')
| | averaged | >MR[.qal | NTN[.qal |
|---|---|---|---|
TCW<H/ | 5.667703 | 4.667703 | 6.667703 |
KBWD/ | 4.809956 | 3.580240 | 6.039672 |
QWL/ | 3.205297 | 0.912815 | 5.497778 |
<BD/ | 3.123382 | 2.123382 | 4.123382 |
R</ | 2.552226 | 2.552226 | 2.552226 |
<LH/ | 1.286881 | 1.286881 | 1.286881 |
DBR/ | 1.251914 | 1.620397 | 0.883432 |
Here we see some legitimate semantic similarities between speaking and giving. One can speak salvation, glory, words, as well as give them. Though, looking at these cases individually, we see that the senses may be different. In the cases with אמר, it seems that an adjunct or complement role might be a more appropriate tagging than object.
# B.show(experiments['vi_objc_lex'].target2basis2result['>MR[.qal']['<BD/']) # uncomment to peruse examples
compareChange(('>MR[.qal', 'NTN[.qal'),('>MR[.qal', 'NGD[.hif'), colors=expcolors2)
In the case of דבר we can definitely see the grammar/context division. With נגד it is less clear. As we saw with עשׂה, the comparison to נגד yields improvements in the pa inventory domain experiments. However, these gains do not fully translate to the pa frame experiments. Crucially, the large gains in the object presence/absence experiments are accompanied by similarity decreases in object domain and lexemes. We also see, as in the last cases, a decrease in vd_con_clause for both דבר and נגד.
top_be = sim['HJH[.qal'].sort_values(ascending=False).head(5)
plotTop(top_be, 'HJH[.qal')
print('Similarity scores for HJH:')
top_be
Similarity scores for HJH:
NTN[.qal    0.612082
<FH[.qal    0.603892
BW>[.qal    0.547493
JY>[.qal    0.478234
QR>[.qal    0.471106
Name: HJH[.qal, dtype: float64
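How the sim table itself is computed is not shown in this section of the notebook. For reference, a pairwise measure such as cosine similarity over each experiment's ppmi table, averaged across the applicable experiments, would yield scores of this shape (cosine here is my assumption, not a documented choice of the project):

import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

def experiment_sim_sketch(ppmi):
    """Verb-by-verb similarities for one experiment's (bases x targets) table (a sketch)."""
    mat = cosine_similarity(ppmi.T.values)  # verbs as rows
    return pd.DataFrame(mat, index=ppmi.columns, columns=ppmi.columns)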
All of the similar terms are surprising for this verb. There is thus the added difficulty that there is no apparent intuitively similar term against which to test the surprising occurrences.
be_give = get_sim_experiments('HJH[.qal', 'NTN[.qal')
The most striking difference in similarity is the negative scores for the presence of an object. Interestingly, these negative similarities in object presence/absence are not accompanied by negative scores for object animacy, domain, or lexeme. Likewise, we see that the comparatively lower scores for the complement presence/absence experiments do not quite show up in the animacy, domain, and lexical experiments.
For the first time we see a low score for vi_subj_animacy, whereas with אמר the paired terms consistently scored higher in this experiment. This makes sense, as היה does not strictly require a living actant the way אמר does.
It looks like there is indeed some parallelism that occurs between נתן and היה as evidenced by vd_par_lex.
Looking across the similarity chart as a whole, we see high scores for the adjunct experiments, especially in the vi_adj+_domain. What kinds of common adjuncts might these be?
topCommon('HJH[.qal', 'NTN[.qal', 'vi_adj+_domain').head(10)
| | averaged | HJH[.qal | NTN[.qal |
|---|---|---|---|
LM<N_Names of Locations | 6.761551 | 6.761551 | 6.761551 |
B_Walls | 6.761551 | 6.761551 | 6.761551 |
B_Leaders | 6.761551 | 6.761551 | 6.761551 |
L_Divide | 6.676589 | 7.176589 | 6.176589 |
<L_Space | 6.676589 | 6.176589 | 7.176589 |
MN_Sacrifice | 6.261551 | 5.761551 | 6.761551 |
MN_Non-Exist | 6.261551 | 6.761551 | 5.761551 |
L_Flee | 6.232104 | 6.439623 | 6.024586 |
L_Intact | 6.176589 | 6.176589 | 6.176589 |
B_Hear | 6.176589 | 6.176589 | 6.176589 |
topCommon('HJH[.qal', 'NTN[.qal', 'vi_adj+_lex').head(10)
| | averaged | HJH[.qal | NTN[.qal |
|---|---|---|---|
>L_QYH/ | 7.204571 | 7.204571 | 7.204571 |
B_PQD[ | 7.204571 | 7.204571 | 7.204571 |
B_MLWN/ | 7.204571 | 7.204571 | 7.204571 |
LM<N_JRWCLM/ | 7.204571 | 7.204571 | 7.204571 |
K_LBB/ | 7.119609 | 7.619609 | 6.619609 |
MN_>XZH/ | 7.119609 | 7.619609 | 6.619609 |
B_GWRL/ | 6.997052 | 6.204571 | 7.789534 |
L_>WR[ | 6.704571 | 7.204571 | 6.204571 |
L_>CH/ | 6.438324 | 5.034646 | 7.842001 |
B_CM<[ | 6.412090 | 7.204571 | 5.619609 |
We can see a lot of important similarities indicated with the L + verb or L + noun constructions. For instance, with L_Divide we can see a series of examples where something can be given or can be as/for something.
#B.show(experiments['vi_adj+_domain'].target2basis2result['HJH[.qal']['L_Divide']) # uncomment to peruse examples
#B.show(experiments['vi_adj+_domain'].target2basis2result['NTN[.qal']['L_Divide']) # uncomment to peruse examples
We can also note that the phrase לאישׁה is an easily recognizable similarity between these two verbs: to be "as a wife" and to be given "as a wife." Thus, the similarities in adjunct are important and do indeed show some crossover in semantic sense between נתן and היה.
be_do = get_sim_experiments('HJH[.qal', '<FH[.qal')
As was the case with נתן, we see large drops in the presence/absence of an object. There is also a similarity drop in subject animacy. The strongest points of similarity appear to happen in the adjunct experiments, for all categories but animacy. The frame object animacy is notably higher in similarity by comparison to the other experiments in that group.
Moving to the next two verbs, בוא and יצא, we see a significant increase in similarity for the object pa experiments:
be_come = get_sim_experiments('HJH[.qal', 'BW>[.qal')
Here we can see the similarity between היה and the movement verb בוא in that they both often lack an object:
plotPa('HJH[.qal', 'BW>[.qal', 'vi_objc_pa')
HJH[.qal
Objc    0.013037
ø       0.986963
Name: HJH[.qal, dtype: float64

BW>[.qal
Objc    0.010526
ø       0.989474
Name: BW>[.qal, dtype: float64
Indeed, we would expect היה and בוא to never take an object. But the object's presence here is due to idiosyncrasies in the ETCBC encoding, which occasionally encodes complementized elements as objects.
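Those cases can be perused with the notebook's usual inspection pattern; the basis key 'Objc' below is an assumption based on the vi_objc_pa labels shown above.

# B.show(experiments['vi_objc_pa'].target2basis2result['HJH[.qal']['Objc'])  # uncomment to peruse examples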
One key difference between היה and בוא comes in the complements experiments, where similarity is consistently lower.
plotPa('HJH[.qal', 'BW>[.qal', 'vi_cmpl_pa')
HJH[.qal
Cmpl    0.237192
ø       0.762808
Name: HJH[.qal, dtype: float64

BW>[.qal
Cmpl    0.667852
ø       0.332148
Name: BW>[.qal, dtype: float64
topUncommon('HJH[.qal', 'BW>[.qal', 'vi_cmpl_lex').head(10)
| | difference | HJH[.qal | BW>[.qal |
|---|---|---|---|
<D=/ | 8.134426 | 8.134426 | 0.000000 |
B_M<RB=/ | 8.134426 | 8.134426 | 0.000000 |
CQV[ | 8.134426 | 0.000000 | 8.134426 |
CWR==/ | 8.134426 | 0.000000 | 8.134426 |
DN/ | 8.134426 | 0.000000 | 8.134426 |
DRK/_>JLM/ | 8.134426 | 0.000000 | 8.134426 |
DRK/_>TRJM/ | 8.134426 | 0.000000 | 8.134426 |
GZR=/ | 8.134426 | 0.000000 | 8.134426 |
KKR/ | 8.134426 | 8.134426 | 0.000000 |
K_<VH[ | 8.134426 | 8.134426 | 0.000000 |
The presence of locations in the complement lex inventory for בוא attests to its status as a movement verb. Notably, many of these slots are empty for היה.
Finally, as with the last example, the adjunct experiments consistently score higher.
topCommon('HJH[.qal', 'BW>[.qal', 'vi_adj+_lex').head(10)
| | averaged | HJH[.qal | BW>[.qal |
|---|---|---|---|
>T==_<M/ | 7.204571 | 7.204571 | 7.204571 |
B_>CMRT/ | 7.204571 | 7.204571 | 7.204571 |
B_>CMH/ | 7.204571 | 7.204571 | 7.204571 |
ZKR=/ | 7.119609 | 6.619609 | 7.619609 |
B_CLX[ | 6.704571 | 7.204571 | 6.204571 |
B_QWM[ | 6.704571 | 6.204571 | 7.204571 |
B_MYWR=/ | 6.619609 | 6.619609 | 6.619609 |
B_NS<[ | 6.412090 | 7.204571 | 5.619609 |
L_<T/ | 6.251126 | 6.619609 | 5.882643 |
>XR/_DBR/ | 6.204571 | 7.789534 | 4.619609 |
One of the similarities we see here is the B + verb pattern, which anchors the verb in an event.
#B.show(experiments['vi_adj+_lex'].target2basis2result['BW>[.qal']['B_QWM['])
In brief summary then, we see that היה and בוא share high similarities in their lack of an object as well as in their adjunct elements. This is different from the situation between היה and נתן and היה and עשׂה. However, they differ significantly in their preference for a complement.
As one last note, we can see a small increase in subject animacy as compared with the last cases.
plotPa('HJH[.qal', 'BW>[.qal', 'vi_subj_animacy')
HJH[.qal
animate      0.519796
inanimate    0.480204
Name: HJH[.qal, dtype: float64

BW>[.qal
animate      0.836299
inanimate    0.163701
Name: BW>[.qal, dtype: float64
בוא has more inanimate subjects than we might expect. But perusing the cases in the HB shows how this verb is often used with non-living but moving entities.
#B.show(experiments['vi_subj_animacy'].target2basis2result['BW>[.qal']['inanimate']) # uncomment to view full results
be_goout = get_sim_experiments('HJH[.qal', 'JY>[.qal')
Like the comparison with בוא, the complement experiments show lower levels of similarity than their comparative experiments. We see very high levels of similarity in object experiments. The adjunct experiments also exhibit a higher level of similarity. The vd experiments appear slightly lower than in the pairing with בוא.
Below we plot the object agreement.
plotPa('HJH[.qal', 'BW>[.qal', 'vi_objc_pa')
HJH[.qal
Objc    0.013037
ø       0.986963
Name: HJH[.qal, dtype: float64

BW>[.qal
Objc    0.010526
ø       0.989474
Name: BW>[.qal, dtype: float64
This is the same shape which we saw in the בוא comparison.
Which adjunct features does יצא share with היה?
topCommon('HJH[.qal', 'JY>[.qal', 'vi_adj+_lex').head(10)
| | averaged | HJH[.qal | JY>[.qal |
|---|---|---|---|
K_JY>[ | 6.997052 | 7.789534 | 6.204571 |
L_PQD[ | 6.704571 | 6.204571 | 7.204571 |
B_CBJ<J/ | 6.382643 | 6.882643 | 5.882643 |
B_>SP[ | 6.204571 | 6.204571 | 6.204571 |
L_PNH[ | 6.119609 | 6.619609 | 5.619609 |
K_GBWR/ | 6.119609 | 6.619609 | 5.619609 |
K_KLH[ | 5.745140 | 7.745140 | 3.745140 |
B_YHRJM/ | 5.704571 | 6.204571 | 5.204571 |
MN_MXRT/ | 5.531229 | 7.381449 | 3.681009 |
>XR/_KN | 5.441298 | 6.552494 | 4.330102 |
Many of the matches here are with verbal adjuncts, one of the most interesting being כיצא, since it shares the root of our current target verb. We also see a higher number of ב, ל, and כ prepositions; ל is listed twice. There are also three time markers: בצהרים, אחר כן, ממחרת.
In short summary, all surprisingly matched verbs have in common a higher score for adjunct experiments. There are two categories of surprising matches:

1. verbs which differ in both their complement and object preferences but share a high degree of adjunct agreement (NTN and <FH);
2. verbs that share object and adjunct preferences but differ in their complement preferences (BW> and JY>).
Since היה had no intuitively similar matches, we delay further comparisons with the compareChange plots. נתן and בוא are part of the case study set and have intuitive matches that can be used to compare over against היה.
be_call = get_sim_experiments('HJH[.qal', 'QR>[.qal')
The object experiments score low, as well as the complement experiments.
top_do = sim['<FH[.qal'].sort_values(ascending=False).head(5)
plotTop(top_do, '<FH[.qal')
print('Similarity scores for <FH:')
top_do
Similarity scores for <FH:
NTN[.qal    0.695756
HJH[.qal    0.603892
LQX[.qal    0.548989
BNH[.qal    0.530478
QR>[.qal    0.523806
Name: <FH[.qal, dtype: float64
Here we see a curving shape without the distinctive "elbow" that has been present in the past plots. The closest term to an intuitive match is BNH[.qal. Notably, there is a sharp drop in similarity of nearly 0.1 from NTN to HJH.
do_build = get_sim_experiments('<FH[.qal', 'BNH[.qal')
Here we see a lot of agreement in parallelism (vd_par_lex), at ~0.30. Subject animacy is highly similar, as is object animacy. The next most similar is the adjunct argument, which shows high similarity across the plot. The complement is next, showing higher similarities in the frame experiments than in the inventory.
While the object is higher in the pa and animacy experiments, it has the smallest similarity in the domain and lexeme experiments.
This leaves the adjunct and complement as the remaining two elements that consistently score higher across all kinds of datapoints.
topCommon('<FH[.qal', 'BNH[.qal', 'vf_adju_domain')
| | averaged | <FH[.qal | BNH[.qal |
|---|---|---|---|
L_Names of Deities | 4.962406 | 5.169925 | 4.754888 |
B_Names of Landforms | 4.684498 | 4.684498 | 4.684498 |
B_Names of Groups | 4.371969 | 5.775646 | 2.968291 |
B_Buildings | 4.073602 | 4.977280 | 3.169925 |
B_Names of People | 3.976979 | 4.269461 | 3.684498 |
L_Orientation | 3.962406 | 4.754888 | 3.169925 |
K>CR_Speak | 3.938844 | 6.231326 | 1.646363 |
B_Names of Locations | 3.807042 | 3.731041 | 3.883044 |
B_Landscapes | 3.587317 | 3.955800 | 3.218835 |
B_Land | 3.462106 | 4.462106 | 2.462106 |
Orientation | 3.404390 | 3.404390 | 3.404390 |
L_Space | 3.376376 | 3.376376 | 3.376376 |
B_Landforms | 3.321928 | 3.321928 | 3.321928 |
K_Time | 3.284334 | 4.076816 | 2.491853 |
Event Referents: Location | 3.243926 | 3.243926 | 3.243926 |
B_Towns | 3.084963 | 3.584963 | 2.584963 |
B_Orientation | 2.917788 | 4.647504 | 1.188072 |
MN_Orientation | 2.791413 | 3.791413 | 1.791413 |
L_Groups | 2.404390 | 2.404390 | 2.404390 |
L_Move | 1.468291 | 1.968291 | 0.968291 |
L_Possess | 1.352302 | 1.352302 | 1.352302 |
We see a lot of similarity in ב adjuncts. But ל and כ adjuncts are also attested. The highest similarity comes in L_Names of Deities, sampled below.
# B.show(sampleBases(('<FH[.qal', 'BNH[.qal'), 'L_Names of Deities', 'vf_adju_domain'), condenseType='clause') # uncomment me!
These are interesting parallels, which all show the assembling of cultic objects or ceremonies to a deity. We also sample one of the ב adjuncts:
# B.show(sampleBases(('<FH[.qal', 'BNH[.qal'), 'B_Names of Landforms', 'vf_adju_domain'), condenseType='clause') # uncomment me
The complements showed relatively higher similarity, especially in the pa and frame experiments.
plotPa('<FH[.qal', 'BNH[.qal', 'vi_cmpl_pa')
<FH[.qal
Cmpl    0.290679
ø       0.709321
Name: <FH[.qal, dtype: float64

BNH[.qal
Cmpl    0.242424
ø       0.757576
Name: BNH[.qal, dtype: float64
Below we sample the top common lexical complements in the frame experiment.
topCommon('<FH[.qal', 'BNH[.qal', 'vf_cmpl_lex')
| | averaged | <FH[.qal | BNH[.qal |
|---|---|---|---|
L_MQNH/ | 7.060696 | 7.060696 | 7.060696 |
L_<LH/ | 6.238768 | 6.738768 | 5.738768 |
L_PR<H/ | 5.183252 | 6.475733 | 3.890771 |
L_CM/ | 4.975733 | 3.475733 | 6.475733 |
L_YB>/ | 3.973233 | 3.973233 | 3.973233 |
L_DWD==/ | 3.861324 | 4.153805 | 3.568843 |
L_JHWH/ | 3.856615 | 3.663104 | 4.050127 |
L_>CH/ | 3.805808 | 4.305808 | 3.305808 |
L_>LHJM/ | 3.129959 | 4.129959 | 2.129959 |
These are all ל complements. This is probably significant.
# B.show(sampleBases(('<FH[.qal', 'BNH[.qal'), 'L_MQNH/', 'vf_cmpl_lex')) # uncomment me
These complements both show structures being constructed for their cattle.
# B.show(sampleBases(('<FH[.qal', 'BNH[.qal'), 'L_<LH/', 'vf_cmpl_lex')) # uncomment me
Here we also see cultic contexts. In the case of עשׂה there is an animal prepared "for the burnt offering"; in the case of בנה it is an altar built "for the burnt offering."
Looking at the vf complement domain experiment, the ל complements are again the majority, with a single ב complement.
topCommon('<FH[.qal', 'BNH[.qal', 'vf_cmpl_domain')
| | averaged | <FH[.qal | BNH[.qal |
|---|---|---|---|
L_Sacrifice | 5.559469 | 5.927952 | 5.190987 |
L_Name | 4.493047 | 2.993047 | 5.993047 |
L_Names of Deities | 3.799081 | 3.569366 | 4.028797 |
L_Buildings | 3.644998 | 4.805962 | 2.484034 |
L_Possess | 3.301170 | 3.301170 | 3.301170 |
L_People | 2.719182 | 4.011663 | 1.426700 |
L_Deities | 2.644998 | 3.805962 | 1.484034 |
L_Leaders | 2.569523 | 4.693487 | 0.445559 |
L_Kinship | 1.823290 | 3.553006 | 0.093574 |
B_Buildings | 1.762750 | 1.762750 | 1.762750 |
The complement arguments appear to have more לs than do the adjuncts. But the fact that they both have ל suggests that there may be inconsistencies in the tagging of these arguments. This may be reason to look more closely at the coad experiment, which combines both adjuncts and complements.
One observation we can also make here is how these arguments are different from those seen with נתן, a verb that likewise uses ל often in the complement, but also shares the similarity of אל with its similar terms (see the analysis of נתן below).
Now we turn to the surprisingly similar terms. But in doing so, we recognize that there may be room for legitimate, unexpected similarities with עשׂה, a term that can be used in many different ways. In calling נתן a surprising similarity, for instance, we should keep in mind that נתן and עשׂה likely have some shared semantic nuances.
do_give = get_sim_experiments('<FH[.qal', 'NTN[.qal')
We note smaller similarities across all complement experiments. The object experiments score highly across the board. The allarg and adjunct experiments also show higher similarities.
One thing we can note already is that while the עשׂה and בנה similarity was complement-strong and object-weak, these similarities are complement-weak and object-strong. The basic difference in complement preference is plotted below.
plotPa('<FH[.qal', 'NTN[.qal', 'vi_cmpl_pa')
<FH[.qal
Cmpl    0.290679
ø       0.709321
Name: <FH[.qal, dtype: float64

NTN[.qal
Cmpl    0.827749
ø       0.172251
Name: NTN[.qal, dtype: float64
We sample the most common object arguments.
topCommon('<FH[.qal', 'NTN[.qal', 'vi_objc_domain').head(10)
| | averaged | <FH[.qal | NTN[.qal |
|---|---|---|---|
Cover | 6.421764 | 6.129283 | 6.714246 |
Smell | 6.129283 | 5.129283 | 7.129283 |
Cords | 5.746392 | 5.503679 | 5.989105 |
Valuable | 5.629283 | 5.129283 | 6.129283 |
Furnishings | 5.557829 | 6.287545 | 4.828113 |
Objects | 5.544321 | 5.544321 | 5.544321 |
Weight | 5.436638 | 4.936638 | 5.936638 |
Metal | 5.423400 | 5.059909 | 5.786891 |
Safe | 5.421764 | 5.129283 | 5.714246 |
Substances | 5.351675 | 5.351675 | 5.351675 |
עשׂה is an interesting parallel in that it exhibits a high level of object similarity and presence/absence. The object animacy, domain, and lexical similarity seems to come from these verbs often being used with similar kinds of objects, especially with materials and valuables: Valuable, Furnishings, Objects, Weight, Metal, Substances. These are things that can be fashioned or given.
B.show(sampleBases(('<FH[.qal', 'NTN[.qal'), 'Cover', 'vi_objc_domain'), condenseType='clause')
compareChange(('<FH[.qal', 'NTN[.qal'), ('<FH[.qal', 'BNH[.qal'), colors=expcolors2)
In comparison to the <FH/BNH pairing, <FH/NTN has far less similarity in the presence of complement elements. However, with the exception of subject and argAll animacy, there are gains in the animacy, domain, and lexeme experiments. Notably, the highest gain comes in vd_con_clause, which we have seen as an important differentiator in the אמר pairs.
We can note small areas of decrease in vi_cmpl_animacy and vf_cmpl_animacy. The gains for other complement experiments are much smaller in comparison to adjunct and object gains.
The decrease in vf_argAll_animacy is potentially important:
topUncommon('<FH[.qal', 'NTN[.qal', 'vf_argAll_animacy').head(10)
| | difference | <FH[.qal | NTN[.qal |
|---|---|---|---|
Objc.inanimate|Objc.inanimate|adj+.L_animate | 7.693487 | 7.693487 | 0.000000 |
adj+.<D_inanimate|adj+.MN_inanimate | 7.693487 | 7.693487 | 0.000000 |
Objc.inanimate|Objc.inanimate|adj+.B_inanimate | 7.693487 | 7.693487 | 0.000000 |
Cmpl.L_animate|Objc.animate|adj+.L_animate | 7.693487 | 0.000000 | 7.693487 |
Cmpl.K_animate|Objc.animate | 7.693487 | 7.693487 | 0.000000 |
Objc.animate|adj+.L_inanimate|adj+.inanimate | 7.693487 | 7.693487 | 0.000000 |
Cmpl.<L_inanimate|Objc.inanimate|Objc.inanimate | 7.693487 | 7.693487 | 0.000000 |
Cmpl.L_animate|Objc.animate|adj+.LM<N_inanimate | 7.693487 | 0.000000 | 7.693487 |
Cmpl.<M_inanimate|Objc.inanimate | 7.693487 | 7.693487 | 0.000000 |
Cmpl.L_animate|Objc.animate|adj+.LM<N_animate | 7.693487 | 0.000000 | 7.693487 |
# B.show(sampleBases(('NTN[.qal',), 'Cmpl.L_animate|Objc.animate|adj+.L_animate', 'vf_argAll_animacy')) # uncomment me
# B.show(sampleBases(('<FH[.qal',), 'Objc.inanimate|Objc.inanimate|adj+.L_animate', 'vf_argAll_animacy')) # uncomment me
These frames show different aspects of these two verbs' semantics.
The difference in complement animacy is shown below.
topUncommon('<FH[.qal', 'NTN[.qal', 'vi_cmpl_animacy', focus='NTN[.qal').head(20)
| | difference | <FH[.qal | NTN[.qal |
|---|---|---|---|
TXT/_animate | 5.568474 | 0.000000 | 5.568474 |
>YL/_inanimate | 5.121015 | 0.000000 | 5.121015 |
TXT/_inanimate | 4.742504 | 0.000000 | 4.742504 |
BJN/_inanimate | 3.442943 | 0.000000 | 3.442943 |
<L_inanimate | 2.678072 | 0.720477 | 3.398549 |
K_inanimate | 2.459432 | 3.488747 | 5.948179 |
<L_animate | 2.459432 | 0.016679 | 2.476110 |
BJN/_animate | 2.355481 | 0.000000 | 2.355481 |
B_inanimate | 2.321928 | 0.159598 | 2.481526 |
>L_inanimate | 2.221356 | 0.000000 | 2.221356 |
L_inanimate | 1.941897 | 4.857981 | 2.916084 |
MN_inanimate | 1.576695 | 0.000000 | 1.576695 |
animate | 1.214125 | 0.000000 | 1.214125 |
>L_animate | 1.049066 | 0.000000 | 1.049066 |
K_animate | 1.000000 | 5.273018 | 4.273018 |
L_animate | 0.932886 | 3.900766 | 4.833652 |
inanimate | 0.678072 | 0.000000 | 0.678072 |
MN_animate | 0.321928 | 1.258068 | 1.579996 |
B_animate | 0.263034 | 2.591611 | 2.854645 |
Here we see a variety of uncommon terms. The "focus" is placed on NTN to show the primary differences from <FH as concerns positive values in NTN. The תחת + animate construction is an important aspect of נתן's semantics: it can be used to show a person taking another person's place.
# B.show(sampleBases(('NTN[.qal',), 'TXT/_animate', 'vi_cmpl_animacy'), condenseType='clause') # uncomment me
Finally, we examine the adjunct similarities with נתן.
topCommon('NTN[.qal', '<FH[.qal', 'vf_adju_domain').head(15)
| | averaged | NTN[.qal | <FH[.qal |
|---|---|---|---|
<L_Wrong | 6.491853 | 6.491853 | 6.491853 |
K_Wrong | 5.962406 | 5.169925 | 6.754888 |
K_Right | 5.906891 | 5.906891 | 5.906891 |
L_Bear | 5.906891 | 5.906891 | 5.906891 |
L_Smell | 5.491853 | 5.491853 | 5.491853 |
B_Metal | 5.406891 | 4.906891 | 5.906891 |
K_Complete | 5.314708 | 3.522227 | 7.107189 |
MN_Land | 5.169925 | 5.169925 | 5.169925 |
L_Clean | 5.114409 | 4.321928 | 5.906891 |
K_Happen | 5.076816 | 3.491853 | 6.661778 |
B_Roads | 5.032421 | 4.032421 | 6.032421 |
L_Complete | 4.906891 | 4.906891 | 4.906891 |
K>CR_Speak | 4.842522 | 3.453718 | 6.231326 |
K>CR_See | 4.821928 | 5.321928 | 4.321928 |
L_Unclean | 4.821928 | 5.321928 | 4.321928 |
#B.show(sampleBases(('NTN[.qal', '<FH[.qal'), 'K_Right', 'vf_adju_domain'), condenseType='clause') # uncomment me
The first three similarities all deal with how one is treated "according to [standard]", either something that is repaid (NTN) or done (<FH) according to that standard. This tendency is seen several times with the כ constructions.
We also see multiple instances of L + verb which show the purpose that something is given for or made for.
#B.show(sampleBases(('NTN[.qal', '<FH[.qal'), 'L_Clean', 'vf_adju_domain')) # uncomment me
NTN and <FH exhibit a high degree of object and adjunct similarity. The main difference can be seen in the complement experiments, especially those which measure animacy. The vf_argAll_animacy frame scores especially low due to differences in the meaning of these two terms.
Previously, when we encountered this pairing, we had no intuitively similar term with which to compare היה and עשׂה. Now we can utilize בנה to show more clearly the differences between these two verbs.
do_be = get_sim_experiments('<FH[.qal', 'HJH[.qal')
As we have already seen, the object experiments show low similarity levels. Now we compare this using בנה. We will continue to use the new order that places pa experiments at the top.
compareChange(('<FH[.qal', 'HJH[.qal'), ('<FH[.qal', 'BNH[.qal'), colors=expcolors2) # NB modified order
Despite many similarities across vd, animacy, domain, and lex experiments, we see big similarity decreases in allarg, object, and subject. allarg and object pa experiments are lower due to the apparent difference in object preference between עשׂה and היה as compared with עשׂה and בנה. The decrease in subject animacy is likely also important since עשׂה probably requires an active subject. There is also a decrease in object animacy.
plotPa('<FH[.qal', 'HJH[.qal', 'vi_subj_animacy')
<FH[.qal
animate      0.984655
inanimate    0.015345
Name: <FH[.qal, dtype: float64

HJH[.qal
animate      0.519796
inanimate    0.480204
Name: HJH[.qal, dtype: float64
plotPa('<FH[.qal', 'HJH[.qal', 'vi_objc_animacy')
<FH[.qal
>CR_animate    0.000000
C_animate      0.000000
KJ_animate     0.000000
animate        0.145914
inanimate      0.854086
Name: <FH[.qal, dtype: float64

HJH[.qal
>CR_animate    0.000000
C_animate      0.000000
KJ_animate     0.000000
animate        0.551724
inanimate      0.448276
Name: HJH[.qal, dtype: float64
HJH and <FH show differences in the object experiments and in subject and object animacy.
do_take = get_sim_experiments('<FH[.qal', 'LQX[.qal')
LQX and <FH have high object similarity but low complement similarity. While the complement scores high similarities in the pa experiments, the contextual experiments show lower values. Other areas of similarity include subject and object animacy. Below are the main differences in the complement experiments.
topUncommon('<FH[.qal', 'LQX[.qal', 'vi_cmpl_lex', focus='<FH[.qal').head(15)
| | difference | <FH[.qal | LQX[.qal |
|---|---|---|---|
<D_>JN/ | 8.134426 | 8.134426 | 0.0 |
B_LV/ | 8.134426 | 8.134426 | 0.0 |
K_XMH/ | 8.134426 | 8.134426 | 0.0 |
K_TWRH/ | 8.134426 | 8.134426 | 0.0 |
K_TW<BH/ | 8.134426 | 8.134426 | 0.0 |
K_THW/ | 8.134426 | 8.134426 | 0.0 |
K_TBNJT/ | 8.134426 | 8.134426 | 0.0 |
K_SJSR>/ | 8.134426 | 8.134426 | 0.0 |
K_RYWN/ | 8.134426 | 8.134426 | 0.0 |
K_RC<=/ | 8.134426 | 8.134426 | 0.0 |
K_QN>H/ | 8.134426 | 8.134426 | 0.0 |
K_MDJN=/ | 8.134426 | 8.134426 | 0.0 |
K_MCPV/ | 8.134426 | 8.134426 | 0.0 |
K_JBJN/ | 8.134426 | 8.134426 | 0.0 |
K_DG/ | 8.134426 | 8.134426 | 0.0 |
topUncommon('<FH[.qal', 'LQX[.qal', 'vi_cmpl_lex', focus='LQX[.qal').head(20)
| | difference | <FH[.qal | LQX[.qal |
|---|---|---|---|
TXT/_BKR/ | 8.134426 | 0.0 | 8.134426 |
MN_P>RN/ | 8.134426 | 0.0 | 8.134426 |
L_>RK/ | 8.134426 | 0.0 | 8.134426 |
MN_NWH/ | 8.134426 | 0.0 | 8.134426 |
MN_MXYJT/ | 8.134426 | 0.0 | 8.134426 |
MN_MKL>/ | 8.134426 | 0.0 | 8.134426 |
L_MK>WB/ | 8.134426 | 0.0 | 8.134426 |
MN_M>KL/ | 8.134426 | 0.0 | 8.134426 |
L_RQX==/ | 8.134426 | 0.0 | 8.134426 |
L_VBXH=/ | 8.134426 | 0.0 | 8.134426 |
MN_LG/ | 8.134426 | 0.0 | 8.134426 |
L_XPJM/ | 8.134426 | 0.0 | 8.134426 |
MN_LCKH/ | 8.134426 | 0.0 | 8.134426 |
MN>CR_MY>[ | 8.134426 | 0.0 | 8.134426 |
MNH_BW>[ | 8.134426 | 0.0 | 8.134426 |
MN_KWN=/ | 8.134426 | 0.0 | 8.134426 |
MN_KBF/ | 8.134426 | 0.0 | 8.134426 |
MN_<DP[ | 8.134426 | 0.0 | 8.134426 |
MN_CLH/ | 8.134426 | 0.0 | 8.134426 |
MN_<Z==/ | 8.134426 | 0.0 | 8.134426 |
Here we can see that עשׂה has many more uses of K + noun, whereas לקח has more MN + noun. Intuitively this makes sense: "do according to" and "take from".
The domain experiments show a similar difference, but עשׂה shows a bit more diversity in its primary differences.
topUncommon('<FH[.qal', 'LQX[.qal', 'vi_cmpl_domain', focus='<FH[.qal').head(20)
| | difference | <FH[.qal | LQX[.qal |
|---|---|---|---|
L_Intense | 7.800900 | 7.800900 | 0.0 |
L_Names of Waterbodies | 7.800900 | 7.800900 | 0.0 |
K_Artifacts | 7.800900 | 7.800900 | 0.0 |
K_Happen | 7.800900 | 7.800900 | 0.0 |
>T_Kinship | 7.800900 | 7.800900 | 0.0 |
K_Laws | 7.800900 | 7.800900 | 0.0 |
K_Worthless | 7.800900 | 7.800900 | 0.0 |
K_Wrong | 7.800900 | 7.800900 | 0.0 |
LM<N_Name | 7.800900 | 7.800900 | 0.0 |
<M_Universe | 7.800900 | 7.800900 | 0.0 |
L_Extraordinary | 7.800900 | 7.800900 | 0.0 |
K_Angry | 7.800900 | 7.800900 | 0.0 |
K_Aquatic Animals | 7.800900 | 7.800900 | 0.0 |
<M_Land | 7.800900 | 7.800900 | 0.0 |
L_Non-Space | 7.800900 | 7.800900 | 0.0 |
<M_Dead | 7.800900 | 7.800900 | 0.0 |
<M_Classes | 7.800900 | 7.800900 | 0.0 |
MN_Cloth | 7.800900 | 7.800900 | 0.0 |
MN_Cover | 7.800900 | 7.800900 | 0.0 |
K_Complete | 7.537865 | 7.537865 | 0.0 |
There are more L + domains here.
topUncommon('<FH[.qal', 'LQX[.qal', 'vi_cmpl_domain', focus='LQX[.qal').head(20)
| | difference | <FH[.qal | LQX[.qal |
|---|---|---|---|
MN_Capacity | 7.800900 | 0.0 | 7.800900 |
MN_Clothing | 7.800900 | 0.0 | 7.800900 |
L_Clean | 7.800900 | 0.0 | 7.800900 |
MNH_Happen | 7.800900 | 0.0 | 7.800900 |
<L_Free | 7.800900 | 0.0 | 7.800900 |
>L_Young | 7.800900 | 0.0 | 7.800900 |
MN_Stalls | 7.800900 | 0.0 | 7.800900 |
MN_Walls | 7.800900 | 0.0 | 7.800900 |
MN>CR_Search | 7.800900 | 0.0 | 7.800900 |
MN_Apart | 7.385862 | 0.0 | 7.385862 |
MN_Food | 7.215937 | 0.0 | 7.215937 |
MN_Birds | 6.800900 | 0.0 | 6.800900 |
MN_Animals | 6.478972 | 0.0 | 6.478972 |
MN_Fruits | 6.215937 | 0.0 | 6.215937 |
L_Weak | 6.215937 | 0.0 | 6.215937 |
MN_Tents | 6.215937 | 0.0 | 6.215937 |
MN_Tombs | 6.215937 | 0.0 | 6.215937 |
TXT/_Kinship | 6.215937 | 0.0 | 6.215937 |
>L_Detach | 6.215937 | 0.0 | 6.215937 |
MN_Complete | 6.100460 | 0.0 | 6.100460 |
In this domain experiment we likewise see the dominance of MN, as with the lexical experiments.
Now we compare this pairing with the עשׂה and בנה pairing.
compareChange(('<FH[.qal', 'LQX[.qal'), ('<FH[.qal', 'BNH[.qal'), colors=expcolors2) # NB modified order
The biggest gains come in the object experiments. Note, though, that the higher levels of similarity in the object context experiments do not result in a substantive gain in the presence/absence experiments. There are also gains in the vd experiments.
do_call = get_sim_experiments('<FH[.qal', 'QR>[.qal')
This pairing shows high similarity across the pa experiments, with the biggest in the adju_pa experiments. Yet the context experiments do not bear this out, showing lower figures, most notably in the frame experiments.
The comparison with בנה brings out these differences.
compareChange(('<FH[.qal', 'QR>[.qal'), ('<FH[.qal', 'BNH[.qal'), colors=expcolors2) # NB modified order
The pa experiments actually show much lower levels of agreement when compared with בנה. The biggest decreases come in object animacy, complement pa, and complement animacy. The biggest gains come from argall pa, con_clause, subj_domain, and subj_lex.
If agreement of the complement/object is defined in terms of good similarity across both pa and context experiments, we can say that the עשׂה and קרא pairing exhibits similarities in neither the object nor the complement arguments, and primarily contains similarities in adjunct and subject use.
top_go = sim['BW>[.qal'].sort_values(ascending=False).head(5)
plotTop(top_go, 'BW>[.qal')
print('Similarity scores for BW>:')
top_go
Similarity scores for BW>:
<LH[.qal    0.819233
HLK[.qal    0.752115
JY>[.qal    0.743727
CWB[.qal    0.727242
JRD[.qal    0.682752
Name: BW>[.qal, dtype: float64
All of these terms are intuitive matches and these are the highest similarity scores we have seen up to this point. Below I show the top seven, in order to see where the main decreases in similarity occur.
plotTop(sim['BW>[.qal'].sort_values(ascending=False).head(7), 'BW>[.qal')
At the hifil of BW> we see the "elbow."
go_ascend = get_sim_experiments('BW>[.qal' , '<LH[.qal')
Immediately we can see the "flat plateau" shape of the pa experiments, which show strong agreement in all categories, especially the complement and object categories. Throughout the domain, animacy, and lexeme experiments we continue to see high agreement in the complement, while the object slot apparently has too few samples to measure any agreement.
Subject domain and lex are noticeably lower. The vf animacy experiments are also lower in general, with the complement showing the lowest level of similarity. We look into that experiment below.
topUncommon('BW>[.qal', '<LH[.qal', 'vf_cmpl_animacy', focus='BW>[.qal')
| | difference | BW>[.qal | <LH[.qal |
|---|---|---|---|
>L_inanimate|MN_animate | 7.303781 | 7.303781 | 0.000000 |
<D_inanimate|MN_inanimate | 7.303781 | 7.303781 | 0.000000 |
NGD/_inanimate | 7.303781 | 7.303781 | 0.000000 |
B_inanimate|MN_inanimate | 7.303781 | 7.303781 | 0.000000 |
<L_animate|L_inanimate | 7.303781 | 7.303781 | 0.000000 |
>XR/_animate|MN_inanimate | 7.303781 | 7.303781 | 0.000000 |
>L_inanimate|inanimate | 6.566815 | 6.566815 | 0.000000 |
>L_animate|>L_inanimate | 6.515285 | 6.515285 | 0.000000 |
<L_inanimate|MN_inanimate | 5.718818 | 5.718818 | 0.000000 |
>L_inanimate|B_inanimate | 5.718818 | 5.718818 | 0.000000 |
<D_animate | 5.625709 | 5.625709 | 0.000000 |
<D_inanimate | 4.807355 | 6.651704 | 1.844349 |
animate | 4.392317 | 5.673730 | 1.281413 |
BJN/_animate | 3.981853 | 3.981853 | 0.000000 |
>L_animate | 3.033423 | 3.922519 | 0.889096 |
>L_inanimate | 2.427862 | 5.256681 | 2.828819 |
>L_animate|inanimate | 2.321928 | 6.303781 | 3.981853 |
MN_inanimate|inanimate | 2.321928 | 2.718818 | 5.040746 |
L_inanimate | 2.087463 | 2.956615 | 0.869153 |
inanimate | 1.680932 | 5.531540 | 3.850608 |
<M_animate | 1.584963 | 1.969880 | 0.384918 |
B_inanimate | 1.192645 | 2.612619 | 1.419974 |
DRK/_inanimate | 1.000000 | 3.981853 | 4.981853 |
MN_animate | 0.736966 | 2.897788 | 2.160823 |
B_animate | 0.584963 | 1.404927 | 0.819965 |
<L_inanimate | 0.263034 | 2.396890 | 2.659925 |
MN_inanimate | 0.043069 | 3.363338 | 3.320269 |
<L_animate | 0.000000 | 0.877516 | 0.877516 |
<L_animate|<L_inanimate | 0.000000 | 6.303781 | 6.303781 |
B.show(sampleBases(('BW>[.qal',), '>L_inanimate|MN_animate', 'vf_cmpl_animacy'), condenseType='clause')
This shows a mistake in the animacy encoding, where "Judah" is taken as a person rather than a location.
topUncommon('BW>[.qal', '<LH[.qal', 'vf_cmpl_animacy', focus='<LH[.qal')
| | difference | BW>[.qal | <LH[.qal |
|---|---|---|---|
MN_animate|inanimate | 7.303781 | 0.000000 | 7.303781 |
>L_animate|MN_animate | 5.303781 | 0.000000 | 5.303781 |
>L_inanimate|MN_inanimate | 5.303781 | 0.000000 | 5.303781 |
<D_inanimate | 4.807355 | 6.651704 | 1.844349 |
animate | 4.392317 | 5.673730 | 1.281413 |
>L_animate | 3.033423 | 3.922519 | 0.889096 |
>L_inanimate | 2.427862 | 5.256681 | 2.828819 |
>L_animate|inanimate | 2.321928 | 6.303781 | 3.981853 |
MN_inanimate|inanimate | 2.321928 | 2.718818 | 5.040746 |
L_inanimate | 2.087463 | 2.956615 | 0.869153 |
inanimate | 1.680932 | 5.531540 | 3.850608 |
<M_animate | 1.584963 | 1.969880 | 0.384918 |
B_inanimate | 1.192645 | 2.612619 | 1.419974 |
>XR/_animate | 1.133856 | 0.000000 | 1.133856 |
DRK/_inanimate | 1.000000 | 3.981853 | 4.981853 |
MN_animate | 0.736966 | 2.897788 | 2.160823 |
B_animate | 0.584963 | 1.404927 | 0.819965 |
<L_inanimate | 0.263034 | 2.396890 | 2.659925 |
MN_inanimate | 0.043069 | 3.363338 | 3.320269 |
<L_animate | 0.000000 | 0.877516 | 0.877516 |
<L_animate|<L_inanimate | 0.000000 | 6.303781 | 6.303781 |
By comparison with BW>, <LH has far fewer arguments that are not matched in some way. The lower levels of frame agreement may be caused by the simple diversity of arguments that comes from having a larger sample size for that verb. Let's compare the sample sizes below.
experiments['vf_cmpl_domain'].data['BW>[.qal'][experiments['vf_cmpl_domain'].data['BW>[.qal'] > 0].shape # non-zero counts
(182,)
experiments['vf_cmpl_domain'].data['<LH[.qal'][experiments['vf_cmpl_domain'].data['<LH[.qal'] > 0].shape # non-zero counts
(95,)
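The same check, via the nonzero_count wrapper sketched earlier:

nonzero_count('vf_cmpl_domain', 'BW>[.qal'), nonzero_count('vf_cmpl_domain', '<LH[.qal')  # (182, 95), per the shapes above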
Thus, we can see that בוא has nearly twice as many potential values, most likely due to its higher frequency.
Overall, BW> and <LH show a high level of similarity across all major categories, including complement and object. While the complement frame experiments show lower agreement, this seems to be due to a difference in sample size.
While close in meaning, עלה and בוא obviously have their own nuances. We can try to find those differences with the comparison to הלך, which intuitively seems more similar than עלה.
go_walk = get_sim_experiments('BW>[.qal', 'HLK[.qal')
The "plateau" shape of the BW>|<LH pairing is not quite seen here, where the complement experiments consistently score lower than its object or adjunct counterparts. Why is this the case?
plotPa('BW>[.qal', 'HLK[.qal', 'vi_cmpl_pa')
BW>[.qal
Cmpl    0.667852
ø       0.332148
Name: BW>[.qal, dtype: float64

HLK[.qal
Cmpl    0.506191
ø       0.493809
Name: HLK[.qal, dtype: float64
This shows a small differentiation in complement preference, with BW> preferring a complement more often. We look now at the complement domain experiment and the primary differences there.
topUncommon('BW>[.qal', 'HLK[.qal', 'vi_cmpl_domain', focus='HLK[.qal').head(15)
| | difference | BW>[.qal | HLK[.qal |
|---|---|---|---|
>XR/_Think | 7.8009 | 0.0 | 7.8009 |
>XR/_Names of Locations | 7.8009 | 0.0 | 7.8009 |
>XR/_Prophets | 7.8009 | 0.0 | 7.8009 |
>XR/_Good | 7.8009 | 0.0 | 7.8009 |
>XR/_Containers | 7.8009 | 0.0 | 7.8009 |
>L_Right | 7.8009 | 0.0 | 7.8009 |
BMW_Fire | 7.8009 | 0.0 | 7.8009 |
>XR/_Worthless | 7.8009 | 0.0 | 7.8009 |
>XR/_Unwilling | 7.8009 | 0.0 | 7.8009 |
Control | 7.8009 | 0.0 | 7.8009 |
DRK/_Leaders | 7.8009 | 0.0 | 7.8009 |
KJ_Impact | 7.8009 | 0.0 | 7.8009 |
>XR/_Sin | 7.8009 | 0.0 | 7.8009 |
K_Unable | 7.8009 | 0.0 | 7.8009 |
>XR/_Messengers | 7.8009 | 0.0 | 7.8009 |
Here we see the preposition אחר + domain, which occurs quite frequently with הלך. There are also K and DRK complements. The same tendency does not show for בוא:
topUncommon('BW>[.qal', 'HLK[.qal', 'vi_cmpl_domain', focus='BW>[.qal').head(15)
| | difference | BW>[.qal | HLK[.qal |
|---|---|---|---|
<D_Buildings | 7.8009 | 7.8009 | 0.0 |
MN_Search | 7.8009 | 7.8009 | 0.0 |
>YL/_Location | 7.8009 | 7.8009 | 0.0 |
BMW_Hide | 7.8009 | 7.8009 | 0.0 |
<D_Conflict | 7.8009 | 7.8009 | 0.0 |
Confident | 7.8009 | 7.8009 | 0.0 |
Fortifications | 7.8009 | 7.8009 | 0.0 |
L_Dwell | 7.8009 | 7.8009 | 0.0 |
<D_Towns | 7.8009 | 7.8009 | 0.0 |
<D_Tents | 7.8009 | 7.8009 | 0.0 |
L_Help | 7.8009 | 7.8009 | 0.0 |
>L_Confident | 7.8009 | 7.8009 | 0.0 |
<D_Non-Happen | 7.8009 | 7.8009 | 0.0 |
<D_Quantity | 7.8009 | 7.8009 | 0.0 |
Names of People | 7.8009 | 7.8009 | 0.0 |
Here there are many more <D, MN, and L prepositions, indicating the destination. How do these differences compare with the בוא and עלה comparison?
compareChange(('BW>[.qal', 'HLK[.qal'), ('BW>[.qal', '<LH[.qal'), colors=expcolors2) # NB using the alternate order
The only improvements for the הלך and בוא pairing come in the adjunct, subject, and vd experiments. The greatest decreases in similarity are in the complement experiments, especially domain and animacy. All of the animacy experiments appear significantly lower as well.
In brief summary, HLK and BW> show a lot of similarities across all experiments, but they slightly differ in their complement preferences. This pairing demonstrates how seemingly intuitive similarities might not be borne out in the data itself: though הלך might seem to be more similar than בוא, the complement preferences are different. Overall בוא does indeed seem to be more similar to עלה.
go_goout = get_sim_experiments('BW>[.qal', 'JY>[.qal')
Here we see the distinctive "plateau" of agreement. This raises the question, why was יצא rated below הלך?
compareChange(('BW>[.qal', 'JY>[.qal'), ('BW>[.qal', 'HLK[.qal'), colors=expcolors2) # NB using the alternate order
As compared with the HLK pairing, the JY> pairing has fewer adjunct domain similarities.
go_return = get_sim_experiments('BW>[.qal', 'CWB[.qal')
שׁוב shows very high similarity scores across all but the vi and vf animacy experiments. In particular, the adjunct animacy is much lower by comparison to the other arguments. Why so?
topUncommon('BW>[.qal', 'CWB[.qal', 'vi_adj+_animacy', focus='BW>[.qal')
| | difference | BW>[.qal | CWB[.qal |
|---|---|---|---|
KMW_inanimate | 5.169925 | 5.169925 | 0.000000 |
>YL/_inanimate | 4.754888 | 4.754888 | 0.000000 |
>XR/_animate | 4.432959 | 4.432959 | 0.000000 |
>T==_animate | 4.054448 | 4.054448 | 0.000000 |
<M_animate | 2.947533 | 2.947533 | 0.000000 |
MN_animate | 2.646363 | 2.646363 | 0.000000 |
B_inanimate | 2.042392 | 2.042392 | 0.000000 |
K_inanimate | 1.968291 | 1.968291 | 0.000000 |
inanimate | 1.578649 | 1.578649 | 0.000000 |
L_animate | 1.150025 | 1.150025 | 0.000000 |
B_animate | 0.921997 | 0.921997 | 0.000000 |
L_inanimate | 0.263034 | 1.639410 | 1.376376 |
<D_inanimate | 0.118263 | 0.118263 | 0.000000 |
K_animate | 0.000000 | 0.488101 | 0.488101 |
MN_inanimate | 0.000000 | 0.215729 | 0.215729 |
topUncommon('BW>[.qal', 'CWB[.qal', 'vi_adj+_animacy', focus='CWB[.qal')
| | difference | BW>[.qal | CWB[.qal |
|---|---|---|---|
>XR/_inanimate | 4.754888 | 0.000000 | 4.754888 |
LM<N_animate | 3.584963 | 0.000000 | 3.584963 |
animate | 1.896907 | 0.000000 | 1.896907 |
<L_animate | 1.800691 | 0.000000 | 1.800691 |
L_inanimate | 0.263034 | 1.639410 | 1.376376 |
K_animate | 0.000000 | 0.488101 | 0.488101 |
MN_inanimate | 0.000000 | 0.215729 | 0.215729 |
It's not clear what the main difference might be here. There are no discernible themes in the differences.
go_descend = get_sim_experiments('BW>[.qal', 'JRD[.qal')
Again we see the plateau shapes for the pa experiments. We can see small deficiencies in the complement experiments. Let's see what the differences may be.
topUncommon('BW>[.qal', 'JRD[.qal', 'vi_cmpl_lex', focus='BW>[.qal').head(20)
| | difference | BW>[.qal | JRD[.qal |
|---|---|---|---|
<D_<DLM/ | 8.134426 | 8.134426 | 0.0 |
B_<DJ/ | 8.134426 | 8.134426 | 0.0 |
>L_XNWT/ | 8.134426 | 8.134426 | 0.0 |
>L_YRJX/ | 8.134426 | 8.134426 | 0.0 |
>PRT/ | 8.134426 | 8.134426 | 0.0 |
>XR/_BQR/ | 8.134426 | 8.134426 | 0.0 |
>XR/_LHB/ | 8.134426 | 8.134426 | 0.0 |
>YL/_<MD=/ | 8.134426 | 8.134426 | 0.0 |
B<L_P<WR/ | 8.134426 | 8.134426 | 0.0 |
BJN/_MXNH/ | 8.134426 | 8.134426 | 0.0 |
BMW_>RB=/ | 8.134426 | 8.134426 | 0.0 |
BRZL/ | 8.134426 | 8.134426 | 0.0 |
B_<B==/ | 8.134426 | 8.134426 | 0.0 |
B_BXRJM=/ | 8.134426 | 8.134426 | 0.0 |
B_TNWR/ | 8.134426 | 8.134426 | 0.0 |
B_CBW<H/ | 8.134426 | 8.134426 | 0.0 |
B_FKR/ | 8.134426 | 8.134426 | 0.0 |
B_M<JM/ | 8.134426 | 8.134426 | 0.0 |
B_M<MQJM/ | 8.134426 | 8.134426 | 0.0 |
B_M<WZ/ | 8.134426 | 8.134426 | 0.0 |
BW> appears to disagree more with JRD when there is a ב preposition present.
topUncommon('BW>[.qal', 'JRD[.qal', 'vi_cmpl_lex', focus='JRD[.qal').head(15)
| | difference | BW>[.qal | JRD[.qal |
|---|---|---|---|
>L_S<JP/ | 8.134426 | 0.0 | 8.134426 |
MN_RGLJM/ | 8.134426 | 0.0 | 8.134426 |
>L_CXT/ | 8.134426 | 0.0 | 8.134426 |
L_QYB/ | 8.134426 | 0.0 | 8.134426 |
MN_>NJH/ | 8.134426 | 0.0 | 8.134426 |
MN_BMH/ | 8.134426 | 0.0 | 8.134426 |
MN_CPM/ | 8.134426 | 0.0 | 8.134426 |
MN_JNWXH/ | 8.134426 | 0.0 | 8.134426 |
MN_MGDL_SWNH/ | 8.134426 | 0.0 | 8.134426 |
>CQLWN/ | 8.134426 | 0.0 | 8.134426 |
<VRWT_>DR/ | 8.134426 | 0.0 | 8.134426 |
<RBH/ | 8.134426 | 0.0 | 8.134426 |
<PR/ | 8.134426 | 0.0 | 8.134426 |
>L_KRM/ | 8.134426 | 0.0 | 8.134426 |
<D_GZR=/ | 8.134426 | 0.0 | 8.134426 |
JRD appears to disagree more with BW> when there is a MN preposition present. It is likely that BW> prefers the B preposition while JRD prefers MN. Intuitively this would make sense. These findings reinforce the need for a separate preposition experiment in the future.
top_give = sim['NTN[.qal'].sort_values(ascending=False).head(5)
plotTop(top_give, 'NTN[.qal')
print('Similarity scores for NTN:')
top_give
Similarity scores for NTN:
FJM[.qal    0.743461
<FH[.qal    0.695756
BW>[.hif    0.649756
LQX[.qal    0.615713
HJH[.qal    0.612082
Name: NTN[.qal, dtype: float64
The shape of this similarity plot is different from the last two, which both had a more distinguishable "elbow" after the first two terms, though we do see the elbow at היה.
Importantly, all of the terms here are matched more closely than the top terms with אמר and היה:
for lex, sims in (('>MR[.qal', say_top_sim), ('HJH[.qal', top_be), ('NTN[.qal', top_give)):
print(f'{lex} total similarity: {sims.sum()}')
>MR[.qal total similarity: 2.844562858050681
HJH[.qal total similarity: 2.7128060554005335
NTN[.qal total similarity: 3.3167665240130577
Overall, all terms except for היה seem to have some intuitive similarity to נתן. <FH is the least intuitive of those matches. The match of לקח is especially interesting, since these two terms might be construed as antonyms (giving vs. taking).
The match of שׂים at 0.74 is about 0.05 points higher than עשׂה. It is a good match. What are the points of agreement and disagreement in similarity?
give_set = get_sim_experiments('NTN[.qal', 'FJM[.qal')
The highest similarities are seen in the allarg_domain and allarg_lex experiments, potentially due to the higher levels of agreement across all of the arguments.
Perhaps the most recognizable difference from other similarity charts up to this point is the high level of agreement across all the pa experiments, creating a flat plateau shape; this notably includes the object and complement experiments. The high similarity in object and complement presence/absence is mirrored by relatively high similarity scores in the domain and lexeme experiments. A subtle but potentially important difference can be seen in the complement animacy experiments, most evident in the vf experiments.
Across all of the experiments the adjunct category is much lower in similarity. This is the same situation seen in the אמר comparisons with קרא and דבר.
There is a high level of similarity in subject animacy.
First, let's view differences in the adjunct. The biggest difference comes in the inventory animacy experiment.
topUncommon('NTN[.qal', 'FJM[.qal', 'vi_adj+_animacy').head(10)
| | difference | NTN[.qal | FJM[.qal |
|---|---|---|---|
NKX=/_inanimate | 6.754888 | 0.000000 | 6.754888 |
LM<N_inanimate | 5.169925 | 5.169925 | 0.000000 |
>YL/_inanimate | 4.754888 | 0.000000 | 4.754888 |
LM<N_animate | 4.584963 | 4.584963 | 0.000000 |
<D_animate | 3.584963 | 3.584963 | 0.000000 |
TXT/_animate | 3.584963 | 0.000000 | 3.584963 |
L_animate | 3.471954 | 3.471954 | 0.000000 |
<L_animate | 3.385654 | 3.385654 | 0.000000 |
TXT/_inanimate | 3.252387 | 3.252387 | 0.000000 |
<M_inanimate | 3.054448 | 3.054448 | 0.000000 |
There are several cases with נתן where the action is directed towards or near (<L, L, <D) an animate adjunct which שׂים does not share. There are also differences in the lexemes of adjuncts:
topUncommon('NTN[.qal', 'FJM[.qal', 'vi_adj+_lex').head(10)
| | difference | NTN[.qal | FJM[.qal |
|---|---|---|---|
>L_PH/ | 8.204571 | 8.204571 | 0.000000 |
<L_>X/ | 8.204571 | 8.204571 | 0.000000 |
MN_JRWCLM/ | 8.204571 | 8.204571 | 0.000000 |
L_YLH[ | 8.204571 | 8.204571 | 0.000000 |
L_YDQH/ | 8.204571 | 8.204571 | 0.000000 |
L_BZ/ | 8.204571 | 8.204571 | 0.000000 |
BJN/_KWKB/ | 8.204571 | 0.000000 | 8.204571 |
BJN/_MZBX/ | 8.204571 | 8.204571 | 0.000000 |
MN_JXF[ | 8.204571 | 8.204571 | 0.000000 |
L>CR_>CM[ | 8.204571 | 8.204571 | 0.000000 |
give_move = get_sim_experiments('NTN[.qal', 'BW>[.hif')
The "plateau" shape of high presence/absence experiment agreement stands out here due to its similarity with the comparison to שׂים. However, unlike the similarity with שׂים, there is lower similarity for animacy, domain, and lexeme complement experiments. This presents a challenge to my developing hypothesis that complement similarities in pa experiments without corresponding domain, animacy, and lex similarities distinguish surprising matches from intuitive ones. In this case BW>.hif is intuitively a good match. But the complement patterns differ. What, if any, differences are there then between this and the surprising matches?
One observation is that the similar object arguments are corresponded with similarities in object animacy, domain, and lex experiments; in past surprising matches, there have not been correspondences in a complete group of object or complement experiments.
topUncommon('NTN[.qal', 'BW>[.hif', 'vi_cmpl_animacy').head(15)
difference | NTN[.qal | BW>[.hif | |
---|---|---|---|
K_inanimate | 5.948179 | 5.948179 | 0.000000 |
TXT/_animate | 5.568474 | 5.568474 | 0.000000 |
>YL/_inanimate | 5.121015 | 5.121015 | 0.000000 |
TXT/_inanimate | 4.742504 | 4.742504 | 0.000000 |
K_animate | 4.273018 | 4.273018 | 0.000000 |
L_animate | 3.444785 | 4.833652 | 1.388867 |
BJN/_inanimate | 3.442943 | 3.442943 | 0.000000 |
B_animate | 2.854645 | 2.854645 | 0.000000 |
inanimate | 2.807355 | 0.678072 | 3.485427 |
BJN/_animate | 2.355481 | 2.355481 | 0.000000 |
animate | 2.321928 | 1.214125 | 3.536053 |
B_inanimate | 1.906891 | 2.481526 | 0.574635 |
>L_inanimate | 1.807355 | 2.221356 | 4.028711 |
<L_inanimate | 1.299560 | 3.398549 | 2.098989 |
>L_animate | 1.125531 | 1.049066 | 2.174596 |
#B.show(experiments['vi_cmpl_animacy'].target2basis2result['NTN[.qal']['K_inanimate']) # uncomment to show texts
The top unsimilar match, K_inanimate, shows a special meaning of נתן that is not shared by BW>.hif: assigning or changing things, i.e. to set things "as" something else.
This difference also highlights the greater similarity between נתן and שׂים, which scores high similarities in both object and complement experiments (across all datapoints). שׂים, like נתן, can bear this kind of "assignment" sense beyond a simple movement of objects in space.
give_take = get_sim_experiments('NTN[.qal', 'LQX[.qal')
לקח scores high in similarity with נתן in object experiments, including animacy, domain, and lexemes. There are also higher scores for the adjunct experiments.
לקח reflects differences in the complement experiments, which have lower similarity overall.
Below we compare לקח to the two previously matched lexemes with נתן.
compareChange(('NTN[.qal', 'LQX[.qal'), ('NTN[.qal', 'FJM[.qal'), colors=expcolors2)
Here we can see that there are improvements in the frame object domain and lexeme scores over against שׂים for לקח
compareChange(('NTN[.qal', 'LQX[.qal'), ('NTN[.qal', 'BW>[.hif'), colors=expcolors2)
לקח also enjoys a boost in object domain and lexeme similarity as compared with בוא.hif.
This is the first of the moderately surprising verbs.
give_do = get_sim_experiments('NTN[.qal', '<FH[.qal')
Here we have high levels of agreement in object experiments, including animac, domain, and lexeme. In the presence/absence experiments the complement experiments have a markedly decreased similarity.
First, how is the complement presence/absence different?
plotPa('NTN[.qal', '<FH[.qal', 'vi_cmpl_pa')
NTN[.qal Cmpl 0.827749 ø 0.172251 Name: NTN[.qal, dtype: float64 <FH[.qal Cmpl 0.290679 ø 0.709321 Name: <FH[.qal, dtype: float64
עשׂה occurs without a complement much more often than with; the opposite is true for נתן.
It is noteworthy, though, that the frame experiments seem to be somewhat of an exception to this. Though the complement experiments there are still lower than, for instance, the object experiment or adjunct experiment, they are relatively higher in similarity. So, while there is a smaller similarity in the simple presence/absence of the complement, the non-pa complement frames have a fairly high rate of similarity.
Looking more closely at complement animacy frames, which score the second highest:
topCommon('NTN[.qal', '<FH[.qal', 'vf_cmpl_animacy')
averaged | NTN[.qal | <FH[.qal | |
---|---|---|---|
L_animate|L_animate | 6.096262 | 6.303781 | 5.888743 |
L_animate | 4.218944 | 4.643605 | 3.794284 |
K_inanimate | 4.126606 | 5.856322 | 2.396890 |
L_inanimate | 3.610052 | 2.328584 | 4.891520 |
K_animate | 3.603341 | 3.603341 | 3.603341 |
B_animate | 2.670185 | 2.819965 | 2.520405 |
<L_inanimate | 2.028407 | 3.244887 | 0.811928 |
<L_animate | 1.292553 | 2.292553 | 0.292553 |
MN_animate | 0.868342 | 1.160823 | 0.575860 |
The L_animate|L_animate frame, upon closer inspection, appears quite differently between the two terms. With NTN the term is used to refer to giving something[/one] to something as a role: give woman to man "to" wife. In the two examples of this frame with עשׂה, two complements appear due to coordination or resumption of a previous complement, not as a double complement. This shows a bit of the pitfall of the current approach which should eventually be addressed in future research.
# B.show(experiments['vf_cmpl_animacy'].target2basis2result['<FH[.qal']['L_animate|L_animate']) # uncomment too see texts
The example of L+animate also reveals a subtle difference between a pattern that is otherwise exactly the same. With נתן the complement denotes to whom an object is given; but with עשׂה the complement seems to describe for whom an action is done.
#B.show(experiments['vf_cmpl_animacy'].target2basis2result['<FH[.qal']['L_animate']) # uncomment to see texts
Here are the top differences in frame complement animacy.
topUncommon('NTN[.qal', '<FH[.qal', 'vf_cmpl_animacy').head(10)
difference | NTN[.qal | <FH[.qal | |
---|---|---|---|
L_inanimate|L_inanimate | 7.303781 | 0.000000 | 7.303781 |
<L_inanimate|L_animate | 6.718818 | 6.718818 | 0.000000 |
L_animate|L_inanimate | 6.566815 | 6.566815 | 0.000000 |
<L_animate|L_animate | 5.303781 | 5.303781 | 0.000000 |
TXT/_animate | 5.133856 | 5.133856 | 0.000000 |
<M_inanimate | 4.718818 | 0.000000 | 4.718818 |
TXT/_inanimate | 4.603341 | 4.603341 | 0.000000 |
>T==_animate | 4.253155 | 0.000000 | 4.253155 |
BJN/_inanimate | 3.718818 | 3.718818 | 0.000000 |
K_inanimate | 3.459432 | 5.856322 | 2.396890 |
#B.show(experiments['vf_cmpl_animacy'].target2basis2result['NTN[.qal']['<L_animate|L_animate']) # uncomment to see texts
Also for complement lexemes:
topUncommon('NTN[.qal', '<FH[.qal', 'vi_cmpl_lex').head(10)
difference | NTN[.qal | <FH[.qal | |
---|---|---|---|
>T==_<BD/ | 8.134426 | 0.000000 | 8.134426 |
L_CB</ | 8.134426 | 8.134426 | 0.000000 |
B_<MJT/ | 8.134426 | 8.134426 | 0.000000 |
L_BRQ/ | 8.134426 | 0.000000 | 8.134426 |
>L_BGD/ | 8.134426 | 8.134426 | 0.000000 |
L_BZH/ | 8.134426 | 8.134426 | 0.000000 |
L_C>R/ | 8.134426 | 8.134426 | 0.000000 |
B_<BRH=/ | 8.134426 | 0.000000 | 8.134426 |
MN_TKLT/ | 8.134426 | 0.000000 | 8.134426 |
L_CBJ/ | 8.134426 | 8.134426 | 0.000000 |
Now we compare changes in similarity for עשׂה with the intuitive matches. With אמר we saw a division between pa experiments and context-based experiments, with decreases also in similarity for vd_con_clause. Does the same pattern hold true here?
compareChange(('NTN[.qal', '<FH[.qal'), ('NTN[.qal', 'FJM[.qal'), colors=expcolors2) # NB: with modified order
In this case, we actually see that the simple presence/absence experiments bear most of the negative changes while adjunct context shows a lot of gains. However, the only gains occur in the adjunct experiments and vd experiments.
The improvement in vd_con_clause is really opposite from the pa context division seen with the אמר surprise matches, where this experiment would frequently rank lower in the pairwise comparisons.
What kind of adjunct elements contribute to similarity here?
topCommon('NTN[.qal', '<FH[.qal', 'vi_adj+_domain').head(10)
averaged | NTN[.qal | <FH[.qal | |
---|---|---|---|
<L>CR_Exist | 6.761551 | 6.761551 | 6.761551 |
<L_Wrong | 6.761551 | 6.761551 | 6.761551 |
K_Wrong | 6.115160 | 4.954196 | 7.276124 |
K_Right | 6.091626 | 5.591626 | 6.591626 |
L_Smell | 5.969070 | 5.176589 | 6.761551 |
L_Location | 5.761551 | 6.761551 | 4.761551 |
B_Waterbodies | 5.761551 | 5.761551 | 5.761551 |
L_Bear | 5.761551 | 5.761551 | 5.761551 |
LM<N_Names of People | 5.676589 | 6.176589 | 5.176589 |
B_Metal | 5.591626 | 5.006664 | 6.176589 |
# B.show(experiments['vi_adj+_domain'].target2basis2result['<FH[.qal']['L_Location'])
topCommon('NTN[.qal', '<FH[.qal', 'vi_adj+_lex').head(10)
averaged | NTN[.qal | <FH[.qal | |
---|---|---|---|
<L_TW<BH/ | 7.204571 | 7.204571 | 7.204571 |
<L>CR_<FH[ | 7.204571 | 7.204571 | 7.204571 |
<L_QYH/ | 7.119609 | 6.619609 | 7.619609 |
K_BJT/ | 6.912090 | 6.619609 | 7.204571 |
L_RXY[ | 6.704571 | 6.204571 | 7.204571 |
K_PH/ | 6.675124 | 7.467606 | 5.882643 |
B_ZDWN/ | 6.619609 | 6.619609 | 6.619609 |
L_RJX/ | 6.382643 | 5.882643 | 6.882643 |
B_RXB==/ | 6.382643 | 5.882643 | 6.882643 |
L_<LH/ | 6.204571 | 6.204571 | 6.204571 |
The vf_argAll_animacy experiment was another noticeably smaller similarity score in the comparison with the שׂים pairing:
topUncommon('NTN[.qal', '<FH[.qal', 'vf_argAll_animacy').head(10)
difference | NTN[.qal | <FH[.qal | |
---|---|---|---|
Objc.inanimate|Objc.inanimate|adj+.L_animate | 7.693487 | 0.000000 | 7.693487 |
adj+.<D_inanimate|adj+.MN_inanimate | 7.693487 | 0.000000 | 7.693487 |
Objc.inanimate|Objc.inanimate|adj+.B_inanimate | 7.693487 | 0.000000 | 7.693487 |
Cmpl.L_animate|Objc.animate|adj+.L_animate | 7.693487 | 7.693487 | 0.000000 |
Cmpl.K_animate|Objc.animate | 7.693487 | 0.000000 | 7.693487 |
Objc.animate|adj+.L_inanimate|adj+.inanimate | 7.693487 | 0.000000 | 7.693487 |
Cmpl.<L_inanimate|Objc.inanimate|Objc.inanimate | 7.693487 | 0.000000 | 7.693487 |
Cmpl.L_animate|Objc.animate|adj+.LM<N_inanimate | 7.693487 | 7.693487 | 0.000000 |
Cmpl.<M_inanimate|Objc.inanimate | 7.693487 | 0.000000 | 7.693487 |
Cmpl.L_animate|Objc.animate|adj+.LM<N_animate | 7.693487 | 7.693487 | 0.000000 |
# B.show(experiments['vf_argAll_animacy'].target2basis2result['<FH[.qal']['Cmpl.K_animate|Objc.animate'])
compareChange(('NTN[.qal', '<FH[.qal'), ('NTN[.qal', 'BW>[.hif'), colors=expcolors2)
עשׂה shows surprising similarity across nearly all of the context experiments with decreases in the presence/absence experiments. This represents an inverse situation to, for instance, the comparison of אמר & דבר with אמר and נתן, where the primary increases in similarity came in the pa experiments while decreases happened in the context experiments.
The strongest increases in similarity come from the adjunct experiments.
compareChange(('NTN[.qal', '<FH[.qal'), ('NTN[.qal', 'LQX[.qal'), colors=expcolors2)
Again there are lot of similarity increases in the contextual experiments. The greatest gains come from adjunct experiments. The complement experiments also show increased similarity in contextual experiments.
There are some differences, though, in object contextual experiments; and while the complement category shows higher similarity amongst contextual experiments, that similarity is countered by dips in similarity for the presence/absence experiments.
Overall, the main difference between נתן and עשׂה with respect to the intuitive matches of FJM, BW>.hif, and LQX, is increased similarity in contextual experiments without a corresponding increase in similarity between pa experiments. This is the counterpart to the mismatch of pa/context with the אמר unintuitive matches.
There may be some significance in the shared similarity between נתן and עשׂה for the contextual experiments. These experiments test more semantic and lexical similarities than the more grammatically oriented pa experiments.
We have already investigated this pairing, but we need to make the comparison of the intuitive match of נתן against the pairing with היה.
compareChange(('HJH[.qal', 'NTN[.qal'), ('NTN[.qal', 'FJM[.qal'), colors=expcolors2) # NB modified bar order
These differences confirm what could already be seen in the similarity chart for היה and נתן: increases in similarity are primarily limited to the adjunct experiments. Object experiments and the subject animacy experiments take a hit throughout.
With the modified order we can also see that the gains are restricted to the contextual experiments while there are large losses in the pa experiments, even for adjunct categories.
top_walk = sim['HLK[.qal'].sort_values(ascending=False).head(5)
plotTop(top_walk, 'HLK[.qal')
print('Similarity scores for HLK:')
top_walk
Similarity scores for HLK:
BW>[.qal 0.675369 <LH[.qal 0.588611 CWB[.qal 0.560521 <BR[.qal 0.541362 JY>[.qal 0.527966 Name: HLK[.qal, dtype: float64
Note that while HLK is BW>'s second most similar, HLK ranks B>W as its most similar.
Since we've examined this pairing already, we will skip it here.
walk_ascend = get_sim_experiments('HLK[.qal', '<LH[.qal')
Here we see a lot of similarity with strong adjunct similarities.
The complement experiments are also high, but occasionally dip by comparison to other experiments, such as in the vf_pa experiments. Though these differences seem very small overall.
plotPa('HLK[.qal', '<LH[.qal', 'vf_cmpl_pa')
HLK[.qal Cmpl 0.466211 Cmpl|Cmpl 0.018223 Cmpl|Cmpl|Cmpl 0.000759 ø 0.514806 Name: HLK[.qal, dtype: float64 <LH[.qal Cmpl 0.590193 Cmpl|Cmpl 0.057793 Cmpl|Cmpl|Cmpl 0.000000 ø 0.352014 Name: <LH[.qal, dtype: float64
עלה seems to have a slightly bigger preference for double complements and for complements in general. This is similar to what we saw in the similarity differences between הלך and בוא. HLK simply seems to prefer less complement elements as opposed to its fellow movement verbs.
topUncommon('HLK[.qal', '<LH[.qal', 'vi_cmpl_lex', focus='HLK[.qal').head(15)
difference | HLK[.qal | <LH[.qal | |
---|---|---|---|
ZJP/ | 8.134426 | 8.134426 | 0.0 |
>L_GG/ | 8.134426 | 8.134426 | 0.0 |
B_YJH/ | 8.134426 | 8.134426 | 0.0 |
>T==_XKM/ | 8.134426 | 8.134426 | 0.0 |
CBJ/ | 8.134426 | 8.134426 | 0.0 |
CPJ/ | 8.134426 | 8.134426 | 0.0 |
>L_Y>N/ | 8.134426 | 8.134426 | 0.0 |
>L_XLDH/ | 8.134426 | 8.134426 | 0.0 |
>L_VWB/ | 8.134426 | 8.134426 | 0.0 |
>L_TRCJC=/ | 8.134426 | 8.134426 | 0.0 |
>L_TLMJ/ | 8.134426 | 8.134426 | 0.0 |
>L_TBY/ | 8.134426 | 8.134426 | 0.0 |
>L_RB<=/ | 8.134426 | 8.134426 | 0.0 |
DRK/_MLK/ | 8.134426 | 8.134426 | 0.0 |
DTN/ | 8.134426 | 8.134426 | 0.0 |
topUncommon('HLK[.qal', '<LH[.qal', 'vi_cmpl_lex', focus='<LH[.qal').head(15)
difference | HLK[.qal | <LH[.qal | |
---|---|---|---|
B_LHB/ | 8.134426 | 0.0 | 8.134426 |
MN_RXYH/ | 8.134426 | 0.0 | 8.134426 |
MN_PXT/ | 8.134426 | 0.0 | 8.134426 |
B_CRH/ | 8.134426 | 0.0 | 8.134426 |
<L_RWX/ | 8.134426 | 0.0 | 8.134426 |
<L_TJKWN/ | 8.134426 | 0.0 | 8.134426 |
DRK/_BCN/ | 8.134426 | 0.0 | 8.134426 |
>DR===/ | 8.134426 | 0.0 | 8.134426 |
MN_GBTWN/ | 8.134426 | 0.0 | 8.134426 |
MN_G>WN/ | 8.134426 | 0.0 | 8.134426 |
BJT_XGLH/ | 8.134426 | 0.0 | 8.134426 |
BJT_>WN/ | 8.134426 | 0.0 | 8.134426 |
>L_BKJM/ | 8.134426 | 0.0 | 8.134426 |
MN_>DMH/ | 8.134426 | 0.0 | 8.134426 |
MN_<RBH/ | 8.134426 | 0.0 | 8.134426 |
A main difference seems to be עלה's preference for additional MN prepositions. הלך seems to have slightly more אל instances.
walk_return = get_sim_experiments('HLK[.qal', 'CWB[.qal')
Why is adjunt lower here?
topUncommon('HLK[.qal', 'CWB[.qal', 'vf_adju_animacy', focus='HLK[.qal')
difference | HLK[.qal | CWB[.qal | |
---|---|---|---|
>T==_inanimate | 6.285402 | 6.285402 | 0.000000 |
>T==_animate|B_inanimate | 5.285402 | 5.285402 | 0.000000 |
KMW_animate | 4.700440 | 4.700440 | 0.000000 |
>T==_animate | 4.437405 | 4.437405 | 0.000000 |
<M_animate | 3.906891 | 3.906891 | 0.000000 |
B_inanimate|inanimate | 2.037475 | 2.037475 | 0.000000 |
B_animate | 1.910363 | 1.910363 | 0.000000 |
inanimate | 1.217658 | 1.217658 | 0.000000 |
K_animate | 1.000000 | 1.584963 | 0.584963 |
L_inanimate | 0.321928 | 1.596103 | 1.274175 |
B_inanimate | 0.112398 | 0.112398 | 0.000000 |
topUncommon('HLK[.qal', 'CWB[.qal', 'vf_adju_animacy', focus='CWB[.qal')
difference | HLK[.qal | CWB[.qal | |
---|---|---|---|
>XR/_inanimate | 4.700440 | 0.000000 | 4.700440 |
LM<N_animate | 3.963474 | 0.000000 | 3.963474 |
animate | 2.378512 | 0.000000 | 2.378512 |
<L_animate | 2.115477 | 0.000000 | 2.115477 |
K_animate | 1.000000 | 1.584963 | 0.584963 |
L_inanimate | 0.321928 | 1.596103 | 1.274175 |
The primary difference here seems to come from the accompaniment "with" prepositions like את and ב that occur more with הלך. These are missing from the CWB top uncommons.
walk_cross = get_sim_experiments('HLK[.qal', '<BR[.qal')
We see decreases in object pa experiments as well as in frame experiments generally. The complement experiments seem lower too.
How do these objects differ in object preference?
plotPa('HLK[.qal', '<BR[.qal', 'vi_objc_pa')
HLK[.qal Objc 0.014118 ø 0.985882 Name: HLK[.qal, dtype: float64 <BR[.qal Objc 0.241379 ø 0.758621 Name: <BR[.qal, dtype: float64
topUncommon('HLK[.qal', '<BR[.qal', 'vi_cmpl_animacy', focus='HLK[.qal')
difference | HLK[.qal | <BR[.qal | |
---|---|---|---|
BMW_inanimate | 7.442943 | 7.442943 | 0.000000 |
QDMH/_animate | 7.442943 | 7.442943 | 0.000000 |
DRK/_animate | 7.442943 | 7.442943 | 0.000000 |
>XR/_inanimate | 6.705978 | 6.705978 | 0.000000 |
DRK/_inanimate | 5.705978 | 5.705978 | 0.000000 |
>XR/_animate | 5.209453 | 6.128835 | 0.919382 |
L_inanimate | 3.060474 | 3.060474 | 0.000000 |
<M_animate | 2.807355 | 3.100551 | 0.293196 |
<D_inanimate | 2.321928 | 4.036951 | 1.715023 |
>L_inanimate | 1.938599 | 3.744918 | 1.806319 |
>L_animate | 1.799087 | 1.799087 | 0.000000 |
inanimate | 1.528379 | 3.906891 | 2.378512 |
>T==_animate | 1.000000 | 2.253119 | 1.253119 |
MN_inanimate | 0.652077 | 1.451164 | 0.799087 |
animate | 0.584963 | 2.799087 | 2.214125 |
B_inanimate | 0.118644 | 2.237600 | 2.118956 |
<L_animate | 0.000000 | 0.016679 | 0.016679 |
<L_inanimate | 0.000000 | 1.205904 | 1.205904 |
topUncommon('HLK[.qal', '<BR[.qal', 'vi_cmpl_animacy', focus='<BR[.qal')
difference | HLK[.qal | <BR[.qal | |
---|---|---|---|
>XR/_animate | 5.209453 | 6.128835 | 0.919382 |
TXT/_inanimate | 3.742504 | 0.000000 | 3.742504 |
<M_animate | 2.807355 | 3.100551 | 0.293196 |
<D_inanimate | 2.321928 | 4.036951 | 1.715023 |
>L_inanimate | 1.938599 | 3.744918 | 1.806319 |
inanimate | 1.528379 | 3.906891 | 2.378512 |
>T==_animate | 1.000000 | 2.253119 | 1.253119 |
MN_inanimate | 0.652077 | 1.451164 | 0.799087 |
animate | 0.584963 | 2.799087 | 2.214125 |
B_inanimate | 0.118644 | 2.237600 | 2.118956 |
<L_animate | 0.000000 | 0.016679 | 0.016679 |
<L_inanimate | 0.000000 | 1.205904 | 1.205904 |
walk_goout = get_sim_experiments('HLK[.qal', 'JY>[.qal')
plotPa('HLK[.qal', 'JY>[.qal', 'vi_objc_pa')
HLK[.qal Objc 0.014118 ø 0.985882 Name: HLK[.qal, dtype: float64 JY>[.qal Objc 0.022059 ø 0.977941 Name: JY>[.qal, dtype: float64
topUncommon('HLK[.qal', 'JY>[.qal', 'vi_objc_lex').head(17)
difference | HLK[.qal | JY>[.qal | |
---|---|---|---|
>CR_BW>[ | 8.252665 | 0.000000 | 8.252665 |
YB>/ | 7.252665 | 0.000000 | 7.252665 |
TMJM/ | 7.252665 | 7.252665 | 0.000000 |
NKX/ | 6.667703 | 6.667703 | 0.000000 |
MDBR/ | 6.445311 | 6.445311 | 0.000000 |
NTJBH/ | 6.252665 | 6.252665 | 0.000000 |
PTX/ | 5.445311 | 0.000000 | 5.445311 |
>RX/ | 5.082740 | 5.082740 | 0.000000 |
GBWL/ | 3.667703 | 0.000000 | 3.667703 |
DRK/ | 3.655730 | 3.655730 | 0.000000 |
YDQH/ | 2.667703 | 2.667703 | 0.000000 |
<JR/ | 2.506711 | 0.000000 | 2.506711 |
MJM/ | 2.263981 | 2.263981 | 0.000000 |
JFR>L/ | 1.333802 | 1.333802 | 0.000000 |
KL/ | 0.608809 | 0.000000 | 0.608809 |
BJT/ | 0.108007 | 0.108007 | 0.000000 |
MWRH/ | 0.000000 | 0.000000 | 0.000000 |
top_see = sim['R>H[.qal'].sort_values(ascending=False).head(5)
plotTop(top_see, 'R>H[.qal')
print('Similarity scores for R>H:')
top_see
Similarity scores for R>H:
JD<[.qal 0.512509 <FH[.qal 0.505851 <BD[.qal 0.468596 NKH[.hif 0.455947 MY>[.qal 0.455344 Name: R>H[.qal, dtype: float64
ידע is a mostly intuitive match in this set; מצא is also a striking similarity, though maybe not intuitive. There are other verbs like נבט that might be expected. Are these in the dataset?
'NBV[.hif' in sim.keys()
True
plotTop(sim['NBV[.hif'].sort_values(ascending=False).head(5), 'NBV[.hif')
Interestingly, נבט does list ראה in its top 5, as the last most similar term. Perhaps at some point we will analyze these differences.
see_know = get_sim_experiments('R>H[.qal', 'JD<[.qal')
see_find = get_sim_experiments('R>H[.qal', 'MY>[.qal')
Here is the first surprising match.
see_do = get_sim_experiments('R>H[.qal', '<FH[.qal')
compareChange(('R>H[.qal', '<FH[.qal'), ('R>H[.qal', 'JD<[.qal'), colors=expcolors2)
compareChange(('R>H[.qal', '<FH[.qal'), ('R>H[.qal', 'MY>[.qal'), colors=expcolors2)
ALL
top_speakD = sim['DBR[.piel'].sort_values(ascending=False).head(5)
plotTop(top_speakD, 'DBR[.piel')
print('Similarity scores for DBR:')
top_speakD
Similarity scores for DBR:
>MR[.qal 0.629496 <FH[.qal 0.540199 QR>[.qal 0.538599 CLX[.qal 0.517720 BW>[.qal 0.499386 Name: DBR[.piel, dtype: float64
ALL
top_hear = sim['CM<[.qal'].sort_values(ascending=False).head(5)
plotTop(top_hear, 'CM<[.qal')
print('Similarity scores for CM<:')
top_hear
Similarity scores for CM<:
DBR[.piel 0.494233 >MR[.qal 0.459185 R>H[.qal 0.443848 NF>[.qal 0.417100 <FH[.qal 0.411775 Name: CM<[.qal, dtype: float64
ALL
top_take = sim['LQX[.qal'].sort_values(ascending=False).head(5)
plotTop(top_take, 'LQX[.qal')
print('Similarity scores for LQX:')
top_take
Similarity scores for LQX:
NTN[.qal 0.615713 JY>[.hif 0.568238 BW>[.hif 0.560366 <FH[.qal 0.548989 NF>[.qal 0.532253 Name: LQX[.qal, dtype: float64
ALL
export_dir = 'verb_similarities'
for rank, verb_tuple in enumerate(freq_count.most_common(20)):
verb = verb_tuple[0]
top_take = sim[verb].sort_values(ascending=False).head(5)
plotTop(top_take, verb)
plt.savefig(f'{export_dir}/{rank}.{verb}.png')
plt.close()