This is joint work of Martijn Naaijer and Dirk Roorda.
Results of this work are included in:
Naaijer, Martijn and Dirk Roorda (2016). Parallel Texts in the Hebrew Bible, New Methods and Visualizations. Preprint on arXiv. To be published in the Journal of Data Mining and Digital Humanities.
We perform a case study into 2 Kings 19-25 and passages that run partly parallel to these chapters. These variants occur in Chronicles, Isaiah and Jeremiah. We also examine the Qumran scroll 1QIsaa of the book Isaiah. In this notebook we collect the data and carry out preliminary analyses. We find the variants by using the database of cross references made by this notebook in SHEBANQ: parallel.
Section 1 is devoted to finding and viewing the parallels within the Masoretic texts.
Section 2 shows the differences between Isaiah in the MT and in the 1QIsaa scroll.
The results of section 1 are:
The results of section 2 are:
The program starts in the next cell, with the loading of several modules. We shall not document all programming details, but restrict ourselves to the issues that concern the Hebrew texts and the patterns in the data associated with them.
import sys, os, re, pickle
import collections, difflib
from Levenshtein import ratio
# (sudo -H) pip(3) install python-Levenshtein
# brew install freetype # on mac os x
# (sudo -H) pip(3) install matplotlib
from IPython.display import HTML, display_pretty, display_html
from difflib import SequenceMatcher
import networkx as nx
import matplotlib.pyplot as plt
%matplotlib inline
from tf.fabric import Fabric
from tf.transcription import Transcription
We use the BHSA database in its version 2017 (the version that is set and loaded in the code below), downloadable from the GitHub repo bhsa. The format of the data obtained through GitHub is immediately ready to be used by Text-Fabric, and hence by this notebook as well.
A previous version of this notebook was based on version 4b, also downloadable from the BHSA repository.
That version has also been archived at DANS, downloadable via DOI 10.17026/dans-z6y-skyh.
The transcription of 1QIsaa is in a file produced by the ETCBC. This file is included here in SHEBANQ.
locations = '~/github/etcbc'
sources = ['bhsa', 'parallels', 'phono']
version = '2017'
QISA_FILE = os.path.expanduser('{}/parallels/source/1QIsaa_an.txt'.format(locations))
We only use a few data features from the ETCBC database. You see them in the code below. Their documentation can be found through the SHEBANQ help function or via this direct link: Feature-doc. Here is the direct link to otype.
modules = ['{}/tf/{}'.format(s, version) for s in sources]
TF = Fabric(locations=locations, modules=modules)
This is Text-Fabric 3.0.9
Api reference : https://github.com/Dans-labs/text-fabric/wiki/Api
Tutorial      : https://github.com/Dans-labs/text-fabric/blob/master/docs/tutorial.ipynb
Example data  : https://github.com/Dans-labs/text-fabric-data
120 features found and 0 ignored
api = TF.load('''
language lex_utf8 verse
crossref
''')
api.makeAvailableIn(globals())
0.00s loading features ...
   | 0.01s B verse from /Users/dirk/github/etcbc/bhsa/tf/2017
   | 0.17s B lex_utf8 from /Users/dirk/github/etcbc/bhsa/tf/2017
   | 0.12s B language from /Users/dirk/github/etcbc/bhsa/tf/2017
   | 0.01s B crossref from /Users/dirk/github/etcbc/parallels/tf/2017
   | 0.00s Feature overview: 110 for nodes; 8 for edges; 2 configs; 7 computed
4.55s All features loaded/computed - for details use loadLog()
# the language of the book names
LANG = 'en'
# the book and chapters that are central to our study
REFBOOKS = {'2_Kings'}
REFCHAPTERS = set(range(19,26))
# output files
NCOL_FILE = 'kings_crossrefs.ncol' # graph of similar verses to 2 Kings 19-25
SIMILAR_FILE = 'kings_similarities.tsv' # refined similarities based on sentences
We want to discuss the portion of 2_Kings that we are interested in separately from the other chapters. That is why we introduce a virtual book 2_Kingsr for our reference chapters.
The ETCBC database uses Latin names for the Bible books. We translate them to conventional English names.
bookNode = dict()
for b in F.otype.s('book'):
    bookName = T.bookName(b, lang=LANG)
    bookNode[bookName] = b
    if bookName == '2_Kings':
        bookNode[bookName+'r'] = b

def passage_key(p):
    (bk, ch, vs) = p
    return (-1, ch, vs) if bk in REFBOOKS and ch in REFCHAPTERS else (bookNode[bk], ch, vs)

# the format of verse references
PASSAGE_FMT = '{}~{}:{}'
PASSAGER_FMT = '{}r~{}:{}' # used for the pseudo-book 2_Kingsr
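To illustrate how the sort key puts the reference chapters in front of everything else, here is a minimal sketch that runs without Text-Fabric. The dictionary book_order is a hypothetical stand-in for bookNode: in the notebook the values are book nodes, here they are small integers in canonical order.

```python
# Stand-ins for the notebook's settings; book_order is hypothetical and
# replaces bookNode (whose values are Text-Fabric book nodes).
REFBOOKS = {'2_Kings'}
REFCHAPTERS = set(range(19, 26))
book_order = {'Leviticus': 3, '2_Kings': 12, 'Isaiah': 13, 'Jeremiah': 14}

def passage_key(p):
    (bk, ch, vs) = p
    # reference chapters get -1 as book position, so they sort before all books
    if bk in REFBOOKS and ch in REFCHAPTERS:
        return (-1, ch, vs)
    return (book_order[bk], ch, vs)

passages = [
    ('Isaiah', 37, 1),
    ('2_Kings', 18, 13),   # 2 Kings, but outside the reference chapters
    ('2_Kings', 19, 1),    # inside the reference chapters
    ('Leviticus', 26, 3),
]
ordered = sorted(passages, key=passage_key)
for p in ordered:
    print(p)
```

The verse from 2 Kings 19 comes first, while 2 Kings 18 keeps its normal place between Leviticus and Isaiah.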
We find the verses that are similar to any verse in 2 Kings 19-25.
Our method is to use one of the similarity matrices that has been computed by the parallels notebook.
To be precise, we took the matrix computed for the SET method applied to verses. We then extract the similarities higher than 60. These specifications are stored in the variables CHUNK_GREP, MATRIX_GREP and SIM_THRESHOLD_GREP.
Every similarity we find is a pair of verse references, at least one of which is in 2 Kings 19-25. That reference we render as being in the book 2_Kingsr because in a later visualization we want to have our focus chapters in a separate column.
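For readers who want a feeling for a set-based verse similarity, here is a rough sketch. It assumes that the similarity of two verses is the size of the intersection of their lexeme sets divided by the size of the union, scaled to 0-100; the exact definition is in the parallels notebook and may differ. The function name, the threshold variable and the lexeme lists are made up for the illustration.

```python
# Assumption: a Jaccard-style ratio on lexeme sets, scaled to 0..100.
# This is an illustration, not the code of the parallels notebook.
def set_similarity(lexemes_a, lexemes_b):
    a, b = set(lexemes_a), set(lexemes_b)
    if not a and not b:
        return 100.0
    return 100 * len(a & b) / len(a | b)

SIM_THRESHOLD = 60  # hypothetical name for the cut-off of 60 mentioned above

# made-up lexeme lists standing in for three verses
verse_x = ['MLK', 'BW>', 'BJT', 'JHWH']
verse_y = ['MLK', 'BW>', 'BJT', 'JHWH', 'KHN']
verse_z = ['>MR', 'NBJ>']

sim_xy = set_similarity(verse_x, verse_y)
sim_xz = set_similarity(verse_x, verse_z)
kept = [pair for (pair, sim) in (('x-y', sim_xy), ('x-z', sim_xz)) if sim > SIM_THRESHOLD]
print(sim_xy, sim_xz, kept)
```

Only the pair whose lexeme sets overlap strongly survives the threshold.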
allVerseNodes = set()
nInternal = 0
nAll = 0
crossrefs = set()
for vnX in F.otype.s('verse'):
    (bkX, chX, vsX) = T.sectionFromNode(vnX, lang=LANG)
    if bkX not in REFBOOKS or chX not in REFCHAPTERS: continue
    for (vnY, r) in E.crossref.t(vnX):
        nAll += 1
        (bkY, chY, vsY) = T.sectionFromNode(vnY, lang=LANG)
        if bkY in REFBOOKS and chY in REFCHAPTERS:
            nInternal += 1
            continue
        crossrefs.add(((bkX, chX, vsX), (bkY, chY, vsY), r))
        allVerseNodes |= {vnX, vnY}
info('{} external crossrefs saved; {} internal crossrefs skipped; from total {} crossrefs'.format(
    len(crossrefs), nInternal, nAll,
))
print('\n'.join('{}r\t{}\t{}\t{}\t{}\t{}\t{}'.format(*x[0], *x[1], round(x[2])) for x in sorted(crossrefs)[0:10]))
19s 276 external crossrefs saved; 50 internal crossrefs skipped; from total 326 crossrefs
2_Kingsr 19 1 Isaiah 37 1 100
2_Kingsr 19 2 Isaiah 37 2 96
2_Kingsr 19 3 Isaiah 37 3 100
2_Kingsr 19 4 Isaiah 37 4 97
2_Kingsr 19 5 Isaiah 37 5 100
2_Kingsr 19 6 Isaiah 37 6 100
2_Kingsr 19 7 Isaiah 37 7 100
2_Kingsr 19 8 Isaiah 37 8 100
2_Kingsr 19 9 Isaiah 37 9 86
2_Kingsr 19 10 Isaiah 37 10 100
We want to visualize the similarities found so far as a graph, where the verses are nodes and the similarities are edges. To that end we have to store the similarities in a format such that graph software can read it.
We write out the graph data as a file in .ncol format, and we will use the Python package networkx to read and process that file.
We also produce:
info('Exporting graph info, assembling sets')
ncolfile = open(NCOL_FILE, 'w')
for (x, y, r) in sorted(crossrefs, key=lambda z: (
    bookNode[z[0][0]], z[0][1], z[0][2],
    bookNode[z[1][0]], z[1][1], z[1][2],
)):
    ncolfile.write('{} {} {}\n'.format(PASSAGER_FMT.format(*x), PASSAGE_FMT.format(*y), round(r)))
ncolfile.close()
allVerses = {(x[0][0]+'r', x[0][1], x[0][2]) for x in crossrefs} | {x[1] for x in crossrefs}
allChapters = {(x[0], x[1]) for x in allVerses}
allBooks = {x[0] for x in allChapters}
info('{} edges, {} verses, {} chapters, {} books'.format(
len(crossrefs), len(allVerses), len(allChapters), len(allBooks),
))
print(' '.join(sorted(allBooks)))
23s Exporting graph info, assembling sets
23s 276 edges, 292 verses, 42 chapters, 9 books
1_Kings 2_Chronicles 2_Kings 2_Kingsr Ezekiel Haggai Isaiah Jeremiah Leviticus
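The .ncol format is nothing more than one edge per line: two node names and a weight, separated by spaces. The sketch below reads three such lines (the values are taken from the cross-reference listing above) into a plain dictionary, without networkx. The helper read_ncol is ours and only serves the illustration.

```python
import io
from collections import defaultdict

# Three edges in .ncol layout: source node, target node, weight.
ncol_text = '''2_Kingsr~19:1 Isaiah~37:1 100
2_Kingsr~19:2 Isaiah~37:2 96
2_Kingsr~19:9 Isaiah~37:9 86
'''

def read_ncol(handle):
    """Read an undirected weighted edge list into a dict of dicts."""
    graph = defaultdict(dict)
    for line in handle:
        line = line.strip()
        if not line:
            continue
        (source, target, weight) = line.split()
        # store the edge in both directions, since the graph is undirected
        graph[source][target] = float(weight)
        graph[target][source] = float(weight)
    return graph

graph = read_ncol(io.StringIO(ncol_text))
for node in sorted(graph):
    print(node, dict(graph[node]))
```

Because the verse labels contain no spaces, splitting each line on whitespace is enough.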
We now visualize the similarities in a graph, using networkx. The layout is done manually, not following any of the methods provided by networkx.
The verses are put into columns by the book they occur in, and our focus chapters occupy a separate column, thanks to the pseudo-book 2_Kingsr. The column 2_Kings stands for the other chapters of 2 Kings.
The rows of verses are ordered textually.
Finally, we shift rows of verses up and down in order to align them with their parallel stretches. The thickness of the edges corresponds to the degree of similarity, and likewise, the blacker the edge, the more similar the pair of verses.
Now we read the graph data file, adjust layout settings, plot it and save it as pdf.
# read the graph data
g = nx.read_weighted_edgelist(NCOL_FILE)
# order the books for handy layout in columns
allBooksCust = '''
Jeremiah 2_Chronicles Isaiah 2_Kingsr 1_Kings 2_Kings Ezekiel Haggai Leviticus
'''.strip().split()
# specify colors of books
gcolors = {
'2_Kingsr': (0.9, 0.9, 0.9),
'Isaiah': (1.0, 0.3, 0.3),
'Jeremiah': (0.3, 1.0, 0.3),
'2_Chronicles': (1.0, 0.3, 1.0),
'1_Kings': (0.3, 0.3, 1.0),
'2_Kings': (1.0, 1.0, 0.3),
'Leviticus': (0.7, 0.7, 0.7),
'Ezekiel': (0.7, 0.7, 0.7),
'Haggai': (0.7, 0.7, 0.7),
}
# specify vertical positions of passages
offsetY = {
'2_Kingsr': 0,
'Isaiah': 0,
'Jeremiah': 90,
'2_Chronicles': 52,
'1_Kings': 55,
'2_Kings': 75,
'Leviticus': 65,
'Ezekiel': 65,
'Haggai': 60,
}
# compute positions of verses
ncolors = [gcolors[x.split('~')[0]] for x in g.nodes()]
nlabels = dict((x, x.split('~')[1]) for x in g.nodes())
ncols = len(allBooks)
posX = dict((x, i) for (i,x) in enumerate(allBooksCust))
verseLists = collections.defaultdict(lambda: [])
for (bk, ch, vs) in sorted(allVerses):
    verseLists[bk].append('{}:{}'.format(ch, vs))
nrows = max(len(verseLists[bk]) for bk in allBooksCust)
pos = {}
for bk in verseLists:
    for (i, chvs) in enumerate(verseLists[bk]):
        pos['{}~{}'.format(bk, chvs)] = (posX[bk], i+offsetY[bk])
# start plotting
plt.figure(figsize=(18,40))
nx.draw_networkx(g, pos,
width=[g.get_edge_data(*x)['weight']/40 for x in g.edges()],
edge_color=[g.get_edge_data(*x)['weight'] for x in g.edges()],
edge_cmap=plt.cm.Greys,
edge_vmin=50,
edge_vmax=100,
node_color=ncolors,
node_size=100,
labels=nlabels,
alpha=0.4,
linewidths=0,
)
plt.ylim(-2, 130)
bookFontSize = 12
# the on/off flag is passed positionally: newer matplotlib versions no longer accept the keyword b
plt.grid(True, which='both', axis='x')
plt.title('Parallels involving 2_Kings 19-25', fontsize=24)
plt.text(-1,70, '''
The parallels with
1_Kings and the other
chapters of 2_Kings
are weaker and
more sporadic.
Note that there are
also links within
2_Kings 19-25.
All these are probably
similar verses but
not parallels.
''', #bbox=dict(width=145, height=200, facecolor='yellow', alpha=0.4), fontsize=12)
# suddenly the width and height keyword args are no longer accepted.
# bbox performs an auto fit
bbox=dict(facecolor='yellow', alpha=0.4), fontsize=12)
# add additional book labels
for (ypos, books) in (
    (-1, allBooksCust),
    (51, ['2_Chronicles', 'Isaiah', '1_Kings']),
    (61, ['Leviticus', 'Ezekiel', 'Haggai']),
    (72, ['2_Kings']),
    (89, ['2_Chronicles', 'Jeremiah']),
    (101, ['2_Kings']),
    (128, allBooksCust),
):
    for bk in books:
        plt.text(posX[bk], ypos, bk, fontsize=bookFontSize, horizontalalignment='center')
# save the plot as pdf
plt.savefig('kings_parallels.pdf')
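The manual layout boils down to a simple rule: the x-coordinate of a verse is the index of its book column, and the y-coordinate is its rank within that book plus a per-book offset. Here is a stripped-down sketch of that rule with a handful of verses. The verse list is made up and the variable names are ours; the offset of 52 for 2_Chronicles is the one used above.

```python
import collections

# Column order and vertical offsets (a small subset of the settings above).
columns = ['Isaiah', '2_Kingsr', '2_Chronicles']
offset_y = {'Isaiah': 0, '2_Kingsr': 0, '2_Chronicles': 52}

# A made-up selection of verses: (book, chapter, verse)
verses = [
    ('2_Kingsr', 19, 1), ('2_Kingsr', 19, 2),
    ('Isaiah', 37, 1), ('Isaiah', 37, 2),
    ('2_Chronicles', 33, 1),
]

pos_x = {bk: i for (i, bk) in enumerate(columns)}

# group the verses per book, in textual order
verse_lists = collections.defaultdict(list)
for (bk, ch, vs) in sorted(verses):
    verse_lists[bk].append('{}:{}'.format(ch, vs))

# x = column of the book, y = rank within the book, shifted by the book offset
pos = {}
for bk in verse_lists:
    for (i, chvs) in enumerate(verse_lists[bk]):
        pos['{}~{}'.format(bk, chvs)] = (pos_x[bk], i + offset_y[bk])

for label in sorted(pos):
    print(label, pos[label])
```

Changing an offset moves a whole column up or down, which is how parallel stretches are lined up by hand.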
From the graph above we read off which are the interesting passages to compare:
2_Kings 19-20:19 with Isaiah 37-39:8
2_Kings 21-23:3 with 2_Chronicles 33-34:31
2_Kings 23:31-25:30 with Jeremiah 52-52:34
focus_books = {'2_Kings', 'Isaiah', 'Jeremiah', '2_Chronicles'}
We have a closer look at the similarities between the passages as far as they are contained in the focus books.
We make a pairwise comparison of all sentences, based on the Levenshtein ratio (the similarity based on the Levenshtein distance).
First we divide the material up into sentences and fetch their texts from the database.
For this we use the function T.text(nodes, fmt='text-orig-full'), an API function of Text-Fabric.
For each verse, we list its sentences and store them in two lists:
chunks is a list of verse-sentence references
chunk_data is a list of the texts of the corresponding sentences.
We order them in a special way: first all our reference verses, i.e. 2 Kings 19-25, and then the rest in normal order.
info('Identifying sentences')
crossrefs_lcs = set()
all_sentences = set()
for v in allVerseNodes:
    for w in L.d(v, 'word'):
        all_sentences.add(L.u(w, 'sentence')[0])
info('Found {} sentences in {} verses'.format(len(all_sentences), len(allVerseNodes)))
info('Grouping sentences by verse')
focus_sentences = collections.defaultdict(set)
for s in all_sentences:
    fw = L.d(s, 'word')[0]
    (bk, ch, vs) = T.sectionFromNode(fw, lang=LANG)
    if bk not in focus_books: continue
    if bk in REFBOOKS and ch not in REFCHAPTERS: continue
    focus_sentences[(bk, ch, vs)].add(s)
info('Getting sentence texts')
chunks = []
chunk_data = []
# in the next line we order chunks such that the reference sentences come first, i.e.
# the ones from 2 Kings 19-25.
# When computing distances, if one of these verses is involved, it will
# occur in the first column, unless another reference verse is in the first column.
# It will not happen that when a reference verse is involved,
# a non-reference verse occupies the first column
for ((bk, ch, vs), sents) in sorted(
    focus_sentences.items(),
    key=lambda x: (passage_key(x[0]), x[1])
):
    for (sn, s) in enumerate(sorted(sents)):
        chunk_data.append(''.join(T.text(L.d(s, 'word'), fmt='text-orig-full')))
        chunks.append((bk, ch, vs, sn))
info('Done: {} sentences in {} verses'.format(len(chunks), len(focus_sentences)))
for i in range(5):
    print('{} {}'.format(chunks[i], chunk_data[i]))
39s Identifying sentences
39s Found 758 sentences in 292 verses
39s Grouping sentences by verse
39s Getting sentence texts
39s Done: 685 sentences in 246 verses
('2_Kings', 19, 1, 0) וַיְהִ֗י כִּשְׁמֹ֨עַ֙ הַמֶּ֣לֶךְ חִזְקִיָּ֔הוּ
('2_Kings', 19, 1, 1) וַיִּקְרַ֖ע אֶת־בְּגָדָ֑יו
('2_Kings', 19, 1, 2) וַיִּתְכַּ֣ס בַּשָּׂ֔ק
('2_Kings', 19, 1, 3) וַיָּבֹ֖א בֵּ֥ית יְהוָֽה׃
('2_Kings', 19, 2, 0) וַ֠יִּשְׁלַח אֶת־אֶלְיָקִ֨ים אֲשֶׁר־עַל־הַבַּ֜יִת וְשֶׁבְנָ֣א הַסֹּפֵ֗ר וְאֵת֙ זִקְנֵ֣י הַכֹּֽהֲנִ֔ים מִתְכַּסִּ֖ים בַּשַּׂקִּ֑ים אֶל־יְשַֽׁעְיָ֥הוּ הַנָּבִ֖יא בֶּן־אָמֹֽוץ׃
We compare all sentences to each other.
The next cell performs the pairwise comparison by calling the Levenshtein function ratio for each pair.
The result is stored in the dictionary chunk_dist, keyed by a pair of indices in the chunks list and valued by the similarity of that pair.
Here is more information about the Levenshtein module.
info('Comparing sentences and filtering the similar ones')
chunk_dist = {}
total_chunks = len(chunks)
for i in range(total_chunks):
    c_i = chunk_data[i]
    for j in range(i + 1, total_chunks):
        c_j = chunk_data[j]
        chunk_dist[(i, j)] = round(100 * ratio(c_i, c_j))
info('Done: {} distances'.format(len(chunk_dist)))
42s Comparing sentences and filtering the similar ones
43s Done: 234270 distances
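To make the similarity measure less of a black box, here is a pure-Python sketch of a Levenshtein-style ratio. It rests on an assumption about the library: as far as we understand it, ratio in python-Levenshtein equals (len(a) + len(b) - d) / (len(a) + len(b)), where d is an edit distance in which insertions and deletions cost 1 and substitutions cost 2. The function names below are ours, and the library remains the authority on the actual values.

```python
def indel_distance(a, b):
    """Edit distance with insert/delete cost 1 and substitution cost 2."""
    prev = list(range(len(b) + 1))
    for (i, char_a) in enumerate(a, start=1):
        cur = [i]
        for (j, char_b) in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 2
            cur.append(min(
                prev[j] + 1,         # delete char_a
                cur[j - 1] + 1,      # insert char_b
                prev[j - 1] + cost,  # keep or substitute
            ))
        prev = cur
    return prev[-1]

def similarity_ratio(a, b):
    """Similarity between 0.0 (nothing shared) and 1.0 (identical)."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0
    return (total - indel_distance(a, b)) / total

# the notebook stores round(100 * ratio) for every sentence pair
print(round(100 * similarity_ratio('kitten', 'sitting')))
print(round(100 * similarity_ratio('ab', 'ac')))
```

The quadratic cost per pair is no problem for sentences, but for the 234270 pairs above the compiled library is clearly the better choice.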
The Levenshtein ratio of arbitrary but real texts tends to be not zero, but rather something approaching 0.5. We present an overview of how the actual similarity values of our sentence pairs are distributed.
We plot the number of comparisons against the similarities. The counts are accumulated from similarity 100 downward, so position k on the horizontal axis shows the number of pairs with a similarity of at least 100 - k.
info('Analyzing similarities')
sim_levels = collections.Counter()
for ((i,j), sim) in chunk_dist.items(): sim_levels[sim] += 1
cumsum = 0
sim_levels_cum = []
start = 0
end = 100
for sim in reversed(range(start, end+1)):
    cumsum += sim_levels.get(sim, 0)
    sim_levels_cum.append(cumsum)
cummax = sim_levels_cum[100]
x = range(len(sim_levels_cum))
fig = plt.figure(figsize=(40,20))
plt.plot(x[start:end], sim_levels_cum[start:end])
plt.axis([start, end, 0, cummax])
plt.xticks(x[start:end], range(start, end), rotation='vertical')
plt.margins(0.2)
plt.subplots_adjust(bottom=0.15)
plt.title('cumulative similarities')
plt.savefig('kings_similarities.pdf')
46s Analyzing similarities
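A small example may help to read the cumulative curve. The sketch below accumulates counts from similarity 100 downward, in the same direction as the loop in the cell above, but keys the result by the similarity value itself. The eight similarity values are made up.

```python
import collections

# made-up similarity values for eight sentence pairs
sims = [100, 100, 96, 86, 50, 50, 50, 12]
levels = collections.Counter(sims)

# at_least[s] = number of pairs whose similarity is s or higher
at_least = {}
running = 0
for sim in range(100, -1, -1):
    running += levels.get(sim, 0)
    at_least[sim] = running

for s in (100, 96, 86, 51, 50, 0):
    print(s, at_least[s])
```

The value at 0 equals the total number of pairs, which is why the last cumulative value can serve as the upper limit of the vertical axis.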
The full similarity file is here (8 MB). It is a tab separated file with 9 columns: 4 for book, chapter, verse, sentence number of the first sentence, another 4 for the second sentence, and the last column holds the similarity of both sentences. See the code below.
info('Writing similarities to disk')
field_template = ('{}\t' * 8) + '{}\n'
with open(SIMILAR_FILE, 'w') as f:
    f.write(field_template.format('book_1', 'chap_1', 'verse_1', 'sen_1', 'book_2', 'chap_2', 'verse_2', 'sen_2', 'sim'))
    for ((i,j), sim) in sorted(chunk_dist.items()):
        f.write(field_template.format(*chunks[i], *chunks[j], sim))
info('Done')
51s Writing similarities to disk
51s Done
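Reading the file back is straightforward with the csv module. The sketch below works on a tiny in-memory sample with the same nine-column layout; the two data rows and their similarity values are invented, and the helper similar_pairs is ours. Columns are addressed by position, so the exact header names do not matter.

```python
import csv
import io

# In-memory sample in the 9-column layout; the similarity values are invented.
sample = (
    'book_1\tchap_1\tverse_1\tsen_1\tbook_2\tchap_2\tverse_2\tsen_2\tsim\n'
    '2_Kings\t19\t1\t0\t2_Kings\t19\t1\t1\t43\n'
    '2_Kings\t19\t1\t0\tIsaiah\t37\t1\t0\t100\n'
)

def similar_pairs(handle, threshold):
    """Yield (sentence_1, sentence_2, sim) for rows with sim >= threshold."""
    reader = csv.reader(handle, delimiter='\t')
    next(reader)  # skip the header row
    for row in reader:
        first = (row[0], int(row[1]), int(row[2]), int(row[3]))
        second = (row[4], int(row[5]), int(row[6]), int(row[7]))
        sim = int(row[8])
        if sim >= threshold:
            yield (first, second, sim)

pairs = list(similar_pairs(io.StringIO(sample), 60))
print(pairs)
```

For the real file, replace the StringIO object by an open file handle.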
We are going to construct a verse-by-verse table containing all parallels in text with difference markup. Red and green indicate that the material is absent at the other side. Yellow means that there is corresponding material at both sides but different.
For each verse we also give a list of the lexemes that are not shared by the two verses in that comparison.
The table will be formatted in HTML. Therefore we specify a stylesheet.
css = '''
<style type="text/css">
table.t {
width: 100%;
border-collapse: collapse;
}
table.h {
direction: rtl;
}
table.p {
direction: ltr;
}
tr.t.tb {
border-top: 2px solid #aaaaaa;
border-left: 2px solid #aaaaaa;
border-right: 2px solid #aaaaaa;
}
tr.t.bb {
border-bottom: 2px solid #aaaaaa;
border-left: 2px solid #aaaaaa;
border-right: 2px solid #aaaaaa;
}
th.t {
font-family: Verdana, Arial, sans-serif;
font-size: large;
vertical-align: middle;
text-align: center;
padding-left: 2em;
padding-right: 2em;
padding-top: 1ex;
padding-bottom: 2ex;
border-left: 2px solid #aaaaaa;
border-right: 2px solid #aaaaaa;
}
td.t {
border-left: 2px solid #aaaaaa;
border-right: 2px solid #aaaaaa;
padding-left: 1em;
padding-right: 1em;
padding-top: 0.3ex;
padding-bottom: 0.5ex;
}
td.h {
font-family: Ezra SIL, SBL Hebrew, Verdana, sans-serif;
font-size: large;
line-height: 1.6;
text-align: right;
direction: rtl;
}
td.ld {
font-family: Ezra SIL, SBL Hebrew, Verdana, sans-serif;
font-size: medium;
line-height: 1.2;
text-align: right;
vertical-align: top;
direction: rtl;
width: 10%;
}
td.p {
font-family: Verdana, sans-serif;
font-size: large;
line-height: 1.3;
text-align: left;
direction: ltr;
}
td.vl {
font-family: Verdana, Arial, sans-serif;
font-size: small;
text-align: right;
vertical-align: top;
color: #aaaaaa;
width: 5%;
direction: ltr;
border-left: 2px solid #aaaaaa;
border-right: 2px solid #aaaaaa;
padding-left: 0.4em;
padding-right: 0.4em;
padding-top: 0.3ex;
padding-bottom: 0.5ex;
}
span.m {
background-color: #aaaaff;
}
span.f {
background-color: #ffaaaa;
}
span.x {
background-color: #ffffaa;
color: #bb0000;
}
span.delete {
background-color: #ffaaaa;
}
span.insert {
background-color: #aaffaa;
}
span.replace {
background-color: #ffff00;
}
</style>
'''
html_file_tpl = '''<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>{}</title>
{}
</head>
<body>
{}
</body>
</html>'''
The following auxiliary functions help to wrap the text into HTML.
def lex_diff(c1, c2):
    v1 = T.nodeFromSection(c1, lang=LANG)
    v2 = T.nodeFromSection(c2, lang=LANG)
    lex1 = {F.lex_utf8.v(w) for w in L.d(v1, 'word')}
    lex2 = {F.lex_utf8.v(w) for w in L.d(v2, 'word')}
    return (lex1-lex2, lex2-lex1)

compare_lexemes = {}
for (c1, c2, r) in crossrefs:
    compare_lexemes[(c1, c2)] = lex_diff(c1, c2)

def print_label(vl, without_book=True):
    bookrep = '' if without_book else '{} '.format(vl[0])
    return '{}{}:{}'.format(bookrep, vl[1], vl[2]) if vl[0] != '' else ''
def print_diff(a, b):
    arep = ''
    brep = ''
    for (lb, ai, aj, bi, bj) in SequenceMatcher(isjunk=None, a=a, b=b, autojunk=False).get_opcodes():
        if lb == 'equal':
            arep += a[ai:aj]
            brep += b[bi:bj]
        elif lb == 'delete':
            arep += '<span class="{}">{}</span>'.format(lb, a[ai:aj])
        elif lb == 'insert':
            brep += '<span class="{}">{}</span>'.format(lb, b[bi:bj])
        else:
            arep += '<span class="{}">{}</span>'.format(lb, a[ai:aj])
            brep += '<span class="{}">{}</span>'.format(lb, b[bi:bj])
    return (arep, brep)

def get_vtext(v, hp):
    return ''.join('{}'.format(T.text(L.d(v, 'word'), fmt='text-orig-full' if hp == 'h' else 'text-phono-full')))
def print_chunk(v1, v2, hp):
    vn1 = T.nodeFromSection(v1, lang=LANG)
    vn2 = T.nodeFromSection(v2, lang=LANG)
    text1 = get_vtext(vn1, hp)
    text2 = get_vtext(vn2, hp)
    (lexdiff1, lexdiff2) = lex_diff(v1, v2)
    (line1, line2) = print_diff(text1, text2)
    return '''
<tr class="t tb">
<td class="vl">{b1}</td>
<td class="t {hp}">{l1}</td>
<td class="t ld"><span class="delete">{ld1}</span></td>
<td class="t ld"><span class="insert">{ld2}</span></td>
<td class="t {hp}">{l2}</td>
<td class="vl">{b2}</td>
</tr>
'''.format(
b1=print_label(v1, without_book=False),
l1=line1,
ld1=' '.join(sorted(lexdiff1)),
ld2=' '.join(sorted(lexdiff2)),
b2=print_label(v2, without_book=False),
l2=line2,
hp=hp,
)
def print_passage(cmp_list, hp):
    result = []
    for item in cmp_list:
        result.append(print_chunk(item[0], item[1], hp))
    return '\n'.join(result)

def get_lex_summ(book, my_own_lex):
    result = []
    for (lex, n) in sorted(my_own_lex[book].items(), key=lambda x: (-x[1], x[0])):
        result.append('<span class="ld">{}</span> {}<br/>'.format(lex, n))
    return '\n'.join(result)
def print_lexeme_summary(book1, book2, my_own_lex):
    # the keyword names must match the {ld1} and {ld2} placeholders in the template
    return '''
<tr class="t tb">
<td class="vl"> </td>
<td class="t"> </td>
<td class="t ld"><span class="delete">{ld1}</span></td>
<td class="t ld"><span class="insert">{ld2}</span></td>
<td class="t"> </td>
<td class="vl"> </td>
</tr>
'''.format(
        ld1=get_lex_summ(book1, my_own_lex),
        ld2=get_lex_summ(book2, my_own_lex),
    )
def print_table(hp):
    result = '''
<table class="t {}">
'''.format(hp)
    result += print_passage(sorted(crossrefs), hp)
    result += '''
</table>
'''
    return result
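The difference markup in print_diff relies on the opcodes of difflib.SequenceMatcher. To see what these opcodes do, here is a toy version that marks deletions, insertions and replacements with plain brackets instead of HTML spans. The function mark_diff and the bracket notation are ours, for illustration only.

```python
from difflib import SequenceMatcher

def mark_diff(a, b):
    """Return both strings with bracket markup for their differences."""
    a_rep = ''
    b_rep = ''
    matcher = SequenceMatcher(isjunk=None, a=a, b=b, autojunk=False)
    for (tag, a_lo, a_hi, b_lo, b_hi) in matcher.get_opcodes():
        if tag == 'equal':
            a_rep += a[a_lo:a_hi]
            b_rep += b[b_lo:b_hi]
        elif tag == 'delete':    # material only present in a
            a_rep += '[-{}]'.format(a[a_lo:a_hi])
        elif tag == 'insert':    # material only present in b
            b_rep += '[+{}]'.format(b[b_lo:b_hi])
        else:                    # 'replace': different material on both sides
            a_rep += '[~{}]'.format(a[a_lo:a_hi])
            b_rep += '[~{}]'.format(b[b_lo:b_hi])
    return (a_rep, b_rep)

print(mark_diff('abcd', 'abd'))
print(mark_diff('abd', 'abcd'))
print(mark_diff('abc', 'axc'))
```

In the notebook the three kinds of brackets correspond to the CSS classes delete, insert and replace, hence the red, green and yellow backgrounds.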
And here we put everything together and produce the html (in fully pointed Hebrew and in a phonetic transcription) and save it to file.
html_text_h = html_file_tpl.format(
'2 Kings 19-25 and parallels [Hebrew]',
css,
print_table('h'),
)
html_text_p = html_file_tpl.format(
'2 Kings 19-25 and parallels [phonetic]',
css,
print_table('p'),
)
ht = open('kings_parallels_h.html', 'w')
ht.write(html_text_h)
ht.close()
ht = open('kings_parallels_p.html', 'w')
ht.write(html_text_p)
ht.close()
Here we read the transcribed text of 1QIsaa and store it in the variable qisa.
info('reading 1QIsaa')
qf = open(QISA_FILE)
qisa = collections.defaultdict(lambda: collections.defaultdict(lambda: []))
nwords = 0
for line in qf:
    nwords += 1
    (passage, word, xword) = line.strip().split()
    (chapter, verse) = passage.split(',')
    qisa[int(chapter)][int(verse)].append(Transcription.to_hebrew_x(word))
qf.close()
info('{} words in {} chapters in {} verses'.format(nwords, len(qisa), sum(len(qisa[x]) for x in qisa)))
print(' '.join(qisa[1][1]))
1m 02s reading 1QIsaa
1m 02s 16862 words in 66 chapters in 1290 verses
חזונ ישׁעיהו בנ אמוצ אשׁר חזה על יהודה וירושׁלמ ביומי עוזיה יותמ אחז יחזקיה מלכי יהודה
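The reading code above expects three whitespace-separated fields per line: a passage of the form chapter,verse and two word fields. The sketch below parses a few lines of that shape into the same nested structure, without the conversion to Hebrew script. The sample lines are hypothetical and only mimic the layout; they are not quoted from the actual file.

```python
import collections

# Hypothetical lines in the layout "chapter,verse word xword".
sample_lines = [
    '1,1 XZWN XZWN',
    '1,1 JC<JHW JC<JHW',
    '1,2 CM<W CM<W',
]

def read_words(lines):
    """Group the first word field by chapter and verse; skip malformed lines."""
    words = collections.defaultdict(lambda: collections.defaultdict(list))
    n_words = 0
    for line in lines:
        fields = line.strip().split()
        if len(fields) != 3:
            continue
        (passage, word, xword) = fields
        (chapter, verse) = passage.split(',')
        words[int(chapter)][int(verse)].append(word)
        n_words += 1
    return (words, n_words)

(words, n_words) = read_words(sample_lines)
n_verses = sum(len(words[ch]) for ch in words)
print(n_words, len(words), n_verses)
```

Unlike the cell above, this sketch skips lines that do not have exactly three fields instead of raising an error, which is a design choice of the example, not of the notebook.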
Here are the functions to make comparisons with the Qumran (1QIsaa) text.
Note that the Qumran text is unpointed, so we compare it with an unpointed representation of the Masoretic text. We also strip the marks from the s(h)in letter, so that we ignore any distinction between sin and shin in both sources.
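As an aside, the idea of an unpointed representation can be shown with the standard library alone. The sketch below decomposes the text and drops all nonspacing marks (vowel points, accents, shin and sin dots). This is an alternative illustration, not the method of this notebook, which asks Text-Fabric for the plain text format and replaces the two precomposed shin characters. The function name unpoint is ours.

```python
import unicodedata

def unpoint(text):
    """Strip vowel points, accents and shin/sin dots from Hebrew text."""
    # NFD splits precomposed letters such as U+FB2A into base letter + mark
    decomposed = unicodedata.normalize('NFD', text)
    # category Mn = nonspacing mark; base letters and punctuation are kept
    return ''.join(ch for ch in decomposed if unicodedata.category(ch) != 'Mn')

# shin + qamats + shin dot, lamed, waw + holam, final mem
pointed = '\u05E9\u05B8\u05C1\u05DC\u05D5\u05B9\u05DD'
print(unpoint(pointed))

# precomposed shin with shin dot and shin with sin dot both become a bare shin
print(unpoint('\uFB2A') == unpoint('\uFB2B'))
```

Maqef and sof pasuq are punctuation rather than marks, so they survive this function and still have to be replaced separately, as the code below does.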
wh = '10px'
wn = '10px'
ww = '200px'
diffhead = '''
<head>
<meta http-equiv="Content-Type"
content="text/html; charset=UTF-8" />
<title></title>
<style type="text/css">
table.diff {{
font-family: Ezra SIL, SBL Hebrew, Verdana, sans-serif;
font-size: large;
text-align: right;
width: 100%;
}}
td, th {{padding-left: 4px; padding-right: 4px}}
td[nowrap] {{width: {ww}; min-width: {ww}; max-width: {ww}; }}
th.diff_next {{width: {wn}; min-width: {wn}; max-width: {wn}; }}
td.diff_next {{text-align:left; font-size: small; width: {wn}; min-width: {wn}; max-width: {wn}; }}
.diff_next {{background-color:#c0c0c0; font-size: small}}
th.diff_header {{width: {wh}; min-width: {wh}; max-width: {wh}; }}
td.diff_header {{text-align:left; width: {wh}; min-width: {wh}; max-width: {wh}; }}
.diff_header {{background-color:#e0e0e0; font-size: small}}
.diff_add {{background-color:#aaffaa}}
.diff_chg {{background-color:#ffff77}}
.diff_sub {{background-color:#ffaaaa}}
</style>
</head>
'''.format(wh=wh, wn=wn, ww=ww)
def shin(x):
    # replace shin-with-shin-dot and shin-with-sin-dot by a bare shin
    return x.replace('\uFB2A', 'ש').replace('\uFB2B', 'ש')
def lines_chapter_mt(ch):
    vn = T.nodeFromSection(('Isaiah', ch, 1), lang=LANG)
    cn = L.u(vn, 'chapter')[0]
    lines = []
    for v in L.d(cn, 'verse'):
        vl = F.verse.v(v)
        text = T.text(L.d(v, 'word'), fmt='text-orig-plain').\
            replace('\u05BE',' ').\
            replace('\u05C3', '') # maqef and sof pasuq
        #lines.append('{} {}'.format(vl, text))
        lines.append(text)
    return lines

def lines_chapter_1q(ch):
    lines = []
    for v in qisa[ch]:
        text = ' '.join(qisa[ch][v])
        #lines.append('{} {}'.format(v, shin(text.strip())))
        lines.append(shin(text.strip()))
    return lines
def compare_chapters(c1, c2, lb1, lb2, head=True):
    dh = difflib.HtmlDiff(wrapcolumn=50)
    table_html = dh.make_table(
        c1,
        c2,
        fromdesc=lb1,
        todesc=lb2,
        context=False,
        numlines=5,
    )
    htext = '''<html>{}<body>{}</body></html>'''.format(diffhead, table_html) if head else table_html
    return htext

def mt1q_chapter_diff(ch, head=True):
    lines_mt = lines_chapter_mt(ch)
    lines_1q = lines_chapter_1q(ch)
    return compare_chapters(
        lines_mt,
        lines_1q,
        'Isaiah {} MT'.format(ch),
        'Isaiah {} 1QIsa<sup>a</sup>'.format(ch),
        head=head,
    )
And next we produce the actual html results.
info('Writing chapter diffs')
for ch in range(37,40):
    ht = open('Isaiah-mt-1QIsaa_{}.html'.format(ch), 'w')
    ht.write(mt1q_chapter_diff(ch))
    ht.close()
# Now the whole of Isaiah
info('Writing whole Isaiah')
ht = open('Isaiah-mt-1QIsaa.html', 'w')
ht.write('''<html>{}<body>'''.format(diffhead))
for ch in range(1, 67):
    ht.write(mt1q_chapter_diff(ch, head=False))
ht.write('''</body></html>''')
ht.close()
info('Done')
1m 07s Writing chapter diffs
1m 07s Writing whole Isaiah
1m 10s Done
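The chapter comparisons are produced by difflib.HtmlDiff from the Python standard library. For a quick impression of what its make_table method returns, here is a minimal call on two short lists of lines. The sample lines and the labels are made up.

```python
import difflib

# two tiny "chapters" of two lines each; the first line differs in one letter
lines_a = ['abc def', 'ghi']
lines_b = ['abc dxf', 'ghi']

differ = difflib.HtmlDiff(wrapcolumn=50)
table = differ.make_table(
    lines_a,
    lines_b,
    fromdesc='left',
    todesc='right',
    context=False,  # show all lines, not only the changed ones
)

# the result is a single HTML <table> that uses the diff_* CSS classes
print(table[:60])
print('diff_chg' in table)
```

The returned table carries no styling of its own, which is why the notebook wraps it in a page with the diffhead stylesheet.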