To get started: consult start

# Export to Excel¶

In a notebook, you can perform searches and view them in a tabular display and zoom in on items with pretty displays.

But there are times that you want to take your results outside Text-Fabric, outside a notebook, outside Python, and just work with them in other programs, such as Excel.

You want to do that not only with query results, but with all kinds of lists of tuples of nodes.

There is a function for that, A.export(), and here we show what it can do.

In [1]:
%load_ext autoreload

In [2]:
import os
from tf.app import use

In [3]:
A = use('oldbabylonian:clone', checkout="clone", hoist=globals())
# A = use('oldbabylonian', hoist=globals())

TF-app: ~/github/annotation/app-oldbabylonian/code
data: ~/github/Nino-cunei/oldbabylonian/tf/1.0.6
Text-Fabric API: names N F E L T S C TF directly usable

# Inspect the contents of a file¶

We write a function that can peek into file on your system, and show the first few lines. We'll use it to inspect the exported files that we are going to produce.

In [4]:
EXPORT_FILE = os.path.expanduser('~/Downloads/results.tsv')
UPTO = 10

def checkout():
with open(EXPORT_FILE, encoding='utf_16') as fh:
for (i, line) in enumerate(fh):
if i >= UPTO:
break
print(line.rstrip('\n'))


# Encoding¶

Our exported .tsv files open in Excel without hassle, even if they contain non-latin characters. That is because TF writes such files in an encoding that works well with Excel: utf_16_le. You can just open them in Excel, there is no need for conversion before or after opening these files.

Should you want to process these files by means of a (Python) program, take care to read them with encoding utf_16.

# Example query¶

We first run a query in order to export the results.

In [5]:
query = '''
line ln<4
'''
results = A.search(query)

  0.62s 1052 results


# Bare export¶

You can export the table of results to Excel.

The following command writes a tab-separated file results.tsv to your downloads directory.

You can specify arguments toDir=directory and toFile=file name to write to a different file. If the directory does not exist, it will be created.

We stick to the default, however.

In [6]:
A.export(results)


Check out the contents:

In [7]:
checkout()

R	S1	S2	S3	NODE1	TYPE1	TEXT1	ln1	NODE2	TYPE2	TEXT2	reading2	NODE3	TYPE3	TEXT3	reading3	NODE4	TYPE4	TEXT4	reading4
1	P509373	obverse	3	230790	line	um-ma _{d}en-lil2_-sza-du-u2-ni-ma	3	11	sign	um-	um	12	sign	ma 	ma	20	sign	ma	ma
2	P509374	obverse	3	230826	line	um-[ma] _dingir_ x zi x [...] ma	3	359	sign	um-	um	360	sign	[ma] 	ma	366	sign	ma	ma
3	P509375	obverse	3	230844	line	[um-ma x]-na-su-[x]-ma	3	509	sign	[um-	um	510	sign	ma 	ma	515	sign	ma	ma
4	P481190	obverse	3	230916	line	um-ma nu#-ur2#-i3-li2-szu-ma	3	1123	sign	um-	um	1124	sign	ma 	ma	1130	sign	ma	ma
5	P510526	obverse	3	231004	line	um-ma geme2-{d}utu-ma	3	1686	sign	um-	um	1687	sign	ma 	ma	1691	sign	ma	ma
6	P510527	obverse	3	231022	line	um-ma i-lu-ni-ma	3	1808	sign	um-	um	1809	sign	ma 	ma	1813	sign	ma	ma
7	P510528	obverse	3	231055	line	[um]-ma# i-lu-ni-ma	3	2161	sign	[um]-	um	2162	sign	ma# 	ma	2166	sign	ma	ma
8	P510529	obverse	3	231068	line	um-ma i-lu-ni-ma	3	2272	sign	um-	um	2273	sign	ma 	ma	2277	sign	ma	ma
9	P510530	obverse	3	231088	line	[um-ma {d}na-bi]-um#-na-s,i-ir-ma	3	2401	sign	[um-	um	2402	sign	ma 	ma	2410	sign	ma	ma


You see the following columns:

• R the sequence number of the result tuple in the result list
• S1 S2 S3 the section as P-number, face, line number, in separate columns
• NODEi TYPEi the node and its type, for each node i in the result tuple
• TEXTi the full text of node i, if the node type admits a concise text representation
• reading2-4 the value of feature reading, since our query mentions the feature reading on nodes 2-4

# Poorer exports¶

If you do not need the full text of the lines, you can leave them out by specifying a smaller condense type.

The export function provides text for all nodes whose type is not too big. What is too big is determined by the condense type.

In this corpus, the default condens type is line. Node types bigger than lines will not get text.

Now, if we change the condenseType to something smaller than line, e.g. word, the line text will be suppressed.

In [8]:
A.export(results, condenseType='word')
checkout()

R	S1	S2	S3	NODE1	TYPE1	ln1	NODE2	TYPE2	TEXT2	reading2	NODE3	TYPE3	TEXT3	reading3	NODE4	TYPE4	TEXT4	reading4
1	P509373	obverse	3	230790	line	3	11	sign	um-	um	12	sign	ma 	ma	20	sign	ma	ma
2	P509374	obverse	3	230826	line	3	359	sign	um-	um	360	sign	[ma] 	ma	366	sign	ma	ma
3	P509375	obverse	3	230844	line	3	509	sign	[um-	um	510	sign	ma 	ma	515	sign	ma	ma
4	P481190	obverse	3	230916	line	3	1123	sign	um-	um	1124	sign	ma 	ma	1130	sign	ma	ma
5	P510526	obverse	3	231004	line	3	1686	sign	um-	um	1687	sign	ma 	ma	1691	sign	ma	ma
6	P510527	obverse	3	231022	line	3	1808	sign	um-	um	1809	sign	ma 	ma	1813	sign	ma	ma
7	P510528	obverse	3	231055	line	3	2161	sign	[um]-	um	2162	sign	ma# 	ma	2166	sign	ma	ma
8	P510529	obverse	3	231068	line	3	2272	sign	um-	um	2273	sign	ma 	ma	2277	sign	ma	ma
9	P510530	obverse	3	231088	line	3	2401	sign	[um-	um	2402	sign	ma 	ma	2410	sign	ma	ma


# Richer exports¶

If we want to see the text in another format, we can specify it:

In [9]:
A.export(results, fmt='text-orig-unicode')
checkout()

R	S1	S2	S3	NODE1	TYPE1	TEXT1	ln1	NODE2	TYPE2	TEXT2	reading2	NODE3	TYPE3	TEXT3	reading3	NODE4	TYPE4	TEXT4	reading4
1	P509373	obverse	3	230790	line	𒌝𒈠 𒀭𒂗𒆤𒊭𒁺𒌑𒉌𒈠	3	11	sign	𒌝	um	12	sign	𒈠 	ma	20	sign	𒈠	ma
2	P509374	obverse	3	230826	line	𒌝𒈠 𒀭 x 𒍣 x … 𒈠	3	359	sign	𒌝	um	360	sign	𒈠 	ma	366	sign	𒈠	ma
3	P509375	obverse	3	230844	line	𒌝𒈠 x𒈾𒋢x𒈠	3	509	sign	𒌝	um	510	sign	𒈠 	ma	515	sign	𒈠	ma
4	P481190	obverse	3	230916	line	𒌝𒈠 𒉡𒌫𒉌𒉌𒋗𒈠	3	1123	sign	𒌝	um	1124	sign	𒈠 	ma	1130	sign	𒈠	ma
5	P510526	obverse	3	231004	line	𒌝𒈠 𒊩𒆳𒀭𒌓𒈠	3	1686	sign	𒌝	um	1687	sign	𒈠 	ma	1691	sign	𒈠	ma
6	P510527	obverse	3	231022	line	𒌝𒈠 𒄿𒇻𒉌𒈠	3	1808	sign	𒌝	um	1809	sign	𒈠 	ma	1813	sign	𒈠	ma
7	P510528	obverse	3	231055	line	𒌝𒈠 𒄿𒇻𒉌𒈠	3	2161	sign	𒌝	um	2162	sign	𒈠 	ma	2166	sign	𒈠	ma
8	P510529	obverse	3	231068	line	𒌝𒈠 𒄿𒇻𒉌𒈠	3	2272	sign	𒌝	um	2273	sign	𒈠 	ma	2277	sign	𒈠	ma
9	P510530	obverse	3	231088	line	𒌝𒈠 𒀭𒈾𒁉𒌝𒈾𒍢𒅕𒈠	3	2401	sign	𒌝	um	2402	sign	𒈠 	ma	2410	sign	𒈠	ma


If we want to export additional features, we just have to mention them. In order to do so and not change the result set, put a * behind the feature.

The * means: always true, no matter what's in the feature, even if there is nothing in there.

Lets ask for the flags on the first ma.

In [10]:
query = '''
line ln<4
'''
results = A.search(query)

  0.80s 1052 results


The same number of results.

We do the export again and peek at the results.

In [11]:
A.export(results, condenseType='word')
checkout()

R	S1	S2	S3	NODE1	TYPE1	ln1	NODE2	TYPE2	TEXT2	reading2	NODE3	TYPE3	TEXT3	flags3	reading3	NODE4	TYPE4	TEXT4	reading4
1	P509373	obverse	3	230790	line	3	11	sign	um-	um	12	sign	ma 		ma	20	sign	ma	ma
2	P509374	obverse	3	230826	line	3	359	sign	um-	um	360	sign	[ma] 		ma	366	sign	ma	ma
3	P509375	obverse	3	230844	line	3	509	sign	[um-	um	510	sign	ma 		ma	515	sign	ma	ma
4	P481190	obverse	3	230916	line	3	1123	sign	um-	um	1124	sign	ma 		ma	1130	sign	ma	ma
5	P510526	obverse	3	231004	line	3	1686	sign	um-	um	1687	sign	ma 		ma	1691	sign	ma	ma
6	P510527	obverse	3	231022	line	3	1808	sign	um-	um	1809	sign	ma 		ma	1813	sign	ma	ma
7	P510528	obverse	3	231055	line	3	2161	sign	[um]-	um	2162	sign	ma# 	#	ma	2166	sign	ma	ma
8	P510529	obverse	3	231068	line	3	2272	sign	um-	um	2273	sign	ma 		ma	2277	sign	ma	ma
9	P510530	obverse	3	231088	line	3	2401	sign	[um-	um	2402	sign	ma 		ma	2410	sign	ma	ma


As you see, you have an extra column flags3.

This gives you a lot of control over the generation of spreadsheets.

# Not from queries¶

You can also export lists of node tuples that are not obtained by a query:

In [12]:
words = F.otype.s('word')[1000:1010]
signs1 = F.otype.s('sign')[100000:100010]
signs2 = F.otype.s('sign')[200000:200010]
tuples = list(zip(words, signs1, signs2))

tuples

Out[12]:
[(259163, 100001, 200001),
(259164, 100002, 200002),
(259165, 100003, 200003),
(259166, 100004, 200004),
(259167, 100005, 200005),
(259168, 100006, 200006),
(259169, 100007, 200007),
(259170, 100008, 200008),
(259171, 100009, 200009),
(259172, 100010, 200010)]

Ten rows, each row has a word node and two sign nodes.

The word and the signs in each row do not have any meaningful relationship!

Let's do a bare export:

In [13]:
A.export(tuples)
checkout()

R	S1	S2	S3	NODE1	TYPE1	TEXT1	ln1	NODE2	TYPE2	TEXT2	reading2	NODE3	TYPE3	TEXT3	flags3	reading3
1	P510530	obverse	19	259163	word	gin2 		100001	sign	s,i-	s,i	200001	sign	ka		ka
2	P510530	obverse	19	259164	word	ku3-babbar_		100002	sign	bu-	bu	200002	sign	ki-		ki
3	P510530	obverse	20	259165	word	id-di-nu-ni-ik-ku		100003	sign	tam 	tam	200003	sign	ma 		ma
4	P510530	reverse	1	259166	word	u3 		100004	sign	ia 	ia	200004	sign	ti-		ti
5	P510530	reverse	1	259167	word	ka-ni-ik 		100005	sign	ir-	ir	200005	sign	du 		du
6	P510530	reverse	1	259168	word	szi-ma-tim 		100006	sign	szi	szi	200006	sign	ma-		ma
7	P510530	reverse	1	259169	word	ta-ad-di-in		100007	sign	i-	i	200007	sign	WI?-	?
8	P510530	reverse	2	259170	word	mi-nu-u2 		100008	sign	nu-	nu	200008	sign	i-		i
9	P510530	reverse	2	259171	word	a-wa-tum-ma 		100009	sign	ma 	ma	200009	sign	ma		ma


Wait a minute: why are the reading2 and reading3 and flags3 columns showing up?

It is because we have run a query before where we asked for these features.

If we do not want to be influenced by previous things we've run, we need to reset the display:

In [14]:
A.displayReset('tupleFeatures')


Again:

In [15]:
A.export(tuples)
checkout()

R	S1	S2	S3	NODE1	TYPE1	TEXT1	NODE2	TYPE2	TEXT2	NODE3	TYPE3	TEXT3
1	P510530	obverse	19	259163	word	gin2 	100001	sign	s,i-	200001	sign	ka
2	P510530	obverse	19	259164	word	ku3-babbar_	100002	sign	bu-	200002	sign	ki-
3	P510530	obverse	20	259165	word	id-di-nu-ni-ik-ku	100003	sign	tam 	200003	sign	ma
4	P510530	reverse	1	259166	word	u3 	100004	sign	ia 	200004	sign	ti-
5	P510530	reverse	1	259167	word	ka-ni-ik 	100005	sign	ir-	200005	sign	du
6	P510530	reverse	1	259168	word	szi-ma-tim 	100006	sign	szi	200006	sign	ma-
7	P510530	reverse	1	259169	word	ta-ad-di-in	100007	sign	i-	200007	sign	WI?-
8	P510530	reverse	2	259170	word	mi-nu-u2 	100008	sign	nu-	200008	sign	i-
9	P510530	reverse	2	259171	word	a-wa-tum-ma 	100009	sign	ma 	200009	sign	ma


# Display setup¶

When we exported query results, we could mention features in the query with a * so that they got exported. If we do not have a previous query we can achieve the same effect by specifying the desired export features per column.

The display option tupleFeatures takes care of that.

In [16]:
A.displaySetup(tupleFeatures=(
(0, 'atfpre atf atfpost'),
(1, 'symr symu'),
(2, 'symr symu'),
))


We assign extra features per member of the tuple.

In the above case:

• the first (0) member (the word node), gets features atfpre (preceding bracketing characters), atf, atfpost (subsequent clustering characters);
• the second and third member (the sign nodes), get features symr (subscripted latin text) and symu (cuneiform unicode).
In [17]:
A.export(tuples)
checkout()

R	S1	S2	S3	NODE1	TYPE1	TEXT1	atfpre1	atf1	atfpost1	NODE2	TYPE2	TEXT2	symr2	symu2	NODE3	TYPE3	TEXT3	symr3	symu3
1	P510530	obverse	19	259163	word	gin2 		gin2		100001	sign	s,i-	ṣi	𒍢	200001	sign	ka	ka	𒅗
2	P510530	obverse	19	259164	word	ku3-babbar_		ku3-babbar_		100002	sign	bu-	bu	𒁍	200002	sign	ki-	ki	𒆠
3	P510530	obverse	20	259165	word	id-di-nu-ni-ik-ku		id-di-nu-ni-ik-ku		100003	sign	tam 	tam	𒌓	200003	sign	ma 	ma	𒈠
4	P510530	reverse	1	259166	word	u3 		u3		100004	sign	ia 	ia	𒅀	200004	sign	ti-	ti	𒋾
5	P510530	reverse	1	259167	word	ka-ni-ik 		ka-ni-ik		100005	sign	ir-	ir	𒅕	200005	sign	du 	du	𒁺
6	P510530	reverse	1	259168	word	szi-ma-tim 		szi-ma-tim		100006	sign	szi	ši	𒅆	200006	sign	ma-	ma	𒈠
7	P510530	reverse	1	259169	word	ta-ad-di-in		ta-ad-di-in		100007	sign	i-	i	𒄿	200007	sign	WI?-	WI	𒉿
8	P510530	reverse	2	259170	word	mi-nu-u2 		mi-nu-u2		100008	sign	nu-	nu	𒉡	200008	sign	i-	i	𒄿
9	P510530	reverse	2	259171	word	a-wa-tum-ma 		a-wa-tum-ma		100009	sign	ma 	ma	𒈠	200009	sign	ma	ma	𒈠


# Chained queries¶

You can chain queries like this:

In [18]:
results = (
A.search('''
line
cluster type=missing
''')
+
A.search('''
line
cluster type=langalt
''')
)

  0.23s 4 results
0.23s 15 results


In such cases, it is better to setup the features yourself:

In [19]:
A.displaySetup(
tupleFeatures=(
(0, 'lnno'),
(1, 'type'),
(2, 'langalt missing'),
),
fmt='text-orig-rich',
)


Now we can do a fine export:

In [20]:
A.export(results)
checkout()

R	S1	S2	S3	NODE1	TYPE1	TEXT1	lnno1	NODE2	TYPE2	TEXT2	type2	NODE3	TYPE3	TEXT3	langalt3	missing3
1	P510754	obverse	11	235994	line	5(diš) lu₂-meš an-nu-ti-in lu₂ la-ga-aš⁼ki-meš	11	207965	cluster	5(diš) lu₂-meš an-nu-ti-	missing	39275	sign	5(diš) 		1
2	P313375	obverse	10	245175	line	1(u) 8(diš) gin₂ ku₃-babbar wa-at-ra-am	10	215367	cluster	1(u) 8(diš) 	missing	106850	sign	8(diš) 		1
3	P373035	reverse	3	255006	line	ša 5(diš) gin₂ ku₃-babbar gu₄ hi-a	3	223992	cluster	ša 5(diš) 	missing	180799	sign	5(diš) 	1	1
4	P373044	left	1:2	255237	line	1(u) 5(diš) ma-na siki hi-a ši-na	1:2	224274	cluster	1(u) 5(diš) ma-na 	missing	182906	sign	5(diš) 	1	1
5	P509373	obverse	6	230793	line	diš⁼še-ep-d⁼suen a₂-gal₂ dumu um-mi-a-meš	6	203234	cluster	d⁼suen a₂-gal₂ dumu um-mi-a-meš	langalt	47	sign	um-	1
6	P510550	obverse	4	231520	line	aš-šum ha-za-nu-um⁼sar	4	203984	cluster	ha-za-nu-um⁼sar	langalt	5899	sign	um⁼	1
7	P510550	obverse	7	231523	line	šum-ma ha-za-nu-um⁼sar la ba-aš-lu-ma	7	203987	cluster	ha-za-nu-um⁼sar 	langalt	5924	sign	um⁼	1
8	P510550	obverse	9	231525	line	ha-za-nu-um⁼sar i-ka-am x x-al-la-x-ma	9	203989	cluster	ha-za-nu-um⁼sar 	langalt	5941	sign	um⁼	1
9	P510550	reverse	1	231528	line	a-na mi-nim ha-za-nu-um⁼sar	1	203992	cluster	ha-za-nu-um⁼sar	langalt	5971	sign	um⁼	1


All chapters:

• start become an expert in creating pretty displays of your text structures
• display become an expert in creating pretty displays of your text structures
• search turbo charge your hand-coding with search templates