#!/usr/bin/env python
# coding: utf-8

# ## Collating for real with CollateX (2)
# Here we repeat the steps from the previous exercise with a new and slightly more complicated test case. You can create a new notebook for this exercise and follow the instructions below.
# 
# We will be using different editions of Virginia Woolf's "To the Lighthouse":
# - USA = New York: Harcourt, Brace & Company, 1927 (1st USA edition)
# - UK = London: R & R Clark Limited, 1927 (1st UK edition)
# - EM (EVERYMAN) = London: J. M. Dent & Sons LTD, 1938 (reprint 1952)
# 
# The facsimiles and transcriptions of the editions are available at http://woolfonline.com/

# ### First exercise
# Try to reproduce what you have done with the Darwin text.

# Import the *collatex* Python library

# In[2]:


from collatex import *


# Create a collation object

# In[3]:


collation = Collation()


# Now open the texts in "../fixtures/Woolf/Lighthouse-1" and let Python read them

# In[4]:


witness_USA = open( "../fixtures/Woolf/Lighthouse-1/Lighthouse-1-USA.txt", encoding='utf-8' ).read()
witness_UK = open( "../fixtures/Woolf/Lighthouse-1/Lighthouse-1-UK.txt", encoding='utf-8' ).read()
witness_EM = open( "../fixtures/Woolf/Lighthouse-1/Lighthouse-1-EM.txt", encoding='utf-8' ).read()
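

# The three `open(...).read()` calls above never close their file handles. A minimal sketch of an alternative, assuming the same file-naming pattern (`<prefix>-<SIGLUM>.txt`). The helper name `read_witnesses` is hypothetical and not part of CollateX.

# In[ ]:


from pathlib import Path


def read_witnesses(directory, prefix, sigla):
    """Return {siglum: text} for files named '<prefix>-<siglum>.txt' in directory."""
    texts = {}
    for siglum in sigla:
        path = Path(directory) / f"{prefix}-{siglum}.txt"
        # read_text opens, reads and closes the file in one step
        texts[siglum] = path.read_text(encoding="utf-8")
    return texts


# Usage (commented out so the cell also runs without the fixture files):
# witnesses = read_witnesses("../fixtures/Woolf/Lighthouse-1", "Lighthouse-1", ["USA", "UK", "EM"])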


# Add them to the CollateX instance as witnesses

# In[5]:


collation.add_plain_witness( "USA", witness_USA )
collation.add_plain_witness( "UK", witness_UK )
collation.add_plain_witness( "EM", witness_EM )


# Align, using the HTML output option

# In[6]:


alignment_table = collate(collation, layout='vertical', output='html')


# If you want to know more about this text:
# 
# Look at the sources
# - USA, p. 14
# http://woolfonline.com/?node=content/text/transcriptions&project=1&parent=2&taxa=19&content=2817&pos=15
# - UK, pp. 16-17
# http://woolfonline.com/?node=content/text/transcriptions&project=1&parent=2&taxa=20&content=3139&pos=19
# - EVERYMAN, p. 7
# http://woolfonline.com/?node=content/text/transcriptions&project=1&parent=2&taxa=22&content=3804&pos=24
# 
# Start thinking about how to handle situations like the following (a proof with corrections in the margin)
# - http://woolfonline.com/?node=content/text/transcriptions&project=1&parent=2&taxa=18&content=4172&pos=14
# 

# ### Second exercise
# In the second exercise, repeat the previous steps, now using the texts at "../fixtures/Woolf/Lighthouse-2" and visualizing the output with the more sophisticated HTML option (HTML2).
# 

# In[6]:


collation = Collation()
witness_USA = open( "../fixtures/Woolf/Lighthouse-2/Lighthouse-2-USA.txt", encoding='utf-8' ).read()
witness_UK = open( "../fixtures/Woolf/Lighthouse-2/Lighthouse-2-UK.txt", encoding='utf-8' ).read()
witness_EM = open( "../fixtures/Woolf/Lighthouse-2/Lighthouse-2-EM.txt", encoding='utf-8' ).read()
collation.add_plain_witness( "USA", witness_USA )
collation.add_plain_witness( "UK", witness_UK )
collation.add_plain_witness( "EM", witness_EM )
alignment_table = collate(collation, output='html2')
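

# The HTML views are meant for reading. To inspect the variation programmatically, `collate` can also return JSON (`output='json'`). The sketch below assumes that the JSON holds a "witnesses" list and a "table" with one row per witness, where each cell is either a list of token dicts with a "t" key or null for a gap. Check this against the output of your CollateX version. The helper name `variant_columns` and the toy data are hypothetical.

# In[ ]:


import json


def variant_columns(alignment):
    """Return the indices of the columns in which the witnesses do not all agree."""
    rows = alignment["table"]
    variants = []
    # zip(*rows) turns the per-witness rows into per-position columns
    for index, column in enumerate(zip(*rows)):
        # Join the token strings of each cell; a gap (None) counts as an empty reading
        readings = {
            "".join(token["t"] for token in cell).strip() if cell else ""
            for cell in column
        }
        if len(readings) > 1:
            variants.append(index)
    return variants


# Toy data in the assumed shape. With real witnesses you would use
# alignment = json.loads(collate(collation, output='json'))
toy_alignment = json.loads("""
{"witnesses": ["A", "B"],
 "table": [[[{"t": "the same "}], [{"t": "extra words"}]],
           [[{"t": "the same "}], null]]}
""")
print(variant_columns(toy_alignment))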


# ### What's next?
# 
# Look at the results above and think about what you want to consider in your collation.
# 
# - Do you want, for example, to consider punctuation? We will talk about how to prepare and normalize the witnesses in Unit 6.
# 
# - What if one of the witnesses for the collation is a messy manuscript, something like this:
# http://woolfonline.com/?node=content/text/transcriptions&project=1&parent=6&taxa=26&content=5473&pos=8&search=Not%20a%20breath%20of%20wind%20blew&exact
# Do you recognize it? It is another witness of the text used in Exercise 2. To record *all the text* on this page, scholars use markup, in particular XML-TEI. We will talk about collating XML documents in Unit 7.
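
# As a first taste of the punctuation question, here is a minimal sketch that lowercases a witness and strips its punctuation before it is added to the collation. It uses only the standard library. The helper name `normalize_witness` and the sample sentence are made up for illustration, and the approach is deliberately crude (an apostrophe inside a word splits it in two). Unit 6 covers normalization properly.

# In[ ]:


import re


def normalize_witness(text):
    """Lowercase text, replace punctuation with spaces and collapse whitespace."""
    # [^\w\s] matches any character that is neither a word character nor whitespace
    without_punctuation = re.sub(r"[^\w\s]", " ", text)
    return " ".join(without_punctuation.lower().split())


print(normalize_witness('"Yes, of course," said the reader.'))
# The normalized string could then be passed to collation.add_plain_witness
# in place of the raw text read from the file.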