#!/usr/bin/env python
# coding: utf-8

# ## Collating for real with CollateX (2)

# Here we repeat the steps from the previous exercise with a new, slightly more complicated set of texts. You can create a new notebook for this exercise and follow the instructions below.
#
# We will be using different editions of Virginia Woolf's "To the Lighthouse":
# - USA = New York: Harcourt, Brace & Company, 1927 (1st USA edition)
# - UK = London: R & R Clark Limited, 1927 (1st UK edition)
# - EM (EVERYMAN) = London: J. M. Dent & Sons LTD, 1938 (reprint 1952)
#
# The facsimiles and transcriptions of the editions are available at http://woolfonline.com/

# ### First exercise
# Try to reproduce what you have done with the Darwin text.

# Import the *collatex* Python library

# In[2]:


from collatex import *


# Create a collation object

# In[3]:


collation = Collation()


# Now open the texts in "../fixtures/Woolf/Lighthouse-1" and let Python read them

# In[4]:


witness_USA = open( "../fixtures/Woolf/Lighthouse-1/Lighthouse-1-USA.txt", encoding='utf-8' ).read()
witness_UK = open( "../fixtures/Woolf/Lighthouse-1/Lighthouse-1-UK.txt", encoding='utf-8' ).read()
witness_EM = open( "../fixtures/Woolf/Lighthouse-1/Lighthouse-1-EM.txt", encoding='utf-8' ).read()


# Add them to the CollateX instance as witnesses

# In[5]:


collation.add_plain_witness( "USA", witness_USA )
collation.add_plain_witness( "UK", witness_UK )
collation.add_plain_witness( "EM", witness_EM )


# Align, using the HTML output option

# In[6]:


alignment_table = collate(collation, layout='vertical', output='html')


# If you want to know more about this text:
#
# Look at the sources
# - USA, p. 14
# http://woolfonline.com/?node=content/text/transcriptions&project=1&parent=2&taxa=19&content=2817&pos=15
# - UK, pp. 16-17
# http://woolfonline.com/?node=content/text/transcriptions&project=1&parent=2&taxa=20&content=3139&pos=19
# - EVERYMAN, p. 7
# http://woolfonline.com/?node=content/text/transcriptions&project=1&parent=2&taxa=22&content=3804&pos=24
#
# Start thinking about how to handle situations like the following (a proof with corrections in the margin)
# - http://woolfonline.com/?node=content/text/transcriptions&project=1&parent=2&taxa=18&content=4172&pos=14
#

# ### Second exercise
# In the second exercise, repeat the previous steps, now using the texts in "../fixtures/Woolf/Lighthouse-2" and visualizing the output with the more sophisticated HTML option (HTML2).
#

# In[6]:


collation = Collation()
witness_USA = open( "../fixtures/Woolf/Lighthouse-2/Lighthouse-2-USA.txt", encoding='utf-8' ).read()
witness_UK = open( "../fixtures/Woolf/Lighthouse-2/Lighthouse-2-UK.txt", encoding='utf-8' ).read()
witness_EM = open( "../fixtures/Woolf/Lighthouse-2/Lighthouse-2-EM.txt", encoding='utf-8' ).read()
collation.add_plain_witness( "USA", witness_USA )
collation.add_plain_witness( "UK", witness_UK )
collation.add_plain_witness( "EM", witness_EM )
alignment_table = collate(collation, output='html2')


# ### What's next?
#
# Look at the results above and think about what you want to take into account in your collation.
#
# - Do you want, for example, to take punctuation into account? We will talk about how to prepare and normalize the witnesses in Unit 6.
#
# - What if you include a messy manuscript as one of the witnesses for the collation? Something like this:
# http://woolfonline.com/?node=content/text/transcriptions&project=1&parent=6&taxa=26&content=5473&pos=8&search=Not%20a%20breath%20of%20wind%20blew&exact
# Do you recognize it? It is another witness of the text used in Exercise 2. In order to record *all the text* on this page, scholars use markup, and in particular XML-TEI. We will talk about collation of XML documents in Unit 7.
#

# In[ ]:
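

# As a first taste of the punctuation question raised above, here is a minimal sketch of one possible normalization: removing punctuation before comparing two readings. It is only an illustration, not the method presented in Unit 6. The helper name `strip_punctuation` is hypothetical (it is not part of CollateX), and the two sample strings are invented for the example; they are not taken from the Woolf fixtures.

```python
import re
import string


def strip_punctuation(text):
    """Return text without ASCII punctuation and with whitespace collapsed.

    Hypothetical helper for illustration only; not part of CollateX.
    Note that string.punctuation covers ASCII characters only, so curly
    quotes or long dashes in a transcription would be left untouched.
    """
    # Delete every ASCII punctuation character.
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    # Collapse the runs of whitespace that the deletion may leave behind.
    return re.sub(r"\s+", " ", no_punct).strip()


# Invented sample readings that differ only in punctuation.
reading_a = "The sea, the sky; the lighthouse."
reading_b = "The sea the sky the lighthouse"

# After normalization the two readings compare as equal.
print(strip_punctuation(reading_a) == strip_punctuation(reading_b))
```

# Whether to discard punctuation at all is an editorial decision: in some traditions a changed comma is a significant variant. A function like this one only shows what "ignoring punctuation" could mean in practice.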