#!/usr/bin/env python
# coding: utf-8

# # Collate outside the notebook
# ## Python files, input files, output file
#
# ---
# - Set up a PyCharm project
# - Create a Python file
# - Run the script
#     - In PyCharm
#     - In the terminal
# - Input files
# - Output file
# - Exercise
#
# ---
#
# Here is another way to run the scripts you produced in the previous tutorials (note: although they technically mean different things, we will use the words code, script and program interchangeably). This tutorial assumes that you have already gone through the tutorials on collating plain texts ([1](http://nbviewer.jupyter.org/github/DiXiT-eu/collatex-tutorial/blob/master/unit5/1_collate-plain-text.ipynb) and [2](http://nbviewer.jupyter.org/github/DiXiT-eu/collatex-tutorial/blob/master/unit5/2_collate-plain-text.ipynb)) and on the different [Collation outputs](http://nbviewer.jupyter.org/github/DiXiT-eu/collatex-tutorial/blob/master/unit5/3_collation-outputs.ipynb). Everything we do here is also possible in a Jupyter notebook, and some sections, such as *Input files*, recap material already seen in the previous tutorials.
#
# In the [Command line tutorial](http://nbviewer.jupyter.org/github/DiXiT-eu/collatex-tutorial/blob/master/unit1/Command_line.ipynb), we briefly saw how to run a Python program. In the terminal, type
#
#     python myfile.py
#
# replacing “myfile.py” with the name of your Python program.
#
# ### Again on file system hygiene: directory 'Scripts'
# In this tutorial, we will create Python programs. Where should you save the files you create? Remember that [we created a directory for this workshop](http://nbviewer.jupyter.org/github/DiXiT-eu/collatex-tutorial/blob/master/unit1/Command_line.ipynb#Create-a-directory-for-this-workshop), called 'Workshop'. Now let's create a sub-directory, called 'Scripts', to store all our Python programs.
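# If you prefer not to create the folder by hand, the sub-directory can also be created from Python. This is only a minimal sketch using the standard library: the helper name `make_scripts_dir` is made up for illustration, and it assumes that 'Workshop' sits directly inside the base directory you pass in (for example your Home directory).

```python
from pathlib import Path


def make_scripts_dir(base):
    """Create <base>/Workshop/Scripts (with any missing parents) and return its path."""
    scripts_dir = Path(base) / "Workshop" / "Scripts"
    # exist_ok=True makes the call safe to repeat if the directory already exists.
    scripts_dir.mkdir(parents=True, exist_ok=True)
    return scripts_dir


# Example call (assumption: 'Workshop' lives in your Home directory):
# make_scripts_dir(Path.home())
```

# Creating the folder in the file manager or with `mkdir` in the terminal gives exactly the same result.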
#
# ---
#
# ## Set up a PyCharm project
#
# If you are using PyCharm for these exercises, it is worth setting up a project that will automatically save the files you create to the 'Scripts' directory you just created (see above). To do this, open PyCharm and from the *File* menu select *New Project*. In the dialogue box that appears, navigate to the 'Scripts' directory you made for this workshop by clicking the button with '...' on it, on the right of the *location* box. Then click *create*. This will create a new project that saves all of its files to the folder you have selected.
#
# ## Create a Python file
#
# Let's do this step by step. First of all, create a Python file.
#
# - Open PyCharm, if you downloaded it before, or another text editor: Notepad++ for Windows or TextWrangler for Mac OS X.
# - Create a new file and copy and paste the code we used before:

# In[1]:


from collatex import *

# Create an empty collation and add three short witnesses to it.
collation = Collation()
collation.add_plain_witness("A", "The quick brown fox jumped over the lazy dog.")
collation.add_plain_witness("B", "The brown fox jumped over the dog.")
collation.add_plain_witness("C", "The bad fox jumped over the lazy dog.")

# Collate the witnesses and print the resulting alignment table.
table = collate(collation)
print(table)


# - Now save the file as 'collate.py', inside the directory 'Scripts' (see above). If you set up a project in PyCharm, the file should automatically be saved in the correct place.
#
# ## Run the script
#
# ### In PyCharm
#
# - In PyCharm you can run the script using the run button, or with *Run* from the menu.
# - The result will appear in a window at the bottom of the page.
#
# ### In the terminal
#
# - Open the terminal and navigate to the folder where your script is, using the 'cd' command (again, refer to the [Command line tutorial](http://nbviewer.jupyter.org/github/DiXiT-eu/collatex-tutorial/blob/master/unit1/Command_line.ipynb), if you don't know what this means).
#   Then type
#
#         python collate.py
#
#   If you are not in the directory where your script is, you should specify the path to that file. If you are in the Home directory, for example, the command would look like
#
#         python Workshop/Scripts/collate.py
#
# - The result will appear below in the terminal.
#
#
# ## Input files
#
# In the [first tutorial](http://nbviewer.jupyter.org/github/DiXiT-eu/collatex-tutorial/blob/master/unit5/1_collate-plain-text.ipynb), we saw how to use texts stored in files as witnesses for the collation. We used the `open` command to open each text file and assign its contents to a variable with an appropriately chosen name; and don't forget the `encoding="utf-8"` bit!
#
# Let's try to do the same in our script 'collate.py', using the data in *fixtures/Darwin/txt* (only the first paragraph: \_par1) and producing an output in XML/TEI. The code will look like this:

# In[2]:


from collatex import *

collation = Collation()

# Read each witness from its text file (always state the encoding explicitly).
witness_1859 = open("../fixtures/Darwin/txt/darwin1859_par1.txt", encoding='utf-8').read()
witness_1860 = open("../fixtures/Darwin/txt/darwin1860_par1.txt", encoding='utf-8').read()
witness_1861 = open("../fixtures/Darwin/txt/darwin1861_par1.txt", encoding='utf-8').read()
witness_1866 = open("../fixtures/Darwin/txt/darwin1866_par1.txt", encoding='utf-8').read()
witness_1869 = open("../fixtures/Darwin/txt/darwin1869_par1.txt", encoding='utf-8').read()
witness_1872 = open("../fixtures/Darwin/txt/darwin1872_par1.txt", encoding='utf-8').read()

# Add the witnesses to the collation, using the year of each edition as siglum.
collation.add_plain_witness("1859", witness_1859)
collation.add_plain_witness("1860", witness_1860)
collation.add_plain_witness("1861", witness_1861)
collation.add_plain_witness("1866", witness_1866)
collation.add_plain_witness("1869", witness_1869)
collation.add_plain_witness("1872", witness_1872)

# Collate and print the result in XML/TEI.
table = collate(collation, output='tei')
print(table)


# - Now save the file (just 'save', or 'save as' with another name, such as 'collate-darwin-tei.py', if you want to keep both scripts) and then
# - run the new script (*run* in PyCharm; or type *python collate.py* or *python collate-darwin-tei.py* in the terminal). This may take a bit longer than the *fox and dog* example.
# - The result will appear below.
#
#
# ## Output file
#
# Looking at the result this way is not very practical, especially if we want to save it. It is better to store the result in a new file, which we call 'outfile' (but you can give it another name if you prefer). We need to add this chunk of code in order to create and open 'outfile':

# In[3]:


outfile = open('outfile.txt', 'w', encoding='utf-8')


# If we are going to produce an output in XML/TEI, we can specify that 'outfile' will be an XML file, and the same goes for any other format. Below are two examples, the first for an XML output file, the second for a JSON output file:

# In[4]:


# Example 1: an XML output file
outfile = open('outfile.xml', 'w', encoding='utf-8')
# Example 2: a JSON output file
outfile = open('outfile.json', 'w', encoding='utf-8')


# Now we add the outfile chunk to our code above.
# The new script is:

# In[5]:


from collatex import *

collation = Collation()

# Read each witness from its text file.
witness_1859 = open("../fixtures/Darwin/txt/darwin1859_par1.txt", encoding='utf-8').read()
witness_1860 = open("../fixtures/Darwin/txt/darwin1860_par1.txt", encoding='utf-8').read()
witness_1861 = open("../fixtures/Darwin/txt/darwin1861_par1.txt", encoding='utf-8').read()
witness_1866 = open("../fixtures/Darwin/txt/darwin1866_par1.txt", encoding='utf-8').read()
witness_1869 = open("../fixtures/Darwin/txt/darwin1869_par1.txt", encoding='utf-8').read()
witness_1872 = open("../fixtures/Darwin/txt/darwin1872_par1.txt", encoding='utf-8').read()

# Create and open the output file.
outfile = open('outfile-tei.xml', 'w', encoding='utf-8')

collation.add_plain_witness("1859", witness_1859)
collation.add_plain_witness("1860", witness_1860)
collation.add_plain_witness("1861", witness_1861)
collation.add_plain_witness("1866", witness_1866)
collation.add_plain_witness("1869", witness_1869)
collation.add_plain_witness("1872", witness_1872)

# Collate and write the XML/TEI result to the output file instead of the screen.
table = collate(collation, output='tei')
print(table, file=outfile)

# Close the file so that everything is actually written to disk.
outfile.close()


# When we run the script, the result won't appear below anymore. Instead, a new file, 'outfile-tei.xml', has been created in the directory 'Scripts'. Check what's inside!
#
# If you want to change the location of the output file, you can specify a different path. If, for example, you want your output file on the Desktop, you would write

# In[6]:


outfile = open('C:/Users/Elena/Desktop/output.xml', 'w', encoding='utf-8')


# **N.b.**: you can also create an output file when running your script in the Jupyter notebook! Depending on the path you specify, it will be created in your 'Notebook' directory or elsewhere.

# ## Exercise
#
# Create a new Python script that produces an output in JSON, using the data in 'fixtures/Woolf/Lighthouse-1' (remember? We used the same data in [another tutorial](http://nbviewer.jupyter.org/github/DiXiT-eu/collatex-tutorial/blob/master/unit5/2_collate-plain-text.ipynb)).
# Make sure you specify the input files, the output file (and its extension) and the output format correctly.

# In[ ]:
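# When a script reads many witness files, writing one `open(...).read()` line per file becomes repetitive. The following is only a sketch using the standard library: `read_witnesses` and `write_result` are hypothetical helper names (they are not part of CollateX), and using the file name without its extension as the siglum is an assumption made for illustration.

```python
from pathlib import Path


def read_witnesses(directory, pattern="*.txt"):
    """Return a dict {siglum: text} for every file in `directory` matching `pattern`.

    Assumption for this sketch: the siglum is the file name without extension.
    """
    witnesses = {}
    for path in sorted(Path(directory).glob(pattern)):
        witnesses[path.stem] = path.read_text(encoding="utf-8")
    return witnesses


def write_result(result, path):
    """Write `result` to `path` as UTF-8; the file is closed when the block ends."""
    with open(path, "w", encoding="utf-8") as outfile:
        print(result, file=outfile)
```

# Each siglum and text returned by `read_witnesses` could then be passed to `collation.add_plain_witness`, and the value returned by `collate` handed to `write_result`. The `with` statement closes the output file automatically, so no separate `close()` call is needed.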