The Art of Literary Text Analysis (ALTA) has three objectives.
First, to introduce concepts and methodologies for literary text analysis programming. It doesn't assume you know how to program or how to use digital tools for analyzing texts.
Second, to show a range of analytical techniques for the study of texts. While it cannot explain and demonstrate everything, it provides a starting point for humanists with links to other materials.
Third, to provide utility notebooks you can use for operating on different texts. These are less well documented and combine ideas from the introductory notebooks.
This instance of The Art of Literary Text Analysis is written in IPython Notebooks, which are based on the Python scripting language. It is a fork of the original. Other programming choices are available, and many conceptual aspects of the guide are relevant regardless of the language and implementation.
IPython Notebooks was chosen for three main reasons:
Python (the programming language used in IPython Notebooks) features extensive support for text analysis and natural language processing;
Python is a good first language for those learning to program: it's not easy, but it strikes a good balance between power, speed, readability and learnability;
IPython Notebooks offers a literate programming model of writing, in which blocks of prose (like this one) are interspersed with bits of code and output. This lets us write this guide in IPython and lets you write up your own experiments. The Art of Literary Text Analysis focuses on thinking through analytical processes, and the documentation-rich format offered by IPython Notebooks is well suited to the nature of this guide and to helping you think through what you want to do.
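To give a sense of what a code block sitting between paragraphs of prose might look like, here is a minimal sketch of a word-frequency count using only Python's standard library. The function name `word_frequencies`, the sample sentence, and the simple tokenizing pattern are illustrative assumptions, not taken from the guide's own notebooks.

```python
import re
from collections import Counter


def word_frequencies(text, top_n=3):
    """Return the top_n most frequent words in text (hypothetical helper)."""
    # Lowercase the text and keep runs of letters and apostrophes as words.
    # This is a deliberately naive tokenizer, assumed here for illustration.
    words = re.findall(r"[a-z']+", text.lower())
    # Counter tallies each word; most_common returns (word, count) pairs.
    return Counter(words).most_common(top_n)


# A short sample text, chosen only for demonstration.
sample = "It was the best of times, it was the worst of times."
top_words = word_frequencies(sample)
print(top_words)
```

In a notebook, the printed result would appear directly beneath the code, and the next block of prose could then discuss what the counts suggest about the text.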
This guide is a work in progress. It was developed over the winter of 2015 in conjunction with a course on literary text mining at McGill. It is being forked and extended for a course on big data and analysis at the University of Alberta in the winter of 2016. Here is the current outline: