Tony Hirst
Computing and Communications
Jupyter notebooks are a browser-based, interactive environment widely used in computational research and commercial data science. Notebooks provide a blended research environment within which rich text (HTML), code and the outputs generated from that code, such as charts, tables, or even static or interactive maps, can be created and displayed.
Using simple analysis scripts, typically written using Python or R, researchers can create narrated, readable, reproducible, shareable research workflows as well as all-in-one “text+source code” versions of research papers, where the code used to produce analyses reported in the paper is self-contained within the notebook itself.
In this workshop you will have an opportunity to see how notebooks can be used and even try them out yourself. All you need is a desktop computer, laptop, tablet or even phone running a Chrome or Firefox browser.
Can you:
Is there:
...one possible interface you can use to access Jupyter computing environments.
Code is entered and executed via code cells. The execution environment is determined by the notebook kernel attached to the notebook.
This notebook has been associated with a Python kernel, which means we can write and execute Python code in the cells:
print("hello world")
hello world
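Because the kernel determines the execution environment, a code cell can also ask the kernel to describe itself. The following is a minimal sketch using only the Python standard library; the `kernel_info` name is introduced here purely for illustration.

```python
import platform
import sys

# Collect a few facts about the Python interpreter behind the current kernel.
kernel_info = {
    "implementation": platform.python_implementation(),
    "version": platform.python_version(),
    "executable": sys.executable,
}

for key, value in kernel_info.items():
    print(f"{key}: {value}")
```

Running the same cell in a notebook attached to a different kernel environment would typically report a different interpreter path or version.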
A wide range of Python packages support the analysis of English texts, as well as classical texts.
import spacy
from spacy import displacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("This is a sentence.")
displacy.render(doc, style="dep")
text = "The current Chancellor of the Open University is Martha Lane Fox."
nlp = spacy.load("en_core_web_sm")
doc = nlp(text)
displacy.render(doc, style="ent")
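The entity view rendered above is visual, but the same information can also be read programmatically from `doc.ents`. The sketch below assumes spaCy and the `en_core_web_sm` model are installed and skips the demonstration otherwise; `entity_pairs` and `pipeline` are hypothetical names used only for illustration.

```python
def entity_pairs(doc):
    """Return (entity text, entity label) pairs for a parsed spaCy document."""
    return [(ent.text, ent.label_) for ent in doc.ents]


try:
    import spacy

    pipeline = spacy.load("en_core_web_sm")
except (ImportError, OSError):
    # spaCy or the language model is not available in this environment.
    pipeline = None

if pipeline is not None:
    parsed = pipeline("The current Chancellor of the Open University is Martha Lane Fox.")
    for entity_text, entity_label in entity_pairs(parsed):
        print(entity_text, entity_label)
```

Which spans are recognised, and with which labels, depends on the model version, so no particular output is shown here.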
(See also: CLTK - Classical Languages Text Analysis)
import nltk
# Uncomment on first run to download the corpus and tokenizer data
#nltk.download('gutenberg')
#nltk.download('punkt')
nltk.corpus.gutenberg.fileids()
[nltk_data] Downloading package punkt to /Users/tonyhirst/nltk_data...
[nltk_data] Unzipping tokenizers/punkt.zip.
['austen-emma.txt', 'austen-persuasion.txt', 'austen-sense.txt', 'bible-kjv.txt', 'blake-poems.txt', 'bryant-stories.txt', 'burgess-busterbrown.txt', 'carroll-alice.txt', 'chesterton-ball.txt', 'chesterton-brown.txt', 'chesterton-thursday.txt', 'edgeworth-parents.txt', 'melville-moby_dick.txt', 'milton-paradise.txt', 'shakespeare-caesar.txt', 'shakespeare-hamlet.txt', 'shakespeare-macbeth.txt', 'whitman-leaves.txt']
from nltk.corpus import gutenberg
macbeth_sentences = gutenberg.sents('shakespeare-macbeth.txt')
macbeth_sentences[:2]
[['[', 'The', 'Tragedie', 'of', 'Macbeth', 'by', 'William', 'Shakespeare', '1603', ']'], ['Actus', 'Primus', '.']]
# Load full text
macbeth = nltk.Text(nltk.corpus.gutenberg.words('shakespeare-macbeth.txt'))
# Text concordances
macbeth.concordance("spot")
Displaying 2 of 2 matches:
er of an houre Lad . Yet heere ' s a spot Doct . Heark , she speaks , I will s
ce the more strongly La . Out damned spot : out I say . One : Two : Why then '
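To make the idea behind a concordance concrete, here is a minimal keyword-in-context sketch in plain Python. It is not NLTK's implementation: the `kwic` function, its `width` parameter and the `sample` tokens (two fragments borrowed from the output above and joined for illustration) are all assumptions of this example.

```python
def kwic(tokens, word, width=30):
    """Return one keyword-in-context line per case-insensitive match of `word`."""
    target = word.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == target:
            # Keep up to `width` characters of context on each side of the match.
            left = " ".join(tokens[:i])[-width:]
            right = " ".join(tokens[i + 1:])[:width]
            lines.append(f"{left:>{width}} {token} {right}")
    return lines


sample = ("Yet heere ' s a spot Doct . Heark , she speaks "
          "Out damned spot : out I say").split()

for line in kwic(sample, "spot"):
    print(line)
```

Right-aligning the left context lines up every match in the same column, which is what makes a concordance easy to scan.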
macbeth.dispersion_plot(['Macbeth', 'Macduff', 'Banquo'])
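A dispersion plot shows where in a text each word occurs. The data behind such a plot can be sketched as a mapping from each word to the token positions at which it appears. This is an illustrative sketch rather than NLTK's own code; `word_offsets` and the toy token list are hypothetical, and exact, case-sensitive matching is an assumption made here.

```python
def word_offsets(tokens, targets):
    """Map each target word to the list of token positions where it occurs."""
    offsets = {target: [] for target in targets}
    for position, token in enumerate(tokens):
        if token in offsets:
            offsets[token].append(position)
    return offsets


# Toy token list for illustration only; it is not taken from the play.
toy_tokens = ["Macbeth", "and", "Banquo", "met", "Macbeth"]
print(word_offsets(toy_tokens, ["Macbeth", "Macduff", "Banquo"]))
```

Plotting each list of positions along a horizontal axis, one row per word, gives the familiar dispersion view.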
Computing environments where you guarantee that certain software packages:
# Reproducible Computing Environments Using Jupyter Tools
- MyBinder
- repo2docker
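As a small illustration of how a shared repository becomes a runnable environment, MyBinder launch links for GitHub repositories follow a simple URL pattern. The sketch below builds such a link; `binder_url` is a hypothetical helper, and `binder-examples/requirements` is used only as an example repository.

```python
def binder_url(user, repo, ref="HEAD"):
    """Build a MyBinder launch URL for a GitHub repository at a branch, tag or commit."""
    return f"https://mybinder.org/v2/gh/{user}/{repo}/{ref}"


# Example repository chosen for illustration; substitute your own user and repo.
print(binder_url("binder-examples", "requirements"))
```

Behind the scenes, MyBinder uses repo2docker to build an image from the repository, picking up environment definitions such as a `requirements.txt` or `environment.yml` file when one is present.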