This is a tutorial on how to use Tracery in Python. Tracery is a computer language for random text generation originally developed by Kate Compton.
The easiest way to use Tracery in Python is to install pytracery, my Python port of Kate's original code. You can install it on the command line with pip:
$ pip install tracery
If you're running this in Jupyter Notebook, you can execute the cell below:
import sys
!{sys.executable} -m pip install tracery
Requirement already satisfied: tracery in /Users/allison/opt/miniconda3/envs/rwet-2022/lib/python3.8/site-packages (0.1.1)
Then you need to import the tracery library:
import tracery
You don't need Python to use Tracery! Here's a version of this tutorial that you can use with any implementation of Tracery. I recommend Beau Gunderson's Tracery writer as a kind of Tracery playground. You can also use Kate Compton's Tracery tutorial, which has a visual editor, or Cheap Bots Done Quick, which has a built-in editor for writing Tracery grammars for Twitter bots with a minimum of fuss.
You might be interested in reading Nora Reed's explanation of how @nerdgarbagebot works, which takes you through the process of ideating and implementing a Tracery grammar for a Twitter bot. (Nora Reed makes a lot of amazing bots with Tracery, including @thinkpiecebot.)
A Tracery grammar tells the computer how to put text together, piece by piece. It consists of a series of rules, each paired with an expansion. The goal of writing a Tracery grammar is to write rules and expansions that, when followed by the computer, produce interesting (funny, insightful, poetic) text. The word for generating a text from a grammar is "expand": we'll be talking a lot below about "expanding" the grammar into a text. (Hopefully the reasons for using this word will become clear!)
In Python, Tracery rules and expansions are written as dictionaries, where the rules are keys and the expansions are values. Here's an example of a complete, but very boring, Tracery grammar:
rules = {
"origin": "Hello, world!"
}
To generate text from this grammar, first create a Tracery Grammar object like so, passing the rules as the only parameter:
grammar = tracery.Grammar(rules)
Then call the .flatten() method of the Grammar object with "#origin#" as the only parameter. (I'll talk about what #origin# means in a second.)
grammar.flatten("#origin#")
'Hello, world!'
This grammar can produce only one text: "Hello, world!" Not very interesting, but helpful for the moment to illustrate how a grammar is put together and how to make it produce some output.
Here's a Tracery grammar with two rules, written again as a dictionary, where each rule and its expansion are key/value pairs:
rules = {
"origin": "Hello, #noun#!",
"noun": "galaxy"
}
grammar = tracery.Grammar(rules)
grammar.flatten("#origin#")
'Hello, galaxy!'
This grammar, again, can only ever produce one text: "Hello, galaxy!" But it accomplishes this in a slightly more sophisticated way. Notice the following text in the expansion for the origin rule:
#noun#
When the Tracery generator encounters text that looks like this (a word surrounded by # signs), it looks in the grammar for a rule with the same name as the word and replaces the text with the expansion for that rule.
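To make the replacement step concrete, here is a minimal sketch of the idea in plain Python. It is not how pytracery is implemented; `expand_once` is a hypothetical helper written for this illustration, and it assumes every rule's expansion is a single string.

```python
import re

# Hypothetical helper, not part of pytracery: replace each #name# in a
# string with the expansion stored under that name in the rules dict.
def expand_once(text, rules):
    return re.sub(r"#(\w+)#", lambda m: rules[m.group(1)], text)

rules = {
    "origin": "Hello, #noun#!",
    "noun": "galaxy"
}

result = expand_once(rules["origin"], rules)
print(result)  # Hello, galaxy!
```

The regular expression looks for a word between two # signs and hands the word to a lookup in the dictionary, which is the same "find the rule, swap in its expansion" step described above.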
Let's add a third rule to this grammar, just to see how it looks:
rules = {
"origin": "#greeting#, #noun#!",
"greeting": "Howdy",
"noun": "galaxy"
}
grammar = tracery.Grammar(rules)
grammar.flatten("#origin#")
'Howdy, galaxy!'
EXERCISE: Add another rule for the punctuation at the end of the sentence, so that the grammar produces the text "Howdy, galaxy?"
The examples above are really boring, because they can only ever produce one output. In order for a grammar to be able to produce different outputs, we need to make the expansions of our rules have alternatives for the computer to choose between. Rules with alternatives look like this:
"rule": ["alternative one", "alternative two", "alternative three"]
That is: the value of the rule is a list of strings (instead of an individual string). When Tracery expands a rule whose value is a list, it will select one item from the list at random.
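As a rough illustration of that choice, the sketch below treats an expansion as either a string or a list of strings. The helper name `choose_expansion` is made up for this example, and it assumes every alternative is equally likely; it is not pytracery's own code.

```python
import random

# Hypothetical helper: an expansion is either a single string or a list
# of alternative strings. Assumes each alternative is equally likely.
def choose_expansion(expansion):
    if isinstance(expansion, list):
        return random.choice(expansion)
    return expansion

alternatives = ["alternative one", "alternative two", "alternative three"]
picked = choose_expansion(alternatives)
print(picked)
```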
Here's our "Hello, world!" grammar, now with multiple alternatives for what we're greeting:
rules = {
"origin": "#greeting#, #noun#!",
"greeting": "Howdy",
"noun": ["world", "solar system", "galaxy", "local cluster", "universe"]
}
grammar = tracery.Grammar(rules)
grammar.flatten("#origin#")
'Howdy, galaxy!'
Run the cell over and over again and you'll see different outputs. (Sometimes it'll look like it isn't working, but that's just because the computer randomly selected the same alternative twice in a row. It can happen!)
Let's make this "Hello, world!" example even more interesting by adding alternatives for the greeting rule:
rules = {
"origin": "#greeting#, #noun#!",
"greeting": ["Howdy", "Hello", "Greetings", "What's up", "Hey", "Hi"],
"noun": ["world", "solar system", "galaxy", "local cluster", "universe"]
}
grammar = tracery.Grammar(rules)
grammar.flatten("#origin#")
'Hello, local cluster!'
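It can be useful to know how many different texts a grammar like this can produce. Since every output pairs one greeting with one noun, a quick sketch with the standard library can enumerate them all. This works outside of Tracery and only applies to simple grammars without nested rules.

```python
from itertools import product

greetings = ["Howdy", "Hello", "Greetings", "What's up", "Hey", "Hi"]
nouns = ["world", "solar system", "galaxy", "local cluster", "universe"]

# Every output pairs one greeting with one noun, so enumerate all pairs.
all_outputs = [f"{g}, {n}!" for g, n in product(greetings, nouns)]
print(len(all_outputs))  # 30
```

Six greetings times five nouns gives thirty possible outputs, up from five when only the noun rule had alternatives.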
Sometimes, for debugging purposes, it's nice to generate multiple outputs from the same grammar in one cell execution. To do this, print() the return value of the .flatten() method in a for loop. (You don't have to re-create the Grammar object each time.)
rules = {
"origin": "#greeting#, #noun#!",
"greeting": ["Howdy", "Hello", "Greetings", "What's up", "Hey", "Hi"],
"noun": ["world", "solar system", "galaxy", "local cluster", "universe"]
}
grammar = tracery.Grammar(rules)
for i in range(5):
    print(grammar.flatten("#origin#"))
What's up, world!
Greetings, solar system!
Hey, local cluster!
Howdy, local cluster!
What's up, universe!
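When sampling many outputs for debugging, repeats can crowd out the interesting results. The sketch below collects only the distinct outputs. `sample_unique` and `fake_generate` are hypothetical names for this example; the stand-in generator is there so the code runs without a grammar.

```python
import random

# Hypothetical helper: call a zero-argument generator function n times
# and return the distinct results in the order they first appeared.
def sample_unique(generate, n):
    seen = []
    for _ in range(n):
        text = generate()
        if text not in seen:
            seen.append(text)
    return seen

# Stand-in generator so this sketch runs without a grammar.
def fake_generate():
    return random.choice(["Hey", "Hi"]) + ", world!"

outputs = sample_unique(fake_generate, 20)
for line in outputs:
    print(line)
```

With a real grammar, you could pass `lambda: grammar.flatten("#origin#")` as the `generate` argument instead of the stand-in.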
Remember that in Python you can format dictionaries and lists with some flexibility. For example, your grammar might be a bit more readable if you write each option on a separate line:
rules = {
"origin": "#greeting#, #noun#!",
"greeting": [
"Howdy",
"Hello",
"Greetings",
"What's up",
"Hey",
"Hi"
],
"noun": [
"world",
"solar system",
"galaxy",
"local cluster",
"universe"
]
}
grammar = tracery.Grammar(rules)
grammar.flatten("#origin#")
"What's up, local cluster!"
You don't always have to write the expansions as string literals and list literals. You can use a variable with a list assigned to it, for example—this is especially helpful if you have a long list of things that you plan to use in multiple grammars, or if you want to get the list of things from another source (e.g., a text file).
Let's make a more sophisticated grammar that produces sentences in the format "Dammit Jim, I'm a X, not a Y!" popularized by the ground-breaking science fiction program, Star Trek. I happen to have a list of professions, which I'm going to put into a variable here. (I got this list from Darius Kazemi's Corpora Project—an excellent place to find lists of things. And they're already preformatted in a way that makes it easy to cut-and-paste them into your Tracery grammars.)
professions = [
"accountant",
"actor",
"archeologist",
"astronomer",
"audiologist",
"bartender",
"curator",
"detective",
"economist",
"editor",
"engineer",
"epidemiologist",
"farmer",
"flight attendant",
"forest fire prevention specialist",
"graphic designer",
"hydrologist",
"librarian",
"mathematician",
"middle school teacher",
"nutritionist",
"painter",
"rancher",
"referee",
"reporter",
"sailor",
"sociologist",
"stonemason",
"surgeon",
"tailor",
"taxi driver",
"teacher",
"therapist",
"tour guide",
"umpire",
"undertaker",
"urban planner",
"veterinarian",
"web developer",
"welder",
"writer",
"zoologist"
]
The grammar for generating our Star Trek phrase might look like this:
rules = {
"origin": "#interjection#, #name#! I'm a #profession#, not a #profession#!",
"interjection": ["alas", "congratulations", "eureka", "fiddlesticks",
"good grief", "hallelujah", "oops", "rats", "thanks", "whoa", "yes"],
"name": ["Jim", "John", "Tom", "Steve", "Kevin", "Gary", "George", "Larry"],
"profession": professions
}
grammar = tracery.Grammar(rules)
grammar.flatten("#origin#")
"congratulations, George! I'm a graphic designer, not a audiologist!"
This is pretty good, but there are problems. The first is that we typed in all of the interjections in lower case, but they're supposed to have the first letter capitalized (since they're at the beginning of the sentence). The second problem is that the grammar occasionally produces something like
yes, George! I'm a economist, not a zoologist!
"A economist" isn't right. It should be "an economist." English indefinite articles are tricky that way!
There are several ways to solve these problems. We could just change all of our interjections to be capitalized, and add the appropriate article to the beginning of each profession. But (1) this would be time-consuming and (2) it means we could never re-use those rules in their unmodified form. What to do?
Thankfully, Tracery comes equipped with a series of modifiers that take the expansion of a rule and apply a transformation to it. The modifiers are included with pytracery, but they're in a separate module, so you need to import them in their own import statement:
from tracery.modifiers import base_english
And then you have to explicitly "add" them to the Grammar object after you create it, like so:
grammar = tracery.Grammar(rules)
grammar.add_modifiers(base_english)
The two modifiers we're going to use are .a, which adds the appropriate indefinite article before the expansion of a rule, and .capitalize, which capitalizes the first letter of the expansion.
Use a modifier by adding its name (for example, .capitalize) inside the # signs, right after the name of the rule. For example, change:
#interjection#
to
#interjection.capitalize#
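To get a feel for what these two modifiers do to a piece of text, here is a simplified sketch in plain Python. The functions `capitalize_first` and `add_article` are stand-ins written for this illustration, not pytracery's own code, and the article rule is an assumption that only looks at the first letter.

```python
# Simplified stand-ins for the .capitalize and .a modifiers. These are
# approximations for illustration, not pytracery's own code.
def capitalize_first(text):
    # Upper-case the first character and leave the rest untouched.
    return text[:1].upper() + text[1:]

def add_article(text):
    # Assumption: choose "an" when the first letter is a vowel.
    if text and text[0].lower() in "aeiou":
        return "an " + text
    return "a " + text

print(capitalize_first("good grief"))  # Good grief
print(add_article("economist"))        # an economist
print(add_article("zoologist"))        # a zoologist
```

A first-letter rule like this gets words such as "university" or "hour" wrong, which is part of why it's convenient to rely on a ready-made modifier instead of writing your own.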
Here's our "Dammit Jim" generator with the modifiers in place:
rules = {
"origin": "#interjection.capitalize#, #name#! I'm #profession.a#, not #profession.a#!",
"interjection": ["alas", "congratulations", "eureka", "fiddlesticks",
"good grief", "hallelujah", "oops", "rats", "thanks", "whoa", "yes"],
"name": ["Jim", "John", "Tom", "Steve", "Kevin", "Gary", "George", "Larry"],
"profession": professions
}
grammar = tracery.Grammar(rules)
grammar.add_modifiers(base_english)
grammar.flatten("#origin#")
"Whoa, Kevin! I'm a flight attendant, not an editor!"
Nice! Another modifier you can use is .s, which turns the text in the expansion into its plural version. Using this, we can modify the above example to be a Star Wars meme instead of a Star Trek one:
rules = {
"origin": "These aren't the #profession.s# we're looking for.",
"profession": professions
}
grammar = tracery.Grammar(rules)
grammar.add_modifiers(base_english)
grammar.flatten("#origin#")
"These aren't the librarians we're looking for."
The origin rule
By convention, the "starting" rule of Tracery grammars is called origin. A lot of tools that use Tracery grammars follow this convention, and for ease of interoperability it's probably best if you do too. But you can actually use any name you want, as long as you use that name in the call to .flatten(). For example, we could rewrite the above example like so:
rules = {
"start": "These aren't the #profession.s# we're looking for.",
"profession": professions
}
grammar = tracery.Grammar(rules)
grammar.add_modifiers(base_english)
grammar.flatten("#start#")
"These aren't the referees we're looking for."
Like any other rule, the "starting" rule can have multiple options. We could use this to, for example, create a grammar that outputs Star Wars memes half the time and Star Trek memes the other half:
rules = {
"origin": ["#interjection.capitalize#, #name#! I'm #profession.a#, not #profession.a#!",
"These aren't the #profession.s# we're looking for."],
"interjection": ["alas", "congratulations", "eureka", "fiddlesticks",
"good grief", "hallelujah", "oops", "rats", "thanks", "whoa", "yes"],
"name": ["Jim", "John", "Tom", "Steve", "Kevin", "Gary", "George", "Larry"],
"profession": professions
}
grammar = tracery.Grammar(rules)
grammar.add_modifiers(base_english)
for i in range(10):
    print(grammar.flatten("#origin#"))
These aren't the audiologists we're looking for.
Good grief, Larry! I'm a graphic designer, not a tour guide!
Alas, George! I'm an audiologist, not an audiologist!
These aren't the detectives we're looking for.
Hallelujah, George! I'm a tailor, not a mathematician!
These aren't the zoologists we're looking for.
These aren't the hydrologists we're looking for.
Eureka, Jim! I'm a graphic designer, not a rancher!
These aren't the zoologists we're looking for.
These aren't the bartenders we're looking for.
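The "half the time" claim follows if each alternative in a list is equally likely to be picked, which is the assumption this sketch makes. Under that assumption, repeating an alternative in the list is a simple way to make it come up more often. The helper `alternative_odds` and the placeholder strings are hypothetical, for illustration only.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical helper: assuming each list item is equally likely to be
# picked, report how often each distinct alternative would come up.
def alternative_odds(alternatives):
    counts = Counter(alternatives)
    total = len(alternatives)
    return {text: Fraction(n, total) for text, n in counts.items()}

two_memes = ["star trek", "star wars"]
print(alternative_odds(two_memes))

# Listing an alternative twice makes it twice as likely as the other.
weighted = ["star trek", "star trek", "star wars"]
print(alternative_odds(weighted))
```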
The grammars we've written together so far have replacement syntax (#somethinglikethis#) only in the expansions for the origin rule. But you can include that syntax in any expansion you want! This is a powerful tool for building sophisticated grammars that are built up from reusable parts. For example, this tiny model of English reuses the noun and verb rules in multiple places, thereby preventing repetition and increasing expressiveness.
rules = {
"origin": "#nounphrase.capitalize# #verbphrase#.",
"nounphrase": ["the #noun#", "the #noun#", "#noun.a#", "#noun.a#", "the #noun# that #verbphrase#",
"the #noun# #prep# #nounphrase#"],
"verbphrase": ["#verb#", "#verb# #nounphrase#", "#verb# #prep# #nounphrase#"],
"noun": ["amoeba", "dichotomy", "seagull", "trombone", "corsage", "restaurant", "suburb"],
"verb": ["awakens", "bends", "burns", "closes", "expands", "fails", "fractures", "gathers",
"melts", "opens", "ripens", "scatters", "stops", "sways", "turns", "unfurls", "worries"],
"prep": ["in", "on", "over", "against"]
}
grammar = tracery.Grammar(rules)
grammar.add_modifiers(base_english)
for i in range(10):
    print(grammar.flatten("#origin#"))
A dichotomy worries against a dichotomy.
An amoeba awakens the dichotomy that opens an amoeba.
The corsage that worries ripens on a trombone.
A trombone melts over an amoeba.
The restaurant that ripens turns.
The seagull on the seagull over the trombone on the trombone closes on the suburb on an amoeba.
The restaurant that gathers the corsage unfurls.
An amoeba bends a dichotomy.
A corsage fails on the seagull that worries on the amoeba.
The trombone that sways against a suburb fails.
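Rules that refer to other rules (or to themselves) are expanded recursively. The following is a rough sketch of that process in plain Python, not pytracery's implementation. The `expand` function, its `max_depth` guard, and the cut-down grammar are assumptions made for this illustration, and the sketch ignores modifiers entirely.

```python
import random
import re

# Hypothetical recursive expander, for illustration only.
def expand(text, rules, depth=0, max_depth=20):
    # Assumption: stop expanding past max_depth so runaway recursion
    # leaves the remaining #tags# in place instead of crashing.
    if depth >= max_depth:
        return text

    def replace(match):
        expansion = rules[match.group(1)]
        if isinstance(expansion, list):
            expansion = random.choice(expansion)
        # The chosen expansion may contain more #tags#, so expand it too.
        return expand(expansion, rules, depth + 1, max_depth)

    return re.sub(r"#(\w+)#", replace, text)

rules = {
    "origin": "The #nounphrase# #verb#.",
    "nounphrase": ["#noun#", "#noun# #prep# the #nounphrase#"],
    "noun": ["amoeba", "seagull", "trombone"],
    "verb": ["awakens", "bends", "burns"],
    "prep": ["in", "on", "over"]
}

sentence = expand("#origin#", rules)
print(sentence)
```

Because nounphrase can refer to itself, some sentences come out short and others chain several prepositional phrases together, much like the longer outputs above.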
All of the examples we've looked at so far in this notebook have used literal strings and lists as the expansions for rules (i.e., the values for keys in the grammar dictionary). If you're using Tracery in Python, there are some techniques you can use for loading expansions from external data sources instead. This is a good option if you have a large number of alternatives for a particular expansion.
Let's say that you have a text file which has one alternative per line, like this list of adjectives from my plaintext example files repository. To use this, first download the file into the same directory as this notebook. Then execute the following cell to load the file in as a list, with one element per line in the file:
adjs = open("adjs.txt").read().split("\n")
Now you have a list of adjectives. Let's take a peek inside to make sure we've loaded the file correctly.
adjs[100:110]
['bolstered', 'bonnie', 'bored', 'boundary', 'bounded', 'bounding', 'branched', 'brawling', 'brazen', 'breeding']
Having loaded in this list, you can now use it as the expansion for a rule. To do this, put the variable name of the list as the rule expansion in the grammar. Here, I've adapted the code from the grammar above to incorporate a new adj rule, whose expansion is the list of adjectives. I've also added references to the adj rule in various expansions for the nounphrase and verbphrase rules:
rules = {
"origin": "#nounphrase.capitalize# #verbphrase#.",
"nounphrase": ["the #noun#", "the #adj# #noun#", "#noun.a#", "#adj.a# #noun#", "the #noun# that #verbphrase#",
"the #noun# #prep# #nounphrase#"],
"verbphrase": ["#verb#", "#verb# #nounphrase#", "#verb# #prep# #nounphrase#", "is #adj#"],
"noun": ["amoeba", "dichotomy", "seagull", "trombone", "corsage", "restaurant", "suburb"],
"verb": ["awakens", "bends", "burns", "closes", "expands", "fails", "fractures", "gathers",
"melts", "opens", "ripens", "scatters", "stops", "sways", "turns", "unfurls", "worries"],
"prep": ["in", "on", "over", "against"],
"adj": adjs
}
grammar = tracery.Grammar(rules)
grammar.add_modifiers(base_english)
for i in range(10):
    print(grammar.flatten("#origin#"))
The floral suburb is uninvited.
A hands-off seagull is arrested.
The seagull burns.
The dichotomy in the suburb that ripens the suburb that closes on the trombone on a dichotomy scatters on a layered seagull.
A mated amoeba melts a nonsense corsage.
A seagull awakens against a voluptuous corsage.
The trombone on the dichotomy burns.
A graven suburb burns the trombone over the corsage against the medical trombone.
An uncooperative amoeba awakens.
The trombone on the dichotomy against the seagull against the dichotomy against the intern corsage is robust.
To do this for other rules, copy the line of code above that loads in the adjectives, and change the filename to a different file with one entry per line. (Also make sure to make a different variable name!)
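If you load several files this way, a small helper can save some copying. One thing to watch for: splitting a file's contents on "\n" leaves an empty string at the end of the list when the file ends with a newline, and an empty alternative would occasionally expand to nothing. The sketch below skips blank lines. `load_lines` and the file name animals.txt are hypothetical, and the temporary file exists only so the example runs on its own.

```python
import os
import tempfile

# Hypothetical helper: read a text file with one entry per line and
# return the non-blank lines with surrounding whitespace removed.
def load_lines(path):
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

# Demonstrate with a temporary file so the sketch runs on its own.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "animals.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("amoeba\nseagull\n\nmarmot\n")
    animals = load_lines(path)

print(animals)  # ['amoeba', 'seagull', 'marmot']
```

The resulting list can be used as a rule's expansion in the same way as the adjectives list.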
Congratulations, you now know the basics of writing a Tracery grammar and how to use it in Python.
Tracery has a number of features that we didn't go into here, including the ability to save the output of a rule to be re-used later in the same expansion. See Kate Compton's tutorial for more information. You might be interested in these advanced text generators that Kate Compton made with Tracery.
If you're a JavaScript programmer and you want to incorporate Tracery into your own projects, the source code is available here (also available as a Node module).