Speculative magic words workbook

By Allison Parrish

(Early draft, incomplete, under construction.)

The goal of this notebook is to demonstrate some computational means for exploring the literary genre of the magic word. For present purposes, I define a "magic word" as a string of letters that affords a foregrounding of its material properties (e.g., spelling, pronunciation), and suggests some effect beyond meaning alone. The underlying assumption (maybe faulty) is that magic words with similar material properties will also have similar effects, and that by writing computer programs to produce magic words (whether from whole cloth or as variants on other magic words), we can produce new magic words with new effects.

I don't understand this notebook as a way of casting spells, but merely as a way of investigating potential forms. Hence: speculative magic words.

The notebook serves as a demonstration of (1) Python string manipulation techniques; and (2) the Pincelate library for grapheme-to-phoneme and phoneme-to-grapheme translation.

Preliminaries

Some of these examples will be data-driven, i.e., we need an existing corpus of words. Download this file into the same folder as this notebook like so:

In [332]:
!curl -L -O https://raw.githubusercontent.com/dariusk/corpora/master/data/words/nouns.json
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 18192  100 18192    0     0  94259      0 --:--:-- --:--:-- --:--:-- 94750

The file contains a list of English nouns. The code in the cell below reads them into a list, which we'll use throughout the rest of the notebook.

In [3]:
import json
with open("nouns.json") as f:
    nouns = [item.lower() for item in json.load(f)['nouns']]

The random module has a function choice that picks one item from a list at random:

In [4]:
import random
In [5]:
random.choice(nouns)
Out[5]:
'mediator'

Orthographic variations

"[W]riting gave physical permanence to words.... Written words continued to act in one's behalf long after the sound of spoken words had ceased" (Skemer 133)

"Motion terminates at no other end save its own beginning, in order to cease and rest in it... In the intelligible world... Grammar begins with the letter, from which all writing is derived and into which it is all resolved" (John Scotus Erigena, quoted in Leggott 46)

"[T]he unit of textual meaning—the letter—lacks meaning itself. The alphabet's semantic vacuum represents a threat to orthodoxy, for into this space competing meaning systems may rush." (Crain 18)

The words in many apotropaic charms exhibit certain kinds of manipulation that we can characterize as orthographic in nature—i.e., they have to do with the letters in the words. In this section of the notebook, I show some computer code for performing these transformations explicitly.

The following cell defines a short text that we'll use for testing purposes:

In [6]:
text = "in the beginning was the notebook"

Cacography

"In medieval manuscripts, the letters themselves were frequently a source of confusion. [...] The first letters of words can be omitted... while others are doubled up.... Words can be dislocated," "compounded," "contracted," "abbreviated"; "letters vanish. [...] [W]e should also mention the variations made with uppercase and lowercase letters.... May this overview give the reader a small idea of the difficulties encountered by the researcher!" (Lecouteux xxi)

"Cacography" here means writing with mistakes. In medieval grimoires, mistakes were usually introduced as errors in copying, but the presence of errors actually made people perceive the spells as more powerful. We can simulate these errors in Python.

Compounding/contracting words

This operation "contracts" two words, smooshing the first half of one word onto the second half of the other.

In [7]:
noun1 = random.choice(nouns)
noun2 = random.choice(nouns)
In [8]:
print(noun1, noun2)
spoiler intercession
In [9]:
noun1[:len(noun1)//2] + noun2[len(noun2)//2:]
Out[9]:
'spoession'

In function form:

In [10]:
def smoosh(a, b):
    return a[:len(a)//2] + b[len(b)//2:]
smoosh("allison", "parrish")
Out[10]:
'allrish'
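The split doesn't have to fall at the halfway mark. Here's a small variation, a sketch in which a hypothetical ratio parameter chooses how much of the first word to keep (and how much of the second word to skip). With the default of 0.5 it behaves like smoosh:

```python
def smoosh_at(a, b, ratio=0.5):
    """Keep the first `ratio` of word a, then append word b minus its first `ratio`."""
    # int() truncates, matching the halfway split used by smoosh
    cut_a = int(len(a) * ratio)
    cut_b = int(len(b) * ratio)
    return a[:cut_a] + b[cut_b:]

print(smoosh_at("allison", "parrish"))        # same as smoosh("allison", "parrish")
print(smoosh_at("allison", "parrish", 0.25))  # keeps less of the first word
```

Picking the ratio with random.random() would give a different contraction each time.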

Dislocation

This operation inserts spaces at random, dislocating letters from their neighbors (and sometimes splitting a word into pieces).

In [11]:
out = ""
for ch in text:
    if random.random() < 0.1:
        out += " "
    out += ch
print(out)
in  the  beginning was the notebo ok

As a function:

In [12]:
def dislocate(s, prob=0.1):
    out = ""
    for ch in s:
        if random.random() < prob:
            out += " "
        out += ch
    return out
dislocate("abracadabra")
Out[12]:
'ab rac adabra'
In [13]:
dislocate("abracadabra", 0.75)
Out[13]:
' a b r ac ad a b r a'
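Lecouteux also mentions letters that vanish and letters that are doubled up. Here's a sketch of both mistakes in the same style as dislocate; miscopy and its probability parameters are hypothetical names, not taken from any source:

```python
import random

def miscopy(s, drop_prob=0.05, double_prob=0.05, rng=random):
    """Simulate a careless copyist: letters are sometimes dropped, sometimes doubled."""
    out = ""
    for ch in s:
        r = rng.random()
        if r < drop_prob:
            continue                  # the letter vanishes
        elif r < drop_prob + double_prob:
            out += ch * 2             # the letter is doubled up
        else:
            out += ch                 # copied correctly
    return out

print(miscopy("abracadabra", 0.1, 0.1))
```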

Coding, transliteration, encryption

Another strategy for producing magic words is to transliterate them (e.g., converting Greek letters to their Roman equivalents) or to apply ciphers (like a substitution cipher, in which each letter is replaced with another letter). These techniques retain the underlying structure of the spelling, so the result doesn't look entirely random. But they don't retain the surface form: they make the familiar unfamiliar.

Character ciphers

The function below implements simple character replacement:

In [16]:
def replace_by_char(s, ch_map):
    out = ""
    for ch in s:
        if ch in ch_map:
            out += ch_map[ch]
        else:
            out += ch
    return out

The function takes a dictionary that maps each letter you expect in the input to the letter that should replace it; characters missing from the dictionary pass through unchanged. This dictionary maps each letter to the letter that follows it in the alphabet (with z wrapping around to a):

In [17]:
nextch_map = {
    'a': 'b', 'b': 'c', 'c': 'd', 'd': 'e',
    'e': 'f', 'f': 'g', 'g': 'h', 'h': 'i',
    'i': 'j', 'j': 'k', 'k': 'l', 'l': 'm',
    'm': 'n', 'n': 'o', 'o': 'p', 'p': 'q',
    'q': 'r', 'r': 's', 's': 't', 't': 'u',
    'u': 'v', 'v': 'w', 'w': 'x', 'x': 'y',
    'y': 'z', 'z': 'a'
}

Call it on a string:

In [18]:
replace_by_char("allison parrish", nextch_map)
Out[18]:
'bmmjtpo qbssjti'

A well-known cipher in computer programming culture is rot13, in which each letter is replaced with the letter thirteen places later in the alphabet (wrapping around the end of the alphabet as needed). It's so common that Python's codecs module already implements it:

In [19]:
import codecs
codecs.encode("allison parrish", 'rot13')
Out[19]:
'nyyvfba cneevfu'
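The "next letter" map and rot13 are both shifts of the alphabet, so a single function can produce either one. This sketch uses Python's built-in str.maketrans and str.translate instead of replace_by_char; caesar is a hypothetical helper name. A shift of 1 should reproduce nextch_map, and a shift of 13 should match the codecs output:

```python
import codecs
import string

def caesar(s, shift):
    """Shift each ASCII letter `shift` places along the alphabet, wrapping past z to a."""
    k = shift % 26  # negative shifts move backwards
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    table = str.maketrans(lower + upper,
                          lower[k:] + lower[:k] + upper[k:] + upper[:k])
    return s.translate(table)  # characters not in the table pass through

print(caesar("allison parrish", 1))
print(caesar("allison parrish", 13))
print(caesar("allison parrish", 13) == codecs.encode("allison parrish", "rot13"))
```

Shifting by the negative of the same amount decodes the text again.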

Mirror writing

"According to legend, some devil-pacts were written in retrograde to invoke diabolical powers. [...] Artists depicted retrograde writing as demonic. In a 15th c. block book, a demon is shown holding up a tablet on which the sins of the dying man's life are recorded in mirror writing..." (Skemer 121)

In [20]:
# from https://github.com/combatwombat/Lunicode.js/blob/master/lunicode.js
mirror_replacements = {
    'a': 'ɒ', 'b': 'd', 'c': 'ɔ', 'd': 'b', 'e': 'ɘ', 
    'f': 'Ꮈ', 'g': 'ǫ', 'h': 'ʜ', 'i': 'i', 'j': 'ꞁ',
    'k': 'ʞ', 'l': 'l', 'm': 'm', 'n': 'ᴎ', 'o': 'o',
    'p': 'q', 'q': 'p', 'r': 'ɿ', 's': 'ꙅ', 't': 'ƚ',
    'u': 'u', 'v': 'v', 'w': 'w', 'x': 'x', 'y': 'ʏ', 'z': 'ƹ',
    'A': 'A', 'B': 'ᙠ', 'C': 'Ɔ', 'D': 'ᗡ', 'E': 'Ǝ',
    'F': 'ꟻ', 'G': 'Ꭾ', 'H': 'H', 'I': 'I', 'J': 'Ⴑ',
    'K': '⋊', 'L': '⅃', 'M': 'M', 'N': 'Ͷ', 'O': 'O',
    'P': 'ꟼ', 'Q': 'Ọ', 'R': 'Я', 'S': 'Ꙅ', 'T': 'T',
    'U': 'U', 'V': 'V', 'W': 'W', 'X': 'X', 'Y': 'Y', 'Z': 'Ƹ'}
In [21]:
print(text + " " + replace_by_char(text, mirror_replacements))
in the beginning was the notebook iᴎ ƚʜɘ dɘǫiᴎᴎiᴎǫ wɒꙅ ƚʜɘ ᴎoƚɘdooʞ
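Strictly speaking, the cell above only swaps each letter for a mirrored glyph; the letters still run left to right. Retrograde writing would also reverse their order. Here's a minimal sketch; retrograde is a hypothetical helper, and the small map is a subset of mirror_replacements so the block runs on its own:

```python
def retrograde(s, glyph_map):
    """Reverse the order of the letters, then swap each one for its mirrored glyph."""
    # characters missing from glyph_map (spaces, punctuation) pass through unchanged
    return "".join(glyph_map.get(ch, ch) for ch in reversed(s))

# a few entries copied from mirror_replacements so this cell is self-contained
sample_mirror = {'a': 'ɒ', 'b': 'd', 'c': 'ɔ', 'd': 'b', 'r': 'ɿ'}
print(retrograde("abracadabra", sample_mirror))
```

In the notebook, retrograde(text, mirror_replacements) would mirror the whole test sentence.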

Mimicking handwriting mistakes and misinterpretations

Magic words gain power from being copied over and over; mistakes creep in that make the words strange. Lecouteux (p. xxi) suggests that the following accidental replacements were common in medieval manuscripts written in Roman scripts:

In [24]:
# suggested in Lecouteux, p. xxi
replacements = {
    'u': ['o', 'n'],
    'st': ['h'],
    'p': ['f'],
    'ni': ['m'],
    'rn': ['m'],
    'in': ['m'],
    'iu': ['m', 'in'],
    'r': ['t', 'z', 'c'],
    'l': ['t'],
    'c': ['t'],
    'd': ['ol']
}
In [23]:
import re
import random

These replacements have to be implemented a bit differently from the character substitution ciphers, because the patterns on the left have varying numbers of characters, so we can't just step through the source string one character at a time. The following code finds every occurrence of each pattern on the left (the dictionary keys) and, if a coin flip succeeds, replaces it with one of the suggested replacements on the right (the dictionary values), chosen at random.

In [26]:
out = text
for patt, repl in replacements.items():
    out = re.sub(patt,
                 lambda m: random.choice(repl) if random.random() < 0.5 else m.group(),
                 out)
print(text)
print(out)
in the beginning was the notebook
in the begmmng was the notebook
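One side effect of the loop above is that later patterns can match text produced by earlier replacements: a u that has just become n can then be caught by the in pattern, for example. If you'd rather have each stretch of the original miscopied at most once, here's a sketch that makes a single pass with one combined regular expression, trying longer patterns first. miscopy_pairs is a hypothetical name:

```python
import random
import re

def miscopy_pairs(s, replacements, prob=0.5, rng=random):
    """Apply letter confusions in one left-to-right pass over s."""
    # try longer patterns first so that, e.g., "rn" wins over "r"
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))

    def swap(match):
        # each matched stretch is replaced at most once; output is never rescanned
        if rng.random() < prob:
            return rng.choice(replacements[match.group()])
        return match.group()

    return pattern.sub(swap, s)

print(miscopy_pairs("ab", {"a": ["b"], "b": ["c"]}, prob=1.0))
```

The same call works with the replacements dictionary above, e.g. miscopy_pairs(text, replacements).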

Abbreviations

[In magic spells] "we find sequences of letters that can be the initials of words. [...] A passage from the Gesta Imperatorum suggests this; in fact we read there the sequence "P P P, S S S, R R R, F F F," meaning, "Pater patriae perditur, sapientia secum sustollitur, ruunt regna Rome ferro, flamma, fame." The series of letters would therefore be a mnemonic means used to retain whole phrases, but in charms it also serves as a way to keep things secret..." (Lecouteux xx)

The following function takes a string and returns the first few characters of each word in it (one by default, or as many as the take parameter specifies), as a list.

In [27]:
def abbrev(s, take=1):
    words = s.split()
    return [w[:take] for w in words]
abbrev("hello there how are you?")
Out[27]:
['h', 't', 'h', 'a', 'y']
In [28]:
abbrev(text, 2)
Out[28]:
['in', 'th', 'be', 'wa', 'th', 'no']
In [29]:
print(''.join(abbrev(text, 2)))
inthbewathno
In [30]:
init_cap = [item.capitalize() for item in abbrev(text, 2)]
print('. '.join(init_cap))
In. Th. Be. Wa. Th. No
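Going the other direction, a series of initials can serve as a mnemonic scaffold for a new phrase, as in the Gesta Imperatorum example. Here's a sketch (expand is a hypothetical helper) that picks a random word starting with each initial:

```python
import random

def expand(initials, words, rng=random):
    """Pick a random word starting with each initial, keeping the initials' order."""
    out = []
    for initial in initials:
        candidates = [w for w in words if w.startswith(initial)]
        # if no word matches, keep the bare initial
        out.append(rng.choice(candidates) if candidates else initial)
    return out

print(" ".join(expand(["p", "p", "p"], ["pater", "patriae", "perditur"])))
```

In the notebook, " ".join(expand(abbrev(text), nouns)) would build a new phrase from the noun list.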

Formatting

According to Skemer, magic words and formulas such as abracadabra and abraxas were "often written as diminishing and augmenting series of letters"—shaped in "inverted triangles" or "mandorlas" (116).

The following function implements a word triangle, in which the word is spelled out letter by letter, with each partial spelling on its own line (returned as a list). Printing the triangle a second time in reverse order creates a mandorla.

In [31]:
def triangle(s):
    out = []
    for i in range(len(s)):
        snippet = s[:i+1]
        out.append(snippet)
    return out
print("\n".join(triangle("abracadabra")))
print("\n".join(reversed(triangle("abracadabra"))))
a
ab
abr
abra
abrac
abraca
abracad
abracada
abracadab
abracadabr
abracadabra
abracadabra
abracadabr
abracadab
abracada
abracad
abraca
abrac
abra
abr
ab
a

The mandorla function performs both steps, without repeating the full word in the middle:

In [32]:
def mandorla(s):
    return triangle(s)[:-1] + list(reversed(triangle(s)))
In [33]:
print("\n".join(mandorla("abracadabra")))
a
ab
abr
abra
abrac
abraca
abracad
abracada
abracadab
abracadabr
abracadabra
abracadabr
abracadab
abracada
abracad
abraca
abrac
abra
abr
ab
a

Jupyter Notebook prints text left-aligned in a fixed-width font, so centered lines could only be faked by padding them with spaces. Instead, we'll write the lines out as HTML and display them with IPython's HTML display class:

In [34]:
from IPython.display import display, HTML
In [35]:
html_src = "<div style='text-align: center'>"
html_src += "<br>".join(mandorla("abracadabra"))
html_src += "</div>"
In [36]:
display(HTML(html_src))
a
ab
abr
abra
abrac
abraca
abracad
abracada
abracadab
abracadabr
abracadabra
abracadabr
abracadab
abracada
abracad
abraca
abrac
abra
abr
ab
a
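Because the lines are pasted directly into HTML, any word containing <, > or & would be interpreted as markup rather than displayed. A slightly safer sketch escapes each line with the standard library's html.escape first (centered_html is a hypothetical name):

```python
import html

def centered_html(lines):
    """Join lines into a centered <div>, escaping <, > and & so they display literally."""
    body = "<br>".join(html.escape(line) for line in lines)
    return "<div style='text-align: center'>" + body + "</div>"

print(centered_html(["a", "a<b", "a"]))
```

With the notebook's functions loaded, display(HTML(centered_html(mandorla("abracadabra")))) would produce the same mandorla as above.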

Here's a mandorla of the word abracadabra followed by its mirror replacement:

In [37]:
html_src = "<div style='text-align: center'>"
html_src += "<br>".join(mandorla("abracadabra" + replace_by_char("abracadabra", mirror_replacements)))
html_src += "</div>"
In [38]:
display(HTML(html_src))
a
ab
abr
abra
abrac
abraca
abracad
abracada
abracadab
abracadabr
abracadabra
abracadabraɒ
abracadabraɒd
abracadabraɒdɿ
abracadabraɒdɿɒ
abracadabraɒdɿɒɔ
abracadabraɒdɿɒɔɒ
abracadabraɒdɿɒɔɒb
abracadabraɒdɿɒɔɒbɒ
abracadabraɒdɿɒɔɒbɒd
abracadabraɒdɿɒɔɒbɒdɿ
abracadabraɒdɿɒɔɒbɒdɿɒ
abracadabraɒdɿɒɔɒbɒdɿ
abracadabraɒdɿɒɔɒbɒd
abracadabraɒdɿɒɔɒbɒ
abracadabraɒdɿɒɔɒb
abracadabraɒdɿɒɔɒ
abracadabraɒdɿɒɔ
abracadabraɒdɿɒ
abracadabraɒdɿ
abracadabraɒd
abracadabraɒ
abracadabra
abracadabr
abracadab
abracada
abracad
abraca
abrac
abra
abr
ab
a

Word squares

The Sator Square:

S A T O R
A R E P O
T E N E T
O P E R A
R O T A S

"Arepo the sower guides the wheels by his work" (Skemer's translation, pp. 116–117), an example of an apotropaic formula that "clearly worked best in writing" (134).

We can generate random word squares in Python.

The function below creates a string of random characters of the given length, using the specified alphabet.

In [39]:
def gen_str(n, alphabet):
    return ''.join([random.choice(alphabet) for i in range(n)])
gen_str(5, alphabet="abcdefghijklmnopqrstuvwxyz")
Out[39]:
'yhmok'

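You can pick the letters in the alphabet, and the word will be constrained to contain only those letters. Letters that appear more than once in the alphabet string (like the five a's in abracadabra) are proportionally more likely to be chosen.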

In [193]:
gen_str(5, alphabet="abracadabra")
Out[193]:
'aarad'

The gen_square function generates random word squares of size n using the given alphabet. ("Letter square" might be a better term here, since the function is not guaranteed to produce valid "words.")

In [41]:
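def gen_square(n, alphabet='abcdefghijklmnopqrstuvwxyz', start=None):
    # the first row is either random or supplied by the caller
    if start is None:
        rows = [gen_str(n, alphabet)]
    else:
        assert len(start) == n
        rows = [start]
    # build the top half; each new row has to agree with the columns built so far
    # ((n - 1) // 2 iterations, so even sizes also come out with exactly n rows)
    for i in range((n - 1) // 2):
        beg = ""
        end = ""
        for j in range(i+1):
            beg += rows[j][i+1]
            end += rows[j][-i-2]
        row = beg + gen_str(n - ((i+1)*2), alphabet) + ''.join(reversed(end))
        rows.append(row)
    # the bottom half is the top half rotated 180 degrees
    return rows + list(reversed([''.join(reversed(s)) for s in rows[:n // 2]]))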

Five on a side with random letters:

In [195]:
print("\n".join(gen_square(5)))
xoqgr
olfug
qfifq
guflo
rgqox

Five on a side, using only the letters in the Sator square and starting with the word sator:

In [52]:
print("\n".join(gen_square(5, alphabet="satorarepotenet", start="sator")))
sator
aarto
trtrt
otraa
rotas

Seven on a side, starting with the word allison:

In [43]:
print("\n".join(gen_square(7, start="allison")))
allison
lwediho
lehjpis
idjdjdi
sipjhel
ohidewl
nosilla

Five on a side, with an alphabet of emoji:

In [44]:
print()
print("\n".join(gen_square(5, alphabet="😀😄😁😆😅😂🤣😊😙😗😘🥰😍😌😉🙃🙂😇😚😋😛😝😜🤨🧐🤓😎")))
🙂😘😁😁😂
😘🙃😍🙂😁
😁😍😊😍😁
😁🙂😍🙃😘
😂😁😁😘🙂
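Every square above has the same two symmetries as the Sator square: it reads the same down the columns as across the rows, and the same again when rotated 180 degrees. Here's a small sketch for checking a square (is_sator_like is a hypothetical name):

```python
def is_sator_like(rows):
    """True if the square equals its own transpose and its own 180-degree rotation."""
    n = len(rows)
    if any(len(row) != n for row in rows):
        return False
    # column i, read top to bottom, matches row i
    transposed = all(''.join(row[i] for row in rows) == rows[i] for i in range(n))
    # the last row is the first row backwards, and so on inward
    rotated = all(rows[n - 1 - i] == rows[i][::-1] for i in range(n))
    return transposed and rotated

print(is_sator_like(["SATOR", "AREPO", "TENET", "OPERA", "ROTAS"]))
```

With the notebook's gen_square loaded, is_sator_like(gen_square(7, start="allison")) should also return True.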

Numerology

"Gematria was based on the fact that, in Hebrew, numbers are indicated by letters; this means that each Hebrew word can be given a numerical value, calculated by summing numbers represented by its letters. This allows mystic relations to be established between words having different meanings though identical numerical values..." (Eco 28)

Let's write some code that groups words by the sum of their letters' numerical values, counting a as 1, b as 2, and so on up to z as 26.

In [45]:
import string

# only the English alphabet (a-z) gets a value; any other character counts as 0
def letter_value(ch):
    if ch not in string.ascii_letters:
        return 0
    return ord(ch.lower()) - ord('a') + 1
letter_value('a')
Out[45]:
1

This function adds up the values of the letters in a word:

In [48]:
def gematriesque(s):
    return sum([letter_value(ch) for ch in s])
gematriesque('allison')
Out[48]:
82

The code below finds the sum of every noun in our noun list, then makes a lookup dictionary that shows us all of the nouns with a given sum.

In [49]:
from collections import defaultdict
by_sum = defaultdict(list)
word_to_sum = {}
In [50]:
for item in nouns:
    letter_sum = gematriesque(item)
    word_to_sum[item] = letter_sum
    by_sum[letter_sum].append(item)
In [51]:
by_sum[72]
Out[51]:
['ambulance',
 'carrier',
 'dawning',
 'discord',
 'homeland',
 'lifeline',
 'mayor',
 'sending',
 'tendon',
 'tracing']
In [52]:
gematriesque('allison')
Out[52]:
82

Display words with the same sum as the given word:

In [53]:
print("\n".join(by_sum[gematriesque('allison')]))
frenchman
apartheid
artisan
bowling
colors
conflict
glucose
gusto
hallway
indecency
innocence
juror
kangaroo
melodrama
panther
volcano
voltage
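Summing 1 through 26 is only one way to assign numbers. In the Hebrew system Eco describes, letters stand for units, tens, and hundreds. Here's a sketch of a hypothetical place-value scheme for the Latin alphabet in that style (a through i count 1 to 9, j through r count 10 to 90, s through z count 100 to 800); it's an illustration, not a historical system:

```python
import string

def place_value(ch):
    """Hypothetical place-value scheme: a-i = 1-9, j-r = 10-90, s-z = 100-800."""
    ch = ch.lower()
    if len(ch) != 1 or ch not in string.ascii_lowercase:
        return 0
    i = string.ascii_lowercase.index(ch)
    if i < 9:
        return i + 1            # units
    if i < 18:
        return (i - 8) * 10     # tens
    return (i - 17) * 100       # hundreds

def place_sum(s):
    return sum(place_value(ch) for ch in s)

print(place_sum("allison"))
```

To group the nouns this way, swap place_sum in for gematriesque in the loop above.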

The sound

"[T]hose who are skilled in the use of incantations, relate that the utterance of the same incantation in its proper language can accomplish what the spell professes to do; but when translated into any other tongue, it is observed to become inefficacious and feeble. And thus it is not the things signified, but the qualities and peculiarities of words, which possess a certain power for this or that purpose..."—Origen (in Richardson and Pick 406–407)

"The rhyme, repetition and alliteration of charms produced a sonorous effect that appealed to users and had psychological effects. [...] Words in a sacralized and euphonious language like Latin could be soothing to the ear and thus might seem to have an immediate magical effect. [...] Vocalized reading [was] better able to deter evil spirits" (Skemer 153)

"I don't think I can breathe / With the way you let me down [...] / I don't need the words / I want the sound, sound, sound..." (Jepsen)

A recurring concept with magic words is that what they sound like matters. So it would be nice if we had some way to compose magic words based solely on their phonetics. The problem (in English, at least) is creating the written form of a word from its sound—i.e., spelling.

Pincelate is a Python library that provides a simple interface for a machine learning model that can sound out English words and spell English words based on how they sound. "Sounding out" here means converting letters ("orthography") to sounds ("phonemes"), and "spelling" means converting sounds to letters (phonemes to orthography). The model is trained on the CMU Pronouncing Dictionary, which means it generally sounds words out as though speaking "standard" North American English, and spells words according to "standard" North American English rules (at least as far as the model itself is accurate).

Installing Pincelate

You need to install the tensorflow and pincelate modules. Open up a terminal window and type the following lines:

pip install tensorflow
pip install pincelate

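You'll also need the pronouncing module, which is imported below. If you're not using Anaconda, you might need to install a few other libraries as well:

pip install pronouncing
pip install numpy scipy ipywidgets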

Now import the libraries:

In [54]:
import numpy as np
import pronouncing as pr

Now import Pincelate and instantiate a Pincelate object. (This will load the pre-trained model provided with the package.)

In [55]:
from pincelate import Pincelate
Using TensorFlow backend.
In [56]:
pin = Pincelate()

Later in the notebook, I'm going to use some of Jupyter Notebook's interactive features, so I'll import the libraries here:

In [57]:
import ipywidgets as widgets
from IPython.display import display
from ipywidgets import interact, interactive_output, Layout, HBox, VBox

Sounding out and spelling

Pincelate is a machine learning model trained on the CMU Pronouncing Dictionary, a database of tens of thousands of English words along with their pronunciations. To get the pronunciation of a word:

In [58]:
pin.soundout("mimsy")
Out[58]:
['M', 'IH1', 'M', 'S', 'IY0']

... and to produce a plausible spelling for a word whose sounds you just made up, use the .spell() method, passing it a list of Arpabet phonemes:

In [59]:
pin.spell(['B', 'L', 'AH1', 'R', 'F'])
Out[59]:
'blurf'

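It's important to note that Pincelate's .soundout() method only works with letters that appear in the CMU Pronouncing Dictionary's vocabulary (lowercase letters only), and that .spell() only works with Arpabet phonemes. The cell below passes the string "étui" to .spell(), which treats each character as a phoneme and throws an error on é: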

In [60]:
pin.spell("étui")
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-60-59be7f6871e5> in <module>()
----> 1 pin.spell("étui")

/Users/allison/anaconda/lib/python3.6/site-packages/pincelate/__init__.py in spell(self, phones, temperature)
    130                 this_feat = phone_feature_map[item[:-1]]
    131             else:
--> 132                 this_feat = phone_feature_map[item]
    133             feats.append(this_feat)
    134         state_val = self.phon2orth.infer(

KeyError: 'é'
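If your input might contain accented letters, one option is to strip the accents before sounding words out. This sketch uses the standard library's unicodedata module. It assumes that plain a to z is a safe input for .soundout(), and it drops every other character (including apostrophes and hyphens) rather than guessing what the model accepts. plain_letters is a hypothetical name:

```python
import string
import unicodedata

def plain_letters(word):
    """Lowercase a word, strip accents, and drop every character outside a-z."""
    # NFKD splits "é" into "e" plus a combining accent, which the filter then removes
    decomposed = unicodedata.normalize("NFKD", word.lower())
    return "".join(ch for ch in decomposed if ch in string.ascii_lowercase)

print(plain_letters("étui"))
```

With Pincelate loaded, you'd then call pin.soundout(plain_letters("étui")).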

Spelling words from random phonemes

Let's invent somewhat plausible neologisms by drawing phonemes at random from the list of Arpabet phonemes. ("Neologism" is a fancy word for "made-up word.") Here's a list of all of the phonemes in the CMU Pronouncing Dictionary, plus examples in use:

    Phoneme Example Translation
    ------- ------- -----------
    AA      odd     AA D
    AE      at      AE T
    AH      hut     HH AH T
    AO      ought   AO T
    AW      cow     K AW
    AY      hide    HH AY D
    B       be      B IY
    CH      cheese  CH IY Z
    D       dee     D IY
    DH      thee    DH IY
    EH      Ed      EH D
    ER      hurt    HH ER T
    EY      ate     EY T
    F       fee     F IY
    G       green   G R IY N
    HH      he      HH IY
    IH      it      IH T
    IY      eat     IY T
    JH      gee     JH IY
    K       key     K IY
    L       lee     L IY
    M       me      M IY
    N       knee    N IY
    NG      ping    P IH NG
    OW      oat     OW T
    OY      toy     T OY
    P       pee     P IY
    R       read    R IY D
    S       sea     S IY
    SH      she     SH IY
    T       tea     T IY
    TH      theta   TH EY T AH
    UH      hood    HH UH D
    UW      two     T UW
    V       vee     V IY
    W       we      W IY
    Y       yield   Y IY L D
    Z       zee     Z IY
    ZH      seizure S IY ZH ER

The cell below has a Python list containing all of these phonemes:

In [61]:
all_phonemes = ['AH', 'N', 'S', 'IH', 'L', 'T', 'R', 'K', 'IY', 'D', 'M',
                'ER', 'Z', 'EH', 'AA', 'AE', 'B', 'P', 'OW', 'F', 'EY',
                'G', 'AO', 'AY', 'V', 'NG', 'UW', 'HH', 'W', 'SH', 'JH',
                'Y', 'CH', 'AW', 'TH', 'UH', 'OY', 'DH', 'ZH']

And then this function will return a random neologism, created from phonemes drawn at random from that list:

In [62]:
def neologism_phonemes():
    return [random.choice(all_phonemes) for item in range(random.randrange(3,10))]

Here's a handful, just to get a taste:

In [63]:
for i in range(5):
    print(neologism_phonemes())
['ER', 'AH', 'AO', 'AH', 'DH', 'B', 'K', 'S']
['JH', 'Y', 'OW', 'AA', 'R', 'Z', 'G', 'AY']
['R', 'HH', 'OY', 'N', 'R', 'OY', 'EY', 'EH']
['F', 'JH', 'NG']
['CH', 'SH', 'DH', 'UH', 'AA', 'S', 'IH']

That's all well and good! Try sounding out some of these on your own (consult the Arpabet table to find the English sound corresponding to each symbol).

But how do you spell these neologisms? Why, with Pincelate's .spell() method of course:

In [64]:
pin.spell(neologism_phonemes())
Out[64]:
'appengts'

Here's a for loop that generates neologisms and prints them along with their spellings:

In [65]:
for i in range(12):
    phonemes = neologism_phonemes()
    print(pin.spell(phonemes), phonemes)
anguinitaiz ['EY', 'NG', 'W', 'AA', 'IY', 'T', 'AY', 'JH', 'EY']
mang ['M', 'AE', 'N', 'G']
bninghich ['B', 'N', 'T', 'NG', 'AY', 'CH']
gauth ['G', 'AA', 'TH']
glojchivew ['G', 'F', 'JH', 'OY', 'CH', 'V', 'S', 'UW']
odchden ['AA', 'CH', 'D', 'N']
bsguing ['B', 'S', 'G']
gjuth ['JH', 'Y', 'TH', 'W']
enlplawen ['IY', 'N', 'L', 'W', 'P', 'L', 'AW', 'N']
pubge's ['P', 'UW', 'B', 'JH', 'Z']
jroshwok ['R', 'JH', 'SH', 'W', 'OW', 'HH', 'K']
adiet ['AH', 'D', 'IY', 'T']
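Because the phonemes are drawn without regard to their order, some of these neologisms stack up sounds in ways English doesn't normally allow (like F JH NG). Here's a sketch that alternates consonants and vowels instead. The vowel list is just the vowels from the Arpabet table above, and cv_neologism_phonemes is a hypothetical name:

```python
import random

VOWELS = ['AA', 'AE', 'AH', 'AO', 'AW', 'AY', 'EH', 'ER', 'EY',
          'IH', 'IY', 'OW', 'OY', 'UH', 'UW']
CONSONANTS = ['B', 'CH', 'D', 'DH', 'F', 'G', 'HH', 'JH', 'K', 'L', 'M',
              'N', 'NG', 'P', 'R', 'S', 'SH', 'T', 'TH', 'V', 'W', 'Y',
              'Z', 'ZH']

def cv_neologism_phonemes(syllables=None, rng=random):
    """Build a word from consonant-vowel syllables (1 to 3 if not specified)."""
    if syllables is None:
        syllables = rng.randrange(1, 4)
    out = []
    for _ in range(syllables):
        out.append(rng.choice(CONSONANTS))
        out.append(rng.choice(VOWELS))
    return out

for i in range(3):
    print(cv_neologism_phonemes())
```

With Pincelate loaded, pin.spell(cv_neologism_phonemes()) would spell one out.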

Phoneme features

The examples above use the phoneme as the basic unit of English phonetics. But each phoneme itself has characteristics, and many phonemes have characteristics in common. For example, the phoneme /B/ has the following characteristics:

  • bilabial: you put your lips together when you say it
  • stop: airflow from the lungs is completely obstructed
  • voiced: your vocal cords are vibrating while you say it

The phoneme /P/ shares two out of three of these characteristics (it's bilabial and a stop, but is not voiced). The phoneme /AE/, on the other hand, shares none of these characteristics. Instead, it has these characteristics:

  • vowel: your mouth doesn't stop or occlude airflow when making this sound
  • low: your tongue is low in the mouth
  • front: your tongue is advanced forward in the mouth
  • unrounded: your lips are not rounded

These characteristics of phonemes are traditionally called "features." You can look up the features for particular phonemes using the phone_feature_map variable in Pincelate's featurephone module:

In [66]:
from pincelate.featurephone import phone_feature_map

For example, to get the features for the vowel /UW/ (vowel sound in "toot"):

In [67]:
phone_feature_map['UW']
Out[67]:
('hgh', 'bck', 'rnd', 'vwl')

The features are referred to here with short three-letter abbreviations. Here's a full list:

  • alv: alveolar
  • apr: approximant
  • bck: back
  • blb: bilabial
  • cnt: central
  • dnt: dental
  • fnt: front
  • frc: fricative
  • glt: glottal
  • hgh: high
  • lat: lateral
  • lbd: labiodental
  • lbv: labiovelar
  • lmd: low-mid
  • low: low
  • mid: mid
  • nas: nasal
  • pal: palatal
  • pla: palato-alveolar
  • rnd: rounded
  • rzd: rhoticized
  • smh: semi-high
  • stp: stop
  • str: stressed (appears on stressed vowels)
  • umd: upper-mid
  • unr: unrounded
  • vcd: voiced
  • vel: velar
  • vls: voiceless
  • vwl: vowel

Additionally, there are two special phoneme features:

  • beg: beginning of word
  • end: end of word

... which are found at the beginnings and endings of words.

Internally, Pincelate's model operates on these phoneme features, instead of directly on whole phonemes. This allows the model to capture and predict underlying similarities between phonemes.

Pincelate's .phonemefeatures() method works a lot like .soundout(), except that instead of returning a list of phonemes, it returns a numpy array of phoneme feature probabilities. This array has one row for each predicted phoneme and one column for each phoneme feature, holding the probability (between 0 and 1) that the feature is a component of that phoneme. To illustrate, here I get the feature array for the word cat:

In [68]:
cat_feats = pin.phonemefeatures("cat")

This array has the following shape:

In [69]:
cat_feats.shape
Out[69]:
(5, 32)

... which tells us that there are five predicted phonemes. (The 32 is the total number of possible features.) The word cat, of course, has only three phonemes (/K AE T/)—the extra two are the special "beginning of the word" and "end of the word" phonemes at the beginning and end, respectively.

Examining predicted phoneme features

Let's look at the feature probabilities for the first phoneme (after the special "beginning of the word" token at index 0):

In [70]:
cat_feats[1]
Out[70]:
array([6.42707571e-04, 2.13692928e-07, 6.62605757e-08, 5.43442347e-10,
       5.47038814e-09, 7.04440527e-06, 1.58982238e-09, 1.66211791e-08,
       3.81101599e-05, 8.24350354e-05, 1.62252746e-07, 5.46323768e-08,
       1.41502560e-10, 5.33169420e-09, 7.31331828e-10, 2.70081146e-05,
       1.83614669e-04, 1.62359720e-05, 2.74244065e-11, 1.44446346e-07,
       3.33543511e-07, 1.91042790e-08, 3.52445828e-09, 4.54965146e-07,
       9.99929667e-01, 7.26780854e-05, 8.35576885e-10, 2.66875286e-04,
       1.75827936e-05, 9.99930263e-01, 9.99974251e-01, 1.87013138e-04])

You can look up the index in this array associated with a particular phoneme feature using Pincelate's .featureidx() method:

In [71]:
cat_feats[1][pin.featureidx('vel')]
Out[71]:
0.9999302625656128

This tells us that the vel (velar) feature for this phoneme is predicted with almost 100% probability. That makes sense: the phoneme we'd anticipate, /K/, is a voiceless velar stop.

The following bit of code steps through each row in this array and prints the five phoneme features with the highest probability in that row, using numpy's argsort function:

In [72]:
def idxfeature(pin, idx):
    return pin.orth2phon.target_vocab[idx]
for i, phon in enumerate(cat_feats):
    print("phoneme", i)
    for idx in np.argsort(phon)[::-1][:5]:
        print(idxfeature(pin, idx), phon[idx])
    print()
phoneme 0
beg 1.0
vwl 0.0
vls 0.0
apr 0.0
bck 0.0

phoneme 1
vls 0.999974250793457
vel 0.9999302625656128
stp 0.999929666519165
alv 0.000642707571387291
unr 0.00026687528588809073

phoneme 2
unr 0.9997866749763489
vwl 0.9990422129631042
str 0.9986899495124817
fnt 0.9959463477134705
low 0.9807271957397461

phoneme 3
vls 0.9993033409118652
alv 0.9990631937980652
stp 0.9904974102973938
frc 0.0036416002549231052
end 0.0013078120537102222

phoneme 4
end 0.9997904896736145
fnt 0.0006787743768654764
vwl 0.000589678471442312
unr 0.0005248847301118076
str 0.0003406509349588305

We'll come back to a more complete example that shows how to manipulate these values below.
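As a first step in that direction, you could map a row of feature probabilities back to the nearest whole phoneme. This sketch (nearest_phoneme is a hypothetical name) scores each phoneme by the Jaccard overlap between its feature set and the set of features that are switched on. The tiny feature map below is made up for illustration:

```python
def nearest_phoneme(active, feature_map):
    """Return the phoneme whose feature set overlaps most with `active` (Jaccard similarity)."""
    active = set(active)

    def similarity(entry):
        feats = set(entry[1])
        union = active | feats
        return len(active & feats) / len(union) if union else 0.0

    return max(feature_map.items(), key=similarity)[0]

# made-up stand-in for phone_feature_map, with three entries
toy_map = {'K': ('vls', 'vel', 'stp'),
           'T': ('vls', 'alv', 'stp'),
           'AE': ('low', 'fnt', 'unr', 'vwl')}
print(nearest_phoneme({'vls', 'vel', 'stp'}, toy_map))
```

With Pincelate loaded, you could collect the names of the features above some threshold (say 0.5) in a row of cat_feats using idxfeature(), then call nearest_phoneme(active, phone_feature_map). You may want to leave out the special beg and end features first.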

Example: Resizing feature probability arrays

Once you have the phonetic feature probability arrays, you can treat them the same way you'd treat any other numpy array. One thing I like to do is use scipy's image manipulation functions to resample the phonetic feature arrays. This lets us use the same phonetic information to spell a shorter or longer word. In particular, scipy.ndimage has a handy zoom function that resamples an array and interpolates between its values. Normally you'd use this to resize an image, but nothing's stopping us from using it to resize our phonetic feature array.

First, import the function:

In [73]:
from scipy.ndimage import zoom

Then get some phoneme feature probabilities:

In [74]:
feats = pin.phonemefeatures("alphabet")

Then resize with zoom(). The second parameter to zoom() is a tuple with the factor by which to scale the dimensions of the incoming array. We only want to scale along the first axis (i.e., the phonemes), keeping the second axis (i.e., the features) constant.

A shorter version of the word:

In [75]:
shorter = zoom(feats, (0.67, 1))
pin.spellfeatures(shorter)
Out[75]:
'albe'

A longer version:

In [76]:
longer = zoom(feats, (2.0, 1))
pin.spellfeatures(longer)
Out[76]:
'all-phafebet'
In [77]:
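def stretch_words(s, factor=1.0):
    out = []
    for word in s.split():
        word = word.lower()
        vec = pin.phonemefeatures(word)
        # shrinking: cubic spline interpolation (order=3) blends neighboring rows;
        # stretching: nearest-neighbor sampling (order=0) repeats existing rows
        if factor < 1.0:
            order = 3
        else:
            order = 0
        # scale the phoneme axis only; the feature axis keeps its size
        zoomed = zoom(vec, (factor, 1), order=order)
        out.append(pin.spellfeatures(zoomed))
    return " ".join(out)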
stretch_words("this is a test", factor=1.5)
/Users/allison/anaconda/lib/python3.6/site-packages/scipy/ndimage/interpolation.py:605: UserWarning: From scipy 0.13.0, the output shape of zoom() is calculated with round() instead of int() - for these inputs the size of the returned array has changed.
  "the returned array has changed.", UserWarning)
Out[77]:
"theothis' ayes' ah ttestsed"

If you've downloaded this notebook and you're following along running the code, the following cell will create an interactive widget that lets you "stretch" and "shrink" the words that you type into the text box by dragging the slider.

In [78]:
import warnings
warnings.filterwarnings('ignore')
@interact(words="in the beginning was the notebook", factor=(0.1, 4.0, 0.1))
def stretchy(words, factor=1.0):
    print(stretch_words(words, factor))
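stretch_words uses nearest-neighbor sampling (order=0) when stretching and cubic splines (order=3) when shrinking. One plausible reason, offered as a guess rather than something stated here, is that nearest-neighbor stretching repeats whole phoneme rows instead of smearing neighboring rows together. A toy array makes the difference visible without Pincelate:

```python
import numpy as np
from scipy.ndimage import zoom

# a toy "feature array": 3 phoneme rows x 4 feature columns, one feature per row
feats = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0]])

nearest = zoom(feats, (2.0, 1), order=0)  # every output row is a copy of an input row
spline = zoom(feats, (2.0, 1), order=3)   # rows in between blend neighboring input rows

print(nearest)
print(np.round(spline, 2))
```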

Round-trip spelling manipulation

Pincelate actually consists of two models: one that knows how to sound out words based on how they're spelled, and another that knows how to spell words from sounds. Pincelate's .manipulate() method does a "round trip" re-spelling of a word, passing it through both models (from spelling to sound, then back to spelling). Try it out:

In [79]:
pin.manipulate("poetic")
Out[79]:
'poetic'

On the surface, this isn't very interesting! You don't need Pincelate to tell you how to spell a word that you already know how to spell. But .manipulate() has a handful of parameters that let you mess around with the model's internal workings in fun and interesting ways. The first is the temperature parameter, which artificially increases or decreases the amount of randomness in the model's output probabilities.

Spelling temperature

When the temperature is close to zero, the model (almost) always picks the most likely letter at each step:

In [80]:
pin.manipulate("poetic", temperature=0.01)
Out[80]:
'poetic'

As you raise the temperature toward 1.0, the model increasingly picks letters at random according to its underlying probabilities:

In [81]:
pin.manipulate("poetic", temperature=1.0)
Out[81]:
'poetick'

At temperatures above 1.0, the model is more likely to pick letters with low probabilities, producing less likely spellings:

In [82]:
pin.manipulate("poetic", temperature=1.5)
Out[82]:
'poetike'

At a high enough temperature, the model's spelling feels essentially random:

In [83]:
pin.manipulate("poetic", temperature=3.0)
Out[83]:
'ppeetinh'
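Pincelate's internal code isn't shown here, but the usual way to apply a temperature to a model's output distribution is to divide the log-probabilities by the temperature and renormalize. The following is a generic NumPy sketch of that technique; apply_temperature() and sample() are hypothetical helpers, not part of Pincelate, and the probabilities for three candidate letters are made up.

```python
import numpy as np

def apply_temperature(probs, temperature):
    """Rescale a probability distribution by a temperature.
    Temperatures near zero sharpen it toward the most likely item,
    1.0 leaves it unchanged, and values above 1.0 flatten it."""
    probs = np.asarray(probs, dtype=float)
    logits = np.log(probs) / temperature
    logits -= logits.max()  # subtract the max for numerical stability
    scaled = np.exp(logits)
    return scaled / scaled.sum()

def sample(probs, temperature, rng=None):
    """Pick an index at random from the temperature-adjusted distribution."""
    rng = rng if rng is not None else np.random.default_rng()
    p = apply_temperature(probs, temperature)
    return rng.choice(len(p), p=p)

# hypothetical probabilities for three candidate letters at one step
probs = [0.8, 0.15, 0.05]
for t in (0.01, 1.0, 3.0):
    print(t, np.round(apply_temperature(probs, t), 3))
```

At 0.01 nearly all of the probability lands on the first letter; at 3.0 the second and third letters get a much bigger share, which matches the increasingly odd spellings above.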

The following interactive widget lets you play with the temperature parameter:

In [84]:
@interact(s="your text here", temp=(0.05, 2.5, 0.05))
def tempadjust(s, temp):
    return ' '.join([pin.manipulate(w.lower(), temperature=temp) for w in s.split()])

Manipulating letter and phoneme frequencies

The manipulate() method takes two other parameters: letters and features. These are dictionaries that map letters or phoneme features to exponential multipliers. When Pincelate spells the word, it uses these multipliers to adjust the probability of the corresponding letters (or features) at each step. Somewhat unintuitively, positive values reduce the corresponding probability, while negative values increase it.

Here's an example to make this clearer. First, respelling a word while strongly discouraging the letter e:

In [85]:
pin.manipulate("spelling", letters={'e': 10})
Out[85]:
'spilling'

Let's do this for a set of randomly selected words from the noun list:

In [86]:
for noun in random.sample(nouns, 10):
    print(noun, pin.manipulate(noun, letters={'e': 20}))
carcass carcas
nuisance nusancl
stylus stylus
humility humility
firing firing
councilman councilman
sediment shadimant
constable constably
contentment contintmint
inaction inaction
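One way to read "exponential multipliers" with this sign convention is that each affected probability is scaled by exp(-value) and the distribution is then renormalized. The sketch below illustrates that reading on a toy three-letter vocabulary. bias_distribution() is a hypothetical helper, and the exp(-value) mechanism is an assumption, not Pincelate's actual code.

```python
import numpy as np

def bias_distribution(probs, vocab, adjustments):
    """Assumed mechanism: scale each listed item's probability by exp(-value),
    then renormalize. Positive values shrink an item's probability,
    negative values grow it; items not in vocab are ignored."""
    probs = np.asarray(probs, dtype=float).copy()
    for item, value in adjustments.items():
        if item in vocab:
            probs[vocab.index(item)] *= np.exp(-value)
    return probs / probs.sum()

vocab = ['a', 'e', 'i']
probs = [0.2, 0.6, 0.2]  # made-up probabilities for one spelling step
print(np.round(bias_distribution(probs, vocab, {'e': 10}), 4))
```

Under this reading, a value of 10 shrinks e's probability by a factor of about 22,000, which is why the respellings above almost never contain an e even though it is never strictly forbidden.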

The features parameter does the same thing, except it adjusts the probability of particular phoneme features at each step. For example, this makes words more nasal:

In [87]:
pin.manipulate("spelling", features={'nas': -10})
Out[87]:
'smnenging'

The following code respells a list of random nouns, making their vowels more rounded (rnd) and further back in the mouth (bck):

In [88]:
for noun in random.sample(nouns, 10):
    print(noun, pin.manipulate(noun, features={'bck': -2, 'rnd': -5}))
vomiting vowautung
ralph raulfough
footing foutung
rendition rundutuon
mechanics mocaunux
cabbage kaubauge
cilantro solauntuo
criminality cruminauloup
lineage lunaug
disobedience dusoubuluong

Interactive manipulation tool

The following cells make an interactive tool for adjusting the temperature, letter probabilities, and phoneme probabilities.

In [89]:
import ipywidgets as widgets
from IPython.display import display
from ipywidgets import interact, interactive_output, Layout, HBox, VBox
In [90]:
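def manipulate(instr="allison", temp=0.25, **kwargs):
    # kwargs holds one value per slider. The values are negated here so that,
    # in this tool, a positive slider value *increases* the probability of a
    # letter or feature (the reverse of pin.manipulate()'s convention).
    letters = {k: v*-1 for k, v in kwargs.items()
               if k in pin.orth2phon.src_vocab_idx_map.keys()}
    features = {k: v*-1 for k, v in kwargs.items()
                if k in pin.orth2phon.target_vocab_idx_map.keys()}
    # respell each (lowercased) word of the input separately
    return ' '.join([
        pin.manipulate(w.lower(), letters=letters, features=features,
                       temperature=temp)
        for w in instr.split()])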
In [91]:
def make_slider(label):
    # every letter/feature slider runs from -20 to 20 and starts at 0 (no adjustment)
    return widgets.FloatSlider(description=label,
                               continuous_update=False,
                               value=0, min=-20, max=20, step=0.5,
                               layout=Layout(height="10px"))

# one slider per letter in the spelling vocabulary, skipping punctuation
orth_sliders = {ch: make_slider(ch)
                for ch in pin.orth2phon.src_vocab_idx_map.keys()
                if ch not in "'-."}
# one slider per phoneme feature, skipping "beg", "end", "cnt" and "dnt"
phon_sliders = {feat: make_slider(feat)
                for feat in pin.orth2phon.target_vocab_idx_map.keys()
                if feat not in ("beg", "end", "cnt", "dnt")}

instr = widgets.Text(description='input', value="spelling words with machine learning")
tempslider = widgets.FloatSlider(description='temp', continuous_update=False,
                                 value=0.3, min=0.01, max=5, step=0.05)
# letter sliders (plus temperature) on the left, feature sliders on the right
left_box = VBox(tuple(orth_sliders.values()) + (tempslider,))
right_box = VBox(tuple(phon_sliders.values()))
all_sliders = HBox([left_box, right_box])

# re-run manipulate() and print the result whenever a control's value changes
out = interactive_output(lambda *args, **kwargs: print(manipulate(*args, **kwargs)),
                         dict(instr=instr, temp=tempslider, **orth_sliders, **phon_sliders))
out.layout.height = "100px"
display(VBox([all_sliders, instr]), out)

Phonetic states

The Pincelate model also produces a "hidden state": a single fixed-size vector that represents the overall sound of a word. (You can think of this as a point in space, like a point on a Cartesian plane but with many more dimensions, where words with similar sounds are clustered near each other.) To get the hidden state of a word, call the .phonemestate() method:

In [92]:
pin.phonemestate('abracadabra')
Out[92]:
array([ 7.95686364e-01, -8.32179904e-01, -1.32981718e+00,  7.25831270e-01,
       -2.64316416e+00,  1.57794631e+00, -1.49719226e+00,  2.60457993e+00,
       -3.31631720e-01, -6.20785542e-02, -1.07942343e+00, -9.35500801e-01,
        1.13087571e+00, -2.40438804e-02, -3.28609198e-01,  2.97865009e+00,
        5.29175103e-01,  1.03818035e+00, -1.86510909e+00,  1.05075657e+00,
        1.13979602e+00,  2.85125399e+00, -6.54058456e-01,  5.91307104e-01,
        4.18249458e-01,  4.07120883e-01,  2.90681601e-01, -2.21350479e+00,
        6.69969380e-01, -6.35705888e-01, -1.40898752e+00,  1.23353994e+00,
       -4.64894950e-01, -5.61830521e-01, -2.65465081e-01,  6.93497515e+00,
        2.54075122e+00, -3.86470616e-01,  7.37920403e-01, -2.52454400e-01,
        1.13615263e+00,  1.07363796e+00, -3.24268669e-01,  2.30040264e+00,
        1.46473849e+00, -2.06925702e+00, -1.03245997e+00, -1.25596628e-01,
       -1.65496230e+00, -4.91467148e-01, -5.36341250e-01,  4.08115983e-01,
        1.84644151e+00, -1.96521312e-01, -9.94934380e-01, -1.75815284e-01,
       -1.07653344e+00, -3.44106033e-02,  2.51568604e+00,  4.28566813e-01,
        4.42921072e-01,  1.39196253e+00, -1.56479609e+00,  3.04453349e+00,
        2.39666963e+00,  8.14390123e-01, -2.70349789e+00,  1.15729785e+00,
       -7.88961649e-02,  2.75429010e-01,  6.31406188e-01,  1.58451569e+00,
       -1.55730531e-01,  2.57675266e+00, -1.86182892e+00,  1.68593317e-01,
       -1.95982814e+00, -7.32693970e-01,  7.66813755e-03, -5.66927716e-02,
       -4.79854643e-01, -1.47521091e+00, -3.14706206e+00, -1.85165763e-01,
        1.51251328e+00, -3.31812084e-01,  3.72764409e-01,  1.87518907e+00,
        7.84418583e-01,  5.91440462e-02,  2.49756783e-01, -6.65867984e-01,
       -2.45798969e+00,  2.43706182e-01, -1.74799120e+00,  6.31147289e+00,
       -2.21082544e+00, -6.17550135e-01, -1.05487323e+00,  1.32610798e+00,
       -1.96974850e+00,  6.00875989e-02, -7.77341351e-02,  3.41730595e-01,
       -3.29071307e+00,  1.91098666e+00,  2.74943769e-01,  2.36249596e-01,
       -7.78424263e-01, -1.48498321e+00, -1.75742328e-01, -2.70122141e-01,
       -7.82502234e-01,  1.02417684e+00,  1.33242464e+00,  8.82816672e-01,
       -9.57970083e-01, -1.86585039e-01, -8.48214865e-01,  1.15504694e+00,
       -1.22457647e+00,  2.49675870e-01,  1.96862161e+00, -3.13274860e-01,
        2.70345712e+00,  1.11661434e+00,  1.75637615e+00, -3.24920726e+00,
       -1.31210089e+00,  7.51341939e-01, -4.61002064e+00, -1.79387522e+00,
       -2.13482738e-01,  1.16403735e+00,  6.09336972e-01, -1.19726789e+00,
        6.51616156e-01, -1.64964771e+00, -1.07895292e-01,  1.17505085e+00,
        1.00255024e+00,  2.09715486e+00, -2.84226799e+00,  3.04437727e-01,
       -8.29695046e-01,  1.77979434e+00, -3.90957534e-01, -1.63378143e+00,
        1.43395996e+00, -4.61261392e-01,  4.31022048e-03,  2.70064235e-01,
       -2.65720755e-01, -1.66805908e-01,  7.00646102e-01, -3.77741992e-01,
        8.39838505e-01,  1.02057767e+00,  1.36773157e+00, -5.73049784e-01,
        3.41587991e-01, -8.69696915e-01, -5.50617874e-01, -7.18180537e-01,
        5.41177392e-01, -7.49346852e-01,  1.33970344e+00, -1.03110039e+00,
        6.94945455e-01, -1.95170224e-01, -1.03363979e+00,  2.98215580e+00,
        3.45216870e-01, -2.18459392e+00,  2.91187835e+00,  9.79840875e-01,
        3.20049500e+00,  7.04905629e-01,  1.92975909e-01,  7.36262500e-01,
       -3.34599018e-02,  1.89192712e+00,  8.96418840e-02, -1.41968474e-01,
       -1.46555102e+00, -1.18895268e+00, -1.25323486e+00,  5.48723757e-01,
        1.16233110e+00, -3.77950400e-01, -2.00661182e+00,  3.27691698e+00,
       -1.96016419e+00, -2.57373786e+00,  1.35590124e+00,  3.65701348e-01,
       -3.07851863e+00, -1.65423945e-01,  1.09554805e-01,  4.22158629e-01,
       -4.81078625e-01,  1.02364518e-01,  1.48046303e+00, -1.36909890e+00,
       -9.12416160e-01, -2.13123873e-01,  1.57091486e+00,  1.03272748e+00,
        3.81099284e-02,  3.83975387e-01,  2.15760851e+00,  6.17110789e-01,
       -5.82861066e-01, -1.10520041e+00, -8.93351912e-01, -4.44957986e-02,
        1.46159840e+00, -1.04589856e+00, -1.55343175e+00, -1.07608688e+00,
        1.22968698e+00,  8.79801631e-01, -1.39852309e+00,  4.19925094e-01,
        1.06851876e+00, -1.04367542e+00, -2.36931384e-01,  2.73201913e-01,
       -6.20300889e-01, -1.98342371e+00,  1.82388949e+00,  1.52567357e-01,
        1.38442791e+00, -1.00117397e+00, -6.62417471e-01, -3.00938010e+00,
        1.61543345e+00, -1.76816809e+00,  2.49266005e+00,  4.57145870e-01,
        7.06938148e-01, -1.41129887e+00, -7.02914178e-01, -5.97419918e-01,
       -1.53821719e+00,  2.86762547e+00,  1.05890915e-01,  6.03433847e-01,
        9.90708888e-01,  2.16755056e+00,  8.05558801e-01, -3.54735470e+00,
       -2.80189663e-01,  3.64496589e+00,  2.59146929e-01, -1.99815035e+00],
      dtype=float32)

This is a long, strange-looking list of numbers (a 256-dimensional vector, to be specific) that doesn't seem meaningful on its own. But we can do some interesting things with it.
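If similar-sounding words lie near each other in this space, one common way to compare two such vectors is cosine similarity. The following sketch uses NumPy only. cosine_similarity() and most_similar() are hypothetical helpers, and whether cosine similarity is the best measure for Pincelate's states is an assumption.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction,
    0.0 means orthogonal, -1.0 means opposite."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def most_similar(target, candidates):
    """Sort (label, vector) pairs by cosine similarity to target, most similar first."""
    return sorted(candidates,
                  key=lambda pair: cosine_similarity(target, pair[1]),
                  reverse=True)
```

With Pincelate loaded, something like most_similar(pin.phonemestate('abracadabra'), [(w, pin.phonemestate(w)) for w in random.sample(nouns, 100)]) would rank a sample of nouns by how close their states are to that of abracadabra (untested).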

Blending words

We can manipulate this underlying representation in various ways and then spell the word resulting from that manipulation with the .spellstate() method. The following code phonetically blends two words:

In [93]:
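def blend(a, b, factor=0.5):
    # weighted average of the two words' phoneme states:
    # factor=1.0 gives a's state, factor=0.0 gives b's state
    start = pin.phonemestate(a)
    end = pin.phonemestate(b)
    return pin.spellstate((start * factor) + (end * (1 - factor)))
blend('paper', 'plastic')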
Out[93]:
'paceter'

The following code picks ten random pairs of nouns and shows the blend between each pair:

In [94]:
for i in range(10):
    worda = random.choice(nouns)
    wordb = random.choice(nouns)
    print(worda, " → ", blend(worda, wordb), " → ", wordb)
philosopher  →  alloiser  →  allies
extinction  →  tichtion  →  tycoon
variation  →  neawirting  →  networking
glitter  →  gietter  →  headcount
corpus  →  torsales  →  translation
weariness  →  fraising  →  phrasing
cheesecake  →  contecater  →  contractor
youngster  →  linguitor  →  liquidation
standpoint  →  sandipite  →  publicity
atheism  →  geinthism  →  greens
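blend() computes a weighted average of two phoneme states, which is a point on the straight line between them. Sampling several points along that line gives a sequence of in-between states. The sketch below works on plain NumPy arrays; interpolate_states() is a hypothetical helper, not part of Pincelate.

```python
import numpy as np

def interpolate_states(start, end, steps=5):
    """Return `steps` vectors evenly spaced on the line from start to end,
    including both endpoints (t=0 is start, t=1 is end)."""
    start = np.asarray(start, dtype=float)
    end = np.asarray(end, dtype=float)
    return [start * (1 - t) + end * t for t in np.linspace(0.0, 1.0, steps)]
```

Spelling each state, e.g. [pin.spellstate(s) for s in interpolate_states(pin.phonemestate('paper'), pin.phonemestate('plastic'), steps=7)], would in principle trace a path from one word to the other, though the in-between spellings will vary from word pair to word pair.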

Variants with noise

"[M]agic spells come in a wide variety [...]. [W]hat seems to be most important is the sound, which is often based on alliterations and homophones. The use of sounds prompts a series of variations on a single word, such as, "festella, festelle, festelle festelli festello festello, festella festellum," used to banish all kinds of fistulas." (Lecouteux xix)

We can create variants of existing words by adding randomly generated noise to the phoneme state vector. For example:

In [96]:
state = pin.phonemestate("abracadabra")
pin.spellstate(state + np.random.randn(256))
Out[96]:
'a-prakanbabaragh'

The following function adds random noise, scaled by factor, to the phoneme state of the given word, then spells the resulting vector:

In [97]:
def noisy(word, factor=1.0):
    state = pin.phonemestate(word) + np.random.randn(256) * factor
    return pin.spellstate(state)
noisy("allison", 0.5)
Out[97]:
'alison'
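noisy() draws fresh noise on every call, so a variant you like can't easily be regenerated. Below is a sketch of a reproducible alternative. It assumes NumPy 1.17 or later for np.random.default_rng(), and add_noise() is a hypothetical helper. It takes an optional seed and sizes the noise from the state vector itself rather than hard-coding 256.

```python
import numpy as np

def add_noise(state, factor=1.0, seed=None):
    """Return a copy of state plus Gaussian noise with standard deviation factor.
    The same seed always produces the same noise."""
    rng = np.random.default_rng(seed)
    state = np.asarray(state, dtype=float)
    return state + rng.standard_normal(state.shape) * factor
```

Assuming spellstate() is itself deterministic, pin.spellstate(add_noise(pin.phonemestate("abracadabra"), 0.8, seed=42)) would then give the same variant every time.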

Adding progressively more noise:

In [101]:
for i in range(5, 25):
    factor = i * 0.1
    print("%0.02f" % factor, noisy("abracadabra", factor))
0.50 abrocadable
0.60 abrakadabara
0.70 abracadabra
0.80 arkrakadorab
0.90 abricadabana
1.00 abrocadra
1.10 mbrimidaga
1.20 ammracibadiam
1.30 ab-abaganberrieg
1.40 apcricabada
1.50 hahrhachadagrada
1.60 e
1.70 zbrccracherb
1.80 adradadadrada
1.90 abbbb
2.00 jjqteepadagedavad
2.10 hhhhh
2.20 eegekwywywbyb-byg-b
2.30 ggpradabadarpa
2.40 vqmrttctttritttt-bj

Bibliography

Crain, Patricia. The Story of A: The Alphabetization of America from the New England Primer to the Scarlet Letter. Stanford University Press, 2000.

Eco, Umberto. The Search for the Perfect Language. Blackwell, 1997.

Jepsen, Carly R. “The Sound.” Dedicated. By Jepsen, Carly R., et al., 2019. Digital release.

Lecouteux, Claude. Dictionary of Ancient Magic Words and Spells: From Abraxas to Zoar. First U.S. edition, Inner Traditions, 2015.

Leggott, Michele J. Reading Zukofsky’s 80 Flowers. Johns Hopkins University Press, 1989.

Richardson, Ernest Cushing, and Bernhard Pick, editors. The Ante-Nicene Fathers: Translations of the Writings of the Fathers down to A.D. 325. C. Scribner’s Sons, 1905.

Skemer, Don C. Binding Words: Textual Amulets in the Middle Ages. Penn State Press, 2010.