(Early draft, incomplete, under construction gif here)
The goal of this notebook is to demonstrate some computational means for exploring the literary genre of the magic word. For present purposes, I define a "magic word" as a string of letters that affords a foregrounding of its material properties (e.g., spelling, pronunciation), and suggests some effect beyond meaning alone. The underlying assumption (maybe faulty) is that magic words with similar material properties will also have similar effects, and that by writing computer programs to produce magic words (whether from whole cloth or as variants on other magic words), we can produce new magic words with new effects.
I don't understand this notebook as a way of casting spells, but merely as a way of investigating potential forms. Hence: speculative magic words.
The notebook serves as a demonstration of (1) Python string manipulation techniques; and (2) the Pincelate library for grapheme-to-phoneme and phoneme-to-grapheme translation.
Some of these examples will be data-driven, i.e., we need an existing corpus of words. Download this file into the same folder as this notebook like so:
!curl -L -O https://raw.githubusercontent.com/dariusk/corpora/master/data/words/nouns.json
The file contains a list of English nouns. The code in the cell below reads them into a list that we'll use throughout the notebook.
import json
nouns = [item.lower() for item in json.load(open("nouns.json"))['nouns']]
The random module has a function, choice, that picks one item from a list at random:
import random
random.choice(nouns)
'mediator'
"[W]riting gave physical permanence to words.... Written words continued to act in one's behalf long after the sound of spoken words had ceased" (Skemer 133)
"Motion terminates at no other end save its own beginning, in order to cease and rest in it... In the intelligible world... Grammar begins with the letter, from which all writing is derived and into which it is all resolved" (John Scotus Erigena, quoted in Leggott 46)
"[T]he unit of textual meaning—the letter—lacks meaning itself. The alphabet's semantic vacuum represents a threat to orthodoxy, for into this space competing meaning systems may rush." (Crain 18)
The words in many apotropaic charms exhibit certain kinds of manipulation that we can characterize as orthographic in nature—i.e., they have to do with the letters in the words. In this section of the notebook, I show some computer code for performing these transformations explicitly.
The following cell defines a short text that we'll use for testing purposes:
text = "in the beginning was the notebook"
"In medieval manuscripts, the letters themselves were frequently a source of confusion. [...] The first letters of words can be omitted... while others are doubled up.... Words can be dislocated," "compounded," "contracted," "abbreviated"; "letters vanish. [...] [W]e should also mention the variations made with uppercase and lowercase letters.... May this overview give the reader a small idea of the difficulties encountered by the researcher!" (Lecouteux xxi)
"Cacography" here means writing with mistakes. In medieval grimoires, mistakes were usually introduced as errors in copying, but the presence of errors actually made people perceive the spells as more powerful. We can simulate these errors in Python.
This operation "contracts" two words, smooshing together the first half of the first word and the second half of the second.
noun1 = random.choice(nouns)
noun2 = random.choice(nouns)
print(noun1, noun2)
spoiler intercession
noun1[:int(len(noun1)/2)] + noun2[int(len(noun2)/2):]
'spoession'
In function form:
def smoosh(a, b):
    return a[:int(len(a)/2)] + b[int(len(b)/2):]
smoosh("allison", "parrish")
'allrish'
This operation inserts random spaces, dislocating words from each other.
out = ""
for ch in text:
    if random.random() < 0.1:
        out += " "
    out += ch
print(out)
in the beginning was the notebo ok
As a function:
def dislocate(s, prob=0.1):
    out = ""
    for ch in s:
        if random.random() < prob:
            out += " "
        out += ch
    return out
dislocate("abracadabra")
'ab rac adabra'
dislocate("abracadabra", 0.75)
' a b r ac ad a b r a'
Another strategy for producing magic words is transliterating them (e.g., converting Greek letters to their Roman equivalent) or applying ciphers (like a substitution cipher, in which each letter is replaced with another letter). These techniques retain the underlying structure of the spelling, so the resulting form doesn't look entirely random. But it doesn't retain the surface form—it makes the familiar unfamiliar.
The function below implements simple character replacement:
def replace_by_char(s, ch_map):
    out = ""
    for ch in s:
        if ch in ch_map:
            out += ch_map[ch]
        else:
            out += ch
    return out
You need to give the function a dictionary that maps any letter expected in the input to a corresponding letter to output. This dictionary maps each letter to the letter that follows it in the alphabet:
nextch_map = {
'a': 'b', 'b': 'c', 'c': 'd', 'd': 'e',
'e': 'f', 'f': 'g', 'g': 'h', 'h': 'i',
'i': 'j', 'j': 'k', 'k': 'l', 'l': 'm',
'm': 'n', 'n': 'o', 'o': 'p', 'p': 'q',
'q': 'r', 'r': 's', 's': 't', 't': 'u',
'u': 'v', 'v': 'w', 'w': 'x', 'x': 'y',
'y': 'z', 'z': 'a'
}
Call it on a string:
replace_by_char("allison parrish", nextch_map)
'bmmjtpo qbssjti'
A well-known cipher in computer programming culture is rot13, in which each character is replaced with the character that comes thirteen spots later in the alphabet (wrapping around the end of the alphabet as needed). It's so common, it's already implemented in Python:
import codecs
codecs.encode("allison parrish", 'rot13')
'nyyvfba cneevfu'
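Typing out a 26-entry dictionary by hand gets tedious if you want to try other shifts. The sketch below builds the same kind of map for any shift amount. The names make_shift_map and apply_map are my own for illustration (apply_map behaves like replace_by_char above), and the map covers lowercase a-z only.

```python
import codecs
import string

def make_shift_map(shift):
    """Map each lowercase letter to the letter `shift` places later, wrapping at z."""
    letters = string.ascii_lowercase
    return {ch: letters[(i + shift) % 26] for i, ch in enumerate(letters)}

def apply_map(s, ch_map):
    # Same behavior as replace_by_char: characters not in the map pass through.
    return "".join(ch_map.get(ch, ch) for ch in s)

shifted_by_one = apply_map("allison parrish", make_shift_map(1))
shifted_by_13 = apply_map("allison parrish", make_shift_map(13))
print(shifted_by_one)   # same result as nextch_map
print(shifted_by_13)    # same result as the rot13 codec
print(shifted_by_13 == codecs.encode("allison parrish", "rot13"))
```

A shift of 1 reproduces nextch_map, and a shift of 13 agrees with Python's built-in rot13 codec for lowercase input.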
"According to legend, some devil-pacts were written in retrograde to invoke diabolical powers. [...] Artists depicted retrograde writing as demonic. In a 15th c. block book, a demon is shown holding up a tablet on which the sins of the dying man's life are recorded in mirror writing..." (Skemer 121)
# from https://github.com/combatwombat/Lunicode.js/blob/master/lunicode.js
mirror_replacements = {
'a': 'ɒ', 'b': 'd', 'c': 'ɔ', 'd': 'b', 'e': 'ɘ',
'f': 'Ꮈ', 'g': 'ǫ', 'h': 'ʜ', 'i': 'i', 'j': 'ꞁ',
'k': 'ʞ', 'l': 'l', 'm': 'm', 'n': 'ᴎ', 'o': 'o',
'p': 'q', 'q': 'p', 'r': 'ɿ', 's': 'ꙅ', 't': 'ƚ',
'u': 'u', 'v': 'v', 'w': 'w', 'x': 'x', 'y': 'ʏ', 'z': 'ƹ',
'A': 'A', 'B': 'ᙠ', 'C': 'Ɔ', 'D': 'ᗡ', 'E': 'Ǝ',
'F': 'ꟻ', 'G': 'Ꭾ', 'H': 'H', 'I': 'I', 'J': 'Ⴑ',
'K': '⋊', 'L': '⅃', 'M': 'M', 'N': 'Ͷ', 'O': 'O',
'P': 'ꟼ', 'Q': 'Ọ', 'R': 'Я', 'S': 'Ꙅ', 'T': 'T',
'U': 'U', 'V': 'V', 'W': 'W', 'X': 'X', 'Y': 'Y', 'Z': 'Ƹ'}
print(text + " " + replace_by_char(text, mirror_replacements))
in the beginning was the notebook iᴎ ƚʜɘ dɘǫiᴎᴎiᴎǫ wɒꙅ ƚʜɘ ᴎoƚɘdooʞ
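The replacement table flips individual letters but leaves them in their original left-to-right order. Retrograde writing also runs backwards, which in Python is a one-line slice. This is a small sketch; the function name retrograde is mine.

```python
def retrograde(s):
    """Return the string with its characters in reverse order."""
    return s[::-1]

spell = "in the beginning was the notebook"
print(retrograde(spell))
# koobeton eht saw gninnigeb eht ni
```

Passing the reversed string through replace_by_char with mirror_replacements should give something closer to true mirror writing, where both the letter shapes and the reading direction are flipped.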
Magic words gain power from being copied over and over; mistakes creep in that make the words strange. Lecouteux (p. xxi) suggests that the following accidental replacements were common in medieval manuscripts written in Roman scripts:
# suggested in Lecouteux, p. xxi
replacements = {
'u': ['o', 'n'],
'st': ['h'],
'p': ['f'],
'ni': ['m'],
'rn': ['m'],
'in': ['m'],
'iu': ['m', 'in'],
'r': ['t', 'z', 'c'],
'l': ['t'],
'c': ['t'],
'd': ['ol']
}
import re
import random
These replacements have to be implemented a bit differently from the character substitution ciphers, because the patterns on the left have varying numbers of characters, so we can't just step through the source string character by character. The following code goes through the dictionary one pattern at a time. For every occurrence of a pattern (a dictionary key) in the text, it flips a coin; if the flip succeeds, it swaps in one of that pattern's suggested replacements (from the dictionary value), chosen at random.
out = text
for patt, repl in replacements.items():
    out = re.sub(patt,
                 lambda m: random.choice(repl) if random.random() < 0.5 else m.group(),
                 out)
print(text)
print(out)
in the beginning was the notebook
in the begmmng was the notebook
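Since the idea is that errors accumulate as a text is copied over and over, here is a sketch that wraps the loop above in a function and recopies its own output several times. The names miscopy and copy_chain, the seed parameter, and the three-entry replacement table are my own choices for illustration, not part of any library.

```python
import random
import re

def miscopy(s, replacements, prob=0.5, rng=random):
    """Replace each occurrence of each pattern with probability `prob`."""
    out = s
    for patt, repl in replacements.items():
        out = re.sub(
            re.escape(patt),  # treat the key as literal text, not a regex
            lambda m: rng.choice(repl) if rng.random() < prob else m.group(),
            out,
        )
    return out

def copy_chain(s, replacements, generations=5, prob=0.2, seed=None):
    """Recopy a text several times, keeping every generation."""
    rng = random.Random(seed)
    copies = [s]
    for _ in range(generations):
        copies.append(miscopy(copies[-1], replacements, prob, rng))
    return copies

# A small subset of the replacement table above.
sample_replacements = {'u': ['o', 'n'], 'in': ['m'], 'r': ['t', 'z', 'c']}
for line in copy_chain("in the beginning was the notebook",
                       sample_replacements, seed=1):
    print(line)
```

Passing a seed makes a run repeatable, which is handy if you find a chain of corruptions you like and want to get it back.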
[In magic spells] "we find sequences of letters that can be the initials of words. [...] A passage from the Gesta Imperatorum suggests this; in fact we read there the sequence "P P P, S S S, R R R, F F F," meaning, "Pater patriae perditur, sapientia secum sustollitur, ruunt regna Rome ferro, flamma, fame." The series of letters would therefore be a mnemonic means used to retain whole phrases, but in charms it also serves as a way to keep things secret..." (Lecouteux xx)
The following function takes a string and returns a list with the first few characters of each word in the string. The take parameter sets how many characters to keep (one by default).
def abbrev(s, take=1):
    words = s.split()
    return [w[:take] for w in words]
abbrev("hello there how are you?")
['h', 't', 'h', 'a', 'y']
abbrev(text, 2)
['in', 'th', 'be', 'wa', 'th', 'no']
print(''.join(abbrev(text, 2)))
inthbewathno
init_cap = [item.capitalize() for item in abbrev(text, 2)]
print('. '.join(init_cap))
In. Th. Be. Wa. Th. No
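To get closer to the "P P P, S S S" style of the Gesta Imperatorum passage, the initials can be upper-cased and punctuation stripped first. This is a sketch; the function name initials is hypothetical, and it only strips a handful of common punctuation marks.

```python
def initials(phrase):
    """Upper-case first letter of each word, skipping punctuation-only tokens."""
    words = [w.strip(",.;:!?") for w in phrase.split()]
    return [w[0].upper() for w in words if w]

gesta = "Pater patriae perditur, sapientia secum sustollitur"
print(" ".join(initials(gesta)))
# P P P S S S
```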
According to Skemer, magic words and formulas such as abracadabra and abraxas were "often written as diminishing and augmenting series of letters"—shaped in "inverted triangles" or "mandorlas" (116).
The following function implements a word triangle, in which the word is spelled out letter-by-letter, with each spelling on its own line (returned as a list). It's demonstrated here with a second call that reverses the order, creating a mandorla.
def triangle(s):
    out = []
    for i in range(len(s)):
        snippet = s[:i+1]
        out.append(snippet)
    return out
print("\n".join(triangle("abracadabra")))
print("\n".join(reversed(triangle("abracadabra"))))
a
ab
abr
abra
abrac
abraca
abracad
abracada
abracadab
abracadabr
abracadabra
abracadabra
abracadabr
abracadab
abracada
abracad
abraca
abrac
abra
abr
ab
a
The mandorla function performs both steps:
def mandorla(s):
    return triangle(s)[:-1] + list(reversed(triangle(s)))
print("\n".join(mandorla("abracadabra")))
a
ab
abr
abra
abrac
abraca
abracad
abracada
abracadab
abracadabr
abracadabra
abracadabr
abracadab
abracada
abracad
abraca
abrac
abra
abr
ab
a
Jupyter Notebook displays text in a fixed-width font by default, so centering doesn't work very well. Instead, we'll write the lines out as HTML and display with Jupyter Notebook's HTML widget:
from IPython.display import display, HTML
html_src = "<div style='text-align: center'>"
html_src += "<br>".join(mandorla("abracadabra"))
html_src += "</div>"
display(HTML(html_src))
Here's a mandorla of the word abracadabra followed by its mirror replacement:
html_src = "<div style='text-align: center'>"
html_src += "<br>".join(mandorla("abracadabra" + replace_by_char("abracadabra", mirror_replacements)))
html_src += "</div>"
display(HTML(html_src))
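If you're working somewhere that can't render HTML (a terminal, say), Python's built-in str.center method gives a rough plain-text approximation. In a fixed-width font, lines whose length differs in parity from the widest line will sit half a character off-center. The sketch below redefines triangle and mandorla compactly so that it runs on its own; the centered function is my own addition.

```python
def triangle(s):
    return [s[:i + 1] for i in range(len(s))]

def mandorla(s):
    return triangle(s)[:-1] + list(reversed(triangle(s)))

def centered(lines):
    """Pad every line to the width of the longest one, centered."""
    width = max(len(line) for line in lines)
    return [line.center(width) for line in lines]

print("\n".join(centered(mandorla("abracadabra"))))
```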
The Sator Square:
S A T O R
A R E P O
T E N E T
O P E R A
R O T A S
"Arepo the sower guides the wheels by his work" (Skemer's translation, pp. 116–117), an example of an apotropaic formula that "clearly worked best in writing" (134).
We can generate random word squares in Python.
The function below creates a string of random characters of the given length, using the specified alphabet.
def gen_str(n, alphabet):
    return ''.join([random.choice(alphabet) for i in range(n)])
gen_str(5, alphabet="abcdefghijklmnopqrstuvwxyz")
'yhmok'
You can pick the letters in the alphabet, and the word will be constrained to contain only those letters.
gen_str(5, alphabet="abracadabra")
'aarad'
The gen_square function generates random word squares of size n using the given alphabet. ("Letter square" might be a better term here, since the function is not guaranteed to produce valid "words.")
def gen_square(n, alphabet='abcdefghijklmnopqrstuvwxyz', start=None):
    if start is None:
        rows = [gen_str(n, alphabet)]
    else:
        assert len(start) == n
        rows = [start]
    for i in range(int(n/2)):
        beg = ""
        end = ""
        for j in range(i+1):
            beg += rows[j][i+1]
            end += rows[j][-i-2]
        row = beg + gen_str(n - ((i+1)*2), alphabet) + ''.join(reversed(end))
        rows.append(row)
    return rows + list(reversed([''.join(reversed(s)) for s in rows[:int(n/2)]]))
Five on the side with random letters:
print("\n".join(gen_square(5)))
xoqgr
olfug
qfifq
guflo
rgqox
Five on a side, using only the letters in the Sator square and starting with the word sator:
print("\n".join(gen_square(5, alphabet="satorarepotenet", start="sator")))
sator
aarto
trtrt
otraa
rotas
Seven on a side, starting with the word allison:
print("\n".join(gen_square(7, start="allison")))
allison
lwediho
lehjpis
idjdjdi
sipjhel
ohidewl
nosilla
Five on a side, with an alphabet of emoji:
print()
print("\n".join(gen_square(5, alphabet="😀😄😁😆😅😂🤣😊😙😗😘🥰😍😌😉🙃🙂😇😚😋😛😝😜🤨🧐🤓😎")))
🙂😘😁😁😂
😘🙃😍🙂😁
😁😍😊😍😁
😁🙂😍🙃😘
😂😁😁😘🙂
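What makes the Sator Square (and the squares generated above) special is that they read the same across, down, and backwards from the bottom-right corner. Here's a sketch of a checker for that property. The function names are mine, and it expects the square as a list of equal-length strings, like the output of gen_square.

```python
def columns(rows):
    """Read a square top to bottom, one string per column."""
    return ["".join(col) for col in zip(*rows)]

def reads_four_ways(rows):
    """True if the columns spell the same words as the rows, and the square
    is unchanged when rotated 180 degrees (rows reversed, each row reversed)."""
    rotated = [row[::-1] for row in reversed(rows)]
    return columns(rows) == rows and rotated == rows

sator = ["sator", "arepo", "tenet", "opera", "rotas"]
print(reads_four_ways(sator))                                         # True
print(reads_four_ways(["xoqgr", "olfug", "qfifq", "guflo", "rgqox"]))  # True
print(reads_four_ways(["ab", "cd"]))                                  # False
```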
"Gematria was based on the fact that, in Hebrew, numbers are indicated by letters; this means that each Hebrew word can be given a numerical value, calculated by summing numbers represented by its letters. This allows mystic relations to be established between words having different meanings though identical numerical values..." (Eco 28)
Let's write some code that groups words by the sum of their numbers.
# only works for the English alphabet (a-z)
def letter_value(ch):
    if not(ch.isalpha()):
        return 0
    return ord(ch.lower()) - 96
letter_value('a')
1
This function adds up the values of the letters in a word:
def gematriesque(s):
    return sum([letter_value(ch) for ch in s])
gematriesque('allison')
82
The code below finds the sum of every noun in our noun list, then makes a lookup dictionary that shows us all of the nouns with a given sum.
from collections import defaultdict
by_sum = defaultdict(list)
word_to_sum = {}
for item in nouns:
    letter_sum = gematriesque(item)
    word_to_sum[item] = letter_sum
    by_sum[letter_sum].append(item)
by_sum[72]
['ambulance', 'carrier', 'dawning', 'discord', 'homeland', 'lifeline', 'mayor', 'sending', 'tendon', 'tracing']
gematriesque('allison')
82
Display words with the same sum as the given word:
print("\n".join(by_sum[gematriesque('allison')]))
frenchman
apartheid
artisan
bowling
colors
conflict
glucose
gusto
hallway
indecency
innocence
juror
kangaroo
melodrama
panther
volcano
voltage
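To see where a sum like 82 comes from, it can help to print the arithmetic letter by letter. This sketch uses the same a=1 through z=26 scheme as letter_value above; the helper name letter_values is mine, and unlike letter_value it simply skips anything outside a-z.

```python
def letter_values(word):
    """Pair each a-z letter with its position in the alphabet (a=1 ... z=26)."""
    return [(ch, ord(ch) - 96) for ch in word.lower() if 'a' <= ch <= 'z']

pairs = letter_values("allison")
total = sum(n for _, n in pairs)
print(" + ".join(f"{ch}({n})" for ch, n in pairs), "=", total)
# a(1) + l(12) + l(12) + i(9) + s(19) + o(15) + n(14) = 82
```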
"[T]hose who are skilled in the use of incantations, relate that the utterance of the same incantation in its proper language can accomplish what the spell professes to do; but when translated into any other tongue, it is observed to become inefficacious and feeble. And thus it is not the things signified, but the qualities and peculiarities of words, which possess a certain power for this or that purpose..."—Origen (in Richardson and Pick 406–407)
"The rhyme, repetition and alliteration of charms produced a sonorous effect that appealed to users and had psychological effects. [...] Words in a sacralized and euphonious language like Latin could be soothing to the ear and thus might seem to have an immediate magical effect. [...] Vocalized reading [was] better able to deter evil spirits" (Skemer 153)
"I don't think I can breathe / With the way you let me down [...] / I don't need the words / I want the sound, sound, sound..." (Jepsen)
A recurring concept with magic words is that what they sound like matters. So it would be nice if we had some way to compose magic words based solely on their phonetics. The problem (in English, at least) is creating the written form of a word from its sound—i.e., spelling.
Pincelate is a Python library that provides a simple interface for a machine learning model that can sound out English words and spell English words based on how they sound. "Sounding out" here means converting letters ("orthography") to sounds ("phonemes"), and "spelling" means converting sounds to letters (phonemes to orthography). The model is trained on the CMU Pronouncing Dictionary, which means it generally sounds words out as though speaking "standard" North American English, and spells words according to "standard" North American English rules (at least as far as the model itself is accurate).
You need to install the tensorflow and pincelate modules. Open up a terminal window and type the following lines:
pip install tensorflow
pip install pincelate
If you're not using Anaconda, you might also need to install a few other libraries:
pip install numpy scipy
Now import the libraries:
import numpy as np
import pronouncing as pr
Now import Pincelate and instantiate a Pincelate object. (This will load the pre-trained model provided with the package.)
from pincelate import Pincelate
Using TensorFlow backend.
pin = Pincelate()
Later in the notebook, I'm going to use some of Jupyter Notebook's interactive features, so I'll import the libraries here:
import ipywidgets as widgets
from IPython.display import display
from ipywidgets import interact, interactive_output, Layout, HBox, VBox
Pincelate is a machine learning model trained on the CMU Pronouncing Dictionary, a database of tens of thousands of English words along with their pronunciations. To get the pronunciation of a word:
pin.soundout("mimsy")
['M', 'IH1', 'M', 'S', 'IY0']
... and to produce a plausible spelling for a word whose sounds you just made up, use the .spell() method, passing it a list of Arpabet phonemes:
pin.spell(['B', 'L', 'AH1', 'R', 'F'])
'blurf'
It's important to note that Pincelate's .soundout() method will only work with letters that appear in the CMU Pronouncing Dictionary's vocabulary. (You need to use lowercase letters only.) So the following will throw an error:
pin.spell("étui")
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-60-59be7f6871e5> in <module>()
----> 1 pin.spell("étui")

/Users/allison/anaconda/lib/python3.6/site-packages/pincelate/__init__.py in spell(self, phones, temperature)
    130                 this_feat = phone_feature_map[item[:-1]]
    131             else:
--> 132                 this_feat = phone_feature_map[item]
    133             feats.append(this_feat)
    134         state_val = self.phon2orth.infer(

KeyError: 'é'
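One way to avoid surprises like this is to check a word before handing it to the model. The sketch below is plain Python and doesn't touch Pincelate at all. It assumes the supported characters are lowercase a-z plus the apostrophe, hyphen and period; that's my guess, so check it against the installed model's vocabulary before relying on it. The name unsupported_chars is hypothetical.

```python
import string

# Assumption: the supported characters are a-z plus these three
# punctuation marks. Verify against the model you have installed.
ASSUMED_VOCAB = set(string.ascii_lowercase) | set("'-.")

def unsupported_chars(word, allowed=ASSUMED_VOCAB):
    """Return the characters in `word` not found in `allowed`, in order."""
    return [ch for ch in word if ch not in allowed]

for word in ["mimsy", "étui", "Mimsy"]:
    bad = unsupported_chars(word)
    print(word, "ok" if not bad else f"unsupported: {bad}")
```

Lowercasing the input takes care of the third case; accented letters would need to be replaced or removed.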
Let's invent somewhat plausible neologisms by drawing phonemes at random from the list of Arpabet phonemes. ("Neologism" is a fancy word for "made-up word.") Here's a list of all of the phonemes in the CMU Pronouncing Dictionary, plus examples in use:
Phoneme Example Translation
------- ------- -----------
AA odd AA D
AE at AE T
AH hut HH AH T
AO ought AO T
AW cow K AW
AY hide HH AY D
B be B IY
CH cheese CH IY Z
D dee D IY
DH thee DH IY
EH Ed EH D
ER hurt HH ER T
EY ate EY T
F fee F IY
G green G R IY N
HH he HH IY
IH it IH T
IY eat IY T
JH gee JH IY
K key K IY
L lee L IY
M me M IY
N knee N IY
NG ping P IH NG
OW oat OW T
OY toy T OY
P pee P IY
R read R IY D
S sea S IY
SH she SH IY
T tea T IY
TH theta TH EY T AH
UH hood HH UH D
UW two T UW
V vee V IY
W we W IY
Y yield Y IY L D
Z zee Z IY
ZH seizure S IY ZH ER
The cell below has a Python list containing all of these phonemes:
all_phonemes = ['AH', 'N', 'S', 'IH', 'L', 'T', 'R', 'K', 'IY', 'D', 'M',
'ER', 'Z', 'EH', 'AA', 'AE', 'B', 'P', 'OW', 'F', 'EY',
'G', 'AO', 'AY', 'V', 'NG', 'UW', 'HH', 'W', 'SH', 'JH',
'Y', 'CH', 'AW', 'TH', 'UH', 'OY', 'DH', 'ZH']
And then this function will return a random neologism, created from phonemes drawn at random from that list:
def neologism_phonemes():
    return [random.choice(all_phonemes) for item in range(random.randrange(3,10))]
Here's a handful, just to get a taste:
for i in range(5):
    print(neologism_phonemes())
['ER', 'AH', 'AO', 'AH', 'DH', 'B', 'K', 'S']
['JH', 'Y', 'OW', 'AA', 'R', 'Z', 'G', 'AY']
['R', 'HH', 'OY', 'N', 'R', 'OY', 'EY', 'EH']
['F', 'JH', 'NG']
['CH', 'SH', 'DH', 'UH', 'AA', 'S', 'IH']
That's all well and good! Try sounding out some of these on your own (consult the Arpabet table to find the English sound corresponding to each symbol).
But how do you spell these neologisms? Why, with Pincelate's .spell() method of course:
pin.spell(neologism_phonemes())
'appengts'
Here's a for loop that generates neologisms and prints them along with their spellings:
for i in range(12):
    phonemes = neologism_phonemes()
    print(pin.spell(phonemes), phonemes)
anguinitaiz ['EY', 'NG', 'W', 'AA', 'IY', 'T', 'AY', 'JH', 'EY']
mang ['M', 'AE', 'N', 'G']
bninghich ['B', 'N', 'T', 'NG', 'AY', 'CH']
gauth ['G', 'AA', 'TH']
glojchivew ['G', 'F', 'JH', 'OY', 'CH', 'V', 'S', 'UW']
odchden ['AA', 'CH', 'D', 'N']
bsguing ['B', 'S', 'G']
gjuth ['JH', 'Y', 'TH', 'W']
enlplawen ['IY', 'N', 'L', 'W', 'P', 'L', 'AW', 'N']
pubge's ['P', 'UW', 'B', 'JH', 'Z']
jroshwok ['R', 'JH', 'SH', 'W', 'OW', 'HH', 'K']
adiet ['AH', 'D', 'IY', 'T']
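Drawing phonemes completely at random produces a lot of hard-to-pronounce consonant pile-ups. One possible variation, sketched below, is to split the phoneme list into vowels and consonants and alternate between them. The split follows the Arpabet table above; the function name alternating_phonemes and the strict consonant-vowel pattern are my own choices. The resulting lists can be passed to pin.spell() just like the ones from neologism_phonemes().

```python
import random

vowels = ['AH', 'IH', 'IY', 'ER', 'EH', 'AA', 'AE', 'OW', 'EY',
          'AO', 'AY', 'UW', 'AW', 'UH', 'OY']
consonants = ['N', 'S', 'L', 'T', 'R', 'K', 'D', 'M', 'Z', 'B', 'P', 'F',
              'G', 'V', 'NG', 'HH', 'W', 'SH', 'JH', 'Y', 'CH', 'TH',
              'DH', 'ZH']

def alternating_phonemes(syllables=3, rng=random):
    """Build a phoneme list of consonant-vowel pairs, one pair per syllable."""
    out = []
    for _ in range(syllables):
        out.append(rng.choice(consonants))
        out.append(rng.choice(vowels))
    return out

for i in range(5):
    print(alternating_phonemes(random.randrange(1, 5)))
```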
The examples above use the phoneme as the basic unit of English phonetics. But each phoneme itself has characteristics, and many phonemes have characteristics in common. For example, the phoneme /B/ has the following characteristics:
bilabial
stop
voiced
The phoneme /P/ shares two out of three of these characteristics (it's bilabial and a stop, but is not voiced). The phoneme /AE/, on the other hand, shares none of these characteristics. Instead, it has these characteristics:
low
front
unrounded
vowel
These characteristics of phonemes are traditionally called "features." You can look up the features for particular phonemes using the phone_feature_map variable in Pincelate's featurephone module:
from pincelate.featurephone import phone_feature_map
For example, to get the features for the vowel /UW/ (the vowel sound in "toot"):
phone_feature_map['UW']
('hgh', 'bck', 'rnd', 'vwl')
The features are referred to here with short three-letter abbreviations. Here's a full list:
alv: alveolar
apr: approximant
bck: back
blb: bilabial
cnt: central
dnt: dental
fnt: front
frc: fricative
glt: glottal
hgh: high
lat: lateral
lbd: labiodental
lbv: labiovelar
lmd: low-mid
low: low
mid: mid
nas: nasal
pal: palatal
pla: palato-alveolar
rnd: rounded
rzd: rhoticized
smh: semi-high
stp: stop
umd: upper-mid
unr: unrounded
vcd: voiced
vel: velar
vls: voiceless
vwl: vowel
Additionally, there are two special phoneme features:
beg: beginning of word
end: end of word
... which are found at the beginnings and endings of words.
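Thinking of phonemes as sets of features makes it easy to ask how much two phonemes have in common. The sketch below uses a tiny hand-written feature table for just three phonemes, based on the descriptions of /B/, /P/ and /AE/ in this section. It's illustrative only; Pincelate's phone_feature_map is the authority on what features the model actually uses.

```python
# Hand-written feature sets for three phonemes (illustrative only; the
# real values live in pincelate.featurephone.phone_feature_map).
example_features = {
    'B': {'blb', 'stp', 'vcd'},
    'P': {'blb', 'stp', 'vls'},
    'AE': {'fnt', 'low', 'unr', 'vwl'},
}

def shared(a, b, feature_map=example_features):
    """Return the set of features that phonemes a and b have in common."""
    return feature_map[a] & feature_map[b]

print(sorted(shared('B', 'P')))    # ['blb', 'stp']
print(sorted(shared('B', 'AE')))   # []
```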
Internally, Pincelate's model operates on these phoneme features, instead of directly on whole phonemes. This allows the model to capture and predict underlying similarities between phonemes.
Pincelate's .phonemefeatures() method works a lot like .soundout(), except instead of returning a list of phonemes, it returns a numpy array of phoneme feature probabilities. This array has one row for each predicted phoneme and one column for each phoneme feature. Each value is the probability (between 0 and 1) that the feature is a component of that phoneme. To illustrate, here I get the feature array for the word cat:
cat_feats = pin.phonemefeatures("cat")
This array has the following shape:
cat_feats.shape
(5, 32)
... which tells us that there are five predicted phonemes. (The 32 is the total number of possible features.) The word cat, of course, has only three phonemes (/K AE T/). The extra two are the special "beginning of the word" and "end of the word" phonemes at the beginning and end, respectively.
Let's look at the feature probabilities for the first phoneme (after the special "beginning of the word" token at index 0):
cat_feats[1]
array([6.42707571e-04, 2.13692928e-07, 6.62605757e-08, 5.43442347e-10, 5.47038814e-09, 7.04440527e-06, 1.58982238e-09, 1.66211791e-08, 3.81101599e-05, 8.24350354e-05, 1.62252746e-07, 5.46323768e-08, 1.41502560e-10, 5.33169420e-09, 7.31331828e-10, 2.70081146e-05, 1.83614669e-04, 1.62359720e-05, 2.74244065e-11, 1.44446346e-07, 3.33543511e-07, 1.91042790e-08, 3.52445828e-09, 4.54965146e-07, 9.99929667e-01, 7.26780854e-05, 8.35576885e-10, 2.66875286e-04, 1.75827936e-05, 9.99930263e-01, 9.99974251e-01, 1.87013138e-04])
You can look up the index in this array associated with a particular phoneme feature using Pincelate's .featureidx() method:
cat_feats[1][pin.featureidx('vel')]
0.9999302625656128
This tells us that the vel (velar) feature for this phoneme is predicted with almost 100% probability, which makes sense, since the phoneme we'd anticipate, /K/, is a voiceless velar stop.
The following bit of code steps through each row in this array and prints out the phoneme features with the highest probability in that row, using numpy's argsort function:
def idxfeature(pin, idx):
    return pin.orth2phon.target_vocab[idx]
for i, phon in enumerate(cat_feats):
    print("phoneme", i)
    for idx in np.argsort(phon)[::-1][:5]:
        print(idxfeature(pin, idx), phon[idx])
    print()
phoneme 0
beg 1.0
vwl 0.0
vls 0.0
apr 0.0
bck 0.0

phoneme 1
vls 0.999974250793457
vel 0.9999302625656128
stp 0.999929666519165
alv 0.000642707571387291
unr 0.00026687528588809073

phoneme 2
unr 0.9997866749763489
vwl 0.9990422129631042
str 0.9986899495124817
fnt 0.9959463477134705
low 0.9807271957397461

phoneme 3
vls 0.9993033409118652
alv 0.9990631937980652
stp 0.9904974102973938
frc 0.0036416002549231052
end 0.0013078120537102222

phoneme 4
end 0.9997904896736145
fnt 0.0006787743768654764
vwl 0.000589678471442312
unr 0.0005248847301118076
str 0.0003406509349588305
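The argsort(...)[::-1][:5] idiom can look a little cryptic, so here it is on a tiny array that doesn't need Pincelate. The feature names and probabilities below are made up for the example.

```python
import numpy as np

feature_names = ['alv', 'stp', 'vel', 'vls', 'vwl']          # made-up subset
row = np.array([0.0006, 0.9999, 0.9993, 0.99997, 0.0001])    # made-up values

order = np.argsort(row)    # indices of the values, lowest to highest
top3 = order[::-1][:3]     # reverse the order, then keep the first three
for idx in top3:
    print(feature_names[idx], row[idx])
```

argsort returns positions rather than values, which is what lets us look up the matching feature name for each probability.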
We'll come back to a more complete example that shows how to manipulate these values below.
Once you have the phonetic feature probability arrays, you can treat them the same way you'd treat any other numpy array. One thing I like to do is use scipy's image manipulation functions to resample the phonetic feature arrays. This lets us use the same phonetic information to spell a shorter or longer word. In particular, scipy.ndimage has a handy zoom function that resamples an array and interpolates it. Normally you'd use this to resize an image, but nothing's stopping us from using it to resize our phonetic feature array.
First, import the function. (Older tutorials import it from scipy.ndimage.interpolation, but that namespace is deprecated in recent versions of scipy; importing from scipy.ndimage works in both old and new versions.)
from scipy.ndimage import zoom
Then get some phoneme feature probabilities:
feats = pin.phonemefeatures("alphabet")
Then resize with zoom(). The second parameter to zoom() is a tuple with the factor by which to scale the dimensions of the incoming array. We only want to scale along the first axis (i.e., the phonemes), keeping the second axis (i.e., the features) constant.
A shorter version of the word:
shorter = zoom(feats, (0.67, 1))
pin.spellfeatures(shorter)
'albe'
A longer version:
longer = zoom(feats, (2.0, 1))
pin.spellfeatures(longer)
'all-phafebet'
def stretch_words(s, factor=1.0):
    out = []
    for word in s.split():
        word = word.lower()
        vec = pin.phonemefeatures(word)
        if factor < 1.0:
            order = 3
        else:
            order = 0
        zoomed = zoom(vec, (factor, 1), order=order)
        out.append(pin.spellfeatures(zoomed))
    return " ".join(out)
stretch_words("this is a test", factor=1.5)
/Users/allison/anaconda/lib/python3.6/site-packages/scipy/ndimage/interpolation.py:605: UserWarning: From scipy 0.13.0, the output shape of zoom() is calculated with round() instead of int() - for these inputs the size of the returned array has changed. "the returned array has changed.", UserWarning)
"theothis' ayes' ah ttestsed"
If you've downloaded this notebook and you're following along running the code, the following cell will create an interactive widget that lets you "stretch" and "shrink" the words that you type into the text box by dragging the slider.
import warnings
warnings.filterwarnings('ignore')
@interact(words="in the beginning was the notebook", factor=(0.1, 4.0, 0.1))
def stretchy(words, factor=1.0):
    print(stretch_words(words, factor))
interactive(children=(Text(value='in the beginning was the notebook', description='words'), FloatSlider(value=…
Pincelate actually consists of two models: one that knows how to sound out words based on how they're spelled, and another that knows how to spell words from sounds. Pincelate's .manipulate() method does a "round trip" re-spelling of a word, passing it through both models to arrive back at the original word. Try it out:
pin.manipulate("poetic")
'poetic'
On the surface, this isn't very interesting! You don't need Pincelate to tell you how to spell a word that you already know how to spell. But .manipulate() has a handful of parameters that allow you to mess around with the model's internal workings in fun and interesting ways. The first is the temperature parameter, which artificially increases or decreases the amount of randomness in the model's output probabilities.
When the temperature is close to zero, the model will always pick the most likely spelling of the word at each step.
pin.manipulate("poetic", temperature=0.01)
'poetic'
As you increase the temperature to 1.0, the model starts picking values at random according to the underlying probabilities.
pin.manipulate("poetic", temperature=1.0)
'poetick'
At temperatures above 1.0, the model has a higher chance of picking from letters with lower probabilities, producing a more unlikely spelling:
pin.manipulate("poetic", temperature=1.5)
'poetike'
At a high enough temperature, the model's spelling feels essentially random:
pin.manipulate("poetic", temperature=3.0)
'ppeetinh'
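To get a feel for what temperature does to a set of probabilities, here's a sketch of one common formulation: raise each probability to the power 1/temperature and renormalize. I'm not claiming this is exactly how Pincelate implements it internally; the function name apply_temperature and the three probabilities are made up for the example, and the probabilities must all be greater than zero.

```python
import math

def apply_temperature(probs, temperature):
    """Rescale a probability distribution: p ** (1/T), then renormalize."""
    scaled = [math.exp(math.log(p) / temperature) for p in probs]
    total = sum(scaled)
    return [x / total for x in scaled]

letter_probs = [0.7, 0.2, 0.1]   # made-up probabilities for three letters
for t in (0.1, 1.0, 3.0):
    print(t, [round(p, 3) for p in apply_temperature(letter_probs, t)])
```

At a low temperature nearly all of the probability piles onto the most likely option; at 1.0 the distribution is unchanged; at higher temperatures it flattens out, so unlikely letters get picked more often.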
The following interactive widget lets you play with the temperature parameter:
@interact(s="your text here", temp=(0.05, 2.5, 0.05))
def tempadjust(s, temp):
    return ' '.join([pin.manipulate(w.lower(), temperature=temp) for w in s.split()])
interactive(children=(Text(value='your text here', description='s'), FloatSlider(value=1.2500000000000002, des…
The manipulate method can take two other parameters: letters and features. These are dictionaries that map letters or phonetic features to exponential multipliers. When Pincelate is spelling the word, it uses these multipliers to adjust the probability of the corresponding letters in the output. Somewhat unintuitively, positive values reduce the corresponding probability, while negative values increase the probability.
Here's an example to make it more clear. First: respelling a word without the letter e:
pin.manipulate("spelling", letters={'e': 10})
'spilling'
Let's do this for a set of randomly selected words from the noun list:
for noun in random.sample(nouns, 10):
    print(noun, pin.manipulate(noun, letters={'e': 20}))
carcass carcas
nuisance nusancl
stylus stylus
humility humility
firing firing
councilman councilman
sediment shadimant
constable constably
contentment contintmint
inaction inaction
The features parameter does the same thing, except it adjusts the probability of particular phoneme features at each step. For example, this makes words more nasal:
pin.manipulate("spelling", features={'nas': -10})
'smnenging'
The following code makes all of the vowels more rounded and further back in the mouth in a list of random nouns:
for noun in random.sample(nouns, 10):
    print(noun, pin.manipulate(noun, features={'bck': -2, 'rnd': -5}))
vomiting vowautung
ralph raulfough
footing foutung
rendition rundutuon
mechanics mocaunux
cabbage kaubauge
cilantro solauntuo
criminality cruminauloup
lineage lunaug
disobedience dusoubuluong
The following cells make an interactive tool you can use to play around with temperature, letter probabilities and phoneme feature probabilities.
import ipywidgets as widgets
from IPython.display import display
from ipywidgets import interact, interactive_output, Layout, HBox, VBox
def manipulate(instr="allison", temp=0.25, **kwargs):
    return ' '.join([
        pin.manipulate(
            w,
            letters={k: v*-1 for k, v in kwargs.items()
                     if k in pin.orth2phon.src_vocab_idx_map.keys()},
            features={k: v*-1 for k, v in kwargs.items()
                      if k in pin.orth2phon.target_vocab_idx_map.keys()},
            temperature=temp
        ) for w in instr.split()]
    )
orth_sliders = {}
phon_sliders = {}
for ch in pin.orth2phon.src_vocab_idx_map.keys():
    if ch in "'-.": continue
    orth_sliders[ch] = widgets.FloatSlider(description=ch,
                                           continuous_update=False,
                                           value=0,
                                           min=-20,
                                           max=20,
                                           step=0.5,
                                           layout=Layout(height="10px"))
for feat in pin.orth2phon.target_vocab_idx_map.keys():
    if feat in ("beg", "end", "cnt", "dnt"): continue
    phon_sliders[feat] = widgets.FloatSlider(description=feat,
                                             continuous_update=False,
                                             value=0,
                                             min=-20,
                                             max=20,
                                             step=0.5,
                                             layout=Layout(height="10px"))
instr = widgets.Text(description='input', value="spelling words with machine learning")
tempslider = widgets.FloatSlider(description='temp', continuous_update=False, value=0.3, min=0.01, max=5, step=0.05)
left_box = VBox(tuple(orth_sliders.values()) + (tempslider,))
right_box = VBox(tuple(phon_sliders.values()))
all_sliders = HBox([left_box, right_box])
out = interactive_output(lambda *args, **kwargs: print(manipulate(*args, **kwargs)),
dict(instr=instr, temp=tempslider, **orth_sliders, **phon_sliders))
out.layout.height = "100px"
display(VBox([all_sliders, instr]), out)
The Pincelate model also produces a "hidden state," which is a single fixed-size vector that represents the total sound of a word. (You can think of this as a point on a Cartesian plane, where words with similar sounds are clustered next to each other.) To get the hidden state of a word, call the .phonemestate() method:
pin.phonemestate('abracadabra')
array([ 7.95686364e-01, -8.32179904e-01, -1.32981718e+00, 7.25831270e-01, -2.64316416e+00, 1.57794631e+00, -1.49719226e+00, 2.60457993e+00, -3.31631720e-01, -6.20785542e-02, -1.07942343e+00, -9.35500801e-01, 1.13087571e+00, -2.40438804e-02, -3.28609198e-01, 2.97865009e+00, 5.29175103e-01, 1.03818035e+00, -1.86510909e+00, 1.05075657e+00, 1.13979602e+00, 2.85125399e+00, -6.54058456e-01, 5.91307104e-01, 4.18249458e-01, 4.07120883e-01, 2.90681601e-01, -2.21350479e+00, 6.69969380e-01, -6.35705888e-01, -1.40898752e+00, 1.23353994e+00, -4.64894950e-01, -5.61830521e-01, -2.65465081e-01, 6.93497515e+00, 2.54075122e+00, -3.86470616e-01, 7.37920403e-01, -2.52454400e-01, 1.13615263e+00, 1.07363796e+00, -3.24268669e-01, 2.30040264e+00, 1.46473849e+00, -2.06925702e+00, -1.03245997e+00, -1.25596628e-01, -1.65496230e+00, -4.91467148e-01, -5.36341250e-01, 4.08115983e-01, 1.84644151e+00, -1.96521312e-01, -9.94934380e-01, -1.75815284e-01, -1.07653344e+00, -3.44106033e-02, 2.51568604e+00, 4.28566813e-01, 4.42921072e-01, 1.39196253e+00, -1.56479609e+00, 3.04453349e+00, 2.39666963e+00, 8.14390123e-01, -2.70349789e+00, 1.15729785e+00, -7.88961649e-02, 2.75429010e-01, 6.31406188e-01, 1.58451569e+00, -1.55730531e-01, 2.57675266e+00, -1.86182892e+00, 1.68593317e-01, -1.95982814e+00, -7.32693970e-01, 7.66813755e-03, -5.66927716e-02, -4.79854643e-01, -1.47521091e+00, -3.14706206e+00, -1.85165763e-01, 1.51251328e+00, -3.31812084e-01, 3.72764409e-01, 1.87518907e+00, 7.84418583e-01, 5.91440462e-02, 2.49756783e-01, -6.65867984e-01, -2.45798969e+00, 2.43706182e-01, -1.74799120e+00, 6.31147289e+00, -2.21082544e+00, -6.17550135e-01, -1.05487323e+00, 1.32610798e+00, -1.96974850e+00, 6.00875989e-02, -7.77341351e-02, 3.41730595e-01, -3.29071307e+00, 1.91098666e+00, 2.74943769e-01, 2.36249596e-01, -7.78424263e-01, -1.48498321e+00, -1.75742328e-01, -2.70122141e-01, -7.82502234e-01, 1.02417684e+00, 1.33242464e+00, 8.82816672e-01, -9.57970083e-01, -1.86585039e-01, -8.48214865e-01, 1.15504694e+00, 
-1.22457647e+00, 2.49675870e-01, 1.96862161e+00, -3.13274860e-01, 2.70345712e+00, 1.11661434e+00, 1.75637615e+00, -3.24920726e+00, -1.31210089e+00, 7.51341939e-01, -4.61002064e+00, -1.79387522e+00, -2.13482738e-01, 1.16403735e+00, 6.09336972e-01, -1.19726789e+00, 6.51616156e-01, -1.64964771e+00, -1.07895292e-01, 1.17505085e+00, 1.00255024e+00, 2.09715486e+00, -2.84226799e+00, 3.04437727e-01, -8.29695046e-01, 1.77979434e+00, -3.90957534e-01, -1.63378143e+00, 1.43395996e+00, -4.61261392e-01, 4.31022048e-03, 2.70064235e-01, -2.65720755e-01, -1.66805908e-01, 7.00646102e-01, -3.77741992e-01, 8.39838505e-01, 1.02057767e+00, 1.36773157e+00, -5.73049784e-01, 3.41587991e-01, -8.69696915e-01, -5.50617874e-01, -7.18180537e-01, 5.41177392e-01, -7.49346852e-01, 1.33970344e+00, -1.03110039e+00, 6.94945455e-01, -1.95170224e-01, -1.03363979e+00, 2.98215580e+00, 3.45216870e-01, -2.18459392e+00, 2.91187835e+00, 9.79840875e-01, 3.20049500e+00, 7.04905629e-01, 1.92975909e-01, 7.36262500e-01, -3.34599018e-02, 1.89192712e+00, 8.96418840e-02, -1.41968474e-01, -1.46555102e+00, -1.18895268e+00, -1.25323486e+00, 5.48723757e-01, 1.16233110e+00, -3.77950400e-01, -2.00661182e+00, 3.27691698e+00, -1.96016419e+00, -2.57373786e+00, 1.35590124e+00, 3.65701348e-01, -3.07851863e+00, -1.65423945e-01, 1.09554805e-01, 4.22158629e-01, -4.81078625e-01, 1.02364518e-01, 1.48046303e+00, -1.36909890e+00, -9.12416160e-01, -2.13123873e-01, 1.57091486e+00, 1.03272748e+00, 3.81099284e-02, 3.83975387e-01, 2.15760851e+00, 6.17110789e-01, -5.82861066e-01, -1.10520041e+00, -8.93351912e-01, -4.44957986e-02, 1.46159840e+00, -1.04589856e+00, -1.55343175e+00, -1.07608688e+00, 1.22968698e+00, 8.79801631e-01, -1.39852309e+00, 4.19925094e-01, 1.06851876e+00, -1.04367542e+00, -2.36931384e-01, 2.73201913e-01, -6.20300889e-01, -1.98342371e+00, 1.82388949e+00, 1.52567357e-01, 1.38442791e+00, -1.00117397e+00, -6.62417471e-01, -3.00938010e+00, 1.61543345e+00, -1.76816809e+00, 2.49266005e+00, 4.57145870e-01, 7.06938148e-01, 
-1.41129887e+00, -7.02914178e-01, -5.97419918e-01, -1.53821719e+00, 2.86762547e+00, 1.05890915e-01, 6.03433847e-01, 9.90708888e-01, 2.16755056e+00, 8.05558801e-01, -3.54735470e+00, -2.80189663e-01, 3.64496589e+00, 2.59146929e-01, -1.99815035e+00], dtype=float32)
This is a big weird number (a 256-dimensional vector, to be specific) that doesn't seem meaningful on its own. But we can do some interesting things with it.
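One of those things is comparing two vectors to see how close they are. The sketch below uses short made-up vectors instead of real hidden states, so it runs without the model, and computes cosine similarity in plain Python. The vectors and the `cosine_similarity` helper are assumptions for illustration; with Pincelate loaded, the results of pin.phonemestate() for two words could presumably be passed in the same way.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 4-dimensional stand-ins for 256-dimensional hidden states.
word_a = [0.8, -0.8, -1.3, 0.7]
word_b = [0.9, -0.7, -1.2, 0.6]   # close to word_a
word_c = [-1.0, 0.5, 1.4, -0.2]   # points roughly the opposite way

print(cosine_similarity(word_a, word_b))  # near 1: similar
print(cosine_similarity(word_a, word_c))  # negative: dissimilar
```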
We can manipulate this underlying representation in various ways and then spell the word resulting from that manipulation with the .spellstate() method. The following code phonetically blends two words:
def blend(a, b):
    factor = 0.5
    start = pin.phonemestate(a)
    end = pin.phonemestate(b)
    return pin.spellstate((start*factor) + (end*(1-factor)))
blend('paper', 'plastic')
'paceter'
The following code picks ten random pairs of nouns and shows, for each pair, the blended word that falls between them:
for i in range(10):
    worda = random.choice(nouns)
    wordb = random.choice(nouns)
    print(worda, " → ", blend(worda, wordb), " → ", wordb)
philosopher → alloiser → allies
extinction → tichtion → tycoon
variation → neawirting → networking
glitter → gietter → headcount
corpus → torsales → translation
weariness → fraising → phrasing
cheesecake → contecater → contractor
youngster → linguitor → liquidation
standpoint → sandipite → publicity
atheism → geinthism → greens
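The blend is a weighted average of two vectors with the weight fixed at 0.5. As a sketch of how the weight could be varied to step gradually from one word toward another, the following plain-Python example interpolates between two short made-up vectors. The `lerp` helper and the toy vectors are assumptions for illustration; real hidden states would come from pin.phonemestate() and be spelled with pin.spellstate().

```python
def lerp(start, end, factor):
    """Weighted average of two vectors: factor=1.0 gives start, 0.0 gives end."""
    return [s * factor + e * (1 - factor) for s, e in zip(start, end)]

# Made-up 3-dimensional stand-ins for hidden state vectors.
start = [1.0, 0.0, -2.0]
end = [3.0, 4.0, 2.0]

# Step the weight from 1.0 (all start) down to 0.0 (all end).
steps = [lerp(start, end, 1 - i / 4) for i in range(5)]
for vec in steps:
    print(vec)
```

The middle step, with a weight of 0.5, is the same midpoint that the blend computes.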
"[M]agic spells come in a wide variety [...]. [W]hat seems to be most important is the sound, which is often based on alliterations and homophones. The use of sounds prompts a series of variations on a single word, such as, "festella, festelle, festelle festelli festello festello, festella festellum," used to banish all kinds of fistulas." (Lecouteux xix)
We can create variants of existing words by adding randomly generated noise to the phoneme state vector. For example:
state = pin.phonemestate("abracadabra")
pin.spellstate(state + np.random.randn(256))
'a-prakanbabaragh'
This function lets you control the amount of noise to add to the specified word, and spells from the resulting vector:
def noisy(word, factor=1.0):
    state = pin.phonemestate(word) + np.random.randn(256) * factor
    return pin.spellstate(state)
noisy("allison", 0.5)
'alison'
Adding progressively more noise:
for i in range(5, 25):
    factor = i * 0.1
    print("%0.02f" % factor, noisy("abracadabra", factor))
0.50 abrocadable
0.60 abrakadabara
0.70 abracadabra
0.80 arkrakadorab
0.90 abricadabana
1.00 abrocadra
1.10 mbrimidaga
1.20 ammracibadiam
1.30 ab-abaganberrieg
1.40 apcricabada
1.50 hahrhachadagrada
1.60 e
1.70 zbrccracherb
1.80 adradadadrada
1.90 abbbb
2.00 jjqteepadagedavad
2.10 hhhhh
2.20 eegekwywywbyb-byg-b
2.30 ggpradabadarpa
2.40 vqmrttctttritttt-bj
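Small factors give near-copies and large factors give unrecognizable strings, which is consistent with how far the noise moves the vector. The sketch below uses Python's standard random module and a made-up vector, not the model, to show that the distance from the original grows in proportion to the factor. The `add_noise` and `distance` helpers, the seed, and the all-zero stand-in vector are assumptions for illustration.

```python
import math
import random

def add_noise(vec, factor, rng):
    """Add Gaussian noise (mean 0, std 1) scaled by factor to each component."""
    return [x + rng.gauss(0.0, 1.0) * factor for x in vec]

def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

original = [0.0] * 256  # stand-in for a 256-dimensional hidden state

for factor in (0.5, 1.0, 2.0):
    rng = random.Random(42)  # same seed each time, so only the factor changes
    noisy_vec = add_noise(original, factor, rng)
    print(factor, round(distance(original, noisy_vec), 2))
```

Because the same random draws are reused for each factor, doubling the factor doubles the distance from the original vector.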
Crain, Patricia. The Story of A: The Alphabetization of America from the New England Primer to the Scarlet Letter. Stanford University Press, 2000.
Eco, Umberto. The Search for the Perfect Language. Blackwell, 1997.
Jepsen, Carly R. “The sound.” Dedicated. By Jepsen, Carly R., et al, 2019. Digital release.
Lecouteux, Claude. Dictionary of Ancient Magic Words and Spells: From Abraxas to Zoar. First U.S. edition, Inner Traditions, 2015.
Leggott, Michele J. Reading Zukofsky’s 80 Flowers. Johns Hopkins University Press, 1989.
Richardson, Ernest Cushing, and Bernhard Pick, editors. The Ante-Nicene Fathers: Translations of the Writings of the Fathers down to A.D. 325. C. Scribner’s Sons, 1905.
Skemer, Don C. Binding Words: Textual Amulets in the Middle Ages. Penn State Press, 2010.