Part 6: Ciphers and Key exchange in Python 3.x¶

In this notebook, we introduce cryptography -- how to communicate securely over insecure channels. We begin with a study of two basic ciphers, the Caesar cipher and its fancier variant, the Vigenère cipher. The Vigenère cipher uses a key to turn plaintext (i.e., the message) into ciphertext (the coded message), and uses the same key to turn the ciphertext back into plaintext. Therefore, two parties can communicate securely if they -- and only they -- possess the key.

If the security of communication rests on possession of a common key, then we're left with a new problem: how do the two parties agree on a common key, especially if they are far apart and communicating over an insecure channel?

A clever solution to this problem was published in 1976 by Whitfield Diffie and Martin Hellman, and so it's called Diffie-Hellman key exchange. It takes advantage of modular arithmetic: the existence of a primitive root (modulo a prime) and the difficulty of solving the discrete logarithm problem.

This part complements Chapter 6 of An Illustrated Theory of Numbers.

Ciphers¶

A cipher is a way of transforming a message, called the plaintext into a different form, the ciphertext, which conceals the meaning to all but the intended recipient(s). A cipher is a code, and can take many forms. A substitution cipher might simply change every letter to a different letter in the alphabet. This is the idea behind "Cryptoquip" puzzles. These are not too hard for people to solve, and are easy for computers to solve, using frequency analysis (understanding how often different letters or letter-combinations occur).

ASCII code and the Caesar cipher¶

Even though substitution ciphers are easy to break, they are a good starting point. To implement substitution ciphers in Python, we need to study the string type in a bit more detail. To declare a string variable, just put your string in quotes. You can use any letters, numbers, spaces, and many symbols inside a string. You can enclose your string by single quotes, like 'Hello' or double-quotes, like "Hello". This flexibility is convenient, if you want to use quotes within your string. For example, the string Prince's favorite prime is 1999 should be described in Python with double-quotes "Prince's favorite prime is 1999" so that the apostrophe doesn't confuse things.

Strings are indexed, and their letters can be retrieved as if the string were a list of letters. Python experts will note that strings are immutable while lists are mutable objects, but we aren't going to worry about that here.

In [ ]:
W = "Hello"
print(W)
for j in range(len(W)):  # len(W) is the length of the string W.
print(W[j])  # Access the jth character of the string.

Each "letter" of a string again belongs to the string type. A string of length one is called a character.

In [ ]:
print(type(W))
print(type(W))  # W is a character.

Since computers store data in binary, the designers of early computers (1960s) created a code called ASCII (American Standard Code for Information Interchange) to associate to each character a number between 0 and 127. Every number between 0 and 127 is represented in binary by 7 bits (between 0000000 and 1111111), and so each character is stored with 7 bits of memory. Later, ASCII was extended with another 128 characters, so that codes between 0 and 255 were used, requiring 8 bits. 8 bits of memory is called a byte. One byte of memory suffices to store one (extended ASCII) character.

You might notice that there are 256 ASCII codes available, but there are fewer than 256 characters available on your keyboard, even once you include symbols like # and ;. Some of these "extra" codes are for accented letters, and others are relics of old computers. For example, ASCII code 7 (0000111) stands for the "bell", and readers born in the 1970s or earlier might remember making the Apple II computer beep by pressing Control-G on the keyboard ("G" is the 7th letter). You can look up a full ASCII table if you're curious.

Nowadays, the global community of computer users requires far more than 256 "letters" -- there are many alphabets around the world! So instead of ASCII, we can access over 100 thousand unicode characters. Scroll through a unicode table to see what is possible. With emoji, unicode tables have entered unexpected territory. Python version 3.x fully supports Unicode in all strings.

But here we stay within old-fashioned ASCII codes, since they will suffice for basic English messages. Python has built-in commands chr and ord for converting from code-number (0--255) to character and back again.

In [ ]:
chr(65)
In [ ]:
ord('A')

The following code will produce a table of the ASCII characters with codes between 32 and 126. This is a good range which includes all the most common English characters and symbols on a U.S. keyboard. Note that ASCII code 32 corresponds to an empty space (an important character for long messages!)

In [ ]:
for a in range(32,127):
c = chr(a)
print("ASCII {} is {}".format(a, c))

Since we only work with the ASCII range between 32 and 126, it will be useful to "cycle" other numbers into this range. For example, we will interpret 127 as 32, 128 as 33, etc., when we convert out-of-range numbers into characters.

The following function forces a number into a given range, using the mod operator. It's a common trick, to make lists loop around cyclically.

In [ ]:
def inrange(n,range_min, range_max):
'''
The input number n can be any integer.
The output number will be between range_min and range_max (inclusive)
If the input number is already within range, it will not change.
'''
range_len = range_max - range_min + 1
a = n % range_len
if a < range_min:
a = a + range_len
return a
In [ ]:
inrange(13,1,10)
In [ ]:
inrange(17,5,50)

Now we can implement a substitution cipher by converting characters to their ASCII codes, shuffling the codes, and converting back. One of the simplest substitution ciphers is called a Caesar cipher, in which each character is shifted -- by a fixed amount -- down the list. For example, a Caesar cipher of shift 3 would send 'A' to 'D' and 'B' to 'E', etc.. Near the end of the list, characters are shifted back to the beginning -- the list is considered cyclicly, using our inrange function.

Here is an implementation of the Caesar cipher, using the ASCII range between 32 and 126. We begin with a function to shift a single character.

In [ ]:
def Caesar_shift(c, shift):
'''
Shifts the character c by shift units
within the ASCII table between 32 and 126.
The shift parameter can be any integer!
'''
ascii = ord(c)
a = ascii + shift # Now we have a number between 32+shift and 126+shift.
a = inrange(a,32,126) # Put the number back in range.
return chr(a)

Let's see the effect of the Caesar cipher on our ASCII table.

In [ ]:
for a in range(32,127):
c = chr(a)
print("ASCII {} is {}, which shifts to {}".format(a, c, Caesar_shift(c,5))) # Shift by 5.

Now we can use the Caesar cipher to encrypt strings.

In [ ]:
def Caesar_cipher(plaintext, shift):
ciphertext = ''
for c in plaintext:  # Iterate through the characters of a string.
ciphertext = ciphertext + Caesar_shift(c,shift)
return ciphertext
In [ ]:
print(Caesar_cipher('Hello!  Can you read this?', 5))  # Shift forward 5 units in ASCII.

As designed, the Caesar cipher turns plaintext into ciphertext by using a shift of the ASCII table. To decipher the ciphertext, one can just use the Caesar cipher again, with the negative shift.

In [ ]:
print(Caesar_cipher('Mjqqt&%%Hfs%~tz%wjfi%ymnxD', -5))  # Shift back 5 units in ASCII.

The Vigenère cipher¶

The Caesar cipher is pretty easy to break, by a brute force attack (shift by all possible values) or a frequency analysis (compare the frequency of characters in a message to the frequency of characters in typical English messages, to make a guess).

The Vigenère cipher is a variant of the Caesar cipher which uses an ecryption key to vary the shift-parameter throughout the encryption process. For example, to encrypt the message "This is very secret" using the key "Key", you line up the characters of the message above repeated copies of the key.

T h i s i s v e r y s e c r e t
K e y K e y K e y K e y K e y K e y K

Then, you turn everything into ASCII (or your preferred numerical system), and use the bottom row to shift the top row.

ASCII message 84 104 105 115 32 105 115 32 118 101 114 121 32 115 101 99 114 101 116
Shift 75 101 121 75 101 121 75 101 121 75 101 121 75 101 121 75 101 121 75
ASCII shifted 159 205 226 190 133 226 190 133 239 176 215 242 107 216 222 174 215 222 191
ASCII shifted in range 64 110 36 95 38 36 95 38 49 81 120 52 107 121 32 79 120 32 96

Finally, the shifted ASCII codes are converted back into characters for transmission. In this case, the codes 64,110,36,95, etc., are converted to the ciphertext "@n$_&$_&1Qx4ky Ox \"

The Vigenère cipher is much harder to crack than the Caesar cipher, if you don't have the key. Indeed, the varying shifts make frequency analysis more difficult. The Vigenère cipher is weak by today's standards (see Wikipedia for a description of 19th century attacks), but illustrates the basic actors in a symmetric key cryptosystem: the plaintext, ciphertext, and a single key. Today, symmetric key cryptosystems like AES and 3DES are used all the time for secure communication.

Below, we implement the Vigenère cipher.

In [ ]:
def Vigenere_cipher(plaintext, key):
ciphertext = '' # Start with an empty string
for j in range(len(plaintext)):
c = plaintext[j] # the jth letter of the plaintext
key_index = j % len(key) # Cycle through letters of the key.
shift = ord(key[key_index]) # How much we shift c by.
ciphertext = ciphertext + Caesar_shift(c,shift) # Add new letter to ciphertext
return ciphertext
In [ ]:
print(Vigenere_cipher('This is very secret', 'Key')) # 'Key' is probably a bad key!!

The Vigenère cipher is called a symmetric cryptosystem, because the same key that is used to encrypt the plaintext can be used to decrypt the ciphertext. All we do is subtract the shift at each stage.

In [ ]:
def Vigenere_decipher(ciphertext, key):
plaintext = '' # Start with an empty string
for j in range(len(ciphertext)):
c = ciphertext[j] # the jth letter of the ciphertext
key_index = j % len(key) # Cycle through letters of the key.
shift = - ord(key[key_index]) # Note the negative sign to decipher!
plaintext = plaintext + Caesar_shift(c,shift) # Add new letter to plaintext
return plaintext
In [ ]:
print(Vigenere_decipher('@n$_&$_&1Qx4ky Ox ', 'Key'))
In [ ]:
# Try a few cipher/deciphers yourself to get used to the Vigenere system.

The Vigenère cipher becomes an effective way for two parties to communicate securely, as long as they share a secret key. In the 19th century, this often meant that the parties would require an initial in-person meeting to agree upon a key, or a well-guarded messenger would carry the key from one party to the other.

Today, as we wish to communicate securely over long distances on a regular basis, the process of agreeing on a key is more difficult. It seems like a chicken-and-egg problem, where we need a shared secret to communicate securely, but we can't share a secret without communicating securely in the first place!

Remarkably, this secret-sharing problem can be solved with some modular arithmetic tricks. This is the subject of the next section.

In [ ]: