The Vigenère cipher

Let us write some algorithms to deal with the Vigenère cipher.

In [1]:
from itertools import cycle

def encode(s):
    return tuple(ord(c) - 65 for c in s)

def decode(t):
    return ''.join(chr(n + 65) for n in t)

def encrypt(plaintext, key):
    return decode((p + k) % 26 for p, k in zip(encode(plaintext), cycle(encode(key))))

def decrypt(ciphertext, key):
    return decode((c - k) % 26 for c, k in zip(encode(ciphertext), cycle(encode(key))))

def index_of_coincidence(s):
    d = {}
    for c in s:
        d.setdefault(c, 0)
        d[c] += 1
    l = len(s)
    return sum(n * (n-1) for n in d.values()) / (l * (l-1))

Let us encrypt a sample message.

In [2]:
p = 'LJUBLJANA'
k = 'FRI'
c = encrypt(p, k)
c
Out[2]:
'QACGCRFEI'

We now compare the indices of coincidence for the plaintext and the ciphertext.

In [3]:
index_of_coincidence(p)
Out[3]:
0.08333333333333333
In [4]:
index_of_coincidence(c)
Out[4]:
0.027777777777777776

Note that the index of coincidence does not change when a Caesar cipher is used.

In [5]:
cc = encrypt(p, 'C')
cc
Out[5]:
'NLWDNLCPC'
In [6]:
index_of_coincidence(cc)
Out[6]:
0.08333333333333333

Sample ciphertext

Let us now examine the following ciphertext. Using the Kasiski test, we have determined that the most likely key length is $3$.

In [7]:
ciphertext = "NKASFBBYIYPWZCWTBIYKPFKUFKBJIANKABYIYPWZJMJ"
In [8]:
index_of_coincidence(ciphertext)
Out[8]:
0.058693244739756366

We now break the ciphertext into what we expect to be three Caesar-encrypted sub-ciphertexts.

In [9]:
C = [ciphertext[i::3] for i in range(3)]
C
Out[9]:
['NSBYZTYFFJNBYZJ', 'KFYPCBKKKIKYPJ', 'ABIWWIPUBAAIWM']
In [10]:
[index_of_coincidence(y) for y in C]
Out[10]:
[0.0761904761904762, 0.13186813186813187, 0.10989010989010989]

Let us now compute the indices of coincidence we obtain when concatenating one of these strings with each Caesar-encryption of each of the other two. We list the offsets sorted by the index of coincidence.

In [11]:
sorted(((index_of_coincidence(C[0] + decrypt(C[1], chr(65+i))), i) for i in range(26)), reverse=True)
Out[11]:
[(0.09359605911330049, 23),
 (0.09113300492610837, 11),
 (0.08866995073891626, 12),
 (0.08374384236453201, 9),
 (0.07881773399014778, 17),
 (0.07881773399014778, 5),
 (0.07881773399014778, 1),
 (0.07881773399014778, 0),
 (0.07389162561576355, 10),
 (0.07142857142857142, 18),
 (0.06896551724137931, 22),
 (0.06896551724137931, 6),
 (0.0665024630541872, 15),
 (0.0665024630541872, 4),
 (0.0665024630541872, 3),
 (0.06403940886699508, 25),
 (0.06403940886699508, 19),
 (0.06403940886699508, 16),
 (0.06403940886699508, 14),
 (0.06403940886699508, 2),
 (0.06157635467980296, 7),
 (0.05665024630541872, 8),
 (0.054187192118226604, 21),
 (0.05172413793103448, 13),
 (0.04926108374384237, 24),
 (0.04926108374384237, 20)]
In [12]:
sorted(((index_of_coincidence(C[0] + decrypt(C[2], chr(65+i))), i) for i in range(26)), reverse=True)
Out[12]:
[(0.09359605911330049, 21),
 (0.08620689655172414, 3),
 (0.08374384236453201, 2),
 (0.0812807881773399, 17),
 (0.07881773399014778, 25),
 (0.07881773399014778, 13),
 (0.07881773399014778, 9),
 (0.07635467980295567, 7),
 (0.07142857142857142, 10),
 (0.0665024630541872, 24),
 (0.0665024630541872, 14),
 (0.06403940886699508, 22),
 (0.06157635467980296, 23),
 (0.06157635467980296, 1),
 (0.05665024630541872, 16),
 (0.05665024630541872, 15),
 (0.05665024630541872, 8),
 (0.054187192118226604, 18),
 (0.054187192118226604, 11),
 (0.054187192118226604, 0),
 (0.05172413793103448, 19),
 (0.05172413793103448, 4),
 (0.04926108374384237, 6),
 (0.046798029556650245, 20),
 (0.04433497536945813, 12),
 (0.04433497536945813, 5)]

First, we try all Caesar-decryptions with the offsets giving the highest indices of coincidence. Unfortunately, this does not seem to work.

In [13]:
ccc = ''.join((''.join(t) for t in zip(C[0], decrypt(C[1], chr(65+23)), decrypt(C[2], chr(65+23)))))
[decrypt(ccc, chr(65+i)) for i in range(26)]
Out[13]:
['NNDSIEBBLYSZZFZTELYNSFNXFNEJLDNNDBBLYSZZMP',
 'MMCRHDAAKXRYYEYSDKXMREMWEMDIKCMMCAAKXRYYLO',
 'LLBQGCZZJWQXXDXRCJWLQDLVDLCHJBLLBZZJWQXXKN',
 'KKAPFBYYIVPWWCWQBIVKPCKUCKBGIAKKAYYIVPWWJM',
 'JJZOEAXXHUOVVBVPAHUJOBJTBJAFHZJJZXXHUOVVIL',
 'IIYNDZWWGTNUUAUOZGTINAISAIZEGYIIYWWGTNUUHK',
 'HHXMCYVVFSMTTZTNYFSHMZHRZHYDFXHHXVVFSMTTGJ',
 'GGWLBXUUERLSSYSMXERGLYGQYGXCEWGGWUUERLSSFI',
 'FFVKAWTTDQKRRXRLWDQFKXFPXFWBDVFFVTTDQKRREH',
 'EEUJZVSSCPJQQWQKVCPEJWEOWEVACUEEUSSCPJQQDG',
 'DDTIYURRBOIPPVPJUBODIVDNVDUZBTDDTRRBOIPPCF',
 'CCSHXTQQANHOOUOITANCHUCMUCTYASCCSQQANHOOBE',
 'BBRGWSPPZMGNNTNHSZMBGTBLTBSXZRBBRPPZMGNNAD',
 'AAQFVROOYLFMMSMGRYLAFSAKSARWYQAAQOOYLFMMZC',
 'ZZPEUQNNXKELLRLFQXKZERZJRZQVXPZZPNNXKELLYB',
 'YYODTPMMWJDKKQKEPWJYDQYIQYPUWOYYOMMWJDKKXA',
 'XXNCSOLLVICJJPJDOVIXCPXHPXOTVNXXNLLVICJJWZ',
 'WWMBRNKKUHBIIOICNUHWBOWGOWNSUMWWMKKUHBIIVY',
 'VVLAQMJJTGAHHNHBMTGVANVFNVMRTLVVLJJTGAHHUX',
 'UUKZPLIISFZGGMGALSFUZMUEMULQSKUUKIISFZGGTW',
 'TTJYOKHHREYFFLFZKRETYLTDLTKPRJTTJHHREYFFSV',
 'SSIXNJGGQDXEEKEYJQDSXKSCKSJOQISSIGGQDXEERU',
 'RRHWMIFFPCWDDJDXIPCRWJRBJRINPHRRHFFPCWDDQT',
 'QQGVLHEEOBVCCICWHOBQVIQAIQHMOGQQGEEOBVCCPS',
 'PPFUKGDDNAUBBHBVGNAPUHPZHPGLNFPPFDDNAUBBOR',
 'OOETJFCCMZTAAGAUFMZOTGOYGOFKMEOOECCMZTAANQ']

We may also try with some other offsets giving high indices of coincidece. After some attempts, we find a legible string.

In [14]:
ccc = ''.join((''.join(t) for t in zip(C[0], decrypt(C[1], chr(65+12)), decrypt(C[2], chr(65+3)))))
[decrypt(ccc, chr(65+i)) for i in range(26)]
Out[14]:
['NYXSTYBMFYDTZQTTPFYYMFYRFYYJWXNYXBMFYDTZXJ',
 'MXWRSXALEXCSYPSSOEXXLEXQEXXIVWMXWALEXCSYWI',
 'LWVQRWZKDWBRXORRNDWWKDWPDWWHUVLWVZKDWBRXVH',
 'KVUPQVYJCVAQWNQQMCVVJCVOCVVGTUKVUYJCVAQWUG',
 'JUTOPUXIBUZPVMPPLBUUIBUNBUUFSTJUTXIBUZPVTF',
 'ITSNOTWHATYOULOOKATTHATMATTERSITSWHATYOUSE',
 'HSRMNSVGZSXNTKNNJZSSGZSLZSSDQRHSRVGZSXNTRD',
 'GRQLMRUFYRWMSJMMIYRRFYRKYRRCPQGRQUFYRWMSQC',
 'FQPKLQTEXQVLRILLHXQQEXQJXQQBOPFQPTEXQVLRPB',
 'EPOJKPSDWPUKQHKKGWPPDWPIWPPANOEPOSDWPUKQOA',
 'DONIJORCVOTJPGJJFVOOCVOHVOOZMNDONRCVOTJPNZ',
 'CNMHINQBUNSIOFIIEUNNBUNGUNNYLMCNMQBUNSIOMY',
 'BMLGHMPATMRHNEHHDTMMATMFTMMXKLBMLPATMRHNLX',
 'ALKFGLOZSLQGMDGGCSLLZSLESLLWJKALKOZSLQGMKW',
 'ZKJEFKNYRKPFLCFFBRKKYRKDRKKVIJZKJNYRKPFLJV',
 'YJIDEJMXQJOEKBEEAQJJXQJCQJJUHIYJIMXQJOEKIU',
 'XIHCDILWPINDJADDZPIIWPIBPIITGHXIHLWPINDJHT',
 'WHGBCHKVOHMCIZCCYOHHVOHAOHHSFGWHGKVOHMCIGS',
 'VGFABGJUNGLBHYBBXNGGUNGZNGGREFVGFJUNGLBHFR',
 'UFEZAFITMFKAGXAAWMFFTMFYMFFQDEUFEITMFKAGEQ',
 'TEDYZEHSLEJZFWZZVLEESLEXLEEPCDTEDHSLEJZFDP',
 'SDCXYDGRKDIYEVYYUKDDRKDWKDDOBCSDCGRKDIYECO',
 'RCBWXCFQJCHXDUXXTJCCQJCVJCCNABRCBFQJCHXDBN',
 'QBAVWBEPIBGWCTWWSIBBPIBUIBBMZAQBAEPIBGWCAM',
 'PAZUVADOHAFVBSVVRHAAOHATHAALYZPAZDOHAFVBZL',
 'OZYTUZCNGZEUARUUQGZZNGZSGZZKXYOZYCNGZEUAYK']

Looking at the offsets used, we determine that the key is again FRI.

In [15]:
decrypt(ciphertext, 'FRI')
Out[15]:
'ITSNOTWHATYOULOOKATTHATMATTERSITSWHATYOUSEE'