# The Vigenère cipher¶

Let us write some algorithms to deal with the Vigenère cipher.

In :
from itertools import cycle

def encode(s):
return tuple(ord(c) - 65 for c in s)

def decode(t):
return ''.join(chr(n + 65) for n in t)

def encrypt(plaintext, key):
return decode((p + k) % 26 for p, k in zip(encode(plaintext), cycle(encode(key))))

def decrypt(ciphertext, key):
return decode((c - k) % 26 for c, k in zip(encode(ciphertext), cycle(encode(key))))

def index_of_coincidence(s):
d = {}
for c in s:
d.setdefault(c, 0)
d[c] += 1
l = len(s)
return sum(n * (n-1) for n in d.values()) / (l * (l-1))


Let us encrypt a sample message.

In :
p = 'LJUBLJANA'
k = 'FRI'
c = encrypt(p, k)
c

Out:
'QACGCRFEI'

We now compare the indices of coincidence for the plaintext and the ciphertext.

In :
index_of_coincidence(p)

Out:
0.08333333333333333
In :
index_of_coincidence(c)

Out:
0.027777777777777776

Note that the index of coincidence does not change when a Caesar cipher is used.

In :
cc = encrypt(p, 'C')
cc

Out:
'NLWDNLCPC'
In :
index_of_coincidence(cc)

Out:
0.08333333333333333

## Sample ciphertext¶

Let us now examine the following ciphertext. Using the Kasiski test, we have determined that the most likely key length is $3$.

In :
ciphertext = "NKASFBBYIYPWZCWTBIYKPFKUFKBJIANKABYIYPWZJMJ"

In :
index_of_coincidence(ciphertext)

Out:
0.058693244739756366

We now break the ciphertext into what we expect to be three Caesar-encrypted sub-ciphertexts.

In :
C = [ciphertext[i::3] for i in range(3)]
C

Out:
['NSBYZTYFFJNBYZJ', 'KFYPCBKKKIKYPJ', 'ABIWWIPUBAAIWM']
In :
[index_of_coincidence(y) for y in C]

Out:
[0.0761904761904762, 0.13186813186813187, 0.10989010989010989]

Let us now compute the indices of coincidence we obtain when concatenating one of these strings with each Caesar-encryption of each of the other two. We list the offsets sorted by the index of coincidence.

In :
sorted(((index_of_coincidence(C + decrypt(C, chr(65+i))), i) for i in range(26)), reverse=True)

Out:
[(0.09359605911330049, 23),
(0.09113300492610837, 11),
(0.08866995073891626, 12),
(0.08374384236453201, 9),
(0.07881773399014778, 17),
(0.07881773399014778, 5),
(0.07881773399014778, 1),
(0.07881773399014778, 0),
(0.07389162561576355, 10),
(0.07142857142857142, 18),
(0.06896551724137931, 22),
(0.06896551724137931, 6),
(0.0665024630541872, 15),
(0.0665024630541872, 4),
(0.0665024630541872, 3),
(0.06403940886699508, 25),
(0.06403940886699508, 19),
(0.06403940886699508, 16),
(0.06403940886699508, 14),
(0.06403940886699508, 2),
(0.06157635467980296, 7),
(0.05665024630541872, 8),
(0.054187192118226604, 21),
(0.05172413793103448, 13),
(0.04926108374384237, 24),
(0.04926108374384237, 20)]
In :
sorted(((index_of_coincidence(C + decrypt(C, chr(65+i))), i) for i in range(26)), reverse=True)

Out:
[(0.09359605911330049, 21),
(0.08620689655172414, 3),
(0.08374384236453201, 2),
(0.0812807881773399, 17),
(0.07881773399014778, 25),
(0.07881773399014778, 13),
(0.07881773399014778, 9),
(0.07635467980295567, 7),
(0.07142857142857142, 10),
(0.0665024630541872, 24),
(0.0665024630541872, 14),
(0.06403940886699508, 22),
(0.06157635467980296, 23),
(0.06157635467980296, 1),
(0.05665024630541872, 16),
(0.05665024630541872, 15),
(0.05665024630541872, 8),
(0.054187192118226604, 18),
(0.054187192118226604, 11),
(0.054187192118226604, 0),
(0.05172413793103448, 19),
(0.05172413793103448, 4),
(0.04926108374384237, 6),
(0.046798029556650245, 20),
(0.04433497536945813, 12),
(0.04433497536945813, 5)]

First, we try all Caesar-decryptions with the offsets giving the highest indices of coincidence. Unfortunately, this does not seem to work.

In :
ccc = ''.join((''.join(t) for t in zip(C, decrypt(C, chr(65+23)), decrypt(C, chr(65+23)))))
[decrypt(ccc, chr(65+i)) for i in range(26)]

Out:
['NNDSIEBBLYSZZFZTELYNSFNXFNEJLDNNDBBLYSZZMP',
'MMCRHDAAKXRYYEYSDKXMREMWEMDIKCMMCAAKXRYYLO',
'LLBQGCZZJWQXXDXRCJWLQDLVDLCHJBLLBZZJWQXXKN',
'KKAPFBYYIVPWWCWQBIVKPCKUCKBGIAKKAYYIVPWWJM',
'JJZOEAXXHUOVVBVPAHUJOBJTBJAFHZJJZXXHUOVVIL',
'IIYNDZWWGTNUUAUOZGTINAISAIZEGYIIYWWGTNUUHK',
'HHXMCYVVFSMTTZTNYFSHMZHRZHYDFXHHXVVFSMTTGJ',
'GGWLBXUUERLSSYSMXERGLYGQYGXCEWGGWUUERLSSFI',
'FFVKAWTTDQKRRXRLWDQFKXFPXFWBDVFFVTTDQKRREH',
'EEUJZVSSCPJQQWQKVCPEJWEOWEVACUEEUSSCPJQQDG',
'DDTIYURRBOIPPVPJUBODIVDNVDUZBTDDTRRBOIPPCF',
'CCSHXTQQANHOOUOITANCHUCMUCTYASCCSQQANHOOBE',
'AAQFVROOYLFMMSMGRYLAFSAKSARWYQAAQOOYLFMMZC',
'ZZPEUQNNXKELLRLFQXKZERZJRZQVXPZZPNNXKELLYB',
'YYODTPMMWJDKKQKEPWJYDQYIQYPUWOYYOMMWJDKKXA',
'XXNCSOLLVICJJPJDOVIXCPXHPXOTVNXXNLLVICJJWZ',
'WWMBRNKKUHBIIOICNUHWBOWGOWNSUMWWMKKUHBIIVY',
'VVLAQMJJTGAHHNHBMTGVANVFNVMRTLVVLJJTGAHHUX',
'UUKZPLIISFZGGMGALSFUZMUEMULQSKUUKIISFZGGTW',
'TTJYOKHHREYFFLFZKRETYLTDLTKPRJTTJHHREYFFSV',
'SSIXNJGGQDXEEKEYJQDSXKSCKSJOQISSIGGQDXEERU',
'RRHWMIFFPCWDDJDXIPCRWJRBJRINPHRRHFFPCWDDQT',
'QQGVLHEEOBVCCICWHOBQVIQAIQHMOGQQGEEOBVCCPS',
'PPFUKGDDNAUBBHBVGNAPUHPZHPGLNFPPFDDNAUBBOR',
'OOETJFCCMZTAAGAUFMZOTGOYGOFKMEOOECCMZTAANQ']

We may also try with some other offsets giving high indices of coincidece. After some attempts, we find a legible string.

In :
ccc = ''.join((''.join(t) for t in zip(C, decrypt(C, chr(65+12)), decrypt(C, chr(65+3)))))
[decrypt(ccc, chr(65+i)) for i in range(26)]

Out:
['NYXSTYBMFYDTZQTTPFYYMFYRFYYJWXNYXBMFYDTZXJ',
'MXWRSXALEXCSYPSSOEXXLEXQEXXIVWMXWALEXCSYWI',
'LWVQRWZKDWBRXORRNDWWKDWPDWWHUVLWVZKDWBRXVH',
'KVUPQVYJCVAQWNQQMCVVJCVOCVVGTUKVUYJCVAQWUG',
'JUTOPUXIBUZPVMPPLBUUIBUNBUUFSTJUTXIBUZPVTF',
'ITSNOTWHATYOULOOKATTHATMATTERSITSWHATYOUSE',
'HSRMNSVGZSXNTKNNJZSSGZSLZSSDQRHSRVGZSXNTRD',
'GRQLMRUFYRWMSJMMIYRRFYRKYRRCPQGRQUFYRWMSQC',
'FQPKLQTEXQVLRILLHXQQEXQJXQQBOPFQPTEXQVLRPB',
'EPOJKPSDWPUKQHKKGWPPDWPIWPPANOEPOSDWPUKQOA',
'DONIJORCVOTJPGJJFVOOCVOHVOOZMNDONRCVOTJPNZ',
'CNMHINQBUNSIOFIIEUNNBUNGUNNYLMCNMQBUNSIOMY',
'BMLGHMPATMRHNEHHDTMMATMFTMMXKLBMLPATMRHNLX',
'ALKFGLOZSLQGMDGGCSLLZSLESLLWJKALKOZSLQGMKW',
'ZKJEFKNYRKPFLCFFBRKKYRKDRKKVIJZKJNYRKPFLJV',
'YJIDEJMXQJOEKBEEAQJJXQJCQJJUHIYJIMXQJOEKIU',
'WHGBCHKVOHMCIZCCYOHHVOHAOHHSFGWHGKVOHMCIGS',
'VGFABGJUNGLBHYBBXNGGUNGZNGGREFVGFJUNGLBHFR',
'UFEZAFITMFKAGXAAWMFFTMFYMFFQDEUFEITMFKAGEQ',
'TEDYZEHSLEJZFWZZVLEESLEXLEEPCDTEDHSLEJZFDP',
'SDCXYDGRKDIYEVYYUKDDRKDWKDDOBCSDCGRKDIYECO',
'RCBWXCFQJCHXDUXXTJCCQJCVJCCNABRCBFQJCHXDBN',
'QBAVWBEPIBGWCTWWSIBBPIBUIBBMZAQBAEPIBGWCAM',
'OZYTUZCNGZEUARUUQGZZNGZSGZZKXYOZYCNGZEUAYK']
Looking at the offsets used, we determine that the key is again FRI.
decrypt(ciphertext, 'FRI')

'ITSNOTWHATYOULOOKATTHATMATTERSITSWHATYOUSEE'