Panos Louridas
Athens University of Economics and Business
The security of the RSA cryptosystem is based on the difficulty of factoring.
That is, given a number, it is difficult to find its prime factors.
But to arrive at the RSA cryptosystem itself, we need first to see first some algorithms from Number Theory.
Euclid's algorithm allows us to find the Greatest Common Divisor (gcd) of two numbers.
def euclid(a, b):
if b == 0:
return a
else:
return euclid(b, a % b)
print(euclid(160, 144))
16
The algorithm is recursive.
To find the gcd of $a$ and $b$ is equivalent to finding the GCD of $a$ and $a \bmod b$, unless $b$ is zero, in which case the gcd is $a$.
It does not matter which of $a$, $b$ is bigger, the algorithm will provide the same answer.
print(euclid(144, 160))
16
To see why this happens, and to get a better understanding at the workings of the algorithm, we can include some tracing information in the code.
def trace_euclid(a, b):
print('GCD of', a, b)
if b == 0:
return a
else:
return trace_euclid(b, a % b)
print(trace_euclid(160, 144))
print(trace_euclid(144, 160))
GCD of 160 144 GCD of 144 16 GCD of 16 0 16 GCD of 144 160 GCD of 160 144 GCD of 144 16 GCD of 16 0 16
So, what happens is that when $a < b$, $a$ and $b$ are swapped in the first recursive call.
That is because if $a < b$, then $a \bmod b = a$.
144 % 160 == 144
True
The next step is the Extended Euclid's Algorithm.
This algorithm gives us the GCD of $a$ and $b$ and two numbers $x$ and $y$ such that: $$ GCD(a, b) = xa + yb$$
def extended_euclid(a, b):
if b == 0:
return (a, 1, 0)
else:
(d, x, y) = extended_euclid(b, a % b)
return (d, y, x - (a // b) * y)
print(extended_euclid(3, 352))
(1, -117, 1)
Indeed, we can verify that the GCD of 3 and 352 is 1 and that: $$ 1 = -117 \times 3 + 352 \times 1$$
(d, x, y) = extended_euclid(3, 352)
d == x * 3 + y * 352
True
Again, we can gain a better understanding by tracing the algorithm.
In the following, Going down
means that we make recursive calls (so we go add to the recursion stack) and Going up
means that we have returned from recursive calls (so we make our way up, unwinding the recursion stack).
In effect, they are the left and the right part of Figure 5.2 in the book.
def trace_extended_euclid(a, b):
if b == 0:
return (a, 1, 0)
else:
print('Going down', a, b, a % b)
(d, x, y) = trace_extended_euclid(b, a % b)
print('Going up', d, x, y)
return (d, y, x - (a // b) * y)
print(trace_extended_euclid(3, 352))
Going down 3 352 3 Going down 352 3 1 Going down 3 1 0 Going up 1 1 0 Going up 1 0 1 Going up 1 1 -117 (1, -117, 1)
We need the Extended Euclid's Algorithm in order to find the modular inverse of a number $a$ modulo $n$.
The inverse of a number $a$, is the number that we denote by $a^{-1}$ such that $a a^{-1} = 1$.
If we move over to arithmetic modulo $n$, we obtain a similar definition for the modular inverse.
The modular inverse of a number $a$ modulo $n$, is the number we denote by $a^{-1} \bmod n$ such that $a a^{-1} \bmod n = 1$.
To find the modular inverse of a number $a$ modulo $n$, we run the Extended Euclid's Algorithm with input $a$ and $n$.
The algorithm will return a triplet $(r, x, y)$.
If $r = 1$ it means we have: $$ 1 = ax + nb$$
Therefore: $$ ax = 1 - bn$$
But that is equivalent to: $$ ax \bmod n = 1$$
So, $x$ is the modular inverse of $a$ modulo $n$.
If $r \ne 1$, then the modular inverse does not exist.
Note that the Extended Euclid's Algorithm may return $(r, x, y)$ with $x < 0$.
If this happens, the modular inverse is $x + n$.
With all this, the function for finding the modular inverse is:
def modular_inverse(a, n):
(r, x, y) = extended_euclid(a, n)
if r != 1:
return 0
elif x < 0:
return x + n
else:
return x
For example, let's take $n = 31$ and $a = 6$.
modular_inverse(6, 31)
26
We can verify that it is correct:
(6 * 26) % 31 == 1
True
Now we can see the RSA cryptosystem in action.
Alice picks two primes, say $p = 17$ and $q = 23$.
She calculates $n = pq$.
p_alice = 17
q_alice = 23
n_alice = p_alice * q_alice
print(n_alice)
391
Alice then chooses a number $e$ that is relatively prime to $(p - 1)(q - 1)$.
Say she picks $e = 3$.
She then finds the modular inverse of $e$ modulo $(p - 1)(q - 1)$.
e_alice = 3
phi_alice = (p_alice - 1)*(q_alice - 1)
d_alice = modular_inverse(e_alice, phi_alice)
print(d_alice)
235
Alice's public key is $(e, n) = (3, 391)$.
Her private key is $(d, n) = (235, 391)$.
If Bob wants to encrypt, say, the message $M = 314$ and send it to Alice, he takes her public key and performs the following operation: $$ M^e \bmod n$$
m = 314
c = pow(m, e_alice, n_alice)
print(c)
155
The encrypted message is $C = 155$, which he sends to Alice.
To decrypt the message, Alice uses her private key and performs the following operation: $$C^d \bmod n$$
dec_c = pow(c, d_alice, n_alice)
print(dec_c)
314
So she did decrypt $C$ to the correct message.
Signing a message works in the same way, only the roles of the public and the private key are interchanged.
If Alice wants to sign, say, the message $M = 314$ and send it to Bob, she takes her public key and performs the following operation: $$M^d \bmod n$$
sig_c = pow(m, d_alice, n_alice)
print(sig_c)
274
The signature is $S = 274$, which she sends to Bob, along with $M = 314$.
Bob can then verify that the sender is indeed Alice by taking her public key and performing the following operation: $$S^e \bmod n$$
pow(sig_c, e_alice, n_alice) == m
True
As we saw, this allows Bob to verify that Alice was the sender, but Alice sends her message as plaintext to Bob.
If she wants to encrypt it, then she will encrypt the message and the signature with Bob's public key.
Bob will decrypt the message and the signagure and then will check the signature against the decrypted message.
Suppose that Bob has selected $p = 31$, $q = 43$.
p_bob = 31
q_bob = 43
n_bob = 31 * 43
print(n_bob)
phi_bob = (p_bob - 1) * (q_bob - 1)
print(phi_bob)
1333 1260
Bob then picks $e = 23$, which is relatively prime to 1260.
He finds the modular inverse of $e$ modulo $(p - 1)(q - 1)$.
e_bob = 23
d_bob = modular_inverse(e_bob, phi_bob)
print(d_bob)
767
So Alice will send her signature of $M = 314$, and send it along with $M$ encrypted with Bob's public key:
c = pow(m, e_bob, n_bob)
print(c)
enc_sig = pow(sig_c, e_bob, n_bob)
print(enc_sig)
126 471
Bob can now decrypt the message and the signature using his private key:
dec_c = pow(c, d_bob, n_bob)
print(dec_c)
dec_sig = pow(enc_sig, d_bob, n_bob)
print(dec_sig)
314 274
This is equal to what he will get by verifying the signature:
pow(dec_sig, e_alice, n_alice) == m
True
Up to this point we have been using small numbers, but in reality we use much bigger ones.
For instance, we want the sizes of our RSA keys to be 4096 bits.
Let's see how RSA works with such numbers.
In what follows we did not pulled big numbers out of thin air. In reality, we used the OpenSSL toolkit, in case you wonder how the example was built.
We'll use $e = 65537$.
e = 65537
We need two large primes $p$ and $q$, so here is $p$:
p = 0x00cfb0f468a68aee13806191c962ea\
565a4ab0d5547797d05a27aac67d6a\
ba1b279a1e7f62220b26190cefb271\
5d4bb81e28cffc290c2c05a58c9767\
184ceec95e3b333833829e346ec9ec\
f50ba2ba033f88d5b961008726f61b\
7600fd6b0d408c2d347ab8360aa14c\
eb9b4c851e61b76265bf53b510c647\
d185341571cb8fa76e1fce322906a8\
8a78fb4c588d821a08d395dc57e972\
2cc83c71cf3a138531f3e0f3918996\
eaf80c4879bf94755f0d2f470d5d6f\
bcbe1ec71fa2012341dcf8be36daf2\
3f3b00d30859a373efce255649602d\
f7664779bdc441602d3f0217240003\
a96801730aba5392796d0ea69116e8\
2e107d48b35d010c3c4a49973b3aa7\
d047
And here is $q$:
q = 0x00c7e596a9d7d1ee5f7e591933b3b9\
9f8e720b9f3e7b402987adb9480595\
14a5bd56d071bb9ef8637cd9c34932\
1d60ff2990bf2d4d1178d6456d7007\
ae527e3f8d7128a4f0d7fe04179b2a\
080c7c63f85207be1e21a8d322823a\
201860accf874effbc9a14bfeb266c\
83bf2957dc79f1459a34d7bda13b96\
caa239540b9337c018793d0645bf4c\
541e792bccfb43b849505296da74c9\
18ae9b47dbc544ac880baaf733df5d\
76fd0a2bfb9c9b2bfe24393a61acdb\
2cc134366ce669788ff8c85ac4806a\
b35bc8d1110a27769fabae1a474007\
33f0e63e078b15163a86b7f402d9e4\
644a594f2e42b095728d25d411eb0a\
72efe859f58e5bdd1f93d15b77fb90\
c9dd
Now we can calculate $(p - 1)(q - 1)$:
phi = (p - 1) * (q - 1)
print(hex(phi))
0xa22cd18375319222e3284c004ac9a225ecce5b00446aaff5f2df68ecb6ae135c90127c58b7d99cb1a66da9cbe67db3352f3068af2381bf6d2bcf0220a416af0b008bf87eec889a95e8f81036634f2d823eccab77f79f09c693a5f0f335566c5b10c90c0ebbeef5076a273db5a86873351a7b06df09118f3fe988732ebdd4642f6a25b0ad85b49f1b68b07e946e2005cbb6143110a05df959570828c45b7aa8d2bbc78c0bddf7d27a48aaee8f569b5434229ddd4711f8c67bc6c8f52dff31b1d8df252008c8049706bc1598a1cba14ea82a2a9dd19b6d937df77106676c1c039894dd9101189f97c709f5849293774f74c4b6f69cdfb8397c4a22518a7f1aaaa720076e198df7913b3da617b04efad404d366ecf7a3e3c9ddad036f36f5bcda11b395436f9edd42a9e192e994d6573e4ce1b9ed779f17320a3dd78f8e301e0e2651c58bf8351e28caa10a12872581617a74f7ec1ce9cfc038e6de4e71dff6f047dba1b113eed5a4b7e3ac3f56fc8d2f32360167f96c3c9bcb14ff87bc1b8735d3b9394807d3e853dbf6a8cd669f0600de09c15558cd99d8371fc016d24ac73de641714b1afa8e39e1f70b1e335cedf452477509af901580cdbd1ed1fa2c1b65105dbfd2c89184c5f835e9a73d0d73604da0c2b50268a56e073220f8b121f2827d321eec3a5c582dc6c8875efb7bdf47da380bb614bf430b44e8ffb217f41df228
And we can also calculate $n = pq$:
n = p * q
print(hex(n))
0xa22cd18375319222e3284c004ac9a225ecce5b00446aaff5f2df68ecb6ae135c90127c58b7d99cb1a66da9cbe67db3352f3068af2381bf6d2bcf0220a416af0b008bf87eec889a95e8f81036634f2d823eccab77f79f09c693a5f0f335566c5b10c90c0ebbeef5076a273db5a86873351a7b06df09118f3fe988732ebdd4642f6a25b0ad85b49f1b68b07e946e2005cbb6143110a05df959570828c45b7aa8d2bbc78c0bddf7d27a48aaee8f569b5434229ddd4711f8c67bc6c8f52dff31b1d8df252008c8049706bc1598a1cba14ea82a2a9dd19b6d937df77106676c1c039894dd9101189f97c709f5849293774f74c4b6f69cdfb8397c4a22518a7f1aaaa8b79df92c0c546dae3c60c2ad659ec9ed9023618a96bbc3bf82677db9f58b9af6a484348d5fe0cc3fc845e5385103f5949b4916edbcbc0df537defe54cf8b1711fe21691c8fba615106210f9f449f5d0c058bc39f932a09b13c7467cff7d3b8230892c5e0e4cb6c715306b533f768d7da35f5936c1e3e7a673c6cf1397a4e9d5a5244807699dd32736b20f2ef64d85301f23487b708df4f0ed96b162a7c813d722c36b40f5c8350566c674dd4ba1f5cd3b67f54990f687e5a45896dcc01dc7e0bb91cc55f5b28df5c00d436b6e0e3f0edd5ee0c30206abd5da888be6b2d195c653fd146fc955531eeb48193761ee13a7b387158bdaa9ff4a0c71aa4cb2a568c4b
The public key is $(e, n)$.
The private key is $(d, n)$ where $d$ is the multiplicative inverse modulo $(p - 1)(q - 1)$ of $e$.
That is:
d = modular_inverse(e, phi)
print(hex(d))
0x5c36e1d73eff35da52922a4d0c3984d2cdc934a37d43b0d4480ad2edae9e62f2021610d09d91c6709972c7d6e233dd7fc35a625c1bf37df6c4af4bc565a86455fc349ad3090a4fe427f94db6af576948230f5bfcb6379f6663b43ac30034291ecaf796bc960e3513c73f92ee455947110e02a09097e67d2ed94ab63c00c2d148c8b1afb9ab4a5e2246affcf9c778bbf2ee90a2a992967cdf590691afbd588cc06ef7f3611810ff847ae77f08d035387700ac0513915b84f902ba67f784a12c4065fa05321911cb1290463368e491cf58fdc907be15040496dee94cbbe81431d343ea8cb633db238190df2347f9442151033e0a127f7f906864fb08a498cb7ff7d1d587bc4e2654557c23825ac53e08eac98e3ff63c219d3a4853a89c76b4b0c6d9f768c6026e4ba6af564b2d4f87822c5339d6966b46d434509fcec80a023d077afdba2e237e07b1d84a12e8f7e18e9d03ffa92973a5a6e674183a40b41a04dcc73d34a461ece7618cbaf7b52d9bd61755f00cc01dd484bdc02de06f8cb3a05ec87692c7a8adf56920e33110462232c20e87f950aa6fed2ac30a05b003c731224959f35bded69a812b690dfe24bc456db7a60c20e0ebe936936ffff3a1bade07a5571ab2e353bd01fb1de5f3683f8fa5f460626e813477d0eda723e20cac899848b49e9c3cc0031f820cbf6c7e2e22fc295c3767aaada3adc65306606a75d0a9
Let's encrypt the message ATTACKATDAWN
.
To do that, we need to convert the message to a number smaller than $n$.
One way to do that is to convert the string to a sequence of bytes and then interpret that sequence of bytes as an integer, using the int.from_bytes()
function that we saw in the notebook for Chapter 4.
encoded_m = int.from_bytes('ATTACKATDAWN'.encode('utf-8'), byteorder='big')
print(m)
20218473289907025413077555022
The number is well below $n$, so we are fine.
We can encrypt it:
c = pow(encoded_m, e, n)
print(c)
407610036536766973941746712970056610431835270763945506960742542700949567151960874466077831719168680539470512240285486337550969993279696657735430597939419591374065091761356463235152143710255449503873561336415103006795697846296395947461876977977394431370772059394605673469827590676144590868929203525034699443219779269405168770159479959294157741569972986104837443912443448384645504245575692381456943218026944194684571058452049478974903319436215524101922325643097716873419410243227934020948460647053999998422913455094868056924225820448155709173128595626608895554934753751544860636088412735253276591446354450386193737480234349332997586568691670839008357864280250666956934406970115168608653775115126626245410948290822146137494594341860806577958750028245578858672356969329085479075907906476430846438024985350304613047425941365722484003570780153140446461960446467394379043325391336573770382418692001929699853863734921837392277439431493254812307721121364761570747515567992948509006262597846233976186363549135380390043355308274871299976822982419028054165067879724454988158437177860111295800172420220495633529577878379178147048919884717838512877763877003097684166522722064813438850586879945356215277724271919578412510711801553294801455380141500
And we can verify that it decrypts correctly:
dec_c = pow(c, d, n)
dec_c == encoded_m
True
Finally, we can recover the original string, by converting its number encoding to a series of bytes, and that to a string.
Note that the int.to_bytes()
function requires the length of the byte array, which we calculate as we go.
In effect, we add 7 to the number of bits representing the integer encoding of the message, and then we perform an integer division by 8. In this way we get the rounded up multiple of 8, which is what we want.
decoded_m = dec_c.to_bytes((dec_c.bit_length() + 7) // 8, byteorder='big').decode('utf-8')
print(decoded_m)
ATTACKATDAWN
Note, though, that if our message cannot be encoded with less bits than those required for representing $n$, the modulo, we cannot encrypt it using RSA.
In that case, we use RSA only to exchange a secret encryption key that we will use to encrypt and decrypt our message using a symmetric encryption algorithm, such as AES, using a hybrid encryption scheme: a combination of symmetric and asymmetric cryptography.
If we want to sign a large message, we usually sign a digital fingerprint of the message instead.
A fingerprint of a message $M$ is a short piece of identifying data that we get by applying a special fast function $h(M)$ to the message.
This function is called a hash function.
As a hash function maps a message of a number of bits to a digest of a smaller number of bits, we will always have collisions: different messages hashing to the same value.
However, a good hash function will make such collisions very unlikely in practice.
A good family of hash function is the Secure Hash Algorithm 2 (SHA-2) family of functions, defined as part of the Secure Hash Standard.
The functions are available in Python through the hashlib
library.
As an example, we will use SHA-256, which produces 256 bit hashes.
We can verify that even very similar messages produce very different hashes.
For example, capital and smaller case latin letters differ only by one bit, bit 100000 = 32 as we can see below:
import string
for lc, uc in zip(string.ascii_lowercase, string.ascii_uppercase):
bin_lc = bin(ord(lc))
bin_uc = bin(ord(uc))
diff = ord(lc) ^ ord(uc)
print(lc, bin_lc, uc, bin_uc, bin(diff), diff)
a 0b1100001 A 0b1000001 0b100000 32 b 0b1100010 B 0b1000010 0b100000 32 c 0b1100011 C 0b1000011 0b100000 32 d 0b1100100 D 0b1000100 0b100000 32 e 0b1100101 E 0b1000101 0b100000 32 f 0b1100110 F 0b1000110 0b100000 32 g 0b1100111 G 0b1000111 0b100000 32 h 0b1101000 H 0b1001000 0b100000 32 i 0b1101001 I 0b1001001 0b100000 32 j 0b1101010 J 0b1001010 0b100000 32 k 0b1101011 K 0b1001011 0b100000 32 l 0b1101100 L 0b1001100 0b100000 32 m 0b1101101 M 0b1001101 0b100000 32 n 0b1101110 N 0b1001110 0b100000 32 o 0b1101111 O 0b1001111 0b100000 32 p 0b1110000 P 0b1010000 0b100000 32 q 0b1110001 Q 0b1010001 0b100000 32 r 0b1110010 R 0b1010010 0b100000 32 s 0b1110011 S 0b1010011 0b100000 32 t 0b1110100 T 0b1010100 0b100000 32 u 0b1110101 U 0b1010101 0b100000 32 v 0b1110110 V 0b1010110 0b100000 32 w 0b1110111 W 0b1010111 0b100000 32 x 0b1111000 X 0b1011000 0b100000 32 y 0b1111001 Y 0b1011001 0b100000 32 z 0b1111010 Z 0b1011010 0b100000 32
We can then see what happens if we get the SHA-256 hash value of i am going to be hashed
vs. the hash value of I am going to be hashed
.
The two messages differ by just one bit.
Yet they produce completely different SHA-256 hash values.
import hashlib
print(hashlib.sha256(b"i am going to be hashed").hexdigest())
print(hashlib.sha256(b"I am going to be hashed").hexdigest())
52ac2af460118f051bcc42886ab7f6cbb48ecdba8eeffb141f9c811c12a34353 1b0ca72ec711fcec092e0acfec74f2033dbe70531019892e2bf9be2a0d53c84d
To repeat how we ended the previous notebook: you should never, ever, use any of the code in this notepad for production use if you care a bit about security.
The code we have shown serves for demonstration purposes only.
However, the development of secure code is a full-blown art. There are many details that one has to take into account when writing code for a robust, secure system.
You should always use libraries that have a good security record, that are actively supported, and are well documented.
You should never use your own cryptographic code unless you are an expert programmer, you have shared your code with the programming community, and the consensus is that the implementation is sound.