RIVEST SHAMIR ADLEMAN (RSA)

Let $p,q$ be two large primes of comparable size. Put $n = pq$. This computation takes about $O(\log_2(n)^2)$ many flops if you use elementary school method of multiplying these numbers (it is actually $O(\log_2(n)\log\log_2(n))$ many flops if you employ Fast Fourier Transform multiplication algorithm).

However trying to factor $n$ into $p$ and $q$ using the brute force method will take $O(\max\{p,q\})$ modular divisions. which is on the order of $O(\sqrt{n})$ if $p$ and $q$ are close enough to one another.

In [1]:
163298476128374618237491823746912837468127467125648723654817356418765*298734691328746192837461923874619287461983278
Out[1]:
48782919860664620234782399461351856881494516091785396502382566480860785545607256493096155801868780530190395411670
In [2]:
def BruteForceFactor(n):
    primelist = []
    d = 2
    while n!= 1:
        if d*d>n:
            primelist.append(n)
            return primelist
        elif n%d == 0:
            primelist.append(d)
            n = n//d
        else:
            d = d + 1
    return primelist

BruteForceFactor(156)
Out[2]:
[2, 2, 3, 13]
In [3]:
BruteForceFactor(123456789)
Out[3]:
[3, 3, 3607, 3803]
In [4]:
BruteForceFactor(1234567891)
Out[4]:
[1234567891]
In [5]:
BruteForceFactor(1234567891011121314151617181920)
Out[5]:
[2, 2, 2, 2, 2, 3, 5, 323339, 3347983, 2375923237887317]

Euler $\phi$ Function

We are interested in the order of the group $(\mathbb{Z}/n\mathbb{Z})^*$. That is, what is the number $$\phi(n) := \#\{1\leq a \leq n: \operatorname{gcd}(a,n) = 1\}.$$

If $n = p$ is a prime, we know that every integer less than $p$ is relatively prime to $p$, and hence $\phi(p) = p-1$.

If $n = p^k$ a power of a prime, then the only integers that are not relatively prime to $n$ are multiples of $p$, and there are $n/p = p^{k-1}$ many of them. Hence $\phi(p^k) = p^k - p^{k-1} = p^k(1 - 1/p)$.

Lastly by Chinese Remainder theorem we note that if $n = ab$ with $\operatorname{gcd}(a,b) = 1$, then there is an isomorphism of groups $$ (\mathbb{Z}/n\mathbb{Z})^* \cong (\mathbb{Z}/a\mathbb{Z})^* \times (\mathbb{Z}/b\mathbb{Z})^*. $$ There are $\phi(n)$ elements on the left, and $\phi(a)\phi(b)$ elements on the right.

Because of this we say that $\phi$ is a multiplicative function and for a general $n$ of the form, $$ n = p_1^{k_1} p_2^{k_2} \cdots p_r^{k_r} $$ we deduce that $$ \phi(n) = (p_1^{k_1} - p_1^{k_1-1}) \cdots (p_r^{k_r} - p_r^{k_r-1}) = n \prod_{p | n} \left(1 - \frac 1p\right) $$

In [6]:
def EulerPhi(n, factors = None):
    if factors == None:
        primelist = BruteForceFactor(n)
    else:
        primelist= factors
    result = 1
    while len(primelist)>1:
        ult = primelist[-1]
        penult = primelist[-2]
        if ult == penult:
            result = result*ult
            primelist.pop()
        else:
            result = result*(ult -1)
            primelist.pop()
    return result*(primelist[-1]-1)

EulerPhi(101)
Out[6]:
100
In [7]:
EulerPhi(25)
Out[7]:
20
In [8]:
EulerPhi(1234567891)
Out[8]:
1234567890
In [9]:
EulerPhi(6)
Out[9]:
2
In [10]:
def fastexp(x,n,m):
    z = 1
    while n!=0:
        if n%2==1:
            z = (z*x)%m
        x = (x*x)%m
        n = n//2
    return z

Exponentiation for hiding information

Let $p = 1234567891$ be a prime, and choose a number $m$ as the message to be sent.

Computing $m^{123876} \pmod{p}$ gives:

In [11]:
p = 1234567891
m = 58008 #a message for the other side
c = fastexp(m,1234567, p)
c
Out[11]:
1010044275
In [12]:
import matplotlib.pyplot as plt
ks = range(2,640)
y = [fastexp(m,k,p) for k in ks]
s = [5 for i in ks]

plt.scatter(ks,y,s)
plt.show()
<matplotlib.figure.Figure at 0x1f9a79f7710>

Cracking $m$ from $c$ with what is above.

Actually it is not very hard to solve $m$ given $c$ using the fact that $p$ is prime, and we have $$|(\mathbb{Z}/p\mathbb{Z})^*| = p-1$$ for a prime number $p$. Hence we have that $$ a^{p-1} \equiv 1 \pmod{p}.$$ This is because the order of any element $a$ divides the order of the group (that is $p-1$). So $a^p \equiv a \pmod{p}$ for all integers $a$, or in fact $a^{e} \equiv a \pmod p$ for any $e \equiv 1 \pmod {p-1}$.

In [13]:
#Let us bring the things we have written again. 

def bezout(n,m):
    (g,u,v) = (n,1,0)
    (e,s,t) = (m,0,1)
    r = g%e
    while r != 0:
        q = g//e
        (eprime, sprime,tprime) = (e,s,t)
        (e,s,t) = (r,u - q*s,v  - t*q)
        (g,u,v) = (eprime,sprime,tprime)
        r = g%e
    return (e,s,t)

def gcd(n,m):
    (g,u,v) = bezout(n,m)
    return g

def inverse(x,n):
    (g,u,v) = bezout(x,n)
    if g!=1:
        return None
    else:
        return (u%n)
In [14]:
e = 1234567
d = inverse(e, p-1)
d
Out[14]:
18033013
In [15]:
(e*d)%1234567890
Out[15]:
1
In [16]:
n = 10403
fastexp(fastexp(fastexp(fastexp(fastexp(fastexp(7580,5,10403),6,n),7,n),8,n),9,n),10,n)
Out[16]:
9798
In [17]:
fastexp(c,d,p)
Out[17]:
58008
In [18]:
gcd(9797,10403)
Out[18]:
101
In [19]:
for k in range(15):
    print("2^", k," =", fastexp(2,k,7), 'mod 7')
2^ 0  = 1 mod 7
2^ 1  = 2 mod 7
2^ 2  = 4 mod 7
2^ 3  = 1 mod 7
2^ 4  = 2 mod 7
2^ 5  = 4 mod 7
2^ 6  = 1 mod 7
2^ 7  = 2 mod 7
2^ 8  = 4 mod 7
2^ 9  = 1 mod 7
2^ 10  = 2 mod 7
2^ 11  = 4 mod 7
2^ 12  = 1 mod 7
2^ 13  = 2 mod 7
2^ 14  = 4 mod 7

We got $m$ back!!! Why was that, well here is the reason $$ c^{d} \equiv (m^e)^d \equiv m^{ed} \pmod p$$ and $ed \equiv 1 \pmod {p-1}$ and therefore $$c^{d} \equiv m^{ed} \equiv m \pmod {p}.$$

What if the modulus $n$ is not prime.

Then we can do calculation in $G = (\mathbb{Z}/n\mathbb{Z})^*$ however we don't necessarily know $|G| = \phi(n)$. If we do, then the above method also solves the equation for $m$: $$ m^e = \equiv c \pmod{n}.$$

Knowing $\phi(n)$ is easy if you know the factorization of $n$, which means that we are looking for numbers that are hard to factor. The smallest such numbers are of the form $$ n = pq$$ for primes $p$ and $q$.

In [20]:
q = 12345678910111
BruteForceFactor(q)
Out[20]:
[12345678910111]
In [21]:
n = p*q
n
Out[21]:
15241578775018915845901

Dont even try BruteForceFactor(n), it takes too long!!!

In [22]:
from random import randint as rdint

def RSAkeys(n,factors):
    e = rdint(n//4,n//2)
    while not all([gcd(e, p-1)==1 for p in factors]):
        e = rdint(n//4,n//2)
    order = EulerPhi(n,factors)
    d = inverse(e,order)
    return (e,d)

def RSAencrypt(m, n, e):
    c = fastexp(m,e,n)
    return c

def RSAdecrypt(c,n, d):
    m = fastexp(c,d,n)
    return m
In [23]:
(e,d) = RSAkeys(n,[p,q])
(e,d)
Out[23]:
(3937139939893578395651, 15184922774872748407451)
In [24]:
(p-1)*(q-1)
Out[24]:
15241578762672002367900
In [25]:
(e*d)%((p-1)*(q-1))
Out[25]:
1
In [26]:
c = RSAencrypt(58008, n, e)
c
Out[26]:
7399584266783735548498
In [27]:
RSAdecrypt(c,n,d)
Out[27]:
58008

The RSA protocole:

Alice chooses two primes $p,q$ and computes $n = pq$. Choses an $e$ such that $\operatorname{gcd}(e,(p-1)(q-1))$. Computes $d$ such that $ed \equiv 1 \pmod{\phi(n)}$.

Alice publishes $e, n$ and keeps $d$ to herself.

Bob has a message $m$, and computes $c \equiv m^e \pmod{n}$ at the privacy of his own office and sends Alice the cipher $c$.

Alice (since only she has knowledge of the secret key $e$) computes $c^e \pmod{n}$ which is equivalent to $m$.

Factorization is as hard as finding $\phi(n)$.

Note that if $n = pq$ then $\phi(n) = (p-1)(q-1) = pq - (p + q) + 1= n - (p + q) + 1$. Thus if we know $n$ and $\phi(n)$ then we can easily calculate $p + q = n- \phi(n) +1$, and knowing both $pq$ and $p+q$ we solve the quadratic equation $$X^2 + (p + q) X + pq = 0$$ in order to get $p$ and $q$.

The solution is unique

We are given $c \equiv m^{e} \pmod{n}$ and for $\operatorname{gcd}(e, (p-1)(q-1))=1$. So we are trying to find the $e$-th root of $c$ modulo $n$. We said that with $d$ given as $ed \equiv 1 \pmod \phi(n)$ then $x \equiv c^d\pmod{n}$ is a solution to $x^e \equiv c \pmod{n}$. It is also the only solution:

Assume $x$ were a solution, and that $de = 1 + k\phi(n) $ for some integer $k$, $$ \begin{align*} x &\equiv x^{de - k\phi(n) }\pmod{n} \\ &\equiv (x^e)^d (x^{\phi(n)})^{-k} \pmod{n}\\ &\equiv c^d 1^{-k} \equiv c^d \pmod{n}\end{align*} $$

(Note that if the exponent $e$ is not relatively prime to the order of the group, it is not necessarily true that $x^e \equiv c \pmod{n}$ is solvable for generic $c$. For example not every integer modulo $p$ is a square.)

Is this really as hard as factorization

It is generally assumed (especially by public) that the difficulty of breaking RSA is equivalent to factorization. Technically what one needs is to solve the equation $x^e \equiv c \pmod{n}$, which may or may not be strictly easier than the problem of factorizing $n$. The expert opinion is a bit more complicated:

  • D. Boneh and R. Venkatesan. Breaking RSA may not be equivalent to factoring. In Advances in Cryptology—EUROCRYPT ’98 (Espoo), volume 1403 of Lecture Notes in Comput. Sci., pages 59–71. Springer, Berlin, 1998.
  • Brown, D.R.L. J Cryptol Breaking RSA may be as difficult as factoring. Journal of Cryptology January 2016, Volume 29, Issue 1, pp 220–241.
  • Divesh Aggarwal, Ueli Maurer, Breaking RSA Generically Is Equivalent to Factoring Joux A. (eds) Advances in Cryptology - EUROCRYPT 2009. EUROCRYPT 2009. Lecture Notes in Computer Science, vol 5479. Springer, Berlin, Heidelberg

FERMAT's Test for Primality

If $p$ is a prime number then $(\mathbb{Z}/p\mathbb{Z})^*$ has $p-1$ integers and therefore the order of any nonzero element $a \pmod{p}$ divides $p-1$. In particular, $$ a^{p-1} \equiv 1 \pmod{p}.$$ Equivalently $$ a^{p}\equiv a \pmod{p}$$ for all integers $a$.

In [28]:
n = 329487987
fastexp(2,n, n)
Out[28]:
211367123
In [29]:
BruteForceFactor(n)
Out[29]:
[3, 19, 5780491]

Since $2^n \not\equiv 2 \pmod{n}$ we know that $n$ is not a prime.

In [30]:
n = 12801
fastexp(2,n,n)
Out[30]:
2
In [31]:
BruteForceFactor(n)
Out[31]:
[3, 17, 251]
In [32]:
fastexp(3,n,n)
Out[32]:
9693

As can be seen even though $n = 12801$ is not a prime, $2^n \equiv 2 \pmod{n}$. Yet the base $3$ comes to save the day, and is a witness that $n$ is composite.

In [33]:
def CompositeWitness(a,n):
    return fastexp(a,n,n)!=(a%n)

CompositeWitness(2,n)
Out[33]:
False
In [34]:
CompositeWitness(3,n)
Out[34]:
True

Let us see of all the composite numbers, how many of them are caught by various Fermat witnesses.

In [35]:
def CompositeList(N):
    return list(filter(lambda n: len(BruteForceFactor(n))>1, range(2,N)))

comp = CompositeList(10000)
len(comp)
comp[:20]
Out[35]:
[4, 6, 8, 9, 10, 12, 14, 15, 16, 18, 20, 21, 22, 24, 25, 26, 27, 28, 30, 32]
In [36]:
list(filter(lambda n: not CompositeWitness(2,n),comp))
Out[36]:
[341,
 561,
 645,
 1105,
 1387,
 1729,
 1905,
 2047,
 2465,
 2701,
 2821,
 3277,
 4033,
 4369,
 4371,
 4681,
 5461,
 6601,
 7957,
 8321,
 8481,
 8911]
In [37]:
list(filter(lambda n: not (CompositeWitness(2,n) or CompositeWitness(3,n)),comp))
Out[37]:
[561, 1105, 1729, 2465, 2701, 2821, 6601, 8911]
In [38]:
def CarmichaelNumbers(N):
    #Returns all the composite numbers n less than N such that
    #All a < n are non-witnesses for the compositeness of n
    comp = CompositeList(N)
    a = 2
    while (a < comp[-1]) and len(comp)>0:
        comp = list(filter(lambda n: not CompositeWitness(a,n), comp))
        a = a + 1
    return list(comp)

CarmichaelNumbers(10000)
Out[38]:
[561, 1105, 1729, 2465, 2821, 6601, 8911]

Carmichael Numbers are the wrenches in this scheme to determine primes.

Even though $561 = 3\times 11\times 17$, it is true that $a^n \equiv a \pmod{n}$ for all integers $a$. So we have a sufficient method for testing primality.

Theorem. All Charmichael numbers are of the form $n = p_1 p_2 \cdots p_k$ with $k>3$ such that the prime divisors $p_i$ are all distinct and $(p_i -1) | (n-1)$ for all $i$.

In [39]:
BruteForceFactor(561)
Out[39]:
[3, 11, 17]
In [40]:
X = range(10**8,10**8 + 100000)
X = filter(lambda n: not(CompositeWitness(2,n) or CompositeWitness(3,n)),X)
X = list(X)
print(len(X))
5412
In [41]:
for p in X:
    if len(BruteForceFactor(p))>1:
        print(str(p) + ' is not a prime, but pretends to be')
100017223 is not a prime, but pretends to be

Miller Rabin Test Comes to the Rescue

Let us test whether $n$ is a prime using the following algorithm. First factorize as $$n -1 = 2^k q$$ with $q$ an odd integer. Let $L_n$ be the set of nonzero integers $a$ less than $n$ such that either $$a^{2^kq} \equiv 1 \pmod{n}$$ or in the set $$a^q, a^{2q}, a^{2^2q}, \ldots, a^{2^{k-1}q}$$ of successive squares they are all congruent to $1$ or one element is congruent to $-1$ modulo $n$.

If $n$ is prime then $L_n = (\mathbb{Z}/N\mathbb{Z})^*$.

If $n$ is not prime then $|L_n| \leq (n-1)/4$. This means that for $n$ composite $\%75$ of the bases $a$ are Witnesses for the compositeness of $n$ using the Miller Rabin primality test. This means that a probabilistic argument can be made.

Partial Proof: If $n=p$ is a prime then since $a^{n-1} \equiv 1 \pmod {n}$ then eitber all integers in the sequence are equivalent to $1$ or in this list of succesive squares one integer $b \not \equiv 1 \pmod{p}$ and $b^2 \equiv 1 \pmod{p}$. This implies that $p | b^2 -1$ hence $p | (b-1)$ or $p | b + 1$. The first case would mean $b \equiv 1 \pmod{p}$, and hence we have $b \equiv -1 \pmod{p}$.

In [42]:
def powersOfTwo(n):
    """
    Returns (k,q) such that n = 2^k q
    """
    q = n
    k = 0
    while q%2==0:
        k = k + 1
        q = q//2
    return (k,q)

def MillerRabinWitness(a,n):
    """
    Tests whether a is a witness for the compositeness of n
    """
    g = gcd(n,a)
    if g > 1 and g < n:
        return True
    (k,q) = powersOfTwo(n-1)
    b = fastexp(a,q,n)
    if b==1:
        return False
    for i in range(k):
        if b==n-1:
            return False
        b = (b*b)%n
    return True

from math import log as log
from random import randint as randint

def MillerRabinPrimalityTest(n, precision = None):
    if precision == None:
        precision = min(1/n,1/1000)
    if n==2:
        return True
    if n%2==0:
        return False
    A = int(log(precision,3/4))+1
    for trial in range(A):
        a = randint(2,n-1)
        if MillerRabinWitness(a,n):
            return False
    return True

MillerRabinPrimalityTest(105,0.001)
Out[42]:
False
In [43]:
def FindPrimes(N,h):
    #Returns the list of (Miller-Rabin)--primes in the interval [N, N + h)
    X = range(N, N + h)
    X = list(filter(MillerRabinPrimalityTest, X))
    return X

FindPrimes(1000000000000000000000000000000,100)
Out[43]:
[1000000000000000000000000000057, 1000000000000000000000000000099]

Deterministic Miller-Rabin Test,

It can be shown that if one assumes the Riemann hypothesis, then if $n$ is not a prime, then an integer $a$ will be a witness for the compositeness of $a$ such that $$a \leq 2\ln(n).$$ This gives us a deterministic test for primality (dependent on Generalized Riemann Hypothesis).

In [ ]:
 

Pollard $p-1$ factorization algorithm

In [44]:
def Pollardp(n, B):
    a = 2
    k = max(1,B//1000) #this is a optimization technique, we don't check the gcd everytime
    for j in range(2,B):
        a = fastexp(a,j,n)
        if j%k ==0:
            g = gcd(a-1,n)
            if g >1 and g<n:
                return (g,n//g)
    return None

Pollardp(101*103,100)
Out[44]:
(101, 103)
In [45]:
B = 10**7

def BSmooth(n,B):
    #Checks whether n is B smooth or not
    for prime in range(2,B):
        while n%prime==0:
            n=n//prime
    if n ==1:
        return True
    else:
        return False

def BSmoothFinder(alist, B):
    for n in alist:
        if BSmooth(n-1,B):
            return n
    return None
    
primes = FindPrimes(223456789101,1000)
q = primes.pop()
p = BSmoothFinder(primes,B)
print('p = ',p, 'q = ', q)
n = p*q
n 
p =  223456789121 q =  223456790033
Out[45]:
49932936808059655630993
In [ ]:
 
In [46]:
Pollardp(n,B)
Out[46]:
(223456789121, 223456790033)
In [47]:
BruteForceFactor(p-1)
Out[47]:
[2, 2, 2, 2, 2, 2, 2, 5, 31, 71, 158633]
In [48]:
BruteForceFactor(q-1)
Out[48]:
[2, 2, 2, 2, 7, 643, 3102877]

Factorization Via Difference of Squares

If we find integers such that $a^2 \equiv b^2 \pmod n$ with $n = pq$ then we can use that since $$pq | (a-b) (a + b) $$ we may have that $p | (a + b)$ and $b | (a - b) $.

How to find such numbers.

Let $p$ and $q$ be as above.

In [49]:
n = 914387
In [50]:
from math import sqrt as sqrt
a = int(sqrt(n)) + 1
sample =0
while sample <5:
    x = (a*a)%n
    if BSmooth(x,13):
        print(a,"^2 =", BruteForceFactor(x), "mod n")
        sample = sample +1
    a = a + 1
1869 ^2 = [2, 2, 2, 2, 3, 5, 5, 5, 5, 5, 5] mod n
1909 ^2 = [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 11] mod n
3387 ^2 = [3, 5, 5, 5, 11, 11, 11] mod n
4586 ^2 = [3, 3, 5, 11] mod n
5023 ^2 = [2, 2, 2, 2, 2, 2, 2, 5, 7, 11, 11] mod n
In [51]:
a = (1869*1909*3387)%n
b = (2**9*3*5**5*11**2)%n
print(a,b)
9835 164255
In [52]:
gcd(a - b,n)
Out[52]:
1103
In [53]:
gcd(a + b,n)
Out[53]:
829
In [54]:
1103 * 829 ==n
Out[54]:
True
In [55]:
n = 9873417
In [56]:
from math import sqrt as sqrt
a = int(sqrt(n)) + 1
sample =0
while sample <10:
    x = (a*a)%n
    if BSmooth(x,20):
        print(a,"^2 =", BruteForceFactor(x), "mod n")
        sample = sample +1
    a = a + 1
3144 ^2 = [3, 7, 7, 7, 11] mod n
3177 ^2 = [2, 2, 2, 3, 7, 7, 11, 17] mod n
3487 ^2 = [2, 2, 2, 7, 7, 7, 7, 7, 17] mod n
3555 ^2 = [2, 2, 2, 2, 2, 2, 3, 7, 11, 11, 17] mod n
4578 ^2 = [2, 3, 5, 5, 5, 5, 17, 19] mod n
4806 ^2 = [2, 3, 7, 13, 17, 19, 19] mod n
4833 ^2 = [3, 5, 7, 7, 17, 17, 17] mod n
5499 ^2 = [2, 3, 3, 5, 5, 5, 5, 5, 11] mod n
5655 ^2 = [2, 3, 3, 3, 11, 11, 19, 19] mod n
6288 ^2 = [2, 2, 3, 7, 7, 7, 11] mod n
In [57]:
a = (3144*3177*3487)%n
b = (2**3*3**7**5*11*17)%n
print(a, b)
6315897 3814236
In [58]:
gcd(a + b, n)
Out[58]:
3
In [59]:
BruteForceFactor(n)
Out[59]:
[3, 23, 143093]
In [71]:
n = 81407
a = int(sqrt(n)) + 1
sample =0
while sample <10:
    x = (a*a)%n
    if BSmooth(x,12):
        print(a,"^2 =", BruteForceFactor(x), "mod n")
        sample = sample +1
    a = a + 1
407 ^2 = [3, 3, 3, 3, 5, 7] mod n
412 ^2 = [2, 3, 3, 5, 7, 11] mod n
435 ^2 = [7, 7, 7, 7, 11] mod n
484 ^2 = [2, 3, 3, 3, 3, 3, 3, 7, 7] mod n
638 ^2 = [3, 3] mod n
755 ^2 = [2, 2, 2, 2, 11] mod n
757 ^2 = [2, 2, 2, 2, 2, 2, 2, 5, 5] mod n
763 ^2 = [2, 2, 2, 2, 2, 5, 7, 11] mod n
777 ^2 = [2, 2, 2, 5, 7, 11, 11] mod n
814 ^2 = [2, 2, 3, 3, 3, 3, 5, 7] mod n
In [70]:
(503**2)%n
Out[70]:
8788
In [72]:
(313*484)%n
Out[72]:
70085
In [73]:
(2*3**3*7**2*13)%n
Out[73]:
34398
In [74]:
a = 78005 
b = 34398
In [75]:
n
Out[75]:
81407
In [76]:
a + b
Out[76]:
112403
In [78]:
gcd(a - b,n)
Out[78]:
1
In [83]:
a = (2*3**3*5*7**4*11*13)%n
In [84]:
b = (313*407*412*435)%n
In [85]:
print(a,b)
61444 14835
In [86]:
gcd(a + b,n)
Out[86]:
641
In [ ]: