Building enough mathematical knowledge for RSA (named for its inventors Ron Rivest, Adi Shamir and Leonard Adleman) to make sense gives you many useful concepts and insights along the way.
Not that RSA is by any means the only cryptographic algorithm we care about. On the contrary, RSA is relatively new, and we may build our concepts and insights by exploring the field of cryptography more generally.
We should start with one of the oldest algorithms in history, called Euclid's Algorithm (EA). Euclid probably got it from some earlier source. It has applications far beyond cryptography.
Python will help us by expressing this and other algorithms succinctly, and we'll be able to test them interactively.
However, before we get to Euclid's Algorithm (EA) and then Euclid's Extended Algorithm (EEA), we should review the basic concepts of prime versus composite positive integers.
A prime is a positive integer greater than 1 with no divisors other than itself and 1. Dividing it by any other integer greater than 1 always leaves some remainder.
Composites are products of two or more prime factors and make up the rest of the positive integers.
All positive integers except 1 are either composite or prime. Mathematician J. H. Conway suggested we make -1 a prime, as then negative integers could also be reduced to prime factors.
Discovering whether a giant integer is composite or prime is not always easy. A huge composite with only two large prime factors has no small divisors, so a naive search for factors can run for ages without finding one, and actually finding those factors can be really hard. Absent any factorization, the number might be prime after all, right?
In fact, RSA depends on the difficulty of factoring: if such a composite is large enough, finding its two prime factors is next to impossible in practice.
Talking about factoring numbers gets us thinking about remainders, and whether there is one. We say m divides n "evenly" not because m goes into n an even number of times (which may be true too) but because there's no remainder: m divides n with nothing left over. That makes m a factor of n.
12 % 3 # no remainder, 3 divides 12 evenly
0
12 % 7 # yes remainder, 7 is not a factor of 12
5
The built-in divmod function (no need to import it from anywhere) returns a tuple with two pieces of information: for divmod(a, b), how many times b goes into a, and the remainder after doing so.
divmod(100, 12)
(8, 4)
12 goes into 100 eight times, leaving a remainder of 4.
By definition, if divmod(n, m) returns (q, r), then the 2nd argument $m$ times the first output $q$, plus the 2nd output $r$, equals the 1st argument $n$: $n = q \cdot m + r$, with $0 \le r < m$ when $m$ is positive.
Think about it for a while. We're looking at lots of moving parts.
$q$ stands for quotient and $r$ for remainder. divmod(n, m) returns (n//m, n%m).
def _divmod(n, m):
    return (n//m, n%m)
_divmod(28398, 747)
(38, 12)
def always_true(n, m):
    q, r = divmod(n, m)
    print((q, r))
    return n == q * m + r  # should always be True
always_true(28398, 747) # try a bunch of examples
(38, 12)
True
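To try a bunch of examples at once, here is a small sketch that tests many random pairs, including negative n. The function name check_divmod is hypothetical, not part of the primes package. It also checks the bound 0 <= r < m: Python's // floors the quotient, so the remainder takes the sign of m and never goes negative when m is positive.

```python
import random

def check_divmod(trials=1000, seed=0):
    """Verify n == q*m + r and 0 <= r < m for many random (n, m) pairs."""
    rng = random.Random(seed)            # seeded, so runs are repeatable
    for _ in range(trials):
        n = rng.randint(-10**6, 10**6)   # allow negative dividends
        m = rng.randint(1, 10**4)        # positive divisors only
        q, r = divmod(n, m)
        assert n == q * m + r
        assert 0 <= r < m                # floor division keeps r non-negative here
    return True

print(check_divmod())   # True
print(divmod(-7, 3))    # (-3, 2), since -7 == -3 * 3 + 2
```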
import primes # a package (has __init__.py)
dir(primes) # not everything is exposed
['PrimeNumbers', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', 'all_factors', 'eratosthenes', 'euler_test', 'factors', 'invmod', 'isprime', 'primes_gen', 'primesplay', 'xgcd']
primes.factors(100)
(1, 2, 2, 5, 5)
primes.all_factors(100)
[1, 2, 4, 5, 10, 20, 25, 50, 100]
p = primes.primes_gen.PrimeNumbers()
[next(p) for _ in range(20)]
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71]
Source code for the PrimeNumbers iterator.
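The PrimeNumbers source lives in the repo's primes package and isn't reproduced here. As a stand-in, and not necessarily how PrimeNumbers works, here is a sketch of a hypothetical generator prime_numbers() using an incremental Sieve of Eratosthenes: a dictionary maps each upcoming composite to the primes that will "cross it out".

```python
from itertools import count, islice

def prime_numbers():
    """Yield 2, 3, 5, ... forever using an incremental sieve (a sketch)."""
    composites = {}                      # next known composite -> primes that divide it
    for n in count(2):
        if n not in composites:
            yield n                      # nothing crossed n out, so n is prime
            composites[n * n] = [n]      # n's first multiple not already covered
        else:
            for p in composites.pop(n):  # move each witness prime to its next multiple
                composites.setdefault(n + p, []).append(p)

print(list(islice(prime_numbers(), 20)))
```

Like the PrimeNumbers iterator, this yields primes lazily, so you only pay for as many as you ask for.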
Coprimes are not necessarily prime themselves. Two numbers are coprime if they have no factor in common other than 1. We also call them "strangers" in that case.
For example, 10 and 7 are coprime, but 27 and 15 are not: those two share the factor 3.
So if the greatest common divisor (GCD) of two positive integers is greater than 1, then these two integers are not coprime; they have some factor in common. If the greatest number that divides evenly into both of them is 1, then they're "strangers" to one another.
Clearly it would be useful to have a sure-fire way to obtain the GCD of any two positive integers.
One way we're taught to find the GCD in school is to get the prime factors of two numbers m and n, to find out which factors they have in common.
The product of all the prime factors they share gives the GCD, where a repeated prime is counted as many times as it appears in both factorizations.
factors = primes.factors
from operator import mul, add
from functools import reduce
def gcd(a, b):
    p_b = list(factors(b))
    common_ab = []
    for p_a in factors(a):
        if p_a in p_b:              # p_a also appears among b's factors
            common_ab.append(p_a)
            p_b.remove(p_a)         # use each of b's factors at most once
    return reduce(mul, common_ab, 1)  # product of all primes in common (1 if none)
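The primes package is local to this repo. If you don't have it handy, here's a self-contained sketch of the same school method, with trial division standing in for primes.factors; prime_factors and gcd_by_factoring are hypothetical names. A collections.Counter records each prime's exponent, and the & operator on two Counters keeps the smaller count of each key, which is exactly "the factors they have in common".

```python
from collections import Counter
from functools import reduce
from operator import mul

def prime_factors(n):
    """Factor n > 0 by trial division; return a Counter {prime: exponent}."""
    counts = Counter()
    d = 2
    while d * d <= n:
        while n % d == 0:
            counts[d] += 1
            n //= d
        d += 1
    if n > 1:
        counts[n] += 1      # whatever is left over is itself prime
    return counts

def gcd_by_factoring(a, b):
    """Multiply the shared primes, each raised to the smaller of its two exponents."""
    shared = prime_factors(a) & prime_factors(b)   # Counter & keeps the minimum count
    return reduce(mul, (p ** e for p, e in shared.items()), 1)  # empty product is 1

print(gcd_by_factoring(100, 12))   # 4
print(gcd_by_factoring(27, 15))    # 3
print(gcd_by_factoring(10, 7))     # 1
```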
One issue with this method is that prime factorization gets to be difficult with larger numbers.
Euclid's Algorithm, introduced below, does not suffer from this deficiency.
def euclid(a, b):
    while a % b:            # when the remainder is 0, b is the gcd
        b, a = a % b, b     # (a, b) becomes (b, a % b): the pair keeps shrinking
    return b
euclid(5, 12)
1
euclid(27, 15)
3
euclid(10, 7)
1
So 10 and 7 are strangers.
def strangers(a: int, b: int) -> bool:
    return euclid(a, b) == 1
strangers(10, 7)
True
Now that we have a working GCD function (or we could just import gcd from math), let's define the "totatives" of a number n to be all the positive integers less than n that are coprime to n, including 1. Remember, coprimes to n are not necessarily prime themselves; they just share no factor greater than 1 with n.
A quick way to compute totatives is with a list comprehension.
def totatives(n):
    return [m for m in range(1, n) if strangers(m, n)]
totatives(12)
[1, 5, 7, 11]
print(totatives(29))
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28]
If every positive number < n is coprime to n (and n > 1), then n itself is a prime number.
The totient of n is the number of totatives of n. This concept will come in handy when we look at Euler's Theorem, a generalization of Fermat's Little Theorem.
Since we have a function for totatives already, counting them will give us the info.
def totient(n):
    return len(totatives(n))
totient(12)
4
totient(100)
40
The lowercase Greek letter $\phi$ is traditionally used for the totient (hence the name "Euler's phi function"). We might provide that as a synonym:
𝜙 = totient
𝜙(12)
4
The totient of a prime p is always (p - 1), because every number from 1 up to p - 1 is coprime to p.
𝜙(17)
16
𝜙(593)
592
N = 593 * 17
N
10081
𝜙(N)
9472
𝜙(593) * 𝜙(17)
9472
def check(a, b):
    """
    if a, b are coprime, then 𝜙(a) * 𝜙(b) == 𝜙(a * b)
    """
    if not strangers(a, b):
        print("Not coprime")
        return
    return 𝜙(a) * 𝜙(b) == 𝜙(a * b)
check(100, 12)
Not coprime
def check2(a, b):
    """
    for any a, b, a more general identity:
    𝜙(a) * 𝜙(b) * gcd(a, b) == 𝜙(a * b) * 𝜙(gcd(a, b))
    """
    return 𝜙(a) * 𝜙(b) * gcd(a, b) == 𝜙(a * b) * 𝜙(gcd(a, b))
check2(100, 12)
True
Another way to compute the totient is Euler's product formula:
$$ \phi(n) = n \prod_{p \mid n} \left(1 - \frac{1}{p}\right) $$
Here $p \mid n$ means "p is a prime factor of n" (p divides n evenly), and we only want the distinct ones. For example, in computing the totient of 100, we would have only the prime factors 2 and 5, each used once. The $\prod$ means to form the product of the terms $(1 - 1/p)$, one for each distinct prime factor.
We will translate this typography into a Python program below.
100 * (1 - 1/2) * (1 - 1/5)
40.0
To keep the computation out of floating point, we may use Fraction objects and, realizing the answer is a whole number, grab just the numerator at the very end (the denominator will be 1).
from fractions import Fraction
def tot(n):
    product = Fraction(1, 1)
    for p in set(primes.factors(n)[1:]):  # throw away 1 as a factor (not prime)
        product *= (Fraction(1, 1) - Fraction(1, p))
    return (n * product).numerator
tot(100)
40
tot(N)
9472
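If you'd rather avoid Fraction altogether, the same product formula can be evaluated entirely in integers: each distinct prime p still divides the running result, so multiplying by (1 - 1/p) is the same as result // p * (p - 1). This sketch uses a hypothetical name, tot_int, and factors n by trial division instead of calling primes.factors.

```python
def tot_int(n):
    """Euler's product formula for n >= 1 using only integer arithmetic."""
    result = n
    m, p = n, 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:        # strip every copy of p; the formula uses p once
                m //= p
            result = result // p * (p - 1)
        p += 1
    if m > 1:                        # at most one prime factor above sqrt(n) remains
        result = result // m * (m - 1)
    return result

print(tot_int(100))        # 40
print(tot_int(593 * 17))   # 9472
```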
GCD and LCD (lowest common denominator) are often used, especially when simplifying and adding fractions. To get the fraction $m/n$ in lowest terms, one divides both $m$ and $n$ by gcd(m, n).
Clearly gcd is a workhorse at the core of our numeric computations.
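As a small illustration of gcd doing that job, here is a hypothetical helper lowest_terms(m, n) built on math.gcd. Python's own Fraction type performs the same reduction automatically when a fraction is constructed.

```python
from fractions import Fraction
from math import gcd

def lowest_terms(m, n):
    """Return (m, n) with their greatest common divisor divided out."""
    g = gcd(m, n)
    return m // g, n // g

print(lowest_terms(28, 42))   # (2, 3)
print(Fraction(28, 42))       # 2/3, reduced automatically
```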
How does Euclid's Method get the job done?
Let's break it down, step by step.
If I'm looking for the greatest common divisor of a and b, I should first see whether a divides b or b divides a with no remainder. If $b > a$, then divmod(a, b) is (0, a).
divmod(4, 12)
(0, 4)
This means a and b will swap in the next line:
a = 4
b = 12
print(a, b)  # before
if a % b:
    b, a = a % b, b
print(a, b)  # after
4 12
12 4
Therefore, we don't really care whether a > b or a < b at the start.
Going forward, whenever there's a remainder, the question becomes "what divides both this new remainder and the smaller of the two numbers just compared?"
In other words, gcd(a, b) equals gcd(b, a % b): anything that divides both a and b also divides the remainder a - q·b, and anything that divides both b and that remainder also divides a = q·b + remainder. So the problem keeps transferring to a smaller pair, and the quantities get smaller and smaller.
The process stops as soon as b divides a with no remainder, and that b is the GCD. If b has shrunk all the way to 1, which divides every integer, then gcd(a, b) is 1. Remember, if gcd(a, b) is 1, then a and b are coprime.
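To watch the problem shrink, here is a tracing variant of euclid (the name euclid_trace is hypothetical) that prints each gcd(a, b) = gcd(b, a % b) step before returning the same answer.

```python
def euclid_trace(a, b):
    """Euclid's Algorithm, printing each reduction gcd(a, b) -> gcd(b, a % b)."""
    while a % b:
        print(f"gcd({a}, {b}) = gcd({b}, {a % b})")
        a, b = b, a % b            # same update as euclid: the pair shrinks
    print(f"{b} divides {a} evenly, so the GCD is {b}")
    return b

euclid_trace(679, 301)
```

For 679 and 301 it passes through the pairs (301, 77), (77, 70) and (70, 7), then stops, since 7 divides 70 evenly.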
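We may define the least common multiple of a and b, which we'll call lcd(a, b) because it serves as the lowest common denominator of fractions with denominators a and b,
as (a * b) // gcd(a, b),
i.e. their product, with the factors they have in common counted only once.
The LCD is the smallest positive number that both $a$ and $b$ divide into evenly (without remainder).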
gcd = euclid  # make a synonym
def lcd(a, b):
    return (a * b) // gcd(a, b)
r = lcd(679, 301)
r
29197
r//301, r//679
(97, 43)
Suppose we want to find the greatest common divisor of a whole long list of numbers? Ditto the LCD. The idea makes sense. We can use the reduce function in functools.
reduce eats the first two items of the sequence, gets a result, then combines that result with the next item, and so on: a cumulative strategy. Think of adding or multiplying a whole list of numbers together.
? reduce
Docstring:
reduce(function, sequence[, initial]) -> value
Apply a function of two arguments cumulatively to the items of a sequence, from left to right, so as to reduce the sequence to a single value. For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates ((((1+2)+3)+4)+5). If initial is present, it is placed before the items of the sequence in the calculation, and serves as a default when the sequence is empty.
Type: builtin_function_or_method
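To make the cumulative strategy concrete, here is a plain-loop sketch of the behavior that docstring describes. my_reduce is a hypothetical name for illustration; in real code you'd just use functools.reduce.

```python
from math import gcd
from operator import add

_MISSING = object()  # sentinel: distinguishes "no initial value" from initial=None

def my_reduce(function, sequence, initial=_MISSING):
    """Fold sequence from left to right with a two-argument function."""
    it = iter(sequence)
    if initial is _MISSING:
        try:
            acc = next(it)              # no initial value: start from the first item
        except StopIteration:
            raise TypeError("my_reduce() of empty sequence with no initial value") from None
    else:
        acc = initial                   # initial is placed before the items
    for item in it:
        acc = function(acc, item)       # combine the running result with the next item
    return acc

print(my_reduce(add, [1, 2, 3, 4]))       # 10
print(my_reduce(gcd, [25, 27, 32, 17]))   # 1
print(my_reduce(add, [], 0))              # 0
```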
reduce(add, [1,2,3,4]) # add(add(add(1, 2), 3), 4)
10
reduce(mul, range(1, 10)) # factorial 9!
362880
import math
from math import factorial
factorial(9)
362880
reduce(gcd, [25, 27, 32, 17])
1
reduce(lcd, [25, 27, 32, 17])
367200
This topic is introduced in the Pascal's Triangle Notebook as well as in the Sympy Notebook. We're revisiting it here as an application of reduce along with gcd.
Let's define a function to give the nth row of Pascal's Triangle.
The Binomial Theorem tells us that row n consists of the coefficients $\binom{n}{k}$ of $(x + y)^n$, for k = 0 through n. Our goal is to introduce another primality test, associated with a relatively new algorithm called AKS after its discoverers (similar to how RSA is named for a 3-person team).
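The code that follows computes each entry from factorials via binom. As an independent cross-check, a row can also be built purely by addition, the way Pascal's Triangle is usually drawn: each interior entry is the sum of the two entries above it. pascal_row is a hypothetical helper, not part of the primes package.

```python
def pascal_row(n):
    """Row n of Pascal's Triangle (row 0 is [1]), built by repeated addition."""
    row = [1]
    for _ in range(n):
        # each interior entry is the sum of the two entries above it
        row = [1] + [a + b for a, b in zip(row, row[1:])] + [1]
    return row

print(pascal_row(5))          # [1, 5, 10, 10, 5, 1]
print(pascal_row(19)[:5])     # [1, 19, 171, 969, 3876]
```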
import sys
sys.version[:3]
'3.7'
from math import gcd as euclid
try:
    from math import comb as binom  # math.comb is included in Python 3.8 up
except ImportError:
    def binom(n, k):
        return math.factorial(n) // math.factorial(k) // math.factorial(n - k)
p = 19
coeffs = [binom(p, k) for k in range(0,p+1)]
coeffs
[1, 19, 171, 969, 3876, 11628, 27132, 50388, 75582, 92378, 92378, 75582, 50388, 27132, 11628, 3876, 969, 171, 19, 1]
Note the symmetry here. With an odd number as input, the row has an even number of entries, so the two terms at the center repeat. For our primality test, we don't need to test any coefficient more than once. The theorem behind AKS states that, for n > 1, n divides every coefficient in row n of Pascal's Triangle (other than the 1s at each end) if and only if n is prime.
coeffs[1:p//2 + 1] # just the left side, skipping the leading 1 but keeping coefficient p
[19, 171, 969, 3876, 11628, 27132, 50388, 75582, 92378]
reduce(euclid, coeffs[1:p//2 + 1])
19
If the gcd is p itself, then we know p is a divisor of all the coefficients in question, and therefore p is prime.
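Why does a prime p divide every coefficient between the two 1s? A short argument, for 0 < k < p:

$$ p! = \binom{p}{k}\, k!\, (p-k)! $$

The prime p divides the left side. On the right, every factor in $k!$ and in $(p-k)!$ is smaller than p, so the prime p divides neither of them; by Euclid's lemma (a prime that divides a product divides one of its factors), p must divide $\binom{p}{k}$. The reverse direction, that a composite n fails to divide at least one coefficient in its row, needs a separate argument.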
See the YouTube Gallery for a video on this primality test. The AKS test itself is a different but related algorithm.
def primality_test(c):
    """
    c > 1 divides evenly into the coefficients of the cth
    row of Pascal's Triangle (other than the 1s at each end)
    if and only if c is prime
    """
    coeffs = [binom(c, k) for k in range(1, c//2 + 1)]
    return c == reduce(euclid, coeffs)  # is the gcd the candidate prime itself?
primality_test(11)
True
primality_test(17)
True
print(list(filter(primality_test, [2, *range(3, 200, 2)])))
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199]
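As a sanity check, this sketch compares the binomial criterion with plain trial division over a range of n. It is self-contained, assumes Python 3.8+ for math.comb, and uses the hypothetical names binomial_test and is_prime_trial. The middle entries of row n grow roughly like 2**n, so the binomial test is a nice theorem to play with rather than a practical way to check large candidates.

```python
from functools import reduce
from math import comb, gcd   # math.comb needs Python 3.8+

def binomial_test(c):
    """For c > 1: c is prime iff c divides C(c, k) for every 0 < k < c."""
    if c < 2:
        return False
    # by symmetry, the left half of the row (skipping the leading 1) is enough
    return reduce(gcd, (comb(c, k) for k in range(1, c // 2 + 1))) == c

def is_prime_trial(n):
    """Reference check: trial division by every d with d*d <= n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

mismatches = [n for n in range(1, 300) if binomial_test(n) != is_prime_trial(n)]
print(mismatches)   # []
```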
To be continued...
Note to readers: in the early days, simple links such as the above returned you to this very repo, but GitHub will now redirect you somewhere else. To use this repo effectively, you're encouraged to clone it locally and open it in JupyterLab.