# HITCON CTF 2019 Quals - Not So Hard RSA (crypto)¶

The challenge consists of the following short code:

from Crypto.Util.number import *
from secrets import *

T = 10
def encrypt(data):
num = bytes_to_long(data)
p = getPrime(512)
q = getPrime(512)
n = p*q
assert num < n
phi = (p-1)*(q-1)
e = inverse(d,phi)
a = pow(num,e,n)
enc = long_to_bytes(a).encode('hex')
return (n,e,enc)

print d.bit_length()
for _ in xrange(T):
data = flag + urandom(40)
print encrypt(data)


We are given 10 different RSA moduli and encryptions of the flag under each of them. The only relation between them is the common private exponent $d$. From the output file we can see that it has 465 bits, i.e. it is very short (compared to 1024 bits - size of $n$). Consider the exponents equations of the RSA, with approximate bit size of elements noted (recall that $0 \le k_i < min(e_i, d)$):

$$\underbrace{e_i}_\text{1024 bits} \underbrace{d}_\text{465 bits} - 1 = k_i \phi(n_i) = \underbrace{k_i}_\text{465 bits}(\underbrace{n_i}_\text{1024 bits}\underbrace{-p_i-q_i+1}_\text{512 bits}) .$$$$\underbrace{k_in_i}_\text{1489 bits} - \underbrace{e_i d+1}_\text{1489 bits} = \underbrace{k_i(p_i+q_i-1)}_\text{977 bits}.$$

How much freedom do the unknowns have? Let's assume we fix $d$ arbitrarily. Then 465-bit freedom of $k$ would allow to get roughly $1489-465=1024$-bit value on the left. However, we know that it has only 977 bits (we consider it as an error/noise term). Therefore, we obtain a 47-bit costraint on $d$ from each equation. From 10 equations we can expect to recover all 465 bits of $d$!

This is just an intuition behind the sizes of the numbers, we also have to efficiently satisfy the constraints. Luckily, LLL works great with such linear equations.

Consider the following lattice (spanned by rows):

$$\Lambda = \begin{array}{c c} \begin{array}{c} d \\ k_1 \\ \vdots \\ k_{10} \end{array}\hspace{-1em} & \left( \begin{array}{@{} c c c c c c c @{}} -e_1 & \cdots & -e_{10} & 1 & 0 & \cdots & 0 \\ 1 & \cdots & 0 & 0 & 1 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & 1 & 0 & 0 & \cdots & 1 \\ \end{array} \right) \\ & \begin{array} {@{} c c c c c c c @{}} \hspace{0.7cm}v_1 & \cdots & v_{10} & \hspace{0.35cm}d & k_1 & \cdots & k_{10} \end{array} \\ \end{array}.$$

Note that all $e_i$ are in the same row since they correspond to terms with $e_id$ and $d$ is the same for all equations. Whereas all $k_i$ are different and each has its own row.

Consider the correct $k_i$ and $d$ and consider the linear combination descibed at the left side of the matrix. The corresponding vector in the lattice is equal to $$(v_1, \dots, v_{10}, d, k_1, \dots, k_{10}),$$ where $v_i = k_in_i-e_id$ is the small ("noise") 977-bit value. As a result, we obtain that the solution we look for correspond to the short vector in the lattice: all $v_i$, $d$ and all $k_i$ must be small. Note that $d$ and $k$ should be 465-bit values, while $v$ can be much larger: 977 bits. To encode this information, we scale the '$d$' and '$k_i$' columns by $2^{512}$ before applying LLL and scale back afterwards.

In [34]:
from __future__ import print_function, division
from sage.all import *
from Crypto.Util.number import *
from sock import Sock
class Stop(Exception): _render_traceback_ = lambda self: None

In [35]:
# chall
data = [
(52084595054768217522676979342755393306305099169414947960508049057119329537162079071100773540172780699842974838973453517862170568297372572652378982794735495836797191260523386985555538123219596180065060457770169661533302156160201909218943628670173938802399175928847179180982290051008255643478089280887936398779L, 15937444970326662770243305998311639639802081677519521519037892156989918335411343952028567001158781869582357356721336548356177945486289103616332672857562241745733313956089488246637981047367689313876169403573606972534488436195086998111247112950511965259280728899941655052549135516189815880662766290132607874671L, '15dbeced76ab710b84982e67a839846cfe38cfccce4dacc585df0e38d695e1c84eac7281d8c83b3c6eebfae7c27d91496b19de120374d08cbccc251a464c2f7bb10fb9d1c1f13b78f1fbfe0ce37d01350978dcb192c92db43560a9cd81481aa2e2d41a8c5b3c67d3e3ae4be50ddf37a5a0193da6f4befe71d5348bd7820f1b0a'),
(64885222129962661919689742957615146346855998523264787345799887210230045803423356080820154546050540728264000944692170729880161760807788699983366314269171423680987673788784023826885067996007133127166699735414920427997399513922777341273346196540175237091469555617281522541592407225697228219483666768990986901207L, 47625666889348051674457306461777570860477708163951567694477110559722255361844497439523737482859577232793119885759760321686098657292699849459852343686370029321619072315457562131610624443702007471950020751311592439673932509558946203816066868221203483167079506399876187240483994566182748627389022243608934595071L, '0f79419a86361d668ced50fbcfa521c5117f7b2f72c3cc248dba4b2b1c9e7dc3a32fa500f5167a0360aeefcb8daa973bc67b0537d641617c2d96d98ee27179de1644b480c120b2d14054db032b42af33b23f8182f72cc8e752d6f89d556800f637bc492bd8eb2fb294eee61bc3c55677066b6962c4d2a1f1896703d446d903f3'),
(130307595686638523389042871138355252466828093625912424932382409748014813151912412889866698513547588105179569464359905871041351306187955332098310075195714887141180801001191398677311128761953690562878908997464309620384635922453765468745206262477334243233099872249671186622159533482223573972729351847574480258349L, 15517881429792452627724339825487038872077384939873021432131971858090290889497162855962736652640866521963165312583164013911162455865793567436833813352448640307756786561236246631146444590597902995591016928432988304077340670079503467394770901559567924809126517090243024890436892801283564005789256977822007581731L, '69a8e0bb49427f3465c9f3a41a22e047bd348c886cdb07264a321f8b890bb48cd7a878e43c1eb4b2f496dafb50677b3eea032c8f7f2ee59695398c56cc3b183bb6e2f1ab2d5d633461a6592ca0c98c0f2dae4100d6aaaf0d1166bdd46beb68b07d9c9cf6c5a92ef7db019dc065b3a8300ca25b50cba51a3294d954c178bf3770'),
]
T = len(data)


Note that $v_i$ are typically a bit smaller, and we can reduce them by approximating $p+q$ by $2\sqrt{n}$. Therefore, we should scale the $d$ and $k_i$ columns by a bit smaller numbers. This is important, since the information amount that we obtain is quite tight. In fact, when scaling by $2^{512}$, the resulting basis does not contain the desired result. However, it should still be available in small linear combinations of short vectors, or e.g., by guessing a few bits of $d$ first.

Luckily, scaling by $2^{505}$ works well and we find the correct $d$ in 10-th element of the smallest basis vector.

In [37]:
m = matrix(ZZ, T+1, 2*T+1)

i = 0
m[0,T] = 1
for ni, ei, ci in data:
m[0,i] = -ei
# approximate p+q as 2sqrt(n), reduce a bit size of the noise
m[1+i,i] = (ni - 2*int(sqrt(ni)))
m[1+i,T+1+i] = 1
i += 1

Cd = 2**505
Ck = 2**506
for i in range(T):
m.set_column(i, m.column(i))
m.set_column(T+1+i, m.column(T+1+i) * Cd)
m.set_column(T, m.column(T) * Ck)

ml = m.LLL()

for i in range(T):
ml.set_column(i, ml.column(i))
ml.set_column(T+1+i, ml.column(T+1+i) / Cd)
ml.set_column(T, ml.column(T) / Ck)

print("Bit sizes:")
print([len(ZZ(v).bits()) for v in ml[0][:T]])
print([len(ZZ(v).bits()) for v in ml[0][T:T+1]])
print([len(ZZ(v).bits()) for v in ml[0][T+1:]])

n, e, ct = data[0]

for irow, row in enumerate(ml):
d = int(abs(row[T]))
if pow(2, e*d, n) == 2:
print("found d", "row", irow, ":", "d =", d)
ct = int(ct, 16)
flag = pow(ct, d, n)
print(long_to_bytes(flag))
break

Bit sizes:
[968, 966, 962, 967, 970, 971, 964, 970, 961, 970]
[465]
[465, 463, 459, 461, 465, 464, 462, 465, 461, 464]
found d row 0 : d = 52734116413990333859776618028885273976613159615407987318262228643225868562829470128730568170502025104920857931843837285163945541735508099991
"hitcon{recover_everything_by_amazing_LLL_algorithm!!}\n\x1c}w\x16\x1b\xbbHv\x85\x98X\xae\x83xH\xbeZ'\x83\x98o\xf3\xfe\xbcW\x16\x15\xb6\xd9\x1f\xcbG\xa0\x84\xbe\xbd\x1b\xba\xb6\xed"