Backdoor Hash Function

From MgmtWiki
Jump to: navigation, search

Solutions

Thomas Pornin

Here is a "backdoored" hash function:

Let p=2q+1 be a big prime of length 2048 bits, such that q is also prime. Let a be an integer of order q modulo p, i.e. a≠1 but aq=1(mod p); it can be shown that a=4 is always a valid solution. Let s be a (secret) integer between 1 and q−1 , and let b=as(modp).

Then define operation ϕ(x,y) such that input x is a sequence of 2048 bits, and y is a sequence of 2042 bits (not 2048); the output is a 2048-bit value:

Concatenate x

and y and split it back into u and v, such that u
and v both have length 2045 bits each:

u||v=x||y

Interpret u

and v as unsigned big integers, using some convention (e.g. big-endian). Note that both u
and v are lower than 22045

.

Compute: z=aubv(modp)

ϕ(x,y)

is the encoding of z over exactly 2048
bits (there again, using a fixed convention).

Note that ϕ(x,y)≠0 for all x

and y

.

Now that you have ϕ, define a hash function h as follows:

For input m , pad it with some "removable" padding sequence (e.g. a Merkle-Damgård compliant padding) so that the total length of m

is now a multiple of 2042

. Split m

into n
blocks of 2042
bits (denoted m1

, m2 , ... mn ). Define x0=0

(a sequence of 2048
bits, all of value 0

). For i=1

to n

, define: xi=ϕ(xi−1,mi) Define the hash output h(m)

to be SHA-256(xn

). Let's see the security of this construction:

Preimages: for a given t , finding m

such that h(m)=t
implies, in particular, finding some xn
such that SHA-256(xn

) =t , so our function h

is at least as much preimage-resistant as SHA-256.

Second preimages: second preimages are a special case of collisions, so h

will resist second preimages at least as well as it resists collisions. Ideally we would want something better (i.e. resistance up to 2256
for second preimages, even though collisions can be found in effort 2128

), but since collisions should be infeasible anyway, this is probably acceptable.

Collisions: suppose that m≠m′

and h(m)=h(m′)
m
yields a sequence of n
words x1
to xn

, while m′

yields x′1
to x′n′

. We suppose, without loss of generality, that n′≥n . Then one of the following holds:

xn≠x′n′

but SHA-256(xn

) =

SHA-256(x′n′

). We have found a collision on SHA-256.

There is an index i

such that xi−1≠x′i−1
but xi=x′i

. This means that we know integers of at most 2045

bits u

, v , u′

and v′
such that (u,v)≠(u′,v′)

, but aubv=au′bv′(modp) . This implies that au−u′=bv′−v(modp) . Since a

and b
have order q
(greater than 22046

) and u−u′

and v−v′
cannot be both 0

, then u−u′

and v′−v
are invertible modulo q

, and we can write b=ag(modp)

with g=(u−u′)/(v′−v)(modq)
(i.e. g
is the secret value s

). We just solved discrete logarithm.

There is an index j

such that n<j≤n′
and x′j=xn

. By "rewinding" the steps this implies that either we find an instance of the previous situation (aubv=au′bv′

for (u,v)≠(u′,v′)

), or we have x′j−n=x0 , which is not possible since x0=0

and x′j−n=ϕ(x′j−n−1,mj−n)≠0

.

Therefore, finding a collision on h

requires either finding a collision on SHA-256, or solving discrete logarithm modulo p

. Since both are computationally infeasible (as far as we know), this function h

can be deemed "secure" as a hash function.

The backdoor is the knowledge of s . In all of the above, we used the a

and b
values "as is". The value of s
is not needed anywhere. However, if you know s

, then you can compute collisions at will:

Generate two random blocks m1

and m′1

. Compute the corresponding x1

and x′1

. Let u

be the first 2045
bits of x1

, and u′

be the first 2045
bits of x′1

. Compute k=(u−u′)/s(modq) If k

is less than 22045

, or greater than q−22045 , then you can find v

and v′
of 2045
bits each, such that v′−v=k(modq)

, from which you can infer two blocks m2

and m′2
that match (u,v)
and (u′,v′)

, respectively. You have your collision. The condition on k

works with probability at least 1/2

. If you were unlucky, just try again with new random blocks m1

and m′1

. The above is thus a hash function with a backdoor. However, mind the fine print:

The backdoor is explicit. While the knowledge of s

is necessary to exploit the backdoor, everybody knows that it is there. Conversely, if you generate a
and b
as NUMS numbers (thus with s
convincingly unknown to everybody), then the hash function is "backdoorless" (but also useless).

When you exploit the backdoor, you produce two colliding messages m

and m′

, from which anybody can recompute s . Thus, the backdoor is "one-shot".

The backdoor allows finding collisions, not preimages. A hash function designed to have a backdoor for preimages would need something else. Ideally you would use a one-way trapdoor permutation, but (to my knowledge) no such object is currently known, that would yield an output of suitable length (e.g. 256 bits).