MasFlam's Blog

The Tonelli-Shanks Algorithm, Explained

Modular arithmetic — perhaps surprisingly — has tons of practical applications, ranging from hashing strings to forming cryptosystems. And — maybe not so surprisingly — these involve solving equations which would normally be trivial in real numbers, but aren't nearly as simple in the discrete case. Such is the case with x^2 = a, aka finding the square root of a. That is exactly where the Tonelli-Shanks algorithm comes in. It solves x^2 = a in — pardon my math — any cyclic group G of even order, in Θ(log^2 |G|) group operations.

Sadly, the way it's often presented is convoluted both in terms of the pseudocode provided and the actual reasoning behind its correctness. As such, this article is my attempt at making sense of Tonelli-Shanks, in words that my past self from two years ago would understand.

The basics

I'll try not to overload the reader with too much mathematical notation and formalisms. That said, we do first need to introduce some basics. Feel free to skip over what you already know.

Modular arithmetic

Throughout this article, we'll be working with integers modulo a prime p. In simple terms, this means that after every operation we divide the result by p and leave just the remainder, effectively limiting ourselves to the numbers {0, ..., p-1}. For example, 2·2 = 4 ≡ 1 (mod 3). That last part just means we did this taking p = 3. From now on I'll omit the special symbols and write 2·2 = 1 where p is known from the context. Furthermore, in our example -4 ≡ 2, since -4 + 3 + 3 = 2; that is, negative numbers get reduced as well.
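
These reductions are easy to play with in Python, whose % operator already returns a remainder in {0, ..., p-1} for a positive modulus (the values below just mirror the example above):

```python
p = 3
print((2 * 2) % p)  # 4 reduces to 1
print(-4 % p)       # negative numbers reduce as well: -4 becomes 2
```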

An important property of multiplication modulo p is that for any non-zero a and b, it holds that a·b ≠ 0. This is somewhat crucial, so we will assume the number a in x^2 = a is never zero. We're not losing out on much anyway, since the only solution to x^2 = 0 is zero itself.

Generators and square roots

It turns out there always exists a number g such that if we keep multiplying it by itself (aka take g^1, g^2, g^3, ...), we will eventually get all the numbers {1, ..., p-1}. This g is called a generator.

It follows from the above that every number from {1, ..., p-1} can be written as g^i for some i in {1, ..., p-1}. Additionally, the number of elements in {1, ..., p-1} is p-1, which is even unless p = 2 (again, a trivial case: √0 = 0 and √1 = 1).

And these powers of g cycle around. That is, g^(p-1) becomes 1 again, g^p becomes g, etc. So the exponents sort of act like numbers modulo p-1. An important realization stems from this fact: only a = g^k for even k have square roots. This is because if k is odd, then were any integer i to satisfy x^2 = (g^i)^2 = g^(2i) = g^k = a, it would mean that 2i ≡ k (mod p-1), which can't be true since p-1 and 2i are even, and k is not. We call the numbers with even k quadratic residues, in contrast with quadratic nonresidues, which do not have a square root.
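
This is easy to see for a small prime; below, p = 7 and g = 3 (which happens to generate its multiplicative group) are purely illustrative choices:

```python
p, g = 7, 3
powers = [pow(g, i, p) for i in range(1, p)]  # g^1 .. g^(p-1)
print(powers)  # [3, 2, 6, 4, 5, 1] -- every number in {1, ..., 6} shows up
residues = sorted({pow(x, 2, p) for x in range(1, p)})
print(residues)  # [1, 2, 4] -- exactly the even powers of g: g^2, g^4, g^6
```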

Oh, and one more thing. You know how in real numbers √4 = 2? Well if you think about it, -2 is a perfectly valid solution too. We sort of arbitrarily choose the positive one. Among (non-zero) integers modulo p, every quadratic residue also has exactly two square roots, each being the negative of the other. Our algorithm will find one of them, and it's not really ever clear which one. If you badly need consistency, simply choosing the smaller one is an option.

Fast exponentiation

One last thing we'll need is an efficient way to calculate a^n modulo p. This is accomplished using a simple recursive algorithm often referred to as fast exponentiation or square-and-multiply:
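
In Python it can be sketched like this (the helper name fast_pow is mine; Python's built-in three-argument pow does the same job):

```python
def fast_pow(a, n, p):
    # Computes a^n mod p by squaring a^(n//2), and multiplying in
    # one extra factor of a when n is odd.
    if n == 0:
        return 1
    half = fast_pow(a, n // 2, p)
    result = half * half % p
    if n % 2 == 1:
        result = result * a % p
    return result
```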

Clearly, this yields the correct value of a^n. And because the value of n halves every (at most) two squarings, the runtime of this algorithm is 𝒪(log n).

The Tonelli-Shanks algorithm

With the preliminaries out of the way, let's get down to business. The Tonelli-Shanks algorithm is as follows:
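
In Python terms it can be stated as follows — a sketch assuming we are given a quadratic residue a, the prime p, and a quadratic nonresidue z (finding one is covered below); the comments note the two invariants the whole correctness argument rests on:

```python
def tonelli_shanks_sketch(a, z, p):
    # Invariants: a^s * z^t == 1 (mod p), and t is divisible by at
    # least one more factor of 2 than s.
    s, t = (p - 1) // 2, p - 1
    while s % 2 == 0:
        if pow(a, s // 2, p) * pow(z, t // 2, p) % p == 1:
            s, t = s // 2, t // 2
        else:
            s, t = s // 2, t // 2 + (p - 1) // 2
    # s is now odd, so (s + 1) // 2 is exact; this value squares to a.
    return pow(a, (s + 1) // 2, p) * pow(z, t // 2, p) % p
```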

So why on earth does this work? Let's go through the algorithm step by step.

Finding the nonresidue

For Tonelli-Shanks to work, we need a quadratic nonresidue z. Thankfully, finding one is easy. We can make use of a theorem called Euler's criterion:

If z^((p-1)/2) = 1, then z is a quadratic residue.
Otherwise, z^((p-1)/2) = -1, and z is a nonresidue.

I won't show it here, but the proof is not that elaborate and you can easily find it online.
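
Taking p = 13 as an arbitrary example, the criterion cleanly splits {1, ..., 12} in two:

```python
p = 13
split = {1: [], p - 1: []}  # the exponentiation can only give 1 or -1
for x in range(1, p):
    split[pow(x, (p - 1) // 2, p)].append(x)
print(split[1])      # quadratic residues: [1, 3, 4, 9, 10, 12]
print(split[p - 1])  # nonresidues: [2, 5, 6, 7, 8, 11]
```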

So we can check whether any given z is a nonresidue in 𝒪(log p) time using the fast exponentiation algorithm described above. And as it turns out, we can just keep choosing a random z until one passes our test. This works because exactly half of the numbers in {1, ..., p-1} are nonresidues, so each attempt succeeds with probability 1/2: on average we need just two attempts, and the chance of needing more than around log p of them is vanishingly small. And again, I'll leave the proof of that half-and-half split as an exercise to the reader.

In total, this part of the algorithm takes 𝒪(log p) time in expectation, and 𝒪(log^2 p) except with negligible probability.

Initial state

We start with s = (p-1)/2 and t = p-1, which as we can see satisfies our invariants:

a^s · z^t = (g^(2k))^((p-1)/2) · z^(p-1) = (g^k)^(p-1) · z^(p-1) = 1 · 1 = 1

And obviously, (p-1)/2 is divisible by one less factor of 2 than p-1.

The loop

First of all, if a^(s/2) · z^(t/2) = 1, then we can clearly use these halved values as the new s and t, keeping both our invariants.

The second case is more interesting. We can't just do the same thing, because a^(s/2) · z^(t/2) ≠ 1, which would break one of our invariants. But a^s · z^t = 1 is currently true. And if we realize that the only square roots of 1 are 1 and -1 (as there can only be two), then we know a^(s/2) · z^(t/2) = -1, since halving the exponents is exactly what taking a square root is. So if we multiply this number by another -1, we will get 1 again. And this is what adding (p-1)/2 to t/2 does, as z^((p-1)/2) = -1 per Euler's criterion:

a^(s/2) · z^(t/2 + (p-1)/2) = (a^(s/2) · z^(t/2)) · z^((p-1)/2) = (-1)·(-1) = 1

How about the other invariant? Can we be sure that adding (p-1)/2 won't cause the new t to be "less even" than the new s? Yes, because both t/2 and (p-1)/2 are divisible by at least one more factor of 2 than s/2 is, so their sum is too.

The result

The final answer, a^((s+1)/2) · z^(t/2), looks just as mysterious. Note that once the loop ends, s is odd, so (s+1)/2 is an integer. And you can verify the answer is correct by simply taking its square:

(a^((s+1)/2) · z^(t/2))^2 = a^(s+1) · z^t = a · a^s · z^t = a · 1 = a
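
As a tiny worked example (p = 13, a = 10, and the nonresidue z = 2 are arbitrary illustrative choices):

```python
p, a, z = 13, 10, 2  # 10 = 6^2 mod 13 is a residue; 2 is a nonresidue
s, t = (p - 1) // 2, p - 1                             # s, t = 6, 12
assert pow(a, s // 2, p) * pow(z, t // 2, p) % p == 1  # (-1) * (-1) = 1
s, t = s // 2, t // 2                                  # s, t = 3, 6; s is odd, loop ends
x = pow(a, (s + 1) // 2, p) * pow(z, t // 2, p) % p
print(x)                                               # -> 7
assert x * x % p == a                                  # 7^2 = 49 is indeed 10 mod 13
```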

Runtime complexity

This part is relatively simple. Each iteration involves two fast exponentiations, and because s becomes s/2 after each iteration, there can't be more than 𝒪(log p) iterations before all the twos get divided out. In total, this nets us an 𝒪(log^2 p) algorithm.

There is also the initial search for z, which is the only randomized part of Tonelli-Shanks. But unless we get hugely unlucky this will take a comparable amount of time, and for any given p we only need to find z once.

Finishing thoughts

That's it! As you can (hopefully) see, there is no magic involved. I wouldn't go as far as to call this algorithm intuitive, but you can surely make sense of it, even without any advanced math knowledge. And the implementation isn't hard either. Let's end with coding one up in Python:

import random

# These work for prime p > 2.
# Python's 3-argument pow performs fast exponentiation mod p.

def tonelli_shanks(a, z, p):
    s = (p - 1) // 2
    t = p - 1
    while s % 2 == 0:
        apow = pow(a, s // 2, p)
        zpow = pow(z, t // 2, p)
        azpow = apow * zpow % p
        if azpow == 1:
            s = s // 2
            t = t // 2
        else:
            s = s // 2
            t = t // 2 + (p - 1) // 2
    apow = pow(a, (s + 1) // 2, p)
    zpow = pow(z, t // 2, p)
    return apow * zpow % p

def is_nonresidue(x, p):
    return pow(x, (p - 1) // 2, p) == p - 1

def find_nonresidue(p):
    while True:
        z = random.randint(2, p - 1)
        if is_nonresidue(z, p):
            return z

Special thanks to tfkls for helpful feedback, as well as to dr. Lech Duraj for the original explanation given during his lecture.

#algebra #algorithms #discrete root #number theory