FFT based multiplication of large numbers


Multiplication of large numbers of n digits can be done in time O(n log(n)) (instead of O(n^2) with the classical algorithm) thanks to the Fast Fourier Transform (FFT). This page presents this technique along with practical considerations.

Basic definitions and notations about large number representation can be found in Arbitrary precision computation.

1  Introduction

Two large integers X and Y with at most n digits in base B can be written in the form

$$X = P(B), \qquad Y = Q(B)$$

where B is the base (usually B = 10 or a power of 10) and P and Q are the two polynomials

$$P(z) = \sum_{j=0}^{n-1} x_j z^j, \qquad Q(z) = \sum_{j=0}^{n-1} y_j z^j.$$

Denoting by R(z) the product of the polynomials P(z) and Q(z), we have XY = R(B), and a final rearrangement of the coefficients of R(z) (releasing the carries) yields the product XY. Thus we are led to the problem of multiplying two polynomials of degree < n.
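
For example, with B = 10, X = 123 and Y = 45: P(z) = 3 + 2z + z^2 and Q(z) = 5 + 4z, so R(z) = 15 + 22z + 13z^2 + 4z^3, and R(10) = 15 + 220 + 1300 + 4000 = 5535 = XY; releasing the carries in the coefficient sequence (15, 22, 13, 4) gives the digits 5, 3, 5, 5 (least significant first) of 5535.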

A polynomial of degree < m is uniquely determined by its values at m distinct points (via Lagrange interpolation, for example). Thus, to obtain the product R(z) = P(z)Q(z), which has degree < 2n, it suffices to compute the values R(ω_k) = P(ω_k)Q(ω_k) at 2n distinct points ω_k. The Fast Fourier Transform (FFT) idea consists in choosing for the ω_k the complex roots of unity

$$\omega_k = \exp\left(\frac{2ik\pi}{2n}\right) = \omega^k, \qquad \omega = \exp\left(\frac{2i\pi}{2n}\right).$$
This choice of the ω_k gives two properties:

- the 2n values P(ω_k) (and likewise Q(ω_k)) can be computed quickly, in O(n log n) operations, by the Fast Fourier Transform described in section 1.2;
- the coefficients of R(z) can be recovered from the 2n values R(ω_k) by a transform of the same kind.

The second point results from the fact that the k-th coefficient r_k of R(z) satisfies

$$r_k = \frac{1}{2n}\, T\left(\overline{\omega^k}\right), \qquad T(z) = \sum_{j=0}^{2n-1} R(\omega^j)\, z^j,$$

which easily follows from inverting a double summation. In Fourier transform terms, the coefficients of R(z) are obtained by applying the conjugate Fourier transform to the values R(ω^j).
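
To make the double-summation inversion explicit (a short check, using the fact that $\sum_{j=0}^{2n-1} \omega^{j(m-k)}$ equals 2n when m = k and 0 otherwise):

$$T\left(\overline{\omega^k}\right) = \sum_{j=0}^{2n-1} \left( \sum_{m=0}^{2n-1} r_m \omega^{jm} \right) \omega^{-jk} = \sum_{m=0}^{2n-1} r_m \sum_{j=0}^{2n-1} \omega^{j(m-k)} = 2n\, r_k.$$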

1.1  Fourier transform

The previous tasks all reduce to the following problem: given a sequence A = (a_0, a_1, …, a_{2n-1}), compute its Fourier transform

$$F_{2n}(A) = (b_0, b_1, \ldots, b_{2n-1}), \qquad b_k = \sum_{j=0}^{2n-1} a_j \omega^{jk}. \qquad (*)$$
The property that allows retrieving R(z) from the values R(ω^j), i.e. from the Fourier transform F_{2n}(R) of its coefficient sequence, takes the form

$$\overline{F}_{2n}(F_{2n}(R)) = 2n\, R,$$

where the conjugate Fourier transform is

$$\overline{F}_{2n}(A) = (b_0, b_1, \ldots, b_{2n-1}), \qquad b_k = \sum_{j=0}^{2n-1} a_j \omega^{-jk}.$$

1.2  Fast Fourier Transform

The Fast Fourier Transform (FFT) is a way to compute the Fourier transform of a sequence A in time O(n log n), instead of O(n^2) with the classical method, when n is a power of 2.

With the notation of (*), the principle is to write

$$b_k = \sum_{j=0}^{2n-1} a_j \omega^{jk} = \sum_{j=0}^{n-1} a_{2j}\, (\omega^2)^{jk} + \omega^k \sum_{j=0}^{n-1} a_{2j+1}\, (\omega^2)^{jk}.$$

In other words, to compute the coefficients b_k of F_2n(A), perform the following steps:

- split A by parity into A_even = (a_0, a_2, …, a_{2n-2}) and A_odd = (a_1, a_3, …, a_{2n-1});
- compute the two half-size transforms F_n(A_even) and F_n(A_odd), noting that ω^2 is the n-th root of unity associated with size n;
- combine them: for 0 ≤ k < n, b_k = F_n(A_even)_k + ω^k F_n(A_odd)_k and b_{k+n} = F_n(A_even)_k − ω^k F_n(A_odd)_k (using ω^{k+n} = −ω^k).

Thus the cost T(2n) of computing F_2n(A) with the FFT satisfies T(2n) = 2T(n) + O(n). When n is a power of two, the process can be applied recursively, leading to the bound T(n) = O(n log n). The algorithm is of course similar for the conjugate Fourier transform.
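
As an illustration, here is a minimal recursive sketch in C99 (the layout and names are ours, not taken from this page; a real implementation would work in place and precompute the roots of unity):

#include <complex.h>
#include <math.h>
#include <stdlib.h>

/* Recursive radix-2 transform: replaces a[0..len-1] by its Fourier
   transform b_k = sum_j a_j w^{jk}, where w = exp(sign * 2*i*pi/len).
   len must be a power of two; sign = +1.0 gives the direct transform,
   sign = -1.0 the conjugate one (divide by len afterwards to invert). */
static void fft(double complex *a, size_t len, double sign)
{
    if (len == 1) return;
    size_t half = len / 2;
    double complex *even = malloc(half * sizeof *even);
    double complex *odd  = malloc(half * sizeof *odd);
    for (size_t j = 0; j < half; j++) {     /* split by parity */
        even[j] = a[2 * j];
        odd[j]  = a[2 * j + 1];
    }
    fft(even, half, sign);                  /* two half-size transforms */
    fft(odd,  half, sign);
    const double pi = acos(-1.0);
    for (size_t k = 0; k < half; k++) {     /* combine with twiddle w^k */
        double complex w = cexp(sign * 2.0 * I * pi * (double)k / (double)len);
        a[k]        = even[k] + w * odd[k];
        a[k + half] = even[k] - w * odd[k];
    }
    free(even);
    free(odd);
}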

2  Multiplication with FFT

2.1  Algorithm

We now present formally the algorithm to multiply big numbers with the FFT (the method is called Strassen multiplication when it is used with complex floating-point numbers):

Let n be a power of two, and let X and Y be two big integers with at most n coefficients

$$X = \sum_{j=0}^{n-1} x_j B^j, \qquad Y = \sum_{j=0}^{n-1} y_j B^j.$$
To compute Z = XY in time O(n log(n)), perform the following steps:

- compute X^* = F_{2n}(x_0, …, x_{n-1}, 0, …, 0) and Y^* = F_{2n}(y_0, …, y_{n-1}, 0, …, 0) with two FFTs of size 2n;
- compute the 2n pointwise products Z^*_k = X^*_k Y^*_k;
- compute z = (1/2n) \overline{F}_{2n}(Z^*), whose entries z_j are the coefficients of R(z) = P(z)Q(z), so that Z = Σ_j z_j B^j;
- round each z_j to its nearest integer and release the carries to obtain the digits of Z in base B.

Notice that X^* is the Fourier transform of the sequence made of n consecutive zeros appended to x_0, …, x_{n-1} (the same is true of Y^*).

The algorithm thus consists of two FFTs of size 2n, 2n products of basic data types (whose cost is negligible), and one inverse FFT of size 2n. Hence the product of two large integers of n digits has a cost asymptotically equal to 3 FFTs.

To square a number of size n, only one direct FFT is needed (X^* = Y^*), so the squaring cost is asymptotically equal to 2 FFTs.
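
A sketch of the whole pipeline, reusing the fft routine (and headers) above; this is our illustration, with a hypothetical function name, and all error checking omitted:

/* Multiply x and y, given as n base-B digits each (least significant
   first), into z (2n digits). n must be a power of two. Sketch only. */
void fft_multiply(const unsigned *x, const unsigned *y,
                  unsigned *z, size_t n, unsigned B)
{
    size_t m = 2 * n;
    double complex *X = calloc(m, sizeof *X);   /* zero-padded to 2n */
    double complex *Y = calloc(m, sizeof *Y);
    for (size_t j = 0; j < n; j++) { X[j] = x[j]; Y[j] = y[j]; }

    fft(X, m, +1.0);                    /* two direct FFTs of size 2n */
    fft(Y, m, +1.0);
    for (size_t k = 0; k < m; k++)      /* 2n pointwise products */
        X[k] *= Y[k];
    fft(X, m, -1.0);                    /* one conjugate FFT of size 2n */

    unsigned long long carry = 0;
    for (size_t j = 0; j < m; j++) {    /* round, then release carries */
        unsigned long long t = carry
            + (unsigned long long)(creal(X[j]) / (double)m + 0.5);
        z[j] = (unsigned)(t % B);
        carry = t / B;
    }
    free(X);
    free(Y);
}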

2.2  Numerical errors

In practice, one should use the basic data type double in C to handle the floating point numbers appearing in the Fourier transform. Numerical errors appear during the computation. If they are sufficiently small, they can be corrected: each final value z_i after the inverse FFT should be an integer, so we take its nearest integer value, the fractional part giving the observed error.

The bound on the numerical error a on the z_i after the FFT process can be proved to be

$$a \le 6\, n^2 B^2 \log(n)\, \varepsilon \qquad (1)$$

where ε ≈ 10^{-16} is the machine precision of double precision words. But this is a worst-case bound and is very pessimistic. In practice, the numerical stability of the FFT is very good and the estimate a = O(n B^2 ε) is observed. To deduce the true value of z_i correctly by taking its nearest integer, a must be sufficiently small (a < 0.5 at the very least; a < 0.25 is better!). Choosing a base B of three or four digits usually suffices to reach the desired accuracy when multiplying two large integers.
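
As an illustration (our numbers, taking log base 2): for n = 2^20 coefficients and B = 10^4, bound (1) gives roughly 6 · (2^20)^2 · 10^8 · 20 · 10^{-16} ≈ 10^6, useless as a guarantee, while the practical estimate O(n B^2 ε) ≈ 10^{-2} is comfortably below 0.25. This is why floating-FFT multiplication works in practice far beyond what (1) allows.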

Another approach to the FFT consists in working in a finite field Z/pZ with p prime, choosing ω as a primitive n-th root of unity in Z/pZ (this approach is called the NTT, for Number Theoretic Transform). It avoids numerical errors entirely. But it seems that, except on special architectures, actual implementations of the NTT are less efficient.
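
As a concrete illustration (our choice, not from this page): a prime often used is p = 998244353 = 119 · 2^23 + 1, for which 3 is a primitive root, so 3^((p-1)/n) mod p is a primitive n-th root of unity for any power of two n ≤ 2^23. The complex twiddle factors of the FFT are then replaced by modular powers:

#include <stdint.h>

/* Modular exponentiation, enough to build roots of unity in Z/pZ.
   With p = 998244353 (an illustrative NTT prime, primitive root 3),
   all intermediate products fit in 64 bits since p < 2^30. */
static uint64_t pow_mod(uint64_t b, uint64_t e, uint64_t p)
{
    uint64_t r = 1;
    for (b %= p; e; e >>= 1, b = b * b % p)
        if (e & 1) r = r * b % p;
    return r;
}

/* In the FFT combine step, w = cexp(...) becomes
   w = pow_mod(3, (p - 1) / len, p), and the complex additions and
   subtractions become additions and subtractions modulo p. */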

2.3  More on the complexity of multiplication with FFT

In fact, the time complexity of multiplication with FFT is a little bigger than n log(n).

Let us be more precise. To multiply two numbers of N digits, we write them in a base B of k digits (say B = 10^k), giving a number of coefficients n ≈ N/k. The discussion above tells us that the FFT multiplies those two numbers in O(n log(n)) operations on basic numbers (the basic numbers holding the coefficients in base B are usually basic numerical data types, like double in C or Fortran). Because of the numerical error bound (1), these basic numbers must be precise enough to represent integers up to 6 n^2 B^2 log(n) (for example, when working in double precision, the base B should be chosen small enough that the error bound stays acceptable... which is not even possible if n is too large). Thus the number of digits of these basic numbers should be of the order of log(B) + log(n). As a consequence, the basic operations on these numbers cost O((log(B) + log(n))^2) and the final cost is O(n log(n) (log(B) + log(n))^2). The base B is chosen so that k = log10(B) is of the same order as log(n); finally, multiplying two numbers of N ≈ kn digits costs

$$O(n \log(n)^3) = O(N \log(N)^2).$$

Thus the cost of Strassen multiplication (this is the name of the floating-point FFT method we presented) applied to numbers of N digits is O(N log(N)^2). If Strassen multiplication is also used to compute the operations on the basic numbers (one recursive level), the cost reduces to O(N log(N) loglog(N)^2). The best bound is obtained with the fully recursive version and is O(N log(N) loglog(N) logloglog(N) ⋯). The use of the Number Theoretic Transform (see above) leads to the same bounds.

The lowest known theoretical bound is obtained with Schönhage multiplication, which generalizes the process by working in finite rings, and is O(N log(N) loglog(N)) (see Multiplication en le Lisp for a description of Schönhage multiplication). Nussbaumer multiplication also reaches this complexity (see [2] for an exposition of various large-multiplication routines). Notice that in actual implementations the Strassen approach is the fastest, even though it is asymptotically slightly worse.

3  Practical implementation

The basic practical implementation of the floating-point FFT is not so difficult. To be really efficient, one should exploit the Hermitian property of the problem (X^*_{2n-k} is the conjugate of X^*_k, since the input sequence is real) to avoid computing the same terms twice and to store only minimal information. Doing this while also organizing the data storage to avoid copies is more difficult. In this context, the Hartley transform (a variant of the FFT) can be helpful: the Hartley transform is involutive, which avoids writing specific code for the inverse transform.

The practical timings of the FFT are often far from the theoretical O(n log n). This is essentially due to scattered memory accesses during the FFT. Techniques exist that improve the locality of memory accesses (the 4-step or 6-step FFT, also called the matrix Fourier algorithm).

A very careful implementation of the FFT makes it more efficient than any other multiplication technique for numbers with ten thousand digits or more.

A very easy FFT C sample code can be found in Easy programs for constants computation.


Related links on FFT

Carey Bloodworth's FFT pages: much information on different types of FFT.

Jorg Arndt's FFT page. Contains Jorg Arndt's "FFTs for programmers" text, where you can find algorithms and source code for FFTs.

References

[1]
D. E. Knuth, The Art of Computer Programming, Vol. 2: Seminumerical Algorithms, Addison-Wesley, 1998.

[2]
R. Crandall and C. Pomerance, Prime Numbers: A Computational Perspective, Springer, 2000.

