Multiplication of large numbers of n digits can be done in time O(n log(n)) (instead of O(n^2) with the classic algorithm) thanks to the Fast Fourier Transform (FFT). This page presents this technique along with practical considerations.
Basic definitions and notations about large number representation can be found in Arbitrary precision computation.
Two large integers X and Y of size at most n-1 can be written in the form

$$ X = P(B) = \sum_{i=0}^{n-1} x_i B^i, \qquad Y = Q(B) = \sum_{i=0}^{n-1} y_i B^i, $$

where B is the base of representation and P(z), Q(z) are polynomials of degree < n with the digits x_i, y_i as coefficients. The product XY is then equal to R(B), where R(z) = P(z)Q(z) is a polynomial of degree < 2n-1; once its coefficients are known, XY is obtained by evaluating at z = B and propagating the carries.
A polynomial of degree < m is uniquely defined by its values at m distinct points (by Lagrange interpolation, for example). Thus, to obtain the product R(z) = P(z)Q(z), it suffices to compute the values R(w_k) at 2n distinct points w_k, that is, to compute the P(w_k) and Q(w_k). The Fast Fourier Transform (FFT) idea consists in choosing for the w_k the complex (2n)-th roots of unity

$$ w_k = e^{2ik\pi/(2n)}, \qquad 0 \le k < 2n, $$

for two reasons: first, the 2n values P(w_k) (and likewise Q(w_k)) can then be computed in O(n log n) operations; second, the coefficients of R(z) can be recovered from the values R(w_k) at the same cost.
The second point results from the fact that the k-th coefficient r_k of R(z) satisfies

$$ r_k = \frac{1}{2n} \sum_{j=0}^{2n-1} R(w_j)\, w_j^{-k}, $$

which is, up to the factor 1/(2n), a Fourier transform with w replaced by its conjugate. The identity follows from the orthogonality relation: the sum of w^{j(m-k)} over 0 <= j < 2n equals 2n if m = k, and 0 otherwise.
All the previous tasks finally reduce to the following problem: given a sequence A = (a_0, a_1, ..., a_{2n-1}), compute its Fourier transform

$$ F_{2n}(A) = (b_0, b_1, \ldots, b_{2n-1}), \qquad b_k = \sum_{j=0}^{2n-1} a_j\, w^{jk}, \qquad w = e^{2i\pi/(2n)}. \tag{*} $$
With the notation of (*), the principle is to split A into its even and odd indexed subsequences A' = (a_0, a_2, ..., a_{2n-2}) and A'' = (a_1, a_3, ..., a_{2n-1}) and to write

$$ b_k = \sum_{j=0}^{n-1} a_{2j}\,(w^2)^{jk} \;+\; w^k \sum_{j=0}^{n-1} a_{2j+1}\,(w^2)^{jk}. $$

Since w^2 = e^{2i\pi/n} is the root of unity associated with transforms of size n, the two sums are the terms of index k (mod n) of F_n(A') and F_n(A'').
In other words, to compute the coefficients b_k of F_{2n}(A), perform the following steps:

1. compute B' = (b'_0, ..., b'_{n-1}) = F_n(A');
2. compute B'' = (b''_0, ..., b''_{n-1}) = F_n(A'');
3. for 0 <= k < n, set b_k = b'_k + w^k b''_k and b_{k+n} = b'_k - w^k b''_k (the sign comes from w^n = -1).
Thus the cost T(2n) of computing F_{2n}(A) with the FFT satisfies T(2n) = 2T(n) + O(n). When n is a power of two, the process can be applied recursively, leading to the bound T(n) = O(n log n). The algorithm is of course similar for the conjugate Fourier transform.
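To make the recursion concrete, here is a minimal sketch of this transform in C (a sketch only: the function name fft and its signature are illustrative and not from this page; C99 complex arithmetic and variable-length arrays are assumed, and n must be a power of two):

    /* Minimal sketch of the recursive FFT described above (C99).
       Computes in place b_k = sum_j a_j w^{jk}, w = exp(2*i*pi/n). */
    #include <complex.h>
    #include <math.h>

    void fft(double complex *a, int n)
    {
        if (n == 1) return;                     /* F_1(A) = A */

        int half = n / 2;
        double complex even[half], odd[half];   /* VLAs, fine for a sketch */
        for (int j = 0; j < half; j++) {
            even[j] = a[2*j];                   /* A'  = (a_0, a_2, ...)  */
            odd[j]  = a[2*j+1];                 /* A'' = (a_1, a_3, ...)  */
        }
        fft(even, half);                        /* B'  = F_{n/2}(A')      */
        fft(odd,  half);                        /* B'' = F_{n/2}(A'')     */

        double pi = acos(-1.0);
        for (int k = 0; k < half; k++) {
            /* twiddle factor w^k, w a primitive n-th root of unity */
            double complex t = cexp(2.0 * pi * I * k / n) * odd[k];
            a[k]        = even[k] + t;          /* b_k       = b'_k + w^k b''_k */
            a[k + half] = even[k] - t;          /* b_{k+n/2} = b'_k - w^k b''_k */
        }
    }

Each call performs O(n) work on top of two half-size calls, which is exactly the recurrence T(2n) = 2T(n) + O(n) above.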
We now present formally the algorithm to multiply big numbers with the FFT (the method is called Strassen multiplication when it is used with floating point complex numbers):
Let n be a power of two, and let X and Y be two big integers with less than n coefficients,

$$ X = \sum_{i=0}^{n-1} x_i B^i, \qquad Y = \sum_{i=0}^{n-1} y_i B^i. $$

1. Compute X* = F_{2n}(x_0, x_1, ..., x_{n-1}, 0, ..., 0) and Y* = F_{2n}(y_0, y_1, ..., y_{n-1}, 0, ..., 0).
2. Compute the 2n componentwise products Z*_k = X*_k Y*_k.
3. Compute the conjugate (inverse) transform Z = (1/(2n)) \bar{F}_{2n}(Z*); its entries z_k are approximations of the coefficients of R(z) = P(z)Q(z).
4. Round each z_k to the nearest integer and propagate the carries in base B to obtain XY = R(B).
Notice that X* is the Fourier transform of the sequence obtained by appending n consecutive zeros to x_0, ..., x_{n-1} (the same is true for Y*).
The algorithm thus consists of two FFTs of size 2n, 2n products of basic data types (whose cost is negligible), and one inverse FFT of size 2n. The product of two large integers of n coefficients therefore has a cost asymptotically equal to 3 FFTs.
To square a number of size n, only one forward FFT is needed (since X* = Y*), so the squaring cost is asymptotically equal to 2 FFTs.
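As an illustration, here is a hedged sketch in C of the four steps above, reusing the fft() sketch given earlier (and its headers); all names are illustrative, and the final carry propagation in base B is left to the caller:

    /* Sketch of the Strassen multiplication pipeline: z receives the 2n
       rounded coefficients of R(z) = P(z)Q(z); carries remain to be done. */
    void fft_multiply(const double *x, const double *y, long long *z, int n)
    {
        int m = 2 * n;                          /* transform size 2n */
        double complex X[m], Y[m];
        for (int i = 0; i < m; i++) {           /* append n zeros */
            X[i] = (i < n) ? x[i] : 0.0;
            Y[i] = (i < n) ? y[i] : 0.0;
        }
        fft(X, m);                              /* X* */
        fft(Y, m);                              /* Y* */

        for (int k = 0; k < m; k++)
            X[k] = conj(X[k] * Y[k]);           /* conj(Z*_k), Z*_k = X*_k Y*_k */

        fft(X, m);                              /* F_2n(conj(Z*)) */
        for (int k = 0; k < m; k++) {
            /* z_k = conj(F_2n(conj(Z*)))_k / (2n); only its real part is
               needed, so the outer conjugation can be dropped */
            double zk = creal(X[k]) / m;
            z[k] = (long long)floor(zk + 0.5);  /* round to nearest integer */
        }
    }

Squaring follows the same path with Y* replaced by X*, saving one of the two forward transforms, in line with the 2-FFT count above.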
The numerical error a on the computed z_i after the FFT process can be proved to satisfy

$$ a \le 6\, n^2 B^2 \log(n)\, \varepsilon, \tag{1} $$

where \varepsilon is the relative precision of the underlying floating point arithmetic.
Another approach to the FFT consists in working in a finite field Z/pZ with p prime, choosing w as a primitive n-th root of unity in Z/pZ (this approach is called the NTT, for Number Theoretic Transform). It avoids numerical errors entirely. But it seems that, except on special architectures, actual implementations of the NTT approach are less efficient.
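As a hedged illustration of the idea (a sketch, not any particular implementation), here is the same even/odd recursion over Z/pZ, using the well-known NTT-friendly prime p = 998244353 = 119*2^23 + 1, for which 3 is a primitive root; hence 3^((p-1)/n) mod p is a primitive n-th root of unity for every power of two n up to 2^23:

    /* Minimal NTT sketch: the recursive FFT above with the complex roots
       of unity replaced by roots of unity modulo the prime P. */
    #include <stdint.h>

    #define P 998244353ULL                      /* 119 * 2^23 + 1, prime */

    static uint64_t pow_mod(uint64_t b, uint64_t e)
    {
        uint64_t r = 1;                         /* b^e mod P by squaring */
        for (; e; e >>= 1, b = b * b % P)
            if (e & 1) r = r * b % P;
        return r;
    }

    void ntt(uint64_t *a, int n)                /* n a power of two <= 2^23 */
    {
        if (n == 1) return;
        int half = n / 2;
        uint64_t even[half], odd[half];
        for (int j = 0; j < half; j++) {
            even[j] = a[2*j];
            odd[j]  = a[2*j+1];
        }
        ntt(even, half);
        ntt(odd, half);
        uint64_t w = pow_mod(3, (P - 1) / n);   /* primitive n-th root of 1 */
        uint64_t wk = 1;                        /* runs through w^k         */
        for (int k = 0; k < half; k++, wk = wk * w % P) {
            uint64_t t = wk * odd[k] % P;
            a[k]        = (even[k] + t) % P;
            a[k + half] = (even[k] + P - t) % P; /* since w^{n/2} = -1 mod P */
        }
    }

The inverse transform is analogous, with every root of unity replaced by its modular inverse and a final multiplication of each entry by n^{-1} mod P. As long as the exact coefficients of the product polynomial stay below P, they are recovered exactly, with no rounding step at all.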
In fact, the time complexity of multiplication with FFT is a little bigger than n log(n).
Let us be more precise. To multiply two numbers of N digits, we write them in a base B containing k digits (say B = 10^k), giving a number of coefficients n ≈ N/k. The discussion above tells us that the FFT multiplies those two numbers in O(n log(n)) operations on basic numbers (the basic numbers represent the coefficients in base B; they are usually basic numerical data types, like double in C or Fortran). Because of the numerical error bound (1), these basic numbers should be precise enough to represent integers up to 6n^2 B^2 log(n) exactly, so that rounding to the nearest integer gives the true coefficients (for example, when working in double precision, the base B should be chosen small enough to keep the error bound small, and this is not even possible if n is too large). Thus the number of digits of these basic numbers should be of the order of log(B) + log(n). As a consequence, the basic operations on these numbers cost O((log(B) + log(n))^2), and the total cost is O(n log(n) (log(B) + log(n))^2). The base B is chosen so that k = log_10(B) is of the same order as log(n); finally, multiplying those two numbers of N ≈ kn digits costs

$$ O\big(N \log^2(N)\big). $$
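To see what the precision constraint means in practice, here is a rough, hedged illustration (it assumes the logarithm in (1) is the natural one and that an IEEE double represents integers exactly up to 2^53): taking k = 4, that is B = 10^4, the requirement

$$ 6\, n^2 B^2 \log(n) \le 2^{53} \quad\Longleftrightarrow\quad n^2 \log(n) \le 1.5 \cdot 10^{7} $$

holds only up to n ≈ 1400 coefficients, that is a few thousand digits. The bound (1) is quite pessimistic, so actual implementations go much further with the same base, but the computation shows why B cannot be taken large when n grows.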
Thus the cost of the Strassen multiplication (this is the name of the floating point FFT method we presented) to multiply numbers of N digits is O(N log^2(N)). If the Strassen multiplication is also used to compute the operations on the basic numbers (one recursive level), the cost reduces to O(N log(N) (log log(N))^2). The best bound is obtained with the completely recursive version and is O(N log(N) log log(N) log log log(N) ...). The use of the Number Theoretic Transform (see above) leads to the same bounds.
The lowest known theoretical bound is obtained with the Schönhage multiplication, which generalizes the process by working in finite rings, and is O(N log(N) log log(N)) (see Multiplication en le Lisp for a description of the Schönhage multiplication). The Nussbaumer multiplication also reaches this complexity (see [2] for an explanation of various large-multiplication routines). Notice that in actual implementations, the Strassen approach is the fastest, even though it is asymptotically slightly worse.
The practical timings of the FFT are often far from the theoretical O(n log n). This is essentially due to the sparse memory accesses the FFT performs. Techniques exist that reorganize the computation to improve memory locality (the 4-step and 6-step FFT, also called the matrix Fourier algorithm).
A very careful implementation of the FFT makes it more efficient than any other multiplication technique for numbers with ten thousand digits or more.
A very simple FFT sample code in C can be found in Easy programs for constants computation.
Carey Bloodworth's FFT pages: much information on different types of FFT.
Jorg Arndt's FFT page. It contains his "FFTs for programmers" text, where you can find algorithms and source code for the FFT.