Contents : + Introduction + Features + Version + Timings + Number of digits + Usage + Behavior of the program + Formula and algorithm used + Problems + What should be done for next versions + Source code information + Conclusion ================ | INTRODUCTION | ================ PiFast computes decimal digits of the constants Pi, E=exp(1), Sqrt(2) or classical hypergeometric series. The default output are the files pi.txt and e.txt (for the constants Pi and E). PiFast currently holds the world record pi computation on a home computer with 12.88 billion digits (Shigeru Kondo, on a Pentium III 800, 1024 Mo of memory). PiFast can compute much more digits than what could fit in the physical memory, using disk memory. PiFast is available on several platforms, download it from the page http://numbers.computation.free.fr/Constants/PiProgram/download.html (unix version is PiFast 3.3 only). Windows 95/98 and NT : download the file pifast.zip hpux10.20 : download the file pifast_hp.gz irix6.2 : download the file pifast_irix.gz aix4.3.2 : download the file pifast_aix.gz The web adress of the PiFast site is http://numbers.computation.free.fr/Constants/PiProgram/pifast.html ============ | FEATURES | ============ * Computes decimal digits of pi, e=exp(1), sqrt2, with the high limit of 17 billion digits * Very fast (fastest pi program available on the web - July 5 2003) * Much more economical in memory than other pi programs * Parameters available to fit in the memory machine (memory saved <-> time lost) * Able to handle disk memory for very large computations, permitting to compute much more digits than could be contained in the physical memory * Very large computations can be done in several chunks * Can compute a lot of other predefined constants (log(2), zeta(3), ...) with many formulas * User can define his own formula to compute a constant * Can compress directly the output result and later, decompress ranges of it =========== | VERSION | =========== Current version : PiFast43 (version 4.3, released on Jul 05 2003, fixed on Jul 08 2003). New features of version 4.3 : * Faster than previous version : 4-5% for pi computations, and more (10-15%) for e computations * A serious (but rare) bug discovered by Shigeru Kondo in the computation of 64 million digits of e=1/exp(-1) was fixed. The bug existed since version 4.0 * Ability to specify a factor suffix k,K,m,M,g,G after the number of digits (k=1000,K=1024,m=1000000,M=1,048,576...) * Maximum number of digits is now 17 billion (instead of 12 billion with previous versions) Older versions +++++++++++++ Version : 4.2 (released on Jan 07 2003). ------------- * PiFast42 is significantly faster than previous version (saving is at least 10 or 25% depending on the hardware). * Some bugs have been corrected. Version : 4.1 (released on Sep 06 2001, fixed on Sep 25 2001). ------------- * PiFast41 is slightly faster (3%) for small computations. * Some bugs have been corrected. Version : 4.0 (released on Aug 30 2001) ------------- * PiFast40 can compute user defined constants from a large class of hypergeometric series. This permits to compute a lot of other constants like log(2), zeta(3), pi with arctan formulas... (see section for more details) * PiFast40 is faster : progress have been made on the FFT. The speed-up ratio varies from 15% (small computations) to 10% or less (larger computations) in standard mode. Version : 3.3 (released on Jul 08 2000) ------------- * PiFast33 computes also the number e = exp(1). Two methods are available. * PiFast can verify a pi computation by using the Fabrice Bellard formula which computes directly the n-th bit of pi. * A bug (reported by Shigeru Kondo for some computations with more than 6 billion digits) have been corrected. * PiFast is now available on unix platforms. * The PiFast.log file, generated if problems are encountered, has been enriched. More internal tests are done. * The default output format is now a classical format (Guttenberg) Version : 3.2 (released on Dec 27 1999) ------------- * A larger number of digits can be computed with a limited physical memory in the advanced disk memory mode. * Timings have been decreased by a few percents, especially in the standard mode. * Physical memory needed has been decreased by a few percents (~ 4%). * A Sqrt(2) computation is available (with verification). This mode essentially permits to test PiFast32 with a huge number of digits, with ranges that I cannot reach with my computer. The timings of Sqrt(2) computation is at least ten times faster than the pi computation. * When errors of the program are encountered (I hope this does not happen, but since I could not test PiFast in huge ranges, bugs can appear...), a PiFast.log file is generated. This file contains information about the program and the error and should be send to xgourdon@yahoo.fr to help me to debug the problem. * When reading a compressed format, the program can generate x digits every y digits, where x and y are user specified. Version : 3.1 (released on Oct 30 1999) ------------- This version contains several improvements, essentially dedicated to huge computations (1 giga digits or more). * A new disk swap mode exists, using a Number Theoritic Transform (NTT) with a variable number of primes, that permits to have a quasi-linear behavior of the timings of the program for really huge computations (to be compared with the asymptotic quadratic behavior of PiFast23). * The maximum number of digits to be computed with that version is now 24 billions digits (instead of 4.5 billions with version 2.3) in the disk swap mode. In this respect, developpement have been made to limit the internal swap files size to 2 giga-bytes (by generating several files if needed). * The output can be directly compressed by PiFast32. The resulting file can be read (to access digits at some places only for example) or transformed into a standard one with PiFast. Different output format are also available in this compressed output mode. Version : 2.3 (released on Sep 5 1999) ------------- * A difficult bug (reported to me by Shigeru Kondo) was found that made version 2.2 going wrong when FFT size was >= 4096k (possible on machines with 256 Meg at least). * The maximum number of digits to be computed with that version is now 4.5 billion digits (instead of 1.5 billion with version 2.2). Large number of digits computation with version 2.3 : > 2,684,354,560 ( = 2.5*2^30 ) digits on a PIII 550 (1024 Mo) (19 Oct 1999) in 245 hours by Shigeru Kondo (world record pi computation on PC at this date). > 2.1 billions digits on a PIII 350 (1024 Mo) (13 Sep 1999) in 157 hours by Shigeru Kondo. Version : 2.2 (released on 10 Aug 1999) ------------- PiFast22 is Version 2.2 of PiFast. Efforts have essential been made in the disk swap mode. new features of version 2.2 : * In the disk swap mode, Disk swapping has been reduced, saving a quite large part of the total timing. * Disk swap timings are given in the computation information summary (in the header of the resulting file). * Disk head movements are reduced. * A bug has been fixed to have all computed digits right (in some cases with version 1.1 and 2.1, the 0.8% last digits were wrong). Another bug in internal computation has also been fixed. * Physical memory needed has been decreased by 5% in the disk memory mode. * Documentation has been enlarged (behavior of the program). Version : 2.1 (released on 5 Aug 1999) ------------- * It can use disk memory to perform huge computations. * A bug has been fixed than permits to reach a larger number of digits. * A large computation using disk memory can be done in several runs. * A computation information summary is written as the header of the resulting file. Version 1.1 (released on 30 jul 1999) : ----------- * Faster than the first version of PiFast (15%) * Less memory consuming Version 1.0 (First Version) was released on 17 jul 1999 : ----------- * Only Pi computation available (2 formulas, Chudnovsky and Ramanujan) in standard mode. =========== | TIMINGS | =========== PiFast42 is the fastest program available of the net to compute Pi on Windows (Jan 07 2003). Another pi program on Windows, named QuickPi, is comparable in speed with PiFast when the number of digits is less than 1 million but becomes less efficient for larger computations. See also the page http://home.istar.ca/~lyster/pi.html for the current fastest pi programs on PC. I have not been able to perform comparisons on unix platforms. Notice that PiFast on unix is not as tuned as on windows. For example, PiFast on windows uses basic routines handling bytes in the little-endian conventions, this has not been tuned in the big-endian convention for most unix machines. Following are just sample of timings. More can be found from PiFast home page : http://numbers.computation.free.fr/Constants/PiFast/pifast.html Send me your timings on other machines ! Standard mode (no disk space used) ------------- This mode is the fastest when there are enough physical memory. Timings sample on a Pentium 4 1.4 Ghz with 1024 Mo running on Windows NT (notice that no particular compilation has been made to benefit from Pentium 4 specific instructions) : PI Computation : (timings with version 4.1) 1 Million decimal digits : 9.69 seconds 2 Million decimal digits : 22.78 seconds 4 Million decimal digits : 50.80 seconds 8 Million decimal digits : 116.38 seconds 16 Million decimal digits : 264.9 seconds 32 Million decimal digits : 583.25 seconds 64 Million decimal digits : 1327 seconds 128 Million decimal digits : 2974 seconds E Computation : (timings with version 4.1) 1 Million decimal digits : 4.33 seconds 2 Million decimal digits : 9.47 seconds 4 Million decimal digits : 21.34 seconds 8 Million decimal digits : 44.03 seconds 16 Million decimal digits : 95.25 seconds 32 Million decimal digits : 206.74 seconds 64 Million decimal digits : 433 seconds 128 Million decimal digits : 948 seconds Standard Disk memory mode ------------------------- This mode is be used for very large computation when the standard mode requires too much physical memory. It only requires a few disk space. (timings for Pi with version 2.3, on my Pentium 350 with 128M of physical memory) 32 millions decimal digits : 3662 seconds (~ 1 hour 1 minutes) 64 millions decimal digits : 9829 seconds (~ 2 hours 44 minutes) 128 millions decimal digits : 30320 seconds (~ 8 hours 35 minutes) 256 millions decimal digits : 110000 seconds (~30 hours 30 minutes) This mode is not available in version 4.0 and 4.1 for user constants computations. Disk memory mode for huge computations -------------------------------------- This mode is the fastest for huge computations. It requires largest disk space. This mode is not available in version 4.0 and 4.1 for user constants computations. (timings for the computation of Pi with version 3.3, on an Athlon 1000Mhz overclocked at 1050Mhz with 1Go of physical memory, from Gary Oliaro) 500 million decimal digits : 8.6 hours 1 billion decimal digits : 21.5 hours 2 billion decimal digits : 52.3 hours 4 billion decimal digits : 117.6 hours 8 billion decimal digits : 284.1 hours Important : the timings in the disk memory use mode are sensitive to your disk access speed. ==================== | NUMBER OF DIGITS | ==================== Standard mode (no disk space used) ------------- The maximum number of decimal digits you can compute with PiFast in standard mode depends essentially of the amount of your physical memory. Several low memory modes enable to compute a very large number of digits. But when you want a large number of digits, you should use the disk memory mode instead of a very low memory mode in standard mode. (The threshold depends on the hardware configuration. On my 128M machine, it is between 16 and 32 millions of digits). With version 1.1, which ran only in standard mode, Stuart Lyster has reported me a 64 millions digits computation on a 128M machine. It is strongly recommended that you work only with physical memory (system disk swapping gives very poor results) in this mode. Disk memory use mode -------------------- Two mode of this type exist (a special disk memory mode can be used for really huge computations). (The disk memory use mode is not available in version 4.0 and 4.1 for user constants computations). This mode permits to reach a very large number of digits. The program can potentially compute a bit more than 16 billion digits. This limit has been reached by Shigeru Kondo who computed 16 billion digits of pi with PiFast on a home computer. This limitation is related to the maximum value 2^31 on the long C type (in various places in the program). I would appreciate any comments on a very large number of digits computation experience with PiFast (xgourdon@yahoo.fr). ========= | USAGE | ========= Run PiFast43.exe in a dos window (or in a unix shell window for unix versions. You may type chmod +x PiFast33 before running it). Usage information about running PiFast can be found in the help.txt file (avalaible at http://numbers.computation.free.fr/Constants/PiProgram/download.html) =========================== | BEHAVIOR OF THE PROGRAM | =========================== Memory : -------- In standard mode, the required physical memory by PiFast to compute N decimal digits of Pi using and FFT of size NFFT is approximately Physical memory (in bytes) = 2*N + 40*NFFT. In the disk memory mode, memory required is Physical memory (in bytes) = 48*NFFT Disk memory (in bytes) = 2*N In the disk memory huge mode, memory required is Physical memory (in bytes) = 48*NFFT Disk memory (in bytes) ~ 4.5*N (approximation) (The preceeding PiFast help file had 4*N instead of 4.5*N in the last term. This is not a different behavior of PiFast but just a correction in the documentation, since 4*N was a little too optimistic). Note that the Disk memory requirement during the computation is also enough to contain the final output files. The e computation behaves in the same way, expect in the disk memory huge mode where it uses only 3.5*N byte space on the disk. For user defined constants, the required memory is a bit larger but remains of the same kind. Timing : -------- In the fastest mode (until you do not reach your physical memory limit), PiFast is close to linear. That is, when you double the number of digits to be computed, your timing almost doubles (in fact, the practical factor is often 2.2 or 2.3). Things are getting worse once your physical memory is full : the program becomes asymptotically quadratic (that is, for huge number of digits, the timing has a factor of 4.) Timings in the standard disk memory mode on my 128M machine show a factor of 2.7 between the 32 and 64 millions computation, a factor of 3.1 between 64 and 128 millions, a factor of 3.6 ~ 3.7 between 128 and 256 millions. The asymptotic behavior is much better in the second disk memory mode, dedicated to huge computations. The theoritical behaviour in this latest mode is of the form T = n log(n)^3 + alpha * n^2, with alpha very small. In practical computations, the alpha*n^2 part is so small that it should not be representative (it correspond to the chinese remainder theorem in the NTT when the number of primes is large). Number of digits : ---------------- Due to how I do things, the program has a (small) threshold when the number of digits required is close to a power of two (for example, computing 1048576 digits in standard mode takes 6% more than computing 1040000 digits, whereas the difference of number of digits is 0.8% only). This remark is important since power of two number of digits is often used to compute pi (PiFast does not need power of two digits). For that reason, I personnaly compute pi with PiFast with 1 million digits (instead of 1Mega = 1048576), 2millions, 4, 8 ... ============================== | FORMULA AND ALGORITHM USED | ============================== Pi computation -------------- PiFast is based on Ramanajun like formula to compute Pi. The Chudnovsky method is based on the Chudnovsky formula ---- 426880 (10005)^(1/2) \ (6n)! (545140134 n + 13591409) -------------------- = / ------------------------------ Pi ---- (n!)^3 (3n)! (-640320)^(3n) n>=0 which adds roughly 14 decimal digits by term. The Ramanujan method is based on the Ramanujan formula ---- 1 \ (4n)! (1103 + 26390 n) ---- = 2 (2)^(1/2) / ----------------------- Pi ---- 4^(4n) (n!)^4 99^(4n+2) n>=0 which adds roughly 8 decimal digits by term. From these formulas, the algorithm used is the Brent binary splitting acceleration together with an efficient cache handling hermitian FFT to multiply big integers. Details of the binary splitting method can be found on my web site : http://numbers.computation.free.fr/Constants/Algorithms/splitting.html Note : This approach has a theoritical complexity of O(n log(n)^3) to compute n digits of Pi. The AGM techniques (Gauss-Salamin or Borwein Quartic algorithm for example) have an asymptotically better complexity of O(n log(n)^2). Nethertheless these theoritical bounds are not practical (non cached-memory access and data trashing make the practical complexity higher) and the constants on front of the big O are important. My experience is that for a reachable number of digits on actual machines, a well handled binary splitting approach from Chudnovsky formula seems better than any other known method. E computation ------------- Two formulas are available to compute E, ---- ---- \ 1 1 \ (-1)^n E = / --- and --- = / -------- ---- n! E ---- n! n>=0 n>=0 PiFast uses Binary splitting method. In the particular case of E the theoritical cost to compute n digits is O(n log(n)^2). User constant computation ------------------------- In version 4.0 and higher, PiFast can compute any user constant defined as a linear combination of hypergeometric series : ---- \ C = / gamma_n * H_n ---- 0<=n=1 or ---- ----- \ / | | F(k) \ H = alpha + beta * / ( | | ------ ) * Z^n (2) ---- \ G(k) / n>=1 0