Contents :
+ Introduction
+ Features
+ Version
+ Timings
+ Number of digits
+ Usage
+ Behavior of the program
+ Formula and algorithm used
+ Problems
+ What should be done for next versions
+ Source code information
+ Conclusion
================
| INTRODUCTION |
================
PiFast computes decimal digits of the constants Pi, E=exp(1), Sqrt(2) or
classical hypergeometric series. The default output are the
files pi.txt and e.txt (for the constants Pi and E).
PiFast currently holds the world record pi computation on a home
computer with 12.88 billion digits (Shigeru Kondo, on a Pentium III 800,
1024 Mo of memory). PiFast can compute much more digits than what could
fit in the physical memory, using disk memory.
PiFast is available on several platforms, download it from the page
http://numbers.computation.free.fr/Constants/PiProgram/download.html
(unix version is PiFast 3.3 only).
Windows 95/98 and NT : download the file pifast.zip
hpux10.20 : download the file pifast_hp.gz
irix6.2 : download the file pifast_irix.gz
aix4.3.2 : download the file pifast_aix.gz
The web adress of the PiFast site is
http://numbers.computation.free.fr/Constants/PiProgram/pifast.html
============
| FEATURES |
============
* Computes decimal digits of pi, e=exp(1), sqrt2, with the high limit of 17 billion digits
* Very fast (fastest pi program available on the web - July 5 2003)
* Much more economical in memory than other pi programs
* Parameters available to fit in the memory machine (memory saved <-> time lost)
* Able to handle disk memory for very large computations, permitting to compute much more
digits than could be contained in the physical memory
* Very large computations can be done in several chunks
* Can compute a lot of other predefined constants (log(2), zeta(3), ...) with many formulas
* User can define his own formula to compute a constant
* Can compress directly the output result and later, decompress ranges of it
===========
| VERSION |
===========
Current version : PiFast43 (version 4.3, released on Jul 05 2003, fixed on Jul 08 2003).
New features of version 4.3 :
* Faster than previous version : 4-5% for pi computations, and more (10-15%) for
e computations
* A serious (but rare) bug discovered by Shigeru Kondo in the computation of
64 million digits of e=1/exp(-1) was fixed. The bug existed since version 4.0
* Ability to specify a factor suffix k,K,m,M,g,G after the number of digits
(k=1000,K=1024,m=1000000,M=1,048,576...)
* Maximum number of digits is now 17 billion (instead of 12 billion with previous versions)
Older versions
+++++++++++++
Version : 4.2 (released on Jan 07 2003).
-------------
* PiFast42 is significantly faster than previous version (saving is at least 10 or 25% depending on the hardware).
* Some bugs have been corrected.
Version : 4.1 (released on Sep 06 2001, fixed on Sep 25 2001).
-------------
* PiFast41 is slightly faster (3%) for small computations.
* Some bugs have been corrected.
Version : 4.0 (released on Aug 30 2001)
-------------
* PiFast40 can compute user defined constants from a large class
of hypergeometric series. This permits to compute a lot of other
constants like log(2), zeta(3), pi with arctan formulas...
(see section for more details)
* PiFast40 is faster : progress have been made on the FFT.
The speed-up ratio varies from 15% (small computations) to 10%
or less (larger computations) in standard mode.
Version : 3.3 (released on Jul 08 2000)
-------------
* PiFast33 computes also the number e = exp(1). Two methods are
available.
* PiFast can verify a pi computation by using the Fabrice Bellard
formula which computes directly the n-th bit of pi.
* A bug (reported by Shigeru Kondo for some computations with more than
6 billion digits) have been corrected.
* PiFast is now available on unix platforms.
* The PiFast.log file, generated if problems are encountered, has
been enriched. More internal tests are done.
* The default output format is now a classical format (Guttenberg)
Version : 3.2 (released on Dec 27 1999)
-------------
* A larger number of digits can be computed with a limited physical memory in the
advanced disk memory mode.
* Timings have been decreased by a few percents, especially in the standard mode.
* Physical memory needed has been decreased by a few percents (~ 4%).
* A Sqrt(2) computation is available (with verification). This mode essentially
permits to test PiFast32 with a huge number of digits, with ranges that I cannot
reach with my computer. The timings of Sqrt(2) computation is at least ten times
faster than the pi computation.
* When errors of the program are encountered (I hope this does not happen, but since
I could not test PiFast in huge ranges, bugs can appear...), a PiFast.log file is
generated. This file contains information about the program and the error and should
be send to xgourdon@yahoo.fr to help me to debug the problem.
* When reading a compressed format, the program can generate x digits every y digits,
where x and y are user specified.
Version : 3.1 (released on Oct 30 1999)
-------------
This version contains several improvements, essentially dedicated
to huge computations (1 giga digits or more).
* A new disk swap mode exists, using a Number Theoritic Transform (NTT) with
a variable number of primes, that permits to have a quasi-linear behavior of
the timings of the program for really huge computations (to be compared with
the asymptotic quadratic behavior of PiFast23).
* The maximum number of digits to be computed with that version is
now 24 billions digits (instead of 4.5 billions with version 2.3)
in the disk swap mode. In this respect, developpement have been made
to limit the internal swap files size to 2 giga-bytes (by generating
several files if needed).
* The output can be directly compressed by PiFast32. The resulting file
can be read (to access digits at some places only for example) or
transformed into a standard one with PiFast. Different output format
are also available in this compressed output mode.
Version : 2.3 (released on Sep 5 1999)
-------------
* A difficult bug (reported to me by Shigeru Kondo) was found that
made version 2.2 going wrong when FFT size was >= 4096k
(possible on machines with 256 Meg at least).
* The maximum number of digits to be computed with that version is
now 4.5 billion digits (instead of 1.5 billion with version 2.2).
Large number of digits computation with version 2.3 :
> 2,684,354,560 ( = 2.5*2^30 ) digits on a PIII 550 (1024 Mo)
(19 Oct 1999) in 245 hours by Shigeru Kondo (world record pi
computation on PC at this date).
> 2.1 billions digits on a PIII 350 (1024 Mo) (13 Sep 1999) in
157 hours by Shigeru Kondo.
Version : 2.2 (released on 10 Aug 1999)
-------------
PiFast22 is Version 2.2 of PiFast. Efforts have essential been made
in the disk swap mode. new features of version 2.2 :
* In the disk swap mode, Disk swapping has been reduced, saving a
quite large part of the total timing.
* Disk swap timings are given in the computation information summary
(in the header of the resulting file).
* Disk head movements are reduced.
* A bug has been fixed to have all computed digits right (in some
cases with version 1.1 and 2.1, the 0.8% last digits were wrong).
Another bug in internal computation has also been fixed.
* Physical memory needed has been decreased by 5% in the disk
memory mode.
* Documentation has been enlarged (behavior of the program).
Version : 2.1 (released on 5 Aug 1999)
-------------
* It can use disk memory to perform huge computations.
* A bug has been fixed than permits to reach a larger number of
digits.
* A large computation using disk memory can be done in several
runs.
* A computation information summary is written as the header of the
resulting file.
Version 1.1 (released on 30 jul 1999) :
-----------
* Faster than the first version of PiFast (15%)
* Less memory consuming
Version 1.0 (First Version) was released on 17 jul 1999 :
-----------
* Only Pi computation available (2 formulas, Chudnovsky and Ramanujan)
in standard mode.
===========
| TIMINGS |
===========
PiFast42 is the fastest program available of the net
to compute Pi on Windows (Jan 07 2003). Another pi program on Windows,
named QuickPi, is comparable in speed with PiFast when the number
of digits is less than 1 million but becomes less efficient for larger computations.
See also the page http://home.istar.ca/~lyster/pi.html
for the current fastest pi programs on PC.
I have not been able to perform comparisons on unix platforms. Notice
that PiFast on unix is not as tuned as on windows. For example, PiFast
on windows uses basic routines handling bytes in the little-endian
conventions, this has not been tuned in the big-endian convention for
most unix machines.
Following are just sample of timings. More can be found from
PiFast home page :
http://numbers.computation.free.fr/Constants/PiFast/pifast.html
Send me your timings on other machines !
Standard mode (no disk space used)
-------------
This mode is the fastest when there are enough physical memory.
Timings sample on a Pentium 4 1.4 Ghz with 1024 Mo running on Windows
NT (notice that no particular compilation has been made to benefit from
Pentium 4 specific instructions) :
PI Computation : (timings with version 4.1)
1 Million decimal digits : 9.69 seconds
2 Million decimal digits : 22.78 seconds
4 Million decimal digits : 50.80 seconds
8 Million decimal digits : 116.38 seconds
16 Million decimal digits : 264.9 seconds
32 Million decimal digits : 583.25 seconds
64 Million decimal digits : 1327 seconds
128 Million decimal digits : 2974 seconds
E Computation : (timings with version 4.1)
1 Million decimal digits : 4.33 seconds
2 Million decimal digits : 9.47 seconds
4 Million decimal digits : 21.34 seconds
8 Million decimal digits : 44.03 seconds
16 Million decimal digits : 95.25 seconds
32 Million decimal digits : 206.74 seconds
64 Million decimal digits : 433 seconds
128 Million decimal digits : 948 seconds
Standard Disk memory mode
-------------------------
This mode is be used for very large computation when the standard mode
requires too much physical memory. It only requires a few disk space.
(timings for Pi with version 2.3, on my Pentium 350 with 128M of
physical memory)
32 millions decimal digits : 3662 seconds (~ 1 hour 1 minutes)
64 millions decimal digits : 9829 seconds (~ 2 hours 44 minutes)
128 millions decimal digits : 30320 seconds (~ 8 hours 35 minutes)
256 millions decimal digits : 110000 seconds (~30 hours 30 minutes)
This mode is not available in version 4.0 and 4.1 for user constants
computations.
Disk memory mode for huge computations
--------------------------------------
This mode is the fastest for huge computations. It requires largest
disk space. This mode is not available in version 4.0 and 4.1 for user
constants computations.
(timings for the computation of Pi with version 3.3, on an Athlon 1000Mhz
overclocked at 1050Mhz with 1Go of physical memory, from Gary Oliaro)
500 million decimal digits : 8.6 hours
1 billion decimal digits : 21.5 hours
2 billion decimal digits : 52.3 hours
4 billion decimal digits : 117.6 hours
8 billion decimal digits : 284.1 hours
Important : the timings in the disk memory use mode are sensitive
to your disk access speed.
====================
| NUMBER OF DIGITS |
====================
Standard mode (no disk space used)
-------------
The maximum number of decimal digits you can compute with PiFast in
standard mode depends essentially of the amount of your physical memory.
Several low memory modes enable to compute a very large number of
digits. But when you want a large number of digits, you should use the
disk memory mode instead of a very low memory mode in standard mode.
(The threshold depends on the hardware configuration.
On my 128M machine, it is between 16 and 32 millions of digits).
With version 1.1, which ran only in standard mode, Stuart Lyster has
reported me a 64 millions digits computation on a 128M machine.
It is strongly recommended that you work only with physical memory
(system disk swapping gives very poor results) in this mode.
Disk memory use mode
--------------------
Two mode of this type exist (a special disk memory mode can be used
for really huge computations). (The disk memory use mode is not
available in version 4.0 and 4.1 for user constants computations).
This mode permits to reach a very large number of digits.
The program can potentially compute a bit more than 16 billion digits.
This limit has been reached by Shigeru Kondo who computed 16
billion digits of pi with PiFast on a home computer.
This limitation is related to the maximum value 2^31 on the long C type
(in various places in the program).
I would appreciate any comments on a very large number of digits
computation experience with PiFast (xgourdon@yahoo.fr).
=========
| USAGE |
=========
Run PiFast43.exe in a dos window (or in a unix shell window for
unix versions. You may type chmod +x PiFast33 before running it).
Usage information about running PiFast can be found in the help.txt
file (avalaible at
http://numbers.computation.free.fr/Constants/PiProgram/download.html)
===========================
| BEHAVIOR OF THE PROGRAM |
===========================
Memory :
--------
In standard mode, the required physical memory by PiFast to compute N
decimal digits of Pi using and FFT of size NFFT is approximately
Physical memory (in bytes) = 2*N + 40*NFFT.
In the disk memory mode, memory required is
Physical memory (in bytes) = 48*NFFT
Disk memory (in bytes) = 2*N
In the disk memory huge mode, memory required is
Physical memory (in bytes) = 48*NFFT
Disk memory (in bytes) ~ 4.5*N (approximation)
(The preceeding PiFast help file had 4*N instead of 4.5*N in the last
term. This is not a different behavior of PiFast but just a correction
in the documentation, since 4*N was a little too optimistic).
Note that the Disk memory requirement during the computation is also
enough to contain the final output files. The e computation behaves in
the same way, expect in the disk memory huge mode where it uses only
3.5*N byte space on the disk. For user defined constants, the
required memory is a bit larger but remains of the same kind.
Timing :
--------
In the fastest mode (until you do not reach your physical memory limit),
PiFast is close to linear. That is, when you double the number of
digits to be computed, your timing almost doubles (in fact, the
practical factor is often 2.2 or 2.3).
Things are getting worse once your physical memory is full : the
program becomes asymptotically quadratic (that is, for huge number of
digits, the timing has a factor of 4.) Timings in the standard
disk memory mode on my 128M machine show a factor of 2.7 between the
32 and 64 millions computation, a factor of 3.1 between 64 and 128 millions,
a factor of 3.6 ~ 3.7 between 128 and 256 millions.
The asymptotic behavior is much better in the second disk memory mode,
dedicated to huge computations. The theoritical behaviour in this
latest mode is of the form T = n log(n)^3 + alpha * n^2, with alpha
very small. In practical computations, the alpha*n^2 part is so small that
it should not be representative (it correspond to the chinese
remainder theorem in the NTT when the number of primes is large).
Number of digits :
----------------
Due to how I do things, the program has a (small) threshold when the
number of digits required is close to a power of two (for example,
computing 1048576 digits in standard mode takes 6% more than
computing 1040000 digits, whereas the difference of number of digits
is 0.8% only).
This remark is important since power of two number of digits
is often used to compute pi (PiFast does not need power of two
digits). For that reason, I personnaly compute pi with PiFast with
1 million digits (instead of 1Mega = 1048576), 2millions, 4, 8 ...
==============================
| FORMULA AND ALGORITHM USED |
==============================
Pi computation
--------------
PiFast is based on Ramanajun like formula to compute Pi.
The Chudnovsky method is based on the Chudnovsky formula
----
426880 (10005)^(1/2) \ (6n)! (545140134 n + 13591409)
-------------------- = / ------------------------------
Pi ---- (n!)^3 (3n)! (-640320)^(3n)
n>=0
which adds roughly 14 decimal digits by term.
The Ramanujan method is based on the Ramanujan formula
----
1 \ (4n)! (1103 + 26390 n)
---- = 2 (2)^(1/2) / -----------------------
Pi ---- 4^(4n) (n!)^4 99^(4n+2)
n>=0
which adds roughly 8 decimal digits by term.
From these formulas, the algorithm used is the Brent binary splitting
acceleration together with an efficient cache handling hermitian FFT
to multiply big integers. Details of the binary splitting method can be
found on my web site :
http://numbers.computation.free.fr/Constants/Algorithms/splitting.html
Note : This approach has a theoritical complexity of O(n log(n)^3)
to compute n digits of Pi. The AGM techniques (Gauss-Salamin or
Borwein Quartic algorithm for example) have an asymptotically better
complexity of O(n log(n)^2). Nethertheless these
theoritical bounds are not practical (non cached-memory access
and data trashing make the practical complexity higher) and the
constants on front of the big O are important.
My experience is that for a reachable number of digits
on actual machines, a well handled binary splitting approach
from Chudnovsky formula seems better than any other known method.
E computation
-------------
Two formulas are available to compute E,
---- ----
\ 1 1 \ (-1)^n
E = / --- and --- = / --------
---- n! E ---- n!
n>=0 n>=0
PiFast uses Binary splitting method. In the particular case of E the
theoritical cost to compute n digits is O(n log(n)^2).
User constant computation
-------------------------
In version 4.0 and higher, PiFast can compute any user constant defined
as a linear combination of hypergeometric series :
----
\
C = / gamma_n * H_n
----
0<=n=1
or
---- -----
\ / | | F(k) \
H = alpha + beta * / ( | | ------ ) * Z^n (2)
---- \ G(k) /
n>=1 0