High order algorithms to find square roots and inverses

1 Introduction

Inverting a number of size n can be done in time O(n²) with classical school division. This cost can be reduced to O(n log(n)) when a fast (FFT-based) multiplication is used, thanks to Newton's iteration. The same type of technique applies to the computation of the m-th root of a number. For these operations we also describe some very efficient algorithms with a high order of convergence (that is, quartic, quintic, sextic, ... algorithms). Computing square roots and inverses to great accuracy is also useful for computing many usual mathematical functions, like logarithms and trigonometric functions (see Brent's remarkable paper [2]), or constants like π, log(2), ... ([2], [6]).

2 High precision inversion

In this section we want to compute the inverse 1/A of the number A to a great accuracy.

2.1 Newton's iteration

Starting from a few digits approximation x₀ of 1/A, we apply Newton's iteration to the function f(x) = 1/x-A, which gives the approximating sequence

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} = 2x_n - A x_n^2. \qquad (*)$$

It is a property of Newton's iteration to converge quadratically near the root, that is, the number of correct digits roughly doubles at each iteration. Thus about log₂(n) iterations suffice to obtain n digits of 1/A.

Iteration (*) has a nice feature: the only operations needed are a multiplication by 2, a subtraction, and products of large numbers. Since Newton's convergence is quadratic, a step need not be carried out at the full final precision, but only at the precision expected at the end of that step.

As an example, we compute the inverse of π, starting from the approximation x₀ = 0.31831 (5 digits). The first three iterations become

x₁ = 2x₀-πx₀² = 0.3183098861 (operations done with 10 digits of precision)

x₂ = 2x₁-πx₁² = 0.31830988618379067151 (operations done with 20 digits of precision)

x₃ = 2x₂-πx₂² = 0.3183098861837906715377675267450287240665 (operations done with 40 digits of precision)

The first 38 digits of x₃ agree with those of 1/π.
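The precision-doubling scheme described above can be sketched in a few lines. This is only an illustration, assuming Python's decimal module as the multiprecision arithmetic; the function name newton_inverse, its parameters and the guard digits are hypothetical choices, not part of the original text.

```python
from decimal import Decimal, localcontext

def newton_inverse(a, x0, start_digits, target_digits, guard=5):
    """Approximate 1/a with x <- 2x - a*x*x, doubling the precision each step.

    x0 is assumed to be correct to about start_digits digits.
    """
    a, x = Decimal(a), Decimal(x0)
    digits = start_digits
    while digits < target_digits:
        # Each step is done only at the precision it can actually deliver.
        digits = min(2 * digits, target_digits)
        with localcontext() as ctx:
            ctx.prec = digits + guard  # a few guard digits (an assumption)
            x = 2 * x - a * x * x
    return x

# Example call: 1/7 from a 5-digit starting value.
approx = newton_inverse("7", "0.14286", 5, 50)
```

Only the last pass runs at full precision, which is what keeps the total cost close to that of a few full-size multiplications.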

The cost of this process is of the order of the multiplication cost : since each iteration involves two multiplications on high precision numbers (that is x_n×x_n and A×x_n²), to find 1/A we need to compute 2 multiplications of size n (last step), 2 multiplications of size n/2 (previous step), 2 multiplications of size n/4, etc. Using for example an FFT multiplication, with cost O(nlogn), the final cost is of order

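$$2n\log(n) + 2\,\frac{n}{2}\log\left(\frac{n}{2}\right) + 2\,\frac{n}{4}\log\left(\frac{n}{4}\right) + \cdots \le 2n\left(1 + \frac{1}{2} + \frac{1}{4} + \cdots\right)\log(n) = 4n\log(n).$$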
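The bound can be checked numerically with a small sketch. It assumes the simplified cost model of the text (a multiplication of size s costs s log(s), with constants ignored); the function name newton_inverse_cost is hypothetical.

```python
import math

def newton_inverse_cost(n):
    """Model cost: two multiplications of size n, n/2, n/4, ... down to size 2."""
    total, size = 0.0, float(n)
    while size >= 2:
        total += 2 * size * math.log(size)
        size /= 2
    return total

n = 2**20
ratio = newton_inverse_cost(n) / (n * math.log(n))  # lies between 2 and 4
```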

Note : a better iteration is obtained by writing (*) in the form

$$\begin{aligned}
h_n &= 1 - A x_n, \\
x_{n+1} &= x_n + x_n h_n.
\end{aligned}$$

Since cancellation occurs in h_n = 1-Ax_n, the second multiplication x_nh_n can be done by taking into account only the nonzero part of h_n. This has another advantage: the number of leading zeros of h_n gives the precision of the approximation x_n.
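A minimal sketch of this residual form, again assuming Python's decimal module. The helper name residual_step is hypothetical, and the digit estimate simply counts the zeros that follow the decimal point of h_n.

```python
from decimal import Decimal, localcontext

def residual_step(a, x, prec):
    """One step x + x*h with h = 1 - a*x, plus the precision of x read off h."""
    with localcontext() as ctx:
        ctx.prec = prec
        h = 1 - a * x
        # h.adjusted() is the decimal exponent of the leading digit of h,
        # so -h.adjusted() - 1 is the number of zeros after the decimal point.
        good_digits = prec if h == 0 else -h.adjusted() - 1
        return x + x * h, good_digits

# Example call: h = -0.00002, so the starting value is good to about 4 digits.
x1, good = residual_step(Decimal(7), Decimal("0.14286"), 30)
```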

A division B/A is performed by multiplying B by the inverse of A.

A practical example of the inversion of large numbers using FFT multiplication is available as simple C sample code in "Easy programs for constants computation".

2.2 A cubic iteration

From the general Householder's iteration ([3] and [1] p. 216)

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}\left(1 + \frac{f(x_n)\,f''(x_n)}{2f'(x_n)^2}\right)$$

applied to f(x) = 1/x-A, we find

$$\begin{aligned}
h_n &= 1 - A x_n, \\
x_{n+1} &= x_n + x_n(h_n + h_n^2).
\end{aligned}$$

Starting with a good initial point x₀, the convergence is cubic, that is, the number of good digits is multiplied by 3 at each step of the process (a similar form of this iteration is given in [1] p. 216 and [4] p. 279). Since h_n tends to zero, the new term h_n² can be computed at a lower cost.

As an example, we compute again the inverse of π, starting from the approximation x₀ = 0.31831 (5 digits). The first two iterations are

x₁ = 0.3183098861837906715(523...) (19 good digits)

x₂ = 0.3183098861837906715377675267450287240689... (58 good digits)

and x₃ has 174 good digits.
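The tripling of correct digits is easy to observe experimentally. The sketch below assumes Python's decimal module at a fixed working precision; the function name cubic_inverse is hypothetical. The size of 1 - A·x after each step measures the accuracy reached.

```python
from decimal import Decimal, localcontext

def cubic_inverse(a, x0, steps, prec):
    """Iterate x <- x + x*(h + h*h), h = 1 - a*x, and return all iterates."""
    a, x = Decimal(a), Decimal(x0)
    history = []
    with localcontext() as ctx:
        ctx.prec = prec
        for _ in range(steps):
            h = 1 - a * x
            x = x + x * (h + h * h)
            history.append(x)
    return history

# Example call: three cubic steps towards 1/7 with 120 working digits.
iterates = cubic_inverse("7", "0.14286", 3, 120)
```

In exact arithmetic the residual is cubed at each step, since 1 - A·x(1 + h + h²) = h³.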

2.3 High order iterations

Quartic iteration

It is possible to find a quartic iteration: applying Newton's iteration twice to f(x) = 1/x-A, we find

$$\begin{aligned}
h_n &= 1 - A x_n, \\
x_{n+1} &= x_n + x_n(h_n + h_n^2 + h_n^3).
\end{aligned}$$

If we take a look at our example (the inverse of π), starting as usual with x₀ = 0.31831, we find that x₁ has 26 good digits, x₂ has 103 good digits, x₃ has 413 good digits ... The sequence of correct digits for this example is then

26,103,413,1650,6601,26405,...

Quintic iteration

A quintic iteration, obtained by applying a general quintic modified iteration (we omit the details of the derivation, which follows from our essay on Newton's methods), is given by

$$\begin{aligned}
h_n &= 1 - A x_n, \\
x_{n+1} &= x_n + x_n(h_n + h_n^2 + h_n^3 + h_n^4) = x_n + x_n(1 + h_n^2)(h_n + h_n^2).
\end{aligned}$$

The rate of convergence is impressive: again on the previous example, x₁ has 32 correct digits, x₂ has 161 good digits, x₃ has 806 good digits ... The factored form of this iteration requires only 4 multiplications at each step: A×x_n, h_n×h_n, (1+h_n²)×(h_n+h_n²) and x_n×(1+h_n²)(h_n+h_n²).

Higher order

The inverse of a number can be computed with an iteration of any desired order. It is possible to show that the general algorithm of order r is given by

$$\begin{aligned}
h_n &= 1 - A x_n, \\
x_{n+1} &= x_n + x_n P^{r-1}(h_n),
\end{aligned}$$

where P^m(u) is a polynomial of degree m, given by

$$P^{m}(u) = \sum_{k=1}^{m} a_k u^k, \qquad a_k = 1.$$

For example, for m = 3, m = 4, m = 5, m = 6 and m = 7 (respectively quartic, quintic, sextic, septic and octic iterations)

$$\begin{aligned}
P^{3}(u) &= u+u^2+u^3, \\
P^{4}(u) &= u+u^2+u^3+u^4 = (1+u^2)(u+u^2), \\
P^{5}(u) &= u+u^2+u^3+u^4+u^5 = u+(1+u)(u^2+u^4), \\
P^{6}(u) &= u+u^2+u^3+u^4+u^5+u^6 = (1+u^2+u^4)(u+u^2), \\
P^{7}(u) &= u+u^2+u^3+u^4+u^5+u^6+u^7 = u+(1+u^3)(u^2+u^3+u^4)
\end{aligned}$$

...

In practice h_n tends to zero, so its leading digits vanish and the powers h_n^k can be evaluated with fewer and fewer significant digits as k increases. Observe that some of the factorizations given above reduce the number of multiplications required. A careful implementation of these high order iterations may be very efficient.
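The order-r pattern can be written once for all r. The sketch below uses exact rational arithmetic (Python's fractions module), chosen here only so that the order of convergence can be checked exactly; a real implementation would use truncated multiprecision numbers. The function name inverse_step is hypothetical.

```python
from fractions import Fraction

def inverse_step(a, x, r):
    """One step of order r: x + x*(h + h^2 + ... + h^(r-1)) with h = 1 - a*x."""
    h = 1 - a * x
    poly, power = Fraction(0), Fraction(1)
    for _ in range(r - 1):
        power *= h      # next power of h
        poly += power   # all coefficients a_k equal 1 for the inverse
    return x + x * poly

a, x = Fraction(7), Fraction(14286, 100000)
x_next = inverse_step(a, x, 5)  # one quintic step towards 1/7
```

With exact arithmetic the new residual 1 - A·x_{n+1} equals h_n^r, which is what "order r" means here.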

3 Square roots

The same technique as for the inverses applies for square roots.

3.1 Newton's iteration

Using the function f(x) = x²-A, Newton's iteration gives

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} = \frac{x_n}{2} + \frac{A}{2x_n}.$$

(In other words, each new iterate is the mean value of x_n and A/x_n.)

This iteration has one disadvantage: each step requires the division A/x_n. When one wants to compute 1/√A, a better iteration is obtained from the function f(x) = 1/x²-A, yielding the iteration

$$x_{n+1} = \frac{3}{2}x_n - \frac{1}{2}A x_n^3. \qquad (**)$$

This formula requires only multiplications! An easier way of computing a square root √A is then to compute B = 1/√A by this process and to form the product A×B = √A.

Note : as for the inverse, the best way to compute the iteration (**) is to write it in the form

$$\begin{aligned}
h_n &= 1 - A x_n^2, \\
x_{n+1} &= x_n + \frac{x_n}{2} h_n,
\end{aligned}$$

since cancellations occur in h_n = 1-Ax_n².
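A division-free square root along these lines might look as follows. This is a sketch assuming Python's decimal module at a fixed working precision; the function name sqrt_via_inverse and its parameters are hypothetical, and the only division is by the small integer 2.

```python
from decimal import Decimal, localcontext

def sqrt_via_inverse(a, x0, steps, prec):
    """Compute sqrt(a) as a*B, where B = 1/sqrt(a) comes from x <- x + x*h/2."""
    a, x = Decimal(a), Decimal(x0)
    with localcontext() as ctx:
        ctx.prec = prec
        for _ in range(steps):
            h = 1 - a * x * x   # residual, small once x is close to 1/sqrt(a)
            x = x + x * h / 2
        return a * x            # A * (1/sqrt(A)) = sqrt(A)

# Example call: sqrt(2) from the 4-digit starting value 0.7071 of 1/sqrt(2).
root = sqrt_via_inverse(2, "0.7071", 6, 60)
```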

3.2 A cubic iteration

Again, from Householder's iteration applied to the function f(x) = 1/x²-A, we find after some easy manipulations

$$x_{n+1} = \frac{x_n}{8}\left(15 - 10 A x_n^2 + 3 A^2 x_n^4\right).$$

This process converges cubically to the inverse of the square root of the number A (This form of the iteration is also given in [1] p. 217). Just like for the quadratic iteration, a more efficient way is to write it

$$\begin{aligned}
h_n &= 1 - A x_n^2, \\
x_{n+1} &= x_n + x_n\left(\frac{1}{2}h_n + \frac{3}{8}h_n^2\right) = x_n + \frac{x_n}{8}\left(4h_n + 3h_n^2\right).
\end{aligned}$$
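The equivalence of the two forms of the cubic iteration can be verified with exact rational arithmetic. The sketch below uses Python's fractions module; both function names are hypothetical.

```python
from fractions import Fraction

def cubic_rsqrt_expanded(a, x):
    """x/8 * (15 - 10*A*x^2 + 3*A^2*x^4), the expanded form."""
    y = a * x * x
    return x * (15 - 10 * y + 3 * y * y) / 8

def cubic_rsqrt_residual(a, x):
    """x + x/8 * (4h + 3h^2) with h = 1 - A*x^2, the residual form."""
    h = 1 - a * x * x
    return x + x * (4 * h + 3 * h * h) / 8

a, x = Fraction(2), Fraction(7071, 10000)
x_next = cubic_rsqrt_residual(a, x)  # one cubic step towards 1/sqrt(2)
```

The residual form is the one to prefer in practice, because h_n is small and its powers are cheap to form.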

3.3 High order iterations

Again as for the inverse, it is possible to find a quartic iteration by applying Newton's iteration twice to f(x) = 1/x²-A:

$$x_{n+1} = x_n - \frac{x_n}{16}\left(1 - A x_n^2\right)\left(-20 + 19 A x_n^2 - 8 A^2 x_n^4 + A^3 x_n^6\right).$$

This algorithm converges quartically to 1/√A.

It is interesting to compare this result to the direct application of the general quartic modified iteration to our function (see the essay on Newton's methods)

$$x_{n+1} = x_n - \frac{x_n}{16}\left(1 - A x_n^2\right)\left(-19 + 16 A x_n^2 - 5 A^2 x_n^4\right).$$

This simpler relation suggests that it may be more efficient to use the quartic iteration than twice the Newton iteration. The best form of this quartic algorithm is

$$\begin{aligned}
h_n &= 1 - A x_n^2, \\
x_{n+1} &= x_n + \frac{x_n}{16}\left(8h_n + 6h_n^2 + 5h_n^3\right).
\end{aligned}$$

As an application of the modified iterations, we provide the general expression of some high order iterations to compute square roots. The general pattern of a high order iteration of order r is given by

$$\begin{aligned}
h_n &= 1 - A x_n^2, \\
x_{n+1} &= x_n + x_n P^{r-1}(h_n),
\end{aligned}$$

with P^m(u) being a polynomial of degree m with rational coefficients. The first polynomials are given below with a factorization that saves some multiplications; the best choice depends on your implementation.

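Quintic iteration

$$P^{4}(u) = \frac{1}{128}\left(64u+48u^2+40u^3+35u^4\right) = \frac{1}{128}\left(64u+u^2\left(48+40u+35u^2\right)\right).$$

Sextic iteration

$$P^{5}(u) = \frac{1}{256}\left(128u+96u^2+80u^3+70u^4+63u^5\right) = \frac{1}{256}\left(128u+96u^2+u^3\left(80+70u+63u^2\right)\right).$$

Septic iteration

$$P^{6}(u) = \frac{1}{1024}\left(512u+384u^2+320u^3+280u^4+252u^5+231u^6\right) = \frac{1}{1024}\left(512u+384u^2+320u^3+u^3\left(280u+252u^2+231u^3\right)\right).$$

Octic iteration

$$P^{7}(u) = \frac{1}{2048}\left(1024u+768u^2+640u^3+560u^4+504u^5+462u^6+429u^7\right) = \frac{1}{2048}\left(1024u+768u^2+640u^3+560u^4+u^4\left(504u+462u^2+429u^3\right)\right).$$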

For example, to evaluate P⁷(u), you may compute in this order

u×u, u×u², u×u³, u⁴×(504u+462u²+429u³),

that is, four large multiplications. Do not forget that, in our case, u is small, so these multiplications are even faster to perform.
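This evaluation order can be written out explicitly. In the sketch below (exact rationals from Python's fractions module, hypothetical function name p7_factored), products by small integer coefficients are not counted as large multiplications.

```python
from fractions import Fraction

def p7_factored(u):
    """Evaluate the octic polynomial P^7(u) with four large multiplications."""
    u2 = u * u                                    # large multiplication 1
    u3 = u * u2                                   # large multiplication 2
    u4 = u * u3                                   # large multiplication 3
    tail = u4 * (504 * u + 462 * u2 + 429 * u3)   # large multiplication 4
    return (1024 * u + 768 * u2 + 640 * u3 + 560 * u4 + tail) / 2048

value = p7_factored(Fraction(1, 1000))
```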

Expression of the polynomials P^m(u)

The coefficients of the polynomials P^m(u) are in fact given by the series expansion

$$\frac{1}{\sqrt{1-u}} - 1 = \sum_{k=1}^{\infty} a_k u^k, \qquad a_k = \frac{2\,(2k-1)!}{k!\,(k-1)!\,4^k}.$$

This makes it easy to define an algorithm of any order to compute square roots.
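The closed form for a_k translates directly into code. The sketch below uses exact rationals (Python's fractions module) so the coefficients can be compared with the polynomials listed earlier; the names rsqrt_coefficient and rsqrt_step are hypothetical.

```python
from fractions import Fraction
from math import factorial

def rsqrt_coefficient(k):
    """a_k = 2(2k-1)! / (k! (k-1)! 4^k), the k-th coefficient of 1/sqrt(1-u) - 1."""
    return Fraction(2 * factorial(2 * k - 1),
                    factorial(k) * factorial(k - 1) * 4**k)

def rsqrt_step(a, x, r):
    """One order-r step towards 1/sqrt(a): x + x * sum_{k<r} a_k h^k."""
    h = 1 - a * x * x
    poly, power = Fraction(0), Fraction(1)
    for k in range(1, r):
        power *= h
        poly += rsqrt_coefficient(k) * power
    return x + x * poly

x_next = rsqrt_step(2, Fraction(7071, 10000), 5)  # one quintic step
```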

4 m-th roots

The previous techniques may be used to compute the m-th root (m being an integer) of the number A. To find the number A^-1/m to great accuracy, we apply Newton's method to the function f(x) = 1/x^m-A, which produces the process

$$\begin{aligned}
h_n &= 1 - A x_n^m, \\
x_{n+1} &= x_n + x_n \frac{h_n}{m},
\end{aligned}$$

which converges quadratically to A^-1/m. To find A^1/m, just compute, at the end of the N previous iterations,

$$y_N = A x_N^{m-1}.$$
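The whole process, including the final product, can be sketched as follows. It assumes Python's decimal module at a fixed working precision; the function name mth_root and its parameters are hypothetical.

```python
from decimal import Decimal, localcontext

def mth_root(a, m, x0, steps, prec):
    """Compute a^(1/m) from x -> a^(-1/m) using x <- x + x*h/m, h = 1 - a*x^m."""
    a, x = Decimal(a), Decimal(x0)
    with localcontext() as ctx:
        ctx.prec = prec
        for _ in range(steps):
            h = 1 - a * x**m
            x = x + x * h / m      # division by the small integer m only
        return a * x**(m - 1)      # A * A^(-(m-1)/m) = A^(1/m)

# Example call: cube root of 2, starting from 0.7937, close to 2^(-1/3).
cube_root = mth_root(2, 3, "0.7937", 7, 50)
```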

Householder's iteration also becomes

$$\begin{aligned}
h_n &= 1 - A x_n^m, \\
x_{n+1} &= x_n + x_n\left(\frac{h_n}{m} + \frac{1+m}{2m^2}h_n^2\right)
\end{aligned}$$

which converges cubically to A^-1/m.

4.1 High order iteration

The general pattern for any order iteration is given by

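$$\begin{aligned}
h_n &= 1 - A x_n^m, \\
x_{n+1} &= x_n + x_n P(h_n)
\end{aligned}$$

and to find P(u) you need to compute the following series expansion

$$(1-u)^{-1/m} - 1 = \frac{1}{m}u + \frac{1+m}{2m^2}u^2 + \frac{(1+m)(1+2m)}{3!\,m^3}u^3 + \cdots$$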

The series truncated at order k provides an algorithm of order k+1. Observe that for m = 1 we recover the polynomials used to compute the inverse of a number, and that m = 2 gives back the algorithms to find square roots.
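The coefficients of this expansion follow a simple recurrence: if c_k is the coefficient of u^k, then c_k = c_{k-1}·(1/m + k - 1)/k with c_0 = 1. A short sketch with exact rationals (Python's fractions module; the function name root_series_coefficients is hypothetical):

```python
from fractions import Fraction

def root_series_coefficients(m, order):
    """Coefficients c_1..c_order of (1-u)^(-1/m) - 1, by the binomial recurrence."""
    coeffs, c = [], Fraction(1)
    for k in range(1, order + 1):
        c = c * (Fraction(1, m) + k - 1) / k
        coeffs.append(c)
    return coeffs

inverse_coeffs = root_series_coefficients(1, 4)   # all equal to 1
sqrt_coeffs = root_series_coefficients(2, 3)      # 1/2, 3/8, 5/16
fourth_coeffs = root_series_coefficients(4, 3)    # 1/4, 5/32, 15/128
```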

4.1.1 Example

For m = 4 the following algorithm converges quartically to A^-1/4

$$\begin{aligned}
h_n &= 1 - A x_n^4, \\
x_{n+1} &= x_n + x_n\left(\frac{1}{4}h_n + \frac{5}{32}h_n^2 + \frac{15}{128}h_n^3\right)
\end{aligned}$$

and A^1/4 is given by

$$A x_n^3.$$

References

[1]: J.M. Borwein and P.B. Borwein, Pi and the AGM - A study in Analytic Number Theory and Computational Complexity, A Wiley-Interscience Publication, New York, (1987)
[2]: R.P. Brent, Fast multiple-precision evaluation of elementary functions, J. Assoc. Comput. Mach., (1976), vol. 23, p. 242-251
[3]: A.S. Householder, The Numerical Treatment of a Single Nonlinear Equation, McGraw-Hill, New York, (1970)
[4]: D.E. Knuth, The Art of Computer Programming, Vol. II, Seminumerical Algorithms, Addison Wesley, (1998)
[5]: I. Newton, Methodus fluxionum et serierum infinitarum, (1664-1671)
[6]: E. Salamin, Computation of π Using Arithmetic-Geometric Mean, Mathematics of Computation, (1976), vol. 30, p. 565-570

This is why it's important to find such algorithms.

File translated from TeX by TtH, version 2.32, on 5 Feb 2001, 19:06.