Newton's method and high order iterations

 


 

1  Introduction

It is of great importance to solve equations of the form

\[ f(x)=0 \]

in many applications in mathematics, physics, chemistry, ... and of course in the computation of some important mathematical constants or functions such as square roots. In this essay, we are only interested in one family of methods: Newton's methods.

1.1  Newton's approach

Around 1669, Isaac Newton (1643-1727) gave a new algorithm [3] to solve a polynomial equation, illustrated on the example $y^3-2y-5=0$. To find an accurate root of this equation, first one must guess a starting value, here $y \approx 2$. Then just write $y=2+p$ and after the substitution the equation becomes

\[ p^3+6p^2+10p-1=0. \]

Because $p$ is supposed to be small, we neglect $p^3+6p^2$ compared to $10p-1$, and the previous equation gives $p \approx 0.1$; therefore a better approximation of the root is $y \approx 2.1$. It's possible to repeat this process and write $p=0.1+q$; the substitution gives

\[ q^3+6.3q^2+11.23q+0.061=0, \]

hence $q \approx -0.061/11.23=-0.0054\ldots$ and a new approximation $y \approx 2.0946\ldots$ The process should be repeated until the required number of digits is reached.
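Newton's shift-and-linearize scheme is easy to mechanize. The following Python sketch (the helper `shift` is our own illustration, not Newton's notation) repeats the step above on the same cubic:

```python
def shift(coeffs, s):
    """Coefficients of Q(p) = P(s + p) for a cubic P(t)."""
    a3, a2, a1, a0 = coeffs
    return (a3,
            3*a3*s + a2,
            3*a3*s**2 + 2*a2*s + a1,
            a3*s**3 + a2*s**2 + a1*s + a0)

# y^3 - 2y - 5 = 0, starting guess y = 2
coeffs = shift((1.0, 0.0, -2.0, -5.0), 2.0)   # gives p^3 + 6p^2 + 10p - 1
y = 2.0
for _ in range(5):
    a3, a2, a1, a0 = coeffs
    step = -a0 / a1            # neglect the p^3 and p^2 terms
    y += step
    coeffs = shift(coeffs, step)
print(y)   # ≈ 2.0945514815423265, the real root of y^3 - 2y - 5
```

The first two passes of the loop reproduce Newton's values $0.1$ and $-0.0054\ldots$ exactly.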

In his method, Newton doesn't explicitly use the notion of derivative, and he applies the process only to polynomial equations.

1.2  Raphson's iteration

A few years later, in 1690, a new step was taken by Joseph Raphson (1678-1715), who proposed a method [6] which avoided the substitutions in Newton's approach. He illustrated his algorithm on the equation $x^3-bx+c=0$: starting with an approximation $x \approx g$ of a root, a better approximation is given by

\[ x \approx g+\frac{c+g^3-bg}{b-3g^2}. \]

Observe that the denominator of the fraction is the opposite of the derivative of the numerator.

This was the historical beginning of the very important Newton (or sometimes Newton-Raphson) algorithm.

1.3  Later studies

The method was then studied and generalized by other mathematicians like Simpson (1710-1761), Mourraille (1720-1808), Cauchy (1789-1857), Kantorovich (1912-1986), ... The important question of the choice of the starting point was first addressed by Mourraille in 1768, and the difficulty of making this choice is the main drawback of the algorithm (see also [5]).

2  Newton's method

Nowadays, Newton's method is a general process to find an accurate root of a single equation (or a system of equations) $f(x)=0$. We suppose that $f$ is a $C^2$ function on a given interval; then, using Taylor's expansion near $x$,

\[ f(x+h)=f(x)+hf'(x)+O(h^2), \]

and if we stop at the first order (linearization of the equation), we are looking for a small $h$ such that

\[ f(x+h)=0 \approx f(x)+hf'(x), \]

giving

\[ h=-\frac{f(x)}{f'(x)}, \qquad x+h=x-\frac{f(x)}{f'(x)}. \]

2.0.1  Newton's iteration

The Newton iteration is then given by the following procedure: start with an initial guess $x_0$ of the root, then compute the limit of the recurrence

\[ x_{n+1}=x_n-\frac{f(x_n)}{f'(x_n)}, \]

and Figure 1 is a geometrical interpretation of a single iteration of this formula.

 

Figure 1: One iteration of Newton's method
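In code, the iteration is only a few lines. Here is a minimal Python sketch (the stopping rule on the step size is our own choice):

```python
def newton(f, fprime, x0, tol=1e-14, max_iter=50):
    """Newton's iteration x_{n+1} = x_n - f(x_n)/f'(x_n)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) <= tol * max(1.0, abs(x)):
            break
    return x

# Newton's historical example y^3 - 2y - 5 = 0, starting from y = 2
root = newton(lambda y: y**3 - 2*y - 5, lambda y: 3*y**2 - 2, 2.0)
print(root)   # ≈ 2.0945514815423265
```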

Unfortunately, this iteration may fail to converge or may be unstable (we sometimes say chaotic) in many cases. However, we have two important theorems giving conditions to ensure the convergence of the process (see [2]).

2.0.2  Convergence's conditions

Theorem 1 Let $x^*$ be a root of $f(x)=0$, where $f$ is a $C^2$ function on an interval containing $x^*$, and suppose that $|f'(x^*)|>0$; then Newton's iteration will converge to $x^*$ if the starting point $x_0$ is close enough to $x^*$.

This theorem ensures that Newton's method will always converge if the initial point is sufficiently close to the root and if this root is not singular (that is, $f'(x^*)$ is nonzero). This process has the local convergence property. A more constructive theorem was given by Kantorovich; we give it in the particular case of a function of a single variable $x$.

Theorem 2 (Kantorovich). Let $f$ be a $C^2$ numerical function on an open interval $I$ of the real line, and let $x_0$ be a point in this interval. If there exist finite positive constants $(m_0,M_0,K_0)$ such that

\[ \left|\frac{1}{f'(x_0)}\right| < m_0, \qquad \left|\frac{f(x_0)}{f'(x_0)}\right| < M_0, \qquad |f''(x_0)| < K_0, \]

and if $h_0=2m_0M_0K_0 \le 1$, then Newton's iteration will converge to a root $x^*$ of $f(x)=0$, and

\[ |x_n-x^*| < 2^{1-n}M_0h_0^{2^n-1}. \]

This theorem gives sufficient conditions to ensure the existence of a root and the convergence of Newton's process. Moreover, if $h_0<1$, the last inequality shows that the convergence is quadratic (the number of correct digits doubles at each iteration). Note that if the starting point $x_0$ tends to the root $x^*$, the constant $M_0$ tends to zero and $h_0$ becomes smaller than 1, so the local convergence theorem is a consequence of Kantorovich's theorem.
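The digit-doubling behaviour is easy to observe numerically. Here is a sketch using Python's `decimal` module on $f(x)=x^2-2$, whose positive root is $\sqrt{2}$:

```python
from decimal import Decimal, getcontext

getcontext().prec = 80           # work with 80 significant digits
two = Decimal(2)
x = Decimal("1.4")               # close enough to sqrt(2) for local convergence
for _ in range(6):
    x = x - (x*x - two) / (2*x)  # Newton step for f(x) = x^2 - 2
    print(abs(x*x - two))        # the error is roughly squared at each step
```

After six steps the residual has shrunk from about $0.04$ down to the limit of the 80-digit working precision.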

3  Cubic iteration

Newton's iteration may be seen as a first order method (or linearization method); it's possible to go one step further and write the Taylor expansion of $f$ to a higher order

\[ f(x+h)=f(x)+hf'(x)+\frac{h^2}{2}f''(x)+O(h^3), \]

and we are looking for $h$ such that

\[ f(x+h)=0 \approx f(x)+hf'(x)+\frac{h^2}{2}f''(x); \]

we take the smallest solution for $h$ (we have to suppose that $f'(x)$ and $f''(x)$ are nonzero)

\[ h=-\frac{f'(x)}{f''(x)}\left(1-\sqrt{1-\frac{2f(x)f''(x)}{f'(x)^2}}\right). \]

It's not necessary to compute the square root, because if $f(x)$ is small, using the expansion

\[ 1-\sqrt{1-a}=\frac{a}{2}+\frac{a^2}{8}+O(a^3), \]

$h$ becomes

\[ h=-\frac{f(x)}{f'(x)}\left(1+\frac{f(x)f''(x)}{2f'(x)^2}+\cdots\right). \]

The first attempt to use the second order expansion is due to the astronomer Edmond Halley (1656-1742) in 1694.

3.0.3  Householder's iteration

The previous expression for $h$ allows us to derive the following cubic iteration (the number of correct digits triples at each iteration), starting with $x_0$:

\[ x_{n+1}=x_n-\frac{f(x_n)}{f'(x_n)}\left(1+\frac{f(x_n)f''(x_n)}{2f'(x_n)^2}\right). \]

This procedure is given in [1]. It can be used efficiently to compute the inverse or the square root of a number.
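For instance, applying this cubic iteration to $f(x)=1/x-a$ (whose root is $1/a$) and simplifying, the step needs no division at all: writing $e_n=1-ax_n$, it reduces to $x_{n+1}=x_n(1+e_n+e_n^2)$, and an easy computation shows that the residual satisfies $e_{n+1}=e_n^3$ exactly. A Python sketch of this division-free reciprocal:

```python
def reciprocal(a, x0, n=5):
    """Division-free cubic iteration for 1/a, obtained by applying
    Householder's cubic iteration to f(x) = 1/x - a.
    With e = 1 - a*x, one step is x <- x*(1 + e + e*e),
    and the residual e is exactly cubed at each step."""
    x = x0
    for _ in range(n):
        e = 1.0 - a * x
        x = x * (1.0 + e + e * e)
    return x

print(reciprocal(3.0, 0.3))   # ≈ 0.3333333333333333
```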

Another similar cubic iteration may be given by

\[ x_{n+1}=x_n-\frac{2f(x_n)f'(x_n)}{2f'(x_n)^2-f(x_n)f''(x_n)}, \]

sometimes known as Halley's method. We may also write it as

\[ x_{n+1}=x_n-\frac{f(x_n)}{f'(x_n)-f(x_n)f''(x_n)/(2f'(x_n))}=x_n-\frac{f(x_n)}{f'(x_n)}\left(1-\frac{f(x_n)f''(x_n)}{2f'(x_n)^2}\right)^{-1}. \]

Note that if we replace $(1-a)^{-1}$ by $1+a+O(a^2)$, we retrieve Householder's iteration.
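A direct Python sketch of Halley's method, tried here on the cube root of 17 (our own choice of test equation):

```python
def halley(f, df, d2f, x0, n=10):
    """Halley's cubic iteration
    x_{k+1} = x_k - 2 f f' / (2 f'^2 - f f'')."""
    x = x0
    for _ in range(n):
        fx, dfx, d2fx = f(x), df(x), d2f(x)
        x -= 2*fx*dfx / (2*dfx*dfx - fx*d2fx)
    return x

# cube root of 17 via f(x) = x^3 - 17
r = halley(lambda x: x**3 - 17, lambda x: 3*x**2, lambda x: 6*x, 2.0)
print(r)   # ≈ 2.5712815906...
```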

4  High order iterations

4.1  Householder's methods

Under some regularity conditions on $f$ and its derivatives, Householder gave in [1] the general iteration

\[ x_{n+1}=x_n+(p+1)\left.\frac{(1/f)^{(p)}}{(1/f)^{(p+1)}}\right|_{x_n}, \]

where $p$ is an integer and $(1/f)^{(p)}$ is the derivative of order $p$ of the inverse of the function $f$. This iteration has convergence of order $p+2$. For example, $p=0$ has quadratic convergence (order 2) and the formula gives back Newton's iteration, while $p=1$ has cubic convergence (order 3) and gives again Halley's method. Just like Newton's method, a good starting point is required to ensure convergence.

Using the iteration with $p=2$ gives the following iteration, which has quartic convergence (order 4):

\[ x_{n+1}=x_n-f(x_n)\,\frac{f'(x_n)^2-f(x_n)f''(x_n)/2}{f'(x_n)^3-f(x_n)f'(x_n)f''(x_n)+f^{(3)}(x_n)f(x_n)^2/6}. \]
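A sketch of this order-4 formula in Python, applied to $f(x)=x^2-2$ for illustration (so $f^{(3)}=0$ here):

```python
def householder4(f, d1, d2, d3, x0, n=6):
    """Householder's iteration with p = 2 (quartic convergence)."""
    x = x0
    for _ in range(n):
        fx, f1, f2, f3 = f(x), d1(x), d2(x), d3(x)
        num = f1*f1 - fx*f2/2
        den = f1**3 - fx*f1*f2 + f3*fx*fx/6
        x -= fx * num / den
    return x

# square root of 2 via f(x) = x^2 - 2
r = householder4(lambda x: x*x - 2, lambda x: 2*x,
                 lambda x: 2.0, lambda x: 0.0, 1.5)
print(r)   # ≈ 1.4142135623730951
```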

4.2  Modified methods

Another idea is to write

\[ x_{n+1}=x_n+h_n+a_{2n}\frac{h_n^2}{2!}+a_{3n}\frac{h_n^3}{3!}+\cdots, \]

where $h_n=-f(x_n)/f'(x_n)$ is given by the simple Newton iteration and $(a_{2n},a_{3n},\ldots)$ are real parameters which we will estimate in order to minimize the value of $f(x_{n+1})$:

\[ f(x_{n+1})=f\left(x_n+h_n+a_{2n}\frac{h_n^2}{2!}+a_{3n}\frac{h_n^3}{3!}+\cdots\right), \]

we assume that $f$ is regular enough and that $h_n+a_{2n}h_n^2/2!+a_{3n}h_n^3/3!+\cdots$ is small; hence, using the expansion of $f$ near $x_n$,

\[ f(x_{n+1})=f(x_n)+\left(h_n+a_{2n}\frac{h_n^2}{2!}+a_{3n}\frac{h_n^3}{3!}+\cdots\right)f'(x_n)+\left(h_n+a_{2n}\frac{h_n^2}{2!}+a_{3n}\frac{h_n^3}{3!}+\cdots\right)^2\frac{f''(x_n)}{2}+\cdots, \]

and because $f(x_n)+h_nf'(x_n)=0$, we have

\[ f(x_{n+1})=\left(a_{2n}f'(x_n)+f''(x_n)\right)\frac{h_n^2}{2!}+\left(a_{3n}f'(x_n)+3a_{2n}f''(x_n)+f^{(3)}(x_n)\right)\frac{h_n^3}{3!}+O(h_n^4). \]

A good choice for the $a_{in}$ is clearly to cancel as many terms as possible in the previous expansion, so we impose

\begin{align*}
a_{2n} &= -\frac{f''_n}{f'_n},\\
a_{3n} &= \frac{-f'_nf^{(3)}_n+3\left(f''_n\right)^2}{\left(f'_n\right)^2},\\
a_{4n} &= \frac{-\left(f'_n\right)^2f^{(4)}_n+10f'_nf''_nf^{(3)}_n-15\left(f''_n\right)^3}{\left(f'_n\right)^3},\\
a_{5n} &= \frac{-\left(f'_n\right)^3f^{(5)}_n+15\left(f'_n\right)^2f''_nf^{(4)}_n+10\left(f'_n\right)^2\left(f^{(3)}_n\right)^2-105f'_n\left(f''_n\right)^2f^{(3)}_n+105\left(f''_n\right)^4}{\left(f'_n\right)^4},\\
a_{6n} &= \cdots
\end{align*}

with, for the sake of brevity, the notation $f^{(k)}_n=f^{(k)}(x_n)$. The formal values of the $a_{in}$ may be computed for much larger values of $i$. Finally, the general iteration is


\begin{align*}
x_{n+1} &= x_n+h_n\left(1+a_{2n}\frac{h_n}{2!}+a_{3n}\frac{h_n^2}{3!}+a_{4n}\frac{h_n^3}{4!}+\cdots\right)\\
 &= x_n-\frac{f(x_n)}{f'(x_n)}\left(1+\frac{f''(x_n)}{2!\,f'(x_n)}\left(\frac{f(x_n)}{f'(x_n)}\right)+\frac{3f''(x_n)^2-f'(x_n)f^{(3)}(x_n)}{3!\,f'(x_n)^2}\left(\frac{f(x_n)}{f'(x_n)}\right)^2+\cdots\right).
\end{align*}

For example, if we stop at $a_{3n}$ and set $a_{4n}=a_{5n}=\cdots=0$, we have the helpful quartic modified iteration (note that this iteration is different from the previous Householder quartic method)

\[ x_{n+1}=x_n-\frac{f(x_n)}{f'(x_n)}\left(1+\frac{f(x_n)f''(x_n)}{2!\,f'(x_n)^2}+\frac{f(x_n)^2\left(3f''(x_n)^2-f'(x_n)f^{(3)}(x_n)\right)}{3!\,f'(x_n)^4}\right), \]

and if we omit $a_{3n}$, we retrieve Householder's cubic iteration.
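A Python sketch of this modified quartic iteration, again on $f(x)=x^2-2$ for illustration:

```python
def modified_quartic(f, d1, d2, d3, x0, n=5):
    """Modified Newton iteration truncated at a_{3n} (order 4)."""
    x = x0
    for _ in range(n):
        fx, f1, f2, f3 = f(x), d1(x), d2(x), d3(x)
        u = fx / f1                  # Newton correction (this is -h_n)
        x -= u * (1 + fx*f2/(2*f1*f1)
                    + fx*fx*(3*f2*f2 - f1*f3)/(6*f1**4))
    return x

r = modified_quartic(lambda x: x*x - 2, lambda x: 2*x,
                     lambda x: 2.0, lambda x: 0.0, 1.5)
print(r)   # ≈ 1.4142135623730951
```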

It's also possible to find the expressions for $(a_{4n},a_{5n},a_{6n},a_{7n},\ldots)$ and define quintic, sextic, septic, octic, ... iterations.

5  Examples

In the essay on the square root of two, some applications of those methods are given, and also in the essay on how to compute the inverse or the square root of a number. Here, let's compare the different iterations on the computation of the Golden Ratio

\[ \varphi=\frac{1+\sqrt{5}}{2}. \]

First we write $\varphi=1/2+x$, so that $x=\sqrt{5}/2$ is a root of the equation

\[ \frac{1}{x^2}-\frac{4}{5}=0; \]

then it's convenient to set $h_n=1-\frac{4}{5}x_n^2$. We have the following 5 algorithms, deduced from the previous general methods (left as an exercise!), with respectively quadratic, cubic, quartic, sextic, and octic convergence:


\begin{align*}
x_{n+1} &= x_n+\tfrac{1}{2}x_nh_n &&\text{(Newton)},\\
x_{n+1} &= x_n+\tfrac{1}{8}x_nh_n\left(4+3h_n\right) &&\text{(Householder)},\\
x_{n+1} &= x_n+\tfrac{1}{16}x_nh_n\left(8+6h_n+5h_n^2\right) &&\text{(Quartic)},\\
x_{n+1} &= x_n+\tfrac{1}{256}x_nh_n\left(128+96h_n+h_n^2(80+70h_n+63h_n^2)\right) &&\text{(Sextic)},\\
x_{n+1} &= x_n+\tfrac{1}{2048}x_nh_n\left(1024+768h_n+h_n^2\left(640+560h_n+h_n^2(504+462h_n+429h_n^2)\right)\right) &&\text{(Octic)}.
\end{align*}

Starting with the initial value $x_0=1.118$, the first iteration gives respectively

\begin{align*}
\varphi_1^{\text{Newton}} &= x_1+1/2=1.61803398(720\ldots) &&\text{8 digits},\\
\varphi_1^{\text{Householder}} &= x_1+1/2=1.6180339887498(163\ldots) &&\text{13 digits},\\
\varphi_1^{\text{Quartic}} &= x_1+1/2=1.61803398874989484(402\ldots) &&\text{17 digits},\\
\varphi_1^{\text{Sextic}} &= x_1+1/2=1.6180339887498948482045868(216\ldots) &&\text{25 digits},\\
\varphi_1^{\text{Octic}} &= x_1+1/2=1.618033988749894848204586834365638(076\ldots) &&\text{33 digits},
\end{align*}

and one more step gives for $x_2$ respectively 17, 39, 69, 154, and 273 correct digits. If we iterate just three more steps, $x_5$ will give respectively 139, 1049, 4406, 33321, and 140053 correct digits ... Observe that $h_n$ tends to zero as $n$ increases, which may be used to accelerate the computations; the choice of the best algorithm depends on your implementation (the algorithm used for multiplication, squaring, ...).
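These iterations are straightforward to program with Python's `decimal` module; here is a sketch of the octic one:

```python
from decimal import Decimal, getcontext

getcontext().prec = 120
PHI = (1 + Decimal(5).sqrt()) / 2        # reference value of the Golden Ratio

def octic_step(x):
    """One octic step toward sqrt(5)/2, with h = 1 - (4/5) x^2."""
    h = 1 - Decimal(4)/5 * x * x
    return x + x*h*(1024 + 768*h + h*h*(640 + 560*h
                 + h*h*(504 + 462*h + 429*h*h))) / 2048

x = Decimal("1.118")
for _ in range(3):
    x = octic_step(x)
phi = x + Decimal(1)/2
print(phi)   # agrees with PHI to more than 100 digits
```

With order-8 convergence, two steps already exhaust the 120-digit working precision.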

6  Newton's method for several variables

Newton's method may also be used to find a root of a system of two or more nonlinear equations

\begin{align*}
f(x,y) &= 0\\
g(x,y) &= 0,
\end{align*}

where $f$ and $g$ are $C^2$ functions on a given domain. Using Taylor's expansion of the two functions near $(x,y)$, we find


\begin{align*}
f(x+h,y+k) &= f(x,y)+h\frac{\partial f}{\partial x}+k\frac{\partial f}{\partial y}+O(h^2+k^2)\\
g(x+h,y+k) &= g(x,y)+h\frac{\partial g}{\partial x}+k\frac{\partial g}{\partial y}+O(h^2+k^2),
\end{align*}

and if we keep only the first order terms, we are looking for a couple $(h,k)$ such that

\begin{align*}
f(x+h,y+k) &= 0 \approx f(x,y)+h\frac{\partial f}{\partial x}+k\frac{\partial f}{\partial y}\\
g(x+h,y+k) &= 0 \approx g(x,y)+h\frac{\partial g}{\partial x}+k\frac{\partial g}{\partial y};
\end{align*}

hence it's equivalent to the linear system








\[ \begin{pmatrix} \dfrac{\partial f}{\partial x} & \dfrac{\partial f}{\partial y}\\[2mm] \dfrac{\partial g}{\partial x} & \dfrac{\partial g}{\partial y} \end{pmatrix}\begin{pmatrix} h\\ k \end{pmatrix}=-\begin{pmatrix} f(x,y)\\ g(x,y) \end{pmatrix}. \]

The (2×2) matrix is called the Jacobian matrix (or Jacobian) and is sometimes denoted


\[ J(x,y)=\begin{pmatrix} \dfrac{\partial f}{\partial x} & \dfrac{\partial f}{\partial y}\\[2mm] \dfrac{\partial g}{\partial x} & \dfrac{\partial g}{\partial y} \end{pmatrix} \]

(its generalization as an $(N\times N)$ matrix for a system of $N$ equations and $N$ variables is immediate). This suggests defining the new process




\[ \begin{pmatrix} x_{n+1}\\ y_{n+1} \end{pmatrix}=\begin{pmatrix} x_n\\ y_n \end{pmatrix}-J^{-1}(x_n,y_n)\begin{pmatrix} f(x_n,y_n)\\ g(x_n,y_n) \end{pmatrix}, \]

starting with an initial guess $(x_0,y_0)$. Under certain conditions (which are not so easy to check, and this is again the main disadvantage of the method), it's possible to show that this process converges to a root of the system. The convergence remains quadratic.

Example 3 We are looking for a root near $(x_0=-0.6,\;y_0=0.6)$ of the following system

\begin{align*}
f(x,y) &= x^3-3xy^2-1\\
g(x,y) &= 3x^2y-y^3;
\end{align*}

here the Jacobian and its inverse become

\[ J(x_n,y_n)=3\begin{pmatrix} x_n^2-y_n^2 & -2x_ny_n\\ 2x_ny_n & x_n^2-y_n^2 \end{pmatrix}, \qquad J^{-1}(x_n,y_n)=\frac{1}{3\left(x_n^2+y_n^2\right)^2}\begin{pmatrix} x_n^2-y_n^2 & 2x_ny_n\\ -2x_ny_n & x_n^2-y_n^2 \end{pmatrix}, \]
and the process gives

\begin{align*}
x_1 &= -0.40000000000000000000, & y_1 &= 0.86296296296296296296\\
x_2 &= -0.50478978186242263605, & y_2 &= 0.85646430512069295697\\
x_3 &= -0.49988539803643124722, & y_3 &= 0.86603764032215486664\\
x_4 &= -0.50000000406150565266, & y_4 &= 0.86602539113638168322\\
x_5 &= -0.49999999999999983928, & y_5 &= 0.86602540378443871965\\
 &\;\;\vdots
\end{align*}
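The two-dimensional process is easy to program; the following Python sketch (solving the (2×2) linear system by Cramer's rule) reproduces the iterates above in double precision:

```python
def newton2(f, g, jac, x0, y0, n=8):
    """Newton's method for the system f(x,y) = 0, g(x,y) = 0;
    jac(x, y) returns the Jacobian entries (fx, fy, gx, gy)."""
    x, y = x0, y0
    for _ in range(n):
        fx, fy, gx, gy = jac(x, y)
        det = fx*gy - fy*gx
        fv, gv = f(x, y), g(x, y)
        # (h, k) = -J^{-1} (f, g), via Cramer's rule
        h = -(gy*fv - fy*gv) / det
        k = -(fx*gv - gx*fv) / det
        x, y = x + h, y + k
    return x, y

x, y = newton2(lambda x, y: x**3 - 3*x*y**2 - 1,
               lambda x, y: 3*x**2*y - y**3,
               lambda x, y: (3*(x*x - y*y), -6*x*y,
                             6*x*y, 3*(x*x - y*y)),
               -0.6, 0.6)
print(x, y)   # ≈ (-0.5, 0.866025403784...)
```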

Depending on your initial guess, Newton's process may converge to one of the three roots of the system:

\[ \left(-\frac{1}{2},\frac{\sqrt{3}}{2}\right),\qquad\left(-\frac{1}{2},-\frac{\sqrt{3}}{2}\right),\qquad(1,0), \]

and for some values of $(x_0,y_0)$ the convergence of the process may be tricky! The study of the influence of this initial guess leads to aesthetically pleasing fractal pictures.
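One can sketch this basin structure numerically: the function below says which of the three roots Newton's process reaches from a given starting point (the grid scan that produces the actual fractal picture is left out):

```python
import math

def newton_z3(x, y, n=50):
    """Newton iteration for the system above (i.e. z^3 = 1 in real
    coordinates); returns the limit point, or None on breakdown."""
    for _ in range(n):
        f = x**3 - 3*x*y**2 - 1
        g = 3*x**2*y - y**3
        a, b = 3*(x*x - y*y), -6*x*y     # Jacobian is [[a, b], [-b, a]]
        det = a*a + b*b                  # = 9 (x^2 + y^2)^2
        if det == 0:
            return None
        x, y = x - (a*f - b*g)/det, y - (b*f + a*g)/det
    return x, y

ROOTS = [(1.0, 0.0), (-0.5, math.sqrt(3)/2), (-0.5, -math.sqrt(3)/2)]

def basin(x, y):
    """Index in ROOTS of the root reached from (x, y), or -1."""
    p = newton_z3(x, y)
    if p is None:
        return -1
    return min(range(3), key=lambda i: (p[0]-ROOTS[i][0])**2
                                     + (p[1]-ROOTS[i][1])**2)

print(basin(-0.6, 0.6))   # → 1, the root (-1/2, sqrt(3)/2)
```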

Cubic convergence also exists for systems of equations in several variables: Chebyshev's method.

References

[1]
A. S. Householder, The Numerical Treatment of a Single Nonlinear Equation, McGraw-Hill, New York, (1970)

[2]
L.V. Kantorovich and G.P. Akilov, Functional Analysis in Normed Spaces, Pergamon Press, Elmsford, New York, (1964)

[3]
I. Newton, Methodus fluxionum et serierum infinitarum, (1664-1671)

[4]
J.M. Ortega and W.C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, (1970)

[5]
H. Qiu, A Robust Examination of the Newton-Raphson Method with Strong Global Convergence Properties, Master's Thesis, University of Central Florida, (1993)

[6]
J. Raphson, Analysis Aequationum universalis, London, (1690)


