Newton's method and high order iterations

 


 

1  Introduction

It is of great importance to solve equations of the form

\[ f(x)=0 \]

in many applications in mathematics, physics, chemistry, ... and of course in the computation of some important mathematical constants or functions such as square roots. In this essay, we are only interested in one family of methods: Newton's methods.

1.1  Newton's approach

Around 1669, Isaac Newton (1643-1727) gave a new algorithm [3] to solve a polynomial equation, illustrated on the example $y^3-2y-5=0$. To find an accurate root of this equation, first one must guess a starting value, here $y \approx 2$. Then just write $y=2+p$ and after the substitution the equation becomes

\[ p^3+6p^2+10p-1=0. \]

Because $p$ is supposed to be small, we neglect $p^3+6p^2$ compared to $10p-1$, and the previous equation gives $p \approx 0.1$; therefore a better approximation of the root is $y \approx 2.1$. It's possible to repeat this process and write $p=0.1+q$; the substitution gives

\[ q^3+6.3q^2+11.23q+0.061=0, \]

hence $q \approx -0.061/11.23=-0.0054\ldots$ and a new approximation $y \approx 2.0946\ldots$ The process should be repeated until the required number of digits is reached.
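Newton's shift-and-linearize scheme is easy to mechanize. The following Python sketch (the helper `shift` is our own illustration, not Newton's notation) repeats the step above on the same cubic:

```python
def shift(coeffs, s):
    """Coefficients of Q(p) = P(s + p) for a cubic P(t)."""
    a3, a2, a1, a0 = coeffs
    return (a3,
            3*a3*s + a2,
            3*a3*s**2 + 2*a2*s + a1,
            a3*s**3 + a2*s**2 + a1*s + a0)

# y^3 - 2y - 5 = 0, starting guess y = 2
coeffs = shift((1.0, 0.0, -2.0, -5.0), 2.0)   # gives p^3 + 6p^2 + 10p - 1
y = 2.0
for _ in range(5):
    a3, a2, a1, a0 = coeffs
    step = -a0 / a1            # neglect the p^3 and p^2 terms
    y += step
    coeffs = shift(coeffs, step)
print(y)   # ≈ 2.0945514815423265, the real root of y^3 - 2y - 5
```

The first two passes of the loop reproduce Newton's values $0.1$ and $-0.0054\ldots$ exactly.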

In his method, Newton doesn't explicitly use the notion of derivative, and he applies the process only to polynomial equations.

1.2  Raphson's iteration

A few years later, in 1690, a new step was taken by Joseph Raphson (1678-1715), who proposed a method [6] which avoided the substitutions in Newton's approach. He illustrated his algorithm on the equation $x^3-bx+c=0$: starting with an approximation $x \approx g$ of a root, a better approximation is given by

\[ x \approx g+\frac{c+g^3-bg}{b-3g^2}. \]

Observe that the denominator of the fraction is the opposite of the derivative of the numerator.

This was the historical beginning of the very important Newton (or sometimes Newton-Raphson) algorithm.

1.3  Later studies

The method was then studied and generalized by other mathematicians like Simpson (1710-1761), Mourraille (1720-1808), Cauchy (1789-1857), Kantorovich (1912-1986), ... The important question of the choice of the starting point was first addressed by Mourraille in 1768, and the difficulty of making this choice is the main drawback of the algorithm (see also [5]).

2  Newton's method

Nowadays, Newton's method is a general process to find an accurate root of a single equation (or a system of equations) $f(x)=0$. We suppose that $f$ is a $C^2$ function on a given interval; then, using Taylor's expansion near $x$,

\[ f(x+h)=f(x)+hf'(x)+O(h^2), \]

and if we stop at the first order (linearization of the equation), we are looking for a small $h$ such that

\[ f(x+h)=0 \approx f(x)+hf'(x), \]

giving

\[ h=-\frac{f(x)}{f'(x)}, \qquad x+h=x-\frac{f(x)}{f'(x)}. \]

2.0.1  Newton's iteration

The Newton iteration is then given by the following procedure: start with an initial guess $x_0$ of the root, then compute the limit of the recurrence

\[ x_{n+1}=x_n-\frac{f(x_n)}{f'(x_n)}, \]

and Figure 1 is a geometrical interpretation of a single iteration of this formula.

 

Figure 1: One iteration of Newton's method
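In code, the iteration is only a few lines. Here is a minimal Python sketch (the stopping rule on the step size is our own choice):

```python
def newton(f, fprime, x0, tol=1e-14, max_iter=50):
    """Newton's iteration x_{n+1} = x_n - f(x_n)/f'(x_n)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) <= tol * max(1.0, abs(x)):
            break
    return x

# Newton's historical example y^3 - 2y - 5 = 0, starting from y = 2
root = newton(lambda y: y**3 - 2*y - 5, lambda y: 3*y**2 - 2, 2.0)
print(root)   # ≈ 2.0945514815423265
```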

Unfortunately, this iteration may fail to converge or may be unstable (we sometimes say chaotic) in many cases. However, we have two important theorems giving conditions to ensure the convergence of the process (see [2]).

2.0.2  Convergence's conditions

Theorem 1 Let $x^*$ be a root of $f(x)=0$, where $f$ is a $C^2$ function on an interval containing $x^*$, and suppose that $|f'(x^*)|>0$; then Newton's iteration will converge to $x^*$ if the starting point $x_0$ is close enough to $x^*$.

This theorem ensures that Newton's method will always converge if the initial point is sufficiently close to the root and if this root is not singular (that is, $f'(x^*)$ is nonzero). This process has the local convergence property. A more constructive theorem was given by Kantorovich; we give it in the particular case of a function of a single variable $x$.

Theorem 2 (Kantorovich). Let $f$ be a $C^2$ numerical function on an open interval $I$ of the real line, and let $x_0$ be a point in this interval. If there exist finite positive constants $(m_0,M_0,K_0)$ such that

\[ \left|\frac{1}{f'(x_0)}\right| < m_0, \qquad \left|\frac{f(x_0)}{f'(x_0)}\right| < M_0, \qquad |f''(x_0)| < K_0, \]

and if $h_0=2m_0M_0K_0 \le 1$, then Newton's iteration will converge to a root $x^*$ of $f(x)=0$, and

\[ |x_n-x^*| < 2^{1-n}M_0h_0^{2^n-1}. \]

This theorem gives sufficient conditions to ensure the existence of a root and the convergence of Newton's process. Moreover, if $h_0<1$, the last inequality shows that the convergence is quadratic (the number of correct digits doubles at each iteration). Note that if the starting point $x_0$ tends to the root $x^*$, the constant $M_0$ tends to zero and $h_0$ becomes smaller than 1, so the local convergence theorem is a consequence of Kantorovich's theorem.
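The digit-doubling behaviour is easy to observe numerically. Here is a sketch using Python's `decimal` module on $f(x)=x^2-2$, whose positive root is $\sqrt{2}$:

```python
from decimal import Decimal, getcontext

getcontext().prec = 80           # work with 80 significant digits
two = Decimal(2)
x = Decimal("1.4")               # close enough to sqrt(2) for local convergence
for _ in range(6):
    x = x - (x*x - two) / (2*x)  # Newton step for f(x) = x^2 - 2
    print(abs(x*x - two))        # the error is roughly squared at each step
```

After six steps the residual has shrunk from about $0.04$ down to the limit of the 80-digit working precision.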

3  Cubic iteration

Newton's iteration may be seen as a first order method (or linearization method); it's possible to go one step further and write the Taylor expansion of $f$ to a higher order

\[ f(x+h)=f(x)+hf'(x)+\frac{h^2}{2}f''(x)+O(h^3), \]

and we are looking for $h$ such that

\[ f(x+h)=0 \approx f(x)+hf'(x)+\frac{h^2}{2}f''(x); \]

we take the smallest solution for $h$ (we have to suppose that $f'(x)$ and $f''(x)$ are nonzero)

\[ h=-\frac{f'(x)}{f''(x)}\left(1-\sqrt{1-\frac{2f(x)f''(x)}{f'(x)^2}}\right). \]

It's not necessary to compute the square root, because if $f(x)$ is small, using the expansion

\[ 1-\sqrt{1-a}=\frac{a}{2}+\frac{a^2}{8}+O(a^3), \]

$h$ becomes

\[ h=-\frac{f(x)}{f'(x)}\left(1+\frac{f(x)f''(x)}{2f'(x)^2}+\cdots\right). \]

The first attempt to use the second order expansion is due to the astronomer Edmond Halley (1656-1742) in 1694.

3.0.3  Householder's iteration

The previous expression for $h$ allows us to derive the following cubic iteration (the number of correct digits triples at each iteration), starting with $x_0$:

\[ x_{n+1}=x_n-\frac{f(x_n)}{f'(x_n)}\left(1+\frac{f(x_n)f''(x_n)}{2f'(x_n)^2}\right). \]

This procedure is given in [1]. It can be used efficiently to compute the inverse or the square root of a number.
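For instance, applying this cubic iteration to $f(x)=1/x-a$ (whose root is $1/a$) and simplifying, the step needs no division at all: writing $e_n=1-ax_n$, it reduces to $x_{n+1}=x_n(1+e_n+e_n^2)$, and an easy computation shows that the residual satisfies $e_{n+1}=e_n^3$ exactly. A Python sketch of this division-free reciprocal:

```python
def reciprocal(a, x0, n=5):
    """Division-free cubic iteration for 1/a, obtained by applying
    Householder's cubic iteration to f(x) = 1/x - a.
    With e = 1 - a*x, one step is x <- x*(1 + e + e*e),
    and the residual e is exactly cubed at each step."""
    x = x0
    for _ in range(n):
        e = 1.0 - a * x
        x = x * (1.0 + e + e * e)
    return x

print(reciprocal(3.0, 0.3))   # ≈ 0.3333333333333333
```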

Another similar cubic iteration may be given by

\[ x_{n+1}=x_n-\frac{2f(x_n)f'(x_n)}{2f'(x_n)^2-f(x_n)f''(x_n)}, \]

sometimes known as Halley's method. We may also write it as

\[ x_{n+1}=x_n-\frac{f(x_n)}{f'(x_n)-f(x_n)f''(x_n)/(2f'(x_n))}=x_n-\frac{f(x_n)}{f'(x_n)}\left(1-\frac{f(x_n)f''(x_n)}{2f'(x_n)^2}\right)^{-1}. \]

Note that if we replace $(1-a)^{-1}$ by $1+a+O(a^2)$, we retrieve Householder's iteration.
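A direct Python sketch of Halley's method, tried here on the cube root of 17 (our own choice of test equation):

```python
def halley(f, df, d2f, x0, n=10):
    """Halley's cubic iteration
    x_{k+1} = x_k - 2 f f' / (2 f'^2 - f f'')."""
    x = x0
    for _ in range(n):
        fx, dfx, d2fx = f(x), df(x), d2f(x)
        x -= 2*fx*dfx / (2*dfx*dfx - fx*d2fx)
    return x

# cube root of 17 via f(x) = x^3 - 17
r = halley(lambda x: x**3 - 17, lambda x: 3*x**2, lambda x: 6*x, 2.0)
print(r)   # ≈ 2.5712815906...
```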

4  High order iterations

4.1  Householder's methods

Under some regularity conditions on $f$ and its derivatives, Householder gave in [1] the general iteration

\[ x_{n+1}=x_n+(p+1)\left.\frac{(1/f)^{(p)}}{(1/f)^{(p+1)}}\right|_{x_n}, \]

where $p$ is an integer and $(1/f)^{(p)}$ is the derivative of order $p$ of the inverse of the function $f$. This iteration has convergence of order $p+2$. For example, $p=0$ has quadratic convergence (order 2) and the formula gives back Newton's iteration, while $p=1$ has cubic convergence (order 3) and gives again Halley's method. Just like Newton's method, a good starting point is required to ensure convergence.

Using the iteration with $p=2$ gives the following iteration, which has quartic convergence (order 4):

\[ x_{n+1}=x_n-f(x_n)\,\frac{f'(x_n)^2-f(x_n)f''(x_n)/2}{f'(x_n)^3-f(x_n)f'(x_n)f''(x_n)+f^{(3)}(x_n)f(x_n)^2/6}. \]
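A sketch of this order-4 formula in Python, applied to $f(x)=x^2-2$ for illustration (so $f^{(3)}=0$ here):

```python
def householder4(f, d1, d2, d3, x0, n=6):
    """Householder's iteration with p = 2 (quartic convergence)."""
    x = x0
    for _ in range(n):
        fx, f1, f2, f3 = f(x), d1(x), d2(x), d3(x)
        num = f1*f1 - fx*f2/2
        den = f1**3 - fx*f1*f2 + f3*fx*fx/6
        x -= fx * num / den
    return x

# square root of 2 via f(x) = x^2 - 2
r = householder4(lambda x: x*x - 2, lambda x: 2*x,
                 lambda x: 2.0, lambda x: 0.0, 1.5)
print(r)   # ≈ 1.4142135623730951
```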

4.2  Modified methods

Another idea is to write

\[ x_{n+1}=x_n+h_n+a_{2n}\frac{h_n^2}{2!}+a_{3n}\frac{h_n^3}{3!}+\cdots, \]

where $h_n=-f(x_n)/f'(x_n)$ is given by the simple Newton iteration and $(a_{2n},a_{3n},\ldots)$ are real parameters which we will estimate in order to minimize the value of $f(x_{n+1})$:

\[ f(x_{n+1})=f\left(x_n+h_n+a_{2n}\frac{h_n^2}{2!}+a_{3n}\frac{h_n^3}{3!}+\cdots\right), \]

we assume that $f$ is regular enough and that $h_n+a_{2n}h_n^2/2!+a_{3n}h_n^3/3!+\cdots$ is small; hence, using the expansion of $f$ near $x_n$,

\[ f(x_{n+1})=f(x_n)+\left(h_n+a_{2n}\frac{h_n^2}{2!}+a_{3n}\frac{h_n^3}{3!}+\cdots\right)f'(x_n)+\left(h_n+a_{2n}\frac{h_n^2}{2!}+a_{3n}\frac{h_n^3}{3!}+\cdots\right)^2\frac{f''(x_n)}{2}+\cdots, \]

and because $f(x_n)+h_nf'(x_n)=0$, we have

\[ f(x_{n+1})=\left(a_{2n}f'(x_n)+f''(x_n)\right)\frac{h_n^2}{2!}+\left(a_{3n}f'(x_n)+3a_{2n}f''(x_n)+f^{(3)}(x_n)\right)\frac{h_n^3}{3!}+O(h_n^4). \]

A good choice for the $a_{in}$ is clearly to cancel as many terms as possible in the previous expansion, so we impose

\begin{align*}
a_{2n} &= -\frac{f''_n}{f'_n},\\
a_{3n} &= \frac{-f'_nf^{(3)}_n+3\left(f''_n\right)^2}{\left(f'_n\right)^2},\\
a_{4n} &= \frac{-\left(f'_n\right)^2f^{(4)}_n+10f'_nf''_nf^{(3)}_n-15\left(f''_n\right)^3}{\left(f'_n\right)^3},\\
a_{5n} &= \frac{-\left(f'_n\right)^3f^{(5)}_n+15\left(f'_n\right)^2f''_nf^{(4)}_n+10\left(f'_n\right)^2\left(f^{(3)}_n\right)^2-105f'_n\left(f''_n\right)^2f^{(3)}_n+105\left(f''_n\right)^4}{\left(f'_n\right)^4},\\
a_{6n} &= \cdots
\end{align*}

with, for the sake of brevity, the notation $f^{(k)}_n=f^{(k)}(x_n)$. The formal values of the $a_{in}$ may be computed for much larger values of $i$. Finally, the general iteration is


\begin{align*}
x_{n+1} &= x_n+h_n\left(1+a_{2n}\frac{h_n}{2!}+a_{3n}\frac{h_n^2}{3!}+a_{4n}\frac{h_n^3}{4!}+\cdots\right)\\
 &= x_n-\frac{f(x_n)}{f'(x_n)}\left(1+\frac{f''(x_n)}{2!\,f'(x_n)}\left(\frac{f(x_n)}{f'(x_n)}\right)+\frac{3f''(x_n)^2-f'(x_n)f^{(3)}(x_n)}{3!\,f'(x_n)^2}\left(\frac{f(x_n)}{f'(x_n)}\right)^2+\cdots\right).
\end{align*}

For example, if we stop at $a_{3n}$ and set $a_{4n}=a_{5n}=\cdots=0$, we have the helpful quartic modified iteration (note that this iteration is different from the previous Householder quartic method)

\[ x_{n+1}=x_n-\frac{f(x_n)}{f'(x_n)}\left(1+\frac{f(x_n)f''(x_n)}{2!\,f'(x_n)^2}+\frac{f(x_n)^2\left(3f''(x_n)^2-f'(x_n)f^{(3)}(x_n)\right)}{3!\,f'(x_n)^4}\right), \]

and if we omit $a_{3n}$, we retrieve Householder's cubic iteration.
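A Python sketch of this modified quartic iteration, again on $f(x)=x^2-2$ for illustration:

```python
def modified_quartic(f, d1, d2, d3, x0, n=5):
    """Modified Newton iteration truncated at a_{3n} (order 4)."""
    x = x0
    for _ in range(n):
        fx, f1, f2, f3 = f(x), d1(x), d2(x), d3(x)
        u = fx / f1                  # Newton correction (this is -h_n)
        x -= u * (1 + fx*f2/(2*f1*f1)
                    + fx*fx*(3*f2*f2 - f1*f3)/(6*f1**4))
    return x

r = modified_quartic(lambda x: x*x - 2, lambda x: 2*x,
                     lambda x: 2.0, lambda x: 0.0, 1.5)
print(r)   # ≈ 1.4142135623730951
```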

It's also possible to find the expressions for $(a_{4n},a_{5n},a_{6n},a_{7n},\ldots)$ and define quintic, sextic, septic, octic, ... iterations.

5  Examples

In the essay on the square root of two, some applications of those methods are given, and also in the essay on how to compute the inverse or the square root of a number. Here, let's compare the different iterations on the computation of the Golden Ratio

\[ \varphi=\frac{1+\sqrt{5}}{2}. \]

First we write $\varphi=1/2+x$, so that $x=\sqrt{5}/2$ is a root of the equation

\[ \frac{1}{x^2}-\frac{4}{5}=0; \]

then it's convenient to set $h_n=1-\frac{4}{5}x_n^2$. We have the following 5 algorithms, deduced from the previous general methods (left as an exercise!), with respectively quadratic, cubic, quartic, sextic, and octic convergence:


\begin{align*}
x_{n+1} &= x_n+\tfrac{1}{2}x_nh_n &&\text{(Newton)},\\
x_{n+1} &= x_n+\tfrac{1}{8}x_nh_n\left(4+3h_n\right) &&\text{(Householder)},\\
x_{n+1} &= x_n+\tfrac{1}{16}x_nh_n\left(8+6h_n+5h_n^2\right) &&\text{(Quartic)},\\
x_{n+1} &= x_n+\tfrac{1}{256}x_nh_n\left(128+96h_n+h_n^2(80+70h_n+63h_n^2)\right) &&\text{(Sextic)},\\
x_{n+1} &= x_n+\tfrac{1}{2048}x_nh_n\left(1024+768h_n+h_n^2\left(640+560h_n+h_n^2(504+462h_n+429h_n^2)\right)\right) &&\text{(Octic)}.
\end{align*}

Starting with the initial value $x_0=1.118$, the first iteration gives respectively

\begin{align*}
\varphi_1^{\text{Newton}} &= x_1+1/2=1.61803398(720\ldots) &&\text{8 digits},\\
\varphi_1^{\text{Householder}} &= x_1+1/2=1.6180339887498(163\ldots) &&\text{13 digits},\\
\varphi_1^{\text{Quartic}} &= x_1+1/2=1.61803398874989484(402\ldots) &&\text{17 digits},\\
\varphi_1^{\text{Sextic}} &= x_1+1/2=1.6180339887498948482045868(216\ldots) &&\text{25 digits},\\
\varphi_1^{\text{Octic}} &= x_1+1/2=1.618033988749894848204586834365638(076\ldots) &&\text{33 digits},
\end{align*}

and one more step gives for $x_2$ respectively 17, 39, 69, 154, and 273 correct digits. If we iterate just three more steps, $x_5$ will give respectively 139, 1049, 4406, 33321, and 140053 correct digits ... Observe that $h_n$ tends to zero as $n$ increases, which may be used to accelerate the computations; the choice of the best algorithm depends on your implementation (the algorithm used for multiplication, squaring, ...).
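These iterations are straightforward to program with Python's `decimal` module; here is a sketch of the octic one:

```python
from decimal import Decimal, getcontext

getcontext().prec = 120
PHI = (1 + Decimal(5).sqrt()) / 2        # reference value of the Golden Ratio

def octic_step(x):
    """One octic step toward sqrt(5)/2, with h = 1 - (4/5) x^2."""
    h = 1 - Decimal(4)/5 * x * x
    return x + x*h*(1024 + 768*h + h*h*(640 + 560*h
                 + h*h*(504 + 462*h + 429*h*h))) / 2048

x = Decimal("1.118")
for _ in range(3):
    x = octic_step(x)
phi = x + Decimal(1)/2
print(phi)   # agrees with PHI to more than 100 digits
```

With order-8 convergence, two steps already exhaust the 120-digit working precision.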

6  Newton's method for several variables

Newton's method may also be used to find a root of a system of two or more nonlinear equations

\begin{align*}
f(x,y) &= 0\\
g(x,y) &= 0,
\end{align*}

where $f$ and $g$ are $C^2$ functions on a given domain. Using Taylor's expansion of the two functions near $(x,y)$, we find


\begin{align*}
f(x+h,y+k) &= f(x,y)+h\frac{\partial f}{\partial x}+k\frac{\partial f}{\partial y}+O(h^2+k^2)\\
g(x+h,y+k) &= g(x,y)+h\frac{\partial g}{\partial x}+k\frac{\partial g}{\partial y}+O(h^2+k^2),
\end{align*}

and if we keep only the first order terms, we are looking for a couple $(h,k)$ such that

\begin{align*}
f(x+h,y+k) &= 0 \approx f(x,y)+h\frac{\partial f}{\partial x}+k\frac{\partial f}{\partial y}\\
g(x+h,y+k) &= 0 \approx g(x,y)+h\frac{\partial g}{\partial x}+k\frac{\partial g}{\partial y};
\end{align*}

hence it's equivalent to the linear system








\[ \begin{pmatrix} \dfrac{\partial f}{\partial x} & \dfrac{\partial f}{\partial y}\\[2mm] \dfrac{\partial g}{\partial x} & \dfrac{\partial g}{\partial y} \end{pmatrix}\begin{pmatrix} h\\ k \end{pmatrix}=-\begin{pmatrix} f(x,y)\\ g(x,y) \end{pmatrix}. \]

The (2×2) matrix is called the Jacobian matrix (or Jacobian) and is sometimes denoted


\[ J(x,y)=\begin{pmatrix} \dfrac{\partial f}{\partial x} & \dfrac{\partial f}{\partial y}\\[2mm] \dfrac{\partial g}{\partial x} & \dfrac{\partial g}{\partial y} \end{pmatrix} \]

(its generalization as an $(N\times N)$ matrix for a system of $N$ equations and $N$ variables is immediate). This suggests defining the new process




\[ \begin{pmatrix} x_{n+1}\\ y_{n+1} \end{pmatrix}=\begin{pmatrix} x_n\\ y_n \end{pmatrix}-J^{-1}(x_n,y_n)\begin{pmatrix} f(x_n,y_n)\\ g(x_n,y_n) \end{pmatrix}, \]

starting with an initial guess $(x_0,y_0)$. Under certain conditions (which are not so easy to check, and this is again the main disadvantage of the method), it's possible to show that this process converges to a root of the system. The convergence remains quadratic.

Example 3 We are looking for a root near $(x_0=-0.6,\;y_0=0.6)$ of the following system

\begin{align*}
f(x,y) &= x^3-3xy^2-1\\
g(x,y) &= 3x^2y-y^3;
\end{align*}

here the Jacobian and its inverse become

\[ J(x_n,y_n)=3\begin{pmatrix} x_n^2-y_n^2 & -2x_ny_n\\ 2x_ny_n & x_n^2-y_n^2 \end{pmatrix}, \qquad J^{-1}(x_n,y_n)=\frac{1}{3\left(x_n^2+y_n^2\right)^2}\begin{pmatrix} x_n^2-y_n^2 & 2x_ny_n\\ -2x_ny_n & x_n^2-y_n^2 \end{pmatrix}, \]
and the process gives

\begin{align*}
x_1 &= -0.40000000000000000000, & y_1 &= 0.86296296296296296296\\
x_2 &= -0.50478978186242263605, & y_2 &= 0.85646430512069295697\\
x_3 &= -0.49988539803643124722, & y_3 &= 0.86603764032215486664\\
x_4 &= -0.50000000406150565266, & y_4 &= 0.86602539113638168322\\
x_5 &= -0.49999999999999983928, & y_5 &= 0.86602540378443871965\\
 &\;\;\vdots
\end{align*}
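The two-dimensional process is easy to program; the following Python sketch (solving the (2×2) linear system by Cramer's rule) reproduces the iterates above in double precision:

```python
def newton2(f, g, jac, x0, y0, n=8):
    """Newton's method for the system f(x,y) = 0, g(x,y) = 0;
    jac(x, y) returns the Jacobian entries (fx, fy, gx, gy)."""
    x, y = x0, y0
    for _ in range(n):
        fx, fy, gx, gy = jac(x, y)
        det = fx*gy - fy*gx
        fv, gv = f(x, y), g(x, y)
        # (h, k) = -J^{-1} (f, g), via Cramer's rule
        h = -(gy*fv - fy*gv) / det
        k = -(fx*gv - gx*fv) / det
        x, y = x + h, y + k
    return x, y

x, y = newton2(lambda x, y: x**3 - 3*x*y**2 - 1,
               lambda x, y: 3*x**2*y - y**3,
               lambda x, y: (3*(x*x - y*y), -6*x*y,
                             6*x*y, 3*(x*x - y*y)),
               -0.6, 0.6)
print(x, y)   # ≈ (-0.5, 0.866025403784...)
```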

Depending on your initial guess, Newton's process may converge to one of the three roots of the system:

\[ \left(-\frac{1}{2},\frac{\sqrt{3}}{2}\right),\qquad\left(-\frac{1}{2},-\frac{\sqrt{3}}{2}\right),\qquad(1,0), \]

and for some values of $(x_0,y_0)$ the convergence of the process may be tricky! The study of the influence of this initial guess leads to aesthetically pleasing fractal pictures.
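One can sketch this basin structure numerically: the function below says which of the three roots Newton's process reaches from a given starting point (the grid scan that produces the actual fractal picture is left out):

```python
import math

def newton_z3(x, y, n=50):
    """Newton iteration for the system above (i.e. z^3 = 1 in real
    coordinates); returns the limit point, or None on breakdown."""
    for _ in range(n):
        f = x**3 - 3*x*y**2 - 1
        g = 3*x**2*y - y**3
        a, b = 3*(x*x - y*y), -6*x*y     # Jacobian is [[a, b], [-b, a]]
        det = a*a + b*b                  # = 9 (x^2 + y^2)^2
        if det == 0:
            return None
        x, y = x - (a*f - b*g)/det, y - (b*f + a*g)/det
    return x, y

ROOTS = [(1.0, 0.0), (-0.5, math.sqrt(3)/2), (-0.5, -math.sqrt(3)/2)]

def basin(x, y):
    """Index in ROOTS of the root reached from (x, y), or -1."""
    p = newton_z3(x, y)
    if p is None:
        return -1
    return min(range(3), key=lambda i: (p[0]-ROOTS[i][0])**2
                                     + (p[1]-ROOTS[i][1])**2)

print(basin(-0.6, 0.6))   # → 1, the root (-1/2, sqrt(3)/2)
```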

Cubic convergence also exists for systems of equations in several variables: Chebyshev's method.

References

[1]
A. S. Householder, The Numerical Treatment of a Single Nonlinear Equation, McGraw-Hill, New York, (1970)

[2]
L.V. Kantorovich and G.P. Akilov, Functional Analysis in Normed Spaces, Pergamon Press, Elmsford, New York, (1964)

[3]
I. Newton, Methodus fluxionum et serierum infinitarum, (1664-1671)

[4]
J.M. Ortega and W.C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, (1970)

[5]
H. Qiu, A Robust Examination of the Newton-Raphson Method with Strong Global Convergence Properties, Master's Thesis, University of Central Florida, (1993)

[6]
J. Raphson, Analysis Aequationum universalis, London, (1690)


