It is of great importance to solve equations of the form

f(x) = 0

in many applications in Mathematics, Physics, Chemistry, ... and of course in the computation of some important mathematical constants or functions like square roots. In this essay, we are only interested in one family of methods: Newton's methods.
Around 1669, Isaac Newton (1643-1727) gave a new algorithm [3] to solve a polynomial equation, and he illustrated it on the example y^{3}-2y-5=0. To find an accurate root of this equation, one must first guess a starting value, here y ≈ 2. Then write y=2+p; after the substitution the equation becomes

p^{3}+6p^{2}+10p-1=0.

Because p is supposed to be small, we neglect p^{3}+6p^{2} compared to 10p-1, and the previous equation gives p ≈ 0.1, therefore a better approximation of the root is y ≈ 2.1. It's possible to repeat this process and write p=0.1+q; the substitution gives

q^{3}+6.3q^{2}+11.23q+0.061=0,

hence q ≈ -0.061/11.23 = -0.0054..., and a new approximation is y ≈ 2.0946... The process should be repeated until the required number of digits is reached.
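Newton's substitution scheme is easy to reproduce; here is a minimal Python sketch (the function and variable names are ours), which re-centres the polynomial at the current guess and solves only the linear part:

```python
def shift(coeffs, c):
    """Coefficients of p(c + t) given those of p(y), in increasing powers."""
    out = [0.0] * len(coeffs)
    for k, a in enumerate(coeffs):
        binom = 1
        for j in range(k + 1):
            out[j] += a * binom * c ** (k - j)   # a * C(k,j) * c^(k-j) * t^j
            binom = binom * (k - j) // (j + 1)
    return out

# y^3 - 2y - 5 = 0, Newton's example, starting guess y = 2
p = [-5.0, -2.0, 0.0, 1.0]
y = 2.0
for _ in range(5):
    q = shift(p, y)      # equation satisfied by the correction t
    y += -q[0] / q[1]    # neglect t^2, t^3 and solve the linear part
```

The successive corrections are 0.1, -0.0054..., exactly as in the hand computation above, and y converges to 2.09455148...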
In his method, Newton doesn't explicitly use the notion of derivative, and he only applies it to polynomial equations.
A few years later, in 1690, a new step was made by Joseph Raphson (1678-1715), who proposed a method [6] which avoided the substitutions of Newton's approach. He illustrated his algorithm on the equation x^{3}-bx+c=0: starting with an approximation x ≈ g of a root, a better approximation is given by

x ≈ g + (bg - g^{3} - c)/(3g^{2} - b).

Observe that the denominator of the fraction is the opposite of the derivative of the numerator.
This was the historical beginning of the very important Newton's (or sometimes called Newton-Raphson's) algorithm.
The method was then studied and generalized by other mathematicians like Simpson (1710-1761), Mourraille (1720-1808), Cauchy (1789-1857), Kantorovich (1912-1986), ... The important question of the choice of the starting point was first approached by Mourraille in 1768, and the difficulty of making this choice is the main drawback of the algorithm (see also [5]).
Nowadays, Newton's method is a general process to find an accurate root of an equation f(x)=0 (or of a system of equations). We suppose that f is a C^{2} function on a given interval; then, using Taylor's expansion near x,

f(x+h) = f(x) + hf'(x) + O(h^{2}),

and if we stop at the first order (linearization of the equation), we are looking for a small h such that

f(x) + hf'(x) = 0,

giving

h = -f(x)/f'(x).

The Newton iteration is then given by the following procedure: start with an initial guess x_{0} of the root, then find the limit of the recurrence

x_{n+1} = x_{n} - f(x_{n})/f'(x_{n}),
and Figure 1 is a geometrical interpretation of a single iteration of this formula.
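In code, the recurrence is a one-line update inside a loop; a minimal sketch (the helper names are ours), assuming f and its derivative are supplied by the caller:

```python
def newton(f, fprime, x0, tol=1e-15, max_iter=50):
    """Iterate x <- x - f(x)/f'(x) until the step is negligible."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) <= tol * max(1.0, abs(x)):
            break
    return x

# Newton's historical example y^3 - 2y - 5 = 0, starting at y = 2
root = newton(lambda y: y**3 - 2*y - 5, lambda y: 3*y**2 - 2, 2.0)
```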
Unfortunately, this iteration may fail to converge or be unstable (we sometimes say chaotic) in many cases. However, we have two important theorems giving conditions that ensure the convergence of the process (see [2]).
Theorem 1 Let x^{*} be a root of f(x)=0, where f is a C^{2} function on an interval containing x^{*}, and suppose that |f'(x^{*})| > 0; then Newton's iteration will converge to x^{*} if the starting point x_{0} is close enough to x^{*}.
This theorem ensures that Newton's method will always converge if the initial point is sufficiently close to the root and if this root is not singular (that is, f'(x^{*}) is non-zero). The process has the local convergence property. A more constructive theorem was given by Kantorovich; we give it in the particular case of a function of a single variable x.
Theorem 2 (Kantorovich). Let f be a C^{2} numerical function on an open interval I of the real line and let x_{0} be a point in this interval. If there exist finite positive constants (m_{0}, M_{0}, K_{0}) such that

|1/f'(x_{0})| < m_{0},    |f(x_{0})| < M_{0},    |f''(x)| < K_{0} for every x in I,

and if h_{0} = 2m_{0}M_{0}K_{0} ≤ 1, Newton's iteration will converge to a root x^{*} of f(x)=0, and

|x_{n} - x^{*}| < 2^{1-n}M_{0}h_{0}^{2^{n}-1}.
This theorem gives sufficient conditions to ensure the existence of a root and the convergence of Newton's process. Moreover, if h_{0} < 1, the last inequality shows that the convergence is quadratic (the number of good digits is doubled at each iteration). Note that if the starting point x_{0} tends to the root x^{*}, the constant M_{0} tends to zero and h_{0} becomes smaller than 1, so the local convergence theorem is a consequence of Kantorovich's theorem.
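The doubling of good digits is easy to observe numerically; a small sketch with Python's decimal module (the working precision and starting value are arbitrary choices), applied to f(x) = x^{2} - 2:

```python
from decimal import Decimal, getcontext

getcontext().prec = 60
TWO = Decimal(2)
REF = TWO.sqrt()                        # reference value of sqrt(2)

x, digits = Decimal("1.5"), []
for _ in range(5):
    x = x - (x * x - TWO) / (2 * x)     # Newton step for x^2 - 2 = 0
    digits.append(int(-abs(x - REF).log10()))   # count of good digits
# digits grows roughly as 2, 5, 11, 24, 48: doubling at each step
```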
Newton's iteration may be seen as a first-order method (or linearization method); it's possible to go one step further and write the Taylor expansion of f to a higher order:

f(x+h) = f(x) + hf'(x) + h^{2}/2 f''(x) + O(h^{3}),

and we are looking for h such that

f(x) + hf'(x) + h^{2}/2 f''(x) = 0;

we take the smallest solution for h (we have to suppose that f'(x) and f''(x) are non-zero)

h = ( -f'(x) + (f'^{2}(x) - 2f(x)f''(x))^{1/2} ) / f''(x).

It's not necessary to compute the square root, because if f(x) is small, using the expansion

(1-a)^{1/2} = 1 - a/2 - a^{2}/8 + O(a^{3})     with a = 2f(x)f''(x)/f'^{2}(x),

h becomes

h = -f(x)/f'(x) - f^{2}(x)f''(x)/(2f'^{3}(x)).
The first attempt to use the second-order expansion is due to the astronomer E. Halley (1656-1743) in 1694.
The previous expression for h allows us to derive the following cubic iteration (the number of good digits triples at each iteration), starting with x_{0}:

x_{n+1} = x_{n} - f(x_{n})/f'(x_{n}) - f^{2}(x_{n})f''(x_{n})/(2f'^{3}(x_{n})).

This procedure is given in [1]. It can be used efficiently to compute the inverse or the square root of a number.
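For instance, applied to f(x) = x^{2} - a (so f' = 2x and f'' = 2), the cubic iteration becomes x_{n+1} = x_{n} - (x_{n}^{2}-a)/(2x_{n}) - (x_{n}^{2}-a)^{2}/(8x_{n}^{3}); a sketch with Python's decimal module (the precision is an arbitrary choice):

```python
from decimal import Decimal, getcontext

getcontext().prec = 60

def cubic_sqrt(a, x, steps):
    """The cubic iteration specialized to f(x) = x^2 - a."""
    a = Decimal(a)
    for _ in range(steps):
        f = x * x - a                   # f(x_n)
        x = x - f / (2 * x) - f * f / (8 * x ** 3)
    return x

root2 = cubic_sqrt(2, Decimal("1.4"), 4)   # converges to sqrt(2)
```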
Another similar cubic iteration is given by

x_{n+1} = x_{n} - 2f(x_{n})f'(x_{n}) / (2f'^{2}(x_{n}) - f(x_{n})f''(x_{n})),

sometimes known as Halley's method. We may also write it as

x_{n+1} = x_{n} - f(x_{n})/f'(x_{n}) × (1 - f(x_{n})f''(x_{n})/(2f'^{2}(x_{n})))^{-1}.

Note that if we replace (1-a)^{-1} by 1+a+O(a^{2}), we retrieve Householder's iteration.
Under some regularity conditions on f and its derivatives, Householder gave in [1] the general iteration

x_{n+1} = x_{n} + (p+1) (1/f)^{(p)}(x_{n}) / (1/f)^{(p+1)}(x_{n}),

where p is an integer and (1/f)^{(p)} is the derivative of order p of the reciprocal 1/f of the function f. This iteration has convergence of order (p+2). For example, p=0 has quadratic convergence (order 2) and the formula gives back Newton's iteration, while p=1 has cubic convergence (order 3) and gives again Halley's method. Just like with Newton's method, a good starting point is required to ensure convergence.
Using the iteration with p=2 gives the following iteration, which has quartic convergence (order 4):

x_{n+1} = x_{n} - f(x_{n})(6f'^{2}(x_{n}) - 3f(x_{n})f''(x_{n})) / (6f'^{3}(x_{n}) - 6f(x_{n})f'(x_{n})f''(x_{n}) + f^{2}(x_{n})f'''(x_{n})).
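The first three members of the family can be compared side by side; a sketch (the helper names are ours) using the closed forms for p = 0, 1, 2 on Newton's example y^{3} - 2y - 5 = 0:

```python
def householder_step(p, x, f, d1, d2, d3):
    """One step of the iteration of order p + 2, for p = 0, 1, 2."""
    F, F1 = f(x), d1(x)
    if p == 0:                                  # Newton, order 2
        return x - F / F1
    F2 = d2(x)
    if p == 1:                                  # Halley, order 3
        return x - 2 * F * F1 / (2 * F1 * F1 - F * F2)
    F3 = d3(x)                                  # quartic, order 4
    return x - F * (6 * F1 * F1 - 3 * F * F2) / (
        6 * F1 ** 3 - 6 * F * F1 * F2 + F * F * F3)

f  = lambda y: y ** 3 - 2 * y - 5
d1 = lambda y: 3 * y ** 2 - 2
d2 = lambda y: 6 * y
d3 = lambda y: 6.0

roots = []
for p in (0, 1, 2):
    x = 2.0
    for _ in range(6):                          # all three converge quickly
        x = householder_step(p, x, f, d1, d2, d3)
    roots.append(x)
```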
Another idea is to write

x_{n+1} = x_{n} + h_{n} + a_{2}^{n}h_{n}^{2}/2! + a_{3}^{n}h_{n}^{3}/3! + ...,

where h_{n} = -f(x_{n})/f'(x_{n}) is given by the simple Newton iteration and (a_{2}^{n}, a_{3}^{n}, ...) are real parameters which we will estimate in order to minimize the value of

f(x_{n+1}) = f(x_{n} + h_{n} + a_{2}^{n}h_{n}^{2}/2! + a_{3}^{n}h_{n}^{3}/3! + ...);

we assume that f is regular enough and that h_{n} + a_{2}^{n}h_{n}^{2}/2! + a_{3}^{n}h_{n}^{3}/3! + ... is small, hence, using the expansion of f near x_{n},

f(x_{n+1}) = f(x_{n}) + (h_{n} + a_{2}^{n}h_{n}^{2}/2! + ...)f'(x_{n}) + (h_{n} + a_{2}^{n}h_{n}^{2}/2! + ...)^{2}/2! f''(x_{n}) + ...,

and because f(x_{n}) + h_{n}f'(x_{n}) = 0, we have

f(x_{n+1}) = h_{n}^{2}/2! (a_{2}^{n}f'(x_{n}) + f''(x_{n})) + h_{n}^{3}/3! (a_{3}^{n}f'(x_{n}) + 3a_{2}^{n}f''(x_{n}) + f'''(x_{n})) + O(h_{n}^{4}).

A good choice for the a_{i}^{n} is clearly to cancel as many terms as possible in the previous expansion, so we impose

a_{2}^{n} = -f_{n}''/f_{n}',    a_{3}^{n} = 3(f_{n}''/f_{n}')^{2} - f_{n}'''/f_{n}',

with, for the sake of brevity, the notation f_{n}^{(k)} = f^{(k)}(x_{n}). The formal values of the a_{i}^{n} may be computed for much larger values of i. Finally the general iteration is

x_{n+1} = x_{n} + h_{n} + a_{2}^{n}h_{n}^{2}/2! + a_{3}^{n}h_{n}^{3}/3! + ...
For example, if we stop at a_{3}^{n} and set a_{4}^{n} = a_{5}^{n} = ... = 0, we have the helpful quartic modified iteration (note that this iteration is different from the previous Householder quartic method)

x_{n+1} = x_{n} + h_{n} - (f_{n}''/f_{n}') h_{n}^{2}/2! + (3(f_{n}''/f_{n}')^{2} - f_{n}'''/f_{n}') h_{n}^{3}/3!,

and if we omit a_{3}^{n}, we retrieve Householder's cubic iteration.
It's also possible to find the expressions for (a_{4}^{n},a_{5}^{n},a_{6}^{n},a_{7}^{n},...), and define quintic, sextic, septic, octic ... iterations.
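The quartic modified iteration can be sketched directly from the formulas above (the function names are ours), again on y^{3} - 2y - 5 = 0:

```python
def modified_quartic(f, d1, d2, d3, x, steps):
    """x <- x + h + a2*h^2/2! + a3*h^3/3!, with h = -f/f',
    a2 = -f''/f' and a3 = 3(f''/f')^2 - f'''/f'."""
    for _ in range(steps):
        F, F1, F2, F3 = f(x), d1(x), d2(x), d3(x)
        h = -F / F1
        a2 = -F2 / F1
        a3 = 3 * (F2 / F1) ** 2 - F3 / F1
        x = x + h + a2 * h * h / 2 + a3 * h ** 3 / 6
    return x

r = modified_quartic(lambda y: y**3 - 2*y - 5,
                     lambda y: 3*y**2 - 2,
                     lambda y: 6*y,
                     lambda y: 6.0,
                     2.0, 4)
```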
In the essay on the square root of two some applications of those methods are given, and also in the essay on how to compute the inverse or the square root of a number. Here, let's compare the different iterations on the computation of the Golden Ratio

φ = (1+√5)/2 = 1.6180339887...

First we write φ = 1/2 + x, and x = √5/2 is a root of the equation

f(x) = x^{2} - 5/4 = 0;

then it's convenient to set h_{n} = 1 - 4/5×x_{n}^{2}, and we have the 5 algorithms, deduced from the previous general methods (left as exercise!), with respectively quadratic, cubic, quartic, sextic and octic convergence:

x_{n+1} = x_{n}(1 + h_{n}/2)
x_{n+1} = x_{n}(1 + h_{n}/2 + 3h_{n}^{2}/8)
x_{n+1} = x_{n}(1 + h_{n}/2 + 3h_{n}^{2}/8 + 5h_{n}^{3}/16)
x_{n+1} = x_{n}(1 + h_{n}/2 + 3h_{n}^{2}/8 + 5h_{n}^{3}/16 + 35h_{n}^{4}/128 + 63h_{n}^{5}/256)
x_{n+1} = x_{n}(1 + h_{n}/2 + 3h_{n}^{2}/8 + 5h_{n}^{3}/16 + 35h_{n}^{4}/128 + 63h_{n}^{5}/256 + 231h_{n}^{6}/1024 + 429h_{n}^{7}/2048)
Starting with the initial value x_{0}=1.118, the first iteration becomes respectively

and one more step gives for x_{2} respectively 17, 39, 69, 154 and 273 good digits. If we iterate just three more steps, x_{5} will give respectively 139, 1049, 4406, 33321 and 140053 correct digits... Observe that h_{n} tends to zero as n increases, which may be used to accelerate the computations, and the choice of the best algorithm depends on your implementation (algorithm used for multiplication, squaring, ...).
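One way to build such iterations is to truncate the binomial series of x_{n}(1-h_{n})^{-1/2}, which is exactly √5/2 when h_{n} = 1 - 4/5×x_{n}^{2}: keeping terms up to h_{n}^{k-1} gives convergence of order k. A sketch with Python's decimal module (the working precision is an arbitrary choice):

```python
from decimal import Decimal, getcontext

getcontext().prec = 120

# series coefficients of (1 - h)^(-1/2): 1/2, 3/8, 5/16, 35/128, ...
COEFFS = [Decimal(n) / Decimal(d) for n, d in
          [(1, 2), (3, 8), (5, 16), (35, 128),
           (63, 256), (231, 1024), (429, 2048)]]

def iterate(order, steps):
    """Truncate the series after h^(order-1): convergence of that order."""
    x = Decimal("1.118")               # initial value from the text
    for _ in range(steps):
        h = 1 - 4 * x * x / 5          # h_n = 1 - (4/5) x_n^2
        s, hk = Decimal(1), Decimal(1)
        for c in COEFFS[:order - 1]:
            hk *= h
            s += c * hk
        x *= s
    return x

phi = Decimal("0.5") + iterate(8, 3)   # three octic steps, phi = 1/2 + x
```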
Newton's method may also be used to find a root of a system of two or more non-linear equations

f(x,y) = 0,    g(x,y) = 0,

where f and g are C^{2} functions on a given domain. Using Taylor's expansion of the two functions near (x,y), we find

f(x+h, y+k) = f(x,y) + h ∂f/∂x + k ∂f/∂y + O(h^{2}+k^{2})
g(x+h, y+k) = g(x,y) + h ∂g/∂x + k ∂g/∂y + O(h^{2}+k^{2}),

and if we keep only the first-order terms, we are looking for a couple (h,k) such that

f(x,y) + h ∂f/∂x + k ∂f/∂y = 0
g(x,y) + h ∂g/∂x + k ∂g/∂y = 0,

hence it's equivalent to the linear system

( ∂f/∂x  ∂f/∂y ; ∂g/∂x  ∂g/∂y ) (h ; k) = -( f ; g ).

The (2×2) matrix is called the Jacobian matrix (or Jacobian) and is sometimes denoted

J(x,y) = ( ∂f/∂x  ∂f/∂y ; ∂g/∂x  ∂g/∂y )

(its generalization as an (N×N) matrix for a system of N equations in N variables is immediate). This suggests defining the new process

(x_{n+1} ; y_{n+1}) = (x_{n} ; y_{n}) - J^{-1}(x_{n},y_{n}) ( f(x_{n},y_{n}) ; g(x_{n},y_{n}) ),
starting with an initial guess (x_{0},y_{0}) and under certain conditions (which are not so easy to check and this is again the main disadvantage of the method), it's possible to show that this process converges to a root of the system. The convergence remains quadratic.
Example 3
We are looking for a root near (x_{0} = -0.6, y_{0} = 0.6) of the following system

f(x,y) = x^{3} - 3xy^{2} - 1
g(x,y) = 3x^{2}y - y^{3};
here the Jacobian and its inverse become

J(x_{n},y_{n}) = 3 ( x_{n}^{2}-y_{n}^{2}  -2x_{n}y_{n} ; 2x_{n}y_{n}  x_{n}^{2}-y_{n}^{2} ),

J^{-1}(x_{n},y_{n}) = 1/(3[(x_{n}^{2}-y_{n}^{2})^{2}+4x_{n}^{2}y_{n}^{2}]) ( x_{n}^{2}-y_{n}^{2}  2x_{n}y_{n} ; -2x_{n}y_{n}  x_{n}^{2}-y_{n}^{2} ),
and the process gives

x_{1} = -0.40000000000000000000,  y_{1} = 0.86296296296296296296
x_{2} = -0.50478978186242263605,  y_{2} = 0.85646430512069295697
x_{3} = -0.49988539803643124722,  y_{3} = 0.86603764032215486664
x_{4} = -0.50000000406150565266,  y_{4} = 0.86602539113638168322
x_{5} = -0.49999999999999983928,  y_{5} = 0.86602540378443871965
...
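The example can be reproduced in a few lines; a minimal sketch (the names are ours) that hard-codes the Jacobian computed above:

```python
def newton2(x, y, steps):
    """Newton's process for f = x^3 - 3x*y^2 - 1, g = 3x^2*y - y^3
    (real and imaginary parts of z^3 - 1 with z = x + iy)."""
    for _ in range(steps):
        f = x ** 3 - 3 * x * y ** 2 - 1
        g = 3 * x ** 2 * y - y ** 3
        a = 3 * (x * x - y * y)          # df/dx = dg/dy
        b = -6 * x * y                   # df/dy = -dg/dx
        det = a * a + b * b              # Jacobian determinant
        h = -(a * f - b * g) / det       # (h, k) = -J^{-1} (f, g)
        k = -(b * f + a * g) / det
        x, y = x + h, y + k
    return x, y

x1, y1 = newton2(-0.6, 0.6, 1)           # -> (-0.4, 0.86296...)
x5, y5 = newton2(-0.6, 0.6, 5)           # close to (-1/2, sqrt(3)/2)
```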
Depending on your initial guess, Newton's process may converge to one of the three roots of the system:

(-1/2, √3/2),  (-1/2, -√3/2),  (1, 0),

and for some values of (x_{0},y_{0}) the convergence of the process may be tricky! The study of the influence of this initial guess leads to aesthetic fractal pictures.
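Those pictures colour each starting point by the root it leads to; a coarse sketch of the idea (grid, iteration limit and thresholds are arbitrary choices):

```python
def attractor(x, y, max_steps=50):
    """Index of the cube root of unity reached from (x, y), or -1."""
    roots = [(1.0, 0.0), (-0.5, 3 ** 0.5 / 2), (-0.5, -(3 ** 0.5) / 2)]
    for _ in range(max_steps):
        a = 3 * (x * x - y * y)
        b = -6 * x * y
        det = a * a + b * b
        if det < 1e-30:                  # Jacobian (nearly) singular: give up
            return -1
        f = x ** 3 - 3 * x * y ** 2 - 1
        g = 3 * x ** 2 * y - y ** 3
        x -= (a * f - b * g) / det
        y -= (b * f + a * g) / det
    for i, (rx, ry) in enumerate(roots):
        if abs(x - rx) < 1e-6 and abs(y - ry) < 1e-6:
            return i
    return -1

# one character per starting point: the boundary between basins is fractal
for row in range(9):
    y0 = 1.2 - 0.3 * row
    print("".join(".ABC"[1 + attractor(-1.2 + 0.3 * col, y0)]
                  for col in range(9)))
```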
Cubic convergence also exists for systems of equations in several variables: Chebyshev's method.