Saturday, January 24, 2015

Newton's Method


As a follow-up to the previous post dealing with fixed-point iteration, another particularly useful family of numerical techniques deals with the iterative approximation of zeros (or roots) of a given function – that is, those points x where the value of a function f is zero, or f(x) = 0. One such technique is Newton's Method (or the Newton-Raphson Method, if you like).

The idea behind Newton's Method is as follows. If we suppose that we can use Taylor's Theorem to get a polynomial approximation to a function f in the neighbourhood of points that we care about, then we can actually use our fixed-point iteration to converge upon an approximation for a root of f.

So we'll do just that: we'll derive an expression from Taylor's theorem that we can use for the fixed-point iteration, and then code it up. We can then use our code to find the roots of some polynomials, as well as to implement another square-root finding procedure. As for the JVM language we'll use? We haven't seen JRuby make an appearance yet on this blog, so it's about time for its debut! 


Using Taylor's Theorem to Derive Newton's Method


We have a definition for Taylor's Theorem as follows: for a suitably differentiable function f expanded about a point a,

     f(x) = f(a) + f'(a)(x - a) + (f''(a)/2!)(x - a)^2 + ... + (f^(n)(a)/n!)(x - a)^n + R_n(x)

where the remainder term R_n(x) becomes small as x gets close to a.
Now this theorem is fine on its own, but it turns out that it is also massively useful for deriving expressions for numerical derivatives!

(As an example, suppose we want an expression for the third derivative at the point x=a. We can simply truncate the Taylor's theorem expression after the f'''(a) term and rearrange things! This will give us an awesome approximation provided that the higher-order derivative terms are small in magnitude – which they can be, if we make the distance between x and a sufficiently small first.)
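
Explicitly, truncating after the f'''(a) term and solving for f'''(a) gives:

     f'''(a) ≈ (3!/(x - a)^3) · [f(x) - f(a) - f'(a)(x - a) - (f''(a)/2!)(x - a)^2]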

So suppose we're given a function that satisfies the hypotheses of Taylor's theorem. We can truncate the formula to get the following expression involving only the first derivative:

     f(x) ≈ f(a) + f'(a)(x - a)
Now what we actually have here is a rearrangement of the equation of the secant line through the points (x, f(x)) and (a, f(a)) – a line that tends toward the tangent line at a as x approaches a.

Suppose we were using this approximation in an iterative routine of some sort. Let the point a be the previous iterate x_(k-1), and let the point x in the equation refer to our new x-value x_k:

     f(x_k) ≈ f(x_(k-1)) + f'(x_(k-1)) · (x_k - x_(k-1))
We're trying to find a root of the function f, so this means that we want f(x_k) = 0. Setting the left-hand side to zero, we can rearrange the formula as follows:

     x_k = x_(k-1) - f(x_(k-1)) / f'(x_(k-1))
Assuming that we started relatively close to the root, this equation will move us even closer to the root upon each iteration.

So this equation allows us to converge upon a root of the function f. When we've actually reached the root, the iteration will no longer be changing our x-value much, so we should stop when both sides of this equation are “equal” – that is, within some small tolerance of each other.

That is, we can rewrite that equation in one final, illuminating way:

     x_k = g(x_(k-1)),   where   g(x) = x - f(x) / f'(x)
So we've expressed our iteration equation in terms of a fixed-point computation – a root of f is precisely a fixed point of g – which means we can use everything we saw about fixed-points last time to solve this!


Coding Newton's Method


We saw how to write Scala code to approximate the fixed-point of a function in the previous post. So we won't spend too much time reinventing the wheel here; we'll simply bring that same code in again, except we'll translate it into Ruby (JRuby) first. Luckily, Ruby has functional constructs similar to Scala's, so the translation is easy and almost word-for-word.

Now for the implementation of Newton's method using the above fixed-point formula, we need a couple of things:
  • We need a notion of the derivative of the function f. Here we'll use a simple numerical differentiation formula: (f(x+dx) – f(x)) / dx. It's reminiscent of the first principles derivative formula, except we won't be taking the limit as dx approaches 0 here, and instead we'll just define dx to be a really small number right off the bat.  

    (As an interesting fact, you could derive this formula using Taylor's theorem in a similar way to how we derived the Newton's Method formula, if you like! Take all of the terms up to the f'(a) term, and rearrange to get an expression for f'(a). Then substitute x + dx for x and x for a: this gives f(x + dx) ≈ f(x) + f'(x)·dx, which rearranges to the formula exactly.)

    In terms of our code, we'll write a method derivative() as a higher-order procedure that takes in a given function f(x) and returns a procedure that calculates f'(x) using the numerical formula.

  • For newtons_method(), we simply pass the fixed-point expression g(x) defined above (using our derivative function where necessary), together with the value of the first guess, straight to the fixed_point() method. 

The code is then written like this (the exact values of the TOLERANCE and DX constants are a judgement call; the ones below work well):
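
     # Convergence tolerance for the fixed-point search, and the step
     # size for numerical differentiation. (The exact values here are a
     # matter of taste; these ones reproduce the outputs in this post.)
     TOLERANCE = 0.0001
     DX = 0.0001

     # Iterate x -> f(x), printing each iterate, until two successive
     # iterates agree to within TOLERANCE.
     def fixed_point(f, first_guess)
       guess = f.call(first_guess)
       puts guess
       next_guess = f.call(guess)
       while (guess - next_guess).abs > TOLERANCE
         guess = next_guess
         puts guess
         next_guess = f.call(guess)
       end
       guess
     end

     # Higher-order procedure: takes a procedure for f(x) and returns a
     # procedure that approximates f'(x) as (f(x + dx) - f(x)) / dx.
     def derivative(f)
       lambda {|x| (f.call(x + DX) - f.call(x)) / DX}
     end

     # Newton's Method: converge on a root of f as a fixed point of
     # g(x) = x - f(x) / f'(x).
     def newtons_method(f, first_guess)
       f_prime = derivative(f)
       g = lambda {|x| x - f.call(x) / f_prime.call(x)}
       fixed_point(g, first_guess)
     end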


We should test the ancillary methods out to make sure that they work.

For our fixed_point() method, let's try calculating the Golden Ratio as a fixed-point of 1 + 1/x again to make sure that we didn't break anything with the translation:

     fixed_point(lambda {|x| 1 + 1/x}, 1.0)
     #> 2.0
     #> 1.5
     #> 1.6666666666666665
     #> 1.6
     #> 1.625
     #> 1.6153846153846154
     #> 1.619047619047619
     #> 1.6176470588235294
     #> 1.6181818181818182
     #> 1.6179775280898876
     #> => 1.6179775280898876


So far so good; that's the exact same sequence that we had last time. 

Next, let's test our derivative() method with some easily verifiable derivative expressions, like e^x and sin(x):

     # The derivative of e^x is e^x:    
     f_prime = derivative(lambda {|x| Math::E**x})
     #> #<Proc:0x6d64ab...
 

     f_prime.call(1)
     #> 2.718417747082924


     Math::E**1
     #> 2.718281828459045


     # The derivative of sin(x) is cos(x):
     g_prime = derivative(lambda {|x| Math::sin(x)})
     #> #<Proc:0x19e5ddf...


     g_prime.call(2)
     #> -0.4161923007262036 


     Math::cos(2)
     #> -0.4161468365471424


Yep, that seems about right given our dx value. (The forward-difference formula has an error on the order of f''(x)·dx/2, which accounts for the small discrepancies in the later digits above.)

Awesome, everything seems to be in order! Let's check out how our Newton's Method code works now.


Finding the Roots of a Cubic Polynomial


Let's use our newtons_method() procedure to find some roots of the cubic polynomial p(x) = (x+2)(x-1)(x-3). Written in that form, we can read the roots off immediately: we expect x = -2, x = 1 and x = 3.

Now the exact root that we end up with will likely depend on which of them is closest to our initial guess. So we should try the computation with different initial guesses and see if we get different roots:

     cubic = lambda {|x| (x+2)*(x-1)*(x-3)}
     #> #<Proc:0x17e1b0c...

     newtons_method(cubic, -4.0)
     #> -2.813531168739014
     #> -2.2123485112688557
     #> -2.0204890109997287
     #> -2.00021914283588
     #> -2.0000000139230365
     #>  => -2.0000000139230365 


     newtons_method(cubic, 0.2)
     #> 1.0675842512119373
     #> 0.9991124332828843
     #> 0.9999998837245792
     #>  => 0.9999998837245792 


     newtons_method(cubic, 7.0)
     #> 5.1052947364746
     #> 3.941614424158428
     #> 3.3048012717747115
     #> 3.0486159292291264
     #> 3.0015726455603353
     #> 3.0000018380118294
     #>  => 3.0000018380118294


Sure enough, we can find approximations for each of the different roots of the cubic by changing our initial guess. Note that the convergence is quite quick too.


Finding the Roots of the Sine Function 


Here's a trickier example. Because the sine function is periodic, we actually have an infinite number of roots, of the form kπ for any integer k.

But more than that, we also have an infinite number of points of the form mπ/2 with m odd, where the tangent is going to be almost horizontal*. Since geometrically we're always taking our next x-value to be the intercept of this tangent with the x-axis, we could end up miles away from where we started if we manage to land near one of these points!

(* We could also end up with a dead-horizontal tangent of slope 0.0, in which case our method ends up trying to divide by zero and we'll get an error. But this scenario is actually pretty unlikely, computationally speaking, given that we're using approximations everywhere: approximations for the function value, approximations for the derivative, approximations from using floating-point values at all, etc.)

Here's an example to demonstrate why this might be annoying:

     newtons_method(lambda {|x| Math::sin(x)}, 1.4)
     #> -4.399564985304007
     #> -1.3083184654500046
     #> 2.41293702113489
     #> 3.3053965925716784
     #> 3.1401103430854116
     #> 3.1415926545680652
     #>  => 3.1415926545680652 


     newtons_method(lambda {|x| Math::sin(x)}, 1.5)
     #> -12.611369487980836
     #> -12.56634031832556
     #>  => -12.56634031832556 


     newtons_method(lambda {|x| Math::sin(x)}, Math::PI/2)
     #> 20001.570917876215
     #> 20002.967748408017
     #> 20002.48797923701
     #> 20002.520436743973
     #>  => 20002.520436743973 


     newtons_method(lambda {|x| Math::sin(x)}, 1.6)
     #> 35.774039596475035
     #> 33.070107460573205
     #> 45.027768605502274
     #> 43.302452782431864
     #> 44.110824111529126
     #> 43.981583887046526
     #> 43.98229715035379
     #>  => 43.98229715035379


We're getting radically different answers for surprisingly small changes in the initial guess. Because of the nature of the function, we're not guaranteed to hit the root that is closest to our initial guess this time. So we do sometimes need to be careful when using Newton's method.


Finding Square Roots 

 
But enough about how it can go wrong; let's see how it can go right! We can use Newton's method to find the square root of a given number x by solving for a root of the quadratic q(y) = x - y^2 (which is zero exactly when y^2 = x). So we can write another square-root finding method:
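
     # Square root via Newton's Method: find a root of q(y) = x - y^2.
     # (The initial guess of 1.0 is arbitrary here; any reasonable
     # positive starting point will converge.)
     def sqrt(x)
       newtons_method(lambda {|y| x - y*y}, 1.0)
     end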


Now we have yet another implementation of a square-root finding function! We can compare this one to the sqrt() function available in Ruby's Math module:

     sqrt(2)
     #> 1.4142157752280764 


     Math::sqrt(2)
     #> 1.4142135623730951 


     sqrt(64)
     #> 8.000001687567377 


     Math::sqrt(64)
     #> 8.0 


     sqrt(1234567)
     #> 1111.1107055558664 


     Math::sqrt(1234567)
     #> 1111.1107055554814


So we've got another decent implementation of a sqrt() function. Not bad, huh?
