Faster Cube Root
Over the weekend, I decided to do some spelunking for a faster cube root. After a lot of searching, I realized most of the existing implementations (Kahan/BSD, Turkowski, et al.) rely on Newton-Raphson.
I found a little gem in a 1942 (!) volume of the Journal of the American Statistical Association by Otis Lancaster, Machine Method for the Extraction of Cube Root. With a little massaging of Lancaster's equation**, I offer you:
double Lancaster(const double a, const double R)
{
    const double a3 = a * a * a;
    const double b = a * (a3 + R + R) / (a3 + a3 + R);
    return b;
}
Given an approximation a of the cube root of R, the function will return an approximation of the cube root of R that is closer than a.
In my tests, at the cost of an extra add* and multiply, Lancaster converges faster than Newton-Raphson -- the number of bits of precision seems to triple with each iteration, rather than double as with Newton-Raphson. Over two iterations, that's the difference between a 4X and a 9X improvement in correct bits (quadratic vs. cubic convergence).
Consequences. Your initial guess doesn't have to be much better than 5 bits of precision to get 16 bits of precision with one iteration of Lancaster. With a 6-bit initial guess, two iterations of Lancaster will give you near double precision (52 bits). If the initial guess is in the 7- to 8-bit range, one iteration will give you single precision (23 bits). Note: Kahan's cbrt() includes a bit hack that gives a 5-bit initial guess.
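I don't have Kahan's exact code in front of me, so here's a sketch of the style of bit hack involved, using the bias constant from the fdlibm/BSD cbrt() (715094163) rather than whatever Kahan's routine actually uses; it assumes positive, normal R:

```c
#include <stdint.h>
#include <string.h>

/* Rough initial guess for cbrt(R): divide the high 32 bits of the double
   (sign, exponent, top of mantissa) by 3 and add a bias. This roughly
   divides the exponent by 3, which is most of what a cube root does.
   Good to about 5 bits, for positive, normal R only. */
double cbrt_seed(const double R)
{
    uint64_t bits;
    memcpy(&bits, &R, sizeof bits);
    uint32_t hi = (uint32_t)(bits >> 32);
    hi = hi / 3 + 715094163u;     /* bias constant as used in fdlibm's cbrt */
    bits = (uint64_t)hi << 32;    /* low 32 bits zeroed */
    double a;
    memcpy(&a, &bits, sizeof a);
    return a;
}
```

Per the tripling rule above, feeding this ~5-bit seed through two iterations of Lancaster should land near double precision.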
This is another subject I hope to write more about.
*The (a3 + R) add need only be done once, since that sum appears in both the numerator and the denominator.
** I've since realized Lancaster's equation is derived from Halley's method. More here.
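The connection is quick to verify. Halley's iteration is

```latex
x_{n+1} = x_n - \frac{2 f(x_n) f'(x_n)}{2 f'(x_n)^2 - f(x_n) f''(x_n)} .
```

With $f(x) = x^3 - R$, $f'(x) = 3x^2$, and $f''(x) = 6x$, the update becomes

```latex
x_{n+1} = x_n - \frac{6 x_n^2 (x_n^3 - R)}{18 x_n^4 - 6 x_n (x_n^3 - R)}
        = x_n - \frac{x_n (x_n^3 - R)}{2 x_n^3 + R}
        = x_n \, \frac{x_n^3 + 2R}{2 x_n^3 + R} ,
```

which is exactly the expression computed in the function above.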