Faster Cube Root III
This is post #3 on the subject of faster cube roots (#1, #2).
One of the challenges in efficiently leveraging iterative methods such as Newton-Raphson and Halley's method is coming up with a good initial guess to feed into the iteration. That is the topic of this post.
One approach involves a bit hack exploiting properties of IEEE 754 floating-point numbers. This sort of strategy is used in Kahan's implementation of cbrt() found in the BSD sources. Some knowledge of the guts of floating-point numbers is helpful.
Single precision numbers consist of three bit fields consuming 32 bits in the following way:
S[1] E[8] F[23]
The first bit, S, determines the sign of the number: if the bit is set, the number is negative; otherwise it is positive. The next 8 bits hold the exponent with a bias of 127. The last 23 bits form the fractional part, such that:
value = (-1)^S * 2^(E-127) * (1.0 + F/(2^23))
See Wikipedia and attendant references for a more detailed explanation: IEEE 754.
A key insight comes when the memory occupied by an IEEE 754 float is reinterpreted as an integer. When this is done, the integer is approximately logarithmic: after the exponent bias is subtracted, what remains is a log2(x) approximation scaled by 2^23.
For example, a function like the following approximates log2(x) by masking off the sign bit (i & 0x7fffffff), subtracting the exponent bias and scaling appropriately.
float log2_approx(float x)
{
    unsigned int i;
    memcpy(&i, &x, sizeof i);    /* reinterpret the bits; needs <string.h> */
    int t = (int)(i & 0x7fffffff) - (127 << 23); /* signed, so x < 1 goes negative */
    return (float)t * (1.0f / (1 << 23));
}
In the case of cube root, we divide the integer representation by 3 to get an approximation of cbrt(x), which we can then refine using an iterative method such as Newton-Raphson or Halley's. The following function returns a cube root approximation with a maximum relative error of around 5% (in my tests): the exponent bias is subtracted, the integer representation is divided by three, and the bias is restored. Caveat: it assumes x >= 0; if negative x is possible, the sign bit needs to be saved and restored.
float cbrta(float x)
{
    int i;
    memcpy(&i, &x, sizeof i);                /* view the float's bits as an integer */
    i = (i - (127 << 23)) / 3 + (127 << 23); /* divide the biased "log" by 3 */
    memcpy(&x, &i, sizeof x);
    return x;
}
Note: I've collected some notes on my investigations into faster cube roots here: In Search of a Fast Cube Root. Some sample C code is also available.
Homework:
1. Reduce to a divide and an add.
2. Replace the divide with a multiply.
3. Implement a double precision version.
4. Create a general n-th root approximation.
1 Comment:
Thanks for all your work! I've been trying to implement your cube root bit hacks on a 64-bit system; do you have any hints?