Skip to content
Advertisement

Floating-point errors in cube root of exact cubic input

I found myself needing to compute the “integer cube root”, meaning the cube root of an integer, rounded down to the nearest integer. In Python, we could use the NumPy floating-point cbrt() function:

import numpy as np
def icbrt(x):
    return int(np.cbrt(x))

Though this works most of the time, it fails at certain input x, with the result being one less than expected. For example, icbrt(15**3) == 14, which comes about because np.cbrt(15**3) == 14.999999999999998. The following finds the first 100,000 such failures:

print([x for x in range(100_000) if (icbrt(x) + 1)**3 == x])
# [3375, 19683, 27000, 50653] == [15**3, 27**3, 30**3, 37**3]

Question: What is special about 15, 27, 30, 37, …, making cbrt() return ever so slightly below the exact result? I can find no obvious underlying pattern for these numbers.

A few observations:

  • The story is the same if we switch from NumPy’s cbrt() to that of Python’s math module, or if we switch from Python to C (not surprising, as I believe that both numpy.cbrt() and math.cbrt() delegate to cbrt() from the C math library in the end).
  • Replacing cbrt(x) with x**(1/3) (pow(x, 1./3.) in C) leads to many more cases of failure. Let us stick to cbrt().
  • For the square root, a similar problem does not arise, meaning that
    import numpy as np
    def isqrt(x):
        return int(np.sqrt(x))
    
    returns the correct result for all x (tested up to 100,000,000). Test code:
    print([x for x in range(100_000) if (y := np.sqrt(x))**2 != x and (y + 1)**2 <= x])
    

Extra

As the above icbrt() only seems to fail on cubic input, we can correct for the occasional mistakes by adding a fixup, like so:

import numpy as np
def icbrt(x):
    y = int(np.cbrt(x))
    if (y + 1)**3 == x:
        y += 1
    return y

A different solution is to stick to exact integer computation, implementing icbrt() without the use of floating-point numbers. This is discussed e.g. in this SO question. An extra benefit of such approaches is that they are (or can be) faster than using the floating-point cbrt().

To be clear, my question is not about how to write a better icbrt(), but about why cbrt() fails at some specific inputs.

Advertisement

Answer

This problem is caused by a bad implementation of cbrt. It is not caused by floating-point arithmetic because floating-point arithmetic is not a barrier to computing the cube root well enough to return an exactly correct result when the exactly correct result is representable in the floating-point format.

For example, if one were to use integer arithmetic to compute nine-fifths of 80, we would expect a correct result of 144. If a routine to compute nine-fifths of a number were implemented as int NineFifths(int x) { return 9/5*x; }, we would blame that routine for being implemented incorrectly, not blame integer arithmetic for not handling fractions. Similarly, if a routine uses floating-point arithmetic to calculate an incorrect result when a correct result is representable, we blame the routine, not floating-point arithmetic.

Some mathematical functions are difficult to calculate, and we accept some amount of error in them. In fact, for some of the routines in the math library, humans have not yet figured out how to calculate them with correct rounding in a known-bounded execution time. So we accept that not every math routine is correctly rounded.

Howver, when the mathematical value of a function is exactly representable in a floating-point format, the correct result can be obtained by faithful rounding rather than correct rounding. So this is a desirable goal for math library functions.

Correctly rounded means the computed result equals the number you would obtain by rounding the exact mathematical result to the nearest representable value.1 Faithfully rounded means the computed result is less than one ULP from the exact mathematical result. An ULP is the unit of least precision, the distance between two adjacent representable numbers.

Correctly rounding a function can be difficult because, in general, a function can be arbitrarily close to a rounding decision point. For round-to-nearest, this is midway between two adjacent representable numbers. Consider two adjacent representable numbers a and b. Their midpoint is m = (a+b)/2. If the mathematical value of some function f(x) is just below m, it should be rounded to a. If it is just above, it should be rounded to b. As we implement f in software, we might compute it with some very small error e. When we compute f(x), if our computed result lies in [m-e, m+e], and we only know the error bound is e, then we cannot tell whether f(x) is below m or above m. And because, in general, a function f(x) can be arbitrarily close to m, this is always a problem: No matter how accurately we compute f, no matter how small we make the error bound e, there is a possibility that our computed value will lie very close to a midpoint m, closer than e, and therefore our computation will not tell us whether to round down or to round up.

For some specific functions and floating-point formats, studies have been made and proofs have been written about how close the functions approach such rounding decision points, and so certain functions like sine and cosine can be implemented with correct rounding with known bounds on the compute time. Other functions have eluded proof so far.

In contrast, faithful rounding is easier to implement. If we compute a function with an error bound less than ½ ULP, then we can always return a faithfully rounded result, one that is within one ULP of the exact mathematical result. Once we have computed some result y, we round that to the nearest representable value2 and return that. Starting with y having error less than ½ ULP, the rounding may add up to ½ ULP more error, so the total error is less than one ULP, which is faithfully rounded.

A benefit of faithful rounding is that a faithfully rounded implementation of a function always produces the exact result when the exact result is representable. This is because the next nearest result is one ULP away, but faithful rounding always has an error less than one ULP. Thus, a faithfully rounded cbrt function returns exact results when they are representable.

What is special about 15, 27, 30, 37, …, making cbrt() return ever so slightly below the exact result? I can find no obvious underlying pattern for these numbers.

The bad cbrt implementation might compute the cube root by reducing the argument to a value in [1, 8) or similar interval and then applying a precomputed polynomial approximation. Each addition and multiplication in that polynomial may introduce a rounding error as the result of each operation is rounded to the nearest representable value in floating-point format. Additionally, the polynomial has inherent error. Rounding errors behave somewhat like a random process, sometimes rounding up, sometimes down. As they accumulate over several calculations, they may happen to round in different directions and cancel, or they may round in the same direction ad reinforce. If the errors happen to cancel by the end of the calculations, you get an exact result from cbrt. Otherwise, you may get an incorrect result from cbrt.

Footnotes

1 In general, there is a choice of rounding rules. The default and most common is round-to-nearest, ties-to-even. Others include round-upward, round-downward, and round-toward-zero. This answer focuses on round-to-nearest.

2 Inside a mathematical function, numbers may be computed using extended precision, so we may have computed results that are not representable in the destination floating-point format; they will have more precision.

Advertisement