I am trying to understand the following calculation results. Out[1], Out[2] and Out[3] seem to be related to the limit on precision of floats, and Out[4] may be due to the fact that there is no limit on digits of int. Correct? I wonder if someone can explain them in more detail.
In [1]: 2.0**52 == 2.0**52 + 1.0
Out[1]: False

In [2]: 2.0**53 == 2.0**53 + 1.0
Out[2]: True

In [3]: 2**53 == 2**53 + 1.0
Out[3]: True

In [4]: 2**53 == 2**53 + 1
Out[4]: False
Answer
To understand why this happens, we must first understand the limits of int and float.
int & float type:
Integers have unlimited precision. Floating point numbers are usually implemented using double in C;
pow() function:
For int operands, the result has the same type as the operands (after coercion) unless the second argument is negative;
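As a quick check of the two quoted rules, the sketch below prints the result type of `**` for the operand kinds used in the question. The variable names are only illustrative.

```python
# Result types of ** for the operand kinds that appear in the question.
int_power = 2 ** 53        # int base, non-negative int exponent -> int
float_power = 2.0 ** 53    # float base -> float
negative_exp = 2 ** -1     # int base, negative int exponent -> float

for value in (int_power, float_power, negative_exp):
    print(type(value).__name__, value)
```

Only the first expression stays an int, which is why Out[3] and Out[4] start from an exact integer while Out[1] and Out[2] start from a float.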
A double usually occupies 64 bits, of which 52 bits store the mantissa explicitly (an implicit leading bit gives 53 bits of precision). Therefore every integer up to 2⁵³ can be stored without losing precision. (Larger integers can still be stored, but only some of them exactly; the rest are rounded to a nearby representable integer.)
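The precision limit can be inspected directly. This is a minimal sketch, assuming Python 3.9 or later (for `math.ulp`) and a platform whose float is an IEEE 754 double, which is the usual case but not guaranteed.

```python
import math
import sys

# Number of significand bits in a float (explicit bits plus the implicit one).
mantissa_bits = sys.float_info.mant_dig

# math.ulp(x) is the gap between x and the next larger float.
gap_at_2_52 = math.ulp(2.0 ** 52)
gap_at_2_53 = math.ulp(2.0 ** 53)

print(mantissa_bits, gap_at_2_52, gap_at_2_53)
```

On such a platform the gap is 1.0 at 2⁵² and 2.0 at 2⁵³, so adding 1.0 lands on a representable value in the first case but not in the second.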
Out[1]: 2.0**52 is a float, and 2.0**52 + 1.0 is still small enough to be stored exactly, so the two sides differ and the comparison returns False.
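Out[2]: 2.0**53 is a float, but 2.0**53 + 1.0 cannot be represented exactly, because above 2⁵³ consecutive doubles are 2 apart. The sum is rounded to the nearest representable float, which under the default round-half-to-even rule is 2.0**53. Therefore, it returns True.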
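The rounding is to the nearest representable float, not specifically to a power of 2. The sketch below illustrates this, assuming IEEE 754 doubles with the default round-half-to-even mode; the names are illustrative.

```python
base = 2.0 ** 53

plus_one = base + 1.0    # halfway between base and base + 2
plus_two = base + 2.0    # exactly representable
plus_three = base + 3.0  # halfway between base + 2 and base + 4

# Show how far each result ended up from 2**53, using exact int arithmetic.
for label, value in (("+1", plus_one), ("+2", plus_two), ("+3", plus_three)):
    print(label, "->", int(value) - 2 ** 53)
```

Under these assumptions the ties are broken toward the float with an even significand, so adding 1.0 rounds down to 2⁵³ while adding 3.0 rounds up to 2⁵³ + 4.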
Out[3]: 2**53 is an int (because of how the pow function works), but 1.0 is a float, so when the two are added, the int is converted to a float. Just as above, the sum is rounded to 2.0**53. The left-hand side, 2**53, has the same value, so the comparison also returns True.
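The conversion in the mixed case can be made visible with a short check. This sketch again assumes IEEE 754 doubles.

```python
mixed_sum = 2 ** 53 + 1.0   # the int operand is converted to float before adding

print(type(mixed_sum).__name__)      # the result is a float, not an int
print(mixed_sum == float(2 ** 53))   # the added 1.0 was lost to rounding
print(int(mixed_sum) - 2 ** 53)      # distance from 2**53 after rounding
```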
Out[4]: 2**53 and 1 are both ints, and since ints have unlimited precision, the added 1 is not rounded off, so the comparison returns False.
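For contrast, the same addition done purely with ints stays exact at any size, as this small sketch shows.

```python
exact_sum = 2 ** 53 + 1
print(exact_sum - 2 ** 53)   # the added 1 is preserved

big = 2 ** 200               # far beyond what a double can hold exactly
print(big + 1 == big)        # still distinguishable as ints
```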