I recently ran into a surprising and annoying bug in which I converted an integer to a float16 and the value changed:
```
>>> import numpy as np
>>> np.array([2049]).astype(np.float16)
array([2048.], dtype=float16)
>>> np.array([2049]).astype(np.float16).astype(np.int32)
array([2048], dtype=int32)
```
This is likely not a bug, because the same thing happens in PyTorch. I guess it is related to the half-float representation, but I couldn't figure out why 2049 is the first integer that is cast incorrectly.
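For example, a minimal reproduction on the PyTorch side (assuming torch is installed):

```
import torch

# Casting to half precision rounds 2049 the same way NumPy does
print(torch.tensor([2049]).to(torch.float16))
# tensor([2048.], dtype=torch.float16)
```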
The question is not specifically related to Python (I guess).
Answer
You are right, it's generally related to how floating-point numbers are defined (in IEEE 754, as others said). Let's look into it:
The float is represented by a sign bit s (here 1 bit), a mantissa m (here 10 bits) and an exponent e (here 5 bits, with −14 ≤ e ≤ 15). The float x is then calculated as

x = (-1)**s * [1].m * b**e,

where the basis b is 2 and [1] is a fixed leading bit that you get for free (it is implicit and not stored).
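You can inspect these bit fields directly by reinterpreting a float16's raw bits as a 16-bit unsigned integer. A minimal sketch with NumPy (the helper name float16_bits is just illustrative):

```
import numpy as np

def float16_bits(x):
    # Reinterpret the raw bits of a float16 as an unsigned 16-bit integer
    bits = int(np.float16(x).view(np.uint16))
    s = bits >> 15                  # 1 sign bit
    e = ((bits >> 10) & 0x1F) - 15  # 5 exponent bits, bias 15
    m = bits & 0x3FF                # 10 mantissa bits
    return s, e, m

print(float16_bits(2048))  # (0, 11, 0)
print(float16_bits(2049))  # (0, 11, 0) -- identical bits: 2049 was rounded to 2048
print(float16_bits(2050))  # (0, 11, 1)
```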
Up to 2**11 (= 2048) an integer can be represented exactly by the mantissa, where
- 2**11 - 1 is represented by m = bin(2**10 - 1) and e = bin(10)
- 2**11 is represented by m = bin(0) and e = bin(11)
Then things get interesting:
- 2**11 + 1 = 2049 cannot be represented exactly by our mantissa and is rounded (down to 2048, because ties round to the nearest value with an even mantissa).
- 2**11 + 2 = 2050 can be represented (by m = bin(1) and e = bin(11))
and so on… (see the sketch below).
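To see where the rounding kicks in, here is a small sketch (assuming NumPy); round-to-nearest-even explains why 2049 is rounded down while 2051 is rounded up:

```
import numpy as np

# Above 2**11 the gap between adjacent float16 values is 2, so odd
# integers fall exactly halfway and are rounded to the neighbour
# with the even mantissa (round-to-nearest-even)
for n in [2046, 2047, 2048, 2049, 2050, 2051, 2052]:
    print(n, '->', int(np.float16(n)))
# 2046 -> 2046
# 2047 -> 2047
# 2048 -> 2048
# 2049 -> 2048
# 2050 -> 2050
# 2051 -> 2052
# 2052 -> 2052
```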
Watch this video for detailed examples: https://www.youtube.com/watch?v=L8OYx1I8qNg