Numpy multiplication using * (asterisk) returning wrong values when using named variables

Question

I am running into a problem using the operator * with numpy scalars, and it would be great if someone could explain what is going on. Basically, I needed to multiply the sums of columns and rows from various dataframes, and the easiest way to do that was to assign each aggregate to a variable and then multiply those variables.

Accepted Answer

The problem is that you are using fixed-width integers (int64), which are capped in the minimum and maximum values they can hold, and you are trying to represent a number larger than that maximum (integer overflow).

You could either use variable-size integers (like the big int that Python uses) or switch to floats, which trade off some precision for a larger range of representable values.

Practically, you can just force the _sum variables to be treated as float before overflowing:

    a_sum = a_sum.astype(np.float64)

With this you can observe that the following:

    no_vars = 111110 * 222220 * 333330 * 444440
    a_sum = a_sum.astype(np.float64)
    with_vars = a_sum * b_sum * c_sum * d_sum
    print(no_vars / with_vars)

will print a value of 1.0.

Note that this apparently exact result is specific to this calculation and to how the numbers get converted. In general, results obtained with float arithmetic and big int arithmetic will be different, e.g.:

    print(no_vars)
    # 3657832649657049840000
    print(with_vars)
    # 3.6578326496570497e+21
    print(float(no_vars))
    # 3.6578326496570497e+21
    print(int(with_vars))
    # 3657832649657049677824
    print(no_vars == with_vars)
    # False
    print(float(no_vars) == with_vars)
    # True
    print(no_vars == int(with_vars))
    # False
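The arithmetic behind the overflow can also be checked without NumPy. The following is a minimal sketch in plain Python, assuming the four factors from the no_vars expression are the values held by the _sum variables. The helper wrap_int64 is a hypothetical name introduced here to mimic signed 64-bit wrap-around; it is not a NumPy function.

```python
# Factors taken from the no_vars expression (assumed to match the _sum variables).
factors = [111110, 222220, 333330, 444440]

INT64_MAX = 2**63 - 1  # largest value a signed 64-bit integer can hold


def wrap_int64(n):
    """Hypothetical helper: the value a signed 64-bit integer holds after wrap-around."""
    return (n + 2**63) % 2**64 - 2**63


exact = 1     # Python int: arbitrary precision, never overflows
wrapped = 1   # simulated int64: wraps after every multiplication
approx = 1.0  # Python float: 64-bit IEEE 754 double, the same format as np.float64

for f in factors:
    exact *= f
    wrapped = wrap_int64(wrapped * f)
    approx *= f

print(exact > INT64_MAX)        # True: the true product does not fit in int64
print(wrapped == exact)         # False: the wrapped value is not the true product
print(approx == float(exact))   # True: the float result is the nearest double
print(int(approx) == exact)     # False: the double is not exactly the true product
```

If an exact result matters, converting each scalar with int(...) before multiplying keeps the computation in Python's arbitrary-precision integers; the float route is only exact up to the precision of a double.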

Answer