Skip to content
Advertisement

Trouble subtracting two column values correctly/precisely in pandas dataframe in Python

I’m trying to create a new column in my pandas dataframe which will be the difference of two other columns, but the new column has values that are significantly different what what the differences between the values of the columns are. I have heard that ‘float’ values often don’t subtract precisely, so I have tried to convert the decimal values here to integers by changing the columns’ dtypes to ‘int64’ (as suggested here Pandas Subtract Two Columns Not Working Correctly) and then multiplying each value by 100000:

JavaScript

The values in the new column still do not equal the difference in the two original columns and I haven’t been able to find any questions on this site or others that have been able to help me resolve this. Could someone point out how I could fix this? I would be extremely grateful for any help.

For reference, here is a snapshot of my data with the incorrect difference-column values: enter image description here

EDIT: Here is a a bit of my CSV data too:

JavaScript

Ideally, I’d like to create a ‘diff_logfoldchanges0’ column that is equal to the values from the ‘0_logfoldchanges’ column minus the values from the ‘1_logfoldchanges’ column. In the CSV data below, I believe that might be “-1.3434665 – -2.952315”, “-15.159286 – -15.13933”, and “-0.015067068 – 0.18173021”.

Advertisement

Answer

pd.read_csv by default uses a fast but less precise way of reading floating point numbers

JavaScript

This difference is tiny (less than a quadrillionth of the original value), and usually not visible because the display rounds it, so I would not in most contexts call this ‘significant’ (it’s likely to be insignificant in relation to the precision / accuracy of the input data), but does show up if you check your calculation by manually typing the same numbers into the Python interpreter.

To read the values more precisely, use float_precision="round_trip":

JavaScript

Subtracting now produces the expected values (the same as doing a conventional python subtraction):

JavaScript

This is not due to floating point being imprecise as such, but is specific to the way pandas reads CSV files. This is a good guide to floating point rounding. In general, converting to integers will not help, except sometimes when dealing with money or other quantities that have a precise decimal representation.

User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement