
Unexpected tail of the floating point from using arange loop to name the columns

I tried to generate some values and name the output columns inside an np.arange loop, as follows:

def conditional_zero_column(filename="random.csv"):
    df = pd.read_csv(filename)
    for i in np.arange(0.6,1.0,0.01):
        df['reject'+str(i)] = np.where(df['expected_discount'] < i, df['expected_discount'], i)
        df['reject'+str(i)] = np.where(df['reject'+str(i)] >= i, df['reject'+str(i)], 0.0)
        df.to_csv("random_data4.csv", index=False)

The conditional number part worked for me.

The column names were fine from reject0.6 to reject0.68. After that, the names picked up unexpected digits, e.g., reject0.6900000000000001, reject0.8000000000000002, and reject0.9900000000000003, as the attached picture shows.

[Image: column names]

I am curious why the numbers are different from 0.69 onward. I tried simply replacing np.arange with np.linspace, but that didn’t work for me either. Is something wrong with my code?

I appreciate any help you can provide.


Answer

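You are seeing the tail of floating-point precision. Most decimal fractions, such as 0.01, have no exact binary floating-point representation, so the values np.arange generates carry tiny rounding errors. str(i) prints enough digits to reproduce the float exactly, which exposes that error as a long tail of digits.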
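To see the effect in isolation, here is a minimal sketch. The variable names are only for illustration; the classic 0.1 + 0.2 case shows the same kind of tail that ends up in the column names.

```python
import numpy as np

# Neither 0.1 nor 0.2 is stored exactly, so their sum is not exactly 0.3.
tail_sum = 0.1 + 0.2
print(tail_sum == 0.3)   # False
print(str(tail_sum))     # 0.30000000000000004

# The same rounding error shows up in the values np.arange generates.
values = np.arange(0.6, 1.0, 0.01)

# str() keeps every digit needed to round-trip the float, so some of
# these names carry a long tail (the question reports names such as
# reject0.6900000000000001).
raw_names = ['reject' + str(v) for v in values]
print([name for name in raw_names if len(name) > len('reject0.99')])
```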

I think you can solve this by formatting the strings you are using for column names.

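import numpy as np
import pandas as pd

def conditional_zero_column(filename="random.csv"):
    df = pd.read_csv(filename)
    for i in np.arange(0.6, 1.0, 0.01):
        # Two-decimal formatting keeps the floating-point tail out of the name
        col = f'reject{i:.2f}'
        # Cap expected_discount at i, then zero out anything below i
        df[col] = np.where(df['expected_discount'] < i, df['expected_discount'], i)
        df[col] = np.where(df[col] >= i, df[col], 0.0)
    # Write the file once, after all columns have been added
    df.to_csv("random_data4.csv", index=False)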
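As an alternative sketch (not part of the answer above), you could loop over integer percentages and divide by 100, so each threshold is the float closest to the intended decimal rather than an np.arange value. The helper name add_reject_columns, its parameters, and the demo data are hypothetical. The single np.where is a condensed form that should give the same values as the two-step version: the threshold where expected_discount is at least the threshold, and 0.0 otherwise.

```python
import numpy as np
import pandas as pd

def add_reject_columns(df, start_pct=60, stop_pct=100):
    """Hypothetical helper: add one reject column per integer percentage."""
    out = df.copy()
    for pct in range(start_pct, stop_pct):
        # Dividing an integer by 100 gives the float nearest to the decimal.
        threshold = pct / 100
        # Fixed two-decimal names: reject0.60, reject0.61, ...
        col = f'reject{threshold:.2f}'
        # 0.0 below the threshold, the threshold itself otherwise.
        out[col] = np.where(out['expected_discount'] < threshold, 0.0, threshold)
    return out

# Made-up demo data, only to show the resulting column names.
demo = pd.DataFrame({'expected_discount': [0.55, 0.65, 0.95]})
result = add_reject_columns(demo)
print(list(result.columns[:4]))
```

Note that the formatted names are always two decimals wide, so the first column is reject0.60 rather than reject0.6.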
User contributions licensed under: CC BY-SA