I tried to create some conditional columns in a loop over np.arange and name each output column after the loop value, as follows:
```python
import numpy as np
import pandas as pd

def conditional_zero_column(filename="random.csv"):
    df = pd.read_csv(filename)
    for i in np.arange(0.6, 1.0, 0.01):
        df['reject' + str(i)] = np.where(df['expected_discount'] < i, df['expected_discount'], i)
        df['reject' + str(i)] = np.where(df['reject' + str(i)] >= i, df['reject' + str(i)], 0.0)
    df.to_csv("random_data4.csv", index=False)
```
The conditional part works for me, and the column names are fine from reject0.6 to reject0.68. After that, the remaining column names end in unexpected digits, e.g. reject0.6900000000000001, reject0.8000000000000002, reject0.9900000000000003, as the attached screenshot shows.
[Screenshot: column names]
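The naming issue can be reproduced without random.csv by building only the column names the same way the loop does. This is a minimal sketch; the `is_clean` helper is hypothetical and just flags names with more than two decimal places.

```python
import numpy as np

# Rebuild only the column names, exactly as the loop in the question does.
names = ['reject' + str(i) for i in np.arange(0.6, 1.0, 0.01)]

# Hypothetical helper: a name is "clean" if it has at most two decimal places.
def is_clean(name):
    return len(name.split('.')[1]) <= 2

print(len(names))                                # number of generated columns
print([n for n in names if not is_clean(n)])     # names with long float tails
```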
I am curious why the numbers change from 0.69 onward. I tried simply replacing np.arange with np.linspace, but that doesn't fix it either.
Am I doing something wrong?
I appreciate any help you can provide.
Answer
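You are seeing the tail of floating-point precision. Most decimal fractions, including 0.01, cannot be represented exactly as binary floats, so each value np.arange produces is only an approximation. When a value is not the float closest to the decimal you intended (0.69, say), str() prints the extra digits needed to identify it exactly, which gives a tail such as 0.6900000000000001. np.linspace also computes its values in floating point, so it can produce the same tails.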
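A quick way to see the stored values behind float literals is `decimal.Decimal`, which converts a float exactly rather than rounding it for display. This is a small illustration using only the standard library, not part of the original answer.

```python
from decimal import Decimal

# Decimal(float) shows the exact binary value stored for the literal 0.01,
# which is slightly more than one hundredth.
print(Decimal(0.01))

# The classic example: the rounding errors of 0.1 and 0.2 do not cancel,
# and repr() shows the tail, just like the column names in the question.
print(0.1 + 0.2 == 0.3)
print(repr(0.1 + 0.2))
```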
I think you can solve this by formatting the number to a fixed number of decimal places when you build the column names:
```python
import numpy as np
import pandas as pd

def conditional_zero_column(filename="random.csv"):
    df = pd.read_csv(filename)
    for i in np.arange(0.6, 1.0, 0.01):
        # Format the threshold to two decimals so the name is e.g. reject0.69
        col = f'reject{i:.2f}'
        df[col] = np.where(df['expected_discount'] < i, df['expected_discount'], i)
        df[col] = np.where(df[col] >= i, df[col], 0.0)
    df.to_csv("random_data4.csv", index=False)
```
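Formatting fixes the names, but the comparisons still use the slightly-off thresholds from np.arange. One alternative, sketched below under assumptions, is to loop over integer hundredths and divide once: `k / 100` is a single correctly rounded division, so each threshold is the float closest to 0.60, 0.61, ..., 0.99. The function name `conditional_zero_column_int_steps` and the sample data are made up for illustration; it takes a DataFrame instead of reading random.csv so it can be run standalone.

```python
import numpy as np
import pandas as pd

def conditional_zero_column_int_steps(df):
    """Hypothetical variant of the answer's function using integer steps."""
    for k in range(60, 100):
        # One division per threshold: i is the float nearest to k/100,
        # not a value with a tail like 0.6900000000000001.
        i = k / 100
        col = f'reject{i:.2f}'
        # Same logic as the question: keep values >= i as i, zero the rest.
        df[col] = np.where(df['expected_discount'] < i, df['expected_discount'], i)
        df[col] = np.where(df[col] >= i, df[col], 0.0)
    return df

# Made-up sample data standing in for random.csv
df = pd.DataFrame({'expected_discount': [0.55, 0.65, 0.72, 0.95]})
df = conditional_zero_column_int_steps(df)
print(list(df.columns)[:3])
```

With `:.2f` the names come out as reject0.60, reject0.70, and so on; with this integer loop, `'reject' + str(i)` would also give short names such as reject0.6 and reject0.69, if you prefer the original naming style.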