I am experiencing some problems while using .loc / .iloc as part of a loop. This is a simplified version of my code:
INDEX=['0', '1', '2', '3', '4']
COLUMNS=['A','B','C']
df=pd.DataFrame(index=INDEX, columns=COLUMNS)
i=0
while i<1000:
    for row in INDEX:
        df.loc[row] = function()
    #breakpoint
    i_max = df['A'].idxmax()
    row_MAX=df.loc[i_max]
    if i == 0:
        row_GLOBALMAX=row_MAX
    elif row_MAX > row_GLOBALMAX:
        row_GLOBALMAX=row_MAX
    i+=1
Basically:
1. I initialize a dataframe with index and columns.
2. I populate each row of the dataframe with a for loop.
3. I find the index ‘i_max’ of the maximum value in column ‘A’.
4. I save the row of the dataframe where the value is maximum as ‘row_MAX’.
5. The while loop iterates over steps 2 to 4 and uses a new variable ‘row_GLOBALMAX’ to save the row with the highest value in column ‘A’.
The code works as expected during the first execution of the while loop (i=0), however at the second iteration (i=1) when I stop at the indicated breakpoint I observe a problem: both ‘row_MAX’ and ‘row_GLOBALMAX’ have already changed with respect to the first iteration and have followed the values in the updated ‘df’ dataframe, even though I haven’t yet assigned them in the second iteration.
Basically, it seems like .loc created a pointer to a particular row of the ‘df’ dataframe instead of actually copying the values at that particular moment. Is this the normal behaviour? What should I use instead of .loc?
Answer
I think both loc and iloc (I didn’t test iloc) can return a view onto a specific row of the dataframe rather than a copy of it. That is why the saved row follows later writes to ‘df’.
You can use the copy() method on the row to solve your problem.
import pandas as pd
import numpy as np

INDEX=['0', '1', '2', '3', '4']
COLUMNS=['A','B','C']
df=pd.DataFrame(index=INDEX, columns=COLUMNS)

np.random.seed(5)
for idx in INDEX:
    df.loc[idx] = np.random.randint(-100, 100, 3)

print("First state")
a_row = df.loc["3"]
a_row_cp = a_row.copy()
print(df)
print("---\n")
print(a_row)
print("\n==================================\n\n\n")

for idx in INDEX:
    df.loc[idx] = np.random.randint(-100, 100, 3)

print("Second state")
print(df)
print("---\n")
print(a_row)
print("---\n")
print(a_row_cp)
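Applied to the loop from the question, the fix could look roughly like the sketch below. A few things here are assumptions and not part of the original code: function() is a hypothetical stand-in that returns three random integers, the loop runs 10 times instead of 1000, the ‘history’ list exists only for checking the result, and the rows are compared on their ‘A’ values, because comparing two whole rows with > gives a Series rather than a single True/False.

```python
import numpy as np
import pandas as pd

INDEX = ['0', '1', '2', '3', '4']
COLUMNS = ['A', 'B', 'C']

# Hypothetical stand-in for the question's function(): three random integers.
rng = np.random.default_rng(5)

def function():
    return rng.integers(-100, 100, 3)

df = pd.DataFrame(index=INDEX, columns=COLUMNS)
row_GLOBALMAX = None
history = []  # largest 'A' value of each iteration, kept only for checking

for i in range(10):  # shortened from the 1000 iterations in the question
    for row in INDEX:
        df.loc[row] = function()

    # The columns have object dtype, so convert before looking for the maximum.
    col_A = df['A'].astype(int)
    i_max = col_A.idxmax()

    # .copy() takes a snapshot, so later writes to df cannot change it.
    row_MAX = df.loc[i_max].copy()

    # Compare the scalar 'A' values instead of the whole rows.
    if row_GLOBALMAX is None or row_MAX['A'] > row_GLOBALMAX['A']:
        row_GLOBALMAX = row_MAX

    history.append(int(col_A.max()))

print(row_GLOBALMAX)
```

Since row_MAX is already a copy, assigning it to row_GLOBALMAX does not need a second copy(): the name row_MAX is rebound to a fresh copy on every iteration.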