Does loc/iloc return a reference or a copy?

I am experiencing some problems while using .loc / .iloc as part of a loop. This is a simplified version of my code:

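import pandas as pd

INDEX = ['0', '1', '2', '3', '4']
COLUMNS = ['A', 'B', 'C']
df = pd.DataFrame(index=INDEX, columns=COLUMNS)
i = 0

while i < 1000:

    # function() stands in for whatever produces the values of one row
    for row in INDEX:
        df.loc[row] = function()
    # breakpoint

    i_max = df['A'].idxmax()
    row_MAX = df.loc[i_max]

    if i == 0:
        row_GLOBALMAX = row_MAX
    # compare the 'A' values; comparing whole rows gives an ambiguous truth value
    elif row_MAX['A'] > row_GLOBALMAX['A']:
        row_GLOBALMAX = row_MAX

    i += 1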

Basically:

1. I initialize a dataframe with index and columns.
2. I populate each row of the dataframe with a for loop.
3. I find the index ‘i_max’ of the maximum value in column ‘A’.
4. I save the row of the dataframe where the value is maximum as ‘row_MAX’.
5. The while loop repeats steps 2 to 4 and uses a new variable, ‘row_GLOBALMAX’, to save the row with the highest value in column ‘A’.

The code works as expected during the first execution of the while loop (i=0), however at the second iteration (i=1) when I stop at the indicated breakpoint I observe a problem: both ‘row_MAX’ and ‘row_GLOBALMAX’ have already changed with respect to the first iteration and have followed the values in the updated ‘df’ dataframe, even though I haven’t yet assigned them in the second iteration.

Basically, it seems that .loc created a pointer to a particular row of the ‘df’ dataframe instead of copying the values at that moment. Is this the normal behaviour? What should I use instead of .loc?

Answer

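Both loc and iloc (I didn’t test iloc) can hand back a view onto a row of the dataframe rather than an independent copy, so the returned row may keep following later changes to ‘df’. Whether you get a view or a copy depends on the dataframe’s dtypes and on the pandas version, so don’t rely on either.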

You can use the copy() method on the row to solve your problem.
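To make the view/copy distinction concrete, here is a minimal sketch in plain NumPy, which pandas uses for storage in the common case. It is only an illustration of the idea, not of pandas internals; the names `table`, `row_view` and `row_copy` are made up for this example.

```python
import numpy as np

# A small 5x3 table; row 3 starts out as [9, 10, 11]
table = np.arange(15).reshape(5, 3)

row_view = table[3]         # basic indexing returns a view onto row 3
row_copy = table[3].copy()  # copy() takes an independent snapshot

# Overwrite row 3 in place, as the loop in the question does with df.loc[row]
table[3] = [-1, -2, -3]

print(row_view)  # [-1 -2 -3]  (the view follows the change)
print(row_copy)  # [ 9 10 11]  (the copy keeps the old values)

# shares_memory confirms which of the two still points into the table
print(np.shares_memory(row_view, table))  # True
print(np.shares_memory(row_copy, table))  # False
```

The same reasoning applies to a row taken from a dataframe: if it happens to be a view, it changes when the dataframe is overwritten, while a copy does not.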

import pandas as pd
import numpy as np

INDEX = ['0', '1', '2', '3', '4']
COLUMNS = ['A', 'B', 'C']

df = pd.DataFrame(index=INDEX, columns=COLUMNS)

np.random.seed(5)

# Fill every row with three random integers
for idx in INDEX:
    df.loc[idx] = np.random.randint(-100, 100, 3)

print("First state")
a_row = df.loc["3"]       # may be a view onto df
a_row_cp = a_row.copy()   # independent snapshot of the row

print(df)
print("---\n")
print(a_row)

print("\n==================================\n\n\n")

# Overwrite every row with new random values
for idx in INDEX:
    df.loc[idx] = np.random.randint(-100, 100, 3)

print("Second state")
print(df)
print("---\n")
print(a_row)      # follows df if it is a view
print("---\n")
print(a_row_cp)   # still holds the first-state values
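Applied to the loop from the question, the fix could look like the sketch below. Several things here are assumptions made for illustration: `make_row()` is a hypothetical stand-in for the question's `function()`, the dataframe is initialised with zeros so that column ‘A’ is numeric, the loop runs 20 times instead of 1000, and `history` exists only to check the result.

```python
import numpy as np
import pandas as pd

INDEX = ['0', '1', '2', '3', '4']
COLUMNS = ['A', 'B', 'C']

rng = np.random.default_rng(5)

def make_row():
    # Hypothetical replacement for function(): three random integers
    return rng.integers(-100, 100, size=len(COLUMNS))

# Start from zeros so every column has an integer dtype
df = pd.DataFrame(0, index=INDEX, columns=COLUMNS)

row_global_max = None
history = []  # largest 'A' value of each iteration, kept only for checking

for i in range(20):
    # Overwrite every row in place
    for row in INDEX:
        df.loc[row] = make_row()

    i_max = df['A'].idxmax()
    # copy() detaches the row from df, so later overwrites cannot change it
    row_max = df.loc[i_max].copy()
    history.append(int(row_max['A']))

    # Compare the scalar 'A' values, not the whole rows
    if row_global_max is None or row_max['A'] > row_global_max['A']:
        row_global_max = row_max

print(row_global_max)
print(int(row_global_max['A']) == max(history))
```

Because each candidate row is copied at the moment it is selected, ‘row_global_max’ holds the values from the iteration in which it was found, whatever happens to ‘df’ afterwards.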
