
Replacing values in pandas dataframe using nested loop based on conditions

I want to iterate through the DataFrame df and, whenever the current row value df.iloc[i,0] is 0, replace the following 3 values (which are 1) with 0. After replacing the values, the iteration should skip the newly written values and continue from the next index, which is index 7 in the following example. If only the last two values in the DataFrame follow a 0 and they are 1, they should be replaced by 0 as well. Replacing just two values happens only when they are the last values; in the example this is the case for the values at index 9 and 10.

Original DataFrame:

  index       column 1 
    0            1        
    1            1      
    2            1        
    3            0        
    4            1        
    5            1        
    6            1        
    7            1
    8            0
    9            1
   10            1

The new DataFrame I want should look as follows:

 index       column 1 
    0            1        
    1            1      
    2            1        
    3            0        
    4          **0** --> new value 
    5          **0** --> new value 
    6          **0** --> new value        
    7            1    
    8            0
    9          **0** --> new value 
   10          **0** --> new value 

I wrote the following code, but it does not work.

for i in range(len(df)):
    print(df.iloc[i,0])
    if df.iloc[i,0]== 0 :
        j= i + 1 
        while j <= i + 3:
            df.iloc[j,1]= 0
            j= j+ 1 
        i = i + 4 #this is used to skip the new values and starting by the next firt index 
    if (len(df)- i < 2) and (df.iloc[i,0]== 0): #replacing the two last values by 0 if the previous value is 0. 
        j= i + 1 
        while j <= len(df)
            df.iloc[j,1]= 0


Answer

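There are several things in your code that could be improved.

First, a for i in range(len(df)): loop is usually not a good idea; it is not the Pandas way. Note also that len(df) (or df.shape[0]) gives the number of rows, whereas df.size gives the number of cells (rows times columns), so the two only agree for a single-column DataFrame. In Python you would normally loop like this: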

for i, colmn_value in enumerate(df[colmn_name]): 

if you definitely need the index (in most cases, including the one in your question, you don’t), or with

for colmn_value in df[colmn_name]: 
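
As a quick illustration of the row-count options and the two loop styles, here is a minimal sketch. The two-column DataFrame is made up for the example (it only resembles the shape of the data in the question), and the names n_rows, shape and n_cells are placeholders.

```python
import pandas as pd

# Hypothetical frame with 11 rows and 2 columns, assumed for illustration only.
df = pd.DataFrame({"index": range(11),
                   "column 1": [1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1]})

n_rows = len(df)     # number of rows
shape = df.shape     # (rows, columns)
n_cells = df.size    # rows * columns, i.e. the number of cells

print(n_rows, shape, n_cells)  # 11 (11, 2) 22

# Iterating over a column yields its values; enumerate() adds a running position.
for pos, value in enumerate(df["column 1"]):
    if value == 0:
        print(f"zero at position {pos}")
```

With more than one column, df.size overshoots the row count, which is why len(df) is the safer bound when indexing rows.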

At the bottom I have provided your corrected code, which now works. The issues I fixed to make it run are explained in comments in the code, so check them out. They are the usual ‘traps’ a beginner runs into when learning to code; your main idea was right.

You seem to already have programming experience in another language like C or C++, but don’t expect a Python for i in range(N): loop to behave like a C loop, where the index is a variable that is incremented on each iteration and can be changed inside the loop to skip indices. You can’t do that in a Python for loop, which takes its values from range(), enumerate() or another iterable. If you want to change the index within the loop, use a Python ‘while’ loop.
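
To make the difference concrete, here is a small sketch using plain integers instead of a DataFrame. The list names visited_for and visited_while are only there for the illustration.

```python
# A for loop takes each value of i from range(); reassigning i inside the
# body is overwritten at the start of the next iteration.
visited_for = []
for i in range(8):
    visited_for.append(i)
    if i == 3:
        i = i + 4  # no lasting effect: the next iteration sets i = 4

# A while loop only advances i when the code says so, so skipping works.
visited_while = []
i = 0
while i < 8:
    visited_while.append(i)
    if i == 3:
        i = i + 4  # jump past indices 4, 5 and 6
    else:
        i = i + 1

print(visited_for)    # [0, 1, 2, 3, 4, 5, 6, 7]
print(visited_while)  # [0, 1, 2, 3, 7]
```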

The code below solves the same task in two versions (a longer one that is not the Pandas way, and a second one that does the same the Pandas way). Both use the ‘trick’ of counting down the replacements from 3 to 0 once a zero value is detected in the column, and replace a value only while the countdown is non-zero.

Change VERBOSE to False to switch off the printed lines that show how the code works under the hood. As this is Python, the code mostly explains itself, because the syntax reads close to a description of what is to be done.

VERBOSE = True
if VERBOSE: new_colmn_value = "**0**"
else:       new_colmn_value =    0
new_colmn = []
countdown = 0
for df_colmn_val in df.iloc[:,0]: # i.e. "column 1"
    new_colmn.append(new_colmn_value if countdown else df_colmn_val)
    if VERBOSE: 
        print(f'{df_colmn_val=}, {countdown=}, new_colmn={new_colmn_value if countdown else df_colmn_val}')
    if df_colmn_val == 0 and not countdown:
        countdown = 4
    if countdown: countdown -= 1 
df[df.columns[0]] = new_colmn # replaces the whole first column, same as df['column 1'] = new_colmn
print(df)

gives:

df_colmn_val=1, countdown=0, new_colmn=1
df_colmn_val=1, countdown=0, new_colmn=1
df_colmn_val=1, countdown=0, new_colmn=1
df_colmn_val=0, countdown=0, new_colmn=0
df_colmn_val=1, countdown=3, new_colmn=**0**
df_colmn_val=1, countdown=2, new_colmn=**0**
df_colmn_val=1, countdown=1, new_colmn=**0**
df_colmn_val=1, countdown=0, new_colmn=1
df_colmn_val=0, countdown=0, new_colmn=0
df_colmn_val=1, countdown=3, new_colmn=**0**
df_colmn_val=1, countdown=2, new_colmn=**0**
      column 1
index         
0            1
1            1
2            1
3            0
4        **0**
5        **0**
6        **0**
7            1
8            0
9        **0**
10       **0**

And now the Pandas way of doing the same:

ct = 0; nv ='*0*'
def ctF(row): 
    global ct # the countdown counter
    r0 = row.iloc[0] # column 0 value in the row of the dataframe
    new = nv if ct else r0 # pick new or old value depending on counter
    if ct: ct -= 1 # decrease the counter if not yet zero
    else : ct  = 3 if r0==0 else 0 # set counter if there is zero in row
    return new # .apply() collects the returned values
df[df.columns[0]] = df.apply(ctF, axis=1) # axis=1: work on rows (and not on columns)
print(df)

The code above uses the Pandas .apply() method, which passes each row of the DataFrame to the ctF function. The function returns the value to keep for that row, and .apply() collects the returned values into a column (a Series) that is then assigned back to the DataFrame; changing the row inside the function is not a reliable way to update df. A global variable in the ctF function makes sure that the next function call knows the countdown value set in the previous call. Note that .apply() with axis=1 still calls the Python function once per row, so it is more compact than an explicit loop but not necessarily faster on large DataFrames.
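
If the global variable feels uncomfortable, the countdown can also be kept inside a closure. The following is only a sketch of that idea, shown on a plain list; make_replacer, run and marker are hypothetical names that do not appear in the original code.

```python
def make_replacer(run=3, marker=0):
    """Build a function that replaces the `run` values following each 0 with `marker`."""
    countdown = 0  # state shared between calls, kept inside the closure

    def replace(value):
        nonlocal countdown
        if countdown:
            countdown -= 1
            return marker
        if value == 0:
            countdown = run  # start a new countdown after an untouched 0
        return value

    return replace


column = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1]
replace = make_replacer(marker="*0*")
print([replace(v) for v in column])
# [1, 1, 1, 0, '*0*', '*0*', '*0*', 1, 0, '*0*', '*0*']
```

Each call to make_replacer() starts with a fresh counter, so nothing has to be reset by hand between runs. On a DataFrame this should work along the lines of df["column 1"] = df["column 1"].map(make_replacer()), assuming the column is named "column 1".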

Below is your own code, fixed so that it now runs and does what it was written for:

df = df.astype(object) # lets the integer column hold the '**0**' marker; newer pandas versions complain otherwise
for i in range(len(df)):
   print(df.iloc[i,0])
   if df.iloc[i,0]== 0 :
       j= i + 1 
       while ( j <= i + 3 ) and j < len(df): # handles table end !!! (len(df) is the number of rows)
           print(f'{i=} {j=}')
           df.iloc[j, 0] = '**0**' # first column has index 0 !!!
           # the marker is not equal to 0, so replaced rows do not trigger new replacements;
           # writing a plain 0 here would zero out everything after the first 0
           j= j+ 1 
       # i = i + 4 # this is used to skip the new values and start at the next first index
       # !!!### changing i in the loop will NOT do what you expect it to do !!!
       # the next i will be just i+1 getting its value from range() and NOT i+4              
   this_is_not_necessary_as_it_is_handled_already_above = """
   if (len(df)- i < 2) and (df.iloc[i,0]== 0): #replacing the two last values by 0 if the previous value is 0. 
       j= i + 1 
       while j <= len(df):
           df.iloc[j,1]= 0
   """
print(df)

printing:

1
1
1
0
i=3 j=4
i=3 j=5
i=3 j=6
**0**
**0**
**0**
1
0
i=8 j=9
i=8 j=10
**0**
**0**
      column 1
index         
0            1
1            1
2            1
3            0
4        **0**
5        **0**
6        **0**
7            1
8            0
9        **0**
10       **0**
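
If you want plain 0 values instead of the '**0**' marker and a real skip over the replaced rows, a while loop does the job. The following is a minimal sketch on a plain list rather than a definitive solution; zero_after_zeros and run are hypothetical names introduced here.

```python
def zero_after_zeros(values, run=3):
    """Return a copy of values where up to `run` items after each 0 are set to 0.

    Items overwritten this way are skipped, so they do not trigger a new run.
    """
    result = list(values)
    i = 0
    while i < len(result):
        if result[i] == 0:
            # Overwrite the next `run` items, stopping at the end of the list.
            stop = min(i + 1 + run, len(result))
            for j in range(i + 1, stop):
                result[j] = 0
            i = stop  # continue after the overwritten block
        else:
            i += 1
    return result


column = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1]
print(zero_after_zeros(column))
# [1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0]
```

Assuming the column is named "column 1", the result could be written back with df["column 1"] = zero_after_zeros(df["column 1"].tolist()). The end of the table needs no special case, because the inner range simply stops at the last row.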
User contributions licensed under: CC BY-SA