I want to iterate through the DataFrame df and, whenever the current row value df.iloc[i,0] is 0, replace the next 3 values that are 1 with 0. After replacing the values, the iteration should skip the newly written values and continue from the next index; in the following example, from index 7. If only two values remain at the end of the DataFrame and they are 1, they should be replaced with 0 as well. Replacing just two values happens only when they are the last values; in the example this is the case for the values at index 9 and 10.
Original DataFrame:

    index  column 1
    0      1
    1      1
    2      1
    3      0
    4      1
    5      1
    6      1
    7      1
    8      0
    9      1
    10     1
The new DataFrame I want should look as follows:

    index  column 1
    0      1
    1      1
    2      1
    3      0
    4      **0** --> new value
    5      **0** --> new value
    6      **0** --> new value
    7      1
    8      0
    9      **0** --> new value
    10     **0** --> new value
I wrote this code, but it does not work.
    for i in range(len(df)):
        print(df.iloc[i,0])
        if df.iloc[i,0]== 0 :
            j= i + 1
            while j <= i + 3:
                df.iloc[j,1]= 0
                j= j+ 1
            i = i + 4 #this is used to skip the new values and starting by the next first index
        if (len(df)- i < 2) and (df.iloc[i,0]== 0): #replacing the two last values by 0 if the previous value is 0.
            j= i + 1
            while j <= len(df)
                df.iloc[j,1]= 0
Answer
There are several issues in your code that you could improve.

First, a for i in range(len(df)): loop is usually not a good idea, because it is not the Pandas way. As for the row count: len(df) does give the number of rows. Pandas also has **df.size**, but it counts all cells (rows times columns), so it equals len(df) only for a single-column DataFrame like the one here. If you definitely need the index (in most cases, including the one in your question, you don't), loop in Python like this:

    for i, colmn_value in enumerate(df[colmn_name]):

Otherwise iterate over the values directly:

    for colmn_value in df[colmn_name]:
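As a small illustration of these loop forms, here is a sketch that assumes the question's data lives in a single column named "column 1"; the DataFrame construction is a reconstruction for this example, not taken from the question.

```python
import pandas as pd

# Assumed reconstruction of the example data from the question.
df = pd.DataFrame({"column 1": [1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1]})
df.index.name = "index"

n_rows = len(df)   # number of rows
n_cells = df.size  # rows * columns; equals len(df) only for one column

# Iterate over the values only.
values = [value for value in df["column 1"]]

# Iterate with a running position when the position is really needed.
zero_positions = [i for i, value in enumerate(df["column 1"]) if value == 0]

print(n_rows, n_cells, zero_positions)
```

With a second column added, df.size would double while len(df) stayed the same, which is why len(df) is the safer bound for row loops.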
At the bottom I have provided your corrected code, which now works. The issues I fixed to make it run are explained in the code comments, so check them out. They are the usual traps a beginner runs into when learning to code; your main idea was right.
You seem to already have programming experience in another language like C or C++, but don't expect a for i in range(N): Python loop to behave like a C loop, where the index is a variable you can change inside the loop body to skip indices. A Python for loop takes its values from range(), enumerate() or another iterable, so any change you make to i is overwritten on the next iteration. If you want to change the index within the loop, use a Python while loop.
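To make the while-loop suggestion concrete, here is a minimal sketch of the index-skipping idea from the question. It assumes a single-column DataFrame rebuilt from the example data and writes real 0 values. Skipping matters in that case, because a for loop would visit the freshly written zeros and treat them as new triggers.

```python
import pandas as pd

# Assumed reconstruction of the example data from the question.
df = pd.DataFrame({"column 1": [1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1]})

i = 0
n = len(df)
while i < n:
    if df.iloc[i, 0] == 0:
        # Zero the next (up to) three rows, stopping at the end of the frame.
        stop = min(i + 4, n)
        df.iloc[i + 1:stop, 0] = 0
        i = stop  # skip past the rows that were just overwritten
    else:
        i += 1

print(df["column 1"].tolist())
```

Because the slice is clipped with min(), the case of only two remaining rows at the end needs no separate branch.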
The code below performs the same task in two versions (a longer one that is not the Pandas way, and another that does the same the Pandas way). Both use the trick of counting down the replacements from 3 to 0 once a zero is detected in the column, replacing values only while the countdown is non-zero.

Change VERBOSE to False to switch off the printed lines that show how the code works under the hood. As it is Python, the code mostly explains itself, because the syntax reads much like a description of what is to be done.
    VERBOSE = True
    if VERBOSE:
        new_colmn_value = "**0**"
    else:
        new_colmn_value = 0
    new_colmn = []
    countdown = 0
    for df_colmn_val in df.iloc[:,0]:  # i.e. "column 1"
        new_colmn.append(new_colmn_value if countdown else df_colmn_val)
        if VERBOSE:
            print(f'{df_colmn_val=}, {countdown=}, new_colmn={new_colmn_value if countdown else df_colmn_val}')
        if df_colmn_val == 0 and not countdown:
            countdown = 4
        if countdown:
            countdown -= 1
    df.iloc[:,[0]] = new_colmn  # same as df['column 1'] = new_colmn
    print(df)
gives:
    df_colmn_val=1, countdown=0, new_colmn=1
    df_colmn_val=1, countdown=0, new_colmn=1
    df_colmn_val=1, countdown=0, new_colmn=1
    df_colmn_val=0, countdown=0, new_colmn=0
    df_colmn_val=1, countdown=3, new_colmn=**0**
    df_colmn_val=1, countdown=2, new_colmn=**0**
    df_colmn_val=1, countdown=1, new_colmn=**0**
    df_colmn_val=1, countdown=0, new_colmn=1
    df_colmn_val=0, countdown=0, new_colmn=0
    df_colmn_val=1, countdown=3, new_colmn=**0**
    df_colmn_val=1, countdown=2, new_colmn=**0**
          column 1
    index
    0            1
    1            1
    2            1
    3            0
    4        **0**
    5        **0**
    6        **0**
    7            1
    8            0
    9        **0**
    10       **0**
And now the Pandas way of doing the same:
    ct = 0
    nv = '*0*'

    def ctF(row):
        global ct  # the countdown counter
        r0 = row.iloc[0]  # column 0 value in the row of the dataframe
        row.iloc[0] = nv if ct else r0  # assign new or old value depending on counter
        if ct:
            ct -= 1  # decrease the counter if not yet zero
        else:
            ct = 3 if not ct and r0 == 0 else 0  # set counter if there is zero in row

    df.apply(ctF, axis=1)  # axis=1: work on rows (and not on columns)
    print(df)
The code above uses the Pandas .apply() method, which passes each row of the DataFrame as an argument to the ctF function; the function then works on the row and assigns new values to its elements where necessary. Pandas drives the loop over the rows, but it still calls the Python function once per row, so do not expect a large speed-up on big DataFrames. Writing to the row inside ctF changes df only when the row is a view of the DataFrame's data, which is not guaranteed across Pandas versions and settings. A global variable in the ctF function makes sure that the next function call knows the countdown value set in the previous call. .apply() also returns a column of values (this feature is not used in the code above), which can, for example, be added as a new column to the DataFrame df to hold the results of processing all the rows.
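As a sketch of how the returned column could be used, the variant below returns the new value from the row function instead of writing into the row, and keeps the countdown in a closure rather than a global. The names make_countdown_replacer and "replaced" are hypothetical, the DataFrame is rebuilt from the example data, and the approach assumes that .apply() visits the rows in order and calls the function once per row.

```python
import pandas as pd

# Assumed reconstruction of the example data from the question.
df = pd.DataFrame({"column 1": [1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1]})


def make_countdown_replacer(n_replace=3, new_value=0):
    """Return a row function that remembers its countdown between calls."""
    state = {"countdown": 0}

    def replace(row):
        value = row.iloc[0]
        if state["countdown"]:
            # Still inside the replacement window: emit the new value.
            state["countdown"] -= 1
            return new_value
        if value == 0:
            # A zero outside a window starts a new countdown.
            state["countdown"] = n_replace
        return value

    return replace


# One value is returned per row, so the result is a Series.
df["replaced"] = df.apply(make_countdown_replacer(), axis=1)
print(df)
```

Keeping the original column next to the result makes it easy to compare both, and the function no longer depends on whether the row it receives is a view or a copy.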
Below is your own code, which I have fixed so that it now runs and does what it was written for:
    for i in range(len(df)):
        print(df.iloc[i,0])
        if df.iloc[i,0]== 0 :
            j= i + 1
            while ( j <= i + 3 ) and j < df.size: # handles table end !!!
                print(f'{i=} {j=}')
                df.iloc[j, 0] = '**0**' # first column has index 0 !!!
                j= j+ 1
            # i = i + 4 # this is used to skip the new values and starting by the next first index
            # !!!### changing i in the loop will NOT do what you expect it to do !!!
            # the next i will be just i+1 getting its value from range() and NOT i+4

    this_is_not_necessary_as_it_is_handled_already_above = """
        if (len(df)- i < 2) and (df.iloc[i,0]== 0): #replacing the two last values by 0 if the previous value is 0.
            j= i + 1
            while j <= len(df):
                df.iloc[j,1]= 0
    """
printing:
    1
    1
    1
    0
    i=3 j=4
    i=3 j=5
    i=3 j=6
    **0**
    **0**
    **0**
    1
    0
    i=8 j=9
    i=8 j=10
    **0**
    **0**
          column 1
    index
    0            1
    1            1
    2            1
    3            0
    4        **0**
    5        **0**
    6        **0**
    7            1
    8            0
    9        **0**
    10       **0**