I want to replicate the data from the same dataframe when a certain condition is fulfilled. Dataframe:
    Hour,Wage
    1,15
    2,17
    4,20
    10,25
    15,26
    16,30
    17,40
    19,15
I want to replicate rows of the dataframe when, while looping through it, the difference in row.hour between consecutive rows is greater than 4.
Expected Output:
    Hour,Wage
    1,15
    2,17
    4,20
    10,25
    15,26
    16,30
    17,40
    19,15
    2,17
    4,20
I want to replicate rows when, while iterating through all the rows, there is a difference greater than 4 in row.hour.
For example, row.hour[0] = 1 and row.hour[1] = 2, so the difference there is 1. But between row.hour[2] = 4 and row.hour[3] = 10 the difference is 6, which is greater than 4. I want to replicate the data above the index where this condition (greater than 4) is fulfilled.
I can replicate the data with **df = pd.concat([df]*2, ignore_index=False)**, but it does not replicate when I run it inside an if statement.
I tried the code below but nothing is happening.
    for i in range(0, len(df) - 1):
        if (df.iloc[i, 0] - df.iloc[i + 1, 0]) > 4:
            df = pd.concat([df] * 2, ignore_index=False)
Answer
My understanding is: you want to compare 'Hour' values for two successive rows. If the difference is > 4, you want to add the row preceding that pair to the DF. If that is what you want, try this:
Create a DF:
    import pandas as pd

    j = pd.DataFrame({'Hour': [1, 2, 4, 10, 15, 16, 17, 19],
                      'Wage': [15, 17, 20, 25, 26, 30, 40, 15]})
Define a function:
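    def f1(d):
        dn = d.copy()
        for x in range(len(d) - 2):
            # Gap between the two rows that follow row x
            if abs(d.iloc[x + 1].Hour - d.iloc[x + 2].Hour) > 4:
                # A fractional label places the copy right after row x once sorted
                idx = x + 0.5
                dn.loc[idx] = d.iloc[x]['Hour'], d.iloc[x]['Wage']
        # Sort by label so the copies fall into place, then renumber the rows
        dn = dn.sort_index().reset_index(drop=True)
        return dn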
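The abs() in the check matters. As a small sketch using the data from the question (the names `question_diff` and `gaps` are only illustrative), this compares the difference as written in the question's loop with the absolute gap used in f1. Because the hours are ascending, the question's subtraction is always negative, which would explain why that loop appears to do nothing:

```python
import pandas as pd

df = pd.DataFrame({'Hour': [1, 2, 4, 10, 15, 16, 17, 19],
                   'Wage': [15, 17, 20, 25, 26, 30, 40, 15]})

# Difference as written in the question: current row minus next row.
# With ascending hours every value is negative, so "> 4" is never true.
question_diff = [int(df.iloc[i, 0] - df.iloc[i + 1, 0])
                 for i in range(len(df) - 1)]

# Absolute gap between each row and the one before it (first row is NaN).
gaps = df['Hour'].diff().abs()

print(question_diff)                    # [-1, -2, -6, -5, -1, -1, -2]
print(gaps[gaps > 4].index.tolist())    # [3, 4]
```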
Call the function passing your DF:
    nd = f1(j)

       Hour  Wage
    0     1    15
    1     2    17
    2     2    17
    3     4    20
    4     4    20
    5    10    25
    6    15    26
    7    16    30
    8    17    40
    9    19    15
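Note that f1 inserts each copy next to its original row, while the expected output in the question lists the copies at the end. If the end position is what you need, here is a minimal loop-free sketch. It assumes the same rule as f1 (copy row x when the gap between rows x+1 and x+2 exceeds 4), and the names `gap_here`, `to_copy` and `appended` are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'Hour': [1, 2, 4, 10, 15, 16, 17, 19],
                   'Wage': [15, 17, 20, 25, 26, 30, 40, 15]})

# True at row i when the gap between rows i-1 and i exceeds 4.
gap_here = df['Hour'].diff().abs() > 4

# f1 copies the row two positions before the row where the gap ends,
# so shift the mask up by two to mark the rows to copy.
to_copy = gap_here.shift(-2, fill_value=False)

# Append the marked rows after the original data and renumber the index.
appended = pd.concat([df, df[to_copy]], ignore_index=True)
print(appended)
```

With the sample data this marks the rows (2, 17) and (4, 20) and places them after the last original row.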