I have a dataframe (here an example)
Date | UnitId | ServiceDomineId | Interval | ServiceTime |
---|---|---|---|---|
01/01/2021 | 1 | 1 | 8:00 | 30 |
01/01/2021 | 1 | 1 | 8:30 | 20 |
01/01/2021 | 1 | 1 | 9:00 | 10 |
01/01/2021 | 2 | 1 | 8:00 | 50 |
01/01/2021 | 2 | 1 | 9:00 | 10 |
01/01/2021 | 1 | 2 | 8:30 | 25 |
01/01/2021 | 1 | 2 | 9:00 | 15 |
01/01/2021 | 1 | 2 | 9:30 | 30 |
01/01/2021 | 2 | 2 | 8:00 | 45 |
01/01/2021 | 2 | 2 | 8:30 | 10 |
02/01/2021 | 1 | 1 | 8:00 | 30 |
02/01/2021 | 1 | 1 | 8:30 | 45 |
02/01/2021 | 1 | 1 | 9:00 | 10 |
02/01/2021 | 2 | 1 | 8:00 | 30 |
02/01/2021 | 2 | 1 | 8:30 | 55 |
02/01/2021 | 2 | 1 | 9:00 | 60 |
02/01/2021 | 1 | 2 | 8:00 | 35 |
02/01/2021 | 1 | 2 | 8:30 | 15 |
02/01/2021 | 1 | 2 | 9:00 | 10 |
02/01/2021 | 2 | 2 | 8:00 | 20 |
02/01/2021 | 2 | 2 | 8:30 | 35 |
02/01/2021 | 2 | 2 | 9:00 | 10 |
And I need to have a new column being what happened on the previous Iterval (Note: I actually have the intervals saved as a number, every interval is 1/48 more than the previous one, for example 00:00 is 0, 8:00 would be 16/48, then it is 1/3 or as I have it 0.333333) Also if the previous interval doesn’t exist in the Data I have, I would like it to show 0 in that column. Here an example of how it should be.
Date | UnitId | ServiceDomineId | Interval | ServiceTime | ServiceTimePreviousInterval |
---|---|---|---|---|---|
01/01/2021 | 1 | 1 | 8:00 | 30 | 0 |
01/01/2021 | 1 | 1 | 8:30 | 20 | 30 |
01/01/2021 | 1 | 1 | 9:00 | 10 | 20 |
01/01/2021 | 2 | 1 | 8:00 | 50 | 0 |
01/01/2021 | 2 | 1 | 9:00 | 10 | 0 |
01/01/2021 | 1 | 2 | 8:30 | 25 | 0 |
01/01/2021 | 1 | 2 | 9:00 | 15 | 25 |
01/01/2021 | 1 | 2 | 9:30 | 30 | 15 |
01/01/2021 | 2 | 2 | 8:00 | 45 | 0 |
01/01/2021 | 2 | 2 | 8:30 | 10 | 45 |
02/01/2021 | 1 | 1 | 8:00 | 30 | 0 |
02/01/2021 | 1 | 1 | 8:30 | 45 | 30 |
02/01/2021 | 1 | 1 | 9:00 | 10 | 45 |
02/01/2021 | 2 | 1 | 8:00 | 30 | 0 |
02/01/2021 | 2 | 1 | 8:30 | 55 | 30 |
02/01/2021 | 2 | 1 | 9:00 | 60 | 55 |
02/01/2021 | 1 | 2 | 8:00 | 35 | 0 |
02/01/2021 | 1 | 2 | 8:30 | 15 | 35 |
02/01/2021 | 1 | 2 | 9:00 | 10 | 15 |
02/01/2021 | 2 | 2 | 8:00 | 20 | 0 |
02/01/2021 | 2 | 2 | 8:30 | 35 | 20 |
02/01/2021 | 2 | 2 | 9:00 | 10 | 35 |
I thought about using Pandas function shift, but it whouldn’t help when there are missing intervals. Also thought about using nested for to separate in different dataframes and make the desicion with an if, but it since the complete dataframe is really big, it takes too long. Do you know how could I do it?
Here an image of the complete dataframe and a comparison of how I see my intervals (float) vs what they are to understand it better.
Another logical way I found to do this is
df=df.sort_values(['Date','UnitId','ServiceProfileId','Intervals']) df['ServiceTimePI']=np.zeros(len(df)) df['ServiceTimePI'][0]=0 for i in range(len(df)-1): if (df['Date'][i]==df['Date'][i+1]) and (df['UnitId'][i]==df['UnitId'][i+1]) and (df['ServiceProfileId'][i]==df['ServiceProfileId'][i+1]) and (df['Intervals'][i+1]-df['Intervals'][i]<=1/48+0.00000001) and (df['Intervals'][i+1]-df['Intervals'][i]>=1/48-0.00000001): df['ServiceTimePI'][i+1]=df['Promedio_ServiceTime'][i] else: df['ServiceTimePI'][i+1]=0
Advertisement
Answer
First it orders the dataframe by the columns needed, then creates the new column filled with 0s, since the first value is always going to be 0, then the for just checks if there is a change in the columns and if the interval are half an hour one from the other, if there is no change and the intervals are half an hour from each other, then the value I need is the one from the previous interval, else it is 0.
df=df.sort_values(['Date','UnitId','ServiceProfileId','Intervals']) df['ServiceTimePI']=np.zeros(len(df)) df['ServiceTimePI'][0]=0 for i in range(len(df)-1): if (df['Date'][i]==df['Date'][i+1]) and (df['UnitId'][i]==df['UnitId'][i+1]) and (df['ServiceProfileId'][i]==df['ServiceProfileId'][i+1]) and (df['Intervals'][i+1]-df['Intervals'][i]<=1/48+0.00000001) and (df['Intervals'][i+1]-df['Intervals'][i]>=1/48-0.00000001): df['ServiceTimePI'][i+1]=df['Promedio_ServiceTime'][i] else: df['ServiceTimePI'][i+1]=0