In words:
I have a data frame of values over a day, with multiple days per Userid. I'd like to shift certain people's data by one period, so that the first value in their first column becomes NaN, every other value slides one slot to the right (spilling over from the end of one row into the start of that person's next row), and the very last value is truncated or lost to space and time, whichever is faster. What I'm currently doing is slow, because I end up iterating over all of the rows of the people I'm interested in. I wonder if there is a faster solution that I'm not thinking of.
In code:
Given this dataset:
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [['person1', '3/1/2014', 1, 2, 3, 4, 5, 6, 7, 8, 9],
     ['person1', '3/2/2014', 4, 1, 4, 1, 4, 1, 4, 1, 4],
     ['person2', '3/1/2014', 2, 3, 4, 5, 6, 7, 8, 9, 9],
     ['person2', '3/2/2014', 6, 5, 4, 3, 2, 1, 0, -1, -15]],
    columns=['Userid', 'Date', 'T1', 'T2', 'T3', 'T4', 'T5', 'T6', 'T7', 'T8', 'T9'])
I currently use the following code to shift the values for certain ids:
def shift_by_1(df_):
    # push each T value one slot to the right, carrying the last value of
    # every row into the first slot of the next row; Userid/Date stay put
    next_seed_value = [np.nan]
    L = []
    for i, row in df_.iterrows():
        newrow_plus = next_seed_value + row.iloc[2:].tolist()
        next_seed_value = [newrow_plus[-1]]
        newrow = row.iloc[:2].tolist() + newrow_plus[:-1]
        L.append(newrow)
    return pd.DataFrame(L, columns=df_.columns)

LL = []
for xx in ids_to_be_shifted:
    LL.append(shift_by_1(df[df['Userid'] == xx]))
newdf = pd.concat(LL)
Which achieves the following:
newdf == pd.DataFrame(
    [['person1', '3/1/2014', np.nan, 1, 2, 3, 4, 5, 6, 7, 8],
     ['person1', '3/2/2014', 9, 4, 1, 4, 1, 4, 1, 4, 1],
     ['person2', '3/1/2014', np.nan, 2, 3, 4, 5, 6, 7, 8, 9],
     ['person2', '3/2/2014', 9, 6, 5, 4, 3, 2, 1, 0, -1]],
    columns=['Userid', 'Date', 'T1', 'T2', 'T3', 'T4', 'T5', 'T6', 'T7', 'T8', 'T9'])
The problem is that shift_by_1 is slow. I feel like there has to be a faster way, maybe through a group-by + clever use of lambda and something like np.ravel() and its opposite?
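Roughly, what I have in mind is something per user like the untested sketch below (tcols standing in for the T1..T9 column names):

tcols = ['T1', 'T2', 'T3', 'T4', 'T5', 'T6', 'T7', 'T8', 'T9']

def shift_user_block(block):
    # rough sketch of the ravel idea above, untested:
    # flatten this user's T columns into one long series (row after row),
    # shift it right by one (NaN in front, last value dropped), fold it back
    block = block.copy()
    flat = block[tcols].to_numpy(dtype=float).ravel()
    block[tcols] = np.concatenate(([np.nan], flat[:-1])).reshape(-1, len(tcols))
    return block

# and then, for the ids I care about, something like:
# df[df['Userid'].isin(ids_to_be_shifted)].groupby('Userid', group_keys=False).apply(shift_user_block)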
Any and all help is much appreciated, thanks in advance.
Answer
IIUC:
newdf = (df.set_index(['Userid', 'Date'])
           .stack()
           .groupby(level=[0])
           .shift()
           .unstack(level=-1)
           .reset_index())
Output:
    Userid      Date   T1   T2   T3   T4   T5   T6   T7   T8   T9
0  person1  3/1/2014  NaN  1.0  2.0  3.0  4.0  5.0  6.0  7.0  8.0
1  person1  3/2/2014  9.0  4.0  1.0  4.0  1.0  4.0  1.0  4.0  1.0
2  person2  3/1/2014  NaN  2.0  3.0  4.0  5.0  6.0  7.0  8.0  9.0
3  person2  3/2/2014  9.0  6.0  5.0  4.0  3.0  2.0  1.0  0.0 -1.0
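To shift only certain people (the ids_to_be_shifted from the question), one option is a sketch along these lines, assuming ids_to_be_shifted is a list of Userid values: run the same pipeline on just those rows and concatenate the untouched rows back.

# apply the stack/shift/unstack pipeline only to the selected Userids
mask = df['Userid'].isin(ids_to_be_shifted)

shifted = (df[mask].set_index(['Userid', 'Date'])
                   .stack()
                   .groupby(level=[0])
                   .shift()
                   .unstack(level=-1)
                   .reset_index())

newdf = pd.concat([shifted, df[~mask]], ignore_index=True)

If the original row order matters, sort the result afterwards (e.g. by Userid and Date); the shifted rows come back as floats because of the NaNs.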