Skip to content
Advertisement

Fast way to cyclically wrap values in pandas dataframe

In words: I have a data frame that consists of values over a day, for multiple days per Userid. I’d like to shift all of certain people’s data by 1 period, so that the first value in their first column is a nan, and then everything is cyclically offset, with the last value truncated or lost to space and time, whichever is faster.

What I’m currently doing is slow, because I end up iterating over all of the rows of the people I’m interested in. I wonder if there is not a faster solution I’m not thinking of.

In code:

Given this dataset:

df = pd.DataFrame([['person1','3/1/2014',1,2,3,4,5,6,7,8,9],
    ['person1','3/2/2014',4,1,4,1,4,1,4,1,4], 
    ['person2','3/1/2014',2,3,4,5,6,7,8,9,9], 
    ['person2','3/2/2014',6,5,4,3,2,1,0,-1,-15]],columns = 
    ['Userid','Date','T1','T2','T3','T4','T5','T6','T7','T8','T9'])

I currently use the following code to shift the values for certain ids:

def shift_by_1(df_):
    next_seed_value = [np.nan]
    L=[]
    for i,row in df.iterrows():
        newrow_plus = next_seed_value+row.values.tolist()
        next_seed_value = [newrow_plus[-1]]
        newrow = newrow_plus[:-1]
        L.append(newrow)
    return pd.DataFrame(L)

LL = []
for xx in ids_to_be_shifted:
    LL.append(shift_by_15(df[df['id']==xx]))
    
newdf = pd.concat(LL)

Which achieves the following:

newdf == pd.DataFrame([['person1','3/1/2014',np.nan,1,2,3,4,5,6,7,8],
    ['person1','3/2/2014',9,4,1,4,1,4,1,4,1], 
    ['person2','3/1/2014',np.nan,2,3,4,5,6,7,8,9], 
    ['person2','3/2/2014',9,6,5,4,3,2,1,0,-1]],columns = 
    ['Userid','Date','T1','T2','T3','T4','T5','T6','T7','T8','T9'])

The problem is that shift_by_1 is slow. I feel like there has to be a faster way, maybe through a group-by + clever use of lambda and something like np.ravel() and its opposite?

Any and all help is much appreciated, thanks in advance.

Advertisement

Answer

IIUC:

newdf = (df.set_index(['Userid','Date'])
           .stack()
           .groupby(level=[0])
           .shift()
           .unstack(level=-1)
           .reset_index()
        )

Output:

    Userid      Date   T1   T2   T3   T4   T5   T6   T7   T8   T9
0  person1  3/1/2014  NaN  1.0  2.0  3.0  4.0  5.0  6.0  7.0  8.0
1  person1  3/2/2014  9.0  4.0  1.0  4.0  1.0  4.0  1.0  4.0  1.0
2  person2  3/1/2014  NaN  2.0  3.0  4.0  5.0  6.0  7.0  8.0  9.0
3  person2  3/2/2014  9.0  6.0  5.0  4.0  3.0  2.0  1.0  0.0 -1.0
User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement