How to efficiently apply an operation to each pandas group

Question

So I have a data frame like this: What I am doing is grouping by id and doing a rolling operation on the delay column like below: It is working just fine, but I am curious whether .apply on a grouped data frame is vectorized or not. Since my dataset is huge, is there a better, vectorized way to do this …

Accepted Answer

You can use strides for vectorized rolling with GroupBy.transform:

    import numpy as np

    k = [0.1, 0.5, 1]

    def rolling_window(a, window):
        # Build a 2-D view of overlapping windows without copying the data
        shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
        strides = a.strides + (a.strides[-1],)
        return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

    def f(d):
        # Pad with two leading zeros so the first rows of each group get a full window
        return np.sum(rolling_window(np.append([0, 0], d.to_numpy()), 3) * k, axis=1)

    df['new_delay'] = df.groupby('id')['delay'].transform(f)
    print(df)

       id  delay  new_delay
    0   1     22       22.0
    1   1     23       34.0
    2   1     44       57.7
    3   2     33       33.0
    4   2     55       71.5
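As a self-contained variant of the same idea, here is a minimal sketch that swaps as_strided for np.lib.stride_tricks.sliding_window_view, which is available in NumPy 1.20 and later and checks the window shape for you. The sample frame is reconstructed from the printed output in this answer, and the names weights and weighted_rolling are illustrative, not from the original post.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

# Sample data reconstructed from the output shown in the answer (an assumption).
df = pd.DataFrame({"id": [1, 1, 1, 2, 2], "delay": [22, 23, 44, 33, 55]})

# Same values as k in the answer; the last weight applies to the current row.
weights = np.array([0.1, 0.5, 1.0])

def weighted_rolling(delay):
    # Left-pad with len(weights) - 1 zeros so every row gets a full window.
    padded = np.concatenate(
        [np.zeros(len(weights) - 1), delay.to_numpy(dtype=float)]
    )
    # Read-only view of shape (len(delay), len(weights)); no data is copied.
    windows = sliding_window_view(padded, len(weights))
    # Weighted sum of each window, computed as a matrix-vector product.
    return windows @ weights

df["new_delay"] = df.groupby("id")["delay"].transform(weighted_rolling)
print(df)
```

The function is still called once per group in Python, so the work is vectorized inside each group rather than across groups; with a very large number of small groups, the per-group call overhead may still dominate.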


Answer