I want to lag every column in a dataframe, by group. I have a frame like this:
import numpy as np import pandas as pd index = pd.date_range('2015-11-20', periods=6, freq='D') df = pd.DataFrame(dict(time=index, grp=['A']*3 + ['B']*3, col1=[1,2,3]*2, col2=['a','b','c']*2)).set_index(['time','grp'])
which looks like
col1 col2 time grp 2015-11-20 A 1 a 2015-11-21 A 2 b 2015-11-22 A 3 c 2015-11-23 B 1 a 2015-11-24 B 2 b 2015-11-25 B 3 c
and I want it to look like this:
col1 col2 col1_lag col2_lag time grp 2015-11-20 A 1 a 2 b 2015-11-21 A 2 b 3 c 2015-11-22 A 3 c NA NA 2015-11-23 B 1 a 2 b 2015-11-24 B 2 b 3 c 2015-11-25 B 3 c NA NA
This question manages the result for a single column, but I have an arbitrary number of columns, and I want to lag all of them. I can use groupby
and apply
, but apply
runs the shift
function over each column independently, and it doesn’t seem to like receiving an [nrow, 2]
shaped dataframe in return. Is there perhaps a function like apply
that acts on the whole group sub-frame? Or is there a better way to do this?
Advertisement
Answer
IIUC, you can simply use level="grp"
and then shift by -1:
>>> shifted = df.groupby(level="grp").shift(-1) >>> df.join(shifted.rename(columns=lambda x: x+"_lag")) col1 col2 col1_lag col2_lag time grp 2015-11-20 A 1 a 2 b 2015-11-21 A 2 b 3 c 2015-11-22 A 3 c NaN NaN 2015-11-23 B 1 a 2 b 2015-11-24 B 2 b 3 c 2015-11-25 B 3 c NaN NaN