I feel like there is a better way than this:
import pandas as pd
df = pd.DataFrame(
columns=" index c1 c2 v1 ".split(),
data= [
[ 0, "A", "X", 3, ],
[ 1, "A", "X", 5, ],
[ 2, "A", "Y", 7, ],
[ 3, "A", "Y", 1, ],
[ 4, "B", "X", 3, ],
[ 5, "B", "X", 1, ],
[ 6, "B", "X", 3, ],
[ 7, "B", "Y", 1, ],
[ 8, "C", "X", 7, ],
[ 9, "C", "Y", 4, ],
[ 10, "C", "Y", 1, ],
[ 11, "C", "Y", 6, ],]).set_index("index", drop=True)
def callback(x):
x['seq'] = range(1, x.shape[0] + 1)
return x
df = df.groupby(['c1', 'c2']).apply(callback)
print df
To achieve this:
c1 c2 v1 seq 0 A X 3 1 1 A X 5 2 2 A Y 7 1 3 A Y 1 2 4 B X 3 1 5 B X 1 2 6 B X 3 3 7 B Y 1 1 8 C X 7 1 9 C Y 4 1 10 C Y 1 2 11 C Y 6 3
Is there a way to do it that avoids the callback?
Advertisement
Answer
use cumcount(), see docs here
In [4]: df.groupby(['c1', 'c2']).cumcount() Out[4]: 0 0 1 1 2 0 3 1 4 0 5 1 6 2 7 0 8 0 9 0 10 1 11 2 dtype: int64
If you want orderings starting at 1
In [5]: df.groupby(['c1', 'c2']).cumcount()+1 Out[5]: 0 1 1 2 2 1 3 2 4 1 5 2 6 3 7 1 8 1 9 1 10 2 11 3 dtype: int64