I feel like there is a better way than this:
import pandas as pd df = pd.DataFrame( columns=" index c1 c2 v1 ".split(), data= [ [ 0, "A", "X", 3, ], [ 1, "A", "X", 5, ], [ 2, "A", "Y", 7, ], [ 3, "A", "Y", 1, ], [ 4, "B", "X", 3, ], [ 5, "B", "X", 1, ], [ 6, "B", "X", 3, ], [ 7, "B", "Y", 1, ], [ 8, "C", "X", 7, ], [ 9, "C", "Y", 4, ], [ 10, "C", "Y", 1, ], [ 11, "C", "Y", 6, ],]).set_index("index", drop=True) def callback(x): x['seq'] = range(1, x.shape[0] + 1) return x df = df.groupby(['c1', 'c2']).apply(callback) print df
To achieve this:
c1 c2 v1 seq 0 A X 3 1 1 A X 5 2 2 A Y 7 1 3 A Y 1 2 4 B X 3 1 5 B X 1 2 6 B X 3 3 7 B Y 1 1 8 C X 7 1 9 C Y 4 1 10 C Y 1 2 11 C Y 6 3
Is there a way to do it that avoids the callback?
Advertisement
Answer
use cumcount()
, see docs here
In [4]: df.groupby(['c1', 'c2']).cumcount() Out[4]: 0 0 1 1 2 0 3 1 4 0 5 1 6 2 7 0 8 0 9 0 10 1 11 2 dtype: int64
If you want orderings starting at 1
In [5]: df.groupby(['c1', 'c2']).cumcount()+1 Out[5]: 0 1 1 2 2 1 3 2 4 1 5 2 6 3 7 1 8 1 9 1 10 2 11 3 dtype: int64