I feel like there is a better way than this:
    import pandas as pd

    df = pd.DataFrame(
        columns="index c1 c2 v1".split(),
        data=[
            [0, "A", "X", 3],
            [1, "A", "X", 5],
            [2, "A", "Y", 7],
            [3, "A", "Y", 1],
            [4, "B", "X", 3],
            [5, "B", "X", 1],
            [6, "B", "X", 3],
            [7, "B", "Y", 1],
            [8, "C", "X", 7],
            [9, "C", "Y", 4],
            [10, "C", "Y", 1],
            [11, "C", "Y", 6],
        ],
    ).set_index("index", drop=True)

    def callback(x):
        # Number the rows within each group, starting at 1.
        x["seq"] = range(1, x.shape[0] + 1)
        return x

    df = df.groupby(["c1", "c2"]).apply(callback)
    print(df)
To achieve this:
       c1 c2  v1  seq
    0   A  X   3    1
    1   A  X   5    2
    2   A  Y   7    1
    3   A  Y   1    2
    4   B  X   3    1
    5   B  X   1    2
    6   B  X   3    3
    7   B  Y   1    1
    8   C  X   7    1
    9   C  Y   4    1
    10  C  Y   1    2
    11  C  Y   6    3
Is there a way to do it that avoids the callback?
Answer
Use cumcount(); see the pandas documentation for GroupBy.cumcount. It numbers the rows within each group, starting at 0:
    In [4]: df.groupby(['c1', 'c2']).cumcount()
    Out[4]:
    0     0
    1     1
    2     0
    3     1
    4     0
    5     1
    6     2
    7     0
    8     0
    9     0
    10    1
    11    2
    dtype: int64
If you want the numbering to start at 1, add 1 to the result:
    In [5]: df.groupby(['c1', 'c2']).cumcount() + 1
    Out[5]:
    0     1
    1     2
    2     1
    3     2
    4     1
    5     2
    6     3
    7     1
    8     1
    9     1
    10    2
    11    3
    dtype: int64
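To get the seq column from the question, the result can be assigned back to the frame. This is a minimal sketch, assuming the same values as the question; the frame is rebuilt here from a dict with the default integer index purely for brevity.

```python
import pandas as pd

# Same c1/c2/v1 values as in the question, using the default 0..11 index.
df = pd.DataFrame(
    {
        "c1": list("AAAABBBBCCCC"),
        "c2": list("XXYYXXXYXYYY"),
        "v1": [3, 5, 7, 1, 3, 1, 3, 1, 7, 4, 1, 6],
    }
)

# cumcount() numbers rows within each (c1, c2) group from 0,
# so adding 1 gives a 1-based sequence without any callback.
df["seq"] = df.groupby(["c1", "c2"]).cumcount() + 1

print(df)
```

The result is aligned on the original index, so the assignment works without reordering or resetting anything.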