JavaScript
x
33
33
1
df=(pd.DataFrame({'user_id':[1,1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3,3,4,4,4,4,4,4],'survey_id':[1,1,1,1,2,2,3,4,4,4,5,5,6,6,7,8,8,9,9,9,9,10,10,11,11,12,12],
2
'answer':["no","yes","no","no","yes","no","no","yes","no","yes","no","no","yes","no","no","no","yes","yes","yes","no","no","no","yes","no","yes","no","yes"]}))
3
df
4
5
user_id survey_id answer
6
0 1 1 no
7
1 1 1 yes
8
2 1 1 no
9
3 1 1 no
10
4 1 2 yes
11
5 1 2 no
12
6 1 3 no
13
7 2 4 yes
14
8 2 4 no
15
9 2 4 yes
16
10 2 5 no
17
11 2 5 no
18
12 2 6 yes
19
13 2 6 no
20
14 3 7 no
21
15 3 8 no
22
16 3 8 yes
23
17 3 9 yes
24
18 3 9 yes
25
19 3 9 no
26
20 3 9 no
27
21 4 10 no
28
22 4 10 yes
29
23 4 11 no
30
24 4 11 yes
31
25 4 12 no
32
26 4 12 yes
33
I want to group by user_id
, then get the first element of survey_id
, and get all elements related to this selection
JavaScript
1
13
13
1
df_head=
2
user_id survey_id answer
3
0 1 1 no
4
1 1 1 yes
5
2 1 1 no
6
3 1 1 no
7
4 2 4 yes
8
5 2 4 no
9
6 2 4 yes
10
7 3 7 no
11
8 4 10 no
12
9 4 10 yes
13
In the same way I want to group by user_id
, then get the last element of survey_id
, and get all elements related to this selection
JavaScript
1
12
12
1
df_tail=
2
user_id survey_id answer
3
0 1 3 no
4
1 2 6 yes
5
2 2 6 no
6
3 3 9 yes
7
4 3 9 yes
8
5 3 9 no
9
6 3 9 no
10
7 4 12 no
11
8 4 12 yes
12
Is there a quick groupby command to get this? I can do this by merging dataframes but I think there is some better way to do this in less command lines. Thank you in advance
Advertisement
Answer
Solution with no merging:
JavaScript
1
2
1
df_head = df[df.survey_id.eq(df.groupby('user_id').transform('min').survey_id)]
2
result:
JavaScript
1
12
12
1
user_id survey_id answer
2
0 1 1 no
3
1 1 1 yes
4
2 1 1 no
5
3 1 1 no
6
7 2 4 yes
7
8 2 4 no
8
9 2 4 yes
9
14 3 7 no
10
21 4 10 no
11
22 4 10 yes
12
JavaScript
1
2
1
df_tail = df[df.survey_id.eq(df.groupby('user_id').transform('max').survey_id)]
2
result:
JavaScript
1
11
11
1
user_id survey_id answer
2
6 1 3 no
3
12 2 6 yes
4
13 2 6 no
5
17 3 9 yes
6
18 3 9 yes
7
19 3 9 no
8
20 3 9 no
9
25 4 12 no
10
26 4 12 yes
11
Idea is to calculate min / max of survey_id
per user_id
and compare it to survey_id
at row level of df
. Please note that original index of dataframe is preserved. If You need new index just add at the end:
JavaScript
1
3
1
df_head = df_head.reset_index(drop = True)
2
df_tail = df_tail.reset_index(drop = True)
3