df=(pd.DataFrame({'user_id':[1,1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3,3,4,4,4,4,4,4],'survey_id':[1,1,1,1,2,2,3,4,4,4,5,5,6,6,7,8,8,9,9,9,9,10,10,11,11,12,12],
'answer':["no","yes","no","no","yes","no","no","yes","no","yes","no","no","yes","no","no","no","yes","yes","yes","no","no","no","yes","no","yes","no","yes"]}))
df
user_id survey_id answer
0 1 1 no
1 1 1 yes
2 1 1 no
3 1 1 no
4 1 2 yes
5 1 2 no
6 1 3 no
7 2 4 yes
8 2 4 no
9 2 4 yes
10 2 5 no
11 2 5 no
12 2 6 yes
13 2 6 no
14 3 7 no
15 3 8 no
16 3 8 yes
17 3 9 yes
18 3 9 yes
19 3 9 no
20 3 9 no
21 4 10 no
22 4 10 yes
23 4 11 no
24 4 11 yes
25 4 12 no
26 4 12 yes
I want to group by user_id, then get the first element of survey_id, and get all elements related to this selection
df_head=
user_id survey_id answer
0 1 1 no
1 1 1 yes
2 1 1 no
3 1 1 no
4 2 4 yes
5 2 4 no
6 2 4 yes
7 3 7 no
8 4 10 no
9 4 10 yes
In the same way I want to group by user_id, then get the last element of survey_id, and get all elements related to this selection
df_tail=
user_id survey_id answer
0 1 3 no
1 2 6 yes
2 2 6 no
3 3 9 yes
4 3 9 yes
5 3 9 no
6 3 9 no
7 4 12 no
8 4 12 yes
Is there a quick groupby command to get this? I can do this by merging dataframes but I think there is some better way to do this in less command lines. Thank you in advance
Advertisement
Answer
Solution with no merging:
df_head = df[df.survey_id.eq(df.groupby('user_id').transform('min').survey_id)]
result:
user_id survey_id answer 0 1 1 no 1 1 1 yes 2 1 1 no 3 1 1 no 7 2 4 yes 8 2 4 no 9 2 4 yes 14 3 7 no 21 4 10 no 22 4 10 yes
df_tail = df[df.survey_id.eq(df.groupby('user_id').transform('max').survey_id)]
result:
user_id survey_id answer 6 1 3 no 12 2 6 yes 13 2 6 no 17 3 9 yes 18 3 9 yes 19 3 9 no 20 3 9 no 25 4 12 no 26 4 12 yes
Idea is to calculate min / max of survey_id per user_id and compare it to survey_id at row level of df. Please note that original index of dataframe is preserved. If You need new index just add at the end:
df_head = df_head.reset_index(drop = True) df_tail = df_tail.reset_index(drop = True)