df=(pd.DataFrame({'user_id':[1,1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3,3,4,4,4,4,4,4],'survey_id':[1,1,1,1,2,2,3,4,4,4,5,5,6,6,7,8,8,9,9,9,9,10,10,11,11,12,12], 'answer':["no","yes","no","no","yes","no","no","yes","no","yes","no","no","yes","no","no","no","yes","yes","yes","no","no","no","yes","no","yes","no","yes"]})) df user_id survey_id answer 0 1 1 no 1 1 1 yes 2 1 1 no 3 1 1 no 4 1 2 yes 5 1 2 no 6 1 3 no 7 2 4 yes 8 2 4 no 9 2 4 yes 10 2 5 no 11 2 5 no 12 2 6 yes 13 2 6 no 14 3 7 no 15 3 8 no 16 3 8 yes 17 3 9 yes 18 3 9 yes 19 3 9 no 20 3 9 no 21 4 10 no 22 4 10 yes 23 4 11 no 24 4 11 yes 25 4 12 no 26 4 12 yes
I want to group by user_id
, then get the first element of survey_id
, and get all elements related to this selection
df_head= user_id survey_id answer 0 1 1 no 1 1 1 yes 2 1 1 no 3 1 1 no 4 2 4 yes 5 2 4 no 6 2 4 yes 7 3 7 no 8 4 10 no 9 4 10 yes
In the same way I want to group by user_id
, then get the last element of survey_id
, and get all elements related to this selection
df_tail= user_id survey_id answer 0 1 3 no 1 2 6 yes 2 2 6 no 3 3 9 yes 4 3 9 yes 5 3 9 no 6 3 9 no 7 4 12 no 8 4 12 yes
Is there a quick groupby command to get this? I can do this by merging dataframes but I think there is some better way to do this in less command lines. Thank you in advance
Advertisement
Answer
Solution with no merging:
df_head = df[df.survey_id.eq(df.groupby('user_id').transform('min').survey_id)]
result:
user_id survey_id answer 0 1 1 no 1 1 1 yes 2 1 1 no 3 1 1 no 7 2 4 yes 8 2 4 no 9 2 4 yes 14 3 7 no 21 4 10 no 22 4 10 yes
df_tail = df[df.survey_id.eq(df.groupby('user_id').transform('max').survey_id)]
result:
user_id survey_id answer 6 1 3 no 12 2 6 yes 13 2 6 no 17 3 9 yes 18 3 9 yes 19 3 9 no 20 3 9 no 25 4 12 no 26 4 12 yes
Idea is to calculate min / max of survey_id
per user_id
and compare it to survey_id
at row level of df
. Please note that original index of dataframe is preserved. If You need new index just add at the end:
df_head = df_head.reset_index(drop = True) df_tail = df_tail.reset_index(drop = True)