How do I select the first item in a column after grouping by another column in pandas?

Question

I have a data frame that is grouped by name / name_ID. Each name can have n scores, e.g. A has 2 scores, whereas B has 3 scores. I want an additional column, reference_score, that holds the first score per name / name_ID. For the first score of each name, reference_score should be NaN.

Accepted Answer

Let's use groupby.transform to get the first row's value for each group, then mask the first row of each group as NaN with the condition ~df.duplicated('name', keep='first').

    # sort the dataframe first if score number is not ascending
    # df = df.sort_values(['name_ID', 'score_number'])
    df['reference_score'] = (df.groupby('name')['score']
                             .transform('first')
                             .mask(~df.duplicated('name', keep='first')))
    print(df)

      name  name_ID  score  score_number  reference_score
    0    A        1    400             1              NaN
    1    A        1    500             2            400.0
    2    B        2   3000             1              NaN
    3    B        2   1000             2           3000.0
    4    B        2   4000             3           3000.0
    5    C        3    600             1              NaN
    6    C        3    750             2            600.0

Or we can compare score_number with 1 to identify the first row in each group.

    df['reference_score'] = (df.groupby('name')['score']
                             .transform('first')
                             .mask(df['score_number'].eq(1)))
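For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample data is rebuilt from the printed output in the answer, and the intermediate names (first_score, is_first_row, by_position, by_number) are illustrative only, not part of the original answer. It assumes the rows are already ordered by score_number within each name.

```python
import pandas as pd

# Sample data rebuilt from the printed output in the answer (an assumption,
# since the question's original frame is not shown).
df = pd.DataFrame({
    "name": ["A", "A", "B", "B", "B", "C", "C"],
    "name_ID": [1, 1, 2, 2, 2, 3, 3],
    "score": [400, 500, 3000, 1000, 4000, 600, 750],
    "score_number": [1, 2, 1, 2, 3, 1, 2],
})

# Broadcast each group's first score to every row of that group.
first_score = df.groupby("name")["score"].transform("first")

# Variant 1: a row is the first of its group if its name has not appeared before.
is_first_row = ~df.duplicated("name", keep="first")
by_position = first_score.mask(is_first_row)

# Variant 2: a row is the first of its group if score_number equals 1.
by_number = first_score.mask(df["score_number"].eq(1))

df["reference_score"] = by_position
print(df)
```

On this data both variants give the same column. They can differ if the frame is not sorted or if score_number does not start at 1 for some name, so pick the condition that matches how "first" is defined in your data.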
