
How do I select the first item in a column after grouping for another column in pandas?

I have the following data frame:


Note that the df is grouped by name / name_ID. A name can have any number of scores, e.g. A has 2 scores, whereas B has 3 scores. I want an additional column, reference_score, that holds the first score per name / name_ID. For the first score of each name, reference_score should be NaN.

I have tried df_v2['first_fund'] = df_v2['fund_size'].groupby(df_v2['firm_ID']).first(), and also .nth, but neither worked.

Thanks in advance.


Answer

Use groupby.transform to broadcast each group's first value to every row of that group, then mask the first row of each group as NaN with the condition ~df.duplicated('name', keep='first').
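The answer's original code is not shown here, so the following is only a minimal sketch of the approach described above. The sample data, the column name score, and the helper names first_score and is_first_row are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data: the question's original frame is not shown,
# so these column names and values are assumptions.
df = pd.DataFrame({
    "name": ["A", "A", "B", "B", "B"],
    "name_ID": [1, 1, 2, 2, 2],
    "score_number": [1, 2, 1, 2, 3],
    "score": [10.0, 12.0, 7.0, 9.0, 8.0],
})

# Broadcast each group's first score to every row of that group.
first_score = df.groupby("name")["score"].transform("first")

# True only on the first occurrence of each name.
is_first_row = ~df.duplicated("name", keep="first")

# Replace the value on the first row of each group with NaN.
df["reference_score"] = first_score.mask(is_first_row)

print(df)
```

Because transform returns a result aligned with the original index, it can be assigned directly as a new column, which the plain .first() aggregation in the question cannot. Note that "first" in a groupby picks the first non-null value per group, which matters only if score itself contains NaN.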


Alternatively, compare score_number with 1 to identify the first row in each group.
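A minimal sketch of this variant follows, again with hypothetical sample data and an assumed score column. It relies on score_number restarting at 1 for every name, which is an assumption about the original data.

```python
import pandas as pd

# Hypothetical sample data; column names and values are assumptions.
df = pd.DataFrame({
    "name": ["A", "A", "B", "B", "B"],
    "name_ID": [1, 1, 2, 2, 2],
    "score_number": [1, 2, 1, 2, 3],
    "score": [10.0, 12.0, 7.0, 9.0, 8.0],
})

# Broadcast each group's first score to every row of that group.
first_score = df.groupby("name")["score"].transform("first")

# Rows where score_number == 1 are treated as the first row of their group
# and are masked to NaN.
df["reference_score"] = first_score.mask(df["score_number"].eq(1))

print(df)
```

This avoids the duplicated call, but it gives wrong results if score_number has gaps or does not start at 1 within a group.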

User contributions licensed under: CC BY-SA