I have the following data frame:
JavaScript
x
8
1
df = {
2
'name': ['A', 'A', 'B', 'B', 'B', 'C', 'C'],
3
'name_ID' : [1, 1, 2, 2, 2, 3, 3],
4
'score' : [400, 500, 3000, 1000, 4000, 600, 750],
5
'score_number' : [1, 2, 1, 2, 3, 1, 2]
6
}
7
df = pd.DataFrame(df)
8
Note that the df is grouped by name
/ name_ID
. names
can have n scores, e.g. A
has 2 scores, whereas B
has 3 scores. I want an additional column, that indicates the first score per name
/ name_ID
. The reference_score
for the first scores of a name should be NaN. Like this:
I have tried:
df_v2['first_fund'] =df_v2['fund_size'].groupby(df_v2['firm_ID']).first()
,
also with .nth
but it didn’t work.
Thanks in advance.
Advertisement
Answer
Let’s use groupby.transform
to get first row value then mask
the first row as NaN
with condition ~df.duplicated('name', keep='first')
.
JavaScript
1
6
1
# sort the dataframe first if score number is not ascending
2
# df = df.sort_values(['name_ID', 'score_number'])
3
df['reference_score'] = (df.groupby('name')['score']
4
.transform('first')
5
.mask(~df.duplicated('name', keep='first')))
6
JavaScript
1
11
11
1
print(df)
2
3
name name_ID score score_number reference_score
4
0 A 1 400 1 NaN
5
1 A 1 500 2 400.0
6
2 B 2 3000 1 NaN
7
3 B 2 1000 2 3000.0
8
4 B 2 4000 3 3000.0
9
5 C 3 600 1 NaN
10
6 C 3 750 2 600.0
11
Or we can compare score_number
with 1
to define the first row in each group.
JavaScript
1
4
1
df['reference_score'] = (df.groupby('name')['score']
2
.transform('first')
3
.mask(df['score_number'].eq(1))
4