I have a DataFrame that contains multiple columns, as follows:
```python
import pandas as pd

df = pd.DataFrame()
df['Player'] = ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B']
df['Competition'] = ['x', 'x', 'y', 'y', 'y', 'x', 'y', 'z', 'y', 'y']
df['Home'] = ['AB', 'EF', 'GH', 'AB', 'CF', 'EF', 'BD', 'BD', 'FG', 'CH']
df['Away'] = ['CD', 'AB', 'AB', 'CF', 'AB', 'BD', 'BD', 'HF', 'BD', 'BD']
```
I want to create a new column based on the player, the competition, and the value that occurs most often across the Home and Away columns. Let's call the new column Team. I would like the new column to look as follows:
So it is supposed to assign a team to each player for each competition. How can I do it?
Answer
Use a custom function with `GroupBy.apply`: inside it, combine both columns with `DataFrame.stack`, get the most frequent values with `Series.mode`, and take the first one with `Series.iat`:
```python
def f(x):
    # Stack Home and Away into one Series, take its mode(s) and keep the first
    x['Team'] = x[['Home', 'Away']].stack().mode().iat[0]
    return x
```
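When several teams are tied for the most occurrences, `Series.mode` returns all of them in sorted order, so `.iat[0]` picks the alphabetically first one. A minimal sketch on the rows of the (`B`, `x`) group from the question, where `EF` and `BD` each appear once:

```python
import pandas as pd

# The single row of the (Player='B', Competition='x') group from the question
group = pd.DataFrame({'Home': ['EF'], 'Away': ['BD']})

stacked = group[['Home', 'Away']].stack()  # one Series holding both columns' values
modes = stacked.mode()                     # both values occur once, so both are modes
print(modes.tolist())  # ['BD', 'EF'] (mode returns its values sorted)
print(modes.iat[0])    # BD
```

This is why player `B` gets `BD` for competition `x` in the output below, even though the group is a tie.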
Another similar idea joins the two columns with `pd.concat` (`Series.append`, which older answers use for this, was removed in pandas 2.0):
```python
import pandas as pd

def f(x):
    # Concatenate Home and Away into one Series, take its mode(s) and keep the first
    x['Team'] = pd.concat([x['Home'], x['Away']]).mode().iat[0]
    return x
```
```python
# group_keys=False keeps the original row index; in pandas >= 2.0 apply
# would otherwise prepend Player and Competition to the index
df = df.groupby(['Player', 'Competition'], group_keys=False).apply(f)
print(df)
```

```
  Player Competition Home Away Team
0      A           x   AB   CD   AB
1      A           x   EF   AB   AB
2      A           y   GH   AB   AB
3      A           y   AB   CF   AB
4      A           y   CF   AB   AB
5      B           x   EF   BD   BD
6      B           y   BD   BD   BD
7      B           z   BD   HF   BD
8      B           y   FG   BD   BD
9      B           y   CH   BD   BD
```
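If you would rather avoid `GroupBy.apply` (recent pandas versions warn when `apply` operates on the grouping columns), here is a sketch of a vectorized alternative. It is not part of the original answer: it reshapes Home/Away into one long column with `melt`, counts each team per (Player, Competition), and merges the winner back. Ties are broken alphabetically to mimic `mode().iat[0]`; the intermediate names `long`, `counts` and `top` are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Player': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
    'Competition': ['x', 'x', 'y', 'y', 'y', 'x', 'y', 'z', 'y', 'y'],
    'Home': ['AB', 'EF', 'GH', 'AB', 'CF', 'EF', 'BD', 'BD', 'FG', 'CH'],
    'Away': ['CD', 'AB', 'AB', 'CF', 'AB', 'BD', 'BD', 'HF', 'BD', 'BD'],
})

# One row per (Player, Competition, team appearance), from both Home and Away
long = df.melt(id_vars=['Player', 'Competition'],
               value_vars=['Home', 'Away'], value_name='Team')

# Count how often each team appears within each (Player, Competition) group
counts = (long.groupby(['Player', 'Competition', 'Team'])
              .size()
              .reset_index(name='n'))

# Highest count first; ties resolved alphabetically, like mode().iat[0]
counts = counts.sort_values(['Player', 'Competition', 'n', 'Team'],
                            ascending=[True, True, False, True])
top = counts.drop_duplicates(['Player', 'Competition'])[['Player', 'Competition', 'Team']]

# A left merge keeps the original row order of df
result = df.merge(top, on=['Player', 'Competition'], how='left')
print(result)
```

For the question's data this produces the same Team column as the `apply` version.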