I have a dataframe that contains multiple columns as follow:
df = pd.DataFrame() df ['Player'] = ['A','A','A','A','A','B','B','B','B','B',] df ['Competition'] = ['x','x','y','y','y','x','y','z','y','y'] df ['Home'] = ['AB','EF','GH','AB','CF','EF','BD','BD','FG','CH'] df ['Away'] = ['CD','AB','AB','CF','AB','BD','BD','HF','BD','BD']
I want to create a new column based on the player, competition and value of highest occurrence in Home column and Away column. Let’s say the name of a new column that I want to create is Team. I would like have a new column as follow:
So it supposes to assign a team for a each player for each competition. How can I do it?
Advertisement
Answer
Use custom function with GroupBy.apply
with DataFrame.stack
, Series.mode
and first value by Series.iat
:
def f(x): x['Team'] = x[['Home','Away']].stack().mode().iat[0] return x
Another similar idea with Series.append
:
def f(x): x['Team'] = x['Home'].append(x['Away']).mode().iat[0] return x
df = df.groupby(['Player','Competition']).apply(f) print (df) Player Competition Home Away Team 0 A x AB CD AB 1 A x EF AB AB 2 A y GH AB AB 3 A y AB CF AB 4 A y CF AB AB 5 B x EF BD BD 6 B y BD BD BD 7 B z BD HF BD 8 B y FG BD BD 9 B y CH BD BD