Skip to content
Advertisement

Creating a new columns with maximum count of value in multiple columns

I have a dataframe that contains multiple columns as follow:

df = pd.DataFrame()
df ['Player'] = ['A','A','A','A','A','B','B','B','B','B',]
df ['Competition'] = ['x','x','y','y','y','x','y','z','y','y']
df ['Home'] = ['AB','EF','GH','AB','CF','EF','BD','BD','FG','CH']
df ['Away'] = ['CD','AB','AB','CF','AB','BD','BD','HF','BD','BD']

I want to create a new column based on the player, competition and value of highest occurrence in Home column and Away column. Let’s say the name of a new column that I want to create is Team. I would like have a new column as follow:

enter image description here

So it supposes to assign a team for a each player for each competition. How can I do it?

Advertisement

Answer

Use custom function with GroupBy.apply with DataFrame.stack, Series.mode and first value by Series.iat:

def f(x):
    x['Team'] = x[['Home','Away']].stack().mode().iat[0]
    return x

Another similar idea with Series.append:

def f(x):
    x['Team'] = x['Home'].append(x['Away']).mode().iat[0]
    return x

df = df.groupby(['Player','Competition']).apply(f)
print (df)
  Player Competition Home Away Team
0      A           x   AB   CD   AB
1      A           x   EF   AB   AB
2      A           y   GH   AB   AB
3      A           y   AB   CF   AB
4      A           y   CF   AB   AB
5      B           x   EF   BD   BD
6      B           y   BD   BD   BD
7      B           z   BD   HF   BD
8      B           y   FG   BD   BD
9      B           y   CH   BD   BD
User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement