I have a DataFrame that looks like this:
0 1 2 3
0  {'Emotion': 'female_angry', 'Score': '90.0%'}  {'Emotion': 'female_disgust', 'Score': '0.0%'}   {'Emotion': 'female_fear', 'Score': '0.0%'}
1  {'Emotion': 'female_angry', 'Score': '0.0%'}   {'Emotion': 'female_disgust', 'Score': '0.0%'}   {'Emotion': 'female_fear', 'Score': '80.0%'}
2  {'Emotion': 'female_angry', 'Score': '0.1%'}   {'Emotion': 'female_disgust', 'Score': '99.0%'}  {'Emotion': 'female_fear', 'Score': '4.6%'}
I want to create a separate column holding the emotion with the highest score in each row, like so:
   Emotion
0  'female_angry'
1  'female_fear'
2  'female_disgust'
I've gone through many references, but I can't relate them to my problem. Any suggestions?
Answer
You can use DataFrame.apply with axis=1 to iterate over each row:
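# The scores are strings such as '90.0%', so strip the '%' and convert to float
# before comparing; comparing the raw strings would rank '90.0%' above '100.0%'.
df_new = df.apply(
    lambda row: max(
        [tuple(dct.values()) for dct in row],
        key=lambda x: float(x[1].rstrip('%')),
    )[0],
    axis=1,
).to_frame(name='Emotion')
print(df_new)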
Output:
          Emotion
0    female_angry
1     female_fear
2  female_disgust
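For anyone who wants to try this end to end, here is a minimal self-contained sketch. It assumes the DataFrame has exactly three dict-valued columns as in the rows shown in the question; the `rows` list and the helper name `top_emotion` are illustrative and not part of the original post. It reads the `'Emotion'` and `'Score'` keys directly instead of relying on the order of `dct.values()`.

```python
import pandas as pd

# Assumed reconstruction of the question's data: three rows, three dict-valued columns.
rows = [
    [("female_angry", "90.0%"), ("female_disgust", "0.0%"), ("female_fear", "0.0%")],
    [("female_angry", "0.0%"), ("female_disgust", "0.0%"), ("female_fear", "80.0%")],
    [("female_angry", "0.1%"), ("female_disgust", "99.0%"), ("female_fear", "4.6%")],
]
df = pd.DataFrame([[{"Emotion": e, "Score": s} for e, s in row] for row in rows])


def top_emotion(row):
    """Return the 'Emotion' of the dict with the highest numeric 'Score' in one row."""
    # Hypothetical helper: strips the trailing '%' and compares scores as floats.
    best = max(row, key=lambda dct: float(dct["Score"].rstrip("%")))
    return best["Emotion"]


df_new = df.apply(top_emotion, axis=1).to_frame(name="Emotion")
print(df_new)
```

A named function is used here only for readability; it does the same per-row work as a lambda passed to `apply`.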
Explanation:
>>> df.apply(lambda row: [tuple(dct.values()) for dct in row], axis=1)
# [('female_angry', '90.0%'), ('female_disgust', '0.0%'), ('female_fear', '0.0%')]
# [('female_angry', '0.0%'), ('female_disgust', '0.0%'), ('female_fear', '80.0%')]
# [('female_angry', '0.1%'), ('female_disgust', '99.0%'), ('female_fear', '4.6%')]

>>> # Compare the scores as numbers, not as strings
>>> max([('female_angry', '90.0%'), ('female_disgust', '0.0%'), ('female_fear', '0.0%')], key=lambda x: float(x[1].rstrip('%')))
# ('female_angry', '90.0%')

>>> ('female_angry', '90.0%')[0]
# 'female_angry'
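One caveat about the `key` used with `max`: the scores are stored as strings, and Python compares strings character by character rather than by numeric value. The short sketch below uses made-up score values (not taken from the question) to show how the two comparisons can disagree.

```python
# Illustrative score strings; these values are assumptions, not data from the question.
scores = ["90.0%", "100.0%", "4.6%"]

# Plain string comparison is lexicographic: '9' sorts after '1' and '4'.
lexicographic_max = max(scores)

# Stripping the '%' and converting to float compares the actual magnitudes.
numeric_max = max(scores, key=lambda s: float(s.rstrip("%")))

print(lexicographic_max)  # 90.0%
print(numeric_max)        # 100.0%
```

With the sample data in the question both approaches happen to pick the same rows, but converting to float is the safer choice as soon as a score such as '100.0%' can appear.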