I want to keep, in each row, only the name with the largest value according to a dictionary that maps names to values. Any suggestions on how to do this effectively?
import pandas as pd

fruit_dict = {
    "Apple": 10,
    "Watermelon": 20,
    "Cherry": 30,
}
df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "name": [
            "Apple, Watermelon",
            "Cherry, Watermelon",
            "Apple",
            "Cherry, Apple",
            "Cherry",
        ],
    }
)
   ID                name
0   1   Apple, Watermelon
1   2  Cherry, Watermelon
2   3               Apple
3   4       Cherry, Apple
4   5              Cherry
Expected output:
   ID        name
0   1  Watermelon
1   2      Cherry
2   3       Apple
3   4      Cherry
4   5      Cherry
Answer
One way is to use apply with max, passing fruit_dict.get as the key:
new_df = (df.assign(name=df['name'].str.split(', ')
                    .apply(lambda l: max(l, key=fruit_dict.get)))
          )
or, if you expect some names to be missing from the dictionary:
new_df = (df.assign(name=df['name'].str.split(', ')
                    .apply(lambda l: max(l, key=lambda x: fruit_dict.get(x, float('-inf')))))
          )
Output:
   ID        name
0   1  Watermelon
1   2      Cherry
2   3       Apple
3   4      Cherry
4   5      Cherry
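To see why the fallback matters, here is a minimal plain-Python sketch of what happens for a single cell. The helper name pick_best and the extra fruit "Mango" are hypothetical, made up for illustration; they are not part of the question's data.

```python
fruit_dict = {"Apple": 10, "Watermelon": 20, "Cherry": 30}


def pick_best(cell, scores):
    """Hypothetical helper: return the highest-scoring name in a ', '-separated cell."""
    names = cell.split(", ")
    # Names missing from `scores` get -inf, so they rank below every known name.
    return max(names, key=lambda name: scores.get(name, float("-inf")))


print(pick_best("Apple, Watermelon", fruit_dict))  # Watermelon
print(pick_best("Cherry, Mango", fruit_dict))      # Cherry ("Mango" is not in the dict)

# Without the fallback, an unknown name makes the key None, and comparing
# None with an int raises TypeError in Python 3.
try:
    max(["Apple", "Mango"], key=fruit_dict.get)
except TypeError as exc:
    print("TypeError:", exc)
```

If you prefer a named function over the lambdas, something like df['name'].apply(pick_best, scores=fruit_dict) should give the same column as the second version above.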