Each row of a DataFrame column holds a comma-separated string of names. For each row I want to keep only the name with the largest value in a dictionary that maps names to values. Any suggestions on how to do this effectively?
import pandas as pd

fruit_dict = {
  "Apple": 10,
  "Watermelon": 20,
  "Cherry": 30
}
df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "name": [
            "Apple, Watermelon",
            "Cherry, Watermelon",
            "Apple",
            "Cherry, Apple",
            "Cherry",
        ],
    }
)
   ID                name
0   1   Apple, Watermelon
1   2  Cherry, Watermelon
2   3               Apple
3   4       Cherry, Apple
4   5              Cherry
Expected output:
   ID        name
0   1  Watermelon
1   2      Cherry
2   3       Apple
3   4      Cherry
4   5      Cherry
Answer
One way is to use apply with max, passing fruit_dict.get as the key:
new_df = (df.assign(name=df['name'].str.split(', ')
            .apply(lambda l: max(l, key=fruit_dict.get)))
          )
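or, if you expect some names to be missing from the dictionary (fruit_dict.get returns None for them, which max cannot compare with numbers), give them a default of negative infinity: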
new_df = (df.assign(name=df['name'].str.split(', ')
            .apply(lambda l: max(l, key=lambda x: fruit_dict.get(x, float('-inf')))))
          )
output:
   ID        name
0   1  Watermelon
1   2      Cherry
2   3       Apple
3   4      Cherry
4   5      Cherry
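If the frame is large, an alternative that avoids a Python-level lambda per row is to explode the split lists and let pandas pick the best name per original row. This is only a sketch, not part of the answer above. The intermediate names (exploded, score, best) are made up for illustration, and it has not been benchmarked against the apply version.

```python
import pandas as pd

fruit_dict = {"Apple": 10, "Watermelon": 20, "Cherry": 30}
df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "name": [
            "Apple, Watermelon",
            "Cherry, Watermelon",
            "Apple",
            "Cherry, Apple",
            "Cherry",
        ],
    }
)

# One row per (original row, name); reset_index() keeps the original
# row label in a column called "index".
exploded = df["name"].str.split(", ").explode().reset_index()

# Look up each name's value; names missing from the dict become NaN.
exploded["score"] = exploded["name"].map(fruit_dict)

# Sort by score, highest first (NaN goes last by default), then keep
# the first remaining entry for each original row.
best = (
    exploded.sort_values("score", ascending=False, kind="stable")
    .drop_duplicates("index")
    .set_index("index")["name"]
)

# assign() aligns `best` with df on the index labels.
new_df = df.assign(name=best)
print(new_df)
```

On the sample data this produces the same name column as the apply version. A row whose names are all missing from the dictionary keeps one of its original names instead of raising an error.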
