For each row, I want to keep only the name that has the largest value in a dictionary mapping names to values. Any suggestions on how to do this effectively?
import pandas as pd

fruit_dict = {"Apple": 10, "Watermelon": 20, "Cherry": 30}

df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "name": [
            "Apple, Watermelon",
            "Cherry, Watermelon",
            "Apple",
            "Cherry, Apple",
            "Cherry",
        ],
    }
)

   ID                name
0   1   Apple, Watermelon
1   2  Cherry, Watermelon
2   3               Apple
3   4       Cherry, Apple
4   5              Cherry
Expected output:
   ID        name
0   1  Watermelon
1   2      Cherry
2   3       Apple
3   4      Cherry
4   5      Cherry
Answer
One way is to split each string into a list, then use apply with max, passing fruit_dict.get as the key:
new_df = df.assign(
    name=df["name"]
    .str.split(", ")
    .apply(lambda names: max(names, key=fruit_dict.get))
)
or, if you expect some names to be missing from the dictionary:
new_df = df.assign(
    name=df["name"]
    .str.split(", ")
    # names missing from the dictionary rank below every known name
    .apply(lambda names: max(names, key=lambda x: fruit_dict.get(x, float("-inf"))))
)
output:
   ID        name
0   1  Watermelon
1   2      Cherry
2   3       Apple
3   4      Cherry
4   5      Cherry
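To see why the fallback matters, here is a minimal sketch in plain Python, without pandas, of what max does for a single row. The helper best_name and the unknown name "Mango" are made up for illustration and are not part of the question's data.

```python
fruit_dict = {"Apple": 10, "Watermelon": 20, "Cherry": 30}


def best_name(names, scores):
    """Return the name with the highest score; unknown names rank last."""
    # best_name is a hypothetical helper, used only for this illustration.
    return max(names, key=lambda name: scores.get(name, float("-inf")))


known = best_name(["Apple", "Watermelon"], fruit_dict)
with_unknown = best_name(["Mango", "Apple"], fruit_dict)  # "Mango" has no score

# Without the fallback, dict.get returns None for "Mango", and max() cannot
# compare None with an int, so it raises TypeError.
try:
    max(["Mango", "Apple"], key=fruit_dict.get)
    raised = False
except TypeError:
    raised = True

print(known, with_unknown, raised)
```

With the float('-inf') default, an unknown name can only win when every name in the row is unknown, in which case max returns the first one.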