So, I have some sample data as such:
import pandas as pd objs = [ {'location':'US', 'fruit':'apple', 'time':'night', 'value': 1}, {'location':'US', 'fruit':'orange', 'time':'night', 'value': 3}, {'location':'US', 'fruit':'banana', 'time':'night', 'value': 1}, {'location':'EU', 'fruit':'apple', 'time':'night', 'value': 4}, {'location':'EU', 'fruit':'orange', 'time':'night', 'value': 1}, {'location':'EU', 'fruit':'banana', 'time':'night', 'value': 2}, {'location':'US', 'fruit':'apple', 'time':'day', 'value': 5}, {'location':'US', 'fruit':'orange', 'time':'day', 'value': 2}, {'location':'US', 'fruit':'banana', 'time':'day', 'value': 3}, {'location':'EU', 'fruit':'apple', 'time':'day', 'value': 6}, {'location':'EU', 'fruit':'orange', 'time':'day', 'value': 2}, {'location':'EU', 'fruit':'banana', 'time':'day', 'value': 1}, ] df = pd.DataFrame.from_records(objs)
which gives a dataframe in long form like:
location fruit time value 0 US apple night 1 1 US orange night 3 2 US banana night 1 3 EU apple night 4 4 EU orange night 1 5 EU banana night 2 6 US apple day 5 7 US orange day 2 8 US banana day 3 9 EU apple day 6 10 EU orange day 2 11 EU banana day 1
I want to, for each pair/grouping of location
and time
, conditionally sum the value
column based on the value in the fruit
column.
Specifically:
I want to sum the apple
and orange
but NOT the banana
rows for each grouping.
Resulting in the below dataframe, with the new rows as specified
location fruit time value 0 US apple night 1 1 US orange night 3 2 US banana night 1 3 US NO_BANANA night 4 <-- 4 EU apple night 4 5 EU orange night 1 6 EU banana night 2 7 EU NO_BANANA night 5 <-- 8 US apple day 5 9 US orange day 2 10 US banana day 3 11 US NO_BANANA day 7 <-- 12 EU apple day 6 13 EU orange day 2 14 EU banana day 1 15 EU NO_BANANA day 8 <--
Any help is greatly appreciated
Advertisement
Answer
If the condition is the same for each group, just filter first then group by:
subdf = df[df['fruit']!='banana'].groupby(['location', 'time']).sum().reset_index() subdf['fruit'] = 'NO_BANANA' df = pd.concat([df, subdf]).sort_values(['time', 'location'], ascending = False).reset_index(drop=True)
location fruit time value 0 US apple night 1 1 US orange night 3 2 US banana night 1 3 US NO_BANANA night 4 4 EU apple night 4 5 EU orange night 1 6 EU banana night 2 7 EU NO_BANANA night 5 8 US apple day 5 9 US orange day 2 10 US banana day 3 11 US NO_BANANA day 7 12 EU apple day 6 13 EU orange day 2 14 EU banana day 1 15 EU NO_BANANA day 8