I have some sample data:
```python
import pandas as pd
objs = [
    {'location':'US', 'fruit':'apple', 'time':'night', 'value': 1},
    {'location':'US', 'fruit':'orange', 'time':'night', 'value': 3},
    {'location':'US', 'fruit':'banana', 'time':'night', 'value': 1},
    {'location':'EU', 'fruit':'apple', 'time':'night', 'value': 4},
    {'location':'EU', 'fruit':'orange', 'time':'night', 'value': 1},
    {'location':'EU', 'fruit':'banana', 'time':'night', 'value': 2},
    {'location':'US', 'fruit':'apple', 'time':'day', 'value': 5},
    {'location':'US', 'fruit':'orange', 'time':'day', 'value': 2},
    {'location':'US', 'fruit':'banana', 'time':'day', 'value': 3},
    {'location':'EU', 'fruit':'apple', 'time':'day', 'value': 6},
    {'location':'EU', 'fruit':'orange', 'time':'day', 'value': 2},
    {'location':'EU', 'fruit':'banana', 'time':'day', 'value': 1},
]
df = pd.DataFrame.from_records(objs)
```
which gives a dataframe in long form like:
```
    location   fruit   time  value
0         US   apple  night      1
1         US  orange  night      3
2         US  banana  night      1
3         EU   apple  night      4
4         EU  orange  night      1
5         EU  banana  night      2
6         US   apple    day      5
7         US  orange    day      2
8         US  banana    day      3
9         EU   apple    day      6
10        EU  orange    day      2
11        EU  banana    day      1
```
For each grouping of `location` and `time`, I want to conditionally sum the `value` column based on the value in the `fruit` column.
Specifically, I want to sum the `apple` and `orange` rows but NOT the `banana` rows for each grouping.
This should result in the dataframe below, with the new rows marked by `<--`:
```
    location      fruit   time  value
0         US      apple  night      1
1         US     orange  night      3
2         US     banana  night      1
3         US  NO_BANANA  night      4  <--
4         EU      apple  night      4
5         EU     orange  night      1
6         EU     banana  night      2
7         EU  NO_BANANA  night      5  <--
8         US      apple    day      5
9         US     orange    day      2
10        US     banana    day      3
11        US  NO_BANANA    day      7  <--
12        EU      apple    day      6
13        EU     orange    day      2
14        EU     banana    day      1
15        EU  NO_BANANA    day      8  <--
```
Any help is greatly appreciated.
Answer
If the condition is the same for each group, filter first, then group by:
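```python
# Drop the banana rows, then sum `value` per (location, time) group
subdf = df[df['fruit'] != 'banana'].groupby(['location', 'time'])['value'].sum().reset_index()
# Label the aggregated rows
subdf['fruit'] = 'NO_BANANA'
# Append the new rows and sort so that each one lands after its group
df = pd.concat([df, subdf]).sort_values(['time', 'location'], ascending=False).reset_index(drop=True)
```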
Output:
```
    location      fruit   time  value
0         US      apple  night      1
1         US     orange  night      3
2         US     banana  night      1
3         US  NO_BANANA  night      4
4         EU      apple  night      4
5         EU     orange  night      1
6         EU     banana  night      2
7         EU  NO_BANANA  night      5
8         US      apple    day      5
9         US     orange    day      2
10        US     banana    day      3
11        US  NO_BANANA    day      7
12        EU      apple    day      6
13        EU     orange    day      2
14        EU     banana    day      1
15        EU  NO_BANANA    day      8
```
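If you want to run this end to end, here is a minimal self-contained sketch of the same filter-then-group idea wrapped in a function. The name `add_no_banana_rows` and its `exclude` and `label` parameters are made up for illustration and are not part of pandas. It selects the `value` column explicitly before summing, so only the numeric column is aggregated.

```python
import pandas as pd

# Same sample data as in the question, written as tuples for brevity
rows = [
    ("US", "apple", "night", 1), ("US", "orange", "night", 3), ("US", "banana", "night", 1),
    ("EU", "apple", "night", 4), ("EU", "orange", "night", 1), ("EU", "banana", "night", 2),
    ("US", "apple", "day", 5), ("US", "orange", "day", 2), ("US", "banana", "day", 3),
    ("EU", "apple", "day", 6), ("EU", "orange", "day", 2), ("EU", "banana", "day", 1),
]
df = pd.DataFrame(rows, columns=["location", "fruit", "time", "value"])


def add_no_banana_rows(df, exclude=("banana",), label="NO_BANANA"):
    """Hypothetical helper: append one row per (location, time) group holding
    the sum of `value` over the rows whose `fruit` is not in `exclude`."""
    # Keep only the rows that should count towards the total
    kept = df[~df["fruit"].isin(list(exclude))]
    # One total per (location, time) group, summing only the `value` column
    totals = kept.groupby(["location", "time"], as_index=False)["value"].sum()
    totals["fruit"] = label
    # Append the totals, then sort so each total sits with its group
    combined = pd.concat([df, totals], ignore_index=True)
    return combined.sort_values(["time", "location"], ascending=False).reset_index(drop=True)


result = add_no_banana_rows(df)
print(result)
```

Because `exclude` is passed to `isin`, the same function can leave out several fruits at once, for example `add_no_banana_rows(df, exclude=("banana", "orange"), label="APPLE_ONLY")`.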