Skip to content
Advertisement

Pandas append row based on conditional sum in long form

So, I have some sample data as such:

import pandas as pd
objs = [
    {'location':'US', 'fruit':'apple', 'time':'night', 'value': 1},
    {'location':'US', 'fruit':'orange', 'time':'night', 'value': 3},
    {'location':'US', 'fruit':'banana', 'time':'night', 'value': 1},
    {'location':'EU', 'fruit':'apple', 'time':'night', 'value': 4},
    {'location':'EU', 'fruit':'orange', 'time':'night', 'value': 1},
    {'location':'EU', 'fruit':'banana', 'time':'night', 'value': 2},
    {'location':'US', 'fruit':'apple', 'time':'day', 'value': 5},
    {'location':'US', 'fruit':'orange', 'time':'day', 'value': 2},
    {'location':'US', 'fruit':'banana', 'time':'day', 'value': 3},
    {'location':'EU', 'fruit':'apple', 'time':'day', 'value': 6},
    {'location':'EU', 'fruit':'orange', 'time':'day', 'value': 2},
    {'location':'EU', 'fruit':'banana', 'time':'day', 'value': 1},
]
df = pd.DataFrame.from_records(objs)

which gives a dataframe in long form like:

   location   fruit   time  value
0        US   apple  night      1
1        US  orange  night      3
2        US  banana  night      1
3        EU   apple  night      4
4        EU  orange  night      1
5        EU  banana  night      2
6        US   apple    day      5
7        US  orange    day      2
8        US  banana    day      3
9        EU   apple    day      6
10       EU  orange    day      2
11       EU  banana    day      1

I want to, for each pair/grouping of location and time, conditionally sum the value column based on the value in the fruit column. Specifically:

I want to sum the apple and orange but NOT the banana rows for each grouping.

Resulting in the below dataframe, with the new rows as specified

   location      fruit   time  value
0        US      apple  night      1
1        US     orange  night      3
2        US     banana  night      1
3        US  NO_BANANA  night      4  <--
4        EU      apple  night      4
5        EU     orange  night      1
6        EU     banana  night      2
7        EU  NO_BANANA  night      5  <--
8        US      apple    day      5
9        US     orange    day      2
10       US     banana    day      3
11       US  NO_BANANA    day      7  <--
12       EU      apple    day      6
13       EU     orange    day      2
14       EU     banana    day      1
15       EU  NO_BANANA    day      8  <--

Any help is greatly appreciated

Advertisement

Answer

If the condition is the same for each group, just filter first then group by:

subdf = df[df['fruit']!='banana'].groupby(['location', 'time']).sum().reset_index()
subdf['fruit'] = 'NO_BANANA'
df = pd.concat([df, subdf]).sort_values(['time', 'location'], ascending = False).reset_index(drop=True)
location    fruit   time    value
0   US  apple   night   1
1   US  orange  night   3
2   US  banana  night   1
3   US  NO_BANANA   night   4
4   EU  apple   night   4
5   EU  orange  night   1
6   EU  banana  night   2
7   EU  NO_BANANA   night   5
8   US  apple   day 5
9   US  orange  day 2
10  US  banana  day 3
11  US  NO_BANANA   day 7
12  EU  apple   day 6
13  EU  orange  day 2
14  EU  banana  day 1
15  EU  NO_BANANA   day 8
User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement