Skip to content
Advertisement

Derive consumption from existing column (Pandas)

Data

client  total   avail   used    date    
charles 5000    5000    0       2/1/2022    
charles 8000    5000    3000    2/1/2022    
charles 500     500     0       3/1/2022    
cara    1000    100     900     3/1/2022    
cara    2000    50      1550    2/1/2022    
cara    500     0       500     2/1/2022    
cara    100     0       100     3/1/2022


                                
                
                            

Desired

client  used    date            
charles 0.23    2/1/2022            
charles 0       3/1/2022            
cara    0.84    2/1/2022            
cara    1       3/1/2022            
                    

Doing

Groupby client and date; create calculated column ['used']/['total'] * 100

first create derived column

df["used"] = df["used)"]/['total'] * 100

df.groupby(['client','date']) 
.agg({'used':'max'}).reset_index() 

df["used"] = df["used)"]/['total'] * 100

Any suggestion is helpful

Advertisement

Answer

You aren’t using a correct aggregation function. You should be using sum on both your “used” and “total” columns:

df = pd.DataFrame(
    {
        'client': ['charles', 'charles', 'charles', 'cara', 'cara', 'cara', 'cara'],
        'total': [5000, 8000, 500, 1000, 2000, 500, 100],
        'used': [0, 3000, 0, 900, 1550, 500, 100],
        'date': ['2/1/2022', '2/1/2022', '3/1/2022', '3/1/2022', '2/1/2022', '2/1/2022', '3/1/2022']
    }
)

tmp = df.groupby(['client', 'date']).agg({'used': 'sum', 'total': 'sum'})
tmp['used_frac'] = tmp['used'] / tmp['total']
tmp.reset_index(inplace=True)
tmp

# Out: 
    client  date      used      total   used_frac
0   cara    2/1/2022    2050    2500    0.820000
1   cara    3/1/2022    1000    1100    0.909091
2   charles 2/1/2022    3000    13000   0.230769
3   charles 3/1/2022    0       500     0.000000
User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement