Skip to content
Advertisement

Pandas sum() with character condition

I have the following dataframe:

>>>variable    value
0  var1        0.25
1  var2        0.11
2  tt_var1     0.01
3  tt_var2     0.0004
4  tt_var3     0.07
5  bb_var5     0.102
6  var_3       0.45
7  bb_var74    0.23

I want to use cumsum() in order to sum the values in column “1”, but only for specific variables: I want to sum all the variables that start with tt and all the variable that start with bb in my dataframe, so in the end i’ll have the folowing table :

>>>variable    value
0  var1        0.25
1  var2        0.11
2  tt          *sum of all tt **
3  bb          *sum of all bb**
4  var_3       0.45

I know how to use cumsum but I haven’t found any way to specify it to be only on specific rows that have something in common in one row (e.g the letters bb or tt).

Is there any way to use cumsum() in this way?

Advertisement

Answer

Use groupby,

grouper = df['variable'].replace({'tt_.*': 'tt', 'bb_.*': 'bb'}, regex=True)
df.groupby(grouper).sum().reset_index()

Output

  variable   value
0       bb  0.3320
1       tt  0.0804
2     var1  0.2500
3     var2  0.1100
4    var_3  0.4500
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement