I have the following dataframe:
>>>variable value 0 var1 0.25 1 var2 0.11 2 tt_var1 0.01 3 tt_var2 0.0004 4 tt_var3 0.07 5 bb_var5 0.102 6 var_3 0.45 7 bb_var74 0.23
I want to use cumsum() in order to sum the values in column “1”, but only for specific variables: I want to sum all the variables that start with tt and all the variable that start with bb in my dataframe, so in the end i’ll have the folowing table :
>>>variable value 0 var1 0.25 1 var2 0.11 2 tt *sum of all tt ** 3 bb *sum of all bb** 4 var_3 0.45
I know how to use cumsum but I haven’t found any way to specify it to be only on specific rows that have something in common in one row (e.g the letters bb or tt).
Is there any way to use cumsum() in this way?
Advertisement
Answer
Use groupby,
grouper = df['variable'].replace({'tt_.*': 'tt', 'bb_.*': 'bb'}, regex=True) df.groupby(grouper).sum().reset_index()
Output
variable value 0 bb 0.3320 1 tt 0.0804 2 var1 0.2500 3 var2 0.1100 4 var_3 0.4500