Groupby several columns, summing them up based on the presence of a sub-string

Question

Context: I'm trying to sum values across columns, but only for columns whose names start with or contain a string from a list.

So with a config file like this:

And a dataframe like this:

How can I group the columns whose names all start with a given substring present in granularity_suffix_list?

Desired output:

Attempts: I was trying this:

But it doesn't work.

Accepted Answer

Okay, I finally managed to solve what I wanted. Posting the solution in case anyone finds it relevant:

    import pandas as pd

    tt = pd.DataFrame({'A_2': [1, 2, 3], 'A_3': [3, 4, 2], 'B_4': [5, 2, 1], 'B_1': [8, 2, 1], 'C_3': [2, 4, 2]})
    granularity_suffix_list = ['A', 'B']

    def correct_categories(cols_to_aggregate):
        # Map each column to its matching prefix, or keep its own name if no prefix matches.
        lst = []
        for column in cols_to_aggregate:
            if not column.startswith(tuple(granularity_suffix_list)):
                lst.append(column)
            else:
                lst.append(next(w for w in granularity_suffix_list if column.startswith(w)))
        return lst

    # Originally tt.groupby(correct_categories(tt.columns), axis=1).sum();
    # axis=1 is deprecated since pandas 2.1, so group the transposed frame and transpose back.
    df = tt.T.groupby(correct_categories(tt.columns)).sum().T
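For comparison, here is a minimal sketch of the same idea that passes a function to groupby instead of building a list of labels first. It reuses the `tt` frame and `granularity_suffix_list` from the answer; the helper name `prefix_or_self` is made up for illustration, and transposing before grouping is an assumption made so that the snippet does not rely on the deprecated `axis=1` argument.

```python
import pandas as pd

# Sample data and prefix list taken from the answer.
tt = pd.DataFrame({'A_2': [1, 2, 3], 'A_3': [3, 4, 2], 'B_4': [5, 2, 1],
                   'B_1': [8, 2, 1], 'C_3': [2, 4, 2]})
granularity_suffix_list = ['A', 'B']


def prefix_or_self(column, prefixes=tuple(granularity_suffix_list)):
    """Hypothetical helper: return the first prefix the column name starts with,
    or the column name itself when no prefix matches."""
    return next((p for p in prefixes if column.startswith(p)), column)


# After transposing, the column names become the index. When groupby is given
# a function, it calls that function on each index label to get the group key.
df = tt.T.groupby(prefix_or_self).sum().T
print(df)
```

Columns that match no prefix (here `C_3`) keep their own name, so they pass through as single-column groups. If a column name could match more than one prefix, the order of `granularity_suffix_list` decides which group it lands in.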

Answer