GroupBy columns on column header prefix

Question

I have a dataframe with column names that start with a set list of prefixes. I want to get the sum of the values in the dataframe grouped by columns that start with the same prefix. The only way I could figure out how to do it was to loop through the prefix list, get the columns from the dataframe

Accepted Answer

First, it is necessary to determine what columns contain what prefix. We then use this to perform a groupby.

    grouper = [next(p for p in prefixes if p in c) for c in df.columns]
    u = df.groupby(grouper, axis=1).sum()

       ab  wx
    0   3   7
    1   3   7
    2   3   7
    3   3   7

Almost there, now,

    u.sum().to_frame().T

       ab  wx
    0  12  28

Another option is using np.char.startswith and argmax to vectorize:

    idx = np.char.startswith(
        df.columns.values[:, None].astype(str), prefixes).argmax(1)

    (pd.Series(df.groupby(idx, axis=1).sum().sum().values, index=prefixes)
       .to_frame()
       .transpose())

       ab  wx
    0  12  28
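For readers on recent pandas releases, where the `axis=1` argument to `groupby` is deprecated, the same idea can be sketched by transposing first and grouping the rows. This is only an illustration, not the answerer's code: the sample `df` and the column names `ab_1`, `ab_2`, `wx_1`, `wx_2` are assumptions chosen to reproduce the totals shown above, and the sketch matches prefixes with `str.startswith` rather than the substring test used in the answer.

```python
import pandas as pd

# Hypothetical sample data (not from the original question), chosen so the
# per-prefix totals match the output shown in the answer.
prefixes = ["ab", "wx"]
df = pd.DataFrame(
    [[1, 2, 3, 4]] * 4,
    columns=["ab_1", "ab_2", "wx_1", "wx_2"],
)

# Map each column name to the first prefix it starts with.
mapping = {c: next(p for p in prefixes if c.startswith(p)) for c in df.columns}

# Transpose so columns become rows, group those rows by prefix, sum,
# then transpose back to get one column per prefix.
per_row = df.T.groupby(mapping).sum().T

# Collapse to a single row holding the grand total for each prefix.
totals = per_row.sum().to_frame().T
print(totals)
```

Note that `next(...)` raises `StopIteration` if a column matches none of the prefixes, so this sketch assumes every column starts with one of them.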

Answer