I have a dataset with multiple variables. I am trying to group these variables by the ending of each variable's name and calculate the mean of each group. Here is an example of my dataset:
```python
import pandas as pd

data = {'AST_0-01': [1, 2, 3], 'AST_0-02': [4, 5, 6],
        'AST_1-01': [7, 8, 9], 'AST_1-02': [1, 2, 3],
        'AST_2-01': [4, 5, 6], 'AST_2-02': [7, 8, 9]}
df = pd.DataFrame(data)
```
What I am trying to do is group the variables whose names end with the same number, e.g. `[AST_0-01, AST_1-01, AST_2-01]` and `[AST_0-02, AST_1-02, AST_2-02]`, and then get the mean of each group.
Answer
First, “transpose” your dataframe so that the column names become a column you can group by:
```
In [3]: df = df.T.reset_index()

In [4]: df
Out[4]:
      index  0  1  2
0  AST_0-01  1  2  3
1  AST_0-02  4  5  6
2  AST_1-01  7  8  9
3  AST_1-02  1  2  3
4  AST_2-01  4  5  6
5  AST_2-02  7  8  9

In [5]: # numeric_only=True skips the string "index" column (required on pandas >= 2.0)
   ...: df.groupby(df["index"].str[-2:]).mean(numeric_only=True)
Out[5]:
         0    1    2
index
01     4.0  5.0  6.0
02     4.0  5.0  6.0
```
This gives the means broken out by the three rows of the original dataframe (columns `0`, `1` and `2` above). If you want a single “total” per group instead, sum those means across the columns:
```
In [6]: df.groupby(df["index"].str[-2:]).mean(numeric_only=True).sum(axis=1)
Out[6]:
index
01    15.0
02    15.0
dtype: float64
```
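For comparison, here is a minimal sketch of the same transpose-and-group idea that leaves `df` untouched (no `reset_index`) and keys the groups on the text after the last hyphen instead of the last two characters, so suffixes of a different length would still group correctly. It assumes every column name contains a `-`; the names `suffix`, `per_position` and `overall` are just illustrative. It also shows the mean of all values in each group, obtained by averaging the per-position means (`mean(axis=1)`) rather than summing them.

```python
import pandas as pd

# Sample data from the question
data = {
    "AST_0-01": [1, 2, 3], "AST_0-02": [4, 5, 6],
    "AST_1-01": [7, 8, 9], "AST_1-02": [1, 2, 3],
    "AST_2-01": [4, 5, 6], "AST_2-02": [7, 8, 9],
}
df = pd.DataFrame(data)

# Group key: the part of each column name after the last "-",
# e.g. "AST_0-01" -> "01". Assumes every name contains a hyphen.
suffix = df.columns.str.replace(r"^.*-", "", regex=True)

# df.T has one row per original column; group those rows by suffix.
# Result: one row per suffix, one column per original row position (0, 1, 2).
per_position = df.T.groupby(suffix).mean()

# Average the per-position means to get the mean of every value in each group
# (equal to the plain mean of all values here because nothing is missing).
overall = per_position.mean(axis=1)

print(per_position)
print(overall)
```

If you prefer the groups as columns (`01`, `02`) in the original orientation, `per_position.T` gives that.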