Automatic mean of multiple columns in Python

I have a dataset with multiple variables (columns). I want to group these variables by the ending of each variable name and calculate the mean of each group. Here is an example of my dataset:

import pandas as pd

data = {'AST_0-01': [1, 2, 3],
        'AST_0-02': [4, 5, 6],
        'AST_1-01': [7, 8, 9],
        'AST_1-02': [1, 2, 3],
        'AST_2-01': [4, 5, 6],
        'AST_2-02': [7, 8, 9]}

df = pd.DataFrame(data)

What I am trying to do is group the variables that end with the same number, e.g. [AST_0-01, AST_1-01, AST_2-01] and [AST_0-02, AST_1-02, AST_2-02], and then get the mean of each group.

Answer

First, transpose your dataframe and reset its index, so that the variable names end up in a regular column called "index" that you can group by:

In [3]: df = df.T.reset_index()

In [4]: df
Out[4]:
      index  0  1  2
0  AST_0-01  1  2  3
1  AST_0-02  4  5  6
2  AST_1-01  7  8  9
3  AST_1-02  1  2  3
4  AST_2-01  4  5  6
5  AST_2-02  7  8  9

In [5]: df.groupby(df["index"].str[-2:]).mean(numeric_only=True)  # skip the string "index" column
Out[5]:
         0    1    2
index
01     4.0  5.0  6.0
02     4.0  5.0  6.0
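
The same grouping can also be sketched without overwriting df, by grouping the transposed frame on a key derived from the column names. This is only an illustration under assumptions: it treats whatever follows the last hyphen in each name as the group, and the names suffix and row_means are made up here, not part of the original answer.

```python
import pandas as pd

data = {
    "AST_0-01": [1, 2, 3],
    "AST_0-02": [4, 5, 6],
    "AST_1-01": [7, 8, 9],
    "AST_1-02": [1, 2, 3],
    "AST_2-01": [4, 5, 6],
    "AST_2-02": [7, 8, 9],
}
df = pd.DataFrame(data)

# Text after the last hyphen of each column name, e.g. "AST_0-01" -> "01".
suffix = df.columns.str.rsplit("-", n=1).str[-1]

# Group the transposed frame by suffix, average, then transpose back so that
# each row lines up with a row of the original data.
row_means = df.T.groupby(suffix).mean().T
print(row_means)
```

Transposing back gives one column per suffix group and one row per row of the original data, which may be easier to join onto df.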

Each column of this result (0, 1, 2) corresponds to one row of the original dataframe, so the means are computed row by row. If you want a single overall mean per group, average across those columns as well:

In [6]: df.groupby(df["index"].str[-2:]).mean(numeric_only=True).mean(axis=1)
Out[6]:
index
01    5.0
02    5.0
dtype: float64
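
If a single average per suffix group is the goal, another option is to pool every value from that group's columns and average them directly, which also behaves sensibly when some values are missing. A minimal sketch, assuming the last two characters of each name identify the group (as in the answer above); the names long, overall, "column" and "value" are illustrative only.

```python
import pandas as pd

data = {
    "AST_0-01": [1, 2, 3],
    "AST_0-02": [4, 5, 6],
    "AST_1-01": [7, 8, 9],
    "AST_1-02": [1, 2, 3],
    "AST_2-01": [4, 5, 6],
    "AST_2-02": [7, 8, 9],
}
df = pd.DataFrame(data)

# Reshape to long form: one row per (column name, value) pair.
long = df.melt(var_name="column", value_name="value")

# Group on the last two characters of the column name and average all
# values that fall into each group.
overall = long.groupby(long["column"].str[-2:])["value"].mean()
print(overall)
```

Here every value in a group carries equal weight, so the result is the plain mean of all numbers in the matching columns.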
User contributions licensed under: CC BY-SA