
Covariance between two columns in pandas groupby

I am trying to calculate the covariance between two columns by group. I am doing the following:

import pandas as pd

A = pd.DataFrame({'group':['A','A','A','A','B','B','B'],
                  'value1':[1,2,3,4,5,6,7],
                  'value2':[8,5,4,3,7,8,8]})

B = A.groupby('group')

B['value1'].cov(B['value2'])

Ideally, I would like to get only the covariance between value1 and value2 for each group, not the whole variance-covariance matrix, since I only have two columns.

Thank you,


Answer

You are almost there; the issue is how the groupby object works. See Pandas-GroupBy for more details.

If I understand correctly, you would like to calculate the covariance between two columns within each group.

The simplest way is to use the groupby cov function, which gives the pairwise covariance between the columns within each group:

A.groupby('group').cov()

                value1    value2
group                           
A     value1  1.666667 -2.666667
      value2 -2.666667  4.666667
B     value1  1.000000  0.500000
      value2  0.500000  0.333333
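If you prefer to start from this matrix, the value1/value2 entry can be pulled out of it afterwards. The following is a minimal sketch using the data from the question; the names cov_matrix and pair_cov are only illustrative.

```python
import pandas as pd

A = pd.DataFrame({'group': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
                  'value1': [1, 2, 3, 4, 5, 6, 7],
                  'value2': [8, 5, 4, 3, 7, 8, 8]})

# Full per-group covariance matrix; its row index has two levels:
# the group label and the column name.
cov_matrix = A.groupby('group').cov()

# Keep the 'value1' rows (second index level), then read the 'value2' column.
# The result is one covariance per group.
pair_cov = cov_matrix.xs('value1', level=1)['value2']
print(pair_cov)
```

This computes every pairwise covariance first and then discards most of them, so it is mainly convenient when there are only a few columns.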

If you only need the covariance between value1 and value2 within each group:

grouped = A.groupby('group')
grouped.apply(lambda x: x['value1'].cov(x['value2']))

group
A   -2.666667
B    0.500000

Here, grouped is a groupby object. Its apply method takes a callback function as its argument and calls it once per group, passing that group as the argument. In this case the callback is a lambda function, and its argument x is one group (a DataFrame).
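To make the callback mechanism more concrete, here is a rough sketch that does the same thing by hand. The helper name group_cov and the variables by_hand and via_apply are made up for illustration, and selecting the two value columns before apply is my own choice to keep the callback input explicit.

```python
import pandas as pd

A = pd.DataFrame({'group': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
                  'value1': [1, 2, 3, 4, 5, 6, 7],
                  'value2': [8, 5, 4, 3, 7, 8, 8]})

def group_cov(df):
    # df is the sub-DataFrame holding the rows of a single group
    return df['value1'].cov(df['value2'])

# Iterating over a groupby object yields (group key, sub-DataFrame) pairs.
# Calling the function on each pair is roughly what apply does internally.
by_hand = pd.Series({key: group_cov(sub) for key, sub in A.groupby('group')})

# Same result through apply, with a named function instead of a lambda.
via_apply = A.groupby('group')[['value1', 'value2']].apply(group_cov)

print(by_hand)
print(via_apply)
```

Both Series should hold one covariance per group, which is why the lambda in the answer can treat x as an ordinary DataFrame.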

Hope this will be helpful for your understanding of groupby.

User contributions licensed under: CC BY-SA