Covariance between two columns in pandas groupby

Question

I am trying to calculate the covariance between two columns by group. I am doing the following:

Ideally, I would like to get only the covariance between X and Y rather than the whole variance-covariance matrix, since I only have two columns.

Thank you.

Accepted Answer

You are almost there; you just need a clearer picture of how the groupby object works. See Pandas-GroupBy for more details.

For your problem, if I understand correctly, you would like to calculate the covariance between two columns within the same group.

The simplest way is to use the groupby.cov function, which gives the pairwise covariance between the columns within each group:

    A.groupby('group').cov()

                    value1    value2
    group
    A     value1  1.666667 -2.666667
          value2 -2.666667  4.666667
    B     value1  1.000000  0.500000
          value2  0.500000  0.333333

If you only need cov(grouped_v1, grouped_v2):

    grouped = A.groupby('group')
    grouped.apply(lambda x: x['value1'].cov(x['value2']))

    group
    A   -2.666667
    B    0.500000

Here, grouped is a groupby object. grouped.apply takes a callback function as its argument and calls it once per group, passing that group in. In this case the callback is a lambda, and its argument x is one group (a DataFrame).

Hope this helps you understand groupby.
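For anyone who wants to try this end to end, here is a minimal self-contained sketch. The original DataFrame A is not shown in the post, so the data below is made up for illustration and will not reproduce the numbers above; only the column names group, value1 and value2 are taken from the answer. The second variant, which pulls the single off-diagonal entry out of the full groupby covariance matrix with xs, is an alternative not mentioned in the original answer.

```python
import pandas as pd

# Hypothetical data: the original DataFrame `A` is not shown in the post.
A = pd.DataFrame({
    "group":  ["A", "A", "A", "B", "B", "B"],
    "value1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "value2": [2.0, 4.0, 7.0, 3.0, 3.0, 1.0],
})

# Variant 1: one scalar per group via apply.
# Selecting the two value columns first keeps the grouping column
# out of the frame that is passed to the lambda.
cov_by_group = A.groupby("group")[["value1", "value2"]].apply(
    lambda g: g["value1"].cov(g["value2"])
)

# Variant 2: compute the full per-group covariance matrix, then keep
# only the (value1, value2) entry for each group.
full = A.groupby("group")[["value1", "value2"]].cov()
cov_from_matrix = full.xs("value1", level=1)["value2"]

print(cov_by_group)     # A -> 2.5, B -> -1.0 for this made-up data
print(cov_from_matrix)  # same values, indexed by group
```

Both variants return a Series indexed by group. The apply version is closer to the answer above; the xs version may be handy if you already have the full matrix.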
