Constructing a co-occurrence matrix in Python pandas

Question

I know how to do this in R, but is there a function in pandas that transforms a dataframe into an n x n co-occurrence matrix containing the counts of two aspects co-occurring? For example a matrix df: would yield: Since the matrix is mirrored on the diagonal, I guess there would be a way to optimize the code. Answe…

Accepted Answer

It's simple linear algebra: you multiply the matrix by its transpose (your example contains strings, so don't forget to convert them to integers):

    >>> df_asint = df.astype(int)
    >>> coocc = df_asint.T.dot(df_asint)
    >>> coocc
           Dop  Snack  Trans
    Dop      4      2      3
    Snack    2      3      2
    Trans    3      2      4

If, as in the R answer, you want to reset the diagonal, you can use numpy's fill_diagonal:

    >>> import numpy as np
    >>> np.fill_diagonal(coocc.values, 0)
    >>> coocc
           Dop  Snack  Trans
    Dop      0      2      3
    Snack    2      0      2
    Trans    3      2      0
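For anyone who wants to run this end to end, here is a minimal self-contained sketch. The input frame is hypothetical: the question's example data is not shown here, so the values and the row labels r1 to r6 were picked only to be consistent with the counts printed above. It also zeroes the diagonal on an explicit copy instead of writing through coocc.values, on the assumption that in recent pandas versions with Copy-on-Write enabled that array may be read-only.

```python
import numpy as np
import pandas as pd

# Hypothetical 0/1 indicator data stored as strings (an assumption;
# chosen only so the counts match the output shown in the answer).
df = pd.DataFrame(
    {
        "Dop":   ["1", "0", "1", "0", "1", "1"],
        "Snack": ["1", "0", "1", "1", "0", "0"],
        "Trans": ["1", "1", "1", "0", "0", "1"],
    },
    index=["r1", "r2", "r3", "r4", "r5", "r6"],
)

# Convert the strings to integers, then multiply the transpose by the matrix.
# Entry (i, j) counts the rows in which columns i and j are both 1.
df_asint = df.astype(int)
coocc = df_asint.T.dot(df_asint)

# Zero the diagonal on a writable copy and rebuild a labelled DataFrame.
values = coocc.to_numpy(copy=True)
np.fill_diagonal(values, 0)
coocc_offdiag = pd.DataFrame(values, index=coocc.index, columns=coocc.columns)

print(coocc)
print(coocc_offdiag)
```

Since the result is symmetric, np.triu(values, k=1) would keep just the upper triangle if only one copy of each pair count is needed.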

Answer