I know how to do this in R. But is there a function in pandas that transforms a dataframe into an n x n co-occurrence matrix containing the counts of two aspects co-occurring?
For example, a dataframe df:
    import pandas as pd

    df = pd.DataFrame({'TFD' : ['AA', 'SL', 'BB', 'D0', 'Dk', 'FF'],
                       'Snack' : ['1', '0', '1', '1', '0', '0'],
                       'Trans' : ['1', '1', '1', '0', '0', '1'],
                       'Dop' : ['1', '0', '1', '0', '1', '1']}).set_index('TFD')

    print(df)

    >>>
         Dop Snack Trans
    TFD
    AA     1     1     1
    SL     0     0     1
    BB     1     1     1
    D0     0     1     0
    Dk     1     0     0
    FF     1     0     1

    [6 rows x 3 columns]
would yield:
           Dop  Snack  Trans
    Dop      0      2      3
    Snack    2      0      2
    Trans    3      2      0
Since the matrix is symmetric about the diagonal, I guess there would be a way to optimize the code.
Answer
It’s simple linear algebra: multiply the transpose of the matrix by the matrix itself (your example contains strings, so don’t forget to convert them to integers first):
    >>> df_asint = df.astype(int)
    >>> coocc = df_asint.T.dot(df_asint)
    >>> coocc
           Dop  Snack  Trans
    Dop      4      2      3
    Snack    2      3      2
    Trans    3      2      4
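To see why this product gives co-occurrence counts: entry (i, j) sums, over all rows, the product of column i and column j, and that product is 1 only when both columns are 1 in that row. The following is a minimal self-contained sketch that checks one entry against a direct count. The helper name `cooccurrence` is made up for illustration and is not a pandas function.

```python
import pandas as pd

# Same data as in the question (values are strings on purpose).
df = pd.DataFrame({'TFD': ['AA', 'SL', 'BB', 'D0', 'Dk', 'FF'],
                   'Snack': ['1', '0', '1', '1', '0', '0'],
                   'Trans': ['1', '1', '1', '0', '0', '1'],
                   'Dop': ['1', '0', '1', '0', '1', '1']}).set_index('TFD')


def cooccurrence(frame):
    """Hypothetical helper: for 0/1 columns, count rows where both columns are 1."""
    as_int = frame.astype(int)      # strings -> integers
    return as_int.T.dot(as_int)     # (columns x rows) . (rows x columns)


coocc = cooccurrence(df)

# Direct count of rows where both Dop and Trans are '1', for comparison.
direct = int(((df['Dop'] == '1') & (df['Trans'] == '1')).sum())

print(coocc.loc['Dop', 'Trans'], direct)  # both are 3
```

The diagonal entries are simply the number of rows in which each column is 1, which is why they are non-zero here.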
If, as in the R answer, you want to reset the diagonal, you can use numpy’s fill_diagonal:
    >>> import numpy as np
    >>> np.fill_diagonal(coocc.values, 0)
    >>> coocc
           Dop  Snack  Trans
    Dop      0      2      3
    Snack    2      0      2
    Trans    3      2      0
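Filling the diagonal in place relies on `coocc.values` being a writable view of the frame's data. As far as I know, that is not guaranteed in every pandas version (with Copy-on-Write enabled, the array may be read-only). A more defensive sketch, assuming you are fine with building a new frame, works on an explicit copy. The name `coocc_nodiag` is just illustrative, and the matrix is typed in by hand so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Co-occurrence matrix from the answer, entered directly for a standalone example.
labels = ['Dop', 'Snack', 'Trans']
coocc = pd.DataFrame([[4, 2, 3],
                      [2, 3, 2],
                      [3, 2, 4]], index=labels, columns=labels)

arr = coocc.to_numpy(copy=True)   # explicit, writable copy of the data
np.fill_diagonal(arr, 0)          # zero the diagonal of the copy in place

# Rebuild a labelled frame; the original coocc is left untouched.
coocc_nodiag = pd.DataFrame(arr, index=coocc.index, columns=coocc.columns)
print(coocc_nodiag)
```

This costs one extra copy of an n x n array, which is usually negligible since n is the number of columns.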