How do I construct an incidence matrix from two dataframe columns using scipy.sparse.coo_matrix((data, (i, j)))?

Question

I have a pandas DataFrame containing two columns ['A', 'B']. Each column is made up of integers. I want to construct a sparse matrix with the following properties:

- row index is all integers from 0 to the max value in the dataframe
- column index is the same as row index
- entry i,j = 1 if …

Accepted Answer

    import numpy as np
    import pandas as pd
    from scipy.sparse import coo_matrix

    df = pd.DataFrame(np.random.default_rng(seed=100).integers(5, size=(10,2)))
    df.columns = ['a', 'b']
    arr = coo_matrix((np.ones_like(df.a), (df.a.values, df.b.values)))

This is what you've got. It gives you i,j >= 1 if [i,j] is in df.

    arr = arr + arr.T

    array([[0, 1, 2, 2, 0],
           [1, 0, 0, 0, 0],
           [2, 0, 0, 1, 2],
           [2, 0, 1, 0, 1],
           [0, 0, 2, 1, 2]])

Now i,j >= 1 if [i,j] or [j,i] is in df.

    arr.data = np.ones_like(arr.data)

Now i,j = 1 if [i,j] or [j,i] is in df.

    array([[0, 1, 1, 1, 0],
           [1, 0, 0, 0, 0],
           [1, 0, 0, 1, 1],
           [1, 0, 1, 0, 1],
           [0, 0, 1, 1, 1]])
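The steps in this answer can also be wrapped into a single function. The following is only a sketch, not part of the original answer: the function name symmetric_incidence, its parameters, and the small example frame are hypothetical. It assumes both columns hold non-negative integers, and it passes shape explicitly so the matrix stays square even when the two columns have different maxima (without that, coo_matrix infers the shape from each index array separately and arr + arr.T could fail on mismatched dimensions).

```python
import numpy as np
import pandas as pd
from scipy.sparse import coo_matrix


def symmetric_incidence(df, col_i="a", col_j="b"):
    """Return a square 0/1 sparse matrix with a 1 at [i, j] and [j, i]
    for every row (i, j) of df. Hypothetical helper for illustration."""
    i = df[col_i].to_numpy()
    j = df[col_j].to_numpy()
    # Assumption: indices are non-negative integers.
    # The size must cover the largest value in either column.
    n = int(max(i.max(), j.max())) + 1
    arr = coo_matrix((np.ones(len(df), dtype=np.int8), (i, j)), shape=(n, n))
    # Adding the transpose makes the matrix symmetric; converting to CSR
    # sums duplicate entries, so some values may be greater than 1.
    sym = (arr + arr.T).tocsr()
    # Collapse every stored value back to 1.
    sym.data[:] = 1
    return sym


# Small hand-made example (hypothetical data, includes a duplicate pair
# and a diagonal entry).
pairs = pd.DataFrame({"a": [0, 2, 2, 3], "b": [1, 3, 3, 3]})
result = symmetric_incidence(pairs)
print(result.toarray())
# [[0 1 0 0]
#  [1 0 0 0]
#  [0 0 0 1]
#  [0 0 1 1]]
```

The result is returned in CSR format; call .tocoo() on it if COO is needed downstream.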
