
How do I construct an incidence matrix from two dataframe columns using scipy.sparse.coo_matrix((data, (i, j)))?

I have a pandas DataFrame containing two columns, ['A', 'B']. Each column is made up of integers.

I want to construct a sparse matrix with the following properties:

  • row index is all integers from 0 to the max value in the dataframe
  • column index is the same as row index
  • entry i,j = 1 if [i,j] or [j,i] is a row of my dataframe (1 should be the max value of the matrix).

Most importantly, I want to do this using coo_matrix((data, (i, j))) from scipy.sparse, as I'm trying to understand this constructor and this particular way of using it. I have never worked with sparse matrices before. I've tried a few things but none of them is working.


EDIT

Sample code

Defining the dataframe
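
For illustration, a small DataFrame of this shape might look like the sketch below. The values are made up (they are not the original sample data), and the pair (0, 1) is repeated on purpose so the duplicate-row case is exercised. The name n is introduced here for the side length of the matrix.

```python
import pandas as pd

# Hypothetical sample data: each row is a pair of integer labels.
# The pair (0, 1) appears twice to exercise the duplicate-row case.
df = pd.DataFrame({"A": [0, 0, 1, 3, 0], "B": [1, 2, 2, 0, 1]})

# Labels run from 0 to the max value in the frame, so the matrix is n x n.
n = int(df.to_numpy().max()) + 1
```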


The closest I’ve come to a solution
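
A sketch of what such an attempt could look like, assuming made-up data and a data array of ones. This is a guess at the shape of the attempt, not the exact code from the question.

```python
import numpy as np
import pandas as pd
from scipy.sparse import coo_matrix

# Made-up data; (0, 1) is repeated on purpose.
df = pd.DataFrame({"A": [0, 0, 1, 3, 0], "B": [1, 2, 2, 0, 1]})
n = int(df.to_numpy().max()) + 1

# One stored value of 1 per dataframe row, placed at (A, B).
data = np.ones(len(df), dtype=int)
attempt = coo_matrix((data, (df["A"].to_numpy(), df["B"].to_numpy())), shape=(n, n))

# Converting to a dense array sums duplicate (i, j) entries.
dense = attempt.toarray()
```

With this data, the entry at (0, 1) comes out as 2 and the entry at (1, 0) as 0, which shows both issues at once.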


The problem is this isn’t a symmetric matrix (as it doesn’t add to both i,j and j,i for a given row) and I think it would give values greater than 1 if there were duplicate rows.


Answer


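This is what you've got. It gives you i,j >= 1 if [i,j] is in df, because duplicate (i, j) pairs are summed when the COO matrix is converted to another format or to a dense array.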


Now i,j >= 1 if [i,j] or [j,i] is in df.
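
One way to get both orientations while staying with the (data, (i, j)) constructor is to list every pair twice, once as (A, B) and once as (B, A). This is only a sketch on made-up data; adding the matrix to its transpose would be another route.

```python
import numpy as np
import pandas as pd
from scipy.sparse import coo_matrix

# Made-up data; (0, 1) is repeated on purpose.
df = pd.DataFrame({"A": [0, 0, 1, 3, 0], "B": [1, 2, 2, 0, 1]})
n = int(df.to_numpy().max()) + 1

a = df["A"].to_numpy()
b = df["B"].to_numpy()

# Each dataframe row contributes two coordinates: (a, b) and (b, a).
rows = np.concatenate([a, b])
cols = np.concatenate([b, a])
data = np.ones(len(rows), dtype=int)

sym = coo_matrix((data, (rows, cols)), shape=(n, n))

# Duplicates are still summed on conversion, so entries can exceed 1.
dense = sym.toarray()
```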


Now i,j = 1 if [i,j] or [j,i] is in df.
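
A sketch of one way to cap every stored entry at 1, again on made-up data: convert to CSR, which sums duplicate (i, j) entries into a single stored value, then overwrite the stored values with 1. The name incidence is introduced here for illustration.

```python
import numpy as np
import pandas as pd
from scipy.sparse import coo_matrix

# Made-up data; (0, 1) is repeated on purpose.
df = pd.DataFrame({"A": [0, 0, 1, 3, 0], "B": [1, 2, 2, 0, 1]})
n = int(df.to_numpy().max()) + 1

a = df["A"].to_numpy()
b = df["B"].to_numpy()
rows = np.concatenate([a, b])
cols = np.concatenate([b, a])
data = np.ones(len(rows), dtype=int)

# tocsr() merges duplicate coordinates, leaving one stored value per (i, j).
incidence = coo_matrix((data, (rows, cols)), shape=(n, n)).tocsr()

# Every stored value is a positive count, so resetting them all to 1 caps the matrix at 1.
incidence.data[:] = 1

dense = incidence.toarray()
```

The sparsity pattern is untouched by the last step; only the stored values change.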

User contributions licensed under: CC BY-SA