I have a pandas DataFrame containing two columns [‘A’, ‘B’]. Each column is made up of integers.
I want to construct a sparse matrix with the following properties:
- row index is all integers from 0 to the max value in the dataframe
- column index is the same as row index
- entry i,j = 1 if [i,j] or [j,i] is a row of my dataframe (1 should be the max value of the matrix).
Most importantly, I want to do this using coo_matrix((data, (i, j))) from scipy.sparse, as I’m trying to understand this constructor and this particular way of using it. I have never worked with sparse matrices before. I’ve tried a few things, but none of them works.
EDIT
Sample code
Defining the dataframe
    In [96]: df = pd.DataFrame(np.random.randint(5, size=(10,2)))

    In [97]: df.columns = ['a', 'b']

    In [98]: df
    Out[98]:
       a  b
    0  0  3
    1  1  4
    2  3  3
    3  2  0
    4  0  2
    5  1  0
    6  1  1
    7  2  3
    8  3  4
    9  3  2
The closest I’ve come to a solution
    In [100]: scipy.sparse.coo_matrix((np.ones_like(df['a']), (df['a'].array, df['b'].array))).toarray()
    Out[100]:
    array([[0, 0, 1, 1, 0],
           [1, 1, 0, 0, 1],
           [1, 0, 0, 1, 0],
           [0, 0, 1, 1, 1]])
The problem is this isn’t a symmetric matrix (as it doesn’t add to both i,j and j,i for a given row) and I think it would give values greater than 1 if there were duplicate rows.
Answer
    import numpy as np
    import pandas as pd
    from scipy.sparse import coo_matrix

    df = pd.DataFrame(np.random.default_rng(seed=100).integers(5, size=(10,2)))
    df.columns = ['a', 'b']

    arr = coo_matrix((np.ones_like(df.a), (df.a.values, df.b.values)))
This is what you’ve got. It gives you i,j >= 1 if [i,j] is in df.
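Since the question is about understanding the constructor, here is a minimal sketch of two behaviours of coo_matrix((data, (i, j))) that matter for this problem: repeated (i, j) pairs are summed when the matrix is converted to a dense array, and the shape is inferred from the largest indices unless `shape` is given. The tiny `i`, `j` arrays are made up for illustration and are not taken from the dataframe.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical toy indices: the pair (0, 1) appears twice.
i = np.array([0, 0, 2])
j = np.array([1, 1, 0])
data = np.ones_like(i)

# Shape inferred from max(i) + 1 and max(j) + 1, so not necessarily square.
inferred = coo_matrix((data, (i, j)))

# Passing shape explicitly forces a square matrix.
square = coo_matrix((data, (i, j)), shape=(3, 3))

print(inferred.shape)
print(square.toarray())  # the duplicated (0, 1) entry shows up as 2
```

This is why the attempt in the question came out 4x5 rather than square, and why duplicate rows would produce values above 1.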
    arr = arr + arr.T
    arr.toarray()

    array([[0, 1, 2, 2, 0],
           [1, 0, 0, 0, 0],
           [2, 0, 0, 1, 2],
           [2, 0, 1, 0, 1],
           [0, 0, 2, 1, 2]])
Now i,j >= 1 if [i,j] or [j,i] is in df.
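One caveat: adding the transpose only works when the matrix is square, and the inferred shape is square only if both columns happen to share the same maximum. The sketch below applies the same step to the sample rows from the question (where column a tops out at 3 and column b at 4), passing an explicit `shape`. The name `n` and the hard-coded rows are my own additions for illustration.

```python
import numpy as np
import pandas as pd
from scipy.sparse import coo_matrix

# Sample rows copied from the question's dataframe.
df = pd.DataFrame({
    "a": [0, 1, 3, 2, 0, 1, 1, 2, 3, 3],
    "b": [3, 4, 3, 0, 2, 0, 1, 3, 4, 2],
})

# Size of the square matrix: largest value anywhere in the dataframe, plus one.
n = int(df.to_numpy().max()) + 1

ones = np.ones(len(df), dtype=np.int64)
arr = coo_matrix((ones, (df.a.values, df.b.values)), shape=(n, n))

# Both operands are now n x n, so the addition is well defined.
sym = arr + arr.T
print(sym.toarray())
```

The result is symmetric, but entries can still exceed 1 (for example where both [i,j] and [j,i] are rows, or on the diagonal).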
    arr.data = np.ones_like(arr.data)
Now i,j = 1 if [i,j] or [j,i] is in df.
    array([[0, 1, 1, 1, 0],
           [1, 0, 0, 0, 0],
           [1, 0, 0, 1, 1],
           [1, 0, 1, 0, 1],
           [0, 0, 1, 1, 1]])
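If you would rather do everything inside a single coo_matrix((data, (i, j))) call, one possible variant is to feed the constructor both orientations of every row, merge repeated entries with `sum_duplicates()`, and then cap the stored values at 1. This is a sketch of an alternative, not the method above. The helper name `symmetric_adjacency` is hypothetical, and the rows are again the sample from the question.

```python
import numpy as np
import pandas as pd
from scipy.sparse import coo_matrix


def symmetric_adjacency(df, first="a", second="b"):
    """Return an n x n COO matrix with 1 at (i, j) and (j, i) for each row [i, j]."""
    i = df[first].to_numpy()
    j = df[second].to_numpy()
    n = int(max(i.max(), j.max())) + 1

    # List every pair in both orientations so the result is symmetric.
    rows = np.concatenate([i, j])
    cols = np.concatenate([j, i])
    data = np.ones(rows.shape[0], dtype=np.int64)

    m = coo_matrix((data, (rows, cols)), shape=(n, n))
    m.sum_duplicates()  # merge repeated (row, col) entries in place
    m.data[:] = 1       # every stored entry is now a distinct cell; cap it at 1
    return m


# Sample rows copied from the question's dataframe.
df = pd.DataFrame({
    "a": [0, 1, 3, 2, 0, 1, 1, 2, 3, 3],
    "b": [3, 4, 3, 0, 2, 0, 1, 3, 4, 2],
})

result = symmetric_adjacency(df)
print(result.toarray())
```

Merging duplicates before overwriting `data` matters here: a COO matrix can hold the same cell several times, and those copies would be added back together on conversion.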