Creating adjacency matrix from sparse SKU data in Python

I have e-commerce data with about 6,000 SKUs and 250,000 observations. A simplified version is below; the real data is much sparser. Each row is a single transaction, so there is only one SKU per row.

What I have:


I want to create a weighted undirected adjacency matrix so that I can do some graph analysis on the market baskets. It would look like the below, where SKU2 and SKU1 were bought together in baskets 55 and 66 and therefore have a total weight of 2.

What I want:


I have tried a for loop iterating through the original DF but it crashes immediately.

Ideally I would collapse the first dataframe by the ID column but without aggregating, as there are no duplicate transactions for the same item and same ID. However, when I try to collapse using df.groupby(['ID']).count() I get the following. When I remove .count() there is no output. I’m sure there is another way to do this but can’t seem to find it in the documentation.

What I tried: df.groupby(['ID']).count()


Anyone know how I can generate the sparse matrix without immediately crashing my computer?


Answer

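count() counts every non-null cell, zeros included, so it returns the number of rows per ID rather than which SKUs were bought. Aggregate with sum() instead and then convert the result to 0s and 1s.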

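A minimal sketch of this step, assuming the layout the question describes: one row per transaction, an ID column identifying the basket, and one 0/1 indicator column per SKU. The toy values and the names df, counted and baskets are made up for illustration and are not taken from the original post.

```python
import pandas as pd

# Hypothetical toy data in the assumed layout: one row per transaction,
# an ID column for the basket, and one 0/1 indicator column per SKU.
df = pd.DataFrame({
    "ID":   [55, 55, 66, 66, 77],
    "SKU1": [1, 0, 1, 0, 0],
    "SKU2": [0, 1, 0, 1, 0],
    "SKU3": [0, 0, 0, 0, 1],
})

# count() tallies every non-null cell, zeros included, so each column just
# reports how many rows the basket has.
counted = df.groupby("ID").count()

# sum() adds up the indicators per basket; clip(upper=1) caps each cell at 1
# so the result is a basket-by-SKU table of 0s and 1s.
baskets = df.groupby("ID").sum().clip(upper=1)
print(baskets)
```

In this toy data basket 55 has two rows, so count() reports 2 in every SKU column, while the summed table records a 1 only for the SKUs that were actually bought.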

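Turn it into a co-occurrence table and fill the diagonal with 0s. The diagonal only counts how many baskets contain each SKU, which would otherwise show up as self-loops in the graph.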

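One way this could look, as a sketch rather than the exact code from the answer: multiply the transposed basket-by-SKU indicator table by itself, then zero the diagonal with NumPy. The small table and the names baskets, x and cooc are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical basket-by-SKU indicator table (rows: basket IDs, columns: SKUs).
baskets = pd.DataFrame(
    [[1, 1, 0], [1, 1, 0], [0, 0, 1]],
    index=[55, 66, 77],
    columns=["SKU1", "SKU2", "SKU3"],
)

# x.T @ x counts, for every pair of SKUs, the baskets that contain both.
x = baskets.to_numpy()
cooc = x.T @ x

# The diagonal holds each SKU's own basket count; zero it in place so the
# matrix has no self-loops when read as a graph adjacency matrix.
np.fill_diagonal(cooc, 0)
print(cooc)
```

Here SKU1 and SKU2 appear together in baskets 55 and 66, so their entry is 2, matching the weight described in the question. With about 6,000 SKUs the dense result has 36 million cells, roughly 288 MB as 64-bit integers, which is usually manageable; a smaller integer dtype would reduce that further.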

Turn it back into a DataFrame, using the SKU names as both the index and the columns.

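A short sketch of the relabeling, assuming the co-occurrence counts are held in a plain NumPy array and the SKU names are available as a list. Both skus and cooc below are hypothetical stand-ins.

```python
import numpy as np
import pandas as pd

# Hypothetical SKU labels and co-occurrence counts with a zeroed diagonal.
skus = ["SKU1", "SKU2", "SKU3"]
cooc = np.array([
    [0, 2, 0],
    [2, 0, 0],
    [0, 0, 0],
])

# Label both axes with the SKU names so weights can be looked up by name.
adjacency = pd.DataFrame(cooc, index=skus, columns=skus)
print(adjacency)
```

Each cell can then be read by name; for example, adjacency.loc["SKU1", "SKU2"] gives the number of baskets containing both SKUs.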
User contributions licensed under: CC BY-SA