Creating adjacency matrix from sparse SKU data in Python

Question

I have ecommerce data with about 6000 SKUs and 250,000 obs. Simple version below but a lot more sparse. There is only one SKU per line as each line is a transaction. What I have: I want to create a weighted undirected adjacency matrix so that I can do some graph analysis on the market baskets. It would look l…

Accepted Answer

Count also counts zeros. Aggregate by sum instead and then convert to 0s and 1s.

    import numpy as np
    import pandas as pd

    agg = df.groupby('ID').agg('sum')
    agg = (agg > 0).astype(int)

        SKU1    SKU2    SKU3
    ID
    55  1       1       1
    66  1       1       0
    77  0       1       0

Turn it into an occurrence table and fill the diagonal with 0s. The diagonal only counts how many baskets contain each SKU, which would show up as self-loops in the graph.

    occurrence = np.dot(agg.T, agg)
    np.fill_diagonal(occurrence, 0)

Turn it back into a dataframe. This assumes ID is the first column of df, so df.columns[1:] are the SKU names.

    pd.DataFrame(occurrence, columns=df.columns[1:], index=df.columns[1:])

            SKU1    SKU2    SKU3
    SKU1    0       2       1
    SKU2    2       0       1
    SKU3    1       1       0
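For anyone who wants to run the steps end to end, here is a minimal self-contained sketch. The toy df is made up to match the small tables in this answer (the original data is not shown here), and the helper name cooccurrence and its id_col parameter are hypothetical, not part of pandas or NumPy. It assumes the wide layout the answer relies on: an ID column plus one 0/1 column per SKU, one transaction line per row.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one row per transaction line, one SKU flagged per row.
df = pd.DataFrame(
    {
        "ID":   [55, 55, 55, 66, 66, 77],
        "SKU1": [1, 0, 0, 1, 0, 0],
        "SKU2": [0, 1, 0, 0, 1, 1],
        "SKU3": [0, 0, 1, 0, 0, 0],
    }
)


def cooccurrence(frame, id_col="ID"):
    """Return a SKU-by-SKU co-occurrence DataFrame with a zeroed diagonal."""
    # Collapse transaction lines into one 0/1 row per basket.
    basket = (frame.groupby(id_col).sum() > 0).astype(int)
    # basket.T @ basket counts, for each SKU pair, the baskets containing both.
    counts = basket.T.dot(basket)
    # Work on a copy of the values so fill_diagonal has a writable array.
    values = counts.to_numpy().copy()
    np.fill_diagonal(values, 0)  # drop self-loops
    return pd.DataFrame(values, index=counts.index, columns=counts.columns)


adjacency = cooccurrence(df)
print(adjacency)
```

Taking the labels from the aggregated frame instead of df.columns[1:] means the result does not depend on where the ID column sits. With about 6000 SKUs the dense result has 36 million entries (roughly 288 MB as int64), so if memory becomes a problem, a sparse matrix product may be worth trying instead.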


Answer