
Visualize how multiple categorical values differ across rows and columns in a dataframe

I have the following DataFrame, where each column represents a categorization algorithm for the items in the index (a, b, …).


I would like to reorder the category names in each column so that I can better assess whether the index items are being categorised similarly across columns.

Is there a way to visualise how the categories differ across columns? Something like a Venn diagram.

Thank you in advance.


Answer

Here is my take on your interesting question.

Using the difflib module from the Python standard library, which provides helpers for computing deltas, you can define a helper function.
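As a minimal sketch of such a helper (the name `similarity` is an assumption, not necessarily the one used in the answer), `difflib.SequenceMatcher` can score how close two strings are:

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the two strings are identical."""
    # The first argument (None) means no characters are treated as junk.
    return SequenceMatcher(None, a, b).ratio()


print(similarity("c1c2c1", "c1c2c1"))  # identical strings
print(similarity("c1c2c1", "c3c3c2"))  # partially overlapping strings
```

Identical strings score 1.0, and strings with no characters in common score 0.0.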


The general idea is to rate the similarity between rows using a unique identifier for each row (based on all of its columns), and to sort the dataframe from the most similar to the least similar rows.
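One way this idea could look in code is sketched below. The sample data, the column names `alg1` to `alg3`, and the choice to score each row by its mean similarity to all other rows are assumptions made for illustration:

```python
from difflib import SequenceMatcher

import pandas as pd

# Hypothetical sample data: three algorithms categorizing items a..f.
df = pd.DataFrame(
    {
        "alg1": ["c1", "c2", "c1", "c3", "c2", "c2"],
        "alg2": ["c1", "c2", "c1", "c3", "c3", "c3"],
        "alg3": ["c1", "c1", "c1", "c3", "c2", "c2"],
    },
    index=list("abcdef"),
)


def ratio(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()


# One identifier per row, built from the values of all columns.
ids = pd.Series(["|".join(map(str, row)) for row in df.to_numpy()], index=df.index)

# Assumed scoring rule: mean similarity of a row's identifier to every other row's.
scores = {
    label: sum(ratio(ident, other) for other in ids.drop(label)) / (len(ids) - 1)
    for label, ident in ids.items()
}

# Reorder the rows from the highest score to the lowest.
order = sorted(scores, key=scores.get, reverse=True)
sorted_df = df.loc[order]
print(sorted_df)
```

Other reference points are possible, for example scoring every row against the first row only; the sorting step stays the same.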



Then, assign an arbitrary color to the first row (as a whole) and to each of its individual values. Go through the remaining rows one by one and assign either the previous color (if identical) or a new one, both to the row itself and to its values, so that, for instance, c2 in rows e and f has the same color.
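The coloring pass might be sketched as follows. This is one reading of the description, not a definitive implementation: a value keeps the color of the value directly above it when the two are identical, and takes the next palette color otherwise. The palette, the helper name `run_colors`, and the sample data (assumed to be already sorted by similarity) are hypothetical:

```python
from itertools import cycle

import pandas as pd

# Arbitrary palette; any list of CSS colors would do.
PALETTE = ["#a6cee3", "#b2df8a", "#fb9a99", "#fdbf6f", "#cab2d6"]


def run_colors(values, palette=PALETTE):
    """Reuse the previous color while consecutive values are identical, else take a new one."""
    colors = []
    pool = cycle(palette)
    previous = object()  # sentinel that never equals a real value
    for value in values:
        if value != previous:
            current = next(pool)
        colors.append(current)
        previous = value
    return colors


# Hypothetical data, assumed to be already sorted from most to least similar rows.
df = pd.DataFrame(
    {
        "alg1": ["c1", "c1", "c2", "c2", "c2", "c3"],
        "alg2": ["c1", "c1", "c2", "c3", "c3", "c3"],
        "alg3": ["c1", "c1", "c1", "c2", "c2", "c3"],
    },
    index=list("acbefd"),
)

# Color for each row as a whole, based on an identifier built from all columns.
row_ids = ["|".join(row) for row in df.to_numpy()]
row_colors = pd.Series(run_colors(row_ids), index=df.index)

# Color for each individual value, computed column by column.
cell_colors = pd.DataFrame(
    {col: run_colors(df[col]) for col in df.columns}, index=df.index
)
print(row_colors)
print(cell_colors)
```

With this sample, rows e and f are identical, so they share a row color and their c2 values in `alg3` share a cell color.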


And finally, in a Jupyter notebook cell, run:
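One possible way to render such colors in a notebook is the pandas `Styler`. The sketch below assumes a `colors` frame with the same shape and labels as the data; the helper names `to_css` and `style_frame` and the sample values are hypothetical:

```python
import pandas as pd

# Hypothetical data and a matching frame of color codes (same index and columns).
df = pd.DataFrame(
    {"alg1": ["c1", "c1", "c2"], "alg2": ["c1", "c1", "c3"]},
    index=list("acb"),
)
colors = pd.DataFrame(
    {"alg1": ["#a6cee3", "#a6cee3", "#b2df8a"], "alg2": ["#a6cee3", "#a6cee3", "#b2df8a"]},
    index=df.index,
)


def to_css(colors: pd.DataFrame) -> pd.DataFrame:
    """Turn a frame of color codes into a frame of CSS declarations."""
    return colors.apply(lambda col: "background-color: " + col)


def style_frame(df: pd.DataFrame, colors: pd.DataFrame):
    """Return a Styler that paints each cell with its assigned background color."""
    css = to_css(colors)
    # With axis=None the function receives the whole frame and must return
    # a frame of CSS strings with the same index and columns.
    return df.style.apply(lambda _: css, axis=None)


# In a notebook cell, make this the last expression to display the colored table:
# style_frame(df, colors)
```

`Styler` rendering relies on the optional jinja2 dependency, so it needs to be installed in the notebook environment.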



User contributions licensed under: CC BY-SA