PerformanceWarning: dropping on a non-lexsorted multi-index without a level parameter may impact performance. How to get rid of it?

I have the following line of code

It basically filters my MultiIndex df by a specific level-1 column, drops a few unwanted columns, and sums all the others.
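
The exact line isn't shown, so here is a purely hypothetical pipeline of the kind described: select columns by a level-1 label, drop an unwanted level-0 label, and sum the rest. The labels `sales`, `costs`, `misc`, `q1`, `q2` and the numbers are invented for illustration. The sketch only shows that the `drop` step is where the PerformanceWarning comes from when the column MultiIndex is unsorted.

```python
import warnings

import pandas as pd

# Hypothetical data: two-level columns whose level-0 labels are NOT sorted
# (sales, costs, sales, costs, misc).
columns = pd.MultiIndex.from_tuples(
    [("sales", "q1"), ("costs", "q1"), ("sales", "q2"), ("costs", "q2"), ("misc", "q1")]
)
df = pd.DataFrame([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]], columns=columns)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning, even repeats
    # 1) keep only the columns whose level-1 label is "q1"
    q1 = df.loc[:, df.columns.get_level_values(1) == "q1"]
    # 2) drop an unwanted level-0 label (no level given), 3) sum per row
    total = q1.drop(columns="misc").sum(axis=1)

perf = [w for w in caught if issubclass(w.category, pd.errors.PerformanceWarning)]
print(total.tolist())  # [3, 13]
print(len(perf) > 0)   # True
```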

I took a glance at the documentation and a few other questions, but I didn't quite understand what causes the warning. I would also love to rewrite this code so that I get rid of it.

Answer

Let’s try with an example (without data for simplicity):

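A minimal sketch of such a data-free example, assuming hypothetical labels `"a"`, `"b"`, `"x"`, `"y"`, `"z"`: the column MultiIndex is built so that the level-0 label `"b"` appears twice with `"a"` in between, and the DataFrame has no rows.

```python
import pandas as pd

# Hypothetical two-level column labels, deliberately out of order:
# the level-0 label "b" sits at positions 0 and 2, with "a" in between.
columns = pd.MultiIndex.from_tuples([("b", "x"), ("a", "y"), ("b", "z")])
df = pd.DataFrame(columns=columns)  # no rows, only the column index

print(df.columns.tolist())  # [('b', 'x'), ('a', 'y'), ('b', 'z')]
print(df.shape)             # (0, 3)
```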

Clearly, the column MultiIndex is not sorted. We can check it with:

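One way to run that check, sketched with the same kind of hypothetical labels. Older answers often used `df.columns.is_lexsorted()`, which has been deprecated in newer pandas versions; `is_monotonic_increasing` is the public property this sketch relies on instead.

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples([("b", "x"), ("a", "y"), ("b", "z")])
df = pd.DataFrame(columns=columns)

# True only when the column tuples are already in sorted order
unsorted_ok = df.columns.is_monotonic_increasing
sorted_ok = df.sort_index(axis=1).columns.is_monotonic_increasing
print(unsorted_ok, sorted_ok)  # False True
```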

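This matters because pandas performs index lookups and other operations much faster when the index is sorted: it can rely on the sorted order (for example, a binary search that returns a contiguous slice of matching labels) instead of comparing the label against every entry. Indeed, if we try to drop a column without specifying a level, pandas emits the PerformanceWarning: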

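A sketch of that step, assuming hypothetical labels again. It also prints what `get_loc("b")` returns on the unsorted index: a boolean mask over every column rather than a slice, which is the kind of full scan the warning is about. Warnings are captured with the standard `warnings` module so the sketch can report them.

```python
import warnings

import pandas as pd

# Hypothetical, deliberately unsorted column labels
columns = pd.MultiIndex.from_tuples([("b", "x"), ("a", "y"), ("b", "z")])
df = pd.DataFrame(columns=columns)

# The two "b" entries are not adjacent, so the lookup cannot return a slice:
# pandas compares "b" against every entry and returns a boolean mask instead.
loc = df.columns.get_loc("b")
print(loc)  # [ True False  True]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning, even repeats
    dropped = df.drop("b", axis=1)   # no level given

perf = [w for w in caught if issubclass(w.category, pd.errors.PerformanceWarning)]
print(dropped.columns.tolist())  # [('a', 'y')]
print(perf[0].message)
```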

Instead, if we sort the index before dropping, the warning disappears:

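A minimal sketch of sorting first, with the same hypothetical labels. After `sort_index(axis=1)` the `"b"` columns are adjacent, so `get_loc("b")` comes back as a slice and the `drop` call records no PerformanceWarning.

```python
import warnings

import pandas as pd

columns = pd.MultiIndex.from_tuples([("b", "x"), ("a", "y"), ("b", "z")])
df = pd.DataFrame(columns=columns)

# Sorting reorders the columns to ('a','y'), ('b','x'), ('b','z')
df_sorted = df.sort_index(axis=1)

# Now the "b" entries are contiguous, so pandas can locate them with a
# binary search over the level codes and return a slice, not a mask.
loc = df_sorted.columns.get_loc("b")
print(int(loc.start), int(loc.stop))  # 1 3

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    dropped = df_sorted.drop("b", axis=1)

perf = [w for w in caught if issubclass(w.category, pd.errors.PerformanceWarning)]
print(dropped.columns.tolist(), len(perf))  # [('a', 'y')] 0
```

`sort_index` returns a new DataFrame, so in real code you would typically reassign (`df = df.sort_index(axis=1)`) once, before doing repeated drops or lookups.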

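EDIT (see comments): As the warning suggests, this happens when we do not specify the level from which we want to drop the column. Without a level, pandas looks up each label as a key, and on a non-lexsorted index that lookup has to compare the label against every entry of the index (this happens in `MultiIndex.drop`). By specifying the level, pandas instead drops by matching the label against that level's codes, and the warning is not raised: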

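A sketch of passing the level explicitly, assuming the same hypothetical, still unsorted columns. `level=0` is the documented `level` parameter of `DataFrame.drop`; the labels are illustrative.

```python
import warnings

import pandas as pd

columns = pd.MultiIndex.from_tuples([("b", "x"), ("a", "y"), ("b", "z")])
df = pd.DataFrame(columns=columns)  # columns left unsorted on purpose

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # level=0 tells pandas which level "b" belongs to, so it drops by
    # matching that level's codes instead of doing a key lookup per label
    dropped = df.drop("b", axis=1, level=0)

perf = [w for w in caught if issubclass(w.category, pd.errors.PerformanceWarning)]
print(dropped.columns.tolist(), len(perf))  # [('a', 'y')] 0
```

In a select, drop, and sum pipeline like the one described in the question, the same change amounts to passing `level=0` (or whichever level holds the labels being dropped) to the `drop` call.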

However, in general this problem matters more for row indices, since column MultiIndexes are usually much smaller. It is still worth keeping in mind for larger indices and DataFrames. It is particularly relevant for label-based slicing and lookups: in those cases, you want your index to be sorted for better performance.
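
A minimal sketch for the row case, with a hypothetical row MultiIndex and a `value` column. On an unsorted MultiIndex, a label slice such as `df.loc["a":"b"]` does not just run slowly: pandas raises `UnsortedIndexError`. Sorting once up front makes the same slice work.

```python
import pandas as pd

# Hypothetical row MultiIndex whose level-0 labels are out of order
index = pd.MultiIndex.from_tuples([("b", 1), ("a", 2), ("c", 3), ("a", 4), ("b", 5)])
df = pd.DataFrame({"value": [10, 20, 30, 40, 50]}, index=index)

error = None
try:
    df.loc["a":"b"]  # label slice on the unsorted index
except pd.errors.UnsortedIndexError as exc:
    error = exc
print(type(error).__name__)  # UnsortedIndexError

# Sort once; later slices and lookups can then use the sorted order
df_sorted = df.sort_index()
subset = df_sorted.loc["a":"b"]
print(subset["value"].tolist())  # [20, 40, 10, 50]
```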

User contributions licensed under: CC BY-SA