PerformanceWarning: dropping on a non-lexsorted multi-index without a level parameter may impact performance. How to get rid of it?

Question

I have the following line of code. It basically filters my MultiIndex df by a specific level 1 column, drops a few unwanted columns, and sums all the other ones. I took a glance at some of the documentation and at other questions, but I didn't quite understand what causes the warning, and I

Accepted Answer

Let's try with an example (without data for simplicity):

    import pandas as pd

    # Column MultiIndex.
    idx = pd.MultiIndex(levels=[['Col1', 'Col2', 'Col3'], ['subcol1', 'subcol2']],
                        codes=[[2, 1, 0], [0, 1, 1]])
    df = pd.DataFrame(columns=range(len(idx)))
    df.columns = idx
    print(df)

    Col3    Col2    Col1
    subcol1 subcol2 subcol2

Clearly, the column MultiIndex is not sorted. We can check it with:

    print(df.columns.is_monotonic_increasing)

    False

This matters because pandas performs index lookups and other operations much faster on a sorted index: it can use algorithms that rely on the sorted order. Indeed, if we try to drop a column:

    df.drop('Col1', axis=1)

    PerformanceWarning: dropping on a non-lexsorted multi-index without a level parameter may impact performance.
      df.drop('Col1', axis=1)

Instead, if we sort the index before dropping, the warning disappears:

    print(df.sort_index(axis=1))

    # Index is now sorted in lexicographical order.
    Col1    Col2    Col3
    subcol2 subcol2 subcol1

    # No warning here.
    df.sort_index(axis=1).drop('Col1', axis=1)

EDIT (see comments): As the warning suggests, this happens when we do not specify the level from which we want to drop the column. Without a level, pandas has to traverse the whole index to find the matching columns. If we specify the level, no such traversal is needed:

    # Also no warning.
    df.drop('Col1', axis=1, level=0)

In general, though, this problem matters more for row indices, since column multi-indices are usually much smaller. It is still worth keeping in mind for larger indices and dataframes. It is particularly relevant for slicing by index and for lookups: in those cases, you want your index to be sorted for better performance.
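To compare the three variants side by side, here is a minimal sketch that records any PerformanceWarning raised by each drop. The helper name `perf_warnings` and the single row of data are made up for illustration and are not part of the original answer. With the unsorted columns from the example, the first call is the one expected to report the warning, while the sorted and `level=0` calls should stay quiet.

```python
import warnings

import pandas as pd

# Same unsorted column MultiIndex as in the answer: Col3, Col2, Col1.
idx = pd.MultiIndex(levels=[['Col1', 'Col2', 'Col3'], ['subcol1', 'subcol2']],
                    codes=[[2, 1, 0], [0, 1, 1]])
# One illustrative row of data (hypothetical values).
df = pd.DataFrame([[1, 2, 3]], columns=idx)


def perf_warnings(func):
    """Run func() and return (result, list of PerformanceWarnings it raised)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = func()
    hits = [w for w in caught
            if issubclass(w.category, pd.errors.PerformanceWarning)]
    return result, hits


# 1) Drop on the unsorted index, no level given.
dropped_plain, warned_plain = perf_warnings(
    lambda: df.drop('Col1', axis=1))

# 2) Sort the columns first, then drop.
dropped_sorted, warned_sorted = perf_warnings(
    lambda: df.sort_index(axis=1).drop('Col1', axis=1))

# 3) Keep the original order, but tell pandas which level to look at.
dropped_level, warned_level = perf_warnings(
    lambda: df.drop('Col1', axis=1, level=0))

print("plain drop warned: ", bool(warned_plain))
print("sorted drop warned:", bool(warned_sorted))
print("level drop warned: ", bool(warned_level))
```

Note that the two quiet variants do not return identical frames: sorting reorders the remaining columns, while `level=0` keeps them in their original order. If the column order matters downstream, passing `level` is the less intrusive choice.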
