Pandas update index on dataframe view

I have a pandas DataFrame called data with a two-level MultiIndex. I create a view on part of it with smalldata = data.loc[(slice(start,end),slice(None)),:]. Displaying smalldata shows the data I expect, and printing smalldata.index shows the MultiIndex I expect for the values I chose for start and end. However, smalldata.index.levels returns the full set of level values from data, not the subset actually present in smalldata. I cannot find a simple way to get the size of each index level in smalldata. Are there any suggestions? This also looks like a bug to me, and I am considering reporting it on the project's GitHub page.

An example:

I expected smalldata.index.levels and smalldata.index.levshape to reflect the index returned by smalldata.index, but they do not.
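
A minimal sketch that reproduces the behavior described, using made-up data rather than the original frame. The level names outer and inner, the column name value, and the chosen start and end are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data: 4 outer keys x 2 inner keys (not the original frame)
index = pd.MultiIndex.from_product(
    [[1, 2, 3, 4], ["a", "b"]], names=["outer", "inner"]
)
data = pd.DataFrame({"value": range(8)}, index=index)

# Same slicing pattern as in the question
start, end = 2, 3
smalldata = data.loc[(slice(start, end), slice(None)), :]

# The index itself only contains the selected rows
print(smalldata.index.tolist())

# The levels still list every value defined on the parent index
print(smalldata.index.levels[0].tolist())
print(smalldata.index.levshape)
```

Here the sliced index holds only the outer keys 2 and 3, while levels and levshape still describe all four outer keys of data.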

Answer

This behavior is intended, as explained in the pandas user guide:

The MultiIndex keeps all the defined levels of an index, even if they are not actually used. When slicing an index, you may notice this.

This avoids recomputing the levels on every slice, which keeps slicing fast. If you want to see only the values that are actually used, call get_level_values() on the sliced index; it returns the values of one level for the rows that are present.
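
As a sketch of how get_level_values() could answer the "size of each level" question, assuming a small hypothetical frame (the names outer, inner, value and used_sizes are illustrative, not from the question):

```python
import pandas as pd

# Hypothetical two-level frame and slice
index = pd.MultiIndex.from_product(
    [[1, 2, 3, 4], ["a", "b"]], names=["outer", "inner"]
)
data = pd.DataFrame({"value": range(8)}, index=index)
smalldata = data.loc[(slice(2, 3), slice(None)), :]

# Distinct values actually present in the first level of the slice
used_outer = smalldata.index.get_level_values("outer").unique()
print(used_outer.tolist())

# Number of distinct used values per level, analogous to levshape
used_sizes = tuple(
    smalldata.index.get_level_values(i).nunique()
    for i in range(smalldata.index.nlevels)
)
print(used_sizes)
```

This leaves the index untouched and only counts what is present, so it is a reasonable choice when the trimmed sizes are needed once.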

To reconstruct the MultiIndex with only the used levels, the remove_unused_levels() method may be used.
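
A short sketch of remove_unused_levels() on the same kind of slice, again with hypothetical data and names (outer, inner, value, trimmed_data are assumptions):

```python
import pandas as pd

# Hypothetical two-level frame and slice
index = pd.MultiIndex.from_product(
    [[1, 2, 3, 4], ["a", "b"]], names=["outer", "inner"]
)
data = pd.DataFrame({"value": range(8)}, index=index)
smalldata = data.loc[(slice(2, 3), slice(None)), :]

# Work on a copy so the parent frame is not affected
trimmed_data = smalldata.copy()

# remove_unused_levels() returns a new MultiIndex; assign it back
trimmed_data.index = trimmed_data.index.remove_unused_levels()

print(trimmed_data.index.levels[0].tolist())
print(trimmed_data.index.levshape)
```

After the reassignment, levels and levshape describe only the values present in the slice. The method does not modify the index in place, so the result has to be assigned back to the frame.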

User contributions licensed under: CC BY-SA