I have a pandas DataFrame called data with a two-level MultiIndex. I create a view on part of this DataFrame with smalldata = data.loc[(slice(start,end),slice(None)),:]. If I display smalldata, it shows the data I expect. If I print smalldata.index, it shows the MultiIndex I expect based on the values I chose for start and end. But if I print smalldata.index.levels, I instead get the full set of level values from data, not the subset that actually appears in smalldata. I can find no simple way to get the size of the index levels in smalldata. Are there any suggestions? Also, this seems like a bug, which I'm considering reporting on the project's GitHub page, FYI.
An example:
>>> import pandas as pd
>>> import numpy as np
>>> index = [[1,1,1,1,2,2,2,2,3,3,3,3], [1,2,3,4,1,2,3,4,1,2,3,4]]
>>> data = pd.DataFrame(np.random.randn(12,2),index=index)
>>> data
            0         1
1 1  1.001250  1.010419
  2  1.199399 -0.395711
  3  1.098046 -0.241143
  4  0.817590 -0.362434
2 1  1.136975  2.357741
  2 -0.470942 -1.223479
  3 -0.852259 -0.044660
  4  0.380354 -0.214278
3 1  0.609915  0.466289
  2 -1.335292  1.368531
  3 -1.115441 -0.769688
  4 -0.122587 -0.454691
>>> smalldata = data.loc[(slice(None),slice(1,2)),:]
>>> smalldata
            0         1
1 1  1.001250  1.010419
  2  1.199399 -0.395711
2 1  1.136975  2.357741
  2 -0.470942 -1.223479
3 1  0.609915  0.466289
  2 -1.335292  1.368531
>>> data.index
MultiIndex([(1, 1),
            (1, 2),
            (1, 3),
            (1, 4),
            (2, 1),
            (2, 2),
            (2, 3),
            (2, 4),
            (3, 1),
            (3, 2),
            (3, 3),
            (3, 4)],
           )
>>> smalldata.index
MultiIndex([(1, 1),
            (1, 2),
            (2, 1),
            (2, 2),
            (3, 1),
            (3, 2)],
           )
>>> smalldata.index.levels
FrozenList([[1, 2, 3], [1, 2, 3, 4]])
>>> smalldata.index.levshape
(3, 4)
I expected smalldata.index.levels and smalldata.index.levshape to reflect the index returned by smalldata.index, but they do not.
Answer
This behavior is intended, as explained in the pandas user guide:
The MultiIndex keeps all the defined levels of an index, even if they are not actually used. When slicing an index, you may notice this.
This is done to avoid a recomputation of the levels in order to make slicing highly performant. If you want to see only the used levels, you can use the get_level_values() method.
To reconstruct the MultiIndex with only the used levels, the remove_unused_levels() method may be used.
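As a minimal sketch of both approaches applied to the example from the question (the variable names used_outer, used_inner, trimmed, trimmed_levels and trimmed_shape are only illustrative, not part of pandas):

```python
import numpy as np
import pandas as pd

# Same setup as in the question
index = [[1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
         [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]]
data = pd.DataFrame(np.random.randn(12, 2), index=index)
smalldata = data.loc[(slice(None), slice(1, 2)), :]

# Option 1: look at the values actually present in each level
used_outer = smalldata.index.get_level_values(0).unique()
used_inner = smalldata.index.get_level_values(1).unique()
print(len(used_outer), len(used_inner))

# Option 2: rebuild the MultiIndex so its levels hold only used values
trimmed = smalldata.index.remove_unused_levels()
trimmed_levels = [list(level) for level in trimmed.levels]
trimmed_shape = trimmed.levshape
print(trimmed_levels)
print(trimmed_shape)
```

Note that remove_unused_levels() returns a new MultiIndex rather than modifying smalldata.index in place, so smalldata.index.levels itself still reports the full levels afterwards.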