I have a multi-indexed (2 levels) pandas DataFrame called data. I create a view on part of this DataFrame with smalldata = data.loc[(slice(start,end),slice(None)),:]. If I display smalldata, it shows the data I expect. If I print smalldata.index, it shows the multi-index I expect based on the values I chose for start and end. But if I print smalldata.index.levels, I instead get the full index of data, not the subset of index values for smalldata. If I want to know the size of the index levels in smalldata, there is no simple way I can find to do it. Are there any suggestions? Also, this seems like a bug, which I'm considering reporting on the project's GitHub page.
An example:
>>> import pandas as pd
>>> import numpy as np
>>> index = [[1,1,1,1,2,2,2,2,3,3,3,3],
...          [1,2,3,4,1,2,3,4,1,2,3,4]]
>>> data = pd.DataFrame(np.random.randn(12,2),index=index)
>>> data
            0         1
1 1  1.001250  1.010419
  2  1.199399 -0.395711
  3  1.098046 -0.241143
  4  0.817590 -0.362434
2 1  1.136975  2.357741
  2 -0.470942 -1.223479
  3 -0.852259 -0.044660
  4  0.380354 -0.214278
3 1  0.609915  0.466289
  2 -1.335292  1.368531
  3 -1.115441 -0.769688
  4 -0.122587 -0.454691
>>> smalldata = data.loc[(slice(None),slice(1,2)),:]
>>> smalldata
            0         1
1 1  1.001250  1.010419
  2  1.199399 -0.395711
2 1  1.136975  2.357741
  2 -0.470942 -1.223479
3 1  0.609915  0.466289
  2 -1.335292  1.368531
>>> data.index
MultiIndex([(1, 1),
            (1, 2),
            (1, 3),
            (1, 4),
            (2, 1),
            (2, 2),
            (2, 3),
            (2, 4),
            (3, 1),
            (3, 2),
            (3, 3),
            (3, 4)],
           )
>>> smalldata.index
MultiIndex([(1, 1),
            (1, 2),
            (2, 1),
            (2, 2),
            (3, 1),
            (3, 2)],
           )
>>> smalldata.index.levels
FrozenList([[1, 2, 3], [1, 2, 3, 4]])
>>> smalldata.index.levshape
(3, 4)
I expected smalldata.index.levels and smalldata.index.levshape to reflect the index returned by smalldata.index, but they do not.
Answer
This behavior is intended, as explained in the pandas user guide:
The MultiIndex keeps all the defined levels of an index, even if they are not actually used. When slicing an index, you may notice this. This is done to avoid a recomputation of the levels in order to make slicing highly performant. If you want to see only the used levels, you can use the get_level_values() method.
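As a rough sketch of how that applies to the question, the snippet below rebuilds the example data and reads the values actually present at each level of the sliced index. The names used and used_sizes are only illustrative, and the random values differ from the ones printed in the question.

```python
import numpy as np
import pandas as pd

# Same index layout as in the question; the data values are random.
index = [[1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
         [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]]
data = pd.DataFrame(np.random.randn(12, 2), index=index)
smalldata = data.loc[(slice(None), slice(1, 2)), :]

# get_level_values() returns the labels that appear in the sliced index,
# so unique() on it gives only the values that are actually used.
used = [smalldata.index.get_level_values(i).unique()
        for i in range(smalldata.index.nlevels)]

# Per-level sizes of the used values (illustrative stand-in for levshape).
used_sizes = tuple(len(u) for u in used)

print([u.tolist() for u in used])  # [[1, 2, 3], [1, 2]]
print(used_sizes)                  # (3, 2)
```

This leaves smalldata.index untouched, so it is a cheap way to inspect the used values without rebuilding the index.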
To reconstruct the MultiIndex with only the used levels, the remove_unused_levels() method may be used.
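A minimal sketch of that second option on the question's example follows. The names trimmed and trimmed_data are made up for illustration, and copying the frame before reassigning its index is just one cautious way to do it, not something the user guide prescribes.

```python
import numpy as np
import pandas as pd

# Same index layout as in the question; the data values are random.
index = [[1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
         [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]]
data = pd.DataFrame(np.random.randn(12, 2), index=index)
smalldata = data.loc[(slice(None), slice(1, 2)), :]

# Before: levels still describe the full index of data.
print(smalldata.index.levshape)  # (3, 4)

# remove_unused_levels() returns a new MultiIndex; it does not modify in place.
trimmed = smalldata.index.remove_unused_levels()
print([level.tolist() for level in trimmed.levels])  # [[1, 2, 3], [1, 2]]
print(trimmed.levshape)                              # (3, 2)

# Optionally attach the trimmed index to a copy of the sliced frame.
trimmed_data = smalldata.copy()
trimmed_data.index = trimmed
```

After this, trimmed_data.index.levels and trimmed_data.index.levshape match what smalldata.index displays.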