Pandas update index on dataframe view




I have a pandas DataFrame called data with a two-level MultiIndex. I create a view on part of it with smalldata = data.loc[(slice(start,end),slice(None)),:]. Displaying smalldata shows the data I expect, and printing smalldata.index shows the MultiIndex I expect for the values I chose for start and end. However, printing smalldata.index.levels gives the full set of level values from data, not the subset that actually appears in smalldata. I cannot find a simple way to get the size of the index levels in smalldata. Are there any suggestions? This also looks like a bug to me, and I am considering reporting it on the project's GitHub page.

An example:

>>> import pandas as pd
>>> import numpy as np

>>> index = [[1,1,1,1,2,2,2,2,3,3,3,3],
...          [1,2,3,4,1,2,3,4,1,2,3,4]]
>>> data = pd.DataFrame(np.random.randn(12,2),index=index)
>>> data
            0         1
1 1  1.001250  1.010419
  2  1.199399 -0.395711
  3  1.098046 -0.241143
  4  0.817590 -0.362434
2 1  1.136975  2.357741
  2 -0.470942 -1.223479
  3 -0.852259 -0.044660
  4  0.380354 -0.214278
3 1  0.609915  0.466289
  2 -1.335292  1.368531
  3 -1.115441 -0.769688
  4 -0.122587 -0.454691

>>> smalldata = data.loc[(slice(None),slice(1,2)),:]
>>> smalldata
            0         1
1 1  1.001250  1.010419
  2  1.199399 -0.395711
2 1  1.136975  2.357741
  2 -0.470942 -1.223479
3 1  0.609915  0.466289
  2 -1.335292  1.368531
>>> data.index
MultiIndex([(1, 1),
            (1, 2),
            (1, 3),
            (1, 4),
            (2, 1),
            (2, 2),
            (2, 3),
            (2, 4),
            (3, 1),
            (3, 2),
            (3, 3),
            (3, 4)],
           )
>>> smalldata.index
MultiIndex([(1, 1),
            (1, 2),
            (2, 1),
            (2, 2),
            (3, 1),
            (3, 2)],
           )
>>> smalldata.index.levels
FrozenList([[1, 2, 3], [1, 2, 3, 4]])
>>> smalldata.index.levshape
(3, 4)

I expected smalldata.index.levels and smalldata.index.levshape to reflect the index returned by smalldata.index, but they do not.

Answer

This behavior is intended, as explained in the pandas user guide:

The MultiIndex keeps all the defined levels of an index, even if they are not actually used. When slicing an index, you may notice this.

This is done to avoid a recomputation of the levels in order to make slicing highly performant. If you want to see only the used levels, you can use the get_level_values() method.
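As a minimal sketch of that suggestion, the snippet below rebuilds the question's data and smalldata and counts the distinct values actually present in each level. The names used_second and used_sizes are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Same construction as in the question
index = [[1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
         [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]]
data = pd.DataFrame(np.random.randn(12, 2), index=index)
smalldata = data.loc[(slice(None), slice(1, 2)), :]

# Distinct values that actually occur in the second level of the slice
used_second = smalldata.index.get_level_values(1).unique()

# Number of distinct used values per level (illustrative helper name)
used_sizes = [
    smalldata.index.get_level_values(i).nunique()
    for i in range(smalldata.index.nlevels)
]
print(list(used_second))  # [1, 2]
print(used_sizes)         # [3, 2]
```

This leaves smalldata.index itself untouched, so levels and levshape still report the full set of level values from data.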

To reconstruct the MultiIndex with only the used levels, the remove_unused_levels() method may be used.
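For the example in the question, that could look roughly like the following sketch. The names trimmed, trimmed_levels and trimmed_levshape are illustrative and not part of pandas.

```python
import numpy as np
import pandas as pd

# Same construction as in the question
index = [[1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
         [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]]
data = pd.DataFrame(np.random.randn(12, 2), index=index)
smalldata = data.loc[(slice(None), slice(1, 2)), :]

# remove_unused_levels() returns a new MultiIndex; it does not modify
# smalldata.index in place
trimmed = smalldata.index.remove_unused_levels()

trimmed_levels = [list(level) for level in trimmed.levels]
trimmed_levshape = trimmed.levshape
print(trimmed_levels)    # [[1, 2, 3], [1, 2]]
print(trimmed_levshape)  # (3, 2)
```

Because the method returns a new index, smalldata.index.levels keeps showing all four second-level values unless the trimmed index is assigned back to the frame.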



Source: stackoverflow