Skip to content
Advertisement

How do I filter multi-level columns using notnull() in pandas?

I generate a multi-index dataframe that has some NAN values using this:

arrays = [["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],["one", "two", "one", "two", "one", "two", "one", "two"],]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples)
a = np.random.randn(3, 8)
mask = np.random.choice([1, 0], a.shape, p=[.3, .7]).astype(bool)
a[mask] = np.nan
df = pd.DataFrame(a, columns=index)
df

Which will create something like this:

enter image description here

I’d like to get rows of a specific subset of top-level columns (eg df[['baz','qux']]) that have no nulls. For example in df[['baz','qux']] I’d like to get rows 0 and 1 since they both have all nulls in 3.

Hoping things would just work like a normal df I tried:

cols = ['bar','baz']
df[cols].loc[df[cols].notnull()]

But I obviously am missing something:

ValueError: Cannot index with multidimensional key

The pandas documentation for multiindex/advanced indexing illustrates how to index and slic this sort of dataframe but doesn’t seem to have anything regarding .loc/lookups/filtering. So I assume I’m looking in the wrong place. But I am having trouble finding results or resources on this.

Advertisement

Answer

df[cols].notna() is not a 1D boolean mask. You have to reduce the dimension using all or any on axis.

>>> df[df[cols].notna().all(1)]

        bar                 baz                 foo                 qux
        one       two       one       two       one       two       one       two
0  1.799680 -0.901705 -1.575930  0.185863 -0.793007  1.485423       NaN       NaN
2  1.379878 -0.748599  0.661697 -1.015311 -0.858144       NaN -1.623013  0.340043
User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement