How to filter columns containing missing values

I am using the following code:

sns.displot(
    data=df.isna().melt(value_name="missing"),
    y="variable",
    hue="missing",
    multiple="fill",
    height=16
)
plt.show()

to create a heatmap-style chart of the missing values in df. However, since my df has a lot of columns, the chart has to be very tall to fit all of them. I tried changing the data argument to something like this:

data = df[df.columns.values.isna()].isna() or data = df[df.isna().sum() > 0].isna(). Basically, I want to filter the dataframe so that it keeps only the columns with at least one missing value. I looked for a correct answer but couldn't find one.

Answer

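Nearly there. A boolean Series passed directly inside df[...] is treated as a row filter, so the per-column mask has to be applied to df.columns first. To select all columns with at least one missing value, use: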

df[df.columns[df.isna().any()]]
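
As a quick check of what that expression returns, here is a minimal sketch on a toy frame. The column names and values are made up for illustration; df.loc[:, mask] is shown as an equivalent spelling.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: "a" and "c" contain NaN, "b" is complete.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [4, 5, 6],
    "c": [np.nan, np.nan, 9.0],
})

# Boolean Series indexed by column name: True where the column has any NaN.
has_missing = df.isna().any()

# Two equivalent ways to keep only those columns.
subset = df[df.columns[has_missing]]
subset_loc = df.loc[:, has_missing]

print(list(subset.columns))  # ['a', 'c']
```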

Alternatively, count the missing values per column with .sum() and keep only the columns whose count exceeds some threshold:

threshold = 0
df[df.columns[df.isna().sum() > threshold]]

Then append .isna().melt(value_name="missing") to either expression and pass the result as the data argument.
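
Putting the pieces together, one possible sketch of the whole pipeline looks like this. The helper names missing_long and plot_missing, the toy data, and the height formula (roughly 0.3 inches per remaining column) are my own assumptions, not part of the question; adjust them to taste.

```python
import numpy as np
import pandas as pd


def missing_long(df, threshold=0):
    """Return long-format NaN flags for columns with more than `threshold` NaNs."""
    cols = df.columns[df.isna().sum() > threshold]
    # melt() names the column-label column "variable" by default.
    return df[cols].isna().melt(value_name="missing")


def plot_missing(data):
    """Draw the same displot as in the question, sized by the number of columns."""
    # Imported here so the data preparation runs without the plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    n_cols = data["variable"].nunique()
    sns.displot(
        data=data,
        y="variable",
        hue="missing",
        multiple="fill",
        height=max(4, 0.3 * n_cols),  # assumed scaling, not from the question
    )
    plt.show()


# Hypothetical toy frame: "a" has 1 NaN, "b" has none, "c" has 2.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [4, 5, 6],
    "c": [np.nan, np.nan, 9.0],
})

data = missing_long(df)                      # keeps "a" and "c"
data_strict = missing_long(df, threshold=1)  # keeps only "c"
```

Calling plot_missing(data) then draws the chart with only the filtered columns on the y axis.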

User contributions licensed under: CC BY-SA