How to calculate the outliers in a Pandas dataframe while excluding NaN values

I have a pandas dataframe that should look like this.

JavaScript

Some values in this dataframe are outliers. I came across this method of calculating the outliers in every column using the z-score:

JavaScript

My goal is to create a column Is Outlier that holds True for each row with at least one outlier, False for each row with none, and NaN for each row with at least one NaN value. At the same time, I want to keep a count of all the True values.

This is my code so far.

JavaScript

How can I go about doing this?

Answer

If you consider rows containing NaN to be noise, you can drop them before computing the z-score. Because assignment aligns on the index, the dropped rows automatically get NaN when you assign the result:

JavaScript

NB. I used a threshold of 1 for the example here.

Output:

JavaScript
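
As a minimal sketch of the drop-then-assign idea, assuming a small made-up dataframe (the column names A and B and all values are hypothetical, not the asker's data). The z-score is computed by hand with the population standard deviation (ddof=0), which is also the default of scipy.stats.zscore:

```python
import numpy as np
import pandas as pd

# Hypothetical data: column names and values are made up for illustration.
df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0, 100.0, np.nan],
    "B": [10.0, 11.0, 9.0, np.nan, 10.0, 12.0],
})

thresh = 1  # illustrative threshold only; 3 is a more common choice

# Keep only the complete rows, then compute per-column z-scores on them.
clean = df.dropna()
z = (clean - clean.mean()) / clean.std(ddof=0)

# A row is an outlier if any column's |z| reaches the threshold.
is_outlier = z.abs().ge(thresh).any(axis=1)

# Assignment aligns on the index, so the dropped rows end up as NaN.
df["Is Outlier"] = is_outlier

# Count the True values on the boolean Series (NaN rows are not in it).
n_outliers = int(is_outlier.sum())

print(df)
print("Outlier rows:", n_outliers)
```

Counting on the boolean Series before assignment avoids summing an object column that mixes True, False and NaN.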

Alternatively, zscore has a nan_policy='omit' option, but on its own this would not give you NaN in the output. The z-score computation will, however, use all available values, including those from rows that contain a NaN in another column. (This makes no difference to the final result here.)

JavaScript

Output:

JavaScript
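
A hedged sketch of the nan_policy='omit' variant, again on made-up data (columns A and B and their values are hypothetical). Since comparing NaN against the threshold yields False, the rows containing NaN are masked explicitly afterwards; that masking step is an assumption about how to get NaN back, not necessarily the original code:

```python
import numpy as np
import pandas as pd
from scipy.stats import zscore

# Hypothetical data: column names and values are made up for illustration.
df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0, 100.0, np.nan],
    "B": [10.0, 11.0, 9.0, np.nan, 10.0, 12.0],
})

thresh = 1  # illustrative threshold only

# Per-column z-scores; NaN entries are ignored when computing mean and std,
# so every non-NaN value contributes, even in rows with a NaN elsewhere.
z = pd.DataFrame(
    zscore(df.to_numpy(), axis=0, nan_policy="omit"),
    index=df.index,
    columns=df.columns,
)

# NaN >= thresh evaluates to False, so NaN rows would wrongly read as False...
flagged = z.abs().ge(thresh).any(axis=1)

# ...hence rows with at least one NaN are replaced by NaN explicitly.
has_nan = df.isna().any(axis=1)
df["Is Outlier"] = flagged.mask(has_nan)

# Count True values among the complete rows only.
n_outliers = int(flagged[~has_nan].sum())

print(df)
print("Outlier rows:", n_outliers)
```

With this made-up data, row 1 is not flagged, whereas computing the z-scores on the complete rows only would flag it: the column statistics change once values from the NaN rows are included.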
User contributions licensed under: CC BY-SA