Filter PySpark dataframe column with None value

I’m trying to filter a PySpark dataframe that has None as a row value.

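For example, with a column like this (dt_mvmt is an illustrative name and spark an active SparkSession):

```python
from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.getOrCreate()

# Illustrative data: a single date column where one row is None
df = spark.createDataFrame([
    Row(dt_mvmt='2016-03-27'),
    Row(dt_mvmt='2016-03-28'),
    Row(dt_mvmt=None),
])

df.select('dt_mvmt').distinct().collect()
# [Row(dt_mvmt='2016-03-27'), Row(dt_mvmt='2016-03-28'), Row(dt_mvmt=None)]
```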

and I can filter correctly with a string value:

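(The DataFrame, column, and date string below are illustrative, as above.)

```python
# Equality against a concrete string behaves as expected
df[df.dt_mvmt == '2016-03-27'].count()
# 1
```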

but this fails:

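That is, comparing against None directly (same illustrative column):

```python
# Both comparisons return no rows, even though None values exist
df[df.dt_mvmt == None].count()
# 0
df[df.dt_mvmt != None].count()
# 0
```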

But there are definitely values in each category. What’s going on?


Answer

You can use Column.isNull / Column.isNotNull:

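(A minimal sketch, assuming the illustrative df and dt_mvmt from the question.)

```python
from pyspark.sql.functions import col

# Rows where dt_mvmt IS NULL
df.where(col("dt_mvmt").isNull()).show()

# Rows where dt_mvmt IS NOT NULL
df.where(col("dt_mvmt").isNotNull()).show()
```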

If you want to simply drop NULL values, you can use na.drop with the subset argument:

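(Again a sketch with the illustrative df and dt_mvmt.)

```python
# Drop rows where dt_mvmt is NULL; other columns are not considered
df.na.drop(subset=["dt_mvmt"])
```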

Equality-based comparisons with NULL won’t work, because in SQL NULL is undefined, so any attempt to compare it with another value returns NULL:

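(Sketch assuming an active SparkSession named spark.)

```python
# Both comparisons evaluate to NULL rather than True or False
spark.sql("SELECT NULL = NULL, NULL != NULL").show()
```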

The only valid way to compare a value with NULL is IS / IS NOT, which are equivalent to the isNull / isNotNull method calls.
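For example, the same predicate can be written as a SQL expression string (same illustrative names):

```python
# SQL-style predicate string, equivalent to col("dt_mvmt").isNotNull()
df.where("dt_mvmt IS NOT NULL").show()
```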
