
Combining WHEN and aggregation functions

I need to convert this PySpark SQL code sample:

df = df.groupby("id").agg(expr(
"""
CASE
    WHEN last(a) IS NULL THEN first(a)
    ELSE last(a)
    END AS a
"""))

into pure DataFrame code, without a SQL expression. I tried:

df = df.groupby("id").agg(
    when(last("a").isNull, first("a"))
    .otherwise(last("a"))
    .alias("a"))

But it fails with:

TypeError: condition should be a Column

What am I doing wrong? Any suggestion will be appreciated!


Answer

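Call isNull() with parentheses. Without them, last("a").isNull is the method itself rather than a Column, so when() rejects it as a condition: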

from pyspark.sql.functions import first, last, when

df = df.groupby("id").agg(
    when(last("a").isNull(), first("a"))  # isNull() returns a Column
    .otherwise(last("a"))
    .alias("a"))
User contributions licensed under: CC BY-SA