I need to convert this PySpark SQL code sample:
df = df.groupby("id").agg(expr(
"""
CASE
    WHEN last(a) IS NULL THEN first(a)
    ELSE last(a)
    END AS a
"""))
into pure DataFrame code, with no SQL expression. I tried:
df = df.groupby("id").agg(
    when(last("a").isNull, first("a"))
    .otherwise(last("a"))
    .alias("a"))
But it fails with this error:
TypeError: condition should be a Column
What am I doing wrong? Any suggestions would be appreciated!
Answer
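isNull is a method, so it has to be called with parentheses. Written as last("a").isNull, it passes the method object itself to when() rather than a Column, which is what triggers the TypeError. Use isNull() instead: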
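from pyspark.sql.functions import first, last, when

df = df.groupby("id").agg(
    # isNull() is called here, so when() receives a Column condition
    when(last("a").isNull(), first("a"))
    .otherwise(last("a"))
    .alias("a"))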
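
To see why the missing parentheses matter, here is a minimal pure-Python sketch. The Column class and the when function below are hypothetical stand-ins written for illustration, not the real PySpark objects; they only assume that when() rejects any condition that is not a Column, as the error message suggests. The helper try_when is likewise made up for this example.

```python
# Hypothetical stand-ins, NOT the real pyspark classes: they only mimic
# the kind of type check that produces "condition should be a Column".
class Column:
    def __init__(self, name):
        self.name = name

    def isNull(self):
        # Calling the method builds and returns a new Column expression.
        return Column(f"({self.name} IS NULL)")


def when(condition, value):
    # Reject anything that is not a Column, as the error message suggests.
    if not isinstance(condition, Column):
        raise TypeError("condition should be a Column")
    return Column(f"CASE WHEN {condition.name} THEN {value.name} END")


def try_when(condition, value):
    """Return the built expression text, or the error text on failure."""
    try:
        return when(condition, value).name
    except TypeError as exc:
        return f"TypeError: {exc}"


a = Column("a")
without_parens = try_when(a.isNull, a)   # bound method object, not a Column
with_parens = try_when(a.isNull(), a)    # method is called, yields a Column
print(without_parens)
print(with_parens)
```

The first call fails because a.isNull is just a reference to the method, while a.isNull() actually runs it and returns a Column that when() accepts.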
