I need to convert this PySpark SQL code sample:
df = df.groupby("id").agg(expr(
"""
CASE
WHEN last(a) IS NULL THEN first(a)
ELSE last(a)
END AS a
"""))
I want pure DataFrame API code, with no SQL expression. I tried:
df = df.groupby("id").agg(
when(last("a").isNull, first("a"))
.otherwise(last("a"))
.alias("a"))
But it fails with this error:
TypeError: condition should be a Column
What am I doing wrong? Any suggestions would be appreciated!
Answer
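Call isNull() with parentheses to check for nulls (and don't use Python's is None). Without the parentheses, last("a").isNull is a bound method rather than a Column, which is why when() raises the TypeError: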
df = df.groupby("id").agg(
when(last("a").isNull(), first("a"))
.otherwise(last(col("a")))
.alias("a"))