I need to convert this PySpark SQL code sample:
df = df.groupby("id").agg(expr( """ CASE WHEN last(a) IS NULL THEN first(a) ELSE last(a) END AS a """))
Into pure DataFrame API code, without the SQL expression. I tried:
df = df.groupby("id").agg( when(last("a").isNull, first("a")) .otherwise(last("a")) .alias("a"))
But it fails with:
TypeError: condition should be a Column
What am I doing wrong? Any suggestions would be appreciated!
Answer
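Use isNull() to check, not is None. Note the parentheses: last("a").isNull on its own is a method object rather than a Column, which is why when() raises the TypeError. With the call added: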
from pyspark.sql.functions import col, first, last, when
df = df.groupby("id").agg(when(last("a").isNull(), first("a")).otherwise(last(col("a"))).alias("a"))