I need to convert this PySpark SQL code sample:
    df = df.groupby("id").agg(expr(
        """
        CASE
            WHEN last(a) IS NULL THEN first(a)
            ELSE last(a)
        END AS a
        """))
into pure DataFrame code, without a SQL expression. I tried:
    df = df.groupby("id").agg(
        when(last("a").isNull, first("a"))
        .otherwise(last("a"))
        .alias("a"))
But it fails with:
    TypeError: condition should be a Column
What am I doing wrong? Any suggestions would be appreciated!
Answer
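Use isNull() to check for nulls, not is None, and make sure to call it with parentheses. In your attempt, last("a").isNull is a bound method rather than a Column, which is why when() raises the TypeError: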
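    from pyspark.sql.functions import col, first, last, when

    # isNull() must be called; without parentheses it is a bound method, not a Column
    df = df.groupby("id").agg(
        when(last("a").isNull(), first("a"))
        .otherwise(last(col("a")))
        .alias("a"))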