Skip to content

how to use multiple when conditions in pyspark for updating column values

I am looking for a solution where we can use multiple when conditions for updating a column values in pyspark.

I am currently trying to achieve a solution when we have multiple conditions in spark how we can update a column. I have one dataframe in which we have three columns DATE, Flag_values, salary:

df = df.withColumn('Flag_values', F.when(F.col('DATE') < '2019-08-01', 'GIVEHIKE').otherwise(''))

After this I have to update the same column Flag_values now with multiple conditins:

new_column=when((col("Flag_values")=='') & (col("DATE")<'2019-12-15'),'GIVE BONUS ONLY').otherwise('')
df = df.withColumn("Flag_values", new_column)

But when I am running these two queries together the second query is overwriting the first query with data which I dont want. I want data for both query conditions.

The other queries that I have also tried are:

df = df.withColumn('Flag_values', F.when(F.col('DATE') < '2019-08-01', 'GIVEHIKE').otherwise('PYTHON'))

df = df.withColumn('Flag_values', F.when((F.col('DATE') < '2019-12-15') & (F.col('Flag_values') == 'PYTHON') , 'GIVE BONUS').otherwise('JAVA'))

I thought may be otherwise null is creating issue, but again after running the query I am getting output of second query only. Seems like it is overwriting first query data. Is there any approach by which we can achieve this solution?



second query is overwriting the first query

To avoid that you can pass data frame column value in .otherwise So it will replace the value you want and keep the previous value as it is.

  • My sample Dataframe

enter image description here

  • Query I used
df2 = df1.withColumn('Flag_values', when( < '2019-08-01', 'GIVEHIKE').otherwise(''))
new_column=when((df2.Flag_values == '') & ( < '2019-12-15'),'GIVE BONUS ONLY').otherwise(df2.Flag_values)
df3 = df2.withColumn("Flag_values", new_column)
  • Execution enter image description here
User contributions licensed under: CC BY-SA
7 People found this is helpful