
How can I write this pandas logic for a pyspark.sql.dataframe.DataFrame without using the pandas API on Spark?

I’m totally new to PySpark. Since PySpark doesn’t have a loc feature, how can this logic be written? I tried specifying conditions but couldn’t get the desired result. Any help would be greatly appreciated!

JavaScript


Answer

For data like the following:

JavaScript

You’re actually updating the total column in each statement, one after another, rather than in an if-then-else way. Your code can be replicated as is in PySpark using multiple withColumn() calls with when(), like the following.

JavaScript

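We can merge all the withColumn() calls into a single withColumn() with multiple chained when() conditions. A chained when() returns the value of the first condition that matches, so the result is the same as long as the conditions do not overlap or are ordered accordingly.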

JavaScript

This works like numpy.where and SQL’s CASE expressions.

User contributions licensed under: CC BY-SA