
Debugging PySpark udf (lambda function using datetime)

I came across the lambda code line below in PySpark while browsing a long Python Jupyter notebook, and I am trying to understand it. Can you explain what it does as clearly as possible?

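The exact line from the notebook is not reproduced here; as a stand-in, a one-line lambda UDF of the kind the title describes (datetime parsing, with the returnType passed as a type string) might look like this. The format strings and column names are assumptions, not the original code:

```python
from datetime import datetime

# Hypothetical stand-in for the notebook's lambda: parse a date string,
# then reformat it as day/month/year.
fmt = lambda s: datetime.strptime(s, "%Y-%m-%d").date().strftime("%d/%m/%Y")

# In the notebook this would be wrapped for Spark, e.g.:
# from pyspark.sql.functions import udf
# parse_date = udf(fmt, "string")          # returnType given as a DDL string
# df = df.withColumn("formatted", parse_date("raw_date"))
```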


Answer


udf in PySpark wraps a Python function so that it is run for every row of a Spark DataFrame.

Creates a user defined function (UDF).

New in version 1.3.0.

Parameters:

f – python function, if used as a standalone function

returnType – the return type of the user-defined function; either a pyspark.sql.types.DataType object or a DDL-formatted type string

Here the returnType is given as a string. Removing it leaves the function body we're interested in:

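With the returnType dropped, what remains is just the plain lambda. Since the original line isn't shown here, this is a hypothetical example of the shape described (a lambda doing datetime work):

```python
from datetime import datetime

# Hypothetical: the bare lambda, with the returnType argument stripped away.
fn = lambda s: datetime.strptime(s, "%Y-%m-%d").date().strftime("%d/%m/%Y")
```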

To find out what a given lambda function does, turn it into a regular named function. You may need to add the imports it relies on, too.

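For instance, a lambda like the hypothetical one above becomes a named function that can be called and tested directly, outside Spark (the body here is an assumed reconstruction, not the notebook's actual code):

```python
from datetime import datetime  # the lambda relies on this import

# The same logic as a named function, which is easier to test and debug.
def parse_and_format(s):
    return datetime.strptime(s, "%Y-%m-%d").date().strftime("%d/%m/%Y")

parse_and_format("2021-03-05")  # callable directly, no Spark session needed
```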

To really see what's going on, assign each intermediate step to its own variable and print it.

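Expanded this way, each chained call gets its own variable (again a hypothetical reconstruction of the lambda's body, assuming it parses and reformats a date string):

```python
from datetime import datetime

# One variable per step, so every intermediate value can be inspected.
def parse_and_format_debug(s):
    v1 = s                                    # raw input string
    print(v1)
    v2 = datetime.strptime(v1, "%Y-%m-%d")    # parsed into a datetime
    print(v2)
    v3 = v2.date()                            # reduced to a date
    print(v3)
    v4 = v3.strftime("%d/%m/%Y")              # reformatted as a string
    print(v4)
    return v4

parse_and_format_debug("2021-03-05")
```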

This way you see how the result changes after every step, and you can print whatever you need at any point, e.g. print(type(v4)).

User contributions licensed under: CC BY-SA