
Using pyspark.sql.functions without a SparkContext: import problem

I have a situation that can be reduced to an example with two files.

filters.py

from pyspark.sql import functions as F
condition = F.col('a') == 1

main.py

from filters import condition
from pyspark.sql import SparkSession

def main():
    spark = SparkSession.builder.getOrCreate()
    table = spark.table('foo').filter(condition)

It appears that an F.col object cannot be created without an active SparkSession/SparkContext, so the import fails.

Is there any way to keep the filters separated from the other files, and how can I import them?

My situation is a bit more complicated: these filters are used in many different functions across the project, so I can't import them inside every function. I need a way to import them safely into the global namespace.


Answer

You could create conditions as strings:

filters.py

condition = "F.col('a') == 123"

And then use eval to run it as code:

main.py

from pyspark.sql import SparkSession
import pyspark.sql.functions as F
from filters import condition


if __name__ == "__main__":
    spark = SparkSession.builder.getOrCreate()
    data = [
        {"id": 1, "a": 123},
        {"id": 2, "a": 23},
    ]
    df = spark.createDataFrame(data=data)
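    # eval() turns the stored string back into a Column expression (F must be in scope here)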
    df = df.filter(eval(condition))
    df.show()

The result in this example is, as expected:

+---+---+
|  a| id|
+---+---+
|123|  1|
+---+---+
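
As a side note (not part of the original answer; the names below are only illustrative): DataFrame.filter also accepts a plain SQL expression string, so the condition can be kept as a string that needs neither a pyspark import nor eval. A minimal sketch:

filters.py

condition = "a = 123"

main.py

from pyspark.sql import SparkSession
from filters import condition

if __name__ == "__main__":
    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([{"id": 1, "a": 123}, {"id": 2, "a": 23}])
    # filter() interprets the string as a SQL expression
    df.filter(condition).show()

Whether the string holds Python code for eval or a SQL expression for filter, the trade-off is the same: the condition is only checked when it is evaluated against a DataFrame at runtime.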