Skip to content
Advertisement

Tag: apache-spark-sql

Why do I got TypeError: cannot pickle ‘_thread.RLock’ object when using pyspark

I’m using spark to deal with my data, like that: But I got this error from spark: Traceback (most recent call last): File “/private/var/www/http/hawk-scripts/hawk_etl/scripts/spark_rds_to_parquet.py”, line 46, in process() File “/private/var/www/http/hawk-scripts/hawk_etl/scripts/spark_rds_to_parquet.py”, line 36, in process result = spark.sparkContext.parallelize(dataframe_mysql, 1).map(func) File “/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pyspark/python/lib/pyspark.zip/pyspark/context.py”, line 574, in parallelize File “/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pyspark/python/lib/pyspark.zip/pyspark/context.py”, line 611, in _serialize_to_jvm File “/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pyspark/python/lib/pyspark.zip/pyspark/serializers.py”, line 211, in dump_stream File “/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/pyspark/python/lib/pyspark.zip/pyspark/serializers.py”, line 133,

Logical with count in Pyspark

I’m new to Pyspark and I have a problem to solve. I have a dataframe with 4 columns, being customers, person, is_online_store and count: customer PersonId is_online_store count afabd2d2 4 true 1 afabd2d2 8 true 2 afabd2d2 3 true 1 afabd2d2 2 false 1 afabd2d2 4 false 1 I need to create according to the following rules: If PersonId count(column)

Advertisement