Skip to content
Advertisement

Tag: apache-spark

pyspark regex extract all

I have a dataframe like below. I am trying to extract the next word after function or var My code is here. as it is capture only one word, the final row returns only AWS and not Twitter. So I would like to capture all matching. My spark version is less than 3, so I tried df.withColumn(‘output’, f.expr(“regexp_extract_all(js, ‘(func)s+(w+)|(var)s+(w+)’, 4)”)).show()

PySpark create a json string by combining columns

I have a dataframe. I would like to perform a transformation that combines a set of columns and stuff into a json string. The columns to be combined is known ahead of time. The output should look like something below. Is there any sugggested method to achieve this? Appreciate any help on this. Answer You can create a struct type

Logical with count in Pyspark

I’m new to Pyspark and I have a problem to solve. I have a dataframe with 4 columns, being customers, person, is_online_store and count: customer PersonId is_online_store count afabd2d2 4 true 1 afabd2d2 8 true 2 afabd2d2 3 true 1 afabd2d2 2 false 1 afabd2d2 4 false 1 I need to create according to the following rules: If PersonId count(column)

Advertisement