pyspark regex extract all

I have a dataframe like below.

JavaScript

I am trying to extract the word that follows function or var.

My code is here.

JavaScript

Because it captures only one match per row, the final row returns only AWS and not Twitter.

So I would like to capture all matches.

My Spark version is less than 3,

so I tried df.withColumn('output', f.expr("regexp_extract_all(js, '(func)\\s+(\\w+)|(var)\\s+(\\w+)', 4)")).show()

but it returns only empty results for all rows.

My expected output is:

JavaScript

Answer

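You need to use four backslashes (\\\\) for each regex escape such as \s or \w. The pattern is unescaped twice, once by Python and once by the Spark SQL parser, before it reaches the regex engine.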

JavaScript
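To see why four backslashes are needed, here is a minimal pure-Python sketch of the two unescaping layers. It does not run Spark: the SQL parser step is simulated with a simple string replace (a simplification that assumes Spark's default string-literal escaping), and re.findall stands in for regexp_extract_all. The pattern and the sample string are hypothetical illustrations, not the data from the question.

```python
import re

# What you type inside the Python source: four backslashes per regex escape.
typed = "(?:function|var)\\\\s+(\\\\w+)"

# Layer 1: Python's string-literal parser halves them, so the string
# handed to Spark SQL contains two backslashes per escape.
assert typed == r"(?:function|var)\\s+(\\w+)"

# Layer 2 (simulated): with default settings, the Spark SQL parser treats
# backslash as an escape character inside string literals and halves them
# again. This replace only handles the doubled-backslash case.
regex_seen_by_engine = typed.replace("\\\\", "\\")
assert regex_seen_by_engine == r"(?:function|var)\s+(\w+)"

# Hypothetical sample input, not the question's data.
sample_js = "var AWS = require('aws'); function Twitter() {}"

# re.findall stands in for regexp_extract_all(js, pattern, 1):
# it returns group 1 of every match, not just the first one.
matches = re.findall(regex_seen_by_engine, sample_js)
print(matches)  # ['AWS', 'Twitter']
```

With only two backslashes in the Python source, the SQL parser would reduce \s to a plain s, so the engine would look for literal s and w characters and find nothing. In Spark itself the equivalent call would look something like f.expr("regexp_extract_all(js, '(?:function|var)\\\\s+(\\\\w+)', 1)"); as far as I know, regexp_extract_all is only available from Spark 3.1 onward.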
User contributions licensed under: CC BY-SA