Filter pyspark DataFrame by string match

I would like to check for substring matches between the comments column and the keyword column, and find out whether any of the keywords is present in that particular row.

input

JavaScript

expected output

JavaScript


Answer

The most efficient approach here is to loop over the rows; for each row you can use a set intersection between the words of the comment and the keywords:

JavaScript

Output:

JavaScript

Used input:

JavaScript
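
As a rough illustration of the set-intersection idea, here is a minimal sketch in plain Python. The function name `matched_keywords` and the sample rows are hypothetical (they are not the input used above), and the sketch assumes that comments are whitespace-separated text and that the keywords are available as a list of strings.

```python
def matched_keywords(comment, keywords):
    """Return the keywords that occur as whole words in the comment.

    Assumption: words are separated by whitespace and matching is
    case-insensitive. Punctuation is not stripped in this sketch.
    """
    words = set(comment.lower().split())
    wanted = {k.lower() for k in keywords}
    # Set intersection keeps only the words present on both sides.
    return sorted(words & wanted)


# Hypothetical sample rows: (comment, keywords)
rows = [
    ("the account was activated today", ["activated", "closed"]),
    ("nothing to report", ["activated", "closed"]),
]

for comment, keywords in rows:
    hits = matched_keywords(comment, keywords)
    # bool(hits) is the per-row "any keyword present" flag.
    print(comment, "->", bool(hits), hits)
```

On a PySpark DataFrame, a function like this could be applied per row, for example by wrapping it in a UDF, and the resulting flag used to filter. Note that a set intersection only finds whole-word matches.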

With a minor variation, you could check for substring matches instead of whole-word matches (“activ” would match “activateds”):

JavaScript

Output:

JavaScript
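
The substring variant can be sketched in the same way. Again, `has_substring_match` and the sample strings are hypothetical, and the sketch assumes case-insensitive matching over a list of keyword strings.

```python
def has_substring_match(comment, keywords):
    """Return True if any keyword occurs anywhere inside the comment.

    Assumption: matching is case-insensitive. Empty keywords are
    skipped, because an empty string is a substring of every text.
    """
    text = comment.lower()
    return any(k and k.lower() in text for k in keywords)


# Hypothetical examples: "activ" is found inside "activated".
print(has_substring_match("the account was activated", ["activ"]))
print(has_substring_match("nothing to report", ["activ"]))
```

Compared with the set intersection, this checks every keyword against the full comment text, so it also matches partial words.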
User contributions licensed under: CC BY-SA