
pyspark – filter rows containing a set of special characters

I have a data frame as follows:

df =
    a       b
    goat*   bat
    ki^ck   ball
    range@  kick
    rick?   kill

Now I want to find the total count of special characters present in each column, so I used the contains function. The code runs, but it does not find the special characters.

Code:
special = df.filter(df['a'].contains('[!@$^&-_;:?.#*]'))
print(special.count())

Output: 0
Expected output: 4

Advertisement

Answer

You may want to use rlike instead of contains: contains checks for a literal substring, while rlike matches against a regular expression.

df.filter(df['a'].rlike('[!@$^&-_;:?.#*]')).count()
# 4
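One caveat about the pattern itself: inside a character class an unescaped hyphen between two characters defines a range, so &-_ covers every ASCII code from & to _, which includes the digits and uppercase letters. Escaping it (\-) or moving it to the end of the class avoids accidental matches. Below is a minimal, self-contained sketch of the same approach that also counts matches for every column; the SparkSession setup and the per-column loop are illustrative assumptions, not part of the original answer.

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("special-chars").getOrCreate()

# Rebuild the example data frame from the question.
df = spark.createDataFrame(
    [("goat*", "bat"), ("ki^ck", "ball"), ("range@", "kick"), ("rick?", "kill")],
    ["a", "b"],
)

# Hyphen escaped so the class matches "-" literally instead of forming a range.
pattern = r"[!@$^&\-_;:?.#*]"

# Count the rows containing at least one special character, per column.
counts = {c: df.filter(F.col(c).rlike(pattern)).count() for c in df.columns}
print(counts)  # {'a': 4, 'b': 0}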