Remove words that match on their first 3 or 4 characters

words_to_remove  = ['sstlgh8j', 'abchjk9j']

I need to remove the words in a sentence that start with sst or abc.

I have sentences like these:

1) error in node occurred in sstlgh8j at 10pm afterabchjk9j after 12pm
2) error in node occurredsstlgh8j at 10pm after abchjk9j after 12pm

I need to remove those words from the two sentences above. I tried re.sub, but it does not work:

re.sub(r'(?:\s)sst[, ]*', '', my_string)

It only removes the word when it is preceded by a space.

Desired output:
    1) error in node occurred in at 10pm after 12pm
    2) error in node occurred at 10pm after 12pm

Answer

You can use

my_string = re.sub(r'\s*(?:abc|sst)\w*', '', my_string)

Details:

  • \s* – zero or more whitespace chars
  • (?:abc|sst) – either abc or sst
  • \w* – zero or more word chars (letters, digits, underscores). Replace with [^\W\d_]* to match only Unicode letters, or with [a-zA-Z]* to match only ASCII letters.
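
To make the difference between the three tail patterns concrete, here is a small sketch. The sample string and variable names are made up for illustration; only the patterns come from the list above.

```python
import re

# Hypothetical sample: the unwanted token contains a digit ("8").
sample = "node sstlgh8j down"

# \w* consumes letters, digits and underscores, so the whole token is removed.
whole_token = re.sub(r'\s*(?:abc|sst)\w*', '', sample)

# [a-zA-Z]* stops at the first non-letter, so "8j" is left behind.
letters_only = re.sub(r'\s*(?:abc|sst)[a-zA-Z]*', '', sample)

# [^\W\d_]* also matches letters only (including non-ASCII ones).
unicode_letters = re.sub(r'\s*(?:abc|sst)[^\W\d_]*', '', sample)

print(whole_token)      # node down
print(letters_only)     # node8j down
print(unicode_letters)  # node8j down
```

Since the words in the question contain digits, \w* is probably the variant you want here.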

See a Python demo:

import re

texts = [
    'error in node occurred in sstlgh8j at 10pm afterabchjk9j after 12pm',
    'error in node occurredsstlgh8j at 10pm after abchjk9j after 12pm',
]
# Optional whitespace, then "abc" or "sst", then the rest of the word
rx = re.compile(r'\s*(?:abc|sst)\w*')
for mystring in texts:
    print(rx.sub('', mystring))

# => error in node occurred in at 10pm after after 12pm
#    error in node occurred at 10pm after after 12pm
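
If the prefixes should come from the words_to_remove list in the question rather than being hard-coded, the pattern could be built along these lines. This is only a sketch: build_prefix_pattern and its prefix_len parameter are hypothetical names, and it assumes the first 3 characters of each word are the prefix to match.

```python
import re

words_to_remove = ['sstlgh8j', 'abchjk9j']

def build_prefix_pattern(words, prefix_len=3):
    """Hypothetical helper: compile a regex from the leading characters of each word."""
    # Deduplicate and sort the prefixes so the pattern is deterministic.
    prefixes = sorted({w[:prefix_len] for w in words})
    # re.escape guards against prefixes that contain regex metacharacters.
    alternation = '|'.join(re.escape(p) for p in prefixes)
    return re.compile(r'\s*(?:' + alternation + r')\w*')

rx = build_prefix_pattern(words_to_remove)
cleaned = rx.sub('', 'error in node occurredsstlgh8j at 10pm after abchjk9j after 12pm')
print(cleaned)  # error in node occurred at 10pm after after 12pm
```

Note that, like the pattern above, this matches the prefix anywhere, including in the middle of a longer word, which is what the glued examples (occurredsstlgh8j) require. It will also trim unrelated words that happen to contain abc or sst.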