regex whole string match between numbers

I want to extract a whole word from a sentence. Thanks to this answer,

import re

def findWholeWord(w):
    return re.compile(r'\b({0})\b'.format(w), flags=re.IGNORECASE).search

I can get whole words in cases like:

findWholeWord('thomas')('this is Thomas again')   # -> <match object>
findWholeWord('thomas')('this is,Thomas again')   # -> <match object>
findWholeWord('thomas')('this is,Thomas, again')  # -> <match object>
findWholeWord('thomas')('this is.Thomas, again')  # -> <match object>
findWholeWord('thomas')('this is ?Thomas again')  # -> <match object>

where punctuation next to the word doesn't get in the way.

However, if there's a digit directly next to the word, the word isn't found.

How should I modify the expression to match cases where there’s a number next to the word? Like:

findWholeWord('thomas')('this is 9Thomas, again')
findWholeWord('thomas')('this is9Thomas again')
findWholeWord('thomas')('this is Thomas36 again')

Answer

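\b fails here because digits count as word characters, so there is no word boundary between a digit and a letter. You can use the regexp (?:\d|\b){0}(?:\d|\b) to match the target word with either a word boundary or a digit on each side of it.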

import re

def findWholeWord(w):
    return re.compile(r'(?:\d|\b){0}(?:\d|\b)'.format(w), flags=re.IGNORECASE).search

for s in [
    'this is thomas',
    'this is Thomas again',
    'this is,Thomas again',
    'this is,Thomas, again',
    'this is.Thomas, again',
    'this is ?Thomas again',
    'this is 9Thomas, again',
    'this is9Thomas again',
    'this is Thomas36 again',
    'this is 1Thomas2 again',
    'this is -Thomas- again',
    'athomas is no match',
    'thomason no match']:
    print("match >" if findWholeWord('thomas')(s) else "*no match* >", s)

Output:

match > this is thomas
match > this is Thomas again
match > this is,Thomas again
match > this is,Thomas, again
match > this is.Thomas, again
match > this is ?Thomas again
match > this is 9Thomas, again
match > this is9Thomas again
match > this is Thomas36 again
match > this is 1Thomas2 again
match > this is -Thomas- again
*no match* > athomas is no match
*no match* > thomason no match
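
To see why the plain \b version misses these cases, and what the digit-or-boundary pattern actually returns, here is a small check. The variable names are only illustrative. Note that the matched text includes the neighbouring digit, which matters if you use the match's text rather than just testing for a match.

```python
import re

# \b only fires between a word character and a non-word character.
# Digits are word characters (\w), so there is no boundary between '9' and 'T'.
plain = re.compile(r'\bthomas\b', flags=re.IGNORECASE)
no_boundary = plain.search('this is 9Thomas, again')

# The digit-or-boundary pattern matches, but \d consumes the adjacent digit.
digit_or_boundary = re.compile(r'(?:\d|\b)thomas(?:\d|\b)', flags=re.IGNORECASE)
leading = digit_or_boundary.search('this is 9Thomas, again').group()
trailing = digit_or_boundary.search('this is Thomas36 again').group()

print(no_boundary)  # None
print(leading)      # 9Thomas
print(trailing)     # Thomas3
```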

If you want to reuse the same target word against multiple inputs or in a loop, assign the result of findWholeWord() to a variable and call that.

matcher = findWholeWord('thomas')
print(matcher('this is Thomas again'))
print(matcher('this is,Thomas again'))
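
If you need the matched text to be just the word, without a neighbouring digit, one possible variant (not part of the answer's pattern) is to use lookarounds instead of consuming the digit. The sketch below assumes a hypothetical helper named find_word_allow_digits; it also passes the word through re.escape, on the assumption that the target may contain regex metacharacters.

```python
import re

def find_word_allow_digits(w):
    # Hypothetical helper, shown only as a sketch.
    # [^\W\d] matches a word character that is not a digit (a letter or "_").
    # The lookarounds reject such neighbours without consuming any text.
    pattern = r'(?<![^\W\d]){0}(?![^\W\d])'.format(re.escape(w))
    return re.compile(pattern, flags=re.IGNORECASE).search

matcher = find_word_allow_digits('thomas')
print(matcher('this is 9Thomas, again').group())  # Thomas
print(matcher('this is Thomas36 again').group())  # Thomas
print(matcher('athomas is no match'))             # None
print(matcher('thomason no match'))               # None
```

This accepts and rejects the same inputs as the digit-or-boundary pattern, since "boundary or digit" on each side is the same as "not a letter or underscore" on that side.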
User contributions licensed under: CC BY-SA