Python – Fast count words in text from list of strings and that start with

Question

I know that similar questions have been asked several times, but my problem is a bit different and I am looking for a time-efficient solution, in Python. I have a set of words, some of them end with the "*" and some others don't: I have to count their total occurrences in a text, considering…

Accepted Answer

You can do this with a regex built from the set of words: put word boundaries (\b) around each word, but leave the trailing word boundary off words that end with *. Compiling the regex once should help performance:

    import re

    words = set(["apple", "cat*", "dog"])
    text = "My cat loves apples, but I never ate an apple. My dog loves them less than my CATS"

    # \b anchors at a word boundary; words ending in * drop the trailing \b so they match as prefixes
    regex = re.compile('|'.join([r'\b' + w[:-1] if w.endswith('*') else r'\b' + w + r'\b' for w in words]), re.I)
    matches = regex.findall(text)
    print(len(matches))

Output:

    4
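If a per-word breakdown is also useful, the same idea can be wrapped in a small helper. This is only a sketch under a few assumptions: the names `build_pattern` and `count_words` are hypothetical (they are not part of the answer above), every word starts with a word character so that `\b` can anchor it, and wildcard words are extended with `\w*` so the whole matched word is returned rather than just the prefix. `re.escape` is added in case a word contains regex metacharacters.

```python
import re
from collections import Counter


def build_pattern(words):
    """Compile one case-insensitive regex from plain and '*'-suffixed words."""
    parts = []
    for w in words:
        if w.endswith("*"):
            # Prefix word: no trailing \b; \w* consumes the rest of the word
            parts.append(r"\b" + re.escape(w[:-1]) + r"\w*")
        else:
            # Plain word: must match as a whole word
            parts.append(r"\b" + re.escape(w) + r"\b")
    return re.compile("|".join(parts), re.IGNORECASE)


def count_words(words, text):
    """Return a Counter mapping each matched word (lowercased) to its count."""
    pattern = build_pattern(words)
    return Counter(m.lower() for m in pattern.findall(text))


words = {"apple", "cat*", "dog"}
text = "My cat loves apples, but I never ate an apple. My dog loves them less than my CATS"

counts = count_words(words, text)
total = sum(counts.values())
print(total)
```

Summing the counter gives the same total as `len(matches)` in the answer, while the individual entries show which forms were actually found (for example, "cat" and "cats" are counted separately).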


Answer