I have the following list:
pre = ["unable to", "would not", "was not", "did not", "there is not", "could not", "failed to"]
In a dataframe column, I want to find the texts that contain any phrase from this list and generate a new column holding that phrase together with the word that follows it. For example, if a cell contains the text WOULD NOT PRIME CORRECTLY DURING VIRECTOMY., the new column should contain WOULD NOT PRIME.
I have tried something like this:
def matcher(Event_Description):
    for i in pre:
        if i in Event_Description:
            return i + 1
    return "Not found"
Answer
You can loop over the prefixes in the list and search for each one in the lowercased text using .find(). If a prefix is found, convert it to the case of event and append the word that follows it. Like this:
def matcher(event):
    pres = ["unable to", "would not", "was not", "did not", "there is not", "could not", "failed to"]
    for pre in pres:
        i = event.lower().find(pre)
        if i != -1:
            # Take a slice ([:1]), not an index ([0]): unpacking a single
            # string with * would spread it into separate characters.
            return ' '.join([pre.upper() if event.isupper() else pre, *event[i + len(pre):].split()[:1]])
    return "Not found"
If you want to include the next two words, just change this line:
return ' '.join([pre.upper() if event.isupper() else pre, *event[i + len(pre):].split()[:1]])
to a wider slice like this:
return ' '.join([pre.upper() if event.isupper() else pre, *event[i + len(pre):].split()[:2]])
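As an alternative, the same lookup can be sketched with a single regular expression from the standard re module. This is only a sketch under a few assumptions: the names PREFIXES, PATTERN and match_with_next_word are made up for illustration, the "next word" is taken to be the next run of non-whitespace characters (so trailing punctuation stays attached), and the match keeps the casing of the original text. Unlike the .find() loop, it returns the prefix that occurs earliest in the text and requires word boundaries, so "was not" does not match inside "was nothing".

```python
import re

# Same phrases as in the question (hypothetical constant name).
PREFIXES = ["unable to", "would not", "was not", "did not",
            "there is not", "could not", "failed to"]

# One alternation of all prefixes, optionally followed by whitespace
# and the next run of non-whitespace characters.
PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in PREFIXES) + r")\b(?:\s+\S+)?",
    re.IGNORECASE,
)

def match_with_next_word(text):
    """Return the first prefix found plus the following word, or "Not found"."""
    m = PATTERN.search(text)
    return m.group(0) if m else "Not found"

print(match_with_next_word("WOULD NOT PRIME CORRECTLY DURING VIRECTOMY."))
```

To fill a new column, either function can be passed to the column's .apply() method, for example df["Event_Description"].apply(match_with_next_word), assuming the dataframe is called df and the text column is named Event_Description as in the question's function parameter.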