I’m trying to scrape a portion of text out of a long text using regex.
Original text: If you have any questions or concerns, you may contact us at kaieldentsome [!at] gmail.com. You can also follow us on fb
Portion I’m interested in: kaieldentsome [!at] gmail.com.
The phrase "contact us at" will not always be present in the text.
I’ve tried with:
    import re

    item_str = 'If you have any questions or concerns, you may contact us at kaieldentsome [!at] gmail.com. You can also follow us on fb'
    output = re.findall(r"(?<=\s).*?\s\[!at\].*?\s.*?\s", item_str)[0]
    print(output)
Output I wish to get:
    kaieldentsome [!at] gmail.com.
Answer
You could use
    (?<=\s)\S+\s\[!at]\s\S+\.\S+
(?<=\s)  Positive lookbehind, assert a whitespace char to the left
\S+  Match 1+ non-whitespace chars
\s\[!at]\s  Match [!at] between whitespace chars
\S+\.\S+  Match 1+ non-whitespace chars with at least one dot
Note that a whitespace char has to be present to the left of the match. If that is not mandatory, you can omit (?<=\s)
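As a rough sketch of how the pattern could be applied to the string from the question: the variable names below (pattern, pattern_no_lookbehind, start_str) are only illustrative, and re.search is used instead of findall(...)[0] so that a missing match gives None rather than an IndexError.

```python
import re

item_str = (
    "If you have any questions or concerns, you may contact us at "
    "kaieldentsome [!at] gmail.com. You can also follow us on fb"
)

# Whitespace to the left, a non-whitespace run, the literal "[!at]"
# between whitespace chars, then a non-whitespace run containing a dot.
pattern = re.compile(r"(?<=\s)\S+\s\[!at]\s\S+\.\S+")

match = pattern.search(item_str)
output = match.group(0) if match else None
print(output)  # kaieldentsome [!at] gmail.com.

# Hypothetical input where the address is at the very start of the text:
# the lookbehind finds no whitespace to the left, so nothing matches.
start_str = "kaieldentsome [!at] gmail.com"
lookbehind_match = pattern.search(start_str)
print(lookbehind_match)  # None

# Same pattern with the lookbehind omitted, which also matches at the start.
pattern_no_lookbehind = re.compile(r"\S+\s\[!at]\s\S+\.\S+")
start_match = pattern_no_lookbehind.search(start_str)
start_output = start_match.group(0) if start_match else None
print(start_output)  # kaieldentsome [!at] gmail.com
```

The trailing dot in the first result is part of the match because the final \S+ keeps consuming non-whitespace characters until it reaches the space after "gmail.com.".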