I need to extract emails from random text strings. For example:
s = 'Application for training - customer@gmail.com Some notes'
I worked out how to find the end of the email:
email_end = s.find('.com')+4
But how can I find its start index? Maybe we could reverse the string and find the first ' ' after the @, but how would we do that?
Answer
Here is a somewhat unconventional approach that avoids regular expressions: reverse the string.
s = 'Application for training - customer@gmail.com Some notes'
s_rev = s[::-1]
# Now you are looking for "moc.", and this is the starting point:
s_rev.find("moc.")  # -> 11
# Then search for the next space after this index:
s_rev.find(" ", 11)  # -> 29
# Then take the email from the reversed string:
s_rev[11:29]  # -> 'moc.liamg@remotsuc'
# Finally, reverse it back:
s_rev[11:29][::-1]  # -> 'customer@gmail.com'
As a one-liner:
s[::-1][s[::-1].find("moc."):s[::-1].find(" ", s[::-1].find("moc."))][::-1]
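Note that the second find looks for the first space after the address in the reversed string, which is the space before the address in the original, as in the example you gave. You might ask what happens if the string ends with the email. That's fine: the reversed string then simply starts with "moc.", and the space is still found. If the string starts with the email, however, there is no such space and find returns -1. Used as a slice end, -1 stops one character short of the end of the reversed string, so the first character of the address is lost; use len(s_rev) as the slice end in that case. The other exception is a non-space character (e.g., a comma or a parenthesis) directly before the address, which ends up in the result.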
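The steps above can be wrapped in a small helper. This is only a sketch: the function name `extract_email` and the `suffix` parameter are made up for illustration, and it assumes the address ends in `.com` and is separated from preceding text by a space. Because it searches the reversed string, it picks up the last `.com` in the original. It also handles the case where `find` returns -1 for the space, since a slice end of -1 would drop one character.

```python
def extract_email(s, suffix=".com"):
    """Return the address around the last `suffix` in s, or None if absent."""
    s_rev = s[::-1]
    # The reversed suffix ("moc.") marks where the address begins in s_rev.
    start = s_rev.find(suffix[::-1])
    if start == -1:
        return None
    # The next space in s_rev is the space before the address in s.
    end = s_rev.find(" ", start)
    if end == -1:
        # No space before the address: it runs to the end of s_rev.
        end = len(s_rev)
    return s_rev[start:end][::-1]


print(extract_email('Application for training - customer@gmail.com Some notes'))
```

Punctuation directly before the address (for example an opening parenthesis) would still end up in the result; a regular expression is probably the more robust choice for messy input.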