I am using this code:
    import imaplib

    mail = imaplib.IMAP4_SSL('imap.gmail.com')
    mail.login(myusername, mypassword)
    mail.list()  # Out: list of "folders" aka labels in gmail.
    mail.select("inbox")  # connect to inbox.

    result, data = mail.search(None, "ALL")
    ids = data[0]  # data is a list.
    id_list = ids.split()  # ids is a space separated string
    latest_email_id = id_list[-1]  # get the latest

    # fetch the email body (RFC822) for the given ID
    result, data = mail.fetch(latest_email_id, "(RFC822)")
    raw_email = data[0][1]  # here's the body, which is raw text of the whole email,
                            # including headers and alternate payloads
    print(raw_email)
It works, except that when I print raw_email it returns a bunch of extra information. How can I parse out the extra information, so to speak, and get just the From and the body text?
Answer
Python’s email package is probably a good place to start.
    import email

    # On Python 3, raw_email is bytes: use email.message_from_bytes(raw_email) instead
    msg = email.message_from_string(raw_email)
    print(msg['From'])
    print(msg.get_payload(decode=True))
That should do what you ask, though when an email has multiple parts (attachments, text and HTML versions of the body, etc.) things are a bit more complicated.
In that case, msg.is_multipart() will return True and msg.get_payload() will return a list instead of a string. There's a lot more information in the email.message documentation.
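As a rough sketch of the multipart case, the following walks the message parts and returns the first text/plain part that is not an attachment. It assumes Python 3, where the fetched message is bytes, so email.message_from_bytes is used in place of message_from_string. The helper name extract_from_and_text and the sample message are made up for illustration, and falling back to UTF-8 when no charset is declared is an assumption, not something the email package mandates.

```python
import email
from email.message import EmailMessage


def extract_from_and_text(raw_bytes):
    """Return (From header, plain-text body or None) for a raw RFC 822 message."""
    msg = email.message_from_bytes(raw_bytes)
    sender = msg["From"]

    if msg.is_multipart():
        # walk() visits the message itself and every nested part, depth first
        for part in msg.walk():
            is_attachment = part.get_content_disposition() == "attachment"
            if part.get_content_type() == "text/plain" and not is_attachment:
                payload = part.get_payload(decode=True)
                # Assumption: fall back to UTF-8 if the part declares no charset
                charset = part.get_content_charset() or "utf-8"
                return sender, payload.decode(charset, errors="replace")
        return sender, None  # no text/plain part found

    # Single-part message: the payload is the body itself
    payload = msg.get_payload(decode=True)
    charset = msg.get_content_charset() or "utf-8"
    return sender, payload.decode(charset, errors="replace")


# Hypothetical sample message standing in for raw_email
sample = EmailMessage()
sample["From"] = "Alice <alice@example.com>"
sample["Subject"] = "Hello"
sample.set_content("Plain body")
sample.add_alternative("<p>HTML body</p>", subtype="html")

sender, body = extract_from_and_text(sample.as_bytes())
```

With the question's code, this would be called as extract_from_and_text(raw_email). HTML-only messages return None for the body here; handling those would need an extra branch.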
Alternatively, rather than parsing the raw RFC822-formatted message (which could be very large if the email contains attachments), you could ask the IMAP server for just the information you want. Change your mail.fetch line to:
mail.fetch(latest_email_id, "(BODY[HEADER.FIELDS (FROM)])")
This requests (and returns) only the From line of the email from the server. Likewise, setting the second parameter to "(UID BODY[TEXT])" returns the body of the email. RFC 2060 (since superseded by RFC 3501) has a list of parameters that should be valid here.
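The partial fetch still comes back wrapped in imaplib's response structure, so a small helper can pull the header value out. This is only a sketch, assuming Python 3: header_from_fetch is a hypothetical name, and sample_data is a hand-written stand-in for the usual shape of the data list (a tuple of envelope and literal, followed by a closing b")"), not captured server output.

```python
import email


def header_from_fetch(data, name="From"):
    """Pull one header value out of the `data` list returned by IMAP4.fetch()."""
    for item in data:
        # Literal responses arrive as (envelope, payload) tuples;
        # the trailing b")" is a bare bytes object and is skipped.
        if isinstance(item, tuple):
            headers = email.message_from_bytes(item[1])
            return headers[name]
    return None


# Assumed shape of a reply to:
#   mail.fetch(latest_email_id, "(BODY[HEADER.FIELDS (FROM)])")
sample_data = [
    (b"1 (BODY[HEADER.FIELDS (FROM)] {35}", b"From: Alice <alice@example.com>\r\n\r\n"),
    b")",
]

sender = header_from_fetch(sample_data)
```

Against a live connection, the second value returned by mail.fetch would be passed in place of sample_data.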