
How to search for email where the subject contains numbers

I’m looking for emails whose subject line says how much Bitcoin I received. Because the subject contains the amount, I want a way to find only the emails where that amount is equal to or greater than a given number.

For example, one subject is “You received 0.000666703 BTC”. I want to find emails with this subject or with a larger amount, such as “You received 0.002719281 BTC”, but not “You received 0.000028181 BTC”, because that amount is smaller. In other words, I want every amount greater than or equal to the one in the first subject. This is my code:

import imaplib
import credentials
import email
from bs4 import BeautifulSoup

imap_ssl_host = 'imap.gmail.com'
imap_ssl_port = 993
username = "myemail"
password = "mypass"

server = imaplib.IMAP4_SSL(imap_ssl_host, imap_ssl_port)
server.login(username, password)
server.select('INBOX')
typ, data = server.search(None, '(FROM "no-reply@coinbase.com" SUBJECT "You received 0,00066703 BTC" SINCE "24-Sep-2021")')
for num in data[0].split():
    typ, msg_data = server.fetch(num, '(RFC822)')
    msg = email.message_from_bytes(msg_data[0][1])
    print(msg.get_payload(decode=True))

The subject always begins with “You received”, followed by the BTC amount and then “BTC”, as in the examples above. How can I extract only the number?
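A minimal sketch of the extraction step, assuming the subject always has the shape “You received <amount> BTC”. It pulls the amount out with a regular expression and compares it using decimal.Decimal, so the amounts are compared exactly as written rather than as binary floating-point approximations. Accepting a comma as the decimal separator is an assumption based on the “0,00066703” in the search string above; the names SUBJECT_RE, extract_btc_amount and is_at_least are hypothetical.

```python
import re
from decimal import Decimal
from typing import Optional

# Hypothetical pattern: assumes subjects look like "You received 0.000666703 BTC".
# A comma is also accepted as the decimal separator (assumption).
SUBJECT_RE = re.compile(r'^You received\s+(\d+(?:[.,]\d+)?)\s+BTC\b')


def extract_btc_amount(subject: str) -> Optional[Decimal]:
    """Return the BTC amount in the subject, or None if the subject does not match."""
    match = SUBJECT_RE.match(subject.strip())
    if match is None:
        return None
    # Normalise a decimal comma to a dot before converting.
    return Decimal(match.group(1).replace(',', '.'))


def is_at_least(subject: str, threshold: Decimal) -> bool:
    """True if the subject's amount is greater than or equal to threshold."""
    amount = extract_btc_amount(subject)
    return amount is not None and amount >= threshold


threshold = Decimal('0.000666703')
for s in ['You received 0.000666703 BTC',
          'You received 0.002719281 BTC',
          'You received 0.000028181 BTC']:
    print(s, '->', is_at_least(s, threshold))
```

With the three subjects from the question, the first two match and the last one does not.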

The console output is the HTML body of the message, but I only need to know whether a matching subject (as described above) exists so I can handle the rest. Is there a more efficient way to do this?


Answer

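IMAP’s SUBJECT search key only matches a substring of the subject, so the server cannot compare the amount numerically for you: search for the fixed prefix “You received” and do the comparison in Python. And if you only care about the subject, only fetch the subject.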

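import imaplib
from email.parser import HeaderParser
from email.policy import default  # use Python >= 3.6 EmailMessage API

... 

parser = HeaderParser(policy=default)
threshold = 0.000666703

server.select('INBOX')
typ, data = server.search(None, '(FROM "no-reply@coinbase.com" SUBJECT "You received" SINCE "24-Sep-2021")')
if typ == 'OK':  # imaplib reports success as the uppercase string 'OK'
    for num in data[0].split():
        ok, fetched = server.fetch(num, '(BODY.PEEK[HEADER.FIELDS (SUBJECT)])')
        if ok == 'OK':
            # parsestr() returns a message object; index it to get the decoded Subject value
            headers = parser.parsestr(fetched[0][1].decode('us-ascii'))
            subj = headers['Subject']
            if subj is None or not subj.startswith('You received'):
                continue
            try:
                # "You received 0.002719281 BTC" -> "0.002719281"
                amount = float(subj.split()[2])
            except (IndexError, ValueError):  # Python 3 syntax for catching several exceptions
                continue
            if amount >= threshold:  # "equal to or greater than", as asked
                print('Message %s: %s' % (num.decode(), subj))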

The Subject: header comes back as a bytes string, which at a minimum you have to decode. It may also be MIME-encoded (for example Subject: =?UTF-8?B?WW91IHJlY2VpdmVkIDAuMTIzIEJUQw==?=), which you need to decode using the email.parser.HeaderParser methods or something similar. The interface is a bit messy: HeaderParser wants a str, so you have to decode the bytes separately first (email.parser.BytesHeaderParser accepts bytes directly).
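As a sketch of the bytes-based alternative: email.parser.BytesHeaderParser parses the raw header bytes directly, and with policy=default the Subject value comes back with any RFC 2047 encoded word already decoded. The sample header is the encoded example from the previous paragraph; the helper name subject_from_header_bytes is hypothetical.

```python
from email.parser import BytesHeaderParser
from email.policy import default
from typing import Optional

# Parser that accepts bytes, so the raw FETCH payload needs no manual .decode().
_parser = BytesHeaderParser(policy=default)


def subject_from_header_bytes(raw: bytes) -> Optional[str]:
    """Return the decoded Subject header from a raw header block, or None if absent."""
    headers = _parser.parsebytes(raw)
    subject = headers['Subject']
    # With policy=default the header object is a str subclass; str() gives a plain str.
    return None if subject is None else str(subject)


raw = b'Subject: =?UTF-8?B?WW91IHJlY2VpdmVkIDAuMTIzIEJUQw==?=\r\n\r\n'
print(subject_from_header_bytes(raw))  # You received 0.123 BTC
```

In the answer’s loop, this would replace the parser.parsestr(fetched[0][1].decode('us-ascii')) step.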

The BODY.PEEK[...] fetch item does not modify the message’s flags, whereas plain BODY[...] sets the \Seen flag (it marks the message as read).
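The loop in the answer sends one FETCH per message. Since a FETCH command accepts a comma-separated set of message numbers, you could instead request all the subjects in one round trip and split the response afterwards. A minimal sketch of the splitting step, assuming imaplib’s usual response shape for this request (one (b'<num> (BODY[...] {n}', header_bytes) tuple per message, separated by b')' items); the helper name subjects_from_fetch is hypothetical.

```python
from email.parser import BytesHeaderParser
from email.policy import default
from typing import Dict, List, Tuple, Union

_parser = BytesHeaderParser(policy=default)


def subjects_from_fetch(data: List[Union[bytes, Tuple[bytes, bytes]]]) -> Dict[int, str]:
    """Map message number -> decoded Subject from a multi-message FETCH response."""
    subjects = {}
    for item in data:
        # Literal payloads arrive as (envelope, bytes) tuples; bare b')' items are skipped.
        if not isinstance(item, tuple):
            continue
        envelope, raw_header = item
        num = int(envelope.split()[0])  # b'12 (BODY[...] {40}' -> 12
        subject = _parser.parsebytes(raw_header)['Subject']
        if subject is not None:
            subjects[num] = str(subject)
    return subjects

# Usage against a live connection (not run here; assumes the search found at least one message):
#   typ, data = server.search(None, '(FROM "no-reply@coinbase.com" SUBJECT "You received")')
#   msg_set = b','.join(data[0].split()).decode()
#   typ, fetched = server.fetch(msg_set, '(BODY.PEEK[HEADER.FIELDS (SUBJECT)])')
#   print(subjects_from_fetch(fetched))
```

The resulting subjects can then be filtered by amount exactly as in the per-message loop.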

Some IMAP servers support more complex search syntax (perhaps even regular expressions), but this approach should be reasonably portable and robust, I hope.

User contributions licensed under: CC BY-SA