I’m trying to parse emails and convert tables within them into pandas dataframes. Since some of the emails are multipart, I took some code from this answer.
The code below works fine except with multipart/related emails, where no tables are found.
    import email
    import imaplib

    import pandas as pd
    from bs4 import BeautifulSoup

    HOST = 'imap.gmail.com'
    # USERNAME and PASSWORD are defined elsewhere
    m = imaplib.IMAP4_SSL(HOST, 993)
    m.login(USERNAME, PASSWORD)
    m.select('Inbox')

    result, data = m.uid('search', None, "UNSEEN", '(FROM "xxx@xxx.xxx")')
    print(result)
    if result == 'OK':
        for num in data[0].split()[:]:
            result, data = m.uid('fetch', num, '(RFC822)')
            if result == 'OK':
                email_message = email.message_from_bytes(data[0][1])
                b = email_message
                body = ""
                print(b.is_multipart())
                if b.is_multipart():
                    for part in b.walk():
                        ctype = part.get_content_type()
                        cdispo = str(part.get('Content-Disposition'))

                        # use the first text/plain part that is not an attachment
                        if ctype == 'text/plain' and 'attachment' not in cdispo:
                            body = part.get_payload(decode=True)  # decode
                            break
                else:
                    body = b.get_payload(decode=True)

                soup = BeautifulSoup(body)
                table = soup.find_all('table')
                df = pd.read_html(str(table))[0]
                display(df)  # display() is provided by Jupyter/IPython
Here are the headers and the beginning of the body of one of the multipart/related emails:
    Delivered-To: xxxxxx@gmail.com
    Received: by 2002:a05:6a10:cc86:0:0:0:0 with SMTP id gj6csp6140432pxb; Mon, 27 Dec 2021 14:52:14 -0800 (PST)
    X-Google-Smtp-Source: ABdhPJxPtKdKdVFNfgIE5xJdGrqDvekcD9MVkXdJaQyjJcVjc63N0KmOSN1LKvqLDbzssUU+6xjG
    X-Received: by 2002:a05:620a:1132:: with SMTP id p18mr13912209qkk.778.1640645534051; Mon, 27 Dec 2021 14:52:14 -0800 (PST)
    ARC-Seal: i=1; a=rsa-sha256; t=1640645534; cv=none; d=google.com; s=arc-20160816;
        b=JUwqNu9ZFFy3j5ke7GddEIhpUGSdzB0gby+k5PFr3AwQv+/JtDY6p9ksOhReeFkQpd
        2rNOhn9HknPnVpu1s+S9BT+YIrKWo8jrCzqJRWkaiY7MN80BGjw+oSkoD+WTNoo9rk7t
        ojil3vIatY02Unl5FfYlOUxZbFZ7Xb3xT44Zd9lRI7aQNrLZxSjeQAF/oL+N8eE0rMXo
        T5McU5R165sEb81twUpHrSkbp34/v31W25kOwx68Mb7hkuOTv/komZiQy1oiP+xzUKDH
        CxKOgF/UgzVD5mhyB6DSSEN22DQ4ybrmshmd+B5wugSVlY9hfw0t89kJQGChKUphk9GH
        /VWw==
    ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816;
        h=feedback-id:mime-version:date:message-id:subject:to:from
        :dkim-signature:dkim-signature;
        bh=iqw+mlksCZlkG8lxD5rVcYUL5uh/jJYU8nLc+GpCr/4=;
        b=qnu0Xb2/dj8zwtelmnry7/okDbUj4QpsNPtWtovwrbtlDIpnSS8HRq4qzVzUy6TDFE
        flm0XO489XNMO/GJ8Jw0J5Duujhnto3PiBRrAtIcA4CXkKhRe3SpXYk7D+PjROg+Zngk
        5lqA9RgxerLMq+wMRD4WlcZVuWmmUtBhY/T9XbXOXUlJJJa9qn6AlKNOp5ZV8CDxweTp
        yCDuQpJSCrbp1mldDe3N6lQAUXfaoGIBu6Kv7hpdZHwdrNMIeuhyCHTI4JF1IV0lK+G0
        DzJg76RxnRQ3q0eacW9X/hzbMLZeljxfUO18BeDzRp45i3XqVyVsC53TirpmYv7OcB50
        MaWA==
    ARC-Authentication-Results: i=1; mx.google.com;
        dkim=pass header.i=@xxxxxx.com header.s=xdzpvx2vm2fr73bjeppds7oqr3jbfy5s header.b=BATglTQY;
        dkim=pass header.i=@amazonses.com header.s=ug7nbtf4gccmlpwj322ax3p6ow6yfsug header.b=YLjw7lGE;
        spf=pass (google.com: domain of 0100017dfe181d85-07cbd269-a94a-4b30-8d52-1e4e48a34639-000000@amazonses.xxxxxx.com designates 54.240.11.40 as permitted sender) smtp.mailfrom=0100017dfe181d85-07cbd269-a94a-4b30-8d52-1e4e48a34639-000000@amazonses.xxxxxx.com;
        dmarc=pass (p=REJECT sp=NONE dis=NONE) header.from=xxxxxx.com
    Return-Path: <0100017dfe181d85-07cbd269-a94a-4b30-8d52-1e4e48a34639-000000@amazonses.xxxxxx.com>
    Received: from a11-40.smtp-out.amazonses.com (a11-40.smtp-out.amazonses.com. [54.240.11.40]) by mx.google.com with ESMTPS id g19si4414275qtm.154.2021.12.27.14.52.13 for <xxxxxx@gmail.com> (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 27 Dec 2021 14:52:14 -0800 (PST)
    Received-SPF: pass (google.com: domain of 0100017dfe181d85-07cbd269-a94a-4b30-8d52-1e4e48a34639-000000@amazonses.xxxxxx.com designates 54.240.11.40 as permitted sender) client-ip=54.240.11.40;
    Authentication-Results: mx.google.com;
        dkim=pass header.i=@xxxxxx.com header.s=xdzpvx2vm2fr73bjeppds7oqr3jbfy5s header.b=BATglTQY;
        dkim=pass header.i=@amazonses.com header.s=ug7nbtf4gccmlpwj322ax3p6ow6yfsug header.b=YLjw7lGE;
        spf=pass (google.com: domain of 0100017dfe181d85-07cbd269-a94a-4b30-8d52-1e4e48a34639-000000@amazonses.xxxxxx.com designates 54.240.11.40 as permitted sender) smtp.mailfrom=0100017dfe181d85-07cbd269-a94a-4b30-8d52-1e4e48a34639-000000@amazonses.xxxxxx.com;
        dmarc=pass (p=REJECT sp=NONE dis=NONE) header.from=xxxxxx.com
    DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/simple; s=xdzpvx2vm2fr73bjeppds7oqr3jbfy5s; d=xxxxxx.com; t=1640645533; h=Content-Type:From:To:Subject:Message-ID:Date:MIME-Version; bh=y7l8Len/FG0elemUfgWg28W0SEj5eOJRIMBt9xIFrQo=;
        b=BATglTQY6PkcRChCgrX9BMdkZVwppc3CCPZ2QliEN6VGtr4YxW7l0C1n3mMgeRCL
        0fXjKZwX3enRf9cHfKFJQErkxlmUfyKkLbtKJ4xNd78r4D04aCgUBRgovY05e2lE2vq
        KZEiJhF7oUN+QyxE87GahoQ88S/7cVjVVIh0RSHQ=
    DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/simple; s=ug7nbtf4gccmlpwj322ax3p6ow6yfsug; d=amazonses.com; t=1640645533; h=Content-Type:From:To:Subject:Message-ID:Date:MIME-Version:Feedback-ID; bh=y7l8Len/FG0elemUfgWg28W0SEj5eOJRIMBt9xIFrQo=;
        b=YLjw7lGEYZH+SQ4mx1EEdMVAo2v0EzbKGyGHmzH1CkvlnMv9yjMn4x3/BYhpOTxm
        yZ532qDZBGIIUPkCjoKOAz6K6a11xzPBREIl8Bz0O0kJyEcoShGahRbY4bgNCkOocx8
        IJD+NREMTfVK6wlsxzoWRS+HAnVfg1pU80yORo7M=
    Content-Type: multipart/related;
        type="text/html"; boundary="--_NmP-f890ebfb5c0d8a34-Part_1"
    From: xxxxxx <noreply@xxxxxx.com>
    To: xxxxxx@gmail.com
    Subject: Watchlist Summary for Mon, December 27, 2021 (Futures)
    Message-ID: <0100017dfe181d85-07cbd269-a94a-4b30-8d52-1e4e48a34639-000000@email.amazonses.com>
    Date: Mon, 27 Dec 2021 22:52:13 +0000
    MIME-Version: 1.0
    Feedback-ID: 1.us-east-1.xy6STr9N8VtfY9IEmltVU/dtudHWlVMH37XgJn5/ROY=:AmazonSES
    X-SES-Outgoing: 2021.12.27-54.240.11.40

    ----_NmP-f890ebfb5c0d8a34-Part_1
    Content-Type: text/html; charset=utf-8
    Content-Transfer-Encoding: quoted-printable

    <!DOCTYPE html><html lang=3D"en"><head><meta charset=3D"UTF-8"><meta http-e=
    quiv=3D"Content-Type" content=3D"text/html; charset=3DUTF-8"><meta http-equ=
    iv=3D"X-UA-Compatible" content=3D"IE=3Dedge"><meta name=3D"viewport" conten=
    t=3D"width=3Ddevice-width, initial-scale=3D1.0"><!-- So that mobile webkit =
    will display zoomed in--><meta name=3D"format-detection" content=3D"telepho=
    ne=3Dno"><!-- disable auto telephone linking in iOS--><title></title><style=
     type=3D"text/css">} .ad p { margin-top: 4px; } </style><style type=3D"text/css">#data-table, .data-table { max-width:100%; min-width:100%; width:100%; border-collapse:c=
    ollapse; } #data-table th, #data-table td, .data-table th, .data-table td { color:#000000; border-collapse:collapse; padding:4px; whit=
    e-space:nowrap; border:1px solid #D8D8D8; } #data-table .body tr:nth-of-type(odd), .data-table .body tr:nth-of-type(odd) { background-color:#f3f3f3; } #data-table table tbody .spacer td, .data-table table tbody .spacer td { border:none; } .preHeaderHide { display:none !important; mso-hide:all !important; } /* Outlook link fix */ #outlook a { padding:0; } /* Resets: see reset.css for details */ .ReadMsgBody { width:100%; background-color:#ebebeb; } /* Hotmail background and line height fixes */ .ExternalClass { width:100%; background-color:#ebebeb; } .ExternalClass, .ExternalClass p, .ExternalClass span, .ExternalClass font,=
    .ExternalClass td, .ExternalClass div { line-height:100%; }
Any ideas? Thanks
Answer
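You want to parse the text/html parts. This message is multipart/related and its only leaf part is text/html (quoted-printable); there is no text/plain part, so your loop never assigns body. It stays "", BeautifulSoup finds no <table>, and pd.read_html raises "No tables found".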
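A minimal sketch, using only the standard library email package, that shows which parts walk() visits in a message laid out like the one in the question. The message below is a trimmed-down, hypothetical stand-in: the boundary string and the table content are invented for illustration.

```python
import email

# Hypothetical, trimmed-down message with the same MIME layout as the one in
# the question: a multipart/related container whose only leaf part is
# quoted-printable text/html (no text/plain alternative).
RAW = b"""\
MIME-Version: 1.0
Content-Type: multipart/related; type="text/html"; boundary="--_NmP-example-Part_1"

----_NmP-example-Part_1
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: quoted-printable

<html lang=3D"en"><body><table><tr><th>Symbol</th></tr><tr><td>ES=
</td></tr></table></body></html>
----_NmP-example-Part_1--
"""

msg = email.message_from_bytes(RAW)

# walk() yields the container itself first, then every subpart.
content_types = [part.get_content_type() for part in msg.walk()]
print(content_types)  # ['multipart/related', 'text/html']

# A loop that only looks for text/plain never matches here, so body stays "".
has_plain = any(ct == "text/plain" for ct in content_types)
print(has_plain)  # False

# get_payload(decode=True) undoes the quoted-printable encoding:
# "=3D" becomes "=" and the soft line break after "ES=" is removed.
html_part = next(p for p in msg.walk() if p.get_content_type() == "text/html")
html = html_part.get_payload(decode=True).decode(html_part.get_content_charset() or "utf-8")
print(html)
```

Because get_payload(decode=True) already reverses the transfer encoding, the =3D sequences visible in the raw dump are not something you need to handle yourself.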
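You should check for content type == 'text/html' instead of 'text/plain', then decode the bytes returned by get_payload(decode=True) with the part's charset (part.get_content_charset()) before passing them on.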
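A minimal sketch of that check as a reusable helper, assuming only the standard library email package. The function name extract_html_body is hypothetical (it is not part of any library), and skipping parts marked as attachments mirrors the intent of the question's cdispo check.

```python
import email
from email.message import Message
from typing import Optional


def extract_html_body(msg: Message) -> Optional[str]:
    """Return the decoded text/html body of msg, or None if there is none.

    walk() visits the message itself and every nested part, so the same code
    handles single-part HTML mail as well as multipart/related,
    multipart/alternative and multipart/mixed layouts.
    """
    for part in msg.walk():
        # Ignore parts explicitly marked as attachments (e.g. an attached .html file).
        if part.get_content_disposition() == "attachment":
            continue
        if part.get_content_type() == "text/html":
            # decode=True reverses base64 / quoted-printable and returns bytes.
            payload = part.get_payload(decode=True)
            charset = part.get_content_charset() or "utf-8"
            try:
                return payload.decode(charset, errors="replace")
            except LookupError:  # charset name in the header unknown to Python
                return payload.decode("utf-8", errors="replace")
    return None


if __name__ == "__main__":
    # Hypothetical sample: an HTML-only multipart/related mail, as in the question.
    sample = email.message_from_string(
        'MIME-Version: 1.0\n'
        'Content-Type: multipart/related; type="text/html"; boundary="b1"\n'
        '\n'
        '--b1\n'
        'Content-Type: text/html; charset=utf-8\n'
        'Content-Transfer-Encoding: quoted-printable\n'
        '\n'
        '<table><tr><td>1=3D1</td></tr></table>\n'
        '--b1--\n'
    )
    print(extract_html_body(sample))
```

In the question's loop, the whole if b.is_multipart(): ... else: ... block can then be replaced by body = extract_html_body(email_message), skipping the message when it returns None. Since BeautifulSoup is only used there to collect the tables, the HTML can also go straight to pandas via pd.read_html(io.StringIO(body)), which returns a list of DataFrames (one per table) and raises ValueError when it finds none; the StringIO wrapper avoids the deprecation warning that pandas 2.1 and later emit for literal HTML strings.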