I’m trying to parse emails and convert tables within them into pandas dataframes. Since some of the emails are multipart, I took some code from this answer.
The following code works fine but it breaks with multipart/related emails (no tables are found).
HOST = 'imap.gmail.com'
m = imaplib.IMAP4_SSL(HOST, 993)
m.login(USERNAME, PASSWORD)
m.select('Inbox')

result, data = m.uid('search', None, "UNSEEN", '(FROM "xxx@xxx.xxx")')
print(result)
if result == 'OK':
    for num in data[0].split()[:]:
        result, data = m.uid('fetch', num, '(RFC822)')
        if result == 'OK':
            email_message = email.message_from_bytes(data[0][1])
            b = email_message
            body = ""

            print(b.is_multipart())
            if b.is_multipart():
                for part in b.walk():
                    ctype = part.get_content_type()
                    cdispo = str(part.get('Content-Disposition'))

                    # skip any text/plain (txt) attachments
                    if ctype == 'text/plain' and 'attachment' not in cdispo:
                        body = part.get_payload(decode=True) # decode
                        break
            else:
                body = b.get_payload(decode=True)
            soup = BeautifulSoup(body)
            table = soup.find_all('table')
            df = pd.read_html(str(table))[0]
            display(df)

Here’s the header of one of the multipart/related emails:
Delivered-To: xxxxxx@gmail.com
Received: by 2002:a05:6a10:cc86:0:0:0:0 with SMTP id gj6csp6140432pxb;
        Mon, 27 Dec 2021 14:52:14 -0800 (PST)
X-Google-Smtp-Source: ABdhPJxPtKdKdVFNfgIE5xJdGrqDvekcD9MVkXdJaQyjJcVjc63N0KmOSN1LKvqLDbzssUU+6xjG
X-Received: by 2002:a05:620a:1132:: with SMTP id p18mr13912209qkk.778.1640645534051;
        Mon, 27 Dec 2021 14:52:14 -0800 (PST)
ARC-Seal: i=1; a=rsa-sha256; t=1640645534; cv=none;
        d=google.com; s=arc-20160816;
        b=JUwqNu9ZFFy3j5ke7GddEIhpUGSdzB0gby+k5PFr3AwQv+/JtDY6p9ksOhReeFkQpd
         2rNOhn9HknPnVpu1s+S9BT+YIrKWo8jrCzqJRWkaiY7MN80BGjw+oSkoD+WTNoo9rk7t
         ojil3vIatY02Unl5FfYlOUxZbFZ7Xb3xT44Zd9lRI7aQNrLZxSjeQAF/oL+N8eE0rMXo
         T5McU5R165sEb81twUpHrSkbp34/v31W25kOwx68Mb7hkuOTv/komZiQy1oiP+xzUKDH
         CxKOgF/UgzVD5mhyB6DSSEN22DQ4ybrmshmd+B5wugSVlY9hfw0t89kJQGChKUphk9GH
         /VWw==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816;
        h=feedback-id:mime-version:date:message-id:subject:to:from
         :dkim-signature:dkim-signature;
        bh=iqw+mlksCZlkG8lxD5rVcYUL5uh/jJYU8nLc+GpCr/4=;
        b=qnu0Xb2/dj8zwtelmnry7/okDbUj4QpsNPtWtovwrbtlDIpnSS8HRq4qzVzUy6TDFE
         flm0XO489XNMO/GJ8Jw0J5Duujhnto3PiBRrAtIcA4CXkKhRe3SpXYk7D+PjROg+Zngk
         5lqA9RgxerLMq+wMRD4WlcZVuWmmUtBhY/T9XbXOXUlJJJa9qn6AlKNOp5ZV8CDxweTp
         yCDuQpJSCrbp1mldDe3N6lQAUXfaoGIBu6Kv7hpdZHwdrNMIeuhyCHTI4JF1IV0lK+G0
         DzJg76RxnRQ3q0eacW9X/hzbMLZeljxfUO18BeDzRp45i3XqVyVsC53TirpmYv7OcB50
         MaWA==
ARC-Authentication-Results: i=1; mx.google.com;
       dkim=pass header.i=@xxxxxx.com header.s=xdzpvx2vm2fr73bjeppds7oqr3jbfy5s header.b=BATglTQY;
       dkim=pass header.i=@amazonses.com header.s=ug7nbtf4gccmlpwj322ax3p6ow6yfsug header.b=YLjw7lGE;
       spf=pass (google.com: domain of 0100017dfe181d85-07cbd269-a94a-4b30-8d52-1e4e48a34639-000000@amazonses.xxxxxx.com designates 54.240.11.40 as permitted sender) smtp.mailfrom=0100017dfe181d85-07cbd269-a94a-4b30-8d52-1e4e48a34639-000000@amazonses.xxxxxx.com;
       dmarc=pass (p=REJECT sp=NONE dis=NONE) header.from=xxxxxx.com
Return-Path: <0100017dfe181d85-07cbd269-a94a-4b30-8d52-1e4e48a34639-000000@amazonses.xxxxxx.com>
Received: from a11-40.smtp-out.amazonses.com (a11-40.smtp-out.amazonses.com. [54.240.11.40])
        by mx.google.com with ESMTPS id g19si4414275qtm.154.2021.12.27.14.52.13
        for <xxxxxx@gmail.com>
        (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128);
        Mon, 27 Dec 2021 14:52:14 -0800 (PST)
Received-SPF: pass (google.com: domain of 0100017dfe181d85-07cbd269-a94a-4b30-8d52-1e4e48a34639-000000@amazonses.xxxxxx.com designates 54.240.11.40 as permitted sender) client-ip=54.240.11.40;
Authentication-Results: mx.google.com;
       dkim=pass header.i=@xxxxxx.com header.s=xdzpvx2vm2fr73bjeppds7oqr3jbfy5s header.b=BATglTQY;
       dkim=pass header.i=@amazonses.com header.s=ug7nbtf4gccmlpwj322ax3p6ow6yfsug header.b=YLjw7lGE;
       spf=pass (google.com: domain of 0100017dfe181d85-07cbd269-a94a-4b30-8d52-1e4e48a34639-000000@amazonses.xxxxxx.com designates 54.240.11.40 as permitted sender) smtp.mailfrom=0100017dfe181d85-07cbd269-a94a-4b30-8d52-1e4e48a34639-000000@amazonses.xxxxxx.com;
       dmarc=pass (p=REJECT sp=NONE dis=NONE) header.from=xxxxxx.com
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/simple; s=xdzpvx2vm2fr73bjeppds7oqr3jbfy5s; d=xxxxxx.com; t=1640645533; h=Content-Type:From:To:Subject:Message-ID:Date:MIME-Version; bh=y7l8Len/FG0elemUfgWg28W0SEj5eOJRIMBt9xIFrQo=; b=BATglTQY6PkcRChCgrX9BMdkZVwppc3CCPZ2QliEN6VGtr4YxW7l0C1n3mMgeRCL 0fXjKZwX3enRf9cHfKFJQErkxlmUfyKkLbtKJ4xNd78r4D04aCgUBRgovY05e2lE2vq KZEiJhF7oUN+QyxE87GahoQ88S/7cVjVVIh0RSHQ=
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/simple; s=ug7nbtf4gccmlpwj322ax3p6ow6yfsug; d=amazonses.com; t=1640645533; h=Content-Type:From:To:Subject:Message-ID:Date:MIME-Version:Feedback-ID; bh=y7l8Len/FG0elemUfgWg28W0SEj5eOJRIMBt9xIFrQo=; b=YLjw7lGEYZH+SQ4mx1EEdMVAo2v0EzbKGyGHmzH1CkvlnMv9yjMn4x3/BYhpOTxm yZ532qDZBGIIUPkCjoKOAz6K6a11xzPBREIl8Bz0O0kJyEcoShGahRbY4bgNCkOocx8 IJD+NREMTfVK6wlsxzoWRS+HAnVfg1pU80yORo7M=
Content-Type: multipart/related; type="text/html"; boundary="--_NmP-f890ebfb5c0d8a34-Part_1"
From: xxxxxx <noreply@xxxxxx.com>
To: xxxxxx@gmail.com
Subject: Watchlist Summary for Mon, December 27, 2021 (Futures)
Message-ID: <0100017dfe181d85-07cbd269-a94a-4b30-8d52-1e4e48a34639-000000@email.amazonses.com>
Date: Mon, 27 Dec 2021 22:52:13 +0000
MIME-Version: 1.0
Feedback-ID: 1.us-east-1.xy6STr9N8VtfY9IEmltVU/dtudHWlVMH37XgJn5/ROY=:AmazonSES
X-SES-Outgoing: 2021.12.27-54.240.11.40

----_NmP-f890ebfb5c0d8a34-Part_1
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE html><html lang=3D"en"><head><meta charset=3D"UTF-8"><meta http-e=
quiv=3D"Content-Type" content=3D"text/html; charset=3DUTF-8"><meta http-equ=
iv=3D"X-UA-Compatible" content=3D"IE=3Dedge"><meta name=3D"viewport" conten=
t=3D"width=3Ddevice-width, initial-scale=3D1.0"><!-- So that mobile webkit =
will display zoomed in--><meta name=3D"format-detection" content=3D"telepho=
ne=3Dno"><!-- disable auto telephone linking in iOS--><title></title><style=
type=3D"text/css">}
.ad p {
margin-top: 4px;
}
</style><style type=3D"text/css">#data-table,
.data-table { max-width:100%; min-width:100%; width:100%; border-collapse:c=
ollapse; }
#data-table th,
#data-table td,
.data-table th,
.data-table td { color:#000000; border-collapse:collapse; padding:4px; whit=
e-space:nowrap; border:1px solid #D8D8D8; }
#data-table .body tr:nth-of-type(odd),
.data-table .body tr:nth-of-type(odd) { background-color:#f3f3f3; }
#data-table table tbody .spacer td,
.data-table table tbody .spacer td { border:none; }

.preHeaderHide { display:none !important; mso-hide:all !important; }
/* Outlook link fix */
#outlook a { padding:0; }
/* Resets: see reset.css for details */
.ReadMsgBody { width:100%; background-color:#ebebeb; }
/* Hotmail background and line height fixes */
.ExternalClass { width:100%; background-color:#ebebeb; }
.ExternalClass, .ExternalClass p, .ExternalClass span, .ExternalClass font,=
.ExternalClass td, .ExternalClass div { line-height:100%; }

Any ideas? Thanks
Answer
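You want to parse the text/html parts. The loop in the question only accepts parts whose content type is 'text/plain', but the HTML (and therefore the tables) in this message lives in a text/html part, so that part is skipped and no tables are found.
Check for ctype == 'text/html' instead.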
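
A minimal sketch of that check, using only the standard library. The helper name `extract_html_body` and the tiny sample message are made up for illustration; in the question's code the message would come from `email.message_from_bytes(data[0][1])` instead. Decoding with the part's declared charset (falling back to UTF-8) is an assumption about what is wanted here.

```python
import email
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def extract_html_body(message):
    """Return the first non-attachment text/html part as a str, or '' if none.

    Hypothetical helper, not part of the question's code.
    """
    # walk() also yields the message itself, so this works for
    # non-multipart messages without a separate is_multipart() branch.
    for part in message.walk():
        ctype = part.get_content_type()
        cdispo = str(part.get('Content-Disposition'))
        if ctype == 'text/html' and 'attachment' not in cdispo:
            # decode=True undoes quoted-printable/base64 and returns bytes
            payload = part.get_payload(decode=True)
            charset = part.get_content_charset() or 'utf-8'
            return payload.decode(charset, errors='replace')
    return ''


# Stand-in for the fetched email: a multipart/related message with one HTML part.
sample = MIMEMultipart('related')
sample.attach(MIMEText(
    '<html><body><table><tr><td>A</td><td>1</td></tr></table></body></html>',
    'html', 'utf-8'))

parsed = email.message_from_bytes(sample.as_bytes())
html_body = extract_html_body(parsed)
print(parsed.get_content_type())
print('<table>' in html_body)
```

The returned string could then be passed to BeautifulSoup or `pd.read_html` the same way `body` is in the question, assuming the HTML actually contains a table.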