If I create a message this way (using real addresses, of course):
import email.message

msg = email.message.EmailMessage()
msg['From'] = "sender@example.com"
msg['To'] = "recipient@example.com"
msg['Subject'] = "Ayons asperges pour le déjeuner"
msg.set_content("Cela ressemble à un excellent recipie déjeuner.")

I can successfully send it using smtplib. There is no problem with the Unicode characters in the body. The received message has these headers:
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
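As a minimal sketch, the headers that set_content() adds can be inspected locally without sending anything. The transfer encoding that gets printed may differ from the one shown above, depending on the Python version and on what the mail server does in transit, so the sketch only relies on the charset declaration.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "sender@example.com"
msg["To"] = "recipient@example.com"
msg["Subject"] = "Ayons asperges pour le déjeuner"
msg.set_content("Cela ressemble à un excellent recipie déjeuner.")

# set_content() declares the charset of the body, which is what later
# allows the message to be serialized to bytes.
print(msg["Content-Type"])
print(msg["MIME-Version"])
print(msg["Content-Transfer-Encoding"])

raw = msg.as_bytes()  # serializes without raising UnicodeEncodeError
```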
If I try to create the same message in this alternative way:
import email.parser
import email.policy

msgsource = """\
From: sender@example.com
To: recipient@example.com
Subject: Ayons asperges pour le déjeuner

Cela ressemble à un excellent recipie déjeuner.
"""
msg = email.parser.Parser(policy=email.policy.default).parsestr(msgsource)
I can't send it. send_message() from smtplib fails with
UnicodeEncodeError: 'ascii' codec can't encode character '\xe0' in position 15: ordinal not in range(128)
so it apparently expects ASCII, not Unicode. What causes the difference, and how can I fix it properly?
(code is based on these examples)
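The failure can probably be reproduced without an SMTP server. send_message() has to serialize the message to bytes before sending it, and the sketch below assumes that calling as_bytes() on the parsed message exercises the same step. The variable name error is only illustrative.

```python
import email.parser
import email.policy

msgsource = """\
From: sender@example.com
To: recipient@example.com
Subject: Ayons asperges pour le déjeuner

Cela ressemble à un excellent recipie déjeuner.
"""
msg = email.parser.Parser(policy=email.policy.default).parsestr(msgsource)

# The parsed message has no charset declaration for its body.
print(msg["Content-Type"])  # None

# Serializing to bytes fails because the body holds non-ASCII text
# and nothing says how it should be encoded.
try:
    msg.as_bytes()
    error = None
except UnicodeEncodeError as exc:
    error = exc
print(error)
```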
Answer
The error can be avoided by encoding msgsource and then parsing the resulting bytes:
msgsource = msgsource.encode('utf-8')
msg = email.message_from_bytes(msgsource, policy=email.policy.default)
print(msg)
outputs
From: sender@example.com
To: recipient@example.com
Subject: Ayons asperges pour le =?unknown-8bit?q?d=C3=A9jeuner?=

Cela ressemble �� un excellent recipie d��jeuner.
and sending it to Python's SMTP DebuggingServer produces
b'From: sender@example.com'
b'To: recipient@example.com'
b'Subject: Ayons asperges pour le d\xc3\xa9jeuner'
b'X-Peer: ::1'
b''
b'Cela ressemble \xc3\xa0 un excellent recipie d\xc3\xa9jeuner.'
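The same behaviour can be checked without a debugging server. This is a small sketch that assumes the message source from the question: it parses the UTF-8 bytes and serializes the result again with as_bytes().

```python
import email
import email.policy

msgsource = """\
From: sender@example.com
To: recipient@example.com
Subject: Ayons asperges pour le déjeuner

Cela ressemble à un excellent recipie déjeuner.
"""
msg = email.message_from_bytes(msgsource.encode("utf-8"),
                               policy=email.policy.default)

# Serialization now succeeds: the original bytes are passed through as-is.
raw = msg.as_bytes()

# The raw UTF-8 bytes of the accented characters survive unchanged...
print("à".encode("utf-8") in raw)

# ...but the message still declares no charset for them.
print(msg["Content-Type"])  # None
```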
Note that no encoding headers are written: I’m guessing that the parsers attempt to reproduce the message from the source string or bytes as faithfully as possible, making as few additional assumptions as possible. The Parser docs
[Parser is] an API that can be used to parse a message when the complete contents of the message are available in a [string/bytes/file]
seem to me to support this interpretation.
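If the encoding headers are wanted, one possible approach (an assumption on my part, not something the Parser docs or the code above prescribe) is to parse the string and then copy the headers and body into a fresh EmailMessage, so that set_content() declares the charset. The names parsed, rebuilt and roundtrip are only illustrative.

```python
import email
import email.parser
import email.policy
from email.message import EmailMessage

msgsource = """\
From: sender@example.com
To: recipient@example.com
Subject: Ayons asperges pour le déjeuner

Cela ressemble à un excellent recipie déjeuner.
"""
parsed = email.parser.Parser(policy=email.policy.default).parsestr(msgsource)

# Copy the headers as plain strings so the new message re-encodes them.
rebuilt = EmailMessage()
for name, value in parsed.items():
    rebuilt[name] = str(value)

# set_content() adds Content-Type (with charset), MIME-Version and
# Content-Transfer-Encoding for the body text.
rebuilt.set_content(parsed.get_payload())

raw = rebuilt.as_bytes()  # serializes without raising UnicodeEncodeError

# Parse the serialized bytes back to check that nothing was lost.
roundtrip = email.message_from_bytes(raw, policy=email.policy.default)
print(roundtrip["Subject"])
print(roundtrip.get_content())
```

A message rebuilt this way should be acceptable to send_message(), since the body now carries an explicit charset, although I have not tested it against a real server.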