Problem with a mail message created by a parser



If I create a message this way (using real addresses, of course):

import email.message

msg = email.message.EmailMessage()
msg['From'] = "sender@example.com"
msg['To'] = "recipient@example.com"
msg['Subject'] = "Ayons asperges pour le déjeuner"
msg.set_content("Cela ressemble à un excellent recipie déjeuner.")

I can successfully send it using smtplib. No problem with the Unicode characters in the body. The received message has these headers:

Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable

If I try to create the same message in this alternative way:

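import email.parser
import email.policy

# The trailing backslash keeps the source from starting with a blank line,
# which the parser would treat as the end of the headers.
msgsource = """\
From: sender@example.com
To: recipient@example.com
Subject: Ayons asperges pour le déjeuner

Cela ressemble à un excellent recipie déjeuner.
"""

msg = email.parser.Parser(policy=email.policy.default).parsestr(msgsource)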

I can’t send it. send_message() from smtplib fails with

UnicodeEncodeError: 'ascii' codec can't encode character '\xe0' in position 15: ordinal not in range(128)

and obviously expects ASCII, not Unicode. What causes the difference, and how do I fix it properly?

(code is based on these examples)

Answer

The error can be avoided by encoding msgsource and then parsing the resulting bytes:

import email
from email import policy

msgsource = msgsource.encode('utf-8')
msg = email.message_from_bytes(msgsource, policy=policy.default)
print(msg)

outputs

From: sender@example.com
To: recipient@example.com
Subject: Ayons asperges pour le =?unknown-8bit?q?d=C3=A9jeuner?=

Cela ressemble �� un excellent recipie d��jeuner.

sending it to Python’s SMTP DebuggingServer produces

b'From: sender@example.com'
b'To: recipient@example.com'
b'Subject: Ayons asperges pour le d\xc3\xa9jeuner'
b'X-Peer: ::1'
b''
b'Cela ressemble \xc3\xa0 un excellent recipie d\xc3\xa9jeuner.'

Note that no encoding headers are written. My guess is that the parsers try to reproduce the message from the source string or bytes as faithfully as possible, adding as few assumptions of their own as they can. The Parser docs

[Parser is] an API that can be used to parse a message when the complete contents of the message are available in a [string/bytes/file]

seem to me to support this interpretation.
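If the goal is a message that carries proper MIME encoding headers, one option (a sketch, not necessarily the only or the intended way) is to use the parser only to split the source into headers and body, and then rebuild the message with set_content(), as in the first snippet of the question. The helper name rebuild_with_mime_headers is made up for this illustration, and the sketch assumes a single-part plain-text message whose source declares no MIME headers of its own:

```python
import email
from email import policy
from email.message import EmailMessage

msgsource = """\
From: sender@example.com
To: recipient@example.com
Subject: Ayons asperges pour le déjeuner

Cela ressemble à un excellent recipie déjeuner.
"""


def rebuild_with_mime_headers(source: str) -> EmailMessage:
    """Hypothetical helper: parse a str source, then rebuild it as a MIME message.

    Assumes a single-part text message without MIME headers in the source.
    """
    parsed = email.message_from_string(source, policy=policy.default)
    rebuilt = EmailMessage()
    # Copy the headers as plain strings so they are re-encoded on output.
    for name, value in parsed.items():
        rebuilt[name] = str(value)
    # set_content() adds Content-Type (with a charset), MIME-Version and
    # Content-Transfer-Encoding, which the parsed message lacks.
    rebuilt.set_content(parsed.get_payload())
    return rebuilt


# The parsed message declares no charset, because the source did not.
parsed = email.message_from_string(msgsource, policy=policy.default)
print(parsed.get_content_charset())

msg = rebuild_with_mime_headers(msgsource)
print(msg["Content-Type"])
raw = msg.as_bytes()  # serializes without a UnicodeEncodeError
```

The rebuilt msg can then be passed to send_message() like the message from the first snippet. The trade-off is that this no longer reproduces the source verbatim, which is exactly the assumption the parser seems to avoid making on its own.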



Source: stackoverflow