
Python – Unicode De/Encode

How can I take content that I store in a database (s1), load it back from there (s2), and write it correctly decoded to a file?

import time,os,sys,base64
s = "Hello World!\r\nHeyho"
# s1 is what I insert into the database; s2 is what I get back when I select it -> works most of the time
s1 = base64.b64encode(s.encode("UTF-8")).decode("UTF-8")  # print("Base64 Encoded:", s1)
s2 = base64.b64decode(s1.encode("UTF-8")).decode("UTF-8")  # print(s2)

# Example of what I try to save to a file:
s3 = "PGhlYWQ+CiAgICA8dGl0bGU+4pa3IEltbW9iaWxpZW4gLSBIw6R1c2VyIC0gV29obnVuZ2VuIC0gZmluZGVuIGJlaSBpbW1vd2VsdC5kZTwvdGl0bGU+"
# raw string, so that "\U" in the Windows path is not treated as an escape sequence
with open(r"C:\Users\001\Downloads\Output.txt", "w") as text_file:
    text_file.write("Ausgabe: %s" % base64.b64decode(s3.encode("UTF-8")).decode("UTF-8"))  # with .encode('ascii', 'ignore') I would lose the special characters

Log:

C:\Users\001\Downloads>python trythis.py
Traceback (most recent call last):
  File "trythis.py", line 11, in <module>
    text_file.write("Ausgabe: %s" % base64.b64decode(s3.encode("UTF-8")).decode("UTF-8")) #with .encode('ascii', 'ignore') i whould delelte signs
  File "C:\Users\001\AppData\Local\Programs\Python\Python35\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u25b7' in position 28: character maps to <undefined>

EDIT: I am working on Windows.

C:\Users\001\Downloads>python -V
Python 3.5.2


Answer

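The problem is that you open the file in text mode but don't specify the encoding. In that case Python uses the system's default (locale) encoding, which differs from system to system. On your Windows machine it is cp1252, as the traceback shows, and cp1252 has no mapping for the character U+25B7 (▷).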

Solution: pass the encoding argument to open(), e.g. encoding='utf-8'.
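A minimal sketch of both the cause and the fix, assuming only the standard library. It reproduces the cp1252 failure for U+25B7, shows the default that open() falls back to on the current machine, and then writes and reads the same text with an explicit UTF-8 encoding. The temporary file name encoding_demo.txt is a hypothetical choice for illustration.

```python
import locale
import os
import tempfile

# The character from the traceback: U+25B7 WHITE RIGHT-POINTING TRIANGLE
ch = "\u25b7"

# Without encoding=, open() in text mode uses this locale-dependent default
# (cp1252 on the asker's Windows machine, according to the traceback).
print("Default text encoding here:", locale.getpreferredencoding(False))

# cp1252 has no mapping for U+25B7, so encoding fails just like in the question.
try:
    ch.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError as exc:
    cp1252_ok = False
    print("cp1252 fails:", exc.reason)

# UTF-8 can represent every Unicode character.
print("UTF-8 bytes:", ch.encode("utf-8"))

# An explicit encoding= makes the write independent of the system default.
path = os.path.join(tempfile.gettempdir(), "encoding_demo.txt")  # hypothetical file name
with open(path, "w", encoding="utf-8") as f:
    f.write("Ausgabe: " + ch)

# Read it back with the same encoding to confirm the text survived.
with open(path, encoding="utf-8") as f:
    round_trip = f.read()
os.remove(path)

print(round_trip == "Ausgabe: " + ch)
```

The important part is that the same encoding is used for writing and reading; UTF-8 is the usual choice because it can encode any character.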

As a side note: why do you .decode('UTF-8') the Base64 output? It works, but since Base64 output consists only of ASCII characters, decoding it as ASCII would make more sense. Besides, you should only encode/decode at the I/O boundaries (in this case, when writing to the file), although you may have done it here only for testing/demonstration purposes.
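A small sketch of that advice, assuming the database column simply holds the Base64 text. The helper names to_db_value and from_db_value are hypothetical; the point is that all encoding/decoding happens in one place at the database boundary, and that the Base64 string is decoded as ASCII.

```python
import base64


def to_db_value(text: str) -> str:
    """Hypothetical helper: turn text into the Base64 string stored in the database."""
    # str -> UTF-8 bytes -> Base64 bytes -> str.
    # Base64 output only contains A-Z, a-z, 0-9, '+', '/' and '=', so ASCII is the natural decoding.
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def from_db_value(value: str) -> str:
    """Hypothetical helper: the inverse of to_db_value()."""
    # b64decode() accepts an ASCII-only str, so no explicit .encode() is needed here.
    return base64.b64decode(value).decode("utf-8")


s = "Hello World!\r\nHeyho"
s1 = to_db_value(s)     # what would be inserted into the database
s2 = from_db_value(s1)  # what you get back after selecting it
print(s1)
print(s2 == s)
```

Everywhere else in the program the value stays an ordinary str.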

Update:

Apparently your text was first UTF-8 encoded and then Base64 encoded, which is why you need to undo both steps in reverse order: first Base64-decode, then UTF-8-decode.
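To make the two layers visible, here is a short sketch that peels them off one at a time using the Base64 string from the question. It only inspects intermediate values; it does not write a file.

```python
import base64

b64_encoded_text = 'PGhlYWQ+CiAgICA8dGl0bGU+4pa3IEltbW9iaWxpZW4gLSBIw6R1c2VyIC0gV29obnVuZ2VuIC0gZmluZGVuIGJlaSBpbW1vd2VsdC5kZTwvdGl0bGU+'

# Layer 1: undo Base64. The result is bytes, not text.
raw = base64.b64decode(b64_encoded_text)
print(type(raw))

# Those bytes are UTF-8: U+25B7 is stored as the 3-byte sequence E2 96 B7.
print(b"\xe2\x96\xb7" in raw)

# Layer 2: undo UTF-8 to get the original str back.
decoded_text = raw.decode("utf-8")
print("\u25b7" in decoded_text)
```

Decoding raw with any single-byte codec such as cp1252 would not raise an error for these particular bytes but would produce mojibake, which is why the UTF-8 step has to match the encoding used originally.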

The following is a portable, working example:

import base64

b64_encoded_text = 'PGhlYWQ+CiAgICA8dGl0bGU+4pa3IEltbW9iaWxpZW4gLSBIw6R1c2VyIC0gV29obnVuZ2VuIC0gZmluZGVuIGJlaSBpbW1vd2VsdC5kZTwvdGl0bGU+'
decoded_text = base64.b64decode(b64_encoded_text).decode('utf-8')

with open('Output.txt', 'wt', encoding='utf-8') as text_file:
    text_file.write('Ausgabe: %s' % decoded_text)

It is even easier, though, to write the raw (already UTF-8 encoded) bytes straight to the file in binary mode:

import base64

b64_encoded_text = 'PGhlYWQ+CiAgICA8dGl0bGU+4pa3IEltbW9iaWxpZW4gLSBIw6R1c2VyIC0gV29obnVuZ2VuIC0gZmluZGVuIGJlaSBpbW1vd2VsdC5kZTwvdGl0bGU+'

with open('Output.txt', 'wb') as file:
    # file.write(b'Ausgabe: ')  # uncomment if really needed
    file.write(base64.b64decode(b64_encoded_text))
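If you do need the prefix, a sketch of why the commented-out line is safe: b'Ausgabe: ' contains only ASCII bytes, which are also valid UTF-8, so the file can still be read back as UTF-8 text. The temporary file name Output_binary_demo.txt is a hypothetical choice for illustration.

```python
import base64
import os
import tempfile

b64_encoded_text = 'PGhlYWQ+CiAgICA8dGl0bGU+4pa3IEltbW9iaWxpZW4gLSBIw6R1c2VyIC0gV29obnVuZ2VuIC0gZmluZGVuIGJlaSBpbW1vd2VsdC5kZTwvdGl0bGU+'

raw = base64.b64decode(b64_encoded_text)  # UTF-8 encoded bytes, written unchanged

path = os.path.join(tempfile.gettempdir(), "Output_binary_demo.txt")  # hypothetical file name
with open(path, "wb") as file:
    # An ASCII-only prefix is also valid UTF-8, so it can go in front of the data as bytes.
    file.write(b"Ausgabe: ")
    file.write(raw)

# Reading the file back as UTF-8 text yields the same string the text-mode version writes.
with open(path, encoding="utf-8") as file:
    content = file.read()
os.remove(path)

print(content == "Ausgabe: " + raw.decode("utf-8"))
```

Binary mode skips the str round trip entirely, so no text encoding is involved when writing.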
User contributions licensed under: CC BY-SA