I’m encrypting a message with `rsa.encrypt`, but then I cannot convert the encrypted data to a `str` with `.decode()`. That’s a bit strange, because the encrypted data is a bytes string and there shouldn’t be any problem converting that to a `str`.
data = [self.id, data, self.my_pubkey] # actually don't care about type of components, they are correct
My code:
```
import json
import rsa


def msg(query_type, data):
    if query_type == 'PubKey':
        try:
            query = {"Type": "PubKey",
                     "Message": {"PubKey": data[0],
                                 "Id": data[1]
                                 }
                     }
            to_send = json.dumps(query)
            to_send = to_send.encode()
            return to_send
        except Exception as ex:
            print("Error in creating message")
            print(ex)
    elif query_type == 'Message':
        try:
            encrypted_data = rsa.encrypt(data[1].encode('utf-8'), data[2])
            print(encrypted_data.decode('utf-8'))
            query = {"Type": "Message",
                     "Message": {"Id": data[0],
                                 "Data": str(encrypted_data)[2:-1]
                                 }
                     }
            pub = rsa.lo
            to_send = json.dumps(query)
            to_send = to_send.encode()
            return to_send
        except Exception as ex:
            print("Error in creating message")
            print(ex)
        except Exception as ex:
            to_send = str(ex).encode()
            return to_send
```
But, I’m getting this error:
```
Error in creating message
'utf-8' codec can't decode byte 0xfc in position 5: invalid start byte
Exception in thread Thread-2:
Traceback (most recent call last):
  File "C:\Users\vladi\AppData\Local\Programs\Python\Python37\lib\threading.py", line 926, in _bootstrap_inner
    self.run()
  File "C:\Users\vladi\AppData\Local\Programs\Python\Python37\lib\threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\vladi\Documents\Programming\python\Server_Client\Client\client.py", line 28, in send
    self.sock.send(str(len(data_to_send)).encode())
TypeError: object of type 'NoneType' has no len()
```
Answer
Decoding a byte string as UTF-8 only makes sense if you know the bytes represent valid UTF-8 data. UTF-8 will not decode arbitrary byte strings: there’s a specific format (bytes >= 0x80 are interpreted as either “start” or “continuation” bytes and must follow certain patterns; see the Wikipedia page on UTF-8 for more information). In your case, the byte 0xfc at position 5 is not a valid start byte, which is exactly what the error message says.
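As a small illustration of that rule, here is a sketch that decodes one valid and one invalid byte string. The helper name `try_decode` and the sample bytes are made up for this example; only `bytes.decode` and `UnicodeDecodeError` are standard Python.

```python
def try_decode(raw: bytes) -> str:
    """Return the decoded text, or a short description of the failure."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return f"cannot decode: {err.reason} at position {err.start}"


# b'\xc3\xbc' is a start byte followed by one continuation byte: valid UTF-8.
valid = "ü".encode("utf-8")

# 0xfc can never begin a UTF-8 sequence, so decoding fails at index 5.
invalid = b"hello\xfc"

print(try_decode(valid))
print(try_decode(invalid))
```

The second call fails in the same way as the `print(encrypted_data.decode('utf-8'))` line in the question.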
On the other hand, encrypting data (using almost any encryption algorithm) will generate random-looking byte strings that almost certainly will not be valid utf-8.
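To get a feel for how unlikely it is, the sketch below checks random byte strings for UTF-8 validity. It uses seeded pseudo-random bytes as a stand-in for ciphertext (an assumption, it does not call `rsa.encrypt`), and the 128-byte length is chosen to match what a 1024-bit RSA key would produce.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable


def is_valid_utf8(raw: bytes) -> bool:
    """True if the bytes decode as UTF-8 without error."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 1000 pseudo-random 128-byte strings standing in for encrypted output.
samples = [bytes(rng.getrandbits(8) for _ in range(128)) for _ in range(1000)]
decodable = sum(is_valid_utf8(s) for s in samples)

print(f"{decodable} of {len(samples)} random 128-byte strings are valid UTF-8")
```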
The solution is to treat the output of the encryption process as a byte string: do not attempt to decode it to a string, as it will not make sense as a string. Python provides the bytes/str distinction precisely for this kind of case: bytes are for binary data (e.g. encrypted data), strings are for textual data.
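For dumping binary data (as byte strings) into JSON, I suggest using an encoding like Base64 to encode the bytes into ASCII, rather than forcing the raw bytes into a string with something like `str(encrypted_data)[2:-1]`. This will be more efficient and much easier to debug, and the receiver can recover the exact original bytes by Base64-decoding the field before decrypting.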
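A minimal sketch of the Base64 approach, assuming the same message layout as in the question. The function names `pack_message` and `unpack_ciphertext` and the id `"client-1"` are hypothetical, and `os.urandom(128)` stands in for the bytes you would get back from `rsa.encrypt`.

```python
import base64
import json
import os


def pack_message(sender_id: str, ciphertext: bytes) -> bytes:
    """Build the JSON message, carrying the ciphertext as Base64 text."""
    payload = base64.b64encode(ciphertext).decode("ascii")  # always valid ASCII
    query = {"Type": "Message", "Message": {"Id": sender_id, "Data": payload}}
    return json.dumps(query).encode("utf-8")


def unpack_ciphertext(wire: bytes) -> bytes:
    """Reverse of pack_message: return the original ciphertext bytes."""
    query = json.loads(wire.decode("utf-8"))
    return base64.b64decode(query["Message"]["Data"])


# Stand-in for the output of rsa.encrypt(...): arbitrary binary data.
ciphertext = os.urandom(128)

wire = pack_message("client-1", ciphertext)
recovered = unpack_ciphertext(wire)
print(recovered == ciphertext)
```

On the receiving side you would pass the recovered bytes to the decryption call; no `.decode()` on the ciphertext itself is ever needed.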