How can I read a byte array file of strings?

Question

There is a file with the following contents:

This is my attempt to read the lines and convert them to readable UTF-8 characters, but it still shows the same strings in the output file:

The output file is:

As you can see, the problem exists for the input line but not for the target and prediction lines (those are scrambled, but that's okay).

Accepted Answer

It seems someone wrote the bytes in the wrong way: they used str(bytes) instead of bytes.decode('utf-8'). Or maybe the code was created for Python 2, which treats bytes and strings differently than Python 3.

If you can't correct the code which writes the file, then you have to fix the text:

    text = "b'oEffect:PersonX \xd8\xaf\xd8\xb1 \xd8\xac\xd9\x86\xda\xaf ___ \xd8\xa8\xd8\xa7\xd8\xb2\xdb\x8c \xd9\x85\xdb\x8c \xda\xa9\xd9\x86\xd8\xaf'"

    # crop the leading b' and the trailing '
    text = text[2:-1]

    # convert back to bytes using the special encoding 'raw_unicode_escape'
    text = text.encode('raw_unicode_escape')

    # and convert to a string correctly
    text = text.decode()

    print(text)

And now print(text) gives me

    oEffect:PersonX در جنگ ___ بازی می کند

EDIT:

It seems the file has the codes converted to strings with double backslashes, like b'\\xd8'. print() may display this with a single backslash, while print(repr()) may show it with double backslashes. It may need one more decode/encode round to convert it correctly.

    text = "b'xNeed:PersonX \\xd8\\xaf\\xd8\\xb1 \\xd8\\xac\\xd9\\x86\\xda\\xaf'"

    print(repr(text))
    print(text)

    # crop the leading b' and the trailing '
    text = text[2:-1]

    # turn the literal backslash escapes into single characters
    text = text.encode('raw_unicode_escape')
    text = text.decode('unicode_escape')

    # then convert those characters to bytes and decode them as UTF-8
    text = text.encode('raw_unicode_escape')
    text = text.decode()

    print(text)
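For a whole file, the same repair could be applied line by line. The following is only a minimal sketch, not part of the original answer: it assumes every damaged value is a complete Python bytes literal (b'...') with escaped backslashes, as in the EDIT case, possibly surrounded by other text on the line. It swaps the manual slicing and raw_unicode_escape round trip for ast.literal_eval, which parses the literal back into bytes. The names repair_line and repair_file, and the file paths passed to them, are hypothetical.

```python
import ast
import re

# Matches a Python-style bytes literal such as b'...' or b"..." inside a line.
BYTES_LITERAL = re.compile(r"\bb(['\"])(?:\\.|(?!\1)[^\\])*\1")


def _decode_match(match):
    """Turn one b'...' literal back into text, or leave it untouched on failure."""
    literal = match.group(0)
    try:
        value = ast.literal_eval(literal)  # parse the literal back into bytes
        return value.decode("utf-8")       # assumption: the bytes are UTF-8
    except (ValueError, SyntaxError, UnicodeDecodeError):
        return literal


def repair_line(line):
    """Replace every bytes literal found in `line` with its decoded text."""
    return BYTES_LITERAL.sub(_decode_match, line)


def repair_file(src_path, dst_path):
    """Copy src_path to dst_path, repairing each line (hypothetical helper)."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(repair_line(line))


# Same damaged value as in the EDIT example: literal backslashes in the text.
damaged = "b'xNeed:PersonX \\xd8\\xaf\\xd8\\xb1 \\xd8\\xac\\xd9\\x86\\xda\\xaf'"
repaired = repair_line(damaged)
```

Lines that contain no bytes literal pass through unchanged, so already-correct target and prediction lines should not be affected. A literal that contains raw non-ASCII characters (the first case in the answer) is not valid for ast.literal_eval and is left as-is; the raw_unicode_escape approach is still needed there.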

Answer