How can I read a byte array file of strings?

There is a file with the following contents:

JavaScript

This is my attempt to read the lines and convert them to readable UTF-8 characters, but the output file still shows the same strings:

JavaScript

The output file is:

JavaScript

As you can see, the problem occurs for the input lines but not for the target and prediction lines (those are scrambled, but that's okay).

Answer

It seems someone wrote the bytes the wrong way: they used str(bytes) instead of bytes.decode('utf-8'). Or maybe the code was written for Python 2, which treats bytes and strings differently than Python 3.
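To illustrate the difference, here is a minimal sketch. The sample word is an assumption chosen only for demonstration; it is not taken from the asker's file.

```python
# Hypothetical sample text (not from the original file), written with
# escapes so the snippet does not depend on the source file's encoding.
text = "\u0633\u0644\u0627\u0645"

data = text.encode("utf-8")      # real bytes: b'\xd8\xb3\xd9\x84...'

# Wrong: str() on bytes returns the *repr* of the bytes object,
# i.e. text that starts with b' and contains literal \x escapes.
wrong = str(data)

# Right: decode the bytes back into the original string.
right = data.decode("utf-8")

print(wrong)
print(right == text)
```

Writing the value of wrong to a file would likely produce lines that look like the ones in the question.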


If you can't correct the code which writes it, then you have to fix the text:

JavaScript

Crop the leading b' and the trailing '

JavaScript

Convert back to bytes using the special encoding 'raw_unicode_escape'

JavaScript

And decode to a string correctly

JavaScript

And now

JavaScript

gives me

JavaScript
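The steps above can be sketched as follows. The sample line is an assumption made for illustration: it is a string whose characters are in the range U+0080 to U+00FF, one character per original byte, wrapped in b'...'.

```python
# Hypothetical input line: each \xNN here is a single *character*
# (U+00D8, U+00B3, ...), not a byte and not a literal backslash.
line = "b'\xd8\xb3\xd9\x84\xd8\xa7\xd9\x85'"

# 1. Crop the leading b' and the trailing '
inner = line[2:-1]

# 2. Convert back to bytes; 'raw_unicode_escape' maps characters
#    below U+0100 to the byte with the same value.
raw = inner.encode("raw_unicode_escape")

# 3. Decode the recovered bytes as UTF-8.
fixed = raw.decode("utf-8")

print(fixed)
```

If the line contains literal backslashes instead of these characters, this sketch alone is not enough and an extra decoding step is needed.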

EDIT:

It seems the file has the byte codes stored as text with double backslashes, like b'\\xd8'. print() may display this with a single backslash, while print(repr()) shows it with double backslashes.

It may need more decode/encode steps to convert it correctly.

JavaScript
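For the case where the file holds literal backslash escapes, one possible chain of extra decode/encode steps is sketched below. The sample line is an assumption, and ast.literal_eval is shown as an alternative that was not part of the original answer.

```python
import ast

# Hypothetical line as read from the file: the backslashes are
# literal characters (raw string), so repr() shows them doubled.
line = r"b'\xd8\xb3\xd9\x84\xd8\xa7\xd9\x85'"

inner = line[2:-1]                               # crop b' and '

# Option 1: codec chain.
escaped = inner.encode("raw_unicode_escape")     # ASCII bytes: \ x d 8 ...
chars = escaped.decode("unicode_escape")         # interpret \xNN escapes
raw = chars.encode("raw_unicode_escape")         # characters -> real bytes
fixed = raw.decode("utf-8")

# Option 2: let Python parse the whole bytes literal safely.
fixed_alt = ast.literal_eval(line).decode("utf-8")

print(fixed, fixed_alt)
```

Option 2 only works when the line is a valid Python bytes literal, so the codec chain may be the safer fallback for partially damaged lines.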
User contributions licensed under: CC BY-SA