Skip to content
Advertisement

What is the difference between a string and a byte string?

I am working with a library which returns a “byte string” (bytes) and I need to convert this to a string.

Is there actually a difference between those two things? How are they related, and how can I do the conversion?

Advertisement

Answer

Assuming Python 3 (in Python 2, this difference is a little less well-defined) – a string is a sequence of characters, ie unicode codepoints; these are an abstract concept, and can’t be directly stored on disk. A byte string is a sequence of, unsurprisingly, bytes – things that can be stored on disk. The mapping between them is an encoding – there are quite a lot of these (and infinitely many are possible) – and you need to know which applies in the particular case in order to do the conversion, since a different encoding may map the same bytes to a different string:

>>> b'xcfx84oxcfx81xcexbdoxcfx82'.decode('utf-16')
'蓏콯캁澽苏'
>>> b'xcfx84oxcfx81xcexbdoxcfx82'.decode('utf-8')
'τoρνoς'

Once you know which one to use, you can use the .decode() method of the byte string to get the right character string from it as above. For completeness, the .encode() method of a character string goes the opposite way:

>>> 'τoρνoς'.encode('utf-8')
b'xcfx84oxcfx81xcexbdoxcfx82'
User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement