I made a list of strings like this:
>>> example = ["\0","\1","\2","\3","\4","\5","\6","\7","\8","\9","\10","\11"]
But if I try to print it, I get a very different looking result:
>>> print(example)
['\x00', '\x01', '\x02', '\x03', '\x04', '\x05', '\x06', '\x07', '\\8', '\\9', '\x08', '\t']
Why does this happen? Does the \ character have some special meaning here?
Answer
The backslash is used to escape special (unprintable) characters in string literals: \n is a newline, \t a tab, \f a form feed (rarely used), and several more exist.
When you write the string literal "\0", you denote a string with exactly one character, the (unprintable) NUL character (a 0-byte). You can represent this as \0 in string literals. The same goes for \1 (which is a 1-byte in a string) and so on.
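As a quick check of the point above, here is a minimal sketch (the variable names nul and codes are only for illustration) showing that each of these literals is one character whose code is the number written after the backslash:

```python
# Each literal below is a single character, not a backslash plus a digit.
nul = "\0"
print(len(nul))    # 1
print(ord(nul))    # 0

# The character code equals the octal value written after the backslash.
codes = [ord(c) for c in ["\0", "\1", "\7"]]
print(codes)       # [0, 1, 7]
```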
Actually, the \8 and \9 are different because the digits after a backslash give the value of the byte you want in octal notation, i.e. using the digits 0 … 7 only. So the backslash before the 8 and before the 9 has no special meaning, and \8 results in two characters, namely a verbatim backslash followed by a verbatim digit 8.
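The two-character result can be checked directly. The sketch below is only an illustration: it keeps the literal in a raw string and evaluates it with warnings silenced, because recent Python versions emit a warning for unrecognized escapes such as \8 (the names source and value are made up for this example).

```python
import warnings

# The source text of the literal "\8", held in a raw string so that this
# file itself contains no unrecognized escape sequence.
source = r'"\8"'

with warnings.catch_warnings():
    # Recent Python versions warn about unrecognized escapes; silence that here.
    warnings.simplefilter("ignore")
    value = eval(source)

print(len(value))       # 2
print(list(value))      # ['\\', '8']
print(value == "\\8")   # True
```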
When you now print the representation of such a string (e.g. by having it in a list you print), the Python interpreter recreates a representation of the internal string that is supposed to look like a string literal. This is not the string's contents but the version of the string as you would write it in a Python program, i.e. enclosed in quotes and using backslashes to escape special characters. The Python interpreter doesn't represent special characters using octal notation, though. It uses hexadecimal notation instead, which introduces each special character with \x followed by exactly two hexadecimal digits.
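To see the difference between a string's contents and its representation, a small sketch (the names s and items are hypothetical) can compare the two. Printing a list shows the representation of each element, which is why the escapes appear in the question's output:

```python
s = "\0"

# The string holds one character, but its representation is six characters
# long: quote, backslash, x, 0, 0, quote.
print(len(s))          # 1
print(repr(s))         # '\x00'
print(len(repr(s)))    # 6

# A list prints the repr() of each element, not the raw contents.
items = ["\0", "\11"]
print(items)           # ['\x00', '\t']
```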
That means that \0 becomes \x00, \1 becomes \x01, etc. The \8, as mentioned, is in fact two characters, namely the backslash and the digit 8. The Python interpreter then escapes the backslash as a double backslash \\, and the 8 is appended as a normal character.
The input \10 is the character with value 8 (because octal 10 is decimal 8 and also hexadecimal 8; look up octal and hexadecimal numbers to learn about that). So the input \10 becomes \x08. The \11 is the character with value 9, which is the tab character, for which a special notation exists: \t.
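The octal arithmetic for the last two elements can be verified with int() and a base of 8. This is just a sketch; the names eight and nine are invented for the example:

```python
# Interpret the digits after the backslash as base-8 numbers.
eight = int("10", 8)
nine = int("11", 8)
print(eight, nine)                  # 8 9

# The octal escapes denote the characters with exactly those codes.
print("\10" == chr(eight))          # True
print("\10" == "\x08")              # True
print("\11" == "\t")                # True
print(repr("\10"), repr("\11"))     # '\x08' '\t'
```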