Decode UUID 4 as a Python string

I would like to generate a UUID v4 string using the uuid module from the Python standard library.

I know I can convert a UUID to a string with str(uuid.uuid4()), but I am trying to understand what the bytes in that class instance mean. When I try to decode those bytes I run into all sorts of problems: either the string is not the one I expect, or an exception is raised. I think these bytes are UTF-16 encoded, as per the documentation here: https://docs.python.org/3/library/uuid.html#uuid.UUID.bytes

UUID instances have these read-only attributes:

UUID.bytes The UUID as a 16-byte string (containing the six integer fields in big-endian byte order).

However, what I get by decoding those bytes is not the UUID string I get when converting with str. Why is this happening?

>>> import uuid
>>> my_uuid = uuid.uuid4()
>>> str(my_uuid)
'3f5017be-a314-4bb2-92c0-5135b47f8c45'
>>> my_uuid.bytes.decode('latin1')
'?P\x17¾£\x14K²\x92ÀQ5´\x7f\x8cE'
>>> my_uuid.bytes.decode('utf-8', 'ignore')
'?P\x17\x14KQ5\x7fE'
>>> my_uuid.bytes.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbe in position 3: invalid start byte
>>> my_uuid.bytes.decode('utf-16')
'倿븗ᒣ뉋삒㕑羴䖌'
>>> my_uuid.bytes_le.decode('utf-16')
'ើ㽐ꌔ䮲삒㕑羴䖌'

Answer

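Decoding is for bytes that represent text. UUID.bytes is not encoded text in UTF-16 or any other encoding; it is the raw 128-bit value packed as a binary structure, so do not try to decode it. The "16-byte string" in the documentation means a bytes object, not a str. To inspect the bytes, use .hex():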

import uuid
u = uuid.uuid4()
print(u)
print(u.bytes.hex(' '))
print([hex(n) for n in u.fields])

Output:

6ea36117-e3d1-464a-94ee-1571104650a5
6e a3 61 17 e3 d1 46 4a 94 ee 15 71 10 46 50 a5
['0x6ea36117', '0xe3d1', '0x464a', '0x94', '0xee', '0x1571104650a5']
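
As a rough sketch of how the bytes relate to the string form: the canonical string is the hex of the 16 bytes, split into groups of 8, 4, 4, 4, and 12 hex digits. The helper name uuid_bytes_to_str below is made up for illustration and is not part of the uuid module; the value is the one from the output above.

```python
import uuid


def uuid_bytes_to_str(raw: bytes) -> str:
    """Format 16 raw UUID bytes as the canonical 8-4-4-4-12 hex string.

    Hypothetical helper for illustration; str(uuid.UUID(bytes=raw)) does
    the same job in real code.
    """
    if len(raw) != 16:
        raise ValueError("a UUID is exactly 16 bytes")
    h = raw.hex()  # 32 lowercase hex digits, big-endian byte order
    return "-".join((h[0:8], h[8:12], h[12:16], h[16:20], h[20:32]))


u = uuid.UUID("6ea36117-e3d1-464a-94ee-1571104650a5")

# The hand-formatted bytes match what str() produces.
assert uuid_bytes_to_str(u.bytes) == str(u)

# The supported way to go from raw bytes back to a UUID (and its string).
assert str(uuid.UUID(bytes=u.bytes)) == str(u)
```

In practice str(u) or uuid.UUID(bytes=...) is all that is needed; the manual version is only there to show that no text decoding is involved.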

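See this Raymond Chen “Old New Thing” blog article about GUIDs for more information, including why the 6 integer fields are printed as 5 groups: the two one-byte clock sequence fields (0x94 and 0xee above) are written together as the fourth group, 94ee.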
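
To make the six-fields-to-five-groups mapping concrete, here is a small sketch that rebuilds the string from u.fields. The variable names follow the field names in the uuid documentation; the format string is just one way to write it, assuming the same UUID as in the output above.

```python
import uuid

u = uuid.UUID("6ea36117-e3d1-464a-94ee-1571104650a5")

# The six integer fields, in the order uuid.UUID.fields returns them.
(time_low, time_mid, time_hi_version,
 clock_seq_hi_variant, clock_seq_low, node) = u.fields

# The two one-byte clock_seq fields share the fourth group, which is why
# six fields show up as only five hyphen-separated groups.
text = "%08x-%04x-%04x-%02x%02x-%012x" % (
    time_low, time_mid, time_hi_version,
    clock_seq_hi_variant, clock_seq_low, node,
)

assert text == str(u)
print(text)
```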

User contributions licensed under: CC BY-SA