I would like to generate a UUID v4 string using the uuid module
from the Python standard library.
I know I can convert a UUID to a string with str(uuid.uuid4()); however, I am trying to understand what the bytes in that class instance mean. When I try to decode those bytes I get all sorts of errors: either the string is not the one I expect, or an exception is thrown. I think these bytes are UTF-16 encoded, as per the documentation here: https://docs.python.org/3/library/uuid.html#uuid.UUID.bytes
UUID instances have these read-only attributes:
UUID.bytes The UUID as a 16-byte string (containing the six integer fields in big-endian byte order).
However, what I get from those fields is not the UUID I get when casting to str. Why is this happening?
>>> import uuid
>>> my_uuid = uuid.uuid4()
>>> str(my_uuid)
'3f5017be-a314-4bb2-92c0-5135b47f8c45'
>>> my_uuid.bytes.decode('latin1')
'?P\x17¾£\x14K²\x92ÀQ5´\x7f\x8cE'
>>> my_uuid.bytes.decode('utf-8', 'ignore')
'?P\x17\x14KQ5\x7fE'
>>> my_uuid.bytes.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbe in position 3: invalid start byte
>>> my_uuid.bytes.decode('utf-16')
'倿븗ᒣ뉋삒㕑羴䖌'
>>> my_uuid.bytes_le.decode('utf-16')
'ើ㽐ꌔ䮲삒㕑羴䖌'
Answer
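Decoding is for bytes that represent text. UUID.bytes is a packed binary structure, not encoded text, so do not try to decode it. The "16-byte string" in the documentation means a bytes object of length 16, not UTF-16 text. To inspect the bytes, use .hex():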
import uuid
u = uuid.uuid4()
print(u)
print(u.bytes.hex(' '))
print([hex(n) for n in u.fields])
Output:
6ea36117-e3d1-464a-94ee-1571104650a5
6e a3 61 17 e3 d1 46 4a 94 ee 15 71 10 46 50 a5
['0x6ea36117', '0xe3d1', '0x464a', '0x94', '0xee', '0x1571104650a5']
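To connect the raw bytes back to the familiar string, here is a minimal sketch. It reuses the UUID from the question so the result is reproducible, and the variable names are only illustrative. It lays out the hex digits in the 8-4-4-4-12 pattern and round-trips the bytes through uuid.UUID(bytes=...):

```python
import uuid

# The UUID from the question, used here so the result is reproducible.
u = uuid.UUID("3f5017be-a314-4bb2-92c0-5135b47f8c45")

# The 16 raw bytes rendered as 32 hex digits.
h = u.bytes.hex()

# str(u) is those same digits with hyphens in an 8-4-4-4-12 layout.
rebuilt = "-".join([h[0:8], h[8:12], h[12:16], h[16:20], h[20:32]])
print(rebuilt)

# Going from raw bytes back to a UUID object uses the bytes= keyword.
same = uuid.UUID(bytes=u.bytes)
print(same == u)
```

So the string form is a hex rendering of the bytes, not a text decoding of them, which is why no codec gives the expected result.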
See this Raymond Chen “Old New Thing” blog article about GUIDs for more information, including why the six integer fields are printed as five groups.
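As a rough illustration of the six-fields-versus-five-groups point, the sketch below rebuilds the string from u.fields for the UUID printed in the output above. It is not taken from the linked article, and the variable names are only illustrative. The fourth and fifth fields are one byte each and share a single printed group:

```python
import uuid

# The UUID shown in the output above.
u = uuid.UUID("6ea36117-e3d1-464a-94ee-1571104650a5")

# UUID.fields is a tuple of six integers.
time_low, time_mid, time_hi_version, clock_seq_hi, clock_seq_low, node = u.fields

# Each field is zero-padded to its width in hex digits; the two
# one-byte clock_seq fields are concatenated into the fourth group.
groups = [
    f"{time_low:08x}",
    f"{time_mid:04x}",
    f"{time_hi_version:04x}",
    f"{clock_seq_hi:02x}{clock_seq_low:02x}",
    f"{node:012x}",
]
text = "-".join(groups)
print(text)
```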