After reading this blog post https://blog.starkandwayne.com/2015/05/23/uuid-primary-keys-in-postgresql/, I wanted to know more about how Django generates UUIDs, because I am using them as my primary keys. Well, according to the docs, https://docs.djangoproject.com/es/1.9/ref/models/fields/#uuidfield, Django relies on the Python uuid module https://docs.python.org/3/library/uuid.html#uuid.UUID. But there are many kinds of UUID, and it is not at all clear to me which one is being generated in Django, or how to choose, assuming a choice is available.
Finally, given the fragmentation issue pointed out in the blog post, and assuming uuid_generate_v1mc is not available directly in Python or Django, is there a way to force them to use it?
Answer
How do Django and/or Python generate a UUID in PostgreSQL?
But there are many kinds of UUID, and it is not at all clear to me which one is being generated in Django
When you use UUIDField as a primary key in Django, it doesn’t generate a UUID for you; you generate it yourself before you save the object.
I don’t know if things have changed since, but the last time I used a UUIDField, you had to specify the UUID value yourself (i.e. when you create the object, Django won’t let you save an object with a blank UUID and have the database generate one). Looking at the Django documentation samples reinforces my thought, because they provide default=uuid.uuid4 (the function itself, without parentheses) for the primary key:
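```python
import uuid
from django.db import models


class MyUUIDModel(models.Model):
    # Pass the function itself (no parentheses); Django calls uuid.uuid4() for each new object.
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
```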
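The parentheses matter. Django calls a callable default every time a new object is created, whereas a plain value is reused as-is. The sketch below imitates that rule with a hypothetical helper, resolve_default (an illustration of the idea, not Django’s own code), to show why default=uuid.uuid4() would hand every row the same UUID.

```python
import uuid


def resolve_default(default):
    """Hypothetical helper: call the default if it is callable, else reuse the value.

    Mirrors the rule Django documents for field defaults; not Django's actual code.
    """
    return default() if callable(default) else default


# default=uuid.uuid4 -> the function is called per object, so each gets a fresh UUID.
callable_ids = [resolve_default(uuid.uuid4) for _ in range(3)]

# default=uuid.uuid4() -> evaluated once, up front; every object would share it
# (which, for a primary key, means a clash on the second insert).
frozen_default = uuid.uuid4()
frozen_ids = [resolve_default(frozen_default) for _ in range(3)]

print(len(set(callable_ids)))  # 3
print(len(set(frozen_ids)))    # 1
```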
Which UUID version to choose
For a comparison of the properties of the different UUID versions, please see this question: Which UUID version to use?
For a lot of applications, UUID4 is just fine
If you just want to generate a UUID and get on with your life, uuid.uuid4(), as used in the snippet above, is just fine. UUID4 is a random UUID (122 of its 128 bits are random), and the chance of a collision is so remote that you don’t really need to worry about it unless you generate an enormous number of them.
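To put “so remote” into numbers: a version 4 UUID fixes 6 of its 128 bits (4 version bits and 2 variant bits), leaving 122 random bits, so the usual birthday-bound approximation p ≈ 1 − exp(−n² / 2¹²³) estimates the chance of any collision among n UUIDs. A small sketch, assuming a perfect random source:

```python
import math
import uuid

u = uuid.uuid4()
print(u.version)                   # 4
print(u.variant == uuid.RFC_4122)  # True

# 128 bits minus 4 version bits and 2 variant bits.
RANDOM_BITS = 122


def collision_probability(n: int, bits: int = RANDOM_BITS) -> float:
    """Birthday-bound approximation of P(at least one collision among n values)."""
    x = (n * n) / (2 * 2**bits)
    # -expm1(-x) == 1 - exp(-x), but stays accurate when x is tiny.
    return -math.expm1(-x)


for n in (10**6, 10**9, 10**12):
    print(f"{n:>17,} UUIDs -> p(collision) ~ {collision_probability(n):.1e}")
```

Even a trillion UUIDs stays far below a one-in-a-billion chance under this approximation.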
Finally, given the fragmentation issue pointed out in the blog post, and assuming uuid_generate_v1mc is not available directly in Python or Django, is there a way to force them to use it?
A Python UUID1 with a random MAC address, like uuid-ossp’s uuid_generate_v1mc
The blog you linked mentions the use of UUID1. Python’s uuid.uuid1() takes an optional node parameter that is used instead of the machine’s real hardware MAC address (48 bits). Because these bits form the end of the UUID1, you can make them random while the first bits of the UUID1 remain timestamp-based (and thus roughly sequential), which limits index fragmentation.
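You can check this layout from Python. The sketch below passes a random 48-bit integer as node (random.getrandbits(48) is just one way to produce it), then reads the fields back: the node occupies the last 48 bits of the UUID, and the time attribute is the 60-bit timestamp, counted in 100-nanosecond intervals since 1582-10-15 as RFC 4122 specifies.

```python
import datetime
import random
import uuid

# RFC 4122 version 1 timestamps count 100-ns intervals from the Gregorian epoch.
GREGORIAN_EPOCH = datetime.datetime(1582, 10, 15, tzinfo=datetime.timezone.utc)

node = random.getrandbits(48)        # stand-in for a hardware MAC address
u = uuid.uuid1(node=node)

print(u.version)                     # 1
print(u.node == node)                # True
print(u.int & (2**48 - 1) == node)   # True: the node is the low 48 bits

# Convert the embedded timestamp back to a datetime (100 ns -> microseconds).
created = GREGORIAN_EPOCH + datetime.timedelta(microseconds=u.time // 10)
print(created)                       # roughly the current time, in UTC
```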
So uuid.uuid1(random_48_bits) should get you results similar to uuid_generate_v1mc, which is a UUID1 with a random MAC address.
To generate 48 random bits, as a dummy example, we can use:

```python
import random

# Any integer in [0, 2**48 - 1] fits in the 48-bit node field.
random_48_bits = random.randint(0, 2**48 - 1)
```
Try it:
```python
>>> import uuid
>>> import random
>>> 2 ** 48 - 1
281474976710655
>>> uuid.uuid1(random.randint(0, 281474976710655))
UUID('c5ecbde1-cbf4-11e5-a759-6096cb89d9a5')
```
Now make a function out of it, and use it as the default for your Django UUIDField.
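A minimal sketch of such a function, assuming you want to mirror uuid_generate_v1mc closely: PostgreSQL’s uuid-ossp documentation describes it as using a random multicast MAC address, so the hypothetical helper uuid1_random_node below sets the multicast bit (the least significant bit of the first octet, i.e. bit 40 of the 48-bit node, the same bit Python’s uuid.getnode() sets when it falls back to a random node) on top of random bits from the secrets module.

```python
import secrets
import uuid

# Multicast bit: least significant bit of the first (most significant) octet
# of the 48-bit node, i.e. bit 40 of the node integer.
MULTICAST_BIT = 1 << 40


def uuid1_random_node() -> uuid.UUID:
    """Hypothetical helper approximating uuid-ossp's uuid_generate_v1mc.

    Builds a version 1 (timestamp-based) UUID whose node field is 48 random
    bits with the multicast bit set, instead of the machine's real MAC address.
    """
    node = secrets.randbits(48) | MULTICAST_BIT
    return uuid.uuid1(node=node)


a = uuid1_random_node()
b = uuid1_random_node()
print(a.version)                     # 1
print(bool(a.node & MULTICAST_BIT))  # True
print(a != b)                        # True
```

In the model, pass the function itself, default=uuid1_random_node (no parentheses), just like default=uuid.uuid4 above. Define it at module level: Django’s migrations serialize callable defaults by reference and cannot serialize lambdas.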
Custom UUIDs, and an example from Instagram
Note that it’s totally fine to come up with your own UUID scheme and use the available bits to encode information that is useful to your application.
For example, you may use a few bits to encode the country of a given user, a few bits for a timestamp, some bits for randomness, etc.
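As a sketch of such a scheme, assuming a purely hypothetical layout of 48 bits of millisecond timestamp, a 16-bit application-defined country code and 64 random bits: putting the timestamp first keeps new values roughly in creation order, the same property that helps with index fragmentation. The function names and the numeric country code are made up for illustration. The result carries no RFC 4122 version/variant bits, but PostgreSQL’s uuid type will store any 128-bit value.

```python
import secrets
import time
import uuid
from typing import Optional, Tuple

# Hypothetical 128-bit layout, most significant bits first:
#   48 bits  milliseconds since the Unix epoch
#   16 bits  application-defined country code
#   64 bits  random
# No RFC 4122 version/variant bits are set.


def make_custom_uuid(country_code: int, now_ms: Optional[int] = None) -> uuid.UUID:
    if not 0 <= country_code < 2**16:
        raise ValueError("country_code must fit in 16 bits")
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    value = (now_ms & (2**48 - 1)) << 80  # timestamp in the top 48 bits
    value |= country_code << 64           # country code in the next 16 bits
    value |= secrets.randbits(64)         # randomness in the low 64 bits
    return uuid.UUID(int=value)


def decode_custom_uuid(u: uuid.UUID) -> Tuple[int, int]:
    """Return (timestamp_ms, country_code) from a value built by make_custom_uuid."""
    return u.int >> 80, (u.int >> 64) & 0xFFFF


u = make_custom_uuid(country_code=250, now_ms=1_700_000_000_000)
print(decode_custom_uuid(u))  # (1700000000000, 250)
```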
You may want to read how Instagram (built on Django and PostgreSQL) cooked up their own ID scheme (64-bit integers rather than standard UUIDs) to help with sharding.
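For comparison, the layout Instagram described for its IDs is, as far as I know, a 64-bit integer: 41 bits of milliseconds since a custom epoch, 13 bits of logical shard ID and 10 bits of a per-shard sequence number (taken modulo 1024). They generate these in the database itself; the Python below only sketches the bit packing, and the epoch value and function names are hypothetical.

```python
import time
from typing import Optional, Tuple

TIMESTAMP_BITS = 41
SHARD_BITS = 13
SEQUENCE_BITS = 10

# Hypothetical custom epoch: 2020-01-01T00:00:00Z in milliseconds.
CUSTOM_EPOCH_MS = 1_577_836_800_000


def make_sharded_id(shard_id: int, sequence: int, now_ms: Optional[int] = None) -> int:
    """Pack time, shard and sequence into one 64-bit integer (hypothetical helper)."""
    if not 0 <= shard_id < 2**SHARD_BITS:
        raise ValueError("shard_id must fit in 13 bits")
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    elapsed = now_ms - CUSTOM_EPOCH_MS
    if not 0 <= elapsed < 2**TIMESTAMP_BITS:
        raise ValueError("timestamp outside the 41-bit range after the epoch")
    return (
        (elapsed << (SHARD_BITS + SEQUENCE_BITS))  # time in the top 41 bits
        | (shard_id << SEQUENCE_BITS)              # shard in the next 13 bits
        | (sequence % 2**SEQUENCE_BITS)            # sequence in the low 10 bits
    )


def split_sharded_id(value: int) -> Tuple[int, int, int]:
    """Return (timestamp_ms, shard_id, sequence) from a packed ID."""
    sequence = value & (2**SEQUENCE_BITS - 1)
    shard_id = (value >> SEQUENCE_BITS) & (2**SHARD_BITS - 1)
    elapsed = value >> (SHARD_BITS + SEQUENCE_BITS)
    return elapsed + CUSTOM_EPOCH_MS, shard_id, sequence


packed = make_sharded_id(shard_id=5, sequence=1025, now_ms=CUSTOM_EPOCH_MS + 1000)
print(split_sharded_id(packed))  # (1577836801000, 5, 1)
```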