Skip to content
Advertisement

Querying a Django model and comparing its fields for similarities or near duplicates

I have a model that records changes to other models in my database.

I would like to query for changes that only add a newline character — n.

My changes model looks like this:

class Change(models.Model):
    object_type = models.ForeignKey(
        ContentType,
        related_name="object_change_set",
        on_delete=models.CASCADE,
    )
    object_id = models.CharField(max_length=255, db_index=True)
    object = GenericForeignKey("object_type", "object_id")
    old_fields = models.JSONField(encoder=JSONEncoder, null=True, blank=True)  
    new_fields = models.JSONField(encoder=JSONEncoder, null=True, blank=True)  

Essentially, I want to find instances of Change where the difference between a value for a key in old_fields and new_fields is n.

Here’s what new_fields look like:

{
    "body": "Had a great time on Wednesday.",
    "created": "2022-06-15T19:49:06.784Z",
}

And similarly, old_fields:

{
    "body": "Had a great time on Wednesday.n",
    "created": "2022-06-15T19:49:06.784Z",
}

Note that new_fields and old_fields are both JSONFields.

Ultimately, I want to remove the Change instances that only add n to the body.

I realize that I could do this by iterating over the queryset and looking for these discrepancies, but I’m wondering if there’s a more elegant solution.

Advertisement

Answer

You can use the __startswith (along with an F() expression) and __endswith filters to check if the old_fields body starts with the new_fields body and ends with a :

from django.db.models import F

queryset = Change.objects.filter(
    old_fields__body__startswith=F("new_fields__body"), old_fields__body__endswith="n"
)

This would, however, still match if the old_fields body had any string between the end of the new_fields body and the ending line break (e.g. "Had a great time on Wednesday.aaaaaaaaaaaaaaaaaan"). To fix that, you can also annotate the queryset with a Length function and check if the old_fields body’s length is equal to Length("new_fields__body") + 1:

from django.db.models import F
from django.db.models.functions import Length

queryset = Change.objects.annotate(old_body_len=Length("old_fields__body")).filter(
    old_fields__body__startswith=F("new_fields__body"),
    old_fields__body__endswith="n",
    old_body_len=Length("new_fields__body") + 1,
)

This should work!

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement