How to extract all occurrences of a JSON object that share a duplicate key:value pair?

I am writing a Python script that reads a large JSON file containing data from an API and iterates through all the objects. I want to extract every object whose value for a specific key also appears in another object (a duplicate “key:value” pair), and save those objects to a separate JSON file.

It almost works. The one flaw I cannot fix is that it skips the first occurrence of each duplicated object and never adds it to my dupObjects list. I have an OrderedDict keeping track of unique objects and a regular list for duplicate objects. I know this means that when I add the second occurrence I must also add the first (unique) object, but how would I write a conditional statement that does this only once per unique object?

This is my code at the moment:

from collections import OrderedDict
import json

with open('input.json') as data:
    data = json.load(data)

uniqueObjects = OrderedDict()
dupObjects = list()

for d in data:
    value = d["key"]

    if value in uniqueObjects:
        # dupObjects.append(uniqueObjects[value])
        dupObjects.append(d)

    if value not in uniqueObjects:
        uniqueObjects[value] = d

with open('duplicates.json', 'w') as g:
    json.dump(dupObjects, g, indent=4)

The commented-out line is where I tried to add the stored object from the OrderedDict to my list, but that adds it once for every duplicate found. I only want it added one time.

Edit:

There are several unique objects that have duplicates. I’m looking for some conditional statement that can add the first occurrence of an object that has duplicates, once per unique object.

Answer

In this line you forgot .keys(), so the values you need are skipped:

if value in uniqueObjects.keys():

And in this line:

if value not in uniqueObjects.keys():

Edit #1

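My mistake :) A membership test on a dict already checks its keys, so adding .keys() changes nothing. What you need is to add the first occurrence from uniqueObjects inside the first if, and mark it so that it is added only once: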

if value in uniqueObjects:
    if uniqueObjects[value] != -1:
        dupObjects.append(uniqueObjects[value])
        uniqueObjects[value] = -1
    dupObjects.append(d)
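
For reference, the same idea can be written as a complete, runnable sketch. The function name extract_duplicates, the sample records, and their "id" field are made up for illustration, and a dedicated sentinel object stands in for -1 so that it can never collide with real data. It works on an in-memory list; in the question's script that list would come from json.load.

```python
from collections import OrderedDict


def extract_duplicates(records, key="key"):
    """Return every record whose value for `key` occurs more than once."""
    already_copied = object()  # sentinel; plays the role of -1 above
    first_seen = OrderedDict()
    duplicates = []

    for record in records:
        value = record[key]
        if value in first_seen:
            # Copy the stored first occurrence exactly once per value.
            if first_seen[value] is not already_copied:
                duplicates.append(first_seen[value])
                first_seen[value] = already_copied
            duplicates.append(record)
        else:
            first_seen[value] = record

    return duplicates


# Hypothetical sample data: "a" appears three times, "b" twice, "c" once.
sample = [
    {"key": "a", "id": 1},
    {"key": "b", "id": 2},
    {"key": "a", "id": 3},
    {"key": "c", "id": 4},
    {"key": "a", "id": 5},
    {"key": "b", "id": 6},
]

result = extract_duplicates(sample)
ids = [record["id"] for record in result]
print(ids)
```

Each first occurrence is appended at the moment its first duplicate is found, so the output follows the order in which duplicates are discovered rather than the order of the input.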

Edit #2

Try this option; it writes only the first occurrence of each duplicated object to dupObjects:

if value in uniqueObjects:
    if uniqueObjects[value] != -1:
        dupObjects.append(uniqueObjects[value])
        uniqueObjects[value] = -1
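
If the -1 bookkeeping feels fragile, one alternative (not part of the original answer) is to group the objects by their key value first and then keep only the groups that contain more than one object. This is a minimal sketch; group_duplicates and the sample data are hypothetical names, and it assumes Python 3.7+ so that dicts preserve insertion order.

```python
import json
from collections import defaultdict


def group_duplicates(records, key="key"):
    """Return all records that share their `key` value with another record."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)

    # Keep only groups with at least two members, flattened into one list.
    return [
        record
        for group in groups.values()
        if len(group) > 1
        for record in group
    ]


# Hypothetical sample data: "a" appears twice, "b" once.
sample = [
    {"key": "a", "id": 1},
    {"key": "b", "id": 2},
    {"key": "a", "id": 3},
]

grouped = group_duplicates(sample)
grouped_ids = [record["id"] for record in grouped]

# The result is plain lists and dicts, so it can be serialized as before.
serialized = json.dumps(grouped, indent=4)
print(serialized)
```

Here every occurrence, including the first, ends up in the output exactly once, and objects with the same key value stay next to each other.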
User contributions licensed under: CC BY-SA