
Preventing reference re-use during deepcopy

Consider the following example:

from copy import deepcopy

item = [0]
orig = [item, item]
copy = deepcopy(orig)

orig[0][0] = 1
print(f"{orig=} {copy=}")

copy[0][0] = 2
print(f"{orig=} {copy=}")

The first print outputs what I would expect, because the same reference appears twice in the original list.

orig=[[1], [1]] copy=[[0], [0]]

However, the second print surprised me.

orig=[[1], [1]] copy=[[2], [2]]

I would have expected the deepcopy to contain two independent references inside the copied list. Instead it preserves the aliasing, so the copy also holds a single list referenced twice. I'm guessing that's alluded to in this part of the docs:

A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original.

I see that the deepcopy function has a memo argument. Is there anything interesting that could be done with this argument to prevent the duplicated reference, such that the final output would become:

orig=[[1], [1]] copy=[[2], [0]]


Answer

If your whole point is to copy data that could come from JSON, i.e. lists, dicts, strings, numbers, and booleans, then you can trivially implement your own function:

def copy_jsonlike(data):
    # Rebuild every container from scratch; because no memo of already-copied
    # objects is kept, shared references in the input become independent copies.
    if isinstance(data, list):
        return [copy_jsonlike(x) for x in data]
    elif isinstance(data, dict):
        return {k: copy_jsonlike(v) for k, v in data.items()}
    else:
        # Atoms (str, int, float, bool, None) are immutable, so return them as-is.
        return data

It has the added bonus of probably being faster than copy.deepcopy.
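For instance, re-running the example from the question with copy_jsonlike in place of deepcopy gives the output you were after:

item = [0]
orig = [item, item]
copy = copy_jsonlike(orig)

orig[0][0] = 1   # both slots of orig still change, since they share `item`
copy[0][0] = 2   # only the first slot of copy changes
print(f"{orig=} {copy=}")
# orig=[[1], [1]] copy=[[2], [0]]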

Or, your original solution, json.loads(json.dumps(data)), isn't a bad idea either.
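As for the memo argument from the question: one trick is to pass a mapping that refuses to remember anything, so deepcopy never re-uses a copy it has already made. This is only a sketch that leans on implementation details of copy.deepcopy (the memo protocol isn't really meant to be used this way), and it will recurse forever on self-referential data:

from copy import deepcopy

class ForgetfulMemo(dict):
    # Hypothetical helper: by never recording an entry, deepcopy's
    # "have I copied this object already?" lookup always misses,
    # so each occurrence of a shared object gets its own copy.
    def __setitem__(self, key, value):
        pass

item = [0]
orig = [item, item]
copy = deepcopy(orig, ForgetfulMemo())

orig[0][0] = 1
copy[0][0] = 2
print(f"{orig=} {copy=}")
# orig=[[1], [1]] copy=[[2], [0]]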
