How do I remove duplicates from a list, while preserving order? Using a set to remove duplicates destroys the original order. Is there a built-in or a Pythonic idiom?
Answer
Here are some alternatives: http://www.peterbe.com/plog/uniqifiers-benchmark
Fastest one:
def f7(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]
Why assign seen.add to seen_add instead of just calling seen.add? Python is a dynamic language, and resolving seen.add on each iteration is more costly than resolving a local variable. seen.add could have changed between iterations, and the runtime isn't smart enough to rule that out. To play it safe, it has to check the object each time.
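To see how much the local alias matters on your own interpreter, here is a rough timing sketch. The names dedupe_attr_lookup and dedupe_local_alias and the sample data are made up for illustration; the gap may be small on recent Python versions, so treat the numbers as something to measure rather than assume.

```python
import timeit

def dedupe_attr_lookup(seq):
    # Looks up the .add attribute on the set for every new element.
    seen = set()
    return [x for x in seq if not (x in seen or seen.add(x))]

def dedupe_local_alias(seq):
    # Binds the method once; the comprehension then reads a plain variable.
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]

# Hypothetical sample data: 1000 distinct values, each repeated 5 times.
data = list(range(1000)) * 5

for fn in (dedupe_attr_lookup, dedupe_local_alias):
    elapsed = timeit.timeit(lambda: fn(data), number=200)
    print(f"{fn.__name__}: {elapsed:.3f}s")
```

Both functions return the same list; only the way the add method is reached differs.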
If you plan on using this function a lot on the same dataset, perhaps you would be better off with an ordered set: http://code.activestate.com/recipes/528878/
It offers O(1) insertion, deletion, and membership checks.
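If you only need something that behaves roughly like an ordered set, a plain dict may be enough. This is not the linked recipe, just a minimal sketch that assumes Python 3.7 or later (where dicts keep insertion order) and hashable items; dedupe_with_dict is a name chosen for illustration.

```python
def dedupe_with_dict(seq):
    # dict keys are unique and iterate in insertion order (Python 3.7+),
    # so the first occurrence of each item decides its position.
    return list(dict.fromkeys(seq))

unique = dict.fromkeys(["b", "a", "b", "c", "a"])
print(list(unique))      # ['b', 'a', 'c']
print("c" in unique)     # True
print(dedupe_with_dict([3, 1, 3, 2, 1]))  # [3, 1, 2]
```

Keeping the dict around gives you average-case constant-time membership checks, insertion, and deletion while preserving the order in which items first appeared.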
(Small additional note: seen.add() always returns None, so the or above is there only as a way to attempt a set update, and not as an integral part of the logical test.)
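If the or trick feels too clever, the same logic can be written as an explicit loop. The sketch below first shows that set.add returns None, then spells out the side effect on its own line; dedupe_explicit is a name made up for this example.

```python
seen = set()
result = seen.add("a")
print(result)  # None
print(seen)    # {'a'}

def dedupe_explicit(seq):
    # Same behaviour as the comprehension, with the set update written out.
    seen = set()
    out = []
    for x in seq:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

print(dedupe_explicit([1, 2, 1, 3, 2]))  # [1, 2, 3]
```

This version is likely a little slower than the comprehension, but it is easier to read and to debug.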