lis = [ [12,34,56],[45,78,334],[56,90,78],[12,34,56] ]
I want the result to be 2, since the number of duplicate lists is 2 in total. How do I do that?
I have done something like this:
count = 0
for i in range(0, len(lis)-1):
    for j in range(i+1, len(lis)):
        if lis[i] == lis[j]:
            count += 1
But count ends up as 1, because the loop counts matching pairs rather than the lists involved. How do I get the total number of duplicate lists?
Answer
Solution
You can use collections.Counter if your sub-lists contain only hashable elements, such as numbers, so that each sub-list can be converted into a hashable tuple:
>>> from collections import Counter
>>> lis = [[12, 34, 56], [45, 78, 334], [56, 90, 78], [12, 34, 56]]
>>> sum(y for y in Counter(tuple(x) for x in lis).values() if y > 1)
2
>>> lis = [[12, 34, 56], [45, 78, 334], [56, 90, 78],
...        [12, 34, 56], [56, 90, 78], [12, 34, 56]]
>>> sum(y for y in Counter(tuple(x) for x in lis).values() if y > 1)
5
In Steps
Convert your sub-lists into tuples:
tuple(x) for x in lis
Count them:
>>> Counter(tuple(x) for x in lis)
Counter({(12, 34, 56): 3, (45, 78, 334): 1, (56, 90, 78): 2})
Take only the values:
>>> Counter(tuple(x) for x in lis).values()
dict_values([3, 1, 2])
Finally, sum only the ones that have a count greater than 1:
>>> sum(y for y in Counter(tuple(x) for x in lis).values() if y > 1)
5
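To see why the nested loop from the question gives a different number, it may help to compare both counts side by side. This is only a sketch: the names pairs and members are made up for illustration, and itertools.combinations stands in for the i/j loop, since it visits the same index pairs.

```python
from collections import Counter
from itertools import combinations

lis = [[12, 34, 56], [45, 78, 334], [56, 90, 78],
       [12, 34, 56], [56, 90, 78], [12, 34, 56]]

# Pairwise comparison, as in the question: every matching pair adds 1.
# A group of 3 equal lists contributes 3 pairs, a group of 2 contributes 1.
pairs = sum(1 for a, b in combinations(lis, 2) if a == b)

# Counter-based approach: every member of a duplicated group adds 1.
members = sum(n for n in Counter(map(tuple, lis)).values() if n > 1)

print(pairs)    # 4
print(members)  # 5
```

For the four-element list from the question the same comparison gives 1 pair and 2 members, which matches the count of 1 that the loop reported.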
Make it Re-usable
Put it into a function and add a docstring with a doctest:
"""Count duplicates of sub-lists."""

from collections import Counter


def count_duplicates(lis):
    """Count duplicates of sub-lists.

    Assumption: Sub-lists contain only hashable elements.
    Result: If a sub-list appears twice, the result is 2.
            If one sub-list appears three times and another twice,
            the result is 5.

    >>> count_duplicates([[12, 34, 56], [45, 78, 334], [56, 90, 78],
    ...                   [12, 34, 56]])
    2
    >>> count_duplicates([[12, 34, 56], [45, 78, 334], [56, 90, 78],
    ...                   [12, 34, 56], [56, 90, 78], [12, 34, 56]])
    5
    """
    # Make it a bit more verbose than necessary for readability and
    # educational purposes.
    tuples = (tuple(elem) for elem in lis)
    counts = Counter(tuples).values()
    return sum(elem for elem in counts if elem > 1)


if __name__ == '__main__':
    import doctest
    doctest.testmod(verbose=True)
Run the test:
python count_dupes.py
Trying:
    count_duplicates([[12, 34, 56], [45, 78, 334], [56, 90, 78],
                      [12, 34, 56]])
Expecting:
    2
ok
Trying:
    count_duplicates([[12, 34, 56], [45, 78, 334], [56, 90, 78],
                      [12, 34, 56], [56, 90, 78], [12, 34, 56]])
Expecting:
    5
ok
1 items had no tests:
    __main__
1 items passed all tests:
   2 tests in __main__.count_duplicates
2 tests in 2 items.
2 passed and 0 failed.
Test passed.
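The function above relies on the assumption that every element is hashable. If a sub-list might itself contain lists, tuple(elem) would still hold an unhashable item and Counter would raise a TypeError. A possible fallback, sketched here under that assumption, compares sub-lists with == only. The name count_duplicates_unhashable is hypothetical, and the approach is quadratic, so it is only reasonable for short inputs.

```python
def count_duplicates_unhashable(lis):
    """Count every sub-list that is equal to at least one other sub-list.

    Uses only ==, so elements do not need to be hashable.
    Runs in O(n**2) comparisons.
    """
    total = 0
    for i, current in enumerate(lis):
        # A sub-list counts as a duplicate if any *other* position
        # holds an equal sub-list.
        if any(current == other for j, other in enumerate(lis) if j != i):
            total += 1
    return total


print(count_duplicates_unhashable(
    [[12, 34, 56], [45, 78, 334], [56, 90, 78], [12, 34, 56]]))  # 2
print(count_duplicates_unhashable([[1, [2]], [3], [1, [2]]]))    # 2
```

For hashable input it should return the same numbers as count_duplicates, just more slowly.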