How do I find duplicates and print the line in a text file?

I am trying to go through a .txt file and find duplicate names, and also print the line number each duplicate is located on.

This is what I have so far for finding the duplicate names:

from collections import Counter

d = 0
with open('handles.txt') as f:
    c = Counter(c.strip() for c in f if c.strip())
    for line in c:
        if c[line]>1:
            print(line)
            d += 1
    if d >= 1:
        print("Duplicates found:",d)
    else:
        print("No duplicates found, all ready to go!")

But I am unsure how to show which line they are located on. I read about finding a keyword in a file, but I don’t know how to fit it into this code. I tried:

from collections import Counter
d = 0
with open('handles.txt') as f:
    c = Counter(c.strip() for c in f if c.strip())
    for line in c:
        if c[line]>1:
            print(line)
            d += 1
            for num, line in enumerate(f, 1):
                print("Found at line: ",num)
    if d >= 1:
        print("Duplicates found:",d)
    else:
        print("No duplicates found, all ready to go!")

And that just printed out the duplicates.

Answer

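Your inner loop prints nothing because Counter has already consumed the file object f, so enumerate(f, 1) has nothing left to iterate over. Instead, add the enumerate at the place where you read the lines in the first place.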

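from collections import defaultdict

count = defaultdict(int)  # name -> number of times seen so far
d = 0                     # number of duplicate lines found
with open('handles.txt') as f:
    # Start at 1 so the reported numbers match the line numbers in the file.
    for lineno, line in enumerate(f, 1):
        c = line.strip()
        if c:  # skip blank lines
            if c in count:
                print(lineno, "is a duplicate")
                d += 1
            count[c] += 1
if d >= 1:
    print("Duplicates found:", d)
else:
    print("No duplicates found, all ready to go!")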

This just collects duplicates while reading the file, so we avoid looping over the same data several times. If you want to keep track of where the original occurrence was, that will be easy too (just add another dict where the key is the c value and the value is the line number where it was first seen).
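
The second-dict idea could look something like the sketch below. The function name find_duplicates, the first_seen dict, and the sample data are made up for illustration; an in-memory io.StringIO stands in for open('handles.txt') so the snippet runs on its own.

```python
import io

def find_duplicates(lines):
    """Return (line_number, name, first_line_number) for each repeated name."""
    first_seen = {}   # name -> line number of its first occurrence
    duplicates = []
    for lineno, line in enumerate(lines, 1):  # count lines from 1
        name = line.strip()
        if not name:
            continue  # ignore blank lines
        if name in first_seen:
            duplicates.append((lineno, name, first_seen[name]))
        else:
            first_seen[name] = lineno
    return duplicates

# Hypothetical sample data; any iterable of lines works, including an open file.
sample = io.StringIO("alice\nbob\n\nalice\ncarol\nbob\nalice\n")
for lineno, name, first in find_duplicates(sample):
    print(f"Line {lineno}: {name!r} duplicates line {first}")
```

With a real file you would pass the open file object instead, e.g. find_duplicates(f) inside the with block. Since first_seen already records every name encountered, it also does the job of the count dict here.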

User contributions licensed under: CC BY-SA