Reading a file and counting repeated elements

I need help solving a problem quickly. At work I have a file with 18 characters per line (more than 600 lines). I need to read each line 6 characters at a time and count how many times each 6-character group is repeated in the file. Ex: ABCDEF – 4, ABDEDF – 1, AAAAAA – 10 … From my research, Python could be my solution (if someone has a silver bullet in another language, it is welcome). Sorry that I don't know enough programming to show something I started. But I saw that I can use something like this (Python):

from collections import Counter

with open('arq.txt') as f:
    occurrence= Counter(f.read().split())
print(occurrence) 

Answer

Is this what you need?

Given this file:

$ cat file.txt 
aaaaaabbbbbbcccccc
bbbbbbcccccceeeeee

Counter code:

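import json
from collections import Counter

group_size = 6
res = Counter()

with open('file.txt') as f:
  for line in f:
    line = line.rstrip('\n')  # drop the newline so it is not counted as part of a group
    # Slice the line into consecutive 6-character groups and count every one,
    # including a group that appears more than once on the same line
    res.update(line[i:i + group_size] for i in range(0, len(line), group_size))

print(res)
# Save the counts as JSON
with open('results.json', 'w') as f:
  json.dump(dict(res), f)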

Output:

Counter({'bbbbbb': 2, 'cccccc': 2, 'aaaaaa': 1, 'eeeeee': 1})

Result file:

$ cat results.json 
{"aaaaaa": 1, "bbbbbb": 2, "cccccc": 2, "eeeeee": 1}
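
If you would rather see the counts one per line, close to the "ABCDEF - 4" style from the question, a minimal sketch could look like the following. The helper name count_groups and the sample lines are assumptions made up for illustration, not part of the original file. It slices each line by position, so a group that occurs twice on the same line is counted twice.

```python
from collections import Counter


def count_groups(lines, group_size=6):
    """Count fixed-width groups across all lines (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        line = line.rstrip('\n')  # drop the newline so it is not part of the last group
        # Slice by position: 0-5, 6-11, 12-17 for an 18-character line
        counts.update(line[i:i + group_size] for i in range(0, len(line), group_size))
    return counts


# Sample data (assumed); in practice pass an open file object instead of a list
sample = ['aaaaaabbbbbbcccccc\n', 'bbbbbbcccccceeeeee\n', 'aaaaaaaaaaaaeeeeee\n']
counts = count_groups(sample)

# Most frequent groups first
for group, n in counts.most_common():
    print(f'{group} - {n}')
```

With the real file you would call count_groups(f) inside a with open(...) block, since iterating over a file object yields its lines. In the sample above, 'aaaaaa' ends up with a count of 3 because the third line contains it twice.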
User contributions licensed under: CC BY-SA