I need help solving a problem quickly. At work I have a file with 18 characters per line (there are more than 600 lines). I need to read each line 6 characters at a time and count how many times each 6-character group occurs in the whole file. Ex: ABCDEF – 4, ABDEDF – 1, AAAAAA – 10 … From what I have read, Python could be my solution (if someone has a silver bullet in another language, it is welcome). Sorry that I don't know enough programming to show a real attempt. But I saw that I can use something like this (Python):
from collections import Counter

with open('arq.txt') as f:
    occurrence = Counter(f.read().split())
print(occurrence)
Answer
Is this what you need?
Given this file:
$ cat file.txt
aaaaaabbbbbbcccccc
bbbbbbcccccceeeeee
Counter code:
import json
import textwrap
from collections import Counter

group_size = 6
res = Counter()

with open('file.txt') as f:
    for line in f:
        # Count every 6-character group, including repeats within one line
        res.update(textwrap.wrap(line, group_size))

print(res)

# Save the counts as JSON
with open('results.json', 'w') as f:
    json.dump(dict(res), f)
Output:
Counter({'bbbbbb': 2, 'cccccc': 2, 'aaaaaa': 1, 'eeeeee': 1})
Result file:
$ cat results.json
{"aaaaaa": 1, "bbbbbb": 2, "cccccc": 2, "eeeeee": 1}
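If the lines can contain spaces or hyphens, textwrap.wrap may split them at those characters rather than strictly every 6 characters. Here is a minimal sketch that uses plain slicing instead, assuming every group is a fixed-width slice of the line. The names count_groups and sample are only illustrative and are not part of the question or the code above; in practice you would pass the open file object instead of the sample list.

```python
from collections import Counter


def count_groups(lines, group_size=6):
    """Count every fixed-width group across all lines (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        # Drop the line ending only, so spaces inside the line are kept
        line = line.rstrip("\r\n")
        # Step through the line in strides of group_size and slice out each group
        counts.update(
            line[i:i + group_size] for i in range(0, len(line), group_size)
        )
    return counts


# Sample data taken from file.txt above; an open file works the same way
sample = ["aaaaaabbbbbbcccccc\n", "bbbbbbcccccceeeeee\n"]
for group, n in count_groups(sample).most_common():
    print(f"{group} - {n}")
```

Note that a line whose length is not a multiple of 6 yields a shorter final group, which is counted like any other.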