I’m using spaCy, and I’m looking for a program that counts the frequency of each word in a text and outputs each word together with its count and the numbers of the sentences in which it appears. Sample input:
```
Python is cool. But Ocaml is cooler since it is purely functional.
```
Sample output:
```
1 Python 1
3 is 1 2
1 cool 1
1 But 2
1 Ocaml 2
1 cooler 2
1 since 2
1 it 2
1 purely 2
1 functional 2
```
Answer
I would split the text into sentences and each sentence into words, then build a dictionary keyed by word that stores the word's count and the list of sentence numbers it appears in, like so:
```python
text = "Python is cool. But Ocaml is cooler since it is purely functional."
specialSymbols = '.,;:'

# Split the text into sentences, then each sentence into words,
# stripping leading/trailing punctuation from every word
words = [[word.strip(specialSymbols) for word in sentence.split(' ')]
         for sentence in text.split('. ')]

# Map each word to [count, list of sentence numbers]
d = {word: [0, []] for sentence in words for word in sentence}

for i, sentence in enumerate(words):
    for word in sentence:
        d[word][0] += 1
        # Sentence numbers are 1-based; record each sentence only once
        if i + 1 not in d[word][1]:
            d[word][1].append(i + 1)

for key, val in d.items():
    print(f'{val[0]} {key} {" ".join([str(i) for i in val[1]])}')
```
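The split-based version above only recognizes `. ` as a sentence boundary and only strips the characters in `specialSymbols`. As a sketch of a slightly more general variant, the version below uses only the standard library (`re` and `collections.defaultdict`, not spaCy). It assumes sentences end with `.`, `!` or `?` followed by whitespace, and that a word is any run of word characters or apostrophes; both rules are simplifications rather than a real tokenizer. `word_occurrences` is a hypothetical helper name.

```python
import re
from collections import defaultdict


def word_occurrences(text):
    """Return (count, word, sentence_numbers) tuples in first-seen order.

    Assumption: a sentence ends with '.', '!' or '?' followed by whitespace,
    and a word is a run of word characters or apostrophes.
    """
    counts = defaultdict(int)       # word -> total number of occurrences
    sentences = defaultdict(list)   # word -> 1-based sentence numbers
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    for number, sentence in enumerate(parts, start=1):
        for word in re.findall(r"[\w']+", sentence):
            counts[word] += 1
            # Sentences are visited in order, so checking the last entry
            # is enough to avoid recording the same sentence twice
            if not sentences[word] or sentences[word][-1] != number:
                sentences[word].append(number)
    # Dicts keep insertion order (Python 3.7+), i.e. order of first appearance
    return [(counts[w], w, sentences[w]) for w in counts]


text = "Python is cool. But Ocaml is cooler since it is purely functional."
for count, word, numbers in word_occurrences(text):
    print(count, word, *numbers)
```

For the sample input this prints the same lines as the expected output. Matching is case-sensitive, so `Hi` and `hi` are counted separately. If you want to stay with spaCy, iterating over `doc.sents` and skipping tokens whose `is_punct` is true should yield the same structure; that variant is not shown here.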