
Get, for each word, the numbers of the sentences in which it appears in a given text [closed]

I'm using spaCy, and I'm looking for a program that counts the frequency of each word in a text and outputs each word with its count and the numbers of the sentences in which it appears.

Sample input:

Python is cool. But Ocaml is cooler since it is purely functional.

Sample output (count, word, sentence numbers):

1 Python 1
3 is 1 2
1 cool 1
1 But 2
1 Ocaml 2
1 cooler 2
1 since 2
1 it 2
1 purely 2
1 functional 2

Answer

I would split the text into sentences, split each sentence into words, and build a dictionary keyed by word that stores each word's count and the numbers of the sentences it appears in, like so:

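text = "Python is cool. But Ocaml is cooler since it is purely functional."
special_symbols = '.,;:'

# Split the text into sentences on ". ", then each sentence into words,
# stripping punctuation from the ends of every word.
words = [[word.strip(special_symbols) for word in sentence.split()]
         for sentence in text.split('. ')]

# Map each word to [total count, list of 1-based sentence numbers].
d = {word: [0, []] for sentence in words for word in sentence}

for i, sentence in enumerate(words, start=1):
    for word in sentence:
        d[word][0] += 1
        if i not in d[word][1]:
            d[word][1].append(i)

# Print "count word sentence-numbers" in order of first appearance.
for word, (count, sentences) in d.items():
    print(f'{count} {word} {" ".join(str(n) for n in sentences)}')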
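Splitting on ". " only works when every sentence ends with a period followed by a single space; text containing "!", "?" or line breaks between sentences would be mis-split. Below is a minimal standard-library sketch that relaxes this, assuming a sentence ends at ".", "!" or "?" followed by whitespace and that a word is a run of word characters (optionally with one apostrophe part). The function name word_sentence_index is hypothetical, and this is a regex-based alternative rather than a spaCy solution; with spaCy, the sentence boundaries could come from doc.sents instead.

```python
import re
from collections import defaultdict


def word_sentence_index(text):
    """Hypothetical helper: return {word: (count, [1-based sentence numbers])}
    in order of first appearance."""
    counts = defaultdict(int)
    sentences_of = defaultdict(list)

    # Assumption: a sentence ends at ".", "!" or "?" followed by whitespace.
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())

    for number, sentence in enumerate(sentences, start=1):
        # Assumption: a word is a run of \w characters, optionally with one
        # apostrophe part such as "it's".
        for word in re.findall(r"\w+(?:'\w+)?", sentence):
            counts[word] += 1
            # Sentences are visited in order, so checking the last recorded
            # number is enough to avoid duplicates.
            if not sentences_of[word] or sentences_of[word][-1] != number:
                sentences_of[word].append(number)

    return {word: (counts[word], sentences_of[word]) for word in counts}


text = "Python is cool. But Ocaml is cooler since it is purely functional."
for word, (count, numbers) in word_sentence_index(text).items():
    print(count, word, *numbers)
```

For the sample input this prints the same ten lines as the sample output. Words are still case-sensitive ("Is" and "is" are counted separately); lower-casing each word before counting would merge them.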
User contributions licensed under: CC BY-SA