Returns an empty list instead of bigrams

The code mentioned below returns the expected output.

[('the', 23135851162), ('of', 13151942776), ('and', 12997637966), ('to', 12136980858), ('a', 9081174698)]

from itertools import islice
import pkg_resources
from symspellpy import SymSpell

sym_spell = SymSpell()
dictionary_path = pkg_resources.resource_filename(
    "symspellpy", "frequency_dictionary_en_82_765.txt")
sym_spell.load_dictionary(dictionary_path, 0, 1)

# Print out first 5 elements to demonstrate that dictionary is
# successfully loaded
print(list(islice(sym_spell.words.items(), 5)))

But the next block of code returns an empty list.

from itertools import islice
import pkg_resources
from symspellpy import SymSpell

sym_spell = SymSpell()
dictionary_path = pkg_resources.resource_filename(
    "symspellpy", "frequency_dictionary_en_82_765.txt")
sym_spell.load_bigram_dictionary(dictionary_path, 0, 2)

# Print out first 5 elements to demonstrate that dictionary is
# successfully loaded
print(list(islice(sym_spell.bigrams.items(), 5)))

The expected output is:

[('abcs of', 10956800), ('aaron and', 10721728), ('abbott and', 7861376), ('abbreviations and', 13518272), ('aberdeen and', 7347776)]

as per this page:

https://symspellpy.readthedocs.io/en/latest/examples/dictionary.html

I would like to know what mistake I made in the second block of code.

Answer

The second example, both on the linked page and in your question, references the wrong data file. You have to refer to the bigram data file that ships with the package.

The documentation for these examples shows the expected data format for each one, and the formats differ: each line of the unigram file holds one term and a count, while each line of the bigram file holds two words and a count. Yet both examples load the same data file. One of them has to be wrong, and it is the second: it should load the bigram data file.
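To illustrate why the wrong file gives an empty result rather than an error, here is a minimal sketch of a bigram line parser. It is not symspellpy's actual code: `parse_bigram_lines` is a hypothetical helper, and the assumption that lines with too few fields are silently skipped is my reading of the behaviour you observed. The sample lines are taken from the outputs in the question.

```python
def parse_bigram_lines(lines, term_index=0, count_index=2):
    """Hypothetical, simplified stand-in for a bigram loader (not the library's code)."""
    bigrams = {}
    for line in lines:
        parts = line.split()
        # A bigram line needs two words plus a count; shorter lines are skipped
        # (assumed behaviour, matching the empty result seen in the question).
        if len(parts) < 3:
            continue
        key = f"{parts[term_index]} {parts[term_index + 1]}"
        try:
            bigrams[key] = int(parts[count_index])
        except ValueError:
            # Skip lines whose count field is not an integer.
            continue
    return bigrams


# Lines in the unigram format: one term and a count.
unigram_lines = ["the 23135851162", "of 13151942776"]
# Lines in the bigram format: two words and a count.
bigram_lines = ["abcs of 10956800", "aaron and 10721728"]

print(parse_bigram_lines(unigram_lines))
print(parse_bigram_lines(bigram_lines))
```

With the unigram-style lines, no line has three fields, so nothing is stored and the result is empty. With the bigram-style lines, each entry is stored under its two-word key.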

Here’s the complete code that works correctly:

from itertools import islice
import pkg_resources
from symspellpy import SymSpell

sym_spell = SymSpell()
dictionary_path = pkg_resources.resource_filename(
    "symspellpy", "frequency_bigramdictionary_en_243_342.txt")  # fixed: refers to the bigram data file
sym_spell.load_bigram_dictionary(dictionary_path, 0, 2)

# Print out first 5 elements to demonstrate that dictionary is
# successfully loaded
print(list(islice(sym_spell.bigrams.items(), 5)))

Result:

[('abcs of', 10956800), ('aaron and', 10721728), ('abbott and', 7861376), ('abbreviations and', 13518272), ('aberdeen and', 7347776)]
User contributions licensed under: CC BY-SA