The first block of code below returns the expected output:
[('the', 23135851162), ('of', 13151942776), ('and', 12997637966), ('to', 12136980858), ('a', 9081174698)]
from itertools import islice

import pkg_resources
from symspellpy import SymSpell

sym_spell = SymSpell()
dictionary_path = pkg_resources.resource_filename(
    "symspellpy", "frequency_dictionary_en_82_765.txt")
sym_spell.load_dictionary(dictionary_path, 0, 1)

# Print out first 5 elements to demonstrate that dictionary is
# successfully loaded
print(list(islice(sym_spell.words.items(), 5)))
But the next block of code returns an empty list.
from itertools import islice

import pkg_resources
from symspellpy import SymSpell

sym_spell = SymSpell()
dictionary_path = pkg_resources.resource_filename(
    "symspellpy", "frequency_dictionary_en_82_765.txt")
sym_spell.load_bigram_dictionary(dictionary_path, 0, 2)

# Print out first 5 elements to demonstrate that dictionary is
# successfully loaded
print(list(islice(sym_spell.bigrams.items(), 5)))
The expected output is:
[('abcs of', 10956800), ('aaron and', 10721728), ('abbott and', 7861376), ('abbreviations and', 13518272), ('aberdeen and', 7347776)]
as per this page:
https://symspellpy.readthedocs.io/en/latest/examples/dictionary.html
I would like to know what mistake I made in the second block of code.
Answer
The second example on the linked page, which is also the second block in your question, references the wrong data file. You have to refer to the bigram data file that is included with symspellpy.
The documentation for these examples shows the expected data format for each one, and the two formats are different. Yet both examples refer to the same data file, so one of them has to be wrong. The mistake is in the second example, which should refer to the bigram data file.
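To make the format difference concrete, here is a minimal sketch that does not use symspellpy at all. It assumes that a bigram loader called with a count index of 2 can only use lines that have at least three whitespace-separated columns. The sample lines are taken from the outputs quoted in the question, and the helper names `column_counts` and `usable_as_bigram_source` are hypothetical, not part of the library.

```python
def column_counts(lines):
    """Return the set of whitespace-separated column counts in non-empty lines."""
    return {len(line.split()) for line in lines if line.strip()}


def usable_as_bigram_source(lines, count_index=2):
    """Return True if every non-empty line has a column at count_index.

    Assumption: a line without a column at count_index cannot supply a count.
    """
    return all(n > count_index for n in column_counts(lines))


# Unigram format: "<term> <count>" (two columns)
unigram_sample = ["the 23135851162", "of 13151942776"]
# Bigram format: "<word1> <word2> <count>" (three columns)
bigram_sample = ["abcs of 10956800", "aaron and 10721728"]

print(usable_as_bigram_source(unigram_sample))  # False
print(usable_as_bigram_source(bigram_sample))   # True
```

Under that assumption, a two-column unigram file never has a column at index 2, which is consistent with the empty result you saw.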
Here’s the complete code that works correctly:
from itertools import islice

import pkg_resources
from symspellpy import SymSpell

sym_spell = SymSpell()
dictionary_path = pkg_resources.resource_filename(
    "symspellpy", "frequency_bigramdictionary_en_243_342.txt")  # << - fixed to refer to the bigram data file
sym_spell.load_bigram_dictionary(dictionary_path, 0, 2)

# Print out first 5 elements to demonstrate that dictionary is
# successfully loaded
print(list(islice(sym_spell.bigrams.items(), 5)))
Result:
[('abcs of', 10956800), ('aaron and', 10721728), ('abbott and', 7861376), ('abbreviations and', 13518272), ('aberdeen and', 7347776)]