I’ve a text file full of common misspellings and their corrections.
All misspellings of the same intended word should be on the same line.
I do have this somewhat done, but not for all misspellings of the same word.
misspellings_corpus.txt (snippet):

    I'de->I'd
    aple->apple
    appl->apple
    I'ed, I'ld, Id->I'd
Desired:
    I'de, I'ed, I'ld, Id->I'd
    aple, appl->apple
template: wrong1, wrong2, wrongN->correct
Attempt:
    lines = []
    with open('/content/drive/MyDrive/Colab Notebooks/misspellings_corpus.txt', 'r') as fin:
        lines = fin.readlines()

    for this_idx, this_line in enumerate(lines):
        for comparison_idx, comparison_line in enumerate(lines):
            if this_idx != comparison_idx:
                if this_line.split('->')[1].strip() == comparison_line.split('->')[1].strip():
                    #...

    correct_words = [l.split('->')[1].strip() for l in lines]
    correct_words
Answer
Store the correct spelling of each word as a key of a dictionary that maps to a set of possible misspellings of that word. The dict lets you easily look up the word you're trying to correct, and the set avoids duplicate misspellings.
    possible_misspellings = {}

    with open('my-file.txt') as file:
        for line in file:
            # Each line looks like: wrong1, wrong2->correct
            misspellings, word = line.split('->')
            word = word.strip()
            misspellings = set(m.strip() for m in misspellings.split(','))

            # Merge into the existing set, or start a new one
            if word in possible_misspellings:
                possible_misspellings[word].update(misspellings)
            else:
                possible_misspellings[word] = misspellings
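Then you can iterate over your dictionary and write one line per correct word:

    with open('my-new-file.txt', 'w') as file:
        for word, misspellings in possible_misspellings.items():
            # Sets are unordered, so sort for a stable output;
            # ', ' matches the "wrong1, wrong2->correct" template.
            line = ', '.join(sorted(misspellings)) + '->' + word + '\n'
            file.write(line)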
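To check the approach without touching any files, here is a minimal self-contained sketch of the same idea run on the snippet from the question. The function name `group_misspellings` is made up for illustration, and `collections.defaultdict` is swapped in for the explicit `if word in ...` check. It also assumes that lines without `->` (blank lines, for example) can simply be skipped, which may not suit every corpus.

```python
from collections import defaultdict


def group_misspellings(lines):
    """Merge 'wrong1, wrong2->correct' lines that share the same correction."""
    grouped = defaultdict(set)  # correct word -> set of misspellings
    for line in lines:
        if '->' not in line:
            continue  # assumption: skip blank or malformed lines
        wrong_part, correct = line.rsplit('->', 1)
        grouped[correct.strip()].update(
            w.strip() for w in wrong_part.split(',') if w.strip()
        )
    # Sort each group so the output is deterministic
    return [
        ', '.join(sorted(wrongs)) + '->' + correct
        for correct, wrongs in grouped.items()
    ]


sample = ["I'de->I'd", "aple->apple", "appl->apple", "I'ed, I'ld, Id->I'd"]
result = group_misspellings(sample)
for merged_line in result:
    print(merged_line)
```

Because a dict keeps insertion order, the corrections come out in the order they first appear in the input; only the misspellings within each group are sorted.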