Skip to content
Advertisement

How do I approximate search multiple terms in a string in JS?

I have a string S and a list of strings allItems, allItems contains strings that may have common “sub-words” but one element is never an extension of another:

//good, they both contain Fuzzy but Fuzzy isn't in allItems
const allItems = "FuzzyBunny", "FuzzyBear"

//not good BCS allItems[i] = whatever1+allItems[j]+whatever2
const allItems = "Fuzzy", "FuzzyBear", "FuzzyBunny" , "FuzzyBearBunny"

My goal is to find every match or approximate match of a string in allItems in S alongside their index(can be start or end, or ideally both). I’ve been searching for some algorithms to do this, similar to the aho-corasick algorithm, but that doesn’t do approximate string matching.

Example:

S = "I love FuzzyBears and FuzzyDucks"
allItems = ["FuzzyBear", "FuzzyDuck"]

->

[
 { match: "FuzzyBear", matchIndex: 0, startIndex: 7, endIndex: 16 },
 { match: "FuzzyDuck", matchIndex: 1, startIndex: 22, endIndex: 31 }
]

I’m still pretty new to pattern matching so I’d appreciate some resources on how to code any of the recommended algorithms.

UPDATE: I’ve found a Fuzzified Aho-Corasick automata as described here, but I have very little idea on how to implement this in JS.I also have no issues with the code being slow as I use this for a one time run, and rarely need to do this often.

Advertisement

Answer

Python version (fuzzy-matching) [splits string by spaces]:

import difflib

s = "I love FuzzyBears and FuzzyDucks"
allitems = ["FuzzyBear", "FuzzyDuck"]

lst = []
for x in allitems:
  d = {}
  match = difflib.get_close_matches(x, s.split())
  if match:
    d["match"] = x
    d["matchIndex"] = allitems.index(x)
    d["startIndex"] = s.find(x)
    d["endIndex"] = d["startIndex"] + len(x)
  lst.append(d)
print(lst)

>>>
[{'match': 'FuzzyBear', 'matchIndex': 0, 'startIndex': 7, 'endIndex': 16}, {'match': 'FuzzyDuck', 'matchIndex': 1, 'startIndex': 22, 'endIndex': 31}]

Python version (fuzzy-matching) [split by all possible substrings]:

import difflib
from itertools import combinations

s = "I love FuzzyBears and FuzzyDucks"
allitems = ["FuzzyBear", "FuzzyDuck"]

lst = []
for x in allitems:
  d = {}
  match = difflib.get_close_matches(x, [x[i:j] for i, j in combinations(range(len(x) + 1), r=2)])
  if match:
    d["match"] = x
    d["matchIndex"] = allitems.index(x)
    d["startIndex"] = s.find(x)
    d["endIndex"] = d["startIndex"] + len(x)
  lst.append(d)
print(lst)

>>>
[{'match': 'FuzzyBear', 'matchIndex': 0, 'startIndex': 7, 'endIndex': 16}, {'match': 'FuzzyDuck', 'matchIndex': 1, 'startIndex': 22, 'endIndex': 31}]

After a bunch of online searching:

JS version (exact-matching):

const s = "I love FuzzyBears and FuzzyDucks";
const allitems = ["FuzzyBear", "FuzzyDuck"];
const lst = [];
for (const i of allitems) {
  const x = {};
  if (s.includes(i)) {
    x["match"] = i;
    x["matchIndex"] = allitems.indexOf(i);
    x["startIndex"] = s.indexOf(i);
    x["endIndex"] =  s.indexOf(i) + i.length;
   }
  lst.push(x);
}
console.log(lst);

>>>
[ { match: 'FuzzyBear', matchIndex: 0, startIndex: 7, endIndex: 16 },
  { match: 'FuzzyDuck',
    matchIndex: 1,
    startIndex: 22,
    endIndex: 31 } ]
User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement