Text difference algorithm

I need an algorithm that can compare two text files and highlight their differences and (even better!) compute a meaningful similarity score: two similar files should score higher than two dissimilar files, with “similar” used in its ordinary sense. It sounds easy to implement, but it’s not.

The implementation can be in C# or Python.

Thanks.

Answer

In Python, there is difflib, as others have also suggested.

difflib offers the SequenceMatcher class, which can give you a similarity ratio between 0 and 1 (1 meaning the two sequences are identical). Example function:

import difflib
def text_compare(text1, text2, isjunk=None):
    # ratio() returns a float in [0, 1]; higher means more similar
    return difflib.SequenceMatcher(isjunk, text1, text2).ratio()
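The question also asks about highlighting the differences, which difflib can do as well. Below is a minimal sketch, not a definitive implementation: it assumes both texts are already loaded as strings, compares them line by line with difflib.unified_diff, and reuses SequenceMatcher for the score. The helper names (similarity, highlight_diff), the file labels a.txt and b.txt, and the sample strings are made up for illustration.

```python
import difflib


def similarity(text1, text2):
    """Return a similarity ratio in [0.0, 1.0] for two strings."""
    return difflib.SequenceMatcher(None, text1, text2).ratio()


def highlight_diff(text1, text2, name1="a.txt", name2="b.txt"):
    """Return a unified diff of two texts, compared line by line.

    name1 and name2 are only labels for the diff header (hypothetical names).
    """
    lines1 = text1.splitlines()
    lines2 = text2.splitlines()
    # lineterm="" because splitlines() has already removed the newlines
    diff = difflib.unified_diff(
        lines1, lines2, fromfile=name1, tofile=name2, lineterm=""
    )
    return "\n".join(diff)


# Made-up sample data for illustration
old = "the quick brown fox\njumps over the lazy dog\n"
new = "the quick brown fox\njumps over the sleepy dog\n"
unrelated = "completely different content\n"

# Removed lines are prefixed with "-", added lines with "+"
print(highlight_diff(old, new))

# The near-identical pair should score higher than the unrelated pair
print(similarity(old, new), similarity(old, unrelated))
```

For real files, read each one into a string first (for example with open(path).read()) and pass the results to these functions. Note that SequenceMatcher works on characters when given strings, so for very large files it may be faster to pass lists of lines instead.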
User contributions licensed under: CC BY-SA