I need an algorithm that can compare two text files and highlight their differences and (even better!) quantify the difference in a meaningful way: two similar files should get a higher similarity score than two dissimilar files, with the word “similar” meant in the ordinary sense. It sounds easy to implement, but it’s not.
The implementation can be in C# or Python.
Thanks.
Answer
In Python, there is difflib, as others have also suggested. difflib offers the SequenceMatcher class, whose ratio() method gives you a similarity ratio between 0 and 1. Example function:
import difflib

def text_compare(text1, text2, isjunk=None):
    # ratio() returns a float in [0, 1]; 1.0 means the two sequences are identical
    return difflib.SequenceMatcher(isjunk, text1, text2).ratio()
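For the "highlight the differences" half of the question, difflib can also produce a line-by-line diff. The following is only a minimal sketch: the helper name diff_and_score and the sample strings are made up for illustration, and it assumes both texts fit in memory as strings.

```python
import difflib

def diff_and_score(text1, text2):
    """Return (unified diff lines, similarity ratio) for two strings.

    Hypothetical helper, not part of difflib itself.
    """
    lines1 = text1.splitlines()
    lines2 = text2.splitlines()
    # unified_diff marks removed lines with '-' and added lines with '+'
    diff = list(difflib.unified_diff(lines1, lines2,
                                     fromfile="a", tofile="b", lineterm=""))
    # Character-level similarity in [0, 1]
    ratio = difflib.SequenceMatcher(None, text1, text2).ratio()
    return diff, ratio

old = "apple\nbanana\ncherry"
new = "apple\nblueberry\ncherry"
diff, ratio = diff_and_score(old, new)
print("\n".join(diff))
print(round(ratio, 3))
```

To compare actual files, you would read each one into a string first and pass the contents in. Note that SequenceMatcher can get slow on large inputs, so for big files it may be worth comparing lists of lines rather than raw characters.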