Problem Background
Git is an awesome version control system, I want learn git by writing my own version control system. The first step I have to do is implement a string diff tool. I’ve read this blog and this paper. In order to get the diff of two strings, I need to locate the common part. Hence, I came into the problem: How can I find all common sub strings of two strings?
This is first part of my problem:The algorithm problem.
This is the algorithm I am using:
Algorithm Problem
【Problem】Find all common sub strings of string1 and string2.
【Solution】
- compare all sub string of string1 with string2 then gather matches into answer.
- compare all sub string of string2 with string1 then gather matches into answer.
This algorithm is O(N^2) time complex.
The language Problem
To proof my thought, I can easily implement it with Python:
def commonSubstringFinder(string1, string2): answer=[] len1, len2 = len(string1), len(string2) match = "" x,y = 0,0 for i in range(len2): for j in range(len1): if ( i+j < len2) and (string1[j] == string2[i+j]): if len(match)==0: x,y=i+j,j match += string1[j] else: if len(match)>0: answer.append(((x,y), match)) match="" for i in range(1,len1): for j in range(len2): if (i+j<len1 and string1[i+j] == string2[j] ): if len(match)==0: x,y=j,i+j match += string2[j] else: if len(match)>0: answer.append(((x,y), match)) match="" return answer print(commonSubstringFinder("apkleses", "appleses")) print(commonSubstringFinder("cappleses", "caplekses")) # [((0, 0), 'ap'), ((3, 3), 'leses'), ((2, 1), 'p'), ((6, 4), 'es'), ((4, 6), 'es')] # [((0, 0), 'cap'), ((6, 6), 'ses'), ((7, 5), 'es'), ((2, 3), 'ple'), ((6, 8), 's'), ((4, 7), 'e')]
However, It’s really hard for me to migrate this algrithom by using Rust Programming language. The problem is Rust programming language always uses iter(say string1.chars()
) to get chars instead of index.
Any 🦀 can help me?
Conclusion
I found an interesting thing when I solve this problem: How Do biologists find the common part of two DNA or RNA?
Advertisement
Answer
Update on 2021-10-27
I find Google’s awesome repo:
👉 https://github.com/google/diff-match-patch
But there is no rust language 🦀.
Old answer
Thanks for your reply. I’ve found a simple solution here.
Welcome to upgrade it 😘:
#![allow(unused)] #[derive(Debug)] struct Common(u32, String); pub fn common(old_content: &str, new_content: &str) { let mut commons: Vec<Common> = vec![]; let mut sub = String::new(); // cdx is the start index of common substring in old content. let mut cdx = 0u32; let mut idx = 0u32; // inner loop counter let mut odx = 0u32; // outer loop counter let mut new_iter = new_content.chars().peekable(); while (!new_iter.peek().is_none()) { for (old_ch, new_ch) in old_content.chars().zip(new_iter.clone()) { if old_ch == new_ch { if sub.is_empty() { cdx = idx; } sub.push(old_ch); } else { if sub.len() > 0 { commons.push(Common(cdx, sub.clone())); sub.clear(); } } idx += 1; } new_iter.next(); odx += 1; idx = 0; } odx = 1; idx = 0; let mut old_iter = old_content.chars().skip(1).peekable(); while (!old_iter.peek().is_none()) { for (new_ch, old_ch) in new_content.chars().zip(old_iter.clone()) { if old_ch == new_ch { if sub.is_empty() { cdx = odx + idx; } sub.push(old_ch); } else { if sub.len() > 0 { commons.push(Common(cdx, sub.clone())); sub.clear(); } } idx += 1; } old_iter.next(); odx += 1; idx = 0; } println!("{:?}", commons); } #[cfg(test)] mod tests { use self::super::*; #[test] fn test_common() { // test with print output: cargo test -- --nocapture common("apkleses", "appleses"); common("cappleses", "caplekses"); } }
The solution may not be rusty, but it worked.
update on 2021-11-3
I find a non iterator solution:collect elements into a Vec.