Problem Background
Git is an awesome version control system, I want learn git by writing my own version control system. The first step I have to do is implement a string diff tool. I’ve read this blog and this paper. In order to get the diff of two strings, I need to locate the common part. Hence, I came into the problem: How can I find all common sub strings of two strings?
This is first part of my problem:The algorithm problem.
This is the algorithm I am using:
Algorithm Problem
【Problem】Find all common sub strings of string1 and string2.
【Solution】
- compare all sub string of string1 with string2 then gather matches into answer.
- compare all sub string of string2 with string1 then gather matches into answer.
This algorithm is O(N^2) time complex.
The language Problem
To proof my thought, I can easily implement it with Python:
def commonSubstringFinder(string1, string2):
answer=[]
len1, len2 = len(string1), len(string2)
match = ""
x,y = 0,0
for i in range(len2):
for j in range(len1):
if ( i+j < len2) and (string1[j] == string2[i+j]):
if len(match)==0:
x,y=i+j,j
match += string1[j]
else:
if len(match)>0:
answer.append(((x,y), match))
match=""
for i in range(1,len1):
for j in range(len2):
if (i+j<len1 and string1[i+j] == string2[j] ):
if len(match)==0:
x,y=j,i+j
match += string2[j]
else:
if len(match)>0:
answer.append(((x,y), match))
match=""
return answer
print(commonSubstringFinder("apkleses", "appleses"))
print(commonSubstringFinder("cappleses", "caplekses"))
# [((0, 0), 'ap'), ((3, 3), 'leses'), ((2, 1), 'p'), ((6, 4), 'es'), ((4, 6), 'es')]
# [((0, 0), 'cap'), ((6, 6), 'ses'), ((7, 5), 'es'), ((2, 3), 'ple'), ((6, 8), 's'), ((4, 7), 'e')]
However, It’s really hard for me to migrate this algrithom by using Rust Programming language. The problem is Rust programming language always uses iter(say string1.chars()
) to get chars instead of index.
Any 🦀 can help me?
Conclusion
I found an interesting thing when I solve this problem: How Do biologists find the common part of two DNA or RNA?
Advertisement
Answer
Update on 2021-10-27
I find Google’s awesome repo:
👉 https://github.com/google/diff-match-patch
But there is no rust language 🦀.
Old answer
Thanks for your reply. I’ve found a simple solution here.
Welcome to upgrade it 😘:
#![allow(unused)]
#[derive(Debug)]
struct Common(u32, String);
pub fn common(old_content: &str, new_content: &str) {
let mut commons: Vec<Common> = vec![];
let mut sub = String::new();
// cdx is the start index of common substring in old content.
let mut cdx = 0u32;
let mut idx = 0u32; // inner loop counter
let mut odx = 0u32; // outer loop counter
let mut new_iter = new_content.chars().peekable();
while (!new_iter.peek().is_none()) {
for (old_ch, new_ch) in old_content.chars().zip(new_iter.clone()) {
if old_ch == new_ch {
if sub.is_empty() {
cdx = idx;
}
sub.push(old_ch);
} else {
if sub.len() > 0 {
commons.push(Common(cdx, sub.clone()));
sub.clear();
}
}
idx += 1;
}
new_iter.next();
odx += 1;
idx = 0;
}
odx = 1;
idx = 0;
let mut old_iter = old_content.chars().skip(1).peekable();
while (!old_iter.peek().is_none()) {
for (new_ch, old_ch) in new_content.chars().zip(old_iter.clone()) {
if old_ch == new_ch {
if sub.is_empty() {
cdx = odx + idx;
}
sub.push(old_ch);
} else {
if sub.len() > 0 {
commons.push(Common(cdx, sub.clone()));
sub.clear();
}
}
idx += 1;
}
old_iter.next();
odx += 1;
idx = 0;
}
println!("{:?}", commons);
}
#[cfg(test)]
mod tests {
use self::super::*;
#[test]
fn test_common() {
// test with print output: cargo test -- --nocapture
common("apkleses", "appleses");
common("cappleses", "caplekses");
}
}
The solution may not be rusty, but it worked.
update on 2021-11-3
I find a non iterator solution:collect elements into a Vec.