Skip to content
Advertisement

How can I find all common sub strings using Rust , Python, javascript?

Problem Background

Git is an awesome version control system, I want learn git by writing my own version control system. The first step I have to do is implement a string diff tool. I’ve read this blog and this paper. In order to get the diff of two strings, I need to locate the common part. Hence, I came into the problem: How can I find all common sub strings of two strings?

This is first part of my problem:The algorithm problem.

This is the algorithm I am using:

Algorithm Problem

【Problem】Find all common sub strings of string1 and string2.

【Solution】

  1. compare all sub string of string1 with string2 then gather matches into answer.
  2. compare all sub string of string2 with string1 then gather matches into answer.

This algorithm is O(N^2) time complex.

The language Problem

To proof my thought, I can easily implement it with Python:

def commonSubstringFinder(string1, string2):
    answer=[]
    len1, len2 = len(string1), len(string2)
    match = ""
    x,y = 0,0
    for i in range(len2):
        for j in range(len1):
            if ( i+j < len2) and (string1[j] == string2[i+j]):
                if len(match)==0:
                    x,y=i+j,j
                match += string1[j]
            else:
                if len(match)>0:
                    answer.append(((x,y), match))
                    match=""
    
    for i in range(1,len1):
        for j in range(len2):
            if (i+j<len1 and string1[i+j] == string2[j] ):
                if len(match)==0:
                    x,y=j,i+j
                match += string2[j]
            else:
                if len(match)>0:
                    answer.append(((x,y), match))
                    match=""
                
    return answer

print(commonSubstringFinder("apkleses", "appleses"))
print(commonSubstringFinder("cappleses", "caplekses"))

# [((0, 0), 'ap'), ((3, 3), 'leses'), ((2, 1), 'p'), ((6, 4), 'es'), ((4, 6), 'es')]
# [((0, 0), 'cap'), ((6, 6), 'ses'), ((7, 5), 'es'), ((2, 3), 'ple'), ((6, 8), 's'), ((4, 7), 'e')]

However, It’s really hard for me to migrate this algrithom by using Rust Programming language. The problem is Rust programming language always uses iter(say string1.chars()) to get chars instead of index.

Any 🦀 can help me?

Conclusion

I found an interesting thing when I solve this problem: How Do biologists find the common part of two DNA or RNA?

Advertisement

Answer

Update on 2021-10-27

I find Google’s awesome repo:

👉 https://github.com/google/diff-match-patch

But there is no rust language 🦀.

Old answer

Thanks for your reply. I’ve found a simple solution here.

Welcome to upgrade it 😘:

#![allow(unused)]

#[derive(Debug)]
struct Common(u32, String);

pub fn common(old_content: &str, new_content: &str) {
    let mut commons: Vec<Common> = vec![];

    let mut sub = String::new();
    // cdx is the start index of common substring in old content.
    let mut cdx = 0u32;
    let mut idx = 0u32; // inner loop counter
    let mut odx = 0u32; // outer loop counter
    let mut new_iter = new_content.chars().peekable();
    while (!new_iter.peek().is_none()) {
        for (old_ch, new_ch) in old_content.chars().zip(new_iter.clone()) {
            if old_ch == new_ch {
                if sub.is_empty() {
                    cdx = idx;
                }
                sub.push(old_ch);
            } else {
                if sub.len() > 0 {
                    commons.push(Common(cdx, sub.clone()));
                    sub.clear();
                }
            }
            idx += 1;
        }
        new_iter.next();
        odx += 1;
        idx = 0;
    }

    odx = 1;
    idx = 0;
    let mut old_iter = old_content.chars().skip(1).peekable();
    while (!old_iter.peek().is_none()) {
        for (new_ch, old_ch) in new_content.chars().zip(old_iter.clone()) {
            if old_ch == new_ch {
                if sub.is_empty() {
                    cdx = odx + idx;
                }
                sub.push(old_ch);
            } else {
                if sub.len() > 0 {
                    commons.push(Common(cdx, sub.clone()));
                    sub.clear();
                }
            }
            idx += 1;
        }
        old_iter.next();
        odx += 1;
        idx = 0;
    }
    println!("{:?}", commons);
}

#[cfg(test)]
mod tests {
    use self::super::*;
    #[test]
    fn test_common() {
        // test with print output: cargo test -- --nocapture
        common("apkleses", "appleses");
        common("cappleses", "caplekses");
    }
}

The solution may not be rusty, but it worked.

update on 2021-11-3

I find a non iterator solution:collect elements into a Vec.

User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement