I am doing a bioinformatics course and I am trying to write a function to find all occurrences of a substring within a string.
```python
def find_match(s, t):
    """Returns a list of all positions of a substring t in string s.
    Takes two arguments: s & t.
    """
    occurrences = []
    for i in range(len(s)-len(t)+1):  # loop over alignment
        match = True
        for j in range(len(t)):  # loop over characters
            if s[i+j] != t[j]:  # compare characters
                match = False  # mismatch
                break
            if match:  # all chars matched
                occurrences.append(i)
    return(occurrences)

print(find_match("GATATATGCATATACTT", "ATAT"))
# [1, 1, 1, 1, 3, 3, 3, 3, 5, 5, 9, 9, 9, 9, 11, 11, 11, 13]
print(find_match("AUGCUUCAGAAAGGUCUUACG", "U"))
# [1, 4, 5, 14, 16, 17]
```
Instead, the output should match the following exactly:
[2, 4, 10]
[2, 5, 6, 15, 17, 18]
How can I fix this? Preferably without using regular expressions.
Answer
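It looks like the code is mis-indented: the `if match:` block has to be outside the inner `for` loop. Inside it, the check runs after every character comparison, so a position is appended once for each character that matches before the first mismatch (or `len(t)` times for a full match). That is where the repeated values come from. The expected output also counts positions from 1, while Python indexes from 0, so the fix below appends `i + 1` instead of `i`.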
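As an alternative sketch (not the code from the question or the fix below), Python's `for`/`else` makes the "check only after the inner loop finishes" placement explicit and removes the `match` flag entirely. The `else` clause of a `for` loop runs only when the loop was not exited with `break`, which here means every character matched. The name `find_match_for_else` is just an illustrative choice, and the function assumes 1-based positions as in the expected output.

```python
def find_match_for_else(s, t):
    """Return 1-based start positions of every (possibly overlapping) occurrence of t in s."""
    occurrences = []
    for i in range(len(s) - len(t) + 1):   # loop over alignments
        for j in range(len(t)):            # loop over characters
            if s[i + j] != t[j]:           # mismatch: abandon this alignment
                break
        else:
            # Runs only if the inner loop did not break, i.e. all characters matched.
            occurrences.append(i + 1)      # convert 0-based index to 1-based position
    return occurrences


print(find_match_for_else("GATATATGCATATACTT", "ATAT"))   # [2, 4, 10]
print(find_match_for_else("AUGCUUCAGAAAGGUCUUACG", "U"))  # [2, 5, 6, 15, 17, 18]
```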
```python
def find_match(s, t):
    """Returns a list of all positions of a substring t in string s.
    Takes two arguments: s & t.
    """
    occurrences = []
    for i in range(len(s)-len(t)+1):  # loop over alignment
        match = True
        for j in range(len(t)):  # loop over characters
            if s[i+j] != t[j]:  # compare characters
                match = False  # mismatch
                break
        if match:  # <--- this must not be inside the inner for loop
            occurrences.append(i + 1)  # +1 to report 1-based positions
    return occurrences

print(find_match("GATATATGCATATACTT", "ATAT"))
# [2, 4, 10]
print(find_match("AUGCUUCAGAAAGGUCUUACG", "U"))
# [2, 5, 6, 15, 17, 18]
```
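Since the question asks for a solution without regular expressions, here is one more option that leans on the built-in `str.find` instead of comparing characters by hand. This is a sketch under the same assumptions as above (1-based positions, overlapping matches allowed); the function name `find_match_find` is hypothetical.

```python
def find_match_find(s, t):
    """Return 1-based start positions of every (possibly overlapping) occurrence of t in s."""
    occurrences = []
    start = s.find(t)                  # first 0-based index, or -1 if t is absent
    while start != -1:
        occurrences.append(start + 1)  # report as a 1-based position
        start = s.find(t, start + 1)   # resume one past the last hit so overlaps are found
    return occurrences


print(find_match_find("GATATATGCATATACTT", "ATAT"))   # [2, 4, 10]
print(find_match_find("AUGCUUCAGAAAGGUCUUACG", "U"))  # [2, 5, 6, 15, 17, 18]
```

Restarting the search at `start + 1` rather than `start + len(t)` is what keeps overlapping hits such as positions 2 and 4 in the first example.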