t = '010's = '0111010100001'print(s.find(t)) - first position
Common problem: we have string s (usually very long) and string t (usually short). Find positions, where t is a substring of s.
Example: for s = 'TATGCAATGC' and t = 'TGC' positions are: 2, 7.
Trere are several algorithms to find them: wikipedia.
I'll show here two easiest ways:
- naive algorithm
- knuth-morris-pratt algorithm
def substring_positions(s,t): res = [] for i in range(len(s) - len(t) + 1): flag = True for j in range(len(t)): if s[i + j] != t[j]: flag = False break if flag: res.append(str(i)) return res
knuth-morris-pratt algorithm:
def kmp(s,t, limit = 1): # building prefix table pt for s # it's i-th element shows, how many steps go backward on s # if there will be a mismatch with t string pt = [0] for m in s[1:]: j = pt[-1] while j > 0 and m != s[j]: j = pt[j-1] if m == s[j]: j += 1 pt.append(j) # walk through the s res = [] j = 0 # number or symbols matched with t for i in range(len(s)): while j > 0 and s[i] != t[j]: j = pt[j - 1] if s[i] == t[j]: j += 1 if j == len(t): res.append(i - j + 1) if len(res) == limit: return res j = 0 return res
Комментариев нет:
Отправить комментарий