Wednesday, December 28, 2016

Finding a Shared Motif (ROSALIND LCSM)

Given: A collection of k (k ≤ 100) DNA strings of length at most 1 kbp each in FASTA format.

Return: A longest common substring of the collection. (If multiple solutions exist, you may return any single solution.)
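To pin down what "a longest common substring" means before looking at the solution, here is a minimal brute-force sketch. It is not the approach used in this post; it simply enumerates substrings of the shortest string from longest to shortest and returns the first one found in every string. The function name and the three-string collection are made up for illustration, and it assumes a non-empty list of plain sequences (FASTA already parsed).

```python
def longest_common_substring(strings):
    """Brute-force longest common substring of a list of strings.

    Tries every substring of the shortest string, longest first, and
    returns the first candidate that occurs in all strings.
    """
    if not strings:
        return ''
    shortest = min(strings, key=len)
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in s for s in strings):
                return candidate
    return ''  # no common substring, not even a single letter


# Hypothetical example collection.
dna = ['GATTACA', 'TAGACCA', 'ATACA']
print(longest_common_substring(dna))  # → TA
```

Here TA, AC and CA are all valid answers of length 2; the sketch returns whichever it reaches first. With up to 100 strings of 1 kbp this is only a reference for small inputs, since the number of candidates grows quadratically with the string length.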

def read_fasta(path):
    # Return the sequences of a FASTA file as a list of strings,
    # joining the wrapped lines of each record.
    strings = []
    current = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line.startswith('>'):
                if current:
                    strings.append(''.join(current))
                    current = []
            elif line:
                current.append(line)
    if current:
        strings.append(''.join(current))
    return strings


def pre_motif(s1, s2):
    # Level n of the result holds every substring of length n + 1 that
    # occurs in both s1 and s2. A common substring always has a common
    # prefix one letter shorter, so each level is built by extending the
    # previous one with one more nucleotide. A loop replaces recursion,
    # which could hit Python's default recursion limit (1000) on long
    # shared substrings.
    letters = 'ACGT'
    levels = []
    current = [c for c in letters if c in s1 and c in s2]
    while current:
        levels.append(current)
        current = [head + letter
                   for head in current
                   for letter in letters
                   if head + letter in s1 and head + letter in s2]
    return levels


def MainFunc(path='17.txt'):
    strings = read_fasta(path)
    s1 = strings.pop()
    s2 = strings.pop()
    levels = pre_motif(s1, s2)
    # Try the longest candidates first: the first motif that also occurs
    # in every remaining string is a longest common substring.
    for level in reversed(levels):
        for motif in level:
            if all(motif in s for s in strings):
                return motif
    return 'Found nothing.'


def main():
    print(MainFunc())


if __name__ == "__main__":
    main()
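
The level-by-level growth in pre_motif relies on one property: if the strings share a substring of length L, they also share one of every shorter length (its prefixes). The same property allows a binary search on the length instead of building every level. The sketch below is an alternative technique, not the method used in this post; the function names are hypothetical, and it assumes a non-empty list of already-parsed sequences.

```python
def common_substrings_of_length(strings, length):
    """Return the set of substrings of the given length shared by all strings."""
    shared = {strings[0][i:i + length]
              for i in range(len(strings[0]) - length + 1)}
    for s in strings[1:]:
        shared &= {s[i:i + length] for i in range(len(s) - length + 1)}
        if not shared:
            break  # no point scanning the remaining strings
    return shared


def lcs_binary_search(strings):
    """Longest common substring via binary search on its length."""
    lo, hi = 0, min(len(s) for s in strings)
    best = ''
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        shared = common_substrings_of_length(strings, mid)
        if shared:
            best = min(shared)  # any member is valid; min() keeps output deterministic
            lo = mid
        else:
            hi = mid - 1
    return best


print(lcs_binary_search(['GATTACA', 'TAGACCA', 'ATACA']))  # → AC
```

Each probe builds one set of substrings per string, so only about log2(1000) ≈ 10 lengths are tested for 1 kbp inputs, and every string in the collection is checked at each probe rather than only the first two.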
