Natural Sciences and Programming: Consensus and Profile (ROSALIND CONS)

Saturday, 19 November 2016


NumPy helps!
Given: A collection of at most 10 DNA strings of equal length (at most 1 kbp) in FASTA format.
Return: A consensus string and profile matrix for the collection. (If several possible consensus strings exist, then you may return any one of them.)
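
The two outputs can be shown with a minimal standard-library sketch. It is separate from the NumPy solution, and the helper name profile_and_consensus and the three short strings are made up for illustration. The profile matrix counts how many strings carry A, C, G and T at each position. The consensus string takes the most frequent nucleotide of each column. Python's max() returns the first maximal item, so ties go to whichever letter comes first in ACGT, which is one of the allowed answers.

```python
from collections import Counter


def profile_and_consensus(seqs):
    """Return (profile, consensus) for a list of equal-length DNA strings.

    profile maps each nucleotide to its list of per-position counts.
    """
    nucleotides = 'ACGT'
    profile = {n: [] for n in nucleotides}
    consensus = []
    for column in zip(*seqs):  # one tuple of characters per position
        counts = Counter(column)
        for n in nucleotides:
            profile[n].append(counts[n])  # Counter returns 0 for absent letters
        # max() keeps the first maximal letter, so ties go to A, then C, G, T
        consensus.append(max(nucleotides, key=lambda base: counts[base]))
    return profile, ''.join(consensus)


seqs = ['ATGC', 'AAGC', 'TTGA']  # made-up example strings
profile, consensus = profile_and_consensus(seqs)
print(consensus)
for n in 'ACGT':
    print(n + ': ' + ' '.join(map(str, profile[n])))
```

For these three strings the sketch prints ATGC, followed by the rows A: 2 1 0 1, C: 0 0 0 2, G: 0 0 3 0 and T: 1 2 0 0.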

import re

import numpy as np

# Read the whole FASTA file; "with" closes the file automatically
with open('11.txt', 'r') as f:
    data = f.read()

# Each match is a (header, sequence) pair. [ACGT\n]+ takes every sequence line
# up to the next '>' (or the end of the file), so multi-line records and a
# missing final newline are both handled
strings = re.findall(r'(>Rosalind_[0-9]+)\n([ACGT\n]+)', data)
nucleotides = ['A', 'C', 'G', 'T']
str_len = len(strings[0][1].replace('\n', ''))

# Profile matrix: profile[k, j] is how many strings have nucleotides[k] at position j
profile = np.zeros((4, str_len), dtype=int)
for header, seq in strings:
    st = seq.replace('\n', '')
    for position, nucleotide in enumerate(st):
        profile[nucleotides.index(nucleotide), position] += 1

# Consensus: the most frequent nucleotide in each column. argmax returns the
# first maximum, so ties go to A, then C, G, T (any tied choice is accepted)
consensus = ''.join(nucleotides[k] for k in profile.argmax(axis=0))
print(consensus)

# Profile output: one row per nucleotide, counts separated by spaces
for k, nucleotide in enumerate(nucleotides):
    print(nucleotide + ': ' + ' '.join(str(n) for n in profile[k]))
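
NumPy can also replace the per-character loop with a single broadcast comparison. The following is a sketch of that alternative, not the author's code. The helper names read_fasta, profile_matrix and consensus, and the small inline FASTA sample, are hypothetical. read_fasta splits the text into lines instead of using a regular expression, so it handles sequences wrapped over several lines and a file without a final newline. profile_matrix stacks the strings into an m × n character array and compares it with A, C, G and T in one step.

```python
import numpy as np

NUCLEOTIDES = np.array(list('ACGT'))


def read_fasta(text):
    """Parse FASTA text into a list of sequences; headers are discarded."""
    seqs = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith('>'):
            seqs.append('')      # a header starts a new record
        elif line:
            seqs[-1] += line     # sequence lines extend the current record
    return seqs


def profile_matrix(seqs):
    """Return a 4 x n integer array of A/C/G/T counts at each position."""
    chars = np.array([list(s) for s in seqs])  # shape (m, n), one row per string
    # (4, 1, 1) == (1, m, n) broadcasts to a (4, m, n) boolean array;
    # summing over the strings axis leaves the (4, n) counts
    return (NUCLEOTIDES[:, None, None] == chars[None, :, :]).sum(axis=1)


def consensus(profile):
    """Most frequent nucleotide per column; argmax keeps the first maximum on ties."""
    return ''.join(NUCLEOTIDES[profile.argmax(axis=0)])


# Hypothetical sample: the second record is wrapped and there is no final newline
sample = '>Rosalind_1\nACGT\n>Rosalind_2\nAC\nGA\n>Rosalind_3\nTCGT'

profile = profile_matrix(read_fasta(sample))
print(consensus(profile))
for letter, row in zip('ACGT', profile):
    print(letter + ': ' + ' '.join(map(str, row)))
```

For the sample, the sketch prints ACGT and the rows A: 2 0 0 1, C: 0 3 0 0, G: 0 0 3 0, T: 1 0 0 2. To process the real dataset, read 11.txt with open() as in the original and pass its text to read_fasta. The broadcast creates a temporary 4 × m × n boolean array. For at most 10 strings of 1 kbp, that is only about 40,000 entries.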
