Wednesday, 16 November 2016

Computing GC Content (ROSALIND GC)

Given: At most 10 DNA strings in FASTA format (of length at most 1 kbp each).
Return: The ID of the string having the highest GC-content, followed by the GC-content of that string. Rosalind allows for a default error of 0.001 in all decimal answers unless otherwise stated (see the note on absolute error in the original problem statement).
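GC-content is the percentage of bases in a DNA string that are G or C. A minimal sketch of that formula, assuming the sequence is a single line containing only A, C, G and T. It uses str.count rather than regular expressions, and the name gc_percent is hypothetical:

```python
def gc_percent(seq):
    # GC-content = (number of G + number of C) / (sequence length) * 100
    # Assumes seq contains only A, C, G, T (no newlines or other symbols)
    seq = seq.upper()
    gc = seq.count('G') + seq.count('C')
    return 100.0 * gc / len(seq)

print(gc_percent("AGCTATAG"))  # 3 of 8 bases are G or C -> 37.5
```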

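It was much easier to solve this with regular expressions than with loops over the characters.
Since I use Python 2.7, it doesn't work without from __future__ import division: in Python 2, / between two integers is floor division, so 100*CG/ACGT would be truncated to a whole number.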
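A small illustration of the two kinds of division, written for Python 3, where / is always true division and // is floor division (the numbers are just an example):

```python
cg, total = 3, 8  # e.g. 3 G/C bases out of 8

# True division: what "from __future__ import division" enables in Python 2,
# and what "/" always does in Python 3
true_pct = 100 * cg / total
print(true_pct)   # 37.5

# Floor division: what "/" did for two ints in Python 2 without the import
floor_pct = 100 * cg // total
print(floor_pct)  # 37
```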

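from __future__ import division, print_function
import re

def cg_content(s):
    # Count G/C and A/T bases; newlines and other characters are ignored
    CG = len(re.findall(r'[CG]', s))
    AT = len(re.findall(r'[AT]', s))
    ACGT = CG + AT
    # Percentage rounded to 6 decimals (Rosalind accepts an error of 0.001)
    return round(100 * CG / ACGT, 6)

# Read the whole FASTA file; the with-block closes it automatically
with open('6.txt', 'r') as f:
    data = f.read()

# Each match is (ID, sequence): the ID follows '>', and the sequence spans
# all following lines up to the next '>' or the end of the file
strings = re.findall(r'>(\S+)\n([A-Z\n]+)', data)

# Start below 0 so that the first record is always taken, even at 0% GC
max_cg_content = {'id_string': '', 'string': '', 'cg_content': -1}
for s in strings:
    string_cg_content = cg_content(s[1])
    if string_cg_content > max_cg_content['cg_content']:
        max_cg_content['id_string'] = s[0]
        max_cg_content['string'] = s[1]
        max_cg_content['cg_content'] = string_cg_content

print(max_cg_content['id_string'])
print(max_cg_content['cg_content'])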
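To test the approach without a file, the same regex-based idea can be wrapped in a function that takes the FASTA text as a string. This is a sketch in Python 3; the function name highest_gc and the sample records are made up for illustration:

```python
import re

def highest_gc(fasta_text):
    """Return (ID, GC-content in percent) of the record with the highest GC-content."""
    # Each match is (ID, sequence); the sequence may span several lines
    records = re.findall(r'>(\S+)\n([A-Z\n]+)', fasta_text)
    best_id, best_gc = None, -1.0
    for rec_id, seq in records:
        cg = len(re.findall(r'[CG]', seq))
        at = len(re.findall(r'[AT]', seq))
        gc = 100 * cg / (cg + at)
        # Strict ">" keeps the first record in case of a tie
        if gc > best_gc:
            best_id, best_gc = rec_id, gc
    return best_id, round(best_gc, 6)

sample = """>seq_1
ATATATAT
GC
>seq_2
GCGCAT
"""
print(highest_gc(sample))
```

Here seq_1 is 20% GC and seq_2 is 4 out of 6 bases, so seq_2 wins with about 66.666667. Using max() with a key function would work just as well; the explicit loop simply mirrors the original code.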