Tuesday, 17 January 2017

Calculating Protein Mass (ROSALIND PRTM)

Given: A protein string P of length at most 1000 aa.

Return: The total weight of P. Consult the monoisotopic mass table.

def mass_table():
    mt = {}
    mt['A'] = 71.03711 
    mt['C'] = 103.00919 
    mt['D'] = 115.02694
    mt['E'] = 129.04259
    mt['F'] = 147.06841 
    mt['G'] = 57.02146 
    mt['H'] = 137.05891 
    mt['I'] = 113.08406 
    mt['K'] = 128.09496 
    mt['L'] = 113.08406 
    mt['M'] = 131.04049
    mt['N'] = 114.04293 
    mt['P'] = 97.05276 
    mt['Q'] = 128.05858 
    mt['R'] = 156.10111 
    mt['S'] = 87.03203 
    mt['T'] = 101.04768 
    mt['V'] = 99.06841 
    mt['W'] = 186.07931
    mt['Y'] = 163.06333 
    return mt

protein = input()
mt = mass_table()
mass = 0 
for aa in protein:
    mass += mt[aa]
print(round(mass,3))
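The same calculation can be written more compactly by keeping the table as a dict literal and replacing the loop with sum(). This is a minimal sketch: the protein_mass name is my own, and SKADYEK is just a short test string.

```python
# Monoisotopic residue masses, the same values as in mass_table() above.
MONOISOTOPIC_MASS = {
    'A': 71.03711, 'C': 103.00919, 'D': 115.02694, 'E': 129.04259,
    'F': 147.06841, 'G': 57.02146, 'H': 137.05891, 'I': 113.08406,
    'K': 128.09496, 'L': 113.08406, 'M': 131.04049, 'N': 114.04293,
    'P': 97.05276, 'Q': 128.05858, 'R': 156.10111, 'S': 87.03203,
    'T': 101.04768, 'V': 99.06841, 'W': 186.07931, 'Y': 163.06333,
}


def protein_mass(protein):
    """Return the summed monoisotopic residue mass of a protein string."""
    # strip() drops a trailing newline left over from reading the input
    return sum(MONOISOTOPIC_MASS[aa] for aa in protein.strip())


if __name__ == '__main__':
    print(round(protein_mass('SKADYEK'), 3))  # prints 821.392
```

An unknown letter raises a KeyError, which is a reasonable signal that the input is not a protein string.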

Enumerating Gene Orders (ROSALIND PERM)

Given: A positive integer n ≤ 7.

Return: The total number of permutations of length n, followed by a list of all such permutations (in any order).

copy.deepcopy() rules!

import copy
n = int(input())
listn = list(range(1, n+1))
results = []
for i in range(1, n+1):
    results.append([i])
for position in range(1, n):
    new_results = []
    for result in results:
        new_listn = [x for x in listn if x not in result]
        for element in new_listn:
            new_result = copy.deepcopy(result)
            new_result.append(element)
            new_results.append(new_result)
    results = new_results
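# print from results, not new_results: for n = 1 the loop above never runs
# and new_results would be undefined
print(len(results))
for result in results:
    print(' '.join(str(x) for x in result))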
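The standard library can also generate the permutations directly. A minimal sketch with itertools.permutations; the function name enumerate_permutations is only for illustration, and it returns tuples rather than lists.

```python
from itertools import permutations


def enumerate_permutations(n):
    """Return all permutations of 1..n as a list of tuples."""
    # itertools.permutations yields them in lexicographic order
    # because the input range is already sorted
    return list(permutations(range(1, n + 1)))


if __name__ == '__main__':
    perms = enumerate_permutations(3)
    print(len(perms))
    for p in perms:
        print(' '.join(map(str, p)))
```

For n = 3 this prints 6 followed by the six orderings, starting with 1 2 3.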

Sunday, 15 January 2017

Open Reading Frames (ROSALIND ORF)

Given: A DNA string s of length at most 1 kbp in FASTA format.

Return: Every distinct candidate protein string that can be translated from ORFs of s. Strings can be returned in any order.

I use my codon_table() function here.

def find_substr(find_what, find_where):
    # return every start index at which find_what occurs in find_where
    res = []
    for i in range(len(find_where)-len(find_what) + 1):
        flag = True
        for j in range(len(find_what)):
            if find_where[i+j] != find_what[j]:
                flag = False
                break
        if flag:
            res.append(i)
    return res

def protein_maker(line):
    proteins = []
    start_points = find_substr('ATG', line)
    for i in start_points:
        # start a fresh protein at every ATG; a reading frame that runs off
        # the end without a stop codon is not an ORF and is discarded
        protein_string = ''
        k = i
        while k < len(line) - 2:
            codon = line[k:k + 3]
            if codon == 'TAG' or codon == 'TGA' or codon == 'TAA':
                proteins.append(protein_string)
                break
            else:
                protein_string += c_table[codon]
                k += 3
    return proteins

f = open('orf.txt', 'r')
line = ''
counter = 0
for l in f:
    if counter != 0:  # skip the FASTA header line
        line += l
    counter += 1
f.close()
line = line.replace('\n', '')

c_table = codon_table()

# consider the reverse complement
# reverse
reverse_line = line[::-1]
# complement
reverse_compliment_line = ''
for k in reverse_line:
    if k == 'A':
        reverse_compliment_line += 'T'
    elif k == 'T':
        reverse_compliment_line += 'A'
    elif k == 'C':
        reverse_compliment_line += 'G'
    elif k == 'G':
        reverse_compliment_line += 'C'
prts = protein_maker(line)
rev_prts = protein_maker(reverse_compliment_line)
proteins = list(set(prts)|set(rev_prts))
for i in proteins:
    print(i)
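A compact, self-contained sketch of the same procedure, so it does not depend on codon_table(). It builds the standard genetic code from a 64-letter string, takes the reverse complement with str.translate, and collects proteins in a set to keep them distinct. The names CODON_TABLE, reverse_complement and orf_proteins are my own, and the input is assumed to contain only uppercase A, C, G, T.

```python
# Standard genetic code: one letter per codon, codons enumerated in TCAG
# order (TTT, TTC, TTA, TTG, TCT, ...); '*' marks a stop codon.
BASES = 'TCAG'
AMINO = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def reverse_complement(dna):
    """Reverse the strand and swap A<->T, C<->G."""
    return dna[::-1].translate(str.maketrans('ACGT', 'TGCA'))


def orf_proteins(dna):
    """Return the set of proteins translated from ORFs on both strands."""
    proteins = set()
    for strand in (dna, reverse_complement(dna)):
        for start in range(len(strand) - 2):
            if strand[start:start + 3] != 'ATG':
                continue
            protein = []
            # step through whole codons only (k + 3 <= len(strand))
            for k in range(start, len(strand) - 2, 3):
                aa = CODON_TABLE[strand[k:k + 3]]
                if aa == '*':
                    proteins.add(''.join(protein))
                    break
                protein.append(aa)
            # if no stop codon was reached, the frame is not an ORF
    return proteins


if __name__ == '__main__':
    for p in orf_proteins('ATGATGTAA'):
        print(p)
```

Here ATGATGTAA gives two ORFs on the forward strand (MM and M), and nothing on the reverse strand.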

Saturday, 14 January 2017

Inferring mRNA from Protein (ROSALIND MRNA)

Given: A protein string of length at most 1000 aa.

Return: The total number of different RNA strings from which the protein could have been translated, modulo 1,000,000. (Don't neglect the importance of the stop codon in protein translation.)

Using my old codon_table() function again.

def rev_codon_table():
    c_t = codon_table()
    rev_c_t = {}
    for key in sorted(c_t.keys()):
        val = c_t[key]
        if val in rev_c_t:
            rev_c_t[val] += 1 
        else:
            rev_c_t[val] = 1     
    return rev_c_t

f = open('mrna.txt', 'r')
line = ''
for l in f:
    line += l
f.close()
line = line.replace('\n', '')

rct = rev_codon_table()
res = 3 #there are 3 stop-codons 
for s in line:
    res = res*rct[s]%1000000
print(res)
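Without codon_table(), the number of codons per amino acid can be counted straight from a string of the standard genetic code (one letter per codon). A minimal sketch; count_rna_sources and CODONS_PER_AMINO are my own names, and the protein is assumed to use only the 20 standard one-letter codes (an unknown letter would count as 0 codons).

```python
from collections import Counter

# Standard genetic code: one letter per codon, codons enumerated in TCAG
# order (TTT, TTC, TTA, TTG, TCT, ...); '*' marks the 3 stop codons.
AMINO_BY_CODON = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'

# Number of codons that encode each amino acid ('*' = stop).
CODONS_PER_AMINO = Counter(AMINO_BY_CODON)


def count_rna_sources(protein, modulus=1_000_000):
    """Count RNA strings (ending in a stop codon) that translate to protein."""
    result = CODONS_PER_AMINO['*'] % modulus  # choice of stop codon
    for aa in protein:
        # reduce at every step so the number never grows large
        result = result * CODONS_PER_AMINO[aa] % modulus
    return result


if __name__ == '__main__':
    print(count_rna_sources('MA'))  # 1 * 4 * 3 = 12
```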

Finding a Protein Motif (ROSALIND MPRT)

How to read from the web in Python 3 using the standard library:
  • urllib.request.urlopen
    import urllib.request
    response = urllib.request.urlopen('http://www.example.com/')
    html = response.read()
  • urllib.request.urlretrieve
    import urllib.request
    urllib.request.urlretrieve('http://www.example.com/songs/mp3.mp3', 'mp3.mp3')
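urlopen() returns bytes, so text has to be decoded. A sketch that reads a FASTA record straight into memory instead of going through a temporary file; fetch_uniprot_sequence and parse_fasta are hypothetical helper names. The default base URL is the one used in the solution below; UniProt has since moved its services, so this address may redirect. Splitting the ID at '_' is an assumption that IDs such as P07204_TRBM_HUMAN carry the accession before the first underscore.

```python
import urllib.request


def parse_fasta(text):
    """Join the sequence lines of a single-record FASTA text, dropping the header."""
    lines = text.strip().splitlines()
    return ''.join(line.strip() for line in lines if not line.startswith('>'))


def fetch_uniprot_sequence(access_id, base_url='http://www.uniprot.org/uniprot/'):
    """Download one UniProt FASTA record and return its sequence (needs network)."""
    # assumption: the accession is the part before the first underscore
    accession = access_id.split('_')[0]
    with urllib.request.urlopen(base_url + accession + '.fasta') as response:
        text = response.read().decode('utf-8')
    return parse_fasta(text)
```

The with statement closes the connection, and nothing is left on disk to clean up with os.remove().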
     
    
     
Given: At most 15 UniProt Protein Database access IDs.

Return: For each protein possessing the N-glycosylation motif, output its given access ID followed by a list of locations in the protein string where the motif can be found.

import urllib.request
import os

filenames = []
sequences = []
f = open('mprt.txt', 'r')
for l in f:
    l = l.strip()
    if not l:  # skip blank lines, e.g. a trailing newline at the end of the file
        continue
    filename = l + '.txt'
    filenames.append(l)
    # download the FASTA record into a temporary file
    urllib.request.urlretrieve('http://www.uniprot.org/uniprot/' + l + '.fasta', filename)
    f1 = open(filename, 'r')
    counter = 0
    line = ''
    for l1 in f1:
        if counter != 0:  # skip the FASTA header line
            line += l1
        counter += 1
    line = line.replace('\n', '')
    sequences.append(line)
    f1.close()
    os.remove(filename)
f.close()

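res = []
for s in sequences:
    # N-glycosylation motif N{P}[ST]{P}: N, then anything but P,
    # then S or T, then anything but P
    positions = ''
    for i in range(0, len(s)-3):
        if s[i] == 'N':
            if s[i+1] != 'P':
                if s[i+2] == 'S' or s[i+2] == 'T':
                    if s[i+3] != 'P':
                        pos = i+1  # ROSALIND positions are 1-based
                        positions += str(pos) + ' '
    res.append(positions)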

for q in range(len(filenames)):
    if len(res[q]) > 0:
        print(filenames[q])
        print(res[q])
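The nested ifs above encode the motif N{P}[ST]{P}. The same check can be written as a regular expression; the lookahead (?=...) matters because motif occurrences can overlap, and a plain re.finditer(r'N[^P][ST][^P]', s) would skip the overlapping ones. motif_positions is an illustrative name.

```python
import re

# N-glycosylation motif N{P}[ST]{P}, wrapped in a zero-width lookahead so
# that every starting position is reported, including overlapping ones.
MOTIF = re.compile(r'(?=N[^P][ST][^P])')


def motif_positions(sequence):
    """Return the 1-based start positions of the motif in sequence."""
    return [m.start() + 1 for m in MOTIF.finditer(sequence)]


if __name__ == '__main__':
    print(' '.join(map(str, motif_positions('NNTSA'))))
```

For NNTSA both N residues start a motif (NNTS and NTSA), so the result is 1 2.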