Natural Sciences and Programming: January 2017

Wednesday, January 18, 2017

What a single line is worth

import sys

x, y = [int(s) for s in sys.stdin.readline().split(" ")]
print(x + y)

Tuesday, January 17, 2017

Calculating Protein Mass (ROSALIND PRTM)

Given: A protein string P of length at most 1000 aa.

Return: The total weight of P. Consult the monoisotopic mass table.

def mass_table():
    # Monoisotopic residue masses (Da) of the 20 standard amino acids.
    return {
        'A': 71.03711,
        'C': 103.00919,
        'D': 115.02694,
        'E': 129.04259,
        'F': 147.06841,
        'G': 57.02146,
        'H': 137.05891,
        'I': 113.08406,
        'K': 128.09496,
        'L': 113.08406,
        'M': 131.04049,
        'N': 114.04293,
        'P': 97.05276,
        'Q': 128.05858,
        'R': 156.10111,
        'S': 87.03203,
        'T': 101.04768,
        'V': 99.06841,
        'W': 186.07931,
        'Y': 163.06333,
    }

protein = input()
mt = mass_table()
mass = 0

for aa in protein:
    mass += mt[aa]
print(round(mass,3))
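The loop above can also be written as a single sum over the residue masses. A minimal sketch, reusing the same mass values; the names MONOISOTOPIC_MASS and protein_mass are illustrative and not from the original post, and the test string is just a short example protein.

```python
# Monoisotopic residue masses (Da), same values as mass_table() above.
MONOISOTOPIC_MASS = {
    'A': 71.03711, 'C': 103.00919, 'D': 115.02694, 'E': 129.04259,
    'F': 147.06841, 'G': 57.02146, 'H': 137.05891, 'I': 113.08406,
    'K': 128.09496, 'L': 113.08406, 'M': 131.04049, 'N': 114.04293,
    'P': 97.05276, 'Q': 128.05858, 'R': 156.10111, 'S': 87.03203,
    'T': 101.04768, 'V': 99.06841, 'W': 186.07931, 'Y': 163.06333,
}


def protein_mass(protein):
    """Sum the residue masses of a protein string (hypothetical helper)."""
    return sum(MONOISOTOPIC_MASS[aa] for aa in protein)


if __name__ == "__main__":
    # S + K + A + D + Y + E + K
    print(round(protein_mass("SKADYEK"), 3))  # 821.392
```

An unknown letter raises KeyError, which is probably what you want for malformed input.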

Enumerating Gene Orders (ROSALIND PERM)

Given: A positive integer n ≤ 7.

Return: The total number of permutations of length n, followed by a list of all such permutations (in any order).

copy.deepcopy() rules!

import copy
n = int(input())
listn = list(range(1, n+1))
results = []
for i in range(1, n+1):
    results.append([i])
for position in range(1, n):
    new_results = []
    for result in results:
        new_listn = [x for x in listn if x not in result]
        for element in new_listn:
            new_result = copy.deepcopy(result)
            new_result.append(element)
            new_results.append(new_result)
    results = new_results
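# Use results here: new_results is never created when n == 1.
print(len(results))

# One permutation per line, elements separated by spaces.
for result in results:
    print(*result)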
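The same enumeration is available in the standard library. A short sketch that swaps the hand-rolled deepcopy loop for itertools.permutations; the function name all_permutations is an illustrative assumption.

```python
from itertools import permutations


def all_permutations(n):
    """Return every ordering of 1..n as a list of tuples (hypothetical helper)."""
    return list(permutations(range(1, n + 1)))


if __name__ == "__main__":
    perms = all_permutations(3)
    print(len(perms))
    for p in perms:
        # One permutation per line, elements separated by spaces.
        print(*p)
```

For n ≤ 7 there are at most 5040 permutations, so building the whole list in memory is not a concern.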

Sunday, January 15, 2017

Open Reading Frames (ROSALIND ORF)

Given: A DNA string s of length at most 1 kbp in FASTA format.

Return: Every distinct candidate protein string that can be translated from ORFs of s. Strings can be returned in any order.

I use the codon_table() function here.

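def find_substr(find_what, find_where):
    # Return every index at which find_what occurs in find_where (overlaps included).
    res = []
    for i in range(len(find_where) - len(find_what) + 1):
        flag = True
        for j in range(len(find_what)):
            if find_where[i + j] != find_what[j]:
                flag = False
                break
        # Record the position only after the whole pattern has been compared.
        if flag:
            res.append(i)
    return res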

def protein_maker(line):
    # Translate every ORF: from each ATG up to the first in-frame stop codon.
    proteins = []
    start_points = find_substr('ATG', line)

    for i in start_points:
        # Start fresh for each ATG so an unterminated frame cannot leak into the next one.
        protein_string = ''
        k = i
        while k < len(line) - 2:
            codon = line[k:k + 3]
            if codon == 'TAG' or codon == 'TGA' or codon == 'TAA':
                proteins.append(protein_string)
                break
            else:
                protein_string += c_table[codon]
                k += 3

    return proteins

f = open('orf.txt', 'r')
line = ''
counter = 0
for l in f:
    # Skip the FASTA header (the first line).
    if counter != 0:
        line += l
    counter += 1
line = line.replace('\n', '')

c_table = codon_table()

# consider the reverse complement
# reverse
reverse_line = line[::-1]

# complement
reverse_compliment_line = ''

for k in reverse_line:
    if k == 'A':
        reverse_compliment_line += 'T'
    if k == 'T':
        reverse_compliment_line += 'A'
    if k == 'C':
        reverse_compliment_line += 'G'
    if k == 'G':
        reverse_compliment_line += 'C'

prts = protein_maker(line)
rev_prts = protein_maker(reverse_compliment_line)
proteins = list(set(prts)|set(rev_prts))
for i in proteins:
    print(i)
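The reverse-complement step above (reverse the string, then swap A with T and C with G) can be done in one pass with str.translate. A minimal sketch; the names COMPLEMENT and reverse_complement are illustrative, and it assumes the input contains only the upper-case letters A, C, G and T.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Complement every base, then reverse the string (hypothetical helper)."""
    return dna.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("AAAACCCGGT"))  # ACCGGGTTTT
```

Unlike the chain of if statements, translate() leaves unexpected characters unchanged instead of dropping them, so malformed input stays visible in the result.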

Saturday, January 14, 2017

Inferring mRNA from Protein (ROSALIND MRNA)

Given: A protein string of length at most 1000 aa.

Return: The total number of different RNA strings from which the protein could have been translated, modulo 1,000,000. (Don't neglect the importance of the stop codon in protein translation.)

Using my old function codon_table()

def rev_codon_table():
    # Count how many codons encode each amino acid.
    c_t = codon_table()
    rev_c_t = {}
    for key in sorted(c_t.keys()):
        val = c_t[key]
        if val in rev_c_t:
            rev_c_t[val] += 1
        else:
            rev_c_t[val] = 1

    return rev_c_t

f = open('mrna.txt', 'r')
line = ''
for l in f:
    line += l
line = line.replace('\n', '')

rct = rev_codon_table()
res = 3 #there are 3 stop-codons

for s in line:
    res = res*rct[s]%1000000

print(res)
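Since codon_table() itself is not shown in this post, here is a self-contained sketch of the same counting idea. It builds a stand-in table of the standard genetic code; the names CODON_TABLE, CODON_COUNTS and count_rna_strings, and the use of '*' for stop codons, are assumptions of this sketch and may differ from what codon_table() returns.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated with bases in U, C, A, G order.
# '*' marks a stop codon (an encoding chosen for this sketch).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons that encode each amino acid (and the stop signal).
CODON_COUNTS = Counter(CODON_TABLE.values())


def count_rna_strings(protein, modulus=1_000_000):
    """Count RNA strings translating to protein, modulo modulus."""
    # Every candidate RNA string must end with one of the stop codons.
    total = CODON_COUNTS["*"] % modulus
    for aa in protein:
        # Reduce at every step so the intermediate product stays small.
        total = total * CODON_COUNTS[aa] % modulus
    return total


if __name__ == "__main__":
    # M has 1 codon, A has 4, and there are 3 stop codons: 1 * 4 * 3.
    print(count_rna_strings("MA"))  # 12
```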

Finding a Protein Motif (ROSALIND MPRT)

How to read from web in Python 3 using the standard library:

urllib.request.urlopen

import urllib.request
response = urllib.request.urlopen('http://www.example.com/')
html = response.read()

urllib.request.urlretrieve

import urllib.request
urllib.request.urlretrieve('http://www.example.com/songs/mp3.mp3', 'mp3.mp3')
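urlopen() returns bytes, so the body usually has to be decoded before it can be handled as a string. A minimal sketch, assuming the page is UTF-8 encoded; fetch_text is a hypothetical helper name, not part of urllib.

```python
import urllib.request


def fetch_text(url, encoding="utf-8"):
    """Download url and return its body as text (hypothetical helper)."""
    # The context manager closes the connection even if read() raises.
    with urllib.request.urlopen(url) as response:
        return response.read().decode(encoding)
```

For example, html = fetch_text('http://www.example.com/') gives the page as a str, with no temporary file to clean up afterwards.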

Given: At most 15 UniProt Protein Database access IDs.

Return: For each protein possessing the N-glycosylation motif, output its given access ID followed by a list of locations in the protein string where the motif can be found.

import urllib.request
import os

filenames = []
sequences = []
f = open('mprt.txt', 'r')
for l in f:
    l = l.replace('\n', '')
    filename = l + '.txt'
    filenames.append(l)
    urllib.request.urlretrieve('http://www.uniprot.org/uniprot/' + l + '.fasta', filename)
    f1 = open(filename, 'r')
    counter = 0
    line = ''
    for l1 in f1:
        # Skip the FASTA header (the first line).
        if counter != 0:
            line += l1
        counter += 1
    line = line.replace('\n', '')
    sequences.append(line)
    f1.close()
    os.remove(filename)

res = []
for s in sequences:
    positions = ''
    for i in range(0, len(s)-3):
        if s[i] == 'N':
            if s[i+1] != 'P':
                if s[i+2] == 'S' or s[i+2] == 'T':
                    if s[i+3] != 'P':
                        pos = i+1
                        positions += str(pos) + ' '
    res.append(positions)

for q in range(len(filenames)):
    if len(res[q]) > 0:
        print(filenames[q])
        print(res[q])
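The four nested checks above (N, then not P, then S or T, then not P) map directly onto a regular expression. A sketch using the standard re module; the names N_GLYCOSYLATION and motif_positions are illustrative. The lookahead matters because motif occurrences may overlap, and a plain pattern would skip an occurrence that starts inside the previous match.

```python
import re

# N, then anything but P, then S or T, then anything but P.
# The lookahead (?=...) consumes no characters, so overlapping hits are kept.
N_GLYCOSYLATION = re.compile(r"(?=N[^P][ST][^P])")


def motif_positions(sequence):
    """Return 1-based start positions of the motif (hypothetical helper)."""
    return [match.start() + 1 for match in N_GLYCOSYLATION.finditer(sequence)]


if __name__ == "__main__":
    # Two overlapping occurrences: NNTS at position 1 and NTSY at position 2.
    print(*motif_positions("NNTSY"))  # 1 2
```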
