Natural Sciences and Programming

Wednesday, November 30, 2016

Introduction to Random Strings (ROSALIND PROB)

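If I want the probability that a string s equals a uniformly random string, it is 0.25 raised to the power of the length of s. Here the string is not uniformly random, so the per-symbol probability is not 0.25 but a number that is easy to derive from the GC-content: for GC-content x, each of G and C has probability x/2, and each of A and T has probability (1 - x)/2.
The hint in this problem is very useful.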

Given: A DNA string s of length at most 100 bp and an array A containing at most 20 numbers between 0 and 1.
Return: An array B having the same length as A in which B[k] represents the common logarithm of the probability that a random string constructed with the GC-content found in A[k] will match s exactly.
Hint: One property of the logarithm function is that for any positive numbers x and y, \log_{10} (x \cdot y) = \log_{10} (x) + \log_{10} (y).
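Since every symbol is drawn independently, the hint turns the product of per-symbol probabilities into a sum that depends only on how many G/C and A/T symbols s contains. A minimal sketch of that shortcut follows; the function name `log_match_prob` is made up for illustration, and the code assumes the GC-content lies strictly between 0 and 1.

```python
import math

def log_match_prob(s, gc_content):
    """Common logarithm of the probability that a random string with the
    given GC-content equals s (assumes 0 < gc_content < 1)."""
    gc = sum(1 for ch in s if ch in 'GC')   # symbols that are G or C
    at = len(s) - gc                        # symbols that are A or T
    # log10(p1 * p2 * ...) = log10(p1) + log10(p2) + ...
    return gc * math.log10(gc_content / 2) + at * math.log10((1 - gc_content) / 2)

print(round(log_match_prob('ACGATACAA', 0.129), 3))  # -5.737
```

With a GC-content of 0.5 every symbol has probability 0.25, so the result reduces to the length of s times log10(0.25).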

from __future__ import division
import sys
import math

def atcg_prob(x):
    # Split the GC-content x evenly between G and C, and the rest between A and T
    cg_prob = float(x)
    at_prob = 1 - cg_prob
    probs = {}
    probs['A'] = at_prob / 2
    probs['T'] = probs['A']
    probs['C'] = cg_prob / 2
    probs['G'] = probs['C']
    return probs

def main():
    # Arguments: the DNA string first, then one or more GC-content values
    if len(sys.argv) > 1:
        res = ''
        i = 2
        while i < len(sys.argv):
            cont = atcg_prob(sys.argv[i])
            # Sum per-symbol logarithms instead of multiplying probabilities
            prob = 0
            for nuk in sys.argv[1]:
                prob = prob + math.log10(cont[nuk])
            res = res + str(round(prob, 3)) + ' '
            i += 1
        print(res)
    else:
        print('Enter data!')

if __name__ == '__main__':
    main()

Tuesday, November 29, 2016

Attending lectures at the chemistry department.

I go to the chemistry department and sit in on organic chemistry. I understand a lot, but I can't do anything with it myself. Then again, it would be strange if I could after only 60 hours of studying chemistry. The classes are really interesting, just like a movie. Here is an example of what they cover.
How to obtain phenolphthalein from phenol and phthalic anhydride.
It's not really clear to me why only the upper oxygen is attacked and not the lower one. But that's chemistry for you, a magical science.

And this shows why phenolphthalein changes color in a basic medium. As you can see, the base gives rise to many conjugated double bonds across the whole molecule. These are what change its optical properties, so that in a basic medium it turns crimson. By the way, in an acidic medium it is colorless.

Tuesday, November 22, 2016

Overlap Graphs (ROSALIND GRPH)

Given: A collection of DNA strings in FASTA format having total length at most 10 kbp.
Return: The adjacency list corresponding to O_3. You may return edges in any order.

import re

# Parse FASTA records into (name, sequence lines, last line) tuples;
# the last sequence line may or may not end with a newline
with open('12.txt', 'r') as f:
    reads = re.findall(r'(Rosalind_[0-9]+)\n(([ACGT]+(?:\n|$))+)', f.read())

# Index record names by their 3-letter prefix and suffix
suffix = {}
prefix = {}
for s in reads:
    string = s[1].replace('\n', '')
    if len(string) >= 3:
        prefix.setdefault(string[:3], []).append(s[0])
        suffix.setdefault(string[-3:], []).append(s[0])

# An edge goes from every string ending in rec to every other string starting with rec
for rec in suffix:
    if rec in prefix:
        for tail_name in suffix[rec]:
            for head_name in prefix[rec]:
                if tail_name != head_name:
                    print(tail_name + ' ' + head_name)
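
For experimenting without an input file, the same prefix/suffix indexing can be wrapped in a function. This is only a sketch: the name `overlap_edges`, the dict-based input, and the parameter `k` are assumptions made for illustration, and the records are the sample dataset from the problem page.

```python
from collections import defaultdict

def overlap_edges(records, k=3):
    """Return directed edges (a, b) where the length-k suffix of record a
    equals the length-k prefix of a different record b."""
    by_prefix = defaultdict(list)
    for name, seq in records.items():
        if len(seq) >= k:
            by_prefix[seq[:k]].append(name)
    edges = []
    for name, seq in records.items():
        if len(seq) >= k:
            for other in by_prefix.get(seq[-k:], []):
                if other != name:          # no self-loops
                    edges.append((name, other))
    return edges

records = {
    'Rosalind_0498': 'AAATAAA',
    'Rosalind_2391': 'AAATTTT',
    'Rosalind_2323': 'TTTTCCC',
    'Rosalind_0442': 'AAATCCC',
    'Rosalind_5013': 'GGGTGGG',
}
for a, b in overlap_edges(records):
    print(a, b)
```

Indexing by prefix first means each suffix needs only one dictionary lookup instead of a comparison against every other string.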

Saturday, November 19, 2016

Consensus and Profile (ROSALIND CONS)

numpy helps!
Given: A collection of at most 10 DNA strings of equal length (at most 1 kbp) in FASTA format.
Return: A consensus string and profile matrix for the collection. (If several possible consensus strings exist, then you may return any one of them.)

import numpy as np
import re

with open('11.txt', 'r') as f:
    strings = re.findall(r'(>Rosalind_[0-9]+)\n(([ACGT]+(?:\n|$))+)', f.read())
str_len = len(strings[0][1].replace('\n', ''))
nucleotides = ['A', 'C', 'G', 'T']

# profile[r, c] counts how often nucleotides[r] occurs at position c
profile = np.zeros((4, str_len), dtype=int)
for s in strings:
    st = s[1].replace('\n', '')
    for counter, i in enumerate(st):
        profile[nucleotides.index(i), counter] += 1

# The consensus takes the most frequent nucleotide in each column
# (argmax returns the first maximum, so ties resolve in A, C, G, T order)
consensus = ''
for position in range(str_len):
    consensus = consensus + nucleotides[profile[:, position].argmax()]
print(consensus)

for row, name in enumerate(nucleotides):
    print(name + ': ' + ' '.join(str(v) for v in profile[row]))
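
For comparison, the profile can also be built without numpy by counting each column directly. A small sketch, with a made-up function name `profile_and_consensus`, that takes the already-parsed strings as a list; the data is the sample dataset from the problem page.

```python
def profile_and_consensus(strings):
    """Count A, C, G, T per column and pick the most frequent symbol in each."""
    columns = list(zip(*strings))              # transpose: one tuple per position
    profile = {n: [col.count(n) for col in columns] for n in 'ACGT'}
    # max() keeps the first maximum, so ties resolve in A, C, G, T order
    consensus = ''.join(
        max('ACGT', key=lambda n: profile[n][j]) for j in range(len(columns))
    )
    return profile, consensus

sample = ['ATCCAGCT', 'GGGCAACT', 'ATGGATCT', 'AAGCAACC',
          'TTGGAACT', 'ATGCCATT', 'ATGGCACT']
profile, consensus = profile_and_consensus(sample)
print(consensus)
for n in 'ACGT':
    print(n + ': ' + ' '.join(str(v) for v in profile[n]))
```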

Friday, November 18, 2016

Translating RNA into Protein (ROSALIND PROT)

Given: An RNA string s corresponding to a strand of mRNA (of length at most 10 kbp).
Return: The protein string encoded by s.


import sys

def codon_table():
    # RNA codon -> amino acid; the three stop codons are handled in main()
    nukaa = {}
    nukaa['UUU'] = 'F'
    nukaa['CUU'] = 'L'
    nukaa['AUU'] = 'I'
    nukaa['GUU'] = 'V'
    nukaa['UUC'] = 'F'
    nukaa['CUC'] = 'L'
    nukaa['AUC'] = 'I'
    nukaa['GUC'] = 'V'
    nukaa['UUA'] = 'L'
    nukaa['CUA'] = 'L'
    nukaa['AUA'] = 'I'
    nukaa['GUA'] = 'V'
    nukaa['UUG'] = 'L'
    nukaa['CUG'] = 'L'
    nukaa['AUG'] = 'M'
    nukaa['GUG'] = 'V'
    nukaa['UCU'] = 'S'
    nukaa['CCU'] = 'P'
    nukaa['ACU'] = 'T'
    nukaa['GCU'] = 'A'
    nukaa['UCC'] = 'S'
    nukaa['CCC'] = 'P'
    nukaa['ACC'] = 'T'
    nukaa['GCC'] = 'A'
    nukaa['UCA'] = 'S'
    nukaa['CCA'] = 'P'
    nukaa['ACA'] = 'T'
    nukaa['GCA'] = 'A'
    nukaa['UCG'] = 'S'
    nukaa['CCG'] = 'P'
    nukaa['ACG'] = 'T'
    nukaa['GCG'] = 'A'
    nukaa['UAU'] = 'Y'
    nukaa['CAU'] = 'H'
    nukaa['AAU'] = 'N'
    nukaa['GAU'] = 'D'
    nukaa['UAC'] = 'Y'
    nukaa['CAC'] = 'H'
    nukaa['AAC'] = 'N'
    nukaa['GAC'] = 'D'
    nukaa['CAA'] = 'Q'
    nukaa['AAA'] = 'K'
    nukaa['GAA'] = 'E'
    nukaa['CAG'] = 'Q'
    nukaa['AAG'] = 'K'
    nukaa['GAG'] = 'E'
    nukaa['UGU'] = 'C'
    nukaa['CGU'] = 'R'
    nukaa['AGU'] = 'S'
    nukaa['GGU'] = 'G'
    nukaa['UGC'] = 'C'
    nukaa['CGC'] = 'R'
    nukaa['AGC'] = 'S'
    nukaa['GGC'] = 'G'
    nukaa['CGA'] = 'R'
    nukaa['AGA'] = 'R'
    nukaa['GGA'] = 'G'
    nukaa['UGG'] = 'W'
    nukaa['CGG'] = 'R'
    nukaa['AGG'] = 'R'
    nukaa['GGG'] = 'G'
    return nukaa

def main():
    if len(sys.argv) > 1:
        rna_string = sys.argv[1]
        nukaa = codon_table()
        nucl_str_len = len(rna_string) // 3
        nucl_string = ''
        i = 0
        while i < nucl_str_len:
            codon = rna_string[3*i:3*(i+1)]
            # Stop codons are not in the table; translation ends at the first one
            if codon == 'UAA' or codon == 'UAG' or codon == 'UGA':
                break
            else:
                nucl_string += nukaa[codon]
            i += 1
        print(nucl_string)
    else:
        print('Enter RNA sequence.')

if __name__ == '__main__':
    main()
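
The 61 assignments can also be generated from a compact string. The sketch below relies on the conventional U, C, A, G ordering of the standard genetic code table, with `*` marking the stop codons; the names `CODON_TABLE` and `translate` are illustrative and not taken from the script above.

```python
BASES = 'UCAG'
# Amino acids in UCAG order: first base picks the row of 16,
# second base the group of 4, third base the position inside the group
AMINO = ('FFLLSSSSYY**CC*W'
         'LLLLPPPPHHQQRRRR'
         'IIIMTTTTNNKKSSRR'
         'VVVVAAAADDEEGGGG')
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(rna):
    """Translate codon by codon until a stop codon or the end of the string."""
    protein = []
    for pos in range(0, len(rna) - 2, 3):
        amino = CODON_TABLE[rna[pos:pos + 3]]
        if amino == '*':
            break
        protein.append(amino)
    return ''.join(protein)

print(translate('AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA'))
# MAMAPRTEINSTRING
```

Keeping the stop codons in the table as `*` removes the need for a separate three-way comparison in the loop.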

Finding a Motif in DNA (ROSALIND SUBS)



Given: Two DNA strings s and t (each of length at most 1 kbp).
Return: All locations of t as a substring of s.

s = 'TCACTCGATTGGAACCGA'
t = 'CACTCGACA'
occurrences = []
# Start at -1 so that the first search begins at index 0
start_position = -1
while True:
    # Advance by one symbol only, so overlapping matches are found too
    start_position = s.find(t, start_position+1)
    if start_position == -1:
        break
    # Rosalind expects 1-based positions
    occurrences.append(start_position+1)
res = ''
for i in occurrences:
    res += str(i) + ' '
print(res)
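
Wrapped as a function, the search is easier to check against known answers. A sketch under the assumption that positions are reported 1-based, as Rosalind expects; `find_all` is a hypothetical helper name, and the strings are the sample dataset from the problem page.

```python
def find_all(s, t):
    """Return the 1-based start positions of every (possibly overlapping)
    occurrence of t in s."""
    positions = []
    start = s.find(t)                  # first search starts at index 0
    while start != -1:
        positions.append(start + 1)    # convert to 1-based
        start = s.find(t, start + 1)   # step by one to allow overlaps
    return positions

print(find_all('GATATATGCATATACTT', 'ATAT'))  # [2, 4, 10]
```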

Wednesday, November 16, 2016

Mendel's First Law (ROSALIND IPRB)

Given: Three positive integers k, m, and n, representing a population containing k+m+n organisms: k individuals are homozygous dominant for a factor, m are heterozygous, and n are homozygous recessive.
Return: The probability that two randomly selected mating organisms will produce an individual possessing a dominant allele (and thus displaying the dominant phenotype). Assume that any two organisms can mate.

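import sys

def main():
    if len(sys.argv) > 3:
        k = int(sys.argv[1])
        m = int(sys.argv[2])
        n = int(sys.argv[3])
        T = k + m + n
        # Number of ordered pairs of distinct organisms
        Z = T*(T - 1)
        # Subtract the probability of a homozygous recessive child:
        # nn pairs always give one, mm pairs in 1/4 of cases,
        # mn pairs (counted in both orders) in 1/2 of cases
        print(round(1-(n*(n-1) + 0.25*m*(m-1) + m*n)/Z, 5))
    else:
        print('Enter k, m, n.')

if __name__ == '__main__':
    main()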
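
The formula in the script works through the complement: it is easier to count the crosses that yield a homozygous recessive child and subtract. The sketch below spells out each term and cross-checks the result by enumerating every ordered pair of parents; `dominant_prob` and `dominant_prob_bruteforce` are names invented here for illustration.

```python
from fractions import Fraction
from itertools import permutations

def dominant_prob(k, m, n):
    """Closed-form probability that the offspring shows the dominant phenotype."""
    total = k + m + n
    pairs = total * (total - 1)            # ordered pairs of distinct parents
    recessive = (n * (n - 1)               # recessive x recessive: always recessive
                 + 0.25 * m * (m - 1)      # hetero x hetero: recessive in 1/4 of cases
                 + m * n)                  # hetero x recessive, both orders: 2*m*n * 1/2
    return 1 - recessive / pairs

def dominant_prob_bruteforce(k, m, n):
    """Same value by enumeration; each entry is the chance that a parent
    passes on the recessive allele."""
    pop = [Fraction(0)] * k + [Fraction(1, 2)] * m + [Fraction(1)] * n
    pairs = list(permutations(pop, 2))
    recessive = sum(a * b for a, b in pairs)
    return 1 - recessive / len(pairs)

print(round(dominant_prob(2, 2, 2), 5))  # 0.78333
```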
