Problem No 1
Determination of GC Content

Among the four nucleotides {A, T, C, G}, the proportion of C and G in a DNA sequence carries important signals. This proportion is measured as “GC-content %” using the following formula:

GC-content % = ((n(G) + n(C)) / len(DNA)) * 100%

where
n(G) = number of G in the sequence
n(C) = number of C in the sequence
len(DNA) = length of the DNA sequence in base pairs (bp)

Write a Python program that performs as described below.

Input
1. DNA sequence as a string

Output
1. Length of the sequence
2. GC-content %

Example
Input
ATCG
Output
Length of the sequence = 4 bp
GC-content % = 50%

Problem No 2
Complement DNA Strand

DNA forms a double helix structure made of two strands. When we work with a DNA sequence we usually talk about a single sequence (a single strand), but in the chromosome DNA remains in double-stranded form. The two strands are called complements of each other. One is named 5’-3’ (forward strand) and the other 3’-5’ (reverse strand). When the strand type (or direction) is not explicitly mentioned, the DNA sequence is assumed to be the 5’-3’ or forward strand.

5’---ACCGTA---3’
     ||||||
3’---TGGCAT---5’

In a complement DNA strand, each base of the original DNA sequence is replaced according to the following interchanging rule:
A is replaced by T and vice versa
C is replaced by G and vice versa

This is because, in the double helix structure, A on one strand is connected to T on the other strand by hydrogen bonds, and the same holds for C and G.

Write a Python program that performs as described below.

Input
1. DNA sequence as a string

Output
1. Complement of the input DNA sequence

Example
Input
ACCGTA
Output
Complement DNA Sequence = TGGCAT

Problem No 3
Reverse Complement of a DNA Sequence

This problem can be thought of as an extension of Problem No 2 (read Problem No 2 first). In bioinformatics analysis, the concept of the reverse complement of a DNA sequence is encountered very often.
If the complement of a DNA sequence is reversed, the result is called the reverse complement of the original DNA sequence.

5’---ACCGTA---3’
     ||||||
3’---TGGCAT---5’

Here the complement of (5’---ACCGTA---3’) is (3’---TGGCAT---5’), and the reverse of (3’---TGGCAT---5’) is TACGGT, so the reverse complement of ACCGTA is TACGGT.

Write a Python program that performs as described below.

Input
1. DNA sequence as a string

Output
1. Reverse complement of the input DNA sequence

Example
Input
ACCGTA
Output
Reverse Complement DNA Sequence = TACGGT

Problem No 4
Codon List from a DNA Sequence

Triplets of nucleotides (for example ATT, TCG, CCC) are called codons. Through the processes of transcription and translation, each codon of a DNA sequence is responsible for producing one amino acid, and the resulting chain of amino acids builds a protein. 64 (4x4x4) different codons are possible.

Consider the DNA sequence ATTTCGAGGT. If we parse codons from left to right, the codons are ATT, TCG, AGG (ignore the right-most remaining part of length <3 bp, in this case T).

Write a function/Python program that returns the list of codons for a DNA sequence. The program should return/print the codons as a Python “list” data structure.

Input
1. DNA sequence as a string

Output
1. List of codons

Example
Input
ATTTCGAGGT
Output
Codon-List = ['ATT', 'TCG', 'AGG']

Problem No 5
Translate a DNA Sequence

Each codon represents an amino acid (skim through Problem No 4). The standard codon-to-amino-acid mapping table is called the “Standard Genetic Code Table” or “Codon-Table”. It is built for codons derived from RNA (details will be discussed separately, beyond this problem), so you will find U instead of T. For simplicity, in this specific problem you should use the customized (for DNA) genetic code table.
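The operations described in Problems No 1 to No 4 can be sketched as a few short Python functions. This is a minimal sketch rather than a reference solution: the function names (`gc_content`, `complement`, `reverse_complement`, `codon_list`) are hypothetical, and the code assumes the input contains only the uppercase letters A, T, C and G.

```python
def gc_content(dna):
    """Problem No 1: GC-content as a percentage of the sequence length."""
    if not dna:
        return 0.0  # assumption: an empty sequence is reported as 0%
    return (dna.count("G") + dna.count("C")) / len(dna) * 100


# Base-pairing rule from Problem No 2: A <-> T and C <-> G.
COMPLEMENT_TABLE = str.maketrans("ATCG", "TAGC")


def complement(dna):
    """Problem No 2: replace every base by its pairing partner."""
    return dna.translate(COMPLEMENT_TABLE)


def reverse_complement(dna):
    """Problem No 3: take the complement, then reverse it."""
    return complement(dna)[::-1]


def codon_list(dna):
    """Problem No 4: split into triplets from the left, dropping a short tail."""
    usable = len(dna) - len(dna) % 3  # length of the part made of whole codons
    return [dna[i:i + 3] for i in range(0, usable, 3)]


if __name__ == "__main__":
    print(f"Length of the sequence = {len('ATCG')} bp")
    print(f"GC-content % = {gc_content('ATCG'):g}%")
    print(f"Complement DNA Sequence = {complement('ACCGTA')}")
    print(f"Reverse Complement DNA Sequence = {reverse_complement('ACCGTA')}")
    print(f"Codon-List = {codon_list('ATTTCGAGGT')}")
```

Using `str.maketrans` swaps all four bases in a single pass, which avoids the common mistake of chaining `replace` calls (replacing A with T and then T with A would undo the first step).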
Standard Genetic Code
(rows: first base; columns: second base; right-hand column: third base)

          U             C             A             G
U    UUU Phe (F)   UCU Ser (S)   UAU Tyr (Y)   UGU Cys (C)   U
     UUC Phe (F)   UCC Ser (S)   UAC Tyr (Y)   UGC Cys (C)   C
     UUA Leu (L)   UCA Ser (S)   UAA Stop      UGA Stop      A
     UUG Leu (L)   UCG Ser (S)   UAG Stop      UGG Trp (W)   G
C    CUU Leu (L)   CCU Pro (P)   CAU His (H)   CGU Arg (R)   U
     CUC Leu (L)   CCC Pro (P)   CAC His (H)   CGC Arg (R)   C
     CUA Leu (L)   CCA Pro (P)   CAA Gln (Q)   CGA Arg (R)   A
     CUG Leu (L)   CCG Pro (P)   CAG Gln (Q)   CGG Arg (R)   G
A    AUU Ile (I)   ACU Thr (T)   AAU Asn (N)   AGU Ser (S)   U
     AUC Ile (I)   ACC Thr (T)   AAC Asn (N)   AGC Ser (S)   C
     AUA Ile (I)   ACA Thr (T)   AAA Lys (K)   AGA Arg (R)   A
     AUG Met (M)   ACG Thr (T)   AAG Lys (K)   AGG Arg (R)   G
G    GUU Val (V)   GCU Ala (A)   GAU Asp (D)   GGU Gly (G)   U
     GUC Val (V)   GCC Ala (A)   GAC Asp (D)   GGC Gly (G)   C
     GUA Val (V)   GCA Ala (A)   GAA Glu (E)   GGA Gly (G)   A
     GUG Val (V)   GCG Ala (A)   GAG Glu (E)   GGG Gly (G)   G

Customized (for DNA) Genetic Code
(* marks a stop codon)

ttt: F   ttc: F   tta: L   ttg: L
tct: S   tcc: S   tca: S   tcg: S
tat: Y   tac: Y   taa: *   tag: *
ctt: L   ctc: L   cta: L   ctg: L
cct: P   ccc: P   cca: P   ccg: P
cat: H   cac: H   caa: Q   cag: Q
tgt: C   tgc: C   tga: *   tgg: W
cgt: R   cgc: R   cga: R   cgg: R
att: I   act: T   aat: N   agt: S
atc: I   acc: T   aac: N   agc: S
ata: I   aca: T   aaa: K   aga: R
atg: M   acg: T   aag: K   agg: R
gtt: V   gtc: V   gta: V   gtg: V
gct: A   gcc: A   gca: A   gcg: A
gat: D   gac: D   gaa: E   gag: E
ggt: G   ggc: G   gga: G   ggg: G

There are 20 different amino acids. The detailed table is given below.
20 Amino Acids and Their Codes

No.   1-Letter Code   3-Letter Code   Name
1     A               Ala             Alanine
2     R               Arg             Arginine
3     N               Asn             Asparagine
4     D               Asp             Aspartic acid
5     C               Cys             Cysteine
6     Q               Gln             Glutamine
7     E               Glu             Glutamic acid
8     G               Gly             Glycine
9     H               His             Histidine
10    I               Ile             Isoleucine
11    L               Leu             Leucine
12    K               Lys             Lysine
13    M               Met             Methionine
14    F               Phe             Phenylalanine
15    P               Pro             Proline
16    S               Ser             Serine
17    T               Thr             Threonine
18    W               Trp             Tryptophan
19    Y               Tyr             Tyrosine
20    V               Val             Valine

Write a function/program that takes a DNA sequence and returns/prints the translated protein sequence (using the customized codon table and representing amino acids by their 1-letter codes). Ignore the right-most incomplete codon of length <3 bp, as explained in Problem No 4.

Input
1. DNA sequence as a string

Output
1. Amino acid sequence of the protein

Example
Input
TTTCCTAATC
Output
Protein Sequence = FPN
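One possible approach to the translation task is sketched below. It is a hedged illustration, not the expected solution: the names `BASES`, `AMINO_ACIDS`, `CODON_TABLE` and `translate` are hypothetical, and instead of typing all 64 entries the table is generated from a 64-letter string in the conventional T, C, A, G codon order. The sketch assumes TGA is a stop codon and TGG codes for W, as in the standard genetic code, and it writes stop codons as `*` and keeps translating, which is an assumption since the problem statement does not say what to do at a stop codon.

```python
from itertools import product

# Codons enumerated in T, C, A, G order (first base changes slowest),
# paired position by position with their 1-letter amino acid codes.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string codon by codon, ignoring an incomplete tail."""
    dna = dna.upper()  # the customized table is case-insensitive here
    usable = len(dna) - len(dna) % 3  # drop a trailing part shorter than 3 bp
    # A codon containing a letter other than A, T, C, G raises KeyError.
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    print(f"Protein Sequence = {translate('TTTCCTAATC')}")
```

A plain dictionary written out entry by entry from the customized table would work equally well and may be easier to check by eye; the generated form is only shorter.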