Bio/CS 251 Bioinformatics
EXAM 1
Dr. James
February 15, 2005 Spring 2006

1. (10 pts) A Bioinformatics student was asked to draw the chemical structure of an adenine- and thymine-containing dinucleotide derived from DNA. The student’s answer is shown below. The student made more than six major errors. One of them is circled, numbered, and explained.
a. Find four (4) other errors, circle them, number them, and briefly explain what is wrong and how it would look if correct.
#1 – The extra phosphate should not be present.
#2 – Uracil is not present in DNA; this base should be thymine (5-methyluracil).
#3 – Adenine base structure: N8 and C9 are reversed; position 8 should be a carbon (C8) and position 9 a nitrogen (N9).
#4 – The adenine-deoxyribose bond is incorrect: C1’ of deoxyribose should bond to N9 of adenine.
#5 – The terminal sugar is ribose, not deoxyribose; replace the 2’-OH with H.
#6 – C5’ of the 2nd deoxyribose is bonded to 5 atoms, but carbon can bond to only 4 atoms. The –OH group must be removed.
#7, 8 – Various double bonds are misplaced in both the adenine and uracil rings.
b. On the diagram, correctly number each ring atom of the sugar.
c. Draw an arrow along the diagram to show the strand polarity, indicating the 5’ and 3’ end of the strand.

2. (14 points) Consider the following peptide sequences, written from the NH2- to the COOH-end (refer to your handouts where needed):

I – Leu – Arg – Asp – Val – Tyr – His – Gln – Leu – Asn – Ser – Val – Met – Thr – Trp – Leu – Glu – Asn – Ile –
Polarity (answer to a): 1 H, 2 PC, 3 PC, 4 H, 5 PU, 6 PC, 7 PU, 8 H, 9 PU, 10 PU, 11 H, 12 H, 13 PU, 14 H, 15 H, 16 PC, 17 PU, 18 H

II – Ile – His – Val – Lys – Trp – Arg – Ala – Lys – Leu – Arg – Met – Lys – Glu – Pro – Gly – Arg – Leu – Lys
Side-chain character: 1 o, 2 +, 3 o, 4 +, 5 o, 6 +, 7 o, 8 +, 9 o, 10 +, 11 o, 12 +, 13 -, 14 o, 15 o, 16 +, 17 o, 18 +
(o = hydrophobic; + = charged, basic; - = charged, acidic)

a. Above each aa in the first peptide (I), record the polarity as Hydrophobic (H), Polar Uncharged (PU), or Polar Charged (PC).
b.
Choose the peptide most likely to form an amphipathic α-helix, and do the following:
Peptide I will form an amphipathic α-helix.
(1) Using the rules for formation of an α-helix, record the first five pairs of aa that would interact via H-bonding to form the α-helix, by filling in the blanks below (e.g., Tyrx – Trpy, etc.).
Leu1 – Tyr5    Arg2 – His6    Asp3 – Gln7    Val4 – Leu8    Tyr5 – Asn9
(In an α-helix, the backbone C=O of residue n H-bonds to the backbone N–H of residue n + 4.)
(2) Based on your understanding of α-helix structure, explain how you know that this α-helix would be amphipathic.
An amphipathic molecule is one that is charged or polar (hydrophilic) at one end, or along one side, and nonpolar (hydrophobic) at the other end, or on the opposite side. An amphipathic α-helix therefore has a hydrophobic face and a hydrophilic face. An α-helix contains ~3.6 aa per turn, so every 3rd or 4th amino acid occupies the same face, or side, of the helix. In an α-helix with a hydrophobic face, one would expect hydrophobic aa to occupy, e.g., positions 1, 4/5, 8, 11/12, 15, 18/19, 22, etc. (adding ~3.6 each time). In peptide I above, hydrophobic residues occur at positions 1, 4, 8, 11, 12, 14, 15 and 18, which is close to the expected pattern.
c. Which peptide, or part of a peptide, would most likely participate in forming a β-sheet? Explain, pointing out the basic features that would occur in this β-strand. (…get the pun?)
In peptide II, aa 1–12 would form an amphipathic β-strand. Hydrophobic (o) and basic (+) residues strictly alternate, and successive side chains in a β-strand point to opposite sides, so the hydrophobic aa extend above and the basic aa extend below the plane of the β-sheet.
d. Which peptide would most likely contain a bend, or U-turn? Explain by drawing the relevant tetrapeptide, and show how this bend would be stabilized by ionic interactions between two of the four R-groups.
Peptide II would contain a bend involving aa 13–16 (Glu-Pro-Gly-Arg). Proline-14 would create the bend, glycine-15 would cause the least interference with the bend since its R-group is simply H, and the U-turn would be stabilized by ionic bonding between the oppositely charged R-groups of Glu-13 (negative) and Arg-16 (positive).
3.
(6 pts) Examine the two short oligonucleotides below, and answer the following questions:
5’ T G C T A C G A A T C A G T C A C 3’
5’ T A T A A A G G G G G T T T A T A 3’
a. Which one of these molecules is more likely to form a stem-loop secondary structure (a hairpin)? Draw out this oligonucleotide’s secondary structure as it would appear in two dimensions.
The bottom oligonucleotide. Its 5’ TATAAA and 3’-terminal TTTATA are complementary and antiparallel, so they pair to form a 6-bp stem closed by a GGGGG loop:

         G G G
        G     G
        A  -  T
        A  -  T
        A  -  T
        T  -  A
        A  -  T
     5’ T  -  A 3’

b. Which one of these oligonucleotides would have a higher Tm (melting temperature), i.e., if each molecule were double-stranded, which one would anneal more stably with its complementary strand? Briefly explain why.
The top oligonucleotide. Its GC content is higher (8 G or C out of 17 bases) than that of the bottom oligonucleotide (5 G out of 17). G:C pairs are held by three hydrogen bonds versus two for A:T, so the top strand would form a more stable double-stranded molecule than the more AT-rich (and therefore less stable) bottom strand.

4. (40 points) Please refer to the Genetic Code handouts for this question:
Consider the bacterial DNA sequence below, which codes for part of a gene:

Coding (sense) strand:
5’ ATCCACGGACCGCAGGAGGTCCAAGTGACCGTATGTACGTTGCCCTAGAGAATACCATAAACGAA 3’
Template (antisense) strand:
3’ TAGGTGCCTGGCGTCCTCCAGGTTCACTGGCATACATGCAACGGGATCTCTTATGGTATTTGCTT 5’
mRNA:
5’ AUCCACGGACCGCAGGAGGUCCAAGUGACCGUAUGUACGUUGCCCUAGAGAAUACCAUAAACGAA 3’

5’ UTR: +1 to +32 (AUCCACGGACCGCAGGAGGUCCAAGUGACCGU)
RBS (Shine-Dalgarno sequence): AGGAGGU, +14 to +20
Coding region (codons): AUG UAC GUU GCC CUA GAG AAU ACC AUA AAC GAA
Translation:            M   Y   V   A   L   E   N   T   I   N   E

a. If you know that the bottom strand is the anti-coding (antisense) strand, in which direction must the gene be transcribed and translated? Briefly justify.
The anti-coding strand is also called the antisense, template, or mRNA-unlike strand. The top strand must therefore be the coding, or sense, strand.
A gene is always oriented 5’ to 3’ on the coding strand (left to right in this case). RNA polymerase reads the template strand 3’ to 5’ and synthesizes the transcript 5’ to 3’ (left to right), and the ribosome then translates the mRNA 5’ to 3’, from the N-terminus to the C-terminus of the peptide.
b. The first base of the sequence is the +1 startsite.
(1) Does the sequence shown above contain a promoter? Why or why not?
This sequence cannot contain a promoter, since it begins at the +1 startsite of transcription and the promoter lies upstream of (to the left of) the startsite. The promoter DNA sequence recruits RNA polymerase to the DNA and determines where RNA polymerase will begin transcription (at the +1 startsite).
(2) Write the first base in the RNA sequence here: A
The template base T on the bottom strand specifies an A as the 1st nucleotide of the transcript.
c. On the sequence above, do the following:
(1) Underline and label the sequence in the DNA that enables a ribosome to bind to the corresponding mRNA. What is the name given to this sequence?
AGGAGGT in the coding strand of the DNA. In the mRNA, the corresponding AGGAGGU serves as the Ribosome Binding Site (RBS), also known as the Shine-Dalgarno sequence.
(2) Find the beginning of the coding region, and underline each codon on the mRNA-like strand of the DNA.
See the diagram above.
(3) Underline and label the 5’ UTR for this gene.
The 5’ UTR begins at the +1 startsite and ends just before the START codon (ATG), i.e., positions +1 to +32.
(4) Beginning with the first codon in the DNA sequence, transcribe the coding region, codon by codon, aligning this transcribed sequence underneath the DNA sequence.
(5) Underneath your transcript, translate the sequence into one-letter amino acid code (e.g., C for Cysteine). How do you know that you have the correct peptide sequence? Hint: Happy Valentine’s Day!
The peptide reads M Y V A L E N T I N E ("MY VALENTINE"); a readable message confirms that the correct START codon and reading frame were used.
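Steps (4) and (5) can be checked with a short Python sketch. It is an illustration, not the course's method: it assumes the top strand above is the coding strand (typed in as a hypothetical constant, CODING_STRAND), takes AGGAGG as the Shine-Dalgarno core, and starts translation at the first ATG downstream of it, using the standard genetic code.

```python
# Sketch: transcribe the coding strand, find the START codon after the RBS,
# and translate codon by codon with the standard genetic code.

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Hypothetical constant: the coding (top) strand from question 4, +1 first.
CODING_STRAND = "ATCCACGGACCGCAGGAGGTCCAAGTGACCGTATGTACGTTGCCCTAGAGAATACCATAAACGAA"


def transcribe(coding: str) -> str:
    """The mRNA matches the coding strand, with U in place of T."""
    return coding.replace("T", "U")


def find_start(coding: str, rbs: str = "AGGAGG") -> int:
    """Return the 0-based index of the first ATG downstream of the RBS."""
    rbs_pos = coding.find(rbs)
    if rbs_pos == -1:
        raise ValueError("RBS not found")
    start = coding.find("ATG", rbs_pos + len(rbs))
    if start == -1:
        raise ValueError("no ATG downstream of the RBS")
    return start


def translate(coding: str, start: int) -> str:
    """Translate complete codons from `start` until a stop codon or the end."""
    protein = []
    for i in range(start, len(coding) - 2, 3):
        aa = CODON_TABLE[coding[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


start = find_start(CODING_STRAND)
mrna = transcribe(CODING_STRAND)
print("mRNA:   ", mrna)
print("5' UTR:  +1 to +%d" % start)  # 0-based start index = length of the UTR
print("Codons: ", " ".join(mrna[i:i + 3] for i in range(start, len(mrna) - 2, 3)))
print("Peptide:", translate(CODING_STRAND, start))
```

Searching for ATG only after the RBS mirrors how the key locates the START codon; a real bacterial gene finder would also weigh GTG/TTG starts and RBS spacing, which this sketch ignores.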
(6) Consider just the first three amino acids. For each aa, write all of the possible codon sequences here (leave blanks empty, as appropriate):
aa1 (Met): AUG
aa2 (Tyr): UAC, UAU
aa3 (Val): GUU, GUC, GUA, GUG
(a) Determine the minimal # of tRNAs that could be used to translate these three amino acids, as follows: In the space below, draw the anticodon-codon pairing that could occur for each set of codons that can be served by a single tRNA. In other words, show each unique tRNA anticodon only once; do not repeat the same tRNA.
Met: anticodon 3’ U A C 5’ pairs with codon 5’ A U G 3’
Tyr: anticodon 3’ A U G 5’ pairs with codons 5’ U A C 3’ and 5’ U A U 3’ (wobble G pairs with C or U)
Val: anticodon 3’ C A I 5’ pairs with codons 5’ G U A 3’, 5’ G U C 3’ and 5’ G U U 3’ (wobble I pairs with A, C or U)
Val: anticodon 3’ C A C 5’ pairs with codon 5’ G U G 3’
Minimum: 4 tRNAs (1 for Met, 1 for Tyr, 2 for Val).
(b) Determine what fraction of mutations at the first, second, and third codon positions will be synonymous.
Each position of AUG, UAC and GUU can change to 3 other bases, giving 9 possible mutations per position.
1st position synonymy: 0/9 = 0%
2nd position synonymy: 0/9 = 0%
3rd position synonymy: 4/9 = 44.4% (UAC → UAU; GUU → GUC, GUA or GUG)
(c) List the transversion mutations in the three codons above.
GUU ↔ GUA, GUC ↔ GUA, GUU ↔ GUG, GUC ↔ GUG
(d) Does a transversion mutation always cause a non-synonymous substitution? Explain.
No. In 4-fold degenerate codons, transversions at the 3rd position are always synonymous. Transversions at the 1st and 2nd positions are almost always non-synonymous (an exception is Arg, e.g., AGA → CGA at the 1st position), and in 2-fold degenerate codons 3rd-position transversions are always non-synonymous. In the 3-fold degenerate Ile codons, transversions from AUA → AUC or AUU are synonymous, and vice versa, but transversions from AUC or AUU → AUG are non-synonymous.

Multiple choice: (2 points apiece, #s 5–7)
5. If Inosine were a legitimate base in DNA
a. the rules for complementary base pairing would be corrupted.
b. the integrity of DNA would be compromised by frequent deaminations of Adenine to Inosine.
c. I:A base pairing would distort the diameter of the double helix.
d. All of the above.
e. None of the above.
6. The chain reaction leading to Severe Combined Immunodeficiency Syndrome (SCID) occurs because
a.
Ribonucleotide reductase (RNR) converts too much ATP to dATP.
b. Excess guanine and adenine cannot be reused to make guanosine and adenosine.
c. Adenosine deaminase fails to convert excess deoxyadenosine to deoxyinosine.
d. Excessive amounts of DNA accumulate in the nucleus of white blood cells.
7. In which one of the following ways does the initiation of transcription differ between prokaryotes and eukaryotes?
a. In prokaryotes only, an AT-rich region near the transcription startsite must be melted apart.
b. In eukaryotes only, the promoter is transcribed along with the rest of the gene.
c. In prokaryotes only, many accessory proteins affect transcription by binding to additional promoter, enhancer, and silencer elements in the DNA.
d. In eukaryotes only, RNA Polymerase cannot directly recognize promoter DNA sequences, and instead relies upon the basal transcription complex.
8. (1 pt per answer) Circle all of the correct answers. Mutations may eventually result from…
a. Copying mistakes that occur during DNA replication.
b. DNA damage caused by, e.g., sunlight.
c. Deamination events, e.g., the conversion of cytosine to uracil.
d. Loss of DNA surveillance proteins such as the product of the mutS/MSH2 gene.

For this question, open a Word document, Exam1_PAH, and save it to your H drive. As you answer the questions, type your answers in the document, clearly identifying what you are answering and leaving space between answers for clarity. We suggest that you begin this question first and work on other questions while you are waiting for replies from the web sites. When you have finished this question, save the document to your H drive and also e-mail it to: sjames@gettysburg.edu
9. (20 pts) Use the following web sites to locate and analyze the PAH gene from one bacterial species and from humans: http://www.ncbi.nlm.nih.gov/
a.
Use NCBI to find the PAH gene from a bacterial species such as Xanthomonas or Bacillus. Provide in your answer (1) the full name of the organism (genus and species), (2) the full name of the protein, and (3) the protein sequence in FASTA format.
I searched “Gene” for “PAH Bacillus cereus”.
(1) Full name of organism: Bacillus cereus E33L
(2) Full name of the protein: phenylalanine 4-monooxygenase (phenylalanine-4-hydroxylase)
Gene record:
pah phenylalanine 4-monooxygenase (phenylalanine-4-hydroxylase) [Bacillus cereus E33L]
GeneID: 3026076
Locus tag: BCZK4102
updated 03-Dec-2005
Summary
Gene type: protein coding
Gene name: pah
RefSeq status: Provisional
Organism: Bacillus cereus E33L (strain: E33L)
Lineage: Bacteria; Firmicutes; Bacillales; Bacillaceae; Bacillus; Bacillus cereus group
(3) Link to protein sequence: AAU16167
Protein sequence in FASTA format:
>gi|51974617|gb|AAU16167.1| phenylalanine 4-monooxygenase (phenylalanine-4-hydroxylase) [Bacillus cereus E33L]
MTKKTEIPSHLKPFVSTQHYDQYTPVNHAVWRYIMRQNHSFLKDVAHPAYVNGLQSSGINIEAIPKVEEM
NECLASSGWGAVTIDGLIPGVAFFDFQGHGLLPIATDIRKVENIEYTPAPDIVHEAAGHAPILLDPTYAK
YVKRFGQIGAKAFSTKEEHDAFEAVRTLTIVKESPTSTPDEVTAAENNVLEKQKLVSGLSEAEQISRLFW
WTVEYGLIGNIDAPKIYGAGLLSSVGESKHCLTDAVEKVPFSIETCTSTTYDVTKMQPQLFVCESFEELT
EALEKFSETMAFKTGGKEGLEKAIRSENHATAELNSGLQITGTFTETIENDAGELIYMRTSSPTALAIHN
KQLANHSTSVHSDGFGTPIGLLTENIALENCTDEQLQSLGITIGNKAAFTFASGIHVKGTVTDIVKNDKK
IALISFINCTVTYNDRVLFDASWGSFDMAVGSTITSVFPGAADAAAFFPMDEEIQEIPAPLVLNELERMY
QTVRDIRNEGILHDAHIEQLVAIQEVLNKFYTKEWLLRLEILELLLEHNKGHETSAALLQQLSTFTTDEA
VTRLINNGLTLLPVKDVKNDATIN
b. Use NCBI to find the human counterpart(s) of the bacterial PAH gene. Into your document, paste the human-bacterial alignment for the PAH gene.
Go to the NCBI homepage, choose BLAST, then do a protein-protein BLAST search (BLASTP) with the Bacillus cereus PAH sequence as the query.
Alignment between Bacillus cereus PAH and human PAH (the human gene was …..th in the list of Descriptions):

>gi|18765885|gb|AAL78816.1| phenylalanine hydroxylase [Homo sapiens]
Length=452

Score = 97.1 bits (240), Expect = 2e-18
Identities = 64/234 (27%), Positives = 105/234 (44%), Gaps = 46/234 (19%)

Query  58   GINIEAIPKVEEMNECLAS-SGWGAVTIDGLIPGVAFFDFQGHGLLPIATDIRKVENIEY  116
            G + + IP++E++++ L + +G+    + GL+    F       +      IR      Y
Sbjct  218  GFHEDNIPQLEDVSQFLQTCTGFRLRPVAGLLSSRDFLGGLAFRVFHCTQYIRHGSKPMY  277

Query  117  TPAPDIVHEAAGHAPILLDPTYAKYVKRFGQIGAKAFSTKEEHDAFEAVRTLTIVKESPT  176
            TP PDI HE  GH P+  D ++A++ +  G     A
Sbjct  278  TPQPDICHELLGHVPLFSDRSFAQFSQEIGLASLGA------------------------  313

Query  177  STPDEVTAAENNVLEKQKLVSGLSEAEQISRLFWWTVEYGLIGNIDAPKIYGAGLLSSVG  236
              PDE                     E+++ ++W+TVE+GL    D+ K YGAGLLSS G
Sbjct  314  --PDEYI-------------------EKLATIYWFTVEFGLCKQGDSIKAYGAGLLSSFG  352

Query  237  ESKHCLTDAVEKVPFSIETCTSTTYDVTKMQPQLFVCESFEELTEALEKFSETM  290
            E ++CL++  + +P  +E      Y VT+ QP  +V ESF +  E +  F+ T+
Sbjct  353  ELQYCLSEKPKLLPLELEKTAIQNYTVTEFQPLYYVAESFNDAKEKVRNFAATI  406

c. Answer the following questions in your document. Clearly indicate what question you are answering.
(1) What type of algorithm was used to make the bacterial-human alignment?
BLASTP, a local (heuristic) alignment search of a protein query against a protein database.
(2) What proportion of the alignment is gapped?
19% (46/234)
(3) How similar are the two proteins? What two features of the alignment are used to determine the overall similarity of the two proteins?
44% similar (105/234). The similarity value is calculated by summing the identities (64/234 = 27%) and the conservative amino acid substitutions (41/234, marked "+" in the alignment).
d.
Find the Entrez Gene page for this human gene from the NCBI site and use it to answer the following questions:
Here is the Entrez Gene summary:
PAH phenylalanine hydroxylase [Homo sapiens]
GeneID: 5053
Primary source: HGNC:8582
updated 03-Feb-2006
Summary
Official Symbol: PAH and Name: phenylalanine hydroxylase provided by HUGO Gene Nomenclature Committee
See related: HPRD:08943, MIM:261600
Gene type: protein coding
Gene name: PAH
Gene description: phenylalanine hydroxylase
RefSeq status: Reviewed
Organism: Homo sapiens
Lineage: Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini; Hominidae; Homo
Gene aliases: PKU; PKU1
Summary: PAH encodes the enzyme phenylalanine hydroxylase that is the rate-limiting step in phenylalanine catabolism. Deficiency of this enzyme activity results in the autosomal recessive disorder phenylketonuria.
(1) What is the length of the gene?
79,278 bp (from the link to NC_000012).
(2) What is the length of the mRNA transcript? Calculate the proportion of the gene that is codegenic (express this as a %).
mRNA sequence NM_000277: 2680 bp.
The proportion of the gene that is retained in the mRNA (exons) is approximately 2680/79278 = 0.0338 = ~3.4%. Strictly, the mRNA also contains the 5’ and 3’ UTRs; counting only the protein-coding parts of the exons (1,359 bp in the Gene Table below), about 1,359/79,278 = ~1.7% of the gene codes for protein.
(3) How many exons and introns does the human gene contain?
Starting from the Entrez Gene page above, select “Gene Table” in the “Display” box.
You will see that the human PAH gene contains 13 exons and 12 introns, as follows:
Exon information:
NM_000277 length: 2681 bp, number of exons: 13
NP_000268 length: 453 aa, number of exons: 13

#   Exon coords     Length   Coding coords   Length   Intron coords   Length
1   1 - 533         533 bp   474 - 533       60 bp    534 - 4705      4172 bp
2   4706 - 4813     108 bp   4706 - 4813     108 bp   4814 - 22685    17872 bp
3   22686 - 22869   184 bp   22686 - 22869   184 bp   22870 - 40053   17184 bp
4   40054 - 40142   89 bp    40054 - 40142   89 bp    40143 - 50940   10798 bp
5   50941 - 51008   68 bp    50941 - 51008   68 bp    51009 - 62271   11263 bp
6   62272 - 62468   197 bp   62272 - 62468   197 bp   62469 - 64653   2185 bp
7   64654 - 64789   136 bp   64654 - 64789   136 bp   64790 - 65847   1058 bp
8   65848 - 65917   70 bp    65848 - 65917   70 bp    65918 - 70652   4735 bp
9   70653 - 70709   57 bp    70653 - 70709   57 bp    70710 - 73172   2463 bp
10  73173 - 73268   96 bp    73173 - 73268   96 bp    73269 - 73824   556 bp
11  73825 - 73958   134 bp   73825 - 73958   134 bp   73959 - 77088   3130 bp
12  77089 - 77204   116 bp   77089 - 77204   116 bp   77205 - 78385   1181 bp
13  78386 - 79278   893 bp   78386 - 78429   44 bp

(4) Where did you find this information?
From the Gene Table, as described above.
(5) On what chromosome does this gene reside? Specify the subchromosomal location, e.g., 15q7.2.
Chromosome: 12; Location: 12q22-q24.2
e. PAH gene: Is the PAH gene associated with a disease or syndrome? Find the OMIM file for this gene and list one disease associated with defects in this gene.
+261600 PHENYLKETONURIA
Alternative titles; symbols: PKU; PHENYLALANINE HYDROXYLASE DEFICIENCY; PAH DEFICIENCY; OLIGOPHRENIA PHENYLPYRUVICA; FOLLING DISEASE
PHENYLALANINE HYDROXYLASE, INCLUDED; PAH, INCLUDED
PKU1, INCLUDED
HYPERPHENYLALANINEMIA, MILD, INCLUDED; HPA, INCLUDED
PHENYLALANINEMIA, INCLUDED
Gene map locus 12q24.1
DESCRIPTION
Phenylketonuria is an inborn error of metabolism resulting from a deficiency of phenylalanine hydroxylase (EC 1.14.16.1) and characterized by mental retardation. There are other causes of hyperphenylalaninemia; Scriver et al.
(1994) reviewed the hyperphenylalaninemias of man and mouse.
CLINICAL FEATURES
Early diagnosis of phenylketonuria (PKU), a cause of mental retardation, is important because it is treatable by dietary means. Features other than mental retardation in untreated patients include a 'mousy' odor; light pigmentation; peculiarities of gait, stance, and sitting posture; eczema; and epilepsy (Paine, 1957). Kawashima et al. (1988) suggested that cataracts and brain calcification may be frequently overlooked manifestations of classic untreated PKU. Brain calcification has been reported in dihydropteridine reductase (DHPR) deficiency (261630). Pitt and O'Day (1991) found only 3 persons with cataracts among 46 adults, aged 28 to 71 years, with untreated PKU. They concluded that PKU is not a cause of cataracts. Levy et al. (1970) screened the serum of 280,919 'normal' teenagers and adults whose blood had been submitted for syphilis testing. Only 3 adults with the biochemical findings of PKU were found. Each was mentally subnormal. Normal mentality is very rare among patients with phenylketonuria who have not received dietary therapy. The basic defect in PKU is phenylalanine hydroxylase deficiency. Evidence of heterogeneity in phenylketonuria was presented by Auerbach et al. (1967) and by Woolf et al. (1968).