Bio/CS 251 Bioinformatics
Homework 1
1/23/06
20 points

Due Date: Thursday, 1/26/06 at 5 pm to Dr. James, 255 SC

Assignment: Complete Problems 1, 2, and 3, and either #4 or #5 (but not both!).

1. A double-stranded DNA molecule is 28% deoxyguanosine (G). What is the complete base composition of this molecule?

2. Draw the structures of the nucleoside monophosphates containing Inosine, Guanosine, and Adenosine, then do the following:
   a. Identify the base hypoxanthine by drawing a circle around it.
   b. Which two bases are the most different in chemical structure? How many chemical changes would need to occur to change one base into the other, or vice versa? Indicate whether these changes are aminations, deaminations, or oxidations.
   c. From your answer in b., explain why evolution (natural selection) chose these two particular nucleotides, rather than the third member of this trio, as the two bases in DNA.

3. The total number of base pairs of DNA in one human cell is 6 x 10^9. From the values presented in Monday’s PowerPoint lecture, do the following:
   a. Calculate the total length of the DNA from a human cell, expressed in meters. Show your calculations.
   b. Your body is composed of approximately one quadrillion cells (1 x 10^15 cells). If you laid all of your DNA molecules end to end, approximately how many times would this DNA extend to the moon and back?

Complete one of the following two exercises, either #4 or #5.

4. Bioinformatics Exercise: Do the following:
   a. Go to the National Center for Biotechnology Information homepage: http://www.ncbi.nlm.nih.gov/
   b. Under “Search”, choose “Gene” from the scroll-down list. In the search box, type “HPRT”, then hit the “Go” button. (This is the abbreviation for Hypoxanthine-Guanine Phosphoribosyl Transferase, the gene whose job it is to salvage excess hypoxanthine and guanine and recycle them back into usable nucleoside monophosphates.)
   c. How many GenBank entries exist for this gene?
   d. Examine the first three pages of GenBank entries for this gene, and take note of the organisms from which this gene has been identified. The names of these organisms are italicized, within brackets, at the end of the second line of each gene entry.
      (1) List at least 10 species, and no more than 20, in which this gene has been reported. After listing the genus and species names for each organism, e.g., Homo sapiens, indicate the type of organism, such as human, dog, chicken, plant, fungus, bacteria, etc.
      (2) How diverse is the range of species in which this gene occurs (e.g., primates-only, mammals-only, vertebrates-only, etc.)? (If you need help with the biology here, please see Dr. James.)
      (3) From your answer in (2) above, would you say that the HPRT gene evolved recently, or did it evolve in the distant past? What information led you to your conclusion?
   e. From the list of HPRT genes, locate the human version of this gene and click on the blue HPRT link.
      (1) This is the Entrez Gene page containing the single-page annotation for the human HPRT gene. Print out this page, and submit it with this assignment.
      (2) Scroll towards the bottom of this page of annotation, and click on the genomic sequence link titled “M26434”. Scroll towards the bottom until you find the DNA sequence for the human HPRT gene. How long is this gene, in nucleotides?
   f. Go back to the previous page, and click on the link titled mRNA “M31642”. How long is the messenger RNA (mRNA) for the same gene?
   g. Go back to the previous page, and click on the link titled Protein “AAA52690”. How long is the protein encoded by the HPRT gene and mRNA? For the moment, take note of the length differences between the gene, the mRNA, and the protein. The reasons for these differences will be explained a lecture or two from now.
   h. Finally, determine the number of sites in the gene at which allelic variants, or mutations, are known to occur. These mutations are often the result of a single base substitution, also known as a Single Nucleotide Polymorphism, or SNP. Mutations can also be caused by deletion or insertion of one or more bases. To determine the number of sites within the DNA sequence of this gene at which single-base mutations have been discovered, do the following: From the “Display” box on the Entrez Gene page, select “SNP links”, and proceed to the new page.
      (1) How many different allelic variants are listed in the SNP database for this gene?
      (2) Scroll down and click on SNPs #18 and #19. From the “Allele” column on the “Single Nucleotide Polymorphism” page for each of these variants, determine the type of variation in each case. List and briefly describe each type of mutation. For example, is the mutation caused by substitution of one base for another (which ones?), or by the insertion or deletion of a base (which one)?

5. Bioinformatics Exercise: Do the following:
   a. Go to the National Center for Biotechnology Information homepage: http://www.ncbi.nlm.nih.gov/
   b. Under “Search”, choose “Gene” from the scroll-down list. In the search box, type “Adenosine deaminase”, then hit the “Go” button. This will provide a list of ADA and ADA-related genes. One of these genes is responsible for Adenosine Deaminase Deficiency, aka Severe Combined Immunodeficiency Syndrome (SCID), aka “Bubble Boy Disease”.
   c. How many GenBank entries exist for this gene?
   d. Examine the first three pages of GenBank entries for this gene, and take note of the organisms from which this gene has been identified. The names of these organisms are italicized, within brackets, at the end of the second line of each gene entry.
      (1) List at least 10 species, and no more than 20, in which this gene has been reported. After listing the genus and species names for each organism, e.g., Homo sapiens, indicate the type of organism, such as human, dog, chicken, plant, fungus, bacteria, etc.
      (2) How diverse is the range of species in which this gene occurs (e.g., primates-only, mammals-only, vertebrates-only, etc.)? (If you need help with the biology here, please see Dr. James.)
      (3) From your answer in (2) above, would you say that the ADA gene evolved recently, or did it evolve in the distant past? What information led you to your conclusion?
   e. From the list of ADA and ADA-related genes, scroll down to #19, “ADA”, and open this link.
      (1) This is the Entrez Gene page containing the single-page annotation for one version of the human gene. Print out this page, and submit it with this assignment.
      (2) Scroll towards the bottom of this page of annotation, and click on the genomic sequence link titled “M13792”. Scroll towards the bottom until you find the DNA sequence for the human ADA gene. How long is this gene, in nucleotides?
   f. Go back to the previous page, and click on the link titled mRNA “X02994”. How long is the messenger RNA (mRNA) for the same gene?
   g. Go back to the previous page, and click on the link titled Protein “AAA78791”. How long is the protein encoded by the ADA gene and mRNA? For the moment, take note of the length differences between the gene, the mRNA, and the protein. The reasons for these differences will be explained a lecture or two from now.
   h. Finally, determine the number of sites in the gene at which allelic variants, or mutations, are known to occur. These mutations are often the result of a single base substitution, also known as a Single Nucleotide Polymorphism, or SNP. Mutations can also be caused by deletion or insertion of one or more bases. To determine the number of sites within the DNA sequence of this gene at which single-base mutations have been discovered, do the following: From the “Display” box on the Entrez Gene page, select “SNP links”, and proceed to the new page.
      (1) How many different allelic variants are listed in the SNP database for this gene?
      (2) Scroll down and study SNPs #7 and #8 by clicking on each SNP. From the “Allele” column on the “Single Nucleotide Polymorphism” page for each of these variants, determine the type of variation in each case. List and briefly describe each type of mutation. For example, is the mutation caused by substitution of one base for another (which ones?), or by the insertion or deletion of a base (which one)?
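
Optional check for Problems 1 and 3: the arithmetic can be verified with a short script once the hand calculation is done. The sketch below is only one way to set it up. The function names are hypothetical, and the two constants are assumptions that do not come from this handout: a rise of 0.34 nm per base pair (the usual figure for B-form DNA) and a mean Earth-moon distance of 384,400 km. If Monday's lecture gave different values, use those instead.

```python
# Sketch for checking the arithmetic in Problems 1 and 3.
# ASSUMPTIONS (not from the handout): 0.34 nm rise per base pair (B-form DNA)
# and a mean Earth-moon distance of 384,400 km. Substitute the lecture values.

NM_PER_BASE_PAIR = 0.34      # assumed rise per base pair, in nanometers
EARTH_MOON_KM = 384_400      # assumed mean Earth-moon distance, in kilometers


def base_composition(g_percent):
    """Return the percent of each base in double-stranded DNA, given percent G."""
    if not 0 <= g_percent <= 50:
        raise ValueError("G must be between 0 and 50 percent in duplex DNA")
    # G pairs with C, so C equals G; A and T split the remainder equally.
    a_percent = (100 - 2 * g_percent) / 2
    return {"G": g_percent, "C": g_percent, "A": a_percent, "T": a_percent}


def dna_length_m(base_pairs, nm_per_bp=NM_PER_BASE_PAIR):
    """Return the end-to-end length in meters of DNA with the given base pairs."""
    return base_pairs * nm_per_bp * 1e-9  # 1 nm = 1e-9 m


def moon_round_trips(total_length_m, earth_moon_km=EARTH_MOON_KM):
    """Return how many Earth-moon round trips a given length would cover."""
    round_trip_m = 2 * earth_moon_km * 1000  # there and back, in meters
    return total_length_m / round_trip_m


if __name__ == "__main__":
    print(base_composition(28))
    per_cell_m = dna_length_m(6e9)
    print(f"{per_cell_m:.2f} m of DNA per cell")
    print(f"{moon_round_trips(per_cell_m * 1e15):.3g} round trips to the moon")
```

Keeping the constants as named parameters makes it easy to rerun the calculation with the lecture's values; the script does not replace showing your calculations for Problem 3a.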