Module 11. Genome annotation: Using a Genome browser to observe evolutionary patterns

Background

Species evolve over time. Evolution is the consequence of the interactions of 1) the potential for a species to increase its numbers, 2) the genetic variability of offspring due to mutation and recombination of genes, 3) a finite supply of the resources required for life, and 4) the ensuing selection by the environment of those offspring better able to survive and leave offspring.

The great diversity of organisms is the result of more than 3.5 billion years of evolution that has filled every available niche with life forms. Natural selection and its evolutionary consequences provide a scientific explanation for the fossil record of ancient life forms as well as for the striking molecular similarities observed among the diverse species of living organisms.

The millions of different species of plants, animals, and microorganisms that live on earth today are related by descent from common ancestors. Biological classifications are based on how organisms are related. Organisms are classified into a hierarchy of groups and subgroups based on similarities which reflect their evolutionary relationships.

National Science Education Standards, p. 185

Once a novel whole genome has been sequenced, the sequence and accompanying information can be used by the research community more easily through genome and table browsers, such as those at UCSC. The Generic Model Organism Database (GMOD) has been developed for researchers to post genomes and plot information. We will explore genomes at the UCSC genome browser, which contains many finished genomes and is supported by a large group of dedicated researchers. Whole genome sequences from many different species have been aligned by researchers.
Now that you’ve gained some experience in understanding the fine structure of a gene in the previous module, let’s see what the same gene, beta-globin, looks like when compared among many different species. Do you think it is possible for mutations at a single gene to reveal phylogenetic relationships among vertebrates?

Goal

To give a basic understanding of genes and their functions.
To navigate around a genome, using a genome browser.
To give an understanding of how DNA comparisons reflect evolutionary history and gene function.

V&C Competencies

1) Ability to apply the process of science: Observational strategies, Hypothesis testing
2) Ability to use quantitative reasoning: Developing and interpreting graphs, Applying statistical methods to diverse data
4) Understand the interdisciplinary nature of science: Chemistry of molecules and biological systems

Protocols

1. Go to the UCSC genome browser web page at genome.ucsc.edu, select Genomes, and maximize the window.

2. Select the clade “Mammal” and the genome “Human,” clear all text from “position or search term,” and enter the abbreviation for human beta globin, “HBB,” as a search term. Wait a moment and you will get a list of search results. Select HBB: Homo Sapiens Hemoglobin Beta.

3. You will now see a close-up view of the beta-globin gene in humans. Some sections on your screen may look different from the figure below, but the top should be similar. Find the browser navigation tools, the exact location of the gene on chromosome 11, the sketch of the gene, and the miscellaneous information tracks below the gene sketch.

[Figure: genome browser view of HBB, with callouts for Navigation tools, Exact location shown, Location on chromosome, View of gene, and Miscellaneous Information Tracks]

4. The beta globin gene is in the reverse orientation in this view of the chromosome (arrows on the sketch point to the left). Before looking deeper, scroll down and reverse the orientation of the gene so that it matches the orientation of the gene from the hand-annotation exercise you performed earlier.
(Worksheet 2A, Question 4)

5. Scroll up to the picture and see if you can reconcile the genome browser’s gene sketch with your hand sketch of this gene. How are UTRs, coding sequences, and introns depicted in the browser? Roughly redraw the sketch below, and label the major pieces.

6. Scroll down to the Comparative Genomics track controls and adjust the settings so that “conservation” is set to “full.” This will give your browser the most expanded view of the alignment among many different species, which allows you to see how similar (i.e. conserved) each nucleotide is among many different vertebrates. Then hit “refresh” and scroll back up to view the changes.

7. Under “Multiz alignment of 46 (your number might differ) species,” you will see vertical bars corresponding to each nucleotide location in the genome. The taller the bar, the more conserved the DNA sequence is at that location, based on pairwise comparison of the human sequence with that of the species identified on the left of the screen.

8. To complete the lab, continue from this point to answer the questions on the “Evolution Worksheet” below.

Assessment

Worksheet: Annotation

Name: ________________________

1. Which species on your screen looks most similar to the human sequence? Does that make phylogenetic sense? Explain.

2. Does it look like some regions of the gene are more conserved than others?
   a. What evidence supports your answer?
   b. Which regions of the gene appear more conserved?
   c. Why might that be the case?

3. Zoom in all the way to the “base level” at chromosome 11:5,247,991-5,248,191. You can do this by entering “chr11:5,247,991-5,248,191” in the search window. Alternatively, you can center the view on the first intron/exon boundary by “double-click-drag” and zoom to base view. You should be looking at a view of the first exon/intron boundary in forward orientation. You only need to see the first boundary.
   a. Does this intron/exon border follow the GT/AG “rule”? Explain.
   b.
You should see nine species in the browser viewer, with zebrafish (Danio rerio) on the bottom. If the alignments are hidden, click on the text “Multi-z Alignment of 46 vertebrates” to expand. What nucleotides (A, G, C, T) do you observe in each of the first four positions of the intron over all nine species? List them in the table below. For each position, what fraction of the nine species are identical to human? Ignore gaps in your calculations.

      Position in Intron              |  1   |  2   |  3   |  4
      Nucleotides Observed            | ____ | ____ | ____ | ____
      % Identity Among All Species    | ____ | ____ | ____ | ____

   c. What could explain the variation in conservation among the four positions that you see in (b)?

4. Write down the amino acid sequence for the six amino acids right before the intron begins.
   a. Fill in the boxes below to calculate the fraction of the 6 sites at which all species are identical, and then repeat the calculation for mammals only. Fill in the amino acids found at each position and the percent identity among the relevant species. Also calculate the percentage of the 6 sites that are biochemically “similar” among all species, and among only mammals.

      Position                                 |  1   |  2   |  3   |  4   |  5   |  6
      Amino Acids Observed Over All Species    | ____ | ____ | ____ | ____ | ____ | ____
      Identical? (Y/N)                         | ____ | ____ | ____ | ____ | ____ | ____
      Similar? (Y/N)                           | ____ | ____ | ____ | ____ | ____ | ____

      % of the 6 sites that are identical among all spp: ____
      % of the 6 sites that are similar among all spp: ____

      Position                                 |  1   |  2   |  3   |  4   |  5   |  6
      Amino Acids Observed Over Mammals        | ____ | ____ | ____ | ____ | ____ | ____
      Identical? (Y/N)                         | ____ | ____ | ____ | ____ | ____ | ____
      Similar? (Y/N)                           | ____ | ____ | ____ | ____ | ____ | ____

      % of the 6 sites that are identical among mammals: ____
      % of the 6 sites that are similar among mammals: ____

   b. How do you explain the patterns detected above?

   c. Do you think the patterns you’ve been describing are unique to beta globin, or general? Support your idea by going to the HBD (Homo sapiens hemoglobin delta) gene and repeating the amino acid analysis above.

      Position                                 |  1   |  2   |  3   |  4   |  5   |  6
      Amino Acids Observed Over All Species    | ____ | ____ | ____ | ____ | ____ | ____
      Identical? (Y/N)                         | ____ | ____ | ____ | ____ | ____ | ____
      Similar? (Y/N)                           | ____ | ____ | ____ | ____ | ____ | ____

      % of the 6 sites that are identical among all spp: ____
      % of the 6 sites that are similar among all spp: ____

      Position                                 |  1   |  2   |  3   |  4   |  5   |  6
      Amino Acids Observed Over Mammals        | ____ | ____ | ____ | ____ | ____ | ____
      Identical? (Y/N)                         | ____ | ____ | ____ | ____ | ____ | ____
      Similar? (Y/N)                           | ____ | ____ | ____ | ____ | ____ | ____

      % of the 6 sites that are identical among mammals: ____
      % of the 6 sites that are similar among mammals: ____
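
Optional: checking your tallies with a script

The percent identity tally in Question 3b and the identical/similar tallies in Question 4 can also be double-checked with a short script once you have copied the letters out of the browser by hand. The Python sketch below is only an illustration under stated assumptions: the function names are hypothetical, the sequences are invented placeholders rather than real alignment data, gaps are assumed to appear as "-", human is assumed to be the first row and is counted in the percent identity, and the four amino acid groups are one common coarse grouping that your instructor or textbook may define differently.

```python
"""Toy helpers for the worksheet's percent-identity and similarity tallies.

All sequences used here are invented placeholders, not real alignment data.
"""

GAP = "-"  # assumed gap symbol; adjust if your alignment shows gaps differently

# One coarse biochemical grouping of the 20 amino acids (an assumption;
# other groupings are in use and may change a "similar" call).
SIMILARITY_GROUPS = [
    set("AVLIMFWPG"),  # nonpolar
    set("STCYNQ"),     # polar, uncharged
    set("KRH"),        # positively charged
    set("DE"),         # negatively charged
]


def columns(rows):
    """Turn aligned rows (one string per species) into per-position columns."""
    return [list(col) for col in zip(*rows)]


def percent_identity_to_reference(column, gap=GAP):
    """Percent of non-gap entries in a column that match the first entry.

    The first entry is treated as the reference (human) and is itself
    counted, so a fully conserved column scores 100. Gaps are ignored.
    """
    reference = column[0].upper()
    if reference == gap:
        raise ValueError("reference entry is a gap")
    scored = [c.upper() for c in column if c != gap]
    matches = sum(1 for c in scored if c == reference)
    return 100.0 * matches / len(scored)


def classify_site(residues, gap=GAP):
    """Return (identical, similar) for one amino acid position."""
    observed = {r.upper() for r in residues if r != gap}
    if not observed:
        raise ValueError("site contains only gaps")
    identical = len(observed) == 1
    # A site is "similar" if every observed residue falls in one group.
    similar = identical or any(observed <= group for group in SIMILARITY_GROUPS)
    return identical, similar


def summarize_sites(sites):
    """Percent of sites that are identical, and percent that are similar."""
    flags = [classify_site(site) for site in sites]
    n = len(flags)
    pct_identical = 100.0 * sum(1 for ident, _ in flags if ident) / n
    pct_similar = 100.0 * sum(1 for _, sim in flags if sim) / n
    return pct_identical, pct_similar


if __name__ == "__main__":
    # Invented 4-base alignment; row 0 plays the role of human.
    toy_rows = ["GTTG", "GTTG", "GTAG", "GT-G", "GTCA"]
    for position, column in enumerate(columns(toy_rows), start=1):
        print(position, "".join(column), percent_identity_to_reference(column))

    # Invented amino acid columns, one list of residues per position.
    toy_sites = [list("VVVV"), list("VVIL"), list("KKRK"), list("DKEG")]
    print(summarize_sites(toy_sites))
```

To use it for the worksheet, replace the toy rows with the letters you recorded, keeping human as the first row. If you would rather score only the non-human species against human, drop the first entry from the count; the script is a check on your reasoning, not a substitute for reading the alignment yourself.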