SNPs and Haplotypes

Researchers are deciphering the genomes of many organisms, and bioinformatics plays a critical role in mining information from genome sequences, because massive amounts of data must be analyzed with computer processing power and the right algorithms.

Genomics: Medicine and Microbes

Genomics is the study of the functions and interactions of all the genes in the genome. This understanding of the genome can lead to the discovery of new genes and proteins that address needs in medicine, agriculture, the environment, and other fields. The Human Genome Project has paved the way for academic labs and companies to use genomics for medical advances. Some benefits include:

- Improved diagnosis of disease and early detection of genetic predisposition to disease
- New methods of drug therapy and “pharmacogenomics”, the customizing of drug selection and dosage to a patient’s genetic makeup

A good overview of the medical benefits from the Human Genome Project is available at the government-sponsored Human Genome Program: http://www.ornl.gov/sci/techresources/Human_Genome/medicine/medicine.shtml

This resource has additional links to more detailed information about specific medical applications of the Human Genome Project. Take some time to explore applications that interest you on the Medicine and New Genetics web page.

One biotechnology company, deCode Genetics, has taken an innovative, though controversial, bioinformatics approach to using genomics for gene discovery leading to new biomedical diagnostic tools and therapies. http://www.decode.com

On the other end of the genome spectrum, even the simple microbe can provide valuable gene resources. Microbes have a diverse collection of genes and enzymes because these organisms can thrive in a wide variety of environments. Some microbes are even called “extremophiles” because of their ability to live in extreme environments, such as very high or low temperatures or pH.
There are several useful applications for these extremophiles:

- Cleanup of toxic waste
- Development of renewable energy sources, such as methane and hydrogen
- Production of chemical catalysts, reagents, and enzymes to improve the efficiency of industrial processes
- Use of genetically altered bacteria as living sensors (biosensors) to detect harmful chemicals in soil, air, or water

The U.S. Department of Energy has a Microbial Genome Program to explore these applications of microbial genomics. Its website is http://microbialgenome.org

The Microbial Genome Program has also assembled reports with more information about microbial genomics. You can browse the materials list and download the reports for free from the website http://microbialgenome.org/pubs.shtml

SNPs and Haplotypes

Another approach to preventing and treating human disease involves a product of genomics and bioinformatics called SNPs, or single nucleotide polymorphisms. SNPs are DNA sequence variations that occur when a single nucleotide (A, T, C, or G) in the genome sequence is altered. SNPs are the most common type of DNA difference between individuals, and the human genome is estimated to contain roughly 10 million of them. SNPs are important in biomedical research because they can be used as DNA markers to locate a disease gene.

Haplotypes consist of neighboring SNPs that are inherited together. Haplotypes provide a way to organize the genetic variation that occurs between individuals: an individual can be genotyped by haplotype instead of by examining every SNP (remember, there are millions to examine). Collections of SNPs or haplotypes can be examined by identifying the “tag” SNPs that determine a particular haplotype.

The next diagram (from the International HapMap Project) shows the relationship between SNPs and haplotypes.
Part A shows separate SNPs from a small region of the same chromosome in four individuals, part B shows the different haplotypes resulting from variations in the SNPs (a real haplotype would have many more SNPs than the three shown), and part C shows the “tag” SNPs that can be used to identify a specific haplotype. (figure 2 http://www.hapmap.org/whatishapmap.html )

The key to finding disease-related genes is discovering a haplotype that is consistently associated with a particular disease. For example, an individual could be genotyped to determine whether they have a specific haplotype (or set of haplotypes) associated with cancer. Even though the individual may be far from showing any physical signs of cancer, the genetic testing indicates that the individual is at risk, so preventive treatments could be started.

Currently, genomics and bioinformatics research groups are exhaustively determining the genetic variation in the human genome, as represented by SNPs and haplotypes. One major research effort in this area is by a public consortium of researchers called the International HapMap Project. Recently (Oct 2005), the HapMap Project reported the completion of the first phase of its project, in which it compiled data for more than a million SNPs from individuals representing four different populations. The goal of this effort is to provide at least one SNP for every 5,000 bases of human genome sequence.

A second effort to characterize human genetic variation is being carried out by a private company, Perlegen, a San Francisco Bay Area company focusing on biomedical applications for SNPs. In a recent Science paper from Perlegen, genome-wide genotyping was carried out by systematically testing more than 1.5 million SNPs in each of 71 individuals. This requires compiling and analyzing a huge amount of data, which is where bioinformatics comes in.
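
As a small illustration of the kind of computation involved, the following Python sketch picks “tag” SNPs from a set of haplotypes: it searches for the smallest group of SNP positions whose alleles are enough to tell every haplotype apart. The function name find_tag_snps and the four toy haplotypes are hypothetical and are not taken from the HapMap figure or from any real data set. Real tag SNP selection works on population data and uses statistical measures of association between SNPs, so this brute-force search is only meant to show the idea.

```python
from itertools import combinations


def find_tag_snps(haplotypes):
    """Return the smallest tuple of 0-based SNP positions that distinguishes
    every haplotype from every other one, or None if two haplotypes are identical."""
    n_sites = len(haplotypes[0])
    # Try groups of 1 position, then 2, and so on, so the first hit is minimal.
    for size in range(1, n_sites + 1):
        for positions in combinations(range(n_sites), size):
            # The "signature" of a haplotype is its alleles at the chosen positions.
            signatures = {tuple(h[p] for p in positions) for h in haplotypes}
            if len(signatures) == len(haplotypes):
                return positions
    return None


# Hypothetical haplotypes: each string lists the alleles at five SNP sites.
haplotypes = ["ACGTA", "ATGCA", "GCATA", "GTACC"]

tags = find_tag_snps(haplotypes)
print(tags)  # (0, 1): the first two SNPs are enough to identify each haplotype
```

In this toy example, genotyping two SNPs instead of five identifies the haplotype, which is the saving that tag SNPs are meant to provide.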
We can look at this genetic variation data by going to the Perlegen website at http://genome.perlegen.com/browser/index.html

This brings up the Perlegen genotype browser, which gives several options for locating SNPs. If you scroll down the page, you can also read an explanation of the browser and how the SNP data is organized.

If you click on the default Gene Name = CFTR, you will be directed to another page showing the gene area, with the SNPs (represented by triangles) below it. (CFTR is the cystic fibrosis transmembrane conductance regulator.)

You can update the image to include the haplotype blocks in each of the three populations examined (African American, European American, and Chinese) by checking the appropriate boxes in the menu. These blocks are the sets of SNPs that are inherited together in a human population and are used to simplify genotyping individuals.

You can zoom in to a particular area of the CFTR gene (in this example, zoom to 50 kbp). Now you can see the names of the individual SNPs. Beneath each name is the nucleotide variation that occurs in the SNP.

You can then click on an individual SNP to find detailed information about the DNA marker. The population distribution of the nucleotide variations for the designated SNP is shown. You can also examine the result for each individual tested for the SNP. There are two letters for each individual because each person inherits two copies of each SNP (one from Mom and the other from Dad).

Question 1: Beta-globin gene Haplotype map

In this problem you will examine the genetic variation in the human beta globin gene, which encodes a protein subunit of hemoglobin.

A. Locate the gene sequence entry for the human beta globin gene. The HUGO identifier for this gene is HBB. One way to find the gene sequence entry is to search the NCBI Entrez Gene database and follow the link to the entry. Be sure to retrieve the human entry, not an entry from some other animal.
From the human beta globin gene entry page, list the gene identifier number (four-digit number) and the chromosome location.

B. Use the Perlegen genotype browser http://genome.perlegen.com/browser/index.html to examine SNPs and haplotypes associated with the beta globin gene. Find the SNP (there should only be one) located within the gene sequence of the beta globin gene. Note: you may have to zoom out with the browser to find the SNP. Record the Perlegen SNP_ID name (starts with afd…) and the two nucleotides listed for the SNP.

C. Using the Perlegen browser for the beta globin gene, identify the haplotype block ID (7-digit number) from the African-American population that contains the beta globin gene. Record the identifier of this haplotype block.

D. Click on the haplotype block to get more information and details about it. List the individual SNPs that comprise this haplotype (there should be six SNPs).

E. Go to the HapMap project data browser by pointing your web browser to the home page http://www.hapmap.org and then clicking on the “Browse Project Data” link in the left column. Use the HapMap browser to find the beta globin gene SNPs. Hint: use the HBB identifier as a search term. When you reach the results page, look for the SNP that corresponds to the SNP you found in Part B using the Perlegen browser. Record the HapMap ID (begins with rs…) and explain how you know that the HapMap and Perlegen SNPs are the same.

Solution 1:

A. The NCBI Entrez Gene entry for the human beta globin gene is GeneID 3043. The chromosome location is listed on the same page in the “Genomic Context” section: the location is 11p15.5.

B. Enter HBB in the “By Gene Name” search box at the Perlegen browser. You need to zoom out to find the SNP afd0606878. Click on the afd0606878 link for the details of the SNP nucleotide variation. The two nucleotides are G/C. This means that at this sequence position, either a G or a C is found in the three populations tested.

C.
The African-American haplotypes are indicated in green. The arrow indicates the haplotype that includes the beta globin gene. By clicking on the haplotype, you can get the ID number: 1211087.

D. The individual SNPs that comprise this haplotype can be found by clicking on the haplotype link. The six SNPs are listed: afd0606880, afd0606878, afd0606876, afd0606875, afd0606873, afd0606872

E. By entering HBB in the search box on the “Browse Project Data” page, you will reach the results summary. The SNP nucleotide variations are listed next to each SNP. The SNP with the same nucleotide variations as afd0606878 (from the Perlegen databank) is rs10768683. You can confirm that the two SNPs are the same by studying the results page from the Perlegen browser (see page 2 of the key), where a cross-reference rsid is provided: rs10768683.
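
The comparison made in part E, looking for a HapMap SNP whose nucleotide variation matches the Perlegen SNP, can be sketched in Python. This is a minimal sketch under stated assumptions: the function names are hypothetical, the two entries named rs_hypothetical_1 and rs_hypothetical_2 are invented placeholders rather than real HapMap records, and the "C/G" ordering shown for rs10768683 is assumed for illustration. The code does not query either browser.

```python
def allele_set(alleles):
    """Turn an allele string such as 'G/C' into an order-independent set."""
    return frozenset(a.strip().upper() for a in alleles.split("/"))


def candidate_matches(query_alleles, snps):
    """Return the IDs of SNPs whose allele pair equals the query pair, ignoring order."""
    target = allele_set(query_alleles)
    return [snp_id for snp_id, alleles in snps.items()
            if allele_set(alleles) == target]


# Alleles keyed by SNP ID. Only rs10768683 comes from the answer key;
# the other two entries are hypothetical placeholders.
hapmap_snps = {
    "rs_hypothetical_1": "A/G",
    "rs10768683": "C/G",
    "rs_hypothetical_2": "C/T",
}

# afd0606878 is listed as G/C in the Perlegen browser (solution, part B).
candidates = candidate_matches("G/C", hapmap_snps)
print(candidates)  # ['rs10768683']
```

Matching alleles only narrows the search to candidates, since unrelated SNPs can share the same two nucleotides. The cross-reference rsid on the Perlegen results page is what confirms that the two records describe the same SNP.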