SNPs and Haplotypes

Researchers are deciphering the genomes of many organisms, and bioinformatics plays
a critical role in mining information from genome sequences, because the work requires
analyzing massive amounts of data with computer processing power and the right
algorithms.
Genomics: Medicine and Microbes
Genomics is the study of functions and interactions of all the genes in the genome, and
this understanding of the genome can lead to discovering new genes and proteins to
address needs in medicine, agriculture, the environment, and other fields. The Human
Genome Project has paved the way for academic labs and companies to use genomics for
medical advances. Some benefits include:
• Improved diagnosis of disease and early detection of genetic predisposition to
disease
• New methods of drug therapy and “pharmacogenomics”, the customizing of drug
selection and dosage to a patient’s genetic makeup
A good overview of the medical benefits from the Human Genome Project is available from
the government-sponsored Human Genome Program:
http://www.ornl.gov/sci/techresources/Human_Genome/medicine/medicine.shtml
This resource has additional links to more detailed information about specific medical
applications of the Human Genome Project. Take some time to explore some applications
that interest you on the Medicine and New Genetics web page.
One biotechnology company, deCode Genetics, has taken an innovative, though
controversial, bioinformatics approach to using genomics for gene discovery leading to
new biomedical diagnostic tools and therapies. http://www.decode.com
On the other end of the genome spectrum, even the simple microbe can provide valuable
gene resources. Microbes have a diverse collection of genes and enzymes since these
organisms can thrive in a wide variety of environments – some microbes are even called
“extremophiles” because of their ability to live in extreme environments, such as high or
low temperatures and pH. There are several useful applications for these extremophiles:
• Cleanup of toxic waste
• Development of renewable energy sources, such as methane and hydrogen
• Production of chemical catalysts, reagents, and enzymes to improve efficiency of
industrial processes
• Use of genetically altered bacteria as living sensors (biosensors) to detect harmful
chemicals in soil, air, or water
The U.S. Department of Energy has a Microbial Genome Program to explore these useful
applications from microbial genomics. Their website is http://microbialgenome.org
The Microbial Genome Program has also assembled reports with more information about
microbial genomics – you can browse the materials list and download them for free from
the website http://microbialgenome.org/pubs.shtml
SNPs and Haplotypes
Another approach to preventing and treating human disease involves a product of
genomics and bioinformatics called SNPs, or single nucleotide polymorphisms. SNPs are
DNA sequence variations in which a single nucleotide (A, T, C, or G) in the
genome sequence differs between individuals. SNPs are the most common type of DNA
difference between individuals, and the human genome is estimated to contain roughly
10 million of them. SNPs are important in biomedical research because they can be used
as DNA markers to locate a disease gene.
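To make the definition concrete, the following minimal Python sketch compares two aligned sequences base by base and reports every position where a single nucleotide differs. The function name `find_snps` and both 12-base sequences are invented for illustration. A difference between just two sequences is only a candidate SNP, since a true SNP is a variation seen across a population.

```python
def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for each single-base difference.

    Positions are 1-based. Both sequences must already be aligned and equal in length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base
    ]


# Hypothetical sequences, invented for this example (not real genomic data).
reference = "ATGCCGTAAGCT"
sample = "ATGCTGTAAGAT"

snps = find_snps(reference, sample)
for position, ref_base, alt_base in snps:
    print(f"position {position}: {ref_base}/{alt_base}")
```

Each reported pair (for example C/T) corresponds to the "two nucleotides listed for the SNP" that genotype browsers display.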
Haplotypes consist of neighboring SNPs that are inherited together. Haplotypes provide a
way to organize the genetic variation that occurs between individuals: an individual can
be genotyped by haplotype instead of by examining every SNP (remember
there are millions to examine). Haplotypes can be examined efficiently by
identifying the “tag” SNPs that determine a particular haplotype.
The next diagram (from the International HapMap Project) shows the relationship
between SNPs and haplotypes. Part A shows separate SNPs from a small region of the
same chromosome in four individuals, part B shows the different haplotypes resulting
from variations in the SNPs (a real haplotype would have many more SNPs than the three
shown), and part C shows the “tag” SNPs that can be used to identify a specific
haplotype. (Figure 2, http://www.hapmap.org/whatishapmap.html)
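The idea of tag SNPs can be sketched in a few lines of Python. The example below takes a small set of haplotypes, each written as a string of alleles at consecutive SNP positions, and searches for the smallest set of positions whose alleles are enough to tell every haplotype apart. The haplotype names, the allele strings, and the function `find_tag_snps` are all hypothetical. The brute-force search is only an illustration, not the statistical method that the HapMap Project uses.

```python
from itertools import combinations


def find_tag_snps(haplotypes: dict[str, str]) -> tuple[int, ...]:
    """Return the smallest set of SNP column indices (0-based) that
    uniquely identifies every haplotype, trying small sets first."""
    n_snps = len(next(iter(haplotypes.values())))
    for size in range(1, n_snps + 1):
        for columns in combinations(range(n_snps), size):
            # The alleles at the chosen columns form a signature per haplotype.
            signatures = {
                tuple(alleles[c] for c in columns) for alleles in haplotypes.values()
            }
            if len(signatures) == len(haplotypes):
                return columns
    raise ValueError("haplotypes are not all distinct")


# Hypothetical haplotypes over six SNP positions (invented for illustration).
haplotypes = {
    "haplotype 1": "ACGTAC",
    "haplotype 2": "ATGTGC",
    "haplotype 3": "GCGAAC",
    "haplotype 4": "GTCAGT",
}

tags = find_tag_snps(haplotypes)
print("tag SNP columns:", tags)
for name, alleles in haplotypes.items():
    print(name, "->", "".join(alleles[c] for c in tags))
```

In this toy data, two of the six SNPs are enough to distinguish all four haplotypes, which is why genotyping only the tag SNPs saves so much work.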
The key to finding disease-related genes is discovering a haplotype consistently
associated with a particular disease. For example, an individual could be genotyped to
determine if they have a specific haplotype (or set of haplotypes) associated with cancer.
Even though the individual may show no physical signs of cancer, the
genetic test indicates that the individual is at risk, so preventive treatment could be
started.
Currently, genomics and bioinformatics research groups are exhaustively determining the
genetic variation in the human genome, as represented by SNPs and haplotypes. One
major research effort in this area is by a public consortium of researchers called the
International HapMap Project. Recently (Oct 2005), the HapMap Project reported the
completion of the first phase of its project, in which it compiled data for more than a
million SNPs from individuals representing four different populations. The goal of this
effort is to provide at least one SNP for every 5,000 bases of human genome sequence. A
second effort to characterize human genetic variation is being carried out by a private
company, Perlegen, a San Francisco Bay Area company focusing on biomedical
applications for SNPs. In a recent Science paper from Perlegen, genome-wide genotyping
was carried out by systematically testing more than 1.5 million SNPs in each of 71
individuals. Obviously, this requires compiling and analyzing a huge amount of data, so
bioinformatics has come to the rescue.
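A quick back-of-the-envelope calculation shows the scale of these projects. The Python sketch below uses the figures quoted above, plus one assumption that does not come from the text: a human genome size of roughly 3 billion bases.

```python
# Assumption (not from the text): the human genome is roughly 3 billion bases.
GENOME_SIZE_BASES = 3_000_000_000

# HapMap goal quoted above: at least one SNP per 5,000 bases.
SNP_SPACING_BASES = 5_000
snps_needed = GENOME_SIZE_BASES // SNP_SPACING_BASES

# Perlegen study quoted above: more than 1.5 million SNPs in each of 71 individuals.
PERLEGEN_SNPS = 1_500_000
PERLEGEN_INDIVIDUALS = 71
genotype_calls = PERLEGEN_SNPS * PERLEGEN_INDIVIDUALS

print(f"SNPs needed for one per 5,000 bases: about {snps_needed:,}")
print(f"Genotype calls in the Perlegen study: more than {genotype_calls:,}")
```

Under that assumption, the spacing goal implies about 600,000 SNPs genome-wide, and the Perlegen study amounts to more than 100 million individual genotype calls.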
We can look at this genetic variation data by going to the Perlegen website at
http://genome.perlegen.com/browser/index.html
This brings up the Perlegen genotype browser, which gives several options for locating
SNPs. If you scroll down the page, you can also read an explanation of the browser and
how the SNP data is organized.
If you click on the default Gene Name = CFTR, you will be directed to another page
showing the gene area, with SNPs (represented by triangles) below it. (CFTR
is the cystic fibrosis transmembrane conductance regulator gene.)
You can update the image to include the haplotype regions in each of the three
populations examined – African American, European American and Chinese – by
checking the appropriate boxes in the menu. These blocks are the sets of SNPs that are
inherited together in a human population and used to simplify genotyping individuals.
You can zoom in to a particular area of the CFTR gene (in this example, zoom to 50 kbp).
Now you can see the names of the individual SNPs. Also, beneath each name is the
nucleotide variation that occurs in the SNP. You can then click on an individual SNP to
find detailed information about the DNA marker. The population distribution of the
nucleotide variations for the designated SNP is shown. You can examine how each
individual tested for the SNP – there are two letters for each individual since they inherit
two copies of each SNP (one from Mom and the other from Dad).
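The population distribution shown in the browser can be reproduced from the two-letter genotypes themselves. The sketch below counts every allele across a list of genotypes and converts the counts to frequencies. The genotype list and the function name `allele_frequencies` are hypothetical and are not taken from the Perlegen data.

```python
from collections import Counter


def allele_frequencies(genotypes: list[str]) -> dict[str, float]:
    """Compute the frequency of each allele from two-letter genotypes.

    Each genotype such as "GC" contributes two alleles, one per inherited copy.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}


# Hypothetical genotypes for five individuals at one G/C SNP.
genotypes = ["GG", "GC", "CC", "GG", "GC"]

freqs = allele_frequencies(genotypes)
for allele, frequency in sorted(freqs.items()):
    print(f"{allele}: {frequency:.2f}")
```

Five individuals carry ten allele copies in total, so here G accounts for 6 of 10 copies and C for 4 of 10.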
Question 1: Beta-globin gene Haplotype map
In this problem you will examine the genetic variation in the human beta globin gene,
which encodes a protein subunit of hemoglobin.
A. Locate the gene sequence entry for the human beta globin gene. The HUGO identifier
for this gene is HBB. One way to find the gene sequence entry is to search the NCBI
Entrez Gene database to link to the entry. Be sure to retrieve the human entry, not entries
from some other animal. From the human beta globin gene entry page, list the gene
identifier number (four digit number) and the chromosome location.
B. Use the Perlegen genotype browser http://genome.perlegen.com/browser/index.html to
examine SNPs and haplotypes associated with the beta globin gene. Find the SNP (there
should only be one) located within the gene sequence of the beta globin gene. Note: you
may have to zoom out with the browser to find the SNP. Record the Perlegen SNP_ID
name (starts with afd…) and the two nucleotides listed for the SNP.
C. Using the Perlegen browser for the beta globin gene, identify the haplotype block ID
(7 digit number) from the African-American population that contains the beta globin
gene. Record the identifier name of this haplotype.
D. Click on the haplotype block to get more information and details about the haplotype
block. List the individual SNPs that comprise this haplotype (there should be six SNPs).
E. Go to the HapMap project data browser by pointing your web browser to the home
page http://www.hapmap.org and then clicking on the “Browse Project Data” link in the
left column.
Use the HapMap browser to find the beta globin gene SNPs – hint: use the HBB identifier
as a search term. When you reach the results page, look for the SNP that corresponds to
the SNP you found in Part B using the Perlegen browser. Record the HapMap ID (begins
with rs…) and explain how you know that the HapMap and Perlegen SNPs are the same.
Solution 1:
A. The NCBI Entrez Gene entry for the human beta globin gene is GeneID 3043. The
chromosome location is listed on the same page in the “Genomic Context” section – the
location is 11p15.5
B. Enter HBB in the “By Gene Name” search box at the Perlegen Browser. You need
to zoom out to find the SNP afd0606878. Click on the afd0606878 link for the details of the
SNP nucleotide variation. The two nucleotides are G/C. This means that at this sequence
position, either a G or a C is found in the three populations tested.
C. The African-American haplotypes are indicated in green. The arrow indicates the
haplotype that includes the beta globin gene. By clicking on the haplotype, you can get
the ID number: 1211087.
D. The individual SNPs that comprise this haplotype can be found by clicking on the
haplotype link. The six SNPs are listed: afd0606880, afd0606878, afd0606876,
afd0606875, afd0606873, afd0606872
E. By entering HBB in the search box on the “Browse Project Data” page you will reach
the results summary. The SNP nucleotide variations are listed by each SNP. The SNP
with the same nucleotide variations as afd0606878 (from the Perlegen databank) is
rs10768683. You can confirm that the two SNPs are the same by studying the
Perlegen browser results page (see page 2 of the key), which provides a
cross-reference rsid: rs10768683.