Peter Castaldi
January 29, 2013
• Introduce the concept of linkage disequilibrium (LD)
• Describe how the HapMap project provides publicly available information on genetic variation and LD structure
• Review how LD enables genome-wide screens using only a subset of all SNP markers
• Describe the design of chip-based genotype assays
• ~3 billion base pairs, 23 pairs of chromosomes
• 99.9% sequence similarity between individuals
• ~12 million variant sites
• Single base pair change (ACGT vs. ATGT), a.k.a. Single Nucleotide Polymorphism (SNP)
• ~12 million across the genome
• Insertions/Deletions (TGGTTTCTA vs. TGGT---TA)
• Can be of variable size
• Trinucleotide repeats (microsatellites)
• Highly polymorphic, less common than SNPs
• Responsible for certain clinical disorders (Huntington’s disease, Fragile X syndrome, myotonic dystrophy)
• SNPs can have up to four possible alleles (A, C, G, T), but most have only two alleles present in human populations
• Each person has two SNP alleles (one for each copy of the chromosome)
• When both copies are the same, you’re homozygous (e.g., AA, CC, GG, TT); when they’re different (e.g., AT), you’re heterozygous.
• Each allele has a frequency with which it appears in a given population
• major allele (more common), minor allele (less common)
• the frequencies of all alleles at a SNP sum to 1 (100%)
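A minimal Python sketch of the allele bookkeeping above, assuming genotypes are given as two-letter strings (one letter per chromosome copy); the function names and the genotype data are hypothetical and chosen only for illustration.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Return {allele: frequency} for one SNP from two-letter genotype calls such as "AG"."""
    counts = Counter()
    for genotype in genotypes:
        counts.update(genotype)  # two alleles per person, one per chromosome copy
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

def zygosity(genotype):
    """Homozygous if both alleles match, heterozygous otherwise."""
    return "homozygous" if genotype[0] == genotype[1] else "heterozygous"

# Hypothetical genotype calls for one bi-allelic A/G SNP in five people
genotypes = ["AA", "AG", "GG", "AA", "AG"]
freqs = allele_frequencies(genotypes)
major = max(freqs, key=freqs.get)  # more common allele
minor = min(freqs, key=freqs.get)  # less common allele

print(freqs)          # {'A': 0.6, 'G': 0.4}
print(major, minor)   # A G
print([zygosity(g) for g in genotypes])
```

Five people contribute ten alleles (six A, four G), so the major and minor allele frequencies are 0.6 and 0.4 and sum to 1.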
• Properties of SNPs that make them good markers for GWAS
• densely spaced across the genome
• usually bi-allelic (only 2 alleles in the population, simplifies statistical tests)
• GWAS chips can effectively represent most common variation with just a subset of SNPs
• with ~500,000 SNPs, most common variation can be captured
• this is because there is significant correlation between neighboring SNPs
Linkage Disequilibrium Causes Correlation Between Neighboring SNPs
• Mendel’s law of independent assortment states that alleles of different genes are transmitted independently across generations (random assortment, i.e., linkage equilibrium).
• This is not the case when two genetic loci are physically close to each other.
• When two physically close genetic loci are not randomly assorted, this is called linkage disequilibrium.
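One way to make "not randomly assorted" concrete: under linkage equilibrium the frequency of a two-locus haplotype equals the product of its allele frequencies, so the difference D = p_AB − p_A·p_B is zero. The sketch below assumes two bi-allelic loci with alleles A/a and B/b and uses hypothetical haplotype counts.

```python
def ld_coefficient_D(haplotype_counts):
    """D = p_AB - p_A * p_B for two bi-allelic loci (alleles A/a and B/b).

    haplotype_counts maps haplotypes "AB", "Ab", "aB", "ab" to how often each is observed.
    """
    total = sum(haplotype_counts.values())
    n_AB = haplotype_counts.get("AB", 0)
    n_Ab = haplotype_counts.get("Ab", 0)
    n_aB = haplotype_counts.get("aB", 0)
    p_AB = n_AB / total            # observed frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / total    # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / total    # frequency of allele B at locus 2
    return p_AB - p_A * p_B        # 0 under linkage equilibrium

# Hypothetical counts from 100 chromosomes each
unlinked = {"AB": 25, "Ab": 25, "aB": 25, "ab": 25}  # alleles combine at random
linked = {"AB": 45, "Ab": 5, "aB": 5, "ab": 45}      # A travels with B, a with b

print(ld_coefficient_D(unlinked))  # 0.0
print(ld_coefficient_D(linked))    # approximately 0.2
```

In the "linked" example the A-B haplotype is seen far more often (0.45) than random assortment predicts (0.5 × 0.5 = 0.25), which is exactly the correlation between neighboring SNPs that GWAS chips exploit.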
Linkage Equilibrium Arises Because of Meiotic Recombination
http://kenpitts.net/hbio/8cell_repro/meiosis_pics.htm
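[Figure: Gametogenesis. Paternal DNA (chromosomes from the paternal grandfather and paternal grandmother, carrying alleles X, Y, Z and x, y, z) and maternal DNA; crossing over produces gametes that mix grandfather and grandmother alleles.]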
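The figure's point, that nearby loci tend to be inherited together while distant loci are separated by crossing over, can be sketched with a small simulation. Everything here is a simplifying assumption for illustration: hypothetical loci X, Y, Z at fixed positions, exactly one crossover per meiosis at a uniformly random position, and a given gamete receiving a crossed-over chromatid half of the time.

```python
import random

# Hypothetical positions (arbitrary units) of three loci along a 100-unit chromosome
LOCI = {"X": 10.0, "Y": 20.0, "Z": 90.0}
CHROM_LENGTH = 100.0

def gamete(rng):
    """Alleles carried by one gamete: "X" = paternal grandfather's copy, "x" = paternal grandmother's.

    Simplifying assumptions: each meiosis has exactly one crossover at a uniformly random
    position, and a given gamete receives a crossed-over chromatid half of the time.
    """
    starts_with_grandfather = rng.random() < 0.5
    if rng.random() < 0.5:
        cut = rng.uniform(0.0, CHROM_LENGTH)  # crossover inside this chromatid
    else:
        cut = CHROM_LENGTH                    # non-crossover chromatid
    alleles = {}
    for name, pos in LOCI.items():
        from_grandfather = starts_with_grandfather if pos < cut else not starts_with_grandfather
        alleles[name] = name if from_grandfather else name.lower()
    return alleles

def recombination_rate(a, b, n=100_000, seed=0):
    """Fraction of gametes in which loci a and b come from different grandparents."""
    rng = random.Random(seed)
    recombinant = 0
    for _ in range(n):
        g = gamete(rng)
        if g[a].isupper() != g[b].isupper():
            recombinant += 1
    return recombinant / n

print(recombination_rate("X", "Y"))  # about 0.05: X and Y are close together
print(recombination_rate("X", "Z"))  # about 0.4: X and Z are far apart
```

Under these assumptions the closely spaced pair X/Y is split in roughly 5% of gametes, while the distant pair X/Z is split in roughly 40%, so alleles at nearby loci stay correlated.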
Recombination Breaks Up Chromosomal Segments Over Generations
• Recombination is not uniform across the genome; it is concentrated in recombination hotspots.
• SNPs within the yellow region of the figure are correlated with each other and form haplotypes.
• Because of this correlation, a single SNP from a haplotype can often represent all the SNP variation within that haplotype.
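How recombination erodes correlation over generations can be sketched with the classic recurrence D_{t+1} = (1 − c)·D_t, where c is the recombination fraction between two loci. This assumes an idealized, infinitely large, randomly mating population with no drift, selection, or mutation; the starting D and the c values are hypothetical.

```python
def ld_after_generations(d0, c, generations):
    """Iterate D_{t+1} = (1 - c) * D_t, starting from D_0 = d0.

    c is the recombination fraction between the two loci. Assumes an idealized,
    infinitely large, randomly mating population (no drift, selection, or mutation).
    """
    d = d0
    for _ in range(generations):
        d *= 1.0 - c  # each generation, recombinant gametes dilute the association
    return d

# Hypothetical starting D of 0.2 (the maximum for two loci with 50/50 allele frequencies is 0.25)
for c in (0.001, 0.01, 0.1):
    print(f"c = {c}: D after 100 generations = {ld_after_generations(0.2, c, 100):.4f}")
```

Tightly linked loci (small c) keep most of their LD after 100 generations, while loci with c = 0.1 lose essentially all of it. Running more generations shrinks D further, which is one way to see why a longer population history leaves shorter LD blocks.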
Haplotype Structure Reflects Evolutionary History
• The structure of haplotype blocks varies across ancestral populations
• African populations have shorter LD blocks, reflecting the longer evolutionary history of those populations
~500,000 SNP Markers Can Reasonably Represent Most of the Common Genetic Variation in European Genomes
• GWAS relies upon linkage disequilibrium and the ubiquitous nature of SNP markers to enable genome-wide surveys of the impact of common variation on disease susceptibility
Pe’er et al. Nat Gen. 2006
The HapMap Project is a catalog of human variation across populations
• The Human Genome project provided the complete human sequence for a small number of individuals
• To get an accurate sense of variable sites, data from many individuals is needed
• The HapMap project was carried out in three phases (http://hapmap.ncbi.nlm.nih.gov/)
• dense genotype data from multiple population groups
• CEU – individuals of Northern and Western European ancestry from Utah
• YRI – Yorubans from Nigeria
• JPT – Japanese from Tokyo
• CHB – Han Chinese from Beijing
Data from the HapMap Project Enabled GWAS Chip Design
• Information from HapMap used in chip design:
• panel of potential SNPs to use in a genotype chip
• population-specific LD structure, which allows the identification of tag SNPs that effectively tag haplotypes
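Tag SNP selection can be illustrated with a simple greedy procedure: repeatedly pick the SNP that captures (r² ≥ threshold) the most SNPs not yet covered. This is a generic greedy sketch, not the specific algorithm used to design any particular chip; the SNP names, the r² values (which would come from one population's LD data, e.g. HapMap), and the 0.8 threshold are all hypothetical.

```python
def greedy_tag_snps(snps, r2, threshold=0.8):
    """Greedily choose tag SNPs so every SNP has r^2 >= threshold with at least one tag.

    r2 maps unordered SNP pairs (a, b) to pairwise r^2; missing pairs count as 0.
    """
    def r2_between(a, b):
        if a == b:
            return 1.0  # a SNP always tags itself
        return r2.get((a, b), r2.get((b, a), 0.0))

    untagged = set(snps)
    tags = []
    while untagged:
        # Pick the SNP that captures the most still-untagged SNPs (first one wins ties).
        best = max(snps, key=lambda s: sum(r2_between(s, u) >= threshold for u in untagged))
        tags.append(best)
        untagged -= {u for u in untagged if r2_between(best, u) >= threshold}
    return tags

# Hypothetical SNP names and r^2 values from one population's LD data
snps = ["s1", "s2", "s3", "s4", "s5"]
r2 = {("s1", "s2"): 0.95, ("s1", "s3"): 0.85, ("s2", "s3"): 0.90, ("s4", "s5"): 0.50}
print(greedy_tag_snps(snps, r2))  # ['s1', 's4', 's5']
```

Here s1, s2 and s3 sit on one haplotype, so a single tag covers all three, while the weakly correlated s4 and s5 each need their own tag. Because r² differs between populations, the same procedure can yield different tag sets for CEU, YRI, JPT or CHB data.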
• Linkage disequilibrium (LD) means that sites of genetic variation can serve as “markers” for larger chromosomal segments.
• Correlation between markers is quantified with r² and D'.
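The two LD measures can be computed from haplotype frequencies using the standard definitions: D' = D / D_max, where D_max is the largest |D| allowed by the allele frequencies, and r² = D² / (p_A(1 − p_A)·p_B(1 − p_B)). The sketch assumes two bi-allelic loci with alleles A/a and B/b; the haplotype counts are hypothetical.

```python
def ld_stats(haplotype_counts):
    """Return (D, D', r^2) for two bi-allelic loci with alleles A/a and B/b.

    haplotype_counts maps "AB", "Ab", "aB", "ab" to observed counts.
    D' is returned with the sign of D; it is often reported as |D'|.
    """
    total = sum(haplotype_counts.values())
    n_AB = haplotype_counts.get("AB", 0)
    n_Ab = haplotype_counts.get("Ab", 0)
    n_aB = haplotype_counts.get("aB", 0)
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    d = p_AB - p_A * p_B

    # D' scales D by the largest value it could take given these allele frequencies
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = d / d_max if d_max > 0 else 0.0

    # r^2 is the squared correlation between the alleles at the two loci
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2

print(ld_stats({"AB": 45, "Ab": 5, "aB": 5, "ab": 45}))   # D' about 0.8, r^2 about 0.64
print(ld_stats({"AB": 50, "Ab": 0, "aB": 20, "ab": 30}))  # D' about 1.0, r^2 about 0.43
```

The second example shows why the two measures differ: D' reaches 1 whenever one of the four haplotypes is absent, but r² reaches 1 only when just two haplotypes remain and the allele frequencies match. A high r² is what lets one SNP stand in for another on a chip.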
GWAS identify novel disease loci, but additional localization is often necessary
http://scienceeducation.nih.gov/newsnapshots/TOC_Chips/Chips_RITN/How_Chips_Work_1/how_chips_work_1.html
Kang et al. The American Journal of Human Genetics, Volume 74, Issue 3 (2004), pp. 495–510
• Genetic material is transmitted across generations in blocks called haplotypes.
• Linkage disequilibrium and haplotype blocks allow for SNP tagging approaches that enable GWAS chips to capture common genetic variation with a subset of genetic markers.
• Haplotype structure varies across ancestral groups.
• The HapMap project catalogs human genetic variation and LD structure across populations.