Bayesian Variable Selection in
Semiparametric Regression
Modeling with Applications to
Genetic Mapping
Fei Zou
Department of Biostatistics
University of North Carolina-Chapel Hill
Email: fzou@bios.unc.edu
June 2012 Finland
http://www.cs.unc.edu/Courses/comp590-090-f06/Slides/CSclass_Threadgill.ppt
The Central Dogma of Molecular Biology
• Is there a significant difference in genotype distributions between tall and short individuals?
• Copied (with modifications) from http://psb.stanford.edu/psb06/presentations/association_mapping.pdf
Mendel’s Experiment
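Experimental Crosses: F2
[Figure: F2 intercross. Parental lines P1 and P2 (genotypes AA and BB) are crossed to produce F1 (AB); F1 × F1 produces F2 progeny with genotypes AA, AB and BB.]
Experimental Crosses: Backcross (BC)
[Figure: backcross. F1 (AB) is crossed back to a parental line, producing BC progeny with genotypes AA and AB.]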
F2 Data Format
0: homozygous AA, 1: heterozygous AB,
2: homozygous BB.
Data Structure
• For each subject i (i=1,2,…,n)
– Phenotype: y_i
– Genotypes: x_ij (coded as 0, 1, 2 for genotypes
AA, AB and BB, respectively) at marker j
(j=1,2,…,m)
– Genetic map: locations of markers
– Other non-genetic covariates, such as age,
sex, environmental conditions
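The data structure above can be sketched as a small container. This is only an illustration: the class name CrossData, its field names, the use of centimorgans for the map, and the example values are all assumptions, not part of the original slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Genotype coding from the "F2 Data Format" slide.
GENOTYPE_CODES = {"AA": 0, "AB": 1, "BB": 2}


@dataclass
class CrossData:
    """Hypothetical container for one experimental cross."""
    phenotype: List[float]            # y_i for subjects i = 1..n
    genotypes: List[List[int]]        # x_ij, n rows by m markers, coded 0/1/2
    marker_positions: List[float]     # genetic map (assumed to be in cM)
    covariates: Dict[str, List[float]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        n, m = len(self.phenotype), len(self.marker_positions)
        if len(self.genotypes) != n:
            raise ValueError("one genotype row is needed per subject")
        for row in self.genotypes:
            if len(row) != m:
                raise ValueError("one genotype is needed per marker")
            if any(g not in (0, 1, 2) for g in row):
                raise ValueError("genotypes must be coded 0, 1 or 2")
        for name, values in self.covariates.items():
            if len(values) != n:
                raise ValueError(f"covariate {name!r} has the wrong length")


def encode(labels: List[str]) -> List[int]:
    """Translate genotype labels such as 'AB' into the 0/1/2 coding."""
    return [GENOTYPE_CODES[g] for g in labels]


# Made-up example: 3 subjects, 3 markers, one non-genetic covariate.
data = CrossData(
    phenotype=[10.2, 11.5, 9.8],
    genotypes=[encode(["AA", "AB", "BB"]),
               encode(["AB", "AB", "BB"]),
               encode(["AA", "AA", "AB"])],
    marker_positions=[0.0, 12.5, 30.0],
    covariates={"sex": [0, 1, 0]},
)
```

Keeping the genotypes as an n-by-m matrix matches the x_ij notation, so column j holds marker j for all subjects.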
Locations of markers
Linkage Analysis
• Quantitative trait locus (QTL): a region of the
genome containing one or more genes that
are associated with the trait being assayed or
measured
QTL Mapping of Experimental Crosses
• Single QTL mapping
– Single marker analysis
– Interval mapping: Lander & Botstein (1989, Genetics)
• Multiple QTL mapping
– Composite interval mapping
– Multiple interval mapping
– Bayesian analysis
Single Marker Analysis
Correlations of marker genotypes in experimental crosses
Interval Mapping
• Traditional QTL mapping method
• Treat QTL position as unknown and use marker
genotypes to infer conditional probabilities of QTL
genotypes
• Profile LOD scores calculated across whole genome
– LOD score is a measure for strength of support for QTL
– LOD = LRT/(2 ln 10) ≈ LRT/4.6
– In any region where the profile exceeds a (genome-wide)
significance threshold, a QTL is declared at the position with the
highest LOD score.
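A LOD profile and the peak-calling rule can be sketched as follows. This is a simplification under stated assumptions: the scan is done only at marker positions with an additive normal regression (so it is a single marker scan, not full interval mapping with inferred QTL genotypes between markers), the data are made up, and the threshold of 3.0 is a hypothetical placeholder for a genome-wide threshold that would normally come from, for example, a permutation test.

```python
import math
from typing import List, Sequence


def rss_null(y: Sequence[float]) -> float:
    """Residual sum of squares for the no-QTL model (intercept only)."""
    mean = sum(y) / len(y)
    return sum((v - mean) ** 2 for v in y)


def rss_marker(y: Sequence[float], x: Sequence[int]) -> float:
    """Residual sum of squares after regressing y on one 0/1/2 marker."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0:  # marker shows no variation, so it explains nothing
        return rss_null(y)
    return rss_null(y) - sxy ** 2 / sxx


def lod_score(y: Sequence[float], x: Sequence[int]) -> float:
    """LOD = LRT / (2 ln 10); for a normal model LRT = n ln(RSS0 / RSS1).

    Assumes the marker does not fit the phenotype perfectly (RSS1 > 0).
    """
    lrt = len(y) * math.log(rss_null(y) / rss_marker(y, x))
    return lrt / (2 * math.log(10))


def lod_scan(y: Sequence[float], genotypes: Sequence[Sequence[int]]) -> List[float]:
    """LOD score at every marker (column) of the genotype matrix."""
    m = len(genotypes[0])
    return [lod_score(y, [row[j] for row in genotypes]) for j in range(m)]


# Made-up data: 8 subjects, 3 markers; the trait depends on marker index 1.
genotypes = [[0, 0, 1], [0, 1, 2], [1, 1, 0], [1, 2, 1],
             [2, 2, 2], [2, 1, 0], [1, 0, 2], [0, 2, 1]]
noise = [0.1, -0.2, 0.15, -0.1, 0.05, 0.2, -0.15, -0.05]
y = [10 + 2 * row[1] + e for row, e in zip(genotypes, noise)]

THRESHOLD = 3.0  # hypothetical genome-wide significance threshold
lods = lod_scan(y, genotypes)
peak = max(range(len(lods)), key=lods.__getitem__)
qtl_declared = lods[peak] > THRESHOLD
```

The QTL is declared at the position with the highest LOD score inside the region where the profile exceeds the threshold, which here is a single marker.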
[Figure: Profile LOD. LOD score (y-axis, 0 to 8) plotted against chromosome (x-axis, chromosomes 1 to 19 and X).]
QTL
• Old belief: one trait, one gene
– very unlikely
• Most traits have a significant
environmental exposure component
• The vast majority of biological traits are
caused by complex polygenic interactions
– also context dependent
Multiple QTL Mapping
• Most complicated traits are caused by
multiple (potentially interacting) genes,
which also interact with environmental
stimuli
• Single QTL interval mapping, when multiple QTLs affect the trait:
– Ghost QTL
– Low power
Two QTL Data
Two QTL with opposite effects
Two QTL with effects in same direction
Multiple QTL Mapping
• Available Methods
– Composite interval mapping: searching for a putative QTL in a
given region while simultaneously fitting partial regression
coefficients for "background markers" to adjust the effects of
other QTLs outside the region
• Open choices: which background markers to include, window size, etc.
– Multiple interval mapping: fitting multiple QTLs simultaneously
• Computationally very intensive; how many QTLs to fit?
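The idea behind composite interval mapping can be sketched as a test of one marker with background markers as regression covariates. This is a simplified illustration, not the published procedure: it tests at marker positions only, uses every marker outside the window as background (a real analysis would select a subset), assumes the design matrix is not collinear, and runs on made-up data. The function names and the window of 15 map units are hypothetical.

```python
import math
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def rss(y: Sequence[float], columns: List[List[float]]) -> float:
    """Residual sum of squares of y regressed on an intercept plus columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(d[a] * d[b] for d in design) for b in range(p)] for a in range(p)]
    xty = [sum(design[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = solve(xtx, xty)
    return sum((y[i] - sum(d * bb for d, bb in zip(design[i], beta))) ** 2
               for i in range(n))


def cim_lod(y, genotypes, positions, test: int, window: float) -> float:
    """LOD for the test marker, adjusting for markers outside the window."""
    def col(j: int) -> List[float]:
        return [float(row[j]) for row in genotypes]
    background = [col(j) for j in range(len(positions))
                  if j != test and abs(positions[j] - positions[test]) > window]
    rss0 = rss(y, background)                 # background markers only
    rss1 = rss(y, background + [col(test)])   # background plus test marker
    return (len(y) / 2) * math.log10(rss0 / rss1)


# Made-up data: 10 subjects, 4 markers; markers 0 and 2 affect the trait.
genotypes = [[0, 0, 1, 2], [0, 1, 2, 2], [1, 1, 0, 0], [1, 1, 1, 0],
             [2, 2, 2, 1], [2, 1, 0, 1], [1, 0, 2, 1], [0, 0, 0, 1],
             [2, 2, 1, 2], [1, 2, 1, 0]]
positions = [0.0, 10.0, 40.0, 60.0]
noise = [0.1, -0.1, 0.05, -0.05, 0.2, -0.2, 0.1, -0.1, 0.0, 0.05]
y = [10 + 1.5 * row[0] - 1.0 * row[2] + e for row, e in zip(genotypes, noise)]

lods = [cim_lod(y, genotypes, positions, test=j, window=15.0) for j in range(4)]
```

Markers inside the window are left out of the background set so that they do not absorb the effect of a QTL near the test position; the window size and the choice of background markers are exactly the open choices listed above.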
Bayesian QTL Mapping
• Reversible jump Markov chain Monte Carlo
(MCMC) (Green 1995): treat the number of
QTLs as a parameter
– The model dimension changes between moves, and in practice the
acceptance probability for such dimension-changing moves may not be
handled correctly (Ven 2004)
• Bayesian variable selection procedures
– composite model space (Yi 2004)
– stochastic search variable selection (SSVS) (George and
McCulloch 1993)
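A minimal sketch of SSVS applied to marker selection follows. It assumes the spike-and-slab normal mixture prior of George and McCulloch (1993), beta_j ~ (1 - gamma_j) N(0, tau^2) + gamma_j N(0, c^2 tau^2), with independent inclusion indicators gamma_j. The hyperparameter values, the one-coefficient-at-a-time Gibbs updates, the weak inverse-gamma prior on the residual variance, the centring used in place of an intercept, and the simulated data are all illustrative assumptions, not the method presented in these slides.

```python
import math
import random
from typing import List, Sequence


def inclusion_prob(beta_j: float, tau: float, c: float, p: float) -> float:
    """P(gamma_j = 1 | beta_j): slab N(0, (c*tau)^2) versus spike N(0, tau^2)."""
    log_odds = (math.log(p / (1 - p)) - math.log(c)
                + 0.5 * (beta_j / tau) ** 2 * (1 - 1 / c ** 2))
    return 1.0 / (1.0 + math.exp(-log_odds))


def ssvs(y: Sequence[float], x: Sequence[Sequence[int]], n_iter: int = 1000,
         burn_in: int = 200, tau: float = 0.1, c: float = 10.0,
         p: float = 0.5, seed: int = 0) -> List[float]:
    """Gibbs sampler; returns the posterior inclusion frequency per marker."""
    rng = random.Random(seed)
    n, m = len(y), len(x[0])
    ybar = sum(y) / n
    resid = [v - ybar for v in y]          # centred y minus X*beta (beta = 0)
    cols = []
    for j in range(m):                     # centre each marker column
        col = [row[j] for row in x]
        mean = sum(col) / n
        cols.append([v - mean for v in col])
    xtx = [sum(v * v for v in col) for col in cols]
    beta, gamma, sigma2 = [0.0] * m, [0] * m, 1.0
    counts = [0] * m
    for it in range(n_iter):
        for j in range(m):
            col = cols[j]
            prior_var = (c * tau) ** 2 if gamma[j] else tau ** 2
            post_var = 1.0 / (xtx[j] / sigma2 + 1.0 / prior_var)
            # x_j' (residual with marker j's own contribution added back)
            xr = sum(col[i] * (resid[i] + col[i] * beta[j]) for i in range(n))
            new_beta = rng.gauss(post_var * xr / sigma2, math.sqrt(post_var))
            delta = new_beta - beta[j]
            for i in range(n):
                resid[i] -= col[i] * delta
            beta[j] = new_beta
            gamma[j] = int(rng.random() < inclusion_prob(new_beta, tau, c, p))
        rss = sum(r * r for r in resid)
        # sigma^2 | rest is inverse gamma; an IG(0.01, 0.01) prior is assumed
        sigma2 = 1.0 / rng.gammavariate(0.01 + n / 2.0, 1.0 / (0.01 + rss / 2.0))
        if it >= burn_in:
            for j in range(m):
                counts[j] += gamma[j]
    return [cnt / (n_iter - burn_in) for cnt in counts]


# Simulated F2-like data (1:2:1 genotype ratios); only marker 0 has an effect.
sim = random.Random(1)
x = [[sim.choice([0, 1, 1, 2]) for _ in range(5)] for _ in range(40)]
y = [5.0 + 2.0 * row[0] + sim.gauss(0, 0.5) for row in x]
freq = ssvs(y, x)
```

Unlike reversible jump MCMC, every coefficient stays in the model at every iteration, so the dimension never changes; selection is read off from how often each gamma_j equals 1 after burn-in.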