1 - r - Barley World

Basic QTL Analysis
Is there an association between marker genotype and quantitative trait phenotype?
- Classify progeny by marker genotype
- Compare phenotypic mean between classes (t-test or ANOVA)
- Significance = marker linked to QTL
- Difference between means = estimate of QTL effect
g = genotypic effect
µ1 = trait mean for genotypic class AA
µ2 = trait mean for genotypic class aa
g = (µ1 - µ2)/2
[Figure: trait value y plotted against genotypic classes aa and AA (x-axis), with regression intercept βo]
Notations for single-QTL models in backcross, DH, and F2 populations

Model                  Genotype   Value   Genetic effect
Backcross (Qq x QQ)    QQ         µ1      g = 0.5(µ1 - µ2)
                       Qq         µ2
DH (qq x QQ)           QQ         µ1      g = 0.5(µ1 - µ2)
                       qq         µ2
F2 (Qq x Qq)           QQ         µ1      Additive: a = 0.5(µ1 - µ3)
                       Qq         µ2      Dominance: d = 0.5(2µ2 - µ1 - µ3)
                       qq         µ3
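The effect formulas in the notation table can be tried on numbers. The following is a minimal sketch in Python; the function names and the class means are hypothetical and chosen only for illustration.

```python
def two_class_effect(mu1, mu2):
    """Genetic effect g for a two-class population (backcross or DH)."""
    return 0.5 * (mu1 - mu2)


def f2_effects(mu_QQ, mu_Qq, mu_qq):
    """Additive (a) and dominance (d) effects from F2 genotypic class means."""
    a = 0.5 * (mu_QQ - mu_qq)
    d = 0.5 * (2 * mu_Qq - mu_QQ - mu_qq)
    return a, d


# Hypothetical class means, not taken from any real data set
g = two_class_effect(4.0, 2.0)
a, d = f2_effects(10.0, 9.0, 6.0)
print("g =", g)
print("a =", a, "d =", d)
```

With these made-up means the heterozygote lies above the midpoint of the two homozygotes, so d is positive.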
Single-marker analysis
• How it works
– Finds associations between marker genotype and trait value
$$y_j = \mu + f(A) + \varepsilon_j$$
yj = trait value for the jth individual in the population
μ = population mean
f(A) = function of marker genotype
εj = residual associated with the jth individual
[Figure: marker A and putative QTL Q, separated by recombination fraction r]
r = recombination fraction
• When to use
– Order of markers unknown or incomplete maps
– Quick scan
– Find best possible QTLs
– Identify missing or incorrectly formatted data
• Limitations
– Underestimates QTL number and effects
– QTL position cannot be precisely determined
Single-marker analysis in backcross progeny
• Parents: AAQQ x aaqq
• Backcross: aaqq x AaQq, or AaQq x AAQQ
• BC Progeny

Backcross to aaqq   Backcross to AAQQ   Expected frequency
AaQq                AAQQ                0.5(1 - r)
Aaqq                AAQq                0.5r
aaQq                AaQQ                0.5r
aaqq                AaQq                0.5(1 - r)

r is the recombination frequency between A and Q
Expected QTL genotypic frequencies conditional on marker genotypes

Marker     Observed   Marginal    QTL genotype            Expected trait
genotype   count      frequency   QQ          Qq          value
Joint frequency
AA         n1         0.5         0.5(1-r)    0.5r
Aa         n2         0.5         0.5r        0.5(1-r)
Conditional frequency
AA         n1         0.5         1-r         r           (1-r)µ1 + rµ2
Aa         n2         0.5         r           1-r         rµ1 + (1-r)µ2
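The step from joint to conditional frequencies, and on to the expected trait value of each marker class, can be checked numerically. This is a minimal sketch; the function names and the values r = 0.1, µ1 = 12 and µ2 = 8 are assumptions made for illustration.

```python
def conditional_qtl_freqs(r):
    """P(QTL genotype | marker genotype) in a backcross, from the joint frequencies."""
    joint = {
        "AA": {"QQ": 0.5 * (1 - r), "Qq": 0.5 * r},
        "Aa": {"QQ": 0.5 * r, "Qq": 0.5 * (1 - r)},
    }
    cond = {}
    for marker, row in joint.items():
        marginal = sum(row.values())  # 0.5 for each marker class
        cond[marker] = {qtl: p / marginal for qtl, p in row.items()}
    return cond


def expected_marker_means(r, mu1, mu2):
    """Expected trait value of each marker class (mu1 for QQ, mu2 for Qq)."""
    cond = conditional_qtl_freqs(r)
    return {m: cond[m]["QQ"] * mu1 + cond[m]["Qq"] * mu2 for m in cond}


# Hypothetical values for illustration only
means = expected_marker_means(r=0.1, mu1=12.0, mu2=8.0)
diff = means["AA"] - means["Aa"]
print(means, diff)
```

The difference between the two marker class means equals (1 - 2r)(µ1 - µ2), so it shrinks toward zero as r approaches 0.5.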
Single-marker analysis
[Figure: marker A and putative QTL Q, separated by recombination fraction r]
- Simple t-test
- Analysis of variance
- Linear regression
- Likelihood
Simple t-test using backcross progeny
H0: [μAa - μaa] = 0, i.e. (a + d) = 0 or r = 0.5
Model:
$$Y_{j(i)k} = \mu + M_i + g(M)_{j(i)} + e_{i(j)k}$$
Test statistic with pooled variance:
$$t_M = \frac{\hat{\mu}_{Aa} - \hat{\mu}_{aa}}{\sqrt{\hat{s}_M^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}$$
Test statistic with separate class variances:
$$t_M = \frac{\hat{\mu}_{Aa} - \hat{\mu}_{aa}}{\sqrt{\frac{\hat{s}_{Aa}^2}{n_1} + \frac{\hat{s}_{aa}^2}{n_2}}}$$
t-distribution with df = N – 2
Yj(i)k = trait value for individual j with genotype i in replication k
μ = population mean
Mi = effect of the marker genotype
g(M)j(i) = genotypic effect which cannot be explained by the marker genotype
ei(j)k = error term
µAa = trait mean for genotypic class Aa
µaa = trait mean for genotypic class aa
s2M = pooled variance within the two classes
If tM is significant, then a QTL is declared to be near the marker
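The pooled-variance t statistic can be computed with the Python standard library alone. This is a minimal sketch; the function name `pooled_t` and the trait values are hypothetical, and comparing t against a critical value of the t-distribution with N - 2 df is left to a statistics table or package.

```python
import math
from statistics import mean, variance


def pooled_t(class_Aa, class_aa):
    """Return (t_M, df) for the difference between two marker class means."""
    n1, n2 = len(class_Aa), len(class_aa)
    # Pooled within-class variance s2_M
    s2_m = ((n1 - 1) * variance(class_Aa) + (n2 - 1) * variance(class_aa)) / (n1 + n2 - 2)
    t_m = (mean(class_Aa) - mean(class_aa)) / math.sqrt(s2_m * (1 / n1 + 1 / n2))
    return t_m, n1 + n2 - 2


# Hypothetical trait values for the two marker classes
trait_Aa = [4.1, 3.9, 4.3, 3.7]
trait_aa = [2.2, 1.8, 2.1, 1.9]
t_m, df = pooled_t(trait_Aa, trait_aa)
print(round(t_m, 3), df)
```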
Analysis of variance using backcross progeny
H0: [μAa - μaa] = 0, i.e. (a + d) = 0 or r = 0.5

Source           df         MS (Mean Square)   Expected MS
Total Genetics   N - 1      MSG                $\sigma_e^2 + b\sigma_G^2$
Marker           1          MSM                $\sigma_e^2 + b[\sigma_{G(QTL)}^2 + 4r(1-r)a^2] + bc(1-2r)^2 a^2$
G(Marker)        N - 2      MSG(M)             $\sigma_e^2 + b[\sigma_{G(QTL)}^2 + 4r(1-r)a^2]$
Residual         N(b - 1)   MSE                $\sigma_e^2$

$$F = \frac{MSM}{MSG(M)}$$
F-distribution with 1 and N – 2 df
If F is significant, then a QTL is declared to be near the marker
F = t² when the numerator df is 1
N = no. of individuals in pop.
b = no. of replications
r = recombination fraction
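With two marker classes, the ANOVA F statistic and the pooled t statistic carry the same information. This minimal sketch checks that numerically, assuming one observation (or one line mean) per individual so that the within-class mean square plays the role of MSG(M); the function names and data are hypothetical.

```python
import math
from statistics import mean, variance


def anova_f(class_Aa, class_aa):
    """One-way ANOVA F for two marker classes: MS(marker) / MS(within classes)."""
    groups = (class_Aa, class_aa)
    grand_mean = mean(class_Aa + class_aa)
    ss_marker = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)
    df_marker = 1
    df_within = len(class_Aa) + len(class_aa) - 2
    return (ss_marker / df_marker) / (ss_within / df_within)


def pooled_t(class_Aa, class_aa):
    """Pooled-variance t statistic for the same two classes."""
    n1, n2 = len(class_Aa), len(class_aa)
    s2 = ((n1 - 1) * variance(class_Aa) + (n2 - 1) * variance(class_aa)) / (n1 + n2 - 2)
    return (mean(class_Aa) - mean(class_aa)) / math.sqrt(s2 * (1 / n1 + 1 / n2))


# Hypothetical trait values
trait_Aa = [4.1, 3.9, 4.3, 3.7]
trait_aa = [2.2, 1.8, 2.1, 1.9]
f_stat = anova_f(trait_Aa, trait_aa)
t_stat = pooled_t(trait_Aa, trait_aa)
print(f_stat, t_stat ** 2)
```

The two printed values agree up to rounding error, which is the sense in which F equals the square of t when the numerator has 1 df.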
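Analysis of variance using SAS
(A simple example)
/* One record per individual; Marker1 and Marker2 are character variables */
data a;
input Individuals Trait1 Marker1 $ Marker2 $;
cards;
1 1.57 A B
2 1.35 B A
3 10.7 B B
…
;
/* Fit both markers as class effects and report least-squares means */
proc glm data=a;
class Marker1 Marker2;
model Trait1 = Marker1 Marker2;
lsmeans Marker1 Marker2;
run;
quit;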
Linear regression using backcross progeny
$$y_j = \beta_0 + \beta_1 x_j + \varepsilon_j$$
H0: [μAa - μaa] = 0, i.e. (a + d) = 0 or r = 0.5
R2: percent of the phenotypic variance explained by the QTL
yj = trait value for the jth individual
xj = dummy variable
βo = intercept for the regression
β1 = slope for the regression
εj = random error
Dummy variables:
aa = -1
Aa = 1
[Figure: regression line of trait value y on genotypic classes aa (x = -1) and Aa (x = 1), with intercept βo at x = 0 and slope β1]
Expectations:
E(βo) = 0.5 (µAa + µaa) = Mean for the trait
E(β1) = 0.5 (1 - 2r) (µAa - µaa) = (1 - 2r) g = 0.5 (a + d) (1 - 2r)
Linear regression using backcross progeny
Interpretation of results depends on coding of the dummy variables
[Figure, two panels: trait value y plotted against genotypic classes aa and Aa (x-axis)]
Panel 1: y = 3 + x + e; µ = 3, µAa = 4, µaa = 2, g = 0.5(µAa - µaa) = 1
Panel 2: y = 3 - x + e; µ = 3, µAa = 2, µaa = 4, g = 0.5(µAa - µaa) = -1
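The effect of the dummy coding can be seen by fitting the same data twice. This is a minimal sketch of ordinary least squares on a ±1 dummy variable; the function name, the second coding and the trait values (class means 4 for Aa and 2 for aa) are assumptions made for illustration.

```python
from statistics import mean


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical backcross data: class mean 4 for Aa and 2 for aa
genotypes = ["Aa", "Aa", "aa", "aa"]
trait = [4.5, 3.5, 2.5, 1.5]

coding_1 = {"aa": -1, "Aa": 1}   # coding used on the slide
coding_2 = {"aa": 1, "Aa": -1}   # reversed coding, for comparison

fit_1 = ols_fit([coding_1[g] for g in genotypes], trait)
fit_2 = ols_fit([coding_2[g] for g in genotypes], trait)
print(fit_1)
print(fit_2)
```

The intercept is the same under both codings, while the slope changes sign. The sign of the estimated effect therefore has to be read together with the coding.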
A likelihood approach using backcross progeny
Joint distribution function:
$$L = \frac{1}{(2\pi\sigma^2)^{N/2}} \prod_{i=1}^{N} \sum_{j=1}^{2} p(Q_j \mid M_i) \exp\left[ -\frac{(y_i - \mu_j)^2}{2\sigma^2} \right]$$
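A likelihood approach using backcross progeny (cont.)
$$\ln L(\mu_1, \mu_2, \sigma^2, r) = \sum_{i=1}^{N} \ln \left\{ \sum_{j=1}^{2} p(Q_j \mid M_i) \exp\left[ -\frac{(y_i - \mu_j)^2}{2\sigma^2} \right] \right\} - \frac{N}{2} \ln(2\pi\sigma^2)$$
$$\ln L(\mu_1 = \mu_2 = \mu, \sigma^2) = -\frac{1}{2\sigma^2} \sum_{i=1}^{N} (y_i - \mu)^2 - \frac{N}{2} \ln(2\pi\sigma^2)$$
$$\ln L(r = 0.5) = \sum_{i=1}^{N} \ln \left\{ \frac{1}{2} \exp\left[ -\frac{(y_i - \mu_1)^2}{2\sigma^2} \right] + \frac{1}{2} \exp\left[ -\frac{(y_i - \mu_2)^2}{2\sigma^2} \right] \right\} - \frac{N}{2} \ln(2\pi\sigma^2)$$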
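The log-likelihood can be written as a short function. This is a minimal sketch, assuming marker classes labelled "AA" and "Aa" with conditional QTL frequencies (1 - r, r) and (r, 1 - r); the function names and the data are hypothetical, and no maximization over the parameters is attempted here.

```python
import math


def log_likelihood(y, markers, mu1, mu2, sigma2, r):
    """Mixture log-likelihood for backcross progeny scored at one marker."""
    # P(QTL genotype | marker genotype), ordered as (P(QQ), P(Qq))
    cond = {"AA": (1 - r, r), "Aa": (r, 1 - r)}
    total = -0.5 * len(y) * math.log(2 * math.pi * sigma2)
    for yi, m in zip(y, markers):
        p1, p2 = cond[m]
        total += math.log(
            p1 * math.exp(-((yi - mu1) ** 2) / (2 * sigma2))
            + p2 * math.exp(-((yi - mu2) ** 2) / (2 * sigma2))
        )
    return total


def null_log_likelihood(y, mu, sigma2):
    """Log-likelihood when both QTL classes share one mean (mu1 = mu2 = mu)."""
    ss = sum((yi - mu) ** 2 for yi in y)
    return -ss / (2 * sigma2) - 0.5 * len(y) * math.log(2 * math.pi * sigma2)


# Hypothetical data for illustration only
y = [1.0, 2.0, 3.0, 2.5]
markers = ["AA", "Aa", "AA", "Aa"]
print(log_likelihood(y, markers, mu1=2.5, mu2=1.5, sigma2=1.0, r=0.2))
print(null_log_likelihood(y, mu=2.0, sigma2=1.0))
```

When mu1 and mu2 are set equal, the mixture collapses to a single normal distribution and the first function returns the same value as the second for any r.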
A likelihood approach using backcross progeny (cont.)
G-statistics (Weller, 1986)
H0: [μAa - μaa] = 0, i.e. (a + d) = 0 or r = 0.5
Likelihood ratio test statistics (LR)
$$G = 2\left[ \ln L(\hat{\mu}_{Aa}, \hat{\mu}_{aa}, \hat{\sigma}^2, \hat{r}) - \ln L(r = 0.5) \right]$$
ln L(r = 0.5): probability of occurrence of the data under the null hypothesis
G is distributed asymptotically as a chi-square variable with one degree of freedom
$$G = 2\left[ \ln L(\hat{\mu}_{Aa}, \hat{\mu}_{aa}, \hat{\sigma}^2, \hat{r}) - \ln L(\mu_{Aa} = \mu_{aa} = \mu) \right]$$
The t-test is approximately equivalent to the likelihood ratio test using this formula
LOD score
LOD: logarithm of the odds ratio
Base-10 logarithm of the likelihood ratio (G is twice its natural logarithm)
LR = 2 ln(10) LOD = 4.605 LOD
LOD = 0.217 LR
LOD is interpreted as an odds ratio
(probability of observing the data under linkage / probability of observing the same data under no linkage)
No theoretical distribution is needed to interpret a LOD score
Key value: ≥ 3 (H1 is 1000 times more likely than H0, no linkage)
(approx: p = 0.001)
p = probability of type I error
Type I error: false positive (declare a QTL when there is no QTL)
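The conversion between the likelihood ratio statistic and the LOD score is a change of logarithm base. This is a minimal sketch; the function names are hypothetical.

```python
import math


def lod_from_lr(lr):
    """Convert a likelihood ratio statistic (2 * natural-log ratio) to a LOD score."""
    return lr / (2 * math.log(10))


def lr_from_lod(lod):
    """Convert a LOD score to the likelihood ratio statistic."""
    return 2 * math.log(10) * lod


print(round(lr_from_lod(1.0), 3))   # factor quoted as 4.605
print(round(lod_from_lr(1.0), 3))   # factor quoted as 0.217
print(round(lr_from_lod(3.0), 2))   # LR corresponding to the key value LOD = 3
```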
G-Statistics and LOD score
Single-marker analysis
Summary
• Identify marker-trait associations
• Identify missing or incorrectly formatted data
• Genetic map is not required
• Divide the population into subpopulations based on the allelic segregation of individual loci (one marker at a time)
• Get trait means for each subpopulation (genotypic class)
• Determine if the subpopulations' trait means are significantly different
• Limitations
– Underestimates QTL number and effects
– QTL position cannot be precisely determined