Basic QTL Analysis

Is there an association between marker genotype and quantitative trait phenotype?
- Classify progeny by marker genotype
- Compare the phenotypic means of the classes (t-test or ANOVA)
- Significant difference = marker linked to a QTL
- Difference between means = estimate of the QTL effect

g = genotypic effect: $g = (\mu_1 - \mu_2)/2$
$\mu_1$ = trait mean for genotypic class AA
$\mu_2$ = trait mean for genotypic class aa
[Figure: trait value y plotted against genotypic class x (aa, AA), with regression intercept β0]

Notations for single-QTL models in backcross, doubled-haploid (DH), and F2 populations

| Model | Genotype: value | Genetic effect |
|---|---|---|
| Backcross (Qq × QQ) | QQ: $\mu_1$; Qq: $\mu_2$ | $g = 0.5(\mu_1 - \mu_2)$ |
| DH (qq × QQ) | QQ: $\mu_1$; qq: $\mu_2$ | $g = 0.5(\mu_1 - \mu_2)$ |
| F2 (Qq × Qq) | QQ: $\mu_1$; Qq: $\mu_2$; qq: $\mu_3$ | Additive: $a = 0.5(\mu_1 - \mu_3)$; dominance: $d = 0.5(2\mu_2 - \mu_1 - \mu_3)$ |

Single-marker analysis
• How it works
  – Finds associations between marker genotype and trait value: $y_j = \mu + f(A)_j + \varepsilon_j$
  [Figure: marker A and putative QTL Q separated by recombination fraction r]
• When to use
  – Marker order unknown or map incomplete
  – Quick scan
  – Find the best possible QTLs
  – Identify missing or incorrectly formatted data
• Limitations
  – Underestimates QTL number and effects
  – QTL position cannot be precisely determined

r = recombination fraction
$y_j$ = trait value for the jth individual in the population
µ = population mean
f(A) = function of marker genotype
$\varepsilon_j$ = residual associated with the jth individual

Single-marker analysis in backcross progeny
• Parents: AAQQ × aaqq (F1: AaQq)
• Backcrosses: AaQq × aaqq and AaQq × AAQQ

| Expected frequency | Progeny of AaQq × aaqq | Progeny of AaQq × AAQQ |
|---|---|---|
| 0.5(1 - r) | AaQq | AAQQ |
| 0.5r | Aaqq | AAQq |
| 0.5r | aaQq | AaQQ |
| 0.5(1 - r) | aaqq | AaQq |

r = recombination frequency between A and Q

Expected QTL genotypic frequencies conditional on marker genotypes (AaQq × AAQQ backcross; $\mu_1$ = mean of QQ, $\mu_2$ = mean of Qq)

| Marker genotype | Observed count | Marginal frequency | Joint frequency: QQ | Joint frequency: Qq | Conditional frequency: QQ | Conditional frequency: Qq | Expected trait value |
|---|---|---|---|---|---|---|---|
| AA | n1 | 0.5 | 0.5(1 - r) | 0.5r | 1 - r | r | $(1 - r)\mu_1 + r\mu_2$ |
| Aa | n2 | 0.5 | 0.5r | 0.5(1 - r) | r | 1 - r | $r\mu_1 + (1 - r)\mu_2$ |

Single-marker analysis: methods
- Simple t-test
- Analysis of variance
- Linear regression
- Likelihood

Simple t-test
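using backcross progeny

H0: $\mu_{Aa} - \mu_{aa} = 0$, equivalently $(a + d) = 0$ or $r = 0.5$

Model: $Y_{j(i)k} = \mu + M_i + g(M)_{j(i)} + e_{j(i)k}$

Pooled-variance statistic (t-distribution with df = N - 2):
$$t_M = \frac{\hat{\mu}_{Aa} - \hat{\mu}_{aa}}{\hat{s}_M \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
Separate-variance form:
$$t_M = \frac{\hat{\mu}_{Aa} - \hat{\mu}_{aa}}{\sqrt{\frac{\hat{s}^2_{Aa}}{n_1} + \frac{\hat{s}^2_{aa}}{n_2}}}$$

$Y_{j(i)k}$ = trait value for individual j with marker genotype i in replication k
µ = population mean
$M_i$ = effect of marker genotype i
$g(M)_{j(i)}$ = genotypic effect that cannot be explained by the marker genotype
$e_{j(i)k}$ = error term
$\mu_{Aa}$ = trait mean for genotypic class Aa ($n_1$ individuals)
$\mu_{aa}$ = trait mean for genotypic class aa ($n_2$ individuals)
$\hat{s}^2_M$ = pooled variance within the two classes

If $t_M$ is significant, a QTL is declared to be near the marker.

Analysis of variance using backcross progeny

H0: $\mu_{Aa} - \mu_{aa} = 0$, equivalently $(a + d) = 0$ or $r = 0.5$

| Source | df | MS (mean square) | Expected MS |
|---|---|---|---|
| Total | Nb - 1 | | |
| Genetics | N - 1 | MSG | $\sigma^2_e + b\sigma^2_G$ |
| Marker | 1 | MSM | $\sigma^2_e + b\sigma^2_{G(QTL)} + bc(1 - 2r)^2a^2$ |
| G(Marker) | N - 2 | MSG(M) | $\sigma^2_e + b\sigma^2_{G(QTL)}$ |
| Residual | N(b - 1) | MSE | $\sigma^2_e$ |

with $\sigma^2_{G(QTL)} = 4r(1 - r)a^2$

$F = \dfrac{MSM}{MSG(M)}$ follows an F-distribution with 1 and N - 2 df.
If F is significant, a QTL is declared to be near the marker.
When the numerator has 1 df, $F = t^2$.

N = number of individuals in the population
b = number of replications
r = recombination fraction

Analysis of variance using SAS (a simple example)

    /* Marker genotypes are character values, hence the $ after their names */
    data a;
      input Individuals Trait1 Marker1 $ Marker2 $;
      cards;
    1 1.57 A B
    2 1.35 B A
    3 10.7 B B
    …
    ;
    run;

    /* Fit both markers as class effects on Trait1 and print least-squares means per marker class */
    proc glm data=a;
      class Marker1 Marker2;
      model Trait1 = Marker1 Marker2;
      lsmeans Marker1 Marker2;
    run;
    quit;

Linear regression using backcross progeny

$y_j = \beta_0 + \beta_1 x_j + \varepsilon_j$

H0: $\mu_{Aa} - \mu_{aa} = 0$, equivalently $(a + d) = 0$ or $r = 0.5$
R²: percentage of the phenotypic variance explained by the QTL

Dummy variables: aa = -1, Aa = 1
$y_j$ = trait value for the jth individual
$x_j$ = dummy variable
$\beta_0$ = intercept of the regression
$\beta_1$ = slope of the regression
$\varepsilon_j$ = random error
[Figure: y plotted against genotypic class x (aa = -1, Aa = 1), with intercept β0 and slope β1]

Expectations:
$E(\beta_0) = 0.5(\mu_{Aa} + \mu_{aa})$ = mean for the trait
$E(\beta_1) = 0.5(1 - 2r)(\mu_{Aa} - \mu_{aa}) = (1 - 2r)g = 0.5(a + d)(1 - 2r)$

Interpretation of the results depends on the coding of the dummy variables (here aa = -1, Aa = 1):
- $y = 3 + x + e$: µ = 3, $\mu_{Aa}$ = 4, $\mu_{aa}$ = 2, so $g = 0.5(\mu_{Aa} - \mu_{aa}) = 1$
- $y = 3 - x + e$: µ = 3, $\mu_{Aa}$ = 2, $\mu_{aa}$ = 4, so $g = 0.5(\mu_{Aa} - \mu_{aa}) = -1$
With this coding, the sign of the slope shows which genotypic class has the higher mean.
[Figure: two panels plotting y against genotypic class (aa, Aa), one for each equation]

A likelihood approach using backcross progeny

Joint distribution function:
$$L = \frac{1}{(2\pi\sigma^2)^{N/2}} \prod_{i=1}^{N} \sum_{j=1}^{2} p(Q_j \mid M_i) \exp\left[-\frac{(y_i - \mu_j)^2}{2\sigma^2}\right]$$

Log-likelihood:
$$\ln L(\mu_1, \mu_2, \sigma^2, r) = \sum_{i=1}^{N} \ln\left\{\sum_{j=1}^{2} p(Q_j \mid M_i) \exp\left[-\frac{(y_i - \mu_j)^2}{2\sigma^2}\right]\right\} - \frac{N}{2}\ln(2\pi\sigma^2)$$

Under no QTL effect ($\mu_1 = \mu_2 = \mu$):
$$\ln L(\mu_1 = \mu_2) = -\sum_{i=1}^{N} \frac{(y_i - \mu)^2}{2\sigma^2} - \frac{N}{2}\ln(2\pi\sigma^2)$$

Under no linkage ($r = 0.5$, so every $p(Q_j \mid M_i) = 0.5$):
$$\ln L(r = 0.5) = \sum_{i=1}^{N} \ln\left\{\frac{1}{2}\exp\left[-\frac{(y_i - \mu_1)^2}{2\sigma^2}\right] + \frac{1}{2}\exp\left[-\frac{(y_i - \mu_2)^2}{2\sigma^2}\right]\right\} - \frac{N}{2}\ln(2\pi\sigma^2)$$

$\mu_1$, $\mu_2$ = trait means of the two QTL genotypes; $p(Q_j \mid M_i)$ = conditional frequency of QTL genotype j given the marker genotype of individual i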
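The backcross log-likelihood can be evaluated directly from the conditional frequencies. The following is a minimal Python sketch, assuming the AaQq × AAQQ backcross of the conditional-frequency table, marker genotypes coded as the strings "AA" and "Aa", $\mu_1$ = mean of QQ and $\mu_2$ = mean of Qq. The function names are hypothetical. It only evaluates ln L for given parameter values; estimating $\mu_1$, $\mu_2$, $\sigma^2$ and r still requires maximizing it (for example with EM or a numerical optimizer), which is not shown.

```python
import math
from typing import Sequence


def qtl_given_marker(marker: str, r: float) -> tuple[float, float]:
    """Hypothetical helper: (P(QQ | marker), P(Qq | marker)) in the AaQq x AAQQ
    backcross, taken from the conditional-frequency table."""
    if marker == "AA":
        return (1.0 - r, r)
    if marker == "Aa":
        return (r, 1.0 - r)
    raise ValueError(f"unexpected marker genotype: {marker!r}")


def backcross_loglik(y: Sequence[float], markers: Sequence[str],
                     mu1: float, mu2: float, sigma2: float, r: float) -> float:
    """ln L(mu1, mu2, sigma2, r) for one marker and a putative QTL."""
    if len(y) != len(markers):
        raise ValueError("y and markers must have the same length")
    total = 0.0
    for yi, mi in zip(y, markers):
        weights = qtl_given_marker(mi, r)
        # log[p(Q_j | M_i)] - (y_i - mu_j)^2 / (2 sigma^2); zero weights
        # (r = 0 or r = 1) add nothing to the mixture and are skipped
        terms = [math.log(w) - (yi - mu) ** 2 / (2 * sigma2)
                 for w, mu in zip(weights, (mu1, mu2)) if w > 0]
        top = max(terms)  # log-sum-exp guards against underflow in exp()
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total - len(y) / 2 * math.log(2 * math.pi * sigma2)


def no_qtl_loglik(y: Sequence[float], mu: float, sigma2: float) -> float:
    """ln L(mu1 = mu2): a single normal distribution with mean mu."""
    return (-sum((yi - mu) ** 2 for yi in y) / (2 * sigma2)
            - len(y) / 2 * math.log(2 * math.pi * sigma2))
```

Two checks follow from the equations: with $\mu_1 = \mu_2$ the mixture collapses to ln L($\mu_1 = \mu_2$) for any r, and with r = 0.5 every weight is 0.5, which reproduces ln L(r = 0.5).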
G-statistics (Weller, 1986)

H0: $\mu_{Aa} - \mu_{aa} = 0$, equivalently $(a + d) = 0$ or $r = 0.5$

Likelihood ratio (LR) test statistic, comparing the probability of the data under the fitted model with its probability under the null hypothesis:
$$G = 2\left[\ln L(\hat{\mu}_{Aa}, \hat{\mu}_{aa}, \hat{\sigma}^2, \hat{r}) - \ln L(r = 0.5)\right]$$
G is distributed asymptotically as a chi-square variable with one degree of freedom.

$$G = 2\left[\ln L(\hat{\mu}_{Aa}, \hat{\mu}_{aa}, \hat{\sigma}^2, \hat{r}) - \ln L(\mu_{Aa} = \mu_{aa})\right]$$
The t-test is approximately equivalent to the likelihood ratio test using this formula.

LOD score
LOD: logarithm of the odds ratio, i.e., the base-10 logarithm of the likelihood ratio
$LR = 2\ln(10) \cdot LOD \approx 4.605\,LOD$; $LOD \approx 0.217\,LR$
LOD is interpreted as an odds ratio: probability of observing the data under linkage / probability of observing the same data under no linkage (on a log10 scale)
No theoretical distribution is needed to interpret a LOD score
Key value: LOD ≥ 3 (the data are 1000 times more likely under H1 than under H0, no linkage; approx. p = 0.001)
p = probability of a type I error
Type I error: false positive (declaring a QTL when there is none)

[Figure: G-statistics and LOD score]

Single-marker analysis: summary
• Identify marker-trait associations
• Identify missing or incorrectly formatted data
• Genetic map is not required
• Divide the population into subpopulations based on the allelic segregation of individual loci (one marker at a time)
• Get the trait mean of each subpopulation (genotypic class)
• Determine whether the subpopulation trait means differ significantly
• Limitations
  – Underestimates QTL number and effects
  – QTL position cannot be precisely determined
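The summary procedure (split the progeny by marker genotype, take class means, test the difference) can be sketched for one marker in backcross progeny. This is a minimal Python sketch; the function name, the genotype codes "Aa"/"aa", and the skipping of any other code as missing are assumptions. It computes the pooled t-test, the one-way ANOVA F (equal to t² here), the regression coefficients under the aa = -1, Aa = 1 coding, R², and a likelihood ratio G with its LOD. Note that G here belongs to the simpler model in which the two marker-class means are fitted directly with normal errors, where the LR reduces to G = N ln(RSS0/RSS1); it is a substituted simplification, not the mixture-model G that also estimates r.

```python
import math
from statistics import fmean


def single_marker_test(y, genotypes, high="Aa", low="aa"):
    """Hypothetical helper: single-marker analysis of one backcross marker.

    Individuals whose genotype is neither `high` nor `low` (for example a
    missing-data code) are skipped.
    """
    y1 = [v for v, g in zip(y, genotypes) if g == high]
    y2 = [v for v, g in zip(y, genotypes) if g == low]
    n1, n2 = len(y1), len(y2)
    n = n1 + n2
    if n1 == 0 or n2 == 0 or n < 3:
        raise ValueError("need both genotypic classes and at least 3 individuals")
    m1, m2 = fmean(y1), fmean(y2)

    # Sums of squares within classes (RSS1) and around the grand mean (RSS0)
    ss_within = sum((v - m1) ** 2 for v in y1) + sum((v - m2) ** 2 for v in y2)
    grand = fmean(y1 + y2)
    ss_total = sum((v - grand) ** 2 for v in y1 + y2)
    ss_marker = ss_total - ss_within

    # Pooled-variance t-test with N - 2 df
    s2_pooled = ss_within / (n - 2)
    t = (m1 - m2) / math.sqrt(s2_pooled * (1 / n1 + 1 / n2))

    # One-way ANOVA: F = MS(marker) / MS(within), with 1 and N - 2 df
    f = (ss_marker / 1) / s2_pooled

    # Regression y = b0 + b1*x with aa = -1, Aa = +1: the two fitted values
    # are the class means, so b0 and b1 follow directly from them
    b0 = 0.5 * (m1 + m2)
    b1 = 0.5 * (m1 - m2)
    r2 = ss_marker / ss_total

    # LR for mu_Aa == mu_aa with normal errors and ML variance estimates:
    # G = N * ln(RSS0 / RSS1); LOD = G / (2 ln 10)
    g_stat = n * math.log(ss_total / ss_within)
    lod = g_stat / (2 * math.log(10))
    # Asymptotic chi-square (1 df) upper-tail probability of G
    p_value = math.erfc(math.sqrt(g_stat / 2))

    return {"mean_high": m1, "mean_low": m2, "t": t, "F": f, "b0": b0,
            "b1": b1, "R2": r2, "G": g_stat, "LOD": lod, "p": p_value}
```

Running it once per marker gives the quick scan described above. The p-value relies on the asymptotic chi-square (1 df) distribution of G, computed as erfc(√(G/2)).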