Day 2: QTL Detection

Objective
Present principles for detection of genes affecting quantitative traits (QTL) using genetic markers in ‘simple’ experimental designs.
Concepts covered are relevant to issues in ‘genomic selection’.

1. Single locus quantitative genetic model
2. Principle of use of LD to detect QTL using markers
3. Overview of strategies for QTL detection
4. QTL detection using line crosses
5. QTL interval mapping in line crosses
6. QTL detection in line crosses – additional topics
   a. Significance testing
   b. Accuracy of position estimates
   c. Breed crosses (vs inbred line crosses)
7. QTL detection in outbred populations – linkage analysis
8. Summary and limitations
9. Software for QTL mapping

1. Single locus Quantitative Genetic Model
• Partition phenotype into genetic and environmental components: P = mean + G + E
• G = collective effect of many genes = quantitative trait loci (QTL)
• Genotypes for QTL have an associated genotypic value: G_T = E(P | T)
  – G_T = phenotype you expect to get from an individual with genotype T
  – G_T = average phenotype over all individuals with genotype T
  – G_T is often deviated from the mean ⇒ overall average G_T is zero

Falconer Model for effects of QTL

Genotype T            A2A2     (midpoint)   A1A2     A1A1
Genotypic value G_T   μ-a      μ            μ+d      μ+a

• μ is NOT the population mean; it is the “mid-homozygote” value
• μ is often standardized to zero (by subtraction)

Under HWE:

T       Frequency f(T)   Genotypic value G_T   f(T) x G_T
A1A1    p^2              a                     p^2 a
A1A2    2pq              d                     2pq d
A2A2    q^2              -a                    -q^2 a

Population mean = E(G_T) = M = p^2 a + 2pq d - q^2 a = a(p - q) + 2pqd

Example: the pygmy gene in mice
Allele frequency: Pr(+) = p = 0.7, q = 0.3

Genotype                   ++           + pg          pg pg
Average weight (g)         14           12            6
Genotypic value G_T        a = 4        d = 2         -a = -4
Expected freq. under HWE   p^2 = 0.49   2pq = 0.42    q^2 = 0.09

⇒ μ = (14 + 6)/2 = 10
Mean G_T = E(G_T) = M = 0.49*4 + 0.42*2 + 0.09*(-4) = 2.44
         = a(p - q) + 2pqd = 4(0.7 - 0.3) + 2*0.7*0.3*2 = 2.44
Expected population mean = 0.49*14 + 0.42*12 + 0.09*6 = 12.44 = μ + E(G_T)

Most QTL have much smaller effects than the mouse pygmy gene and cannot be observed directly.

How can we find these QTL?
Since we cannot observe the QTL directly, we want to use (or create) an association between the QTL and something we CAN observe: a genetic marker.

2. Principles of the use of LD to detect QTL using markers

Molecular Genetics: “In Search of the Holy Grail”
[Figure: chromosome pairs carrying marker alleles M/m and QTL alleles Q/q]
• Major genes
• Quantitative Trait Loci (QTL) = position (locus) on genome associated with genetic differences for a quantitative trait

Most QTL cannot be observed at DNA level
Two types of observable molecular genetic loci:
• Functional mutations - known genes
  – Most beneficial and easy to use
  – Difficult to find
• Anonymous markers linked to QTL
  – Easier to find
  – More restrictive and difficult to use

Use of markers for QTL detection and MAS relies on association of markers with phenotype

QTL detection:
Marker genotype    MM    Mm    mm
Mean phenotype     20    18    14

MAS:
• Allele M is associated with favorable QTL allele
• Select MM or individuals that inherited allele M
Requires Linkage Disequilibrium between marker and QTL

Illustration that marker genotype means don’t differ if marker and QTL are in Linkage Equilibrium
QTL has effect on phenotype, marker does not.
Allele frequencies: P(M) = pM, P(m) = qM, P(Q) = p, P(q) = q, D = 0

Frequency of each marker-QTL genotype:
Marker      QTL genotype (genotypic value)
genotype    QQ (μ+a)     Qq (μ+d)    qQ (μ+d)    qq (μ-a)     Average
MM          pM^2 p^2     pM^2 pq     pM^2 pq     pM^2 q^2     μ + a(p-q) + 2pqd
Mm          2pMqM p^2    2pMqM pq    2pMqM pq    2pMqM q^2    μ + a(p-q) + 2pqd
mm          qM^2 p^2     qM^2 pq     qM^2 pq     qM^2 q^2     μ + a(p-q) + 2pqd

Example: P(M) = pM, P(m) = qM, P(Q) = 0.7, P(q) = 0.3, D = 0

Marker      QTL genotype (genotypic value)
genotype    QQ (10)       Qq (8)        qQ (8)        qq (5)        Average
MM          pM^2 (.49)    pM^2 (.21)    pM^2 (.21)    pM^2 (.09)    .49*10 + .21*8 + .21*8 + .09*5 = 8.71
Mm          2pMqM (.49)   2pMqM (.21)   2pMqM (.21)   2pMqM (.09)   .49*10 + .21*8 + .21*8 + .09*5 = 8.71
mm          qM^2 (.49)    qM^2 (.21)    qM^2 (.21)    qM^2 (.09)    .49*10 + .21*8 + .21*8 + .09*5 = 8.71

Detection of QTL based on markers requires Linkage Disequilibrium between marker and QTL
Relative frequency of Q must differ between marker genotypes.

Example (arbitrary)
Allele frequencies: P(M) = pM = 0.4, P(m) = qM = 0.6, P(Q) = p = 0.7, P(q) = q = 0.3

Assumed haplotype frequencies:
M Q    0.38 = pM p + D
M q    0.02 = pM q - D
m Q    0.32 = qM p - D
m q    0.28 = qM q + D

Disequilibrium = D = P(MQ) - pM p = 0.38 - (0.4)(0.7) = +0.10

Example: D = +0.10, random mating of parents

Marker      QTL genotype (genotypic value)
genotype    QQ (10)                Qq (8)                 qQ (8)                 qq (5)                 Average
MM          (.38)(.38) = .1444     (.38)(.02) = .0076     (.02)(.38) = .0076     (.02)(.02) = .0004     9.80
Mm          2(.38)(.32) = .2432    2(.38)(.28) = .2128    2(.02)(.32) = .0128    2(.02)(.28) = .0112    8.94
mm          (.32)(.32) = .1024     (.32)(.28) = .0896     (.28)(.32) = .0896     (.28)(.28) = .0784     7.92

3.
Overview of Strategies for QTL Detection
Strategies depend on the type of LD between markers and QTL that you want to exploit.
Strategies differ in the # of rounds of recombination that occurred since creation of LD and, therefore, in how close a marker needs to be to be in sufficient LD with a QTL.
• LD you create by a cross
  – F2 cross
  – Backcross
  – Advanced Intercross Line – AIL
  – Recombinant Inbred Line – RIL
• LD that exists within families
  – Within half-sib families
  – In extended pedigrees, outbred F2/BC
• LD that is already present in an outbred population
  – LD created in past by drift, mutation, selection, migration
[Figure: LD (r^2, 0 to 1) against generation (0 to 25) for recombination rates c = .001, .01, .05, .1, .2, .5]
Type of LD used affects marker density required, type of analysis needed, and how results are to be interpreted.

Scope of QTL Detection Strategy
• Targeted – e.g. candidate gene approach
  – Look for QTL in a targeted region of the genome
• Genome-wide – genome scan approach
  – Place markers across the genome
  – Look for associations of markers with trait phenotype across the genome
  – Identify QTL across the genome
[Figure: chromosome pair with markers M1 M2 M3 Q M4 M5 M6 / m1 m2 m3 q m4 m5 m6]

Overview of Strategies for QTL mapping

            Line/breed cross     Outbred population
Analysis    Linkage analysis     Linkage analysis         LD mapping
Markers     LD markers           LE markers               LD markers
Designs     F2 / BC, AIL, RIL    HS/FS families,          Candidate genes,
                                 extended pedigree        high density
LD used     Population wide      Within family            Population wide

Compared by: Recomb., LD extent, Marker map, Scope, Map resol.
[Figure: LD (r^2) against generation for c = .001 to .5, as above]

4.
QTL detection in Line Crosses
Line crossing creates extensive Linkage Disequilibrium
[Figure: cross of a line carrying only M Q haplotypes with a line carrying only m q haplotypes]

QTL detection in Backcross of Inbred Lines
Parental lines: MQ/MQ x mq/mq
F1 x recurrent parent: MQ/mq x mq/mq
c = recombination rate between M and Q

Backcross progeny produced:
Type               Genotype    Frequency    Genotypic value
Non-recombinant    MQ/mq       1/2 (1-c)    μ+d
Recombinant        Mq/mq       1/2 c        μ-a
Recombinant        mQ/mq       1/2 c        μ+d
Non-recombinant    mq/mq       1/2 (1-c)    μ-a

Mean phenotype by marker genotype:
Y_Mm = μ - c a + (1-c) d
Y_mm = μ - (1-c) a + c d
Contrast Y_Mm - Y_mm = (1-2c)(a+d)

BC has only 1 round of recombination
Line crossing creates extensive LD
[Figure: LD (r^2) against generation for c = .001 to .5; F2/BC sit at generation 1]
Contrast Y_Mm - Y_mm = (1-2c)(a+d)
⇒ marker doesn’t need to be close to the QTL to show an effect on phenotype
c = 0.2 → 1-2c = 0.6 ⇒ marker with 0.2 rec. rate with QTL still shows 60% of QTL effect
General recommendation is a marker every 20 cM ⇒ each QTL is within 10 cM of a marker

F2 Cross between Inbred Lines
F0: MQ/MQ x mq/mq; F1: MQ/mq x MQ/mq

F1 gametes:
Gamete    Frequency
M Q       1/2 (1-c)
M q       1/2 c
m Q       1/2 c
m q       1/2 (1-c)

F2 progeny by marker genotype (frequencies relative to all F2):
Marker genotype    QQ (μ+a)          Qq (μ+d)                    qq (μ-a)
MM                 1/4 (1-c)^2       1/4 * 2c(1-c)               1/4 c^2
Mm                 1/4 * 2c(1-c)     1/4 * 2[(1-c)^2 + c^2]      1/4 * 2c(1-c)
mm                 1/4 c^2           1/4 * 2c(1-c)               1/4 (1-c)^2

Expected mean of marker genotypes:
Y_MM = μ + (1-c)^2 a + 2c(1-c) d - c^2 a
Y_Mm = μ + c^2 d + (1-c)^2 d
Y_mm = μ + c^2 a + 2c(1-c) d - (1-c)^2 a

Contrast Y_MM - Y_mm = 2(1-2c) a
a(1-2c) = (Y_MM - Y_mm)/2
d(1-2c)^2 = Y_Mm - 1/2 (Y_MM + Y_mm)

Summary
                                                      Expectation if c = 0.5
Backcross:   (1-2c)(a+d) = Y_Mm - Y_mm                = 0
F2 cross:    (1-2c) a = (Y_MM - Y_mm)/2               = 0
             (1-2c)^2 d = Y_Mm - 1/2 (Y_MM + Y_mm)    = 0

Estimates confound QTL position and effect
E.g. if (Y_MM - Y_mm)/2 = 10 kg (F2 cross), i.e. marker-associated effect = 10:
• QTL could be near M with a = 10 (if c = 0)
• QTL could be distant (c = 0.25) with a = 20
• or any other possibility
• QTL can be on either side of the marker

But, if we test multiple markers and find the following marker-associated effects:

Marker                         M1    M2    M3    M4    M5
(Y_MM - Y_mm)/2 = (1-2c)a      5     10    10    5     2.5

there is evidence that the QTL is between M2 and M3 (although we cannot exclude presence of multiple QTL).

5. QTL Interval Mapping in Line Crosses
Use of flanking markers to estimate QTL position and effect separately

Backcross: MQN/mqn x mqn/mqn
Marker order M - Q - N, with recombination rates c1 (M to Q), c2 (Q to N) and θ (M to N); θ = assumed known
Contrast Y_Mm - Y_mm = (1-2c1)(a+d)
Contrast Y_Nn - Y_nn = (1-2c2)(a+d)
No interference → θ = c1 + c2 - 2 c1 c2
⇒ 3 equations, 3 unknowns: c1, c2, (a+d)

Backcross Interval Mapping
To estimate QTL position and effect separately; use θ = c1 + c2 - 2 c1 c2

F1 gametes and progeny:
Marker       Marker         QTL       Frequency              Genotypic    Pr(Q | marker data) = X_Q
haplotype    frequency      allele                           value
M N          1/2 (1-θ)      Q         1/2 (1-c1)(1-c2)       μ+d          (1-c1)(1-c2)/(1-θ)
                            q         1/2 c1 c2              μ-a
M n          1/2 θ          Q         1/2 (1-c1) c2          μ+d          (1-c1) c2 / θ
                            q         1/2 c1 (1-c2)          μ-a
m N          1/2 θ          Q         1/2 c1 (1-c2)          μ+d          c1 (1-c2) / θ
                            q         1/2 (1-c1) c2          μ-a
m n          1/2 (1-θ)      Q         1/2 c1 c2              μ+d          c1 c2 / (1-θ)
                            q         1/2 (1-c1)(1-c2)       μ-a

E(Yi | Marker Genotype)
• Two possible QTL genotypes: Qq or qq
  – If Qq, E(Yi | Qq) = μ + d
  – If qq, E(Yi | qq) = μ - a
• Put those two together with P(Qq | g_marker) = X_Qi and P(qq | g_marker) = 1 - X_Qi
• E(Yi | M) = (μ + d) X_Qi + (μ - a)(1 - X_Qi) = (μ - a) + (a + d) X_Qi = m + b_Q X_Qi
⇒ Regression model: Yi = m + b_Q X_Qi + ei

Regression Interval Mapping
Estimate QTL position and effect separately
Haley and Knott (1992) Heredity 69: 315

Backcross regression model:
Yi = m + b_Q X_Qi + ei
E(b_Q) = a + d
• Fit model for various positions of QTL (e.g. in steps of 1 cM)
• Position with lowest RSS or highest F-test gives best estimate of c1 and b_Q (= a+d)
[Figure: F-value (0 to 12) against position (0 to 50 cM) for the interval M - Q - N]

F2 Cross between Inbred Lines
F0: MQN/MQN x mqn/mqn; F1: MQN/mqn x MQN/mqn; marker order M - Q - N with c1, c2, θ
For each of the nine flanking-marker genotypes g_markers (MM, Mm or mm combined with NN, Nn or nn), Pr(QQ | g_markers), Pr(Qq | g_markers) and Pr(qq | g_markers) are each a function f(c1, c2, θ).

F2 Cross between Inbred Lines
Haley and Knott (1992) Heredity 69: 315
For each flanking-marker genotype (MM NN, MM Nn, MM nn, Mm NN, Mm Nn, Mm nn, mm NN, mm Nn, mm nn):
Additive coef.   X_add = Pr(QQ) - Pr(qq) = f(c1, c2, θ)
Dom. coef.       X_dom = Pr(Qq) = f(c1, c2, θ)
Yi = μ + b_a X_add,i + b_d X_dom,i + ei
E(b_a) = a and E(b_d) = d at QTL position
• Fitted at each 1 cM position on chromosome
• Position with highest F-test → QTL (if significant)

6. QTL detection in line crosses: Additional Topics
(see also Lynch and Walsh ch 15)
a. Significance test for presence of QTL
b. Accuracy of position estimates
   • Advanced intercross lines
c. Breed Crosses (vs inbred line crosses)

6a. How to decide if you’ve detected a QTL?
Test statistic (e.g.
F or LR) > threshold T
Set T to control the Type I error rate (False Positives)
• Comparison-wise test at 5%: set threshold T such that
  – Prob(test > T | no QTL) < .05, i.e. allow 5% false-positive (FP) tests

Possible outcomes for test for QTL at a given position:

                          Result of significance test
True state                Accept Ho                          Reject Ho
Ho is true (no QTL)       True negative                      False positive (Type I error)
Ho is false (QTL)         False negative (Type II error)     True positive

Expected result for tests at 100 positions on chromosome with NO QTL at 5% comparison-wise test level:

                          Result of significance test
True state                Accept Ho              Reject Ho
Ho is true (no QTL)       95                     5 (Type I error)
Ho is false (QTL)         0 (Type II error)      0

⇒ Significance testing complicated by:
• Large # tests performed (many markers, QTL positions)
  – At α = 0.05, 5% of tests significant even if no QTL exist
• Tests on the same chromosome are dependent
  – Bonferroni adjustment (α* = α/(# tests)) is too stringent

Strategies to control % false positives (%FP)
(Lander & Kruglyak, 1995, Nature Genetics 11: 241-247)
• Chromosome-wise test - control % FP at chrom. level
  – Account for multiple (correlated) tests on chrom.
  – # FP/chromosome ≥ 1 on 5% of chromosomes
• Experiment-wise test - control % FP within experiment
  – Account for all tests conducted in experiment
  – # FP/experiment ≥ 1 on 5% of experiments
• Genome-wise test - control % FP at genome level
  – Account for all tests conducted on the genome
  – # FP/genome ≥ 1 on 5% of genomes tested
• Significance Levels (Lander & Kruglyak, 1995)
  – Significant Linkage at p < .05: Prob(≥ 1 FP) < .05
  – Suggestive Linkage: at least 1 false positive test

Computing significance thresholds
• Adjust table test statistic values by equation of Lander & Kruglyak (1995)
  – Assumes high-density marker map
• Develop empirical threshold based on permutation test (Churchill and Doerge, 1994, Genetics 138:963)
  – Simulate data under the Null Hypothesis (= no QTL)
  – Compute test statistic (F-test / LR)
  – Replicate many times
  – Determine 95% level of test statistic (for 5% test)

Significance thresholds by Permutation test
(Churchill & Doerge, 1994, Genetics 138:963)
Data under the Null Hypothesis are obtained by randomly permuting marker genotypes against phenotypes:

            Randomly permuted data    Original data
Animal ID   Marker genotype           Marker genotype    Phenotype
1           Mmnn                      MmNn               9.8
2           mmnn                      mmNn               10.4
3           mmnn                      Mmnn               9.3
4           Mmnn                      MmNn               8.5
5           MmNn                      mmnn               11.3
6           MmNn                      MmNn               9.6
7           MmNn                      Mmnn               9.9
8           mmnn                      mmnn               7.6
9           MmNn                      MmNn               8.0
10          mmNn                      mmnn               10.7

[Figure: distribution of the test statistic under the Null Hypothesis across replicates; the 95% point of the distribution is the 5% threshold]

Control of False Discovery Rate (FDR)

                 Result of significance test
True state       Accept Ho              Reject Ho
Ho is true       U                      V (Type I error)
Ho is false      T (Type II error)      S

FDR - control the expected proportion of significant tests that are false positives
    - control E(V / (V+S))

[Figure: frequency distribution of p-values (0 to 1) across many tests, split into tests with H0 true and H0 false; labels: low FDR, high FDR]
See notes “False discovery rate.doc” for further details.

6b. Accuracy of position estimates

Replicate Genome Scan results for F2
N = 500, 6 markers, trait with SD = 2
QTL at 23 cM, a = 1, d = 0.5 → 14% of variance
[Figure: F-value against position (0 to 50 cM) for each of the 10 replicates]

Replicate    Position    a        d
1            15          0.791    0.19
2            24          1.56     0.19
3            23          1.03     0.3
4            27          0.771    0.24
5            20          1.201    0.93
6            28          1.35     0.94
7            22          0.96     0.14
8            13          0.991    0.64
9            17          0.94     0.52
10           29          0.924    1.44
Average      21.8        1.052    0.55
St.dev.      5.231       0.236    0.41
TRUE         23          1        0.5

F2 cross design
F0: Line 1 (MM) x Line 2 (mm); F1: Mm x Mm; F2: MM, Mm, mm
• Large chunks
• High LD
• Only 1 round of recombination
⇒ low accuracy of QTL position
[Figure: LD (r^2) against generation for c = .001 to .5; F2/BC sit at generation 1]

Resolving Power of QTL Mapping
(Darvasi & Soller 1997. Behavior Genetics)
Approximate 95% confidence interval for QTL location (cM) ~ 3000/(k N a^2)
k = 1 for BC, k = 2 for F2
N = population size
[Figure: 95% CI for QTL location (0 to 25 cM) against mapping population size (0 to 10000) for BC and F2, for a = .5 σp]
Increase resolution with advanced intercross lines
• Recombination breaks genome up in smaller pieces - reduces LD except at short distance (Darvasi & Soller 1995 Genetics)

Strategies to increase accuracy of estimates of QTL position in line crosses
F2/BC:
• Increasing marker density: limited effect
• Increase population size
Advanced intercross lines:
⇒ Higher accuracy of QTL position
• Requires more markers to maintain power to detect QTL (lower LD)
[Figure: LD (r^2) against generation for c = .001 to .5; F2/BC at generation 1, AIL at later generations]

Recent LD extends over large distances
[Figure: LD (r^2) against distance (0 to 30 cM) after 1, 2, 5, 10, 20, 50 and 100 generations of recombination]
• LD over long distance if recently created
• LD over short distance if created long ago

Overview of Strategies for QTL mapping

              Line/breed cross              Outbred population
Analysis      Linkage analysis              Linkage analysis       LD mapping
Markers       LD markers                    LE markers             LD markers
Designs       F2 / BC        AIL            HS/FS families,        Cand. genes,
                                            ext. pedigree          high density
LD used       Population wide
Recomb.       1 rnd          >1 rnd
LD extent     Long           Smaller
Marker map    Sparse         Denser
Coverage      Genome wide
Map resol.    Poor           Better

6c.
Breed crosses (vs inbred line crosses)

QTL mapping in livestock using F2 cross between outbred breeds
Berkshire x Yorkshire F2 cross

F0: 2 Berkshire sires (BB) x 9 Yorkshire dams (YY)
F1: 8 sires (BY) x 26 dams (BY)
F2: 525 progeny with breed-origin genotypes BB, BY, YB, YY
[Figure: marker haplotypes M1 N1 and M2 N2 traced from F0 through F1 to F2]
Breed origin probabilities P_BB, P_BY, P_YB, P_YY are derived for a given position.

F2 Cross between breeds
Haley and Knott (1992) Heredity 69: 315
Identical to cross of inbreds but follow B vs. Y alleles
For each flanking-marker genotype (MM NN, MM Nn, MM nn, Mm NN, Mm Nn, Mm nn, mm NN, mm Nn, mm nn), Pr(BB), Pr(BY) and Pr(YY) are each a function f(c1, c2, θ):
Additive coef.   X_add = Pr(BB) - Pr(YY)
Dom. coef.       X_dom = Pr(BY)
Yi = μ + b_a X_add,i + b_d X_dom,i + ei
E(b_a) = a and E(b_d) = d at QTL position
• Fitted at each 1 cM position on chromosome
• Position with highest F-test → QTL (if significant)

SSC1, MARBLING, Line-Cross analysis
a = -0.13, d = +0.19
[Figure: -logP (0 to 4.5) against position (0 to 130 cM), with 5% and 1% chromosome-wise thresholds]
Breed cross (F0 → F1 → F2): detects QTL that differ in frequency between breeds

Breed cross interval mapping
F0: 2 Berkshire sires (BB) x 9 Yorkshire dams (YY)
F1: 8 sires (BY) x 26 dams (BY)
F2: 525 progeny (BB, BY, YB, YY)
Compares average Berk allele to average York allele
⇒ QTL only detected if breeds differ in frequency

Frequency of Q: Berk p_B, York p_Y
QTL effect: QQ +a, Qq d, qq -a
Line cross additive effect = (p_B - p_Y) a
Line cross dominance effect = (p_B - p_Y) d

Summary of QTL mapping in Line/Breed Crosses
• QTL detection requires LD between markers and QTL
• Cross → extensive LD → genome scan with markers @ 20 cM
• Regression interval mapping → estimate QTL position, effect
• Estimates have limited accuracy → 10 – 30 cM confidence intervals
• Fine mapping not limited by # markers but requires
  – larger populations
  – crosses that accumulate recombinations
    · Recombinant Inbred Lines
    · Advanced Intercross Lines
• Only detects QTL that differ between breeds

Breed cross QTL scan
F0: 2 Berkshire sires (BB) x 9 Yorkshire dams (YY)
F1: 8 sires (BY) x 26 dams (BY)
F2: 525 progeny (BB, BY, YB, YY)
⇒ QTL that differ in frequency between breeds
⇒ Wide QTL region (20-50 cM)
Within-breed MAS requires QTL that segregate within breeds
Follow-up within-breed research in QTL region:
⇒ Linkage mapping: Evans et al. (2003 Genetics:621) → see next
⇒ LD mapping - confirmed QTL in 10 commercial lines → day 3

7. QTL detection in outbred populations – linkage analysis
e.g. livestock, wildlife, human
Reading: Dekkers and van der Werf (2007) Chapter 10 at
http://www.fao.org/docrep/010/a1120e/a1120e00.htm

LD always exists within families
LD behavior similar to BC/F2
[Figure: sire MQ/mq, with recombination rate r between M and Q; after meiosis, half-sib (HS) progeny that inherit M mostly carry Q and progeny that inherit m mostly carry q]
[Figure: LD (r^2) against generation for c = .001 to .5]
⇒ Marker - QTL LD among progeny at large distance

QTL mapping in half-sib family design
Within-family LD not consistent across families

Sire 1    Sire 2    Sire 3    Sire 4
M Q       M q       M Q       M q
m q       m Q       m Q       m q

⇒ Analysis must allow for different marker-QTL linkage phases within each family
QTL effects must be fitted w/in family:
Yij = μi + α_Q,i P_Q,ij + eij
P_Q,ij = Prob(Q_Mi | marker genotype, QTL position)
α_Q,i = QTL allele substitution effect for sire i
See e.g. Knott et al. Theor. Appl. Genet. 1996. 93: 71-80

Power of alternative QTL mapping designs
For given number of animals genotyped: F2 > BC > Fullsib > Halfsib
Typical sizes used: > 500 animals and > 1000 animals
Outbred designs: fraction p^2 + q^2 of parents are homozygous for QTL = non-informative

Daughter design for QTL detection and MAS
[Figure: sire Mm; daughters grouped by the marker allele (M or m) inherited from the sire]
Compare production

Grand daughter design
[Figure: grandsire Mm; sons grouped by the marker allele (M or m) inherited from the grandsire]
Compare progeny test

Grand-daughter Design (Weller et al. 1990)
Grand sire: MQ/mq, c = recombination rate between M and Q
Sons are genotyped for the marker:

Haplotype from    Frequency    Value
grand sire
M Q               1/2 (1-c)    μ + 1/2 α
M q               1/2 c        μ - 1/2 α
m q               1/2 (1-c)    μ - 1/2 α
m Q               1/2 c        μ + 1/2 α

Mean phenotype of progeny for each son (or son’s EBV or deregressed EBV):
Average for sons M?: μ + 1/4 (1-2c) α
Average for sons m?: μ - 1/4 (1-2c) α
Contrast of average EBV of sons: m_M? - m_m? = 1/2 (1-2c) α

Overview of Strategies for QTL mapping

              Line/breed cross              Outbred population
Analysis      Linkage analysis              Linkage analysis                 LD mapping
Markers       LD markers                    LE markers                       LD markers
Designs       F2 / BC        AIL            HS/FS families   Ext. pedigree   Cand. genes,
                                                                             high density
LD used       Population wide               Within family
Recomb.       1 rnd          >1 rnd         1 rnd            >1 rnd
LD extent     Long           Smaller        Long             Smaller
Marker map    Sparse         Denser         Sparse           Denser
Coverage      Genome wide                   Genome wide
Map resol.    Poor           Better         Poor             Better

Linkage analysis in extended pedigrees by random QTL effects - see later

8.
Summary and limitations of QTL mapping in outbred populations using sparse markers
• Within family → extensive LD → genome scan with markers @ 20 cM
• Regression interval mapping → estimate QTL position, effect
• Estimates of marker/QTL effects differ by family → complicates MAS
• Estimates have limited accuracy → 10 – 30 cM confidence intervals
• Fine mapping not limited by # markers but requires
  – larger populations
  – populations that accumulate recombinations
    · Linkage analysis in deep pedigrees
    · Historical recombination → LD mapping
⇒ need for LD mapping

9. Software for QTL mapping by linkage analysis
Many programs available (with tutorials)
See: http://linkage.rockefeller.edu/soft/list.html
• For inbred line crosses: Mapmaker QTL
  http://www.broad.mit.edu/genome_software/other/qtl.html
  http://darwin.eeb.uconn.edu/notes/qtl-mapmaker.pdf
• For breed crosses and outbred populations: QTL Express
  http://qtl.cap.ed.ac.uk/

Extra notes: Multiple QTL problem
What if there is more than 1 QTL linked to the marker?
[Figure: F1 chromosome pair Q1 - M - Q2 / q1 - m - q2, with recombination rates c1 (Q1 to M) and c2 (M to Q2)]
Backcross, 2 QTL:
E(Y_Mm - Y_mm) = (1-2c1)(a1+d1) + (1-2c2)(a2+d2)
⇒ Marker picks up combined effect of both QTL
Possible result from fitting 1-QTL model:
→ Ghost QTL (if QTL 1 and QTL 2 are in coupling phase)
→ or no QTL (if in repulsion phase)
[Figure: F-value (0 to 16) against position (0 to 50 cM), with the positions of QTL 1 and QTL 2 marked]

Solution – for inbred line crosses: Composite Interval Mapping (CIM)
Add markers as co-factors to control for QTL in other intervals
Markers along the chromosome: A  B  C  D  E  F
E.g. when mapping a QTL in interval C-D, include B and E as co-factors:
Yi = m + b_a X_add,i + b_d X_dom,i + b_B X_B,i + b_E X_E,i + ei
• b_a X_add,i + b_d X_dom,i: used to detect QTL in C-D interval; affected only by QTL in B – E
• b_B X_B,i: controls for QTL outside B
• b_E X_E,i: controls for QTL outside E
In general – include markers just outside the interval as co-factors
Can include other (unlinked) QTL markers as co-factors to reduce residual var.
There’s no single perfect strategy on how to choose co-factors

Multiple QTL mapping in breed crosses
Composite interval mapping is not possible because markers may not be completely informative
Alternative: fit 2-QTL models:
Yi = m + b_a1 X_add,1,i + b_d1 X_dom,1,i + b_a2 X_add,2,i + b_d2 X_dom,2,i + ei
E.g. - fix QTL 1 at best position
     - scan chromosome for best position of QTL 2
Test statistic is LRT = likelihood ratio test
  = -2 ln[likelihood 1-QTL model / likelihood 2-QTL model] ~ Chi-square
See QTL Express http://qtl.cap.ed.ac.uk/
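
The 2-QTL versus 1-QTL comparison can be sketched in a few lines of Python. This is a minimal illustration, not the method used by QTL Express: it assumes both models are fitted by ordinary least squares with normally distributed residuals, in which case -2 ln[likelihood 1-QTL model / likelihood 2-QTL model] equals n ln(RSS_1 / RSS_2). The function names (solve, rss, lrt) and all coefficient and phenotype values below are hypothetical.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on the columns of X."""
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    b = solve(XtX, Xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2
               for r, yi in zip(X, y))

def lrt(X_reduced, X_full, y):
    """LRT for nested least-squares models under normal errors: n * ln(RSS_reduced / RSS_full)."""
    return len(y) * math.log(rss(X_reduced, y) / rss(X_full, y))

# Hypothetical coefficients (X_add,1, X_dom,1, X_add,2, X_dom,2) for 8 animals,
# i.e. QTL 1 fixed at its best position and QTL 2 at one candidate position.
coefs = [
    (0.9, 0.1, 0.6, 0.3), (0.8, 0.2, -0.1, 0.7), (0.1, 0.8, 0.7, 0.2),
    (0.0, 0.9, -0.6, 0.3), (-0.2, 0.7, 0.1, 0.8), (-0.7, 0.2, -0.8, 0.1),
    (-0.9, 0.1, 0.0, 0.9), (0.3, 0.6, -0.9, 0.1),
]
y = [12.1, 11.4, 10.9, 9.2, 10.3, 8.1, 9.0, 9.9]  # hypothetical phenotypes

X_1qtl = [[1.0, a1, d1] for a1, d1, _, _ in coefs]
X_2qtl = [[1.0, a1, d1, a2, d2] for a1, d1, a2, d2 in coefs]
lrt_2_vs_1 = lrt(X_1qtl, X_2qtl, y)
print("LRT, 2-QTL vs 1-QTL model:", lrt_2_vs_1)
```

In a chromosome scan, the last three lines would be repeated for every candidate position of QTL 2, keeping the position with the largest LRT. The threshold for that maximum would still need to account for multiple testing, for example by permutation as in section 6a.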