Corporate Profile - Crop and Soil Science

PBG 650 Advanced Plant Breeding
Module 1:
• Introduction
• Population Genetics
– Hardy-Weinberg Equilibrium
– Linkage Disequilibrium
Plant Breeding
“The science, art, and business of improving plants for
human benefit”
Considerations:
– Crop(s)
– Production practices
– End-use(s)
– Target environments
– Type of cultivar(s)
– Traits to improve
– Breeding methods
– Source germplasm
– Time frame
– Varietal release and intellectual property rights
Bernardo, Chapter 1
Plant Breeding
A common mistake that breeders make is to improve productivity without sufficient regard for other characteristics that are important to producers, processors and consumers.
• Well-defined Objectives
• Good Parents
• Genetic Variation
• Good Breeding Methods
• Functional Seed System
Adoption of Cultivars by Farmers
Quantitative Traits
• Continuum of phenotypes (metric traits)
• Often many genes with small effects
• Environmental influence is greater than for qualitative traits
• Specific genes and their mode of inheritance may be unknown
• Analysis of quantitative traits
– population parameters
• means
• variances
– molecular markers linked to QTL
Populations
• In the genetic sense, a population is a breeding group
– individuals with different genetic constitutions
– sharing time and space
• In animals, mating occurs between individuals
– 'Mendelian population'
– genes are transmitted from one generation to the next
• In plants, there are additional ways for a population to persist
– self-fertilization
– vegetative propagation
• Definition of 'population' may be slightly broader for plants
– e.g., lines from a germplasm collection
Falconer, Chapt. 1; Lynch and Walsh, Chapt. 4
What do population geneticists do?
Study genes in populations
– Frequency and interaction of alleles
– Mating patterns, genotype frequencies
– Gene flow
– Selection and adaptation vs random genetic drift
– Genetic diversity and relationship
– Population structure
Related Fields
– Evolutionary Biology – e.g., crop domestication
– Landscape Genetics
Gene and genotype frequencies
For a population of diploid organisms:

                 Alleles            Genotypes
                 A1       A2        A1A1     A1A2     A2A2
Frequencies      p        q         P11      P12      P22
Number           80       120       16       48       36
Proportions      0.4      0.6       0.16     0.48     0.36

(Allele numbers are copies out of 2N = 200; genotype numbers are individuals out of N = 100.)

p + q = 1
P11 + P12 + P22 = 1

$$p_1 = p = P_{11} + \tfrac{1}{2}P_{12} = 0.16 + 0.24 = 0.4$$
$$p_2 = q = P_{22} + \tfrac{1}{2}P_{12} = 0.36 + 0.24 = 0.6$$
Bernardo, Chapter 2
Gene frequencies (another way)
Number of individuals = N = N11 + N12 + N22 = 100
Number of alleles = 2N = N1 + N2 = 200
$$p_1 = p = \frac{N_{11} + \tfrac{1}{2}N_{12}}{N} = \frac{2N_{11} + N_{12}}{2N} = \frac{2 \times 16 + 48}{200} = 0.4$$
$$p_2 = q = \frac{N_{22} + \tfrac{1}{2}N_{12}}{N} = \frac{2N_{22} + N_{12}}{2N} = \frac{2 \times 36 + 48}{200} = 0.6$$
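The counting rule for allele frequencies can be written as a small Python function. This is a minimal sketch for a single biallelic locus; the function name `allele_freqs` is illustrative and not part of the course materials.

```python
def allele_freqs(n11: int, n12: int, n22: int) -> tuple[float, float]:
    """Return (p, q) for a biallelic locus from genotype counts N11, N12, N22.

    Each homozygote carries two copies of its allele and each heterozygote
    carries one copy of each allele, out of 2N alleles in total.
    """
    n = n11 + n12 + n22
    p = (2 * n11 + n12) / (2 * n)
    q = (2 * n22 + n12) / (2 * n)
    return p, q


# Example from the table: 16 A1A1, 48 A1A2, 36 A2A2 (N = 100)
p, q = allele_freqs(16, 48, 36)
print(p, q)  # 0.4 0.6
```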
Allele frequencies in crosses
Inbred x inbred
Alleles are unknown, but allele frequencies at segregating loci are known

Generation     p          q
F1 and F2      0.5        0.5
BC1            0.75       0.25
BC2            0.875      0.125
BC3            0.9375     0.0625
BC4            0.96875    0.03125

The value of q (the frequency of the donor parent's allele) is reduced by ½ in each backcross generation.
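The backcross series follows q = 0.5^(t+1) after t backcrosses. A short sketch, assuming the recurrent parent is homozygous for the allele with frequency p and there is no selection; `backcross_freqs` is a hypothetical helper name.

```python
def backcross_freqs(generation: int) -> tuple[float, float]:
    """Expected (p, q) after `generation` backcrosses to the recurrent parent.

    generation = 0 corresponds to the F1 (or F2), where p = q = 0.5.
    Each backcross halves q, the frequency of the donor (non-recurrent) allele.
    """
    q = 0.5 ** (generation + 1)
    return 1.0 - q, q


for g in range(5):
    label = "F1/F2" if g == 0 else f"BC{g}"
    p, q = backcross_freqs(g)
    print(f"{label:6s} p = {p:.5f}  q = {q:.5f}")
```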
Factors that may change gene frequencies
• Population size
– changes may occur due to sampling
→ assume 'large' population
• Differences in fertility and viability
– parents may differ in fertility
– gametes may differ in viability
– progeny may differ in survival rate
→ assume no selection
• Migration and mutation
→ assume no migration and no mutation
Factors that may change genotype frequencies
Changes in genotype frequency (not gene frequency)
• Mating system
– assortative or disassortative mating
– selfing
– geographic isolation
→ assume that mating occurs at random (panmixia)
Hardy-Weinberg Equilibrium
• Assumptions
– large, random-mating population
– no selection, mutation, migration
– normal segregation
– equal gene frequencies in males and females
– no overlap of generations (no age structure)
• Note that assumptions only need to be true for the locus in question
→ Gene and genotype frequencies remain constant from one generation to the next
→ Genotype frequencies in progeny can be predicted from gene frequencies of the parents
→ Equilibrium is attained after one generation of random mating
Hardy-Weinberg Equilibrium
Genes in parents
                 A1             A2
Frequencies      p              q
Example          0.4            0.6

Genotypes in progeny
                 A1A1           A1A2           A2A2
Frequencies      P11 = p²       P12 = 2pq      P22 = q²
Example          0.16           0.48           0.36

Expected genotype frequencies are obtained by expanding the binomial
$$(p + q)^2 = p^2 + 2pq + q^2 = 1$$

                 A1 (p = 0.4)   A2 (q = 0.6)
A1 (p = 0.4)     p² = .16       pq = .24
A2 (q = 0.6)     pq = .24       q² = .36
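The binomial expansion translates directly into code. A minimal sketch for two alleles under the HWE assumptions listed above; the function name `hwe_genotype_freqs` is illustrative.

```python
def hwe_genotype_freqs(p: float) -> dict[str, float]:
    """Expected HWE genotype frequencies from the binomial expansion (p + q)^2."""
    q = 1.0 - p
    return {"A1A1": p * p, "A1A2": 2 * p * q, "A2A2": q * q}


freqs = hwe_genotype_freqs(0.4)
print({g: round(f, 2) for g, f in freqs.items()})
# {'A1A1': 0.16, 'A1A2': 0.48, 'A2A2': 0.36}
```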
Equilibrium with multiple alleles
For multiple alleles, expected genotype frequencies can be found by expanding the multinomial (p1 + p2 + … + pn)²
For example, for three alleles:
$$(p_1 + p_2 + p_3)^2 = p_1^2 + 2p_1p_2 + 2p_1p_3 + p_2^2 + 2p_2p_3 + p_3^2 = 1$$
Corresponding genotypes (same order):
A1A1, A1A2, A1A3, A2A2, A2A3, A3A3
Lynch and Walsh (pg 57) describe equilibrium for autopolyploids
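The multinomial expansion can be generated for any number of alleles by pairing allele indices. A minimal sketch for diploids only (not the autopolyploid case); the allele frequencies 0.5, 0.3, 0.2 are hypothetical.

```python
from itertools import combinations_with_replacement


def hwe_multiallele(freqs: list[float]) -> dict[tuple[int, int], float]:
    """Expected HWE genotype frequencies for n alleles.

    Expands (p1 + p2 + ... + pn)^2: homozygote AiAi has frequency pi^2 and
    heterozygote AiAj (i < j) has frequency 2*pi*pj.
    Keys are 1-based allele indices (i, j) with i <= j.
    """
    out = {}
    for i, j in combinations_with_replacement(range(len(freqs)), 2):
        f = freqs[i] ** 2 if i == j else 2 * freqs[i] * freqs[j]
        out[(i + 1, j + 1)] = f
    return out


# Three alleles with hypothetical frequencies 0.5, 0.3, 0.2
geno = hwe_multiallele([0.5, 0.3, 0.2])
for (i, j), f in geno.items():
    print(f"A{i}A{j}: {f:.2f}")
print("total:", round(sum(geno.values()), 10))
```

The genotypes come out in the same order as on the slide (A1A1, A1A2, A1A3, A2A2, A2A3, A3A3).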
Relationship between gene and genotype frequencies
• f(A1A2) has a maximum of 0.5, which occurs when p = q = 0.5
• Most rare alleles occur in heterozygotes
• Implications for
– F1?
– F2?
– Any BC?
[Figure: HWE genotype frequencies of A1A1, A1A2 and A2A2 plotted against the frequency of A2, from 0 to 1]
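Both bullet points can be checked numerically under HWE. In this sketch (hypothetical function names), the share of A2 copies carried by heterozygotes is 2pq / (2pq + 2q²), which simplifies to p, so it approaches 1 as A2 becomes rare.

```python
def het_freq(q: float) -> float:
    """HWE frequency of A1A2 when the frequency of A2 is q."""
    return 2 * (1 - q) * q


def share_of_a2_in_hets(q: float) -> float:
    """Fraction of all A2 copies carried by heterozygotes under HWE (q > 0).

    A2 copies in heterozygotes: 2pq (one copy each); in A2A2: 2q^2.
    The ratio simplifies to p = 1 - q.
    """
    p = 1 - q
    return (2 * p * q) / (2 * p * q + 2 * q * q)


grid = [i / 100 for i in range(101)]
q_max = max(grid, key=het_freq)
print(q_max, het_freq(q_max))  # 0.5 0.5
for q in (0.5, 0.1, 0.01):
    print(q, round(share_of_a2_in_hets(q), 3))
```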
Applications of the Hardy-Weinberg Law
• Predict genotype frequencies in random-mating populations
• Use the frequency of recessive genotypes to estimate the frequency of a recessive allele in a population
– Example: assume that the incidence of individuals homozygous for a recessive allele is about 1/11,000.
q² = 1/11,000 → q ≈ 0.0095
• Estimate the frequency of individuals that are carriers of a recessive allele
p = 1 − 0.0095 = 0.9905
2pq = 0.0188 ≈ 2%
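The carrier calculation is a two-line computation once HWE is assumed. A minimal sketch; `carrier_frequency` is a hypothetical name.

```python
from math import sqrt


def carrier_frequency(recessive_incidence: float) -> tuple[float, float]:
    """Estimate q and the carrier (heterozygote) frequency 2pq from the
    observed frequency of recessive homozygotes, q^2.
    Assumes the population is in Hardy-Weinberg equilibrium.
    """
    q = sqrt(recessive_incidence)
    p = 1 - q
    return q, 2 * p * q


q, carriers = carrier_frequency(1 / 11_000)
print(f"q = {q:.4f}, 2pq = {carriers:.4f}")  # q = 0.0095, 2pq = 0.0189
```

Carrying the unrounded q gives 2pq ≈ 0.0189 rather than 0.0188; both are about 2%.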
Testing for Hardy-Weinberg Equilibrium
All genotypes must be distinguishable
                Genotypes                        Gene frequencies
                A1A1      A1A2      A2A2         A1        A2
Observed        233       385       129          0.5696    0.4304
Expected        242.36    366.26    138.38

N = N11 + N12 + N22 = 233 + 385 + 129 = 747
$$\hat{p}_1 = \frac{N_{11} + 0.5 \times N_{12}}{N} = \frac{233 + 0.5 \times 385}{747} = 0.5696$$
$$E(N_{11}) = \hat{p}_1^{2} \times N = 0.5696^{2} \times 747 = 242.36$$
Chi-square test for Hardy-Weinberg Equilibrium
$$\chi^2 = \sum \frac{(\mathrm{Obs} - \mathrm{Exp})^2}{\mathrm{Exp}} = 1.96$$
Critical value: χ² (1 df, α = 0.05) = 3.84
• Example in Excel
• Only 1 df because gene frequencies are estimated from the progeny data (3 genotype classes − 1 − 1 estimated allele frequency)
• Accept H0 (fail to reject): no reason to think that assumptions for Hardy-Weinberg equilibrium have been violated
– does not tell you anything about the fertility of the parents
• When you reject H0, there is an indication that one or more of the assumptions is not valid
– does not tell you which assumption is not valid
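The chi-square test above can be reproduced in a few lines. A minimal sketch for a biallelic locus with all genotypes distinguishable; `hwe_chi_square` is a hypothetical name and 3.84 is the usual 5% critical value for 1 df.

```python
def hwe_chi_square(n11: int, n12: int, n22: int) -> tuple[float, list[float]]:
    """Pearson chi-square test of HWE for a biallelic locus.

    The allele frequency is estimated from the sample, so the test has
    3 genotype classes - 1 - 1 estimated parameter = 1 df.
    """
    n = n11 + n12 + n22
    p = (n11 + 0.5 * n12) / n
    q = 1 - p
    expected = [p * p * n, 2 * p * q * n, q * q * n]
    observed = [n11, n12, n22]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, expected


CRITICAL_1DF = 3.84  # 5% critical value of chi-square with 1 df
chi2, expected = hwe_chi_square(233, 385, 129)
print([round(e, 2) for e in expected], round(chi2, 2))
print("reject H0" if chi2 > CRITICAL_1DF else "fail to reject H0")
```

With the unrounded p̂1 the expected counts are about 242.37, 366.26 and 138.37 (the table rounds p̂1 to 0.5696 first); χ² is still about 1.96, below 3.84.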
Exact Test for Hardy-Weinberg Equilibrium
• Chi-square is only appropriate for large sample sizes
• If sample sizes are small or some alleles are rare, Fisher's exact test is a better alternative
$$\Pr(N_{AA}, N_{Aa}, N_{aa} \mid n_A, n_a) = \frac{N!\, n_A!\, n_a!\, 2^{N_{Aa}}}{N_{AA}!\, N_{Aa}!\, N_{aa}!\, (2N)!}$$
– Calculate the probability of all possible arrays of genotypes for the observed numbers of alleles
– Rank outcomes in order of increasing probability
– Reject those that constitute a cumulative probability of <5%
Example in Excel
Weir (1996) Chapt. 3
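The exact test enumerates every genotype array compatible with the observed allele counts, evaluates the Pr(N_AA, N_Aa, N_aa | n_A, n_a) formula for each, and sums the probabilities of arrays that are no more probable than the observed one. This is a minimal biallelic sketch using log-factorials to avoid overflow; the function names are hypothetical and the small tolerance for ties is a design choice, not part of the slide.

```python
from math import exp, lgamma, log


def log_factorial(n: int) -> float:
    return lgamma(n + 1)


def hwe_array_prob(n_AA: int, n_Aa: int, n_aa: int) -> float:
    """Probability of a genotype array given its allele counts n_A and n_a."""
    n = n_AA + n_Aa + n_aa
    n_A = 2 * n_AA + n_Aa
    n_a = 2 * n_aa + n_Aa
    log_p = (log_factorial(n) + log_factorial(n_A) + log_factorial(n_a)
             + n_Aa * log(2)
             - log_factorial(n_AA) - log_factorial(n_Aa) - log_factorial(n_aa)
             - log_factorial(2 * n))
    return exp(log_p)


def hwe_exact_test(n_AA: int, n_Aa: int, n_aa: int) -> float:
    """Exact HWE test p-value: total probability of all genotype arrays (with the
    observed allele counts) that are no more probable than the observed array."""
    n = n_AA + n_Aa + n_aa
    n_A = 2 * n_AA + n_Aa
    n_a = 2 * n - n_A
    probs = []
    # heterozygote count has the same parity as n_A and cannot exceed min(n_A, n_a)
    for het in range(n_A % 2, min(n_A, n_a) + 1, 2):
        probs.append(hwe_array_prob((n_A - het) // 2, het, (n_a - het) // 2))
    p_obs = hwe_array_prob(n_AA, n_Aa, n_aa)
    # small tolerance so arrays tied with the observed one are counted despite rounding
    return sum(p for p in probs if p <= p_obs * (1 + 1e-9))


print(round(hwe_exact_test(233, 385, 129), 3))
```

For the 233/385/129 data this gives a p-value above 0.05, agreeing with the chi-square conclusion.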
Likelihood Ratio Test
$$\Lambda = \frac{L(\hat{\Theta}_r \mid z)}{L(\hat{\Theta} \mid z)}$$
Numerator: maximum of the likelihood function given the data (z) when some parameters are assigned hypothesized values
Denominator: maximum of the likelihood function given the data (z) when there are no restrictions
When the hypothesis is true:
$$LR = -2\ln\Lambda = -2\left[\ln L(\hat{\Theta}_r \mid z) - \ln L(\hat{\Theta} \mid z)\right] \sim \chi^2_{df}$$
df = number of parameters assigned values
Likelihood ratio tests for multinomial proportions are often
called G-tests (for goodness of fit)
Lynch and Walsh Appendix 4
Likelihood Ratio Test for HWE
$$G = -2\sum_{i=1}^{n}\sum_{j=i}^{n} N_{ij}\ln\frac{\hat{N}_{ij}}{N_{ij}}$$
where N̂ij is the expected number and Nij is the observed number of the ijth genotype
Calculations in Excel
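The G statistic can be computed for the genotype counts used in the chi-square example. A minimal biallelic sketch with a hypothetical function name; classes with an observed count of zero contribute nothing, since N ln(N̂/N) → 0 as N → 0.

```python
from math import log


def hwe_g_test(n11: int, n12: int, n22: int) -> float:
    """Likelihood-ratio (G) statistic for HWE at a biallelic locus.

    G = -2 * sum(N_ij * ln(expected_ij / N_ij)); compare with chi-square
    on 1 df because one allele frequency is estimated from the data.
    """
    n = n11 + n12 + n22
    p = (n11 + 0.5 * n12) / n
    q = 1 - p
    observed = [n11, n12, n22]
    expected = [p * p * n, 2 * p * q * n, q * q * n]
    return -2 * sum(o * log(e / o) for o, e in zip(observed, expected) if o > 0)


g = hwe_g_test(233, 385, 129)
print(round(g, 2))
```

For these data G is about 1.96, very close to the Pearson χ², so the conclusion is the same.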
Gametic phase equilibrium
Random association of alleles at different loci (independence):
PAB = pApB
Disequilibrium:
$$D_{AB} = P_{AB} - p_A p_B$$
$$D_{AB} = P_{AB}P_{ab} - P_{Ab}P_{aB}$$

          B        b
A         PAB      PAb      pA
a         PaB      Pab      pa
          pB       pb

Example:
          B        b
A         .40      .10      .50
a         .10      .40      .50
          .50      .50

$$D_{AB} = 0.40 - 0.5 \times 0.5 = 0.15$$
$$D_{AB} = 0.4 \times 0.4 - 0.1 \times 0.1 = 0.15$$
Lynch and Walsh, pg 94-100; Falconer, pg 15-19
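The two expressions for D agree whenever the four haplotype frequencies sum to 1. A minimal sketch using the example table; `disequilibrium` is a hypothetical name.

```python
def disequilibrium(p_AB: float, p_Ab: float, p_aB: float, p_ab: float) -> tuple[float, float]:
    """Return D computed two equivalent ways from haplotype frequencies
    (assumes p_AB + p_Ab + p_aB + p_ab = 1)."""
    p_A = p_AB + p_Ab
    p_B = p_AB + p_aB
    d_cov = p_AB - p_A * p_B              # D = P_AB - p_A p_B
    d_cross = p_AB * p_ab - p_Ab * p_aB   # D = P_AB P_ab - P_Ab P_aB
    return d_cov, d_cross


print(tuple(round(x, 4) for x in disequilibrium(0.40, 0.10, 0.10, 0.40)))  # (0.15, 0.15)
```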
Linkage Disequilibrium
• Nonrandom association of alleles at different loci
– the covariance in frequencies of alleles between the loci
• Refers to frequencies of alleles in gametes (haplotypes)
• May be due to various causes in addition to linkage
– 'gametic phase disequilibrium' is a more accurate term
– 'linkage disequilibrium' (LD) is widely used to describe associations of alleles in the same or in different linkage groups
Linkage Disequilibrium
Gametic types     AB        Ab        aB        ab
Observed          PAB       PAb       PaB       Pab
Expected          pA pB     pA pb     pa pB     pa pb
Disequilibrium    +D        −D        −D        +D

Excess of coupling phase gametes → +D
Excess of repulsion phase gametes → −D
Sources of linkage disequilibrium
• Linkage
• Multilocus selection (particularly with epistasis)
• Assortative mating
• Random drift in small populations
• Bottlenecks in population size
• Migration or admixtures of different populations
• Founder effects
• Mutation
Two locus equilibrium
• For two loci, it may take many generations to reach equilibrium even when there is independent assortment and all other conditions for equilibrium are met
– New gamete types can only be produced when the parent is a double heterozygote

Example with independent assortment:
AB/Ab → 0.5 AB, 0.5 Ab
AB/ab → 0.25 AB, 0.25 aB, 0.25 Ab, 0.25 ab
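The gamete output of a two-locus genotype can be enumerated directly, which shows why only double heterozygotes generate new haplotypes. A minimal sketch, assuming each parental haplotype is written as a two-character string and c is the recombination frequency (c = 0.5 for independent assortment); `gametes` is a hypothetical name.

```python
from collections import Counter


def gametes(hap1: str, hap2: str, c: float) -> dict[str, float]:
    """Gamete frequencies from a diploid with haplotypes hap1/hap2 (e.g. "AB", "ab").

    Parental gametes are produced with total probability 1 - c and
    recombinant gametes with total probability c.
    """
    out = Counter()
    out[hap1] += (1 - c) / 2
    out[hap2] += (1 - c) / 2
    out[hap1[0] + hap2[1]] += c / 2  # recombinant: locus A from hap1, locus B from hap2
    out[hap2[0] + hap1[1]] += c / 2  # recombinant: locus A from hap2, locus B from hap1
    return {g: f for g, f in out.items() if f > 0}


print(gametes("AB", "Ab", 0.5))  # single heterozygote: no new gamete types
print(gametes("AB", "ab", 0.5))  # double heterozygote: Ab and aB appear
print(gametes("AB", "ab", 0.1))  # linked loci: recombinants are rarer
```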
Decay of linkage disequilibrium
• In the absence of linkage, LD decays by one-half with each generation of random mating
c = recombination frequency
$$D_{t+1} = (1 - c)\,D_t \qquad D_t = (1 - c)^t D_0$$
[Figure: disequilibrium (D, from 0.25 to 0) versus generation (0 to 100) for c = .50, .20, .10 and .01]
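Iterating the recurrence reproduces the curves in the figure. A minimal sketch, assuming an infinitely large random-mating population; D0 = 0.25 is chosen to match the top of the plotted axis (the maximum D when all allele frequencies are 0.5), and `ld_decay` is a hypothetical name.

```python
def ld_decay(d0: float, c: float, generations: int) -> list[float]:
    """D_t for t = 0..generations under random mating: D_{t+1} = (1 - c) D_t."""
    d = [d0]
    for _ in range(generations):
        d.append((1 - c) * d[-1])
    return d


for c in (0.50, 0.20, 0.10, 0.01):
    d = ld_decay(0.25, c, 100)
    print(f"c = {c:.2f}: D10 = {d[10]:.4f}, D100 = {d[100]:.4f}")
```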
Factors that delay approach to equilibrium
$$D_t = (1 - c)^t D_0$$
• Linkage
• Small population size – because it reduces the likelihood of obtaining rare recombinants
• Selfing – because it decreases the frequency of double heterozygotes
Implications for breeding
Effect of inbreeding on the frequency of a recombinant genotype
P1 × P2: A1A1B1B1 × A2A2B2B2
F1: A1A2B1B2

F1 gamete    Frequency
A1B1         0.5(1 − c)
A1B2         0.5c
A2B1         0.5c
A2B2         0.5(1 − c)

[Figure: frequency of A1A1B2B2 (0 to 0.25) versus c = recombination frequency (0 to 0.5) for F1-derived inbreds, F2 and F2 (adjusted)]
• Gametic phase disequilibrium that is not due to linkage is eliminated by making the F1 cross
• Recombination occurs during selfing
• There would be greater recombination with additional random mating, but it may not be worth the time and resources
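The F2 and inbred curves can be approximated from the gamete frequencies. A minimal sketch, assuming inbreds are derived by selfing the F1 to complete homozygosity without selection. The F2 value is (0.5c)², since both gametes must be the A1B2 recombinant. The inbred value uses the standard Haldane-Waddington result that the proportion of recombinant inbred lines from selfing is R = 2c/(1 + 2c), which is not stated on the slide; half of those lines are A1A1B2B2. The "F2 (adjusted)" curve is not reproduced, and the function names are hypothetical.

```python
def f2_freq(c: float) -> float:
    """Frequency of A1A1B2B2 in the F2: both gametes must be A1B2 (frequency 0.5c each)."""
    return (0.5 * c) ** 2


def ril_freq(c: float) -> float:
    """Frequency of A1A1B2B2 among inbred lines from repeated selfing of the F1.

    Assumes R = 2c / (1 + 2c) recombinant inbred lines (Haldane-Waddington),
    half of which are A1A1B2B2.
    """
    return c / (1 + 2 * c)


for c in (0.01, 0.1, 0.25, 0.5):
    print(f"c = {c:.2f}  F2 = {f2_freq(c):.4f}  inbreds = {ril_freq(c):.4f}")
```

Repeated selfing gives extra opportunities for recombination, so recombinant inbreds are far more frequent than the corresponding F2 class at every c.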
Effect of mating system on LD decay
$$c_e = c\left(1 - \frac{s}{2 - s}\right)$$
c_e = effective recombination rate
c = recombination frequency
s = the fraction of selfing
[Figure: D' versus generation (0 to 40); legend values 0.05 0.00, 0.05 0.99, 0.25 0.00, 0.25 0.99, 0.50 0.00, 0.50 0.99; curves labeled outcrossing, 99% selfing and no linkage]
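The effective recombination rate shows how strongly selfing slows LD decay. A minimal sketch; reading the legend pairs as (c, s) combinations is an assumption, decay is modeled as D_t = (1 − c_e)^t D_0, and the function names are hypothetical. The term s/(2 − s) is the equilibrium inbreeding coefficient under partial selfing.

```python
from math import log


def effective_recombination(c: float, s: float) -> float:
    """c_e = c * (1 - s / (2 - s)), where s is the fraction of selfing."""
    return c * (1 - s / (2 - s))


def generations_to_halve(c_e: float) -> float:
    """Generations for LD to fall to half its initial value if D_t = (1 - c_e)^t D_0."""
    return log(0.5) / log(1 - c_e)


# (c, s) pairs read from the figure legend (this reading is an assumption)
pairs = [(0.05, 0.00), (0.05, 0.99), (0.25, 0.00), (0.25, 0.99), (0.50, 0.00), (0.50, 0.99)]
for c, s in pairs:
    c_e = effective_recombination(c, s)
    print(f"c = {c:.2f}, s = {s:.2f}: c_e = {c_e:.4f}, "
          f"half-life = {generations_to_halve(c_e):.1f} generations")
```

With 99% selfing, even unlinked loci (c = 0.5) have c_e of only about 0.0099, so LD takes roughly 70 generations to halve.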
Alternative measures of LD
$$r_{AB} = \frac{D_{AB}}{\sqrt{p_A p_a p_B p_b}} \qquad r^2_{AB} = \frac{D^2_{AB}}{p_A p_a p_B p_b}$$
• D is the covariance between alleles at different loci
• Maximum values of D depend on allele frequencies
• It is convenient to consider r² to be the square of the correlation coefficient, but r² can reach a value of 1 only when allele frequencies at the two loci are the same
• r² indicates the degree of association between alleles at different loci due to various causes (linkage, mutation, migration)
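A short computation illustrates why r² cannot reach 1 when allele frequencies differ. In this sketch (hypothetical function name), the second haplotype set has p_A = 0.5 and p_B = 0.25 with D at its largest possible value, yet r² is only 1/3.

```python
def r_squared(p_AB: float, p_Ab: float, p_aB: float, p_ab: float) -> float:
    """r^2 = D^2 / (p_A p_a p_B p_b) from haplotype frequencies."""
    p_A = p_AB + p_Ab
    p_a = p_aB + p_ab
    p_B = p_AB + p_aB
    p_b = p_Ab + p_ab
    d = p_AB * p_ab - p_Ab * p_aB
    return d * d / (p_A * p_a * p_B * p_b)


print(round(r_squared(0.40, 0.10, 0.10, 0.40), 4))  # 0.36
# Unequal allele frequencies (p_A = 0.5, p_B = 0.25), D at its maximum of 0.125:
print(round(r_squared(0.25, 0.25, 0.0, 0.5), 4))  # 0.3333
```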
D – minimum and maximum values
fyi
          B                      b
A         PAB = pApB + D         PAb = pApb − D         pA
a         PaB = papB − D         Pab = papb + D         pa
          pB                     pb

If D > 0, look for the maximum value D can have:
PAb = pApb − D ≥ 0 → D ≤ pApb
PaB = papB − D ≥ 0 → D ≤ papB
→ D ≤ min(pApb, papB)
If D < 0, look for the minimum value D can have:
PAB = pApB + D ≥ 0 → D ≥ −pApB
Pab = papb + D ≥ 0 → D ≥ −papb
→ D ≥ max(−pApB, −papb)
Alternative measures of LD
fyi
$$D' = \frac{D_{AB}}{\min(p_A p_b,\; p_a p_B)} \quad \text{when } D_{AB} > 0$$
$$D' = \frac{D_{AB}}{(-1)\min(p_A p_B,\; p_a p_b)} \quad \text{when } D_{AB} < 0$$
• D' is scaled to have a minimum of 0 and a maximum of 1
• D' = 1 indicates that one of the haplotypes is missing
• D' indicates the degree to which gametes exhibit the maximum potential disequilibrium for a given array of allele frequencies
• D' is very unstable for small sample sizes, so r² is more widely used to measure LD
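D' divides D by the bound it could reach given the allele frequencies, using the minimum and maximum values derived from the haplotype table. A minimal sketch; `d_prime` is a hypothetical name.

```python
def d_prime(p_AB: float, p_Ab: float, p_aB: float, p_ab: float) -> float:
    """Lewontin's D' from haplotype frequencies, scaled to lie between 0 and 1."""
    p_A = p_AB + p_Ab
    p_a = p_aB + p_ab
    p_B = p_AB + p_aB
    p_b = p_Ab + p_ab
    d = p_AB * p_ab - p_Ab * p_aB
    if d > 0:
        d_max = min(p_A * p_b, p_a * p_B)    # D <= min(pA pb, pa pB)
        return d / d_max
    if d < 0:
        d_min = max(-p_A * p_B, -p_a * p_b)  # D >= max(-pA pB, -pa pb), a negative number
        return d / d_min
    return 0.0


print(round(d_prime(0.40, 0.10, 0.10, 0.40), 4))  # 0.6
print(d_prime(0.25, 0.25, 0.0, 0.5))              # 1.0 (haplotype aB is missing)
print(round(d_prime(0.10, 0.40, 0.40, 0.10), 4))  # 0.6 (excess of repulsion, D < 0)
```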
Testing for gametic phase disequilibrium
• Best when you can determine haplotypes
– inbred lines or doubled haploids
– haplotypes of double heterozygotes inferred from progeny tests
• Use a goodness-of-fit test if the sample size is large
– Chi-square
– G-test (likelihood ratio)
• Use Fisher's exact test for smaller sample sizes
• Use a permutation test for multiple alleles
• Need a fairly large sample to have reasonable power for LD (~200 individuals or more)
See Weir (1996) pg 112-133 for more information
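When haplotypes are known (for example, from inbred lines), LD between two biallelic loci can be tested on the 2×2 table of haplotype counts. A minimal sketch: the Pearson chi-square on that table equals N·r², and Fisher's exact test conditions on the allele counts. The counts from 100 lines are hypothetical, chosen to match the 0.40/0.10 example, and the function names are illustrative.

```python
from math import comb


def ld_chi_square(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> float:
    """Pearson chi-square (1 df) for a 2x2 table of haplotype counts; equals N * r^2."""
    n = n_AB + n_Ab + n_aB + n_ab
    n_A, n_a = n_AB + n_Ab, n_aB + n_ab
    n_B, n_b = n_AB + n_aB, n_Ab + n_ab
    d = n_AB * n_ab - n_Ab * n_aB
    return n * d * d / (n_A * n_a * n_B * n_b)


def ld_fisher_exact(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> float:
    """Two-sided Fisher's exact test with the allele counts (table margins) fixed.

    The p-value is the total probability of all tables that are no more probable
    than the observed one. Hypergeometric weights are exact integers, so ties
    are compared exactly.
    """
    n_A, n_a = n_AB + n_Ab, n_aB + n_ab
    n_B = n_AB + n_aB
    n = n_A + n_a
    lo, hi = max(0, n_B - n_a), min(n_A, n_B)
    weights = {x: comb(n_A, x) * comb(n_a, n_B - x) for x in range(lo, hi + 1)}
    w_obs = weights[n_AB]
    return sum(w for w in weights.values() if w <= w_obs) / comb(n, n_B)


# Hypothetical haplotype counts from 100 inbred lines (AB, Ab, aB, ab)
counts = (40, 10, 10, 40)
print(ld_chi_square(*counts))  # 36.0
print(ld_fisher_exact(*counts))
```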
Depiction of Linkage Disequilibrium
Disequilibrium matrix for polymorphic sites within sh1 in maize
[Figure: pairwise LD matrix showing r² and the probability value from Fisher's exact test]
Flint-Garcia et al., 2003. Annual Review of Plant Biology 54: 357-374.
Extent of LD in Maize
Average LD decay distance is 5–10 kb
[Figure: r² across the 10 maize chromosomes]
Linkage disequilibrium across the 10 maize chromosomes measured with 914 SNPs in a global collection of 632 maize inbred lines.
Yan et al. 2009. PLoS ONE 4(12): e8451
Extent of LD in Barley
Elite North American Barley
Average LD decay distance is ~5 cM
[Figure: r² in elite North American barley, with panels for no adjustment for population structure and adjusted for population structure]
Other studies
• Wild barley – LD decays within a gene
• Landraces ~90 kb
• European germplasm – significant LD: mean 3.9 cM, median 1.16 cM
Waugh et al., 2009, Current Opinion in Plant Biology 12:218-222
References on linkage disequilibrium
Flint-Garcia et al., 2003. Structure of linkage disequilibrium in plants. Annual Review
of Plant Biology 54: 357–374.
Gupta et al., 2005. Linkage disequilibrium and association studies in higher plants:
present status and future prospects. Plant Molecular Biology 57: 461–485.
Mangin et al., 2012. Novel measures of linkage disequilibrium that correct the bias
due to population structure and relatedness. Heredity 108: 285–291.
Slatkin, M. 2008. Linkage disequilibrium – understanding the evolutionary past and
mapping the medical future. Nature Reviews Genetics 9: 477–485.
Waugh, R., J.-L. Jannink, G.J. Muehlbauer, L. Ramsay. 2009. The emergence of whole genome association scans in barley. Current Opinion in Plant Biology 12(2): 218–222.
Yan, J., T. Shah, M.L. Warburton, E.S. Buckler, M.D. McMullen, et al. 2009. Genetic characterization and linkage disequilibrium estimation of a global maize collection using SNP markers. PLoS ONE 4(12): e8451.
Zhu et al., 2008. Status and prospects of association mapping in plants. The Plant
Genome 1: 5–20.