3 - Nature

Supplementary information
(Jarne & David - Quantifying inbreeding in natural populations of hermaphroditic
organisms)
(all in Windows Word format)
Supplementary Appendix 1 (including 2 tables) – The three main molecular markers
(allozymes, microsatellites and AFLPs) used for estimating the selfing rate, and associated
technical problems.
Supplementary Appendix 2 (including 3 figures) – The sampling properties of an estimator
of the selfing rate in the single-locus case.
Supplementary Appendix 3 (including 2 figures) – Joint estimation of the selfing rate and
inbreeding depression.
Supplementary Appendix 4 (including 1 table) – Accounting for the bias due to partial
dominance when estimating the inbreeding coefficient: a general single-locus model.
Supplementary Appendix 5 (including 1 figure) – Estimating the selfing rate from linkage
disequilibrium data.
Supplementary Appendix 6 (including 1 table and 1 figure) – The progeny-arrays approach
(PAA): basic conditions and some pitfalls associated with technical problems.
References (to all appendices)
Supplementary Appendix 1 – The three main molecular markers (allozymes, microsatellites
and AFLPs) used for estimating the selfing rate, and associated technical problems.
Some general characteristics of allozymes, microsatellites and AFLPs are provided in Table 1.
More details are given in Avise (2000) and Lowe et al. (2004, chapter 2). These
characteristics should be thoroughly considered before launching a study on selfing rates. A
critical step is to produce data that are as reliable as possible, which requires controls at several steps
(Lowe et al., 2004; Hoffman and Amos, 2005; Pompanon et al., 2005). The influence of a
given marker's biological characteristics, as well as of associated technical problems, on the
estimation of selfing rates is also discussed in the main text. The technical problems are detailed
in Table 2, and their influence on the estimation of the selfing rate is explained in the main text, in
Supplementary Appendix 4 (estimates based on the inbreeding coefficient) and in
Supplementary Appendix 6 (progeny-arrays analyses).
Table 1. Some general characteristics of allozymes, microsatellites and AFLPs. SAD = short allele dominance; * = low; ** = intermediate; *** = high. The financial costs include development and subsequent use. The table has been built with diploid organisms in mind, and the situation is generally more complex with polyploids. (a) Depends on the number of primer pairs used; (b) the Esterase family is an example; (c) some technical problems are presumably more acute with dinucleotide motifs than with longer motifs; (d) refers to erroneous reading of an allele (band); (e) refers to the influence of environmental (e.g., room temperature) and technical (e.g., chemicals, machines) factors on result quality; (f) more automated practices lead to less direct access to primary data.
Marker | Allozymes | Microsatellites | AFLPs
Dominance | codominant | codominant | dominant
Number of loci | up to a few tens | up to a few tens | up to a few hundreds (a)
Number of alleles / locus | often < 5 | often < 10 | 2
Mendelian transmission | yes | yes | not always
Allelism (problems with) | with multi-gene families (b) | no | when more than 2 alleles / locus
Biological material: amount | g | ng to µg | ng to µg
Biological material: state / storing | fresh / frozen | fresh / frozen / alcohol | fresh / frozen / alcohol
Technical problems | | (c) |
Null alleles | yes | yes | irrelevant
Band stuttering | no | yes | no
Fuzzy bands | yes | yes | yes
SAD | irrelevant | yes | irrelevant
Misreading (d) | yes | yes | yes
Repeatability | ** | ** | *
Environmental influence (e) | * | ** | ***
Technical cost | * | ** | **
Automatization (f) | * | ** / *** | ** / ***
Financial costs | * | ** / *** | **
References | Richardson et al. (1986); Pasteur et al. (1987) | Jarne and Lagoda (1996); Estoup and Angers (1998); Ellegren (2004) | Vos et al. (1995)
Table 2. Some technical problems encountered with the three molecular markers considered here, allozymes (Al), microsatellites (M) and AFLP.
For null alleles, SAD and band stuttering, appropriate methods can be used to analyse the source and magnitude of heterozygote deficiencies
(e.g. Van Oosterhout et al., 2004; David et al., 2007).
Source | Definition | Marker | Solution
Null alleles | Alleles with no electrophoretic expression (phenotype), because of failed primer amplification at M loci or no enzymatic reaction at Al loci. Homozygous individuals (say 00) do not display any pattern, and heterozygous individuals (say B0) are read as homozygous for the other allele (BB). | Al, M | Design more appropriate PCR primers (M)
Short allele dominance | Preferential PCR amplification of short alleles in heterozygotes, such that heterozygotes are misscored as homozygotes for the shorter allele. | M | Manipulate PCR conditions
Band stuttering | Stuttered patterns at M loci result from additional PCR products that differ in size from the actual allele by small integer numbers of repeat units (e.g., two base pairs for a dinucleotide). Heterozygotes may be misscored as homozygotes, most probably when alleles are separated by a single repeat unit. | M | Manipulate PCR conditions
Fuzzy bands | Bands (signals) wider than expected and with blurred outlines, possibly due to too much PCR product or overly active enzymatic reactions. | Al, M, AFLP | Manipulate PCR conditions (M, AFLP) or stop enzymatic reactions (Al)
Miscoring | Erroneous reading of bands, leading to the creation of new (imaginary) alleles or to misreading of an already-existing allele. | Al, M | Pool alleles with similar mobilities
Supplementary Appendix 2 – The sampling properties of an estimator of the selfing rate in
the single-locus case.
Our goal is to examine the sampling properties, especially the variance, of estimators of the
selfing rate derived from the inbreeding coefficient (F). Several estimators of F are available
(see e.g., Curie-Cohen, 1982), but we focus on the ‘total heterozygosity’ estimator ($\hat{f}_1$ in
Curie-Cohen, 1982), which on the whole is the least biased and exhibits the lowest variance,
and is the one used in this review. Let us assume an inbred population (inbreeding coefficient
F) of infinite size with no mutation, migration or selection. Assume also a locus with k
codominant alleles Ai with frequencies pi, and a sample of n individuals.
Genotype | AiAi | AiAj (i ≠ j)
Observed number | $a_{ii}$ | $2a_{ij}$
Expected number | $\left(p_i^2 + p_i(1-p_i)F\right)n$ | $2p_ip_j(1-F)n$
The ‘total heterozygosity’ estimator is defined as:
$$\hat{F} = \frac{\hat{H}_e - \hat{H}_o}{\hat{H}_e} \qquad (1)$$
where $\hat{H}_o = \sum_{i<j} 2a_{ij}/n$ is the observed frequency of heterozygotes and $\hat{H}_e = \sum_{i<j} 2\hat{p}_i\hat{p}_j$ the
expected frequency under random mating. Assuming that n is large enough, it is possible to
derive an approximate expression for $\mathrm{Var}(\hat{F})$, based on the Delta method (see e.g. Appendix 1
in Lynch and Walsh, 1998) and the variances and covariance of the numerator and
denominator of equation (1) (Curie-Cohen, 1982):
$$\mathrm{Var}(\hat{F}) = \frac{(1-F)\left[H_e - (1-F)H_e^2 - (1-F)^2\sum_{i<j} 2p_ip_j(p_i-p_j)^2\right]}{nH_e^2} + O\!\left(\frac{1}{n^2}\right) \qquad (2).$$
When alleles are equifrequent ($p_i = 1/k$ for all i), this simplifies to:
$$\mathrm{Var}(\hat{F}) = \frac{(1-F)\left[1+F(k-1)\right]}{n(k-1)} = \frac{(1-F)\left[1-H_e(1-F)\right]}{nH_e} \qquad (3).$$
Note that $\mathrm{Var}(\hat{F}) \approx \frac{1}{n(k-1)} = \frac{1-H_e}{nH_e}$ when F is small. It can be shown that:
$$\mathrm{Var}(\hat{S}) \approx \frac{4\,\mathrm{Var}(\hat{F})}{(1+\hat{F})^4} \qquad (4)$$
which, when alleles are equifrequent, gives:
$$\mathrm{Var}(\hat{S}) = \frac{(1-S)(2-S)^2\left[2-S-2H_e(1-S)\right]}{2nH_e} \qquad (5).$$
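The variance formulas are straightforward to evaluate numerically. The Python sketch below is a minimal illustration, not the authors' code: the function names are hypothetical, it evaluates only the first-order terms (the $O(1/n^2)$ remainder of equation (2) is ignored), and it implements equation (2) for arbitrary allele frequencies, the conversion of equation (4) (which follows from S = 2F/(1 + F)), and the equifrequent closed form of equation (5).

```python
"""Sketch: first-order sampling variances of F-hat and S-hat (equations (2) to (5))."""


def gene_diversity(p):
    """Expected heterozygosity He = 1 - sum(p_i^2)."""
    return 1.0 - sum(x * x for x in p)


def var_F(F, p, n):
    """Equation (2): delta-method variance of the total-heterozygosity estimator."""
    he = gene_diversity(p)
    # Sum over allele pairs i < j of 2 p_i p_j (p_i - p_j)^2; zero when alleles are equifrequent.
    d = sum(2 * p[i] * p[j] * (p[i] - p[j]) ** 2
            for i in range(len(p)) for j in range(i + 1, len(p)))
    return (1 - F) * (he - (1 - F) * he ** 2 - (1 - F) ** 2 * d) / (n * he ** 2)


def var_S(F, p, n):
    """Equation (4): S = 2F / (1 + F), hence Var(S-hat) ~ 4 Var(F-hat) / (1 + F)^4."""
    return 4 * var_F(F, p, n) / (1 + F) ** 4


def var_S_equifrequent(S, he, n):
    """Equation (5): closed form of equation (4) when all alleles are equifrequent."""
    return (1 - S) * (2 - S) ** 2 * (2 - S - 2 * he * (1 - S)) / (2 * n * he)


if __name__ == "__main__":
    # The two allele-frequency configurations of Figure 1 (n = 100).
    for label, p in (("A", [0.98, 0.01, 0.01]), ("B", [0.33, 0.33, 0.34])):
        for F in (0.0, 0.2, 0.5, 0.8):
            print(label, F, round(var_F(F, p, 100), 4), round(var_S(F, p, 100), 4))
```

With three equifrequent alleles, `var_F` reduces to equation (3) with k = 3, and `var_S` agrees with `var_S_equifrequent` evaluated at S = 2F/(1 + F), which is a convenient check that equations (2) to (5) are mutually consistent.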
The variances of the inbreeding coefficient and of the selfing rate are given in Figure 1 for a
three-allele locus in two contrasting situations with regard to allelic frequencies. An interesting
result is that the variance of Ŝ can be quite substantial when inbreeding is limited and allelic
frequencies are unbalanced. The influence of gene diversity on the variance of the selfing
rate can be evaluated using equations (3) and (4) (equifrequent alleles) for various values of
the inbreeding coefficient (Figure 2). There is a clear benefit to using polymorphic
loci, especially in predominantly outcrossing populations.
When several (L) loci are available, the inbreeding coefficient can be estimated as an
average value over loci. The sampling variance of F̂ decreases with increasing L, and so does
the variance of Ŝ. However, the decrease in variance is less than linear in L when L and/or S
are high. This is of limited importance for large values of S because the single-locus variance
is already small (Figures 1 and 2). On the other hand, when S is small, one may ask whether the sampling
variance is more efficiently reduced by increasing n or L. In
such a situation, the population is essentially composed of two classes of individuals (selfed
and outcrossed), and the frequency of selfed individuals in a sample of n individuals is a
binomial variable with variance S(1 − S)/n. This variance does not depend on the number of
loci used. Another source of variance derives from determining the selfed versus outcrossed
status of each individual from its genotype. This depends on the number and genetic
diversity of loci. The total variance is the sum of these two sources of variance. This is
illustrated in Figure 3, in which the two sources of variance are presented as a function of S (S
< 0.3) in the single-locus case. When He increases (compare the situations with He = 0.66 and
1), the share of the total variance attributable to the binomial variance increases, and it becomes worthwhile to
increase n. When He is low, increasing L will provide more gain than when He is high. In
general it seems preferable to increase n, because the total variance decreases as 1/n, while
only the non-binomial component decreases when L or He increases. However, it is
sometimes less costly to score more loci than more individuals (e.g., when several loci are
scored on the same electrophoresis gel or co-amplified in the same PCR mix).
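The partition into a binomial component and a genotype-assignment component can be made concrete for the single-locus, equifrequent case of Figure 3. The sketch below (hypothetical function names) computes the total variance from equation (5) and the binomial variance S(1 − S)/n, and treats their difference as the component due to assigning each individual to the selfed or outcrossed class; reading the difference this way is our interpretation of the text, not a formula given by the authors.

```python
"""Sketch: binomial versus genotype-assignment components of Var(S-hat) (n = 30, as in Figure 3)."""


def total_var(S, he, n):
    """Equation (5): single-locus variance of S-hat with equifrequent alleles."""
    return (1 - S) * (2 - S) ** 2 * (2 - S - 2 * he * (1 - S)) / (2 * n * he)


def binomial_var(S, n):
    """Variance of the frequency of selfed individuals among n sampled individuals."""
    return S * (1 - S) / n


if __name__ == "__main__":
    n = 30
    for S in (0.05, 0.1, 0.2, 0.3):
        b = binomial_var(S, n)
        for he in (0.66, 1.0):
            t = total_var(S, he, n)
            # Share of the total variance that adding loci or using more diverse loci cannot remove.
            print(f"S={S:.2f} He={he:.2f} total={t:.4f} binomial={b:.4f} share={b / t:.2f}")
```

Both components scale as 1/n, whereas only the non-binomial part shrinks as He increases, which is why the binomial share grows from He = 0.66 to He = 1.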
Figure 1. Variances of the inbreeding coefficient F̂ (empty squares) and the selfing rate Ŝ (full
squares) as a function of the inbreeding coefficient, using equations (2) and (4) in the three-allele case. n = 100. A. p1 = 0.98, p2 = p3 = 0.01. B. p1 = p2 = 0.33, p3 = 0.34.
[Figure 1, panels A and B: variance (y-axis) against F (x-axis, 0 to 1).]
Figure 2. Variance of the selfing rate Ŝ (equation (4)) as a function of gene diversity (He)
when the inbreeding coefficient is 0.01 (triangles), 0.2 (squares) and 0.8 (circles),
corresponding to selfing rates of 0.02, 0.33 and 0.89 respectively. n = 100.
[Figure 2: Var(Ŝ) (y-axis) against He (x-axis, 0 to 1).]
Figure 3. Variance of the selfing rate Ŝ as a function of S (equations (3) and (4)) when alleles
are equifrequent. The total variance is given for He = 0.66 (black triangles) and 1 (black
circles), and the binomial variance (independent of He) is indicated by white squares.
Sample size is 30.
[Figure 3: Var(Ŝ) (y-axis) against S (x-axis, 0 to 0.3).]
Supplementary Appendix 3 – Joint estimation of the selfing rate and inbreeding depression.
Ritland (1990) proposed to jointly estimate the selfing rate and inbreeding depression using
an experimental design in which a classical progeny-arrays (PA) analysis (Gn parents and
their Gn+1 offspring) is combined with estimation of the inbreeding coefficient in Gn+1 adults.
This is illustrated in Figure 1, where inbreeding is given as a function of time for partially
selfing adults (generation n), their offspring and adults of the next generation
(generation n+1). Partial selfing in Gn adults increases the inbreeding coefficient at
fertilization (corresponding to the primary selfing rate). Inbreeding is then reduced by natural
selection (inbreeding depression). The effect is stronger in outcrossers than in selfers (lower
and upper parts of the figure, respectively). The adult inbreeding coefficient may differ between
successive generations, which constitutes a departure from inbreeding equilibrium
(outcrossing situation in Figure 1).
Ritland (1990) derived a simple expression relating S, F and inbreeding depression (δ),
assuming mixed mating and inbreeding equilibrium:
$$\delta = \frac{S(1+F) - 2F}{S(1-F)} \qquad (1).$$
With no inbreeding depression (δ = 0), equation (1) reduces to F = S/(2 − S), i.e. equation (1) of the main text. This formula can be used to depict
the relationship between the selfing rate (S) and the inbreeding coefficient (F) in adults for
various values of inbreeding depression (Figure 2). It suggests that progenies should be
sampled as early as possible in the life cycle in order to approach the primary selfing rate,
especially when inbreeding depression is strong.
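Solving equation (1) for F gives the adult inbreeding coefficient expected for a given primary selfing rate and inbreeding depression, F = S(1 − δ)/[2 − S(1 + δ)], which is one way of tracing curves such as those of Figure 2. The minimal Python sketch below (hypothetical function names) implements both directions under the same assumptions of mixed mating and inbreeding equilibrium.

```python
"""Sketch: Ritland's (1990) relation between S, adult F and inbreeding depression (equation (1))."""


def inbreeding_depression(S, F):
    """Equation (1): delta = [S(1 + F) - 2F] / [S(1 - F)], for S > 0 and F < 1."""
    return (S * (1 + F) - 2 * F) / (S * (1 - F))


def adult_F(S, delta):
    """Equation (1) solved for F: F = S(1 - delta) / [2 - S(1 + delta)], for delta < 1."""
    return S * (1 - delta) / (2 - S * (1 + delta))


if __name__ == "__main__":
    # Adult F for the inbreeding-depression values shown in Figure 2.
    for delta in (0.0, 0.5, 0.75, 0.9):
        print(delta, [round(adult_F(S, delta), 3) for S in (0.2, 0.4, 0.6, 0.8, 1.0)])
```

For a fixed S, a larger δ gives a smaller adult F, so sampling late in the life cycle would make a population look more outcrossing than it is at fertilization.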
Figure 1. Variation of the inbreeding coefficient F as a function of time in two successive
generations. Selfers and outcrossers are distinguished. Four-branch stars indicate the stage at
which progenies are generally genotyped in progeny-arrays (PA) analyses. Inbreeding
depression, indicated by arrows and measured through Ritland's method (Ritland, 1990), covers
the period from these stars to Gn+1 adults. For the sake of clarity, a linear decline of inbreeding
depression is assumed.
[Figure 1: F (y-axis) against time (x-axis: adults Gn, fertilization, juveniles Gn+1, adults Gn+1), with separate curves for selfers and outcrossers, the sampling stage for PA marked, and adult values Fn and Fn+1.]
Figure 2. Relationship between the adult inbreeding coefficient (F) and the selfing rate (S) using
equation (1). The values of inbreeding depression are, from right to left, 0, 0.5, 0.75 and 0.9.
[Figure 2: S (y-axis, 0 to 1) against F (x-axis, 0 to 1).]
Supplementary Appendix 4 – Accounting for the bias due to partial dominance when
estimating the inbreeding coefficient: a general single-locus model.
Let us assume a locus with n alleles. The frequency of allele Ai is pi. The population
considered is at inbreeding equilibrium with actual inbreeding coefficient F. The actual
heterozygosity and gene diversity at this locus are Ho and He. The observed values
of these three parameters are F*, Ho* and He*, and they might differ from the actual values for
various technical reasons (see main text and Supplementary Appendix 1). For parameter X,
the bias ΔX is defined as ΔX = X* − X. The relationship F = 1 − Ho/He holds for both actual
and observed values, and can be used to derive the bias in F:
$$\Delta F = (1-F)\left[1 - \frac{1 + \Delta H_o/H_o}{1 + \Delta H_e/H_e}\right] \qquad (1).$$
The relative biases in observed heterozygosity (ΔHo/Ho) and gene diversity (ΔHe/He) depend
on locus characteristics (number and frequencies of alleles) and on the kind of technical
artefact. Such artefacts can be considered as various forms of dominance. We consider a
general model under which heterozygotes AiAj are read as homozygotes AiAi with probability
$\alpha_{ji}$ and as homozygotes AjAj with probability $\alpha_{ij}$ ($0 < \alpha_{ji} + \alpha_{ij} \le 1$). This means that
the observed heterozygosity will always be underestimated (negative bias). The actual and
observed frequencies of genotypes AiAj are Pij and Pij* respectively. Partial dominance
decreases the observed frequency of heterozygous genotypes: $P_{ij}^* = P_{ij}(1 - \alpha_{ij} - \alpha_{ji})$.
Summing over all heterozygotes, it can be shown that:
$$\frac{\Delta H_o}{H_o} = -\frac{1}{H_o}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} P_{ij}(\alpha_{ij} + \alpha_{ji}) = -\varepsilon \qquad (2)$$
where ε is the average apparent loss of heterozygotes due to partial dominance. Partial
dominance also modifies apparent allelic frequencies. For example, an allele i dominant over
most (or all) other alleles ($\alpha_{ji} \gg \alpha_{ij}$ for all j) will increase in apparent frequency. The
frequency variation is given by:
$$\Delta p_i = \frac{1}{2}\sum_{j=1,\, j\neq i}^{n} P_{ij}(\alpha_{ji} - \alpha_{ij}) \qquad (3).$$
Equation (3) can be used to derive the bias in expected heterozygosity:
$$\Delta H_e = -\left[\sum_{i=1}^{n} \Delta p_i^2 + 2\sum_{i=1}^{n} p_i\,\Delta p_i\right] \qquad (4).$$
The first term of equation (4) represents a variance in dominance among alleles and is
always positive, while the second term represents a covariance between the average
dominance level of an allele and its frequency. In most situations, this second term is expected to be
near zero, so the bias in gene diversity will usually be negative. Equations (2) to (4) can be
used to solve equation (1) and find the deviation due to partial dominance. Because both
observed and expected heterozygosities are underestimated, the two effects oppose each other
when computing ΔF (equation (1)). However, the relative bias is smaller for He than for Ho (|ΔHe/He| < |ΔHo/Ho|), because $(\alpha_{ji} - \alpha_{ij})^2 \ll (\alpha_{ji} + \alpha_{ij})$,
and F is therefore overestimated (ΔF > 0). This general framework allows us to analyse several
situations encountered by experimenters:
- Random heterozygote loss ($\alpha_{ji} = \alpha_{ij} = \varepsilon/2$ for all i, j). This will happen, for example,
when some alleles are not amplified or lose enzymatic activity by chance.
- Hierarchical dominance series ($\alpha_{ji} = \varepsilon$ if i > j, and 0 if j > i). Short-allele dominance
is a slightly more general case: $\alpha_{ij} = g(j - i)$, with g(x) an increasing function of x verifying
0 ≤ g(x) ≤ 1 when 0 ≤ x ≤ n − 1 (i < j). This allows for various shapes of curve (e.g., linear,
quadratic). Band stuttering at microsatellite loci can be modelled as $\alpha_{ij} > 0$ when j = i + 1,
and $\alpha_{ij} = 0$ otherwise.
- Null alleles (for all j, $\alpha_{ij} = 0$ if i < n, and $\alpha_{nj} = 1$). By convention, all null alleles are
lumped together as allele n.
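The general model lends itself to direct computation. The Python sketch below is a minimal, hypothetical implementation of equations (1) to (4), under stated assumptions: heterozygote frequencies follow inbreeding equilibrium ($P_{ij} = 2p_ip_j(1-F)$), the caller supplies a matrix `alpha` in which `alpha[i][j]` is the probability that heterozygote AiAj is scored as AjAj, and the correction for discarded null homozygotes mentioned in the text is not included (so the null-allele case is not reproduced). The bias in S is obtained from S = 2F/(1 + F). All function names are ours.

```python
"""Sketch: bias in Ho, He, F and S under partial dominance (equations (1) to (4))."""


def dominance_bias(p, F, alpha):
    """Return (dHo/Ho, dHe/He, dF) for allele frequencies p, inbreeding F and
    alpha[i][j] = probability that heterozygote AiAj is scored as AjAj."""
    n = len(p)
    he = 1.0 - sum(x * x for x in p)
    ho = (1 - F) * he
    # Frequency of each heterozygote AiAj (i != j) at inbreeding equilibrium.
    P = [[2 * p[i] * p[j] * (1 - F) if i != j else 0.0 for j in range(n)] for i in range(n)]
    # Equation (2): relative loss of heterozygotes.
    loss = sum(P[i][j] * (alpha[i][j] + alpha[j][i])
               for i in range(n) for j in range(i + 1, n))
    rel_ho = -loss / ho
    # Equation (3): shift in apparent allele frequencies.
    dp = [0.5 * sum(P[i][j] * (alpha[j][i] - alpha[i][j]) for j in range(n) if j != i)
          for i in range(n)]
    # Equation (4): bias in gene diversity.
    d_he = -(sum(d * d for d in dp) + 2 * sum(x * d for x, d in zip(p, dp)))
    rel_he = d_he / he
    # Equation (1): bias in F.
    d_f = (1 - F) * (1 - (1 + rel_ho) / (1 + rel_he))
    return rel_ho, rel_he, d_f


def selfing_bias(F, d_f):
    """Convert a bias in F into a bias in S, using S = 2F / (1 + F)."""
    return 2 * (F + d_f) / (1 + F + d_f) - 2 * F / (1 + F)


def random_loss(n, eps):
    """alpha_ij = alpha_ji = eps / 2 for all i != j."""
    return [[eps / 2 if i != j else 0.0 for j in range(n)] for i in range(n)]


def hierarchical(n, eps):
    """alpha_ij = eps when i < j: the higher-indexed allele masks the lower one."""
    return [[eps if i < j else 0.0 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    p, F, eps = [0.25] * 4, 0.1, 0.2
    print(dominance_bias(p, F, random_loss(4, eps)))
    print(dominance_bias(p, F, hierarchical(4, eps)))
```

With random heterozygote loss the sketch returns ΔHe = 0 and ΔF = ε(1 − F), and with a hierarchical series on equifrequent alleles the relative bias in He is second order in ε, as stated in the text.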
Formulas for ΔHo/Ho, ΔHe/He, ΔF and ΔS are provided in Table 1 (exact formulas are
given together with first-order approximations). Note that for hierarchical dominance, we
consider the simple situation of equifrequent alleles; the expressions remain approximately
valid when this assumption is relaxed. For null alleles, we introduced a minor correction
to equations (1) to (4) to take into account the fact that null homozygotes will probably be
discarded from actual datasets. Note that these formulas do not account for sampling error in
allelic and genotypic frequencies.
Table 1. Formulas for ΔHo / Ho (always < 0), ΔHe / He (< 0, except for random heterozygote
loss), ΔF and ΔS (always > 0). pn is the frequency of null alleles. Note that the formula for kn
assumes equifrequent alleles.
Bias | Random heterozygote loss | Hierarchical dominance | Null alleles
ΔHo / Ho | $-\varepsilon$ | $-\varepsilon$ | $-\left(\frac{2}{H_e} - F\right)p_n + O(p_n^2)$
ΔHe / He | 0 | $-k_n H_o (1-F)\varepsilon^2$ | $-\frac{2(1-H_e)}{H_e}\,p_n + O(p_n^2)$
ΔF | $\varepsilon(1-F)$ | $\varepsilon(1-F) + O(\varepsilon^2)$ | $(2-F)(1-F)p_n + O(p_n^2)$
ΔS | $\frac{\varepsilon(2-S)(1-S)}{1+\varepsilon(1-S)}$ | $\frac{\varepsilon(2-S)(1-S)}{1+\varepsilon(1-S)} + O(\varepsilon^2)$ | $\frac{(2-S)(4-3S)(1-S)p_n}{(2-S)+(1-S)(4-3S)p_n} + O(p_n^2)$
with $k_n = \frac{\sum_{i=1}^{n}(n+1-2i)^2}{n^2(n-1)^2}$.
Supplementary Appendix 5 – Estimating the selfing rate from linkage disequilibrium data.
Cutter (2006) estimated the selfing rate from its long-term effect on recombination. Linkage
disequilibrium can be estimated from r², the squared correlation coefficient between pairs of
nucleotide sites. In a population at drift/recombination equilibrium:
$$r^2 = \frac{1}{1 + 4N_e c_e} \qquad (1)$$
with Ne the effective population size and ce the effective recombination rate. In an inbreeding
population with inbreeding coefficient F, $c_e = c(1-F)$. At inbreeding equilibrium,
$F = S/(2-S)$. This provides an estimate of the outcrossing rate as:
$$1 - S = \frac{1 - r^2}{r^2(1 + 8N_e c) - 1} \qquad (2).$$
Note that this equation does not provide meaningful estimates of S when $r^2 < 1/(1 + 4N_e c)$ (i.e. $r^2 < 1/(1 + 2N_e)$ when c = 0.5),
and negative values of 1 − S can easily be obtained. 1 − S is presented as a function of r² for
various values of Ne in Figure 1. The conditions of the model will clearly not always be
fulfilled. It is also likely that r² has a large variance (Hudson, 2001). This
method might nevertheless be useful in highly selfing species for which both sequence data and
estimates of recombination rate and effective population size are available. For example, Cutter
(2006) used it in the nematode Caenorhabditis elegans, and obtained estimates at least an
order of magnitude lower than direct estimates.
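Equations (1) and (2) can be checked against each other numerically. The sketch below (hypothetical function names) computes the r² expected under equation (1) for given S, Ne and c at inbreeding equilibrium, and inverts it with equation (2); as a design choice of ours, it returns None below the threshold r² = 1/(1 + 4Ne c), where equation (2) yields values of 1 − S outside [0, 1].

```python
"""Sketch: outcrossing rate from linkage disequilibrium (equations (1) and (2))."""


def expected_r2(S, Ne, c):
    """Equation (1) with c_e = c (1 - F) and F = S / (2 - S) at inbreeding equilibrium."""
    F = S / (2 - S)
    return 1 / (1 + 4 * Ne * c * (1 - F))


def outcrossing_from_r2(r2, Ne, c):
    """Equation (2): 1 - S = (1 - r^2) / [r^2 (1 + 8 Ne c) - 1].

    Returns None when r^2 < 1 / (1 + 4 Ne c), where the estimate is not meaningful."""
    if r2 < 1 / (1 + 4 * Ne * c):
        return None
    return (1 - r2) / (r2 * (1 + 8 * Ne * c) - 1)


if __name__ == "__main__":
    c = 0.5  # as assumed in Figure 1
    for Ne in (50, 1000, 100000):
        for r2 in (0.01, 0.1, 0.5, 0.9):
            print(Ne, r2, outcrossing_from_r2(r2, Ne, c))
```

Running the loop shows how strongly the estimate depends on the assumed Ne: the same r² maps onto very different outcrossing rates, which is one reason the method requires independent estimates of recombination and effective size.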
Figure 1. The outcrossing rate (log scale) as a function of r² using equation (2) and assuming
c = 0.5. From top to bottom, Ne = 50, 1000 and 100000.
[Figure 1: 1 − S (y-axis, log scale from 10^-8 to 1) against r² (x-axis, 0 to 1).]
Supplementary Appendix 6 – The progeny-arrays approach (PAA): basic conditions and
some pitfalls associated with technical problems.
1. The basic model
A detailed presentation of the basic model and some of its early extensions is given in Brown
et al. (1989). We briefly review it, and present recent developments. The PAA is based on the
comparison of mother and offspring genotypes. Its logical underpinnings can be explained
using the simple one-locus, two-allele case, with alleles A1 and A2 at frequencies p and q
respectively. The expected number of offspring of each genotype is derived from the mixed-mating model with selfing rate S. For example, an A1A1 mother will have A1A1 progeny
with probability $1 - (1-S)q$ and A1A2 progeny with probability $(1-S)q$. The expected and
observed numbers of offspring can then be used to build the likelihood of a given array. This
can be generalized over several loci. Ritland (2002) proposed the following general
formulation. Let $P_{kl|ij,S}$ (resp. $P_{kl|ij,1-S}$) be the probability of observing a progeny with genotype
AkAl given parental genotype AiAj, when the progeny results from selfing (resp. outcrossing). Under a mixed-mating system, the
multilocus probability becomes:
$$P_{kl|ij}(S) = S\prod_{\text{loci}} P_{kl|ij,S} + (1-S)\prod_{\text{loci}} P_{kl|ij,1-S} \qquad (1).$$
The likelihood of family m given parent n (genotype AiAj) is $L_{mn} = \prod_{\text{genotypes}} P_{kl|ij}^{N_{kl}}$, with $N_{kl}$ the number of
progenies with genotype AkAl. The likelihood of the array given all possible parental genotypes is
$L_m = \sum_n f_n L_{mn}$, with $f_n$ the frequency of parent n in the population (which depends on allelic
frequencies). The likelihood over all arrays is the product over all
families, $L = \prod_m L_m$. Parameters (here S and allelic frequencies) are estimated by maximizing L
using classical methods (inversion of the information matrix; see e.g. Appendix 4 in Lynch
and Walsh (1998) for a brief introduction to ML methods). This also allows building
confidence intervals and constructing tests based on likelihoods (e.g., likelihood ratio tests;
Burnham and Anderson, 2002).
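The logic of the likelihood can be illustrated in the single-locus case with known maternal genotypes. The Python sketch below is a minimal illustration under simplifying assumptions of ours: maternal genotypes and pollen-pool allele frequencies are known, a single locus is scored without error, and S is found by golden-section search on the log-likelihood, which is concave in S here because it is a sum of logarithms of functions linear in S. It is not the algorithm implemented in MLTR, and all names are hypothetical.

```python
"""Sketch: single-locus mixed-mating likelihood and ML estimate of S."""
import math


def genotype(a, b):
    """Unordered diploid genotype."""
    return tuple(sorted((a, b)))


def offspring_prob(mother, offspring, S, pollen):
    """P(offspring | mother) = S * P(selfed) + (1 - S) * P(outcrossed)."""
    m = list(mother)
    # Selfing: both gametes drawn from the mother (Mendelian segregation).
    p_self = sum(0.25 for x in m for y in m if genotype(x, y) == offspring)
    # Outcrossing: maternal ovule combined with a pollen grain drawn from the pollen pool.
    p_out = sum(0.5 * f for x in m for a, f in pollen.items() if genotype(x, a) == offspring)
    return S * p_self + (1 - S) * p_out


def log_likelihood(S, families, pollen):
    """families: list of (mother genotype, {offspring genotype: count})."""
    return sum(k * math.log(offspring_prob(mother, o, S, pollen))
               for mother, counts in families for o, k in counts.items())


def ml_selfing(families, pollen, tol=1e-8):
    """Golden-section search for the S in (0, 1) maximizing the log-likelihood."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = 0.0, 1.0
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = log_likelihood(c, families, pollen), log_likelihood(d, families, pollen)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = log_likelihood(c, families, pollen)
        else:  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = log_likelihood(d, families, pollen)
    return (a + b) / 2


if __name__ == "__main__":
    pollen = {"A1": 0.6, "A2": 0.4}
    mother = genotype("A1", "A1")
    # With S = 0.3, an A1A1 mother yields A1A2 progeny with probability (1 - S)q = 0.28.
    families = [(mother, {genotype("A1", "A1"): 72, genotype("A1", "A2"): 28})]
    print(round(ml_selfing(families, pollen), 4))
```

Golden-section search only evaluates interior points, so the log of a zero probability at S = 1 (no A1A2 progeny from an A1A1 mother under full selfing) is never computed.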
An important point is that the expectations are derived under several assumptions.
The most important are (see also Table 1 in the main text): (i) the expected values of both S and p are
the same for all mothers; (ii) segregation of alleles follows Mendelian rules; (iii) mother and
offspring genotypes are known without error (no technical problems leading to scoring bias);
and (iv) no selection occurs between fertilization and the stage at which offspring are
genotyped. The latter point implies that progenies should be genotyped as early as possible in
order to estimate the primary selfing rate, because inbreeding depression is strongly expressed
early in the life cycle.
The maternal genotype need not be known, but can be inferred together
with the selfing rate, provided enough offspring are screened (e.g., 15 to 20). A special
case is that of gymnosperms, in which maternal genotypes can be known directly from the
megagametophyte. Recognizing that seeds or ovules belong to a single progeny array is
straightforward in sessile, brooding organisms, which include plants, fungi, and several groups
of animals (e.g., cnidarians). This is no longer true in mobile species in which newborns
move away from their mother (e.g., snails). In such a situation, estimating the selfing rate in
natural populations is difficult and one has to resort to more or less artificial conditions. One
possibility is to collect mature individuals in natural populations, place them under controlled
conditions under which offspring can be attributed to a given mother, and collect their
offspring (e.g., Henry et al., 2005). The inferred selfing rate is that at the stage of the life cycle at which offspring are genotyped (see point (iv) above). If the focus is on the evolution
of selfing, it might be of interest to come as close as possible to the primary selfing rate.
Seeds or ovules might therefore be preferred to seedlings or juveniles.
The basic model has been extended in several directions, which are described in the main
text. The reader is referred to Ritland (2002) and Thompson and Ritland (2006), as well as to
the MLTR documentation (http://www.genetics.forestry.ubc.ca/ritland/).
2. Markers
Individuals can be genotyped for various markers (see main text), but the most widely used
have been allozymes and microsatellites (Goodwillie et al., 2005; Jarne and Auld, 2006).
Dominant markers, such as AFLPs, can be used, but require a much larger number of loci. The
reason is that fewer situations are favourable to the detection of outcrossing events: with a dominant marker, they can
be detected only among the offspring of recessive homozygous mothers, while the offspring of both homozygotes
can be used with a two-allele codominant locus. An important question is the number of
families, offspring per family and loci that should be studied. There is probably no single
answer to this question, and parameters such as the actual variance in S among families or
locus variability should be taken into account. The answer also depends on the model
considered (e.g., effective selfing, correlated matings; see main text) and the parameters to be
estimated. Ritland (1986) reports simulation results suggesting that there is little gain in using
more than eight to ten offspring per family when estimating S under the mixed-mating and
effective selfing models. Using highly polymorphic loci allows more precise estimates, since
outcrossing events are detected with less ambiguity, and the variance of various mating system
estimates decreases with the number of alleles per locus (Ritland, 1988). Such loci may,
however, be associated with larger error rates due to technical problems (Hoffman and Amos,
2005). In the correlated-matings model, the variance of the main parameters (selfing,
correlation of selfing and correlation of paternity) decreases with both the number of loci and
the number of alleles per locus (Ritland, 1989, 2002). In general, K. Ritland's simulations
suggest that there is little gain in using more than five to six loci when estimating mating
system parameters.
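The argument about dominant markers can be quantified with a simple calculation. The sketch below assumes, for illustration only, a two-allele locus, mothers in Hardy-Weinberg proportions and pollen drawn from the same allele frequencies; it computes the probability that a single outcrossed offspring is recognizably outcrossed, and combines independent loci by noting that detection fails only if it fails at every locus. These assumptions and the function names are ours.

```python
"""Sketch: probability that an outcrossed progeny is detected as such (single locus and multilocus)."""


def detect_codominant(p):
    """Two-allele codominant locus: A1A1 mothers x A2 pollen, or A2A2 mothers x A1 pollen.
    Offspring of heterozygous mothers are always compatible with selfing."""
    q = 1 - p
    return p * p * q + q * q * p  # equals p * q


def detect_dominant(p_dom):
    """Dominant marker: only recessive homozygous mothers (frequency q^2) fertilized by
    pollen carrying the dominant allele (frequency p) give detectable outcrossed progeny."""
    q = 1 - p_dom
    return q * q * p_dom


def detect_multilocus(per_locus):
    """Independent loci: an outcrossed offspring escapes detection only if it does so at every locus."""
    miss = 1.0
    for d in per_locus:
        miss *= 1 - d
    return 1 - miss


if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5):
        print(p, detect_codominant(p), detect_dominant(p))
    # Loci needed for a 90% detection probability when p = 0.5 at every locus.
    for name, d in (("codominant", detect_codominant(0.5)), ("dominant", detect_dominant(0.5))):
        loci = 1
        while detect_multilocus([d] * loci) < 0.9:
            loci += 1
        print(name, loci)
```

Under these assumptions, a dominant locus detects a fraction q of the outcrossing events that a codominant locus with the same allele frequencies would detect, which is why many more dominant loci are needed.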
3. Some pitfalls: progeny-arrays and technical artefacts
The influence of partial dominance and of the kinds of technical artefacts mentioned in the main text
(e.g., null alleles) has not been worked out for the PAA, although K. Ritland mentions in the most recent
version of the MLTR documentation (May 2004) that family estimates of selfing are sensitive to
scoring errors. We do not propose a general treatment of this problem, but, as a first approach,
consider a very simple situation in which S estimates might be biased by null alleles. Let us
assume a progeny-array analysis in which a large number of families are assessed, as well as a
large number of offspring per family (to avoid sampling variance). Individuals are genotyped
at a locus with three alleles (A1, A2 and A3; A3 is a null allele). We also assume that A1 and
A2 have the same frequency (q/2), and that the frequency of A3 is p (so that q = 1 − p). To remain close to
experimental conditions, we consider that families with a null homozygous (A3A3) mother are
eliminated, as well as offspring that are either A3A3 or incompatible with their mother's
genotype. Although this might at first glance look like an unlikely situation, it should be
remembered that the maternal genotype is in some studies inferred from offspring genotypes
(and a null allele at low frequency might well be "invisible"), or, even worse, corrected to be
consistent with those of the offspring (K. Ritland, pers. comm.). This also means that the apparent
allelic frequency (estimated without taking the null allele into account) of both A1 and A2 is
½. The population has inbreeding coefficient F and selfing rate S. Let Z be the apparent
selfing rate.
Four situations can be distinguished with regard to mother genotypes: null
homozygotes, null heterozygotes, homozygotes for A1 or A2, and A1A2 heterozygotes. This
stratification can be used to derive $\mathcal{L}(Z \mid M, O)$, the likelihood of Z given the mother and
offspring genotypes. The likelihoods are given in Table 1, together with the expected
frequencies of mothers and their actual and scored genotypes. The log-likelihood of the whole
sample can be derived from Table 1 (removing a constant, and taking into account that p* = 0
and q* = ½, with p* the estimated frequency of A3 and q* that of each of A1 and A2):
$$\mathcal{L}(Z \mid M, O) = \ln(1+Z)\left[P_1\left(1-(1-S)\frac{q}{2}\right) + P_2\left(1-(1-S)\frac{q}{4t_2}\right)\right] + \ln(1-Z)\left[P_1(1-S)\frac{q}{2} + P_2(1-S)\frac{q}{4t_2}\right] + c \qquad (2).$$
This can be written more simply as $\mathcal{L}(Z \mid M, O) = Q_1\ln(1+Z) + Q_2\ln(1-Z) + c$, where
Q1 and Q2 correspond to the first and second terms in equation (2), and c is a constant with
regard to Z (third term, which gathers in particular the contribution of A1A2 mothers). The maximum likelihood value of Z is found by differentiating
this expression with respect to Z and setting the derivative to 0, which gives $Z = (Q_1 - Q_2)/(Q_1 + Q_2)$, or:
$$Z = S + \frac{(1-S)(8-5S)}{3(2-S)}\,p + O(p^2) \qquad (3).$$
When S is small, Z ≈ 4p/3. This might be compared to the situation in which the selfing rate is
estimated using the inbreeding coefficient (Supplementary Appendix 4), where the bias is of
order 4p. The difference between actual and estimated selfing rates increases with the null
allele frequency and decreases with the selfing rate. An illustration is provided in Figure 1. As
mentioned in the main text, technical problems can be detected when genotyping a large enough
number of progenies.
Table 1. Mother and offspring genotypes at a locus with three alleles, together with frequencies and likelihoods ($\mathcal{L}(Z \mid M, O)$). A3 is a null allele, and the actual genotypes might differ from the scored genotypes. Null homozygous mothers and offspring, as well as incompatible mother-offspring pairs, are discarded. Frequencies are given assuming that A3 is a regular allele, and likelihoods assuming that it is a null allele (denoted 0). The frequencies of A1, A2 and A3 are q/2, q/2 and p. S is the actual selfing rate, and Z the selfing rate to be estimated, taking into account the occurrence of a null allele. p* and q* are frequency estimates from the data, i.e. 0 and ½ respectively. $Q = pF + (1-F)p^2$, $P_1 = \frac{2qF + (1-F)q^2}{2(1-Q)}$, $P_2 = \frac{2pq(1-F)}{1-Q}$, $P_3 = \frac{q^2(1-F)}{2(1-Q)}$ and $t_2 = \frac{3S + (1-S)(3-p)}{4}$. In the second and third rows, mother genotypes before (resp. after) "/" are associated with offspring genotypes before (resp. after) "/". More details in text.

Mother actual genotype | Mother scored genotype | Mother frequency | Offspring actual genotype | Offspring scored genotype | Offspring frequency | $\mathcal{L}(Z \mid M, O)$
A3A3 | 00 | $Q$ | A3A3 | 00 | $S + (1-S)p$ | excluded
 | | | A1A3 or A2A3 | A1A1 or A2A2 | $(1-S)q/2$ | excluded
A1A1 / A2A2 | A1A1 / A2A2 | $P_1$ | A1A1 / A2A2 | A1A1 / A2A2 | $S + (1-S)q/2$ | $Z + (1-Z)q^* = (1+Z)/2$
 | | | A1A2 / A1A2 | A1A2 / A1A2 | $(1-S)q/2$ | $(1-Z)q^* = (1-Z)/2$
 | | | A1A3 / A2A3 | A1A1 / A2A2 | $(1-S)p$ | $(1+Z)/2$
 | | | other genotypes | | 0 | 0
A1A3 / A2A3 | A1A1 / A2A2 | $P_2$ | A1A1 / A2A2 | A1A1 / A2A2 | $[S + (1-S)q]/(4t_2)$ | $(1+Z)/2$
 | | | A1A3 / A2A3 | A1A1 / A2A2 | $[S + (1-S)(1-q/2)]/(2t_2)$ | $(1+Z)/2$
 | | | A1A2 / A1A2 | A1A2 / A1A2 | $(1-S)q/(4t_2)$ | $(1-Z)/2$
 | | | A2A3 / A1A3 | A2A2 / A1A1 | discarded | excluded
 | | | A3A3 | 00 | discarded | excluded
A1A2 | A1A2 | $P_3$ | A1A1 or A2A2 | A1A1 or A2A2 | $[S + (1-S)q]/4$ each | $Z/4 + (1-Z)q^*/2 = 1/4$
 | | | A1A3 or A2A3 | A1A1 or A2A2 | $(1-S)p/2$ each | $1/4$
 | | | A1A2 | A1A2 | $S/2 + (1-S)q/2$ | $Z/2 + (1-Z)q^* = 1/2$
Figure 1. Difference between the estimated and actual selfing rates (ΔS) as a function of the
null allele frequency (p) for various values of S (diamonds: 0; squares: 0.2; triangles: 0.5;
crosses: 0.8) in the single-locus PAA. The difference is given by equation (3).
[Figure 1: ΔS (y-axis, 0 to 0.5) against p (x-axis, 0 to 0.3).]
References to Supplementary information
Avise JC (2000). Phylogeography. Harvard University Press: Cambridge, Massachusetts.
Brown AHD, Burdon JJ, Jarosz AM (1989). Isozyme analysis of plant mating systems. In
Soltis D, Soltis P (eds) Isozymes in plant biology. Dioscorides Press. Pp. 73-86.
Burnham KP, Anderson DR (2002). Model selection and multimodel inference: a practical
information-theoretic approach. Springer-Verlag: New York.
Curie-Cohen M (1982). Estimates of inbreeding in a natural population: a comparison of
sampling properties. Genetics 100: 339-358.
Cutter AD (2006). Nucleotide polymorphism and linkage disequilibrium in wild populations
of the partial selfer Caenorhabditis elegans. Genetics 172: 171-184.
David P, Pujol B, Viard F, Castella E, Goudet J (2007). Reliable selfing rate estimates from
imperfect population genetic data. Mol. Ecol. 16: 2474-2487.
Ellegren H (2004). Microsatellites: Simple sequences with complex evolution. Nat. Rev.
Genet. 5: 435-445.
Estoup A, Angers B (1998). Microsatellites and minisatellites for molecular ecology:
theoretical and empirical considerations. In Carvalho G (ed.) Advances in Molecular
Ecology. NATO press: Amsterdam. Pp. 55-86.
Goodwillie C, Kalisz S, Eckert CG (2005). The evolutionary enigma of mixed mating in
plants: occurrence, theoretical explanations, and empirical evidence. Ann. Rev. Ecol.
Evol. Syst. 36: 47–79.
Henry P-Y, Bousset L, Sourrouille P, Jarne P (2005). Partial selfing, ecological disturbance
and reproductive assurance in an invasive freshwater snail. Heredity 95: 428-436.
Hoffman JI, Amos W (2005). Microsatellite genotyping errors: detection approaches, common
sources and consequences for paternal exclusion. Mol. Ecol. 14: 599-612.
Hudson RR (2001). Two-locus sampling distributions and their application. Genetics 159:
1805-1817.
Jarne P, Auld JR (2006). Animals mix it up too: the distribution of self-fertilization among
hermaphroditic animals. Evolution 60: 1816-1824.
Jarne P, Lagoda PJL (1996). Microsatellites, from molecules to populations and back. Trends
Ecol. Evol. 11: 424-429.
Lowe A, Harris S, Ashton P (2004). Ecological genetics - Design, analysis and application.
Blackwell.
Lynch M, Walsh B (1998). Genetics and analysis of quantitative traits. Sinauer: Sunderland,
Massachusetts.
Pasteur N, Pasteur G, Bonhomme F, Catalan J, Britton-Davidian J (1987). Manuel technique
de génétique par électrophorèse des protéines. Lavoisier: Paris.
Pompanon F, Bonin A, Bellemain E, Taberlet P (2005). Genotyping errors: causes,
consequences and solutions. Nat. Rev. Genet. 6: 847-859.
Richardson BJ, Baverstock PR, Adams M (1986). Allozyme electrophoresis: a handbook for
animal systematics and population studies. Academic Press: Sydney.
Ritland K (1986). Joint maximum-likelihood-estimation of genetic and mating structure using
open-pollinated progenies. Biometrics 42: 25-43.
Ritland K (1988). The genetic-mating structure of subdivided populations. 2. Correlated
mating models. Theor. Pop. Biol. 34: 320-346.
Ritland K (1989). Correlated matings in the partial selfer Mimulus guttatus. Evolution 43:
848-859.
Ritland K (1990). Inferences about inbreeding depression based on changes of the inbreeding
coefficient. Evolution 44: 1230-1241.
Ritland K (2002). Extensions of models for the estimation of mating systems using n
independent loci. Heredity 88: 221-228.
Thompson SL, Ritland K (2006). A novel mating system analysis for modes of self-oriented
mating applied to diploid and polyploid arctic Easter daisies (Townsendia hookeri).
Heredity 97: 119-126.
Van Oosterhout C, Hutchinson WF, Wills DPM, Shipley P (2004). MICRO-CHECKER:
software for identifying and correcting genotyping errors in microsatellite data. Mol.
Ecol. Notes 4: 535-538.
Vos P, Hogers R, Bleeker M, Reijans M, van de Lee T, Hornes M et al. (1995). AFLP: a new
technique for DNA fingerprinting. Nucleic Acids Res. 23: 4407-4414.