genetic drift

Definition of Evolution
The Operational Definition of
Evolution at the Level of a Deme
is a Change in Allele or Gamete
Frequency In the Gene Pool.
Evolutionary Force
A Factor or Process That Can
Change The Frequency of an
Allele In the Gene Pool.
Deme of N Individuals
Gene Pool Before Mutation: A, p = 1
Gene Pool After Mutation: A, p = 1 - 1/(2N); a, q = 1/(2N)
Mutation Is an Evolutionary Force
Genetic Drift
Genetic Drift Occurs When
Sampling Error Alters Allele
Frequencies.
Sampling Error Occurs When
Populations Are Finite in Size.
Therefore, Finite Population Size
is An Evolutionary Force
e.g., the two largest samples have ratios closest to 3:1, but still not “perfect”
Mendel’s Ratios Were Not “Perfect” Because
They Are Based On A Finite Number of
Observations.
A Frequency In A Sample Only Converges To the
Probability As The Sample Size Gets Larger and
Larger.
A Deme Is A Collection of Such
Crosses, Each Subject to Random
Sampling Error in Its Mendelian
Ratios
Probabilities Vs. Frequencies in Demes and Gene Pools:
Diploid genotypes: MM = 0.59, MN = 0.33, NN = 0.08
Mendelian probabilities in meiosis: MM yields M with probability 1; MN yields M or N with probability 1/2 each; NN yields N with probability 1
Haploid gene pool: M = 1(0.59) + 1/2(0.33) ≈ 0.76; N = 1(0.08) + 1/2(0.33) ≈ 0.24
The 1/2 for MN individuals is a Mendelian probability. In a finite sample of gametes from
MN individuals you will often get deviations from Mendel’s 1:1 ratio.
Probabilities Vs. Frequencies in Demes and Gene Pools (same diagram):
The contributions of MM and MN individuals to the gene pool are also probabilities: the
probabilities that these individuals live and have offspring. In a finite sample, deviations
can arise by chance alone.
Probabilities Vs. Frequencies in Demes and Gene Pools (same diagram):
In a finite population, 0.76 is the probability of an M allele in the gene
pool, and not necessarily the frequency in the offspring produced.
Computer Simulation of Genetic Drift
Gene Pool: A = 1/2, a = 1/2
Sample 10 Gametes to Create 5 Individuals
Do This 20 Times To Show Sampling Variation
[Figure: Number (Frequency) of A Alleles in each sample]
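The sampling scheme on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the program used to produce the original figure; the function name `sample_gametes` and the fixed random seed are assumptions made here so that runs are repeatable.

```python
import random

def sample_gametes(p, n_gametes, rng):
    """Draw n_gametes from a gene pool in which allele A has frequency p.

    Returns the number of A alleles in the sample.
    """
    return sum(1 for _ in range(n_gametes) if rng.random() < p)

rng = random.Random(1)  # fixed seed (an assumption) so the output repeats

# 20 replicate samples of 10 gametes (5 diploid individuals) from p = 0.5.
counts = [sample_gametes(0.5, 10, rng) for _ in range(20)]

for i, k in enumerate(counts, start=1):
    print(f"replicate {i:2d}: {k:2d} A alleles, frequency {k / 10:.1f}")
```

Few replicates are expected to land on exactly 5 A alleles, even though the gene pool frequency is 0.5 in every case.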
Property 1 of Genetic Drift: No Direction
Gene Pool: A = 1/2, a = 1/2 (p = 0.5)
[Figure: Number (Frequency) of A Alleles]
Property 2 of Genetic Drift: It Is Cumulative
Gene Pool (A = 1/2, a = 1/2) -> 10 Gametes (Let This Be The Sample That Actually Occurs) -> 10 Gametes
[Figure: p vs. Generation, 2N = 10]
Bigger Deviations From Initial Gene Pool
Become More Likely With Passing Time
Property 2 of Genetic Drift: It Is Cumulative
Simulations of N = 50, p = 0.5
BottleneckSim[50,50,20,40,.5]
MultiSim[50, 50, 20, 40, .5]
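The cumulative nature of drift can be illustrated with a short Wright-Fisher style simulation. The argument meanings of `MultiSim` and `BottleneckSim` are not given in these slides, so the sketch below does not try to reproduce them; `drift_trajectory` and its parameters are hypothetical names, and each generation simply resamples 2N gametes from the previous generation's gene pool.

```python
import random

def drift_trajectory(n_individuals, p0, generations, rng):
    """Follow the frequency of allele A in one deme of N diploid individuals."""
    two_n = 2 * n_individuals
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # The new gene pool is a random sample of 2N gametes from the old one.
        count_a = sum(1 for _ in range(two_n) if rng.random() < p)
        p = count_a / two_n
        trajectory.append(p)
    return trajectory

rng = random.Random(42)  # fixed seed (an assumption) for repeatability
runs = [drift_trajectory(50, 0.5, 40, rng) for _ in range(20)]

# Average absolute deviation from the starting frequency at selected generations.
for gen in (1, 10, 40):
    mean_dev = sum(abs(run[gen] - 0.5) for run in runs) / len(runs)
    print(f"generation {gen:2d}: mean |p - 0.5| = {mean_dev:.3f}")
```

Because each generation starts from the sample that actually occurred in the previous one, the deviations are expected to grow with time rather than average out.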
Property 3 of Genetic Drift: Strength 1/2N
Gene Pool: A = 1/2, a = 1/2; samples of 10 Gametes and of 20 Gametes
[Figure: Number (Frequency) of A Alleles]
Property 3 of Genetic Drift: Strength 1/2N
Simulations of p=0.5 with N=25, 100 and 1000
MultiSim[N, N, 20, 40, .5]
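One way to see why the strength of drift scales with 1/2N is the binomial sampling variance of the allele frequency after a single generation, pq/(2N). The short calculation below uses the population sizes named on this slide; the function name `one_generation_variance` is a hypothetical label.

```python
def one_generation_variance(p, n_individuals):
    """Binomial sampling variance of allele frequency after one generation."""
    return p * (1 - p) / (2 * n_individuals)

for n in (25, 100, 1000):
    var = one_generation_variance(0.5, n)
    print(f"N = {n:4d}: variance = {var:.6f}, std. dev. = {var ** 0.5:.4f}")
```

Increasing N by a factor of 40 (from 25 to 1000) reduces the variance by the same factor.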
Property 4 of Genetic Drift: Loss of Alleles
Gene Pool (A = 1/2, a = 1/2) -> 10 Gametes -> 10 Gametes
Properties 3 & 4 of Genetic Drift: Rate of Loss of Alleles
Rate of Loss = 1/2N
Properties 3 & 4 of Genetic Drift: Loss of Alleles = Coalescence
Under Genetic Drift:
Rate of Loss = 1/2N
Average Time for 2 Genes to Coalesce = 2N Generations
Average Time for all Genes To Coalesce = 4N Generations
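The 2N figure for a pair of genes can be checked with a small simulation. The sketch assumes the idealized model in which, going back one generation, two gene copies descend from the same parental copy with probability 1/(2N); the function name, seed, and number of replicates are illustrative choices.

```python
import random

def pairwise_coalescence_time(n_individuals, rng):
    """Generations back until two gene copies share a common ancestor."""
    prob = 1 / (2 * n_individuals)
    t = 1
    # Each generation back, the two lineages coalesce with probability 1/(2N).
    while rng.random() >= prob:
        t += 1
    return t

rng = random.Random(7)  # fixed seed (an assumption) for repeatability
n = 10
times = [pairwise_coalescence_time(n, rng) for _ in range(20000)]
mean_time = sum(times) / len(times)
print(f"mean pairwise coalescence time: {mean_time:.1f} generations (2N = {2 * n})")
```

The simulated mean should sit close to 2N = 20, while individual coalescence times vary widely around it.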
Properties 3 & 4 of
Genetic Drift: Loss of
Alleles = Coalescence
DriftSim[N, .5]
Property 5 of Genetic Drift: Isolated Demes
Become Genetically Differentiated (From Property 1)
[Figure: p vs. Generation for 4 Isolated Demes Started From One Ancestral Deme With p = 0.5; 2N = 20]
Property 6 of Genetic Drift: Random Changes In
Multi-locus Gamete Frequencies Create Linkage
Disequilibrium
Properties of Genetic Drift
1. Has No Direction
2. Is Cumulative
3. Strength is Proportional to 1/2N
4. Leads to Loss (and Fixation and Coalescence) of Alleles Within Demes
5. Leads to Genetic Differentiation Between Isolated Demes
6. Creates |D| > 0
Although Strength of Genetic Drift is
Proportional to 1/2N, Drift Can be
Important in Large Populations
1. Founder Effects -- A Large Population Today
Was Founded By A Small Number of Founders
in the Past.
2. Bottleneck Effects -- A Large Population Today
Underwent One or More Generations of Small
Size in the Past.
3. Neutral Alleles -- Alleles With No Impact on
Any Phenotype Related to Reproductive
Success. Their Fate is Determined by Drift and
Mutation.
MultiSim[500, 2, 20, 40, .5]
A Human Founder Event
• The Population of the Mountain Village of Salinas
in the Dominican Republic Was 4,300 in 1974.
• The Village Was Founded By A Handful of People
7 Generations Before
• One Founder, Altagracia Carrasco, Had Many
Children by Four Women
• The Alleles Carried by Him Were Therefore in
High Frequency in the Founder Population Gene
Pool
• Subsequent Population Growth Reduced the Force
of Drift But “Froze In” The Allele Frequencies
Created by the Initial Founder Event, So His
Alleles Remain In High Frequency Even Today
Altagracia Carrasco, Like Most
People, Was A Heterozygous
Carrier For an Autosomal
Recessive Genetic Disease:
5-α Steroid Reductase Deficiency
[Figure: 5-α Steroid Reductase converts testosterone to dihydrotestosterone; pathway labels: Default Pathway in All Mammals; Under The Control of Testosterone; Under The Control of Dihydrotestosterone]
Linkage Disequilibrium In a Founder
Population From Costa Rica
Linkage Disequilibrium Is Created By Population
Subdivision In A Manner Not Related To Recombination
(Creates Serious Problems For Disequilibrium Mapping)
Gene Pool for Population 1: gAB = 1, D = 0
Gene Pool for Population 2: gab = 1, D = 0
Gene Pool for Pooled Populations: gAB = 1/2, gab = 1/2, D = gAB gab = 1/4, D’ = 1
Problem!
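The pooled-population result can be reproduced with the standard two-locus definition D = gAB gab - gAb gaB. The sketch below assumes the two populations contribute equally to the pool, as the pooled frequencies of 1/2 imply; the dictionary layout and function name are illustrative.

```python
def disequilibrium(freqs):
    """Linkage disequilibrium D from gamete frequencies AB, Ab, aB, ab."""
    return freqs["AB"] * freqs["ab"] - freqs["Ab"] * freqs["aB"]

pop1 = {"AB": 1.0, "Ab": 0.0, "aB": 0.0, "ab": 0.0}  # fixed for AB
pop2 = {"AB": 0.0, "Ab": 0.0, "aB": 0.0, "ab": 1.0}  # fixed for ab

# Pool the two gene pools in equal proportions (assumption stated above).
pooled = {g: 0.5 * pop1[g] + 0.5 * pop2[g] for g in pop1}

print("D in population 1:", disequilibrium(pop1))
print("D in population 2:", disequilibrium(pop2))
print("D in pooled sample:", disequilibrium(pooled))
```

Neither population has any disequilibrium on its own, yet the pooled sample shows D = 1/4 without recombination playing any role.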
Population Structure or Historical Isolates Can Create
Spurious Phenotypic Associations. E.g., in Quebec there are
French and English Speaking Canadians. French Canadians
Have Been Strongly Influenced by a Past Founder Event and
Show Allele Frequency Differences At Many Loci From the
English Population. Therefore, A Mapping Study of the
“Quebec” Population Would Reveal A Strong Association
Between Many Loci and the Language One Spoke.
Similarly, A Candidate Locus Study Would Find An
Association With Language If The Candidate Locus Showed
Haplotype Frequency Differences Between English and
French Canadians.
Avoiding Problem of Hidden
Population Structure
1. Use founder or bottleneck populations (but must
make sure they truly are and have been highly
isolated since the drift event)
2. Use several loci to reconstruct recent
evolutionary history and population structure prior
to initiating association study, and then choose
populations accordingly or use as a control set of
loci in the association study.
Founder & Bottleneck Events
• Can Drastically Alter Allele Frequencies, Including
Making Certain Genetic Disease Alleles or Disease Risk
Alleles Common (makes obtaining pedigrees for linkage
mapping much easier)
• Lead to pedigree inbreeding (Speke’s gazelles; humans
on Tristan da Cunha)
• Create Linkage Disequilibrium, Which Rarely Extends
Over 1 cM in Large Demes (makes disequilibrium
mapping much easier)
• Reduce Overall Genetic Variation, Creating A Simpler
Genetic Background
• For The Above Reasons, Such Populations Are
Important In Biomedical Research & Conservation
E.g., Positional Cloning & QTL’s
• The First Case of Positional Cloning Was
the Gene for Huntington’s Chorea
• Nancy Wexler Realized That The Key Was
to Find a Founder Population With A High
Frequency of HD.
• She Found Such A Population On Lake
Maracaibo
• Now, Founder Populations Such As This
Are Regarded As Commercially Valuable
Assets.
E.g., Positional Cloning & QTL’s
About 200 years ago, a single woman who happened to carry the
Huntington's allele bore 10 children — and today, many
residents of Lake Maracaibo trace their ancestry (and their
disease-causing gene) back to this lineage.
Effective Population Size
• Founder And Bottleneck Events Show That
The Current Size Of A Population May Not
Be A Good Indicator Of The Impact Of
Genetic Drift Upon That Population
• The Concept of EFFECTIVE
POPULATION SIZE Solves This Problem.
Effective Population Size
measures the strength of genetic drift
in influencing some population
genetic feature of interest relative to
how that same feature evolves
through genetic drift in an idealized
population over the same number of
generations
The Idealized Reference Population
• a diploid population of hermaphroditic, self-compatible organisms
• constant size of N breeding adults
• random mating
• complete genetic isolation (no contact with any other population)
• discrete generations with no age structure
• all individuals contribute the same number of gametes on the average
to the next generation (no natural selection)
• the sampling variation in the number of gametes contributed to the
next generation by an individual is given by a Poisson probability
distribution.
The Most Common Parameters Used
To Monitor Genetic Drift are:
• The Average Level
of Identity by
Descent (inbreeding
effective size)
• The Variance In
Allele Frequency
Induced By Genetic
Drift (variance
effective size)
[Figures: p vs. Generation; Tristan da Cunha]
Impact of Drift On Average F In
An Idealized Population
$$F(t) = \frac{1}{2N} + \left(1 - \frac{1}{2N}\right) F(t-1)$$
F(t): Average Probability Of Identity By Descent At Generation t
1/(2N) = (1/N)(1/2): Probability Of Randomly Drawing 2 Gametes From The Same Individual, Times The Probability That The 2 Gametes From The Same Individual Are Identical
Impact of Drift On Average F In
An Idealized Population
$$F(t) = \frac{1}{2N} + \left(1 - \frac{1}{2N}\right) F(t-1)$$
1/(2N): Probability Of Identity By Descent Due To Drawing 2 Copies of The Same Gamete From The Previous Generation
1 - 1/(2N): Probability Of Not Drawing 2 Copies of The Same Gamete From The Previous Generation
F(t-1): Probability That 2 Randomly Drawn Gametes That Are Not Copies of The Same Gamete From The Previous Generation Are Identical By Descent Due to Earlier Inbreeding
Impact of Drift On Average F In
An Idealized Population
$$F(t) = \frac{1}{2N} + \left(1 - \frac{1}{2N}\right) F(t-1)$$
Can Use The Above Equation Recursively To Obtain:
$$F(t) = 1 - \left(1 - \frac{1}{2N}\right)^{t} \qquad [F(0) = 0]$$
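The step from the recursion F(t) = 1/(2N) + (1 - 1/(2N)) F(t-1) to the closed form F(t) = 1 - (1 - 1/(2N))^t can be checked numerically. The function names and the example values N = 5, t = 10 below are illustrative choices.

```python
def f_recursive(n, t):
    """Apply the one-generation recursion t times, starting from F(0) = 0."""
    f = 0.0
    for _ in range(t):
        f = 1 / (2 * n) + (1 - 1 / (2 * n)) * f
    return f

def f_closed(n, t):
    """Closed-form solution of the recursion with F(0) = 0."""
    return 1 - (1 - 1 / (2 * n)) ** t

n, t = 5, 10
print(f"recursive:   {f_recursive(n, t):.6f}")
print(f"closed form: {f_closed(n, t):.6f}")
```

The two values should agree to within floating-point rounding for any N and t.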
Impact of Drift On Average F In
An Idealized Population
$$F(t) = 1 - \left(1 - \frac{1}{2N}\right)^{t}$$
If A Real Population Has An Observed
Average F of F(t) After t Generations
From the Reference Generation With
F = 0, Then The Inbreeding Effective
Size Is Given By:
$$F(t) = 1 - \left(1 - \frac{1}{2N_{ef}}\right)^{t}$$
or
$$N_{ef} = \frac{1}{2\left\{1 - [1 - F(t)]^{1/t}\right\}}$$
[Figure: Tristan da Cunha]
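As a worked instance of the inbreeding effective size formula, the sketch below plugs in the Speke's gazelle values quoted later in these slides (F = 0.1283 after t = 1.7 generations, reported Nef = 6.4). The function name `inbreeding_effective_size` is a hypothetical label.

```python
def inbreeding_effective_size(f_t, t):
    """N_ef = 1 / (2 * (1 - (1 - F(t)) ** (1 / t)))."""
    return 1 / (2 * (1 - (1 - f_t) ** (1 / t)))

# Speke's gazelle herd relative to the founders: F = 0.1283, t = 1.7.
nef = inbreeding_effective_size(0.1283, 1.7)
print(f"N_ef = {nef:.1f}")

# Consistency check: an idealized population of N = 50 recovers N_ef = 50.
f_ideal = 1 - (1 - 1 / (2 * 50)) ** 10
print(f"idealized N = 50 gives N_ef = {inbreeding_effective_size(f_ideal, 10):.1f}")
```

The formula accepts non-integer t, which is why an average of 1.7 generations can be used directly.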
Impact of Drift On Allele Freq.
Variance In An Idealized Population
$$\sigma^{2}(t) = pq\left\{1 - \left(1 - \frac{1}{2N}\right)^{t}\right\}$$
If A Real Population Has An Observed
Variance of v(t) After t Generations
From the Reference Generation, Then
The Variance Effective Size Is Given By:
$$v(t) = pq\left\{1 - \left(1 - \frac{1}{2N_{ev}}\right)^{t}\right\}$$
or
$$N_{ev} = \frac{1}{2\left\{1 - [1 - v(t)/(pq)]^{1/t}\right\}}$$
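The variance effective size can be worked through the same way. The sketch computes the idealized variance, inverts it, and then applies the inversion to the Speke's gazelle ratio v/(pq) = 0.135 at t = 2.7 quoted later in these slides (reported Nev = 9.6). Only the ratio v/(pq) matters, so p = 0.5 is an arbitrary choice; the function names are hypothetical.

```python
def drift_variance(p, n, t):
    """Allele frequency variance after t generations in an idealized deme."""
    return p * (1 - p) * (1 - (1 - 1 / (2 * n)) ** t)

def variance_effective_size(v, p, t):
    """N_ev = 1 / (2 * (1 - (1 - v / (pq)) ** (1 / t)))."""
    ratio = v / (p * (1 - p))
    return 1 / (2 * (1 - (1 - ratio) ** (1 / t)))

# Round trip: the idealized variance for N = 25 gives back N_ev = 25.
v_ideal = drift_variance(0.5, 25, 8)
print(f"idealized N = 25 gives N_ev = {variance_effective_size(v_ideal, 0.5, 8):.1f}")

# Speke's gazelle: v/(pq) = 0.135 after t = 2.7 generations.
p = 0.5  # arbitrary, since only v/(pq) enters the formula
nev = variance_effective_size(0.135 * p * (1 - p), p, 2.7)
print(f"Speke's gazelle N_ev = {nev:.1f}")
```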
There Is No Such Thing As The
Effective Size of a Population
• The effective size depends upon which
genetic parameter you are using
• The effective size depends upon which
reference generation you are using
• Therefore, a single population can have
many different effective sizes associated
with it, all biologically meaningful but
distinct
Example: Speke’s Gazelle
• Herd Started in 1969 With 4 Animals
• By 1979 There Were 19 Animals With An Average F of
0.1283 After 1.7 Generations
• Therefore, Nef Relative to the Founders is 6.4 < 19
(Founder Effect)
• In 1979, Management Was Changed, and 15 New Animals
Bred with F = 0.149 and t = 2.7, yielding Nef = 8.6 < 15
(Founder Effect & f < 0)
• Using the parents of the 19 Animals in 1979 as Reference
Generation, then F = 0.0207 and t = 2, yielding Nef = 96.1
> 15 (Effect of Avoidance of Inbreeding in System of
Mating Sense)
Example: Speke’s Gazelle
• Herd Started in 1969 With 4 Animals
• In 1979, Management Was Changed, and 15 New Animals
Bred with v/(pq) = 0.135 and t = 2.7 (computer simulation
of exact pedigree), yielding Nev = 9.6 < 15 (Founder
Effect)
• The same 15 animals have
– Nev = 9.6 < 15 (relative to founder generation)
– Nef = 8.6 < 15 (relative to founder generation)
– Nef = 96.1 > 15 (relative to the management change
generation)
• WHAT IS THE EFFECTIVE SIZE OF THIS
POPULATION?
In Most Cases, Do Not Have Complete
Pedigree Information, Precluding the
Calculation of Various Effective Sizes.
Many Formulae Have Been Derived as
Estimators or Approximations to
Effective Size.
The Literature Is A Mess, Because
Many Do Not Distinguish Among The
Various Effective Sizes, and Often Mix
Inappropriate Formulae
Interactions of System of Mating
with Genetic Drift via Effective Size
• The ideal reference population assumes random mating.
• Suppose mating is non-random, either due to inbreeding or assortative
mating such that f > 0.
• Then:
$$F(t) = f + (1 - f)\left[\frac{1}{2N} + \left(1 - \frac{1}{2N}\right) F(t-1)\right]$$
f: Identity by descent (I by D) created by system of mating beyond random mating expectations.
(1 - f)[...]: I by D created by genetic drift at random mating expectations.
Interactions of System of Mating
with Genetic Drift via Effective Size
$$F(t) = 1 - \left[(1 - f)\left(1 - \frac{1}{2N}\right)\right]^{t}$$
$$N_{ef} = \frac{N}{1 + f(2N - 1)}$$
Interactions of System of Mating
with Genetic Drift via Effective Size
• The ideal reference population assumes random mating.
• Suppose mating is non-random, either due to inbreeding or assortative
mating such that f > 0.
• Then:
$$\text{Variance in Allele Frequency} = (1 - f)\frac{pq}{2N} + f\,\frac{pq}{N} = \frac{pq(1 + f)}{2N}$$
(1 - f) pq/(2N): variance created by genetic drift at random mating expectations.
f pq/N: variance created by system of mating beyond random mating expectations.
Interactions of System of Mating
with Genetic Drift via Effective Size
$$N_{ev} = \frac{N}{1 + f}$$
Interactions of System of Mating
with Genetic Drift via Effective Size
[Figure: Nev and Nef vs. Population Size N, f = 0.1]
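The contrast between the two effective sizes under non-random mating, N_ef = N/(1 + f(2N - 1)) and N_ev = N/(1 + f), can be tabulated for f = 0.1. The function names and the particular values of N are illustrative choices.

```python
def nef_with_f(n, f):
    """Inbreeding effective size with system-of-mating parameter f."""
    return n / (1 + f * (2 * n - 1))

def nev_with_f(n, f):
    """Variance effective size with system-of-mating parameter f."""
    return n / (1 + f)

f = 0.1
for n in (10, 100, 1000, 10000):
    print(f"N = {n:5d}: N_ef = {nef_with_f(n, f):6.2f}, N_ev = {nev_with_f(n, f):8.2f}")
```

N_ev grows in proportion to N, while N_ef levels off near 1/(2f) = 5 as N becomes large, so the two effective sizes diverge sharply.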
Interactions of Population Growth
with Genetic Drift via Effective Size
2N 1
N ef 
k 1 1 2Nk 
N ev  N
Where N is an idealized population in every way except
that each individual has an average of k offspring (k=2
corresponds to a constant sized population)
Interactions of Population Growth
with Genetic Drift via Effective Size
Neutral Alleles
Have no effect on any phenotype that
influences reproductive success and
therefore their evolutionary dynamics
are determined by mutation and
genetic drift
[Figure: Effects of 50 Spontaneous Mutation Lines Derived from a Strain of Yeast Growing in a Laboratory Environment; classes labeled Unfavorable, Neutral, Favorable]
Neutral Alleles
(Kimura 1968)
• Genetic Drift Determines the Rate of Loss = 1/2N
• Mutation Determines the Rate of Input = μ(2N)
• Rate of Evolution = Rate of Input X Rate of Loss =
μ(2N)(1/2N) = μ
Note: The Rate of Neutral Evolution Does Not
Depend upon Population Size. All populations,
regardless of size, have an innate tendency to
evolve as driven by mutation and drift. Moreover,
if the neutral mutation rates are comparable, this
tendency is just as strong in a large population as
in a small population. GENETIC DRIFT IS
IMPORTANT FOR ALL POPULATIONS!
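The cancellation of population size can be made concrete with a short calculation. The neutral mutation rate of 1e-8 per gamete per generation is an arbitrary illustrative value, and the function name is hypothetical.

```python
def neutral_substitution_rate(n, mu):
    """Rate of input (2N * mu) times the 1/(2N) rate from drift."""
    rate_of_input = 2 * n * mu        # new neutral mutations per generation
    rate_from_drift = 1 / (2 * n)     # the 1/2N term on the slide
    return rate_of_input * rate_from_drift

mu = 1e-8  # assumed neutral mutation rate, for illustration only
for n in (10, 1000, 1000000):
    print(f"N = {n:7d}: rate of neutral evolution = {neutral_substitution_rate(n, mu):.2e}")
```

Every population size returns the same rate, equal to the mutation rate itself.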
Amino Acid Sequence Data
α-Hb Data (pairwise amino acid differences)
          Mouse  Chicken  Newt  Carp  Shark
Human        16       35    62    68     79
Mouse                 39    63    68     79
Chicken                     63    72     83
Newt                              74     84
Carp                                     85
• The Substitutions Seemed To Define A “Molecular Clock”
(King & Jukes, Sci. 164:788-798, 1969).
• This Also Seemed To Support Kimura’s Theory Because It
Predicted The Rate of Substitution = μ, which was usually
treated as a constant.
Protein Electrophoresis Data
• Lewontin & Hubby (Genetics 54: 595-609, 1966),
Johnson et al. (Studies in Genetics. III: 517-532,
1966), and Harris (Proceedings of the Royal Society
of London B 164:298-310. 1966) showed that about
1/3 of all protein coding loci were polymorphic for
electrophoretically detectable alleles in Drosophila
and in humans
• Kimura and Ohta (Nat. 229: 467-469, 1971) could
explain this high level of variation with the Neutral
Theory
Kimura & Ohta
Time Period of Transient Polymorphism
1/(2N) of Neutral Mutations Go
To Fixation and Transiently
Contribute To Polymorphism
Levels
Most Neutral Mutations Are Lost
and Contribute Little to
Polymorphism Levels
Kimura & Ohta
$$F(t) = \left[\frac{1}{2N} + \left(1 - \frac{1}{2N}\right) F(t-1)\right](1 - \mu)^{2}$$
Average Probability of Identity by Descent at Generation t = (Probability of Identity by Descent Due to Genetic Drift) X (Probability of No Mutation in Both Gametes)
$$F_{eq} = \frac{1}{2N\left[\frac{1}{(1 - \mu)^{2}} - 1\right] + 1} \approx \frac{1}{4N\mu + 1} \quad \text{for } \mu \text{ small}$$
Let θ = 4Nefμ
$$1 - F_{eq} = H_{eq} = 1 - \frac{1}{\theta + 1} = \frac{\theta}{\theta + 1}$$
Kimura & Ohta
Most
Observations
Below This
Threshold
This Implies A Small Range of
Population Sizes, and That Almost
All Species Have N < 5,000
(Including Insects & Bacteria).
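The equilibrium behind this argument can be checked numerically. The sketch iterates the Kimura & Ohta recursion F(t) = [1/(2N) + (1 - 1/(2N)) F(t-1)](1 - μ)^2 to equilibrium and compares it with the exact solution and with the small-μ approximation 1/(4Nμ + 1). The values N = 100 and μ = 0.001 are arbitrary illustrative choices, the function names are hypothetical, and N stands in for Nef because the population is idealized.

```python
def f_equilibrium_exact(n, mu):
    """Exact equilibrium of the drift-plus-mutation recursion."""
    return 1 / (2 * n * (1 / (1 - mu) ** 2 - 1) + 1)

def f_equilibrium_approx(n, mu):
    """Small-mu approximation: F_eq = 1 / (4 N mu + 1)."""
    return 1 / (4 * n * mu + 1)

def f_iterated(n, mu, generations):
    """Iterate the recursion from F(0) = 0."""
    f = 0.0
    for _ in range(generations):
        f = (1 / (2 * n) + (1 - 1 / (2 * n)) * f) * (1 - mu) ** 2
    return f

n, mu = 100, 1e-3          # illustrative values only
theta = 4 * n * mu
print(f"iterated F_eq:    {f_iterated(n, mu, 5000):.6f}")
print(f"exact F_eq:       {f_equilibrium_exact(n, mu):.6f}")
print(f"approximate F_eq: {f_equilibrium_approx(n, mu):.6f}")
print(f"H_eq = theta / (theta + 1) = {theta / (theta + 1):.4f}")
```

Because H_eq = θ/(θ + 1) approaches 1 quickly as θ grows, observed heterozygosities that stay low imply small values of θ, which is the source of the puzzle about population sizes.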
[Figure: Effects of 50 Spontaneous Mutation Lines Derived from a Strain of Yeast Growing in a Laboratory Environment; region labeled Neutral & Nearly Neutral]
Ohta (1973-1976) Created The Nearly
Neutral Theory To Explain The
Heterozygosity Observations
• Showed That Genetic Drift Determines Evolutionary Dynamics For Any Mutation With
|s| < 1/(2Nev)
• Let φ(s) describe the probability of a mutation having selection coefficient s; then
the neutral mutation rate is
$$\mu_{neutral} = \mu \int_{0}^{1/(2N_{ev})} \phi(s)\,ds$$
• As Nev increases, μneutral decreases
• This explains why Heterozygosity levels off and has a narrow range (recall θ = 4Nμneutral)
• Unfortunately, this also means you lose the molecular clock because the rate of
substitution is now a function of Nev
Evidence for Neutral Alleles
The pseudogene evolves more rapidly than the functional gene
Neutral Alleles
A substantial portion, perhaps the
majority, of the genetic variation
observed at the DNA sequence level
is neutral, making genetic drift a
major evolutionary force
This Also Means That It Is Difficult To Find The Minority
Of The Variation At the DNA Sequence Level That Has
Functional Significance.