lecture_17(LP)

Selection and Neutrality
• Mutations (though rare) arise constantly in all organisms.
• What happens to them over time?
• Some changes are without any consequence - neutral.
• Other changes are deleterious - negative selection.
• Very rarely, some change is advantageous - positive
selection.
Neutrality
Any particular allele that confers no selective advantage
or disadvantage (as homozygote or heterozygote) is
called neutral.
Examples?
• most synonymous codon changes.
• most changes in introns.
• most changes in transposable element sequences.
Allele frequency
• Consider one position in the genome (conventionally
called a “site”).
• Count up the nucleotide at that position in all
members of a population (or the entire species).
For example:
nt       number   frequency
A          7934       0.793
C             0       0.000
G          2054       0.205
T            12       0.001
total     10000       1.000
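As a rough sketch of the counting described above, the frequencies in the table can be recomputed from the raw counts. The function and variable names are illustrative and not from the lecture.

```python
def allele_frequencies(counts):
    """Convert per-nucleotide counts at one site into frequencies."""
    total = sum(counts.values())
    return {nt: n / total for nt, n in counts.items()}

# Counts taken from the example table for a single site.
counts = {"A": 7934, "C": 0, "G": 2054, "T": 12}
freqs = allele_frequencies(counts)
for nt, f in freqs.items():
    print(f"{nt}: {f:.3f}")
```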
Two allele simplification
• Most positions in the genome are all (or almost all) one
nucleotide.
• Called “monomorphic” (one form).
• Most positions with differences are dominated by two
nucleotides.
• Called “dimorphic” - extensively modeled.
For example:
nt       number   frequency
A         99998       1.000
C             1       0.000
G             0       0.000
T             1       0.000
total    100000       1.000
(nearly) monomorphic
For example:
nt       number   frequency
A         65476       0.655
C             0       0.000
G         34523       0.345
T             1       0.000
total    100000       1.000
(nearly) dimorphic
Two allele simplification
By convention, allele frequencies are called p and q.
Designate allele A (frequency p), allele a (frequency q).
Genotype frequencies (with random mating) are:
AA homozygote: p^2
Aa heterozygote: 2pq
aa homozygote: q^2
Note - A and a don’t imply which one is “normal” or dominant, they are simply designators.
p + q = 1
(p + q)^2 = p^2 + 2pq + q^2 = 1 (Hardy-Weinberg equilibrium)
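A minimal sketch of the genotype frequencies above, assuming random mating. The value p = 0.8 is an arbitrary illustration and the function name is hypothetical.

```python
def hardy_weinberg(p):
    """Genotype frequencies under random mating for allele A at frequency p."""
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}

geno = hardy_weinberg(0.8)  # illustrative allele frequency, not from the slides
for genotype, freq in geno.items():
    print(f"{genotype}: {freq:.2f}")
```

The three frequencies always sum to 1, which is the (p + q)^2 = 1 identity.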
Tracking allele frequency over time (2-allele system)
• At each generation, the gametes that make the next
generation are drawn AT RANDOM from the current alleles.
• As this random draw repeats, allele frequencies change.
• Call the two alleles A and a.
[Plot: fraction of A allele (p) over generations; the a allele frequency is 1 - p. Diploid population size = 1000; no selection, simply random gametes drawn at each generation.]
A graphical representation of how the simulations work
Population: ~100 diploids, red allele 20%
Random draw of alleles for gametes that produce the next generation.
What will the frequency of the red allele be?
Possibly 19%, 20%, 21% etc. (in fact this value will have an approximately normal distribution with mean 20%)
diploid population size = 1000
At each generation, make a pool of 2,000 gametes
drawn randomly according to the allele frequency.
Repeat.
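The procedure above can be sketched as follows. This is only one way to write it, assuming a constant population size and no selection; the function names and the seed argument are illustrative and not the code behind the lecture's plots.

```python
import random

def next_generation(p, n_diploid, rng):
    """Draw 2N gametes at random; return the new frequency of allele A."""
    n_gametes = 2 * n_diploid
    n_a = sum(1 for _ in range(n_gametes) if rng.random() < p)
    return n_a / n_gametes

def simulate_drift(p0, n_diploid, generations, seed=0):
    """Track the frequency of allele A over time with no selection."""
    rng = random.Random(seed)
    trajectory = [p0]
    for _ in range(generations):
        trajectory.append(next_generation(trajectory[-1], n_diploid, rng))
    return trajectory

traj = simulate_drift(p0=0.5, n_diploid=1000, generations=100, seed=1)
print([round(p, 3) for p in traj[:5]])
```

Running it again with a different seed gives a different trajectory, which is the run-to-run variation called genetic drift.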
Tracking allele frequency over time (cont.)
If you run multiple simulations, the result differs:
diploid population size = 1000
Each line is from an identical independent simulation.
This is called genetic drift.
Tracking allele frequency over time (cont.)
The rate of drift depends only on population size:
population 1,000
population 300
population 100
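One way to see the dependence on population size: under the random-draw model, the allele frequency in the next generation has binomial sampling variance p(1 - p)/(2N). The sketch below (function name hypothetical) prints the per-generation standard deviation for the three population sizes shown.

```python
import math

def drift_sd(p, n_diploid):
    """Standard deviation of next-generation allele frequency from sampling 2N gametes."""
    return math.sqrt(p * (1 - p) / (2 * n_diploid))

for n in (1000, 300, 100):
    print(f"population {n}: sd per generation at p = 0.5 is {drift_sd(0.5, n):.4f}")
```

Smaller populations take larger random steps each generation, so their lines wander faster.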
Allele extinction and fixation
A allele fixed
(a allele extinct)
a allele fixed
(A allele extinct)
Neutrality and Drift
Now consider a brand new mutation (arose in one gamete
of one parent):
population size 50,
single new allele
most of the time the
new allele quickly
disappears
rarely the new allele
drifts to fixation
Neutral mutations and drift
to fixation underlie most
genome change.
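A hedged sketch of the new-mutation case: start with a single copy among 2N gametes and repeat the random draw until the allele is lost or fixed. The standard expectation for a neutral allele is a fixation probability of 1/(2N); that figure is textbook neutral theory and is not stated on the slide. The trial count and seed are arbitrary.

```python
import random

def new_allele_fixes(n_diploid, rng):
    """Follow one new neutral allele until it is lost (False) or fixed (True)."""
    n_gametes = 2 * n_diploid
    count = 1  # the mutation arose in a single gamete
    while 0 < count < n_gametes:
        p = count / n_gametes
        count = sum(1 for _ in range(n_gametes) if rng.random() < p)
    return count == n_gametes

rng = random.Random(17)
trials = 2000
fixed = sum(new_allele_fixes(50, rng) for _ in range(trials))
estimate = fixed / trials
print(f"simulated fixation fraction: {estimate:.4f}")
print(f"neutral expectation 1/(2N):  {1 / (2 * 50):.4f}")
```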
Most human SNPs are neutral!
• Probably >90% of all known human SNPs (possibly 99%)
are neutral.
If a sequence has no function (evolves neutrally) it will
usually have changed so much between mouse and human
that you can no longer detect similarity.
From UCSC browser:
[Figure: the thin olive line marks sequence that can’t be aligned to mouse; the positions of known SNPs are shown.]
Sum up over whole genome - is each SNP located in region
conserved in mouse?
Neutrality and mapping SNPs
You can now understand why this sort of pedigree makes
sense - the A and B alleles have no phenotype (they are
neutral). The only way to detect them is by some molecular
genotyping test.
From
lecture 12:
Negative (purifying) selection
• Force that “keeps things the same”.
• Eliminates deleterious sequence changes via phenotypic
effect.
• Accounts for sequence conservation over long times.
Negative (purifying) selection
Eliminates deleterious sequence changes via phenotypic effect.
population size 1,000
¼ of AA homozygotes fail to reproduce
CFTR protein alignment: human, chimp, dog, mouse, rat, chicken, zebrafish (gap positions removed)
[Alignment figure, one row per species (Human, Chimp, Dog, Mouse, Rat, Chicken, Zebrafish); regions are marked as under strong negative selection or weaker negative selection.]
what about this site?
probably neutral - it doesn’t matter what amino
acid is here, so any mutation is acceptable
but then why are the human-chimp
and mouse-rat amino acids the same?
very little divergence time - amino acid changing
mutation just hasn’t happened
Negative selection gets very weak when allele frequency is
very low. (for recessives)
population size 10,000 ¼ of AA
homozygotes fail to reproduce
A allele frequency   AA genotype frequency
0.5                  0.25
0.3                  0.09
0.1                  0.01
0.03                 0.0009
0.01                 0.0001
0.003                0.000009
0.001                0.000001
For a recessive, only these (the AA homozygotes) can be selected against.
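The weakening of selection at low frequency can be sketched with a deterministic recursion, assuming random mating, no drift, and fitness 1 - s for AA homozygotes (s = 1/4, as in the slide) and fitness 1 for the other genotypes. The function name is illustrative.

```python
def next_p(p, s):
    """One generation of selection against AA homozygotes (fitness 1 - s)."""
    mean_fitness = 1 - s * p * p
    return p * (1 - s * p) / mean_fitness

s = 0.25  # one quarter of AA homozygotes fail to reproduce
for p in (0.5, 0.1, 0.01, 0.001):
    drop = p - next_p(p, s)
    print(f"p = {p}: AA frequency = {p * p:.6f}, drop in p per generation = {drop:.2e}")
```

The per-generation drop shrinks roughly with p squared, because only the AA homozygotes are exposed to selection.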
Disease alleles
• Almost by definition, disease alleles are all under strong
negative selection.
Why do they still exist?
• Primarily, balance between new mutation and effective
removal by negative selection.
• Rarely, disease alleles are advantageous in heterozygotes
or under unusual conditions.
Examples of heterozygote advantage include
sickle cell trait.
Heterozygote advantage, homozygote lethal
population size 500
From
lecture 1:
AA homozygote reproduces 80% as often as Aa
heterozygote, aa homozygote never reproduces.
a allele similar to sickle-cell trait - heterozygote is relatively
resistant to malaria, but homozygote is very sick.
Reaches a stable allele frequency proportional to the degree of heterozygote advantage.
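As an illustration of the stable frequency, the sketch below iterates a deterministic model using the relative fitnesses from the slide (AA 0.8, Aa 1.0, aa 0). It ignores drift, so it is a simplification of the simulation shown, and the starting frequency is arbitrary. Setting the marginal fitnesses of the two alleles equal gives an expected frequency for a of 0.2 / (0.2 + 1), about 0.167.

```python
def next_q(q, w_AA=0.8, w_Aa=1.0, w_aa=0.0):
    """One generation of viability selection; q is the frequency of allele a."""
    p = 1 - q
    mean_w = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    return (p * q * w_Aa + q * q * w_aa) / mean_w

q = 0.01  # arbitrary starting frequency
for _ in range(500):
    q = next_q(q)
print(f"equilibrium frequency of a: {q:.4f}")
```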
Positive (Darwinian) selection
• When an allele confers a selective advantage.
• Very rare among newly arisen mutations.
• Corresponds to the type of selection Darwin had in mind.
Ultimately responsible for all “adaptive” change.
Positive selection is powerful
Simulation started with a SINGLE A allele (mimics new
mutation)
population size 1,000
AA reproduces 100% of the time
Aa reproduces 90% of the time
aa reproduces 80% of the time
notice relatively small
advantage conferred
by A allele
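A deterministic sketch of the same fitness scheme (AA 1.0, Aa 0.9, aa 0.8), starting from a single copy in a population of 1,000 diploids. It leaves out drift, so it does not capture the chance that a new advantageous allele is lost early; it only shows how quickly the allele rises once it escapes loss.

```python
def next_p(p, w_AA=1.0, w_Aa=0.9, w_aa=0.8):
    """One generation of selection; p is the frequency of the favored A allele."""
    q = 1 - p
    mean_w = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    return (p * p * w_AA + p * q * w_Aa) / mean_w

p = 1 / 2000  # a single A allele among 2N = 2000 gametes
generations = 0
while p < 0.99:
    p = next_p(p)
    generations += 1
print(f"generations for A to reach 99%: {generations}")
```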
Lactase persistence - example of positive selection
• The enzyme lactase permits digestion of lactose (mostly
found in milk).
• Encoded by a single gene - LCT.
• Ancient humans expressed lactase only in infancy.
• Many modern human populations express lactase
throughout life.
[Figure: ancestral state: LCT gene switched off at weaning. Common in many current human populations: LCT gene not switched off at weaning.]
Frequency of lactase persistence
[Map label: little or no dairy farming]
The selection for lactase persistence
probably started only about 5,000 to
7,000 years ago.
Course review session
Final exam:
-8 questions, 200 points total
-Questions focus almost exclusively on material covered in the second half of class (lecture 10 onwards)…
  -if you’ve forgotten the major concepts from the first half of the course (Mendelian segregation, complementation, epistasis, etc.) you will have trouble
  -if you didn’t work the problem sets, you may also be sorry
-Today’s review session will cover molecular markers, the use of molecular markers in linkage studies, LOD score analysis & constructing contigs (STS analysis, chr. walk)
What are polymorphic loci?
A polymorphic site or locus…
A location in the genome where at least two versions of
the sequence exist in the population, each at a frequency
of at least 1%
e.g., UW student population: about 40,000
→ 80,000 copies of (e.g.) chromosome 2
70,000 copies have A-T base pair
10,000 copies have C-G base pair
each is at > 1% of total population, so this is a polymorphic site
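The 1% rule can be written as a small check. This is a sketch with a hypothetical function name; the first set of counts is the UW example above and the second is the nearly monomorphic site from the allele-frequency tables.

```python
def is_polymorphic(allele_counts, threshold=0.01):
    """True if at least two versions of the sequence are each at >= threshold frequency."""
    total = sum(allele_counts.values())
    common = [a for a, n in allele_counts.items() if n / total >= threshold]
    return len(common) >= 2

uw_site = {"A-T": 70_000, "C-G": 10_000}
rare_variant_site = {"A": 99_998, "C": 1, "T": 1}
print(is_polymorphic(uw_site))
print(is_polymorphic(rare_variant_site))
```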
How to find polymorphic sites?
-Randomly select and test specific DNA sequences
DNA from indiv. #1, DNA from indiv. #2, DNA from indiv. #3
Digest DNA with restriction enzyme
Use as a probe in Southern blot
Etc.
Southern blotting procedure
[Legend: symbols mark the radioactive probe and the restriction sites.]
-restriction endonuclease digestion
-agarose gel electrophoresis (large fragments stay near the - electrode, small fragments migrate toward the + electrode)
-Denature DNA with NaOH then transfer to nylon filter (nylon filter with immobilized DNA)
-Hybridize radioactive single-stranded DNA probe
-Expose to film
Practice question
What would homozygous genotypes look like on the Southern blot?
Hind III restriction endonuclease cleavage site (AAGCTT) present:
..AAAAGCTTAG..
..TTTTCGAATC..
Single-base variant, Hind III site absent:
..AATAGCTTAG..
..TTATCGAATC..
How to find polymorphic sites?
-Randomly select and test specific DNA sequences
DNA from indiv. #1, DNA from indiv. #2, DNA from indiv. #3
Digest DNA with restriction enzyme
Use as a probe in Southern blot
Not polymorphic: Repeat with a different probe OR the same probe and a different restriction enzyme digest
Polymorphic!
Etc.
The POINT: The polymorphic marker and the disease gene are two different entities; the polymorphic marker may segregate with (be linked to) the disease gene, but is not the disease gene.
[Figure: a polymorphic site in the genome and the disease gene (marked X, shown alongside the normal allele of the disease gene); the distance between them could be 20,000,000 bp (or more).]
Using polymorphic markers to map a human gene
Example: Using a Restriction Fragment Length Polymorphism
(RFLP) marker to map a dominant (D) disease.
[Figure: SNP resulting in RFLP. RFLP allele 1 and RFLP allele 2 differ at a restriction endonuclease cleavage site; the probe position marks the marker.]
[Pedigree: parents I-1 (Dd) and I-2 (dd); children 1 to 8 with disease genotypes dd, Dd, Dd, dd, dd, Dd, Dd, Dd.]
Genomic DNA obtained from corresponding individuals in the pedigree.
DNA is digested with restriction enzyme and subjected to Southern blot analysis using the probe shown above.
RFLP genotype: 1,2 1,1 1,1 1,2 1,2 1,1 1,1 1,2 1,2 1,1
[Southern blot figure: bands for RFLP alleles 1 and 2.]
Using polymorphic markers to map a human gene
If linked, one of the RFLP alleles should segregate with the disease; the other RFLP allele should segregate with WT phenotype.
[Same pedigree, disease genotypes and RFLP genotypes as on the previous slide.]
Let’s say that we know the phase of I-1 and I-2:
[Figure: phase diagram showing RFLP alleles 1 and 2 with D and d on the parental chromosomes.]
Which individuals in the pedigree appear to be recombinants?
Using polymorphic markers to map a human gene
[Same pedigree, disease genotypes and RFLP genotypes as on the previous slide.]
Let’s say that we know the phase of I-1 and I-2: in the Dd parent, RFLP allele 1 is on the chromosome carrying d and RFLP allele 2 is on the chromosome carrying D.
Map distance between disease gene and RFLP marker:
1/8 × 100 = 12.5 cM
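The recombinant count behind the 1/8 figure can be sketched as below. Two assumptions are made here and are not confirmed by the slide text: the last eight entries of the "RFLP genotype" line are the children in order, and the Dd parent carries D with RFLP allele 2 and d with allele 1. The dd parent is 1,1 and so always contributes d and allele 1.

```python
# Disease genotypes of the eight children, read from the slide.
disease = ["dd", "Dd", "Dd", "dd", "dd", "Dd", "Dd", "Dd"]
# ASSUMPTION: these are the children's RFLP genotypes, in the same order.
rflp = ["1,1", "1,2", "1,2", "1,1", "1,1", "1,2", "1,2", "1,1"]

def is_recombinant(disease_genotype, rflp_genotype):
    """Assumed parental phase in the Dd parent: D with allele 2, d with allele 1."""
    got_D = disease_genotype == "Dd"   # child received D from the Dd parent
    got_2 = rflp_genotype == "1,2"     # child received allele 2 from the Dd parent
    return got_D != got_2

recombinants = [i + 1 for i, (dg, mg) in enumerate(zip(disease, rflp))
                if is_recombinant(dg, mg)]
rf = len(recombinants) / len(disease)
print(f"recombinant children: {recombinants}")
print(f"map distance: {100 * rf:.1f} cM")
```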
Using polymorphic markers to map a human gene
[Same pedigree and disease genotypes as above; only the RFLP genotypes differ.]
RFLP genotype: 1,1 2,2 1,2 1,2 1,2 1,2 1,2 1,2 1,2 1,2
Is the marker linked to the disease?
Can’t tell
Practice question
In a certain plant species…
flower fragrance (F) is dominant over unscented (f)
blue flower color (B) is dominant over white (b)
rounded leaves (R) is dominant over pointy (r); and
thorny stems (T) is dominant over smooth stems (t).
From the following crosses, can you determine whether the fragrance gene is linked to any of the other genes; if so, at what map distance?

Bb Ff x bb ff
270 blue, fragrant
281 blue, non-fragrant
268 white, fragrant
275 white, non-fragrant

Rr ff x rr Ff
219 rounded, fragrant
222 rounded, non-fragrant
209 pointy, fragrant
216 pointy, non-fragrant
Can’t tell! (R f / r f X r F / r f) The parental and recombinant types are the same! Need to be heterozygous at both loci

Tt Ff x tt ff
333 thorny, fragrant
36 thorny, non-fragrant
39 smooth, fragrant
342 smooth, non-fragrant
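A rough sketch of how the two informative testcrosses could be scored. Because the parental phase is not given, the rarer pair of reciprocal classes is treated as the recombinants; that choice is an assumption of this sketch. The Rr ff x rr Ff cross is left out because neither parent is heterozygous at both loci.

```python
def recombination_fraction(counts):
    """counts: testcross classes (both dominant, first only, second only, neither)."""
    both, first_only, second_only, neither = counts
    total = sum(counts)
    pair_one = both + neither            # one pair of reciprocal classes
    pair_two = first_only + second_only  # the other pair
    return min(pair_one, pair_two) / total

crosses = {
    "Bb Ff x bb ff": (270, 281, 268, 275),
    "Tt Ff x tt ff": (333, 36, 39, 342),
}
rf = {name: recombination_fraction(c) for name, c in crosses.items()}
for name, value in rf.items():
    print(f"{name}: recombination fraction = {value:.3f}")
```

A fraction near 0.5 is what independent assortment looks like; a fraction well below 0.5 points to linkage, and multiplying it by 100 gives the map distance in cM.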
LOD score analysis
LOD=X% = log10
probability of observed genotypes
if the loci are linked at X cM
probability of observed genotypes
if the loci are unlinked
example
[Pedigree figure, generations I to III. Legend: normal vs. nail patella syndrome (dominant). Blood type phenotypes shown: I: B, A; II: AB, O; III (individuals 1 to 8): B, B, A, B, B, B, A, A.]
Figure 1. A human pedigree showing segregation of the nail-patella
syndrome and blood type.
Recombinants?
Recombination frequency? 1/8*100 = 12.5%
LOD=12.5% = log10
probability of observed genotypes
if the loci are linked at 12.5 cM
probability of observed genotypes
if the loci are unlinked
probability of recombinant (12.5 cM) = 0.125/2 = 0.0625
probability of non-recombinant = (1-0.125)/2 = 0.4375
LOD(12.5 cM) = log10 [ (0.4375)^7 * (0.0625) / (0.25)^8 ] = 1.099
The numerator covers the 7 non-recombinants and 1 recombinant. The denominator is the probability of the observed genotypes if the loci are unlinked, i.e. independent assortment, where each of the four classes has probability 0.25.
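The arithmetic can be checked with a short sketch. The function name and argument names are illustrative; the per-class probabilities follow the slide (theta/2 for each recombinant class, (1 - theta)/2 for each non-recombinant class, 0.25 for every class if unlinked).

```python
import math

def lod_score(n_nonrecombinant, n_recombinant, theta):
    """LOD for a given recombination fraction theta, as set up on the slide."""
    p_rec = theta / 2
    p_nonrec = (1 - theta) / 2
    linked = p_nonrec ** n_nonrecombinant * p_rec ** n_recombinant
    unlinked = 0.25 ** (n_nonrecombinant + n_recombinant)
    return math.log10(linked / unlinked)

lod = lod_score(n_nonrecombinant=7, n_recombinant=1, theta=0.125)
print(f"LOD at 12.5 cM: {lod:.3f}")
```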
Does a LOD score value of 1.099 provide significant evidence in favor of linkage?
+3 and up: significant evidence in favor of linkage
Between -2 and +3: evidence not significant
Below -2: evidence against linkage
I. Clone large pieces of the genome
Make a BAC library of large genomic DNA inserts
[Figure: chromosome with STS markers 24, 62, 17, 54, 20, 9, 19, 36, 4. Each clone consists of a vector plus an insert.]
Many copies of the same chromosome… different copies sheared at different places
BACs: Bacterial Artificial Chromosomes; can hold inserts of 150-200 kb
(YACs: Yeast Artificial Chromosomes; can hold inserts of > 1 million bp.)
II. Map location on genome
[Figure: chromosome map with STS markers 24, 62, 17, 54, 20, 9, 19, 36, 4.]
STS = sequence tagged site… short, unique genomic sequence (not present anywhere else in the genome) that can be detected by PCR… ID tag for that portion of genome
Which portion of the genome is represented in this BAC’s insert?
Test the BAC by PCR. For example:
Does it test positive* with PCR primers for STS 24?
Does it test positive with PCR primers for STS 62? …etc.
*Test positive? What does that mean?
1. Randomly pick one clone (from the BAC clone library)
2. Sequence a few small parts
3. Design PCR primers that can amplify these sequences
4. Repeat, but also test whether your new clone has the same sequences as the previous clones
Means PCR of this clone produces a product with the primer sets that amplify products A, B, and C. The clone was tested with all of the primer sets, so the other sequences aren’t represented in this clone.
Means that the C, H, and I sites are neighbors, but we don’t know the order in which they are arranged.
Means that the B site is next to the H site.
[Figure: four parental chromosomes carrying repeats (ATTTAT)8, (ATTTAT)3, (ATTTAT)6, and (ATTTAT)10, each labeled “Cf allele?”.]
Pedigree:
I: Dd, Dd
II: D?, dd, D?
[Gel analyzing repeat size for each individual: lanes I-1, I-2, II-1, II-2, II-3; repeat sizes 10, 8, 6, 3.]
Cystic fibrosis is caused by mutations in a single human gene.
The pedigree to the right describes the inheritance of cystic
fibrosis in a small family. Based on that information, what is the
most likely mode of inheritance of cystic fibrosis? Write a
genotype for each individual given the phenotype information
alone. If you are uncertain of an allele, use a "?".
[Figure: four parental chromosomes carrying repeats (ATTTAT)8, (ATTTAT)3, (ATTTAT)6, and (ATTTAT)10, each labeled “Cf allele?”.]
Pedigree with repeat genotypes:
I: Dd (3,8), Dd (10,6)
II: D? (8,6), dd (10,3), D? (10,8)
[Gel analyzing repeat size for each individual: lanes I-1, I-2, II-1, II-2, II-3; repeat sizes 10, 8, 6, 3.]
A cartoon of the 4 parental chromosomes that carry the cf locus
and the linked repeat is shown to the left. Using the pedigree
and the linked marker, show which repeat is linked to a wild type
(WT) or a mutant (cf) allele. Write that information on the
figure, and write the repeat genotype under each individual in
the pedigree as well.
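Annotated on the chromosome cartoon: repeat 3 with d and repeat 10 with d (“maybe”).
Pedigree with repeat genotypes:
I: Dd (3,8), Dd (10,6)
II: D? (8,6), dd (10,3), D? (10,8)
[Gel analyzing repeat size for each individual: lanes I-1, I-2, II-1, II-2, II-3; repeat sizes 10, 8, 6, 3.]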
These parents are expecting another child. The DNA of the fetus is
isolated and analyzed as above. The repeat locus genotype is 10/8.
What would you predict about the fetus’ genotype at the cf locus?
How certain can you be about the cf genotype? What other
information would you need to increase your confidence in the
prediction?
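One way to reason through the prediction, sketched under the assumption that no recombination occurred between the repeat and the cf locus in any of the meioses involved. The repeats carried by the affected child (dd) are taken to mark the d chromosomes, and the remaining parental repeats mark D. Function names are hypothetical.

```python
# Repeat genotypes from the pedigree on the slide.
parents = {"I-1": (3, 8), "I-2": (10, 6)}
affected_child = (10, 3)  # II-2, genotype dd

def linked_alleles(parents, affected_child):
    """Assign D or d to each parental repeat, ASSUMING no recombination."""
    linkage = {}
    for repeats in parents.values():
        for r in repeats:
            linkage[r] = "d" if r in affected_child else "D"
    return linkage

def predict(repeat_genotype, linkage):
    """Predicted cf-locus genotype for a given pair of repeat alleles."""
    return "".join(sorted(linkage[r] for r in repeat_genotype))

linkage = linked_alleles(parents, affected_child)
print(linkage)
print("fetus 10/8:", predict((10, 8), linkage))
```

The prediction is only as certain as the no-recombination assumption, so the map distance between the repeat and the cf locus is the extra information that would sharpen it.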
The two pedigrees show inheritance of an autosomal dominant
trait (D = disease, dominant; d = normal, recessive). Numbers in
{curly brackets} indicate alleles of a microsatellite repeat
polymorphic locus. For each pedigree, state whether the meiosis
in II-1 is informative or uninformative, giving the parental types
for II-1 in each case.
Are the pedigrees informative or uninformative?