IV. The evolution of human behaviour

Determining kinship relationships
using genetic markers
Tom Wenseleers
Laboratorium voor Entomologie
KULeuven
tom.wenseleers@bio.kuleuven.be
Lecture can be downloaded from
bio.kuleuven.be/ento/wenseleers/twpub.htm#courses
GA75 'Moderne Onderzoeksmethoden in de Biologie' (Modern Research Methods in Biology), March 2006
Parentage and kinship analysis
Use of genetic markers to test hypotheses regarding
the kinship relationships among specific individuals
Kinship analysis: estimating the genetic relatedness
among individuals
Parentage analysis: identification of parents of specific
offspring
Applications
EVOLUTIONARY STUDIES
what is the relatedness between pairs of individuals?
test role of kin selection in the evolution of
cooperative behaviour
how common are extra-pair fertilizations?
how common is intraspecific brood
parasitism (egg dumping)?
BREEDING PROGRAMMES
design captive breeding programs for endangered
species
assign F1 generation fish to a particular father
in a mass-spawning experiment
HUMAN APPLICATIONS
parentage testing, forensic work at crime scenes
victim identification in mass disasters
What is Relatedness?
“The probability that a gene in one
individual is an identical copy, by
descent, of a gene in another individual”
Relatedness is a measure not of the absolute genetic
similarity between two individuals, BUT of the degree to
which this similarity exceeds the background similarity
between individuals randomly drawn from the population.
Marker considerations
Which markers can be used?
[Figure: banding patterns of genotypes AA, AB and BB for codominant and for dominant markers]
best: nuclear, codominant markers
  (allozymes) – low variability, not ideal
  microsatellites – tandem repeats of 2-5 bp motifs
  best choice (high resolution, little DNA required)
also: dominant markers, but analysis methods not as powerful
  minisatellites
  AFLP
  RAPD
Microsatellites = short tandem repeats
[Figure: a microsatellite array of CA repeats (~20-100 bp) between two non-repetitive flanking sequences; the length of the array depends on repeat unit size and number of repeats]
Sequences of DNA consisting of repeats of 2-5 base pair motifs
almost any combination possible (e.g. CA, GA, GGGAA )
discovered in 1980s, see e.g. Tautz, Trick & Dover (1986)
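A minimal sketch of how tandem repeats of 2-5 bp motifs could be located in a sequence with a regular expression. The function name `find_microsatellites`, the `min_repeats` threshold and the example sequence are illustrative assumptions, not part of the lecture.

```python
import re

def find_microsatellites(seq, min_repeats=5):
    """Return (start, motif, n_repeats) for tandem repeats of 2-5 bp motifs."""
    # Lazy {2,5}? prefers the shortest motif, so (CA)n is reported as CA, not CACA.
    pattern = re.compile(r"([ACGT]{2,5}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip mononucleotide runs such as AAAAAAAAAA
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits

# Hypothetical sequence: flanking sequence + (CA)7 + flanking sequence
example = "TTG" + "CA" * 7 + "GGT"
print(find_microsatellites(example))  # [(3, 'CA', 7)]
```

The same scan could be run over sequences retrieved from a database before designing primers in the flanking regions.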
Microsatellites
present in genome (nucleus + mitochondria + chloroplasts) of all
eukaryotes
easily amplified using PCR and separated on DNA sequencer
even highly degraded DNA can be used, e.g. single hair, faeces
highly variable, usually between 5 and 20 alleles per locus
di-repeats most common; tri-, tetra- and penta-repeats rarer
human genome: 35,000 (CA)n repeats
wasps: 1% of 500 bp fragments contain tri-repeat msats
most common in mammals and insects; about 10× rarer in birds
Microsatellites
primers to amplify msats:
- already developed for many organisms
- can sometimes be developed by searching Genbank for msat motifs
- cross-amplify using loci developed for other species
usually works within same genus and sometimes same family
if no loci are available → isolate and sequence new ones
steps: isolate total DNA → restriction digest → ligate small fragments into plasmid
or phage vector → transform E. coli cells → plate out colonies → lift colonies onto
filters → hybridize with probe containing msat repeats → pick & sequence positive
clones → design primers
if msats are rare and you need many loci: enrichment step
alternatively: have them developed commercially,
e.g. Amplicon, ca. 10,000 € / 10 loci
PCR conditions: universal touchdown program usually works for all loci
good isolation protocol:
http://www.uga.edu/srel/DNA_Lab/protocols.htm
http://snook.bio.indiana.edu/MENotes/home.html
This is the sister database for Molecular Ecology Notes, containing the details for reported loci (i.e.,
primer sequences, amplification conditions, polymorphism levels, cross-species amplification, and
literature citations) in a searchable format.
The database contains all Primer Note submissions to Molecular Ecology, as well as primer
submissions to Molecular Ecology Notes. In the future, relevant submissions from other journals
will be included, as it is hoped that this database will become the on-line resource for molecular
markers developed for "non-commercial" and non-model species.
The database may be searched using either the easy search page (searches based on family, genus or
species names) or, in the future, an advanced search page, which will allow more flexible and
detailed queries as the database grows.
Authors whose data have been accepted for publication in the database should familiarize
themselves with the database submission instructions and then use the database submission form to
add their data.
1. DNA extraction
For most purposes a small bit of tissue
can be boiled for 10 mins in 10% chelex
resin, and this works fine as a template
cheap & easy
2. PCR amplification
genomic DNA
+
primers
+
Taq DNA polymerase
+
dNTPs (ACGT)
+
buffer
Polymerase Chain Reaction
Process repeated 30-40 times
after 36 cycles: 2^36 ≈ 68.7 billion copies (assuming perfect doubling in every cycle)
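The copy number follows from doubling the template once per cycle. A small sketch of that arithmetic, assuming a single starting template and 100% efficiency in every cycle (real reactions fall short of this); the function name `pcr_copies` and the `efficiency` parameter are illustrative.

```python
def pcr_copies(cycles, start_copies=1, efficiency=1.0):
    """Idealised PCR yield: each cycle multiplies the copy number by (1 + efficiency)."""
    return start_copies * (1 + efficiency) ** cycles

print(int(pcr_copies(36)))  # 68719476736, i.e. about 68.7 billion
```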
3. Detection:
Radioactive (P33) end-labelling
Fluorescent labelling – gel-based sequencer
Fluorescent labelling – capillary sequencer
4 or 5 labels + 1 label for internal size standard
allows running up to 20 loci simultaneously
DNA minisatellites (“fingerprints”)
- tandem repeats of core
sequences 15-30 bp in length
(variable number tandem
repeats)
- most minisats occur 10 or 20
times in the whole genome
- human genome: ca. 50,000
VNTRs
- detected using Southern
blotting after restriction digest
- disadvantages: DNA quality
and amount needed; scoring
problems
DNA fingerprints can identify
individuals and determine
parentage
E.g., DNA fingerprints
confirmed Dolly the sheep
was cloned from an adult
udder cell
Donor udder (U), cell culture
from udder (C), Dolly’s
blood cell DNA (D), and
control sheep 1-12
AFLP
Amplified Fragment Length Polymorphism
Mutations at restriction enzyme
cutting sites result in
fragment length
polymorphism
Ligation of adapters to genomic
restriction fragments
Selective PCR amplification
with adapter-specific primers
Advantage
low development cost
Disadvantages
dominant marker
scoring
RAPD
Random Amplified Polymorphic DNA
Arbitrary primers, 8-10 bp long
Very little development needed
PCR amplification at low stringency (Ta 35-45 °C)
Variability: point mutations, insertions / deletions
Disadvantages: dominant marker; repeatability; scoring
The bottom line
Microsatellite markers are the best!
Kinship analysis
Measure of Relatedness:
Queller and Goodnight estimator
\[ R = \frac{\sum_x \sum_k \sum_l (P_y - P^*)}{\sum_x \sum_k \sum_l (P_x - P^*)} \]
(Queller and Goodnight 1989)
R = relatedness between individuals x and y where
Px = frequency within individual x of allele l at locus k (must be 0, 0.5
or 1.0 in diploid organisms)
Py = frequency of same allele in individuals to which x is
compared
P* = frequency of the allele in population at large (background
allele frequency)
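A minimal sketch of the estimator for a single pair of diploid individuals, written directly from the definitions of Px, Py and P* above. The function name `qg_relatedness` and the data layout (dicts keyed by locus) are assumptions for illustration; the allele frequencies are the locus1 values from the sample data file shown later in the lecture. Any bias corrections applied to P* by the Relatedness program are not reproduced here.

```python
def qg_relatedness(x, y, pop_freqs):
    """Relatedness of y to x: sum(Py - P*) / sum(Px - P*).

    x, y      : dict locus -> tuple of the two alleles carried
    pop_freqs : dict locus -> dict allele -> background frequency P*
    """
    numerator = 0.0
    denominator = 0.0
    for locus, alleles_x in x.items():
        alleles_y = y[locus]
        for allele in alleles_x:  # sum over the allele copies carried by x
            p_x = alleles_x.count(allele) / len(alleles_x)  # 0.5 or 1.0
            p_y = alleles_y.count(allele) / len(alleles_y)  # 0, 0.5 or 1.0
            p_star = pop_freqs[locus][allele]
            numerator += p_y - p_star
            denominator += p_x - p_star
    return numerator / denominator

pop_freqs = {"locus1": {"d": 0.234, "b": 0.191, "e": 0.194, "c": 0.254, "a": 0.128}}
ind_x = {"locus1": ("d", "b")}
ind_same = {"locus1": ("d", "b")}
ind_other = {"locus1": ("c", "c")}

print(qg_relatedness(ind_x, ind_same, pop_freqs))   # identical genotypes give 1.0
print(qg_relatedness(ind_x, ind_other, pop_freqs))  # no shared alleles gives a negative value
```

For a group average, the numerator and denominator would be summed over all individuals x before dividing, rather than averaging the pairwise ratios.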
Other estimators
Queller & Goodnight estimator works with codominant markers
assumes loci are unlinked
Also other estimators, e.g.
Ritland (1996), Lynch & Ritland (1999)
for codominant markers
Reeve et al. (1992), Lynch & Milligan (1994), Wang (2004)
for dominant markers
Different pros & cons in terms of how efficient & biased they are
see Van de Casteele et al. (2001), Wang (2004)
RELATEDNESS 5.0
http://www.gsoftnet.us/GSoft.html
Programs:
All Goodnight Software
programs are for
Macintosh PPC
computers only.
Relatedness calculates average genetic relatedness among
sets of individuals defined by demographic variables, either on
average or by pairs. It finds standard errors and confidence
intervals for significance testing using a jackknife resampling
method.
Features in Relatedness 5.0:
*Data sets with up to 127 loci of 127 alleles each, and number
of individuals limited only by computer memory.
*Up to 32 demographic variables with complete control over the
order in which they are checked.
*Pairwise values of relatedness.
*95% confidence intervals as well as standard errors.
The distribution package includes the program, a manual in
Microsoft Word 6.0 format, and a sample data set. (A
copy of the manual in Word 5.1 format is available on
request.)
Input data file
(1) Allele frequency block – you can also have the program calculate this
*Relatedness data file, population: Sample Data
*Saved 1/14/1998 10:12:28
*config    Guide  F-Delim  Deme-Col  ID-Col  Grp 1-Col  Grp 2-Col  Demog-Col
*#config:  T      /        F1        T1      T2         F3         2@3
*Allele frequencies
locus1  Freq   locus2  Freq   locus3  Freq   locus4  Freq
d       0.234  e       0.155  b              b       0.266
b       0.191  c       0.241  c              e       0.321
e       0.194  b       0.161  a              c       0.215
c       0.254  d       0.309  e              a       0.143
a       0.128  a       0.134  d              d       0.055
end
Input data file
(2) Genotypes – you can also add demographic variables, e.g. nest
Ind ID  K-nest  color  sibship  locus1  locus2  locus3
1-1     1       red    f1       d/d     e/c     b/c
1-2     1       red    f1       d/b     c/e     b/c
1-3     1       red    f1       b/d     c/e     c/a
1-4     1       red    f1       d/d     c/e     c/b
1-5     1       red    f1       b/d     c/e     a/c
1-6     1       red    f2       d/d     b/c     e/c
1-7     1       red    f2       d/d     c/b     c/e
1-8     1       red    f2       d/d     c/b     c/e
1-9     1       red    f2       d/d     b/c     c/a
Analysis
1. Define Px and Py
i.e. define the sets of individuals between which you want to calculate relatedness
e.g.
Px: all individuals; Py: nest=X
→ relatedness is calculated between individuals of the same nest
Px: sex=female; Py: nest=X AND sex=female
→ relatedness is calculated between females of the same nest
2. Define whether to calculate pairwise and/or average relatedness
3. Choose how to calculate standard errors (by jackknifing over loci or over nests)
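Jackknifing over loci (or nests) can be sketched as a delete-one jackknife with pseudovalues. This is the textbook procedure; that Relatedness 5.0 implements it in exactly this way is an assumption, and the function names and the per-locus numbers below are hypothetical.

```python
import math

def jackknife_se(items, estimator):
    """Delete-one jackknife: return (jackknife estimate, standard error).

    items     : list of units to resample over (e.g. one entry per locus)
    estimator : function mapping a list of units to a single estimate
    """
    n = len(items)
    full = estimator(items)
    pseudovalues = []
    for i in range(n):
        reduced = items[:i] + items[i + 1:]  # leave unit i out
        pseudovalues.append(n * full - (n - 1) * estimator(reduced))
    mean = sum(pseudovalues) / n
    variance = sum((p - mean) ** 2 for p in pseudovalues) / (n - 1)
    return mean, math.sqrt(variance / n)

def ratio_of_sums(per_locus):
    """Relatedness as summed numerators over summed denominators."""
    return sum(num for num, _ in per_locus) / sum(den for _, den in per_locus)

# Hypothetical (numerator, denominator) contributions of 4 loci
per_locus = [(10.2, 20.1), (8.9, 17.5), (12.4, 22.0), (9.1, 19.3)]
estimate, std_err = jackknife_se(per_locus, ratio_of_sums)
print(round(estimate, 4), round(std_err, 4))
```

A 95% confidence interval would then be the standard error multiplied by the t value for n - 1 degrees of freedom.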
Results - example
Whole population relatedness results:
R: 0.5142    Nx: 341    Ny: 341
Jackknife:   By locus:   By colony:
Std. Err.:   0.0163      0.0394
95% Conf.:   0.0520      0.0839
Pseud.:      4           16
Relatedness by colony
Value:  R:      Nx,Ny:  J/loci:  C.I.:
MA1     0.7835  21,21   0.0808   0.2572
MA2     0.4106  21,21   0.0385   0.1225
MA3     0.3652  20,20   0.0907   0.2885
R1      0.3907  26,26   0.0602   0.1916
R3      0.5681  15,15   0.1031   0.3281
R5      0.3994  25,25   0.0707   0.2250
R6      0.3600  20,20   0.0291   0.0927
R7m1    0.6503  4,4     0.0372   0.1185
R7m2    0.4792  12,12   0.0612   0.1948
T1      0.2852  24,24   0.0621   0.1976
T2      0.4596  30,30   0.0841   0.2676
T3      0.6132  25,25   0.0259   0.0825
T4      0.5969  33,33   0.0828   0.2634
T5      0.4373  18,18   0.0405   0.1287
T7      0.6306  31,31   0.0460   0.1465
T8      0.7126  16,16   0.1060   0.3372
Red wasp Vespula rufa
Average relatedness among
workers from the same nest
= 0.51
Less than the value expected
if they were full sisters (0.75)
Implies that the mother queen mates
with an average of
1 / (2 × (0.51 − 0.25)) ≈ 1.9 males
Wenseleers et al. Evolution 2005
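The mating-frequency calculation can be sketched as follows. It assumes a single haplodiploid queen per nest, so that workers sharing a father are related by 0.75 and workers with different fathers by 0.25, giving r = 0.25 + 0.5/k for k (effective) mates. The function name is illustrative.

```python
def mating_frequency(worker_relatedness):
    """Invert r = 0.25 + 0.5 / k to get the number of mates k."""
    return 0.5 / (worker_relatedness - 0.25)

print(round(mating_frequency(0.51), 1))  # 1.9
print(mating_frequency(0.75))            # 1.0 for full sisters
```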
DNA fingerprinting example
Mueller et al. (1994) PNAS
What is the average relatedness among females in nests
of the Halictid bee Augochlorella striata ?
Used DNA fingerprinting – multilocus, dominant marker
Reeve et al. (1992): relatedness among individuals
within a nest can be estimated as
R = (w-b) / (1-b)
where w = proportion of bands shared between individuals of same nest
b = proportion of bands shared between individuals of different nests
band sharing = 2Nab/(Na+Nb), where
Nab is the total number of bands shared by individuals a and b and
Na and Nb are the total number of bands present in a and b
Results: R=0.78, not significantly different from full-sister relationship (0.75)
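The band-sharing formulas above can be written out as a short sketch. Each fingerprint is represented as a set of band positions; the function names and the example band sets are hypothetical, and w and b would in practice be averages over many pairs of individuals.

```python
def band_sharing(bands_a, bands_b):
    """Band sharing = 2 * Nab / (Na + Nb) for two sets of scored bands."""
    shared = len(bands_a & bands_b)
    return 2 * shared / (len(bands_a) + len(bands_b))

def relatedness_from_bands(w, b):
    """R = (w - b) / (1 - b), with w within-nest and b between-nest band sharing."""
    return (w - b) / (1 - b)

# Hypothetical fingerprints (band positions)
nestmate_1 = {1, 2, 3, 5, 8}
nestmate_2 = {1, 2, 3, 5, 9}
print(band_sharing(nestmate_1, nestmate_2))          # 0.8
print(round(relatedness_from_bands(0.8, 0.2), 2))    # 0.75
```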
Interesting application:
estimate heritabilities in natural
populations
Thomas et al. (2000) Heredity
The heritability of a trait is usually determined
using breeding experiments
But it can also be estimated in natural
populations as the regression of the pairwise
estimate of phenotypic similarity against r
Other kinship analysis programs
Relatedness 5.0
  Relatedness estimators: Q&G, pairwise + group-average
  Platform: Mac
  Pros: user interface, haploid + diploid
  Cons: Mac only
  http://www.gsoftnet.us/GSoft.html
Identix
  Relatedness estimators: Q&G, L&R, Id; pairwise
  Platform: PC
  Pros: 3 different estimators
  Cons: flexibility
  http://www.univ-lille1.fr/gepv/english/perso_pages_en/PagepersoVincent_c.htm
Spagedi
  Relatedness estimators: 6 estimators, one for dominant markers; pairwise
  Platform: PC
  Pros: use of spatial info, can use dominant markers (AFLP/RAPD/minisat.)
  Cons: flexibility
  http://www.ulb.ac.be/sciences/ecoevol/spagedi.html
Delrious
  Relatedness estimators: L&R, pairwise
  Platform: PC, Mathematica
  Cons: user interface
  http://www.zoo.utoronto.ca/stone/DELRIOUS/delrious.htm
Parentage analysis
Parentage Analysis: Exclusion
Question: which male is the father of a particular offspring?
[Figure: genotypes of Female, Offspring, Male 1, Male 2 and Male 3; labels: "Is not the offspring of Male 2", "Unsampled male with paternal allele"]
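A minimal sketch of exclusion with a known mother, assuming codominant diploid loci and error-free genotypes (no null alleles, mutations or typing errors). A male is excluded as soon as, at any locus, he carries none of the alleles the father must have contributed. Function names and genotypes are hypothetical.

```python
def possible_paternal_alleles(offspring, mother):
    """Alleles the father may have contributed, given the mother's genotype."""
    first, second = offspring
    paternal = set()
    if first in mother:
        paternal.add(second)  # mother gave `first`, father gave `second`
    if second in mother:
        paternal.add(first)   # mother gave `second`, father gave `first`
    return paternal           # empty if the mother herself does not match

def is_excluded(male, offspring, mother):
    """True if the male mismatches the offspring at one or more loci."""
    for locus, offspring_genotype in offspring.items():
        paternal = possible_paternal_alleles(offspring_genotype, mother[locus])
        if not paternal & set(male[locus]):
            return True
    return False

mother = {"locus1": ("A", "B")}
offspring = {"locus1": ("A", "C")}
male_1 = {"locus1": ("C", "D")}  # carries the paternal allele C
male_2 = {"locus1": ("A", "B")}  # lacks C
print(is_excluded(male_1, offspring, mother))  # False
print(is_excluded(male_2, offspring, mother))  # True
```

A non-excluded male is not necessarily the father: an unsampled male may carry the same paternal allele.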
Parentage Analysis: Exclusion Based on
Compatibility of Genotypes Between Males and
Females
Question: who are the parents of a particular individual?
[Figure: genotypes of Female 1, Female 2, Offspring, Male 1 and Male 2]
With no a priori knowledge, F1/M1 and F2/M2 are
equally likely sets of parents
http://helios.bto.ed.ac.uk/evolgen/cervus/cervus.html
Version 2.0
© Copyright Tristan Marshall 1998-2001
------------------------------------------------------------------------
About CERVUS
CERVUS is a Windows 95-based program designed for large-scale parentage
analysis using co-dominant loci. Analysis is broken down into three sequential
stages. Using genotype data in text file format, the program can analyse allele
frequencies, run appropriate simulations and carry out likelihood-based
parentage analysis, testing the confidence of each parentage using the results
of the simulation. Simulations may also be used to estimate the power of a
series of loci for parentage analysis, using real or imaginary allele frequencies.
References
Marshall, TC, Slate, J, Kruuk, LEB & Pemberton, JM (1998)
Statistical confidence for likelihood-based paternity inference in
natural populations. Molecular Ecology 7(5): 639-655.
Slate J, Marshall TC & Pemberton JM (2000) A retrospective
assessment of the accuracy of the paternity inference program
CERVUS. Molecular Ecology 9(6): 801-808.
Use of Cervus
Uses likelihood methods to find the most likely parents. Useful
when more than one possible parent remains non-excluded.
Cervus calculates the likelihood ratio, or Paternity Index (the
likelihood that the candidate parent is the true parent divided by
the likelihood that the candidate parent is not the true parent),
and LOD scores (the log base e of the product of the likelihood
ratios at each locus).
Delta (difference in LOD scores between the most likely parent and
the second most likely parent) assesses the reliability of the
assignment.
LOD score of 0 means that the candidate parent is equally likely as
a random individual.
The most likely parent is the one with the most positive LOD score.
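The LOD and Delta definitions above can be illustrated with a short sketch. It only combines per-locus likelihood ratios that are already given; how Cervus derives those ratios from genotypes and allele frequencies is not shown. The candidate names and ratio values are hypothetical, and all ratios are assumed to be positive.

```python
import math

def lod_score(likelihood_ratios):
    """Natural log of the product of per-locus likelihood ratios."""
    return sum(math.log(ratio) for ratio in likelihood_ratios)

def most_likely_parent(candidates):
    """Return (best candidate, Delta) from a dict name -> per-locus ratios."""
    lods = sorted(((lod_score(r), name) for name, r in candidates.items()), reverse=True)
    best_lod, best_name = lods[0]
    delta = best_lod - lods[1][0]  # gap to the second most likely candidate
    return best_name, delta

# Hypothetical per-locus likelihood ratios for two candidate fathers
candidates = {"male_A": [4.0, 2.0, 3.0], "male_B": [1.5, 0.8, 1.2]}
best, delta = most_likely_parent(candidates)
print(best, round(delta, 2))
```

Summing logs rather than multiplying ratios avoids numerical underflow when many loci are combined.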
Statistical Power of loci for
parentage analyses
Assessed via the probability of exclusion, P(E): the probability of excluding a male who is not the
genetic father of a given offspring
Calculated for each locus and then values pooled
across all loci
Pi(E): for a given paternal allele, the probability that
another male has that allele
Examples of Values for PE
Eight microsatellite loci cloned
from Northern Watersnakes
(Nerodia sipedon) (Prosser
et al. 1999)
Locus     P(E)
Nsµ2      0.65
Nsµ3      0.82
Nsµ4      0.64
Nsµ6      0.55
Nsµ9      0.79
Nsµ10     0.66
Nsµ110    0.78
Nsµ119    0.86
Overall   > 0.999
Individual PiE [C] values: 0.99 - 0.9999999; mean = 0.999
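Pooling the per-locus values can be sketched as follows, assuming the loci are independent (unlinked): a non-father escapes exclusion only if he escapes at every locus, so the overall value is one minus the product of the per-locus non-exclusion probabilities. The function name is illustrative; the numbers are the watersnake values listed above.

```python
def combined_exclusion(per_locus_pe):
    """Overall P(E) = 1 - product over loci of (1 - P(E) at that locus)."""
    prob_not_excluded = 1.0
    for pe in per_locus_pe:
        prob_not_excluded *= 1.0 - pe
    return 1.0 - prob_not_excluded

watersnake_pe = [0.65, 0.82, 0.64, 0.55, 0.79, 0.66, 0.78, 0.86]
overall = combined_exclusion(watersnake_pe)
print(overall > 0.999)  # True, in line with the "Overall" entry
```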
Typing Errors
Perfect data is usually not the reality.
A mismatch due to a typing error will exclude a
true parent in a simple exclusion analysis.
In a likelihood analysis a single mismatch does
not exclude a parent, it simply decreases the
likelihood, but a true parent will probably still
be identified.
Also good for other kinds of errors – null alleles
and mutations.
Input files
Genotype – genotypes of all individuals
Allele frequencies
Offspring relationships to known parents and
candidate parents
Example: Noninvasive paternity
assignment in Gombe chimps
Constable et al. (2001) Mol. Ecol.
39 female and male chimps genotyped
at 16 loci using faecal and hair samples
Then determined paternity of 14 offspring
Mother known, but not the father
Using Cervus, 13 out of 14 could be assigned to a
particular father with a confidence of 99%, one could
be assigned with a confidence of 95%
Positive relationship between male rank and
reproductive success
No evidence of extra-group paternity
Other application: parentage testing
Parentage testing: settle disputes over
who is the father of a child & is thus
responsible for child support
Immigration cases: establishing that
individuals are the true children/
parents/siblings in cases of family
reunification
DNA Diagnostics, Auckland
Parentage testing
Paternity index
The paternity index in this man’s analysis shows that the DNA
evidence is 25 million times more likely if he is the
biological father than if he is not (odds 25 million : 1)
Kinship 1.0
http://gsoft.smu.edu/GSoft.htm
runs on Mac
KF Goodnight, DC Queller (1999) Computer software for performing likelihood tests of pedigree
relationship using genetic markers. Mol Ecol 8, 1231-1234
KinGroup 2.0
http://www.it.jcu.edu.au/kingroup/
JAVA, runs on PC+Mac
same functionality as Kinship
Use of Kinship
Uses likelihood methods to test hypotheses about kinship
relationships, e.g. father-son (R=0.5) as opposed to
unrelated (R=0)
Generates expected distributions of R values for given kin
relationships given a specific data set. This yields
confidence intervals for expected R values.
Can e.g. be used to group offspring in full-sib groups, i.e.
sharing the same father, or allocate offspring to particular
candidate parents
Example
Dierkes et al. (2005) Ecol. Lett.
Cooperatively breeding
cichlid Neolamprologus pulcher
Young stay in the nest and help their parents rear more offspring
Relatedness 5.0 was used to estimate the relatedness between
helpers and breeders
KinGroup was used to group individuals into full-sib groups and
determine the timing of breeder replacements
Need lots of loci
Ability to accurately distinguish between classes of relatives
requires > 20 moderately variable loci
Except in special situations…
e.g. haplodiploidy: greatly simplifies parentage assignment,
since father is haploid
Example: Wenseleers et al. 2005
study of the red wasp Vespula rufa
who produces the colony’s males,
the queen or the workers?
If the queen is AB and mated to a C male → if the queen produces the males,
then half will be A and half will be B; if the workers produce all the
males, then half will carry the paternal C allele
Workers, males and the mother queens genotyped at 4 loci
Results: 33 out of 342 males carried the paternal allele,
mean power to detect workers’ sons was 87%
→ (33/342)/0.87 = 11% of the males were workers’ sons
Example of parentage analysis
using DNA fingerprinting
Gibbs et al. 1990
Red-winged Blackbird
population (Agelaius
phoeniceus) in eastern
Ontario
Frequency of extra-pair fertilizations (EPFs): 47% of all nests had at least one chick from an EPF
EPFs made up an average of 21% of a male’s reproductive success
DNA fingerprints of
Red-winged Blackbird
families showing
examples where
resident male is
excluded as the parent
of chicks found in
nests on his territory.
Arrows indicate bands
(alleles) that exclude
the resident male
Other parentage analysis programs
Famoz: calculates likelihoods of particular relationships, can
also use sex-linked loci and dominant markers
http://www.pierroton.inra.fr/genetics/labo/Software/Famoz/
Gerud: estimates minimum number of sires for a family given
one known parent, reconstructs parental genotypes
http://www.biology.gatech.edu/professors/labsites/joneslab/parentage.html
DNA view: forensics, paternity testing
http://dna-view.com/
Good reviews
Michael S. Blouin (2003) DNA-based methods
for pedigree reconstruction and kinship
analysis in natural populations. Trends in
Ecology & Evolution 18: 503-511.
Adam G. Jones & William R. Ardren (2003)
Methods of parentage analysis in natural
populations. Molecular Ecology 12: 2511-2523.