Determining kinship relationships using genetic markers
Tom Wenseleers, Laboratorium voor Entomologie, KULeuven
tom.wenseleers@bio.kuleuven.be
Lecture can be downloaded from bio.kuleuven.be/ento/wenseleers/twpub.htm#courses
GA75 'Moderne Onderzoeksmethoden in de Biologie' (Modern Research Methods in Biology), March 2006

Parentage and kinship analysis
Use of genetic markers to test hypotheses regarding the kinship relationships among specific individuals
- Kinship analysis: estimating the genetic relatedness among individuals
- Parentage analysis: identification of the parents of specific offspring

Applications
EVOLUTIONARY STUDIES
- what is the relatedness between pairs of individuals?
- test the role of kin selection in the evolution of cooperative behaviour
- how common are extra-pair fertilizations?
- how common is intraspecific brood parasitism (egg dumping)?
BREEDING PROGRAMMES
- design captive breeding programs for endangered species
- assign F1 generation fish to a particular father in a mass-spawning experiment
HUMAN APPLICATIONS
- parentage testing, forensic work at crime scenes
- victim identification in mass disasters

What is Relatedness?
"The probability that a gene in one individual is an identical copy, by descent, of a gene in another individual"
Relatedness is a measure not of the absolute genetic similarity between two individuals, but of the degree to which this similarity exceeds the background similarity between individuals randomly drawn from the population.

Marker considerations
Which markers can be used?
Best: nuclear, codominant markers (genotypes AA, AB and BB can all be told apart)
- allozymes: low variability, not ideal
- microsatellites: tandem repeats of 2-5 bp motifs; best choice (high resolution, little DNA required)
Also: dominant markers, but analysis methods not as powerful
- minisatellites
- AFLP
- RAPD

Microsatellites = short tandem repeats
[Figure: a microsatellite array of CA repeats (~20-100 bp) between non-repetitive flanking sequences, shown for two alleles with different repeat numbers. Length depends on repeat unit size and number of repeats.]
- sequences of DNA consisting of repeats of 2-5 base pair motifs
- almost any combination possible (e.g. CA, GA, GGGAA)
- discovered in the 1980s, see e.g. Tautz, Trick & Dover (1986)

Microsatellites
- present in the genome (nucleus + mitochondria + chloroplasts) of all eukaryotes
- easily amplified using PCR and separated on a DNA sequencer
- even highly degraded DNA can be used, e.g. single hair, faeces
- highly variable, usually between 5 and 20 alleles
- di-repeats most common; tri-, tetra- and penta-repeats rarer
- human genome: 35,000 (CA)n repeats
- wasps: 1% of 500 bp fragments contain tri-repeat msats
- most common in mammals and insects, in birds 10 x rarer

Microsatellites
Primers to amplify msats:
- already developed for many organisms
- can sometimes be developed by searching Genbank for msat motifs
- cross-amplify using loci developed for other species; usually works within the same genus and sometimes the same family
If no loci are available, isolate and sequence new ones. Steps:
- isolate total DNA
- restriction digest
- ligate small fragments into plasmid or phage vector
- transform E. coli cells
- plate out colonies
- lift colonies onto filters
- hybridize with probe containing msat repeats
- pick & sequence positive clones
- design primers
If msats are rare and you need many loci: add an enrichment step
Alternatively: have them developed commercially, e.g. Amplicon, ca. 10,000 € / 10 loci
PCR conditions: a universal touchdown program usually works for all loci
Good isolation protocol: http://www.uga.edu/srel/DNA_Lab/protocols.htm

http://snook.bio.indiana.edu/MENotes/home.html
This is the sister database for Molecular Ecology Notes, containing the details for reported loci (i.e., primer sequences, amplification conditions, polymorphism levels, cross-species amplification, and literature citations) in a searchable format. The database contains all Primer Note submissions to Molecular Ecology, as well as primer submissions to Molecular Ecology Notes. In the future, relevant submissions from other journals will be included, as it is hoped that this database will become the on-line resource for molecular markers developed for "non-commercial" and non-model species. The database may be searched using either the easy search page (searches based on family, genus or species names) or, in the future, an advanced search page, which will allow more flexible and detailed queries as the database grows.

1. DNA extraction
For most purposes a small bit of tissue can be boiled for 10 min in 10% Chelex resin, and this works fine as a PCR template: cheap & easy

2. PCR amplification
genomic DNA + primers + Taq DNA polymerase + dNTPs (A, C, G, T) + buffer
Polymerase Chain Reaction: process repeated 30-40 times
After 36 cycles: 2^36 ≈ 68.7 billion copies per starting template molecule (assuming perfect doubling in every cycle)

3. Detection
- radioactive (33P) end-labelling
- fluorescent labelling, gel-based sequencer
- fluorescent labelling, capillary sequencer: 4 or 5 labels + 1 label for the internal size standard; allows running up to 20 loci simultaneously

DNA minisatellites ("fingerprints")
- tandem repeats of core sequences 15-30 bp in length (variable number tandem repeats, VNTRs)
- most minisats occur 10 or 20 times in the whole genome
- human genome: ca. 50,000 VNTRs
- detected using Southern blotting after restriction digest
- disadvantages: DNA quality and amount needed; scoring problems

DNA fingerprints can identify individuals and determine parentage
E.g., DNA fingerprints confirmed Dolly the sheep was cloned from an adult udder cell
[Figure: DNA fingerprints of donor udder (U), cell culture from udder (C), Dolly's blood cell DNA (D), and control sheep 1-12]

AFLP: Amplified Fragment Length Polymorphism
- mutations at restriction enzyme cutting sites result in fragment length polymorphism
- ligation of adapters to genomic restriction fragments
- selective PCR amplification with adapter-specific primers
Advantage: low development cost
Disadvantages: dominant marker; scoring

RAPD: Random Amplified Polymorphic DNA
- arbitrary primers 8-10 bp long
- very little development
- PCR amplification at low stringency (Ta 35-45 °C)
Variability: point mutations; insertions / deletions
Disadvantages: dominant marker; repeatability, scoring

The bottom line
Microsatellite markers are the best!
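The copy number quoted for 36 PCR cycles in the amplification step can be checked with a few lines of Python. This is only a sketch: it assumes the usual idealised growth model N = N0 (1 + E)^n, and the function name `pcr_copies` and its `efficiency` parameter are illustrative names, not something taken from the lecture. Real reactions plateau and rarely double perfectly in every cycle.

```python
def pcr_copies(cycles, start_copies=1, efficiency=1.0):
    """Expected copy number after `cycles` PCR cycles.

    efficiency=1.0 means perfect doubling in every cycle (an idealisation);
    lower values model incomplete amplification per cycle.
    """
    return start_copies * (1.0 + efficiency) ** cycles


# Perfect doubling for 36 cycles, starting from a single template molecule
print(f"{pcr_copies(36):.3g} copies")

# Hypothetical 90% per-cycle efficiency, for comparison
print(f"{pcr_copies(36, efficiency=0.9):.3g} copies")
```

With perfect doubling the result is 2^36 = 68,719,476,736, i.e. roughly 68.7 billion copies per starting molecule.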
Kinship analysis

Measure of Relatedness: Queller and Goodnight estimator
\[
R = \frac{\sum_x \sum_k \sum_l (P_y - P^*)}{\sum_x \sum_k \sum_l (P_x - P^*)}
\]
(Queller and Goodnight 1989)
R = relatedness between individuals x and y, where
- P_x = frequency within individual x of allele l at locus k (must be 0, 0.5 or 1.0 in diploid organisms)
- P_y = frequency of the same allele in the individuals to which x is compared
- P^* = frequency of the allele in the population at large (background allele frequency)

Other estimators
Queller & Goodnight estimator
- works with codominant markers
- assumes loci are unlinked
Also other estimators, e.g.
- Ritland (1996), Lynch & Ritland (1999) for codominant markers
- Reeve et al. (1992), Lynch & Milligan (1994), Wang (2004) for dominant markers
Different pros & cons in terms of how efficient & biased they are; see Van de Casteele et al. (2001), Wang (2004)

RELATEDNESS 5.0
http://www.gsoftnet.us/GSoft.html
Programs: All Goodnight Software programs are for Macintosh PPC computers only.
Relatedness calculates average genetic relatedness among sets of individuals defined by demographic variables, either on average or by pairs. It finds standard errors and confidence intervals for significance testing using a jackknife resampling method.
Features in Relatedness 5.0:
*Data sets with up to 127 loci of 127 alleles each, and number of individuals limited only by computer memory.
*Up to 32 demographic variables with complete control over the order in which they are checked.
*Pairwise values of relatedness.
*95% confidence intervals as well as standard errors.
The distribution package includes the program, a manual in Microsoft Word 6.0 format, and a sample data set. (A copy of the manual in Word 5.1 format is available on request.)
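The Queller and Goodnight ratio above can be sketched in a few lines of Python for a single pair of individuals. This is a minimal illustration, not the algorithm used inside Relatedness 5.0: it assumes that the innermost sum runs over the allele copies carried by x (my reading of the 1989 formulation), it computes only the one-directional estimate of y relative to x, and the names `freq_in` and `qg_relatedness` as well as the allele frequencies in the usage lines are made up for the example.

```python
import math


def freq_in(genotype, allele):
    """Frequency of `allele` within one genotype (0, 0.5 or 1.0 for a diploid)."""
    return genotype.count(allele) / len(genotype)


def qg_relatedness(x, y, pop_freqs):
    """One-directional Queller & Goodnight style estimate for the pair (x, y).

    x, y      : lists of genotypes, one tuple of alleles per locus
    pop_freqs : list of dicts (one per locus), allele -> background frequency P*
    """
    num = den = 0.0
    for gx, gy, p_star in zip(x, y, pop_freqs):
        for allele in gx:  # every allele copy carried by x at this locus
            num += freq_in(gy, allele) - p_star[allele]  # P_y - P*
            den += freq_in(gx, allele) - p_star[allele]  # P_x - P*
    # The ratio is undefined when the denominator is zero, e.g. for a
    # heterozygote at a two-allele locus with both frequencies equal to 0.5.
    return num / den if den != 0 else math.nan


# Hypothetical background frequencies for two loci (illustration only)
freqs = [{"a": 0.5, "b": 0.3, "c": 0.2}, {"d": 0.6, "e": 0.4}]
x = [("a", "b"), ("d", "d")]
y = [("a", "c"), ("d", "e")]

print(qg_relatedness(x, x, freqs))  # identical genotypes give 1.0
print(qg_relatedness(x, y, freqs))  # partial allele sharing
```

Because the estimate depends on which individual is treated as x, a common practice is to average `qg_relatedness(x, y, ...)` and `qg_relatedness(y, x, ...)`; group averages are obtained by accumulating numerator and denominator over all individuals before dividing.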
Input data file (1)
Allele frequency block (you can also have the program calculate this)
*Relatedness data file, population: Sample Data
*Saved 1/14/1998 10:12:28
*config   Guide  F-Delim  Deme-Col  ID-Col  Grp 1-Col  Grp 2-Col  Demog-Col
*#config: T      /        F1        T1      T2         F3         2@3
*Allele frequencies
locus1  Freq    locus2  Freq    locus3  Freq    locus4  Freq
d       0.234   e       0.155   b               b       0.266
b       0.191   c       0.241   c               e       0.321
e       0.194   b       0.161   a               c       0.215
c       0.254   d       0.309   e               a       0.143
a       0.128   a       0.134   d               d       0.055
end

Input data file (2)
Genotypes (you can also add demographic variables, e.g. nest)
Ind ID  K-nest  color  sibship  locus1  locus2  locus3
1-1     1       red    f1       d/d     e/c     b/c
1-2     1       red    f1       d/b     c/e     b/c
1-3     1       red    f1       b/d     c/e     c/a
1-4     1       red    f1       d/d     c/e     c/b
1-5     1       red    f1       b/d     c/e     a/c
1-6     1       red    f2       d/d     b/c     e/c
1-7     1       red    f2       d/d     c/b     c/e
1-8     1       red    f2       d/d     c/b     c/e
1-9     1       red    f2       d/d     b/c     c/a

Analysis
1. Define Px and Py, i.e. define the sets of individuals between which you would like to calculate relatedness, e.g.
- Px: all individuals; Py: nest=X. Relatedness is calculated between individuals of the same nest
- Px: sex=female; Py: nest=X AND sex=female. Relatedness is calculated between females of the same nest
2. Define whether to calculate pairwise and/or average relatedness
3. Define how to calculate standard errors (by jackknifing over loci or over nests)

Results - example
Whole population relatedness results:
R: 0.5142   Nx: 341   Ny: 341
Jackknife:   By locus   By colony
Std. Err.:   0.0163     0.0394
95% Conf.:   0.0520     0.0839
Pseud.:      4          16

Relatedness by colony
Value   R        Nx,Ny   J/loci   C.I.
MA1     0.7835   21,21   0.0808   0.2572
MA2     0.4106   21,21   0.0385   0.1225
MA3     0.3652   20,20   0.0907   0.2885
R1      0.3907   26,26   0.0602   0.1916
R3      0.5681   15,15   0.1031   0.3281
R5      0.3994   25,25   0.0707   0.2250
R6      0.3600   20,20   0.0291   0.0927
R7m1    0.6503   4,4     0.0372   0.1185
R7m2    0.4792   12,12   0.0612   0.1948
T1      0.2852   24,24   0.0621   0.1976
T2      0.4596   30,30   0.0841   0.2676
T3      0.6132   25,25   0.0259   0.0825
T4      0.5969   33,33   0.0828   0.2634
T5      0.4373   18,18   0.0405   0.1287
T7      0.6306   31,31   0.0460   0.1465
T8      0.7126   16,16   0.1060   0.3372

Red wasp Vespula rufa
- Average relatedness among workers from the same nest = 0.51
- Less than the value expected if they were full sisters (0.75)
- Implies that the mother queen mates with an average of 1/(2(0.51 - 0.25)) = 1.9 males (from r = 0.25 + 0.5/k for the daughters of a queen mated to k males that contribute equally)
Wenseleers et al. Evolution 2005

DNA fingerprinting example
Mueller et al. (1994) PNAS
What is the average relatedness among females in nests of the halictid bee Augochlorella striata?
Used DNA fingerprinting: a multilocus, dominant marker
Reeve et al. (1992): relatedness among individuals within a nest can be estimated as
R = (w - b) / (1 - b)
where
- w = proportion of bands shared between individuals of the same nest
- b = proportion of bands shared between individuals of different nests
- band sharing = 2Nab/(Na+Nb), where Nab is the total number of bands shared by individuals a and b, and Na and Nb are the total numbers of bands present in a and b
Results: R = 0.78, not significantly different from a full-sister relationship (0.75)

Interesting application: estimate heritabilities in natural populations
Thomas et al. (2000) Heredity
The heritability of a trait is usually determined using breeding experiments, but it can also be estimated in natural populations as the regression of the pairwise estimate of phenotypic similarity against r

Other kinship analysis programs
Relatedness 5.0
- relatedness estimators: Q&G, pairwise + group-average
- platform: Mac
- pros / cons: + user interface, haploid+diploid; Mac only
- http://www.gsoftnet.us/GSoft.html
Identix
- relatedness estimators: Q&G, L&R, Id; pairwise
- platform: PC
- pros / cons: 3 different estimators; flexibility
- http://www.univ-lille1.fr/gepv/english/perso_pages_en/PagepersoVincent_c.htm
Spagedi
- relatedness estimators: 6 estimators, one for dominant markers; pairwise
- platform: PC
- pros / cons: use of spatial info, can use dominant markers (AFLP/RAPD/minisat.); flexibility
- http://www.ulb.ac.be/sciences/ecoevol/spagedi.html
Delrious
- relatedness estimators: L&R, pairwise
- platform: PC
- pros / cons: Mathematica; - user interface
- http://www.zoo.utoronto.ca/stone/DELRIOUS/delrious.htm

Parentage analysis

Parentage Analysis: Exclusion
Question: which male is the father of a particular offspring?
[Figure: genotypes of a female, her offspring and candidate Males 1-3. Annotations: "Is not the offspring of Male 2"; "Unsampled male with paternal allele"]

Parentage Analysis: Exclusion Based on Compatibility of Genotypes Between Males and Females
Question: who are the parents of a particular individual?
[Figure: genotypes of Female 1, Female 2, the offspring, Male 1 and Male 2]
With no a priori knowledge, F1/M1 or F2/M2 are equally likely sets of parents

http://helios.bto.ed.ac.uk/evolgen/cervus/cervus.html
Version 2.0 © Copyright Tristan Marshall 1998-2001
About CERVUS
CERVUS is a Windows 95-based program designed for large-scale parentage analysis using co-dominant loci. Analysis is broken down into three sequential stages. Using genotype data in text file format, the program can analyse allele frequencies, run appropriate simulations and carry out likelihood-based parentage analysis, testing the confidence of each parentage using the results of the simulation.
Simulations may also be used to estimate the power of a series of loci for parentage analysis, using real or imaginary allele frequencies.
References
Marshall TC, Slate J, Kruuk LEB & Pemberton JM (1998) Statistical confidence for likelihood-based paternity inference in natural populations. Molecular Ecology 7(5): 639-655.
Slate J, Marshall TC & Pemberton JM (2000) A retrospective assessment of the accuracy of the paternity inference program CERVUS. Molecular Ecology 9(6): 801-808.

Use of Cervus
- Uses likelihood methods to find the most likely parents. Useful when more than one possible parent remains non-excluded.
- Cervus calculates the likelihood ratio, or Paternity Index (the likelihood that the candidate parent is the true parent divided by the likelihood that the candidate parent is not the true parent), and LOD scores (the natural logarithm of the product of the likelihood ratios at each locus).
- Delta (the difference in LOD scores between the most likely parent and the second most likely parent) assesses the reliability of the assignment.
- A LOD score of 0 means that the candidate parent is exactly as likely to be the true parent as a randomly chosen individual.
- The most likely parent is the one with the most positive LOD score.

Statistical Power of loci for parentage analyses
- Assessed via the Probability of Exclusion, P(E): the probability of excluding a male who is not the genetic father of a given offspring
- Calculated for each locus and then pooled across all loci (for independent loci, overall P(E) = 1 minus the product over loci of (1 - P(E) at that locus))
- Pi(E): for a given paternal allele, the probability that another male has that allele

Examples of Values for P(E)
Eight microsatellite loci cloned from Northern Watersnakes (Nerodia sipedon) (Prosser et al. 1999)
Locus     P(E)
Nsµ2      0.65
Nsµ3      0.82
Nsµ4      0.64
Nsµ6      0.55
Nsµ9      0.79
Nsµ10     0.66
Nsµ110    0.78
Nsµ119    0.86
Overall   > 0.999
Individual PiE [C] values: 0.99 - 0.9999999; mean = 0.999

Typing Errors
Perfect data is usually not the reality.
A mismatch due to a typing error will exclude a true parent in a simple exclusion analysis. In a likelihood analysis a single mismatch does not exclude a parent; it simply decreases the likelihood, and a true parent will probably still be identified. The likelihood approach also copes with other kinds of errors: null alleles and mutations.

Input files
- Genotype: genotypes of all individuals
- Allele frequencies
- Offspring: relationships to known parents and candidate parents

Example: Noninvasive paternity assignment in Gombe chimps
Constable et al. (2001) Mol. Ecol.
- 39 female and male chimps genotyped at 16 loci using faecal and hair samples
- Then determined paternity of 14 offspring (mother known, but not the father)
- Using Cervus, 13 out of 14 could be assigned to a particular father with a confidence of 99%; one could be assigned with a confidence of 95%
- Positive relationship between male rank and reproductive success
- No evidence of extra-group paternity

Other application: parentage testing
- Parentage testing: settle disputes over who is the father of a child and is thus responsible for child support
- Immigration cases: establishing that individuals are the true children/parents/siblings in cases of family reunification
DNA Diagnostics, Auckland

Parentage testing
Paternity index: the index in this man's analysis shows that the DNA evidence is 25 million times more likely if he is the biological father than if he is not (odds 25 million : 1)

Kinship 1.0
http://gsoft.smu.edu/GSoft.htm
Runs on Mac
KF Goodnight, DC Queller (1999) Computer software for performing likelihood tests of pedigree relationship using genetic markers. Mol Ecol 8, 1231-1234

KinGroup 2.0
http://www.it.jcu.edu.au/kingroup/
Java, runs on PC + Mac; same functionality as Kinship

Use of Kinship
- Uses likelihood methods to test hypotheses about kinship relationships, e.g. father-son (R=0.5) as opposed to unrelated (R=0)
- Generates expected distributions of R values for given kin relationships given a specific data set. This yields confidence intervals for expected R values.
- Can e.g. be used to group offspring into full-sib groups, i.e. sharing the same father, or to allocate offspring to particular candidate parents

Example
Dierkes et al. (2005) Ecol. Lett.
Cooperatively breeding cichlid Neolamprologus pulcher
- Young stay in the nest and help their parents rear more offspring
- Relatedness 5.0 was used to estimate the relatedness between helpers and breeders
- KinGroup was used to group individuals into full-sib groups and determine the timing of breeder replacements

Need lots of loci
Ability to accurately distinguish between classes of relatives requires > 20 moderately variable loci
Except in special situations, e.g. haplodiploidy: greatly simplifies parentage assignment, since the father is haploid

Example: Wenseleers et al. 2005 study of the red wasp Vespula rufa
Who produces the colony's males, the queen or the workers?
- If the queen is AB and mated to a C male: if the queen produces the males, then half will be A and half will be B; if the workers (AC or BC) produce all the males, then half will carry the paternal C allele
- Workers, males and the mother queens genotyped at 4 loci
- Results: 33 out of 342 males carried the paternal allele; mean power to detect workers' sons was 87%
- (33/342)/0.87 = 11% of the males were workers' sons

Example of parentage analysis using DNA fingerprinting
Gibbs et al. 1990
Red-winged Blackbird population (Agelaius phoeniceus) in eastern Ontario
Frequency of extra-pair fertilizations:
- 47% of all nests had 1+ chick from EPF
- EPFs made up an average of 21% of the male's reproductive success
DNA fingerprints of Red-winged Blackbird families showing examples where resident male is excluded as the parent of chicks found in nests on his territory.
Arrows indicate bands (alleles) that exclude the resident male

Other parentage analysis programs
- Famoz: calculates likelihoods of particular relationships, can also use sex-linked loci and dominant markers
  http://www.pierroton.inra.fr/genetics/labo/Software/Famoz/
- Gerud: estimates the minimum number of sires for a family given one known parent, reconstructs parental genotypes
  http://www.biology.gatech.edu/professors/labsites/joneslab/parentage.html
- DNA view: forensics, paternity testing
  http://dna-view.com/

Good reviews
Michael S. Blouin (2003) DNA-based methods for pedigree reconstruction and kinship analysis in natural populations. Trends in Ecology & Evolution 18: 503-511.
Adam G. Jones & William R. Ardren (2003) Methods of parentage analysis in natural populations. Molecular Ecology 12: 2511-2523.