Lecture 16: Individual Identity and Paternity Analysis
March 7, 2014

Last Time
- Interpretation of F-statistics
- More on the Structure program
- Principal Components Analysis
- Population assignment
- Individual identity (in lab)

Today
- Population assignment examples
- More on forensic evidence and individual identity
- Introduction to paternity analysis

Population Assignment: Likelihood
- "Assignment Tests" based on allele frequencies in source populations and genetic composition of individuals

$$LR = L(H_1, H_2 \mid G) = \frac{P(G \mid H_1)}{P(G \mid H_2)}$$

$P_{kl} = p_{il}^2$ for homozygote $A_iA_i$ in population $l$ at locus $k$
$P_{kl} = 2 p_{il} p_{jl}$ for heterozygote $A_iA_j$ in population $l$ at locus $k$

$$P(G \mid H_n) = \prod_{k=1}^{m} P_k \quad \text{for } m \text{ loci}$$

- Individual likelihood often summarized as: $-\log_{10} P(G \mid H_n)$
- Low numbers mean higher probability of $H_n$
- Likelihoods can be plotted against each other

Population Assignment Example: Wolf Populations in Northwest Territories
- Wolves sampled from island and mainland populations in the Canadian Northwest Territories
- Immigrants detected on mainland (black circles) from Banks Island (white circles)
- Carmichael et al. 2001 Mol Ecol 10:2787

[Figure: -log likelihood from Mainland plotted against -log likelihood from Banks Island]

Population Assignment Example: Fish Stories
- Fishing competition on Lake Saimaa in Southeast Finland
- Contestant allegedly caught a 5.5 kg salmon, much larger than usual for the lake
- Officials compared fish from the lake to fish from local markets (originating from Norway and the Baltic Sea)
- 7 microsatellites
- Based on likelihood analysis, the fish was purchased rather than caught in the lake

[Figure: likelihood comparison of Lake Saimaa and Market fish]

Individual Identity: Likelihood
- Assume you find skin cells and blood under the fingernails of a murder victim
- A hitman for the Sicilian mafia is seen exiting the apartment
- You gather DNA evidence from the skin cells and from the suspect
- They have identical genotypes
- What is the likelihood that the evidence came from the suspect?

$$LR = L(H_1, H_2 \mid G) = \frac{P(G \mid H_1)}{P(G \mid H_2)}$$

- What is H1 and what is H2?
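The assignment likelihoods above can be sketched numerically. The following is a minimal Python illustration, assuming Hardy-Weinberg proportions and independent loci as in the formulas for P_kl and P(G | H_n). The population names, allele labels, and allele frequencies are hypothetical values made up for the example, not data from the wolf or salmon studies, and the helper names `genotype_prob` and `multilocus_prob` are ours.

```python
import math

def genotype_prob(genotype, freqs):
    """Hardy-Weinberg probability of one single-locus genotype.

    genotype: tuple of two allele labels, e.g. ("A", "B")
    freqs: dict mapping allele label -> frequency in one population
    """
    a, b = genotype
    if a == b:
        return freqs[a] ** 2          # homozygote: p_i^2
    return 2 * freqs[a] * freqs[b]    # heterozygote: 2 p_i p_j

def multilocus_prob(genotypes, pop_freqs):
    """P(G | H_n): product of single-locus probabilities over m loci."""
    prob = 1.0
    for genotype, freqs in zip(genotypes, pop_freqs):
        prob *= genotype_prob(genotype, freqs)
    return prob

# HYPOTHETICAL allele frequencies at two loci in two source populations.
mainland = [{"A": 0.7, "B": 0.3}, {"C": 0.5, "D": 0.5}]
island = [{"A": 0.2, "B": 0.8}, {"C": 0.9, "D": 0.1}]

# HYPOTHETICAL individual: homozygous AA at locus 1, heterozygous CD at locus 2.
individual = [("A", "A"), ("C", "D")]

p_mainland = multilocus_prob(individual, mainland)
p_island = multilocus_prob(individual, island)
likelihood_ratio = p_mainland / p_island

# Lower -log10 values correspond to the more probable source population.
neg_log_mainland = -math.log10(p_mainland)
neg_log_island = -math.log10(p_island)

print(f"P(G | mainland) = {p_mainland:.4f}, -log10 = {neg_log_mainland:.3f}")
print(f"P(G | island)   = {p_island:.4f}, -log10 = {neg_log_island:.3f}")
print(f"LR (mainland vs island) = {likelihood_ratio:.1f}")
```

With these made-up frequencies the individual is far more probable under the mainland hypothesis. The sketch assumes every allele carried by the individual appears in both frequency tables; an allele with an estimated frequency of zero in one population would need separate handling.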
Match Probability
- Probability of observing a genotype at locus k by chance in a population is a function of allele frequencies:

$P_k = p_i^2$ (homozygote)
$P_k = 2 p_i p_j$ (heterozygote)

$$P = \prod_{k=1}^{m} P_k \quad \text{for } m \text{ loci}$$

- Assumes unlinked (independent) loci and Hardy-Weinberg equilibrium

Probability of Identity
- Probability that 2 randomly selected individuals have the same profile at locus k:

$$PID_k = \underbrace{\sum_i p_i^4}_{\text{homozygotes}} + \underbrace{\sum_i \sum_{j>i} (2 p_i p_j)^2}_{\text{heterozygotes}}$$

$$P = \prod_{k=1}^{m} PID_k \quad \text{for } m \text{ loci}$$

- Exclusion Probability (E): $E = 1 - P$

What if the slimy mob defense attorney argues that the most likely perpetrator is the mob hitman's brother, who has conveniently "disappeared"?
- Does the general match probability apply to near relatives?

Probability of identity for full sibs
- Homozygotes (terms correspond to 2, 1, and 0 alleles IBD, in that order):

$$PID_{hosib,k} = \frac{1}{4}\left(1 + 2 p_i + p_i^2\right)$$

- Heterozygotes (2 alleles IBD, 1 allele IBD, 0 alleles IBD):

$$PID_{hesib,k} = \frac{1}{4}\left(1 + p_i + p_j + 2 p_i p_j\right)$$

- General Probability of Identity for Full Sibs:

$$PID_{sib,k} = \frac{1}{4} + \frac{1}{2}\sum_i p_i^2 + \frac{1}{2}\left(\sum_i p_i^2\right)^2 - \frac{1}{4}\sum_i p_i^4$$

Full sibs versus unrelated individuals
- For a locus with 5 alleles, each at a frequency of 0.2:
  - Unrelated individuals: PID = 0.072
  - Full sibs: PIDsib = 0.368
- What is the minimum probability of identity for full sibs?

NRC (1996) recommendations
- Use the population that provides the highest probability of observing the genotype (unless other information is known)
- Correct homozygous genotypes for substructure within the selected population (e.g., Native Americans, Hispanics, African Americans, Caucasians, Asian Americans)
- No correction for heterozygotes

$P' = p_i^2 + p_i (1 - p_i) F_{ST}$ (homozygotes)
$P' = 2 p_i p_j$ (heterozygotes)

- Why is it 'conservative' (from the standpoint of proving a match) to ignore substructure for heterozygotes?

$$F_{ST} = \frac{H_T - H_S}{H_T}, \qquad H_S = H_T (1 - F_{ST})$$

Example: World Trade Center Victims
- Match victims using DNA collected from toothbrushes, hair brushes, or relatives
- Exact matches not guaranteed. Why not?
- Use likelihood to match samples to victims

A series of little NBA prospects are born to ardent basketball fans in every city with an NBA team. The mothers regularly allege that the fathers are NBA stars from visiting teams. The "players" deny this allegation. Can this be resolved using molecular markers and population genetics methodologies?

Paternity Exclusion Analysis
- Determine multilocus genotypes of all mothers, offspring, and potential fathers
- Determine the paternal gamete by "subtracting" the maternal genotype from that of each offspring
- Infer paternity by comparing the multilocus genotype of all gametes to those of all potential males in the population
- Assign paternity if all potential males, except one, can be excluded on the basis of genetic incompatibility with the observed pollen gamete genotype
- Unsampled males must be considered

Paternity Exclusion
- First step is to determine the paternal contribution based on seedling alleles that do not match the mother
- Notice that for locus 3 both alleles match the mother, so there are two potential paternal contributions
- Male 3 is the putative father because he is the only one that matches paternal contributions at all loci

[Figure: YES/NO matches between each candidate male and the paternal contribution at Locus 1, Locus 2, and Locus 3]

Paternity Exclusion Analysis
Possible outcomes and their consequences:
- Only one male cannot be excluded: paternity is assigned
- More than one male cannot be excluded: analyze more loci
- All males are excluded: conclude there is migration from external sources

Probabilities of Paternity Exclusion, Single Locus, 2 alleles, codominant
- The paternity exclusion probability is the sum of the probabilities of all exclusionary combinations (Hedrick 2005)
- Sum:

$$Pr_k = p_1 p_2 (1 - p_1 p_2)$$

- Probability that a falsely accused male fails to match at at least one of m loci:

$$Pr = 1 - \prod_{k=1}^{m} (1 - Pr_k)$$

- See Chakraborty et al. 1988 Genetics 118:527 for a more general calculation of exclusion power

Alleles versus Loci
- For a given number of alleles, one locus with many alleles provides more exclusion power than many loci with few alleles
  - 10 loci, 2 alleles: Pr = 0.875
  - 1 locus, 20 alleles: Pr = 0.898
- Uniform allele frequencies provide more power

Characteristics of an ideal genetic marker for paternity analysis
- Highly polymorphic (i.e.
with many alleles)
- Codominant
- Easy to use for genotyping large numbers of individuals
- Mendelian or paternal inheritance
- Low cost
- Reliable

[Figure: exclusion probability as a function of the number of alleles and the number of loci]

Shortcomings of Paternity Exclusion
- Requiring exact matches for potential fathers is excessively stringent
  - Mutation
  - Genotyping error
- Multiple males may match, but the probability of a match may differ substantially
- No built-in way to deal with cryptic gene flow: the case when a sampled male matches, but an unsampled male may also match (Type I error: wrong father assigned paternity)

Advantages and Disadvantages of Likelihood
Advantages:
- Flexibility: can be extended in many ways
  - Compensate for errors in genotyping
  - Incorporate factors influencing mating success: fecundity, distance, and direction
- Compensates for lack of exclusion power
  - Fractional paternity
Disadvantages:
- Often results in ambiguous paternities
- Difficult to determine the proper cutoff for the LOD score

Summary
- Direct assessment of movement is the best way to measure gene flow
- Parentage analysis is a powerful approach to track movements of mates retrospectively
- Paternity exclusion is straightforward to apply but may lack power and is confounded by genotyping error
- Likelihood-based approaches can be more flexible, but also provide ambiguous answers when power is lacking
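As a numerical check on three values quoted in this lecture (PID = 0.072 and PIDsib = 0.368 for a locus with five equally frequent alleles, and Pr = 0.875 for ten two-allele loci), here is a minimal Python sketch. It assumes the standard forms of the probability of identity for unrelated individuals and full sibs and the single-locus, two-allele exclusion formula Pr_k = p1 p2 (1 - p1 p2). The function names are ours, and taking both alleles at a frequency of 0.5 in the ten-locus case is an assumption on our part. The 20-allele value (0.898) needs the more general calculation of Chakraborty et al. (1988) and is not reproduced here.

```python
from itertools import combinations
from math import prod

def pid_unrelated(freqs):
    """Probability that two unrelated individuals share a genotype at one locus."""
    homozygotes = sum(p ** 4 for p in freqs)
    heterozygotes = sum((2 * p * q) ** 2 for p, q in combinations(freqs, 2))
    return homozygotes + heterozygotes

def pid_full_sibs(freqs):
    """Probability that two full sibs share a genotype at one locus."""
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    return 0.25 + 0.5 * s2 + 0.5 * s2 ** 2 - 0.25 * s4

def exclusion_two_alleles(p1):
    """Single-locus exclusion probability for two codominant alleles."""
    p2 = 1.0 - p1
    return p1 * p2 * (1.0 - p1 * p2)

def multilocus_exclusion(per_locus):
    """Probability that a falsely accused male mismatches at one or more loci."""
    return 1.0 - prod(1.0 - pr for pr in per_locus)

# Locus with 5 alleles, each at frequency 0.2 (values from the lecture).
five_equal = [0.2] * 5
pid = pid_unrelated(five_equal)
pid_sib = pid_full_sibs(five_equal)

# Ten loci with two alleles each; p = 0.5 for both alleles is an ASSUMPTION.
ten_loci = [exclusion_two_alleles(0.5)] * 10
pr_ten = multilocus_exclusion(ten_loci)

print(f"PID (unrelated) = {pid:.3f}")
print(f"PID (full sibs) = {pid_sib:.3f}")
print(f"Pr (10 loci, 2 alleles) = {pr_ten:.3f}")
```

The first two results show why a sibling defense matters: for the same locus, full sibs are roughly five times more likely than unrelated individuals to share a genotype.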