Determining paternity and gene flow from cherry microsatellite data
T. Connolly¹, J. E. Cottrell¹, S. P. Vaughan² and K. Russell³
¹ Forest Research, Northern Research Station, Roslin, Midlothian, EH25 9SY, UK. e-mail: thomas.connolly@forestry.gsi.gov.uk
² Department of Tropical Plant and Soil Sciences, University of Hawai’i, 3190 Maile Way, Honolulu, HI 98622, USA.
³ Horticulture Research International, East Malling, UK.
Abstract
There are concerns that the increasing fragmentation of today’s woodlands may result in restricted gene flow between populations. Such isolation could lead to the loss of genetic diversity as well as to inbreeding depression. The extent of gene flow and the risk of genetic isolation are therefore important considerations in the development of appropriate conservation strategies for our increasingly fragmented forest habitats (Smouse & Sork, 2004).
Gene flow can be estimated by both direct and indirect measures. Indirect approaches provide an insight into past gene flow and colonisation processes but often fail to provide information on the effects of recent changes such as landscape fragmentation. However, the development of highly variable, codominant microsatellite markers enables direct estimates of contemporary gene flow to be obtained. Although these highly polymorphic markers make it technically possible to study the extent of gene flow, the analysis of such data raises several statistical issues.
Direct methods use genetic variation in the progeny to identify the parental contribution directly and to calculate dispersal parameters of pollen or seed. The earliest and conceptually simplest technique of parentage analysis is exclusion, where incompatibilities between potential parents and offspring lead to rejection of particular parent-offspring combinations.
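As an illustration of strict exclusion, the following minimal sketch assumes that the mother of each embryo is known and that every individual is scored as a pair of allele sizes at each locus. The function names, locus names, tree labels and allele sizes are all hypothetical and are not taken from the study data. A candidate father is retained only if, at every locus, one embryo allele can be traced to the mother and the other to the candidate.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: locus name -> pair of allele sizes.
Genotype = Dict[str, Tuple[int, int]]


def compatible_at_locus(embryo: Tuple[int, int],
                        mother: Tuple[int, int],
                        father: Tuple[int, int]) -> bool:
    """True if one embryo allele can come from the mother and the other from the father."""
    a, b = embryo
    return (a in mother and b in father) or (b in mother and a in father)


def non_excluded_fathers(embryo: Genotype,
                         mother: Genotype,
                         candidates: Dict[str, Genotype]) -> List[str]:
    """Return the candidates that show no mismatch at any locus (strict exclusion)."""
    return [
        name
        for name, genotype in candidates.items()
        if all(
            compatible_at_locus(embryo[locus], mother[locus], genotype[locus])
            for locus in embryo
        )
    ]


# Invented example data for two loci and three candidate trees.
mother = {"locus1": (100, 104), "locus2": (200, 200)}
embryo = {"locus1": (100, 108), "locus2": (200, 206)}
candidates = {
    "tree_A": {"locus1": (108, 110), "locus2": (206, 210)},
    "tree_B": {"locus1": (108, 108), "locus2": (200, 202)},  # mismatch at locus2
    "tree_C": {"locus1": (100, 104), "locus2": (206, 206)},  # mismatch at locus1
}

print(non_excluded_fathers(embryo, mother, candidates))
```

Because a single mismatch removes a candidate, one mistyped allele in this sketch would be enough to exclude the true father.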
Although early studies of paternity in trees found large allelic diversity across several microsatellite loci (Dow & Ashley, 1996; Streiff et al., 1999), the exclusion approach has certain limitations. Under strict exclusion a single mismatch is enough to exclude a candidate parent and, consequently, genotyping errors, null alleles and mutations can all cause false exclusions. These limitations suggest that exclusion analysis is best suited to situations in which there are few candidate parents. With several loci and many potential parents it becomes necessary to use more sophisticated statistical analyses.
The LOD (log of the odds ratio) score is the logarithm of the ratio of the likelihood that an individual is the parent of a given offspring to the likelihood that the potential parent and offspring are unrelated. In an isolated population, after an exhaustive evaluation of all genetically possible parents, offspring are assigned to the candidate parent with the highest LOD score (Meagher & Thompson, 1986; Slate et al., 2000).
This approach has been further developed by Gerber et al. (2000, 2003) in the software program FAMOZ, which can analyse parentage in a non-isolated sub-population of adult trees. The total gene flow is sub-divided into two components: that from outside and that from inside the stand. Where the genotyped population forms part of a much larger population, gene flow from outside the stand is likely to be underestimated because foreign and local gametes can be indistinguishable, generating an undetected ‘cryptic gene flow’. Because the genotyped sub-population forms part of a much larger population, Gerber et al. (2000) set out a rationale for deciding whether a given individual can be considered a true parent. Two simulations based on a theoretical, large, random mating population each generate a LOD frequency distribution curve for N hypothetical embryos.
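The LOD score defined above can be sketched for the case of a known mother and a candidate father. This is one common textbook formulation and is not claimed to be the calculation performed by FAMOZ: it assumes Mendelian inheritance, independent loci, natural logarithms and no allowance for typing error. All function names, alleles and frequencies are invented for illustration.

```python
import math
from itertools import product
from typing import Dict, Tuple

Pair = Tuple[int, int]


def prob_given_father(offspring: Pair, mother: Pair, father: Pair) -> float:
    """Mendelian probability of the offspring genotype given both parents."""
    target = tuple(sorted(offspring))
    hits = sum(
        1 for m, f in product(mother, father) if tuple(sorted((m, f))) == target
    )
    return hits / 4.0


def prob_given_random_father(offspring: Pair, mother: Pair,
                             freqs: Dict[int, float]) -> float:
    """Probability of the offspring genotype when the paternal allele is drawn
    at random according to the population allele frequencies."""
    target = tuple(sorted(offspring))
    total = 0.0
    for m in mother:  # each maternal allele is transmitted with probability 1/2
        for allele, p in freqs.items():
            if tuple(sorted((m, allele))) == target:
                total += 0.5 * p
    return total


def lod_score(offspring: Dict[str, Pair], mother: Dict[str, Pair],
              father: Dict[str, Pair],
              freqs: Dict[str, Dict[int, float]]) -> float:
    """Sum over loci of log(likelihood as father / likelihood as unrelated)."""
    lod = 0.0
    for locus in offspring:
        num = prob_given_father(offspring[locus], mother[locus], father[locus])
        if num == 0.0:
            return float("-inf")  # incompatible at this locus (no error model here)
        den = prob_given_random_father(offspring[locus], mother[locus], freqs[locus])
        if den == 0.0:
            raise ValueError(f"paternal allele at {locus} missing from frequency table")
        lod += math.log(num / den)
    return lod


# Invented single-locus example.
freqs = {"locus1": {100: 0.5, 104: 0.2, 108: 0.1, 110: 0.2}}
mother = {"locus1": (100, 104)}
offspring = {"locus1": (100, 108)}
candidate = {"locus1": (108, 110)}

print(lod_score(offspring, mother, candidate, freqs))
```

In this example the candidate transmits allele 108 with probability 1/2, whereas a random pollen grain carries it with probability 0.1, so the likelihood ratio is 5 and the LOD is the natural logarithm of 5. Rarer shared alleles therefore contribute more strongly to the score.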
In the first simulation a set of possible gametes is generated from the genotyped parents. In the second simulation the gametes are generated from the alleles present in the genotyped parents, according to their frequencies in the genotyped parent population. FAMOZ requires values for the following simulation parameters: departure from Hardy-Weinberg equilibrium, calculation (mistyping) error and simulation error. These parameters are varied until the best separation of the two curves is found. The LOD threshold is the value at which the two distributions intersect. A parent is considered to come from inside the stand if it has the highest LOD score and that score is above the threshold. Where no parent LOD score exceeds the threshold the embryo is considered to have been fathered from outside the stand.
The analysis presented here uses data collected from a native stand of wild cherry to compare estimates of pollen dispersal obtained either by exclusion or from likelihoods and LOD scores using the FAMOZ software. The 29 ha study site is situated near East Malling, Kent, England and contains 248 wild cherry trees (Prunus avium). A total of 420 embryos collected from 10 mother trees were analysed. Both adult trees and embryos were genotyped at 13 microsatellite loci. The data were subjected to paternity analysis by exclusion and by LOD threshold.
The results are based on a preliminary set of simulations (N = 1000). The parameter values which resulted in the best separation of the two simulation curves were: departure from Hardy-Weinberg equilibrium = 0, calculation error = 0.005 and simulation error = 0.010. The LOD threshold was 5, and correct paternity assignment was achieved in 82.9% of cases when the tree with the highest LOD score exceeding the threshold was taken to be the actual father. The LOD threshold approach is shown to make better use of the data by allowing the identity of each father to be determined to a known level of accuracy.
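The threshold rule can be sketched as follows, given two lists of simulated LOD scores. The first function is a stand-in for locating the intersection of the two curves: rather than estimating and intersecting the two frequency distributions, it picks the cut-off that minimises the summed error rates of the two samples, which is an assumption of this sketch and not necessarily how FAMOZ proceeds. The second function applies the assignment rule stated in the text. Function names, tree labels and scores are hypothetical.

```python
from typing import Dict, Optional, Sequence


def separating_threshold(lod_true_parent: Sequence[float],
                         lod_unrelated: Sequence[float]) -> float:
    """Cut-off minimising (true parents at or below it) + (unrelated above it),
    each expressed as a proportion of its own simulated sample."""
    best_t = None
    best_err = float("inf")
    for t in sorted(set(lod_true_parent) | set(lod_unrelated)):
        missed = sum(x <= t for x in lod_true_parent) / len(lod_true_parent)
        false_hits = sum(x > t for x in lod_unrelated) / len(lod_unrelated)
        if missed + false_hits < best_err:
            best_t, best_err = t, missed + false_hits
    return best_t


def assign_father(lods: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the candidate with the highest LOD if it exceeds the threshold;
    return None when the embryo is treated as fathered from outside the stand."""
    if not lods:
        return None
    best = max(lods, key=lods.get)
    return best if lods[best] > threshold else None


# Invented simulated scores (a real run would use N = 1000 embryos per simulation).
simulated_true = [6.0, 7.5, 5.5, 8.0, 9.0]
simulated_unrelated = [1.0, 2.5, 3.0, 4.5, 0.5]
threshold = separating_threshold(simulated_true, simulated_unrelated)

print(threshold)
print(assign_father({"tree_17": 6.2, "tree_03": 2.1}, threshold))
print(assign_father({"tree_03": 2.1}, threshold))
```

Counting the embryos for which `assign_father` returns None gives an estimate of pollen flow from outside the stand, subject to the cryptic gene flow described earlier.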
Using simple exclusion it was possible to identify the father unambiguously in only 30% of embryos, compared with 53% using the LOD threshold. Although simple exclusion identified a further 29% of embryos as having multiple candidate fathers, their paternity remained ambiguous.
References
Dow, B.D. and Ashley, M.V. (1996) Microsatellite analysis of seed dispersal and parentage of saplings in bur oak, Quercus macrocarpa. Molecular Ecology 5, 615-627.
Gerber, S., Mariette, S., Bodénès, C. and Kremer, A. (2000) Comparison of microsatellites and amplified fragment length polymorphism markers for parentage analysis. Molecular Ecology 9, 1037-1048.
Gerber, S., Chabrier, P. and Kremer, A. (2003) FAMOZ: a software for parentage analysis using dominant, codominant and uniparentally inherited markers. Molecular Ecology Notes 3, 479-481.
Meagher, T.R. and Thompson, E. (1986) The relationship between single parent and parent pair genetic likelihoods in genealogy reconstruction. Theoretical Population Biology 29, 87-106.
Slate, J., Marshall, T. and Pemberton, J. (2000) A retrospective assessment of the accuracy of the paternity inference program CERVUS. Molecular Ecology 9, 801-808.
Smouse, P.E. and Sork, V.L. (2004) Measuring pollen flow in forest trees: an exposition of alternative approaches. Forest Ecology and Management 197, 21-38.
Streiff, R., Ducousso, A., Lexer, C., Steinkellner, H., Gloessl, J. and Kremer, A. (1999) Pollen dispersal inferred from paternity analysis in a mixed oak stand of Quercus robur L. and Quercus petraea (Matt.) Liebl. Molecular Ecology 8, 831-841.