International Biometric Society

RECONSTRUCTING PEDIGREES WITH MAXIMAL LIKELIHOOD FROM GENETIC MARKER DATA

Nuala A Sheehan (1), Mark Bartlett (2), James Cussens (2)

(1) Department of Health Sciences, University of Leicester, UK
(2) Department of Computer Science, University of York, UK

The estimation of particular relationships, or entire pedigree structures, from genetic marker data is relevant to a wide range of applications. The aim of a likelihood-based approach is to find the pedigree that gives the highest likelihood to the observed data. To guarantee a maximum likelihood pedigree reconstruction, a complete search over all possible pedigrees connecting the typed individuals is required. Existing approaches are either restricted to small numbers of individuals or deliver a reconstruction that will probably have high likelihood but is not guaranteed to be optimal. By encoding the pedigree learning problem as an integer linear program, we can exploit efficient optimisation algorithms to construct pedigrees with maximal likelihood for the standard situation where all individuals are observed at unlinked marker loci, founder genotypes are in Hardy-Weinberg equilibrium and segregation of genes from parents to offspring is Mendelian. Since the simple factorisation of the likelihood in this setting defines a Bayesian network (BN), the reconstruction problem is equivalent to searching for an optimal BN, where the search is constrained to BNs that are valid pedigrees. The method is not restricted to small pedigrees: we have tested it with simulated data on a real human pedigree structure of over 1600 individuals. It also competes well with other approaches on smaller problems in terms of solving times. We note that a maximum likelihood pedigree is not necessarily the true pedigree and is not necessarily unique. The true pedigree typically has lower likelihood than the maximum likelihood pedigree because of the probabilistic nature of genetic inheritance.
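The likelihood factorisation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the coding of a genotype as the count of one allele at a biallelic locus (0, 1 or 2), and the toy data are all assumptions made for illustration. Each individual contributes one local term per locus, which is a Hardy-Weinberg probability for a founder and a Mendelian transmission probability for a non-founder, and the log-likelihood of a candidate pedigree is the sum of these terms.

```python
import math

def transmit_prob(parent_genotype, allele_freq):
    """Probability that one parent passes on allele A at a biallelic locus.

    parent_genotype is the number of A alleles carried (0, 1 or 2).
    None means the parent is absent from the pedigree, so the allele is
    drawn from the population with frequency allele_freq (an assumption of
    this sketch; with both parents absent this reduces to Hardy-Weinberg).
    """
    if parent_genotype is None:
        return allele_freq
    return parent_genotype / 2.0

def local_prob(child, father, mother, allele_freq):
    """P(child genotype | parental genotypes) under Mendelian segregation."""
    tf = transmit_prob(father, allele_freq)
    tm = transmit_prob(mother, allele_freq)
    if child == 2:
        return tf * tm
    if child == 0:
        return (1 - tf) * (1 - tm)
    return tf * (1 - tm) + (1 - tf) * tm

def log_likelihood(pedigree, genotypes, allele_freqs):
    """Sum the local log-probabilities over individuals and unlinked loci.

    pedigree maps each individual to a (father, mother) pair, with
    (None, None) marking a founder.
    """
    total = 0.0
    for child, (father, mother) in pedigree.items():
        for locus, freq in enumerate(allele_freqs):
            f = genotypes[father][locus] if father is not None else None
            m = genotypes[mother][locus] if mother is not None else None
            prob = local_prob(genotypes[child][locus], f, m, freq)
            if prob == 0.0:
                return -math.inf  # Mendelian inconsistency
            total += math.log(prob)
    return total

# Hypothetical toy data: three individuals typed at two unlinked loci.
genotypes = {"dad": [2, 0], "mum": [0, 2], "kid": [1, 1]}
allele_freqs = [0.5, 0.5]

# Three candidate pedigrees over the same individuals.
trio = {"dad": (None, None), "mum": (None, None), "kid": ("dad", "mum")}
unrelated = {"dad": (None, None), "mum": (None, None), "kid": (None, None)}
impossible = {"kid": (None, None), "mum": (None, None), "dad": ("kid", "mum")}

for name, ped in [("trio", trio), ("unrelated", unrelated), ("impossible", impossible)]:
    print(name, log_likelihood(ped, genotypes, allele_freqs))
```

Because the score decomposes into one term per individual given its parent set, choosing a pedigree amounts to choosing a parent set for each individual subject to validity constraints, which is what makes a Bayesian network formulation natural.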
Because we can actually identify a maximum likelihood pedigree, we can also obtain any number of pedigrees in decreasing order of likelihood by adding an extra constraint at each step in the solving process. This permits a model averaging approach to addressing the uncertainty about how closely a reconstruction resembles the true pedigree. It also enables a proper investigation into the properties of maximum likelihood pedigree estimates, which has not been possible until now. To obtain accurate reconstructions, especially on large problems, we need a constrained maximum likelihood approach in which the search is guided by appropriate prior information, such as age or an age ranking, sex, or specific relationships, which is often available for some members of the pedigree. The efficiency of our approach on such large problems bodes well for extensions beyond the standard setting above, to cases where some pedigree members may be latent, genotypes may be measured with error and markers may be linked.

1. Cussens J, Bartlett M, Jones EM, Sheehan NA (2013). Maximum likelihood pedigree reconstruction using integer linear programming. Genetic Epidemiology 37: 69-83.
2. Thompson EA (1976). Inference of genealogical structure. Social Science Information 15: 477-526.

International Biometric Conference, Florence, ITALY, 6 – 11 July 2014