International Biometric Society
RECONSTRUCTING PEDIGREES WITH MAXIMAL LIKELIHOOD FROM GENETIC MARKER DATA
Nuala A Sheehan¹, Mark Bartlett², James Cussens²
¹Department of Health Sciences, University of Leicester, UK
²Department of Computer Science, University of York, UK
The estimation of particular relationships, or entire pedigree structures, from genetic marker
data is relevant to a wide range of applications. The aim of a likelihood-based approach is to
find the pedigree that gives the highest likelihood to the observed data. In order to
guarantee a maximum likelihood pedigree reconstruction, a complete search over all
possible pedigrees connecting the typed individuals is required. Existing approaches are
either restricted to small numbers of individuals or else deliver a reconstruction that will
probably have high likelihood but is not guaranteed to be the optimal pedigree. By encoding
the pedigree learning problem as an integer linear program we can exploit efficient
optimisation algorithms to construct pedigrees with maximal likelihood for the standard
situation where all individuals are observed at unlinked marker loci, founder genotypes are
in Hardy-Weinberg equilibrium and segregation of genes from parents to offspring is
Mendelian. Since the simple factorisation of the likelihood in this setting defines a Bayesian
network (BN), the reconstruction problem is equivalent to searching for an optimal BN where
the search is constrained to BNs that are valid pedigrees. The method is not restricted to
small pedigrees: we have tested it with simulated data on a real human pedigree structure of
over 1600 individuals. On smaller problems, its solving times also compare well with those of
other approaches.
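The factorisation described above can be illustrated with a small sketch. Assume founder genotypes are in Hardy-Weinberg equilibrium, segregation is Mendelian and loci are unlinked. The log-likelihood of a candidate pedigree is then a sum of local scores, one for each individual given its parent set (or given that it is a founder). An ILP of the kind used for Bayesian network learning could then give each (individual, candidate parent set) pair a binary variable. It would require exactly one such variable per individual to be chosen, rule out cycles, and maximise the summed scores. The Python below is a minimal sketch under stated assumptions, not the authors' implementation. The genotypes, allele frequencies and sexes are hypothetical toy data. Every non-founder is assumed to have both parents among the typed individuals. The helper names (`local_score`, `best_pedigree` and so on) are invented for illustration. Exhaustive enumeration replaces the ILP solver, which is only feasible for a handful of individuals.

```python
import itertools
import math

# Hypothetical toy data (not from the study): allele frequencies at two unlinked
# loci, known sexes ("F"/"M"), and unordered genotypes for three typed individuals.
FREQS = {
    "L1": {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25},
    "L2": {"e": 0.5, "f": 0.5},
}
SEX = {"Ann": "F", "Bob": "M", "Cal": "M"}
GENO = {
    "Ann": {"L1": ("a", "b"), "L2": ("e", "e")},
    "Bob": {"L1": ("c", "d"), "L2": ("f", "f")},
    "Cal": {"L1": ("a", "c"), "L2": ("e", "f")},
}


def founder_prob(g, freqs):
    """Hardy-Weinberg probability of the unordered founder genotype g = (x, y)."""
    x, y = g
    return freqs[x] ** 2 if x == y else 2 * freqs[x] * freqs[y]


def transmission_prob(child, mother, father):
    """Mendelian probability of the child's unordered genotype given its parents'.

    Each parent passes on either of its two alleles with probability 1/2, so the
    four (maternal, paternal) combinations are equally likely.
    """
    target = tuple(sorted(child))
    hits = sum(tuple(sorted((m, f))) == target for m in mother for f in father)
    return hits / 4


def local_score(i, parents):
    """log P(genotypes of i | parent set); loci are unlinked, so logs add up."""
    total = 0.0
    for locus, freqs in FREQS.items():
        g = GENO[i][locus]
        if parents is None:
            p = founder_prob(g, freqs)
        else:
            mother, father = parents
            p = transmission_prob(g, GENO[mother][locus], GENO[father][locus])
        if p == 0.0:
            return -math.inf  # Mendelian incompatibility
        total += math.log(p)
    return total


def candidate_parent_sets(i):
    """None (founder) or a (mother, father) pair taken from the other typed individuals."""
    mothers = [j for j in SEX if j != i and SEX[j] == "F"]
    fathers = [j for j in SEX if j != i and SEX[j] == "M"]
    return [None] + list(itertools.product(mothers, fathers))


def is_acyclic(ped):
    """Kahn's algorithm: every individual can be ordered after its parents."""
    indeg = {i: 0 if ped[i] is None else 2 for i in ped}
    children = {i: [] for i in ped}
    for i, parents in ped.items():
        for p in parents or ():
            children[p].append(i)
    queue = [i for i, d in indeg.items() if d == 0]
    ordered = 0
    while queue:
        p = queue.pop()
        ordered += 1
        for c in children[p]:
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    return ordered == len(ped)


def best_pedigree():
    """Exhaustive stand-in for an ILP solver: choose one parent set per individual
    so that the summed local scores are maximal and the pedigree has no cycles."""
    people = list(SEX)
    scores = {i: {ps: local_score(i, ps) for ps in candidate_parent_sets(i)}
              for i in people}
    best, best_ll = None, -math.inf
    for choice in itertools.product(*(scores[i] for i in people)):
        ped = dict(zip(people, choice))
        ll = sum(scores[i][ped[i]] for i in people)
        if ll > best_ll and is_acyclic(ped):
            best, best_ll = ped, ll
    return best, best_ll
```

On this toy trio, `best_pedigree()` returns `{'Ann': None, 'Bob': None, 'Cal': ('Ann', 'Bob')}` with log-likelihood -12 log 2. The alternative in which `Bob` is the child of `Ann` and `Cal` scores minus infinity, because those parents cannot transmit `Bob`'s `L1` genotype.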
We note that a maximum likelihood pedigree is not necessarily the true pedigree and is not
necessarily unique. The true pedigree typically has lower likelihood due to the probabilistic
nature of genetic inheritance. Because we can actually identify a maximum likelihood
pedigree, we can also obtain any number of pedigrees in decreasing order of likelihood by
adding an extra constraint at each step in the solving process. This permits a model
averaging approach to addressing the uncertainty about how closely a reconstruction
resembles the true pedigree. It also enables a proper investigation into the properties of
maximum likelihood pedigree estimates, which has not been possible until now. To obtain
accurate reconstructions, especially on large problems, we need a constrained maximum
likelihood approach in which the search is guided by appropriate prior information, such as
age or an age ranking, sex, or specific relationships, which are often available for some
members of the pedigree. The efficiency of our approach on such large problems
bodes well for extensions beyond the standard setting above where some pedigree
members may be latent, genotypes may be measured with error and markers may be linked.
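The k-best and prior-constrained searches described above can be sketched as follows. Each time an optimal pedigree is found, a constraint forbidding exactly that choice of parent sets is added and the problem is re-solved, which yields pedigrees in decreasing order of likelihood. With 0/1 family variables x(i, S) and the chosen sets S_i*, such a constraint could take the form "sum over i of x(i, S_i*) <= n - 1". That form is an assumption about one possible encoding, not a statement of the authors' exact constraint. Prior information is applied by discarding candidate parent sets that contradict it. Here a hypothetical rule requires parents to be born before their children. This is a minimal Python sketch under stated assumptions. The local scores and birth years are invented illustrative numbers, not results. The function names are hypothetical, and exhaustive enumeration stands in for an ILP solver.

```python
import itertools
import math

# Hypothetical local scores log P(genotypes of i | parent set), None = founder.
# Illustrative numbers only (not results): they are chosen so that the unconstrained
# optimum wrongly makes A a child of C and D.
SCORES = {
    "A": {None: -4.0, ("C", "D"): -0.5},
    "B": {None: -4.0},
    "C": {None: -4.0, ("A", "B"): -2.5},
    "D": {None: -4.0, ("A", "B"): -2.6},
}
# Hypothetical prior information: years of birth.
BIRTH = {"A": 1940, "B": 1938, "C": 1965, "D": 1967}


def is_acyclic(ped):
    """Depth-first search: True if no individual is its own ancestor."""
    state = {}  # 1 = on the current path, 2 = fully explored

    def visit(i):
        if state.get(i) == 1:
            return False
        if state.get(i) == 2:
            return True
        state[i] = 1
        ok = all(visit(p) for p in (ped[i] or ()))
        state[i] = 2
        return ok

    return all(visit(i) for i in ped)


def age_ok(i, parents):
    """Prior constraint: both parents must be born before the child."""
    return parents is None or all(BIRTH[p] < BIRTH[i] for p in parents)


def solve(excluded, allowed=lambda i, ps: True):
    """Best acyclic pedigree not in `excluded` (exhaustive stand-in for an ILP solve)."""
    people = list(SCORES)
    options = [[ps for ps in SCORES[i] if allowed(i, ps)] for i in people]
    best, best_ll = None, -math.inf
    for choice in itertools.product(*options):
        if choice in excluded:
            continue
        ped = dict(zip(people, choice))
        ll = sum(SCORES[i][ped[i]] for i in people)
        if ll > best_ll and is_acyclic(ped):
            best, best_ll = ped, ll
    return best, best_ll


def k_best(k, allowed=lambda i, ps: True):
    """Up to k pedigrees in decreasing likelihood; each solution found is then forbidden."""
    excluded, found = set(), []
    for _ in range(k):
        ped, ll = solve(excluded, allowed)
        if ped is None:
            break
        found.append((ped, ll))
        excluded.add(tuple(ped.values()))  # the extra "no-good" constraint
    return found
```

With these toy scores, `k_best(3)` ranks first a pedigree in which `A` is a child of `C` and `D`. `k_best(3, age_ok)` removes that option and ranks first the pedigree in which `C` and `D` are children of `A` and `B`. This shows how prior information can steer the search away from a higher-likelihood but implausible reconstruction.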
International Biometric Conference, Florence, ITALY, 6 – 11 July 2014