Computational inference beyond Kingman’s coalescent Abstract o

advertisement
Computational inference beyond Kingman’s coalescent
Jere Koskela, Paul Jenkins and Dario Spanò
October 25, 2013
Abstract
Full likelihood inference under Kingman’s coalescent is a computationally challenging problem.
Importance sampling (IS) has been a successful means of solving this problem, and in recent
decades much work has been done on designing good, and in some cases optimal IS algorithms.
A parallel development in mathematical population genetics has been the emergence of more
general Λ–coalescents and Ξ–coalescents, which provide better modelling fits to some data sets.
In this paper we aim to mitigate this cost by deriving an “approximately optimal” IS algorithm
by expressing the optimal proposal distribution in terms of certain intractable sampling distributions, and obtaining principled, tractable approximations to them. Results from Λ–coalescents
are also compared to existing methodology based on the Griffiths–Tavaré recursion. Empirical
tests suggest a noticeable improvement in accuracy, and an improvement in efficiency of up to
two orders of magnitude.
1
Introduction
Importance sampling has a well–established role in population genetic inference as a means
of approximating the likelihood of an observed sample. In this context the method was introduced by Griffiths and Tavaré who wrote down a recursion for quantities of interest under
Kingman’s coalescent [Kin82] and used a Markov chain approach to approximate its solution
[GT94a], [GT94b], [GT94c], [GT99]. Their approach was identified as an example of importance sampling by Felsenstein et al. [FKYB99] which led Stephens and Donnelly to express the
optimal proposal distribution in terms of a family of conditional sampling distributions (CSDs)
[SD00]. These distributions are inaccessible in general, but the authors introduced a principled
approximation which yielded marked improvements in efficiency and accuracy. In addition, IS
has been investigated and applied to genetic problems such as phasing, demographic inference
and imputation among others in [GM96], [FD01], [DIGLR05], [GJS08], [GT08], [HUW08] and
[JG11].
Approximating the CSDs for various generalisations of Kingman’s coalescent has received plenty
of attention, both as a means of deriving approximations to the optimal importance sampling
algorithm and due to the product of approximate conditionals (PAC) method introduced in
[LS03]. While we do not pursue the PAC approach further in this paper, the derivation of
an approximate CSD in Section 3 can be viewed as a first step towards a PAC method for
Λ–coalescents and Ξ–coalescents.
De Iorio and Griffiths derived an approximation to finite alleles CSDs based on the Fleming–
Viot generator ([DIG04a], [DIG04b]) and Paul and Song provided a genealogical interpretation
and included crossover recombination [PS10]. Further approximations based on Hidden Markov
1
CRiSM Paper No. 14-12, www.warwick.ac.uk/go/crism
Models have been obtained in [PSS11] and [SPS13], and applied in [SHS13], though we do not
consider these here.
Kingman’s coalescent only permits binary mergers of ancestral lineages. The Λ–coalescents,
which allow multiple lineages to coalesce in one event, were introduced by Pitman [Pit99] and
Sagitov [Sag99], where the rate of coalescence of any k out of n lineages is given by
Z 1
1
rk (1 − r)n−k 2 Λ(dr)
λn,k :=
r
0
for some finite measure Λ on [0, 1], which can be taken to be a probability measure without loss
of generality. Popular choices of Λ include Λ = δ0 , which corresponds to Kingman’s coalescent,
ψ2
2
Λ = δ1 leading to star–shaped genealogies, Λ = 2+ψ
2 δ0 + 2+ψ 2 δψ where ψ ∈ (0, 1] [EW06] and
Λ = Beta(2 − α, α) where α ∈ (1, 2) [Sch03]. See [BB09] for a review.
Investigations by Boom et al. [BBB94], Árnason [Á04], Eldon and Wakeley [EW06], and Birkner
and Blath [BB08] have concluded that the more general Λ–coalescents often provide better
descriptions of some populations than Kingman’s coalescent, particularly among marine species.
Thus similar strategies of inference have been developed. An analogue of the Griffiths–Tavaré
recursion (see equation (1) in Section 2) for Λ–coalescents was derived in [BB08]. In a subsequent
paper [BBS11] the authors characterised the optimal proposal distribution in terms of certain
Green’s functions relating to the time–reversal of the coalescent and obtained an approximately
optimal algorithm for the infinite sites model of mutation. [SBB13] contains a detailed discussion
of inference under Beta–coalescents and their applicability to marine populations.
The Λ–coalescent family allows for multiple mergers, but only permits one merger at a time.
They are generalised further by the Ξ–coalescents, which permit any number of simultaneous,
multiple mergers. They were introduced my Möhle and Sagitov [MS01] and Schweinsberg [Sch00]
and can be expressed in terms of a finite measure Ξ (again, a probability measure without loss
of generality) on the infinite simplex
(
)
∞
X
∆ = r = (r1 , r2 , . . .) ∈ [0, 1]N :
ri ≤ 1
i=1
We denote
P by λn;k1 ,...,kp ;s a coalescence event involving p ≥ 1 mergers with sizes k1 , . . . , kp and
s = n − pi=1 ki lineages not participating in any merger, where n is the total number of lineages
before the mergers. The rate of such mergers is given as
!s−l ∞ !−1
Z X
s X
∞
X
X
X
s
kp
k1
λn;k1 ,...,kp ;s =
...
ri1 . . . xip xip+1 . . . xip+l 1 −
ri
ri2
Ξ(dr)
l
∆
l=0
i1 ∈N
ip+l ∈N
i=1
i=1
Such Ξ–coalescents have also been used to model genealogies of marine organisms [SW08],
although the question of which measures Ξ are biologically relevant remains open. Note that if
Ξ assigns full mass to the set {r : r2 =r 3 = . . . = 0} the resulting process is a Λ–coalescent.
In this paper we characterise the optimal IS proposal distribution for a finite sites, finite alleles
Λ– and Ξ–coalescents in terms of respective families of conditional sampling distributions, and
derive principled approximations to these distributions in both cases. The rest of the paper
is laid out as follows. In Section 2 we give a heuristic description of Λ–coalescents and derive
its optimal proposal distribution. In Section 3 we give principled derivations for approximate
Λ–coalescent CSDs, based on the finite alleles Λ–Fleming–Viot generator and on genealogical
considerations. Section 4 presents a simulation study comparing the obtained algorithms to the
generalised Griffiths–Tavaré recursion of [BB08]. In Section 5 we generalise the IS algorithm
further for the Ξ–coalescent family, and derive the analogous approximate CSDs. Section 6
concludes with a discussion.
2
CRiSM Paper No. 14-12, www.warwick.ac.uk/go/crism
2
The Λ–coalescent and its optimal proposal distributions
In the notation of [PS10], let L = {1, . . . , b} be a set of loci, El be a finite set of possible alleles
at locus l ∈ L, θl be the mutation rate at locus l ∈ L and P (l) be aP
family of stochastic matrices
giving transition probabilities of mutations at locus l ∈ L. Let θ = l∈L θl be the total mutation
rate, H = E1 × . . . Eb be the set of haplotypes and Λ be the probability measure specifying an
arbitrary Λ–coalescent. Let
(
)
X
∆H = x = (x1 , . . . , x|H| ) ∈ [0, 1]|H| :
xh = 1
h∈H
be the space ofPprobability vectors of allele frequencies. Denote a sample by n = (nh )h∈H ∈ N|H|
and let n := h∈H nh . Let eh be the canonical unit vector with a 1 in position h and zeros
elsewhere. For l ∈ L and h ∈ H let h[l] denote the allele at locus l of haplotype h. Finally, for
a ∈ El let Sla (h) be the haplotype obtained from h by overwriting locus l by allele a.
The dynamics of a finite sample of individuals from the stationary Λ–Fleming–Viot process
under the finite alleles model of mutation can be described as follows. For a rigorous account
see [DK99].
Consider an typed sample of n lineages and associate to each lineage a unique level from
{1, . . . , n}. Let Π be a Poisson process on R+ × [0, 1] with rate dt ⊗ r−2 Λ(dr). At each (t, r) ∈ Π
every lineage flips a coin with success probability r, and all successful lineages “look down” and
copy the type of the participating lineage with the lowest level. Independently each lineage
mutates at locus l with rate θl and jumps drawn from P (l) . This particle system embeds the
Λ–coalescent, with coalescence events traced along the look–down–and–copy jumps.
Let (Hi )−T
i=0 denote the sequence of states of the ancestral sample after i events, whether they
be mutations or coalescences, so that H0 is the initial sample, H−T is the most recent common
ancestor (MRCA) and the other Hi ’s correspond to intermediate states along the ancestral tree.
Note that the likelihood P(H0 ) can be written
Z
P(H0 ) =
P(H0 |A)P(dA)
A
where A is the space of possible ancestries. These ancestries can be decomposed into a sequence
of updates as above to give
Z
P(H0 ) =
−T
Y
Z
...
P(H0 |A)
P(dHi+1 |Hi )P(dH−T )
(1)
i=−1
where P(H−T ) is the invariant distribution of the mutation operator and
(
(l)
θl
(nSla (h) + 1 − δah[l] )Pah[l] if Hi+1 = Hi − eSla (h) + eh
nn
P(Hi+1 |Hi ) = nθ−q
λn,k nh −k+1
n
if Hi+1 = Hi + (k − 1)eh
k nθ−qnn n−k+1
(2)
P
n
where −qnn = n−1
j=1 n−j+1 λn,n−j+1 is the total rate of coalescence of n untyped lineages (see
[BB08] for a detailed derivation).
As with Kingman’s coalescent (1) can be approximated by sampling N independent ancestors
from the stationary distribution of the mutation mechanism, generating an ancestral tree from
each ancestor and setting P(H0 |A) = 1 if the leaves of the tree are compatible with the data
and 0 otherwise. However, obtaining a nonzero estimator with reasonable probability requires a
prohibitively large number of simulations as likelihoods of O(10−10 ) are typical among modest
3
CRiSM Paper No. 14-12, www.warwick.ac.uk/go/crism
practical samples. A better approach is to start with the data, propose mutations and coalescences backwards in time until the MRCA is reached, and thus ensure every simulated tree is
compatible with the observation. This yields
" −T
#
Z
Z Y
−T
Y P(Hi+1 |Hi )
P(Hi+1 |Hi )
P(H0 ) = . . .
Q(dHi |Hi+1 )P(H−T ) = E
P(H−T )
(3)
Q(Hi |Hi+1 )
Q(Hi |Hi+1 )
i=−1
i=−1
where Q(·|Hi+1 ) is an arbitrary proposal distribution satisfying mild support conditions and the
expectation is with respect to ⊗−T
i=−1 Q(Hi |Hi+1 ). The expectation can be approximated using
the IS estimator
(j)
(j)
N −Tj
1 X Y P(Hi+1 |Hi )
(j)
pb(H0 ) =
P(H−Tj )
(j)
(j)
N
Q(H |H )
j=1 i=−1
i
i+1
(j)
i
where {Hi }−T
i=0 is an i.i.d. sample of sequentially constructed coalescent trees from the distributions Q(Hi |Hi+1 ).
Theorem 1. Let π(m|n) denote the sampling distribution of the next m individuals given the
types of the first n. Then the optimal proposal distributions Q∗ are given by

π(eS a (h) |Hi+1 −eh ) (l)

if Hi = Hi+1 − eh + eSla (h)
nh θl π(el h |Hi+1 −eh ) Pah[l]
∗
Q (Hi |Hi+1 ) ∝
−1
n
 h λn,k [π((k − 1)eh |Hi+1 − (k − 1)eh )]
if Hi = Hi+1 − (k − 1)eh
k
where the first term ranges over all possible mutations for all haplotypes present in the sample,
and the second over all present haplotypes and k ∈ {2, . . . nh }.
Proof. The argument giving the mutation term is identical to that in Theorem 1 of [SD00] and
is omitted.
For the coalescence term suppose the n lineages evolve according to the lookdown construction of
[DK99], and denote the types of n particles at time t by Dn (t) = (h1 , . . . , hn ). Define Υk as the
event that in the last δ units of time there was a coalescence of lineages n−k +1, n−k +2, . . . , n.
Then
P(Υk |Dn (t) = (h1 , . . . , hn−k , h, . . . , h)) =
X P(Υk ∩ Dn (t − δ) = (h1 , . . . , hn−k , h, g2 , . . . , gk ) ∩ Dn (t) = (h1 , . . . , hn−k , h, . . . , h))
g2 ,...,gk ∈H
=
X
g2 ,...,gk ∈H
=
P(Dn (t) = (h1 , . . . , hn−k , h, . . . , h))
P(Dn (t − δ) = (h1 , . . . , hn−k , h, g2 , . . . , gk ))δλn,k
+ o(δ)
P(Dn (t) = (h1 , . . . , hn−k , h, . . . , h))
δλn,k
+ o(δ)
π((k − 1)eh |Dn (t) − (k − 1)eh )
By exchangeability every
set of k lineages coalesces at this same rate, so the total rate is obtained
nh
by multiplying by k .
Remark 1. It is tempting to simplify the situation further by decomposing
π((k − 1)eh |n − (k − 1)eh ) =
k−2
Y
π(eh |n − (k − 1 + j)eh )
j=0
and thus requiring only univariate CSDs. In general a decomposition like this requires exchangeability, which the CSDs satisfy but typically approximations do not. However, in the
Λ–coalescent setting the argument being decomposed will always consist of multiples of only
4
CRiSM Paper No. 14-12, www.warwick.ac.uk/go/crism
one type of allele. Permuting lineages which feature only in the sample being conditioned on
does not affect the outcome even for non–exchangeable families of distributions, so in this particular context univariate CSDs are indeed sufficient. Note that this will not be the case for
Ξ–coalescents since simultaneous coalescence of lineages of multiple types is permitted.
3
Approximating the Λ–coalescent CSDs
An approximation to the CSDs for Kingman’s coalescent was derived in [DIG04a]
by noting
P
∂
that the Fleming–Viot generator can be written component–wise as L =
h∈H Lh ∂xh , then
b with respect to that
assuming that there exists a probability measure, and an expectation E
measure, such that the standard stationary condition E [Lf (X)] = 0 holds component–wise:
∂
b
E Lh
f (X) = 0 for every h ∈ H and f ∈ C 2 (∆)
∂xh
Q
Substituting the probability of an ordered sample q(n|x) = h∈H xnhh yields a recursion whose
solution is defined as the approximate CSD. The same argument can be applied to the Λ–
Fleming–Viot process to define approximate CSDs for the Λ–coalescent.
Theorem 2. Let π̂(m|n) denote the approximate Λ–coalescent CSD as defined above. It solves
the following recursion
#
"
n+m
X n + m
Λ({0})(n + m − 1)
1
m
λn+m,k π̂(m|n) =
+θ+
k
2
n+m
k=2
"
X
X X (l)
Λ({0})(nh + mh − 1)
mh
Pah[l] π̂(m − eh + eSla (h) |n)+
π̂(m − eh |n) +
θl
2
a∈El
h∈H
l∈L
( m +1 h
X nh + mh
1
λn+m,k π̂(m − (k − 1)eh |n)+
nh + mh
k
k=2
)#
nhX
+mh nh + m h
π̂(m − mh eh |n − (k − mh − 1)eh )
λn+m,k
(4)
k
π̂((k − mh − 1)eh |n − (k − mh − 1)eh )
k=mh +2
Proof. The generator of the Λ–Fleming–Viot jump–diffusion can be written as
Lf (x) =
X Λ({0})xh X
2
Z
h∈H
X
+
h0 ∈H
∂
XX X
∂2
(l)
f (x) +
θl
xSla (h) Pah[l] − δah[l]
f (x)
∂xh ∂xh0
∂xh
a∈El
h∈H l∈L
{f ((1 − r)x + reh ) − f (x)} r−2 Λ(dr) =:
xh
h∈H
(δhh0 − xh0 )
(0,1]
X
Lh f (x)
h∈H
Substituting q(n|x) yields the following three terms on the R.H.S.
"
#
X Λ({0})nh
X
(nh − 1)q(n + eh |x) −
(nh0 − δhh0 )q(n|x) +
2
0
h∈H
h ∈H


X
X
X (l)
nh
θl 
Pah[l] q(n − eh + eSla (h) |x) − q(n|x) +
h∈H
Z
(0,1]
l∈L
(
(5)
(6)
(7)
a∈El
)
nh n XX
X
nh k
n
r (1 − r)n−k q(n − (k − 1)eh |x) −
rk (1 − r)n−k q(n|x) r−2 Λ(dr)
k
k
h∈H k=0
k=0
5
CRiSM Paper No. 14-12, www.warwick.ac.uk/go/crism
P
Now the P
k = 0 terms inside the integral cancel because h∈H xh = 1 and the k = 1 terms cancel
because h∈H nh = n, which means the third summand can be written
(n )
n h
X X
nh
nh X n
λn,k q(n − (k − 1)eh |x) −
λn,k q(n|x)
(8)
k
n
k
h∈H
k=2
k=2
Substituting (6), (7) and (8) into (5) and rearranging gives
"
#
n X Λ({0})(n − 1)
X Λ({0})(nh − 1)
1X n
λn,k q(n|x) =
+θ+
q(n − eh |x)
2
n
k
2
h∈H
h∈H
k=2
nh X X X (l)
X 1 X
nh
λn,k q(n − (k − 1)eh |x)
+
θl
Pah[l] q(n − eh + eSla (h) |x) +
nh
k
h∈H l∈L
a∈El
h∈H
(9)
k=2
The component–wise vanishing property is
"
#
X
X
b [Lh q(n|X)] = 0
b
mh E
E
mh Lh q(n|X) =
h∈H
h∈H
so that from (9) we get
"
"
#
n X
Λ({0})(n − 1)
Λ({0})(nh − 1) b
1X n
b [q(n|X)] =
m
mh
λn,k E
+θ+
E [q(n − eh |X)]
k
2
n
2
h∈H
k=2
#
nh i
X
X X (l) h
n
1
h
b [q(n − (k − 1)eh |X)]
b q(n − eh + eS a (h) |X) +
λn,k E
+
θl
Pah[l] E
l
nh
k
l∈L
a∈El
k=2
Note that π(m|n) = E [q(n + m|X)] /E [q(n|X)] so that substituting n 7→ n + m, assuming that
b and dividing by E [q(n|X)] gives the desired recursion.
E=E
Corollary 1. The univariate approximate CSDs π̂(eh |n) satisfy
#
"
n+1 nh
Λ({0})n
1 X n+1
λn+1,k π̂(eh |n) =
+θ+
(Λ({0}) + λn+1,2 )
2
n+1
k
2
k=2
nX
h +1 X X (l)
λn+1,k
1
nh + 1
+
θl
Pah[l] π̂(eSla (h) |n) +
nh + 1
k
π̂((k − 2)eh |n − (k − 2)eh )
l∈L
a∈El
(10)
k=3
Proof. The result follows by substituting m = eh into (4).
As per Remark 1 it is sufficient to work with the simpler recursion (10) as opposed to the full
recursion (4). However, because of the denominator in the final term of (10) the resulting system of equations still contains as many unknowns as the recursion for the full likelihood. Hence
further approximations are needed to obtain a practical family of proposal distributions.
Definition 1. Setting Λ = δ0 in (10) results in the approximate CSDs derived in [SD00] for
Kingman’s coalescent. This approximation ignores the dynamics of the Λ–coalescent but results
in a valid IS proposal distribution, which is denoted by QSD .
In [PS10] Paul and Song introduced the trunk ancestry, which can be used to obtain an approximation which makes better use of the Λ–coalescent structure. We briefly recall the definition
of the trunk ancestry here before using it to define a second approximate CSD family.
6
CRiSM Paper No. 14-12, www.warwick.ac.uk/go/crism
Definition 2. The trunk ancestry A∗ (n) of a sample n is an improper ancestral forest (i.e. one
which does not reach a MRCA) which consists of the lineages that form n traced back in time
forever, undergoing no mutations or coalescences.
The first two terms on the R.H.S. of (10), corresponding to pairwise mergers and mutations,
can be interpreted genealogically as the rates with which the (n + 1)th lineage mutates and is
absorbed into A∗ (n) by a pairwise merger. The third term corresponds to a multiple merger
between the (n + 1)th lineage and two or more lineages in n of the same type. Because this
last term involves coalescence between lineages in n it does not have an interpretation in terms
of A∗ (n). However it can be forced into this framework by noting that the only relevant information is the time of absorption of the (n + 1)th lineage and the type of the lineage(s) in n
with which it merges. Motivated by the trunk ancestral interpretation we expect the following
recursion to be a good, tractable approximation to (10).
Definition 3. Let π̂ K (eh |n) be the distribution of the type of a lineage which, when traced
backwards in time, mutates with rate θl according to the transition matrix P (l) at each locus
l ∈ L and is absorbed into A∗ (n) with rate
n+1 Λ({0})n
1 X n+1
λn+1,k
+
k
2
n+1
k=2
choosing its parent uniformly upon absorption. The corresponding IS proposal distribution is
denoted by QK .
Proposition 1. π̂ K (eh |n) satisfies the equations
#
"
n+1 Λ({0})n
1 X n+1
+ θ+
λn+1,k π̂ K (eh |n) =
2
n+1
k
k=2
n+1
X n + 1
X X (l)
Λ({0})nh
nh
Pah[l] π̂ K (eSla (h) |n)
+
λn+1,k +
θl
2
n(n + 1)
k
k=2
l∈L
a∈El
and is the stationary distribution of the Markov chain on H with transition matrix
h
i
P
Pn+1 n+1
1
(l) + Λ({0})/2 +
θ
P
λ
N
l
n+1,k
l∈L
k=2
k
n(n+1)
P
P
Λ({0})n
n+1 n+1
1
+ n+1
l∈L θl +
k=2
k λn+1,k
2
(11)
where N is the |H| × |H| matrix with each row equal to (n1 , . . . , n|H| ).
Proof. The simultaneous equations follow by tracing the (n + 1)th lineage backwards in time
and decomposing based on the first event, and the transition matrix follows immediately from
the simultaneous equations.
Note that QK has a very similar form to QSD , and as a consequence of the linearity in N in
(11) the efficient Gaussian quadrature approximation of Appendix A in [SD00] can be applied
to both with minor modifications for QK .
4
Simulation study
In this section we present an empirical comparison between the IS algorithms defined by QSD
and QK and the generalisation of the Griffiths–Tavaré proposal distribution from [BB08] which
7
CRiSM Paper No. 14-12, www.warwick.ac.uk/go/crism
will be denoted by QGT . The samples have been generated using the efficient sampling algorithm
provided in Section 1.4.4 of [BB09].
Simulated chromosomes
consist of 15 loci with two possible alleles denoted {0, 1} and mutation
0
1
matrix P (l) =
at each locus. The coalescent is a Beta(2 − α, α)–coalescent. All
1 0
simulations have been run on a single core on a Toshiba laptop, and make use of a stopping
time resampling scheme [Jen12] with resampling checks made at hitting times of all sample
sizes reaching B = {n − 5, n − 10, . . . , 5}. This generic resampling regime has been chosen for
simplicity and without regard for any particular proposal distribution.
4.1
Experiment 1
The total mutation rate is θ = 0.1 spread evenly among all 15 loci. The coalescent is specified
as α = 1.5. The data consists of 100 sampled chromosomes, 95 of which share a single type,
four lineages a second type one mutation away from the main block, and a single lineage is of a
third type one different mutation removed from the main block.
We consider inferring both θ and α individually, assuming all other parameters are known and
θl = θ/L for every l ∈ L. Eight independent simulations of 30 000 particles each were run
on an evenly spaced grid of values for the mutation rate spanning the interval [0.025, 0.2] for
each proposal distribution. The same simulations were then repeated on an evenly spaced grid
spanning α ∈ [1.1125, 1.9]. The resulting likelihood surfaces are shown in Figure 1. The data
used to plot Figure 1 is tabulated in the Appendix.
Figure 1: Simulated log–likelihood surfaces from 30 000 particles with ±2SE confidence envelopes. The left column is for θ and the right for α. The true surfaces are based on a 1 000
000 particle simulation using the QK proposal distribution.
8
CRiSM Paper No. 14-12, www.warwick.ac.uk/go/crism
The most striking observation is that both approximate CSDs yield algorithms which are two
orders of magnitude faster than the Griffiths–Tavaré scheme. Moreover, it is clear that the
α–surface obtained from QGT has not fully converged. The wide confidence envelope at the
left hand edge and the lack of monotonicity at the right hand edge of the QGT θ–surface are
indicative of poorer performance when inferring θ as well.
The runtimes of QSD and QK are very similar in both cases, and all four surfaces from these
proposals are good approximations of the truth. In the θ–case the accuracy of the two is very
similar, but in the α–case QK yields noticeably tighter confidence bounds and a smoother surface.
This is particularly true of low values of α, which correspond to Beta–coalescents that are very
different from Λ = δ0 .
Joint inference of α and θ is also of interest. Figure 2 shows a joint likelihood heat map for
the two parameters constructed from a grid of simulations of 30 000 particles from the QK
proposal. The surface is flat due to the limited amount of information in 100 samples, but the
maximum likelihood estimator is close to the true (1.5, 0.1) and the surface shows a high degree
of monotonicity.
Figure 2: Simulated likelihood surface from 30 000 particles using QK . The surface is interpolated from an 8 × 8 grid of independent simulations.
4.2
Experiment 2
We expect the performance of the QSD proposal to deteriorate the further the true model is from
Kingman’s coalescent, and the more demanding the data set. To that end the one dimensional
inference problems for θ and α were repeated for a sample of 150 lineages with true parameters
θ = 0.15 and α = 1.2. The data set consists of 144 lineages of a given type with four other types
present, each a single mutation removed from the main group. The sizes of these groups are 3,
1, 1, 1. The results are shown in Figure 3. The data used to plot Figure 3 is tabulated in the
Appendix.
9
CRiSM Paper No. 14-12, www.warwick.ac.uk/go/crism
Figure 3: Simulated log–likelihood surfaces from 30 000 particles with ±2SE confidence envelopes. The left column is for θ and the right for α. The downward spike in the lower confidence
boundary of the bottom left surface is an artifact caused by a negative value of the estimate,
and the real part of the logarithm has bee plotted. The values of the standard errors have no
such spike.
QK is noticeably faster when inferring θ, and slightly faster when inferring α. It also produces
substantially more accurate estimates than QSD for small values of θ. The differences in accuracy
for inference of α are not as clear, although it appears QSD is underestimating the likelihood at
the left hand edge of the surface. This deterioration of the performance of QSD is to be expected
because the true Beta(0.8, 1.2)–coalescent is a significant departure from the Λ = δ0 assumption
used to derive the corresponding approximate CSDs. Such coalescents are of particular interest
because significantly more efficient implementations exist for Kingman’s coalescent, and these
should be preferred whenever the Kingman hypothesis of Λ = δ0 cannot be rejected. Hence QK
is the recommended proposal distribution in practice.
4.3
Experiment 3: product of approximate conditionals
The IS algorithms used in the previous numerical experiments provide accurate results reasonably quickly, but the inference problems and data sets are of toy size and complexity. It is clear
that these algorithms are infeasibly slow for many problems of interest, such as genome–wide
data or large sample sizes.
10
CRiSM Paper No. 14-12, www.warwick.ac.uk/go/crism
5
Importance sampling for Ξ–coalescents
The important tools in deriving the optimal proposal distributions Q∗ (·|·) and the approximate
CSDs π̂ K (·|·) were, respectively, the lookdown construction of [DK99] and the trunk ancestry of
[PS10]. Both of these are also available for the Ξ–coalescent, and in this section we make use of
them to extend the IS algorithm to this family of models.
A lookdown construction for the Ξ–coalescent and the Ξ–Fleming–Viot process was derived by
Birkner et al. [BBM+ 09] and can be described as follows. For ease of notation we assume
Ξ({0}) = 0. If Ξ does have an atom at zero, its treatment is identical to the Λ–case.
P∞ 2 −1
Ξ(dr)⊗dr⊗N
Let ΠΞ be a Poisson point process on on R+ ×∆×[0, 1]N with rate dt⊗
i=1 ri
and associate to each lineage a level {1, . . . , n}. Define the function
n
o
(
P
P
min j ∈ N : ji=1 ri ≥ u
if u ≤ ∞
i=1 ri
g(r, u) :=
∞
otherwise
and at each (tj , (rj1 , rj2 , . . .), (uj1 , uj2 , . . .)) ∈ ΠΞ group the n particles such that all particles
l ∈ {1, . . . , n} with g(rj , ujl ) = k form a family for each k ∈ N. Among each family every
particle copies the type of the particle with the lowest level. In addition each particle follows an
independent mutation process as for the Λ–coalescent. For brevity we assume Ξ({0}) = 0 for
the rest of this section. If Ξ has an atom at 0 the following results hold with the addition of a
term that is analogous to the Λ({0})–term in previous sections.
This lookdown construction will be instrumental in establishing the following recursion, which is
an explicit version of (1) for Ξ–coalescents and a finite sites analogue of the sampling recursion
presented in [M0̈6] for the infinite alleles model.
Theorem 3. The likelihood of type frequencies n ∈ N|H| sampled from the stationary Ξ–
Fleming–Viot process solves
(
X X X
1
P(n) =
θl
nSla (h) + 1 − δah[l] P(n − eh + eSla (h) )+
(12)
gn + nθ
a∈El
h:nh >0 l∈L
(
)
n|H|
n1
X
X X
X
X
n
...
kh < n
...
1
|H| ×
|π11 |, |π21 |, . . . , |π|H| |
k
k
k1 =1
k|H| =1 π 1 ∈P
1
n1
|H|
π |H| ∈Pn|H|
h∈H
)
−1
| ∨h∈H π h |
λn;K(∨h∈H πh );S(∨h∈H πh ) P(k)
|π 1 |, . . . , |π |H| |
P
with the convention that 0k=1 f (k) = f (0). Here Pnkhh denotes the set of equivalence relations on
nh ∈ N elements with kh ≤ nh equivalence classes, π h = (π1h . . . πkhh ) denotes such an equivalence
P h
relation so that ki=1
|πih | = nh and ∨h∈H π h is the equivalence relation on n elements obtained
from appying each π h to the corresponding nh elements. The vector K(π) lists the sizes of all
equivalence classes with more than one member, S(π) is the number of classes with exactly one
member and gn is the total coalescence rate of n untyped lineages given by
gn =
n−1
X
a=1
n!
a!
X
b1 ,...,ba ∈N
b1 +...+ba =n
λn;K(b);S(b)
b1 ! × . . . × ba !
Proof. The proof is the same as in Section 1.4.1 of [BB09], adapted here from the Λ–coalescent to
the Ξ–coalescent. Let p denote the distribution of the types of the first n levels of the stationary
11
CRiSM Paper No. 14-12, www.warwick.ac.uk/go/crism
lookdown construction. Decomposing according to which event (whether mutation or a merger)
occurred first when tracing backwards in time yields
( n
X X X (l)
1
p(y1 , . . . , yn ) =
θl
Pah[yi ] p(y1 , . . . , yi−1 , Sla (yi ), yi+1 , . . . , yn )+
(13)
gn + nθ
i=1 l∈L
a∈El
)
X
λn;K(π);S(π) p(γπ (y1 , . . . , yn ))
π∈P (y)
where P (n) is the set of equivalence relations describing permissible mergers for the sample
y = (y1 , . . . , yn ) (that is, mergers where no equivalence class contains lineages of more than one
type) and γπ (y1 , . . . , yn ) is the vector of types which results in (y1 , . . . , yn ) if the look–down–
and–copy event denoted by the equivalence relation π takes place.
By exchangeability we are only interested in the vector of type frequencies n = (n1 , . . . , n|H| ).
For such a vector define the canonical representative as
κ(n) := (1, . . . , 1, 2, . . . , 2, . . . , |H|, . . . , |H|)
| {z } | {z }
|
{z
}
n1
and the likelihood as
0
p (n) :=
n2
n|H|
n
p(κ(n))
n1 , . . . , n|H|
Now we have the following identities
n
nh
p(κ(n − eh + eSla (h) )) = (nSla (h) + 1 − δah[l] )p0 (n − eh + eSla (h) )
n1 , . . . , n|H|
Y
−1
nh
n
n
k
p(κ(k)) =
p0 (k)
|H|
n1 , . . . , n|H|
π1h , . . . , πkhh
π11 , π21 , . . . , π|H|
k1 , . . . , k|H|
h∈H
which, when substituted into (13), yield the desired recursion.
As in Section 2 we can consider approximating the solution to (12) by importance sampling,
and the following theorem is a straightforward extension of Theorem 1.
Theorem 4. The optimal proposal distributions for recursion (12), denoted Q∗Ξ , are

π(eS a (h) |Hi+1 −eh ) (l)


if Hi = Hi+1 − eh + eSla (h)
nh θl π(el h |Hi+1 −eh ) Pah[l]



X
X
Y
λn;K(∨h∈H πh );S(∨h∈H πh )

nh
.
.
.
∗
QΞ (Hi |Hi+1 ) ∝
π(n − k|k)
π1h , . . . , πkhh

k
k

|H| ∈P |H| h∈H
π 1 ∈Pn11

π
n|H|


P

if H
i+1 = n and Hi = k for kh = {1, . . . , nh } and
h∈H kh < n
Proof. The argument is identical to the proof of Theorem 1 taking into account the larger class
of permitted simultaneous multiple mergers and hence different combinatorial coefficients.
As before the CSDs used in the statement of Theorem 4 are not available, but any approximation to them will yield an unbiased algorithm and better approximations can be expected to
correspond to more efficient algorithms. The generator of the Ξ–Fleming–Viot process is not
as immediately tractable as its Fleming–Viot and Λ–Fleming–Viot counterparts, so we abandon
the generator–based approach of De Iorio and Griffiths and derive approximate CSDs from the
trunk ancestry A(n).
12
CRiSM Paper No. 14-12, www.warwick.ac.uk/go/crism
Definition 4. Let π̂ΞK (eh |n) be the CSD obtained by letting the (n + 1)th lineage mutate with
rates {θl }l∈L , be absorbed into A(n) at rate
n
X
X 1{0} (S(π))
k
k=1 π∈Pn+1
n+1
1
π , . . . , πk
n
+ 1N (S(π))
λn+1;K(π);S(π)
S(π)
and choosing its parent uniformly upon absorption.
Proposition 2. The approximate CSDs π̂ΞK (·|n) solve the following recursion:
"
"
#
#
n
X
X
n+1
n
θ+
1{0} (S(π)) 1
+ 1N (S(π))
λn+1;K(π);S(π) π̂ΞK (eh |n) =
k
π
,
.
.
.
,
π
S(π)
k
k=1 π∈Pn+1
n
nh X X
n+1
n
1{0} (S(π)) 1
+ 1N (S(π))
λn+1;K(π);S(π) +
n
π , . . . , πk
S(π)
k
k=1 π∈Pn+1
X
l∈L
θl
(l)
X
Pah[l] π̂ΞK (eSla (h) |n)
a∈El
and is the stationary distribution of the Markov Chain on H with transition probability matrix
nP
h
o
i
P
P
n
n+1
n
(l) +
θ
P
1
(S(π))
+
1
(S(π))
λ
k
N
{0}
n+1;K(π);S(π) N
l∈L l
k=1
π∈Pn+1
π 1 ,...,π k
S(π)
h
i
P
P
n
1{0} (S(π)) π1n+1
λn+1;K(π);S(π)
θ + nk=1 π∈P k
,...,π k + 1N (S(π)) S(π)
n+1
Proof. The proof is identical to Proposition 1 and follows by considering the first event backwards
in time encountered by the lineage.
Note that because simultaneous multiple collisions can take place it the decomposition in Remark 1 is no longer valid, and multivariate approximate CSDs π̂ΞK (m|n) must also be specified.
The most natural way to restore exchangeability is to average over all possible permutations of
the lineages in m, but this is computationally infeasible for all but very small samples m. A
practical approximation, used widely in PAC algorithms, is to average over some small number
of random permutations instead.
Remark 2. Note that computing Q∗Ξ requires computing all equivalence classes on n objects:
an extremely expensive task. Moreover, it does not seem likely that this expenditure can be
avoided while retaining a close approximation to the optimal proposal distributions, at least
without strong assumptions on Ξ, given the space of possible ancestral trees allowed by the
Ξ–coalescent. This approach has been presented for theoretical interest and completeness, as
we expect its practical use will have severe limitations in practice.
6
Discussion
The greater flexibility awarded by Λ–coalescents comes with considerable computational overhead in comparison to the more restrictive Kingman’s coalescent, and the situation for Ξ–
coalescents is even worse. The inference problems considered in this paper have consisted of
small samples of chromosomes comprised of a small number of loci, each with only two alleles and a simple mutation model. Yet the simulations used to obtain likelihood surfaces have
taken upwards of several hours of computer time. While some cost is certainly unavoidable, the
computations carried out in this paper could be sped up considerably by reducing the number
13
CRiSM Paper No. 14-12, www.warwick.ac.uk/go/crism
of independent simulations through making use of driving values [GT94c] or bridge sampling
[MW96]. It is also noteworthy that, as with IS algorithms in general, all of the algorithms used
here can be parallelised very effectively.
The results of [SD00] have been extended beyond Kingman’s coalescent to include additional
features, such as crossover recombination and a subdivided population structure. The method
by which such extensions are derived is always the same: computing posterior transition rates
in terms of CSDs, and approximating the CSDs either by using the component–wise vanishing
assumption on an appropriate generator for allele frequencies, or by specifying a simpler genealogical model such as the trunk ancestry. Because Λ–coalescents (resp. Ξ–coalescents) and
Λ–Fleming–Viot generators (resp. Ξ–Fleming–Viot generators) differ from their Kingman and
Fleming–Viot counterparts only through an additive multiple merger term (resp. simultaneous
multiple merger term), deriving the same extensions for either class of coalescents poses no additional difficulty. In practice the computational cost of including recombination or a subdivided
population structure will be considerable, and it seems inevitable that IS algorithms will be a
feasible approach only for small sample sizes when such features are included.
The limits of data sets which can be feasibly analysed using IS are restrictive even in the Kingman setting, so alternate methods have been developed to tackle a broader range of problems.
Among these is the product of approximate conditionals (PAC) method of [LS03], which uses
the approximate CSDs directly by decomposing the likelihood into a product of conditional
sampling distributions. The same approach could be investigated for Λ–coalescents as well as
Ξ–coalescents, and can be expected to yield much faster algorithms than the IS algorithms
considered in this paper. The approximate CSDs considered here were designed for speed of
evaluation as opposed to numerical accuracy, and hence can be thought of as a first step in this
direction, but more sophisticated approximations are likely to be necessary for a practical PAC
method.
References
[Á04]
E. Árnason. Mitochondrial cytochrome b DNA variation in the high–fecundity
atlantic cod: trans–Atlantic clines and shallow gene genealogy. Genetics, 166:1871–
1885, 2004.
[BB08]
M. Birkner and J. Blath. Computing likelihoods for coalescents with multiple collisions in the infinitely many sites model. J. Math. Biol., 57(3):435–463, 2008.
[BB09]
M. Birkner and J. Blath. Measure–valued diffusions, general coalescents and population genetic inference. in J. Blath, P. Mörters, M. Scheutzow (Eds.), Trends in
Stochastic Analysis, LMS 351:329–363, 2009.
[BBB94]
J.D.G. Boom, E.G. Boulding, and A.T. Beckenback. Mitochondrial DNA variation
in introduced populations of Pacific oyster, Crassostrea gigas, in British Columbia.
Can. J. Fish. Aquat. Sci., 51:1608–1614, 1994.
[BBM+ 09] M. Birkner, J. Blath, M. Möhle, M. Steinrücken, and J. Tams. A modified lookdown
construction for the Xi–Fleming–Viot process with mutation and populations with
recurrent bottlenecks. Alea, 6:25–61, 2009.
[BBS11]
M. Birkner, J. Blath, and M. Steinrücken. Importance sampling for Lambda–
coalescents in the infinitely many sites model. Theor. Popln Biol., 79(4):155–173,
2011.
[DIG04a]
M. De Iorio and R.C. Griffiths. Importance sampling on coalescent histories I. Adv.
in Appl. Probab., 36(2):417–433, 2004.
14
CRiSM Paper No. 14-12, www.warwick.ac.uk/go/crism
[DIG04b]
M. De Iorio and R.C. Griffiths. Importance sampling on coalescent histories II:
Subdivided population models. Adv. in Appl. Probab., 36(2):434–454, 2004.
[DIGLR05] M. De Iorio, R.C. Griffiths, L. Leblois, and F. Rousset. Stepwise mutation likelihood
computation by sequential importance sampling in subdivided population models.
Theor. Popln Biol., 68:41–53, 2005.
[DK99]
P. Donnelly and T. Kurtz. Particle representations for measure–valued population
models. Ann. Probab., 27(1):166–205, 1999.
[EW06]
B. Eldon and J. Wakeley. Coalescent processes when the distribution of offspring
number among individuals is highly skewed. Genetics, 172:2621–2633, 2006.
[FD01]
P. Fearnhead and P. Donnelly. Estimating recombination rates from population
genetic data. Genetics, 159:1299–1318, 2001.
[FKYB99] J. Felsenstein, M.K. Kuhner, J. Yamamoto, and P. Beerli. Likelihoods on coalescents: a Monte Carlo sampling approach to inferring parameters from population
samples of molecular data. IMS Lect. Notes Monogr. Ser., 33:163–185, 1999.
[GJS08]
R.C. Griffiths, P.A. Jenkins, and Y.S. Song. Importance sampling and the two–locus
model with subdivided population structure. Adv. in Appl. Probab., 40:473–500,
2008.
[GM96]
R.C. Griffiths and P. Marjoram. Ancestral inference from samples of DNA sequences
with recombination. J. Comput. Biol., 3:479–502, 1996.
[GT94a]
R.C. Griffiths and S. Tavaré. Ancestral inference in population genetics. Statist.
Sci., 9:307–319, 1994.
[GT94b]
R.C. Griffiths and S. Tavaré. Sampling theory for neutral alleles in a varying environment. Phil. Trans. R. Soc. Lond. B, 344:403–410, 1994.
[GT94c]
R.C. Griffiths and S. Tavaré. Simulating probability distributions in the coalescent.
Theor. Popln Biol., 46:131–159, 1994.
[GT99]
R.C. Griffiths and S. Tavaré. The ages of mutations in gene trees. Ann. Appl.
Probab., 9:567–590, 1999.
[GT08]
D. Gorur and Y.W. Teh. An efficient sequential Monte Carlo algorithm for coalescent
clustering. NIPS, 2008.
[HUW08]
A. Hobolth, M. Uyenoyama, and C. Wiuf. Importance sampling for the infinite sites
model. Stat. Appl. Genet. Mol., 7:Article 32, 2008.
[Jen12]
P.A. Jenkins. Stopping–time resampling and population genetic inference under
coalescent models. Stat. Appl. Genet. Mol. Biol., 11(1):Article 9, 2012.
[JG11]
P.A. Jenkins and R.C. Griffiths. Inference from samples of DNA sequences using a
two–locus model. J. Comput. Biol., 18:109–127, 2011.
[Kin82]
J.F.C. Kingman. The coalescent. Stochast. Process. Appllic., 13(3):235–248, 1982.
[LS03]
N. Li and M. Stephens. Modeling linkage disequilibrium and identifying recombination hotspots using single–nucleotide polymorphism data. Genetics, 165:2213–2233,
2003.
[M0̈6]
M. Möhle. On sampling distributions for coalescent processes with simultaneous
multiple collisions. Bernoulli, 12(1):35–53, 2006.
15
CRiSM Paper No. 14-12, www.warwick.ac.uk/go/crism
[MS01]
M. Möhle and S. Sagitov. A classification of coalescent processes for haploid exchangeable population models. Ann. Probab., 29(4):1547–1562, 2001.
[MW96]
X.L. Meng and W.H. Wong. Simulating rations of normalizing constants via a
simple identity: a theoretical exploration. Statist. Sinica, 6:831–860, 1996.
[Pit99]
J. Pitman. Coalescents with multiple collisions. Ann. Probab., 27(4):1870–1902,
1999.
[PS10]
J.S. Paul and Y.S. Song. A principled approach to deriving approximate conditional
sampling distributions in population genetic models with recombination. Genetics,
186:321–338, 2010.
[PSS11]
J.S. Paul, M. Steinrücken, and Y.S. Song. An accurate sequentially Markov conditional sampling distribution for the coalescent with recombination. Genetics,
187:1115–1128, 2011.
[Sag99]
S. Sagitov. The general coalescent with asynchronous mergers of ancestral lineages.
J. Appl. Probab., 36(4):1116–1125, 1999.
[SBB13]
M. Steinrücken, M. Birkner, and J. Blath. Analysis of DNA sequence variation
within marine species using Beta–coalescents. Theor. Popln Biol., 87:15–24, 2013.
[Sch00]
J. Schweinsberg. Coalescents with simultaneous multiple collisions. Electron. J.
Probab., 5:1–50, 2000.
[Sch03]
J. Schweinsberg. Coalescent processes obtained from super–critical Galton–Watson
processes. Stoch. Proc. Appl., 106:107–139, 2003.
[SD00]
M. Stephens and P. Donnelly. Inference in molecular population genetics. J. R.
Statist. Soc. B, 62(4):605–655, 2000.
[SHS13]
S. Sheehan, K. Harris, and Y.S. Song. Estimating variable effective population sizes
from multiple genomes: A sequentially markov conditional sampling distribution
approach. Genetics, 194:647–662, 2013.
[SPS13]
M. Steinrücken, J.S. Paul, and Y.S. Song. A sequentially markov conditional sampling distribution for structured populations with migration and recombination.
Theor. Popln Biol., 87:51–61, 2013.
[SW08]
O. Sargsyan and J. Wakeley. A coalescent process with simultaneous multiple mergers for approximating the gene genealogies of many marine organisms. Theor. Popln
Biol., 7:104–114, 2008.
16
CRiSM Paper No. 14-12, www.warwick.ac.uk/go/crism
Appendix
θ
QSD
SE
QK
SE
QGT
SE
Truth
SE
0.025
-9.51030
-10.5485
-9.57311
-10.9026
-9.47431
-10.3942
-9.52471
-12.4814
0.050
-9.13432
-10.4780
-9.15431
-10.6007
-9.21798
-9.78606
-9.09862
-10.8099
0.075
-8.90727
-9.82448
-9.0231
-10.1722
-8.92166
-10.2134
-8.93651
-10.7472
0.100
-8.88418
-10.0025
-8.85949
-10.0641
-8.91979
-10.0201
-8.87115
-11.8601
0.125
-8.80563
-9.98773
-8.84042
-10.1718
-8.82118
-10.0026
-8.85144
-10.8165
0.150
-8.86454
-10.1441
-8.88371
-10.1162
-8.87626
-10.0002
-8.86687
-10.9308
0.175
-8.92226
-10.3226
-8.93492
-10.4286
-8.9615
-10.1007
-8.91047
-10.9891
0.200
-8.95339
-10.4324
-8.97268
-10.2887
-8.90603
-10.0989
-8.94913
-11.1207
1.7875
-9.12203
-10.2608
-9.08472
-10.1954
-9.10205
-10.4741
-9.08969
-10.8996
1.900
-9.15706
-10.1753
-9.20387
-10.3699
-8.98686
-10.4735
-9.18711
-10.9756
Table 1: Data for the left hand column of Figure 1
α
QSD
SE
QK
SE
QGT
SE
Truth
SE
1.1125
-8.96808
-10.3372
-8.92542
-10.9427
-8.99351
-9.85581
-8.91967
-11.3147
1.2250
-8.84718
-9.97697
-8.86082
-10.9738
-8.89304
-9.86253
-8.87000
-11.0027
1.3375
-8.85592
-9.88253
-8.86241
-9.97233
-8.77234
-9.89071
-8.84048
-11.4185
1.4500
-8.80791
-9.89227
-8.84065
-10.0657
-8.90723
-10.0021
-8.85894
-10.8742
1.5625
-8.91378
-10.5350
-8.91129
-10.4949
-8.94775
-10.0471
-8.88302
-10.7238
1.6750
-8.92394
-9.92019
-9.00989
-10.1023
-8.96374
-10.3416
-8.96852
-10.7883
Table 2: Data for the right hand column of Figure 1
θ
QSD
SE
QK
SE
0.025
-12.4427
-12.9010
-12.2455
-13.0610
0.050
-10.2131
-10.4056
-11.0794
-11.5668
0.075
-9.82853
-10.2567
-10.7040
-10.9855
0.100
-10.3858
-11.3132
-10.4480
-10.9327
0.125
-10.5939
-11.5275
-10.2955
-10.8947
0.150
-10.5019
-11.1319
-10.506
-11.2048
0.175
-10.5287
-11.3065
-10.5081
-11.2250
0.200
-10.5280
-11.3001
-10.5652
-11.8668
1.7875
-11.7492
-12.7786
-12.1056
-13.2439
1.900
-12.4704
-13.3435
-12.4479
-13.6195
Table 3: Data for the left hand column of Figure 3
α
QSD
SE
QK
SE
1.1125
-10.8474
-11.7316
-10.4259
-11.9209
1.2250
-10.5705
-11.2151
-10.4521
-11.7356
1.3375
-10.7223
-11.7195
-10.4787
-11.1095
1.4500
-10.7911
-11.7504
-10.5692
-11.4067
1.5625
-10.8631
-11.5423
-11.1936
-12.2200
1.6750
-11.3215
-12.2433
-11.6482
-12.2696
Table 4: Data for the right hand column of Figure 3
17
CRiSM Paper No. 14-12, www.warwick.ac.uk/go/crism
Download