Supporting Materials to accompany

Modelling Heterotachy in Phylogenetic Inference by Reversible-Jump Markov chain Monte Carlo

Mark Pagel, School of Biological Sciences, University of Reading, and Andrew Meade, Institute of Biological Sciences, University of Aberystwyth

Calculating Proposal Ratios and Jacobian terms for the reversible-jump Markov chain Monte Carlo method (MCMC, see text of article)

For most applications of MCMC the dimensionality of the current and proposed models is the same and thus the Jacobian can be ignored, as it takes the value of 1. An unusual feature of the approach we describe in the associated article, however, is that we wish to explore models with differing numbers of parameters, corresponding to some branches of the tree topology being assigned more than one length. The reversible-jump MCMC (RJ-MCMC) algorithm can be used to construct chains that jump among models of differing dimensionality. These require carefully constructed proposal mechanisms and calculation of the Jacobian term. We describe these below.

Splitting and Merging. To implement the reversible-jump procedure for determining branches to add to or subtract from the current model in the chain, the ‘normal’ moves of the Markov chain (those of exploring the parameters of the model of sequence evolution, Q, and the possible different phylogenetic trees) can be ignored, as these moves do not change the dimension of the parameter space. Instead we wish to define the proposal ratios and Jacobian terms for moves that add a new branch length to the tree by splitting an existing branch into two, or cause two distinct lengths to be merged into one. Call these moves “split” and “merge” respectively.

Both moves begin by randomly selecting one of the 2s-3 edges of the tree. The model will always have the same number of complete branch-length sets for every edge.
Call this number k: if k=1 the model is just a conventional non-mixture model, but for k>1 the model sums the likelihood at each site over the k branch-length sets as in eq. 3. We will assume in what follows that this number has been set in advance. Thus, if k=3 every edge is represented by three lengths. In our RJ model, these three lengths can be identical, in which case they represent a single ‘length class’ and account for just one parameter. Alternatively, there could be two length classes among the three lengths (two lengths being identical but differing from the third), in which case this edge accounts for two parameters in the model (or one parameter over and above the conventional non-mixture model). By comparison, the full branch-length-sets mixture model would always treat this circumstance as three parameters.

For a given k, if all k lengths associated with the edge are identical (there is a single length class), then the probability of attempting a split move on that edge is 1. This is because we can only increase the dimensionality of edges with one length class. The probability of attempting a merge move on the same edge is zero: we cannot reduce the dimensionality of an edge with one length class. For edges with at least two but fewer than k length classes, the probability of attempting a split move on that edge is 1/2, as is the probability of attempting a merge move. That is, we can attempt to split when there is at least one length class with two or more elements, and equally, so long as the number of elements in a single class is less than k, we could also attempt a merge move. If all of the k lengths associated with an edge are different, we can no longer attempt to split, so the probability of attempting a merge move on that edge is 1.

To conduct a split we first determine the number of different length classes that could be split, that is, those containing at least two elements.
If k=3, for example, it is possible to split so long as there are either one or two length classes. If there are three (each containing one element) we cannot split any of them, because to do so would increase the number of length classes beyond k=3. Call the number of length classes with more than one element m. Then the probability of selecting one of these m classes to split is 1/m. If the selected class has n (identical) elements, some number $n_i$ of them will be assigned to one of the new length classes to arise from the split and $n_j$ of them to the other, where $n_i + n_j = n$. There are

$$2^{\,n_i + n_j - 1} - 1$$

different ways of making these assignments. To accomplish the split we draw a uniform random number, u, and create two new lengths by calculating

$$t_i = t + u/n_i \quad \text{and} \quad t_j = t - u/n_j .$$

If u is drawn on the interval $-n_i t$ to $n_j t$ then both new lengths are positive, and their weighted sum $n_i t_i + n_j t_j$ equals $nt$, so the total length across the n elements is unchanged.

A merge move seeks to combine two different length classes into one. If there is only one length class associated with the edge it cannot be merged. If the edge has two or more length classes, two are chosen at random from among the l classes available, and there are $\binom{l}{2}$ ways of doing this. The two length classes chosen are merged, creating one fewer length class, with all of the elements in both classes being combined by a simple weighted average, the weights being $n_i$ and $n_j$; that is, $t = (n_i t_i + n_j t_j)/(n_i + n_j)$.

Following this algorithm the proposal ratio for a split move is:

$$\frac{P_m' \, \dfrac{1}{\binom{l'}{2}}}{P_s \, \dfrac{1}{m} \, \dfrac{1}{2^{\,n-1} - 1} \, p(u)}$$

where $P_m$ and $P_s$ refer to the probabilities of merging and splitting respectively, $p(u)$ is the probability of observing u, and the primes denote the proposed model. Similarly, the proposal ratio for a merge move is:

$$\frac{P_s' \, \dfrac{1}{m'} \, \dfrac{1}{2^{\,n-1} - 1} \, p(u)}{P_m \, \dfrac{1}{\binom{l}{2}}} .$$

The Jacobian is defined as the determinant of the square matrix of partial derivatives of the proposed parameters with respect to the current parameters and to the amount by which they are to be changed, u.
Here the Jacobian for a split performed on the selected length class, taken in absolute value, is written as:

$$J = \left| \begin{array}{cc} \dfrac{\partial \mathbf{t}_i}{\partial t} & \dfrac{\partial \mathbf{t}_i}{\partial u} \\[1.5ex] \dfrac{\partial \mathbf{t}_j}{\partial t} & \dfrac{\partial \mathbf{t}_j}{\partial u} \end{array} \right| = \frac{n_i + n_j}{n_i \, n_j},$$

where the bold elements in the Jacobian are square matrices of their respective partial derivatives with dimensions corresponding to the number of elements in $n_i$ and $n_j$. The Jacobian for the merge is the reciprocal of J.

For the analyses reported here we have implemented the RJ model to be seeded with k=2 identical branch-length sets, corresponding to two vectors of branch lengths in the above equations. We then apply the split and merge moves to the shared edges of these vectors. The lengths of the edges that never accept a split always remain identical across the two branch-length sets, although they can change as a pair in response to normal branch-length updates in the Markov chain. Edges that accept a split can adopt lengths that diverge between the two length classes. At some later iteration of the chain they might even be merged to re-form a single length.

We use just two branch-length sets here because previous testing (Meade and Pagel, 2008) had indicated that two were adequate for these data. In future implementations of the model we will investigate allowing the number of distinct sets to change dynamically according to ‘augment’ and ‘reduce’ moves such as we have described elsewhere (Pagel and Meade, 2006). These moves will propose to add or to remove an entire set of branch lengths in one iteration of the chain, being aided by an appropriate Jacobian term to account for the large change in dimensionality of the chain.
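
The split and merge moves described in these Supporting Materials can be illustrated with a short Python sketch. It is not the authors' implementation. All function names (`move_probabilities`, `split`, `merge`, `split_jacobian`, `numeric_jacobian`, `split_proposal_ratio`) are hypothetical. The sketch assumes that the split takes the form t_i = t + u/n_i and t_j = t - u/n_j, with u uniform on the interval from -n_i t to n_j t, so that p(u) is the uniform density 1/(n t) with n = n_i + n_j. It also assumes k is at least 2, and it treats only the scalar case of one length per class.

```python
import math
import random


def move_probabilities(n_classes, k):
    """Return (P_split, P_merge) for an edge whose k lengths form n_classes classes."""
    if k < 2 or not 1 <= n_classes <= k:
        raise ValueError("need k >= 2 and 1 <= n_classes <= k")
    if n_classes == 1:
        return 1.0, 0.0  # a single length class can only be split
    if n_classes == k:
        return 0.0, 1.0  # all k lengths differ: only a merge is possible
    return 0.5, 0.5


def split(t, n_i, n_j, rng):
    """Split a class of n_i + n_j identical lengths t; returns (t_i, t_j, u)."""
    # Interval taken from the text; interior values keep both new lengths positive.
    u = rng.uniform(-n_i * t, n_j * t)
    return t + u / n_i, t - u / n_j, u


def merge(t_i, n_i, t_j, n_j):
    """Merge two classes by the weighted average of their lengths."""
    return (n_i * t_i + n_j * t_j) / (n_i + n_j)


def split_jacobian(n_i, n_j):
    """Closed-form Jacobian of the split; the merge Jacobian is its reciprocal."""
    return (n_i + n_j) / (n_i * n_j)


def numeric_jacobian(t, u, n_i, n_j, h=1e-6):
    """Finite-difference |det| of d(t_i, t_j)/d(t, u), as a check on the closed form."""
    def f(t_, u_):
        return t_ + u_ / n_i, t_ - u_ / n_j
    ti0, tj0 = f(t, u)
    ti_t, tj_t = f(t + h, u)
    ti_u, tj_u = f(t, u + h)
    a, b = (ti_t - ti0) / h, (ti_u - ti0) / h
    c, d = (tj_t - tj0) / h, (tj_u - tj0) / h
    return abs(a * d - b * c)


def split_proposal_ratio(p_split, p_merge_new, m, n, l_new, p_u):
    """Probability of the reverse merge divided by that of the forward split."""
    reverse = p_merge_new / math.comb(l_new, 2)
    forward = p_split * (1.0 / m) * (1.0 / (2 ** (n - 1) - 1)) * p_u
    return reverse / forward


if __name__ == "__main__":
    # Hypothetical example: k = 3, one class of three identical lengths t = 0.1,
    # split into classes of size n_i = 1 and n_j = 2.
    k, t, n_i, n_j = 3, 0.1, 1, 2
    n = n_i + n_j
    rng = random.Random(1)
    p_split, _ = move_probabilities(1, k)      # current state: one class
    _, p_merge_new = move_probabilities(2, k)  # proposed state: two classes
    t_i, t_j, u = split(t, n_i, n_j, rng)
    ratio = split_proposal_ratio(p_split, p_merge_new, m=1, n=n, l_new=2,
                                 p_u=1.0 / (n * t))
    print("new lengths:", t_i, t_j)
    print("merged back:", merge(t_i, n_i, t_j, n_j))
    print("proposal ratio x Jacobian:", ratio * split_jacobian(n_i, n_j))
```

Under these assumptions the example gives a split proposal ratio of 0.45 and a Jacobian of 1.5, and merging the two new lengths recovers the original t, which is the reversibility that the split and merge definitions are designed to provide.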