Species Tree Workshop
January 14, 2012

Practice with BEST

Please download MrBayes 3.2 for Windows, Macintosh, or UNIX from http://mrbayes.sourceforge.net/

Agenda
- The MrBayes with BEST (v 3.2) implementation (work in progress)
- Run the finch example (download finch.nex)
- Run a multiple-allele data set (yeast with 4 genes, 22 taxa, 6 species)
- …or try your own data

Previous Implementation: MrBayes with BEST
- Step 1: Use MrBayes to propose vectors of joint gene trees (unlinked and rooted with an outgroup).
- Step 2: Given those gene trees, propose a compatible species tree.
- Step 3: Implement the chain fully within MrBayes using the usual properties of the MCMC as specified by the user.
- Program found at www.stat.osu.edu/~dkp/BEST

New Implementation: MrBayes 3.2 integrated with BEST
- Assumes a molecular clock for gene trees as part of a full model that includes the coalescent for gene trees | species tree
- Program found at http://mrbayes.sourceforge.net/

Implementation: MrBayes 3.2
As always
- Wide variety of nucleotide, amino acid, and codon models
- Variety of proposal distribution options
- Parallel “hot” and “cold” chains to balance efficiency while covering large tree spaces
- Checkpointing to allow stops and restarts
New speed improvements
- BEST can use MPI on Mac and UNIX
- GPU (NVIDIA graphics card) support

Steps for Any Bayesian Run
- Read the data
- Set the model (data | gene tree)
- Set the prior (including gene | species)
- Set the MCMC rules
- Run the MCMC
- Check convergence
- Summarize results

Files created
- ckp (checkpoint file for restarting)
- tree5.run2.t (trees saved for locus 5 in run 2)
- tree5.parts (partitions seen for tree 5)
- tree5.trprobs (tree probabilities)
- tree5.con.tre (consensus tree)
- tree5.tstat (partition statistics)
- tree5.vstat (branch and node statistics)

Remember
- Use a separate folder for each analysis
- Remember the “taxset” and “speciespartition” statements in MrBayes with ≥ one taxon per species
- Remember to allow variable population sizes
- With n loci, the species tree shows up in the files labeled n+1

Remember to unlink
- Gene tree topologies and branch lengths, for sure:
  unlink topology=(all) brlens=(all);
- Parameters of the model, as appropriate:
  unlink statefreq=(all) revmat=(all);

Issues
- Requiring gene trees to follow a molecular clock is too restrictive
- Some outputs still need to be modified for species tree use

Species Tree Notation
[Figure: four-taxon species tree with tips A, B, C, D, labeled with branch lengths]
Topology, branch lengths, & population sizes:
(D:0.035,(C:0.03,(A:0.02,B:0.02):0.01#0.3):0.005#0.2)#0.25
θAB = 0.3, θABC = 0.2, θABCD = 0.25

Three lineages of grassfinches (Poephila)
- Long-tailed (acuticauda)
- Long-tailed (hecki)
- Black-throated (cincta)

30 gene trees from Australian finches
[Figure: gene trees for P. acuticauda, P. hecki, P. cincta]
Jennings & Edwards (2005) Evolution 59, 2033-2047.

Estimated species tree distribution using BEST
[Figure: estimated species trees, labeled 1.0, 0.94, 1.0, 0.03]
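
Reading the Species Tree Notation by hand

The notation on the Species Tree Notation slide can be unpacked with a short script. What follows is a minimal sketch in Python, not MrBayes's own parser. It assumes the commas between sibling subtrees are written out, that `:` introduces a branch length, and that `#` introduces the population size of the ancestral species at that node, as the slide's example suggests. The function names (`parse_species_tree`, `thetas`, `tip_depths`) are hypothetical and exist only for this illustration.

```python
import re

# Punctuation characters are single tokens; anything else is a name or a number.
TOKEN = re.compile(r"[(),:#]|[^(),:#]+")


def parse_species_tree(text):
    """Parse a Newick-like string whose nodes may carry ':length' and '#theta'."""
    tokens = TOKEN.findall("".join(text.split()).rstrip(";"))
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of tree string")
        tok = tokens[pos]
        pos += 1
        return tok

    def node():
        n = {"name": None, "children": [], "length": None, "theta": None}
        if peek() == "(":
            take()
            n["children"].append(node())
            while peek() == ",":
                take()
                n["children"].append(node())
            if take() != ")":
                raise ValueError("expected ')'")
        else:
            n["name"] = take()
        if peek() == ":":          # optional branch length
            take()
            n["length"] = float(take())
        if peek() == "#":          # optional population size (assumed meaning)
            take()
            n["theta"] = float(take())
        return n

    root = node()
    if pos != len(tokens):
        raise ValueError("unexpected trailing text")
    return root


def leaves(n):
    """Return the tip names below a node."""
    if not n["children"]:
        return [n["name"]]
    return [name for c in n["children"] for name in leaves(c)]


def thetas(n, out=None):
    """Map each annotated ancestral species (named by its tips) to its theta."""
    out = {} if out is None else out
    for c in n["children"]:
        thetas(c, out)
    if n["theta"] is not None:
        out["".join(sorted(leaves(n)))] = n["theta"]
    return out


def tip_depths(n, depth=0.0, out=None):
    """Sum branch lengths from the root to each tip."""
    out = {} if out is None else out
    if not n["children"]:
        out[n["name"]] = depth
    for c in n["children"]:
        tip_depths(c, depth + c["length"], out)
    return out


example = "(D:0.035,(C:0.03,(A:0.02,B:0.02):0.01#0.3):0.005#0.2)#0.25"
tree = parse_species_tree(example)
print(thetas(tree))      # {'AB': 0.3, 'ABC': 0.2, 'ABCD': 0.25}
print(tip_depths(tree))  # root-to-tip distances for A, B, C, D
```

Under these assumptions, the root-to-tip distances come out equal for all four tips (up to floating-point rounding), which is what a clock-like species tree should show.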