1 Sequence Alignments For the 18S rRNA, the alignment of the Piroplasmorida initially followed the secondarystructure model of Gutell et al. (1994), obtained from the Comparative RNA Website (Cannone et al., 2002), where a model and an associated alignment of almost-complete rRNA sequences is available for the piroplasms. The alignment was refined using the model for the variable area V4 of Wuyts et al. (2000) (which did not have any structure indicated in most of the initial models). The alignment was then manually optimized for best fit to the consensus structure, taking particular note of compensatory base changes where there was length variation in stems. For regions with variable lengths, hypotheses of base pairing were evaluated using the Mfold program (Zucker, 2003), which folds RNA based on energy minimizations. All structure nomenclature (for stems and regions) follows Wuyts et al. (2000). The remaining almost-complete database sequences (defined as >1500 nt long) were then added to this initial alignment using the pairwise alignment feature of the MacClade v. 4.08 program (Maddison and Maddison, 2000), and then manually optimized for fit to the consensus structure. Most of the new sequences were obtained from the SILVA database (Pruesse et al., 2007), although direct searches of the main sequence databases were also undertaken. The alignment and consensus structure were modified, if necessary, to accommodate sequences that deviated strongly from those already in the alignment. Sequences FJ668368 and HM538196 were excluded from the alignment because the sequencing seemd to be very poor, and sequence HM538255 proved to be difficult to align, possibly due to sequencing errors. The final alignment consisted of 603 sequences and 2085 aligned positions. Cardiosporidium cionae (EU052685) was used as the outgroup, based on the tree shown by Morrison (2009). For the two protein-coding genes, the DNA sequences were translated to amino acids and then the amino acid sequences were aligned. For regions with variable lengths, hypotheses of alignment were evaluated using the Promals3D program (Pei et al., 2008), which aligns the sequences based on the secondary sructure of the protein.The final alignment consisted of 36 sequences and 1980 aligned positions for the hsp70 gene, and 17 sequences and 1323 positions for the beta-tubulin gene. In neither case was a suitable outgroup sequence available, and so the trees were rooted based on the root location of the 18S rRNA tree. 2 Phylogenetic Analyses For the rRNA, sequence locations in which positional homology was ambiguous across all taxa were identified as regions of ambiguous alignment (RAA) or regions of expansion and contraction (REC) (Gillespie, 2004). The REC notably included stems 10, E10-1, 12, E23-2, E23-7, 43, 46 and 49, all of which are widely recognized as variable regions in 18S rRNA. Indeed, it proved to be impossible to derive a consensus secondary-structure model for stems E23-1 to E23-7 (in the V4 region). These RAA and REC regions were excluded from the phylogenetic analyses. Excluded from all analyses were autapomorphies, as many of these appeared to be sequencing errors. All sequence locations were included in the analyses of the protein-coding genes. Phylogenetic analyses were performed using bayesian analyses via the MrBayes v3.2.1 program (Ronquist and Huelsenbeck 2003). Following Beiko et al. (2006), analyses consisted of multiple, relatively short MCMC runs each with a small number of chains. The Tracer v. 1.4 program (Rambaut and Drummond, 2007) was used to check for convergence of the pooled data from the multiple runs. The number of runs and the length of the burnin for each analysis were calculated so that the expected sample size (ESS) values for all of the model parameters were >100, and with the standard deviation of the split frequencies <0.05. The default priors were used in all analyses. The analysis model for the rRNA used two partitions, one for the unpaired rRNA positions with the GTR+G+I model, and one for the paired (stem) positions with the doublet+G+I model. All model parameters were unlinked across the partitions, except for the tree topology. This analysis consisted of 17 runs of 2 chains each, with 1,200,000 generations per run and a burnin of 200,000 generations. This yielded 170,000 trees for the consensus tree, with a final standard deviation of the split frequencies = 0.064. The chains sampled from two islands of trees, with somewhat different parameter values for the likelihood models. The composition of the clades with large posterior probabilities was the same for both islands, but with some differences in branch order both between and within those clades. The analyses for the protein-coding genes used the M3 codon model. For the hsp70 gene, the analysis consisted of 6 runs of 2 chains each, with 1,025,000 generations per run and a burnin of 125,000 generations. This yielded 54,000 trees for the consensus tree, with a final standard deviation of the split frequencies = 0.010. For the beta-tubulin gene, the analysis consisted of 8 3 runs of 2 chains each, with 2,000,000 generations per run and a burnin of 750,000 generations. This yielded 100,000 trees for the consensus tree, with a final standard deviation of the split frequencies = 0.024. The phylogenetic trees were drawn with either FigTree v. 1.3.1 (Rambaut, 2009) or Dendroscope v. 2.6.1 (Huson et al., 2007). References Beiko R.G., Keith J.M., Harlow T.J., Ragan M.A. (2006) Searching for convergence in phylogenetic Markov Chain Monte Carlo. Systematic Biology 55, 553–565. Cannone J.J., Subramanian S., Schnare M.N., Collett J.R., D’Souza L.M., Du Y., Feng B., Lin N., Madabusi L.V., Muller K.M., Pande N., Shang Z., Yu N., Gutell R.R. (2002) The Comparative RNA Web (CRW) Site: an online database of comparative sequence and structure information for ribosomal, intron, and other RNAs. BMC Bioinformatics 3, 2. Gillespie J.J. (2004) Characterizing regions of ambiguous alignment caused by the expansion and contraction of hairpin-stem loops in ribosomal RNA molecules. Molecular Phylogenetics and Evolution 33, 936–943. Gutell R.R, Larsen N, Woese C.R. (1994) Lessons from an evolving rRNA: 16S and 23S rRNA structures from a comparative perFor each spective. Microbiology Reviews 58, 10–26. Huson D.H., Richter D.C., Rausch C., Dezulian T., Franz M., Rupp R. (2007) Dendroscope: an interactive viewer for large phylogenetic trees. BMC Bioinformatics 8, 460. Maddison D.R., Maddison W.P. (2000) MacClade: Analysis of Phylogeny and Character Evolution. Sinauer Associates, Sunderland. Morrison D.A. 2009. Evolution of the Apicomplexa: where are we now? Trends in Parasitology 25, 375–382. Pei J., Kim, B.-H., Grishin N.V. (2008) PROMALS3D: a tool for multiple sequence and structure alignment. Nucleic Acids Research 36, 2295–2300. Pruesse E., Quast C., Knittel K., Fuchs B.M., Ludwig W., Peplies J., Glöckner F.O. (2007) SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Opens external link in new window Nucleic Acids Research 35, 7188–7196. 4 Rambaut A. (2009) FigTree v1.3.1. Institute of Evolutionary Biology, University of Edinburgh, Edinburgh. Rambaut A., Drummond A.J. (2007) Tracer: MCMC Trace Analysis Tool. Institute of Evolutionary Biology, University of Edinburgh, Edinburgh. Ronquist, F., Huelsenbeck, J.P. (2003) MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19, 1572–1574. Wuyts J., De Rijk P., Van de Peer Y., Pison G., Rousseeuw P., De Wachter R. (2000) Comparative analysis of more than 3000 sequences reveals the existence of two pseudoknots in area V4 of eukaryotic small subunit ribosomal RNA. Nucleic Acids Research 28, 4698–4708. Zuker M. (2003). Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Research 31, 3406–3415.