Electronic Supplementary Material

Maintenance of essential amino acid synthesis pathways in Blattabacterium symbionts of a wood-feeding cockroach

Gaku Tokuda1*, Liam D. H. Elbourne2*, Yukihiro Kinjo1*, Seikoh Saitoh1, Zakee Sabree3, Masaru Hojo1, Akinori Yamada4, Yoshinobu Hayashi5, Shuji Shigenobu6, Claudio Bandi7, Ian T. Paulsen2, Hirofumi Watanabe8#, Nathan Lo5#

TBRC, University of the Ryukyus, Okinawa, Japan1, Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, Australia2, Department of Evolution, Ecology, and Organismal Biology, Ohio State University3, Department of Biological Sciences, Graduate School of Bioscience and Biotechnology, Tokyo Institute of Technology4, School of Biological Sciences, The University of Sydney, Sydney, Australia5, National Institute for Basic Biology, Aichi, Japan6, DIPAV, Università di Milano, Milano, Italy7, National Institute of Agrobiological Sciences, Tsukuba, Japan8

*equal first authors
#Authors for correspondence:
nathan.lo@sydney.edu.au
hinabe@affrc.go.jp

Supplementary Materials and Methods

Genome sequencing and annotation. P. angustipennis individuals were collected on Mt. Tsukuba, Ibaraki, Japan, in January 2010. Four adult females and final-instar larvae were used for DNA preparations as previously described, omitting the pulsed-field gel electrophoresis step (1). A total of 128,725 reads from two 454 GS Junior runs were assembled into 29 contigs (N50, 89 kb) with Newbler ver. 2.6 (2) using default settings. Of these, 19 were successfully mapped onto the genome of Blattabacterium sp. strain Bge using Projector 2 (3). To prepare the DNA library for Illumina sequencing, total DNA was extracted from the fat body dissected from a single female larva using Isoplant II (Nippon Gene, Tokyo), followed by further purification with columns from the DNeasy Plant Mini Kit (Qiagen, Hilden, Germany).
A total of 61.3 M reads were obtained from a 100 bp paired-end library using an Illumina HiSeq2000. Illumina reads that matched one of the 19 contigs from 454 sequencing (574,354 reads) were then assembled into a linear 632 kbp sequence using Newbler ver. 2.7. A single gap consisting of sequence repeats was closed by PCR and Sanger sequencing. Annotation was performed using the Microbial Genome Annotation Pipeline in DDBJ, and the results were manually compared with other strains of Blattabacterium. 16S rRNA was also detected by manual comparison with genomes of other Blattabacterium strains, and some genes were detected using BLASTN nr, Rfam (4), and BRUCE (5). The genome has been deposited in DDBJ/EMBL/GenBank under accession number AP012548.

Phylogenetic analysis. Protein sequences for EngA, RpsE, GidA, ValS, FusA, Tuf, GyrA, MutS, InfB, RpoC, RpoB, RplB, and RplK were obtained for 43 Bacteroidales/Flavobacteriales species and individually aligned using MAFFT (v.6.624b), employing the 'linsi' algorithm (6). Columns containing gapped characters were removed with a custom Perl script, and the alignments for each species were concatenated, resulting in a total of 7,104 characters per species. RAxML (7) was used to perform a maximum-likelihood phylogenetic reconstruction (JTT+I+GAMMA) with 100 bootstrap replicates via the CIPRES Science Gateway (8). A phylogenetic analysis was also performed for host taxa, based on 18S rDNA sequences retrieved from GenBank (Periplaneta americana: AF220572; Cryptocercus punctulatus: AF220571; Mastotermes darwiniensis: AF220568; Blattella germanica: AF220573; Blaberus giganteus: EF363234; Panesthia angustipennis spadica: AB036190). Outgroups for this analysis were the phasmid Agathemera crassa (Z97561) and the orthopteran Tettigonia viridissima (Z97587). Alignment was based on secondary structure, as previously described (9).
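The gap-column removal and concatenation step described above for the 13 protein alignments was done with a custom Perl script that is not reproduced here. The following Python sketch is only an illustration of that kind of procedure, not the authors' script. The function names, the gap characters, and the toy sequences are all assumptions made for the example.

```python
from typing import Dict, List

# Assumption: gaps are written as '-' (as in MAFFT output) or '.'.
GAP_CHARS = frozenset("-.")


def strip_gapped_columns(alignment: Dict[str, str]) -> Dict[str, str]:
    """Drop every alignment column in which at least one sequence has a gap."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    length = lengths.pop()
    names = list(alignment)
    # Keep only the column indices that are gap-free in every sequence.
    keep = [
        i for i in range(length)
        if all(alignment[name][i] not in GAP_CHARS for name in names)
    ]
    return {name: "".join(alignment[name][i] for i in keep) for name in names}


def concatenate(alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Join per-gene alignments end to end, one concatenated row per species."""
    species = set(alignments[0])
    for aln in alignments[1:]:
        if set(aln) != species:
            raise ValueError("every alignment must contain the same species")
    return {sp: "".join(aln[sp] for aln in alignments) for sp in sorted(species)}


# Hypothetical toy alignments for two genes and three species.
toy_gene_a = {"sp1": "MK-LV", "sp2": "MKALV", "sp3": "MRAL-"}
toy_gene_b = {"sp1": "GT", "sp2": "G-", "sp3": "GS"}

supermatrix = concatenate(
    [strip_gapped_columns(toy_gene_a), strip_gapped_columns(toy_gene_b)]
)
```

In this toy case, columns 3 and 5 of the first gene and column 2 of the second are dropped, so each species ends up with four concatenated characters. Removing every gapped column is a strict filter; it keeps the matrix free of missing data at the cost of discarding sites.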
Modelgenerator was used to select an appropriate model of sequence evolution based on the Bayesian information criterion. Phylogenetic analysis was performed using MrBayes 3.2.1 (10). In the Bayesian phylogenetic analysis, posterior distributions of parameters, including the tree topology and branch lengths, were estimated using Markov chain Monte Carlo (MCMC) sampling. Samples from the posterior distribution were drawn every 1000 MCMC steps over a total of 10,000,000 steps. The analysis was run using four chains, comprising one cold chain and three heated chains. The first 10% of samples were discarded as burn-in. Acceptable mixing and convergence to the stationary distribution were checked by inspection of samples from the posterior and by calculating the standard deviation of split frequencies among the tree topologies.

Supplementary Results and Discussion

The GC content of BPAA is 26.41%, and coding sequence density is 94%, with 575 protein-coding and 40 RNA-coding genes. The latter include 34 tRNAs for transfer of all amino acids, a tmRNA, a signal recognition particle RNA, RNase P, and an operon containing the 16S, 5S, and 23S rRNA genes. Figure S1 shows the inferred metabolic reconstruction of Blattabacterium strain BPAA based on genome data. Most genes essential for DNA replication as well as mRNA transcription and translation are present. Intracellular bacterial symbionts are dependent upon the major sigma factor RpoD for transcription, but they usually lack alternative sigma factors. BPAA, like other Blattabacterium strains, was found to encode the sigma factor RpoN, an enhancer-activated transcriptional regulator of genes involved in nitrogen assimilation. BPAA contains gene complements for the production of outer membrane and cell-wall components. It is inferred to generate carbon precursors and ATP using gluconeogenesis and aerobic biosynthesis, respectively.
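The summary statistics in the paragraph above (GC content, coding sequence density, and the RNA gene count) can be illustrated with a short Python sketch. This is not the authors' procedure. The function names and toy inputs are hypothetical, and the definition of coding density used here (the percentage of genome positions covered by at least one gene, on a linear coordinate system) is an assumption, since the text does not state how the 94% figure was calculated. Only the RNA gene counts are taken from the text.

```python
from typing import Iterable, Tuple


def gc_content(sequence: str) -> float:
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return 100.0 * gc / total


def coding_density(genome_length: int,
                   gene_intervals: Iterable[Tuple[int, int]]) -> float:
    """Percentage of positions covered by at least one gene.

    Intervals are 0-based, half-open (start, end). Overlapping genes are
    counted once. Genes wrapping around a circular origin are not handled.
    """
    covered = 0
    current_end = 0
    for start, end in sorted(gene_intervals):
        start = max(start, current_end)  # skip the part already counted
        if end > start:
            covered += end - start
            current_end = end
    return 100.0 * covered / genome_length


# RNA-coding genes as listed in the text: 34 tRNAs, one tmRNA, one signal
# recognition particle RNA, one RNase P, and three rRNAs (16S, 5S, 23S).
rna_genes = {"tRNA": 34, "tmRNA": 1, "SRP RNA": 1, "RNase P": 1, "rRNA": 3}
total_rna_genes = sum(rna_genes.values())

# Hypothetical toy inputs, not BPAA data.
toy_gc = gc_content("ATGCATTA")
toy_density = coding_density(100, [(0, 40), (30, 70), (80, 94)])
```

The RNA tally sums to the 40 RNA-coding genes reported. In the toy inputs, two of eight bases are G or C, and the three intervals cover 84 of 100 positions once their overlap is counted only once.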
A key difference between BPAA and all other Blattabacterium genomes is the absence of any plasmid in the former. The three plasmid-based genes found in other strains were present as a cluster in the BPAA chromosome. We confirmed this apparent integration by PCR and Sanger sequencing. BPAA has only two genes that are not found in any other Blattabacterium genome, both of which encode hypothetical proteins (BPAA_093 and BPAA_129). A total of nine genes in BPAA (nadD, mvaK, miaA, topA, murD, menC, rplR, rnpA, comEB) contain single-nucleotide insertions that alter the downstream reading frame. It is possible that these genes no longer encode intact proteins; however, transcriptional slippage within homopolymeric regions may correct the reading frame in a subpopulation of transcripts, which can then be translated into functional enzymes (11).

Phylogenetic analysis showed that BPAA is most closely related to strain BGIGA. The Blattabacterium clade was found to be a sister group to a clade containing strains of the endosymbiont Sulcia muelleri. Phylogenetic relationships of Blattabacterium host species based on 18S rDNA were found to be equivalent to those of Blattabacterium strains (data not shown).

Supplementary References

1. Tokuda G, Lo N, Takase A, Yamada Y, Watanabe H. Purification and partial genome characterization of the bacterial endosymbiont Blattabacterium cuenoti from the fat bodies of cockroaches. BMC Res Notes. 2008;1:118.

2. Margulies, M., Egholm, M., Altman, W. E., Attiya, S., Bader, J. S., Bemben, L. A., et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2005;437:376-380.

3. van Hijum, S. A. F. T., Zomer, A. L., Kuipers, O. P. & Kok, J. Projector 2: contig mapping for efficient gap-closure of prokaryotic genome sequence assemblies. Nucleic Acids Res. 2005;33:W560-W566.

4. Gardner, P. P., Daub, J., Tate, J., Moore, B. L., Osuch, I. H., Griffiths-Jones, S., Finn, R. D., Nawrocki, E. P., Kolbe, D. L., Eddy, S. R. & Bateman, A. Rfam: Wikipedia, clans and the "decimal" release. Nucleic Acids Res. 2011;39:D141-D145.

5. Laslett, D., Canback, B. & Andersson, S. BRUCE: a program for the detection of transfer-messenger RNA genes in nucleotide sequences. Nucleic Acids Res. 2002;30:3449-3453.

6. Katoh K, Asimenos G, Toh H. Multiple alignment of DNA sequences with MAFFT. Methods Mol Biol. 2009;537:39-64.

7. Stamatakis A. RAxML-VI-HPC: Maximum Likelihood-based Phylogenetic Analyses with Thousands of Taxa and Mixed Models. Bioinformatics. 2006;22:2688-2690.

8. Miller, M.A., Pfeiffer, W., and Schwartz, T. "Creating the CIPRES Science Gateway for inference of large phylogenetic trees" in Proceedings of the Gateway Computing Environments Workshop (GCE), 14 Nov. 2010, New Orleans, LA, pp 1-8.

9. Lo N, Tokuda G, Watanabe H, Rose H, Slaytor M, Maekawa K, et al. Evidence from multiple gene sequences indicates that termites evolved from wood-feeding cockroaches. Curr Biol. 2000;10:801-4.

10. Ronquist F & Huelsenbeck JP. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 2003;19:1572-1574.

11. Tamas, I., Wernegreen, J. J., Nystedt, B., Kauppinen, S. N., Darby, A. C., Gomez-Valero, L., Lundin, D., Poole, A. M. & Andersson, S. G. E. Endosymbiont gene functions impaired and rescued by polymerase infidelity at poly(A) tracts. PNAS. 2008;105:14934-14939.

Supplementary Figure Legends

Figure S1. Inferred metabolic reconstruction of Blattabacterium strain BPAA based on genome data. Genes absent from predicted biosynthetic pathways are shown in red. Orange, blue, green, and purple blocks indicate sufficient gene content to recycle nitrogen wastes and to produce the indicated amino acids, metabolites, and vitamins, respectively.
Asterisks indicate essential amino acid pathways missing in CPU and MADAR.

Figure S2. Phylogenetic relationship of BPAA (in bold) to other Blattabacterium cuenoti strains as well as other members of the Flavobacteriales. Names of Blattabacterium strains are shown with host species. The tree was inferred from an alignment of 13 protein-coding genes from the strains shown. Bootstrap values below 95% are indicated at the corresponding nodes; all unlabeled nodes were supported with bootstrap values at or above 95%. Species for which only draft genomes are available are noted with asterisks. A phylogenetic analysis of the host species of each Blattabacterium strain shown, based on 18S rRNA gene sequences, provided a topology equivalent to that shown for Blattabacterium strains (data not shown).

Figure S1

Figure S2