SUPPLEMENTARY INFORMATION

Increasing genomic diversity and evidence of constrained lifestyle evolution due to insertion sequences in Aeromonas salmonicida

Antony T. Vincent a,b,c, Mélanie V. Trudel a,b,c, Luca Freschi a,e, Vandan Nagar d, Cynthia Gagné-Thivierge a,b,c, Roger C. Levesque a,e, Steve J. Charette a,b,c,#

a. Institut de biologie intégrative et des systèmes, Pavillon Charles-Eugène-Marchand, Université Laval, 1030 avenue de la Médecine, Quebec City, QC, Canada, G1V 0A6
b. Centre de recherche de l’Institut universitaire de cardiologie et de pneumologie de Québec (Hôpital Laval), 2725 Chemin Sainte-Foy, Quebec City, QC, Canada, G1V 4G5
c. Département de biochimie, de microbiologie et de bio-informatique, Faculté des sciences et de génie, Université Laval, 1045 avenue de la Médecine, Quebec City, QC, Canada G1V 0A6
d. Food Technology Division, Bhabha Atomic Research Centre, Mumbai, 400085, India
e. Département de microbiologie-infectiologie et immunologie, Faculté de médecine, Université Laval, Quebec City, QC, Canada

# Corresponding author: Steve J. Charette, Institut de Biologie Intégrative et des Systèmes (IBIS), Pavillon Charles-Eugène-Marchand, 1030 avenue de la Médecine, Université Laval, Quebec City, QC, Canada G1V 0A6
Telephone: 418-656-2131, ext. 6914, Fax: 418-656-7176
steve.charette@bcm.ulaval.ca

Supplementary Experimental Procedures

Phylogenetic analyses
To perform a robust core genome phylogeny, we wrote an in-house Perl script called CoreFinder.pl that relies on BioPerl modules [1] to find the genes that make up the core genome. The script takes the coding sequences (CDSs) extracted from a GenBank file and sequentially performs tblastn [2] searches against fasta or multi-fasta (for draft genomes) files (Figure S1). We used A. hydrophila ATCC 7966T [3], which is the A. hydrophila type strain, as a reference.
The genome of this strain has been well studied and has a high-quality annotation. The other aeromonads used in the present study are listed in Table S1. The CDSs were searched with a query cover of at least 85% at similarity thresholds ranging from 25% to 100% in 5% steps. Plotting the number of core genes against the similarity threshold revealed three linear sections and two breakpoints estimated at 40% and 80% similarity (Figure S2). To verify the importance of this parameter (i.e., the similarity percent) with respect to the final phylogeny, we performed all subsequent analyses (as indicated in the main manuscript) at both 40% and 80% similarity.

To choose the most appropriate phylogenetic model, the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) were computed using jModelTest version 2.1.7 [4] for both matrices. In both cases, the best-fit model was GTR+Γ, closely followed by GTR+I+Γ (Table S2), with no significant difference between the two. However, as discussed and reviewed elsewhere [5], adding a class of invariable sites with a rate of zero is not meaningful since the α parameter, which governs the shape of the gamma distribution, already allows low-rate sites through an L-shaped gamma distribution when α < 1. Moreover, the +I+Γ mixture model might be over-parameterized since it would be difficult to optimize both parameters. We thus used the GTR+Γ model for both matrices at 40% and 80% similarity.

Species relatedness was inferred by average nucleotide identity (ANI) analyses for key taxa using JSpecies version 1.2.1 [6]. MUMmer version 3.23 [7] was used to perform the analyses since it provides more robust results than blast searches for genomes sharing a high level of similarity (ANI > 90%) [6].
Two taxa were considered to be members of the same species if they shared an ANI ≥ 96%, a value that is well adapted to the aeromonads [8].

Bacterial growth at 7°C
The Indian isolates (Y577, Y567 and Y47) as well as A. salmonicida subsp. pectinolytica (34melT), A. salmonicida subsp. smithia (JF4097), A. salmonicida subsp. masoucida (NBRC 13784T), and A. salmonicida subsp. salmonicida (01-B526) were inoculated on furunculosis agar or on tryptic soy agar (TSA) from frozen stocks and were grown at 18°C for 24 to 48 h. The isolates were then inoculated in 3 ml of lysogeny broth (LB) and were incubated at 7°C overnight with shaking at 200 rpm. The turbidity was adjusted to an optical density of 0.1 at 595 nm (OD595), and the cultures were incubated at 7°C with shaking at 200 rpm. The ODs were read every hour for 8 h. The experiment was performed in triplicate.

PCR assays
We performed PCR assays using previously published conditions [9] to verify whether the pAsa5 plasmid of strain RS 534 had lost its type three secretion system (TTSS) by the recombination of ISAS11B and ISAS11C [10].

Plasmid characterization
The contigs for the strains sequenced in the present study were locally mapped on the chromosome sequence of the A. salmonicida reference strain A449 [11], the only A. salmonicida strain with a fully assembled chromosome, using CONTIGuator version 2.7.4 [12]. Identity searches of the unmapped contig sequences were performed by blast searches against the NCBI nr/nt database. Sequence manipulations were performed using the bioinformatics package EMBOSS version 6.6.0.0 [13].

The plasmid sequences that were discovered were automatically annotated by the RAST webserver [14]. All the putative CDSs were manually curated by performing blastp searches against the NCBI nr/nt database. Putative toxin-antitoxin systems were found by TAfinder [15].

The average copy number of each new plasmid per chromosome was calculated using the sequencing depth, a procedure that has been successfully used in the past [16]. We filtered the sequencing reads using Trimmomatic version 0.32 [17] with the parameters suggested in the manual. The resulting filtered sequencing reads were mapped on the gyrB gene (single copy per chromosome) with CUSHAW3 version 3.0.3 [18] without allowing any mismatches in order to avoid cross-mapping of reads from other genes. The reads were also mapped on the plasmid sequences. The average coverages were calculated using Qualimap version 2.1.1-dev [19].

Biochemical tests
Three Indian A. salmonicida isolates (Y47, Y567 and Y577) were further phenotypically characterized using a set of biochemical tests as described by Pavan et al. [20] and Abbott et al. [21]. All tests were performed in triplicate according to conventional protocols, with suitable positive and negative controls, and were incubated at 35°C for 48 to 72 h (unless otherwise mentioned). The tests for carbohydrate fermentation and extracellular enzymes were read daily for 7 days, whereas the tests for Voges-Proskauer, polypectate degradation and production of brown diffusible pigment on tryptic soy agar (TSA) were incubated at 25°C for 2 to 4 days.

Supplementary results

Sequencing results
Despite the stable average coverage of the assemblies, the N50 values, which are an indicator of contig length, varied considerably (Table S3). For example, the Y567 Indian strain had an N50 value two times higher than the other strains, while JF4097 (smithia) had the lowest N50 value and the smallest largest contig. Large repeated elements such as ISs, duplicated genes, and ribosomal RNA clusters cause contig breaks during de novo assembly [22], which suggests that the A. salmonicida subsp. smithia genome contained numerous large repeated elements.
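
As an illustration of the N50 statistic discussed above, the following minimal Python sketch computes it from a list of contig lengths. The function name and the contig lengths are hypothetical and are not taken from the assemblies in Table S3; the sketch only shows why an assembly broken into many short contigs yields a lower N50 than a more contiguous one of the same total size.

```python
def n50(contig_lengths):
    """Return the N50: the contig length L such that contigs of
    length >= L together cover at least half of the assembly."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    # Walk through the contigs from longest to shortest.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (arbitrary units), same total size (330).
fragmented = [100, 80, 60, 40, 20, 20, 10]
contiguous = [250, 80]

print(n50(fragmented))
print(n50(contiguous))
```

With equal assembly sizes, the fragmented set gives the smaller N50, which is the pattern that repeated elements are expected to produce.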

Molecular phylogeny optimization
At 40% similarity (of the translated sequences), the core genome was estimated at 1645 genes compared to 1190 genes at 80%. The functional categories of the genes (at 40% and 80%) were found using an in-house Perl script as explained in the main manuscript to verify whether there was an enrichment of one or more categories. Only two functional categories (J: translation, ribosomal structure and biogenesis; K: transcription) differed markedly in relative abundance between 40% and 80% similarity, indicating that the gains/losses were uniform in the other categories (Figure S3). The relative importance of the J category is higher at 80% than at 40%, which is consistent with translation being a conserved process. The high relative importance of the K category at 40% is in accordance with the capacity of various aeromonads to react to a wide diversity of stimuli.

The basic features of the phylogenetic analyses are presented in Table S4. The matrix at 80% similarity had 35% fewer sites than the matrix at 40% similarity. In both cases, the number of alignment patterns, which is the number of distinct site patterns in the matrix, corresponded to approximately 60% of the total number of sites. There was no significant difference between the α parameters of the two phylogenetic analyses as estimated by RAxML, meaning that the additional sites in the matrix at 40% similarity shared the same rate as the other sites.

There were differences between the resulting trees in terms of bootstrap values and topology (Figures S4 and S5). The phylogenetic analysis at 80% similarity had the weaker bootstrap values (Figure S4), indicating that the additional sites obtained at 40% similarity are important for obtaining a more robust tree (Figure S5). The topology diverged for the clade containing Aeromonas veronii.
This observation was understandable since this clade had a weak bootstrap value, even with the matrix at 40% similarity (Figure S5). Based on the bootstrap values, we thus believe that the tree based on the core genome found at 40% similarity more accurately represents the true evolutionary links between the taxa, which is why we used this phylogenetic tree for the remainder of the study.

Phylogenetic position of A. salmonicida
As mentioned in the main manuscript, the molecular phylogeny of the present paper revealed that A. salmonicida CBA100, a recently deposited Chilean strain [23], is phylogenetically closer to A. bestiarum than to A. salmonicida (Figure S5). To verify the relatedness of the CBA100 strain and A. bestiarum, the average nucleotide identity (ANI) values were computed for some key taxa (Figure S6). The fact that the ANI value between CBA100 and A. bestiarum is above 96% reinforces the close evolutionary link between the two taxa and suggests that CBA100 has been misclassified.

Strain Y577 shared a clade with A. salmonicida subsp. pectinolytica. Based strictly on the molecular phylogeny and the ANI values, we cannot rule out the possibility that Y577 is in fact A. salmonicida subsp. pectinolytica. As previously published, the pectinolytica subspecies is the only known aeromonad with pectinase activity [20]. Interestingly, all the genes in the pectinolytica subspecies that are needed to degrade and use pectin as a carbon source [24] were found in the genome of Y577. Given this, the pectinase activity of Y577 was tested and was confirmed experimentally (Table S5). It is tempting to suggest that Y577 is a member of the pectinolytica subspecies or of a new subspecies sharing a recent common ancestor.
However, the overall chromosomal organizations of Y577 and the pectinolytica subspecies strains appeared divergent (Figure 1, main manuscript), which is unusual for such closely related strains given the chromosomal uniformity of salmonicida subspecies strains. The results of some other biochemical tests also diverged (Table S5), suggesting that strains Y577 and A. salmonicida subsp. pectinolytica 34melT may not belong to the same subspecies.

Strains Y47 and Y567 formed a basal clade to the masoucida and salmonicida subspecies. However, as with Y577 and the pectinolytica 34melT strain, we cannot infer that Y47 and Y567 belong to the same subspecies based solely on the molecular phylogeny and the ANI values, especially since there were also macro-chromosomal differences between the two strains. If they belong to the same subspecies, this would indicate that they display significant genomic plasticity. There are also differences between many of the biochemical test results (Table S5), which also points to a potential taxonomic difference. Surprisingly, Y47 and Y567 were also pectinolytic. While 34melT and Y577 bore genes coding for three lyases involved in the first step of pectin degradation, Y567 and Y47 did not. The genomes of strains Y567, Y47, and Y577 (as a positive control) were annotated using the RAST webserver [25] to verify whether they possessed a subsystem related to pectin degradation. The annotation of Y577 contained the three lyases (EC 4.2.2.2, EC 4.2.2.6 and EC 4.2.2.9) in the “D-galacturonate and D-glucuronate utilization” subsystem, while the annotations of Y47 and Y567 did not contain any enzymes involved in pectin degradation. The pectinase activities of Y47 and Y567 likely involve an unknown pathway and are potentially the result of convergent evolution (i.e., when compared to the strains pectinolytica 34melT and Y577).
This result is interesting since it suggests that pectinolytic activity could be important for mesophilic A. salmonicida.

Bacterial growth at 7°C
The capacity of various A. salmonicida isolates to grow at 7°C was tested in addition to 18°C and 37°C (main manuscript). The same trend as at 18°C was observed, with the mesophilic strains growing more efficiently than the psychrophilic ones (Figure S7). The isolate JF4097 of the subspecies smithia was not able to grow at this temperature. This was expected given that this isolate had a weak growth capacity at 18°C (main manuscript).

Investigation of the plasmidome
The putative chromosomal contigs were removed and the remaining contigs were analyzed in order to investigate the plasmidome of the strains for which the DNA was sequenced in the present study. This resulted in the identification of three small cryptic plasmids in Indian strain Y47 (Figure S8). To our knowledge, these plasmids have not been reported before. We named them pY47-1, pY47-2, and pY47-3 and deposited their sequences in GenBank under accession numbers KT334396, KT334397, and KT334398, respectively. There were no clear known functions associated with these plasmids. All bore a putative type II toxin-antitoxin maintenance system and/or a phage resistance mechanism [26]. The plasmids pY47-2 and pY47-3 are ColE2-type replicon plasmids with a short RNA (RNA I) replication regulator [27]. Interestingly, a blastn (word size of 11) search revealed sequence identity and structural similarity between pY47-3 and the ColE2-type replicon plasmids pAQ2-1 and pAQ2-2 in Aeromonas sobria and Aeromonas hydrophila, respectively [28]. However, unlike these plasmids, pY47-3 did not bear the qnrS2 quinolone resistance gene.
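
The depth-based copy-number estimate described in the Supplementary Experimental Procedures amounts to dividing the mean coverage of a plasmid by the mean coverage of the single-copy gyrB gene. The minimal Python sketch below illustrates that ratio only; the function name and the coverage values are hypothetical and are not the values obtained in this study, and the calculation assumes one chromosome copy per cell.

```python
def plasmid_copy_number(plasmid_coverage, single_copy_gene_coverage):
    """Estimate plasmid copies per chromosome as the ratio of the mean
    read depth on the plasmid to the mean read depth on a gene present
    once per chromosome (gyrB in the procedure described in the text)."""
    if single_copy_gene_coverage <= 0:
        raise ValueError("single-copy gene coverage must be positive")
    return plasmid_coverage / single_copy_gene_coverage


# Hypothetical mean coverages, for illustration only.
gyrb_depth = 50.0
plasmid_depth = 600.0

copies = plasmid_copy_number(plasmid_depth, gyrb_depth)
print(copies)
```

Because cells that have replicated their chromosome carry more than one gyrB copy, a ratio of this kind is best read as a relative estimate.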

The high sequencing depth provided by the Illumina technology was used to infer the average copy number per chromosome of each plasmid. As reported elsewhere [29], ColE2-type replicon plasmids are maintained at high copy numbers per cell (24 copies for pY47-2 and 13 copies for pY47-3). The plasmid pY47-1, for which the incompatibility group is unknown, also had a high copy number (22 copies). It is important to mention that inferring relative plasmid copy numbers has an inherent bias since it is assumed that there is a single copy of the chromosome in each cell, which does not hold after chromosome replication. However, the results showed that these plasmids were maintained at much higher copy numbers than the bacterial chromosome. No plasmids were found in strain Y567, while Y577 harbored a pY47-3 plasmid that shared more than 99% identity with the one in Y47 (7 point mutations).

A plasmid maintained at a high copy number (~40 copies/cell) was also found in A. salmonicida subsp. smithia JF4097. It was named pJF4097 and its sequence was deposited in GenBank under accession number KT334395 (Figure S9). pJF4097 bears the mobABCD genes, which are related to mobilization, an ISAS11, and a gene encoding an ExoY-like protein; ExoY is a type-three secretion system (TTSS) effector in the human pathogen Pseudomonas aeruginosa [30].

The A. salmonicida subsp. salmonicida RS 534 strain harbors the same five plasmids as the A449 reference strain [11]: the large plasmid pAsa4, which encodes many drug resistance genes; pAsa5, which normally bears the type-three secretion system; and the pAsa1, pAsa2, and pAsa3 cryptic plasmids [31]. Basic bioinformatics analyses showed that the pAsa5 plasmid of the RS 534 strain has lost its TTSS.
It is known that this region is bordered by two ISAS11s (B and C) and that growth above 25°C may result in the recombination of the two ISAS11s and the loss of the TTSS [9,10]. We confirmed by PCR that the TTSS was lost by a recombination of ISAS11B and ISAS11C (Figure S10).

The pan-genome analysis
We used an in-house Perl script as indicated in the main manuscript to find the pan-genome of A. salmonicida. The resulting binary matrix (i.e., presence/absence) was used to map the characters (i.e., the genes) on a phylogenetic tree based on the core genome (Figure S11A). This analysis made it possible to determine which genes were acquired or lost during evolution and, consequently, may have played a role in the adaptation of a given strain. As indicated in the main manuscript, three functional categories (K, N and X) at branch 1 experienced many events (i.e., gains and losses) (Figure S11B). The L, R, T and U categories have also acquired and lost many genes, but this can more likely be attributed to general evolution rather than to mesophilic-to-psychrophilic evolution. In the case of branch 2 (Figure S11C), the three functional categories exhibiting the most important changes are energy production and conversion (C) (only losses for this category), carbohydrate transport and metabolism (G), and replication, recombination, and repair (L). Interestingly, only gains were detected for the category related to the mobilome (X).

Unfortunately, it was impossible to assign a cluster of orthologous groups (COG) to 45.4% and 59.1% of the genes for branches 1 and 2, respectively, and, consequently, to infer their functional categories. This highlights a drawback of bioinformatics analyses and their dependence on incomplete and poorly curated databases.

Functional categories of the genes under positive selection in the mesophilic lineages
A total of 322 genes in the A.
salmonicida species appear to be under positive selection in various lineages, including 241 that were specific to at least one mesophilic lineage. We used a COG assignment of these 241 genes to find their functional categories (Figure S12). Many categories in the mesophilic lineages were under positive selection, indicating that these lineages may have a high evolutionary potential.

Table S1. Aeromonads used in the study.

Species                               Strain          Accession no.    Reference
A. allosaccharophila                  CECT 4199T      CDBR00000000     [8]
A. allosaccharophila                  BVH88           CDCB00000000     [8]
A. australiensis                      CECT 8023T      CDDH00000000     [8]
A. bestiarum                          CECT 4227T      CDDA00000000     [8]
A. bivalvium                          CECT 7113T      CDBT00000000     [8]
A. caviae                             CECT 838T       CDBK00000000     [8]
A. dhakensis                          CIP 107500      CDBH00000000     [8]
A. diversa                            CECT 4254T      CDCE00000000     [8]
A. encheleia                          CECT 4342T      CDDI00000000     [8]
A. enteropelogenes                    CECT 4487T      CDCG00000000     [8]
A. eucrenophila                       CECT 4224T      CDDF00000000     [8]
A. fluvialis                          LMG 24681T      CDBO00000000     [8]
A. hydrophila (a)                     ATCC 7966T      CP000462         [3]
A. jandaei                            CECT 4228T      CDBV00000000     [8]
A. media                              CECT 4232T      CDBZ00000000     [8]
A. molluscorum                        848T            AQGQ00000000     [32]
A. piscicola                          LMG 24783T      CDBL00000000     [8]
A. popoffii                           CIP 105493T     CDBI00000000     [8]
A. rivuli                             DSM 22539T      CDBJ00000000     [8]
A. salmonicida subsp. salmonicida     A449            CP000644         [11]
A. salmonicida subsp. salmonicida     01-B526         AGVO01000000     [33]
A. salmonicida subsp. salmonicida     RS 534          JYFF00000000     This study
A. salmonicida subsp. salmonicida     JF3224          JXTA00000000     [9]
A. salmonicida subsp. salmonicida     CIP 103209      CDDW00000000     [8]
A. salmonicida subsp. salmonicida     2009-144K3      JRYV00000000     [34]
A. salmonicida subsp. salmonicida     2004-05MF26     JRYW00000000     [34]
A. salmonicida                        CBA100          JPWL00000000     [23]
A. salmonicida subsp. achromogenes    AS03            AMQG00000000     [35]
A. salmonicida subsp. smithia         JF4097          JZTI00000000     This study
A. salmonicida subsp. pectinolytica   34melT          ARYZ00000000     [36]
A. salmonicida subsp. masoucida       NBRC 13784T     BAWQ00000000     N/A (b)
A. salmonicida                        Y47             JZTF00000000     This study
A. salmonicida                        Y567            JZTG00000000     This study
A. salmonicida                        Y577            JZTH00000000     This study
A. sanarellii                         LMG 24682T      CDBN00000000     [8]
A. schubertii                         CECT 4240T      CDDB00000000     [8]
A. simiae                             CIP 107798T     CDBY00000000     [8]
A. sobria                             CECT 4245T      CDBW00000000     [8]
A. species                            AH4             ERX552948 (c)    [8]
A. species                            AMC34           AGWU00000000     N/A
A. taiwanensis                        LMG 24683T      CDDD00000000     [8]
A. tecta                              CECT 7082T      CDCA00000000     [8]
A. veronii                            CECT 4257T      CDDK00000000     [8]

a: This strain was used as a model to find the genes involved in the core genome.
b: N/A means that no publication is associated with the sequence.
c: Only the sequencing reads were available via the SRA database for A. species AH4. The reads were de novo assembled as indicated in the “Methods” section of the main manuscript.

Table S2. The five best models and their -lnL, AIC, and BIC values.

Similarity   Model      -lnL        AIC         BIC
40%          GTR+Γ      12827871    25655928    25656993
40%          GTR+I+Γ    12827940    25656069    25657146
40%          HKY+Γ      12832779    25665737    25666756
40%          HKY+I+Γ    12832849    25665878    25666909
40%          SYM+Γ      13005302    26010784    26011815
80%          GTR+Γ      7941102     15882391    15883417
80%          GTR+I+Γ    7941148     15882484    15883521
80%          HKY+Γ      7944260     15888698    15889679
80%          HKY+I+Γ    7944305     15888791    15889783
80%          SYM+Γ      8071711     16143602    16144595

Table S3. Assembly results.

Strain    Contigs   Largest contig (kbp)   N50 (kbp)   Average coverage   Assembly size (Mbp)   A449 fraction (%) (a)
Y47       118       395.27                 117.77      66.92              4.710233              85.077
Y567      47        448.10                 217.34      68.03              4.554847              85.607
Y577      104       383.04                 101.76      78.53              4.736410              83.731
JF4097    344       109.61                 28.95       88.72              4.307768              84.059
RS 534    123       382.43                 119.02      62.59              4.889640              97.599

a: The chromosome sequence of the strain A449 (A. salmonicida subsp. salmonicida) [11] was used. This feature was found using QUAST version 3.1 [37].

Table S4. Phylogenetic features.

Similarity percent     40%         80%
Genes                  1645        1190
Sites                  696,249     454,574
Alignment patterns     420,006     271,519
Best model             GTR+Γ       GTR+Γ
α parameter            1.703415    1.686099

Table S5. Biochemical tests used for the mesophilic A. salmonicida strains.

Biochemical tests / Strains: Y577, Y47, Y567, 34melT (a)
Indole (35°C) + + + +
ONPG + + + +
VP (25°C) + + + +
Simmons citrate + + + +
Esculin hydrolysis + +
Polypectate degradation (25°C) + + + +
Motility (35°C) + + +
Brown pigment (25°C) +
Growth (37°C) + + + +
Dnase + + + +
Lipase + + + +
Gelatinase + + + +
H2S
VP (35°C)
ODC
LDC + + +
ADH + + +
Urease
Cellobiose + + +
Salicin + +
Sorbitol + + + +
Rhamnose
Mannitol + + + +
Sucrose + + + +
Glucose (gas) + + + +
L-Arabinose + + + +
Lactose + + + +
Glycose + + + +
Inositol
Melibiose
Glu + + + +
Amygdalin + +
Hemolysin (sheep, horse) (a) + + + +

a: These results are from [20]. We used horse blood agar for assessing hemolysis, whereas Pavan et al. (2000) [20] used sheep blood agar plates.

[Figure S1: flowchart. Box labels: Initial GenBank file (Genome 1); CDS extraction; tblastn; Sequence(s) (Genome 2) ... Sequence(s) (Genome n-1), Sequence(s) (Genome n); List of CDSs; Core genome; Translated sequences; List of CDSs with known biological function.]

Figure S1. Conceptual schematization of the in-house CoreFinder.pl Perl script.

[Figure S2: plot of core genome (genes) against similarity (%).]

Figure S2. Number of genes involved in the core genome based on the similarity percent used with the CoreFinder.pl script. The blue dots at 40% and 80% indicate the similarity percent used to perform the optimization analyses.

Figure S3. Relative abundance of 26 functional categories for genes used to construct the phylogenetic matrices at 40% and 80% similarity.

Figure S4.
(A) Molecular core genome phylogeny of 43 aeromonads inferred from the sequences of 1190 genes (determined using the 80% similarity) by maximum likelihood using the GTR+Γ model and 1000 rapid bootstrap replicates. Only bootstrap values under 100 are shown. For clarity, the bootstrap values have been removed for the taxa of the salmonicida species. The mesophilic strains are in red while the psychrophilic strains are in blue. (B) Zoom of the salmonicida species with equal branch lengths. Only bootstrap values under 100 are shown. The mesophilic, intermediate, and psychrophilic strains are shown in red, purple, and blue, respectively.

[Figure S5: phylogenetic tree of the aeromonads.]

Figure S5. Molecular phylogeny of 43 aeromonads inferred from 1645 core genes by maximum likelihood using the GTR+Γ model. Only bootstrap values under 100 are shown in this figure. All the bootstrap values for the salmonicida subspecies are given in Figure 1 (main article) for clarity.
The red branches correspond to mesophilic taxa, the purple branch corresponds to the intermediate taxon, and the blue branches correspond to psychrophilic taxa. The strain numbers are shown only when there are two taxa from the same species or subspecies.

[Figure S6: ANI matrix. Columns follow the same order as the rows.]

Taxon                                (1)     (2)     (3)     (4)     (5)     (6)     (7)     (8)     (9)     (10)
(1) A. bestiarum                     ---
(2) A. s CBA100                      97.49   ---
(3) A. s subsp. pectinolytica        90.54   90.62   ---
(4) A. s Y577                        90.51   90.63   97.52   ---
(5) A. s Y567                        90.69   90.83   97.14   97.11   ---
(6) A. s Y47                         90.63   90.78   97.11   97.13   97.64   ---
(7) A. s subsp. masoucida            90.52   90.59   97.08   97.08   97.59   97.47   ---
(8) A. s subsp. smithia              90.56   90.71   97.08   97.09   97.58   97.47   99.70   ---
(9) A. s subsp. achromogenes         90.54   90.75   97.11   97.11   97.59   97.50   99.68   99.62   ---
(10) A. s subsp. salmonicida         90.52   90.75   97.06   97.06   97.55   97.46   99.75   99.69   99.64   ---

Figure S6. Average nucleotide identity (ANI) analyses for some A. salmonicida subspecies included in this study. A. bestiarum is also included for comparative purposes with A. salmonicida CBA100. Two taxa were considered as belonging to the same species if they shared an ANI value ≥ 96% (yellow and green).

[Figure S7: plot of OD (595 nm) against time (h), 0 to 8 h. Series: A. salmonicida pectinolytica, Y577, Y567, Y47, masoucida, 01-B526.]

Figure S7. Growth curves at 7°C for selected A. salmonicida subspecies. The growth curves were determined three times in independent experiments. The means of three replicates with standard error of the mean are shown for each subspecies.

[Figure S8: circular maps of pY47-1 (12 495 bp), pY47-2 (6 042 bp), and pY47-3 (5 104 bp). Labels on the maps: repA, repB, relB, relE, parA, parB, parD, parE, ccdA, ccdB, mob, mobA, mobB, mobC, mobD, RNA I, origin of replication, chemotaxis protein, acyltransferase, GTP-binding protein.]

Figure S8. The three high-copy plasmids found in the Indian strain Y47.
The pY47-3 plasmid was also found in the Indian strain Y577. The blue arrows represent genes with a known function, the green arrows represent genes encoding hypothetical proteins, and the black arrow represents the putative RNA regulator.

[Figure S9: circular map of pJF4097 (6 231 bp). Labels on the map: mobA, mobB, mobC, mobD, tnp, ISAS11, exoY-like, RNA I, RNA II.]

Figure S9. The high-copy plasmid pJF4097 found in A. salmonicida subsp. smithia. The blue arrows represent genes with a known function, the green arrows represent genes encoding hypothetical proteins, the black arrows represent the putative regulatory RNAs, and the grey rectangle represents the ISAS11.

[Figure S10: gel image with wells 1 to 4.]

Figure S10. Result of the PCR assay confirming that the RS 534 strain lost its TTSS by the recombination of two ISAS11s (B-C rearrangement [10]). The wells are as follows: (1) 2-log DNA ladder (New England Biolabs), (2) RS 534, (3) JF3224 (positive control), and (4) 01-B526 (negative control).

[Figure S11: (A) phylogenetic tree of pectinolytica, salmonicida, achromogenes, masoucida, smithia, Y47, Y567, Y577, and popoffii (outgroup), with gain/loss values on the branches and branches 1 and 2 marked; (B) and (C) bar charts of relative importance (gain and loss) against functional category (A to Z) for branches 1 and 2.]

Functional categories: A, RNA processing and modification; B, Chromatin structure and dynamics; C, Energy production and conversion; D, Cell cycle control, cell division, chromosome partitioning; E, Amino acid transport and metabolism; F, Nucleotide transport and metabolism; G, Carbohydrate transport and metabolism; H, Coenzyme transport and metabolism; I, Lipid transport and metabolism; J, Translation, ribosomal structure and biogenesis; K, Transcription; L, Replication, recombination and repair; M, Cell wall/membrane/envelope biogenesis; N, Cell motility; O, Posttranslational modification, protein turnover, chaperones; P, Inorganic ion transport and metabolism; Q, Secondary metabolites biosynthesis, transport and catabolism; R, General function prediction only; S, Function unknown; T, Signal transduction mechanisms; U, Intracellular trafficking, secretion, and vesicular transport; V, Defense mechanisms; W, Extracellular structures; X, Mobilome: prophages, transposons; Y, Nuclear structure; Z, Cytoskeleton.

Figure S11. Pan-genome analysis of selected A. salmonicida subspecies, including A. popoffii as an outgroup. (A) Distribution of the pan-genome on a phylogenetic tree for some key taxa. The phylogenetic tree was based on the tree found using the core genome. The green and black values indicate the number of genes acquired and lost, respectively, for the specific branch using the Dollo parsimony model. The branch lengths represent the number of genes acquired or lost. For A. salmonicida subsp. salmonicida the strain used was 01-B526. Relative importance of 26 functional categories for the genes implicated in branches 1 (B) and 2 (C).
[Figure S12 bar chart: number of genes (y-axis, 0 to 30) per functional category (x-axis, A to Z). The category letters are defined as in the Figure S11 legend.]

Figure S12. Functional categories of the genes under positive selection in the A. salmonicida mesophilic lineages.

References

1. Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, Dagdigian C, et al. The Bioperl toolkit: Perl modules for the life sciences. Genome Res. 2002;12:1611–8.
2. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–402.
3. Seshadri R, Joseph SW, Chopra AK, Sha J, Shaw J, Graf J, et al. Genome sequence of Aeromonas hydrophila ATCC 7966T: Jack of all trades. J. Bacteriol. 2006;188:8272–82.
4. Darriba D, Taboada GL, Doallo R, Posada D. jModelTest 2: more models, new heuristics and parallel computing. Nat. Methods. 2012;9:772.
5. Jia F, Lo N, Ho SYW.
The impact of modelling rate heterogeneity among sites on phylogenetic estimates of intraspecific evolutionary rates and timescales. PLoS One. 2014;9:e95722.
6. Richter M, Rosselló-Móra R. Shifting the genomic gold standard for the prokaryotic species definition. Proc. Natl. Acad. Sci. U. S. A. 2009;106:19126–31.
7. Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, et al. Versatile and open software for comparing large genomes. Genome Biol. 2004;5:R12.
8. Colston SM, Fullmer MS, Beka L, Lamy B, Gogarten JP. Bioinformatic Genome Comparisons for Taxonomic and Phylogenetic Assignments Using Aeromonas as a Test Case. MBio. 2014;5:1–13.
9. Emond-Rheault J-G, Vincent AT, Trudel MV, Frey J, Frenette M, Charette SJ. AsaGEI2b: a new variant of a genomic island identified in the Aeromonas salmonicida subsp. salmonicida JF3224 strain isolated from a wild fish in Switzerland. FEMS Microbiol. Lett. 2015;362:fnv093.
10. Tanaka KH, Dallaire-Dufresne S, Daher RK, Frenette M, Charette SJ. An Insertion Sequence-Dependent Plasmid Rearrangement in Aeromonas salmonicida Causes the Loss of the Type Three Secretion System. PLoS One. 2012;7:e33725.
11. Reith ME, Singh RK, Curtis B, Boyd JM, Bouevitch A, Kimball J, et al. The genome of Aeromonas salmonicida subsp. salmonicida A449: insights into the evolution of a fish pathogen. BMC Genomics. 2008;9:427.
12. Galardini M, Biondi EG, Bazzicalupo M, Mengoni A. CONTIGuator: a bacterial genomes finishing tool for structural insights on draft genomes. Source Code Biol. Med. 2011;6:11.
13. Rice P, Longden I, Bleasby A. EMBOSS: The European Molecular Biology Open Software Suite. Trends Genet. 2000;16:276–7.
14. Aziz RK, Bartels D, Best AA, DeJongh M, Disz T, Edwards RA, et al. The RAST Server: rapid annotations using subsystems technology. BMC Genomics. 2008;9:75.
15. Shao Y, Harrison EM, Bi D, Tai C, He X, Ou HY, et al.
TADB: A web-based resource for Type 2 toxin-antitoxin loci in bacteria and archaea. Nucleic Acids Res. 2011;39:D606–11.
16. Rasko DA, Rosovitz MJ, Økstad OA, Fouts DE, Jiang L, Cer RZ, et al. Complete sequence analysis of novel plasmids from emetic and periodontal Bacillus cereus isolates reveals a common evolutionary history among the B. cereus-group plasmids, including Bacillus anthracis pXO1. J. Bacteriol. 2007;189:52–64.
17. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–20.
18. Liu Y, Popp B, Schmidt B. CUSHAW3: sensitive and accurate base-space and color-space short-read alignment with hybrid seeding. PLoS One. 2014;9:e86869.
19. García-Alcalde F, Okonechnikov K, Carbonell J, Cruz LM, Götz S, Tarazona S, et al. Qualimap: Evaluating next-generation sequencing alignment data. Bioinformatics. 2012;28:2678–9.
20. Pavan ME, Abbott SL, Zorzópulos J, Janda JM. Aeromonas salmonicida subsp. pectinolytica subsp. nov., a new pectinase-positive subspecies isolated from a heavily polluted river. Int. J. Syst. Evol. Microbiol. 2000;50:1119–24.
21. Abbott SL, Cheung WKW, Janda JM. The genus Aeromonas: Biochemical characteristics, atypical reactions, and phenotypic identification schemes. J. Clin. Microbiol. 2003;41:2348–57.
22. Vincent AT, Boyle B, Derome N, Charette SJ. Improvement in the DNA sequencing of genomes bearing long repeated elements. J. Microbiol. Methods. 2014;107:186–8.
23. Valdes N, Espinoza C, Sanhueza L, Gonzalez A, Corsini G, Tello M. Draft Genome Sequence of the Chilean isolate Aeromonas salmonicida strain CBA100. FEMS Microbiol. Lett. 2015;362:fnu062.
24. Pavan ME, Pavan EE, López NI, Levin L, Pettinari MJ. Living in an extremely polluted environment: clues from the genome of melanin-producing Aeromonas salmonicida subsp. pectinolytica 34melT. Appl. Environ. Microbiol. 2015;81:5235–48.
25. Overbeek R, Olson R, Pusch GD, Olsen GJ, Davis JJ, Disz T, et al. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 2014;42:D206–14.
26. Samson JE, Magadán AH, Sabri M, Moineau S. Revenge of the phages: defeating bacterial defences. Nat. Rev. Microbiol. 2013;11:675–87.
27. Sugiyama T, Itoh T. Control of ColE2 DNA replication: in vitro binding of the antisense RNA to the Rep mRNA. Nucleic Acids Res. 1993;21:5972–7.
28. Han JE, Kim JH, Choresca JH, Shin SP, Jun JW, Chai JY, et al. First description of ColE-type plasmid in Aeromonas spp. carrying quinolone resistance (qnrS2) gene. Lett. Appl. Microbiol. 2012;55:290–4.
29. Horii T, Itoh T. Replication of ColE2 and ColE3 plasmids: The regions sufficient for autonomous replication. Mol. Gen. Genet. MGG. 1988;212:225–31.
30. Yahr TL, Vallis AJ, Hancock MK, Barbieri JT, Frank DW. ExoY, an adenylate cyclase secreted by the Pseudomonas aeruginosa type III system. Proc. Natl. Acad. Sci. U. S. A. 1998;95:13899–904.
31. Boyd J, Williams J, Curtis B, Kozera C, Singh R, Reith M. Three small, cryptic plasmids from Aeromonas salmonicida subsp. salmonicida A449. Plasmid. 2003;50:131–44.
32. Spataro N, Farfán M, Albarral V, Sanglas A, Lorén JG, Fusté MC, et al. Draft Genome Sequence of Aeromonas molluscorum Strain 848TT, Isolated from Bivalve Molluscs. Genome Announc. 2013;1:e00382–13.
33. Charette SJ, Brochu F, Boyle B, Filion G, Tanaka KH, Derome N. Draft genome sequence of the virulent strain 01-B526 of the fish pathogen Aeromonas salmonicida. J. Bacteriol. 2012;194:722–3.
34. Vincent AT, Tanaka KH, Trudel MV, Frenette M, Derome N, Charette SJ. Draft genome sequences of two Aeromonas salmonicida subsp. salmonicida isolates harboring plasmids conferring antibiotic resistance. FEMS Microbiol. Lett. 2015;362:1–4.
35.
Han JE, Kim JH, Shin SP, Jun JW, Chai JY, Park SC. Draft Genome Sequence of Aeromonas salmonicida subsp. achromogenes AS03, an Atypical Strain Isolated from Crucian Carp (Carassius carassius) in the Republic of Korea. Genome Announc. 2013;1:e00791–13.
36. Pavan ME, Pavan EE, López NI, Levin L, Pettinari MJ. Genome Sequence of the Melanin-Producing Extremophile Aeromonas salmonicida subsp. pectinolytica Strain 34melT. Genome Announc. 2013;1:e00675–13.
37. Gurevich A, Saveliev V, Vyahhi N, Tesler G. QUAST: quality assessment tool for genome assemblies. Bioinformatics. 2013;29:1072–5.