Supplemental Text
Strain diversity
Strains P7266 and P7304 form very distinct branches in the tree and were previously shown to differ strongly in genotype from both the lactis and the cremoris subspecies (Rademaker et al., 2007). Two other strains, cremoris strain KW10 and lactis strain KF282, also form separate branches in the tree. The observed deviation in genome content of these four strains could partly be caused by poor hybridization: sequence variation in these strains relative to the probes on the multi-strain microarray would lead to false negatives in the presence/absence calling. This problem has recently been described for a few Lactobacillus plantarum strains (Siezen et al., 2010b). In that study, several genes with intermediate hybridization signals were sequenced and shown to be present in the strains, but to diverge by 15-25% in nucleotide sequence. As the exact level of sequence divergence remains to be verified for the divergent L. lactis strains described here, results derived for these four strains should be considered tentative, and they were not used in all analyses.
Subclade-specific gene clusters
Ribose metabolism
Ribose is a monosaccharide that can be used for the synthesis of the amino acids tryptophan and histidine, or that can enter the pentose phosphate pathway to produce nucleotides and other 5-carbon sugars. The ribose operon rbsBCADKR is present in all strains except subsp. hordniae strain LMG8520. The inability to metabolize ribose was already described for L. lactis subsp. hordniae in an earlier study (Rademaker et al., 2007).
Tryptophan and histidine biosynthesis
The three dairy subsp. cremoris strains HP, FG2 and LMG6897T seem to lack the complete trpABCDEFG gene cluster required for tryptophan synthesis, and therefore appear unable to synthesize tryptophan. The subsp. lactis strain ML8 lacks five genes of the histidine biosynthesis operon and is therefore expected to be unable to synthesize histidine.
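The pathway inferences above can be read as a simple rule: when genes of a biosynthetic or catabolic cluster are called absent, the strain is predicted to lack that capability. The Python sketch below makes the rule explicit and, following the caveat in the Strain diversity section, reports absences in the four divergent strains as uncertain rather than as true gene loss. The function name, the dictionary input format and the example calls are hypothetical illustrations; real calls would come from the CGH presence/absence data.

```python
# Hypothetical sketch: infer whether a strain carries a complete gene cluster
# from microarray presence/absence calls. Absence calls for the strains flagged
# as sequence-divergent are reported as "uncertain", because poor hybridization
# can produce false negatives for those strains.

TRP_CLUSTER = ["trp" + letter for letter in "ABCDEFG"]  # trpA ... trpG
RBS_OPERON = ["rbs" + letter for letter in "BCADKR"]    # rbsB ... rbsR

# Strains whose results the text treats as tentative.
DIVERGENT = {"P7266", "P7304", "KW10", "KF282"}


def cluster_status(calls: dict[str, bool], cluster: list[str], strain: str) -> str:
    """Return 'complete', 'incomplete' or 'uncertain' for one strain.

    calls maps gene names to True (present) or False (absent); genes without
    any call are treated as absent.
    """
    missing = [gene for gene in cluster if not calls.get(gene, False)]
    if not missing:
        return "complete"
    if strain in DIVERGENT:
        return "uncertain"  # absence may be a hybridization artefact
    return "incomplete"     # predicted loss of the pathway


# Example with made-up calls: a strain lacking every trp gene.
example_calls = {gene: False for gene in TRP_CLUSTER}
print(cluster_status(example_calls, TRP_CLUSTER, "HP"))  # incomplete
```

Separating "uncertain" from "incomplete" keeps hybridization artefacts in divergent strains from being reported as auxotrophies.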
Nisin production/resistance
Our previous study showed increased resistance of the plant-derived L. lactis strains KF147 and KF282 to the antibiotic peptide nisin, compared to the dairy strains SK11 and IL1403 (Siezen et al., 2008). The KF147 genome sequence (Siezen et al., 2010a) encodes a degenerate nisin biosynthesis cluster, nisXBCIPRKFEG, suggesting that strain KF147 does not produce nisin but does have active immunity against it; this was experimentally verified (Siezen et al., 2008). Among the strains studied here, the complete nisin gene cluster, encoding all functions required for nisin production and immunity, was found in several plant strains and in the dairy strain NCDO895 (Table 6). In addition, only part of the nisin gene cluster was present in some strains, suggesting that these partial clusters are not functional in nisin production.
Niche-specific genes
As the selected strains were derived from different ecological niches, we compared the distribution of chrOGs between dairy and plant-derived subsp. lactis strains. Only 94 chrOGs were significantly associated with one of these two subgroups, and most of them (82%) were dairy-specific. The chrOGs differing between the two ecological populations mainly encoded transposases, phage proteins and hypothetical proteins (supplemental material Table S3). Next, we searched for differences in chrOGs between dairy strains of subsp. lactis and subsp. cremoris. Nearly 600 chrOGs were significantly associated with one of these two dairy subgroups, but most of them (94%) were also significant for the distinction between the cremoris and lactis subspecies in general (see above), indicating that subspeciation preceded adaptation to milk. The chrOGs significant only for the distinction between lactis and cremoris dairy strains, and not between the two subspecies as a whole, could reflect more recent gene acquisition or loss events that happened after subspeciation.
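The text does not state which statistical test underlies "significantly associated". As an illustration only, the Python sketch below assumes a two-sided Fisher's exact test per chrOG on a 2x2 table of presence/absence counts in two strain groups, followed by Benjamini-Hochberg correction at q < 0.05. The test, the threshold, the function names and the toy strain labels are all assumptions, not the procedure used in this study. A small helper computes the share of significant chrOGs that also appear in another comparison, the kind of overlap behind the 94% figure.

```python
from math import comb


def fisher_two_sided(present_1: int, n_1: int, present_2: int, n_2: int) -> float:
    """Two-sided Fisher's exact test p-value for a 2x2 presence/absence table.

    With the margins fixed, the number of group-1 strains carrying the gene
    follows a hypergeometric distribution; the p-value sums the probabilities
    of all tables that are no more likely than the observed one.
    """
    total = n_1 + n_2
    carriers = present_1 + present_2
    denom = comb(total, n_1)

    def prob(k: int) -> float:
        return comb(carriers, k) * comb(total - carriers, n_1 - k) / denom

    observed = prob(present_1)
    low = max(0, n_1 - (total - carriers))
    high = min(n_1, carriers)
    p = sum(prob(k) for k in range(low, high + 1) if prob(k) <= observed * (1 + 1e-9))
    return min(p, 1.0)


def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def niche_associated(calls: dict[str, set[str]], group_1: set[str],
                     group_2: set[str], alpha: float = 0.05) -> dict[str, str]:
    """Return {chrOG: 'group_1' or 'group_2'} for chrOGs with q < alpha.

    calls maps each chrOG to the set of strains in which it was called present.
    """
    names = sorted(calls)
    p_values, enriched = [], []
    for name in names:
        x1 = len(calls[name] & group_1)
        x2 = len(calls[name] & group_2)
        p_values.append(fisher_two_sided(x1, len(group_1), x2, len(group_2)))
        # Direction: the group in which the chrOG is relatively more frequent.
        enriched.append("group_1" if x1 / len(group_1) > x2 / len(group_2) else "group_2")
    q_values = benjamini_hochberg(p_values)
    return {n: e for n, e, q in zip(names, enriched, q_values) if q < alpha}


def shared_fraction(hits: set[str], reference_hits: set[str]) -> float:
    """Fraction of chrOGs in hits that are also significant in reference_hits."""
    return len(hits & reference_hits) / len(hits) if hits else 0.0


# Toy example with hypothetical strain names.
dairy = {"D1", "D2", "D3", "D4", "D5"}
plant = {"P1", "P2", "P3", "P4", "P5"}
toy_calls = {"chrOG_A": set(dairy), "chrOG_B": {"D1", "D2", "D3", "P1", "P2", "P3"}}
print(niche_associated(toy_calls, dairy, plant))  # {'chrOG_A': 'group_1'}
```

With only five strains per group, even a perfectly group-specific chrOG reaches p of about 0.008, so group sizes strongly limit how many associations can pass a multiple-testing threshold.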
The chrOGs significant only in the comparison of dairy strains were more frequently present in subsp. lactis than in subsp. cremoris, suggesting gene gain in lactis dairy strains. However, the biological significance of these differences between dairy strains of different subspecies is difficult to interpret, as most of the genes encode hypothetical proteins and transposases.
Comparison of closely related strains
Some strains are very similar in chromosomal gene content. For instance, our chromosome-based CGH cluster analysis (Figure 1) and a previous study based on multilocus sequence analysis (MLSA) (Rademaker et al., 2007) both show a close relationship between subsp. cremoris strains SK11 and AM2, and between strains MG1363 and NCDO763. The close resemblance of the subsp. cremoris strains MG1363 and NCDO763 was expected, as both are derivatives of NCDO 712 (Wegmann et al., 2007). SK11 is a phage-resistant derivative of strain AM1, while AM2 and AM1 are both isolates from similar New Zealand dairy starters. Analysis of differences in gene presence/absence between each sequenced strain and its closest relative revealed that the differences were due primarily to pseudogenes and mobile elements, such as phages, transposases and restriction/modification (RM) systems (data not shown). Strains SK11 and AM2 differ in about 40 chrOGs; the main differences are additional phage genes in SK11 and several extra EPS biosynthesis genes in AM2, although the latter could also be plasmid-encoded in AM2 (see below). Strains NCDO763 and MG1363 differ in only 22 chrOGs, again mainly encoding hypothetical and phage proteins. While these small genomic differences are not surprising given the history of these strains, they do support the validity of our CGH analysis methods.
Diversity of plasmid-encoded genes
As discussed above, some strains are highly similar in chromosomal gene content. When considering plasmid content, however, some of these strain pairs show large differences.
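Both the chromosomal comparisons above and the plasmid comparisons that follow come down to counting genes called present in one strain of a pair but not in the other. The minimal Python sketch below assumes that each gene has already been assigned to the chromosome or to a plasmid. That is itself an assumption, since the AM2 EPS genes show the assignment can be uncertain with CGH data. It keeps the two counts separate, so a pair can be close chromosomally yet distant in plasmid content. The function name, gene names and data are hypothetical.

```python
from collections import Counter


def pairwise_differences(present_a: set[str], present_b: set[str],
                         location: dict[str, str]) -> Counter:
    """Count genes called present in exactly one of two strains, per location.

    location maps each gene to 'chromosome' or 'plasmid'; genes without an
    entry are counted under 'unknown'.
    """
    differing = present_a ^ present_b  # symmetric difference of the two calls
    return Counter(location.get(gene, "unknown") for gene in differing)


# Hypothetical pair: identical except for two chromosomal and two plasmid genes.
location = {"c1": "chromosome", "c2": "chromosome", "c3": "chromosome",
            "c4": "chromosome", "p1": "plasmid", "p2": "plasmid"}
diff = pairwise_differences({"c1", "c2", "c3"}, {"c1", "c2", "c4", "p1", "p2"}, location)
print(diff["chromosome"], diff["plasmid"])  # 2 2
```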
Supplementary Table S6 summarizes the presence/absence of important plasmid-encoded genes and their functions in the 39 strains. Strains SK11 and AM2, similar in chromosomal gene content, are also highly similar in plasmid content, apart from several additional, putatively plasmid-encoded EPS biosynthesis genes in strain AM2. However, this could be caused by cross-hybridization with EPS genes on the chromosome, in which case these EPS genes may be located on the AM2 chromosome rather than on a plasmid. Other strains that turned out to be similar in chromosomal gene content do appear to differ in their plasmid-encoded genes. Strain HP is more closely related to FG2 in terms of plasmid content, whereas chromosomally it is most closely related to LMG6897T. This is mainly due to gene clusters involved in mobilization and restriction/modification, which are present in both HP and FG2 but absent in LMG6897T. Subsp. cremoris strains HP and LMG6897T thus differ not only in their chromosomal gene content but also in their plasmids. The same holds for MG1363 and NCDO763, which differ chromosomally in only 22 chrOGs, but NCDO763 harbours at least six plasmids while MG1363 has none (supplemental Figure S3). Strains NCDO763 and MG1363 thus mainly differ in plasmid-encoded genes.
References
Rademaker, J.L., Herbet, H., Starrenburg, M.J., Naser, S.M., Gevers, D., Kelly, W.J. et al. (2007) Diversity analysis of dairy and nondairy Lactococcus lactis isolates, using a novel multilocus sequence analysis scheme and (GTG)5-PCR fingerprinting. Appl Environ Microbiol 73: 7128-7137.
Siezen, R.J., Starrenburg, M.J., Boekhorst, J., Renckens, B., Molenaar, D., and van Hylckama Vlieg, J.E. (2008) Genome-scale genotype-phenotype matching of two Lactococcus lactis isolates from plants identifies mechanisms of adaptation to the plant niche. Appl Environ Microbiol 74: 424-436.
Siezen, R.J., Bayjanov, J., Renckens, B., Wels, M., van Hijum, S.A., Molenaar, D., and van Hylckama Vlieg, J.E. (2010a) Complete genome sequence of Lactococcus lactis subsp. lactis KF147, a plant-associated lactic acid bacterium. J Bacteriol 192: 2649-2650.
Siezen, R.J., Tzeneva, V.A., Castioni, A., Wels, M., Phan, H.T., Rademaker, J.L. et al. (2010b) Phenotypic and genomic diversity of Lactobacillus plantarum strains isolated from various environmental niches. Environ Microbiol 12: 758-773.
Wegmann, U., O'Connell-Motherway, M., Zomer, A., Buist, G., Shearman, C., Canchaya, C. et al. (2007) Complete genome sequence of the prototype lactic acid bacterium Lactococcus lactis subsp. cremoris MG1363. J Bacteriol 189: 3256-3270.