SUPPLEMENTARY INFORMATION
ISME-J 0070OAR

Extracting and determining enrichment of pQBR103 DNA for sequencing. pQBR103 was extracted (Lilley et al., 1994) and further enriched by electrophoresis and excision from low-melting-point agarose gels. DNA was recovered from agarose using phenol extraction and ethanol precipitation. The enrichment of pQBR103 plasmid DNA relative to P. fluorescens SBW25 chromosomal DNA was determined by PCR targeting the plasmid-encoded merA and merR genes and the host 16S rRNA genes (details of primers available on request). Purified plasmid and host control DNA were each serially diluted from 10^0 to 10^-6. PCR reactions (25 µl) were performed using BioLine (UK) reagents with 0.5 µl template DNA, 0.2 pmol of each primer, 2.0 mM dNTP, 1.5 mM MgCl2 and 0.1 U/µl Taq polymerase, with cycling conditions of [3 min at 94°C], 30 x [0.5 min at 94°C, 1 min at 54°C, 1 min at 72°C], then [10 min at 72°C]. Five-microlitre aliquots of the PCR reactions were examined by gel electrophoresis, and the dilution at which products were no longer seen was used to estimate the ratio of plasmid to chromosomal DNA. The final purification of pQBR103 DNA resulted in an enrichment factor of at least 100x, which was acceptable for library construction (i.e. ~80% of cloned fragments would be of pQBR103 origin). Finally, the XbaI restriction pattern of the purified pQBR103 was compared by gel electrophoresis with that obtained from total SBW25/pQBR103 and SBW25 chromosomal DNA (both isolated by the CTAB method (Sulakhe et al, 2005)), to confirm that the purified DNA was pQBR103 rather than that of a possible deletion mutant or the result of some other plasmid misidentification.

Analysis of plasmid diversity. PCR was used to amplify specific regions from pQBR plasmid DNA using pQBR103-designed primers (primer details are available on request). PCR conditions were as described for the estimation of plasmid enrichment, and the results were visualized by gel electrophoresis.
The PCR products for CDSs 061, 391, 431, 435 and Int0023 for each pQBR plasmid were recovered from agarose after electrophoresis using the QIAquick gel extraction kit (Qiagen, UK) and sequenced using ABI BigDye V3.1 (Applied Biosystems, Europe) technology.

Microarray probe production. 144 pUC19 clone-probes were chosen from the sequencing libraries to maximize the coverage of the entire pQBR103 plasmid genome. Of these, only 122 were found to amplify an appropriately sized insert using pUC19-specific primers (and 10 random PCR fragments were sequenced to confirm that the inserts were of pQBR103 DNA, data not shown). Subsequently, the 122 probe regions were found to be distributed throughout pQBR103 and to cover 69% of the genome (details of the probe regions are available on request). Probes of 1415 to 3388 bp were obtained by PCR amplification of the pUC19 clones using M13f (5’-TGTAAAACGACGGCCAGT-3’) and pUCR (5’-GCGGATAACAATTTCACACAGGA-3’). PCR reactions (50 µl) were performed using BioLine reagents with 1 µl template DNA, 0.2 pmol of each primer, 2.0 mM dNTP, 1.5 mM MgCl2, 0.045 U/µl Taq polymerase and 0.005 U/µl Pfu polymerase (Promega, UK), with cycling conditions of [3 min at 94°C], 30 x [1 min at 94°C, 1 min at 54°C, 7 min at 72°C], then [10 min at 72°C]. Aliquots (5 µl) of the PCR reactions were checked by gel electrophoresis. Successful amplifications were pooled for each clone before purification using the GenElute PCR clean-up Kit (Sigma, UK). Probe concentrations were estimated by gel electrophoresis against DNA of known concentration and were adjusted to 100 ng/µl. Each probe was suspended 50:50 in Genetix (Genetix, UK) spotting solution for amine slides and printed onto aminosilane slides using a Genetix Qarray mini microarray printer with solid tungsten 150 µm aQu pins. The slide was kept humidified for 12 hours, then baked for 30 minutes at 85°C and finally irradiated with 300J UV.

Mapping and analysis of IVET sequences.
In vivo expression technology (IVET) had been used previously to determine regions of transcriptional activity in pQBR103 specifically induced on sugar beet seedlings (Zhang et al. 2004a). Unique IVET sequences were mapped unambiguously to pQBR103, and these insertions were used to infer likely patterns of CDS expression by CDS cluster analysis. A cluster was arbitrarily defined as a series of CDSs in the same orientation as the IVET dap gene, in which CDSs upstream and downstream of the IVET insertion point were separated by ≤ 200 bp or ≤ 500 bp (i.e. a run of CDSs likely to be co-expressed by the same RNA transcript reported by the IVET fusion). IVET regions were defined by adjacent IVETs separated by ≤ 5 kb (i.e. regions are delimited by gaps of > 5 kb, which are unlikely to be spanned by the RNA transcript reported by an IVET fusion).

Plasmid and bacterial chromosome genomes used for comparison. The following sequences were used (accession numbers to the Protein tables are provided; references therein):
(Largest plasmids) pGMI1000MP (NC_003296), pSymA and pSymB (NC_003037), (NC_008043), p42c, p42d, p42e and p42f (NC_007766), pAT mega plasmid (NC_003306), pNGR234a (NC_000914), pHG1 (NC_005241), pMLa (NC_002679), plasmid 1 (NC_008242), R478 (NC_005211), and pREL1 (NC_007491);
(Pseudomonas plasmids) pCAR1 (NC_004444), p1448A-A and p1448A-B (NC_007274), pWW0 (NC_003350), pND6-1 (NC_005244), pADP-1 (NC_004956), pDTG1 (NC_004999), NAH7 (NC_007926), pDC3000A and pDC3000B (NC_004632), pPSR1 (NC_005205), Rms149 (NC_007100), pPMA4326A and pPMA4326B (NC_005918), pFKN (NC_002759) and pRA2 (NC_005909);
(Pseudomonas bacterial chromosomes) P. aeruginosa PAO1 (NC_002516), C3719 (NZ_AAKV00000000), UCBPP-PA14 (NZ_AABQ00000000), 2192 (NZ_AAKW00000000); P. entomophila L48 (NC_008027); P. fluorescens Pf-5 (NC_004129), Pf0-1 (NC_007492), SBW25; P. putida KT2440 (NC_002947), F1 (NZ_AALM00000000); P. syringae 1448A (NC_005773), B728a (NC_007005), and DC3000 (NC_004578).
Data were obtained directly from the NCBI Protein tables except for P. fluorescens SBW25, for which they were determined using Artemis and a preliminary annotation from the Wellcome Trust Sanger Institute. The presence of complete or partial insertion sequences and transposons (IS/Tn) in sample genomes was assessed by counting the number of CDSs annotated using the word ‘transposase’ (note that 1-3 transposase subunits are required to form an active transposase).

Supplementary Figure Legends

Supp. Figure 1. The closest homologues of pQBR103 CDSs are from Pseudomonas spp. Of the 478 CDSs in pQBR103, 95 have significant levels of homology to sequences found in public databases. The likely origin of these 95 sequences can be inferred from the taxonomy of the closest homologue; the main plot of homology (Expected value) versus rank demonstrates that the best homologues are from Pseudomonas spp. (red circles). The inset shows the Expected value for each CDS with significant homologies in order across the pQBR103 sequence; high homologies are spread throughout the plasmid. Colour coding: Pseudomonas spp., Red; Other -Proteobacteria excluding Pseudomonas spp., Blue; Others excluding all Proteobacteria, Green; Poor homologies from a variety of taxonomies, Black. Closest homologues for each CDS are listed in Supplementary Table 1.

Supplementary Tables

Supplementary Table 1. Annotation of functional CDS in the pQBR103 plasmid genome.

Closest homology to Pseudomonas spp.†
CDS Annotation* Species (Plasmid) E-value Taxa, Species (Plasmid) E-value % ID Accession
1 Plasmid partitioning protein, ParA P. putida (pWW0) 1.0 e-23 Alphaproteobacteria; Rhizobium meliloti (pMBA19a) 4.0 e-33 34.78 Q5BTN9_RHIME
2 Plasmid partitioning protein, ParB P.
aeruginosa 1.0 e-05 Gammaproteobacteria; Xylella fastidiosa 2.0 e-13 26.81 Q9PH83_XYLFA 28 Type II/III secretion system pilus protein, PilN-like homologue P. syringae B728a 1.0 e-15 Deltaproteobacteria; Desulfotalea psychrophila 3.0 e-21 24.31 Q6AS33_DESPS 32 Pilus assembly-related protein, PilB/TapB-like homologue P. aeruginosa 1.0 e-57 Firmicutes; 6.0 e-66 Thermoanaerobacter tengcongensis 40.25 Q8RAG1_THETN 33 Type IV pilus biogenesis-like protein No significant hit Low homology 6.60 31.25 Q1QKR2_NITHA 34 Type II/IV transmembrane secretion-related protein No significant hit Chlamydiae; Parachlamydia sp. UWE25 5.0 e-05 22.49 Q6M9X9_PARUW 35 Transmembrane pilus-related protein, PilA/ComP-like homologue No significant hit Low homology 2.20 28.28 Q3SEB6_PARTE 36 General secretion pathway protein P. fluorescens SBW25 5.0 e-05 Betaproteobacteria; Burkholderia ambifaria AMMD 3.0 e-7 38.96 Q3FDY7_9BURK 37 Arylsulfatase-activating protein-like homologue No significant hit Euryarchaeota; Methanosaeta thermophila PT 3.0 e-20 24.93 Q2CLU4_9EURY 38 Arylsulfatase-activating protein-like homologue P. fluorescens PfO-1 Euryarchaeota; Methanosaeta thermophila PT 3.0 e-16 26.29 Q2CLU4_9EURY 51 Arylsulfatase-activating protein-like homologue No significant hit Gammaproteobacteria; Vibrio parahaemolyticus 7.0 e-6 24.52 Q87IA8_VIBPA 52 Twitching motility protein, PilT-like homolgue P. putida KT2440 4.0 e-19 Deinococcus-Thermus; Thermus thermophilus HB8 2.0 e-22 34.82 Q5SHF6_THET8 53 GGDEF-family domain protein P. putida F1 5.0 e-17 Alphaproteobacteria; 1.0 e-22 Agrobacterium tumefaciens C58 29.96 Q8UB10_AGRT5 55 Plant-inducible DNA helicase, HelA (EC 3.6.1.-) P. 
fluorescens Pf-5 2.0 e-13 Deltaproteobacteria; 2.0 e-145 33.19 Pelobacter carbinolicus DSM 2380 Q3A3V6_PELCD 56 Catabolite gene activator family protein, Crp/Vfr-like homologue No significant hit Gammaproteobacteria; Photorhabdus luminescens 2.0 e-4 23.76 Q7MB98_PHOLL 57 Ribosomal protein, RpsJ/NusE/S10-like homologue No significant hit Actinobacteria; 6.0 e-5 Rubrobacter xylanophilus DSM 9941 32.65 Q3X1J8_9ACTN 65 Deoxyribonuclease P. putida KT2440 41.45 Q3JF75_NITOC 110 115 Closest homology to all taxa‡ 120 125 130 135 9.0 e-4 140 145 150 155 ISME-J 0070OAR 1.0 e-12 Gammaproteobacteria; 2.0 e-44 Page 5 Nitrosococcus oceani ATCC 19707 165 5.0 e-26 Gammaproteobacteria; 1.0 e-26 Marine proteobacterium HTCC2207 73 Catabolite gene activator family protein, Crp/Vfr-like homologue P. putida KT2440 74 Plant-inducible DNA helicase, HelC Top hit Gammaproteobacteria; P. syringae 1448A 5.0 e-117 33.92 Q48IC3_PSE14 76 Restriction-modification methylase No significant hit Betaproteobacteria; Ralstonia eutropha (pHG1) 5.0 e-104 48.27 Q7WX17_RALEU 80 Transmembrane rhomboid family protein Top hit Gammaproteobacteria; P. syringae 1448A 2.0 e-68 65.93 Q48FU5_PSE14 84 Glutathionylspermidine synthase (EC 6.3.1.9) Top hit Gammaproteobacteria; P. syringae 1448A 2.0 e-178 74.29 Q48FU9_PSE14 97 DNA-binding domain protein P. putida (pWW0) 4.0 e-10 Gammaproteobacteria; 2.0 e-19 Shewanella denitrificans OS217 28.17 Q3NZV3_9GAMM 103 Response regulator domain protein P. fluorescens Pf-5 5.0 e-08 Bacteroidetes; Salinibacter ruber DSM 13855 4.0 e-11 38.39 Q2RZC9_SALRD 104 Type IV leader peptide processing enzyme (EC 2.1.1.-, 3.4.23.43) Top hit Gammaproteobacteria; P. aeruginosa 6.0 e-75 48.96 LEP4_PSEAE 105 Site-specific recombinase, Integrase family, Int Top hit Gammaproteobacteria; P. syringae DC300 1.0 e-85 42.82 Q881N3_PSESM 110 Site-specific recombinase, Integrase family, Int Pseudomonas sp. 
ND6 (pND6-1) 6.0 e-16 Gammaproteobacteria; 3.0 e-66 45.19 Xanthomonas campestris 85-10 (pXCV183) Q3C033_XANCS 113 Plasmid partitioning protein, ParB P. putida F1 6.0 e-10 Actinobacteria; 7.0 e-11 Symbiobacterium thermophilum 43.06 Q67J37_SYMTH 119 Transmembrane thiol:disulfide interchange protein, DsbD-like (EC 1.8.1.8) P. syringae DC300 2.0 e-75 Gammaproteobacteria; Azotobacter vinelandii AvOP 1.0 e-81 36.93 Q4IWJ7_AZOVI 123 Transmembrane protein Top hit Gammaproteobacteria; P. resinovorans (pCAR1) 8.0 e-29 40.23 Q8GHV0_PSERE 124 Zn-dependent protease with chaperone function Top hit Gammaproteobacteria; P. aeruginosa 4.0 e-63 54.65 Q9HVF9_PSEAE 126 Transmembrane protein, TolA-like homolgue Top hit Gammaproteobacteria; P. aeruginosa 1.0 e-13 29.65 TOLA_PSEAE 128 DNA-binding domain protein P. syringae B728a Betaproteobacteria; Polaromonas sp. JS666 2.0 e-36 46.53 Q4ASP6_9BURK 131 Transmembrane thiol:disulfide interchange protein, DsbD-like Top hit Gammaproteobacteria; P. syringae DC300 6.0 e-8 27.61 Q87VS7_PSESM 134 Transmembrane autotransporter Top hit Gammaproteobacteria; P. syringae B728a 6.0 e-129 40.65 Q4ZYT2_PSEU2 149 NAD-dependent deacetylase (EC 3.5.1.-) Top hit Gammaproteobacteria; P. fluorescens Pf-5 1.0 e-71 57.02 Q4KDX3_PSEF5 151 Nucleoid-associated protein, NdpA-like homologue Top hit Gammaproteobacteria; P. fluorescens Pf-5 1.0 e-154 81.08 Q4KHU2_PSEF5 155 Pilin-related protein, PilV-like homologue P. aeruginosa (pKLC102) Gammaproteobacteria; Serratia entomophila 1.0 e-6 Q7BQX0_9ENTR 160 31.25 Q1YVH0_9GAMM 170 175 180 185 190 195 200 205 2.0 e-4 210 215 220 ISME-J 0070OAR 5.0 e-05 33.33 Page 6 157 UV resistance protein, RulA Top hit Gammaproteobacteria; P. putida (pNAH7) 7.0 e-34 158 UV resistance protein, RulB Top hit Gammaproteobacteria; P. 
putida (pNAH7) 1.0 e-149 59.86 Q1XGP6_PSEPU 160 Conjugal transfer protein, TrbN-like homologue No significant hit Low homology 3.30 e-1 33.33 Q47073_ECOLI 171 Ribonuclease HII (EC 3.1.26.4) Top hit Gammaproteobacteria; P. fluorescens PfO-1 2.0 e-75 72.87 RNH2_PSEPF 175 Transcriptional regulator DnaK suppressor protein, TraR/DksA-like homologue Top hit Gammaproteobacteria; P. syringae DC300 2.0 e-19 44.09 Q87ZE2_PSESM 178 DNA-binding protein, Hu P. aeruginosa Gammaproteobacteria; Methylococcus capsulatus 2.0 e-25 61.8 Q60BE5_METCA 181 Exodeoxyribonuclease, RecD/TraA-like homologue No significant hit Alphaproteobacteria; Acidiphilium cryptum JF-5 1.0 e-18 20.94 Q2DE92_ACICY 182 Conjugal transfer TraG-family coupling protein P. syringae DC300 (pDC300A) Betaproteobacteria; Burkholderia vietnamiensis G4 3.0 e-33 25.68 Q4BGZ3_BURVI 188 Conjugal transfer TraB-family topoisomerase No significant hit Low homology 4.30 e-2 21.84 Q54WI5_DICDI 189 Conjugal transfer assembly protein, TraV-like homologue No significant hit Gammaproteobacteria; 6.0 e-5 Legionella pneumophila Philadelphia 1 33.64 Q5ZTS3_LEGPH 209 DNA primase, DnaG-like homologue No significant hit Spirochaetes; Borrelia burgdorferi 8.0 e-8 23.43 PRIM_BORBU 213 GGDEF two-component response regulator Top hit Gammaproteobacteria; P. aeruginosa 3.0 e-58 38.06 Q9HUW7_PSEAE 249 Restriction enzyme-related protein No significant hit Deltaproteobacteria; 2.0 e-13 Anaeromyxobacter dehalogenans 2CP-C 51.32 Q2IGC5_ANADE 255 Bacteriophage-related protein of unknown function P. fluorescens SBW25 1.0 e-20 dsDNA virus; Pseudomonas phage F116 2.0 e-29 37.87 Q5QF30_9CAUD 289 Plasmid conjugal transfer inhibition protein, Tir-like No significant hit Gammaproteobacteria; Erwinia amylovora (pEL60) 1.0 e-12 32.81 Q6TFZ5_ERWAM 301 Ankyrin repeat-containing protein No significant hit Eukaryota; Mus musculus 2.0 e-12 34.52 Q8C8R3_MOUSE Gammaproteobacteria; P. 
syringae 1448A 1.0 e-51 45.33 Q48GQ0_PSE14 Gammaproteobacteria; Halorhodospira halophila SL1 5.0 e-21 32.64 Q2CS52_ECTHA 318 Plant-inducible oligoribonuclease, Orn No significant hit (EC 3.1.-.-) Eukaryota; 2.0 e-29 Tetrahymena thermophila SB210 42.94 Q22ZB0_TETTH 327 Bacteriophage-related protein of unknown function No significant hit Low homology 8.80 e-2 32.26 Q853S7_9CAUD 344 Transcriptional regulator, AlgZ-like homologue P. fluorescens Pf-5 Gammaproteobacteria; Azotobacter vinelandii AvOP 5.0 e-19 51.55 Q4J155_AZOVI 53.9 Q1XGP5_PSEPU 225 230 235 2.0 e-25 240 245 7.0 e-12 250 255 260 265 270 307 Acetyltransferase GNAT family protein Top hit 275 311 Stringent starvation protein, SspA-like homologue P. syringae 1448A 1.0 e-42 280 285 ISME-J 0070OAR 1.0 e-18 Page 7 290 350 RNA polymerase sigma-32 factor, RpoH-like homologue Top hit Gammaproteobacteria; P. aeruginosa 2.0 e-84 62.45 RP32_PSEAE 361 Plant-inducible DNA helicase, HelB Top hit Gammaproteobacteria; P. resinovorans (pCAR1) 0.0 54.86 Q8GHN5_PSERE 364 Cold shock DNA-binding domain protein, Csp-like homologue Top hit Gammaproteobacteria; P. putida KT2440 7.0 e-41 39.18 Q88Q61_PSEPK 367 DnaJ family protein P. fluorescens SBW25 8.0 e-08 Gammaproteobacteria; 5.0 e-15 Nitrosococcus oceani ATCC 19707 54.29 Q3JC07_NITOC 371 DNA helicase P. aeruginosa Betaproteobacteria; Burkholderia vietnamiensis G4 2.0 e-56 32.85 Q4BMY9_BURVI 375 Bacteriophage-related protein of unknown function No significant hit dsDNA viruses; Vibrio phage K139 2.0 e-12 36.48 Q8W758_9CAUD 377 Transcriptional regulator, AlgZ-like homologue Top hit Gammaproteobacteria; P. aeruginosa 5.0 e-04 46 Q9RPY7_PSEAE 383 Plasmid IncA/C–Inc P3 replication protein, RepA No significant hit Gammaproteobacteria; Escherichia coli (pRA1) 1.0 e-61 44.56 Q08896_ECOLI 401 DNA helicase (EC 3.6.1.-) Top hit Gammaproteobacteria; P. syringae B728a 7.0 e-130 48.28 403 DNA restriction methylase Top hit Gammaproteobacteria; 1.0 e-72 Pseudomonas sp. 
ND6 (pND6-1) 35.87 Q6XUK5_9PSED 407 DNA ligase, bacteriophage-like homologue No significant hit dsDNA viruses; Bacteriophage KVP40 6.0 e-28 27.52 Q6WI94_BPKV4 413 Single-strand binding protein, Ssb-like homologue Top hit Gammaproteobacteria; P. syringae DC300 3.0 e-29 38.74 SSB_PSESM 421 Response regulator receiver domain protein Top hit Gammaproteobacteria; P. aeruginosa 1.0 e-34 53.23 PILG_PSEAE 426 Tn5042-like transposase, TnpA Top hit 2.0 e-61 99.15 Q70MR7_PSEFL 427 Tn5042-like transposase, TnpB Top hit Gammaproteobacteria; P. fluorescens Gammaproteobacteria; P. fluorescens 1.0 e-61 100 Q70MR8_PSEFL 428 Tn5042-like transposase, TnpC Top hit Gammaproteobacteria; P. fluorescens 0.0 99.4 Q70MR9_PSEFL 430 Tn5042-like organomercurial lyase, MerB (EC 4.99.1.2) Top hit Gammaproteobacteria; P. fluorescens 2.0 e-115 99.06 Q70MS1_PSEFL 431 Tn5042-like mercuric ion reductase, MerA (EC 1.16.1.1) Top hit Gammaproteobacteria: P. fluorescens 0.0 96.25 Q70MS2_PSEFL 432 Tn5042-like inner membrane mercury ion uptake protein, MerC Top hit Gammaproteobacteria; P. fluorescens 1.0 e-55 93.75 Q53IQ9_PSEFL 433 Tn5042-like periplasmic mercuric ion binding protein, MerP Top hit Gammaproteobacteria; P. fluorescens 9.0 e-44 100 Q53IR0_PSEFL 434 Tn5042-like mercuric ion transport protein, MerT Top hit Gammaproteobacteria; P. fluorescens 3.0 e-43 99.14 Q53IQ8_PSEFL 435 Tn5042-like Mer operon activator/repressor, MerR Top hit Gammaproteobacteria; P. fluorescens 1.0 e-76 99.3 Q70MS3_PSEFL 295 7.0 e-11 300 305 5.0 e-04 310 315 320 Q4ZWH0_PSEU2 325 330 335 340 345 350 ISME-J 0070OAR Page 8 355 360 365 438 Recombination-associated protein, RdgC-like homologue Top hit Gammaproteobacteria; P. fluorescens Pf-5 4.0 e-93 61.76 Q4K8D9_PSEF5 443 Carbon storage translational RsmA/CsrA family regulator, RsmA-like homologue Top hit Gammaproteobacteria; P. 
fluorescens Pf-5 4.0 e-14 73.08 Q4KEY0_PSEF5 445 Plasmid IncA/C–Inc P3 replication protein, RepA No significant hit Gammaproteobacteria; Buchnera aphidicola (pBPs2) 1.0 e-33 29.61 Q9ZER8_9ENTR 455 Plasmid partitioning protein, ParB P. syringae 1448A 1.0 e-18 Betaproteobacteria: Ralstonia metallidurans CH34 1.0 e-61 30.88 Q5NUW9_RALME 461 Exodeoxyribonuclease I P. putida KT2440 1.0 e-10 Gammaproteobacteria; Methylococcus capsulatus 4.0 e-15 24.77 Q605A2_METCA 464 DNA polymerase III subunit (EC 2.7.7.7) P. syringae 1448A 4.0 e-31 Gammaproteobacteria; Vibrio fischeri ATCC 70601 1.0 e-31 28.53 Q5E8Z1_VIBF1 465 RNA polymerase sigma factor, RpoD-like homologue P. putida F1 5.0 e-17 Betaproteobacteria; Ralstonia solanacearum 2.0 e-20 23.95 Q8XXA1_RLSO 470 Sensory transduction histidine kinase P. aeruginosa 3.0 e-13 Gammaproteobacteria; 2.0 e-19 Nitrosococcus oceani ATCC 19707 31.42 Q3JET8_NITOC 471 Two-component regulator sensor histidine kinase fused response regulator protein, PilL-like homologue (EC 2.7.3.-) P. aeruginosa 9.0 e-73 Gammaproteobacteria; 5.0 e-78 Alkalilimnicola ehrlichei MLHE-1 35.27 Q34VJ7_9GAMM 472 Chemotaxis signalling protein, PilK-like homologue Top hit 2.0 e-24 31.58 Q51346_PSEAE 473 Methyl-accepting chemotaxis P. fluorescens SBW25 2.0 e-22 transducer protein, PilJ-like homologue Gammaproteobacteria; 8.0 e-25 Alkalilimnicola ehrlichei MLHE-1 24.06 Q34VJ8_9GAMM 474 Pilus-related protein, PilI-like homologue No significant hit Gammaproteobacteria; Xylella fastidiosa 2.0 e-6 27.66 Q9PC31_XYLFA 475 Two-component response regulator transcriptional regulatory protein, PilG-like homologue P. aeruginosa 3.0 e-17 Gammaproteobacteria; Psychrobacter arcticum 1.0 e-19 42.98 Q4FQP2_PSYAR 477 Plasmid partitioning protein, ParB P. putida F1 9.0 e-29 Gammaproteobacteria; Legionella pneumophila Lens 2.0 e-36 35.23 Q5WTL1_LEGPL 370 375 380 385 Gammaproteobacteria; P. 
aeruginosa

* Annotation based on inspection of homology data over the entire length of the CDS using predicted protein sequences. Assignments were made only if homologous sequences had functional annotations, except for a limited number of CDSs annotated on the basis of domain-only homology or as bacteriophage-related proteins of unknown function. Enzyme Commission (EC) numbers were assigned by the GNARE system (Sulakhe et al, 2005).
† The closest BLAST homology to a Pseudomonas spp. chromosomal sequence or a Pseudomonas plasmid sequence, unless the homology was the highest observed (Top hit), in which case the data are provided in the adjacent ‘Closest homology to all taxa’ columns. ‘No significant hit’ indicates no significant homology match to any Pseudomonas sequence.
‡ The closest BLAST homology observed from any taxa including Pseudomonas. ‘Low homology’ indicates very weak levels of homology over a reasonable length of the CDSs or supporting genomic context. % ID is with no gaps.

Supplementary Table 2. Clustering of the predicted proteins in pQBR103 into gene families.
Cluster | Comment | CDS | Mixed
1 | ParB partition | 113, 455, 477 |
2 | Pilin-associated | 36, 155 |
3 | Arylsulfatase regulator-like | 37, 38, 39, 42, 47, 50, 51* | F,F,C,C,C,C,F
4 | Twitching motility protein | 32, 52 |
5 | DNA-binding domain protein | 97, 221, 223, 337 | F,C,C,C
6 | Response regulator | 103, 213, 421, 471, 475 |
7 | Transmembrane thiol:disulfide interchange protein | 119, 131 |
8 | Transmembrane | 126, 129 | C,F
9 | Transcriptional regulator | 344, 347 | F,O
10 | RNA polymerase sigma factor | 350, 465 |
11 | Plasmid replication protein | 383, 445 |
12 | Conserved hypothetical | 43, 44, 46* |
13 | Conserved hypothetical | 391, 416 |
14 | Conserved hypothetical/Orphan | 240, 375 | C,O
15 | Conserved hypothetical/Orphan | 327, 328 | C,O
16 | Orphan | 125, 156 |
17 | Orphan | 216, 244, 247, 265 |
18 | Orphan | 217, 218 |
19 | Orphan | 233, 238 |

CDS regions were placed into protein families using the mcl clustering method at different stringencies (Enright et al, 2002). Zero clusters formed at an inflation level of 3.0 or above. Six and nineteen clusters formed respectively at the less conservative thresholds of 2.0 (underlined) and 1.2. Possible functional characteristics for each gene family are based on the annotation of functional CDSs within each cluster. If a cluster contained a combination of CDSs it is labelled ‘mixed’ (F, functional, CH, conserved hypothetical, O, orphan). Contiguous CDSs are marked in bold.
* Clusters 3 and 13 are transitively linked at very low homology.

Supplementary Table 3. Potential VirB/D4 T4SS-like transfer elements

Closest VirB/D4 homologue: columns Vir*, %ID†, E-value and Accession.
CDS | Annotation | Vir* | %ID† | E-value | Accession
160 | Conjugal transfer TrbN-like homologue | VirB1 | 25/43 | 6 x E-3 | YP_190354.1
175 | Transcriptional regulator, TraR/DksA-like homologue | - | 44/62 | 2 x E-21 | AAO56962.1
181 | Exodeoxyribonuclease/helicase, RecD/TraA-like homologue | VirB6 | 21/37 | 5 x E-16 | ZP_01144634.1
182 | Conjugal transfer TraG-family coupling protein | VirD4 | 25/43 | 2 x E-33 | ZP_00242688.1
188 | Conjugal transfer TraB-family topoisomerase | VirB10 | 26/40 | 1 x E-05 | AAW83066.1
189 | Conjugal transfer TraV-like homologue | VirB7 | 33/43 | 9 x E-05 | AAU28154.1
191 | Conjugal transfer TraC-like homologue | VirB4 | 27/45 | 4 x E-32 | AAW83069.1

The Agrobacterium tumefaciens pTi (pTiC58) VirB/D4 system is used as the paradigm of Mating pair formation (Mpf) and related type IV secretion systems (Schroder & Lanka, 2005). It contains 12 genes, VirB1-11, D4, which are conserved in a large number of secretion systems transporting DNA. These systems have 9-14 genes, and the order of these differs between transfer systems. Additional Mpf/type IV genes are Tra (IncFI F) and Trb (IncP RP4).
* The most likely VirB/D4 homologue is indicated for each CDS. No significant homologues to VirB3, B5, B6, B8, B9 or B11 in pQBR103 were identified. The CDSs in between the CDSs indicated had no homology to proteins involved in Mpf/Type IV secretion systems.
† % identity and similarity are given.

Supplementary Table 4. Plant-inducible IVET fusions and potential transcriptional regions.

Region | IVET | Position & Transcription | Number of CDS in Cluster likely to be expressed by transcription reported by IVET fusion (CDS) (F/CH/O) | Comment
1 | CP-dS3 | 16,565 (-) | 0† |
2 | G4-8S2 | 57,469 (-) | 0† |
2 | G3-1S3 | 58,992 (+) | 2 (054-055) (1/1/0) | HelA is CDS 055.
2 | p8-c2 (ppi4) | 62,487 (+) | 2 (054-055) (1/1/0) |
3 | BB-8S1 | 72,580 (-) | 0† |
4 | R6 (ppi17) | 80,188 (+) | 1 (074) (1/0/0) | HelC is CDS 074.
5 | BB-5S3 | 161,288 (+) | 0† |
6 | G3-4S5 | 171,715 (+) | 1 (163) (0/0/1) [2 (163-164) (0/0/2)] |
7 | BB-2S2 | 180,322 (-) | 2 (172-171) (1/0/1) |
8 | G4-7R1 | 225,860 (-) | 0† |
9 | G4-7S2 | 242,651 (-) | 0† |
9 | L2-R3 | 245,345 (+) | 7 (243-249) (1/0/6) [11 (239-249) (1/1/9)] |
9 | G5-6R4 | 246,481 (-) | 0† |
9 | G4-10S1 | 247,293 (-) | 0† |
10 | R4' | 257,955 (+) | 1 (265) (0/0/1) |
10 | CP-dR1 | 259,778 (+) | 3 (268-270) (0/0/3) |
11 | G2-3R1 | 269,276 (-) | 1 (283) (0/0/1) |
11 | R5 | 271,081 (-) | 0† |
12 | G4-2R2 | 278,449 (+) | 0† |
13 | G4-6S1 (pIVETD5) | 296,034 (+) | 9 (315-323) (1/1/7) | Orn is CDS 318.
14 | G3-2R6 | 305,792 (-) | 0† |
14 | G2-5S1 | 309,926 (-) | 0† |
14 | CP5-1S | 312,662 (-) | 0† |
14 | G3-4R3 | 313,690 (+) | 6 (344-349) (1/1/4) [32 (332-363) (3/4/25)] |
15 | CP-eR1 | 322,194 (+) | 13 (351-363) (1/1/11) [32 (332-363) (3/4/25)] | HelB is CDS 361.
16 | CP4-1R | 374,762 (+) | 9 (421-429) (4/1/4) | Adjacent to Tn5042. TnpA is CDS 426.
17 | R3' | 393,119 (-) | 0† |
17 | G1-4R1 | 394,575 (-) | 0† |

Estimate of CDS potentially expressed in planta (≤ 200 bp separation): 65 (14%) 11/7/47 (12/7/17%)
Estimate of CDS potentially expressed in planta (≤ 500 bp separation): 83 (17%) 12/10/61 (13/10/22%)
Number of IVET fusions apparently not reporting CDS transcription: 15 (52%)
Number of regions apparently not reporting CDS transcription: 6 (35%)
Number of regions in which transcription is reported in both orientations: 4 (24%)

All Dap-end IVET sequences were mapped onto the pQBR103 genome sequence. IVET names and synonyms are given. Of the 37 reported in Zhang et al (2004a), the above were found to be unique after Dap-end sequences were compared to the completed pQBR103 genome. The position of each IVET insertion point is given, plus the direction of transcription (+/- strand) reported by the IVET fusion. CDS clusters are defined in the Supporting text and are those likely to be transcribed as indicated by each IVET fusion (calculated assuming ≤ 200 bp between IVET insertion point and closest CDS, and with ≤ 200 bp between adjacent CDS in the same orientation [or assuming ≤ 500 bp]). The number of CDSs in each cluster is given, followed by the CDSs and F/CH/O numbers in parentheses (F, functional, CH, conserved hypothetical, O, orphan). † indicates that no CDSs are likely to be expressed in that orientation, and that a cluster of ≥ 1 CDS exists in the opposite orientation. The 11 F CDS identified with ≤ 200 bp separation are: CDS 055, 074, 171, 249, 318, 344, 361, 421, 426, 427 and 428; the additional CDS (≤ 500 bp) is CDS 350. IVET insertion positions have been grouped into regions where adjacent IVETs are separated by ≤ 5 kb. The transcription reported by each IVET may occur in a position containing CDSs or in a position without CDSs (> 200 or > 500 bp of a CDS). Further, IVET transcription may support the direction of transcription inferred by the orientation of CDSs or it may not. Finally, transcription reported by adjacent IVETs may be in agreement or suggest convergent/divergent/overlapping transcription.

Supplementary Table 5. PCR surveys were used to demonstrate the presence or absence of pQBR103 regions in other pQBR plasmids.
pQBR Plasmid | | 4 | 41 | 42 | 44 | 47 | 29 | 55 | 57
Group | | I | I | I | I | I | III | III | IV
Estimated Size (kb) | | 321 | 301 | 367 | 130 | 256 | 174 | 149 | 261
CDS | Annotation | | | | | | | |
027 | Orphan | + | + | + | - | + | - | - | -
028 | Type II/III secretion system pilus protein | + | + | + | - | + | - | - | -
029 | Orphan | + | + | + | - | + | - | - | -
030 | Orphan | + | + | + | - | + | - | - | -
031 | Orphan | + | + | + | - | + | - | - | -
032 | Pilus assembly-related protein | + | + | + | - | + | - | - | -
033 | Pilus biogenesis-like protein | + | + | + | - | + | - | - | -
034 | Transmembrane secretion-related protein | + | + | + | - | + | - | - | -
035 | Transmembrane pilus-related protein | + | + | + | - | + | - | - | -
036 | Transmembrane secretion-related protein | + | + | + | - | + | - | - | -
IV0023 | Between CDS053-54, overlapping 54 | + | + | + | - | + | - | - |
055 | Plant-inducible DNA helicase, HelC | + | + | + | - | + | - | - |
062 | Conserved hypothetical | + | + | + | - | + | - | - |
144 | Orphan | + | + | + | - | - | - | - |
151 | Nucleoid-associated protein | + | + | + | - | - | - | - |
156 | Orphan | + | + | + | - | - | - | - |
NI0730 | Between CDS215-216, overlapping 216 | + | + | + | - | - | - | - |
249 | Restriction enzyme-related protein | + | + | + | - | - | - | - |
IV0036 | Between CDS282-283, overlapping 283 | + | + | + | - | - | - | - |
286 | Conserved hypothetical | + | + | + | - | - | - | - |
297 | Orphan | + | + | + | - | - | - | - |
310 | Orphan | + | + | + | - | - | - | - |
318 | Plant-inducible oligoribonuclease, Orn | + | + | + | - | + | - | - |
361 | Plant-inducible DNA helicase, HelB | + | + | + | + | + | - | - |
371 | Helicase | + | + | + | + | + | - | - |
391 | Conserved hypothetical | + | + | + | + | + | - | - |
431 | Tn5042-like Mercuric ion reductase, MerA | + | + | + | + | + | + | + | +
435 | Tn5042-like Mer activator/repressor, MerR | + | + | + | + | + | + | + | +
IV0034 | Between CDS453-454, overlapping 454 | + | + | + | + | + | - | - |
470 | Transcriptional regulator methylesterase | + | + | + | + | + | - | - | -
471 | Two-component regulator sensor histidine kinase fused response regulator protein | + | + | + | + | + | - | - | -
472 | Chemotaxis signalling protein, CheR | + | + | + | + | + | - | - | -
473 | Methyl-accepting chemotaxis transducer protein | + | + | + | + | + | - | - | -
474 | Pilus-related protein | + | + | + | + | + | - | - | -
475 | Two-component response regulator transcriptional regulatory protein | + | + | + | + | + | - | - | -

Note that pQBR57 was only surveyed for these regions plus CDS 431 and CDS 435.
IV/NI regions are intergenic regions rather than specific CDSs. All primer pairs amplified pQBR103 DNA but not P. fluorescens SBW25 or P. putida UWC1 DNA. PCR survey results: +, successful amplification with appropriately sized fragment; -, amplification not detected.
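
The region definition used for Supplementary Table 4 (adjacent IVET insertions separated by ≤ 5 kb are placed in the same region) can be illustrated with a short Python sketch. This is not the authors' procedure, which is not described beyond the threshold itself. The function name group_into_regions and its max_gap parameter are hypothetical, and the insertion positions are transcribed from Supplementary Table 4.

```python
# Illustrative sketch only (hypothetical helper, not the authors' code):
# group IVET insertion points into regions using a maximum gap threshold.

# Insertion positions (bp) transcribed from Supplementary Table 4.
IVET_POSITIONS = [
    16565, 57469, 58992, 62487, 72580, 80188, 161288,
    171715, 180322, 225860, 242651, 245345, 246481, 247293,
    257955, 259778, 269276, 271081, 278449, 296034, 305792,
    309926, 312662, 313690, 322194, 374762, 393119, 394575,
]


def group_into_regions(positions, max_gap=5000):
    """Group positions so that adjacent members are at most max_gap bp apart."""
    regions = []
    for pos in sorted(positions):
        if regions and pos - regions[-1][-1] <= max_gap:
            regions[-1].append(pos)  # within the threshold: extend current region
        else:
            regions.append([pos])    # gap exceeds the threshold: start a new region
    return regions


regions = group_into_regions(IVET_POSITIONS)
for number, members in enumerate(regions, start=1):
    print(number, members)
```

With these positions and a 5 kb threshold, the sketch yields 17 regions, matching the Region column of Supplementary Table 4.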