SUPPLEMENTARY MATERIAL

Figure S1. Electrostatic potential surface representations of AraC/XylS family members. The electrostatic potentials were calculated with DelPhi (version 4, release 1.1), and the surfaces and images were generated with PyMOL. A) View of the side of the proteins facing the DNA. B) View of the opposite side of the proteins, obtained by rotating them by 180 degrees. Blue indicates a positive potential, red a negative potential and white a neutral potential.

Figure S2. Genetic context of predicted binding sites of the uncharacterized AraC/XylS transcription factors. The distance (in base pairs) from the hypothetical binding site is provided for each gene. Genes with a function related to the proposed biological role of the AraC/XylS transcription factor are shown in blue; genes with an unknown or not obviously related function are shown in black. In the majority of cases, the function of these genes was obtained by sequence similarity to other genes or from the InterPro database annotation for their families.

Table S1. List of 62 well-characterized AraC/XylS-family transcriptional regulators with known biological function

| Protein^a | Organism | Biol. process^b | Regulatory function | Category^c | Ref.^d |
|---|---|---|---|---|---|
| AarP (P43463) | Providencia stuartii | Antibiotic resistance | Regulates the transcription of 2'-N-acetyltransferase, which is capable of acetylating both peptidoglycan and certain aminoglycoside antibiotics. | S+V | 7768849, 10390241 |
| Ada (P06134) | Escherichia coli (strain K12) | DNA damage response | Repairs O6-methylguanine and O4-methylthymine and induces transcription of genes involved in the adaptive response to alkylation damage in DNA. | S+V | 3529081 |
| Ada (P26189) | Salmonella typhimurium | DNA damage response | Repairs O6-methylguanine and O4-methylthymine and induces transcription of genes involved in the adaptive response to alkylation damage in DNA. | S+V | 1904855 |
| AdaA (P19219) | Bacillus subtilis | DNA damage response | Repairs O6-methylguanine and O4-methylthymine and induces transcription of genes involved in the adaptive response to alkylation damage in DNA. | S+V | 8376346, 2120677 |
| AdiY (P33234) | Escherichia coli (strain K12) | Acid resistance | Regulates arginine-dependent acid resistance, is involved in the glutamate-dependent response to acid stress, and is associated with activation of virulent strains of E. coli in the stomach. | S+V | 8795195, 21034467 |
| AggR (P43464) | Escherichia coli | Cell adhesion | Transcriptional activator of aggregative adherence fimbria I expression in enteroaggregative E. coli. | V | 7913930 |
| AlkR (O31249) | Acinetobacter sp. (strain ADP1) | Alkane metabolism | Activates the expression of the alkane 1-monooxygenase that permits the use of alkanes with at least twelve carbon atoms as the sole source of carbon and energy. | M | 9811637 |
| AppY (P05052) | Escherichia coli (strain K12) | Phosphate starvation response | Induces the synthesis of acid phosphatase (AppA) and several other polypeptides (such as AppBC) during the deceleration phase of growth in response to phosphate starvation. Also involved in the stabilization of the sigma stress factor RpoS during stress conditions. | S | 18383615 |
| AraC (P07642) | Erwinia chrysanthemi | Plant carbohydrates metabolism | Controls the expression of at least six genes that are involved in the transport and catabolism of L-arabinose, an aldopentose sugar found in plant gums, pectins and bacterial cell wall polysaccharides. | M | 3902795 |
| AraC (P0A9E0) | Escherichia coli (strain K12) | Plant carbohydrates metabolism | Controls the expression of at least six genes that are involved in the transport and catabolism of L-arabinose, an aldopentose sugar found in plant gums, pectins and bacterial cell wall polysaccharides. | M | 12596232 |
| Caf1R (P26950) | Yersinia pestis | Capsule formation | Regulates the caf1M1A1 operon. The caf operon products constitute a fimbrial chaperone-usher system that acts to assemble and export F1 capsule components. | V | 1633857, 19103769 |
| CfaD (P25393) | Escherichia coli | Cell adhesion | Transcriptional activator of the CFA/I adhesin (cfAA) gene of enterotoxigenic E. coli. | V | 1971911 |
| ChbR (P17410) | Escherichia coli (strain K12) | Carbohydrates metabolism | Regulates the expression of the chbBCARFG operon for the uptake and metabolism of chitobiose (the major breakdown product of chitin degradation). | M | 15066032 |
| CsvR (P43460) | Escherichia coli | Cell adhesion | Transcriptional activator of the fimbrial gene in enterotoxigenic E. coli. | V | 1685133 |
| EnvY (P10805) | Escherichia coli (strain K12) | Temperature stress response | Regulates the temperature-dependent expression of several E. coli envelope proteins, most notably the porins ompF and ompC and the lambda receptor, lamB. | S | 2536924 |
| EutR (P36547) | Escherichia coli (strain K12) | Amine metabolism | Activates the transcription of the eut operon, necessary for uptake and metabolism of ethanolamine. It was suggested that the breakdown of ethanolamine contributes to disrupting gut functions, in particular innate immune functions. | M+V | 20234377 |
| ExsA (P26993) | Pseudomonas aeruginosa | Type III secretion system | Promotes the expression of the type III secretion system in response to low levels of Ca2+. | V | 16714561, 9618447, 11119548 |
| FapR (P23774) | Escherichia coli | Cell adhesion | Regulates the expression of the 987P operon for the fimbrial protein in enterotoxigenic E. coli. | V | 2077360 |
| FeaR (Q47129) | Escherichia coli (strain K12) | Amine metabolism | Regulates the tynA/maoA and feaB/padA genes necessary for 2-phenylethylamine catabolism. | M | 9043126 |
| GadW (P63201) | Escherichia coli (strain K12) | Acid resistance | GadW regulates the gadA and gadBC genes that regulate acid tolerance and virulence gene expression in response to environmental cues within the gastrointestinal tract. | S+V | 12730179 |
| GadX (Q9EYV5) | Escherichia coli O127:H6 | Cell adhesion, Type III secretion system, Acid resistance | GadX regulates perABC, gadA and gadB, which regulate acid tolerance and virulence gene expression in response to environmental cues within the gastrointestinal tract. | S+V | 17576759 |
| HilD (P0CL08) | Salmonella typhimurium | Type III secretion system | Regulates the expression of hilA, the main transcriptional regulator of the Salmonella pathogenicity island 1 (SPI1). | V | 11442828 |
| HrpB (P31778) | Ralstonia solanacearum (Pseudomonas solanacearum) | Type III secretion system | Regulates the hrp operon. hrp genes encode a specialized type III secretion system. | V | 12374732 |
| InvF (P69343) | Salmonella typhimurium | Type III secretion system | Regulates the expression of genes encoding the secreted effector molecules Sip/Ssp ABCD, SigD, SptP and SopE, necessary for the type III secretion system. | V | 11296219 |
| LacR (O33813) | Staphylococcus xylosus | Mammalian carbohydrates metabolism | Regulates the expression of the lacPH (lactose permease and beta-galactosidase) genes for lactose utilization. | M | 9573174 |
| LumQ (Q51872) | Photobacterium leiognathi | Riboflavin synthesis and/or luminescence | Probably regulates riboflavin synthesis and/or luminescence. Is encoded in the lux operon that is linked to the luminescence genes in some Photobacterium species. | M | 17586644 |
| MarA (P0ACH5) | Escherichia coli (strain K12) | Antibiotic resistance | Regulates the expression of the mar regulon that confers resistance to a variety of antibiotics. | S+V | 9333027, 11104814 |
| MelR (P0ACH8) | Escherichia coli (strain K12) | Plant carbohydrates metabolism | Transcriptional activator for the expression of the melAB operon in response to the availability of melibiose. | M | 16621812 |
| MmsR (P28809) | Pseudomonas aeruginosa | Amino acid metabolism | Regulates the expression of the mmsAB operon. The operon contains two structural genes involved in valine metabolism. | M | 1339433 |
| MsmR (Q00753) | Streptococcus mutans | Plant carbohydrates metabolism | Regulates the expression of the msm operon responsible for the transport and metabolism of melibiose, raffinose, and isomaltosaccharides. | M | 1537846, 8432594 |
| MxiE (P0A2S7) | Shigella flexneri | Type III secretion system | Controls transcription of a set of genes encoding proteins that transit through the type III secretion system apparatus. | V | 16428428 |
| OruR (P72171) | Pseudomonas aeruginosa | Amino acid metabolism | Regulates the expression of genes involved in ornithine metabolism. | M | 9401045 |
| PchR (P40883) | Pseudomonas aeruginosa | Iron stress | Regulates the expression of the pyoverdin and pyochelin siderophores (iron chelators), which contribute to virulence. | S+V | 10722571, 15375116 |
| PerA (P43459) | Escherichia coli O127:H6 | Cell adhesion | Regulates the expression of the eaeA gene that is associated with attaching and effacing lesions and encodes intimin, a 94-kDa outer membrane protein. | V | 7729884 |
| PocR (Q05587) | Salmonella typhimurium | Carbohydrates metabolism | Regulates the expression of the pdu and cob operons that regulate the metabolism of the L-fucose subproduct anaerobic metabolite 1,2-propanediol. | M | 8071226 |
| PqrA (Q52620) | Proteus vulgaris | Antibiotic resistance, Oxidative stress response | Confers multidrug resistance in a way similar to that of the soxS and marA genes in E. coli. | S+V | 7726514 |
| RafR (P43465) | Pediococcus pentosaceus | Plant carbohydrates metabolism | Regulates the raffinose operon. Raffinose is a trisaccharide composed of galactose, fructose, and glucose found in many plants. | M | 2180920 |
| RamA (P55922) | Enterobacter cloacae | Antibiotic resistance | Is involved in resistance to multiple antibiotics through regulation of the expression of the OmpF porin and the efflux pump AcrAB. | S+V | 21811569 |
| RhaR (P09378) | Escherichia coli (strain K12) | Plant carbohydrates metabolism | Regulates the rhaBAD operon, which encodes enzymes for catabolism of rhamnose. | M | 8757746 |
| RhaS (P09377) | Escherichia coli (strain K12) | Plant carbohydrates metabolism | Regulates the rhaBAD operon, which encodes enzymes for catabolism of rhamnose. | M | 8230210 |
| RhrA (Q9Z3Q6) | Rhizobium meliloti (Ensifer meliloti) | Activation of siderophores, Iron stress | Regulates the transcription of the rhbABCDEF operon involved in biosynthesis of rhizobactin 1021, a siderophore produced under iron stress. Also regulates the rhtA gene, which encodes a membrane receptor protein for rhizobactin 1021. | BPI+S | 11274118 |
| RipA (Q8NRR3) | Corynebacterium glutamicum | Iron stress response | Under iron limitation, RipA acts as a repressor of several genes encoding prominent iron-containing proteins (e.g. aconitase and succinate dehydrogenase). | S | 21217007, 16179344 |
| Rns (P16114) | Escherichia coli | Cell adhesion | Regulates the expression of the CS1 and CS2 adhesins. | V | 2563591 |
| Rob (P0ACI0) | Escherichia coli (strain K12) | Antibiotic resistance | Regulates the expression of genes that confer multiple antibiotic resistance. Overexpression causes antibiotic resistance, organic solvent tolerance and heavy metal resistance. | S+V | 7896685, 7793951 |
| SirC (Q8Z4A6) | Salmonella typhi | Type III secretion system | Regulates the expression of the invasion-associated type III secretion system encoded within the SPI-1 plasmid. | V | 10322010 |
| SoxS (P0A9E2) | Escherichia coli (strain K12) | Oxidative stress response | Regulates the transcriptional activation of a complex oxidative stress regulon in response to superoxide-generating agents. | S+V | 1653416, 7726514 |
| TcpN (P0C6D6) | Vibrio cholerae | Pilus formation | Regulates the tcp gene cluster, associated with the biosynthesis and assembly of the toxin-coregulated pilus. | V | 1352761 |
| TetD (P28816) | Escherichia coli | Antibiotic resistance | Regulates the transcription of the mar regulon that confers the multiple antibiotic resistance phenotype. | S+V | 6094472, 9333027 |
| ThcR (P43462) | Rhodococcus erythropolis (Arthrobacter picolinophilus) | Organosulfur compound metabolism | Regulates the transcription of the thc operon for the degradation of the thiocarbamate herbicide EPTC. | M | 7836301 |
| UreR (P32326) | Escherichia coli | Urease activation | Regulates the expression of the urease operon, related to uropathogenic strains of E. coli and Proteus mirabilis. | V | 11724879, 11119505 |
| UreR (Q02458) | Proteus mirabilis (strain HI4320) | Urease activation | Regulates the expression of the urease operon, related to uropathogenic strains of E. coli and Proteus mirabilis. | V | 7678244, 18202436 |
| VirF (P0A2T1) | Shigella flexneri | Type III secretion system | Primary regulator of plasmid-encoded virulence genes. VirF activates the second essential virulence plasmid regulator VirB (type III secretion system) and the actin nucleator protein IcsA. | V | 18202440 |
| VirF (P0C2V5) | Yersinia enterocolitica | Type III secretion system | Transcriptional activator of the Yersinia Yop virulence regulon that encodes 11 different secreted antihost proteins called Yops, as well as a type III secretion machinery that is required for their secretion. | V | 9841674, 9618447 |
| VqsM (Q9I1P2) | Pseudomonas aeruginosa | Quorum sensing | Regulates dozens of genes that are implicated in quorum sensing, virulence and multidrug resistance. | V | 16194239 |
| XylR (P0ACI3) | Escherichia coli (strain K12) | Plant carbohydrates metabolism | Regulatory protein for the xylBAFGHR operon involved in the transport and metabolism of D-xylose. | M | 9371449 |
| XylS (P07859) | Pseudomonas putida (Arthrobacter siderocapsulatus) | Benzene derivatives metabolism | Regulates the xylXYZLTEGFJQKIH operon of the TOL plasmid, required for the degradation of toluene, m-xylene and p-xylene. | M | 18296514 |
| XylS1 (Q04713) | Pseudomonas putida | Benzene derivatives metabolism | Regulates the xylXYZLTEGFJQKIH operon of the TOL plasmid, required for the degradation of toluene, m-xylene and p-xylene. | M | 8473862 |
| XylS2 (Q05092) | Pseudomonas putida | Benzene derivatives metabolism | Regulates the xylXYZLTEGFJQKIH operon of the TOL plasmid, required for the degradation of toluene, m-xylene and p-xylene. | M | 1331988 |
| XylS3 (Q05335) | Pseudomonas putida | Benzene derivatives metabolism | Regulates the xylXYZLTEGFJQKIH operon of the TOL plasmid, required for the degradation of toluene, m-xylene and p-xylene. | M | 8473862 |
| Y4fK (P55449) | Rhizobium sp. (strain NGR234) | Nodulation induction | Regulates the nod operon that controls nodulation factors that promote nitrogen fixation in symbiotic bacteria-plant interaction. | BPI | 9669339 |
| YesN (O31517) | Bacillus subtilis | Plant carbohydrates metabolism | Probably activates the expression of genes that regulate the metabolism of plant cell wall polysaccharides. | M | 17921311, 19651770 |
| YesS (O31522) | Bacillus subtilis | Plant carbohydrates metabolism | Probably regulates the pathway involved in rhamnogalacturonan (plant cell wall polysaccharide) depolymerization. | M | 17449691 |

a Protein name and UniProt accession number in parentheses.
b General biological process. Multiple processes are separated by a comma.
c Functional category: bacteria-plant interaction (BPI), metabolism (M), stress response (S) and virulence (V).
d PubMed ID of references describing the biological function.

Table S2. Details of the analysis of the genetic context for predicted binding sites of the uncharacterized AraC/XylS transcription factors.

| PDB structure | 3mn2 | 3oio | 3oio | 3lsg | 3oou | 3oou |
|---|---|---|---|---|---|---|
| Organism | Rhodopseudomonas palustris TIE-1 | Chromobacterium violaceum ATCC 12472 | Chromobacterium violaceum ATCC 12472 | Fusobacterium nucleatum subsp. nucleatum ATCC 25586 | Listeria innocua | Listeria innocua |
| Genome strand of binding site | - | - | - | + | - | - |
| Binding site start / end | 1283534 / 1283555 | 2429687 / 2429708 | 3284826 / 3284847 | 1447985 / 1448006 | 2208002 / 2208023 | 2849441 / 2849462 |
| Binding site sequence | GATCTGCCCGAGGGCGCCACGC | GATGACGCCGTTTTCCGCCTTC | AGGCGCACCAGCTCCTTGACGA | TCTGGATTCATGATAGTTCAAT | TCAAATAAAGGATCCCCGAACT | AGACGAACAAATCGCCCAAGCA |
| Gene 1 | Locus: Rpal_1214 / UniProtKB: B3QHJ6 | Gene: ibeB / UniProtKB: Q7NVV0 | Locus: CV_3010 / UniProtKB: Q7NTP4 | Locus: FN0792 / UniProtKB: Q8RFC1 | Locus: lin2190 / UniProtKB: Q929T4 | Locus: 2833 / UniProtKB: Q7ANT9 |
| Genome start / end / strand | 1285065 / 1285718 / + | 2430434 / 2431813 / + | 3286108 / 3286602 / - | 1448403 / 1450424 / + | 2207728 / 2208621 / - | 2849334 / 2849636 / + |
| Description | DSBA oxidoreductase / protein disulfide oxidoreductase activity | Lipoprotein / outer membrane drug efflux lipoprotein | oxidation-reduction process / electron carrier activity | urocanate hydratase | Aminoglycoside 3'-phosphotransferase IIA | PTS lactose / cellobiose uptake / enzyme IIA |
| Distance from sequence | 1510 | 726 | 1261 | 419 | 598 | 0 |
| Gene 2 | Locus: Rpal_1215 / UniProtKB: B3QHJ7 | Locus: CV_2243 / UniProtKB: Q7NVU9 | Gene: flaD / UniProtKB: Q7NTP3 | Locus: FN0793 / UniProtKB: Q8RFC0 | tRNA | Locus: lin2834 / UniProtKB: Lin2834 protein |
| Genome start / end / strand | 1285927 / 1286238 / + | 2431827 / 2432282 / - | 3286802 / 3287920 / - | 1450652 / 1451851 / + | 2208754 / 2208837 / + | 2850081 / 2851184 / - |
| Description | hypothetical protein | conserved hypothetical protein | ciliary or flagellar motility / structural molecule activity | L-glutamate transporter | tRNA | Family FtsW_RodA_SpoVE, implicated in cell division |
| Distance from sequence | 2372 | 2140 | 1955 | 2668 | 731 | 619 |
| Gene 3 | Locus: Rpal_1216 / UniProtKB: B3QHJ8 | Gene: nfnB / UniProtKB: Q7NVU8 | Locus: CV_3012 / UniProtKB: Q7NTP2 | Locus: FN0794 / UniProtKB: Q8RFB9 | Locus: lin2191 / UniProtKB: Q929T3 | Locus: lin2835 / UniProtKB: Q927F4 |
| Genome start / end / strand | 1286243 / 1287505 / + | 2432321 / 2432974 / - | 3288132 / 3289388 / + | 1451930 / 1452343 / - | 2208964 / 2210244 / - | 2851181 / 2852311 / - |
| Description | electron carrier activity / flavin adenine dinucleotide binding / iron ion binding / oxidoreductase activity | oxygen-insensitive NAD(P)H nitroreductase / 6,7-dihydropteridine reductase activity | proteolysis / serine-type carboxypeptidase activity | Unknown | DNA binding protein (HTH motif) | Family FtsW_RodA_SpoVE, implicated in cell division |
| Distance from sequence | 2688 | 2613 | 3285 | 3964 | 941 | 1719 |
| Gene 4 | Locus: Rpal_1213 / UniProtKB: B3QHJ5 | Gene: acrB / UniProtKB: Q7NVV1 | Locus: CV_3009 / UniProtKB: Q7NTP5 | Gene: hutH1 / UniProtKB: Q8RFC2 | Locus: lin2189 / UniProtKB: Q929T5 | Locus: lin2832 / UniProtKB: Q927F7 |
| Genome start / end / strand | 1281779 / 1285003 / + | 2427309 / 2430437 / + | 3284381 / 3286096 / - | 1446836 / 1448386 / + | 2207105 / 2207731 / - | 2847990 / 2849297 / + |
| Description | transporter activity / acriflavin resistance protein | transporter activity / acriflavin resistance protein B | peptidyl-histidine phosphorylation / regulation of transcription, DNA-dependent | Enzyme, catalyzes: L-histidine = urocanate + NH3 | Protein with structural similarity with: regulation domain of Rob, multidrug effect transporter regulator and heme binding protein2 | PTS, cellobiose uptake / enzyme IIC |
| Distance from sequence | 0 | 0 | 0 | 0 | 293 | 166 |
| Gene 5 | Locus: Rpal_1212 / UniProtKB: B3QHJ4 | Locus: CV_2240 / UniProtKB: Q7NVV2 | Locus: CV_3008 / UniProtKB: Q7NTP6 | Locus: FN0790 / UniProtKB: Q8RFC3 | Gene: crcB2 / UniProtKB: Q929T6 | Locus: lin2831 / UniProtKB: Q927F8 |
| Genome start / end / strand | 1280787 / 1281782 / + | 2426145 / 2427305 / + | 3283449 / 3284312 / + | 1445384 / 1446547 / - | 2206609 / 2206998 / - | 2847548 / 2847853 / + |
| Description | efflux transporter, RND family, MFP subunit / transmembrane transport | transmembrane transport / probable multidrug efflux protein | regulation of transcription, DNA-dependent / two-component response regulator activity | Xylose repressor | Protein CrcB homolog 2 | PTS lactose / cellobiose uptake / enzyme IIB |
| Distance from sequence | 1774 | 2404 | 536 | 1438 | 1026 | 1610 |
| Gene 6 | Locus: Rpal_1211 / UniProtKB: B3QHJ3 | Gene: fepB / UniProtKB: Q7NVV3 | Gene: fhiA / UniProtKB: Q7NTP7 | Locus: FN0789 / UniProtKB: Q8RFC4 | Gene: crcB1 / UniProtKB: Q929T7 | Gene: kdpA / UniProtKB: Q927F9 |
| Genome start / end / strand | 1280444 / 1280785 / + | 2424881 / 2425849 / - | 3281425 / 3283137 / + | 1444526 / 1445368 / - | 2206263 / 2206619 / - | 2845522 / 2847207 / - |
| Description | Transcriptional regulator, ArsR family / sequence-specific DNA binding transcription factor activity / HTH arsR-type DNA-binding domain | high-affinity iron ion transport / ferrienterobactin-binding periplasmic protein precursor | protein secretion | Unknown | Protein CrcB homolog 1 | Potassium-transporting ATPase A chain |
| Distance from sequence | 2771 | 3860 | 1708 | 2617 | 1404 | 2256 |
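
The "Distance from sequence" values in Table S2 relate each gene's genome coordinates to the predicted binding site. The source does not state the exact convention it used, so the following is only a minimal sketch under an assumed rule: the distance is 0 when the gene overlaps the site, and otherwise the gap between the nearest ends of the two intervals. The function name `distance_to_site` and the `Interval` alias are hypothetical, introduced here for illustration.

```python
from typing import Tuple

# (start, end) in genome coordinates; order within the pair does not matter.
Interval = Tuple[int, int]


def distance_to_site(site: Interval, gene: Interval) -> int:
    """Gap in base pairs between a binding site and a gene.

    Assumed convention (not stated in the source): 0 if the two
    intervals overlap, otherwise the difference between their
    nearest ends.
    """
    site_start, site_end = sorted(site)
    gene_start, gene_end = sorted(gene)
    if gene_start <= site_end and site_start <= gene_end:
        return 0  # the gene overlaps the binding site
    if gene_start > site_end:
        return gene_start - site_end  # gene lies at higher coordinates
    return site_start - gene_end  # gene lies at lower coordinates


# Binding site and genes 1, 3 and 4 of the 3mn2 column in Table S2.
site_3mn2 = (1283534, 1283555)
print(distance_to_site(site_3mn2, (1285065, 1285718)))  # Gene 1
print(distance_to_site(site_3mn2, (1286243, 1287505)))  # Gene 3
print(distance_to_site(site_3mn2, (1281779, 1285003)))  # Gene 4
```

For these three genes the sketch gives the tabulated values (1510, 2688 and 0). For genes on the other side of the site, and for some overlapping genes, the tabulated distances differ from this simple gap by amounts up to roughly the width of the site, which suggests the original values were measured from one particular end of the binding site; treat the function as an approximation rather than the procedure used to build the table.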