Microbial Functional Genomics, Genomic Technologies, and Their Applications
Jizhong (Joe) Zhou
zhouj@ornl.gov, 865-576-7544
Environmental Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA

[Overview diagram of research areas: microbial functional genomics; gene expression patterns; whole-genome microarrays; genomic technology; oligonucleotide arrays; functional gene arrays; community genome arrays; protein arrays; community & ecosystem genomics; microbial community diversity & mechanisms; microbial ecology & extremophiles; producing magnetic nanoparticles; uranium reduction]

Challenges in functional genomics
• Defining gene functions: 30-60% of open reading frames are functionally unknown.
• Regulatory networks: differences in gene number cannot explain phenotypic differences, suggesting that regulation is the key.

Microbial Functional Genomics: Integrating Gene Expression Profiling, Bioinformatics, Mutagenesis and Proteomics
[Diagram: the genome sequence (ORF numbers, putative functions) linked to mutagenesis (construct with sacB, aac1 (Gmr) and pDS31), bioinformatics (structure-based function prediction), transcriptomics (DNA microarrays) and proteomics (2-D gels, mass spectrometry, and phage display based on Jun-Fos fusions to pIII)]

[Figure 2: Co-expression groups of Shewanella oneidensis MR-1 genes, with mean intensity ratios (± SD) under fumarate (F1, F2) and nitrate (N1, N2) as electron acceptors: Group 1 (r = 0.93), Group 2 (r = 0.86), Group 3 (r = 0.84), Group 4 (r = 0.90), Group 5 (r = 0.86), Group 6 (r = 0.81). Genes shown by category:
A. Electron transport: cytochrome c552 (nrfA); dimethyl sulfoxide reductase (dmsA, dmsB); Ni/Fe hydrogenase (hydA, hydB, hydC); fumarate reductase (fcc; flavocytochrome c3; frdA, frdB); deca-heme cytochrome c; outer membrane protein; formate dehydrogenase (fdhA, fdhC; selenocysteine-containing fdhA); periplasmic nitrate reductase (napA); ferredoxin-type protein (napH); di-heme split-Soret cytochrome c; prismane protein; bacterioferritin (bfr); cytochrome c'; cbb3-type cytochrome oxidase (ccoN, ccoP, ccoQ); cytochrome d ubiquinol oxidase (cydA, cydB); mono-heme c-type cytochrome (scyA); probable oxidoreductase (ordL); cytochrome b (cyb); NADH dehydrogenase (ndh); conserved hypothetical protein.
B. Intermediary carbon metabolism: succinyl-CoA synthetase (sucC, sucD); glucose-6-phosphate isomerase (gpi); transaldolase B (talB); succinate dehydrogenase (sdhA, sdhB); citrate synthase (gltA); malate oxidoreductase (sfcA); 2-oxoglutarate dehydrogenase (sucA, sucB); malate dehydrogenase (mdh).
C. Transcription regulation: H2O2 activator (hpkR, LysR family); histidine utilization repressor (hutC); ferric uptake regulator (fur); DeoR- and LacI-family transcriptional regulators; sensor histidine kinase (kinA); ATP-dependent protease (hslV); chemotaxis CheV homolog; tetrathionate sensor kinase (ttrS).]

Whole-genome microarrays available at ORNL
• Shewanella oneidensis MR-1: metal-reducing bacterium (MGP, GTL)
• Geobacter metallireducens: metal-reducing bacterium (GTL)
• Rhodopseudomonas palustris: photosynthetic bacterium (MGP, GTL)
• Nitrosomonas europaea: ammonium-oxidizing bacterium (MGP)
• Desulfovibrio vulgaris: sulfate-reducing bacterium (GTL, NABIR)
• Deinococcus radiodurans R1: radiation-resistant bacterium (GTL)
• Methanococcus maripaludis (GTL)

Two primary uses of microarrays for functional analysis
• Hypothesis-generating, i.e., exploratory, gene expression
profiling under different conditions, e.g., radiation responses in Deinococcus radiodurans.
• Hypothesis-driven, e.g., mutant characterization in Shewanella oneidensis MR-1.

Deinococcus radiodurans R1 genome (3.3 Mb)*
• Chromosome I: 2.65 Mbp
• Chromosome II: 412.3 kbp
• Megaplasmid: 177.5 kbp
• Plasmid: 45.7 kbp
• G+C content: 66.6%
• ORFs: 3,195 (mean ORF size 937 bp; 91% coding)
• ORFs similar to known proteins: 52.2%
• Conserved hypothetical: 16%
• Hypothetical: 31.5%
• rRNA operons: 9
*D. radiodurans R1 genome sequence and annotation courtesy of TIGR.

Radiation resistance of D. radiodurans R1
[Figure: radiation survival curves of D. radiodurans R1 and E. coli; gel of genomic DNA at 0, 1.5, 3, 5, 9 and 24 h post-irradiation (lanes M and CK; size markers 23.1, 9.4, 6.6 and 4.4 kb)]
• Most E. coli cells are dead at ~500 Gy.
• D. radiodurans exhibits a shoulder of resistance up to ~5,000 Gy, with no loss of viability.
• Very little is known about the DNA repair pathways that enable D. radiodurans to resist ionizing and UV irradiation.

Deinococcus cells can survive acute γ-radiation due to their ability to repair direct damage and remove free radicals
• Direct damage (20%)
• Indirect damage due to free radicals (80%)
[Diagram: γ-photons damage DNA directly (20%) and through irradiation-induced free radicals (80%), leading to mRNA and protein degradation, impaired cellular functions and replication, arrested cell division, and slow growth or cell death. Responses: DNA damage repair and re-initiation of DNA synthesis (early events after irradiation); minimizing free radical levels (late events after irradiation)]

Gene expression profiling: experimental design
Recovery of D. radiodurans (wild-type strain R1) from acute radiation (exposure dose = 15,000 Gy of γ-radiation), at 32°C:
• Control: non-irradiated cells
• Irradiated cells (samples 1-9, including an irradiated control), sampled at recovery times of 0, 0.5, 1.5, 3, 5, 9, 12, 16 and 24 h
• 3 biological replicates (different mRNAs) × 4 technical replicates = 12 total replicates
• Collaboration with Mike Daly

Hierarchical clustering analysis of expression profile patterns (more than 800 genes)
[Figure: clustered expression ratios (fold, 0.2-5) over recovery time (h), with three highlighted patterns:
A. recA-like activation pattern (r = 0.83), peaking between 0.5 and 5 h, mostly at 1.5 h: DR0911, DR2220, DR2221, DRB0069, DRB0067, DR0261, DRA0344, DR0099, DR2129, DR2128, DR0324, DR2337, DRA0346, DR1825, DR1771, DRA0345, DR0422, DR1143, DR0003, DR1776, DR2340, DR2610, DR1645, DR0696, DR0421, DR1775, DR1561, DR2285, DR2356, DR2275, DR0206, DR0204, DR1354, DR0203, DR0205, DR1357, DR2482, DR2483, DRA0008, DRA0234, DR1359, DR2127, DR1356, DRB0136, DR1548, DR0207, DRA0249, DR0665, DR0596, DR0912.
B. Growth-related activation pattern (r = 0.71), maximally induced at 12-24 h: DR1172, DR0461, DR1595, DRA0043, DRA0042, DRA0031, DRA0065, DR2263, DRA0275, DR1279; functions include a Lea76/LEA29-like desiccation resistance protein, a Bacillus yacB ortholog, 6-phosphogluconate dehydrogenase (gnd), TDP-rhamnose synthetase, glucose-1-phosphate thymidylyltransferase (rfbA), chromosomal protein HU (hupA), bacterioferritin, soluble cytochrome c and Mn superoxide dismutase.
C. Repressed pattern (r = 0.77): DR1126, DR1337, DR0728, DR0977, DR1742, DR1998, DR1146, DR0493, DR0674, DR2620.
Other labeled genes include transaldolase (tal), fructokinase (cscK), phosphoenolpyruvate carboxykinase (pckA), glucose-6-phosphate isomerase (pgi), catalase (katA), GSP26 general stress-like protein, formamidopyrimidine-DNA glycosidase (mutM), argininosuccinate synthase (argG) and cytochrome oxidase subunit I (caaA).]
• recA-like expression profile: genes for DNA replication, DNA repair, recombination, cell wall metabolism, cellular transport and uncharacterized proteins are induced at 1.5 h after irradiation.
• More genes are up-regulated than down-regulated.
• More than 40% of the genes that are functionally unknown are significantly changed upon irradiation.
• Induced genes (early to mid phases): stress response and DNA repair genes (e.g., recA, lexA, ssb, pprA, uvrA-1, uvrB, uvrC, uvrD, mutT, mutY, ruvB, McrA nuclease, RecJ-like DHH phosphohydrolase, DEAH helicase hepA), tellurium resistance proteins TerB and TerE, RNA polymerase subunits (rpoA, rpoB, rpoC), ribosomal proteins (rplQ, rpsD), proteases and nucleases, ABC transporters, cell wall genes (wecB, wecG) and many uncharacterized proteins.
• Repressed genes (early to mid phases): TCA cycle, glyoxylate shunt, and de novo synthesis of amino acids and nucleotides.

Discovery of a novel ATP-dependent DNA ligase (DRB0100)
[Figure: relative expression level (0-16) over 0-25 h after irradiation of the predicted ATP-dependent DNA ligase DRB0100 and the NAD-dependent ligase DR2069 (dnlJ); gene neighborhood DRB0098 (HD family phosphohydrolase and nucleotide kinase), DRB0099 (uncharacterized conserved protein), DRB0100 (predicted DNA ligase)]
• A novel ATP-dependent DNA ligase was highly expressed with a recA-like profile.
• It shares consensus motifs with ligases from eukaryotes.

Conserved ligase motifs (numbers = start residue, residues between motifs, end residue; secondary structure reference: PDB 1DGS):
Sequence (GI)            Start  Motif I       Gap  Motif III  Gap  Motif IIIa     Gap  Motif IV  End
DNLJ_DR2069 (6459863)    123    FTGELKIDGLSV  44   LEVRGEVYL  44   KAILYAVGKRDG   50   ADGTVLK   300
DNLJ_ECOLI (2506362)     110    WCCELKLDGLAV  46   LEVRGEVFL  44   TFFCYGVGVLEG   51   IDGVVIK   290
DNL1_MOUSE (1352290)     561    FTCEYKYDGQRA  41   FILDTEAVA  31   CLYAFDLIYLNG   51   CEGLMVK   723
DNL4_HUMAN (1706482)     201    FYIETKLDGERM  46   CILDGEMMA  28   CYCVFDVLMVNN   51   EEGIMVK   365
DNL3_HUMAN (1706481)     416    MFSEIKYDGERV  40   MILDSEVLL  27   CLFVFDCIYFND   51   LEGLVLK   573
AF0849 (11498455)        91     VVLEEKMNGYNV  40   YMLCCEAVG  16   EFFLFDVREGKT   46   REGVVFK   232
CAC0752 (15894039)       38     CVLEEKVDGANC  49   YVMYGEWLY  12   YFMEFDIFDKKE   50   RENLEIR   188
DRB0100 (6460914)        35     VVVTEKLDGENT  37   WRFCGENVY  12   YFYLFSVWDDLN   42   MEGYVVR   165
Consensus/100%                  hh...KhsG.th       h.h.sE.hh       .hh.ashh...t        .-sh.h+
Liu et al. 2003. PNAS 100: 4191-4196.

Highly coordinated regulation
Energy:
• Energy pathway switching: less energy is produced.
• Minimizing energy demands: de novo biosynthetic pathways are shut down.
Free radicals:
• Energy pathway switching: fewer free radicals are produced.
• Increased activity of the genes involved in removing free radicals.
Biosynthetic precursors:
• De novo biosynthetic pathways are shut down to minimize energy requirements.
• Increased activity of proteases and nucleases provides amino acids and nucleotides for protein, DNA and RNA synthesis.

Shewanella oneidensis MR-1
Habitats:
• Lake and marine sediments
• Deep sea
• Oil brine
• Spoiled food
• Mine waste
[Map of sites: Black Sea, Oneida Lake, Green Bay, Panama Basin, Mississippi Delta, North Sea]
Electron donors: formate, lactate, pyruvate, amino acids, H2
Electron acceptors at redox interfaces: O2, NO3−, NO2−, Mn(IV), Mn(III), Fe(III), fumarate, S0, S2O3 2−, DMSO, TMAO, U(VI), Cr(VI), Tc, As, Se, I
With this kind of versatility, what will it really do?
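The D. radiodurans clustering slides group genes whose recovery time courses are strongly correlated (for example, the recA-like pattern with r = 0.83). A minimal sketch of that grouping step, assuming average-linkage agglomerative clustering on Pearson correlation and a hypothetical stopping cutoff `min_r`; the slides do not state the linkage rule or cutoff actually used, and the profiles below are synthetic shapes, not data from the study.

```python
import numpy as np


def average_linkage_clusters(profiles, min_r=0.7):
    """Agglomerative clustering of expression profiles by Pearson correlation.

    profiles: 2-D array, one row per gene, one column per time point.
    min_r:    hypothetical cutoff; merging stops once the best average
              correlation between any two clusters drops below it.
    Returns a list of clusters, each a sorted list of row indices.
    """
    r = np.corrcoef(profiles)  # gene-by-gene Pearson correlation matrix
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > 1:
        best, pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean correlation over all cross-cluster gene pairs.
                score = r[np.ix_(clusters[a], clusters[b])].mean()
                if score > best:
                    best, pair = score, (a, b)
        if best < min_r:
            break
        a, b = pair
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so position a is unaffected
    return clusters


# Synthetic profiles sampled at the recovery times of the experimental design (h).
times = np.array([0, 0.5, 1.5, 3, 5, 9, 12, 16, 24])
rng = np.random.default_rng(0)
early = np.exp(-((times - 1.5) ** 2) / 4)  # transient peak near 1.5 h (recA-like shape)
late = times / times.max()                 # steady rise toward 24 h (growth-related shape)
profiles = np.vstack(
    [early + rng.normal(0, 0.05, times.size) for _ in range(3)]
    + [late + rng.normal(0, 0.05, times.size) for _ in range(3)]
)
clusters = average_linkage_clusters(profiles)
print(clusters)
```

Average linkage is used here only as a common default for correlation-based expression clustering; single or complete linkage would change just the `score` line.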
DOE Shewanella Federation
[Diagram of partners: TIGR (John Heidelberg; sequencing, annotation); Center for Microbial Ecology, MSU (J. Tiedje, J. Cole, J. Klappenbach); USC, JPL (K. Nealson); ORNL ESD Microbial Functional Genomics Group; UCB (J. Keasling); ANL (C. Giometti); BCM (T. Palzkill); UCSD (B. Palsson); LBL (Adam Arkin); Woods Hole (M. Riley); ISB (E. Kolker); PNNL (J. Fredrickson, D. Smith); ORNL LSD, CASD (F. Larimer, B. Hettich); metabolomics]

Large Genomes To Life project ($38M for 5 years): Rapid Deduction of Stress Response Pathways in Metal/Radionuclide-Reducing Bacteria
• Stress responses in Desulfovibrio vulgaris, Shewanella oneidensis and Geobacter metallireducens
• Partners: national laboratories, universities (UC Berkeley, U Washington, U Missouri (consultant)) and private organizations

Summary of microarray analysis for Shewanella
• Responses to 11 different electron acceptors
• Mutant characterization with chemostats
• Low-pH and high-pH stress
• Heat shock, cold shock
• Oxidative stress (e.g., H2O2) (Ting Li)
• High salt
• Carbon starvation
• Metal stress: strontium, chromium
• Hypothetical proteins
• Many mutants

Defining gene function through deletion mutagenesis (~80 deletion mutants)
• Global regulators: etrA, narQ, fur, crp, arcA, envZ
• cAMP-binding regulators: cAMP1, cAMP2, cAMP3
• Adenylate cyclases: cya1, cya2, cya3
• Outer membrane proteins and cytochromes: mtrC, mtrA, omcA
• Sigma factors: rpoH, rpoE
• Stress response: oxyR, bolA, dps, ompR, cpxR
• Double mutants: etrA-fur, etrA-crp, cpxR-cpxA, ompR-envZ
• PAS domain (old annotation): 0834, 0906, 1761, 4254, 4326, 4917
• Hypothetical proteins: 1377, 3584
• Transcription factors: 220 genes, 78 within single operon
• Cytochrome genes: 42 genes

Computational prediction of the function of the SO1328 gene product (LysR)
• It was annotated as a LysR family protein.
• It is induced 5- to 7-fold by H2O2 treatment.
• It shares ~34% sequence homology with E. coli OxyR.
• Its 3D structure is similar to that of E. coli OxyR.
[Figure: predicted 3D structure of SO1328, showing the N-terminal DNA-binding domain and the C-terminal domain]

Growth phenotype of the LysR deletion mutant (SO1328)
[Figure: log OD vs. time (0-10 h) for the wild type (WT) and the LysR mutant, untreated (0 µM) or treated with 2,000 µM H2O2]
• Less growth was obtained when WT cells were treated with 2,000 µM H2O2.
• Wild-type cells were sensitive to H2O2.
• There were no differences between treatment and control for the mutant cells.
• The LysR mutant is not sensitive to H2O2.
• In E. coli, an OxyR mutant is more sensitive to H2O2.

Microarray analysis of the LysR mutant in response to H2O2 stress
[Figure: deregulation of the major H2O2-responsive genes (40 µM H2O2, 2 min): fold induction (0-100) of a Dps family protein, ahpC, KatG-1 and ahpF in WT and the LysR mutant]
• Key genes known to be involved in oxidative stress (e.g., dps, katG) were not affected by H2O2 in the mutant.
• Since the OxyR (SO1328) mutant is more resistant to H2O2, the genes involved in oxidative stress would be expected to be highly expressed, but they are not. This suggests that novel mechanisms and pathways may exist.
• An OxyR-dps double mutant is also resistant to H2O2, suggesting that the oxidative stress responses in MR-1 are very complicated.

Proteomics
Tools for studying proteomics:
• 2-dimensional gel electrophoresis
• Mass spectrometry
• Phage display
• Yeast two-hybrid system
• Protein arrays
• Structural determination: X-ray crystallography, NMR

Using phage display to study protein-protein interactions and regulation
[Diagram: Gateway cloning vector and phage-display system; the protein of interest (POI) is displayed through a Jun-Fos interaction with pIII (cytoplasm, periplasm, extracellular space). Vector elements: PSP promoter, geneIII, ColE1 ori, Fos, loxP, KanR, pJun, ori SC101, CmR, AmpR, M13 ori, ori R6Kγ]
• First key step: cloning all genes into a universal vector.
• The cloning systems were optimized.
• All primers were synthesized.
• 3,853 genes were cloned.
• 50 clones were sequenced; no errors were found.
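The growth-phenotype slide for the SO1328 (LysR) mutant compares log OD curves with and without 2,000 µM H2O2. One way to put a number on "less growth" is the specific growth rate, the slope of ln(OD) against time during exponential phase. A minimal sketch, assuming the exponential-phase points have already been chosen; `specific_growth_rate` and the OD values are hypothetical, not measurements from the slide. If OD is plotted as log10, multiplying the log10 slope by ln(10) gives the same rate.

```python
import math


def specific_growth_rate(times_h, od):
    """Least-squares slope of ln(OD) versus time, in 1/h.

    Assumes every point lies in exponential phase; choosing that window
    is left to the analyst.
    """
    y = [math.log(v) for v in od]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(y) / n
    sxy = sum((t - mean_t) * (v - mean_y) for t, v in zip(times_h, y))
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    return sxy / sxx


# Hypothetical cultures: untreated doubles every 2 h, treated every 4 h.
times = [0, 2, 4, 6, 8]
od_untreated = [0.05 * 2 ** (t / 2) for t in times]
od_treated = [0.05 * 2 ** (t / 4) for t in times]

mu_untreated = specific_growth_rate(times, od_untreated)
mu_treated = specific_growth_rate(times, od_treated)
doubling_time_h = math.log(2) / mu_untreated
inhibition = 1 - mu_treated / mu_untreated  # fractional drop in growth rate
print(f"{mu_untreated:.3f} {mu_treated:.3f} {doubling_time_h:.2f} {inhibition:.2f}")
```

A strain whose rate barely changes between untreated and treated cultures would correspond to the slide's observation that the LysR mutant is not sensitive to H2O2.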
Expression of Shewanella proteins from the pDEST17 vector
[Figure: SDS-PAGE of NarQ, ArcA, Fur and EtrA expressed in E. coli, with a GST control; n = no-insert control, i = expression induced with 0.5 mM IPTG; marker sizes 175, 83, 62, 48, 33 and 25 kDa; protein size labels 70.2, 34.2, 32.4 and 20.5 kDa]
• Global regulatory genes are well expressed in E. coli.

Identification of ArcA binding motifs by gel-shift assays
[Figure: gel-shift assays with the gltA, aceA, aceB, icd, sucAB, sdhCAB and sucCD promoters]
1. Consistent with E. coli: icd, gltA-sdhCAB, sucABCD.
2. Different from E. coli: aceBA, potentially regulating the glyoxylate shunt pathway.
3. Shewanella ArcA can also interact with promoters of other TCA cycle-related genes (not found in E. coli): SO0970 (fumarate reductase flavoprotein subunit precursor), SO1538 (isocitrate dehydrogenase), SO2222 (fumarate hydratase).

Using promoter microarrays to study protein-DNA interactions and understand regulatory networks
[Diagram: (1) in vitro/in vivo pull-down followed by qPCR amplification; (2) non-specific competitors (BSA/milk, random DNA); direct binding; verification by EMSA/RT-PCR/cDNA microarray]

Challenges in protein arrays
Antibodies are commonly used as probes in protein arrays. Two big challenges:
• Loss of activity: the active binding site of an antibody may bind to the slide surface through chemical bonding, leaving it unavailable to the antigen.
• Cross-reactivity: specificity is also a big issue for antibody arrays.

Development of novel chemistry for protein array fabrication
Langmuir 20 (2004) 8877-8885.
Proteomics, in revision.
Thin-film coating of a cleaned glass slide: (1) polycation, (2) wash, (3) polyanion, (4) wash, (5) repeat, starting again with polycation.
Proteins are affixed on the slide by:
• Entrapment in the porous structure of the polymer
• Electrostatic interaction
• But not by covalent bonding

Proteins spotted on different slides
[Figure: protein spots on nanofilm-coated, SuperAldehyde, poly-lysine and SuperAmine slides; annotated "2-fold decrease"]
Nanofilm-coated slide:
• More sensitive
• Less background noise

Antibody arrays
[Figure: antibody array probed with anti-human IgG, anti-fibronectin and streptavidin, with BSA controls]
• Very good specificity of the antibody-antigen reactions was obtained.
• A patent was filed and licensed to a company.
• Nominated by ORNL for an R&D 100 Award.

Detection of single base pair differences
Target: GAG GGG GAA AGC GGG GGA TCG CAA GAC CTC GCG TGA TTG GAG CGG CCG AT
Probes (X, Y, Z = variable positions):
• One-mismatch probe: CCT AGC GTT XTG GAG CGC A (X = C is the perfect match; X = G, A or T gives one mismatch)
• Two-mismatch probe: CCT AGC GTT XYG GAG CGC A (XY = GG, AA, AT, GA)
• Three-mismatch probe: CCT AGC GTT XYZ GAG CGC A (XYZ = GAC, AGC)
• Checkerboard controls
[Figure: discrimination factor (mismatch/perfect-match signal, Fm/Fp) for probes 1-16; perfect match = 1, with 1-, 2-, 3- and 4-5-mismatch probes progressively lower; open bars = polymer-coated slide, filled bars = SuperAldehyde slide]
• Short oligos (<25 bp) without end modification, typically $20/oligo.
• More than 5-fold difference in signal intensity between perfect-match (PM) and mismatch (MM) probes.
• A single mismatch can be clearly differentiated.

Arbitrary cutoff for network identification
Main challenges:
• All methods define the cutoff arbitrarily.
• Identified clusters or modules are ambiguous.
Correlation matrix of 5 genes (upper triangle):
         Gene 1  Gene 2  Gene 3  Gene 4  Gene 5
Gene 1   1       0.3     0.9     0.5     0.4
Gene 2           1       0.7     0.3     0.8
Gene 3                   1       0.4     0.2
Gene 4                           1       0.6
Gene 5                                   1
• Only 3 interactions are left when Rc = 0.7.
• 7 interactions are left when Rc = 0.4.
[Figure: the same matrix after applying the cutoffs Rc = 0.4 (7 interactions left) and Rc = 0.7 (3 interactions left)]

Novel approach for network identification: random matrix theory and level statistics
[Figure: level spacing distribution of the yeast gene correlation matrix at cutoffs 0.8, 0.7, 0.6 and 0.5, plotted as P(s) against level spacing s (0-3); the spacings follow the Poisson distribution at cutoffs > 0.7 and the Wigner-Dyson distribution at cutoffs < 0.7]
Poisson distribution: $P(s) = e^{-s}$
Wigner-Dyson distribution: $P(s) = \frac{\pi s}{2} \exp\left(-\frac{\pi s^{2}}{4}\right)$
• Random properties: Wigner-Dyson distribution
• Nonrandom properties: Poisson distribution
Main advantages:
• Supported by universal laws
• Automatic cutoff
• Reliable, sensitive, robust

Identification of 27 modules from yeast cell cycle expression data

Experimental validation of some hypothetical proteins
• Cycloheximide inhibits protein synthesis by blocking translation elongation on the ribosome.
• The mutants are more sensitive to this drug, suggesting that they have defective ribosomes.
• Thus, these genes are involved in ribosome biogenesis.

Functional identification of a hypothetical protein in Shewanella
• In the Shewanella heat shock data, SO2017 is grouped with heat shock proteins.
[Figure: cluster containing 1. dnaK, 2. htpG, 3. groEL, 4. groES, 5. lon, 6. dnaJ, 7. SO2017]

Experimental validation of SO2017
[Figure: OD600 (log scale) vs. time (0-8 h) for the SO2017 mutant and DSP10 at 30°C and 42°C]
• The SO2017 mutant is sensitive to heat shock.
• This gene is indeed involved in the heat shock response.
• This suggests that the prediction is correct.

Pioneering advances in microarray-based technologies to address challenges in microbial community genomics
Challenges:
• Specificity: environmental sequence divergence
• Sensitivity: low biomass
• Quantification: presence of contaminants such as humic materials, organic contaminants, metals and radionuclides
Solutions:
• Developing different types of microarrays and novel chemistry to address different levels of specificity
• Developing a novel signal amplification strategy to increase sensitivity
• Optimizing microarray protocols for reliable quantification

Summary of 50-mer-based FGAs for environmental studies
Oligonucleotide probe size: 50 bp (Tiquia et al. 2004. BioTechniques 36: 664-675; Rhee et al. 2004. AEM 70: 4303-4317)
• Nitrogen cycling: 302
• Sulfate reduction: 204
• Carbon cycling: 566
• Phosphorus utilization: 79
• Organic contaminant degradation: 770
• Metal resistance and oxidation: 85
• Total: 2,006 probes
• All probes have < 88% similarity

Specificity of 50-mer microarrays
Specific hybridization was obtained with probes of ≤85% similarity.
• 5 nirS genes were mixed together; only the corresponding genes hybridized.
• 6 types of genes (nirK, nirS, nifH, amoA, pmoA, dsrAB) were mixed together; only the corresponding genes hybridized.

Sensitivity
[Figure: hybridization with 500, 50 and 25 ng of genomic DNA and with 1.6 × 10^9, 1.3 × 10^7 and 3.0 × 10^6 cells]
Detection limit:
• 50 ng of pure DNA in the presence of non-target templates
• 10^7 cells

Quantification and validation
[Figure: (left) microarray hybridization (log signal ratio) vs. log cell number (3.0 × 10^6 to 1.6 × 10^9 cells), r² = 0.98; (right) microarray hybridization (log SNR) and real-time PCR (log copy number) for 12 TFD gene targets (pure culture, microcosm and enrichment samples), r = 0.86]
• Good linear relationship: hybridization is quantitative.
• Microarray results are consistent with real-time PCR.

Novel amplification approach for increasing hybridization sensitivity
[Figure: amplification products from 10 fg of template (lanes A1-B8; M = marker) and log signal intensity vs. log DNA template concentration (ng, −2 to 2); linear fits: SO4131 r² = 0.9910, SO3234 r² = 0.9922, SO1077 r² = 0.9924, SO4136 r² = 0.9934, SO2637 r² = 0.9942]
• As little as 10 fg (2 cells) can be detected.
• Amplification is quantitative for the majority of the genes.
Submitted to PNAS.

NABIR Field Research Center samples
Sample    pH    Nitrate   Uranium   Nickel   TOC
FW-300*   6.1   1.200     0.001     0.005    30
FW-003    6.0   1060      0.01      0.015    100
FW-005    3.9   175.0     6.40      5.00     70
FW-010    3.5   42000     0.17      18.0     175
FW-015    3.4   8300      7.70      8.80     65
TPB-16    6.3   30.00     1.10      ND       65
*Reference site.
• 2 L of groundwater per sample
• Genes analyzed: 16S rRNA, nirS, nirK, dsrAB, amoA
[Map: FRC Areas 1-3, the S-3 Ponds cap (contaminant source) and the sampled wells, ranging from most to less contaminated; scale bars 275 m and 30 m]
• 6 samples were taken to assess the effects of contaminants on microbial community structure.

Groundwater samples with very low biomass
• 2 L of groundwater from six different sites
• Cell counts: 1-5 × 10^5 cells/ml
• DNA was isolated; 1/20 of the DNA was processed with the new method and used for hybridization.
• Good hybridization was obtained with DNA processed by the new method.
• No hybridization was obtained when the DNA was not processed.

Difference of functional genes in samples from the NABIR Field Research Center
[Figure: hybridization signal (0-40,000) for probes 1-53 in FW300 (reference site) and FW010 (highly contaminated site)]
• A clear difference was observed between contaminated and non-contaminated sites.
• For example, some genes are present in the non-contaminated site but not in contaminated sites.

Overall diversity among different samples
Diagonal: genes unique to each sample; off-diagonal: genes shared by each pair of samples.
          FW300      FW003      FW021      FW010      FW024
FW300     61 (20%)   189 (36%)  174 (35%)  80 (21%)   111 (23%)
FW003                25 (11%)   144 (35%)  61 (17%)   84 (20%)
FW021                           10 (5%)    64 (20%)   90 (24%)
FW010                                      6 (5%)     118 (37%)
FW024                                                 30 (16%)
Total genes detected:                 302    219    192    130    190
Genetic diversity, Simpson's (1/D):   125.5  67.1   26.6   17.4   35.7
• Overall diversity correlates with contaminant level.
• The proportion of overlapping genes between samples was consistent with the contaminant level and geochemistry.
• A significant portion (5-20%) of all detected genes were unique to each sample, even though the sites are very close to one another. Thus, important microbial populations appear to be highly heterogeneous in this groundwater system.

CommOligo: a new oligo probe design program for community analysis
Number and specificity of designed probes (50-mer) by different programs, using group sequences of nirS and nirK (842 gene sequences):
Program              Total ORFs  ORFs rejected  Probes designed  Specific  Nonspecific  Group-specific
ArrayOligoSelector   842         0              842              117       725          0
OligoArray           842         35             807              70        737          0
OligoArray 2.0       842         51             791              35        756          0
OligoPicker          842         657            185              141       44           0
CommOligo            842         512            330              147       0            183
• Useful for both whole-genome microarrays and community arrays
• Able to design group-specific probes
• Better performance than other programs

Probes designed for a second-generation FGA
• Nitrogen cycling: 5,089
• Carbon cycling: 9,198
• Sulfate reduction: 1,006
• Phosphorus utilization: 438
• Organic contaminant degradation: 5,359
• Metal resistance and oxidation: 2,303
• Total: 23,408 genes; ~23,000 probes designed
• Will be very useful for community and ecological studies

Community genomics
Grand challenges:
• Extremely high diversity: ~5,000 species/g soil
• 99% of microbial species are uncultured

Whole community sequencing
[Figure: 16S rRNA gene phylogenetic tree of community clones (010A-, 010B- and 010D- series) with reference sequences (e.g., Ralstonia eutropha, Azoarcus) and uncultured clones; bootstrap values shown]
• Sample from the NABIR Field Research Center at ORNL
• Sequenced by the DOE Joint Genome Institute
• 20 species based on 16S rRNA

Sequencing a stable thermophilic terephthalate (TA)-degrading community
[Diagram: TA is converted to acetate (Ac) and H2 + CO2 (step A), which are converted to CH4 + CO2 (step B)]
Reactions (ΔG°′, kJ/reaction):
(1) $\mathrm{TA^{2-}} + 8\,\mathrm{H_2O} \rightarrow 3\,\mathrm{acetate^{-}} + 3\,\mathrm{H^{+}} + 2\,\mathrm{HCO_3^{-}} + 3\,\mathrm{H_2}$ (+43.2)
(2) $4\,\mathrm{H_2} + \mathrm{HCO_3^{-}} + \mathrm{H^{+}} \rightarrow \mathrm{CH_4} + 3\,\mathrm{H_2O}$ (−135.6)
(3) $\mathrm{acetate^{-}} + \mathrm{H_2O} \rightarrow \mathrm{HCO_3^{-}} + \mathrm{CH_4}$ (−31.0)
(4) $4\,\mathrm{TA^{2-}} + 35\,\mathrm{H_2O} \rightarrow 17\,\mathrm{HCO_3^{-}} + 9\,\mathrm{H^{+}} + 15\,\mathrm{CH_4}$ (−151.9)
• Terephthalate (TA), or 1,4-benzenedicarboxylic acid, is a major byproduct of the plastics manufacturing industry.
• Three dominant populations:
 – Pelotomaculum: converting TA to acetate and hydrogen
 – Methanothrix: converting acetate to methane and carbon dioxide
 – A representative of candidate bacterial phylum OP5: unknown function, but may also ferment TA

Syntrophic interaction: Shewanella-Clostridium co-culture
[Diagram: MeOH + Fe(III) → 14CO2 and growth; also Fe(II) (Daniel, Gottschalk et al. 1999)]

Functional genomics of Shewanella in co-culture (towards microbial communities)
• Establish a Shewanella-Clostridium co-culture: MR-1 and Clostridium acetobutylicum or C. sphenoides
• Global expression analyses of co-cultures
• Also: Desulfovibrio (H2 production) + Methanococcus (H2 utilization)

Genomics, community functions and stability
Linking genomics to populations, to community diversity, functions, stability and to global change
• Obj 1. Genome diversity of the nitrifying community and isolation (natural system, many species)
• Obj 2. AOB-NOB interactions, regulation and stability (defined system, 2 species)
• Obj 3. Competition, functional redundancy, stresses and stability (defined systems, 3 and 4 species)
• Obj 4. Effects of elevated CO2 on microbial community, functions and stability in nature (natural system, many species)
• Obj 5. Integration, modeling, simulation and prediction across different organizational levels
[Diagram annotations: analyses include genome sequencing, FGA microarrays, mRNA, protein, metabolites, population dynamics and community function; outputs include probe sequences, diversity data, isolates and sequences, signature target genes for monitoring, a mechanistic understanding of coexistence in nature, insights on the stability of mutualistic interactions in more complex systems, and systems and knowledge for constructing more complex systems]
• Focus: nitrifying communities
• One of the biggest NSF programs in life science: $1M/yr for 5 years
• The preproposal was panel-reviewed, and a full proposal was invited.
Proposal to the NSF Frontiers in Integrative Biological Research (FIBR) program.

Predictive Microbial Ecology
• Qualitative microbial ecology: because experimental data have been difficult to obtain, microbial ecology has been qualitative rather than quantitative.
• Opportunity for quantitative microbial science: with genomic technologies available, microbial ecology is no longer limited by a lack of experimental data.
• Challenges: modeling, simulation and prediction. A big mathematical challenge is dimensionality: the number of samples is smaller than the number of genes.
• Possible solution: systems ecology + genomics

An example of the conceptual integration scheme
i. Modeling microarray data at the individual gene level:
$\frac{d x_i^{(k)}(t)}{dt} = \sum_{j=1}^{m_k} W_{ij}\, x_j^{(k)}(t)$
ii. Modeling interactions between functional gene groups or guilds:
$\frac{d y_k(t)}{dt} = f_k(t) + \sum_{j=1}^{n} Q_{kj}\, y_j(t), \qquad y_k(t) = \frac{1}{m_k}\sum_{i=1}^{m_k} x_i^{(k)}(t)$
iii. Modeling interactions at the next organizational level, where each variable averages guild-level variables:
$\frac{d z_p(t)}{dt} = g_p(t) + \sum_{q=1}^{N} U_{pq}\, z_q(t), \qquad z_p(t) = \frac{1}{n}\sum_{i=1}^{n} y_i(t)$

Grand challenges for systems biology
[Diagram: experiment design → experiments (species 1, 2, 3) → sequencing, microarray data analysis & management, sequence and pathway analyses → population-level modeling → community-level modeling]
• Network identification and modeling
• Scaling from single cells to ecosystems
 – Spatial
 – Temporal

First book on microbial functional genomics
Authors: Jizhong Zhou, Dorothea Thompson, Ying Xu, James M. Tiedje
John Wiley & Sons, March 19, 2004; 15 chapters, >600 pages
Rita Colwell, former NSF Director, wrote a foreword.
To our knowledge, this is the first book on microbial functional genomics.

Acknowledgement (1)
• Department of Energy
 – Microbial Genome Program
 – Genomes To Life Program
 – NABIR Program
 – Ocean Margin Program
 – Carbon cycling programs
• Oak Ridge National Laboratory
 – Laboratory Directed Research and Development

Acknowledgement
Microbial Genomics and Ecology Group, Environmental Sciences Division, ORNL
• ORNL: Zhili He, Liyou Wu, Dorothea Thompson, Yongqing Liu, Ting Li, Matthew Fields, Xuedan Liu, Tingfen Yan, Sung-Keun Rhee, Song Chong, Yunfeng Yang, Jost Liebich, Christopher Schadt, Dawn Stanek, Adam Leaphart, Weimin Gao, Terry Gentry, Steve Brown, Qiang He, Feng Luo, Crystal McAlvin, Susan Carroll, Lisa Fagan, Haichun Gao, Hongbin Pan, Xiufeng Wan, Xichun Zhou, Zamin Yang, Jianxin Zhong, Dong Yu, Ying Xu
• Michigan State University: James M. Tiedje, James Cole, Joel Klappenbach
• USUHS: Mike Daly
• USC: Ken Nealson
• Argonne National Lab: Carol Giometti
• Univ of Iowa: Caroline Harwood
• Oregon State Univ: Dan Arp
• UC Berkeley: Jay Keasling
• Ohio State Univ: Bob Tabita
• Univ of Missouri: Judy Wall
• Baylor College: Tim Palzkill
• SREL: Chuanlun Zhang
• PNNL: Jim Fredrickson, Margie Romine, Yuri Gorby, Dick Smith, Mary Lipton
• LBL: Terry Hazen, Adam Arkin
• Perkin Elmer: Xinyuan Li
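Levels i and ii of the conceptual integration scheme are linear differential equations. A minimal sketch of level i, assuming constant, hypothetical coefficients: `integrate_linear` solves dx/dt = Wx for one functional group with classical fourth-order Runge-Kutta, and `guild_level` averages the gene trajectories into the guild variable y_k = (1/m_k) Σ_i x_i^(k). Both names, the matrix W and the initial values are illustrative, not parameters from the original work.

```python
import numpy as np


def integrate_linear(W, x0, t_end, dt=0.01):
    """Integrate dx/dt = W x with classical fourth-order Runge-Kutta.

    W:  (m, m) interaction matrix for the m genes of one group (hypothetical values).
    x0: initial expression levels, length m.
    Returns the time grid (n,) and the states (n, m).
    """
    W = np.asarray(W, dtype=float)
    x = np.asarray(x0, dtype=float)
    steps = int(round(t_end / dt))
    states = [x.copy()]
    for _ in range(steps):
        k1 = W @ x
        k2 = W @ (x + 0.5 * dt * k1)
        k3 = W @ (x + 0.5 * dt * k2)
        k4 = W @ (x + dt * k3)
        x = x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        states.append(x.copy())
    return np.linspace(0.0, steps * dt, steps + 1), np.array(states)


def guild_level(states):
    """y_k(t) = (1/m_k) * sum_i x_i^(k)(t): mean over the genes of the group."""
    return states.mean(axis=1)


# Hypothetical two-gene group with independent first-order decay (rates 1 and 2).
W = [[-1.0, 0.0], [0.0, -2.0]]
t, x = integrate_linear(W, [1.0, 1.0], t_end=1.0)
y = guild_level(x)
exact = (np.exp(-1.0) + np.exp(-2.0)) / 2  # analytic y(1) for this diagonal W
print(y[-1], exact)
```

Level ii could reuse the same integrator with Q in place of W and the input term f_k(t) added to each derivative evaluation. A diagonal W is used in the example only because its exact solution makes the output easy to check; a nonzero off-diagonal entry such as W[1][0] would let gene 1 drive gene 2 with no other change to the code.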