Functional Encyclopedia of Bacteria and Archaea

Matthew Blow
mjblow@lbl.gov

Deutschbauer lab, LBNL: Adam Deutschbauer, Morgan Price, Kelly Wetmore, Adam Arkin
JGI: Cindi Hoover, Feng Chen, Jim Bristow

1. Gene function annotation using transposon mutagenesis and sequencing (TnSeq)
2. A ‘Functional Encyclopedia of Bacteria and Archaea’ (FEBA)


Problem: Computational annotation of microbial genomes is imperfect

Current computational genome annotation pipeline:
Isolate -> Sequence -> Predict gene structure and function -> Incomplete model

Limitations of homology:
• Median bacterial genome: 3,261 protein-coding genes; 971 “hypothetical” protein-coding genes
• New experimental approaches are necessary to rapidly annotate and characterize microbial genomes.


Our solution: Experimental evidence based annotation of genomes

Develop a rapid experimental pipeline to:
1) Assess phenotypic capability via growth assays (~300 metabolic and stress conditions)
2) Correct gene structure and identify promoters with RNAseq
   In D. vulgaris: 507 gene revisions and 1,124 promoters at single-nucleotide resolution.
3) Predict gene function with TnSeq in multiple conditions per microbe


Gene function annotation by TnSeq

Microbe of interest
  i) Transposon mutagenesis
  ii) Recovery
  iii) Antibiotic selection
-> Mutant population: millions of cells, 1 random mutant per cell
-> Selection under 100’s of conditions (Condition A, Condition B, ...)
-> Identify mutant fitness effects by PCR and sequencing
   (genes essential in condition B, essential in condition C, essential in all conditions)

Is there evidence that this approach works to annotate gene function?


Proof of principle: Gene function annotation using transposon mutagenesis and microarray based analysis

S. oneidensis MR-1: metal-reducing bacterium, bio-remediation
Mutant population -> Growth under ~300 conditions (Condition 1, Condition 2, Condition 3, ...etc) -> Assay selected populations on microarray
(Deutschbauer et al PLoS Genetics 2011)


Proof of principle: Gene function annotation using transposon mutagenesis and microarray based analysis

290 diverse conditions, 3,355 genes (average 7 mutants per gene)
[Figure: mutant fitness across conditions; legend: -ve fitness effect, no fitness effect, +ve fitness effect]

  3,355   Genes with Tn mutants
  1,230   Genes with significant phenotypes
     40   Genes with proposed annotations of specific molecular function
(Deutschbauer et al PLoS Genetics 2011)


A Functional Encyclopedia of Bacteria and Archaea (FEBA)

~50 phylogenetically diverse organisms (GEBA)
[Figure: bacterial phylogenetic tree; * = GEBA / candidate F-GEBA]
Phylogeny approach to maximize functional diversity

TnSeq under 50 growth conditions

300 possible growth conditions:
  Carbon sources                                      96
  Sulfur sources                                      12
  Phosphorous sources                                  8
  Environmental stresses (temp, pH, salinity)          9
  Nitrogen sources; small molecule stresses
  (metals, antibiotics)                               48 and 165 (pairing of the two values is not recoverable from the slide text)

Outcome: 1000’s of novel gene function annotations


Plans for a FEBA pilot project

Growth assays, RNASeq, TnSeq -> Analysis / integration -> Functional genome annotation

Aim 1
a) Work through the entire functional annotation pipeline for one bacterium (P. stutzeri)
b) Expand to ~10 bugs

Aim 2
Culturing and transposon mutagenesis of ~40 diverse bacteria ..etc


Strategy for identifying transposon insertions

1. Isolate genomic DNA from mutant population
2. Sonicate DNA
3. Ligate custom truncated Illumina adapter
4. PCR using Tn specific primer
5. Sequencing (HiSeq or MiSeq)
6. Mapping to reference genome and counting

Figure labels:
- Illumina universal adapter
- Transposon complementary sequence
- Random 5mer
- DNA / Tn junction
- PCR primer contains adapter arm and 5’ index sequence
- Genomic DNA only inserts are not amplifiable by downstream PCR
- Read 1 primer; Read 2 primer; Index Read + Tn specific primer

Does this sequencing strategy work? Can we use it to identify function of known genes?
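
Step 6 of the strategy above (mapping to the reference genome and counting) can be illustrated with a minimal sketch. It assumes the reads have already been aligned, so that each read yields one genome coordinate for its DNA / Tn junction. The function name, the input layout, and the toy coordinates are hypothetical, and collapsing repeated reads into unique insertion sites is an assumption rather than the pipeline described on the slides.

```python
from bisect import bisect_right


def count_insertions_per_gene(insertion_positions, genes):
    """Count distinct insertion sites that fall inside each gene.

    insertion_positions: iterable of genome coordinates of DNA/Tn junctions
    genes: list of (name, start, end) tuples, inclusive and non-overlapping
    (both inputs are hypothetical stand-ins for real alignment output)
    """
    genes_sorted = sorted(genes, key=lambda g: g[1])
    starts = [g[1] for g in genes_sorted]
    counts = {name: 0 for name, _, _ in genes_sorted}

    # Assumption: many reads from the same site are one insertion event.
    for pos in set(insertion_positions):
        i = bisect_right(starts, pos) - 1  # last gene starting at or before pos
        if i < 0:
            continue  # upstream of the first gene
        name, start, end = genes_sorted[i]
        if start <= pos <= end:
            counts[name] += 1
    return counts


# Toy example: two genes, five mapped junctions (one duplicate, one intergenic).
toy_genes = [("geneA", 100, 500), ("geneB", 700, 900)]
toy_positions = [120, 120, 450, 600, 710]
per_gene = count_insertions_per_gene(toy_positions, toy_genes)
print(per_gene)
```

Genes that end up with zero insertions in a dense library are the candidates that the later slides describe as transposon free regions.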
Proof of principle: Identification of genes required for survival in minimal media in Pseudomonas stutzeri

P. stutzeri: soil bacterium with potential applications in bioremediation
Transposon mutagenesis -> >10^6 mutant cells -> Select in LB / Select in minimal media -> Compare


TnSeq specifically identifies Tn insertions and is highly reproducible

Per-replicate percentages (slide column labels: “Map to the genome” and “Tn insertion is at TA”; the column order is not recoverable from the slide text):
  Replicate 1   99.91%   97.81%
  Replicate 2   99.92%   97.80%

[Figure: Tn inserts per gene, Rep 1 vs Rep 2 (both axes 0 to 150); Pearson correlation 0.99]


“Essential” genes appear as transposon free regions

[Figure: Illumina read depth (0 to 230) of transposon insertions along the genome, above a gene track. Non-essential genes carry transposon insertions; the insertion free site is an essential gene: dihydroxy-acid dehydratase (required for biosynthesis of amino acids)]


Top 20 genes advantageous for survival in minimal media

  Gene                                                        Tn insertion ratio (LB / minimal)
  Phosphoribosylanthranilate_isomerase                        7.0
  phosphoserine_phosphatase_SerB                              6.2
  3-isopropylmalate_dehydrogenase                             5.0
  Predicted_membrane_protein                                  4.7
  Putative_threonine_efflux_protein                           3.8
  O-succinylhomoserine_sulfhydrylase                          3.5
  Chemotaxis_protein_histidine_kinase_and_related_kinases     3.4
  tryptophan_synthase,_beta_subunit                           3.2
  Indole-3-glycerol_phosphate_synthase                        3.2
  hypothetical_protein                                        3.1
  anthranilate_phosphoribosyltransferase                      3.1
  ATP_phosphoribosyltransferase,_regulatory_subunit           3.0
  methionine_biosynthesis_protein_MetW                        3.0
  5,10-methenyltetrahydrofolate_synthetase                    2.9
  Membrane_protease_subunits,_stomatin/prohibitin_homologs    2.8
  3-isopropylmalate_dehydratase,_large_subunit                2.8
  anthranilate_synthase_component_I                           2.7
  Predicted_integral_membrane_protein                         2.7
  Imidazoleglycerol-phosphate_dehydratase                     2.7
  5,10-methylenetetrahydrofolate_reductase                    2.6

Color key on the slide:
Red = known role in amino acid biosynthesis
Blue = known role in purine biosynthesis

Conclusion:
- TnSeq strategy works
- Identifies genes required for growth in minimal media


The next experiment:

P. stutzeri mutant library -> Selection under multiple conditions -> Synthesis of libraries in plate based format -> Sequencing of pooled experiments


Plans for a FEBA pilot project

Aim 2
Culturing and transposon mutagenesis of ~40 diverse bacteria ..etc


Progress toward culturing and mutagenesis of ~40 bacteria

  44 bugs (9 phyla)   In hand at LBNL
  15 bugs (5 phyla)   Cultured
   9 bugs (2 phyla)   Tn mutagenesis attempted

Was mutagenesis successful?
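
As an aside on the minimal media comparison, the “Tn insertion ratio (LB / minimal)” column in the Top 20 table can be illustrated with a small sketch. The slides do not say how the ratio was normalized, so scaling each library by its total insertion count and adding a pseudocount are assumptions made here for illustration. The function name and the toy counts are hypothetical and are not data from the talk.

```python
def insertion_ratios(lb_counts, minimal_counts, pseudocount=1.0):
    """Rank genes by the ratio of Tn insertions in LB vs minimal media.

    lb_counts, minimal_counts: dicts mapping gene name -> insertion count.
    Assumptions (not from the slides): counts are scaled by each library's
    total, and a pseudocount avoids division by zero for depleted genes.
    """
    lb_total = sum(lb_counts.values())
    min_total = sum(minimal_counts.values())
    if lb_total == 0 or min_total == 0:
        raise ValueError("both libraries need at least one insertion")

    ratios = {}
    for gene in set(lb_counts) | set(minimal_counts):
        lb = (lb_counts.get(gene, 0) + pseudocount) / lb_total
        mm = (minimal_counts.get(gene, 0) + pseudocount) / min_total
        ratios[gene] = lb / mm
    # Highest ratio first: insertions tolerated in LB but depleted in minimal media.
    return sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)


# Toy counts (hypothetical): geneY loses most of its insertions in minimal media.
toy_lb = {"geneY": 69, "geneX": 30}
toy_minimal = {"geneY": 9, "geneX": 90}
ranked = insertion_ratios(toy_lb, toy_minimal)
print(ranked)
```

A high ratio means mutants in that gene survive in LB but drop out in minimal media, which is the pattern expected for biosynthesis genes such as those in the table.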
MiSeq analysis of transposon mutant libraries from four new bugs

Tn mutants of four marine bacteria with similar culturing conditions:
- Alcanivorax jadensis
- Dinoroseobacter shibae
- Kangiella aquimarina
- Phaeobacter gallaeciensis

Isolate and pool DNA -> PCR Tn inserts and sequence on MiSeq -> Map to four genomes -> insertions per organism


MiSeq analysis of transposon mutant libraries from four new bugs

96% of reads map to unincorporated transposon!
But... candidate transposon insertions were recovered from all 4 bugs.


Candidate transposon insertions are at expected TA dinucleotides

TA = insertion site preference of pHIMAR transposon

[Figure: fold enrichment (insertion dinucleotide frequency / genome dinucleotide frequency) for each dinucleotide sequence of the Tn insertion site (AA, AC, AG, AT, CA, CC, CG, CT, GA, GC, GG, GT, TA, TC, TG, TT), one panel per organism]
  Kangiella aquimarina        639 potential insertions
  Phaeobacter gallaeciensis   158 potential insertions
  Dinoroseobacter shibae      170 potential insertions
  Alcanivorax jadensis        161 potential insertions

Conclusion:
- We are able to culture and mutagenize diverse bacteria
- Need to demonstrate that we can generate high diversity mutant libraries


Summary

We are developing high throughput experimental approaches to annotate gene function
The ‘FEBA’ project will provide functional annotation for 50 diverse organisms / 1000s of novel genes
[Figure: bacterial phylogenetic tree with candidate organisms marked *]
Future ‘product’ of JGI?
Keen to target bugs of interest to DOE and to JGI user community

mjblow@lbl.gov


Example of specific novel gene function annotation from transposon mutagenesis

Gene SO_3749 = hypothetical gene with no homology based annotation

1. Functional evidence from mutant assays
   [Figure: fitness of Arg biosynthesis genes across conditions; legend: strong -ve fitness effect, no fitness effect]
   Does SO_3749 catalyze the missing step in Arg biosynthesis?
2. Function confirmed in complementation assay

Conclusion: SO_3749 encodes a functional acetyl-ornithine deacetylase
No homology to the functional ortholog (argE) in E. coli


Transposon mutagenesis through bacterial conjugation

E. coli ‘donor’ cell (vector carrying transposon) + target cell
-> Conjugation
-> Growth in absence of DAP (E. coli dies)
THIS STEP DIDN’T WORK PROPERLY
-> Further growth (vector is lost)
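
The fold enrichment used for the four marine libraries (insertion dinucleotide frequency / genome dinucleotide frequency) can be sketched as below. This is a minimal illustration under stated assumptions: the genome is a plain string, each candidate insertion is given as the 0-based index of the first base of its dinucleotide, and genome frequencies are counted over all overlapping dinucleotides on one strand. The function names and the toy sequence are hypothetical.

```python
from collections import Counter
from itertools import product

DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]  # AA ... TT


def dinucleotide_frequencies(seq):
    """Frequency of each overlapping dinucleotide in seq (one strand only)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts[d] for d in DINUCLEOTIDES)
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T dinucleotides")
    return {d: counts[d] / total for d in DINUCLEOTIDES}


def fold_enrichment(genome, insertion_sites):
    """Insertion-site dinucleotide frequency divided by genome frequency.

    insertion_sites: 0-based indices of the first base of the dinucleotide
    at each candidate insertion (a hypothetical input convention).
    """
    genome = genome.upper()
    genome_freq = dinucleotide_frequencies(genome)
    site_counts = Counter(genome[i:i + 2] for i in insertion_sites)
    n_sites = sum(site_counts[d] for d in DINUCLEOTIDES)
    if n_sites == 0:
        raise ValueError("no usable insertion sites")
    return {
        d: (site_counts[d] / n_sites) / genome_freq[d]
        if genome_freq[d] > 0 else float("nan")  # dinucleotide absent from genome
        for d in DINUCLEOTIDES
    }


# Toy genome with a single TA; the one candidate insertion sits on it.
toy_genome = "ATATGCGC"
enrichment = fold_enrichment(toy_genome, [1])
print(enrichment["TA"], enrichment["AT"])
```

A library of genuine insertions from a TA-preferring transposon should show enrichment concentrated at TA, while reads from background DNA would be expected to track genome composition more closely.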