Bioinformatics
Computational methods to discover ncRNA in bacteria

Ulf Schmitz
ulf.schmitz@informatik.uni-rostock.de
Bioinformatics and Systems Biology Group
www.sbi.informatik.uni-rostock.de

Outline
1. Problem description
2. Streptococcus pyogenes
3. The RNome, transcriptome
4. Characteristics of bacterial ncRNA
5. Approaches to find fRNA
6. Conclusion / Outlook

Ulf Schmitz, Computational methods to discover ncRNA

Streptococcus pyogenes
• important human pathogen (group A streptococcus, or GAS)
• causes the following diseases:
  – pyoderma (111 million cases/year)
  – pharyngitis (616 million cases/year and 517,000 deaths/year)
• completely adapted to humans as its only natural host
• causes purulent infections of the skin and mucous membranes and, rarely, life-threatening systemic diseases
[Images: pyoderma (source: DermNet NZ); pharyngitis (source: UCSD)]

Streptococcus pyogenes
• varies in multiplication rate -> associated with the type of infection
• to understand this regulation, growth-phase regulatory factors and gene expression in response to specific environmental differences within the host were studied
• a novel growth-phase-associated two-component-type regulator was identified
• fasBCA operon, present in all 12 tested M serotypes
• contains two potential HPK genes (FasB, FasC) and one RR (FasA)
• shows its maximum expression and activity at the transition phase and potentially supports the aggressive spreading of the bacteria in their host
HPK = histidine protein kinase
RR = response regulator

Streptococcus pyogenes
• downstream of the fas operon, a ~300-nucleotide transcript (fasX) was identified
• does not encode a peptide/protein
  – but is also growth-phase related
  – main effector molecule of the fas regulon
• ncRNA or fRNA

ncRNA
[Figure: map of the fas locus, scale bar 1 kb. Labels: fasX, gltX-L, fasB, tt, pfas, fasC, fasA, rnpA-L, tt, pfasX, prnpA]

RNome or transcriptome
[Diagram: classification of RNA]
RNA
  mRNA
  ncRNA / fRNA
    snmRNA / sRNA: miRNA, siRNA, snRNA, snoRNA, stRNA
    structural RNA: rRNA, tRNA
putative gene expression regulators (protein-interaction and housekeeping ncRNAs were also found)

RNome or transcriptome
Types of RNA:
mRNA   messenger RNA - transcript of a protein-coding gene
rRNA   ribosomal RNA - forms large parts of the ribosome, the protein-producing machinery
tRNA   transfer RNA - also involved in protein production, carrying single amino acids to the growing amino acid chain of a protein
ncRNA  non-coding RNA - found in intergenic regions, playing miscellaneous roles

Non-coding RNA (ncRNA) genes produce functional RNA molecules rather than encoding proteins. And here are the nominees:
fRNA    functional RNA - essentially synonymous with non-coding RNA
miRNA   microRNA - 21-24 nucleotide RNAs, probably acting as translational regulators
siRNA   small interfering RNA - active molecules in RNA interference
snRNA   small nuclear RNA - includes spliceosomal RNAs
snmRNA  small non-mRNA - essentially synonymous with small ncRNAs
snoRNA  small nucleolar RNA - most known snoRNAs are involved in rRNA modification
stRNA   small temporal RNA - for example, lin-4 and let-7 in Caenorhabditis elegans

Functions of ncRNA
…target mRNAs via imperfect sequence complementarity
Binding may result in:
• blockage of ribosome entry (translation repression)
• melting of inhibitory secondary structures (translation activation)
[Figure labels: dissolving the fold-back structure; loop-loop kissing complex]

Streptococcus pyogenes genomes
Serotype    Length        Date
M1 GAS      1852441 bp    Sep 19 2001
MGAS10270   1928252 bp    May 4 2006
MGAS10394   1899877 bp    Aug 3 2004
MGAS10750   1937111 bp    May 4 2006
MGAS2096    1860355 bp    May 4 2006
MGAS315     1900521 bp    Jul 18 2002
MGAS5005    1838554 bp    Aug 8 2005
MGAS6180    1897573 bp    Aug 8 2005
MGAS8232    1895017 bp    Jan 31 2002
MGAS9429    1836467 bp    May 4 2006
SSI-1       1836467 bp    May 4 2006

Genome Info & Features:
Genes:            1805
Protein coding:   1697
Structural RNAs:  73
Pseudo genes:     35
Length:           1,852,441 nt
GC content:       38%
Coding:           83%
Topology:         circular
Molecule:         dsDNA

Intergenic sequence inspector (ISI)
[Figure: ISI workflow. Components: Bacterial genomes database, Annotated genome, IGR extractor, IGR databank, IGR filtering, Filtered IGR databank, BLAST, BLAST results, BLAST Analyser, Aligned features, Sequence features, Genview, Final results]

Characteristics of bacterial ncRNA
• intergenic sequence/structure conservation between related genomes
• encoded by free-standing genes, oriented in the opposite direction to both flanking genes
• 50 to 400 nt long (average >200 nt)
• higher G+C content than the average intergenic space
• σ70 promoter
• ρ-independent terminator
• imperfect sequence complementarity with the target mRNA
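
Two of the characteristics above (minimum length and elevated G+C content) lend themselves to a simple pre-filter on intergenic regions (IGRs). The following is a minimal sketch, not the ISI implementation: the function names, the 0-based half-open coordinates, the treatment of the genome as linear, and the choice to compare each IGR against the pooled G+C content of all IGRs are assumptions made for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open (start, end); an assumed convention


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def intergenic_regions(genome: str, genes: List[Interval]) -> List[Interval]:
    """Return the gaps between annotated genes (genome treated as linear)."""
    gaps = []
    prev_end = 0
    for start, end in sorted(genes):
        if start > prev_end:
            gaps.append((prev_end, start))
        prev_end = max(prev_end, end)  # tolerate overlapping gene annotations
    if prev_end < len(genome):
        gaps.append((prev_end, len(genome)))
    return gaps


def candidate_igrs(genome: str, genes: List[Interval], min_len: int = 50) -> List[Interval]:
    """Keep IGRs of at least min_len nt whose G+C content exceeds the pooled IGR average."""
    gaps = intergenic_regions(genome, genes)
    pooled = "".join(genome[s:e] for s, e in gaps)
    background = gc_content(pooled)  # average G+C of all intergenic space
    return [
        (s, e)
        for s, e in gaps
        if e - s >= min_len and gc_content(genome[s:e]) > background
    ]
```

The 50 nt minimum mirrors the lower bound of the ncRNA length range. No upper bound is applied here, because an IGR can be longer than the ncRNA gene it contains. Promoter, terminator and conservation checks would be separate steps.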

Characteristics of bacterial ncRNA
[Figure: promoter consensus; numbers give the % conservation of each base]
Promoter
  -35 box:     T82 T84 G78 A65 C54 A45
  spacer:      16-19 bp
  -10 box:     T80 A95 T45 A60 A50 T96
  spacer:      5-9 bp
  start point: C A90 T
[Figure: intrinsic terminator]

The structure approach with RNAz
The function of many ncRNAs depends on a defined secondary structure.
1. multiple sequence alignment
2. measure of thermodynamic stability (z-score)
3. measure of RNA secondary structure conservation

The structure approach
Thermodynamic stability
• calculation of the MFE (minimum free energy) as a measure of thermodynamic stability
• the MFE depends on the length and the base composition of the sequence
  – and is therefore difficult to interpret in absolute terms
• RNAz calculates a normalized measure of thermodynamic stability by comparing
  – the MFE m of a given (native) sequence
  – with the MFEs of a large number of random sequences of similar length and base composition
• a z-score is calculated as $z = (m - \mu) / \sigma$, where µ and σ are the mean and standard deviation, respectively, of the MFEs of the random samples
• a negative z-score indicates that a sequence is more stable than expected by chance

The structure approach
Structural conservation
• RNAz predicts a consensus secondary structure for an alignment
  – this results in a consensus MFE $E_A$
• RNAz compares this consensus MFE to the average MFE of the individual sequences, $\bar{E}$, and calculates a structure conservation index: $\mathrm{SCI} = E_A / \bar{E}$
• the SCI will be low if no consensus fold can be found

The structure approach
• the z-score and the SCI are used to classify an alignment as “structural RNA” or “other”
• RNAz uses a support vector machine (SVM) learning algorithm, which is trained on a set of known ncRNAs
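
The two scores can be written down in a few lines. This is a minimal sketch of the formulas as stated on the slides, not RNAz code: the function names are hypothetical, the MFE values are assumed to come from an external folding program, and the random controls are generated by a plain mononucleotide shuffle, which preserves length and base composition exactly.

```python
import random
from statistics import mean, stdev
from typing import Sequence


def shuffled(seq: str, rng: random.Random) -> str:
    """Random control sequence with the same length and base composition as seq."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def z_score(native_mfe: float, random_mfes: Sequence[float]) -> float:
    """z = (m - mu) / sigma, with mu and sigma taken over the random-sample MFEs."""
    mu = mean(random_mfes)
    sigma = stdev(random_mfes)  # sample standard deviation; needs >= 2 values
    if sigma == 0:
        raise ValueError("random MFEs have zero spread; z-score is undefined")
    return (native_mfe - mu) / sigma


def structure_conservation_index(consensus_mfe: float,
                                 individual_mfes: Sequence[float]) -> float:
    """SCI = E_A / E-bar: consensus MFE over the mean MFE of the single sequences."""
    return consensus_mfe / mean(individual_mfes)


if __name__ == "__main__":
    # Illustrative numbers only (kcal/mol), not measured data.
    print(z_score(-18.0, [-10.0, -12.0, -14.0]))
    print(structure_conservation_index(-9.0, [-10.0, -14.0]))
```

Because MFEs are negative, a native sequence that folds more stably than its shuffled controls yields a negative z-score, and a consensus fold that is much weaker than the individual folds drives the SCI towards zero.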

Analysis pipeline of Freiburg group
[Figure: pipeline flowchart. Recoverable steps:]
• extraction of intergenic regions ≥50 nt
• local alignment of IGRs with BLASTN
• E-value ≤ 10^-8? If no: discard
• reverse complement
• unify overlapping candidate sequences to reduce redundancy
• clustering using ClustalW
• scoring using RNAz

Summary / Conclusion
• there are ‘reliable’ computational methods to find ncRNA-coding genes in bacteria
• key methods involve:
  – IGR extraction and filtering
  – observing sequence conservation in related genomes (BLAST search, ClustalW alignment)
  – checking for structure conservation and thermodynamic stability
• the next step is to prove their existence experimentally via microarrays or Northern blots

Outlook
• Might it be possible to predict target mRNAs?

Thanks for your attention!