Introduction to molecular and cell biology

Bioinformatics
Computational methods to discover ncRNA in bacteria
Ulf Schmitz
ulf.schmitz@informatik.uni-rostock.de
Bioinformatics and Systems Biology Group
www.sbi.informatik.uni-rostock.de
Outline
1. Problem description
2. Streptococcus pyogenes
3. The RNome, transcriptome
4. Characteristics of bacterial ncRNA
5. Approaches to find fRNA
6. Conclusion / Outlook
Ulf Schmitz, Computational methods to discover ncRNA
Streptococcus pyogenes
• important human pathogen (group A streptococcus or GAS)
• causes the following diseases:
– pyoderma (111 million cases/year)
– pharyngitis (616 million cases/year and 517,000 deaths/year)
• completely adapted to humans as its only natural host
• causes purulent infections of the skin and mucous membranes and, rarely, life-threatening systemic diseases
[Figures: pyoderma (source: DermNet NZ); pharyngitis (source: UCSD)]
Streptococcus pyogenes
• varies in multiplication rate -> associated with the type of infection
• to understand this regulation, growth-phase regulatory factors and gene expression were studied in response to specific environmental differences within the host
• a novel growth-phase-associated two-component-type regulator was identified
– fasBCA operon, present in all 12 tested M serotypes
– contains two potential HPK genes (fasB, fasC) and one RR gene (fasA)
– shows its maximum expression and activity at the transition phase
– potentially supports the aggressive spreading of the bacteria in the host
HPK = Histidine protein kinase
RR = response regulator
Streptococcus pyogenes
• downstream of the fas operon, a ~300 nucleotide transcript (fasX) was identified
• does not encode a peptide/protein
– but is also growth-phase related
– main effector molecule of the fas regulon
• ncRNA or fRNA
ncRNA
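[Figure: map of the fas locus, scale bar 1 kb. Labelled genes: gltX-L, fasB, fasC, fasA, fasX, rnpA-L. Labelled promoters: pfas, pfasX, prnpA. Two sites are marked tt.]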
RNome or transcriptome
RNA
– mRNA
– ncRNA / fRNA
    – structural RNA: rRNA, tRNA
    – snmRNA / sRNA: miRNA, siRNA, snRNA, snoRNA, stRNA
      putative gene expression regulators (protein-interaction and housekeeping ncRNAs were also found)
RNome or transcriptome
types of RNA:
mRNA: messenger RNA - transcript of a protein-coding gene
rRNA: ribosomal RNA - forms large parts of the ribosome, the protein-producing machinery
tRNA: transfer RNA - also involved in protein production, carrying single amino acids to the growing amino acid chain of a protein
ncRNA: non-coding RNA - found in intergenic regions, playing miscellaneous roles
Non-coding RNA (ncRNA) genes produce functional RNA molecules rather than encoding proteins, and here are the nominees:
fRNA: functional RNA - essentially synonymous with non-coding RNA
miRNA: microRNA - 21-24 nucleotide RNAs probably acting as translational regulators
siRNA: small interfering RNA - active molecules in RNA interference
snRNA: small nuclear RNA - includes spliceosomal RNAs
snmRNA: small non-mRNA - essentially synonymous with small ncRNAs
snoRNA: small nucleolar RNA - most known snoRNAs are involved in rRNA modification
stRNA: small temporal RNA - for example, lin-4 and let-7 in Caenorhabditis elegans
Functions of ncRNA
…target mRNAs via imperfect sequence complementarity
binding may result in:
• blockage of ribosome entry (translation repression)
• melting of inhibitory secondary structures (translation activation)
[Figure labels: dissolving the fold-back structure; loop-loop kissing complex]
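To make "imperfect sequence complementarity" concrete, here is a minimal Python sketch that slides an sRNA along an mRNA and reports the window with the highest fraction of Watson-Crick pairs. The function names and the ungapped scoring are illustrative assumptions, not a method from the talk. G-U wobble pairs, gaps and folding energies are ignored.

```python
# Illustrative sketch only: ungapped Watson-Crick complementarity scan.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (A, C, G, U only)."""
    return rna.translate(COMPLEMENT)[::-1]


def best_complementary_site(srna: str, mrna: str):
    """Return (start, fraction) of the mRNA window that pairs best with the sRNA.

    fraction is the share of positions that form a Watson-Crick pair;
    1.0 means perfect complementarity, lower values mean imperfect pairing.
    """
    probe = reverse_complement(srna)  # what a perfectly complementary target looks like
    best_start, best_frac = -1, 0.0
    for start in range(len(mrna) - len(probe) + 1):
        window = mrna[start:start + len(probe)]
        frac = sum(a == b for a, b in zip(window, probe)) / len(probe)
        if frac > best_frac:  # keep the first window with the highest score
            best_start, best_frac = start, frac
    return best_start, best_frac
```

For example, `best_complementary_site("AAGGCU", "CCCAGCGUUCCC")` finds a site with one mismatch out of six positions.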
Streptococcus pyogenes genomes
Serotype/strain   Length       Date
M1 GAS            1852441 bp   Sep 19 2001
MGAS10270         1928252 bp   May 4 2006
MGAS10394         1899877 bp   Aug 3 2004
MGAS10750         1937111 bp   May 4 2006
MGAS2096          1860355 bp   May 4 2006
MGAS315           1900521 bp   Jul 18 2002
MGAS5005          1838554 bp   Aug 8 2005
MGAS6180          1897573 bp   Aug 8 2005
MGAS8232          1895017 bp   Jan 31 2002
MGAS9429          1836467 bp   May 4 2006
SSI-1             1836467 bp   May 4 2006

Genome Info & Features:
Genes:            1805
Protein coding:   1697
Length:           1,852,441 nt
Structural RNAs:  73
GC content:       38%
Pseudo genes:     35
Coding:           83%
Topology:         circular
Molecule:         dsDNA
Intergenic sequence inspector (ISI)
[Flowchart: Bacterial genomes database -> annotated genome -> IGR extractor -> IGR databank -> IGR filtering -> filtered IGR databank -> BLAST -> BLAST results -> BLAST Analyser -> aligned features and sequence features -> Genview -> final results]
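The IGR extraction step can be sketched in a few lines of Python. This is a minimal illustration and not the ISI implementation: the function name, the coordinate convention (1-based, inclusive) and the default minimum length are assumptions. The genome is treated as linear, so an intergenic region spanning the origin of a circular chromosome comes back as two pieces.

```python
from typing import List, Tuple


def extract_igrs(genes: List[Tuple[int, int]],
                 genome_length: int,
                 min_length: int = 50) -> List[Tuple[int, int]]:
    """Return intergenic regions as 1-based inclusive (start, end) pairs.

    genes: annotated gene coordinates, 1-based inclusive, strand ignored.
    min_length: hypothetical cut-off; shorter gaps are dropped.
    """
    igrs = []
    prev_end = 0  # end of the right-most gene seen so far
    for start, end in sorted(genes):
        gap = start - prev_end - 1
        if gap >= min_length:
            igrs.append((prev_end + 1, start - 1))
        prev_end = max(prev_end, end)  # also handles overlapping genes
    if genome_length - prev_end >= min_length:
        igrs.append((prev_end + 1, genome_length))
    return igrs
```

For instance, `extract_igrs([(101, 400), (380, 900), (1000, 1500)], 2000)` skips the overlap between the first two genes and returns the three gaps of at least 50 nt.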
Characteristics of bacterial ncRNA
• intergenic sequence/structure conservation between related
genomes
• encoded by free-standing genes, oriented opposite to both flanking genes
• 50 to 400 nt long (average >200 nt)
• higher G+C content than average intergenic space
• σ70 promoter
• ρ-independent terminator
• imperfect sequence complementarity with target mRNA
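Two of the characteristics above, length and G+C content, are easy to turn into a first-pass filter. The Python sketch below is only an illustration under stated assumptions: the function names are hypothetical, the 50-400 nt window comes from the list above, and the mean G+C content of intergenic space (`igr_mean_gc`) has to be supplied by the caller.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def looks_like_ncrna_candidate(seq: str,
                               igr_mean_gc: float,
                               min_len: int = 50,
                               max_len: int = 400) -> bool:
    """First-pass filter: plausible length and above-average G+C content.

    igr_mean_gc is the average G+C fraction of intergenic space in the
    genome under study; it is an input here, not computed by this sketch.
    """
    return min_len <= len(seq) <= max_len and gc_content(seq) > igr_mean_gc
```

Promoter, terminator and conservation criteria are not covered by this filter and would be checked in separate steps.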
Characteristics of bacterial ncRNA
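[Figure: promoter and intrinsic terminator; subscripts give the percentage of promoters carrying that base]
Promoter:
-35 box: T82 T84 G78 A65 C54 A45
spacer: 16-19 bp
-10 box: T80 A95 T45 A60 A50 T96
spacer: 5-9 bp
start point: C A90 T
Intrinsic terminator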
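Reading the most frequent base at each position of the figure gives the hexamers TTGACA (-35) and TATAAT (-10), separated by 16-19 bp. The following Python sketch scans a DNA sequence for that arrangement. It is a simple mismatch count and not a position weight matrix. The function names and the per-box mismatch tolerance are assumptions made for illustration.

```python
MINUS_35 = "TTGACA"  # most frequent bases of the -35 box
MINUS_10 = "TATAAT"  # most frequent bases of the -10 box


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_sigma70_like_promoters(seq: str,
                                max_mismatches: int = 2,
                                spacers=range(16, 20)):
    """Return (position of -35 box, spacer length, total mismatches) tuples.

    max_mismatches is applied to each hexamer separately (hypothetical
    tolerance); positions are 0-based offsets into seq.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 5):
        mm35 = mismatches(seq[i:i + 6], MINUS_35)
        if mm35 > max_mismatches:
            continue
        for spacer in spacers:
            j = i + 6 + spacer
            box10 = seq[j:j + 6]
            if len(box10) < 6:  # ran off the end of the sequence
                break
            mm10 = mismatches(box10, MINUS_10)
            if mm10 <= max_mismatches:
                hits.append((i, spacer, mm35 + mm10))
    return hits
```

With the default tolerance the scan is permissive and will report many spurious sites in AT-rich sequence, so it is best combined with the other criteria.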
The structure approach with RNAz
The function of many ncRNAs depends on a defined secondary structure
1. multiple sequence alignment
2. measure of thermodynamic stability (z-score)
3. measure of RNA secondary structure conservation
The structure approach
Thermodynamic stability
• calculation of the MFE (minimum free energy) as a measure of thermodynamic stability
• MFE depends on the length and the base composition of the sequence
– and is therefore difficult to interpret in absolute terms
• RNAz calculates a normalized measure of thermodynamic stability by comparing
– the MFE m of a given (native) sequence
– with the MFEs of a large number of random sequences of similar length and base composition
• a z-score is calculated as
$$ z = \frac{m - \mu}{\sigma} $$
where µ and σ are the mean and standard deviation, respectively, of the MFEs of the random samples
• a negative z-score indicates that a sequence is more stable than expected by chance
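The z-score calculation can be sketched in Python as follows. This is an illustration of the sampling idea described above, not RNAz code. `toy_energy` is a deliberately crude stand-in for a real MFE predictor (it only counts Watson-Crick pairs in a single hairpin) so that the example runs without external tools; in practice one would pass in a function that wraps a folding program. Shuffling keeps length and mononucleotide composition, and the sample standard deviation is used. All of these are assumptions.

```python
import random
import statistics
from typing import Callable, Sequence

PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}


def toy_energy(seq: str) -> float:
    """Toy stand-in for an MFE: minus the number of pairs in one hairpin."""
    half = len(seq) // 2
    return -float(sum((seq[i], seq[-1 - i]) in PAIRS for i in range(half)))


def z_score(m: float, random_mfes: Sequence[float]) -> float:
    """z = (m - mu) / sigma for the native MFE m and the random-sample MFEs."""
    mu = statistics.mean(random_mfes)
    sigma = statistics.stdev(random_mfes)
    if sigma == 0:
        raise ValueError("random MFEs have zero spread; z-score undefined")
    return (m - mu) / sigma


def mfe_z_score(seq: str,
                fold_energy: Callable[[str], float] = toy_energy,
                n_random: int = 200,
                seed: int = 0) -> float:
    """Compare the native energy with energies of shuffled versions of seq."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_random):
        chars = list(seq)
        rng.shuffle(chars)  # same length, same base composition
        samples.append(fold_energy("".join(chars)))
    return z_score(fold_energy(seq), samples)
```

A designed hairpin such as `"GGGGGGAAAACCCCCC"` gets a negative z-score under the toy energy model, because shuffled versions of it pair less well.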
The structure approach
Structural conservation
• RNAz predicts a consensus secondary structure for an alignment
– results in a consensus MFE E_A
• RNAz compares this consensus MFE to the average MFE of the individual sequences Ē and calculates a structure conservation index:
$$ \mathrm{SCI} = \frac{E_A}{\bar{E}} $$
• SCI will be low if no consensus fold can be found.
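As a small worked example of the index, the Python sketch below computes the SCI from a consensus MFE and the MFEs of the individual sequences. The function name and the example energies are made up for illustration. The energies themselves would come from a folding program, which is not part of this sketch.

```python
import statistics
from typing import Sequence


def structure_conservation_index(consensus_mfe: float,
                                 individual_mfes: Sequence[float]) -> float:
    """SCI = E_A / mean(E), with all energies in the same unit (e.g. kcal/mol)."""
    mean_e = statistics.mean(individual_mfes)
    if mean_e == 0:
        raise ValueError("mean individual MFE is zero; SCI undefined")
    return consensus_mfe / mean_e


# Hypothetical energies: the consensus fold is as stable as the individual folds.
conserved = structure_conservation_index(-20.0, [-25.0, -15.0, -20.0])
# Hypothetical energies: hardly any common structure can be formed.
not_conserved = structure_conservation_index(-2.0, [-20.0, -20.0])
```

Here `conserved` is close to 1 and `not_conserved` is close to 0, matching the statement that the SCI is low when no consensus fold exists.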
The structure approach
• z-score and SCI are used to classify an alignment as
“structural RNA” or “other”.
• RNAz uses a support vector machine (SVM) learning
algorithm which is trained on a set of known ncRNAs.
Analysis pipeline of Freiburg group
[Flowchart]
1. extraction of intergenic regions ≥50 nt
2. local alignment of IGRs with BLASTN
3. E-value ≤ 10^-8? no -> discard
4. reverse complement
5. unify overlapping sequences to reduce redundancy
6. clustering of candidate sequences using ClustalW
7. scoring using RNAz
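The E-value filter and the unification of overlapping hits can be sketched in Python as below. This is a hedged illustration, not the Freiburg group's code: the `Hit` record and the function name are hypothetical, hits are assumed to lie on one sequence and one strand, and parsing of BLASTN output is left out.

```python
from typing import List, NamedTuple, Tuple


class Hit(NamedTuple):
    """Hypothetical minimal record for one BLASTN hit on an IGR."""
    start: int   # 1-based inclusive
    end: int     # 1-based inclusive
    evalue: float


def filter_and_merge(hits: List[Hit],
                     max_evalue: float = 1e-8) -> List[Tuple[int, int]]:
    """Discard hits above the E-value cut-off, then merge overlapping ones."""
    kept = sorted((h.start, h.end) for h in hits if h.evalue <= max_evalue)
    merged: List[Tuple[int, int]] = []
    for start, end in kept:
        if merged and start <= merged[-1][1]:
            # overlaps the previous interval: extend it instead of adding a new one
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

For example, two overlapping significant hits at 10-80 and 60-150 become one candidate region 10-150, while a hit with E-value 1e-3 is dropped.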
Summary / Conclusion
• there are ‘reliable’ computational methods to
find ncRNA coding genes in bacteria
• key methods involve:
– IGR extraction and filtering
– observing sequence conservation in related
genomes (BLAST search, ClustalW alignment)
– checking for structure conservation and
thermodynamic stability
• next step is to prove their existence
experimentally via microarrays or Northern
blots
Outlook
• might it be possible to predict target mRNA?
Thanks for your attention!