Experimentation with DNA Grammars in Analysing Escherichia coli Promoter Sequences

Siu-wai Leung, Chris Mellish and Dave Robertson
School of Informatics, University of Edinburgh
16 October, 2008
Past Study
MSc Thesis 1993
Basic Gene Grammars and DNA-ChartParser:
A Simple DNA Parsing System
DNA and Base Pairs
DNA Complementarity
Transcription: DNA Makes RNA
Translation: RNA Makes Protein
DNA as a Language
- Language analogy (genetic codes)
- Textbooks & popular science books
- Formal DNA grammars
DNA Grammars
- Scientific knowledge of DNA
- Conceptual categories of DNA sequences
- Grammars for representation and reasoning
- Computational DNA grammars
- Example: E. coli promoters
Transcription and Promoters
Promoter Sequences
Knowledge Representation
Lexicon: IUB Standard Codes
Code   Base(s)       Mnemonic
A      A             Adenine
C      C             Cytosine
G      G             Guanine
T      T             Thymine
R      A or G        puRine
Y      C or T        pYrimidine
K      G or T        Keto
M      A or C        aMino
B      C or G or T   not A
D      A or G or T   not C
H      A or C or T   not G
V      A or C or G   not T
N      any base      aNy

Lexical Rules
b(a)--->[a].
b(c)--->[c].
b(g)--->[g].
b(t)--->[t].
b(r)--->[a].
b(r)--->[g].
b(y)--->[c].
b(y)--->[t].
b(k)--->[g].
b(k)--->[t].
b(m)--->[a].
b(m)--->[c].
b(b)--->[c].
b(b)--->[g].
b(b)--->[t].
b(d)--->[a].
b(d)--->[g].
b(d)--->[t].
b(h)--->[a].
b(h)--->[c].
b(h)--->[t].
b(v)--->[a].
b(v)--->[c].
b(v)--->[g].
b(n)--->[a].
b(n)--->[c].
b(n)--->[g].
b(n)--->[t].
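
The lexical rules above expand each IUB code into the concrete bases it can stand for. The same lookup can be sketched in Python as follows. This is only an illustration: the names `IUB_CODES`, `matches` and `pattern_matches` are assumptions made for this sketch and are not part of the original parsing system.

```python
# Each IUB code from the lexicon maps to the bases it can stand for,
# mirroring the lexical rules b(code) ---> [base].
IUB_CODES = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ag", "y": "ct", "k": "gt", "m": "ac",
    "b": "cgt", "d": "agt", "h": "act", "v": "acg",
    "n": "acgt",
}


def matches(code: str, base: str) -> bool:
    """Return True if a lexical rule b(code) ---> [base] exists."""
    return base.lower() in IUB_CODES[code.lower()]


def pattern_matches(pattern: str, sequence: str) -> bool:
    """Check a string of IUB codes against an equally long DNA sequence."""
    return len(pattern) == len(sequence) and all(
        matches(code, base) for code, base in zip(pattern, sequence)
    )
```

For example, `pattern_matches("tryn", "tgca")` holds because `r` covers `g`, `y` covers `c`, and `n` covers any base.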
Basic Gene Grammars
- LHS category, arrows, RHS categories, constraints
- Arrows: directions, overlapping

Arrow Symbol   Arrow Body   Arrow Head
--->           ---          >
===>           ===          >
<---           ---          <
<===           ===          <
Basic Gene Grammars
- Gaps: a special category

Gap Category   Lower Limit   Upper Limit
gap            0             no
gap(L1, L2)    L1            L2
gap(L1, no)    L1            no
gap(no, L2)    0             L2
gap(L)         L             L

- Approximate pattern matching
- Variables and constraints: positions, formulae
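
The gap table above can be read as a small normalisation step that turns every gap form into a lower and an upper limit. The sketch below assumes a Python representation in which the grammar's `no` is the string `"no"` and a missing upper limit becomes `None`; `gap_limits` and `gap_fits` are hypothetical helper names.

```python
from typing import Optional, Tuple

NO = "no"  # the grammar's marker for "no limit on this side"


def gap_limits(*args) -> Tuple[int, Optional[int]]:
    """Normalise gap, gap(L), gap(L1, L2), gap(L1, no) and gap(no, L2)
    into (lower, upper); upper is None when there is no upper limit."""
    if len(args) == 0:        # gap: any length
        return 0, None
    if len(args) == 1:        # gap(L): exactly L bases
        return args[0], args[0]
    lower, upper = args
    return (0 if lower == NO else lower,
            None if upper == NO else upper)


def gap_fits(length: int, *args) -> bool:
    """True if a stretch of `length` bases satisfies the gap category."""
    lower, upper = gap_limits(*args)
    return lower <= length and (upper is None or length <= upper)
```

With this reading, `gap_fits(17, 15, 19)` corresponds to asking whether 17 bases satisfy `gap(15,19)`.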
Overlapping Categories and Relative Positions
Figure: five types of overlap between a category spanning positions i to j and a category spanning positions k to l.
Type 1: i<k<j<l
Type 2: i<k<l<j
Type 3: i=k<l<j
Type 4: i<k<j=l
Type 5: i=k<j=l
Consensus Sequence Grammars
promoter ===> contact, conformation.
contact ---> minus_35, gap(15,19), minus_10.
minus_35 ---> b(t),b(t),b(g),b(a),b(c),b(a).
minus_10 ---> b(t),b(a),b(t),b(a),b(a),b(t).
conformation ---> . . . .
minus_10 & (ecoli,7,Score):Score>=20 --->
b(t), b(a), b(t), b(a), b(a), b(t).
scheme(ecoli, 12, -1, -9, -9).
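
For intuition, the exact-match part of this grammar (the `contact` rule with its two consensus hexamers and `gap(15,19)`) corresponds to an ordinary regular expression. The sketch below covers only that part: it ignores `conformation` and the approximate-matching variant of `minus_10`, and `find_contacts` is a hypothetical name.

```python
import re

# minus_35, gap(15,19), minus_10 from the contact rule, written as one regex.
# The lookahead lets candidate sites that overlap each other all be reported
# (at most one site per start position).
CONTACT = re.compile(r"(?=(ttgaca[acgt]{15,19}tataat))")


def find_contacts(sequence: str):
    """Return (start, end) spans of exact consensus contact sites."""
    seq = sequence.lower()
    return [(m.start(1), m.end(1)) for m in CONTACT.finditer(seq)]
```

A chart parser goes further than this, since it also records the intermediate categories (`minus_35`, the gap, `minus_10`) as edges that other rules can reuse.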
Weight Matrix Scoring
-35 Region

Base   T    T    G    A    C    A
A      10   6    9    56   21   54
C      10   7    12   17   54   13
G      10   8    61   11   9    16
T      69   79   18   16   16   17

-10 Region

Base   T    A    T    A    A    T
A      5    76   15   61   56   6
C      10   6    11   13   20   7
G      8    6    14   14   8    5
T      77   12   60   12   15   82

Evidence Matrix

Position   1   2   3   4   5   6
A          1   0   0   0   1   0
C          0   0   1   0   0   0
G          0   0   0   0   0   1
T          0   1   0   1   0   0

\[
\text{Score} = \prod_{i=1}^{m} \frac{n_i}{a_i}
= \frac{n_1 \times n_2 \times \cdots \times n_{m-1} \times n_m}{a_1 \times a_2 \times \cdots \times a_{m-1} \times a_m}
\]
Weight Matrix Grammars
freq(m10_pos1, 5/77) ---> b(a).
freq(m10_pos2, 76/76) ---> b(a).
. . .
freq(m10_pos1, 10/77) ---> b(c).
freq(m10_pos2, 6/76) ---> b(c).
. . .
freq(m10_pos1, 8/77) ---> b(g).
freq(m10_pos2, 6/76) ---> b(g).
. . .
freq(m10_pos1, 77/77) ---> b(t).
freq(m10_pos2, 12/76) ---> b(t).
. . .
minus_10 : A*B*C*D*E*F>=0.002 --->
freq(m10_pos1,A), freq(m10_pos2,B),
freq(m10_pos3,C), freq(m10_pos4,D),
freq(m10_pos5,E), freq(m10_pos6,F).
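
The `freq` rules and the product constraint can be mirrored in a few lines of Python. The sketch assumes that each ratio is the count of the observed base divided by the largest count at that position, which is what the ratios 5/77 and 76/76 in the rules suggest. The counts are those read from the -10 Region matrix, and `M10`, `weight_score` and `is_minus_10` are hypothetical names.

```python
# Base counts for the -10 region, one dict per position (consensus TATAAT).
M10 = [
    {"a": 5,  "c": 10, "g": 8,  "t": 77},
    {"a": 76, "c": 6,  "g": 6,  "t": 12},
    {"a": 15, "c": 11, "g": 14, "t": 60},
    {"a": 61, "c": 13, "g": 14, "t": 12},
    {"a": 56, "c": 20, "g": 8,  "t": 15},
    {"a": 6,  "c": 7,  "g": 5,  "t": 82},
]


def weight_score(site: str, matrix=M10) -> float:
    """Product over positions of n_i / a_i, where n_i is the count of the
    observed base and a_i is the largest count at that position."""
    if len(site) != len(matrix):
        raise ValueError("site length must equal matrix length")
    score = 1.0
    for base, column in zip(site.lower(), matrix):
        score *= column[base] / max(column.values())
    return score


def is_minus_10(site: str, threshold: float = 0.002) -> bool:
    """Apply the grammar's constraint A*B*C*D*E*F >= 0.002."""
    return weight_score(site) >= threshold
```

Under this reading the consensus hexamer scores exactly 1, and every departure from the consensus multiplies the score by a factor below 1.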
Induced Rule Grammars
promoter <--- b(v), gap(7) , b(k), b(b),
b(k), gap(20), b(r).
promoter <--- b(k), gap(1) , b(b), gap(2),
b(d), gap(18), b(h), gap(9),
b(v).
promoter <--- b(t), gap(26), b(t), gap(4),
b(t).
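
Because the induced rules use only IUB codes and fixed-length gaps, each rule body describes a set of strings that a regular expression can also capture. The sketch below makes that translation, assuming the direction of the `<---` arrow affects only the order in which the parser searches and not which strings match. `rule_to_regex` and the list encoding of a rule body are assumptions made for illustration.

```python
import re

IUB = {"a": "a", "c": "c", "g": "g", "t": "t", "r": "ag", "y": "ct",
       "k": "gt", "m": "ac", "b": "cgt", "d": "agt", "h": "act",
       "v": "acg", "n": "acgt"}


def rule_to_regex(rhs):
    """Translate a rule body into a compiled regex.

    rhs is a list whose items are IUB code letters (for b(code))
    or integers (for gap(L), a gap of exactly L bases)."""
    parts = []
    for item in rhs:
        if isinstance(item, int):      # gap(L): exactly L arbitrary bases
            parts.append("[acgt]{%d}" % item)
        else:                          # b(code): one base from the IUB set
            parts.append("[%s]" % IUB[item])
    return re.compile("".join(parts))


# Third induced rule: b(t), gap(26), b(t), gap(4), b(t).
RULE3 = rule_to_regex(["t", 26, "t", 4, "t"])
```

`RULE3.search(sequence)` then reports the first place in a lower-case sequence where the third rule could apply.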
Knowledge-based Artificial Neural Network (KBANN)
Figure: KBANN workflow. Initial Domain Theory → Rules-to-Network Translator → Initial Neural Network → Neural Network Learning (with Training Examples) → Final Neural Network → Network-to-Rules Translator → Final Domain Theory.
KBANN Grammars
- Many extracted rules
  minus10 if 1.5 < nt('CA---T').
- Simple match grammar
  minus10 # X : X>1.5 --->
      b(c), b(a), b(x), b(x), b(x), b(t).
Induce-Net Grammars
promoter <--- b(t), gap(1), b(b), b(h),
gap(20), b(h), gap(9),
b(v).
promoter <--- b(g), b(h), gap(20), b(w).
DNA Parsing
Edges in Chart
<Start, Finish, Label, Arrow, Found, ToFind>
<Start, Finish, Label → Found . ToFind >
<Start, Finish, Label ⇒ Found . ToFind >
<Start, Finish, Label ← ToFind . Found >
<Start, Finish, Label ⇐ ToFind . Found >
<Start, Finish, Label ↔ AllRHSCategories >
Chart Parsing
Fundamental Rule: Left-to-Right
If the chart contains edges
< i, j, A → W1 . B W2 > and
< j, k, B ↔ W3 >,
where A and B are categories and W1, W2 and W3 are sequences
of zero or more categories, then add edge:
< i, k, A → W1 B . W2 >
to the agenda.
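
The left-to-right fundamental rule can be sketched as a function over two edges. The `Edge` tuple below is a simplified stand-in for the chart's edge records (it omits the arrow type), and both `Edge` and `fundamental_rule` are hypothetical names, not the DNA-ChartParser implementation.

```python
from typing import NamedTuple, Optional, Tuple


class Edge(NamedTuple):
    start: int
    finish: int
    label: str
    found: Tuple[str, ...]     # categories already recognised (W1)
    to_find: Tuple[str, ...]   # categories still needed; empty = complete


def fundamental_rule(active: Edge, complete: Edge) -> Optional[Edge]:
    """Combine <i, j, A -> W1 . B W2> with <j, k, B <-> W3>
    into <i, k, A -> W1 B . W2>; return None if they do not fit."""
    if not active.to_find or complete.to_find:
        return None            # need one active and one complete edge
    if active.finish != complete.start:
        return None            # the edges must be adjacent at j
    if active.to_find[0] != complete.label:
        return None            # the complete edge must supply category B
    return Edge(active.start, complete.finish, active.label,
                active.found + (complete.label,), active.to_find[1:])
```

In a full parser the returned edge would be placed on the agenda and later combined with further edges in the same way.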
Processing Gap Categories
Gap Rule
If the chart contains edges
< i, j, A → W1 . gap(Lower,Upper) B W2 > and
< k, l, B ↔ W >
where Lower ≤ k – j ≤ Upper, then add edges:
< i, l, A → W1 gap(Lower,Upper) B . W2 > and
< j, k, gap ↔ gap >
to the agenda.
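
A sketch of the Gap Rule in Python follows. It assumes a gap in the to-find list is encoded as the tuple `("gap", lower, upper)`, with `None` for a missing upper limit; the `Edge` record and `gap_rule` are hypothetical simplifications rather than the original implementation.

```python
from typing import NamedTuple, Optional, Tuple


class Edge(NamedTuple):
    start: int
    finish: int
    label: str
    found: tuple      # categories already recognised
    to_find: tuple    # categories still needed; empty = complete edge


def gap_rule(active: Edge, complete: Edge) -> Optional[Tuple[Edge, Edge]]:
    """Combine <i, j, A -> W1 . gap(Lower,Upper) B W2> with <k, l, B <-> W>
    when Lower <= k - j <= Upper; return the two new edges or None."""
    if len(active.to_find) < 2 or complete.to_find:
        return None
    gap, wanted = active.to_find[0], active.to_find[1]
    if not (isinstance(gap, tuple) and gap[0] == "gap") or wanted != complete.label:
        return None
    _, lower, upper = gap
    distance = complete.start - active.finish      # k - j
    if distance < lower or (upper is not None and distance > upper):
        return None
    extended = Edge(active.start, complete.finish, active.label,
                    active.found + (gap, wanted), active.to_find[2:])
    gap_edge = Edge(active.finish, complete.start, "gap", ("gap",), ())
    return extended, gap_edge
```

Unlike the fundamental rule, the two input edges need not be adjacent; the stretch between them becomes an explicit gap edge.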
Processing Overlapping Categories
Overlap Rule
If the chart contains edges:
< i, j, A ⇒ W1 . B W2 > and
< k, l, B ↔ W3 >,
where (1) A and B are categories, (2) W1, W2 and W3 are
(possibly empty) sequences of categories, (3) i ≤ k ≤ j, and (4) m
is the maximum value of j and l, then add edge:
< i, m, A ⇒ W1 B . W2 >
to the agenda.
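
The Overlap Rule differs from the fundamental rule only in its position test and in how the new finish position is computed. A minimal Python sketch, again using a hypothetical simplified `Edge` record:

```python
from typing import NamedTuple, Optional, Tuple


class Edge(NamedTuple):
    start: int
    finish: int
    label: str
    found: Tuple[str, ...]     # categories already recognised (W1)
    to_find: Tuple[str, ...]   # categories still needed; empty = complete


def overlap_rule(active: Edge, complete: Edge) -> Optional[Edge]:
    """Combine <i, j, A => W1 . B W2> with <k, l, B <-> W3>, i <= k <= j,
    into <i, max(j, l), A => W1 B . W2>; return None if they do not fit."""
    if not active.to_find or complete.to_find:
        return None
    if active.to_find[0] != complete.label:
        return None
    if not (active.start <= complete.start <= active.finish):
        return None            # B must start inside the span found so far
    m = max(active.finish, complete.finish)
    return Edge(active.start, m, active.label,
                active.found + (complete.label,), active.to_find[1:])
```

Taking the maximum of j and l means the new edge never shrinks when B lies entirely inside the span already covered.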
Approximate Pattern Matching
APM Rule
If the chart contains an edge:
< i, j, A → W1 . X&Y W2 >
and there is a grammar rule
X&(Scheme,MaxLen,Y) : constraints → W
where l is the length of the DNA subsequence that best matches W,
l ≤ MaxLen, r is the best match score of W against the sequence
from j to j+l according to the scoring Scheme, and the constraints
are satisfiable, then add edges:
< i, j+l, A → W1 X&r . W2 > and
< j, j+l, X&r ↔ W >
to the agenda.
Approximate Pattern Matching
APM Algorithm
S(0, 0) ← 0
for j ← 1 to N do
    S(0, j) ← S(0, j − 1) + σ(−, b_j)
for i ← 1 to M do
{
    S(i, 0) ← S(i − 1, 0) + σ(a_i, −)
    for j ← 1 to N do
        S(i, j) ← max{S(i − 1, j − 1) + σ(a_i, b_j),
                      S(i − 1, j) + σ(a_i, −),
                      S(i, j − 1) + σ(−, b_j)}
}
write S(M, N)
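
The recurrence above translates directly into Python. In the sketch below the scoring function `sigma` is passed in as a parameter. `example_sigma` uses illustrative scores chosen for this example only; it is not the `ecoli` scheme from the grammar, whose parameters are not explained on the slides.

```python
def best_match_score(a: str, b: str, sigma):
    """Global alignment score S(M, N) of pattern a against sequence b.

    sigma(x, y) scores aligning symbol x with symbol y; "-" denotes a gap."""
    M, N = len(a), len(b)
    S = [[0] * (N + 1) for _ in range(M + 1)]
    for j in range(1, N + 1):                     # first row: gaps in a
        S[0][j] = S[0][j - 1] + sigma("-", b[j - 1])
    for i in range(1, M + 1):
        S[i][0] = S[i - 1][0] + sigma(a[i - 1], "-")   # first column: gaps in b
        for j in range(1, N + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + sigma(a[i - 1], b[j - 1]),  # align a_i with b_j
                S[i - 1][j] + sigma(a[i - 1], "-"),           # a_i against a gap
                S[i][j - 1] + sigma("-", b[j - 1]),           # b_j against a gap
            )
    return S[M][N]


def example_sigma(x: str, y: str) -> int:
    """Illustrative scores only: match +1, mismatch -1, gap -2."""
    if x == "-" or y == "-":
        return -2
    return 1 if x == y else -1
```

The table has (M+1) × (N+1) cells, so the running time grows with the product of the pattern length and the sequence length.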
DNA Parsing Experiment
New Sequence Data
Knowledge
seq053
seq300
seq749
Sequence Data
seq053 seq300 seq749
√
√
?
√
?
?
e-CAT Experiment LogBook
DNA Pattern Analysis
- Computational experiments: parsing
- Physical experiments: hybridisation
DNA Hybridisation
DNA Pattern Matching
DNA Probes
Target Sequences
Target-Probe Hybridisation
Hybridisation Overview
Scanning Result
Grammars as Abstract Specification
- Grammars - Knowledge/Hypotheses
- Protocols
  - Material
  - Apparatus / servers
  - Method / procedure
- Experiment design
- Operationalisation
- Execution
Species Identification
n_rDNA_ITS ---> r18s, its1, r5_8s, its2, r28s.
its1_spA ---> patternA1, patternA2.
its1_spA ===> patternB1, patternB2, patternB3.
its2_spB ===> patternC1, patternC2, patternC3.
its2_spB ---> patternD1, patternD2.
patternA1 : hybridisable(probeA1,patternA1) ---> probeA1.
patternA2 : hybridisable(probeA2,patternA2) ---> probeA2.
....
probeA1 ---> "tgattacagacccagcccaatacttttctaca".
....
In collaboration with Yanbo Zhang at the University of Hong Kong
Quality Assurance of Medicinal Plants
Authentication by DNA Microarrays
In collaboration with Yanbo Zhang at the University of Hong Kong
Probe Selection Criteria
- Base composition
- Base distribution
- GC content (30-70%)
- No secondary (2°) structure
- Continuous non-target match (< 15 bp)
- Overall non-target match (< 75%)
- Other biophysical properties (e.g., Tm)
- Computation and experiments are necessary
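
Two of the criteria above, GC content and the continuous non-target match, are simple enough to check computationally. The sketch below is an illustration under stated assumptions: it compares the probe with a non-target sequence on the same strand only (no reverse complement), and the function names are hypothetical. The remaining criteria, such as secondary structure and Tm, are not modelled.

```python
def gc_content(probe: str) -> float:
    """Percentage of G and C bases in the probe."""
    probe = probe.lower()
    return 100.0 * sum(probe.count(base) for base in "gc") / len(probe)


def longest_common_run(probe: str, non_target: str) -> int:
    """Length of the longest contiguous stretch shared by both sequences."""
    probe, non_target = probe.lower(), non_target.lower()
    best = 0
    prev = [0] * (len(non_target) + 1)
    for p in probe:
        cur = [0] * (len(non_target) + 1)
        for j, q in enumerate(non_target, start=1):
            if p == q:
                cur[j] = prev[j - 1] + 1   # extend the run ending at j - 1
                best = max(best, cur[j])
        prev = cur
    return best


def passes_basic_criteria(probe: str, non_target: str) -> bool:
    """GC content within 30-70% and continuous non-target match < 15 bp."""
    return (30.0 <= gc_content(probe) <= 70.0
            and longest_common_run(probe, non_target) < 15)
```

Checks like these can only narrow down candidate probes; as the last bullet says, the physical specificity tests remain necessary.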
Probe Specificity Test
In collaboration with Yanbo Zhang at the University of Hong Kong
Scientific Experimentation
- Physical or computational
- Computational sequence analysis
- Microarray toolkit design
- Need sequence knowledge & representations
- Coordination of experimental protocols
- Simpler if we use grammars?
"The existence of the experimental method makes us
think we have the method of solving the problems which
trouble us; though problem and method pass one another
by."
Ludwig Wittgenstein