Experimentation with DNA Grammars in Analysing Escherichia coli Promoter Sequences
Siu-wai Leung, Chris Mellish and Dave Robertson
School of Informatics, University of Edinburgh
16 October, 2008
Siu-wai Leung, Chris Mellish and Dave Robertson Experimentation with DNA Grammars

Past Study
MSc Thesis 1993
Basic Gene Grammars and DNA-ChartParser: A Simple DNA Parsing System

DNA and Base Pairs

DNA Complementarity

Transcription: DNA Makes RNA

Translation: RNA Makes Protein

DNA as a Language
- Language analogy (genetic codes)
- Textbooks & popular science books
- Formal DNA grammars

DNA Grammars
- Scientific knowledge of DNA
- Conceptual categories of DNA sequences
- Grammars for representation and reasoning
- Computational DNA grammars
- Example: E. coli promoters

Transcription and Promoters

Promoter Sequences

Knowledge Representation

Lexicon: IUB Standard Codes
Code   Base(s)        Mnemonic
A      A              Adenine
C      C              Cytosine
G      G              Guanine
T      T              Thymine
R      A or G         puRine
Y      C or T         pYrimidine
K      G or T         Keto
M      A or C         aMino
B      C or G or T    not A
D      A or G or T    not C
H      A or C or T    not G
V      A or C or G    not T
N      any base       aNy

Lexical Rules
b(a)--->[a].
b(c)--->[c].
b(g)--->[g].
b(t)--->[t].
b(r)--->[a].
b(r)--->[g].
b(y)--->[c].
b(y)--->[t].
b(k)--->[g].
b(k)--->[t].
b(m)--->[a].
b(m)--->[c].
b(b)--->[c].
b(b)--->[g].
b(b)--->[t].
b(d)--->[a].
b(d)--->[g].
b(d)--->[t].
b(h)--->[a].
b(h)--->[c].
b(h)--->[t].
b(v)--->[a].
b(v)--->[c].
b(v)--->[g].
b(n)--->[a].
b(n)--->[c].
b(n)--->[g].
b(n)--->[t].

Basic Gene Grammars
- A rule has an LHS category, an arrow, RHS categories and optional constraints
- Arrows encode direction (head: > or <) and whether RHS categories may overlap (body: === allows overlap, --- does not)

Arrow Symbol   Arrow Body   Arrow Head
--->           ---          >
===>           ===          >
<---           ---          <
<===           ===          <

Basic Gene Grammars
- Gaps: a special category

Gap Category   Lower Limit   Upper Limit
gap            0             no
gap(L1, L2)    L1            L2
gap(L1, no)    L1            no
gap(no, L2)    0             L2
gap(L)         L             L

("no" as an upper limit means the gap is unbounded.)

- Approximate pattern matching
- Variables and constraints: positions, formulae

Overlapping Categories and Relative Positions
(One category spans positions i to j, the other k to l.)
Type 1: i < k < j < l
Type 2: i < k < l < j
Type 3: i = k < l < j
Type 4: i < k < j = l
Type 5: i = k < j = l

Consensus Sequence Grammars
promoter ===> contact, conformation.
contact ---> minus_35, gap(15,19), minus_10.
minus_35 ---> b(t),b(t),b(g),b(a),b(c),b(a).
minus_10 ---> b(t),b(a),b(t),b(a),b(a),b(t).
conformation ---> . . . .

minus_10 & (ecoli,7,Score):Score>=20 ---> b(t), b(a), b(t), b(a), b(a), b(t).
scheme(ecoli, 12, -1, -9, -9).

Weight Matrix Scoring

-35 Region: base frequency (%) at each position
Position     1    2    3    4    5    6
Consensus    T    T    G    A    C    A
A           10    6    9   56   21   54
C           10    7   12   17   54   13
G           10    8   61   11    9   16
T           69   79   18   16   16   17

-10 Region: base frequency (%) at each position
Position     1    2    3    4    5    6
Consensus    T    A    T    A    A    T
A            5   76   15   61   56    6
C           10    6   11   13   20    7
G            8    6   14   14    8    5
T           77   12   60   12   15   82

Evidence Matrix
Position   A   C   G   T
1          1   0   0   0
2          0   0   0   1
3          0   1   0   0
4          0   0   0   1
5          1   0   0   0
6          0   0   1   0

$$Score = \prod_{i=1}^{m} \frac{n_i}{a_i} = \frac{n_1 \times n_2 \times \cdots \times n_{m-1} \times n_m}{a_1 \times a_2 \times \cdots \times a_{m-1} \times a_m}$$

where m is the length of the region, n_i is the frequency of the observed base at position i, and a_i is the frequency of the most common (consensus) base at position i.

Weight Matrix Grammars
freq(m10_pos1, 5/77) ---> b(a).
freq(m10_pos2, 76/76) ---> b(a).
. . .
freq(m10_pos1, 10/77) ---> b(c).
freq(m10_pos2, 6/76) ---> b(c).
. . .
freq(m10_pos1, 8/77) ---> b(g).
freq(m10_pos2, 6/76) ---> b(g).
. . .
freq(m10_pos1, 77/77) ---> b(t).
freq(m10_pos2, 12/76) ---> b(t).
. . .
minus_10 : A*B*C*D*E*F>=0.002 --->
    freq(m10_pos1,A), freq(m10_pos2,B), freq(m10_pos3,C),
    freq(m10_pos4,D), freq(m10_pos5,E), freq(m10_pos6,F).

Induced Rule Grammars
promoter <--- b(v), gap(7), b(k), b(b), b(k), gap(20), b(r).
promoter <--- b(k), gap(1), b(b), gap(2), b(d), gap(18), b(h), gap(9), b(v).
promoter <--- b(t), gap(26), b(t), gap(4), b(t).

Knowledge-based Artificial Neural Network (KBANN)
[Figure: Initial Domain Theory → Rules-to-Network Translator → Initial Neural Network → Neural Network Learning (with Training Examples) → Final Neural Network → Network-to-Rules Translator → Final Domain Theory]

KBANN Grammars
- Many extracted rules, e.g.
  minus10 if 1.5 < nt('CA---T').
- Simple match grammar:
  minus10 # X : X>1.5 ---> b(c), b(a), b(x), b(x), b(x), b(t).

Induce-Net Grammars
promoter <--- b(t), gap(1), b(b), b(h), gap(20), b(h), gap(9), b(v).
promoter <--- b(g), b(h), gap(20), b(w).
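The weight-matrix score for the -10 region can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the -10 frequencies as laid out in the Weight Matrix Scoring table, takes each a_i to be the column maximum (which matches the denominators 77 and 76 in the freq rules), and reuses the 0.002 threshold from the minus_10 rule. The names `M10`, `matrix_score` and `scan_minus_10` are hypothetical.

```python
# Hypothetical sketch of consensus-relative weight-matrix scoring for the
# -10 region; frequencies (%) per position taken from the slide table.
M10 = {
    "a": [5, 76, 15, 61, 56, 6],
    "c": [10, 6, 11, 13, 20, 7],
    "g": [8, 6, 14, 14, 8, 5],
    "t": [77, 12, 60, 12, 15, 82],
}


def matrix_score(window, matrix=M10):
    """Return the product of n_i / a_i over all positions of `window`.

    n_i is the frequency of the observed base at position i and a_i the
    frequency of the most common base there, so the consensus scores 1.0.
    """
    width = len(matrix["a"])
    if len(window) != width:
        raise ValueError(f"window must have length {width}")
    score = 1.0
    for i, base in enumerate(window.lower()):
        column_max = max(matrix[b][i] for b in "acgt")
        score *= matrix[base][i] / column_max  # KeyError for non-ACGT bases
    return score


def scan_minus_10(sequence, threshold=0.002, matrix=M10):
    """Slide a window along `sequence` and keep (start, score) pairs whose
    score meets the threshold used in the minus_10 rule."""
    width = len(matrix["a"])
    hits = []
    for start in range(len(sequence) - width + 1):
        score = matrix_score(sequence[start:start + width], matrix)
        if score >= threshold:
            hits.append((start, score))
    return hits
```

For example, `matrix_score("gataat")` differs from the consensus only at position 1 and gives 8/77 (about 0.104), well above the 0.002 threshold. Scoring relative to the column maximum keeps the consensus at exactly 1.0, which is what the freq(..., 77/77) and freq(..., 76/76) rules encode.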
DNA Parsing

Edges in Chart
<Start, Finish, Label, Arrow, Found, ToFind>
<Start, Finish, Label → Found . ToFind>
<Start, Finish, Label ⇒ Found . ToFind>
<Start, Finish, Label ← ToFind . Found>
<Start, Finish, Label ⇐ ToFind . Found>
<Start, Finish, Label ↔ AllRHSCategories>

Chart Parsing
Fundamental Rule: Left-to-Right
If the chart contains edges < i, j, A → W1 . B W2 > and < j, k, B ↔ W3 >, where A and B are categories and W1, W2 and W3 are sequences of zero or more categories, then add the edge < i, k, A → W1 B . W2 > to the agenda.

Processing Gap Categories
Gap Rule
If the chart contains edges < i, j, A → W1 . gap(Lower,Upper) B W2 > and < k, l, B ↔ W >, where Lower ≤ k − j ≤ Upper, then add the edges < i, l, A → W1 gap(Lower,Upper) B . W2 > and < j, k, gap ↔ gap > to the agenda.

Processing Overlapping Categories
Overlap Rule
If the chart contains edges < i, j, A ⇒ W1 . B W2 > and < k, l, B ↔ W3 >, where (1) A and B are categories, (2) W1, W2 and W3 are (possibly empty) sequences of categories, (3) i ≤ k ≤ j, and (4) m = max(j, l), then add the edge < i, m, A ⇒ W1 B . W2 > to the agenda.

Approximate Pattern Matching
APM Rule
If the chart contains an edge < i, j, A → W1 . X&Y W2 >, and there is a grammar rule X&(Scheme,MaxLen,Y) : constraints → W, and l is the length of the DNA subsequence matched with W after best matching, with l ≤ MaxLen, and r is the best match score of W against the sequence from j to j+l according to the scoring Scheme, and the constraints are satisfiable with Y bound to r, then add the edges < i, j+l, A → W1 X&r . W2 > and < j, j+l, X&r ↔ W > to the agenda.

Approximate Pattern Matching
APM Algorithm
(a_1 ... a_M are the symbols of the pattern W, b_1 ... b_N the bases of the DNA subsequence, σ the score given by the scoring scheme, and − a gap.)
    S(0, 0) ← 0
    for j ← 1 to N do
        S(0, j) ← S(0, j − 1) + σ(−, b_j)
    for i ← 1 to M do {
        S(i, 0) ← S(i − 1, 0) + σ(a_i, −)
        for j ← 1 to N do
            S(i, j) ← max{ S(i − 1, j − 1) + σ(a_i, b_j),
                           S(i − 1, j) + σ(a_i, −),
                           S(i, j − 1) + σ(−, b_j) }
    }
    write S(M, N)

DNA Parsing Experiment

New Sequence Data
Knowledge: seq053 seq300 seq749
Sequence Data: seq053 seq300 seq749
√ √ ? √ ? ?
e-CAT Experiment LogBook

DNA Pattern Analysis
- Computational experiments: parsing
- Physical experiments: hybridisation

DNA Hybridisation

DNA Pattern Matching

DNA Probes

Target Sequences

Target-Probe Hybridisation

Hybridisation Overview

Scanning Result

Grammars as Abstract Specification
- Grammars: knowledge/hypotheses
- Protocols
  - Material
  - Apparatus / servers
  - Method / procedure
- Experiment design
- Operationalisation
- Execution

Species Identification
n_rDNA_ITS ---> r18s, its1, r5_8s, its2, r28s.
its1_spA ---> patternA1, patternA2.
its1_spA ===> patternB1, patternB2, patternB3.
its2_spB ===> patternC1, patternC2, patternC3.
its2_spB ---> patternD1, patternD2.
patternA1 : hybridisable(probeA1,patternA1) ---> probeA1.
patternA2 : hybridisable(probeA2,patternA2) ---> probeA2.
....
probeA1 ---> "tgattacagacccagcccaatacttttctaca".
....
In collaboration with Yanbo Zhang at the University of Hong Kong

Quality Assurance of Medicinal Plants

Authentication by DNA Microarrays
In collaboration with Yanbo Zhang at the University of Hong Kong

Probe Selection Criteria
- Base composition
- Base distribution
- GC content (30-70%)
- No secondary structure
- Continuous non-target match (< 15 bp)
- Overall non-target match (< 75%)
- Other biophysical properties (e.g., Tm)
- Computation and experiments are necessary

Probe Specificity Test
In collaboration with Yanbo Zhang at the University of Hong Kong

Scientific Experimentation
- Physical or computational
- Computational sequence analysis
- Microarray toolkit design
- Need sequence knowledge & representations
- Coordination of experimental protocols
- Simpler if we use grammars?

"The existence of the experimental method makes us think we have the method of solving the problems which trouble us; though problem and method pass one another by."
Ludwig Wittgenstein
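Three of the probe selection criteria are plain sequence computations, which is one place where "computation" in the last criterion comes in. The Python sketch below checks them under stated assumptions: GC content as the fraction of G and C bases; "continuous non-target match" as the longest common substring with a non-target sequence; and "overall non-target match" as the best ungapped identity over all offsets, with both sequences given in the same orientation. The last definition is our assumption, since the slides do not define it. Secondary structure, Tm and the remaining criteria are not covered, and `probe_passes` and its helpers are hypothetical names.

```python
def gc_content(probe):
    """Fraction of G and C bases in the probe."""
    probe = probe.lower()
    return (probe.count("g") + probe.count("c")) / len(probe)


def longest_common_run(probe, other):
    """Length of the longest stretch shared exactly by both sequences
    (longest common substring, by dynamic programming)."""
    best = 0
    prev = [0] * (len(other) + 1)
    for x in probe:
        cur = [0] * (len(other) + 1)
        for j, y in enumerate(other, start=1):
            if x == y:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def best_ungapped_identity(probe, other):
    """Highest fraction of probe positions matching `other` when the probe
    slides along it without gaps (positions off the end count as mismatches).
    Assumed stand-in for "overall non-target match"."""
    n = len(probe)
    best = 0
    for offset in range(-(n - 1), len(other)):
        matches = sum(
            1
            for k in range(n)
            if 0 <= offset + k < len(other) and probe[k] == other[offset + k]
        )
        best = max(best, matches)
    return best / n


def probe_passes(probe, non_targets, gc_range=(0.30, 0.70),
                 max_run=15, max_identity=0.75):
    """Apply the GC-content and the two non-target-match criteria."""
    probe = probe.lower()
    low, high = gc_range
    if not low <= gc_content(probe) <= high:
        return False
    for other in non_targets:
        other = other.lower()
        if longest_common_run(probe, other) >= max_run:    # needs < 15 bp
            return False
        if best_ungapped_identity(probe, other) >= max_identity:  # needs < 75%
            return False
    return True
```

For instance, probeA1 from the Species Identification grammar has 13 G or C bases out of 32 (about 41%), inside the 30-70% window; whether it passes the other two checks depends on the non-target sequences supplied.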