Algebraic Dynamic Programming
Session 3
ADP Theory I: Basic Definitions

Mathias Möhl, Rolf Backofen (Lecture)
Daniel Maticzka, Sita Lange (Exercises)
Department of Computer Science
Albert-Ludwigs-University Freiburg
Summer 2010
http://www.bioinf.uni-freiburg.de/Lehre/Courses/2010 SS/V ADP
mmohl@informatik.uni-freiburg.de.de

Outline: Signatures and terms; Trees and tree grammars; Evaluation algebras

Alphabets

An alphabet A is a finite set of symbols (also called characters). Symbols can be compared for equality, and often there is a total ordering defined on them ("alphabetical order").

Examples:
- the ASCII alphabet
- the single-letter or the 3-letter IUPAC code for amino acid sequences
- {A, C, G, T} for DNA
- a finite subset of N × N, denoting e.g. matrix dimensions

Sequences of symbols over some alphabet are called texts, words, sequences, strings, genes, proteins, ..., depending on the application domain.

A signature Σ over A is a family of function declarations. There is
- a name for a base set (say S), merely a placeholder for a yet unspecified data domain
- a family of function names, together with their argument and result types
  - argument types are either S or A
  - result type is always S

Term language

A signature describes a language of terms. Terms are all well-typed formulas that can be formed from the symbols in A and the function names of the signature.
The term language defined by Σ over A is denoted T_Σ. Subsets of T_Σ are also term languages. If we allow variables, taken from a set V, in the terms, we speak of T_Σ(V), a term language with variables.

Signature and term language for the edit distance problem

Alphabet: A
Base set name: Ali

Simple edit distance model:
  r : (A, Ali, A) → Ali
  d : (A, Ali) → Ali
  i : (Ali, A) → Ali
  e : A → Ali

For the affine gap cost model we add:
  dx : (A, Ali) → Ali
  ix : (Ali, A) → Ali

Some terms in the term language

  e($)
  d(x3, r(x4, e($), y2))
  r(x1, d(x2, d(x3, r(x4, e($), y2))), y1)
  r(x1, d(x2, r(x3, d(x4, e($)), y2)), y1)
  r(x1, d(x2, dx(x3, r(x4, e($), y2))), y1)
  r(x1, dx(x2, dx(x3, r(x4, e($), y2))), y1)

Although the function names do not (yet) have a meaning, we can think of the last four terms as representing alternative alignments of two strings x1 x2 x3 x4 and y1 y2. More generally: terms can describe candidates in a DP problem.

From terms to trees

Terms (or any formula) can be seen as trees: the outermost function (or "operator") is the root of the tree, its arguments are the subtrees.
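The tree view of terms can be made concrete with a small sketch. It is only an illustration under stated assumptions, not part of the lecture material: terms are encoded as nested Python tuples `(name, arg1, ...)`, alphabet symbols as plain strings, and the helper names `SIGNATURE`, `well_typed` and `show_tree` are hypothetical. The signature table is copied from the edit distance signature given earlier.

```python
# Signature from the slides: function name -> argument sorts.
# "A" marks an alphabet symbol, "Ali" a subterm; every result has sort Ali.
SIGNATURE = {
    "r": ("A", "Ali", "A"),
    "d": ("A", "Ali"),
    "i": ("Ali", "A"),
    "e": ("A",),
    "dx": ("A", "Ali"),
    "ix": ("Ali", "A"),
}


def well_typed(term):
    """Return True if `term` is a well-typed term of sort Ali."""
    if not isinstance(term, tuple) or not term or term[0] not in SIGNATURE:
        return False
    name, *args = term
    sorts = SIGNATURE[name]
    if len(args) != len(sorts):
        return False
    # Assumption: any Python string counts as a symbol of the alphabet A.
    return all(
        isinstance(arg, str) if sort == "A" else well_typed(arg)
        for sort, arg in zip(sorts, args)
    )


def show_tree(term, depth=0):
    """Render a term as an indented tree, root first, one node per line."""
    pad = "  " * depth
    if isinstance(term, str):
        return pad + term
    name, *args = term
    return "\n".join([pad + name] + [show_tree(a, depth + 1) for a in args])


term = ("d", "x3", ("r", "x4", ("e", "$"), "y2"))
print(well_typed(term))          # True
print(well_typed(("d", "x3")))   # False: d needs two arguments
print(show_tree(term))
```

The root of the printed tree is the outermost function `d`; each deeper indentation level corresponds to one level of subtrees.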
Examples: the terms e($) and d(x3, r(x4, e($), y2))

Term languages T_Σ are not specific enough

Reconsider the term
  r(x1, dx(x2, dx(x3, r(x4, e($), y2))), y1)
This term looks like an alignment, but it should not be a candidate: it holds a two-letter deletion that is NOT charged a gap opening cost, as there is no use of function d. This violates the affine gap model.

We need a way to describe specific, well-formed subsets of term languages. Only terms built in a special way represent the objects of interest that constitute our search space.
→ use tree grammars

Term languages via tree grammars

Term languages can be described by (different types of) tree grammars.

Recall:
- A formal language (for example a context-free language or a regular language) is a subset of A*.
- A formal tree language is a subset of T_Σ.

Different types of grammars describe languages of different complexity (for example regular or context-free).

Regular tree grammars

A regular tree grammar G over Σ has
- a set V of nonterminal symbols
- a designated symbol Z ∈ V, called the axiom
- a set of productions of the form v → t with v ∈ V and t ∈ T_Σ(V).

The language described by a tree grammar G is L(G) = { t ∈ T_Σ | Z ⇒* t }.

Derivation with tree grammars is just as it is with context-free grammars: substituting right-hand-side trees for nonterminal symbols generates a tree.
Notions of terminal trees, leftmost derivation, derivation tree, and syntactic ambiguity carry over. More on this after the next example.
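To illustrate how a regular tree grammar can single out the well-formed subset of T_Σ, here is a minimal sketch of a membership test Z ⇒* t. Everything in it is an assumption made for illustration: the grammar (nonterminals `alignment`, `xDel`, `xIns`) is one plausible grammar for the affine gap model and is not taken from this section, `CHAR` is a shorthand for "any alphabet symbol", and terms are encoded as nested tuples `(name, arg1, ...)` with strings as alphabet symbols.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NT:
    """A nonterminal symbol occurring inside a right-hand side."""
    name: str


CHAR = object()  # wildcard leaf: matches any alphabet symbol (assumption)

# Hypothetical grammar for the affine gap model. Each production v -> t
# is stored as a pattern t over function names, CHAR, literals and NTs.
AXIOM = "alignment"
GRAMMAR = {
    "alignment": [
        ("e", "$"),
        ("r", CHAR, NT("alignment"), CHAR),
        ("d", CHAR, NT("xDel")),        # d opens a deletion gap
        ("i", NT("xIns"), CHAR),        # i opens an insertion gap
    ],
    "xDel": [NT("alignment"), ("dx", CHAR, NT("xDel"))],   # dx extends
    "xIns": [NT("alignment"), ("ix", NT("xIns"), CHAR)],   # ix extends
}


def matches(pattern, term):
    """True if `term` can be derived from the right-hand side `pattern`."""
    if pattern is CHAR:
        return isinstance(term, str)
    if isinstance(pattern, NT):
        return derives(pattern.name, term)
    if isinstance(pattern, str):        # literal alphabet symbol such as "$"
        return term == pattern
    return (
        isinstance(term, tuple)
        and len(term) == len(pattern)
        and term[0] == pattern[0]
        and all(matches(p, t) for p, t in zip(pattern[1:], term[1:]))
    )


def derives(nonterminal, term):
    """True if nonterminal =>* term using the productions in GRAMMAR."""
    return any(matches(p, term) for p in GRAMMAR[nonterminal])


def in_language(term):
    return derives(AXIOM, term)


inner = ("r", "x4", ("e", "$"), "y2")
good = ("r", "x1", ("d", "x2", ("dx", "x3", inner)), "y1")
bad = ("r", "x1", ("dx", "x2", ("dx", "x3", inner)), "y1")
print(in_language(good))  # True: the gap is opened by d, then extended by dx
print(in_language(bad))   # False: dx occurs without a preceding d
```

Under this assumed grammar, the term with two `dx` and no `d` is well typed with respect to the signature but is not derivable from the axiom, which is exactly the kind of restriction a plain term language cannot express.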