Algebraic Dynamic Programming
Session 3
ADP Theory I: Basic Definitions

Robert Giegerich (Lecture), Stefan Janssen (Exercises)
Faculty of Technology, Bielefeld University
ADP Lecture, Summer 2013
http://www.techfak.uni-bielefeld.de/ags/pi/lehre/ADP
robert@techfak.uni-bielefeld.de

Signatures and terms | Trees and tree grammars | Evaluation algebras

Alphabets

An alphabet is a finite set of symbols (also called characters). Symbols can be compared for equality, and often there is a total ordering defined on them ("alphabetical order").

Examples:
- the ASCII alphabet
- the single-letter or the 3-letter IUPAC code for amino acid sequences
- {A, C, G, T} for DNA
- a finite subset of N × N, denoting e.g. matrix dimensions

Sequences of symbols over some alphabet are called texts, words, sequences, strings, genes, proteins, ... depending on the application domain.

A signature Σ over A is a family of function declarations. It consists of
- a name for a base set (say S), a placeholder for a yet unspecified data domain
- a family of function names, together with their argument and result types, where argument types are either S or A, and the result type is always S.

Term language

A signature describes a language of terms. Terms are all well-typed formulas that can be formed from the symbols in A and the function names of the signature.

The term language defined by Σ over A is denoted T_Σ. If we allow variables, taken from a set V, in the terms, we speak of T_Σ(V), a term language with variables.
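
The definition of well-typed terms can be made concrete with a small sketch. Everything named here is an assumption for illustration: the toy signature (`leaf`, `node`), the two-letter alphabet, and the encoding of a term as a nested Python tuple `(function_name, arg1, ...)` are not part of the lecture.

```python
# Hypothetical toy signature: each function name maps to its argument types.
# "A" = alphabet symbol, "S" = base set; the result type is always S.
SIGNATURE = {
    "leaf": ("A",),
    "node": ("S", "A", "S"),
}
ALPHABET = {"a", "b"}


def is_term(t, signature=SIGNATURE, alphabet=ALPHABET):
    """Return True if t is a well-typed term of T_Sigma (no variables)."""
    if not isinstance(t, tuple) or not t:
        return False
    name, *args = t
    arg_types = signature.get(name)
    if arg_types is None or len(args) != len(arg_types):
        return False
    for arg, arg_type in zip(args, arg_types):
        if arg_type == "A":
            # An A-typed position must hold a symbol of the alphabet.
            if not (isinstance(arg, str) and arg in alphabet):
                return False
        elif not is_term(arg, signature, alphabet):
            # An S-typed position must hold a well-typed subterm.
            return False
    return True


# node(leaf(a), b, leaf(b)) is well-typed.
print(is_term(("node", ("leaf", "a"), "b", ("leaf", "b"))))
# node(a, b, leaf(b)) is not: the first argument must be of type S.
print(is_term(("node", "a", "b", ("leaf", "b"))))
```

The check mirrors the definition directly: a term is a function name applied to the right number of arguments, each of which is either an alphabet symbol or again a term.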
Signature and term language for the edit distance problem

Alphabet: A (short for the ASCII character set)
Base set name: Ali

Simple edit distance model:
    r : (A, Ali, A) → Ali
    d : (A, Ali) → Ali
    i : (Ali, A) → Ali
    e : A → Ali

For the affine gap cost model we add:
    dx : (A, Ali) → Ali
    ix : (Ali, A) → Ali

Some terms in the term language

    e($)
    d(x3, r(x4, e($), y2))
    r(x1, d(x2, d(x3, r(x4, e($), y2))), y1)
    r(x1, d(x2, r(x3, d(x4, e($)), y2)), y1)
    r(x1, d(x2, dx(x3, r(x4, e($), y2))), y1)
    r(x1, dx(x2, dx(x3, r(x4, e($), y2))), y1)

Although the function names have (yet) no meaning, we can think of the last four terms as representing alternative alignments of two strings x1 x2 x3 x4 and y1 y2.

Term languages are not specific enough

Reconsider the term

    r(x1, dx(x2, dx(x3, r(x4, e($), y2))), y1)

This term should denote an alignment, but it holds a two-letter deletion that is NOT charged a gap opening cost, as there is no use of function d. This violates the affine gap model.

We need a way to describe specific, well-formed subsets of term languages. Only terms built in a special way represent the objects of interest that constitute our search space.
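
To see why these terms can be read as alignments, here is a minimal sketch that renders a term as two alignment rows. The nested-tuple encoding of terms and the concrete characters substituted for the variables (x = "abcd", y = "ad") are assumptions made for this example only.

```python
def alignment(term):
    """Return the (top, bottom) rows of the alignment a term denotes."""
    op, *args = term
    if op == "e":                      # e($): the empty alignment
        return "", ""
    if op == "r":                      # replacement: a over b
        a, z, b = args
        column = (a, b)
    elif op in ("d", "dx"):            # deletion: a over a gap
        a, z = args
        column = (a, "-")
    elif op in ("i", "ix"):            # insertion: a gap over b
        z, b = args
        column = ("-", b)
    else:
        raise ValueError(f"unknown function name: {op}")
    top, bottom = alignment(z)
    return column[0] + top, column[1] + bottom


# r(x1, d(x2, d(x3, r(x4, e($), y2))), y1) with x = "abcd", y = "ad"
t_d = ("r", "a", ("d", "b", ("d", "c", ("r", "d", ("e", "$"), "d"))), "a")
# r(x1, dx(x2, dx(x3, r(x4, e($), y2))), y1): same columns, but no d
t_dx = ("r", "a", ("dx", "b", ("dx", "c", ("r", "d", ("e", "$"), "d"))), "a")

print(alignment(t_d))
print(alignment(t_dx))
```

Both terms render to the same two rows, yet only the first one opens its gap with d. The term language alone cannot rule out the second one.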
From terms to trees

Terms (or any formula) can be seen as trees: the outermost function (or "operator") is the root of the tree, its arguments are the subtrees.

Term languages can be described by (different types of) tree grammars.

Recall:
- A formal language is a subset of A*.
- A formal tree language is a subset of T_Σ.
- Different types of grammars describe languages of different complexity.

Regular tree grammars

A regular tree grammar G over Σ has
- a set V of nonterminal symbols
- a designated symbol Z ∈ V, called the axiom
- a set of productions of the form v → t with v ∈ V and t ∈ T_Σ(V).

The language described by a tree grammar is

    L(G) = {t | t ∈ T_Σ, Z ⇒* t}.

Derivation with tree grammars is just as it is with context free grammars: substituting righthand-side trees for nonterminal symbols generates a tree. Notions of terminal trees, leftmost derivation, derivation tree, and syntactic ambiguity carry over. More on this after the next example.

Tree grammar for edit distance problem

V = {base, ali, del, ins}, axiom is ali.

[Figure: the productions drawn as trees; written here with each righthand-side tree as a term]

    base -> A | C | G | T
    ali  -> r(base, ali, base) | d(base, del) | i(ins, base) | e($)
    del  -> ali | dx(base, del)
    ins  -> ali | ix(ins, base)

Tree grammar in textual form

Since we cannot use trees as graphical input in programming ...

    ali = r(base, ali, base) | d(base, del) | i(ins, base) | e(char('$'))
    del = ali | dx(base, del)
    ins = ali | ix(ins, base)

... we write the righthand sides as terms, using nonterminals (NTs) as variables.

Two tape version:

    ali = r(<base, base>, ali) | d(<base, >, del) | i(< , base>, ins) | e()
    del = ali | dx(<base, >, del)
    ins = ali | ix(< , base>, ins)

A candidate tree

    D A - R L I N G
    d m i m m m m r   e($)
    - A I R L I N E

    d(D, m(A, i(m(R, m(L, m(I, m(N, r(G, e($), E), N), I), L), R), I), A))

Here m is used as a synonym for r to denote replacements of a character by itself.

The yield (y) of this tree is "DARLING$ENILRIA".

    y :: T_Σ → A*
    y(f_r(a_1, ..., a_{n_r})) = y(a_1) · ... · y(a_{n_r})
    y(a) = a

Derivation tree versus derived candidate tree

1. In context free (string) grammars, each derived string has a derivation tree (or several).
2. The derivation tree records the productions used in the derivation.
3. When a derivation is reconstructed from a given string, the derivation tree is called a parse tree.
4. When there are several derivation trees for some string, the grammar is ambiguous.
5. In tree grammars, a derived tree also has a derivation/parse tree (or several).
6. Points (2) to (4) apply in an analogous way.

A derived (candidate) tree must not be mistaken for a parse tree. It bears no resemblance to the grammar which derived it!

Relating tree grammars to the Chomsky hierarchy

Trees and strings are best compared by drawing the tree from left (root) to right (leaves): a tree is a string that branches!

This is why our tree grammars are regular: they only "grow" to the right. They have the same decidability properties as regular string languages. For example, ambiguity is decidable for regular tree languages, whereas it is undecidable for context free languages.
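
The single-tape grammar and the yield function y from the preceding slides can be sketched in code. This is an illustration under assumptions, not the ADP implementation: terms are nested tuples, the dictionary `GRAMMAR` and the helpers `derives` and `tree_yield` are hypothetical names, `base` accepts any single character other than `$`, and m is written as r.

```python
# Each nonterminal maps to its alternatives. A plain string is a chain
# production (e.g. del -> ali); a tuple is (function name, argument NTs...).
GRAMMAR = {
    "ali": [("r", "base", "ali", "base"), ("d", "base", "del"),
            ("i", "ins", "base"), ("e", "$")],
    "del": ["ali", ("dx", "base", "del")],
    "ins": ["ali", ("ix", "ins", "base")],
}


def derives(nt, t):
    """Return True if the tree t can be derived from nonterminal nt."""
    if nt == "$":
        return t == "$"
    if nt == "base":                   # assumption: any character except '$'
        return isinstance(t, str) and len(t) == 1 and t != "$"
    for rhs in GRAMMAR[nt]:
        if isinstance(rhs, str):
            if derives(rhs, t):
                return True
        elif (isinstance(t, tuple) and len(t) == len(rhs) and t[0] == rhs[0]
              and all(derives(n, s) for n, s in zip(rhs[1:], t[1:]))):
            return True
    return False


def tree_yield(t):
    """y(a) = a for symbols; y(f(a_1..a_n)) = y(a_1) . ... . y(a_n)."""
    if isinstance(t, str):
        return t
    return "".join(tree_yield(s) for s in t[1:])


# The candidate tree for DARLING / AIRLINE (m written as r).
darling = ("d", "D", ("r", "A", ("i", ("r", "R", ("r", "L", ("r", "I",
           ("r", "N", ("r", "G", ("e", "$"), "E"), "N"), "I"), "L"), "R"),
           "I"), "A"))
# A gap extension without a preceding gap opening: not in L(G).
bad = ("r", "a", ("dx", "b", ("dx", "c", ("r", "d", ("e", "$"), "d"))), "a")

print(derives("ali", darling), tree_yield(darling))
print(derives("ali", bad))
```

The recognizer accepts the candidate tree and rejects the term with two dx and no d, which is the kind of restriction the plain term language could not express.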
However, the yield language of a regular tree grammar is a context free string language!

    L_y(G) = {y(t) | t ∈ T_Σ, Z ⇒* t}

Σ-algebras

- Generally in mathematics, an algebra is a family of functions over some value domain(s), which satisfy a certain set of axioms.
- A Σ-algebra is an algebra I that supplies a value domain S_I for the base set S of Σ, and a function f_I of the appropriate type for each function symbol f in Σ.
- A Σ-algebra provides an interpretation of each t ∈ T_Σ, which is denoted t_I for distinction. t_I can be computed and yields a value I(t) ∈ S_I.

Note: Mathematically, t_I and I(t) are the same object. Only computer science people distinguish between a value and its computation.

Evaluation algebras

Algebras model the evaluation of candidates. Candidates of the search space are represented as terms (trees), and their interpretation yields a score value.
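
As a sketch of how a Σ-algebra interprets a term, the function below replaces every function symbol f by f_I and evaluates bottom-up. The two algebras are invented for illustration and do not come from the lecture: `COLUMNS` uses integers as S_I and counts alignment columns, `SCRIPT` uses strings as S_I and spells out an edit script. Terms are again encoded as nested tuples.

```python
def interpret(algebra, t):
    """Compute I(t): alphabet symbols stand for themselves, and each
    function symbol f is replaced by the algebra's function f_I."""
    if isinstance(t, str):
        return t
    f, *args = t
    return algebra[f](*(interpret(algebra, a) for a in args))


# Hypothetical algebra 1: S_I = int, the number of alignment columns.
COLUMNS = {
    "r": lambda a, z, b: 1 + z,
    "d": lambda a, z: 1 + z,
    "i": lambda z, a: 1 + z,
    "e": lambda dollar: 0,
}

# Hypothetical algebra 2: S_I = str, an edit script
# (M = match, R = replace, D = delete, I = insert).
SCRIPT = {
    "r": lambda a, z, b: ("M" if a == b else "R") + z,
    "d": lambda a, z: "D" + z,
    "i": lambda z, a: "I" + z,
    "e": lambda dollar: "",
}

# r(a, d(b, d(c, r(d, e($), d))), a)
term = ("r", "a", ("d", "b", ("d", "c", ("r", "d", ("e", "$"), "d"))), "a")

print(interpret(COLUMNS, term))
print(interpret(SCRIPT, term))
```

The same term receives a different value under each algebra, which is the sense in which the signature is only a placeholder for a yet unspecified data domain.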
However, in optimization problems, we also have an objective function: given the scores of different candidates, which are the ones we are interested in?

Hence we define: An evaluation algebra is a Σ-algebra augmented with an objective function

    h : [S] → [S]

Here, [S] denotes all multi-sets (or lists) with elements from S.

The objective function h is often called the choice function, especially when it chooses optimal candidate scores, but it may also make a random selection, compute score sums, etc.

An evaluation algebra for the edit distance problem

    operator            meaning                  interpretation (unit distance)
    d(a, z)             character deletion       1 + z
    i(z, a)             character insertion      1 + z
    m(a, z, a)          character match          0 + z   (same as r(a, z, a))
    r(a, z, b)          character replacement    z + (if a == b then 0 else 1)
    e($)                empty alignment          0
    h([z1, ..., zn])    candidate choice         [min[z1, ..., zn]]

In this interpretation,

    d(D, m(A, i(m(R, m(L, m(I, m(N, r(G, e($), E), N), I), L), R), I), A)) = 3

Exercise

Design a signature, a grammar and an evaluation algebra for "free-shift" sequence alignment.

Sneak preview

Next session's topics:
- How to put grammars and algebras together to solve DP problems
- Mathematical prerequisite for DP problems: Bellman's Principle of Optimality
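
Looking back at the unit distance algebra of this session, the following sketch evaluates the DARLING / AIRLINE candidate and applies the choice function to two candidate scores. Assumptions made for this example: terms are nested tuples, `interpret`, `UNIT` and the second candidate `worse` are hypothetical names and data, and returning an empty list for an empty input to h is a choice the slides do not specify.

```python
def interpret(algebra, t):
    """Evaluate a term bottom-up, replacing each f by the algebra's f_I."""
    if isinstance(t, str):
        return t
    f, *args = t
    return algebra[f](*(interpret(algebra, a) for a in args))


# Unit distance interpretation, following the table of the slide.
UNIT = {
    "d": lambda a, z: 1 + z,
    "i": lambda z, a: 1 + z,
    "m": lambda a, z, b: 0 + z,
    "r": lambda a, z, b: z + (0 if a == b else 1),
    "e": lambda dollar: 0,
}


def h(scores):
    """Objective function h : [S] -> [S], here minimization."""
    return [min(scores)] if scores else []   # empty case is an assumption


# The candidate from the slide.
cand = ("d", "D", ("m", "A", ("i", ("m", "R", ("m", "L", ("m", "I",
        ("m", "N", ("r", "G", ("e", "$"), "E"), "N"), "I"), "L"), "R"),
        "I"), "A"))
# A hypothetical alternative with the same yield: delete D and A,
# then insert A and I before matching RLIN and replacing G by E.
worse = ("d", "D", ("d", "A", ("i", ("i", ("m", "R", ("m", "L", ("m", "I",
         ("m", "N", ("r", "G", ("e", "$"), "E"), "N"), "I"), "L"), "R"),
         "I"), "A")))

scores = [interpret(UNIT, c) for c in (cand, worse)]
print(scores)      # scores of the two candidates
print(h(scores))   # the choice function keeps the minimal score
```

Here the candidates are enumerated by hand. How to combine a grammar with such an algebra so that the search space is generated and evaluated systematically is the topic announced for the next session.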