Signatures and terms
Trees and tree grammars
Evaluation algebras
Algebraic Dynamic Programming
Session 3
ADP Theory I: Basic Definitions
Robert Giegerich (Lecture)
Stefan Janssen (Exercises)
Faculty of Technology
Bielefeld University
Summer 2013
http://www.techfak.uni-bielefeld.de/ags/pi/lehre/ADP
robert@techfak.uni-bielefeld.de
Alphabets
An alphabet is a finite set of symbols (also called characters).
Symbols can be compared for equality, and often, there is a total
ordering defined on them (“alphabetical order”).
Examples:
- the ASCII alphabet
- the single-letter or the 3-letter IUPAC code for amino acid sequences
- {A, C, G, T} for DNA
- a finite subset of ℕ × ℕ, denoting e.g. matrix dimensions
Sequences of symbols over some alphabet are called texts, words,
sequences, strings, genes, proteins, ... depending on the
application domain.
A signature Σ over an alphabet A is a family of function declarations. It
consists of
- a name for a base set (say S), a placeholder for a yet
  unspecified data domain
- a family of function names, together with their argument and
  result types, where
  - argument types are either S or A, and
  - the result type is always S.
Term language
A signature describes a language of terms.
Terms are all well-typed formulas that can be formed from the
symbols in A and the function names of the signature.
The term language defined by Σ over A is denoted T_Σ.
If we allow variables, taken from a set V, in the terms, we speak of
T_Σ(V), a term language with variables.
Signature and term language for the edit distance problem
Alphabet: A (short for the ASCII character set)
Base set name: Ali
Simple edit distance model:
  r  : (A, Ali, A) → Ali
  d  : (A, Ali)    → Ali
  i  : (Ali, A)    → Ali
  e  : A           → Ali
For the affine gap cost model we add:
  dx : (A, Ali)    → Ali
  ix : (Ali, A)    → Ali
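The signature above can be written down as data and used to check whether a formula is a well-typed term. The following is only a sketch under assumptions that are not part of the lecture: a character is a one-letter Python string, a term is a tuple (function name, arguments...), and the names SIGNATURE, has_type and is_term are hypothetical.

```python
# Hypothetical encoding (an assumption of this sketch): a symbol of the
# alphabet A is a one-letter string; a term of type Ali is a tuple
# (function_name, arg1, ..., argn).
SIGNATURE = {
    "r":  ("A", "Ali", "A"),
    "d":  ("A", "Ali"),
    "i":  ("Ali", "A"),
    "e":  ("A",),
    "dx": ("A", "Ali"),   # affine gap model
    "ix": ("Ali", "A"),   # affine gap model
}

def has_type(term, expected):
    """Return True if `term` is well-typed with type `expected`."""
    if expected == "A":
        return isinstance(term, str) and len(term) == 1
    # expected == "Ali": the term must be a function application
    if not isinstance(term, tuple) or not term or term[0] not in SIGNATURE:
        return False
    arg_types = SIGNATURE[term[0]]
    args = term[1:]
    return len(args) == len(arg_types) and all(
        has_type(a, t) for a, t in zip(args, arg_types)
    )

def is_term(term):
    """Membership test for the term language (result type is always Ali)."""
    return has_type(term, "Ali")

empty = ("e", "$")
good = ("r", "a", ("d", "b", empty), "c")   # r(a, d(b, e($)), c)
bad = ("d", empty, "b")                      # arguments of d swapped

print(is_term(good))
print(is_term(bad))
```

Note that this check only enforces typing. A term such as r(a, dx(b, e($)), c) passes it, even though it uses dx without a preceding d.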
Some terms in the term language
e($)
d(x3, r(x4, e($), y2))
r(x1, d(x2, d(x3, r(x4, e($), y2))), y1)
r(x1, d(x2, r(x3, d(x4, e($)), y2)), y1)
r(x1, d(x2, dx(x3, r(x4, e($), y2))), y1)
r(x1, dx(x2, dx(x3, r(x4, e($), y2))), y1)
Although the function names have no meaning yet, we can think
of the last four terms as representing alternative alignments of two
strings x1 x2 x3 x4 and y1 y2.
Term languages are not specific enough
Reconsider the term
r(x1, dx(x2, dx(x3, r(x4, e($), y2))), y1)
This term should denote an alignment, but it holds a two-letter
deletion that is NOT charged a gap opening cost – as there is no
use of function d. This violates the affine gap model.
We need a way to describe specific, well-formed subsets of term
languages. Only terms built in a special way represent the objects
of interest that constitute our search space.
From terms to trees
Terms (or any formulas) can be seen as trees:
- the outermost function (or “operator”) is the root of the tree,
- its arguments are the subtrees.
Term languages can be described by (different types of) tree
grammars.
Recall:
- A formal language is a subset of A*.
- A formal tree language is a subset of T_Σ.
- Different types of grammars describe languages of different complexity.
Regular tree grammars
A regular tree grammar G over Σ has
- a set V of nonterminal symbols
- a designated symbol Z ∈ V, called the axiom
- a set of productions of the form v → t with v ∈ V and
  t ∈ T_Σ(V).
The language described by a tree grammar is
L(G) = {t | t ∈ T_Σ, Z ⇒* t}.
Derivation with tree grammars works just as it does with context-free grammars:
substituting right-hand-side trees for nonterminal symbols generates a tree.
Notions of terminal trees, leftmost derivation, derivation tree, and syntactic
ambiguity carry over. More on this after the next example.
Tree grammar for edit distance problem
V = {base, ali, del, ins}, axiom is ali.
base → A | C | G | T

ali →       r         |     d      |     i      |  e
          / | \            / \          / \        |
      base ali base     base del     ins base      $

del → ali |    dx
              /  \
           base  del

ins → ali |    ix
              /  \
            ins  base
Tree grammar in textual form
Since we cannot use trees as graphical input in programming ...
  ali = r(base, ali, base) | d(base, del) | i(ins, base) | e(char('$'))
  del = ali | dx(base, del)
  ins = ali | ix(ins, base)
... we write the right-hand sides as terms, using nonterminals (NTs) as variables.
Two-tape version:
  ali = r(<base, base>, ali) | d(<base, >, del) | i(< , base>, ins) | e()
  del = ali | dx(<base, >, del)
  ins = ali | ix(< , base>, ins)
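To see how the single-tape grammar narrows down the term language, the three productions can be turned into mutually recursive membership tests. This is a sketch only, not the lecture's implementation. It assumes terms are encoded as Python tuples (operator, children...), and it treats base as any single character other than '$'. The function names derives_ali, derives_del and derives_ins are hypothetical.

```python
# Assumed encoding for this sketch: a character is a one-letter string,
# a tree is a tuple (operator, child1, ..., childn).

def is_base(t):
    # Assumption: base matches any single character except the end marker.
    return isinstance(t, str) and len(t) == 1 and t != "$"

def op(t):
    """Operator name at the root of t, or None for a leaf."""
    return t[0] if isinstance(t, tuple) and t else None

def derives_ali(t):
    """ali = r(base, ali, base) | d(base, del) | i(ins, base) | e(char('$'))"""
    name = op(t)
    if name == "r" and len(t) == 4:
        return is_base(t[1]) and derives_ali(t[2]) and is_base(t[3])
    if name == "d" and len(t) == 3:
        return is_base(t[1]) and derives_del(t[2])
    if name == "i" and len(t) == 3:
        return derives_ins(t[1]) and is_base(t[2])
    if name == "e" and len(t) == 2:
        return t[1] == "$"
    return False

def derives_del(t):
    """del = ali | dx(base, del)"""
    if op(t) == "dx" and len(t) == 3:
        return is_base(t[1]) and derives_del(t[2])
    return derives_ali(t)

def derives_ins(t):
    """ins = ali | ix(ins, base)"""
    if op(t) == "ix" and len(t) == 3:
        return derives_ins(t[1]) and is_base(t[2])
    return derives_ali(t)

inner = ("r", "T", ("e", "$"), "T")
# r(A, d(C, dx(G, inner)), A): gap opened with d, extended with dx
opened = ("r", "A", ("d", "C", ("dx", "G", inner)), "A")
# r(A, dx(C, dx(G, inner)), A): gap extension without a gap opening
unopened = ("r", "A", ("dx", "C", ("dx", "G", inner)), "A")

print(derives_ali(opened))
print(derives_ali(unopened))
```

Both example trees are well-typed terms over the signature, but only the first is derivable from the axiom ali, because dx can only appear below a d.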
A candidate tree
  D A − R L I N G
  d m i m m m m r e−$
  − A I R L I N E
d(D, m(A, i(m(R, m(L, m(I, m(N, r(G, e($), E), N), I), L), R), I), A))
Here m is used as a synonym for r to denote replacements of a character by
itself.
The yield y of this tree is “DARLING$ENILRIA”:
y :: T_Σ → A*
y(f_r(a_1, ..., a_{n_r})) = y(a_1) · ... · y(a_{n_r})
y(a) = a   for a ∈ A
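The yield function can be sketched directly from its two defining equations. The tuple encoding of trees and the name tree_yield are assumptions of this example, not part of the lecture.

```python
def tree_yield(t):
    """Concatenate the leaf symbols of a tree from left to right."""
    if isinstance(t, str):
        return t                                   # y(a) = a
    # y(f(a_1, ..., a_n)) = y(a_1) · ... · y(a_n); t[0] is the operator name
    return "".join(tree_yield(child) for child in t[1:])

# Build the candidate d(D, m(A, i(m(R, m(L, m(I, m(N, r(G, e($), E), N), I), L), R), I), A))
t = ("r", "G", ("e", "$"), "E")
for c in "NILR":
    t = ("m", c, t, c)
candidate = ("d", "D", ("m", "A", ("i", t, "I"), "A"))

print(tree_yield(candidate))   # DARLING$ENILRIA
```

The part of the yield left of '$' is the first input string, and the part right of '$' is the second input string read backwards.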
Derivation tree versus derived candidate tree
1. In context-free (string) grammars, each derived string has a
   derivation tree (or several).
2. The derivation tree records the productions used in the
   derivation.
3. When a derivation is reconstructed from a given string, the
   derivation tree is called a parse tree.
4. When there are several derivation trees for some string, the
   grammar is ambiguous.
5. In tree grammars, a derived tree also has a derivation/parse
   tree (or several).
6. Points (2) to (4) apply in an analogous way.
A derived (candidate) tree must not be mistaken for a parse tree.
It bears no resemblance to the grammar which derived it!
Relating tree grammars to the Chomsky hierarchy
Trees and strings are best compared by drawing the tree from left
(root) to right (leaves) – a tree is a string that branches!
This is why our tree grammars are regular: they only “grow” to the
right.
They have the same decidability properties as regular string
languages. For example, ambiguity is decidable for regular tree
grammars, whereas it is undecidable for context-free grammars.
However, the yield language of a regular tree grammar is a context-free
string language!
L_y(G) = {y(t) | t ∈ T_Σ, Z ⇒* t}
Σ-algebras
Generally in mathematics,
- an algebra is a family of functions over some value domain(s),
  which satisfy a certain set of axioms
- a Σ-algebra is an algebra I that supplies a value domain S_I
  for the base set S of Σ, and a function f_I of the appropriate
  type for each function symbol f in Σ
- a Σ-algebra provides an interpretation of each t ∈ T_Σ, which
  is denoted t_I for distinction.
t_I can be computed and yields a value I(t) ∈ S_I.
Note: Mathematically, t_I and I(t) are the same object. Only computer science
people distinguish between a value and its computation.
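As an illustration, the same term can be interpreted in two different Σ-algebras with different value domains. Both algebras below (COUNT and ROWS), the function interpret, and the tuple encoding of terms are assumptions made for this sketch and do not come from the lecture.

```python
def interpret(t, algebra):
    """Compute I(t): replace each function symbol f by its function f_I."""
    if isinstance(t, str):
        return t                     # alphabet symbols stand for themselves
    name, *args = t
    return algebra[name](*(interpret(a, algebra) for a in args))

# Algebra 1: value domain S_I = int, counts the alignment columns.
COUNT = {
    "r": lambda a, z, b: 1 + z,
    "d": lambda a, z: 1 + z,
    "i": lambda z, a: 1 + z,
    "e": lambda dollar: 0,
}

# Algebra 2: value domain S_I = pairs of strings, the two alignment rows.
ROWS = {
    "r": lambda a, z, b: (a + z[0], b + z[1]),
    "d": lambda a, z: (a + z[0], "-" + z[1]),
    "i": lambda z, a: ("-" + z[0], a + z[1]),
    "e": lambda dollar: ("", ""),
}

term = ("r", "a", ("d", "b", ("e", "$")), "c")   # r(a, d(b, e($)), c)

print(interpret(term, COUNT))   # 2
print(interpret(term, ROWS))    # ('ab', 'c-')
```

The term itself stays the same in both calls. Only the algebra passed to interpret decides what the function symbols mean.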
Evaluation algebras
Algebras model the evaluation of candidates. Candidates of the
search space are represented as terms (trees), and their
interpretation yields a score value. However, in optimization
problems, we also have an objective function: Given the scores of
different candidates, which are the ones we are interested in?
Hence we define:
An evaluation algebra is a Σ-algebra augmented with an objective
function
h : [S] → [S]
Here, [S] denotes all multi-sets (or lists) with elements from S.
The objective function h is often called the choice function,
especially when it chooses optimal candidate scores, but it may
also make random selections or compute score sums, etc.
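The kinds of objective function mentioned above all share the type [S] → [S]. A minimal sketch, assuming integer scores held in Python lists, with hypothetical names h_min, h_sum and h_sample:

```python
import random

def h_min(scores):
    """Choice function: keep only the optimal (minimal) score."""
    return [min(scores)] if scores else []

def h_sum(scores):
    """Not a choice: combine all candidate scores into their sum."""
    return [sum(scores)]

def h_sample(scores, rng):
    """Random selection of one candidate score."""
    return [rng.choice(scores)] if scores else []

scores = [3, 5, 3, 14]
print(h_min(scores))                        # [3]
print(h_sum(scores))                        # [25]
print(h_sample(scores, random.Random(0)))   # one element of scores, in a list
```

Each function returns a list, so the result can again serve as input where a multi-set of scores is expected. On empty input, h_min and h_sample return the empty list.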
An evaluation algebra for the edit distance problem
operator        meaning                 interpretation: unit distance
d(a, z)         character deletion      d(a, z)    = 1 + z
i(z, a)         character insertion     i(z, a)    = 1 + z
m(a, z, a)      character match         m(a, z, a) = 0 + z  (same as r(a, z, a))
r(a, z, b)      character replacement   r(a, z, b) = z + (if a == b then 0 else 1)
e($)            empty alignment         e($)       = 0
h([z1, ...])    candidate choice        h([z1, ..., zn]) = [min [z1, ..., zn]]
In this interpretation,
d(D, m(A, i(m(R, m(L, m(I, m(N, r(G, e($), E), N), I), L), R), I), A)) = 3
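The unit distance algebra can be tried out on the candidate tree with a few lines of Python. This is a sketch under the assumption that trees are encoded as tuples (operator, children...). The names UNIT, evaluate and gaps_only are hypothetical, and the second candidate is added here only for comparison.

```python
# Unit distance algebra: one interpreting function per operator.
UNIT = {
    "d": lambda a, z: 1 + z,
    "i": lambda z, a: 1 + z,
    "m": lambda a, z, b: 0 + z,
    "r": lambda a, z, b: z + (0 if a == b else 1),
    "e": lambda dollar: 0,
}

def h(scores):
    """Objective function of the algebra: keep the minimal score."""
    return [min(scores)] if scores else []

def evaluate(t, algebra):
    """Interpret a candidate tree bottom-up in the given algebra."""
    if isinstance(t, str):
        return t
    name, *children = t
    return algebra[name](*(evaluate(c, algebra) for c in children))

# The candidate from the slide, built from the inside out.
t = ("r", "G", ("e", "$"), "E")
for c in "NILR":
    t = ("m", c, t, c)
candidate = ("d", "D", ("m", "A", ("i", t, "I"), "A"))

# A second candidate for the same two strings: delete all of DARLING,
# insert all of AIRLINE.
gaps_only = ("e", "$")
for b in reversed("AIRLINE"):
    gaps_only = ("i", gaps_only, b)
for a in reversed("DARLING"):
    gaps_only = ("d", a, gaps_only)

scores = [evaluate(candidate, UNIT), evaluate(gaps_only, UNIT)]
print(scores)      # [3, 14]
print(h(scores))   # [3]
```

Applying h to the scores of both candidates keeps the score 3 of the slide's candidate and discards the 14 of the gaps-only alignment.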
Exercise
Design signature, grammar and an evaluation algebra for
“free-shift” sequence alignment.
Sneak preview
Next session’s topics:
How to put grammars and algebras together to solve DP problems
Mathematical prerequisite for DP problems: Bellman’s Principle of
Optimality