Learning Residual Finite-State Automata Using Observation Tables
Anna Kasprzik
Technical report 09-3
FB IV Informatik, University of Trier, D-54286 Trier
kasprzik@informatik.uni-trier.de
Key words: Regular languages, non-determinism, Myhill-Nerode equivalence
relation, residual languages, observation tables, grammatical inference
1 Introduction
The area of grammatical inference studies the problem of how to infer – or “learn” –
a description of a formal language (e.g., a grammar or an automaton) algorithmically from given examples or other kinds of information sources.
A range of conceivable learning settings has been formulated, and based on
those quite a number of learning algorithms have been developed. One of the
best studied classes with respect to this conception of learnability is the class of
regular languages in the Chomsky Hierarchy.
A significant part of the algorithms that have been developed, of which Angluin’s seminal algorithm L∗ for regular string languages [1] was one of the first,
use the concept of an observation table. In an observation table the rows are
labeled by elements from some set that can be taken as prefixes, the columns
are labeled by elements from some set that can be taken as suffixes, and each
cell of the table contains a Boolean value indicating the membership status of
the concatenation of the two labeling elements with respect to the language L
the learner is trying to infer (if known from the available information). If the
table fulfils certain conditions then we can immediately derive a deterministic
finite-state automaton (DFA) from it, and if the given information is sufficient
then this automaton is isomorphic to the minimal DFA AL for L. The states
of AL correspond exactly to the equivalence classes of L under the syntactic
congruence relation on which the Myhill-Nerode theorem (see for example [2])
is based. The elements labeling the rows of the table from which AL is derived
can be seen as representatives for these equivalence classes, and the elements
labeling the columns as experiments by which it is possible to prove that two
of those representatives belong to different equivalence classes and should thus
represent different states of AL .
Unfortunately there is a price to pay for the uniqueness of the minimal DFA:
It can have exponentially more states than a minimal non-deterministic finite-state automaton (NFA) for the same language, so that it seems worthwhile to consider
the question of whether we should not rather try to infer an automaton of the latter kind.
Denis, Lemay, and Terlutte [3] introduce a special case of NFA, so-called residual finite-state automata (RFSAs), where each state represents a residual language of
the language that is recognized by the automaton (for the definition of residual
language see Section 2). And fortunately, there is a unique minimal RFSA RL
for every regular string language L. The same authors present several learning
algorithms for RFSAs (see [4–6]), which, however, all work by adding or deleting
states in an automaton according to the information contained in a given sample.
We define a new learning algorithm for regular string languages based on
the concept of an observation table by first using an arbitrary existing appropriate algorithm in order to establish a table for the reversal of the unknown
language L that is to be inferred and then showing that it is possible to derive
the minimal RFSA RL for L itself from this table after some simple necessary
modifications. By thus exploiting the complementarity of prefixes and suffixes,
viz. residual languages and equivalence classes under the syntactic congruence
relation, we would like to emphasize the universality of the concept of an observation table which can be used as a basis for the formulation of any kind of
inference algorithm (i.e., in any kind of learning setting) founded on the Myhill-Nerode theorem without having to resort to a complicated or very specialized
underlying mechanism such as an automaton. The ultimate goal is to illuminate
the different aspects of the area of Grammatical Inference and the notions used
in it from as many angles as possible in order to eventually be able to set it on
a better, even more general theoretical foundation.
2 Basic Notions and Definitions
Definition 1. An observation table is a triple T = (S, E, obs) with S, E ⊆ Σ∗
finite, non-empty for some alphabet Σ and obs : S × E −→ {0, 1} a function with
obs(s, e) = 1 if se ∈ L is confirmed, and obs(s, e) = 0 if se ∉ L is confirmed.
For an observation table T = (S, E, obs) and s ∈ S, the observed behaviour of s
is row(s) := {(e, obs(s, e)) | e ∈ E}. Let row(S) denote the set {row(s) | s ∈ S}.
S is partitioned into two sets red and blue where uv ∈ red ⇒ u ∈ red for
u, v ∈ Σ ∗ (prefix-closedness), and blue := {sa ∈ S \ red|s ∈ red, a ∈ Σ}, i.e.,
blue contains the one-symbol extensions of red elements that are not in red.
Definition 2. Two elements r, s ∈ S are obviously different (denoted by r <> s)
iff ∃e ∈ E such that obs(r, e) ≠ obs(s, e).
Definition 3. Let T = (S = red ∪ blue, E, obs) be an observation table.
– T is closed iff ¬∃s ∈ blue : ∀r ∈ red : r <> s.
– T is consistent iff ∀s1 , s2 ∈ red, s1 a, s2 a ∈ S, a ∈ Σ : row(s1 ) = row(s2 ) ⇒
row(s1 a) = row(s2 a).
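As an illustration, Definitions 1–3 can be sketched in a few lines of Python. The target language L (strings over {a, b} ending in “a”) and the concrete sets red, blue, and E are assumptions chosen for the example, not part of the report:

```python
# Sketch of Definitions 1-3. The target language L (strings over {a, b}
# ending in "a") and the sets below are illustrative assumptions.

def in_L(w):                       # membership oracle for the example L
    return w.endswith("a")

def obs(s, e):                     # obs(s, e) = 1 iff se is in L
    return 1 if in_L(s + e) else 0

def row(s, E):                     # observed behaviour of s
    return frozenset((e, obs(s, e)) for e in E)

def obviously_different(r, s, E):  # r <> s iff some context separates them
    return any(obs(r, e) != obs(s, e) for e in E)

def is_closed(red, blue, E):       # no blue row differs from all red rows
    return all(any(not obviously_different(r, s, E) for r in red)
               for s in blue)

def is_consistent(red, S, E, sigma):
    return all(row(s1 + a, E) == row(s2 + a, E)
               for s1 in red for s2 in red
               if row(s1, E) == row(s2, E)
               for a in sigma if s1 + a in S and s2 + a in S)

sigma = ["a", "b"]
red, blue = ["", "a"], ["b", "aa", "ab"]
S, E = red + blue, ["", "a"]
print(is_closed(red, blue, E), is_consistent(red, S, E, sigma))  # True True
```

For this table, row("b") coincides with row("") and row("aa") with row("a"), so the table is closed; the two red rows are obviously different, so consistency holds vacuously.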
Definition 4. A finite-state automaton is a tuple A = (Σ, Q, Q0, F, δ) with
finite input alphabet Σ, finite non-empty state set Q, set of start states Q0 ⊆ Q,
set of accepting states F ⊆ Q, and a transition function δ : Q × Σ −→ 2^Q. If Q0
is a singleton and δ maps a set containing at most one state to any pair from
Q × Σ the automaton is deterministic (a DFA), otherwise non-deterministic (a
NFA). If δ maps a non-empty set of states to every pair in Q × Σ the automaton
is total, otherwise partial. The transition function can always be extended to
δ : Q × Σ∗ −→ 2^Q with δ(q, ε) = {q} and δ(q, wa) = δ(δ(q, w), a) for q ∈ Q,
a ∈ Σ, w ∈ Σ∗. Let δ(Q′, w) denote the set ∪{δ(q, w) | q ∈ Q′} for Q′ ⊆ Q,
w ∈ Σ∗. A state q ∈ Q is reachable if there is a string w ∈ Σ∗ such that
q ∈ δ(Q0, w). A state q ∈ Q is useful if there are strings w1, w2 ∈ Σ∗ such that
q ∈ δ(Q0, w1) and δ(q, w2) ∩ F ≠ ∅, otherwise useless. The language accepted by
A is L(A) = {w ∈ Σ∗ | δ(Q0, w) ∩ F ≠ ∅} and is called recognizable or regular.
From an observation table T = (red∪blue, E, obs) with ε ∈ E we can derive an
automaton AT = (Σ, QT , QT 0 , FT , δT ) with QT = row(red), QT 0 = {row(ε)},
FT = {row(s)|obs(s, ε) = 1, s ∈ red}, and δT (row(s), a) = {q ∈ QT |¬(q <>
row(sa)), s ∈ red, a ∈ Σ, sa ∈ S}. If T is consistent AT is deterministic (this
follows straight from the definition of δT – if T is not consistent there is at least
one pair in QT × Σ to which δT maps a set containing more than one state).
The DFA for a regular language L derived from a closed and consistent table
has the minimal number of states (see [1], Theorem 1). This automaton is also
called the canonical DFA AL for L and is unique up to a bijective renaming of
states. However, if AL is required to be total it contains a “failure state” for
all strings that are not a prefix of some string in L (if there are any, which for
example is not the case for L = Σ∗ \ X for any finite set X); otherwise this state
does not have to appear at all, or does not have to receive all such strings.
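A minimal sketch of deriving AT from a closed and consistent table, again for an assumed example language (strings over {a, b} ending in “a”); rows are encoded as tuples of observations over E:

```python
# Sketch of deriving A_T from a closed, consistent table. The example
# language (strings over {a, b} ending in "a") is an assumption; rows
# are encoded as tuples of observations over E.

def obs(s, e):
    return 1 if (s + e).endswith("a") else 0

def row(s, E):
    return tuple(obs(s, e) for e in E)

sigma = ["a", "b"]
red = ["", "a"]          # pairwise obviously different representatives
E = ["", "a"]

states = {row(s, E) for s in red}                 # Q_T = row(red)
start = row("", E)                                # start state row(eps)
finals = {row(s, E) for s in red if obs(s, "") == 1}
delta = {(row(s, E), a): row(s + a, E) for s in red for a in sigma}

def accepts(w):
    q = start
    for a in w:
        q = delta[(q, a)]
    return q in finals

print([w for w in ["", "a", "ab", "ba", "aa"] if accepts(w)])  # ['a', 'ba', 'aa']
```

Since the table is consistent, each (state, symbol) pair gets exactly one successor row, so the derived automaton is deterministic, as stated above.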
The Myhill-Nerode equivalence relation ≡L is defined as follows: r ≡L s
iff re ∈ L ⇔ se ∈ L for all r, s, e ∈ Σ∗. The index of L is IL := |{[s′]L | s′ ∈
Σ∗}| where [s′]L denotes the equivalence class containing s′. The Myhill-Nerode
theorem (see for example [2]) states that IL is finite iff L is a regular language,
i.e., iff it can be recognized by a finite-state automaton. The total canonical DFA
AL has exactly IL states, and each state can be seen to represent an equivalence
class under ≡L . Note the correspondence between the table T = (S, E, obs)
representing AL and the equivalence classes of L under ≡L (reflected by the
symbols S and E): S contains strings whose rows are candidates for states in
AL , and the elements of E – ‘contexts’, as we will call them – can be taken
as experiments which prove that two strings in S do indeed belong to different
equivalence classes and that thus their rows should represent two different states.
Definition 5. The reversal w̄ of a string w ∈ Σ∗ is defined inductively by ε̄ := ε
and by letting the reversal of aw be w̄a for a ∈ Σ, w ∈ Σ∗. The reversal of a set X ⊆ Σ∗ is defined as
X̄ := {w̄ | w ∈ X}. The reversal of an automaton A = (Σ, Q, Q0, F, δ) is defined
as Ā := (Σ, Q, F, Q0, δ̄) with δ̄(q′, w) = {q ∈ Q | q′ ∈ δ(q, w̄)} for q′ ∈ Q, w ∈ Σ∗.
Obviously, L(Ā) is the reversal of L(A). Note that due to the duality of left and right congruence the reversal of a regular string language is a regular language as well.
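Automaton reversal as in Definition 5 can be sketched as follows: swap start and accepting states and invert every transition. The dictionary-based NFA encoding and the concrete example automaton are assumptions of the sketch:

```python
# Sketch of automaton reversal (Definition 5): swap start and accepting
# states and invert every transition. The dict-based NFA encoding and
# the concrete automaton are assumptions of this sketch.

def reverse_nfa(Q0, F, delta):
    rdelta = {}
    for (q, a), succs in delta.items():
        for p in succs:
            rdelta.setdefault((p, a), set()).add(q)
    return set(F), set(Q0), rdelta      # new starts, new finals, reversed delta

def accepts(Q0, F, delta, w):
    current = set(Q0)
    for a in w:
        current = set().union(*(delta.get((q, a), set()) for q in current))
    return bool(current & F)

# example NFA accepting the strings over {a, b} that start with "a"
delta = {(0, "a"): {1}, (1, "a"): {1}, (1, "b"): {1}}
rQ0, rF, rdelta = reverse_nfa({0}, {1}, delta)
print(accepts(rQ0, rF, rdelta, "ba"))   # the reversal accepts strings ending in "a"
```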
Definition 6. The residual language of a language L ⊆ Σ∗ with regard to a
string w ∈ Σ∗ is defined as w−1 L := {v ∈ Σ∗ | wv ∈ L}. A residual language
w−1 L is prime iff ∪{v −1 L | v −1 L ⊊ w−1 L} ⊊ w−1 L, otherwise composed.
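For a finite language every residual language is a finite set, so primality can be checked directly. The toy language below is an assumption chosen so that it has both prime and composed residuals:

```python
# Sketch of Definition 6 for an assumed finite toy language, where every
# residual language is a finite set and primality can be checked directly.

L = {"aa", "bb", "ca", "cb"}

def residual(L, w):                  # w^{-1}L = {v | wv in L}
    return {v[len(w):] for v in L if v.startswith(w)}

def is_prime(L, w, candidates):
    r = residual(L, w)
    union = set()
    for v in candidates:             # union of strictly smaller residuals
        rv = residual(L, v)
        if rv < r:                   # proper subset
            union |= rv
    return union < r                 # prime iff that union stays smaller

prefixes = sorted({w[:i] for w in L for i in range(len(w) + 1)})
print([w for w in prefixes if not is_prime(L, w, prefixes)])  # ['c'] is composed:
# c^{-1}L = {"a", "b"} is the union of a^{-1}L = {"a"} and b^{-1}L = {"b"}
```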
The Myhill-Nerode theorem can be used to show that the set of distinct residual languages of a language L is finite iff L is regular. There is a bijection
between the residual languages of L and the states of the minimal DFA AL =
(Σ, QL, {q0}, FL, δL) defined by w−1 L ↦ q′ for all w ∈ Σ∗ with δL(q0, w) = {q′}.
Let LQ′ := {w | δ(Q′, w) ∩ F ≠ ∅} for some regular language L, some automaton A = (Σ, Q, Q0, F, δ) recognizing L, and Q′ ⊆ Q. Let Lq denote L{q}.
Definition 7. A residual finite-state automaton (RFSA) is an automaton A =
(Σ, Q, Q0 , F, δ) such that Lq is a residual language of L(A) for all states q ∈ Q.
The string w is a characterizing word for q ∈ Q iff Lq = w−1 L(A).
Note that empty residual languages correspond to failure states.
Definition 8. Let L ⊆ Σ ∗ be a regular language. The canonical RFSA RL =
(Σ, QR , QR0 , FR , δR ) for L is defined by QR = {w−1 L|w−1 L is prime}, QR0 =
{w−1 L ∈ QR |w−1 L ⊆ L}, FR = {w−1 L|ε ∈ w−1 L}, and δR (w−1 L, a) =
{v −1 L ∈ QR |v −1 L ⊆ (wa)−1 L} for a ∈ Σ.
RL is minimal with respect to the number of states (see [3], Theorem 1). Note
that the fact that prime residual languages are non-empty sets by definition
implies that RL cannot contain failure states.
Definition 9. Let A = (Σ, Q, Q0, F, δ) be a NFA, and let 𝒬 := {p ∈ 2^Q | ∃w ∈
Σ∗ : δ(Q0, w) = p}. A state q ∈ 𝒬 is said to be coverable iff there exist
q1, . . . , qn ∈ 𝒬 \ {q} for n ≥ 1 such that q = q1 ∪ · · · ∪ qn.
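Coverability can be sketched directly on subset-states; the concrete states below are assumed for illustration:

```python
# Sketch of coverability (Definition 9): a subset-state q is coverable
# iff it is the union of other subset-states. The concrete states are
# an assumption of this sketch.

def coverable(q, states):
    parts = [p for p in states if p != q and p and p <= q]
    union = set().union(*parts) if parts else set()
    return union == set(q)

states = [frozenset({1}), frozenset({2}), frozenset({1, 2}), frozenset({3})]
print(coverable(frozenset({1, 2}), states))  # True:  {1} united with {2}
print(coverable(frozenset({3}), states))     # False: no other state fits inside it
```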
3 Inferring a RFSA from an Observation Table
We infer the canonical RFSA for some regular language L from a suitable combination of available information sources. Information sources can be an oracle
answering membership or equivalence queries (including the provision of a counterexample in case of a negative answer for the latter) or positive or negative
samples of L with certain properties (and there are possibly other kinds of sources
that could be taken into consideration as well). Known suitable combinations are
for example an oracle answering membership and equivalence queries (a so-called
minimally adequate teacher, or MAT), an oracle answering membership queries
and positive data, or positive and negative data (for references see below).
We proceed as follows: In a first step we use an existing algorithm in order to
build an observation table T′ = (red′ ∪ blue′, E′, obs′) representing the canonical DFA for the reversal L̄ of L. Appropriate algorithms for various learning
settings can be found in [1] (L∗, for MAT learning), [7] (ALTEX, which can be
readapted to strings by treating them as non-branching trees, for learning from a
membership oracle and positive data), or [8] (the meta-algorithm GENMODEL
covers MAT learning, learning from membership queries and positive data, and
learning from positive and negative data, and can be adapted to a range of other
combinations of information sources as well). All of these algorithms are based
on the same principle: Elements are added to the set labeling the rows of the
table (representing candidates for states in the canonical DFA) until the table is
closed, and separating contexts (i.e., suffixes by which it can be shown that two
states should be distinct in the derived automaton) are added to the set labeling
the columns until the table is consistent, where additions of one kind may make
additions of the other necessary and vice versa. Once the table is both closed
and consistent, an automaton is derived from it that either is the canonical DFA
in question or can be rejected by a counterexample obtained from one of the
available information sources, which is then evaluated and used to start the cycle
afresh. For reasons of convenience we will assume that the elements of red′ are
all pairwise obviously different (which is the case, for example, for tables built by
GENMODEL [8], and can be made the case for tables built by the algorithm L∗
as well if, instead of adding a counterexample and its prefixes to the set of
candidates, the counterexample and its suffixes are added to the set of contexts),
so that there is a natural bijection between red′ and the set of equivalence classes
of L̄ under the Myhill-Nerode relation, viz. the states of the canonical DFA recognizing L̄.
Obviously, since the available information is for L and not L̄, data and queries
must be adapted accordingly: Before submitting them to an oracle, strings and
automata have to be reversed (see Section 2), and so must given samples and counterexamples before they are used for the construction of the table.
The table T′ should then be subjected to the following modifications:
a) If AT′ contains a failure state eliminate the representatives of that state from
red′ (these should be rows consisting of 0’s only). The automaton derived
from T′ at this stage shall be denoted by Ad = (Σ, Qd, Qd0, Fd, δd). Ad is
the generally partial minimal DFA for L̄ and contains no useless states.
b) For all s ∈ red′ with ∀e ∈ E′ : obs′(s, e) = 0 use Ad to find a string e′ ∈ Σ∗
such that se′ ∈ L̄. This is always possible since Ad contains no failure state
and hence a final state must be reachable by some string from every state of
Ad. Add e′ to E′ and fill up the table using Ad as a membership oracle.
c) For all e ∈ E′ with ∃e1, . . . , en ∈ E′ : ∀s ∈ red′ : [∀i ∈ {1, . . . , n} :
obs′(s, e) = 0 ⇒ obs′(s, ei) = 0] ∧ [obs′(s, e) = 1 ⇒ ∃i ∈ {1, . . . , n} :
obs′(s, ei) = 1] eliminate e from E′ (and delete the corresponding columns).
For example, the column labeled by e in the table in Figure 1 would be
eliminated because its 1’s are all “covered” by other contexts.
The table thus modified shall be denoted by T = (red ∪ blue, E, obs) with
obs = obs′ and the automaton derived from it by AT = (Σ, QT, QT0, FT, δT)
with FT = Fd (we state these two equalities explicitly for the case in which the
ε-column has been erased). Note that as the states of Ad represent equivalence
classes under the Myhill-Nerode relation the addition of contexts to E′ cannot
create additional states in the automaton derived from the modified table. And
since elements of red′ that are distinguished by the contexts eliminated in c)
      e   e1  e2  e3  e4
s1    1   0   1   1   0
s2    1   1   0   1   1
s3    1   0   1   0   0
s4    0   0   0   0   1

Fig. 1. An example for a coverable column
must be distinguished by the contexts covering those in E′ as well, AT and Ad
are isomorphic and L(AT) = L(Ad) = L̄.
We use components of the table T and the automaton AT to define another
automaton R = (Σ, QR, QR0, FR, δR) with
– QR = {q ∈ 2^QT | ∃e ∈ E : ∀x1 ∈ q : obs(x1, e) = 1 ∧ ∀x2 ∈ red \ q : obs(x2, e) = 0},
– QR0 = {q ∈ QR | ∀x ∈ q : obs(x, ε) = 1},
– FR = {q ∈ QR | ε ∈ q}, and
– δR(q1, a) = {q2 ∈ QR | q2 ⊆ δT(q1, a)} for q1 ∈ QR and a ∈ Σ.
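The state set QR can be read off the table column by column: each state is the set of red elements carrying a 1 in some column. A sketch with an assumed toy table:

```python
# Sketch of the state set Q_R read off a table: each state is the set of
# red elements that carry a 1 in some column. The toy table is assumed.

red = ["s1", "s2", "s3"]
E = ["e1", "e2"]
table = {"s1": {"e1": 1, "e2": 0},
         "s2": {"e1": 1, "e2": 1},
         "s3": {"e1": 0, "e2": 1}}

# Q_R: for every context e, the set q of rows with obs(x, e) = 1
QR = {frozenset(s for s in red if table[s][e] == 1) for e in E}
print(sorted(sorted(q) for q in QR))  # [['s1', 's2'], ['s2', 's3']]
```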
This enables us to state the following result:
Theorem 1. R is isomorphic to the canonical RFSA for L.
The proof makes use of Theorem 3 from [3], repeated here as Theorem 2:
Theorem 2. Let L be a regular language and let B = (Σ, QB, QB0, FB, δB) be
a NFA such that B̄ is a RFSA recognizing L̄ whose states are all reachable. Then
C(B) = (Σ, QC, QC0, FC, δC) with
– QC = {p ∈ 𝒬B | p is not coverable} (with 𝒬B as in Definition 9)
– QC0 = {p ∈ QC | p ⊆ QB0}
– FC = {p ∈ QC | p ∩ FB ≠ ∅}
– δC(p, a) = {p′ ∈ QC | p′ ⊆ δB(p, a)} for p ∈ QC and a ∈ Σ
is the canonical RFSA recognizing L.
It has been shown in [3], Section 5, that in a RFSA for a regular language
L whose states are all reachable the non-coverable states correspond exactly to
the prime residual languages of L and that consequently QC can be naturally
identified with the set of states of the canonical RFSA for L.
Proof of Theorem 1:
First of all, observe that AT meets the conditions for B̄ in Theorem 2:
– All states of AT are reachable because AT contains no useless states,
– AT is a RFSA because every DFA without useless states is a RFSA (see [3],
Section 3), and
– L(AT) = L̄.
We therefore set B̄ = AT. Since AT contains no useless states, AT and ĀT
have the same number of states and transitions, so that B = ĀT =
(Σ, QT, FT, QT0, δ̄T). Assuming for the present that there is a bijection between
QR and QC it is rather trivial to see that
– there is a bijection between QR0 = {q ∈ QR | ∀x ∈ q : obs(x, ε) = 1} and
QC0 = {p ∈ QC | p ⊆ FT} due to the fact that FT = {x ∈ red | obs(x, ε) = 1},
– there is a bijection between FR = {q ∈ QR | ε ∈ q} and FC = {p ∈ QC | p ∩
QT0 ≠ ∅} due to the fact that QT0 = {ε},
– for every q ∈ QR, p ∈ QC, and a ∈ Σ such that q is the image of p under
the bijection between QR and QC, δR(q, a) = {q2 ∈ QR | q2 ⊆ δT(q, a)} is the
image of δC(p, a) = {p′ ∈ QC | p′ ⊆ δT(p, a)}.
It remains to show that there is indeed a bijection between QR and the set of
prime residual languages of L, represented by QC . For this, consider Proposition
1 from [3], repeated here as Lemma 1:
Lemma 1. Let A = (Σ, Q, Q0 , F, δ) be a RFSA. For every prime residual language w−1 L(A) there exists a state q ∈ δ(Q0 , w) such that Lq = w−1 L(A).
It is clear from the definition of QR that R is a RFSA: Every state in QR
corresponds to exactly one column in T and to the set of contexts labeling the
various occurrences of that column. Every context e ∈ E corresponds to the
residual language ē−1 L of L, and contexts with the same column correspond to
the same residual language of L. Consequently, every state in QR corresponds
to exactly one residual language of L.
According to Lemma 1, there is a state in QR for each prime residual language
of L, and thus, every prime residual language of L is represented by exactly one
column occurring in the table (and at least one context in E).
By modification c) of the table we have eliminated the columns that are coverable
by other columns in the table. If a column is not coverable in the table the state
it represents in QR is not coverable either: Assume that there were a column
in the table that could be covered by a set of strings whose columns do not
occur in the table. Then, due to Lemma 1, these must correspond to composed
residual languages of L. If we were to add these to the table they would have to
be eliminated again because of the restrictions imposed by modification c). This
means that if a column is coverable at all it must be coverable by using only
columns corresponding to prime residual languages of L as well, and these are
all represented in the table. Therefore QR cannot contain any coverable states.
Consequently, the correspondence between QR and the set of prime residual
languages of L is one-to-one, and we have shown that R is isomorphic to the
canonical RFSA for L. □

Corollary 1. Let L be a regular language. The number of prime residual languages of L equals the minimal number of contexts needed in order to distinguish
the states of the canonical DFA for L̄.
4 Conclusion
Some remarks:
– Note that if the algorithm generating the original table yields a partial automaton (which in general is the case for ALTEX and also for GENMODEL
in the settings of learning from membership queries and positive data, and
from positive and negative data) then the canonical RFSA cannot be maximal with respect to the transitions either.
– As should become clear from reading [9] and [10], the method presented
here can be very easily adapted to ordinary two-dimensional and even n-dimensional trees for an arbitrary n ≥ 1.
– Modification c) might look relatively expensive. It can be made superfluous
if one requires L to be a 0-reversible language:
Definition 10. A regular language L is 0-reversible iff for every DFA A
with L(A) = L the reversal Ā is deterministic as well.
Those languages have the useful property that all their residual languages
are disjoint and hence prime (see [6]).
Algorithmically, modification c) could be handled as follows: For every e ∈ E′
find the set containing all e1, . . . , en ∈ E′ such that ∀s ∈ red′ : ∀i ∈ {1, . . . , n} :
obs′(s, e) = 0 ⇒ obs′(s, ei) = 0. Then for all s ∈ red′ with obs′(s, e) = 1 check
if ∃i ∈ {1, . . . , n} : obs′(s, ei) = 1. If the answer is positive for every such s,
eliminate e from E′. This can be done in polynomial time.
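This elimination test can be sketched as follows, here run on the table of Figure 1 (the dictionary encoding of obs′ is an assumption of the sketch):

```python
# Sketch of the elimination test for modification c), run on the table of
# Figure 1. The dict encoding of obs' is an assumption of the sketch.

def removable(table, red, E, e):
    # candidate contexts: 0 on every row where e is 0
    cand = [c for c in E if c != e and
            all(table[s][c] == 0 for s in red if table[s][e] == 0)]
    # e may be eliminated iff each of its 1s is matched by some candidate
    return all(any(table[s][c] == 1 for c in cand)
               for s in red if table[s][e] == 1)

red = ["s1", "s2", "s3", "s4"]
E = ["e", "e1", "e2", "e3", "e4"]
table = {"s1": {"e": 1, "e1": 0, "e2": 1, "e3": 1, "e4": 0},
         "s2": {"e": 1, "e1": 1, "e2": 0, "e3": 1, "e4": 1},
         "s3": {"e": 1, "e1": 0, "e2": 1, "e3": 0, "e4": 0},
         "s4": {"e": 0, "e1": 0, "e2": 0, "e3": 0, "e4": 1}}
print([c for c in E if removable(table, red, E, c)])  # ['e']
```

For the Figure 1 table the candidates for e are e1, e2, and e3 (e4 fails since it has a 1 where e has a 0), and every 1 of e is matched by one of them, so only e is eliminated.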
We note that it has been shown experimentally in [5] that for
languages recognized by randomly generated DFAs the number of inclusion
relations between residual languages represented in such a DFA is extremely
small, which allows us to hope that the full test does not have to be executed
very often at all.
References
1. Angluin, D.: Learning regular sets from queries and counterexamples. Information
and Computation 75(2) (1987) 87–106
2. Hopcroft, J.E., Ullman, J.D.: Introduction to Automata Theory, Languages, and
Computation. Addison-Wesley Longman (1990)
3. Denis, F., Lemay, A., Terlutte, A.: Residual finite state automata. In: Proceedings
of STACS. Volume 2010 of LNCS. Springer (2001) 144–157
4. Denis, F., Lemay, A., Terlutte, A.: Learning regular languages using non-deterministic finite automata. In: Proceedings of ICGI. Volume 1891 of LNCS.
Springer (2000) 39–50
5. Denis, F., Lemay, A., Terlutte, A.: Learning regular languages using RFSAs. In:
Proceedings of ALT. Volume 2225 of LNCS. Springer (2001) 348–363
6. Denis, F., Lemay, A., Terlutte, A.: Some classes of regular languages identifiable
in the limit from positive data. In: Grammatical Inference: Algorithms and Applications. Volume 2484 of LNCS. Springer (2002) 269–273
7. Besombes, J., Marion, J.Y.: Learning tree languages from positive examples and
membership queries. In: Proceedings of ALT. Volume 3244 of LNCS. Springer (2004) 440–453
8. Kasprzik, A.: Meta-algorithm GENMODEL: Generalizing over three learning settings using observation tables. Technical Report 09-2, University of Trier (2009)
9. Kasprzik, A.: Making finite-state methods applicable to languages beyond context-freeness via multi-dimensional trees. In: Piskorski, J., Watson, B., Yli-Jyrä, A., eds.: Post-proceedings of FSMNLP 2008. IOS Press (2009) 98–109
10. Kasprzik, A.: A learning algorithm for multi-dimensional trees, or: Learning beyond
context-freeness. In: Clark, A., Coste, F., Miclet, L., eds.: Proceedings of ICGI. Volume
5278 of LNAI. Springer (2008) 111–124
11. de la Higuera, C.: Grammatical inference. Unpublished manuscript.