Learning Residual Finite-State Automata Using Observation Tables

Anna Kasprzik
Technical report 09-3
FB IV Informatik, University of Trier, D-54286 Trier
kasprzik@informatik.uni-trier.de

Key words: Regular languages, non-determinism, Myhill-Nerode equivalence relation, residual languages, observation tables, grammatical inference

1 Introduction

In the area of grammatical inference the problem of how to infer – or “learn” – a description of a formal language (e.g., a grammar or an automaton) algorithmically from given examples or other kinds of information sources is considered. A range of conceivable learning settings have been formulated, and based on those quite a number of learning algorithms have been developed. One of the best studied classes with respect to this conception of learnability is the class of regular languages in the Chomsky Hierarchy.

A significant part of the algorithms that have been developed, of which Angluin’s seminal algorithm L∗ for regular string languages [1] was one of the first, use the concept of an observation table. In an observation table the rows are labeled by elements from some set that can be taken as prefixes, the columns are labeled by elements from some set that can be taken as suffixes, and each cell of the table contains a Boolean value indicating the membership status of the concatenation of the two labeling elements with respect to the language L the learner is trying to infer (if known from the available information). If the table fulfils certain conditions then we can immediately derive a deterministic finite-state automaton (DFA) from it, and if the given information is sufficient then this automaton is isomorphic to the minimal DFA AL for L. The states of AL correspond exactly to the equivalence classes of L under the equivalence relation on which the Myhill-Nerode theorem (see for example [2]) is based. The elements labeling the rows of the table from which AL is derived can be seen as representatives for these equivalence classes, and the elements labeling the columns as experiments by which it is possible to prove that two of those representatives belong to different equivalence classes and should thus represent different states of AL.

Unfortunately, there is a price to pay for the uniqueness of the minimal DFA: It can have exponentially more states than a minimal non-deterministic finite-state automaton (NFA) for the same language, so that it seems worthwhile to consider the question of whether we should not rather try to infer an automaton of the latter kind. Denis, Lemay, and Terlutte [3] introduce a special case of NFA, so-called residual finite-state automata (RFSAs), where each state represents a residual language of the language that is recognized by the automaton (for the definition of residual language see Section 2). And fortunately, there is a unique minimal RFSA RL for every regular string language L. Denis, Lemay, and Terlutte present several learning algorithms for RFSAs (see [4–6]), which, however, all work by adding or deleting states in an automaton according to the information contained in a given sample. We define a new learning algorithm for regular string languages based on the concept of an observation table: We first use an arbitrary existing appropriate algorithm to establish a table for the reversal of the unknown language L that is to be inferred, and then show that it is possible to derive the minimal RFSA RL for L itself from this table after some simple necessary modifications.
By thus exploiting the complementarity of prefixes and suffixes, viz. residual languages and equivalence classes under the Myhill-Nerode relation, we would like to emphasize the universality of the concept of an observation table, which can be used as a basis for the formulation of any kind of inference algorithm (i.e., in any kind of learning setting) founded on the Myhill-Nerode theorem without having to resort to a complicated or very specialized underlying mechanism such as an automaton. The ultimate goal is to illuminate the different aspects of the area of Grammatical Inference and the notions used in it from as many angles as possible in order to eventually be able to set it on a better, even more general theoretical foundation.

2 Basic Notions and Definitions

Definition 1. An observation table is a triple T = (S, E, obs) with S, E ⊆ Σ∗ finite and non-empty for some alphabet Σ, and obs : S × E −→ {0, 1} a function with obs(s, e) = 1 if se ∈ L is confirmed, and obs(s, e) = 0 if se ∉ L is confirmed.

For an observation table T = (S, E, obs) and s ∈ S, the observed behaviour of s is row(s) := {(e, obs(s, e)) | e ∈ E}. Let row(S) denote the set {row(s) | s ∈ S}. S is partitioned into two sets red and blue where uv ∈ red ⇒ u ∈ red for u, v ∈ Σ∗ (prefix-closedness), and blue := {sa ∈ S \ red | s ∈ red, a ∈ Σ}, i.e., blue contains the one-symbol extensions of red elements that are not in red.

Definition 2. Two elements r, s ∈ S are obviously different (denoted by r <> s) iff ∃e ∈ E such that obs(r, e) ≠ obs(s, e).

Definition 3. Let T = (S = red ∪ blue, E, obs) be an observation table.
– T is closed iff ¬∃s ∈ blue : ∀r ∈ red : r <> s.
– T is consistent iff ∀s1, s2 ∈ red, s1a, s2a ∈ S, a ∈ Σ : row(s1) = row(s2) ⇒ row(s1a) = row(s2a).

Definition 4. A finite-state automaton is a tuple A = (Σ, Q, Q0, F, δ) with finite input alphabet Σ, finite non-empty state set Q, set of start states Q0 ⊆ Q, set of accepting states F ⊆ Q, and a transition function δ : Q × Σ −→ 2^Q. If Q0 is a singleton and δ maps every pair from Q × Σ to a set containing at most one state, the automaton is deterministic (a DFA), otherwise non-deterministic (an NFA). If δ maps every pair in Q × Σ to a non-empty set of states, the automaton is total, otherwise partial. The transition function can always be extended to δ : Q × Σ∗ −→ 2^Q with δ(q, ε) = {q} and δ(q, wa) = δ(δ(q, w), a) for q ∈ Q, a ∈ Σ, w ∈ Σ∗. Let δ(Q′, w) denote the set ⋃{δ(q, w) | q ∈ Q′} for Q′ ⊆ Q, w ∈ Σ∗. A state q ∈ Q is reachable if there is a string w ∈ Σ∗ such that q ∈ δ(Q0, w). A state q ∈ Q is useful if there are strings w1, w2 ∈ Σ∗ such that q ∈ δ(Q0, w1) and δ(q, w2) ∩ F ≠ ∅, otherwise useless. The language accepted by A is L(A) = {w ∈ Σ∗ | δ(Q0, w) ∩ F ≠ ∅} and is called recognizable or regular.

From an observation table T = (red ∪ blue, E, obs) with ε ∈ E we can derive an automaton AT = (Σ, QT, QT0, FT, δT) with QT = row(red), QT0 = {row(ε)}, FT = {row(s) | obs(s, ε) = 1, s ∈ red}, and δT(row(s), a) = {q ∈ QT | ¬(q <> row(sa))} for s ∈ red, a ∈ Σ, sa ∈ S. If T is consistent, AT is deterministic (this follows straight from the definition of δT – if T is not consistent there is at least one pair in QT × Σ to which δT maps a set containing more than one state). The DFA for a regular language L derived from a closed and consistent table has the minimal number of states (see [1], Theorem 1). This automaton is also called the canonical DFA AL for L and is unique up to a bijective renaming of states.
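To make the table operations above concrete, here is a small Python sketch. It is not taken from the report; the class and function names (ObservationTable, derive_dfa, ...), the membership predicate, and the example language are illustrative choices only. It implements the closedness and consistency checks of Definition 3 and the derivation of AT described above, with a membership function standing in for the available information about L.

from itertools import product

class ObservationTable:
    def __init__(self, red, blue, contexts, member):
        # member(w) -> bool plays the role of the information source for L
        self.red, self.blue, self.E = list(red), list(blue), list(contexts)
        self.obs = {(s, e): int(member(s + e))
                    for s in self.red + self.blue for e in self.E}

    def row(self, s):
        # observed behaviour of s: the obs-values over all contexts in E
        return tuple(self.obs[(s, e)] for e in self.E)

    def obviously_different(self, r, s):
        return self.row(r) != self.row(s)  # some e with obs(r,e) != obs(s,e)

    def is_closed(self):
        # no blue element is obviously different from every red element
        return all(any(not self.obviously_different(r, s) for r in self.red)
                   for s in self.blue)

    def is_consistent(self, alphabet):
        # equal red rows must stay equal after appending any symbol, if in S
        S = set(self.red + self.blue)
        for s1, s2, a in product(self.red, self.red, alphabet):
            if (self.row(s1) == self.row(s2)
                    and s1 + a in S and s2 + a in S
                    and self.row(s1 + a) != self.row(s2 + a)):
                return False
        return True

    def derive_dfa(self, alphabet):
        # A_T: states are the red rows, start state row(eps), final states the
        # rows with a 1 in the eps-column, transitions via row(s + a)
        assert "" in self.E and "" in self.red
        states = {self.row(s) for s in self.red}
        start = self.row("")
        finals = {self.row(s) for s in self.red if self.obs[(s, "")] == 1}
        delta = {}
        S = set(self.red + self.blue)
        for s, a in product(self.red, alphabet):
            if s + a in S:
                delta[(self.row(s), a)] = self.row(s + a)
        return states, start, finals, delta


# Tiny usage example: L = all strings over {a, b} containing at least one 'a'.
member = lambda w: "a" in w
T = ObservationTable(red=["", "a"], blue=["b", "aa", "ab"],
                     contexts=["", "a"], member=member)
print(T.is_closed(), T.is_consistent("ab"))
print(T.derive_dfa("ab"))

In this sketch rows are compared as tuples of obs-values, which matches the notion of obviously different elements: two rows are equal exactly when no context in E separates them.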
However, if AL is required to be total it contains a “failure state” for all strings that are not a prefix of some string in L (if there are any, which for example is not the case for L = Σ∗ \ X for all finite sets X), a state which otherwise does not have to appear at all or does not have to receive all such strings.

The Myhill-Nerode equivalence relation ≡L is defined as follows: for r, s ∈ Σ∗, r ≡L s iff re ∈ L ⇔ se ∈ L for all e ∈ Σ∗. The index of L is IL := |{[s′]L | s′ ∈ Σ∗}|, where [s′]L denotes the equivalence class containing s′. The Myhill-Nerode theorem (see for example [2]) states that IL is finite iff L is a regular language, i.e., iff it can be recognized by a finite-state automaton. The total canonical DFA AL has exactly IL states, and each state can be seen to represent an equivalence class under ≡L. Note the correspondence between the table T = (S, E, obs) representing AL and the equivalence classes of L under ≡L (reflected by the symbols S and E): S contains strings whose rows are candidates for states in AL, and the elements of E – ‘contexts’, as we will call them – can be taken as experiments which prove that two strings in S do indeed belong to different equivalence classes and that thus their rows should represent two different states.

Definition 5. The reversal w̄ of a string w ∈ Σ∗ is defined inductively by ε̄ := ε and, for a ∈ Σ and w ∈ Σ∗, the reversal of aw is w̄a. The reversal of a set X ⊆ Σ∗ is defined as X̄ := {w̄ | w ∈ X}. The reversal of an automaton A = (Σ, Q, Q0, F, δ) is defined as Ā := (Σ, Q, F, Q0, δ̄) with δ̄(q′, w) = {q ∈ Q | q′ ∈ δ(q, w̄)} for q′ ∈ Q, w ∈ Σ∗. Obviously, L(Ā) is the reversal of L(A). Note that due to the duality of left and right congruence the reversal of a regular string language is a regular language as well.

Definition 6. The residual language of a language L ⊆ Σ∗ with regard to a string w ∈ Σ∗ is defined as w⁻¹L := {v ∈ Σ∗ | wv ∈ L}. A residual language w⁻¹L is prime iff ⋃{v⁻¹L | v⁻¹L ⊊ w⁻¹L} ⊊ w⁻¹L, otherwise composed.

The Myhill-Nerode theorem can be used to show that the set of distinct residual languages of a language L is finite iff L is regular. There is a bijection between the residual languages of L and the states of the minimal DFA AL = (Σ, QL, {q0}, FL, δL) defined by w⁻¹L ↦ q′ for all w ∈ Σ∗ with δL(q0, w) = {q′}. Let LQ′ := {w | δ(Q′, w) ∩ F ≠ ∅} for some regular language L, some automaton A = (Σ, Q, Q0, F, δ) recognizing L, and Q′ ⊆ Q. Let Lq denote L{q}.

Definition 7. A residual finite-state automaton (RFSA) is an automaton A = (Σ, Q, Q0, F, δ) such that Lq is a residual language of L(A) for all states q ∈ Q. The string w is a characterizing word for q ∈ Q iff Lq = w⁻¹L(A). Note that empty residual languages correspond to failure states.

Definition 8. Let L ⊆ Σ∗ be a regular language. The canonical RFSA RL = (Σ, QR, QR0, FR, δR) for L is defined by QR = {w⁻¹L | w⁻¹L is prime}, QR0 = {w⁻¹L ∈ QR | w⁻¹L ⊆ L}, FR = {w⁻¹L ∈ QR | ε ∈ w⁻¹L}, and δR(w⁻¹L, a) = {v⁻¹L ∈ QR | v⁻¹L ⊆ (wa)⁻¹L} for a ∈ Σ. RL is minimal with respect to the number of states (see [3], Theorem 1). Note that the fact that prime residual languages are non-empty sets by definition implies that RL cannot contain failure states.

Definition 9. Let A = (Σ, Q, Q0, F, δ) be an NFA, and let Q̃ := {p ∈ 2^Q | ∃w ∈ Σ∗ : δ(Q0, w) = p}. A state q ∈ Q̃ is said to be coverable iff there exist q1, . . . , qn ∈ Q̃ \ {q} for some n ≥ 1 such that q = q1 ∪ . . . ∪ qn.
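As a small illustration of Definitions 6–8, the following Python fragment decides which residual languages of a language L are prime and assembles the canonical RFSA RL from them. Again, this is a sketch that is not part of the report: the total DFA for the example language L = aΣ∗ over {a, b} and all identifiers (included, strictly_smaller, ...) are made up for this purpose.

from collections import deque

ALPHABET = "ab"
# total DFA for L = aΣ*: state 0 = start, 1 = accepting, 2 = failure state
DELTA = {(0, "a"): 1, (0, "b"): 2, (1, "a"): 1, (1, "b"): 1,
         (2, "a"): 2, (2, "b"): 2}
FINALS = {1}
STATES = {0, 1, 2}

def state_after(w, q=0):
    # the state reached on w; its right language L_q is the residual w^-1 L
    for a in w:
        q = DELTA[(q, a)]
    return q

def included(q, ps):
    # decide whether L_q is a subset of the union of L_p for p in ps,
    # by a breadth-first search over pairs (state, set of states)
    start = (q, frozenset(ps))
    seen, todo = {start}, deque([start])
    while todo:
        s, pset = todo.popleft()
        if s in FINALS and not (pset & FINALS):
            return False              # a word in L_q missed by every L_p
        for a in ALPHABET:
            nxt = (DELTA[(s, a)], frozenset(DELTA[(p, a)] for p in pset))
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return True

def strictly_smaller(p, q):
    return included(p, {q}) and not included(q, {p})

# Definition 6: a residual is prime iff it is not covered by the union of the
# residuals strictly contained in it (residuals = reachable DFA states here)
primes = {q for q in STATES
          if not included(q, {p for p in STATES if strictly_smaller(p, q)})}

# Definition 8: canonical RFSA with the prime residuals as states
rfsa_start = {q for q in primes if included(q, {0})}   # primes contained in L
rfsa_finals = {q for q in primes if q in FINALS}       # primes containing eps
rfsa_delta = {(q, a): {p for p in primes if included(p, {DELTA[(q, a)]})}
              for q in primes for a in ALPHABET}

print("state for residual (ab)^-1 L:", state_after("ab"))
print("prime residuals (as DFA states):", primes)
print("canonical RFSA:", primes, rfsa_start, rfsa_finals, rfsa_delta)

The inclusion test works on the product of a DFA state with a set of DFA states, which suffices here because the residual languages of a regular language are exactly the right languages of the states of a total DFA for it; for the example the failure state (empty residual) comes out as composed, matching the remark after Definition 8.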
3 Inferring an RFSA from an Observation Table

We infer the canonical RFSA for some regular language L from a suitable combination of available information sources. Information sources can be an oracle answering membership or equivalence queries (including the provision of a counterexample in case of a negative answer for the latter) or positive or negative samples of L with certain properties (and there are possibly other kinds of sources that could be taken into consideration as well). Known suitable combinations are for example an oracle answering membership and equivalence queries (a so-called minimally adequate teacher, or MAT), an oracle answering membership queries and positive data, or positive and negative data (for references see below).

We proceed as follows: In a first step we use an existing algorithm in order to build an observation table T′ = (red′ ∪ blue′, E′, obs′) representing the canonical DFA for the reversal L̄ of L. Appropriate algorithms for various learning settings can be found in [1] (L∗, for MAT learning), [7] (ALTEX, which can be readapted to strings by treating them as non-branching trees, for learning from a membership oracle and positive data), or [8] (the meta-algorithm GENMODEL covers MAT learning, learning from membership queries and positive data, and learning from positive and negative data, and can be adapted to a range of other combinations of information sources as well). All of these algorithms are based on the same principle: Elements are added to the set labeling the rows of the table (representing candidates for states in the canonical DFA) until it is closed, and separating contexts (i.e., suffixes by which it can be shown that two states should be distinct in the derived automaton) are added to the set labeling the columns of the table until it is consistent, where additions of one kind potentially result in the necessity of the other and vice versa. Once the table is both closed and consistent, an automaton is derived from it that either is the canonical DFA in question or can be rejected by a counterexample obtained from one of the available information sources, which is then evaluated and used to start the cycle afresh. For reasons of convenience we will assume that the elements of red′ are all pairwise obviously different (which for example is the case for tables built by GENMODEL [8], and can be made the case for tables built by the algorithm L∗ as well if instead of adding a counterexample and its prefixes to the set of candidates the example and its suffixes are added to the set of contexts), so that there is a natural bijection between red′ and the set of equivalence classes of L̄ under the Myhill-Nerode relation, viz. the states of the canonical DFA recognizing L̄. Obviously, since the available information is for L and not for L̄, data and queries must be adapted accordingly: Before being submitted to an oracle, strings and automata have to be reversed (see Section 2), and likewise given samples and counterexamples before they are used for the construction of the table.

The table T′ should then be submitted to the following modifications:

a) If AT′ contains a failure state, eliminate the representatives of that state from red′ (these should be rows consisting of 0’s only). The automaton derived from T′ at this stage shall be denoted by Ad = (Σ, Qd, Qd0, Fd, δd). Ad is the (in general partial) minimal DFA for L̄ and contains no useless states.
b) For all s ∈ red′ with ∀e ∈ E′ : obs′(s, e) = 0, use Ad to find a string e′ ∈ Σ∗ such that se′ ∈ L̄. This is always possible since Ad contains no failure state and hence a final state must be reachable by some string from every state of Ad. Add e′ to E′ and fill up the table using Ad as a membership oracle.

c) For all e ∈ E′ for which there exist e1, . . . , en ∈ E′ \ {e} with ∀s ∈ red′ : [∀i ∈ {1, . . . , n} : obs′(s, e) = 0 ⇒ obs′(s, ei) = 0] ∧ [obs′(s, e) = 1 ⇒ ∃i ∈ {1, . . . , n} : obs′(s, ei) = 1], eliminate e from E′ (and delete the corresponding columns). For example, the column labeled by e in the table in Figure 1 would be eliminated because its 1’s are all “covered” by other contexts.

      e   e1  e2  e3  e4
s1    1   0   1   1   0
s2    1   1   0   1   1
s3    1   0   1   0   0
s4    0   0   0   0   1

Fig. 1. An example for a coverable column

The table thus modified shall be denoted by T = (red ∪ blue, E, obs) with obs = obs′ (restricted to the remaining cells) and the automaton derived from it by AT = (Σ, QT, QT0, FT, δT) with FT = Fd (we state these two equalities explicitly for the case in which the ε-column has been erased). Note that, as the states of Ad represent equivalence classes under the Myhill-Nerode relation, the addition of contexts to E′ cannot create additional states in the automaton derived from the modified table. And since elements of red′ that are distinguished by the contexts eliminated in c) must be distinguished by the contexts covering them in E′ as well, AT and Ad are isomorphic and L(AT) = L(Ad) = L̄.

We use components of the table T and the automaton AT to define another automaton R = (Σ, QR, QR0, FR, δR) with
– QR = {q ∈ 2^QT | ∃e ∈ E : ∀x1 ∈ q : obs(x1, e) = 1 ∧ ∀x2 ∈ red \ q : obs(x2, e) = 0},
– QR0 = {q ∈ QR | ∀x ∈ q : obs(x, ε) = 1},
– FR = {q ∈ QR | ε ∈ q}, and
– δR(q1, a) = {q2 ∈ QR | q2 ⊆ δ̄T(q1, a)} for q1 ∈ QR and a ∈ Σ.

This enables us to state the following result:

Theorem 1. R is isomorphic to the canonical RFSA for L.

The proof makes use of Theorem 3 from [3], repeated here as Theorem 2:

Theorem 2. Let L be a regular language and let B = (Σ, QB, QB0, FB, δB) be an NFA such that B̄ is an RFSA recognizing L̄ whose states are all reachable. Then C(B) = (Σ, QC, QC0, FC, δC) with
– QC = {p ∈ Q̃B | p is not coverable} (with Q̃B obtained from B as in Definition 9),
– QC0 = {p ∈ QC | p ⊆ QB0},
– FC = {p ∈ QC | p ∩ FB ≠ ∅},
– δC(p, a) = {p′ ∈ QC | p′ ⊆ δB(p, a)} for p ∈ QC and a ∈ Σ
is the canonical RFSA recognizing L.

It has been shown in [3], Section 5, that in an RFSA for a regular language L whose states are all reachable the non-coverable states correspond exactly to the prime residual languages of L, and that consequently QC can be naturally identified with the set of states of the canonical RFSA for L.

Proof of Theorem 1: First of all, observe that AT meets the conditions imposed on B̄ in Theorem 2:
– All states of AT are reachable because AT contains no useless states,
– AT is an RFSA because every DFA without useless states is an RFSA (see [3], Section 3), and
– L(AT) = L̄.
We therefore set B = ĀT. Since AT contains no useless states, AT and ĀT have the same number of states and transitions, so that B̄ = AT and B = ĀT = (Σ, QT, FT, QT0, δ̄T).
Assuming for the present that there is a bijection between QR and QC, it is rather trivial to see that
– there is a bijection between QR0 = {q ∈ QR | ∀x ∈ q : obs(x, ε) = 1} and QC0 = {p ∈ QC | p ⊆ FT} due to the fact that FT = {x ∈ red | obs(x, ε) = 1},
– there is a bijection between FR = {q ∈ QR | ε ∈ q} and FC = {p ∈ QC | p ∩ QT0 ≠ ∅} due to the fact that QT0 = {row(ε)} (identified with {ε}, as the elements of red are pairwise obviously different),
– for every q ∈ QR, p ∈ QC, and a ∈ Σ such that q is the image of p under the bijection between QR and QC, δR(q, a) = {q2 ∈ QR | q2 ⊆ δ̄T(q, a)} is the image of δC(p, a) = {p′ ∈ QC | p′ ⊆ δ̄T(p, a)}.

It remains to show that there is indeed a bijection between QR and the set of prime residual languages of L, represented by QC. For this, consider Proposition 1 from [3], repeated here as Lemma 1:

Lemma 1. Let A = (Σ, Q, Q0, F, δ) be an RFSA. For every prime residual language w⁻¹L(A) there exists a state q ∈ δ(Q0, w) such that Lq = w⁻¹L(A).

It is clear from the definition of QR that R is an RFSA: Every state in QR corresponds to exactly one column in T and to the set of contexts labeling the various occurrences of that column. Every context e ∈ E corresponds to the residual language ē⁻¹L of L, and contexts with the same column correspond to the same residual language of L. Consequently, every state in QR corresponds to exactly one residual language of L. According to Lemma 1, there is a state in QR for each prime residual language of L, and thus every prime residual language of L is represented by exactly one column occurring in the table (and at least one context in E). By modification c) of the table we have eliminated the columns that are coverable by other columns in the table. If a column is not coverable in the table, the state it represents in QR is not coverable either: Assume that there were a column in the table that could be covered by a set of strings whose columns do not occur in the table. Then, due to Lemma 1, these must correspond to composed residual languages of L. If we were to add these to the table they would have to be eliminated again because of the restrictions imposed by modification c). This means that if a column is coverable at all it must be coverable by using only columns corresponding to prime residual languages of L as well, and these are all represented in the table. Therefore QR cannot contain any coverable states. Consequently, the correspondence between QR and the set of prime residual languages of L is one-to-one, and we have shown that R is isomorphic to the canonical RFSA for L.

Corollary 1. Let L be a regular language. The number of prime residual languages of L equals the minimal number of contexts needed in order to distinguish the states of the canonical DFA for L̄.

4 Conclusion

Some remarks:
– Note that if the algorithm generating the original table yields a partial automaton (which in general is the case for ALTEX and also for GENMODEL in the settings of learning from membership queries and positive data, and from positive and negative data) then the canonical RFSA cannot be maximal with respect to the transitions either.
– As should become clear from reading [9] and [10], the method presented here can be very easily adapted to ordinary two-dimensional and even n-dimensional trees for an arbitrary n ≥ 1.
– Modification c) might look relatively expensive. It can be made superfluous if one requires L to be a 0-reversible language:

Definition 10. A regular language L is 0-reversible iff for every DFA A with L(A) = L, Ā is deterministic as well.
Those languages have the useful property that their distinct residual languages are disjoint and hence prime (see [6]). Algorithmically, modification c) could be handled as follows: For every e ∈ E′ find the set containing all e1, . . . , en ∈ E′ \ {e} such that ∀s ∈ red′ : ∀i ∈ {1, . . . , n} : obs′(s, e) = 0 ⇒ obs′(s, ei) = 0. Then for all s ∈ red′ with obs′(s, e) = 1 check if ∃i ∈ {1, . . . , n} : obs′(s, ei) = 1. If this is the case for every such s, eliminate e from E′. This can be done in polynomial time. We add the information that it has been shown experimentally in [5] that for languages recognized by randomly generated DFAs the number of inclusion relations between the residual languages represented in such a DFA is extremely small, which may allow us to hope that the full test does not have to be executed very often at all.

References

1. Angluin, D.: Learning regular sets from queries and counterexamples. Information and Computation 75(2) (1987) 87–106
2. Hopcroft, J.E., Ullman, J.D.: Introduction to Automata Theory, Languages, and Computation. Addison-Wesley Longman (1990)
3. Denis, F., Lemay, A., Terlutte, A.: Residual finite state automata. In: Proceedings of STACS. Volume 2010 of LNCS. Springer (2001) 144–157
4. Denis, F., Lemay, A., Terlutte, A.: Learning regular languages using non-deterministic finite automata. In: Proceedings of ICGI. Volume 1891 of LNCS. Springer (2000) 39–50
5. Denis, F., Lemay, A., Terlutte, A.: Learning regular languages using RFSA. In: Proceedings of ALT. Volume 2225 of LNCS. Springer (2001) 348–363
6. Denis, F., Lemay, A., Terlutte, A.: Some classes of regular languages identifiable in the limit from positive data. In: Grammatical Inference: Algorithms and Applications. Volume 2484 of LNCS. Springer (2002) 269–273
7. Besombes, J., Marion, J.Y.: Learning tree languages from positive examples and membership queries. In: Proceedings of ALT. Volume 3244 of LNCS. Springer (2004) 440–453
8. Kasprzik, A.: Meta-algorithm GENMODEL: Generalizing over three learning settings using observation tables. Technical Report 09-2, University of Trier (2009)
9. Kasprzik, A.: Making finite-state methods applicable to languages beyond context-freeness via multi-dimensional trees. In: Piskorski, J., Watson, B., Yli-Jyrä, A. (eds.): Post-proceedings of FSMNLP 2008. IOS Press (2009) 98–109
10. Kasprzik, A.: A learning algorithm for multi-dimensional trees, or: Learning beyond context-freeness. In: Clark, A., Coste, F., Miclet, L. (eds.): Proceedings of ICGI. Volume 5278 of LNAI. Springer (2008) 111–124
11. de la Higuera, C.: Grammatical inference. Unpublished manuscript.