From NFA to minimal DFA

Bas Ploeger (1), Rob van Glabbeek (2), Jan Friso Groote (1)

(1) Department of Mathematics and Computer Science, Technische Universiteit Eindhoven, The Netherlands
(2) National ICT Australia, Sydney, Australia

ProSe, 19 October 2006

Overview
- Introduction
- Problem
- Solution 1
- Solution 2
- Conclusions

Preliminaries

Finite automata
- An NFA is a tuple $(S, \Sigma, \to, i, F)$
- DFA: every state has at most one outgoing $a$-transition for every $a \in \Sigma$

Language semantics
- Language of a state $s$: $L(s) = \{\sigma \in \Sigma^* \mid \exists f \in F .\ s \xrightarrow{\sigma} f\}$
- Language preorder and equivalence on states $s, s'$:
  $s \sqsubseteq_L s' \Leftrightarrow L(s) \subseteq L(s')$
  $s \equiv_L s' \Leftrightarrow L(s) = L(s')$

Canonization

Problem
Given an NFA, find the smallest language-equivalent DFA.

Solution
1. Determinize NFA (subset construction): EXPTIME
2. Minimize DFA (Hopcroft): PTIME
The minimal DFA can be exponentially larger than the NFA.

Example: subset construction
[Figure: an NFA with states 0, 1, 2, 3 over the alphabet {a, b}, shown next to the DFA built from it. Successive steps add the DFA states {0}, {1, 2}, {2} and {3}.]

Example: minimization
[Figure: the same NFA next to the minimized DFA, which has the states {0}, {1, 2} and {3}; {0} reaches {1, 2} on both a and b.]

Relevance to Process Theory (1)

Labelled Transition Systems
- A process is modelled by an LTS $(S, \Sigma, \to, i)$
- No final states (computation "never" stops)

Trace semantics
- Traces of a state $s$: $Tr(s) = \{\sigma \in \Sigma^* \mid \exists f \in S .\ s \xrightarrow{\sigma} f\}$
- Trace equivalence: $s \equiv_T s' \Leftrightarrow Tr(s) = Tr(s')$

Bisimulation semantics
- A bisimulation is a relation $R$ on states satisfying, for $a \in \Sigma$:
  - if $s\,R\,t$ and $s \xrightarrow{a} s'$, then $\exists t' .\ t \xrightarrow{a} t'$ and $s'\,R\,t'$;
  - if $s\,R\,t$ and $t \xrightarrow{a} t'$, then $\exists s' .\ s \xrightarrow{a} s'$ and $s'\,R\,t'$.
- Bisimulation equivalence: $s \leftrightarrow s'$ if there exists a bisimulation $R$ with $s\,R\,s'$

Relevance to Process Theory (2)

Problem
Given an LTS, minimize it under trace semantics.

Facts
- Deciding trace equivalence is PSPACE-complete
- Deciding bisimulation equivalence is in PTIME
- If the LTS is deterministic: $\equiv_T$ equals $\leftrightarrow$

Solution
1. Determinize LTS (subset construction)
2. Minimize LTS under bisimulation semantics (Paige-Tarjan)

Problem Overview

NFA --(subset construction)--> DFA --(minimize)--> mDFA
NFA --(?)--> mDFA

Questions
- What if the DFA is much larger than the mDFA?
- Can we avoid the generation of redundant states?
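The determinize-then-minimize pipeline from the overview can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' tooling: the function names are hypothetical, the minimization step is a naive Moore-style partition refinement rather than Hopcroft or Paige-Tarjan, and the example NFA is only assumed to be consistent with the state languages shown in the deck's figures (its exact transitions are not recoverable from the slides).

```python
from collections import deque


def determinize(alphabet, delta, init, finals):
    """Subset construction. `delta` maps (state, symbol) to a set of NFA states."""
    start = frozenset([init])
    trans, seen, queue = {}, {start}, deque([start])
    while queue:
        P = queue.popleft()
        for a in alphabet:
            Q = frozenset(q for p in P for q in delta.get((p, a), ()))
            if not Q:
                continue  # keep the DFA partial: no a-successor at all
            trans[(P, a)] = Q
            if Q not in seen:
                seen.add(Q)
                queue.append(Q)
    dfa_finals = {P for P in seen if P & finals}
    return seen, trans, start, dfa_finals


def minimize(states, alphabet, trans, finals):
    """Naive partition refinement (not Hopcroft); returns a map state -> block id.

    A missing transition is treated as its own signature entry, so the sketch
    assumes every DFA state can still reach a final state.
    """
    block = {s: int(s in finals) for s in states}
    while True:
        ids, new_block = {}, {}
        for s in states:
            # Signature: own block plus the block of each successor (None if absent).
            sig = (block[s],) + tuple(
                block.get(trans.get((s, a))) for a in alphabet
            )
            new_block[s] = ids.setdefault(sig, len(ids))
        if len(ids) == len(set(block.values())):
            return new_block  # no block was split: partition is stable
        block = new_block


# Hypothetical NFA (an assumption): L(0) = (a|b)a*b, L(1) = L(2) = a*b, L(3) = {lambda}.
ALPHABET = "ab"
DELTA = {
    (0, "a"): {1, 2}, (0, "b"): {2},
    (1, "a"): {1}, (1, "b"): {3},
    (2, "a"): {2}, (2, "b"): {3},
}

dfa_states, dfa_trans, dfa_init, dfa_finals = determinize(ALPHABET, DELTA, 0, {3})
blocks = minimize(dfa_states, ALPHABET, dfa_trans, dfa_finals)
print(len(dfa_states), "subset states,", len(set(blocks.values())), "after minimization")
```

On this assumed NFA the subset construction produces four DFA states, and refinement merges {1, 2} with {2}, which is the kind of redundant intermediate state the questions above are about.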
Aim: space efficiency (average case)

Solution 1

Subset construction
Input: NFA $N = (S_N, \Sigma_N, \to_N, i_N, F_N)$
Output: DFA $D = (S_D, \Sigma_D, \to_D, i_D, F_D)$
Every DFA state $P \in S_D$ is a set of NFA states.

Basic idea
- Add "irrelevant" NFA states to $P$
- A state is irrelevant if it does not alter $L(P)$

$L(P)$?
- Language in NFA: $L_N(P) = \bigcup_{p \in P} L_N(p)$
- Language in DFA: $L_D(P)$, defined as usual
- Lemma: $L_N(P) = L_D(P)$ for every $P \subseteq S_N$

Solution 1

Closure
- For any set $P \subseteq S_N$ define: $\overline{P} = \{p \in S_N \mid p \sqsubseteq_L P\}$, that is, $\overline{P} = \{p \in S_N \mid L_N(p) \subseteq L_N(P)\}$
- Proposition: $\overline{P} \equiv_L P$ for any $P \subseteq S_N$

Algorithm
- Normal subset construction, but ...
- Replace every generated set $P$ by $\overline{P}$

Main Theorem
Given an NFA, the algorithm constructs the minimal language-equivalent DFA.

Example
[Figure: the NFA from the earlier example, with its states annotated by their languages: $(a|b)a^*b$ for state 0, $a^*b$ for states 1 and 2, $\{\lambda\}$ for state 3. The DFA is built starting from {0}. The a-successor is {1, 2}; the b-successor {2} is replaced by its closure {1, 2}, so {0} reaches {1, 2} on both a and b. The last DFA state added is {3}.]
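The closure-based construction can be sketched as below. This is an illustration under assumptions rather than the authors' implementation: the names `included`, `closure` and `closed_subset_construction` are hypothetical, language inclusion is decided by a naive exploration of pairs of subsets (exponential in the worst case, in line with the problem being PSPACE-complete), every NFA state is assumed to be able to reach a final state, and the example NFA is an assumed one that matches the state languages in the figure.

```python
from collections import deque


def step(delta, P, a):
    """All a-successors of the NFA states in P."""
    return frozenset(q for p in P for q in delta.get((p, a), ()))


def included(alphabet, delta, finals, A, B):
    """Decide L(A) <= L(B) for sets of NFA states by exploring pairs of subsets."""
    start = (frozenset(A), frozenset(B))
    seen, queue = {start}, deque([start])
    while queue:
        X, Y = queue.popleft()
        if X & finals and not Y & finals:
            return False  # some word is accepted from A but not from B
        for a in alphabet:
            nxt = (step(delta, X, a), step(delta, Y, a))
            if nxt[0] and nxt not in seen:  # an empty left side accepts nothing
                seen.add(nxt)
                queue.append(nxt)
    return True


def closure(states, alphabet, delta, finals, P):
    """All NFA states p whose language is contained in the language of P."""
    return frozenset(
        p for p in states if included(alphabet, delta, finals, {p}, P)
    )


def closed_subset_construction(states, alphabet, delta, init, finals):
    """Subset construction in which every generated set is replaced by its closure."""
    start = closure(states, alphabet, delta, finals, {init})
    trans, seen, queue = {}, {start}, deque([start])
    while queue:
        P = queue.popleft()
        for a in alphabet:
            succ = step(delta, P, a)
            if not succ:
                continue  # keep the DFA partial
            Q = closure(states, alphabet, delta, finals, succ)
            trans[(P, a)] = Q
            if Q not in seen:
                seen.add(Q)
                queue.append(Q)
    return seen, trans, start, {P for P in seen if P & finals}


# Hypothetical NFA (an assumption): L(0) = (a|b)a*b, L(1) = L(2) = a*b, L(3) = {lambda}.
STATES = {0, 1, 2, 3}
ALPHABET = "ab"
FINALS = frozenset({3})
DELTA = {
    (0, "a"): {1, 2}, (0, "b"): {2},
    (1, "a"): {1}, (1, "b"): {3},
    (2, "a"): {2}, (2, "b"): {3},
}

dfa_states, dfa_trans, dfa_init, dfa_finals = closed_subset_construction(
    STATES, ALPHABET, DELTA, 0, FINALS
)
print(sorted(sorted(P) for P in dfa_states))
```

On the assumed NFA the set {2} never becomes a DFA state of its own: its closure is {1, 2}, so only {0}, {1, 2} and {3} are generated, as in the example.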
Complexity issues

Closure
- Language inclusion is PSPACE-complete

Simulation semantics
- A simulation is a relation $R$ on states satisfying, for $a \in \Sigma_N$:
  - if $s\,R\,t$ and $s \xrightarrow{a} s'$, then $\exists t' .\ t \xrightarrow{a} t'$ and $s'\,R\,t'$
  - if $s\,R\,t$ then $s \in F \Rightarrow t \in F$
- Simulation preorder: $s \underset{\to}{\subset} s'$ if there exists a simulation $R$ with $s\,R\,s'$
- Deciding simulation is in PTIME

Implementation

Trade-off
- Use $\underset{\to}{\subset}$ instead of $\sqsubseteq_L$ in the closure
- The resulting DFA is not necessarily minimal
- But it is at most as large as the DFA produced by the subset construction

Preliminary results

Case   NFA states   NFA transitions   min. DFA states   min. DFA transitions
1      236          456               1,367             2,690
2      1,438        2,821             18,925            37,615

Case   Algorithm    DFA states   DFA transitions   Memory      Time
1      normal       42,665       83,416            5.90 MB     0.200 sec
1      solution 1   4,712        9,252             4.39 MB     0.844 sec
2      normal       7,403,224    14,616,424        468.05 MB   72.468 sec
2      solution 1   176,105      348,005           63.57 MB    561.022 sec

Solution 2

Idea
- Why not remove irrelevant states from a set $P$?
- A $p \in P$ is irrelevant if: $\exists Q \subseteq P .\ p \notin Q \wedge p \sqsubseteq_L Q$

Better idea
- Replace $P$ by the set of transitions of the states in $P$: $T = \{(a, q) \in \Sigma \times S \mid \exists p \in P .\ p \xrightarrow{a} q\}$
- Remove irrelevant transitions from $T$
- A $t \in T$ is irrelevant if: $\exists u \in T .\ t \neq u \wedge t \sqsubseteq_L u$

Definitions

Language semantics
For transitions $t = (a, q)$ and $u = (b, r)$:
- Language of $t$: $L(t) = \{a\sigma \in \Sigma^+ \mid \sigma \in L(q)\}$
- $t \sqsubseteq_L u \Leftrightarrow a = b \wedge q \sqsubseteq_L r$

Compression
For any set of transitions $T$ define:
$$\downarrow T = \begin{cases} T & \text{if } \neg\exists t, u \in T .\ t \neq u \wedge t \sqsubseteq_L u \\ \downarrow(T - \{t\}) & \text{otherwise, where } t \in T \text{ and } \exists u \in T .\ t \neq u \wedge t \sqsubseteq_L u \end{cases}$$

Compression

Language equivalence
For a set of transitions $T$:
- Language of $T$: $L(T) = \bigcup_{t \in T} L(t)$
- Proposition: $T \equiv_L \downarrow T$
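Compression of a transition set can be sketched as follows, using the simulation preorder as the polynomial-time stand-in for $\sqsubseteq_L$. This is a sketch under assumptions: `simulation_preorder` and `compress` are hypothetical names, the preorder is computed by a naive greatest-fixpoint iteration rather than an optimized algorithm, and the NFA is an assumed example that is not taken verbatim from the slides.

```python
def simulation_preorder(states, alphabet, delta, finals):
    """Largest simulation on the NFA, as a set of pairs (s, t): t simulates s."""
    # Start from all pairs that respect finality, then remove violations.
    sim = {
        (s, t) for s in states for t in states
        if s not in finals or t in finals
    }
    changed = True
    while changed:
        changed = False
        for (s, t) in list(sim):
            for a in alphabet:
                # Violation: some a-successor of s is matched by no a-successor of t.
                if any(
                    all((s2, t2) not in sim for t2 in delta.get((t, a), ()))
                    for s2 in delta.get((s, a), ())
                ):
                    sim.discard((s, t))
                    changed = True
                    break
    return sim


def compress(T, leq):
    """Drop transitions (a, q) dominated by another (a, r) with leq(q, r), one at a time."""
    T = set(T)
    while True:
        victim = next(
            (
                t for t in sorted(T) for u in T
                if t != u and t[0] == u[0] and leq(t[1], u[1])
            ),
            None,
        )
        if victim is None:
            return frozenset(T)  # no transition is irrelevant any more
        T.remove(victim)


# Hypothetical NFA (an assumption): states 1 and 2 both accept a*b, state 3 is final.
STATES = {0, 1, 2, 3}
ALPHABET = "ab"
FINALS = {3}
DELTA = {
    (0, "a"): {1, 2}, (0, "b"): {2},
    (1, "a"): {1}, (1, "b"): {3},
    (2, "a"): {2}, (2, "b"): {3},
}

sim = simulation_preorder(STATES, ALPHABET, DELTA, FINALS)

# Outgoing transitions of state 0, as (symbol, target) pairs.
T0 = {(a, q) for a in ALPHABET for q in DELTA.get((0, a), ())}
compressed = compress(T0, lambda q, r: (q, r) in sim)
print(sorted(compressed))
```

Because states 1 and 2 simulate each other here, either of the two a-transitions could be dropped. The sketch picks the smallest one in sorted order only to make the result deterministic.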
Uniqueness?
- $\downarrow T$ is not unique for a given $T$
- $\downarrow T$ is unique if no two states in the NFA are language equivalent

Sets of transitions

Language equivalence
- For any state $p$, $T_p$ is the set of outgoing transitions of $p$
- $L(p) = L(T_p) \cup \begin{cases} \{\lambda\} & \text{if } p \in F \\ \emptyset & \text{if } p \notin F \end{cases}$
- For any set of states $P$: $L(P) = \bigcup_{p \in P} \left( L(T_p) \cup \begin{cases} \{\lambda\} & \text{if } p \in F \\ \emptyset & \text{if } p \notin F \end{cases} \right)$

DFA state
- A set of transitions $T$
- $T$ does not determine whether the state is final
- Add a boolean

Solution 2

Algorithm
- Minimize NFA under language semantics
- Subset construction; DFA states are tuples $(T, b)$
- Replace every $(T, b)$ by $(\downarrow T, b)$

Implementation
- Minimize NFA under simulation semantics
- Use $\underset{\to}{\subset}$ for compression

Conclusions

Done:
- Two algorithms for NFA $\to$ minimal DFA
- Aim: improve space efficiency (average case)
- Implementation trade-off: simulation semantics
- Suboptimal results
To do:
- Implement Solution 2
- Investigate performance gain in practice (benchmarking!)
- Compare to other tools
- Finish paper