Uploaded by 韩赵瑞

Lecture 18 - NFA and RegEx

INTRODUCTION TO
DISCRETE STRUCTURES
NFA and RegEx
1.1–1.2
12.2 NFA and Regular Expressions
• Regular Expressions
• Non-deterministic finite automata (NFA)
• Rules for building NFA
Regular Sets
— A language is the set of strings derived from a grammar
— Type 3 grammars are known as regular grammars.
— A set is generated by a regular grammar if and only if it is a regular set.
— The languages generated by right-linear grammars are regular sets.
— Regular sets are the sets represented by regular expressions.
Regular Expressions
Previously, we described the languages generated by grammars in English,
but describing sets this way can become overly complicated.
Regular expressions, often called regex, are an algebraic way to describe
the sequences of characters that make up the strings in regular sets.
— Each regular expression represents a set specified by these rules:
∅ represents the empty set, that is, the set with no strings;
λ represents the set {λ}, which is the set containing the empty string;
x represents the set {x} containing the string with one symbol x;
(AB) represents the concatenation of the sets represented by A and by B;
(A ∪ B) represents the union of the sets represented by A and by B;
A* represents the Kleene closure of the set represented by A.
Regex Operations
There are a few important operations on RegEx (priority 1 is applied first):
1. Concatenation (Priority 3) - Combines two RegEx into a larger RegEx that matches a string from the first followed by a string from the second
2. Or (Priority 4) - Denoted by | or ∪, the operation allows a choice between one RegEx or another
3. Closure (Priority 2) - Denoted by *, it matches zero or more repetitions of the RegEx it is applied to
4. Parentheses (Priority 1) - The use of parentheses determines the order of operations
Regular Sets: The regular sets are those that can be formed using the operations of
concatenation, union, and Kleene closure in arbitrary order.
Regular Expressions Examples
Example: What are the strings in the regular sets specified by the regular
expressions 10*, (10)*, 0 ∪ 01, 0(0 ∪ 1)*, and (0*1)*?
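Solution:
10* : a 1 followed by any number of 0s (including no 0s)
(10)* : any number of copies of 10 (including the empty string)
0 ∪ 01 : the string 0 or the string 01
0(0 ∪ 1)* : any string beginning with 0
(0*1)* : any string not ending with 0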
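The regular sets in this example can be spot-checked by enumerating short binary strings. The sketch below uses Python's re module and writes the union sign ∪ as |, which is that module's spelling rather than the slide's notation. The helper name members and the length cutoff are choices made for this illustration.

```python
import re
from itertools import product

# The five expressions from the example, with "∪" translated to "|".
EXPRESSIONS = ["10*", "(10)*", "0|01", "0(0|1)*", "(0*1)*"]

def members(pattern, max_len):
    """Return every binary string of length <= max_len that the pattern matches."""
    found = []
    for n in range(max_len + 1):
        for letters in product("01", repeat=n):
            s = "".join(letters)
            if re.fullmatch(pattern, s):
                found.append(s)
    return found

if __name__ == "__main__":
    for pattern in EXPRESSIONS:
        print(pattern, members(pattern, 3))
```

Only strings up to the chosen length are listed, so this is a check on small cases and not a proof about the whole set.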
Writing Regex
Common Symbol Notations
• For a list of symbols x,
• [x] : Exactly one of the symbols must be substituted here
• [x]+ : One or more symbols must be substituted here
• [x]* : Zero or more symbols can be substituted here
• [x]{n} : Exactly n symbols must be substituted here
• [x]{n,} : n or more symbols must be substituted here
• A backslash before a symbol, such as \), means that the symbol stands for the literal character ‘)’ and not for regex notation
Social Security Number. [0-9]{3}\-[0-9]{2}\-[0-9]{4}
Email Address. [a-zA-Z0-9._-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
Java Identifier. [a-zA-Z_$][a-zA-Z0-9_$]*
Phone Number. \([0-9]{3}\) [0-9]{3}\-[0-9]{4}
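The four patterns above can be tried out directly. This is a minimal sketch using Python's re module; the dictionary keys, the helper name matches, and every sample string are made up for the illustration.

```python
import re

# The four patterns from the slide, copied as raw strings.
PATTERNS = {
    "ssn": r"[0-9]{3}\-[0-9]{2}\-[0-9]{4}",
    "email": r"[a-zA-Z0-9._-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}",
    "java_identifier": r"[a-zA-Z_$][a-zA-Z0-9_$]*",
    "phone": r"\([0-9]{3}\) [0-9]{3}\-[0-9]{4}",
}

def matches(kind, text):
    """True if the whole text matches the named pattern."""
    return re.fullmatch(PATTERNS[kind], text) is not None

if __name__ == "__main__":
    # Sample inputs are invented for this sketch.
    for kind, text in [("ssn", "123-45-6789"), ("email", "student@example.edu"),
                       ("java_identifier", "_count$1"), ("phone", "(555) 123-4567")]:
        print(kind, text, matches(kind, text))
```

fullmatch is used so that the pattern has to describe the entire string, which mirrors the idea of a string belonging to the regular set.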
Duality between RegEx and DFA
RE. (Regular Expression) Concise way to describe a set of strings
DFA. (Deterministic Finite Automaton) Machine that decides whether a given string belongs to a given set
Kleene’s Theorem: In 1956, Kleene showed that a set is recognized by a finite-state automaton
(FSA) if and only if it is a regular set.
• For any DFA, there exists a RE that describes the same set of strings
• For any RE, there exists a DFA that recognizes the same set of strings
• One of the most widely used string search algorithms, the KMP algorithm, builds a DFA in
the preprocessing stage to efficiently search for a pattern within a long string.
RE to DFA
Write a regex that matches any binary string in which the number of 1’s is a multiple of 3 (including zero 1’s)
Basic Plan.
1. Build a DFA from the RE
2. Simulate DFA with text as input
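As a rough illustration of the plan, the sketch below simulates a three-state DFA for "the number of 1's is a multiple of 3" and compares it with one candidate regex. The DFA here is written by hand, not built from the RE automatically, and the candidate 0*(10*10*10*)* is an assumption of this sketch and not necessarily the answer intended on the slide.

```python
import re
from itertools import product

# State k means "the number of 1s read so far is congruent to k mod 3".
START = 0
ACCEPT = {0}

def step(state, symbol):
    """Transition function of the hand-built DFA."""
    return (state + 1) % 3 if symbol == "1" else state

def dfa_accepts(text):
    """Simulate the DFA with the text as input."""
    state = START
    for symbol in text:
        state = step(state, symbol)
    return state in ACCEPT

# Candidate regex (an assumption of this sketch).
CANDIDATE = r"0*(10*10*10*)*"

def agree(max_len):
    """Check that the DFA and the regex agree on all binary strings up to max_len."""
    for n in range(max_len + 1):
        for letters in product("01", repeat=n):
            s = "".join(letters)
            if dfa_accepts(s) != (re.fullmatch(CANDIDATE, s) is not None):
                return False
    return True

if __name__ == "__main__":
    print(dfa_accepts("0110100"), agree(8))
```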
NFA and Regular Expressions
• Regular Expressions
• Non-deterministic finite automata (NFA)
• Rules for building NFA
State Machines
FSM
• A finite state machine is a state machine with a finite number of states and an output. The
machine must have a start state. The machine runs as long as there is input.
FSA
• A finite state automaton is a machine with no output. It must have a start state and at least one
accept state. The machine runs until the end of the input, and the input is accepted or rejected
based on the state where the machine ends up.
DFA
• A deterministic finite automaton is an FSA that is deterministic. That is, for each state and each
input symbol there is exactly one state that it can transition into.
NFA
• Non-deterministic finite automata ?
NFA
— A nondeterministic finite-state automaton M = (S, I, f, s0, F) consists of
— A finite set S of states
— A finite input alphabet I
— A transition function f that assigns a set of states to every pair of state and input (so that f: S × I → P(S))
— An initial or start state s0
— A subset F of S consisting of final (or accepting) states.
— We can represent a nondeterministic finite-state automaton using a state table where we give a list of possible next
states for each pair of a state and an input value.
— We construct a state diagram for a nondeterministic automaton by including an edge from each state to all possible next
states, labeling edges with the input or inputs that lead to this transition.
— We use the abbreviation NFA for a nondeterministic finite-state automaton and DFA for a deterministic finite-state automaton
when we need to distinguish between the two.
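One way to make the definition concrete is to store f as a dictionary from (state, input) pairs to sets of states and to track the set of states the machine could currently be in. The NFA below, which accepts binary strings ending in 01, is a hypothetical example invented for this sketch, as are all of the names.

```python
# Hypothetical NFA M = (S, I, f, s0, F) over I = {0, 1}; it accepts strings ending in "01".
S = {"s0", "s1", "s2"}
I = {"0", "1"}
START = "s0"
FINAL = {"s2"}

# State table: pairs that are missing map to the empty set of next states.
TABLE = {
    ("s0", "0"): {"s0", "s1"},
    ("s0", "1"): {"s0"},
    ("s1", "1"): {"s2"},
}

def f(state, symbol):
    """Transition function f: S x I -> P(S)."""
    return TABLE.get((state, symbol), set())

def nfa_accepts(text):
    """Accept if at least one sequence of choices ends in a final state."""
    current = {START}
    for symbol in text:
        # Follow every possible transition from every state we might be in.
        current = set().union(*(f(s, symbol) for s in current))
    return bool(current & FINAL)

if __name__ == "__main__":
    for text in ["01", "1101", "0110", ""]:
        print(repr(text), nfa_accepts(text))
```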
Example: Find the state diagram for the NFA with the state table shown in Table 2. The final states are s2 and s3.
Finding a DFA Equivalent to a NFA
— For every NFA there is an equivalent DFA. That is, if the language L is recognized by an NFA M0, then L is
also recognized by a DFA M1.
We construct the DFA M1 so that
— The start state of M1 is {s0}.
— The input set of M1 is the same as the input set of M0.
— Each state in M1 is a set of states of M0. Construct new states in M1 by treating each unique entry in the M0
transition table as a single state of its own; for example, the entry s_i, s_j becomes the state {s_i, s_j} and an empty entry becomes ∅.
— Given a state {s_{i_1}, s_{i_2}, …, s_{i_k}} in M1 and an input symbol x, the transition from this state leads to the
union of the sets f(s_{i_1}, x), f(s_{i_2}, x), …, f(s_{i_k}, x) from M0, taken over the states of M0 that make up the state of M1.
— The final states of M1 are the sets that contain a final state of M0.
— To see that M0 and M1 are equivalent, first suppose that an input string is recognized by M0. This means that
one of the states that can be reached from s0 is a final state. So, in M1 this input string leads from {s0} to a set
of states of M0 that contains the final state. Since this is a final state of M1, this string is also recognized by
M1.
— Conversely, a string that is not recognized by M0 does not lead to any final states in M0. Consequently, this
input string does not lead from {s0} to a final state of M1.
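The construction can be sketched in a few lines. The version below only builds the subsets that are reachable from {s0}, which is a common shortcut and not something the slide requires. The NFA it is applied to (binary strings ending in 01) is a hypothetical example, and all function and variable names are choices made for this sketch.

```python
from itertools import product

def subset_construction(alphabet, f, start, final):
    """Build the DFA table for an NFA given as a dict (state, input) -> set of states."""
    start_state = frozenset({start})
    table = {}
    todo = [start_state]
    while todo:
        subset = todo.pop()
        if subset in table:
            continue
        table[subset] = {}
        for x in alphabet:
            # Union of f(s, x) over every NFA state s in this DFA state.
            target = frozenset(t for s in subset for t in f.get((s, x), ()))
            table[subset][x] = target
            todo.append(target)
    # A DFA state is final if it contains a final state of the NFA.
    accepting = {subset for subset in table if subset & final}
    return table, start_state, accepting

# Hypothetical NFA accepting strings that end in "01".
NFA_F = {("s0", "0"): {"s0", "s1"}, ("s0", "1"): {"s0"}, ("s1", "1"): {"s2"}}
NFA_START, NFA_FINAL = "s0", {"s2"}

def nfa_accepts(text):
    current = {NFA_START}
    for x in text:
        current = {t for s in current for t in NFA_F.get((s, x), ())}
    return bool(current & NFA_FINAL)

TABLE, DFA_START, DFA_FINAL = subset_construction("01", NFA_F, NFA_START, NFA_FINAL)

def dfa_accepts(text):
    state = DFA_START
    for x in text:
        state = TABLE[state][x]
    return state in DFA_FINAL

def machines_agree(max_len):
    """Compare both machines on every binary string up to max_len."""
    return all(nfa_accepts("".join(w)) == dfa_accepts("".join(w))
               for n in range(max_len + 1) for w in product("01", repeat=n))

if __name__ == "__main__":
    print(len(TABLE), machines_agree(8))
```

Agreement on short strings is only a sanity check; the argument above is what shows the two machines are equivalent in general.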
Finding an Equivalent DFA (cont.)
Example: Find a DFA that recognizes the same language as the NFA:
Solution: Following the steps of the procedure described on the
previous slide, we obtain the DFA shown here.
NFA Workshop 1
Consider the following NFA on the alphabet {0, 1} with states {q0, q1, q2} and start state q0.
1. What language is recognized by this machine?
L = 0+ | (0*1{2,})
2. Fill in the transition table for NFA
3. convert NFA to a DFA
NFA
Input\State   q0        q1        q2
0             q0, q1    ∅         ∅
1             q2        ∅         q1, q2
DFA
Input\State   q0        q1        q2        [q0,q1]   [q1,q2]   ∅
0             [q0,q1]   ∅         ∅         [q0,q1]   ∅         ∅
1             q2        ∅         [q1,q2]   q2        [q1,q2]   ∅
q1 can be deleted as there is no way to reach it from q0
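The remark about deleting q1 can be checked by searching for the DFA states that are reachable from the start state. In this sketch the transitions are transcribed from the Workshop 1 NFA table, with blank cells read as the empty set (an assumption); the function names are invented for the illustration.

```python
# Workshop 1 NFA transitions; pairs not listed are taken to be empty.
NFA = {
    ("q0", "0"): {"q0", "q1"},
    ("q0", "1"): {"q2"},
    ("q2", "1"): {"q1", "q2"},
}
ALPHABET = "01"

def move(subset, x):
    """DFA transition: union of the NFA transitions of every state in the subset."""
    return frozenset(t for s in subset for t in NFA.get((s, x), ()))

def reachable(start):
    """All DFA states that can be reached from the start state."""
    seen = {start}
    todo = [start]
    while todo:
        subset = todo.pop()
        for x in ALPHABET:
            nxt = move(subset, x)
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen

REACHABLE = reachable(frozenset({"q0"}))

if __name__ == "__main__":
    for subset in sorted(REACHABLE, key=lambda s: (len(s), sorted(s))):
        print(sorted(subset))
```

Any DFA state missing from REACHABLE, such as the singleton q1, can be dropped without changing the recognized language.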
NFA Workshop 2
1. What language is recognized by this machine?
(0*1+)* | (0*1+01*)*
2. Fill in the transition table for NFA
3. convert NFA to a DFA
NFA
Input\State   s0              s1              s2              s3
0             [s0,s1]         s0              ∅               [s0,s1,s2]
1             s3              [s1,s3]         [s0,s2]         s1
DFA
Input\State   s0              s1              s2              s3              [s0,s1]         [s0,s2]         [s1,s3]         [s0,s1,s2]      [s0,s2,s3]      [s0,s1,s2,s3]   ∅
0             [s0,s1]         s0              ∅               [s0,s1,s2]      [s0,s1]         [s0,s1]         [s0,s1,s2]      [s0,s1]         [s0,s1,s2]      [s0,s1,s2]      ∅
1             s3              [s1,s3]         [s0,s2]         s1              [s1,s3]         [s0,s2,s3]      [s1,s3]         [s0,s1,s2,s3]   [s0,s1,s2,s3]   [s0,s1,s2,s3]   ∅
s2 and [s0,s2] can be deleted as there is no way to reach them from s0
NFA Workshop 3
1. What language is recognized by this machine?
0*(0|1)(0|1)*
2. Fill in the transition table for NFA
3. convert NFA to a DFA
NFA
State\Input   a             b             c             d             e
0             a,b,c,d,e     c             ∅             e             ∅
1             d,e           c,e           b             ∅             ∅
DFA
State\Input   a             b             c             d             e             [a,b,c,d,e]   [d,e]         [c,e]         [b,c,d,e]     [b,c,e]       ∅
0             [a,b,c,d,e]   c             ∅             e             ∅             [a,b,c,d,e]   e             ∅             [c,e]         c             ∅
1             [d,e]         [c,e]         b             ∅             ∅             [b,c,d,e]     ∅             b             [b,c,e]       [b,c,e]       ∅
d can be deleted as there is no way to reach it from a
DFA vs NFA
Build a DFA to recognize all strings of the form (a|b)^n a (a|b)*
• Pretty easy to do, only takes n+2 states
• Can also easily construct a simple NFA
Now, construct an NFA to recognize the reversed strings (a|b)* a (a|b)^(n−1)
Constructing the corresponding DFA would take 2^n states. This exponential growth makes the
construction infeasible for large n.
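The growth can be observed by counting the subset states that a DFA conversion actually reaches. The sketch assumes the reversed language is (a|b)* a (a|b)^(n−1), that is, strings whose n-th symbol from the end is a, and uses a hypothetical NFA with states 0..n; the function names are invented for the illustration.

```python
def nth_from_end_nfa(n):
    """NFA with states 0..n for (a|b)* a (a|b)^(n-1); state n is the final state."""
    f = {(0, "a"): {0, 1}, (0, "b"): {0}}
    for i in range(1, n):
        f[(i, "a")] = {i + 1}
        f[(i, "b")] = {i + 1}
    return f

def count_dfa_states(f, start=0, alphabet="ab"):
    """Count the subset states reachable from {start} under the subset construction."""
    start_state = frozenset({start})
    seen = {start_state}
    todo = [start_state]
    while todo:
        subset = todo.pop()
        for x in alphabet:
            nxt = frozenset(t for s in subset for t in f.get((s, x), ()))
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return len(seen)

if __name__ == "__main__":
    for n in range(1, 9):
        # The NFA has n + 1 states; compare with the number of DFA states.
        print(n, n + 1, count_dfa_states(nth_from_end_nfa(n)))
```

Each reachable subset records which of the last n symbols were an a, which is why the count doubles every time n grows by one.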
NFA’s with 𝜀-transitions
Allows the automaton to change its state without consuming any input symbols
Purpose of 𝜀-transitions
1. Modeling optional elements
• An 𝜀-transition from one state to another can be used to represent an optional character
2. Adding flexibility to automata
• Allow NFAs to represent languages that would be difficult to represent with DFAs
3. Modeling context-free grammars
• Used to describe the syntax of programming languages and natural languages
Despite their flexibility, epsilon transitions do not increase expressive power. In other words, a
language that can be recognized by an NFA with epsilon transitions can be recognized without
epsilon transitions. However, epsilon transitions can make NFAs more concise and easier to
understand for some grammars.
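A small sketch of both points: simulating an NFA with 𝜀-transitions via 𝜀-closures, and removing the 𝜀-transitions without changing the language. The automaton (an optional a followed by b), the label "eps", and all names are hypothetical choices for this illustration.

```python
from itertools import product

EPS = "eps"  # label used in this sketch for epsilon-transitions
STATES = {"s0", "s1", "s2"}
F = {
    ("s0", EPS): {"s1"},   # the 'a' is optional: skip it without reading input
    ("s0", "a"): {"s1"},
    ("s1", "b"): {"s2"},
}
START, FINAL = "s0", {"s2"}

def closure(states):
    """All states reachable from the given states using only epsilon-transitions."""
    seen = set(states)
    todo = list(states)
    while todo:
        s = todo.pop()
        for t in F.get((s, EPS), ()):
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return seen

def accepts(text):
    """Simulate the NFA with epsilon-transitions."""
    current = closure({START})
    for x in text:
        current = closure({t for s in current for t in F.get((s, x), ())})
    return bool(current & FINAL)

# Equivalent NFA without epsilon-transitions.
PLAIN_F = {(s, x): closure({t for u in closure({s}) for t in F.get((u, x), ())})
           for s in STATES for x in "ab"}
PLAIN_FINAL = {s for s in STATES if closure({s}) & FINAL}

def accepts_plain(text):
    current = {START}
    for x in text:
        current = {t for s in current for t in PLAIN_F[(s, x)]}
    return bool(current & PLAIN_FINAL)

def same_language(max_len):
    return all(accepts("".join(w)) == accepts_plain("".join(w))
               for n in range(max_len + 1) for w in product("ab", repeat=n))

if __name__ == "__main__":
    print(accepts("b"), accepts("ab"), accepts("a"), same_language(6))
```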
𝜀-transition Example
Given the following NFA with 𝜀-transition and alphabet={a,b}, find the final states given the input
string ‘aba’.
Given this change to the automata from a previous slide,
how does the language recognized by the automata
change?
NFA and Regular Expressions
• Regular Expressions
• Non-deterministic finite automata (NFA)
• Rules for building NFA (optional)
Rules for building NFA
Regular Expressions matching NFA’s
• Enclose the RE in parentheses
• One state per character (including parentheses)
• Red 𝜀-transitions change state without scanning a text character
• Black match transitions change state and scan to the next character
• Accept if any sequence of transitions ends in the accept state
Steps in building NFA
Step 0. Add 𝜀-transitions from each parenthesis to the next state
Step 1. Add a black match-transition edge from each match character to the next state
Step 2. Add 3 red 𝜀-transition edges for each closure (*) operator
Step 3. Add 2 red 𝜀-transition edges for each | operator
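The steps above can be sketched as follows, with one state per character of the parenthesized RE plus one accept state. This is one possible reading of the rules and not necessarily the exact construction used in the lecture. It assumes at most one | inside each pair of parentheses and a text that contains no metacharacters, and all names are invented for the illustration.

```python
def epsilon_edges(regex):
    """Return the red epsilon-transition edges as an adjacency list."""
    m = len(regex)
    eps = [[] for _ in range(m + 1)]     # state m is the accept state
    ops = []                             # stack of positions of '(' and '|'
    for i, ch in enumerate(regex):
        lp = i                           # left end of the sub-expression ending at i
        if ch in "(|":
            ops.append(i)
        elif ch == ")":
            top = ops.pop()
            if regex[top] == "|":        # Step 3: two edges for the | operator
                lp = ops.pop()
                eps[lp].append(top + 1)
                eps[top].append(i)
            else:
                lp = top
        if i + 1 < m and regex[i + 1] == "*":
            eps[lp].append(i + 1)        # Step 2: two of the three closure edges
            eps[i + 1].append(lp)
        if ch in "(*)":
            eps[i].append(i + 1)         # Step 0 (and the third closure edge)
    return eps

def closure(eps, states):
    """States reachable through epsilon-transitions only."""
    seen = set(states)
    todo = list(states)
    while todo:
        v = todo.pop()
        for w in eps[v]:
            if w not in seen:
                seen.add(w)
                todo.append(w)
    return seen

def recognizes(pattern, text):
    regex = "(" + pattern + ")"          # enclose the RE in parentheses
    m = len(regex)
    eps = epsilon_edges(regex)
    current = closure(eps, {0})
    for c in text:
        # Step 1: a black match transition goes from a matching character to the next state.
        matched = {v + 1 for v in current
                   if v < m and regex[v] == c and c not in "()|*"}
        current = closure(eps, matched)
    return m in current

if __name__ == "__main__":
    print(recognizes("(a|b)*abb", "ababb"), recognizes("(a|b)*abb", "abab"))
```

The match transitions are not stored explicitly: a state whose character equals the next text character simply advances to the following state.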