Topic 3-Automata Theory

Topic 3:
Automata Theory
1
Outline
Finite state machine, Regular
expressions, DFA,
NDFA, and their equivalence,
Grammars and Chomsky
hierarchy.
2
What is Automata Theory?
Study of abstract computing devices, or “machines”
Automaton = an abstract computing device
Note: A “device” need not even be physical hardware!
A fundamental question in computer science:
Find out what different models of machines can do and
cannot do
The theory of computation
Computability vs. Complexity
3
Alan Turing (1912-1954)
(A pioneer of automata theory)
Father of Modern Computer
Science
English mathematician
Studied abstract machines
called Turing machines even
before computers existed
Heard of the Turing test?




4
Languages & Grammars
Languages: “A language is a
collection of sentences of finite
length all constructed from a
finite alphabet of symbols”
(“sentences” may also be read as “words”)
Grammars: “A grammar can be
regarded as a device that
enumerates the sentences of a
language” - nothing more,
nothing less
N. Chomsky, Information and
Control, Vol 2, 1959
Image source: Nowak et al. Nature, vol 417, 2002
5
The Chomsky Hierarchy
• A containment hierarchy of classes of formal languages
(innermost to outermost, with the machine that recognizes each class)
Regular (DFA)
Context-free (PDA)
Context-sensitive (LBA)
Recursively enumerable (TM)
6
The Central Concepts
of Automata Theory
7
Alphabet
An alphabet is a finite, non-empty set of symbols
We use the symbol ∑ (sigma) to denote an alphabet
Examples:
Binary: ∑ = {0,1}
All lower case letters: ∑ = {a,b,c,...,z}
Alphanumeric: ∑ = {a-z, A-Z, 0-9}
DNA molecule letters: ∑ = {a,c,g,t}
…
8
Strings
A string or word is a finite sequence of symbols
chosen from ∑
Empty string is ε (or “epsilon”)
Length of a string w, denoted by “|w|”, is equal to the
number of (non-ε) characters in the string
E.g., x = 010100, so |x| = 6
x = 01 ε 0 ε 1 ε 00 ε, so |x| = ?
xy = concatenation of two strings x and y
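The string operations above can be tried out directly. This is a minimal sketch, assuming Python's built-in str stands in for a string over ∑ = {0,1}; the names sigma, epsilon, x, y and x_with_eps are illustrative and not part of the slides.

```python
# Illustrative names only: Python's str plays the role of a string over an alphabet.
sigma = {"0", "1"}      # the binary alphabet
epsilon = ""            # the empty string

x = "010100"
y = "11"

# Every symbol of x is drawn from the alphabet.
assert all(ch in sigma for ch in x)

length_x = len(x)       # |x|
xy = x + y              # concatenation of x and y

# Writing epsilon between the symbols of a string does not change the string.
x_with_eps = "01" + epsilon + "0" + epsilon + "1" + epsilon + "00" + epsilon
```

Since ε contributes no characters, x_with_eps and x are the same string.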
9
Languages
10
The Membership Problem
11
Languages
• Let ∑ be a set of characters. ∑ is called the alphabet.
• A language over ∑ is a set of strings of characters drawn from ∑.
12
Example of Languages
Alphabet = English characters
Language = English sentences
Alphabet = ASCII
Language = C++ programs,
Java, C#
13
Notation
• Languages are sets of strings (finite sequences of characters)
• Need some notation for specifying which sets we want
14
Regular Languages
• Each regular expression is a notation for a regular language (a set of words).
• If A is a regular expression, we write L(A) to refer to the language denoted by A.
15
Regular Expression
• A regular expression (RE) is defined inductively
a = ordinary character from ∑
ε = the empty string
16
Regular Expression
R|S = either R or S
RS = R followed by S (concatenation)
R* = concatenation of R zero or more times (R* = ε|R|RR|RRR...)
17
RE Extensions
R? = ε|R (zero or one R)
R+ = RR* (one or more R)
(R) = R (grouping)
18
RE Extensions
[abc] = a|b|c (any of listed)
[a-z] = a|b|...|z (range)
[^ab] = c|d|... (anything but ‘a’ or ‘b’)
19
Regular Expression
RE        Strings in L(R)
a         “a”
ab        “ab”
a|b       “a” “b”
(ab)*     “” “ab” “abab” ...
(a|ε)b    “ab” “b”
20
Example: integers
• integer: a non-empty string of digits
• digit = ‘0’|‘1’|‘2’|‘3’|‘4’|‘5’|‘6’|‘7’|‘8’|‘9’
• integer = digit digit*
21
Example: identifiers
• identifier: string of letters or digits starting with a letter
• C identifier: [a-zA-Z_][a-zA-Z0-9_]*
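The integer and identifier definitions can be checked with Python's standard re module. This is a sketch only: the names DIGIT, INTEGER, C_IDENTIFIER and in_language are made up for illustration, and Python's re engine supports more than regular languages, so only its regular subset is used here.

```python
import re

# Patterns transcribed from the slides into Python's re syntax.
DIGIT = r"[0-9]"                               # digit = '0'|'1'|...|'9'
INTEGER = re.compile(DIGIT + DIGIT + r"*")     # integer = digit digit*
C_IDENTIFIER = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def in_language(pattern, w):
    """Return True if the whole string w belongs to L(pattern)."""
    # fullmatch requires the entire string to match, not just a prefix.
    return pattern.fullmatch(w) is not None
```

For example, in_language(INTEGER, "2024") holds, while in_language(C_IDENTIFIER, "1abc") does not, because an identifier cannot start with a digit.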
22
Recap
Language L(R):
set of strings represented
by a regular expression R.
L(R) is the language
denoted by regular
expression R.
23
How to Use REs
• We need a mechanism to determine if an input string w belongs to L(R), the language denoted by regular expression R.
24
Acceptor
• Such a mechanism is called an acceptor.
input string w, language L → acceptor →
yes, if w ∈ L
no, if w ∉ L
25
Finite Automata (FA)
• Specification: Regular Expressions
• Implementation: Finite Automata
26
Finite Automata
A finite automaton consists of
• An input alphabet (∑)
• A set of states
• A start (initial) state
• A set of transitions
• A set of accepting (final) states
27
Finite Automaton
State Graphs
A state
The start state
An accepting
state
28
Finite Automaton
State Graphs
a
A transition
29
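Finite Automata
• A finite automaton accepts a string if we can follow transitions labelled with characters in the string from the start state to some accepting state.
30
FA Example
• An FA that accepts only “1”
[State graph: start state, one transition labelled 1, accepting state]
31
FA Example
• An FA that accepts any number of 1’s followed by a single 0
[State graph: self-loop labelled 1 on the start state, transition labelled 0 to the accepting state]
32
FA Example
• An FA that accepts ab*a
• Alphabet: {a,b}
[State graph: transition labelled a, self-loop labelled b, transition labelled a to the accepting state]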
33
Table Encoding of FA
• Transition table
[State graph: 0 --a--> 1, 1 --b--> 1, 1 --a--> 2]

state   a     b
0       1     err
1       2     1
2       err   err
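A table like this can drive a very small recognizer. The sketch below assumes the table encodes the FA for ab*a from the previous slide, with start state 0 and accepting state 2 (both read off that example, not stated on this slide). The names TABLE, ERR, START, ACCEPTING and accepts are illustrative.

```python
ERR = None  # stands for the "err" entries in the table

# TABLE[state][symbol] gives the next state.
TABLE = {
    0: {"a": 1, "b": ERR},
    1: {"a": 2, "b": 1},
    2: {"a": ERR, "b": ERR},
}
START = 0          # assumed start state
ACCEPTING = {2}    # assumed accepting state


def accepts(w):
    """Run the table-driven automaton on w and report acceptance."""
    state = START
    for ch in w:
        # Symbols outside the alphabet are treated like an "err" entry.
        state = TABLE[state].get(ch, ERR)
        if state is ERR:
            return False
    return state in ACCEPTING
```

Each input character costs one table lookup, which is the sense in which a table-driven implementation is simple and fast.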
34
RE → Finite Automata
• Can we build a finite automaton for every regular expression?
• Yes: build the FA inductively, following the inductive definition of regular expressions.
35
NFA
Nondeterministic Finite Automaton (NFA)
• Can have multiple transitions for one input in a given state
• Can have ε-moves
36
Epsilon Moves
• ε-moves: the machine can move from state A to state B without consuming input
A --ε--> B
37
NFA
Operation of the automaton is not
completely determined by the input
1
A
0
B
1
C
On input “11”, automaton could be
in either state
38
Execution of FA
An NFA can choose
• Whether to make ε-moves.
• Which of multiple transitions to take for a single input.
39
Acceptance of NFA
• An NFA can get into multiple states
• Rule: an NFA accepts if it can get into a final state
1
A
0
B
1
C
0
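The acceptance rule can be simulated by tracking every state the NFA could be in at once. This is a minimal sketch on a hypothetical NFA, not the one drawn on the slide (its transitions cannot be recovered from the diagram labels). The example machine accepts binary strings ending in 1, and the names DELTA, EPS, eps_closure and nfa_accepts are illustrative.

```python
EPS = ""  # label used here for ε-moves; input characters are never empty

# Hypothetical NFA: accepts binary strings that end in 1.
DELTA = {
    ("A", "0"): {"A"},
    ("A", "1"): {"A", "B"},   # two choices on the same input
    ("B", EPS): {"C"},        # ε-move: no input consumed
}
START = "A"
FINAL = {"C"}


def eps_closure(states):
    """All states reachable from `states` using only ε-moves."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in DELTA.get((s, EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def nfa_accepts(w):
    """Accept if at least one possible run ends in a final state."""
    current = eps_closure({START})
    for ch in w:
        moved = set()
        for s in current:
            moved |= DELTA.get((s, ch), set())
        current = eps_closure(moved)
    return bool(current & FINAL)
```

The set `current` plays the role of "all states the automaton could be in"; the string is accepted as soon as that set contains a final state at the end of the input.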
40
DFA and NFA
Deterministic Finite Automata (DFA)
• One transition per input per state.
• No ε-moves
41
Execution of FA
A DFA
• can take only one path through the state graph.
• Completely determined by input.
42
NFA vs DFA
• NFAs and DFAs recognize the same set of languages (regular languages)
• DFAs are easier to implement (table driven).
43
NFA vs DFA
• For a given language, the NFA can be simpler than the DFA.
• The DFA can have exponentially more states than the NFA.
44
NFA vs DFA
• NFAs are the key to automating RE → DFA construction.
45
RE → NFA Construction
Thompson’s construction (CACM 1968)
• Build an NFA for each RE term.
• Combine NFAs with ε-moves.
46
RE → NFA Construction
Subset construction
NFA → DFA
• Build the simulation: each DFA state stands for a set of NFA states.
• Minimize number of states in DFA (Hopcroft’s algorithm)
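The subset construction can be sketched in a few lines. The NFA below is hypothetical (binary strings ending in 1) and all names are illustrative; the minimization step is not shown.

```python
from collections import deque

EPS = ""  # label used here for ε-moves

# Hypothetical NFA: accepts binary strings that end in 1.
NFA_DELTA = {
    ("A", "0"): {"A"},
    ("A", "1"): {"A", "B"},
    ("B", EPS): {"C"},
}
NFA_START, NFA_FINAL, ALPHABET = "A", {"C"}, ("0", "1")


def eps_closure(states):
    """Set of NFA states reachable through ε-moves, as a hashable frozenset."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in NFA_DELTA.get((s, EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def subset_construction():
    """Build a DFA whose states are sets of NFA states."""
    start = eps_closure({NFA_START})
    dfa_delta, seen, work = {}, {start}, deque([start])
    while work:
        current = work.popleft()
        for ch in ALPHABET:
            moved = set()
            for s in current:
                moved |= NFA_DELTA.get((s, ch), set())
            target = eps_closure(moved)
            dfa_delta[(current, ch)] = target
            if target not in seen:
                seen.add(target)
                work.append(target)
    finals = {state for state in seen if state & NFA_FINAL}
    return start, dfa_delta, finals, seen


def dfa_accepts(w):
    """Run the constructed DFA: exactly one transition per input symbol."""
    start, dfa_delta, finals, _ = subset_construction()
    state = start
    for ch in w:
        state = dfa_delta[(state, ch)]
    return state in finals
```

For this small NFA only two sets of states are reachable, {A} and {A, B, C}, so the resulting DFA has two states. In general the number of reachable sets can grow exponentially with the number of NFA states.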
47
RE → NFA Construction
Key idea:
• NFA pattern for each symbol and each operator.
• Join them with ε-moves in precedence order.
48
RE → NFA Construction
NFA for a: s0 --a--> s1
NFA for ab: s0 --a--> s1 --ε--> s3 --b--> s4
49
RE → NFA Construction
NFA for a: s0 --a--> s1
50
RE → NFA Construction
NFA for a: s0 --a--> s1
NFA for b: s3 --b--> s4
51
RE → NFA Construction
NFA for a: s0 --a--> s1
NFA for b: s3 --b--> s4
52
RE → NFA Construction
NFA for a: s0 --a--> s1
NFA for b: s3 --b--> s4
NFA for ab: s0 --a--> s1 --ε--> s3 --b--> s4
53
RE → NFA Construction
NFA for a | b:
s0 --ε--> s1 --a--> s2 --ε--> s5
s0 --ε--> s3 --b--> s4 --ε--> s5
54
RE → NFA Construction
NFA for a: s1 --a--> s2
55
RE → NFA Construction
NFA for a and b:
s1 --a--> s2
s3 --b--> s4
56
RE → NFA Construction
NFA for a | b:
s0 --ε--> s1 --a--> s2 --ε--> s5
s0 --ε--> s3 --b--> s4 --ε--> s5
57
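RE → NFA Construction
NFA for a*:
s0 --ε--> s1 --a--> s2 --ε--> s4
s2 --ε--> s1 (repeat a)
s0 --ε--> s4 (zero occurrences of a)
58
RE → NFA Construction
NFA for a: s1 --a--> s2
59
RE → NFA Construction
NFA for a*:
s0 --ε--> s1 --a--> s2 --ε--> s4
s2 --ε--> s1 (repeat a)
s0 --ε--> s4 (zero occurrences of a)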
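The four patterns shown so far (symbol, concatenation, alternation, star) can be written as small functions that glue NFA fragments together with ε-moves. This is a sketch in the spirit of Thompson's construction, not a reference implementation: the class Frag and the functions symbol, concat, alt, star and accepts are illustrative names, and states are numbered by a simple counter rather than s0, s1, ... as in the diagrams.

```python
import itertools

EPS = None                 # label for ε-moves
_ids = itertools.count()   # source of fresh state numbers


class Frag:
    """An NFA fragment with one start state and one accepting state."""

    def __init__(self, start, accept, edges):
        self.start, self.accept = start, accept
        self.edges = edges  # list of (source, label, destination)


def symbol(a):
    s, t = next(_ids), next(_ids)
    return Frag(s, t, [(s, a, t)])


def concat(r, s):
    # Join r's accepting state to s's start state with an ε-move.
    return Frag(r.start, s.accept, r.edges + s.edges + [(r.accept, EPS, s.start)])


def alt(r, s):
    start, accept = next(_ids), next(_ids)
    new = [(start, EPS, r.start), (start, EPS, s.start),
           (r.accept, EPS, accept), (s.accept, EPS, accept)]
    return Frag(start, accept, r.edges + s.edges + new)


def star(r):
    start, accept = next(_ids), next(_ids)
    new = [(start, EPS, r.start), (r.accept, EPS, accept),
           (r.accept, EPS, r.start),   # repeat r
           (start, EPS, accept)]       # zero occurrences of r
    return Frag(start, accept, r.edges + new)


def accepts(nfa, w):
    """Simulate the fragment as an NFA by tracking sets of states."""
    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            u = stack.pop()
            for src, label, dst in nfa.edges:
                if src == u and label is EPS and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    current = closure({nfa.start})
    for ch in w:
        current = closure({dst for src, label, dst in nfa.edges
                           if src in current and label == ch})
    return nfa.accept in current


# (a|b)*ab : strings over {a,b} that end in "ab"
nfa = concat(star(alt(symbol("a"), symbol("b"))),
             concat(symbol("a"), symbol("b")))
```

Each operator adds at most two states and a handful of ε-moves, so the NFA grows linearly with the size of the regular expression.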
60
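Example RE → NFA
NFA for a ( b|c )*:
s0 --a--> s1 --ε--> s2 --ε--> s3
s3 --ε--> s4 --b--> s5 --ε--> s8
s3 --ε--> s6 --c--> s7 --ε--> s8
s8 --ε--> s3
s8 --ε--> s9
s2 --ε--> s9
61
Example RE → NFA
Building NFA for a ( b|c )*
s0 --a--> s1
62
Example RE → NFA
NFA for a, b and c:
s0 --a--> s1
s4 --b--> s5
s6 --c--> s7
63
Example RE → NFA
NFA for a and b|c:
s0 --a--> s1
s3 --ε--> s4 --b--> s5 --ε--> s8
s3 --ε--> s6 --c--> s7 --ε--> s8
64
Example RE → NFA
NFA for a and ( b|c )*:
s0 --a--> s1
s2 --ε--> s3
s3 --ε--> s4 --b--> s5 --ε--> s8
s3 --ε--> s6 --c--> s7 --ε--> s8
s8 --ε--> s3
s8 --ε--> s9
s2 --ε--> s9
65
Example RE → NFA
NFA for a ( b|c )*:
s0 --a--> s1 --ε--> s2 --ε--> s3
s3 --ε--> s4 --b--> s5 --ε--> s8
s3 --ε--> s6 --c--> s7 --ε--> s8
s8 --ε--> s3
s8 --ε--> s9
s2 --ε--> s9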
66
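To see how the finished NFA for a ( b|c )* behaves on an input, the sets of states it could be in can be traced step by step. This sketch assumes the transitions below are the ones in the diagram (read off the state labels s0 to s9, with s0 as start and s9 as the accepting state). The names EDGES, eps_closure and trace are illustrative.

```python
EPS = None  # label for ε-moves

# Transitions as read off the diagram (an assumption where the drawing is ambiguous).
EDGES = [
    ("s0", "a", "s1"),
    ("s1", EPS, "s2"),
    ("s2", EPS, "s3"), ("s2", EPS, "s9"),
    ("s3", EPS, "s4"), ("s3", EPS, "s6"),
    ("s4", "b", "s5"), ("s6", "c", "s7"),
    ("s5", EPS, "s8"), ("s7", EPS, "s8"),
    ("s8", EPS, "s3"), ("s8", EPS, "s9"),
]
START, FINAL = "s0", "s9"


def eps_closure(states):
    """All states reachable from `states` using only ε-moves."""
    stack, seen = list(states), set(states)
    while stack:
        u = stack.pop()
        for src, label, dst in EDGES:
            if src == u and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def trace(w):
    """Return the list of state sets: before any input, then after each symbol."""
    current = eps_closure({START})
    history = [current]
    for ch in w:
        moved = {dst for src, label, dst in EDGES
                 if src in current and label == ch}
        current = eps_closure(moved)
        history.append(current)
    return history
```

A string is accepted when the last set in trace(w) contains s9. After reading just "a" the set already contains s9, which matches the fact that ( b|c )* allows zero repetitions.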