Topic 3: Automata Theory

Outline
Finite state machines, regular expressions, DFA, NFA (NDFA) and their equivalence, grammars and the Chomsky hierarchy.

What is Automata Theory?
Study of abstract computing devices, or “machines”.
Automaton = an abstract computing device.
Note: a “device” need not even be physical hardware!
A fundamental question in computer science: find out what different models of machines can and cannot do.
The theory of computation: computability vs. complexity.

Alan Turing (1912-1954), a pioneer of automata theory
Father of modern computer science.
English mathematician.
Studied abstract machines called Turing machines even before computers existed.
Heard of the Turing test?

Languages & Grammars
Languages: “A language is a collection of sentences of finite length all constructed from a finite alphabet of symbols.”
Note: “sentences” can also be read as “words”.
Grammars: “A grammar can be regarded as a device that enumerates the sentences of a language”, nothing more, nothing less.
N. Chomsky, Information and Control, Vol 2, 1959
Image source: Nowak et al., Nature, vol 417, 2002

The Chomsky Hierarchy
A containment hierarchy of classes of formal languages, listed from the innermost class to the outermost:
Regular (DFA)
Context-free (PDA)
Context-sensitive (LBA)
Recursively enumerable (TM)

The Central Concepts of Automata Theory

Alphabet
An alphabet is a finite, non-empty set of symbols.
We use the symbol Σ (sigma) to denote an alphabet.
Examples:
Binary: Σ = {0,1}
All lower case letters: Σ = {a,b,c,..z}
Alphanumeric: Σ = {a-z, A-Z, 0-9}
DNA molecule letters: Σ = {a,c,g,t}
…

Strings
A string or word is a finite sequence of symbols chosen from Σ.
The empty string is ε (or “epsilon”).
The length of a string w, denoted by “|w|”, is equal to the number of (non-ε) characters in the string.
E.g., x = 010100, |x| = 6
      x = 01ε0ε1ε00, |x| = ?
xy = concatenation of two strings x and y

Languages

The Membership Problem

Languages
Let Σ be a set of characters. Σ is called the alphabet.
A language over Σ is a set of strings of characters drawn from Σ.
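The definitions of alphabet, string, length, concatenation and language membership can be made concrete with a small Python sketch. Everything named here (SIGMA, EPSILON, is_string_over, in_language) is hypothetical, and the example language, binary strings ending in 0, is an assumption chosen only for illustration.

```python
# Illustrative sketch only; all names are hypothetical, not from the slides.
SIGMA = {"0", "1"}   # the binary alphabet
EPSILON = ""         # the empty string, which has length 0


def is_string_over(w, alphabet):
    """Return True if every symbol of w is drawn from the alphabet."""
    return all(symbol in alphabet for symbol in w)


def in_language(w):
    """Membership test for an assumed example language:
    all binary strings that end in 0."""
    return is_string_over(w, SIGMA) and w.endswith("0")


x = "010100"
y = "11"
xy = x + y           # concatenation of x and y

print(len(x))                  # length |x|
print(x + EPSILON == x)        # concatenating epsilon changes nothing
print(in_language(x), in_language(xy))
```

A language is represented here by its membership predicate rather than by an explicit set, since most interesting languages contain infinitely many strings.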
Example of Languages
Alphabet = English characters; Language = English sentences
Alphabet = ASCII; Language = C++, Java, or C# programs

Notation
Languages are sets of strings (finite sequences of characters).
We need some notation for specifying which sets we want.

Regular Languages
Each regular expression is a notation for a regular language (a set of words).
If A is a regular expression, we write L(A) to refer to the language denoted by A.

Regular Expression
A regular expression (RE) is defined inductively:
a      an ordinary character from the alphabet Σ
ε      the empty string

Regular Expression
R|S    = either R or S
RS     = R followed by S (concatenation)
R*     = concatenation of R zero or more times (R* = ε|R|RR|RRR...)

RE Extensions
R?     = ε | R (zero or one R)
R+     = RR* (one or more R)
(R)    = R (grouping)

RE Extensions
[abc]  = a|b|c (any of listed)
[a-z]  = a|b|....|z (range)
[^ab]  = c|d|... (anything but ‘a’ or ‘b’)

Regular Expression
RE        Strings in L(R)
a         “a”
ab        “ab”
a|b       “a” “b”
(ab)*     “” “ab” “abab” ...
(a|ε)b    “ab” “b”

Example: integers
integer: a non-empty string of digits
digit = ‘0’|‘1’|‘2’|‘3’|‘4’|‘5’|‘6’|‘7’|‘8’|‘9’
integer = digit digit*

Example: identifiers
identifier: a string of letters or digits starting with a letter
C identifier: [a-zA-Z_][a-zA-Z0-9_]*

Recap
Language L(R): the set of strings represented by a regular expression R.
L(R) is the language denoted by regular expression R.

How to Use REs
We need a mechanism to determine whether an input string w belongs to L(R), the language denoted by regular expression R.

Acceptor
Such a mechanism is called an acceptor.
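As a quick way to experiment with the integer and identifier patterns, the sketch below uses Python's standard `re` module as a stand-in acceptor. The names INTEGER, C_IDENTIFIER, OPTIONAL_A_THEN_B and accepts are hypothetical. Note the assumption: Python's engine is a backtracking matcher with features beyond regular expressions, so this only illustrates the yes/no membership question, not how a finite automaton answers it.

```python
import re

# Hypothetical names; the patterns follow the definitions in the text.
DIGIT = r"[0-9]"
INTEGER = re.compile(DIGIT + DIGIT + "*")              # integer = digit digit*
C_IDENTIFIER = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")   # C identifier
OPTIONAL_A_THEN_B = re.compile(r"(a|)b")               # (a|epsilon)b


def accepts(pattern, w):
    """Acceptor: True if and only if the whole string w is in L(pattern)."""
    return pattern.fullmatch(w) is not None


print(accepts(INTEGER, "2024"))         # a non-empty string of digits
print(accepts(INTEGER, ""))             # the empty string is not an integer
print(accepts(C_IDENTIFIER, "_tmp1"))
print(accepts(C_IDENTIFIER, "1abc"))    # must not start with a digit
print(accepts(OPTIONAL_A_THEN_B, "b"))
```

`fullmatch` is used rather than `search` or `match` because membership in L(R) is a statement about the entire string, not about some substring or prefix of it.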
An acceptor takes an input string w and a language L, and answers:
yes, if w ∈ L
no, if w ∉ L

Finite Automata (FA)
Specification: regular expressions
Implementation: finite automata

Finite Automata
A finite automaton consists of:
An input alphabet (Σ)
A set of states
A start (initial) state
A set of transitions
A set of accepting (final) states

Finite Automaton State Graphs
Graph legend: a state; the start state; an accepting state.

Finite Automaton State Graphs
A transition: an edge labelled with an input character, e.g. a.

Finite Automata
A finite automaton accepts a string if we can follow transitions labelled with the characters of the string from the start state to some accepting state.

FA Example
An FA that accepts only “1”.
State graph: a single transition labelled 1 from the start state to an accepting state.

FA Example
An FA that accepts any number of 1’s followed by a single 0.
State graph: a self-loop labelled 1 on the start state, and a transition labelled 0 to an accepting state.

FA Example
An FA that accepts ab*a. Alphabet: {a,b}
State graph: 0 --a--> 1, 1 --b--> 1, 1 --a--> 2, with start state 0 and accepting state 2.

Table Encoding of FA
Transition table (columns are the current state, rows are the input character, err means no transition):

        0      1      2
a       1      2      err
b       err    1      err

RE → Finite Automata
Can we build a finite automaton for every regular expression?
Yes: build the FA inductively, based on the definition of regular expressions.

NFA
Nondeterministic Finite Automaton (NFA)
Can have multiple transitions for one input in a given state.
Can have ε-moves.

Epsilon Moves
ε-moves: the machine can move from state A to state B without consuming input.
State graph: A --ε--> B

NFA
The operation of the automaton is not completely defined by the input.
State graph: states A, B, C with transitions labelled 1, 0, 1.
On input “11”, the automaton could be in either state.

Execution of FA
An NFA can choose:
Whether to make ε-moves.
Which of multiple transitions to take for a single input.

Acceptance of NFA
An NFA can get into multiple states.
Rule: an NFA accepts if it can get into a final state.
State graph: states A, B, C with transitions labelled 1, 0, 1, 0.

DFA and NFA
Deterministic Finite Automata (DFA)
One transition per input per state.
No ε-moves.

Execution of FA
A DFA can take only one path through the state graph, completely determined by the input.

NFA vs DFA
NFAs and DFAs recognize the same set of languages (regular languages).
DFAs are easier to implement: they are table driven.
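To illustrate what "table driven" means, here is a minimal sketch of a DFA for ab*a that does nothing but look up the next state in a transition table. The names TRANSITIONS, START, ACCEPTING, ERR and dfa_accepts are hypothetical, and using None for the "err" entries is an assumption of this sketch.

```python
ERR = None  # stands in for the "err" entries of a transition table

# TRANSITIONS[state][symbol] -> next state, for a DFA that accepts ab*a.
TRANSITIONS = {
    0: {"a": 1, "b": ERR},
    1: {"a": 2, "b": 1},
    2: {"a": ERR, "b": ERR},
}
START = 0
ACCEPTING = {2}


def dfa_accepts(w):
    """Run the DFA on w: one table lookup per input character."""
    state = START
    for symbol in w:
        # Symbols outside the alphabet {a, b} are treated as errors too.
        state = TRANSITIONS[state].get(symbol, ERR)
        if state is ERR:
            return False
    return state in ACCEPTING


for w in ["aa", "aba", "abbba", "a", "ab", "abab"]:
    print(w, dfa_accepts(w))
```

Because a DFA has exactly one current state, the whole run is a single loop with constant work per character, which is why the table form is attractive for implementation.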
NFA vs DFA
For a given language, the NFA can be simpler than the DFA.
The DFA can be exponentially larger than the NFA.

NFA vs DFA
NFAs are the key to automating the RE → DFA construction.

RE → NFA Construction
Thompson’s construction (CACM 1968):
Build an NFA for each RE term.
Combine the NFAs with ε-moves.

RE → NFA Construction
Subset construction (NFA → DFA):
Build the simulation.
Minimize the number of states in the DFA (Hopcroft’s algorithm).

RE → NFA Construction
Key idea: an NFA pattern for each symbol and each operator.
Join them with ε-moves in precedence order.

RE → NFA Construction (concatenation)
NFA for a: s0 --a--> s1
NFA for b: s3 --b--> s4
NFA for ab: s0 --a--> s1 --ε--> s3 --b--> s4

RE → NFA Construction (alternation)
NFA for a: s1 --a--> s2
NFA for b: s3 --b--> s4
NFA for a | b: s0 --ε--> s1 --a--> s2 --ε--> s5 and s0 --ε--> s3 --b--> s4 --ε--> s5

RE → NFA Construction (closure)
NFA for a: s1 --a--> s2
NFA for a*: s0 --ε--> s1 --a--> s2 --ε--> s4, plus s0 --ε--> s4 (zero repetitions) and s2 --ε--> s1 (repeat)

Example RE → NFA
Building the NFA for a ( b|c )*, step by step:
NFA for a: s0 --a--> s1
NFAs for a, b and c: s0 --a--> s1; s4 --b--> s5; s6 --c--> s7
NFAs for a and b|c: s0 --a--> s1; s3 --ε--> s4 --b--> s5 --ε--> s8 and s3 --ε--> s6 --c--> s7 --ε--> s8
NFAs for a and ( b|c )*: add s2 and s9, with s2 --ε--> s3, s8 --ε--> s9, s2 --ε--> s9 and s8 --ε--> s3
NFA for a ( b|c )*: join the two with s1 --ε--> s2; the start state is s0 and the accepting state is s9
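The patterns for a single symbol, concatenation, alternation and closure can be sketched in Python as follows. This is a simplified illustration in the style of Thompson's construction, not a faithful reproduction of the 1968 paper: the class and method names are hypothetical, None is an assumed label for ε-moves, and states are numbered in creation order, so the numbering will not match the state graphs in the slides. The accepts method tracks the set of states the NFA could be in, which is the same idea the subset construction uses to build a DFA ahead of time.

```python
from collections import defaultdict
from itertools import count

EPS = None  # assumed label for epsilon-moves


class NFA:
    """Holds all states and edges; a fragment is a (start, accept) pair."""

    def __init__(self):
        self.edges = defaultdict(list)   # state -> [(label, target), ...]
        self._ids = count()

    def new_state(self):
        return next(self._ids)

    def symbol(self, a):
        """Pattern for a single character: start --a--> accept."""
        s, t = self.new_state(), self.new_state()
        self.edges[s].append((a, t))
        return s, t

    def concat(self, f, g):
        """Pattern for fg: join f's accept state to g's start state."""
        self.edges[f[1]].append((EPS, g[0]))
        return f[0], g[1]

    def union(self, f, g):
        """Pattern for f|g: new start and accept states around both."""
        s, t = self.new_state(), self.new_state()
        self.edges[s] += [(EPS, f[0]), (EPS, g[0])]
        self.edges[f[1]].append((EPS, t))
        self.edges[g[1]].append((EPS, t))
        return s, t

    def star(self, f):
        """Pattern for f*: allow skipping f and looping back into f."""
        s, t = self.new_state(), self.new_state()
        self.edges[s] += [(EPS, f[0]), (EPS, t)]
        self.edges[f[1]] += [(EPS, f[0]), (EPS, t)]
        return s, t

    def eps_closure(self, states):
        """All states reachable from `states` using only epsilon-moves."""
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for label, target in self.edges[q]:
                if label is EPS and target not in seen:
                    seen.add(target)
                    stack.append(target)
        return seen

    def accepts(self, fragment, w):
        """Simulate the NFA by tracking every state it could be in."""
        start, accept = fragment
        current = self.eps_closure({start})
        for ch in w:
            moved = {t for q in current
                     for label, t in self.edges[q] if label == ch}
            current = self.eps_closure(moved)
        return accept in current


# Build a ( b|c )* in precedence order: symbols, then |, then *, then concat.
nfa = NFA()
a, b, c = nfa.symbol("a"), nfa.symbol("b"), nfa.symbol("c")
fragment = nfa.concat(a, nfa.star(nfa.union(b, c)))

for w in ["a", "abcb", "", "ba", "abca"]:
    print(repr(w), nfa.accepts(fragment, w))
```

Each operator adds at most two states and a handful of ε-moves, so the NFA grows linearly with the size of the regular expression; the cost of nondeterminism is paid later, either during simulation or in the subset construction.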