LING/C SC/PSYC 438/538
Lecture 11
Sandiway Fong

Administrivia
• Homework 3 graded

Last Time
1. Introduced Regular Languages
   – can be generated by regular expressions
   – or Finite State Automata (FSA)
   – or regular grammars (not yet introduced)
2. Deterministic and non-deterministic FSA
3. DFSA can be easily encoded in Perl:
   – hash table for the transition function
   – foreach loop over a string (character by character)
   – conditional to check for end state
4. NDFSA can be converted into DFSA
   – example of the set-of-states construction
   – Practice: ungraded homework exercise

Ungraded Homework Exercise
• do not submit; do the following exercise to check your understanding
  – apply the set-of-states construction technique to the two machines on the ε-transition slide (repeated below)
  – self-check your answer:
    • verify in each case that the machine produced is deterministic and accurately simulates its ε-transition counterpart
  [Figure: two machines, each with arcs a and b and one ε-transition]

Ungraded Homework Exercise Review
• Converting a NDFSA into a DFSA (first machine)
  [Figure: NDFSA with states 1, 2, 3 (start state 1), arcs a and b, and one ε-transition]
  Note: this machine with an ε-transition is non-deterministic
  [Figure: DFSA with states {1,3} (start state), {2}, {3} and arcs a, b]
  Note: this machine is deterministic

Ungraded Homework Exercise Review
• Converting a NDFSA into a DFSA (second machine)
  [Figure: NDFSA with states 1, 2, 3 (start state 1), arcs a and b, and one ε-transition]
  Note: this machine with an ε-transition is non-deterministic
  [Figure: DFSA with states {1,2} (start state), {2}, {3} and arcs a, b, b]
  Note: this machine is deterministic

Last Time
Regular Languages
• Three formalisms
  – All formally equivalent (no difference in expressive power)
  – i.e. if you can encode it using a RE, you can do it using a FSA or regular grammar, and so on …
  [Figure: Regular Expressions, FSA and Regular Grammars all linked to Regular Languages; label outside that region: "Perl regular expressions stuff out here"]
  talk more about formal equivalence later today…

Perl Regular Expressions
• Perl regex can include backreferences to groupings (i.e. \1, etc.)
  – backreferences give Perl regexes expressive power beyond regular languages:
    • the set of prime numbers is not a regular language
      Lprime = {2, 3, 5, 7, 11, 13, 17, 19, 23, ..
} can be proved using the Pumping Lemma for regular languages (later)

Backreferences and FSA
• Deep question:
  – why are backreferences impossible in FSA?
• Example: suppose you wanted a machine that accepted /(a+b+)\1/
  – One idea: link two copies of the machine together
  – Doesn’t work! Why?
  [Figure: two linked copies of the a+b+ machine, with states s, x, y, x2, y2 and arcs labeled a and b]
• Perl implementation:
  – how to modify it to get the backreference effect?

Regular Languages and FSA
• Formal (constructive) set-theoretic definition of a regular language
• Correspondence between REs and Regular Languages
  • concatenation (juxtaposition)
  • union (| also [ ])
  • Kleene closure (*) (note: x+ = xx*)
• Note:
  • backreferences are memory devices and thus are too powerful
  • e.g. L = {ww} and prime number testing (earlier slides)

Regular Languages and FSA
• Other closure properties:
  [Figure: list of closure properties]
• Not true higher up: e.g. context-free grammars as we’ll see later

Equivalence: FSA and Regexes
Textbook gives one direction only
• Case by case:
  a) Empty string
  b) Empty set
  c) Any character from the alphabet

Equivalence: FSA and Regexes
• Concatenation:
  – Link final state of FSA1 to initial state of FSA2 using an empty transition
  Note: empty transition can be eliminated using the set-of-states construction (see earlier slides in this lecture)

Equivalence: FSA and Regexes
• Kleene closure:
  – repetition operator: zero or more times
  – use empty transitions for loopback and bypass

Equivalence: FSA and Regexes
• Union: aka disjunction
  – Non-deterministically run both FSAs at the same time, accept if either one accepts

Regular Languages and FSA
• Other closure properties:
  Let’s consider building the FSA machinery for each of these guys in turn…

Regular Expressions from FSA
Textbook Exercise: find a RE for [language description on slide]
Examples (* denotes string not in the language):
*ab  *ba  bab  λ
(empty string)  bb  *baba  babab

Regular Expressions from FSA
• Draw a FSA and convert it to a RE:
  [Figure: FSA with states 1, 2, 3, 4 (start state 1) and arcs labeled a, b, ε; PowerPoint animation of the conversion, with intermediate labels b*, b, (ab+)+]
  Result: b+(ab+)* | ε

Regular Expressions from FSA
• Perl implementation:
    my $s = "ab ba bab bb baba babab";
    # /g makes the while loop step through successive matches in $s
    while ($s =~ /\b(b+(ab+)*)\b/g) {
        print "<$1> match!\n";
    }
  Note: doesn’t include the empty string case
• Output:
    perl test.perl
    <bab> match!
    <bb> match!
    <babab> match!
  Note: /../g global flag for multiple matches
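
As a cross-check, the regex above can be compared against a hand-coded DFSA in the style recalled under Last Time (hash table for the transition function, foreach loop over the characters, conditional for the end state). This is only a sketch and is not taken from the slides: the three-state machine, the state numbers 0 to 2, and the names %delta, %final and accepts are assumptions chosen for illustration. Unlike the \b version, the anchored regex used here also covers the empty string.

```perl
use strict;
use warnings;

# Hypothetical DFSA for b+(ab+)* | ε (state numbering is an assumption):
#   0 = start (nothing read yet), 1 = last character was b, 2 = last character was a
# Transition function as a hash of hashes: $delta{state}{char} = next state.
my %delta = (
    0 => { b => 1 },
    1 => { b => 1, a => 2 },
    2 => { b => 1 },
);

# End states: 0 accepts the empty string, 1 accepts strings ending in b.
my %final = ( 0 => 1, 1 => 1 );

# Returns 1 if the DFSA accepts $string, 0 otherwise.
sub accepts {
    my ($string) = @_;
    my $state = 0;
    foreach my $c (split //, $string) {
        my $next = $delta{$state}{$c};
        return 0 unless defined $next;    # no transition: reject
        $state = $next;
    }
    return $final{$state} ? 1 : 0;        # conditional to check for end state
}

# Compare the DFSA with an anchored version of the regex on the slide examples.
foreach my $w ("", qw(ab ba bab bb baba babab)) {
    my $by_fsa   = accepts($w);
    my $by_regex = ($w =~ /\A(?:b+(?:ab+)*)?\z/) ? 1 : 0;
    printf "%-7s fsa=%d regex=%d\n", "<$w>", $by_fsa, $by_regex;
}
```

The two columns should agree on every example string; a disagreement would point to a mistake in either the machine or the RE. Missing entries in %delta act as an implicit dead state, which keeps the table small.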