Course Overview

PART I: overview material
1. Introduction
2. Language processors (tombstone diagrams, bootstrapping)
3. Architecture of a compiler

Supplementary material: Theoretical foundations (Finite-state machines)

PART II: inside a compiler
4. Syntax analysis
5. Contextual analysis
6. Runtime organization
7. Code generation

PART III: conclusion
8. Interpretation
9. Review

Finite State Machines (aka Finite Automata)
• An FSM is similar to a compiler in that:
  – A compiler recognizes legal programs in some (source) language.
  – A finite-state machine recognizes legal strings in some language.
• Example: Pascal identifiers
  – sequences of one or more letters or digits, starting with a letter.
  [Figure: start state S goes to accepting state A on a letter; A loops on a letter or a digit.]

Finite State Machines viewed as Graphs
• A state
• The start state
• An accepting state
• A transition (here labeled a)
[Figure: graphical symbol for each item omitted.]

Finite State Machines
• Transition: s1 --a--> s2
• Is read: in state s1, on input "a", go to state s2.
• If end of input:
  – If in an accepting state => accept
  – Otherwise => reject
• If no transition is possible (the machine got stuck) => reject

Language defined by an FSM
• The language defined by an FSM is the set of strings accepted by the FSM.
  – In the language of the identifier FSM above: x, mp2, XyZzy, position27.
  – Not in the language of the identifier FSM above: 123, a?, 13apples.
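The identifier FSM can be simulated directly in code. The Python sketch below is one possible rendering, assuming ASCII letters and digits and a hypothetical function name is_pascal_identifier; each branch corresponds to one edge of the graph, and the final else models getting stuck.

```python
import string

# Character classes for the identifier FSM (assumption: ASCII letters and digits only).
LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)


def is_pascal_identifier(text: str) -> bool:
    """Simulate the two-state identifier FSM: S is the start state, A is accepting."""
    state = "S"
    for ch in text:
        if state == "S" and ch in LETTERS:
            state = "A"  # S --letter--> A: the first character must be a letter
        elif state == "A" and (ch in LETTERS or ch in DIGITS):
            state = "A"  # A --letter|digit--> A: any mix of letters and digits follows
        else:
            return False  # no transition possible: the machine is stuck, so reject
    return state == "A"  # end of input: accept only if the machine stopped in A


for s in ["x", "mp2", "XyZzy", "position27", "123", "a?", "13apples"]:
    print(f"{s!r}: {is_pascal_identifier(s)}")
```

The first four sample strings are accepted and the last three rejected, matching the slide's lists; the empty string is also rejected, since the machine ends in S.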
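Example: Integer Literals
• FSM that accepts integer literals with an optional + or - sign:
  [Figure: states S (start), A, and B (accepting); S goes to A on + or -, S goes to B on a digit, A goes to B on a digit, and B loops on a digit.]

Formal Definition
• Each finite state machine is a 5-tuple $(\Sigma, Q, \delta, q, F)$ that consists of:
  – an input alphabet $\Sigma$
  – a set of states $Q$
  – a start state $q \in Q$
  – a set of accepting states (or final states) $F \subseteq Q$
  – a state transition function $\delta : Q \times \Sigma \to Q$ that encodes transitions $state_i \xrightarrow{input} state_j$

State-Transition Function
For the integer-literal example:
  $\delta(S, +) = A$
  $\delta(S, -) = A$
  $\delta(S, digit) = B$
  $\delta(A, digit) = B$
  $\delta(B, digit) = B$
Pairs not listed have no transition: on such input the machine gets stuck and rejects.

FSM Examples
• [Figure: states A (start) and B (accepting); A loops on 0 and goes to B on 1; B loops on 1 and returns to A on 0.]
  Accepts strings over the alphabet {0,1} that end in 1.
• [Figure: five-state DFSM over {a,b} with states numbered 1 to 5.]
  Accepts strings over the alphabet {a,b} that begin and end with the same symbol.
• [Figure: states 0, 1, and 2 record the running digit sum modulo 3; 0 is the start state.]
  Accepts strings over {0,1,2} such that the sum of the digits is a multiple of 3.
• [Figure: states Even (start) and Odd (accepting); 0 keeps the current state, 1 switches between Even and Odd.]
  Accepts strings over {0,1} that have an odd number of ones.
• [Figure: states start, '0', '00', and '001' (accepting), each recording how much of 001 has been matched so far; '001' loops on 0,1.]
  Accepts strings over {0,1} that contain the substring 001.

Examples
• Design an FSM to recognize strings with an equal number of ones and zeros.
  – Not possible: the machine would have to remember an unbounded difference between the two counts, but it has only finitely many states.
• Design an FSM to recognize strings with an equal number of substrings "01" and "10".
  – Perhaps surprisingly, this is possible: occurrences of "01" and "10" must alternate, so the two counts are equal exactly when the string is empty or begins and ends with the same symbol.

FSM Examples
• [Figure: DFSM over {0,1}.]
  Accepts strings with an equal number of substrings "01" and "10".

TEST YOURSELF
• Question 1: Draw a finite-state machine that accepts Java identifiers: one or more letters, digits, or underscores, starting with a letter or an underscore.
• Question 2: Draw a finite-state machine that accepts only Java identifiers that do not end with an underscore.

TEST YOURSELF
• Question 3: What strings does this FSM accept? Describe the set of accepted strings in English.
  [Figure: four-state FSM over {0,1} with states q0, q1, q2, q3.]

Two Kinds of Finite State Machines
• Deterministic (DFSM):
  – No state has more than one outgoing edge with the same label. [All previous FSMs were DFSMs.]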
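To make the 5-tuple concrete, here is a minimal Python sketch. It assumes a dictionary keyed by (state, symbol) pairs for δ, and a hypothetical helper char_class that maps each digit character to the symbol digit. The integer-literal machine then becomes plain data run by one generic loop, with a missing dictionary key playing the role of an undefined (stuck) transition.

```python
from dataclasses import dataclass

DIGITS = "0123456789"


def char_class(ch: str) -> str:
    """Hypothetical helper: map a raw character to the symbol used on FSM edges."""
    return "digit" if ch in DIGITS else ch


@dataclass
class DFSM:
    """The 5-tuple (Sigma, Q, delta, q, F); delta may be partial (missing entry = stuck)."""
    sigma: frozenset
    states: frozenset
    delta: dict  # maps (state, symbol) -> state
    start: str
    accepting: frozenset

    def __post_init__(self):
        # Every transition must go between states of Q on a symbol of Sigma.
        for (src, sym), dst in self.delta.items():
            if src not in self.states or dst not in self.states or sym not in self.sigma:
                raise ValueError(f"bad transition {(src, sym)} -> {dst}")

    def accepts(self, text: str) -> bool:
        state = self.start
        for ch in text:
            key = (state, char_class(ch))
            if key not in self.delta:
                return False  # no transition for (state, symbol): stuck, so reject
            state = self.delta[key]
        return state in self.accepting  # end of input: accept only in a final state


INTEGER_LITERAL = DFSM(
    sigma=frozenset({"+", "-", "digit"}),
    states=frozenset({"S", "A", "B"}),
    delta={
        ("S", "+"): "A",
        ("S", "-"): "A",
        ("S", "digit"): "B",
        ("A", "digit"): "B",
        ("B", "digit"): "B",
    },
    start="S",
    accepting=frozenset({"B"}),
)

for s in ["752", "+752", "-0", "+", "", "7-5", "12a"]:
    print(f"{s!r}: {INTEGER_LITERAL.accepts(s)}")
```

Keeping δ as data rather than branching code means the same accepts loop runs any DFSM; only the table changes.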
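The claim that "equal numbers of 01 and 10" is recognizable can be checked by brute force. The Python sketch below uses a hypothetical five-state DFSM in which each non-start state remembers the first and the most recent symbol (not necessarily the machine drawn on the slide), and compares it with direct counting on every string over {0,1} up to a chosen length.

```python
from itertools import product


def count_pairs(s: str, pair: str) -> int:
    """Count occurrences of a two-symbol substring at every position."""
    return sum(1 for i in range(len(s) - 1) if s[i:i + 2] == pair)


# Hypothetical DFSM: each non-start state records (first symbol, last symbol).
DELTA = {
    "start": {"0": "first0_last0", "1": "first1_last1"},
    "first0_last0": {"0": "first0_last0", "1": "first0_last1"},
    "first0_last1": {"0": "first0_last0", "1": "first0_last1"},
    "first1_last1": {"0": "first1_last0", "1": "first1_last1"},
    "first1_last0": {"0": "first1_last0", "1": "first1_last1"},
}
# Accept the empty string and strings whose first and last symbols agree.
ACCEPTING = {"start", "first0_last0", "first1_last1"}


def dfsm_accepts(s: str) -> bool:
    state = "start"
    for ch in s:
        state = DELTA[state][ch]
    return state in ACCEPTING


def check_up_to(max_len: int) -> bool:
    """Return True if the DFSM agrees with direct counting on all strings up to max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            s = "".join(bits)
            expected = count_pairs(s, "01") == count_pairs(s, "10")
            if dfsm_accepts(s) != expected:
                return False
    return True


print(check_up_to(12))  # True
```

A finite check is evidence, not a proof. The proof is the alternation argument: every change of symbol is a "01" or a "10", and the two kinds must alternate.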
• Non-deterministic (NFSM):
  – States may have more than one outgoing edge with the same label.
  – Edges may be labeled with ε (epsilon), the empty string. [Note that some books use a different symbol, commonly λ.]
  – The automaton can make an ε-transition without consuming the current input character.

Example of NFSM
• Integer-literal example:
  [Figure: NFSM version of the integer-literal machine, with states S, A, and B.]

Example of NFSM
• [Figure: states start, '0', '00', and '001' (accepting); the start state loops on 0,1 and also moves to '0' on 0; '0' goes to '00' on 0; '00' goes to '001' on 1; '001' loops on 0,1.]
  Accepts strings over {0,1} that contain the substring 001.

Non-deterministic Finite State Machines (NFSM)
• are sometimes simpler than DFSMs
• can be in multiple states at the same time
• An NFSM accepts a string if there exists a sequence of moves
  – starting in the start state,
  – ending in a final state,
  – that consumes the entire string.
• Examples:
  – Consider the integer-literal NFSM on input "+752".
  – Consider the second NFSM on input "10110001".

Equivalence of DFSM and NFSM
• Theorem:
  – For each non-deterministic finite state machine N, we can construct a deterministic finite state machine D such that N and D accept the same language.
  – [proof omitted]
• Theorem:
  – Every deterministic finite state machine can be regarded as a non-deterministic finite state machine that just doesn't use the extra non-deterministic capabilities.

How to Implement an FSM
A table-driven approach:
• Table T:
  – one row for each state in the machine, and
  – one column for each possible character.
• T[j][k]:
  – the state to go to from state j on input character k;
  – an empty entry corresponds to the machine getting stuck.

The table-driven program for a DFSM
    state = S                              // S is the start state
    repeat {
        k = next character from the input
        if (k == EOF) then                 // end of input
            if (state is a final state) then accept
            else reject
        state = T[state][k]
        if (state == empty) then reject    // got stuck
    }
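The table-driven program can be made runnable almost line for line. The Python sketch below assumes ASCII input, so the table has 128 columns indexed by character code. It uses None for an empty entry and hypothetical state names for the "contains 001" DFSM from the examples above.

```python
# Table-driven DFSM for strings over {0,1} that contain the substring 001.
# Rows are states, columns are character codes (assumption: ASCII input only).
START, SEEN_0, SEEN_00, SEEN_001 = 0, 1, 2, 3
NUM_STATES, NUM_CHARS = 4, 128
EMPTY = None  # an empty table entry means the machine is stuck

T = [[EMPTY] * NUM_CHARS for _ in range(NUM_STATES)]
for state, on_0, on_1 in [
    (START, SEEN_0, START),
    (SEEN_0, SEEN_00, START),
    (SEEN_00, SEEN_00, SEEN_001),
    (SEEN_001, SEEN_001, SEEN_001),
]:
    T[state][ord("0")] = on_0
    T[state][ord("1")] = on_1

FINAL_STATES = {SEEN_001}


def run_table_driven(text: str) -> bool:
    """Follow the pseudocode: read characters until EOF or until the machine gets stuck."""
    state = START  # START is the start state
    chars = iter(text)
    while True:
        k = next(chars, None)  # None stands in for EOF
        if k is None:
            return state in FINAL_STATES  # end of input: accept iff in a final state
        if ord(k) >= NUM_CHARS:
            return False  # character outside the table's columns: treat as stuck
        state = T[state][ord(k)]
        if state is EMPTY:
            return False  # got stuck


for s in ["001", "10110001", "0101", "", "0a1"]:
    print(f"{s!r}: {run_table_driven(s)}")
```

Every step costs one array lookup. The trade-off is table size, here 4 rows by 128 columns, even though only the 0 and 1 columns are filled.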
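"Can be in multiple states at the same time" suggests a direct way to run an NFSM: keep the set of all states it could be in, and follow ε-edges after every step. This set-of-states idea is also the basis of the usual (omitted) DFSM construction. The Python sketch below encodes δ as a dictionary from (state, symbol) to a set of states, with "" as a hypothetical ε label. It makes two assumptions: that the integer-literal NFSM uses an ε-edge from S to A to make the sign optional, and that the "contains 001" NFSM has the shape described in its figure placeholder.

```python
EPS = ""  # hypothetical label for epsilon edges in this encoding


def epsilon_closure(states, delta):
    """All states reachable from `states` using only epsilon edges."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def nfsm_accepts(text, delta, start, accepting, symbol=lambda ch: ch):
    """Track every state the NFSM could be in; accept if some run ends in a final state."""
    current = epsilon_closure({start}, delta)
    for ch in text:
        step = set()
        for s in current:
            step |= delta.get((s, symbol(ch)), set())
        current = epsilon_closure(step, delta)
        if not current:
            return False  # every possible run got stuck
    return bool(current & accepting)


# "Contains 001": the start state loops on 0,1 and also guesses where 001 begins.
SUBSTRING_001 = {
    ("start", "0"): {"start", "seen0"},
    ("start", "1"): {"start"},
    ("seen0", "0"): {"seen00"},
    ("seen00", "1"): {"seen001"},
    ("seen001", "0"): {"seen001"},
    ("seen001", "1"): {"seen001"},
}

# Integer literals, assuming an epsilon edge S -> A makes the sign optional.
INTEGER_LITERAL_NFSM = {
    ("S", "+"): {"A"},
    ("S", "-"): {"A"},
    ("S", EPS): {"A"},
    ("A", "digit"): {"B"},
    ("B", "digit"): {"B"},
}


def as_digit(ch):
    """Map each decimal digit character to the edge symbol 'digit'."""
    return "digit" if ch in "0123456789" else ch


print(nfsm_accepts("10110001", SUBSTRING_001, "start", {"seen001"}))
print(nfsm_accepts("+752", INTEGER_LITERAL_NFSM, "S", {"B"}, as_digit))
```

Both calls print True. On "10110001" the set of possible states grows to three states after "1011000", and the final 1 moves the run in seen00 into seen001.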