Chapter V: Compiler

Overview
To study the design and operation of compilers for high-level programming languages.

Contents
Basic compiler (one-pass compiler) functions
Machine-dependent extensions (object-code generation and code optimization)
Compiler design alternatives: multi-pass compilers, interpreters, P-code compilers, and compiler-compilers

Basic compiler functions
Example [figure]

Basic compiler functions (cont.)
Source program: regard each statement as a sequence of tokens.
Lexical analysis (scanner): the task of scanning the source statement, recognizing and classifying the various tokens.
Syntactic analysis, or parsing (parser): recognizing each sequence of tokens as some language construct described by the grammar.
Code generation: generation of the object code.

Compilation process
Scanning (lexical analysis)
Parsing (syntactic analysis)
Code generation
Note: all three steps can be accomplished in a single pass.

Grammars
A grammar for a programming language is a formal description of the syntax of programs and individual statements written in the language.
The difference between syntax and semantics, e.g.,
    I := J + K
    X := Y + I
where X, Y : Real and I, J, K : Integer.
The two statements have identical syntax. However, their semantics are quite different: the first specifies an integer addition, while the second requires I to be converted to a real value before a floating-point addition.

Grammars (cont.)
BNF (Backus-Naur Form)
A notation for describing syntax. It is simple and widely used, and it provides capabilities that are sufficient for most purposes.
A BNF grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language, e.g.,
    <read> ::= READ ( <id-list> )

Grammars (cont.)
    <read> ::= READ ( <id-list> )
    <id-list> ::= id | <id-list> , id
Character strings enclosed between < and > are called nonterminal symbols.
Character strings not enclosed between < and > are called terminal symbols (i.e., tokens).
E.g., READ(value, sum, x, y)

Simplified Pascal grammar [figure, continued over several slides]
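The two BNF rules for <read> and <id-list> given earlier can be turned into a small recognizer. The following is only a minimal sketch, not code from the slides: the names tokenize, is_id, parse_id_list and is_read_statement are hypothetical, identifiers are assumed to be a letter followed by letters or digits, and READ is assumed to be reserved. The left-recursive rule for <id-list> is rewritten as a loop, which accepts the same strings.

```python
import re

# Assumed token shapes: an identifier/keyword, or one of the marks ( ) ,
TOKEN_RE = re.compile(r"\s*([A-Za-z][A-Za-z0-9]*|[(),])")


def tokenize(text):
    """Split a statement into tokens; raise ValueError on an unexpected character."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_id(token):
    """Terminal 'id': starts with a letter and is not the keyword READ (assumption)."""
    return token[0].isalpha() and token.upper() != "READ"


def parse_id_list(tokens, i):
    """<id-list> ::= id | <id-list> , id

    Returns the index just past the id-list, or None if no id-list starts at i.
    """
    if i >= len(tokens) or not is_id(tokens[i]):
        return None
    i += 1
    # Each further ", id" pair extends the list.
    while i + 1 < len(tokens) and tokens[i] == "," and is_id(tokens[i + 1]):
        i += 2
    return i


def is_read_statement(text):
    """<read> ::= READ ( <id-list> )"""
    try:
        tokens = tokenize(text)
    except ValueError:
        return False
    if tokens[:2] != ["READ", "("]:
        return False
    end = parse_id_list(tokens, 2)
    return end is not None and tokens[end:] == [")"]


print(is_read_statement("READ(value, sum, x, y)"))
print(is_read_statement("READ(value,)"))
```

The first call corresponds to the example statement READ(value, sum, x, y); the second has a trailing comma, which neither alternative of <id-list> allows.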
The analysis of a source statement in terms of a grammar can be displayed as a tree (a parse tree or syntax tree).

The parse tree for VARIANCE := SUMSQ DIV 100 - MEAN * MEAN [figure]

Grammars (cont.)
Exercise: draw the parse tree for ALPHA - BETA * GAMMA.
If there is more than one possible parse tree for a given statement, the grammar is said to be ambiguous.
An ambiguous grammar would leave doubt about what object code should be generated.

Lexical analysis (scanning)
Scanning the program to be compiled and recognizing the tokens that make up the source statements.
Scanners are usually designed to recognize keywords, operators, identifiers, integers, floating-point numbers, character strings, etc.
An identifier might be defined by the rules:
    <ident> ::= <letter> | <ident> <letter> | <ident> <digit>
    <letter> ::= A | B | C | D | … | Z
    <digit> ::= 0 | 1 | 2 | 3 | … | 9

Token coding scheme [figure]

Lexical scan [figure]

The lexical scanning
The scanner must deal with cases such as the following.
Blanks (FORTRAN ignores blanks in a statement, so the first line is a DO loop while the second is an assignment to the variable DO10I):
    DO 10 I = 1, 100
    DO 10 I = 1
Keywords used as identifiers (FORTRAN keywords are not reserved words):
    IF (THEN .EQ. ELSE) THEN
        IF = THEN
    ELSE
        THEN = IF
    ENDIF
A number of tools have been developed for automatically constructing lexical scanners from specifications stated in a special-purpose language.

Modeling Scanners as Finite Automata
The tokens of most programming languages can be recognized by a finite automaton.
Starting state vs. final state.
If the automaton stops in a final state, we say that it recognizes (or accepts) the string being scanned; otherwise, it fails to recognize the string.
[figures, continued over several slides]

The implementation of finite automata
Using algorithmic code (for Fig. 5.8 (b)) [figure]

The implementation of finite automata (cont.)
Using tabular representation [figure]
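The two implementation styles named above can be sketched for the <ident> rule given earlier (a letter followed by any number of letters or digits). This is an illustrative sketch and not a reproduction of Fig. 5.8 (b): the state numbers, the names char_class, TABLE, recognizes_table and recognizes_code, and the restriction to upper-case letters are all assumptions. For simplicity, the automaton must consume the whole string; a real scanner would instead stop at the first character that cannot extend the token.

```python
def char_class(ch):
    """Map a character to the input class used by the automaton."""
    if "A" <= ch <= "Z":
        return "letter"
    if "0" <= ch <= "9":
        return "digit"
    return "other"


# Tabular representation: TABLE[state][input class] -> next state.
# A missing entry means the automaton has no move and stops.
START, IN_IDENT = 1, 2
TABLE = {
    START: {"letter": IN_IDENT},
    IN_IDENT: {"letter": IN_IDENT, "digit": IN_IDENT},
}
FINAL_STATES = {IN_IDENT}


def recognizes_table(string):
    """Table-driven version: follow transitions, accept if we end in a final state."""
    state = START
    for ch in string:
        state = TABLE[state].get(char_class(ch))
        if state is None:  # no transition defined: stop and reject
            return False
    return state in FINAL_STATES


def recognizes_code(string):
    """Algorithmic version: the same automaton written directly as tests."""
    if not string or char_class(string[0]) != "letter":
        return False
    return all(char_class(ch) in ("letter", "digit") for ch in string[1:])


for candidate in ["SUMSQ", "X1", "1X", "A-B", ""]:
    print(repr(candidate), recognizes_table(candidate), recognizes_code(candidate))
```

The table-driven form keeps the recognizer loop generic, so recognizing a different kind of token only requires a different table. The algorithmic form is shorter for a small automaton but has to be rewritten for each one.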