Basic compiler

Chapter V: Compiler
Overview:
 To study the design and operation of compilers for high-level programming languages.
 Contents
 Basic compiler functions (one-pass compiler)
 Machine-dependent extensions: object-code generation and code optimization
 Compiler design alternatives: multi-pass compilers, interpreters, P-code compilers, and compiler-compilers.
Compiler
Basic compiler functions
 Example
Basic compiler functions (cont.)
 Source program
 Regard each statement as a sequence of tokens.
 Scanning the source statement and recognizing and classifying its tokens is known as lexical analysis; the part of the compiler that does this is the scanner.
 Recognize each sequence of tokens as a language construct described by the grammar.
 This process is called syntactic analysis or parsing; the part of the compiler that does this is the parser.
 Generate the object code.
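A minimal sketch of the scanning step, assuming a small Pascal-like token set. The token class names, the keyword list, and the function name scan are illustrative choices, not taken from the slides.

```python
import re

# Illustrative token classes; the names are assumptions for this sketch.
TOKEN_SPEC = [
    ("INT",    r"\d+"),
    ("ID",     r"[A-Za-z][A-Za-z0-9]*"),
    ("ASSIGN", r":="),
    ("OP",     r"[+\-*]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("COMMA",  r","),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

# A small, assumed keyword list; a real scanner would use the language's own.
KEYWORDS = {"READ", "WRITE", "DIV", "BEGIN", "END"}

def scan(statement):
    """Return a list of (token_class, text) pairs for one source statement."""
    tokens = []
    pos = 0
    while pos < len(statement):
        m = MASTER.match(statement, pos)
        if m is None:
            raise ValueError(f"unexpected character {statement[pos]!r} at {pos}")
        kind, text = m.lastgroup, m.group()
        pos = m.end()
        if kind == "SKIP":
            continue                      # blanks separate tokens but are not tokens
        if kind == "ID" and text.upper() in KEYWORDS:
            kind = "KEYWORD"              # classify reserved words separately
        tokens.append((kind, text))
    return tokens

print(scan("SUM := SUM + VALUE"))
```

The parser then works on this list of classified tokens rather than on raw characters.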
Compilation process
 Scanning (lexical analysis)
 Parsing (syntactic analysis)
 Code generation
 Note: all three steps can be carried out in a single pass.
Grammars
 A grammar for a programming language is a formal description of the syntax of programs and individual statements written in the language.
 Syntax differs from semantics.
 E.g.,
I := J + K
X := Y + I
where X, Y : Real
I, J, K : Integer
The two statements have identical syntax.
However, their semantics are quite different: the first is an integer addition, while the second requires converting I to real before a floating-point addition.
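The difference can be sketched in code. Both statements have the shape id := id + id, but the operations a compiler would select depend on the declared types. The operation labels below (ADD_INT, ADD_REAL, INT_TO_REAL, STORE) and the function plan_addition are hypothetical names for illustration, not real machine instructions.

```python
# Declared types from the example: X, Y are Real; I, J, K are Integer.
TYPES = {"X": "real", "Y": "real", "I": "integer", "J": "integer", "K": "integer"}

def plan_addition(target, left, right, types=TYPES):
    """List the (hypothetical) operations for `target := left + right`."""
    steps = []
    if "real" in (types[left], types[right]):
        # At least one real operand: convert integer operands, then add as reals.
        for name in (left, right):
            if types[name] == "integer":
                steps.append(f"INT_TO_REAL {name}")
        steps.append("ADD_REAL")
        result_type = "real"
    else:
        steps.append("ADD_INT")
        result_type = "integer"
    if result_type == "integer" and types[target] == "real":
        steps.append("INT_TO_REAL result")
    elif result_type == "real" and types[target] == "integer":
        # Assumption: as in Pascal, a real value may not be assigned to an integer.
        raise TypeError(f"cannot assign a real value to integer {target}")
    steps.append(f"STORE {target}")
    return steps

print(plan_addition("I", "J", "K"))   # same syntax ...
print(plan_addition("X", "Y", "I"))   # ... different operations
```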
Grammars (cont.)
 BNF (Backus-Naur Form)
 A notation for describing syntax.
 Simple.
 Widely used.
 It provides capabilities that are sufficient for most purposes.
 BNF consists of a set of rules, each of which defines the syntax of some construct in the programming language.

E.g.,
<read> ::= READ ( <id-list> )
Grammars (cont.)
<read> ::= READ ( <id-list>)
<id-list> ::= id | <id-list> , id

 Character strings enclosed between < and > are called nonterminal symbols.
 Character strings not enclosed between < and > are called terminal symbols (i.e., tokens).


E.g.,
READ(value, sum, x, y)
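A minimal sketch of a recognizer for the two rules above. The left-recursive rule for <id-list> is handled with a loop, which accepts the same strings. The function name parse_read and the identifier pattern (a letter followed by letters or digits) are assumptions for this sketch.

```python
import re

ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9]*")   # assumed form of the token id

def parse_read(statement):
    """Return the identifiers of a READ statement, or raise SyntaxError."""
    tokens = re.findall(r"[A-Za-z][A-Za-z0-9]*|\S", statement)
    pos = 0

    def expect(pred, what):
        nonlocal pos
        if pos >= len(tokens) or not pred(tokens[pos]):
            found = tokens[pos] if pos < len(tokens) else "end of input"
            raise SyntaxError(f"expected {what}, found {found}")
        pos += 1
        return tokens[pos - 1]

    def is_id(tok):
        return ID_PATTERN.fullmatch(tok) is not None and tok.upper() != "READ"

    # <read> ::= READ ( <id-list> )
    expect(lambda t: t.upper() == "READ", "READ")
    expect(lambda t: t == "(", "'('")
    ids = [expect(is_id, "id")]                    # <id-list> ::= id
    while pos < len(tokens) and tokens[pos] == ",":
        pos += 1                                   # <id-list> ::= <id-list> , id
        ids.append(expect(is_id, "id"))
    expect(lambda t: t == ")", "')'")
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]}")
    return ids

print(parse_read("READ(value, sum, x, y)"))
```

A statement such as READ(value,) is rejected because the rule requires an id after each comma.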
Simplified Pascal grammar
Simplified Pascal grammar (cont.)
 The analysis of a source statement in terms of a grammar can be displayed as a tree (parse tree or syntax tree).
The parse tree for
VARIANCE := SUMSQ DIV 100 - MEAN * MEAN
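The tree figure is not reproduced here. As a rough stand-in, the sketch below builds a nested-tuple tree for this statement with a small recursive-descent parser. It assumes the usual expression rules (an expression is terms joined by + or -, a term is factors joined by * or DIV, a factor is an id, an integer, or a parenthesized expression), which may differ in detail from the simplified Pascal grammar on the slides. The minus sign is written as an ASCII hyphen, and the function name parse_assignment is hypothetical.

```python
import re

def parse_assignment(statement):
    """Parse `id := <exp>` into nested tuples of the form (operator, left, right)."""
    tokens = re.findall(r":=|[A-Za-z][A-Za-z0-9]*|\d+|\S", statement)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of statement")
        pos += 1
        return tokens[pos - 1]

    def factor():                       # id | int | ( <exp> )
        tok = peek()
        if tok == "(":
            advance()
            node = exp()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is None or not tok[0].isalnum() or tok == "DIV":
            raise SyntaxError(f"unexpected token {tok}")
        return advance()

    def term():                         # factors joined by * or DIV
        node = factor()
        while peek() in ("*", "DIV"):
            op = advance()
            node = (op, node, factor())
        return node

    def exp():                          # terms joined by + or -
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    target = advance()
    if advance() != ":=":
        raise SyntaxError("expected ':='")
    tree = (":=", target, exp())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()}")
    return tree

print(parse_assignment("VARIANCE := SUMSQ DIV 100 - MEAN * MEAN"))
```

Under these assumptions DIV and * bind more tightly than -, so the subtraction sits at the top of the expression subtree.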
Grammars (cont.)
 Draw the parse tree for
 ALPHA - BETA * GAMMA
 If there is more than one possible parse tree for a given statement, the grammar is said to be ambiguous.
 An ambiguous grammar leaves doubt about what object code should be generated.
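To see why ambiguity matters, the sketch below evaluates two trees that an ambiguous grammar could assign to the same expression. The tuple representation, the function evaluate, and the sample values are assumptions chosen for illustration.

```python
import operator

OPS = {"-": operator.sub, "*": operator.mul}

def evaluate(tree, env):
    """Evaluate a tree given as a variable name or an (op, left, right) tuple."""
    if isinstance(tree, str):
        return env[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, env), evaluate(right, env))

# Two trees an ambiguous grammar could produce for ALPHA - BETA * GAMMA.
tree_a = ("-", "ALPHA", ("*", "BETA", "GAMMA"))   # multiplication first
tree_b = ("*", ("-", "ALPHA", "BETA"), "GAMMA")   # subtraction first

env = {"ALPHA": 10, "BETA": 2, "GAMMA": 3}        # arbitrary sample values
print(evaluate(tree_a, env), evaluate(tree_b, env))   # prints "4 24"
```

The two trees give different results, so the code generated from them would differ as well.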
Lexical analysis (scanning)
 Scanning the program to be compiled and recognizing the tokens that make up the source statements.
 Scanners are usually designed to recognize keywords, operators, identifiers, integers, floating-point numbers, character strings, etc.
 An identifier might be defined by the rules:
<ident> ::= <letter> | <ident> <letter> | <ident> <digit>
 <letter> ::= A | B | C | D | … | Z
 <digit> ::= 0 | 1 | 2 | 3 | … | 9
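The three rules say that an identifier is a letter followed by any number of letters or digits. A minimal sketch of a check that follows them directly (the function name is_ident is an assumption):

```python
import string

LETTERS = set(string.ascii_uppercase)   # <letter> ::= A | B | ... | Z
DIGITS = set(string.digits)             # <digit>  ::= 0 | 1 | ... | 9

def is_ident(text):
    """True if text can be derived from <ident> under the rules above."""
    if not text or text[0] not in LETTERS:
        return False                     # every derivation starts with a <letter>
    # Each further character extends <ident> by one <letter> or one <digit>.
    return all(ch in LETTERS or ch in DIGITS for ch in text[1:])

print(is_ident("SUMSQ"), is_ident("X1"), is_ident("1X"))
```

A scanner can recognize such tokens directly, which keeps single characters out of the parser's grammar.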

Token coding scheme
Lexical scan
The lexical scanning
 The scanner must deal with language-specific cases. For example:
 DO 10 I = 1, 100 (a DO loop)
 DO 10 I = 1 (an assignment to the variable DO10I)
 (FORTRAN ignores blanks in statements)
 Keywords that are not reserved words may also be used as identifiers:
IF (THEN .EQ. ELSE) THEN
IF = THEN
ELSE
THEN = IF
ENDIF
 A number of tools have been developed for automatically constructing lexical scanners from specifications stated in a special-purpose language.
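The DO example can be sketched as follows. Because blanks are not significant, the scanner cannot tell a DO loop from an assignment until it has looked ahead for the comma. This is a deliberately simplified check: the function name classify_fortran is hypothetical, and the patterns ignore cases such as commas inside parentheses.

```python
import re

def classify_fortran(statement):
    """Classify a statement as a DO loop or an assignment (simplified sketch)."""
    compact = statement.replace(" ", "").upper()   # blanks are not significant
    # DO <label> <variable> = <expr> , <expr>
    if re.fullmatch(r"DO\d+[A-Z][A-Z0-9]*=[^,]+,.+", compact):
        return "DO loop"
    # <variable> = <expr>
    if re.fullmatch(r"[A-Z][A-Z0-9]*=.+", compact):
        return "assignment to " + compact.split("=", 1)[0]
    return "unknown"

print(classify_fortran("DO 10 I = 1, 100"))   # DO loop
print(classify_fortran("DO 10 I = 1"))        # assignment to DO10I
```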
Modeling Scanners as Finite Automata
 The tokens of most programming languages can be recognized by a finite automaton.
 Starting state vs. final state.
 If the automaton stops in a final state, we say that it recognizes (or accepts) the string being scanned; otherwise, it fails to recognize the string.
Modeling Scanners as Finite Automata (cont.)
The implementation of finite automata
 Using algorithmic code (for Fig. 5.8 (b))
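The figure is not reproduced here, so the sketch below assumes a hypothetical automaton: identifiers that start with a letter, continue with letters or digits, and may contain single underscores between alphanumeric characters. In the algorithmic approach, each state becomes a branch in the code. The state numbers and the function name recognize_ident are assumptions.

```python
def is_letter(ch):
    return ch.isascii() and ch.isalpha()

def is_digit(ch):
    return ch.isascii() and ch.isdigit()

def recognize_ident(text):
    """Hand-coded automaton; returns True if the whole string is accepted.

    Assumed states: 1 = start, 2 = after a letter or digit (final),
    3 = after an underscore.
    """
    state = 1
    for ch in text:
        if state == 1:
            if is_letter(ch):
                state = 2
            else:
                return False            # no transition: reject
        elif state == 2:
            if is_letter(ch) or is_digit(ch):
                state = 2
            elif ch == "_":
                state = 3
            else:
                return False
        else:                           # state 3: an underscore was just read
            if is_letter(ch) or is_digit(ch):
                state = 2
            else:
                return False
    return state == 2                   # accept only if stopped in the final state

print(recognize_ident("MAX_VALUE"), recognize_ident("A__B"), recognize_ident("A_"))
```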
The implementation of finite automata (cont.)
 Using tabular representation
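In the tabular approach, the transitions are stored as data and one generic loop drives any automaton. The table below is a sketch for a hypothetical automaton (identifiers beginning with a letter, with single underscores allowed between alphanumeric characters). The table contents, state numbers, and names such as accepts and char_class are assumptions, not the table from the slides.

```python
def char_class(ch):
    """Map a character to the column of the transition table it selects."""
    if ch.isascii() and ch.isalpha():
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    if ch == "_":
        return "underscore"
    return "other"

# Rows are states, columns are character classes; a missing entry means "no transition".
TABLE = {
    1: {"letter": 2},
    2: {"letter": 2, "digit": 2, "underscore": 3},
    3: {"letter": 2, "digit": 2},
}
START = 1
FINAL = {2}

def accepts(text, table=TABLE, start=START, final=FINAL):
    """Run the table-driven automaton over text and report acceptance."""
    state = start
    for ch in text:
        state = table[state].get(char_class(ch))
        if state is None:
            return False                # no transition defined: reject
    return state in final

print(accepts("MAX_VALUE"), accepts("A__B"), accepts("A_"))
```

Changing the language recognized only requires a different table; the loop in accepts stays the same.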