
Chapter 1. Introduction
J. H. Wang
Sep. 10, 2008
• Language Processors
• The Structure of a Compiler
• The Evolution of Programming Languages
• The Science of Building a Compiler
• Applications of Compiler Technology
• Programming Language Basics
Language Processors
• A compiler
– source program → compiler → target program
• Running the target program
– input → target program → output
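• An interpreter
– source program + input → interpreter → output
– Usually runs programs much more slowly than compiled machine code
– Usually gives better error diagnostics, because it executes the source program statement by statement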
• A hybrid compiler, e.g. Java
– source program → translator → intermediate program (bytecode) → virtual machine (with input) → output
A Language Processing System
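• source program → preprocessor → modified source program
• modified source program → compiler → target assembly program
• target assembly program → assembler → relocatable machine code
• relocatable machine code + library files + relocatable object files → linker/loader → target machine code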
The Structure of a Compiler
• Analysis
– Front end
– Using a grammatical structure to create an
intermediate representation
– Collecting information about the source program in a
symbol table
• Synthesis
– Back end
– Constructing the target program from the
intermediate representation and the symbol table
Phases of a Compiler
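• character stream → Lexical Analyzer → token stream
• token stream → Syntax Analyzer → syntax tree
• syntax tree → Semantic Analyzer → syntax tree
• syntax tree → Intermediate Code Generator → intermediate representation
• intermediate representation → Code Optimization (machine-independent) → intermediate representation
• intermediate representation → Code Generator → target machine code
• target machine code → Code Optimization (machine-dependent) → target machine code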
Lexical Analysis (Scanning)
• Grouping characters into lexemes
• Producing tokens
– (token-name, attribute-value)
• E.g.
– position = initial + rate * 60
– <id,1> <=> <id,2> <+> <id,3> <*> <60>
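The grouping of characters into lexemes and tokens can be sketched as follows. This is a minimal illustration, not a production scanner: the token names (`id`, `number`), the helper `tokenize`, and the choice of a Python list as the symbol table are assumptions made for this example. The constant 60 is represented here as `("number", 60)` rather than the slide's `<60>`.

```python
import re

# Token patterns, tried in order; whitespace is matched and discarded.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("id", r"[A-Za-z_]\w*"),
    ("op", r"[=+\-*/]"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Return (tokens, symbol_table) for a one-line assignment statement."""
    symbol_table = []   # identifier lexemes, in order of first appearance
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "skip":
            continue
        if kind == "id":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            # The attribute value is the 1-based symbol-table index.
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        elif kind == "number":
            tokens.append(("number", int(lexeme)))
        else:
            tokens.append((lexeme, None))   # operators carry no attribute
    return tokens, symbol_table


tokens, table = tokenize("position = initial + rate * 60")
```

Here `tokens` mirrors the slide's token stream and `table` holds the three identifier lexemes in entry order.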
Syntax Analysis (Parsing)
• Creating a tree-like intermediate representation (e.g. a syntax tree) that depicts the grammatical structure of the token stream
– E.g.
– <id,1> <=> <id,2> <+> <id,3> <*> <60>
– Syntax tree: = ( <id,1>, + ( <id,2>, * ( <id,3>, 60 ) ) )
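One way to build such a tree is a small recursive-descent parser. The sketch below assumes a tiny grammar (assignment, `+`, `*`, identifiers, and numbers) and a nested-tuple tree representation; both are choices made for this illustration, and the function name `parse_assignment` is hypothetical.

```python
def parse_assignment(tokens):
    """Parse `id = expr` and return a nested-tuple syntax tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():                      # factor -> id | number
        kind, _ = peek()
        if kind in ("id", "number"):
            return advance()
        raise SyntaxError(f"expected operand, found {kind!r}")

    def term():                        # term -> factor ('*' factor)*
        node = factor()
        while peek()[0] == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():                        # expr -> term ('+' term)*
        node = term()
        while peek()[0] == "+":
            advance()
            node = ("+", node, term())
        return node

    target = factor()
    if target[0] != "id" or peek()[0] != "=":
        raise SyntaxError("expected `id = expression`")
    advance()                          # consume '='
    tree = ("=", target, expr())
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing tokens")
    return tree


tokens = [("id", 1), ("=", None), ("id", 2), ("+", None),
          ("id", 3), ("*", None), ("number", 60)]
tree = parse_assignment(tokens)
```

Because `term` is called from `expr`, multiplication binds more tightly than addition, so `<id,3> * 60` becomes a subtree of the `+` node.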
Semantic Analysis
• Type checking
• Type conversions or coercions
• E.g.
– Syntax tree after coercion: = ( <id,1>, + ( <id,2>, * ( <id,3>, int2float(60) ) ) )
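The coercion step can be sketched as a bottom-up walk over the tree. The example assumes only two types (`int` and `float`), that all three identifiers are declared `float`, and that assigning a float to an int is rejected; the function name `coerce` and the tuple tree layout are illustrative assumptions.

```python
def coerce(node, id_types):
    """Return (tree, type), inserting int2float where int meets float."""
    kind = node[0]
    if kind == "id":
        return node, id_types[node[1]]
    if kind == "number":
        return node, "int"
    op, left, right = node
    left, ltype = coerce(left, id_types)
    right, rtype = coerce(right, id_types)
    if op == "=" and ltype == "int" and rtype == "float":
        # Assumed rule for this sketch: no implicit narrowing on assignment.
        raise TypeError("assigning float to int would lose information")
    if ltype == "int" and rtype == "float":
        left, ltype = ("int2float", left), "float"
    elif ltype == "float" and rtype == "int":
        right = ("int2float", right)
    return (op, left, right), ltype


tree = ("=", ("id", 1), ("+", ("id", 2), ("*", ("id", 3), ("number", 60))))
id_types = {1: "float", 2: "float", 3: "float"}   # assumed declarations
typed_tree, result_type = coerce(tree, id_types)
```

Only the integer constant 60 is wrapped, because it is the only place where an `int` operand meets a `float` operand.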
Intermediate Code Generation
• Generating a low-level intermediate representation
– It should be easy to produce
– It should be easy to translate into the target machine
– E.g. three-address code (in Chap. 6)
• t1 = int2float(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
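The three-address code above can be produced by a post-order walk of the coerced syntax tree, creating one temporary per operator. This is a sketch under the assumption that the tree is a nested tuple and that temporaries are named t1, t2, and so on; `gen_three_address` is a hypothetical helper name.

```python
def gen_three_address(tree):
    """Translate an assignment tree into a list of three-address instructions."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"t{counter}"

    def walk(node):
        kind = node[0]
        if kind == "id":
            return f"id{node[1]}"
        if kind == "number":
            return str(node[1])
        if kind == "int2float":
            operand = walk(node[1])
            temp = new_temp()
            code.append(f"{temp} = int2float({operand})")
            return temp
        op, left, right = node
        left_name, right_name = walk(left), walk(right)
        temp = new_temp()
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    _, target, value = tree
    code.append(f"{walk(target)} = {walk(value)}")
    return code


typed_tree = ("=", ("id", 1),
              ("+", ("id", 2), ("*", ("id", 3), ("int2float", ("number", 60)))))
code = gen_three_address(typed_tree)
```

Each instruction has at most one operator on the right-hand side, which is the defining property of three-address code.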
Code Optimization
• Attempts to improve the intermediate code so that better target code results
– Better: faster or shorter code, or code that consumes less power (Chap. 8 onward)
– E.g.
• t1 = id3 * 60.0
id1 = id2 + t1
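The two improvements in this example are folding `int2float(60)` into the constant 60.0 at compile time and removing the final copy. A minimal sketch follows, assuming instructions are stored as `(dest, op, arg1, arg2)` tuples and that temporaries are not used after the statement. Unlike the slide, it does not renumber the surviving temporaries.

```python
def optimize(code):
    """Fold int2float on constants, then remove a trailing copy `x = t`."""
    constants = {}      # temporary name -> folded constant text
    out = []
    for dest, op, a, b in code:
        a = constants.get(a, a)
        b = constants.get(b, b)
        if op == "int2float" and a.isdigit():
            constants[dest] = f"{float(int(a))}"   # "60" becomes "60.0"
            continue
        out.append((dest, op, a, b))
    # `t3 = id2 + t2; id1 = t3` becomes `id1 = id2 + t2`.
    # Assumes the copied temporary is dead after this statement.
    if len(out) >= 2 and out[-1][1] == "copy" and out[-1][2] == out[-2][0]:
        dest = out[-1][0]
        _, op, a, b = out[-2]
        out[-2:] = [(dest, op, a, b)]
    return out


code = [
    ("t1", "int2float", "60", None),
    ("t2", "*", "id3", "t1"),
    ("t3", "+", "id2", "t2"),
    ("id1", "copy", "t3", None),
]
optimized = optimize(code)
```

The result has two instructions instead of four, matching the shape of the optimized code on the slide.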
Code Generation
• Mapping the intermediate representation of the source program into the target language (Chap. 8)
– Machine code: registers or memory locations are selected for each variable
– E.g.
• LDF R2, id3
MULF R2, R2, #60.0
LDF R1, id2
ADDF R1, R1, R2
STF id1, R1
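A naive translation from optimized three-address code to this style of assembly can be sketched as below. It assumes an unlimited supply of registers, that temporaries are named t1, t2, and so on while program variables are named idN, and that each temporary is used once. Registers are numbered in allocation order, so the register names differ from the slide even though the instruction sequence has the same shape.

```python
OPCODES = {"+": "ADDF", "*": "MULF"}


def is_constant(text):
    try:
        float(text)
        return True
    except (TypeError, ValueError):
        return False


def gen_code(tac):
    """Emit load/operate/store instructions for (dest, op, a, b) tuples."""
    asm, reg_of, count = [], {}, 0

    def fresh():
        nonlocal count
        count += 1
        return f"R{count}"

    for dest, op, a, b in tac:
        if a in reg_of:                      # temporary already in a register
            ra = reg_of[a]
        else:
            ra = fresh()
            asm.append(f"LDF {ra}, {'#' + a if is_constant(a) else a}")
        if b in reg_of:
            rb = reg_of[b]
        elif is_constant(b):
            rb = f"#{b}"                     # immediate operand
        else:
            rb = fresh()
            asm.append(f"LDF {rb}, {b}")
        asm.append(f"{OPCODES[op]} {ra}, {ra}, {rb}")
        if dest.startswith("t"):             # assumed naming rule for temporaries
            reg_of[dest] = ra
        else:
            asm.append(f"STF {dest}, {ra}")
    return asm


asm = gen_code([("t1", "*", "id3", "60.0"), ("id1", "+", "id2", "t1")])
```

A real code generator must also decide which values stay in the limited set of registers, which this sketch ignores.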
Symbol Table Management
• To record the variable names and collect
information about various attributes of
each name
– Storage, type, scope
– Number and types of arguments, method of
argument passing, and the type returned
• (Chap. 2)
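A symbol table with nested scopes can be sketched as a chain of dictionaries. The class name `SymbolTable`, its methods, and the attribute names (`type`, `storage`, `params`, `returns`) are assumptions chosen for this illustration.

```python
class SymbolTable:
    """One scope of names; lookups fall back to the enclosing scope."""

    def __init__(self, parent=None):
        self.parent = parent
        self.entries = {}

    def declare(self, name, **attrs):
        if name in self.entries:
            raise KeyError(f"{name!r} already declared in this scope")
        self.entries[name] = attrs

    def lookup(self, name):
        scope = self
        while scope is not None:
            if name in scope.entries:
                return scope.entries[name]
            scope = scope.parent
        raise KeyError(f"{name!r} is not declared")


global_scope = SymbolTable()
global_scope.declare("rate", type="float", storage="static")
global_scope.declare("f", type="function", params=["int"], returns="void")

local_scope = SymbolTable(parent=global_scope)
local_scope.declare("rate", type="int", storage="stack")
```

Looking up `rate` from `local_scope` finds the inner declaration, while `f` is found in the enclosing scope.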
Grouping of Phases into Passes
• Front-end pass
– Lexical analysis, syntax analysis, semantic
analysis, intermediate code generation
• (Optional) Code optimization pass
• Back-end pass
– Code generation
Compiler-Construction Tools
• Parser generators
• Scanner generators
• Syntax-directed translation engines
• Code-generator generators
• Data-flow analysis engines
– A key part of code optimization
• Compiler construction toolkits
The Evolution of Programming Languages
• Machine language: 1940’s
• Assembly language: early 1950’s
• Higher-level languages: late 1950’s
– Fortran: scientific computation
– Cobol: business data processing
– Lisp: symbolic computation
• Today: thousands of programming languages
Classification of Programming Languages - by Generation
• First generation: machine languages
• Second generation: assembly languages
• Third generation: high-level languages
– Fortran, Cobol, Lisp, C, C++, C#, Java
• Fourth generation: languages designed for specific applications
– NOMAD, SQL, PostScript
• Fifth generation: logic- and constraint-based languages
– Prolog, OPS5
Classification of Programming Languages - by Function
• Imperative: the program specifies how a computation is to be done
– C, C++, C#, Java
• Declarative: the program specifies what computation is to be done
– ML, Haskell, Prolog
• von Neumann language
– Fortran, C
• Object-oriented language
– Simula 67, Smalltalk, C++, C#, Java, Ruby
• Scripting languages
– Awk, JavaScript, Perl, PHP, Python, Ruby, Tcl
Impacts on Compilers
• To translate and support new language features
• To take advantage of new hardware capabilities
• To promote the use of high-level languages by
minimizing the execution overhead
• To make high-performance computer
architectures effective on users’ applications
• To evaluate architectural concepts
The Science of Building a Compiler
• How abstractions can be used to solve problems
– Take a problem
– Formulate a mathematical abstraction that
captures the key characteristics
– Solve it using mathematical techniques
Modeling in Compiler Design and Implementation
• To design the right mathematical models
and choose the right algorithms
– Finite-state machines and regular expressions
(Chap. 3)
– Context-free grammars (Chap. 4)
– Trees (Chap. 5)
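As a small illustration of the finite-state model, the sketch below hand-codes a three-state machine that accepts identifiers (a letter or underscore followed by letters, digits, or underscores). The state names and the function `is_identifier` are assumptions for this example. Python's `str.isalpha` also accepts non-ASCII letters.

```python
def is_identifier(text):
    """Run a three-state DFA: start -> in_id, with a dead state on error."""
    state = "start"
    for ch in text:
        if state == "start":
            state = "in_id" if (ch.isalpha() or ch == "_") else "dead"
        elif state == "in_id":
            state = "in_id" if (ch.isalnum() or ch == "_") else "dead"
        else:
            break                      # the dead state has no way out
    return state == "in_id"            # in_id is the only accepting state


checks = {s: is_identifier(s) for s in ["rate", "_x1", "60", "", "a-b"]}
```

The same language is described by the regular expression `[A-Za-z_][A-Za-z0-9_]*` when restricted to ASCII.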
The Science of Code Optimization
• “optimization”: attempts to produce code that is more efficient than the obvious code
– Complex processor architectures
– Parallel computers
– Multicore, multiprocessor machines
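• Theory vs. practice
– Models: graphs, matrices, linear programs (Chap. 9 onward)
– Generating optimal code is undecidable in general
• Design objectives for compiler optimizations
– Correct: the meaning of the compiled program must be preserved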
– Performance improvement
• speed, size, power consumption
– Reasonable compilation time
• For rapid development and debugging cycle
– Manageable engineering effort
• Prioritize optimizations
Applications of Compiler Technology
• Implementation of high-level
programming languages
• Optimizations for computer architectures
• Design of new computer architectures
• Program translations
• Software productivity tools
Implementation of High-Level Programming Languages
• Example: the register keyword in the C programming language
– May lose efficiency, because programmers are often not the best judge of very low-level matters such as register allocation
• Increased level of abstraction
– User-defined aggregate data types: arrays, structures
– High-level control flow: loops, procedure invocations
• Object orientation
– Data abstraction
– Inheritance of properties
• Java
– Range checks for arrays
– Garbage collection
– Portable and mobile code
Optimizations for Computer Architectures
• Parallelism
– Instruction-level
• Explicit: VLIW machines such as Intel IA64
– Processor-level
• Memory hierarchies
– Registers, caches, physical memory,
secondary storage
Design of New Computer Architectures
• RISC (Reduced Instruction-Set Computer)
– PowerPC, SPARC, MIPS, Alpha, PA-RISC
• CISC (Complex Instruction-Set Computer)
– x86
• Specialized architectures
– Data flow machines
– VLIW machines
– SIMD arrays of processors
– Systolic arrays
– Multiprocessors with shared memory
– Multiprocessors with distributed memory
Program translations
• Binary translation
– To translate the binary code for one machine to that
of another
– To provide backward compatibility
• Hardware synthesis
– Hardware description languages: Verilog and VHDL
• Database query interpreters
– Query languages: SQL (Structured Query Language)
• Compiled simulation
Software productivity tools
• Type checking
– Wrong type in an operation or parameters
passed to a procedure
• Bounds checking
– Buffer overflow in C
• Memory-management tools
– Purify: a widely used tool to find memory
management errors such as memory leaks in
C or C++
Programming Language Basics
• The static/dynamic distinction
– Static policy: the issue can be determined at
compile time
– Dynamic policy: at run time
– Scope of declarations
• Static scope
• Dynamic scope
– Ex: in a Java class,
• public static int x;
Environments and states
• Environment: a mapping from names to locations
• State: a mapping from locations to values
• Ex:
– int i;              /* global i */
  void f(…) {
      int i;          /* local i */
      i = 3;          /* use of local i */
  }
  x = i + 1;          /* elsewhere, outside f: use of global i */
• Names, Identifiers, and Variables
• The environment and state mappings are
dynamic, with a few exceptions:
– Static binding of names to locations
• E.g. global variable
– Static binding of locations to values
• E.g. declared constants
– #define ARRAYSIZE 1000
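The two-stage mapping (names to locations, then locations to values) can be sketched with two dictionaries. The sketch mirrors the global and local `i` example above. It assumes integer locations handed out by a hypothetical `allocate` helper and a global `i` that starts at 0, as C globals do.

```python
state = {}                 # state: location -> value
next_location = 0


def allocate(initial):
    """Reserve a fresh location and store an initial value there."""
    global next_location
    loc = next_location
    next_location += 1
    state[loc] = initial
    return loc


# Environment outside f: name -> location.
global_env = {"i": allocate(0), "x": allocate(0)}


def call_f(env):
    """Model `void f() { int i; i = 3; }` and return f's environment."""
    local_env = dict(env)              # inherit the outer bindings
    local_env["i"] = allocate(0)       # `int i;` rebinds the name i
    state[local_env["i"]] = 3          # `i = 3;` changes the state only
    return local_env


local_env = call_f(global_env)
# `x = i + 1;` outside f resolves i through the global environment.
state[global_env["x"]] = state[global_env["i"]] + 1
```

A declaration changes the environment, while an assignment changes the state; the global `i` is untouched by the assignment inside `f`.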
Static Scope and Block Structure
• Block: a grouping of declarations and statements
– C: { }
– Pascal: begin end
• Ex: blocks in a C++ program
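– #include <iostream>
  using namespace std;
  int main() {
      int a = 1;                  // block B1
      int b = 1;
      {
          int b = 2;              // block B2
          {
              int a = 3;          // block B3
              cout << a << b;     // prints 32
          }
          {
              int b = 4;          // block B4
              cout << a << b;     // prints 14
          }
          cout << a << b;         // prints 12
      }
      cout << a << b;             // prints 11
  }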
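The static-scope rule for nested blocks (use the innermost enclosing declaration) can be modeled with a chain of dictionaries, one per block. The sketch assumes four nested blocks B1 to B4 declaring `a` and `b` as in the C++ example, and uses the standard-library `collections.ChainMap`; the variable names `b1` to `b4` are illustrative.

```python
from collections import ChainMap

seen = []

b1 = ChainMap({"a": 1, "b": 1})        # outermost block B1
b2 = b1.new_child({"b": 2})            # B2 redeclares b
b3 = b2.new_child({"a": 3})            # B3, nested in B2, redeclares a
seen.append((b3["a"], b3["b"]))        # use inside B3

b4 = b2.new_child({"b": 4})            # B4, also nested in B2, redeclares b
seen.append((b4["a"], b4["b"]))        # use inside B4

seen.append((b2["a"], b2["b"]))        # use in B2 after the inner blocks
seen.append((b1["a"], b1["b"]))        # use in B1 after B2
```

Each lookup searches the innermost block first and then the enclosing blocks, which is exactly the rule a compiler applies at compile time.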
Explicit Access Control
• Keywords like public, private, protected
in object-oriented languages such as C++
or Java
• Procedures, functions, methods
Dynamic Scope
• A use of a name x refers to the declaration
of x in the most recently called procedure
with such a declaration
• Declarations and definitions
• Ex: macro expansion in C preprocessor
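– #include <stdio.h>
  #define a (x+1)
  int x = 2;
  void b() { int x = 1; printf("%d\n", a); }   /* prints 2: a uses the local x */
  void c() { printf("%d\n", a); }              /* prints 3: a uses the global x */
  int main() { b(); c(); return 0; }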
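• Ex: method resolution in object-oriented languages
– Class C has a method named m()
– D is a subclass of C and has its own method named m()
– For a call x.m(), where x is declared as an object of class C, the m() that runs depends on whether x refers to a C or a D object at run time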
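The dynamic-scope rule (search the most recently called procedure first) can be simulated with an explicit stack of frames. The sketch models the macro example with a global x = 2, a procedure b that declares its own x = 1, and a procedure c that does not. The helper names `lookup` and `call` and the frame layout are assumptions for this illustration.

```python
call_stack = [{"x": 2}]        # bottom frame holds the global x


def lookup(name):
    """Dynamic scope: search from the most recent frame outward."""
    for frame in reversed(call_stack):
        if name in frame:
            return frame[name]
    raise NameError(name)


def a():
    return lookup("x") + 1     # the macro body (x+1), resolved at the use site


def b():
    call_stack[-1]["x"] = 1    # `int x = 1;` in b's own frame
    return a()


def c():
    return a()                 # no local x, so the global one is found


def call(procedure):
    call_stack.append({})      # new frame for the callee
    try:
        return procedure()
    finally:
        call_stack.pop()


results = [call(b), call(c)]
```

The same text `x + 1` yields different values depending on who is executing it, which is why the binding cannot be decided from the program text alone.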
Parameter Passing Mechanisms
• Call-by-value
– The actual parameter is evaluated (if it is an expression) or copied (if it is a variable)
• Call-by-reference
– The address of the actual parameter is passed to the callee as the value of the corresponding formal parameter
• Call-by-name
– In Algol 60; the actual parameter is substituted for the formal parameter, like a macro
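• Ex: aliasing
– a is an array belonging to a procedure p
– p calls another procedure q(x, y) with the call q(a, a)
– Parameters are passed by value, but an array name is really a reference to the array's storage (as in C)
– x and y become aliases of each other: a write to x[i] is visible as y[i]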
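The aliasing effect can be observed directly in any language where array arguments are references passed by value. Python lists behave this way, so the sketch below uses them; the function bodies are invented for illustration.

```python
def q(x, y):
    x[0] = 99          # write through x ...
    return y[0]        # ... and read the same storage through y


def increment(n):
    n = n + 1          # rebinds the local copy only
    return n


a = [1, 2, 3]
seen_through_y = q(a, a)   # both formals refer to the one list

k = 5
increment(k)               # the caller's k is unaffected
```

The write through `x` changes what `y` sees because both names denote the same location, while the integer argument is simply copied.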
End of Chapter 1