Chapter 1. Introduction
J. H. Wang
Sep. 10, 2008
Outline
• Language Processors
• The Structure of a Compiler
• The Evolution of Programming Languages
• The Science of Building a Compiler
• Applications of Compiler Technology
• Programming Languages Basics
Language Processors
• A compiler:
    source program → Compiler → target program
• Running the target program:
    input → Target Program → output
• An interpreter:
    source program + input → Interpreter → output
  – Much slower program execution
  – Better error diagnostics
• A hybrid compiler, e.g. Java:
    source program → Translator → intermediate program
    intermediate program + input → Virtual Machine → output
A Language Processing System
source program
  → Preprocessor → modified source program
  → Compiler → target assembly program
  → Assembler → relocatable machine code
  → Linker/Loader (with library files and relocatable object files)
  → target machine code
The Structure of a Compiler
• Analysis
– Front end
– Using a grammatical structure to create an
intermediate representation
– Collecting information about the source program in a
symbol table
• Synthesis
– Back end
– Constructing the target program from the
intermediate representation and the symbol table
Phases of a Compiler
character stream
  → Lexical Analyzer → token stream
  → Syntax Analyzer → syntax tree
  → Semantic Analyzer → syntax tree
  → Intermediate Code Generator → intermediate representation
  → Machine-Independent Code Optimization (optional) → intermediate representation
  → Code Generator → target machine code
  → Machine-Dependent Code Optimization (optional) → target machine code
(The Symbol Table is used by all phases.)
Lexical Analysis (Scanning)
• Grouping characters into lexemes
• Producing tokens
– (token-name, attribute-value)
• E.g.
– position = initial + rate * 60
– <id,1> <=> <id,2> <+> <id,3> <*> <60>
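As a sketch of how scanning could work, here is a toy scanner in C for assignments like the one above. The token names, the `intern` helper, and the fixed-size tables are illustrative assumptions, not the book's implementation:

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token kinds; a real scanner has many more. */
enum { TOK_ID, TOK_ASSIGN, TOK_PLUS, TOK_STAR, TOK_NUM, TOK_EOF };

struct token { int kind; int attr; }; /* attr: symbol index or numeric value */

static char names[16][32];   /* toy symbol table: identifier lexemes */
static int  nsyms;

/* Return the 1-based table index of an identifier, adding it if new. */
static int intern(const char *s, int len) {
    for (int i = 0; i < nsyms; i++)
        if ((int)strlen(names[i]) == len && !strncmp(names[i], s, len))
            return i + 1;                    /* 1-based, as in <id,1> */
    memcpy(names[nsyms], s, len);
    names[nsyms][len] = '\0';
    return ++nsyms;
}

/* Scan one token starting at *p; advance *p past it. */
struct token next_token(const char **p) {
    while (**p == ' ') (*p)++;
    struct token t = { TOK_EOF, 0 };
    if (**p == '\0') return t;
    if (isalpha((unsigned char)**p)) {
        const char *start = *p;
        while (isalnum((unsigned char)**p)) (*p)++;
        t.kind = TOK_ID;
        t.attr = intern(start, (int)(*p - start));
    } else if (isdigit((unsigned char)**p)) {
        t.kind = TOK_NUM;
        t.attr = (int)strtol(*p, (char **)p, 10);
    } else {
        switch (*(*p)++) {
        case '=': t.kind = TOK_ASSIGN; break;
        case '+': t.kind = TOK_PLUS;   break;
        case '*': t.kind = TOK_STAR;   break;
        }
    }
    return t;
}
```

Scanning "position = initial + rate * 60" with this sketch yields the token stream above: position, initial, and rate become <id,1>, <id,2>, <id,3>, and 60 becomes a number token carrying its value.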
Syntax Analysis (Parsing)
• Creating a tree-like intermediate representation
  (e.g. a syntax tree) that depicts the
  grammatical structure of the token stream
– E.g.
– <id,1> <=> <id,2> <+> <id,3> <*> <60>

                =
              /   \
        <id,1>     +
                 /   \
           <id,2>     *
                    /   \
              <id,3>     60
Semantic Analysis
• Type checking
• Type conversions or coercions
• E.g.

                =
              /   \
        <id,1>     +
                 /   \
           <id,2>     *
                    /   \
              <id,3>     int2float
                             |
                             60
Intermediate Code Generation
• Generating a low-level intermediate
representation
– It should be easy to produce
– It should be easy to translate into the target
machine
– E.g. three-address code (in Chap. 6)
• t1 = int2float(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
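A minimal sketch of how a generator might emit the sequence above, assuming a hypothetical `newtemp` helper that hands out fresh temporaries t1, t2, … (the book's actual translation scheme appears in Chap. 6):

```c
#include <stdio.h>
#include <string.h>

static int  ntemp;       /* counter for fresh temporaries */
static char code[512];   /* buffer of emitted three-address instructions */

/* Each call yields a fresh temporary number: 1, 2, 3, ... */
static int newtemp(void) { return ++ntemp; }

static void emit(const char *line) {
    strcat(code, line);
    strcat(code, "\n");
}

/* Translate id1 = id2 + id3 * 60 the way a generator walks the
   annotated syntax tree bottom-up, one instruction per node. */
const char *gen_example(void) {
    char buf[64];
    int t1 = newtemp(), t2, t3;
    snprintf(buf, sizeof buf, "t%d = int2float(60)", t1); emit(buf);
    t2 = newtemp();
    snprintf(buf, sizeof buf, "t%d = id3 * t%d", t2, t1); emit(buf);
    t3 = newtemp();
    snprintf(buf, sizeof buf, "t%d = id2 + t%d", t3, t2); emit(buf);
    snprintf(buf, sizeof buf, "id1 = t%d", t3);           emit(buf);
    return code;
}
```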
Code Optimization
• Attempts to improve the intermediate
code
– Better: faster, shorter code, or code that
consumes less power (Chap. 8 -)
– E.g.
• t1 = id3 * 60.0
id1 = id2 + t1
Code Generation
• Mapping intermediate representation of
the source program into the target
language (Chap. 8)
– Machine code: register/memory location
assignments
– E.g.
• LDF R2, id3
MULF R2, R2, #60.0
LDF R1, id2
ADDF R1, R1, R2
STF id1, R1
Symbol Table Management
• To record the variable names and collect
information about various attributes of
each name
– Storage, type, scope
– Number and types of arguments, method of
argument passing, and the type returned
• (Chap. 2)
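A minimal sketch of such a table in C, assuming linear search and latest-declaration-wins lookup so that inner-scope names shadow outer ones (real compilers use hash tables and per-scope chains; the field set here is illustrative):

```c
#include <string.h>

/* Hypothetical entry: a real compiler also records storage,
   argument counts and types, return type, etc. */
struct sym { const char *name; const char *type; int scope_depth; };

#define MAXSYMS 64
static struct sym table[MAXSYMS];
static int ntable;

/* Record a declaration; returns its index. */
int sym_put(const char *name, const char *type, int depth) {
    table[ntable] = (struct sym){ name, type, depth };
    return ntable++;
}

/* Search newest-first, so the innermost declaration wins. */
struct sym *sym_get(const char *name) {
    for (int i = ntable - 1; i >= 0; i--)
        if (!strcmp(table[i].name, name)) return &table[i];
    return 0;
}
```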
Grouping of Phases into Passes
• Front-end pass
– Lexical analysis, syntax analysis, semantic
analysis, intermediate code generation
• (Optional) Code optimization pass
• Back-end pass
– Code generation
Compiler-Construction Tools
• Parser generators
• Scanner generators
• Syntax-directed translation engines
• Code-generator generators
• Data-flow analysis engines
– A key part of code optimization
• Compiler construction toolkits
The Evolution of Programming
Languages
• Machine language: 1940’s
• Assembly language: early 1950’s
• Higher-level languages: late 1950’s
– Fortran: scientific computation
– Cobol: business data processing
– Lisp: symbolic computation
• Today: thousands of programming
languages
Classification of Programming
Languages – by Generation
• First generation: machine languages
• Second generation: assembly languages
• Third generation: high-level languages
– Fortran, Cobol, Lisp, C, C++, C#, Java
• Fourth generation: specific application
– NOMAD, SQL, Postscript
• Fifth generation: logic- and constraint-based
– Prolog, OPS5
Classification of Programming
Languages - by Functions
• Imperative: how
– C, C++, C#, Java
• Declarative: what
– ML, Haskell, Prolog
• von Neumann language
– Fortran, C
• Object-oriented language
– Simula 67, Smalltalk, C++, C#, Java, Ruby
• Scripting languages
– Awk, JavaScript, Perl, PHP, Python, Ruby, Tcl
Impacts on Compilers
• To translate and support new language features
• To take advantage of new hardware capabilities
• To promote the use of high-level languages by
minimizing the execution overhead
• To make high-performance computer
architectures effective on users’ applications
• To evaluate architectural concepts
The Science of Building a Compiler
• How abstractions can be used to solve
problems
– Take problem
– Formulate a mathematical abstraction that
captures the key characteristics
– Solve it using mathematical techniques
Modeling in Compiler Design and
Implementation
• To design the right mathematical models
and choose the right algorithms
– Finite-state machines and regular expressions
(Chap. 3)
– Context-free grammars (Chap. 4)
– Trees (Chap. 5)
The Science of Code Optimization
• “optimization”: attempts to produce code
that is more efficient than the obvious
code
– Complex processor architectures
– Parallel computers
– Multicore, multiprocessor machines
• Theory vs. practice
– Graphs, matrices, linear programs (Chap. 9 - )
– Many optimization problems are undecidable in
general
• Design objectives for compiler
optimizations
– Correct
– Performance improvement
• speed, size, power consumption
– Reasonable compilation time
• For rapid development and debugging cycle
– Manageable engineering effort
• Prioritize optimizations
Applications of Compiler
Technology
• Implementation of high-level
programming languages
• Optimizations for computer architectures
• Design of new computer architectures
• Program translations
• Software productivity tools
Implementation of high-level
programming languages
• Example: the register keyword in the C
programming language
– May lose efficiency, because programmers are
often not the best judge of very low-level
matters
• Increased level of abstraction
– User-defined aggregate data types: arrays,
structures
– High-level control flow: loops, procedure
invocations
• Object orientation
– Data abstraction
– Inheritance of properties
• Java
– Type-safe
– Range checks for arrays
– Garbage collection
– Portable and mobile code
Optimizations for computer
architectures
• Parallelism
– Instruction-level
• Explicit: VLIW machines such as Intel IA64
– Processor-level
• Memory hierarchies
– Registers, caches, physical memory,
secondary storage
Design of new computer
architectures
• RISC (Reduced Instruction-Set Computer)
– PowerPC, SPARC, MIPS, Alpha, PA-RISC
• CISC (Complex Instruction-Set Computer)
– x86
• Specialized architectures
– Data flow machines
– VLIW machines
– SIMD arrays of processors
– Systolic arrays
– Multiprocessors with shared memory
– Multiprocessors with distributed memory
Program translations
• Binary translation
– To translate the binary code for one machine to that
of another
– To provide backward compatibility
• Hardware synthesis
– Hardware description languages: Verilog and VHDL
• Database query interpreters
– Query languages: SQL (Structured Query Language)
• Compiled simulation
Software productivity tools
• Type checking
– Wrong type in an operation or parameters
passed to a procedure
• Bounds checking
– Buffer overflow in C
• Memory-management tools
– Purify: a widely used tool to find memory
management errors such as memory leaks in
C or C++
Programming Language Basics
• The static/dynamic distinction
– Static policy: the issue can be determined at
compile time
– Dynamic policy: at run time
– Scope of declarations
• Static scope
• Dynamic scope
– Ex: in a Java class,
• public static int x;
• static: one copy of x, allocated before the
program runs, so its location is determined at
compile time
Environments and states
• Environment: a mapping from names to locations
• States: a mapping from locations to values
• Ex:
    int i;            /* global */
    …
    void f(…) {
        int i;        /* local */
        …
        i = 3;        /* refers to the local i */
        …
    }
    …
    x = i + 1;        /* refers to the global i */
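The two mappings can be illustrated with a small C sketch (the names `global_i`, `f`, and `outside` are made up for illustration): the environment maps the name i to a storage location, and the state maps that location to a value.

```c
int global_i = 1;        /* one storage location for the name i */

/* Inside f, the environment maps i to a fresh local cell,
   so assignments here never touch the global cell. */
int f(void) {
    int i = 3;
    return i;            /* reads the local cell: 3 */
}

/* Outside f, the environment maps i to the global cell. */
int outside(void) {
    return global_i + 1; /* reads the global cell: 1 + 1 = 2 */
}
```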
• Names, Identifiers, and Variables
• The environment and state mappings are
dynamic, with a few exceptions:
– Static binding of names to locations
• E.g. global variable
– Static binding of locations to values
• E.g. declared constants
– #define ARRAYSIZE 1000
Static Scope and Block Structure
• Block: a grouping of declarations and statements
– C: { }
– Pascal: begin end
• Ex: blocks in a C++ program
– int main() {
      int a = 1;
      int b = 1;
      {
          int b = 2;
          {
              int a = 3;
              cout << a << b;   // innermost a = 3, b = 2
          }
          {
              int b = 4;
              cout << a << b;   // outermost a = 1, b = 4
          }
          cout << a << b;       // a = 1, b = 2
      }
      cout << a << b;           // a = 1, b = 1
  }
Explicit Access Control
• Keywords like public, private, protected
in object-oriented languages such as C++
or Java
• Procedures, functions, methods
Dynamic Scope
• A use of a name x refers to the declaration
of x in the most recently called procedure
with such a declaration
• Declarations and definitions
• Ex: macro expansion in the C preprocessor
– #define a (x+1)
  int x = 2;
  void b() { int x = 1; printf("%d\n", a); }  /* prints 2: a uses b's x */
  void c() { printf("%d\n", a); }             /* prints 3: a uses the global x */
  void main() { b(); c(); }
• Ex: method resolution in object-oriented
programming
– Class C with a method named m()
D is a subclass of C
x.m(), where x is an object of class C
Parameter Passing Mechanisms
• Call-by-value
– The actual parameter is evaluated or copied
• Call-by-reference
– The address of the actual parameter is passed
to the callee as the value of the corresponding
formal parameter
• Call-by-name
– In Algol 60; works like a macro
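The first two mechanisms can be sketched in C, which is call-by-value only but can simulate call-by-reference by passing an address explicitly (the function names here are made up for illustration):

```c
/* Call-by-value: the callee gets a copy, so the
   caller's variable is unchanged. */
void inc_by_value(int x) { x = x + 1; }

/* Simulated call-by-reference: the address of the actual
   parameter is passed, so the callee updates the caller's cell. */
void inc_by_reference(int *x) { *x = *x + 1; }
```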
Aliasing
• Ex:
– a is an array belonging to a procedure p
– p calls another procedure q(x, y) with the call
q(a, a)
– Parameters are passed by value, but an array
name is effectively a pointer (as in C)
– x and y are aliases
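A C sketch of the scenario above: because C passes an array name as a pointer by value, the call q(a, a) makes the two formal parameters aliases, so a write through x is visible through y (the function name q follows the slide; the body is illustrative):

```c
/* x and y receive the same pointer when called as q(a, a). */
int q(int x[], int y[]) {
    x[0] = 99;        /* write through x ...            */
    return y[0];      /* ... is seen through y: 99 when they alias */
}
```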
End of Chapter 1