CS322Week1

COMPSCI 322: Language and Compilers
Class Hour:
Hyer Hall 210: TThu 9:30am – 10:15am
A little bit about the instructor
Assistant professor at UWW since August 2005
• Graduated from the University of
Connecticut (class of 2005), Ph.D. in
Computer Science and Engineering
• Master of Computer Science from
UW-Milwaukee (96-99)
• Bachelor of Science from Hanoi
University of Technology (86-91)
• Research Experience:
– User Modeling, Information Retrieval,
Decision Theory, Collaborative Filtering,
Human Factors
• Teaching Experience:
– MCS 220, COMPSCI 172, 181, 271, 381 at
UWW
– Introductory courses at UOP and DeVry
– TA for Computer Architecture, OO Design,
Compiler, Artificial Intelligence
Contact information
nguyenh@uww.edu
(fastest way to contact me)
Baker Hall 324
Office Hours: 9:50am – 10:50am and
3pm – 4pm MWF, or by appointment
262 472 5170
Course Objectives
• Understand the description of a given language
and successfully design a scanner, parser,
semantic checker, and code generator for it
• Successfully implement a scanner, parser,
semantic checker, and code generator for the
given language. Test the implementation with
all test cases for each component of the
compiler.
Book Requirement
• Engineering a Compiler. 2004. Keith D. Cooper
and Linda Torczon. Morgan Kaufmann
Publishers (available through Textbook Rental)
• Web site:
http://www.cs.rice.edu/~keith/Errata.html
Course detail - Evaluation
GRADABLE             POINTS
3 projects           650
Final Exam           150
Presentation         100
In class exercises   100
Total                1000
Projects
• 3 projects: scanner, parser and semantic
checker, code generator. The preferred language
for developing them is Java, but C/C++ are
welcome too.
• Project 3 depends on Project 2, and Project 2
depends on Project 1.
• ABSOLUTELY no LATE submissions for
Project 3, because grading this project is
time consuming.
In class exercises
• Simple multiple choice questions and simple
problems will be given in class weekly and
graded.
• This requires students to read the assigned
reading (partly because this is a discussion
course rather than a lecture course)
– Not all material will be covered in class
– Book complements the lectures
Presentation
• Each student will research a specific
programming language of their choice. Please
let the instructor know ahead of time which
language you choose.
• Each student then presents this research to the
class in a 15-20 minute PowerPoint presentation,
followed by 10 minutes of questions.
Grade
Letter Grade   Percentage
A              90 to 100%
B              80 to 89%
C              70 to 79%
D              60 to 69%
F              Below 60%
Prerequisite
Prerequisite: COMPSCI 271, and Data Structures
Students are responsible for meeting these
requirements.
Compilers
• What is a compiler?
– A program that translates an executable program
in one language into an executable program in
another language
– The compiler should improve the program, in some
way
• What is an interpreter?
– A program that reads an executable program and
produces the results of executing that program
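The difference between the two definitions can be sketched in a few lines of Python. This is a toy illustration only: the tuple-based expression tree, the stack-machine opcodes (PUSH, LOAD, ADD, SUB), and all function names are invented for this sketch and do not come from the textbook.

```python
# Toy illustration: the same expression handled two ways.
# The tree encoding and the opcodes are assumptions made for this sketch.

def interpret(node, env):
    """Interpreter: read the program and produce the result of executing it."""
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "var":
        return env[node[1]]
    op, left, right = node
    a, b = interpret(left, env), interpret(right, env)
    return a + b if op == "+" else a - b

def compile_expr(node):
    """Compiler: translate the program into a program in another language."""
    kind = node[0]
    if kind == "num":
        return [("PUSH", node[1])]
    if kind == "var":
        return [("LOAD", node[1])]
    op, left, right = node
    opcode = "ADD" if op == "+" else "SUB"
    return compile_expr(left) + compile_expr(right) + [(opcode, None)]

def run(code, env):
    """A tiny stack machine that executes the compiled program."""
    stack = []
    for opcode, arg in code:
        if opcode == "PUSH":
            stack.append(arg)
        elif opcode == "LOAD":
            stack.append(env[arg])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if opcode == "ADD" else a - b)
    return stack.pop()

# (x + 2) - y
program = ("-", ("+", ("var", "x"), ("num", 2)), ("var", "y"))
env = {"x": 10, "y": 3}
target = compile_expr(program)
```

Here `interpret(program, env)` yields the value directly, while `compile_expr(program)` yields a new program that still has to be executed, in this sketch by `run(target, env)`.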
Examples
• C is typically compiled, Basic is typically
interpreted
• Java is compiled to bytecodes (code for the
Java VM).
– which are then interpreted
– Or a hybrid strategy is used
• Just-in-time compilation
Taking a Broader View
• Compiler Technology = Off-Line Processing
– Goals: improved performance and language usability
• Making it practical to use the full power of the language
– Trade-off: preprocessing time versus execution time
(or space)
– Rule: performance of both compiler and application
must be acceptable to the end user
Why Study Compilation?
“So even though I'd never actually want to write a
compiler myself, knowing about compiler concepts
would have made me a better programmer. It's one of
those gaps that I regret, which is why I think I may
actually try to struggle through a few chapters from
this Engineering a Compiler book during the holidays,
in between all the holiday activities like eating. And
shopping. And listening to "Santa Got Run Over By a
Reindeer" for the billionth time …”
Why Study Compilation?
• Compilers are important system software
components
– They are intimately interconnected with architecture,
systems, programming methodology, and language design
• Compilers include many applications of theory to
practice
– Scanning, parsing, static analysis, instruction selection
• Many practical applications have embedded
languages
– Commands, macros, formatting tags …
Why Study Compilation?
• Many applications have input formats that look like
languages
– Matlab, Mathematica
• Writing a compiler exposes practical algorithmic &
engineering issues
– Approximating hard problems; efficiency &
scalability
Intrinsic interest
• Compiler construction involves ideas from
many different parts of computer science
Artificial intelligence   Greedy algorithms; heuristic search techniques
Algorithms                Graph algorithms, union-find; dynamic programming
Theory                    DFAs & PDAs, pattern matching; fixed-point algorithms
Systems                   Allocation & naming; synchronization, locality
Architecture              Pipeline & hierarchy management; instruction set use
Intrinsic merit
• Compiler construction poses challenging and
interesting problems:
– Compilers must do a lot but also run fast
– Compilers have primary responsibility for run-time performance
– Compilers are responsible for making it acceptable to use the full
power of the programming language
– Computer architects perpetually create new challenges for the
compiler by building more complex machines
– Compilers must hide that complexity from the programmer
– Success requires mastery of complex interactions
Preparation for next class
Review the materials for this class
Read chapter 1 of the book
Overview of compilers
High-level View of a Compiler
Diagram: Source code → Compiler → Machine code; the compiler also reports Errors
High-level overview of a compiler
Implications
– Must recognize legal (and illegal) programs
– Must generate correct code
– Must manage storage of all variables (and code)
– Must agree with OS & linker on format for object code
Big step up from assembly language—use higher level notations
Traditional Two-pass Compiler
Diagram: Source code → Front End → IR → Back End → Machine code; both ends report Errors
• Use an intermediate representation (IR)
• Front end maps legal source code into IR
• Back end maps IR into target machine code
• Admits multiple front ends & multiple passes
The Front End
Diagram: Source code → Scanner → tokens → Parser → IR; Scanner and Parser report Errors
• Responsibilities
– Recognize legal (& illegal) programs
– Report errors in a useful way
– Produce IR & preliminary storage map
– Shape the code for the back end
– Much of front end construction can be automated
Scanner
• Maps character stream into words
• Produces pairs (tokens): <part of speech, word>
x = x + y ; becomes <id,x> = <id,x> + <id,y> ;
– word ≅ lexeme, part of speech ≅ token type
• Typical tokens include number, identifier, +, –, new, while,
if
• Scanner eliminates white space and comments
• Speed is important
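A minimal scanner along these lines can be sketched with Python's `re` module. The token names, the patterns, and the choice to use each punctuation symbol as its own token type are assumptions made for this sketch, and comment removal is left out.

```python
import re

# Token types and patterns are assumptions for this sketch, not the course language.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("id",     r"[A-Za-z_]\w*"),
    ("punct",  r"[=+\-;]"),
    ("skip",   r"\s+"),            # white space is discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def scan(text):
    """Map a character stream to a list of (token type, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        kind, lexeme = m.lastgroup, m.group()
        if kind == "punct":
            kind = lexeme           # each operator is its own token type
        if kind != "skip":
            tokens.append((kind, lexeme))
        pos = m.end()
    return tokens
```

For example, `scan("x = x + y ;")` returns pairs such as `("id", "x")` and `("+", "+")`, with the white space dropped.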
Parser
• Recognizes context-free syntax & reports errors
• Guides context-sensitive (“semantic”) analysis (type
checking)
• Builds IR for source program
Hand-coded parsers are fairly easy to build
Most books advocate using automatic parser generators
Parser
Context-free syntax is specified with a grammar
SheepNoise → SheepNoise baa | baa
SheepNoise → nil
This grammar defines the set of noises that a
sheep makes under normal circumstances
It is written in a variant of Backus–Naur Form
(BNF)
Parser
Formally, a grammar G = (S,N,T,P)
• S is the start symbol
• N is a set of non-terminal symbols
• T is a set of terminal symbols or words
• P is a set of productions or rewrite rules
(P : N → (N ∪ T)*, each rule rewrites a non-terminal as a string of terminals and non-terminals)
Parser
1. goal → expr
2. expr → expr op term
3.      | term
4. term → number
5.      | id
6. op   → +
7.      | -
S = goal
T = { number, id, +, - }
N = { goal, expr, term, op }
P = { 1, 2, 3, 4, 5, 6, 7}
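One way to make the four components concrete is to write the expression grammar as plain Python data. The representation (a dict from production number to a left-hand side and a right-hand-side list) and the helper `is_well_formed` are choices made for this sketch only.

```python
# The expression grammar as data: G = (S, N, T, P).
S = "goal"
N = {"goal", "expr", "term", "op"}
T = {"number", "id", "+", "-"}
P = {
    1: ("goal", ["expr"]),
    2: ("expr", ["expr", "op", "term"]),
    3: ("expr", ["term"]),
    4: ("term", ["number"]),
    5: ("term", ["id"]),
    6: ("op",   ["+"]),
    7: ("op",   ["-"]),
}

def is_well_formed(S, N, T, P):
    """Check the structural conditions of a context-free grammar."""
    if S not in N or N & T:
        return False                    # start must be a non-terminal; N and T are disjoint
    for lhs, rhs in P.values():
        if lhs not in N:
            return False                # left-hand side is a single non-terminal
        if any(sym not in N | T for sym in rhs):
            return False                # right-hand side uses only known symbols
    return True
```

The check only validates the shape of the grammar; it says nothing about whether the grammar is ambiguous or easy to parse.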
Parser
Context-free syntax can be put to better use
• This grammar defines simple expressions with addition
& subtraction over “number” and “id”.
• This grammar, like many, falls in a class called
“context-free grammars”, abbreviated CFG.
Parser
Derivation of x + 2 - y
Production   Result
             goal
1            expr
2            expr op term
5            expr op y
7            expr - y
2            expr op term - y
4            expr op 2 - y
6            expr + 2 - y
3            term + 2 - y
5            x + 2 - y
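The derivation above can be replayed mechanically: at each step the listed production rewrites the rightmost non-terminal. The sketch below assumes that reading, and it works on token types rather than lexemes, so it ends in `id + number - id` instead of `x + 2 - y`. The function name and data layout are hypothetical.

```python
# Productions of the expression grammar, keyed by production number.
P = {
    1: ("goal", ["expr"]),
    2: ("expr", ["expr", "op", "term"]),
    3: ("expr", ["term"]),
    4: ("term", ["number"]),
    5: ("term", ["id"]),
    6: ("op",   ["+"]),
    7: ("op",   ["-"]),
}
NONTERMINALS = {lhs for lhs, _ in P.values()}

def rightmost_derivation(start, sequence):
    """Apply each production to the rightmost non-terminal; return every sentential form."""
    form = [start]
    steps = [" ".join(form)]
    for number in sequence:
        lhs, rhs = P[number]
        # Position of the rightmost non-terminal in the current form.
        i = max(k for k, sym in enumerate(form) if sym in NONTERMINALS)
        if form[i] != lhs:
            raise ValueError(f"production {number} does not apply to {form[i]}")
        form[i:i + 1] = rhs
        steps.append(" ".join(form))
    return steps

steps = rightmost_derivation("goal", [1, 2, 5, 7, 2, 4, 6, 3, 5])
```

Printing `steps` one per line reproduces the Result column, with `id` and `number` standing in for the actual words.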
Parser
A parse can be represented by a tree (parse tree or syntax tree)
Parse tree for x + 2 - y (children indented under their parent):
goal
  expr
    expr
      expr
        term
          <id,x>
      op
        +
      term
        <number,2>
    op
      -
    term
      <id,y>
Parser
Compilers often use an abstract syntax tree
AST for x + 2 - y (children indented under their parent):
-
  +
    <id,x>
    <number,2>
  <id,y>
The AST summarizes grammatical structure, without including detail
about the derivation
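A hand-coded parser for the expression grammar that builds an AST directly might look like the sketch below. It assumes tokens arrive as (token type, lexeme) pairs and replaces the left-recursive rule `expr → expr op term` with a loop, which keeps the operators left-associative. The nested-tuple AST encoding is an assumption of this sketch.

```python
def parse(tokens):
    """Parse expr -> expr op term | term, term -> number | id, op -> + | -.

    Returns an AST of nested tuples: leaves are (type, lexeme) pairs and
    interior nodes are (operator, left subtree, right subtree).
    """
    pos = 0

    def term():
        nonlocal pos
        if pos < len(tokens) and tokens[pos][0] in ("number", "id"):
            leaf = tokens[pos]
            pos += 1
            return leaf
        found = tokens[pos] if pos < len(tokens) else "end of input"
        raise SyntaxError(f"expected number or id, found {found}")

    tree = term()
    # Loop instead of left recursion: each new operator wraps the tree so far.
    while pos < len(tokens) and tokens[pos][0] in ("+", "-"):
        op = tokens[pos][0]
        pos += 1
        tree = (op, tree, term())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]}")
    return tree

tokens = [("id", "x"), ("+", "+"), ("number", "2"), ("-", "-"), ("id", "y")]
ast = parse(tokens)
```

The result for x + 2 - y has `-` at the root with the `+` subtree as its left child, matching the AST shown on the slide.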
The Back End
Diagram: IR → Instruction Selection → IR → Register Allocation → IR → Instruction Scheduling → Machine code; each stage reports Errors
Responsibilities
• Translate IR into target machine code
• Choose instructions to implement each IR operation
• Decide which value to keep in registers
• Ensure conformance with system interfaces
Automation has been less successful in the back end
Instruction Selection
• Produce fast, compact code
• Take advantage of target features such as addressing modes
• Usually viewed as a pattern matching problem
– ad hoc methods, pattern matching, dynamic programming
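As a rough illustration of instruction selection as pattern matching, the sketch below walks an AST and emits code for a made-up target. The instruction names (`load`, `loadI`, `add`, `addI`, `sub`, `subI`), the `->` syntax, and the unlimited supply of virtual registers are all assumptions for this sketch, not a real instruction set. The one "pattern" it recognizes is a constant right operand, which it folds into an immediate-form instruction.

```python
def select(ast):
    """Return (instructions, result register) for an AST of nested tuples.

    Leaves are ("id", name) or ("number", text); interior nodes are
    (op, left, right) with op in {"+", "-"}.
    """
    code = []
    counter = 0

    def new_reg():
        nonlocal counter
        counter += 1
        return f"r{counter}"        # virtual register; allocation happens later

    def walk(node):
        kind = node[0]
        if kind == "id":
            reg = new_reg()
            code.append(f"load {node[1]} -> {reg}")
            return reg
        if kind == "number":
            reg = new_reg()
            code.append(f"loadI {node[1]} -> {reg}")
            return reg
        op, left, right = node
        name = "add" if op == "+" else "sub"
        left_reg = walk(left)
        # Pattern: a constant right operand fits an immediate-form instruction.
        if right[0] == "number":
            reg = new_reg()
            code.append(f"{name}I {left_reg}, {right[1]} -> {reg}")
            return reg
        right_reg = walk(right)
        reg = new_reg()
        code.append(f"{name} {left_reg}, {right_reg} -> {reg}")
        return reg

    result = walk(ast)
    return code, result

code, result = select(("-", ("+", ("id", "x"), ("number", "2")), ("id", "y")))
```

Without the immediate pattern the constant 2 would need its own `loadI` and an extra register, which is the kind of target feature a real selector tries to exploit.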
Register Allocation
• Have each value in a register when it is used
• Manage a limited set of resources
• Can change instruction choices & insert LOADs & STOREs
• Optimal allocation is NP-Complete (1 or k registers)
• Compilers approximate solutions to NP-Complete
problems
Instruction Scheduling
• Avoid hardware stalls and interlocks
• Use all functional units productively
• Can increase lifetime of variables
(changing the allocation)
Optimal scheduling is NP-Complete in nearly all cases
Heuristic techniques are well developed
Traditional Three-pass Compiler
Diagram: Source Code → Front End → IR → Middle End → IR → Back End → Machine code; each stage reports Errors
Code Improvement (or Optimization)
• Analyzes IR and rewrites (or transforms) IR
• Primary goal is to reduce running time of the compiled code
– May also improve space, power consumption, …
• Must preserve “meaning” of the code
– Measured by values of named variables
The Optimizer (or Middle End)
Diagram: IR → Opt 1 → IR → Opt 2 → IR → Opt 3 → IR → ... → Opt n → IR; each pass may report Errors
Modern optimizers are structured as a series of passes
Typical Transformations
• Discover & propagate some constant value
• Move a computation to a less frequently executed place
• Specialize some computation based on context
• Discover a redundant computation & remove it
• Remove useless or unreachable code
• Encode an idiom in some particularly efficient form
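Two of these transformations, constant propagation and removal of useless code, can be sketched as separate passes over a tiny straight-line IR. The IR format here, a list of `(dest, op, a, b)` tuples with `op` in `{"copy", "+", "-"}`, is invented for this sketch, and so are the function names. Real optimizers also handle control flow, which this example ignores.

```python
def propagate_constants(code):
    """Pass 1: replace operands with known constants and fold constant operations."""
    known = {}                              # variable name -> constant value
    out = []
    for dest, op, a, b in code:
        a = known.get(a, a)
        b = known.get(b, b)
        if op == "copy" and isinstance(a, int):
            known[dest] = a
        elif op in ("+", "-") and isinstance(a, int) and isinstance(b, int):
            a, op, b = (a + b if op == "+" else a - b), "copy", None
            known[dest] = a
        else:
            known.pop(dest, None)           # dest no longer holds a known constant
        out.append((dest, op, a, b))
    return out

def remove_dead_code(code, live_out):
    """Pass 2: drop instructions whose result is never used (backward scan)."""
    live = set(live_out)
    kept = []
    for dest, op, a, b in reversed(code):
        if dest in live:
            live.discard(dest)
            live.update(x for x in (a, b) if isinstance(x, str))
            kept.append((dest, op, a, b))
    kept.reverse()
    return kept

def optimize(code, live_out):
    """Run the passes in series, each one consuming and producing IR."""
    return remove_dead_code(propagate_constants(code), live_out)

code = [
    ("t1", "copy", 4, None),
    ("t2", "+", "t1", 2),
    ("t3", "-", "x", "t2"),
    ("t4", "+", "t1", "t1"),                # result is never used
]
optimized = optimize(code, live_out={"t3"})
```

Running the passes in this order matters: once the constants are propagated, the instructions that defined them become useless and the second pass can remove them.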
Modern Restructuring Compiler
Diagram: Source Code → Front End → HL AST → Restructurer → HL AST → IR Gen → IR → Opt + Back End → Machine code; stages report Errors
Typical Restructuring Transformations:
• Blocking for memory hierarchy and register reuse
• Vectorization
• Parallelization
• All based on dependence
• Also full and partial inlining
Discussion
Consider a simple web browser that takes as
input a textual string in HTML format and
displays the specified graphics on the screen.
Is the display process one of compilation or
interpretation? Why?
Next class
• Lexical analysis
• Chapter 2