Compiler Construction CS 606 Sohail Aslam Lecture 1 Course Organization General course information Homework and project information 2 General Information Instructor Sohail Aslam Lectures 45 Text Compilers – Principles, Techniques and Tools by Aho, Sethi and Ullman 3 Work Distribution Theory • homeworks = 20% • exams = 40% Practice • build a compiler = 40% 4 Project Implementation language: subset of java Generated code: Intel x86 assembly Implementation language: C++ Six programming assignments 5 Why Take this Course Reason #1: understand compilers and languages understand the code structure understand language semantics understand relation between source code and generated machine code become a better programmer 6 Why Take this Course Reason #2: nice balance of theory and practice Theory • mathematical models: regular expressions, automata, grammars, graphs • algorithms that use these models 7 Why Take this Course Reason #2: nice balance of theory and practice Practice • Apply theoretical notions to build a real compiler 8 Why Take this Course Reason #3: programming experience write a large program which manipulates complex data structures learn more about C++ and Intel x86 9 What are Compilers Translate information from one representation to another Usually information = program 10 Examples Typical Compilers: • VC, VC++, GCC, JavaC • FORTRAN, Pascal, VB(?) Translators • Word to PDF • PDF to Postscript 11 In This Course We will study typical compilation: from programs written in highlevel languages to low-level object code and machine code 12 Typical Compilation High-level source code Compiler Low-level machine code 13 Source Code int expr( int n ) { int d; d = 4*n*n*(n+1)*(n+1); return d; } 14 Source Code Optimized for human readability Matches human notions of grammar Uses named constructs such as variables and procedures 15 Assembly Code .globl _expr _expr: pushl %ebp movl %esp,%ebp subl $24,%esp movl 8(%ebp),%eax movl %eax,%edx leal 0(,%edx,4),%eax movl %eax,%edx imull 8(%ebp),%edx movl 8(%ebp),%eax incl %eax imull %eax,%edx movl 8(%ebp),%eax incl %eax imull %eax,%edx movl %edx,-4(%ebp) movl -4(%ebp),%edx movl %edx,%eax jmp L2 .align 4 L2: leave ret 16 Assembly Code Optimized for hardware Consists of machine instructions Uses registers and unnamed memory locations Much harder to understand by humans 17 How to Translate Correctness: the generated machine code must execute precisely the same computation as the source code 18 How to Translate Is there a unique translation? No! Is there an algorithm for an “ideal translation”? No! 19 How to Translate Translation is a complex process source language and generated code are very different Need to structure the translation 20