COMP2010 – Compilers
Introduction
Dr. Licia Capra
UCL/CS

ABOUT ME
Lecturer: Dr. Licia Capra
Room: 7.17, Malet Street Building
Phone: 020 7679 3708
Email: l.capra@cs.ucl.ac.uk
Office Hours: appointments via email
Web Page: http://www.cs.ucl.ac.uk/staff/l.capra
Research:
– Mobile & Pervasive Computing
– Recommender Systems
– Trust Management
– Content Sharing & Distribution
– Ontology & Tagging

ABOUT YOU
Students: 2nd Year Undergrads
– Any affiliate students?
– Any non-CS students?
Pre-requisites:
– 1007-1008 (Java Programming)
– 1001 Computer Architecture I (MIPS, SPIM MIPS simulator)
Emails:
– Use your CS/UCL email account to write to me
– Register for 2010@cs.ucl.ac.uk

ABOUT THE COURSE
Who:
– Lecturer (me)
– Teaching Assistant: Andrew Cox
When: Term 2, 30 lectures
– Wed 11am-1pm [Drayton Ricardo LT]
– Fri 10am-11am [MPEB 1.03]
Problem Classes:
– Tue 2pm-3pm [MPEB 1.02]
– Start 19/01
Course Website: http://www.cs.ucl.ac.uk/staff/l.capra/teaching/2010.html
Course Material:
– Lecture slides
– Exercises/Problem classes
– Code examples
Books (read at least one!):
– “Compilers – Principles, Techniques and Tools”, by A.V. Aho, R. Sethi, J.D. Ullman. Addison Wesley
– “Modern Compiler Implementation in Java”, by A.W. Appel. Cambridge University Press
– “Modern Compiler Design”, by D. Grune et al. John Wiley and Sons Ltd
– “Advanced Compiler Design and Implementation”, by S.S. Muchnick. Morgan Kaufmann
Other Books:
– “Linkers and Loaders”, by John R. Levine
– “Building an Optimizing Compiler”, by Robert Morgan
– “Advanced Compiling for High Performance”, by Kennedy
– “Object-Oriented Compiler Construction”, by Jim Holmes
Conferences (a few!):
– Int. Conf. on Compiler Construction (CC)
– Int. Conf. on Programming Languages and Compilers (PLC)
– Principles of Programming Languages (POPL)
– European Conf. on OO Programming (ECOOP)
– Int. Conf. on Functional Programming (ICFP)
– Int. Conf. on Logic Programming (ICLP)
– Parallel Architectures and Compilation Techniques (PACT)
SIG:
– ACM Special Interest Group on Programming Languages (SIGPLAN): http://www.acm.org/sigs/sigplan/
Tools:
– JLex: A Lexical Analyzer Generator for Java, http://www.cs.princeton.edu/~appel/modern/java/JLex/
– JFlex: The Fast Scanner Generator for Java, http://www.jflex.de/
– CUP: LALR Parser Generator for Java, http://www.cs.princeton.edu/~appel/modern/java/CUP/
Coursework (2 submissions, 1 overall mark):
– Part I: Parsing, due Wednesday 24/02
– Part II: AST and Semantic Analyser, due Wednesday 24/03
Actions to take NOW:
– Form groups of 3 or 4 people each
– Groups should email me as soon as they are formed
Assessment:
– 20% coursework
– 80% written examination (2.5 hours)

NOTE ON PLAGIARISM…

2010 GOALS
Understand:
– the code structure
– the language semantics
– the relationship between source and machine code
Learn:
– theory (mathematical models and algorithms)
– practice (applying the theory to build a real compiler)
Build:
– a compiler!

WHAT ARE COMPILERS?
Compilers translate a computer program from one language to another:
Source language (high-level language: Java, C, Pascal, …) → COMPILER → Target language (assembly language)
COMPILER → Error messages

WHY DO WE NEED COMPILERS?
– Programs written in assembly language are too difficult to write, debug and maintain
– Source code is optimised for human readability
– Machine code is optimised for hardware
Goal of compilers: translate a source code program into an equivalent machine code program efficiently.

TRANSLATION CORRECTNESS
Translation is a complex process:
– the source language and the target language are very different
Solution:
– split compilation into different phases
– move step by step from a language-specific to a machine-specific representation

SIMPLIFIED COMPILER STRUCTURE
Source code: if (b==0) a=b;
→ Analysis → Intermediate representation
→ Synthesis → Target code:
    CMP   CX,0
    CMOVZ DX,CX

ANALYSIS
Source code (character stream)
→ Lexical Analysis (Scanner) → Token stream
→ Syntax Analysis (Parser) → Abstract Syntax Tree (AST)
→ Semantic Analysis → Decorated AST

SYNTHESIS
Front-end (continued):
Decorated AST → Intermediate Code Generator → Intermediate code
Back-end:
Intermediate code → Optimiser → Intermediate code → Code Generator → Target program

LEXICAL ANALYSIS
Goal: recognise the words and symbols in the source program's character stream and group them into tokens
– Natural language: “I like classical music”
  Tokens: “I” “like” “classical” “music”
– Programming language: “if (b==0) a=b”
  Tokens: “if” “(” “b” “==” “0” “)” “a” “=” “b”

SYNTAX ANALYSIS
Goal: recognise the phrase structure, i.e. how tokens combine into larger grammatical units
– Natural language: “I like classical music”
  Word classes: I (noun), like (verb), classical (adj.), music (noun)
  Structure: [sentence [subject I] [predicate like [object classical music]]]
– Programming language: if (b==0) a=b
  Structure: [If-statement if ( [test b==0] ) [assignment a=b]]

SEMANTIC ANALYSIS
Goal: check whether the source program is semantically valid, i.e. meaningful and not just well-formed
– Natural language: “Classical music likes I”
  classical (adj.), music (noun), likes (verb), I (noun): the syntax is correct, but the semantics is wrong
– Programming language: if (b==0) a="foo"
  Structure: [If-statement if ( [test b==0] ) [assignment a="foo"]]
  If ‘a’ is an integer, semantic analysis reports an error, because a string cannot be assigned to an integer variable.
(end of analysis)

ERROR HANDLING AND SYMBOL TABLE
Source code (character stream)
→ Lexical Analysis (Scanner) → Token stream
→ Syntax Analysis (Parser) → Abstract Syntax Tree (AST)
→ Semantic Analysis → Decorated AST
All three phases share the Symbol Table and report problems to the Error Handler.

INTERMEDIATE CODE GENERATOR
Goal: create intermediate code that is
– easier to optimise than binary code
– portable
Example (3-address code):
CJUMP(b==0,L1,L2)
LABEL(L1)
a=b
LABEL(L2)
…
(end of front-end)

OPTIMISER
Goal: transform the intermediate code so that it runs faster and/or uses less space
Example:
Intermediate code        Optimised intermediate code
CJUMP(b==0,L1,L2)        CJUMP(b==0,L1,L2)
LABEL(L1)                LABEL(L1)
a=b                      a=0
LABEL(L2)                LABEL(L2)
The code at L1 runs only when the test b==0 succeeds, so b is known to be 0 there. The copy a=b can therefore be replaced by the constant assignment a=0.

CODE GENERATOR
Goal: generate the target program (usually assembly code) from the optimised intermediate code
Example:
CMP   ECX,0
CMOVZ [EBP+8],0
(end of synthesis)
(end of back-end)
(end of compilation)

OVERALL COMPILER STRUCTURE
Source code (character stream)
→ Lexical Analysis (Scanner) → Token stream
→ Syntax Analysis (Parser) → Abstract Syntax Tree (AST)
→ Semantic Analysis → Decorated AST
→ Intermediate Code Generation → Intermediate code
→ Intermediate Code Optimisation → Optimised intermediate code
→ Code Generation → Target code
(The Symbol Table and the Error Handler are shared by all phases)

PHASES AND PASSES
Good compilers must be FAST!
Phases 1-4 are executed in 1 pass. The parser sits at the centre: it requests tokens from the scanner and hands each recognised phrase to the semantic analyser and the intermediate code generator. In this way the source program becomes intermediate code in a single sweep.
– Phases 1-4 (scanning, parsing, semantic analysis, intermediate code generation): 1 pass
– Phase 5 (optimisation): 1 or more passes
– Phase 6 (code generation): 1 pass
– Total: 3 or more passes
Unlike phases, passes produce real output: each pass writes out a complete representation of the program (e.g. intermediate code) for the next pass to read.

COMPILERS VS. INTERPRETERS
Compiler:
Source program → Compiler → Executable program
Data → Executable program → Program output
Interpreter:
Source program + Data → Interpreter → Program output
Hybrid approach:
Source program → P-code Compiler → P-code program
P-code program + Data → P-code Interpreter → Program output
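The hybrid approach above can be sketched in a few lines of Java. This is a minimal illustration that assumes Java 17 or later. The class name `PCodeMachine` and its six-instruction set (`PUSH`, `LOAD`, `STORE`, `EQ`, `JZ`, `PRINT`) are hypothetical choices for this sketch, not a standard P-code. The `EXAMPLE` program is written by hand. It stands in for what a P-code compiler might emit for `if (b==0) a=b` followed by printing `a`, and the interpreter runs it on different input data.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the hybrid approach: a tiny, hypothetical stack-based P-code and its interpreter.
final class PCodeMachine {

    // JZ pops a value and jumps to the instruction index in 'arg' if that value is 0.
    enum Op { PUSH, LOAD, STORE, EQ, JZ, PRINT }

    // One instruction. 'arg' holds a constant (PUSH), a variable name (LOAD, STORE)
    // or a target instruction index (JZ); it is null for EQ and PRINT.
    record Instr(Op op, String arg) {
        static Instr of(Op op) { return new Instr(op, null); }
        static Instr of(Op op, Object arg) { return new Instr(op, String.valueOf(arg)); }
    }

    // What a P-code compiler might emit for: if (b==0) a=b; print a
    static final List<Instr> EXAMPLE = List.of(
            Instr.of(Op.LOAD, "b"),    // index 0
            Instr.of(Op.PUSH, 0),      // index 1
            Instr.of(Op.EQ),           // index 2: pushes 1 if b==0, else 0
            Instr.of(Op.JZ, 6),        // index 3: skip the assignment when the test is false
            Instr.of(Op.LOAD, "b"),    // index 4
            Instr.of(Op.STORE, "a"),   // index 5
            Instr.of(Op.LOAD, "a"),    // index 6
            Instr.of(Op.PRINT));       // index 7

    // Interprets a P-code program on the given input data and returns the printed values.
    static List<Integer> run(List<Instr> program, Map<String, Integer> data) {
        Map<String, Integer> vars = new HashMap<>(data);
        Deque<Integer> stack = new ArrayDeque<>();
        List<Integer> output = new ArrayList<>();
        int pc = 0;                                            // index of the next instruction
        while (pc < program.size()) {
            Instr ins = program.get(pc++);
            switch (ins.op()) {
                case PUSH -> stack.push(Integer.parseInt(ins.arg()));
                case LOAD -> {
                    Integer value = vars.get(ins.arg());
                    if (value == null) throw new IllegalStateException("undefined variable " + ins.arg());
                    stack.push(value);
                }
                case STORE -> vars.put(ins.arg(), stack.pop());
                case EQ -> stack.push(stack.pop().equals(stack.pop()) ? 1 : 0);
                case JZ -> { if (stack.pop() == 0) pc = Integer.parseInt(ins.arg()); }
                case PRINT -> output.add(stack.pop());
            }
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(run(EXAMPLE, Map.of("a", 5, "b", 0)));   // [0]
        System.out.println(run(EXAMPLE, Map.of("a", 5, "b", 3)));   // [5]
    }
}
```

The same P-code program runs unchanged on any machine that has the interpreter, which is the portability advantage of the hybrid approach. The price is the interpretation overhead paid on every instruction executed.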
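The LEXICAL ANALYSIS slide can be made concrete with a hand-written scanner. The Java sketch below assumes Java 17 or later and a hypothetical token set: the keyword `if`, identifiers, integer literals and the symbols `(`, `)`, `==`, `=`, `;`. `TinyScanner` and its methods are illustrative names only. A generator such as JLex or JFlex would normally produce this kind of code from regular-expression rules instead.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal hand-written scanner for a hypothetical subset of a C/Java-like language.
final class TinyScanner {

    enum Kind { KEYWORD, IDENT, INT, SYMBOL }

    record Token(Kind kind, String text) {
        @Override
        public String toString() { return kind + "(" + text + ")"; }
    }

    static List<Token> scan(String src) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {             // whitespace separates tokens but is not one
                i++;
            } else if (Character.isLetter(c)) {           // identifier or keyword
                int start = i;
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
                String word = src.substring(start, i);
                tokens.add(new Token(word.equals("if") ? Kind.KEYWORD : Kind.IDENT, word));
            } else if (Character.isDigit(c)) {            // integer literal
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                tokens.add(new Token(Kind.INT, src.substring(start, i)));
            } else if (src.startsWith("==", i)) {         // longest match: "==" wins over "="
                tokens.add(new Token(Kind.SYMBOL, "=="));
                i += 2;
            } else if ("()=;".indexOf(c) >= 0) {          // single-character symbols
                tokens.add(new Token(Kind.SYMBOL, String.valueOf(c)));
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "' at index " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints: [KEYWORD(if), SYMBOL((), IDENT(b), SYMBOL(==), INT(0), SYMBOL()), IDENT(a), SYMBOL(=), IDENT(b)]
        System.out.println(scan("if (b==0) a=b"));
    }
}
```

The token texts match the slide's list. Note the longest-match rule: `==` is tried before `=`, so `b==0` yields a single `==` token rather than two `=` tokens.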
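For the SYNTAX ANALYSIS slide, a recursive-descent parser can recognise the phrase structure of `if (b==0) a=b` and build an AST. This sketch assumes Java 17 or later and a hypothetical three-rule grammar, shown in the opening comment. `TinyParser` and its node records are illustrative names, and the regex tokenizer is a crude stand-in for a real scanner. Recursive descent is a top-down technique. It differs from the LALR parsers that CUP (listed under Tools) generates from a grammar specification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Recursive-descent parser for a hypothetical grammar:
//   stmt    -> "if" "(" expr ")" stmt | IDENT "=" expr
//   expr    -> primary [ "==" primary ]
//   primary -> IDENT | INT
final class TinyParser {

    // AST node types; records provide structural equals() and toString().
    interface Node {}
    record If(Node test, Node then) implements Node {}
    record Assign(String target, Node value) implements Node {}
    record Eq(Node left, Node right) implements Node {}
    record Var(String name) implements Node {}
    record Num(int value) implements Node {}

    // Crude regex tokenizer so the sketch is self-contained; characters that match
    // no alternative (such as spaces) are silently skipped by Matcher.find().
    private static final Pattern TOKEN = Pattern.compile("==|[()=;]|\\w+");
    private static final String IDENT = "[A-Za-z_]\\w*";

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private TinyParser(String src) {
        Matcher m = TOKEN.matcher(src);
        while (m.find()) tokens.add(m.group());
    }

    static Node parse(String src) {
        TinyParser p = new TinyParser(src);
        Node stmt = p.stmt();
        if (p.peek().equals(";")) p.pos++;                  // optional trailing semicolon
        if (p.pos != p.tokens.size()) throw new IllegalStateException("unexpected '" + p.peek() + "'");
        return stmt;
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : "<eof>"; }

    private void expect(String t) {
        if (!peek().equals(t)) throw new IllegalStateException("expected '" + t + "' but found '" + peek() + "'");
        pos++;
    }

    private Node stmt() {
        if (peek().equals("if")) {                          // stmt -> "if" "(" expr ")" stmt
            expect("if");
            expect("(");
            Node test = expr();
            expect(")");
            return new If(test, stmt());
        }
        String target = peek();                             // stmt -> IDENT "=" expr
        if (!target.matches(IDENT)) throw new IllegalStateException("expected identifier but found '" + target + "'");
        pos++;
        expect("=");
        return new Assign(target, expr());
    }

    private Node expr() {                                   // expr -> primary [ "==" primary ]
        Node left = primary();
        if (!peek().equals("==")) return left;
        pos++;
        return new Eq(left, primary());
    }

    private Node primary() {                                // primary -> IDENT | INT
        String t = peek();
        if (t.matches("\\d+")) { pos++; return new Num(Integer.parseInt(t)); }
        if (t.matches(IDENT)) { pos++; return new Var(t); }
        throw new IllegalStateException("unexpected '" + t + "'");
    }

    public static void main(String[] args) {
        System.out.println(parse("if (b==0) a=b"));         // prints the nested If/Eq/Assign records
    }
}
```

Each grammar rule becomes one method, and the shape of the call tree mirrors the phrase structure drawn on the slide: an if-statement whose test is `b==0` and whose body is the assignment `a=b`.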
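The SEMANTIC ANALYSIS slide's example is `if (b==0) a="foo"` with `a` declared as an integer. A small type-checking pass can check it. In this sketch (Java 17 or later) the AST is built by hand rather than parsed. A `Map` plays the role of the symbol table, and a list of messages stands in for the error handler. `TinyChecker`, its node records and the three types `INT`, `STRING` and `BOOL` are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of a semantic-analysis (type-checking) pass over a tiny, hypothetical AST.
final class TinyChecker {

    enum Type { INT, STRING, BOOL }

    interface Node {}
    record If(Node test, Node then) implements Node {}
    record Assign(String target, Node value) implements Node {}
    record Eq(Node left, Node right) implements Node {}
    record Var(String name) implements Node {}
    record IntLit(int value) implements Node {}
    record StrLit(String value) implements Node {}

    private final Map<String, Type> symbols;                 // symbol table: name -> declared type
    private final List<String> errors = new ArrayList<>();    // stands in for the error handler

    private TinyChecker(Map<String, Type> symbols) { this.symbols = symbols; }

    // Checks one statement against the symbol table and returns the error messages found.
    static List<String> check(Node program, Map<String, Type> symbols) {
        TinyChecker checker = new TinyChecker(symbols);
        checker.stmt(program);
        return checker.errors;
    }

    private void stmt(Node n) {
        if (n instanceof If ifStmt) {
            if (type(ifStmt.test()) != Type.BOOL) errors.add("if-test is not boolean");
            stmt(ifStmt.then());
        } else if (n instanceof Assign assign) {
            Type declared = lookup(assign.target());
            Type actual = type(assign.value());
            if (declared != null && actual != null && declared != actual) {
                errors.add("cannot assign " + actual + " to '" + assign.target() + "' of type " + declared);
            }
        } else {
            errors.add("not a statement: " + n);
        }
    }

    // Computes the type of an expression; returns null when an error has already been reported.
    private Type type(Node n) {
        if (n instanceof IntLit) return Type.INT;
        if (n instanceof StrLit) return Type.STRING;
        if (n instanceof Var v) return lookup(v.name());
        if (n instanceof Eq eq) {
            Type left = type(eq.left());
            Type right = type(eq.right());
            if (left != null && right != null && left != right) errors.add("cannot compare " + left + " with " + right);
            return Type.BOOL;
        }
        errors.add("not an expression: " + n);
        return null;
    }

    private Type lookup(String name) {
        Type t = symbols.get(name);
        if (t == null) errors.add("undeclared variable '" + name + "'");
        return t;
    }

    public static void main(String[] args) {
        Map<String, Type> symbols = Map.of("a", Type.INT, "b", Type.INT);
        // AST for: if (b==0) a="foo"   (built by hand; no parser in this sketch)
        Node program = new If(new Eq(new Var("b"), new IntLit(0)), new Assign("a", new StrLit("foo")));
        System.out.println(check(program, symbols));      // [cannot assign STRING to 'a' of type INT]
    }
}
```

Collecting messages instead of stopping at the first error is one common design choice. It lets the checker report several independent problems in a single run.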
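The INTERMEDIATE CODE GENERATOR and OPTIMISER slides can be reproduced with a short Java sketch (Java 17 or later). `TinyIR`, `genIfEq` and `propagateBranchConstants` are hypothetical names, and the instruction records only mimic the slide's `CJUMP`/`LABEL` notation. The optimisation shown is one narrow case of constant propagation. When a label is reached only through the true branch of `CJUMP(x==k,...)`, later copies from `x` are rewritten to use `k`. This continues until the next label or a redefinition of `x`.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: intermediate code generation for "if (x == k) body" plus one simple optimisation.
final class TinyIR {

    interface Instr {}
    record CJump(String variable, int constant, String ifTrue, String ifFalse) implements Instr {
        @Override public String toString() { return "CJUMP(" + variable + "==" + constant + "," + ifTrue + "," + ifFalse + ")"; }
    }
    record Label(String name) implements Instr {
        @Override public String toString() { return "LABEL(" + name + ")"; }
    }
    record Move(String dst, String src) implements Instr {    // dst=src; src is a variable or an integer literal
        @Override public String toString() { return dst + "=" + src; }
    }

    private int labelCount = 0;

    private String freshLabel() { return "L" + (++labelCount); }

    // Intermediate code for: if (variable == constant) body
    List<Instr> genIfEq(String variable, int constant, Instr body) {
        String t = freshLabel();
        String f = freshLabel();
        return List.of(new CJump(variable, constant, t, f), new Label(t), body, new Label(f));
    }

    // In code reached only through the true branch of CJUMP(x==k, ...), x equals k,
    // so a copy "y=x" can become "y=k".
    static List<Instr> propagateBranchConstants(List<Instr> code) {
        List<Instr> out = new ArrayList<>();
        String knownVar = null;
        String knownValue = null;
        for (int i = 0; i < code.size(); i++) {
            Instr ins = code.get(i);
            if (ins instanceof Label lbl) {
                knownVar = null;                                    // a label may be reached from elsewhere
                if (i > 0 && code.get(i - 1) instanceof CJump cj
                        && cj.ifTrue().equals(lbl.name()) && jumpsTo(code, lbl.name()) == 1) {
                    knownVar = cj.variable();                       // the only entry is the true branch
                    knownValue = Integer.toString(cj.constant());
                }
                out.add(ins);
            } else if (ins instanceof Move mv) {
                String src = mv.src().equals(knownVar) ? knownValue : mv.src();
                if (mv.dst().equals(knownVar)) knownVar = null;     // variable redefined: the fact no longer holds
                out.add(new Move(mv.dst(), src));
            } else {
                knownVar = null;                                    // be conservative with anything else
                out.add(ins);
            }
        }
        return out;
    }

    // Counts how many jumps name the given label as a target.
    private static int jumpsTo(List<Instr> code, String label) {
        int n = 0;
        for (Instr ins : code) {
            if (ins instanceof CJump cj && (cj.ifTrue().equals(label) || cj.ifFalse().equals(label))) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        List<Instr> ir = new TinyIR().genIfEq("b", 0, new Move("a", "b"));
        System.out.println(ir);                              // [CJUMP(b==0,L1,L2), LABEL(L1), a=b, LABEL(L2)]
        System.out.println(propagateBranchConstants(ir));    // [CJUMP(b==0,L1,L2), LABEL(L1), a=0, LABEL(L2)]
    }
}
```

The check that the true label has exactly one incoming jump keeps the rewrite safe. If another jump could also reach L1, the fact b==0 would no longer be guaranteed there.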