Crafting a Compiler with C Chapter 1 Introduction Teacher : Dr. Lawrence Y. Deng contact: Lawrence@mail.sju.edu.tw Grading: Attendance 10%, Midterm exam. 30%, Homework 20%, Final exam. + Team project 40% Textbook: “Crafting a compiler with C” C. N. Fischer and R. J. LeBlanc, Jr.1991, The Benjamin/Cummings Publishing Company Reference “Compilers Principles, Techniques, and Tools”,A. V. Aho, R. Sethi, J. D. Ullman, 2006 2th, Addison Weslay 1 Outlines • • • • • • • 1.1 Overview and History 1.2 What Do Compilers Do? 1.3 The Structure of a Compiler 1.4 The Syntax and Semantics of Programming Languages 1.5 Compiler Design and Programming Language Design 1.6 Compiler Classifications 1.7 Influences on Computer Design 2 Overview and History • Compilers are fundamental to modern computing. • They act as translators, transforming human-oriented programming languages into computer-oriented machine languages. Programming Language (Source) Compiler Machine Language (Target) 3 Compilers are fundamental to modern computing: • Natural Language Processing –資訊檢索、機器翻譯、智慧型介面(介面設計)、寫 作輔助、語音辨識與合成、OCR –網頁的生成、檢索、擷取、多媒體性、及多語言性 –手語、行為偵測、網路監控、遊戲對奕 –內容管理 4 Reference Technologies • Finite State Machine Network • Data Classification • Web Content Structure/Content/Usage Mining 5 Overview and History (Cont’d) • The first real compiler – FORTRAN compilers of the late 1950s, IBM – 18 person-years to build • Compiler technology is more broadly applicable and has been employed in rather unexpected areas. – Text-formatting languages, like nroff and troff; preprocessor packages like eqn, tbl, pic – Silicon compiler for the creation of VLSI circuits – Command languages of OS – Query languages of Database systems • efficient? (compared with assembly program) – not bad, but much easier to write – high-level languages are feasible. • • • 18 man-year, ad hoc structure Today, we can build a simple compiler in a few month. Crafting an efficient and reliable compiler is still challenging. 6 1.2 What Do Compilers Do? • Compilers may be distinguished according to the kind of target code they generate: – Pure Machine Code – Augmented Machine Code – Virtual Machine Code • JVM, P-code 7 What Do Compilers Do? (Cont’d) • Another way that compilers differ from one another is in the format of the target machine code they generate – Assembly Language Format – Relocatable Binary Format • A linkage step is required – Memory-Image (Load-and-Go) Format 8 What Do Compilers Do? (Cont’d) • Another kind of language processor, called an interpreter, differs from a compiler in that it executes programs without explicitly performing a translation Interpreter Source Program Encoding Output Data 9 1.3 The Structure of a Compiler • Any compiler must perform two major tasks – Analysis of the source program – Synthesis of a machine-language program 10 11 The Structure of a Compiler (Cont’d) Source Program Tokens Scanner (Character Stream) Syntactic Parser Semantic Structure Routines Intermediate Representation Symbol and Attribute Tables (Used by all Phases of The Compiler) The structure of a Syntax-Directed Compiler Optimizer Code Generator Target Machine Code 12 13 the structure of a compiler source program a:=b*2+0 scanner id:=id*int+int parser syntax trees symbol table load b shL 1 sta a semantic routine optimizer code generator 1. type are correct load b mul 2 add 0 sta a oo1F 2b4A 309F target code 14 The Structure of a Compiler (Cont’d) • Scanner – The scanner begins the analysis of the source program by reading the input, character by character, and grouping characters into individual words and symbols (tokens) – The tokens are encoded and then are fed to the parser for syntactic analysis – For details, see the bottom of page 8. • Scanner generators 15 The Structure of a Compiler (Cont’d) • Parser – Given a formal syntax specification (typically as a context-free [CFG] grammar), the parse reads tokens and groups them into units as specified by the productions of the CFG being used. – While parsing, the parser verifies correct syntax, and if a syntax error is found, it issues a suitable diagnostic. – As syntactic structure is recognized, the parser either calls corresponding semantic routines directly or builds a syntax tree. 16 <2> Parser: identify syntactic structure - Syntax is specified by grammars,e.g., IfStmt == if ( Exp ) then Stmt Exp == … - Parsers can be generated from grammars. yacc or llgen grammar parser - Error-repair, e.g., ")" is missing if ( a > 3 then - Parsers may build a syntax tree or it may call semantic routines directly. 17 The Structure of a Compiler (Cont’d) • Semantic Routines – Perform two functions • Check the static semantics of each construct • Do the actual translation – The heart of a compiler • Optimizer – The IR code generated by the semantic routines is analyzed and transformed into functionally equivalent but improved IR code. – This phase can be very complex and slow – Peephole optimization 18 <3> Semantic routines - Check that variables are defined, that types are correct, … , etc. Ex. x := x + true - Generate IR (3-addr code), e.g., ADD A, B, C - hand-coded - associated with grammar rules - formalized as attribute grammars (AG) 19 • peephole optimization: – simple and effective – Ex 1. STORE R1, A • – Ex 2. – Ex 3. LOAD A, R1 MUL R1, #1 ADD R1, #0 – These innocent code can be actually generated by a naive compiler. 20 The Structure of a Compiler (Cont’d) • One-pass compiler – No optimization is required – To merge code generation with semantic routines and eliminate the use of an IR • Compiler writing tools – Compiler generators or compiler-compilers • E.g. scanner and parser generators 21 1.4 Compiler Design and Programming Language Design • An interesting aspect is how programming language design and compiler design influence one another. • Programming languages that are easy to compiler have many advantages – See the 2nd paragraph of page 16. 22 1.5 Compiler Design and Programming Language Design (Cont’d) • Languages such as Snobol and APL are usually considered noncompilable • What attributes must be found in a programming language to allow compilation? – Can the scope and binding of each identifier reference be determined before execution begins – Can the type of object be determined before execution begins? – Can existing program text be changed or added to during execution? 23 1.6 Compiler Classifications • Diagnostic compilers Report and repair compile-time errors. - Add run-time checks, e.g., array subscripts. - should be used in real world. - vs. production compiler • Optimizing compilers - for production use - 'optimal' is suspicious because 1. theoretically undecidable 2. practically infeasible - Perform a lot of transformations with unknown effects. Ex. place data in registers • › fast access › slow procedure call Retargetable compiler Localize machine dependence. - difficult to implement - less efficient object code 24