Systems Software - Compilers and Assemblers Oct-06
Damien Costello, Dept of Computing & Maths, GMIT

Compilers
• design problem
  – express every high level language program in machine instructions
  – high level source language => large distance to machine language
  – therefore use intermediate language in translation process (2 steps not 1)

Compilers
• Translates a source language program into a low level object language
  – May be the assembly language or the machine language of that particular computer.
    • If the object file is assembler, then an assembler converts it to machine language.
  – A Linker then combines the object files with libraries to create the final executable file.

Compilers
• Difference between Interpreters and Compilers
  – Interpreter works similar to us in figuring out what the code does.
    • Check for syntax errors, locate the start and then execute the statements.
  – May translate the source into an internal intermediate code that it can execute more efficiently.
  – May execute the source program statements directly.
  – Compiler translates the source to object after checking for errors. Can be either assembler or machine language.
• Net result – interpreter translates the program into actions specified by the code.

Compilers
• What about speed?
  – Interpreter works as we do.
    • Each time, it has to figure out what statements mean.
  – Compilers generate machine code which is executed at top speed.
    • Up to 100 times faster than an interpreter.

Compilers
• What about logic errors (divide by zero)?
  – Interpreter may stop and ask for corrective action, show you where the problem is and so on.
  – Compilers generate the object code which then runs independently, may simply abort on error.
    • Debuggers have helped a lot.
  – However, the statements required to keep the environment informed may affect the efficiency of the code
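The contrast drawn on these slides between interpreting and compiling can be illustrated with a minimal Python sketch. It is not taken from the slides: the tuple-based expression tree, the stack-machine instruction names (PUSH, ADD, SUB, MUL, DIV) and the function names `interpret`, `compile_expr` and `run` are all assumptions made for illustration. The interpreter works out what each node means every time it runs and can point at the offending source on a divide by zero; the compiler translates once into instructions that then run without consulting the source.

```python
# Expression trees are nested tuples: ("num", 3) or (op, left, right).
# Instruction names and helper names below are hypothetical.
OPS = {
    "+": ("ADD", lambda a, b: a + b),
    "-": ("SUB", lambda a, b: a - b),
    "*": ("MUL", lambda a, b: a * b),
    "/": ("DIV", lambda a, b: a / b),
}


def interpret(node):
    """Interpreter: work out what each node means every time it is executed."""
    if node[0] == "num":
        return node[1]
    op, left, right = node
    a, b = interpret(left), interpret(right)
    if op == "/" and b == 0:
        # The source structure is still at hand, so the error can point at it.
        raise ZeroDivisionError(f"divide by zero in {node!r}")
    return OPS[op][1](a, b)


def compile_expr(node, code=None):
    """Compiler: translate once into instructions for a stack machine."""
    if code is None:
        code = []
    if node[0] == "num":
        code.append(("PUSH", node[1]))
    else:
        op, left, right = node
        compile_expr(left, code)    # operands first (post-order) ...
        compile_expr(right, code)
        code.append((OPS[op][0], None))  # ... then the operator
    return code


def run(code):
    """Execute compiled instructions; the source tree is no longer consulted."""
    table = {name: fn for name, fn in OPS.values()}
    stack = []
    for instr, arg in code:
        if instr == "PUSH":
            stack.append(arg)
        else:
            b = stack.pop()
            a = stack.pop()
            stack.append(table[instr](a, b))
    return stack.pop()


tree = ("*", ("-", ("num", 8), ("num", 2)), ("num", 3))  # (8 - 2) * 3
print(interpret(tree))          # 18
print(run(compile_expr(tree)))  # 18
```

In this sketch a divide by zero in compiled code surfaces only as a bare runtime error from `run`, whereas `interpret` reports the expression that caused it, which mirrors the trade-off described on the logic errors slide.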
Compilers
• An interpreter does not produce an object program.
• Would be nice to have both an interpreter and a compiler for a given language.

Compilers - Analysis-Synthesis
– Reads the source program
– analysis
  • recognise structure and meaning of source
– synthesis
  • construct the desired target

Analysis-Synthesis
• related view
• Front-end
  – source-language-dependent
  – target-language-independent
• Back-end
  – Interpreter – execute program
  – Compiler – generate object code
  – source-language-independent
  – target-language-dependent

Compilers
• performs translation of high level language to intermediate or machine language
• reads program in source language and translates to target language
• target program may need further processing
• compiler generates assembly code
• translated by an assembler to relocatable code
• linked with object files from libraries
• final code can run on machine

[Figure: source → compiler → assembly → assembler → relocatable machine code → linker/loader (with object files from libraries) → absolute machine code]

Compiler Components
• Parser
  – Knows the syntax of the source language.
    • Grammar and rules
  – It controls the translation process.
  – Sends “get” messages to the scanner object.
• Code generator
  – constructs the target machine code
• Constrainer
  – Helps to enforce type and declaration rules
  – Adds to symbol table
    • semantic information about identifiers
• Scanner
  – reads the source as string of characters
  – recognises streams of words and symbols (tokens)
  – Starts building the symbol table
    • character string constants, spelling of identifiers
  – often considered part of parser as they work closely

[Figure: Compiler Components, Front End and Back End. Boxes: Source Buffer, Scanner, Parser, Constrainer, Code Generator, Object Buffer, Symbol Table, List Buffer. Data passed: Token, ICode, Machine Code. Messages: GetChar, Get, Put, Go, Enter, Search, PutLine.]

Compiler Components
• List buffer
  – for source listings, error messages and other printed information
• Symbol Table
  – used to keep track of information about certain tokens (identifiers, function calls).
• Icode – intermediate code
  – a predigested version of the source.
  – Taken with the Symbol Table, this provides a clean interface between the front and back ends.

Compiler Components
• can also include one or more optimising modules
• two kinds of optimisation
  – depend on characteristics of target machine code
• Machine Independent Optimisation
  – operates on the Icode (sometimes called the Abstract Syntax Tree)
    • reshaping for more efficient code
• Peephole Optimisation
  – most common form of machine dependent optimiser
    • operates on machine code for local improvements
    • small number of instructions considered

Language Definition
• programmer and compiler writer need strict definition of high level language
• programmer
  – consult language definition to find out constructs and meanings
• compiler writer
  – provide for every construct a translation according to its meaning

Language Definition
• some sequences of words are correct
• others are incorrect or ill formed
• grammar or syntax
  – defines which sequences are correct
  – set of rules that define how words can be arranged to form sentences
  – rules provide every sentence with structure
• can be used as an instrument to recognise the structure of sentence

Language Definition
• semantics
  – define meaning of well formed sentences
  – give meaning of every language structure recognised by syntax
• pragmatics
  – characteristics of specific implementation
    • restrictions of implementation of language

Syntax
• consider English sentences
  – subject, verb, object
• notions of sentence subject verb object article noun
  – play a role in the language description
  – denote parts of sentence
  – syntactic categories
• in formal languages - tokens

Syntax
• sentence denotes notion
  – set of all strings of tokens that satisfy definition of sentence
  – defined using rewriting rules
    sentence → subject verb object
    subject → article noun
    verb → bites
    object → article noun
    article → a | the
    noun → man | dog
• choices denoted by vertical bar

Syntax
• syntactic categories + tokens = grammar symbols
• language definition described is a generation scheme
  – sentence generated by starting at “sentence” and successively applying rewriting rules

[Figure: parse tree for "the dog bites a man". sentence → subject (article "the", noun "dog"), verb ("bites"), object (article "a", noun "man")]

Syntax
• interior node and children correspond to rewriting rule
• tokens (a the dog man bites) are terminal symbols
  – end of generation process
• sentence, subject, object are non terminals
• rewriting rules are called production rules
  – produce sentences of the language

Syntax
• sentence
  – start symbol
  – distinguished nonterminal where generation process starts
• context free grammar
  – productions apply in any context in which the nonterminals occur
• language is context free
  – defined by means of context free grammar

Syntax
• Context free grammar consists of
  – set of terminals or tokens
    • (representation of tokens in sentences)
  – set of nonterminals
    • do not occur in sentences
  – start symbol
  – set of production rules
    • left side and right side
    • string containing zero tokens - empty string

Syntax
• in high level language
  – need semantics of every language structure
  – cannot assume to know the meanings of words
  – evaluation of arithmetic expressions
    • define precedence for every operator
    • define result type for every combination of operand types

Parse Trees
• depicts how a string in the language is derived from start symbol
• parse tree properties
  – root is labelled by the start symbol
  – leaf is labelled by a token
  – interior node is labelled by nonterminal
  – leaves, spelled from left to right
    • yield of tree, generated or derived
• ambiguity ( 3 - 2 + 1)
  – where expression can have more than one parse tree
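The generation scheme from the Syntax slides and the ambiguity example ( 3 - 2 + 1) can both be tried out in a short Python sketch. This is an illustration under stated assumptions, not part of the original notes: the dictionary encoding of the rewriting rules, the function names `generate`, `parse_trees` and `evaluate`, and the ambiguous grammar `expr → expr op expr | number` are all chosen here for demonstration. The first part derives every sentence from the start symbol; the second enumerates every parse tree for a token string and shows that different trees give different values.

```python
from itertools import product

# The rewriting rules from the slides; each right side is a list of symbols.
GRAMMAR = {
    "sentence": [["subject", "verb", "object"]],
    "subject": [["article", "noun"]],
    "verb": [["bites"]],
    "object": [["article", "noun"]],
    "article": [["a"], ["the"]],
    "noun": [["man"], ["dog"]],
}


def generate(symbol):
    """Return every token string that can be derived from `symbol`."""
    if symbol not in GRAMMAR:  # terminal symbol: end of generation process
        return [[symbol]]
    results = []
    for right_side in GRAMMAR[symbol]:
        # Combine every expansion of each symbol on the right side.
        for parts in product(*(generate(s) for s in right_side)):
            results.append([tok for part in parts for tok in part])
    return results


def parse_trees(tokens):
    """All parse trees under the (assumed) rule expr -> expr op expr | number."""
    if len(tokens) == 1:
        return [tokens[0]]  # a leaf is labelled by a token
    trees = []
    for i in range(1, len(tokens), 2):  # try each operator as the root
        for left in parse_trees(tokens[:i]):
            for right in parse_trees(tokens[i + 1:]):
                trees.append((left, tokens[i], right))
    return trees


def evaluate(tree):
    """Evaluate a parse tree built from + and - over integers."""
    if isinstance(tree, str):
        return int(tree)
    left, op, right = tree
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "+" else a - b


sentences = generate("sentence")
print(len(sentences))                                # 16
print("the dog bites a man".split() in sentences)    # True

for tree in parse_trees("3 - 2 + 1".split()):
    print(tree, "=", evaluate(tree))
# ('3', '-', ('2', '+', '1')) = 0
# (('3', '-', '2'), '+', '1') = 2
```

Both trees have the same yield, 3 - 2 + 1, but different values, which is why a language definition has to fix precedence and associativity for every operator.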