52 358 Compilers: course overview and key skills outcomes
Richard Connor

This document provides an overview of all the lecture material; in particular, it emphasises what you should be able to do as a result of the material. This is a practical course that concentrates on transferable general skills rather than abstract knowledge. Note that this is in contrast with many similar courses and most textbooks: pass the course by understanding these notes, not by reading textbooks. If you can do all (or most) of the tests stated below, you will find the exam very easy to pass!

Course structure by lectures:

Intro:
Basic definition of languages; definitions of syntax and semantics; the concept of a language implementation; languages that can be mechanically interpreted; introduction to soundness and completeness.
Skills: Must know: everything presented here. These are core definitional concepts.

Regular Expressions
Introduction to the simplest class of ‘interesting’ computed languages. The syntax taught is taken from (and is a subset of) the ECMA-262 (‘JavaScript’) standard and as such is widely available. Most of the teaching for this subject is through the self-taught unit available from the web site.
Skills: to be able to read and write regular expressions fluently.
Test: you should know what all of the following mean. You should be able to construct these expressions yourself, given an alternative definition of the set of strings they represent.
    a*
    aa
    ab?
    a*b+a?a
    [1-9][0-9]*
    [0-9].[0-9]+

Structural induction
This is a core skill that should already be understood from a general mathematical background, but is included here for revision purposes.
Skills: understand the definition of an infinite set, containing members of unbounded size, via structural induction.
Test: understand fully the definition of the meaning of natural numbers in slide 5. Derive a meaning for the string ‘263’ based on this.
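As a hedged illustration of the structural induction test above: the slide 5 definition is not reproduced in this overview, so the sketch below assumes the usual two-case definition of decimal numerals (a single digit, or a shorter numeral followed by a digit). The function name `meaning` and the inductive rule are assumptions made for illustration, not taken from the slides.

```python
DIGITS = "0123456789"


def meaning(numeral: str) -> int:
    """Meaning of a decimal numeral, by structural induction on its shape.

    Assumed definition (hypothetical, not copied from the slides):
      base case:      a single digit d means its digit value
      inductive case: a numeral n followed by a digit d
                      means 10 * meaning(n) + meaning(d)
    """
    if not numeral or any(c not in DIGITS for c in numeral):
        raise ValueError(f"not a numeral: {numeral!r}")
    if len(numeral) == 1:
        # Base case: each of the ten digits is given its value directly.
        return DIGITS.index(numeral)
    # Inductive case: split into a shorter numeral and a final digit.
    prefix, last = numeral[:-1], numeral[-1]
    return 10 * meaning(prefix) + meaning(last)


print(meaning("263"))  # prints 263
```

Unfolding the calls by hand gives meaning(‘263’) = 10 × meaning(‘26’) + 3 = 10 × (10 × meaning(‘2’) + 6) + 3 = 263. Every numeral, however long, is covered by one of the two cases, which is what lets a finite definition describe an infinite set.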
Context-free syntax
This introduces BNF and eBNF, the bread-and-butter of language definition.
Skills: You must be able to read, write and understand any BNF definition in the class of context-free languages (single non-terminal definitions; see next module). You should understand the difference between pure BNF and various extended forms, and be fairly confident about translating between them.
Tests: Read the JavaScript eBNF; you should be able to understand most of it (it’s fairly difficult around ‘new’ and ‘member’ expressions, though!). Write all the above REs in BNF; you should be able to do this easily. Complete the URL BNF exercise in the lecture; that’s about the level of difficulty you are likely to meet in an exam.

Expressive power
A theory module, but with important salient facts: the definition, with respect to BNF, of regular expressions, context-free languages, and context-sensitive languages. The requirement for context-sensitive languages, and the realisation that BNF isn’t a good way to define them, lead to multi-part definitions: a BNF structure with other layered rules based on this structure.
Skills: none.

Language implementation
An outline of a bit of history and the way a modern compiler is structured. The concept of simulation is important, as is the definition of a compiler as a function that translates one form of symbols to another, in a different language, but with the same meaning.
Skills: none.

Lexical analysis
Quite a difficult module, mixing some significant theory and practice. The main point is that if we can assume most of our languages are LL(1) then we can parse them (which means to construct the proof tree of inclusion in the BNF structure) from left to right. Nowadays we do normally make this assumption, and most well-designed languages are amenable to this. This reflects on the way we write parsers, which in turn reflects on the lexical analysis abstractions required.
This entire course is based on languages with this property; students should be aware that other classes of language exist. The other important point to understand is the arbitrary division between syntax and microsyntax, and the reasons for maintaining it.
Skills: fully understand the lexical analysis example interface given, at a sufficiently deep level to adapt to different interfaces with slightly different properties.
Test: completion of parts 1 and 2 of the compiler practical coursework.

Syntax analysis
Includes only recursive descent; other methods do exist! The key understanding is the relationship between the recursive descent functions and the BNF definition of the language, a relationship so tight that the translation can often be automated. The deep knowledge here is the understanding of the relationship between the flow of recursive function calls and the proof structure of BNF inclusion for the same input string. A string not in the language will result in an error by side-effect; a string in the language will cause the parser to terminate silently.
Skills: to translate fluently between a BNF definition and a set of recursive descent parsing functions.
Test: check and try to re-implement the example language TRIV-CF from the examples. Follow through the set of recursive calls made for legal and illegal sentences of this language.

Type rules
Definitional framework for context-sensitive syntax, based on BNF structure. Understand the structure and meaning of these rules as a set of logical implications: read as axioms from the top down, used as proof steps from the bottom up.
Skills: read and write type rules (a) without and (b) with environments present.
Test: prove or disprove the inclusion of simple examples of TRIV programs as defined on the last slide. Take some simple BNF examples from the ECMA-262 (untyped) syntax and write type rules for them. Understand all the code in the TRIVtyped parser.
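To make the idea of type rules with environments a little more concrete, here is a minimal sketch. It does not use TRIV or its rules, which are not reproduced in this overview; the expression forms, the two types `int` and `bool`, and the function name `type_of` are all hypothetical. Each branch of the function corresponds to one assumed rule: the recursive calls establish the premises, and the returned type is the conclusion.

```python
# Hypothetical expression forms (not TRIV), written as nested tuples:
#   ("int", 3)   ("bool", True)   ("var", "x")
#   ("add", e1, e2)   ("less", e1, e2)   ("if", cond, e1, e2)


def type_of(expr, env):
    """Return "int" or "bool" for expr under env, or raise TypeError.

    env maps variable names to types; it plays the role of the
    environment in the type rules.
    """
    tag = expr[0]
    if tag == "int":
        return "int"  # axiom: integer literals have type int
    if tag == "bool":
        return "bool"  # axiom: boolean literals have type bool
    if tag == "var":
        # A variable has whatever type the environment records for it.
        if expr[1] not in env:
            raise TypeError(f"undeclared variable {expr[1]!r}")
        return env[expr[1]]
    if tag in ("add", "less"):
        # Premises: both operands have type int.
        if type_of(expr[1], env) != "int" or type_of(expr[2], env) != "int":
            raise TypeError(f"operands of {tag!r} must both be int")
        return "int" if tag == "add" else "bool"
    if tag == "if":
        # Premises: the condition is bool and both branches agree.
        if type_of(expr[1], env) != "bool":
            raise TypeError("condition of 'if' must be bool")
        then_type = type_of(expr[2], env)
        else_type = type_of(expr[3], env)
        if then_type != else_type:
            raise TypeError("branches of 'if' must have the same type")
        return then_type
    raise TypeError(f"unknown expression form {tag!r}")


program = ("if", ("less", ("var", "x"), ("int", 3)),
           ("int", 1),
           ("add", ("var", "x"), ("int", 2)))
print(type_of(program, {"x": "int"}))  # prints int
```

Running the same program with the environment `{"x": "bool"}` raises a `TypeError`, which corresponds to failing to build a proof from the bottom up: no rule applies to `less` when one operand is `bool`.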
Semantics
Covers only the definitional framework, which gives rules to rewrite a language sentence in another language. Denotational semantics uses the common language of ‘maths’; operational semantics uses a separately defined context, e.g. another language or machine. Advantages and disadvantages of different translation targets. The ‘fat bracket’ notation for rewrite rules. Revision of soundness and completeness, interpreted in terms of type rules and semantic rules.
Skills: read and write semantic rules (a) without and (b) with environments present.
Test: Derive a meaning for sentences from TRIV, in the manner of the worked example of slide 14. Understand the code in the TRIV-denotational parser.

Operational semantics
The final piece in the jigsaw leading from definition to implementation: both rules and practice for translation from a defined language to an operational piece of (probably simulated) hardware.
Skills: read and write rules which define mappings to abstract machines.
Test: Understand the code in the TRIV-operational parser, and the operation of the abstract machine invoked when the ‘run’ button is pressed after compilation.

Wrap-up
Final test: if you could write the code in the TRIV-operational parser, given the BNF, the type rules, the denotational semantics and the abstract machine definition, then you are doing very well indeed! Thus the entire course may be summarised in around 60 lines of code! It is fairly subtle code, though, that you couldn’t write without a deep understanding of all the issues outlined above.
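As a rough illustration of the path from definition to implementation described above, the sketch below puts a recursive descent parser and a small stack machine side by side. It is not the TRIV-operational parser: the grammar, the instruction names `PUSH`, `ADD` and `MUL`, and all function and class names are hypothetical, chosen only to keep the example small. There is one parsing function per non-terminal, each emitting code as a side effect, and a string outside the language raises an error.

```python
import re

# Hypothetical grammar (not TRIV), in eBNF:
#   expr   ::= term { "+" term }
#   term   ::= factor { "*" factor }
#   factor ::= number | "(" expr ")"

TOKEN = re.compile(r"\s*(\d+|[+*()])", re.ASCII)


def tokenize(text):
    """Split the input into number and symbol tokens, ending with None."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character near position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    tokens.append(None)  # end-of-input marker
    return tokens


class Compiler:
    """One parsing function per non-terminal of the grammar above."""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.index = 0
        self.code = []  # instructions emitted so far

    def peek(self):
        return self.tokens[self.index]

    def expect(self, symbol):
        if self.peek() != symbol:
            raise SyntaxError(f"expected {symbol!r}, found {self.peek()!r}")
        self.index += 1

    def expr(self):
        self.term()
        while self.peek() == "+":
            self.index += 1
            self.term()
            self.code.append(("ADD",))  # emitted after both operands

    def term(self):
        self.factor()
        while self.peek() == "*":
            self.index += 1
            self.factor()
            self.code.append(("MUL",))

    def factor(self):
        token = self.peek()
        if token is not None and token.isdigit():
            self.index += 1
            self.code.append(("PUSH", int(token)))
        elif token == "(":
            self.index += 1
            self.expr()
            self.expect(")")
        else:
            raise SyntaxError(f"expected number or '(', found {token!r}")


def compile_expr(text):
    compiler = Compiler(text)
    compiler.expr()
    compiler.expect(None)  # the whole input must be consumed
    return compiler.code


def run(code):
    """Execute the emitted code on a simple stack machine."""
    stack = []
    for instruction in code:
        if instruction[0] == "PUSH":
            stack.append(instruction[1])
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if instruction[0] == "ADD" else left * right)
    return stack.pop()


print(run(compile_expr("2 + 3 * (4 + 1)")))  # prints 17
```

The nesting of calls to `expr`, `term` and `factor` for a given input mirrors the proof tree showing that the input belongs to the grammar, and the order in which instructions are appended is what gives the compiled code the same meaning as the source expression.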