Richard Connor - CIS Personal Web Pages

52 358 Compilers – course overview and key skills outcomes.
Richard Connor
This document provides an overview of all the lecture material; in particular, it
emphasises what you should be able to do as a result of the material.
This is a practical course that concentrates on transferable general skills rather
than abstract knowledge. Note that this is in contrast with many similar courses
and most textbooks: you pass the course by understanding these notes, not by reading
textbooks.
If you can do all (or most) of the tests stated below, you will find the exam very easy to
pass!
Course structure by lectures:
Intro:
Basic definition of languages; definitions of syntax and semantics; the concept of a
language implementation; languages that can be mechanically interpreted;
introduction to soundness and completeness
Skills: Must know: everything presented here. These are core definitional concepts.
Regular Expressions
Introduction to the simplest class of ‘interesting’ computed languages. The syntax
taught is taken from (and is a subset of) the ECMA-262 (‘JavaScript’) standard and as
such is widely available. Most of the teaching for this subject is through the
self-taught unit available from the web site.
Skills: to be able to read and write regular expressions fluently.
Test: you should know what all of the following mean. You should be able to
construct these expressions yourself, given an alternative definition of the set of
strings they represent.
a*
aa
ab?
a*b+a?a
[1-9][0-9]*
[0-9].[0-9]+
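A quick way to check your reading of these expressions is to try them against sample strings. The sketch below uses Python's `re` module rather than a JavaScript engine (for this subset the two agree); the sample strings and the helper name `in_language` are chosen here purely for illustration.

```python
import re

# Each expression from the list above, with one string that should belong to
# its language and one that should not (illustrative choices).
CASES = [
    (r"a*", "aaa", "ab"),
    (r"aa", "aa", "a"),
    (r"ab?", "ab", "abb"),
    (r"a*b+a?a", "abba", "ab"),
    (r"[1-9][0-9]*", "263", "0263"),
    (r"[0-9].[0-9]+", "3.14", "3."),
]


def in_language(pattern: str, text: str) -> bool:
    """True if the whole of text is in the set of strings pattern describes."""
    return re.fullmatch(pattern, text) is not None


for pattern, accepted, rejected in CASES:
    assert in_language(pattern, accepted)
    assert not in_language(pattern, rejected)

# An unescaped '.' stands for any single character, so this is accepted too.
assert in_language(r"[0-9].[0-9]+", "3x14")
```

Note that the whole string must match: a search for the pattern anywhere inside a longer string answers a different question.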
Structural induction
This is a core skill that should already be understood from a general mathematical
background, but is included here for revision purposes.
Skills: understand the definition of an infinite set, containing members of unbounded
size, via structural induction.
Test: understand fully the definition of the meaning of natural numbers in slide 5.
Derive a meaning for the string ‘263’ based on this.
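The slide itself is not reproduced in this overview, so the following is only a sketch under an assumed definition: a single digit means its usual value, and a numeral n followed by a digit d means ten times the meaning of n plus the meaning of d. The function name `meaning` is hypothetical.

```python
# Base case of the induction: the meaning of each single digit.
DIGIT_VALUE = {str(d): d for d in range(10)}


def meaning(numeral: str) -> int:
    """Meaning of a decimal numeral, following the structure of the string."""
    if not numeral or any(ch not in DIGIT_VALUE for ch in numeral):
        raise ValueError(f"not a numeral: {numeral!r}")
    if len(numeral) == 1:
        # Base case: a numeral that is a single digit.
        return DIGIT_VALUE[numeral]
    # Inductive case: a shorter numeral followed by one more digit.
    return 10 * meaning(numeral[:-1]) + DIGIT_VALUE[numeral[-1]]


# meaning("263") = 10 * meaning("26") + 3
#                = 10 * (10 * meaning("2") + 6) + 3
#                = 10 * (10 * 2 + 6) + 3 = 263
assert meaning("263") == 263
```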
Context-free syntax
This introduces BNF and eBNF, the bread-and-butter of language definition.
Skills: You must be able to read, write and understand any BNF definition in the class
of context-free languages (single non-terminal definitions – see next module). You
should understand the difference between pure BNF and various extended forms, and
be fairly confident about translating between them.
Tests: Read the JavaScript eBNF – you should be able to understand most of it (it’s
fairly difficult around ‘new’ and ‘member’ expressions, though!) Write all the above
REs in BNF – you should be able to do this easily. Complete the URL BNF exercise
in the lecture; that’s about the level of difficulty you are likely to meet in an exam.
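One way to check a BNF translation of an RE is to enumerate the short sentences of the grammar and compare them with the RE. The sketch below does this for [1-9][0-9]*. The non-terminal names and the encoding of the grammar as a Python dictionary are illustrative assumptions, not taken from the lecture.

```python
import re

# A possible pure-BNF rendering of [1-9][0-9]* (hypothetical names):
#   number  ::= nonzero digits
#   digits  ::= <empty> | digit digits
#   nonzero ::= 1 | 2 | ... | 9
#   digit   ::= 0 | 1 | ... | 9
GRAMMAR = {
    "number": [["nonzero", "digits"]],
    "digits": [[], ["digit", "digits"]],
    "nonzero": [[c] for c in "123456789"],
    "digit": [[c] for c in "0123456789"],
}


def sentences(symbol, max_len):
    """Yield every string of at most max_len characters derivable from symbol."""
    if symbol not in GRAMMAR:
        # A terminal stands for itself, provided there is room for it.
        if max_len >= 1:
            yield symbol
        return
    for alternative in GRAMMAR[symbol]:
        yield from expand(alternative, max_len)


def expand(symbols, max_len):
    """Yield every string derivable from a sequence of symbols."""
    if not symbols:
        yield ""
        return
    for head in sentences(symbols[0], max_len):
        for tail in expand(symbols[1:], max_len - len(head)):
            yield head + tail


short = list(sentences("number", 2))
# Every generated sentence is accepted by the RE ...
assert all(re.fullmatch(r"[1-9][0-9]*", s) for s in short)
# ... and none is missing: 9 one-digit and 90 two-digit numerals.
assert len(set(short)) == 99
```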
Expressive power
A theory module, but with important salient facts: the definition, with respect to BNF, of
regular expressions, context-free languages, and context-sensitive languages. The requirement
for context-sensitive languages, and the realisation that BNF isn’t a good way to
define them, leading to multi-part definitions: BNF structure with other layered rules
based on this structure.
Skills: none.
Language implementation
Outline of a bit of history and the way a modern compiler is structured. The concept
of simulation is important, and the definition of a compiler as a function that
translates one form of symbols to another, in a different language, but with the same
meaning.
Skills: none.
Lexical analysis
Quite a difficult module, mixing some significant theory and practice. The main point
is that if we can assume most of our languages are LL(1) then we can parse them
(which means to construct the proof tree of inclusion in the BNF structure) from left
to right. Nowadays we do normally make this assumption, and most well-designed
languages are amenable to this. This reflects on the way we write parsers, which in
turn reflects on the lexical analysis abstractions required. This entire course is based
on languages with this property; students should be aware that other classes of
language exist. Another important point to understand is the arbitrary division between
syntax and microsyntax, and the reasons for maintaining this.
Skills: fully understand the lexical analysis example interface given, at a sufficiently
deep level to adapt to different interfaces with slightly different properties.
Test: completion of parts 1 and 2 of the compiler practical coursework.
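The course's example interface is not reproduced in this overview. As a rough illustration of the kind of abstraction an LL(1) parser needs, the sketch below exposes exactly one token of lookahead; the names `Token`, `Lexer`, `current` and `advance`, and the choice of token kinds, are all assumptions made for this example.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    kind: str  # "NUMBER", "IDENT", "SYMBOL" or "EOF"
    text: str


# The microsyntax: each token kind is described by a regular expression.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<NUMBER>[0-9]+)"
    r"|(?P<IDENT>[A-Za-z_][A-Za-z0-9_]*)"
    r"|(?P<SYMBOL>[-+*/()=;]))"
)


class Lexer:
    """Gives the parser a single token of lookahead, which is all LL(1) needs."""

    def __init__(self, source: str):
        self._source = source
        self._pos = 0
        self.current = Token("EOF", "")
        self.advance()

    def advance(self) -> None:
        """Replace the current token with the next one from the input."""
        match = TOKEN_RE.match(self._source, self._pos)
        if match is None:
            rest = self._source[self._pos:].strip()
            if rest:
                raise SyntaxError(f"unexpected character {rest[0]!r}")
            self.current = Token("EOF", "")
            return
        self._pos = match.end()
        self.current = Token(match.lastgroup, match.group(match.lastgroup))


def tokens(source: str) -> list:
    """Collect all tokens of source, for inspection."""
    lexer, result = Lexer(source), []
    while lexer.current.kind != "EOF":
        result.append(lexer.current)
        lexer.advance()
    return result
```

For example, `tokens("x = 12 + y")` yields an IDENT, a SYMBOL, a NUMBER, a SYMBOL and an IDENT. An interface with slightly different properties (say, a `peek` and `next` pair) would change the parser's calls but not the idea.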
Syntax analysis
Includes only recursive descent – other methods do exist! Key understanding is the
relationship between the recursive descent functions and the BNF definition of the
language, this being so tight it can often be automated. The deep knowledge here is
the understanding of the relationship between the flow of recursive function calls and
the proof structure of BNF inclusion for the same input string. A string not in the
language will result in an error by side-effect; a string in the language will cause the
parser to terminate silently.
Skills: to translate fluently between a BNF definition and a set of recursive descent
parsing functions.
Test: check and try to re-implement the example language TRIV-CF from the
examples. Follow through the set of recursive calls made for legal and non-legal
sentences of this language.
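TRIV-CF is not reproduced here, so the sketch below uses a small made-up grammar instead, to show how each BNF rule becomes one function. In this version the error is signalled by raising an exception, and a legal sentence makes `parse` return without output; all names are illustrative.

```python
# Hypothetical grammar (not TRIV-CF):
#   expr ::= term | term '+' expr
#   term ::= digit | '(' expr ')'
DIGITS = set("0123456789")


class Parser:
    def __init__(self, text: str):
        self.text = text
        self.pos = 0

    def peek(self) -> str:
        """The next input symbol, or the empty string at end of input."""
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def expect(self, symbol: str) -> None:
        if self.peek() != symbol:
            raise SyntaxError(f"expected {symbol!r} at position {self.pos}")
        self.pos += 1

    def expr(self) -> None:
        # expr ::= term | term '+' expr
        self.term()
        if self.peek() == "+":
            self.expect("+")
            self.expr()

    def term(self) -> None:
        # term ::= digit | '(' expr ')'
        if self.peek() in DIGITS:
            self.pos += 1
        elif self.peek() == "(":
            self.expect("(")
            self.expr()
            self.expect(")")
        else:
            raise SyntaxError(f"expected a term at position {self.pos}")


def parse(text: str) -> None:
    """Return silently if text is a sentence of the grammar, else raise."""
    parser = Parser(text)
    parser.expr()
    if parser.peek() != "":
        raise SyntaxError(f"unexpected input at position {parser.pos}")
```

Tracing `parse("1+(2+3)")` by hand gives the calls expr, term, expr, term, expr, term, expr, term, which is the proof tree of the sentence read from the root downwards.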
Type rules
Definitional framework for context-sensitive syntax. Based on BNF structure.
Understand the structure and meaning of these rules as a set of logical implications;
read as axioms from the top down, used as proof steps from the bottom up.
Skills: read and write type rules (a) without and (b) with environments present.
Test: prove or disprove the inclusion of simple examples of TRIV programs as
defined on the last slide. Take some simple BNF examples from the ECMA-262
(untyped) syntax and write type rules for them. Understand all the code in the
TRIV-typed parser.
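To illustrate how type rules with environments turn into code, here is a sketch for a tiny made-up expression language (not TRIV). Each branch corresponds to one rule, with the premises checked by recursive calls; the tuple encoding of expressions and the name `type_of` are assumptions of this example.

```python
def type_of(expr, env):
    """Return the type of expr under environment env, or raise TypeError.

    Expressions are tuples: ("num", 3), ("bool", True), ("var", "x"),
    ("add", e1, e2) and ("eq", e1, e2). env maps identifiers to types.
    """
    tag = expr[0]
    if tag == "num":
        # Axiom: env |- n : int
        return "int"
    if tag == "bool":
        # Axiom: env |- b : bool
        return "bool"
    if tag == "var":
        # If env(x) = t then env |- x : t
        name = expr[1]
        if name not in env:
            raise TypeError(f"undeclared identifier {name!r}")
        return env[name]
    if tag == "add":
        # If env |- e1 : int and env |- e2 : int then env |- e1 + e2 : int
        if type_of(expr[1], env) == "int" and type_of(expr[2], env) == "int":
            return "int"
        raise TypeError("'+' needs two int operands")
    if tag == "eq":
        # If env |- e1 : t and env |- e2 : t then env |- e1 = e2 : bool
        if type_of(expr[1], env) == type_of(expr[2], env):
            return "bool"
        raise TypeError("'=' needs operands of the same type")
    raise ValueError(f"unknown expression form {tag!r}")


# (x + 1) = 2 is well typed when x is an int.
example = ("eq", ("add", ("var", "x"), ("num", 1)), ("num", 2))
assert type_of(example, {"x": "int"}) == "bool"
```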
Semantics
Only a definitional framework, giving rules to rewrite a language sentence in another
language. Denotational semantics: using the common language of ‘maths’;
operational semantics: using a separately defined context, e.g. another language or
machine. Advantages and disadvantages of different translation targets. The ‘fat
bracket’ notation for rewrite rules. Revision of soundness and completeness
interpreted in terms of type rules and semantic rules.
Skills: read and write semantic rules (a) without and (b) with environments present.
Test: Derive a meaning for sentences from TRIV, in the manner of the worked
example of slide 14. Understand the code in the TRIV-denotational parser.
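As a rough sketch of what a denotational rule set looks like when written as code, the function below maps expressions of a small made-up language (not TRIV) to numbers, with one branch per rewrite rule. The fat brackets are written as [[ ]] in the comments; the tuple encoding and the name `denote` are assumptions of this example.

```python
def denote(expr, env):
    """Meaning of expr as a number; env maps identifiers to their values.

    Expressions are tuples: ("num", 3), ("var", "x"),
    ("add", e1, e2) and ("mul", e1, e2).
    """
    tag = expr[0]
    if tag == "num":
        # [[ n ]] env = n
        return expr[1]
    if tag == "var":
        # [[ x ]] env = env(x)
        return env[expr[1]]
    if tag == "add":
        # [[ e1 + e2 ]] env = [[ e1 ]] env + [[ e2 ]] env
        return denote(expr[1], env) + denote(expr[2], env)
    if tag == "mul":
        # [[ e1 * e2 ]] env = [[ e1 ]] env * [[ e2 ]] env
        return denote(expr[1], env) * denote(expr[2], env)
    raise ValueError(f"unknown expression form {tag!r}")


# [[ 2 + x * 3 ]] {x: 4} = 2 + (4 * 3) = 14
assert denote(("add", ("num", 2), ("mul", ("var", "x"), ("num", 3))), {"x": 4}) == 14
```

The meaning of the whole is built only from the meanings of the parts, which is what makes the definition follow the BNF structure.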
Operational semantics
The final piece in the jigsaw leading from definition to implementation: both rules
and practice for translation from a defined language to an operational piece of
(probably simulated) hardware.
Skills: read and write rules which define mappings to abstract machines.
Test: Understand the code in the TRIV-operational parser, and the operation of the
abstract machine invoked when the ‘run’ button is pressed after compilation.
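The course's abstract machine is not described in this overview, so the sketch below uses a hypothetical stack machine with three instructions (PUSH, ADD, MUL) to show the two halves: rules that translate an expression into machine code, and a simulator that runs that code. All names are illustrative.

```python
def compile_expr(expr):
    """Translate an expression tuple into a list of stack-machine instructions."""
    tag = expr[0]
    if tag == "num":
        return [("PUSH", expr[1])]
    if tag == "add":
        # Code for both operands, then an instruction that combines them.
        return compile_expr(expr[1]) + compile_expr(expr[2]) + [("ADD",)]
    if tag == "mul":
        return compile_expr(expr[1]) + compile_expr(expr[2]) + [("MUL",)]
    raise ValueError(f"unknown expression form {tag!r}")


def run(code):
    """Simulate the machine: execute code and return the value left on top."""
    stack = []
    for instruction in code:
        op = instruction[0]
        if op == "PUSH":
            stack.append(instruction[1])
        elif op == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return stack.pop()


# 2 + 3 * 4
program = compile_expr(("add", ("num", 2), ("mul", ("num", 3), ("num", 4))))
assert program == [("PUSH", 2), ("PUSH", 3), ("PUSH", 4), ("MUL",), ("ADD",)]
assert run(program) == 14
```

The compiled program has the same meaning as the source expression, which is the sense of "same meaning, different language" in the definition of a compiler.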
Wrapup
Final test: if you could write the code in the TRIV-operational parser, given:
The BNF
The type rules
The denotational semantics
The abstract machine definition
then you are doing very well indeed! Thus the entire course may be summarised in
around 60 lines of code! – but fairly subtle code, that you couldn’t write without a
deep understanding of all the issues outlined above.