Lecture # 7
Chapter 4: Syntax Analysis

What is the job of Syntax Analysis?
• Syntax analysis is also called parsing or hierarchical analysis.
• A parser implements the grammar of the language, whether that language is C, C++, etc.
• The parser obtains a string of tokens from the lexical analyzer and verifies that the string can be generated by the grammar for the source language.
• The grammar that a parser implements is called a context-free grammar, or CFG.

Position of a Parser in the Compiler Model
[Diagram: the lexical analyzer reads the source program and, on each "get next token" request, returns a token (token, tokenval) to the parser and the rest of the front-end, which produce the intermediate representation; both consult the symbol table, and lexical, syntax, and semantic errors are reported along the way.]

The Parser
The role of the parser is twofold:
1. To check syntax (= string recognizer)
   – and to report syntax errors accurately
2. To invoke semantic actions
   – for static semantics checking, e.g. type checking of expressions, functions, etc.

What is the difference between Syntax and Semantics?
• Syntax is the way in which we construct sentences by following principles and rules.
• Semantics is the interpretation of and meaning derived from a sentence; in other words, whether the sentence makes logical sense or not.

Error Handling
• A good compiler should assist in identifying and locating errors
  – Lexical errors: important; the compiler can easily recover and continue, e.g. a misspelled identifier or keyword
  – Syntax errors: most important for the compiler; it can almost always recover, e.g. an arithmetic expression with unbalanced parentheses
  – Static semantic errors: important; the compiler can sometimes recover, e.g. an operator applied to incompatible operands
  – Dynamic semantic errors: hard or impossible to detect at compile time; run-time checks are required
  – Logical errors: hard or impossible to detect, e.g. infinite recursive calls

Viable-Prefix Property
• The viable-prefix property of parsers allows early detection of syntax errors
  – Goal: detect an error as soon as possible without consuming further unnecessary input
  – How: detect an error as soon as the prefix of the input does not match a prefix of any string in the language
  – Example: given the input prefix … for (;) … the error is detected right here, because no string in the language begins with this prefix

Error Recovery Strategies
• Panic mode
  – Discard input until a token in a set of designated synchronizing tokens is found
• Phrase-level recovery
  – Perform local correction on the input to repair the error
• Error productions
  – Augment the grammar with productions for erroneous constructs
• Global correction
  – Choose a minimal sequence of changes to obtain a global least-cost correction

Context-Free Grammar
• A context-free grammar is a 4-tuple G = (N, T, P, S) where
  – T is a finite set of tokens (terminal symbols)
  – N is a finite set of nonterminals
  – P is a finite set of productions of the form A → α, where A ∈ N and α ∈ (N ∪ T)*
  – S ∈ N is a designated start symbol

Example Grammar
Context-free grammar for simple expressions:
G = ({expr, op, digit}, {+, -, *, /, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, (, )}, P, expr)
with productions P =
  expr → expr op expr
  expr → ( expr )
  expr → digit
  digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
  op → + | - | * | /

Notational Conventions Used
• Terminals: lower-case letters, operator symbols, punctuation symbols, digits, and boldface strings are all terminals.
• Nonterminals: upper-case letters and lower-case italic names are usually nonterminals.
• Greek letters such as α, β, γ represent strings of grammar symbols. Thus a generic production can be written as A → α.
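The 4-tuple definition maps directly onto a small data structure. The sketch below is a minimal illustration in Python (the Grammar class and the EXPR_GRAMMAR name are my own, not from the lecture): it encodes the simple-expression grammar from the Example Grammar slide and checks that every production has the form A → α with A ∈ N and α ∈ (N ∪ T)*.

```python
# Minimal sketch (illustrative names): a CFG stored as the 4-tuple G = (N, T, P, S).
from dataclasses import dataclass

@dataclass
class Grammar:
    nonterminals: set   # N
    terminals: set      # T
    productions: dict   # P: maps a nonterminal A to a list of right-hand sides (tuples of symbols)
    start: str          # S, the designated start symbol

DIGITS = [str(d) for d in range(10)]

# The simple-expression grammar from the "Example Grammar" slide.
EXPR_GRAMMAR = Grammar(
    nonterminals={"expr", "op", "digit"},
    terminals=set("+-*/()") | set(DIGITS),
    productions={
        "expr":  [("expr", "op", "expr"), ("(", "expr", ")"), ("digit",)],
        "op":    [(t,) for t in "+-*/"],
        "digit": [(d,) for d in DIGITS],
    },
    start="expr",
)

# Sanity check: every production A -> alpha has A in N and alpha in (N U T)*,
# and the start symbol is a nonterminal.
symbols = EXPR_GRAMMAR.nonterminals | EXPR_GRAMMAR.terminals
for lhs, alternatives in EXPR_GRAMMAR.productions.items():
    assert lhs in EXPR_GRAMMAR.nonterminals
    for rhs in alternatives:
        assert all(s in symbols for s in rhs)
assert EXPR_GRAMMAR.start in EXPR_GRAMMAR.nonterminals
```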
Examples of CFG
• Write a CFG that generates even-length palindromes over {a, b}:
  S → aSa | bSb | ε
• Write a CFG that generates odd-length palindromes over {a, b}:
  S → aSa | bSb | a | b
• Write a CFG that generates strings with an equal number of a's and b's:
  S → aSbS | bSaS | ε

Examples
• Write a CFG that generates strings with an equal number of a's, b's, and c's:
  S → aSbScS | aScSbS | bSaScS | bScSaS | cSaSbS | cSbSaS | ε

Practice
a) A CFG generating alternating sequences of 0's and 1's
b) A CFG in which no consecutive b's can occur, but consecutive a's can occur
c) A CFG for the following language: L(G) = {a^n b^2n | n ≥ 0}

Practice Answers
a) S → 0A | 1B
   A → 1B | 1
   B → 0A | 0
b) S → aS | bT | a | b
   T → aS | a
c) S → aSbb | ε

Example
• Design a CFG for the language L(G) = {0^n 1^m | n ≠ m}
  There are two cases:
  – n > m
  – n < m
  Write a separate set of rules for each case and combine them.

Example
• For n > m:
  S1 → AB
  A → 0A | 0
  B → 0B1 | ε
• For n < m:
  S2 → XY
  X → 0X1 | ε
  Y → 1Y | 1
• Combining both:
  S → S1 | S2

Derivations
• The one-step derivation ⇒ is defined by αAβ ⇒ αγβ, where A → γ is a production in the grammar
• In addition, we define
  – ⇒ is leftmost (⇒lm) if α does not contain a nonterminal
  – ⇒ is rightmost (⇒rm) if β does not contain a nonterminal
  – Transitive closure ⇒* (zero or more steps)
  – Positive closure ⇒+ (one or more steps)
• The language generated by G is defined by L(G) = {w ∈ T* | S ⇒+ w}

Derivation (Example)
Grammar G = ({E}, {+, *, (, ), -, id}, P, E) with productions P =
  E → E + E
  E → E * E
  E → ( E )
  E → - E
  E → id
Example derivations:
  E ⇒ - E ⇒ - id
  E ⇒rm E + E ⇒rm E + id ⇒rm id + id
  E ⇒ E * E ⇒ E * E + E ⇒ E * id + E ⇒ E * id + id ⇒ id * id + id

Example
• For the grammar below, derive the string 9-5+2 using a leftmost derivation; then derive the same string using a rightmost derivation.

Example Grammar
Context-free grammar for simple expressions:
G = ({list, digit}, {+, -, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, P, list)
with productions P =
  list → list + digit
  list → list - digit
  list → digit
  digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Derivation for the Example Grammar
  list ⇒ list + digit
       ⇒ list - digit + digit
       ⇒ digit - digit + digit
       ⇒ 9 - digit + digit
       ⇒ 9 - 5 + digit
       ⇒ 9 - 5 + 2
This is an example of a leftmost derivation, because we replaced the leftmost nonterminal in each step. Likewise, a rightmost derivation replaces the rightmost nonterminal in each step.
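To make the leftmost-derivation steps above concrete, here is a small sketch in Python (the helper name leftmost_step and the step list are my own, not from the lecture) that rewrites the leftmost nonterminal of the list/digit grammar at each step and prints every sentential form on the way to 9-5+2.

```python
# Minimal sketch (illustrative): replaying the leftmost derivation of 9-5+2
# in the grammar  list -> list + digit | list - digit | digit,
#                 digit -> 0 | 1 | ... | 9.

NONTERMINALS = {"list", "digit"}

def leftmost_step(form, production):
    """Rewrite the leftmost nonterminal of `form` using `production` = (lhs, rhs)."""
    lhs, rhs = production
    for i, symbol in enumerate(form):
        if symbol in NONTERMINALS:
            assert symbol == lhs, f"leftmost nonterminal is {symbol}, not {lhs}"
            return form[:i] + list(rhs) + form[i + 1:]
    raise ValueError("no nonterminal left to rewrite")

# The production applied at each step of the derivation on the slide.
steps = [
    ("list",  ["list", "+", "digit"]),
    ("list",  ["list", "-", "digit"]),
    ("list",  ["digit"]),
    ("digit", ["9"]),
    ("digit", ["5"]),
    ("digit", ["2"]),
]

form = ["list"]
print(" ".join(form))
for production in steps:
    form = leftmost_step(form, production)
    print("=> " + " ".join(form))   # last line printed: => 9 - 5 + 2
```

Changing the selection rule to rewrite the rightmost nonterminal instead would reproduce the rightmost derivation asked for in the exercise.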