Syntax

SYNTAX
Outline

Programming Language Specification
Lexical Structure of PLs
Syntactic Structure of PLs
Context-Free Grammar / BNF
Parse Trees

Abstract Syntax Trees

Ambiguous Grammar

Associativity and Precedence

EBNFs and Syntax Diagrams




Nandigam
Programming Language Specification

PLs require precise definitions (i.e. no ambiguity)
 - Language form (Syntax)
 - Language meaning (Semantics)
Consequently, PLs are specified using formal notation:
 - Formal syntax
   - Tokens
   - Grammar
 - Formal semantics
   - Operational
   - Denotational
   - Axiomatic
Lexical Structure of PLs
Lexical Structure of PLs (cont.)

Main task of scanner: identify tokens
 - Tokens are the basic building blocks of programs
 - E.g. keywords, identifiers, numbers, punctuation marks
Lexeme – an instance of a token.
 - One can think of programs as strings of lexemes rather than of characters
A token of a language is a category of its lexemes (or instances)
Some tokens have many possible lexemes
 - E.g. keyword, identifier, number
In some cases, a token has only a single possible lexeme
 - E.g. equal_sign, plus_op, mult_op
Lexical Structure of PLs (cont.)

Consider the following Java statement:
  index = 2 * count + 17 ;

The lexemes and tokens of this statement are:
  Lexemes   Tokens
  index     identifier
  =         equal_sign
  2         int_literal
  *         mult_op
  count     identifier
  +         plus_op
  17        int_literal
  ;         semicolon
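The lexeme/token pairing above can be sketched as a small scanner. This is only an illustration in Python: the token names come from the slide, while the patterns, the `skip` entry for whitespace, and the function name `scan` are assumptions made for the example.

```python
import re

# Token names follow the slide; the patterns are assumptions for this sketch.
TOKEN_SPEC = [
    ("int_literal", r"[0-9]+"),
    ("identifier", r"[a-zA-Z][a-zA-Z0-9_]*"),
    ("equal_sign", r"="),
    ("plus_op", r"\+"),
    ("mult_op", r"\*"),
    ("semicolon", r";"),
    ("skip", r"\s+"),  # whitespace separates lexemes and is not a token
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(source):
    """Return a list of (lexeme, token) pairs; raise ValueError on an unknown character."""
    pairs = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "skip":
            pairs.append((match.group(), match.lastgroup))
        pos = match.end()
    return pairs
```

Calling `scan("index = 2 * count + 17 ;")` should reproduce the eight rows of the table, in order.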
Lexical Structure of PLs (cont.)

Tokens in a programming language are described formally by regular expressions.
Regular expressions – descriptions of patterns of characters
Regular expression operations
 Basic operations
   Concatenation             item sequencing
   Choice or selection       |
   Repetition                *
   Grouping                  ()
 Additional operations
   One or more repetitions   +
   Range of characters       [-]
   Optional                  ?
   Any character             .
Lexical Structure of PLs (cont.)

Regular expression examples
 (a|b)*c
   Strings that match include ababaac, aac, bbc, c, and babc
 [0-9]+
   Integer constants with one or more digits
 [0-9]+(\.[0-9]+)?
   Floating-point literals (the fractional part is optional, so plain integers also match)
 [a-zA-Z][a-zA-Z0-9_]*
   Identifiers
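The four patterns above can be tried out directly. The sketch below uses Python's `re.fullmatch`, which requires the whole string to match; the dictionary keys and the helper name `matches` are made up for the example.

```python
import re

# The four patterns from the slide; the dictionary keys are illustrative names.
PATTERNS = {
    "abc_string": r"(a|b)*c",
    "integer": r"[0-9]+",
    "float": r"[0-9]+(\.[0-9]+)?",
    "identifier": r"[a-zA-Z][a-zA-Z0-9_]*",
}


def matches(name, text):
    """True if the whole of `text` matches the named pattern."""
    return re.fullmatch(PATTERNS[name], text) is not None
```

For example, `matches("abc_string", "babc")` holds, while `matches("identifier", "1count")` does not, because an identifier must start with a letter.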
Lexical Structure of PLs (cont.)

Scanner generators:
 - lex, flex
 - ANTLR – Another Tool for Language Recognition
 - These programs can be used to generate a program (i.e., a scanner) that can extract tokens from a stream of characters.
Many PLs provide good support for regular expressions – Java, C#, Perl, Ruby, …
Support for regular expressions in Java
 - java.util.regex package
 - split() method of String class
Syntactic Structure of PLs

Specifying the form of a programming language
 - Tokens: described by regular expressions
 - Syntax (organization of tokens): described by context-free grammars (CFGs)
Context-Free Grammar

Context-free grammars (CFGs) are used to describe the syntax of PLs.
 - Proposed by Noam Chomsky – a noted linguist
BNF (Backus-Naur Form) is a notation for describing syntax.
 - Proposed by John Backus and Peter Naur
CFG and BNF are nearly identical and are used interchangeably.
BNF is a metalanguage for programming languages.
 - A metalanguage is a language that is used to describe another language.
Context-Free Grammar (cont.)

CFG or BNF consists of a series of rules or productions.
Productions are made up of:
 - Nonterminals – structures that are broken down into further structures
 - Terminals – things that cannot be broken down
 - Metasymbols
   - Symbols that are part of CFG/BNF
   - These are not actual symbols in the language being described
   - Sometimes, a metasymbol is also an actual symbol in a language
One of the nonterminals is designated as the start symbol.
The start symbol stands for the entire structure being defined.
Context-Free Grammar

(cont.)
CFG/BNF Example (Figure 4.2, page 83)
(1) sentence → noun-phrase verb-phrase .
(2) noun-phrase → article noun
(3) article → a | the
(4) noun → girl | dog
(5) verb-phrase → verb noun-phrase
(6) verb → sees | pets
Context-Free Grammar (cont.)

The language of a CFG is the set of strings of terminals that can be generated from the start symbol by a derivation:
sentence ⇒ noun-phrase verb-phrase . (rule 1)
         ⇒ article noun verb-phrase . (rule 2)
         ⇒ the noun verb-phrase . (rule 3)
         ⇒ the girl verb-phrase . (rule 4)
         ⇒ the girl verb noun-phrase . (rule 5)
         ⇒ the girl sees noun-phrase . (rule 6)
         ⇒ the girl sees article noun . (rule 2)
         ⇒ the girl sees a noun . (rule 3)
         ⇒ the girl sees a dog . (rule 4)
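The derivation above always replaces the leftmost nonterminal, so it can be replayed mechanically. The sketch below encodes the six rules as a Python dictionary; the representation and the function name `leftmost_derivation` are assumptions for the example, and `choices` simply lists which alternative (0-based) to pick at each step.

```python
# The six rules of the example grammar; each nonterminal maps to its alternatives.
GRAMMAR = {
    "sentence": [["noun-phrase", "verb-phrase", "."]],
    "noun-phrase": [["article", "noun"]],
    "article": [["a"], ["the"]],
    "noun": [["girl"], ["dog"]],
    "verb-phrase": [["verb", "noun-phrase"]],
    "verb": [["sees"], ["pets"]],
}


def leftmost_derivation(start, choices):
    """Replace the leftmost nonterminal once per entry in `choices`.

    Returns every sentential form produced, starting with the start symbol.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Position of the leftmost symbol that still has a rule.
        index = next(i for i, symbol in enumerate(form) if symbol in GRAMMAR)
        form[index:index + 1] = GRAMMAR[form[index]][choice]
        steps.append(" ".join(form))
    return steps
```

With `choices = [0, 0, 1, 0, 0, 0, 0, 0, 1]` the returned list should step through the same sentential forms as the slide, ending in `the girl sees a dog .`.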
Context-Free Grammar

(cont.)
Derivation – Generating sentences of the language through a
sequence of applications of rules (or productions), beginning
with a special nonterminal called the start symbol.

Leftmost derivation – The replaced nonterminal is always the
leftmost nonterminal.

Rightmost derivation – The replaced nonterminal is always
the rightmost nonterminal.

A derivation may be neither leftmost nor rightmost. Derivation
order has no effect on the language generated by a grammar.
Context-Free Grammar

(cont.)
A grammar for a small language
<program> → begin <stmt_list> end
<stmt_list> → <stmt>
| <stmt> ; <stmt_list>
<stmt> → <var> := <expr>
<expr> → <var> + <var>
| <var> - <var>
| <var>
<var> → A | B | C


Derive the following program:
begin A := B + C ;
B := C
end
Is the language defined by this grammar finite or infinite?
Context-Free Grammar (cont.)

Left recursive rule – A BNF rule is left recursive if the left-hand side (LHS) appears at the beginning of its right-hand side (RHS).
Right recursive rule – A BNF rule is right recursive if the LHS appears at the right end of the RHS.

Examples:
number → number digit | digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
expr → expr + expr
     | expr * expr
     | ( expr ) | number

Uses of recursion in BNF:
 - to show repetition
 - to describe complex structures
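How the left recursive rule for number expresses repetition can be seen in a recognizer that follows the rule literally. This is a sketch for illustration only; the function names are invented, and a real scanner would use a regular expression instead.

```python
def is_digit(text):
    # digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
    return len(text) == 1 and text in "0123456789"


def is_number(text):
    # number → number digit | digit
    if is_digit(text):
        return True
    # Left recursion: everything but the last character must itself be a number.
    return len(text) > 1 and is_number(text[:-1]) and is_digit(text[-1])
```

Each recursive call peels one digit off the right end, which is the repetition the rule describes.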
Parse Trees

A parse tree is a graphical representation of hierarchical syntactic structure of sentences. It describes graphically the replacement process in a derivation.
A parse tree is labeled by nonterminals at interior nodes and terminals at leaves.
A parse tree better expresses the structure inherent in a derivation.
[Figure: parse trees for the number 234 and for the expression (2 + 3) * 4]
Parse Trees (cont.)

Problem 1:
<assign> → <id> := <expr>
<expr> → <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>
<id> → A | B | C
Show a leftmost derivation and a parse tree for each of the following statements:
A := A + (B * C)
A := B + C + A
A := A * (B + C)
A := B * (C * (A + B))
Parse Trees
(cont.)
Problem 2:
Describe, in English, the language defined by the following grammar:
<S> → <A> <B> <C>
<A> → a <A> | a
<B> → b <B> | b
<C> → c <C> | c
Problem 3:
Consider the following grammar:
<S> → <A> a <B> b
<A> → <A> b | b
<B> → a <B> | a
Which of the following sentences are in the language generated by this
grammar?
baab
bbbab
bbaaaaa
bbaab
Parse Trees (cont.)

Problem 4:
Consider the following grammar:
<S> → a <S> c <B>
<S> → <A> | b
<A> → c <A> | c
<B> → d | <A>
Which of the following sentences are in the language generated by the grammar?
abcd
acccbd
acccbcc
acd
accc
Abstract Syntax Trees

Parse trees are still too detailed in their structure, since every step in a derivation is expressed as nodes.
An abstract syntax tree (AST), or just syntax tree, condenses a parse tree to its essential structure.
An AST is more compact than the corresponding parse tree.
Language designers and translator writers are most interested in abstract syntax.
A programmer is most interested in concrete syntax.
Examples on the next two slides…
Abstract Syntax Trees (cont.)

[Figure: parse tree for the number 234 and the corresponding AST]
Abstract Syntax Trees (cont.)

[Figure: parse tree for the expression (2 + 3) * 4 and the corresponding AST]
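The difference in size between a parse tree and its AST can be made concrete with nested tuples. The sketch below is one possible encoding, not a standard one: a parse-tree node is `(label, child, ...)`, leaves are strings, and the helper names are invented. It assumes single-digit numbers, so the rule number → number digit never appears.

```python
def num(d):
    # expr → number → digit → d, the chain every single-digit operand goes through
    return ("expr", ("number", ("digit", d)))


# Parse tree for (2 + 3) * 4 under expr → expr + expr | expr * expr | ( expr ) | number
PARSE_TREE = (
    "expr",
    ("expr", "(", ("expr", num("2"), "+", num("3")), ")"),
    "*",
    num("4"),
)


def to_ast(node):
    """Condense a parse tree: operators become interior nodes, chains and parentheses vanish."""
    if isinstance(node, str):
        return int(node) if node.isdigit() else node
    _label, *children = node
    if len(children) == 1:        # chain such as expr → number → digit
        return to_ast(children[0])
    if children[0] == "(":        # ( expr ): grouping is already encoded in the tree shape
        return to_ast(children[1])
    left, op, right = children    # expr op expr
    return (op, to_ast(left), to_ast(right))


def count_nodes(node):
    """Count nodes in either representation (first tuple element is the node itself)."""
    if not isinstance(node, tuple):
        return 1
    return 1 + sum(count_nodes(child) for child in node[1:])
```

`to_ast(PARSE_TREE)` gives `("*", ("+", 2, 3), 4)`, and comparing `count_nodes` on the two structures shows how much of the parse tree is bookkeeping rather than essential structure.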
Ambiguous Grammars

A grammar is ambiguous if it is possible to construct two or more distinct parse trees for the same string.
Example:
 Grammar:
   expr → expr + expr | expr * expr
        | ( expr ) | NUMBER
 Expression: 2 + 3 * 4
 Parse trees – ambiguity in operator precedence
[Figure: two parse trees for 2 + 3 * 4, one grouping the expression as 2 + (3 * 4) and the other as (2 + 3) * 4]
Ambiguous Grammars (cont.)

Another Example:
 Grammar:
   expr → expr + expr | expr - expr
        | ( expr ) | NUMBER
 Expression: 2 - 3 - 4
 Parse trees – ambiguity in operator associativity
[Figure: two parse trees for 2 - 3 - 4, one grouping the expression as 2 - (3 - 4) and the other as (2 - 3) - 4]
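Why the two parse trees matter becomes clear once they are evaluated: different trees for the same string give different values. The sketch below writes each tree as `(op, left, right)` with integers at the leaves; this encoding and the variable names are assumptions for the example.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(tree):
    """Evaluate a tree written as (op, left, right) with integers at the leaves."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left), evaluate(right))


# The two parse trees for 2 + 3 * 4 (precedence ambiguity)
add_at_root = ("+", 2, ("*", 3, 4))
mul_at_root = ("*", ("+", 2, 3), 4)

# The two parse trees for 2 - 3 - 4 (associativity ambiguity)
left_grouped = ("-", ("-", 2, 3), 4)
right_grouped = ("-", 2, ("-", 3, 4))
```

Here `evaluate(add_at_root)` is 14 but `evaluate(mul_at_root)` is 20, and `evaluate(left_grouped)` is -5 but `evaluate(right_grouped)` is 3, so a translator working from an ambiguous grammar could not know which meaning was intended.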
Ambiguous Grammars (cont.)

Ways to resolve ambiguities in a grammar
 - Revise grammar – desired approach
 - Provide disambiguating rule (semantic help)
Revising grammar to address precedence and associativity ambiguities
 - Do not write rules that allow a parse tree to grow on both left and right sides
 - Use left recursive rules for left-associative operators
 - Use right recursive rules for right-associative operators
 - Add new rules that establish a “precedence cascade” between rules to specify precedence
 - Make sure operators with higher precedence appear lower in the cascade of rules
Revised grammar
expr → expr + term | term
term → term * factor | factor
factor → ( expr ) | NUMBER
Ambiguous Grammars (cont.)

Problem 1:
<expr> → <expr> + <expr>
       | <expr> - <expr>
       | <expr> * <expr>
       | <expr> / <expr>
       | ( <expr> ) | NUMBER
NUMBER = [0-9]+
Show that this grammar is ambiguous by constructing two distinct parse trees for each of the following expressions:
30 + 5 + 2
30 - 5 - 2
30 * 5 * 2
30 / 5 / 2
30 + 5 * 2
Ambiguous Grammars (cont.)

Revised unambiguous grammar
<expr> → <expr> + <term>
       | <expr> - <term>
       | <term>
<term> → <term> * <factor>
       | <term> / <factor>
       | <factor>
<factor> → ( <expr> ) | NUMBER
NUMBER = [0-9]+
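The revised grammar admits exactly one tree per expression, which a small parser can demonstrate. The sketch below is a hand-written parser in Python, not generated from the grammar: the left recursive rules are implemented as loops (a recursive function cannot call itself first without looping forever), trees are `(op, left, right)` tuples, and all function names are invented. As a simplification, the tokenizer silently skips characters it does not recognize.

```python
import re


def parse(text):
    """Parse an expression under the revised grammar and return its tree."""
    # NUMBER = [0-9]+ ; every other token is a single character.
    tokens = re.findall(r"[0-9]+|[-+*/()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expr():
        # <expr> → <expr> + <term> | <expr> - <term> | <term>
        tree = term()
        while peek() in ("+", "-"):
            op = advance()
            tree = (op, tree, term())    # earlier operators end up deeper on the left
        return tree

    def term():
        # <term> → <term> * <factor> | <term> / <factor> | <factor>
        tree = factor()
        while peek() in ("*", "/"):
            op = advance()
            tree = (op, tree, factor())
        return tree

    def factor():
        # <factor> → ( <expr> ) | NUMBER
        token = peek()
        if token == "(":
            advance()
            tree = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return tree
        if token is not None and token.isdigit():
            return int(advance())
        raise SyntaxError(f"unexpected token {token!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

`parse("30 - 5 - 2")` yields `("-", ("-", 30, 5), 2)`, the left-associative grouping, and `parse("30 + 5 * 2")` yields `("+", 30, ("*", 5, 2))`, because `*` sits lower in the cascade than `+`.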
Ambiguous Grammars (cont.)

Problem 2:
Show that the following grammar is ambiguous:
<S> → <A>
<A> → <A> + <A> | <id>
<id> → a | b | c
Ambiguous Grammars (cont.)

Are there other alternatives to resolving ambiguities?
 - Yes, but they change the language!
Fully-parenthesized expressions:
expr → ( expr + expr ) | ( expr - expr )
     | NUMBER
Prefix expressions:
expr → + expr expr | - expr expr
     | NUMBER
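The prefix grammar needs neither parentheses nor precedence rules, because the operator always comes first and each rule fixes how many operands follow. A minimal evaluator, assuming the input is already split into tokens (the function name `eval_prefix` is invented for the example):

```python
def eval_prefix(tokens):
    """Evaluate a prefix expression given as a list of tokens."""

    def expr(pos):
        # expr → + expr expr | - expr expr | NUMBER
        token = tokens[pos]
        if token in ("+", "-"):
            left, pos = expr(pos + 1)
            right, pos = expr(pos)
            value = left + right if token == "+" else left - right
            return value, pos
        return int(token), pos + 1

    value, pos = expr(0)
    if pos != len(tokens):
        raise SyntaxError("trailing tokens after a complete expression")
    return value
```

The two readings of 2 - 3 - 4 become two different strings: `eval_prefix("- - 2 3 4".split())` is the left-grouped value -5, and `eval_prefix("- 2 - 3 4".split())` is the right-grouped value 3.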
Extended BNF

Adds new metasymbols (or operations) to BNF to enhance readability and writability.
These new extensions do not enhance the descriptive power of BNF.
It facilitates development of parsing tools based on an approach called Recursive-Descent Parsing.
New metasymbols added to EBNF:
   {}     zero or more repetitions
   []     optional parts
   (|)    multiple-choice
Extended BNF (cont.)

Examples:
BNF:   <number> → <number> <digit> | <digit>
EBNF:  <number> → <digit> {<digit>}

BNF:   <expr> → <expr> + <term> | <term>
EBNF:  <expr> → <term> {+ <term>}

BNF:   <expr> → <term> ^ <expr> | <term>
EBNF:  <expr> → <term> [^ <expr>]

BNF:   <selection> → if <logic-expr> then <statement>
                   | if <logic-expr> then <statement> else <statement>
EBNF:  <selection> → if <logic-expr> then <statement> [else <statement>]

BNF:   <for-stmt> → for <var> := <expr> to <expr> do <statement>
                  | for <var> := <expr> downto <expr> do <statement>
EBNF:  <for-stmt> → for <var> := <expr> (to | downto) <expr> do <statement>
Extended BNF (cont.)

More examples:
BNF:
<expr> → <expr> + <term> | <term>
<term> → <term> * <power> | <term> / <power> | <term> % <power> | <power>
<power> → <factor> ^ <power> | <factor>
<factor> → (<expr>) | NUMBER
NUMBER = [0-9]+
EBNF:
<expr> → <term> {+ <term>}
<term> → <power> { * <power> | / <power> | % <power> }
<power> → <factor> [^ <power>]
<factor> → (<expr>) | NUMBER
NUMBER = [0-9]+
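The EBNF rules above map almost line for line onto a recursive-descent parser: `{ ... }` becomes a loop and `[ ... ]` becomes an if. The Python sketch below evaluates while it parses. The function names are invented, treating `/` as integer division is an assumption (the grammar says nothing about meaning), and unrecognized characters are silently skipped by the tokenizer.

```python
import re


def evaluate(text):
    """Parse and evaluate an expression using one function per EBNF rule."""
    tokens = re.findall(r"[0-9]+|[+*/%^()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expr():
        # <expr> → <term> {+ <term>}
        value = term()
        while peek() == "+":          # { ... } becomes a loop
            take()
            value += term()
        return value

    def term():
        # <term> → <power> { * <power> | / <power> | % <power> }
        value = power()
        while peek() in ("*", "/", "%"):
            op = take()
            right = power()
            if op == "*":
                value *= right
            elif op == "/":
                value //= right       # assumption: integer division
            else:
                value %= right
        return value

    def power():
        # <power> → <factor> [^ <power>]
        base = factor()
        if peek() == "^":             # [ ... ] becomes an if
            take()
            return base ** power()    # recursion on the right: right-associative
        return base

    def factor():
        # <factor> → (<expr>) | NUMBER
        token = peek()
        if token == "(":
            take()
            value = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            take()
            return value
        if token is not None and token.isdigit():
            return int(take())
        raise SyntaxError(f"unexpected token {token!r}")

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result
```

Because `power` recurses on its right operand, `evaluate("2 ^ 3 ^ 2")` computes 2 ^ (3 ^ 2) = 512, while the loops in `expr` and `term` keep `+`, `*`, `/` and `%` left-associative.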
Syntax Diagrams

A graphical representation for a grammar rule
An alternative to EBNF
Circles or ovals for terminals
Squares or rectangles for nonterminals
Terminals and nonterminals are connected with lines and arrows
Visually appealing but takes up space
Rarely seen any more: EBNF is much more compact
[Figure: syntax diagram for if-statement: if ( expression ) statement, followed by an optional else statement branch]