Grammars

Grammars

CPSC 5135

Formal Definitions

• A symbol is a character. It represents an abstract entity that has no inherent meaning

• Examples: a, A, 3, *, - ,=

Formal Definitions

• An alphabet is a finite set of symbols.

• Examples:

A = { a, b, c }

B = { 0, 1 }

Formal Definitions

• A string (or word) is a finite sequence of symbols from a given alphabet.

• Examples:

S = { 0, 1 } is a alphabet

0, 1, 11010, 101, 111 are strings from S

A = { a, b, c ,d } is an alphabet bad, cab, dab, d, aaaaa are strings from A

Formal Definitions

• A language is a set of strings from an alphabet.

• The set can be finite or infinite.

• Examples:

A = { 0, 1}

L1 = { 00, 01, 10, 11 }

L2 = { 010, 0110, 01110,011110,

…}

Formal Definitions

• A grammar is a quadruple G = (V, Σ, R, S) where

1) V is a finite set of variables (non-terminals),

2) Σ is a finite set of terminals, disjoint from V,

3) R is a finite set of rules. The left side of each rule is a string of one or more elements from V U Σ and whose right side is a string of 0 or more elements from V U Σ

4) S is an element of V and is called the start symbol

Formal Definitions

• Example grammar:

• G = (V, Σ, R, S)

V = { S, A }

Σ = { a, b }

R = { S → aA

A → bA

A → a }

Derivations

R = S

→ aA

A

→ bA

A

→ a

• A derivation is a sequence of replacements , beginning with the start symbol, and replacing a substring matching the left side of a rule with the string on the right side of a rule

S → aA

→ abA

→ abbA

→ abba

Derivations

• What strings can be generated from the following grammar?

S → aBa

B → aBa

B → b

Formal Definitions

• The language generated by a grammar is the set of all strings of terminal symbols which are derivable from S in 0 or more steps.

• What is the language generated by this grammar?

• S → a

S → aB

B → aB

B → a

Kleene Closure

• Let Σ be a set of strings. Σ* is called the

Kleene closure of Σ and represents the set of all concatenations of 0 or more strings in Σ.

• Examples

Σ* = { 1 }* = { ø, 1, 11, 111, 1111, …}

Σ* = { 01 }* = { ø, 01, 0101, 010101, …}

Σ* = { 0 + 1 }* = set of all possible strings of 0’s and 1’s. (+ means union)

Formal Definitions

• A grammar G = (V,Σ, R, S) is right-linear if all rules are of the form:

A → xB

A → x where A, B ε V and x ε Σ*

Right-linear Grammar

• G = { V, Σ, R, S }

V = { S, B }

Σ = { a, b }

R = {

S → aS ,

S → B ,

B → bB ,

B → ε

}

What language is generated?

Formal Definitions

• A grammar G = (V,Σ, R, S) is left-linear if all rules are of the form:

A → Bx

A → x where A, B ε V and x ε Σ*

Formal Definitions

• A regular grammar is one that is either right or left linear.

• Let Q be a finite set and let Σ be a finite set of symbols. Also let δ be a function from Q x Σ to

Q, let q

0 be a state in Q and let A be a subset of Q.

We call each element of Q a state, δ the transition

function, q

0 the initial state and A the set of

accepting states.

Then a deterministic finite automaton (DFA) is a 5tuple < Q , Σ , q

0

, δ , A >

• Every regular grammar is equivalent to a DFA

Language Definition

• Recognition – a machine is constructed that reads a string and pronounces whether the string is in the language or not. (Compiler)

• Generation – a device is created to generate strings that belong to the language.

(Grammar)

Chomsky Hierarchy

• Noam Chomsky (1950’s) described 4 classes of grammars

1) Type 0 – unrestricted grammars

2) Type 1 – Context sensitive grammars

3) Type 2 – Context free grammars

4) Type 3 – Regular grammars

Grammars

• Context-free and regular grammars have application in computing

• Context-free grammar – each rule or production has a left side consisting of a single non-terminal

Backus-Naur form (BNF)

• BNF was used to describe programming language syntax and is similar to Chomsky’s context free grammars

• A meta-language is a language used to describe another language

• BNF is a meta-language for computer languages

BNF

• Consists of nonterminal symbols, terminal symbols (lexemes and tokens), and rules or productions

• <if-stmt> → if <logical-expr> then <stmt>

• <if-stmt> → if <logical-expr> then <stmt> else <stmt>

• <if-stmt> → if <logical-expr> then <stmt>

| if <logical-expr> then <stmt> else <stmt>

A Small Grammar

<program>  begin <stmt_list> end

<stmt_list>  <stmt>

| <stmt> ; <stmt_list>

<stmt>  <var> = <expression>

<var>  A | B | C

<expression>  <var> + <var>

| <var> - <var>

| <var>

A Derivation


 begin <stmt> end

 begin <var> = <expression> end

 begin A = <expression> end

 begin A = <var> + <var> end

 begin A = B + <var> end

 begin A = B + C end

Terms

• Each of the strings in a derivation is called a

sentential form.

• If the leftmost non-terminal is always the one selected for replacement, the derivation is a

leftmost derivation.

• Derivations can be leftmost, rightmost, or neither

• Derivation order has no effect on the language generated by the grammar

Derivations Yield Parse Trees


 begin <stmt> end

 begin <var> = <expression> end

 begin A = <expression> end

 begin A = <var> + <var> end

 begin A = B + <var> end

 begin A = B + C end

<Program> begin <stmt_list> end

<stmt>

<var> = <expression>

A <var> + <var>

B C

Parse Trees

• Parse trees describe the hierarchical structure of the sentences of the language they define.

• A grammar that generates a sentence for which there are two or more distinct parse trees is ambiguous.

An Ambiguous Grammar

<assign>  <id> = <expr>

<id>  A | B | C

<expr>  <expr> + <expr>

| <expr> * <expr>

| ( <expr> )

| <id>

Two Parse Trees – Same Sentence

<assign>

<id> = <expr>

A <expr> + <expr>

<id> <expr> * <expr>

B <id> <id>

C A

<assign>

<id> = <expr>

A <expr> * <expr>

<expr> + <expr> <id>

<id> <id> A

B C

Derivation 1


 A = <expr>

 A = <expr> + <expr>

 A = <id> + <expr>

 A = B + <expr>

 A = B + <expr> * <expr>

 A = B + <id> * <expr>

 A = B + C * <expr>

 A = B + C * <id>

 A = B + C * A

Derivation 2


 A = <expr>

 A = <expr> * <expr>

 A = <expr> + <expr> * <expr>

 A = <id> + <expr> * <expr>

 A = B + <expr> * <expr>

 A = B + <id> * <expr>

 A = B + C * <expr>

 A = B + C * <id>

 A = B + C * A

Ambiguity

• Parse trees are used to determine the semantics of a sentence

• Ambiguous grammars lead to semantic ambiguity - this is intolerable in a computer language

• Often, ambiguity in a grammar can be removed

Unambiguous Grammar


<id>  A | B | C

<expr>  <expr> + <term> | <term>

<term>  <term> * <factor> | <factor>

<factor>  ( <expr> ) | <id>

• This grammar makes multiplication take precedence over addition

Associativity of Operators

<assign>



<id> = <expr>

<id>



A | B | C

<expr>



<expr> + <term> |

<term>

<term>



<term> * <factor> |

<factor>

<factor>



( <expr> ) | <id>

Addition operators associate from left to right

<assign>

<id> = <expr>

A <expr> + <term>

<expr> + <term> <factor>

<term> <factor> <id>

<factor> <id> A

<id> C

B

BNF

• A BNF rule that has its left hand side appearing at the beginning of its right hand side is left recursive .

• Left recursion specifies left associativity

• Right recursion is usually used for associating exponetiation operators

<factor>  <exp> ** <factor> | <exp>

<exp>  ( <expr> ) | <id>

Ambiguous If Grammar

<stmt>  <if_stmt>

<if_stmt>  if <logic_expr> then <stmt> | if <logic_expr> then <stmt> else <stmt>

• Consider the sentential form: if <logic_expr> then if <logic_expr> then <stmt> else <stmt>

Parse Trees for an If Statement

<if_stmt>

<if_stmt>

If <logic_expr> then <stmt> else <stmt>

If <logic_expr> then <stmt>

<if_stmt>

<if_stmt> if <logic_expr> then <stmt> if <logic_expr> then <stmt> else <stmt>

Unambiguous Grammar for If

Statements

<stmt>  <matched> | <unmatched>

<matched>  if <logic_expr> then <matched> else <matched>

| any non-if statement

<unmatched>  if <logic_expr> then <stmt>

| if <logic_expr> then <matched> else <unmatched>

Extended BNF (EBNF)

• Optional part denoted by […]

<selection>  if ( <expr> ) <stmt> [ else <stmt> ]

• Braces used to indicate the enclosed part can be repeated indefinitely or left out

<ident_list>  <identifier> { , <identifier> }

• Multiple choice options are put in parentheses and separated by the or operator |

<for_stmt>  for <var> := <expr> (to | downto) <expr> do <stmt>

BNF vs EBNF for Expressions

BNF:

<expr>  <expr> + <term>

| <expr> - <term>

| <term>

<term>  <term> * <factor>

| <term> / <factor>

| <factor>

EBNF:

<expr>  <term> { (+ | - ) <term> }

<term>  <factor> { ( * | / ) <factor>

Grammars - Columbus State University

Formal Definitions

Formal Definitions

Formal Definitions

Formal Definitions

Formal Definitions

Formal Definitions

Derivations

Derivations

Formal Definitions

Kleene Closure

Formal Definitions

Right-linear Grammar

Formal Definitions

Formal Definitions

Language Definition

Chomsky Hierarchy

Grammars

Backus-Naur form (BNF)

BNF

A Small Grammar

A Derivation

Terms

Derivations Yield Parse Trees

Parse Trees

An Ambiguous Grammar

Two Parse Trees – Same Sentence

Derivation 1

Derivation 2

Ambiguity

Unambiguous Grammar

Associativity of Operators

BNF

Ambiguous If Grammar

Parse Trees for an If Statement

Unambiguous Grammar for If

Statements

Extended BNF (EBNF)

BNF vs EBNF for Expressions

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib