Uploaded by Sanjay A C

ATCD

advertisement
AUTOMATA THEORY AND COMPILER DESIGN
CHAPTER 1
INTRODUCTION TO THE AUTOMATA
THEORY AND COMPILER DESIGN
1.1 Description of Automata Theory:
Automata Theory is a branch of theoretical computer science that deals with abstract
machines and computational models. It encompasses the study of formal languages, which
are sets of strings or sequences of symbols, and the abstract machines that recognize or
generate these languages. The key concepts and components of Automata Theory include:
• Formal Languages: These are sets of strings over a finite alphabet. Formal languages
are classified based on their properties, such as regular languages, context-free
languages, context-sensitive languages, and recursively enumerable languages.
• Alphabets: An alphabet is a finite set of symbols from which strings are formed. In
Automata Theory, alphabets are typically denoted by Σ.
• Automata: Automata are abstract machines used to recognize languages. There are
several types of automata, including finite automata (FA), pushdown automata
(PDA), and Turing machines (TM). These machines vary in power and capability,
with Turing machines being the most powerful.
• Finite Automata (FA): Finite automata are abstract machines with a finite set of
states. They read input symbols from an input tape and transition between states
based on the input. Finite automata are used to recognize regular languages.
• Pushdown Automata (PDA): Pushdown automata are extensions of finite automata
with an additional stack memory. They can recognize context-free languages, which
are more powerful than regular languages.
• Turing Machines (TM): Turing machines are the most powerful computational model
in Automata Theory. They consist of an infinite tape divided into cells, a read/write
head that moves left or right on the tape, and a finite set of states. Turing machines
can recognize recursively enumerable languages, which include all languages that
can be computed by any algorithm.
• Regular Expressions: Regular expressions are compact notations for specifying
regular languages. They are used in pattern matching and text processing
applications.
•
•
Language Recognition and Generation: Automata Theory provides methods for
recognizing whether a given string belongs to a particular language (acceptance
problem) and generating strings that belong to a language (generation problem).
Automata Theory has applications in various areas of computer science, including
compiler design, parsing, formal verification, and natural language processing. It
provides fundamental insights into the capabilities and limitations of computational
Department of CSE , AIET
1
AUTOMATA THEORY AND COMPILER DESIGN
systems, helping researchers and practitioners understand the nature of computation
and develop efficient algorithms and systems.
1.2 Introduction to DFA:
1.2.1 Definition of DFA:
A DFA, or Deterministic Finite Automaton, is a computational model used in Automata
Theory and formal language theory. It consists of five components:
1. Alphabet (Σ): A finite set of symbols from which input strings are formed.
2. States (Q): A finite set of states representing different configurations that the
automaton can be in at any given time.
3. Transition Function (δ): A function that maps each state and input symbol pair
to a unique next state. It describes how the automaton transitions between states
based on the input symbols it reads.
4. Start State (q₀): The initial state of the automaton where it begins its computation.
5. Accepting States (F): A subset of states that designate which configurations are
considered "accepting" or "final". If the automaton reaches any of these states after
processing an input string, it recognizes the input string as belonging to the
language defined by the automaton.
Formally, a DFA is defined by the tuple (Q, Σ, δ, q₀, F), where:
•
Q is the set of states.
•
Σ is the input alphabet.
•
δ is the transition function, δ: Q × Σ → Q.
•
q₀ is the start state, q₀ ∈ Q.
•
F is the set of accepting states, F ⊆ Q.
1.2.2 Example for DFA:
Design a FA with ∑ = {0, 1} accepts those string which starts with 1 and ends
with 0.
Solution:
The FA will have a start state q0 from which only the edge with input 1 will go to the next
state.
Department of CSE , AIET
2
AUTOMATA THEORY AND COMPILER DESIGN
Fig 1.1 Example for DFA
Present State
Next State for Input 0
Next State for Input 1
a
a
b
b
c
a
c
b
c
In state q1, if we read 1, we will be in state q1, but if we read 0 at state q1, we will reach to
state q2 which is the final state. In state q2, if we read either 0 or 1, we will go to q2 state or
q1 state respectively. Note that if the input ends with 0, it will be in the final state.
1.1 Introduction to NFA:
1.3.1 Definition of NFA:
An NFA, or Non-deterministic Finite Automaton, is another type of computational model
used in Automata Theory and formal language theory. Like a DFA, it consists of five
components:
1. Alphabet (Σ): A finite set of symbols from which input strings are formed.
2. States (Q): A finite set of states representing different configurations that the
automaton can be in at any given time.
3. Transition Function (δ): Unlike in a DFA, the transition function in an NFA can
map a state and input symbol pair to multiple possible next states or to a special
empty transition. This non-determinism allows for branching behaviour, where the
automaton can explore multiple paths simultaneously.
4. Start State (q₀): The initial state of the automaton where it begins its computation.
Department of CSE , AIET
3
AUTOMATA THEORY AND COMPILER DESIGN
5. Accepting States (F): A subset of states that designate which configurations are
considered "accepting" or "final". If the automaton reaches any of these states after
processing an input string, it recognizes the input string as belonging to the
language defined by the automaton.
Formally, an NFA is defined by the tuple (Q, Σ, δ, q₀, F), where:
•
Q is the set of states.
•
Σ is the input alphabet.
•
δ is the transition function, δ: Q × (Σ 𝖴 {ε}) → 2^Q, where ε denotes an empty
transition.
•
q₀ is the start state, q₀ ∈ Q.
•
F is the set of accepting states, F ⊆ Q.
An NFA differs from a DFA in that it allows for non-deterministic transitions, meaning
that for a given state and input symbol, there can be multiple possible next states or an
empty transition. This non-determinism allows NFAs to represent certain language classes
more compactly than DFAs.
1.3.2 Example for NFA:
Design an NFA with ∑ = {0, 1} in which double '1' is followed by
double '0'. Solution:
The FA with double 1 is as follows:
Fig 1.2 Example for NFA
It should be immediately followed by double 0.
Then,
Department of CSE , AIET
4
AUTOMATA THEORY AND COMPILER DESIGN
Now before double 1, there can be any string of 0 and 1. Similarly, after double 0, there can
be any string of 0 and 1.
Hence the NFA becomes:
Now considering the string 01100011
q0 → q1 → q2 → q3 → q4 → q4 → q4 → q4
Present
Next State for
State
Input 0
Next State for Input
1
a
a, b
b
b
c
a, c
c
b, c
c
1.4Introduction to Compilers:
A compiler is a crucial software tool in computer science that translates high-level
programming language code into machine code that a computer's processor can execute. It
comprises several stages, including lexical analysis, syntax analysis, semantic analysis,
intermediate code generation, optimization, and code generation. Understanding compilers
is essential for programmers to write efficient code and grasp programming language
fundamentals.
1.4.1 Description of Compilers:
A compiler is a fundamental tool in computer science that translates source code written in
a high-level programming language into machine code that can be executed by a computer's
Department of CSE , AIET
5
AUTOMATA THEORY AND COMPILER DESIGN
processor. The process of compilation involves several stages, each of which plays a crucial
role in transforming human-readable code into executable instructions.
1. Lexical Analysis (Scanning): This initial stage involves breaking the source code into
smaller units called tokens. These tokens can be keywords, identifiers, literals, or
symbols. The purpose of lexical analysis is to simplify the code structure for further
processing.
2. Syntax Analysis (Parsing): In this stage, the compiler analyses the structure of the code
according to the rules of the programming language's grammar. It creates a parse tree
or syntax tree, which represents the hierarchical structure of the code. Syntax analysis
ensures that the code follows the syntactic rules of the language.
3. Semantic Analysis: After syntax analysis, the compiler performs semantic analysis to
check the meaning of the code. It verifies that the code adheres to the semantic rules of
the language, such as type compatibility, variable declarations, and scope rules.
Semantic analysis helps in detecting and reporting errors that cannot be identified
through syntax alone.
4. Intermediate Code Generation: At this stage, the compiler generates an intermediate
representation of the source code. This intermediate code is often platform-independent
and serves as an intermediate step before generating the target machine code. Common
forms of intermediate code include abstract syntax trees (ASTs) or three-address code.
5. Optimization: Optimization is an optional but crucial stage where the compiler
improves the intermediate code to enhance the performance of the generated
executable. Various optimization techniques, such as constant folding, loop
optimization, and code motion, are applied to make the code more efficient while
preserving its functionality.
6. Code Generation: In the final stage, the compiler translates the optimized intermediate
code into the target machine code specific to the hardware architecture. This machine
code consists of low-level instructions that the processor can directly execute. Code
generation involves mapping high-level language constructs to corresponding machine
instructions and managing memory allocation and usage.
1.5 Difference between Compiler and Interpreter
Department of CSE , AIET
6
AUTOMATA THEORY AND COMPILER DESIGN
Table 1.1 Difference Between Compiler and Interpreter
Feature
Compiler
Interpreter
Input
Takes entire source code as input.
Output
Generates machine
intermediate code.
Execution Speed
Generally faster execution.
Tends to have slower execution.
Standalone Program
Generates standalone executable
files.
Does not produce
executable files.
Debugging
May be more challenging to debug Easier to debug as errors are
due
to
optimization and encountered during execution.
compilation steps.
Memory Usage
Compiler-generated executables
may require less memory.
Interpreter consumes memory for
both source code and runtime
environment.
Portability
Compiled executables are less
Interpreted code is more portable,
as long as the interpreter is
available for the target platform.
portable
platforms.
Department of CSE , AIET
code
across different
Takes source code line by line as
input.
or Executes code directly without
generating executable files
standalone
7
AUTOMATA THEORY AND COMPILER DESIGN
CHAPTER 2
2. PROBLEM STATEMENT
Consider the following grammar:
S → Ako
A → Ad | aB | aC
B → bBc | r
Construct the LL(1) Parsing Table. Show the Parsing moves of LL(1) Parser to
parse the string arko
2.1 Brief Logic for Problem Solution:
2.1.1 Calculate First and Follow sets:
Determine the set of first terminals that can start from each non-terminal. Identify the set of
terminals that can follow each non-terminal.
2.1.2 Construct LL (1) Parsing Table:
Use the First and Follow sets to populate the LL (1) parsing table. For each non-terminal and
terminal combination, fill in the corresponding production rule or rules.
2.1.3 Check for LL (1) Grammar:
Verify that each entry in the parsing table is unique. Ensure that there are no conflicts
(multiple entries) for the same non-terminal and terminal combination.
2.1.4 Construct Parsing Moves of LL(1) Parser for the string arko:
Department of CSE , AIET
8
AUTOMATA THEORY AND COMPILER DESIGN
CHAPTER 3
SOLUTION
The given grammar is as follows:
S -> cABc
A -> aAa | c
B -> bBb | c
3.1. Grammar Rewrite:
The grammar can be written as,
S -> cABc
A -> aAa
A -> c
B -> bBb
B -> c
3.2. Calculate FIRST:
Calculating FIRST(S)
FIRST(S) = FIRST(cABc)
FIRST(S) = FIRST(c)
FIRST(S) = {c}
Calculating FIRST(A)
FIRST(A) = FIRST(aAa) U FIRST(c)
FIRST(A) = FIRST(a) U FIRST(c)
FIRST(A) = {a} U {c}
FIRST(A) = {a,c}
Calculating FIRST(B)
Department of CSE , AIET
9
AUTOMATA THEORY AND COMPILER DESIGN
FIRST(B) = FIRST(bBb) U FIRST(c)
FIRST(B) = FIRST(b) U FIRST(c)
FIRST(B) = {b} U {c}
FIRST(B) = {b,c}
Therefore,
FIRST(S) = {c}
FIRST(A) = {a,c}
FIRST(B) = {b,c}
3.3. Calculate FOLLOW:
Calculating FOLLOW(S)
FOLLOW(S) = {$}
Calculating FOLLOW(A)
FOLLOW(A) = FIRST(B) U FIRST(a)
FOLLOW(A) = {b,c} U {a,c}
FOLLOW(A) = {a,b,c}
Calculating FOLLOW(B)
FOLLOW(B) = FIRST(c) U FIRST(b)
FOLLOW(B) = {c} U {B}
FOLLOW(B) = {b,c}
3.4. Parse Table:
Construct parsing table by placing all terminals in columns and non-terminals in rows
Blanks are error entries; non-blanks indicate a production with which to expand a non-terminal.
Department of CSE , AIET
10
AUTOMATA THEORY AND COMPILER DESIGN
CONCLUSION:
The parsing process for "arko" revealed conflicts in the LL(1) parsing table, particularly
when popping the non-terminal A with input r and o. This suggests the grammar you
provided (S -> Ako, A -> Ad | aB | aC, B -> bBc | r) might not be LL(1). In an LL(1)
grammar, the parsing table wouldn't have these ambiguities, allowing for a clear and
unambiguous parsing process. It's possible that the grammar needs adjustments to remove
left recursion or other factors that prevent LL(1) properties, or there might be an error in
how the parsing table was constructed. Further investigation of the grammar and potential
modifications would be necessary to determine if a valid LL(1) parsing table can be
achieved.
Department of CSE , AIET
11
AUTOMATA THEORY AND COMPILER DESIGN
REFERENCE
• Textbook 1: John E Hopcroft, Rajeev Motwani, Jeffrey D. Ullman,
“Introduction to Automata Theory, Languages and Computation”, Third
Edition, Pearson
• Textbook 2: Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D.
Ullman, “Compilers Principles, Techniques and Tools”, Second
Edition, Pearson.
• https://www.geeksforgeeks.org/shift-reduce-parser-compiler
• https://www.tutorialspoint.com/what-is-shift-reduce-parser
•
https://www.javatpoint.com/shift-reduce-parsing
Department of CSE , AIET
12
Download