Uploaded by Abdullah Falak

Compiler Construction Basics

Muhammad Abdullah Falak
BSCS 5TH C
13961
COMPILER CONSTRUCTION
MA’AM FARAH NAZ
ASSIGNMENT # 01
PROBLEM STATEMENT:
Explain the following:
• Language Translator
• Types of Language Translators
• Cousins of Compiler
• Phases of Compiler
• Overview of Compiler
SOLUTION:
1. Language Translator:
Computers are electronic devices that can only understand machine-level binary code (0/1 or
on/off). Because it is extremely difficult to read and write programs in machine language,
developers use human-readable high-level and assembly languages. To bridge that gap, a
translator is used, which converts high-level instructions into machine-level instructions
(0s and 1s).
A translator is a programming language processor that converts a program from one language
to another, typically a high-level or assembly language program into low-level machine
language, without changing what the code does. It takes a program written in the source
language and produces an equivalent program in the target language. It can also detect and
report errors during translation.
2. Types of Language Translators:
There are various types of translators and related tools, which are as follows −
• Compiler − A compiler is a program that translates a high-level language (for
example, C, C++, and Java) into a low-level language (object program or
machine program). The compiler carries out this translation in several phases:
the character stream supplied by the user passes through multiple stages of
compilation, the last of which produces the target language.
• Pre-Processor − A pre-processor is a program that processes the source code
before it passes to the compiler. It operates under the control of
pre-processor command lines, also called directives.
• Assembler − An assembler is a translator that translates an assembly language
program into an equivalent machine language program. Assembly language
provides a friendlier representation than the computer's 0s and 1s, which
simplifies writing and reading programs.
An assembler reads a single assembly language source file and creates an object
file containing machine instructions and bookkeeping data that supports merging
several object files into one program.
• Interpreter − An interpreter is a program that executes the programming code
directly rather than only translating it into another format. It translates and
executes programming language statements one by one.
• Macros − Many assembly languages support a “macro” facility whereby a
macro statement is translated into a sequence of assembly language statements
and possibly other macro statements before being translated into machine code.
A macro facility is therefore a text replacement facility.
• Linker − A linker is a computer program that connects and combines multiple
object files to create an executable file. These files may have been produced
by separate assemblers. The function of a linker is to find the modules/routines
referenced in a program and to decide the memory locations where this code
will be loaded, so that the program's instructions have absolute references.
• Loader − The loader is a part of the operating system and is responsible for
loading executable files into memory and executing them. It computes
the size of a program (instructions and data) and allocates memory space for it.
It initializes several registers to start execution.
It creates a new address space for the program. This address space is large enough to hold the
text and data segments, along with a stack segment. It copies instructions and data from the
executable file into the new address space.
Cousins of Compiler:
1. Preprocessor
2. Assembler
3. Loader and Link-editor
Preprocessor
A preprocessor is a program that processes its input data to produce output
that is used as input to another program. The output is said to be a preprocessed
form of the input data, which is often used by some subsequent programs like
compilers.
They may perform the following functions:
1. Macro processing
2. File Inclusion
3. Rational Preprocessors
4. Language extension
1. Macro processing:
A macro is a rule or pattern that specifies how a certain input sequence
should be mapped to an output sequence according to a defined procedure. The
mapping process that instantiates a macro into a specific output sequence is
known as macro expansion.
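Macro expansion as described above can be sketched as plain text replacement. The following is a minimal illustration, not how any particular preprocessor is implemented; the macro table `MACROS` and the function `expand_macros` are hypothetical names, and only simple macros without parameters are handled.

```python
import re

# Hypothetical macro table: macro name -> replacement text.
MACROS = {"PI": "3.14159", "MAX_LEN": "100"}

def expand_macros(line, macros):
    """Replace every whole-word macro name in `line` with its body."""
    def substitute(match):
        word = match.group(0)
        # Identifiers that are not macros are left unchanged.
        return macros.get(word, word)
    return re.sub(r"[A-Za-z_]\w*", substitute, line)

print(expand_macros("area = PI * r * r;", MACROS))
# prints: area = 3.14159 * r * r;
```

Matching whole identifiers rather than raw substrings means a name such as `PIE` is not affected by the `PI` macro.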
2. File Inclusion:
The preprocessor includes header files in the program text. When the
preprocessor finds an #include directive it replaces it with the entire content of
the specified file.
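The replacement of an #include directive by the file's content can be sketched as follows. This is only an illustration under simplifying assumptions: the "files" live in a hypothetical in-memory dictionary `FILES`, the function name `include_files` is invented for the example, and there is no protection against files that include each other in a cycle.

```python
# Hypothetical in-memory "file system": file name -> contents.
FILES = {
    "limits.h": "#define MAX 10",
    "main.c": '#include "limits.h"\nint a[MAX];',
}

def include_files(name, files):
    """Return the text of `name` with each #include line replaced by the included file."""
    output = []
    for line in files[name].splitlines():
        stripped = line.strip()
        if stripped.startswith("#include"):
            # Extract the file name from "name" or <name>.
            target = stripped[len("#include"):].strip().strip('"<>')
            # Included files may themselves contain #include directives.
            output.append(include_files(target, files))
        else:
            output.append(line)
    return "\n".join(output)

print(include_files("main.c", FILES))
```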
3. Rational Preprocessors:
These processors augment older languages with more modern flow-of-control
and data-structuring facilities.
4. Language extension:
These processors attempt to add capabilities to the language by what
amounts to built-in macros. For example, the language Equel is a database query
language embedded in C.
Assembler
The assembler creates object code by translating assembly instruction
mnemonics into machine code. There are two types of assemblers:
• One-pass assemblers go through the source code once and assume that all
symbols will be defined before any instruction that references them.
• Two-pass assemblers create a table with all symbols and their values in
the first pass, and then use the table in a second pass to generate code.
Fig. 1.7 Translation of a statement
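The two-pass scheme can be illustrated with a small sketch. The toy assembly language here is an assumption made for the example: a line ending in ":" defines a label, every other line is one instruction occupying one address, and the mnemonics (`JMP`, `NOP`, `HALT`) are invented.

```python
def assemble(lines):
    """Two-pass assembly of a toy language: one instruction per address."""
    # Pass 1: build the symbol table (label -> address).
    symbols, address = {}, 0
    for line in lines:
        if line.endswith(":"):
            symbols[line[:-1]] = address
        else:
            address += 1
    # Pass 2: emit instructions with labels replaced by their addresses.
    code = []
    for line in lines:
        if line.endswith(":"):
            continue
        op, *args = line.split()
        code.append((op, *[symbols.get(arg, arg) for arg in args]))
    return code

program = ["JMP end", "NOP", "end:", "HALT"]
print(assemble(program))
# prints: [('JMP', 2), ('NOP',), ('HALT',)]
```

The jump to `end` is a forward reference: the label is used before it is defined. A second pass resolves it without trouble, whereas a one-pass assembler would need the symbol to be defined first.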
Linker and Loader
A linker or link editor is a program that takes one or more object files
generated by a compiler and combines them into a single executable program.
Three tasks of the linker are:
1. Searches the program to find library routines used by the program, e.g. printf()
and math routines.
2. Determines the memory locations that code from each module will occupy and
relocates its instructions by adjusting absolute references.
3. Resolves references among files.
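The relocation and reference-resolution tasks can be sketched roughly as below. The object-module layout (a size, a table of defined symbols, and a table of references) is a hypothetical simplification and does not correspond to any real object file format; the module and symbol names are invented.

```python
# Hypothetical object modules: code size, symbols defined (name -> offset),
# and references (offset of the referencing instruction -> symbol name).
modules = [
    {"name": "main.o", "size": 4, "defs": {"main": 0}, "refs": {2: "square"}},
    {"name": "math.o", "size": 3, "defs": {"square": 1}, "refs": {}},
]

def link(modules):
    # Step 1: place the modules one after another and record each base address.
    base, bases = 0, {}
    for module in modules:
        bases[module["name"]] = base
        base += module["size"]
    # Step 2: relocate every defined symbol to its absolute address.
    table = {}
    for module in modules:
        for symbol, offset in module["defs"].items():
            table[symbol] = bases[module["name"]] + offset
    # Step 3: resolve each reference to the absolute address of its symbol.
    resolved = {}
    for module in modules:
        for offset, symbol in module["refs"].items():
            if symbol not in table:
                raise NameError(f"undefined symbol: {symbol}")
            resolved[bases[module["name"]] + offset] = table[symbol]
    return table, resolved

table, resolved = link(modules)
print(table)     # absolute address of each symbol
print(resolved)  # address of each referencing instruction -> target address
```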
A loader is the part of an operating system that is responsible for loading
programs in memory, one of the essential stages in the process of starting a
program.
Phases of Compiler:
The 6 phases of a compiler are:
1. Lexical Analysis
2. Syntactic Analysis or Parsing
3. Semantic Analysis
4. Intermediate Code Generation
5. Code Optimization
6. Code Generation
1. Lexical Analysis: Lexical analysis is the first phase of the compiler. The
lexical analyzer scans the source code and transforms the input program into a
series of tokens.
A token is a sequence of characters that forms a unit of information in the
source code.
NOTE: In computer science, a program that performs lexical analysis is called
a scanner, tokenizer, or lexer.
Roles and Responsibilities of Lexical Analyzer
• It removes comments and white space from the source program.
• It identifies the tokens.
• It categorizes the lexical units.
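These responsibilities can be sketched with a small regular-expression tokenizer. The token classes (`NUMBER`, `ID`, `OP`) and the `//` comment syntax are assumptions chosen for a tiny expression language, not part of any specific compiler.

```python
import re

# Assumed token classes for a tiny expression language.
# SKIP comes first so that "//" starts a comment instead of two "/" operators.
TOKEN_SPEC = [
    ("SKIP",   r"\s+|//[^\n]*"),   # white space and line comments are discarded
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=()]"),
]
PATTERN = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_SPEC))

def tokenize(source):
    """Turn source text into a list of (category, text) tokens."""
    tokens, pos = [], 0
    while pos < len(source):
        match = PATTERN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

print(tokenize("x = 3 + y // sum"))
# prints: [('ID', 'x'), ('OP', '='), ('NUMBER', '3'), ('OP', '+'), ('ID', 'y')]
```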
2. Syntax Analysis:
Syntax analysis is the second phase of compilation. Here the token stream is
checked against the grammar of the language: the analyzer examines the
syntactic structure of the input and decides whether it is correct in terms of
programming syntax.
It accepts tokens as input and provides a parse tree as output. It is also known as
parsing.
Roles and Responsibilities of Syntax Analyzer
• Report syntax errors.
• Help in building a parse tree.
• Acquire tokens from the lexical analyzer.
• Check the input for syntax errors, if any.
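A recursive-descent parser is one common way to turn tokens into a tree. The sketch below assumes tokens are `(category, text)` pairs and a small arithmetic grammar (expr, term, factor) invented for the example; the tree is represented as nested tuples.

```python
def parse(tokens):
    """Parse a token list into a tree of nested tuples, or raise SyntaxError."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("EOF", "")

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def factor():                       # factor : NUMBER | ID | "(" expr ")"
        kind, text = advance()
        if kind in ("NUMBER", "ID"):
            return (kind, text)
        if text == "(":
            node = expr()
            if advance()[1] != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {text!r}")

    def term():                         # term : factor (("*" | "/") factor)*
        node = factor()
        while peek()[1] in ("*", "/"):
            op = advance()[1]
            node = (op, node, factor())
        return node

    def expr():                         # expr : term (("+" | "-") term)*
        node = term()
        while peek()[1] in ("+", "-"):
            op = advance()[1]
            node = (op, node, term())
        return node

    tree = expr()
    if peek()[0] != "EOF":
        raise SyntaxError(f"unexpected token {peek()[1]!r}")
    return tree

tokens = [("ID", "a"), ("OP", "+"), ("NUMBER", "2"), ("OP", "*"), ("ID", "b")]
print(parse(tokens))
# prints: ('+', ('ID', 'a'), ('*', ('NUMBER', '2'), ('ID', 'b')))
```

Because `term` is called from inside `expr`, multiplication binds more tightly than addition, which is how the grammar encodes operator precedence.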
3. Semantic Analysis: Semantic analysis is the third phase of compilation. It
checks whether the parse tree follows the rules of the language. It also keeps
track of identifiers and expressions. In simple words, the semantic analyzer
verifies the validity of the parse tree and produces an annotated syntax tree
as output.
Roles and Responsibilities of Semantic Analyzer:
• Saving collected data in symbol tables or syntax trees.
• Reporting semantic errors.
• Checking for semantic errors.
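A simple semantic check can be sketched as a walk over the tree that consults a symbol table. The tree shape (nested tuples), the two types `int` and `string`, and the names `SemanticError` and `check` are all assumptions made for this illustration.

```python
class SemanticError(Exception):
    """Raised when the tree breaks a rule of the (hypothetical) language."""

def check(node, symbols):
    """Return the type of an expression tree, using `symbols` as the symbol table."""
    kind = node[0]
    if kind == "NUMBER":
        return "int"
    if kind == "ID":
        name = node[1]
        if name not in symbols:
            raise SemanticError(f"undeclared identifier {name!r}")
        return symbols[name]
    # Otherwise the node has the form (operator, left, right).
    op, left, right = node
    left_type, right_type = check(left, symbols), check(right, symbols)
    if left_type != right_type:
        raise SemanticError(f"cannot apply {op!r} to {left_type} and {right_type}")
    return left_type

symbols = {"a": "int", "name": "string"}
print(check(("+", ("ID", "a"), ("NUMBER", "2")), symbols))
# prints: int
```

An expression such as `a + name` is syntactically valid, so the parser accepts it, but this check rejects it because the operand types differ.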
4. Intermediate Code Generation: Once the parse tree has been semantically
verified, the intermediate code generator produces three-address code. The
middle-level code that a compiler generates while translating a source program
into object code is known as intermediate code.
A Few Important Points:
• Intermediate code is neither high-level code nor machine code, but a
middle-level code.
• This code can be translated to machine code later.
• This stage serves as a bridge from analysis to synthesis.
Roles and Responsibilities:
• Helps in maintaining the precedence ordering of the source language.
• Produces code that can later be translated into machine code.
• Holds the correct operands for each instruction.
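Three-address code generation can be sketched as a post-order walk over the expression tree, introducing one temporary per operator. The tuple-based tree and the temporary names `t1`, `t2`, ... are assumptions for the example.

```python
import itertools

def to_three_address(tree):
    """Flatten an expression tree into a list of three-address instructions."""
    code = []
    temps = itertools.count(1)

    def emit(node):
        if node[0] in ("NUMBER", "ID"):
            return node[1]                      # leaves are used directly as operands
        op, left, right = node
        left_name, right_name = emit(left), emit(right)
        temp = f"t{next(temps)}"                # fresh temporary for this result
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    result = emit(tree)
    return code, result

# Tree for a + b * 2
tree = ("+", ("ID", "a"), ("*", ("ID", "b"), ("NUMBER", "2")))
code, result = to_three_address(tree)
print("\n".join(code))
# prints:
# t1 = b * 2
# t2 = a + t1
```

Each instruction has at most one operator and three addresses (two operands and a result), and the order of the instructions preserves the precedence of the source expression.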
5. Code Optimizer: Code optimization is an optional phase. It improves the
intermediate code so that the resulting program runs faster and takes less
space. To speed up the program, it eliminates unnecessary lines of code and
rearranges the sequence of statements.
Roles and Responsibilities:
• Remove unused variables and unreachable code.
• Improve the run time of the program.
• Produce streamlined code from the intermediate representation.
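Two classic optimizations, constant folding and removal of unused assignments, can be sketched on straight-line three-address code. The instruction format `(dest, left, op, right)` and the `live` set (names still needed after the block) are assumptions for this illustration; division is left out to avoid dealing with division by zero.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def optimize(code, live):
    """code: list of (dest, left, op, right); live: names needed after the block."""
    # Pass 1: constant folding and propagation.
    constants, folded = {}, []
    for dest, left, op, right in code:
        left = constants.get(left, left)
        right = constants.get(right, right)
        if isinstance(left, int) and isinstance(right, int):
            constants[dest] = OPS[op](left, right)
            folded.append((dest, constants[dest], None, None))  # dest = constant
        else:
            constants.pop(dest, None)       # dest no longer holds a known constant
            folded.append((dest, left, op, right))
    # Pass 2 (backwards): drop assignments whose result is never used.
    needed, result = set(live), []
    for dest, left, op, right in reversed(folded):
        if dest in needed:
            needed.discard(dest)
            needed.update(x for x in (left, right) if isinstance(x, str))
            result.append((dest, left, op, right))
    return result[::-1]

code = [("t1", 2, "*", 3), ("t2", "x", "+", "t1"), ("t3", "y", "*", 4)]
print(optimize(code, live={"t2"}))
# prints: [('t2', 'x', '+', 6)]
```

Here `t1` is folded to the constant 6 and substituted into `t2`, after which both `t1` and the never-used `t3` can be removed.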
6. Code Generator: Code generation is the final phase of compilation. It takes
the optimized intermediate code as input and maps it to the target machine
language.
Roles and Responsibilities:
• Translate the intermediate code to the target machine code.
• Select and allocate memory locations and registers.
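The mapping from three-address code to machine instructions can be sketched for an invented target. The instruction set (`MOV`, `ADD`, `SUB`, `MUL`) and register names `R0`, `R1`, ... are hypothetical, and the register allocation is deliberately naive: every result gets a fresh register and nothing is ever spilled to memory.

```python
MNEMONICS = {"+": "ADD", "-": "SUB", "*": "MUL"}   # assumed toy instruction set

def generate(code, num_registers=4):
    """Translate (dest, left, op, right) tuples into toy assembly text."""
    registers = {}      # name -> register currently holding its value
    assembly = []

    def operand(name):
        # Use the register if the value lives in one, else the memory name or constant.
        return registers.get(name, str(name))

    for index, (dest, left, op, right) in enumerate(code):
        if index >= num_registers:
            raise RuntimeError("out of registers (spilling is not shown)")
        reg = f"R{index}"                           # naive: one fresh register per result
        assembly.append(f"MOV {reg}, {operand(left)}")
        assembly.append(f"{MNEMONICS[op]} {reg}, {operand(right)}")
        registers[dest] = reg
    return assembly

code = [("t1", "b", "*", 2), ("t2", "a", "+", "t1")]
print("\n".join(generate(code)))
# prints:
# MOV R0, b
# MUL R0, 2
# MOV R1, a
# ADD R1, R0
```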
Overview of Compiler
Computers are a balanced mix of software and hardware. Hardware is just a
mechanical device whose functions are controlled by compatible software.
Hardware understands instructions in the form of electronic charge, which is
the counterpart of binary language in software programming. Binary language has
only two symbols, 0 and 1. To instruct the hardware, code must be written in
binary format, which is simply a series of 1s and 0s. Writing such code by hand
would be a difficult and cumbersome task for programmers, which is why we have
compilers to generate it.
Language Processing System
We have learnt that any computer system is made of hardware and software. The
hardware understands a language, which humans cannot understand. So we write
programs in high-level language, which is easier for us to understand and
remember. These programs are then fed into a series of tools and OS components
to get the desired code that can be used by the machine. This is known as
Language Processing System.
The high-level language is converted into binary language in various phases.
A compiler is a program that converts high-level language to assembly language.
Similarly, an assembler is a program that converts the assembly language to
machine-level language.
Let us first understand how a program, using C compiler, is executed on a host
machine.
• The user writes a program in C language (high-level language).
• The C compiler compiles the program and translates it into an assembly
program (low-level language).
• An assembler then translates the assembly program into machine
code (object).
• A linker tool is used to link all the parts of the program together for
execution (executable machine code).
• A loader loads all of them into memory and then the program is
executed.
Before diving straight into the concepts of compilers, we should understand a few
other tools that work closely with compilers.
Preprocessor
A preprocessor, generally considered a part of the compiler, is a tool that
produces input for compilers. It deals with macro-processing, augmentation,
file inclusion, language extension, etc.
Interpreter
An interpreter, like a compiler, translates high-level language into low-level
machine language. The difference lies in the way they read the source code or
input. A compiler reads the whole source code at once, creates tokens, checks
semantics, generates intermediate code, and translates the whole program, which
may involve many passes. In contrast, an interpreter reads a statement from the
input, converts it to an intermediate code, executes it, then takes the next
statement in sequence. If an error occurs, an interpreter stops execution and
reports it, whereas a compiler reads the whole program even if it encounters
several errors.
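The statement-by-statement behaviour of an interpreter can be sketched as below. As a simplifying assumption, each statement is an assignment written in Python expression syntax and is evaluated with Python's built-in `eval`; the function name `interpret` is invented for the example.

```python
def interpret(statements):
    """Execute assignment statements one at a time; stop at the first error."""
    variables = {}
    for number, statement in enumerate(statements, start=1):
        name, _, expression = statement.partition("=")
        try:
            # eval() with empty builtins keeps the sketch short;
            # it is not safe for untrusted input.
            value = eval(expression.strip(), {"__builtins__": {}}, variables)
            variables[name.strip()] = value
        except Exception as error:
            print(f"line {number}: {error}")
            break                   # later statements are never looked at
    return variables

print(interpret(["x = 2", "y = x * 3", "z = y / 0", "w = 1"]))
```

The first two statements run and take effect before the error on line 3 is found, and the fourth statement is never examined. A compiler, by contrast, would analyze all four statements before anything is executed.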
Assembler
An assembler translates assembly language programs into machine code. The
output of an assembler is called an object file, which contains a combination of
machine instructions as well as the data required to place these instructions in
memory.
Linker
The linker is a computer program that links and merges various object files
together in order to make an executable file. All these files might have been
compiled by separate assemblers. The major task of a linker is to search and locate
referenced modules/routines in a program and to determine the memory location
where these codes will be loaded, making the program instruction have absolute
references.
Loader
The loader is a part of the operating system and is responsible for loading
executable files into memory and executing them. It calculates the size of a
program (instructions and data) and creates memory space for it. It initializes
various registers to initiate execution.
Cross-compiler
A compiler that runs on one platform (A) and is capable of generating executable
code for another platform (B) is called a cross-compiler.
Source-to-source Compiler
A compiler that takes the source code of one programming language and
translates it into the source code of another programming language is called a
source-to-source compiler.