3.2 Functions and Purposes of Translators

Compiled by Benjamin
Muganzi
Computing 9691
Paper 3
Assembly language and Machine code
• An assembly language is a low-level programming language for a computer, microcontroller, or other programmable device, in which each statement corresponds to a single machine code instruction.
• Assembly language uses mnemonics to represent each low-level machine operation (the opcode) and symbolic names for the operands (registers or other resources).
• Machine code, or machine language, consists of the binary instructions that can be executed directly by a processor. They are patterns of 1s and 0s, and their order tells the computer what to do.
• This code is the lowest level of software. All other kinds of software need to be
translated into machine code before they can be used.
• An Assembler is the utility software used to convert Assembly
language code to machine code.
How an assembler converts assembly
language code to machine code
• An assembler creates object code by translating assembly
instruction mnemonics into opcodes, and by resolving symbolic
names for memory locations and other entities like registers etc.
• An opcode is the part of a machine instruction that tells the CPU which operation to carry out. In machine language it is a binary or hexadecimal value such as 'B6', which is loaded into the instruction register as part of the instruction. (Refer to topic 3.3)
• In assembly language mnemonic form, an opcode is a command such as MOV or ADD or JMP.
– For example MOV AL, 34h
• The opcode is the MOV instruction. The other parts are called the
'operands'.
• Operands are manipulated by the opcode. In this example, the
operands are the register named AL and the value 34 hex.
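The translation described on this slide can be sketched as a toy two-pass assembler. This is only an illustration: the instruction set, the opcode values and the `assemble` function are invented here and do not correspond to any real processor or assembler.

```python
# Toy two-pass assembler for a made-up machine (all opcodes are hypothetical).
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03, "JMP": 0x04, "HALT": 0xFF}

def assemble(source):
    """Translate assembly text into a list of byte values."""
    # Strip comments (text after ';') and drop empty lines.
    lines = [ln.split(";")[0].strip() for ln in source.splitlines()]
    lines = [ln for ln in lines if ln]

    # Pass 1: record the address of every label (symbolic name).
    symbols, address, instructions = {}, 0, []
    for ln in lines:
        if ln.endswith(":"):
            symbols[ln[:-1]] = address
            continue
        parts = ln.split()
        instructions.append(parts)
        address += len(parts)  # one byte for the opcode, one per operand

    # Pass 2: replace mnemonics with opcodes and labels with addresses.
    machine_code = []
    for mnemonic, *operands in instructions:
        machine_code.append(OPCODES[mnemonic])
        for op in operands:
            machine_code.append(symbols[op] if op in symbols else int(op, 16))
    return machine_code

program = """
start:
    LOAD 34      ; operand is the hex value 34
    ADD 01
    JMP start    ; label resolved to address 0
"""
print([f"{b:02X}" for b in assemble(program)])
# → ['01', '34', '02', '01', '04', '00']
```

The first pass exists because a label may be used before the line that defines it, so all addresses have to be known before any operand can be resolved.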
Interpretation and Compilation
• Object code is the product obtained when source code is translated by a compiler. Object code may be machine code itself or something close to machine code.
[Diagram relating Source Code, Compiler, Interpreter, Assembly Code, Assembler, Object Code and Machine Code]
Interpreters and Compilers
• Levels of Programming Languages
• Low level (Machine code)
• Mid level (Assembly)
• High Level (C/C++, VB, Java, Fortran, etc)
• Code written in mid-level and high-level languages (i.e. source code) must be translated to machine code, which is what the processor understands.
• Translators are used. An assembler translates assembly language; for high-level languages the types of translators are:
• Compilers
• Interpreters
Interpreters
• An interpreter takes each instruction in turn
and translates it into machine code.
• The translated instruction is executed before
the next instruction is translated.
• This is particularly useful when there is not enough memory to hold a compiled program, especially on old computers.
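A minimal sketch of this translate-then-execute cycle is given below. The two-statement language (`LET` and `PRINT`) and the `interpret` function are made up for illustration; a real interpreter is far more elaborate.

```python
# Toy interpreter: each statement is analysed and executed before the next
# one is even looked at. The two-statement language is hypothetical.
def interpret(source):
    variables, output = {}, []
    for number, line in enumerate(source.splitlines(), start=1):
        words = line.split()
        if not words:
            continue
        if words[0] == "LET" and len(words) == 4 and words[2] == "=":
            variables[words[1]] = int(words[3])
        elif words[0] == "PRINT" and len(words) == 2:
            output.append(variables[words[1]])
        else:
            # Earlier lines have already run by the time the error is found.
            output.append(f"Error on line {number}: {line.strip()}")
            break
    return output

print(interpret("LET x = 5\nPRINT x\nPRNT x\nPRINT x"))
# → [5, 'Error on line 3: PRNT x']
```

Note that the first two lines run and produce output even though line 3 is invalid, because nothing is checked until it is about to be executed.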
Compilers
• A compiler translates source code into machine or
object code
• A compiler translates the whole program as one
complete unit to create an executable file
• The high-level language version of the program is
called the source code and the resulting machine
code program is called the object code.
Compilers – Adv. and Disadv.
• Advantages
– The translation is done once only and as a separate process. The program that is run is already translated into machine code, so it is much faster in execution.
– Compiled programs can run on any compatible computer without the compiler being present.
• Disadvantages
– A compiler uses a lot of computer resources. It has to be loaded in the computer's memory at the same time as the source code, and there has to be sufficient memory to hold the object code.
– When an error in a program occurs, it is difficult to pinpoint its source in the original program.
Interpreters – Adv. And Disadv.
• Advantages
– Interpreters need less memory than compilers (useful in early computers, which had limited power and memory).
– Error messages are produced at the line where the error is encountered, so it is easier to identify and isolate the instruction causing the problem.
– Individual segments can be run without needing to compile the whole program.
• Disadvantages
– Every line has to be translated each time it is executed, so interpreters tend to be slow.
– As an interpreter doesn't create an object file, the source code must be distributed along with the interpreter in order for the user to run the software.
Interpreters vs. Compilers
COMPILER: Fast; creates an executable file that runs directly on the CPU.
INTERPRETER: Slower; interprets code one line at a time.

COMPILER: Debugging is more difficult. One error can produce many spurious errors.
INTERPRETER: Debugging is easier. Each line of code is analysed and checked before being executed.

COMPILER: More likely to crash the computer. The machine code is running directly on the CPU.
INTERPRETER: Less likely to crash, as the instructions are being carried out either on the interpreter's command line or within a virtual machine environment which protects the computer from being directly accessed by the code.

COMPILER: Easier to protect intellectual property, as the machine code is difficult to understand.
INTERPRETER: Weaker intellectual property protection, as the source code (or bytecode) has to be available at run time. For example, if you write a Flash ActionScript application, you can easily get de-compilers that convert the p-code back into ActionScript source code (unless you use encryption, but that is another story).

COMPILER: Uses more memory: all the execution code needs to be loaded into memory, although tricks like Dynamic Link Libraries lessen this problem.
INTERPRETER: Uses less memory; source code only has to be present one line at a time in memory.

COMPILER: Unauthorised modification to the code is more difficult. The executable is in the form of machine code, so it is difficult to understand program flow.
INTERPRETER: Easier to modify, as the instructions are at a high level and so the program flow is easier to understand and modify.

The Translation process
• Source code goes through various stages and is converted into Intermediate Language.
• From this, final object code is generated, which can be optimized.
• The basic translation process is identical for both interpreters and compilers.
Lexical Analysis stage
• White space, blank lines and comments are removed from the code.
• Using the grammar of the language, the lexical analyzer assigns a token to each meaningful string of characters. Single characters are converted into their ASCII codes.
• A token could be anything from a 16-bit unsigned integer (starting from 256, so that it cannot clash with a single-character code) to a simple label.
• Variable names require extra information. A symbol table is used to keep a record of variables. This table is used throughout the translation process.
• During lexical analysis, only variable names are entered into the symbol table.
• The symbol table is stored as a linked list, and searching is performed using hashing.
• Some basic error reporting is done in this stage, e.g. Illegal Identifier.
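The lexical analysis steps can be sketched as follows. This is a simplified illustration: the keyword list, the token code numbers, the `//` comment style and the `tokenize` function are all assumptions made for the example, and a Python dictionary (itself a hash table) stands in for the symbol table.

```python
import re

# Hypothetical token codes: keywords are numbered from 256 upwards so they
# cannot clash with single characters, which keep their ASCII codes.
KEYWORDS = {"INPUT": 256, "PRINT": 257}
IDENTIFIER, NUMBER, ASSIGN = 300, 301, 302

def tokenize(line, symbol_table):
    """Return a list of (code, text) tokens for one line of source."""
    line = line.split("//")[0]  # strip comments (assumed '//' style)
    tokens = []
    # findall skips white space because no alternative matches it.
    for text in re.findall(r":=|[A-Za-z_]\w*|\d+|\S", line):
        if text in KEYWORDS:
            tokens.append((KEYWORDS[text], text))
        elif text == ":=":
            tokens.append((ASSIGN, text))
        elif text[0].isalpha() or text[0] == "_":
            symbol_table.setdefault(text, {})  # note the name only, for now
            tokens.append((IDENTIFIER, text))
        elif text.isdigit():
            tokens.append((NUMBER, text))
        else:
            tokens.append((ord(text), text))   # single character: ASCII code
    return tokens

symbols = {}
print(tokenize("sum := sum + number  // running total", symbols))
# → [(300, 'sum'), (302, ':='), (300, 'sum'), (43, '+'), (300, 'number')]
print(sorted(symbols))
# → ['number', 'sum']
```

Only the variable names are recorded at this point; details such as type and address would be filled in by later stages.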
Syntax analysis stage
• All computer languages have their specific grammar (syntax) for writing valid programming statements.
• This grammar can be defined using a notation such as BNF (Backus-Naur Form).
• The Syntax Analyzer (or Parser) analyzes the tokenized code against the grammar of the language.
• Parsing transforms the code into a data structure, usually a tree (for example, a binary tree), which is suitable for further processing.
• Invalid command names, such as INPT instead of INPUT, will be identified at this point.
• Some languages require variables to be declared before they can be used. The syntax analyzer will catch variables without declarations (using the symbol table).
Syntax analysis – Example of BNF
• Taking a very elementary language, an assignment statement may be defined to be of the form
<variable> <assignment_operator> <expression>
and an expression is
<variable> <arithmetic_operator> <variable>
• The parser must take the output from the lexical analyser and check that it is of this form.
• If the statement is sum := sum + number, the parser will receive
<variable> <assignment_operator> <variable> <arithmetic_operator> <variable>
which becomes
<variable> <assignment_operator> <expression>
and then
<assignment statement>, which is valid.
• If the original statement is sum := sum + + number, this will be input as
<variable> <assignment_operator> <variable> <arithmetic_operator> <arithmetic_operator> <variable>
and this does not represent a valid statement, hence an error message will be returned.
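The check described in this example can be sketched in a few lines. The token class names and the two functions are assumptions made for illustration; each function mirrors one of the two grammar rules.

```python
# Token classes as they might arrive from the lexical analyser (names assumed).
VAR, ASSIGN_OP, ARITH_OP = "variable", "assignment_operator", "arithmetic_operator"

def is_expression(tokens):
    # <expression> ::= <variable> <arithmetic_operator> <variable>
    return tokens == [VAR, ARITH_OP, VAR]

def is_assignment(tokens):
    # <assignment statement> ::= <variable> <assignment_operator> <expression>
    return tokens[:2] == [VAR, ASSIGN_OP] and is_expression(tokens[2:])

valid = [VAR, ASSIGN_OP, VAR, ARITH_OP, VAR]               # sum := sum + number
invalid = [VAR, ASSIGN_OP, VAR, ARITH_OP, ARITH_OP, VAR]   # sum := sum + + number
print(is_assignment(valid), is_assignment(invalid))
# → True False
```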
Syntax analysis – semantic analysis
• During syntax analysis, certain semantic checks are carried out:
– Label checks: make sure the line a GOTO statement passes control to exists.
– Flow of control checks: make sure statements are used in the correct place and order, e.g. CONTINUE can only be placed inside a loop, and each IF statement is matched with the correct END IF.
– Declaration checks: make sure all variables have been properly declared.
• Semantic means relating to meaning in language or logic. Something may be syntactically correct but semantically meaningless. 'Jake ate a banana' has meaning and obeys the rules of English, but 'A banana ate Jake' obeys the rules without making sense.
• Note: although semantic checks check the logic to a certain extent, this is not the same as checking for a logic error. Remember, a logic error will not cause the program to crash; it will simply cause unexpected results. A compiler cannot find such errors, e.g. if the programmer has written a = b + c instead of a = b - c.
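As an illustration of a label check, the sketch below scans a program for GOTO targets that have no matching label. The program representation (a list of strings, with labels written as `name:`) and the `check_labels` function are assumptions made for this example.

```python
def check_labels(program):
    """Return the GOTO targets that do not match any label in the program."""
    labels = {line[:-1] for line in program if line.endswith(":")}
    targets = [line.split()[1] for line in program if line.startswith("GOTO ")]
    return [t for t in targets if t not in labels]

program = ["start:", "GOTO finish", "GOTO start", "end:"]
print(check_labels(program))
# → ['finish']
```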
Code Generation
• At this stage, the address of each variable is calculated and stored in the symbol table as each is encountered.
• Intermediate code is produced which, after optimisation, is turned into executable code / machine code.
– All errors due to incorrect use of the language have been removed by this stage.
Code optimisation
• Once the code generator has created machine code, it tries to optimise the code to make it more efficient.
• Consider the following lines of code
x=y+3
b=x
c=b
• The code optimiser will remove redundant code, so the above example could become (assuming x and b are not used elsewhere):
c=y+3
• A compiler's code optimiser can favour speed or memory optimisation because in the real world you often cannot optimise both.
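The reduction of the three-line example to a single assignment can be sketched as copy propagation followed by removal of unneeded assignments. This is a deliberately simplified illustration: the `optimise` function and its `wanted` parameter are invented here, expressions are plain strings, and the sketch ignores cases where a variable is reassigned part-way through.

```python
def optimise(statements, wanted):
    """statements: list of (target, expression) pairs; wanted: the variables
    still needed after this block."""
    values = {}
    for target, expr in statements:
        # If the right-hand side is just a variable we already know,
        # substitute what that variable holds (copy propagation).
        values[target] = values.get(expr, expr)
    # Keep only the assignments whose results are needed afterwards.
    return [f"{t}={values[t]}" for t in values if t in wanted]

code = [("x", "y+3"), ("b", "x"), ("c", "b")]
print(optimise(code, wanted={"c"}))
# → ['c=y+3']
```

If `x` were also needed later, passing `wanted={"x", "c"}` would keep both assignments, which shows why the optimiser must know which variables are still in use.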
Library routines
Library routines are pre-compiled modules available for use by other programs.
• Programs are made up of modules
• Commonly used modules can be compiled and stored ready for repeated use
• These modules are stored in a library
• They are called library routines
• Windows uses libraries in the form of DLL files (dynamic link library)
• Because the variable names and memory addresses will be different from one use of the library to the next, two programs are needed at runtime or when an executable file is created. These programs are Loaders and Linkers.
– A loader has the job of loading all the modules into memory
– A linker resolves references in the main program, known as links or symbols, to library routines. If a function in a library routine is called, the linker will match the call in the program with the function in the library routine.
• Libraries reduce the amount of code that needs to be written
• If a library routine is updated, programs using it may stop working if the interfaces between the modules change
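The linker's job of matching calls to library routines can be sketched as a simple lookup. The routine names, the addresses and the `link` function are hypothetical; a real linker works on object files rather than Python lists.

```python
# Hypothetical library: routine name -> address at which it has been loaded.
library = {"sqrt": 0x4000, "print": 0x4010}

def link(calls, library):
    """Replace each called routine name with its address in the library."""
    unresolved = [name for name in calls if name not in library]
    if unresolved:
        raise LookupError(f"unresolved symbols: {unresolved}")
    return [library[name] for name in calls]

print(link(["print", "sqrt"], library))
# → [16400, 16384]
```

A call to a routine that is missing from the library raises an error, which corresponds to the "unresolved symbol" failures a linker reports.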
Sample Questions
1. Explain why the size of the memory available is particularly relevant to the process of compilation. (4)
2. a) Explain the difference between the two translation techniques of interpretation and compilation. (2)
b) Give one advantage of the use of each of the two translation techniques. (2)
3. State any three stages of compilation and describe, briefly, the purpose of each. (6)
4. Explain, in detail, the stage of compilation known as lexical analysis. (6)