System Softwarenew

advertisement
System Software
• System software- A system software is a collection of system
programs that perform a variety of functions i.e file editing, resource
accounting, IO management, storage management etc.
• System program – A system program (SP) is a program which aids in
effective execution of a general user’s computational requirements on
a computer system. The term execution here includes all activities
concerned with the initial input of the program text and various stages
of its processing by computer system, namely, editing, storage,
translation, relocation, linking and eventual execution.
• System programming- System programming is the activity of
designing and implementing SPs.
• The system programs of a system comprises of various translators (for
translating the HLLs to machine language) .The machine language
programs generated by various translators are handed over to
operating system (for scheduling the work to be done by the CPU
from moment to moment). Collection of such SPs is the system
software of a particular computer system
Introduction to Software Processors
A contemporary programmer very rarely programs in the one language
that a computer can really understand by itself---the so called machine
language. Instead the programmer prefers to write their program in one
of the higher level languages (HLLs). This considerably simplifies
various aspects of program development, viz. program design, coding,
testing and debugging. However, since the computer does not
understand any language other than its own machine language, it
becomes necessary to process a program written by a programmer so as
to make it understandable to the computer. This processing is generally
performed by another program, hence the term software processors.
Broadly the various software processors are classified as:
- Translators
- Loaders
- Interpreters
Program 1
Software
processor 1
Program 2
Software
processor II
Other programs
Program 3
Computer
System
Results
The software processor 1 in the figure is known as a translator. It performs the task of
converting a program written in one language (HLL program termed as program 1) into
a program written in another programming language (program 2). Software processor II
is called a loader also known as linkage editor. The loader performs some very lowlevel processing of program 2 in order to convert it into a ready –to-run program in the
machine language (program 3). This is the program form which actually runs on the
computer , reading input data, if any, and producing the results.
• Translators of programming languages are broadly classified into two
groups depending on the nature of source language accepted by them.
• An assembler is a translator for an assembly language program of a
computer. An assembly language is a low-level programming
language which is peculiar to a certain computer or a certain family of
computers.
• A compiler is a translator for a machine independent High Level language like
FORTRAN, COBOL, PASCAL. Unlike assembly language, HLLs create their own
feature architecture which may be quite different from the architecture of the
machine on which the program is to be executed. The tasks performed by a compiler
are therefore necessarily more complex than those performed by an assembler.
The output program form constructed by the translator is known as the object
program or the target program form. This is a program in a low-level language--possibly in the machine language of the computer. Thus the loader, while creating a
ready-to-run machine language program out of such program forms, does not have
to perform any real translation tasks. The loader’s task is more in the nature of
modifying or updating the parts of an object program and integrating it with other
object programs to produce a ready –to-run machine language form
• Interpreter- Another software processor is an interpreter. The
interpreter does not perform any translation of the source program.
Instead, it analyzes the source program statement by statement and
itself carries out the actions implied by each statement. An interpreter,
which is itself a program running on the computer, in effect simulates
a computer whose machine language is the programming language in
which the source program is written
Program 1
Software
Processor
Results
Data
Computer
System
Execution of HLL program using an interpreter
ASSEMBLER
• Elements of Assembly language ProgrammingAn assembly language program is the lowest level programming
language for a computer. It is peculiar to a certain computer system
and is hence machine-dependent. When compared to a machine
language, it provides three basic features which make programming a
lot easier than in the machine language
• Mnemonic operation Code- Instead of using numeric operation
codes (opcodes), mnemonics are used. Apart from providing a minor
convenience in program writing, this feature also supports indication
of coding errors,i.e misspelt operation codes.
• Symbolic operand specification- Symbolic names can be associated
with data or instructions. This provides considerable convenience
during program modification.
• Declaration of data/storage areas- Data can be declared using the
decimal notation. This avoids manual conversion of constants into
their internal machine representation.
• An assembly language statement has the following general format
[Label] Mnemonic OP Code
Operand [Operand…]
• Types of statements in an assembly language program:
• Imperative statement- An imperative assembly language statement
indicates action to be performed during execution of assembly
program. Hence each imperative statement translates into( generally
one) machine instruction
The format of machine instruction generated has the format
sign
opcode
index register
operand address
• Declarative statements- A declarative assembly language statement
declares constants or storage areas in a program. For example the
statement
A
DS
1
indicates that a storage area namely A is reserved for 1 word.
G
DS
200
indicates that a storage area namely G is reserved for a block of 200
words.
• Constants are declared using statement
ONE DC
‘1’
indicating that one is the symbolic name of the constant 1.
Many assemblers permit the use of literals. These are essentially constants directly
used in an operand field
ADD ‘=1’
= preceding the value 1 indicates that it is a literal. The value of the constant is
written in the same way as it would be written in a DC statement. Use of literals
save the trouble of defining the constant through a DC statement and naming it.
• Assembler Directives- Statements of this kind neither represent the machine
instruction to be included in the object program nor indicate the allocation of
storage for constants or program variables. These statements direct the assembler to
take certain actions during the process of assembling a program. They are used to
indicate certain things regarding how assembly of the input program is to be
performed. For example
START
100
indicating that first word of the object program to be generated by the assembler
should be placed in the machine location with address 100
Similarly, the statement
END
indicates that no more assembly language statements remain to be
processed.
AN ASSEMBLY PROCESS
• The overall process of conversion of an assembly language program
to its equivalent machine code can be broadly divided into two phases:
• Analysis phase
• Synthesis phase
Analysis
of
Source text
+
Synthesis
of
Target Text
=
Translation from
Source to Target
Text
• Analysis Phase- This phase is mainly concerned with the
understanding of syntax (rules of grammar) and semantics (rules of
meaning) of the language. The various tasks that have to be performed
during this phase are:
• Isolate the label, mnemonic operation code and operand fields of a
statement
• Enter the symbol found in label field (if any) and address of the
next available machine word into the symbol table
• Validate the mnemonic operation code by looking it up in the
Mnemonic table
• Determine the storage requirements of the statement by
considering the mnemonic operation code and operand fields of
the statement. Calculate the address of the first machine word
following the target code generated for this statement (Location
counter processing)
• Synthesis Phase- The basic task of the synthesis phase is to construct
the machine instruction for the corresponding assembly language
code. In this phase we select the appropriate machine operation code
for the mnemonic and place it in the machine instruction’s operation
code field. Operand symbols are replaced by their corresponding
addresses. The symbols and their addresses are maintained in the
analysis phase in the form of symbol tables. The various tasks that are
performed during synthesis phase are:
• Obtain the machine operation code corresponding to the
mnemonic operation code by searching the Mnemonic table
• Obtain the address of the operand from the symbol table.
• Synthesise the machine instruction or the machine form of the
constant, as the case may be.
• Location counter processing- The best way to keep track of the
addresses to be assigned is by actually using a counter called the
location counter. By convention, this counter always contain the
address of the next available word in the target program. At the start
of the processing by the assembler, the default value of the start
address (by convention generally the address 0000) can be put into
this counter. When the start statement is processed by the assembler,
the value indicated in its operand field can be copied into the counter.
Thus, the first generated machine word would get the desired address.
Thereafter whenever a statement is processed the number of machine
words required for by it would be added to to this counter so that it
always points to the next available address in the target program.
A simple Assembly Scheme-
Fig: 1
Let us start applying the translation model to the assembly scheme given. As the END
statement in the scheme is with a label, the execution of the program starts from the
statement that bears the label First.
As regards the analysis of the an assembly statement say,
FIRST
READ
A
All the information required to design the analysis phase is given. We already know the
three fields: label, opcode mnemonic and operand field. The mnemonic opcode is
checked whether it is valid or not by comparing it with the list of mnemonics of the
language provided. Once, the mnemonic turns out to be valid, we determine whether the
symbols written followed the symbol writing rules. This completes the analysis phase.
• In the synthesis phase, we determine the machine operation code for
the mnemonic used in the statement. This can be achieved by
maintaining the list of machine opcode and corresponding mnemonic
opcode. Net we take the symbol and obtain its address from the
symbol table entry done during the analysis phase. This address can
be put in operand address field of the machine instruction to give it the
final form.
• Pass Structure of an assembler-In order to understand the pass
structure of an assembler, we need to first understand its need and
significance. This can be understood with the help of an assembly
program. The assembly scheme given in fig 1, when input to an
assembler, is processed in the following way. Processing of the
START statement will lead to initialization of the location counter to
the value 100. On encountering the next statement
A
DS
‘1’
the analysis phase will enter the (symbol, address) pair (A,100) into the
symbol table. Location counter will be simply copied into the
appropriate symbol table entry. The analysis phase will then find that
DS is not the mnemonic of a machine instruction, instead it is a
declarative. On processing the operand field, it will find that one
storage location is to be reserved against the name A. Therefore LC
will be incremented by 1.
On processing the next two statements, the (symbol, address) pairs
(B,101) and (FIRST,102) will be reentered into the symbol table.
After this the following instructions will be generated and inserted
into the target program
Address
Instruction
opcode
operand Address
102
09
100
103
09
101
104
04
100
105
02
101
generation of these instructions is quite straightforward since the
opcodes can be picked up from the mnemonics table and the operand
addresses from the symbol table.
The next statement to be processed is:
TRIM
LARGEG
While synthesizing the machine instruction for this statement, the
mnemonic TRIM would be translated into machine operation code
’07’. While processing the operand field, the assembler looks for
LARGEB in the symbol table. However this symbol is not present
there. On looking at the source program again, we find that the
symbol LARGEB does appear in the label field of third-last assembly
statement in the program
• The problem arising in processing this reference to symbol LARGEB
belongs to assembler rather than the assembly program being
translated. This problem arises as the definition of LARGEB occurs in
the program after its reference. Such a reference is called forward
reference . We can see that similar problems will arise for all the
forward references. Thus we have to find a solution to this problem
of assembling such forward references.
• On further analysis of situation. We can see that this problem is not
any shortcoming of our translation model but it is the result of our
application of the translation model to an arbitrary piece of the source
program, namely a statement of the assembly language. For the
translation to succeed, we must select a meaningful unit of the source
program which can be translated independent of subsequent units in it.
In order to characterize the translation process on this basis, we
introduce the concept of a translator pass, which is defined as:
• A translator pass is one complete scan of the source program input
to the translator, or its equivalent representation
•
Multipass Translation – Multipass translation of the assembly
language program can take care of the problem of forward references.
Most practical assemblers do process an assembly program
in multiple passes. The unit of source program used for the purpose of
translation is the entire program.
• While analyzing the statements of this program for the first time, LC
processing is performed and symbols defined in the program are
entered into the symbol table.
• During the second pass, statements are processed for the purpose of
synthesizing the target form. Since all the defined symbols and their
addresses can be found in the symbol table, no problems are faced in
assembling the forward references.
In each pass, it is necessary to process every statement of the
program. If this processing is performed on the source form of the
program, there would be a certain amount of duplication in the actions
performed by each pass. In order to reduce this duplication of effort,
the results of analyzing a source statement by the first pass are
represented in an internal form of the source statement. This form is
popularly known as the intermediate code of the source statement.
Symbol
Table
Source
Program
Pass II
Pass I
Intermediate
Code
Assembler
Target
Program
• Single Pass Translation- Single pass translation also tackles the
problem of forward references in its own way. Instructions containing
forward references are left incomplete until the address of the
referenced symbol becomes known. On encountering its definition, its
address can be filled into theses instructions. Thus, instruction
corresponding to the statement
TRIM
LARGEB
the statement will only be partially synthesized. Only the operation
code ’07’ will be assembled to reside in location 106. The need for
putting in the operand address at a later stage can be indicated by
putting in some information into a Table of Incomplete Instructions
(TII). Typically, this would be a pair (106,LARGEB). At the end of
the program assembly, all entries in this table can be processed to
complete such instructions.
• Single pass assemblers have the advantage that every source statement
has to be processed only once. Assembly would thus proceed faster
than in the case of multipass assemblers. However, there is a
disadvantage. Since both the analysis and synthesis have to be done
by the same pass, the assembler can become quite large.
• Design of a two-pass assembler- The design of two pass assembler
depends on the type of tasks that are done in two passes of assembler.
The pass wise grouping of tasks in a two-pass assembler is:
• Pass 1-
– Separate the symbol, mnemonic opcode and
operand fields
– Determine the storage required for every assembly
language statement and update the location
counter
– Build the symbol table
– Construct intermediate code for every assembly
language statement
• Pass II
– Synthesize the target code by processing the
intermediate code generated during pass 1
• Pass 1- In pass 1 of the assembler, the main task lies in maintenance
of various tables used in the second pass of the translation. Pass 1 uses
the following data structures for the purpose of assembly:
– OPTAB: A table of mnemonic opcodes and certain related
information
– SYMTAB: The Symbol table
– LITTAB: A table of literals used in the program
• Functioning of pass 1 centers around the interpretation of entries in
OPTAB. After label processing for every source statement, the
mnemonic is isolated and searched in OPTAB. If it is not present in
OPTAB, an error is indicated and no further processing needs to be
done for the statement. If present, the second field in its (OPTAB)
entry is examined to determine whether the mnemonic belongs to the
class of imperative, declarative or assembler directive statements. In
the case of an imperative statement, the length field contains the
length of the corresponding machine instruction. This is simply added
to the LC to complete the processing of this statement.
• For both assembler directive and declarative statements, the ‘Routine id’ field
contains the identifier of a routine which would perform the appropriate processing
for the statement. This routine would process the operand field of the statement to
determine the amount of storage required by this statement and update the LC
appropriately.
• Similarly for an assembler directive the called routine would perform appropriate
actions before returning. In both these cases, the length field is irrelevant and hence
ignored.
• Each SYMTAB entry contains symbol and address fields. It also contains two
additional fields ‘Length’ and ‘other information’ to cater for certain peculiarities of
the assembly.
• In the format of literal table LITTAB, each entry of the table consists of two fields,
meant for storing the source form of a literal and the address assigned to it. In the
first pass, it is only necessary to collect together all literals used in a program. For
this purpose, on encountering a literal, it can be simply looked up in the table. If not
found, a new entry can be used to store its source form. If a literal already exists in
the table, it need not be entered a new. However possibility of multiple literal pools
existing in a program forces us to use a slightly more complicated scheme. When we
come across a literal in the assembly statement, we have to find out whether it
already exists in current pool of literals. Therefore awareness of different literal
pools has to be built into the LITTAB organization. The auxiliary table POOLTAB
achieves this effect. This table contains pointers to the first literal of every pool. At
any stage, the start of the current pool is indicated by the last of the active pointers in
POOLTAB. This pool extends up to the last occupied entry of LITTAB.
Meanings of some other assembler directives
• ORIGIN- The format of this directive is:
ORIGIN
address specification
The address specification is any expression which evaluates to a value of type
‘address’. The directive indicates that the location counter should be set to the
address given by the address specifications.
• EQU- The EQU statement simply defines a new symbol and gives it the value
indicated by its operand expression.
• LTORG- A literal is merely a convenient way to define and use a constant.
However, there is no machine instruction which can directly use or operate on a
value. Thus while assembling a reference to a literal, the following responsibilities
devolve on the assembler.
– Allocation of a machine location to contain the value of literal during execution
– Use of the address of this location as the operand address in the statement
referencing the literal
Locations for accommodating the literals cannot be determined arbitrarily by the
assembler. One criteria for selecting the locations is that control should never reach
any of them during execution of the program. Secondly they should be so allocated
as not to interfere with the intended arrangement of program variables and
instructions in the storage.
• By convention, all literals are allocated immediately following the
END statement. Alternatively, the programmer can use the LTORG
statement to indicate the place in the program where the literals may
be allocated. At every LTORG Statement, the assembler allocates all
literals used in the program since the start of the program or since the
last LTORG statement. Same action is done at the END statement. All
references to literals in an assembly program are thus forward
references by definition
• Difference between passes and phases of an assembler
Phases
Passes
• Phases of an assembler define
Pass defines the part of total
the overall process of translation
translation task to be performed
of an assembly language program to during one scan of the source
machine language program.
program or its equivalent
• There are two phases of an
Assembler----Analysis phase
And synthesis phase
There can be any number of
passes ranging from one to n
INTERMEDIATE CODE FORMS
Simultaneous with the processing of imperative, declarative and assembler directive
statements, pass 1 of the assembler must also generate the intermediate code for the
processed statements to avoid repetitive analysis of same source program
statements. Variant forms of intermediate codes, specifically operand and address
fields, arise in practice due to trade off between processing efficiency and memory
economy.
• Intermediate Code---variant 1
Features of this intermediate code form is given below:
• The label field has no significance in intermediate code
• The source form of mnemonic field is replaced by a code depending on the class of
the statement.
– For imperatives, this code is the machine language operation code itself. Class
name can also be added with the opcode. The class name for imperatives is IS.
For example the mnemonic Read will be represented as
09
or
(IS,09)-------(statement class, code)
– For declarative and assembler directives, this code is a flag indicating the
class and a numeric identifying the ordinal number within the class. The class
names for directive and declaratives are AD and DL respectively
Example: AD#5 or (AD,05) stands for Assembler Directive whose ordinal
number is 5 within the class of directives
• The operand field of a statement is also completely processed.
– The Constants appearing as operands are replaced by their internal machine
representation in decimal, binary, octal or hexadecimal as the case may be, to
simplify their processing in pass II. This fact is indicated by a suffix I as in 200I .
This representation, nowadays, uses an (operand class, code) pair. The operand
class for a constant is given as C and for Symbols and Literals it is S or L. For
constants , the code field includes the internal representation of the constant
itself. Thus ,
START 200 will be represented as (AD,01) (C, 200)
– A Symbol referenced in the operand field is searched in the symbol table, and if
not already present, it is entered as a new symbol. The symbol’s appearance in
the operand field is indicated by the code S#n or (S,n) standing for ‘Symbol
number n’ in the intermediate code. Here n refers to the ordinal number of
operand’s entry in the SYMTAB Thus, reference to A is indicated as S#1 or
(S,01), reference to NEXT as S#3 or (S,03) etc.
– Reference to Literal is indicated as L#m or (L,m) where the concerned literal
happens to occupy the mth entry in LITTAB.
•
•
Since a symbol is entered into SYMTAB on its definition or its first reference whichever
comes first, this gives rise to two kinds of entries in SYMTAB.
– A symbol whose definition appears before any reference to it exists in the table
along with its allocated address (Type 1 entry)
– A symbol whose reference is encountered before its definition exists in the table
without any machine address (Type 2 Entry)
This difference should be taken into account while processing the appearance of a symbol in
the label field
LOOP
START
200
LOAD
=‘5’
STORE
A
LOAD
A
SUB
=‘1’
DS
1
TRANS
NEXT
LTORG
AD #1
04
05
04
02
DL#1
06
AD #5
200I
L#1
S#1
S#1
L#2
11
S#3
Intermediate Code-----variant 1
Intermediate Code Variant 1
Code for registers is entered as (1-4 for AREG-DREG)
Codes doe condition code is entered as 1-6 for LT-ANY
• Intermediate Code Variant II- In this form of intermediate code, the
mnemonic is processed in a manner analogous to variant 1 of the
intermediate code. The operand fields of the source statements are
selectively processed.
• For assembler directives and declaratives, processing of the operand
fields is essential since it influences manipulation of the location
counter. Hence these fields contains the processed form.
• For imperative statements, operand field is processed only for
identifying the literal references. Literals are entered into LITTAB.
In the intermediate code, literal references can be retained in the
source form or optionally they can be indicated in the form L#m or
(L,m). Symbol references appearing in the source statement are not at
all processed during pass 1.
LOOP
START
200
LOAD
=‘5’
STORE
A
LOAD
A
SUB
=‘1’
DS
1
TRANS
NEXT
LTORG
AD #1
04
05
04
02
DL#1
06
AD #5
200I
L#1
A
A
L#2
1I
NEXT
Intermediate Code-----variant II
Intermediate Code variant -II
•
•
•
•
•
•
Assembler Directives
START
01
END
02
ORIGIN
03
EQU
04
LTORG
05
Assembler Declaratives
DC
01
DL
02
• Comparison of the two variants
• Variant 1 of the intermediate code appears to require extra work since
operand fields are completely processed. However, this considerably
simplifies the tasks of pass II. Assembler directives and declarative
statements would require some marginal processing, while the
imperatives only require one reference to the appropriate table for
obtaining the operand address. The intermediate code is quite
compact. If each operand reference like S#n can be fitted into the
same number of bits as an operand address in a machine language
instruction, then the intermediate code is as compact as the target code
itself.
• By using variant II the work of pass I is reduced by transferring the
burden of operand filed processing from pass I and pass II of the
assembler. The intermediate code is less compact since the operand
field of most imperatives is the source form itself. On the other hand,
by requiring pass II to perform more work, the functions and storage
requirements of the two passes are better balanced. This might lead to
reduced storage requirements of the assembler as a whole. Variant II
is particularly suited if expressions are permitted in the operand fields
of an assembly statement.
• Pass II of the assembler- The main task of pass II of the assembler is
to generate the machine code for the source code given to the
assembler. Regarding the nature of the target code, there are basically
two options
• Generation of machine language program
• Generation of some other slightly different form to conform to the
input requirements of a linkage editor or loader. Such an output form
is known as object module.
• Listing and Error Indication- Design of the error indication scheme
involves some critical decisions which influence its effectiveness,
storage requirements and possibly the speed of the assembly. The
basic choice involved is whether to produce program listing and error
reports in the first pass itself or delay the action until the second pass.
• If listing and error indications are performed in pass I, then as soon as
the processing of a source statement is completed, the statement can
be printed out along with the errors(if any). The source form of the
statement need not be retained after this point.
• If listing and error indications are performed only in pass II, the
source form of the statement need to be available in pass II as well.
For this purpose, entire source program may be retained in storage
itself or it may be written out on a secondary storage device in the
form of a file.
• Thus in the first approach, the execution of the assembler will slow
down due to additional IO operations whereas the second approach
will lead to increased storage requirements.
• MACROS AND MACRO PROCESSORS
• Definition- A macro is a unit of specification for program
generation through expansion. It is common experience in assembly
language programming that certain sequence of instructions are
required at a number of places in the program It is very cumbersome
to repeat the same sequence of instructions wherever they are needed
in an assembly scheme. This repetitive task of writing the same
instructions can be avoided using macros.
• Macros provide a single name for a set of instructions The assembler
performs the definition processing for the macro inorder to remember
its name and associated assembly statements. The assembler performs
the macro expansion for each use of macro, replacing it with the
sequence of instructions defined for it.
• A macro consists of a name, a set of formal parameters and a body of
code.
• A macro definition is placed at the start of the program, enclosed
between the statements
MACRO
MEND
• Thus a group of statements starting with MCARO and ending with
MEND constitutes one macro definition unit. If many macros are to
be defined in a program, as many definitions units will exist at the
start of the program. Each definition unit names a new operation and
defines it to consist of a sequence of assembly language statements.
• The operation defined by a macro can be used by writing the macro
name in the mnemonic field and its operands in the operand field of an
assembly statement. Appearance of a macro name in the mnemonic
field amounts to a call on the macro. The assembler replaces such a
statement by the statement sequence comprising the macro. This is
known as macro expansion. All macro calls in a program are
expanded in the same fashion giving rise to a program form in which
only the imperatives actually supported by the computer appear along
with permitted declaratives and assembler directives. This program
form can be assembled by a conventional assembler. Two kinds of
expansions are identified
• Lexical Expansion- Replacement of character string by another
character string during program generation
• Semantic Expansion- Semantic expansion implies generation of instruction tailored
to the requirements of a specific usage---for example, generation of type specific
instructions for manipulation of byte and word operands. Semantic expansion is
characterized by the fact that different uses of a macro can lead to codes which
differ in the number, sequence and opcodes of instructions. For example, The
following sequence of instructions is used to increment the values ina memory word
by a constant:
• Move the value from the memory word into a machine register
• Increment the value in the machine register
• Move the new value into the memory word
Since the instruction sequence MOVE-ADD-MOVE may be used a number of times
in a program, it is convenient to define a macro named INCR. Using lexical
expansion, the macro call INCR A,B,AREG can lead to the generation of a MOVEADD-MOVE instruction sequence to increment A by the value of B using AREG to
perform the arithmetic.
Use of semantic expansion can enable the instruction sequence to be adapted to the
types of A and B. For example Intel 8088, an INC instruction could be generated if
A is a byte operand and B has the value ‘1’ while a MOV-ADD-MOV sequence can
be generated in all other situations.
•
•
•
•
•
•
•
•
Definition of macro- A macro definition is enclosed between a macro header statement and
a macro end statement. Macro definitions are located at the start of the program. A macro
definition consists of
A macro prototype statement
One or more model statements
Macro preprocessor statements
MACRO
………….Macro header statement
INCR
&X,&Y ………macro prototype statement
LOAD
&X
ADD
&Y
Model Statements
STORE
&X
MEND
………… End of definition unit
Macro header statement indicates the existence of a macro definition unit. Absence of header
statement as the first statement of a program or the first statement following the macro
definition unit, signals the start of the main assembly language program.
The prototype of the macro call indicates how the operands in any call on macro would be
written . The macro prototype statement has the following syntax:
<macroname> [<formal parameter spec>[,..]]
where <macroname> appears in the mnemonic field of an assembly statement and formal
parameter spec is of the form
&<parameter name>[<parameter kind]
Model statements are the statements that will be generated by the expansion of the macro
Macro preprocessor is used to perform some auxiliary functions during macro expansion.
• Macro Call---A macro call leads to macro expansion. During macro
expansion, the macro call statement is replaced by sequence of
assembly statements. The macro call has the syntax:
<macroname> [<actual parameter spec> [,..]]
where actual parameter spec resembles that of the operand
specification in an assembly statement.
• To differentiate between the original statements of a program and the
statements resulting from the macro expansion, each expanded
statement is marked with a ‘+’ preceding its label field.
Two key notions concerning macro expansion are:
• Expansion time control flow- This determines the order in which the
model statements are visited during macro expansion.
• Lexical substitution- Lexical substitution is used to generate an
assembly statement from a model statement.
• Flow of control during expansion- The default flow of control during
macro expansion is sequential. Thus in absence of preprocessor
statements, the model statements of the macro are visited sequentially
starting from statement following the macro prototype statement and
ending with the statement preceding the MEND statement. A
preprocessor statement can alter the flow of control during expansion
such that some model statements are either never visited during
expansion or are repeatedly visited during expansion. The former
results in conditional expansion and latter in expansion time loops.
The flow of control during macro expansion is implemented using a
macro expansion counter (MEC).
• Algorithm- (Outline for macro expansion)
• Step 1: MEC:=statement number of first statement following the
prototype statement
• Step 2: Repeat while MEC not MEND statement
•
if statement = model statement
•
Expand the statement
•
MEC:=MEC+1
•
Else
•
MEC:= new value specified in the statement
• Step 3: Exit
• Lexical Substitution- The model statements of a macro consist of
three types of strings:
• An ordinary string, which stands for itself
• The name of a formal parameter which is preceded by the character
‘&’
• The name of a preprocessor variable, which is also preceded by the
character ‘&’
During lexical substitution, strings of type 1 are retained without
substitution. Strings of type 2 and 3 are replaced by the values of the
formal parameters of preprocessor variables. The value of a formal
parameter is the corresponding actual parameter string. The rules for
determining the value of the formal parameter depend on the kind of
parameter.
• Positional parameters in macro call- A positional formal parameter is written as
&<parameter name> e.g &SAMPLE where sample is the name of the parameter.
The value of the positional parameter say, XYZ is determined by the rule of
positional association as
– Find the ordinal position of XYZ in the list of formal parameters in
the macro prototype of the statement
– Find the actual parameter specification occupying the same ordinal
position in the list of actual parameters in the macro call statement.
• Keyword parameters- For keyword parameters, in formal parameter specification,
<parameter name> is an ordinary string and <parameter kind> is the string ‘=‘ in the
syntax.
&<parameter name>[ <parameter kind>]
• The <actual parameter spec> is written as <formal parameter name>=<ordinary
string>. The value of a formal parameter XYZ is determined by the rule of keyword
association as follows:
• Find the actual parameter specification which has the form XYZ= <ordinary string>
• Let <ordinary string> in the specification be the string ABC. Then the value of
formal parameter XYZ is ABC.
• The ordinal position of the specification XYZ=ABC in the list of actual parameters
is immaterial.
• Example of macro call with keyword parameters
INCR MEM_VAL=A,INCR_VAL=B, REG=AREG
Macro definition---MACRO
INCR_M
&MEM_VAL=,&INCR_VAL=,&REG=
MOVER
&REG,&MEM_VAL
ADD
&REG,&INCR_VAL
MOVEM
&REG, &MEM_VAL
MEND
• Default specification for parameters- A default is a standard assumption in the
absence of an explicit specification by the programmer. Default specification of
parameters is useful in situations where a parameter has the same value in most
calls. When desired value is different from the default value, the desired value can
be specified explicitly in a macro call. This specification overrides the default value
of parameter for the duration of the call.
• Default value specification of keyword parameters can be incorporated by extending
the syntax for formal parameter specification as :
•
&<parameter name>[<parameter kind >[default value]]
• Example:
MACRO
INCR_M
&MEM_VAL=,&INCR_VAL=, &REG=AREG
MOVER
&REG,&MEM_VAL
ADD
&REG,&INCR_VAL
MOVEM
&REG, &MEM_VAL
MEND
INCR_M MEM_VAL=A, INCR_VAL=B
•
• Macros can also be called with mixed parameters (both positional and
keyword parameters) but all positional parameters must precede all
keyword parameters. Formal parameters can also appear in the label
and opcode fields of model statements
MACRO
CALC
&LAB
&X,&Y,&OP=MULT,&LAB
MOVER
AREG, &X
&OP
AREG, &Y
MOVEM
AREG, &X
MEND
Expansion of the call CALC A,B, LAB=LOOP leads to the
following code
+ LOOP MOVER AREG, A
+
MULT AREG, B
+
MOVEM AREG, A
• Nested macro Call- A model statement in a macro may constitute a
call on another macro. Such calls are known as nested macro calls.
The macro containing the nested call is called the outer macro and the
called macro as the inner macro. Expansion of nested macro calls
follows the last-in-first-out (LIFO) rule.
• Advanced Macro facilities
Advanced macro facilities are aimed at supporting semantic
expansion. These facilities can be grouped into
• Facilities for alteration of flow control during expansion
• Expansion time variables
• Attributes of parameters
• Alteration of flow control during expansion—Two features are
provided to facilitate alteration of flow of control during expansion
– Expansion time sequencing symbols
– Expansion time statements AIF,AGO and ANOP
• A sequencing symbol (SS) has the syntax
.<ordinary string>
SS is defined by putting it in the label field of a statement in the
macro body, It is used as an operand in an AIF or AGO statement to
designate the destination of an expansion time control transfer. It
never appears in the expanded form of a model statement
• Conditional Expansion (Expansion time statements)- While writing a
general purpose macro it is important to ensure execution efficiency
of its generated code. Conditional expansion helps in generating
assembly code specifically suited to the parameters in a macro call.
This is achieved by ensuring that a model statement is visited only
under specific conditions during the expansion of a macro. The AIF
and AGO statements are used for this purpose
• An AIF statement has the syntax
AIF (<expression>)<sequencing symbol>
where <expression> is a relational expression involving ordinary
strings, formal parameters and their attributes, and expansion time
variables. If the relational expression evaluates to true, expansion time
control is transferred to the statement containing <sequencing
symbol> in its label field.
• An AGO statement has the syntax
AGO <sequencing symbol>
and unconditionally transfers expansion time control to the statement
containing <sequencing symbol> in its label field.
• An ANOP statement is written as
<sequencing symbol> ANOP
and simply has the effect of defining the sequencing symbol.
•
MACRO
EVAL
AIF
MOVER
SUB
ADD
AGO
.ONLY MOVER
.OVER
MEND
&X,&Y,&Z
(&Y EQ &X) .ONLY
AREG, &X
AREG, &Y
AREG,&Z
.OVER
AREG, &Z
• Expansion time variables- Expansion time variables (EV) are variables which can
only be used during the expansion of macro calls. A local EV is created for use only
during a particular macro call. A global EV exists across all macro calls situated in a
program and can be used in any macro which has a declaration for it. Local and
global EVs are created through declaration statements with the following syntax:
•
LCL <EV Specification>[,<EV specification>..]
GBL <EV specification>[,<EV specification>…]
and <EV Specification> has the syntax &<EV name>, where <EV name> is an
ordinary string. Values of EV’s can be manipulated through the preprocessor
statement SET. A SET statement is written as
<EV specification> SET <SET-expression>
where <EV specification> appears in the label field and SET in the mnemonic field.
A SET statement assigns the value of <SET-expression> to the EV specified in <EV
specification>. The value of an EV can be used in any field of a model statement,
and in the expression of an AIF statement.
•
&A
&A
MACRO
CONSTANTS
LCL
SET
DB
SET
DB
MEND
&A
1
&A
&A +1
&A
A call on macro CONATANTS creates a local EV A and SET assigns a value
‘1’ for it. The first DB statement declares a byte constant ‘1’. The second SET
statement assigns the value ‘2’ to A and second DB statement declares a
constant ‘2’.
• Attributes of formal Parameters- The expressions used in AIF
statement can include parameters with attributes. An attribute is
written using the syntax
<attribute name>’<formal parameter spec>
and represents information about the value of the formal parameter
i.e about the corresponding actual parameter. The type, length and size
attributes have the names T,L and S
MACRO
DECL_CONST
&A
AIF
(L’&A EQ 1) .NEXT
------.NEXT ---------MEND
• Expansion time loops- It is often necessary to generate many similar
statements during the expansion of a macro. This can be achieved by
writing similar model statements in the macro. Alternatively, the same
effect can be achieved by writing an expansion time loop which visits
a model statement, or set of model statements, repeatedly during
macro expansion.. Expansion time loops can be written using
expansion time variables (EV’s) and expansion time control transfer
statements AIF and AGO
MACRO
CLEAR
&X,&N
LCL
&M
&M
SET
0
MOVER
AREG, ‘=0’
.MORE MOVEM AREG, &X+&M
&M
SET
&M+1
AIF
(&M NE N) .MORE
MEND
On calling the macro with CLEAR B,5
• M is initialized to zero. The expansion of model statement
MOVEM AREG, &X+&M leads to generation of the statement
MOVEM AREG ,B
The value of B is incremented by 1 and the model statement MOVEM …
is expanded repeatedly until its value equals the value of N, which is 5
in this case. Thus macro call leads to generation of the statements:
+ MOVER AREG,=‘0’
+ MOVEM AREG, B
+ MOVEM AREG, B+1
+ MOVEM AREG, B+2
• Other facilities for expansion time loops-Many assemblers provide
other facilities for conditional expansion, an ELSE clause in AIF
being an obvious example.
• The REPT statement
REPT <expression>
expression should evaluate to a numerical value during macro
expansion. The statements between REPT and ENDM statement
would be processed for expansion expression number of times.
• The IRP statement
IRP <formal parameter>,<argument list>
The formal parameter mentioned in the statement takes successive
values from the argument list. For each value, the statements between
the IRP and ENDM statements are expanded once.
•
MACRO
CONST10
LCL
&M
&M
SET
1
REPT
10
DC
‘&M’
&M
SET
&M+1
ENDM
MEND
Declares 10 constants with values 1,2,------10
•
MACRO
CONSTS
&M,&N,&Z
IRP
&Z,&M,7,&N
DC
‘&Z’
ENDM
MEND
A macro call CONST 4,10 leads to the declaration of 3 constants with the values 4,7
and 10
• Semantic Expansion- Semantic expansion is the generation of
instructions tailored to the requirements of a specific usage. It can be
achieved by a combination of advanced macro facilities like AIF and
AGO statements and expansion time variables. The CLEAR macro
and EVAL macros are the examples of instances of semantic
expansion.
MACRO
CREATE_CONST &X,&Y
AIF
(T’ &X EQ B) .BYTE
&Y
DW
25
AGO
.OVER
.BYTE ANOP
&Y
DB
25
.OVER
MEND
This macro creates a constant ’25’ with the name given by the 2nd
parameter. The type of the constant matches the type of the first
parameter
DESIGN OF MACRO PRE-PROCESSOR
• The process of macro expansion requires that the source program containing the
macro definition and call is first translated to the assembly language program
without any macro definitions or calls. This program form can be handed over to a
conventional assembler to obtain the target language form of the program.
• In such a schematic, the process of macro expansion is completely segregated from
the process of program assembly. The translator which performs macro expansion in
this manner is called a macro pre-processor. The advantage of this scheme is that
any existing conventional assembler can be enhanced in this manner to incorporate
macro processing. It would reduce the programming cost involved in making macro
facility available. The disadvantage is that this scheme is not very efficient because
of time spent in generating assembly language statements and processing them again
for the purpose of translation to the target language.
• As against this schematic of a macro preprocessor preceding a conventional
assembler, it is possible to design a macro assembler which not only processes
macro definitions and macro calls for the purpose of expansion but also assembles
the expanded statements along with the original assembly statements. The macro
assembler should require fewer passes over the program than the preprocessor
scheme. This holds out a promise for better efficiency.
• Design of macro Pre-processor- The macro preprocessor accepts an
assembly program containing definitions and calls and translates it
into an assembly program which does not contain any macro
definition call
Macro
Assembler
Preprocessor
Target
Program
Program with
Macro definitions
Program
And calls
Without macros
•
•
•
•
•
Design overview
We begin the design by listing all tasks involved in macro expansion
Step1 : Identify macro calls in the program
Step 2: Determine the values of formal parameters
Step 3: Maintain the values of expansion time variables defined in a
macro
• Step 4: Organize expansion time control flow
• Step 5: Determine the values of sequencing symbols
• Step 6: Perform expansion of a model statement
• Identify macro calls- Examine all statements in the assembly source
program to detect macro calls Scan all macro definitions one by one
for each macro defined . For each of the macro defined, perform the
following tasks:
• Enter its name in macro name Table (MNT)
• Enter the entire macro definition in macro definition table MDT
• Add auxiliary information to the MNT indicating where the definition
of a macro is found in MDT
While processing a statement, the preprocessor compares the
string found in mnemonic field with the macro names in the MNT. A
match indicates that the current statement is a macro call.
• Identify values of formal parameters- A table called the actual
parameter table (APT) is designed to hold the values of formal
parameters during the expansion of a macro call. Each entry in the
table is a pair
(<formal parameter name>,<value>)
Two items of information are required to construct this table, names of
formal parameters and default values of keyword parameters. For this
table, a table called the parameter default table (PDT) is used for each
macro. This table would be accessible from the MNT entry of a macro
and would contain pairs of the form (<formal parameter
name>,<default value>). If a macro call statement does not specify a
value for some parameter par, its default value would be copied from
the PDT to APT.
• Maintain Expansion time variables- An expansion time variable table
(EVT) is maintained for this purpose. The table contains pairs of the
form
(<EV name>,<value>)
The value field of a pair is accessed when a preprocessor statement or a
model statement under expansion refers to an EV.
• Organize Expansion time control flow- The body of a macro i.e the set
of preprocessor statements and model statements in it , is stored in a
table called the Macro definition table (MDT) for use during macro
expansion. The flow of control during macro expansion determines
when a model statement is to be visited for expansion.
• Determine values of Sequencing Symbols- A sequencing symbols
table (SST) is maintained to hold this information. The table contains
pairs of the form
(<sequencing symbol name>,<MDT entry #>)
where MDT entry # is the number of the MDT entry which contains
the model statement defining the sequencing symbol. This entry is
made on encountering a statement which contains the sequencing
symbol in its label field or on encountering a reference prior to its
definition (in case of forward reference)
• Perform Expansion of a model statement- This is the trivial task:
• MEC points to the MDT entry containing the model statement.
• Values of formal parameters and EV’s are available in APT and EVT
respectively.
• The model statement defining a sequencing symbol can be identified
from SST
Expansion of the model statement is achieved by performing a lexical
substitution for the parameters and EV’s used in the model statement.
• Data structures- The tables APT,PDT and EVT contain pairs which
are searched using the first component of the pair as a key--- for
example, the formal parameter name is used as the key to obtain its
value from APT. this search can be eliminated if the position of an
entry within a table is known when its value is to be accessed.
• The value of the formal parameter ABC is needed while expanding a
model statement using it i,e
MOVER AREG,&ABC
Let the pair (ABC,ALPHA) occupy entry #5 in APT. The search in
APT can be avoided if the model statement appears as
MOVER AREG, (P,5)
in the MDT, where (P,5) stands for the word ‘parameter #5’
Thus macro expansion can be made more efficient by storing an
intermediate code for a statement, rather than its source form, in the
MDT. All parameter names could be replaced by pairs of the form
(P, n) in model statements and preprocessor statements stored in MDT.
• To implement this simplification, ordinal numbers are assigned to all
parameters. A table named parameter Name table (PNTAB) is used
for this purpose. Parameter names are entered in PNTAB in the same
order in which they appear in the prototype statement. The entry # of
the parameter’s entry in PNTAB is now its ordinal number. This entry
is used to replace the parameter name in the model and preprocessor
statements of the macro while storing it in MDT. Thus the information
in APT has been split into two tables:
PNTAB which contains formal parameter names
APTAB which contains formal parameter values
(i.e contains actual parameters)
PNTAB is used while processing a macro definition while APTAB is
used during macro expansion.
• Similar analysis leads to splitting of EVT into EVNTAB and EVTAB
and SST into SSNTAB and SSTAB. EV names are entered in
EVNTAB while processing EV declarations. SS names are entered in
SSNTAB while processing an SS reference or definition whichever
occurs earlier.
• PDT (parameter Default table) is replaced by keyword parameter
default table(KPDTAB). This table would contain entries only for the
keyword parameters.
• Thus, each MNT entry contains three pointers MDTP,KPDTP and
SSTP which are pointers to MDT,KPDTAB, and SSNTAB for the
macro respectively. Instead of using different MDTs for different
macros, we can create a single MDT and use different sections of
table for different macros.
• Construction and use of the macro preprocessor data structures can be
summarized as follows:
• PNTAB and KPTAB are constructed by processing the prototype
statement.
• Entries are added to EVNTAB and SSNTAB as EV declarations and
SS definitions/references are encountered
• MDT entries are constructed while processing the model statements
and preprocessor statements in the macro body.
• An entry added to SSTAB when the definition of a sequencing
symbol is encountered.
• APTAB is constructed while processing a macro call.
• EVTAB is constructed at the start of expansion of a macro
•
MACRO
CLEARMEM
LCL
&M SET
MOVER
.MORE MOVEM
&M
SET
AIF
MEND
&X,&N,&REG=AREG
&M
0
&REG,=‘0’
&REG, &X+&M
&M+1
(&M NE N) .MORE
Data structures shown in the next slide include all the tables created for the given
macro. The data structures shown above the dotted line are the ones that are used
during the processing of a macro definition while the data structures between the broken
line and firm line are constructed during macro definition processing and used during
macro expansion. Data structures below the firm line are used for the expansion of a
macro call.
• Design of a macro Pre-processor- The broad details for the schematic for design of a
macro pre-processor are:
• Step1: Scan all macro definitions one by one for each macro defined
• Step 2: Enter its name in macro name Table (MNT)
• Step 3: Enter the entire macro definition in macro definition table MDT
• Step 4: Add auxiliary information to the MNT indicating where the definition of a
macro found in MDT
• Step 5: Examine all statements in the assembly source program to detect macro calls
For each macro call:
•
Locate the macro in MNT
•
Obtain information from MNT regarding position of macro definition in MDT
•
Process the macro call statement to establish correspondence between all formal
parameters and their values (that is actual parameters)
•
Expand the macro call by following the procedure given in step 6
• Step 6: Process the statements in the macro definition as find in MDT in their
expansion time order until the MEND statement is encountered . The conditional
assembly statements AIF and AGO will enforce changes in the normal sequential
order based on certain expansion time relations between values of formal parameters
and expansion time variables.
• Processing of Macro definitions- The following initializations are
performed before initiating the processing of macro definitions in a
program
KPDTAB_ptr:=1;
SSTAB_ptr:= 1;
MDT_ptr:= 1;
• Algorithm : (Processing of a macro definition)
• Step 1: SSNTAB_ptr:=1;
PNTAB_ptr:=1;
• Step 2: Process the macro prototype statement and form the MNT
entry
(a) name:= macro name;
(b) For each positional parameter
(i) Enter parameter name in PNTAB[PNTAB_ptr]
(ii) PNTAB_ptr:=PNTAB_ptr + 1;
(iii) #PP:=#PP+1;
(c ) KPDTP:=KPDTAB_ptr;
(d) For each keyword parameter
(i) Enter parameter name and default value in
KPDTAB[KPDTAB_ptr]
( ii) Enter parameter name in PNTAB[PNTAB_ptr]
(iii) KPDTAB_ptr:=KPDTAB_ptr+1;
(iv) PNTAB_ptr:=PNTAB_ptr +1;
(v) #KP:=#KP +1;
(e) MDTP:=MDTP_ptr;
(f) #EV:=0;
(g) SSTP:=SSTAB_ptr;
• Step 3: While not a MEND statement
(a) If an LCL statement then
(i) Enter expansion time variable name in EVNTAB
(ii) #EV:=EV + 1
(b) If a model statement then
(i) If label field contains a sequencing symbol then
If symbol is present in SSNTAB then
q:= entry number in SSNTAB
else
Enter symbol in SSNTAB[SSNTAB_ptr]
q:=SSNTAB_ptr;
SSNTAB_ptr:=SSNTAB_ptr+1;
SSTAB[SSTP+q-1]:=MDT_ptr;
(ii) For a parameter, generate the specification (P,#n)
(iii) For an expansion variable, generate the specification (E,#m)
(iv) Record the IC (Intermediate Code) in MDT[MDT_ptr];
(v) MDT_ptr:=MDT_ptr + 1;
(c ) If a preprocessor statement then
(i) If a SET statement
Search each expansion time variable name used in the statement in
EVNTAB and generate the spec (E,#m)
(ii) If an AIF or AGO statement then
If sequencing symbol used in the statement is present in SSNTAB then
q:= entry number in SSNTAB;
else
Enter symbol in SSNTAB[SSNTAB_ptr]
q:=SSNTAB_ptr
SSNTAB_ptr:=SSNTAB_ptr+1;
Replace the symbol by (S,SSTP+q-1)
(iii) Record the IC in MDT[MDT_ptr]
(iv) MDT_ptr:=MDT_ptr+1;
• Step 4: (MEND Statement)
If SSNTAB_ptr=1 (i.e SSNTAB is empty) then
SSTP:=0
Else
SSTAB_ptr:=SSTAB_ptr + SSNTAB_ptr-1
If #KP=0 then KPDTP=0;
• Macro Expansion- Following data structures are used to perform macro expansion
APTAB
Actual parameter table
EVTAB
EV table
MEC
MACRO Expansion Counter
APTAB_ptr APTAB pointer
EVTAB_ptr EVTAB pointer
Algorithm : (Macro expansion)
• Step 1: Perform the initializations for the expansion of a macro
(a) MEC:=MDTP field of the MNT entry
(b) Create EVTAB with #EV entries and set EVTAB_ptr
(c ) Create APTAB with #PP+#KP entries and set APTAB_ptr
(d) Copy keyword parameter defaults from the entries
KPDTAB[KPDTP]….KPDTAB[KPDTP+#KP-1] into
APTAB[#PP+1]…. APTAB[#PP+#KP]
(e) process positional parameters in the actual parameter list and copy them
into APTAB[1]….APTAB[#PP].
(f) For keyword parameters in the actual parameter list
Search the keyword name in parameter name field of
•
•
•
KPDTAB[KPDTP]…..KPDTAB[KPDTP+#KP-1]. Let KPDTAB[q]
contain a matching entry. Enter value of the keyword parameter in the call
in APTAB[#PP+q-KPDTP+1]
Step 2: While statement pointed by MEC is not MEND statement
(a) If a model statement then
(i) Replace operands of the form (P,#n) and (E,#m) by values in
APTAB[n] and EVTAB[m] respectively
(ii) Output the generated statement
(iii) MEC := MEC +1;
(b) If a SET statement with the specification (E,#m) in the label field
then
(i) Evaluate the expression in the operand field and set an appropriate
value in EVTAB[M]
(ii) MEC:=MEC+1
(c ) If an AGO statement with (S,#s) in the operand field , then
MEC:= SSTAB[SSTP+s-1];
(d) If an AIF statement with (S,#s) in the operand field, then
If condition in the AIF statement is true, then
MEC:=SSTAB[SSTP+s-1]
Step 3: Exit from macro expansion.
MACRO
COMPUTE
MOVEM
INCR_D
MOVER
MEND
COMPUTE X,Y
&FIRST, &SECOND
BREG,TMP
&FIRST, &SECOND, REG=BREG
BREG, TMP
MOVEM BREG, TMP
INCR_D X,Y
MOVER
BREG, TMP
• Nested macro calls- Two basic alternatives exist for processing nested
macro calls.
• In the first scheme, The macro expansion scheme can be applied to the
each level of expanded code to expand the nested macro calls until we
obtain a code form which does not contain any macro calls. This
scheme would require a number of passes of macro expansion which
makes it quite expensive.
• A more efficient alternative would be to examine each statement
generated during macro expansion to see if it is itself a macro call. If
so, a provision can be made to expand this call before continuing with
the expansion of the parent macro call. This avoids multiple passes of
macro expansion, thus ensuring processing efficiency. This alternative
requires some extensions in the macro expansion scheme.
• In order to implement the second scheme for macro expansion of
nested macros, two provisions are required
• Each macro under expansion must have its own set of data structures i.e MEC,
APTAB, EVTAB, APTAB_ptr and EVTAB_ptr
• An expansion nesting counter (Nest_cntr) is maintained to count the number of
nested macro calls. Nest_cntr is incremented when a macro call is recognized and
decremented when a MEND statement is encountered. Thus Nest_cntr > 1 indicates
that a nested macro call is under expansion, while Nest_cntr=0 implies that macro
expansion is not in progress currently
The first provision can be implemented by creating many copies of the
expansion time data structures. These can be stored in the form of an array. For
example we can have an array called APTAB_array, each element of which is an
APTAB. It is expensive in terms of memory as well as requirements. It also involves
a difficult design decision. Since macro calls are expanded in a LIFO manner, a
practical solution is to use a stack to accommodate the expansion time data
structures
• The stack consists of expansion records, each expansion record accommodating one
set of expansion time data structures. The expansion record at the top of the stack
corresponds to the macro call currently being expanded. When a nested macro call is
recognized, a new expansion record is pushed on the stack to hold the data structures
for the call. At the MEND, an expansion record is popped off the stack. This would
uncover the previous expansion record in the stack which houses the expansion time
data structures of the outer macro.
Previous
Expansion
record
Reserved
Pointer
RB
1(RB)
2(RB)
3(RB)
TOS
MEC
EVTAB_ptr
APTAB
EVTAB
Use of stack for macro preprocessor data structures
• The expansion record at the top of the stack contains the data
structures in current use. Record base (RB) is a pointer pointing to the
start of this expansion record. TOS (Top of Stack) points to the last
occupied entry in stack. When a nested macro call is detected, another
set of data structures is allocated on the stack. RB is now set to point
to the start of the new expansion record. MEC, EVTAB_ptr, APTAB
and EVTAB are allocated on the stack in that order. During macro
expansion, the various data structures are accessed with reference to
the value contained in RB. This is performed using the following
addresses:
•
Data structure
Address
Reserved Pointer
0(RB)
MEC
1(RB)
EVTAB_ptr
2(RB)
APTAB
3(RB) to eAPTAB + 2(RB)
EVTAB
contents of EVTAB_ptr
Where 1(RB) stands for ‘contents of RB +1’.
At a MEND statement, a record is popped off the stack by
setting TOS to the end of the previous record. It is now necessary to
set RB to point to the start of previous record in stack. This is
achieved by using the entry marked ‘reserved pointer’ in the
expansion record. This entry always points to the start of the previous
expansion record in stack. While popping off a record, the value
contained in this entry can be loaded into RB. This has the effect of
restoring access to the expansion time data structures used by the
outer macro.
• Actions at the start of expansion are:
No
Statement
1
TOS:=TOS+1
2
TOS* :=RB
3
RB:=TOS
4
1(RB) :=MDTP entry of MNT
5
2(RB):= RB + 3+ #eAPTAB
6
TOS:=TOS+ #eAPTAB + #eEVTAB +2
The first statement increments TOS to point to the first word of the
new expansion record. This is the reserved pointer. The ‘*’ mark in
the second statement TOS* := RB indicates indirection. This
statement deposits the address of the previous record base onto this
word. New RB is now established in statement 3. Statements 4 , 5 set
MEC, EVTAB_ptr respectively. Statement 6 sets TOS to point to the
last entry of the expansion record.
• Actions at the end of the expansion are:
No
Statement
1.
TOS:=RB -1
2.
RB := RB*
The first statement pops an expansion record off the stack by resetting
TOS to the value it had while the outer macro was being expanded.
RB is then made to point at the base of the previous record. Data
structures in the old expansion record are now accessible as
displacements from the new value in RB i.e MEC is 1(RB).
• Design of macro Assembler- The use of a macro preprocessor
followed by a conventional assembler is an expensive way of handling
macros since the number of passes over the source program is large
and many functions get duplicated. For example, analysis of a source
statement to detect macro calls requires us to process the mnemonic
fields. A similar function is required in the first pass of the assembler.
Similar functions of preprocessor and the assembler can be merged if
macros are handled by a macro assembler which performs macro
expansion and program assembly simultaneously. This may also
reduce the number of passes.
• It is not always possible to perform macro expansion in a single pass.
Certain kind of forward references cannot be handled in a single pass.
The problem of forward references arise when a macro call wants to
use a variable in macro call which has not been defined in the
program. This problem can be solved by using classical two pass
organization for macro expansion. The first pass collects the
information about the symbols defined in a program and the second
pass perform the macro expansion.
&Y
.BYTE
&Y
.OVER
A
MACRO
CREATE_CONST
AIF
DW
AGO
ANOP
DB
MEND
CREATE_CONST
.
.
DB
END
&X,&Y
(T’&X EQ B) .BYTE
25
.OVER
25
A, NEW_CON
?
• Pass structure of a macro assembler
• To design the pass structure of a macro assembler, we identify the
functions of a macro preprocessor and the conventional assembler
which can be merged to advantage. After merging, the functions can
be structured into passes of the macro assembler. This process leads to
following pass structure:
• Pass I
– SYMTAB construction
– Macro definition processing
• Pass II
– Macro expansion
– Memory allocation and LC processing
– Processing of literals
– Intermediate code generation
• Pass III
– Target Code generation
Pass II is large in size as it performs
many functions. Further, since it performs macro
expansion as well as Pass I of a conventional
assembler, all the data structures of the macro
preprocessor and conventional assembler need to
exist during this pass.
• Can a one-pass macro processor successfully handle a macro call with conditional
macro pseudo-ops
Consider the following case:
MACRO
WCM
&S
AIF
(&S EQ 19) .END
.END MEND
WCM V
V EQU 12
END
If a one pass processor cannot, what modifications would be necessary to enable it to
handle this type of situation
LOADERS AND LINKERS
• Execution of a program written in a language L involves the
following steps:
• Translation of the program
• Linking of the program with other programs needed for its execution
• Relocation of the program to execute from the specific memory area
allocated to it.
• Loading of the program in memory for the purpose of execution.
These steps are performed by different language processors. Step1 is
performed by the translator for language L. Steps 2 and 3 are
performed by a linker while step 4 is performed by a loader.
• Loaders- A loader is a software processor which performs some lowlevel processing of the programs input to it to produce a ready-toexecute program form. The loader is a program which accepts the
object program , prepares these programs for execution by the
computer, and initiates the execution. In particular, The loader must
perform four functions:
• Allocate space in memory for the program (allocation)
• Resolve symbolic references between the object modules (linking)
• Adjust all the address dependent locations, such as address constants,
to correspond to the allocated space (relocation)
• Physically place the machine instructions and data into memory
(loading)
Object
program
Translator
Source
program
M/C language Program
Loader
Other object
programs
M/C Language
program
Result
• LOADER SCHEMES- There are various schemes for accomplishing
the four functions of a loader.
• Compile-and-Go Loader- One method of performing the loader
functions is to have the assembler run in one part of memory and
place the assembled machine instructions and data, as they are
assembled, directly into their assigned memory locations. When the
assembly is completed, the assembler causes a transfer to the starting
instruction of the program. This is a simple solution, involving no
extra procedures.
• Such a loading scheme is commonly called “compile-and-go” or
assemble-and-go”. It is relatively easy to implement. The assembler
simply places the code into the core and the loader consists of one
instruction that transfers to the starting instruction of the newly
assembled program.
• Disadvantages of Compile-and-Go Loader
• A portion of the memory is wasted because the core occupied by the
assembler is unavailable to the object program
• It is necessary to retranslate (assemble) the user’s program every time
it is run.
• It is very difficult to handle multiple segments, especially if the source
programs are in different languages( e.g one subroutine in assembly
language and another subroutine in FORTRAN language)
Source
Program
Compile-and-Go
Translator
(e.g., assembler)
Program Loaded
in memory
Assembler
Compile-and-Go Loader Scheme
• General Loader Scheme- Outputting the instructions and data as they are assembled
circumvents the problem of wasting core for the assembler. Such an output could be
saved and loaded whenever the code was to be executed. The assembled program
could be loaded into the same area in core that the assembler occupied( since the
translation have been completed). This output form, containing a coded form of the
instructions is called an object program or object code.
• The use of an object code as intermediate data to avoid one disadvantage of the
compile-and-Go scheme requires the addition of a new program to the system, a
loader.
• The loader accepts the assembled machine instructions and data in core in an
executable computer form. The loader is assumed to be smaller than the assembler,
so that more memory is available to the user. Also reassembly is no longer necessary
to run the program at a later date.
• If all the source program translators (assemblers and compilers) produce compatible
object programs and use compatible linkage conventions, it is possible to write
subroutines in several different languages since the object codes to be processed by
the loader will be in the same language (machine language).
•
•
Absolute Loader- The simplest type of loader scheme , which fits the general loader is
called the absolute loader. In this scheme the assembler outputs the machine language
translation of the source program in almost the same form as in the assemble-and-go scheme,
except that the data is stored in the form of object code instead of being placed directly in
memory. The loader in turn simply accepts the machine language text and places it into core
at the location prescribed by the assembler. This scheme makes more core available to the
user since the assembler is not in memory at load time. Every instruction of this program
form is already bound to a specific load-time address. For execution of this program, it needs
to be loaded into the main storage without any relocation
In this form of loader, the loader is presented with:
– Text of program which has been linked for the designated area
– Load address for the first word of the program (called load address
origin)
– Length of the program
•
•
Because of its simplicity, an absolute loader can be loaded in very few machine instructions.
For example loaders of some of the minicomputers is as small as 20 instructions in length.
Hence not much storage is wasted with the presence of a loader in memory. Also the
program can be stored in the library in their ready-to-execute form.
Absolute loaders are simple to implement but they do have several disadvantages
– The programmer must specify to the assembler the address in core
where the program is to be loaded .
– If there are multiple subroutines , the programmer must remember the
address of each and use that absolute address explicitly in his other
subroutines to perform subroutine linkage.
Source
program
Translator
Object
program1
Loader
Source
Program
Translator
Object
program
ready for
execution
Object
Program 2
Loader
General Loader Scheme
• Program Relocatability-Another function provided by the loader is
that of program relocation. Assume a HLL program A calls a standard
function SIN. A and SIN would have to be linked with each other. But
where in memory shall we load A and SIN ? A possible solution is to
load them according to the addresses assigned when they were
translated. But it is possible that the assigned addresses are wide apart
in storage. For example as translated A might require storage area 200
to 298 while SIN occupies the area 100 to 170. If we were to load
these programs at their translated addresses, the storage area situated
between them would be wasted.
• Another possibility is that both A and SIN can co-exist in storage.
Hence the loader has to relocate one or both the programs to avoid
address conflicts or storage wastage.
• Relocation is more than simply moving a program from one storage
area to another. This is because of the existence of address sensitive
instructions in the program. After moving into appropriate storage
area, it is necessary to modify the location sensitive code so that it can
execute correctly in the new set of locations.
• Feasibility of relocating a program and the manner in which relocation can be
carried out characterizes a program into one of the following forms:
• Non-relocatable programs
• Relocatable programs
• Self-relocating programs
• Non-relocatable programs- A non-relocatable program is one which cannot be made
to execute in any area of storage other than the one designated for it at the time of
coding or translation. Non-relocatability is the result of address sensitivity of code
and lack of information regarding which parts of program are address sensitive and
in what manner. For example, consider a program A which is coded and translated
to be executed from storage location 100 to 198. In the program there can be
instructions that contain the operand addresses and data. The addresses will fall in
the range of 100 to 198 which can be relocated but data values should not be
replaced. Also if one can differentiate between instructions and data, this scheme
will not work for the programs using index registers or base-displacement modes of
addressing.
• Thus relocation is feasible only if relevant information regarding the addresssensitivity of a program is available. A program form which does not contain such
information is non-relocatable.
• Relocatable program- A relocatable program form is one which consists of a
program and relevant information for its relocation. Using this information, it is
possible to relocate the program to execute from a storage area other than theone
designated for it at the time of its coding or translation. The relocation information
would identify the address sensitive portions of code and indicate how they can be
relocated. For example, in case of program A assuming absolute addressing mode,
the relocation information can be in the form of a table containing the addresses of
those instructions which need to be relocated, or it could simply be the address of
first and last instruction of the program, assuming all instructions to be contiguous.
• In relocatable form, the program is a passive object. Some other program must
operate on it using the relocation information inorder to make it ready-to-execute
from its load area. The relocatable program form is called an object module and the
agent which performs its relocation is the linkage editor or linking loader. For the
target program produced by the translator to be relocatable, The translator should
itself supply the relocation information along with the program. Thus , the output
interface of every translator (other than the translator which uses the translate-andgo schematic) should be an object module and not merely the code and data
constituting the target program.
• Self-relocating Programs- A self relocating program is one which can
itself perform the relocation address-sensitive portions. Thus, it not
only contains the information regarding its address-sensitivity, but
also the code to make use of this information regarding its addresssensitivity but also the code to make use of this information to
relocate the address sensitive part of its code. Such a program does not
need an external agency like the linkage editor to relocate it. It can be
simply loaded into its load area and given control for execution. Prior
to execution of its main code, the relocating logic would be executed
to adapt the program to its execution time storage addresses.
• Most programs can be made self-relocating by simply adding
– Information about the address-sensitive portions
– Relocating logic which would make use of this
information
• Extending in this fashion, the self relocating form would occupy more
storage than the original program. There are standard techniques for
coding a program in the self relocating form using index registers or
the base-displacement mode of addressing.
• If the linked origin ≠ translated origin, relocation must be performed by the linker. If
the linked origin ≠ load origin, relocation must be performed by the loader. Mostly
the relocation is done at the time of linking but can also be done at the time of
loading.
• Performing relocation- Let the translated and linked origins of program P be
t_origin and l_origin respectively. Consider a symbol symb in program P. Let its
translation time address be tsymb and link time address be lsymb. The relocation
factor of P is defined as
relocation_factor= l_origin –t_origin---------------(1)
• Thus the relocation factor can be positive, negative or zero.
• Consider a statement which uses symb as an operand. The translator puts the
address tsymb in the instruction generated for it. Now,
tsymb=t_origin + dsymb
• Where dsymb is the offset of symb in P. hence
•
lsymb=l_origin +dsymb
Using (1)
lsymb= t_origin +relocation_factor +dsymb
= t_origin +dsymb+ relocation_factor
= tsymb +relocation_factor
• Let IRR designate the set of instructions requiring relocation in program P.
Relocation of program P can be performed by computing the relocation factor for P
and adding it to the translation time address(es) in every instruction i ε IRR
• Example:
Statement
Address
Code
START 500
ENTRY TOTAL
EXTRN MAX, ALPHA
READ A
500)
+ 09 0 540
LOOP
501)
MOVER AREG, ALPHA
BC
ANY, MAX
BC
STOP
A
DS
TOTAL DS
END
LT, LOOP
1
1
518)
519)
538)
539)
540)
541)
+ 04 0 000
+ 06 6 000
+ 06 1 501
• The translated origin of the program shown is 500. The translation
time address of LOOP is therefore 501. If the program is loaded for
execution in memory area starting with address 900, the load time
origin is 900. The load time address of LOOP would be 901.
• Thus the relocation factor for the program is
•
relocation_factor=900-500
•
= 400
• Thus, IRR contains the instructions with the translated addresses 500
and 538. the instructions with translated address 500 contains the
address 540 in the operand field. This address is changed to
(540+400)= 940. similarly, 400 is added to the operand address in the
instruction with the translated address 538.Thus the address of LOOP
is changed from 501 to 901.
• Subroutine linkages-The problem of subroutine linkages is this:
• A main program A wishes to call a subprogram B.
• The programmer in program A could write a transfer instruction to
subprogram B. However the assembler does not know the value of
this symbol reference and will declare it as an error unless a special
mechanism has been provided. This mechanism is typically
implemented with a relocating or direct linking loader.
• The assembler pseudo-op EXTERN followed by a list of symbols
indicates that these symbols are defined in other programs but
referenced in present program. External means that the values of these
symbols are not known to the assembler but will be provided by
loaders.
• Correspondingly, if a symbol is defined in one program and is
referenced in other programs, we insert it into a symbol list following
the pseudo-op ENTRY.
• To avoid possible reassembling of all subroutines when a single
subroutine is changed, and to perform the tasks of allocation and
linking for the programmer, a general class of relocating loaders was
introduced.
• An example of a relocating loader scheme is that of Binary Symbolic
Subroutine (BSS) loader
• The output of a relocation assembler using a BSS scheme is the object
program and information about all other programs it references. In
addition there is information (relocation information) as to locations in
this program that need to be changed if it is to be loaded in an
arbitrary place in core i.e the locations which are dependent on the
core allocation.
• For each source program, the assembler outputs a text( machine
translation of the program) prefixed by a transfer vector that consists
of addresses containing names of the subroutines referenced by the
source program.
• Thus the assembler using the BSS Scheme provides the
loader
– Object program + relocation information
– Prefixed with information about all other program it
references (transfer vector).
– The length of the entire program
– The length of the transfer vector portion
Transfer Vector
• A transfer vector consists of
– addresses containing names of the subroutines
referenced by the source program
– if a Square Root Routine (SQRT) was referenced and
was the first subroutine called, the first location in the
transfer vector could contain the symbolic name
SQRT.
– The statement calling SQRT would be translated into
a branch to the location of the transfer vector
associated with SQRT
• After loading the text and the transfer vector into core, the loader
would load each subroutine identified in the transfer vector. It would
then place a transfer instruction to the corresponding subroutine in
each entry in the transfer vector. Thus, the execution of the call SQRT
statement would result in a branch to the first location in the transfer
vector, which would contain a transfer instruction to the location of
SQRT.
• Two methods for specifying relocation as part of the object
program:
• 1. A Modification record
– describe each part of the object code that must be changed when
the program is relocated
– M0000_16
• 2. Use of “relocation bits”. The assembler
associates a bit with each instruction or address field.
If this bit equals one, then the corresponding address
field must be relocated; otherwise the field is not
relocated. These relocation indicators are known as
relocation bits and are included in the object
S for Absolute: does not need modification
R for Relative: needs relocation
X for external.
Example
T00106119SFE00S4003S0E01R
• Direct linking loader- A direct linking loader is a general relocatable loader and is
perhaps the most popular loading scheme presently used. The direct linking loader
has the advantage of allowing the programmer multiple procedure segments and
multiple data segments and of giving him complete freedom in referencing data or
instructions contained in other segments. This provides flexible inter segment
referencing and accessing ability while at the same time allowing independent
translation of programs.
• The assembler (translator) must give the loader the following information with each
procedure or data segment:
• The length of segment
• A list of all the symbols in the segment that may be referenced by other segments
and their relative location within the segment (ENTRY symbols)
• A list of all symbols not defined in the segment but referenced in the segment
(EXTERN Symbol).
• Information as to where address constants are located in the segment and a
description of how to revise their values
• The machine code translation of the source program and the relative address
assigned.
• In order to provide the above cited information, the assembler produces four types of
cards in the object code: ESD, TXT, RLD and END
• ESD (External Symbol Dictionary)- This contains information about all the
symbols that are defined in this program but can be referenced in other programs
(ENTRY) and all symbols that are defined elsewhere but are referenced in this
program (EXTRN).
• TXT(Text)- This contains the actual object code translated version of the source
program
• RLD(Relocation and Linkage Directory)- This contains the information about
those locations in the program whose contents depend on the address at which the
program is placed. For such locations, the assembler must supply information
enabling the loader to correct their contents. The RLD cards contain the following
information:
– The location of each constant that needs to be changed due
to relocation
– By what it has to be changed
– The operation to be performed
Reference no symbol Flag
Length
Relative location
14
JOHN
+
4
48
17
SUM
+
4
60
The first RLD card of example contains a 48, denoting the relative
location of a constant that must be changed; a plus sign indicating that
something has to be added to the constant; and the symbol field
indicating that the value of external symbol JOHN must be added to
the relative location 48. The process of adjusting the address constant
of an internal symbol is normally called relocation while the process
of supplying the contents of an address constant is normally referred
to as linking. Significantly, RLD is used for both cases which explains
why they are called relocation and linkage directory.
• END- This indicates the end of the object code and specifies the
starting address for execution if the assembled routine is the main
program
Dynamic Loading
• All the subroutines needed are loaded into core at the same time. If the
total amount of core required by all these subroutines exceeds the
amount available, as is common with large programs or small
computers, there is trouble. There are several hardware techniques
such as paging and segmentation, that attempt to solve this problem.
Dynamic binding is another scheme that can be use for solving the
problem
• Disadvantages of Direct Linking Loader
• One disadvantage of the direct-linking loader is that it is necessary to
allocate , relocate , link, and load all of the subroutines each time in
order to execute a program.
• Even though the loader program may be smaller than the assembler, it
does absorb a considerable amount of space.
These problems can be solved by dividing the loading process
into two separate programs: a binder and a module loader
BINDER
• A binder is a program that performs the same functions as the direct
linking loader in binding subroutines together, but rather than placing
the relocated and linked text directly into memory, it outputs the text
as a file or a card.
• This output file is in a format ready to be loaded and is typically
called a load module.
• The module loader merely has to physically load the module into core.
The binder essentially performs the functions of allocation, relocation
and linking; the module loader merely performs the function of
loading.
• There are two major classes of binders:
• Core-Image Builder- The simplest type of binder produces a load
module that looks very much like a single absolute loader deck. This
means that the specific core allocation of the program is performed at
the time that the subroutines are bound together. Since this kind of
module looks like an actual snapshot or image of a section of core, it
is called a core image module and the corresponding binder is called
a core image builder.
• Linkage Editor-A more sophisticated binder, called a linkage editor,
can keep track of the relocation information so that the resulting load
module can be further relocated and thereby loaded anywhere in core.
In this case the module loader must perform additional allocation and
relocation as well as loading, but it does not have to worry about the
complex problems of linking.
In both the cases, a program that is to be used repeatedly need
only be bound once and then can be loaded whenever required. The
core image builder binder is relatively simple and fast. The linkage
editor binder is somewhat more complex but allows a more flexible
allocation and loading scheme.
Disadvantage
• If a subroutine is referenced but never executed
– if the programmer had placed a call statement
in the program but was never executed
because of a condition that branched around
it
– the loader would still incur the overhead or
linking the subroutine.
• All of these schemes require the programmer to explicitly
name all procedures that might be called.
Example
• Suppose a program consisting of five subprograms
(A{20k},B{20k}, C{30k}, D{10k}, and E{20k}) that require
100K bytes of core.
– Subprogram A only calls B, D and E;
– subprogram B only calls C and E;
– subprogram D only calls E
– subprogram C and E do not call any other
routines
• Note that procedures B and D are never in used the same time;
neither are C and E. if we load only those procedures that are
actually to be used at any particular time, the amount of core
needed is equal to the longest path of the overlay structure.
Longest Path Overlay Structure
100k vs 70k needed
A
20K
B
20K
D
10K
E
20K
C
30K
• In order for the overlay structure to work it is necessary for the
module loader to load the various procedures as they are needed. The
portion of the loader that actually intercepts the calls and loads the
necessary procedure is called the overlay supervisor or simply the
flipper. This overall scheme is called dynamic loading or load-oncall.
Dynamic Linking
• A major disadvantage of some loading schemes is that if a subroutine is referenced
but never executed for example if the programmer had placed a call statement in his
program but this statement was never executed because of a condition that branched
around it, the loader would still incur the overhead of linking the subroutine.
• A very general type of loading scheme is called dynamic linking. This is a
mechanism by which loading and linking of external reference are postponed until
execution time. That is, assembler produces text, binding, and relocation information
from a source language deck. The loader loads only the main program. If the main
program should execute a transfer instruction to an external address, or should
reference an external variable that is variable that has not been defined in this
procedure segment, the loader is called. Only then is the segment containing the
external reference loaded.
• An advantage here is that no overhead is incurred unless the procedure to be called
or referenced is actually used. A further advantage is that the system can be
dynamically reconfigured. The major drawback to using this type of loading scheme
is the considerable overhead and complexity incurred, due to the fact that we have
postponed most of the binding process until execution time.
• Design of an Absolute Loader
• With an absolute loading scheme, the programmer and the assembler
perform the tasks of allocation, relocation and linking. Therefore it is
only necessary for the loader to read cards of the object deck and
move the text on the cards to the absolute locations specified by the
assembler.
• There are two types of information that the object desk must
communicate from the assembler to the loader:
• It must convey the machine instructions that the assembler has created
along with the assigned core locations
• It must convey the entry point of the program, which is where the
loader is to transfer control when all instructions are loaded
• Assuming the information is transmitted on cards, a possible format
for the card is shown:
• Text card (for instructions and data)
Card Column
Contents
1
Card type =0 (for text card identifier)
2
Count of number of bytes( 1 byte per column)
of information on card
3-5
Address at which the data on card is to be put
6-7
Empty (could be used for validity checking)
8-72
Instruction and data to be loaded
73-80
Card Sequence number
• Transfer Card (to hold entry point to the program)
Card Column
Contents
1
card Type = 1 (transfer card identifier)
2
Count=0
3-5
Address of entry point
5-72
Empty
73-80
card sequence number
Thus when a card is read, it is stored as 80 contiguous bytes.
• The algorithm for an absolute loader is quite simple.
• The object deck for the loader consists of a series of text cards
terminated by a transfer card.
• The loader should read one card at one time, moving the text to the
location specified on the card, until a transfer card is reached.
• At this point the instructions are in core and it is only necessary to
transfer to the entry point specified on the transfer card.
INITIALIZE
READ CARD
Set CURLOC to location
in characters 3-5
Type = 0
Card type ?
Set LNG to count in
character 2
Type=1
Transfer to location
CURLOC
Move LNG bytes of text
from characters 8-72 to
location CURLOC
Flowchart of working of an Absolute Loader
Design of a Direct Linking Loader
• Direct linking loader- A direct linking loader is a general relocatable loader and is
perhaps the most popular loading scheme presently used. The direct linking loader
has the advantage of allowing the programmer multiple procedure segments and
multiple data segments and of giving him complete freedom in referencing data or
instructions contained in other segments. This provides flexible inter segment
referencing and accessing ability while at the same time allowing independent
translation of programs.
• The assembler (translator) must give the loader the following information with each
procedure or data segment:
• The length of segment
• A list of all the symbols in the segment that may be referenced by other segments
and their relative location within the segment (ENTRY symbols)
• A list of all symbols not defined in the segment but referenced in the segment
(EXTERN Symbol).
• Information as to where address constants are located in the segment and a
description of how to revise their values
• The machine code translation of the source program and the relative address
assigned.
• In order to provide the above cited information, the assembler
produces four types of cards in the object code: ESD, TXT, RLD and
END
• ESD (External Symbol Dictionary)- This contains information about
all the symbols that are defined in this program but can be referenced
in other programs (ENTRY) and all symbols that are defined
elsewhere but are referenced in this program (EXTRN). In other
words , ESD contains information about the symbols that can be
referred beyond the subroutine level. There are three types of external
symbols the are entered in the ESD card. These are:
–
–
–
Segment Definition (SD)-name on START
Local Definition (LD)- Specified on ENTRY
External Reference (ER)- Specified on EXTRN
•
Each SD and ER symbol is assigned a unique number by the assembler. This
number is called the symbol’s identifier (or ID) and is used in conjunction with the
RLD cards.
• TXT (Text)- This contains the actual object code translated version of the source
program. It contains the block of data and relative address at which the data is to be
placed. Once the loader has decided where to load the program, it merely adds the
Program Load Address (PLA) to the relative address and move the data into
resulting location. The data on the TXT card may be instructions, nonrelocated data
or initial values of address constants.
• RLD (Relocation and Linkage Directory)- This contains the information about
those locations in the program whose contents depend on the address at which the
program is placed. For such locations, the assembler must supply information
enabling the loader to correct their contents. The RLD cards contain the following
information:
– The location and length of each address constant that
needs to be changed due to relocation or linking
– The external symbol by which the address constant has to
be changed
– The operation to be performed (add or subtract)
Reference no symbol Flag
Length
Relative location
14
JOHN
+
4
48
17
SUM
+
4
60
The first RLD card of example contains a 48, denoting the relative location of a
constant that must be changed; a plus sign indicating that something has to be added
to the constant; and the symbol field indicating that the value of external symbol
JOHN must be added to the relative location 48. Rather than using the actual
external symbol’s name on the RLD card, external symbol’s identifier or ID is used.
There are various reasons for this the major one probably being that the ID is only a
single byte long compared to the eight bytes occupied by the symbol name so that
considerable amount of space is saved on the RLD cards. The process of adjusting
the address constant of an internal symbol is normally called relocation while the
process of supplying the contents of an address constant is normally referred to as
linking. Significantly, RLD is used for both cases which explains why they are
called relocation and linkage directory.
• END- This indicates the end of the object deck and specifies the starting address for
execution if the assembled routine is the main program . If the assembler END card
has a symbol in the operand field, it specifies a start of execution point for the entire
program (all subroutines). This address is recorded on the END card
• LDT or EOF-There is a final card required to specify the end of a collection of
object decks. This is called Loader terminate (LDT) or End of File (EOF) card.
Subroutine A
Subroutine B
ESD
TXT
RLD
END
ESD
TXT
RLD
END
EOF or LDT
• The direct linking loader may encounter external reference in an
object deck which cannot be evaluated until a later object deck is
processed, this type of loader requires two passes for its complete
execution. Their functions are very similar to those of the two passes
of an assembler.
• The major function of pass 1 of a direct linking loader is to allocate
and assign each program a location in core and create a symbol table
filling in the value of the external symbols.
• The major function of pass2 is to load the actual program text and
perform the relocation modification of any address constants needing
to be altered.
Core
Memory
Two-pass direct linking loader scheme
• Specification of Data structures- The data bases required by each pass of the loader
are:
• Pass 1 data bases:
– Input object decks
– A parameter, the initial Program Load Address (IPLA)
supplied by the programmer or the operating system, that
specifies the address to load the first segment
– A program Load address (PLA) counter, used to keep track
of each segment’s assigned location.
– A table, the Global External Symbol Table (GEST), that is
use to store each external symbol and its corresponding
assigned core address.
– A copy of input to be used later by pass2. This may be
stored on an auxiliary storage device or the original object
decks may be reread by the loader a second time for pass 2.
– A printed listing, the load map, that specifies each external
symbol and its assigned value.
•
•
•
•
•
Pass 2 data bases:
Copy of object program inputted to pass 1
The Initial Program Load Address parameter (IPLA)
The Program Load Address counter (PLA)
The Global External Symbol Table (GEST), prepared by pass 1,
containing each external symbol and its corresponding absolute
address value.
• An array, the Local External Symbol Array (LESA), which is used to
establish a correspondence between the ESD ID numbers, used on
ESD and RLD cards, and the corresponding external symbol’s
absolute address value.
• Format of Databases- The third step in the design procedure is to specify the format
of various databases that are to be used.
• Object deck• Global External Symbol Table- The GEST is used to store the external symbols
defined by means of a Segment Definition (SD) or local definition (LD) entry on an
External Symbol Dictionary (ESD) card. When these symbols are encountered
during pass 1, they are assigned an absolute core address ; this address is stored
along with the symbol in the GEST.
• Local External Symbol Array- The external symbol to be used for relocation or
linking is identified on the RLD cards by means of an ID number rather than the
symbol’s name. The ID number must match an SD or ER entry on the ESD card.
This technique both saves the space on the RLD card and speeds the processing by
eliminating many searches of the Global External Symbol table.
It is necessary to establish a correspondence between an ID number on an RLD
card and the absolute core address value. The ESD card contains the ID numbers
and the symbols they correspond to, while the information regarding their addresses
is stored on the GEST. In pass2 of the loader, the GEST and ESD information for
each individual object deck is merged to produce the local external symbol array
that directly relate ID number and value.
Core
Memory
Use of databases by loader passes
• Algorithm
• Pass 1- allocate segments and define symbols
The purpose of pass 1 is to assign a location to each segment and thus
to define the values of all external symbols. Inorder to minimize the
amount of core storage required for the total program, each segment
is assigned the next available location after the preceding segment. It
is necessary for the loader to know where it can load the first segment.
This address, the Initial Program Load Address (IPLA) , is normally
determined by the operating system. In some systems the programmer
may specify the IPLA; in either case, the IPLA is a parameter supplied
to the loader.
• Step 1: Set the value of Program Load Address (PLA) to IPLA
• Step 2: Read the object card and make a copy of the card for use by
pass 2
• Step 3: If card type = TXT or RLD , then
no processing required
go to step 2
•
•
Step 4: Else If Card type = ESD , then
If external symbol type = SD [Segment definition]
set SLENGTH := LENGTH [from segment definition]
Set VALUE := PLA
If the symbol does not exist in the GEST
Store the VALUE and Symbol in the GEST [Global External symbol Table]
The Symbol and its value are printed as part of the load map
go to step 2
else
Report the error
go to step 2
Else if external symbol type = LD [Local definition]
Set VALUE := PLA plus the relative address , ADDR, indicated on the ESD card
If the symbol does not exist in the GEST
Store the VALUE and Symbol in the GEST [Global External symbol Table]
go to step 2
else
Report the error
go to step 2
Else if external symbol type = ER, then
no processing required during pass 1
go to step 2
[End of If structure]
Step 5: Else if card type = END , then
Set PLA:= PLA + SLENGTH [ PLA is incremented by SLENGTH and is saved in
SLENGTH becoming PLA for the next segment]
go to step 2
•
Step 6: Else if card type = LD or EOF
PASS 1 is finished and control transfers to the pass 2
[End of If structure]
Detailed Pass One flow chart
•
•
•
•
Pass 2: Load text and relocate/link address constants
After all the segments have been assigned locations and the external symbols have been defined
by pass 1, it is possible to complete the loading by loading the text and adjusting (relocation or linking)
address constants. At the end of pass 2, the loader will transfer control to the loaded program.
Step 1: The program load address is initialized as in pass 1, and execution start
address (EXADDR) is set to IPLA.
Step 2: Read the first card.
Step 3: If card type = ESD, then
If symbol = SD, then
SLENGTH:= LENGTH
The appropriate entry in the local external symbol array, LESA
(ID), is set to the current value of program Load Address. [the
address of SD will be given by PLA]
go to step 2
else if symbol = LD, then
no processing required in pass 2
go to step 2
else if symbol = ER, then
Search GEST for a match with ER symbol .
If it is not found,
The corresponding segment or entry must be missing and is an
error.
go to step 2
If a symbol is found in GEST,
Its value is extracted and corresponding Local External Symbol
Array entry LESA(ID) , is set equal to it.
go to step 2
•
•
Step 4: Else if card type = TXT, then
text is copied from the card to the appropriate relocated core location (PLA + ADDR)
[Every txt entry is put at the address equal to the sum of its relative address on ESD
card and current PLA]
Step 5: Else if card type= RLD, then
The value to be used for relocation and linking is extracted from the
Local external symbol array as specified by the ID field i.e LESA(ID).
Depending on the flag setting (plus or minus) the value is either added or
subtracted from the address constant. The actual relocated address of
the address constant is computed as the sum of the PLA and the ADDR
field specified on the RLD card.
Go to step 2
•
Step 6: Else if card type = END, then
If execution start address is specified on the END card , it is saved in
the variable EXADDR after being relocated by the PLA. The Program
load address is incremented by the length of the segment and saved in
SLENGTH, becoming the PLA for the next segment.
Go to step 2
•
Step 7: Else if card type = LDT /EOF, then
The loader transfers control to the loaded program at the address
specified by current contents of the execution address variable
(EXADDR)
[End of If structure]
• Step 8: End
Detailed pass two Flow chart
COMPILER
• A compiler bridges the semantic gap between a PL domain and an
execution domain. Two aspects of compilation are:
• Generate code to implement meaning of a source program in the
execution domain
• Provide diagnostics for violation of PL semantics in a source program.
A compiler accepts a program written in a higher level language
as input and produces its machine language equivalent as output. The
various tasks that a compiler has to do inorder to produce the machine
language equivalent of a source program are:
• Recognize certain strings as basic elements e.g recognize that COST
is a variable, ‘=‘ is an operator etc
• Recognize the combination of elements as syntactic units and interpret
their meaning e.g ascertain that the first statement is a procedure name
with three arguments, that the next statement defines four variables
etc.
• Allocate storage and assign locations for all variables in this program
• Generate the appropriate object code
The structure of a compiler
• A compiler takes as input a source program and produces as output an
equivalent sequence of machine instructions. This process is so
complex that it is not reasonable , either from a logical point of view
or from an implementation point of view, to consider the compilation
process as occurring in one single step. For this reason, it is customary
to partition the compilation process into a series of sub processes
called the phases
• A phase is a logically cohesive operation that takes as input one
representation of the source program and produces as output another
representation.
• Lexical analysis-The first phase , called the lexical analyzer or scanner
separates characters of the source program into groups that logically
belong together. These groups are called the tokens. The usual tokens
are keywords, such as DO or If, identifiers such as X or NUM,
operator symbols such as <= or + and punctuation symbols such as
parentheses or commas. The output of the lexical analyzer is a stream
of tokens which is passed to the next phase, the syntax analyzer or
parser. The tokens in this stream can be represented by codes which
may be regarded as integers . Thus Do might be represented by 1, +
by 2 and identifier by 3.The source program is scanned sequentially.
• The basic elements (identifiers and literals) are placed into tables. As
other phases recognize the use and meaning of the elements, further
information is entered into these tables (e.g precision, data types,
length, and storage class)
• Other phases of the compiler use the attributes of each basic element
and must therefore have access to this information. Either all the
information about each element is passed to other phases or typically
the source string itself is converted into a string of “uniform symbols”.
Uniform symbols are of fixed size and consist of the syntactic class
and a pointer to the table entry of the associated basic element.
• Because the uniform symbols are of fixed size, converting to them
makes the later phases of the compiler simpler. The lexical process
can be done in one continuous pass through the data by creating an
intermediate form of the program consisting of a chain or table of
tokens. Alternatively, some schemes reduce the size of the token table
by only parsing tokens as necessary and discarding those that are no
longer needed.
WCM : PROCEDURE ( RATE, START, FINISH )
DECLARE (COST, RATE, START, FINISH) FIXED BINARY
(31) STATIC ;
COST = RATE * (START –FINISH) + 2 * RATE * (START –
FINISH – 100);
RETURN (COST);
CLASS
CLASSES OF UNIFORM
SYMBOLS
IDENTIFIER (ID)
TERMINAL SYMBOL
(TRM)
LITERAL (LIT)
IDN
TRM
TRM
TRM
IDN
PTR
WCM
:
PROCEDURE
(
RATE
• Syntactic Phase- Once the program has been broken down into tokens
or uniform symbols, the compiler must
• Recognize the phrases (syntactic construction)
• Interpret the meaning of the construction.
The first of these steps is concerned solely with recognizing and
thus separating the basic syntactical constructs in the source program.
This process is called the syntax analysis
– The syntax analyzer groups tokens together into
syntactic structures. For example , the three tokens
representing A + B might be grouped into a
syntactic structure called an expression.
Expressions might further be complicated to form
statements .Often the syntactic structure can be
regarded as a tree whose leaves are the tokens.
The interior nodes of the tree represent strings of
token that logically belong together
WCM : PROCEDURE ( RATE, START, FINISH ) Valid procedure statement
DECLARE ( COST , RATE , START , FINISH ) FIXED
BINARY ( 31 ) STATIC ; Valid declare statement
COST = RATE * ( START – FINISH ) + 2 * RATE * (
START – FINISH – 100 ) ;
RETURN ( COST ) ;
Syntax analysis also notes syntactic errors and assures some sort of
recovery so that the compiler can continue to look for other
compilation errors. Once the syntax of the statement has been
ascertained, the second step is to interpret the meaning (semantics).
Associated with each syntactic construction is a defined meaning
(semantics). This may be in the form of an object code or an
intermediate form of the construction.
• Intermediate code generation -The third phase is called the
intermediate code generation phase. The intermediate code generator
uses the structure produced by the syntax analyzer to create a stream
of simple instructions. Many styles of intermediate code are possible.
One common style uses the instructions with one operator and a small
number of operands.
The intermediate form affords two advantages:
– It facilitates optimization of object code
– it allows a logical separation between the
machine dependent (code generation and
assembly) and machine independent phases(
lexical syntax, interpretation)
Using an intermediate form depends on the type of syntactic
construction e.g Arithmetic, nonarithmetic or nonexecutable
statements
• Arithmetic Statements- One intermediate form of an arithmetic statement is a parse
tree. The rules for converting an arithmetic statement into a parse tree are:
• Any variable is a terminal node of the tree
• For every operator, construct a binary tree whose left branch is the tree for operand
1 and whose right branch is the tree for operand 2
• A compiler may use as an intermediate form a linear representation of the parse tree
called matrix. In a matrix, operations of the program are listed sequentially in the
order they would be executed. Each matrix entry has one operator and two operands.
The operands are uniform symbols denoting wither variables, literals or other matrix
entries.
• Nonarithmetic statements- These statements can also be represented in the matrix
form. The statements Do, If can all be replaced by a sequential ordering of
individual matrix entries.
• Nonexecutable statements- Non executable statements like declare give the compiler
information that clarifies the referencing or allocation of variables and associated
storage. There is no intermediate form for these statements. Instead, the information
in the non executable statements is entered into tables to be used by other parts of
the compiler.
• COST = RATE * ( START – FINISH ) + 2 * RATE * ( START –
FINISH – 100 ) ;
•
=
cost
+
*
rate
*
start
finish
*
2
rate 100
start
finish
Operator
Operand1
Operand2
-
START
FINISH
*
RATE
M1
*
2
RATE
-
START
FINISH
-
M4
100
*
M3
M5
+
M2
M6
=
COST
M7
Matrix form of intermediate code for a statement
• Code optimization- Code optimization is an optional phase designed
to improve the intermediate code so that ultimate program runs faster
and/or takes less space. Its output is another intermediate code
program that does the same job as the original but perhaps in a way
that saves time and/or space.
• The final phase called the code generation produces the object code
by deciding on the memory locations for data, selecting code to access
each datum, and selecting the registers in which each computation is
to be done. Designing a code generator that produces truly efficient
object program is one of the most difficult parts of compiler design
both practically and theoretically.
• Table management- Table management or book keeping portion of
compiler keeps track of the names used by the program and records
essential information about each such as its type (integer,real etc). The
data structure used to record this information is called a symbol table.
• The error handler is invoked when a flaw in the source program is
detected. It must warn the programmer by issuing a diagnostic and
adjust the information being passed from phase to phase so that each
phase can proceed. It is desirable that compilation be completed on
flawed programs at least through the syntax analysis phase, so that as
many errors as possible can be detected in one compilation. Both the
table management and error handling routines interact with all phases
of the compiler
• The seven phase model of a compiler- In analyzing the compilation of
simple PL program, we have found seven distinct logical phases of
compilation . These are:
• Lexical analysis phase- recognition of basic elements and creation of
uniform symbols.
• Syntax analysis- recognition of basic syntactic constructs through
reductions.
• Interpretation- definition of exact meaning, creation of matrix and
tables by action routines
• Machine independent optimization- creation of more optimal matrix
• Storage assignment- modification of identifier and literal tables. It
makes entries in the matrix that allow code generation to create code
that allocates dynamic storage and that also allow the assembly phase
to reserve the proper amount of static storage
• Code generation- use of macro processor to produce more optimal
assembly code
• Assembly and output- resolving symbolic addresses(labels) and
generating machine language.
• The phases one through 4 are machine independent and language
dependent whereas phases 5 through 7 are machine dependent and
language independent. The various databases that are used by the
compiler and which form the lines of communication between various
phases are:
• Source code- Simple source program in high level language
• Uniform symbol table- Consists of a full or partial list of the tokens as
they appear in the program. Created by lexical analysis and used by
syntactic and interpretation phases.
• Terminal table- a permanent table which lists all keywords and special
symbols of the language in symbolic form
• Identifier table- contains all the variables in the program and
temporary storage and any information needed to reference or allocate
storage for them; created by lexical analysis, modified by
interpretation and storage allocation and referenced by code
generation and assembly. The table may also contain information of
all temporary locations that the compiler creates for use during
execution of the source program (e.g temporary matrix entries)
• Literal table- contains all constants in the program
• Reductions- Permanent table of decisions rules in the form of pattern
matching with the uniform symbol table to discover syntactic structure
• Matrix – intermediate form of the program which is created by the
action routines, optimized, and then used for code generation.
• Code production- permanent table of definitions. There is one entry
defining code for each possible matrix operator
• Assembly code- assembly language version of the program which is
created by the code generation phase and is input to the assembly
phase
• Relocatable object code- final output of the assembly phase, ready to
be used as input to the loader.
• Lexical analysis phase• TASKS• To parse the source program into basic elements or tokens of the
language
• To build a literal table and an identifier table
• To build a uniform symbol table
• DATABASES
• Source program
• Terminal table- a permanent database that has entry for each terminal
symbol (e.g arithmetic operators, keywords, non alphanumeric
symbols ). Each entry consists of the terminal symbol, an indication of
its classification (operator, break character) and its precedence (used
in later phases)
Symbol
Indicator
Precedence
• Literal table – created by lexical phase to describe all literals used in
the source program. There is one entry for each literal consisting of a
value, a number of attributes, an address denoting the location of the
literal at execution time (filled in by later phase). The attributes such
as data types or precisions can be deduced from the literal itself and
filled in by lexical analysis .
Literal
Base
Scale
Precision
Other
Address
information
• Identifier table- created by lexical analysis to describe all identifiers
used in the source program. There is one entry for each identifier.
Lexical analysis creates the entry and places the name of the identifier
into that entry. Later phases will fill in the data attributes and address
of each identifier
Name
Data attribute
Address
• Uniform Symbol table- created by lexical analysis phase to represent
the program as a string of tokens rather than of individual characters.
Each uniform symbol contains the identification of the table of which
token is a member and its index within that table
Table
Index
• Algorithm• The first task of the lexical analysis algorithm is to parse the input
character string into tokens.
• The second is to make appropriate entries in the table.
• The input string is separated into tokens by break characters. Break
characters are denoted by the contents of a special field in the
terminal table. Source characters are read in, checked for legality and
tested to see if they are break characters . Consecutive nonbreak
characters are accumulated into tokens. Strings between break
characters are tokens as are nonblank break characters. Blanks may or
may not serve as tokens.
• Lexical analysis recognizes three types of tokens:
– Terminal symbols
– Possible identifiers
– Literals
• It checks all tokens by first comparing them with the entries in the
terminal table. Once a match is found, the token is classified as a
terminal symbol and lexical analysis creates a uniform symbol of type
TRM, and inserts it in uniform symbol table. If a token is not a
terminal symbol, lexical analysis proceeds to classify it as a possible
identifier or literal. Those tokens that satisfy the lexical rules for
forming identifiers are classified as possible identifiers.
• After a token is classified as possible identifier, the identifier table is
examined. If this particular token is not in the table, a new entry is
made. Lexical analysis creates a symbol of the type IDN and inserts it
into uniform symbol table corresponding to the identifier.
• Numbers, quoted character strings and other self defining data
are classified as literals. After a token is classified as literal, its entry
is searched in the literal table. If it is not found, a new entry is made.
As lexical analysis can determine all the attributes of a literal by
looking at the characters that represent it, each new entry made in the
literal table consist of literal and its attributes (base, scale and
precision). A uniform symbol of type LIT is entered in the uniform
symbol table.
• Syntax phase- The function of the syntax phase is to recognize the major constructs
of the language and to call the appropriate action routines that will generate the
intermediate form or matrix for these constructs.
• There are many ways of operationally recognizing the basic constructs and
interpreting their meaning. One method is using reduction rules which specify the
syntax form of the source language. These reductions define the basic syntax
construction and appropriate compiler routine (action routines) to be executed when
a construction is recognized. The action routine interpret the meaning of the
constructions and generate either the code or an intermediate form of the
construction. The reductions are dependent upon the syntax of the source language.
• DATABASES
• Uniform symbol table- created by the lexical analysis phase and containing the
source program in the form of uniform symbols. It is used by the syntax and
interpretation phase as the source of input to the stack. Each symbol from UST
enters the stack only once.
• Stack – Stack is the collection of uniform symbols that is currently being worked on
by the syntax analysis and interpretation phases. Additions to or deletions from the
stack are done by phases that use it.
• Reductions- The syntax rules of the source language are contained in the reduction
table.
• The syntax analysis phase is an interpreter driven by the reductions the general form
of a reduction is
label: old top of stack/ action routine/ New top of stack/ Next reduction
•
•
•
Briefly about the various fields of a reduction rule:
Label – it is optional
Old top of stack- to be compared to top of stack. It can take one or a group of following
items
– Blank or null- always a match, regardless of what is on top of stack
– Non-blank- one or more items from the following categories
•
• <syntactic type> such as an identifier or literal- matches any uniform symbol
of this type
• <any>- matches a uniform symbol of any type
• Symbolic representation of a keyword, such as “PROCEDURE” or “IF” –
matches keyword
Action routine- to be called if old top of stack field matches current Top of stack
– Blank or null- no action routines called
– Name of the action routine- call the routine
•
New top of stack- Changes to be made to top of stack after action routine is executed
– Blank or null- no change
– “ - delete top of stack (pattern that has been matched)
– Syntactic type, keyword or stack item (Sm)- delete old top of stack and
replace with this item(s)
– * - get next uniform symbol from the uniform symbol table and put it on
top of stack
• Next reduction
– Blank or null- interpret the next sequential reduction
– n – interpret reduction n
• Algorithm
• Reductions are tested consecutively for match between old top of
stack and actual top of stack until match is found
• When match is found, the action routines specified in the action field
are executed in order from left to right
• When control returns to the syntax analyzer, it modifies the top of
stack to agree with the New top of stack field
• Step 1 is then repeated starting with the reduction specified in the next
reduction field.
•
Example:
/ / ***/
2: <idn> : PROCEDURE / bgn_proc / S1 *** / 4
<any> <any> <any> / ERROR / S2 S1 * /2
The above given reduction checks that the starting of a
program should be a label field
– Start by putting the first three uniform symbols on the stack
– Test to see if top three elements are <idn>: PROCEDURE.
– If they are, call the bgn_proc , delete the top of stack and put S1, i.e
initial top of stack on the top. Then enter next three token from UST.
– If the top three elements are not a label, Error routine is called and third
element on the stack is removed and one more token is added and again
checked for label
• Interpretation phase- The interpretation phase is typically a collection of routines that are
•
•
called when a construct is recognized in the syntactic phase. The purpose of these routines
(called the action routine) is to create an intermediate form of the source program and add
information to the identifier table.
The separation of interpretation phase from syntactic phase is a logical division. The former
phase recognizes the syntactic structure while the latter interprets the precise meaning into the
matrix or identifier table.
DATABASES
• Uniform symbol table
• Stack- contains token currently being parsed by the syntax ad
interpretation phase
• Identifier table- initialized by lexical analysis to completely describe
all identifiers used in the source program. The interpretation phase
enters all the attributes of the of the identifiers.
• Matrix- the primary intermediate form of the program. A simple form
of a matrix entry consists of a triplet where first element is a uniform
symbol denoting the terminal symbol or operator and other two
elements is a uniform symbol denoting the arguments. The code
generation phase will be driven by this matrix and will produce the
appropriate object code. There may also be chaining field that can be
utilized by the optimization phase to add or delete entries.
• Martix operands are uniform symbols of type IDN,LIT or TRM and a
fourth form MTX. A uniform symbol
MTX
•
n
denotes the result of the nth matrix entry and points to the
corresponding entry in the temporary storage area of the identifier
table line
Operator
Operand1
Operand2
-
START
FINISH
*
RATE
M1
*
2
RATE
-
START
FINISH
-
M4
100
*
M3
M5
+
M2
M6
=
COST
M7
Matrix form of intermediate code for a statement
• Temporary Storage table- (may be implemented as part of the
identifier table). The interpretation phase enters into the temporary
storage table all information about the associated values of MTX
symbols i.e the attributes of the temporary computations resulting
from matrix lines such a data types, precision, source statement in
which it is used etc.
• Algorithm
• Do any necessary additional parsing- this permits action routines to
add symbols to or delete them from the stack as they deem necessary
• Create new entries in the matrix or add data attributes to the identifier
table. In the former case the routines must be able to determine the
proper operator and operands and insert them into the matrix. In the
latter case, they must decide exactly what attributes have been
declared and put them into identifier table. In both these cases, the
complexity of the action routines will depend on how much has been
done by the reductions and vice versa.
Interaction of lexical, syntax and interpretation Phases
• Machine independent Optimization- This involves creation of more optimal matrix.
There are two types of optimizations: machine dependent and machine independent
optimization..
• Machine dependent optimization is so intimately related to the instructions that get
generated that it is incorporated into the code generation phase, whereas machine
independent optimization is done in a separate optimization phase.
• In deciding whether or not to incorporate a particular optimization in the compiler, it
is necessary to weigh the gains it would bring in increased efficiency of the compiler
against the increased cost in the compilation time and complexity
• DATABASES
• Matrix- this is the major database that is used in this phase. To facilitate the
elimination or insertion of entries into matrix , we add to each entry chaining
information, forward or backwards pointers. This avoids the necessity of reordering
and relocating matrix entries when an entry is added or deleted. The forward pointer
allows the code generation phase to go through the matrix in proper order. The
backward pointer allows backward sequencing through the matrix as may be needed
•
•
•
Identifier table- accessed to delete unneeded temporary storage and obtain information about the
identifiers.
Literal table- new literals that may be created by certain type of optimization.
Types of optimizations
– Elimination of common sub expressions-There can be some statements in the
program that are repetitive in nature. These can be eliminated. The common sub
expressions must be identical and must be in the same statement.
•
•
•
•
•
•
•
•
•
•
Example
Original expression
START FINISH
*
RATE
M1
*
2
RATE
START FINISH
M4
100
*
M3
M5
+
M2
M6
=
COST
M7
modified matrix
- START
FINISH
*
RATE
M1
*
2
RATE
*
*
=
M1
M3
M2
COST
100
M5
M6
M7
– Compile time compute- Doing computations involving constants at compile time
saves both the space and execution time for the object program. For example if
we had the statement
a= 2* 276 / 92 * B
The compiler could perform the multiplication and division so indicated and substitute 6 * B for the
original expression. This will enable it to delete two matrix entries.
– Boolean Expression optimization- We may use the properties of boolean
expressions to shorten their computation
– Move invariant computations outside of the
loop- If a computation within a loop depends on a
variable that does not change within that loop , the
computation may be moved outside the loop. This
involves reordering of a part of the matrix.
• Storage assignment- The purpose of this phase is to
– Assign storage to all variables referenced in the
source program
– Assign storage to all temporary locations that are
necessary for intermediate results
– Assign storage to literals
– Ensure that the storage is allocated and appropriate
locations are initialized
• DATABASES
• Identifier table- storage assignment designates locations to all
identifiers that denote data.
• Literal table- the storage assignment phase assigns all literal addresses
and places an entry in the matrix to denote that code generation should
allocate this storage.
• Matrix- storage assignment places entries into the matrix to ensure
that code generation creates enough storage for identifiers, and
literals.
• Code generation phase- The purpose of code generation phase is to
produce the appropriate code (assembly or machine language). The
code generation phase has matrix as input. It uses the code
productions which define the operators that may appear in the matrix
to produce code. It also references identifier tables or literal tables to
generate proper address and code conversion.
• DATABASES
• Matrix- each entry has its operator defined in the code pattern data
base
• Identifier table, literal table- are used to find the locations of variables
and literals
• Code productions (macro definitions)- a permanent data base defining
all possible matrix operators.
• Code generation phase is implemented in a way analogous to that used
for the assembler macro processor. The operation field of each matrix
line is treated as a macro call and matrix operands on the line are used
as macro arguments. Simple machine dependent optimizations are
performed during code generation.
• Assembly phase- The task of the assembly phase depends on how
much has been done in code generation phase. The main tasks that has
to be done in this phase are:
• Resolve label references
• Calculate addresses
• Generate binary machine instructions
• Generate storage, convert literals
Structure of compiler
• Passes of a compiler- Instead of viewing the compiler in terms of its seven logical
phases, we could have looked at it in terms of N physical passes that it must make
over its data bases.
• Pass 1 corresponds to the lexical analysis phase. It scans the program and creates the
identifier, literal and uniform symbol tables
• Pass2 corresponds to the syntactic and interpretation phases. Pass 2 scans the
uniform symbol table, produces the matrix and places information about identifiers
into the identifier table. Pass 1 and pass 2 can be combined into one by treating
lexical analysis as an action routine that would parse the source program and
transfer tokens to the stack as they were needed.
• Pass 3 through N-3 correspond to the optimization phase. Each separate type of
optimization may require several passes over the matrix
• Pass N-2 correspond to storage assignment phase. This is a pass over the identifier
and literal tables rather than the program itself.
• Pass N-1 correspond to the code generation phase. It scans the matrix and creates the
first version of object deck
• Pass N corresponds to the assembly phase. It resolves the symbolic addresses and
creates information for the loader
INTERPRETER
• Interpreters are the software processors that after analysing a
particular statement of the source program perform the action which
implement its meaning. Interpretation avoids the overheads of
compilation. This is an advantage during program development
because a program may be modified between every two executions.
However, interpretation is expensive in terms of CPU time, because
each statement is subjected to the interpretation cycle.
• Both compilers and interpreter analyse a source statement to
determine its meaning. During compilation, analysis of a statement is
followed by code generation while during interpretation it is followed
by actions which implement its meaning. Hence we could assume
•
tc= ti
• Where tc and ti are the time taken for compiling and interpreting one
statement
• te which is the execution time of compiler generated code for a
statement, can be several times less than the compilation time tc. Let
us assume
•
tc = 20 te
• Considering a program with sizep=200. For a specific data, let
program p execute as:
20 statements are executed for initialization purpose
10 iterations of a loop with 8 statements
20 statements for printing the result
Thus statements executed stm=20 + 10X 8 + 20=120
• Thus, Total execution time using the compilation model is
= 200.tc +120.te
= 206.tc
•
Total execution time for interpretation model
= 120.ti
=120.tc
•
Clearly, interpretation is beneficial in this case
• Use of interpreters- Use of interpreter is motivated by two reasons—
• Efficiency in certain environments and simplicity
• It is better to use interpretation for a program if program is modified
between executions
• It is also beneficial to use interpreter in program where executable
statements are less than the total size of the program stm < sizep
• Interpretation scheme- the interpreter consists of three main components:
• Symbol Table- the symbol table holds information concerning entities in the source
program
• Data store- the data store contains values of the data items declared in the program
being interpreted. The data store consists of a set of components {compi }. A
component compi is an array named namei containing elements of a distinct type
typei
• Data manipulation routines- A set of data manipulation routines exist. This set
contains a routine for every legal data manipulation action in the source language.
On analyzing a declaration statement, say a statement declaring an array alpha
of type typ, the interpreter locates a component compi of its data store, such that
typei=typ. Alpha is now mapped into a part of namei. The memory mapping for
alpha is remembered in its symbol table entry. An executable statement is analysed
to identify the actions which constitute its meaning. For each action, the interpreter
finds the appropriate data manipulation routine and invokes it with appropriate
parameters. For example, the meaning of statement
a:= b+c
where a,b,c are of same type can be implemented by executing the calls
add (b,c,result)
assign ( a, result)
in the interpreter. The interpreter procedure add is called with the symbol table
entries b and c. add analyses the type of b and c and decides the procedure that have
to be called to realize the addition
• Consider the basic problem
real a,b
integer c
let c=7
let b = 1.2
a= b +c
The symbol table for the program and memory allocation for variables
a, b and c. each symbol table entry contains information about the
type and memory allocation for a variable. The values ‘real’ and ‘8’ in
the symbol table entry of a indicate that a is allocated the word
ivar
rvar[8].
rvar
Symbol type address
a
Real
8
b
Real
13
c
int
5
8 a
13 b
5 c
• Pure and impure interpreter
• Pure interpreter- The source program is retained in the source form all
through its interpreter. This arrangement incurs substantial analysis
overheads while interpreting a statement.
Data
Source
program
Interpreter
Pure Interpreter
Results
• Impure interpreter- An impure interpreter performs some preliminary
processing of the source program to reduce the overheads during
interpretation. The preprocessor converts the program to an
intermediate representation (IR) which is used during interpretation.
This speeds up the interpretation as the code component of the IR, i.e
the IC, can be analyzed more efficiently than the source form of the
program. However, use of IR also implies that the entire program has
to be preprocessed after any modification. This involves fixed
overheads at the start of interpretation. The postfix notation is a
popular intermediate code for interpreters.
Data
Source
program
Preprocessor
IR
Interpreter
Impure Interpreter
Results
Editors
• Text editors-Editor is a computer program that allows users to create
and format a document. The document can be a computer program,
text, images, equations, graphics and tables etc. , that can be thought
of being typed and saved on the computer system and later printed on
a page. Text editors are the ones that are used for editing of textual
document. The text editors are of the following forms:
• Line Editors
• Stream Editors
• Screen Editors
• Word processors
• Structure Editors
• Line editors: The scope of edit operations of a line editor is limited to a line of text.
The line is designated positionally e.g by specifying its serial number in the text, or
contextually e.g by specifying a context which uniquely identifies it. The primary
advantage of line editors is their simplicity. A stream editor views the entire text as a
stream of characters. This permits edit operations to cross line boundaries. Stream
editors typically support character, line and context oriented commands based on the
current editing context indicated by the position of a text pointer. The pointer can be
manipulated using positioning or search commands. Line and stream editors
typically maintain multiple representations of text. One representation (the display
form) shows the text as a sequence of lines. The editor also maintains an internal
form which is used to perform the edit operations. The editor ensures that these
representations are compatible at every moment. The example of a line editor is
edlin editor of DOS and example of stream editor is sed editor in UNIX
• Screen Editors: A line or stream editor does not display the text in the manner it
would appear if printed. A screen editor uses the what-you-see-is-what-you-get
principle in editor design. The editor displays a screenful of text at a time. The user
can move the cursor over the screen, position it at the point where he desires to
perform an edit operation on the screen. This is very useful while formatting the text
to produce printed documents.
• Word Processors- Word processors are basically document editors
with additional features to produce well formatted hard copy output.
Essential features of word processors are commands for moving
sections of text from one place to another, merging of text and
searching and replacement of words. Many word processors support a
spell check option. With the advent of personal computers, word
processors have seen widespread use amongst authors, office
personnel and computer professionals. WordStar is a popular editor of
this class.
• Structure editors- A structure editor incorporates an awareness of the
structure of a document. This is useful in browsing through a
document e.g if a programmer wishes to edit a specific function in a
program file. The structure is specified by the user while creating or
modifying the document. Editing requirements are specified using the
structure. A special class of structure editors, called syntax directed
editors, are used in programming environments.
Contemporary editors support a combination of line, stream
and screen editing functions. This makes it hard to classify them into
the categories. The vi editor of Linux and editors in desktop
publishing systems are typical examples of these.
• The editing process- Generally speaking, an editor is a computer
program that allows the user to create, correct, format and revise a
computer document. The editing process basically performs four
different tasks:
• Select the part of the document that is to be edited
• Conclude how to format the part of the document and how to display
it.
• State and execute the operations that will modify the document
• Update the external view accordingly.
Thus the various processes can be classified as:
• Selection- The first thing that is to be done by the editor for a
document is to select the part of the document that is to be edited or
viewed. This is done by traveling through the document to locate the
area of interest. This selection is done on the basis of various
commands like select page, select paragraph etc
• Filtering- After selection is done, filtering extracts the selected part of
the document that is to be edited or viewed, from the whole document.
• Formatting- After the required portion of document is filtered then
formatting decides how the result of filtering will be viewed to the
user on the display device.
• Editing- This is the phase where actual editing is being done. User
now can view and edit the document. Here the document is created,
altered or edited as per the user specification. User can give
commands like insert, delete, copy , move, replace, cut etc.The editing
commands given by the user basically depends upon the type of editor
being used. For example, an editor of text document may work upon
the elements like character, line, paragraph etc. and an editor of
programming language may work upon elements like variables,
identifiers, keywords etc.
• Design of a text editor- All of the text editors have the same structure
regardless of the features that are provided by the editor and the
architecture of machine on which they are implemented. The main
components of a text editor are:
• User command processor- It accepts the input given by the user ,
analyzes the tokens and checks the structure of the command given by
the user. In other words, its function is similar to lexical analysis and
syntax analysis function of the compiler. As a compiler, an editor also
has the routines to check the semantic of the input. In the text editor,
these semantic routines perform the functions of viewing and editing.
The semantic routines called by the editor perform the functions of
editing, viewing, traveling and displaying. Although the user always
specify the editing operation, the viewing and traveling operations are
either done implicitly by the editor or explicitly done by the user.
• Editing module- Editing module is the collection of all sub modules
that help in editing functions. While editing a document, the starting
point of the document that is to be edited is stored by the current
editing pointer (cep) included in the editing module. The value of cep
can be set or reset by the user with the help of traveling command like
next line, next paragraph etc. or implicitly by the system as a result of
some other or previous editing operation.
As the user issues the editing command, the editing
component invokes the editing filter. This editing filter filters the
document and creates an editing buffer based on the current editing
pointer and on the parameters of the editing filter. The editing filter
parameters are given by the user or the system, and specify the range
of the text that will be affected by the editing operation. Editing buffer
thus have that portion of the document that is to be edited.
Filtering either can consist of the selected contiguous characters
beginning at the current pointer or it may depend on more complex
user specifications about the content and structure of the document.
• Viewing module- Viewing module is the collection of all submodules
that help in determining the next view of the output device.Viewing
module maintains a pointer called current viewing pointer (cvp) which
stores the starting point of the document area to be viewed. The value
of the cvp can be set or reset by the user with the help of traveling
commands like next line, next paragraph, next screen etc. or
implicitly by the system as a result of some other or previous editing
operation.
When display needs to be updated the viewing module invokes
the viewing filter. This viewing filter filters the document and creates
a viewing buffer based on the current viewing pointer and on the
parameters of the viewing filter. The viewing filter parameters are
given by the user or system, and specify the information about the
number of characters needed to be displayed and how to select them
from the document.
• Traveling module- It basically performs the settings of the current
editing pointer and current viewing pointer, on the basis of the
traveling commands. Thus it determines the point at which viewing
and editing filters begin.
• Display module- After all processing of editing and viewing, the
viewing buffer is passed to the display module, which produces the
display by mapping the viewing buffer onto the rectangular subpart of
the screen known as a window.
Apart from the fundamental editing functions, most editors
support an undo function to nullify one or more of the previous edit
operations performed by the user. The undo function can be
implemented by storing a stack of previous views or by devising an
inverse for each edit operation. Multilevel undo commands pose
obvious difficulties in implementing overlapping edits
User command
User Command
processor
Editing Module
Traveling Module
Output Devices
Viewing Module
Editing Buffer
Editing filter
Viewing Buffer
viewing Filter
Main Memory
Display Module
• Linker- Linking is the process where program is linked with other
programs, data or libraries that are necessary for the successful
execution of the program. Thus, linking is the process of binding of
external reference to actual link time address. The statements used for
this purpose are:
• ENTRY- This statement lists the public definitions that are defined in
the program. Public definitions in the list are the symbols defined in
the program which may be used or referenced in other program
• EXTERN-This statement indicates the list of all external references
made in the program. External references list all the references made
to a symbol, which is not defined in the program. The symbols may be
defined in some other program.
• Design of a linker- The linker invocation command has the following
format:
•
LINKER <link origin>,<object module names>,[<execution start
address>]
• To form a binary program from a set of object modules, the
programmer invokes the linker command. <link origin> specifies the
memory address to be given to the first word of the binary program.
<execution start address> is usually a pair (program unit name, offset
in program unit). The linker converts this into the linked start address.
This is stored along with the binary program for use when the
program is to be executed. If specification of <execution start
address> is omitted the execution start address is assumed to be the
same as the linked origin.
• Linker converts the object modules in the set of program units SP into
a binary program . Since we have assumed that link address = load
address, the loader simply loads the binary program into the
appropriate area of memory for the purpose of execution
• The object module of a program contains all information necessary to relocate and
link the program with other programs. The object module of a program P consists of
4 components:
• Header- The header contains translated origin, size and execution start address of P
• Program: This component contains the machine language program corresponding to
P
• Relocation table: (RELOCTAB) This table describes IRRp. Each RELOCTAB
entry contains a single field
Translated address: Translated address of an address sensitive instruction
• Linking Table (LINKTAB)- This table contains information concerning the public
definitions and external references in P
Each LINKTAB contains three fields:
– Symbol: symbolic name
– Type: PD/EXT indicating whether public definition or external reference
– Translated address: For a public definition, this is the address of the first
memory word allocated to the symbol.
For an external reference, it is the address of the memory word which is
required to contain the address of the symbol
• The linker in the task of making an executable form from various
object modules performs two important functions.
• Relocation of address sensitive instructions
• Linking involves the resolution of external references in various
object modules linked in a program
•
•
•
•
The relocation done by the linker follows the following steps:
Relocation algorithmStep 1: program_linked_origin:=<link origin> from linker command
Step 2: For each object module
– Set t_origin:=translated origin of the object module
Set OM_size:=size of the object module
– Set Relocation_factor:=program_linked_origin – t_origin
– Read the machine language program in work_area
– Read RELOCTAB of the object module
– For each entry in RELOCTAB
• Set Translated_addr:= address in the RELOCTAB entry
• Set Address_in_work_area:=address of work_area +
translated_address – t_origin
• Add relocation_factor to the operand address in the word with
the address address_in_work_area
– Program_linked_origin := program_link_origin + OM_size
• Linking algorithm- The linker processes all object modules being linked and builds a
table of all public definitions and their load time addresses. Linking is thus simply a
matter of searching for a particular symbol in this table and copying its linked
address into the word containing the external reference.
• A name table (NTAB) is defined for use in program linking. Each entry of the table
contains the following fields:
• Symbol: symbolic name of an external reference or an object module
• Linked address: For a public definition, this field contains linked address of the
symbol. For an object module, it contains the linked origin of the object module
• Most information in the NTAB is obtained from the LINKTAB. This table contains
information concerning the public definitions and external references in program
• Each LINKTAB contains three fields:
– Symbol: symbolic name
– Type: PD/EXT indicating whether public definition or external reference
– Translated address: For a public definition, this is the address of the first
memory word allocated to the symbol.
For an external reference, it is the address of the memory word which is
required to contain the address of the symbol.
• Algorithm:( program linking)
• Step 1: program_linked_origin:=<link_origin> from the linker
command
• Step 2: For each object module
– Set t_origin:=translated origin of the object module
Set OM_size:=size of the object module
– Set Relocation_factor:=program_linked_origin – t_origin
– Read the machine language program in work_area
– Read LINKTAB of the object module
– For each LINKTAB entry with the type=PD
• Set name:=symbol
• Set Linked_addres:=translated_address + relocation_factor
• Enter(name, linked_addres) in NTAB
– Enter (object module name, program_linked_origin) in NTAB
– Set program_linked_origin:=program_linked_origin + OM_size
• For each object module
– T_origin:=translated origin of object module
program_linked_origin:=load_address from NTAB
– For each LINKTAB entry with type=EXT
• Address_in_work_area:=address of work_area +
program_linked_origin –
<link origin> +translated address –
t_origin
• Search symbol in NTAB and copy its linked address. Add the
linked address to the operand address in the word with the
address address_in_work_area
LAST YEAR QUESTION
PAPERS
• Bootstrap loader- In computing, booting (booting up) is a bootstrapping process that
starts operating systems when the user turns on a computer system. A boot sequence
is the initial set of operations that the computer performs when power is switched
on. The bootstrap loader typically loads the main operating system for the computer.
• A computer's central processor can only execute program code found in Read-Only
Memory (ROM) and Random Access Memory (RAM). Modern operating systems
and application program code and data are stored on nonvolatile data storage
devices, such as hard disc drives, CD, DVD, USB flash drive, and floppy disk.
When a computer is first powered on, it does not have an operating system in ROM
or RAM. The computer must initially execute a small program stored in ROM along
with the bare minimum of data needed to access the nonvolatile devices from which
the operating system programs and data are loaded into RAM.
• The small program that starts this sequence of loading into RAM, is known as a
bootstrap loader, bootstrap or boot loader. This small boot loader program's only job
is to load other data and programs which are then executed from RAM. Often,
multiple-stage boot loaders are used, during which several programs of increasing
complexity sequentially load one after the other in a process of chain loading.
• Bootstrap Loader- When the computer is first turned on, at that time
the machine is empty, without any program in memory. Then it is
required to answer the question that who will load the loader into the
memory, which loads the other programs. Here, we can say that the
loader is loaded by the operating system but who will load the
operating system. Since the machine is idle, there is no need of
program relocation. The only function required is of loading. We can
specify the absolute address of the program that is to be loaded first in
memory. Normally, this program is the operating system, which
occupies the predefined location in the memory. This is basically the
function of an absolute loader, which loads the program to the
absolute address without relocation.
• In some computers, the absolute loader is permanently stored in a read only memory
(ROM). When the system is switched on, the machine starts executing this ROM
program and it loads the operating system. On some computers, this absolute loader
program is executed directly from ROM and in some other systems it is copied from
ROM to main memory and executed there. It is better to execute it in the main
memory, as it can be very inconvenient to change ROM program if some
modification are required in the absolute loader.
• A better solution is to have a small ROM program that loads a fixed length record or
instruction from some device into main memory at a fixed location. Immediately
after loading is complete, the control is transferred to the address of main memory
where the record is stored. This record contains the machine instruction that loads
the absolute program that follows. These first records loaded in memory are referred
to as bootstrap loader. Bootstrap loader is a special type of absolute loader, which is
executed first when the system is powered on. This bootstrap loader loads the first
program ot be run by the computer---mostly on operating system. Such a loader can
be added to the beginning of all the object programs that are to be loaded into an
empty machine. Such programs are like operating systems, or standalone programs
that are to be run without operating system.
Problem: Give a flow chart for pass 1 of a two pass assembler scheme
Generate IC (DL, code)
Generate IC (IS, code)
EQU
Evaluate Operand
field along with
address
Flowchart of pass 1 of the assembler
Flowchart of pass 2 of the assembler
• USING pseudo op- USING is a pesudo op that indicates to the
assembler which general register to use as a base register and what its
contents will be. This is necessary because no special registers are set
aside for addressing thus the programmer must inform the assembler
which register(s) to use and how to use them. Since addresses are
relative, he can indicate to the assembler the address contained in the
base register. The assembler is thus able to produce the machine code
with the correct base register and offset
• BALR is an instruction to the computer to load a register with the next
address and branch to the address in the second field. When the
second operand is register 0, execution proceeds with the next
instruction
• Difference between a macro and a subroutine- The basic difference
between a macro and a subroutine is that a macro call is an instruction
to the assembler to replace the macro name with the macro body. A
subroutine call is a machine instruction that is inserted into the object
program and that will later be executed to call the subroutine.
• A call to a macro leads to its expansion whereas calling a subroutine
leads to its execution
• Use of macros considerably increase the size of the program
• Macros do not affect the execution speed on the other hand frequent
calls to the subroutines are said to affect the execution efficiency of
the programs.
• Briefly discuss what modifications must be made to the macro
processor implementation, if labels are allowed in the macro
definition
• Generation of unique labels- Advanced macro processors allow
assembly language programmers to define and use labels in their
programs like the use of labels such as goto in high level
programming languages. However, allowing labels in macro
definitions is not an easy task for macroprocessor designers, since we
have to deal with the problem of duplicate label definition when the
program is assembled. Therefore, most of the macroprocessors do not
allow the use of labels in macro definitions and use relative addressing
instead to jump from one statement to the other. For long jumps, using
relative addressing is error –prone and difficult to read. Therefore,
there must be some way to deal with duplicate definition problem so
that the labels within the macros are correctly assembled after
expansion.
• Example
•
MACRO
TESTLABEL
CLEAR
X
$LOOP
COMP
AREG
BREG
JEQ
$LOOP
COMP
AREG
CREG
JLT
$LOOP
MEND
Now suppose the assembly program contains the following two calls
to the above macro
100)
TESTLABEL
.
.
200)
TESTLABEL
•
Now after expansion, the resulting assembly code would look like this
100a) CLEAR
X
100b) $LOOP
COMP
AREG
BREG
100c) JEQ
$LOOP
100d) COMP
AREG
CREG
100e) JLT
$LOOP
.
.
.
200a) CLEAR
X
200b) $LOOP
COMP
AREG
BREG
200c) JEQ
$LOOP
200d) COMP
AREG
CREG
200e) JLT
$LOOP
As shown in the lines 100b) and 200b) the same label $LOOP is defined twice in the resulting
assembly program (after macro expansion) leading to error by the assembler
• A common solution to this problem of duplicate label definitions has
to be formulated. It ensures that no matter how many times a macro
is called, the macro processor always generates unique labels during
macro expansion.
• Each time a macro is expanded , the character $ preceding the label
is replaced by the string $xx where xx is a two character alphanumeric counter of the no. of macro instructions expanded xx will
have values AA,AB, AC, AD……….A9, BA, BB, BC, BD….B9
and so on. This provides 36 X 36 (=1296) unique labels. When
generation of unique labels facility is incorporated, the example
macro TESTLABEL would be expanded like this:
100a) CLEAR
X
100b) $AALOOP
COMP
AREG
100c) JEQ
$AALOOP
100d) COMP
AREG
CREG
100e) JLT
$AALOOP
.
.
.
200a) CLEAR
X
200b) $ABLOOP
COMP
AREG
200c) JEQ
$ABLOOP
200d) COMP
AREG
CREG
200e) JLT
$ABLOOP
BREG
BREG
Create a copy
of source
program with all
macro definitions
removed
Pass 1 –processing macro definition
Pass 2– processing macro calls and expansion
•
•
Argument List Array- Pass 1 of macro processor creates a data structure Aragument List
Array which is used to replace the formal arguments with actual arguments upon macro
expansion in pass 2. during pass 1, ALA stores the positional indices of formal parameters.
The index 0 is reserved for the label name, if any, in macro definition. For example the ALA
for the macro definition
INCR &REG,&VAL,&ADDR is as given below:
Index
Argument
0
1
2
3
NULL
#1
#2
#3
This ALA is again referred in pass 2 and is edited to fill with the actual arguments,
for all formal parameter entries, given in the macro call. For example, a macro call
to above given macro as
INCR A,50,2500H
will be given as
Index
Argument
0
1
2
3
NULL
A
50
2500H
• Single processor macro design- In a single pass macro processor, all
macro processing is done in a single pass. For processing macro in a
single pass, a restriction is imposed on programmers that macro
definition must appear before calling a macro. The single pass
macroprocessor design has an additional advantage that it can easily
process macro definitions within another macro definition. The inner
macro definition is encountered when a call to outer is being expanded
• A one pass macro processor uses two additional variables namely
Macro Definition Input (MDI) indicator and Macro definition Level
counter (MDLC). Since a single pass macroprocessor has to handle
the macro definition and macro call simultaneously, MDI is used to
indicate the status. When MDI contains a Boolean value ON, it
indicates the macroprocessor is currently expanding the macro call.
Otherwise MDI contains a Boolean value OFF. MDLC ensures that
the entire macro definition is stored in MDT. When it is zero, it
indicates that all nested macro definitions have been handled. MDLC
variable calculates the difference between number of MACRO and
number of MEND directives in a macro definition. Other data
structures used in a one pass macro processor are same as used in a
two pass macroprocessor.
Simple one pass macro processor
Problem: Inorder to process a macro in a single pass, we had to restrict the macro
language. Describe the restrictions and the limitations that it imposes on
program organisation
• In a single pass macro processor, all macro processing is done in a single pass. For
processing macro in a single pass, a restriction is imposed on programmers that
macro definition must appear before calling a macro. The single pass
macroprocessor design has an additional advantage that it can easily process macro
definitions within another macro definition. The inner macro definition is
encountered when a call to outer is being expanded
• A one pass macro processor uses two additional variables namely Macro Definition
Input (MDI) indicator and Macro definition Level counter (MDLC). Since a single
pass macroprocessor has to handle the macro definition and macro call
simultaneously, MDI is used to indicate the status. When MDI contains a Boolean
value ON, it indicates the macroprocessor is currently expanding the macro call.
Otherwise MDI contains a Boolean value OFF. MDLC ensures that the entire macro
definition is stored in MDT. When it is zero, it indicates that all nested macro
definitions have been handled. MDLC variable calculates the difference between
number of MACRO and number of MEND directives in a macro definition. Other
data structures used in a one pass macro processor are same as used in a two pass
macroprocessor.
• Problem: Write a macro that moves 8 numbers from first 8 positions
of an array specified as the first operand into the first 8 positions of
an array specified as the second operand.
• Solution:
MACRO
COPYTO &X,&Y
LCL
&M
&M
SET
0
.MORE MOVER
AREG,&X+&M
MOVEM AREG,&Y+&M
&M
SET
&M+1
AIF
(&M NE 7) .MORE
MEND
Download