System Programming

Chih-Hung Wang
Chapter 2: Assembler (Full)
References
Leland L. Beck, System Software: An Introduction to
Systems Programming (3rd), Addison-Wesley, 1997.
1
Role of Assembler
Source Program → Assembler → Object Code → Linker → Executable Code → Loader
2
Chapter 2 -- Outline
 Basic Assembler Functions
 Machine-dependent Assembler Features
 Machine-independent Assembler Features
 Assembler Design Options
3
Introduction to Assemblers
 Fundamental functions
 Translating mnemonic operation codes to their machine language
equivalents
 Assigning machine addresses to symbolic labels
 Machine dependency
 Different machine instruction formats and codes
4
Example Program (Fig. 2.1)
 Purpose
 Reads records from input device (code F1)
 Copies them to output device (code 05)
 At the end of the file, writes EOF on the output device, then RSUB
to the operating system
 Program (See Fig. 2.1)
5
SIC Assembly Program (Fig. 2.1)
[Figure 2.1 column annotations: line numbers (for reference), address labels, mnemonic opcodes, operands, comments]
6
SIC Assembly Program (Fig. 2.1)
[Figure 2.1 annotations: markers that indicate comment lines; index addressing (operand written as m,X)]
7
SIC Assembly Program (Fig. 2.1)
8
Example Program (Fig. 2.1)
 Data transfer (RD, WD)
 a buffer is used to store record
 buffering is necessary for different I/O rates
 the end of each record is marked with a null character (hexadecimal 00)
 the end of the file is indicated by a zero-length record
 Subroutines (JSUB, RSUB)
 RDREC, WRREC
 save the link register (L) before making a nested subroutine jump
9
Assembler Directives
 Pseudo-Instructions
 Not translated into machine instructions
 Providing information to the assembler
 Basic assembler directives
 START :
 Specify name and starting address for the program
 END :
 Indicate the end of the source program, and (optionally) the first executable
instruction in the program.
 BYTE :
 Generate character or hexadecimal constant, occupying as many bytes as needed
to represent the constant.
 WORD :
 Generate one-word integer constant
 RESB :
 Reserve the indicated number of bytes for a data area
 RESW :
 Reserve the indicated number of words for a data area
10
Object Program
 Header record
Col. 1       H
Col. 2-7     Program name
Col. 8-13    Starting address (hex)
Col. 14-19   Length of object program in bytes (hex)
 Text record
Col. 1       T
Col. 2-7     Starting address in this record (hex)
Col. 8-9     Length of object code in this record in bytes (hex)
Col. 10-69   Object code (60 columns: (69-10+1)/6 = 10 instructions)
 End record
Col. 1       E
Col. 2-7     Address of first executable instruction (hex)
             (END program_name)
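The record layouts above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names are hypothetical, the program length 107A and the second and third object codes in the usage note are illustrative values, and error handling is minimal.

```python
def header_record(name, start, length):
    # Col. 1 'H', Col. 2-7 program name (padded to 6), Col. 8-13 start, Col. 14-19 length
    return "H{:<6}{:06X}{:06X}".format(name[:6], start, length)


def text_record(start, object_codes):
    # Col. 1 'T', Col. 2-7 start address, Col. 8-9 length in bytes, Col. 10-69 object code
    body = "".join(object_codes)
    if len(body) > 60:
        raise ValueError("a text record holds at most 60 hex columns (30 bytes)")
    return "T{:06X}{:02X}{}".format(start, len(body) // 2, body)


def end_record(first_executable):
    # Col. 1 'E', Col. 2-7 address of the first executable instruction
    return "E{:06X}".format(first_executable)
```

For example, `text_record(0x1000, ["141033", "482039", "001036"])` packs three 3-byte instructions into one record with length byte 09.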
11
Fig. 2.3 (Object Program)
1033-2038: Storage reserved by the loader
12
Assembler Tasks
 The translation of a source program to object code requires us
to accomplish the following functions:
 Convert mnemonic operation codes to their machine language
equivalents (e.g. translate STL to 14 - Line 10)
 Convert symbolic operands to their equivalent machine addresses
(e.g. translate RETADR to 1033 - Line 10)
 Build machine instructions in the proper format
 Convert the data constants specified in the source program into
their internal machine representations (e.g. translate EOF to
454F46 - Line 80)
 Write the object program and the assembly listing
13
Example of Instruction
Assemble STCH BUFFER,X → 549039
  opcode (8 bits):     (54)₁₆
  x (1 bit):           1
  address m (15 bits): (001)₂ followed by (039)₁₆, i.e. (1039)₁₆
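The bit layout of this example (8-bit opcode, 1-bit x flag, 15-bit address) can be checked with a short Python sketch. The function name is hypothetical and the code only covers the standard SIC format.

```python
def assemble_sic(opcode, address, indexed=False):
    """Pack a standard SIC instruction: 8-bit opcode, x bit, 15-bit address."""
    if not 0 <= opcode <= 0xFF:
        raise ValueError("opcode must fit in 8 bits")
    if not 0 <= address < (1 << 15):
        raise ValueError("address must fit in 15 bits")
    x = 1 if indexed else 0
    word = (opcode << 16) | (x << 15) | address
    return "{:06X}".format(word)
```

With STCH = 54 and BUFFER = 1039, `assemble_sic(0x54, 0x1039, indexed=True)` reproduces the object code 549039 from the slide.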
 Forward reference
14
Forward Reference
 A reference to a label (RETADR) that is defined later in the
program
 Solution
 Two passes
 First pass: does little more than scan the source program for label
definition and assign addresses (such as those in the Loc column in Fig.
2.2).
 Second pass: performs most of the actual instruction translation
previously described.
15
Difficulties: Forward Reference
 Forward reference: reference to a label that is defined later in
the program.
Loc   Label   Operator  Operand
1000  FIRST   STL       RETADR
1003  CLOOP   JSUB      RDREC
…     …       …         …
1012          J         CLOOP
…     …       …         …
1033  RETADR  RESW      1
16
Two Pass SIC Assembler
 Pass 1 (define symbols)
 Assign addresses to all statements in the program
 Save the addresses assigned to all labels for use in Pass 2
 Perform assembler directives, including those for address
assignment, such as BYTE and RESW
 Pass 2 (assemble instructions and generate object
program)
 Assemble instructions (generate opcode and look up
addresses)
 Generate data values defined by BYTE, WORD
 Perform processing of assembler directives not done during
Pass 1
 Write the object program and the assembly listing
17
Two Pass SIC Assembler
 Read from input line
 LABEL, OPCODE, OPERAND
[Diagram: Source program → Pass 1 → Intermediate file → Pass 2 → Object codes. Pass 1 uses OPTAB and builds SYMTAB; Pass 2 uses SYMTAB.]
18
Assembler Data Structures
 Operation Code Table (OPTAB)
 Symbol Table (SYMTAB)
 Location Counter (LOCCTR)
[Diagram: Source → Pass 1 → Intermediate file → Pass 2 → Object Program, with OPTAB, LOCCTR, and SYMTAB shared between the passes.]
19
Location Counter (LOCCTR)
 A variable that is used to help in the assignment of addresses,
i.e., LOCCTR gives the address of the associated label.
 LOCCTR is initialized to be the beginning address specified in
the START statement.
 After each source statement is processed during pass 1, the
length of assembled instruction or data area to be generated
is added to LOCCTR.
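The LOCCTR bookkeeping described above can be sketched as a small Pass 1 loop in Python. This is a simplified illustration, not the textbook algorithm: statements are assumed to be pre-split into (label, opcode, operand) tuples, OPTAB holds only a few opcodes, and the sample program is a shortened, hypothetical excerpt.

```python
OPTAB = {"STL": 0x14, "JSUB": 0x48, "J": 0x3C, "LDA": 0x00, "RSUB": 0x4C}


def statement_length(opcode, operand):
    """Number of bytes a statement occupies in the object program."""
    if opcode in OPTAB or opcode == "WORD":
        return 3                      # every SIC instruction and WORD is 3 bytes
    if opcode == "RESW":
        return 3 * int(operand)
    if opcode == "RESB":
        return int(operand)
    if opcode == "BYTE":              # C'EOF' -> 3 bytes, X'F1' -> 1 byte
        kind, text = operand[0], operand[2:-1]
        return len(text) if kind == "C" else len(text) // 2
    raise ValueError("invalid operation code: " + opcode)


def pass1(statements):
    """Return (SYMTAB, program length) for a list of (label, opcode, operand)."""
    symtab = {}
    locctr = start = 0
    for label, opcode, operand in statements:
        if opcode == "START":
            locctr = start = int(operand, 16)   # initialize LOCCTR
        if label:
            if label in symtab:
                raise ValueError("duplicate symbol: " + label)
            symtab[label] = locctr              # label gets the current LOCCTR
        if opcode == "END":
            break
        if opcode != "START":
            locctr += statement_length(opcode, operand)
    return symtab, locctr - start


program = [
    ("COPY", "START", "1000"),
    ("FIRST", "STL", "RETADR"),
    ("CLOOP", "JSUB", "RDREC"),
    ("", "J", "CLOOP"),
    ("EOF", "BYTE", "C'EOF'"),
    ("RETADR", "RESW", "1"),
    ("BUFFER", "RESB", "4096"),
    ("", "END", "FIRST"),
]
symtab, length = pass1(program)
```

Note that the forward references to RETADR and RDREC cause no trouble here, because Pass 1 never looks at operand symbols.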
20
Operation Code Table (OPTAB)
 Contents:
 Mnemonic operation codes (as the keys)
 Machine language equivalents
 Instruction format and length
 Note: SIC/XE has instructions of different lengths
 During pass 1:
 Validate operation codes
 Find the instruction length to increase LOCCTR
 During pass 2:
 Determine the instruction format
 Translate the operation codes to their machine language equivalents
 Implementation: a static hash table (entries are not normally
added to or deleted from it)
 Hash table organization is particularly appropriate
21
SYMTAB
 Contents:
 Label name
 Label address
 Flags (to indicate error conditions)
 Data type or length
Symbol  Address
COPY    1000
FIRST   1000
CLOOP   1003
ENDFIL  1015
EOF     1024
THREE   102D
ZERO    1030
RETADR  1033
LENGTH  1036
BUFFER  1039
RDREC   2039
 During pass 1:
 Store label name and assigned address (from LOCCTR) in
SYMTAB
 During pass 2:
 Symbols used as operands are looked up in SYMTAB
 Implementation:
 a dynamic hash table for efficient insertion and retrieval
 Should perform well with non-random keys (LOOP1, LOOP2).
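As a rough illustration of the SYMTAB operations above, the sketch below uses a Python dict (itself a dynamic hash table) and records a duplicate-definition flag. The class and field names are hypothetical.

```python
class SymTab:
    def __init__(self):
        self._entries = {}            # label name -> {"address", "duplicate"}

    def define(self, name, address):
        """Pass 1: store a label with the current LOCCTR value."""
        entry = self._entries.get(name)
        if entry is not None:
            entry["duplicate"] = True  # error flag: label defined twice
            return False
        self._entries[name] = {"address": address, "duplicate": False}
        return True

    def lookup(self, name):
        """Pass 2: return the address of an operand symbol, or None if undefined."""
        entry = self._entries.get(name)
        return None if entry is None else entry["address"]

    def has_error(self, name):
        return self._entries[name]["duplicate"]
```

A second `define` of the same label keeps the first address and only sets the flag, so Pass 2 can still report the error against a usable entry.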
22
Fig. 2.2 (1) Program with Object code
23
Fig. 2.2 (2) Program with Object code
24
Fig. 2.2 (3) Program with Object code
25
Figure 2.1 (Pseudo code Pass 1)
26
Figure 2.1 (Pseudo code Pass 1)
27
Figure 2.1 (Pseudo code Pass 2)
28
Figure 2.1 (Pseudo code Pass 2)
29
SIC/XE Assembly Program
[Figure annotations: extended format, immediate addressing, indirect addressing]
30
SIC/XE Assembly Program
31
SIC/XE Assembly Program
32
Benefits of SIC/XE Addressing Modes
 Register-to-register instructions
 Shorter than register-to-memory instructions
 No memory reference
 Immediate addressing mode
 No memory reference. The operand is already
present as part of the instruction
 Indirect addressing mode
 Avoids the need for another instruction
 Relative addressing mode
 Shorter than the extended instruction format
 Easy program relocation
33
Considering Instruction Formats
 START directive specifies a beginning program address of 0: a
relocatable program.
 Register-to-register instructions: simply convert the
register mnemonic names to their numeric equivalents
 OPTAB: for opcodes
 SYMTAB: preloaded with register names and their values
34
Line 150: COMPR A,S → 1010 0000 0000 0100 → A004
Line 125: CLEAR X   → 1011 0100 0001 0000 → B410
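The two register-to-register examples can be reproduced with a small Python sketch. The register numbers follow the SIC/XE definition (A=0, X=1, L=2, B=3, S=4, T=5, F=6, PC=8, SW=9); the function name is hypothetical, and a missing second register is assumed to be encoded as 0.

```python
# Register names "preloaded" with their numeric values, as SYMTAB would hold them
REGISTERS = {"A": 0, "X": 1, "L": 2, "B": 3, "S": 4, "T": 5, "F": 6, "PC": 8, "SW": 9}


def assemble_format2(opcode, r1, r2=None):
    """Format 2: 8-bit opcode followed by two 4-bit register numbers."""
    n1 = REGISTERS[r1]
    n2 = REGISTERS[r2] if r2 is not None else 0
    return "{:02X}{:X}{:X}".format(opcode, n1, n2)
```

With COMPR = A0 and CLEAR = B4, `assemble_format2(0xA0, "A", "S")` and `assemble_format2(0xB4, "X")` give the object codes shown on the slide.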
35
Considering Addressing Modes
 PC or base relative addressing
 Calculate displacement
 Displacement must be small enough to fit in the 12-bit field
(-2048..2047 for PC relative mode, 0..4095 for base relative mode)
 Extended instruction format (4-byte)
 20-bit field for direct addressing
36
How Assembler Recognizes the Addressing Mode
 Extended format:       +op m
 Indirect addressing:   op @m
 Immediate addressing:  op #c
 Index addressing:      op m,X
 Relative addressing:   op m
 1st choice: PC relative (arbitrarily chosen)
 2nd choice: base relative (if displacement is invalid in
PC relative mode)
 3rd choice: error message (if displacement is invalid in
both relative modes)
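The three-step choice above (PC relative first, then base relative, then an error) can be sketched in Python together with the format 3 bit packing. Function names and parameters are hypothetical; `pc` is the address of the next instruction and `base` is the value announced by a BASE directive, if any.

```python
def choose_displacement(target, pc, base=None):
    """Return (mode, 12-bit displacement) following the order on the slide."""
    disp = target - pc
    if -2048 <= disp <= 2047:
        return "pc", disp & 0xFFF          # 12-bit two's complement
    if base is not None:
        disp = target - base
        if 0 <= disp <= 4095:
            return "base", disp
    raise ValueError("displacement invalid in both relative modes")


def assemble_format3(opcode, target, pc, base=None, n=1, i=1, x=0):
    """Pack opcode+n+i, the x b p e nibble (e = 0), and the displacement."""
    mode, disp = choose_displacement(target, pc, base)
    b, p = (0, 1) if mode == "pc" else (1, 0)
    first_byte = opcode | (n << 1) | i
    xbpe = (x << 3) | (b << 2) | (p << 1)
    return "{:02X}{:X}{:03X}".format(first_byte, xbpe, disp)
```

For instance, STL RETADR at 0000 (target 0030, next instruction at 0003) assembles to 17202D, and STCH BUFFER,X at 104E falls back to base relative with B = 0033, giving 57C003.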
37
SIC/XE Assembly with Object Code
38
SIC/XE Assembly with Object Code
39
SIC/XE Assembly with Object Code
40
Immediate Addressing Mode
Line 55: 0020 LDA #3 → 010003
  opcode (00)₁₆ with n i = 0 1 → first byte (01)₁₆
  x b p e = 0 0 0 0 → (0)₁₆
  disp = (003)₁₆
Line 133: 103C +LDT #4096 → 75101000
  opcode (74)₁₆ with n i = 0 1 → first byte (75)₁₆
  x b p e = 0 0 0 1 → (1)₁₆
  address = (01000)₁₆
41
Extended Format
Line 15: 0006 CLOOP +JSUB RDREC → 4B101036
  opcode (48)₁₆ with n i = 1 1 → first byte (4B)₁₆
  x b p e = 0 0 0 1 → (1)₁₆
  address = (01036)₁₆
42
PC Relative Addressing Mode
Line 10: 0000 FIRST STL RETADR → 17202D
Line 12: 0003 LDB #LENGTH → 69202D
  :
Line 95: 0030 RETADR RESW 1
For line 10:
  opcode (14)₁₆ with n i = 1 1 → first byte (17)₁₆
  x b p e = 0 0 1 0 → (2)₁₆
  disp = (0030)₁₆ - (0003)₁₆ = (002D)₁₆
PC is advanced after each instruction is fetched and before it is executed.
That is, PC contains the address of the next instruction.
43
PC Relative Addressing Mode
Line 15: 0006 CLOOP +JSUB RDREC → 4B101036
  :
Line 40: 0017 J CLOOP → 3F2FEC
Line 45: 001A ENDFIL LDA EOF → 032010
For line 40:
  opcode (3C)₁₆ with n i = 1 1 → first byte (3F)₁₆
  x b p e = 0 0 1 0 → (2)₁₆
  disp = (006)₁₆ - (01A)₁₆ = -(14)₁₆, stored as (FEC)₁₆ in 12-bit two's complement
44
Base Relative Addressing Mode
Line 12: 0003 LDB #LENGTH → 69202D
Line 13: BASE LENGTH
  :
Line 100: 0033 LENGTH RESW 1
Line 105: 0036 BUFFER RESB 4096
  :
Line 160: 104E STCH BUFFER,X → 57C003
For line 160:
  opcode (54)₁₆ with n i = 1 1 → first byte (57)₁₆
  x b p e = 1 1 0 0 → (C)₁₆
  disp = (0036)₁₆ - (0033)₁₆ = (003)₁₆
• PC relative is no longer applicable
• The BASE directive explicitly informs the assembler that the base
register will contain the address of LENGTH (use NOBASE to
invalidate)
• LDB loads the address of LENGTH into the base register during
execution
45
Immediate + PC Relative Addressing Mode
Line 12: 0003 LDB #LENGTH → 69202D
Line 13: BASE LENGTH
Line 15: 0006 CLOOP +JSUB RDREC → 4B101036
  :
Line 100: 0033 LENGTH RESW 1
For line 12:
  opcode (68)₁₆ with n i = 0 1 → first byte (69)₁₆
  x b p e = 0 0 1 0 → (2)₁₆
  disp = (0033)₁₆ - (0006)₁₆ = (002D)₁₆
46
Indirect + PC Relative Addressing Mode
Line 70: 002A J @RETADR → 3E2003
Line 80: 002D EOF BYTE C’EOF’ → 454F46
Line 95: 0030 RETADR RESW 1
For line 70:
  opcode (3C)₁₆ with n i = 1 0 → first byte (3E)₁₆
  x b p e = 0 0 1 0 → (2)₁₆
  disp = (0030)₁₆ - (002D)₁₆ = (0003)₁₆
47
Why Program Relocation
 To increase the productivity of the machine
 Want to load and run several programs at the
same time (multiprogramming)
 Must be able to load programs into memory
wherever there is room
 Actual starting address of the program is not
known until load time
48
Absolute Program
 Program with starting address specified at assembly time
 In the example of SIC assembly program
Line 55: 101B LDA THREE → 00102D
(the operand address 102D is calculated from the starting address 1000)
 The address may be invalid if the program is loaded
somewhere else.
49
Relocatable Program
50
What Needs to be Relocated
 Need to be modified:
 The address portion of those instructions that use absolute (direct)
addresses.
 Need not be modified:
 Register-to-register instructions (no memory references)
 PC or base-relative addressing (relative displacement remains the
same regardless of different starting addresses)
51
How to Relocate Addresses
 For Assembler
 For an address label, its address is assigned
relative to the start of the program (that’s why
START 0)
 Produce a modification record to store the
starting location and the length of the address
field to be modified.
 For loader
 For each modification record, add the actual
beginning address of the program to the address
field at load time.
52
Format of Modification Record
 One modification record for each address to be modified
 The length is stored in half-bytes (20 bits = 5 half-bytes)
 The starting location is the location of the byte containing
the leftmost bits of the address field to be modified.
 If the field contains an odd number of half-bytes, the
starting location begins in the middle of the first byte.
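A minimal Python sketch of both sides of this scheme follows: the assembler emitting a modification record and the loader applying it. The record layout assumed here is 'M', a 6-digit starting location, and a 2-digit length in half-bytes; the function names and the load address 5000 are illustrative assumptions.

```python
def modification_record(field_start, half_bytes=5):
    # 'M' + starting location (hex, relative to program start) + length in half-bytes
    return "M{:06X}{:02X}".format(field_start, half_bytes)


def relocate(memory, record, load_address):
    """Add load_address to the address field named by one modification record.

    memory is a bytearray holding the program image, indexed relative to
    the start of the program.
    """
    start = int(record[1:7], 16)
    half_bytes = int(record[7:9], 16)
    nbytes = (half_bytes + 1) // 2           # odd lengths start mid-byte
    field = int.from_bytes(memory[start:start + nbytes], "big")
    mask = (1 << (4 * half_bytes)) - 1       # low-order bits that form the address
    relocated = ((field & mask) + load_address) & mask
    field = (field & ~mask) | relocated      # keep the leading flag bits untouched
    memory[start:start + nbytes] = field.to_bytes(nbytes, "big")
```

For +JSUB RDREC at location 0006 (object code 4B101036), the address field begins in the byte at 0007, so the record is M00000705; relocating to 5000 changes the instruction to 4B106036 while the flag nibble stays intact.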
53
Relocatable Object Program
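[Figure annotations: each modified address field is 5 half-bytes long; the Modification records correspond to]
Line 15: +JSUB RDREC
Line 35: +JSUB WRREC
Line 65: +JSUB WRREC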
54
Machine Independent Assembler Features
 Features are not closely related to machine
architecture.
 More related to issues about:
 Programmer convenience
 Software environment
 Common examples:
 Literals
 Symbol-defining statements
 Expressions
 Program blocks
 Control sections
 Assembler directives are widely used to support
these features
55
Literals
 Literal is equivalent to:
 Define a constant explicitly and assign an address
label for it
 Use the label as the instruction operand
 Why use literals:
 To avoid defining the constant somewhere and
making up a label for it
 Instead, to write the value of a constant operand
as a part of the instruction
 How to use literals:
 A literal is identified with the prefix =, followed by
a specification of the literal value
56
Original Program
57
Using Literal
58
Object Program Using Literal
The same as before
59
Original Program
60
Using Literal
61
Object Program Using Literal
The same as before
62
Literal vs. Immediate Addressing
 Same:
 Operand field contains constant values
 Difference:
 Immediate addressing: the assembler puts the constant value in
the machine instruction itself
 Literal: the assembler stores the constant value elsewhere and puts
its address in the machine instruction
63
Literal Pool
 All of the literal operands are gathered together into one or
more literal pools.
 Placement of a literal pool:
 At the location where the LTORG directive is encountered
 To keep the literal operand close to the instruction that uses it
 At the end of the object program, generated immediately
following the END statement
64
Duplicate Literals
 Duplicate literals:
 The same literal used more than once in the program
 Only one copy of the specified value needs to be
stored
 For example, =X’05’ in the example program
 How to recognize the duplicate literals
 Compare the character strings defining them
 Easier to implement, but has potential problem (see next)
 E.g., =X’05’
 Compare the generated data value
 Better, but will increase the complexity of the assembler
 E.g., =C’EOF’ and =X’454F46’
65
Problem of Duplicate-Literal Recognition using Character Strings
 There may be some literals that have the same
name, but different values
 For example, the literal whose value depends
on its location in the program
 The value of the location counter is denoted by *
BASE *
LDB =*
 The literal =* used repeatedly in the program has the
same name, but different values
 Every occurrence of a literal of this kind has to be stored in the
literal pool
66
Implementation of Literal
 Data structure: a literal table LITTAB
 Literal name
 Operand value and length
 Address
 LITTAB is often organized as a hash table, using the
literal name or value as the key
67
Implementation of Literal
 Pass 1
 As each literal operand is recognized
 Search the LITTAB for the specified literal name or value
 If the literal is already present, no action is needed
 Otherwise, the literal is added to LITTAB (store the name, value,
and length, but not address)
 As LTORG or END is encountered
 Scan the LITTAB
 For each literal with empty address field, assign the address and
update the LOCCTR accordingly
68
Implementation of Literal
 Pass 2
 As each literal operand is recognized
 Search the LITTAB for the specified literal name or value
 If the literal is found, use the associated address as the operand of
the instruction
 Otherwise, error (should not happen)
 As LTORG or END is encountered
 insert the data values of the literals in the object program
 Modification record is generated if necessary
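The Pass 1 handling of LITTAB can be sketched as follows. This is a simplified illustration: duplicates are recognized by comparing the literal's character string (the simpler of the two options discussed earlier), only C'...' and X'...' literals are supported, and the class and method names are hypothetical.

```python
def literal_value(name):
    """Convert a literal such as =C'EOF' or =X'05' to its byte value."""
    kind, text = name[1], name[3:-1]
    if kind == "C":
        return text.encode("ascii")
    if kind == "X":
        return bytes.fromhex(text)
    raise ValueError("unsupported literal: " + name)


class LitTab:
    def __init__(self):
        self.entries = {}    # literal name -> {"value": bytes, "address": int or None}

    def reference(self, name):
        """Called when a literal operand is recognized; duplicates need no action."""
        if name not in self.entries:
            self.entries[name] = {"value": literal_value(name), "address": None}

    def place_pool(self, locctr):
        """Called at LTORG or END: assign addresses, return the updated LOCCTR."""
        for entry in self.entries.values():
            if entry["address"] is None:
                entry["address"] = locctr
                locctr += len(entry["value"])
        return locctr
```

Pass 2 would then look up each literal operand in `entries` and use the stored address, and write the stored values into the object program at the pool location.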
69
Symbol-Defining Statements
 How to define symbols and their values
 Address label
 The label is the symbol name and the assigned address is its
value
FIRST STL RETADR
 Assembler directive EQU
symbol EQU value
 This statement enters the symbol into SYMTAB and assigns to
it the value specified
 The value can be a constant or an expression
 Assembler directive ORG
ORG value
70
Use of EQU
 To improve the program readability, avoid using the
magic numbers, make it easier to find and change
constant values
 +LDT #4096
 MAXLEN EQU 4096
+LDT #MAXLEN
 To define mnemonic names for registers
 A EQU 0
 X EQU 1
 BASE EQU R1
 COUNT EQU R2
71
Use of ORG
 Indirect value assignment:
ORG value
 When ORG is encountered, the assembler resets its LOCCTR
to the specified value
 ORG will affect the values of all labels defined until the next
ORG
 If the previous value of LOCCTR can be automatically
remembered, we can return to the normal use of LOCCTR by
simply writing
ORG
72
Example of Using ORG
 Data structure
 SYMBOL: 6 bytes
 VALUE: 3 bytes (one word)
 FLAGS: 2 bytes
 Refer to every field of each entry
73
Not Using ORG
Offsets from STAB
Less readable and meaningful
 We can fetch the VALUE field by
LDA VALUE,X
 X = 0, 11, 22, … for each entry
74
Using ORG
[Figure annotations: set the LOCCTR to STAB; the operands give the size of each field, which is more meaningful; restore the LOCCTR to its previous value, or only use ORG]
75
Forward-Reference Problem
 Forward reference is not allowed for EQU and ORG.
 That is, all terms in the value field must have been
defined previously in the program.
 The reason is that all symbols must have been
defined during Pass 1 in a two-pass assembler.
Allowed
Not allowed
76
Forward-Reference Problem
Not allowed
Not allowed
77
Expressions
 A single term as an instruction operand can be replaced by
an expression.
STAB RESB 1100
STAB RESB 11*100
STAB RESB (6+3+2)*MAXENTRIES
 The assembler has to evaluate the expression to produce a
single operand address or value.
 Expressions consist of
 Operator
 +,-,*,/ (division is usually defined to produce an integer result)
 Individual terms
 Constants
 User-defined symbols
 Special terms, e.g., *, the current value of LOCCTR
78
Relocation Problem in Expressions
 Values of terms can be
 Absolute (independent of program location)
 constants
 Relative (to the beginning of the program)
 Address labels
 * (value of LOCCTR)
 Expressions can be
 Absolute
 Only absolute terms
 Relative terms in pairs with opposite signs for each pair
 Relative
 All the relative terms except one can be paired as described in “absolute”.
The remaining unpaired relative term must have a positive sign.
 No relative terms may enter into a multiplication or division
operation
 Expressions that do not meet the conditions of either “absolute”
or “relative” should be flagged as errors.
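The pairing rules above amount to counting signed relative terms: a net count of 0 means absolute, a net count of +1 means relative, and anything else is an error. The Python sketch below assumes expressions that contain only + and - between terms (multiplication and division of relative terms are illegal anyway); the function name and the term representation are hypothetical.

```python
def classify_expression(terms, symbol_types):
    """terms: list of (sign, term) with sign +1 or -1; term is an int or a symbol name.

    symbol_types maps each symbol to "absolute" or "relative" (the SYMTAB flag).
    """
    net_relative = 0
    for sign, term in terms:
        if isinstance(term, int):
            continue                       # constants are absolute terms
        if symbol_types[term] == "relative":
            net_relative += sign
    if net_relative == 0:
        return "absolute"                  # all relative terms cancel in pairs
    if net_relative == 1:
        return "relative"                  # one unpaired, positive relative term
    raise ValueError("expression is neither absolute nor relative")
```

For example, BUFEND-BUFFER is written as `[(+1, "BUFEND"), (-1, "BUFFER")]` and classifies as absolute, while BUFEND+BUFFER raises an error.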
79
Absolute Expression
 Relative term or expression implicitly represents (S+r)
 S: the starting address of the program
 r: value of the term or expression relative to S
 For example
 BUFFER: S+r1
 BUFEND: S+r2
 The expression, BUFEND-BUFFER, is absolute.
 MAXLEN = (S+r2)-(S+r1) = r2-r1 (no S here)
 MAXLEN means the length of the buffer area
 Illegal expressions: BUFEND+BUFFER, 100-BUFFER, 3*BUFFER
Values associated with symbols
80
Absolute or Relative
 To determine the type of an expression, we must
keep track of the types of all symbols defined in
the program.
 We need a “flag” in the SYMTAB for indication.
81
Program Blocks
 Collect pieces of code/data of the same kind that are scattered
through the source program into a single block in the generated
object program.
 For example, a code block, an initialized data block, and an
uninitialized data block (like the code and data segments on a
Pentium PC).
 Advantage:
 Because pieces of code are closer to each other now, format 4
can be replaced with format 3, saving space and execution time.
 Code sharing and data protection can better be done.
 With this function, in the source program, the
programmer can put related code and data near
each other for better readability.
82
Advantages of Using Program Blocks
 To satisfy two conflicting goals:
 Separate the program into blocks in a particular
order
 Large buffer area is moved to the end of the object
program
 The use of extended format instructions or base relative
mode may be reduced. (lines 15, 35, and 65)
 Placement of literal pool is easier: simply put them before
the large data area, CDATA block. (line 253)
 Data areas are scattered
 Program readability is better if data areas are placed in
the source program close to the statements that
reference them.
83
Program Block Example
Default block.
84
Use the default block.
85
Use the default block.
• At the beginning of the program, statements are assumed to be
part of the unnamed (default) block.
• The default block (unnamed) contains the executable instructions.
• The CDATA block contains all data areas that are a few words or
less in length.
• The CBLKS block contains all data areas that consist of large blocks
of memory.
86
Job of Assembler
 A program block may contain several separate
segments of the source program.
 The assembler will (logically) rearrange these segments
to gather together the pieces of each block.
 These blocks will then be assigned addresses in the
object program, with the blocks appearing in the same
order in which they were first begun in the source
program.
 The result is the same as if the programmer had
physically rearranged the source statements to group
together all the source lines belonging to each block.
87
Assembler Processing (1)
 Pass 1:
 Maintain a separate location counter for each program
block.
 The location counter for a block is initialized to 0 when the
block is first begun.
 The current value of this location counter is saved when
switching to another block, and the saved value is restored
when resuming a previous block.
 Thus, during pass 1, each label is assigned an address that is
relative to the beginning of the block that contains it.
 After pass 1, the latest value of the location counter for
each block indicates the length of that block.
 The assembler then can assign to each block a starting
address in the object program.
88
Assembler Processing (2)
 Pass 2
 When generating object code, the assembler needs the
address for each symbol relative to the start of the object
program (not the start of an individual program block)
 This can be easily done by adding the location of the
symbol (relative to the start of its block) to the assigned
block starting address.
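The address arithmetic for program blocks can be sketched in Python as below. The block lengths for CDATA (0B) and CBLKS (1000) are illustrative assumptions; the default block length 66 is chosen so that CDATA starts at 0066. Function names are hypothetical.

```python
def assign_block_starts(block_lengths, program_start=0):
    """block_lengths: dict of block name -> length, in the order blocks were first begun."""
    starts = {}
    address = program_start
    for name, length in block_lengths.items():   # dicts preserve insertion order
        starts[name] = address
        address += length
    return starts


def absolute_address(symbol, symtab, starts):
    """symtab maps a symbol to (block name, address relative to that block)."""
    block, offset = symtab[symbol]
    return starts[block] + offset


# Final LOCCTR value of each block after Pass 1 (assumed values)
block_lengths = {"(default)": 0x66, "CDATA": 0x0B, "CBLKS": 0x1000}
starts = assign_block_starts(block_lengths)
symtab = {"LENGTH": ("CDATA", 0x0003)}
target = absolute_address("LENGTH", symtab, starts)
```

Here `target` comes out as 0069, so an instruction whose next-instruction address is 0009 would use a PC relative displacement of 0069 - 0009 = 0060.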
89
Figure 2.12 (a)
There is no block
number for MAXLEN.
This is because
MAXLEN is an
absolute symbol.
90
0063+3
91
Symbol Table After Pass 1
92
Object Code in Pass 2
 20
0006
0
LDA LENGTH
032060
• The SYMTAB shows that LENGTH has a relative address
0003 within problem block 1 (CDATA).
• The starting address for CDATA is 0066.
• Thus the desired target address is 0066 + 0003 = 0069.
• Because this instruction is assembled using program
counter-relative addressing, and PC will be 0009 when the
instruction is executed (the starting address for the default
block is 0), the displacement is 0069 – 0009 = 60.
93
Advantages
 Because the large buffer area is moved to
the end of the object program, we no longer
need to use format 4 instructions on lines 15,
35, and 65.
 For the same reason, use of the base register
is no longer necessary; the LDB and BASE
have been deleted.
 Code sharing and data protection can be
more easily achieved.
94
Object Code (Figure 2.13)
 Although the assembler internally rearranges code
and data to form blocks, the generated code and
data need not be physically rearranged. The
assembler can simply write the object code as it is
generated during pass 2 and insert the proper load
address in each text record.
95
Leave the Job to Loader
No code needs to be generated for these two blocks. We just need to reserve space for them.
96
Control Section
 A control section is a part of the program that
maintains its identity after assembly.
 Each such control section can be loaded and
relocated independently of the others. (Main
advantage)
 Different control sections are often used for
subroutines or other logical subdivisions of a
program.
 The programmer can assemble, load, and
manipulate each of these control sections
separately.
97
Program Linking
 Instructions in one control section may need to refer to
instructions or data located in another control section. (Like
external variables used in C language)
 Thus, program (actually, control section) linking is necessary.
 Because control sections are independently loaded and relocated,
the assembler is unable to know a symbol’s address at assembly
time. This job can only be delayed and performed by the loader.
 We call the references that are between control sections
“external references”.
 The assembler generates information for each external reference
that will allow the loader to perform the required linking.
98
Control Section Example
Default control section
99
A new control section
100
A new control section
101
External References
 Symbols that are defined in one control section
cannot be used directly by another control
section.
 They must be identified as external references
for the loader to handle.
 Two assembler directives are used:
 EXTDEF (external definition)
 Identify those symbols that are defined in this control
section and can be used in other control sections.
 Control section names are automatically considered as
external symbols.
 EXTREF (external reference)
 Identify those symbols that are used in this control section
but defined in other control sections.
102
Code Involving External Reference (1)
 15
0003 CLOOP +JSUB
RDREC
4B100000
 The operand (RDREC) is named in the EXTREF
statement, therefore this is an external reference.
 Because the assembler has no idea where the control
section containing RDREC will be loaded, it cannot
assemble the address for this instruction.
 Therefore, it inserts an address of zero.
 Because RDREC has no predictable relationship to
anything in this control section, relative addressing
cannot be used.
 Instead, an extended format instruction must be used.
 This is true of any instruction whose operand involves an
external reference.
103
Code Involving External Reference (2)
 160 0017 +STCH
BUFFER,X
57900000
 This instruction makes an external reference to BUFFER.
 The instruction is thus assembled using extended format with an
address of zero.
 The x bit is set to 1 to indicate indexed addressing.
104
Code Involving External Reference (3)
 190 0028 MAXLEN WORD BUFEND – BUFFER 000000
 The value of the data word to be generated is specified by
an expression involving two external references.
 As such, the assembler stores this value as zero.
 When the program is loaded, the loader will add to this
data area the address of BUFEND and subtract from it the
address of BUFFER, which then results in the desired value.
 Notice the difference between lines 190 and 107. In line 107,
EQU can be used because BUFEND and BUFFER are
defined in the same control section and thus their
difference can be immediately calculated by the assembler.
105
Figure 2.16 Program Object Code (1)
106
Figure 2.16 Program Object Code (2)
107
Figure 2.16 Program Object Code (3)
108
External Reference Processing
 The assembler must remember (via entries
in SYMTAB) in which control section a
symbol is defined.
 Any attempt to refer to a symbol in another
control section must be flagged as an error
unless the symbol is identified (via EXTREF)
as an external reference.
 The assembler must allow the same symbol
to be used in different control sections.
 E.g., the conflicting definitions of MAXLEN on lines
107 and 190 should be allowed.
109
Two New Record Types (1)
 We need two new record types in the object
program and a change in the previously defined
modification record type.
 Define record
 Give information about external symbols that are defined in
this control section
 Refer record
 List symbols that are used as external references by this
control section.
110
Two New Record Types (2)
111
Revised Modification Record
112
Object Program (Figure 2.17)
113
Program Relocation
 The modified “modification record” can still be used for
program relocation.
Program name
114
More Restriction on Expression
 Previously we required that all of the relative terms in an
expression be paired to make the expression an absolute
expression.
 With control sections, the above requirement is not enough.
 We must require that both terms in each pair must be relative
within the same control section.
 BUFEND - BUFFER is allowed because both symbols are defined in the
same control section.
 On the other hand, RDREC - COPY is not allowed because
the value is unpredictable.
 How to enforce this restriction
 When an expression involves external references, the assembler
cannot determine whether or not the expression is legal. The
assembler evaluates all of the terms it can, combines these to form
an initial expression value, and generates Modification records. The
loader checks the expression for errors and finishes the evaluation.
115
Assembler Design Options: One-Pass and Multi-Pass Assemblers
 So far, we have presented the design and implementation of a
two-pass assembler.
 Here, we will present the design and implementation of
 One-pass assembler
 If avoiding a second pass over the source program is necessary or
desirable.
 Multi-pass assembler
 Allow forward references during symbol definition.
116
One-Pass Assembler
 The main problem is about forward reference.
 Eliminating forward reference to data items can
be easily done.
 Simply ask the programmer to define variables before
using them.
 However, eliminating forward reference to
instruction cannot be easily done.
 Sometimes your program needs a forward jump.
 Asking your program to use only backward jumps is
too restrictive.
117
Program Example
118
119
All variables are defined before they are used.
120
Two Types of One-pass Assembler
 There are two types of one-pass assembler:
 Produce object code directly in memory for
immediate execution
 No loader is needed
 Load-and-go for program development and testing
 Good for computing center where most students
reassemble their programs each time.
 Can save time for scanning the source code again
 Produce the usual kind of object program for later
execution
121
Internal Implementation
 The assembler generates object code
instructions as it scans the source program.
 If an instruction operand is a symbol that
has not yet been defined, the operand
address is omitted when the instruction is
assembled.
 The symbol used as an operand is entered
into the symbol table.
 This entry is flagged to indicate that the
symbol is not yet defined.
122
Internal Implementation (cont’d)
 The address of the operand field of the
instruction that refers to the undefined
symbol is added to a list of forward
references associated with the symbol table
entry.
 When the definition of the symbol is
encountered, the forward reference list for
that symbol is scanned, and the proper
address is inserted into any instruction
previously generated.
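The forward-reference list and the later patching step can be sketched in Python as below. This assumes a load-and-go setting where object code sits in a bytearray, standard SIC instructions whose 2-byte operand field follows the opcode byte, and hypothetical class and method names; the address 2024 used for ENDFIL in the usage note is illustrative.

```python
class ForwardRefTable:
    def __init__(self, memory, origin):
        self.memory = memory      # bytearray of generated object code
        self.origin = origin      # address corresponding to memory[0]
        self.defined = {}         # symbol -> address
        self.pending = {}         # undefined symbol -> list of operand-field addresses

    def reference(self, symbol, field_address):
        """An instruction at field_address - 1 uses symbol as its operand."""
        if symbol in self.defined:
            self._patch(field_address, self.defined[symbol])
        else:
            self.pending.setdefault(symbol, []).append(field_address)

    def define(self, symbol, address):
        """The symbol's definition was reached: fix every earlier reference."""
        self.defined[symbol] = address
        for field_address in self.pending.pop(symbol, []):
            self._patch(field_address, address)

    def undefined(self):
        """Symbols still undefined at the end of the program (errors)."""
        return sorted(self.pending)

    def _patch(self, field_address, value):
        i = field_address - self.origin
        self.memory[i] = (self.memory[i] & 0x80) | (value >> 8)   # keep the x bit
        self.memory[i + 1] = value & 0xFF
```

If a jump to ENDFIL has its operand field at 201C, calling `reference("ENDFIL", 0x201C)` queues that location, and a later `define("ENDFIL", 0x2024)` writes 2024 into it.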
123
Processing Example
After scanning line 40
124
Processing Example (cont’d)
After scanning line 160
125
Processing Example (cont’d)
 Between scanning line 40 and 160:
 On line 45, when the symbol ENDFIL is defined, the
assembler places its value in the SYMTAB entry.
 The assembler then inserts this value into the
instruction operand field (at address 201C).
 From this point on, any references to ENDFIL would
not be forward references and would not be entered
into a list.
 At the end of the processing of the program,
any SYMTAB entries that are still marked with *
indicate undefined symbols.
 These should be flagged by the assembler as errors.
126
Multi-Pass Assembler
 If we use a two-pass assembler, the following
symbol definition cannot be allowed.
ALPHA EQU  BETA
BETA  EQU  DELTA
DELTA RESW 1
 This is because ALPHA and BETA cannot be defined
in pass 1. Actually, if we allow multi-pass processing,
DELTA is defined in pass 1, BETA is defined in pass 2,
and ALPHA is defined in pass 3, and the above
definitions can be allowed.
 This is the motivation for using a multi-pass
assembler.
127
Multi-Pass Assembler (cont’d)
 It is unnecessary for a multi-pass assembler to make more
than two passes over the entire program.
 Instead, only the parts of the program involving forward
references need to be processed in multiple passes.
 The method presented here can be used to process any kind
of forward references.
128
Multi-Pass Assembler Implementation
 Use a symbol table to store symbols that are
not totally defined yet.
 For an undefined symbol, in its entry,
 We store the names and the number of undefined
symbols which contribute to the calculation of its
value.
 We also keep a list of symbols whose values depend
on the defined value of this symbol.
 When a symbol becomes defined, we use its
value to reevaluate the values of all of the
symbols that are kept in this list.
 The above step is performed recursively.
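A minimal Python sketch of this dependency-list scheme is given below. It makes simplifying assumptions: a symbol's defining expression is a sum of terms (integers or symbol names), the set of missing symbols is recomputed rather than tracked as a count, and all names, including the value 1000 used for DELTA in the usage note, are hypothetical.

```python
class SymbolResolver:
    def __init__(self):
        self.values = {}       # fully defined symbols -> value
        self.waiting = {}      # undefined symbol -> list of terms in its expression
        self.dependents = {}   # symbol -> symbols whose values depend on it

    def define(self, name, terms):
        missing = [t for t in terms if isinstance(t, str) and t not in self.values]
        if missing:
            # Not yet computable: remember the expression and who it waits for
            self.waiting[name] = terms
            for symbol in missing:
                waiters = self.dependents.setdefault(symbol, [])
                if name not in waiters:
                    waiters.append(name)
            return
        self.values[name] = sum(
            t if isinstance(t, int) else self.values[t] for t in terms
        )
        self.waiting.pop(name, None)
        # Recursively re-evaluate every symbol that was waiting on this one
        for dependent in self.dependents.pop(name, []):
            if dependent in self.waiting:
                self.define(dependent, self.waiting[dependent])
```

Processing ALPHA EQU BETA and BETA EQU DELTA leaves both symbols waiting; once DELTA is defined (say at 1000), the recursion defines BETA and then ALPHA without another pass over the source.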
129
Forward Reference Example
LOC:1034
130
Forward Reference Processing
But one symbol is unknown yet
Defined
Not defined yet
After first line
131
But two symbols are unknown yet
Now defined
After second line
132
After third line
133
Start knowing values
After 4’th line
134
Start knowing values
All symbols are
defined and their
values are known
now.
After 5’th line
135