Assemblers and Linkers

advertisement
Assemblers and Linkers
Professor Jennifer Rexford
COS 217
1
Goals of This Lecture
• Compilation process
 Compile, assemble, archive, link, execute
• Assembling
 Representing instructions
– Prefix, opcode, addressing modes, operands
 Translating labels into memory addresses
– Symbol table, and filling in local addresses
 Connecting symbolic references with definitions
– Relocation records
 Specifying the regions of memory
– Generating sections (data, BSS, text, etc.)
• Linking
 Concatenating object files
 Patching references
2
Compilation Pipeline
.c
• Compiler (gcc): .c  .s
Compiler
 High-level to assembly language
.s
• Assembler (as): .s  .o
Assembler
 Assembly to machine language
• Archiver (ar): .o  .a
 Object files into single library
• Linker (ld): .o + .a  a.out
 Builds an executable file
.o
Archiver
.a
Linker/Loader
a.out
• Execution (execlp)
 Loads executable and starts it
Execution
3
Assembler
• Assembly language
 A symbolic representation of machine instructions
• Machine language
 Contains everything needed to link, load, and execute
the program
• Assembler
 Translates assembly language into machine language
– Translate instruction mnemonics into op-codes
– Translate symbolic names for memory locations
 Stores the result in an object file (.o)
4
General IA32 Instruction Format
Instruction
Opcode
prefixes
ModR/M
SIB
Displacement
Up to 4
1, 2, or 3 byte 1 byte
1 byte
0, 1, 2,
prefixes of
opcode
(if required) (if required)
or 4 bytes
1 byte each
(optional)
7
6 5
3 2
0
7
6 5
3 2
Mod
Reg/
Opcode
R/M
Scale Index
Immediate
0, 1, 2,
or 4 bytes
0
Base
• Prefixes: we won’t worry about these for now
• Opcode
• ModR/M and SIB (scale-index-base): for memory operands
• Displacement and immediate: depending on opcode, ModR/M and SIB
• Note: byte order is little-endian (low-order byte of word at lower addresses)
5
Example: Push on to Stack
• Assembly language:
pushl %edx
• Machine code:
 IA32 has a separate opcode for push for each register operand
– 50: pushl %eax
– 51: pushl %ecx
0101 0010
– 52: pushl %edx
–…
 Results in a one-byte instruction
• Observe: sometimes one assembly language instruction
can map to a group of different opcodes
6
Example: Load Effective Address
• Assembly language:
leal (%eax,%eax,4), %eax
• Machine code:
 Byte 1: 8D (opcode for “load effective address”)
1000 1101
 Byte 2: 04 (dest %eax, with scale-index-base)
0000 0100
 Byte 3: 80 (scale=4, index=%eax, base=%eax)
1000 0000
Load the address %eax + 4 * %eax into register %eax
7
Example: Movl (Opcode 44)
ModR/M
Instruction
prefixes
Opcode
SIB
M
o reg R/M S
d
Displacement
B
I
Immediate
Mod=11 table
4
movl %ecx, %ebx
4
3
EAX
ECX
EDX
EBX
ESP
EBP
ESI
EDI
3
1000 1000 11 011 001
ebx ecx
mov r/m32,r32
4
movl 12(%ecx), %ebx
2
4
mode %_, %_
2
3
3
8
1000 1000 01 011 001 00001100
12
ebx (ecx)
mode disp8(%_), %_
Reference: IA-32 Intel Architecture Software Developer’s
Manual, volume 2, page 2-1, page 2-6, and page 3-441
0
1
2
3
4
5
6
7
Mod=01 table
[EAX]+disp8 0
[ECX]+disp8 1
[EDX]+disp8 2
[EBX]+disp8 3
[--][--]+disp8 4
[EBP]+disp8 5
[ESI]+disp8 6
[EDI]+disp8 7
8
Example: Mov Immediate to Memory
ModR/M
Instruction
prefixes
Opcode
SIB
M
o reg R/M S
d
I
B
Displacement
Immediate
mov r/m8,imm8
4
movb $97, 999
4
2
3
3
1100 0110 00 000 101
32
8
00000000000000000000001111100111
999
mode disp32
01100001
97
Mod=00 table
[EAX] 0
[ECX] 1
[EDX] 2
[EBX] 3
[--][--] 4
disp32 5
[ESI] 6
[EDI] 7
9
Encoding as Byte String
ModR/M
Instruction
prefixes
Opcode
M
o reg R/M S
d
4
movb $97, 999
SIB
4
2
3
I
Displacement
3
1100 0110 00 000 101
C6
B
05
Immediate
32
00000000000000000000001111100111
E7 03 00 00
8
01100001
61
little-endian
10
Assembly Language
4
movb $97, 999
4
2
3
3
1100 0110 00 000 101
C6
05
char grade = 67;
…
grade = ‘a’;
located at
address 999
32
00000000000000000000001111100111
E7 03 00 00
8
01100001
61
.globl grade
.data
grade:
.byte 67
.text
.
.
.
movb
$97, grade
.
.
.
11
Symbol Manipulation
.text
...
movl count, %eax
...
.data
count:
.word 0
...
.globl loop
loop:
cmpl %edx, %eax
jge done
pushl %edx
call foo
jmp loop
done:
Create labels and remember their addresses
Deal with the “forward reference problem”
12
Dealing with Forward References
• Most assemblers have two passes
 Pass 1: symbol definition
 Pass 2: instruction assembly
• Or, alternatively,
 Pass 1: instruction assembly
 Pass 2: patch the cross-reference
13
Implementing an Assembler
instruction
symbol table
.s file
disk
instruction
input
assemble
data section
instruction
text section
instruction
bss
section
in memory
structure
in memory
structure
output
.o file
disk
14
Input Functions
• Read assembly language and produce
list of instructions
instruction
symbol table
.s file
instruction
input
assemble
data section
instruction
text section
instruction
bss
section
output
.o file
15
Input Functions
• Lexical analyzer
 Group a stream of characters into tokens
add %g1 , 10 , %g2
• Syntactic analyzer
 Check the syntax of the program
<MNEMONIC><REG><COMMA><REG><COMMA><REG>
• Instruction list producer
 Produce an in-memory list of instruction data structures
instruction
instruction
16
Instruction Assembly
...
loop:
D 0 3 9 [0]
cmpl %edx, %eax
1 byte
jge done
4 bytes
pushl %edx
disp? 7 D [2]
5 2 [4]
call foo
disp?
E 8 [5]
jmp loop
disp?
E 9 [10]
done:
[15]
How to compute the address displacements?
17
Symbol Table
loop
done
.globl loop
type
def
disp_s
disp_l
disp_l
def
label
loop
done
foo
loop
done
address
0
2
5
10
15
loop:
cmpl %edx, %eax
jge done
pushl %edx
call foo
disp_l
jmp loop
disp_l
done:
D 0 3
disp_s 7
5
E
E
9
D
2
8
9
[0]
[2]
[4]
[5]
[10]
[15]
18
Symbol Table
loop
done
.globl loop
type
def
disp_s
disp_l
disp_l
def
label
loop
done
foo
loop
done
address
0
2
5
10
15
loop:
cmpl %edx, %eax
jge done
pushl %edx
call foo
disp_l
jmp loop
disp_l
done:
D 0 3
disp_s 7
5
E
E
9
D
2
8
9
[0]
[2]
[4]
[5]
[10]
[15]
19
Symbol Table
loop
done
.globl loop
type
def
disp_s
disp_l
disp_l
def
label
loop
done
foo
loop
done
address
0
2
5
10
15
loop:
cmpl %edx, %eax
jge done
pushl %edx
call foo
disp_l
jmp loop
disp_l
done:
D 0 3
disp_s
+13 7
5
E
E
9
D
2
8
9
[0]
[2]
[4]
[5]
[10]
[15]
20
Symbol Table
loop
done
.globl loop
type
def
disp_s
disp_l
disp_l
def
label
loop
done
foo
loop
done
address
0
2
5
10
15
loop:
cmpl %edx, %eax
jge done
pushl %edx
call foo
disp_l
jmp loop
disp_l
done:
D 0 3
disp_s
+13 7
5
E
E
9
D
2
8
9
[0]
[2]
[4]
[5]
[10]
[15]
21
Filling in Local Addresses
loop
done
.globl loop
type
def
disp_s
disp_l
disp_l
def
label
loop
done
foo
loop
done
address
0
2
5
10
15
loop:
cmpl %edx, %eax
jge done
pushl %edx
call foo
disp_l
jmp loop
-10
done:
D 0 3
+13 7
5
E
E
9
D
2
8
9
[0]
[2]
[4]
[5]
[10]
[15]
22
Filling in Local Addresses
loop
done
.globl loop
type
def
disp_s
disp_l
disp_l
def
label
loop
done
foo
loop
done
address
0
2
5
10
15
loop:
cmpl %edx, %eax
jge done
pushl %edx
call foo
disp_l
jmp loop
-10
done:
D 0 3
+13 7
5
E
E
9
D
2
8
9
[0]
[2]
[4]
[5]
[10]
[15]
23
Filling in Local Addresses
loop
done
.globl loop
type
def
disp_s
disp_l
disp_l
def
label
loop
done
foo
loop
done
address
0
2
5
10
15
loop:
cmpl %edx, %eax
jge done
pushl %edx
call foo
disp_l
jmp loop
-10
done:
D 0 3
+13 7
5
E
E
9
D
2
8
9
[0]
[2]
[4]
[5]
[10]
[15]
24
Relocation Records
...
.globl loop
loop:
cmpl %edx, %eax
def
disp_l
loop
foo
jge done
pushl %edx
call foo
jmp loop
done:
disp_l
-10
0
5
D 0 3
+13 7
5
E
E
9
D
2
8
9
[0]
[2]
[4]
[5]
[10]
[15]
25
Assembler Directives
• Delineate segments
 .section
• Allocate/initialize data and bss segments
 .word .half .byte
 .ascii .asciz
 .align .skip
• Make symbols in text externally visible
 .global
26
Assemble into Sections
• Process instructions and directives to produce object file
output structures
instruction
symbol table
.s file
instruction
input
assemble
data section
instruction
text section
instruction
bss
section
output
.o file
27
Output Functions
• Machine language output
 Write symbol table and sections into object file
instruction
symbol table
.s file
instruction
input
assemble
data section
instruction
text section
instruction
bss
section
output
.o file
28
ELF: Executable and Linking Format
• Format of .o and a.out files
 Output by the assembler
 Input and output of linker
ELF Header
optional for .o files
Program Hdr
Table
Section 1
...
Section n
optional for a.out files
Section Hdr
Table
29
Invoking the Linker
• ld bar.o main.o
compiled
program
modules
–l libc.a
library
(contains
more
.o files)
–o a.out
output
(also in “.o”
format, but
no undefined
symbols)
• Invoked automatically by gcc,
• but you can call it directly if you like.
30
Multiple Object Files
main.o
start
def
disp_l
main
foo
loop
disp_l
bar.o
0
8
15
def
disp_l
[0]
[2]
[4]
[7]
[8]
[12]
[15]
[20]
loop
foo
disp_l
-10
0
5
D 0 3
+13 7
5
E
E
9 [0]
D [2]
2 [4]
8 [5]
9 [10]
[15]
31
Step 1: Pick An Order
main.o
start
def
disp_l
main
foo
loop
disp_l
bar.o
15+0
15+8
15+15
def
disp_l
[15]
[17]
[19]
[22]
[23]
[27]
[30]
[35]
loop
foo
disp_l
-10
0
5
D 0 3
+13 7
5
E
E
9 [0]
D [2]
2 [4]
8 [5]
9 [10]
[15]
32
Step 1: Pick An Order
main.o
start
def
disp_l
main
foo
loop
disp_l
bar.o
15+0
15+8
15+15
def
disp_l
[15]
[17]
[19]
[22]
[23]
[27]
[30]
[35]
loop
foo
disp_l
-10
0
5
D 0 3
+13 7
5
E
E
9 [0]
D [2]
2 [4]
8 [5]
9 [10]
[15]
33
Step 2: Patch
main.o
start
def
disp_l
main
foo
loop
0-(15+15)=-30
disp_l
bar.o
15+0
15+8
15+15
def
disp_l
[15]
[17]
[19]
[22]
[23]
[27]
[30]
[35]
loop
foo
0
5
D 0 3
+13 7
5
15+8-5=18
E
E
-10
9 [0]
D [2]
2 [4]
8 [5]
9 [10]
[15]
34
Step 2: Patch
main.o
start
def
disp_l
main
foo
loop
0-(15+15)=-30
disp_l
bar.o
15+0
15+8
15+15
def
disp_l
[15]
[17]
[19]
[22]
[23]
[27]
[30]
[35]
loop
foo
0
5
D 0 3
+13 7
5
15+8-5=18
E
E
-10
9 [0]
D [2]
2 [4]
8 [5]
9 [10]
[15]
35
Step 2: Patch
main.o
start
def
disp_l
main
foo
loop
0-(15+15)=-30
disp_l
bar.o
15+0
15+8
15+15
def
disp_l
[15]
[17]
[19]
[22]
[23]
[27]
[30]
[35]
loop
foo
0
5
D 0 3
+13 7
5
15+8-5=18
E
E
-10
9 [0]
D [2]
2 [4]
8 [5]
9 [10]
[15]
36
Step 2: Patch
main.o
start
def
disp_l
main
foo
loop
0-(15+15)=-30
disp_l
bar.o
15+0
15+8
15+15
def
disp_l
[15]
[17]
[19]
[22]
[23]
[27]
[30]
[35]
loop
foo
0
5
D 0 3
+13 7
5
15+8-5=18
E
E
-10
9 [0]
D [2]
2 [4]
8 [5]
9 [10]
[15]
37
Step 2: Patch
main.o
start
def
disp_l
main
foo
loop
0-(15+15)=-30
bar.o
15+0
15+8
15+15
def
disp_l
[15]
[17]
[19]
[22]
[23]
[27]
[30]
[35]
loop
foo
0
5
D 0 3
+13 7
5
15+8-5=18
E
E
-10
9 [0]
D [2]
2 [4]
8 [5]
9 [10]
[15]
38
Step 2: Patch
main.o
start
def
disp_l
main
foo
loop
0-(15+15)=-30
bar.o
15+0
15+8
15+15
def
disp_l
[15]
[17]
[19]
[22]
[23]
[27]
[30]
[35]
loop
foo
0
5
D 0 3
+13 7
5
15+8-5=18
E
E
-10
9 [0]
D [2]
2 [4]
8 [5]
9 [10]
[15]
39
Step 2: Patch
main.o
start
def
disp_l
main
foo
loop
0-(15+15)=-30
bar.o
15+0
15+8
15+15
def
disp_l
[15]
[17]
[19]
[22]
[23]
[27]
[30]
[35]
loop
foo
0
5
D 0 3
+13 7
5
15+8-5=18
E
E
-10
9 [0]
D [2]
2 [4]
8 [5]
9 [10]
[15]
40
Step 3: Concatenate
a.out
start
main
15+0
D 0
+13
+18
-10
-30
3
7
5
E
E
9
D
2
8
9
[0]
[2]
[4]
[5]
[10]
[15]
[17]
[19]
[22]
[23]
[27]
[30]
[35]
41
Summary
• Assember
 Read assembly language
 Two-pass execution (resolve symbols)
 Produce object file
• Linker
 Order object codes
 Patch and resolve displacements
 Produce executable
42
Download