Review ISA and understand instruction encodings
• Arithmetic and Logical Instructions
• Review memory organization
• Memory (data movement) instructions
• Control flow instructions
• Procedure/Function calls
• Program assembly, linking, & encoding
(2)
• Reading 2.8, 2.12
• Appendix A: A.1 - A.6
• Practice Problems: 10, 14, 23
• Goals
Understand the binary encoding of complete program executables
o How can procedures be independently compiled and linked (e.g., libraries)?
o What makes up an executable?
o How do libraries become part of the executable?
o What is the role of the ISA in encoding programs?
o What constitutes the hardware/software interface?
(3)
• Basic functionality
Transfer of parameters & control to procedure
Transfer of results & control back to the calling program
Support for nested procedures
• What is so hard about this?
Consider independently compiled code modules
o Where are the inputs?
o Where should I place the outputs?
o Recall: What do you need to know when you write procedures in C?
(4)
• Where do we pass data?
Preferably registers: make the common case fast
Memory as an overflow area
• Nested procedures
The stack, $fp, $sp and $ra
Saving and restoring machine state
• Set of rules that developers/compilers abide by
Which registers am I permitted to use with no consequence?
Caller and callee save conventions for MIPS
(5)
• Register usage
• What about nested calls?
• What about excess arguments?
        .data
arg1:   .word 22, 20, 16, 4
arg2:   .word 33, 34, 45, 8
        .text
        addi $t0, $0, 4
        move $t3, $0
        move $t1, $0
        move $t2, $0
loop:   beq  $t0, $0, exit
        addi $t0, $t0, -1
        lw   $a0, arg1($t1)
        lw   $a1, arg2($t2)
        jal  func
        add  $t3, $t3, $v0
        addi $t1, $t1, 4
        addi $t2, $t2, 4
        j    loop
func:   sub  $v0, $a0, $a1
        jr   $ra
exit:   ---
[Figure: jal copies PC + 4 into $31; jr $ra copies $31 back into PC]
(6)
• C code:
int leaf_example (int g, int h, int i, int j)
{
    int f;
    f = (g + h) - (i + j);
    return f;
}
Arguments g, …, j are passed in $a0, …, $a3
f in $s0 (we need to save $s0 on stack – we will see why later)
Results are returned in $v0, $v1
[Figure: argument registers $a0, $a1, $a2, $a3 → procedure → result registers $v0, $v1]
(7)
• Procedure call: jump and link
jal ProcedureLabel
Address of following instruction put in $ra
Jumps to target address
• Procedure return: jump register
jr $ra
Copies $ra to program counter
Can also be used for computed jumps
o e.g., for case/switch statements
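The two instructions above can be encoded by hand. The sketch below assumes the standard MIPS32 J-type layout for jal (6-bit opcode 000011, then a 26-bit word address) and the R-type layout for jr (funct 001000). The function names are illustrative only, not part of any assembler.

```python
JAL_OPCODE = 0x03   # 000011
JR_FUNCT = 0x08     # R-type function code for jr

def encode_jal(target_addr):
    """J-type: 6-bit opcode followed by bits 27..2 of the target address."""
    return (JAL_OPCODE << 26) | ((target_addr >> 2) & 0x03FFFFFF)

def encode_jr(rs):
    """R-type with opcode 0: only the rs field and the funct field are set."""
    return (rs << 21) | JR_FUNCT

def jal_target(word, pc):
    """Disassemble: the upper 4 bits of the target come from PC + 4."""
    return ((pc + 4) & 0xF0000000) | ((word & 0x03FFFFFF) << 2)

if __name__ == "__main__":
    print(f"jal 0x00400014 -> {encode_jal(0x00400014):#010x}")
    print(f"jr  $31        -> {encode_jr(31):#010x}")
```

Because only 26 bits of the target fit in the instruction, a jal can reach any word in the same 256 MB region as the instruction that follows it.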
(8)
• MIPS code:
leaf_example:
        addi $sp, $sp, -4       # save $s0 on stack
        sw   $s0, 0($sp)
        add  $t0, $a0, $a1      # procedure body
        add  $t1, $a2, $a3
        sub  $s0, $t0, $t1
        add  $v0, $s0, $zero    # result
        lw   $s0, 0($sp)        # restore $s0
        addi $sp, $sp, 4
        jr   $ra                # return
(9)
[Figure: system-wide memory map, from high address to low address: stack ($sp, $fp), dynamic data, static data ($gp), text (PC), reserved. A call places a new stack frame below the old stack frame; the new frame holds arg registers, return address, saved registers, and local variables. Side labels: compiler, ISA, HW; compiler addressing]
(10)
[Figure: stack frame between $fp and $sp, holding arg 1, arg 2, ..; callee saved registers ($s0-$s7, $fp, $ra); caller saved registers ($a0-$a3, $t0-$t9); local variables]
Call Sequence
1. place excess arguments
2. save caller save registers ($a0-$a3, $t0-$t9)
3. jal
4. allocate stack frame
5. save callee save registers ($s0-$s7, $fp, $ra)
6. set frame pointer
Return
1. place function result in $v0
2. restore callee save registers
3. restore $fp
4. pop frame
5. jr $31
(11)
Name       Register number   Usage
$zero      0                 the constant value 0
$v0-$v1    2-3               values for results and expression evaluation
$a0-$a3    4-7               arguments
$t0-$t7    8-15              temporaries
$s0-$s7    16-23             saved
$t8-$t9    24-25             more temporaries
$gp        28                global pointer
$sp        29                stack pointer
$fp        30                frame pointer
$ra        31                return address
(12)
• $a0 – $a3 : arguments (regs 4 – 7)
• $v0, $v1: result values (regs 2 and 3)
• $t0 – $t9 : temporaries
Can be overwritten by callee
• $s0 – $s7: saved
Must be saved/restored by callee
• $gp : global pointer for static data (reg 28)
• $sp : stack pointer (reg 29)
• $fp : frame pointer (reg 30)
• $ra : return address (reg 31)
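When encoding instructions by hand, the register names above have to be turned into 5-bit register numbers. A minimal sketch of that lookup, covering only the registers listed on these slides; `build_register_map` and `saved_by_callee` are illustrative names, and the callee-saved set follows the call sequence given earlier ($s0-$s7, $fp, $ra).

```python
def build_register_map():
    """Map MIPS register names (as listed on the slides) to register numbers."""
    regs = {"$zero": 0, "$gp": 28, "$sp": 29, "$fp": 30, "$ra": 31}
    # (name prefix, first index used in the name, first register number, count)
    groups = [("v", 0, 2, 2), ("a", 0, 4, 4), ("t", 0, 8, 8),
              ("s", 0, 16, 8), ("t", 8, 24, 2)]
    for prefix, first_idx, first_num, count in groups:
        for i in range(count):
            regs[f"${prefix}{first_idx + i}"] = first_num + i
    return regs

REGS = build_register_map()

def saved_by_callee(name):
    """True if the slides' convention makes the callee responsible for the register."""
    return name.startswith("$s") and name != "$sp" or name in ("$fp", "$ra")

if __name__ == "__main__":
    for name in ("$a0", "$t8", "$s0", "$ra"):
        print(name, REGS[name], "callee saved" if saved_by_callee(name) else "caller saved")
```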
(13)
• Procedures that call other procedures
• For nested call, caller needs to save on the stack:
Its return address
Any arguments and temporaries needed after the call
• Restore from the stack after the call
(14)
• C code:
int fact (int n)
{
    if (n < 1) return 1;
    else return n * fact(n - 1);
}
Argument n in $a0
Result in $v0
(15)
1. Allocate stack frame ( decrement stack pointer )
2. Save any registers ( callee save registers )
3. Procedure body ( remember some arguments may be on the stack! )
4. Restore registers ( callee save registers )
5. Pop stack frame ( increment stack pointer )
6. Return ( jr $ra )
(16)
int fact (int n)
{                                      /* callee save */
    if (n < 1) return 1;
    else return n * fact(n - 1);       /* restore */
}
(17)
• MIPS code:
fact:   addi $sp, $sp, -8       # callee save: adjust stack for 2 items
        sw   $ra, 4($sp)        # save return address
        sw   $a0, 0($sp)        # save argument
        slti $t0, $a0, 1        # termination check: test for n < 1
        beq  $t0, $zero, L1
        addi $v0, $zero, 1      # leaf node: if so, result is 1
        addi $sp, $sp, 8        # pop 2 items from stack
        jr   $ra                # and return
L1:     addi $a0, $a0, -1       # else decrement n
        jal  fact               # recursive call
        lw   $a0, 0($sp)        # intermediate node: restore original n
        lw   $ra, 4($sp)        # and return address
        addi $sp, $sp, 8        # pop 2 items from stack
        mul  $v0, $a0, $v0      # multiply to get result
        jr   $ra                # and return
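To see what the stack holds during the recursion, the MIPS routine can be modeled with an explicit stack. This is only a sketch: a Python list stands in for memory below $sp, each frame is two entries (the return address and n), and the string "caller" is a placeholder for the saved $ra value.

```python
def fact_with_explicit_stack(n):
    """Mirror the MIPS fact routine: every call pushes a two-word frame."""
    stack = []        # models memory below the initial $sp
    max_words = 0     # deepest the stack ever gets, in words

    def fact(a0):
        nonlocal max_words
        stack.append(("ra", "caller"))    # sw $ra, 4($sp)
        stack.append(("a0", a0))          # sw $a0, 0($sp)
        max_words = max(max_words, len(stack))
        if a0 < 1:                        # slti / beq termination check
            del stack[-2:]                # addi $sp, $sp, 8
            return 1                      # leaf node: result is 1
        v0 = fact(a0 - 1)                 # jal fact
        _, saved_n = stack[-1]            # lw $a0, 0($sp)
        del stack[-2:]                    # addi $sp, $sp, 8
        return saved_n * v0               # mul $v0, $a0, $v0

    result = fact(n)
    return result, max_words, len(stack)

if __name__ == "__main__":
    result, max_words, leftover = fact_with_explicit_stack(5)
    print(result, max_words, leftover)
```

For n = 5 there are six nested activations (n = 5 down to n = 0), so the model peaks at twelve words and ends with an empty stack, which is the balance between pushes and pops that the calling convention requires.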
(18)
(19)
Reading: 2.12, A2, A3, A4, A5
[Figure: translation flow: C program → compiler → assembly → assembler → object module (plus object library) → linker → executable → loader → memory]
(20)
• Create a binary encoding of all native instructions
Translation of all pseudo-instructions
Computation of all branch offsets and jump addresses
Symbol table for unresolved (library) references
• Create an object file with all pertinent information
Header (information)
Text segment
Data segment
Relocation information
Symbol table
(21)
• One pass vs. two pass assembly
• Effect of fixed vs. variable length instructions
• Time, space and one pass assembly
• Local labels, global labels, external labels and the symbol table
What does it mean when a symbol is unresolved?
• Absolute addresses and re-location
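A rough sketch of two-pass assembly, assuming every source line is one 4-byte native instruction (no pseudo-instruction expansion) and a text base of 0x00400000. The function names and the tiny program are made up for illustration. Pass 1 builds the symbol table; pass 2 resolves label operands and reports what is still unresolved.

```python
def first_pass(lines, base=0x00400000):
    """Pass 1: assign an address to every label (4 bytes per instruction)."""
    symbols, instructions, addr = {}, [], base
    for line in lines:
        line = line.split("#")[0].strip()     # drop comments
        if not line:
            continue
        if ":" in line:
            label, line = line.split(":", 1)
            symbols[label.strip()] = addr
            line = line.strip()
        if line:
            instructions.append((addr, line))
            addr += 4
    return symbols, instructions

def second_pass(symbols, instructions):
    """Pass 2: resolve label operands; collect anything still unresolved."""
    resolved, unresolved = [], []
    for addr, text in instructions:
        op, *rest = text.replace(",", " ").split()
        if op in ("j", "jal", "beq", "bne"):
            label = rest[-1]
            if label in symbols:
                resolved.append((addr, op, symbols[label]))
            else:
                unresolved.append((addr, label))
    return resolved, unresolved

program = [
    "main: addi $4, $0, 4",
    "loop: addi $4, $4, -1",
    "      bne  $4, $0, loop",
    "      jal  func_add      # defined in another file",
    "      j    done          # forward reference",
    "done: jr   $31",
]

symbols, instructions = first_pass(program)
resolved, unresolved = second_pass(symbols, instructions)
```

The forward reference to done is what makes a single pass awkward: its address is not known when the j is first seen. The reference to func_add stays unresolved after both passes, so it has to be recorded in the object file for the linker.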
(22)
        .data
L1:     .word 0x44,22,33,55     # array
        .text
        .globl main
main:   la   $t0, L1
        li   $t1, 4
        add  $t2, $t2, $zero
loop:   lw   $t3, 0($t0)
        add  $t2, $t2, $t3
        addi $t0, $t0, 4
        addi $t1, $t1, -1
        bne  $t1, $zero, loop
        bgt  $t2, $0, then
        move $s0, $t2
        j    exit
then:   move $s1, $t2
exit:   li   $v0, 10
        syscall
What changes when you relocate code?
[00400000] 3c081001 lui $8, 4097 [L1]
[00400004] 34090004 ori $9, $0, 4
[00400008] 01405020 add $10, $10, $0
[0040000c] 8d0b0000 lw $11, 0($8)
[00400010] 014b5020 add $10, $10, $11
[00400014] 21080004 addi $8, $8, 4
[00400018] 2129ffff addi $9, $9, -1
[0040001c] 1520fffc bne $9, $0, -16 [loop-0x0040001c]
[00400020] 000a082a slt $1, $0, $10
[00400024] 14200003 bne $1, $0, 12 [then-0x00400024]
[00400028] 000a8021 addu $16, $0, $10
[0040002c] 0810000d j 0x00400034 [exit]
[00400030] 000a8821 addu $17, $0, $10
[00400034] 3402000a ori $2, $0, 10
[00400038] 0000000c syscall
[Figure labels: assembly program (the source above); assembled binary (second column of the listing); native instructions (third column of the listing)]
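The branch words in the listing can be recomputed from the addresses. A sketch, assuming the standard I-type layout for bne (opcode 000101). The listing measures offsets from the branch instruction itself (-16 and 12 bytes), so that is the default here; the `relative_to_next` flag is an added illustration of the hardware convention, where the offset is taken from PC + 4.

```python
BNE_OPCODE = 0x05   # 000101

def encode_bne(rs, rt, branch_addr, target_addr, relative_to_next=False):
    """I-type bne: opcode, rs, rt, then a signed 16-bit offset counted in words."""
    origin = branch_addr + 4 if relative_to_next else branch_addr
    byte_offset = target_addr - origin
    word_offset = (byte_offset >> 2) & 0xFFFF     # two's complement, 16 bits
    return (BNE_OPCODE << 26) | (rs << 21) | (rt << 16) | word_offset

if __name__ == "__main__":
    # bne $9, $0, loop   at 0x0040001c, loop at 0x0040000c
    print(f"{encode_bne(9, 0, 0x0040001C, 0x0040000C):08x}")
    # bne $1, $0, then   at 0x00400024, then at 0x00400030
    print(f"{encode_bne(1, 0, 0x00400024, 0x00400030):08x}")
```

Since only the distance between the branch and its target is encoded, these words stay the same wherever the code is loaded.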
(23)
• Linker
“Links” independently compiled modules
Determines “real” addresses
Updates the executables with real addresses
• Loader
Loads the executable from disk into memory and starts it
Specifics are operating system dependent
(24)
[Figure: Program A and Program B are translated separately into Assembly A and Assembly B, with cross reference labels between them. Object file contents: header, text, static data, reloc, symbol table, debug]
• Why do we need independent compilation?
Study: Example on pg. 127
• What are the issues with respect to independent compilation?
• references across files ( can be to data or code! )
• absolute addresses and relocation
(25)
# separate file
        .text
        addi $4, $0, 4          0x20040004
        addi $5, $0, 5          0x20050005
        jal  func_add           000011 (opcode only; target address unresolved)
        done                    0x3402000a
                                0x0000000c

# separate file
        .text
        .globl func_add
func_add:
        add  $2, $4, $5         0x00851020
        jr   $31                0x03e00008

Linked image:
0x00400000      0x20040004
0x00400004      0x20050005
0x00400008      ?               Ans: 0x0c100005
0x0040000c      0x3402000a
0x00400010      0x0000000c
0x00400014      0x00851020
0x00400018      0x03e00008
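The answer above can be reproduced with a toy linker. This is a sketch under simplifying assumptions: each module is a dictionary with its text words, the symbols it defines (as byte offsets), and relocation entries that all refer to jal instructions; modules are placed back to back starting at 0x00400000. The dictionary layout and the name `link` are invented for the example and do not correspond to a real object file format.

```python
JAL = 0x03 << 26   # jal opcode with an empty 26-bit target field

def link(modules, base=0x00400000):
    """Lay modules out consecutively, then patch each jal named in a relocation entry."""
    starts, symbols, addr = [], {}, base
    for mod in modules:                         # step 1: assign final addresses
        starts.append(addr)
        for name, offset in mod["defines"].items():
            symbols[name] = addr + offset
        addr += 4 * len(mod["text"])
    image = []
    for mod in modules:                         # step 2: patch and concatenate
        words = list(mod["text"])
        for offset, name in mod["relocs"]:
            if name not in symbols:
                raise KeyError(f"unresolved symbol: {name}")
            words[offset // 4] = JAL | ((symbols[name] >> 2) & 0x03FFFFFF)
        image.extend(words)
    return image, symbols

caller = {
    "text": [0x20040004, 0x20050005, JAL, 0x3402000A, 0x0000000C],
    "defines": {},
    "relocs": [(8, "func_add")],     # the jal sits at byte offset 8
}
callee = {
    "text": [0x00851020, 0x03E00008],
    "defines": {"func_add": 0},
    "relocs": [],
}

image, symbols = link([caller, callee])
```

func_add lands at 0x00400014, so the 26-bit field becomes 0x00400014 >> 2 = 0x100005 and the patched word is 0x0c100005.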
(26)
• Load from image file on disk into memory
1. Read header to determine segment sizes
2. Create virtual address space ( later )
3. Copy text and initialized data into memory
o Or set page table entries so they can be faulted in
4. Set up arguments on stack
5. Initialize registers (including $sp, $fp, $gp)
6. Jump to startup routine
o Copies arguments to $a0, … and calls main
o When main returns, do exit syscall
(27)
• Static Linking
All labels are resolved at link time
Link all procedures that may be called by the program
Size of executables?
• Dynamic Linking: Only link/load library procedure when it is called
Requires procedure code to be relocatable
Avoids image bloat caused by static linking of all ( transitively ) referenced libraries
Automatically picks up new library versions
(28)
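[Figure: lazy linkage. A call goes through the indirection table to a stub; the stub loads the routine ID and jumps to the linker/loader; the linker/loader code maps the routine in (dynamically mapped code)]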
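The indirection-table idea can be modeled in a few lines. In this sketch a dictionary plays the role of the indirection table, a closure plays the role of the stub, and looking up a callable in `library` stands in for the linker/loader mapping the routine. The class and attribute names are illustrative assumptions.

```python
class LazyLinker:
    """Toy model of lazy procedure linkage through an indirection table."""

    def __init__(self, library):
        self.library = library     # routine ID -> callable (stands in for code on disk)
        self.table = {}            # indirection table: routine ID -> stub or routine
        self.resolutions = 0       # how often the linker/loader was entered
        for routine_id in library:
            self.table[routine_id] = self._make_stub(routine_id)

    def _make_stub(self, routine_id):
        def stub(*args):
            # Stub: pass the routine ID to the linker/loader, patch the table, retry.
            self.table[routine_id] = self._resolve(routine_id)
            return self.table[routine_id](*args)
        return stub

    def _resolve(self, routine_id):
        self.resolutions += 1
        return self.library[routine_id]

    def call(self, routine_id, *args):
        return self.table[routine_id](*args)

linker = LazyLinker({"func_add": lambda a, b: a + b})
first = linker.call("func_add", 4, 5)    # goes through the stub
second = linker.call("func_add", 4, 5)   # goes straight to the routine
```

Only the first call pays for resolution; afterwards the table entry points directly at the routine, and routines that are never called are never linked.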
(29)
[Figure: Program Execution and the von Neumann model. Processor: register file 0x00 to 0x1F (programmer visible state); program counter, instruction register, and kernel registers (labelled programmer invisible state); arithmetic logic unit (ALU); processor internal buses; memory interface. Memory map up to 0xFFFFFFFF: stack, dynamic data, data segment (static), text segment, reserved]
(30)
• Instruction set architectures are characterized by several features
1. Operations
Types, precision, size
2. Organization of internal storage
Stack machine
Accumulator
General Purpose Registers (GPR)
3. Memory addressing
Operand location and addressing
(31)
4. Memory abstractions
Segments, virtual address spaces (more later)
Memory mapped I/O (later)
5. Control flow
Condition codes
Types of control transfers – conditional vs. unconditional
• ISA design is the result of many tradeoffs
Decisions determine hardware implementation
Impact on time, space, and energy
• Check out ISAs for PowerPC, ARM, x86, SPARC, etc.
(32)
• ARM: the most popular embedded core
• Similar basic set of instructions to MIPS
                         ARM               MIPS
Date announced           1985              1985
Instruction size         32 bits           32 bits
Address space            32-bit flat       32-bit flat
Data alignment           Aligned           Aligned
Data addressing modes    9                 3
Registers                15 × 32-bit       31 × 32-bit
Input/output             Memory mapped     Memory mapped
(33)
• Uses condition codes for result of an arithmetic/logical instruction
Negative, zero, carry, overflow
Compare instructions to set condition codes without keeping the result
• Each instruction can be conditional
Top 4 bits of instruction word: condition value
Can avoid branches over single instructions
[Figure: CPU/core with registers $0, $1, …, $31, an ALU, and the condition code flags Z, V, C, N]
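A small model of how a compare sets the four flags and how a condition value is then tested. The sketch assumes 32-bit operands and the usual ARM meaning of the carry flag after a subtraction (C = 1 when no borrow occurred); only four condition names are included, and the function names are illustrative.

```python
MASK32 = 0xFFFFFFFF

def compare(a, b):
    """Set N, Z, C, V as the 32-bit subtraction a - b would, discarding the result."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    return {
        "N": result >> 31,                          # result is negative
        "Z": int(result == 0),                      # result is zero
        "C": int(a >= b),                           # carry set means no borrow
        "V": ((a ^ b) & (a ^ result)) >> 31,        # signed overflow
    }

def condition_holds(cond, flags):
    """Evaluate a condition value against the flags (subset of the ARM conditions)."""
    tests = {
        "EQ": flags["Z"] == 1,
        "NE": flags["Z"] == 0,
        "LT": flags["N"] != flags["V"],             # signed less than
        "GE": flags["N"] == flags["V"],             # signed greater or equal
    }
    return tests[cond]

if __name__ == "__main__":
    flags = compare(3, 5)
    print(flags, condition_holds("LT", flags))
```

An instruction tagged with LT would execute after compare(3, 5) and be skipped after compare(5, 3), which is how a short if-body can be run without a branch.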
(34)
Differences?
(35)
• Evolution with backward compatibility
8080 (1974): 8-bit microprocessor
o Accumulator, plus 3 index-register pairs
8086 (1978): 16-bit extension to 8080
o Complex instruction set (CISC)
8087 (1980): floating-point coprocessor
o Adds FP instructions and register stack
80286 (1982): 24-bit addresses, MMU
o Segmented memory mapping and protection
80386 (1985): 32-bit extension (now IA-32)
o Additional addressing modes and operations
o Paged memory mapping as well as segments
(36)
• Further evolution…
i486 (1989): pipelined, on-chip caches and FPU
Pentium (1993): superscalar, 64-bit datapath
o Later versions added MMX (Multi-Media eXtension) instructions
o The infamous FDIV bug
Pentium Pro (1995), Pentium II (1997)
o New microarchitecture (see Colwell, The Pentium Chronicles)
Pentium III (1999)
o Added SSE (Streaming SIMD Extensions) and associated registers
Pentium 4 (2001)
o New microarchitecture
o Added SSE2 instructions
(37)
• And further…
AMD64 (2003): extended architecture to 64 bits
EM64T – Extended Memory 64 Technology (2004)
o AMD64 adopted by Intel (with refinements)
o Added SSE3 instructions
Intel Core (2006)
o Added SSE4 instructions, virtual machine support
AMD64 (announced 2007): SSE5 instructions
Intel Advanced Vector Extension (AVX announced 2008)
• If Intel didn't extend with compatibility, its competitors would!
Technical elegance ≠ market success
• Commonly thought of as a Complex Instruction Set Architecture (CISC)
(38)
(39)
• Two operands per instruction
Source/dest operand      Second source operand
Register                 Register
Register                 Immediate
Register                 Memory
Memory                   Register
Memory                   Immediate
• Memory addressing modes
Address in register
Address = R_base + displacement
Address = R_base + 2^scale × R_index (scale = 0, 1, 2, or 3)
Address = R_base + 2^scale × R_index + displacement
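All four addressing modes are special cases of one formula, which the sketch below evaluates. It assumes 32-bit addresses that wrap around; the function name and the sample values are made up for illustration.

```python
def effective_address(base, index=0, scale=0, displacement=0):
    """Address = R_base + 2^scale * R_index + displacement, wrapped to 32 bits."""
    if scale not in (0, 1, 2, 3):
        raise ValueError("scale must be 0, 1, 2, or 3")
    return (base + (index << scale) + displacement) & 0xFFFFFFFF

if __name__ == "__main__":
    # Element 3 of an array of 4-byte words that starts 8 bytes past the base register.
    print(hex(effective_address(0x1000, index=3, scale=2, displacement=8)))
```

With scale = 2 the index is multiplied by 4, so indexing an array of words takes one instruction instead of a separate shift and add as on MIPS.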
(40)
• Variable length encoding
Postfix bytes specify addressing mode
Prefix bytes modify operation
o Operand length, repetition, locking, …
(41)
• Complex instruction set makes implementation difficult
Hardware translates instructions to simpler microoperations
o Simple instructions: 1–1
o Complex instructions: 1–many
Microengine similar to RISC
Market share makes this economically viable
• Comparable performance to RISC
Compilers avoid complex instructions
• Better code density
(42)
• Fallacy: powerful instructions ⇒ higher performance
Fewer instructions required
But complex instructions are hard to implement
o May slow down all instructions, including simple ones
Compilers are good at making fast code from simple instructions
• Fallacy: use assembly code for high performance
But modern compilers are better at dealing with modern processors
More lines of code ⇒ more errors and less productivity
(43)
• Fallacy: backward compatibility ⇒ instruction set does not change
But they do accrete more instructions
[Figure: x86 instruction set]
(44)
• Instruction complexity is only one variable
lower instruction count vs. higher CPI / lower clock rate
• Design Principles:
simplicity favors regularity
smaller is faster
good design demands compromise
make the common case fast
• Instruction set architecture
a very important abstraction indeed!
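The tradeoff between instruction count, CPI, and clock rate can be made concrete with the usual relation CPU time = instruction count × CPI / clock rate. The two machines below are hypothetical; their numbers are invented only to show that a lower instruction count does not by itself give a lower execution time.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """Execution time in seconds: instructions * cycles per instruction / cycles per second."""
    return instruction_count * cpi / clock_rate_hz

# Hypothetical machine with complex instructions: fewer instructions, higher CPI, slower clock.
complex_isa = cpu_time(instruction_count=8e8, cpi=2.0, clock_rate_hz=2.0e9)

# Hypothetical machine with simple instructions: more instructions, lower CPI, faster clock.
simple_isa = cpu_time(instruction_count=1e9, cpi=1.2, clock_rate_hz=2.5e9)

if __name__ == "__main__":
    print(f"complex: {complex_isa:.2f} s, simple: {simple_isa:.2f} s")
```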
(45)
• Compute number of bytes to encode a SPIM program
• What does it mean for a code segment to be relocatable?
• Identify addresses that need to be modified when a program is relocated.
Given the new start address modify the necessary addresses
• Given the assembly of an independently compiled procedure, ensure that it follows the MIPS calling conventions, modifying it if necessary
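For the relocation questions above, a sketch that moves the text words of the earlier SPIM listing to a new start address. It only handles j and jal, whose 26-bit fields hold absolute word addresses; branches are position independent and need no change. It also assumes the data segment stays where it is, so the lui that loads the address of L1 is left alone. The function name is illustrative.

```python
def relocate(words, old_base, new_base):
    """Return (patched words, indices changed) after moving the text segment."""
    delta = new_base - old_base
    patched, changed = [], []
    for i, word in enumerate(words):
        opcode = word >> 26
        if opcode in (0x02, 0x03):                 # j, jal: absolute target
            old_pc = old_base + 4 * i
            target = ((old_pc + 4) & 0xF0000000) | ((word & 0x03FFFFFF) << 2)
            new_target = target + delta
            word = (opcode << 26) | ((new_target >> 2) & 0x03FFFFFF)
            changed.append(i)
        patched.append(word)
    return patched, changed

listing = [
    0x3C081001, 0x34090004, 0x01405020, 0x8D0B0000, 0x014B5020,
    0x21080004, 0x2129FFFF, 0x1520FFFC, 0x000A082A, 0x14200003,
    0x000A8021, 0x0810000D, 0x000A8821, 0x3402000A, 0x0000000C,
]

patched, changed = relocate(listing, 0x00400000, 0x00500000)
```

Only the word at index 11 (j exit) changes: its target moves from 0x00400034 to 0x00500034, so 0x0810000d becomes 0x0814000d.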
(46)
• Given a SPIM program with nested procedures, ensure that you know what registers are stored in the stack as a consequence of a call
• Encode/disassemble jal and jr instructions
• Computation of jal encodings for independently compiled modules
• How can I make procedure calls faster?
Hint: What is it about a call that takes time?
• How are independently compiled modules linked into a single executable? (assuming one calls a procedure located in another)
(47)
• Argument registers
• Caller save registers
• Callee save registers
• Disassembly
• Frame pointer
• Independent compilation
• Labels: local, global, external
• Linker/loader
• Linking: static vs. dynamic vs. lazy
• Native instructions
• Nested procedures
• Object file
• One/two pass assembly
• Procedure invocation
• Pseudo instructions
• Relocatable code
• Stack frame
• Stack pointer
• Symbol table
• Unresolved symbol
(48)