COMPUTER ORGANIZATION AND DESIGN
The Hardware/Software Interface
Chapter 2
Instructions: Language of the Computer
(Part 2)
Outline
2.8 Supporting Procedures in Computer Hardware
2.9 Communicating with People
2.10 RISC-V Addressing for Wide Immediates and Addresses
2.11 Parallelism and Instructions: Synchronization
2.12 Translating and Starting a Program
Chapter 2 (Part 2) — 2

2.8 Supporting Procedures in Computer Hardware
Procedure Call: An Example
Procedure calls change control flow.

Caller:
...
c = sum(a,b);
...

Callee:
int sum(int x, int y) {
    int temp;
    temp = x + y;
    return temp;
}

Must specify:
- Procedure address
- Arguments
- Local variables
- Return value
- Return address

Where to put these data? How to translate into machine code?
Procedure Call
Required steps:
1. Place arguments in registers (x10 to x17)
2. Transfer control to procedure
3. Acquire storage for procedure
4. Perform procedure’s operations
5. Place result in register (x10, x11)
6. Return to place of call (return address in x1)
Procedure Call Instructions
Procedure call: jump and link
jal x1, ProcedureLabel
- Address of following instruction put in x1
- Jumps to target address
Procedure return: jump and link register
jalr x0, 0(x1)
- Jumps to 0 + address in x1
- Uses x0 as rd: we do not need the return address, and x0 cannot be changed (always 0)
- Can also be used for computed jumps
  - e.g., for case/switch statements
Procedure Call Instructions
jal x1, ProcedureAddr
- Jump to ProcedureAddr and simultaneously save the address of the following instruction (PC + 4) in register x1
call:    jal  x1, ProcedureAddr
return:  jalr x0, 0(x1)
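The register-transfer behavior of jal and jalr can be written down as a tiny C model. This is a sketch with hypothetical names (`cpu`, `exec_jal`, `exec_jalr`): it models only the PC and the 32 integer registers, discards writes to x0, and follows the RISC-V rule that jalr clears bit 0 of the computed target.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical minimal machine state: PC plus the 32 integer registers. */
typedef struct {
    uint64_t pc;
    uint64_t x[32];
} cpu;

/* Register write that discards writes to x0, which is hard-wired to 0. */
static void write_reg(cpu *c, unsigned rd, uint64_t value) {
    assert(rd < 32);
    if (rd != 0)
        c->x[rd] = value;
}

/* jal rd, offset: rd = PC + 4 (return address); PC = PC + offset. */
void exec_jal(cpu *c, unsigned rd, int64_t offset) {
    uint64_t link = c->pc + 4;
    c->pc += (uint64_t)offset;
    write_reg(c, rd, link);
}

/* jalr rd, offset(rs1): target = (rs1 + offset) with bit 0 cleared;
   rd = PC + 4. The target is computed first in case rd == rs1. */
void exec_jalr(cpu *c, unsigned rd, int64_t offset, unsigned rs1) {
    assert(rs1 < 32);
    uint64_t target = (c->x[rs1] + (uint64_t)offset) & ~(uint64_t)1;
    write_reg(c, rd, c->pc + 4);
    c->pc = target;
}
```

A call is `exec_jal(&c, 1, offset)` (jal x1, ...) and the matching return is `exec_jalr(&c, 0, 0, 1)` (jalr x0, 0(x1)); because rd is x0 on the return, the link value is simply dropped.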
Execution of a Procedure
Caller
(1) Place arguments in registers (x10-x17)
    - If there are more than 8 arguments, push the extra ones onto the stack (stack pointer: x2 or sp)
(2) Transfer control to the callee with jal x1, ProcedureAddr
Callee
(3) Allocate required storage on the stack
(4) Perform the desired task
(5) Place the result in registers (x10, x11)
(6) Return control to the caller with jalr x0, 0(x1)
Caller
(7) Get the return result from registers (x10, x11)
Leaf Procedure Example
C code:
long long int leaf_example (
    long long int g, long long int h,
    long long int i, long long int j) {
    long long int f;
    f = (g + h) - (i + j);
    return f;
}
- Arguments g, …, j in x10, …, x13
- Local variable f in x20
- Temporaries x5, x6
- Save x5, x6, x20 on stack before use
- Result in x10
Leaf Procedure Example
RISC-V code:
leaf_example:
    addi sp,sp,-24      # save x5, x6, x20 on stack
    sd   x5,16(sp)
    sd   x6,8(sp)
    sd   x20,0(sp)
    add  x5,x10,x11     # x5 = g + h
    add  x6,x12,x13     # x6 = i + j
    sub  x20,x5,x6      # f = x5 - x6
    addi x10,x20,0      # copy f to result register x10
    ld   x20,0(sp)      # restore x5, x6, x20 from stack
    ld   x6,8(sp)
    ld   x5,16(sp)
    addi sp,sp,24
    jalr x0,0(x1)       # return to caller
Local Data on the Stack
[Figure: stack contents (a) before, (b) during, and (c) after the procedure call]
Register Name, Use, Calling Convention
Register | ABI Name | Use                              | Saver
x0       | zero     | Hard-wired zero                  | −
x1       | ra       | Return address                   | Caller
x2       | sp       | Stack pointer                    | Callee
x3       | gp       | Global pointer                   | −
x4       | tp       | Thread pointer                   | −
x5-x7    | t0-t2    | Temporaries                      | Caller
x8       | s0/fp    | Saved register/frame pointer     | Callee
x9       | s1       | Saved register                   | Callee
x10-x11  | a0-a1    | Function arguments/return values | Caller
x12-x17  | a2-a7    | Function arguments               | Caller
x18-x27  | s2-s11   | Saved registers                  | Callee
x28-x31  | t3-t6    | Temporaries                      | Caller
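The register table above is, in effect, a lookup table from register number to ABI name and saver. A minimal C sketch of that table follows; the array, enum, and function names are hypothetical, and the frame-pointer alias fp for x8 is deliberately left out of the reverse lookup.

```c
#include <assert.h>
#include <string.h>

/* ABI names for x0..x31, in register-number order. */
static const char *const ABI_NAMES[32] = {
    "zero", "ra", "sp",  "gp",  "tp", "t0", "t1", "t2",   /* x0-x7   */
    "s0",   "s1", "a0",  "a1",  "a2", "a3", "a4", "a5",   /* x8-x15  */
    "a6",   "a7", "s2",  "s3",  "s4", "s5", "s6", "s7",   /* x16-x23 */
    "s8",   "s9", "s10", "s11", "t3", "t4", "t5", "t6"    /* x24-x31 */
};

enum saver { SAVER_NONE, SAVER_CALLER, SAVER_CALLEE };

const char *abi_name(int reg) {
    assert(reg >= 0 && reg < 32);
    return ABI_NAMES[reg];
}

/* Reverse lookup: ABI name -> register number, or -1 if unknown. */
int abi_reg(const char *name) {
    for (int r = 0; r < 32; r++)
        if (strcmp(ABI_NAMES[r], name) == 0)
            return r;
    return -1;
}

/* Who preserves the register across a call, per the "Saver" column. */
enum saver reg_saver(int reg) {
    assert(reg >= 0 && reg < 32);
    if (reg == 0 || reg == 3 || reg == 4)
        return SAVER_NONE;                       /* zero, gp, tp */
    if (reg == 2 || reg == 8 || reg == 9 || (reg >= 18 && reg <= 27))
        return SAVER_CALLEE;                     /* sp, s0-s11 */
    return SAVER_CALLER;                         /* ra, t0-t6, a0-a7 */
}
```

For instance, x20 from the leaf example is s4 and callee-saved, while x5 and x6 are the caller-saved temporaries t0 and t1.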
Procedure Call Convention
Caller: ... sum(a,b)...;
Callee:
long long int sum(long long int x, long long int y) {
    long long int temp;
    temp = x + y;
    return temp;
}
Return address     ra (x1)
Procedure address  Labels
Arguments          a0-a7 (x10-x17)
Local variables    t0-t6 (x5-x7, x28-x31)
Return value       a0, a1 (x10, x11)
Why Procedure Call Convention?
It serves as a contract between caller and callee, so that people who have never seen or even communicated with each other can write procedures that work together.
- Preserved: if the callee uses these registers, it saves them on the stack and restores them before returning
- Not preserved: the callee uses these registers freely without preserving them
  - So if the caller needs their values after the call, the caller has to preserve them
If the software relies on the global pointer register, it is also preserved.
Based on this convention, x5 and x6 in the leaf procedure example need not be saved (they are caller-saved temporaries); x20 is a saved register, so the callee must still save it.
Non-Leaf Procedures
Procedures that call other procedures
For nested calls, the caller needs to save on the stack:
- Its return address (x1)
- Any arguments (x10-x17) and temporaries (x5-x7, x28-x31) needed after the call
Restore the saved registers from the stack after the call
Non-Leaf Procedure Example
C code:
long long int fact (long long int n)
{
    if (n < 1) return 1;
    else return n * fact(n - 1);
}
- Argument n in x10
- Result in x10
Non-Leaf Procedure Example
fact:
    addi sp,sp,-16      # make space on stack
    sd   x1,8(sp)       # save return address in x1 onto stack
    sd   x10,0(sp)      # save argument in x10 onto stack
    addi x5,x10,-1      # x5 = n - 1
    bge  x5,x0,L1       # if n >= 1, go to L1
    addi x10,x0,1       # else, set return value to 1
    addi sp,sp,16       # pop stack, don't bother restoring values
    jalr x0,0(x1)       # return
L1: addi x10,x10,-1     # n = n - 1
    jal  x1,fact        # call fact(n - 1)
    addi x6,x10,0       # move return value of fact(n - 1) to x6
    ld   x10,0(sp)      # restore caller's n
    ld   x1,8(sp)       # restore return address
    addi sp,sp,16       # return space on stack
    mul  x10,x10,x6     # return n * fact(n - 1)
    jalr x0,0(x1)       # return
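The recursion above can be mimicked in C with an explicit array standing in for the stack, which makes the number of frames visible. This is a hedged sketch: `saved_n`, `STACK_FRAMES`, and `fact_explicit_stack` are hypothetical, and only the saved copy of n is modeled (not the return address). As in the assembly, even the base case (n < 1) allocates a 16-byte frame, so fact(n) uses n + 1 frames for n ≥ 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FRAMES 64              /* hypothetical stack capacity, in frames */

/* Straight recursive version, matching the C code above. */
int64_t fact(int64_t n) {
    if (n < 1) return 1;
    return n * fact(n - 1);
}

/* Same computation with an explicit stack of saved n values (the
   sd x10,0(sp) / ld x10,0(sp) pair). *frames receives the number of
   activations, counting the base case, which also allocates a frame. */
int64_t fact_explicit_stack(int64_t n, int *frames) {
    int64_t saved_n[STACK_FRAMES];
    int sp = 0;

    /* "Call" phase: each activation with n >= 1 saves n and calls fact(n - 1). */
    while (n >= 1) {
        assert(sp < STACK_FRAMES);   /* stack overflow */
        saved_n[sp++] = n;
        n = n - 1;
    }

    int64_t result = 1;              /* base case: addi x10,x0,1 */
    if (frames != NULL)
        *frames = sp + 1;

    /* "Return" phase: each activation restores its n and computes n * result. */
    while (sp > 0)
        result = saved_n[--sp] * result;   /* mul x10,x10,x6 */
    return result;
}
```

With 16 bytes per frame, fact(3) therefore needs 4 × 16 = 64 bytes of stack at its deepest point.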
Local Data on the Stack
[Figure: stack contents (a) before, (b) during, and (c) after the procedure call]
Local data allocated by callee
- e.g., C automatic variables
Procedure frame (activation record)
- Used by some compilers to manage stack storage
- Frame pointer: x8 or fp
Memory Layout
- Text: program code
- Static data: global variables
  - e.g., static variables in C, constant arrays and strings
  - x3 (global pointer, gp) initialized to an address allowing ±offsets into this segment
- Dynamic data: heap
  - e.g., malloc() and free() in C, new in Java
- Stack: automatic storage (local variables)
Summary: Procedure Calls
The compiler (or assembly programmer) and the processor hardware work together to support/translate procedure calls in HLLs.
Processor hardware provides:
- Registers: sp (stack pointer), ra (return address), …
- Instructions: jal, jalr
Compiler does:
- Allocation of memory space for stack and local variables
- Generation of instructions for managing the stack, passing arguments and return values, jumping to and returning from procedures
2.9 Communicating with People
Character Data
Byte-encoded character sets
- ASCII: 128 characters
  - 95 graphic, 33 control
Unicode: 32-bit character set (code points U+0000 to U+10FFFF)
- Used in Java, …
- Most of the world’s alphabets, plus symbols
- UTF-8, UTF-16: variable-length encodings
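UTF-8, one of the variable-length encodings named above, stores a code point in 1 to 4 bytes, and ASCII characters (below 0x80) remain single bytes. A minimal C sketch of an encoder follows; the function name `utf8_encode` is hypothetical, and it rejects surrogate code points (U+D800 to U+DFFF) and values above U+10FFFF with an assertion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8; returns the byte count (1-4). */
size_t utf8_encode(uint32_t cp, unsigned char out[4]) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x80) {                      /* ASCII: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

For example, 'A' (U+0041) stays the single byte 0x41, while é (U+00E9) becomes the two bytes C3 A9.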
Byte/Halfword/Word Operations
RISC-V byte/halfword/word load/store
- Load byte/halfword/word: sign-extend to 64 bits in rd
    lb rd, offset(rs1)
    lh rd, offset(rs1)
    lw rd, offset(rs1)
- Load byte/halfword/word unsigned: zero-extend to 64 bits in rd
    lbu rd, offset(rs1)
    lhu rd, offset(rs1)
    lwu rd, offset(rs1)
- Store byte/halfword/word: store rightmost 8/16/32 bits
    sb rs2, offset(rs1)
    sh rs2, offset(rs1)
    sw rs2, offset(rs1)
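The difference between sign and zero extension can be modeled in C by reading bytes from a little-endian byte array (RISC-V memory is little-endian). This is a sketch, not a simulator: the function names are hypothetical, and the casts through `int8_t`/`int16_t`/`int32_t` assume a two's-complement C implementation such as GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read `size` bytes (1, 2 or 4) at mem[addr] as a little-endian value. */
static uint64_t read_le(const uint8_t *mem, size_t mem_len, size_t addr, size_t size) {
    assert(size == 1 || size == 2 || size == 4);
    assert(addr + size <= mem_len);
    uint64_t v = 0;
    for (size_t i = 0; i < size; i++)
        v |= (uint64_t)mem[addr + i] << (8 * i);
    return v;
}

/* lb / lh / lw: sign-extend the loaded value to 64 bits. */
int64_t load_signed(const uint8_t *mem, size_t mem_len, size_t addr, size_t size) {
    uint64_t v = read_le(mem, mem_len, addr, size);
    switch (size) {
    case 1:  return (int8_t)v;
    case 2:  return (int16_t)v;
    default: return (int32_t)v;
    }
}

/* lbu / lhu / lwu: zero-extend (the upper bits are simply 0). */
uint64_t load_unsigned(const uint8_t *mem, size_t mem_len, size_t addr, size_t size) {
    return read_le(mem, mem_len, addr, size);
}

/* sb / sh / sw: store only the rightmost 8/16/32 bits of rs2. */
void store_low(uint8_t *mem, size_t mem_len, size_t addr, size_t size, uint64_t rs2) {
    assert(size == 1 || size == 2 || size == 4);
    assert(addr + size <= mem_len);
    for (size_t i = 0; i < size; i++)
        mem[addr + i] = (uint8_t)(rs2 >> (8 * i));
}
```

The byte 0xF0 therefore loads as -16 with lb but as 240 with lbu; the same bits, two different 64-bit register values.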
String Copy Example
C code (null-terminated string):
#include <stddef.h>   /* size_t */

void strcpy (char x[], char y[])
{   size_t i;
    i = 0;
    while ((x[i]=y[i])!='\0')
        i += 1;
}
- x: x10, y: x11 (argument registers)
- i: x19 (saved register → needs to be preserved)
- In practice the compiler would keep i in a temporary register (x5-x7, x28-x31), which needs no saving
String Copy Example
RISC-V code:
strcpy:
    addi sp,sp,-8       # adjust stack for 1 doubleword
    sd   x19,0(sp)      # push x19
    add  x19,x0,x0      # i = 0
L1: add  x5,x19,x11     # x5 = addr of y[i]
    lbu  x6,0(x5)       # x6 = y[i]
    add  x7,x19,x10     # x7 = addr of x[i]
    sb   x6,0(x7)       # x[i] = y[i]
    beq  x6,x0,L2       # if y[i] == 0 then exit
    addi x19,x19,1      # i = i + 1
    jal  x0,L1          # next iteration of loop
L2: ld   x19,0(sp)      # restore saved x19
    addi sp,sp,8        # pop 1 doubleword from stack
    jalr x0,0(x1)       # and return
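The C loop and the RISC-V loop above can be checked against each other with a small harness. The sketch below renames the routine to the hypothetical `copy_string` (defining `strcpy` yourself clashes with the C standard library) and returns i, the number of characters copied before the terminator, purely so the result is easy to test; the comments map each statement to the corresponding instructions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the slide's strcpy: same loop, but returns i
   (characters copied before '\0') so the result is easy to check. */
size_t copy_string(char x[], const char y[]) {
    assert(x != NULL && y != NULL);
    size_t i = 0;                     /* add  x19,x0,x0            */
    while ((x[i] = y[i]) != '\0')     /* lbu / sb / beq x6,x0,L2   */
        i += 1;                       /* addi x19,x19,1; jal x0,L1 */
    return i;
}
```

Note that the terminating '\0' is copied before the loop exits, exactly as the sb precedes the beq in the assembly.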


2.10 RISC-V Addressing for Wide Immediates and Addresses
32-bit Constants
The I-format only allows a 12-bit immediate; what if we want to load a 32-bit constant?
Use Load Upper Immediate (lui) + addi
lui rd, constant
U-type format: immediate[31:12] (20 bits) | rd (5 bits) | opcode (7 bits)
- Copies the 20-bit constant to bits [31:12] of rd
- Extends bit 31 to bits [63:32] and clears bits [11:0] of rd to 0
lui  x19,976          # 976 = 0x003D0
x19 = 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0011 1101 0000 0000 0000 0000
addi x19,x19,1280     # 1280 = 0x500
x19 = 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0011 1101 0000 0101 0000 0000
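Because addi sign-extends its 12-bit immediate, splitting a constant into lui and addi parts is not just taking bits [31:12] and [11:0]: when bit 11 of the constant is 1, the addi immediate is negative, so the lui part must be one larger to compensate. A minimal C sketch under stated assumptions: the struct and function names are hypothetical, the check function reassembles the value in 32-bit arithmetic (as on RV32), and conversions to signed types assume two's complement. On RV64, constants just below 2^31 (0x7FFFF800 to 0x7FFFFFFF) cannot be built with lui + addi because lui sign-extends bit 31; assemblers typically use addiw there.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two immediates. */
typedef struct {
    uint32_t hi20;   /* lui immediate, 0 .. 0xFFFFF  */
    int32_t  lo12;   /* addi immediate, -2048 .. 2047 */
} lui_addi;

lui_addi split_constant(int64_t value) {
    assert(value >= INT32_MIN && value <= INT32_MAX);
    uint32_t v = (uint32_t)value;                /* two's-complement bits */
    lui_addi p;
    p.lo12 = (int32_t)(v & 0xFFF);
    if (p.lo12 >= 0x800)                         /* addi will sign-extend this */
        p.lo12 -= 0x1000;
    p.hi20 = ((v + 0x800u) >> 12) & 0xFFFFF;     /* round up when lo12 < 0 */
    return p;
}

/* Reassemble lui + addi in 32-bit arithmetic: (hi20 << 12) + lo12. */
int32_t join_constant(lui_addi p) {
    uint32_t r = (p.hi20 << 12) + (uint32_t)p.lo12;
    return (int32_t)r;
}
```

For the slide's constant 0x003D0500, bit 11 is 0, so the split is simply lui 976 and addi 1280; for 0x12345FFF the low part becomes -1 and the lui immediate rises to 0x12346.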
Branch Addressing
RISC-V code:
Loop: beq  x6,x0,End
      addi x19,x19,1
      jal  x0,Loop
End:
Branch instructions (SB-type)
imm[12] | imm[10:5] | rs2 | rs1 | funct3 | imm[4:1] | imm[11] | opcode
bne x10,x11,2000      # 2000 = 0 0111 1101 0000
0 | 111110 | 01011 | 01010 | 001 | 1000 | 0 | 1100011
But the immediate field holds only 12 bits of the target offset.
How do we get the full address of the target instruction?
Branch Addressing
Most branch targets are near, forward or backward.
Solution: PC-relative addressing
- Use the PC to supply the 64-bit address and add a signed immediate, because most branch targets are near the branch instruction, whose address is currently held in the PC
- The 12-bit immediate is a signed two’s complement integer, added to the PC if the branch is taken
- The offsets actually count halfwords, i.e., target address = PC + immediate × 2 = PC + {imm, 0} (imm with a 0 bit appended)
- This keeps the flexibility of supporting 2-byte instructions in RISC-V, so the branch immediates represent the number of halfwords between the branch instruction and the branch target
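The SB-type field scrambling above can be expressed as a small C encoder. It is a sketch with a hypothetical function name; it takes the branch offset in bytes (which must be even, giving a reach of -4096 to +4094 bytes), drops bit 0, and places imm[12], imm[10:5], imm[4:1] and imm[11] in the positions shown in the format. For bne, funct3 is 001 and the conditional-branch opcode is 1100011 (0x63).

```c
#include <assert.h>
#include <stdint.h>

/* Encode an SB-type (conditional branch) instruction.
   offset: byte distance from the branch to its target; bit 0 must be 0. */
uint32_t encode_sb(int32_t offset, unsigned rs2, unsigned rs1,
                   unsigned funct3, unsigned opcode) {
    assert(offset % 2 == 0 && offset >= -4096 && offset <= 4094);
    assert(rs2 < 32 && rs1 < 32 && funct3 < 8 && opcode < 128);
    uint32_t u = (uint32_t)offset;               /* two's-complement bits */
    return ((u >> 12) & 0x1u) << 31              /* imm[12]   -> bit 31     */
         | ((u >> 5) & 0x3Fu) << 25              /* imm[10:5] -> bits 30:25 */
         | rs2 << 20
         | rs1 << 15
         | funct3 << 12
         | ((u >> 1) & 0xFu) << 8                /* imm[4:1]  -> bits 11:8  */
         | ((u >> 11) & 0x1u) << 7               /* imm[11]   -> bit 7      */
         | opcode;
}
```

Encoding bne x10,x11,2000 this way gives 0x7CB51863, which is the bit pattern 0 111110 01011 01010 001 1000 0 1100011 written in hex.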
Jump Addressing
Jump and link (jal) target uses a 20-bit immediate for larger range: UJ-type
imm[20] | imm[10:1] | imm[11] | imm[19:12] | rd (5 bits) | opcode (7 bits)
jal x0,2000           # 2000 = 0 0000 0000 0111 1101 0000
0 | 1111101000 | 0 | 00000000 | 00000 | 1101111
For long jumps, e.g., to a 32-bit absolute address:
- lui: load address[31:12] into a temporary register
- jalr: add address[11:0] and jump to the target
  jalr x0,100(x19)
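The UJ-type layout for jal can be encoded the same way. The sketch below (hypothetical function name) takes an even byte offset in the range -1048576 to +1048574 (about ±1 MiB) and uses the jal opcode 1101111 (0x6F). For the lui + jalr long-jump pair, splitting the 32-bit address has the same rounding issue as lui + addi, since jalr also sign-extends its 12-bit offset.

```c
#include <assert.h>
#include <stdint.h>

/* Encode jal rd, offset (UJ-type, opcode 1101111 = 0x6F).
   offset: even byte distance to the target. */
uint32_t encode_jal(int32_t offset, unsigned rd) {
    assert(offset % 2 == 0 && offset >= -1048576 && offset <= 1048574);
    assert(rd < 32);
    uint32_t u = (uint32_t)offset;
    return ((u >> 20) & 0x1u) << 31      /* imm[20]    -> bit 31     */
         | ((u >> 1) & 0x3FFu) << 21     /* imm[10:1]  -> bits 30:21 */
         | ((u >> 11) & 0x1u) << 20      /* imm[11]    -> bit 20     */
         | ((u >> 12) & 0xFFu) << 12     /* imm[19:12] -> bits 19:12 */
         | rd << 7
         | 0x6Fu;
}
```

For jal x0,2000 this yields 0x7D00006F, matching the field values 0 | 1111101000 | 0 | 00000000 | 00000 | 1101111.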
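Target Addressing Calculation
Assume Loop is at location 80000 (machine code field values shown in decimal):

Address  Instruction                Machine code fields
80000    Loop: slli x10,x22,3       0 | 3 | 22 | 1 | 10 | 19
80004          add  x10,x10,x25     0 | 25 | 10 | 0 | 10 | 51
80008          ld   x9,0(x10)       0 | 0 | 10 | 3 | 9 | 3
80012          bne  x9,x24,Exit     0 | 24 | 9 | 1 | 12 | 99      (offset +12)
80016          addi x22,x22,1       0 | 1 | 22 | 0 | 22 | 19
80020          beq  x0,x0,Loop      127 | 0 | 0 | 0 | 13 | 99     (offset -20)
80024    Exit: …

SB-type fields: imm[12] imm[10:5] | rs2 | rs1 | funct3 | imm[4:1] imm[11] | opcode
80012: 0000000 11000 01001 001 01100 1100011 → imm = 0 0 000000 0110 0 = +12, target = 80012 + 12 = 80024 (Exit)
80020: 1111111 00000 00000 000 01101 1100011 → imm = 1 1 111111 0110 0 = -20, target = 80020 - 20 = 80000 (Loop)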
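Going the other way, a C sketch can pull the offset back out of an SB-type word and compute the branch target as PC + offset, reproducing the +12 and -20 results above. The machine words 0x01849663 and 0xFE0006E3 are the two branch encodings from the table written in hex; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Reassemble the byte offset of an SB-type branch:
   bit 31 -> imm[12], bit 7 -> imm[11], bits 30:25 -> imm[10:5],
   bits 11:8 -> imm[4:1]; imm[0] is always 0. */
int32_t sb_offset(uint32_t insn) {
    assert((insn & 0x7Fu) == 0x63u);             /* conditional-branch opcode */
    uint32_t imm = ((insn >> 31) & 0x1u) << 12
                 | ((insn >> 7) & 0x1u) << 11
                 | ((insn >> 25) & 0x3Fu) << 5
                 | ((insn >> 8) & 0xFu) << 1;
    /* Sign-extend the 13-bit value: flip bit 12, then subtract it back. */
    return (int32_t)(imm ^ 0x1000u) - 0x1000;
}

/* PC-relative addressing: target = address of the branch + offset. */
uint64_t branch_target(uint64_t pc, uint32_t insn) {
    return pc + (uint64_t)(int64_t)sb_offset(insn);
}
```

The flip-and-subtract trick sign-extends without relying on implementation-defined right shifts of negative numbers.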
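RISC-V Addressing Modes
1. Immediate addressing:              addi x6,x21,4
2. Register addressing:               add  x6,x21,x22
3. Base or displacement addressing:   ld   x6,0(x21)
4. PC-relative addressing:            beq  x20,x21,L1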
RISC-V Instruction Formats
Bits     | 31:25          | 24:20 | 19:15 | 14:12  | 11:7        | 6:0
R-type   | funct7         | rs2   | rs1   | funct3 | rd          | opcode
I-type   | imm[11:0] (bits 31:20) | rs1   | funct3 | rd          | opcode
S-type   | imm[11:5]      | rs2   | rs1   | funct3 | imm[4:0]    | opcode
SB-type  | imm[12,10:5]   | rs2   | rs1   | funct3 | imm[4:1,11] | opcode
U-type   | imm[31:12] (bits 31:12)                 | rd          | opcode
UJ-type  | imm[20,10:1,11,19:12] (bits 31:12)      | rd          | opcode

- R-type: Arithmetic instructions
- I-type: Loads & immediate arithmetic
- S-type: Stores
- SB-type: Conditional branch format
- UJ-type: Unconditional jump format
- U-type: Upper immediate format
RISC-V Encoding of Opcodes
What is the assembly code corresponding to the machine instruction 00578833 (hex)?
0000 0000 0101 0111 1000 1000 0011 0011
- opcode: 0110011
- funct3: 000
- funct7: 0000000
- rd: 10000
- rs1: 01111
- rs2: 00101
→ add x16, x15, x5
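Decoding by hand, as above, amounts to shifting and masking fixed bit ranges. A small C sketch (hypothetical names) extracts the R-type fields and maps a few well-known base-ISA funct3/funct7 combinations for opcode 0110011 to mnemonics; it is not a complete decoder.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned funct7, rs2, rs1, funct3, rd, opcode;
} rtype_fields;

rtype_fields decode_r(uint32_t insn) {
    rtype_fields f;
    f.funct7 = (insn >> 25) & 0x7F;   /* bits 31:25 */
    f.rs2    = (insn >> 20) & 0x1F;   /* bits 24:20 */
    f.rs1    = (insn >> 15) & 0x1F;   /* bits 19:15 */
    f.funct3 = (insn >> 12) & 0x7;    /* bits 14:12 */
    f.rd     = (insn >> 7)  & 0x1F;   /* bits 11:7  */
    f.opcode = insn & 0x7F;           /* bits 6:0   */
    return f;
}

/* Mnemonic for a few register-register instructions (opcode 0110011). */
const char *r_mnemonic(rtype_fields f) {
    assert(f.opcode == 0x33);
    if (f.funct7 == 0x00) {
        static const char *const base[8] =
            { "add", "sll", "slt", "sltu", "xor", "srl", "or", "and" };
        return base[f.funct3];
    }
    if (f.funct7 == 0x20 && f.funct3 == 0) return "sub";
    if (f.funct7 == 0x20 && f.funct3 == 5) return "sra";
    return "?";   /* e.g., M-extension encodings (funct7 = 1) not handled */
}
```

Decoding 0x00578833 gives rd = 16, rs1 = 15, rs2 = 5 and the mnemonic add, i.e. add x16, x15, x5.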

2.11 Parallelism and Instructions: Synchronization
Synchronization
Two processors sharing an area of memory
- P1 writes, then P2 reads
- Data race if P1 and P2 don’t synchronize
  - Result depends on order of accesses
Hardware support required
- Atomic read/write memory operation
- No other access to the location allowed between the read and write
Could be a single instruction
- E.g., atomic swap of register ↔ memory
- Or an atomic pair of instructions
Synchronization in RISC-V
Load reserved: lr.d rd,(rs1)
- Load from address in rs1 to rd
- Place reservation on memory address
Store conditional: sc.d rd,rs2,(rs1)
- Succeeds if location not changed since the lr.d
  - Stores from rs2 to address in rs1
  - Returns 0 in rd
- Fails if location is changed
  - Returns non-zero value in rd
Synchronization in RISC-V
Example 1: atomic swap (swap x23 with the doubleword at the address in x20)
again: lr.d x10,(x20)
       sc.d x11,x23,(x20)     # x11 = status
       bne  x11,x0,again      # branch if store failed
       addi x23,x10,0         # x23 = loaded value
Example 2: lock (acquire lock at location in x20; 0: lock is free, 1: lock is acquired)
       addi x12,x0,1          # copy locked value
again: lr.d x10,(x20)         # read lock
       bne  x10,x0,again      # check if it is 0 yet
       sc.d x11,x12,(x20)     # attempt to store
       bne  x11,x0,again      # branch if fails
Unlock:
       sd   x0,0(x20)         # free lock
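Portable C code usually reaches these primitives through C11 `<stdatomic.h>` rather than writing lr.d/sc.d directly; a compiler targeting RISC-V may lower the operations below to lr/sc retry loops or to atomic memory operations such as amoswap. The sketch mirrors the two examples (atomic swap, and a spinlock where 0 means free and 1 means acquired). The function names are hypothetical, and `atomic_compare_exchange_weak`, like sc.d, is allowed to fail spuriously, which is why it sits in a retry loop.

```c
#include <assert.h>
#include <stdatomic.h>

/* Example 1 analog: swap a register value with memory atomically and
   return the old memory contents (what x23 ends up holding). */
long swap_with_memory(atomic_long *mem, long reg) {
    return atomic_exchange(mem, reg);
}

/* Example 2 analog: spin until the lock word goes from 0 (free) to 1. */
void spin_lock_acquire(atomic_int *lock) {
    int expected = 0;
    /* The weak compare-exchange may fail spuriously, like sc.d, so retry.
       On failure it writes the current lock value into expected; reset it. */
    while (!atomic_compare_exchange_weak(lock, &expected, 1))
        expected = 0;
}

/* Single attempt; returns nonzero if the lock was acquired. */
int spin_lock_try_acquire(atomic_int *lock) {
    int expected = 0;
    return atomic_compare_exchange_strong(lock, &expected, 1);
}

/* Unlock: the analog of sd x0,0(x20). */
void spin_lock_release(atomic_int *lock) {
    assert(atomic_load(lock) == 1);   /* releasing a free lock is a bug */
    atomic_store(lock, 0);
}
```

The compare-exchange folds the lr.d, the "is it 0 yet" test, and the sc.d into one call: it writes 1 only if it still sees 0.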
2.12 Translating and Starting a Program
Translation and Startup
[Figure: translation and startup hierarchy]
Many compilers produce object modules directly
Assembler: Producing an Object Module
Assembler (or compiler) translates programs into object files
Object file:
- Header: describes the size and position of the other pieces of the object file
- Text segment: contains the machine code
- Static data segment: contains data allocated for the life of the program
- Relocation information: identifies instructions and data words that depend on absolute addresses when the program is loaded into memory
- Symbol table: contains the remaining labels that are not defined, such as external references
- Debug information: contains a description of how the modules were compiled so that a debugger can associate machine code with source code and make data structures readable
Assembler Pseudoinstructions
Most assembly instructions represent machine instructions one-to-one.
Pseudoinstructions: assembly instructions defined by the assembler to help assembly programming; they are not implemented by the hardware, because they can be realized by real instructions
li  x9,123      →  addi x9,x0,123
j   L1          →  jal  x0,L1
mv  x10,x11     →  addi x10,x11,0
and x9,x10,15   →  andi x9,x10,15
Linker: Linking Object Modules
Produces an executable file that can be run on a computer:
- Merge object modules by placing code and data modules symbolically in memory
- Determine addresses of data and instruction labels using relocation information and the symbol table
- Patch internal and external references
- Determine the memory locations each module will occupy
  - All absolute references must be relocated to reflect true locations
Could leave location dependencies for fixing by a relocating loader
- But with virtual memory, there is no need to do this
- The program can be loaded into an absolute location in virtual memory space
Example: Linking Object Files
Loader: Loading a Program
Load from an executable file on disk into memory:
1. Read header to determine segment sizes
2. Create virtual address space
3. Copy text and initialized data into memory
   - Or set page table entries so they can be faulted in
4. Set up arguments for main() on stack
5. Initialize registers (including sp, fp, gp)
6. Jump to startup routine
   - Copies arguments to x10, … and calls main()
   - When main() returns, do exit() syscall
Dynamic Linking
Problems with static libraries:
- Library routines become part of the executable code → cannot use newer library versions unless rebuilt
- It loads all routines in the library that are called in the executable, even if those calls are not executed
Dynamically linked libraries (DLLs): only link/load a library procedure when it is called
- Requires procedure code to be relocatable
- Avoids image bloat caused by static linking of all (transitively) referenced libraries
- Automatically picks up new library versions
Lazy Procedure Linkage
[Figure: lazy linkage through an indirection table. Labels: Indirection table; Stub: loads routine ID, jumps to linker/loader; Linker/loader code; Dynamically mapped code]
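The indirection table, stub, and linker/loader code in the figure can be modeled with C function pointers. In this toy sketch every name is hypothetical, and a real dynamic linker looks symbols up in shared libraries rather than in a fixed array. Each table entry starts out pointing at a stub; the first call goes through the stub to `dynamic_link`, which patches the entry so later calls jump straight to the routine.

```c
#include <assert.h>

/* Toy model of lazy procedure linkage; all names are hypothetical. */
typedef int (*proc_fn)(int);

/* "Library" routines that get mapped on first use. */
static int square(int x) { return x * x; }
static int negate(int x) { return -x; }

enum { ROUTINE_SQUARE, ROUTINE_NEGATE, NUM_ROUTINES };

static int stub_square(int x);
static int stub_negate(int x);

/* Indirection table: every entry initially points at its stub. */
static proc_fn indirection_table[NUM_ROUTINES] = { stub_square, stub_negate };
static int link_count = 0;   /* how many times the "linker" ran */

/* Linker/loader code: find the routine, patch the table, return it. */
static proc_fn dynamic_link(int id) {
    static const proc_fn library[NUM_ROUTINES] = { square, negate };
    assert(id >= 0 && id < NUM_ROUTINES);
    link_count++;
    indirection_table[id] = library[id];
    return library[id];
}

/* Stubs: load the routine ID and jump to the linker/loader. */
static int stub_square(int x) { return dynamic_link(ROUTINE_SQUARE)(x); }
static int stub_negate(int x) { return dynamic_link(ROUTINE_NEGATE)(x); }

/* Caller side: always an indirect call through the table. */
int call_routine(int id, int arg) {
    assert(id >= 0 && id < NUM_ROUTINES);
    return indirection_table[id](arg);
}
```

Only the first call to each routine pays for the lookup; a routine that is never called is never linked, which is the point of lazy linkage.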
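Translation Hierarchy for Java
[Figure: Java translation hierarchy. Labels: Java bytecodes, a simple portable instruction set for the JVM; the JVM interprets bytecodes; a just-in-time compiler compiles bytecodes of “hot” methods into native code for the host machine]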