Chapter 4: Processor Architecture


How does the hardware execute the instructions?
• We’ll see by studying an example system
  – Based on simple instruction set devised for this purpose
  – Y86, inspired by x86
  – Fewer data types, instructions, addressing modes
  – Simpler encodings
  – Reasonably complete for integer programs
• We’ll design hardware to implement Y86 ISA
  – Basic building blocks
  – Sequential implementation
  – Pipelined implementation

Instruction Set Architecture

• Defines interface between hardware and software
  – Software spec is assembly language
    State: registers, memory
    Instructions, encodings
  – Hardware must execute instructions correctly
    May use variety of transparent tricks to make execution fast
    Results must match sequential execution
• ISA is a layer of abstraction
  – Above: how to program machine
  – Below: what needs to be built

Figure: abstraction layers, top to bottom: Application Program; Compiler and OS; ISA; CPU Design; Circuit Design; Chip Layout

Y86 Processor and System State

Figure: processor state: RF (program registers %eax, %ecx, %edx, %ebx, %esi, %edi, %esp, %ebp), CC (condition codes ZF, SF, OF), Stat (program status), PC, and DMEM (memory)

• Program Registers
  – Same 8 as with IA32. Each 32 bits
• Condition Codes
  – Single-bit flags as in x86: OF (Overflow), ZF (Zero), SF (Negative)
• Program Counter
  – Indicates address of the next instruction to execute
• Memory
  – Byte-addressable storage array
  – Words stored in little-endian byte order
• Stat
  – Indicates exceptional outcomes (bad opcode, bad address, halt)
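A minimal C sketch of byte-addressable memory holding 32-bit words in little-endian order. The memory size and the names `mem`, `mem_write_word`, and `mem_read_word` are hypothetical, chosen only for illustration.

```c
#include <stdint.h>
#include <assert.h>

#define MEM_SIZE 256  /* hypothetical memory size in bytes */

static uint8_t mem[MEM_SIZE];

/* Store a 32-bit word at addr, least significant byte first. */
void mem_write_word(uint32_t addr, uint32_t val)
{
    assert(addr + 4 <= MEM_SIZE);
    for (int i = 0; i < 4; i++)
        mem[addr + i] = (uint8_t)(val >> (8 * i));
}

/* Reassemble a 32-bit word from four consecutive bytes. */
uint32_t mem_read_word(uint32_t addr)
{
    assert(addr + 4 <= MEM_SIZE);
    uint32_t val = 0;
    for (int i = 0; i < 4; i++)
        val |= (uint32_t)mem[addr + i] << (8 * i);
    return val;
}
```

Writing 0xabcd at address 0x10 leaves the bytes cd ab 00 00 at addresses 0x10 through 0x13; reading them back yields 0xabcd again.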





Y86 Instructions

Format
• 1 to 6 bytes of information read from memory
  – Can determine instruction length from first byte
  – Not as many instruction types, and simpler encoding than IA32
• Each accesses and modifies some portion of the CPU and system state
  – Program registers
  – Condition codes
  – Program counter
  – Memory contents

Encoding Registers

Each register has a 4-bit ID

  %eax  0      %esi  6
  %ecx  1      %edi  7
  %edx  2      %esp  4
  %ebx  3      %ebp  5

• Similar encoding used in IA32
  – But we never deciphered encoding to notice!
• Register ID 0xF indicates “no register”
  – Will use this in our hardware design in multiple places
  – Could otherwise encode register # in 3 bits
  – Simplifies decoding of instructions
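A small C sketch of the register-ID table, with 0xF reserved for "no register". The enum constants and the function `reg_lookup` are hypothetical names for illustration, not part of the Y86 tools.

```c
#include <string.h>
#include <assert.h>

/* Y86 register IDs; 0xF means "no register". */
enum reg_id {
    REG_EAX = 0x0, REG_ECX = 0x1, REG_EDX = 0x2, REG_EBX = 0x3,
    REG_ESP = 0x4, REG_EBP = 0x5, REG_ESI = 0x6, REG_EDI = 0x7,
    REG_NONE = 0xF
};

/* Map an assembler register name to its 4-bit ID (REG_NONE if unknown). */
int reg_lookup(const char *name)
{
    static const char *names[8] = {
        "%eax", "%ecx", "%edx", "%ebx", "%esp", "%ebp", "%esi", "%edi"
    };
    for (int id = 0; id < 8; id++)
        if (strcmp(name, names[id]) == 0)
            return id;
    return REG_NONE;
}
```

Note that the table is indexed by ID, so %esp and %ebp sit at positions 4 and 5, ahead of %esi and %edi.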
Instruction Example

Addition instruction
  Generic form:            addl rA, rB
  Encoded representation:  6 0 rA rB

• Add value in register rA to that in register rB
  – Store result in register rB
  – Y86 allows addition to be applied to register data only
• Set condition codes based on result
• Two-byte encoding
  – First byte indicates instruction type
  – Second gives source and destination registers
• e.g., addl %eax,%esi has encoding 60 06
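The two-byte layout can be sketched in C as follows; `encode_addl` is a hypothetical helper, and register IDs are passed as the 4-bit numbers from the register table.

```c
#include <stdint.h>
#include <assert.h>

/* Encode "addl rA, rB" as two bytes: 0x60 (type 6, function 0), then rA:rB. */
void encode_addl(int rA, int rB, uint8_t out[2])
{
    assert(rA >= 0 && rA < 8 && rB >= 0 && rB < 8);
    out[0] = 0x60;                       /* instruction type: OPl, function add */
    out[1] = (uint8_t)((rA << 4) | rB);  /* source in high nibble, destination in low */
}
```

With rA = 0 (%eax) and rB = 6 (%esi), the output is 60 06, matching the example.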

Arithmetic and Logical Operations

  Operation               Instruction    Encoding (instruction code, function code, rA, rB)
  Add                     addl rA, rB    6 0 rA rB
  Subtract (rA from rB)   subl rA, rB    6 1 rA rB
  And                     andl rA, rB    6 2 rA rB
  Exclusive-Or            xorl rA, rB    6 3 rA rB

• Refer to generically as “OPl”
• Encodings differ only by “function code”
  – Low-order 4 bits in first instruction byte
• All set condition codes as side effect
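Splitting the first byte into instruction code and function code is a pair of shifts and masks. This C sketch uses hypothetical helper names (`icode_of`, `ifun_of`, `opl_name`).

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/* High nibble of the first byte is the instruction code, low nibble the function code. */
static inline int icode_of(uint8_t b0) { return b0 >> 4; }
static inline int ifun_of(uint8_t b0)  { return b0 & 0xF; }

/* Name the OPl variant selected by the function code (NULL if not an OPl). */
const char *opl_name(uint8_t b0)
{
    static const char *names[4] = { "addl", "subl", "andl", "xorl" };
    if (icode_of(b0) != 6 || ifun_of(b0) > 3)
        return NULL;
    return names[ifun_of(b0)];
}
```

Because all four operations share instruction code 6, hardware can route every OPl through the same datapath and let the function code pick the ALU operation.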
Move Operations

  rrmovl rA, rB       2 0 rA rB      Register --> Register
  irmovl V, rB        3 0 F rB V     Immediate --> Register
  rmmovl rA, D(rB)    4 0 rA rB D    Register --> Memory
  mrmovl D(rB), rA    5 0 rA rB D    Memory --> Register

• Similar to the IA32 movl instruction
• Simpler format for memory addresses
• Separated into different instructions to simplify hardware implementation
Move Instruction Examples

  IA32                         Y86                        Encoding
  movl $0xabcd, %edx           irmovl $0xabcd, %edx       30 f2 cd ab 00 00
  movl %esp, %ebx              rrmovl %esp, %ebx          20 43
  movl -12(%ebp),%ecx          mrmovl -12(%ebp),%ecx      50 15 f4 ff ff ff
  movl %esi,0x41c(%esp)        rmmovl %esi,0x41c(%esp)    40 64 1c 04 00 00
  movl $0xabcd, (%eax)         (no Y86 equivalent)
  movl %eax, 12(%eax,%edx)     (no Y86 equivalent)
  movl (%ebp,%eax,4),%ecx      (no Y86 equivalent)
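Here is a C sketch of how the mrmovl row is produced, including the 4-byte displacement in little-endian order. `encode_mrmovl` is a hypothetical helper; its argument order follows the assembly form mrmovl D(rB), rA.

```c
#include <stdint.h>
#include <assert.h>

/* Encode "mrmovl D(rB), rA": 0x50, rA:rB, then D as 4 little-endian bytes.
   Returns the instruction length (6). */
int encode_mrmovl(int32_t D, int rB, int rA, uint8_t out[6])
{
    assert(rA >= 0 && rA < 8 && rB >= 0 && rB < 8);
    uint32_t d = (uint32_t)D;                 /* two's-complement bit pattern */
    out[0] = 0x50;
    out[1] = (uint8_t)((rA << 4) | rB);
    for (int i = 0; i < 4; i++)
        out[2 + i] = (uint8_t)(d >> (8 * i)); /* least significant byte first */
    return 6;
}
```

For -12(%ebp),%ecx (rB = 5, rA = 1), the displacement -12 is 0xfffffff4, so the bytes come out as 50 15 f4 ff ff ff.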
Jump Instructions

  jmp Dest    7 0 Dest    Jump Unconditionally
  jle Dest    7 1 Dest    Jump When Less or Equal
  jl Dest     7 2 Dest    Jump When Less
  je Dest     7 3 Dest    Jump When Equal
  jne Dest    7 4 Dest    Jump When Not Equal
  jge Dest    7 5 Dest    Jump When Greater or Equal
  jg Dest     7 6 Dest    Jump When Greater

• Refer to generically as “jXX”
• Encodings differ only by “function code”
• Based on values of condition codes
• Same as IA32 counterparts
• Encode full destination address
  – Unlike PC-relative addressing in IA32
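A C sketch of how the condition codes could decide each jXX, assuming the IA32 signed-comparison rules (“less” means SF differs from OF). `jump_taken` is a hypothetical name.

```c
#include <stdbool.h>
#include <assert.h>

/* Decide whether jXX with function code ifun is taken, given ZF, SF, OF. */
bool jump_taken(int ifun, bool zf, bool sf, bool of)
{
    bool lt = sf ^ of;                /* signed "less than" */
    switch (ifun) {
    case 0: return true;              /* jmp */
    case 1: return lt || zf;          /* jle */
    case 2: return lt;                /* jl  */
    case 3: return zf;                /* je  */
    case 4: return !zf;               /* jne */
    case 5: return !lt;               /* jge */
    case 6: return !lt && !zf;        /* jg  */
    default: assert(!"invalid jXX function code"); return false;
    }
}
```

After comparing 3 with 5 (result negative, no overflow), ZF=0, SF=1, OF=0, so jl is taken and jg is not.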
Stack Operations

  pushl rA    a 0 rA F
• Decrement %esp by 4
• Store word from rA to memory at %esp
• Like IA32

  popl rA     b 0 rA F
• Read word from memory at %esp
• Save in rA
• Increment %esp by 4
• Like IA32

• Same stack conventions as IA32
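A minimal C model of pushl and popl on a little-endian memory, assuming (as in the Y86 program structure example) a stack that starts at address 0x100 and grows toward lower addresses. All names here are hypothetical.

```c
#include <stdint.h>
#include <assert.h>

#define MEM_SIZE 0x100            /* hypothetical: stack top at 0x100 */
static uint8_t mem[MEM_SIZE];
static uint32_t esp = MEM_SIZE;   /* stack pointer */

static void write_word(uint32_t a, uint32_t v)
{
    assert(a + 4 <= MEM_SIZE);
    for (int i = 0; i < 4; i++) mem[a + i] = (uint8_t)(v >> (8 * i));
}

static uint32_t read_word(uint32_t a)
{
    assert(a + 4 <= MEM_SIZE);
    uint32_t v = 0;
    for (int i = 0; i < 4; i++) v |= (uint32_t)mem[a + i] << (8 * i);
    return v;
}

/* pushl: decrement %esp by 4, then store the word at the new %esp. */
void pushl(uint32_t val) { esp -= 4; write_word(esp, val); }

/* popl: read the word at %esp, then increment %esp by 4. */
uint32_t popl(void) { uint32_t v = read_word(esp); esp += 4; return v; }
```

Two pushes leave %esp at 0xf8; popping returns the values in reverse order and restores %esp to 0x100.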
Subroutine Call and Return

  call Dest   8 0 Dest
• Push address of next instruction onto stack
• Start executing instructions at Dest
• Like IA32

  ret         9 0
• Pop value from stack
• Use as address for next instruction
• Like IA32
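A C sketch of call and ret semantics. It assumes a hypothetical word-indexed stack area below 0x100 and uses call's 5-byte length (1 opcode byte plus a 4-byte Dest) to compute the return address.

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical minimal state: PC, %esp, and a word-indexed stack area. */
static uint32_t pc, esp = 0x100;
static uint32_t stack_words[0x100 / 4];   /* word at address a is stack_words[a / 4] */

/* call Dest (5 bytes): push address of next instruction, then jump. */
void do_call(uint32_t dest)
{
    assert(esp >= 4);
    uint32_t next = pc + 5;
    esp -= 4;
    stack_words[esp / 4] = next;
    pc = dest;
}

/* ret (1 byte): pop the return address into PC. */
void do_ret(void)
{
    assert(esp < 0x100);
    pc = stack_words[esp / 4];
    esp += 4;
}
```

Starting from a call at 0x010 to 0x28, the pushed return address is 0x015, which ret then restores to PC.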
Miscellaneous Instructions

  nop    0 0
• Don’t do anything

  halt   1 0
• Stop executing instructions
• IA32 has comparable instruction, but it can’t be executed in user mode
• We will use this instruction to stop the simulator
Y86 Instruction Set

  Instruction          Byte 0   Byte 1   Bytes 2-5
  nop                  0 0
  halt                 1 0
  rrmovl rA, rB        2 0      rA rB
  irmovl V, rB         3 0      F rB     V
  rmmovl rA, D(rB)     4 0      rA rB    D
  mrmovl D(rB), rA     5 0      rA rB    D
  OPl rA, rB           6 fn     rA rB
  jXX Dest             7 fn     Dest (bytes 1-4)
  call Dest            8 0      Dest (bytes 1-4)
  ret                  9 0
  pushl rA             A 0      rA F
  popl rA              B 0      rA F

  Function codes (first byte):
  OPl:  addl 6 0   subl 6 1   andl 6 2   xorl 6 3
  jXX:  jmp 7 0   jle 7 1   jl 7 2   je 7 3   jne 7 4   jge 7 5   jg 7 6
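Since the instruction code in the first byte fixes the format, the length can be computed from that byte alone. This C sketch (`instr_length` is a hypothetical name) returns 0 for an invalid code.

```c
#include <stdint.h>
#include <assert.h>

/* Instruction length in bytes, determined from the first byte only
   (high nibble = instruction code). Returns 0 for a bad opcode. */
int instr_length(uint8_t b0)
{
    switch (b0 >> 4) {
    case 0x0: case 0x1: case 0x9:
        return 1;                          /* nop, halt, ret */
    case 0x2: case 0x6: case 0xA: case 0xB:
        return 2;                          /* rrmovl, OPl, pushl, popl */
    case 0x7: case 0x8:
        return 5;                          /* jXX, call: opcode + Dest */
    case 0x3: case 0x4: case 0x5:
        return 6;                          /* irmovl, rmmovl, mrmovl */
    default:
        return 0;                          /* invalid instruction code */
    }
}
```

Summing these lengths reproduces the addresses in the yas listing for the example program: 6 + 2 + 6 + 2 bytes place call at 0x010, and its 5 bytes place halt at 0x015.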
Writing Y86 Code

• Best to use C compiler as much as possible
  – Write code in C
  – Compile for IA32 with gcc -S
  – Hand translate into Y86
• Coding example
  – Find number of elements in null-terminated list

    int len1(int a[]);

    a:  5043  6125  7395  0      len1(a) returns 3
Y86 Code Generation Example

• First try
  – Write typical array code

/* Find number of elements in
   null-terminated list */
int len1(int a[])
{
    int len;
    for (len = 0; a[len]; len++)
        ;
    return len;
}

  – Compile with gcc -O2 -S

• Problem
  – Hard to do array indexing on Y86: no scaled addressing modes

x86 code:
L18:
    incl %eax
    cmpl $0,(%edx,%eax,4)
    jne L18
Y86 Code Generation Example #2

• Second try
  – Revise to use pointers

/* Find number of elements in
   null-terminated list */
int len2(int a[])
{
    int len = 0;
    while (*a++)
        len++;
    return len;
}

  – Compile with gcc -O2 -S

• Result
  – Doesn’t use indexed addressing

x86 code:
L5:
    movl (%edx),%eax
    incl %ecx
    addl $4,%edx
    testl %eax,%eax
    jne L5
Y86 Code Generation Example #3

• IA32 code
  – Setup

len2:
    pushl %ebp
    xorl %ecx,%ecx
    movl %esp,%ebp
    movl 8(%ebp),%edx
    movl (%edx),%eax
    jmp L7

• Y86 code
  – Setup (hand translation)

len2:
    pushl %ebp              # Save %ebp
    xorl %ecx,%ecx          # len = 0
    rrmovl %esp,%ebp        # Set frame
    mrmovl 8(%ebp),%edx     # Get a
    mrmovl (%edx),%eax      # Get *a
    jmp L7                  # Goto loop entry

Y86 Code Generation Example #4

• IA32 code
  – Loop + Finish

L5:
    movl (%edx),%eax
    incl %ecx
L7:
    addl $4,%edx
    testl %eax,%eax
    jne L5
    movl %ebp,%esp
    movl %ecx,%eax
    popl %ebp
    ret

• Y86 code
  – Loop + Finish (hand translation)

L5:
    mrmovl (%edx),%eax      # Get *a
    irmovl $1,%esi
    addl %esi,%ecx          # len++
L7:                         # Entry:
    irmovl $4,%esi
    addl %esi,%edx          # a++
    andl %eax,%eax          # *a == 0?
    jne L5                  # No--Loop
    rrmovl %ebp,%esp        # Pop
    rrmovl %ecx,%eax        # Rtn len
    popl %ebp
    ret
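A register-level C model of the hand-translated loop can help check its control flow. Local variables stand in for Y86 registers, and goto mimics the jumps. The loop is entered in its middle so that the terminating zero is never counted. The function name `len2_regs` is hypothetical, and the code assumes 4-byte int as on IA32.

```c
#include <assert.h>

/* Register-level model of len2: eax, ecx, edx mirror the Y86 registers. */
int len2_regs(int *a)
{
    int *edx = a;       /* Get a   */
    int ecx = 0;        /* len = 0 */
    int eax = *edx;     /* Get *a  */
    goto entry;         /* jump into the middle of the loop */
loop:
    eax = *edx;         /* Get *a  */
    ecx += 1;           /* len++   */
entry:
    edx += 1;           /* a++ (one int, 4 bytes on IA32) */
    if (eax != 0)       /* *a == 0? No--Loop */
        goto loop;
    return ecx;         /* Rtn len */
}
```

On the list 5043, 6125, 7395, 0 the loop body runs three times and returns 3; on a list holding only the terminator it returns 0 without entering the body.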
Y86 Program Structure

        irmovl Stack,%esp    # Set up stack
        rrmovl %esp,%ebp     # Set up frame
        irmovl List,%edx
        pushl %edx           # Push argument
        call len2            # Call Function
        halt                 # Halt
.align 4
List:                        # List of elements
        .long 5043
        .long 6125
        .long 7395
        .long 0

# Function
len2:
        . . .

# Allocate space for stack
.pos 0x100
Stack:

• Programmer must do more work; no compiler, linker, runtime system
  – Make program placement explicit
  – Stack initialization must be explicit (addr. 0x100)
  – Must ensure code is not overwritten!
  – Must initialize data
  – Can use symbolic names
Assembling Y86 Program

unix> yas eg.ys

• Generates “object code” file eg.yo
  – Actually looks like disassembler output
  – ASCII file to make it easy for you to read

0x000: 30f400010000 |     irmovl Stack,%esp   # Set up stack
0x006: 2045         |     rrmovl %esp,%ebp    # Set up frame
0x008: 30f218000000 |     irmovl List,%edx
0x00e: a02f         |     pushl %edx          # Push argument
0x010: 8028000000   |     call len2           # Call Function
0x015: 10           |     halt                # Halt
0x018:              | .align 4
0x018:              | List:                   # List of elements
0x018: b3130000     |     .long 5043
0x01c: ed170000     |     .long 6125
0x020: e31c0000     |     .long 7395
0x024: 00000000     |     .long 0
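The data words in the listing are stored little-endian, so "b3130000" is 0x000013b3 = 5043. This C sketch decodes such an 8-hex-digit column; `decode_le_word` is a hypothetical helper, not part of yas.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <assert.h>

/* Decode a 4-byte little-endian value written as 8 hex digits,
   e.g. "b3130000" from a .yo listing. Returns 0 on success, -1 on error. */
int decode_le_word(const char *hex, uint32_t *out)
{
    if (strlen(hex) < 8)
        return -1;
    uint32_t v = 0;
    for (int i = 0; i < 4; i++) {
        unsigned int byte;
        if (sscanf(hex + 2 * i, "%2x", &byte) != 1)
            return -1;
        v |= (uint32_t)byte << (8 * i);   /* byte i has weight 256^i */
    }
    *out = v;
    return 0;
}
```

Applied to the four bytes after the 80 in 8028000000, the same decoding gives the call destination 0x28, which is where len2 begins.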
Simulating Y86 Program

unix> yis eg.yo

• Instruction set simulator
  – Computes effect of each instruction on processor state
  – Prints changes in state from original

Stopped in 41 steps at PC = 0x16. Exception 'HLT', CC Z=1 S=0 O=0
Changes to registers:
%eax:   0x00000000   0x00000003
%ecx:   0x00000000   0x00000003
%edx:   0x00000000   0x00000028
%esp:   0x00000000   0x000000fc
%ebp:   0x00000000   0x00000100
%esi:   0x00000000   0x00000004

Changes to memory:
0x00f4: 0x00000000   0x00000100
0x00f8: 0x00000000   0x00000015
0x00fc: 0x00000000   0x00000018
CISC Instruction Sets

• Complex Instruction Set Computer
  – Dominant style of machines designed prior to ~1980
• Stack-oriented instruction set
  – Use stack to pass arguments, save program counter
  – Explicit push and pop instructions
• Arithmetic instructions can access memory
  – addl %eax, 12(%ebx,%ecx,4)
    Requires memory read and write
    Complex address calculation
• Condition codes
  – Set as side effect of arithmetic and logical instructions
• Philosophy
  – Add instructions to perform “typical” programming tasks
RISC Instruction Sets

• Reduced Instruction Set Computer
  – Early projects at IBM, Stanford (Hennessy), and Berkeley (Patterson)
• Fewer, simpler instructions in ISA (initially)
  – Takes more instructions to perform same operations (relative to CISC)
  – But each instruction can execute faster on simpler hardware
• Register-oriented instruction set
  – Many more (typically 32) registers
  – Used for arguments, return value and address, temporaries
• Only load and store instructions can access memory
  – Similar to Y86 mrmovl and rmmovl
• No condition codes
  – Test instructions return 0/1 in general-purpose register
Example: MIPS Registers

Example: MIPS Instructions

R-R          Op | Ra | Rb | Rd | 00000 | Fn
  addu $3,$2,$1       # Register add: $3 = $2+$1

R-I          Op | Ra | Rb | Immediate
  addu $3,$2,3145     # Immediate add: $3 = $2+3145
  sll $3,$2,2         # Shift left: $3 = $2 << 2

Branch       Op | Ra | Rb | Offset
  beq $3,$2,dest      # Branch when $3 = $2

Load/Store   Op | Ra | Rb | Offset
  lw $3,16($2)        # Load Word: $3 = M[$2+16]
  sw $3,16($2)        # Store Word: M[$2+16] = $3
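A C sketch that packs the R-R format into one 32-bit word, assuming the standard MIPS field widths (6, 5, 5, 5, 5, 6 bits). The function code 0x21 for addu comes from the MIPS32 instruction set, not from these slides, and `mips_rr` is a hypothetical helper.

```c
#include <stdint.h>
#include <assert.h>

/* Pack an R-R instruction: Op(6) | Ra(5) | Rb(5) | Rd(5) | 00000(5) | Fn(6).
   MIPS manuals call these fields op, rs, rt, rd, shamt, funct. */
uint32_t mips_rr(unsigned op, unsigned ra, unsigned rb, unsigned rd, unsigned fn)
{
    assert(op < 64 && ra < 32 && rb < 32 && rd < 32 && fn < 64);
    return (op << 26) | (ra << 21) | (rb << 16) | (rd << 11) | fn;
}
```

Unlike Y86, every instruction is exactly 4 bytes, so addu $3,$2,$1 (Op 0, Ra 2, Rb 1, Rd 3, Fn 0x21) packs into the single word 0x00411821.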
CISC vs. RISC Debate

• Strong opinions at the time!
• CISC arguments
  – Easy for compiler (bridge semantic gap)
  – Concise object code (memory was expensive)
• RISC arguments
  – Simple is better for optimizing compilers
  – A simple CPU can be made to run very fast
• Current status
  – For desktop processors, choice of ISA not a technical issue
    With enough hardware, anything can be made to run fast
    Code compatibility more important
  – For embedded processors, RISC makes sense
    Smaller, cheaper, less power

4.1 Summary

• Y86 instruction set architecture
  – Similar state and instructions to IA32
  – Simpler encodings
  – Small instruction set
• Y86 somewhere between CISC and RISC
  – Changes from x86 consistent with RISC principles
4.2: Logic Design: A Brief Review

• Fundamental hardware requirements
  – Communication
    How to get values from one place to another
  – Computation
  – Storage
• All are simplified by restricting to 0s and 1s
  – Communication
    Low or high voltage on wire
  – Computation
    Compute Boolean functions
  – Storage
    Store bits of information

Communication: Digital Signals

Figure: voltage over time, read as the bit sequence 0, 1, 0

• Use voltage thresholds to extract discrete values from continuous signal
• Simplest version: 1-bit signal
  – Either high range (1) or low range (0)
  – With guard range between them
• Not strongly affected by noise or low-quality circuit elements
  – Can make circuits simple, small, and fast
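A C sketch of threshold-based sampling with a guard range. The threshold voltages are hypothetical values chosen for illustration; real logic families define their own.

```c
#include <assert.h>

/* Hypothetical thresholds for a 1-bit signal (volts). */
#define V_LOW_MAX  0.8   /* at or below this reads as 0 */
#define V_HIGH_MIN 2.0   /* at or above this reads as 1 */

/* Return 0 or 1 for a voltage in the low or high range,
   or -1 if it falls in the guard range between them. */
int sample_bit(double volts)
{
    if (volts <= V_LOW_MAX)  return 0;
    if (volts >= V_HIGH_MIN) return 1;
    return -1;   /* guard range: not a valid logic level */
}
```

A small amount of noise on a low signal (say 0.3 V rising to 0.5 V) still reads as 0, which is why digital signals tolerate imperfect circuit elements.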
Computation: Logic Gates

Figure: AND gate with inputs a and b and output a && b; voltage vs. time showing the rising delay and falling delay of the output

• Outputs are Boolean functions of inputs
• Respond continuously to changes in inputs
  – After some small delay
Combinational Circuits

Figure: acyclic network of gates between primary inputs and primary outputs

• Acyclic network of logic gates
  – Continuously responds to changes on primary inputs
  – Primary outputs become (after some delay) Boolean functions of primary inputs
Bit Equality

Figure: “Bit equal” block with inputs a and b and output eq

HCL Expression
  bool eq = (a&&b)||(!a&&!b)

• Generate 1 if a and b are equal

Hardware Control Language (HCL)
• Very simple hardware description language
  – Boolean operations have syntax similar to C logical operations
• We’ll use it to describe control logic for processors
  – Much more convenient than drawing gates
  – Assumes compiler exists to turn HCL into gate equivalent
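Because HCL’s Boolean syntax mirrors C’s logical operators, the bit-equality expression carries over to C almost verbatim. `bit_eq` is a hypothetical name.

```c
#include <stdbool.h>
#include <assert.h>

/* Direct C transcription of the HCL: bool eq = (a&&b)||(!a&&!b) */
bool bit_eq(bool a, bool b)
{
    return (a && b) || (!a && !b);
}
```

Checking all four input combinations confirms it outputs 1 exactly when a equals b.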

Word Equality

Figure: bit-level view with 32 “Bit equal” blocks comparing a31/b31 through a0/b0 to produce eq31 through eq0, which are ANDed to give Eq; word-level view with a single “=” block on inputs A and B producing Eq

HCL Representation
  bool Eq = (A == B)

• 32-bit word size
• HCL representation
  – Equality operation
  – Generates Boolean value
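A C sketch of word equality built from 32 bit-level comparators, as in the bit-level view, with the word result true only if every bit comparison is true. The names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

static bool bit_eq(bool a, bool b) { return (a && b) || (!a && !b); }

/* Word equality: compare each bit pair (eq31 ... eq0) and AND the results. */
bool word_eq(uint32_t A, uint32_t B)
{
    bool Eq = true;
    for (int i = 0; i < 32; i++)
        Eq = Eq && bit_eq((A >> i) & 1, (B >> i) & 1);
    return Eq;   /* same result as HCL: bool Eq = (A == B) */
}
```

A single differing bit, even the most significant one, is enough to make Eq false.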
Bit-Level Multiplexer

Figure: “Bit MUX” block with control input s, data inputs a and b, and output out

HCL Expression
  bool out = (s&&a)||(!s&&b)

• Control signal s
• Data signals a and b
• Output a when s=1, b when s=0
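The multiplexer expression also transcribes directly into C. `bit_mux` is a hypothetical name.

```c
#include <stdbool.h>
#include <assert.h>

/* HCL: bool out = (s&&a)||(!s&&b) */
bool bit_mux(bool s, bool a, bool b)
{
    return (s && a) || (!s && b);
}
```

Over all eight input combinations it behaves like C’s conditional expression s ? a : b.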
Word Multiplexer

Figure: word-level MUX with control s, inputs A and B, and output Out, built from 32 bit-level multiplexers (a31/b31 to out31 through a0/b0 to out0) that all share s

HCL Representation
  int Out = [
    s : A;
    1 : B;
  ];

• Select input word A or B depending on control signal s
• HCL representation
  – Case expression
  – Series of test : value pairs
  – Result value determined by first successful test
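Two C readings of the word multiplexer follow: one mirrors the HCL case expression (tests tried in order, first true test wins), the other mirrors the hardware by applying the bit-level MUX formula to all 32 bits with a replicated select mask. Both function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

/* Case expression  int Out = [ s : A; 1 : B; ];  */
uint32_t word_mux(bool s, uint32_t A, uint32_t B)
{
    if (s) return A;   /* s : A; */
    return B;          /* 1 : B;  (1 is always true: the default case) */
}

/* Gate-style version: replicate s across 32 bits, then (s&&a)||(!s&&b) per bit. */
uint32_t word_mux_gates(bool s, uint32_t A, uint32_t B)
{
    uint32_t mask = s ? 0xFFFFFFFFu : 0u;
    return (mask & A) | (~mask & B);
}
```

The final 1 : B case acts as a default, so the case expression always produces a value.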
Arithmetic Logic Unit

Figure: four ALU configurations chosen by the control input (0: X+Y, 1: X-Y, 2: X&Y, 3: X^Y), each with data inputs X and Y and condition-code outputs OF, ZF, CF

• Combinational logic
  – Continuously responding to inputs
• Control signal selects function computed
  – Corresponding to 4 arithmetic/logical operations in Y86
• Also computes values for condition codes
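A C model of the ALU, assuming the control values 0 to 3 select X+Y, X-Y, X&Y, and X^Y (matching the OPl function codes) and computing the three Y86 condition codes ZF, SF, and OF. The `alu` function and `cc_t` type are hypothetical; arithmetic wraps modulo 2^32 as the hardware does.

```c
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

typedef struct { bool zf, sf, of; } cc_t;   /* Zero, Sign (negative), Overflow */

int32_t alu(int fun, int32_t X, int32_t Y, cc_t *cc)
{
    uint32_t x = (uint32_t)X, y = (uint32_t)Y, r;
    switch (fun) {
    case 0: r = x + y; break;   /* unsigned math avoids C signed-overflow UB */
    case 1: r = x - y; break;
    case 2: r = x & y; break;
    case 3: r = x ^ y; break;
    default: assert(!"bad ALU function"); return 0;
    }
    int32_t res = (int32_t)r;   /* two's-complement reinterpretation */
    cc->zf = (res == 0);
    cc->sf = (res < 0);
    if (fun == 0)        /* add overflows if operands share a sign the result lacks */
        cc->of = ((X < 0) == (Y < 0)) && ((res < 0) != (X < 0));
    else if (fun == 1)   /* subtract overflows if operand signs differ and result sign != X's */
        cc->of = ((X < 0) != (Y < 0)) && ((res < 0) != (X < 0));
    else
        cc->of = false;  /* logical operations never overflow */
    return res;
}
```

Adding 1 to 0x7fffffff wraps to the most negative value and sets both SF and OF, while 5 - 5 sets only ZF.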
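Edge-Triggered Latch (Flip Flop)

Figure: latch built from gates, with Data (D) and Clock (C) inputs, a trigger pulse (T) derived from the clock, and outputs Q+ and Q–; timing diagram of C, T, D, and Q+ over time

• Only in latching mode for brief period
  – On rising clock edge
• Value latched depends on data as clock rises
• Output remains stable at all other times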
Storage: Registers

Figure: 8-bit register built from eight edge-triggered latches; input bits i7 through i0 feed the D inputs, outputs o7 through o0 come from Q+, and all latches share Clock. Symbol: register with input I, output O, and Clock.

• Each stores word of data (one byte in above register)
  – Different from program registers (e.g., %eax)
• Collection of edge-triggered latches
• Loads input on rising edge of clock

Register Operation

Figure: before the rising clock edge the register holds State = x, sees Input = y, and drives Output = x; after the rising edge it holds State = y and drives Output = y

• Stores data bits
• For most of time acts as barrier between input and output
• As clock rises, loads input
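A C sketch of an edge-triggered register: the input may change at any time, but the stored state, and therefore the output, changes only when the clock goes from 0 to 1. The type and function names are hypothetical.

```c
#include <stdint.h>
#include <assert.h>

typedef struct {
    uint8_t state;     /* one byte wide, like the 8-latch register */
    int last_clock;    /* previous clock level, to detect a rising edge */
} reg8_t;

/* Present `input` at clock level `clock`; return the register output. */
uint8_t reg_step(reg8_t *r, uint8_t input, int clock)
{
    if (clock && !r->last_clock)   /* rising edge: load the input */
        r->state = input;
    r->last_clock = clock;
    return r->state;               /* output always shows the stored state */
}
```

Starting from State = x (0x11) with Input = y (0x22), the output stays 0x11 until the clock rises, then becomes 0x22 and holds through later input changes until the next rising edge.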
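State Machine Example

Figure: accumulator circuit. A MUX controlled by Load feeds either the constant 0 or the current Out into the ALU, whose other input is In; the sum is stored in the register on each rising Clock edge.

  Load:  1    0       0           1    0       0
  In:    x0   x1      x2          x3   x4      x5
  Out:   x0   x0+x1   x0+x1+x2    x3   x3+x4   x3+x4+x5

• Accumulator circuit
• Load or accumulate on each cycle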
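One way to simulate the accumulator cycle by cycle in C, assuming the MUX passes 0 when Load is set (so the register loads In) and the current Out otherwise. The names `acc_t` and `acc_cycle` are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

typedef struct { int32_t out; } acc_t;

/* One clock cycle: combinational MUX and adder, then the register loads the sum. */
int32_t acc_cycle(acc_t *acc, bool load, int32_t in)
{
    int32_t mux = load ? 0 : acc->out;   /* choose 0 (load) or Out (accumulate) */
    int32_t sum = mux + in;              /* ALU adds */
    acc->out = sum;                      /* rising edge: register stores sum */
    return acc->out;
}
```

With inputs 1, 2, 3, 10, 20, 30 and Load set on the first and fourth cycles, Out follows the timing row: 1, 3, 6, then 10, 30, 60.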
Storage: Random-Access Memory

Figure: register file with two read ports (address srcA gives data valA, address srcB gives data valB) and one write port (address dstW, data valW), clocked by Clock

• Stores multiple words of memory
  – Address input specifies which word to read or write
• Register file
  – Holds values of program registers
    %eax, %esp, etc.
  – Register identifier serves as address
    ID 0xF implies no read or write performed
• Multiple ports
  – Can read and/or write multiple words simultaneously
    Each has separate address and data input/output
Register File Timing

• Reading
  – Like combinational logic
  – Output data generated based on input address
    After some delay
  Figure: with srcA = 2 and srcB = 2, register 2 holds x, so valA = valB = x

• Writing
  – Like register (a few slides ago)
  – Update only as clock rises
  Figure: before the rising edge, dstW = 2 and valW = y while register 2 still holds x; after the rising edge, register 2 holds y
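A C sketch of the register file: reads behave combinationally, writes happen only when the caller signals a rising clock edge, and ID 0xF performs no access. Returning 0 for a read of ID 0xF is an assumption of this sketch; all names are hypothetical.

```c
#include <stdint.h>
#include <assert.h>

#define RNONE 0xF   /* register ID meaning "no register" */

/* 8 program registers; two read ports and one write port are modeled as calls. */
typedef struct { uint32_t r[8]; } regfile_t;

/* Read port (srcA/valA or srcB/valB): combinational; ID 0xF reads nothing (0 here). */
uint32_t rf_read(const regfile_t *rf, int src)
{
    if (src == RNONE) return 0;
    assert(src >= 0 && src < 8);
    return rf->r[src];
}

/* Write port (dstW/valW): call only at a rising clock edge; ID 0xF writes nothing. */
void rf_clock_write(regfile_t *rf, int dstW, uint32_t valW)
{
    if (dstW == RNONE) return;
    assert(dstW >= 0 && dstW < 8);
    rf->r[dstW] = valW;
}
```

Mirroring the timing figure: once register 2 holds x, both read ports see x; a write of y to register 2 becomes visible only after the clock edge is applied.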
4.2 Summary

• Computation
  – Performed by combinational logic
  – Computes Boolean functions
  – Continuously reacts to input changes
• Storage
  – Registers
    Hold single words
    Loaded as clock rises
  – Random-access memories
    Hold multiple words
    Multiple read and write ports possible
    Read word anytime address input changes
    Write word only on rising clock edge