Y86 Instruction Set Architecture – SEQ processor
CENG331: Introduction to Computer Systems
8th Lecture
Instructor: Erol Sahin
Acknowledgement: Most of the slides are adapted from the ones prepared by R.E. Bryant, D.R. O’Hallaron of Carnegie-Mellon Univ.
Instruction Set Architecture
Assembly Language View
  Processor state
    Registers, memory, …
  Instructions
    addl, movl, leal, …
    How instructions are encoded as bytes
Layer of Abstraction
  Above: how to program machine
    Processor executes instructions in a sequence
  Below: what needs to be built
    Use variety of tricks to make it run fast
    E.g., execute multiple instructions simultaneously
[Figure: abstraction layers, top to bottom: Application Program; Compiler, OS; ISA; CPU Design; Circuit Design; Chip Layout]
–2–
Y86 Processor State
[Figure: program registers (%eax, %ecx, %edx, %ebx, %esi, %edi, %esp, %ebp), condition codes (OF, ZF, SF), PC, and memory]
Program Registers
  Same 8 as with IA32. Each 32 bits
Condition Codes
  Single-bit flags set by arithmetic or logical instructions
    » OF: Overflow
    » ZF: Zero
    » SF: Negative
Program Counter
  Indicates address of instruction
Memory
  Byte-addressable storage array
  Words stored in little-endian byte order
–3–
Y86 Instructions
Format
  1 to 6 bytes of information read from memory
    Can determine instruction length from first byte
    Not as many instruction types, and simpler encoding than with IA32
  Each accesses and modifies some part(s) of the program state
–4–
Encoding Registers
Each register has 4-bit ID
  %eax  0      %esi  6
  %ecx  1      %edi  7
  %edx  2      %esp  4
  %ebx  3      %ebp  5
  Same encoding as in IA32
Register ID 8 indicates “no register”
  Will use this in our hardware design in multiple places
–5–
Instruction Example
Addition Instruction
  Generic form:            addl rA, rB
  Encoded representation:  6 0 rA rB
  Add value in register rA to that in register rB
    Store result in register rB
    Note that Y86 only allows addition to be applied to register data
  Set condition codes based on result
  e.g., addl %eax,%esi    Encoding: 60 06
  Two-byte encoding
    First indicates instruction type
    Second gives source and destination registers
–6–
Arithmetic and Logical Operations
  Operation                Instruction    Encoding (instruction code, function code, rA, rB)
  Add                      addl rA, rB    6 0 rA rB
  Subtract (rA from rB)    subl rA, rB    6 1 rA rB
  And                      andl rA, rB    6 2 rA rB
  Exclusive-Or             xorl rA, rB    6 3 rA rB
Refer to generically as “OPl”
Encodings differ only by “function code”
  Low-order 4 bits in first instruction byte
Set condition codes as side effect
–7–
Move Operations
  rrmovl rA, rB       2 0 rA rB        Register --> Register
  irmovl V, rB        3 0 8 rB V       Immediate --> Register
  rmmovl rA, D(rB)    4 0 rA rB D      Register --> Memory
  mrmovl D(rB), rA    5 0 rA rB D      Memory --> Register
Like the IA32 movl instruction
Simpler format for memory addresses
Give different names to keep them distinct
–8–
Move Instruction Examples
  IA32                        Y86                        Encoding
  movl $0xabcd, %edx          irmovl $0xabcd, %edx       30 82 cd ab 00 00
  movl %esp, %ebx             rrmovl %esp, %ebx          20 43
  movl -12(%ebp),%ecx         mrmovl -12(%ebp),%ecx      50 15 f4 ff ff ff
  movl %esi,0x41c(%esp)       rmmovl %esi,0x41c(%esp)    40 64 1c 04 00 00
  movl $0xabcd, (%eax)        (no Y86 equivalent)
  movl %eax, 12(%eax,%edx)    (no Y86 equivalent)
  movl (%ebp,%eax,4),%ecx     (no Y86 equivalent)
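The encodings in the table can be reproduced mechanically from the register IDs and the instruction formats. The following is a minimal Python sketch, not part of the Y86 tools; the helper names (`word`, `regbyte`, `show`, and the per-instruction functions) are made up for illustration.

```python
# Register IDs as listed on the "Encoding Registers" slide.
REG = {"%eax": 0, "%ecx": 1, "%edx": 2, "%ebx": 3,
       "%esp": 4, "%ebp": 5, "%esi": 6, "%edi": 7}
RNONE = 8  # ID 8 means "no register"

def word(value):
    """Encode a 32-bit constant as 4 little-endian bytes (two's complement)."""
    return (value & 0xFFFFFFFF).to_bytes(4, "little")

def regbyte(ra, rb):
    """Pack two 4-bit register IDs into one byte, rA in the high half."""
    return bytes([(ra << 4) | rb])

def irmovl(v, rb):          # 3 0 8 rB V
    return bytes([0x30]) + regbyte(RNONE, REG[rb]) + word(v)

def rrmovl(ra, rb):         # 2 0 rA rB
    return bytes([0x20]) + regbyte(REG[ra], REG[rb])

def rmmovl(ra, d, rb):      # 4 0 rA rB D
    return bytes([0x40]) + regbyte(REG[ra], REG[rb]) + word(d)

def mrmovl(d, rb, ra):      # 5 0 rA rB D
    return bytes([0x50]) + regbyte(REG[ra], REG[rb]) + word(d)

def show(code):
    """Format bytes the way the slide prints them."""
    return " ".join(f"{b:02x}" for b in code)

print(show(irmovl(0xabcd, "%edx")))
print(show(mrmovl(-12, "%ebp", "%ecx")))
```

Note how the negative displacement -12 becomes `f4 ff ff ff`: it is stored as a 32-bit two's-complement value, least significant byte first.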
–9–
Jump Instructions
  Jump Unconditionally          jmp Dest    7 0 Dest
  Jump When Less or Equal       jle Dest    7 1 Dest
  Jump When Less                jl Dest     7 2 Dest
  Jump When Equal               je Dest     7 3 Dest
  Jump When Not Equal           jne Dest    7 4 Dest
  Jump When Greater or Equal    jge Dest    7 5 Dest
  Jump When Greater             jg Dest     7 6 Dest
Refer to generically as “jXX”
Encodings differ only by “function code”
Based on values of condition codes
Same as IA32 counterparts
Encode full destination address
  Unlike PC-relative addressing seen in IA32
– 10 –
Y86 Program Stack
Region of memory holding program data
Used in Y86 (and IA32) for supporting procedure calls
Stack top indicated by %esp
  Address of top stack element
Stack grows toward lower addresses
  Top element is at lowest address in the stack
  When pushing, must first decrement stack pointer
  When popping, increment stack pointer
[Figure: stack with the “Bottom” at the high-address end and the “Top”, pointed to by %esp, at the low-address end]
– 11 –
Stack Operations
pushl rA    a 0 rA 8
  Decrement %esp by 4
  Store word from rA to memory at %esp
  Like IA32
popl rA     b 0 rA 8
  Read word from memory at %esp
  Save in rA
  Increment %esp by 4
  Like IA32
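The push and pop steps can be sketched in a few lines of Python. This is an illustrative model only: the `Machine` class, its memory size, and the register dictionary are assumptions made for the example, not part of any Y86 tool.

```python
class Machine:
    """Toy model: byte-addressable little-endian memory plus a register dict."""

    def __init__(self, esp, size=0x200):
        self.mem = bytearray(size)      # hypothetical memory size
        self.reg = {"%esp": esp}

    def pushl(self, ra):
        value = self.reg[ra]            # read rA before %esp changes
        self.reg["%esp"] -= 4           # decrement %esp by 4
        addr = self.reg["%esp"]
        # store word from rA to memory at the new %esp
        self.mem[addr:addr + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")

    def popl(self, ra):
        addr = self.reg["%esp"]
        value = int.from_bytes(self.mem[addr:addr + 4], "little")  # read at %esp
        self.reg["%esp"] = addr + 4     # increment %esp by 4
        self.reg[ra] = value            # save in rA

m = Machine(esp=0x100)
m.reg["%edx"] = 0x18
m.pushl("%edx")
print(hex(m.reg["%esp"]), m.mem[0xfc:0x100].hex())
m.popl("%eax")
print(hex(m.reg["%eax"]), hex(m.reg["%esp"]))
```

The stack pointer moves down on a push and back up on a pop, so the most recently pushed word always sits at the lowest address in use.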
– 12 –
Subroutine Call and Return
call Dest    8 0 Dest
  Push address of next instruction onto stack
  Start executing instructions at Dest
  Like IA32
ret          9 0
  Pop value from stack
  Use as address for next instruction
  Like IA32
– 13 –
Miscellaneous Instructions
nop     0 0
  Don’t do anything
halt    1 0
  Stop executing instructions
  IA32 has comparable instruction, but can’t execute it in user mode
  We will use it to stop the simulator
– 14 –
Y86 Code Generation Example
First Try
  Write typical array code
/* Find number of elements in
   null-terminated list */
int len1(int a[])
{
  int len;
  for (len = 0; a[len]; len++)
    ;
  return len;
}
  Compile with gcc -O2 -S
L18:
  incl %eax
  cmpl $0,(%edx,%eax,4)
  jne L18
Problem
  Hard to do array indexing on Y86
    Since don’t have scaled addressing modes
– 15 –
Y86 Code Generation Example #2
Second Try
  Write with pointer code
/* Find number of elements in
   null-terminated list */
int len2(int a[])
{
  int len = 0;
  while (*a++)
    len++;
  return len;
}
  Compile with gcc -O2 -S
L24:
  movl (%edx),%eax
  incl %ecx
L26:
  addl $4,%edx
  testl %eax,%eax
  jne L24
Result
  Don’t need to do indexed addressing
– 16 –
Y86 Code Generation Example #3
IA32 Code
  Setup
len2:
  pushl %ebp
  xorl %ecx,%ecx
  movl %esp,%ebp
  movl 8(%ebp),%edx
  movl (%edx),%eax
  jmp L26
Y86 Code
  Setup
len2:
  pushl %ebp             # Save %ebp
  xorl %ecx,%ecx         # len = 0
  rrmovl %esp,%ebp       # Set frame
  mrmovl 8(%ebp),%edx    # Get a
  mrmovl (%edx),%eax     # Get *a
  jmp L26                # Goto entry
– 17 –
Y86 Code Generation Example #4
IA32 Code
  Loop + Finish
L24:
  movl (%edx),%eax
  incl %ecx
L26:
  addl $4,%edx
  testl %eax,%eax
  jne L24
  movl %ebp,%esp
  movl %ecx,%eax
  popl %ebp
  ret
Y86 Code
  Loop + Finish
L24:
  mrmovl (%edx),%eax     # Get *a
  irmovl $1,%esi
  addl %esi,%ecx         # len++
L26:                     # Entry:
  irmovl $4,%esi
  addl %esi,%edx         # a++
  andl %eax,%eax         # *a == 0?
  jne L24                # No--Loop
  rrmovl %ebp,%esp       # Pop
  rrmovl %ecx,%eax       # Rtn len
  popl %ebp
  ret
– 18 –
Y86 Program Structure
  irmovl Stack,%esp    # Set up stack
  rrmovl %esp,%ebp     # Set up frame
  irmovl List,%edx
  pushl %edx           # Push argument
  call len2            # Call Function
  halt                 # Halt
.align 4
List:                  # List of elements
  .long 5043
  .long 6125
  .long 7395
  .long 0
# Function
len2:
  . . .
# Allocate space for stack
.pos 0x100
Stack:
Program starts at address 0
Must set up stack
  Make sure don’t overwrite code!
Must initialize data
Can use symbolic names
– 19 –
Assembling Y86 Program
unix> yas eg.ys
  Generates “object code” file eg.yo
    Actually looks like disassembler output
0x000: 308400010000 |   irmovl Stack,%esp    # Set up stack
0x006: 2045         |   rrmovl %esp,%ebp     # Set up frame
0x008: 308218000000 |   irmovl List,%edx
0x00e: a028         |   pushl %edx           # Push argument
0x010: 8028000000   |   call len2            # Call Function
0x015: 10           |   halt                 # Halt
0x018:              | .align 4
0x018:              | List:                  # List of elements
0x018: b3130000     |   .long 5043
0x01c: ed170000     |   .long 6125
0x020: e31c0000     |   .long 7395
0x024: 00000000     |   .long 0
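The data words in the object code are stored least significant byte first, which is why `.long 5043` appears as `b3130000`. A short Python check of this, using a hypothetical helper name `le_word`:

```python
def le_word(hex_bytes):
    """Interpret a string of hex byte pairs as a little-endian integer."""
    return int.from_bytes(bytes.fromhex(hex_bytes), "little")

# The four .long entries of List, as they appear in the object code.
for text in ("b3130000", "ed170000", "e31c0000", "00000000"):
    print(text, "->", le_word(text))

# The constant field of "irmovl Stack,%esp" is the last 4 bytes of 308400010000.
print(hex(le_word("308400010000"[4:])))
```

The same rule applies to the constant and destination fields of instructions, so the `Stack` address 0x100 is stored as `00010000`.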
– 20 –
Simulating Y86 Program
unix> yis eg.yo
  Instruction set simulator
    Computes effect of each instruction on processor state
    Prints changes in state from original
Stopped in 41 steps at PC = 0x16. Exception 'HLT', CC Z=1 S=0 O=0
Changes to registers:
%eax:   0x00000000   0x00000003
%ecx:   0x00000000   0x00000003
%edx:   0x00000000   0x00000028
%esp:   0x00000000   0x000000fc
%ebp:   0x00000000   0x00000100
%esi:   0x00000000   0x00000004
Changes to memory:
0x00f4: 0x00000000   0x00000100
0x00f8: 0x00000000   0x00000015
0x00fc: 0x00000000   0x00000018
– 21 –
CISC versus RISC
CISC Instruction Sets
  Complex Instruction Set Computer
  Dominant style through mid-80’s
Stack-oriented instruction set
  Use stack to pass arguments, save program counter (IA32, but not x86-64)
  Explicit push and pop instructions
Arithmetic instructions can access memory
  addl %eax, 12(%ebx,%ecx,4)
    requires memory read and write
    Complex address calculation
Condition codes
  Set as side effect of arithmetic and logical instructions
Philosophy
  Add instructions to perform “typical” programming tasks
– 23 –
RISC Instruction Sets
  Reduced Instruction Set Computer
  Internal project at IBM, later popularized by Hennessy (Stanford) and Patterson (Berkeley)
Fewer, simpler instructions
  Might take more to get given task done
  Can execute them with small and fast hardware
Register-oriented instruction set
  Many more (typically 32) registers
  Use for arguments, return pointer, temporaries
Only load and store instructions can access memory
  Similar to Y86 mrmovl and rmmovl
No Condition codes
  Test instructions return 0/1 in register
– 24 –
MIPS Registers
  Number     Name       Use
  $0         $0         Constant 0
  $1         $at        Reserved Temp.
  $2-$3      $v0-$v1    Return Values
  $4-$7      $a0-$a3    Procedure arguments
  $8-$15     $t0-$t7    Caller Save Temporaries: May be overwritten by called procedures
  $16-$23    $s0-$s7    Callee Save Temporaries: May not be overwritten by called procedures
  $24-$25    $t8-$t9    Caller Save Temp
  $26-$27    $k0-$k1    Reserved for Operating Sys
  $28        $gp        Global Pointer
  $29        $sp        Stack Pointer
  $30        $s8        Callee Save Temp
  $31        $ra        Return Address
– 25 –
MIPS Instruction Examples
R-R format: Op | Ra | Rb | Rd | 00000 | Fn
  addu $3,$2,$1       # Register add: $3 = $2+$1
R-I format: Op | Ra | Rb | Immediate
  addu $3,$2, 3145    # Immediate add: $3 = $2+3145
  sll $3,$2,2         # Shift left: $3 = $2 << 2
Branch format: Op | Ra | Rb | Offset
  beq $3,$2,dest      # Branch when $3 = $2
Load/Store format: Op | Ra | Rb | Offset
  lw $3,16($2)        # Load Word: $3 = M[$2+16]
  sw $3,16($2)        # Store Word: M[$2+16] = $3
– 26 –
CISC vs. RISC
Original Debate
  Strong opinions!
  CISC proponents: easy for compiler, fewer code bytes
  RISC proponents: better for optimizing compilers, can make run fast with simple chip design
Current Status
  For desktop processors, choice of ISA not a technical issue
    With enough hardware, can make anything run fast
    Code compatibility more important
  For embedded processors, RISC makes sense
    Smaller, cheaper, less power
– 27 –
Summary
Y86 Instruction Set Architecture
  Similar state and instructions as IA32
  Simpler encodings
  Somewhere between CISC and RISC
How Important is ISA Design?
  Less now than before
    With enough hardware, can make almost anything go fast
  AMD/Intel moved away from IA32
    Does not allow enough parallel execution
    x86-64
      » 64-bit word sizes (overcome address space limitations)
    IA64 (Intel Itanium)
      » Radically different style of instruction set with explicit parallelism
      » Requires sophisticated compilers
– 28 –
Logic Design and HCL
Computing with Logic Gates
  And: out = a && b
  Or:  out = a || b
  Not: out = !a
  Outputs are Boolean functions of inputs
  Respond continuously to changes in inputs
    With some, small delay
[Figure: voltage versus time for a, b, and a && b, showing the rising delay and falling delay of the output]
– 30 –
Combinational Circuits
[Figure: acyclic network connecting primary inputs to primary outputs]
Acyclic Network of Logic Gates
  Continuously responds to changes on primary inputs
  Primary outputs become (after some delay) Boolean functions of primary inputs
– 31 –
Bit Equality
[Figure: “Bit equal” circuit with inputs a and b and output eq]
HCL Expression
  bool eq = (a&&b)||(!a&&!b)
  Generate 1 if a and b are equal
Hardware Control Language (HCL)
  Very simple hardware description language
    Boolean operations have syntax similar to C logical operations
  We’ll use it to describe control logic for processors
– 32 –
Word Equality
Word-Level Representation
[Figure: comparator with word inputs A and B and output Eq, built from 32 “Bit equal” circuits (a31 and b31 give eq31, down to a0 and b0 giving eq0) whose outputs are combined into Eq]
HCL Representation
  bool Eq = (A == B)
  32-bit word size
  HCL representation
    Equality operation
    Generates Boolean value
– 33 –
Bit-Level Multiplexor
[Figure: “Bit MUX” circuit with control input s, data inputs a and b, and output out]
HCL Expression
  bool out = (s&&a)||(!s&&b)
  Control signal s
  Data signals a and b
  Output a when s=1, b when s=0
– 34 –
Word Multiplexor
Word-Level Representation
[Figure: MUX with control input s, word inputs A and B, and output Out, built from bit-level multiplexors for bits 31 down to 0]
HCL Representation
  int Out = [
    s : A;
    1 : B;
  ];
  Select input word A or B depending on control signal s
  HCL representation
    Case expression
    Series of test : value pairs
    Output value for first successful test
– 35 –
HCL Word-Level Examples
Minimum of 3 Words
[Figure: MIN3 block with inputs A, B, C and output Min3]
  int Min3 = [
    A <= B && A <= C : A;
    B <= A && B <= C : B;
    1                : C;
  ];
  Find minimum of three input words
  HCL case expression
  Final case guarantees match
4-Way Multiplexor
[Figure: MUX4 block with control inputs s1, s0, data inputs D0, D1, D2, D3, and output Out4]
  int Out4 = [
    !s1&&!s0 : D0;
    !s1      : D1;
    !s0      : D2;
    1        : D3;
  ];
  Select one of 4 inputs based on two control bits
  HCL case expression
  Simplify tests by assuming sequential matching
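The "first successful test wins" rule of an HCL case expression can be imitated in Python to check the two examples. This is only a behavioral sketch: `case`, `min3`, and `mux4` are hypothetical helper names, and real HCL is translated to logic rather than executed. The minimum is written with `<=` so that ties between words still select a minimal value.

```python
def case(*pairs):
    """Return the value paired with the first test that is true."""
    for test, value in pairs:
        if test:
            return value
    raise ValueError("no case matched")

def min3(a, b, c):
    # The final (True, c) entry plays the role of the "1 : C" default case.
    return case((a <= b and a <= c, a),
                (b <= a and b <= c, b),
                (True, c))

def mux4(s1, s0, d0, d1, d2, d3):
    # Later tests can be simpler because earlier ones already failed.
    return case((not s1 and not s0, d0),
                (not s1, d1),
                (not s0, d2),
                (True, d3))

print(min3(7, 3, 5), min3(2, 2, 9))
print([mux4(s1, s0, "D0", "D1", "D2", "D3") for s1 in (0, 1) for s0 in (0, 1)])
```

The second `mux4` test is just `!s1`: it is only reached when `!s1 && !s0` has failed, so at that point s0 must be 1.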
– 36 –
Arithmetic Logic Unit
[Figure: four copies of the ALU, each with Y on input A, X on input B, and outputs OF, ZF, CF; control value 0 computes X+Y, 1 computes X-Y, 2 computes X&Y, 3 computes X^Y]
  Combinational logic
    Continuously responding to inputs
  Control signal selects function computed
    Corresponding to 4 arithmetic/logical operations in Y86
  Also computes values for condition codes
– 37 –
Registers
Structure
[Figure: 8-bit register built from eight clocked storage elements (D, C, Q+) with inputs i7 to i0 (I), outputs o7 to o0 (O), and a shared Clock]
  Stores word of data
    Different from program registers seen in assembly code
  Collection of edge-triggered latches
  Loads input on rising edge of clock
– 38 –
Register Operation
[Figure: before the rising clock edge, State = x, Input = y, Output = x; after the rising clock edge, State = y, Output = y]
  Stores data bits
  For most of time acts as barrier between input and output
  As clock rises, loads input
– 39 –
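State Machine Example
[Figure: accumulator built from combinational logic and a clocked register: an ALU adds In to Out, and a MUX controlled by Load selects either In (Load = 1) or the ALU sum (Load = 0) as the next register value]
  Accumulator circuit
  Load or accumulate on each cycle
[Timing diagram: Load is asserted in the cycles where In is x0 and x3]
  In:   x0    x1       x2          x3    x4       x5
  Out:  x0    x0+x1    x0+x1+x2    x3    x3+x4    x3+x4+x5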
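The accumulator's behavior over successive clock cycles can be traced with a short simulation. This Python sketch is illustrative only; the function name `accumulate` and the concrete input values are made up.

```python
def accumulate(inputs, loads):
    """Return the register output after each rising clock edge."""
    out = 0                                  # register state before the first edge
    trace = []
    for x, load in zip(inputs, loads):
        next_value = x if load else out + x  # MUX: In when Load, else ALU sum
        out = next_value                     # register loads on the rising edge
        trace.append(out)
    return trace

# Load is asserted on the first and fourth inputs, as in the timing diagram.
print(accumulate([1, 2, 3, 10, 20, 30], [1, 0, 0, 1, 0, 0]))
```

With x0..x5 = 1, 2, 3, 10, 20, 30 the outputs follow the pattern x0, x0+x1, x0+x1+x2, x3, x3+x4, x3+x4+x5.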
– 40 –
Random-Access Memory
[Figure: register file with read ports A (address srcA, data valA) and B (address srcB, data valB), write port W (address dstW, data valW), and Clock]
  Stores multiple words of memory
    Address input specifies which word to read or write
  Register file
    Holds values of program registers
    %eax, %esp, etc.
    Register identifier serves as address
      » ID 8 implies no read or write performed
  Multiple Ports
    Can read and/or write multiple words in one cycle
      » Each has separate address and data input/output
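A register file with separate read and write ports can be modeled as follows. This is a minimal Python sketch under stated assumptions: the class and method names are invented, and returning `None` for a read with ID 8 is an arbitrary choice, since the slides only say that no read is performed.

```python
RNONE = 8  # register ID meaning "no read or write performed"

class RegisterFile:
    def __init__(self):
        self.regs = [0] * 8                  # the 8 program registers

    def read(self, src):
        """Read port: behaves like combinational logic on the address input."""
        return None if src == RNONE else self.regs[src]

    def rising_edge(self, dst_w, val_w):
        """Write port: state changes only when the clock rises."""
        if dst_w != RNONE:
            self.regs[dst_w] = val_w & 0xFFFFFFFF

rf = RegisterFile()
rf.rising_edge(2, 0x200)                     # write %edx (ID 2)
print(hex(rf.read(2)), rf.read(RNONE))       # two read ports would each call read()
rf.rising_edge(RNONE, 0x1234)                # no register is changed
print(rf.regs)
```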
– 41 –
Register File Timing
Reading
  Like combinational logic
  Output data generated based on input address
    After some delay
Writing
  Like register
  Update only as clock rises
[Figure: reading: with srcA = 2 and srcB = 2, both valA and valB output x, the value held in register 2; writing: with dstW = 2 and valW = y, register 2 changes from x to y at the rising clock edge]
– 42 –
Hardware Control Language
  Very simple hardware description language
  Can only express limited aspects of hardware operation
    Parts we want to explore and modify
Data Types
  bool: Boolean
    a, b, c, …
  int: words
    A, B, C, …
    Does not specify word size: bytes, 32-bit words, …
Statements
  bool a = bool-expr ;
  int A = int-expr ;
– 43 –
HCL Operations
  Classify by type of value returned
Boolean Expressions
  Logic Operations
    a && b, a || b, !a
  Word Comparisons
    A == B, A != B, A < B, A <= B, A >= B, A > B
  Set Membership
    A in { B, C, D }
      » Same as A == B || A == C || A == D
Word Expressions
  Case expressions
    [ a : A; b : B; c : C ]
    Evaluate test expressions a, b, c, … in sequence
    Return word expression A, B, C, … for first successful test
– 44 –
Summary
Computation
  Performed by combinational logic
  Computes Boolean functions
  Continuously reacts to input changes
Storage
  Registers
    Hold single words
    Loaded as clock rises
  Random-access memories
    Hold multiple words
    Possible multiple read or write ports
    Read word when address input changes
    Write word as clock rises
– 45 –
SEQuential processor implementation
Y86 Instruction Set
  Instruction           Byte 0    Byte 1    Remaining bytes
  nop                   0 0
  halt                  1 0
  rrmovl rA, rB         2 0       rA rB
  irmovl V, rB          3 0       8 rB      V
  rmmovl rA, D(rB)      4 0       rA rB     D
  mrmovl D(rB), rA      5 0       rA rB     D
  OPl rA, rB            6 fn      rA rB
  jXX Dest              7 fn      Dest (bytes 1 to 4)
  call Dest             8 0       Dest (bytes 1 to 4)
  ret                   9 0
  pushl rA              A 0       rA 8
  popl rA               B 0       rA 8
Function codes (fn)
  OPl: addl 6 0, subl 6 1, andl 6 2, xorl 6 3
  jXX: jmp 7 0, jle 7 1, jl 7 2, je 7 3, jne 7 4, jge 7 5, jg 7 6
– 47 –
SEQ Hardware Structure
State
  Program counter register (PC)
  Condition code register (CC)
  Register File
  Memories
    Access same memory space
    Data: for reading/writing program data
    Instruction: for reading instructions
Instruction Flow
  Read instruction at address specified by PC
  Process through stages
  Update program counter
[Figure: SEQ datapath, bottom to top: Fetch (PC, instruction memory, PC increment; produces icode, ifun, rA, rB, valC, valP), Decode (register file read ports A and B; srcA, srcB, dstE, dstM; produces valA, valB), Execute (ALU with aluA, aluB, and CC; produces valE, Bch), Memory (data memory with Addr, Data; produces valM), Write back (register file write ports E and M; valE, valM), PC update (newPC)]
– 48 –
SEQ Stages
Fetch
  Read instruction from instruction memory
Decode
  Read program registers
Execute
  Compute value or address
Memory
  Read or write data
Write Back
  Write program registers
PC
  Update program counter
[Figure: SEQ datapath with the stages Fetch, Decode, Execute, Memory, Write back, and PC update, as on the SEQ Hardware Structure slide]
– 49 –
Instruction Decoding
[Figure: example encoding 5 0 | rA rB | D split into the fields icode, ifun, rA, rB, valC; the register byte and the constant word are optional]
Instruction Format
  Instruction byte          icode:ifun
  Optional register byte    rA:rB
  Optional constant word    valC
– 50 –
Executing Arith./Logical Operation
OPl rA, rB    6 fn rA rB
Fetch
  Read 2 bytes
Decode
  Read operand registers
Execute
  Perform operation
  Set condition codes
Memory
  Do nothing
Write back
  Update register
PC Update
  Increment PC by 2
– 51 –
Stage Computation: Arith/Log. Ops
OPl rA, rB
  Fetch         icode:ifun ← M1[PC]     Read instruction byte
                rA:rB ← M1[PC+1]        Read register byte
                valP ← PC+2             Compute next PC
  Decode        valA ← R[rA]            Read operand A
                valB ← R[rB]            Read operand B
  Execute       valE ← valB OP valA     Perform ALU operation
                Set CC                  Set condition code register
  Memory
  Write back    R[rB] ← valE            Write back result
  PC update     PC ← valP               Update PC
  Formulate instruction execution as sequence of simple steps
  Use same general form for all instructions
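The stage-by-stage description of OPl maps directly onto straight-line code. The Python sketch below is one possible rendering, not the course simulator: the names `step_opl`, `alu`, and `to_signed` are invented, and the overflow rule (signed result does not fit in 32 bits) is the usual two's-complement definition rather than something spelled out on the slide.

```python
def to_signed(x):
    """Interpret the low 32 bits of x as a two's-complement value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def alu(fn, val_a, val_b):
    """Compute valB OP valA for fn 0..3 (add, sub, and, xor) plus condition codes."""
    a, b = to_signed(val_a), to_signed(val_b)
    exact = {0: b + a, 1: b - a, 2: b & a, 3: b ^ a}[fn]
    val_e = exact & 0xFFFFFFFF
    cc = {"ZF": val_e == 0,
          "SF": bool(val_e & 0x80000000),
          "OF": fn in (0, 1) and to_signed(val_e) != exact}
    return val_e, cc

def step_opl(mem, reg, pc):
    icode, ifun = mem[pc] >> 4, mem[pc] & 0xF      # Fetch: instruction byte
    ra, rb = mem[pc + 1] >> 4, mem[pc + 1] & 0xF   # Fetch: register byte
    val_p = pc + 2                                 # Fetch: next PC
    assert icode == 6, "this sketch handles OPl only"
    val_a, val_b = reg[ra], reg[rb]                # Decode
    val_e, cc = alu(ifun, val_a, val_b)            # Execute (sets CC)
    reg[rb] = val_e                                # Write back
    return val_p, cc                               # PC update

reg = [0, 0, 0x200, 0x100, 0, 0, 0, 0]             # %edx = 0x200, %ebx = 0x100
new_pc, cc = step_opl(bytes.fromhex("6023"), reg, 0)   # addl %edx,%ebx
print(hex(reg[3]), new_pc, cc)
```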
– 52 –
Executing rmmovl
rmmovl rA, D(rB)    4 0 rA rB D
Fetch
  Read 6 bytes
Decode
  Read operand registers
Execute
  Compute effective address
Memory
  Write to memory
Write back
  Do nothing
PC Update
  Increment PC by 6
– 53 –
Stage Computation: rmmovl
rmmovl rA, D(rB)
  Fetch         icode:ifun ← M1[PC]     Read instruction byte
                rA:rB ← M1[PC+1]        Read register byte
                valC ← M4[PC+2]         Read displacement D
                valP ← PC+6             Compute next PC
  Decode        valA ← R[rA]            Read operand A
                valB ← R[rB]            Read operand B
  Execute       valE ← valB + valC      Compute effective address
  Memory        M4[valE] ← valA         Write value to memory
  Write back
  PC update     PC ← valP               Update PC
  Use ALU for address computation
– 54 –
Executing popl
popl rA    b 0 rA 8
Fetch
  Read 2 bytes
Decode
  Read stack pointer
Execute
  Increment stack pointer by 4
Memory
  Read from old stack pointer
Write back
  Update stack pointer
  Write result to register
PC Update
  Increment PC by 2
– 55 –
Stage Computation: popl
popl rA
  Fetch         icode:ifun ← M1[PC]     Read instruction byte
                rA:rB ← M1[PC+1]        Read register byte
                valP ← PC+2             Compute next PC
  Decode        valA ← R[%esp]          Read stack pointer
                valB ← R[%esp]          Read stack pointer
  Execute       valE ← valB + 4         Increment stack pointer
  Memory        valM ← M4[valA]         Read from stack
  Write back    R[%esp] ← valE          Update stack pointer
                R[rA] ← valM            Write back result
  PC update     PC ← valP               Update PC
  Use ALU to increment stack pointer
  Must update two registers
    Popped value
    New stack pointer
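The popl stages can be written out the same way; the point of interest is that the write-back stage updates two registers. This is a hedged Python sketch with invented names (`step_popl`, `RESP`), using a plain list for the register file and a bytearray for memory.

```python
RESP = 4  # register ID of %esp

def step_popl(mem, reg, pc):
    ra = mem[pc + 1] >> 4                 # Fetch: rA from the register byte
    val_p = pc + 2                        # Fetch: next PC
    val_a = reg[RESP]                     # Decode: old stack pointer (memory address)
    val_b = reg[RESP]                     # Decode: old stack pointer (ALU input)
    val_e = (val_b + 4) & 0xFFFFFFFF      # Execute: incremented stack pointer
    val_m = int.from_bytes(mem[val_a:val_a + 4], "little")  # Memory: read from stack
    reg[RESP] = val_e                     # Write back: new stack pointer
    reg[ra] = val_m                       # Write back: popped value
    return val_p                          # PC update

mem = bytearray(0x110)
mem[0:2] = bytes.fromhex("b008")          # popl %eax
mem[0xfc:0x100] = (0x18).to_bytes(4, "little")
reg = [0, 0, 0, 0, 0xfc, 0, 0, 0]         # %esp = 0xfc
print(step_popl(mem, reg, 0), hex(reg[0]), hex(reg[RESP]))
```

Because the popped value is written after the stack pointer update in this sketch, `popl %esp` would leave the value read from memory in %esp.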
– 56 –
Executing Jumps
jXX Dest    7 fn Dest
  fall thru:  XX XX    Not taken
  target:     XX XX    Taken
Fetch
  Read 5 bytes
  Increment PC by 5
Decode
  Do nothing
Execute
  Determine whether to take branch based on jump condition and condition codes
Memory
  Do nothing
Write back
  Do nothing
PC Update
  Set PC to Dest if branch taken or to incremented PC if branch not taken
– 57 –
Stage Computation: Jumps
jXX Dest
  Fetch         icode:ifun ← M1[PC]     Read instruction byte
                valC ← M4[PC+1]         Read destination address
                valP ← PC+5             Fall through address
  Decode
  Execute       Bch ← Cond(CC,ifun)     Take branch?
  Memory
  Write back
  PC update     PC ← Bch ? valC : valP  Update PC
  Compute both addresses
  Choose based on setting of condition codes and branch condition
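The slides leave Cond(CC, ifun) abstract. A plausible Python rendering is sketched below, assuming the standard signed-comparison conditions of the IA32 counterparts (for example, "less" means SF differs from OF); the function names `cond` and `step_jxx` are invented for the example.

```python
def cond(cc, ifun):
    """Branch flag for function codes 0..6: jmp, jle, jl, je, jne, jge, jg."""
    zf, sf, of = cc["ZF"], cc["SF"], cc["OF"]
    less = sf != of                       # signed "less than" after a subtract
    return {0: True,
            1: less or zf,
            2: less,
            3: zf,
            4: not zf,
            5: not less,
            6: not less and not zf}[ifun]

def step_jxx(mem, pc, cc):
    ifun = mem[pc] & 0xF                                  # Fetch: function code
    val_c = int.from_bytes(mem[pc + 1:pc + 5], "little")  # Fetch: destination
    val_p = pc + 5                                        # Fetch: fall-through address
    bch = cond(cc, ifun)                                  # Execute: take branch?
    return val_c if bch else val_p                        # PC update

mem = bytearray(0x20)
mem[0x0e] = 0x73                                          # je
mem[0x0f:0x13] = (0x20).to_bytes(4, "little")             # hypothetical Dest = 0x20
print(hex(step_jxx(mem, 0x0e, {"ZF": False, "SF": False, "OF": False})))
print(hex(step_jxx(mem, 0x0e, {"ZF": True, "SF": False, "OF": False})))
```

With all condition codes clear the `je` falls through to 0x013, and with ZF set it goes to the destination.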
– 58 –
Executing call
call Dest    8 0 Dest
  return:  XX XX
  target:  XX XX
Fetch
  Read 5 bytes
  Increment PC by 5
Decode
  Read stack pointer
Execute
  Decrement stack pointer by 4
Memory
  Write incremented PC to new value of stack pointer
Write back
  Update stack pointer
PC Update
  Set PC to Dest
– 59 –
Stage Computation: call
call Dest
  Fetch         icode:ifun ← M1[PC]     Read instruction byte
                valC ← M4[PC+1]         Read destination address
                valP ← PC+5             Compute return point
  Decode        valB ← R[%esp]          Read stack pointer
  Execute       valE ← valB + –4        Decrement stack pointer
  Memory        M4[valE] ← valP         Write return address on stack
  Write back    R[%esp] ← valE          Update stack pointer
  PC update     PC ← valC               Set PC to destination
  Use ALU to decrement stack pointer
  Store incremented PC
– 60 –
Executing ret
ret    9 0
  return:  XX XX
Fetch
  Read 1 byte
Decode
  Read stack pointer
Execute
  Increment stack pointer by 4
Memory
  Read return address from old stack pointer
Write back
  Update stack pointer
PC Update
  Set PC to return address
– 61 –
Stage Computation: ret
ret
  Fetch         icode:ifun ← M1[PC]     Read instruction byte
  Decode        valA ← R[%esp]          Read operand stack pointer
                valB ← R[%esp]          Read operand stack pointer
  Execute       valE ← valB + 4         Increment stack pointer
  Memory        valM ← M4[valA]         Read return address
  Write back    R[%esp] ← valE          Update stack pointer
  PC update     PC ← valM               Set PC to return address
  Use ALU to increment stack pointer
  Read return address from memory
– 62 –
Computation Steps
OPl rA, rB
  Stage         Value         Computation             Description
  Fetch         icode,ifun    icode:ifun ← M1[PC]     Read instruction byte
                rA,rB         rA:rB ← M1[PC+1]        Read register byte
                valC                                  [Read constant word]
                valP          valP ← PC+2             Compute next PC
  Decode        valA, srcA    valA ← R[rA]            Read operand A
                valB, srcB    valB ← R[rB]            Read operand B
  Execute       valE          valE ← valB OP valA     Perform ALU operation
                Cond code     Set CC                  Set condition code register
  Memory        valM                                  [Memory read/write]
  Write back    dstE          R[rB] ← valE            Write back ALU result
                dstM                                  [Write back memory result]
  PC update     PC            PC ← valP               Update PC
  All instructions follow same general pattern
  Differ in what gets computed on each step
– 63 –
Irmovl: Computation Steps
irmovl V, rB
  Stage         Value         Computation             Description
  Fetch         icode,ifun    icode:ifun ← M1[PC]     Read instruction byte
                rA,rB         rA:rB ← M1[PC+1]        Read register byte
                valC          valC ← M4[PC+2]         Read constant word
                valP          valP ← PC+6             Compute next PC
  Decode        valA, srcA
                valB, srcB
  Execute       valE          valE ← 0 + valC         Perform ALU operation
                Cond code
  Memory        valM
  Write back    dstE          R[rB] ← valE            Write back ALU result
                dstM
  PC update     PC            PC ← valP               Update PC
  All instructions follow same general pattern
  Differ in what gets computed on each step
– 64 –
Computation Steps
call Dest
  Stage         Value         Computation             Description
  Fetch         icode,ifun    icode:ifun ← M1[PC]     Read instruction byte
                rA,rB                                 [Read register byte]
                valC          valC ← M4[PC+1]         Read constant word
                valP          valP ← PC+5             Compute next PC
  Decode        valA, srcA                            [Read operand A]
                valB, srcB    valB ← R[%esp]          Read operand B
  Execute       valE          valE ← valB + –4        Perform ALU operation
                Cond code                             [Set condition code reg.]
  Memory        valM          M4[valE] ← valP         Memory write
  Write back    dstE          R[%esp] ← valE          Write back ALU result
                dstM                                  [Write back memory result]
  PC update     PC            PC ← valC               Update PC
  All instructions follow same general pattern
  Differ in what gets computed on each step
– 65 –
Computed Values
Fetch
  icode    Instruction code
  ifun     Instruction function
  rA       Instr. Register A
  rB       Instr. Register B
  valC     Instruction constant
  valP     Incremented PC
Decode
  srcA     Register ID A
  srcB     Register ID B
  dstE     Destination Register E
  dstM     Destination Register M
  valA     Register value A
  valB     Register value B
Execute
  valE     ALU result
  Bch      Branch flag
Memory
  valM     Value from memory
– 66 –
SEQ Hardware
Key
  Blue boxes: predesigned hardware blocks
    E.g., memories, ALU
  Gray boxes: control logic
    Describe in HCL
  White ovals: labels for signals
  Thick lines: 32-bit word values
  Thin lines: 4-8 bit values
  Dotted lines: 1-bit values
[Figure: detailed SEQ datapath: Fetch (PC, instruction memory, PC increment; icode, ifun, rA, rB, valC, valP), Decode and Write back (register file with ports A, B, E, M; dstE, dstM, srcA, srcB; valA, valB), Execute (ALU A, ALU B, ALU fun., ALU, CC; valE, Bch), Memory (Mem. control, Addr, Data, data memory with read and write; data out valM), New PC (newPC)]
– 67 –
Fetch Logic
[Figure: fetch hardware: PC addresses the instruction memory; Split divides byte 0 into icode and ifun; Align takes bytes 1-5 and produces rA, rB, and valC; PC increment produces valP; control blocks Instr valid, Need regids, and Need valC]
Predefined Blocks
  PC: Register containing PC
  Instruction memory: Read 6 bytes (PC to PC+5)
  Split: Divide instruction byte into icode and ifun
  Align: Get fields for rA, rB, and valC
– 68 –
Fetch Logic
[Figure: fetch hardware, same as on the previous slide]
Control Logic
  Instr. Valid: Is this instruction valid?
  Need regids: Does this instruction have a register byte?
  Need valC: Does this instruction have a constant word?
– 69 –
Fetch Control Logic
  nop                   0 0
  halt                  1 0
  rrmovl rA, rB         2 0     rA rB
  irmovl V, rB          3 0     8 rB     V
  rmmovl rA, D(rB)      4 0     rA rB    D
  mrmovl D(rB), rA      5 0     rA rB    D
  OPl rA, rB            6 fn    rA rB
  jXX Dest              7 fn    Dest
  call Dest             8 0     Dest
  ret                   9 0
  pushl rA              A 0     rA 8
  popl rA               B 0     rA 8
bool need_regids =
    icode in { IRRMOVL, IOPL, IPUSHL, IPOPL,
               IIRMOVL, IRMMOVL, IMRMOVL };
bool instr_valid = icode in
    { INOP, IHALT, IRRMOVL, IIRMOVL, IRMMOVL, IMRMOVL,
      IOPL, IJXX, ICALL, IRET, IPUSHL, IPOPL };
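The fetch control signals determine the instruction length, and therefore valP. The Python sketch below mirrors `need_regids` and `instr_valid` from the slide; the `NEED_VALC` set is not shown on the slide and is an assumption read off the encoding table (the instructions that carry a V, D, or Dest word), and `next_pc` is an invented helper name.

```python
# Instruction codes 0x0 through 0xB, in the order of the encoding table.
(INOP, IHALT, IRRMOVL, IIRMOVL, IRMMOVL, IMRMOVL,
 IOPL, IJXX, ICALL, IRET, IPUSHL, IPOPL) = range(12)

NEED_REGIDS = {IRRMOVL, IOPL, IPUSHL, IPOPL, IIRMOVL, IRMMOVL, IMRMOVL}
NEED_VALC = {IIRMOVL, IRMMOVL, IMRMOVL, IJXX, ICALL}   # assumption, see lead-in
VALID = set(range(12))

def next_pc(pc, icode):
    """valP = PC + 1 + (1 if register byte) + (4 if constant word)."""
    if icode not in VALID:
        raise ValueError(f"invalid instruction code {icode:#x}")
    return pc + 1 + (icode in NEED_REGIDS) + 4 * (icode in NEED_VALC)

for name, icode in [("nop", INOP), ("OPl", IOPL), ("rmmovl", IRMMOVL),
                    ("jXX", IJXX), ("call", ICALL), ("ret", IRET), ("popl", IPOPL)]:
    print(name, next_pc(0, icode))
```

The resulting lengths agree with the "Read N bytes" lines of the per-instruction slides: 2 for OPl and popl, 6 for rmmovl, 5 for jXX and call, 1 for ret.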
– 70 –
Decode Logic
Register File
  Read ports A, B
  Write ports E, M
  Addresses are register IDs or 8 (no access)
Control Logic
  srcA, srcB: read port addresses
  dstE, dstM: write port addresses
[Figure: register file with read ports A and B (outputs valA, valB), write ports E and M (inputs valE, valM), and control blocks dstE, dstM, srcA, srcB computed from icode, rA, rB]
– 71 –
A Source
  OPl rA, rB          Decode    valA ← R[rA]      Read operand A
  rmmovl rA, D(rB)    Decode    valA ← R[rA]      Read operand A
  popl rA             Decode    valA ← R[%esp]    Read stack pointer
  jXX Dest            Decode                      No operand
  call Dest           Decode                      No operand
  ret                 Decode    valA ← R[%esp]    Read stack pointer
int srcA = [
    icode in { IRRMOVL, IRMMOVL, IOPL, IPUSHL } : rA;
    icode in { IPOPL, IRET } : RESP;
    1 : RNONE; # Don't need register
];
– 72 –
E Destination
  OPl rA, rB          Write-back    R[rB] ← valE      Write back result
  rmmovl rA, D(rB)    Write-back                      None
  popl rA             Write-back    R[%esp] ← valE    Update stack pointer
  jXX Dest            Write-back                      None
  call Dest           Write-back    R[%esp] ← valE    Update stack pointer
  ret                 Write-back    R[%esp] ← valE    Update stack pointer
int dstE = [
    icode in { IRRMOVL, IIRMOVL, IOPL } : rB;
    icode in { IPUSHL, IPOPL, ICALL, IRET } : RESP;
    1 : RNONE; # Don't need register
];
– 73 –
Execute Logic
Units
  ALU
    Implements 4 required functions
    Generates condition code values
  CC
    Register with 3 condition code bits
  bcond
    Computes branch flag
Control Logic
  Set CC: Should condition code register be loaded?
  ALU A: Input A to ALU
  ALU B: Input B to ALU
  ALU fun: What function should ALU compute?
[Figure: execute hardware: control blocks Set CC, ALU fun., ALU A, and ALU B driven by icode, ifun, valC, valA, valB; the ALU produces valE and feeds CC; bcond produces Bch]
– 74 –
ALU A Input
  OPl rA, rB          Execute    valE ← valB OP valA    Perform ALU operation
  rmmovl rA, D(rB)    Execute    valE ← valB + valC     Compute effective address
  popl rA             Execute    valE ← valB + 4        Increment stack pointer
  jXX Dest            Execute                           No operation
  call Dest           Execute    valE ← valB + –4       Decrement stack pointer
  ret                 Execute    valE ← valB + 4        Increment stack pointer
int aluA = [
    icode in { IRRMOVL, IOPL } : valA;
    icode in { IIRMOVL, IRMMOVL, IMRMOVL } : valC;
    icode in { ICALL, IPUSHL } : -4;
    icode in { IRET, IPOPL } : 4;
    # Other instructions don't need ALU
];
– 75 –
ALU Operation
  OPl rA, rB          Execute    valE ← valB OP valA    Perform ALU operation
  rmmovl rA, D(rB)    Execute    valE ← valB + valC     Compute effective address
  popl rA             Execute    valE ← valB + 4        Increment stack pointer
  jXX Dest            Execute                           No operation
  call Dest           Execute    valE ← valB + –4       Decrement stack pointer
  ret                 Execute    valE ← valB + 4        Increment stack pointer
int alufun = [
    icode == IOPL : ifun;
    1 : ALUADD;
];
– 76 –
Memory Logic
Memory
  Reads or writes memory word
Control Logic
  Mem. read: should word be read?
  Mem. write: should word be written?
  Mem. addr.: Select address
  Mem. data.: Select data
[Figure: data memory with control blocks Mem. read, Mem. write, Mem addr, and Mem data driven by icode, valE, valA, and valP; data in comes from Mem data and data out is valM]
– 77 –
Memory Address
  OPl rA, rB          Memory                       No operation
  rmmovl rA, D(rB)    Memory    M4[valE] ← valA    Write value to memory
  popl rA             Memory    valM ← M4[valA]    Read from stack
  jXX Dest            Memory                       No operation
  call Dest           Memory    M4[valE] ← valP    Write return value on stack
  ret                 Memory    valM ← M4[valA]    Read return address
int mem_addr = [
    icode in { IRMMOVL, IPUSHL, ICALL, IMRMOVL } : valE;
    icode in { IPOPL, IRET } : valA;
    # Other instructions don't need address
];
– 78 –
Memory Read
  OPl rA, rB          Memory                       No operation
  rmmovl rA, D(rB)    Memory    M4[valE] ← valA    Write value to memory
  popl rA             Memory    valM ← M4[valA]    Read from stack
  jXX Dest            Memory                       No operation
  call Dest           Memory    M4[valE] ← valP    Write return value on stack
  ret                 Memory    valM ← M4[valA]    Read return address
bool mem_read = icode in { IMRMOVL, IPOPL, IRET };
– 79 –
PC Update Logic
New PC
  Select next value of PC
[Figure: New PC block with inputs icode, Bch, valC, valM, and valP; its output is loaded into the PC register]
– 80 –
PC Update
  OPl rA, rB          PC update    PC ← valP                 Update PC
  rmmovl rA, D(rB)    PC update    PC ← valP                 Update PC
  popl rA             PC update    PC ← valP                 Update PC
  jXX Dest            PC update    PC ← Bch ? valC : valP    Update PC
  call Dest           PC update    PC ← valC                 Set PC to destination
  ret                 PC update    PC ← valM                 Set PC to return address
int new_pc = [
    icode == ICALL : valC;
    icode == IJXX && Bch : valC;
    icode == IRET : valM;
    1 : valP;
];
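The `new_pc` case expression is a priority selection among the three candidate addresses. A direct Python transcription, offered as a sketch (the numeric instruction codes follow the encoding table; the function signature is an assumption made for the example):

```python
IJXX, ICALL, IRET = 0x7, 0x8, 0x9   # instruction codes from the encoding table

def new_pc(icode, bch, val_c, val_m, val_p):
    """Select the next PC, testing the cases in the same order as the HCL."""
    if icode == ICALL:
        return val_c                # call: jump to destination
    if icode == IJXX and bch:
        return val_c                # taken jump: jump to destination
    if icode == IRET:
        return val_m                # ret: return address read from the stack
    return val_p                    # everything else: incremented PC

# A je at 0x00e that is not taken continues at the fall-through address 0x013.
print(hex(new_pc(IJXX, False, val_c=0x20, val_m=0, val_p=0x013)))
```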
– 81 –
SEQ Operation
State
  PC register
  Cond. Code register
  Data memory
  Register file
  All updated as clock rises
Combinational Logic
  ALU
  Control logic
  Memory reads
    Instruction memory
    Register file
    Data memory
[Figure: combinational logic surrounded by the state elements: CC, data memory (read and write), register file (read ports and write ports), and PC (shown holding 0x00c)]
– 82 –
SEQ Operation #2
  Cycle 1:  0x000:  irmovl $0x100,%ebx    # %ebx <-- 0x100
  Cycle 2:  0x006:  irmovl $0x200,%edx    # %edx <-- 0x200
  Cycle 3:  0x00c:  addl %edx,%ebx        # %ebx <-- 0x300 CC <-- 000
  Cycle 4:  0x00e:  je dest               # Not taken
  state set according to second irmovl instruction
  combinational logic starting to react to state changes
[Figure: clock waveform over cycles 1 to 4 and the SEQ state at the start of cycle 3: CC = 100, %ebx = 0x100, PC = 0x00c]
– 83 –
SEQ Operation #3
  Cycle 1:  0x000:  irmovl $0x100,%ebx    # %ebx <-- 0x100
  Cycle 2:  0x006:  irmovl $0x200,%edx    # %edx <-- 0x200
  Cycle 3:  0x00c:  addl %edx,%ebx        # %ebx <-- 0x300 CC <-- 000
  Cycle 4:  0x00e:  je dest               # Not taken
  state set according to second irmovl instruction
  combinational logic generates results for addl instruction
[Figure: clock waveform over cycles 1 to 4 and the SEQ state late in cycle 3: CC = 100 with new value 000 at its input, %ebx = 0x100, PC = 0x00c with new value 0x00e at its input]
– 84 –
SEQ Operation #4
  Cycle 1:  0x000:  irmovl $0x100,%ebx    # %ebx <-- 0x100
  Cycle 2:  0x006:  irmovl $0x200,%edx    # %edx <-- 0x200
  Cycle 3:  0x00c:  addl %edx,%ebx        # %ebx <-- 0x300 CC <-- 000
  Cycle 4:  0x00e:  je dest               # Not taken
  state set according to addl instruction
  combinational logic starting to react to state changes
[Figure: clock waveform over cycles 1 to 4 and the SEQ state at the start of cycle 4: CC = 000, %ebx = 0x300, PC = 0x00e]
– 85 –
SEQ Operation #5
  Cycle 1:  0x000:  irmovl $0x100,%ebx    # %ebx <-- 0x100
  Cycle 2:  0x006:  irmovl $0x200,%edx    # %edx <-- 0x200
  Cycle 3:  0x00c:  addl %edx,%ebx        # %ebx <-- 0x300 CC <-- 000
  Cycle 4:  0x00e:  je dest               # Not taken
  state set according to addl instruction
  combinational logic generates results for je instruction
[Figure: clock waveform over cycles 1 to 4 and the SEQ state late in cycle 4: CC = 000, %ebx = 0x300, PC = 0x00e with new value 0x013 at its input]
– 86 –
SEQ Summary
Implementation
  Express every instruction as series of simple steps
  Follow same general flow for each instruction type
  Assemble registers, memories, predesigned combinational blocks
  Connect with control logic
Limitations
  Too slow to be practical
  In one cycle, must propagate through instruction memory, register file, ALU, and data memory
  Would need to run clock very slowly
  Hardware units only active for fraction of clock cycle
– 87 –