Multiple Cycle Implementation of MIPS-Lite CPU CS365 Lecture 8

advertisement
Multiple Cycle Implementation of
MIPS-Lite CPU
CS365
Lecture 8
Review on Single Cycle Datapath

Subset of the core MIPS ISA
Arithmetic/Logic instructions: AND, OR, ADD, SUB,
SLT
 Data flow instructions: LW, SW
 Branch instructions: BEQ, J


Five steps in processor design
Analyze the instruction
 Determine the datapath components
 Assemble the components
 Determine the control
 Design the control unit

Multi-cycle CPU
CS465
2
D. Barbara
Complete Single Cycle Datapath
How lw, sw, R-Type, beq, j instructions work?
Multi-cycle CPU
CS465
3
D. Barbara
Delays in Single Cycle Datapath
2ns
2ns
2ns
2ns
2ns
1ns
What are the delays for lw, sw, R-Type, beq, j instructions?
Multi-cycle CPU
CS465
4
D. Barbara
Single Cycle Implementation

Calculate cycle time assuming negligible delays
except:

memory (2ns), ALU and adders (2ns), register file
access (1ns)
Instruction Instruction Register ALU Register/
class
Fetch
Access
Memory
Access
R-Type
Load
Store
X
X
X
X
X
X
X
X
X
Branch
Jump
X
X
X
X
Multi-cycle CPU
CS465
R
M
M
Register
Access
X
6
8
7
5
2
5
D. Barbara
Remarks on Single Cycle Datapath

Single cycle datapath ensures the execution of
any instruction within one clock cycle
Functional units must be duplicated if used multiple
times by one instruction, e.g. ALU Why?
 Functional units can be shared if used by different
instructions


Single cycle datapath is not efficient in time
Clock cycle time is determined by the instruction
taking the longest time, eg. lw in MIPS
 Variable clock cycle time is too complicated


Alternative design/implementation approaches
Multiple clock cycles per instruction – this lecture
 Pipelining – Chapter 6

Multi-cycle CPU
CS465
6
D. Barbara
Multi-cycle Approach

Single cycle problems:



What if we had a more complicated instruction like FP?
Wasteful of area
One solution:



Use a “smaller” cycle time
Have different instructions take different numbers of cycles
A “multicycle” datapath
Instruction
register
PC
Data
Address
A
Memory
Data
Multi-cycle CPU
Register #
Instruction
or data
ALU
Registers
ALUOut
Register #
Memory
data
register
B
Register #
CS465
7
D. Barbara
Multistage Execution of Instructions

We have informally described instructions as
executing in several steps
Instruction fetch and PC update
 Instruction decode and reading from registers
 Performing an ALU computation
 Reading/writing data memory
 Storing data back to register file


What if we made these stages explicit in the
hardware design?
Multi-cycle CPU
CS465
8
D. Barbara
Benefits

Performance benefits

Each instruction can execute only the stages that are
necessary



E.g. branch instruction beq
Instructions can complete as soon as possible, instead
of being limited by the slowest instruction
Cost benefits
As an added bonus, we can eliminate some of the
extra hardware from the single-cycle datapath
 We will restrict to using each functional unit once per
cycle
 We could reuse some units in a different cycle during
the execution of a single instruction


E.g. beq can compare operand and compute target in
different cycles
Multi-cycle CPU
CS465
9
D. Barbara
Multicycle Approach

We will be reusing functional units
ALU used to compute address and to increment PC
 Memory used for instruction and data


Unlike the single cycle implementation, our
control signals will not be determined solely by
instruction
E.g., what should the ALU do for a “subtract”
instruction?
 Need to specify not only instruction but also which
cycle in instruction’s execution


We will use a finite state machine for control
Multi-cycle CPU
CS465
10
D. Barbara
Review: Finite State Machines

Finite state machines:
A set of states and
 Next state function (determined by current state and
the input)
 Output function (determined by current state and
possibly input)

Current state
Next-state
function
Next
state
Clock
Inputs
Output
function

Outputs
There are two types of FSM according to the output
function
Multi-cycle CPU
CS465
11
D. Barbara
Moore Machine vs. Mealy Machine

Moore machine


The output function depends on the current state
Mealy machine

The output function depends on both the current state
and the current input
Current state
Next-state
function
Next
state
Clock
Inputs
Output
function
Outputs
We will use a Moore machine
 Refer to Appendix B.10 (page B-67)

Multi-cycle CPU
CS465
12
D. Barbara
Multicycle Approach

Break up the instructions into steps, each step takes
a cycle



Balance the amount of work to be done
Restrict each cycle to use only one major functional unit
At the end of a cycle


PC
Store values for use in later cycles
Introduce additional “internal” registers
0
M
u
x
1
Address
Memory
Instruction
[25– 21]
Read
register 1
Instruction
[20– 16]
Read
Read
register 2 data 1
Registers
Write
Read
register data 2
MemData
Instruction
[15– 0]
Write
data
Instruction
register
Instruction
[15– 0]
Memory
data
register
Multi-cycle CPU
0
M
Instruction u
x
[15– 11]
1
A
B
0
M
u
x
1
Sign
extend
CS465
32
Zero
ALU ALU
result
ALUOut
0
4
Write
data
16
0
M
u
x
1
1 M
u
2 x
3
Shift
left 2
13
D. Barbara
Changes to Datapath

A single memory unit is used to store both
instructions and data
 Memory
address specified by PC or ALUout
 Need a new multiplexor (IorD) to control
whether the memory is being accessed to
read an instruction or read/write data (for
lw/sw)
Instruction
register
PC
Data
Address
A
Register #
Instruction
Memory
or data
Data
Multi-cycle CPU
ALU
Registers
ALUOut
Register #
Memory
data
register
B
Register #
CS465
14
D. Barbara
Changes to Datapath

New internal registers added to support multi-cycle
operation



Store data output by functional elements for use in
subsequent cycles
Memory access: Instruction register (IR), memory data
register (MDR); register access:A, B; ALU op: ALUOut
All except IR hold data only between adjacent cycles

Only IR needs to hold the instruction until the end of execution of
that instruction, hence need a write control
Instruction
register
PC
Data
Address
A
Memory
Data
Multi-cycle CPU
Register #
Instruction
or data
ALU
Registers
ALUOut
Register #
Memory
data
register
B
Register #
CS465
15
D. Barbara
Changes to Datapath

ALU used for all arithmetic (including computing branch
target address and PC+4)


Need a new multiplexor for top input to ALU (ALUSrcA): choose
between PC and register A
Bottom input multiplexor (ALUSrcB) to ALU extended to have four
inputs (instead of two in single cycle datapath): register B, signextended immediate, address offset (sign-extended and shifted),
constant 4
Instruction
register
PC
Data
Address
A
Register #
Instruction
Memory
or data
Data
Multi-cycle CPU
ALU
Registers
ALUOut
Register #
Memory
data
register
B
Register #
CS465
16
D. Barbara
Multicycle Datapath with Control
Figure 5.27
Multi-cycle CPU
CS465
17
D. Barbara
Multicycle Datapath & Control
PCWriteCond
PCSource
PCWrite
ALUOp
IorD Outputs
ALUSrcB
MemRead
ALUSrcA
Control
MemWrite
RegWrite
MemtoReg
Op
RegDst
IRWrite
[5– 0]
0
M
26
Instruction [25– 0]
PC
0
M
u
x
1
Shift
left 2
Instruction
[31-26]
Address
Memory
MemData
Write
data
Instruction
[25– 21]
Read
register 1
Instruction
[20– 16]
Read
Read
register 2 data 1
Registers
Write
Read
register data 2
Write
data
Instruction
[15– 0]
Instruction
register
Instruction
[15– 0]
Memory
data
register
0
M
Instruction u
x
[15– 11]
1
B
4
0
M
u
x
1
16
Sign
extend
32
Shift
left 2
1 u
x
2
PC [31-28]
0
M
u
x
1
A
28
Jump
address [31-0]
Zero
ALU ALU
result
ALUOut
0
1 M
u
2 x
3
ALU
control
Instruction [5– 0]

With support to jump and branch
Multi-cycle CPU
CS465
Figure 5.28
18
D. Barbara
Five Execution Steps





Instruction fetch
Instruction decode and register fetch
Execution, memory address computation, or
branch completion
Memory access or R-type instruction completion
Write-back step
INSTRUCTIONS TAKE FROM 3 - 5 CYCLES!
Multi-cycle CPU
CS465
19
D. Barbara
High Level View of FSM Control
Figure 5.31
Multi-cycle CPU
CS465
20
D. Barbara
Step 1: Instruction Fetch

Operations
Use PC to get instruction and put it in the Instruction
Register
 Increment the PC by 4 and put the result back in PC
 Described in RTL (Register-Transfer Language)

IR = Memory[PC];
PC = PC + 4;

Control signals
What is the advantage of
updating the PC now?
Assert MemRead, IRwrite
 Set IorD to 0  select PC as the source of the
address
 Increment PC by 4: setting ALUSrcA to 0 (sending PC
to ALU), ALUSrcB to 01 (sending 4), ALUOp  00 (for
add), store PC+4 to PC  PCSource 00, and PCWrite
1
21

Multi-cycle CPU
CS465
D. Barbara
Step 2: Instr. Decode & Reg. Fetch

Operations
Read registers rs and rt in case we need them
 Compute the branch address in case the instruction is
a branch
 RTL:
A = Reg[IR[25-21]];
B = Reg[IR[20-16]];
ALUOut = PC + (sign-extend(IR[15-0]) << 2);


Control signals
We are not setting any control lines based on the
instruction type yet (we are busy "decoding" it in our
control logic)
 Set ALUSrcA to 0 so that PC is sent to ALU, ALUSrcB
to 11 so that sign-extended and shifted offset to ALU

Multi-cycle CPU
CS465
22
D. Barbara
Instruction Fetch & Decode
Instruction decode/
Register fetch
Instruction fetch
0
1
ALUSrcA = 0
ALUSrcB = 11
ALUOp = 00
(Op = 'JMP')
Start
MemRead
ALUSrcA = 0
IorD = 0
IRWrite
ALUSrcB = 01
ALUOp = 00
PCWrite
PCSource = 00
Memory reference FSM
(Figure 5.33)
Multi-cycle CPU
R-type FSM
(Figure 5.34)
Branch FSM
(Figure 5.35)
CS465
Jump FSM
(Figure 5.36)
23
D. Barbara
Step 3: Execution

ALU is performing one of three functions, based
on instruction type
Need to set ALUSrcA, ALUSrcB, ALUOp
 Memory reference (LW/SW)
ALUOut = A + sign-extend(IR[15-0]);
 R-type:
ALUOut = A op B;
 Branch:
if (A==B) PC = ALUOut;



Need to set PCWriteCond, PCSource, use Zero
Jump
PC = PC[31:28]|(IR[25:0], 2’b00)
 Need to set PCWrite, PCSource
Multi-cycle CPU
CS465
24
D. Barbara
Step 4: R-type or Memory-access

Loads and stores access memory
MDR = Memory[ALUOut];
or
Memory[ALUOut] = B;


Need to set MemRead/MemWrite, IorD
R-type instructions finish
Reg[IR[15-11]] = ALUOut;

Need to set RegDst, RegWrite, MemtoReg
The write actually takes place at the end of the
cycle on the edge
Multi-cycle CPU
CS465
25
D. Barbara
Step 5: Write-back

For LW instruction:
Reg[IR[20-16]]= MDR;
 Need to set RegWrite, MemtoReg, RegDst
What about all the other instructions?
Multi-cycle CPU
CS465
26
D. Barbara
FSM for Memory Access Instr.
From state 1
(Op = 'LW') or (Op = 'SW')
Memory address computation
2
(Op = 'LW')
ALUSrcA = 1
ALUSrcB = 10
ALUOp = 00
Memory
access
3
Memory
access
5
MemRead
IorD = 1
MemWrite
IorD = 1
Write-back step
4
To state 0
(Figure 5.32)
RegWrite
MemtoReg = 1
RegDst = 0
Multi-cycle CPU
CS465
27
D. Barbara
FSM for R-format Instructions
From state 1
(Op = R-type)
Execution
6
ALUSrcA = 1
ALUSrcB = 00
ALUOp = 10
R-type completion
7
RegDst = 1
RegWrite
MemtoReg = 0
To state 0
(Figure 5.32)
Multi-cycle CPU
CS465
28
D. Barbara
FSM for Branch Instruction
From state 1
(Op = 'BEQ')
Branch completion
8
ALUSrcA = 1
ALUSrcB = 00
ALUOp = 01
PCWriteCond
PCSource = 01
To state 0
(Figure 5.32)
Multi-cycle CPU
CS465
29
D. Barbara
FSM for Jump
From state 1
(Op = 'J')
Jump completion
9
PCWrite
PCSource = 10
To state 0
(Figure 5.32)
Multi-cycle CPU
CS465
30
D. Barbara
Summary of Steps
Step name
Instruction fetch
Action for R-type
instructions
Instruction
decode/register fetch
Action for memory-reference
Action for
instructions
branches
IR = Memory[PC]
PC = PC + 4
A = Reg [IR[25-21]]
B = Reg [IR[20-16]]
ALUOut = PC + (sign-extend (IR[15-0]) << 2)
Execution, address
computation, branch/
jump completion
ALUOut = A op B
ALUOut = A + sign-extend
(IR[15-0])
Memory access or R-type
completion
Reg [IR[15-11]] =
ALUOut
Load: MDR = Memory[ALUOut]
or
Store: Memory [ALUOut] = B
Memory read completion
Multi-cycle CPU
if (A ==B) then
PC = ALUOut
Action for
jumps
PC = PC [31-28] II
(IR[25-0]<<2)
Load: Reg[IR[20-16]] = MDR
CS465
31
D. Barbara
Complete Finite State Machine
Instruction decode/
register fetch
Instruction fetch
Start
Memory address
computation
MemRead
ALUSrcA = 0
IorD = 0
IRWrite
ALUSrcB = 01
ALUOp = 00
PCWrite
PC Source = 00
6
ALUSrcA = 1
ALUSrcB = 10
ALUOp = 00
(Op = 'LW')
ALUSrcA = 0
ALUSrcB = 11
ALUOp = 00
Branch
completion
Execution
2
8
ALUSrcA = 1
ALUSrcB = 00
ALUOp = 10
Memory
access
3
1
Memory
access
5
MemRead
IorD = 1
(Op = 'J')
0
Jump
completion
9
ALUSrcA = 1
ALUSrcB = 00
ALUOp = 01
PC WriteCond
PCSource = 01
PCWrite
PC Source = 10
R-type completion
7
MemWrite
IorD = 1
RegDst = 1
RegWrite
MemtoReg = 0
Write-back step
4
R egDst = 0
RegWrite
MemtoReg = 1
Multi-cycle CPU
CS465
32
D. Barbara
Implementing the Control

Value of control signals is dependent upon:



Use the information we’ve accumulated to specify a finite
state machine



What instruction is being executed
Which step is being performed
Specify the finite state machine graphically, or
Use microprogramming
Implementation can be derived from specification
Multi-cycle CPU
CS465
33
D. Barbara
Finite State Machine for Control
Implementation:
PCW rite
PCW riteCond
IorD
M em Read
M em W rite
IRW rite
Control logic
M em toReg
PCS ource
ALUO p
Outputs
ALUSrcB
ALUSrcA
RegW rite
RegD st
NS3
NS2
NS1
NS0
Instruction register
opcode field
Multi-cycle CPU
S0
S1
S2
S3
Op0
Op1
Op2
Op3
Op4
Inputs
Op5

State register
CS465
34
D. Barbara
Simple Questions

How many cycles will it take to execute this
code?
Label:


lw $t2, 0($t3)
lw $t3, 4($t3)
beq $t2, $t3, Label #assume not taken
add $t5, $t2, $t3
sw $t5, 8($t3)
...
What is going on during the 8th cycle of
execution?
In what cycle does the actual addition of $t2 and
$t3 takes place?
Multi-cycle CPU
CS465
35
D. Barbara
Exceptions

Hardest part of control is to implement
exceptions & interrupts
Type of event
From where?
I/O device request
Invoke the operating
system from user
program
Arithmetic overflow
Using an undefined
instruction
Hardware
malfunctions
External
Internal
MIPS
terminology
Interrupt
Exception
Internal
Internal
Exception
Exception
Multi-cycle CPU
Internal or
External
CS465
Exception or
interrupt
36
D. Barbara
How Are Exceptions Handled?

In our design, we will consider two types of
exceptions
 Arithmetic
overflow
 Execution of an undefined instruction

Actions on exception
 Save
address of offending instruction in the
Exception Program Counter (EPC)
 Transfer control to the operating system at a
pre-specified address (exception handler)

OS then takes appropriate action
Multi-cycle CPU
CS465
37
D. Barbara
Exception Handling


For the OS to take appropriate action, it must
know the reason for the exception
Two ways to communicate reason to OS
Have a Status register which holds a field that
indicates the reason for the exception
 Vectored interrupts



Address to which control is transferred depends upon the
cause of the exception
MIPS uses first method above; it has a register
called Cause (in addition to the EPC register)
Undefined instruction =0
 Arithmetic overflow =1

Multi-cycle CPU
CS465
38
D. Barbara
CPU w/ Exception Support
CauseWrite
IntCause
EPCWrite
PCSource
ALUOp
PCWriteCond
PCWrite
IorD
Outputs
MemRead
MemWrite
ALUSrcB
Control
ALUSrcA
MemtoReg
IRWrite
RegWrite
Op
[5– 0]
RegDst
0
26
Instruction [25– 0]
PC
0
M
u
x
1
Shift
left 2
Instruction
[31-26]
Address
Memory
MemData
Read
register 1
Instruction
[20– 16]
Read
Read
register 2 data 1
Registers
Write
Read
register data 2
Write
data
Instruction
register
Instruction
[15– 0]
Memory
data
register
0
M
Instruction u
x
[15– 11]
1
B
4
Write
data
0
M
u
x
1
Zero
ALU ALU
result
Sign
extend
32
Shift
left 2
3
0
1M
u
2 x
3
ALUOut
0
1
16
u
x
PC [31-28]
0
M
u
x
1
A
1M
2
CO 00 00 00
Instruction
[25– 21]
Instruction
[15– 0]
28
Jump
address [31-0]
EPC
0
M
u
x
1
Cause
ALU
control
Instruction [5– 0]
Multi-cycle CPU
CS465
39
D. Barbara
Exception Handling

Datapath additions


Control signals




EPC, Cause (for undefined instruction, Cause = 0,
arithmetic overflow Cause = 1)
EPCWrite, CauseWrite
IntCause (sets the Cause register)
PCSrc has to be modified so that one of its sources is
the OS entry point
Three steps
1.
2.
3.
Write Cause
EPC = PC – 4 (Have to use ALU, so need to expand
multiplexors for ALUSrcA and ALUSrcB
Write PC
Multi-cycle CPU
CS465
40
D. Barbara
CPU w/ Exception Support
CauseWrite
IntCause
EPCWrite
PCSource
ALUOp
PCWriteCond
PCWrite
IorD
Outputs
MemRead
MemWrite
ALUSrcB
Control
ALUSrcA
MemtoReg
IRWrite
RegWrite
Op
[5– 0]
RegDst
0
26
Instruction [25– 0]
PC
0
M
u
x
1
Shift
left 2
Instruction
[31-26]
Address
Memory
MemData
Read
register 1
Instruction
[20– 16]
Read
Read
register 2 data 1
Registers
Write
Read
register data 2
Write
data
Instruction
register
Instruction
[15– 0]
Memory
data
register
0
M
Instruction u
x
[15– 11]
1
B
4
Write
data
0
M
u
x
1
Zero
ALU ALU
result
Sign
extend
32
Shift
left 2
3
0
1M
u
2 x
3
ALUOut
0
1
16
u
x
PC [31-28]
0
M
u
x
1
A
1M
2
CO 00 00 00
Instruction
[25– 21]
Instruction
[15– 0]
28
Jump
address [31-0]
EPC
0
M
u
x
1
Cause
ALU
control
Instruction [5– 0]
Multi-cycle CPU
CS465
41
D. Barbara
States for Handling Exceptions
11
IntCause = 1
CauseWrite
ALUSrcA = 0
ALUSrcB = 01
ALUOp = 01
EPCWrite
PCWrite
PCSource = 11
10
IntCause = 0
CauseWrite
ALUSrcA = 0
ALUSrcB = 01
ALUOp = 01
EPCWrite
PCWrite
PCSource = 11
Multi-cycle CPU
To state 0 to begin next instruction
CS465
42
D. Barbara
FSM including Exception Support
Instruction decode/
Register fetch
Instruction fetch
0
Memory address
computation
Branch
completion
Execution
2
6
(Op = 'LW')
ALUSrcA = 1
ALUSrcB = 00
ALUOp = 00
8
ALUSrcA = 1
ALUSrcB = 00
ALUOp = 10
Memory
access
Memory
access
5
MemRead
IorD = 1
ALUSrcA = 0
ALUSrcB = 11
ALUOp = 00
ALUSrcA = 1
ALUSrcB = 00
ALUOp = 01
PCWriteCond
PCSource = 01
R-type completion
7
MemWrite
IorD = 1
(Op = 'J')
Start
3
1
MemRead
ALUSrcA = 0
IorD = 0
IRWrite
ALUSrcB = 01
ALUOp = 00
PCWrite
PCSource = 00
11
RegDst = 1
RegWrite
MemtoReg = 0
Overflow
Jump
completion
9
PCWrite
PCSource = 10
IntCause = 1
CauseWrite
ALUSrcA = 0
ALUSrcB = 01
ALUOp = 01
EPCWrite
PCWrite
PCSource = 11
10
IntCause = 0
CauseWrite
ALUSrcA = 0
ALUSrcB = 01
ALUOp = 01
EPCWrite
PCWrite
PCSource = 11
Write-back step
4
Overflow
RegWrite
MemtoReg = 1
RegDst = 0
Multi-cycle CPU
CS465
43
D. Barbara
Exercise

Given the following instruction mix, what is
the CPI for the multi-cycle CPU, assuming
each step requires 1 clock cycle?
 Loads:
20%, stores: 10%, branches: 13%,
jumps: 2%, ALU: 55%

Idea
 Get
the cycles needed for every instruction
type: these are their individual CPIs
 Calculate the average CPI considering the
instruction mix (instruction frequency)
Multi-cycle CPU
CS465
44
D. Barbara
Cycles for Individual Instructions
Step name
Instruction fetch
Action for R-type
instructions
Instruction
decode/register fetch
Action for memory-reference
Action for
instructions
branches
IR = Memory[PC]
PC = PC + 4
A = Reg [IR[25-21]]
B = Reg [IR[20-16]]
ALUOut = PC + (sign-extend (IR[15-0]) << 2)
Execution, address
computation, branch/
jump completion
ALUOut = A op B
ALUOut = A + sign-extend
(IR[15-0])
Memory access or R-type
completion
Reg [IR[15-11]] =
ALUOut
Load: MDR = Memory[ALUOut]
or
Store: Memory [ALUOut] = B
Memory read completion




if (A ==B) then
PC = ALUOut
Action for
jumps
PC = PC [31-28] II
(IR[25-0]<<2)
Load: Reg[IR[20-16]] = MDR
Loads: 5, stores: 4
ALU instructions: 4
Branches: 3
Jumps: 3
Multi-cycle CPU
CS465
45
D. Barbara
Exercise

Given the following instruction mix, what is the
CPI for the multi-cycle CPU, assuming each step
requires 1 clock cycle?

Loads: 20%, stores: 10%, branches: 13%, jumps: 2%,
ALU: 55%

CPI for each of them: loads: 5, stores: 4, branches: 3,
jumps:3, ALU: 4

Average CPI = 0.205 + 0.104 + 0.133 + 0.023 +
0.554 = 4.05

Compare with the worst-case CPI of 5.0 if all
instructions take the same number of cycles
Multi-cycle CPU
CS465
46
D. Barbara
Summary

Multiple-cycle datapath and control




Break up the instructions into steps, each step takes a
cycle
Reduced clock cycle time; different instructions take varied
cycles to implement
Functional units can be reused
Typical steps





Instruction fetch
Instruction decode and register fetch
Execution: R-type, memory address computation, condition
check
Memory access or R-type write
Write back from memory
Multi-cycle CPU
CS465
47
D. Barbara
Next Lecture

Topic
 Pipelining

Reading
 Ch6
Patterson and Hennessy
Multi-cycle CPU
CS465
48
D. Barbara
Download