Appendix_D_Multicycle_Approach

Computer Organization
Multi-cycle Approach
Dr. Iyad Jafar
Adapted from Dr. Gheith Abandah slides
http://www.abandah.com/gheith/Courses/CPE335_S08/index.html
CPE232 Basic MIPS Architecture
Multicycle Datapath Approach

Let an instruction take more than one clock cycle to complete.

Break up instructions into steps such that
- each step takes one cycle, while trying to balance the amount of work done in each step
- each cycle uses only one major functional unit, unless units are used in parallel

Not every instruction takes the same number of clock cycles.

In addition to a faster clock rate, the multicycle approach allows a functional unit to be used more than once per instruction, as long as it is used on different clock cycles. As a result
- only one memory is needed, but only one memory access per cycle is possible
- only one ALU/adder is needed, but only one ALU operation per cycle is possible
Multicycle Datapath Approach, con’t
At the end of a cycle
- Store values needed in a later cycle by the current instruction in internal registers (A, B, IR, MDR, and ALUout). These registers are invisible to the programmer.
- All of these registers, except IR, hold data only between a pair of adjacent clock cycles, so they do not need a write control signal. IR must hold the instruction until the end of its execution.
- Data used by subsequent instructions are stored in programmer-visible state (i.e., register file, PC, or memory).

Internal registers
- IR: Instruction Register
- MDR: Memory Data Register
- A, B: register file read data registers
- ALUout: ALU output register

[Figure: multicycle datapath with PC, Memory, IR, MDR, Register File, A, B, ALU, and ALUout]
Multicycle Datapath Approach, con’t
- As in the single-cycle design, shared functional units need multiplexers at their inputs.
- There is only one adder (the ALU), which is used to update the PC, perform ALU operations, do the comparison for beq, and compute memory addresses and branch addresses.
Multicycle Datapath Approach- Control Signals
The Multicycle Datapath with Control Signals
[Figure: the multicycle datapath with control signals. The control unit decodes Instr[31-26] and drives PCWrite, PCWriteCond, IorD, MemRead, MemWrite, MemtoReg, IRWrite, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, and RegDst. The ALU control takes Instr[5-0] and ALUOp. The jump address is formed from PC[31-28] and Instr[25-0] shifted left by 2. Instr[15-0] is sign-extended to 32 bits and optionally shifted left by 2.]
Multicycle Machine: 1-bit Control Signals
Each signal is listed with its effect when deasserted and when asserted.

RegDst
- Deasserted: the destination register number comes from the rt field
- Asserted: the destination register number comes from the rd field
RegWrite
- Deasserted: none
- Asserted: write is enabled to the selected destination register
ALUSrcA
- Deasserted: the first ALU operand is the PC
- Asserted: the first ALU operand is register A
MemRead
- Deasserted: none
- Asserted: the content of the addressed memory location is placed on Memory data out
MemWrite
- Deasserted: none
- Asserted: the memory location specified by the address is replaced by the value on the Write data input
MemtoReg
- Deasserted: the value fed to the register file is from ALUOut
- Asserted: the value fed to the register file is from memory
IorD
- Deasserted: the PC is used as the address to the memory unit
- Asserted: ALUOut is used to supply the address to the memory unit
IRWrite
- Deasserted: none
- Asserted: the output of memory is written into IR
PCWrite
- Deasserted: none
- Asserted: the PC is written; the source is controlled by PCSource
PCWriteCond
- Deasserted: none
- Asserted: the PC is written if the Zero output from the ALU is also active
Multicycle Machine: 2-bit Control Signals
Each signal is listed with the effect of each value.

ALUOp
- 00: ALU performs add operation
- 01: ALU performs subtract operation
- 10: the funct field of the instruction determines the ALU operation
ALUSrcB
- 00: the second input to the ALU comes from register B
- 01: the second input to the ALU is 4 (to increment PC)
- 10: the second input to the ALU is the sign-extended offset (lower 16 bits of IR)
- 11: the second input to the ALU is the sign-extended lower 16 bits of IR, shifted left by two bits
PCSource
- 00: the output of the ALU (PC + 4) is sent to the PC for writing
- 01: the content of ALUOut is sent to the PC for writing (branch address)
- 10: the jump address is sent to the PC for writing
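The two ALU operand selections described by ALUSrcA and ALUSrcB can be modeled as small multiplexer functions. The following is only an illustrative Python sketch; the function names `sign_extend16`, `alu_src_a`, and `alu_src_b` are hypothetical and do not come from the slides.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def alu_src_a(alusrca, pc, a):
    """1-bit mux: 0 selects the PC, 1 selects register A."""
    return a if alusrca else pc


def alu_src_b(alusrcb, b, ir):
    """2-bit mux for the second ALU operand (hypothetical model)."""
    offset = sign_extend16(ir)  # lower 16 bits of IR, sign-extended
    if alusrcb == 0b00:
        return b                # register B
    if alusrcb == 0b01:
        return 4                # constant 4, used for PC + 4
    if alusrcb == 0b10:
        return offset           # memory address offset
    if alusrcb == 0b11:
        return offset << 2      # branch offset in bytes
    raise ValueError("ALUSrcB is a 2-bit signal")
```

For example, during instruction fetch (ALUSrcA = 0, ALUSrcB = 01) the ALU adds `alu_src_a(0, pc, a)` and `alu_src_b(0b01, b, ir)`, which is PC + 4.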
Breaking Instruction Execution into Clock Cycles
Cycle 1   Cycle 2   Cycle 3   Cycle 4   Cycle 5
IFetch    Dec       Exec      Mem       WB
1. IFetch: Instruction fetch and PC update (same for all instructions)
Operations
1.1 Instruction fetch:
IR <= Memory[PC]
1.2 Update PC:
PC <= PC + 4
Control signal values
- IorD = 0, MemRead = 1, IRWrite = 1
- ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCWrite = 1, PCSource = 00
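The fetch step can be sketched in Python as a function that computes both register updates from the old PC, mirroring the two `<=` transfers that happen at the same clock edge. This is a minimal model, not the slides' implementation: the `fetch` function, the dictionary-based `state`, and the word-addressed `memory` dictionary are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF  # keep values within 32 bits


def fetch(state, memory):
    """Step 1: IR <= Memory[PC]; PC <= PC + 4 (both read the old PC)."""
    new_state = dict(state)
    new_state["IR"] = memory[state["PC"]]         # IorD = 0, MemRead = 1, IRWrite = 1
    new_state["PC"] = (state["PC"] + 4) & MASK32  # ALUSrcA = 0, ALUSrcB = 01, PCWrite = 1
    return new_state


# Hypothetical usage: one instruction word stored at address 0.
memory = {0: 0x8C220004}
state = fetch({"PC": 0, "IR": 0}, memory)
```

After the call, `state["IR"]` holds the word read from address 0 and `state["PC"]` is 4.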
Breaking Instruction Execution into Clock Cycles
2. Decode: Instruction decode and register fetch (same for all instructions)
The instruction is not known yet, so perform only operations that are harmless for every instruction.
Operations
2.1 Read the two source registers rs and rt and place them in registers A and B, respectively.
A <= Reg[IR[25:21]]
B <= Reg[IR[20:16]]
2.2 Compute the branch address
ALUOut <= PC + (sign-extend(IR[15:0]) << 2)
Control signal values
- ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
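As a rough illustration of the decode step, the field extraction and the speculative branch address computation can be written in Python as follows. The `decode` function and its argument layout are hypothetical; `pc` is assumed to already hold PC + 4 from the fetch step.

```python
def decode(ir, pc, regs):
    """Step 2: latch A and B and compute the branch address into ALUOut."""
    rs = (ir >> 21) & 0x1F          # IR[25:21]
    rt = (ir >> 16) & 0x1F          # IR[20:16]
    imm = ir & 0xFFFF               # IR[15:0]
    if imm & 0x8000:                # sign-extend the 16-bit offset
        imm -= 0x10000
    a = regs[rs]                    # A <= Reg[IR[25:21]]
    b = regs[rt]                    # B <= Reg[IR[20:16]]
    alu_out = (pc + (imm << 2)) & 0xFFFFFFFF  # ALUOut <= PC + (offset << 2)
    return a, b, alu_out
```

The branch address is computed for every instruction, whether or not it turns out to be a branch. This is harmless because ALUOut is simply overwritten in the next step when it is not needed.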
Breaking Instruction Execution into Clock Cycles
3. Execution, memory address computation, or branch completion
The operation in this cycle depends on the instruction type.
Operations
* if memory reference, compute address
ALUOut <= A + sign-extend(IR[15:0])
ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
* if arithmetic-logic instruction, perform operation
ALUOut <= A op B
ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
Breaking Instruction Execution into Clock Cycles
3. Execution, memory address computation, or branch completion (continued)
The operation in this cycle depends on the instruction type.
Operations
* if branch instruction
if (A == B) PC <= ALUOut
ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01,
PCWriteCond = 1, PCSource = 01
* if jump instruction
PC <= {PC[31:28], IR[25:0], 2'b00}
PCSource = 10, PCWrite = 1
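The branch and jump completions above can be sketched in Python. The function names `branch_complete` and `jump_target` are hypothetical. The branch comparison is modeled the way the datapath does it, by subtracting B from A in the ALU and testing the Zero output.

```python
def branch_complete(pc, a, b, alu_out):
    """beq: PC <= ALUOut if A == B, otherwise the PC is left unchanged."""
    zero = ((a - b) & 0xFFFFFFFF) == 0   # ALUOp = 01 (subtract), Zero output
    return alu_out if zero else pc       # PCWriteCond = 1, PCSource = 01


def jump_target(pc, ir):
    """j: PC <= {PC[31:28], IR[25:0], 2'b00}."""
    upper = pc & 0xF0000000              # PC[31:28]
    lower = (ir & 0x03FFFFFF) << 2       # IR[25:0] shifted left by 2
    return upper | lower
```

Here `alu_out` is the branch address computed during the decode step, and `pc` is the already incremented PC.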
Breaking Instruction Execution into Clock Cycles
4. Memory access or R-type completion
The operation in this cycle depends on the instruction type.
Operations
* if load instruction: read value from memory into MDR
MDR <= Memory[ALUOut]
MemRead = 1, IorD = 1
* if store instruction: store rt into memory
Memory[ALUOut] <= B
MemWrite = 1, IorD = 1
* if arithmetic-logical instruction: write ALU result into rd
Reg[IR[15:11]] <= ALUOut
MemtoReg = 0, RegDst = 1, RegWrite = 1
Breaking Instruction Execution into Clock Cycles
5. Memory read completion
Needed for the load instruction only.
Operations
5.1 Store the loaded value in MDR into rt
Reg[IR[20:16]] <= MDR
RegWrite = 1, MemtoReg = 1, RegDst = 0
Breaking Instruction Execution into Clock Cycles

In this implementation, not all instructions take 5 cycles.

Instruction Class     Clock Cycles Required
Load                  5
Store                 4
Branch                3
Arithmetic-logical    4
Jump                  3
Multicycle Performance

Compute the average CPI of the multicycle implementation for a SPECINT2000 program with the following instruction mix: 25% loads, 10% stores, 11% branches, 2% jumps, 52% ALU. Assume the CPI for each instruction class as given in the previous table.

CPI = Σ (CPI_i x IC_i) / IC
    = 0.25 x 5 + 0.10 x 4 + 0.11 x 3 + 0.02 x 3 + 0.52 x 4
    = 4.12

Compare this to CPI = 1 for the single-cycle implementation.
- Assume CC_M = (1/5) CC_S, where CC_M and CC_S are the multicycle and single-cycle clock cycle times.
- Then
Performance_M / Performance_S = (IC x 1 x CC_S) / (IC x 4.12 x (1/5) CC_S)
                              = 5 / 4.12
                              = 1.21
- Multicycle is also cost-effective in terms of hardware.
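The CPI and speedup arithmetic above can be checked with a few lines of Python. The instruction mix and per-class cycle counts are taken from the slides; the function names `average_cpi` and `speedup_multi_over_single` are hypothetical, and the clock ratio of 1/5 is the idealized assumption stated in the slide.

```python
# Instruction mix and cycles per class, as given in the slides.
mix = {"load": 0.25, "store": 0.10, "branch": 0.11, "jump": 0.02, "alu": 0.52}
cycles = {"load": 5, "store": 4, "branch": 3, "jump": 3, "alu": 4}


def average_cpi(mix, cycles):
    """Weighted average: sum of (fraction of instructions x cycles per class)."""
    return sum(mix[k] * cycles[k] for k in mix)


def speedup_multi_over_single(cpi_multi, cc_ratio=1 / 5):
    """Performance_M / Performance_S with CPI_S = 1 and CC_M = cc_ratio x CC_S."""
    return 1.0 / (cpi_multi * cc_ratio)


cpi = average_cpi(mix, cycles)
speedup = speedup_multi_over_single(cpi)
print(round(cpi, 2), round(speedup, 2))
```

With a less favorable clock ratio (for example `cc_ratio=1/4`), the same function shows the multicycle design falling behind the single-cycle one, since 1 / (4.12 x 0.25) is less than 1.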
Multicycle Control Unit


Multicycle datapath control signals are not determined solely by the bits in the instruction.
- e.g., the opcode bits tell what operation the ALU should be doing, but not which instruction cycle is to be done next
- Since the instruction is broken into multiple cycles, we need to know what we did in the previous cycle(s) in order to determine the current action

Must use a finite state machine (FSM) for control
- a set of states (current state stored in the State Register)
- a next state function (determined by the current state and the input)
- an output function (determined by the current state and the input)
[Figure: FSM control block diagram. The combinational control logic takes the instruction opcode and the current state from the State Reg as inputs, and produces the datapath control points and the next state.]
The States of the Control Unit

- 10 states are required in the FSM control.
- The sequence of states is determined by the five steps of execution and the instruction.
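A next-state function for such a 10-state controller might look like the Python sketch below. The state names and the instruction class labels are hypothetical. The split into ten states is an assumption that follows the five execution steps described earlier: two shared states, followed by per-instruction states.

```python
# Hypothetical names for the ten states.
STATES = ["Fetch", "Decode", "MemAddr", "MemRead", "WriteBack",
          "MemWrite", "Execute", "RTypeDone", "BranchDone", "JumpDone"]


def next_state(state, op_class):
    """Next state as a function of the current state and the instruction class."""
    if state == "Fetch":
        return "Decode"
    if state == "Decode":
        return {"load": "MemAddr", "store": "MemAddr", "alu": "Execute",
                "branch": "BranchDone", "jump": "JumpDone"}[op_class]
    if state == "MemAddr":
        return "MemRead" if op_class == "load" else "MemWrite"
    if state == "MemRead":
        return "WriteBack"
    if state == "Execute":
        return "RTypeDone"
    return "Fetch"  # every final state returns to instruction fetch


def cycles_for(op_class):
    """Count the states visited before control returns to Fetch."""
    state, n = "Fetch", 0
    while True:
        n += 1
        state = next_state(state, op_class)
        if state == "Fetch":
            return n
```

Walking the state machine with `cycles_for` reproduces the cycle counts of the instruction class table: 5 for loads, 4 for stores and arithmetic-logical instructions, and 3 for branches and jumps.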
The Control Unit
1. Logic gates
- inputs: present state + opcode, #bits = 10
- outputs: control + next state, #bits = 20
- truth table size = 2^10 rows x 20 columns
2. ROM
- Can be used to implement the truth table above (2^10 x 20 bits = 20 Kbit)
- Each location stores the control signal values and the next state
- Each location is addressed by the opcode and the present state value
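The bit counts behind the ROM size can be reproduced from the control signal tables given earlier. The short Python sketch below is illustrative only; the variable names are hypothetical, and it assumes a 6-bit opcode and a state register just wide enough to encode the 10 states.

```python
# Control signals as listed in the 1-bit and 2-bit signal tables.
ONE_BIT = ["RegDst", "RegWrite", "ALUSrcA", "MemRead", "MemWrite",
           "MemtoReg", "IorD", "IRWrite", "PCWrite", "PCWriteCond"]
TWO_BIT = ["ALUOp", "ALUSrcB", "PCSource"]

NUM_STATES = 10
OPCODE_BITS = 6                                        # Instr[31-26]

state_bits = (NUM_STATES - 1).bit_length()             # bits needed for states 0..9
address_bits = OPCODE_BITS + state_bits                # ROM inputs
control_bits = len(ONE_BIT) + 2 * len(TWO_BIT)         # control outputs
word_bits = control_bits + state_bits                  # ROM outputs (control + next state)
rom_bits = (2 ** address_bits) * word_bits             # total ROM capacity in bits

print(address_bits, word_bits, rom_bits // 1024, "Kbit")
```

Most of the 2^10 locations are never addressed or hold duplicate contents, because many opcode and state combinations cannot occur. This is one reason a plain ROM is wasteful for this purpose.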
Micro-programmed Control Unit

- ROM implementation is vulnerable to bugs and expensive, especially for a complex CPU. Size increases as the number and complexity of instructions (states) increase.
- Use microprogramming
- The next state value may not be sequential
- Generate the next state outside the storage element
- Each state is a microinstruction, and the signals are specified symbolically
- Use labels for sequencing
Sequencer
Microprogram

The microassembler converts the microcode into actual signal values

The sequencing field is used along with the opcode to determine the
next state
Multicycle Advantages & Disadvantages

Uses the clock cycle efficiently: the clock cycle is timed to accommodate the slowest instruction step.

[Figure: multicycle timing. lw takes cycles 1-5 (IFetch, Dec, Exec, Mem, WB), sw takes cycles 6-9 (IFetch, Dec, Exec, Mem), and the R-type instruction starts with IFetch in cycle 10.]

Multicycle implementations allow functional units to be used more than once per instruction, as long as they are used on different clock cycles

but

Requires additional internal state registers, more muxes, and more complicated (FSM) control.
Single Cycle vs. Multiple Cycle Timing
Single Cycle Implementation:
[Figure: lw occupies Cycle 1 and sw occupies Cycle 2; the part of Cycle 2 that sw does not need is marked "Waste".]

Multiple Cycle Implementation:
[Figure: lw takes cycles 1-5 (IFetch, Dec, Exec, Mem, WB), sw takes cycles 6-9 (IFetch, Dec, Exec, Mem), and the R-type instruction starts with IFetch in cycle 10.]

Note: the multicycle clock period is longer than 1/5th of the single cycle clock period, due to state register overhead.