# Chapter04

```Computer Organization & Design 5th.
Chapter 4
The Processor: Datapath and Control

ROBERT CHEN
SHU-TE UNIVERSITY CSIE DEPT.
Outline
• Introduction
• Logic Design Conventions
• Building a Datapath
• A Simple Implementation Scheme
• A Multicycle Implementation
• Exceptions

Introduction
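• Computer performance is determined by three factors:
– Instruction count
– Clock cycles per instruction (CPI)
• Integer, arithmetic-logical, memory-reference, and branch instructions
– Clock cycle time
• The compiler and the instruction set architecture (ISA) determine the instruction count a program requires
• The clock cycle time and the CPI, however, are determined by the processor implementation
• In this chapter we examine two different ways of implementing the MIPS instructions:
– Single-cycle implementation
– Multicycle implementation
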
Introduction
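• The functional units of a MIPS implementation are built from two kinds of logic elements:
– Elements that operate on data values
• Example: the ALU
• Combinational (the output depends only on the current inputs)
– Elements that contain state
• Example: memories and the register file
• Sequential (the output depends on both the inputs and the internal state)
– Sequential logic
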
Introduction
• Stages of instruction execution

Introduction
• We're ready to look at an implementation of the MIPS
• Simplified to contain only:
– memory-reference instructions:
• lw, sw
– arithmetic-logical instructions:
• add, sub, and, or, slt
– control flow instructions:
• beq, j
Introduction
• State Elements
– Unclocked vs. Clocked
– Clocks used in synchronous logic
• when should an element that contains state be updated?
[Figure: clock waveform showing the rising edge, the falling edge, and the cycle time]

Introduction
• An unclocked state element
– The set-reset latch
• output depends on present inputs and also on past inputs
• Latches and Flip-flops
– Latches and flip-flops are the simplest memory elements.
– Output is equal to the stored value inside the element
(don't need to ask for permission to look at the value)
– Change of state (value) is based on the clock
– Latch: state changes whenever the inputs change and the clock is asserted
– Flip-flop: state changes only on a clock edge
(edge-triggered methodology)
• A clocking methodology defines when signals can be read and written
– We would not want to read a signal at the same time it is being written

Introduction
• D-latch
– Two inputs:
• the data value to be stored (D)
• the clock signal (C) indicating when to read and store D
– Two outputs:
• the value of the internal state (Q) and its complement
– When the latch is open (C asserted), the value of Q changes as D changes
(transparent latch).
[Figure: D latch circuit and timing diagram with signals D, C, and Q]

Introduction
• D flip-flop
– Flip-flops are not transparent
– Output changes only on the clock edge
– The first latch, called the master, is open and follows the input D when C is
asserted. When the clock input falls, the first latch closes, but the second
latch, called the slave, opens and gets its input from the output of the
master latch.
[Figure: D flip-flop built from two D latches (master and slave), with a timing diagram of D, C, and Q]

Introduction
• Set-up time and Hold time
– Set-up time: the minimum time that the input must remain valid
before the clock edge
– Hold time: the minimum time that the input must be valid after the
clock edge (usually very small)
[Figure: timing diagram of D and C showing the set-up time and the hold time around the clock edge]

Introduction
• An edge-triggered methodology
– Determines when signals are read and when they are written
• Typical execution:
– read the contents of some state elements,
– send the values through some combinational logic,
– write the results to one or more state elements
[Figure: state element 1 feeding combinational logic feeding state element 2 within one clock cycle]

Introduction
• Register File
– A register file consists of a set of registers that can be read and written by
supplying the number of the register to be accessed.
– Built using D flip-flops and decoders (to select the register number)
– Read part (left): supply a register number as input, and the output is the
information stored in that register.
– A register file with 2 read ports and 1 write port (right)
[Figure: left, read logic with two multiplexors selecting among Register 0 to Register n; right, register file block with two read ports and one write port]

Introduction
• Register File
– Write part: need 3 inputs: a register number, the data to write, and a clock that
controls the writing into the register.
– Note: we still use the real clock to determine when to write
[Figure: write port of the register file. A decoder driven by the register number, gated with the Write signal, enables the clock input C of one of Register 0 to Register n; the register data feeds every D input]

Introduction
• Simple Implementation
– Basic components:
• two state elements, the instruction memory and the program counter (PC),
are needed to store and access instructions.
– Since the instruction memory is read-only, we can treat it as combinational
logic.
[Figure: a. Instruction memory; b. Program counter]

Introduction
• Fetching instructions and incrementing the PC
– The portion of the datapath used for fetching instructions and
incrementing the program counter
– After the PC supplies the address used to read the instruction, an adder increments the PC by 4
[Figure: PC, instruction memory, and an adder with constant input 4]

Introduction
• R-Format ALU operations
– R-format instruction has 3 register operands, 2 read and 1 write
– E.g. add \$t0, \$t1, \$t2
– Register numbers are 5 bits wide to address 32 registers, data buses are 32 bits wide, and
the ALU control has 4 bits
[Figure: a. Registers (two 5-bit read register numbers, a 5-bit write register number, write data, RegWrite, and two read data outputs); b. ALU (4-bit ALU control, Zero and ALU result outputs)]

Introduction
• Datapath for R-type Instruction
– Eg. add \$t0, \$t1, \$t2
[Figure: R-type datapath. The instruction supplies read register 1, read register 2, and the write register to the register file (RegWrite); read data 1 and read data 2 feed the ALU (4-bit ALU operation, Zero output); the ALU result returns as write data]

Introduction
• Load and store instructions
– Compute a memory address by adding the base register to the 16-bit signed offset field contained in the instruction
– The sign-extension unit extends the 16-bit value to 32 bits by replicating
the high-order sign bit into the upper 16 bits
– E.g. lw \$t0, 40(\$t1)
sw \$t0, 32(\$t1)
[Figure: a. Data memory unit (address, write data, read data, MemWrite); b. Sign-extension unit (16 bits in, 32 bits out)]

Introduction
• Datapath for load and store instructions
– Load and store operations in the datapath
• The registers are read first, and then the memory address is calculated.
• Memory is then accessed.
• For a load instruction, there is also a write to the register file.
lw \$t0, 40(\$t1)
sw \$t0, 32(\$t1)
[Figure: load/store datapath. The register file supplies the base register (\$t1) to the ALU, the 16-bit offset (40) is sign-extended to 32 bits as the second ALU input, the ALU result addresses the data memory, and read data 2 (\$t0) is the memory write data]

Introduction
• Branch instruction (beq)
– Branch datapath
• Needs to compute the branch target address
– PC+4 is the address of the next instruction
– The sign-extended offset field is shifted left two bits to make a word offset
and is added to PC+4.
– For a jump, the target is instead formed as PC[31:28] : offset[25:0] : 00
• Needs to compare register contents
– E.g. beq \$t1, \$t2, offset
[Figure: branch datapath. PC+4 from the instruction datapath and the sign-extended offset shifted left 2 feed an adder that produces the branch target; the ALU compares the two registers and sends Zero to the branch control logic]

Introduction
• Putting the pieces together
– Use multiplexors (MUX), also called data selectors, to combine the datapaths for R-type instructions and memory instructions

Introduction
• Putting the pieces together
– Add the instruction-fetch portion of the datapath

Introduction
• Putting the pieces together
– Add the branch portion of the datapath
– Branch target address = offset in the instruction (shifted left 2) + address of the instruction after the branch (PC+4)

Introduction
• All done?
– The hardest part is designing the control unit

A Simple Implementation Scheme
• This simple implementation covers:
– Load word (lw) and store word (sw)
– Branch on equal (beq)
– ALU instructions: add, sub, and, or, and set on less than (slt)
• Depending on the instruction class, the ALU must perform one of the following:
– Addition, to compute the memory address for lw and sw
– Subtraction, for branch on equal
– AND, OR, subtract, add, or slt for R-type instructions (selected by the 6-bit funct field)
• ALU control input
– 0000: AND
– 0001: OR
– 0010: add
– 0110: subtract
– 0111: set on less than
– 1100: NOR (for other MIPS instructions)
[Figure: ALU symbol with inputs a and b, a 4-bit ALU operation input, and outputs Zero, Result, Overflow, and CarryOut]

A Simple Implementation Scheme
• Purpose
– Selecting the operations to perform (ALU, read/write, etc.)
– Controlling the flow of data (multiplexor inputs)
• How you get these control signals:
– Information comes from the 32 bits of the instruction
Instruction format:
op      rs     rt     rd     shamt  funct
000000  10001  10010  01000  00000  100000
• The ALU's operation is based on the instruction type and the function code

What Control Signals Do We Need?
Design Method for Control
• Multi-level control (decoding)
• Instruction opcode: main control unit (first level)
– ALU control
• Sub-control for arithmetic
– MUX control
• Which source registers and destination registers
• ALU input source
• Input source of destination register
• Input source of PC
– Result for first level
• Seven 1-bit control lines
• 2-bit ALUOp control signal
• The above control signals can be set based solely on the opcode field of
the instruction
– Exception: PCSrc (depends on the beq result)

A Simple Implementation Scheme
• The ALU control bits are determined by the ALUOp control bits
• ALUOp identifies the instruction class

Instruction   ALUOp  Operation         Funct field  ALU action        ALU control input
LW            00     load word         XXXXXX       add               0010
SW            00     store word        XXXXXX       add               0010
Branch equal  01     branch equal      XXXXXX       subtract          0110
R-type        10     add               100000       add               0010
R-type        10     subtract          100010       subtract          0110
R-type        10     AND               100100       and               0000
R-type        10     OR                100101       or                0001
R-type        10     set on less than  101010       set on less than  0111

ALU Control
• ALU Control
– Instructions using the ALU
• Load/store: lw \$t1, offset(\$t2)
– Branch eq: beq \$t1, \$t2, offset
• Subtract for comparison
• ‘taken’ or ‘not taken’
– R-type
• and/or
• set-on-less-than
[Figure: ALU control block with the 2-bit ALUOp and the 6-bit function field as inputs and the 4-bit ALU operation as output to the ALU]

ALU Control
• Multi-level control (decoding)
– Instruction opcode: main control unit (first level)
00 = lw, sw
01 = beq
10 = arithmetic
• Second level: the function code selects the arithmetic operation (sub-control)
– The main control unit generates the ALUOp bits as inputs to the ALU control unit
– Reduces the size of the main control but may increase the delay

ALU Control
• Truth table
– X : don’t care term
– All zeros or don’t care terms are eliminated
Input                                         | Output
ALUOp1  ALUOp0 | F5  F4  F3  F2  F1  F0       | Operation
0       0      | X   X   X   X   X   X        | 0010
X       1      | X   X   X   X   X   X        | 0110
1       X      | X   X   0   0   0   0        | 0010
1       X      | X   X   0   0   1   0        | 0110
1       X      | X   X   0   1   0   0        | 0000
1       X      | X   X   0   1   0   1        | 0001
1       X      | X   X   1   0   1   0        | 0111

Notes:
1. ALUOp currently has no '11' entry.
2. F5 F4 is always '10' in the funct field, so it is shown as 'XX'.

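Designing the Main Control Unit
• Instruction format
– Op field: Op[5:0], bits 31:26 of the instruction
– For R-type, branch on equal (beq), and store instructions, the two registers to read are given by rs and rt (bits 25:21 and 20:16)
– Base register for load and store instructions: bits 25:21 (rs)
– 16-bit offset for beq, load, and store instructions: bits 15:0
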
A Simple Implementation Scheme
• Seven single-bit control lines, one 2-bit ALUOp control signal
• Except for PCSrc, the control signal can be set solely based on the
opcode field of the instruction.
• To generate PCSrc, we need to AND together a signal from the control
unit, which we call Branch, with the Zero signal out of the ALU.
Signal    Deasserted                      Asserted
RegDst    dest register from rt (20-16)   dest register from rd (15-11)
RegWrite  none                            write to dest register
ALUSrc    2nd operand from reg output 2   2nd operand from 16-bit sign extension
PCSrc     PC ← PC+4                       PC ← branch dest
MemRead   none                            read from memory
MemWrite  none                            write to memory
MemtoReg  put ALU result to reg           put memory read data to reg

The Simple Datapath with the Control Unit
[Figure: the simple datapath with the control unit. The control unit decodes Instruction[31-26] and drives RegDst, Branch, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite. Instruction[25-21] and Instruction[20-16] select the read registers, a multiplexor chooses between Instruction[20-16] and Instruction[15-11] for the write register, Instruction[15-0] is sign-extended from 16 to 32 bits, and Instruction[5-0] feeds the ALU control]

The Simple Datapath with the Control Unit
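• Define how every control line should be set for each opcode
– For R-type instructions the source registers are always rs and rt and the destination register is always rd; this defines how the ALUSrc and RegDst control signals are set
– An R-type instruction writes its result to a register (RegWrite = 1) but neither reads nor writes data memory.
– When the Branch control signal is 0, the PC is unconditionally replaced with PC+4; otherwise, the PC is
replaced with the branch target address if the Zero output of the ALU is also 1
– The ALUOp field for R-type instructions is set to 10, meaning that the ALU control is generated from the funct
field
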
The Simple Datapath with the Control Unit
• The second and third rows give the control signal settings for the lw and sw instructions.
– The ALUSrc and ALUOp fields are set to perform the address calculation.
– For a load, RegDst and RegWrite are set so that the result is stored in the rt register.

The Simple Datapath with the Control Unit
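• The ALUOp field for a branch instruction is set to perform a subtraction (ALUOp = 01), which is used to test for equality
– When the RegWrite control signal is 0, the MemtoReg field is irrelevant, because the register file is not being written
– The MemtoReg entries in the last two rows of the table are shown as X, meaning "don't care".
– When RegWrite is 0, RegDst can also be shown as X.
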
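Datapath operation
• The datapath for an R-type instruction: add \$t1, \$t2, \$t3
– Although everything happens in one clock cycle, we can think of it as four steps:
1. The instruction is fetched, and the PC is incremented
2. Two registers are read from the register file; at the same time, the main control unit computes the control line settings
3. The ALU operates on the data read from the register file, using the function code (the funct field, bits 5:0 of the instruction) to generate the ALU function
4. The ALU result is written into the register file, using bits 15:11 of the instruction to select the destination register
(\$t1)
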
lw \$t0, 32(\$s3)    ; encoded fields: 35 19 8 32
Field:  35 or 43  rs     rt     offset
Bits:   31:26     25:21  20:16  15:0

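Datapath operation
• Active functional units and asserted control signals for a load instruction
lw \$t0, 32(\$s3)
– We can think of it as five steps:
1. The instruction is fetched from the instruction memory, and the PC is incremented
2. A register value is read from the register file
3. The ALU computes the sum of the value read from the register file and the sign-extended lower 16 bits of the instruction (the offset)
4. The sum from the ALU is used as the address for the data memory
5. The data returned from memory is written into the register file; the destination register is given by bits 20:16 of the instruction (\$t0)
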
beq \$s1, \$s2, 100    ; encoded fields: 4 17 18 25
Field:  4      rs     rt     offset
Bits:   31:26  25:21  20:16  15:0

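Datapath operation
• The beq instruction: beq \$s1, \$s2, 100
– We can think of its execution as four steps:
1. The instruction is fetched from the instruction memory, and the PC is incremented
2. Two registers, \$s1 and \$s2, are read from the register file
3. The ALU subtracts the values read from the register file. PC+4 is added to the sign-extended lower 16 bits of the instruction (the offset) shifted left by two; the result is the branch target address
4. The Zero output of the ALU decides which adder result is written back to the
PC
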
Complete Control Unit
• Add the jump instruction to show how the basic datapath and control can be extended to handle other instructions

Jump
Field:  2      address
Bits:   31:26  25:0


• Longest delay determines clock period
– Instruction memory → register file → ALU → data memory → register file
• Not feasible to vary the period for different instructions
• Violates a design principle
– Making the common case fast
• We will improve performance by pipelining

A Simple Implementation Scheme
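• Why is the single-cycle implementation not used?
– Every instruction takes one clock cycle of the same length (so CPI = 1)
– The longest path through the machine determines the clock cycle time
– Overall performance is unlikely to be very good
• Example: performance of a single-cycle machine, assuming the following functional unit times:
– Memory units: 2 ns
– ALU and adders: 2 ns
– Register file (read or write): 1 ns
• Compare two implementations:
1. Every instruction executes in one clock cycle of fixed length
2. Every instruction executes in one clock cycle, but the clock cycle length is variable
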
A Simple Implementation Scheme
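• Assume the following instruction mix: 24% loads, 12% stores, 44% R-format, 18% branches, and 2% jumps

Instruction class  Instruction memory  Register read  ALU  Data memory  Register write  Total
R-format           2                   1              2    0            1               6 ns
Load               2                   1              2    2            1               8 ns
Store              2                   1              2    2            -               7 ns
Branch             2                   1              2    -            -               5 ns
Jump               2                   -              -    -            -               2 ns

• Solution
1. Fixed clock: the CPU clock cycle is 8 ns.
2. Variable clock: average CPU clock cycle
= 8*24% + 7*12% + 6*44% + 5*18% + 2*2%
= 6.3 ns
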
A Simple Implementation Scheme
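• Example
– Assume the functional unit times are as before, plus 8 ns for a floating-point add and 16 ns for a floating-point multiply
• Compare two implementations:
1. Every instruction executes in one clock cycle of fixed length
2. Every instruction executes in one clock cycle, but the clock cycle length is variable
– To compute the performance, assume the following instruction mix:
– 31% loads, 21% stores, 27% R-type, 5% branches, 2% jumps,
7% FP add, and 7% FP multiply
• Solution
– 1. The longest instruction is the FP multiply, so the clock cycle is
2 + 1 + 16 + 1 = 20 ns
– 2. An FP add takes 2 + 1 + 8 + 1 = 12 ns.
– Average CPU clock cycle
= 8*31% + 7*21% + 6*27% + 5*5% + 2*2% + 20*7% + 12*7% = 8.1 ns
– Performance improvement: 20/8.1 ≈ 2.5
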

• Pipelined laundry: overlapping execution
– Parallelism improves performance
• Four loads (n = 4):
Speedup = 8/3.5 = 2.3
• Non-stop:
Speedup = 2n/(0.5n + 1.5) ≈ 4 = number of stages

MIPS Pipeline
• Five stages, one step per stage:
1. IF: Instruction Fetch
2. ID: Instruction Decode
3. EX: Execution
4. MEM: Memory Access
5. WB: Write Back


• Assume the time for each stage is
– 100 ps for register read or write
– 200 ps for other stages
• Compare the pipelined datapath with the single-cycle datapath

Instr     Instr fetch  Register read  ALU op  Memory access  Register write  Total time
lw        200 ps       100 ps         200 ps  200 ps         100 ps          800 ps
sw        200 ps       100 ps         200 ps  200 ps         -               700 ps
R-format  200 ps       100 ps         200 ps  -              100 ps          600 ps
beq       200 ps       100 ps         200 ps  -              -               500 ps

Pipeline Performance
[Figure: execution timelines for the single-cycle implementation (Tc = 800 ps) and the pipelined implementation (Tc = 200 ps)]


• If all stages are balanced (i.e., all take the same time):
– Time between instructions (pipelined)
= Time between instructions (nonpipelined) / Number of stages
• If not balanced, speedup is less
• Speedup is due to increased throughput
– Latency (time for each instruction) does not decrease

Pipelining and ISA Design
• The MIPS ISA is designed for pipelining
– All instructions are 32 bits
• Easier to fetch and decode in one cycle
• c.f. x86: 1- to 17-byte instructions
– Few and regular instruction formats
• Can decode and read registers in one step
• Can calculate the address in the 3rd stage and access memory in the 4th stage
– Alignment of memory operands
• Memory access takes only one cycle


Hazards
• Situations that prevent starting the next instruction in the next cycle
• Hazard types
– Structure hazards
• A required resource is busy
– Data hazards
• Need to wait for a previous instruction to complete its data read/write
– Control hazards
• Deciding on a control action depends on a previous instruction


Structure Hazards
• Definition: a planned instruction cannot execute in the proper clock cycle because the hardware does not support the combination of instructions that should execute at that time
• Conflict for use of a resource
– In a MIPS pipeline with a single memory
– Instruction fetch would have to stall for that cycle
– Would cause a pipeline "bubble"
• Hence, pipelined datapaths require separate instruction/data
memories
– Or separate instruction/data caches


[Question] If a fourth instruction enters the pipeline, then …


Data Hazards
• Definition: a planned instruction cannot execute in the proper clock cycle because the data it needs is not yet available
• An instruction depends on completion of data access by a previous instruction
add \$s0, \$t0, \$t1
sub \$t2, \$s0, \$t3


Forwarding
• Use the result as soon as it is computed
– Don't wait for it to be stored in a register
– Requires extra connections in the datapath

• Can't always avoid stalls by forwarding
– If the value is not yet computed when it is needed
– Can't forward backward in time!

Code Scheduling to Avoid Stalls
Reorder code to avoid use of load result in the next instruction
• C code for A = B + E; C = B + F;

Original order (13 cycles):
        lw   \$t1, 0(\$t0)
        lw   \$t2, 4(\$t0)
stall   add  \$t3, \$t1, \$t2
        sw   \$t3, 12(\$t0)
        lw   \$t4, 8(\$t0)
stall   add  \$t5, \$t1, \$t4
        sw   \$t5, 16(\$t0)

Reordered (11 cycles):
        lw   \$t1, 0(\$t0)
        lw   \$t2, 4(\$t0)
        lw   \$t4, 8(\$t0)
        add  \$t3, \$t1, \$t2
        sw   \$t3, 12(\$t0)
        add  \$t5, \$t1, \$t4
        sw   \$t5, 16(\$t0)


Control Hazards
• Definition: the proper instruction cannot execute in the proper clock cycle because the instruction that was fetched is not the one that is needed
• A branch determines the flow of control
– Fetching the next instruction depends on the branch outcome
– The pipeline can't always fetch the correct instruction
• Still working on the ID stage of the branch
• In the MIPS pipeline
– Need to compare registers and compute the target early in the pipeline
– Add hardware to do it in the ID stage


• Wait until the branch outcome is determined before
fetching the next instruction


Branch Prediction
• Longer pipelines can't readily determine the branch outcome early
– The stall penalty becomes unacceptable
• Predict the outcome of the branch
– Only stall if the prediction is wrong
• In the MIPS pipeline
– Can predict branches not taken
– Fetch the instruction after the branch, with no delay

MIPS with Predict Not Taken
[Figure: pipeline diagrams for two cases: prediction correct and prediction incorrect]

More-Realistic Branch Prediction
• Static branch prediction
– Based on typical branch behavior
– Example: loop and if-statement branches
• Predict backward branches taken
• Predict forward branches not taken
• Dynamic branch prediction
– Hardware predictors base their guesses on the behavior of each branch and may change their predictions while the program runs
– Hardware measures actual branch behavior
• e.g., record the recent history of each branch
– Assume future behavior will continue the trend
• When wrong, stall while re-fetching, and update the history


• Pipelining improves performance by increasing instruction throughput
– Executes multiple instructions in parallel
– Each instruction has the same latency
• Subject to hazards
– Structure, data, control
• Instruction set design affects the complexity of the pipeline implementation


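• Outside the memory system, the effective operation of the pipeline is usually the most important factor in determining
the CPI of the processor and hence its performance
• Structural hazards usually occur around the floating-point unit, which may not be fully pipelined
• Control hazards are usually more of a problem in integer programs, which tend to have more branches and less predictable branches
• Data hazards
– They are usually easier to handle in floating-point programs, whose fewer branches and more regular memory access patterns make it easier for the compiler to schedule instructions to avoid hazards
• Such optimization is harder for integer programs, which tend to use pointers and so have less regular memory accesses
• Pipelining improves instruction throughput but actually increases the execution time of an individual instruction
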

• Single-cycle datapath
– Five-stage pipeline
– Up to five instructions are in execution during any one clock cycle
1. IF: instruction fetch
2. ID: instruction decode and register file read
3. EX: execution or address calculation
4. MEM: data memory access
5. WB: write back

MIPS Pipelined Datapath
• Two exceptions to the left-to-right flow (right-to-left flow)
– Write Back (WB)
– Next PC
[Figure: pipelined datapath with the right-to-left flow from the MEM and WB stages highlighted]


• Need registers between stages
– To hold information produced in the previous cycle
[Figure: pipelined datapath with pipeline registers; register widths labeled 64 bits, 128, 97, and 64 bits]

Pipeline Operation
• Cycle-by-cycle flow of instructions through the pipelined datapath
– "Single-clock-cycle" pipeline diagram
• Shows pipeline usage in a single cycle
• Highlights the resources used
– c.f. "multi-clock-cycle" diagram
• Graph of operation over time
• We'll look at "single-clock-cycle" diagrams for load and store



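• The figures show the highlighted portions of the datapath as a load instruction passes through the five pipeline stages
– Any information needed in a later pipeline stage must be passed to that stage through a pipeline register
– Each logical component of the datapath, such as the instruction memory and the register file, can be used only within a single pipeline stage
• Now we uncover a bug in the design of the load instruction. Did you see it?
– The write register number!
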
[Figure annotation: wrong register number]

EX for Store
MEM for Store
WB for Store
Multi-Cycle Pipeline Diagram
• Form showing resource usage
```
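
The two-level ALU control decoding described in the slides (the main control unit produces the 2-bit ALUOp, and the ALU control unit combines it with the funct field) can be sketched in Python. This is only an illustrative model, not the deck's own implementation: the function name `alu_control` and the dictionary `R_TYPE_OPERATIONS` are hypothetical names introduced here, and giving the `X1` row priority for the unused ALUOp value `11` is an assumption, since the slides note that `11` never occurs.

```python
# Hypothetical model of the ALU control unit from the slides.
# Inputs: 2-bit ALUOp (from the main control) and 6-bit funct field.
# Output: 4-bit ALU control input (0000 AND, 0001 OR, 0010 add,
#         0110 subtract, 0111 set on less than).

# Low four funct bits (F3..F0) -> ALU control, for R-type instructions.
# F5 and F4 are treated as don't-care, as in the truth table.
R_TYPE_OPERATIONS = {
    0b0000: 0b0010,  # add  (funct 100000)
    0b0010: 0b0110,  # sub  (funct 100010)
    0b0100: 0b0000,  # and  (funct 100100)
    0b0101: 0b0001,  # or   (funct 100101)
    0b1010: 0b0111,  # slt  (funct 101010)
}


def alu_control(alu_op: int, funct: int) -> int:
    """Return the 4-bit ALU control input for the given ALUOp and funct."""
    if alu_op == 0b00:
        # lw / sw: add to compute the memory address; funct is ignored.
        return 0b0010
    if alu_op & 0b01:
        # Row "X1": beq, subtract to compare; funct is ignored.
        # Assumption: this row also wins for the unused ALUOp value 11.
        return 0b0110
    # Row "1X": R-type, decode the low four bits of the funct field.
    low_bits = funct & 0b1111
    if low_bits not in R_TYPE_OPERATIONS:
        raise ValueError(f"unsupported funct field: {funct:06b}")
    return R_TYPE_OPERATIONS[low_bits]


if __name__ == "__main__":
    for name, op, funct in [
        ("lw/sw", 0b00, 0b000000),
        ("beq", 0b01, 0b000000),
        ("add", 0b10, 0b100000),
        ("sub", 0b10, 0b100010),
        ("and", 0b10, 0b100100),
        ("or", 0b10, 0b100101),
        ("slt", 0b10, 0b101010),
    ]:
        print(f"{name:6s} ALUOp={op:02b} funct={funct:06b} -> {alu_control(op, funct):04b}")
```

Keeping the R-type decoding in a small lookup table mirrors the point made in the slides: the main control only needs to produce two ALUOp bits, and the function-code decoding stays local to the ALU control unit.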