Slides

advertisement

Pipelining

CS365

Lecture 9

Outline

Today’s topic

 Pipelining is an implementation technique in which multiple instructions are overlapped in execution

 Subset of MIPS instructions

 lw, sw, and, or, add, sub, slt, beq

 Outline

 Pipeline high-level introduction

Stages, hazards

 Pipelined datapath and control design

D. Barbara

2

Pipeline CS465

Pipelining is Natural!

 Laundry example

 Ann, Brian, Cathy, Dave each has one load of clothes

A B C D to wash, dry, and fold

 Washer takes 30 minutes

 Dryer takes 40 minutes

“Folder” takes 20 minutes

D. Barbara

3

Pipeline CS465

e r

O r d

T a s k

Sequential Laundry

6 PM 7 8 9

Time

10 11 Midnight

30 40 20 30 40 20 30 40 20 30 40 20

A

B

C

D

 Sequential laundry takes 6 hours for 4 loads

 If they learned pipelining, how long would laundry take?

Pipeline CS465 D. Barbara

4

Pipelined Laundry

6 PM 7 8 9

Time

10 11 Midnight

30 40 40 40 40 20 s k

T a

A

B d e

O r r

C

D

 Start work ASAP

 Pipelined laundry takes 3.5 hours for 4 loads

Pipeline CS465 D. Barbara

5

T a s k e r

O r d

Pipelining Lessons (I)

6 PM 7 8 9

Time

 Multiple tasks operating simultaneously using different resources

30 40 40 40 40 20

A

Pipelining doesn’t help latency of single task, it helps throughput of entire workload

B

C

D

 Pipeline rate is limited by slowest pipeline stage

 Unbalanced lengths of pipeline stages reduces speedup

D. Barbara

6

Pipeline CS465

e r

O r d

T a s k

Pipelining Lessons (II)

6 PM 7 8 9

 Potential speedup =

Number pipeline stages

Time

A

30 40 40 40 40 20 

Time to “ fill ” pipeline and time to “ drain ” it reduces speedupstartup and wind down

B

C

D

 Stall for dependencies

D. Barbara

7

Pipeline CS465

Five Stages of Workload

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5

Load Ifetch Reg/Dec Exec Mem Wr

 Ifetch : Instruction Fetch

 Fetch the instruction from the Instruction Memory

 Reg/Dec : Registers Fetch and Instruction

Decode

 Exec : Calculate the memory address

 Mem : Read the data from the Data Memory

 Wr : Write the data back to the register file

D. Barbara

8

Pipeline CS465

Clk

Single Cycle, Multi-Cycle, Pipeline

Cycle 1 Cycle 2

Single Cycle Implementation:

Load Store Waste

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle 10

Clk

Multiple Cycle Implementation:

Load

Ifetch Reg Exec Mem Wr

Store

Ifetch Reg Exec Mem

R-type

Ifetch

Pipeline Implementation:

Load Ifetch Reg Exec Mem Wr

Store Ifetch Reg Exec Mem Wr

Pipeline

R-type Ifetch Reg Exec Mem Wr

CS465 D. Barbara

9

Why Pipeline? (Performance)

 Suppose we execute 100 instructions

 Single cycle machine

 45 (ns/cycle) x 1 (CPI) x 100 (inst) = 4500 ns

 Multicycle machine

 10 (ns/cycle) x 4.4 (CPI) (due to inst mix) x

100 (inst) = 4400 ns

 Ideal pipelined machine

 10 (ns/cycle) x (1 (CPI) x 100 (inst) + 4 cycle drain) = 1040 ns

10

D. Barbara Pipeline CS465

Pipelining Throughput

 Ideal speedup is no. of stages in the pipeline; in practice:

Pipeline stage time are limited by the slowest resource, either the

ALU or memory access

Fill and drain time

Pipeline CS465

11

D. Barbara

Why Pipeline? (Resource)

Time (clock cycles) s t r.

I n

Inst 0

Inst 1

O r d e r

Inst 2

Inst 3

Inst 4

Im Reg

Im Reg

Dm Reg

Im Reg

Dm Reg

Im Reg

Dm Reg

Im Reg

Dm Reg

Dm Reg

12

D. Barbara Pipeline CS465

Pipeline Hazards

 Hazards prevent next instruction from executing during its designated clock cycle

 Structural hazards : attempt to use the same resource two different ways at the same time

E.g., combined washer/dryer would be a structural hazard or folder busy doing something else (watching TV)

One memory port

 Data hazards : attempt to use data before it is ready

E.g., one sock of pair in dryer and one in washer; can’t fold until you get sock from washer through dryer

Instruction depends on result of prior instruction still in the pipeline

 Control hazards : attempt to make a decision before condition is evaluated

Branch instructions

13

D. Barbara Pipeline CS465

Structural Hazard: One Memory

Time (clock cycles) s t

I n r.

Load

Instr 1

O r d e r

Instr 2

Instr 3

Instr 4

Mem Reg

Mem Reg

Mem Reg

Mem Reg

Mem

Mem Reg

Reg

Mem Reg

Mem Reg

Mem Reg

Mem Reg

Pipeline

•Solution 1: add more HW

•Hazards can always be resolved by waiting

CS465

14

D. Barbara

Structural Hazard: One Memory

Time (clock cycles) s t

I n r.

Load

Instr 1

O r d e r

Instr 2 stall

Instr 3

Mem Reg

Mem Reg

Mem Reg

Mem Reg

Mem Reg

Mem Reg

Bubble Bubble Bubble Bubble Bubble

Mem Reg Mem Reg

Pipeline

•Hazards can always be resolved by waiting

CS465

15

D. Barbara

Data Hazard Example

 Data hazard: an instruction depends on the result of a previous instruction still in the pipeline add r1 ,r2,r3 sub r4, r1 ,r3 and r6, r1 ,r7 or r8, r1 ,r9 xor r10, r1 ,r11

16

D. Barbara Pipeline CS465

O r d e r

I n s t r.

Data Hazard Example

Dependences backward in time are hazards

Time (clock cycles)

I add r1 ,r2,r3

ID/R E

X

ME

M

Dm

W sub r4, r1 ,r3 and r6, r1 ,r7

Im Reg

Im Reg

Dm Reg

Dm or r8, r1 ,r9 xor r10, r1 ,r11

Im Reg

Im Reg

Reg

Dm Reg

Dm Reg

 Compilers can help, but it gets messy and difficult

17

D. Barbara Pipeline CS465

Data Hazard Solution

Time (clock cycles)

I add r1 ,r2,r3

I n s t r.

sub r4, r1 ,r3 and r6, r1 ,r7

O r d e r or r8, r1 ,r9 xor r10, r1 ,r11

ID/R E

X

Im Reg

Im

ME

M

Dm

Reg

Im

Dm

W

Reg

Im

Reg

Dm

Reg

Reg

Dm Reg

Dm Reg

Solution : “forward” result from one stage to another

Pipeline CS465

18

D. Barbara

Data Hazard Even with Forwarding

Time (clock cycles)

I lw r1 ,0(r2) sub r4, r1 ,r3

ID/R E

X

Im Reg

ME W

Dm Reg

Can’t go back in time! Must delay/stall instruction dependent on loads

19

D. Barbara Pipeline CS465

Data Hazard Even with Forwarding

Time (clock cycles)

I lw r1 ,0(r2) sub r4, r1 ,r3

ID/R E

X

Stall

Im

ME W

Reg Dm Reg

 Must delay/stall instruction dependent on loads

 Sometimes the instruction sequence can be reordered to avoid pipeline stalls

Pipeline CS465

20

D. Barbara

Control Hazards

 Branch instructions may change execution flow

 Suppose we can do decoding/branch decision/branch target computation at stage 2

Still introduce 1-cycle stall

Implementation details later

Pipeline CS465

21

D. Barbara

Control Hazard Solution: Predict

 Predict: guess one direction then back up if wrong

 Impact: 0 lost cycles per branch instruction if right, 1 if wrong

Need to “Squash” and restart following instruction if wrong

 Prediction scheme

 Random prediction: correct 50% of time

 History-based prediction: correct 90% of time

22

D. Barbara Pipeline CS465

Control Hazard Solution: Predict

Pipeline CS465

23

D. Barbara

Pipeline Overview Summary

 Pipelining is a fundamental concept

 Multiple steps using distinct resources

 Utilize capabilities of the datapath by pipelined instruction processing

 Start next instruction while working on the current one

 Detect and resolve hazards

Structural hazards, data hazards, control hazards

All hazards can be solved by stall

Other approaches: forwarding, prediction, reordering

 In modern processors, what really makes it hard:

 Exception handling

 Out-of-order execution

 Next: datapath design for pipeling

24

D. Barbara Pipeline CS465

Single Cycle Datapath

Pipeline CS465

25

D. Barbara

Multi Cycle Datapath

 Divide the work into stages; internal registers

Pipeline CS465

26

D. Barbara

Single-Cycle Pipeline Datagram

 What do we need to add to split the datapath into stages?

Pipeline CS465

27

D. Barbara

Pipelined Datapath

64 128 97 64

 How many bits stored in each pipeline register?

Pipeline CS465

28

D. Barbara

Observations

 5-stage pipeline

 IF, ID, EX, MEM, WB

 Left-to-right flow of instructions

 Instructions and data move generally from left to right

 Two exceptions: WB stage and the selection of PC

May lead to data hazards and control hazards

 Why there is no pipeline register at the end of the

WB stage?

 Last stage must update either register file, or memory, or PC

Pipeline CS465

29

D. Barbara

Pipelining the Load Instruction

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7

Clock

1st lw Ifetch

2nd lw

Reg/Dec

Ifetch

Exec

Reg/Dec

Mem Wr

Exec Mem Wr

3rd lw Ifetch Reg/Dec Exec Mem Wr

 The five independent functional units in the pipeline datapath are:

Instruction Memory for the IF stage

Register File’s Read Ports (busA and busB) for the ID stage

 ALU for the EXE stage

 Data Memory for the MEM stage

Register File’s Write port (bus W) for the WB stage

Pipeline CS465

30

D. Barbara

The Four Stages of R-type

Cycle 1 Cycle 2 Cycle 3 Cycle 4

R-type Ifetch Reg/Dec Exec Wr

 IF : Instruction Fetch

 Fetch the instruction from the Instruction Memory

 ID : Registers Fetch and Instruction Decode

 EXE : ALU operates on the two register operands

 WB : Write the ALU output back to the register file

Pipeline CS465

31

D. Barbara

Pipelining R-type and Load Instruction

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9

Clock

R-type Ifetch

R-type

Reg/Dec Exec Wr

Ifetch Reg/Dec Exec Wr

Oops! We have a problem!

Load Ifetch

R-type

Reg/Dec

Ifetch

Exec

Reg/Dec

Mem Wr

Exec Wr

R-type Ifetch Reg/Dec Exec Wr

 We have pipeline conflict or structural hazard:

 Two instructions try to write to the register file at the same time!

 Only one write port

Pipeline CS465

32

D. Barbara

Important Observation

 Each functional unit can only be used once per instruction

Each functional unit must be used at the same stage for all instructions

Delay Rtype’s register write by one cycle:

 Now Rtype instructions also use Reg File’s write port at Stage 5

 Mem stage is a NO-OP stage: nothing is being done

Pipeline

R-type

1 2 3

Ifetch Reg/Dec Exec

Store Ifetch Reg/Dec Exec

Beq Ifetch Reg/Dec Exec

CS465

4

Mem

Mem

Mem

5

Wr

Wr

Wr

33

D. Barbara

Pipelined Execution

Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9

Clock

R-type Ifetch Reg/Dec Exec

R-type Ifetch Reg/Dec

Mem Wr

Exec Mem Wr

Load Ifetch Reg/Dec Exec Mem Wr

Exec Mem Wr R-type Ifetch Reg/Dec

R-type Ifetch Reg/Dec Exec Mem Wr

 All instruction types have five pipeline stages

 Some stages may be wasted for some instructions

34

D. Barbara Pipeline CS465

Pipelined Execution of Load Instruction

Pipeline CS465

35

D. Barbara

Pipelined Execution of Load Instruction

Pipeline CS465

36

D. Barbara

Pipelined Execution of Load Instruction

Pipeline CS465

37

D. Barbara

Pipelined Execution of Load Instruction

Pipeline CS465

38

D. Barbara

Pipelined Execution of Load Instruction

Pipeline CS465

39

D. Barbara

Pipelined Execution of Store Instruction

Pipeline CS465

40

D. Barbara

Pipelined Execution of Store Instruction

Pipeline CS465

41

D. Barbara

Observations from Load and Store

 Pass information needed from an earlier stage to a latter stage

 Each logical component of the datapath – such as IM, Reg read ports, ALU, DM, Reg write port – can be used only within a single pipeline stage.

Otherwise, we would have structural hazard

 A bug in the pipelined datapath for load. Can you tell?

42

D. Barbara Pipeline CS465

Modified Datapath

•For basic R-Type, LW/SW, and BEQ

Pipeline CS465

43

D. Barbara

Pipelined Execution for Multiple Instr.

Pipeline CS465

44

D. Barbara

Pipelined Execution for Multiple Instr.

Pipeline CS465

45

D. Barbara

Pipelined Execution for Multiple Instr.

Pipeline CS465

46

D. Barbara

Pipelined Execution for Multiple Instr.

Pipeline CS465

47

D. Barbara

Pipelined Execution for Multiple Instr.

Pipeline CS465

48

D. Barbara

Pipelined Execution for Multiple Instr.

Pipeline CS465

49

D. Barbara

Pipelined Datapath Control

Fig. 6.22

Pipeline CS465

50

D. Barbara

Overview on Datapath Control op

RegDst

ALUSrc

MemtoReg

RegWrite

MemWrite

Branch

Jump

ExtOp

ALUop

00 0000

R-type

1

1

0

0

0

0

0 x

“R-type”

00 1101 10 0011 10 1011 00 0100 00 0010 ori

0 lw

0 sw x beq x jump x

1

0

1

0

1

1

1

0

1 x

0

1

0 x

0

0 x x

0

0

0

Or

0

0

0

0

1

Add

0

0

1

Add

1

0 x

Subtract x xx

0

1

 For the subset of instructions under consideration

 ALUOp = 00 for Add, 01 for Sub, and 10 for R-type

51

D. Barbara Pipeline CS465

Observations

No write control for all pipeline registers and PC since they are updated at every clock cycle

To specify the control for the pipeline, set the control values during each pipeline stage

Control lines can be divided into 5 groups:

IF

ID

ALU –

MEM –

WB –

NONE

NONE

RegDst, ALUOp, ALUSrc

Branch, MemRead, MemWrite

MemtoReg, RegWrite

Group these nine control lines into 3 subsets:

 ALUControl, MEMControl, WBControl

 Control signals are generated at ID stage, how to pass them to other stages?

Pipeline CS465

52

D. Barbara

Pass Control Signals

 Extend the pipeline registers to include control information

Pipeline CS465

53

D. Barbara

The Complete Pipelined Datapath

Fig 6.27

Pipeline CS465

54

D. Barbara

Example Pipeline Execution

 Show the five instructions going through the pipeline: lw $10, 20($1) sub $11, $2, $3 and $12, $4, $5 or $13, $6, $7 add $14, $8, $9

Note that these instructions are independent from each other!

Pipeline CS465

55

D. Barbara

Clock1

Pipeline CS465

56

D. Barbara

Clock2

Pipeline CS465

57

D. Barbara

Clock3

Pipeline CS465

58

D. Barbara

Clock4

Pipeline CS465

59

D. Barbara

Clock5

Pipeline CS465

60

D. Barbara

Clock6

Pipeline CS465

61

D. Barbara

Clock7

Pipeline CS465

62

D. Barbara

Clock8

Pipeline CS465

63

D. Barbara

Clock9

Pipeline CS465

64

D. Barbara

Summary

 Overview of pipeline

 Stages

 Hazards

 Pipelined datapath

 Pipeline registers

 Pipelined execution

 Pipelined control

 Different signals for different stages

 Propagate control signals

Pipeline CS465

65

D. Barbara

Next Lecture

 Topic:

 Pipeline hazards and solutions

 Exception handling

 Reading

 Patterson & Hennessy Ch6.4-6.9

Pipeline CS465

66

D. Barbara

Download