Lecture 21

ITEC 352
Lecture 21
Pipelining
Review
• Questions?
• Homework 3 on Wed.
• JVM vs Assembly Similarities / Differences
Outline
• Pipelining
– Motivation
– Fits with execution model
– Examples
CISC vs. RISC
• A long time ago, when memory was expensive:
– The focus of most computer architects (e.g., at Intel and Motorola) was to
support programs with fewer instructions, each performing a more
complicated computation.
• E.g., addld [a], [b], [c] would be a complex instruction that replaced:
ld [a], %r1
ld [b], %r2
addcc %r1, %r2, %r3
st %r3, [c]
Why?
Complex instructions → shorter programs → less memory needed.
However, memory became cheaper. So architects started
thinking about techniques that use more memory to
speed up computations.
CISC vs. RISC (2)
• Solution: RISC (Reduced Instruction Set Computer)
– E.g., ARC instructions.
– The instructions are similar in complexity, i.e., they take a more or less similar
number of clock cycles to execute.
• So architects thought: why not have every RISC instruction execute in one
CPU cycle, using a technique called pipelining?
• To use pipelining, instructions must be of similar complexity, i.e., every
instruction must require more or less the same number of CPU cycles
to execute.
• However, what is the complexity of an instruction?
• Trivia:
– Apple’s PowerPC processors (developed with IBM and Motorola):
RISC processors
– Intel: has stuck with a CISC instruction set (x86), even today!
Slide © Prem Uppuluri, Derived from Murdocca and Heuring
Pipelining and RISC
• The complexity of an instruction can be based on the
number of steps it takes to execute the fetch-execute
cycle.
• Recall the Fetch-Execute cycle (every instruction goes
through a fetch-execute cycle)
– Fetch instruction from memory to register.
– Decode the opcode.
– Fetch operands from memory to register.
– Execute operation.
– Store result back into memory.
• We said: “this is how every instruction is executed by
the control unit”.
– Ahem…this is not entirely true! It is almost true: each class of instruction
has slightly different stages in the fetch-execute cycle.
Complete ARC Instruction and PSR Formats
Arithmetic Instructions
• Arithmetic instructions in RISC have the
following 5 stages:
– Fetch the instruction from memory
– Decode the instruction
– Fetch the operands from the register file
– Apply the operands to the ALU
– Write the result back to the register file.
• E.g., take addcc %r1, %r2, %r3
– Trace the 5 stages on this instruction as an
exercise.
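One way to check a trace of the five stages is to mimic them in a few lines of Python. This is only a sketch: the register contents, the `trace_addcc` helper, and the two condition codes it models (z and n) are assumptions made for illustration, not part of the ARC definition.

```python
def trace_addcc(instr, regs):
    """Walk one addcc instruction through the five stages, logging each one."""
    log = []
    # Stage 1: fetch the instruction (here it simply arrives as text).
    log.append(f"1 fetch instruction: {instr}")
    # Stage 2: decode it into an opcode and three register names.
    opcode, operands = instr.split(maxsplit=1)
    rs1, rs2, rd = [name.strip() for name in operands.split(",")]
    log.append(f"2 decode: op={opcode} rs1={rs1} rs2={rs2} rd={rd}")
    # Stage 3: fetch the operands from the register file.
    a, b = regs[rs1], regs[rs2]
    log.append(f"3 fetch operands: {a}, {b}")
    # Stage 4: apply the operands to the ALU (32-bit add, set condition codes).
    result = (a + b) & 0xFFFFFFFF
    flags = {"z": result == 0, "n": result >> 31 == 1}
    log.append(f"4 ALU: {a} + {b} = {result}")
    # Stage 5: write the result back to the register file.
    regs[rd] = result
    log.append(f"5 write back: {rd} = {result}")
    return log, flags


# Hypothetical register contents, chosen only for illustration.
regs = {"%r1": 7, "%r2": 5, "%r3": 0}
log, flags = trace_addcc("addcc %r1, %r2, %r3", regs)
print("\n".join(log))
```

Each log line corresponds to one stage, which is the same split the pipeline hardware uses later in the lecture.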
RISC branch instruction
• Branch instructions have the following stages
– Fetch the instruction from memory
– Decode the instruction
– Fetch the components of the address from the instruction or
register file
– Apply the components of the address to the ALU
– Copy the resulting effective address into the PC (program
counter).
• Exercise: Trace the stages for the instruction:
be 2048
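The last three branch stages can be sketched in Python. The sketch assumes SPARC-style encoding, where the instruction holds a word displacement relative to the branch's own address, and it assumes the branch sits at address 2000 so that a target of 2048 gives a displacement of 12 words. The `trace_be` helper and these numbers are hypothetical.

```python
def trace_be(pc, disp_words, z_flag):
    """Return the next PC for a 'be' (branch on equal) located at address pc."""
    # Stage 3: the address components are the PC and the displacement
    # field taken from the instruction itself.
    byte_offset = 4 * disp_words
    # Stage 4: the ALU adds the components to form the effective address.
    target = (pc + byte_offset) & 0xFFFFFFFF
    # Stage 5: copy the effective address into the PC, but only if the
    # z condition code says the branch is taken.
    return target if z_flag else pc + 4


# Assumed placement: branch at 2000, target 2048, so (2048 - 2000) / 4 = 12.
print(trace_be(2000, 12, z_flag=True))
print(trace_be(2000, 12, z_flag=False))
```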
Load/Store
• Load and store instructions have the following
stages
– Fetch the instruction from the memory
– Decode the instruction
– Fetch the components of the address from the instruction
or register file
– Apply the components to the ALU
– Apply the resulting effective address to memory along with a
read or write signal. For a write, the data item to be written must be
retrieved from the register file.
• Exercise: Trace the ld %r1, %r2, %r3 instruction
through these stages.
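A minimal Python sketch of the load stages follows. It assumes that `ld %r1, %r2, %r3` means "add %r1 and %r2 to form the address, then load the word at that address into %r3". The `trace_ld` helper, the register values, and the memory contents are hypothetical.

```python
def trace_ld(rs1, rs2, rd, regs, memory):
    """Model stages 3 to 5 of a load; fetch and decode are assumed done."""
    # Stage 3: fetch the components of the address from the register file.
    base, offset = regs[rs1], regs[rs2]
    # Stage 4: apply the components to the ALU to get the effective address.
    address = (base + offset) & 0xFFFFFFFF
    # Stage 5: apply the address to memory with a read signal and
    # place the returned word in the destination register.
    regs[rd] = memory[address]
    return address


# Hypothetical machine state, chosen only for illustration.
regs = {"%r1": 2048, "%r2": 8, "%r3": 0}
memory = {2056: 99}
address = trace_ld("%r1", "%r2", "%r3", regs, memory)
print(address, regs["%r3"])
```

A store would differ only in stage 5: the data item is read from the register file and sent to memory with a write signal.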
Summarizing…
• The fetch-execute stages differ across the
different instructions…
– But they have similarities. All the instructions have the
following stages:
• Instruction fetch
• Decode
• Operand fetch
• ALU operation
• Result writeback (to memory, from memory, or to register,
depending on the type of instruction).
– So computer architects decided to break the
control unit into 5 parts, each part for one
stage of the fetch-execute cycle.
RISC control unit
• RISC processors have 5 hardware units:
– Each corresponding to one stage of the fetch-execute
cycle.
• E.g., the “Fetch instruction” hardware part of the
control unit, fetches instruction while “Fetch
operand” hardware fetches the operands.
• These hardware units can execute in parallel.
• Each hardware unit takes 1 CPU tick to execute.
• How does this help?
RISC control unit
Fetch Instr. → Decode opcode → Fetch operands → Execute Instr. → Store result
A pipeline of five hardware units (which together form the control unit). An
instruction moves from left to right in the pipeline. Whenever the clock ticks,
each unit passes its instruction to the next.
© Prem Uppuluri. Derived from Doug Comer
RISC pipelining
• E.g., life of an instruction:
CPUcycle  Unit1  Unit2  Unit3  Unit4  Unit5
1         inst1
2                inst1
3                       inst1
4                              inst1
5                                     inst1
An instruction moves through the pipeline in 5 clock ticks. So in effect, it takes 5
CPU cycles for an instruction to execute. However, consider this: “all the
units can work in parallel”. Can we use this fact to speed up execution,
i.e., can we complete instructions faster than one every 5 cycles?
RISC pipelining: speeding up instructions
When Unit2 is executing inst1 in clock cycle 2, Unit1 is idle…
why not start using this unit to execute the next instruction, inst2?
CPUcycle  Unit1  Unit2  Unit3  Unit4  Unit5
1         inst1
2         inst2  inst1
3                       inst1
4                              inst1
5                                     inst1
Pipeline filling
• E.g.,
CPUcycle  Unit1  Unit2  Unit3  Unit4  Unit5
1         inst1
2         inst2  inst1
3         inst3  inst2  inst1
4         inst4  inst3  inst2  inst1
5         inst5  inst4  inst3  inst2  inst1
6         inst6  inst5  inst4  inst3  inst2
7         inst7  inst6  inst5  inst4  inst3
8         inst8  inst7  inst6  inst5  inst4
After the pipeline is filled up (in CPU cycle 5), one instruction completes
after every CPU cycle. This is called instruction-level pipelining, a form of
instruction-level parallelism (ILP).
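The filling pattern can be generated rather than drawn by hand. The sketch below assumes an ideal pipeline with no stalls. The `pipeline_table` helper is a hypothetical name; it relies on the observation that in cycle c, unit u holds instruction number c − u + 1.

```python
def pipeline_table(n_instructions, n_units=5):
    """Return one row per CPU cycle; row[u] is the instruction in unit u+1."""
    # The last instruction enters in cycle n and needs n_units - 1 more
    # cycles to drain, so the whole run takes n + n_units - 1 cycles.
    total_cycles = n_instructions + n_units - 1
    rows = []
    for cycle in range(1, total_cycles + 1):
        row = []
        for unit in range(1, n_units + 1):
            number = cycle - unit + 1  # instruction currently in this unit
            in_range = 1 <= number <= n_instructions
            row.append(f"inst{number}" if in_range else "")
        rows.append(row)
    return rows


rows = pipeline_table(8)
for cycle, row in enumerate(rows, start=1):
    print(f"{cycle:<9}" + "".join(f"{cell:<7}" for cell in row))
print("pipelined cycles:", len(rows), "unpipelined cycles:", 8 * 5)
```

Under these assumptions, 8 instructions finish in 8 + 5 − 1 = 12 cycles instead of 8 × 5 = 40. For long programs the cost approaches one cycle per instruction.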
Class Discussion
• Implement the pipeline for the following:
srl %r3, %r5
addcc %r1, 10, %r1
ld %r2, %r4
subcc %r3, %r1, %r4
be label
Commercial processors
• Intel Pentium Pro: one of the first to provide speculative
execution.
– 12 pipeline stages.
• Intel Pentium 4: went from 10 to 20 pipeline stages
– Why did Intel do this?
• Increase in pipelining → less work per clock cycle → the
clock cycle time can be reduced → clock speeds
are increased.
• Hence, Intel could now support 1.2 GHz+ speeds.
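The arithmetic behind this chain can be sketched in Python. The sketch assumes the total logic delay of an instruction is split evenly across the stages, plus an optional fixed per-stage latch overhead. The `clock_ghz` helper and the 10 ns figure are hypothetical and are not real Pentium measurements.

```python
def clock_ghz(total_logic_ns, stages, latch_overhead_ns=0.0):
    """Clock rate in GHz when the logic is split evenly across the stages."""
    # Each stage does 1/stages of the work, so the cycle only has to be
    # long enough for that slice plus any per-stage overhead.
    cycle_ns = total_logic_ns / stages + latch_overhead_ns
    return 1.0 / cycle_ns  # a 1 ns cycle corresponds to 1 GHz


# Hypothetical 10 ns of total logic per instruction.
print(clock_ghz(10.0, 10))  # 10 stages
print(clock_ghz(10.0, 20))  # 20 stages
```

With no overhead, doubling the number of stages doubles the clock rate. A nonzero `latch_overhead_ns` shows why the gain is smaller in practice.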
Commercial Processors
• Video by Apple (Apple promotional material):
http://www.youtube.com/watch?v=PKF9GOE2q38
• NEXT: Things that affect a pipeline’s performance.
– Pipeline “bubbles”.
Discussion
• Pipelining is not always efficient. Sometimes an
instruction depends on its previous instruction’s
results.
– Implement the pipeline for the following:
srl %r3, %r5
addcc %r1, 10, %r1
ld %r1, %r2
subcc %r2, %r4, %r4
• E.g.,
CPUcycle Unit1 Unit2 Unit3 Unit4 Unit5
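The effect of such dependences can be explored with a small Python sketch. It rests on several assumptions: there is no forwarding, a register becomes readable only in the cycle after Unit5 has stored it, a stalled instruction blocks the instructions behind it, and the last register named in each instruction is its destination. The `schedule` helper is a hypothetical name.

```python
def schedule(program, n_units=5, operand_unit=3):
    """program: list of (text, source_regs, dest_reg).
    Returns, per instruction, the cycle in which it enters each unit."""
    last_write = {}  # register -> cycle in which Unit5 stores its new value
    entries = []
    for text, sources, dest in program:
        prev = entries[-1] if entries else None
        row = []
        for unit in range(1, n_units + 1):
            # An instruction enters a unit at least one cycle after the unit before.
            earliest = row[-1] + 1 if row else 1
            if prev is not None and unit < n_units:
                # The unit is free once the previous instruction has moved on.
                earliest = max(earliest, prev[unit])
            elif prev is not None:
                earliest = max(earliest, prev[unit - 1] + 1)
            if unit == operand_unit:
                # Wait until every source register has been written back.
                for reg in sources:
                    if reg in last_write:
                        earliest = max(earliest, last_write[reg] + 1)
            row.append(earliest)
        last_write[dest] = row[-1]
        entries.append(row)
    return entries


program = [
    ("srl %r3, %r5", ["%r3"], "%r5"),
    ("addcc %r1, 10, %r1", ["%r1"], "%r1"),
    ("ld %r1, %r2", ["%r1"], "%r2"),
    ("subcc %r2, %r4, %r4", ["%r2", "%r4"], "%r4"),
]
for (text, _, _), row in zip(program, schedule(program)):
    print(f"{text:<22}", row)
```

Under these assumptions the four instructions finish in cycle 12 rather than cycle 8, because `ld` waits for `%r1` and `subcc` waits for `%r2`. The empty slots in between are the pipeline "bubbles".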
Review
• Pipelining