Principles of pipelining
• The two major parametric considerations in designing a parallel computer architecture are:
– executing multiple instructions in parallel, and
– increasing the efficiency of processors.
There are various methods by which instructions can be executed in parallel:
– Pipelining is one of the classical and effective methods of increasing parallelism, where different stages perform repeated functions on different operands.
– Vector processing is arithmetic or logical computation applied to vectors, whereas scalar processing operates on only one data item or a pair of data items at a time.
• Superscalar processing: improving the processor's speed by issuing multiple instructions per cycle.
• Multithreading: used to increase processor utilization; it is also employed in parallel computer architecture.
OBJECTIVES
• Principles of linear pipelining
• Classification of pipeline processors
• Instruction and arithmetic pipelines
• Principles of designing pipeline processors
• Vector processing requirements
PARALLEL PROCESSING
Execution of concurrent events in the computing process to achieve faster computational speed.
Levels of Parallel Processing
- Job or Program level
- Task or Procedure level
- Inter-Instruction level
- Intra-Instruction level
PARALLEL COMPUTERS
Architectural Classification
– Flynn's classification
• Based on the multiplicity of Instruction Streams and Data
Streams
• Instruction Stream
– Sequence of Instructions read from memory
• Data Stream
– Operations performed on the data in the
processor
Flynn's classification matrix:

                              Number of Data Streams
                              Single        Multiple
Number of         Single      SISD          SIMD
Instruction
Streams           Multiple    MISD          MIMD
COMPUTER ARCHITECTURES FOR PARALLEL
PROCESSING
Von-Neumann based
  – SISD: Superscalar processors, Superpipelined processors, VLIW
  – MISD: Nonexistent
  – SIMD: Array processors, Systolic arrays, Associative processors
  – MIMD:
      • Shared-memory multiprocessors: bus based, crossbar switch based, multistage IN based
      • Message-passing multicomputers: hypercube, mesh, reconfigurable
Dataflow
Reduction
PIPELINING
A technique of decomposing a sequential process into suboperations, with each subprocess being executed in a special dedicated segment that operates concurrently with all other segments.
Example:  Ai * Bi + Ci   for i = 1, 2, 3, ..., 7

[Figure: three-segment pipeline. Segment 1: input registers R1 and R2 receive Ai and Bi from memory; Segment 2: a multiplier forms R1 * R2 into R3 while R4 receives Ci from memory; Segment 3: an adder forms R3 + R4 into R5.]

R1 ← Ai, R2 ← Bi            Load Ai and Bi
R3 ← R1 * R2, R4 ← Ci       Multiply and load Ci
R5 ← R3 + R4                Add
OPERATIONS IN EACH PIPELINE STAGE

Clock   Segment 1      Segment 2           Segment 3
Pulse   R1    R2       R3         R4       R5
  1     A1    B1       --         --       --
  2     A2    B2       A1 * B1    C1       --
  3     A3    B3       A2 * B2    C2       A1 * B1 + C1
  4     A4    B4       A3 * B3    C3       A2 * B2 + C2
  5     A5    B5       A4 * B4    C4       A3 * B3 + C3
  6     A6    B6       A5 * B5    C5       A4 * B4 + C4
  7     A7    B7       A6 * B6    C6       A5 * B5 + C5
  8     --    --       A7 * B7    C7       A6 * B6 + C6
  9     --    --       --         --       A7 * B7 + C7
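This schedule can be reproduced with a short simulation. The following Python sketch is purely illustrative (the operand values and names are assumptions, not from the slides): it clocks the seven tasks through the three segments and reports which task occupies each segment at every pulse.

```python
# Illustrative sketch: simulate the three-segment pipeline
# R1<-Ai, R2<-Bi | R3<-R1*R2, R4<-Ci | R5<-R3+R4
# and print which task occupies each segment at every clock pulse.

A = [1, 2, 3, 4, 5, 6, 7]
B = [8, 7, 6, 5, 4, 3, 2]
C = [10, 20, 30, 40, 50, 60, 70]

n, k = len(A), 3                      # 7 tasks, 3 segments
results = {}

for pulse in range(1, n + k):         # k + n - 1 = 9 clock pulses
    row = []
    for seg in range(1, k + 1):
        i = pulse - seg               # 0-based task index in this segment
        if 0 <= i < n:
            row.append(f"S{seg}: task {i + 1}")
            if seg == k:              # segment 3 produces Ai*Bi + Ci
                results[i + 1] = A[i] * B[i] + C[i]
        else:
            row.append(f"S{seg}: idle")
    print(f"pulse {pulse}: " + " | ".join(row))

assert results[1] == A[0] * B[0] + C[0]   # first result ready at pulse 3
```

Note how the first result appears only after k = 3 pulses (the pipeline fill time), after which one result emerges per pulse, exactly as in the table above.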
GENERAL PIPELINE
General Structure of a 4-Segment Pipeline

Input → S1 → R1 → S2 → R2 → S3 → R3 → S4 → R4   (a common clock drives all segment registers)

Space-Time Diagram (four segments, six tasks T1–T6)

Clock cycle:  1    2    3    4    5    6    7    8    9
Segment 1:    T1   T2   T3   T4   T5   T6
Segment 2:         T1   T2   T3   T4   T5   T6
Segment 3:              T1   T2   T3   T4   T5   T6
Segment 4:                   T1   T2   T3   T4   T5   T6
PIPELINE PROCESSING
• Pipelining is a method of realizing overlapped parallelism in the proposed solution of a problem on a digital computer in an economical way.
• To introduce pipelining in a processor P, the following steps must be followed:
• Sub-divide the input process into a sequence of subtasks. These subtasks make up the stages of the pipeline, also known as segments.
• Each stage Si of the pipeline performs some operation on a distinct set of operands according to its subtask.
• When stage Si has completed its operation, the results are passed to the next stage Si+1 for the next operation.
• Stage Si receives a new set of inputs from the previous stage Si-1.
Parallelism in a pipelined processor is achieved in that m independent operations can be performed simultaneously in m segments, as in the space-time diagram above.
Pipeline Processor
• A pipeline processor can be defined as a processor that consists of a sequence of processing circuits, called segments, through which a stream of operands (data) is passed.
• In each segment, partial processing of the data stream is performed, and the final output is received when the stream has passed through the whole pipeline.
• An operation that can be decomposed into a sequence of well-defined subtasks is realized through the pipelining concept.
Classification of Pipeline Processors
• Level of Processing
• Pipeline configuration
• Type of Instruction and data
Classification according to level
of processing
• Instruction pipeline
• Arithmetic pipeline
Instruction Pipeline
• An instruction cycle may consist of many operations: fetch opcode, decode opcode, compute operand addresses, fetch operands, and execute the instruction.
• These operations of the instruction execution cycle
can be realized through the pipelining concept.
Each of these operations forms one stage of a
pipeline.
• The overlapping of execution of the operations
through the pipeline provides a speedup over the
normal execution. Thus, the pipelines used for
instruction cycle operations are known as
instruction pipelines.
Instruction Pipelines
• The stream of instructions in the instruction execution cycle can be realized through a pipeline where overlapped execution of different operations is performed.
• The process of executing the instruction involves
the following major steps:
• Fetch the instruction from the main memory
• Decode the instruction
• Fetch the operand
• Execute the decoded instruction
INSTRUCTION CYCLE
Six Phases* in an Instruction Cycle
[1] Fetch an instruction from memory
[2] Decode the instruction
[3] Calculate the effective address of the operand
[4] Fetch the operands from memory
[5] Execute the operation
[6] Store the result in the proper place
* Some instructions skip some phases
* Effective address calculation can be done as part of the decoding phase
* Storage of the operation result into a register
is done automatically in the execution phase
==> 4-Stage Pipeline
[1] FI: Fetch an instruction from memory
[2] DA: Decode the instruction and calculate
the effective address of the operand
[3] FO: Fetch the operand
[4] EX: Execute the operation
INSTRUCTION PIPELINE
Execution of Three Instructions in a 4-Stage Pipeline
Conventional (sequential):
i   :  FI  DA  FO  EX
i+1 :                  FI  DA  FO  EX
i+2 :                                  FI  DA  FO  EX

Pipelined:
i   :  FI  DA  FO  EX
i+1 :      FI  DA  FO  EX
i+2 :          FI  DA  FO  EX
INSTRUCTION EXECUTION IN A 4-STAGE PIPELINE
[Flowchart: Segment 1 fetches the instruction from memory; Segment 2 decodes the instruction and calculates the effective address, then tests for a branch — on a branch, the PC is updated and the pipe emptied; Segment 3 fetches the operand from memory; Segment 4 executes the instruction, then tests for an interrupt — on an interrupt, interrupt handling runs before execution continues.]
Step:            1   2   3   4   5   6   7   8   9  10  11  12  13
Instruction 1:   FI  DA  FO  EX
            2:       FI  DA  FO  EX
 (Branch)   3:           FI  DA  FO  EX
            4:               FI  --  --  FI  DA  FO  EX
            5:                   --  --      FI  DA  FO  EX
            6:                                   FI  DA  FO  EX
            7:                                       FI  DA  FO  EX

The fetch made at step 4 is invalidated when the branch resolves at step 6; fetching resumes from the branch target at step 7.
Instruction buffers
• To take full advantage of pipelining, the pipeline should be kept filled continuously.
• The instruction fetch rate should match the pipeline consumption rate; instruction buffers are used to achieve this.
• An instruction buffer is high-speed memory in the CPU for storing instructions; instructions are pre-fetched into the buffer from main memory.
• An alternative to the instruction buffer is a cache memory between the CPU and main memory.
• The advantage of cache memory is that it can be used for both instructions and data, but cache requires more complex control logic than the instruction buffer.
Arithmetic Pipeline
• Complex arithmetic operations like multiplication and floating-point operations consume much of the ALU's time.
• These operations can also be pipelined by
segmenting the operations of the ALU and as a
consequence, high speed performance may be
achieved.
• Thus, the pipelines used for arithmetic operations
are known as arithmetic pipelines.
Arithmetic Pipelines
• The technique of pipelining can be applied to various
complex and slow arithmetic operations to speed up the
processing time.
• Arithmetic pipelines are constructed for simple fixed-point and complex floating-point arithmetic operations.
• These arithmetic operations are well suited to pipelining
as these operations can be efficiently partitioned into
subtasks for the pipeline stages.
• For implementing the arithmetic pipelines we generally
use following two types of adder:
• Carry propagation adder (CPA): adds two numbers such that carries generated at successive digit positions are propagated.
• Carry save adder (CSA): adds three numbers and produces one sum vector and one carry vector; the carries generated are not propagated but saved in the carry vector.
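A quick way to see the CPA/CSA distinction is in code. This Python sketch (an illustration, not from the slides; the operand values are arbitrary) performs one level of carry-save addition on three operands and resolves the saved carry vector with a single carry-propagate addition at the end:

```python
# Illustrative sketch: a carry-save adder reduces THREE operands to a
# sum vector and a carry vector without propagating carries; a carry
# propagation step (here, Python's +) resolves them at the end.

def csa(x, y, z):
    """Bitwise carry-save addition: returns (sum vector, carry vector)."""
    s = x ^ y ^ z                              # per-bit sum, no propagation
    c = ((x & y) | (y & z) | (x & z)) << 1     # saved carries, shifted left
    return s, c

s, c = csa(0b1011, 0b0110, 0b1101)
assert s + c == 0b1011 + 0b0110 + 0b1101       # CPA step: one real addition
print(bin(s), bin(c), s + c)                   # 0b0 0b11110 30
```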
Fixed Arithmetic Pipelines
• Example: multiplication of fixed-point numbers.
– Two fixed-point numbers are multiplied by the ALU using repeated add and shift operations.
– This sequential execution makes multiplication a slow process.
– Multiplication is the process of adding multiple copies of shifted multiplicands, as the stages below show.
• The first stage generates the partial products, which form the six rows of shifted multiplicands.
• In the second stage, the six numbers are fed to two CSAs, merging them into four numbers.
• In the third stage, a single CSA merges the four numbers into three numbers.
• In the fourth stage, a single CSA merges the three numbers into two numbers.
• In the fifth stage, the last two numbers are added through a CPA to get the final product.
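The five-stage flow can be traced numerically. Below is a hedged Python sketch (the 6-bit operand width and values are arbitrary choices for illustration) that generates six shifted multiplicands and reduces them 6 → 4 → 3 → 2 with CSAs before the final CPA:

```python
# Illustrative sketch of the five-stage fixed-point multiply pipeline:
# 6 shifted multiplicands -> (two CSAs) 4 numbers -> (one CSA) 3 numbers
# -> (one CSA) 2 numbers -> CPA final product.

def csa(x, y, z):                     # carry-save add: 3 numbers -> 2
    return x ^ y ^ z, ((x & y) | (y & z) | (x & z)) << 1

p, q = 0b101101, 0b110111             # 6-bit multiplicand and multiplier

# Stage 1: generate the six shifted multiplicands (partial products).
rows = [(p << i) if (q >> i) & 1 else 0 for i in range(6)]

# Stage 2: two CSAs merge six numbers into four.
s1, c1 = csa(*rows[0:3])
s2, c2 = csa(*rows[3:6])

# Stage 3: one CSA merges four numbers into three (c2 passes through).
s3, c3 = csa(s1, c1, s2)

# Stage 4: one CSA merges the remaining three numbers into two.
s4, c4 = csa(s3, c3, c2)

# Stage 5: a carry-propagate addition produces the final product.
assert s4 + c4 == p * q
print(p * q, s4 + c4)                 # 2475 2475
```

Each CSA preserves the running sum (x + y + z == s + c), so only the single CPA at the end pays the cost of carry propagation.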
Floating point Arithmetic pipelines
• Floating point computations are the best
candidates for pipelining.
• Example: addition of two floating-point numbers.
– The following stages are identified for the addition of two floating-point numbers:
• The first stage compares the exponents of the two numbers.
• The second stage aligns the mantissas.
• In the third stage, the mantissas are added.
• In the last stage, the result is normalized.
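As a concrete trace of these four stages, here is a minimal Python sketch. The decimal mantissa/exponent representation and the sample values are assumptions chosen for readability, not how the hardware stores numbers:

```python
# Hedged sketch of the four pipeline stages for floating-point addition.

def fp_add(a_mant, a_exp, b_mant, b_exp):
    # Stage 1: compare the exponents (by subtraction).
    diff = a_exp - b_exp
    # Stage 2: align the mantissa belonging to the smaller exponent.
    if diff >= 0:
        exp, b_mant = a_exp, b_mant / (10 ** diff)
    else:
        exp, a_mant = b_exp, a_mant / (10 ** -diff)
    # Stage 3: add the mantissas.
    mant = a_mant + b_mant
    # Stage 4: normalize the result (mantissa kept in [0.1, 1)).
    while abs(mant) >= 1.0:
        mant, exp = mant / 10, exp + 1
    while 0 < abs(mant) < 0.1:
        mant, exp = mant * 10, exp - 1
    return mant, exp

# Sample values: 0.9504e3 + 0.8200e2 = 0.10324e4
print(fp_add(0.9504, 3, 0.8200, 2))   # -> (0.10324, 4), up to rounding
```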
ARITHMETIC PIPELINE
Floating-Point Adder Pipeline

Inputs: X = A × 2^a, Y = B × 2^b   (mantissas A, B; exponents a, b)

[1] Compare the exponents
[2] Align the mantissas
[3] Add/subtract the mantissas
[4] Normalize the result

[Figure: four-segment pipeline. Segment 1: compare exponents by subtraction; Segment 2: choose the larger exponent and align the mantissa of the smaller; Segment 3: add or subtract the mantissas; Segment 4: normalize the result and adjust the exponent. R denotes the latch registers between segments.]
Classification according to pipeline
configuration
• Unifunction Pipelines: When a fixed and
dedicated function is performed through a
pipeline, it is called a Unifunction pipeline.
• Multifunction Pipelines: When different
functions at different times are performed
through the pipeline, this is known as
Multifunction pipeline.
– Multifunction pipelines are reconfigurable at
different times according to the operation being
performed
Classification according to type
of instruction and data
• Scalar Pipelines: This type of pipeline
processes scalar operands of repeated
scalar instructions.
• Vector Pipelines: This type of pipeline
processes vector instructions over vector
operands.
Performance and Issues in Pipelining
• Speedup: how much performance gain is obtained through pipelining.
– n: number of tasks to be performed
• Conventional machine (non-pipelined)
– tn: clock cycle (time to complete one task)
– t1: time required to complete the n tasks
– t1 = n * tn
• Pipelined machine (k stages)
– tp: clock cycle (time to complete each suboperation)
– tk: time required to complete the n tasks
– tk = (k + n - 1) * tp
• Speedup
– Sk = n * tn / [(k + n - 1) * tp]
– lim (n → ∞) Sk = tn / tp   ( = k, if tn = k * tp )
PIPELINE AND MULTIPLE FUNCTION UNITS
Example
- 4-stage pipeline (k = 4)
- suboperation in each stage: tp = 20 ns
- 100 tasks to be executed
- 1 task in the non-pipelined system: tn = 4 * 20 = 80 ns

Pipelined system:      (k + n - 1) * tp = (4 + 100 - 1) * 20 = 2060 ns
Non-pipelined system:  n * tn = 100 * 80 = 8000 ns
Speedup:               Sk = 8000 / 2060 = 3.88

A 4-stage pipeline is basically identical to a system with 4 identical function units.

[Figure: multiple functional units P1–P4 processing instructions Ii, Ii+1, Ii+2, Ii+3 in parallel.]
• Efficiency: the efficiency of a pipeline can be measured as the ratio of the busy time span to the total time span, including idle time.
• Let c be the clock period and m the number of segments; for n tasks the efficiency E is:
E = (n · m · c) / (m · [m · c + (n − 1) · c]) = n / (m + n − 1)
• As n → ∞, E approaches 1.
• Throughput: the number of results produced per unit time:
T = (n / [m + (n − 1)]) / c = E / c
• Throughput denotes the computing power of the pipeline.
• Maximum speedup, efficiency, and throughput are ideal cases.
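A quick numeric check of these formulas, reusing the figures from the speedup example above (k = m = 4, n = 100, tp = c = 20 ns):

```python
# Verify Sk, E, and T for the worked example (k = m = 4, n = 100,
# tp = c = 20 ns, tn = k * tp = 80 ns per non-pipelined task).

k, n, tp = 4, 100, 20e-9
tn = k * tp                                 # 80 ns

speedup    = (n * tn) / ((k + n - 1) * tp)  # Sk = n*tn / [(k+n-1)*tp]
efficiency = n / (k + n - 1)                # E  = n / (m+n-1)
throughput = efficiency / tp                # T  = E / c, results per second

print(f"Sk = {speedup:.2f}")                # 3.88, matching the example
print(f"E  = {efficiency:.3f}")             # 0.971
print(f"T  = {throughput:.3e} results/s")   # ~4.85e7
```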
Limitations on Speedup
• Data dependency between successive tasks: there may be dependencies between the instructions of two tasks used in the pipeline.
• For example:
– One instruction cannot be started until the previous instruction returns its results, as the two are interdependent.
– Another instance of data dependency is when both instructions try to modify the same data object.
These are called data hazards.
Resource constraints: when resources are not available at the time of execution, delays are caused in the pipeline.
For example:
1) If one common memory is used for both data and instructions, and there is a need to read/write data and fetch an instruction at the same time, only one can be carried out and the other must wait.
2) A limited resource, like the execution unit, may be busy at the required time.
• Branch instructions and interrupts in the program: a program is not a straight flow of sequential instructions.
There may be branch instructions that alter the normal flow of the program, which delays pipeline execution and affects performance.
Similarly, interrupts postpone the execution of the next instruction until the interrupt has been serviced.
Branches and interrupts both have damaging effects on pipelining.
PRINCIPLES OF DESIGNING
PIPELINE PROCESSORS
CONTENTS
 INSTRUCTION-PREFETCH AND BRANCH
HANDLING.
 DATA BUFFERING AND BUSING STRUCTURES.
 INTERNAL FORWARDING AND REGISTER
TAGGING.
 HAZARD DETECTION AND RESOLUTION.
INSTRUCTION PREFETCH AND BRANCH
HANDLING
• For designing pipelined instruction units:
• Interrupts and branches produce damaging effects on the performance of pipelined computers.
• A conditional branch operation has two possible paths:
1) Yes (taken) path    2) No (not-taken) path
Five Segments of an Instruction Pipeline
Fetch Instruction → Decode → Fetch Operands → Execute → Store Results

[Figure: overlapped execution of instructions I1–I8 without branching over clock cycles 0–22; each instruction enters a segment one cycle after its predecessor.]

[Figure: effect of branching on the performance of the instruction pipeline; instructions I1–I8 over clock cycles 0–22, with the pipeline drained and refilled after the branch.]
Timing Diagram for Instruction Pipeline Operation

[Figure: the effect of a conditional branch on instruction pipeline operation; instruction 3 is a conditional branch to instruction 15.]

[Figure: alternative pipeline depiction of the same conditional branch to instruction 15.]
Instruction Prefetching Strategy
• Instruction words ahead of the one currently being decoded are fetched from memory before the instruction decoding unit requests them.
• Two prefetch buffers:
– Sequential prefetch buffer (size s)
• Holds instructions fetched during a sequential run of the program.
• When a branch is successful (taken), the contents of this buffer are invalidated.
– Target prefetch buffer (size t)
• Holds instructions fetched from the target of a conditional branch.
• When the conditional branch is unsuccessful (not taken), the contents of this buffer are invalidated.
• Unconditional branch (jump):
– The instruction word at the branch target is requested immediately by the decoder, and decoding ceases until the target instruction returns from memory.
• Conditional branch:
– Sequential prefetching is suspended.
– Instructions are prefetched from the target memory address of the conditional branch instruction.
– If the branch is successful, the target instruction stream becomes the sequential stream.
• Instruction prefetching reduces the damaging effect of branching.
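A minimal sketch of this dual-buffer policy follows; the buffer sizes, word-addressed instruction memory, and all names here are hypothetical, chosen only to make the invalidation rule concrete:

```python
# Illustrative sketch: both paths of a conditional branch are prefetched;
# resolving the branch invalidates one buffer and promotes the other.

sequential_buffer = []     # holds up to s words past the branch
target_buffer = []         # holds up to t words from the branch target

def prefetch(memory, pc, target, s=4, t=4):
    sequential_buffer[:] = memory[pc:pc + s]        # fall-through path
    target_buffer[:] = memory[target:target + t]    # taken path

def resolve(branch_taken):
    if branch_taken:
        sequential_buffer.clear()    # invalidate the sequential stream
        return target_buffer         # target stream becomes sequential
    target_buffer.clear()            # invalidate the target stream
    return sequential_buffer

memory = [f"I{i}" for i in range(32)]
prefetch(memory, pc=5, target=20)
print(resolve(branch_taken=True))    # ['I20', 'I21', 'I22', 'I23']
```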
An Instruction Pipeline with Both Sequential and Target Prefetch Buffers

[Figure: a memory system (access time T) feeds both the sequential prefetch buffer (s words) and the target prefetch buffer (t words); the decoder (r time units) selects between them and feeds the execution pipeline E.]
DATA BUFFERING AND BUSING STRUCTURES
• The processing speeds of pipeline segments are usually unequal.
• The throughput of the pipeline is inversely proportional to the delay of the bottleneck (slowest) segment.
• It is desirable to remove the bottleneck, which causes unnecessary congestion.

Example: segments S1, S2, S3 with delays T1, T2, T3, where T1 = T3 = T and T2 = 3T. Segment 2 is the bottleneck.
Subdivision of Segment 2 (two different divisions of the bottleneck S2):
(a)  S1(T) → S2(T) → S2(T) → S2(T) → S3(T)
(b)  S1(T) → S2(T) → S2(2T) → S3(T)

Replication of Segment 2:
Three copies of S2 (each of delay 3T) are placed in parallel between S1(T) and S3(T), with operands distributed among them.

If the bottleneck is not subdivisible, use duplicates of the bottleneck in parallel to smooth the congestion.
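The effect of both remedies on steady-state throughput can be checked with a few lines of arithmetic (unit delays assumed for illustration):

```python
# Steady-state throughput is set by the slowest segment: one result
# per max(segment delay) for a linear pipeline.

T = 1.0                            # base segment delay, arbitrary units

original   = [T, 3 * T, T]         # S2 is the 3T bottleneck
subdivided = [T, T, T, T, T]       # S2 split into three T-delay stages

print(1 / max(original))           # 0.333... results per unit time
print(1 / max(subdivided))         # 1.0 -> 3x throughput

# Replication: three 3T copies of S2 fed round-robin also complete,
# on average, one result per T.
copies, t2 = 3, 3 * T
print(copies / t2)                 # 1.0
```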
Data and Instruction Buffers
• One way to smooth the traffic flow in a pipeline is to use buffers to close the speed gap between the memory accesses for instructions or operands and the processing stages.
• Buffering can avoid unnecessary idling of the processing stages caused by memory-access conflicts or by unexpected branching or interrupts.
Busing Structures
• Ideally, the subfunction being executed by one stage should be independent of the other subfunctions being executed by the remaining stages; otherwise some processes in the pipeline must be halted until the dependency is removed.
• These problems cause additional time delays. An efficient internal busing structure is desired to route results to the requesting stations with minimum time delay.
Internal Forwarding and Register Tagging
• Internal forwarding refers to a “short circuit”
technique for replacing unnecessary memory
accesses by register-to-register transfers in a
sequence of fetch-arithmetic-store
operations.
• Register tagging refers to the use of tagged registers, buffers, and reservation stations for exploiting concurrent activities among multiple arithmetic units.
Internal Forwarding Examples

a) Store-fetch forwarding
Before:  Mi ← R1 (store); R2 ← Mi (fetch)              — 2 memory accesses
After:   Mi ← R1 (store); R2 ← R1 (register transfer)  — 1 memory access

b) Fetch-fetch forwarding
Before:  R1 ← Mi (fetch); R2 ← Mi (fetch)              — 2 memory accesses
After:   R1 ← Mi (fetch); R2 ← R1 (register transfer)  — 1 memory access

c) Store-store forwarding
Before:  Mi ← R1 (store); Mi ← R2 (store)              — 2 memory accesses
After:   Mi ← R2 (store)                               — 1 memory access
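These three rules can be expressed as a peephole rewrite over adjacent operations. The (op, register, memory) encoding below is hypothetical; the sketch only mechanizes the substitutions listed above:

```python
# Peephole sketch of internal forwarding over (op, reg, mem) triples:
# ("store", R, M) means M <- R, ("fetch", R, M) means R <- M.

def forward(ops):
    out = []
    for op in ops:
        prev = out[-1] if out else None
        if prev and prev[0] == "store" and op[0] == "fetch" and op[2] == prev[2]:
            # store-fetch: M <- R1 ; R2 <- M  ==>  keep store, R2 <- R1
            out.append(("move", op[1], prev[1]))
        elif prev and prev[0] == "fetch" and op[0] == "fetch" and op[2] == prev[2]:
            # fetch-fetch: R1 <- M ; R2 <- M  ==>  keep first fetch, R2 <- R1
            out.append(("move", op[1], prev[1]))
        elif prev and prev[0] == "store" and op[0] == "store" and op[2] == prev[2]:
            # store-store: M <- R1 ; M <- R2  ==>  keep only the second store
            out[-1] = op
        else:
            out.append(op)
    return out

prog = [("store", "R1", "Mi"), ("fetch", "R2", "Mi")]
print(forward(prog))   # [('store', 'R1', 'Mi'), ('move', 'R2', 'R1')]
```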
HAZARD DETECTION AND RESOLUTION
• Pipeline hazards are caused by resource-usage conflicts among the various instructions in the pipeline.
• Such hazards are triggered by inter-instruction dependencies.
• Three classes of data-dependent hazards, according to various data update patterns:
1) write after read (WAR)   2) read after write (RAW)   3) write after write (WAW)
Continued…
• Hazard detection can be done in the instruction-fetch stage of a pipelined processor by comparing the domain and range of the incoming instruction with those of the instructions being processed in the pipe.
• A warning signal can be generated to prevent the hazard from taking place.
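A sketch of that domain/range comparison follows, with the domain D(I) taken as the set of registers an instruction reads and the range R(I) as the set it writes. The set encoding is illustrative, not a real pipeline's interface:

```python
# Classify hazards between instruction I and a LATER instruction J by
# intersecting their read (domain) and write (range) register sets.

def hazards(reads_i, writes_i, reads_j, writes_j):
    found = []
    if writes_i & reads_j:
        found.append("RAW")   # J reads what I writes
    if reads_i & writes_j:
        found.append("WAR")   # J writes what I reads
    if writes_i & writes_j:
        found.append("WAW")   # both write the same location
    return found

# ADD R1, R2, R3  followed by  SUB R4, R1, R5
print(hazards(reads_i={"R2", "R3"}, writes_i={"R1"},
              reads_j={"R1", "R5"}, writes_j={"R4"}))   # ['RAW']
```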
MAJOR HAZARDS IN PIPELINED EXECUTION

Structural hazards (resource conflicts)
Hardware resources required by the instructions in simultaneous overlapped execution cannot be met.

Data hazards (data dependency conflicts)
An instruction scheduled to be executed in the pipeline requires the result of a previous instruction, which is not yet available.

ADD  R1 ← B + C
INC  R1 ← R1 + 1

[Figure: data dependency — ADD decodes (DA) and reads B, C, then computes; INC must hold a bubble in its DA stage until R1 has been written.]

Control hazards
Branches and other instructions that change the PC delay the fetch of the next instruction.

[Figure: a JMP decodes (ID) and computes the new PC; the following fetch waits in a bubble until the PC is updated.]

Hazards in pipelines may make it necessary to stall the pipeline.

[Figure: branch address dependency across the stages IF, ID, OF, OE, OS.]

Pipeline interlock: detect hazards and stall until the hazard is cleared.
STRUCTURAL HAZARDS

Structural hazards occur when some resource has not been duplicated enough to allow all combinations of instructions in the pipeline to execute.

Example: with one memory port, a data fetch and an instruction fetch cannot be initiated in the same clock cycle.

i   :  FI  DA  FO  EX
i+1 :      FI  DA  FO  EX
i+2 :          stall  stall  FI  DA  FO  EX

The pipeline is stalled for a structural hazard: two loads with a one-port memory. A two-port memory would serve both accesses without a stall.
DATA HAZARDS

Data hazards occur when the execution of an instruction depends on the results of a previous instruction:
ADD  R1, R2, R3
SUB  R4, R1, R5

Data hazards can be dealt with by either hardware or software techniques.

Hardware techniques
Interlock
- Hardware detects the data dependency and delays the scheduling of the dependent instruction by stalling enough clock cycles.
Forwarding (bypassing, short-circuiting)
- Accomplished by a data path that routes a value from a source (usually an ALU) to a user, bypassing a designated register. This allows the value produced to be used at an earlier stage in the pipeline than would otherwise be possible.

Software technique
Instruction scheduling (by the compiler) for delayed load.
FORWARDING HARDWARE

Example:
ADD  R1, R2, R3
SUB  R4, R1, R5

3-stage pipeline:
I: instruction fetch
A: decode, read registers, ALU operations
E: write the result to the destination register

[Figure: register file feeding the ALU through two MUXes; a bypass path routes the ALU result buffer back to the MUX inputs, in parallel with the result write bus.]

Without bypassing:   ADD:  I  A  E
                     SUB:     I  .  A  E   (A waits until R1 is written)

With bypassing:      ADD:  I  A  E
                     SUB:     I  A  E      (A receives R1 via the bypass path)
INSTRUCTION SCHEDULING

a = b + c;
d = e - f;

Unscheduled code:          Scheduled code:
LW   Rb, b                 LW   Rb, b
LW   Rc, c                 LW   Rc, c
ADD  Ra, Rb, Rc            LW   Re, e
SW   a, Ra                 ADD  Ra, Rb, Rc
LW   Re, e                 LW   Rf, f
LW   Rf, f                 SW   a, Ra
SUB  Rd, Re, Rf            SUB  Rd, Re, Rf
SW   d, Rd                 SW   d, Rd

Delayed load: a load requiring that the following instruction not use its result.
CONTROL HAZARDS

Branch instructions: the branch target address is not known until the branch instruction is completed.

Branch instruction:  FI  DA  FO  EX
Next instruction:                    FI  DA  FO  EX
                     (target address available only after EX)

- Stall → waste of cycle times

Dealing with Control Hazards:
* Prefetch target instruction
* Branch target buffer
* Loop buffer
* Branch prediction
* Delayed branch
CONTROL HAZARDS

Prefetch target instruction
– Fetch instructions in both streams, branch not taken and branch taken.
– Both are saved until the branch is executed; then the right instruction stream is selected and the wrong stream discarded.

Branch target buffer (BTB; associative memory)
– Entry: the address of a previously executed branch, plus the target instruction and the next few instructions.
– When fetching an instruction, search the BTB.
– If found, fetch the instruction stream from the BTB; if not, fetch the new stream and update the BTB.

Loop buffer (high-speed register file)
– Stores an entire loop so that it can be executed without accessing memory.

Branch prediction
– Guess the branch condition and fetch an instruction stream based on the guess. A correct guess eliminates the branch penalty.

Delayed branch
– The compiler detects the branch and rearranges the instruction sequence by inserting useful instructions that keep the pipeline busy in the presence of a branch instruction.
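The slides leave the prediction scheme open; one common realization (an assumption here, not stated above) is a 2-bit saturating counter per branch address, sketched below:

```python
# Hypothetical 2-bit saturating-counter predictor: counter values 0-1
# predict "not taken", 2-3 predict "taken"; each outcome nudges the
# counter one step, saturating at 0 and 3.

from collections import defaultdict

counters = defaultdict(lambda: 2)      # start each branch "weakly taken"

def predict(pc):
    return counters[pc] >= 2           # True => fetch from the branch target

def update(pc, taken):
    c = counters[pc]
    counters[pc] = min(3, c + 1) if taken else max(0, c - 1)

outcomes = [True, True, False, True, True]   # sample branch behaviour
hits = 0
for taken in outcomes:
    hits += predict(0x40) == taken
    update(0x40, taken)
print(f"{hits}/{len(outcomes)} correct predictions")   # 4/5
```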
DELAYED LOAD

LOAD:   R1 ← M[address 1]
LOAD:   R2 ← M[address 2]
ADD:    R3 ← R1 + R2
STORE:  M[address 3] ← R3

Three-segment pipeline timing

Pipeline timing with data conflict:
clock cycle:   1  2  3  4  5  6
Load R1:       I  A  E
Load R2:          I  A  E
Add R1+R2:           I  A  E
Store R3:               I  A  E

Pipeline timing with delayed load:
clock cycle:   1  2  3  4  5  6  7
Load R1:       I  A  E
Load R2:          I  A  E
NOP:                 I  A  E
Add R1+R2:              I  A  E
Store R3:                  I  A  E

The data dependency is taken care of by the compiler rather than the hardware.
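A compiler-side sketch of this NOP insertion follows, assuming a simple (op, destination, sources) instruction encoding (hypothetical) and the three-segment I/A/E pipeline above:

```python
# Scan for a load whose result is used by the immediately following
# instruction and insert a NOP between them (delayed-load fix).

def insert_delay_slots(program):
    out = []
    for idx, (op, dst, srcs) in enumerate(program):
        out.append((op, dst, srcs))
        if op == "LOAD" and idx + 1 < len(program):
            next_srcs = program[idx + 1][2]
            if dst in next_srcs:               # load-use hazard detected
                out.append(("NOP", None, ()))
    return out

program = [
    ("LOAD",  "R1", ()),            # R1 <- M[address 1]
    ("LOAD",  "R2", ()),            # R2 <- M[address 2]
    ("ADD",   "R3", ("R1", "R2")),  # R3 <- R1 + R2
    ("STORE", None, ("R3",)),       # M[address 3] <- R3
]
for ins in insert_delay_slots(program):
    print(ins)                      # NOP appears between LOAD R2 and ADD
```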
DELAYED BRANCH

The compiler analyzes the instructions before and after the branch and rearranges the program sequence by inserting useful instructions in the delay steps.

Using no-operation instructions:
Clock cycles:     1  2  3  4  5  6  7  8  9  10
1. Load           I  A  E
2. Increment         I  A  E
3. Add                  I  A  E
4. Subtract                I  A  E
5. Branch to X                I  A  E
6. NOP                           I  A  E
7. NOP                              I  A  E
8. Instr. in X                         I  A  E

Rearranging the instructions:
Clock cycles:     1  2  3  4  5  6  7  8
1. Load           I  A  E
2. Increment         I  A  E
3. Branch to X          I  A  E
4. Add                     I  A  E
5. Subtract                   I  A  E
6. Instr. in X                   I  A  E
Pipeline Throughput
• The average number of task initiations per
clock cycle
Dynamic Pipeline and Reconfigurability
A dynamic pipeline may initiate tasks from different reservation tables simultaneously, allowing multiple initiations of different functions in the same pipeline.
It is assumed that any computation step can be delayed by inserting a non-compute stage.
Pipelines with perfect initiation cycles can be better utilized than those with imperfect initiation cycles.
Reconfigurability: reconfigurable pipelines with different function types are more desirable. Such an approach requires extensive resource sharing among the different functions. To achieve this, a more complicated pipeline segment structure and interconnection control is needed.
A bypass technique can be used to avoid unwanted stages. This may cause a collision when one instruction, as a result of bypassing, attempts to use an operand fetched for a preceding instruction.
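Reservation-table analysis underlies these initiation decisions. The sketch below, assuming a table represented as the clock steps at which each stage is used (the table itself is a hypothetical example), computes the forbidden latencies and the collision vector:

```python
# Forbidden latencies are the distances between two uses of the same
# stage; bit i of the collision vector is set if latency i is forbidden.

from itertools import combinations

table = {            # hypothetical 3-stage reservation table
    "S1": [0, 4],    # stage S1 used at clock steps 0 and 4
    "S2": [1, 3],
    "S3": [2],
}

forbidden = {abs(a - b)
             for steps in table.values()
             for a, b in combinations(steps, 2)}
print("forbidden latencies:", sorted(forbidden))     # [2, 4]

n = max(max(steps) for steps in table.values())      # largest latency to test
collision_vector = "".join("1" if i in forbidden else "0"
                           for i in range(n, 0, -1))
print("collision vector:", collision_vector)         # '1010'
```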
UNIVERSITY Question Bank
1. Discuss the key design problems of a pipeline processor.
2. Discuss the various instruction prefetch and branch control strategies and their effect on the performance of a pipeline processor.
3. Explain the internal forwarding and register tagging techniques.
4. Explain the causes, detection, avoidance, and resolution of pipeline hazards.
5. For a pipeline processor system, explain:
   i) Instruction prefetching
   ii) Data dependency hazards
6. What are the factors affecting the performance of pipelined computers?
7. What are the different hazards in a pipeline processor? How are they detected and resolved?