Non Linear Pipeline

UNIT-4
Characteristics Of Pipeline Processors
Pipelining refers to the temporal overlapping of processing. Pipelines are
essentially assembly lines in computing that can be used for instruction
processing. A basic pipeline processes a sequence of tasks or instructions
according to the following principle of operation.
Each task is subdivided into a number of successive subtasks. The processing of
each single instruction can be broken down into four subtasks:
1. Instruction Fetch
2. Instruction Decode
3. Execute
4. Write back
It is assumed that there is a pipeline stage associated with each subtask.
The same amount of time is available in each stage for performing the
required subtask.
All the pipeline stages operate like an assembly line, that is, receiving their
input from the previous stage and delivering their output to the next stage. We
also assume that the basic pipeline operates clocked, in other words
synchronously. This means that each stage accepts a new input at the start of a
clock cycle, each stage has a single clock cycle available for performing the
required operation, and each stage delivers its result to the next stage by
the beginning of the subsequent clock cycle.
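This clocked operation can be sketched in code. The following is an illustrative simulation only (the stage names are the four subtasks listed above, not any particular machine): at every tick, each stage passes its task to the next stage, the last stage retires a finished task, and the first stage accepts a new one.

```python
# Minimal sketch of a clocked (synchronous) linear pipeline.
def simulate_pipeline(n_tasks, stages=("IF", "ID", "EX", "WB")):
    k = len(stages)
    in_stage = [None] * k            # task currently held by each stage
    pending = list(range(n_tasks))   # tasks waiting to enter the pipeline
    clock = 0
    completed = []
    while pending or any(s is not None for s in in_stage):
        clock += 1
        for i in range(k - 1, 0, -1):          # pass results onward
            in_stage[i] = in_stage[i - 1]
        in_stage[0] = pending.pop(0) if pending else None
        if in_stage[-1] is not None:           # task finishes this cycle
            completed.append(in_stage[-1])
            in_stage[-1] = None
    return clock, completed

# With k = 4 stages, n = 5 tasks take k + (n - 1) = 8 clock cycles.
cycles, order = simulate_pipeline(5)
```

The total of k + (n − 1) cycles returned here is exactly the pipeline fill time plus one cycle per remaining task, which is the timing formula derived later in this unit.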
Linear Pipeline Processors:
A linear pipeline processor is a cascade of processing stages which are linearly
connected to perform a fixed function over a stream of data flowing from one
end to the other. In modern computers, linear pipelines are applied for
instruction execution, arithmetic computation, and memory access operations.
A linear pipeline processor is constructed with k processing stages. External
inputs are fed into the pipeline at the first stage S1. The processed results are
passed from stage Si to stage Si+1 for all i = 1, 2, …, k−1. The final result emerges
from the pipeline at the last stage Sk. Depending on the control of data flow
along the pipeline, linear pipelines are modeled in two categories.
Asynchronous Model: Data flow between adjacent stages in an asynchronous
pipeline is controlled by a handshaking protocol. When stage Si is ready to
transmit, it sends a ready signal to stage Si+1. After stage Si+1 receives the
incoming data, it returns an acknowledge signal to Si.
Synchronous Model: Clocked latches are used to interface between stages; on
the arrival of a clock pulse, all latches transfer data to the next stage
simultaneously, as described above.
Non-Linear Pipeline Processor:
[Figure: a three-stage pipeline]
Clock period: The logic circuitry in each stage Si has a time delay denoted by τi.
Let τl be the time delay of each interface latch. The clock period of a linear
pipeline is defined by
τ = max{τi} + τl
The reciprocal of the clock period is called the frequency, f = 1/τ.
Ideally, a linear pipeline with k stages can process n tasks in
Tk = [k + (n − 1)]τ
time, where k cycles are used to fill up the pipeline, i.e., to complete
execution of the first task, and n − 1 cycles are needed to complete the
remaining n − 1 tasks. The same number of tasks (operand pairs) can be
executed in a nonpipelined processor with an equivalent function in
T1 = n·k·τ
time.
Speedup: We define the speedup of a k-stage linear pipeline processor
over an equivalent nonpipelined processor as
Sk = T1/Tk = n·k·τ / [k + (n − 1)]τ = n·k / (k + n − 1)
It should be noted that the maximum speedup is Sk → k, for n >> k. In other
words, the maximum speedup that a linear pipeline can provide is k,
where k is the number of stages in the pipe. The maximum speedup is never
fully achievable because of data dependencies between instructions,
interrupts, and other factors.
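The speedup formula above can be sketched directly (an illustrative helper, with τ cancelling out of the ratio):

```python
# Speedup of a k-stage linear pipeline over a nonpipelined processor:
# S_k = T1 / Tk = (n * k * tau) / ((k + (n - 1)) * tau) = n * k / (k + n - 1)
def speedup(k, n):
    return (n * k) / (k + n - 1)

# For a single task (n = 1) there is no overlap, so S_k = 1;
# as n grows much larger than k, S_k approaches the ideal value k.
```

For example, speedup(4, 10000) is already within 0.001 of the ideal value 4.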
Efficiency: The efficiency of a linear pipeline is measured by the percentage
of busy time-space spans over the total time-space span, which equals the sum
of all busy and idle time-space spans. Let n, k, τ be the number of tasks
(instructions), the number of pipeline stages, and the clock period of a linear
pipeline, respectively. The pipeline efficiency is defined by
η = n / (k + n − 1)
Note that η → 1 as n → ∞. This implies that the larger the number of
tasks flowing through the pipeline, the better its efficiency. Moreover, we
see that η = Sk/k. This provides another view of the efficiency of a linear
pipeline as the ratio of its actual speedup to the ideal speedup k. In the steady
state of a pipeline, where n >> k, the efficiency η approaches 1.
However, this ideal case may not hold all the time because of program
branches and interrupts, data dependency, and other reasons.
Throughput: The number of results (tasks) that can be completed by a
pipeline per unit time is called its throughput. This rate reflects the computing
power of a pipeline. In terms of efficiency η and clock period τ of a linear
pipeline, we define the throughput as follows:
w = n / [kτ + (n − 1)τ] = η/τ
where n equals the total number of tasks being processed during an
observation period kτ + (n − 1)τ. In the ideal case, w = 1/τ = f
when η → 1. This means that the maximum throughput of a linear pipeline is
equal to its frequency, which corresponds to one output result per clock period.
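The two performance measures above can be sketched as small helpers (illustrative only; τ is taken in arbitrary time units):

```python
# Efficiency and throughput of a k-stage linear pipeline with clock period tau:
#   eta = n / (k + n - 1)          (busy fraction of the time-space span)
#   w   = n / ((k + n - 1) * tau)  (= eta / tau = eta * f)
def efficiency(k, n):
    return n / (k + n - 1)

def throughput(k, n, tau):
    return efficiency(k, n) / tau

# As n -> infinity, eta -> 1 and w -> f = 1/tau (one result per clock period).
```

Note that efficiency(k, n) equals speedup divided by k, matching η = Sk/k above.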
According to the levels of processing, pipeline processors can be classified into
the following classes: arithmetic, instruction, processor, unifunctional vs.
multifunctional, static vs. dynamic, and scalar vs. vector pipelines.
Reservation Table in linear pipelining:
The utilization pattern of successive stages in a synchronous pipeline is
specified by a reservation table. The table is essentially a space-time diagram
depicting the precedence relationships in using the pipeline stages. For a
k-stage linear pipeline, k clock cycles are needed for a task to flow through
the pipeline.
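For a linear pipeline the reservation table is simply a diagonal staircase, since stage Si is used exactly at clock i. A small illustrative sketch:

```python
# Reservation table of a k-stage linear pipeline: stage S_i is used exactly
# at clock cycle i, so the marks ("X") form a diagonal staircase.
def linear_reservation_table(k):
    return [["X" if t == i else "." for t in range(k)] for i in range(k)]

for i, row in enumerate(linear_reservation_table(4), start=1):
    print(f"S{i}", *row)
# S1 X . . .
# S2 . X . .
# S3 . . X .
# S4 . . . X
```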
Reservation tables in non-linear pipelining:
Reservation tables for a dynamic pipeline become more complex and
interesting because a non-linear pattern is followed. For a given non-linear
pipeline configuration, multiple reservation tables can be generated; each
reservation table represents the evaluation of a different function. Each
reservation table displays the time-space flow of data through the pipeline
for one function evaluation. Different functions may follow different paths
through the reservation table.
Reservation table for function ‘X’ (3 stages S1–S3, 8 clock cycles):

      1  2  3  4  5  6  7  8
S1    X  .  .  .  .  X  .  X
S2    .  X  .  X  .  .  .  .
S3    .  .  X  .  .  .  X  .

Processing sequence: S1 → S2 → S3 → S2 → S1 → S3 → S1
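As an illustrative check (assuming the standard three-stage reservation table for function X, with S1 used at clocks 1, 6 and 8, S2 at clocks 2 and 4, and S3 at clocks 3 and 7), the forbidden latencies can be computed as the distances between any two marks in the same row:

```python
from itertools import combinations

# Reservation table for function X: stage name -> clock cycles where it is used.
table = {
    "S1": [1, 6, 8],
    "S2": [2, 4],
    "S3": [3, 7],
}

# A latency equal to the distance between two marks in the same row would
# make two initiations claim that stage in the same cycle -> a collision.
forbidden = set()
for marks in table.values():
    for a, b in combinations(marks, 2):
        forbidden.add(abs(a - b))

print(sorted(forbidden))  # [2, 4, 5, 7]
```

This reproduces the forbidden-latency set {2, 4, 5, 7} for function X discussed below.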
Latency Analysis:
• Latency: The number of time units (clock cycles) between two initiations of a
pipeline is the latency between them.
• A latency value k means that two initiations are separated by k clock cycles.
• Any attempt by two or more initiations to use the same pipeline stage at the
same time will cause a collision.
• A collision implies resource conflicts between two initiations in a pipeline.
Collision with scheduling latency 2:
• Latencies that cause collision are called forbidden latencies.
• Forbidden latencies for function X are 2,4,5,7
• Latencies 1,3,6 do not cause collision.
• The maximum forbidden latency m satisfies m ≤ n − 1, where n is the
number of columns in the reservation table.
• All latencies greater than m do not cause collisions.
• Permissible latency p lies in the range:
– 1 ≤ p ≤ m − 1
– The value of p should be as small as possible
– Permissible latency p = 1 corresponds to the ideal case; it can
be achieved by a static pipeline.
Non Linear Pipeline:
Collision Vectors:
• Combined set of permissible and forbidden latencies.
• An m-bit binary vector
C = (Cm Cm−1 … C2 C1)
• The value of Ci = 1 if latency i causes a collision; Ci = 0 if latency i
is permissible.
• Cm = 1 always; it corresponds to the maximum forbidden latency.
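Building the collision vector from a set of forbidden latencies can be sketched as follows (using function X's forbidden set {2, 4, 5, 7} as the example):

```python
# Collision vector C = (C_m ... C_1): bit C_i = 1 iff latency i is forbidden.
# m is the maximum forbidden latency, so C_m is always 1.
def collision_vector(forbidden):
    m = max(forbidden)
    return "".join("1" if i in forbidden else "0" for i in range(m, 0, -1))

print(collision_vector({2, 4, 5, 7}))  # 1011010
```

Reading the result right to left: C1 = 0, C2 = 1, C3 = 0, C4 = 1, C5 = 1, C6 = 0, C7 = 1, so latencies 1, 3 and 6 are permissible.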
State Diagrams
• State diagrams can be constructed to specify the permissible
transitions among successive initiations.
• The collision vector, corresponding to the initial state of pipeline
at time 1, is called the initial collision vector.
• The next state of the pipeline at time t+p can be obtained by
using an m-bit right-shift register
• Initial CV is loaded into the register.
• The register is then shifted to the right
– When a 0 emerges from the right end after p shifts, p is a
permissible latency
– When a 1 emerges, the corresponding latency should be
forbidden
• Logical 0 enters from the left end of the shift register.
• The next state after p shifts is obtained by bitwise-ORing the
initial CV with the shifted register contents.
• This bitwise-ORing of the shifted contents is meant to prevent
collisions from the future initiations starting at time t+1 and
onward.
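The shift-and-OR transition described above can be sketched as follows, with states written as bit strings Cm…C1 and function X's initial collision vector 1011010 used as the example:

```python
# One state transition of the collision-vector shift register:
# shift the state right by p bits (zeros enter from the left), then OR
# the shifted contents with the initial collision vector.
def next_state(state, initial_cv, p):
    m = len(initial_cv)
    if p > m:                        # latencies greater than m never collide
        return initial_cv
    if state[m - p] == "1":          # the bit emerging after p shifts is C_p
        raise ValueError(f"latency {p} is forbidden in this state")
    shifted = "0" * p + state[:m - p]
    return "".join("1" if a == "1" or b == "1" else "0"
                   for a, b in zip(shifted, initial_cv))

cv = "1011010"                       # initial CV of function X
print(next_state(cv, cv, 1))         # 1111111
print(next_state(cv, cv, 3))         # 1011011
```

From the initial state, latency 1 leads to the all-ones state 1111111 (no further latency up to m is permissible from there), while latency 3 leads to state 1011011.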
Latency Cycles
• Simple Cycles : Latency cycle in which each state appears only once.
• Greedy Cycles : latency cycles whose edges are all made with
minimum latencies from their respective starting
states.
• MAL : minimum average latency
– At least one of the greedy cycles will lead to MAL.
Collision-free scheduling
• Finding Greedy cycles from the set of Simple
cycles.
• The Greedy cycle yielding the MAL is the final
Choice.
Optimization technique:
• Insertion of Delay stages
– Modification of reservation table
– New CV
– Improved state diagram
• To yield an optimal latency cycle
Bounds on MAL
• MAL is lower-bounded by the maximum number of
checkmarks in any row of the reservation table.
• MAL is lower than or equal to the average latency of
any greedy cycle in the state diagram.
• The average latency of any greedy cycle is upper-bounded
by the number of 1's in the initial CV plus 1.
• Optimal latency cycle is selected from one of the lowest
greedy cycles.
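Collision-free scheduling can be sketched as a brute-force search: build the state diagram from the initial collision vector, enumerate the simple cycles, and take the minimum average latency. This is an illustrative implementation (not from the text); for function X with initial CV 1011010 it finds MAL = 3.

```python
# Build outgoing transitions of a state: one edge per permissible latency p,
# plus an edge for latency m+1, which always returns to the initial CV
# (all latencies greater than m are collision-free).
def transitions(state, cv, m):
    out = {}
    for p in range(1, m + 1):
        if state[m - p] == "0":                     # latency p is permissible
            shifted = "0" * p + state[:m - p]
            out[p] = "".join("1" if a == "1" or b == "1" else "0"
                             for a, b in zip(shifted, cv))
    out[m + 1] = cv
    return out

# Minimum average latency: DFS over simple paths from the initial CV;
# whenever an edge closes a cycle, record the cycle's average latency.
def mal(cv):
    m = len(cv)
    best = [float("inf")]

    def dfs(state, path, lats):
        for p, nxt in transitions(state, cv, m).items():
            if nxt in path:                          # closed a simple cycle
                i = path.index(nxt)
                cyc = lats[i:] + [p]
                best[0] = min(best[0], sum(cyc) / len(cyc))
            else:
                dfs(nxt, path + [nxt], lats + [p])

    dfs(cv, [cv], [])
    return best[0]

print(mal("1011010"))  # 3.0
```

The winning cycle here is the simple cycle (3): taking latency 3 from state 1011011 returns to the same state forever, giving an average latency of 3.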
Instruction Pipeline Design:
A stream of instructions can be executed by a pipeline in an overlapped manner.
A typical instruction execution consists of a sequence of operations, including:
(1) Instruction fetch
(2) Decode
(3) Operand fetch
(4) Execute
(5) Write back phases
Pipeline instruction processing:
A typical instruction pipeline has seven stages as depicted below in figures;
· Fetch stage (F) fetches instructions from a cache memory.
· Decode stage (D) decodes the instruction in order to find the function to be
performed and identifies the resources needed.
· Issue stage (I) reserves resources. Resources include GPRs, buses and
functional units.
· The instructions are executed in one or several execute stages (E).
· Write-back stage (WB) is used to write results into the registers.
· Memory load and store (L/S) operations are treated as part of execution.
· Floating-point add and multiply operations take four execution clock cycles.
· In many RISC processors, fewer cycles are needed.
Idle cycles occur when instruction issues are blocked due to resource
conflicts, for example before data Y and Z are loaded in. The store of the sum
to memory location X must wait three cycles for the add to finish, due to flow
dependence.
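The flow-dependence stall can be sketched with a toy in-order issue model. The instruction names and latencies below are assumptions chosen for illustration (2-cycle loads, a 3-cycle add), not the latencies of any particular processor:

```python
# Toy in-order issue model: an instruction issues one cycle after its
# predecessor at the earliest, but must also wait until all of its source
# operands are ready. Each result becomes ready `latency` cycles after issue.
def schedule(instrs):
    ready = {}                       # value name -> cycle its result is ready
    clock = 0
    issue_at = {}
    for name, dests, srcs, latency in instrs:
        clock = max([clock + 1] + [ready[s] for s in srcs])
        issue_at[name] = clock
        for d in dests:
            ready[d] = clock + latency
    return issue_at

# X = Y + Z with assumed latencies: load = 2 cycles, add = 3, store = 1.
program = [
    ("load Y",  ["Y"], [],         2),
    ("load Z",  ["Z"], [],         2),
    ("add",     ["S"], ["Y", "Z"], 3),
    ("store X", [],    ["S"],      1),
]
times = schedule(program)
```

With these assumed latencies the add issues at cycle 4 and the store cannot issue until cycle 7, i.e. it waits three cycles for the add to finish, matching the flow-dependence stall described above.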