UNIT-4

Characteristics of Pipeline Processors

Pipelining refers to the temporal overlapping of processing. Pipelines are essentially assembly lines in computing that can be used for instruction processing. A basic pipeline processes a sequence of tasks or instructions according to the following principle of operation: each task is subdivided into a number of successive subtasks. The processing of each single instruction can be broken down into four subtasks:
1. Instruction Fetch
2. Instruction Decode
3. Execute
4. Write Back

It is assumed that there is a pipeline stage associated with each subtask and that the same amount of time is available in each stage for performing the required subtask. All the pipeline stages operate like an assembly line, that is, each receives its input from the previous stage and delivers its output to the next stage. We also assume that the basic pipeline operates clocked, in other words synchronously. This means that each stage accepts a new input at the start of a clock cycle, each stage has a single clock cycle available for performing the required operation, and each stage passes its result to the next stage by the beginning of the subsequent clock cycle.

Linear Pipeline Processors: A linear pipeline processor is a cascade of processing stages which are linearly connected to perform a fixed function over a stream of data flowing from one end to the other. In modern computers, linear pipelines are applied for instruction execution, arithmetic computation, and memory-access operations. A linear pipeline processor is constructed with k processing stages. External inputs are fed into the pipeline at the first stage S1. The processed results are passed from stage Si to stage Si+1 for all i = 1, 2, …, k−1. The final result emerges from the pipeline at the last stage Sk. Depending on the control of data flow along the pipeline, linear pipelines are modeled in two categories: asynchronous and synchronous.
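Before the two flow-control models are described, the clocked four-stage pipeline above can be illustrated with a minimal simulation. This is a sketch for illustration only; the instruction names and the space-time chart layout are assumptions, not part of the original material.

```python
# A minimal sketch of a clocked (synchronous) 4-stage linear pipeline.
# Stage names follow the four subtasks in the text; instructions are
# represented as plain strings, an assumption made for illustration.

STAGES = ["Fetch", "Decode", "Execute", "WriteBack"]

def simulate(instructions):
    """Advance every instruction one stage per clock cycle and return a
    space-time chart: chart[cycle] maps occupied stage -> instruction."""
    k = len(STAGES)
    n = len(instructions)
    chart = []
    for cycle in range(k + n - 1):          # k + (n - 1) cycles in total
        row = {}
        for s in range(k):
            i = cycle - s                   # index of instruction in stage s
            if 0 <= i < n:
                row[STAGES[s]] = instructions[i]
        chart.append(row)
    return chart

chart = simulate(["I1", "I2", "I3"])
for t, row in enumerate(chart, start=1):
    print(t, row)
```

Note how three instructions finish in 4 + (3 − 1) = 6 cycles rather than 3 × 4 = 12, which is exactly the overlap the speedup formulas later in this unit quantify.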
Asynchronous Model: Data flow between adjacent stages in an asynchronous pipeline is controlled by a handshaking protocol. When stage Si is ready to transmit, it sends a ready signal to Si+1. After stage Si+1 receives the incoming data, it returns an acknowledge signal to Si.

Synchronous Model: Clocked latches are used to interface between stages; on the arrival of a clock pulse, all latches transfer data to the next stage simultaneously.

Non-Linear Pipeline Processor: (Figure: a three-stage pipeline.)

Clock Period: The logic circuitry in each stage Si has a time delay denoted by τi. Let τl be the time delay of each interface latch. The clock period of a linear pipeline is defined by

τ = max{τi} + τl

The reciprocal of the clock period is called the frequency, f = 1/τ. Ideally, a linear pipeline with k stages can process n tasks in

Tk = [k + (n − 1)]τ

where k cycles are used to fill up the pipeline, i.e., to complete execution of the first task, and n − 1 cycles are needed to complete the remaining n − 1 tasks. The same number of tasks (operand pairs) can be executed in a nonpipelined processor with an equivalent function in T1 = n·k·τ time.

Speedup: We define the speedup of a k-stage linear pipeline processor over an equivalent nonpipelined processor as

Sk = T1/Tk = n·k·τ / [k + (n − 1)]τ = n·k / (k + n − 1)

It should be noted that the maximum speedup is Sk → k for n >> k. In other words, the maximum speedup that a linear pipeline can provide is k, where k is the number of stages in the pipe. The maximum speedup is never fully achievable because of data dependencies between instructions, interrupts, and other factors.

Efficiency: The efficiency of a linear pipeline is measured by the percentage of busy time-space spans over the total time-space span, which equals the sum of all busy and idle time-space spans. Let n, k, τ be the number of tasks (instructions), the number of pipeline stages, and the clock period of a linear pipeline, respectively. The pipeline efficiency is defined by

η = n / (k + n − 1)

Note that η → 1 as n → ∞. This implies that the larger the number of tasks flowing through the pipeline, the better its efficiency. Moreover, we realize that η = Sk/k.
This provides another view of the efficiency of a linear pipeline: the ratio of its actual speedup to the ideal speedup k. In the steady state of a pipeline we have n >> k, and the efficiency η should approach 1. However, this ideal case may not hold all the time because of program branches, interrupts, data dependency, and other reasons.

Throughput: The number of results (tasks) that can be completed by a pipeline per unit time is called its throughput. This rate reflects the computing power of a pipeline. In terms of the efficiency η and clock period τ of a linear pipeline, we define the throughput as follows:

w = n / [k + (n − 1)]τ = η·f

where n equals the total number of tasks being processed during an observation period kτ + (n − 1)τ. In the ideal case, w = 1/τ = f when η → 1. This means that the maximum throughput of a linear pipeline is equal to its frequency, which corresponds to one output result per clock period.

According to the levels of processing, pipeline processors can be classified into the following classes: arithmetic, instruction, processor, unifunction vs. multifunction, static vs. dynamic, and scalar vs. vector pipelines.

Reservation Table in Linear Pipelining: The utilization pattern of successive stages in a synchronous pipeline is specified by a reservation table. The table is essentially a space-time diagram depicting the precedence relationship in using the pipeline stages. For a k-stage linear pipeline, k clock cycles are needed for a task to flow through the pipeline.

Reservation Tables in Non-Linear Pipelining: Reservation tables for a dynamic pipeline become more complex and interesting because a non-linear pattern is followed. For a given non-linear pipeline configuration, multiple reservation tables can be generated. Each reservation table represents the evaluation of a different function, and displays the time-space flow of data through the pipeline for one function evaluation. Different functions may follow different paths through the reservation table.
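The speedup, efficiency, and throughput expressions above can be checked with a short numerical sketch. The task count, stage count, and clock period below are illustrative assumptions, not values from the text:

```python
# Sketch of the performance formulas in the text:
#   Tk = (k + n - 1) * tau,  T1 = n * k * tau,
#   Sk = T1 / Tk,  eta = Sk / k = n / (k + n - 1),  w = eta / tau = eta * f

def pipeline_metrics(n, k, tau):
    Tk = (k + n - 1) * tau        # pipelined time for n tasks
    T1 = n * k * tau              # equivalent nonpipelined time
    Sk = T1 / Tk                  # speedup, -> k as n grows
    eta = Sk / k                  # efficiency, -> 1 as n grows
    w = eta / tau                 # throughput, -> f = 1/tau
    return Sk, eta, w

# 64 tasks through a 4-stage pipeline with a 10 ns clock (assumed values)
Sk, eta, w = pipeline_metrics(n=64, k=4, tau=10e-9)
print(f"Sk = {Sk:.3f}, eta = {eta:.3f}, w = {w:.3e} tasks/s")
```

With n = 64 and k = 4 the speedup is 256/67 ≈ 3.82, already close to the ideal k = 4, illustrating Sk → k for n >> k.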
Reservation table for function X (three stages, eight clock cycles; reconstructed here to match the forbidden latencies 2, 4, 5, 7 stated below):

         1    2    3    4    5    6    7    8
   S1    X                        X         X
   S2         X         X
   S3              X                   X

Processing sequence: S1 → S2 → S3 → S2 → S1 → S3 → S1

Latency Analysis:
• Latency: the number of time units (clock cycles) between two initiations of a pipeline is the latency between them.
• A latency value k means that two initiations are separated by k clock cycles.
• Any attempt by two or more initiations to use the same pipeline stage at the same time will cause a collision.
• A collision implies a resource conflict between two initiations in a pipeline.

Collisions occur with scheduling latency 2:
• Latencies that cause collisions are called forbidden latencies.
• The forbidden latencies for function X are 2, 4, 5, and 7.
• Latencies 1, 3, and 6 do not cause collisions.
• The maximum forbidden latency is m, with m ≤ n − 1, where n is the number of columns in the reservation table.
• All latencies greater than m do not cause collisions.
• A permissible latency p lies in the range 1 ≤ p ≤ m − 1.
• The value of p should be as small as possible.
• The permissible latency p = 1 corresponds to the ideal case; it can be achieved by a static pipeline.

Non-Linear Pipeline: Collision Vectors
• The combined set of permissible and forbidden latencies is displayed by an m-bit binary collision vector C = (Cm Cm−1 … C2 C1).
• The value of Ci = 1 if latency i causes a collision; Ci = 0 if latency i is permissible.
• Cm = 1 always; it corresponds to the maximum forbidden latency.

State Diagrams
• State diagrams can be constructed to specify the permissible transitions among successive initiations.
• The collision vector corresponding to the initial state of the pipeline at time 1 is called the initial collision vector.
• The next state of the pipeline at time t + p can be obtained with an m-bit right-shift register:
  – The initial collision vector is loaded into the register.
  – The register is then shifted to the right. When a 0 emerges from the right end after p shifts, p is a permissible latency; when a 1 emerges, the corresponding latency should be forbidden.
  – Logical 0 enters from the left end of the shift register.
• The next state after p shifts is obtained by bitwise-ORing the initial collision vector with the shifted register contents.
• This bitwise-ORing of the shifted contents prevents collisions with future initiations starting at time t + 1 and onward.

Latency Cycles
• Simple cycles: latency cycles in which each state appears only once.
• Greedy cycles: simple cycles whose edges are all made with the minimum latencies from their respective starting states.
• MAL: the minimum average latency. At least one of the greedy cycles will lead to the MAL.

Collision-Free Scheduling
• Find the greedy cycles from the set of simple cycles.
• The greedy cycle yielding the MAL is the final choice.

Optimization Technique
• Insertion of delay stages, which entails a modified reservation table, a new collision vector, and an improved state diagram, to yield an optimal latency cycle.

Bounds on MAL
• The MAL is lower-bounded by the maximum number of checkmarks in any row of the reservation table.
• The MAL is lower than or equal to the average latency of any greedy cycle in the state diagram.
• The average latency of any greedy cycle is upper-bounded by the number of 1's in the initial collision vector plus 1.
• The optimal latency cycle is selected from one of the lowest greedy cycles.

Instruction Pipeline Design: A stream of instructions can be executed by a pipeline in an overlapped manner. A typical instruction execution consists of a sequence of operations, including (1) instruction fetch, (2) decode, (3) operand fetch, (4) execute, and (5) write-back phases.

Pipelined Instruction Processing: A typical instruction pipeline has seven stages, as depicted in the figure:
• The fetch stage (F) fetches instructions from a cache memory.
• The decode stage (D) decodes the instruction in order to find the function to be performed and identifies the resources needed. Resources include general-purpose registers, buses, and functional units.
• The issue stage (I) reserves resources.
• The instructions are executed in one or several execute stages (E).
• The write-back stage (WB) is used to write results into the registers.
• Memory load and store (L/S) operations are treated as part of execution.
• Floating-point add and multiply operations take four execution clock cycles.
• In many RISC processors, fewer cycles are needed.
• Idle cycles appear when instruction issue is blocked by resource conflicts before the data Y and Z are loaded in; the store of the sum to memory location X must wait three cycles for the add to finish, due to flow dependence.
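The latency analysis developed earlier in this unit can be tied together in one worked sketch. The reservation table below is an assumed three-stage, eight-column example chosen so that its forbidden latencies come out to 2, 4, 5, and 7, the function-X values quoted in the text; from it the code derives the collision vector, builds the state diagram with the shift-and-OR rule, and searches the simple cycles for the MAL.

```python
# Sketch: latency analysis for a non-linear pipeline (function X).
# The reservation table is an assumed example whose forbidden
# latencies match the values 2, 4, 5, 7 quoted in the text.
table = {"S1": [1, 6, 8], "S2": [2, 4], "S3": [3, 7]}

# Forbidden latencies: distances between any two marks in the same row.
forbidden = {b - a for marks in table.values()
             for a in marks for b in marks if b > a}
m = max(forbidden)                          # maximum forbidden latency

# Initial collision vector C = (Cm ... C1), with Ci = 1 iff latency i collides.
init = sum(1 << (i - 1) for i in forbidden)
print(f"forbidden = {sorted(forbidden)}, CV = {init:0{m}b}")

def successors(state):
    """Permissible latencies p and the next states they lead to,
    using the right-shift-then-OR rule described in the text."""
    out = {}
    for p in range(1, m + 1):
        if not (state >> (p - 1)) & 1:      # bit C_p is 0: p is permissible
            out[p] = (state >> p) | init
    out[m + 1] = init                       # any latency > m restarts the CV
    return out

def simple_cycles():
    """Enumerate latency cycles reachable from the initial state in which
    no state repeats, as (average latency, latency sequence) pairs."""
    found = []
    def dfs(state, path, lats):
        for p, nxt in successors(state).items():
            if nxt in path:                 # closed a cycle in the diagram
                i = path.index(nxt)
                cyc = lats[i:] + [p]
                found.append((sum(cyc) / len(cyc), cyc))
            else:
                dfs(nxt, path + [nxt], lats + [p])
    dfs(init, [init], [])
    return found

mal, cycle = min(simple_cycles())
print(f"MAL = {mal} from latency cycle {cycle}")
```

For this table the search finds the latency cycle (3), giving MAL = 3. That agrees with the bounds stated earlier: it equals the maximum number of checkmarks in any row (S1 has three), and it is below the number of 1's in the initial collision vector plus 1.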