Chapter One
Introduction to Pipelined Processors

Handler’s Classification
• Based on the level of processing, pipelined processors can be classified as:
  1. Arithmetic Pipelining
  2. Instruction Pipelining
  3. Processor Pipelining

Arithmetic Pipelining
• The arithmetic logic units of a computer can be segmented for pipelined operation in various data formats.
• Example: Star 100
  – It has two pipelines in which arithmetic operations are performed.
  – First: floating-point adder and multiplier
  – Second: multifunctional
    • All scalar instructions
    • Floating-point adder, multiplier and divider
  – Both pipelines are 64-bit and can be split into four 32-bit pipelines at the cost of precision.

[Figure: Star 100 architecture]

Instruction Pipelining
• The execution of a stream of instructions can be pipelined by overlapping the execution of the current instruction with the fetch, decode and operand fetch of subsequent instructions.
• It is also called instruction look-ahead.
• Example: 8086
  – The organization of the 8086 into a separate bus interface unit (BIU) and execution unit (EU) allows the fetch and execute cycles to overlap. This is called pipelining.

Processor Pipelining
• This refers to the processing of the same data stream by a cascade of processors, each of which performs a specific task.
• The data stream passes through the first processor, and the results are stored in a memory block that is also accessible by the second processor.
• The second processor then passes the refined results to the third, and so on.

[Figure: Processor pipelining]

Li and Ramamoorthy's Classification
• According to pipeline configurations and control strategies, Li and Ramamoorthy classify pipelines under three schemes:
  – Unifunction vs. Multifunction Pipelines
  – Static vs. Dynamic Pipelines
  – Scalar vs. Vector Pipelines

Unifunction vs. Multifunction Pipelines

Unifunctional Pipelines
• A pipeline unit with a fixed and dedicated function is called unifunctional.
• Example: Cray-1 (supercomputer, 1976)
• It has 12 unifunctional pipelines, described in four groups:
  – Address Functional Units
    • Address Add Unit
    • Address Multiply Unit
  – Scalar Functional Units
    • Scalar Add Unit
    • Scalar Shift Unit
    • Scalar Logical Unit
    • Population/Leading Zero Count Unit
  – Vector Functional Units
    • Vector Add Unit
    • Vector Shift Unit
    • Vector Logical Unit
  – Floating Point Functional Units
    • Floating Point Add Unit
    • Floating Point Multiply Unit
    • Reciprocal Approximation Unit

[Figure: Cray-1 architecture]

Multifunctional Pipelines
• A multifunction pipe may perform different functions, either at different times or at the same time, by interconnecting different subsets of stages in the pipeline.
• Example: 4X-TI-ASC (supercomputer, 1973)

4X-TI-ASC
• It has four multifunction pipeline processors, each of which is reconfigurable for a variety of arithmetic or logic operations at different times.
• Its central processor comprises nine units:
  – one instruction processing unit (IPU)
  – four memory buffer units
  – four arithmetic units
• Thus it provides four parallel execution pipelines below the IPU.
• Any mixture of scalar and vector instructions can be executed simultaneously in the four pipes.

[Figure: Architecture overview of 4X-TI-ASC]

Static vs. Dynamic Pipeline

Static Pipeline
• It may assume only one functional configuration at a time.
• It can be either unifunctional or multifunctional.
• Static pipelines are preferred when instructions of the same type are to be executed continuously.
• A unifunction pipe must be static.
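
The preference for runs of same-type instructions can be illustrated with a rough cycle count. The sketch below is a toy model, not a description of any real machine: it assumes a linear pipeline that accepts one operation per cycle while the function stays the same, and it assumes that a change of function must wait until the pipeline is empty, with the reconfiguration time itself ignored. The name `static_pipeline_cycles` and its parameters are hypothetical.

```python
def static_pipeline_cycles(ops, num_stages):
    """Cycles needed to run `ops` on a static linear pipeline.

    Toy model (assumption): one operation enters per cycle while the
    function is unchanged; each change of function waits for the
    pipeline to empty, costing num_stages - 1 idle cycles.
    Reconfiguration time is ignored.
    """
    if not ops:
        return 0
    # Count the points where consecutive operations differ in function.
    switches = sum(1 for prev, cur in zip(ops, ops[1:]) if prev != cur)
    # Fill the pipeline once, then one result per cycle.
    fill_and_stream = num_stages + len(ops) - 1
    return fill_and_stream + switches * (num_stages - 1)


grouped = list("AAAABBBB")      # same-type operations kept together
interleaved = list("ABABABAB")  # function changes on every operation

print(static_pipeline_cycles(grouped, 4))
print(static_pipeline_cycles(interleaved, 4))
```

Under these assumptions the grouped stream pays for a single drain, while the interleaved stream pays for one drain per operation, which is the situation a static pipeline handles poorly.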

Dynamic Pipeline
• It permits several functional configurations to exist simultaneously.
• A dynamic pipeline must be multifunctional.
• The dynamic configuration requires more elaborate control and sequencing mechanisms than static pipelining.

Scalar vs. Vector Pipeline

Scalar Pipeline
• It processes a sequence of scalar operands under the control of a DO loop.
• Instructions in a small DO loop are often prefetched into the instruction buffer.
• The required scalar operands are moved into a data cache to supply the pipeline with operands continuously.
• Example: IBM System/360 Model 91

IBM System/360 Model 91
• In this computer, buffering plays a major role.
• Instruction fetch buffering:
  – Provides the capacity to hold program loops of meaningful size.
  – Upon encountering a loop that fits, the buffer locks onto the loop, and subsequent branching requires less time.
• Operand fetch buffering:
  – Provides a queue into which storage can dump operands and from which execution units can fetch them.
  – This improves operand fetching for storage-to-register and storage-to-storage instruction types.

[Figure: Architecture overview of IBM System/360 Model 91]

Vector Pipelines
• They are specially designed to handle vector instructions over vector operands.
• Computers having vector instructions are called vector processors.
• The design of a vector pipeline is expanded from that of a scalar pipeline.
• The handling of vector operands in vector pipelines is under firmware and hardware control.
• Example: Cray-1

Linear Pipeline (Static and Unifunctional)
• In a linear pipeline, data flows from one stage to the next, every stage is used once in a computation, and the pipeline performs one functional evaluation.

Non-linear Pipeline
• In a floating-point adder, stages (2) and (4) each need a shift register.
• We can use the same shift register for both, and then there will be only 3 stages.
• This requires a feedback connection from the third stage to the second stage.
• Further, the same pipeline can be used to perform fixed-point addition.
• A pipeline with feed-forward and/or feedback connections is called non-linear.

Example: 3-stage non-linear pipeline

[Figure: 3-stage non-linear pipeline with stages Sa, Sb and Sc, one input, and two outputs A and B]

• It has 3 stages (Sa, Sb and Sc) and latches.
• Multiplexers (crossed circles) can take more than one input and pass one of the inputs to the output.
• The outputs of the stages are tapped and used for feedback and feed-forward.
• The above pipeline can perform a variety of functions.
• Each functional evaluation can be represented by a particular sequence of stage usage.
• Some examples are:
  1. Sa, Sb, Sc
  2. Sa, Sb, Sc, Sb, Sc, Sa
  3. Sa, Sc, Sb, Sa, Sb, Sc

Reservation Table
• Each functional evaluation can be represented using a diagram called a reservation table (RT).
• It is the space-time diagram of a pipeline corresponding to one functional evaluation.
• X axis: time units
• Y axis: stages
• For the sequence Sa, Sb, Sc, Sb, Sc, Sa, called function A, we have:

  Stage \ Time   0   1   2   3   4   5
  Sa             A                   A
  Sb                 A       A
  Sc                     A       A

• For the sequence Sa, Sc, Sb, Sa, Sb, Sc, called function B, we have:

  Stage \ Time   0   1   2   3   4   5
  Sa             B           B
  Sb                     B       B
  Sc                 B               B

Reservation Table
• After a function is started, its stages need to be reserved in the corresponding time units.
• Each function supported by a multifunction pipeline is represented by a different RT.
• The time taken for a function evaluation, in units of the clock period, is the compute time (for A and B, it is 6).
• Multiple marks in the same row => a stage is used more than once.
• Multiple marks in the same column => more than one stage is used at the same time.

Multifunction Pipelines
• The hardware of a multifunction pipeline should be reconfigurable.
• A multifunction pipeline can be static or dynamic.
• Static:
  – Initially configured for one functional evaluation.
  – For another function, the pipeline needs to be drained and reconfigured.
  – Inputs of two different functions cannot be in the pipeline at the same time.
• Dynamic:
  – Can perform different functional evaluations at the same time.
  – It is difficult to control, as we need to be sure that there is no conflict in the usage of stages.
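
The reservation tables for functions A and B, and the kind of conflict a dynamic pipeline must avoid, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each step of a function is assumed to occupy exactly one stage for one clock period, and the helper names `reservation_table`, `render` and `conflicts` are hypothetical, not part of the original material.

```python
def reservation_table(sequence, stages=("Sa", "Sb", "Sc")):
    """Map each stage to the set of clock periods in which it is used."""
    table = {stage: set() for stage in stages}
    for t, stage in enumerate(sequence):
        table[stage].add(t)
    return table


def render(table, label):
    """Return the table as text: rows are stages, columns are time units."""
    width = 1 + max(t for times in table.values() for t in times)
    lines = ["Stage " + " ".join(str(t) for t in range(width))]
    for stage, times in table.items():
        cells = [label if t in times else "." for t in range(width)]
        lines.append(f"{stage:<5} " + " ".join(cells))
    return "\n".join(lines)


def conflicts(first, second, latency):
    """List (stage, time) pairs where both evaluations need the same stage,
    if `second` is started `latency` clock periods after `first`."""
    return sorted(
        (stage, t)
        for stage, times in first.items()
        for t in times
        if (t - latency) in second.get(stage, set())
    )


FUNC_A = ["Sa", "Sb", "Sc", "Sb", "Sc", "Sa"]
FUNC_B = ["Sa", "Sc", "Sb", "Sa", "Sb", "Sc"]

rt_a = reservation_table(FUNC_A)
rt_b = reservation_table(FUNC_B)

print(render(rt_a, "A"))
print(render(rt_b, "B"))
print("compute time:", len(FUNC_A), len(FUNC_B))

# Starting B one clock period after A makes both claim Sb and Sc together.
print(conflicts(rt_a, rt_b, 1))
# Starting B four clock periods after A leaves no stage claimed twice.
print(conflicts(rt_a, rt_b, 4))
```

A dynamic pipeline controller would need a check of this kind before admitting a new evaluation, whereas a static pipeline sidesteps the question by draining before it switches function.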