Chapter One: Introduction to Pipelined Processors

Principles of Designing Pipeline Processors (Design Problems of Pipeline Processors)

Instruction Prefetch and Branch Handling
• The instructions in computer programs can be classified into four types:
– Arithmetic/load operations (60%)
– Store-type instructions (15%)
– Unconditional branch instructions (5%)
– Conditional branch instructions (taken, "yes" – 12%; not taken, "no" – 8%)
• Arithmetic/load operations (60%):
– These operations require one or two operand fetches.
– Different operations require different numbers of pipeline cycles to execute.
• Store-type instructions (15%):
– Require a memory access to store the data.
• Unconditional branch instructions (5%):
– Correspond to an unconditional jump to a new instruction address.
• Conditional branch instructions (yes – 12%, no – 8%):
– The "yes" (taken) path requires calculation of the new branch address.
– The "no" (not-taken) path proceeds to the next sequential instruction.
• Arithmetic/load and store instructions do not alter the execution order of the program.
• Branch instructions and interrupts, by contrast, have damaging effects on the performance of pipelined computers.

Example – The Interrupt System of the Cray-1
• The Cray-1 interrupt system is built around an exchange package.
• When an interrupt occurs, the Cray-1 saves 8 scalar registers, 8 address registers, the program counter, and the monitor flags.
• These are packed into 16 words and swapped with a block whose address is specified by a hardware exchange address register.
• In general, the higher the percentage of branch-type instructions in a program, the slower the program will run on a pipeline processor.
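The save-and-swap behaviour of the exchange package can be illustrated with a small sketch. Only the counts come from the text (8 scalar registers, 8 address registers, plus program counter and monitor flags, packed into 16 words); the exact word layout below — PC and flags riding in the high bits of the address-register words — is a simplifying assumption, not the real Cray-1 format.

```python
# Hypothetical sketch of a Cray-1-style exchange operation.
# Assumption: A registers are 24-bit, so the PC and flags can be
# packed into the unused high bits of two A-register words.

class ProcessorState:
    def __init__(self, S=None, A=None, pc=0, flags=0):
        self.S = S or [0] * 8       # 8 scalar registers (one word each)
        self.A = A or [0] * 8       # 8 address registers (assumed 24-bit)
        self.pc = pc                # program counter
        self.flags = flags          # monitor flags

def pack(state):
    """Pack the processor state into a 16-word exchange package."""
    words = list(state.S) + [a & 0xFFFFFF for a in state.A]
    words[8] |= (state.pc & 0xFFFFFF) << 24     # assumed PC position
    words[9] |= (state.flags & 0xFFFF) << 24    # assumed flags position
    return words

def unpack(words):
    """Rebuild a processor state from a 16-word exchange package."""
    state = ProcessorState(S=words[:8], A=[w & 0xFFFFFF for w in words[8:16]])
    state.pc = (words[8] >> 24) & 0xFFFFFF
    state.flags = (words[9] >> 24) & 0xFFFF
    return state

def exchange(memory, xa, state):
    """Swap the running state with the exchange package stored at
    memory[xa : xa + 16]; xa models the hardware exchange address register."""
    incoming = memory[xa:xa + 16]
    memory[xa:xa + 16] = pack(state)
    return unpack(incoming)
```

On an interrupt, `exchange` saves the interrupted context into the memory block addressed by `xa` and resumes whatever context that block previously held; running it twice restores the original state.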
Effect of Branching on Pipeline Performance
• Consider a linear pipeline of five stages:
Fetch Instruction → Decode → Fetch Operands → Execute → Store Results
• [Space–time diagrams: overlapped execution of instructions I1–I8 without branching, and the same stream when I5 is a branch instruction, forcing the instructions fetched behind it to be flushed.]

Estimation of the Effect of Branching on an n-Segment Instruction Pipeline
• Consider an instruction cycle consisting of n pipeline clock periods.
• Let:
– p = probability that an instruction is a conditional branch (20%)
– q = probability that a conditional branch is successful, i.e., taken (12/20 = 0.6, that is, 60% of the 20%)
• Suppose there are m instructions.
• Then the number of successfully branching instructions = m·p·q (= m × 0.2 × 0.6).
• Each successful branch requires a delay of (n − 1)/n of an instruction cycle to flush the pipeline.
• Thus, the total number of instruction cycles required for m instructions is
  1 + (m − 1)/n + m·p·q·(n − 1)/n
• As m becomes large, the average number of instructions completed per instruction cycle is
  lim_{m→∞} m / [1 + (m − 1)/n + m·p·q·(n − 1)/n] = n / (1 + p·q·(n − 1))
• When p = 0, this measure reduces to n, the ideal value. In reality, it is always less than n.
• Solution?

Multiple Prefetch Buffers
• Three types of buffers can be used to match the instruction fetch rate to the pipeline consumption rate:
1. Sequential buffers: hold in-sequence instructions (for in-sequence pipelining).
2. Target buffers: hold instructions fetched from a branch target (for out-of-sequence pipelining).
• A conditional branch causes both the sequential and the target buffers to fill; once the condition is resolved, the correct buffer is selected and the other is discarded.
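The branch-degradation estimate above can be evaluated numerically; a minimal Python sketch, using the values given in the text (p = 0.2, q = 0.6, n = 5):

```python
# Effect of branching on an n-stage pipeline, with the text's figures:
# p = 0.2 (conditional-branch probability), q = 0.6 (taken probability).

def total_cycles(m, n, p, q):
    """Instruction cycles needed for m instructions: 1 + (m - 1)/n for
    the overlapped stream, plus a flush penalty of (n - 1)/n for each
    of the m*p*q successful branches."""
    return 1 + (m - 1) / n + m * p * q * (n - 1) / n

def throughput(n, p, q):
    """Limit of m / total_cycles(m, ...) as m grows large."""
    return n / (1 + p * q * (n - 1))

print(throughput(5, 0.2, 0.6))   # ≈ 3.38 instructions per instruction cycle
print(throughput(5, 0.0, 0.6))   # 5.0 — the ideal value when p = 0
```

So with 12% of instructions being taken branches, the 5-stage pipeline delivers about 3.38 instructions per instruction cycle instead of the ideal 5.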
3. Loop buffers: hold the sequential instructions contained within a loop.

Data Buffering and Busing Structures

Speeding Up Pipeline Segments
• The processing speeds of pipeline segments are usually unequal.
• Consider three segments S1, S2, S3 with delays T1, T2, T3.
• If T1 = T3 = T and T2 = 3T, then S2 becomes the bottleneck, and we need to remove it. How?
• One method is to subdivide the bottleneck; two subdivisions are possible:
– First method: split S2 into two stages of delays 2T and T, giving the pipeline T, 2T, T, T (the clock period drops from 3T to 2T).
– Second method: split S2 into three stages of delay T each, giving the pipeline T, T, T, T, T (the clock period drops to T).
• If the bottleneck is not subdivisible, we can instead duplicate S2: three copies of S2 (each of delay 3T) operate in parallel between S1 and S3, so a new operand can be initiated every T.
• Control and synchronization are more complex with parallel segments.

Data Buffering
• Instruction and data buffering provide a continuous flow of work to the pipeline units.
• Example: the TI ASC.
• The TI ASC uses a memory buffer unit (MBU), which:
– supplies the arithmetic unit with a continuous stream of operands, and
– stores results back to memory.
• The MBU has three double buffers X, Y, and Z (one octet per buffer): X and Y for input, Z for output.
• This supports pipeline processing at a high rate and alleviates the bandwidth mismatch between memory and the arithmetic pipeline.

Busing Structures
• Problem: ideally, the subfunctions in a pipeline should be independent; otherwise, the pipeline must be halted until the dependency is removed.
• Solution: an efficient internal busing structure.
• Example: in the TI ASC, once an instruction dependency is recognized, an update capability is incorporated by transferring the contents of the Z (result) buffer to the X or Y (input) buffer.
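The stage-timing arithmetic in the bottleneck discussion above can be checked with a short sketch (T normalized to 1; the clock period of a linear pipeline is set by its slowest stage):

```python
# Values from the text: T1 = T3 = T and T2 = 3T, so S2 is the bottleneck.

def clock_period(stage_delays):
    """The pipeline clock must accommodate the slowest stage."""
    return max(stage_delays)

T = 1.0
original     = [T, 3 * T, T]        # S2 unmodified        -> clock 3T
first_split  = [T, 2 * T, T, T]     # S2 -> (2T, T)        -> clock 2T
second_split = [T, T, T, T, T]      # S2 -> (T, T, T)      -> clock T

for stages in (original, first_split, second_split):
    print(clock_period(stages))     # prints 3.0, 2.0, 1.0

# If S2 cannot be subdivided, three parallel copies of S2 (each 3T)
# accept a new initiation every 3T / 3 = T, matching the second split.
print(3 * T / 3)                    # prints 1.0
```

Both remedies reach the same initiation interval T; the duplicated version trades extra hardware and more complex synchronization for leaving S2's internal logic untouched.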