Chapter One Introduction to Pipelined Processors

Principle of Designing Pipeline Processors (Design Problems of Pipeline Processors)

Data Buffering and Busing Structures

Speeding up of pipeline segments
• The processing speeds of pipeline segments are usually unequal.
• Consider a three-segment pipeline S1, S2, S3 with segment delays T1, T2, T3:
    S1 (T1) -> S2 (T2) -> S3 (T3)
• If T1 = T3 = T and T2 = 3T, S2 becomes the bottleneck and we need to remove it.
• One method is to subdivide the bottleneck. Two divisions are possible:
  – First method: S2 is split into two subsegments with delays T and 2T, between S1 (T) and S3 (T).
  – Second method: S2 is split into three subsegments, each with delay T, between S1 (T) and S3 (T).
• If the bottleneck is not subdivisible, we can duplicate S2 in parallel: three copies of S2 (3T each) are placed between S1 (T) and S3 (T).
• Control and synchronization are more complex with parallel segments.

Data Buffering
• Instruction and data buffering provides a continuous flow to the pipeline units.
• Example: 4X TI ASC
  – The system uses a memory buffer unit (MBU), which supplies the arithmetic unit with a continuous stream of operands and stores results in memory.
  – The MBU has three double buffers X, Y and Z (one octet per buffer): X and Y for input and Z for output.
  – This provides pipeline processing at a high rate and alleviates the bandwidth mismatch between memory and the arithmetic pipeline.

Busing Structures
• Problem: Ideally, the subfunctions in a pipeline should be independent; otherwise the pipeline must be halted until the dependency is removed.
• Solution: An efficient internal busing structure.
• Example: TI ASC
  – In the TI ASC, once an instruction dependency is recognized, update capability is incorporated by transferring the contents of the Z buffer to the X or Y buffer.
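Returning to the segment speed-up example earlier in this section, the effect of each remedy can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `initiation_interval` and the (delay, copies) representation are hypothetical, delays are measured in units of T, and parallel copies of a segment are assumed to be fed round-robin with ideal control.

```python
# Each segment is (delay, copies); delays are in units of T (T = 1 here).
# Assumption: identical copies of a segment are fed round-robin with ideal
# control, so a segment with c copies accepts a new task every delay / c.

def initiation_interval(stages):
    """Steady-state time between successive results, set by the bottleneck."""
    return max(delay / copies for delay, copies in stages)

original = [(1, 1), (3, 1), (1, 1)]                      # S1, S2 (3T), S3
first_method = [(1, 1), (1, 1), (2, 1), (1, 1)]          # S2 split into T and 2T
second_method = [(1, 1), (1, 1), (1, 1), (1, 1), (1, 1)] # S2 split into T, T, T
replicated = [(1, 1), (3, 3), (1, 1)]                    # three copies of S2

for name, stages in [
    ("original", original),
    ("first method", first_method),
    ("second method", second_method),
    ("replicated S2", replicated),
]:
    print(name, initiation_interval(stages))
```

Under these assumptions the interval between results is 3T for the original pipeline, 2T for the first method, and T for both the second method and the replicated S2.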
Internal Data Forwarding and Register Tagging
• Internal forwarding: replacing unnecessary memory accesses by register-to-register transfers.
• Register tagging: the use of tagged registers for exploiting concurrent activities among multiple ALUs.

Internal Forwarding
• Memory access is slower than register-to-register operations.
• Performance can be enhanced by eliminating unnecessary memory accesses.
• This concept can be explored in 3 directions:
  1. Store-Load Forwarding
  2. Load-Load Forwarding
  3. Store-Store Forwarding
[Figures: Store-Load Forwarding; Load-Load Forwarding; Store-Store Forwarding]

Register Tagging
Example: IBM Model 91 Floating Point Execution Unit
• The floating point execution unit consists of:
  – Data registers
  – Transfer paths
  – Floating point adder unit
  – Multiply-divide unit
  – Reservation stations
  – Common data bus
• There are 3 reservation stations for the adder, named A1, A2 and A3, and 2 for the multiplier, named M1 and M2.
• Each station has source and sink registers and their tag and control fields.
• The stations hold operands for the next execution.
• 3 store data buffers (SDBs) and 4 floating point registers (FLRs) are tagged.
• Busy bits in the FLRs indicate the dependence of instructions in subsequent execution.
• The Common Data Bus (CDB) is used to transfer operands.
• There are 11 units that supply information to the CDB: 6 floating point buffers (FLBs), 3 adders and 2 multiply/divide units.
• Tags for these stations are:

    Unit   Tag     Unit   Tag
    FLB1   0001    ADD1   1010
    FLB2   0010    ADD2   1011
    FLB3   0011    ADD3   1100
    FLB4   0100    M1     1000
    FLB5   0101    M2     1001
    FLB6   0110

• Internal forwarding can be achieved with the tagging scheme on the CDB.
• Example:
  – Let F refer to an FLR and FLBi stand for the ith FLB, with contents (F) and (FLBi).
  – Consider the instruction sequence:
      ADD F,FLB1    F ← (F) + (FLB1)
      MPY F,FLB2    F ← (F) × (FLB2)
• During addition:
  – The busy bit of F is set to 1.
  – The contents of F and FLB1 are sent to adder station A1.
  – The tag of F is set to 1010 (tag of the adder).
[Figure: Model 91 floating point unit block diagram (storage bus, instruction unit, floating point buffers FLB 1-6, floating point operand stack FLOS, decoder, reservation stations with sink/source tag and control fields, adder, multiplier, common data bus, store data buffers SDB 1-3). F has busy bit = 1 and tag = 1010; the adder station is loaded from F and FLB1 (tag 0001).]
• Meanwhile, the decode of MPY reveals that F is busy, so:
  – The current tag of F, 1010 (tag of the adder), is placed in M1, so that M1 waits for the adder's result.
  – F changes its tag to 1000 (tag of the multiplier).
  – The content of FLB2 is sent to M1.
[Figure: the same block diagram after MPY is decoded. F has busy bit = 1 and tag = 1000; the multiplier station is loaded from F and FLB2 (tag 0010).]
• When the addition is done, the CDB finds that the result should be sent to M1.
• The multiplication proceeds once both operands are available.

Hazard Detection and Resolution
• Hazards are caused by resource usage conflicts among various instructions.
• They are triggered by inter-instruction dependencies.
Terminologies:
• Resource objects: the set of working registers, memory locations and special flags.
• Data objects: the contents of resource objects.
• Each instruction can be considered as a mapping from a set of data objects to a set of data objects.
• Domain D(I): the set of resource objects whose data objects may affect the execution of instruction I.
• Range R(I): the set of resource objects whose data objects may be modified by the execution of instruction I.
• An instruction reads from its domain and writes to its range.
• Consider the execution of instructions I and J, where J appears immediately after I.
• There are 3 types of data-dependent hazards:
  1. RAW (Read After Write)
  2. WAW (Write After Write)
  3. WAR (Write After Read)

RAW (Read After Write)
• The necessary condition for this hazard is R(I) ∩ D(J) ≠ ∅
• Example:
    I1 : LOAD r1,a
    I2 : ADD r2,r1
• I2 cannot be correctly executed until r1 is loaded.
• Thus I2 is RAW dependent on I1.

WAW (Write After Write)
• The necessary condition is R(I) ∩ R(J) ≠ ∅
• Example:
    I1 : MUL r1,r2
    I2 : ADD r1,r4
• Here I1 and I2 write to the same destination, and hence they are said to be WAW dependent.

WAR (Write After Read)
• The necessary condition is D(I) ∩ R(J) ≠ ∅
• Example:
    I1 : MUL r1,r2
    I2 : ADD r2,r3
• Here I2 has r2 as its destination while I1 uses it as a source, and hence they are WAR dependent.

Hazard Detection and Resolution
• Hazards can be detected in the fetch stage by comparing the domain and range of the incoming instruction with those of the instructions already in the pipeline.
• Once a hazard is detected, there are two methods:
  1. Generate a warning signal to prevent the hazard.
  2. Allow the incoming instruction through the pipe and distribute detection to all pipeline stages.
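The three necessary conditions above are plain set intersections, so the domain/range comparison can be sketched in a few lines. This is a minimal illustration, not a model of any particular machine: the `Instr` class and `hazards` function are hypothetical names, and the domains and ranges are written out by hand under the assumption that the first operand of a two-operand arithmetic instruction is both a source and the destination.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    text: str
    domain: frozenset  # D(I): resource objects the instruction reads
    rng: frozenset     # R(I): resource objects the instruction writes

def hazards(i, j):
    """Hazard types whose necessary condition holds when J follows I."""
    found = set()
    if i.rng & j.domain:   # R(I) ∩ D(J) ≠ ∅
        found.add("RAW")
    if i.rng & j.rng:      # R(I) ∩ R(J) ≠ ∅
        found.add("WAW")
    if i.domain & j.rng:   # D(I) ∩ R(J) ≠ ∅
        found.add("WAR")
    return found

# Assumed domains/ranges: the first operand is read and written.
load = Instr("LOAD r1,a", frozenset({"a"}), frozenset({"r1"}))
add = Instr("ADD r2,r1", frozenset({"r1", "r2"}), frozenset({"r2"}))
mul = Instr("MUL r1,r2", frozenset({"r1", "r2"}), frozenset({"r1"}))
add2 = Instr("ADD r2,r3", frozenset({"r2", "r3"}), frozenset({"r2"}))

print(hazards(load, add))  # {'RAW'}
print(hazards(mul, add2))  # {'WAR'}
```

Because the conditions are only necessary, a non-empty intersection flags a possible hazard; whether it actually occurs depends on the timing of the two instructions in the pipeline.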
