Chapter One
Introduction to Pipelined Processors

Principle of Designing Pipeline Processors
(Design Problems of Pipeline Processors)

Internal Data Forwarding and Register Tagging
• Internal Forwarding: replacing unnecessary memory accesses with register-to-register transfers.
• Register Tagging: using tagged registers to exploit concurrent activities among multiple ALUs.

Internal Forwarding
• Memory access is slower than register-to-register operations.
• Performance can be enhanced by eliminating unnecessary memory accesses.
• This concept can be explored in 3 directions:
1. Store – Load Forwarding: a load that follows a store to the same memory location is replaced by a register-to-register transfer.
2. Load – Load Forwarding: the second of two loads from the same memory location is replaced by a register-to-register transfer.
3. Store – Store Forwarding: the first of two successive stores to the same memory location is eliminated, since the second overwrites it.
[Figures: Store – Load, Load – Load and Store – Store forwarding diagrams, followed by two examples]

Register Tagging
Example : IBM Model 91 : Floating Point Execution Unit
• The floating point execution unit consists of:
– Data registers
– Transfer paths
– Floating Point Adder Unit
– Multiply-Divide Unit
– Reservation stations
– Common Data Bus
• There are 3 reservation stations for the adder, named A1, A2 and A3, and 2 for the multiplier, named M1 and M2.
• Each station has source and sink registers with their tag and control fields.
• The stations hold the operands for the next execution.
• The 3 store data buffers (SDBs) and the 4 floating point registers (FLRs) are tagged.
• The busy bits in the FLRs indicate the dependence of instructions in subsequent execution.
• The Common Data Bus (CDB) is used to transfer operands.
• There are 11 units that supply information to the CDB: 6 FLBs, 3 adders and 2 multiply/divide units.
• The tags for these units are:

Unit   Tag     Unit   Tag
FLB1   0001    ADD1   1010
FLB2   0010    ADD2   1011
FLB3   0011    ADD3   1100
FLB4   0100    M1     1000
FLB5   0101    M2     1001
FLB6   0110

• Internal forwarding can be achieved with the tagging scheme on the CDB.
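The tag comparison behind this forwarding can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not the Model 91 logic itself: the names `Slot`, `snoop` and `broadcast` are hypothetical, and the value 3.5 is made-up data. Only the 4-bit tags are taken from the table above.

```python
from dataclasses import dataclass
from typing import Optional

# 4-bit tags of the 11 CDB sources, copied from the table above.
TAGS = {
    "FLB1": 0b0001, "FLB2": 0b0010, "FLB3": 0b0011,
    "FLB4": 0b0100, "FLB5": 0b0101, "FLB6": 0b0110,
    "M1": 0b1000, "M2": 0b1001,
    "ADD1": 0b1010, "ADD2": 0b1011, "ADD3": 0b1100,
}


@dataclass
class Slot:
    """Hypothetical operand slot (sink or source) of a reservation station."""
    tag: Optional[int] = None      # tag of the unit expected to supply the value
    value: Optional[float] = None  # filled in once the operand arrives

    def snoop(self, tag: int, value: float) -> None:
        # Capture the broadcast only if it carries the awaited tag.
        if self.value is None and self.tag == tag:
            self.value = value


def broadcast(tag: int, value: float, slots: list) -> None:
    """Model one CDB transfer: every listening slot compares tags."""
    for slot in slots:
        slot.snoop(tag, value)


# One slot waits on adder ADD1, another on adder ADD2.
waits_on_add1 = Slot(tag=TAGS["ADD1"])
waits_on_add2 = Slot(tag=TAGS["ADD2"])

# ADD1 finishes and puts its result on the bus together with its tag.
broadcast(TAGS["ADD1"], 3.5, [waits_on_add1, waits_on_add2])
```

After the broadcast, only the slot whose tag matches 1010 holds the value. The result reaches its consumer without passing through memory or a named register, which is the sense in which tagging provides internal forwarding.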
• Example:
• Let F refer to an FLR, let FLBi stand for the i-th FLB, and let their contents be (F) and (FLBi).
• Consider the instruction sequence:
ADD F,FLB1    F ← (F) + (FLB1)
MPY F,FLB2    F ← (F) × (FLB2)
• During the addition:
– The busy bit of F is set to 1.
– The contents of F and FLB1 are sent to adder A1.
– The tag of F is set to 1010 (the tag of the adder).
[Figure: Model 91 floating point unit during the addition. F: busy bit = 1, tag = 1010. Adder station: sink tag 1010 (F), source tag 0001 (FLB1).]
• Meanwhile, the decode of MPY reveals that F is busy, so:
– The current tag of F, 1010 (the tag of the adder), is copied into the sink tag of M1.
– The tag of F is changed to 1000 (the tag of the multiplier).
– The content of FLB2 is sent to M1.
[Figure: before the addition completes. F: busy bit = 1, tag = 1000. Multiplier station: sink tag 1010 (F), source tag 0010 (FLB2).]
[Figure: after the addition. F: busy bit = 1, tag = 1000. Multiplier station: sink tag 1000 (F), source tag 0010 (FLB2).]
• When the addition is done, the result appears on the CDB with tag 1010; this matches the tag held in M1, so the CDB finds that the result should be sent to M1.
• The multiplication is carried out once both operands are available.

Hazard Detection and Resolution
• Hazards are caused by resource usage conflicts among various instructions.
• They are triggered by inter-instruction dependencies.
Terminologies:
• Resource Objects: the set of working registers, memory locations and special flags.
• Data Objects: the contents of the resource objects.
• Each instruction can be considered as a mapping from a set of data objects to a set of data objects.
• Domain D(I): the set of resource objects whose data objects may affect the execution of instruction I (e.g. source registers).
• Range R(I): the set of resource objects whose data objects may be modified by the execution of instruction I (e.g. destination register).
• An instruction reads from its domain and writes to its range.
• Consider the execution of instructions I and J, where J appears after I in the program.
• There are 3 types of data-dependent hazards:
1. RAW (Read After Write)
2. WAW (Write After Write)
3. WAR (Write After Read)

RAW (Read After Write)
• The necessary condition for this hazard is R(I) ∩ D(J) ≠ ∅
• Example:
I1 : LOAD r1,a
I2 : ADD r2,r1
• I2 cannot be executed correctly until r1 has been loaded.
• Thus I2 is RAW dependent on I1.

WAW (Write After Write)
• The necessary condition is R(I) ∩ R(J) ≠ ∅
• Example:
I1 : MUL r1,r2
I2 : ADD r1,r4
• Here I1 and I2 write to the same destination, and hence they are said to be WAW dependent.

WAR (Write After Read)
• The necessary condition is D(I) ∩ R(J) ≠ ∅
• Example:
I1 : MUL r1,r2
I2 : ADD r2,r3
• Here I2 has r2 as its destination while I1 uses it as a source, and hence they are WAR dependent.

Hazard Detection and Resolution
• Hazards can be detected in the fetch stage by comparing the domain and range of the incoming instruction with those of the instructions already in the pipe.
• There are two approaches:
1. On detection at the fetch stage, generate a warning signal to prevent the hazard from taking place.
2. Allow the incoming instruction through the pipe and distribute the detection to all pipeline stages.
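The three necessary conditions are plain set intersections, so the fetch-stage comparison can be sketched directly. The Python below is a minimal illustration, not a description of any particular machine: the names `Instr`, `hazards` and `check_incoming` are hypothetical. It also assumes, following the wording of the examples above, that the first operand of each instruction is its destination and the second its source. A two-operand ADD that also reads its destination would have a larger domain and could report additional dependences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    text: str
    domain: frozenset  # D(I): resource objects the instruction reads
    range_: frozenset  # R(I): resource objects the instruction writes


def instr(text, reads, writes):
    """Convenience constructor (hypothetical helper)."""
    return Instr(text, frozenset(reads), frozenset(writes))


def hazards(i: Instr, j: Instr) -> set:
    """Hazard types whose necessary condition holds, with J after I."""
    found = set()
    if i.range_ & j.domain:   # R(I) ∩ D(J) ≠ ∅
        found.add("RAW")
    if i.range_ & j.range_:   # R(I) ∩ R(J) ≠ ∅
        found.add("WAW")
    if i.domain & j.range_:   # D(I) ∩ R(J) ≠ ∅
        found.add("WAR")
    return found


def check_incoming(incoming: Instr, in_pipe: list) -> list:
    """Fetch-stage check against every instruction already in the pipe."""
    report = []
    for older in in_pipe:
        kinds = hazards(older, incoming)
        if kinds:
            report.append((older.text, kinds))
    return report


# The three examples from the slides, under the operand assumption above.
raw = hazards(instr("LOAD r1,a", {"a"}, {"r1"}),
              instr("ADD r2,r1", {"r1"}, {"r2"}))
waw = hazards(instr("MUL r1,r2", {"r2"}, {"r1"}),
              instr("ADD r1,r4", {"r4"}, {"r1"}))
war = hazards(instr("MUL r1,r2", {"r2"}, {"r1"}),
              instr("ADD r2,r3", {"r3"}, {"r2"}))
```

The conditions are necessary, not sufficient: a non-empty intersection means a hazard is possible, and whether it actually occurs depends on the timing of the two instructions in the pipe.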