Fall 2011 Prof. Hyesoon Kim • App1 IPC: 1.5 App2 IPC 0.5 – Average IPC ? – 1/(1/1.5+1/0.5) = 0.75 Add: 1 cycles FE_stage FE L ID L add r1, r2, r3 add add sub r4, r1, r3 sub sub add add mul r5, r2, r3 mul sub sub mul sub sub mul mul sub sub EX add mul mul L MEM L WB L add add add add sub sub • Data Dependencies – RAW: Read-After-Write (True Dependence) – WAR: Anti-Depedence – WAW: Output Dependence • Control Dependence Ifetch Reg ALU – When following instructions depend on the outcome of a previous branch/jump DMem Reg • Register dependencies – RAW, WAR, WAW, based on register number • Memory dependencies – Based on memory address – This is harder • Register names known at decode • Memory addresses not known until execute • Also called memory disambiguation • Basically similar to regular (register) data dependencies: RAW, WAR, WAW • However, the exact location is not known until run-time : – A: STORE R1, 0[R2] – B: LOAD R5, 24[R8] – C: STORE R3, -8[R9] – RAW exists if (R2+0) == (R8+24) – WAR exists if (R8+24) == (R9 – 8) – WAW exists if (R2+0) == (R9 – 8) Time (clock cycles) Control hazard Ifetch and r6,r1,r7 O TARG r or r1,r2,r9 d e br (r1) TARG1 r Control hazard and data hazard Reg DMem Ifetch Reg DMem Reg DMem Reg ALU n s uncond. TARG t r. ALU I add r1,r2,r3 ALU O r d e r Cycle 1Cycle 2 Cycle 3 Cycle 4Cycle 5 Cycle 6 Cycle 7 ALU I n s t r. Ifetch Ifetch Reg Reg Reg DMem Reg Add: 1 cycles FE_stage FE L ID L add r1, r2, r3 add add sub r4, r1, r3 sub sub add add mul r5, r2, r3 mul sub sub mul sub sub mul mul sub sub EX add mul mul Can we do better than this? L MEM L WB L add add add add sub sub or r8,r1,r9 xor r10,r1,r4 Reg DMem Ifetch Reg DMem Ifetch Reg DMem Ifetch Reg ALU and r6,r1,r7 Ifetch ALU sub r4,r1,r3 Reg ALU O r d e r add r1,r2,r3 Ifetch ALU I n s t r. ALU Time (clock cycles) DMem Reg Reg Reg Reg DMem Reg • Destination register is a name for instr’s result • Source registers are names for sources • Forwarding logic builds data dependence (flow) graph for instructions in the execution window and r6,r1,r7 Reg Ifetch DMem Reg Reg Reg Ifetch Ifetch Ifetch Reg Reg DMem ALU sub r4,r1,r3 Ifetch ALU add r1,r2,r3 ALU Time (clock cycles) Reg DMem Reg and r6,r1,r7 Reg DMem Ifetch Reg DMem Reg Reg Ifetch Reg Reg ALU sub r4,r1,r3 Ifetch ALU add r1,r2,r3 ALU Time (clock cycles) DMem Reg or r8,r1,r9 DMem Ifetch Reg DMem Reg Ifetch Ifetch Reg Reg Reg DMem ALU and r6,r1,r7 Reg ALU sub r4,r1,r3 Ifetch ALU add r1,r2,r3 ALU Time (clock cycles) Reg DMem Reg Reg ALU Ifetch DMem Reg Reg ALU Ifetch DMem Reg • Is that all? • How Do we know whether Add is writing a data to R1? • Broadcasting Register id Destination register ID Ifetch Reg ALU Sources Ready? DMem Reg • • • • • Sim.cpp Op structure, Op latency Pipeline latch Get Op run_a_cycle OP pool cycle_count++ Op Traces Update pipeline stages Op Op Op get_op Data structure for simulation WB MEM EX ID FE and r6,r2,r7 Reg Ifetch DMem Reg Reg Reg Ifetch Reg Reg ALU sub r4,r1,r3 Ifetch ALU O r d e r add r1,r2,r3 ALU I n s t r. DMem All sources are ready? Why not execute them? Q: What is static scheduling? A: Compiler re-order instructions DMem Reg Reg • Give processor more registers than specified by the ISA – temporarily map ISA registers (“logical” or “architected” registers) to the physical registers to avoid overwrites • Components: – mapping mechanism – physical registers • allocated vs. free registers • allocation/deallocation mechanism • Example – I3 can not exec before I2 because I3 will overwrite R5 – I5 can not go before I2 because I2, when it goes, will overwrite R2 with a stale value Program code I1: ADD R1, R2, R3 I2: SUB R2, R1, R5 I3: AND R5, R11, R7 I4: OR R8, R5, R2 I5: XOR R2, R4, R11 RAW WAR WAW • Solution: I1: ADD R1, R2, R3 Let’s give I3 temporary name/ I2: SUB R2, R1, R5 location (e.g., S) for the value I3: AND S, R5 R11, R7 it produces. R5, R2 I4: OR R8, S, • But I4 uses that value, I5: XOR R2, T, R4, R11 so we must also change that to S… • In fact, all uses of R5 from I3 to the next instruction that writes to R5 again must now be changed to S! • We remove WAW deps in the same way: change R2 in I5 (and subsequent instrs) to T. Program code • Implementation – Space for S, T, etc. – How do we know when to rename a register? I1: ADD R1, R2, R3 I2: SUB R2, R1, R5 I3: AND S, R11, R7 I4: OR R8, S, R2 I5: XOR T, R4, R11 • Simple Solution – Do renaming for every instruction – Change the name of a register each time we decode an instruction that will write to it. – Remember what name we gave it