CS 7810 Lecture 11

Delaying Physical Register Allocation Through Virtual-Physical Registers
T. Monreal, A. Gonzalez, M. Valero, J. Gonzalez, V. Vinals
Proceedings of MICRO-32, November 1999

Register File Design Considerations
• Number of ports = 3 x issue width
• Number of entries = window size + logical-regs
• Multiple threads → more registers (more power)
• Wire delays, clock speeds → multiple cycle access
• Pipelining a RAM structure is hard

Register Allocation
[Figure: lifetime of physical register pr7 across the pipeline stages Fetch, Rename, Issue, Complete, Wake-up, Commit]
• Rename: assign pr7, cycle 4
• Issue: cycle 15
• write pr7: cycle 30
• read pr7: cycle 50
• Commit: release pr7, cycle 80
• no result: 26 cyc (cycles 4 to 30)
• useful time: 20 cyc (cycles 30 to 50)
• no activity: 30 cyc (cycles 50 to 80)

Two-Level Register File
[Figure: Base regfile vs. Two-level regfile]

Virtual-Physical Registers
[Figure, shown in four steps: a register map table and a virtual map table]
• Step 1: the register map table maps lr3 to vr7; the instructions in the figure carry the tag vr7
• Step 2: instruction issues; the mappings are unchanged
• Step 3: instruction completes and is assigned pr9; the register map table holds lr3 → vr7, pr9 and the virtual map table holds vr7 → pr9
• Step 4: same tables; figure labels: vr7 (pr9), vr7, pr9, pr9

Lack of Registers
[Figure: in-flight window; legend: has physical register / has no physical register]
• Finishes, has no register, keeps re-executing
• cycle t → cycle t+1: an instr commits, and the re-executing instr gets a reg

Deadlock
• Who will generate a register for this instr?
• Finishes, has no register, keeps re-executing
• Solution: Reserve a register for the oldest instruction
[Figure: in-flight window; legend: has physical register / has no physical register]

Sequential Execution
• Oldest instr has reserved register
• Instr commits, releases another reg, that is then reserved for the new oldest instr
• Behaves like an in-order processor
[Figure: in-flight window; legend: has physical register / has no physical register]

Reserving All Registers
• Allows quick progress, but almost behaves like a conventional processor
[Figure legend: has physical register / has no physical register]

Register Stealing
• Instr finishes; steals register from the youngest finished instr
• No reservation of regs
• The younger instrs may have to execute twice
• Note the pre-execution effect
[Figure: in-flight window; legend: has physical register / has no physical register]

Implementation
• Finished instructions have to remain in issueq in case they have to re-execute
• Issued dependents of the victim instruction need not re-execute
• The VP tag of the victim has to be broadcast so that unissued dependents can reset the ready bit
• Can benefit from an instruction reuse buffer?
• Pre-execution without explicitly attempting it

Results
• Improves the base case by 5% (Int programs) and 24% (FP programs)
• FP programs have more ILP, better branch prediction, and are more limited by cache misses
• Re-executions: 10% (int), 58% (fp)
• Steals: 5% (int), 12% (fp)
• For the same IPC, VP registers employ 25% fewer registers
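
Register Stealing: a toy sketch
The stealing policy from the Register Stealing and Implementation slides can be illustrated with a small Python model. This is only a sketch under stated assumptions, not the paper's implementation. The names `Instr`, `StealingRegFile`, `complete`, and `release` are hypothetical. The model assumes an instr steals only from a finished instr that is younger than itself, and it frees a register through an explicit `release` call that stands in for the release at commit. VP tag broadcast and dependent wake-up are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instr:
    seq: int                        # program order; smaller means older
    phys_reg: Optional[int] = None  # set only once the result has a register
    executions: int = 0             # how many times the instr has finished


class StealingRegFile:
    """Hypothetical model: physical registers are allocated at completion."""

    def __init__(self, num_phys: int) -> None:
        self.free: List[int] = list(range(num_phys))
        self.holders: List[Instr] = []  # finished instrs that own a register

    def complete(self, instr: Instr) -> bool:
        """Instr finishes executing; return True if its result got a register."""
        instr.executions += 1
        if self.free:
            instr.phys_reg = self.free.pop()
            self.holders.append(instr)
            return True
        # No free register: consider the youngest finished instr as victim.
        victim = max(self.holders, key=lambda i: i.seq, default=None)
        if victim is not None and victim.seq > instr.seq:
            # Steal; the victim loses its result and must execute again.
            instr.phys_reg, victim.phys_reg = victim.phys_reg, None
            self.holders.remove(victim)
            self.holders.append(instr)
            return True
        return False  # assumption: never steal from an older instr

    def release(self, instr: Instr) -> None:
        """Free instr's register (simplified stand-in for release at commit)."""
        self.holders.remove(instr)
        self.free.append(instr.phys_reg)
        instr.phys_reg = None


rf = StealingRegFile(num_phys=2)
i0, i1, i2, i3 = (Instr(seq=s) for s in range(4))
rf.complete(i2)
rf.complete(i3)                        # younger instrs finish first, take both regs
rf.complete(i0)                        # steals from i3, the youngest finished instr
rf.complete(i1)                        # steals from i2
print(rf.complete(i3))                 # False: only older instrs hold registers
rf.release(i0)                         # a register is freed
print(rf.complete(i3), i3.executions)  # True 3
```

In this run i3 finishes three times before its result finds a register, which is the re-execution cost the slides mention. Because an older instr can always take a register from a younger finished one, the oldest instr is never starved in this model, so no register has to be reserved.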