Dynamic Scheduling Using Tomasulo’s Approach

Salient Characteristics:
• Track instruction dependences and the availability of operands
• Allow an instruction to execute as soon as its operands are available, which resolves RAW hazards
• Use register renaming to avoid WAW and WAR hazards
• A dynamic scheduling scheme, in which hardware reschedules instruction execution to reduce stalls

The Structure of a DLX FP Unit (see Figure 4.8)
• Instructions are issued in FIFO order from the Instruction Queue
• Reservation stations hold the operation and the actual operands
• Load buffers hold the results of outstanding loads
• All results from FP units or load units are put on the common data bus (CDB)

Lifecycle of an Instruction
1. Issue
   – Get an instruction from the Instruction Queue
   – Issue it if there is an empty reservation station
   – Send operands to the reservation station if they are in the registers
   – A load/store operation can issue if there is an available buffer
   – If a buffer or reservation station is not available, the instruction stalls due to a structural hazard

Lifecycle of an Instruction (Cont’d)
2. Execute
   – Execute when both operands are available
   – Monitor the CDB while waiting for operands
3. Write Result
   – When the result is available, write it on the CDB
   – From the CDB, the result is written into the registers and into any reservation station waiting for this result

Reservation Station Fields
• Every reservation station has six fields:
   Op - the operation to perform
   Qj, Qk - the reservation stations that will produce the source operands
   Vj, Vk - the values of the source operands
   Busy - indicates that this reservation station is busy
• Each register in the register file has a field, Qi:
   Qi - the reservation station or buffer that contains the operation whose result is to be stored into the register

Tomasulo’s Algorithm - Example

      LD     F6, 34(R2)
      LD     F2, 45(R3)
      MULTD  F0, F2, F4
      SUBD   F8, F6, F2
      DIVD   F10, F0, F6
      ADDD   F6, F8, F2

See Figures 4.9 and 4.10
See Figure 4.11 for the steps in the algorithm

Tomasulo’s Algorithm: A Loop-Based Example

Loop: LD     F0, 0(R1)
      MULTD  F4, F0, F2
      SD     0(R1), F4
      SUBI   R1, R1, #8
      BNEZ   R1, Loop    ; branches if R1 != 0

• If we predict that branches are taken, the loop is unrolled dynamically by the hardware

Scoreboard - Steps in Execution
1. Issue: The scoreboard issues an instruction if
   a. A functional unit for the instruction is free
   b. No other active instruction has the same destination register
   If a structural or WAW hazard exists, instruction issue stalls.
2. Read Operands:
   • The scoreboard monitors the availability of operands
   • When the operands become available, the functional unit reads them and execution begins
   • RAW hazards are dynamically resolved here

Scoreboard - Steps in Execution (Cont’d)
3. Execution
   • The functional unit begins execution
   • When the result is ready, it notifies the scoreboard
4. Write Result
   • The scoreboard checks for WAR hazards and stalls writing the result if needed
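
The scoreboard's issue check (step 1) and write-result check (step 4) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the scoreboard's actual tables: the names `ActiveInstr`, `can_issue`, and `can_write_result`, the unit names, and the single `operands_read` flag are all hypothetical. The example reuses DIVD F10, F0, F6 and ADDD F6, F8, F2 from the Tomasulo example, where ADDD writes the F6 that DIVD still has to read.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ActiveInstr:
    """Hypothetical record for an instruction the scoreboard is tracking."""
    unit: str                    # functional unit the instruction occupies
    dest: str                    # destination register
    sources: List[str]           # source registers
    operands_read: bool = False  # True once step 2 (Read Operands) is done


def can_issue(unit: str, dest: str, active: List[ActiveInstr]) -> bool:
    """Step 1: stall on a structural hazard (unit busy) or a WAW hazard."""
    unit_busy = any(a.unit == unit for a in active)
    waw = any(a.dest == dest for a in active)
    return not unit_busy and not waw


def can_write_result(instr: ActiveInstr, earlier: List[ActiveInstr]) -> bool:
    """Step 4: stall while an earlier instruction still has to read instr.dest."""
    war = any(
        not e.operands_read and instr.dest in e.sources
        for e in earlier
    )
    return not war


# DIVD F10, F0, F6 is still waiting for F0, so it has not read F6 yet.
divd = ActiveInstr(unit="Divide", dest="F10", sources=["F0", "F6"])
addd = ActiveInstr(unit="Add", dest="F6", sources=["F8", "F2"])

war_stall_before = not can_write_result(addd, earlier=[divd])
divd.operands_read = True  # DIVD finishes its Read Operands step
war_stall_after = not can_write_result(addd, earlier=[divd])

issue_same_unit = can_issue("Add", "F8", active=[addd])    # structural hazard
issue_same_dest = can_issue("Mult", "F6", active=[addd])   # WAW hazard
issue_free = can_issue("Mult", "F0", active=[addd])        # no hazard
```

In this sketch ADDD may not write F6 until DIVD has read its operands, which is the WAR stall named in step 4. Tomasulo's approach avoids that stall because the reservation station copies the operand value at issue.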
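
Returning to Tomasulo's approach, the reservation station fields (Op, Qj, Qk, Vj, Vk, Busy), the register field Qi, and the issue and write-result steps can be sketched as follows. This is a minimal model under stated assumptions, not the DLX hardware: the classes `Station` and `Machine`, the station names RS1 to RS3, the single shared pool of stations, and the register values are all hypothetical, and the result value is supplied by the caller instead of being computed by a functional unit.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Station:
    """One reservation station with the six fields listed above."""
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None  # value of operand j, valid when qj is None
    vk: Optional[float] = None  # value of operand k, valid when qk is None
    qj: Optional[str] = None    # station that will produce operand j
    qk: Optional[str] = None    # station that will produce operand k

    def ready(self) -> bool:
        """Execute step: both operands must be available."""
        return self.busy and self.qj is None and self.qk is None


class Machine:
    def __init__(self, station_names: List[str], regs: Dict[str, float]):
        self.stations = {n: Station(n) for n in station_names}
        self.regs = dict(regs)
        self.qi: Dict[str, str] = {}  # register -> station that will write it

    def issue(self, op: str, dest: str, src_j: str, src_k: str) -> Optional[str]:
        free = next((s for s in self.stations.values() if not s.busy), None)
        if free is None:
            return None  # structural hazard: the instruction stalls
        free.busy, free.op = True, op
        # Copy the value if the register is valid, otherwise record the producer.
        free.qj = self.qi.get(src_j)
        free.vj = self.regs[src_j] if free.qj is None else None
        free.qk = self.qi.get(src_k)
        free.vk = self.regs[src_k] if free.qk is None else None
        self.qi[dest] = free.name  # renaming: dest now names this station
        return free.name

    def write_result(self, name: str, value: float) -> None:
        """Broadcast a result on the CDB to waiting stations and registers."""
        for s in self.stations.values():
            if s.qj == name:
                s.vj, s.qj = value, None
            if s.qk == name:
                s.vk, s.qk = value, None
        for reg, tag in list(self.qi.items()):
            if tag == name:
                self.regs[reg] = value
                del self.qi[reg]
        self.stations[name] = Station(name)  # free the station


# Last three instructions of the example, with made-up register values.
m = Machine(["RS1", "RS2", "RS3"],
            {"F0": 8.0, "F2": 3.0, "F6": 10.0, "F8": 0.0, "F10": 0.0})
sub = m.issue("SUBD", "F8", "F6", "F2")   # both operands copied at issue
div = m.issue("DIVD", "F10", "F0", "F6")  # copies the current F6 into Vk
add = m.issue("ADDD", "F6", "F8", "F2")   # waits on SUBD for F8
add_waits_on = m.stations[add].qj
m.write_result(sub, 7.0)                  # SUBD result: 10.0 - 3.0
```

Because DIVD copied F6 into Vk when it issued, ADDD is free to overwrite F6 later without a WAR stall, and Qi for F6 points at ADDD's station, so only the latest writer updates the register.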