Description

Part II: Tomasulo Algorithm and Load Store Queue

Assigned: June 20th
Due date for ALL students: midnight (11:59pm), July 16th

Introduction

In this part, we are going to implement the Tomasulo algorithm along with a Load Store Queue (LSQ) for the pipelined processor that executes MIPS32 instructions as defined in Part I of the project. This is an individual project.

Description of pipeline

The pipeline has five stages.

(1) Instruction Fetch (IF): IF fetches one instruction per cycle and writes it into the instruction queue (IQ). IF fetches the first instruction from address 596 and increases the PC by 4 each cycle (predict not-taken), unless the PC is updated by a taken branch at the Commit stage (described in the Commit stage).

(2) Issue: Take one instruction per cycle from the instruction queue and decode it. Issue is in order. Issue the instruction if there is an empty reservation station and an empty slot in the ROB; send the operands to the reservation station (RS) if they are available in either the register file or the ROB. Update the control entries to indicate that the buffers are in use. The ID of the ROB entry allocated for the result is also sent to the RS, so that the ID can be used to tag the result when it is placed on the CDB. If either the RS or the ROB is full, instruction issue stalls until both have available entries.

Note: for simplicity, a J instruction computes its target address from the instruction code, sets the PC to the target address, and cancels the following instructions in the instruction queue (IQ), all in the Issue cycle. Instruction Fetch fetches the instruction at the target address of J in the next cycle. J does not need an RS.

Both Load (LW) and Store (SW) need to allocate one entry in the LSQ and one entry in the ROB. A Load also needs to allocate one entry in the RS (for the register it loads into). If any of the LSQ, RS, or ROB is full, a Load cannot be issued. If either the LSQ or the ROB is full, a Store cannot be issued.
See the description of the LSQ for more details.

(3) Execute: If one or more of the operands is not yet available, monitor the CDB while waiting for the source operand to be computed. This step checks for RAW hazards. When both operands are available at the RS, execute the operation. Instructions may take multiple clock cycles in this stage.

Loads and stores execute as follows.

A Load instruction takes two steps to finish (and move to the Write Result stage). The first step is address calculation (AC). When the necessary register is ready and there is a free Address Unit, the address is calculated; otherwise, the load must wait. This step takes one cycle. The second step is the actual memory access. A load can access memory if and only if 1) there is no earlier store in the LSQ with the same address or an unknown (uncalculated) address, and 2) no other load instruction is accessing memory. (These conditions are checked every cycle. We assume that AC needs only the first half of the cycle and that the check always happens in the second half, so no extra cycle is needed if a load can access memory. For stores and store-load forwarding, the check procedure is similar.) The memory access step takes two cycles. When the second step finishes, the LSQ entry is freed.

A Store instruction needs one step to finish (and move to the Commit stage): address calculation (AC). When the necessary register is ready and there is a free Address Unit, the address is calculated; otherwise, the store must wait. This step takes one cycle. When 1) the address is ready, 2) the data to write is ready, and 3) there is no earlier store or load in the LSQ (the store is the top instruction in the LSQ), the store can go to the Commit stage, and its LSQ entry is freed at the same time.

Note that multiple load/store instructions can finish the Execute stage at the same time.

Store-load forwarding: when the address of a load instruction matches the address of an earlier store, the value to be stored can be forwarded to the load.
Forwarding happens only when 1) there is no store between the store-load pair with the same address or an unknown (uncalculated) address, and 2) the data of the store is ready. In other words, store-load forwarding happens only when the store is the last store to the same address. When store-load forwarding happens, the load skips the memory access step. The forwarding check also happens every cycle (in the second half).

(4) Write Result: When the result is available, write it into the ROB through the CDB, as well as to any RS waiting for this result. Mark the operand in the RS as available. We assume the CDB has unlimited bandwidth, so there is no CDB hazard at this stage. An instruction spends one cycle at this stage. Store instructions bypass this stage.

(5) Commit: Only one instruction can be committed per cycle. Commit must be in order. There are three different sequences of actions at commit, depending on whether the committing instruction is a “regular” instruction, a store, or a branch. A regular commit occurs when an instruction reaches the head of the ROB and its result is present in the buffer; at this point, the processor updates the register with the result and removes the instruction from the ROB. Committing a store is similar, except that main memory is updated rather than a result register. A store spends two cycles updating main memory. All main memory cycles occupy the commit bandwidth. When a taken branch commits, the instruction queue, ROB, LSQ, and RS are flushed and execution restarts at the correct target of the branch. If the branch was not taken, the branch is simply committed. When an instruction commits, its entry in the ROB is reclaimed. If multiple instructions are waiting to commit, commit them in program order.

Pipeline Units

The pipeline has the following functional units.

(1) Instruction Queue (IQ): An unlimited buffer that holds instructions fetched by the Instruction Fetch stage.
An instruction stays in the IQ until its issue cycle.

(2) Reservation Station (RS): An instruction enters an RS after being issued and waits in the corresponding RS until the functional unit and all the source operands (via the register file or the CDB) are available. There are 4 RS entries: two for ALU instructions and two for load instructions.

(3) Reorder Buffer (ROB): It contains 6 entries. Every issued instruction stays in the ROB until it finishes the Commit stage.

(4) Address Unit: It calculates one effective address per cycle for Load/Store instructions at the Execute stage. There are 2 Address Units.

(5) Integer Unit: There are two Integer Units. All ALU, branch, etc. instructions in the RSs need one cycle at an Integer Unit.

(6) Register File: There are 32 integer registers. We assume the register file has unlimited read/write ports, so there is no hardware hazard for register reads and writes.

(7) Load Store Queue (LSQ): There are 4 entries in the LSQ. Each entry contains a ROB ID, the address, the corresponding RS ID or value, and several state information bits.

(8) Main Memory: We assume there are sufficient read/write ports to main memory, so instruction fetch, data read, and data write can happen in the same cycle. Instruction fetch takes 1 cycle to finish; data read/write takes 2 cycles to finish. For a data write, main memory is updated at the end of the second cycle.

Assumptions

As in Part I, assume the program starts at memory location 596 (decimal). The PC is initialized to this location for fetching the first instruction out of memory. The data section begins at address “700”. Following that is a sequence of 32-bit 2’s complement signed integers for the program data up to the end of the file. The instruction section won’t exceed “700”. Assume the effective address is the same as the physical memory address. Instruction issue is static and in order. Instruction commit is in order. Proper pipeline registers must be used to latch intermediate results between pipeline stages.
In your simulator, there is no branch delay slot.

Guidelines

Your output should match the following sample output formats. Your simulator should simulate the actual execution and produce the correct results for the given program. A program is considered "complete" once the BREAK instruction leaves the Commit stage.

Command Line

Your simulator (MIPSsim) should provide the following options to users; the dis/sim option is omitted for this part.

    MIPSsim inputfilename outputfilename [-Tm:n]

Inputfilename - The file name of the binary input file.
Outputfilename - The file name for printing the output.
-Tm:n - Optional argument to specify the start (m) and end (n) cycles of the simulation output trace. -T0:0 indicates that no tracing is to be performed; omitting the argument specifies that every cycle (complete execution) is to be traced.
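
As an illustration of the -Tm:n option described above, the following is a minimal Python sketch of how the trace window might be parsed and applied. The function names parse_trace_option and should_trace are hypothetical, and treating both m and n as inclusive bounds is an assumption; the handout does not say whether cycle n itself is printed.

```python
import re
from typing import Optional, Tuple

TraceWindow = Optional[Tuple[int, int]]


def parse_trace_option(arg: Optional[str]) -> TraceWindow:
    """Parse an optional "-Tm:n" argument (hypothetical helper).

    Returns None when the argument is omitted (trace every cycle),
    otherwise the pair (m, n). The pair (0, 0) means "no tracing".
    """
    if arg is None:
        return None
    match = re.fullmatch(r"-T(\d+):(\d+)", arg)
    if match is None:
        raise ValueError(f"malformed trace option: {arg!r}")
    return int(match.group(1)), int(match.group(2))


def should_trace(cycle: int, window: TraceWindow) -> bool:
    """Decide whether the state at the given cycle should be printed."""
    if window is None:
        return True           # argument omitted: trace the complete execution
    start, end = window
    if (start, end) == (0, 0):
        return False          # -T0:0: no tracing at all
    # Assumption: both bounds are inclusive.
    return start <= cycle <= end
```

In a simulator main loop, should_trace(cycle, window) would be evaluated once per simulated cycle, after all stages have been updated for that cycle.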
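
The load memory-access and store-load forwarding rules from the Execute stage can be sketched as a single per-cycle check. This is only one possible reading of those rules, not a reference implementation. The LSQEntry fields and the load_action function are hypothetical simplifications: a real LSQ entry would also carry the ROB ID, the RS ID or value, and further state bits.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LSQEntry:
    """Simplified, hypothetical LSQ entry used only for this sketch."""
    is_store: bool
    address: Optional[int] = None     # None until address calculation (AC) is done
    data_ready: bool = False          # stores only: the value to write is known
    accessing_memory: bool = False    # loads only: in the two-cycle memory access


def load_action(lsq: List[LSQEntry], index: int) -> str:
    """Decide what the load at lsq[index] may do in this cycle.

    lsq is ordered oldest first. Returns "wait", "forward",
    or "access_memory".
    """
    load = lsq[index]
    if load.is_store or load.address is None:
        return "wait"                 # not a load, or AC has not finished

    # Scan earlier stores from the youngest to the oldest.
    for earlier in reversed(lsq[:index]):
        if not earlier.is_store:
            continue
        if earlier.address is None:
            return "wait"             # earlier store with an unknown address
        if earlier.address == load.address:
            # This is the last store to the same address before the load.
            return "forward" if earlier.data_ready else "wait"

    # No conflicting earlier store: only one load may access memory at a time.
    other_load_busy = any(
        entry.accessing_memory
        for position, entry in enumerate(lsq)
        if position != index and not entry.is_store
    )
    return "wait" if other_load_busy else "access_memory"
```

The sketch does not arbitrate between several loads that become eligible in the same cycle; a simulator would still need a policy for that case, for example letting the oldest load go first.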
