Part II: Tomasulo Algorithm and Load Store Queue
Assigned: June 20th
Due Date for ALL students: midnight (11:59 pm), July 16th.
Introduction
In this part, we are going to implement the Tomasulo algorithm along with a Load Store Queue (LSQ) for
the pipelined processor which executes MIPS32 instructions as defined in Part I of the project. This is an
individual project.
Description of pipeline
The pipeline has five stages.
(1) Instruction Fetch (IF): IF fetches one instruction per cycle and writes it into the instruction queue (IQ). IF
fetches the first instruction from address 596 and increases the PC by 4 each cycle (predict not-taken),
unless the PC is updated by a J instruction at the Issue stage or by a taken branch at the Commit stage (both described below).
(2) Issue: Get one instruction per cycle from the instruction queue and decode it. Issue is in-order. Issue the instruction if there is an empty reservation station and an empty slot in the ROB; send the
operands to the reservation station (RS) if they are available in either the register file or the ROB. Update
the control entries to indicate the buffers are in use. The ID of the ROB entry allocated for the result is also
sent to the RS, so that the ID can be used to tag the result when it is placed on the CDB. If either the RS or
the ROB is full, instruction issue stalls until both have available entries. Note: for simplicity, a J
instruction computes its target address from the instruction code, sets the PC to the target address, and cancels
the following instructions in the instruction queue (IQ), all in the Issue cycle. Instruction Fetch fetches the
instruction at the target address of J in the next cycle. J does not need any RS.
Both Load (LW) and Store (SW) need to allocate one entry in the LSQ and one entry in the ROB. Load also needs to
allocate one entry in the RS (for the register it loads into). If any of the LSQ, RS, or ROB is full, a Load cannot be
issued. If either the LSQ or the ROB is full, a Store cannot be issued. See the description of the LSQ for more details.
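The issue conditions above can be sketched as a small resource check. This is only an illustration under stated assumptions: the names `Free` and `can_issue` are hypothetical, "ALU" stands for any instruction that needs an ALU reservation station, and J is left out because the text does not say whether it needs a ROB entry.

```python
from dataclasses import dataclass


@dataclass
class Free:
    """Free-entry counts at the start of a cycle (hypothetical bookkeeping)."""
    rs_alu: int   # free ALU reservation stations
    rs_load: int  # free load reservation stations
    rob: int      # free ROB entries
    lsq: int      # free LSQ entries


def can_issue(kind: str, free: Free) -> bool:
    """Return True if an instruction of this kind may issue this cycle.

    kind is "LW", "SW", or "ALU" (assumed label for any other
    instruction that occupies an ALU reservation station).
    """
    if free.rob == 0:
        # Every instruction handled here needs a ROB entry.
        return False
    if kind == "LW":
        # Load needs an LSQ entry and a load RS entry as well.
        return free.lsq > 0 and free.rs_load > 0
    if kind == "SW":
        # Store needs an LSQ entry but no RS entry.
        return free.lsq > 0
    if kind == "ALU":
        return free.rs_alu > 0
    raise ValueError(f"unsupported instruction kind: {kind}")
```

For example, with no free load RS entry a Store can still issue while a Load stalls.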
(3) Execute: If one or more of the operands is not yet available, monitor the CDB while waiting for the
source operand to be computed. This step checks for RAW hazards. When both operands are available at
RS, execute the operation. Instructions may take multiple clock cycles in this stage.
Load/Store executes as follows:
A Load instruction takes two steps to finish Execute (and go to the Write Result stage). The first step is address calculation
(AC). When the necessary register is ready and there is a free Address Unit, the address is calculated.
Otherwise, the load must wait. This step takes one cycle. The second step is the actual memory access. A load can
access memory if and only if 1) there is no earlier store in the LSQ with the same address or an unknown
(uncalculated) address, and 2) no other load instruction is accessing memory. (These
conditions are checked every cycle. We assume AC needs only the first half of the cycle and the check always
happens in the second half, so no extra cycle is needed if a load can access memory. For stores
and store-load forwarding, the check procedure is similar.) The memory access step takes two cycles. When
the second step finishes, the LSQ entry is freed.
A Store instruction needs one step to finish Execute (and go to the Commit stage). That step is address calculation (AC).
When the necessary register is ready and there is a free Address Unit, the address is calculated. Otherwise,
the store must wait. This step takes one cycle. When 1) the address is ready, 2) the data to write is ready, and 3) there is
no earlier store/load in the LSQ (the store is the top instruction in the LSQ), the store instruction can go to the Commit
stage, and its LSQ entry is freed at the same time. Note that multiple load/store instructions can finish the Execute
stage at the same time.
Store-Load forwarding: when the address of a load instruction matches the address of an earlier store,
the value to be stored can be forwarded to the load. Forwarding happens only when 1) there is no store
between the store-load pair with the same address or an unknown (uncalculated) address, and 2) the data of the store
is ready. In other words, store-load forwarding happens only when the store is the last store to the
same address before the load. When store-load forwarding happens, the load skips the memory access step. The
forwarding check also happens every cycle (in the second half).
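The load rules and the forwarding rule above can be combined into one per-cycle decision. The following is a minimal sketch, assuming the LSQ is kept as a list in program order (oldest first); `LSQEntry`, `load_action`, and the `memory_busy` flag are hypothetical names, not part of the assignment.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LSQEntry:
    is_store: bool
    addr: Optional[int] = None  # None until address calculation finishes
    data_ready: bool = False    # stores only: value to write is known


def load_action(lsq: List[LSQEntry], idx: int, memory_busy: bool) -> str:
    """Decide what the load at lsq[idx] may do in this cycle.

    Returns "forward", "memory", or "wait". memory_busy is True when
    another load is currently accessing memory.
    """
    load = lsq[idx]
    if load.addr is None:
        return "wait"  # address calculation has not finished yet
    # Scan earlier entries from the youngest to the oldest.
    for older in reversed(lsq[:idx]):
        if not older.is_store:
            continue
        if older.addr is None:
            return "wait"  # an unknown store address blocks the load
        if older.addr == load.addr:
            # This is the last store to the same address before the load.
            return "forward" if older.data_ready else "wait"
    # No earlier store conflicts; only one load may access memory at a time.
    return "wait" if memory_busy else "memory"
```

Scanning from the youngest earlier store backwards means the first store with a matching or unknown address settles the outcome, which mirrors the "last store to the same address" wording.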
(4) Write Result: When the result is available, write it into the ROB through the CDB, as well as to any RS
waiting for this result. Mark the operand in the RS as available. We assume the CDB has unlimited bandwidth,
so there is no CDB hazard at this stage. An instruction spends one cycle at this stage. Store instructions
bypass this stage.
(5) Commit: Only one instruction can be committed per cycle. Commit must be in-order. There are three
different sequences of actions at commit, depending on whether the committing instruction is a “regular”
instruction, a store, or a branch. A regular commit occurs when an instruction reaches the head of
the ROB and its result is present in the buffer; at this point, the processor updates the register with the result
and removes the instruction from the ROB. Committing a store is similar except that main memory is
updated rather than a result register. A store spends two cycles updating main memory. All of these main
memory cycles occupy the commit bandwidth. When a taken branch commits, the instruction queue, ROB,
LSQ, and RS are flushed and execution restarts at the correct target of the branch. If the branch was not
taken, the branch is simply committed. When an instruction commits, its entry in the ROB is reclaimed. If
multiple instructions are waiting to commit, commit them in program order.
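The three commit cases can be sketched as one function called once per cycle. This is an illustration only: `ROBEntry`, `commit_step`, and the returned labels are hypothetical, the ROB is modeled as a `deque` with the head at the left, and a real simulator would also update registers or memory and flush the IQ, LSQ, and RS on a taken branch.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ROBEntry:
    kind: str                   # "REG", "STORE", or "BRANCH" (assumed labels)
    ready: bool = False         # result present / eligible to commit
    taken: bool = False         # branches only
    store_cycles_left: int = 2  # a store occupies Commit for two cycles


def commit_step(rob: deque) -> str:
    """Advance the Commit stage by one cycle and report what happened."""
    if not rob or not rob[0].ready:
        return "idle"  # in-order: only the head may commit
    head = rob[0]
    if head.kind == "STORE":
        head.store_cycles_left -= 1
        if head.store_cycles_left > 0:
            return "store-busy"  # first memory cycle; commit slot is occupied
        rob.popleft()            # memory is updated at the end of the second cycle
        return "store-done"
    if head.kind == "BRANCH" and head.taken:
        rob.clear()              # flush; IQ, LSQ, and RS would be flushed as well
        return "flush"
    rob.popleft()                # regular instruction or not-taken branch
    return "commit"
```

Because a store holds the head of the ROB for two cycles, any ready instruction behind it commits no earlier than the cycle after the store finishes.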
Pipeline Units
The pipeline has the following functional units.
(1) Instruction Queue (IQ): An unlimited buffer that holds instructions fetched by the Instruction Fetch stage. An
instruction stays in the IQ until its issue cycle.
(2) Reservation Station (RS): An instruction enters an RS after being issued and waits in the corresponding RS
until the functional unit and all the source operands (via the register file or CDB) are available. There are 4 RS
entries, two for ALU instructions and two for load instructions.
(3) Reorder Buffer (ROB): It contains 6 entries. Every issued instruction stays in the ROB until it
finishes the Commit stage.
(4) Address Unit: It calculates one effective address per cycle for Load/Store instructions at the Execute
stage. There are 2 Address Units.
(5) Integer Unit: There are two Integer Units. All ALU, branch, and similar instructions in the RSs need one cycle
at an Integer Unit.
(6) Register File: There are 32 integer registers. We assume the register file has unlimited read/write ports,
so there will be no hardware hazard for register read/write.
(7) Load Store Queue (LSQ): There are 4 entries in the LSQ. Each entry contains a ROB ID, the address,
the corresponding RS ID or value, and several state bits.
(8) Main Memory: We assume there are sufficient read/write ports to main memory, so instruction fetch, data
read, and data write can happen in the same cycle. Instruction fetch takes 1 cycle to finish; a data read/write
takes 2 cycles to finish. For a data write, main memory is updated at the end of the second cycle.
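The sizes and latencies listed above can be gathered in one place so the rest of a simulator does not hard-code them. A minimal sketch follows; the class and field names are hypothetical, while the values are the ones stated in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    """Resource counts and latencies from the Pipeline Units list."""
    rs_alu_entries: int = 2      # RS entries for ALU instructions
    rs_load_entries: int = 2     # RS entries for load instructions
    rob_entries: int = 6
    lsq_entries: int = 4
    address_units: int = 2
    integer_units: int = 2
    num_registers: int = 32
    fetch_latency: int = 1       # cycles per instruction fetch
    data_mem_latency: int = 2    # cycles per data read or write
    integer_latency: int = 1     # cycles at an Integer Unit
    address_calc_latency: int = 1


CONFIG = PipelineConfig()
```

The IQ is unlimited in this specification, so it has no size field here.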
Assumptions
As in Part I, assume the program starts at memory location 596 (decimal). The PC is initialized to this location
for fetching the first instruction from memory. The data section begins at address 700. Following
that is a sequence of 32-bit 2’s complement signed integers for the program data up to the end of the file. The
instruction section will not extend past address 700.
Assume the effective address is the same as the physical memory address.
Instruction issue is static and in-order. Instruction commit is in-order.
Proper pipeline registers must be used to latch intermediate results between pipeline stages.
In your simulator, there is no branch delay slot.
Guidelines



Your output should match the following sample output formats.
Your simulator should simulate the actual execution and produce the correct results for the given
program.
A program will be considered "complete" once the BREAK instruction leaves the Commit stage.
Command Line
Your simulator (MIPSsim) should provide the following options to users; the dis/sim option is omitted for this
part.
MIPSsim inputfilename outputfilename [-Tm:n]




Inputfilename - The file name of the binary input file.
Outputfilename - The file name for printing the output.
-Tm:n - Optional argument to specify the start (m) and end (n) cycles of the simulation output trace.
-T0:0 indicates that no tracing is to be performed; omitting the argument specifies that every
cycle (complete execution) is to be traced.
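One possible way to handle the optional trace argument is sketched below. The function names are hypothetical, and treating both m and n as inclusive bounds is an assumption, since the text only calls them the start and end cycles.

```python
import re
from typing import Optional, Tuple


def parse_trace_option(arg: Optional[str]) -> Optional[Tuple[int, int]]:
    """Return (m, n) for '-Tm:n', or None when the argument is absent."""
    if arg is None:
        return None  # no argument: trace every cycle
    match = re.fullmatch(r"-T(\d+):(\d+)", arg)
    if match is None:
        raise ValueError(f"expected -Tm:n, got {arg!r}")
    return int(match.group(1)), int(match.group(2))


def should_trace(cycle: int, window: Optional[Tuple[int, int]]) -> bool:
    """Decide whether the given cycle belongs in the output trace."""
    if window is None:
        return True           # complete execution is traced
    m, n = window
    if (m, n) == (0, 0):
        return False          # -T0:0 disables tracing
    return m <= cycle <= n    # assumption: both ends are inclusive
```

A caller could pass `sys.argv[3]` when it is present and `None` otherwise, then consult `should_trace` once per simulated cycle.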