CS3220 Tutorial 4
Question 1
Part a
    ld r1, a
    ld r2, b
    m  r1, r1, r2
    ld r2, c
    ld r3, d
    a  r3, r3, r2
    a  r1, r1, r3
Assuming that each register has its own two-bit counter:
    ld r1-0, a
    ld r2-0, b
    m  r1-1, r1-0, r2-0
    ld r2-1, c
    ld r3-0, d
    a  r3-1, r3-0, r2-1
    a  r1-2, r1-1, r3-1
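The renaming above can be reproduced mechanically. The following is a minimal sketch, assuming each instruction is written as a hypothetical (opcode, destination, sources) tuple and that wrap-around of the two-bit counter does not need to be modelled for a program this short.

```python
from collections import defaultdict

def rename_per_register(program):
    """Rename registers as rN-k using a separate version counter per register."""
    next_version = defaultdict(int)   # next version number to hand out, per register
    current = {}                      # register -> its latest renamed form
    renamed = []
    for op, dest, srcs in program:
        # Read the sources first so that "m r1, r1, r2" reads the OLD r1.
        # Memory operands such as "a" are not in `current` and pass through.
        new_srcs = [current.get(s, s) for s in srcs]
        new_dest = f"{dest}-{next_version[dest]}"
        next_version[dest] += 1
        current[dest] = new_dest
        renamed.append(f"{op} {new_dest}, " + ", ".join(new_srcs))
    return renamed

# The Part a program in the assumed tuple form.
PROGRAM = [
    ("ld", "r1", ["a"]),
    ("ld", "r2", ["b"]),
    ("m",  "r1", ["r1", "r2"]),
    ("ld", "r2", ["c"]),
    ("ld", "r3", ["d"]),
    ("a",  "r3", ["r3", "r2"]),
    ("a",  "r1", ["r1", "r3"]),
]

renamed_lines = rename_per_register(PROGRAM)
for line in renamed_lines:
    print(line)
```

Reading the sources before allocating the destination is the design point: it keeps an instruction that reads and writes the same register pointing at the previous version.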
All four load instructions are sent to the load unit, both add instructions are sent to the add unit, and the single multiply instruction is sent to the multiply unit.
If the loads of a and b miss in the cache, but the loads of c and d do not:
i) The load unit successfully loads c into r2-1 and d into r3-0 ahead of ld r1-0, a and ld r2-0, b.
ii) The contents of r2-1 and r3-0 are broadcast on the CDB. a r3-1, r3-0, r2-1 now has its values for r3-0 and r2-1, and can be executed on the adder. The result is written to r3-1 and broadcast on the CDB.
iii) For both (i) and (ii), results are also written to the ROB, which reorders the write-backs into program sequence for the physical registers (e.g., for r2, it will write r2-0, then r2-1, etc.).
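The in-order write-back in (iii) can be sketched as a queue that only retires from its head. This is an illustrative model, not the tutorial's hardware: the function name and the exact completion order are assumptions chosen to match the cache-miss scenario above.

```python
from collections import deque

def retire_in_order(program_order, completion_order):
    """Return the order in which results leave the ROB.

    Results may complete in any order, but an entry can only retire once it
    is finished AND has reached the head of the ROB.
    """
    rob = deque(program_order)
    done = set()
    retired = []
    for name in completion_order:
        done.add(name)
        # Drain every finished entry that is now at the head.
        while rob and rob[0] in done:
            retired.append(rob.popleft())
    return retired

program_order = ["r1-0", "r2-0", "r1-1", "r2-1", "r3-0", "r3-1", "r1-2"]
# Assumed completion order: c and d hit, so r2-1, r3-0 and the first add
# finish before the missing loads of a and b.
completion_order = ["r2-1", "r3-0", "r3-1", "r1-0", "r2-0", "r1-1", "r1-2"]

retired = retire_in_order(program_order, completion_order)
print(retired)
```

Although r2-1 completes first, it leaves the ROB only after r2-0, so the physical register r2 ends up holding the value of the later write.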
Part b
When a register is used for the first time, or when it is written to, the tag is incremented.
    ld 0, a       ; r1->0
    ld 1, b       ; r2->1
    m  2, 0, 1    ; r1 is updated. Given new tag 2
    ld 3, c       ; r2 is written again. Given new tag 3
    ld 4, d       ; r3->4
    a  5, 4, 3    ; r3 is updated and given new tag 5
    a  6, 2, 5
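The single shared tag counter can be sketched in the same way. This is a minimal illustration under the assumption that instructions are hypothetical (opcode, destination, sources) tuples and that register names look like r1, r2, and so on; anything else is treated as a memory operand.

```python
def is_register(name):
    # Assumption: registers are named "r" followed by digits.
    return name.startswith("r") and name[1:].isdigit()

def rename_global_tags(program):
    """Rename every register to a tag drawn from one shared counter."""
    next_tag = 0
    tag_of = {}     # architectural register -> current tag
    renamed = []
    for op, dest, srcs in program:
        new_srcs = []
        for s in srcs:
            if is_register(s) and s not in tag_of:
                # First use of a register as a source: allocate a tag.
                tag_of[s] = next_tag
                next_tag += 1
            new_srcs.append(str(tag_of[s]) if is_register(s) else s)
        # Every write gets a fresh tag.
        tag_of[dest] = next_tag
        renamed.append(f"{op} {next_tag}, " + ", ".join(new_srcs))
        next_tag += 1
    return renamed

PROGRAM = [
    ("ld", "r1", ["a"]),
    ("ld", "r2", ["b"]),
    ("m",  "r1", ["r1", "r2"]),
    ("ld", "r2", ["c"]),
    ("ld", "r3", ["d"]),
    ("a",  "r3", ["r3", "r2"]),
    ("a",  "r1", ["r1", "r3"]),
]

tagged_lines = rename_global_tags(PROGRAM)
for line in tagged_lines:
    print(line)
```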
Execution is the same as in Part a.
Max. # of tags required = max. # of outstanding instructions. Assuming that instructions pass through the pipeline smoothly,
# of outstanding instructions
    = # of add reservation stations + # of add pipeline stages
    + # of mult reservation stations + # of mult pipeline stages
    + # of load buffers + # of store buffers
    = 3 + 5 + 4 + 4 + 5
    = 21 tags
32 tags should therefore be adequate. However, the real situation is more complicated (e.g. many instructions may be waiting to be retired while many others do not need to wait for registers and are executed immediately), and this can cause the tags to run out.
Part c
If a and b miss, ld r2-1, c completes before ld r2-0, b. Its result is written to r2-1, which is different from the r2-0 written by ld r2-0, b, so neither write can overwrite the other.
This removes WAW dependencies.
Question 2
Ways instructions can get their operands:
i) If the operand has already been written back to the register, the register will not be tagged as BUSY, and the instruction can read the value from the register.
ii) If the result is in the ROB, the register is tagged as BUSY, and the current tag is issued to the instruction. The instruction can search the ROB for the value.
iii) If the result is not yet in the ROB, the instruction waits at the reservation station. When the instruction producing the result for the wanted register completes, the result is broadcast on the CDB, and the waiting instruction can read the value from the CDB.
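The three cases can be written as one lookup routine. This is a sketch under stated assumptions: the function name, the dictionary layout (a `busy_tag` map from register to its outstanding tag, and a `rob` map from tag to value, with `None` meaning not yet produced) and the sample values are all illustrative, not taken from the tutorial.

```python
def fetch_operand(reg, register_file, busy_tag, rob):
    """Return ("value", v) if the operand is available, else ("wait", tag)."""
    tag = busy_tag.get(reg)
    if tag is None:
        # Case (i): register is not BUSY, read it directly.
        return ("value", register_file[reg])
    if rob.get(tag) is not None:
        # Case (ii): register is BUSY but the result already sits in the ROB.
        return ("value", rob[tag])
    # Case (iii): result not produced yet; wait in the reservation station
    # and pick the value up when this tag appears on the CDB.
    return ("wait", tag)

# Made-up machine state for illustration.
register_file = {"r1": 10, "r2": 20, "r3": 30}
busy_tag = {"r2": "r2-1", "r3": "r3-1"}    # r1 is not BUSY
rob = {"r2-1": 7, "r3-1": None}            # r3-1 is still executing

case_i = fetch_operand("r1", register_file, busy_tag, rob)
case_ii = fetch_operand("r2", register_file, busy_tag, rob)
case_iii = fetch_operand("r3", register_file, busy_tag, rob)
print(case_i, case_ii, case_iii)
```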
Question 3
CDB – Can match tags of newly generated results to tags of registers that instructions are waiting for. E.g.
    ld r2-1, c
    ld r3-0, d
    a  r3-1, r3-0, r2-1    ; Waiting at the ADD reservation station
When the two loads complete, the results for r3-0 and r2-1 can be matched to the add instruction waiting for r3-0 and r2-1. This is similar to associative memory, where data is located by content rather than by address.
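The tag matching on the CDB can be sketched as every reservation station snooping each broadcast. The class and function names here are hypothetical, and the loaded values 5 and 8 are made up for the example.

```python
class ReservationStation:
    def __init__(self, op, dest, source_tags):
        self.op = op
        self.dest = dest
        # tag -> value; None while the operand is still outstanding
        self.operands = {tag: None for tag in source_tags}

    def snoop(self, tag, value):
        """Capture a CDB broadcast if this station is waiting for that tag."""
        if tag in self.operands and self.operands[tag] is None:
            self.operands[tag] = value

    def ready(self):
        return all(v is not None for v in self.operands.values())

def broadcast(stations, tag, value):
    # Every station compares the broadcast tag against its own waiting tags.
    for rs in stations:
        rs.snoop(tag, value)

# a r3-1, r3-0, r2-1 waiting at the ADD reservation station.
add_rs = ReservationStation("a", "r3-1", ["r3-0", "r2-1"])
stations = [add_rs]

broadcast(stations, "r2-1", 5)      # ld r2-1, c completes (value assumed)
ready_after_first = add_rs.ready()
broadcast(stations, "r3-0", 8)      # ld r3-0, d completes (value assumed)
ready_after_second = add_rs.ready()

result = add_rs.operands["r3-0"] + add_rs.operands["r2-1"]
print(ready_after_first, ready_after_second, result)
```

The station never asks where the value lives; it only compares tags, which is the associative behaviour described above.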
ROB – New instructions reading the latest copy of a register can match the register number and tag number against the tagged registers waiting to be written back. E.g. suppose r2-1 and r3-0 are waiting in the re-order buffer to be written back:
Re-order Buffer
Register    Tag    Content
….          ….     ….
r2          1      XXX
r3          0      YYY
….          ….     ….
The add instruction a r3-1, r3-0, r2-1 is able to match r2-1 and r3-0 with their respective entries in the ROB and get the contents XXX and YYY from the ROB.
Thus the ROB is also able to extract data by tag rather than by memory address (i.e. absolute position within the ROB). In other words, matching is by the tagged register name r2-1 rather than by "3rd entry in ROB".
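A lookup by (register, tag) rather than by position can be sketched as follows. The `RobEntry` type, the `rob_lookup` function and the extra r1 entry are assumptions added for illustration; XXX and YYY are the placeholder contents from the table.

```python
from typing import NamedTuple, Optional

class RobEntry(NamedTuple):
    register: str
    tag: int
    content: Optional[str]

def rob_lookup(rob, register, tag):
    """Find an entry by its tagged register name, not by its position."""
    for entry in rob:
        if entry.register == register and entry.tag == tag:
            return entry.content
    return None   # no such tagged register is waiting in the ROB

rob = [
    RobEntry("r1", 0, "WWW"),   # hypothetical neighbouring entry
    RobEntry("r2", 1, "XXX"),
    RobEntry("r3", 0, "YYY"),
]

r2_value = rob_lookup(rob, "r2", 1)
r3_value = rob_lookup(rob, "r3", 0)
# Reordering the entries does not change the answer, because the match
# is on content (register, tag) and never on the slot index.
r3_value_reordered = rob_lookup(list(reversed(rob)), "r3", 0)
print(r2_value, r3_value, r3_value_reordered)
```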