Total time: 12 cycles

Question 13.2
S/N  Instruction      IF  ID  EX  WB
0    add r3,r1,r2      0   1   2   3
1    load r6,[r3]      1   2   4   9
2    and r7,r5,3       2   3   5   6
3    add r1,r6,r0      3   4  10  11
4    srl r7,r0,8       4   5   6   7
5    or r2,r4,r7       5   6   8  10
6    sub r5,r3,r4      6   7   9  12
7    add r0,r1,r10     7   8  12  13
8    load r6,[r5]      8   9  13  18
9    sub r2,r1,r6      9  10  19  20
10   and r3,r7,15     10  11  14  15
Clock cycles = 21
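The cycle count can be read off the table: cycles are numbered from 0, so the total is the largest WB cycle plus one. A minimal Python check, with the WB column copied from the table above (the variable names are illustrative only):

```python
# WB cycle of each instruction (S/N 0..10), copied from the WB column.
wb_cycles = [3, 9, 6, 11, 7, 10, 12, 13, 18, 20, 15]

# Cycles are numbered from 0, so the count is the last WB cycle plus one.
total_cycles = max(wb_cycles) + 1
print(total_cycles)  # 21
```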
b. No out-of-order capability:
S/N  Instruction      IF  ID  EX  WB
0    add r3,r1,r2      0   1   2   3
1    load r6,[r3]      1   2   4   9
2    and r7,r5,3       2   3   5   6
3    add r1,r6,r0      3   4  10  11
4    srl r7,r0,8       4   5  11  12
5    or r2,r4,r7       5   6  13  14
6    sub r5,r3,r4      6   7  14  15
7    add r0,r1,r10     7   8  15  16
8    load r6,[r5]      8   9  16  21
9    sub r2,r1,r6      9  10  22  23
10   and r3,r7,15     10  11  23  24
Now needs 25 clock cycles.
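One way to see the difference from the previous case is the EX column: without out-of-order capability the EX cycles always increase in program order, whereas in the earlier table some later instructions enter EX before earlier ones. A small Python sketch of that check, with both EX columns copied from the tables (the function name `executes_in_order` is an illustrative choice):

```python
def executes_in_order(ex_cycles):
    """Return True if every instruction enters EX after its predecessor."""
    return all(a < b for a, b in zip(ex_cycles, ex_cycles[1:]))

ex_out_of_order = [2, 4, 5, 10, 6, 8, 9, 12, 13, 19, 14]   # earlier table
ex_in_order = [2, 4, 5, 10, 11, 13, 14, 15, 16, 22, 23]    # part b

print(executes_in_order(ex_out_of_order))  # False
print(executes_in_order(ex_in_order))      # True

# Total for part b: last WB cycle plus one, since cycles start at 0.
wb_in_order = [3, 9, 6, 11, 12, 14, 15, 16, 21, 23, 24]
total_in_order = max(wb_in_order) + 1
print(total_in_order)  # 25
```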
c. Superscalar
Instr0            IF0  ID0  EX0  WB0
add r3,r1,r2        0    1    2    3
load r6,[r3]        1    2    4    9
add r1,r6,r0        2    3   10   11
sub r5,r3,r4        3    4    5    6
load r6,[r5]        4    5    7   13
sub r2,r1,r6        5    6   14   15

Instr1            IF1  ID1  EX1  WB1
and r7,r5,3         0    1    2    3
srl r7,r0,8         1    2    3    4
or r2,r4,r7         2    3    5    6
add r0,r1,r10       3    4   12   13
and r3,r7,15        4    5    6    7

Total clock cycles = 16
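With two pipelines the program is finished when the slower of the two has written back its last result. A short Python check of the total, using the WB0 and WB1 columns above and the 25-cycle and 21-cycle totals of the earlier parts for comparison:

```python
wb0 = [3, 9, 11, 6, 13, 15]   # WB cycles in pipeline 0
wb1 = [3, 4, 6, 13, 7]        # WB cycles in pipeline 1

# The later of the two pipelines determines the total (cycles start at 0).
total_superscalar = max(max(wb0), max(wb1)) + 1
print(total_superscalar)       # 16

# Cycles saved relative to the single-pipeline cases above.
print(25 - total_superscalar)  # 9 fewer than without out-of-order capability
print(21 - total_superscalar)  # 5 fewer than the 21-cycle schedule
```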
Question 13.3
i. Floating point instructions take very long to execute.
   a. It is desirable to start executing them first to save time.
ii. Floating point instructions are likely to be independent of integer
instructions (since they write to their own set of registers) and can be taken
out ahead of integer instructions.
iii. Hence taking out a floating point instruction and processing it ahead of
other integer instructions is not only possible but also very beneficial for
amortizing the cost of processing.
iv. For branches, since branch prediction is used, it is always possible to
execute branches ahead of the comparison results:
   a. If branching turns out to be the correct decision, the processor can
   simply continue fetching and executing the target instructions.
   b. If it was the wrong decision, corrective action must be taken.
v. For other integer instructions, it is simpler and cheaper to take them from
the bottom of the queue, since the benefits will not be as great due to their
higher degree of dependency. This also simplifies the dispatch logic.
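The selection policy in points i to v can be sketched in Python. This is only an illustration under stated assumptions: the `Instr` record, the `kind` labels and the function `pick_next` are hypothetical, index 0 of the list is taken to be the bottom of the queue, and the relative priority of floating point over branches is assumed rather than given in the answer. Operand readiness is not modelled.

```python
from collections import namedtuple

# Hypothetical instruction record; `kind` is "fp", "branch" or "int".
Instr = namedtuple("Instr", ["name", "kind"])

def pick_next(queue):
    """Remove and return the next instruction to dispatch.

    queue[0] is treated as the bottom (oldest entry) of the queue.
    Assumed priority: oldest floating-point instruction, then oldest
    branch, otherwise whatever sits at the bottom of the queue.
    """
    for wanted in ("fp", "branch"):
        for i, instr in enumerate(queue):
            if instr.kind == wanted:
                return queue.pop(i)
    return queue.pop(0)

queue = [Instr("add", "int"), Instr("sub", "int"),
         Instr("beq", "branch"), Instr("fmul", "fp")]
order = [pick_next(queue).name for _ in range(4)]
print(order)  # ['fmul', 'beq', 'add', 'sub']
```

Only the floating point and branch cases need a search through the queue; integer instructions are always taken from one fixed position, which is what keeps the dispatch logic simple.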
Question 13.6
a. Dependencies:
i1: load r1, a
i2: add r2, r1
i3: add r3, r4
i4: mul r4, r5
i5: comp r6
i6: mul r6, r7
i) True dependency between i1 and i2
ii) Anti-dependency between i3 and i4
iii) True dependency between i5 and i6
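These dependencies can be found mechanically by comparing the registers each instruction reads and writes. The Python sketch below assumes a two-operand format in which the first operand is both a source and the destination, that `load` only writes its register, and that `comp r6` reads and writes r6; the `program` encoding and the `dependencies` helper are illustrative. It compares every pair of instructions and does not account for an intervening write that would cancel a dependency, which does not matter for this sequence.

```python
from itertools import combinations

# Hypothetical encoding of the six instructions: (label, writes, reads).
# Assumption: first operand is both source and destination.
program = [
    ("i1", {"r1"}, set()),          # load r1, a
    ("i2", {"r2"}, {"r2", "r1"}),   # add r2, r1
    ("i3", {"r3"}, {"r3", "r4"}),   # add r3, r4
    ("i4", {"r4"}, {"r4", "r5"}),   # mul r4, r5
    ("i5", {"r6"}, {"r6"}),         # comp r6
    ("i6", {"r6"}, {"r6", "r7"}),   # mul r6, r7
]

def dependencies(program):
    """Yield (earlier, later, kind, registers) for each register dependency."""
    for (a, wa, ra), (b, wb, rb) in combinations(program, 2):
        if wa & rb:
            yield a, b, "true", wa & rb      # read after write
        if ra & wb:
            yield a, b, "anti", ra & wb      # write after read
        if wa & wb:
            yield a, b, "output", wa & wb    # write after write

deps = list(dependencies(program))
for dep in deps:
    print(dep)
```

The three dependencies listed in the answer appear in the output. Under the assumed operand convention the pair i5 and i6 is also flagged with an anti-dependency and an output dependency on r6, since both instructions read and write that register.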
b. In order issue / In order completion
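Pipeline chart over clock cycles (CC) 0 to 13. Entries are listed in order for
each row; idle cycles are not shown.
IF1: i1 i3 i5 i5
ID1: i1 i3 i3 i3 i3 i5
IF2: i2 i4 i4 i4 i6 i6
ID2: i2 i2 i2 i4 i4 i6 i6 i6
IF3: (no entries)
ID3: (no entries)
Mul, Add, Log, Ld, S1, S2: i1 i1 i4 i4 i4 i2 i2 i3 i3 i2 i5 i3 i4 i6 i6 i6 i6 i5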
Total time: 14 cycles
c. In order issue / Out of order completion
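Pipeline chart over clock cycles (CC) 0 to 12. Entries are listed in order for
each row; idle cycles are not shown.
IF1: i1 i3 i5 i5
ID1: i1 i3 i3 i3 i3 i5
IF2: i2 i4 i4 i6 i6 i6
ID2: i2 i2 i4 i4 i4 i6 i6
IF3: (no entries)
ID3: (no entries)
Mul, Add, Log, Ld, S1, S2: i1 i1 i4 i4 i4 i6 i6 i6 i2 i2 i3 i3 i2 i5 i3 i4 i5 i6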
Total time: 12 cycles
d. Out-of-order issue out-of-order completion
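Pipeline chart over clock cycles (CC) 0 to 10. Entries are listed in order for
each row; idle cycles are not shown.
IF1: i1 i4
ID1: i1 i4
IF2: i2 i5 i5
ID2: i2 i2 i5
IF3: i3 i6
ID3, Mul, Add: i4 i4 i4 i3 i3 i2 i2 i3 i6 i6
Log, Ld, S1, S2: i1 i1 i3 i5 i4 i5 i2 i6 i6 i6
Total time: 11 cycles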