THE MICROARCHITECTURE LEVEL

advertisement
4
THE MICROARCHITECTURE LEVEL
1
MAR
To
and
from
main
memory
Memory
control
registers
MDR
PC
MBR
SP
LV
Control signals
Enable onto B bus
CPP
Write C bus to register
TOS
OPC
C bus
B bus
H
A
ALU control
B
6
N
Z
ALU
Shifter
Shifter control
2
Figure 4-1. The data path of the example microarchitecture used in this chapter.
2222222222222222222222222222222222222222222222222
12222222222222222222222222222222222222222222222222
F 1 F 1 ENA 1 ENB 1 INVA 1 INC 1 Function 1
1
1
1
1
1
1 0 1 1 1
0
1
1
0
0
0
A
2222222222222222222222222222222222222222222222222
1
1
1
1
1
1
1
1
1
1
1
1
1
12222222222222222222222222222222222222222222222222
1
1
0
1
0
1
0
0
B
3
1
1
1
1
1
1
1
1
0 1 1 1 1
1
1
1 0 1 A
1 0
1
12222222222222222222222222222222222222222222222222
3
1
1 0 1 B
1 1
1
12222222222222222222222222222222222222222222222222
1 1 0 1 1
0
1
1
1
1 1
1
1 1 1 1 1 1
0
0
A+B
2222222222222222222222222222222222222222222222222
1
1
1
1
1
1
1
1
1 1 1 1 1
0
1 1 1 A+B+1 1
1 1
1
12222222222222222222222222222222222222222222222222
1
1 1 1 A+1
1 0
1
1 1 1 1 1 1
0
1
1
1
1
1
1
1
12222222222222222222222222222222222222222222222222
1
1
0
1
0
1
B
+
1
1
1
1
1
1
1
1
12222222222222222222222222222222222222222222222222
1
1
1
1
1
1
1
12222222222222222222222222222222222222222222222222
1
1
1
1
1
1
B−A
1
1
1
1
1
1
1
1
1 1 1 1 0
1
1
1 1 1 B−1
1 1
1
12222222222222222222222222222222222222222222222222
1
1 1 1 −A
1 0
1
12222222222222222222222222222222222222222222222222
1 1 1 1 1
1
1
1
1 1
1
1 0 1 0 1 1
0
0
A AND B 1
2222222222222222222222222222222222222222222222222
1
1
1
1
1
1
1
1
0 1 1 1 1
0
1 0 1 A OR B 1
1
1 1
12222222222222222222222222222222222222222222222222
1
1 0 1 0
1
1 0
1 0 1 1 1 0
0
1
1
1
1
1
1
1
12222222222222222222222222222222222222222222222222
0
1
0
0
0
1
1
1
1
1
1
1
1
1
12222222222222222222222222222222222222222222222222
11
1
1
1
1
1
1
12222222222222222222222222222222222222222222222222
1
1 0 1 −1
1
1 0
1 0 1 1 1 0
Figure 4-2. Useful combinations of ALU signals and the function performed.
Registers loaded
instantaneously from
C bus and memory on
rising edge of clock
Shifter
output
stable
Cycle 1
starts
here
Clock cycle 1
∆w
∆x
Set up
signals
to drive
data path
Drive H
and
B bus
∆y
Clock cycle 2
New MPC used to
load MIR with next
microinstruction here
∆z
ALU
and
shifter
MPC
available
here
Propagation
from shifter
to registers
Figure 4-3. Timing diagram of one data path cycle.
32-Bit MAR (counts in words)
Discarded
0 0
32-Bit address bus (counts in bytes)
Figure 4-4. Mapping of the bits in MAR to the address bus.
Bits
9
3
NEXT_ADDRESS
Addr
J
M
P
C
J
A
M
N
8
J
A
M
Z
JAM
S
L
L
8
9
3
4
S F0 F1 E E I I H O T C L S P M M W R F
R
P O P V P C D A R E E
N N N N
I
T
R R T A C
A
C S P
A B V C
1
A
E D H
ALU
C
Mem
B
bus
B
B bus registers
0 = MDR
1 = PC
2 = MBR
3 = MBRU
4 = SP
Figure 4-5. The microinstruction format for the Mic-1.
5 = LV
6 = CPP
7 = TOS
8 = OPC
9 -15 none
Memory control signals (rd, wr, fetch)
3
4
4-to-16
Decoder
MAR
MDR
MPC
9
PC
O
8
MBR
SP
512 × 36-Bit
control store
for holding
the microprogram
8
LV
JMPC
CPP
Addr
J
ALU
C
MIR
M B
TOS
JAMN/JAMZ
OPC
H
B bus
2
1-bit flip–flop
N
6
ALU
control
High
bit
ALU
Control
signals
Enable
onto
B bus
Z
Shifter
C bus
2
Write
C bus
to register
Figure 4-6. The complete block diagram of our example microarchitecture, the Mic-1.
Address
Addr
JAM
0x75
0x92
001
Data path control bits
JAMZ bit set
…
0x92
…
0x192
One of
these
will follow
0x75
depending
on Z
Figure 4-7. A microinstruction with JAMZ set to 1 has two potential successors.
SP
LV
SP
LV
SP
LV
a3
a2
a1
(a)
108
104
100
b4
b3
b2
b1
a3
a2
a1
c2
c1
b4
b3
b2
b1
a3
a2
a1
(b)
(c)
SP
LV
d5
d4
d3
d2
d1
a3
a2
a1
(d)
Figure 4-8. Use of a stack for storing local variables. (a)
While A is active. (b) After A calls B. (c) After B calls C. (d)
After C and B return and A calls D.
,
,
,
SP
SP
LV
a2
a3
a2
a1
(a)
LV
a3
a2
a3
a2
a1
(b)
SP
LV
a2 + a3
a3
a2
a1
(c)
SP
LV
a3
a2
a2 + a3
(d)
Figure 4-9. Use of an operand stack for doing an arithmetic computation.
Current
Operand
Stack 3
SP
Current
Local
Variable
Frame 3
LV
Local
Variable
Frame 2
Constant
Pool
Local
Variable
Frame 1
Method
Area
CPP
Figure 4-10. The various parts of the IJVM memory.
PC
222222222222222222222222222222222222222222222222222222222222222222222222222222222
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
1
1
Hex 1
Mnemonic
Meaning
1
1
1
1
0x10 1 BIPUSH byte
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
1 Push byte onto stack
1
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
1
1
1
0x59
DUP
Copy top word on stack and push onto stack
1
1
1
1
222222222222222222222222222222222222222222222222222222222222222222222222222222222
1 0xA7 1 GOTO offset
1 Unconditional branch
1
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
1 Pop two words from stack; push their sum
1
0x60 1 IADD
1
1
1
1
0x7E
IAND
Pop
two
words
from
stack;
push
Boolean
AND
222222222222222222222222222222222222222222222222222222222222222222222222222222222
1
1
1
1
0x99 1 IFEQ offset
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
1 Pop word from stack and branch if it is zero
1
1 0x9B 1 IFLT offset
1 Pop word from stack and branch if it is less than zero 1
222222222222222222222222222222222222222222222222222222222222222222222222222222222
1
1
1
1
offset
Pop
two
words
from
stack;
branch
if
equal
0x9F
IF
3
ICMPEQ
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
1
1
1
1 0x84 1 IINC varnum const
1 Add a constant to a local variable
1
222222222222222222222222222222222222222222222222222222222222222222222222222222222
1
1
1
1
0x15 1 ILOAD varnum
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
1 Push local variable onto stack
1
1 0xB6 1 INVOKEVIRTUAL disp 1 Invoke a method
1
222222222222222222222222222222222222222222222222222222222222222222222222222222222
1
1
1
1
0x80 1 IOR
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
1 Pop two words from stack; push Boolean OR
1
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
1
1
1
0xAC IRETURN
Return from method with integer value
1
1
1
1
0x36 1 ISTORE varnum
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
1 Pop word from stack and store in local variable
1
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
1
1
1
0x64
ISUB
Pop two words from stack; push their difference
1
1
1
1
index
Push
constant
from
constant
pool
onto
stack
0x13
LDC
3
W
222222222222222222222222222222222222222222222222222222222222222222222222222222222
1
1
1
1
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
1 Do nothing
1
0x00 1 NOP
1
1
1
1
0x57 1 POP
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
1 Delete word on top of stack
1
0x5F 1 SWAP
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
1 Swap the two top words on the stack
1
1 0xC4 1 WIDE
1 Prefix instruction; next instruction has a 16-bit index 1
1
1
1
1
222222222222222222222222222222222222222222222222222222222222222222222222222222222
Figure 4-11. The IJVM instruction set. The operands byte,
const, and varnum are 1 byte. The operands disp, index, and
offset are 2 bytes.
Stack after
INVOKEVIRTUAL
Caller's LV
Caller's PC
Space for
caller's local
variables
Stack before
INVOKEVIRTUAL
Pushed
parameters
Caller's
local
variable
frame
Parameter 3
Parameter 2
Parameter 1
OBJREF
Previous LV
Previous PC
SP
Caller's
local
variables
Parameter 2
Parameter 1
Link ptr
(a)
SP
Stack base
after
INVOKEVIRTUAL
Stack base
before
INVOKEVIRTUAL
LV
Parameter 3
Parameter 2
Parameter 1
Link ptr
Previous LV
Previous PC
Caller's
local
variables
Parameter 2
Parameter 1
Link ptr
(b)
Figure 4-12. (a) Memory before executing INVOKEVIRTUAL.
(b) After executing it.
LV
Stack before
IRETURN
Return value
Previous LV
Previous PC
SP
Caller's
local
variables
Parameter 3
Parameter 2
Parameter 1
Link ptr
Previous LV
Previous PC
Caller's
local
variable
frame
Caller's
local
variables
Parameter 2
Parameter 1
Link ptr
(a)
Stack base
before
IRETURN
LV
Stack after
IRETURN
Return value
Previous LV
Previous PC
Stack base
after
IRETURN
SP
Caller's
local
variables
Parameter 2
Parameter 1
Link ptr
LV
(b)
Figure 4-13. (a) Memory before executing IRETURN. (b) After executing it.
i = j + k;
if (i == 3)
k = 0;
else
j = j − 1;
(a)
1
ILOAD j // i = j + k
2
ILOAD k
3
IADD
4
ISTORE i
5
ILOAD i // if (i < 3)
6
BIPUSH 3
7
IF3ICMPEQ L1
8
ILOAD j // j = j − 1
9
BIPUSH 1
10
ISUB
11
ISTORE j
12
GOTO L2
13 L1:
BIPUSH 0
14
ISTORE k
15 L2:
(b)
0x15 0x02
0x15 0x03
0x60
0x36 0x01
0x15 0x01
0x10 0x03
0x9F 0x00 0x0D
0x15 0x02
0x10 0x01
0x64
0x36 0x02
0xA7 0x00 0x07
// k = 0 0x10 0x00
0x36 0x03
(c)
Figure 4-14. (a) A Java fragment. (b) The corresponding Java
assembly language. (c) The IJVM program in hexadecimal.
0
j
1
k
j
2
j+k
3
j
8
1
j
9
j–1
10
11
4
j
5
3
j
6
7
12
0
13
14
15
Figure 4-15. The stack after each instruction of Fig. 4-14(b).
222222222222222222222222222
1222222222222222222222222222
1
DEST = H
1
1
DEST
=
SOURCE
2
22222222222222222222222222
1
1
33
1222222222222222222222222222
1
DEST = H
3
33333333
1
1
DEST = SOURCE
21 22222222222222222222222222
1
1
DEST = H + SOURCE
21 22222222222222222222222222
1 DEST = H + SOURCE + 1 1
21 222222222222222222222222221
DEST = H + 1
1
21 22222222222222222222222222
1 DEST = SOURCE + 1
1
21 22222222222222222222222222
1
DEST
=
SOURCE
−
H
21 222222222222222222222222221
1
DEST = SOURCE − 1
21 22222222222222222222222222
1
1
DEST = −H
21 22222222222222222222222222
1
DEST = H AND SOURCE 1
21 22222222222222222222222222
1 DEST = H OR SOURCE
1
21 222222222222222222222222221
DEST = 0
1
21 22222222222222222222222222
1 DEST = 1
1
21 22222222222222222222222222
1
DEST
=
−1
12222222222222222222222222221
Figure 4-16. All permitted operations. Any of the above
operations may be extended by adding ‘‘<< 8’’ to them to shift
the result left by 1 byte. For example, a common operation is
H = MBR < < 8
Operations
Comments
2Label
2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
PC = PC + 1; fetch; goto (MBR)
MBR holds opcode; get next byte; dispatch
2Main1
2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
nop1
goto
Main1
Do
nothing
22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
iadd1
MAR = SP = SP − 1; rd
Read in next-to-top word on stack
iadd2
H = TOS
H = top of stack
MDR = TOS = MDR + H; wr; goto Main1
Add top two words; write to top of stack
2iadd3
2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
isub1
MAR = SP = SP − 1; rd
Read in next-to-top word on stack
isub2
H = TOS
H = top of stack
isub3
MDR = TOS = MDR − H; wr; goto Main1
Do subtraction; write to top of stack
22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
iand1
MAR = SP = SP − 1; rd
Read in next-to-top word on stack
iand2
H = TOS
H = top of stack
iand3
MDR = TOS = MDR AND H; wr; goto Main1 Do AND; write to new top of stack
22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
ior1
MAR = SP = SP − 1; rd
Read in next-to-top word on stack
ior2
H = TOS
H = top of stack
ior3
MDR = TOS = MDR OR H; wr; goto Main1 Do OR; write to new top of stack
22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
dup1
MAR = SP = SP + 1
Increment SP and copy to MAR
dup2
MDR
=
TOS;
wr;
goto
Main1
Write new stack word
22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
pop1
MAR = SP = SP − 1; rd
Read in next-to-top word on stack
pop2
Wait for new TOS to be read from memory
TOS = MDR; goto Main1
Copy new word to TOS
2pop3
2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
swap1
MAR = SP − 1; rd
Set MAR to SP − 1; read 2nd word from stack
swap2
MAR = SP
Set MAR to top word
swap3
H = MDR; wr
Save TOS in H; write 2nd word to top of stack
swap4
MDR = TOS
Copy old TOS to MDR
swap5
MAR = SP − 1; wr
Set MAR to SP − 1; write as 2nd word on stack
TOS = H; goto Main1
Update TOS
2swap6
2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
bipush1
SP = MAR = SP + 1
MBR = the byte to push onto stack
bipush2
PC = PC + 1; fetch
Increment PC, fetch next opcode
MDR = TOS = MBR; wr; goto Main1
Sign-extend constant and push on stack
2bipush3
2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
iload1
H = LV
MBR contains index; copy LV to H
iload2
MAR = MBRU + H; rd
MAR = address of local variable to push
iload3
MAR = SP = SP + 1
SP points to new top of stack; prepare write
iload4
PC = PC + 1; fetch; wr
Inc PC; get next opcode; write top of stack
TOS = MDR; goto Main1
Update TOS
2iload5
2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
istore1
H = LV
MBR contains index; Copy LV to H
istore2
MAR = MBRU + H
MAR = address of local variable to store into
istore3
MDR = TOS; wr
Copy TOS to MDR; write word
istore4
SP = MAR = SP − 1; rd
Read in next-to-top word on stack
istore5
PC = PC + 1; fetch
Increment PC; fetch next opcode
istore6
TOS = MDR; goto Main1
Update TOS
22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
wide1
PC = PC + 1; fetch; goto (MBR OR 0x100) Multiway branch with high bit set
22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
wide3iload1 PC = PC + 1; fetch
MBR contains 1st index byte; fetch 2nd
wide3iload2 H = MBRU << 8
H = 1st index byte shifted left 8 bits
wide3iload3 H = MBRU OR H
H = 16-bit index of local variable
3iload4 MAR = LV + H; rd; goto iload3
MAR = address of local variable to push
2wide
2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
wide3istore1 PC = PC + 1; fetch
MBR contains 1st index byte; fetch 2nd
wide3istore2 H = MBRU << 8
H = 1st index byte shifted left 8 bits
wide3istore3 H = MBRU OR H
H = 16-bit index of local variable
wide
3
istore4
MAR
=
LV
+
H;
goto
istore3
MAR = address of local variable to store into
22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
ldc3w1
PC = PC + 1; fetch
MBR contains 1st index byte; fetch 2nd
ldc3w2
H = MBRU << 8
H = 1st index byte << 8
ldc3w3
H = MBRU OR H
H = 16-bit index into constant pool
ldc3w4
MAR = H + CPP; rd; goto iload3
MAR = address of constant in pool
Figure 4-17. The microprogram for the Mic-1 (part 1 of 3).
Operations
Comments
2Label
22222222222222222222222222222222222222222222222222222222222222222222222222222222222222
iinc1
H = LV
MBR contains index; Copy LV to H
iinc2
MAR = MBRU + H; rd
Copy LV + index to MAR; Read variable
iinc3
PC = PC + 1; fetch
Fetch constant
iinc4
H = MDR
Copy variable to H
iinc5
PC = PC + 1; fetch
Fetch next opcode
MDR = MBR + H; wr; goto Main1
Put sum in MDR; update variable
2iinc6
22222222222222222222222222222222222222222222222222222222222222222222222222222222222222
goto1
OPC = PC − 1
Save address of opcode.
goto2
PC = PC + 1; fetch
MBR = 1st byte of offset; fetch 2nd byte
goto3
H = MBR << 8
Shift and save signed first byte in H
goto4
H = MBRU OR H
H = 16-bit branch offset
goto5
PC = OPC + H; fetch
Add offset to OPC
goto Main1
Wait for fetch of next opcode
2goto6
22222222222222222222222222222222222222222222222222222222222222222222222222222222222222
iflt1
MAR = SP = SP − 1; rd
Read in next-to-top word on stack
iflt2
OPC = TOS
Save TOS in OPC temporarily
iflt3
TOS = MDR
Put new top of stack in TOS
iflt4
N
=
OPC;
if
(N)
goto
T;
else
goto
F
Branch
on N bit
222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
ifeq1
MAR = SP = SP − 1; rd
Read in next-to-top word of stack
ifeq2
OPC = TOS
Save TOS in OPC temporarily
ifeq3
TOS = MDR
Put new top of stack in TOS
Z = OPC; if (Z) goto T; else goto F
Branch on Z bit
2ifeq4
22222222222222222222222222222222222222222222222222222222222222222222222222222222222222
if3icmpeq1
MAR = SP = SP − 1; rd
Read in next-to-top word of stack
if3icmpeq2
MAR = SP = SP − 1
Set MAR to read in new top-of-stack
if3icmpeq3
H = MDR; rd
Copy second stack word to H
if3icmpeq4
OPC = TOS
Save TOS in OPC temporarily
if3icmpeq5
TOS = MDR
Put new top of stack in TOS
3icmpeq6
Z = OPC − H; if (Z) goto T; else goto F If top 2 words are equal, goto T, else goto F
2if22222222222222222222222222222222222222222222222222222222222222222222222222222222222222
T
OPC = PC − 1; fetch; goto goto2
Same as goto1; needed for target address
222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
F
PC = PC + 1
Skip first offset byte
F2
PC = PC + 1; fetch
PC now points to next opcode
F3
goto Main1
Wait for fetch of opcode
222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
invokevirtual1
PC = PC + 1; fetch
MBR = index byte 1; inc. PC, get 2nd byte
invokevirtual2
H = MBRU << 8
Shift and save first byte in H
invokevirtual3
H = MBRU OR H
H = offset of method pointer from CPP
invokevirtual4
MAR = CPP + H; rd
Get pointer to method from CPP area
invokevirtual5
OPC = PC + 1
Save Return PC in OPC temporarily
invokevirtual6
PC = MDR; fetch
PC points to new method; get param count
invokevirtual7
PC = PC + 1; fetch
Fetch 2nd byte of parameter count
invokevirtual8
H = MBRU << 8
Shift and save first byte in H
invokevirtual9
H = MBRU OR H
H = number of parameters
invokevirtual10 PC = PC + 1; fetch
Fetch first byte of # locals
invokevirtual11 TOS = SP − H
TOS = address of OBJREF − 1
invokevirtual12 TOS = MAR = TOS + 1
TOS = address of OBJREF (new LV)
invokevirtual13 PC = PC + 1; fetch
Fetch second byte of # locals
invokevirtual14 H = MBRU << 8
Shift and save first byte in H
invokevirtual15 H = MBRU OR H
H = # locals
invokevirtual16 MDR = SP + H + 1; wr
Overwrite OBJREF with link pointer
invokevirtual17 MAR = SP = MDR;
Set SP, MAR to location to hold old PC
invokevirtual18 MDR = OPC; wr
Save old PC above the local variables
invokevirtual19 MAR = SP = SP + 1
SP points to location to hold old LV
invokevirtual20 MDR = LV; wr
Save old LV above saved PC
invokevirtual21 PC = PC + 1; fetch
Fetch first opcode of new method.
invokevirtual22 LV = TOS; goto Main1
Set LV to point to LV Frame
Figure 4-17. The microprogram for the Mic-1 (part 2 of 3).
Label
Operations
Comments
2222222222222222222222222222222222222222222222222222222222222222222222
ireturn1 MAR = SP = LV; rd
Reset SP, MAR to get link pointer
ireturn2
Wait for read
ireturn3 LV = MAR = MDR; rd
Set LV to link ptr; get old PC
ireturn4 MAR = LV + 1
Set MAR to read old LV
ireturn5 PC = MDR; rd; fetch
Restore PC; fetch next opcode
ireturn6 MAR = SP
Set MAR to write TOS
ireturn7 LV = MDR
Restore LV
ireturn8 MDR = TOS; wr; goto Main1 Save return value on original top of stack
Figure 4-17. The microprogram for the Mic-1 (part 3 of 3).
BIPUSH
(0×10)
BYTE
Figure 4-18. The BIPUSH instruction format.
ILOAD
(0x15)
(a)
INDEX
WIDE
(0xC4)
ILOAD
(0x15)
INDEX
BYTE 1
INDEX
BYTE 2
(b)
Figure 4-19. (a) ILOAD with a 1-byte index. (b) WIDE ILOAD
with a 2-byte index.
Address
0×1FF
Control store
Microinstruction
execution order
0×115
wide_iload1
0×100
Main1
0×C4
wide1
0×15
iload1
WIDE
ILOAD ILOAD
3
1
1
2
2
0×00
Figure 4-20. The initial microinstruction sequence for ILOAD
and WIDE ILOAD. The addresses are examples.
IINC
(0x84)
INDEX
CONST
Figure 4-21. The IINC instruction has two different operand fields.
Memory
1 Byte
n+3
n + 2 OFFSET BYTE 2
OFFSET BYTE 2
OFFSET BYTE 2
OFFSET BYTE 2
OFFSET BYTE 2
n + 1 OFFSET BYTE 1
OFFSET BYTE 1
OFFSET BYTE 1
OFFSET BYTE 1
OFFSET BYTE 1
GOTO (0xA7)
GOTO (0xA7)
GOTO (0xA7)
GOTO (0xA7)
GOTO (0xA7)
n
n+1
n+1
n+2
n+2
n
n
n
OFFSET BYTE 1
OFFSET BYTE 1
OFFSET BYTE 2
n
Registers
PC
OPC
MBR
0xA7
OFFSET BYTE 1
OFFSET 1 << 8
H
(a)
(b)
(c)
(d)
(e)
Figure 4-22. The situation at the start of various microinstructions. (a) Main1 . (b) goto1. (c) goto2. (d) goto3. (e) goto4.
22222222222222222222222222222222222222222222222222222222222222222222222222222222
1
Label
Operations
Comments
21 2222222222222222222222222222222222222222222222222222222222222222222222222222222
1
1
MAR = SP = SP − 1; rd
Read in next-to-top word on stack
1 pop1
1
Wait for new TOS to be read from memory 1
1 pop2
1 pop3
1
TOS = MDR; goto Main1
Copy new word to TOS
1 Main1
PC = PC + 1; fetch; goto (MBR) MBR holds opcode; get next byte; dispatch11
122222222222222222222222222222222222222222222222222222222222222222222222222222222
Figure 4-23. New microprogram sequence for executing POP.
222222222222222222222222222222222222222222222222222222222222222222222222222222
1 Label
1
Operations
Comments
222222222222222222222222222222222222222222222222222222222222222222222222222222
1
1
MAR = SP = SP − 1; rd
Read in next-to-top word on stack
1 pop1
1
MBR holds opcode; fetch next byte
1 Main1.pop PC = PC + 1; fetch
1
1222222222222222222222222222222222222222222222222222222222222222222222222222222
TOS = MDR; goto (MBR) Copy new word to TOS; dispatch on opcode 11
1 pop3
Figure 4-24. Enhanced microprogram sequence for executing
POP.
22222222222222222222222222222222222222222222222222222222222222222222222222222222
122222222222222222222222222222222222222222222222222222222222222222222222222222222
1
Label
Operations
Comments
1
1
MBR contains index; Copy LV to H
1 iload1 H = LV
1
MAR = address of local variable to push
1 iload2 MAR = MBRU + H; rd
1
1 iload3 MAR = SP = SP + 1
SP points to new top of stack; prepare write 1
1 iload4 PC = PC + 1; fetch; wr
Inc PC; get next opcode; write top of stack 1
1
1
Update TOS
1 iload5 TOS = MDR; goto Main1
1
Main1
PC
=
PC
+
1;
fetch;
goto
(MBR)
MBR
holds
opcode;
get
next
byte;
dispatch
1222222222222222222222222222222222222222222222222222222222222222222222222222222221
Figure 4-25. Mic-1 code for executing ILOAD.
22222222222222222222222222222222222222222222222222222222222222222222222222222222
122222222222222222222222222222222222222222222222222222222222222222222222222222222
1
Label
Operations
Comments
1
1
MAR = address of local variable to push
1 iload1 MAR = MBRU + LV; rd
1
SP points to new top of stack; prepare write 1
1 iload2 MAR = SP = SP + 1
1 iload3 PC = PC + 1; fetch; wr
1
Inc PC; get next opcode; write top of stack
1 iload4 TOS = MDR
1
Update TOS
1
1
iload5 PC = PC + 1; fetch; goto (MBR) MBR already holds opcode; fetch index byte 1
122222222222222222222222222222222222222222222222222222222222222222222222222222222
Figure 4-26. Three-bus code for executing ILOAD.
MBR2
Shift register
From memory
IMAR
MBR1
+1
2 low-order bits
C bus
PC
+1, 2
Write PC
Figure 4-27. A fetch unit for the Mic-1.
B bus
Word fetched
Word fetched
Word fetched
0
MBR1
1
MBR1
2
MBR1
3
MBR1
4
MBR1
5
MBR1
6
MBR2
MBR2
MBR2
MBR2
MBR2
Transitions
MBR1: Occurs when MBR1 is read
MBR2: Occurs when MBR2 is read
Word fetched: Occurs when a memory word is read and 4 bytes are put into the shift register
Figure 4-28. A finite state machine for implementing the IFU.
Memory
control
registers
MAR
To
and
from
main
memory
MDR
PC
Instruction
fetch unit
(IFU)
MBR
MBR2
SP
LV
Control signals
CPP
Enable onto B bus
TOS
Write C bus to register
C bus
OPC
B bus
H
A bus
ALU
control
6
ALU
Shifter
Figure 4-29. The datapath for Mic-2.
N
Z
Operations
Comments
2Label
2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
goto (MBR)
Branch to next instruction
2nop1
2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
iadd1
MAR = SP = SP − 1; rd
Read in next-to-top word on stack
iadd2
H = TOS
H = top of stack
MDR = TOS = MDR+H; wr; goto (MBR1)
Add top two words; write to new top of stack
2iadd3
2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
isub1
MAR = SP = SP − 1; rd
Read in next-to-top word on stack
isub2
H = TOS
H = top of stack
isub3
MDR = TOS = MDR−H; wr; goto (MBR1)
Subtract TOS from Fetched TOS-1
22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
iand1
MAR = SP = SP − 1; rd
Read in next-to-top word on stack
iand2
H = TOS
H = top of stack
iand3
MDR = TOS = MDR AND H; wr; goto (MBR1) AND Fetched TOS-1 with TOS
22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
ior1
MAR = SP = SP − 1; rd
Read in next-to-top word on stack
ior2
H = TOS
H = top of stack
MDR = TOS = MDR OR H; wr; goto (MBR1) OR Fetched TOS-1 with TOS
2ior3
2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
dup1
MAR = SP = SP + 1
Increment SP; copy to MAR
MDR = TOS; wr; goto (MBR1)
Write new stack word
2dup2
2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
pop1
MAR = SP = SP − 1; rd
Read in next-to-top word on stack
pop2
Wait for read
TOS = MDR; goto (MBR1)
Copy new word to TOS
2pop3
2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
swap1
MAR = SP − 1; rd
Read 2nd word from stack; set MAR to SP
swap2
MAR = SP
Prepare to write new 2nd word
swap3
H = MDR; wr
Save new TOS; write 2nd word to stack
swap4
MDR = TOS
Copy old TOS to MDR
swap5
MAR = SP − 1; wr
Write old TOS to 2nd place on stack
TOS = H; goto (MBR1)
Update TOS
2swap6
2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
bipush1
SP = MAR = SP + 1
Set up MAR for writing to new top of stack
MDR = TOS = MBR1; wr; goto (MBR1)
Update stack in TOS and memory
2bipush2
2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
iload1
MAR = LV + MBR1U; rd
Move LV + index to MAR; read operand
iload2
MAR = SP = SP + 1
Increment SP; Move new SP to MAR
iload3
TOS = MDR; wr; goto (MBR1)
Update stack in TOS and memory
22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
istore1
MAR = LV + MBR1U
Set MAR to LV + index
istore2
MDR = TOS; wr
Copy TOS for storing
istore3
MAR = SP = SP − 1; rd
Decrement SP; read new TOS
istore4
Wait for read
TOS = MDR; goto (MBR1)
Update TOS
2istore5
2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
goto (MBR1 OR 0x100)
Next address is 0x100 Ored with opcode
2wide1
2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
wide
3
iload1
MAR
=
LV
+
MBR2U;
rd;
goto
iload2
Identical to iload1 but using 2-byte index
22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
3istore1 MAR = LV + MBR2U; goto istore2
Identical to istore1 but using 2-byte index
2wide
2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
3w1
MAR = CPP + MBR2U; rd; goto iload2
Same as wide 3iload1 but indexing off CPP
2ldc
2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
iinc1
MAR = LV + MBR1U; rd
Set MAR to LV + index for read
iinc2
H = MBR1
Set H to constant
iinc3
MDR = MDR + H; wr; goto (MBR1)
Increment by constant and update
22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
goto1
H = PC − 1
Copy PC to H
goto2
PC = H + MBR2
Add offset and update PC
goto3
Have to wait for IFU to fetch new opcode
goto (MBR1)
Dispatch to next instruction
2goto4
2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
iflt1
MAR = SP = SP − 1; rd
Read in next-to-top word on stack
iflt2
OPC = TOS
Save TOS in OPC temporarily
iflt3
TOS = MDR
Put new top of stack in TOS
iflt4
N = OPC; if (N) goto T; else goto F
Branch on N bit
Figure 4-30. The microprogram for the Mic-2 (part 1 of 2).
Operations
Comments
2Label
22222222222222222222222222222222222222222222222222222222222222222222222222222222222222
ifeq1
MAR = SP = SP − 1; rd
Read in next-to-top word of stack
ifeq2
OPC = TOS
Save TOS in OPC temporarily
ifeq3
TOS = MDR
Put new top of stack in TOS
ifeq4
Z
=
OPC;
if
(Z)
goto
T;
else
goto
F
Branch on Z bit
222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
if3icmpeq1
MAR = SP = SP − 1; rd
Read in next-to-top word of stack
if3icmpeq2
MAR = SP = SP − 1
Set MAR to read in new top-of-stack
if3icmpeq3
H = MDR; rd
Copy second stack word to H
if3icmpeq4
OPC = TOS
Save TOS in OPC temporarily
if3icmpeq5
TOS = MDR
Put new top of stack in TOS
if
3
icmpeq6
Z
=
H
−
OPC;
if
(Z)
goto
T;
else
goto
F
If top 2 words are equal, goto T, else goto F
222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
T
H = PC − 1; goto goto2
Same as goto1
222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
F
H = MBR2
Touch bytes in MBR2 to discard
F2
goto
(MBR1)
222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
invokevirtual1
MAR = CPP + MBR2U; rd
Put address of method pointer in MAR
invokevirtual2
OPC = PC
Save Return PC in OPC
invokevirtual3
PC = MDR
Set PC to 1st byte of method code.
invokevirtual4
TOS = SP − MBR2U
TOS = address of OBJREF − 1
invokevirtual5
TOS = MAR = H = TOS + 1
TOS = address of OBJREF
invokevirtual6
MDR = SP + MBR2U + 1; wr
Overwrite OBJREF with link pointer
invokevirtual7
MAR = SP = MDR
Set SP, MAR to location to hold old PC
invokevirtual8
MDR = OPC; wr
Prepare to save old PC
invokevirtual9
MAR = SP = SP + 1
Inc. SP to point to location to hold old LV
invokevirtual10 MDR = LV; wr
Save old LV
invokevirtual11
LV
=
TOS;
goto
(MBR1)
Set
LV to point to zeroth parameter.
222222222222222222222222222222222222222222222222222222222222222222222222222222222222222
ireturn1
MAR = SP = LV; rd
Reset SP, MAR to read Link ptr
ireturn2
Wait for link ptr
ireturn3
LV = MAR = MDR; rd
Set LV, MAR to link ptr; read old PC
ireturn4
MAR = LV + 1
Set MAR to point to old LV; read old LV
ireturn5
PC = MDR; rd
Restore PC
ireturn6
MAR = SP
ireturn7
LV = MDR
Restore LV
ireturn8
MDR = TOS; wr; goto (MBR1)
Save return value on original top of stack
Figure 4-30. The microprogram for the Mic-2 (part 2 of 2).
Memory
control
registers
MAR
To
and
from
main
memory
MDR
PC
Instruction
fetch unit
(IFU)
MBR1
MBR2
SP
LV
Control signals
CPP
Enable onto B bus
TOS
Write C bus to register
OPC
C bus
B bus
H
A bus
C latch
A latch
ALU
control
B latch
6
ALU
Shifter
Figure 4-31. The three-bus data path used in the Mic-3.
N
Z
22222222222222222222222222222222222222222222222222222222222222222222222222222222
122222222222222222222222222222222222222222222222222222222222222222222222222222222
1
Label
Operations
Comments
1 swap1 MAR = SP − 1; rd
1
Read 2nd word from stack; set MAR to SP
1
1
Prepare to write new 2nd word
1 swap2 MAR = SP
1
Save new TOS; write 2nd word to stack
1 swap3 H = MDR; wr
1
swap4
MDR
=
TOS
Copy
old
TOS
to
MDR
1
1
1 swap5 MAR = SP − 1; wr
1
Write old TOS to 2nd place on stack
1122222222222222222222222222222222222222222222222222222222222222222222222222222222
11
swap6 TOS = H; goto (MBR1)
Update TOS
Figure 4-32. The Mic-2 code for SWAP.
222222222222222222222222222222222222222222222222222222222222222222222222222222222
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
1
1 Swap2 1 Swap3 1 Swap4 1
1
1
Swap1
Swap5
Swap6
1
1
1
1
1
1
1
1
Cy 1 MAR=SP−1;rd 1 MAR=SP 1 H=MDR;wr 1 MDR=TOS 1 MAR=SP−1;wr 1 TOS=H;goto (MBR1) 1
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
1
1
1
1
1
1
1 1 B=SP
1
1
1
1
1
1
1
1
2 1 C=B−1
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
1 B=SP
1
1
1
1
1
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
1
1
1
1
1
1
1
3 MAR=C; rd
C=B
1
1
1
1
1
1
1
1
4 1 MDR=mem 1 MAR=C 1
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
1
1
1
1
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
1
1
1
1
1
1
1
5
B=MDR
1
1
1
1
1
1
1
1
6 1
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
1
1 C=B
1 B=TOS
1
1
1
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
1
1
1
1
1
1
1
7
H=C; wr
C=B
B=SP
1
1
1
1
1
1
1
1
8 1
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
1
1 Mem=MDR 1 MDR=C 1 C=B−1
1 B=H
1
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
1
1
1
1
1
1
1
9
MAR=C; wr
C=B
1
1
1
1
1
1
1
1
10 1
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
1
1
1
1 Mem=MDR
1 TOS=C
1
11222222222222222222222222222222222222222222222222222222222222222222222222222222222
1
1
1
1
1
1
11
11 1
1
1
1
1
1 goto (MBR1)
Figure 4-33. The implementation of SWAP on the Mic-3.
IFU
IFU
Reg
A
C
B
IFU
Reg
A
C
B
IFU
Reg
A
C
B
Reg
A
C
B
1
IFU
ALU
ALU
ALU
ALU
Shifter
Shifter
Shifter
Shifter
IFU
Reg
A
C
B
IFU
Reg
A
C
B
IFU
Reg
A
C
B
Reg
A
C
B
2
IFU
ALU
ALU
ALU
ALU
Shifter
Shifter
Shifter
Shifter
IFU
Reg
A
C
B
IFU
Reg
A
C
B
IFU
Reg
A
C
B
Reg
A
C
B
Instruction
3
IFU
ALU
ALU
ALU
ALU
Shifter
Shifter
Shifter
Shifter
IFU
Reg
A
C
B
IFU
Reg
A
C
B
IFU
Reg
A
C
B
Reg
A
C
B
4
Cycle 1
ALU
ALU
ALU
ALU
Shifter
Shifter
Shifter
Shifter
Cycle 2
Cycle 3
Cycle 4
Ti me
Figure 4-34. Graphical illustration of how a pipeline works.
Micro-op
ROM
index
IJVM
length
1
Final
Goto
Queueing unit
2
From
memory
3
Micro-operation ROM
IADD
ISUB
ILOAD
IFLT
Instruction
fetch unit
Decoding unit
Queue
of pending
micro-ops
To/from
memory
Drives stage 4
ALU
C
M A B
MIR1
Drives stage 5
ALU
C
M A B
MIR2
Drives stage 6
ALU
C
M A B
MIR3
Drives stage 7
ALU
C
M A B
MIR4
4
Registers
7
6
C
A
B
ALU
5
Shifter
Figure 4-35. The main components of the Mic-4.
1
2
3
4
5
6
7
IFU
Decoder
Queue
Operands
Exec
Write
back
Memory
Figure 4-36. The Mic-4 pipeline.
CPU
package
CPU chip
L1-I
Processor
board
L1-D
Unified
L2
cache
Keyboard
controller
Split L1 instruction and data caches
Unified
L3 cache
Graphics
controller
Main
memory
(DRAM)
Disk
controller
Board-level cache (SRAM)
Figure 4-37. A system with three levels of cache.
Valid
Addresses that use this entry
Tag
Entry
Data
2047
65504-65535, 131040-131072, …
7
6
5
4
3
2
1
0
96-127, 65632-65663, 131068-131099
64-95, 65600-65631, 131036-131067, …
32-63, 65568-65599, 131004-131035, …
0-31, 65536-65567, 131072-131003, …
(a)
Bits
16
11
TAG
LINE
3
2
WORD BYTE
(b)
Figure 4-38. (a) A direct-mapped cache. (b) A 32-bit virtual address.
Valid
Valid
Tag
Data
Valid
Tag
Data
Valid
Tag
Data
Tag
Data
2047
7
6
5
4
3
2
1
0
Entry A
Entry B
Entry C
Figure 4-39. A four-way associative cache.
Entry D
if (i == 0)
k = 1;
else
k = 2;
(a)
CMP i,0; compare i to 0
BNE Else; branch to Else if not equal
Then: MOV k,1; move 1 to k
BR Next; unconditional branch to Next
Else: MOV k,2; move 2 to k
Next:
(b)
Figure 4-40. (a) A program fragment. (b) Its translation to a
generic assembly language.
Branch/
no branch
Valid
Slot
Branch
address/tag
6
5
4
3
2
1
0
Valid
Slot
Prediction
Branch
bits
address/tag
6
5
4
3
2
1
0
(a)
Valid
Slot
Prediction
bits
Branch
Target
address/tag
address
6
5
4
3
2
1
0
(b)
(c)
Figure 4-41. (a) A 1-bit branch history. (b) A 2-bit branch history. (c) A mapping between branch instruction address and
target address.
Branch
No branch
00
Predict
no branch
Branch
No
branch
Branch
01
10
Predict
no branch
one more
time
Predict
branch
one more
time
No
branch
11
Branch
Predict
branch
No branch
Figure 4-42. A 2-bit finite-state machine for branch prediction.
222222222222222222222222222222222222222222222222222222222222222222222222222222222
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
1 1 Registers being read 1 1 Registers being written 1
1
11
11
1
Cy 1 # 1 Decoded 1 Iss 1 Ret 1 1 0 1 1 1 2 1 3 1 4 1 5 1 6 1 7 1 1 0 1 1 1 2 1 3 1 4 1 5 1 6 1 7 1
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
1 1 1 1 1 1 1
1 1 11 1 11 R3=R0 R1 11 1 11
1 1 1 11 1 11 11 11 11 11 11 1 1 11
1 1 1 1 1 1 1 1
*
1
1
1
1
1
1 2 1 R4=R0+R2 1 2 1
1 1 1 1 1 1 1 1 1
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
1 1 2 11 1 11 1 11 11 11 11 11 1 1 11
1
1
1
1
1 1 1 1 1 1 1 1 1 1
1 2 1 3 1 R5=R0+R1 1 3 1
11 3 1 2 1 1 1 1 1 1 1 11 1
1 1 1 1 1 1 1
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
11 3 1 2 1 1 1 1 1 1 1 11 1
4 1 R6=R1+R4 1 – 1
1
1 1 1 1 1 1 1 1 1 1
1
1
1
1
1
1
1
1
1 1 1 1 1 1 1 1 1 1
3 1
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
1 1 3 11 2 11 1 11 11 11 11 11 1 1 11
1
1
1
1
1 1 1 1 1 1 1
1 4 1
11 1
1 11 2 1 1 1 1 1
1
1
1
1
1
1
1
1 1 1 1 1 1 1 1 1
1
1
1
1
1
1
1
1
1 2 11 1 1 1 1 1 1 1 1 1 11 1
1 1 1 1 1 1 1 1
1
1
1
1
1
1
1
1
1
1
1
1
1
1 1 1 1 1 1 1
3 11
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
11 1
1
1
1
1
1
1
1
1
1
1
1
1 1 1 1 1 1 1
1 5 1
11
11 1
4 1
1 1
1 1
1
1
1
1
1
1
1
1 1 1 1 1 1 1 1
1
1
1
1
1
1 5 1 R7=R1 *R2 1 5 1
1 2 1 1 1 1 1 1 1 1 11 1
1 1 1 1 1 1 1 1 1
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
11
1
1
1
1
1
1
1
1
1
1
1
1
1 1 1 1 1 1 1 1 1
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
6 1 6 1 R1=R0−R2 1 – 1
11
2 1 1 1
1 1
11 1
1
1
1
1
1 1 1 1 1 1 1
1 7 1
11
11 1
1
1
1
4
1
1
1
1
1
1
1
1
1
1 1 1 1 1 1 1 1
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
11
11 1
1
1
1
1
1
1
1
1
1
1
1
1 1 1 1 1 1 1
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
8 1
5 11
11 1
1
1
1
1
1
1
1
1
1
1
1 9 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 11 11 11 11 11 11 1
1
1
6
1
1
11
1 7 1 R3=R3 R1 1 7 1
1 1 1 1 1 1 1 1 1 1 11 11 1 1 1 1 1 1 1 1 1 11
1
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
1
1
* 1
1
1
1
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
1 1 1 11 1 11 1 11 1 11 11 11 11 1 1 11 1 11 11 1 11 11 11 11 1
10 1
1
1
1
1
1
1
1 6 11 11
1 1 1 1 1 1 1 1 1 11 11 1
1 1 1 1 1 1 1 11
11 1
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
1
1
1
1
1 1 1 1 1 1 1 11 1
1 1 1 1 1 1 1
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
12 1
7 11
1
1
1
1
1
1
1
1
1
1
1
1 1 1 1 1 1 1
1
11
11 1
1
1
1
1
1
1
1
1
1
1
1
13
8 R1=R4+R4
8
2
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
11
1 1 1 1 11 11 11 11 11 11 1
1
1
1
1
1
1
1
1
1
1
1
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
11
11
1
14 1
1
1
1
1 1 1 1 2 1 1 1 11 1 1 1 1 1 1 1 1 1
1
1
1
11
11
11 8 1 1
11 11 11 11 11 11 11 1 1 11
11 11 11 11 11 11 1
1 15 11
222222222222222222222222222222222222222222222222222222222222222222222222222222222
Figure 4-43. Operation of a superscalar CPU with in-order issue and in-order completion.
222222222222222222222222222222222222222222222222222222222222222222222222222222222
1
1 1 Registers being read 1 1 Registers being written 1
222222222222222222222222222222222222222222222222222222222222222222222222222222222
1
11
11
1
222222222222222222222222222222222222222222222222222222222222222222222222222222222
1 Cy 11 # 11 Decoded 11 Iss 11 Ret 1 1 0 11 1 11 2 11 3 11 4 11 5 11 6 11 7 1 1 0 11 1 11 2 11 3 11 4 11 5 11 6 11 7 1
1 1 1 1 1 R3=R0 R1 1 1 1
11 1 1 1 1 1 1 1 1 1 11 1
1 1 1 1 1 1 1 11
*
1
1
1
1
1
1 2 1 R4=R0+R2 1 2 1
1 1 1 1 1 1 1 1 1
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
1 1 2 11 1 11 1 11 11 11 11 11 1 1 11
1
1
1
1
1 1 1 1 1 1 1 1 1 1
1 2 1 3 1 R5=R0+R1 1 3 1
11 3 1 2 1 1 1 1 1 1 1 11 1
1 1 1 1 1 1 1
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
11 3 1 2 1 1 1 1 1 1 1 11 1
4 1 R6=R1+R4 1 – 1
1
1 1 1 1 1 1 1 1 1 1
1
1
1
1
1
1 1 1 1 1 1 1 1 1 1 1
1 3 11 5 11 R7=R1 *R2 11 5 11
1 1 3 11 3 11 2 11 11 11 11 11 1 1 11
1 1 1 1 1 1 1 1 1 1 1
1
6 1 S1=R0−R2 1 6 1
11 4 1 3 1 3 1 1 1 1 1 11 1
1
1 1 1 1 1 1 1
1
11 1
2 11 3 1 3 1 2 1
1
1
1
1
1
1
1
1
1 1 1 1 1 1 1 1 1 1
222222222222222222222222222222222222222222222222222222222222222222222222222222222
1
1
1
1
1
1
1 4 1
1 1 1 1 1 1 1 1 1 1 1
1 4 11
1 1 3 11 4 11 2 11 11 1 11 11 11 1 1 11
1
1
1
1 1 1 1 1 1 1 1 1 1 1
1
11 3 1 4 1 2 1 1 1 1 1 1 11 1
7 1 R3=R3 *S1 1 – 1
1
1 1 1 1 1 1 1
1
11 3 1 4 1 2 1 1 3 1 1 1 11 1
8 1 S2=R4+R4 1 8 1
1
1 1 1 1 1 1 1 1 1 1 1
1
1
1
1
1
1
1
1
1 1 11 2 1 3 1 2 1 1 3 1 1 1 11 1
1 1 1 1 1 1 1 1 1 1
1
1
1
1
1
1
1
1
1
1
1
1
1
1 1 1 1 1 1 1 1 1
3 11 1 2 2
3
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
11 1
1
1
1
1
1
1
1
1
1
1
1
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
1 1 1 1 11 11 11 11 11 1 11 1 1
5 1
6 11
2 1 1 1
3 1
1
1
1
1
1
1
1
1
11
1
1 7 1
1 2 1 1 1 1 1 3 1 1 1 11 11 1 1 1 1 1 1 1 1 1 1 1 11
1 6 11
1
1
1
1
1 4 11
1 1 1 1 1 1 1 2 1 1 1 11 1 1 1 1 1 1 1 1 1 1 1
1
1
1
1
1
1 1 1 1 1 1 1 11 1
1 1 1 1 1 1 1
1
5 11
1 1 2 1
1 1
1
1
1
1
1
1
1
1
1
1
1 1 1 1 1 1 1
1
1
1
1
1
1
1
1
1 8 11
1 1 1 1 1 1 1 1 11 1
1 1 1 1 1 1 1 1
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
1
1
1
1
1
1
1
1
1
1
1
1
1 1 1 1 1 1 1 1
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
7 1
11
1 1
11 1
1
1
1
1
1
1
1
1
1
1 1 1 1 1 1 1
1 8 1
11
11 1
1
1
1
1
1
1
1
1
1
1
1
1 1 1 1 1 1 1 1
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
11
11 1
1
1
1
1
1
1
1
1
1
1
1
1 1 1 1 1 1 1
1222222222222222222222222222222222222222222222222222222222222222222222222222222222
1 9 1
1
1
1 7 11 11
1 1 1 1 1 1 1 11 11 1
1 1 1 1 1 1 1
Figure 4-44. Operation of a superscalar CPU with out-oforder issue and out-of-order completion.
evensum = 0;
evensum = 0;
oddsum = 0;
oddsum = 0;
i = 0;
i = 0;
while (i < limit) {
i >= limit
while (i < limit)
k = i * i * i;
k = i * i * i;
if ((i/2) * 2) = = 0)
if ((i/2) * 2) = = 0)
evensum = evensum + k;
T
F
evensum = evensum + k;
else
oddsum = oddsum + k;
oddsum = oddsum + k;
i = i + 1;
i = i + 1;
}
(a)
(b)
Figure 4-45. (a) A program fragment. (b) The corresponding
basic block graph.
To level 2
cache
Local bus
to PCI
bridge
Bus interface unit
Level 1 I-cache
Fetch/Decode unit
Level 1 D-cache
Dispatch/Execute unit
Retire unit
Micro-operation pool
(ROB)
Figure 4-46. The Pentium II microarchitecture.
Level 1 I-cache
Pipeline stage
IFU0
Cache line fetcher
Next IP
IFU1
Instruction length decoder
Dynamic
branch predictor
IFU2
Instruction aligner
ID0
0
1
2
ID1
Micro-operation queuer
RAT
Register allocator
Micro-operation
sequencer
Static
branch predictor
ROB
Micro-operations go in the ROB
Figure 4-47. Internal structure of the Fetch/Decode unit (simplified).
Port 0
MMXExecution
unit
Floating-Point
execution unit
Integer
execution unit
Port 1
MMXExecution
unit
Floating-Point
execution unit
Integer
execution unit
Reservation
station
From/to
ROB
Port 2
Load
Unit
Loads
Port 3
Store
Unit
Stores
Port 4
Store
Unit
Stores
Figure 4-48. The Dispatch/Execute unit.
To main
memory
Memory interface unit
Level 2 cache
External cache unit
Prefetch/Dispatch unit
Level 1 cache
Grouping logic
Integer execution unit
Floating-point unit
Load/store unit
Integer registers
FP registers
Level 1 D-cache
ALU
ALU
FP ALU
FP ALU
Load
store
Graphics unit
Figure 4-49. The UltraSPARC II microarchitecture.
Store
queue
Integer pipeline
Execute
Fetch
Decode
Cache
N1
N2
Group
N3
Register
X1
X2
X3
Floating-point/graphics pipeline
Figure 4-50. The UltraSPARC II’s pipeline.
Write
Memory and I/O bus interface unit
32
32
0-16 KB
Instruction
cache
0-16 KB
Data
cache
32
Prefetch,
decode, and
folding unit
32
Execution
control
unit
Integer and
floating-point
unit
2 x 32
3 x 32
64 32-Bit registers for holding
the top 64 words of the stack
Figure 4-51. The block diagram of the picoJava II with both
level 1 caches and the floating-point unit. This is configuration
of the microJava 701.
Fetch
from
I-cache
Decode
and
fold
Fetch
operands
from
stack
Execute
instruction
Access
data
cache
Figure 4-52. The picoJava II has a six-stage pipeline.
Write
results
to stack
Without folding
SP
k
m
k
k+m
k
k
k
k
k
k
k
n
n
n
n
k+m
n
k+m
m
m
m
m
m
m
m
Start
After
folded
instruction
SP
SP
8
7
6
5
4
3
2
1
0
With folding
Start
After
ILOAD k
After
ILOAD m
(a)
After
IADD
After
ISTORE n
(b)
Figure 4-53. (a) Execution of a four-instruction sequence to
compute n = k + m. (b) The same sequence folded to one instruction.
22222222222222222222222222222222222222222222222222222222222222222222222222222222
1
1
1
Group
Description
Example
21 2222222222222222222222222222222222222222222222222222222222222222222222222222222
1 NF
1 Nonfoldable instructions
1 GOTO
1
21 2222222222222222222222222222222222222222222222222222222222222222222222222222222
1
1
1
LV
21 2222222222222222222222222222222222222222222222222222222222222222222222222222222
1 Pushing a word onto the stack
1 ILOAD
1
122222222222222222222222222222222222222222222222222222222222222222222222222222222
1
1
1
MEM
Popping a word and storing it in memory
ISTORE
1 BG1
1 Operations using one stack operand
1 IFEQ
1
21 2222222222222222222222222222222222222222222222222222222222222222222222222222222
1
1
1
BG2
122222222222222222222222222222222222222222222222222222222222222222222222222222222
1 Operations using two stack operands
1 IF3CMPEQ
1
122222222222222222222222222222222222222222222222222222222222222222222222222222222
OP
1 Computations on two operands with one result
1 IADD
1
Figure 4-54. JVM instruction groups for folding purposes.
222222222222222222222222222222222222222222222222222222222222222222222222222
1222222222222222222222222222222222222222222222222222222222222222222222222222
1
1
Instruction sequence
Example
1
1
1
1 LV
1
1 MEM
LV
OP
ILOAD,
ILOAD,
IADD,
ISTORE
222222222222222222222222222222222222222222222222222222222222222222222222222
1
1
1
1
1
1
1222222222222222222222222222222222222222222222222222222222222222222222222222
1
1
LV
OP
ILOAD, ILOAD, IADD
1 LV
1
1
1 LV
1
1
1 LV
1
BG2 1
ILOAD, ILOAD, IF3CMPEQ
1222222222222222222222222222222222222222222222222222222222222222222222222222
1
1
1
1
1
LV
BG1
1222222222222222222222222222222222222222222222222222222222222222222222222222
1 ILOAD, IFEQ
1
1
1
1
1 LV
1
1
ILOAD, IF3CMPEQ
1 BG2
1
1
222222222222222222222222222222222222222222222222222222222222222222222222222
1
1
1
1 MEM
1
1
LV
ILOAD, ISTORE
1222222222222222222222222222222222222222222222222222222222222222222222222222
1
1
1
1
1
1222222222222222222222222222222222222222222222222222222222222222222222222222
11 IADD, ISTORE
11
1 OP 1 MEM
1
1
Figure 4-55. Some of the JVM instruction sequences that can be folded.
Download