4 THE MICROARCHITECTURE LEVEL 1 MAR To and from main memory Memory control registers MDR PC MBR SP LV Control signals Enable onto B bus CPP Write C bus to register TOS OPC C bus B bus H A ALU control B 6 N Z ALU Shifter Shifter control 2 Figure 4-1. The data path of the example microarchitecture used in this chapter. 2222222222222222222222222222222222222222222222222 12222222222222222222222222222222222222222222222222 F 1 F 1 ENA 1 ENB 1 INVA 1 INC 1 Function 1 1 1 1 1 1 1 0 1 1 1 0 1 1 0 0 0 A 2222222222222222222222222222222222222222222222222 1 1 1 1 1 1 1 1 1 1 1 1 1 12222222222222222222222222222222222222222222222222 1 1 0 1 0 1 0 0 B 3 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 A 1 0 1 12222222222222222222222222222222222222222222222222 3 1 1 0 1 B 1 1 1 12222222222222222222222222222222222222222222222222 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 A+B 2222222222222222222222222222222222222222222222222 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 A+B+1 1 1 1 1 12222222222222222222222222222222222222222222222222 1 1 1 1 A+1 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 12222222222222222222222222222222222222222222222222 1 1 0 1 0 1 B + 1 1 1 1 1 1 1 1 12222222222222222222222222222222222222222222222222 1 1 1 1 1 1 1 12222222222222222222222222222222222222222222222222 1 1 1 1 1 1 B−A 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 B−1 1 1 1 12222222222222222222222222222222222222222222222222 1 1 1 1 −A 1 0 1 12222222222222222222222222222222222222222222222222 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 0 0 A AND B 1 2222222222222222222222222222222222222222222222222 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 0 1 A OR B 1 1 1 1 12222222222222222222222222222222222222222222222222 1 1 0 1 0 1 1 0 1 0 1 1 1 0 0 1 1 1 1 1 1 1 12222222222222222222222222222222222222222222222222 0 1 0 0 0 1 1 1 1 1 1 1 1 1 12222222222222222222222222222222222222222222222222 11 1 1 1 1 1 1 12222222222222222222222222222222222222222222222222 1 1 0 1 −1 1 1 0 1 0 1 1 1 0 Figure 4-2. Useful combinations of ALU signals and the function performed. Registers loaded instantaneously from C bus and memory on rising edge of clock Shifter output stable Cycle 1 starts here Clock cycle 1 ∆w ∆x Set up signals to drive data path Drive H and B bus ∆y Clock cycle 2 New MPC used to load MIR with next microinstruction here ∆z ALU and shifter MPC available here Propagation from shifter to registers Figure 4-3. Timing diagram of one data path cycle. 32-Bit MAR (counts in words) Discarded 0 0 32-Bit address bus (counts in bytes) Figure 4-4. Mapping of the bits in MAR to the address bus. Bits 9 3 NEXT_ADDRESS Addr J M P C J A M N 8 J A M Z JAM S L L 8 9 3 4 S F0 F1 E E I I H O T C L S P M M W R F R P O P V P C D A R E E N N N N I T R R T A C A C S P A B V C 1 A E D H ALU C Mem B bus B B bus registers 0 = MDR 1 = PC 2 = MBR 3 = MBRU 4 = SP Figure 4-5. The microinstruction format for the Mic-1. 5 = LV 6 = CPP 7 = TOS 8 = OPC 9 -15 none Memory control signals (rd, wr, fetch) 3 4 4-to-16 Decoder MAR MDR MPC 9 PC O 8 MBR SP 512 × 36-Bit control store for holding the microprogram 8 LV JMPC CPP Addr J ALU C MIR M B TOS JAMN/JAMZ OPC H B bus 2 1-bit flip–flop N 6 ALU control High bit ALU Control signals Enable onto B bus Z Shifter C bus 2 Write C bus to register Figure 4-6. The complete block diagram of our example microarchitecture, the Mic-1. Address Addr JAM 0x75 0x92 001 Data path control bits JAMZ bit set … 0x92 … 0x192 One of these will follow 0x75 depending on Z Figure 4-7. A microinstruction with JAMZ set to 1 has two potential successors. SP LV SP LV SP LV a3 a2 a1 (a) 108 104 100 b4 b3 b2 b1 a3 a2 a1 c2 c1 b4 b3 b2 b1 a3 a2 a1 (b) (c) SP LV d5 d4 d3 d2 d1 a3 a2 a1 (d) Figure 4-8. Use of a stack for storing local variables. (a) While A is active. (b) After A calls B. (c) After B calls C. (d) After C and B return and A calls D. , , , SP SP LV a2 a3 a2 a1 (a) LV a3 a2 a3 a2 a1 (b) SP LV a2 + a3 a3 a2 a1 (c) SP LV a3 a2 a2 + a3 (d) Figure 4-9. Use of an operand stack for doing an arithmetic computation. Current Operand Stack 3 SP Current Local Variable Frame 3 LV Local Variable Frame 2 Constant Pool Local Variable Frame 1 Method Area CPP Figure 4-10. The various parts of the IJVM memory. PC 222222222222222222222222222222222222222222222222222222222222222222222222222222222 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 Hex 1 Mnemonic Meaning 1 1 1 1 0x10 1 BIPUSH byte 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 Push byte onto stack 1 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 1 0x59 DUP Copy top word on stack and push onto stack 1 1 1 1 222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 0xA7 1 GOTO offset 1 Unconditional branch 1 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 Pop two words from stack; push their sum 1 0x60 1 IADD 1 1 1 1 0x7E IAND Pop two words from stack; push Boolean AND 222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 1 1 0x99 1 IFEQ offset 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 Pop word from stack and branch if it is zero 1 1 0x9B 1 IFLT offset 1 Pop word from stack and branch if it is less than zero 1 222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 1 1 offset Pop two words from stack; branch if equal 0x9F IF 3 ICMPEQ 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 1 1 0x84 1 IINC varnum const 1 Add a constant to a local variable 1 222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 1 1 0x15 1 ILOAD varnum 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 Push local variable onto stack 1 1 0xB6 1 INVOKEVIRTUAL disp 1 Invoke a method 1 222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 1 1 0x80 1 IOR 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 Pop two words from stack; push Boolean OR 1 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 1 0xAC IRETURN Return from method with integer value 1 1 1 1 0x36 1 ISTORE varnum 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 Pop word from stack and store in local variable 1 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 1 0x64 ISUB Pop two words from stack; push their difference 1 1 1 1 index Push constant from constant pool onto stack 0x13 LDC 3 W 222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 1 1 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 Do nothing 1 0x00 1 NOP 1 1 1 1 0x57 1 POP 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 Delete word on top of stack 1 0x5F 1 SWAP 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 Swap the two top words on the stack 1 1 0xC4 1 WIDE 1 Prefix instruction; next instruction has a 16-bit index 1 1 1 1 1 222222222222222222222222222222222222222222222222222222222222222222222222222222222 Figure 4-11. The IJVM instruction set. The operands byte, const, and varnum are 1 byte. The operands disp, index, and offset are 2 bytes. Stack after INVOKEVIRTUAL Caller's LV Caller's PC Space for caller's local variables Stack before INVOKEVIRTUAL Pushed parameters Caller's local variable frame Parameter 3 Parameter 2 Parameter 1 OBJREF Previous LV Previous PC SP Caller's local variables Parameter 2 Parameter 1 Link ptr (a) SP Stack base after INVOKEVIRTUAL Stack base before INVOKEVIRTUAL LV Parameter 3 Parameter 2 Parameter 1 Link ptr Previous LV Previous PC Caller's local variables Parameter 2 Parameter 1 Link ptr (b) Figure 4-12. (a) Memory before executing INVOKEVIRTUAL. (b) After executing it. LV Stack before IRETURN Return value Previous LV Previous PC SP Caller's local variables Parameter 3 Parameter 2 Parameter 1 Link ptr Previous LV Previous PC Caller's local variable frame Caller's local variables Parameter 2 Parameter 1 Link ptr (a) Stack base before IRETURN LV Stack after IRETURN Return value Previous LV Previous PC Stack base after IRETURN SP Caller's local variables Parameter 2 Parameter 1 Link ptr LV (b) Figure 4-13. (a) Memory before executing IRETURN. (b) After executing it. i = j + k; if (i == 3) k = 0; else j = j − 1; (a) 1 ILOAD j // i = j + k 2 ILOAD k 3 IADD 4 ISTORE i 5 ILOAD i // if (i < 3) 6 BIPUSH 3 7 IF3ICMPEQ L1 8 ILOAD j // j = j − 1 9 BIPUSH 1 10 ISUB 11 ISTORE j 12 GOTO L2 13 L1: BIPUSH 0 14 ISTORE k 15 L2: (b) 0x15 0x02 0x15 0x03 0x60 0x36 0x01 0x15 0x01 0x10 0x03 0x9F 0x00 0x0D 0x15 0x02 0x10 0x01 0x64 0x36 0x02 0xA7 0x00 0x07 // k = 0 0x10 0x00 0x36 0x03 (c) Figure 4-14. (a) A Java fragment. (b) The corresponding Java assembly language. (c) The IJVM program in hexadecimal. 0 j 1 k j 2 j+k 3 j 8 1 j 9 j–1 10 11 4 j 5 3 j 6 7 12 0 13 14 15 Figure 4-15. The stack after each instruction of Fig. 4-14(b). 222222222222222222222222222 1222222222222222222222222222 1 DEST = H 1 1 DEST = SOURCE 2 22222222222222222222222222 1 1 33 1222222222222222222222222222 1 DEST = H 3 33333333 1 1 DEST = SOURCE 21 22222222222222222222222222 1 1 DEST = H + SOURCE 21 22222222222222222222222222 1 DEST = H + SOURCE + 1 1 21 222222222222222222222222221 DEST = H + 1 1 21 22222222222222222222222222 1 DEST = SOURCE + 1 1 21 22222222222222222222222222 1 DEST = SOURCE − H 21 222222222222222222222222221 1 DEST = SOURCE − 1 21 22222222222222222222222222 1 1 DEST = −H 21 22222222222222222222222222 1 DEST = H AND SOURCE 1 21 22222222222222222222222222 1 DEST = H OR SOURCE 1 21 222222222222222222222222221 DEST = 0 1 21 22222222222222222222222222 1 DEST = 1 1 21 22222222222222222222222222 1 DEST = −1 12222222222222222222222222221 Figure 4-16. All permitted operations. Any of the above operations may be extended by adding ‘‘<< 8’’ to them to shift the result left by 1 byte. For example, a common operation is H = MBR < < 8 Operations Comments 2Label 2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 PC = PC + 1; fetch; goto (MBR) MBR holds opcode; get next byte; dispatch 2Main1 2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 nop1 goto Main1 Do nothing 22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 iadd1 MAR = SP = SP − 1; rd Read in next-to-top word on stack iadd2 H = TOS H = top of stack MDR = TOS = MDR + H; wr; goto Main1 Add top two words; write to top of stack 2iadd3 2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 isub1 MAR = SP = SP − 1; rd Read in next-to-top word on stack isub2 H = TOS H = top of stack isub3 MDR = TOS = MDR − H; wr; goto Main1 Do subtraction; write to top of stack 22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 iand1 MAR = SP = SP − 1; rd Read in next-to-top word on stack iand2 H = TOS H = top of stack iand3 MDR = TOS = MDR AND H; wr; goto Main1 Do AND; write to new top of stack 22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 ior1 MAR = SP = SP − 1; rd Read in next-to-top word on stack ior2 H = TOS H = top of stack ior3 MDR = TOS = MDR OR H; wr; goto Main1 Do OR; write to new top of stack 22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 dup1 MAR = SP = SP + 1 Increment SP and copy to MAR dup2 MDR = TOS; wr; goto Main1 Write new stack word 22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 pop1 MAR = SP = SP − 1; rd Read in next-to-top word on stack pop2 Wait for new TOS to be read from memory TOS = MDR; goto Main1 Copy new word to TOS 2pop3 2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 swap1 MAR = SP − 1; rd Set MAR to SP − 1; read 2nd word from stack swap2 MAR = SP Set MAR to top word swap3 H = MDR; wr Save TOS in H; write 2nd word to top of stack swap4 MDR = TOS Copy old TOS to MDR swap5 MAR = SP − 1; wr Set MAR to SP − 1; write as 2nd word on stack TOS = H; goto Main1 Update TOS 2swap6 2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 bipush1 SP = MAR = SP + 1 MBR = the byte to push onto stack bipush2 PC = PC + 1; fetch Increment PC, fetch next opcode MDR = TOS = MBR; wr; goto Main1 Sign-extend constant and push on stack 2bipush3 2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 iload1 H = LV MBR contains index; copy LV to H iload2 MAR = MBRU + H; rd MAR = address of local variable to push iload3 MAR = SP = SP + 1 SP points to new top of stack; prepare write iload4 PC = PC + 1; fetch; wr Inc PC; get next opcode; write top of stack TOS = MDR; goto Main1 Update TOS 2iload5 2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 istore1 H = LV MBR contains index; Copy LV to H istore2 MAR = MBRU + H MAR = address of local variable to store into istore3 MDR = TOS; wr Copy TOS to MDR; write word istore4 SP = MAR = SP − 1; rd Read in next-to-top word on stack istore5 PC = PC + 1; fetch Increment PC; fetch next opcode istore6 TOS = MDR; goto Main1 Update TOS 22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 wide1 PC = PC + 1; fetch; goto (MBR OR 0x100) Multiway branch with high bit set 22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 wide3iload1 PC = PC + 1; fetch MBR contains 1st index byte; fetch 2nd wide3iload2 H = MBRU << 8 H = 1st index byte shifted left 8 bits wide3iload3 H = MBRU OR H H = 16-bit index of local variable 3iload4 MAR = LV + H; rd; goto iload3 MAR = address of local variable to push 2wide 2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 wide3istore1 PC = PC + 1; fetch MBR contains 1st index byte; fetch 2nd wide3istore2 H = MBRU << 8 H = 1st index byte shifted left 8 bits wide3istore3 H = MBRU OR H H = 16-bit index of local variable wide 3 istore4 MAR = LV + H; goto istore3 MAR = address of local variable to store into 22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 ldc3w1 PC = PC + 1; fetch MBR contains 1st index byte; fetch 2nd ldc3w2 H = MBRU << 8 H = 1st index byte << 8 ldc3w3 H = MBRU OR H H = 16-bit index into constant pool ldc3w4 MAR = H + CPP; rd; goto iload3 MAR = address of constant in pool Figure 4-17. The microprogram for the Mic-1 (part 1 of 3). Operations Comments 2Label 22222222222222222222222222222222222222222222222222222222222222222222222222222222222222 iinc1 H = LV MBR contains index; Copy LV to H iinc2 MAR = MBRU + H; rd Copy LV + index to MAR; Read variable iinc3 PC = PC + 1; fetch Fetch constant iinc4 H = MDR Copy variable to H iinc5 PC = PC + 1; fetch Fetch next opcode MDR = MBR + H; wr; goto Main1 Put sum in MDR; update variable 2iinc6 22222222222222222222222222222222222222222222222222222222222222222222222222222222222222 goto1 OPC = PC − 1 Save address of opcode. goto2 PC = PC + 1; fetch MBR = 1st byte of offset; fetch 2nd byte goto3 H = MBR << 8 Shift and save signed first byte in H goto4 H = MBRU OR H H = 16-bit branch offset goto5 PC = OPC + H; fetch Add offset to OPC goto Main1 Wait for fetch of next opcode 2goto6 22222222222222222222222222222222222222222222222222222222222222222222222222222222222222 iflt1 MAR = SP = SP − 1; rd Read in next-to-top word on stack iflt2 OPC = TOS Save TOS in OPC temporarily iflt3 TOS = MDR Put new top of stack in TOS iflt4 N = OPC; if (N) goto T; else goto F Branch on N bit 222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 ifeq1 MAR = SP = SP − 1; rd Read in next-to-top word of stack ifeq2 OPC = TOS Save TOS in OPC temporarily ifeq3 TOS = MDR Put new top of stack in TOS Z = OPC; if (Z) goto T; else goto F Branch on Z bit 2ifeq4 22222222222222222222222222222222222222222222222222222222222222222222222222222222222222 if3icmpeq1 MAR = SP = SP − 1; rd Read in next-to-top word of stack if3icmpeq2 MAR = SP = SP − 1 Set MAR to read in new top-of-stack if3icmpeq3 H = MDR; rd Copy second stack word to H if3icmpeq4 OPC = TOS Save TOS in OPC temporarily if3icmpeq5 TOS = MDR Put new top of stack in TOS 3icmpeq6 Z = OPC − H; if (Z) goto T; else goto F If top 2 words are equal, goto T, else goto F 2if22222222222222222222222222222222222222222222222222222222222222222222222222222222222222 T OPC = PC − 1; fetch; goto goto2 Same as goto1; needed for target address 222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 F PC = PC + 1 Skip first offset byte F2 PC = PC + 1; fetch PC now points to next opcode F3 goto Main1 Wait for fetch of opcode 222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 invokevirtual1 PC = PC + 1; fetch MBR = index byte 1; inc. PC, get 2nd byte invokevirtual2 H = MBRU << 8 Shift and save first byte in H invokevirtual3 H = MBRU OR H H = offset of method pointer from CPP invokevirtual4 MAR = CPP + H; rd Get pointer to method from CPP area invokevirtual5 OPC = PC + 1 Save Return PC in OPC temporarily invokevirtual6 PC = MDR; fetch PC points to new method; get param count invokevirtual7 PC = PC + 1; fetch Fetch 2nd byte of parameter count invokevirtual8 H = MBRU << 8 Shift and save first byte in H invokevirtual9 H = MBRU OR H H = number of parameters invokevirtual10 PC = PC + 1; fetch Fetch first byte of # locals invokevirtual11 TOS = SP − H TOS = address of OBJREF − 1 invokevirtual12 TOS = MAR = TOS + 1 TOS = address of OBJREF (new LV) invokevirtual13 PC = PC + 1; fetch Fetch second byte of # locals invokevirtual14 H = MBRU << 8 Shift and save first byte in H invokevirtual15 H = MBRU OR H H = # locals invokevirtual16 MDR = SP + H + 1; wr Overwrite OBJREF with link pointer invokevirtual17 MAR = SP = MDR; Set SP, MAR to location to hold old PC invokevirtual18 MDR = OPC; wr Save old PC above the local variables invokevirtual19 MAR = SP = SP + 1 SP points to location to hold old LV invokevirtual20 MDR = LV; wr Save old LV above saved PC invokevirtual21 PC = PC + 1; fetch Fetch first opcode of new method. invokevirtual22 LV = TOS; goto Main1 Set LV to point to LV Frame Figure 4-17. The microprogram for the Mic-1 (part 2 of 3). Label Operations Comments 2222222222222222222222222222222222222222222222222222222222222222222222 ireturn1 MAR = SP = LV; rd Reset SP, MAR to get link pointer ireturn2 Wait for read ireturn3 LV = MAR = MDR; rd Set LV to link ptr; get old PC ireturn4 MAR = LV + 1 Set MAR to read old LV ireturn5 PC = MDR; rd; fetch Restore PC; fetch next opcode ireturn6 MAR = SP Set MAR to write TOS ireturn7 LV = MDR Restore LV ireturn8 MDR = TOS; wr; goto Main1 Save return value on original top of stack Figure 4-17. The microprogram for the Mic-1 (part 3 of 3). BIPUSH (0×10) BYTE Figure 4-18. The BIPUSH instruction format. ILOAD (0x15) (a) INDEX WIDE (0xC4) ILOAD (0x15) INDEX BYTE 1 INDEX BYTE 2 (b) Figure 4-19. (a) ILOAD with a 1-byte index. (b) WIDE ILOAD with a 2-byte index. Address 0×1FF Control store Microinstruction execution order 0×115 wide_iload1 0×100 Main1 0×C4 wide1 0×15 iload1 WIDE ILOAD ILOAD 3 1 1 2 2 0×00 Figure 4-20. The initial microinstruction sequence for ILOAD and WIDE ILOAD. The addresses are examples. IINC (0x84) INDEX CONST Figure 4-21. The IINC instruction has two different operand fields. Memory 1 Byte n+3 n + 2 OFFSET BYTE 2 OFFSET BYTE 2 OFFSET BYTE 2 OFFSET BYTE 2 OFFSET BYTE 2 n + 1 OFFSET BYTE 1 OFFSET BYTE 1 OFFSET BYTE 1 OFFSET BYTE 1 OFFSET BYTE 1 GOTO (0xA7) GOTO (0xA7) GOTO (0xA7) GOTO (0xA7) GOTO (0xA7) n n+1 n+1 n+2 n+2 n n n OFFSET BYTE 1 OFFSET BYTE 1 OFFSET BYTE 2 n Registers PC OPC MBR 0xA7 OFFSET BYTE 1 OFFSET 1 << 8 H (a) (b) (c) (d) (e) Figure 4-22. The situation at the start of various microinstructions. (a) Main1 . (b) goto1. (c) goto2. (d) goto3. (e) goto4. 22222222222222222222222222222222222222222222222222222222222222222222222222222222 1 Label Operations Comments 21 2222222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 MAR = SP = SP − 1; rd Read in next-to-top word on stack 1 pop1 1 Wait for new TOS to be read from memory 1 1 pop2 1 pop3 1 TOS = MDR; goto Main1 Copy new word to TOS 1 Main1 PC = PC + 1; fetch; goto (MBR) MBR holds opcode; get next byte; dispatch11 122222222222222222222222222222222222222222222222222222222222222222222222222222222 Figure 4-23. New microprogram sequence for executing POP. 222222222222222222222222222222222222222222222222222222222222222222222222222222 1 Label 1 Operations Comments 222222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 MAR = SP = SP − 1; rd Read in next-to-top word on stack 1 pop1 1 MBR holds opcode; fetch next byte 1 Main1.pop PC = PC + 1; fetch 1 1222222222222222222222222222222222222222222222222222222222222222222222222222222 TOS = MDR; goto (MBR) Copy new word to TOS; dispatch on opcode 11 1 pop3 Figure 4-24. Enhanced microprogram sequence for executing POP. 22222222222222222222222222222222222222222222222222222222222222222222222222222222 122222222222222222222222222222222222222222222222222222222222222222222222222222222 1 Label Operations Comments 1 1 MBR contains index; Copy LV to H 1 iload1 H = LV 1 MAR = address of local variable to push 1 iload2 MAR = MBRU + H; rd 1 1 iload3 MAR = SP = SP + 1 SP points to new top of stack; prepare write 1 1 iload4 PC = PC + 1; fetch; wr Inc PC; get next opcode; write top of stack 1 1 1 Update TOS 1 iload5 TOS = MDR; goto Main1 1 Main1 PC = PC + 1; fetch; goto (MBR) MBR holds opcode; get next byte; dispatch 1222222222222222222222222222222222222222222222222222222222222222222222222222222221 Figure 4-25. Mic-1 code for executing ILOAD. 22222222222222222222222222222222222222222222222222222222222222222222222222222222 122222222222222222222222222222222222222222222222222222222222222222222222222222222 1 Label Operations Comments 1 1 MAR = address of local variable to push 1 iload1 MAR = MBRU + LV; rd 1 SP points to new top of stack; prepare write 1 1 iload2 MAR = SP = SP + 1 1 iload3 PC = PC + 1; fetch; wr 1 Inc PC; get next opcode; write top of stack 1 iload4 TOS = MDR 1 Update TOS 1 1 iload5 PC = PC + 1; fetch; goto (MBR) MBR already holds opcode; fetch index byte 1 122222222222222222222222222222222222222222222222222222222222222222222222222222222 Figure 4-26. Three-bus code for executing ILOAD. MBR2 Shift register From memory IMAR MBR1 +1 2 low-order bits C bus PC +1, 2 Write PC Figure 4-27. A fetch unit for the Mic-1. B bus Word fetched Word fetched Word fetched 0 MBR1 1 MBR1 2 MBR1 3 MBR1 4 MBR1 5 MBR1 6 MBR2 MBR2 MBR2 MBR2 MBR2 Transitions MBR1: Occurs when MBR1 is read MBR2: Occurs when MBR2 is read Word fetched: Occurs when a memory word is read and 4 bytes are put into the shift register Figure 4-28. A finite state machine for implementing the IFU. Memory control registers MAR To and from main memory MDR PC Instruction fetch unit (IFU) MBR MBR2 SP LV Control signals CPP Enable onto B bus TOS Write C bus to register C bus OPC B bus H A bus ALU control 6 ALU Shifter Figure 4-29. The datapath for Mic-2. N Z Operations Comments 2Label 2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 goto (MBR) Branch to next instruction 2nop1 2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 iadd1 MAR = SP = SP − 1; rd Read in next-to-top word on stack iadd2 H = TOS H = top of stack MDR = TOS = MDR+H; wr; goto (MBR1) Add top two words; write to new top of stack 2iadd3 2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 isub1 MAR = SP = SP − 1; rd Read in next-to-top word on stack isub2 H = TOS H = top of stack isub3 MDR = TOS = MDR−H; wr; goto (MBR1) Subtract TOS from Fetched TOS-1 22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 iand1 MAR = SP = SP − 1; rd Read in next-to-top word on stack iand2 H = TOS H = top of stack iand3 MDR = TOS = MDR AND H; wr; goto (MBR1) AND Fetched TOS-1 with TOS 22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 ior1 MAR = SP = SP − 1; rd Read in next-to-top word on stack ior2 H = TOS H = top of stack MDR = TOS = MDR OR H; wr; goto (MBR1) OR Fetched TOS-1 with TOS 2ior3 2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 dup1 MAR = SP = SP + 1 Increment SP; copy to MAR MDR = TOS; wr; goto (MBR1) Write new stack word 2dup2 2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 pop1 MAR = SP = SP − 1; rd Read in next-to-top word on stack pop2 Wait for read TOS = MDR; goto (MBR1) Copy new word to TOS 2pop3 2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 swap1 MAR = SP − 1; rd Read 2nd word from stack; set MAR to SP swap2 MAR = SP Prepare to write new 2nd word swap3 H = MDR; wr Save new TOS; write 2nd word to stack swap4 MDR = TOS Copy old TOS to MDR swap5 MAR = SP − 1; wr Write old TOS to 2nd place on stack TOS = H; goto (MBR1) Update TOS 2swap6 2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 bipush1 SP = MAR = SP + 1 Set up MAR for writing to new top of stack MDR = TOS = MBR1; wr; goto (MBR1) Update stack in TOS and memory 2bipush2 2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 iload1 MAR = LV + MBR1U; rd Move LV + index to MAR; read operand iload2 MAR = SP = SP + 1 Increment SP; Move new SP to MAR iload3 TOS = MDR; wr; goto (MBR1) Update stack in TOS and memory 22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 istore1 MAR = LV + MBR1U Set MAR to LV + index istore2 MDR = TOS; wr Copy TOS for storing istore3 MAR = SP = SP − 1; rd Decrement SP; read new TOS istore4 Wait for read TOS = MDR; goto (MBR1) Update TOS 2istore5 2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 goto (MBR1 OR 0x100) Next address is 0x100 Ored with opcode 2wide1 2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 wide 3 iload1 MAR = LV + MBR2U; rd; goto iload2 Identical to iload1 but using 2-byte index 22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 3istore1 MAR = LV + MBR2U; goto istore2 Identical to istore1 but using 2-byte index 2wide 2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 3w1 MAR = CPP + MBR2U; rd; goto iload2 Same as wide 3iload1 but indexing off CPP 2ldc 2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 iinc1 MAR = LV + MBR1U; rd Set MAR to LV + index for read iinc2 H = MBR1 Set H to constant iinc3 MDR = MDR + H; wr; goto (MBR1) Increment by constant and update 22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 goto1 H = PC − 1 Copy PC to H goto2 PC = H + MBR2 Add offset and update PC goto3 Have to wait for IFU to fetch new opcode goto (MBR1) Dispatch to next instruction 2goto4 2222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 iflt1 MAR = SP = SP − 1; rd Read in next-to-top word on stack iflt2 OPC = TOS Save TOS in OPC temporarily iflt3 TOS = MDR Put new top of stack in TOS iflt4 N = OPC; if (N) goto T; else goto F Branch on N bit Figure 4-30. The microprogram for the Mic-2 (part 1 of 2). Operations Comments 2Label 22222222222222222222222222222222222222222222222222222222222222222222222222222222222222 ifeq1 MAR = SP = SP − 1; rd Read in next-to-top word of stack ifeq2 OPC = TOS Save TOS in OPC temporarily ifeq3 TOS = MDR Put new top of stack in TOS ifeq4 Z = OPC; if (Z) goto T; else goto F Branch on Z bit 222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 if3icmpeq1 MAR = SP = SP − 1; rd Read in next-to-top word of stack if3icmpeq2 MAR = SP = SP − 1 Set MAR to read in new top-of-stack if3icmpeq3 H = MDR; rd Copy second stack word to H if3icmpeq4 OPC = TOS Save TOS in OPC temporarily if3icmpeq5 TOS = MDR Put new top of stack in TOS if 3 icmpeq6 Z = H − OPC; if (Z) goto T; else goto F If top 2 words are equal, goto T, else goto F 222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 T H = PC − 1; goto goto2 Same as goto1 222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 F H = MBR2 Touch bytes in MBR2 to discard F2 goto (MBR1) 222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 invokevirtual1 MAR = CPP + MBR2U; rd Put address of method pointer in MAR invokevirtual2 OPC = PC Save Return PC in OPC invokevirtual3 PC = MDR Set PC to 1st byte of method code. invokevirtual4 TOS = SP − MBR2U TOS = address of OBJREF − 1 invokevirtual5 TOS = MAR = H = TOS + 1 TOS = address of OBJREF invokevirtual6 MDR = SP + MBR2U + 1; wr Overwrite OBJREF with link pointer invokevirtual7 MAR = SP = MDR Set SP, MAR to location to hold old PC invokevirtual8 MDR = OPC; wr Prepare to save old PC invokevirtual9 MAR = SP = SP + 1 Inc. SP to point to location to hold old LV invokevirtual10 MDR = LV; wr Save old LV invokevirtual11 LV = TOS; goto (MBR1) Set LV to point to zeroth parameter. 222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 ireturn1 MAR = SP = LV; rd Reset SP, MAR to read Link ptr ireturn2 Wait for link ptr ireturn3 LV = MAR = MDR; rd Set LV, MAR to link ptr; read old PC ireturn4 MAR = LV + 1 Set MAR to point to old LV; read old LV ireturn5 PC = MDR; rd Restore PC ireturn6 MAR = SP ireturn7 LV = MDR Restore LV ireturn8 MDR = TOS; wr; goto (MBR1) Save return value on original top of stack Figure 4-30. The microprogram for the Mic-2 (part 2 of 2). Memory control registers MAR To and from main memory MDR PC Instruction fetch unit (IFU) MBR1 MBR2 SP LV Control signals CPP Enable onto B bus TOS Write C bus to register OPC C bus B bus H A bus C latch A latch ALU control B latch 6 ALU Shifter Figure 4-31. The three-bus data path used in the Mic-3. N Z 22222222222222222222222222222222222222222222222222222222222222222222222222222222 122222222222222222222222222222222222222222222222222222222222222222222222222222222 1 Label Operations Comments 1 swap1 MAR = SP − 1; rd 1 Read 2nd word from stack; set MAR to SP 1 1 Prepare to write new 2nd word 1 swap2 MAR = SP 1 Save new TOS; write 2nd word to stack 1 swap3 H = MDR; wr 1 swap4 MDR = TOS Copy old TOS to MDR 1 1 1 swap5 MAR = SP − 1; wr 1 Write old TOS to 2nd place on stack 1122222222222222222222222222222222222222222222222222222222222222222222222222222222 11 swap6 TOS = H; goto (MBR1) Update TOS Figure 4-32. The Mic-2 code for SWAP. 222222222222222222222222222222222222222222222222222222222222222222222222222222222 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 Swap2 1 Swap3 1 Swap4 1 1 1 Swap1 Swap5 Swap6 1 1 1 1 1 1 1 1 Cy 1 MAR=SP−1;rd 1 MAR=SP 1 H=MDR;wr 1 MDR=TOS 1 MAR=SP−1;wr 1 TOS=H;goto (MBR1) 1 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 1 1 1 1 1 1 B=SP 1 1 1 1 1 1 1 1 2 1 C=B−1 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 B=SP 1 1 1 1 1 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 1 1 1 1 1 3 MAR=C; rd C=B 1 1 1 1 1 1 1 1 4 1 MDR=mem 1 MAR=C 1 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 1 1 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 1 1 1 1 1 5 B=MDR 1 1 1 1 1 1 1 1 6 1 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 C=B 1 B=TOS 1 1 1 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 1 1 1 1 1 7 H=C; wr C=B B=SP 1 1 1 1 1 1 1 1 8 1 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 Mem=MDR 1 MDR=C 1 C=B−1 1 B=H 1 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 1 1 1 1 1 9 MAR=C; wr C=B 1 1 1 1 1 1 1 1 10 1 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 1 1 Mem=MDR 1 TOS=C 1 11222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 1 1 1 1 11 11 1 1 1 1 1 1 goto (MBR1) Figure 4-33. The implementation of SWAP on the Mic-3. IFU IFU Reg A C B IFU Reg A C B IFU Reg A C B Reg A C B 1 IFU ALU ALU ALU ALU Shifter Shifter Shifter Shifter IFU Reg A C B IFU Reg A C B IFU Reg A C B Reg A C B 2 IFU ALU ALU ALU ALU Shifter Shifter Shifter Shifter IFU Reg A C B IFU Reg A C B IFU Reg A C B Reg A C B Instruction 3 IFU ALU ALU ALU ALU Shifter Shifter Shifter Shifter IFU Reg A C B IFU Reg A C B IFU Reg A C B Reg A C B 4 Cycle 1 ALU ALU ALU ALU Shifter Shifter Shifter Shifter Cycle 2 Cycle 3 Cycle 4 Ti me Figure 4-34. Graphical illustration of how a pipeline works. Micro-op ROM index IJVM length 1 Final Goto Queueing unit 2 From memory 3 Micro-operation ROM IADD ISUB ILOAD IFLT Instruction fetch unit Decoding unit Queue of pending micro-ops To/from memory Drives stage 4 ALU C M A B MIR1 Drives stage 5 ALU C M A B MIR2 Drives stage 6 ALU C M A B MIR3 Drives stage 7 ALU C M A B MIR4 4 Registers 7 6 C A B ALU 5 Shifter Figure 4-35. The main components of the Mic-4. 1 2 3 4 5 6 7 IFU Decoder Queue Operands Exec Write back Memory Figure 4-36. The Mic-4 pipeline. CPU package CPU chip L1-I Processor board L1-D Unified L2 cache Keyboard controller Split L1 instruction and data caches Unified L3 cache Graphics controller Main memory (DRAM) Disk controller Board-level cache (SRAM) Figure 4-37. A system with three levels of cache. Valid Addresses that use this entry Tag Entry Data 2047 65504-65535, 131040-131072, … 7 6 5 4 3 2 1 0 96-127, 65632-65663, 131068-131099 64-95, 65600-65631, 131036-131067, … 32-63, 65568-65599, 131004-131035, … 0-31, 65536-65567, 131072-131003, … (a) Bits 16 11 TAG LINE 3 2 WORD BYTE (b) Figure 4-38. (a) A direct-mapped cache. (b) A 32-bit virtual address. Valid Valid Tag Data Valid Tag Data Valid Tag Data Tag Data 2047 7 6 5 4 3 2 1 0 Entry A Entry B Entry C Figure 4-39. A four-way associative cache. Entry D if (i == 0) k = 1; else k = 2; (a) CMP i,0; compare i to 0 BNE Else; branch to Else if not equal Then: MOV k,1; move 1 to k BR Next; unconditional branch to Next Else: MOV k,2; move 2 to k Next: (b) Figure 4-40. (a) A program fragment. (b) Its translation to a generic assembly language. Branch/ no branch Valid Slot Branch address/tag 6 5 4 3 2 1 0 Valid Slot Prediction Branch bits address/tag 6 5 4 3 2 1 0 (a) Valid Slot Prediction bits Branch Target address/tag address 6 5 4 3 2 1 0 (b) (c) Figure 4-41. (a) A 1-bit branch history. (b) A 2-bit branch history. (c) A mapping between branch instruction address and target address. Branch No branch 00 Predict no branch Branch No branch Branch 01 10 Predict no branch one more time Predict branch one more time No branch 11 Branch Predict branch No branch Figure 4-42. A 2-bit finite-state machine for branch prediction. 222222222222222222222222222222222222222222222222222222222222222222222222222222222 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 Registers being read 1 1 Registers being written 1 1 11 11 1 Cy 1 # 1 Decoded 1 Iss 1 Ret 1 1 0 1 1 1 2 1 3 1 4 1 5 1 6 1 7 1 1 0 1 1 1 2 1 3 1 4 1 5 1 6 1 7 1 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 1 1 1 1 1 1 1 11 1 11 R3=R0 R1 11 1 11 1 1 1 11 1 11 11 11 11 11 11 1 1 11 1 1 1 1 1 1 1 1 * 1 1 1 1 1 1 2 1 R4=R0+R2 1 2 1 1 1 1 1 1 1 1 1 1 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 2 11 1 11 1 11 11 11 11 11 1 1 11 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 3 1 R5=R0+R1 1 3 1 11 3 1 2 1 1 1 1 1 1 1 11 1 1 1 1 1 1 1 1 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 11 3 1 2 1 1 1 1 1 1 1 11 1 4 1 R6=R1+R4 1 – 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 3 11 2 11 1 11 11 11 11 11 1 1 11 1 1 1 1 1 1 1 1 1 1 1 1 4 1 11 1 1 11 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 11 1 1 1 1 1 1 1 1 1 11 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 11 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 11 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 5 1 11 11 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 5 1 R7=R1 *R2 1 5 1 1 2 1 1 1 1 1 1 1 1 11 1 1 1 1 1 1 1 1 1 1 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 11 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 6 1 6 1 R1=R0−R2 1 – 1 11 2 1 1 1 1 1 11 1 1 1 1 1 1 1 1 1 1 1 1 1 7 1 11 11 1 1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 11 11 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 8 1 5 11 11 1 1 1 1 1 1 1 1 1 1 1 1 9 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 11 11 11 11 11 11 1 1 1 6 1 1 11 1 7 1 R3=R3 R1 1 7 1 1 1 1 1 1 1 1 1 1 1 11 11 1 1 1 1 1 1 1 1 1 11 1 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 * 1 1 1 1 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 1 11 1 11 1 11 1 11 11 11 11 1 1 11 1 11 11 1 11 11 11 11 1 10 1 1 1 1 1 1 1 1 6 11 11 1 1 1 1 1 1 1 1 1 11 11 1 1 1 1 1 1 1 1 11 11 1 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 1 1 1 1 1 1 1 1 1 11 1 1 1 1 1 1 1 1 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 12 1 7 11 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 11 11 1 1 1 1 1 1 1 1 1 1 1 1 13 8 R1=R4+R4 8 2 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 11 1 1 1 1 11 11 11 11 11 11 1 1 1 1 1 1 1 1 1 1 1 1 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 11 11 1 14 1 1 1 1 1 1 1 1 2 1 1 1 11 1 1 1 1 1 1 1 1 1 1 1 1 11 11 11 8 1 1 11 11 11 11 11 11 11 1 1 11 11 11 11 11 11 11 1 1 15 11 222222222222222222222222222222222222222222222222222222222222222222222222222222222 Figure 4-43. Operation of a superscalar CPU with in-order issue and in-order completion. 222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 1 Registers being read 1 1 Registers being written 1 222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 11 11 1 222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 Cy 11 # 11 Decoded 11 Iss 11 Ret 1 1 0 11 1 11 2 11 3 11 4 11 5 11 6 11 7 1 1 0 11 1 11 2 11 3 11 4 11 5 11 6 11 7 1 1 1 1 1 1 R3=R0 R1 1 1 1 11 1 1 1 1 1 1 1 1 1 11 1 1 1 1 1 1 1 1 11 * 1 1 1 1 1 1 2 1 R4=R0+R2 1 2 1 1 1 1 1 1 1 1 1 1 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 2 11 1 11 1 11 11 11 11 11 1 1 11 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 3 1 R5=R0+R1 1 3 1 11 3 1 2 1 1 1 1 1 1 1 11 1 1 1 1 1 1 1 1 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 11 3 1 2 1 1 1 1 1 1 1 11 1 4 1 R6=R1+R4 1 – 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 11 5 11 R7=R1 *R2 11 5 11 1 1 3 11 3 11 2 11 11 11 11 11 1 1 11 1 1 1 1 1 1 1 1 1 1 1 1 6 1 S1=R0−R2 1 6 1 11 4 1 3 1 3 1 1 1 1 1 11 1 1 1 1 1 1 1 1 1 1 11 1 2 11 3 1 3 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 4 11 1 1 3 11 4 11 2 11 11 1 11 11 11 1 1 11 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 11 3 1 4 1 2 1 1 1 1 1 1 11 1 7 1 R3=R3 *S1 1 – 1 1 1 1 1 1 1 1 1 1 11 3 1 4 1 2 1 1 3 1 1 1 11 1 8 1 S2=R4+R4 1 8 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 11 2 1 3 1 2 1 1 3 1 1 1 11 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 11 1 2 2 3 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 11 1 1 1 1 1 1 1 1 1 1 1 1 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 1 1 11 11 11 11 11 1 11 1 1 5 1 6 11 2 1 1 1 3 1 1 1 1 1 1 1 1 1 11 1 1 7 1 1 2 1 1 1 1 1 3 1 1 1 11 11 1 1 1 1 1 1 1 1 1 1 1 11 1 6 11 1 1 1 1 1 4 11 1 1 1 1 1 1 1 2 1 1 1 11 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 11 1 1 1 1 1 1 1 1 1 5 11 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 8 11 1 1 1 1 1 1 1 1 11 1 1 1 1 1 1 1 1 1 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 7 1 11 1 1 11 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 8 1 11 11 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 11 11 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1222222222222222222222222222222222222222222222222222222222222222222222222222222222 1 9 1 1 1 1 7 11 11 1 1 1 1 1 1 1 11 11 1 1 1 1 1 1 1 1 Figure 4-44. Operation of a superscalar CPU with out-oforder issue and out-of-order completion. evensum = 0; evensum = 0; oddsum = 0; oddsum = 0; i = 0; i = 0; while (i < limit) { i >= limit while (i < limit) k = i * i * i; k = i * i * i; if ((i/2) * 2) = = 0) if ((i/2) * 2) = = 0) evensum = evensum + k; T F evensum = evensum + k; else oddsum = oddsum + k; oddsum = oddsum + k; i = i + 1; i = i + 1; } (a) (b) Figure 4-45. (a) A program fragment. (b) The corresponding basic block graph. To level 2 cache Local bus to PCI bridge Bus interface unit Level 1 I-cache Fetch/Decode unit Level 1 D-cache Dispatch/Execute unit Retire unit Micro-operation pool (ROB) Figure 4-46. The Pentium II microarchitecture. Level 1 I-cache Pipeline stage IFU0 Cache line fetcher Next IP IFU1 Instruction length decoder Dynamic branch predictor IFU2 Instruction aligner ID0 0 1 2 ID1 Micro-operation queuer RAT Register allocator Micro-operation sequencer Static branch predictor ROB Micro-operations go in the ROB Figure 4-47. Internal structure of the Fetch/Decode unit (simplified). Port 0 MMXExecution unit Floating-Point execution unit Integer execution unit Port 1 MMXExecution unit Floating-Point execution unit Integer execution unit Reservation station From/to ROB Port 2 Load Unit Loads Port 3 Store Unit Stores Port 4 Store Unit Stores Figure 4-48. The Dispatch/Execute unit. To main memory Memory interface unit Level 2 cache External cache unit Prefetch/Dispatch unit Level 1 cache Grouping logic Integer execution unit Floating-point unit Load/store unit Integer registers FP registers Level 1 D-cache ALU ALU FP ALU FP ALU Load store Graphics unit Figure 4-49. The UltraSPARC II microarchitecture. Store queue Integer pipeline Execute Fetch Decode Cache N1 N2 Group N3 Register X1 X2 X3 Floating-point/graphics pipeline Figure 4-50. The UltraSPARC II’s pipeline. Write Memory and I/O bus interface unit 32 32 0-16 KB Instruction cache 0-16 KB Data cache 32 Prefetch, decode, and folding unit 32 Execution control unit Integer and floating-point unit 2 x 32 3 x 32 64 32-Bit registers for holding the top 64 words of the stack Figure 4-51. The block diagram of the picoJava II with both level 1 caches and the floating-point unit. This is configuration of the microJava 701. Fetch from I-cache Decode and fold Fetch operands from stack Execute instruction Access data cache Figure 4-52. The picoJava II has a six-stage pipeline. Write results to stack Without folding SP k m k k+m k k k k k k k n n n n k+m n k+m m m m m m m m Start After folded instruction SP SP 8 7 6 5 4 3 2 1 0 With folding Start After ILOAD k After ILOAD m (a) After IADD After ISTORE n (b) Figure 4-53. (a) Execution of a four-instruction sequence to compute n = k + m. (b) The same sequence folded to one instruction. 22222222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 1 Group Description Example 21 2222222222222222222222222222222222222222222222222222222222222222222222222222222 1 NF 1 Nonfoldable instructions 1 GOTO 1 21 2222222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 1 LV 21 2222222222222222222222222222222222222222222222222222222222222222222222222222222 1 Pushing a word onto the stack 1 ILOAD 1 122222222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 1 MEM Popping a word and storing it in memory ISTORE 1 BG1 1 Operations using one stack operand 1 IFEQ 1 21 2222222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 1 BG2 122222222222222222222222222222222222222222222222222222222222222222222222222222222 1 Operations using two stack operands 1 IF3CMPEQ 1 122222222222222222222222222222222222222222222222222222222222222222222222222222222 OP 1 Computations on two operands with one result 1 IADD 1 Figure 4-54. JVM instruction groups for folding purposes. 222222222222222222222222222222222222222222222222222222222222222222222222222 1222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 Instruction sequence Example 1 1 1 1 LV 1 1 MEM LV OP ILOAD, ILOAD, IADD, ISTORE 222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 1 1 1 1 1222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 LV OP ILOAD, ILOAD, IADD 1 LV 1 1 1 LV 1 1 1 LV 1 BG2 1 ILOAD, ILOAD, IF3CMPEQ 1222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 1 1 1 LV BG1 1222222222222222222222222222222222222222222222222222222222222222222222222222 1 ILOAD, IFEQ 1 1 1 1 1 LV 1 1 ILOAD, IF3CMPEQ 1 BG2 1 1 222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 1 1 MEM 1 1 LV ILOAD, ISTORE 1222222222222222222222222222222222222222222222222222222222222222222222222222 1 1 1 1 1 1222222222222222222222222222222222222222222222222222222222222222222222222222 11 IADD, ISTORE 11 1 OP 1 MEM 1 1 Figure 4-55. Some of the JVM instruction sequences that can be folded.