CS152 – Computer Architecture and Engineering
Lecture 8 – Multicycle Design and Microcode
2004-09-23
John Lazzaro (www.cs.berkeley.edu/~lazzaro)
Dave Patterson (www.cs.berkeley.edu/~patterson)
www-inst.eecs.berkeley.edu/~cs152/
CS 152 L09 Multicycle (1) Fall 2004 © UC Regents

Review
° Single cycle datapath => CPI=1, CCT => long
° 5 steps to design a processor
  • 1. Analyze instruction set => datapath requirements
  • 2. Select set of datapath components & establish clock methodology
  • 3. Assemble datapath meeting the requirements
  • 4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer.
  • 5. Assemble the control logic
° Control is the hard part
° MIPS makes control easier
  • Instructions same size
  • Source registers always in same place
  • Immediates same size, location
  • Operations always on registers/immediates
[Figure: Processor (Control, Datapath), Memory, Input, Output]

What’s wrong with our CPI=1 processor?
[Figure: paths through the single-cycle datapath for each instruction class.
  Arithmetic & Logical: PC, Inst Memory, Reg File, mux, ALU, mux, setup
  Load (Critical Path): PC, Inst Memory, Reg File, mux, ALU, Data Mem, mux, setup
  Store: PC, Inst Memory, Reg File, mux, ALU, Data Mem
  Branch: PC, Inst Memory, Reg File, cmp, mux]
• Long Cycle Time
  – All instructions take as much time as the slowest
  – Real memory slower than idealized memory
• Duplicate Resources

Memory Access Time
• Physics => fast memories are small (large memories are slow)
[Figure: storage array with address, address decoder, selected word line, storage cell, bit line, sense amps]
• => Use a hierarchy of memories
[Figure: Processor, Cache, L2 Cache, and memory connected by the proc. bus and mem. bus; access times labeled 1 time-period, 2-3 time-periods, 20 - 50 time-periods]

Reducing Cycle Time
• Cut combinational dependency graph and insert register / latch
• Do same work in two fast cycles, rather than one slow one
• May be able to short-circuit path and remove some components for some instructions!
[Figure: storage element, Acyclic Combinational Logic, storage element; split into storage element, Acyclic Combinational Logic (A), storage element, Acyclic Combinational Logic (B), storage element]

Limits on Cycle Time (new view of datapath)
• Next address logic
  – PC <= branch ? PC + offset : PC + 4
• Instruction Fetch
  – InstructionReg <= Mem[PC]
• Register Access
  – A <= R[rs]
• ALU operation
  – R <= A + B
[Figure: datapath stages Next PC, PC, Instruction Fetch, Operand Fetch, Exec, Mem Access, Result Store, with Reg. File and Data Mem; Control drives nPC_sel, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, RegDst, RegWr]

Partitioning the CPI=1 Datapath
• Add registers between smallest steps
• 1. Place so that balances length of clock cycle: logic delays about the same between registers
• 2. Place to save information needed later in instruction execution
[Figure: same datapath stages and control signals as the previous slide, plus Equal]

Example Multicycle Datapath
[Figure: datapath stages Next PC, PC, Instruction Fetch, Operand Fetch, Exec, Mem Access, Result Store, separated by registers IR, A, B, S, M, and E; blocks Reg File, Ext, ALU, Data Mem, Reg. File; control signals nPC_sel, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, MemToReg, RegDst, RegWr, Equal]
• Critical Path ?
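
The cycle-time argument above can be checked with a little arithmetic. The following is a minimal sketch, not the lecture's own numbers: the stage delays, the register overhead, and the cycle counts of 3 and 5 are all hypothetical values chosen for illustration. It assumes the single-cycle clock must cover every stage on the load path, while the multicycle clock only has to cover the slowest stage plus the cost of the added register.

```python
# Hypothetical stage delays in nanoseconds (assumptions, not from the lecture).
STAGE_DELAY_NS = {
    "instruction_fetch": 2.0,
    "operand_fetch": 1.0,
    "exec": 2.0,
    "mem_access": 2.0,
    "result_store": 1.0,
}
# Assumed setup + clock-to-output cost of each inserted register.
REGISTER_OVERHEAD_NS = 0.2


def single_cycle_time(stage_delays):
    """One long cycle must cover every stage on the critical (load) path."""
    return sum(stage_delays.values())


def multicycle_time(stage_delays, register_overhead):
    """Each cycle covers one stage, so the slowest stage sets the clock."""
    return max(stage_delays.values()) + register_overhead


def instruction_time(num_cycles, cycle_time):
    """Time for one instruction that needs num_cycles clock cycles."""
    return num_cycles * cycle_time


single = single_cycle_time(STAGE_DELAY_NS)
multi = multicycle_time(STAGE_DELAY_NS, REGISTER_OVERHEAD_NS)
# Assumed cycle counts: a branch finishing in 3 cycles, a load in 5.
branch_time = instruction_time(3, multi)
load_time = instruction_time(5, multi)
```

With these assumed numbers, a short instruction such as the branch finishes sooner than one single-cycle period, while the load takes longer, because register overhead is paid every cycle and the stages are not perfectly balanced. That is why the partitioning slide asks for logic delays that are about the same between registers.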
Recall: Step-by-step Processor Design
Step 1: ISA => Logical Register Transfers
Step 2: Components of the Datapath
Step 3: RTL + Components => Datapath
Step 4: Datapath + Logical RTs => Physical RTs
Step 5: Physical RTs => Control

Step 4: R-type (add, sub, . . .)
• Logical Register Transfer
    inst    Logical Register Transfers
    ADDU    R[rd] <= R[rs] + R[rt]; PC <= PC + 4
• Physical Register Transfers (one step per row, time running downward)
    inst    Physical Register Transfers
            IR <= MEM[pc]
    ADDU    A <= R[rs]; B <= R[rt]
            S <= A + B
            R[rd] <= S; PC <= PC + 4
[Figure: multicycle datapath with Next PC, PC, Inst. Mem, IR, Reg File, A, B, Exec, S, Mem Access, Data Mem, M, Reg. File, E]

Administrivia
• Working on Homework #2
• Single cycle simulation demo on Friday
• Add to your calendar: Midterm 1 on Tuesday Oct 12, 5:30 - 8:30pm, 306 Soda

Step 4: Logical immed
• Logical Register Transfer
    inst    Logical Register Transfers
    ORI     R[rt] <= R[rs] | ZExt(Im16); PC <= PC + 4
• Physical Register Transfers (one step per row, time running downward)
    inst    Physical Register Transfers
            IR <= MEM[pc]
    ORI     A <= R[rs]; B <= R[rt]
            S <= A | ZExt(Im16)
            R[rt] <= S; PC <= PC + 4
[Figure: multicycle datapath]

Step 4: Load
• Logical Register Transfer
    inst    Logical Register Transfers
    LW      R[rt] <= MEM[R[rs] + SExt(Im16)]; PC <= PC + 4
• Physical Register Transfers (one step per row, time running downward)
    inst    Physical Register Transfers
            IR <= MEM[pc]
    LW      A <= R[rs]; B <= R[rt]
            S <= A + SExt(Im16)
            M <= MEM[S]
            R[rt] <= M; PC <= PC + 4
[Figure: multicycle datapath]
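
The physical register transfers above can be traced in software. The following is a minimal sketch under stated assumptions, not the course's simulator: `run_instruction`, the `state` dictionary, and the pre-decoded `inst` dictionary (standing in for the fields of IR) are hypothetical names introduced for illustration, and instruction fetch is modeled only as a trace entry rather than a real memory read.

```python
MASK32 = 0xFFFFFFFF


def zero_extend16(imm):
    """ZExt(Im16): keep the low 16 bits."""
    return imm & 0xFFFF


def sign_extend16(imm):
    """SExt(Im16): interpret the low 16 bits as a signed value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def run_instruction(state, inst):
    """Apply one instruction as a sequence of physical register transfers.

    state: {'pc': int, 'regs': list of 32 ints, 'mem': dict address -> word}
    inst:  decoded fields standing in for IR, e.g. {'op': 'lw', 'rs': 5, ...}
    Returns the list of transfers performed, one entry per cycle.
    """
    regs, mem = state["regs"], state["mem"]
    trace = ["IR <= MEM[pc]"]                      # fetch (modeled as a label)
    a, b = regs[inst["rs"]], regs[inst["rt"]]      # temporaries A and B
    trace.append("A <= R[rs]; B <= R[rt]")
    if inst["op"] == "addu":
        s = (a + b) & MASK32
        trace.append("S <= A + B")
        regs[inst["rd"]] = s
        trace.append("R[rd] <= S; PC <= PC + 4")
    elif inst["op"] == "ori":
        s = a | zero_extend16(inst["imm"])
        trace.append("S <= A | ZExt(Im16)")
        regs[inst["rt"]] = s
        trace.append("R[rt] <= S; PC <= PC + 4")
    elif inst["op"] == "lw":
        s = (a + sign_extend16(inst["imm"])) & MASK32
        trace.append("S <= A + SExt(Im16)")
        m = mem.get(s, 0)                          # temporary M
        trace.append("M <= MEM[S]")
        regs[inst["rt"]] = m
        trace.append("R[rt] <= M; PC <= PC + 4")
    else:
        raise ValueError("unsupported op: " + inst["op"])
    state["pc"] = (state["pc"] + 4) & MASK32
    return trace


state = {"pc": 0, "regs": [0] * 32, "mem": {0xFC: 99}}
state["regs"][1], state["regs"][2], state["regs"][5] = 5, 7, 0x100
addu_trace = run_instruction(state, {"op": "addu", "rs": 1, "rt": 2, "rd": 3})
ori_trace = run_instruction(state, {"op": "ori", "rs": 1, "rt": 4, "imm": 0xF0})
lw_trace = run_instruction(state, {"op": "lw", "rs": 5, "rt": 6, "imm": 0xFFFC})
```

The length of each returned trace is the number of cycles the instruction occupies in this model: four for ADDU and ORI, five for LW.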
Step 4: Store
• Logical Register Transfer
    inst    Logical Register Transfers
    SW      MEM[R[rs] + SExt(Im16)] <= R[rt]; PC <= PC + 4
• Physical Register Transfers (one step per row, time running downward)
    inst    Physical Register Transfers
            IR <= MEM[pc]
    SW      A <= R[rs]; B <= R[rt]
            S <= A + SExt(Im16)
            MEM[S] <= B; PC <= PC + 4
[Figure: multicycle datapath]

Step 4: Branch
• Logical Register Transfer
    inst    Logical Register Transfers
    BEQ     if R[rs] == R[rt]
            then PC <= PC + 4 + {SExt(Im16), 2'b00}
            else PC <= PC + 4
• Physical Register Transfers (one step per row, time running downward)
    inst    Physical Register Transfers
            IR <= MEM[pc]
    BEQ     E <= (R[rs] = R[rt])
            if (!E) PC <= PC + 4; else PC <= PC + 4 + {SExt(Im16), 2'b00}
[Figure: multicycle datapath]

Alternative datapath (book): Multiple Cycle Datapath
• Minimizes Hardware: 1 memory, 1 adder
[Figure: book datapath with PC, Ideal Memory (RAdr, WrAdr, Din, Dout), Instruction Reg, Reg File (Ra, Rb, Rw, busA, busB, busW), Extend, << 2, ALU, ALU Control, ALU Out, Target, and muxes; control signals PCWr, PCWrCond, PCSrc, BrWr, IorD, MemWr, IRWr, RegDst, RegWr, ALUSelA, ALUSelB, ALUOp, MemtoReg, ExtOp; status signal Zero]

Our Control Model
• State specifies control points for Register Transfer
• Transfer occurs upon exiting state (same clock edge)
[Figure: inputs (conditions) feed Next State Logic; State X (Register Transfer Control Points) feeds Output Logic, which produces outputs (control points); Control State Depends on Input]

Step 4 Control Spec for multicycle proc (“Finite State Diagram”)
    “instruction fetch”:         IR <= MEM[PC]
    “decode / operand fetch”:    A <= R[rs]; B <= R[rt]
  then, by instruction (diagram rows: Execute, Memory, Write-back):
    R-type   Execute: S <= A fun B       Write-back: R[rd] <= S; PC <= PC + 4
    ORi      Execute: S <= A | ZX        Write-back: R[rt] <= S; PC <= PC + 4
    LW       Execute: S <= A + SX        Memory: M <= MEM[S]        Write-back: R[rt] <= M; PC <= PC + 4
    SW       Execute: S <= A + SX        Memory: MEM[S] <= B; PC <= PC + 4
    BEQ      Execute: PC <= Next(PC,Equal)

Traditional FSM Controller
[Figure: a State register and a Truth Table. Truth table columns: state, op, cond => next state, control points. Inputs: op, Equal, State; outputs: next State and control points to the datapath; signal widths labeled 11, 6, and 4]

Step 5 (datapath + state diagram => control)
• Translate RTs into control points
• Assign states
• Then go build the controller

Mapping Register Transfers to Control Points
    “instruction fetch”    IR <= MEM[PC]                  imem_rd, IRen
    “decode”               A <= R[rs]; B <= R[rt]         Aen, Ben, Een
    R-type Execute         S <= A fun B                   ALUfun, Sen
    R-type Write-back      R[rd] <= S; PC <= PC + 4       RegDst, RegWr, PCen
    ORi Execute            S <= A | ZX
    ORi Write-back         R[rt] <= S; PC <= PC + 4
    LW Execute             S <= A + SX
    LW Memory              M <= MEM[S]
    LW Write-back          R[rt] <= M; PC <= PC + 4
    SW Execute             S <= A + SX
    SW Memory              MEM[S] <= B; PC <= PC + 4
    BEQ Execute            PC <= Next(PC,Equal)

Assigning States
    0000    “instruction fetch”    IR <= MEM[PC]
    0001    “decode”               A <= R[rs]; B <= R[rt]
    0100    R-type Execute         S <= A fun B
    0101    R-type Write-back      R[rd] <= S; PC <= PC + 4
    0110    ORi Execute            S <= A or ZX
    0111    ORi Write-back         R[rt] <= S; PC <= PC + 4
    1000    LW Execute             S <= A + SX
    1001    LW Memory              M <= MEM[S]
    1010    LW Write-back          R[rt] <= M; PC <= PC + 4
    1011    SW Execute             S <= A + SX
    1100    SW Memory              MEM[S] <= B; PC <= PC + 4
    0011    BEQ Execute            PC <= Next(PC)

(Mostly) Detailed Control Specs (missing0)
            State    Op field    Eq    Next
            0000     ??????      ?     0001
            0001     BEQ         x     0011
            0001     R-type      x     0100
            0001     ORI         x     0110
            0001     LW          x     1000
            0001     SW          x     1011
    BEQ:    0011     xxxxxx      0     0000
            0011     xxxxxx      1     0000
    R:      0100     xxxxxx      x     0101
            0101     xxxxxx      x     0000
    ORi:    0110     xxxxxx      x     0111
            0111     xxxxxx      x     0000
    LW:     1000     xxxxxx      x     1001
            1001     xxxxxx      x     1010
            1010     xxxxxx      x     0000
    SW:     1011     xxxxxx      x     1100
            1100     xxxxxx      x     0000
[Table residue: the control-point columns are IR; PC (en, sel); Ops (A, B, E); Exec (Ex, Sr, ALU, S); Mem (R, W, M); Write-Back (M-R, Wr, Dst). The ALU column reads fun, or, add, add for the four Execute states. Slide note: “all same in Moore machine”. The remaining per-state bit values are not recoverable from the extracted text.]

Instruction Set and Control Options
• 7-instruction MIPS subset: easy to implement FSM by hand
• Full MIPS instruction set: > 100 instructions, including *, /, Floating Point +, -, *, /
  – Need to use Verilog, CAD tools
• Full IA-32 instruction set?
  – > 500 instructions, including copy and edit, save/restore state, setup for key-like memory protection, …

Controller Design
• The state diagrams that define the controller for an instruction set processor are highly structured
• Use this structure to construct a simple “microsequencer”
• Control reduces to programming this very simple device: microprogramming
[Figure: microsequencer: op-code, Map ROM, Micro-PC, ROM, sequencer control (taken, ZIL), datapath control]

Our Microsequencer
[Figure: microsequencer: op-code, Map ROM, Micro-PC, ROM, sequencer control (taken, ZIL), datapath control]

Adding the Dispatch ROM
• Sequencer-based control
  – Called “microPC” or “µPC” vs. state register
    Control Value    Effect
    00               Next µaddress = 0
    01               Next µaddress = dispatch ROM
    10               Next µaddress = µaddress + 1
  Dispatch ROM:
    Instruction    Opcode    µAddress
    R-type         000000    0100
    BEQ            000100    0011
    ori            001101    0110
    LW             100011    1000
    SW             101011    1011
[Figure: microPC, Adder (+1), Mux (inputs 2, 1, 0; constant 0), µAddress Select Logic, ROM indexed by Opcode]

Microprogramming
[Figure: µ-Code ROM addressed by the micro-PC; each microinstruction (µ) holds sequencer control and datapath control; µ-sequencer: fetch, dispatch, sequential; Dispatch ROM indexed by Opcode; Inputs; Decode blocks drive signals To DataPath]

Microprogramming
• Microprogramming is a convenient method for implementing structured control state diagrams:
  – Random logic replaced by microPC sequencer and ROM
  – Each line of ROM called a microinstruction: contains sequencer control + values for control points
  – To reduce confusion, normal instruction (e.g., MIPS addu) called “macroinstruction”
  – Limited state transitions: branch to zero, next sequential, branch to instruction address from dispatch ROM
• Control design reduces to Microprogramming
  – Part of the design process is to develop a “language” that describes control and is easy for humans to understand

“Macroinstruction” Interpretation
[Figure: Main Memory holds the user program plus data (ADD, SUB, AND, . . ., DATA), labeled “this can change!”. One of these (a macroinstruction) is mapped into one of these (the AND microsequence) in the CPU's control memory, which drives the execution unit. Example microsequence: Fetch, Calc Operand Addr, Fetch Operand(s), Calculate, Save Answer(s)]

Designing a “Microinstruction Set”
1) Start with list of control signals
2) Group signals together that make sense (vs. random): “fields”
3) Place fields in some logical order (e.g., ALU operation & ALU operands first and microinstruction sequencing last)
4) To minimize the width, encode operations that will never be used at the same time
  • “Horizontal” Code: one control bit in µInstruction for every control line in datapath
  • “Vertical” Code: groups of control-lines coded together in µInstruction (e.g. possible ALU dest)
5) Create a symbolic legend for the microinstruction format, showing name of field values and how they set the control signals
  – Use computers to design computers

Again: Alternative multicycle datapath (book)
• Minimizes Hardware: 1 memory, 1 adder
[Figure: book datapath with PC, Ideal Memory (RAdr, WrAdr, Din, Dout), Instruction Reg, Mem Data Reg, Reg File (Ra, Rb, Rw, busA, busB, busW), A, B, Extend, << 2, ALU, ALU Control, ALU Out, and muxes; control signals PCWr, PCWrCond, PCSrc, IorD, MemWr, IRWr, RegDst, RegWr, ALUSelA, ALUSelB, ALUOp, MemtoReg, ExtOp; status signal Zero]

1&2) Start with list of control signals, grouped into fields
Single Bit Control
    Signal name    Effect when deasserted          Effect when asserted
    ALUSelA        1st ALU operand = PC            1st ALU operand = Reg[rs]
    RegWrite       None                            Reg. is written
    MemtoReg       Reg. write data input = ALU     Reg. write data input = memory
    RegDst         Reg. dest. no. = rt             Reg. dest. no. = rd
    MemRead        None                            Memory at address is read, MDR <= Mem[addr]
    MemWrite       None                            Memory at address is written
    IorD           Memory address = PC             Memory address = S
    IRWrite        None                            IR <= Memory
    PCWrite        None                            PC <= PCSource
    PCWriteCond    None                            IF ALUzero then PC <= PCSource
    PCSource       PCSource = ALU                  PCSource = ALUout
    ExtOp          Zero Extended                   Sign Extended
Multiple Bit Control
    Signal name    Value    Effect
    ALUOp          00       ALU adds
                   01       ALU subtracts
                   10       ALU does function code (“R-format”)
                   11       ALU does logical OR
    ALUSelB        00       2nd ALU input = 4
                   01       2nd ALU input = Reg[rt]
                   10       2nd ALU input = extended, shift left 2
                   11       2nd ALU input = extended

3&4) Microinstruction Format: unencoded vs. encoded fields
    Field Name         Width (wide)    Width (narrow)    Control Signals Set
    ALU Control        4               2                 ALUOp
    SRC1               2               1                 ALUSelA
    SRC2               5               3                 ALUSelB, ExtOp
    ALU Destination    3               2                 RegWrite, MemtoReg, RegDst
    Memory             3               2                 MemRead, MemWrite, IorD
    Memory Register    1               1                 IRWrite
    PCWrite Control    3               2                 PCWrite, PCWriteCond, PCSource
    Sequencing         3               2                 AddrCtl
    Total width        24              15                bits
  The wide format is the “Horizontal” encoding; the narrow format is the “Vertical” encoding.

5) Legend of Fields and Symbolic Names
    Field Name         Values for Field    Function of Field with Specific Value
    ALU                Add                 ALU adds
                       Subt.               ALU subtracts
                       Func code           ALU does function code
                       Or                  ALU does logical OR
    SRC1               PC                  1st ALU input = PC
                       rs                  1st ALU input = Reg[rs]
    SRC2               4                   2nd ALU input = 4
                       Extend              2nd ALU input = sign ext. IR[15-0]
                       Extend0             2nd ALU input = zero ext. IR[15-0]
                       Extshft             2nd ALU input = sign ex., sl IR[15-0]
                       rt                  2nd ALU input = Reg[rt]
    destination        rd ALU              Reg[rd] = ALUout
                       rt ALU              Reg[rt] = ALUout
                       rt Mem              Reg[rt] = Mem
    Memory             Read PC             Read memory using PC
                       Read ALU            Read memory using ALUout for addr
                       Write ALU           Write memory using ALUout for addr
    Memory register    IR                  IR = Mem
    PC write           ALU                 PC = ALU
                       ALUoutCond          IF ALU Zero then PC = ALUout
    Sequencing         Seq                 Go to sequential µinstruction
                       Fetch               Go to the first microinstruction
                       Dispatch            Dispatch using ROM.

Quick check: what do these fieldnames mean?
  Destination:
    Code    Name       RegWrite    MemToReg    RegDest
    00      ---        0           X           X
    01      rd ALU     1           0           1
    10      rt ALU     1           0           0
    11      rt MEM     1           1           0
  SRC2:
    Code    Name       ALUSelB    ExtOp
    000     ---        X          X
    001     4          00         X
    010     rt         01         X
    011     ExtShft    10         1
    100     Extend     11         1
    111     Extend0    11         0

Specific Sequencer from before
Sequencer-based control unit from last lecture
  – Called “microPC” or “µPC” vs. state register
    Code    Name        Effect
    00      fetch       Next µaddress = 0
    01      dispatch    Next µaddress = dispatch ROM
    10      seq         Next µaddress = µaddress + 1
  Dispatch ROM:
    Instruction    Opcode    µAddress
    R-type         000000    0100
    BEQ            000100    0011
    ori            001101    0110
    LW             100011    1000
    SW             101011    1011
[Figure: microPC, Adder (+1), Mux (inputs 2, 1, 0; constant 0), µAddress Select Logic, ROM indexed by Opcode]

Microprogram it yourself!
    Label     ALU      SRC1    SRC2       Dest.     Memory       Mem. Reg.    PC Write       Sequencing
    Fetch:    Add      PC      4                    Read PC      IR           ALU            Seq
              Add      PC      Extshft                                                       Dispatch
    Rtype:    Func     rs      rt                                                            Seq
                                          rd ALU                                             Fetch
    Ori:      Or       rs      Extend0                                                       Seq
                                          rt ALU                                             Fetch
    Lw:       Add      rs      Extend                                                        Seq
                                                    Read ALU                                 Seq
                                          rt MEM                                             Fetch
    Sw:       Add      rs      Extend                                                        Seq
                                                    Write ALU                                Fetch
    Beq:      Subt.    rs      rt                                             ALUoutCond.    Fetch

Overview of Control
• Control may be designed using one of several initial representations. The choice of sequence control, and how logic is represented, can then be determined independently; the control can then be implemented with one of several methods using a structured logic technique.
                                “hardwired control”             “microprogrammed control”
    Initial Representation      Finite State Diagram            Microprogram
    Sequencing Control          Explicit Next State Function    Microprogram counter + Dispatch ROMs
    Logic Representation        Logic Equations                 Truth Tables
    Implementation Technique    PLA                             ROM

Microprogramming Pros and Cons
☺ Ease of design
☺ Flexibility
  ☺ Easy to adapt to changes in organization, timing, technology
  ☺ Can make changes late in design cycle, or even in the field
☺ Can implement very powerful instruction sets (just more control memory)
☺ Generality
  ☺ Can implement multiple instruction sets on same machine.
  ☺ Can tailor instruction set to application.
☺ Compatibility
  ☺ Many organizations, same instruction set
– Control unit in same chip as data path, so can’t replace ROM
– ROM not faster than RAM (so not much faster than 1st level instruction cache), PLA smaller and therefore faster than ROM
– Simpler instruction sets popular now
– CAD tools + fast computers allow simulation => correct control

Legacy Software and Microprogramming
• IBM bet company on 360 Instruction Set Architecture (ISA): single instruction set for many classes of machines
  – (8-bit to 64-bit)
• Stewart Tucker stuck with job of what to do about software compatibility
  – If microprogramming could easily do same instruction set on many different microarchitectures, then why couldn’t multiple microprograms do multiple instruction sets on the same microarchitecture?
  – Coined term “emulation”: instruction set interpreter in microcode for non-native instruction set
  – Very successful: in early years of IBM 360 it was hard to know whether old instruction set or new instruction set was more frequently used

Thought: Microprogramming one inspiration for RISC
• If simple instruction could execute at very high clock rate…
• If you could even write compilers to produce microinstructions…
• If most programs use simple instructions and addressing modes…
• If microcode is kept in RAM instead of ROM so as to fix bugs …
• If same memory used for control memory could be used instead as cache for “macroinstructions”…
• Then why not skip instruction interpretation by a microprogram and simply compile directly into lowest language of machine? (microprogramming is overkill when ISA matches datapath 1-1)

Summary (1 of 3)
• Disadvantages of the Single Cycle Processor
  – Long cycle time
  – Cycle time is too long for all instructions except the Load
• Multiple Cycle Processor:
  – Divide the instructions into smaller steps
  – Execute each step (instead of the entire instruction) in one cycle
• Partition datapath into equal size chunks to minimize cycle time
  – ~10 levels of logic between latches
• Follow same 5-step method for designing “real” processor

Summary (cont’d) (2 of 3)
• Control is specified by finite state diagram
• Specialized state diagrams easily captured by microsequencer
  – simple increment & “branch” fields
  – datapath control fields
• Control design reduces to Microprogramming
• Control is more complicated with:
  – complex instruction sets
  – restricted datapaths (see the book)
• Simple instruction set and powerful datapath => simple control
  – could try to reduce hardware (see the book)
  – rather go for speed => many instructions at once!
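
The microsequencer named in the summary above can be modeled in a few lines. This is a minimal sketch under stated assumptions: the control codes (00 fetch, 01 dispatch, 10 seq) and the dispatch ROM contents are taken from the sequencer slides, while the per-address `SEQUENCING` table is one reading of the lecture's state assignment and next-state table, and the function names `next_micro_pc` and `micro_trace` are hypothetical.

```python
FETCH, DISPATCH, SEQ = 0b00, 0b01, 0b10

# Dispatch ROM from the sequencer slide: opcode -> first execute microaddress.
DISPATCH_ROM = {
    0b000000: 0b0100,  # R-type
    0b000100: 0b0011,  # BEQ
    0b001101: 0b0110,  # ori
    0b100011: 0b1000,  # LW
    0b101011: 0b1011,  # SW
}

# Sequencing field of each microinstruction (assumed reading of the state table).
SEQUENCING = {
    0b0000: SEQ,       # instruction fetch -> decode
    0b0001: DISPATCH,  # decode -> dispatch on opcode
    0b0011: FETCH,     # BEQ
    0b0100: SEQ,       # R-type execute
    0b0101: FETCH,     # R-type write-back
    0b0110: SEQ,       # ori execute
    0b0111: FETCH,     # ori write-back
    0b1000: SEQ,       # LW execute
    0b1001: SEQ,       # LW memory
    0b1010: FETCH,     # LW write-back
    0b1011: SEQ,       # SW execute
    0b1100: FETCH,     # SW memory
}


def next_micro_pc(micro_pc, opcode):
    """Select the next microaddress from the current sequencing field."""
    control = SEQUENCING[micro_pc]
    if control == FETCH:
        return 0
    if control == DISPATCH:
        return DISPATCH_ROM[opcode]
    return micro_pc + 1


def micro_trace(opcode):
    """Microaddresses visited while executing one macroinstruction."""
    micro_pc = 0
    visited = [micro_pc]
    while True:
        micro_pc = next_micro_pc(micro_pc, opcode)
        if micro_pc == 0:  # back at instruction fetch: instruction is done
            return visited
        visited.append(micro_pc)


lw_states = micro_trace(0b100011)
beq_states = micro_trace(0b000100)
```

The length of each trace is the instruction's cycle count in this model, so the same tables that define control also give the per-instruction CPI contributions.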
Summary (3 of 3)
• Microprogramming is a fundamental concept
  – implement an instruction set by building a very simple processor and interpreting the instructions
  – essential for very complex instructions and when few register transfers are possible
  – Control design reduces to Microprogramming
• Design of a Microprogramming language
  – Start with list of control signals
  – Group signals together that make sense (vs. random): called “fields”
  – Place fields in some logical order (e.g., ALU operation & ALU operands first and microinstruction sequencing last)
  – To minimize the width, encode operations that will never be used at the same time
  – Create a symbolic legend for the microinstruction format, showing name of field values and how they set the control signals

Where to get more information?
• Multiple Cycle Controller: Appendix C of your text book.
• Microprogramming: Section 5.7 of your text book.
• D. Patterson, “Microprogramming,” Scientific American, March 1983.
• D. Patterson and D. Ditzel, “The Case for the Reduced Instruction Set Computer,” Computer Architecture News 8, 6 (October 15, 1980)
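
The field-encoding step in the summary above (encode operations that never occur together, then publish a symbolic legend) can be sketched as a decoder. This is a minimal illustration, not a complete microinstruction format: it covers only the Destination and SRC2 fields, using the codes from the "Quick check" slide, and the names `DESTINATION`, `SRC2`, and `decode_fields` are hypothetical. `None` stands for the slide's "X" (don't care).

```python
# Destination field: 2-bit code -> (symbolic name, control signal values).
DESTINATION = {
    0b00: ("---",    {"RegWrite": 0, "MemToReg": None, "RegDst": None}),
    0b01: ("rd ALU", {"RegWrite": 1, "MemToReg": 0, "RegDst": 1}),
    0b10: ("rt ALU", {"RegWrite": 1, "MemToReg": 0, "RegDst": 0}),
    0b11: ("rt MEM", {"RegWrite": 1, "MemToReg": 1, "RegDst": 0}),
}

# SRC2 field: 3-bit code -> (symbolic name, control signal values).
SRC2 = {
    0b000: ("---",     {"ALUSelB": None, "ExtOp": None}),
    0b001: ("4",       {"ALUSelB": 0b00, "ExtOp": None}),
    0b010: ("rt",      {"ALUSelB": 0b01, "ExtOp": None}),
    0b011: ("ExtShft", {"ALUSelB": 0b10, "ExtOp": 1}),
    0b100: ("Extend",  {"ALUSelB": 0b11, "ExtOp": 1}),
    0b111: ("Extend0", {"ALUSelB": 0b11, "ExtOp": 0}),
}


def decode_fields(dest_code, src2_code):
    """Expand two encoded ("vertical") fields into individual control signals."""
    signals = {}
    signals.update(DESTINATION[dest_code][1])
    signals.update(SRC2[src2_code][1])
    return signals


def field_names(dest_code, src2_code):
    """Symbolic legend names for the two field codes."""
    return DESTINATION[dest_code][0], SRC2[src2_code][0]


ori_writeback = decode_fields(0b10, 0b000)   # "rt ALU", no second ALU source
lw_execute = decode_fields(0b00, 0b100)      # no destination, "Extend"
```

A lookup table like this is what the "Decode" blocks between the microcode ROM and the datapath implement in hardware: five encoded bits here expand into five control signals, and the saving grows for fields whose values are mutually exclusive.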