CS 152 Computer Architecture and Engineering
Lecture 10
Multicycle Controller Design (Continued)
February 20, 2001
John Kubiatowicz (http.cs.berkeley.edu/~kubitron)
lecture slides: http://www-inst.eecs.berkeley.edu/~cs152/
2/20/01 ©UCB Spring 2001 CS152 / Kubiatowicz Lec10.1

Partitioning the CPI=1 Datapath
° Add registers between smallest steps
° Place enables on all registers
[Figure: CPI=1 datapath partitioned into Next PC / PC, Instruction Fetch, Operand Fetch, Exec, Mem Access, and Result Store stages; control signals nPC_sel, Equal, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, RegDst, RegWr; blocks Reg. File and Data Mem]

Recap: Example Multicycle Datapath
[Figure: multicycle datapath with stages Next PC / PC, Instruction Fetch, Operand Fetch, Exec, Mem Access, Result Store; registers IR, A, B, S, M; blocks Reg File, Ext, ALU, Data Mem; control signals nPC_sel, Equal, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, MemToReg, RegDst, RegWr]
° Critical Path?

Recap: FSM specification
State  Phase                Instruction  Register transfers
0000   “instruction fetch”  all          IR <= MEM[PC]
0001   “decode”             all          A <= R[rs]; B <= R[rt]
0100   Execute              R-type       S <= A fun B
0101   Write-back           R-type       R[rd] <= S; PC <= PC + 4
0110   Execute              ORi          S <= A or ZX
0111   Write-back           ORi          R[rt] <= S; PC <= PC + 4
1000   Execute              LW           S <= A + SX
1001   Memory               LW           M <= MEM[S]
1010   Write-back           LW           R[rt] <= M; PC <= PC + 4
1011   Execute              SW           S <= A + SX
1100   Memory               SW           MEM[S] <= B; PC <= PC + 4
0011   Execute              BEQ          PC <= Next(PC)

Sequencer-based control unit: Statemachine ++
[Figure: Control Logic takes Inputs from, and sends Outputs to, the Multicycle Datapath; a State Reg feeds an adder (+1); Address Select Logic, driven by the Opcode, chooses the next state]
Types of “branching”
• Set state to 0
• Dispatch (state 1)
• Use incremented state number

Recap: Micro-controller Design
° The state diagrams that define the controller for an instruction set processor are highly structured
° Use this structure to construct a simple “microsequencer”
• Each state in previous diagram becomes a “microinstruction”
• Microinstructions often taken sequentially
° Control reduces to programming this device
[Figure: a microinstruction () divided into sequencer control and datapath control fields]
[Figure labels, continued: micro-PC, sequencer]

Recap: Specific Sequencer from last lecture
° Sequencer-based control unit from last lecture
• Called “microPC” or “µPC” vs. state register

Control Value  Effect
00             Next µaddress = 0
01             Next µaddress = dispatch ROM
10             Next µaddress = µaddress + 1

Dispatch ROM:
Instruction  Opcode  µaddress
R-type       000000  0100
BEQ          000100  0011
ori          001101  0110
LW           100011  1000
SW           101011  1011

[Figure: microPC feeding an adder (+1); a mux with inputs 2, 1, 0 selects among microPC + 1, the dispatch ROM output (addressed by Opcode), and 0, under µAddress Select Logic]

Recap: Microprogram Control Specification
Columns: µPC | Taken? | Next | IR en | PC en, sel | Ops A, B | Exec Ex, Sr, ALU, S | Mem R, W, M | Write-Back M-R, Wr, Dst
µPC rows: 0000, 0001, 0001, 0010, BEQ: 0011, R: 0100, 0101, ORi: 0110, 0111, LW: 1000, 1001, 1010, SW: 1011, 1100
Taken?: 0 1 x x x x x x x x x x x
Cell values, in reading order: inc 1 load inc zero 1 1 zero 1 0 inc 0 1 fun 1 zero 1 0 0 1 1 inc 0 0 or 1 zero 1 0 0 1 0 inc 1 0 add 1 inc 1 0 1 zero 1 0 1 1 0 inc 1 0 add 1 zero 1 0 0 1

Recap: Overview of Control
° Control may be designed using one of several initial representations. The choice of sequence control, and how logic is represented, can then be determined independently; the control can then be implemented with one of several methods using a structured logic technique.

                          “hardwired control”           “microprogrammed control”
Initial Representation    Finite State Diagram          Microprogram
Sequencing Control        Explicit Next State Function  Microprogram counter + Dispatch ROMs
Logic Representation      Logic Equations               Truth Tables
Implementation Technique  PLA                           ROM

The Big Picture: Where are We Now?
° The Five Classic Components of a Computer
[Figure: Processor (Control, Datapath), Memory, Input, Output]
° Today’s Topics:
• Microprogrammed control
• Administrivia
• Microprogram it yourself
• Exceptions

Microprogramming (Maurice Wilkes)
° Control is the hard part of processor design
° Datapath is fairly regular and well-organized
° Memory is highly regular
° Control is irregular and global
Microprogramming:
-- A Particular Strategy for Implementing the Control Unit of a processor by "programming" at the level of register transfer operations
Microarchitecture:
-- Logical structure and functional capabilities of the hardware as seen by the microprogrammer
Historical Note:
IBM 360 Series first to distinguish between architecture & organization
Same instruction set across wide range of implementations, each with different cost/performance

“Macroinstruction” Interpretation
[Figure: Main Memory holds the user program (ADD, SUB, AND, ...) plus data; this can change! The CPU contains the execution unit and the control memory. One of these (a macroinstruction such as AND) is mapped into one of these (the AND microsequence in control memory), e.g., Fetch; Calc Operand Addr; Fetch Operand(s); Calculate; Save Answer(s)]

Variations on Microprogramming
° “Horizontal” Microcode
– control field for each control point in the machine
Format: µseq | µaddr | A-mux | B-mux | bus enables | register enables
° “Vertical” Microcode
– compact microinstruction format for each class of microoperation
– local decode to generate all control points (remember ALU?)
branch: µseq-op µadd
execute: ALU-op A,B,R
memory: mem-op S, D
[Figure: spectrum running from Horizontal to Vertical]

Extreme Horizontal
[Figure: microinstruction word with next-address bits N3 N2 N1 N0, 1 bit for each loadable register (enbMAR, enbAC, ...), input select, Incr PC, ALU control]
Depending on bus organization, many potential control combinations are simply wrong, i.e., they imply transfers that can never happen at the same time.
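The point about impossible combinations can be checked by counting. The following is a minimal sketch, assuming a single shared bus that at most one source may drive per cycle; the function names and the choice of four sources are hypothetical and are not taken from the slides.

```python
# Sketch: compare an unencoded ("extreme horizontal") enable field with an
# encoded field for one shared bus.
# Assumption: at most one source may drive the bus in any cycle.

def legal_patterns(n_sources: int) -> int:
    """Count enable-bit patterns with zero or one bit set (no bus conflict)."""
    return sum(1 for word in range(2 ** n_sources) if bin(word).count("1") <= 1)


def encoded_width(n_sources: int) -> int:
    """Bits needed to name 'no source' or exactly one of n_sources (codes 0..n)."""
    return n_sources.bit_length()


if __name__ == "__main__":
    n = 4  # hypothetical number of sources that can drive the bus
    print(f"{n} enable bits: {legal_patterns(n)} of {2 ** n} patterns are legal")
    print(f"encoded field width: {encoded_width(n)} bits")
```

With four sources, only 5 of the 16 one-bit-per-source patterns are conflict-free, and a 3-bit encoded field is enough to express all of them.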
Makes sense to encode fields to save ROM space
Example: mem_to_reg and ALU_to_reg should never happen simultaneously; => encode in a single bit which is decoded, rather than two separate bits
NOTE: the encoding should be only wide enough so that parallel actions that the datapath supports should still be specifiable in a single microinstruction

More Vertical Format
[Figure: microinstruction with src and dst fields, each feeding a decoder (DEC), plus other control fields; next-state logic takes inputs through a MUX]
Some of these may have nothing to do with registers!
Multiformat Microcode:
Branch Jump:             0 (1 bit) | cond (3 bits) | next address (6 bits)
Register Xfer Operation: 1 (1 bit) | dst (3 bits, decoded) | src (3 bits, decoded) | alu (3 bits)

Hybrid Control
Not all critical control information is derived from control logic
E.g., Instruction Register (IR) contains useful control information, such as register sources, destinations, opcodes, etc.
[Figure: the IR's op field goes to control; its RS1, RS2, and RD fields each pass through a decoder (DEC), gated by enable signals from control, to drive the Register File's rs1, rs2, and rd ports]

VAX Microinstructions
VAX Microarchitecture:
96 bit control store, 30 fields, 4096 µinstructions for VAX ISA
encodes concurrently executable "microoperations"
[Figure: 96-bit microinstruction word with bit positions 95, 87, 84, 68, 65, 63, 11, 0 marked and the following fields]
Field  Role                 Example encodings
USHF   ALU Shifter Control  001 = left, 010 = right, ..., 101 = left3
UALU   ALU Control          010 = A-B-1, 100 = A+B+1
USUB   Subroutine Control   00 = Nop, 01 = CALL, 10 = RTN
UJMP   Jump Address
Current Intel architecture: 80-bit microcode, 8192 instructions

Horizontal vs.
Vertical Microprogramming
NOTE: previous organization is not TRUE horizontal microprogramming; register decoders give flavor of encoded microoperations
Most microprogramming-based controllers vary between:
horizontal organization (1 control bit per control point)
vertical organization (fields encoded in the control memory and must be decoded to control something)
Horizontal
+ more control over the potential parallelism of operations in the datapath
- uses up lots of control store
Vertical
+ easier to program, not very different from programming a RISC machine in assembly language
- extra level of decoding may slow the machine down

Administration
° Midterm on Thursday (3/1) from 5:30 - 8:30 in 277 Cory Hall
• No class on that day
° Pizza and Refreshments afterwards at LaVal’s on Euclid
• I’ll buy the pizza
• LaVal’s has an interesting history
° Review Session:
• Sunday (2/25), 7:00 PM in 306 Soda
• Material through this Thursday
° Still problems with groups in section 2-4. Need 2 or 3 volunteers for another group in that section
° Look over Lab 4 soon!
• Must email breakdown to your TA by Thursday night at midnight

How Effectively are we utilizing our hardware?
Fetch:   IR <- Mem[PC]
Decode:  A <- R[rs]; B <- R[rt]
R-type:  S <- A + B; R[rd] <- S; PC <- PC+4
ORi:     S <- A or ZX; R[rt] <- S; PC <- PC+4
LW:      S <- A + SX; M <- Mem[S]; R[rt] <- M; PC <- PC+4
SW:      S <- A + SX; Mem[S] <- B; PC <- PC+4
BEQ:     PC <- PC+4 (not taken) or PC <- PC+SX (taken)
° Example: memory is used twice, at different times
• Ave mem access per inst = 1 + Flw + Fsw ~ 1.3
• if CPI is 4.8, imem utilization = 1/4.8, dmem = 0.3/4.8
° We could reduce HW without hurting performance
• extra control

“Princeton” Organization
[Figure: single-memory datapath: next PC, PC, IR, ZX/SX, Reg File, A, B, S, Mem, connected by A-Bus, B-Bus, and W-Bus]
° Single memory for instruction and data access
• memory utilization -> 1.3/4.8
° Sometimes, muxes replaced with tri-state buses
• Difference often depends on whether buses are internal to chip (muxes) or external (tri-state)
° In this case our state diagram does not change
• several additional control signals
• must ensure each bus is only driven by one source on each cycle

Today: Alternative datapath (book)
° Minimizes Hardware: 1 memory, 1 adder
[Figure: multicycle datapath from the book: PC, Ideal Memory (RAdr, WrAdr, Din, Dout), Instruction Reg, Mem Data Reg, Reg File (Ra, Rb, Rw, busA, busB, busW), A, B, Extend, << 2, ALU, ALU Control, ALU Out, and muxes; control signals PCWr, PCWrCond, Zero, MemWr, IRWr, RegDst, ALUSelA, RegWr, ExtOp, MemtoReg, IorD, PCSrc, ALUOp, ALUSelB]

New Finite State Machine (FSM) Spec
0000  “instruction fetch”  IR <= MEM[PC]; PC <= PC + 4
“decode” state. Q: How can we improve this to do something in state 0001?
State  Phase       Instruction  Register transfers
0001   “decode”    all
0100   Execute     R-type       ALUout <= A fun B
0101   Write-back  R-type       R[rd] <= ALUout
0110   Execute     ORi          ALUout <= A or ZX
0111   Write-back  ORi          R[rt] <= ALUout
1000   Execute     LW           ALUout <= A + SX
1001   Memory      LW           M <= MEM[ALUout]
1010   Write-back  LW           R[rt] <= M
1011   Execute     SW           ALUout <= A + SX
1100   Memory      SW           MEM[ALUout] <= B
0010   Execute     BEQ          ALUout <= PC + SX
0011   Execute     BEQ          If A = B then PC <= ALUout

Finite State Machine (FSM) Spec
State  Phase                Instruction  Register transfers
0000   “instruction fetch”  all          IR <= MEM[PC]; PC <= PC + 4
0001   “decode”             all          ALUout <= PC + SX
0100   Execute              R-type       ALUout <= A fun B
0101   Write-back           R-type       R[rd] <= ALUout
0110   Execute              ORi          ALUout <= A or ZX
0111   Write-back           ORi          R[rt] <= ALUout
1000   Execute              LW           ALUout <= A + SX
1001   Memory               LW           M <= MEM[ALUout]
1010   Write-back           LW           R[rt] <= M
1011   Execute              SW           ALUout <= A + SX
1100   Memory               SW           MEM[ALUout] <= B
0010   Execute              BEQ          If A = B then PC <= ALUout

Designing a Microinstruction Set
1) Start with list of control signals
2) Group signals together that make sense (vs. random): called “fields”
3) Place fields in some logical order (e.g., ALU operation & ALU operands first and microinstruction sequencing last)
4) Create a symbolic legend for the microinstruction format, showing name of field values and how they set the control signals
• Use computers to design computers
5) To minimize the width, encode operations that will never be used at the same time

1&2) Start with list of control signals, grouped into fields
Single Bit Control (ALUSelA, RegWrite, MemtoReg, RegDst, MemRead, and the signals that follow)
Signal name  Effect when deasserted       Effect when asserted
ALUSelA      1st ALU operand = PC         1st ALU operand = Reg[rs]
RegWrite     None                         Reg. is written
MemtoReg     Reg. write data input = ALU  Reg. write data input = memory
RegDst       Reg. dest. no. = rt          Reg. dest. no.
= rd
MemRead      None                  Memory at address is read, MDR <= Mem[addr]
MemWrite     None                  Memory at address is written
IorD         Memory address = PC   Memory address = S
IRWrite      None                  IR <= Memory
PCWrite      None                  PC <= PCSource
PCWriteCond  None                  IF ALUzero then PC <= PCSource
PCSource     PCSource = ALU        PCSource = ALUout
ExtOp        Zero Extended         Sign Extended

Multiple Bit Control
Signal name  Value  Effect
ALUOp        00     ALU adds
             01     ALU subtracts
             10     ALU does function code
             11     ALU does logical OR
ALUSelB      00     2nd ALU input = 4
             01     2nd ALU input = Reg[rt]
             10     2nd ALU input = extended, shift left 2
             11     2nd ALU input = extended

Start with list of control signals, cont’d
° For next state function (next microinstruction address), use Sequencer-based control unit from last lecture
• Called “microPC” or “µPC” vs. state register
Signal      Value  Effect
Sequencing  00     Next µaddress = 0
            01     Next µaddress = dispatch ROM
            10     Next µaddress = µaddress + 1
° Could even include “branch” option which changes microPC by adding offset when certain control signals are true.
[Figure: microPC feeding an adder (+1); a mux with inputs 2, 1, 0 selects among microPC + 1, the dispatch ROM output (addressed by Opcode), and 0, under µAddress Select Logic]

3) Microinstruction Format: unencoded vs. encoded fields
Field Name       Width (wide)  Width (narrow)  Control Signals Set
ALU Control      4             2               ALUOp
SRC1             2             1               ALUSelA
SRC2             5             3               ALUSelB, ExtOp
ALU Destination  3             2               RegWrite, MemtoReg, RegDst
Memory           3             2               MemRead, MemWrite, IorD
Memory Register  1             1               IRWrite
PCWrite Control  3             2               PCWrite, PCWriteCond, PCSource
Sequencing       3             2               AddrCtl
Total width      24            15              bits

4) Legend of Fields and Symbolic Names
Field Name: Values for Field
ALU: Add, Subt.,
Func code, Or
Each field value and its function:
Field Name       Value       Function of Field with Specific Value
ALU              Add         ALU adds
                 Subt.       ALU subtracts
                 Func code   ALU does function code
                 Or          ALU does logical OR
SRC1             PC          1st ALU input = PC
                 rs          1st ALU input = Reg[rs]
SRC2             4           2nd ALU input = 4
                 Extend      2nd ALU input = sign ext. IR[15-0]
                 Extend0     2nd ALU input = zero ext. IR[15-0]
                 Extshft     2nd ALU input = sign ex., sl IR[15-0]
                 rt          2nd ALU input = Reg[rt]
destination      rd ALU      Reg[rd] = ALUout
                 rt ALU      Reg[rt] = ALUout
                 rt Mem      Reg[rt] = Mem
Memory           Read PC     Read memory using PC
                 Read ALU    Read memory using ALUout for addr
                 Write ALU   Write memory using ALUout for addr
Memory register  IR          IR = Mem
PC write         ALU         PC = ALU
                 ALUoutCond  IF ALU Zero then PC = ALUout
Sequencing       Seq         Go to sequential µinstruction
                 Fetch       Go to the first microinstruction
                 Dispatch    Dispatch using ROM.

Quick check: what do these fieldnames mean?
Destination:
Code  Name    RegWrite  MemToReg  RegDest
00    ---     0         X         X
01    rd ALU  1         0         1
10    rt ALU  1         0         0
11    rt MEM  1         1         0
SRC2:
Code  Name     ALUSelB  ExtOp
000   ---      X        X
001   4        00       X
010   rt       01       X
011   ExtShft  10       1
100   Extend   11       1
111   Extend0  11       0

Alternative datapath (book): Multiple Cycle Datapath
° Minimizes Hardware: 1 memory, 1 adder
[Figure: multicycle datapath from the book: PC, Ideal Memory (RAdr, WrAdr, Din, Dout), Instruction Reg, Mem Data Reg, Reg File (Ra, Rb, Rw, busA, busB, busW), A, B, Extend, << 2, ALU, ALU Control, ALU Out, and muxes; control signals PCWr, PCWrCond, Zero, MemWr, IRWr, RegDst, ALUSelA, RegWr, ExtOp, MemtoReg, IorD, PCSrc, ALUOp, ALUSelB]

Microprogram it yourself!
Label   ALU  SRC1  SRC2  ALU Dest.  Memory   Mem. Reg.  PC Write  Sequencing
Fetch:  Add  PC    4                Read PC  IR         ALU       Seq

Microprogram it yourself!
Label   ALU   SRC1  SRC2     Dest.  Memory   Mem. Reg.
Fetch:  Add   PC    4               Read PC
        Add   PC    Extshft
Rtype:  Func  rs    rt
With the remaining columns (PC Write, Sequencing) and rows, the complete microprogram is:
Label   ALU    SRC1  SRC2     Dest.   Memory     Mem. Reg.  PC Write     Sequencing
Fetch:  Add    PC    4                Read PC    IR         ALU          Seq
        Add    PC    Extshft                                             Dispatch
Rtype:  Func   rs    rt                                                  Seq
                              rd ALU                                     Fetch
Ori:    Or     rs    Extend0                                             Seq
                              rt ALU                                     Fetch
Lw:     Add    rs    Extend                                              Seq
                                      Read ALU                           Seq
                              rt MEM                                     Fetch
Sw:     Add    rs    Extend                                              Seq
                                      Write ALU                          Fetch
Beq:    Subt.  rs    rt                                     ALUoutCond.  Fetch

Legacy Software and Microprogramming
° IBM bet company on 360 Instruction Set Architecture (ISA): single instruction set for many classes of machines
• (8-bit to 64-bit)
° Stuart Tucker stuck with job of what to do about software compatibility
° If microprogramming could easily do same instruction set on many different microarchitectures, then why couldn’t multiple microprograms do multiple instruction sets on the same microarchitecture?
° Coined term “emulation”: instruction set interpreter in microcode for non-native instruction set
° Very successful: in early years of IBM 360 it was hard to know whether old instruction set or new instruction set was more frequently used

Microprogramming Pros and Cons
° Ease of design
° Flexibility
• Easy to adapt to changes in organization, timing, technology
• Can make changes late in design cycle, or even in the field
° Can implement very powerful instruction sets (just more control memory)
° Generality
• Can implement multiple instruction sets on same machine.
• Can tailor instruction set to application.
° Compatibility
• Many organizations, same instruction set
° Costly to implement
° Slow

An Alternative MultiCycle DataPath
[Figure: datapath with next PC, PC, inst mem, IR, ZX/SX, Reg File, A, B, S, mem, connected by A-Bus, B-Bus, and W-Bus]
° In each clock cycle, each Bus can be used to transfer from one source
° µ-instruction can simply contain B-Bus and W-Dst fields

What about a 2-Bus Microarchitecture (datapath)?
[Figure, two panels: “Instruction Fetch” and “Decode / Operand Fetch”, each showing the 2-bus datapath: next PC, PC, IR, ZX/SX, Reg File, A, B, S, Mem, M, with the A-Bus and B-Bus]

Load
[Figure, three panels: “Execute”, “Mem” (S supplies addr to Mem), and “Write-back”, each showing the 2-bus datapath: next PC, PC, IR, ZX/SX, Reg File, A, B, S, Mem, M]
° What about 1 bus? 1 adder? 1 register port?

Summary
° Specialized state diagrams are easily captured by a microsequencer
• simple increment & “branch” fields
• datapath control fields
° Most microprogramming-based controllers vary between:
• horizontal organization (1 control bit per control point)
• vertical organization (fields encoded in the control memory and must be decoded to control something)
° Steps:
• identify control signals, group them, develop “mini language”, then microprogram
° Control design reduces to Microprogramming
• Arbitrarily complicated instructions possible

Summary: Microprogramming one inspiration for RISC
° If simple instruction could execute at very high clock rate…
° If you could even write compilers to produce microinstructions…
° If most programs use simple instructions and addressing modes…
° If microcode is kept in RAM instead of ROM so as to fix bugs…
° If same memory used for control memory could be used instead as cache for “macroinstructions”…
° Then why not skip instruction interpretation by a microprogram and simply compile directly into lowest language of machine?
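The microsequencer summarized above ("simple increment & branch fields") can be sketched in a few lines. This is a minimal illustration, not the course's implementation: the dispatch ROM contents and the three control values (00, 01, 10) come from the recap slides, while the per-state sequencing assignments are an assumption inferred from the state numbers of the recap FSM, and the names `SEQUENCING`, `next_upc`, and `trace` are hypothetical.

```python
# Sketch of the three-way microsequencer from the recap slides:
#   00 -> next µaddress = 0
#   01 -> next µaddress = dispatch ROM[opcode]
#   10 -> next µaddress = µaddress + 1

FETCH, DISPATCH, SEQ = 0b00, 0b01, 0b10

# Dispatch ROM from the slides: opcode -> first µaddress of the instruction.
DISPATCH_ROM = {
    0b000000: 0b0100,  # R-type
    0b000100: 0b0011,  # BEQ
    0b001101: 0b0110,  # ori
    0b100011: 0b1000,  # LW
    0b101011: 0b1011,  # SW
}

# Assumption: sequencing field per µaddress, inferred from the recap FSM
# (the last state of each instruction returns to µaddress 0).
SEQUENCING = {
    0b0000: SEQ,       # instruction fetch
    0b0001: DISPATCH,  # decode
    0b0011: FETCH,     # BEQ
    0b0100: SEQ, 0b0101: FETCH,               # R-type
    0b0110: SEQ, 0b0111: FETCH,               # ORi
    0b1000: SEQ, 0b1001: SEQ, 0b1010: FETCH,  # LW
    0b1011: SEQ, 0b1100: FETCH,               # SW
}


def next_upc(upc: int, opcode: int) -> int:
    """Apply the address-select logic for one microinstruction."""
    control = SEQUENCING[upc]
    if control == FETCH:
        return 0
    if control == DISPATCH:
        return DISPATCH_ROM[opcode]
    return upc + 1


def trace(opcode: int) -> list[int]:
    """List the µaddresses visited while executing one instruction."""
    upc, visited = 0, [0]
    while True:
        upc = next_upc(upc, opcode)
        if upc == 0:
            return visited
        visited.append(upc)


if __name__ == "__main__":
    for name, opcode in [("R-type", 0b000000), ("LW", 0b100011), ("BEQ", 0b000100)]:
        print(name, [format(u, "04b") for u in trace(opcode)])
```

Under these assumptions a load visits five µaddresses (0000, 0001, 1000, 1001, 1010) and a branch visits three, which matches the per-instruction state counts in the recap FSM.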