MIT OpenCourseWare http://ocw.mit.edu
6.004 Computation Structures, Spring 2009
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

Building the Beta
Lab #5 due Thursday
6.004 – Spring 2009, 3/31/09, L14 – Building a Beta

CPU Design Tradeoffs
• Maximum Performance: measured by the number of instructions executed per second.
• Minimum Cost: measured by the size of the circuit.
• Best Performance/Price: measured by the ratio of MIPS to size. In power-sensitive applications MIPS/Watt is important too.
Figure by MIT OpenCourseWare.

Performance Measure
MIPS (Millions of Instructions per Second) = Clock Frequency (MHz) / C.P.I. (Clocks per Instruction)
PUSHING PERFORMANCE ...
• TODAY: 1 cycle/inst., so MIPS equals the clock frequency in MHz.
• LATER: more MHz via pipelining.

The Beta ISA
Instruction classes distinguished by OPCODE: OP, OPC, MEM, Transfer of Control.
Operate class (OP), fields 6, 5, 5, 5 and 11 bits wide:
   10xxxx | Rc | Ra | Rb | (unused)        Reg[Rc] ← Reg[Ra] op Reg[Rb]
Operate class (OPC), with a 16-bit signed literal:
   11xxxx | Rc | Ra | Literal C (signed)   Reg[Rc] ← Reg[Ra] op SXT(C)
Opcodes, both formats: ADD SUB MUL* DIV* CMPEQ CMPLE CMPLT AND OR XOR SHL SHR SRA   (*optional)
Memory and transfer of control:
   01xxxx | Rc | Ra | Literal C (signed)
   LD:  Reg[Rc] ← Mem[Reg[Ra]+SXT(C)]
   ST:  Mem[Reg[Ra]+SXT(C)] ← Reg[Rc]
   JMP: Reg[Rc] ← PC+4; PC ← Reg[Ra]
   BEQ: Reg[Rc] ← PC+4; if Reg[Ra]=0 then PC ← PC+4+4*SXT(C)
   BNE: Reg[Rc] ← PC+4; if Reg[Ra]≠0 then PC ← PC+4+4*SXT(C)
   LDR: Reg[Rc] ← Mem[PC+4+4*SXT(C)]

Approach: Incremental Featurism
Each instruction class can be implemented using a simple component repertoire. We'll try implementing data paths for each class individually, and merge them (using MUXes, etc.).
Steps:
1. Operate instructions
2. Load & Store instructions
3. Jump & Branch instructions
4. Exceptions
5. Merge data paths
Our Bag of Components:
• Registers (D in, Q out, EN, clk)
• Muxes
• "Black box" ALU (inputs A and B)
• Register File (3-port)
• Memories: Instruction Memory (address A, data D) and Data Memory (address, read data RD, write data WD, R/W)

Multi-Port Register Files
[Figure: 3-port register file. Read Port A: address RA1 (5 bits) → data RD1 (32 bits). Read Port B: address RA2 (5 bits) → data RD2 (32 bits); the two read ports have independent read addresses and independent read data. Write Port: write address WA (5 bits), write data WD (32 bits), write enable WE, clk.]
2 combinational READ ports*, 1 clocked WRITE port
*internal logic ensures Reg[31] reads as 0

Register File Timing
2 combinational READ ports, 1 clocked WRITE port
[Timing diagram: read data RD follows read address A after tPD; for a write, WA and WD must be stable a setup time tS before the CLK edge (with WE asserted) and held th after it, after which RD shows the new Reg[A].]
What if (say) WA=RA1??? RD1 reads "old" value of Reg[RA1] until next clock edge!

Starting point: ALU Ops
32-bit (4-byte) ADD instruction: 100000 00100 00010 00011 00000000000
(OpCode = 100000, Rc = 4, Ra = 2, Rb = 3, remaining 11 bits unused)
Means, to BETA, Reg[R4] ← Reg[R2] + Reg[R3]
First, we'll need hardware to:
• Read next 32-bit instruction
• DECODE instruction: ADD, SUB, XOR, etc.
• READ operands (Ra, Rb) from Register File;
• PERFORM indicated operation;
• WRITE result back into Register File (Rc).

Instruction Fetch/Decode
• Use a counter to FETCH the next instruction: PROGRAM COUNTER (PC)
  º use PC as memory address
  º add 4 to PC, load new value at end of cycle
  º fetch instruction from memory
    º use some instruction fields directly (register numbers, 16-bit constant)
    º use bits <31:26> to generate controls
[Figure: PC (low two bits always 00) addresses the Instruction Memory; a +4 adder feeds back into the PC; the 32-bit instruction word fields come out on D, and OPCODE <31:26> goes to the Control Logic, which produces the CONTROL SIGNALS.]

ALU Op Data Path
Operate class: 10xxxx | Rc | Ra | Rb | (unused)   Reg[Rc] ← Reg[Ra] op Reg[Rb]
[Figure: Ra <20:16> drives RA1 and Rb <15:11> drives RA2; RD1 and RD2 feed ALU inputs A and B; the ALU output returns to the register file's WD and is written at WA = Rc <25:21>. Control Logic generates ALUFN and WERF (register file write enable).]
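To make the fetch/decode and ALU-op steps concrete, here is a minimal Python sketch, assuming a purely behavioral model (not the course's hardware or simulator). It slices out the instruction fields at the bit positions used in the datapath figures, models the 3-port register file with Reg[31] reading as 0 and a write that only takes effect at the clock edge, and executes the ADD example. The names `field`, `sxt16`, `decode`, `RegisterFile` and `present_write` are hypothetical, chosen for illustration.

```python
MASK32 = 0xFFFFFFFF


def field(word: int, hi: int, lo: int) -> int:
    """Extract bits <hi:lo> of a 32-bit instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sxt16(c: int) -> int:
    """SXT: sign-extend a 16-bit literal (two's complement)."""
    return c - (1 << 16) if c & 0x8000 else c


def decode(word: int) -> dict:
    """Split an instruction into the fields wired to the datapath."""
    return {
        "opcode": field(word, 31, 26),   # to Control Logic
        "rc": field(word, 25, 21),       # write address WA
        "ra": field(word, 20, 16),       # read address RA1
        "rb": field(word, 15, 11),       # read address RA2
        "c": sxt16(field(word, 15, 0)),  # literal (meaningful for OPC/MEM formats)
    }


class RegisterFile:
    """Behavioral 3-port register file: 2 combinational reads, 1 clocked write."""

    def __init__(self) -> None:
        self.regs = [0] * 32
        self.pending = None  # (WA, WD) presented this cycle, applied at the edge

    def read(self, addr: int) -> int:
        # Internal logic ensures Reg[31] always reads as 0.
        return 0 if addr == 31 else self.regs[addr]

    def present_write(self, wa: int, wd: int, we: bool) -> None:
        # Drive WA/WD/WE; the stored value does not change yet.
        self.pending = (wa, wd & MASK32) if we else None

    def clock_edge(self) -> None:
        if self.pending is not None:
            wa, wd = self.pending
            self.regs[wa] = wd
            self.pending = None


# The ADD example: OpCode=100000, Rc=4, Ra=2, Rb=3, 11 unused bits.
ADD_WORD = 0b100000_00100_00010_00011_00000000000
f = decode(ADD_WORD)

rf = RegisterFile()
rf.regs[2], rf.regs[3] = 5, 7                 # arbitrary operand values
result = (rf.read(f["ra"]) + rf.read(f["rb"])) & MASK32
rf.present_write(f["rc"], result, we=True)    # WERF = 1
before_edge = rf.read(f["rc"])  # still the old Reg[R4]
rf.clock_edge()
after_edge = rf.read(f["rc"])   # new Reg[R4]
```

Here `before_edge` is 0 and `after_edge` is 12. The read before `clock_edge()` returning the old value is the same effect the register-file timing slide warns about when WA equals RA1.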
ALU Operations (w/constant)
Operate class: 11xxxx | Rc | Ra | Literal C (signed)   Reg[Rc] ← Reg[Ra] op SXT(C)
[Figure: adds C: SXT(<15:0>) and a BSEL mux in front of ALU input B (0: RD2, 1: sign-extended constant). Control signals: BSEL, ALUFN, WERF.]

Load Instruction
011000 | Rc | Ra | Literal C (signed)
LD: Reg[Rc] ← Mem[Reg[Ra]+SXT(C)]
[Figure: adds the Data Memory. The ALU computes the address Adr, and a WDSEL mux (1: ALU output, 2: memory read data RD) chooses what is written back into Reg[Rc]. Control signals: BSEL, WDSEL, ALUFN, Wr, WERF.]

Store Instruction
011001 | Rc | Ra | Literal C (signed)
ST: Mem[Reg[Ra]+SXT(C)] ← Reg[Rc]
[Figure: adds an RA2SEL mux on register file read address RA2 (0: Rb <15:11>, 1: Rc <25:21>), so RD2 delivers Reg[Rc] to the Data Memory's write data WD. No WERF! (the register file is not written). Control signals: RA2SEL, BSEL, WDSEL, ALUFN, Wr, WERF.]

JMP Instruction
011011 | Rc | Ra | Literal C (signed)
JMP: Reg[Rc] ← PC+4; PC ← Reg[Ra]
[Figure: adds a PCSEL mux in front of the PC (0: PC+4, 2: jump target JT taken from RD1) and routes PC+4 to WDSEL input 0 so it can be written into Reg[Rc]. Control signals: PCSEL, RA2SEL, BSEL, WDSEL, ALUFN, Wr, WERF.]

BEQ/BNE Instructions
011101 | Rc | Ra | Literal C (signed)
BEQ: Reg[Rc] ← PC+4; if Reg[Ra]=0 then PC ← PC+4+4*SXT(C)
011110 | Rc | Ra | Literal C (signed)
BNE: Reg[Rc] ← PC+4; if Reg[Ra]≠0 then PC ← PC+4+4*SXT(C)
[Figure: adds an adder computing PC+4+4*SXT(<15:0>) into PCSEL input 1, and a Z signal (true when RD1 = 0) sent to the Control Logic.]

Load Relative Instruction
011111 | Rc | Ra | Literal C (signed)
LDR: Reg[Rc] ← Mem[PC+4+4*SXT(C)]
Hey, WAIT A MINUTE. What's Load Relative good for anyway??? I thought
• Code is "PURE", i.e. READ-ONLY, and stored in a "PROGRAM" region of memory;
• Data is READ-WRITE, and stored either
  • on the STACK (local); or
  • in some GLOBAL VARIABLE region; or
  • in a global storage HEAP.
So why an instruction designed to load data that's "near" the instruction???
Answer: addresses & other large constants. For example, 123456 does not fit in the 16-bit signed literal field (at most 32767), so it is stored as a LONG next to the code and fetched with LDR:
C:        X = X * 123456;
BETA:     LD(X, r0)
          LDR(c1, r1)
          MUL(r0, r1, r0)
          ST(r0, X)
          ...
      c1: LONG(123456)

LDR Instruction
[Figure: adds an ASEL mux in front of ALU input A (0: RD1, 1: PC+4+4*SXT(C)); with ALUFN = "A" the ALU passes that address to the Data Memory, and WDSEL = 2 writes the loaded word into Reg[Rc]. Control signals: PCSEL, RA2SEL, ASEL, BSEL, WDSEL, ALUFN, Wr, WERF.]

Exceptions
What if something BAD happens?
• Execution of an illegal op-code
• Reference to non-existent memory
• Divide by zero
Or, maybe, just something unanticipated…
• User hits a key
• A packet comes in via the network
GOAL: handle all these cases (and more) in SOFTWARE:
• Treat each such case as an (implicit) procedure call…
• Procedure handles problem, returns to interrupted program.
• TRANSPARENT to interrupted program!
• Important added capability: handlers for certain errors (illegal opcodes) can extend the instruction set using software (Lab 7!).
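Returning to the transfer-of-control instructions: BEQ, BNE and LDR all share the PC-relative computation PC+4+4*SXT(C). The Python sketch below illustrates it behaviorally under stated assumptions. The helper names are hypothetical, memory is modeled as a plain dictionary keyed by byte address, and the instruction addresses in the example (LDR at 0x104 with `c1` at 0x114, a BNE at 0x100) are made up for illustration.

```python
MASK32 = 0xFFFFFFFF


def sxt16(c: int) -> int:
    """SXT: sign-extend a 16-bit literal."""
    return c - (1 << 16) if c & 0x8000 else c


def pc_relative(pc: int, c: int) -> int:
    """PC+4+4*SXT(C): branch target for BEQ/BNE, load address for LDR."""
    return (pc + 4 + 4 * sxt16(c)) & MASK32


def branch_next_pc(pc: int, ra_value: int, c: int, is_beq: bool) -> int:
    """Next PC for BEQ (taken if Reg[Ra]=0) or BNE (taken if Reg[Ra]!=0)."""
    z = ra_value == 0
    taken = z if is_beq else not z
    return pc_relative(pc, c) if taken else (pc + 4) & MASK32


def encode_offset(pc: int, target: int) -> int:
    """16-bit literal C such that PC+4+4*SXT(C) == target (hypothetical helper)."""
    delta = target - (pc + 4)
    if delta % 4 != 0:
        raise ValueError("target must be word-aligned")
    c = delta // 4
    if not -32768 <= c <= 32767:
        raise ValueError("offset does not fit in 16 bits")
    return c & 0xFFFF


# Hypothetical layout for the X = X * 123456 example: LDR at 0x104, c1 at 0x114.
memory = {0x114: 123456}            # c1: LONG(123456)
c = encode_offset(0x104, 0x114)     # literal in the LDR instruction
r1 = memory[pc_relative(0x104, c)]  # LDR(c1, r1)

# A BNE at 0x100 branching back two words (C = -2, encoded as 0xFFFE).
back = encode_offset(0x100, 0xFC)
```

Because the offset counts words from PC+4, a literal of 0 refers to the word immediately after the branch or LDR itself.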
Exception Processing
Plan:
• Interrupt running program
• Invoke exception handler (like a procedure call)
• Return to continue execution.
We'd like RECOVERABLE INTERRUPTS for
• Synchronous events, generated by CPU or system
  FAULTS (eg, Illegal Instruction, divide-by-0, illegal mem address)
  TRAPS & system calls (eg, read-a-character)
• Asynchronous events, generated by I/O
  (eg, key struck, packet received, disk transfer complete)
KEY: TRANSPARENCY to interrupted program.
• Most difficult for asynchronous interrupts

Implementation…
How exceptions work:
• Don't execute current instruction
• Instead fake a "forced" procedure call
  • save current PC (actually current PC + 4)
  • load PC with exception vector
    • 0x4 for synch. exception, 0x8 for asynch. exceptions
Question: where to save current PC + 4?
• Our approach: reserve a register (R30, aka XP)
• Prohibit user programs from using XP. Why?
Example: DIV unimplemented
          LD(R31,A,R0)
          LD(R31,B,R1)
          DIV(R0,R1,R2)     (hardware forces a call to IllOp)
          ST(R2,C,R31)

IllOp:    PUSH(XP)
          Fetch inst. at Mem[Reg[XP]-4]     (Reg[XP] holds the DIV's address + 4)
          check for DIV opcode, get reg numbers
          perform operation in SW, fill result reg
          POP(XP)
          JMP(XP)

Exceptions
[Figure: the PCSEL mux gains inputs 3 (IllOp) and 4 (XAdr); a WASEL mux chooses the register file write address (0: Rc <25:21>, 1: XP); an IRQ input enters the Control Logic, which now also drives WASEL.]
Bad Opcode: Reg[XP] ← PC+4; PC ← "IllOp"
Other: Reg[XP] ← PC+4; PC ← "Xadr"

Control Logic
         OP     OPC    LD     ST     JMP    BEQ    BNE    LDR    ILLOP  IRQ
ALUFN    F(op)  F(op)  "+"    "+"    --     --     --     "A"    --     --
WERF     1      1      1      0      1      1      1      1      1      1
BSEL     0      1      1      1      --     --     --     --     --     --
WDSEL    1      1      2      --     0      0      0      2      0      0
WR       0      0      0      1      0      0      0      0      0      0
RA2SEL   0      --     --     1      --     --     --     --     --     --
PCSEL    0      0      0      0      2      Z?1:0  Z?0:1  0      3      4
ASEL     0      0      0      0      --     --     --     1      --     --
WASEL    0      0      0      --     0      0      0      0      1      1
(-- = don't care)
Implementation choices:
• ROM indexed by opcode, external branch & trap logic
• PLA
• "random" logic (eg, standard cell gates)

Beta: Our "Final Answer"
[Figure: the complete Beta datapath merging all of the above: PCSEL mux (0: PC+4, 1: branch target, 2: JT, 3: IllOp, 4: XAdr), Instruction Memory, register file with RA2SEL and WASEL muxes, ASEL and BSEL muxes into the ALU, Data Memory, and WDSEL mux (0: PC+4, 1: ALU, 2: memory RD), all driven by the Control Logic.]
Is that all there is to building a processor???
No. You've gotta print up all those little "Beta Inside" stickers.

Next Time: Tackling the Memory Bottleneck
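One of the implementation choices listed for the control logic is a ROM indexed by opcode, plus external branch and trap logic. The Python sketch below illustrates that idea by encoding this lecture's control table as a lookup keyed by instruction class, with the Z-dependent PCSEL of BEQ/BNE resolved outside the table. It rests on hypothetical simplifications: every 10xxxx and 11xxxx opcode is treated as a legal ALU operation (a real decoder would also flag unimplemented ones, such as an omitted DIV, as illegal), a pending IRQ takes priority over the fetched instruction, `None` stands for a don't-care entry, and ALUFN is kept symbolic.

```python
SIGNALS = ("ALUFN", "WERF", "BSEL", "WDSEL", "WR", "RA2SEL", "PCSEL", "ASEL", "WASEL")

# One row per instruction class; None is a don't-care ("--").
# "Z?1:0" / "Z?0:1" mark PCSEL entries resolved by the branch logic below.
CONTROL_ROM = {
    "OP":    ("F(op)", 1, 0,    1,    0, 0,    0,       0,    0),
    "OPC":   ("F(op)", 1, 1,    1,    0, None, 0,       0,    0),
    "LD":    ("+",     1, 1,    2,    0, None, 0,       0,    0),
    "ST":    ("+",     0, 1,    None, 1, 1,    0,       0,    None),
    "JMP":   (None,    1, None, 0,    0, None, 2,       None, 0),
    "BEQ":   (None,    1, None, 0,    0, None, "Z?1:0", None, 0),
    "BNE":   (None,    1, None, 0,    0, None, "Z?0:1", None, 0),
    "LDR":   ("A",     1, None, 2,    0, None, 0,       1,    0),
    "ILLOP": (None,    1, None, 0,    0, None, 3,       None, 1),
    "IRQ":   (None,    1, None, 0,    0, None, 4,       None, 1),
}

# MEM / transfer-of-control opcodes, as given in the instruction formats.
MEM_OPCODES = {
    0b011000: "LD", 0b011001: "ST", 0b011011: "JMP",
    0b011101: "BEQ", 0b011110: "BNE", 0b011111: "LDR",
}


def classify(opcode: int) -> str:
    """Map a 6-bit opcode to its instruction class (simplified)."""
    top = opcode >> 4
    if top == 0b10:
        return "OP"
    if top == 0b11:
        return "OPC"
    return MEM_OPCODES.get(opcode, "ILLOP")


def control(opcode: int, z: bool, irq: bool = False) -> tuple:
    """Return (class, control signals) for one cycle."""
    cls = "IRQ" if irq else classify(opcode)
    signals = dict(zip(SIGNALS, CONTROL_ROM[cls]))
    # External branch logic: Z is true when Reg[Ra] (RD1) is zero.
    if signals["PCSEL"] == "Z?1:0":
        signals["PCSEL"] = 1 if z else 0
    elif signals["PCSEL"] == "Z?0:1":
        signals["PCSEL"] = 0 if z else 1
    return cls, signals
```

Because the Z-dependent entries are resolved after the lookup, the table itself only needs the opcode (and IRQ) as its address. This mirrors the "ROM indexed by opcode, external branch & trap logic" option.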