Building the Beta A ALU n a Ki strumctoiory IntMe 0 Bet B A 1 D 6.004 – Fall 2002 10/29/0 2 L15 – Building a Beta 1 CPU Design Tradeoffs Maximum Performance: measured by the numbers of instructions executed per second Minimum Cost : measured by the size of the circuit. Best Performance/Price: measured by the ratio of MIPS to size. In power-sensitive applications MIPS/Watt is important too. 6.004 – Fall 2002 10/29/0 2 L15 – Building a Beta 2 Performance Measure Millions of Instructions per Second MIPS = Clock Frequency (MHz) C.P.I. Clocks per instruction PUSHING PERFORMANCE ... TODAY: 1 cycle/inst. NEXT TIME: more MHz via pipelining NEXT NEXT TIME: fixing various pipelining issues 6.004 – Fall 2002 10/29/0 2 L15 – Building a Beta 3 The Beta ISA OPCODE 6 5 5 5 11 16 10xxxx Rc Ra Rb (unused) Operate class: Reg[Rc] ← Reg[Ra] op Reg[Rb] 11xxxx Rc Ra literal C (signed) Operate class: Reg[Rc] ← Reg[Ra] op SXT(C) Opcodes, both formats: ADD SUB CMPEQ CMPLE AND OR SHL SHR 01xxxx Rc LD: ST: JMP: BEQ: BNE: LDR: 6.004 – Fall 2002 Ra MUL* CMPLT XOR SRA DIV* *optional literal C (signed) Instruction Instructionclasses classes distinguished distinguishedby by OPCODE: OPCODE: OP OP OPC OPC MEM MEM Transfer TransferofofControl Control Reg[Rc] ← Mem[Reg[Ra]+SXT(C)] Mem[Reg[Ra]+SXT(C)] ← Reg[Rc] Reg[Rc] ← PC+4; PC ← Reg[Ra] Reg[Rc] ← PC+4; if Reg[Ra]=0 then PC ← PC+4+4*SXT(C) Reg[Rc] ← PC+4; if Reg[Ra]≠0 then PC ← PC+4+4*SXT(C) Reg[Rc] ← Mem[PC + 4 + 4*SXT(C)] 10/29/0 2 L15 – Building a Beta 4 Approach: Incremental Featurism Each instruction class can be implemented using a simple component repertoire. We’ll try implementing data paths for each class individually, and merge them (using MUXes, etc). Steps: 1. Operate instructions 2. Load & Store Instructions 3. Jump & Branch instructions 4. Exceptions 5. Merge data paths Our Bag of Components: Registers 1 0 A ALU RA1 WA WE WD Muxes B “Black box” ALU RA2 Register File (3-port) A D WD Instruction Memory A RD Data Memory R/W RD1 6.004 – Fall 2002 10/29/0 2 RD2 Memories L15 – Building a Beta 5 Multi-Port Register Files Write Port (independent Read addresses) dest Write Address EN EN clk EN … EN 5 CLK bsel asel 5 RA1 RA2 WA WE Write Enable Write Data 5 32 WD Register File (3-port) RD1 32 Read Port A RD2 32 (Independent Read Data) Read Port B D EN s1 0 clk D Q 2 combinational READ ports*, 1 clocked WRITE port *internal logic ensures Reg[31] reads as 0 Q 6.004 – Fall 2002 10/29/0 2 L15 – Building a Beta 6 Register File Timing 2 combinational READ ports, 1 clocked WRITE port RA A RD Reg[A] new Reg[A] tPD tPD CLK WE A WA WD new Reg[A] tS th What if (say) WA=RA1??? RD1 reads “old” value of Reg[RA1] until next clock edge! 6.004 – Fall 2002 10/29/0 2 L15 – Building a Beta 7 Starting point: ALU Ops 32-bit (4-byte) ADD instruction: 10000000100000100001100000000000 OpCode Rc Ra Rb (unused) Means, to BETA, Reg[R4] ← Reg[R2] + Reg[R3] First, we’ll need hardware to: • Read next 32-bit instruction • DECODE instruction: ADD, SUB, XOR, etc • READ operands (Ra, Rb) from Register File; • PERFORM indicated operation; • WRITE result back into Register File (Rc). 6.004 – Fall 2002 10/29/0 2 L15 – Building a Beta 8 Instruction Fetch/Decode • Use a counter to FETCH the next instruction: PROGRAM COUNTER (PC) PC 00 A Instruction 32 +4 Memory D 32 32 INSTRUCTION WORD FIELDS OPCODE <31:26> Control Logic CONTROL SIGNALS 6.004 – Fall 2002 10/29/0 2 • use PC as memory address • add 4 to PC, load new value at end of cycle • fetch instruction from memory º use some instruction fields directly (register numbers, 16-bit constant) º use bits <31:26> to generate controls L15 – Building a Beta 9 ALU Op Data Path PC 1 0 XXXX 00 A Instruction Memory Rc Ra Rb unused Reg[Rc] ← Reg[Ra] op Reg[Rb] OP: D +4 Ra: <20:16> Rc: <25:21> Rb: <15:11> RA1 RA2 Register File WA RD1 WD RD2 32 WE WERF 32 Control Logic A ALUFN ALU B ALUFN WERF 32 6.004 – Fall 2002 10/29/0 2 L15 – Building a Beta 10 ALU Operations (w/constant) 1 1 XXXX PC OPC: 00 A Rc Ra literal C (signed) Reg[Rc] ← Reg[Ra] op SXT(C) Instruction Memory D +4 Ra: <20:16> RA1 Rc: <25:21> WA WA RD1 Rb: <15:11> Register File RA2 WD RD2 WE WERF C: SXT(<15:0>) 32 1 0 BSEL Control Logic A BSEL ALUFN ALU B ALUFN WERF 6.004 – Fall 2002 10/29/0 2 L15 – Building a Beta 11 Load Instruction 011000 PC LD: 00 A +4 Rc Ra literal C (signed) Reg[Rc] ← Mem[Reg[Ra]+SXT(C)] Instruction Memory D Ra: <20:16> Rb: <15:11> Register File RA1 Rc: <25:21> WA WA RD1 RA2 WD RD2 WE WERF C: SXT(<15:0>) 1 0 BSEL Control Logic A BSEL WDSEL ALUFN Wr WERF ALUFN ALU B WD R/W Wr Data Memory 32 Adr RD 32 0 6.004 – Fall 2002 10/29/0 2 1 2 WDSEL L15 – Building a Beta 12 Store Instruction 011001 PC Rc 00 A literal C (signed) Mem[Reg[Ra]+SXT(C)] ← Reg[Rc] ST: Instruction Memory Ra D +4 Ra: <20:16> Rc: <25:21> Rb: <15:11> 0 Register File RA1 Rc: <25:21> WA WA RD1 1 RA2SEL RA2 WD RD2 WE WERF C: SXT(<15:0>) 1 0 BSEL Control Logic 32 RA2SEL A BSEL WDSEL ALUFN Wr ALUFN ALU B WD R/W Wr Data Memory Adr RD WERF 0 6.004 – Fall 2002 10/29/0 2 1 2 WDSEL L15 – Building a Beta 13 JMP Instruction JT PCSEL 4 3 2 PC 1 0 011011 Rc JMP: 00 A Ra literal C (signed) Reg[Rc] ← PC+4; PC ← Reg[Ra] Instruction Memory D +4 Ra: <20:16> Rc: <25:21> Rb: <15:11> 0 Register File RA1 Rc: <25:21> WA WA RD1 C: SXT(<15:0>) 1 RA2SEL RA2 WD RD2 WE WERF JT 1 0 BSEL Control Logic PCSEL RA2SEL A BSEL WDSEL ALUFN Wr ALUFN ALU B WD R/W Wr Data Memory Adr RD WERF PC+4 32 0 6.004 – Fall 2002 10/29/0 2 1 2 WDSEL L15 – Building a Beta 14 BEQ/BNE Instructions JT PCSEL 4 3 011101 2 1 0 Rc Ra literal C (signed) Reg[Rc] ← PC+4; if Reg[Ra]=0 then PC ← PC+4+4*SXT(C) BEQ: 32 PC 00 011110 Instruction Memory A Rc Ra literal C (signed) Reg[Rc] ← PC+4; if Reg[Ra]≠0 then PC ← PC+4+4*SXT(C) BNE: D +4 Ra: <20:16> Rb: <15:11> Rc: <25:21> 0 + PC+4+4*SXT(C) 4*SXT(<15:0>) Z Register File RA1 Rc: <25:21> WA WA RD1 Z 1 RA2SEL RA2 WD RD2 WE WERF JT C: SXT(<15:0>) 1 0 BSEL Control Logic PCSEL RA2SEL BSEL WDSEL ALUFN Wr WERF A ALUFN ALU B WD R/W Wr Data Memory Adr RD PC+4 0 6.004 – Fall 2002 10/29/0 2 1 2 WDSEL L15 – Building a Beta 15 Load Relative Instruction 011111 LDR: Rc Ra literal C (signed) Reg[Rc] ← Mem[PC + 4+ 4*SXT(C)] Hey, WAIT A MINUTE. What’s Load Relative good for anyway??? I thought • Code is “PURE”, i.e. READ-ONLY; and stored in a “PROGRAM” region of memory; • Data is READ-WRITE, and stored either • On the STACK (local); or • In some GLOBAL VARIABLE region; or C: C: BETA: BETA: • In a global storage HEAP. So why an instruction designed to load data that’s “near” the instruction??? Addresses & other large constants 6.004 – Fall 2002 10/29/0 2 c1: c1: XX == XX ** 123456; 123456; LD(X, LD(X, r0) r0) LDR(c1, LDR(c1, r1) r1) MUL(r0, r1, MUL(r0, r1, r0) r0) ST(r0, X) ST(r0, X) ... ... LONG(123456) LONG(123456) L15 – Building a Beta 16 LDR Instruction JT PCSEL 4 3 2 1 IF PC 011111 0 Rc literal C (signed) Reg[Rc] ← Mem[PC + 4 + 4*SXT(C)] LDR: 00 Ra Instruction Memory A D +4 Ra: <20:16> Rc: <25:21> Rb: <15:11> 0 + Register File RA1 Rc: <25:21> WA WA RD1 Z PC+4+4*SXT(C) 1 RA2SEL RA2 WD RD2 WE WERF JT C:SXT( <15:0>) Z ASEL 1 0 1 0 BSEL Control Logic PCSEL RA2SEL ASEL BSEL WDSEL ALUFN Wr WERF A ALUFN ALU B WD R/W Wr Data Memory Adr RD PC+4 0 6.004 – Fall 2002 10/29/0 2 1 2 WDSEL L15 – Building a Beta 17 Exception Processing Plan: • Interrupt running program • Invoke exception handler (like a procedure call) • Return to continue execution. We’d like RECOVERABLE INTERRUPTS for • Synchronous events, generated by CPU or system FAULTS (eg, Illegal Instruction, divide-by-0, illegal mem address) TRAPS & system calls (eg, read-a-character) • Asynchronous events, generated by I/O (eg, key struck, packet received, disk transfer complete) KEY: TRANSPARENCY to interrupted program. • Most difficult for asynchronous interrupts 6.004 – Fall 2002 10/29/0 2 L15 – Building a Beta 18 Implementation… How exceptions work: • Don’t execute current instruction • Instead fake a “forced” procedure call • save current PC (actually current PC + 4) • load PC with exception vector • 0x4 for synch. exception, 0x8 for asynch. exceptions Question: where to save current PC + 4? • Our approach: reserve a register (R30, aka XP) • Prohibit user programs from using XP. Why? IllOp: PUSH(XP) Example: DIV unimplemented LD(R31,A,R0) LD(R31,B,R1) DIV(R0,R1,R2) ST(R2,C,R31) Forced by hardware Fetch inst. at Mem[Reg[XP]–4] check for DIV opcode, get reg numbers perform operation in SW, fill result reg POP(XP) JMP(XP) 6.004 – Fall 2002 10/29/0 2 L15 – Building a Beta 19 ILL XAdr OP PCSEL 4 3 Exceptions JT 2 PC 1 0 Bad Opcode: Reg[XP] ← PC+4; PC ← “IllOp” Other: Reg[XP] ← PC+4; PC ←“Xadr” 00 A Instruction Memory D +4 Ra: <20:16> 0 + 1 WASEL XP Rc: <25:21> PC+4+4*SXT(C) Register File RA1 1 WA WA 0 RD1 Z IRQ Rc: <25:21> Rb: <15:11> RA2SEL RA2 WD RD2 WE WERF JT C: SXT(<15:0>) Z ASEL 1 0 1 0 BSEL Control Logic PCSEL RA2SEL ASEL BSEL WDSEL ALUFN Wr WERF WASEL A ALUFN ALU WD R/W Wr Data Memory Adr RD PC+4 0 6.004 – Fall 2002 B 10/29/0 2 1 2 WDSEL L15 – Building a Beta 20 — 1 — 0 0 — 4 — 1 LDR — “A” — 1 1 1 — — — 0 2 0 0 0 0 — — — Z ? 0 : 1 0 3 — 1 — 0 0 1 BNE trap — 1 — 0 0 — Z ? 1 : 0 — 0 Illop ALUFN F(op) F(op) “+” “+” — WERF 1 1 1 0 1 BSEL 0 1 1 1 — WDSEL 1 1 2 — 0 WR 0 0 0 1 0 RA2SEL 0 — — 1 — PCSEL 0 0 0 0 2 ASEL 0 0 0 0 — WASEL 0 0 0 — 0 BEQ JMP ST LD OPC OP Control Logic Implementation choices: • ROM indexed by opcode, external branch & trap logic • PLA • “random” logic (eg, standard cell gates) 6.004 – Fall 2002 10/29/0 2 L15 – Building a Beta 21 Beta: Our “Final Answer” ILL XAdr OP PCSEL 4 3 JT 2 PC 1 0 00 A Instruction Memory D +4 Ra: <20:16> 0 + 1 WASEL XP Rc: <25:21> PC+4+4*SXT(C) Register File RA1 1 WA WA 0 RD1 Z IRQ Rc: <25:21> Rb: <15:11> RA2SEL RA2 WD RD2 WE WERF JT C: SXT(<15:0>) Z ASEL 1 0 1 0 BSEL Control Logic PCSEL RA2SEL ASEL BSEL WDSEL ALUFN Wr WERF WASEL A ALUFN ALU WD R/W Wr Data Memory Adr RD PC+4 0 6.004 – Fall 2002 B 10/29/0 2 1 2 WDSEL L15 – Building a Beta 22 Next Time: Pipelined Betas Hey, building a computer is not really all that difficult 6.004 – Fall 2002 So let’s run down to the 6.004 lab and put Intel out of business! 10/29/0 2 L15 – Building a Beta 23