Designing a Single Cycle Datapath In this lecture, slides from lectures 3, 8 and 9 from the course Computer Architecture ECE 201 by Professor Mike Schulte are used with permission. The Big Picture: Where are We Now? ° The Five Classic Components of a Computer Processor Input Control Memory Datapath Output ° Today’s Topic: Design a Single Cycle Processor The Big Picture: The Performance Perspective ° Performance of a machine is determined by: CPI • Instruction count • Clock cycle time • Clock cycles per instruction Inst. Count Cycle Time ° Processor design (datapath and control) will determine: • Clock cycle time • Clock cycles per instruction ° Single cycle processor - one clock cycle per instruction • Advantages: Simple design, low CPI • Disadvantages: Long cycle time, which is limited by the slowest instruction. How to Design a Processor: step-by-step 1. Analyze instruction set => datapath requirements • the meaning of each instruction is given by register transfers R[rd] <– R[rs] + R[rt]; • datapath must include storage element for ISA registers • datapath must support each register transfer 2. Select set of datapath components and establish clocking methodology 3. Design datapath to meet the requirements 4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer. 5. Design the control logic MIPS Instruction set MIPS operands Name Example $s0-$s7, $t0-$t9, $zero, 32 registers $a0-$a3, $v0-$v1, $gp, $fp, $sp, $ra, $at Memory[0], 232 memory Memory[4], ..., words Accessed only by data transfer instructions. MIPS uses byte addresses, so sequential w ords differ by 4. Memory holds data structures, such as arrays, and spilled registers, such as those saved on procedure calls. Memory[4294967292] add MIPS assembly language Example Meaning add $s1, $s2, $s3 $s1 = $s2 + $s3 Three operands; data in registers subtract sub $s1, $s2, $s3 $s1 = $s2 - $s3 Three operands; data in registers $s1 = $s2 + 100 $s1 = Memory[$s2 + 100] Memory[$s2 + 100] = $s1 $s1 = Memory[$s2 + 100] Memory[$s2 + 100] = $s1 Used to add constants Category Arithmetic Comments Fast locations for data. In MIPS, data must be in registers to perform arithmetic. MIPS register $zero alw ays equals 0. Register $at is reserved for the assembler to handle large constants. Instruction addi $s1, $s2, 100 lw $s1, 100($s2) sw $s1, 100($s2) store word lb $s1, 100($s2) load byte sb $s1, 100($s2) store byte load upper immediate lui $s1, 100 add immediate load word Data transfer Conditional branch Unconditional jump $s1 = 100 * 2 16 Comments Word from memory to register Word from register to memory Byte from memory to register Byte from register to memory Loads constant in upper 16 bits branch on equal beq $s1, $s2, 25 if ($s1 == $s2) go to PC + 4 + 100 Equal test; PC-relative branch branch on not equal bne $s1, $s2, 25 if ($s1 != $s2) go to PC + 4 + 100 Not equal test; PC-relative set on less than slt $s1, $s2, $s3 if ($s2 < $s3) $s1 = 1; else $s1 = 0 Compare less than; for beq, bne set less than immediate slti jump j jr jal jump register jump and link $s1, $s2, 100 if ($s2 < 100) $s1 = 1; Compare less than constant else $s1 = 0 2500 $ra 2500 Jump to target address go to 10000 For switch, procedure return go to $ra $ra = PC + 4; go to 10000 For procedure call Review: The MIPS Instruction Formats ° All MIPS instructions are 32 bits long. The three instruction formats are: 31 • R-type 26 op rs 6 bits • I-type • J-type 31 21 op 5 bits 11 6 0 rd shamt funct 5 bits 5 bits 6 bits 16 0 immediate rt 5 bits 16 bits 26 op 6 bits ° The different fields are: • • • • • • 5 bits 21 rs 6 bits 31 rt 5 bits 26 16 0 target address 26 bits op: operation of the instruction rs, rt, rd: the source and destination register specifiers shamt: shift amount funct: selects the variant of the operation in the “op” field address / immediate: address offset or immediate value target address: target address of the jump instruction Translating MIPS Assembly into Machine Language ° Humans see instructions as words (assembly language), but the computer sees them as ones and zeros (machine language). ° An assembler translates from assembly language to machine language. ° For example, the MIPS instruction add $t0, $s1, $s2 is translated as follows (see back of book): Assembly Comment add op = 0, shamt = 0, funct = 32 $t0 rd = 8 $s1 rs = 17 $s2 rt = 18 000000 op 10001 10010 rs rt 01000 00000 rd shamt 100000 funct MIPS Addressing Modes ° Addressing modes specify where the data used by an instruction is located. mode example action register direct add $s1, $s2, $s3 $s1 = $s2 + $s3 immediate addi $s1, $s2, 200 $s1 = $s2 + 200 base+index lw $s1 = mem[200 + $s2] PC-relative beq $s1, $s2, 200 $s1, 200($s2) if ($s1 == $s2) PC = PC+4+200*4 Pseudo-direct j 4000 PC = (PC[31:28], 4000*4) ° Often, the type of addressing mode depends on the type of operation being performed (e.g., branches all use PC relative) ° A summary of MIPS addressing modes is given on the back cover of the book. MIPS Addressing Modes/Instruction Formats • All MIPS instructions are 32 bits wide - fixed length Register (direct) op rs rt add $s1, $s2, $s3 rd register Immediate Base+index op rs rt immed op rs rt immed register addi $s1, $s2, 200 Memory + lw $s1, 200($s2) PC-relative op rs PC rt immed Memory + beq $s1, $s2, 200 Step 1a: The MIPS Subset for Today ° ADD and SUB 31 op • addu rd, rs, rt • subu rd, rs, rt ° OR Immediate: • ori rt, rs, imm16 ° LOAD and STORE 26 op 31 • beq rs, rt, imm16 5 bits 21 26 op 6 bits 5 bits 16 5 bits 0 rd shamt funct 5 bits 5 bits 6 bits 0 16 bits 0 immediate 5 bits 21 rs 6 immediate rt 5 bits 11 16 rt rs 6 bits 5 bits 21 rs 6 bits 31 26 16 rt 5 bits 26 op • lw rt, rs, imm16 • sw rt, rs, imm16 ° BRANCH: rs 6 bits 31 21 16 bits 16 rt 5 bits 0 immediate 16 bits Register Transfer Logic (RTL) ° RTL gives the meaning of the instructions ° All instructions start by fetching the instruction op | rs | rt | rd | shamt | funct = MEM[ PC ] op | rs | rt | Imm16 = MEM[ PC ] inst Register Transfers addu R[rd] <– R[rs] + R[rt]; PC <– PC + 4 subu R[rd] <– R[rs] – R[rt]; PC <– PC + 4 ori R[rt] <– R[rs] + zero_ext(imm16); PC <– PC + 4 load R[rt] <– MEM[ R[rs] + sign_ext(imm16)]; PC <– PC + 4 store MEM[ R[rs] + sign_ext(imm16) ] <– R[rt]; PC <– PC + 4 beq if ( R[rs] == R[rt] ) then PC <– PC + 4 + sign_ext(imm16)] || 00 else PC <– PC + 4 Step 1: Requirements of the Instruction Set ° Memory • instruction & data ° Registers (32 x 32) • read rs • read rt • write rt or rd ° PC ° Extender (sign extend or zero extend) ° Add and sub register or extended immediate ° Add 4 or shifted extended immediate to PC Step 2: Components of the Datapath ° Adder CarryIn A Adder B 32 32 Sum Carry 32 Select ° MUX B 32 MUX A 32 ° ALU OP A 32 32 ALU B 32 Y 3 32 Result Combinational Logic: Does not use a clock Storage Element: Register (Basic Building Blocks) ° Register • Similar to the D Flip Flop except - N-bit input and output Write Enable Data In N Data Out N - Write enable input • Write Enable: Clk - negated (0): Data Out will not change - asserted (1): Data Out will become Data In on the falling edge of the clock Clocking Methodology - Negative Edge Triggered Clk Setup Hold Setup Hold . . . . . . Don’t Care . . . . . . ° All storage elements are clocked by the same clock edge ° Cycle Time = CLK-to-Q + Longest Delay Path + Setup + Clock Skew Storage Element: Register File ° Register File consists of 32 registers: Write Enable RW RARB 5 5 5 • Two 32-bit output busses: busA busA and busB busW 32 32 32-bit • One 32-bit input bus: busW 32 Registers busB Clk ° Register is selected by: 32 • RA (number) selects the register to put on busA (data) • RB (number) selects the register to put on busB (data) • RW (number) selects the register to be written via busW (data) when Write Enable is 1 ° Clock input (CLK) • The CLK input is a factor ONLY during write operation • During read operation, behaves as a combinational logic block: - RA or RB valid => busA or busB valid after “access time.” Register File - Read ° Built using D flip-flops Read register number 1 Register 0 Register 1 Read register number 1 Read register number 2 Write register Write data Register file Read data 1 Read data 2 M ... u Register n – 2 x Read data 1 Register n – 1 Read register number 2 Write M u x Read data 2 Register File - write Write C 0 1 Register number n-to-2n decoder Register 0 . .. D C Register 1 n–1 n D .. . C Register n – 2 D C Register n – 1 Register data D Storage Element: Idealized Memory ° Memory (idealized) Write Enable • One input bus: Data In • One output bus: Data Out ° Memory word is selected by: Data In 32 Clk Address 32 DataOut 32 • Address selects the word to put on Data Out • Write Enable = 1: address selects the memory word to be written via the Data In bus ° Clock input (CLK) • The CLK input is a factor ONLY during write operation • During read operation, memory behaves as a combinational logic block: - Address valid => Data Out valid after “access time.” Step 3 ° Register Transfer Requirements –> Datapath Design • Instruction Fetch • Decode instructions and Read Operands • Execute Operation • Write back the result 3a: Overview of the Instruction Fetch Unit ° The common RTL operations • Fetch the Instruction: mem[PC] • Update the program counter: - Sequential Code: PC <- PC + 4 - Branch and Jump: PC <- “something else” Clk PC Next Address Logic Address Instruction Memory Instruction Word 32 3b: Add & Subtract ° R[rd] <- R[rs] op R[rt] Example: addu rd, rs, rt • Ra, Rb, and Rw come from instruction’s rs, rt, and rd fields • ALUctr and RegWr: control logic after decoding the instruction 31 26 21 op 16 rs 6 bits rt 5 bits Rd Rs RegWr 5 5 5 bits Rt rd shamt funct 5 bits 5 bits 6 bits busA 32 busB 32 ALU 32 Clk 32 32-bit Registers 6 ALUctr 3 5 Rw Ra Rb busW 11 Result 32 0 3c: Logical Operations with Immediate ° R[rt] <- R[rs] op ZeroExt[imm16] 31 26 op Example : ori rt, rs, imm16 21 rs 6 bits rt 5 bits RegDst 0 immediate 5 bits 31 Rd 11 16 16 15 16 bits rd? 0 immediate 0000000000000000 16 bits Rt 16 bits Mux RegWr 5 32 Clk ALUctr 3 busA Rw Ra Rb 32 32 32-bit Registers ALU busW Rs 5 5 busB 16 ZeroExt imm16 Mux 32 32 ALUSrc Result 32 3d: Load Operations ° R[rt] <- Mem[R[rs] + SignExt[imm16]] 31 26 op rs 6 bits Rd Rt RegDst Mux RegWr 5 32 Clk rt 5 bits Rs 5 5 16 bits rd ALUctr 3 busA W_Src 32 ALUSrc Mux Extender 32 ExtOp 32 MemWr Mux busB 32 16 0 immediate 5 bits Rw Ra Rb 32 32-bit Registers imm16 11 16 ALU busW 21 Example: lw rt, rs, imm16 WrEn Adr Data In 32 Clk Data Memory 32 3e: Store Operations ° Mem[ R[rs] + SignExt[imm16] <- R[rt] ] Example: sw rt, rs, imm16 31 26 21 op rs 6 bits Rd RegDst 16 0 rt 5 bits immediate 5 bits Rt 16 bits ALUctr MemWr W_Src Mux RegWr 5 32 Clk 5 5 Rw Ra Rb 32 32-bit Registers 32 32 ExtOp 32 Mux busB 32 Mux 16 busA Extender imm16 3 Rt ALU busW Rs WrEn Adr Data In 32 Clk ALUSrc Data Memory 32 3f: The Branch Instruction 31 26 op 6 bits 21 rs 5 bits 16 rt 5 bits 0 immediate 16 bits ° beq rs, rt, imm16 • mem[PC] Fetch the instruction from memory • Equal <- R[rs] == R[rt] Calculate the branch condition • if (COND eq 0) - Calculate the next instruction’s address PC <- PC + 4 + ( SignExt(imm16) x 4 ) else - PC <- PC + 4 Datapath for Branch Operations rs, rt, imm16 31 Datapath generates condition (equal) 26 21 op rs 6 bits 16 rt 5 bits 0 immediate 5 bits 16 bits Inst Address nPC_sel 4 Adder RegWr 5 00 32 PC Mux Adder PC Ext imm16 Cond Clk busW Clk Rs 5 Rt 5 Rw Ra Rb 32 32-bit Registers busA 32 busB 32 Equal? ° beq Putting it All Together: A Single Cycle Datapath nPC_sel RegDst 00 Rs 5 Rt 5 imm16 16 Extender Clk = 32 0 1 32 Data In 32 ExtOp Clk ALUSrc 32 0 Mux busA Rw Ra Rb 32 32-bit Registers busB 32 Clk MemtoReg 3 Mux PC Mux Adder PC Ext imm16 ALUctr MemWr Equal ALU Adder 32 Imm16 0 RegWr 5 busW Rd Rd Rt 1 4 Rt Instruction<31:0> <0:15> Rs <11:15> Adr <16:20> <21:25> Inst Memory WrEn Adr Data Memory 1 Step 4: Given Datapath: RTL -> Control Instruction<31:0> Rd <0:15> Rs <11:15> Rt <16:20> Op Fun <21:25> <0:5> Adr <26:31> Inst Memory Imm16 Control nPC_selRegWr RegDst ExtOp ALUSrc ALUctr MemWr MemtoReg DATA PATH Equal A Single Cycle Datapath ° We have everything except control signals (underlined) • Today’s lecture will look at how to generate the control signals Instruction<31:0> 1 Mux 0 RegWr 5 Rs 5 Rt Rt ALUctr 5 busA 0 1 32 MemtoReg 0 32 WrEn Adr Data In 32 Clk Imm16 MemWr 32 ALUSrc ExtOp Rd Data Memory Mux 16 Extender imm16 32 Mux 32 Clk Rw Ra Rb 32 32-bit Registers busB 32 ALU busW Zero Rs <0:15> Clk <11:15> RegDst Rt <16:20> Rd Instruction Fetch Unit <21:25> nPC_sel 1 Meaning of the Control Signals ° MemWr: write memory ° ExtOp: “zero”, “sign” ° ALUsrc: 0 => regB; 1 => immed ° RegDst: 0 => “rt”; 1 => “rd” ° ALUctr: “add”, “sub”, “or” ° RegWr: write dest register RegDst Rs 5 Rt 5 32 1 32 Data In 32 ExtOp Clk ALUSrc 32 0 Mux 0 ALU 16 Extender imm16 = Mux busA Rw Ra Rb 32 32-bit Registers busB 32 Clk MemtoReg 3 0 RegWr 5 32 ALUctr MemWr Equal Rd Rt 1 busW ° MemtoReg: 0 => ALU; 1 => Mem WrEn Adr Data Memory 1 RTL: The Add Instruction 31 26 op 6 bits 21 rs 5 bits 16 rt 5 bits 11 6 0 rd shamt funct 5 bits 5 bits 6 bits ° add rd, rs, rt • mem[PC] Fetch the instruction from memory • R[rd] <- R[rs] + R[rt] The actual operation • PC <- PC + 4 Calculate the next instruction’s address The Single Cycle Datapath during Add/Sub 31 26 21 op rs 16 rt 11 6 rd shamt ° R[rd] <- R[rs] op R[rt] <0:15> Imm16 MemtoReg = 0 Zero 1 32 0 32 32 WrEn Adr Data In 32 Clk ALUSrc = 0 ExtOp = x MemWr = 0 Data Memory Mux Extender 16 Rd 5 busA Rw Ra Rb 32 32 32-bit Registers busB 0 32 imm16 Rs <11:15> Rt <16:20> 5 ALUctr = Add Rt Mux 32 Clk Rs ALU busW Clk 1 Mux 0 RegWr = 1 5 Instruction Fetch Unit <21:25> RegDst = 1 Rt funct Instruction<31:0> nPC_sel= +4 Rd 0 1 Instruction Fetch Unit at the End of Add ° PC <- PC + 4 • This is the same for all instructions except Branch and Jump Inst Memory Adr nPC_sel = +4 4 00 Adder imm16 PC Mux Adder Clk Instruction<31:0> The Single Cycle Datapath during Load 31 26 21 op rs 16 0 rt immediate ° R[rt] <- Data Memory {R[rs] + SignExt[imm16]} Instruction<31:0> 5 ALUctr = Add Rt 5 busA Rw Ra Rb 32 32 32-bit Registers busB 0 32 Rt Zero 32 Imm16 MemtoReg = 1 MemWr = 0 0 1 WrEn Adr Data In 32 Clk ALUSrc = 1 ExtOp = 1 Rd Mux ALU 16 Extender imm16 1 Rs 32 Mux 32 Clk Rs <0:15> 1 Mux 0 RegWr = 1 5 busW Clk <11:15> RegDst = 0 Rt <16:20> Rd Instruction Fetch Unit <21:25> nPC_sel= +4 Data Memory 32 The Single Cycle Datapath during Store 31 26 21 op rs 16 0 rt immediate ° Data Memory {R[rs] + SignExt[imm16]} <- R[rt] Instruction<31:0> 5 ALUctr = Add Rt 5 Zero Imm16 MemWr = 1 0 32 32 WrEn Adr Data In 32 32 Clk ALUSrc = 1 ExtOp = 1 Rd MemtoReg = x ALU 16 Extender imm16 1 Rs Data Memory Mux busA Rw Ra Rb 32 32 32-bit Registers busB 0 32 Rt Mux 32 Clk Rs <0:15> 1 Mux 0 RegWr = 0 5 busW Clk <11:15> RegDst = x Rt <16:20> Rd Instruction Fetch Unit <21:25> nPC_sel= +4 1 The Single Cycle Datapath during Branch 31 26 21 op rs 16 0 rt immediate ° if (R[rs] - R[rt] == 0) then Zero <- 1 ; else Zero <- 0 5 5 ALUctr = Subtract Rt 5 Zero ALU 16 Extender imm16 1 32 Rd 0 32 WrEn Adr Data In 32 Clk Imm16 MemtoReg = x MemWr = 0 32 ALUSrc = 0 ExtOp = x Rs Data Memory Mux busA Rw Ra Rb 32 32 32-bit Registers busB 0 32 Rt Mux 32 Clk Rs <0:15> 1 Mux 0 RegWr = 0 busW Clk <11:15> RegDst = x Rt <16:20> Rd Instruction Fetch Unit <21:25> nPC_sel= “Br” Instruction<31:0> 1 Instruction Fetch Unit at the End of Branch 31 26 21 op rs 16 rt 0 immediate ° if (Zero == 1) then PC = PC + 4 + SignExt[imm16]*4 ; Inst Memory else PC = PC + 4 Instruction<31:0> Adr nPC_sel See book for what the datapath and control looks like for jump instructions. 4 00 Adder imm16 PC Mux Adder Clk Compared to book our processor also supports the ORI instructions. A Summary of the Control Signals See Appendix A We Don’t Care :-) func 10 0000 10 0010 op 00 0000 00 0000 00 1101 10 0011 10 1011 00 0100 00 0010 add sub ori lw sw beq jump RegDst 1 1 0 0 x x x ALUSrc 0 0 1 1 1 0 x MemtoReg 0 0 0 1 x x x RegWrite 1 1 1 1 0 0 0 MemWrite 0 0 0 0 1 0 0 nPCsel 0 0 0 0 0 1 0 Jump 0 0 0 0 0 0 1 ExtOp x x 0 1 1 x x Add Subtract Or Add Add Subtract xxx ALUctr<2:0> 31 26 21 16 R-type op rs rt I-type op rs rt J-type op 11 rd 6 shamt immediate target address 0 funct add, sub ori, lw, sw, beq jump Step 5: The Concept of Local Decoding op 00 0000 00 1101 10 0011 10 1011 00 0100 00 0010 R-type ori lw sw beq jump RegDst 1 0 0 x x x ALUSrc 0 1 1 1 0 x MemtoReg 0 0 1 x x x RegWrite 1 1 1 0 0 0 MemWrite 0 0 0 1 0 0 Branch 0 0 0 0 1 0 Jump 0 0 0 0 0 1 ExtOp x 0 1 1 x x “R-type” Or Add Add Subtract xxx ALUop<N:0> func op 6 Main Control 6 ALUop ALUctr 3 ALU N ALU Control (Local) The Encoding of ALUop func op 6 Main Control ALU Control (Local) 6 ALUop N ALUctr 3 ° In this exercise, ALUop has to be N=2 bits wide to represent: • (1) “R-type” instructions • “I-type” instructions that require the ALU to perform: - (2) Or, (3) Add, and (4) Subtract ° To implement the full MIPS ISA, ALUop has to be 3 bits to represent: • (1) “R-type” instructions • “I-type” instructions that require the ALU to perform: - (2) Or, (3) Add, (4) Subtract (5) And (6) Set on < ALUop (Symbolic) ALUop<2:0> R-type ori lw sw “R-type” Or Add 0 10 0 00 1 00 beq jump Add Subtract xxx 0 00 0 01 xxx The Decoding of the “func” Field func op 6 ALU Control (Local) 6 ALUop Main Control N ALUop (Symbolic) R-type ori lw sw “R-type” Or Add 0 10 0 00 1 00 26 op 21 rs Get func from back of book for R-type funct<5:0> 3 R-type ALUop<2:0> 31 ALUctr Instruction Operation 16 rt beq jump Add Subtract xxx 0 00 0 01 xxx 11 rd 6 shamt 0 funct Our processor only implements subset of operations ALUctr<2:0> ALU Operation 10 0000 add 000 Add 10 0010 subtract 001 Subtract 10 0100 and 010 And 10 0101 or 110 Or 10 1010 set-on-less-than 111 Set-on-less-than The Truth Table for ALUctr<2> R-type ALUop (Symbolic) “R-type” ALUop<2:0> 1 00 funct<3:0> Instruction Op. 0000 add beq 0010 subtract ori lw sw Or Add Add Subtract 0100 and 0 10 0 00 0 00 0 01 0101 or 1010 set-on-less-than ALUop func bit<2> bit<1> bit<0> bit<3> bit<2> bit<1> bit<0> ALU Operation ALUctr bit<2> bit<1> bit<0> 0 0 0 x x x x Add 0 1 0 0 x 1 x x x x Subtract 1 1 0 0 1 x x x x x Or 0 0 1 1 x x 0 0 0 0 Add 0 1 0 1 x x 0 0 1 0 Subtract 1 1 0 1 x x 0 1 0 0 And 0 0 0 1 x x 0 1 0 1 Or 0 0 1 1 x x 1 0 1 0 Set on < 1 1 1 This control is for more R-type instructions than our processor, but fewer than the entire MIPS ISA. The Logic Equation for ALUctr<2> ALUop func bit<2> bit<1> bit<0> bit<3> bit<2> bit<1> bit<0> ALUctr<2> 0 x 1 x x x x 1 1 x x 0 0 1 0 1 1 x x 1 0 1 0 1 ° ALUctr<2> = !ALUop<2> & ALUop<0> + ALUop<2> & func<1> The Truth Table for ALUctr <1> ALUop func bit<2> bit<1> bit<0> bit<3> bit<2> bit<1> bit<0> ALU Operation ALUctr bit<2> bit<1> bit<0> 0 0 0 x x x x Add 0 1 0 0 x 1 x x x x Subtract 1 1 0 0 1 x x x x x Or 0 0 1 1 x x 0 0 0 0 Add 0 1 0 1 x x 0 0 1 0 Subtract 1 1 0 1 x x 0 1 0 0 And 0 0 0 1 x x 0 1 0 1 Or 0 0 1 1 x x 1 0 1 0 Set on < 1 1 1 The Logic Equation for ALUctr<1> ALUop func bit<2> bit<1> bit<0> bit<3> bit<2> bit<1> bit<0> ALUctr<1> 0 0 0 x x x x 1 0 x 1 x x x x 1 1 x x 0 0 0 0 1 1 x x 0 0 1 0 1 1 x x 1 0 1 0 1 ° ALUctr<1> = !ALUop<2> & !ALUop<1> + ALUop<2> & func<2> The Truth Table for ALUctr<0> ALUop func bit<2> bit<1> bit<0> bit<3> bit<2> bit<1> bit<0> ALU Operation ALUctr bit<2> bit<1> bit<0> 0 0 0 x x x x Add 0 1 0 0 x 1 x x x x Subtract 1 1 0 0 1 x x x x x Or 0 0 1 1 x x 0 0 0 0 Add 0 1 0 1 x x 0 0 1 0 Subtract 1 1 0 1 x x 0 1 0 0 And 0 0 0 1 x x 0 1 0 1 Or 0 0 1 1 x x 1 0 1 0 Set on < 1 1 1 The Logic Equation for ALUctr<0> ALUop func bit<2> bit<1> bit<0> bit<3> bit<2> bit<1> bit<0> ALUctr<0> 0 1 x x x x x 1 1 x x 0 1 0 1 1 1 x x 1 0 1 0 1 ° ALUctr<0> = !ALUop<2> & ALUop<1> + ALUop< 2> & func<2> & func<0> + ALUop<2> & func<3> The ALU Control Block func 6 ALUop 3 ALU Control (Local) ALUctr 3 ° ALUctr<2> = !ALUop<2> & ALUop<0> + ALUop<2> & func<1> ° ALUctr<1> = !ALUop<2> & !ALUop<1> + ALUop<2> & func<2> ° ALUctr<0> = !ALUop<2> & ALUop<1> + ALUop< 2> & func<2> & func<0> + ALUop<2> & func<3> The “Truth Table” for the Main Control RegDst func ALUSrc op 6 Main Control ALU Control (Local) 6 : ALUop ALUctr 3 3 op 00 0000 00 1101 10 0011 10 1011 00 0100 00 0010 R-type ori lw sw beq jump RegDst 1 0 0 x x x ALUSrc 0 1 1 1 0 x MemtoReg 0 0 1 x x x RegWrite 1 1 1 0 0 0 MemWrite 0 0 0 1 0 0 Branch 0 0 0 0 1 0 Jump 0 0 0 0 0 1 ExtOp x 0 1 1 x x “R-type” Or Add Add Subtract xxx ALUop <2> 1 0 0 0 0 x ALUop <1> 0 1 0 0 0 x ALUop <0> 0 0 0 0 1 x ALUop (Symbolic) The “Truth Table” for RegWrite op 00 0000 00 1101 10 0011 10 1011 00 0100 00 0010 R-type ori lw sw beq jump 1 1 1 0 0 0 RegWrite ° RegWrite = R-type + ori + lw = !op<5> & !op<4> & !op<3> & !op<2> & !op<1> & !op<0> (R-type) + !op<5> & !op<4> & op<3> & op<2> & !op<1> & op<0> (ori) + op<5> & !op<4> & !op<3> & !op<2> & op<1> & op<0> (lw) .. op<5> .. op<5> <0> R-type .. op<5> <0> ori .. op<5> <0> lw .. <0> sw .. op<5> op<5> <0> beq op<0> jump RegWrite PLA Implementation of the Main Control .. op<5> .. op<5> <0> R-type .. op<5> <0> ori .. op<5> <0> lw .. <0> sw .. op<5> op<5> <0> beq jump op<0> RegWrite ALUSrc RegDst MemtoReg MemWrite Branch Jump ExtOp ALUop<2> ALUop<1> ALUop<0> Putting it All Together: A Single Cycle Processor ALUop 3 RegDst op 6 Instr<31:26> Main Control Rs 5 Rt Rt ALUctr 5 busA 0 1 32 MemtoReg 0 32 WrEn Adr Data In 32 Clk Imm16 MemWr 32 ALUSrc ExtOp Rd Data Memory Mux 16 Extender imm16 Instr<15:0> 32 Mux 32 Clk Rw Ra Rb 32 32-bit Registers busB 32 ALU busW Zero Rs <0:15> Clk <11:15> Instruction Fetch Unit 1 Mux 0 RegWr 5 3 <16:20> RegDst Rt ALUctr Instruction<31:0> nPC_sel <21:25> Rd func Instr<5:0> 6 ALUSrc : ALU Control 1 An abstract view of the critical path - load instruction Ideal Instruction Memory Instruction Rd Rs 5 5 Instruction Address Rt 5 Imm 16 A 32 32 32-bit Registers PC 32 Rw Ra Rb 32 ALU B Clk Clk Next Address Critical Path (Load Operation) = PC’s Clk-to-Q + Instruction Memory’s Access Time + Register File’s Access Time + ALU to Perform a 32-bit Add + Data Memory Access Time + Setup Time for Register File Write + Clock Skew 32 Data Address Data In Ideal Data Memory Clk Worst case delay for load is much longer than needed for all other instructions, yet this sets the cycle time. An Abstract View of the Implementation Control Ideal Instruction Memory Instruction Rd Rs 5 5 A 32 32 32-bit Registers PC 32 Rw Ra Rb 32 B Clk Clk Conditions Rt 5 ALU Next Address Instruction Address Control Signals 32 Datapath ° Logical vs. Physical Structure Data Address Data In Clk Ideal Data Memory Data Out Summary ° 5 steps to design a processor • 1. Analyze instruction set => datapath requirements • 2. Select set of datapath components & establish clock methodology • 3. Design datapath meeting the requirements • 4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer. • 5. Design the control logic ° MIPS makes it easier • • • • Instructions same size Source registers always in same place Immediates same size, location Operations always on registers/immediates ° Single cycle datapath => CPI=1, CCT => long ° Next time: implementing control