ECE7660 Advanced Computer Architecture Review of Single Cycle Datapath Design (excepted from ece4680 lecture notes) Processor Design: How to Implement MIPS Simplicity favors regularity ECE4680 Datapath.1 Fall 2005 The MIPS Instruction Formats °All MIPS instructions are 32 bits long. The three instruction formats: 31 • R-type • I-type • J-type 26 op 6 bits 31 21 rs 5 bits 26 op 6 bits 31 16 rt 5 bits 21 rs 5 bits 11 rd 5 bits 6 shamt 5 bits 0 funct 6 bits 16 rt 5 bits 0 immediate 16 bits 26 op 6 bits 0 target address 26 bits °The different fields are: • op: operation of the instruction • rs, rt, rd: the source and destination register specifiers • shamt: shift amount • funct: selects the variant of the operation in the “op” field • address / immediate: address offset or immediate value • target address: target address of the jump instruction ECE4680 Datapath.2 Fall 2005 The MIPS Subset 31 °ADD and subtract • add rd, rs, rt 26 op 6 bits 21 rs 5 bits 16 rt 5 bits 26 op 6 bits 21 rs 5 bits 16 rt 5 bits 11 rd 5 bits 6 shamt 5 bits 0 funct 6 bits • sub rd, rs, rt 31 °OR Immediate: • ori rt, rs, imm16 0 immediate 16 bits °LOAD and STORE • lw rt, rs, imm16 • sw rt, rs, imm16 °BRANCH: • beq rs, rt, imm16 °JUMP: • j target 31 26 op 6 bits 0 target address 26 bits ECE4680 Datapath.3 Fall 2005 An Abstract View of the Implementation Two types of functional units –Operational element that operate on data (combinational) –State element that contain data (sequential) •Generic Implementation: Clk –use PC to supply instruction address –get the instruction from memory –read registers –use the instruction to decide exactly what to do PC Instruction Address Ideal Instruction Memory Rt 5 Rw Ra Rb 32 32-bit Registers •All instructions use the ALU after reading the registers Imm – Why? memory-reference? arithmetic? control flow? 16 32 32 ALU 32 Instruction Rd Rs 5 5 Clk 32 Data Address Data In Ideal Data Memory DataOut Clk Next step: to fill in the details: more units, more connections, and control unit ECE4680 Datapath.4 Fall 2005 Clocking Methodology Clk Setup Hold Setup Hold . . . . . . Don’t Care . . . . . . °All storage elements are clocked by the same clock edge • Edge-trigged: all stored values are updated on a clock edge °Cycle Time = Latch Prop + Longest Delay Path + Setup + Clock Skew °(Latch Prop + Shortest Delay Path - Clock Skew) > Hold Time ECE4680 Datapath.5 Fall 2005 An Abstract View of the Critical Path °Register file and ideal memory: • The CLK input is a factor ONLY during write operation • During read operation, behave as combinational logic: Clk Address valid => Output valid after “access time.” Critical Path (Load Operation) = PC’s prop time + Instruction Memory’s Access Time + Register File’s Access Time + ALU to Perform a 32-bit Add + Data Memory Access Time + Setup Time for Register File Write + Clock Skew PC Instruction Address Ideal Instruction Memory Rd Rs 5 5 Rt 5 Rw Ra Rb 32 32-bit Registers Imm 16 32 32 ALU 32 Instruction Clk 32 ECE4680 Datapath.6 Data Address Data In Ideal Data Memory DataOut Clk Fall 2005 The Steps of Designing a Processor °Instruction Set Architecture => Register Transfer Language °Register Transfer Language (RTL) => • Datapath components • Datapath interconnect °Datapath components => Control signals °Control signals => Control logic po Element < com nent ECE4680 Datapath.7 Fall 2005 What is RTL: The ADD Instruction Register Transfer Language °add rd, rs, rt • mem[PC] Fetch the instruction from memory • R[rd] <- R[rs] + R[rt] The ADD operation • PC <- PC + 4 Calculate the next instruction’s address ECE4680 Datapath.8 Fall 2005 What is RTL: The Load Instruction °lw rt, rs, imm16 • mem[PC] Fetch the instruction from memory • Addr <- R[rs] + SignExt(imm16) Calculate the memory address • R[rt] <- Mem[Addr] Load the data into the register • PC <- PC + 4 Calculate the next instruction’s address ECE4680 Datapath.9 Fall 2005 Combinational Logic Elements CarryIn °Adder A Adder B 32 Sum 32 Carry 32 °MUX (p.B-9,B-19) °Decoder Select 32 Y 32 3 Decoder B 32 MUX A out0 out1 out2 out7 °ALU OP A ECE4680 Datapath.10 32 ALU B 32 32 Result Zero Fall 2005 Storage Element: Register (p.B22-B25) °Register Write Enable • Similar to the D Flip Flop except - N-bit input and output N - Write Enable input • Write Enable: - Data Out Data In 0: Data Out will not change 1: Data Out will become Data In N Clk • Array of logical elements(see register file on next 2 slides) W tick ONLY if the ted at the clock da up is nt nte The co l is set to 1. rite Enable signa ECE4680 Datapath.11 Fall 2005 Storage Element: Register File °Register File consists of 32 registers: • Two 32-bit output busses: busA and busB • One 32-bit input bus: busW RW RA RB Write Enable 5 5 5 busW 32 Clk 32 32-bit Registers °Register is selected by: busA 32 busB 32 • RA selects the register to put on busA • RB selects the register to put on busB • RW selects the register to be written via busW when Write Enable is 1 °Clock input (CLK) • The CLK input is a factor ONLY during write operation • During read operation, behaves as a combinational logic block: - RA or RB valid => busA or busB valid after “access time.” ECE4680 Datapath.12 Fall 2005 Storage Element: Register File -- Detailed diagram RW RA RB Write Enable 5 5 5 RA RB Write Enable 0 1 RW 32-to-1 Decoder 30 C busW 32 Clk busA 32 32 32-bit Registers busB 32 Register 0 D C D Register 1 31 M U busA C D Register 30 X C busW D Register 31 Clk M U busB X ECE4680 Datapath.13 Fall 2005 Storage Element: Idealized Memory Write Enable °Memory (idealized) • One input bus: Data In • One output bus: Data Out Address Data In 32 Clk DataOut 32 °Memory word is selected by: • Address selects the word to put on Data Out • Write Enable = 1: address selects the memory memory word to be written via the Data In bus °Clock input (CLK) • The CLK input is a factor ONLY during write operation • During read operation, behaves as a combinational logic block: - Address valid => Data Out valid after “access time.” ECE4680 Datapath.14 Fall 2005 Overview of the Instruction Fetch Unit (Fig. 5.5) °The common RTL operations • Fetch the Instruction: mem[PC] • Update the program counter: - Sequential Code: PC <- PC + 4 - Branch and Jump PC <- “something else” PC Clk Next Address Logic Address Instruction Word Instruction Memory 32 ECE4680 Datapath.15 Fall 2005 RTL: The ADD Instruction 31 26 op 6 bits °add 21 rs 5 bits 16 rt 5 bits 11 rd 5 bits 6 shamt 5 bits 0 funct 6 bits rd, rs, rt • mem[PC] Fetch the instruction from memory • R[rd] <- R[rs] + R[rt] The actual operation • PC <- PC + 4 Calculate the next instruction’s address ECE4680 Datapath.16 Fall 2005 RTL: The Subtract Instruction 31 26 op 6 bits °sub 21 rs 5 bits 16 11 rt 5 bits rd 5 bits 6 0 shamt 5 bits funct 6 bits rd, rs, rt • mem[PC] Fetch the instruction from memory • R[rd] <- R[rs] - R[rt] The actual operation • PC <- PC + 4 Calculate the next instruction’s address ECE4680 Datapath.17 Fall 2005 Datapath for Register-Register Operations °R[rd] <- R[rs] op R[rt] Example: add rd, rs, rt • Ra, Rb, and Rw comes from instruction’s rs, rt, and rd fields • ALUctr and RegWr: control logic after decoding the instruction 31 26 op 6 bits 21 rs 5 bits 16 11 rt 5 bits rd 5 bits Rd Rs Rt RegWr 5 5 5 ECE4680 Datapath.18 0 funct 6 bits ALUctr busA 32 busB 32 ALU busW 32 Clk Rw Ra Rb 32 32-bit Registers 6 shamt 5 bits Result 32 Fall 2005 Register-Register Timing Clk Clk-to-Q New Value Old Value PC Rs, Rt, Rd, Op, Func Old Value ALUctr Old Value RegWr Old Value Instruction Memory Access Time New Value Delay through Control Logic New Value New Value busA, B Old Value busW Old Value Register File Access Time New Value ALU Delay New Value Rd Rs Rt RegWr 5 5 5 Register Write Occurs Here busA Rw Ra Rb 32 32-bit Registers 32 ALU busW 32 Clk ALUctr busB 32 Result 32 ECE4680 Datapath.19 Fall 2005 RTL: The OR Immediate Instruction 31 °ori 26 op 6 bits 21 rs 5 bits 16 rt 5 bits 0 immediate 16 bits rt, rs, imm16 • mem[PC] Fetch the instruction from memory • R[rt] <- R[rs] or ZeroExt(imm16) The OR operation • PC <- PC + 4 Calculate the next instruction’s address 31 0000000000000000 16 bits ECE4680 Datapath.20 16 15 0 immediate 16 bits Fall 2005 Datapath for Logical Operations with Immediate °R[rt] <- R[rs] op ZeroExt[imm16]] 31 26 op 6 bits 21 rs 5 bits 16 rt 5 bits Rs 5 5 Don’t Care (Rt) 32 ZeroExt imm16 16 Result 32 Mux busB 32 ALU 32 Clk ALUctr busA Rw Ra Rb 32 32-bit Registers busW 0 Newly added parts are in blue color. Mux RegWr 5 rt, rs, imm16 immediate 16 bits Rt Rd RegDst Example: ori 32 ALUSrc ECE4680 Datapath.21 Fall 2005 RTL: The Load Instruction 31 °lw rt, rs, imm16 26 op 6 bits • mem[PC] 21 rs 5 bits 16 rt 5 bits 0 immediate 16 bits Fetch the instruction from memory • Addr <- R[rs] + SignExt(imm16) Calculate the memory address R[rt] <- Mem[Addr] • PC <- PC + 4 Load the data into the register Calculate the next instruction’s address 31 16 15 0 0000000000000000 16 bits 31 16 15 1111111111111111 1 16 bits ECE4680 Datapath.22 0 immediate 16 bits 0 immediate 16 bits Fall 2005 Datapath for Load Operations °R[rt] <- Mem[R[rs] + SignExt[imm16]] 31 26 op 6 bits Rd RegDst 21 rs 5 bits Rs 5 5 Don’t Care (Rt) MemtoReg 32 32 ALUSrc Mux Extender 16 32 MemWr Mux busB 32 imm16 ALUctr ALU 32 Clk 0 immediate 16 bits busA Rw Ra Rb 32 32-bit Registers busW rt, rs, imm16 Rt Mux RegWr 5 16 rt 5 bits Example: lw 32 WrEn Adr Data In 32 Clk Data Memory ExtOp ECE4680 Datapath.23 Fall 2005 RTL: The Store Instruction 31 °sw 26 op 6 bits 21 rs 5 bits 16 rt 5 bits 0 immediate 16 bits rt, rs, imm16 • mem[PC] Fetch the instruction from memory • Addr <- R[rs] + SignExt(imm16) Calculate the memory address • Mem[Addr] <- R[rt] Store the register into memory • PC <- PC + 4 Calculate the next instruction’s address ECE4680 Datapath.24 Fall 2005 Datapath for Store Operations °Mem[R[rs] + SignExt[imm16] <- R[rt]] 31 26 op 6 bits Rd RegDst 21 rs 5 bits Example: sw 16 rt 5 bits rt, rs, imm16 0 immediate 16 bits Rt Mux RegWr 5 Rs 5 Rt busB 32 32 32 Mux 16 32 Extender imm16 MemtoReg Mux 32 Clk MemWr busA ALU Rw Ra Rb 32 32-bit Registers busW ALUctr 5 Data In 32 32 ALUSrc Clk WrEn Adr Data Memory ExtOp ECE4680 Datapath.25 Fall 2005 RTL: The Branch Instruction 31 °beq 26 op 6 bits 21 rs 5 bits 16 rt 5 bits 0 immediate 16 bits rs, rt, imm16 • mem[PC] Fetch the instruction from memory • Cond <- R[rs] - R[rt] Calculate the branch condition • if (COND eq 0) Calculate the next instruction’s address - PC <- PC + 4 + ( SignExt(imm16) x 4 ) • else - PC <- PC + 4 ECE4680 Datapath.26 Fall 2005 Datapath for Branch Operations °beq rs, rt, imm16 31 26 op 6 bits Rd RegDst We need to compare Rs and Rt! 21 rs 5 bits 16 rt 5 bits 0 immediate 16 bits Rt Branch Clk PC Mux RegWr 5 Rs 5 Rt ALUctr 5 busA Rw Ra Rb 32 32-bit Registers 32 Clk 32 Extender 16 Next Address Logic Zero Mux busB 32 imm16 16 ALU busW imm16 To Instruction Memory 32 ALUSrc ECE4680 Datapath.27 ExtOp Fall 2005 Binary Arithmetics for the Next Address °In theory, the PC is a 32-bit byte address into the instruction memory: • Sequential operation: PC<31:0> = PC<31:0> + 4 • Branch operation: PC<31:0> = PC<31:0> + 4 + SignExt[Imm16] * 4 °The magic number “4” always comes up because: • The 32-bit PC is a byte address • And all our instructions are 4 bytes (32 bits) long °In other words: • The 2 LSBs of the 32-bit PC are always zeros • There is no reason to have hardware to keep the 2 LSBs °In practice, we can simplify the hardware by using a 30-bit PC<31:2>: • Sequential operation: PC<31:2> = PC<31:2> + 1 • Branch operation: PC<31:2> = PC<31:2> + 1 + SignExt[Imm16] • In either case: Instruction Memory Address = PC<31:2> concat “00” ECE4680 Datapath.28 Fall 2005 Next Address Logic: Expensive and Fast Solution °Using a 30-bit PC: • Sequential operation: PC<31:2> = PC<31:2> + 1 • Branch operation: PC<31:2> = PC<31:2> + 1 + SignExt[Imm16] • In either case: Instruction Memory Address = PC<31:2> concat “00” 30 30 “00” Adder PC Addr<31:2> Addr<1:0> 30 0 Adder “1” imm16 Instruction<15:0> 16 SignExt Clk Instruction Memory Mux 30 1 32 30 30 Instruction<31:0> Branch Zero ECE4680 Datapath.29 Fall 2005 Next Address Logic: Cheap and Slow Solution °Why is this slow? • Cannot start the address add until Zero (output of ALU) is valid °Does it matter that this is slow in the overall scheme of things? • Probably not here. Critical path is the load operation. 30 PC 30 “1” “0” Adder Mux imm16 Instruction<15:0> 16 SignExt Clk 1 30 “00” Carry In 0 30 Addr<31:2> Addr<1:0> Instruction Memory 32 30 Instruction<31:0> Branch Zero ECE4680 Datapath.30 Fall 2005 RTL: The Jump Instruction 31 °j 26 op 6 bits 0 target address 26 bits target • mem[PC] Fetch the instruction from memory • PC<31:2> <- PC<31:28> concat target<25:0> Calculate the next instruction’s address ECE4680 Datapath.31 Fall 2005 Instruction Fetch Unit °j target • PC<31:2> <- PC<31:28> concat target<25:0> 30 30 PC<31:28> “00” Target 4 Instruction<25:0> 26 Mux 32 Jump 1 Instruction<31:0> 30 30 Branch ECE4680 Datapath.32 0 0 Instruction Memory Mux imm16 16 Instruction<15:0> SignExt Clk 30 Adder “1” Adder PC 30 1 30 Addr<31:2> Addr<1:0> Zero Fall 2005 Putting it All Together: A Single Cycle Datapath °We have everything except control signals (underline) Instruction<31:0> Branch Rs 5 Rt 32 Rd MemtoReg MemWr 0 32 WrEn Adr Data In 32 Clk Imm16 Mux 16 Extender imm16 1 Rs 32 Mux 32 Clk Zero ALU busA Rw Ra Rb 32 32 32-bit Registers busB 0 32 busW Rt ALUctr 5 <0:15> RegWr 5 <11:15> 1 Mux 0 <16:20> RegDst Instruction Fetch Unit Jump Clk Rt <21:25> Rd 1 Data Memory ALUSrc ExtOp ECE4680 Datapath.33 Fall 2005