Processor Design: How to Implement MIPS Simplicity favors regularity

ECE7660
Advanced Computer Architecture
Review of
Single Cycle Datapath Design
(excerpted from ECE4680 lecture notes)
Processor Design: How to Implement MIPS
Simplicity favors regularity
ECE4680 Datapath.1
Fall 2005
The MIPS Instruction Formats
°All MIPS instructions are 32 bits long. The three instruction formats (bit positions in parentheses):
• R-type: op (31-26, 6 bits) | rs (25-21, 5 bits) | rt (20-16, 5 bits) | rd (15-11, 5 bits) | shamt (10-6, 5 bits) | funct (5-0, 6 bits)
• I-type: op (31-26, 6 bits) | rs (25-21, 5 bits) | rt (20-16, 5 bits) | immediate (15-0, 16 bits)
• J-type: op (31-26, 6 bits) | target address (25-0, 26 bits)
°The different fields are:
• op: operation of the instruction
• rs, rt, rd: the source and destination register specifiers
• shamt: shift amount
• funct: selects the variant of the operation in the “op” field
• address / immediate: address offset or immediate value
• target address: target address of the jump instruction
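The field layout above can be sketched as a few shifts and masks. This is a minimal illustration, not part of the original notes; the function name `decode` and the dictionary keys are assumptions chosen to match the field names on the slide.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into the fields named above.

    All fields are extracted regardless of format; which ones are
    meaningful depends on whether the word is R-, I-, or J-type.
    """
    return {
        "op":     (word >> 26) & 0x3F,   # bits 31-26
        "rs":     (word >> 21) & 0x1F,   # bits 25-21
        "rt":     (word >> 16) & 0x1F,   # bits 20-16
        "rd":     (word >> 11) & 0x1F,   # bits 15-11 (R-type)
        "shamt":  (word >> 6) & 0x1F,    # bits 10-6  (R-type)
        "funct":  word & 0x3F,           # bits 5-0   (R-type)
        "imm16":  word & 0xFFFF,         # bits 15-0  (I-type)
        "target": word & 0x3FFFFFF,      # bits 25-0  (J-type)
    }

# Encoding of "add $3, $1, $2": op=0, rs=1, rt=2, rd=3, shamt=0, funct=0x20
fields = decode(0x00221820)
```

Because every format places op in bits 31-26 and rs/rt in the same positions, one decoder serves all three formats, which is the regularity the slide title refers to.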
The MIPS Subset
°ADD and subtract (R-type: op | rs | rt | rd | shamt | funct)
• add rd, rs, rt
• sub rd, rs, rt
°OR Immediate (I-type: op | rs | rt | immediate):
• ori rt, rs, imm16
°LOAD and STORE (I-type)
• lw rt, rs, imm16
• sw rt, rs, imm16
°BRANCH (I-type):
• beq rs, rt, imm16
°JUMP (J-type: op | target address):
• j target
An Abstract View of the Implementation
•Two types of functional units:
–Operational elements that operate on data (combinational)
–State elements that contain data (sequential)
•Generic Implementation:
–use PC to supply instruction address
–get the instruction from memory
–read registers
–use the instruction to decide exactly what to do
•All instructions use the ALU after reading the registers
–Why? memory-reference? arithmetic? control flow?
[Figure: abstract datapath with PC, Ideal Instruction Memory, 32 32-bit Registers (Rw, Ra, Rb driven by the 5-bit Rd, Rs, Rt fields), ALU with the 16-bit Imm input, and Ideal Data Memory (Address, Data In, DataOut), clocked by Clk]
Next step: fill in the details: more units, more connections, and a control unit
Clocking Methodology
[Figure: clock waveform showing the Setup and Hold windows around each clock edge; signal values are Don’t Care outside those windows]
°All storage elements are clocked by the same clock edge
• Edge-triggered: all stored values are updated on a clock edge
°Cycle Time = Latch Prop + Longest Delay Path + Setup + Clock Skew
°(Latch Prop + Shortest Delay Path - Clock Skew) > Hold Time
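The two constraints above can be checked numerically. The sketch below is an illustration only: the function names are hypothetical and the delay values (in picoseconds) are made-up numbers, not figures from the notes.

```python
def min_cycle_time(latch_prop, longest_path, setup, skew):
    """Cycle Time = Latch Prop + Longest Delay Path + Setup + Clock Skew."""
    return latch_prop + longest_path + setup + skew

def hold_ok(latch_prop, shortest_path, skew, hold):
    """Hold constraint: (Latch Prop + Shortest Delay Path - Clock Skew) > Hold Time."""
    return (latch_prop + shortest_path - skew) > hold

# Hypothetical delays in picoseconds (assumed values, for illustration only)
cycle_ps = min_cycle_time(latch_prop=100, longest_path=2000, setup=50, skew=50)
safe = hold_ok(latch_prop=100, shortest_path=200, skew=50, hold=100)
```

Note that clock skew hurts both constraints: it lengthens the minimum cycle time and shrinks the hold margin.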
An Abstract View of the Critical Path
°Register file and ideal memory:
• The CLK input is a factor ONLY during write operation
• During read operation, they behave as combinational logic:
- Address valid => Output valid after “access time.”
Critical Path (Load Operation) =
PC’s prop time +
Instruction Memory’s Access Time +
Register File’s Access Time +
ALU to Perform a 32-bit Add +
Data Memory Access Time +
Setup Time for Register File Write +
Clock Skew
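The load critical path is a plain sum of the stage delays listed above. The sketch below adds them up; the stage delays are assumed placeholder values in picoseconds, not measurements from the notes, and `critical_path` is a hypothetical helper name.

```python
# Stage delays for the load instruction, in the order the slide lists them.
# All numbers are assumed values for illustration only.
LOAD_PATH_PS = [
    ("PC prop time", 50),
    ("Instruction memory access time", 400),
    ("Register file access time", 200),
    ("ALU 32-bit add", 300),
    ("Data memory access time", 400),
    ("Setup time for register file write", 50),
    ("Clock skew", 20),
]

def critical_path(stages):
    """Return the total delay of a chain of (name, delay) stages."""
    return sum(delay for _, delay in stages)

cycle_ps = critical_path(LOAD_PATH_PS)
```

In a single-cycle design the clock period must cover this sum even for instructions that skip the data memory, because every instruction takes exactly one cycle.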
[Figure: abstract datapath for the critical path: PC, Ideal Instruction Memory, 32 32-bit Registers (Rw, Ra, Rb driven by the 5-bit Rd, Rs, Rt fields), ALU with the 16-bit Imm input, and Ideal Data Memory (Address, Data In, DataOut), clocked by Clk]
The Steps of Designing a Processor
°Instruction Set Architecture => Register Transfer Language
°Register Transfer Language (RTL) =>
• Datapath components
• Datapath interconnect
°Datapath components => Control signals
°Control signals => Control logic
What is RTL: The ADD Instruction
Register Transfer Language
°add rd, rs, rt
• mem[PC]: Fetch the instruction from memory
• R[rd] <- R[rs] + R[rt]: The ADD operation
• PC <- PC + 4: Calculate the next instruction’s address
What is RTL: The Load Instruction
°lw rt, rs, imm16
• mem[PC]: Fetch the instruction from memory
• Addr <- R[rs] + SignExt(imm16): Calculate the memory address
• R[rt] <- Mem[Addr]: Load the data into the register
• PC <- PC + 4: Calculate the next instruction’s address
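The RTL for add and lw maps almost line for line onto executable code. The sketch below is a hedged illustration, not the course's simulator: the `state` dictionary, the function names, and the word-per-address memory model are assumptions, and the fetch step mem[PC] is omitted because the operands are passed in directly.

```python
MASK32 = 0xFFFFFFFF

def sign_ext16(imm16):
    """Interpret a 16-bit field as a signed integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def rtl_add(state, rd, rs, rt):
    R = state["R"]
    R[rd] = (R[rs] + R[rt]) & MASK32            # R[rd] <- R[rs] + R[rt]
    state["PC"] = (state["PC"] + 4) & MASK32    # PC <- PC + 4

def rtl_lw(state, rt, rs, imm16):
    R, mem = state["R"], state["Mem"]
    addr = (R[rs] + sign_ext16(imm16)) & MASK32  # Addr <- R[rs] + SignExt(imm16)
    R[rt] = mem[addr]                            # R[rt] <- Mem[Addr]
    state["PC"] = (state["PC"] + 4) & MASK32     # PC <- PC + 4

# Assumed initial state: one data word at byte address 0x100
state = {"PC": 0, "R": [0] * 32, "Mem": {0x100: 7}}
state["R"][1] = 0x104
rtl_lw(state, rt=2, rs=1, imm16=0xFFFC)   # offset -4, so Addr = 0x100
rtl_add(state, rd=3, rs=2, rt=2)
```

After the two calls, register 2 holds the loaded word, register 3 holds twice that value, and PC has advanced by two instructions.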
Combinational Logic Elements
°Adder: 32-bit inputs A and B, CarryIn; outputs 32-bit Sum and Carry
°MUX (p.B-9,B-19): 32-bit inputs A and B, Select; 32-bit output Y
°Decoder: 3-bit input; outputs out0, out1, out2, ..., out7
°ALU: 32-bit inputs A and B, OP; outputs 32-bit Result and Zero
Storage Element: Register (p.B22-B25)
°Register
• Similar to the D Flip Flop except
- N-bit input and output (Data In, Data Out)
- Write Enable input
• Write Enable:
- 0: Data Out will not change
- 1: Data Out will become Data In
• Array of logical elements (see register file on next 2 slides)
• The content is updated at the clock tick ONLY if the Write Enable signal is set to 1.
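The Write Enable behavior can be modeled in a few lines. This is a behavioral sketch only; the class name `Register` and the method `clock_tick` are hypothetical, and setup and hold timing is ignored.

```python
class Register:
    """N-bit register whose content changes only on a clock tick with Write Enable = 1."""

    def __init__(self, nbits=32):
        self.mask = (1 << nbits) - 1
        self.data_out = 0

    def clock_tick(self, data_in, write_enable):
        # Write Enable = 0: Data Out will not change
        # Write Enable = 1: Data Out will become Data In (truncated to N bits)
        if write_enable:
            self.data_out = data_in & self.mask
        return self.data_out

reg = Register(nbits=8)
held = reg.clock_tick(0xAB, write_enable=0)     # content is kept
loaded = reg.clock_tick(0xAB, write_enable=1)   # content is replaced
```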
Storage Element: Register File
°Register File consists of 32 registers:
• Two 32-bit output busses: busA and busB
• One 32-bit input bus: busW
°Register is selected by:
• RA selects the register to put on busA
• RB selects the register to put on busB
• RW selects the register to be written via busW when Write Enable is 1
[Figure: 32 32-bit Registers block with 5-bit RW, RA, RB selects, Write Enable, Clk, 32-bit input busW, and 32-bit outputs busA and busB]
°Clock input (CLK)
• The CLK input is a factor ONLY during write operation
• During read operation, behaves as a combinational logic block:
- RA or RB valid => busA or busB valid after “access time.”
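A behavioral model makes the read/write asymmetry concrete: reads are combinational, writes happen only on a clock tick. The class and method names below are assumptions for illustration, and access time is not modeled.

```python
class RegisterFile:
    """32 x 32-bit register file with two read ports and one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, ra, rb):
        # Combinational read: no clock involved, busA and busB follow RA and RB
        return self.regs[ra], self.regs[rb]

    def clock_tick(self, rw, bus_w, write_enable):
        # Clocked write: RW selects the register written from busW
        if write_enable:
            self.regs[rw] = bus_w & 0xFFFFFFFF

rf = RegisterFile()
rf.clock_tick(rw=5, bus_w=123, write_enable=1)
rf.clock_tick(rw=6, bus_w=456, write_enable=0)   # ignored: Write Enable is 0
bus_a, bus_b = rf.read(ra=5, rb=6)
```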
Storage Element: Register File -- Detailed diagram
[Figure: detailed register file. A decoder driven by RW and Write Enable selects which of Register 0 through Register 31 (each with D and C inputs) is written from busW on Clk; two MUXes, selected by RA and RB, drive busA and busB.]
Storage Element: Idealized Memory
°Memory (idealized)
• One input bus: Data In (32 bits)
• One output bus: Data Out (32 bits)
°Memory word is selected by:
• Address selects the word to put on Data Out
• Write Enable = 1: Address selects the memory word to be written via the Data In bus
°Clock input (CLK)
• The CLK input is a factor ONLY during write operation
• During read operation, behaves as a combinational logic block:
- Address valid => Data Out valid after “access time.”
Overview of the Instruction Fetch Unit (Fig. 5.5)
°The common RTL operations
• Fetch the Instruction: mem[PC]
• Update the program counter:
- Sequential Code: PC <- PC + 4
- Branch and Jump: PC <- “something else”
[Figure: PC, clocked by Clk and updated by the Next Address Logic, supplies the Address to the Instruction Memory, which outputs the 32-bit Instruction Word]
RTL: The ADD Instruction
R-type format: op (31-26, 6 bits) | rs (25-21, 5 bits) | rt (20-16, 5 bits) | rd (15-11, 5 bits) | shamt (10-6, 5 bits) | funct (5-0, 6 bits)
°add rd, rs, rt
• mem[PC]: Fetch the instruction from memory
• R[rd] <- R[rs] + R[rt]: The actual operation
• PC <- PC + 4: Calculate the next instruction’s address
RTL: The Subtract Instruction
R-type format: op (31-26, 6 bits) | rs (25-21, 5 bits) | rt (20-16, 5 bits) | rd (15-11, 5 bits) | shamt (10-6, 5 bits) | funct (5-0, 6 bits)
°sub rd, rs, rt
• mem[PC]: Fetch the instruction from memory
• R[rd] <- R[rs] - R[rt]: The actual operation
• PC <- PC + 4: Calculate the next instruction’s address
Datapath for Register-Register Operations
°R[rd] <- R[rs] op R[rt]
Example: add rd, rs, rt
• Ra, Rb, and Rw come from the instruction’s rs, rt, and rd fields
• ALUctr and RegWr: control logic after decoding the instruction
[Figure: R-type datapath. The instruction’s Rd, Rs, Rt fields (5 bits each) drive Rw, Ra, Rb of the 32 32-bit Registers; busA and busB (32 bits) feed the ALU, controlled by ALUctr; the 32-bit Result returns on busW and is written when RegWr is asserted at Clk.]
Register-Register Timing
[Figure: timing diagram for a register-register instruction. After the clock edge, PC takes its new value after Clk-to-Q; Rs, Rt, Rd, Op, Func become valid after the Instruction Memory Access Time; ALUctr and RegWr after the Delay through Control Logic; busA and busB after the Register File Access Time; busW after the ALU Delay. The register write occurs at the next clock edge.]
RTL: The OR Immediate Instruction
I-type format: op (31-26, 6 bits) | rs (25-21, 5 bits) | rt (20-16, 5 bits) | immediate (15-0, 16 bits)
°ori rt, rs, imm16
• mem[PC]: Fetch the instruction from memory
• R[rt] <- R[rs] or ZeroExt(imm16): The OR operation
• PC <- PC + 4: Calculate the next instruction’s address
ZeroExt(imm16): bits 31-16 = 0000000000000000 (16 bits) | bits 15-0 = immediate (16 bits)
Datapath for Logical Operations with Immediate
°R[rt] <- R[rs] op ZeroExt[imm16]
Example: ori rt, rs, imm16
[Figure: datapath for immediate logical operations. Newly added parts (in blue on the slide): a Mux controlled by RegDst that selects Rd or Rt as Rw, a ZeroExt unit that widens imm16 from 16 to 32 bits, and a Mux controlled by ALUSrc that selects busB or the extended immediate as the second ALU input. Rb (Rt) is Don’t Care. Other signals shown: RegWr, ALUctr, Clk.]
RTL: The Load Instruction
I-type format: op (31-26, 6 bits) | rs (25-21, 5 bits) | rt (20-16, 5 bits) | immediate (15-0, 16 bits)
°lw rt, rs, imm16
• mem[PC]: Fetch the instruction from memory
• Addr <- R[rs] + SignExt(imm16): Calculate the memory address
• R[rt] <- Mem[Addr]: Load the data into the register
• PC <- PC + 4: Calculate the next instruction’s address
SignExt(imm16): bits 31-16 = 0000000000000000 when immediate bit 15 is 0, or 1111111111111111 when immediate bit 15 is 1 | bits 15-0 = immediate (16 bits)
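The difference between ZeroExt (used by ori) and SignExt (used by lw) is only in how the upper 16 bits are filled. The sketch below returns the 32-bit pattern for each; the function names are assumptions chosen to mirror the RTL notation.

```python
def zero_ext16(imm16):
    """ZeroExt: upper 16 bits are always 0."""
    return imm16 & 0xFFFF

def sign_ext16(imm16):
    """SignExt: upper 16 bits are copies of immediate bit 15."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16

positive = sign_ext16(0x7FFF)   # bit 15 = 0, upper half filled with 0s
negative = sign_ext16(0x8001)   # bit 15 = 1, upper half filled with 1s
logical = zero_ext16(0x8001)    # upper half is 0 regardless of bit 15
```

The two functions agree whenever bit 15 of the immediate is 0 and differ only for immediates with bit 15 set.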
Datapath for Load Operations
°R[rt] <- Mem[R[rs] + SignExt[imm16]]
Example: lw rt, rs, imm16
[Figure: datapath for load. Compared with the immediate datapath, the figure adds an Extender (16 to 32 bits, controlled by ExtOp), a Data Memory (WrEn driven by MemWr, Adr, Data In, Clk), and a Mux controlled by MemtoReg that selects the value placed on busW. RegDst, RegWr, ALUSrc, and ALUctr are as before; Rb (Rt) is Don’t Care.]
RTL: The Store Instruction
I-type format: op (31-26, 6 bits) | rs (25-21, 5 bits) | rt (20-16, 5 bits) | immediate (15-0, 16 bits)
°sw rt, rs, imm16
• mem[PC]: Fetch the instruction from memory
• Addr <- R[rs] + SignExt(imm16): Calculate the memory address
• Mem[Addr] <- R[rt]: Store the register into memory
• PC <- PC + 4: Calculate the next instruction’s address
Datapath for Store Operations
°Mem[R[rs] + SignExt[imm16]] <- R[rt]
Example: sw rt, rs, imm16
[Figure: datapath for store. Same components as the load datapath; busB (R[rt]) drives the Data In input of the Data Memory, which is written when MemWr is asserted. Control signals shown: RegDst, RegWr, ALUSrc, ALUctr, ExtOp, MemWr, MemtoReg.]
RTL: The Branch Instruction
I-type format: op (31-26, 6 bits) | rs (25-21, 5 bits) | rt (20-16, 5 bits) | immediate (15-0, 16 bits)
°beq rs, rt, imm16
• mem[PC]: Fetch the instruction from memory
• Cond <- R[rs] - R[rt]: Calculate the branch condition
• if (Cond eq 0): Calculate the next instruction’s address
- PC <- PC + 4 + ( SignExt(imm16) x 4 )
• else
- PC <- PC + 4
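The beq RTL can be written as a single function that returns the next PC. This is a sketch under the assumption that register values and the PC are held as unsigned 32-bit integers; `beq_next_pc` is a hypothetical name.

```python
MASK32 = 0xFFFFFFFF

def sign_ext16(imm16):
    """Interpret a 16-bit field as a signed integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def beq_next_pc(pc, r_rs, r_rt, imm16):
    cond = (r_rs - r_rt) & MASK32               # Cond <- R[rs] - R[rt]
    if cond == 0:                               # registers are equal: branch taken
        return (pc + 4 + sign_ext16(imm16) * 4) & MASK32
    return (pc + 4) & MASK32                    # not taken: fall through

taken = beq_next_pc(0x1000, 5, 5, 3)            # forward by 3 instructions
not_taken = beq_next_pc(0x1000, 5, 6, 3)
backward = beq_next_pc(0x1000, 5, 5, 0xFFFF)    # offset -1 branches to itself
```

The offset is counted in instructions relative to PC + 4, which is why an immediate of -1 lands back on the branch itself.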
Datapath for Branch Operations
°beq rs, rt, imm16
We need to compare Rs and Rt!
[Figure: datapath for branch. busA (R[rs]) and busB (R[rt]) feed the ALU, whose Zero output goes to the Next Address Logic together with Branch and imm16; the Next Address Logic updates PC (clocked by Clk), whose output goes to the Instruction Memory. Control signals shown: RegDst, RegWr, ALUSrc, ALUctr, ExtOp, Branch.]
Binary Arithmetic for the Next Address
°In theory, the PC is a 32-bit byte address into the instruction memory:
• Sequential operation: PC<31:0> = PC<31:0> + 4
• Branch operation: PC<31:0> = PC<31:0> + 4 + SignExt[Imm16] * 4
°The magic number “4” always comes up because:
• The 32-bit PC is a byte address
• And all our instructions are 4 bytes (32 bits) long
°In other words:
• The 2 LSBs of the 32-bit PC are always zeros
• There is no reason to have hardware to keep the 2 LSBs
°In practice, we can simplify the hardware by using a 30-bit PC<31:2>:
• Sequential operation: PC<31:2> = PC<31:2> + 1
• Branch operation: PC<31:2> = PC<31:2> + 1 + SignExt[Imm16]
• In either case: Instruction Memory Address = PC<31:2> concat “00”
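The equivalence of the 32-bit and 30-bit formulations can be checked directly. In the sketch below all function names are hypothetical; `taken` stands for the branch condition being true.

```python
def sign_ext16(imm16):
    """Interpret a 16-bit field as a signed integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def next_pc32(pc, imm16, taken):
    """32-bit byte-address form: PC + 4 (+ SignExt(imm16) * 4 if taken)."""
    offset = sign_ext16(imm16) * 4 if taken else 0
    return (pc + 4 + offset) & 0xFFFFFFFF

def next_pc30(pc30, imm16, taken):
    """30-bit form on PC<31:2>: PC + 1 (+ SignExt(imm16) if taken)."""
    offset = sign_ext16(imm16) if taken else 0
    return (pc30 + 1 + offset) & 0x3FFFFFFF

def instr_address(pc30):
    """Instruction Memory Address = PC<31:2> concat "00"."""
    return pc30 << 2

pc = 0x00001000
same = instr_address(next_pc30(pc >> 2, 0xFFFE, True)) == next_pc32(pc, 0xFFFE, True)
```

Dropping the two low bits removes the multiplication by 4 from the branch path and shortens both adders from 32 to 30 bits.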
Next Address Logic: Expensive and Fast Solution
°Using a 30-bit PC:
• Sequential operation: PC<31:2> = PC<31:2> + 1
• Branch operation: PC<31:2> = PC<31:2> + 1 + SignExt[Imm16]
• In either case: Instruction Memory Address = PC<31:2> concat “00”
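[Figure: fast next-address logic. Two 30-bit Adders: one computes PC<31:2> + 1 and the other computes PC<31:2> + 1 + SignExt[imm16], where imm16 = Instruction<15:0>. A Mux selected by Branch and Zero picks the value loaded into PC on Clk. Addr<31:2> from PC and Addr<1:0> = “00” address the Instruction Memory, which outputs Instruction<31:0>.]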
Next Address Logic: Cheap and Slow Solution
°Why is this slow?
• Cannot start the address add until Zero (output of ALU) is valid
°Does it matter that this is slow in the overall scheme of things?
• Probably not here. Critical path is the load operation.
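The two organizations compute the same next PC and differ only in when the Zero signal is needed. The sketch below assumes the structure suggested by the slide labels: the fast version uses two adders followed by a mux, and the slow version uses a mux on the adder operand with “1” on Carry In. The function names are hypothetical.

```python
MASK30 = 0x3FFFFFFF

def sign_ext16(imm16):
    """Interpret a 16-bit field as a signed integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def next_pc_fast(pc30, imm16, branch, zero):
    # Both candidates are computed before Zero is known; Zero only drives the mux.
    sequential = (pc30 + 1) & MASK30
    branch_target = (sequential + sign_ext16(imm16)) & MASK30
    return branch_target if (branch and zero) else sequential

def next_pc_slow(pc30, imm16, branch, zero):
    # The mux selects the adder operand, so the add cannot start until Zero is valid.
    operand = sign_ext16(imm16) if (branch and zero) else 0
    carry_in = 1
    return (pc30 + operand + carry_in) & MASK30

agree = all(
    next_pc_fast(0x400, imm, b, z) == next_pc_slow(0x400, imm, b, z)
    for imm in (0, 5, 0xFFFB)
    for b in (0, 1)
    for z in (0, 1)
)
```

The slow version saves one adder at the cost of placing the ALU's Zero output on the path before the address add.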
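[Figure: slow next-address logic. A single 30-bit Adder adds PC<31:2> to the output of a Mux that selects “0” or SignExt of imm16 (Instruction<15:0>) under control of Branch and Zero, with “1” on Carry In. Addr<31:2> from PC and Addr<1:0> = “00” address the Instruction Memory, which outputs Instruction<31:0>.]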
RTL: The Jump Instruction
J-type format: op (31-26, 6 bits) | target address (25-0, 26 bits)
°j target
• mem[PC]: Fetch the instruction from memory
• PC<31:2> <- PC<31:28> concat target<25:0>: Calculate the next instruction’s address
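The jump RTL keeps the top 4 bits of the PC and replaces the rest with the 26-bit target field. A minimal sketch on the 30-bit PC<31:2> representation follows; `jump_next_pc30` is a hypothetical name.

```python
def jump_next_pc30(pc30, target26):
    """PC<31:2> <- PC<31:28> concat target<25:0>.

    pc30 holds PC<31:2>, so PC<31:28> is its top 4 bits (bits 29-26 of pc30).
    """
    upper4 = (pc30 >> 26) & 0xF
    return (upper4 << 26) | (target26 & 0x3FFFFFF)

pc = 0x10000040                        # byte address of the jump instruction
next_pc30 = jump_next_pc30(pc >> 2, 0x100)
next_address = next_pc30 << 2          # append "00" to form the byte address
```

Because only 26 target bits fit in the instruction, a jump can reach only addresses that share the current PC's upper 4 bits.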
Instruction Fetch Unit
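°j target
• PC<31:2> <- PC<31:28> concat target<25:0>
[Figure: Instruction Fetch Unit. The branch logic (two 30-bit Adders, “1”, SignExt of imm16 = Instruction<15:0>, and a Mux selected by Branch and Zero) is extended with a second Mux, selected by Jump, that chooses PC<31:28> concatenated with the 26-bit Target field (Instruction<25:0>). The result is loaded into the 30-bit PC on Clk; Addr<31:2> from PC and Addr<1:0> = “00” address the Instruction Memory, which outputs Instruction<31:0>.]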
Putting it All Together: A Single Cycle Datapath
°We have everything except control signals (underlined in the slide figure)
[Figure: complete single-cycle datapath. The Instruction Fetch Unit (inputs Branch, Jump, Zero, Imm16, Clk) outputs Instruction<31:0>, whose fields Rs = Instruction<25:21>, Rt = Instruction<20:16>, Rd = Instruction<15:11>, and Imm16 = Instruction<15:0> feed the 32 32-bit Registers (Rw, Ra, Rb, busA, busB, busW), the Extender, the ALU, and the Data Memory (WrEn, Adr, Data In). Control signals: RegDst, RegWr, ALUSrc, ALUctr, ExtOp, MemWr, MemtoReg, Branch, Jump.]