Lec9 Single-Cycle MIPS

2010 R&E Computer System Education & Research
Lecture 9. MIPS Processor Design –
Single-Cycle Processor Design
Prof. Taeweon Suh
Computer Science Education
Korea University
Single-Cycle MIPS Processor
• Again, microarchitecture (CPU implementation)
is divided into 2 interacting parts
 Datapath
 Control
Korea Univ
Single-Cycle Processor Design
• Let’s start with a memory access instruction - lw
 Example: lw $2, 80($0)
• STEP 1: Instruction Fetch
I-Type format: op (6 bits) | rs (5 bits) | rt (5 bits) | imm (16 bits)
[Figure: instruction fetch. PC drives address input A of the instruction memory; read port RD supplies Instr. The register file and data memory are also shown.]
Single-Cycle Processor Design
• STEP 2: Decoding
 Read source operands from register file
Example: lw $2, 80($0)
[Figure: Instr[25:21] (rs) drives register file address input A1; the base register value is read on RD1.]
Single-Cycle Processor Design
• STEP 2: Decoding
 Sign-extend the immediate
Example: lw $2, 80($0)
[Figure: Instr[15:0] (imm) passes through the Sign Extend unit to produce the 32-bit SignImm.]
module signext(input  [15:0] a,
               output [31:0] y);
  // replicate the sign bit a[15] into the upper 16 bits
  assign y = {{16{a[15]}}, a};
endmodule
Single-Cycle Processor Design
• STEP 3: Execution
 Compute the memory address
Example: lw $2, 80($0)
[Figure: SrcA = RD1 (base register) and SrcB = SignImm feed the ALU; ALUControl2:0 = 010 (add); ALUResult is the address sent to the data memory.]
Single-Cycle Processor Design
• STEP 4: Memory Access and Write-Back
 Read data from memory and write it back to register file
Example: lw $2, 80($0)
[Figure: ALUResult addresses the data memory; ReadData is routed to WD3 and Instr[20:16] (rt) to A3 of the register file. RegWrite = 1, ALUControl2:0 = 010.]
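The register file used in steps 2 and 4 can be sketched as below. This is a minimal sketch, not the lecture's code: the module name regfile and the port names ra1, ra2, wa3 are assumptions chosen to mirror the A1, A2, A3, WD3, WE3, RD1, RD2 labels in the figures. It assumes combinational reads and a write on the rising clock edge, with register $0 hardwired to zero.

```verilog
// Hypothetical three-port register file (names are illustrative assumptions).
module regfile(input         clk,
               input         we3,            // WE3, driven by RegWrite
               input  [4:0]  ra1, ra2, wa3,  // A1, A2, A3
               input  [31:0] wd3,            // WD3
               output [31:0] rd1, rd2);      // RD1, RD2

  reg [31:0] rf[31:0];

  // Write port: synchronous, on the rising clock edge when we3 is set
  always @(posedge clk)
    if (we3) rf[wa3] <= wd3;

  // Read ports: combinational; register $0 always reads as zero
  assign rd1 = (ra1 != 5'd0) ? rf[ra1] : 32'b0;
  assign rd2 = (ra2 != 5'd0) ? rf[ra2] : 32'b0;
endmodule
```

For lw $2, 80($0), ra1 would carry rs = 0, wa3 would carry rt = 2, and wd3 would carry the word read from data memory.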
Single-Cycle Processor Design
• We are done with lw
• CPU starts fetching the next instruction from PC+4
module adder(input  [31:0] a, b,
             output [31:0] y);
  assign y = a + b;
endmodule

// instantiated in the datapath to compute PC + 4
adder pcadd1(pc, 32'b100, pcplus4);
[Figure: an adder computes PCPlus4 = PC + 4, which feeds PC'.]
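The PC itself is a 32-bit register that loads PC' on each rising clock edge. A minimal sketch follows, assuming an asynchronous reset that the slides do not show; the module name flopr and the signal name pcnext (standing in for PC') are assumptions.

```verilog
// Hypothetical resettable register, usable as the PC (reset is an assumption).
module flopr #(parameter WIDTH = 32)
              (input                  clk, reset,
               input      [WIDTH-1:0] d,
               output reg [WIDTH-1:0] q);

  always @(posedge clk or posedge reset)
    if (reset) q <= {WIDTH{1'b0}};  // start fetching from address 0
    else       q <= d;              // PC <= PC'
endmodule

// Possible use inside a datapath module:
//   flopr #(32) pcreg(clk, reset, pcnext, pc);
```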
Single-Cycle Processor Design
• Let’s consider another memory access instruction - sw
 sw instruction needs to write data to data memory
I-Type format: op (6 bits) | rs (5 bits) | rt (5 bits) | imm (16 bits)
Example: sw $2, 84($0)
[Figure: Instr[20:16] (rt) drives A2; RD2 is routed as WriteData to the WD input of the data memory. For sw: RegWrite = 0, ALUControl2:0 = 010, MemWrite = 1.]
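A data memory with the A, WD, WE and RD ports shown in the figures might look like the sketch below. The module name dmem and the 64-word size are arbitrary assumptions for illustration; reads are combinational and writes happen on the rising clock edge when we (MemWrite) is 1.

```verilog
// Hypothetical word-addressed data memory (size and names are assumptions).
module dmem(input         clk, we,   // we is driven by MemWrite
            input  [31:0] a, wd,     // A (byte address), WD
            output [31:0] rd);       // RD

  reg [31:0] RAM[63:0];              // 64 words of 32 bits

  // Combinational read; a[1:0] is dropped because accesses are word-aligned
  assign rd = RAM[a[7:2]];

  // Synchronous write
  always @(posedge clk)
    if (we) RAM[a[7:2]] <= wd;
endmodule
```

For sw $2, 84($0), a would be 84 (word index 21) and wd would be the value of $2.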
Single-Cycle Processor Design
• Let’s consider arithmetic and logical instructions - add, sub, and, or
 Write ALUResult to register file
 Note that R-type instructions write to rd field of instruction (instead of rt)
R-Type format: op (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | funct (6 bits)
[Figure: datapath extended with three multiplexers. RegDst selects WriteReg4:0 from Instr[20:16] (0) or Instr[15:11] (1); ALUSrc selects SrcB from RD2 (0) or SignImm (1); MemtoReg selects Result from ALUResult (0) or ReadData (1). For R-type: RegWrite = 1, RegDst = 1, ALUSrc = 0, ALUControl2:0 varies, MemWrite = 0, MemtoReg = 0.]
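The three multiplexers controlled by RegDst, ALUSrc and MemtoReg can be sketched with one parameterized 2:1 mux. The module names mux2 and writeback_muxes and the lowercase signal names are assumptions for illustration, not the lecture's code.

```verilog
// Hypothetical parameterized 2:1 multiplexer.
module mux2 #(parameter WIDTH = 32)
             (input  [WIDTH-1:0] d0, d1,
              input              s,
              output [WIDTH-1:0] y);
  assign y = s ? d1 : d0;
endmodule

// Illustrative wiring of the three muxes added for R-type support.
module writeback_muxes(input  [31:0] instr, rd2, signimm,
                       input  [31:0] aluresult, readdata,
                       input         regdst, alusrc, memtoreg,
                       output [4:0]  writereg,
                       output [31:0] srcb, result);
  // RegDst: 0 selects rt (Instr[20:16]), 1 selects rd (Instr[15:11])
  mux2 #(5)  wrmux(instr[20:16], instr[15:11], regdst, writereg);
  // ALUSrc: 0 selects RD2, 1 selects SignImm
  mux2 #(32) srcbmux(rd2, signimm, alusrc, srcb);
  // MemtoReg: 0 selects ALUResult, 1 selects ReadData
  mux2 #(32) resmux(aluresult, readdata, memtoreg, result);
endmodule
```

With RegDst = 1, ALUSrc = 0 and MemtoReg = 0, an R-type instruction writes ALUResult to the register named by rd.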
Single-Cycle Processor Design
• Let’s consider a branch instruction - beq
 Determine whether register values are equal
 Calculate branch target address (BTA) from sign-extended
immediate and PC+4
I-Type format: op (6 bits) | rs (5 bits) | rt (5 bits) | imm (16 bits)
Example: beq $4, $0, around
[Figure: PCBranch = (SignImm << 2) + PCPlus4; a multiplexer controlled by PCSrc selects PC' from PCPlus4 (0) or PCBranch (1). PCSrc is asserted when Branch = 1 and Zero = 1. For beq: RegWrite = 0, RegDst = X, ALUSrc = 0, ALUControl2:0 = 110, Branch = 1, MemWrite = 0, MemtoReg = X.]
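The branch hardware can be sketched as follows, assuming PCSrc is the AND of Branch and the ALU's Zero flag. The module name branch_logic and the signal names are assumptions for illustration.

```verilog
// Hypothetical branch target and next-PC selection logic.
module branch_logic(input  [31:0] pcplus4, signimm,
                    input         branch, zero,
                    output [31:0] pcbranch, pcnext,
                    output        pcsrc);

  // Shift the sign-extended immediate left by 2 (word offset to byte offset)
  wire [31:0] signimmsh = {signimm[29:0], 2'b00};

  // Branch target address (BTA) = PC + 4 + (SignImm << 2)
  assign pcbranch = pcplus4 + signimmsh;

  // Take the branch only for a branch instruction whose operands are equal
  assign pcsrc  = branch & zero;
  assign pcnext = pcsrc ? pcbranch : pcplus4;
endmodule
```

For beq, the ALU subtracts the two register values, so Zero = 1 exactly when they are equal.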
Single-Cycle Datapath Example
• We are done with the implementation of basic instructions
• Let’s see how or instruction works out in the implementation
R-Type format: op (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | funct (6 bits)
[Figure: single-cycle datapath with the control unit (inputs Op = Instr[31:26] and Funct = Instr[5:0]), annotated with the signal values for or: RegWrite = 1, RegDst = 1, ALUSrc = 0, Branch = 0, MemWrite = 0, MemtoReg = 0, ALUControl2:0 = 001, PCSrc = 0.]
Single-Cycle Processor - Control
• As mentioned, CPU is designed with datapath and control
• Now, let’s delve into the control part design
[Figure: single-cycle datapath with the control unit. Instr[31:26] (Op) and Instr[5:0] (Funct) are the control unit inputs; its outputs are MemtoReg, MemWrite, Branch, ALUControl2:0, ALUSrc, RegDst, and RegWrite. PCSrc is derived from Branch and Zero.]
Control Unit
Opcode and funct fields come from the fetched instruction
[Figure: the control unit consists of a main decoder (input Opcode5:0; outputs MemtoReg, MemWrite, Branch, ALUSrc, RegDst, RegWrite, ALUOp1:0) and an ALU decoder (inputs ALUOp1:0 and Funct5:0; output ALUControl2:0).]
ALU Implementation and Control
[Figure: N-bit ALU. Inputs A and B (N bits each) and control F2:0; outputs Y (N bits) and Zero. F2 selects B or ~B and supplies the adder carry-in; F1:0 selects among AND, OR, the adder sum S, and the zero-extended sign bit S[N-1] (for SLT).]
N = 32 in 32-bit processor

F2:0   Function
000    A & B
001    A | B
010    A + B
011    not used
100    A & ~B
101    A | ~B
110    A - B
111    SLT

slt: set less than
Example: slt $t0, $t1, $t2   // $t0 = 1 if $t1 < $t2
Control Unit: ALU Control
• Implementation is completely dependent on hardware designers
• But, the designers should make sure the implementation is reasonable enough
• Memory access instructions (lw, sw) need to use ALU to calculate memory target address (addition)
• Branch instructions (beq, bne) need to use ALU for the equality check (subtraction)
[Figure: control unit. The main decoder takes Opcode5:0 and produces MemtoReg, MemWrite, Branch, ALUSrc, RegDst, RegWrite, and ALUOp1:0; the ALU decoder takes ALUOp1:0 and Funct5:0 and produces ALUControl2:0.]

ALUOp1:0   Meaning
00         Add
01         Subtract
10         Look at Funct
11         Not Used

ALUOp1:0   Funct           ALUControl2:0
00         X               010 (Add)
X1         X               110 (Subtract)
1X         100000 (add)    010 (Add)
1X         100010 (sub)    110 (Subtract)
1X         100100 (and)    000 (And)
1X         100101 (or)     001 (Or)
1X         101010 (slt)    111 (SLT)
Control Unit: Main Decoder
Instruction  Op5:0   RegWrite  RegDst  ALUSrc  Branch  MemWrite  MemtoReg  ALUOp1:0
R-type       000000  1         1       0       0       0         0         10
lw           100011  1         0       1       0       0         1         00
sw           101011  0         X       1       0       1         X         00
beq          000100  0         X       0       1       0         X         01
How about Other Instructions?
• Hmmm.. Now, we are done with the control part design
• Let’s examine if the design is able to execute other instructions
 addi
Example: addi $t0, $t1, -14
[Figure: single-cycle datapath with the control unit, as on the previous slides.]
Control Unit: Main Decoder
Instruction  Op5:0   RegWrite  RegDst  ALUSrc  Branch  MemWrite  MemtoReg  ALUOp1:0
R-type       000000  1         1       0       0       0         0         10
lw           100011  1         0       1       0       0         1         00
sw           101011  0         X       1       0       1         X         00
beq          000100  0         X       0       1       0         X         01
addi         001000  1         0       1       0       0         0         00
How about Other Instructions?
• Ok. So far, so good…
• How about jump instructions?
 j
J-Type format: op (6 bits) | addr (26 bits)
[Figure: single-cycle datapath with the control unit, as on the previous slides; it has no path from the addr field to PC'.]
How about Other Instructions?
• We need to add some hardware to support the j instruction
 A logic to compute the target address
 Mux and control signal
J-Type format: op (6 bits) | addr (26 bits)
[Figure: PCJump is formed from PCPlus4[31:28] (bits 31:28) and Instr[25:0] shifted left by 2 (bits 27:0). A multiplexer controlled by the new Jump output of the control unit selects PC' from the branch/PC+4 multiplexer output (0) or PCJump (1).]
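The added jump hardware can be sketched as below. The module name jump_logic and the signal name pcnextbr (the output of the existing PCSrc multiplexer) are assumptions for illustration.

```verilog
// Hypothetical jump target computation and final next-PC multiplexer.
module jump_logic(input  [31:0] pcplus4,
                  input  [31:0] pcnextbr,  // PCPlus4 or PCBranch, chosen by PCSrc
                  input  [25:0] addr,      // Instr[25:0]
                  input         jump,
                  output [31:0] pcnext);   // PC'

  // Upper 4 bits come from PC + 4; addr is shifted left by 2 to fill bits 27:0
  wire [31:0] pcjump = {pcplus4[31:28], addr, 2'b00};

  // Jump = 1 overrides the branch / PC + 4 selection
  assign pcnext = jump ? pcjump : pcnextbr;
endmodule
```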
Control Unit: Main Decoder
• There is one more output in the main decoder to support
the jump instructions
• Jump
Instruction  Op5:0   RegWrite  RegDst  ALUSrc  Branch  MemWrite  MemtoReg  ALUOp1:0  Jump
R-type       000000  1         1       0       0       0         0         10        0
lw           100011  1         0       1       0       0         1         00        0
sw           101011  0         X       1       0       1         X         00        0
beq          000100  0         X       0       1       0         X         01        0
addi         001000  1         0       1       0       0         0         00        0
j            000010  0         X       X       X       0         X         XX        1
Verilog Code - Main Decoder and ALU Control
[Figure: control unit. Main decoder: Opcode5:0 in; MemtoReg, MemWrite, Branch, ALUSrc, RegDst, RegWrite, ALUOp1:0 out. ALU decoder: ALUOp1:0 and Funct5:0 in; ALUControl2:0 out.]

module maindec(input  [5:0] op,
               output       memtoreg, memwrite,
               output       branch, alusrc,
               output       regdst, regwrite,
               output       jump,
               output [1:0] aluop);

  reg [8:0] controls;

  assign {regwrite, regdst, alusrc,
          branch, memwrite,
          memtoreg, jump, aluop} = controls;

  always @(*)
    case(op)
      6'b000000: controls <= 9'b110000010; // R-type
      6'b100011: controls <= 9'b101001000; // lw
      6'b101011: controls <= 9'b001010000; // sw
      6'b000100: controls <= 9'b000100001; // beq
      6'b001000: controls <= 9'b101000000; // addi
      6'b000010: controls <= 9'b000000100; // j
      default:   controls <= 9'bxxxxxxxxx; // ???
    endcase
endmodule

module aludec(input      [5:0] funct,
              input      [1:0] aluop,
              output reg [2:0] alucontrol);

  always @(*)
    case(aluop)
      2'b00: alucontrol <= 3'b010; // add
      2'b01: alucontrol <= 3'b110; // sub
      default: case(funct)         // RTYPE
          6'b100000: alucontrol <= 3'b010; // ADD
          6'b100010: alucontrol <= 3'b110; // SUB
          6'b100100: alucontrol <= 3'b000; // AND
          6'b100101: alucontrol <= 3'b001; // OR
          6'b101010: alucontrol <= 3'b111; // SLT
          default:   alucontrol <= 3'bxxx; // ???
        endcase
    endcase
endmodule
Verilog Code – ALU
[Figure: N-bit ALU block diagram and F2:0 function table, as on the "ALU Implementation and Control" slide.]

module alu(input      [31:0] a, b,
           input      [2:0]  alucont,
           output reg [31:0] result,
           output            zero);

  wire [31:0] b2, sum, slt;

  assign b2  = alucont[2] ? ~b : b;       // F2 selects B or ~B
  assign sum = a + b2 + alucont[2];       // F2 is also the carry-in (A - B = A + ~B + 1)
  assign slt = sum[31];                   // sign bit of A - B, zero-extended

  always @(*)
    case(alucont[1:0])
      2'b00: result <= a & b2;
      2'b01: result <= a | b2;
      2'b10: result <= sum;
      2'b11: result <= slt;
    endcase

  assign zero = (result == 32'b0);
endmodule
Single-Cycle Processor Performance
• How fast is the single-cycle processor?
• Clock cycle time (frequency) is limited by the critical path
 The critical path is the path that takes the longest time
 What do you think the critical path is?
• The path that lw instruction goes through
[Figure: single-cycle datapath with control signal values for lw: RegWrite = 1, RegDst = 0, ALUSrc = 1, Branch = 0, MemWrite = 0, MemtoReg = 1, ALUControl2:0 = 010.]
Single-Cycle Processor Performance
• Single-cycle critical path:
Tc = tpcq_PC + tmem + max(tRFread, tsext) + tmux + tALU + tmem + tmux + tRFsetup
• In most implementations, limiting paths are: memory (instruction and data), ALU, register file. Thus,
Tc = tpcq_PC + 2tmem + tRFread + 2tmux + tALU + tRFsetup

Elements              Parameter
Register clock-to-Q   tpcq_PC
Multiplexer           tmux
ALU                   tALU
Memory read           tmem
Register file read    tRFread
Register file setup   tRFsetup

[Figure: single-cycle datapath with control signal values for lw.]
Single-Cycle Processor Performance
Example

Elements              Parameter   Delay (ps)
Register clock-to-Q   tpcq_PC     30
Multiplexer           tmux        25
ALU                   tALU        200
Memory read           tmem        250
Register file read    tRFread     150
Register file setup   tRFsetup    20

Tc = tpcq_PC + 2tmem + tRFread + 2tmux + tALU + tRFsetup
   = [30 + 2(250) + 150 + 2(25) + 200 + 20] ps
   = 950 ps
fc = 1/Tc
   = 1/(950 ps)
   ≈ 1.05 GHz

• Assuming that the CPU executes 100 billion instructions to run your program, what is the execution time of the program on a single-cycle MIPS processor?
Execution Time = (#instructions)(cycles/instruction)(seconds/cycle)
               = (100 × 10^9)(1)(950 × 10^-12 s)
               = 95 seconds