CS61C : Machine Structures – Lecture 33

inst.eecs.berkeley.edu/~cs61c
CS61C : Machine Structures
Lecture 33 –
Single Cycle CPU Datapath, with Verilog II
2004-04-14
Lecturer PSOE Dan Garcia
www.cs.berkeley.edu/~ddgarcia
Google Gmail Service!! Not so fast! State Sen. Liz Figueroa (Fremont) is
drafting legislation to block it because it’d place advertising in personal
messages after searching them for keywords. “We think it's an absolute
invasion of privacy. It's like having a massive billboard in the middle of
your home.”
CS 61C L33 Single Cycle CPU Datapath, with Verilog II (1)
Garcia, Spring 2004 © UCB
Storage Element: Idealized Memory
• Memory (idealized)
• One input bus: Data In (32 bits)
• One output bus: Data Out (32 bits)
• Memory word is selected by:
• Address selects the word to put on Data Out
• Write Enable = 1: address selects the memory word to be written via the Data In bus
• Clock input (CLK)
• The CLK input is a factor ONLY during write operation
• During read operation, behaves as a combinational logic block:
- Address valid => Data Out valid after “access time.”
[Figure: memory block with inputs Write Enable, Address, Data In (32), Clk and output DataOut (32)]
Verilog Memory for MIPS Interpreter (1/3)
// Behavioral model of Random Access Memory:
//   32-bit wide, 256 words deep,
//   asynchronous read-port if RD=1,
//   synchronous write-port if WR=1,
//   initialize from hex file ("data.dat")
//   on positive edge of reset signal,
//   dump to binary file ("dump.dat")
//   on positive edge of dump signal.
module mem (CLK,RST,DMP,WR,RD,address,writeD,readD);
  input CLK, RST, DMP, WR, RD;
  input [31:0] address, writeD;
  output [31:0] readD;
  reg [31:0] readD;
  parameter memSize=256; // ~ Constant dec.
  reg [31:0] memArray [0:memSize-1];
  integer chann,i; // Temp variables: for loops ...
Verilog Memory for MIPS Interpreter (2/3)
  always @ (posedge RST)
    $readmemh("data.dat", memArray);
  always @ (posedge CLK)
    if (WR) memArray[address[9:2]] = writeD;
    // write if WR & positive clock edge (synchronous)
  always @ (address or RD)
    if (RD)
    begin
      readD = memArray[address[9:2]];
      $display("Getting address %h containing %h", address[9:2], readD);
    end
    // read if RD, independent of clock (asynchronous)
Why is it “memArray[address[9:2]]”?
• Our memory is always byte-addressed
• We can lb from 0x0, 0x1, 0x2, 0x3, …
• lw only reads word-aligned requests
• We only call lw with 0x0, 0x4, 0x8, 0xC, …
• I.e., the last two bits are always 0
• memArray is a word wide and 2^8 deep
• reg [31:0] memArray [0:256-1];
• Size = 4 Bytes/row * 256 rows = 1024 B
• If we’re simulating lw/sw, we R/W words
• What bits select the first 256 words? [9:2]!
• 1st word = 0x0 = 0b000 = memArray[0];
  2nd word = 0x4 = 0b100 = memArray[1], etc.
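The indexing rule above can be checked with a small simulation. This is only a sketch: the module name addrIndexDemo is made up for illustration and is not part of the lecture's interpreter. It prints the word index that address[9:2] selects for a few word-aligned byte addresses.

```verilog
// Hypothetical demo module (not from the lecture): shows which memArray
// row address[9:2] selects for several word-aligned byte addresses.
module addrIndexDemo;
  reg [31:0] address;

  initial begin
    address = 32'h0000_0000;   // 1st word, index 0
    #1 $display("byte address %h -> word index %d", address, address[9:2]);
    address = 32'h0000_0004;   // 2nd word, index 1
    #1 $display("byte address %h -> word index %d", address, address[9:2]);
    address = 32'h0000_0008;   // 3rd word, index 2
    #1 $display("byte address %h -> word index %d", address, address[9:2]);
    address = 32'h0000_03FC;   // last word of the 1024-byte memory, index 255
    #1 $display("byte address %h -> word index %d", address, address[9:2]);
    $finish;
  end
endmodule
```

Dropping bits [1:0] divides the byte address by 4, and keeping only 8 bits covers exactly the 256 rows of memArray.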
Verilog Memory for MIPS Interpreter (3/3)
  always @ (posedge DMP)
  begin
    chann = $fopen("dump.dat");
    if (chann==0)
    begin
      $display("$fopen of dump.dat failed.");
      $finish;
    end
    // Temp variables chann, i
    for (i=0; i<memSize; i=i+1)
    begin
      $fdisplay(chann, "%b", memArray[i]);
    end
  end // always @ (posedge DMP)
endmodule // mem
Peer Instruction
A. We should use the main ALU to compute PC=PC+4
B. We’re going to be able to read 2 registers and write a 3rd in 1 cycle
C. Datapath is hard, Control is easy
Answer choices (ABC):
1: FFF
2: FFT
3: FTF
4: FTT
5: TFF
6: TFT
7: TTF
8: TTT
How to Design a Processor: step-by-step
• 1. Analyze instruction set architecture (ISA) => datapath requirements
• meaning of each instruction is given by the register transfers
• datapath must include storage element for ISA registers
• datapath must support each register transfer
• 2. Select set of datapath components and establish clocking methodology
• 3. Assemble datapath meeting requirements
• 4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer.
• 5. Assemble the control logic (hard part!)
Storage Element: Register (Building Block)
• Register
• Similar to the D Flip Flop except
- N-bit input and output
- Write Enable input
• Write Enable:
- negated (or deasserted) (0): Data Out will not change
- asserted (1): Data Out will become Data In
[Figure: register with inputs Write Enable, Data In (N bits), Clk and output Data Out (N bits)]
Verilog 32-bit Register for MIPS Interpreter
// Behavioral model of 32-bit Register:
//   positive edge-triggered,
//   synchronous active-high reset.
module reg32 (CLK,Q,D,RST);
  input [31:0] D;
  input CLK, RST;
  output [31:0] Q;
  reg [31:0] Q;
  // non-blocking assignment, so registers clocked by the same edge do not race
  always @ (posedge CLK)
    if (RST) Q <= 0; else Q <= D;
endmodule // reg32
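The reg32 model above has no Write Enable input, while the register building block is described with one. A minimal sketch of a variant that adds it follows. The module name reg32we and the port name WE are assumptions made for illustration; they do not appear in the lecture.

```verilog
// Hypothetical variant of reg32 with a Write Enable input (WE).
// Positive edge-triggered, synchronous active-high reset.
module reg32we (CLK, RST, WE, D, Q);
  input         CLK, RST, WE;
  input  [31:0] D;
  output [31:0] Q;
  reg    [31:0] Q;

  always @ (posedge CLK)
    if (RST)     Q <= 32'b0;  // reset wins over write enable
    else if (WE) Q <= D;      // asserted (1): Data Out becomes Data In
                              // deasserted (0): Q keeps its old value
endmodule // reg32we
```

Giving reset priority over WE is a design choice of this sketch, not something the slides specify.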
Storage Element: Register File
• Register File consists of 32 registers:
• Two 32-bit output busses: busA and busB
• One 32-bit input bus: busW
• Register is selected by:
• RA (number) selects the register to put on busA (data)
• RB (number) selects the register to put on busB (data)
• RW (number) selects the register to be written via busW (data) when Write Enable is 1
• Clock input (CLK)
• The CLK input is a factor ONLY during write operation
• During read operation, behaves as a combinational logic block:
- RA or RB valid => busA or busB valid after “access time.”
[Figure: block of 32 32-bit registers with inputs Write Enable, RW, RA, RB (5 bits each), busW (32), Clk and outputs busA, busB (32 bits each)]
Verilog Register File for MIPS Interpreter (1/3)
// Behavioral model of register file:
//   32-bit wide, 32 words deep,
//   two asynchronous read-ports,
//   one synchronous write-port.
//   Dump register file contents to
//   console on pos edge of dump signal.
Verilog Register File for MIPS Interpreter (2/3)
module regFile (CLK, wEnb, DMP,
                writeReg, writeD, readReg1, readD1,
                readReg2, readD2);
  input CLK, wEnb, DMP;
  input [4:0] writeReg, readReg1, readReg2;
  input [31:0] writeD;
  output [31:0] readD1, readD2;
  reg [31:0] readD1, readD2;
  reg [31:0] array [0:31];
  reg dirty1, dirty2;
  integer i;
• 3 5-bit fields to select registers: 1 write register, 2 read registers
Verilog Register File for MIPS Interpreter (3/3)
  always @ (posedge CLK)
    if (wEnb)
      if (writeReg!=5'h0) // why? ($0 is hardwired to zero, so writes to it are ignored)
      begin
        array[writeReg] = writeD;
        dirty1=1'b1;
        dirty2=1'b1;
      end
  always @ (readReg1 or dirty1)
    begin
      readD1 = array[readReg1];
      dirty1=0;
    end
Step 3: Assemble DataPath meeting requirements
• Register Transfer Requirements => Datapath Assembly
• Instruction Fetch
• Read Operands and Execute Operation
3a: Overview of the Instruction Fetch Unit
• The common RTL operations
• Fetch the Instruction: mem[PC]
• Update the program counter:
- Sequential Code: PC = PC + 4
- Branch and Jump: PC = “something else”
[Figure: PC register (clocked by Clk) drives the Address input of Instruction Memory, which outputs the 32-bit Instruction Word; Next Address Logic computes the next PC]
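The PC-update half of the fetch unit can be sketched in a few lines of Verilog. This is an illustrative sketch only: the module name fetchSketch, its port names, and the reset-to-zero behavior are assumptions, not part of the lecture's code.

```verilog
// Hypothetical sketch of the PC register plus the "PC + 4" adder.
// nextAddr is whatever the Next Address Logic chooses:
// PCplus4 for sequential code, "something else" for branches and jumps.
module fetchSketch (CLK, RST, nextAddr, PC, PCplus4);
  input         CLK, RST;
  input  [31:0] nextAddr;
  output [31:0] PC, PCplus4;
  reg    [31:0] PC;

  assign PCplus4 = PC + 32'd4;     // sequential-code next address

  always @ (posedge CLK)
    if (RST) PC <= 32'b0;          // assumed reset value
    else     PC <= nextAddr;       // PC changes only on the clock edge
endmodule // fetchSketch
```

For purely sequential code, PCplus4 would simply be wired back to nextAddr; the instruction itself is then read as mem[PC] from the instruction memory.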
3b: Add & Subtract
• R[rd] = R[rs] op R[rt]   Ex.: addU rd, rs, rt
• Ra, Rb, and Rw come from instruction’s Rs, Rt, and Rd fields

  bits    31-26    25-21    20-16    15-11    10-6     5-0
  field   op       rs       rt       rd       shamt    funct
  width   6 bits   5 bits   5 bits   5 bits   5 bits   6 bits

• ALUctr and RegWr: control logic after decoding the instruction
[Figure: register file (32 32-bit registers; RegWr, Clk; Rw = Rd, Ra = Rs, Rb = Rt, 5 bits each) drives busA and busB (32 bits each) into the ALU (controlled by ALUctr); the 32-bit Result returns on busW]
• Already defined register file, ALU
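The R-format field boundaries above map directly onto Verilog bit selects. A minimal sketch, assuming a hypothetical module name rFormatFields and a 32-bit input called instr (neither is from the lecture):

```verilog
// Hypothetical helper: splits a 32-bit R-format instruction into its fields.
module rFormatFields (instr, op, rs, rt, rd, shamt, funct);
  input  [31:0] instr;
  output [5:0]  op, funct;
  output [4:0]  rs, rt, rd, shamt;

  assign op    = instr[31:26];  // 6 bits
  assign rs    = instr[25:21];  // 5 bits, drives Ra
  assign rt    = instr[20:16];  // 5 bits, drives Rb
  assign rd    = instr[15:11];  // 5 bits, drives Rw for add/sub
  assign shamt = instr[10:6];   // 5 bits
  assign funct = instr[5:0];    // 6 bits
endmodule // rFormatFields
```

The rs, rt, and rd outputs would connect to readReg1, readReg2, and writeReg of the regFile model shown earlier.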
Clocking Methodology
[Figure: clock waveform (Clk)]
• Storage elements clocked by same edge
• Being physical devices, flip-flops (FF) and combinational logic have some delays
• Gates: delay from input change to output change
• Signals at FF D input must be stable before active clock edge to allow signal to travel within the FF, and we have the usual clock-to-Q delay
• “Critical path” (longest path through logic) determines length of clock period
Register-Register Timing: One complete cycle
[Figure: timing diagram for one complete register-register cycle. Each signal changes from Old Value to New Value in turn:
- PC: after the clock edge (Clk)
- Rs, Rt, Rd, Op, Func: after Instruction Memory Access Time
- ALUctr, RegWr: after Delay through Control Logic
- busA, B: after Register File Access Time
- busW: after ALU Delay
- Register Write Occurs Here: at the clock edge that ends the cycle
The accompanying datapath is the register file (32 32-bit registers; RegWr, Clk; Rw, Ra, Rb = Rd, Rs, Rt) driving busA and busB into the ALU (ALUctr), with the 32-bit Result returning on busW.]
3c: Logical Operations with Immediate
• R[rt] = R[rs] op ZeroExt[imm16]

  bits    31-26    25-21    20-16    15-0
  field   op       rs       rt       immediate
  width   6 bits   5 bits   5 bits   16 bits

  (rd? There is no rd field at bits 15-11 in this format.)

• ZeroExt[imm16]: bits 31-16 are 0000000000000000 (16 bits), bits 15-0 are the immediate (16 bits)
[Figure: datapath. A Mux controlled by RegDst selects Rd or Rt as Rw; Ra = Rs, Rb = Rt ("What about Rt register read??"). A Mux controlled by ALUSrc selects busB or ZeroExt(imm16) as the second ALU input. The ALU (ALUctr) Result returns on busW to the 32 32-bit Registers (RegWr, Clk).]
• Already defined 32-bit MUX; Zero Ext?
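The zero extender asked about on this slide is a one-line concatenation in Verilog. A minimal sketch, with a hypothetical module name zeroExt16 and port names chosen for illustration:

```verilog
// Hypothetical zero extender: pads a 16-bit immediate with 16 zero bits.
module zeroExt16 (imm16, out32);
  input  [15:0] imm16;
  output [31:0] out32;

  assign out32 = {16'b0, imm16};  // bits 31-16 = 0, bits 15-0 = imm16
endmodule // zeroExt16
```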
3d: Load Operations
• R[rt] = Mem[R[rs] + SignExt[imm16]]   Example: lw rt, rs, imm16

  bits    31-26    25-21    20-16    15-0
  field   op       rs       rt       immediate
  width   6 bits   5 bits   5 bits   16 bits

[Figure: datapath. A Mux (RegDst) selects Rd or Rt as Rw; Ra = Rs, Rb = Rt (?). The Extender (ExtOp) widens imm16 from 16 to 32 bits; a Mux (ALUSrc) selects busB or the extended immediate as the second ALU input. The ALU (ALUctr) output is the Adr of Data Memory (WrEn = MemWr, Data In, Clk). A Mux (W_Src) selects the ALU output or the Data Memory output for busW, which writes the 32 32-bit Registers (RegWr, Clk).]
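The Extender in the load datapath has to produce either a zero-extended or a sign-extended immediate, selected by ExtOp. A minimal sketch follows. The module name extender and the polarity (ExtOp = 1 means sign extend) are assumptions; the slides name the signal but do not state its encoding here.

```verilog
// Hypothetical extender: ExtOp = 1 sign-extends, ExtOp = 0 zero-extends.
// (The polarity of ExtOp is an assumption of this sketch.)
module extender (imm16, ExtOp, out32);
  input  [15:0] imm16;
  input         ExtOp;
  output [31:0] out32;

  assign out32 = ExtOp ? {{16{imm16[15]}}, imm16}   // replicate the sign bit
                       : {16'b0, imm16};            // pad with zeros
endmodule // extender
```

Loads and stores need the sign-extended form so that negative offsets from R[rs] work; the logical immediate operations use the zero-extended form.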
3e: Store Operations
• Mem[ R[rs] + SignExt[imm16] ] = R[rt]   Ex.: sw rt, rs, imm16

  bits    31-26    25-21    20-16    15-0
  field   op       rs       rt       immediate
  width   6 bits   5 bits   5 bits   16 bits

[Figure: datapath. A Mux (RegDst) selects Rd or Rt as Rw; Ra = Rs, Rb = Rt. The Extender (ExtOp) widens imm16 from 16 to 32 bits; a Mux (ALUSrc) selects busB or the extended immediate as the second ALU input. The ALU (ALUctr) output is the Adr of Data Memory (WrEn = MemWr, Clk), and busB supplies Data In. A Mux (W_Src) selects the ALU output or the Data Memory output for busW, which writes the 32 32-bit Registers (RegWr, Clk).]
3f: The Branch Instruction
• beq rs, rt, imm16

  bits    31-26    25-21    20-16    15-0
  field   op       rs       rt       immediate
  width   6 bits   5 bits   5 bits   16 bits

• mem[PC]   Fetch the instruction from memory
• Equal = R[rs] == R[rt]   Calculate the branch condition
• if (Equal)   Calculate the next instruction’s address
- PC = PC + 4 + ( SignExt(imm16) x 4 )
else
- PC = PC + 4
Datapath for Branch Operations
• beq rs, rt, imm16   Datapath generates condition (equal)

  bits    31-26    25-21    20-16    15-0
  field   op       rs       rt       immediate
  width   6 bits   5 bits   5 bits   16 bits

[Figure: branch datapath. The PC (Clk) supplies the Inst Address. One Adder computes PC + 4; a second Adder adds PC Ext of imm16 (16 bits, with 00 appended) to form the branch target; a Mux controlled by nPC_sel picks the next PC. The 32 32-bit Registers (RegWr, Clk; Ra = Rs, Rb = Rt) drive busA and busB into an Equal? comparison that produces Cond.]
• Already MUX, adder, sign extend, zero
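The next-address choice for beq can be written as a small block of combinational Verilog. This is a sketch under stated assumptions: the module name nextPCSketch and its port list are hypothetical, and nPC_sel = 1 is assumed to mean "the current instruction is a branch".

```verilog
// Hypothetical next-PC logic for beq.
// Assumption: nPC_sel = 1 marks a branch instruction; Equal comes
// from comparing busA and busB in the datapath.
module nextPCSketch (PC, imm16, Equal, nPC_sel, nextPC);
  input  [31:0] PC;
  input  [15:0] imm16;
  input         Equal, nPC_sel;
  output [31:0] nextPC;

  wire [31:0] PCplus4 = PC + 32'd4;
  // SignExt(imm16) x 4: sign-extend, then append 00 (shift left by 2)
  wire [31:0] offset  = {{14{imm16[15]}}, imm16, 2'b00};

  assign nextPC = (nPC_sel & Equal) ? PCplus4 + offset  // branch taken
                                    : PCplus4;          // fall through
endmodule // nextPCSketch
```

The two adders and the mux in the figure correspond to PCplus4, PCplus4 + offset, and the conditional select.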
Putting it All Together:A Single Cycle Datapath
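[Figure: complete single cycle datapath.
- Instruction fetch: PC (Clk), an Adder for PC + 4, a second Adder with PC Ext of imm16 (00 appended) for the branch target, a Mux controlled by nPC_sel, and Inst Memory (Adr) producing Instruction<31:0>
- Instruction fields: Rs <21:25>, Rt <16:20>, Rd <11:15>, Imm16 <0:15>
- Register file: 32 32-bit Registers (RegWr, Clk); Rw from a Mux controlled by RegDst (Rd on input 1, Rt on input 0); Ra = Rs, Rb = Rt; outputs busA and busB (32 bits each)
- ALU (ALUctr): inputs are busA and a Mux controlled by ALUSrc (busB on input 0, Extender output on input 1); the Extender (ExtOp) widens imm16 from 16 to 32 bits; an "=" comparison produces Equal
- Data Memory (Clk): WrEn = MemWr, Adr from the ALU, Data In from busB
- busW: Mux controlled by MemtoReg (ALU output on input 0, Data Memory output on input 1)
Control signals: nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg. Condition: Equal.]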
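The three data-steering multiplexers in the combined datapath can be sketched as continuous assignments. The module name datapathMuxSketch and the port names extImm, aluOut, memOut, and aluInB are hypothetical, and the select polarities (1 picks Rd, the extended immediate, and the memory output respectively) are assumptions of this sketch.

```verilog
// Hypothetical sketch of the RegDst, ALUSrc and MemtoReg muxes.
// Select polarities are assumed, not taken from the lecture's code.
module datapathMuxSketch (RegDst, ALUSrc, MemtoReg,
                          Rt, Rd, busB, extImm, aluOut, memOut,
                          Rw, aluInB, busW);
  input         RegDst, ALUSrc, MemtoReg;
  input  [4:0]  Rt, Rd;
  input  [31:0] busB, extImm, aluOut, memOut;
  output [4:0]  Rw;
  output [31:0] aluInB, busW;

  assign Rw     = RegDst   ? Rd     : Rt;      // which register gets written
  assign aluInB = ALUSrc   ? extImm : busB;    // second ALU operand
  assign busW   = MemtoReg ? memOut : aluOut;  // value written back
endmodule // datapathMuxSketch
```

Under these assumptions, an R-type add would use RegDst = 1, ALUSrc = 0, MemtoReg = 0, while lw would use RegDst = 0, ALUSrc = 1, MemtoReg = 1.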
Peer Instruction
Suppose we’re writing a MIPS interpreter in Verilog. Which sequence below is the best organization for the interpreter?
A. repeat loop that fetches instructions
B. while loop that fetches instructions
C. Decodes instructions using case statement
D. Decodes instr. using chained if statements
E. Executes each instruction
F. Increments PC by 4
Answer choices:
1: ACEF
2: ADEF
3: AECF
4: AEDF
5: BCEF
6: BDEF
7: BECF
8: BEDF
9: EF
0: FAE
An Abstract View of the Implementation
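[Figure: abstract view of the implementation. The PC (Clk) and Next Address logic supply the Instruction Address to the Ideal Instruction Memory, which outputs the Instruction. Control takes the Instruction and the Conditions from the Datapath and produces the Control Signals. In the Datapath, Rd, Rs, Rt (5 bits each) select Rw, Ra, Rb in the 32 32-bit Registers (Clk); outputs A and B (32 bits each) feed the ALU, which supplies the Data Address to the Ideal Data Memory (Data In, Data Out, Clk).]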
Summary: Single cycle datapath
°5 steps to design a processor
• 1. Analyze instruction set => datapath requirements
• 2. Select set of datapath components & establish clock methodology
• 3. Assemble datapath meeting the requirements
• 4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer.
• 5. Assemble the control logic
°Control is the hard part
°Next time!
[Figure: Processor (Control, Datapath), Memory, Input, Output]