L17-ddg-singlecpu.ppt

advertisement
inst.eecs.berkeley.edu/~cs61c
CS61C : Machine Structures
Lecture #17
Single Cycle CPU Datapath
CPS
2005-10-31
today!
There is one handout
today at the front and
back of the room!
Lecturer PSOE, new dad Dan Garcia
www.cs.berkeley.edu/~ddgarcia
Halloween plans?  Try the Castro, SF!
Today 2005-10-31,
from 7pm-mid
($5 donation)
go at least
once…
halloweeninthecastro.com
CS61C L17 Single Cycle CPU Datapath (1)
Happy Halloween, everyone!
Garcia, Fall 2005 © UCB
Wed’s talk : Jim Larus, µsoft (Cal Ph.D.)
“An Overview of the Singularity Project”
• 306 Soda Hall, Wed 2005-11-02 @ 4-5pm
• Dr. Larus is the author of SPIM!
• ftp://ftp.research.microsoft.com/pub/tr/TR-2005-135.pdf
“Singularity is a research project in Microsoft
Research that started with the question: what would
a software platform look like if it was designed from
scratch with the primary goal of dependability?
Singularity is working to answer this question by
building on advances in programming languages
and tools to develop a new system architecture and
operating system (named Singularity), with the aim
of producing a more robust and dependable
software platform. Singularity demonstrates the
practicality of new technologies and architectural
decisions, which should lead to the construction of
more robust and dependable systems.”
CS61C L17 Single Cycle CPU Datapath (2)
Garcia, Fall 2005 © UCB
Review
• Use muxes to select among input
• S input bits selects 2S inputs
• Each input can be n-bits wide, indep of S
• Implement muxes hierarchically
• ALU can be implemented using a mux
• Coupled with basic block elements
• N-bit adder-subtractor done using N 1bit adders with XOR gates on input
• XOR serves as conditional inverter
• Programmable Logic Arrays are often
used to implement our CL
CS61C L17 Single Cycle CPU Datapath (3)
Garcia, Fall 2005 © UCB
Anatomy: 5 components of any Computer
Personal Computer
Computer
Processor
This week
and next
Control
(“brain”)
Datapath
(“brawn”)
Memory
(where
programs,
data
live when
running)
Devices
Input
Output
Keyboard,
Mouse
Disk
(where
programs,
data
live when
not running)
Display,
Printer
CS61C L17 Single Cycle CPU Datapath (4)
Garcia, Fall 2005 © UCB
Outline of Today’s Lecture
• Design a processor: step-by-step
• Requirements of the Instruction Set
• Hardware components that match the
instruction set requirements
CS61C L17 Single Cycle CPU Datapath (5)
Garcia, Fall 2005 © UCB
How to Design a Processor: step-by-step
• 1. Analyze instruction set architecture (ISA) 
datapath requirements
• meaning of each instruction is given by the
register transfers
• datapath must include storage element for ISA
registers
• datapath must support each register transfer
• 2. Select set of datapath components and
establish clocking methodology
• 3. Assemble datapath meeting requirements
• 4. Analyze implementation of each
instruction to determine setting of control
points that effects the register transfer.
•
5. Assemble the control logic
CS61C L17 Single Cycle CPU Datapath (6)
Garcia, Fall 2005 © UCB
Review: The MIPS Instruction Formats
• All MIPS instructions are 32 bits long. 3 formats:
31
26
op
• R-type
rs
6 bits
31
• I-type
26
op
31
16
rt
5 bits
5 bits
21
rs
6 bits
• J-type
21
5 bits
11
rd
shamt
funct
5 bits
5 bits
6 bits
16
6 bits
• The different fields are:
0
0
address/immediate
rt
5 bits
16 bits
26
op
6
0
target address
26 bits
• op: operation (“opcode”) of the instruction
• rs, rt, rd: the source and destination register specifiers
• shamt: shift amount
• funct: selects the variant of the operation in the “op” field
• address / immediate: address offset or immediate value
• target address: target address of jump instruction
CS61C L17 Single Cycle CPU Datapath (7)
Garcia, Fall 2005 © UCB
Step 1a: The MIPS-lite Subset for today
• ADDU and SUBU31
26
•addu rd,rs,rt
op
• OR Immediate:
rs
6 bits
•subu rd,rs,rt
31
op
31
•lw rt,rs,imm16
•sw rt,rs,imm16
31
• BRANCH:
•beq rs,rt,imm16
26
op
6 bits
CS61C L17 Single Cycle CPU Datapath (8)
rs
5 bits
shamt
funct
5 bits
5 bits
6 bits
0
16 bits
0
immediate
5 bits
21
0
rd
16
rt
5 bits
6
immediate
5 bits
21
rs
11
16
rt
5 bits
26
6 bits
5 bits
21
rs
op
16
rt
5 bits
26
•ori rt,rs,imm166 bits
• LOAD and
STORE Word
21
16 bits
16
rt
5 bits
0
immediate
16 bits
Garcia, Fall 2005 © UCB
Register Transfer Language
• RTL gives the meaning of the instructions
{op , rs , rt , rd , shamt , funct} = MEM[ PC ]
{op , rs , rt , Imm16}
= MEM[ PC ]
• All start by fetching the instruction
inst
Register Transfers
ADDU R[rd] = R[rs] + R[rt];
PC = PC + 4
SUBU
R[rd] = R[rs] – R[rt];
PC = PC + 4
ORI
R[rt] = R[rs] | zero_ext(Imm16);
PC = PC + 4
LOAD R[rt] = MEM[ R[rs] + sign_ext(Imm16)];PC = PC + 4
STORE MEM[ R[rs] + sign_ext(Imm16) ] = R[rt];PC = PC + 4
BEQ if ( R[rs] == R[rt] ) then
PC = PC + 4 + (sign_ext(Imm16) || 00)
else PC = PC + 4
CS61C L17 Single Cycle CPU Datapath (9)
Garcia, Fall 2005 © UCB
Step 1: Requirements of the Instruction Set
• Memory (MEM)
• instructions & data
• Registers (R: 32 x 32)
• read RS
• read RT
• Write RT or RD
• PC
• Extender (sign extend)
• Add and Sub register or extended
immediate
• Add 4 or extended immediate to PC
CS61C L17 Single Cycle CPU Datapath (10)
Garcia, Fall 2005 © UCB
Step 2: Components of the Datapath
•Combinational Elements
•Storage Elements
• Clocking methodology
CS61C L17 Single Cycle CPU Datapath (11)
Garcia, Fall 2005 © UCB
Combinational Logic Elements (Building Blocks)
A
B
32
Adder
•Adder
CarryIn
32
Sum
CarryOut
32
Select
B
32
MUX
•MUX
A
32
Y
32
OP
A
B
ALU
•ALU
32
32
Result
32
CS61C L17 Single Cycle CPU Datapath (12)
Garcia, Fall 2005 © UCB
ALU Needs for MIPS-lite + Rest of MIPS
• Addition, subtraction, logical OR, ==:
ADDU R[rd] = R[rs] + R[rt]; ...
SUBU R[rd] = R[rs] – R[rt]; ...
ORI R[rt] = R[rs] | zero_ext(Imm16)...
BEQ
if ( R[rs] == R[rt] )...
• Test to see if output == 0 for any ALU
operation gives == test. How?
• P&H also adds AND,
Set Less Than (1 if A < B, 0 otherwise)
• ALU follows chap 5
CS61C L17 Single Cycle CPU Datapath (13)
Garcia, Fall 2005 © UCB
Administrivia
• Project 2 graded
• You have a week (2005-11-07) to regrade
• My wed OH this week moved to Fri @ 2p
• Final Exam location TBA (exam grp 14)
• Sat, 2005-12-17, 12:30–3:30pm
• ALL students are required to complete ALL
of the exam (even if you aced the midterm)
• Same format as the midterm
- 3 Hours
- Closed book, except for 2 study sheets + green
- Leave your backpacks, books at home
CS61C L17 Single Cycle CPU Datapath (14)
Garcia, Fall 2005 © UCB
Storage Element: Idealized Memory
Write Enable
Address
• Memory (idealized)
• One input bus: Data In
• One output bus: Data Out
• Memory word is selected by:
Data In
32
Clk
DataOut
32
• Address selects the word to put on Data Out
• Write Enable = 1: address selects the memory
word to be written via the Data In bus
• Clock input (CLK)
• The CLK input is a factor ONLY during write operation
• During read operation, behaves as a combinational logic
block:
-
Address valid  Data Out valid after “access time.”
CS61C L17 Single Cycle CPU Datapath (15)
Garcia, Fall 2005 © UCB
Storage Element: Register (Building Block)
• Similar to D Flip Flop except
- N-bit input and output
- Write Enable input
• Write Enable:
- negated (or deasserted) (0):
Data Out will not change
- asserted (1):
Data Out will become Data In
CS61C L17 Single Cycle CPU Datapath (16)
Write Enable
Data In
N
Data Out
N
Clk
Garcia, Fall 2005 © UCB
Storage Element: Register File
• Register File consists of 32 registers:
• Two 32-bit output busses:
busA and busB
• One 32-bit input bus: busW
• Register is selected by:
RW RA RB
Write Enable 5 5 5
busW
32
Clk
busA
32
32 32-bit
Registers busB
32
• RA (number) selects the register to put on busA (data)
• RB (number) selects the register to put on busB (data)
• RW (number) selects the register to be written
via busW (data) when Write Enable is 1
• Clock input (CLK)
• The CLK input is a factor ONLY during write operation
• During read operation, behaves as a combinational
logic block:
-
RA or RB valid => busA or busB valid after “access time.”
CS61C L17 Single Cycle CPU Datapath (17)
Garcia, Fall 2005 © UCB
Step 3: Assemble DataPath meeting requirements
• Register Transfer Requirements
 Datapath Assembly
• Instruction Fetch
• Read Operands and Execute Operation
CS61C L17 Single Cycle CPU Datapath (18)
Garcia, Fall 2005 © UCB
3a: Overview of the Instruction Fetch Unit
• The common RTL operations
• Fetch the Instruction: mem[PC]
• Update the program counter:
- Sequential Code: PC = PC + 4
- Branch and Jump: PC = “something else”
Clk
PC
Next Address
Logic
Address
Instruction
Memory
CS61C L17 Single Cycle CPU Datapath (19)
Instruction Word
32
Garcia, Fall 2005 © UCB
3b: Add & Subtract
• R[rd] = R[rs] op R[rt] Ex.: addU rd,rs,rt
• Ra, Rb, and Rw come from instruction’s Rs, Rt,
26
21
16
11
6
and Rd fields 31
op
6 bits
rs
5 bits
rt
5 bits
rd
5 bits
shamt
5 bits
funct
6 bits
0
• ALUctr and RegWr: control logic after decoding
the instruction
Rd Rs Rt
RegWr 5 5 5
32 32-bit
Registers
busA
32
busB
32
ALU
busW
32
Clk
Rw Ra Rb
ALUctr
Result
32
• Already defined register file, ALU
CS61C L17 Single Cycle CPU Datapath (20)
Garcia, Fall 2005 © UCB
Clocking Methodology
Clk
.
.
.
.
.
.
.
.
.
.
.
.
• Storage elements clocked by same edge
• Being physical devices, flip-flops (FF) and
combinational logic have some delays
• Gates: delay from input change to output change
• Signals at FF D input must be stable before active clock
edge to allow signal to travel within the FF, and we have
the usual clock-to-Q delay
• “Critical path” (longest path through logic)
determines length of clock period
CS61C L17 Single Cycle CPU Datapath (21)
Garcia, Fall 2005 © UCB
Register-Register Timing: One complete cycle
Clk
New Value
PC Old Value
Rs, Rt, Rd,
Op, Func
Old Value
ALUctr
Old Value
RegWr
Old Value
busA, B
Old Value
busW
Old Value
Instruction Memory Access Time
New Value
Delay through Control Logic
New Value
New Value
Register File Access
Time New Value
ALU Delay
New Value
Rd Rs Rt
RegWr5 5 5
CS61C L17 Single Cycle CPU Datapath (22)
busA
32
busB
32
ALU
busW
32
Clk
Rw Ra Rb
32 32-bit
Registers
ALUctr
Register Write
Occurs Here
Result
32
Garcia, Fall 2005 © UCB
3c: Logical Operations with Immediate
• R[rt] = R[rs] op ZeroExt[imm16] ]
31
26
21
op
rs
31 6 bits
5 bits
16
rt
5 bits 16 15 rd?
11
0
immediate
16 bits
0
ALU
immediate
0000000000000000
Rd
Rt
RegDst
16 bits
16 bits
Mux
Rt register read??
Rs Rt? What about
ALUct
RegWr 5 5
5
r
busA
Rw Ra Rb
busW
32
Result
32 32-bit
32
Registers
32
busB
Clk
32
Mux
16
ZeroExt
imm16
32
ALUSrc
• Already defined 32-bit MUX; Zero Ext?
CS61C L17 Single Cycle CPU Datapath (23)
Garcia, Fall 2005 © UCB
3d: Load Operations
• R[rt] = Mem[R[rs] + SignExt[imm16]]
Example: lw rt,rs,imm16
31
26
op
rs
6 bits
Rd
RegDst
Mux
RegWr 5
32
Clk
0
rt
5 bits
immediate
5 bits
16 bits
Rt
Rs Rt
5
5
Rw Ra Rb
32 32-bit
Registers
busA
W_Src
32
32
ExtOp
CS61C L17 Single Cycle CPU Datapath (24)
32
MemWr
??
ALUSrc
Data In
32
Clk
Mu
x
busB
32
Mux
16
ALUctr
Extender
imm16
16
ALU
busW
21
WrEn Adr
Data
Memory
32
Garcia, Fall 2005 © UCB
3e: Store Operations
• Mem[ R[rs] + SignExt[imm16] ] = R[rt]
Ex.: sw rt, rs, imm16
31
26
21
op
rs
6 bits 5 bits
Rd Rt
RegDst
Mux
RegWr5 5
rt
5 bits
W_Src
32
ExtOp
CS61C L17 Single Cycle CPU Datapath (25)
32
Data In32
Clk
WrEn Adr
32
Data
Memory
Mux
Extender
16
immediate
16 bits
ALUctr MemWr
ALU
busA
Rw Ra Rb
32
32 32-bit
Registers busB
32
imm16
0
Rs Rt
5
Mux
busW
32
Clk
16
ALUSrc
Garcia, Fall 2005 © UCB
3f: The Branch Instruction
31
26
op
6 bits
21
rs
5 bits
16
rt
5 bits
0
immediate
16 bits
• beq rs, rt, imm16
• mem[PC] Fetch the instruction from memory
• Equal = R[rs] == R[rt] Calculate branch condition
• if (Equal) Calculate the next instruction’s address
- PC = PC + 4 + ( SignExt(imm16) x 4 )
else
- PC = PC + 4
CS61C L17 Single Cycle CPU Datapath (26)
Garcia, Fall 2005 © UCB
Datapath for Branch Operations
• beq rs, rt, imm16
Datapath generates condition (equal)
26
op
6 bits
21
00
Adder
32
PC
Mux
Adder
PC Ext
imm16
0
rs
rt
immediate
5 bits 5 bits
16 bits
Inst Address
nPC_sel
4
16
Rs Rt
5
busA
Rw Ra Rb
32
32 32-bit
Registers
busB
32
Cond
RegWr 5 5
busW
Clk
Equal?
31
Clk
• Already MUX, adder, sign extend, zero
CS61C L17 Single Cycle CPU Datapath (27)
Garcia, Fall 2005 © UCB
Putting it All Together:A Single Cycle Datapath
Instruction<31:0>
<0:15>
<11:15>
Rs
<16:20>
<21:25>
Inst
Memory
Adr
Rt Rd Imm16
RegDst
ALUctr MemWr MemtoReg
Equal
Rt
Rd
1 0
Rs Rt
RegWr 5 5 5
busA
Rw
Ra
Rb
=
busW
32
32 32-bit
0
32
32
Registers busB
0
32
Clk
32
WrEn Adr 1
1 Data In
Data
imm16
32
Clk
16
Clk Memory
nPC_sel
imm16
Mux
ALU
Extender
PC Ext
Adder
Mux
PC
Mux
Adder
00
4
ExtOp ALUSrc
CS61C L17 Single Cycle CPU Datapath (28)
Garcia, Fall 2005 © UCB
An Abstract View of the Implementation
PC
Clk
Next Address
ALU
Control
Ideal
Instruction
Instruction Control Signals Conditions
Memory Rd Rs Rt
5 5
5
Instruction
Address
A
Data
Data
32 Address
Rw
Ra
Rb
32
Ideal
Out
32 32-bit 32
Data
Data
Registers B
Memory
In
Clk
32
Clk
Datapath
CS61C L17 Single Cycle CPU Datapath (29)
Garcia, Fall 2005 © UCB
Peer Instruction
A. If the destination reg is the same
as the source reg, we could
compute the incorrect value!
B. We’re going to be able to read 2
registers and write a 3rd in 1 cycle
C. Datapath is hard, Control is easy
CS61C L17 Single Cycle CPU Datapath (30)
1:
2:
3:
4:
5:
6:
7:
8:
ABC
FFF
FFT
FTF
FTT
TFF
TFT
TTF
TTT
Garcia, Fall 2005 © UCB
Peer Instruction
A. Our ALU is a synchronous device
B. We should use the main ALU to
compute PC=PC+4
C. The ALU is inactive for memory
reads or writes.
CS61C L17 Single Cycle CPU Datapath (31)
1:
2:
3:
4:
5:
6:
7:
8:
ABC
FFF
FFT
FTF
FTT
TFF
TFT
TTF
TTT
Garcia, Fall 2005 © UCB
Peer Instruction
ABC
A. SW can peek at HW (past ISA
FFF
abstraction boundary) for optimizations 1:
2: FFT
3: FTF
B. SW can depend on particular HW
4: FTT
implementation of ISA
C. Timing diagrams serve as a critical
debugging tool in the EE toolkit
CS61C L17 Single Cycle CPU Datapath (32)
5:
6:
7:
8:
TFF
TFT
TTF
TTT
Garcia, Fall 2005 © UCB
Peer Instruction
A.
(a+b)• (a+b) = b
B.
N-input gates can be thought of cascaded 2input gates. I.e.,
(a ∆ b ∆ c ∆ d) = a ∆ (b ∆ (c ∆ d))
where ∆ is one of AND, OR, XOR, NAND
C.
You can use NOR(s) with clever wiring to
simulate AND, OR, & NOT
CS61C L17 Single Cycle CPU Datapath (33)
1:
2:
3:
4:
5:
6:
7:
8:
ABC
FFF
FFT
FTF
FTT
TFF
TFT
TTF
TTT
Garcia, Fall 2005 © UCB
Peer Instruction Answer
A. (next slide)
B. (next slide)
C. You can use NOR(s) with clever wiring to
simulate AND, OR, & NOT.
°
°
°
A.
B.
C.
NOR(a,a)= a+a = aa = a
Using this NOT, can we make a NOR an OR? An And?
TRUE
ABC
(a+b)• (a+b) = b
1: FFF
N-input gates can be thought of cascaded 22: FFT
input gates. I.e.,
3: FTF
(a ∆ b ∆ c ∆ d) = a ∆ (b ∆ (c ∆ d))
4: FTT
where ∆ is one of AND, OR, XOR, NAND
5: TFF
6: TFT
You can use NOR(s) with clever wiring to
7: TTF
simulate AND, OR, & NOT
8: TTT
CS61C L17 Single Cycle CPU Datapath (34)
Garcia, Fall 2005 © UCB
Peer Instruction Answer (A)
(a+b) • (a+b) =?= b
A.
(a+b)•(a+b)
aa+ab+ba+bb
distribution
0+b(a+a)+b
complimentarity,
commutativity,
distribution,
idempotent
b(1)+b
identity, complimentarity
b+b
identity
b
idempotent
CS61C L17 Single Cycle CPU Datapath (35)
TRUE
Garcia, Fall 2005 © UCB
A.
Peer Instruction Answer (B)
B.
N-input gates can be thought of cascaded 2-input
gates. I.e.,
(a ∆ b ∆ c ∆ d) = a ∆ (b ∆ (c ∆ d))
where ∆ is one of AND, OR, XOR, NAND…FALSE
Let’s confirm!
CORRECT 3-input
XYZ|AND|OR|XOR|NAND
000| 0 0|0 0| 0 0| 1 1
001| 0 0|1 1| 1 1| 1 1
010| 0 0|1 1| 1 1| 1 1
011| 0 0|1 1| 0 0| 1 1
100| 0 0|1 1| 1 1| 1 0
101| 0 0|1 1| 0 0| 1 0
110| 0 0|1 1| 0 0| 1 0
111| 1 1|1 1| 1 1| 0 1
CS61C L17 Single Cycle CPU Datapath (36)
CORRECT 2-input
YZ|AND|OR|XOR|NAND
00| 0 |0 | 0 | 1
01| 0 |1 | 1 | 1
10| 0 |1 | 1 | 1
11| 1 |1 | 0 | 0
Garcia, Fall 2005 © UCB
Peer Instruction
A. Truth table for mux with 4-bits of
signals has 24 rows
B. We could cascade N 1-bit shifters
to make 1 N-bit shifter for sll, srl
C. If 1-bit adder delay is T, the N-bit
adder delay would also be T
CS61C L17 Single Cycle CPU Datapath (37)
1:
2:
3:
4:
5:
6:
7:
8:
ABC
FFF
FFT
FTF
FTT
TFF
TFT
TTF
TTT
Garcia, Fall 2005 © UCB
Peer Instruction Answer
A. Truth table for mux with 4-bits of signals
controls 16 inputs, for a total of 20 inputs,
so truth table is 220 rows…FALSE
B. We could cascade N 1-bit shifters to
make 1 N-bit shifter for sll, srl … TRUE
C. What about the cascading carry? FALSE
ABC
A. Truth table for mux with 4-bits of
1: FFF
4
signals is 2 rows long
2: FFT
B. We could cascade N 1-bit shifters
to make 1 N-bit shifter for sll, srl
C. If 1-bit adder delay is T, the N-bit
adder delay would also be T
CS61C L17 Single Cycle CPU Datapath (38)
3:
4:
5:
6:
7:
8:
FTF
FTT
TFF
TFT
TTF
TTT
Garcia, Fall 2005 © UCB
Summary: Single cycle datapath
°5 steps to design a processor
• 1. Analyze instruction set  datapath requirements
• 2. Select set of datapath components & establish clock
methodology
• 3. Assemble datapath meeting the requirements
• 4. Analyze implementation of each instruction to
determine setting of control points that effects the
register transfer.
Processor
• 5. Assemble the control logic
°Control is the hard part
°Next time!
CS61C L17 Single Cycle CPU Datapath (39)
Input
Control
Memory
Datapath
Output
Garcia, Fall 2005 © UCB
Download