inst.eecs.berkeley.edu/~cs61c
CS61CL : Machine Structures
Lecture #9 – Single Cycle CPU Design
2009-07-22
Jeremy Huddleston
CS61CL L09 Single Cycle CPU Design (1)
Huddleston, Summer 2009 © UCB
Finite State Machine Example: 3 ones…
FSM to detect the occurrence of 3 consecutive 1’s in the input.
Draw the FSM…
Assume state transitions are controlled by the clock:
on each clock cycle the machine checks the inputs and moves
to a new state and produces a new output…
Hardware Implementation of FSM
… Therefore a register is needed to hold a representation of which
state the machine is in. Use a unique bit pattern for each state.
A combinational logic circuit is used to implement a function that
maps from present state and input to next state and output.
Register Details…What’s inside?
• n instances of a “Flip-Flop”
• Called a “flip-flop” because the output flips and
flops between 0 and 1
• D is “data”, Q is “output”
• Also called “D-type flip-flop”
Adder/Subtracter – One-bit adder LSB…
Adder/Subtracter – One-bit adder (1/2)…
Adder/Subtracter – One-bit adder (2/2)…
N 1-bit adders ⇒ 1 N-bit adder
[Figure: N 1-bit adders connected in a chain to form one N-bit adder]
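Building an N-bit adder out of N 1-bit adders can be sketched as follows. This is an illustrative Python model, assuming a ripple-carry arrangement in which each 1-bit adder's carry-out feeds the next adder's carry-in; the function names `full_adder` and `ripple_add` are hypothetical.

```python
def full_adder(a, b, carry_in):
    """One-bit adder: returns (sum bit, carry-out bit)."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    return s, carry_out


def ripple_add(a, b, n=4):
    """Add two n-bit unsigned values using n chained 1-bit adders."""
    carry = 0        # carry-in of the least significant adder
    result = 0
    for i in range(n):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry  # carry is the carry-out of the highest adder
```

The final carry is returned separately so that the caller can see when the n-bit result wrapped around.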
Administrivia
• Midterm handed back today
Five Components of a Computer
• Processor
- Control
- Datapath
• Memory (passive): where programs, data live when running
• Devices
- Input: keyboard, mouse
- Output: display, printer
- Disk: where programs, data live when not running
The CPU
• Processor (CPU): the active part of the
computer, which does all the work
(data manipulation and decision-making)
• Datapath: portion of the processor
which contains hardware necessary to
perform operations required by the
processor (the brawn)
• Control: portion of the processor (also
in hardware) which tells the datapath
what needs to be done (the brain)
Stages of the Datapath : Overview
• Problem: a single, atomic block which
“executes an instruction” (performs all
necessary operations beginning with
fetching the instruction) would be too
bulky and inefficient
• Solution: break up the process of
“executing an instruction” into stages,
and then connect the stages to create
the whole datapath
• smaller stages are easier to design
• easy to optimize (change) one stage
without touching the others
Stages of the Datapath (1/5)
• There is a wide variety of MIPS
instructions: so what general steps do
they have in common?
• Stage 1: Instruction Fetch
• no matter what the instruction, the 32-bit
instruction must first be fetched from
memory (the cache-memory hierarchy)
• also, this is where we Increment PC
(that is, PC = PC + 4, to point to the next
instruction: byte addressing so + 4)
Stages of the Datapath (2/5)
• Stage 2: Instruction Decode
• upon fetching the instruction, we next
gather data from the fields (decode all
necessary instruction data)
• first, read the opcode to determine
instruction type and field lengths
• second, read in data from all necessary
registers
- for add, read two registers
- for addi, read one register
- for jal, no reads necessary
Stages of the Datapath (3/5)
• Stage 3: ALU (Arithmetic-Logic Unit)
• the real work of most instructions is done
here: arithmetic (+, -, *, /), shifting, logic (&,
|), comparisons (slt)
• what about loads and stores?
- lw $t0, 40($t1)
- the address we are accessing in memory = the
value in $t1 PLUS the value 40
- so we do this addition in this stage
Stages of the Datapath (4/5)
• Stage 4: Memory Access
• actually only the load and store
instructions do anything during this
stage; the others remain idle during this
stage or skip it altogether
• since these instructions have a unique
step, we need this extra stage to account
for them
• as a result of the cache system, this
stage is expected to be fast
Stages of the Datapath (5/5)
• Stage 5: Register Write
• most instructions write the result of
some computation into a register
• examples: arithmetic, logical, shifts,
loads, slt
• what about stores, branches, jumps?
- don’t write anything into a register at the
end
- these remain idle during this fifth stage or
skip it altogether
Generic Steps of Datapath
[Figure: PC (+4) → instruction memory → registers (rs, rt, rd) and imm → ALU → data memory. Stages: 1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Reg. Write]
Datapath Walkthroughs (1/3)
• add $r3,$r1,$r2 # r3 = r1+r2
• Stage 1: fetch this instruction, inc. PC
• Stage 2: decode to find it’s an add, then
read registers $r1 and $r2
• Stage 3: add the two values retrieved in
Stage 2
• Stage 4: idle (nothing to write to memory)
• Stage 5: write result of Stage 3 into register
$r3
Example: add Instruction
[Figure: add r3, r1, r2 on the datapath: registers 1 and 2 are read as reg[1] and reg[2], the ALU computes reg[1]+reg[2], and the result is written to register 3; the PC advances by +4]
Datapath Walkthroughs (3/3)
• sw $r3, 17($r1)
• Stage 1: fetch this instruction, inc. PC
• Stage 2: decode to find it’s a sw, then
read registers $r1 and $r3
• Stage 3: add 17 to value in register $r1
(retrieved in Stage 2)
• Stage 4: write value in register $r3
(retrieved in Stage 2) into memory
address computed in Stage 3
• Stage 5: idle (nothing to write into a
register)
Example: sw Instruction
[Figure: SW r3, 17(r1) on the datapath: registers 1 and 3 are read, the ALU computes reg[1]+17 from reg[1] and imm 17, and reg[3] is written to data memory: MEM[r1+17] <= r3; the PC advances by +4]
Why Five Stages? (1/2)
• Could we have a different number of
stages?
• Yes, and other architectures do
• So why does MIPS have five if
instructions tend to idle for at least
one stage?
• The five stages are the union of all the
operations needed by all the
instructions.
• There is one type of instruction that uses
all five stages: the load
Why Five Stages? (2/2)
• lw $r3, 17($r1)
• Stage 1: fetch this instruction, inc. PC
• Stage 2: decode to find it’s a lw, then
read register $r1
• Stage 3: add 17 to value in register $r1
(retrieved in Stage 2)
• Stage 4: read value from memory
address computed in Stage 3
• Stage 5: write value found in Stage 4 into
register $r3
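The five stages of the load can be traced in a few lines of Python. This is only a sketch under simplifying assumptions: the register file is a plain list, data memory is a dictionary keyed by address (so alignment and byte ordering are ignored), the instruction is passed in already decoded, and the name `run_lw` is hypothetical.

```python
def run_lw(regs, data_mem, rt, rs, imm, pc):
    """Trace lw rt, imm(rs) through the five stages; returns the next PC."""
    # Stage 1: fetch the instruction (already given here) and increment PC
    next_pc = pc + 4
    # Stage 2: decode and read the base register
    base = regs[rs]
    # Stage 3: the ALU adds the immediate to the base register value
    addr = base + imm
    # Stage 4: read the value from the memory address computed in Stage 3
    value = data_mem[addr]
    # Stage 5: write the value found in Stage 4 into the destination register
    regs[rt] = value
    return next_pc


# Usage: lw $r3, 17($r1) with hypothetical register and memory contents
regs = [0] * 32
regs[1] = 100
data_mem = {117: 42}
next_pc = run_lw(regs, data_mem, rt=3, rs=1, imm=17, pc=0)
```

After the call, `regs[3]` holds the word that was stored at address 100 + 17, and every one of the five stages did useful work.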
Example: lw Instruction
[Figure: LW r3, 17(r1) on the datapath: register 1 is read as reg[1], the ALU computes reg[1]+17 from reg[1] and imm 17, data memory returns MEM[r1+17], and that value is written to register 3; the PC advances by +4]
Datapath Summary
• The datapath is based on the data transfers
required to perform instructions
• A controller causes the right transfers
to happen
[Figure: datapath (PC, +4, instruction memory, registers with rs, rt, rd, imm, ALU, data memory) driven by a Controller whose inputs are opcode and funct]
CPU clocking (1/2)
For each instruction, how do we control the
flow of information through the datapath?
• Single Cycle CPU: All stages of an
instruction are completed within one
long clock cycle.
• The clock cycle is made sufficiently long to
allow each instruction to complete all
stages without interruption and within
one cycle.
[Figure: one clock cycle spans all five stages: 1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Reg. Write]
CPU clocking (2/2)
For each instruction, how do we control the
flow of information through the datapath?
• Multiple-cycle CPU: Only one stage of
instruction per clock cycle.
• The clock period is made as long as the slowest
stage.
[Figure: each of the five stages (1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Reg. Write) takes its own clock cycle]
• Several significant advantages over
single cycle execution: Unused stages in
a particular instruction can be skipped
OR instructions can be pipelined
(overlapped).
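The two clocking schemes can be compared with a small calculation. The stage delays below are hypothetical numbers chosen only for illustration (they do not come from the lecture), and the function names are assumptions as well.

```python
# Hypothetical stage delays in picoseconds (illustrative values only)
STAGE_DELAY_PS = {
    "fetch": 200,
    "decode": 100,
    "execute": 200,
    "memory": 200,
    "write": 100,
}


def single_cycle_period(delays):
    """Single cycle CPU: one clock cycle must cover every stage in sequence."""
    return sum(delays.values())


def multi_cycle_period(delays):
    """Multiple-cycle CPU: the clock is as long as the slowest stage."""
    return max(delays.values())


def multi_cycle_time(delays, stages_used):
    """Time for one instruction when unused stages are skipped."""
    return len(stages_used) * multi_cycle_period(delays)
```

With these numbers the single-cycle clock period is 800 ps and the multiple-cycle clock period is 200 ps. A load that uses all five stages then takes 1000 ps on the multiple-cycle design, while an instruction that uses only three stages takes 600 ps, which is where skipping unused stages pays off.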
How to Design a Processor: step-by-step
• 1. Analyze instruction set architecture (ISA)
⇒ datapath requirements
• meaning of each instruction is given by the register transfers
• datapath must include storage element for ISA registers
• datapath must support each register transfer
• 2. Select set of datapath components and establish
clocking methodology
• 3. Assemble datapath meeting requirements
• 4. Analyze implementation of each instruction to
determine setting of control points that effects the
register transfer.
• 5. Assemble the control logic (hard part!)
Review: The MIPS Instruction Formats
• All MIPS instructions are 32 bits long. 3 formats:
• R-type: op (6 bits, 31-26) | rs (5 bits, 25-21) | rt (5 bits, 20-16) | rd (5 bits, 15-11) | shamt (5 bits, 10-6) | funct (6 bits, 5-0)
• I-type: op (6 bits, 31-26) | rs (5 bits, 25-21) | rt (5 bits, 20-16) | address/immediate (16 bits, 15-0)
• J-type: op (6 bits, 31-26) | target address (26 bits, 25-0)
• The different fields are:
• op: operation (“opcode”) of the instruction
• rs, rt, rd: the source and destination register specifiers
• shamt: shift amount
• funct: selects the variant of the operation in the “op” field
• address / immediate: address offset or immediate value
• target address: target address of jump instruction
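Extracting these fields from a 32-bit instruction word is a matter of shifting and masking. The following Python sketch uses the bit positions listed above; the function name `decode` and the dictionary keys are illustrative choices, and the function returns every field regardless of format, leaving it to the caller to pick the ones that apply.

```python
def decode(instr):
    """Split a 32-bit MIPS instruction word into its possible fields."""
    return {
        "op": (instr >> 26) & 0x3F,     # bits 31-26
        "rs": (instr >> 21) & 0x1F,     # bits 25-21
        "rt": (instr >> 16) & 0x1F,     # bits 20-16
        "rd": (instr >> 11) & 0x1F,     # bits 15-11 (R-type)
        "shamt": (instr >> 6) & 0x1F,   # bits 10-6  (R-type)
        "funct": instr & 0x3F,          # bits 5-0   (R-type)
        "imm16": instr & 0xFFFF,        # bits 15-0  (I-type)
        "target": instr & 0x3FFFFFF,    # bits 25-0  (J-type)
    }


# Usage: 0x00221821 encodes addu $3, $1, $2 (op 0, funct 0x21)
fields = decode(0x00221821)
```

For this word the decoder yields rs = 1, rt = 2, rd = 3, which matches the register operands of the addu.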
The MIPS-lite Subset for today
• ADDU and SUBU
• addu rd,rs,rt
• subu rd,rs,rt
- format: op (6 bits, 31-26) | rs (5 bits, 25-21) | rt (5 bits, 20-16) | rd (5 bits, 15-11) | shamt (5 bits, 10-6) | funct (6 bits, 5-0)
• OR Immediate:
• ori rt,rs,imm16
- format: op (6 bits, 31-26) | rs (5 bits, 25-21) | rt (5 bits, 20-16) | immediate (16 bits, 15-0)
• LOAD and STORE Word
• lw rt,rs,imm16
• sw rt,rs,imm16
- format: op (6 bits, 31-26) | rs (5 bits, 25-21) | rt (5 bits, 20-16) | immediate (16 bits, 15-0)
• BRANCH:
• beq rs,rt,imm16
- format: op (6 bits, 31-26) | rs (5 bits, 25-21) | rt (5 bits, 20-16) | immediate (16 bits, 15-0)
ALU Needs for MIPS-lite + Rest of MIPS
• Addition, subtraction, logical OR, ==:
ADDU R[rd] = R[rs] + R[rt]; ...
SUBU R[rd] = R[rs] - R[rt]; ...
ORI R[rt] = R[rs] | zero_ext(Imm16)...
BEQ if ( R[rs] == R[rt] )...
• Test to see if output == 0 for any ALU
operation gives == test. How?
• P&H also adds AND,
Set Less Than (1 if A < B, 0 otherwise)
• ALU follows chap 5
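One way to answer the "How?" above: subtract the two operands and check whether the result is zero. The Python sketch below models a 32-bit ALU with only the three operations MIPS-lite needs; the operation names, the `zero` flag, and the helper `beq_taken` are illustrative assumptions, not the signal names of any particular design.

```python
MASK32 = 0xFFFFFFFF


def alu(op, a, b):
    """Minimal 32-bit ALU: returns (result, zero flag)."""
    if op == "add":
        result = (a + b) & MASK32
    elif op == "sub":
        result = (a - b) & MASK32
    elif op == "or":
        result = (a | b) & MASK32
    else:
        raise ValueError("unsupported ALU operation: " + op)
    zero = 1 if result == 0 else 0   # output == 0 test
    return result, zero


def beq_taken(rs_val, rt_val):
    """R[rs] == R[rt] exactly when R[rs] - R[rt] is zero."""
    _, zero = alu("sub", rs_val, rt_val)
    return zero == 1
```

Because the zero test is attached to the ALU output, the equality comparison for beq needs no separate comparator in this model.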
What Hardware Is Needed? (1/2)
• PC: a register which keeps track of
memory addr of the next instruction
• General Purpose Registers
• used in Stages 2 (Read) and 5 (Write)
• MIPS has 32 of these
• Memory
• used in Stages 1 (Fetch) and 4 (R/W)
• cache system makes these two stages as
fast as the others, on average
What Hardware Is Needed? (2/2)
• ALU
• used in Stage 3
• something that performs all necessary
functions: arithmetic, logicals, etc.
• we’ll design details later
• Miscellaneous Registers
• In implementations with only one stage per
clock cycle, registers are inserted between
stages to hold intermediate data and control
signals as they travel from stage to stage.
• Note: Register is a general purpose term
meaning something that stores bits. Not all
registers are in the “register file”.
Combinational Logic Elements (Building Blocks)
• Adder: inputs A, B (32 bits each) and CarryIn; outputs Sum (32 bits) and CarryOut
• MUX: inputs A, B (32 bits each) and Select; output Y (32 bits)
• ALU: inputs A, B (32 bits each) and OP; output Result (32 bits)
Storage Element: Idealized Memory
• Memory (idealized)
• One input bus: Data In
• One output bus: Data Out
• Memory word is found by:
• Address selects the word to put on Data Out
• Write Enable = 1: address selects the memory
word to be written via the Data In bus
• Clock input (CLK)
• The CLK input is a factor ONLY during write
operation
• During read operation, behaves as a combinational
logic block:
- Address valid ⇒ Data Out valid after “access time.”
[Figure: memory block with inputs Write Enable, Address, Data In (32 bits), Clk and output DataOut (32 bits)]
Storage Element: Register (Building Block)
• Similar to D Flip Flop except
• N-bit input and output
• Write Enable input
• Write Enable:
• negated (or deasserted) (0):
Data Out will not change
• asserted (1):
Data Out will become Data In on positive
edge of clock
[Figure: register block with inputs Write Enable, Data In (N bits), clk and output Data Out (N bits)]
Storage Element: Register File
• Register File consists of 32 registers:
• Two 32-bit output busses:
busA and busB
• One 32-bit input bus: busW
• Register is selected by:
• RA (number) selects the register to put on busA (data)
• RB (number) selects the register to put on busB (data)
• RW (number) selects the register to be written
via busW (data) when Write Enable is 1
• Clock input (clk)
• The clk input is a factor ONLY during write operation
• During read operation, behaves as a combinational logic
block:
- RA or RB valid ⇒ busA or busB valid after “access time.”
[Figure: block of 32 32-bit registers with inputs Write Enable, RW, RA, RB (5 bits each), busW (32 bits), Clk and outputs busA, busB (32 bits each)]
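The read and write behavior described above can be modeled in a few lines. This Python class is a behavioral sketch only: the class and method names are hypothetical, a call to `clock_edge` stands in for one active clock edge, and MIPS details such as register 0 always reading as zero are deliberately not modeled.

```python
class RegisterFile:
    """32 registers of 32 bits with two read ports and one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, ra, rb):
        """Combinational read: RA and RB select what appears on busA, busB."""
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, write_enable, rw, bus_w):
        """The clock matters only for writes: store busW into register RW."""
        if write_enable:
            self.regs[rw] = bus_w & 0xFFFFFFFF


# Usage: write register 3, then read it on both ports
rf = RegisterFile()
rf.clock_edge(1, 3, 0xDEADBEEF)   # Write Enable = 1, so the write happens
rf.clock_edge(0, 3, 7)            # Write Enable = 0, so register 3 keeps its value
bus_a, bus_b = rf.read(3, 0)
```

Reads take no clock edge in this model, mirroring the slide's point that the register file behaves as combinational logic during a read.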
Overview of the Instruction Fetch Unit
• The common RTL operations
• Fetch the Instruction: mem[PC]
• Update the program counter:
- Sequential Code: PC ← PC + 4
- Branch and Jump: PC ← “something else”
[Figure: Next Address Logic updates PC (clk); PC supplies the Address to Instruction Memory, which outputs the 32-bit Instruction Word]
Add & Subtract
• R[rd] = R[rs] op R[rt]
Ex.: addu rd,rs,rt
• Ra, Rb, and Rw come from instruction’s Rs, Rt, and Rd fields
• ALUctr and RegWr: control logic after decoding the instruction
• format: op (6 bits, 31-26) | rs (5 bits, 25-21) | rt (5 bits, 20-16) | rd (5 bits, 15-11) | shamt (5 bits, 10-6) | funct (6 bits, 5-0)
• … Already defined the register file & ALU
[Figure: Rd, Rs, Rt (5 bits each) drive Rw, Ra, Rb of the 32 32-bit Registers (RegWr, clk); busA and busB (32 bits each) feed the ALU (ALUctr); the 32-bit Result returns on busW]
Logical Operations with Immediate
• R[rt] = R[rs] op ZeroExt[imm16]
• format: op (6 bits, 31-26) | rs (5 bits, 25-21) | rt (5 bits, 20-16) | immediate (16 bits, 15-0)
• ZeroExt[imm16]: 0000000000000000 (16 bits, 31-16) | immediate (16 bits, 15-0)
• But we’re writing to Rt register??
[Figure: the R-type datapath: Rd, Rs, Rt drive Rw, Ra, Rb of the RegFile (RegWr, clk); busA and busB feed the ALU (ALUctr); the result returns on busW]
Logical Operations with Immediate
• R[rt] = R[rs] op ZeroExt[imm16]
• format: op (6 bits, 31-26) | rs (5 bits, 25-21) | rt (5 bits, 20-16) | immediate (16 bits, 15-0)
• ZeroExt[imm16]: 0000000000000000 (16 bits, 31-16) | immediate (16 bits, 15-0)
• What about Rt register read??
[Figure: a mux controlled by RegDst selects Rd or Rt as Rw; a mux controlled by ALUSrc selects busB or ZeroExt(imm16) (16 bits in, 32 bits out) as the second ALU input; RegWr and ALUctr as before]
• Already defined 32-bit MUX; Zero Ext?
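Zero extension simply pads the 16-bit immediate with sixteen 0 bits on the left. The Python sketch below shows it next to sign extension, which the load and store slides use; the function names and the `ext_op` encoding (0 for zero extension, 1 for sign extension) are assumptions made for this example.

```python
def zero_ext(imm16):
    """Upper 16 bits become 0; lower 16 bits are the immediate."""
    return imm16 & 0xFFFF


def sign_ext(imm16):
    """Upper 16 bits become copies of bit 15 of the immediate."""
    imm16 &= 0xFFFF
    if imm16 & 0x8000:
        return imm16 | 0xFFFF0000
    return imm16


def extender(imm16, ext_op):
    """Hypothetical encoding: ext_op 0 = zero extend, 1 = sign extend."""
    return sign_ext(imm16) if ext_op else zero_ext(imm16)
```

In hardware the zero extender is just wiring: the 16 immediate bits pass through and the upper 16 lines are tied to 0.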
Load Operations
• R[rt] = Mem[R[rs] + SignExt[imm16]]
Example: lw rt,rs,imm16
• format: op (6 bits, 31-26) | rs (5 bits, 25-21) | rt (5 bits, 20-16) | immediate (16 bits, 15-0)
[Figure: the datapath so far: the RegDst mux selects Rd or Rt as Rw; the RegFile (RegWr, Ra = Rs, Rb = Rt, clk) drives busA and busB; the ALUSrc mux selects busB or ZeroExt(imm16) as the second ALU input; ALUctr controls the ALU]
Load Operations
• R[rt] = Mem[R[rs] + SignExt[imm16]]
Example: lw rt,rs,imm16
• format: op (6 bits, 31-26) | rs (5 bits, 25-21) | rt (5 bits, 20-16) | immediate (16 bits, 15-0)
[Figure: the Extender is now controlled by ExtOp; the ALU output is the address (Adr) of Data Memory (WrEn = MemWr, Data In marked “?”, clk); a mux controlled by MemtoReg selects the ALU output or the Data Memory output for busW]
Store Operations
• Mem[ R[rs] + SignExt[imm16] ] = R[rt]
Ex.: sw rt, rs, imm16
• format: op (6 bits, 31-26) | rs (5 bits, 25-21) | rt (5 bits, 20-16) | immediate (16 bits, 15-0)
[Figure: the load datapath with Data In of Data Memory connected; signals shown: RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemWr (WrEn), MemtoReg; busA, busB, busW; Extender, ALU, Data Memory (Adr, Data In, clk)]
The Branch Instruction
beq rs, rt, imm16
• format: op (6 bits, 31-26) | rs (5 bits, 25-21) | rt (5 bits, 20-16) | immediate (16 bits, 15-0)
• mem[PC]: Fetch the instruction from memory
• Equal = R[rs] == R[rt]: Calculate branch condition
• if (Equal): Calculate the next instruction’s address
- PC = PC + 4 + ( SignExt(imm16) x 4 )
else
- PC = PC + 4
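The next-address calculation for beq can be written out directly. In this Python sketch the function names are hypothetical, the immediate is treated as a signed 16-bit word offset, and the result is wrapped to 32 bits.

```python
MASK32 = 0xFFFFFFFF


def sign_ext_value(imm16):
    """Interpret a 16-bit immediate as a signed Python integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def next_pc(pc, equal, imm16):
    """PC = PC + 4 + (SignExt(imm16) x 4) if Equal, else PC = PC + 4."""
    pc_plus_4 = (pc + 4) & MASK32
    if equal:
        return (pc_plus_4 + sign_ext_value(imm16) * 4) & MASK32
    return pc_plus_4
```

The multiplication by 4 reflects that the immediate counts instructions (words), while the PC counts bytes; an immediate of 0xFFFF (that is, -1) makes a taken branch jump back to the branch itself.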
Datapath for Branch Operations
• beq rs, rt, imm16
Datapath generates condition (equal)
• format: op (6 bits, 31-26) | rs (5 bits, 25-21) | rt (5 bits, 20-16) | immediate (16 bits, 15-0)
• Already have mux, adder, need special sign
extender for PC, need equal compare (sub?)
[Figure: fetch side: PC (clk) supplies Inst Address; one Adder computes PC + 4; a second Adder adds the PC Ext of imm16 (with 00 appended); a Mux controlled by nPC_sel and Equal selects the next PC. Register side: RegFile (RegWr, Rw, Ra = Rs, Rb = Rt, busW, clk) drives busA and busB (32 bits each) into the ALU (ALUctr, =), which generates Equal]
Putting it All Together:A Single Cycle Datapath
[Figure: complete single-cycle datapath. Fetch: PC (clk), two Adders (+4 and PC Ext of imm16 with 00 appended), Mux controlled by nPC_sel and Equal, Inst Memory (Adr) producing Instruction<31:0> with fields Rs <21:25>, Rt <16:20>, Rd <11:15>, Imm16 <0:15>. Execution: RegDst mux (Rd/Rt), RegFile (RegWr, Rw, Ra, Rb, busW, busA, busB, clk), Extender (ExtOp), ALUSrc mux, ALU (ALUctr, =), Data Memory (MemWr as WrEn, Adr, Data In, clk), MemtoReg mux]
An Abstract View of the Implementation
[Figure: Control receives the Instruction and Conditions from the Datapath and sends Control Signals back. Datapath: Next Address logic and PC (clk) supply the Instruction Address to the Ideal Instruction Memory; the 32-bit Instruction supplies Rd, Rs, Rt (5 bits each) to Rw, Ra, Rb of the Register File (clk); its outputs A and B (32 bits each) feed the ALU; the ALU output is the Data Addr of the Ideal Data Memory (Data In, Data Out, clk)]
Summary: A Single Cycle Datapath
• We have everything except control signals
[Figure: the instr fetch unit (nPC_sel, zero, clk) produces Instruction<31:0> with fields Rs <21:25>, Rt <16:20>, Rd <11:15>, Imm16 <0:15>; RegDst mux (Rd/Rt), RegFile (RegWr, Rw, Ra, Rb, busW, busA, busB, clk), Extender (ExtOp), ALUSrc mux, ALU (ALUctr, =, zero), Data Memory (MemWr as WrEn, Adr, Data In, clk), MemtoReg mux]
“And In conclusion…”
• N-bit adder-subtractor done using N 1-bit adders with XOR gates on input
• XOR serves as conditional inverter
• CPU design involves Datapath, Control
• Datapath in MIPS involves 5 CPU stages
1. Instruction Fetch
2. Instruction Decode & Register Read
3. ALU (Execute)
4. Memory
5. Register Write
Bonus slides
• These are extra slides that used to be
included in lecture notes, but have
been moved to this, the “bonus” area
to serve as a supplement.
• The slides will appear in the order they
would have in the normal presentation
Review of Timing Terms
• Clock (CLK) - steady square wave that
synchronizes system
• Setup Time - when the input must be stable before
the rising edge of the CLK
• Hold Time - when the input must be stable after the
rising edge of the CLK
• “CLK-to-Q” Delay - how long it takes the output to
change, measured from the rising edge
• Flip-flop - one bit of state that samples every rising
edge of the CLK
• Register - several bits of state that sample on
rising edge of CLK or on LOAD
What about overflow?
• Consider a 2-bit signed # & overflow:
• 10 = -2 + -2 or -1
• 11 = -1 + -2 only
• 00 = 0 NOTHING!
• 01 = 1 + 1 only
• Highest adder
• C1 = Carry-in = Cin, C2 = Carry-out = Cout
• No Cout or Cin ⇒ NO overflow!
• Cin and Cout ⇒ NO overflow!
• Cin but no Cout ⇒ A,B both > 0, overflow!
• Cout but no Cin ⇒ A,B both < 0, overflow!
• What op?
What about overflow?
• Consider a 2-bit signed # & overflow:
• 10 = -2 + -2 or -1
• 11 = -1 + -2 only
• 00 = 0 NOTHING!
• 01 = 1 + 1 only
• Overflows when…
• Cin, but no Cout ⇒ A,B both > 0, overflow!
• Cout, but no Cin ⇒ A,B both < 0, overflow!
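Both overflow cases are exactly those in which the carry into the highest adder differs from the carry out of it, so the check reduces to Cin XOR Cout. The Python sketch below computes the two carries for n-bit two's complement addition; the function name and its return convention are illustrative assumptions.

```python
def add_with_overflow(a, b, n=2):
    """Add two n-bit two's complement bit patterns.

    Returns (n-bit result pattern, overflow flag), where the flag is
    Cin XOR Cout of the highest 1-bit adder.
    """
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    low_mask = (1 << (n - 1)) - 1
    # Carry into the highest adder: carry out of the lower n-1 bits
    cin = ((a & low_mask) + (b & low_mask)) >> (n - 1)
    total = a + b
    # Carry out of the highest adder
    cout = total >> n
    return total & mask, cin ^ cout
```

For the 2-bit case, 01 + 01 gives the pattern 10 with Cin = 1 and Cout = 0 (overflow), and 10 + 10 gives 00 with Cin = 0 and Cout = 1 (overflow), while 11 + 11 gives 10 with both carries set and no overflow.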
Extremely Clever Subtractor
Datapath Walkthroughs (2/3)
• slti $r3,$r1,17
• Stage 1: fetch this instruction, inc. PC
• Stage 2: decode to find it’s an slti, then
read register $r1
• Stage 3: compare value retrieved in Stage 2
with the integer 17
• Stage 4: idle
• Stage 5: write the result of Stage 3 in
register $r3
Example: slti Instruction
[Figure: slti r3, r1, 17 on the datapath: register 1 is read as reg[1], the ALU evaluates reg[1]<17? using imm 17, and the result is written to register 3; the PC advances by +4]
Clocking Methodology
• Storage elements clocked by same edge
• Being physical devices, flip-flops (FF) and
combinational logic have some delays
• Gates: delay from input change to output change
• Signals at FF D input must be stable before active clock
edge to allow signal to travel within the FF (set-up time),
and we have the usual clock-to-Q delay
• “Critical path” (longest path through logic)
determines length of clock period
An Abstract View of the Critical Path
(Assumes a fast controller)
Critical Path (Load Instruction) =
Delay clock through PC (FFs) +
Instruction Memory’s Access Time +
Register File’s Access Time +
ALU to Perform a 32-bit Add +
Data Memory Access Time +
Stable Time for Register File Write
[Figure: Next Address logic and PC (clk) supply the Instruction Address to the Ideal Instruction Memory; the 32-bit Instruction supplies Rd, Rs, Rt (5 bits each) to Rw, Ra, Rb of the Register File (clk); its outputs A and B (32 bits each) feed the ALU; the ALU output is the Data Addr of the Ideal Data Memory (Data In, clk)]