CS 61C:
Great Ideas in Computer Architecture
Single Cycle MIPS CPU—Part I
Instructors:
Krste Asanovic, Randy H. Katz
http://inst.eecs.Berkeley.edu/~cs61c/fa12
6/27/2016
Fall 2012 -- Lecture #25
You are Here!
Harness Parallelism & Achieve High Performance
Software
• Parallel Requests
Assigned to computer
e.g., Search “Katz”
• Parallel Threads
Assigned to core
e.g., Lookup, Ads
• Parallel Instructions
>1 instruction @ one time
e.g., 5 pipelined instructions
• Parallel Data
>1 data item @ one time
e.g., Add of 4 pairs of words
• Hardware descriptions
All gates @ one time
• Programming Languages
Hardware
[Figure: Warehouse Scale Computer and Smart Phone; Computer: Core … Core, Memory (Cache), Input/Output; Core: Instruction Unit(s), Functional Unit(s) computing A0+B0 A1+B1 A2+B2 A3+B3, Cache Memory; Logic Gates (marked “Today”)]
Levels of Representation/Interpretation
High Level Language Program (e.g., C)
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
Compiler
Assembly Language Program (e.g., MIPS)
    lw $t0, 0($2)
    lw $t1, 4($2)
    sw $t1, 0($2)
    sw $t0, 4($2)
Assembler
Machine Language Program (MIPS)
    0000 1001 1100 0110 1010 1111 0101 1000
    1010 1111 0101 1000 0000 1001 1100 0110
    1100 0110 1010 1111 0101 1000 0000 1001
    0101 1000 0000 1001 1100 0110 1010 1111
Anything can be represented as a number, i.e., data or instructions
Machine Interpretation
Hardware Architecture Description (e.g., block diagrams)
Architecture Implementation
Logic Circuit Description (Circuit Schematic Diagrams)
Review
• Hardware systems made from Stateless Combinational Logic and Stateful “Memory” Logic (Registers)
• Clocks tell us when D-flip-flops change
– Setup and Hold times important
• We pipeline long-delay CL for faster clock cycle
– Split up the critical path
• Finite State Machines extremely useful
• Use muxes to select among inputs
– S select bits choose among 2^S inputs
– Each input can be n bits wide, independent of S
• Can implement muxes hierarchically
• Can implement FSM with register + logic
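
The mux bullets above can be illustrated with a small software model. This is only a sketch: the function names `mux2` and `mux4` are made up for illustration, and Python integers stand in for n-bit buses. It shows S = 2 select bits choosing among 2^2 = 4 inputs, built hierarchically from 2:1 muxes.

```python
def mux2(select, in0, in1):
    """2:1 mux: one select bit chooses between two inputs of any width."""
    return in1 if select else in0


def mux4(s1, s0, a, b, c, d):
    """4:1 mux built as a two-level tree of 2:1 muxes.

    s0 picks within each pair (a/b and c/d); s1 picks between the pairs.
    """
    return mux2(s1, mux2(s0, a, b), mux2(s0, c, d))


# Two select bits enumerate all four inputs.
selected = [mux4(s1, s0, "a", "b", "c", "d") for s1 in (0, 1) for s0 in (0, 1)]
```

The width of the data inputs never appears in the select logic, which is the sense in which n is independent of S.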
Agenda
• MIPS-lite Datapath
• Administrivia
• CPU Timing
• MIPS-lite Control
Processor Design Process
• Five steps to design a processor:
1. Analyze instruction set → datapath requirements
2. Select set of datapath components & establish clock methodology
3. Assemble datapath meeting the requirements
4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer.
5. Assemble the control logic
• Formulate Logic Equations
• Design Circuits
[Figure: Processor (Control, Datapath), Memory, Input, Output]
The MIPS-lite Subset
• ADDU and SUBU
– addu rd,rs,rt
– subu rd,rs,rt
  op [31:26] (6 bits) | rs [25:21] (5 bits) | rt [20:16] (5 bits) | rd [15:11] (5 bits) | shamt [10:6] (5 bits) | funct [5:0] (6 bits)
• OR Immediate:
– ori rt,rs,imm16
  op [31:26] (6 bits) | rs [25:21] (5 bits) | rt [20:16] (5 bits) | immediate [15:0] (16 bits)
• LOAD and STORE Word
– lw rt,rs,imm16
– sw rt,rs,imm16
  op [31:26] (6 bits) | rs [25:21] (5 bits) | rt [20:16] (5 bits) | immediate [15:0] (16 bits)
• BRANCH:
– beq rs,rt,imm16
  op [31:26] (6 bits) | rs [25:21] (5 bits) | rt [20:16] (5 bits) | immediate [15:0] (16 bits)
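
To make the field layout above concrete, here is a minimal sketch that splits a 32-bit instruction word into its fields with shifts and masks. The function name `decode` and the dictionary layout are illustrative assumptions, not part of the lecture. The two sample words are hand-assembled encodings of `addu $3,$1,$2` and `ori $2,$1,0xFF`.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into its bit fields.

    Both the R-type fields (rd, shamt, funct) and the I-type field (imm16)
    are extracted; which ones are meaningful depends on the opcode.
    """
    return {
        "op":    (word >> 26) & 0x3F,   # bits 31..26
        "rs":    (word >> 21) & 0x1F,   # bits 25..21
        "rt":    (word >> 16) & 0x1F,   # bits 20..16
        "rd":    (word >> 11) & 0x1F,   # bits 15..11
        "shamt": (word >> 6) & 0x1F,    # bits 10..6
        "funct": word & 0x3F,           # bits 5..0
        "imm16": word & 0xFFFF,         # bits 15..0
    }


addu_fields = decode(0x00221821)  # addu $3, $1, $2
ori_fields = decode(0x342200FF)   # ori  $2, $1, 0xFF
```

Note that `rd`, `shamt`, and `funct` overlap `imm16`: the hardware always extracts all of them, and control decides which to use.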
Register Transfer Language (RTL)
• RTL gives the meaning of the instructions
• All start by fetching the instruction
{op, rs, rt, rd, shamt, funct} ← MEM[ PC ]
{op, rs, rt, Imm16} ← MEM[ PC ]
Inst     Register Transfers
ADDU     R[rd] ← R[rs] + R[rt]; PC ← PC + 4
SUBU     R[rd] ← R[rs] - R[rt]; PC ← PC + 4
ORI      R[rt] ← R[rs] | zero_ext(Imm16); PC ← PC + 4
LOAD     R[rt] ← MEM[ R[rs] + sign_ext(Imm16) ]; PC ← PC + 4
STORE    MEM[ R[rs] + sign_ext(Imm16) ] ← R[rt]; PC ← PC + 4
BEQ      if ( R[rs] == R[rt] )
         then PC ← PC + 4 + (sign_ext(Imm16) || 00)
         else PC ← PC + 4
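
The register transfers above can be read as a tiny interpreter. The sketch below is one possible software rendering, not the lecture's hardware: the `step` function, the dictionary form of an instruction, and the use of a Python dict for `MEM` are all assumptions made for illustration. It also does not model register $0 being hardwired to zero.

```python
MASK32 = 0xFFFFFFFF


def zero_ext(imm16):
    """Zero-extend a 16-bit immediate to 32 bits."""
    return imm16 & 0xFFFF


def sign_ext(imm16):
    """Sign-extend a 16-bit immediate, returned as a signed Python int."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def step(inst, R, MEM, PC):
    """Apply one instruction's register transfers and return the new PC.

    inst is a hypothetical dict such as
    {"name": "ADDU", "rs": 1, "rt": 2, "rd": 3}.
    R is a list of 32 register values; MEM maps byte addresses to words.
    """
    name, rs, rt = inst["name"], inst["rs"], inst["rt"]
    rd, imm = inst.get("rd", 0), inst.get("imm16", 0)
    if name == "ADDU":
        R[rd] = (R[rs] + R[rt]) & MASK32
    elif name == "SUBU":
        R[rd] = (R[rs] - R[rt]) & MASK32
    elif name == "ORI":
        R[rt] = R[rs] | zero_ext(imm)
    elif name == "LOAD":
        R[rt] = MEM.get((R[rs] + sign_ext(imm)) & MASK32, 0)
    elif name == "STORE":
        MEM[(R[rs] + sign_ext(imm)) & MASK32] = R[rt]
    elif name == "BEQ":
        if R[rs] == R[rt]:
            # (sign_ext(Imm16) || 00) is the immediate shifted left by 2.
            return (PC + 4 + (sign_ext(imm) << 2)) & MASK32
    else:
        raise ValueError("not in the MIPS-lite subset: " + name)
    return (PC + 4) & MASK32


R = [0] * 32
R[1], R[2] = 5, 7
MEM = {}
pc = step({"name": "ADDU", "rs": 1, "rt": 2, "rd": 3}, R, MEM, 0)
pc = step({"name": "STORE", "rs": 0, "rt": 3, "imm16": 16}, R, MEM, pc)
```

Every branch of `step` ends by producing a new PC, which mirrors the fact that each RTL line includes a PC update.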
Step 1: Requirements of the
Instruction Set
• Memory (MEM)
– Instructions & data (will use one for each: really caches)
• Registers (R: 32 x 32)
– Read rs
– Read rt
– Write rt or rd
• PC
• Extender (sign/zero extend)
• Add/Sub/OR unit for operation on register(s) or extended
immediate
• Add 4 (+ maybe extended immediate) to PC
• Compare if registers equal?
Generic Steps of Datapath
1. Instruction Fetch
2. Decode/Register Read
3. Execute
4. Memory
5. Register Write
[Figure: PC with +4 adder and mux, instruction memory, registers (rs, rt, rd), imm, ALU, Data memory]
Step 2: Components of the Datapath
• Combinational Elements
• State Elements + Clocking Methodology
• Building Blocks
– Adder: 32-bit inputs A and B, CarryIn; outputs 32-bit Sum and CarryOut
– Multiplexer (MUX): 32-bit inputs A and B, Select; 32-bit output Y
– ALU: 32-bit inputs A and B, OP; 32-bit output Result
ALU Needs for MIPS-lite + Rest of MIPS
• Addition, subtraction, logical OR, ==:
ADDU     R[rd] = R[rs] + R[rt]; ...
SUBU     R[rd] = R[rs] - R[rt]; ...
ORI      R[rt] = R[rs] | zero_ext(Imm16)...
BEQ      if ( R[rs] == R[rt] )...
• Test to see if output == 0 for any ALU operation gives == test. How?
• P&H also adds AND, Set Less Than (1 if A < B, 0 otherwise)
• ALU from Appendix C, section C.5
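
One common answer to the "How?" above is to subtract the two operands and check whether the result is zero. The sketch below models that idea in software; the function names and the string encoding of the operation are assumptions for illustration and do not correspond to the actual ALUctr encoding.

```python
MASK32 = 0xFFFFFFFF


def alu(op, a, b):
    """Return (result, zero) for op in {"add", "sub", "or"} on 32-bit values.

    zero is True exactly when the 32-bit result is 0.
    """
    if op == "add":
        result = (a + b) & MASK32
    elif op == "sub":
        result = (a - b) & MASK32
    elif op == "or":
        result = (a | b) & MASK32
    else:
        raise ValueError("unsupported ALU operation: " + op)
    return result, result == 0


def registers_equal(a, b):
    """A == B exactly when A - B == 0, so BEQ can reuse the subtractor."""
    _, zero = alu("sub", a, b)
    return zero
```

With this arrangement the equality test costs no extra arithmetic hardware beyond a zero detector on the ALU output.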
Storage Element: Idealized Memory
• Memory (idealized)
– One input bus: Data In (32 bits)
– One output bus: Data Out (32 bits)
• Memory word is found by:
– Address selects the word to put on Data Out
– Write Enable = 1: address selects the memory word to be written via the Data In bus
• Clock input (CLK)
– CLK input is a factor ONLY during write operation
– During read operation, behaves as a combinational logic block: Address valid ⇒ Data Out valid after “access time”
[Figure: memory block with Write Enable, Address, 32-bit Data In, Clk, and 32-bit DataOut]
Storage Element: Register (Building Block)
• Similar to D Flip Flop except
– N-bit input and output
– Write Enable input
• Write Enable:
– Negated (or deasserted) (0): Data Out will not change
– Asserted (1): Data Out will become Data In on rising edge of clock
[Figure: register with Write Enable, N-bit Data In, N-bit Data Out, and clk]
Storage Element: Register File
• Register File consists of 32 registers:
– Two 32-bit output busses: busA and busB
– One 32-bit input bus: busW
• Register is selected by:
– RA (number) selects the register to put on busA (data)
– RB (number) selects the register to put on busB (data)
– RW (number) selects the register to be written via busW (data) when Write Enable is 1
• Clock input (clk)
– Clk input is a factor ONLY during write operation
– During read operation, behaves as a combinational logic block:
• RA or RB valid ⇒ busA or busB valid after “access time.”
[Figure: 32 x 32-bit Registers block with 5-bit RW, RA, and RB inputs, Write Enable, Clk, 32-bit busW input, and 32-bit busA and busB outputs]
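
The register file behavior described above (combinational reads, clocked writes gated by Write Enable) can be modeled with a short class. This is a behavioral sketch only: the class name `RegisterFile` and the method names are hypothetical, and a method call stands in for the rising clock edge.

```python
class RegisterFile:
    """Behavioral model of a 32 x 32-bit register file."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, ra, rb):
        """Combinational read: returns (busA, busB) for register numbers RA, RB.

        No clock is involved; the outputs simply follow the addresses.
        """
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, write_enable, rw, bus_w):
        """Rising clock edge: write busW into register RW only if enabled."""
        if write_enable:
            self.regs[rw] = bus_w & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(write_enable=1, rw=3, bus_w=42)   # written
rf.clock_edge(write_enable=0, rw=4, bus_w=99)   # ignored: Write Enable is 0
bus_a, bus_b = rf.read(3, 4)
```

Separating `read` from `clock_edge` mirrors the slide's point that the clock matters only for writes.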
Step 3: Assemble DataPath Meeting Requirements
• Register Transfer Requirements ⇒ Datapath Assembly
• Instruction Fetch
• Read Operands and Execute Operation
• Common RTL operations
– Fetch the Instruction: mem[PC]
– Update the program counter:
• Sequential Code: PC ← PC + 4
• Branch and Jump: PC ← “something else”
[Figure: PC (clocked by clk) drives the Address input of Instruction Memory, which outputs the 32-bit Instruction Word; Next Address Logic computes the next PC]
Step 3: Add & Subtract
• R[rd] = R[rs] op R[rt] (addu rd,rs,rt)
– Ra, Rb, and Rw come from instruction’s Rs, Rt, and Rd fields
  op [31:26] (6 bits) | rs [25:21] (5 bits) | rt [20:16] (5 bits) | rd [15:11] (5 bits) | shamt [10:6] (5 bits) | funct [5:0] (6 bits)
– ALUctr and RegWr: control logic after decoding the instruction
• … Already defined the register file & ALU
[Figure: 32 x 32-bit Registers with 5-bit Rw, Ra, Rb driven by rd, rs, rt; RegWr and clk; 32-bit busA and busB feed the ALU (ALUctr); the 32-bit ALU Result returns on busW]
Clocking Methodology
[Figure: clock waveform (Clk)]
• Storage elements clocked by same edge
• “Critical path” (longest path through logic) determines length of clock period
• Have to allow for Clock-to-Q and Setup Times too
• This lecture (and P&H sections 4.3-4.4) do whole instruction in 1 clock cycle for pedagogic reasons
– Project 4 will do it in 2 clock cycles via simple pipelining
– Soon explain pipelining and use 5 clock cycles per instruction
Register-Register Timing: One Complete Cycle
[Timing diagram: after the rising edge of Clk, each signal changes from Old Value to New Value in turn]
– PC: New Value after Clk-to-Q
– Rs, Rt, Rd, Op, Func: New Value after Instruction Memory Access Time
– ALUctr, RegWr: New Value after Delay through Control Logic
– busA, B: New Value after Register File Access Time
– busW: New Value after ALU Delay
– Setup Time before the next rising edge: Register Write Occurs Here
[Figure: RegFile with 5-bit Rw, Ra, Rb driven by Rd, Rs, Rt; RegWr and clk; 32-bit busA and busB feed the ALU (ALUctr); the 32-bit result returns on busW]
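
The timing chain above suggests a simple estimate of the minimum clock period for a register-register instruction. The sketch below is a worked example under stated assumptions: all delay values are made-up numbers in picoseconds, and it assumes the control logic and the register file read both start once the instruction word is valid and therefore overlap.

```python
def min_clock_period(clk_to_q, imem_access, control_delay,
                     regfile_access, alu_delay, setup):
    """Estimate the shortest clock period for an R-type instruction.

    Control decode and register read are assumed to run in parallel,
    so only the slower of the two lies on the critical path.
    """
    return (clk_to_q + imem_access
            + max(control_delay, regfile_access)
            + alu_delay + setup)


# Hypothetical delays in picoseconds, chosen only for illustration.
period_ps = min_clock_period(clk_to_q=30, imem_access=250, control_delay=50,
                             regfile_access=150, alu_delay=200, setup=20)
# 30 + 250 + max(50, 150) + 200 + 20 = 650
```

In a single-cycle design the clock must fit the slowest instruction, so the same calculation would be repeated for loads, stores, and branches and the largest result used.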
Logical Operations with Immediate
• R[rt] = R[rs] op ZeroExt[imm16]
  Instruction: op [31:26] (6 bits) | rs [25:21] (5 bits) | rt [20:16] (5 bits) | immediate [15:0] (16 bits)
  ZeroExt[imm16]: 0000000000000000 [31:16] (16 bits) | immediate [15:0] (16 bits)
• But we’re writing to Rt register??
• And immediate ALU input??
[Figure: datapath so far: RegFile with 5-bit Rw, Ra, Rb driven by Rd, Rs, Rt; RegWr and clk; 32-bit busA and busB feed the ALU (ALUctr); the 32-bit result returns on busW]
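Logical Operations with Immediate
• R[rt] = R[rs] op ZeroExt[imm16]
  Instruction: op [31:26] (6 bits) | rs [25:21] (5 bits) | rt [20:16] (5 bits) | immediate [15:0] (16 bits)
  ZeroExt[imm16]: 0000000000000000 [31:16] (16 bits) | immediate [15:0] (16 bits)
• Already defined 32-bit MUX; Zero Ext?
[Figure: a 2:1 multiplexor controlled by RegDst selects rd (input 1) or rt (input 0) as Rw; RegFile with Ra, Rb driven by rs, rt, plus RegWr and clk; ZeroExt widens imm16 from 16 to 32 bits; a second 2:1 multiplexor controlled by ALUSrc selects busB or the zero-extended immediate as the second ALU input; ALU controlled by ALUctr]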
Load Operations
• R[rt] = Mem[R[rs] + SignExt[imm16]]
Example: lw rt,rs,imm16
  op [31:26] (6 bits) | rs [25:21] (5 bits) | rt [20:16] (5 bits) | immediate [15:0] (16 bits)
• What sign extending??
• And where is Mem??
[Figure: datapath from the previous slide: RegDst multiplexor (rd, rt) feeding Rw; RegFile with RegWr and clk; ZeroExt on imm16 (16 to 32 bits); ALUSrc multiplexor selecting busB or the extended immediate; ALU (ALUctr)]
Load Operations
• R[rt] = Mem[R[rs] + SignExt[imm16]]
Example: lw rt,rs,imm16
  op [31:26] (6 bits) | rs [25:21] (5 bits) | rt [20:16] (5 bits) | immediate [15:0] (16 bits)
[Figure: ZeroExt is replaced by an Extender controlled by ExtOp; the ALU output drives the Adr input of Data Memory (WrEn driven by MemWr, Data In, clk); a multiplexor controlled by MemtoReg selects the ALU result or the Data Memory output for busW; the source of Data In is marked “?”; RegDst, RegWr, ALUSrc, and ALUctr as before]
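
The load datapath above can be traced in software: extend the immediate, add it to R[rs] in the ALU, and use the sum as the data memory address. The sketch below is an illustrative model only. The function names are hypothetical, a Python dict stands in for Data Memory, and the polarity of `ext_op` (1 = sign-extend, 0 = zero-extend) is an assumption rather than something stated on the slide.

```python
MASK32 = 0xFFFFFFFF


def extender(imm16, ext_op):
    """Extend a 16-bit immediate to 32 bits.

    ext_op = 1 sign-extends and ext_op = 0 zero-extends
    (this polarity is an assumption made for the sketch).
    """
    imm16 &= 0xFFFF
    if ext_op and (imm16 & 0x8000):
        return imm16 | 0xFFFF0000
    return imm16


def load_word(R, data_mem, rs, rt, imm16):
    """Model lw rt,rs,imm16: R[rt] = Mem[R[rs] + SignExt[imm16]]."""
    bus_a = R[rs]                                 # register read
    operand_b = extender(imm16, ext_op=1)         # ALUSrc picks the immediate
    address = (bus_a + operand_b) & MASK32        # ALU performs the add
    R[rt] = data_mem.get(address, 0)              # MemtoReg picks memory data
    return address


R = [0] * 32
R[1] = 0x1004
data_mem = {0x1000: 1234}
addr = load_word(R, data_mem, rs=1, rt=2, imm16=0xFFFC)  # offset of -4
```

Adding the sign-extended value modulo 2^32 gives the same address as subtracting 4, which is why a single adder serves both positive and negative offsets.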
And in Conclusion, … Single-Cycle Processor
• Five steps to design a processor:
1. Analyze instruction set → datapath requirements
2. Select set of datapath components & establish clock methodology
3. Assemble datapath meeting the requirements
4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer.
5. Assemble the control logic
• Formulate Logic Equations
• Design Circuits
[Figure: Processor (Control, Datapath), Memory, Input, Output]