
CprE 381 Computer Organization and Assembly
Level Programming, Fall 2013
Chapter 4
The Processor
Zhao Zhang
Iowa State University
Revised from original slides provided
by MKP
Week 8 Overview




CPU design overview
Datapath and Control
Control Unit
ALU Control Unit
Chapter 1 — Computer Abstractions and Technology — 2
Announcements

Mini-project B starts in week 9


Mini-projects B and C will be revised
The grading scale will be discussed by
Friday (week 8)
§4.1 Introduction
Introduction

CPU performance factors
Instruction count: determined by ISA and compiler
CPI and cycle time: determined by CPU hardware
We will examine two MIPS implementations
A simplified, single-cycle version
A more realistic, pipelined version
Chapter 4 — The Processor — 4
Nine-Instruction MIPS


We will first use a MIPS subset of nine instructions, then extend the subset
It’s enough to illustrate most aspects of CPU design, particularly datapath and control design
Memory reference: LW and SW
Arithmetic/logic: ADD, SUB, AND, OR, SLT
Branch: BEQ, BNE
Instruction Execution


PC → instruction memory: fetch instruction
Register numbers → register file: read registers
Then, depending on instruction class
Execute: use ALU to calculate
Arithmetic result
Memory address for load/store
Branch target address
Memory access: access data memory for load/store
Register writeback: write data back to registers
PC update (for all): PC ← target address or PC + 4
CPU Overview
A Sketchy view
Next Sequential PC
= PC + 4
Branch Target
= (PC+4)+offset
An instruction may change
1. PC (all instructions)
2. Some register (arithmetic/logic, load)
3. Some memory word/halfword/byte (store)
Multiplexers

Can’t just join
wires together

Use multiplexers
What would happen if you just join signals in VHDL?
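As a minimal sketch of the multiplexer idea in VHDL: a 2-to-1 selection can be written as a single conditional assignment, so the output has exactly one driver. The entity and port names (mux2, i0, i1, s, y) and the 32-bit width are illustrative assumptions, not names from the course files. If two sources were instead assigned to the same std_logic signal, that signal would have two drivers; in simulation, conflicting values resolve to 'X', and synthesis tools typically reject the design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 32-bit 2-to-1 multiplexer; all names are illustrative.
entity mux2 is
  port (i0 : in  std_logic_vector(31 downto 0);  -- selected when s = '0'
        i1 : in  std_logic_vector(31 downto 0);  -- selected when s = '1'
        s  : in  std_logic;
        y  : out std_logic_vector(31 downto 0));
end mux2;

architecture dataflow of mux2 is
begin
  -- Single driver for y: the select input picks the source (Y = S ? I1 : I0).
  y <= i1 when s = '1' else i0;
end dataflow;
```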
Control
Control signals: mux
select, read/write enable,
ALU opcode, etc.

§4.2 Logic Design Conventions
Logic Design Basics
Combinational element
Operate on data
Output is a function of input
State (sequential) elements
Store information
Output is a function of internal state and input
Combinational Elements

AND-gate: Y = A & B
Adder: Y = A + B
Multiplexer: Y = S ? I1 : I0
Arithmetic/Logic Unit: Y = F(A, B)
Sequential Elements

Register: stores data in a circuit



Uses a clock signal to determine when to
update the stored value
Edge-triggered: update when Clk changes
from 0 to 1
Data output Q is stable for a clock cycle
Sequential Elements

Register with write control

Only updates on clock edge when write
control input is 1


VHDL: rising_edge(Clk) AND Write
Used when stored value is required later
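A register with write control might be sketched in VHDL as follows. The entity name reg_we, the port name we (standing in for the Write input on the slide), and the 32-bit width are assumptions for illustration. The condition rising_edge(Clk) AND Write is expressed here as a nested if, which is a common way to write a clock enable.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 32-bit register with a write-enable input.
entity reg_we is
  port (clk : in  std_logic;
        we  : in  std_logic;                      -- write control
        d   : in  std_logic_vector(31 downto 0);
        q   : out std_logic_vector(31 downto 0));
end reg_we;

architecture behavior of reg_we is
begin
  process (clk)
  begin
    -- Update only on a 0-to-1 clock edge, and only when we is '1';
    -- otherwise q keeps its stored value.
    if rising_edge(clk) then
      if we = '1' then
        q <= d;
      end if;
    end if;
  end process;
end behavior;
```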
Clocking Methodology

Combinational logic transforms data during
clock cycles


Input from state elements
Output must stabilize within one cycle


Longest delay determines clock period
Output to state element at the next rising edge
Clocking Methodology

Processor is a big state machine

Works like a Moore machine in non-I/O phase


Output is a function of the state
States include PC, all registers and memory
contents

§4.3 Building a Datapath
Building a Datapath
Datapath elements
Elements that process data and addresses in the CPU
Registers, ALUs, muxes, memories, …
We will build a MIPS datapath incrementally
Refining the overview design
Instruction Fetch
PC: 32-bit register
Adder: increment by 4 for next instruction
Datapath elements: PC register, instruction memory, 32-bit adder
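The sequential part of instruction fetch (PC register plus the adder that computes PC + 4) could be sketched as below. This is a simplified illustration only: the entity name pc_unit and the synchronous reset input are assumptions, the instruction memory is omitted, and branches and jumps are not handled.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical PC register with a +4 adder (sequential fetch only).
entity pc_unit is
  port (clk   : in  std_logic;
        reset : in  std_logic;                      -- assumed synchronous reset
        pc    : out std_logic_vector(31 downto 0));
end pc_unit;

architecture behavior of pc_unit is
  signal pc_reg : unsigned(31 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        pc_reg <= (others => '0');
      else
        pc_reg <= pc_reg + 4;   -- next sequential PC = PC + 4
      end if;
    end if;
  end process;

  pc <= std_logic_vector(pc_reg);
end behavior;
```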
R-Format Instructions



Read two register operands
Perform arithmetic/logical operation
Write register result
Datapath elements: Register file, ALU
Load/Store Instructions


Read register operands
Calculate address using 16-bit offset



Use ALU, but sign-extend offset
Load: Read memory and update register
Store: Write register value to memory
Datapath elements: Data memory, sign extender
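The sign extender used for the 16-bit offset can be sketched with resize from ieee.numeric_std, which replicates the sign bit when applied to a signed value. The entity and port names (sign_ext, imm16, imm32) are illustrative assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 16-to-32-bit sign extender.
entity sign_ext is
  port (imm16 : in  std_logic_vector(15 downto 0);
        imm32 : out std_logic_vector(31 downto 0));
end sign_ext;

architecture dataflow of sign_ext is
begin
  -- resize on a signed operand copies bit 15 into bits 31 downto 16.
  imm32 <= std_logic_vector(resize(signed(imm16), 32));
end dataflow;
```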
Branch Instructions


Read register operands
Compare operands


Use ALU, subtract and check Zero output
Calculate target address



Sign-extend displacement
Shift left 2 places (word displacement)
Add to PC + 4

Already calculated by instruction fetch
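The target address calculation above (sign-extend, shift left by 2, add to PC + 4) might look like this in VHDL. The entity and port names are assumptions for illustration; pc_plus4 is taken as an input because it is already produced by instruction fetch.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical branch target adder: target = (PC + 4) + (SignExt(offset) << 2).
entity branch_target is
  port (pc_plus4 : in  std_logic_vector(31 downto 0);
        offset16 : in  std_logic_vector(15 downto 0);
        target   : out std_logic_vector(31 downto 0));
end branch_target;

architecture dataflow of branch_target is
  signal ext : signed(31 downto 0);
begin
  -- Sign-extend the 16-bit word displacement to 32 bits.
  ext <= resize(signed(offset16), 32);
  -- Shift left 2 places to get a byte displacement, then add to PC + 4.
  target <= std_logic_vector(signed(pc_plus4) + shift_left(ext, 2));
end dataflow;
```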
Branch Instructions
Shift left 2: just re-routes wires
New: shifter, 2nd 32-bit adder
Sign-extend: sign-bit wire replicated
Composing the Elements

First-cut data path does an instruction in
one clock cycle



Each datapath element can only do one
function at a time
Hence, we need separate instruction and data
memories
Use multiplexers where alternate data
sources are used for different instructions
R-Type/Load/Store Datapath
Full Datapath
Performance Issues

Longest delay determines clock period



Critical path: load instruction
Instruction memory → register file → ALU → data memory → register file
Not every instruction requires the same time
Performance Issues





Some instructions may take substantially longer
time, e.g. multiply/division
Not feasible to vary clock cycle for different
instructions
Must use the worst-case delay as the clock cycle
Violates the design principle of making the common case fast
We will improve performance by pipelining

§4.4 A Simple Implementation Scheme
ALU Control
ALU used for
Load/Store: F = add
Branch: F = subtract
R-type: F depends on funct field

ALU control  Function
0000         AND
0001         OR
0010         add
0110         subtract
0111         set-on-less-than
1100         NOR
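To make the ALU control codes listed above concrete, here is a behavioral sketch of an ALU that decodes them. The entity and port names are illustrative assumptions, overflow is ignored, and unlisted codes produce zero (also an assumption). This is a high-level model, not the detailed structural ALU built in the labs.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical behavioral ALU decoding the 4-bit ALU control codes.
entity alu is
  port (a, b : in  std_logic_vector(31 downto 0);
        ctrl : in  std_logic_vector(3 downto 0);
        y    : out std_logic_vector(31 downto 0);
        zero : out std_logic);
end alu;

architecture behavior of alu is
  signal result : std_logic_vector(31 downto 0);
begin
  process (a, b, ctrl)
  begin
    case ctrl is
      when "0000" => result <= a and b;
      when "0001" => result <= a or b;
      when "0010" => result <= std_logic_vector(signed(a) + signed(b));
      when "0110" => result <= std_logic_vector(signed(a) - signed(b));
      when "0111" =>                      -- set-on-less-than (signed compare)
        if signed(a) < signed(b) then
          result <= x"00000001";
        else
          result <= x"00000000";
        end if;
      when "1100" => result <= a nor b;
      when others => result <= (others => '0');  -- assumption: unused codes give 0
    end case;
  end process;

  y    <= result;
  -- Zero output, used by branches after a subtract.
  zero <= '1' when result = x"00000000" else '0';
end behavior;
```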
ALU Control

Assume 2-bit ALUOp derived from opcode

Combinational logic derives ALU control
opcode  ALUOp  Operation         funct   ALU function      ALU control
lw      00     load word         XXXXXX  add               0010
sw      00     store word        XXXXXX  add               0010
beq     01     branch equal      XXXXXX  subtract          0110
R-type  10     add               100000  add               0010
R-type  10     subtract          100010  subtract          0110
R-type  10     AND               100100  AND               0000
R-type  10     OR                100101  OR                0001
R-type  10     set-on-less-than  101010  set-on-less-than  0111
VHDL Notes

How to program the ALU control?
-- Behavior style
process (alu_op, funct)
begin
  case alu_op is
    when "00" =>
      alu_code <= "0010";
    when "01" =>
      …
  end case;
end process;
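A fuller sketch of the same behavioral style, filled in from the ALU control truth table, could look like the following. The entity wrapper, the use of std_logic_vector ports, and the default code chosen for unlisted cases are assumptions; only the signal names alu_op, funct, and alu_code come from the fragment above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical ALU control unit following the ALUOp/funct truth table.
entity alu_control is
  port (alu_op   : in  std_logic_vector(1 downto 0);
        funct    : in  std_logic_vector(5 downto 0);
        alu_code : out std_logic_vector(3 downto 0));
end alu_control;

architecture behavior of alu_control is
begin
  process (alu_op, funct)
  begin
    case alu_op is
      when "00" =>                       -- lw/sw: add
        alu_code <= "0010";
      when "01" =>                       -- beq: subtract
        alu_code <= "0110";
      when "10" =>                       -- R-type: decode funct
        case funct is
          when "100000" => alu_code <= "0010";  -- add
          when "100010" => alu_code <= "0110";  -- subtract
          when "100100" => alu_code <= "0000";  -- AND
          when "100101" => alu_code <= "0001";  -- OR
          when "101010" => alu_code <= "0111";  -- set-on-less-than
          when others   => alu_code <= "0000";  -- assumption: default code
        end case;
      when others =>
        alu_code <= "0000";              -- assumption: default code
    end case;
  end process;
end behavior;
```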
The Main Control Unit

Control signals derived from instruction
R-type:     opcode = 0 [31:26] | rs [25:21] | rt [20:16] | rd [15:11] | shamt [10:6] | funct [5:0]
Load/Store: opcode = 35 or 43 [31:26] | rs [25:21] | rt [20:16] | address [15:0]
Branch:     opcode = 4 [31:26] | rs [25:21] | rt [20:16] | address [15:0]
opcode: instruction bits 31:26
rs: always read
rt: read, except for load
rd (R-type) or rt (load): write for R-type and load
address: sign-extend and add
Datapath With Control
Summary of Control Signals








RegDst: Write to register rt or rd?
ALUSrc: Immediate to ALU?
MemtoReg: Write memory or ALU output?
RegWrite: Write to regfile at all?
MemRead: Read from Data Memory?
MemWrite: Write to the Data Memory?
Branch: Is it a branch instruction?
ALUOp[1:0]: ALU control field
R-Type Instruction
R-Type: Control Signals
RegDst      1 (write to rd)
ALUSrc      0 (no immediate)
MemtoReg    0 (write data not from memory)
RegWrite    1 (does write regfile)
MemRead     0 (no memory read)
MemWrite    0 (no memory write)
Branch      0 (not a branch)
ALUOp[1:0]  10 (R-type ALU op)
Load Instruction
Load: Control Signals
RegDst      0
ALUSrc      1
MemtoReg    1
RegWrite    1
MemRead     1
MemWrite    0
Branch      0
ALUOp[1:0]  00
Store: Control Signals
RegDst      X
ALUSrc      1
MemtoReg    X
RegWrite    0
MemRead     0
MemWrite    1
Branch      0
ALUOp[1:0]  00
Branch-on-Equal Instruction
BEQ: Control Signals
RegDst      X
ALUSrc      0
MemtoReg    X
RegWrite    0
MemRead     0
MemWrite    0
Branch      1
ALUOp[1:0]  01
Control Signal Setting

What’re the control signal values for each
instruction or instruction type?
Inst  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
R-    1       0       0         1         0        0         0       1       0
lw    0       1       1         1         1        0         0       0       0
sw    X       1       X         0         0        1         0       0       0
beq   X       0       X         0         0        0         1       0       1
Note: “R-” means R-format
VHDL Notes

How to program the control?
entity control is
  port (op_code    : in  m32_6bits;
        reg_dst    : out m32_1bit;
        alu_src    : out m32_1bit;
        mem_to_reg : out m32_1bit;
        reg_write  : out m32_1bit;
        mem_read   : out m32_1bit;
        mem_write  : out m32_1bit;
        branch     : out m32_1bit;
        alu_op     : out m32_2bits);
end control;
VHDL Notes
architecture rom of control is
  subtype code_t is m32_vector(8 downto 0);
  type rom_t is array (0 to 63) of code_t;
  -- The ROM content for control signals
  signal rom : rom_t := (
    00 => "100100010",  -- R-type
    35 => "011110000",  -- LW
    …                   -- More for other instructions
    others => "000000000");
begin
  (reg_dst, alu_src, mem_to_reg, reg_write, mem_read,
   mem_write, branch, alu_op(1), alu_op(0))
    <= rom(to_integer(unsigned(op_code)));
end rom;
Implementing Jumps
Jump: opcode = 2 [31:26] | address [25:0]
Jump uses word address
Update PC with concatenation of
Top 4 bits of old PC
26-bit jump address
00
Need an extra control signal decoded from opcode
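The concatenation that forms the jump target can be sketched in one line of VHDL. The entity and port names are illustrative assumptions. The input is named pc_plus4 on the assumption that, as in the textbook datapath, the upper 4 bits are taken from PC + 4; substitute the PC itself if your design differs.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical jump target builder: upper 4 PC bits & 26-bit address & "00".
entity jump_addr is
  port (pc_plus4    : in  std_logic_vector(31 downto 0);
        instr_addr  : in  std_logic_vector(25 downto 0);  -- Instruction[25:0]
        jump_target : out std_logic_vector(31 downto 0));
end jump_addr;

architecture dataflow of jump_addr is
begin
  -- 4 bits + 26 bits + 2 bits = 32 bits; appending "00" turns the
  -- word address into a byte address.
  jump_target <= pc_plus4(31 downto 28) & instr_addr & "00";
end dataflow;
```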
Datapath With Jumps Added
Grading Scale
Tentative grading scale
A: 90, A-: 87
B+: 84, B: 80, B-: 75
C+: 70, C: 65, C-: 60
D: 50
There will be a bonus in lab projects
Mini-Project B, Tentative
Implement single-cycle processor (SCP).
There will be three parts
1. Part 1, SCPv1: Implement the nine-instruction ISA plus the J instruction
2. Part 2, SCPv2a: Support all the
instructions needed to run bubble sorting
3. Part 3, SCPv2b: Detailed modeling of
data elements
Mini-Project B

Bonus part, SCPv3: Support all integer
instructions on the green sheet, due in the
last lab

Some support files will be provided


High-level modeling of Register File, ALU,
Adder, to be used in Parts 1 and 2
Partial sample VHDL code will be provided
Mini-Project B



The CPU composition must be strongly
structural
Parts 1 and 2 may use behavior/dataflow
modeling for data elements
Part 3 must use detailed modeling for data
elements – Reuse your VHDL code in the
labs
Extend Single-Cycle MIPS
Consider the following instructions
addi: add immediate
sll: shift left logical by a constant
bne: branch if not equal
jal: jump and link
jr: jump register
SCPv0: R-Format, LW/SW, BEQ
SCPv1: R-Format, LW/SW, BEQ, J
SCPv1: Control Signals

What’re the control signal values for each
instruction or instruction type?
Inst  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0  Jump
R-    1       0       0         1         0        0         0       1       0       0
lw    0       1       1         1         1        0         0       0       0       0
sw    X       1       X         0         0        1         0       0       0       0
beq   X       0       X         0         0        0         1       0       1       0
j     X       X       X         0         0        0         0       X       X       1
Note: “R-” means R-format
Extend the Single-Cycle Processor
For each instruction, do we need
1. Any new or revised datapath element(s)?
2. Any new control signal(s)?
Then revise, if necessary,
1. Datapath: Add new elements or revise
existing ones, add new connections
2. Control Unit: Add/extend control signals,
extend the truth table
3. ALU Control: Extend the truth table
SCPv0 + ADDI
addi rt, rs, immediate
opcode = 001000 [31:26] | rs [25:21] | rt [20:16] | immediate [15:0]
R[rt] = R[rs] + SignExtImm
Read register operands (only one is used)
Sign extend the immediate (in parallel)
Perform arithmetic/logical operation
Write register result
SCPv0 + ADDI
What changes to this
baseline?
SCPv0 + ADDI
Do we need new or revised
datapath elements?
SCPv0 + ADDI


Do we need new or revised datapath
elements?
Do we need new control signal(s)?
Inst  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
R-    1       0       0         1         0        0         0       1       0
lw    0       1       1         1         1        0         0       0       0
sw    X       1       X         0         0        1         0       0       0
beq   X       0       X         0         0        0         1       0       1
addi
SCPv0 + ADDI

Like LW
I-format instruction
Write to register[rt]
Use add operation
Like R-format arithmetic
Write ALU result to register file

Inst  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
R-    1       0       0         1         0        0         0       1       0
lw    0       1       1         1         1        0         0       0       0
sw    X       1       X         0         0        1         0       0       0
beq   X       0       X         0         0        0         1       0       1
addi  0       1       0         1         0        0         0       0       0
SCPv0 + SLL
sll rd, rt, shamt
opcode = 000000 [31:26] | rs [25:21] | rt [20:16] | rd [15:11] | shamt [10:6] | funct = 000000 [5:0]
R[rd] = R[rt] << shamt
Read register operands (only one is used)
Perform shift operation
Write register result
Note: sllv rd, rt, rs for shift left logical variable
SCPv0 + SLL
What changes to the
datapath elements?
SCPv0 + SLL
ALU needs to do the
shift operation
ALU 1st input needs another
source: shamt extended to
32-bit
SCPv0 + SLL

Add another source to the 1st input of ALU
Shamt: Instruction[10-6]
Add a Mux and ALUSrc1 control line
0: R[rs]
1: Shamt (zero-extended)
Rename ALUSrc to ALUSrc2
Extend ALU control
Add an ALU control code for SLL
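The added mux on the first ALU input might be sketched as follows. The entity and port names (alu_src1_mux, rs_data, alu_in1) are illustrative assumptions. The 5-bit shamt is extended with zeros here, on the assumption that the shift amount is treated as an unsigned value.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical mux for the ALU's first input, controlled by ALUSrc1.
entity alu_src1_mux is
  port (rs_data  : in  std_logic_vector(31 downto 0);  -- R[rs]
        shamt    : in  std_logic_vector(4 downto 0);   -- Instruction[10-6]
        alu_src1 : in  std_logic;
        alu_in1  : out std_logic_vector(31 downto 0));
end alu_src1_mux;

architecture dataflow of alu_src1_mux is
begin
  -- 0: R[rs]; 1: shamt extended to 32 bits (zero-extended, see note above).
  alu_in1 <= std_logic_vector(resize(unsigned(shamt), 32)) when alu_src1 = '1'
             else rs_data;
end dataflow;
```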
SCPv0 + SLL

Extend ALU control: Choose a code of
your choice (kkkk shown in the table)
opcode  ALUOp  Operation         funct   ALU function      ALU control
lw      00     load word         XXXXXX  add               0010
sw      00     store word        XXXXXX  add               0010
beq     01     branch equal      XXXXXX  subtract          0110
R-type  10     add               100000  add               0010
R-type  10     subtract          100010  subtract          0110
R-type  10     AND               100100  AND               0000
R-type  10     OR                100101  OR                0001
R-type  10     set-on-less-than  101010  set-on-less-than  0111
R-type  10     shift-left-logic  000000  shift-left-logic  kkkk
SCPv0 + SLL
Original control signals
Inst  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp
R-    1       0       0         1         0        0         0       10
lw    0       1       1         1         1        0         0       00
sw    X       1       X         0         0        1         0       00
beq   X       0       X         0         0        0         1       01

Revised control signals with ALUSrc1 and ALUSrc2
Inst  RegDst  ALUSrc1  ALUSrc2  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp
R-    1       0        0        0         1         0        0         0       10
lw    0       0        1        1         1         1        0         0       00
sw    X       0        1        X         0         0        1         0       00
beq   X       0        0        X         0         0        0         1       01
sll   1       1        0        0         1         0        0         0       10
SCPv0 + BNE
bne rs, rt, label
opcode = 000101 [31:26] | rs [25:21] | rt [20:16] | offset [15:0]
PC = (R[rs] != R[rt]) ? PC+4+(SignExtImm<<2) : PC+4
Read register operands
Compare operands
Use ALU, subtract and check Zero output
Calculate target address
Sign-extend displacement
Shift left 2 places (word displacement)
Add to PC + 4
Already calculated by instruction fetch
SCPv0 + BNE
Make what changes to
the datapath?
SCPv0 + BNE

Extend Branch to two bits
10: Branch-Equal
11: Branch-Not-Equal
Replace the AND gate with the following logic
Can use a different truth table

Branch     Zero  Branch taken?
10         1     1
11         0     1
otherwise        0
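The branch-taken truth table above can be sketched as a single conditional assignment. The entity and port names are illustrative assumptions; the encoding (10 for branch-equal, 11 for branch-not-equal) follows the slide.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical branch decision logic replacing the single AND gate.
entity branch_logic is
  port (branch : in  std_logic_vector(1 downto 0);  -- 10: beq, 11: bne
        zero   : in  std_logic;                     -- ALU Zero output
        taken  : out std_logic);
end branch_logic;

architecture dataflow of branch_logic is
begin
  -- beq is taken when the operands are equal (Zero = 1);
  -- bne is taken when they differ (Zero = 0); otherwise not taken.
  taken <= '1' when (branch = "10" and zero = '1')
                 or (branch = "11" and zero = '0')
           else '0';
end dataflow;
```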
SCPv0 + BNE
Inst  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp
R-    1       0       0         1         0        0         0       10
lw    0       1       1         1         1        0         0       00
sw    X       1       X         0         0        1         0       00
beq   X       0       X         0         0        0         1       01
bne

Inst  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp
R-    1       0       0         1         0        0         00      10
lw    0       1       1         1         1        0         00      00
sw    X       1       X         0         0        1         00      00
beq   X       0       X         0         0        0         10      01
bne   X       0       X         0         0        0         11      01
SCPv1 + JAL
jal target
opcode = 000011 [31:26] | address [25:0]
PC = JumpAddr
R[31] = PC+4
Jump uses word address
Update PC with JumpAddr: concatenation of top 4 bits of old PC, 26-bit jump address, and 00 (called pseudo-direct)
Save PC+4 to $ra
SCPv1 + JAL
Make what changes to
the datapath?