add

advertisement
CSCI-365
Computer Organization
Lecture
Note: Some slides and/or pictures in the following are adapted from:
Computer Organization and Design, Patterson & Hennessy, ©2005
The CPU
• Processor (CPU): the active part of the
computer, which does all the work (data
manipulation and decision-making)
– Datapath: portion of the processor which contains
hardware necessary to perform operations required
by the processor (the brawn)
– Control: portion of the processor (also in hardware)
which tells the datapath what needs to be done (the
brain)
The CPU
• Our implementation of the MIPS is simplified
– memory-reference instructions: lw, sw
– arithmetic-logical instructions: add, sub, and, or, slt
– control flow instructions: beq, j
• Generic implementation
– use the program counter (PC) to supply the
instruction address and fetch the instruction
from memory (and update the PC)
Fetch
PC = PC+4
Exec
Decode
– decode the instruction (and read registers)
– execute the instruction
• All instructions (except j) use the ALU after reading the registers
A Small Set of Instructions
31
R
I
op
25
rs
20
rt
15
rd
10
sh
fn
5
6 bits
5 bits
5 bits
5 bits
5 bits
6 bits
Opcode
Source 1
or base
Source 2
or dest’n
Destination
Unused
Opcode ext
imm
Operand / Offset, 16 bits
J
jta
Jump target address, 26 bits
inst
Instruction, 32 bits
MIPS instruction formats and naming of the various fields.
We will refer to this diagram later
Seven R-format ALU instructions (add, sub, slt, and, or, xor, nor)
Six I-format ALU instructions (lui, addi, slti, andi, ori, xori)
Two I-format memory access instructions (lw, sw)
Three I-format conditional branch instructions (bltz, beq, bne)
Four unconditional jump instructions (j, jr, jal, syscall)
0
The MIPS
Instruction Set
Copy
Arithmetic
Logic
Memory access
Control transfer
Instruction
Usage
Load upper immediate
lui
rt,imm
Add
add
rd,rs,rt
Subtract
sub
rd,rs,rt
Set less than
slt
rd,rs,rt
Add immediate
addi
rt,rs,imm
Set less than immediate
slti
rd,rs,imm
AND
and
rd,rs,rt
OR
or
rd,rs,rt
XOR
xor
rd,rs,rt
NOR
nor
rd,rs,rt
AND immediate
andi
rt,rs,imm
OR immediate
ori
rt,rs,imm
XOR immediate
xori
rt,rs,imm
Load word
lw
rt,imm(rs)
Store word
sw
rt,imm(rs)
Jump
j
L
Jump register
jr
rs
Branch less than 0
bltz
rs,L
Branch equal
beq
rs,rt,L
Branch not equal
bne
rs,rt,L
Jump and link
jal
L
System call
syscall
op fn
15
0
0
0
8
10
0
0
0
0
12
13
14
35
43
2
0
1
4
5
3
0
32
34
42
36
37
38
39
8
12
Fetching Instructions
• Fetching instructions involves
– reading the instruction from
Add
4
the Instruction Memory
Instruction
Memory
– updating the PC to hold the
address of the next instruction
– PC is updated every cycle, so
it does not need an explicit
write control signal
– Instruction Memory is read
every cycle, so it doesn’t need
an explicit read control signal
PC
Read
Address
Instruction
Decoding Instructions
• Decoding instructions involves
– sending the fetched instruction’s opcode and function
field bits to the control unit
– reading two values from the Register File
• Register File addresses are contained in the instruction
Control
Unit
Read Addr 1
Instruction
Read
Register
Read Addr 2 Data 1
File
Write Addr
Read
Write Data
Data 2
Executing R Format Operations
• R format operations (add,sub,slt,and,or)
31
R-type: op
25
rs
20
15
rt
rd
10
5
0
shamt funct
– perform the (op and funct) operation on values in rs and rt
– store the result back into the Register File (into location rd)
RegWrite
Instruction
Read Addr 1
Register Read
Read Addr 2 Data 1
File
Write Addr
Read
Write Data
ALU control
ALU
overflow
zero
Data 2
– The Register File is not written every cycle (e.g. sw), so we need
an explicit write control signal for the Register File
Executing Load and Store Operations
•
Load and store operations involve
– compute memory address by adding the base register (read from the Register
File during decode) to the 16-bit signed-extended offset field in the instruction
– store value (read from the Register File during decode) written to the Data
Memory
– load value, read from the Data Memory, written to the Register File
RegWrite
Instruction
ALU control
overflow
zero
Read Addr 1
Register Read
Read Addr 2 Data 1
File
Write Addr
Read
Write Data
16
Address
ALU
Data
Memory Read Data
Write Data
Data 2
Sign
Extend
MemWrite
MemRead
32
Executing Branch Operations
Add
4
Add
Shift
left 2
Branch
target
address
ALU control
PC
•
Branch operations involves
Instruction
– compare the operands read from
the Register File during decode for
equality (zero ALU output)
– compute the branch target address
by adding the updated PC to the 16bit signed-extended offset field in
the instr
Read Addr 1
Register Read
Read Addr 2 Data 1
File
Write Addr
Read
Write Data
16
Data 2
Sign
Extend
32
zero (to branch
control logic)
ALU
Jump and Branch Instructions
Unconditional jump and jump through register instructions
j
jr
$ra is the
symbolic
name for
reg. $31
(return
address)
verify
$ra
31
J
# go to mem loc named “verify”
# go to address that is in $ra;
# $ra may hold a return address
op
jump target address
25
0 0 0 0 1 0
0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 1
j=2
x x x x 0 0 0 0 0 0 1 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
From PC
(incremented)
op
31
R
Effective target address (32 bits)
25
rs
20
rt
15
rd
10
sh
5
fn
0
0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
ALU
instruction
Source
register
Unused
Unused
Unused
jr = 8
The jump instruction j of MiniMIPS is a J-type instruction which is shown
along with how its effective target address is obtained. The jump register
(jr) instruction is R-type, with its specified register often being $ra.
Executing Jump Operations
• Jump operation involves
– replace the lower 28 bits of the PC with the lower 26
bits of the fetched instruction shifted left by 2 bits
Add
4
4
Instruction
Memory
PC
Read
Address
Shift
left 2
Instruction
26
Jump
address
28
Creating a Single Datapath from the Parts
• Assemble the datapath segments and add control
lines and multiplexors as needed
• Single cycle design – fetch, decode and execute each
instructions in one clock cycle
– no datapath resource can be used more than once per
instruction, so some must be duplicated (e.g., separate
Instruction Memory and Data Memory, several adders)
– multiplexors needed at the input of shared elements with
control lines to do the selection
– write signals to control writing to the Register File and Data
Memory
• Cycle time is determined by length of the longest path
The Main Control Unit

Control signals derived from instruction
R-type
Load/
Store
Branch
0
rs
rt
rd
shamt
funct
31:26
25:21
20:16
15:11
10:6
5:0
35 or 43
rs
rt
address
31:26
25:21
20:16
15:0
4
rs
rt
address
31:26
25:21
20:16
15:0
opcode
always
read
read,
except
for load
write for
R-type
and load
sign-extend
and add
Single Cycle Datapath with Control Unit
0
Add
Add
Shift
left 2
4
ALUOp
1
PCSrc
Branch
MemRead
MemtoReg
MemWrite
Instr[31-26] Control
Unit
ALUSrc
RegWrite
RegDst
Instruction
Memory
PC
Read
Address
Instr[31-0]
ovf
Instr[25-21] Read Addr 1
Register Read
Instr[20-16] Read Addr 2 Data 1
File
0
Write Addr
Read
1
Instr[15
-11]
Instr[15-0]
Write Data
zero
ALU
Data
Memory Read Data
1
Write Data
0
0
Data 2
1
Sign
16 Extend
Address
32
Instr[5-0]
ALU
control
R-type Instruction Data/Control Flow
0
Add
Add
Shift
left 2
4
ALUOp
1
PCSrc
Branch
MemRead
MemtoReg
MemWrite
Instr[31-26] Control
Unit
ALUSrc
RegWrite
RegDst
Instruction
Memory
PC
Read
Instr[31-0]
Address
ovf
Instr[25-21] Read Addr 1
Register Read
Instr[20-16] Read Addr 2 Data 1
File
0
Write Addr
Read
1
Instr[15
-11]
Instr[5-0]
Write Data
zero
ALU
Data
Memory Read Data
1
Write Data
0
0
Data 2
1
Sign
16 Extend
Address
32
Instr[5-0]
ALU
control
Deriving the Control Signals
Control signals for the single-cycle MicroMIPS implementation.
Control signal
Reg
file
ALU
op
Data
cache
Next
addr
0
1
2
3
RegWrite
Don’t write
Write
RegDst
rt
rd
$31
MemtoReg
Data out
ALU out
IncrPC
ALUSrc
(rt )
imm
AddSub
Add
Subtract
LogicFn1, LogicFn0
AND
OR
XOR
NOR
FnClass1, FnClass0
lui
Set less
Arithmetic
Logic
MemRead
Don’t read
Read
MemWrite
Don’t write
Write
BrType1, BrType0
No branch
beq
bne
bltz
PCSrc
IncrPC
jta
(rs)
SysCallAddr
Load Word Instruction Data/Control Flow
0
Add
Shift
left 2
4
ALUOp
Instr[31-26]
Add
1
PCSrc
Branch
MemRead
MemtoReg
MemWrite
Control
Unit
ALUSrc
RegWrite
RegDst
ovf
Instr[25-21]
PC
Instruction
Memory
Read
Address Instr[31-0]
Read Addr 1
Register Read
Instr[20-16] Read Addr 2 Data 1
File
0
Write Addr
Read
1
Instr[15
-11]
Store Word
Instruction?
Instr[15-0]
Write Data
zero
ALU
Data
Memory Read Data
1
Write Data
0
0
Data 2
1
Sign
16 Extend
Address
32
Instr[5-0]
ALU
control
Deriving the Control Signals
Control signals for the single-cycle MicroMIPS implementation.
Control signal
Reg
file
ALU
op
Data
cache
Next
addr
0
1
2
3
RegWrite
Don’t write
Write
RegDst
rt
rd
$31
MemtoReg
Data out
ALU out
IncrPC
ALUSrc
(rt )
imm
AddSub
Add
Subtract
LogicFn1, LogicFn0
AND
OR
XOR
NOR
FnClass1, FnClass0
lui
Set less
Arithmetic
Logic
MemRead
Don’t read
Read
MemWrite
Don’t write
Write
BrType1, BrType0
No branch
beq
bne
bltz
PCSrc
IncrPC
jta
(rs)
SysCallAddr
Branch Instruction Data/Control Flow
0
Add
Add
Shift
left 2
4
ALUOp
1
PCSrc
Branch
MemRead
MemtoReg
MemWrite
Instr[31-26] Control
Unit
ALUSrc
RegWrite
RegDst
ovf
Instr[25-21]
Instruction
Memory
PC
Read
Instr[31-0]
Address
Read Addr 1
Register Read
Instr[20-16]
Read Addr 2 Data 1
File
0
Write Addr
Read
1
Instr[15
-11]
Instr[15-0]
Write Data
zero
ALU
Data
Memory Read Data
1
Write Data
0
0
Data 2
1
Sign
16 Extend
Address
32
Instr[5-0]
ALU
control
Deriving the Control Signals
Control signals for the single-cycle MicroMIPS implementation.
Control signal
Reg
file
ALU
op
Data
cache
Next
addr
0
1
2
3
RegWrite
Don’t write
Write
RegDst
rt
rd
$31
MemtoReg
Data out
ALU out
IncrPC
ALUSrc
(rt )
imm
AddSub
Add
Subtract
LogicFn1, LogicFn0
AND
OR
XOR
NOR
FnClass1, FnClass0
lui
Set less
Arithmetic
Logic
MemRead
Don’t read
Read
MemWrite
Don’t write
Write
BrType1, BrType0
No branch
beq
bne
bltz
PCSrc
IncrPC
jta
(rs)
SysCallAddr
Jump Operation?
0
Add
Add
Shift
left 2
4
ALUOp
1
PCSrc
Branch
MemRead
MemtoReg
MemWrite
Instr[31-26] Control
Unit
ALUSrc
RegWrite
RegDst
Instruction
Memory
PC
Read
Address
Instr[31-0]
ovf
Instr[25-21] Read Addr 1
Register Read
Instr[20-16] Read Addr 2 Data 1
File
0
Write Addr
Read
1
Instr[15
-11]
Instr[15-0]
Write Data
zero
ALU
Data
Memory Read Data
1
Write Data
0
0
Data 2
1
Sign
16 Extend
Address
32
Instr[5-0]
ALU
control
Adding the Jump Operation
Instr[25-0]
Shift
28
left 2
PC+4[31-28]
26
1
32
0
0
Add
Jump
ALUOp
Add
Shift
left 2
4
1
PCSrc
Branch
MemRead
MemtoReg
MemWrite
Instr[31-26] Control
Unit
ALUSrc
RegWrite
RegDst
Instruction
Memory
PC
Read
Instr[31-0]
Address
ovf
Instr[25-21] Read Addr 1
Register Read
Instr[20-16] Read Addr 2 Data 1
File
0
Write Addr
Read
1
Instr[15
-11]
Instr[15-0]
Write Data
zero
ALU
Data
Memory Read Data
1
Write Data
0
0
Data 2
1
Sign
16 Extend
Address
32
Instr[5-0]
ALU
control
Can the Control Unit set all the control signals based solely on the
opcode field of the instruction?
Instr[25-0]
Shift
left 2
26
1
28
32
0
PC+4[31-28]
Add
Jump
ALUOp
Instr[31-26]
Add
Shift
left 2
4
0
1
PCSrc
Branch
MemRead
MemtoReg
MemWrite
Control
Unit
ALUSrc
RegWrite
RegDst
Instruction
Memory
PC
Read
Instr[31-0]
Address
ovf
Instr[25-21] Read Addr 1
Register Read
Instr[20-16] Read Addr 2 Data 1
File
0
Write Addr
Read
1
Instr[15
-11]
Instr[15-0]
Write Data
zero
ALU
Data
Memory Read Data
1
Write Data
0
0
Data 2
1
Sign
16 Extend
Address
32
Instr[5-0]
ALU
control
0
1
1
0
1
00
01
10
11
00
01
10
0
0
FnClass
00
10
10
01
10
01
11
11
11
11
11
11
11
10
10
0
0
0
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
PCSrc
10 10
1
0
0
0
1
1
0
0
0
0
1
1
1
1
1
LogicFn
01
01
01
01
01
01
01
01
01
01
01
01
01
00
Add’Sub
00
01
01
01
00
00
01
01
01
01
00
00
00
00
BrType
1
1
1
1
1
1
1
1
1
1
1
1
1
1
0
0
0
0
0
0
1
0
DataW rite
001111
000000 100000
000000 100010
000000 101010
001000
001010
000000 100100
000000 100101
000000 100110
000000 100111
001100
001101
001110
100011
101011
000010
000000 001000
000001
000100
000101
000011
000000 001100
ALUSrc
fn
DataRead
Load upper immediate
Add
Subtract
Set less than
Add immediate
Set less than immediate
AND
OR
XOR
NOR
AND immediate
OR immediate
XOR immediate
Load word
Store word
Jump
Jump register
Branch on less than 0
Branch on equal
Branch on not equal
Jump and link
System call
op
RegInS rc
Settings
Instruction
RegDst
Signal
RegWrite
Control
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
00
01
10
00
00
00
01
11
11
01
10
00
Control Signal Generation
Auxiliary signals identifying instruction classes
arithInst = addInst  subInst  sltInst  addiInst  sltiInst
logicInst = andInst  orInst  xorInst  norInst  andiInst  oriInst  xoriInst
immInst = luiInst  addiInst  sltiInst  andiInst  oriInst  xoriInst
Example logic expressions for control signals
RegWrite = luiInst  arithInst  logicInst  lwInst  jalInst
ALUSrc = immInst  lwInst  swInst
addInst
subInst
jInst
AddSub = subInst  sltInst  sltiInst
DataRead = lwInst
PCSrc0 = jInst  jalInst  syscallInst
.
.
.
Control
.
.
.
sltInst
13.6 Performance of the Single-Cycle Design
An example combinational-logic data path to compute z := (u + v)(w – x) / y
Add/Sub
latency
2 ns
Multiply
latency
6 ns
Divide
latency
15 ns
Total
latency
23 ns
u
+
v
Note that the divider gets its
correct inputs after 9 ns,
but this won’t cause a problem
if we allow enough total time

w
-
/
z
x
y
Beginning with inputs u, v, w, x, and y
stored in registers, the entire computation
can be completed in 25 ns, allowing 1
ns each for register readout and write
Performance Estimation for Single-Cycle MicroMIPS
Instruction access
2 ns
Register read
1 ns
ALU operation
2 ns
Data cache access
2 ns
Register write
1 ns
Total
8 ns
Single-cycle clock = 125 MHz
R-type 44%
6 ns
Load
24%
8 ns
Store
12%
7 ns
Branch 18%
5 ns
Jump
2%
3 ns
Weighted mean  6.36 ns
ALU-type
P
C
Load
P
C
Store
P
C
Branch
P
C
(and jr)
Jump
(except
jr & jal)
P
C
Not
used
Not
used
Not
used
Not
used
Not
used
Not
used
Not
used
Not
used
Not
used
The MicroMIPS data path unfolded (by depicting the register write step as a
separate block) so as to better visualize the critical-path latencies.
How Good is Our Single-Cycle Design?
Clock rate of 125 MHz not impressive
How does this compare with
current processors on the market?
Not bad, where latency is concerned
Instruction access
2 ns
Register read
1 ns
ALU operation
2 ns
Data cache access
2 ns
Register write
1 ns
Total
8 ns
Single-cycle clock = 125 MHz
A 2.5 GHz processor with 20 or so pipeline stages has a latency of about
0.4 ns/cycle  20 cycles = 8 ns
Throughput, however, is much better for the pipelined processor:
Up to 20 times better with single issue
Perhaps up to 100 times better with multiple issue
Download