Chapter 1
An Introduction to Processor Design
Pusan National University, Dept. of Computer Engineering
1.1 Processor Architecture & Organization

All modern general-purpose computers employ the "stored-program concept"

- The IAS computer by von Neumann at the Princeton Institute for Advanced Studies (in 1946)
- First implemented in the 'Baby' machine at the University of Manchester, England (in 1948)
[Figure 1.1] The state in a stored-program digital computer: processor registers and a memory (addresses 00..00_16 to FF..FF_16) holding both instructions and data, connected by address and data paths
1.1 Processor Architecture & Organization

50 years of development:

- Performance of processors has gone up, cost has come down
  => cost-effective computers (principles of operation not changed much)
- Most of the improvements come from advances in the technology of electronics:
  - Vacuum tubes -> transistors -> ICs -> VLSI
- New insights:
  - Virtual memory (early 1960s)
  - Cache memory
  - Pipelining
  - RISC
1.2 Abstraction in Hardware Design
Transistors (the elementary component)

- Logically act as inverters

Logic gates

- CMOS NAND gate (using 4 transistors)
  - If A = B = Vdd, the output = Vss
  - If either A or B (or both) = Vss, the output = Vdd
  => output = not(A.B)
Transistor circuit, logic symbol, truth table:

[Figure] CMOS NAND gate: transistor circuit (inputs A and B, between Vdd and Vss), logic symbol, and truth table

  A  B | Output
  0  0 |   1
  0  1 |   1
  1  0 |   1
  1  1 |   0
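As a quick check of the gate's logical behaviour, here is a minimal Python sketch (not part of the slides) that models the NAND function and reproduces the truth table above.

# Model of the NAND gate's logical behaviour.
def nand(a: int, b: int) -> int:
    """not(A.B): output is 0 only when both inputs are 1 (both at Vdd)."""
    return 0 if (a == 1 and b == 1) else 1

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", nand(a, b))   # prints the truth table: 1, 1, 1, 0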
1.2 Abstraction in Hardware Design

The gate abstraction

- Simplifies the process of designing circuits with a great number of transistors
- Removes the need to know that the gate is built from transistors
- At the functional level, the design is free from the implementation technology
  - E.g. field-effect transistors, bipolar transistors, etc.
- However, performance differences exist between technologies

Levels of abstraction (a small sketch after this list builds one level from the level below)

- Transistors
- Gates, memory cells
- Adders, MUXs, decoders, registers
- ALUs, shifters, memory blocks
- Processors, peripherals, memories
- ICs
- PCBs
- PCs, controllers, mobile phones
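A minimal sketch (not from the slides) of the gate abstraction at work: using only a NAND function, which we know can be built from four transistors, it composes an XOR gate and then a 1-bit full adder, i.e. one level of the hierarchy built from the level below it. The helper names are hypothetical.

def nand(a, b):
    # Gate-level primitive; how it is built from transistors no longer matters here.
    return 0 if (a and b) else 1

def xor(a, b):
    # XOR from four NAND gates.
    t = nand(a, b)
    return nand(nand(a, t), nand(b, t))

def full_adder(a, b, cin):
    """Return (sum, carry_out), composed from NAND gates only."""
    axb = xor(a, b)
    s = xor(axb, cin)
    cout = nand(nand(a, b), nand(cin, axb))   # = a.b + cin.(a xor b)
    return s, cout

assert full_adder(1, 1, 0) == (0, 1)
assert full_adder(1, 0, 1) == (0, 1)
assert full_adder(0, 0, 1) == (1, 0)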
1.3 MU0 – a simple processor

A simple form of processor can be built from a few basic components:

- PC (program counter)
- ACC (accumulator)
- ALU (arithmetic-logic unit)
- IR (instruction register)
- Instruction decoder and control logic

The MU0 instruction set

- A 16-bit machine with a 12-bit address space (4K x 2 bytes: 8 Kbytes of memory)
- Instructions are 16 bits long (opcode: 4 bits, address field S: 12 bits)
- Instruction format:  | opcode (4 bits) | S (12 bits) |
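As an illustration of the instruction format, here is a minimal Python sketch (not from the slides) that packs and unpacks a 16-bit MU0 instruction word; the function names encode/decode are hypothetical.

def encode(opcode: int, s: int = 0) -> int:
    """Pack a 4-bit opcode and a 12-bit address field S into one 16-bit word."""
    assert 0 <= opcode < 16 and 0 <= s < 4096
    return (opcode << 12) | s

def decode(word: int) -> tuple[int, int]:
    """Split a 16-bit instruction word back into (opcode, S)."""
    return (word >> 12) & 0xF, word & 0xFFF

assert encode(0b0010, 0x101) == 0x2101        # ADD 0x101 (opcodes in Table 1.1 below)
assert decode(0x2101) == (0b0010, 0x101)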
1.3 MU0 – a simple processor

[Table 1.1] The MU0 instruction set

  Instruction   Opcode   Effect
  LDA S         0000     ACC := mem16[S]
  STO S         0001     mem16[S] := ACC
  ADD S         0010     ACC := ACC + mem16[S]
  SUB S         0011     ACC := ACC - mem16[S]
  JMP S         0100     PC := S
  JGE S         0101     if ACC >= 0, PC := S
  JNE S         0110     if ACC != 0, PC := S
  STP           0111     stop
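The effects column of Table 1.1 translates almost directly into code. Below is a minimal MU0 interpreter sketch (not from the slides, and ignoring the datapath details of the next slides); names such as run_mu0 are hypothetical, and JGE is modelled by testing the sign bit of a 16-bit ACC.

def run_mu0(mem, pc=0, max_steps=1000):
    """Interpret MU0 instructions held in mem (4K 16-bit words) until STP."""
    acc = 0
    for _ in range(max_steps):
        ir = mem[pc] & 0xFFFF            # fetch
        pc = (pc + 1) & 0xFFF
        op, s = ir >> 12, ir & 0xFFF     # decode: 4-bit opcode, 12-bit address S
        if   op == 0b0000: acc = mem[s]                      # LDA S
        elif op == 0b0001: mem[s] = acc                      # STO S
        elif op == 0b0010: acc = (acc + mem[s]) & 0xFFFF     # ADD S
        elif op == 0b0011: acc = (acc - mem[s]) & 0xFFFF     # SUB S
        elif op == 0b0100: pc = s                            # JMP S
        elif op == 0b0101:                                   # JGE S
            if not (acc & 0x8000): pc = s    # ACC >= 0 when the sign bit is clear
        elif op == 0b0110:                                   # JNE S
            if acc != 0: pc = s
        elif op == 0b0111: return acc, mem                   # STP
    raise RuntimeError("no STP executed")

# Example program: mem[0x102] := mem[0x100] + mem[0x101]
prog = [0x0100, 0x2101, 0x1102, 0x7000]    # LDA 0x100; ADD 0x101; STO 0x102; STP
mem = prog + [0] * (0x1000 - len(prog))
mem[0x100], mem[0x101] = 3, 4
run_mu0(mem)
assert mem[0x102] == 7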
1.3 MU0 – a simple processor

Datapath

- A register transfer level (RTL) design style based on registers, MUXs, and so on

[Figure 1.5] MU0 datapath example: PC, IR, ALU, ACC, control logic and memory, linked by the address bus and the data bus
Register transfer level (RTL) design


[Figure 1.6] MU0 register transfer level organization
Control signals:

- Enables on all of the registers
- Function select lines to the ALU
- Select control lines for the two MUXs
- Control for a tri-state driver to send the ACC value to memory
- MEMrq (memory request)
- RnW (read/not-write control)
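A rough sketch (not the book's RTL design) of how the fetch step can be read as register transfers, with the control choices from the list above noted in comments; the dictionary-of-registers model is an assumption for illustration only.

def fetch(regs, mem):
    """IR := mem[PC]; PC := PC + 1."""
    addr = regs["PC"]                  # address-bus MUX selects PC (not the IR address field)
    regs["IR"] = mem[addr]             # MEMrq asserted, RnW = read, IR enable on
    regs["PC"] = (addr + 1) & 0xFFF    # incremented PC written back, PC enable on
    return regs

print(fetch({"PC": 0, "IR": 0}, [0x0100] + [0] * 0xFFF))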
1.4 Instruction set design


To build a high-performance processor (going beyond the MU0 instruction set), instruction set design is important.

4-address instructions (the most general form)

- Ex) add d, s1, s2, next_i   ; d := s1 + s2
- Format: | function (f bits) | op 1 addr. (n bits) | op 2 addr. (n bits) | dest. addr. (n bits) | next_i addr. (n bits) |

3-address instructions

- Make the address of the next instruction implicit using the PC (except for branches)
- Ex) add d, s1, s2   ; d := s1 + s2
- Format: | function (f bits) | op 1 addr. (n bits) | op 2 addr. (n bits) | dest. addr. (n bits) |
1.4 Instruction set design

2-address instructions

- Make the destination register the same as one of the source registers
- Ex) add d, s1   ; d := d + s1
- Format: | function (f bits) | op 1 addr. (n bits) | dest. addr. (n bits) |

1-address instructions

- The accumulator (AC) is used as the destination
- Ex) add s1   ; AC := AC + s1
- Format: | function (f bits) | op 1 addr. (n bits) |

0-address instructions (using a stack)

- Ex) add   ; tos := tos + next on stack
- Format: | function (f bits) |
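To make the 0-address case concrete, here is a minimal stack-machine sketch (not from the slides) that evaluates d := s1 + s2 with push/add/pop steps; the instruction names and memory layout are hypothetical.

def run_stack_machine(program, memory):
    """Interpret a tiny 0-address instruction sequence over a data stack."""
    stack = []
    for op, *arg in program:
        if op == "push":                 # push memory[addr] onto the stack
            stack.append(memory[arg[0]])
        elif op == "add":                # tos := tos + next on stack (no address fields)
            stack.append(stack.pop() + stack.pop())
        elif op == "pop":                # memory[addr] := tos
            memory[arg[0]] = stack.pop()
    return memory

# d := s1 + s2, with s1 at address 0, s2 at 1 and d at 2
mem = {0: 3, 1: 4, 2: 0}
run_stack_machine([("push", 0), ("push", 1), ("add",), ("pop", 2)], mem)
assert mem[2] == 7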
1.4 Instruction set design

Addressing modes

- Immediate addressing: the operand is immediate data in the instruction
- Absolute addressing: the instruction contains the full address of the data
- Indirect addressing: the instruction contains the address of a location that contains the address of the data
- Register addressing: the data is in a register
- Register indirect addressing
- Index addressing
- Stack addressing
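A minimal sketch (not from the slides) of how the first few addressing modes locate an operand; regs, mem and the operand() helper are hypothetical, and index addressing is shown with a (base register, offset) pair.

def operand(mode, field, regs, mem):
    """Return the operand value selected by an addressing mode."""
    if mode == "immediate":              # the operand is the field itself
        return field
    if mode == "absolute":               # field is the full address of the data
        return mem[field]
    if mode == "indirect":               # field addresses a word holding the data's address
        return mem[mem[field]]
    if mode == "register":               # field names a register holding the data
        return regs[field]
    if mode == "register indirect":      # register holds the data's address
        return mem[regs[field]]
    if mode == "index":                  # field = (base register, offset)
        base, offset = field
        return mem[regs[base] + offset]
    raise ValueError(mode)

regs = {"r1": 0x20}
mem = {0x10: 0x20, 0x20: 99, 0x24: 7}
assert operand("indirect", 0x10, regs, mem) == 99     # mem[mem[0x10]]
assert operand("index", ("r1", 4), regs, mem) == 7    # mem[r1 + 4]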
1.4 Instruction set design

Control flow instructions

- Branch, jump
- Conditional branch
- Subroutine calls & returns
- System calls
  - Branch to an operating system routine
- Exceptions
  - Error handling
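A minimal sketch (not from the slides) of how these control-flow instructions change the PC in a simple model; the choice of a return-address stack (rather than a link register) and the function names are assumptions for illustration.

def branch(regs, target):                 # unconditional branch/jump
    regs["PC"] = target

def branch_if(regs, condition, target):   # conditional branch
    if condition:
        regs["PC"] = target

def call(regs, stack, target):            # subroutine call: save the return address
    stack.append(regs["PC"])              # (a link register is the other common option)
    regs["PC"] = target

def ret(regs, stack):                     # subroutine return
    regs["PC"] = stack.pop()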
1.5 Processor design trade-offs

CISC vs RISC

- CISC
  - Aims to reduce the semantic gap between high-level languages and machine instructions
  - Complex sequences of operations in a single instruction
  - Makes the compiler's job easy
- RISC
  - ARM's middle name comes from RISC
  - Reducing the semantic gap is not the right way to make an efficient computer
[Table 1.3] Typical dynamic instruction usage

  Instruction type        Dynamic usage
  Data movement               43%
  Control flow                23%
  Arithmetic operations       15%
  Comparisons                 13%
  Logical operations           5%
  Other                        1%
1.5 Processor design trade-offs



- Data movement between registers and memory: almost half
- Control flow such as branches & procedure calls: almost a quarter
- Arithmetic operations: only 15%
  => Complex arithmetic instructions do not help much
- The most important techniques for making processors go faster: pipelining and cache memory
1.5 Processor design trade-offs

Pipelines

1. Fetch: fetch the instruction from memory
2. Decode: decode the instruction
3. REG: get operands from the register bank
4. ALU: perform the operation
5. MEM: access memory for an operand, if necessary
6. RES: write the result back to the register bank

[Figure 1.13] Pipelined instruction execution: successive instructions overlap, each entering the fetch stage one cycle after the previous one
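A minimal sketch (not from the slides) of the ideal pipeline timing in Figure 1.13: instruction i occupies stage s in cycle i + s, so once the pipeline is full one instruction completes every cycle.

STAGES = ["fetch", "dec", "reg", "ALU", "mem", "res"]

def timing_chart(num_instructions):
    """Map each cycle to the (instruction, stage) pairs active in that cycle."""
    chart = {}
    for i in range(num_instructions):
        for s, name in enumerate(STAGES):
            chart.setdefault(i + s, []).append(f"i{i}:{name}")
    return chart

for cycle, active in sorted(timing_chart(3).items()):
    print(cycle, active)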
1.5 Processor design trade-offs

Pipeline hazards

- Read-after-write hazard (data hazard)
  - The result from one instruction is used as an operand by the next instruction => instruction 2 must stall until the result is available

[Figure 1.14] Read-after-write pipeline hazard: instruction 2 stalls after decode until instruction 1's result is available
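A minimal sketch (not from the slides) of how a read-after-write dependence turns into stall cycles, assuming no forwarding and that a register can be read in the same cycle it is written back; issue_cycles and its inputs are hypothetical.

def issue_cycles(instrs):
    """instrs: list of (reg_written, regs_read). Returns, for each instruction,
    the cycle in which it enters the REG (operand-read) stage, stalling until
    the producing instruction has written its result back in RES."""
    reg_ready = {}          # register -> first cycle its new value is readable
    cycles = []
    next_free = 2           # instruction 0 reaches REG in cycle 2 (after fetch, dec)
    for written, read in instrs:
        start = max([next_free] + [reg_ready.get(r, 0) for r in read])
        cycles.append(start)
        if written is not None:
            reg_ready[written] = start + 3   # RES is three stages after REG
        next_free = start + 1
    return cycles

# Instruction 1 writes r1; instruction 2 reads r1 -> instruction 2 stalls.
print(issue_cycles([("r1", []), (None, ["r1"])]))   # [2, 5]: two stall cycles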
1.5 Processor design trade-offs

Branch hazard

- Solutions:
  - Compute the branch target earlier (if possible)
  - The target may be computed speculatively
  - Delayed branch

[Figure 1.15] Pipelined branch behavior: instruction 1 is a branch and instruction 5 is the branch target, which cannot be fetched until the target address has been computed

Pipeline efficiency

- The deeper the pipeline, the worse the problems get: the RISC approach is better
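A small back-of-the-envelope sketch (not from the slides) of why computing the branch target earlier helps: the number of wasted fetches on a taken branch grows with how late in the pipeline the target becomes known. Stage numbering follows the six-stage pipeline above.

def branch_penalty(target_known_in_stage):
    # Instructions fetched while the branch moved from fetch (stage 1) to that
    # stage have to be discarded on a taken branch.
    return target_known_in_stage - 1

print(branch_penalty(4))   # target computed in the ALU stage -> 3 wasted fetches
print(branch_penalty(2))   # target computed during decode    -> 1 wasted fetch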
1.6 RISC


In 1980, Patterson (UC Berkeley) started the RISC I project

RISC I architecture

- Fixed (32-bit) instruction size with few formats
- Load-store architecture:
  - Instructions that process data operate only on registers
  - Separate instructions access memory
- A large register bank (32 32-bit registers) to allow the load-store architecture to operate efficiently

RISC I organization

- Hard-wired instruction decode logic
- Pipelined execution
- Single-cycle execution

RISC I advantages

- A smaller die size
- A shorter development time
- A higher performance (controversial)
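A minimal sketch (not from the slides) of the load-store idea: data-processing operations touch only the register bank, and separate load/store steps move values to and from memory; register numbers and addresses are arbitrary.

regs = [0] * 32                 # a large register bank, as in RISC I
mem = {100: 3, 104: 4, 108: 0}

def load(rd, addr):  regs[rd] = mem[addr]            # memory -> register
def store(rs, addr): mem[addr] = regs[rs]            # register -> memory
def add(rd, rn, rm): regs[rd] = regs[rn] + regs[rm]  # data processing: registers only

# mem[108] := mem[100] + mem[104], in load-store style
load(1, 100); load(2, 104); add(3, 1, 2); store(3, 108)
assert mem[108] == 7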