CS61CL Machine Structures – Instruction Set Architecture Lec 5 David Culler

CS61CL Machine Structures
Lec 5 – Instruction Set Architecture
David Culler
Electrical Engineering and Computer Sciences
University of California, Berkeley
What is “Computer Architecture”?
Applications [app photo]
Operating System
Compiler
Firmware
Instr. Set Proc. | I/O system
Instruction Set Architecture
Datapath & Control
Digital Design
Circuit Design
Layout & fab
Semiconductor Materials [die photo]
• Coordination of many levels of abstraction
• Under a rapidly changing set of forces
• Design, Measurement, and Evaluation
7/27/2016
cs61cl f09 lec 5
2
Forces on Computer Architecture
Forces acting on Computer Architecture:
Technology
Programming Languages
Applications
Operating Systems
History
(A = F / M)
The Instruction Set: a Critical Interface
software
instruction set
hardware
• Properties of a good abstraction
– Lasts through many generations (portability)
– Used in many different ways (generality)
– Provides convenient functionality to higher levels
– Permits an efficient implementation at lower levels
Instruction Set Architecture
"... the attributes of a [computing] system as seen by the programmer, i.e. the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation."
– Amdahl, Blaauw, and Brooks, 1964
SOFTWARE
-- Organization of Programmable Storage
-- Data Types & Data Structures: Encodings & Representations
-- Instruction Formats
-- Instruction (or Operation Code) Set
-- Modes of Addressing and Accessing Data Items and Instructions
-- Exceptional Conditions
Computer Organization
• Capabilities & Performance Characteristics of Principal Functional Units
– (e.g., Registers, ALU, Shifters, Logic Units, ...)
• Ways in which these components are interconnected
• Information flows between components
• Logic and means by which such information flow is controlled.
• Choreography of FUs to realize the ISA
• Register Transfer Level (RTL) Description
Logic Designer's View
ISA Level
FUs & Interconnect
Fundamental Execution Cycle
Instruction Fetch: Obtain instruction from program storage
Instruction Decode: Determine required actions and instruction size
Operand Fetch: Locate and obtain operand data
Execute: Compute result value or status
Result Store: Deposit results in storage for later use
Next Instruction: Determine successor instruction
[Figure: Processor (regs, F.U.s) connected to Memory (program, data); the path between them is the von Neumann bottleneck]
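The six steps of the execution cycle can be sketched as a small interpreter loop. This is only an illustration: the four-register machine, the tuple instruction encoding, and the opcode names (`load`, `add`, `store`, `bnez`, `halt`) are hypothetical and are not MIPS or anything defined in the lecture.

```python
def run(program, memory, max_steps=1000):
    """Run a toy register machine and return its register file.

    Hypothetical encoding: each instruction is a tuple (op, a, b, c).
    """
    regs = [0] * 4          # assumed: four general-purpose registers
    pc = 0                  # program counter indexes into `program`
    for _ in range(max_steps):
        instr = program[pc]             # 1. instruction fetch
        op, a, b, c = instr             # 2. instruction decode
        next_pc = pc + 1                # default successor instruction
        if op == "halt":
            break
        elif op == "load":              # regs[a] := memory[b]
            regs[a] = memory[b]         # 3. operand fetch, 5. result store
        elif op == "add":               # regs[a] := regs[b] + regs[c]
            x, y = regs[b], regs[c]     # 3. operand fetch
            regs[a] = x + y             # 4. execute, 5. result store
        elif op == "store":             # memory[b] := regs[a]
            memory[b] = regs[a]
        elif op == "bnez":              # branch to b if regs[a] != 0
            if regs[a] != 0:
                next_pc = b
        else:
            raise ValueError(f"unknown opcode: {op}")
        pc = next_pc                    # 6. determine next instruction
    return regs


# Usage: compute memory[2] = memory[0] + memory[1]
memory = [3, 2, 0]
program = [
    ("load", 0, 0, 0),
    ("load", 1, 1, 0),
    ("add", 2, 0, 1),
    ("store", 2, 2, 0),
    ("halt", 0, 0, 0),
]
final_regs = run(program, memory)
```

Every instruction and every operand in memory crosses the same processor-memory path, which is why that path is called a bottleneck.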
Elements of an ISA
• Set of machine-recognized data types
– bytes, words, integers, floating point, strings, . . .
• Operations performed on those data types
– Add, sub, mul, div, xor, move, ….
• Programmable storage
– regs, PC, memory
• Methods of identifying and obtaining data referenced by instructions (addressing modes)
– Literal, reg., absolute, relative, reg + offset, …
• Format (encoding) of the instructions
– Op code, operand fields, …
Current Logical State of the Machine → Next Logical State of the Machine
Administrative Issues
• HW4 due before midnight
• HW5 will go out before morning, due next Wednesday
• Project 1 due midnight Friday 10/2
• Midterm 1 the following Wed
– alternate on Monday 6-7 pm
– need to email instructor in advance
[Figure labels: foo.c16 (asm source), Parse, instructions data structure, sym table data structure, Format, foo.o (instructions), foo.syms (symbol)]
Example: MIPS R3000
Programmable storage
2^32 x bytes
31 x 32-bit GPRs (R0=0)
32 x 32-bit FP regs (paired DP)
HI, LO, PC
[Figure: register file r0 (= 0), r1, …, r31, plus PC, lo, hi]
Data types? Format? Addressing Modes?
Arithmetic logical
Add, AddU, Sub, SubU, And, Or, Xor, Nor, SLT, SLTU,
AddI, AddIU, SLTI, SLTIU, AndI, OrI, XorI, LUI
SLL, SRL, SRA, SLLV, SRLV, SRAV
Memory Access
LB, LBU, LH, LHU, LW, LWL, LWR
SB, SH, SW, SWL, SWR
Control
J, JAL, JR, JALR
BEq, BNE, BLEZ, BGTZ, BLTZ, BGEZ, BLTZAL, BGEZAL
32-bit instructions on word boundary
Address Space
0000000:   reserved
0040000:   code          <= instructions
1000000:   static data   <= externs
1008000:
           heap          <= malloc
           unused
7FFFFFFF:  stack         <= Local variables
           <= OS, etc.
FFFFFFFF:
regs: PC, gp, sp
MIPS Instruction Format
Field widths in bits are shown in parentheses.
R: Reg-Reg     op (6) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6)
I: Reg-Immed   op (6) | rs (5) | rt (5) | immediate (16)
J: Jump        op (6) | immediate (26)
• Reg-Reg instructions (op == 0)
– add, sub, and, or, nor, xor, slt: R[rd] := R[rs] funct R[rt]; pc := pc+4
– sll, srl, sra: R[rd] := R[rt] shft shamt
• Reg-Immed (op != 0)
– addi, andi, ori, xori, lui, addiu, slti, sltiu: R[rt] := R[rs] op Im16
– lw, lh, lhu, lb, lbu: R[rt] := Mem[ R[rs] + signEx(Im16) ]*
– sw, sh, sb: Mem[ R[rs] + signEx(Im16) ] := R[rt]
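The three formats can be pulled apart with shifts and masks. The sketch below assumes the instruction arrives as a 32-bit unsigned integer; the function name `decode` and the dictionary layout are illustrative choices, and the only opcodes treated as J-format are `j` (2) and `jal` (3).

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into its fields."""
    op = (word >> 26) & 0x3F                # top 6 bits
    if op == 0:                             # R format: op == 0
        return {
            "format": "R",
            "op": op,
            "rs": (word >> 21) & 0x1F,
            "rt": (word >> 16) & 0x1F,
            "rd": (word >> 11) & 0x1F,
            "shamt": (word >> 6) & 0x1F,
            "funct": word & 0x3F,
        }
    if op in (2, 3):                        # J format: j, jal
        return {"format": "J", "op": op, "addr": word & 0x3FFFFFF}
    return {                                # I format: everything else
        "format": "I",
        "op": op,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "immediate": word & 0xFFFF,         # raw 16 bits, not yet sign-extended
    }


# Usage: add $t0, $t1, $t2 is encoded as 0x012A4020
add_fields = decode(0x012A4020)
```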
Addressing Modes
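• How is the operand located?
– Effective Address Calculation
• Immediate: the operand is the im16 field of the instruction (op | rs | rt | im16)
• Register Direct: the operand is in the register named by the instruction
• Absolute: the addr field of the instruction is the memory address of the operand
• Register Indirect: the named register holds the memory address of the operand
• Base + offset: the memory address is the named register plus the addr field
[Figure: one diagram per mode showing the instruction word (op | rs | rt | addr), a 32-bit register (bits 31..0), and memory (addresses 0 to 2^n-1)]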
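A minimal sketch of how each addressing mode locates its operand, assuming a register file modeled as a list and memory modeled as a dictionary from address to value. The function name `fetch_operand` and the mode strings are hypothetical labels for this illustration.

```python
def fetch_operand(mode, regs, mem, reg=0, field=0):
    """Return the operand selected by `mode`.

    reg   : register number named in the instruction
    field : immediate or address field of the instruction
    """
    if mode == "immediate":
        return field                    # operand is inside the instruction
    if mode == "register_direct":
        return regs[reg]                # operand is in the register
    if mode == "absolute":
        return mem[field]               # effective address = field
    if mode == "register_indirect":
        return mem[regs[reg]]           # effective address = register contents
    if mode == "base_offset":
        return mem[regs[reg] + field]   # effective address = register + field
    raise ValueError(f"unknown addressing mode: {mode}")


# Usage with a tiny machine state
regs = [0, 8]
mem = {4: 33, 8: 11, 12: 22}
value = fetch_operand("base_offset", regs, mem, reg=1, field=4)
```

Register indirect is the special case of base + offset where the offset is zero.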
Data Structure Access to Addressing Mode
• Pointer dereference: *p
• Struct field: S.foo
• Array element: A[i]
• Pointer deref to field: p->foo
• Pointer to array: (*A)[i]
• Array of pointers: *(A[i])
• Pointer to pointer: **P
• Array of ptrs to struct: A[i]->foo
• …
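Two of these accesses reduce to simple address arithmetic: a struct field is at base + a fixed offset, and an array element is at base + index × element size. The sketch below uses Python's `ctypes` to obtain the offsets; the struct `Node` with fields `bar` and `foo` and the base addresses are made up for the illustration.

```python
import ctypes


class Node(ctypes.Structure):
    """Hypothetical struct: struct Node { int32_t bar; int32_t foo; };"""
    _fields_ = [("bar", ctypes.c_int32), ("foo", ctypes.c_int32)]


def field_address(base, field_name):
    """Address of S.field: base address of S plus the field's fixed offset."""
    return base + getattr(Node, field_name).offset


def element_address(base, index, element_size):
    """Address of A[index]: base address of A plus index * element size."""
    return base + index * element_size


# Usage: address of A[3].foo for an array of Node starting at 0x2000
elem = element_address(0x2000, 3, ctypes.sizeof(Node))
foo_addr = field_address(elem, "foo")
```

The pointer-based forms (`p->foo`, `A[i]->foo`, `**P`) add one memory load per level of indirection before the same arithmetic applies.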
Computer Number Systems
• We all take positional notation for granted
– D_{k-1} D_{k-2} … D_0 represents D_{k-1}·B^{k-1} + D_{k-2}·B^{k-2} + … + D_0·B^0, where D_i ∈ { 0, …, B-1 }
• We all understand how to compare, add, subtract these numbers
– Add each position, write down the digit for that position and possibly carry to the next position
• Computers represent finite number systems
– Generally radix 2
– B_{k-1} B_{k-2} … B_0 represents B_{k-1}·2^{k-1} + B_{k-2}·2^{k-2} + … + B_0·2^0, where B_i ∈ { 0, 1 }
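The positional formula can be evaluated directly. This sketch assumes the digits are given most significant first; the function name `positional_value` is an illustrative choice.

```python
def positional_value(digits, base):
    """Value of digits D_{k-1} ... D_0 (most significant first) in `base`."""
    value = 0
    for d in digits:
        if not 0 <= d < base:
            raise ValueError(f"digit {d} is not valid in base {base}")
        # Multiplying the running value by the base shifts every earlier
        # digit up one position, so D_i ends up weighted by base**i.
        value = value * base + d
    return value


# Usage: the same function handles radix 2 and radix 10
five = positional_value([1, 0, 1], 2)
```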
Unsigned Numbers - Addition
[Figure: 4-bit number wheel running clockwise from 0000 = +0 to 1111 = +15]
Unsigned binary addition is just addition, base 2
Add the bits in each position and carry
Example: 3 + 2 = 5
    1     (carry)
  0011
+ 0010
  0101
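Adding the bits in each position and carrying can be written out as a loop from the least significant bit upward. The sketch assumes both operands are bit strings of equal length, most significant bit first; `add_unsigned` is a name chosen for this example.

```python
def add_unsigned(a_bits, b_bits):
    """Add two equal-length bit strings; return (sum_bits, carry_out)."""
    if len(a_bits) != len(b_bits):
        raise ValueError("operands must have the same width")
    carry = 0
    out = []
    # Walk from the least significant (rightmost) bit to the most significant.
    for a, b in zip(reversed(a_bits), reversed(b_bits)):
        total = int(a) + int(b) + carry
        out.append(str(total % 2))      # bit written down in this position
        carry = total // 2              # carry into the next position
    return "".join(reversed(out)), carry


# Usage: the slide's example, 3 + 2 = 5
result, carry_out = add_unsigned("0011", "0010")
```

A carry out of the top position means the result wrapped around the number wheel, as in 1111 + 0001.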
Twos Complement number wheel
[Figure: 4-bit number wheel: 0000 = +0 up to 0111 = +7 going clockwise, then 1000 = -8 up to 1111 = -1]
0 100 = +4
1 100 = -4
B_{k-1} B_{k-2} … B_0 represents -B_{k-1}·2^{k-1} + B_{k-2}·2^{k-2} + … + B_0·2^0
Easy to determine sign; only one representation for 0
Addition and subtraction just as in unsigned case
Simple comparison: A < B iff A – B < 0
One more negative number than positive number
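The two's complement formula differs from the unsigned one only in the sign of the top bit's weight. A small sketch, assuming the input is a bit string with the most significant bit first (`twos_complement_value` is an illustrative name):

```python
def twos_complement_value(bits):
    """Value of a two's complement bit string, most significant bit first."""
    k = len(bits)
    # The top bit carries weight -2^(k-1); all other bits are positive.
    value = -int(bits[0]) * 2 ** (k - 1)
    for i, b in enumerate(bits[1:], start=1):
        value += int(b) * 2 ** (k - 1 - i)
    return value


# Usage: the two wheel entries called out on the slide
plus_four = twos_complement_value("0100")
minus_four = twos_complement_value("1100")
```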
Sign Extension
Positive Number
0xxxxxxx = -0·2^7 + B_6·2^6 + B_5·2^5 + B_4·2^4 + B_3·2^3 + B_2·2^2 + B_1·2^1 + B_0·2^0
00000000 0xxxxxxx = -0·2^15 + … + 0·2^7 + B_6·2^6 + B_5·2^5 + B_4·2^4 + B_3·2^3 + B_2·2^2 + B_1·2^1 + B_0·2^0
Negative Number
1xxxxxxx = -2^7 + B_6·2^6 + B_5·2^5 + B_4·2^4 + B_3·2^3 + B_2·2^2 + B_1·2^1 + B_0·2^0
11111111 1xxxxxxx = -2^15 + 2^14 + … + 2^7 + B_6·2^6 + B_5·2^5 + B_4·2^4 + B_3·2^3 + B_2·2^2 + B_1·2^1 + B_0·2^0
The leading terms -2^15 + 2^14 + … + 2^7 sum to -2^7, so the value is unchanged.
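Sign extension copies the sign bit into every new high-order position. The sketch below works on non-negative Python integers used as raw bit patterns; the name `sign_extend` and its parameters are assumptions for this example.

```python
def sign_extend(value, from_bits, to_bits):
    """Widen a `from_bits`-bit two's complement pattern to `to_bits` bits."""
    sign = (value >> (from_bits - 1)) & 1
    if sign:
        # Set every bit from position from_bits up to to_bits - 1.
        high_ones = ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
        value |= high_ones
    return value


# Usage: widen 8-bit patterns to 16 bits
positive = sign_extend(0x7F, 8, 16)   # 0xxxxxxx stays padded with zeros
negative = sign_extend(0x80, 8, 16)   # 1xxxxxxx is padded with ones
```

This is the `signEx(Im16)` operation applied to the 16-bit immediate field of I-format instructions, with `from_bits=16` and `to_bits=32`.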
MIPS Instruction Format
R: Reg-Reg     op (6) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6)
I: Reg-Immed   op (6) | rs (5) | rt (5) | immediate (16)
J: Jump        op (6) | Addr (26)
• Reg-Reg instructions (op == 0)
• Reg-Immed (op != 0)
• Jumps
– j: PC := PC[31..28] || addr || 00
– jal: PC := PC[31..28] || addr || 00; R[31] := PC + 4
– jr: PC := R[rs]
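The jump target concatenation can be written with masks and shifts. This is a sketch of the slide's formula only, taking `pc` as the value whose upper four bits are kept; the function name `jump_target` is chosen for this example.

```python
def jump_target(pc, addr26):
    """PC[31..28] || addr || 00 for a J-format instruction."""
    upper = pc & 0xF0000000                 # keep the top 4 bits of the PC
    word_index = addr26 & 0x3FFFFFF         # 26-bit Addr field
    return upper | (word_index << 2)        # appending 00 multiplies by 4


# Usage: Addr field 0x100000 reached from a PC in the low 256 MB region
target = jump_target(0x00400008, 0x100000)
```

Because only 28 bits come from the instruction, `j` and `jal` can reach only targets in the same 256 MB region as the PC; `jr` has no such limit since the whole address comes from a register.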
MIPS Instruction Format
R: Reg-Reg     op (6) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6)
I: Reg-Immed   op (6) | rs (5) | rt (5) | immediate (16)
J: Jump        op (6) | Addr (26)
• Reg-Reg instructions (op == 0)
• Reg-Immed (op != 0)
• Jumps
• Branches
– BEQ, BNE: PC := (R[rs] ?= R[rt]) ? PC + signEx(im16) : PC+4
– BLT, BGT, BLE, BGE are pseudo ops
– Move and LI are pseudo ops too
Evolution of Instruction Sets
Single Accumulator (EDSAC 1950)
Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953)
Separation of Programming Model from Implementation
– High-level Language Based (Stack) (B5000 1963)
– Concept of a Family (IBM 360 1964)
General Purpose Register Machines
– Complex Instruction Sets (Vax, Intel 432 1977-80)
– Load/Store Architecture (CDC 6600, Cray 1 1963-76)
RISC (MIPS, Sparc, HP-PA, IBM RS6000, 1987)
iX86?
Dramatic Technology Advance
• Prehistory: Generations
– 1st Tubes
– 2nd Transistors
– 3rd Integrated Circuits
– 4th VLSI….
• Discrete advances in each generation
– Faster, smaller, more reliable, easier to utilize
• Modern computing: Moore’s Law
– Continuous advance, fairly homogeneous technology
Moore’s Law
• “Cramming More Components onto Integrated Circuits”
– Gordon Moore, Electronics, 1965
• # of transistors on a cost-effective integrated circuit doubles every 18 months
Technology Trends: Microprocessor Capacity
[Figure: transistors per chip (log scale, 1,000 to 100,000,000) vs. year (1970 to 2000), tracking Moore’s Law. Plotted points: i4004, i8080, i8086, i80286, i80386, i80486, Pentium]
Itanium II: 241 million
Pentium 4: 55 million
Alpha 21264: 15 million
Pentium Pro: 5.5 million
PowerPC 620: 6.9 million
Alpha 21164: 9.3 million
Sparc Ultra: 5.2 million
CMOS improvements:
• Die size: 2X every 3 yrs
• Line width: halve / 7 yrs
Technology Trends
• Clock Rate: ~30% per year
• Transistor Density: ~35%
• Chip Area: ~15%
• Transistors per chip: ~55%
• Total Performance Capability: ~100%
• by the time you graduate...
– 3x clock rate (>10 GHz)
– 10x transistor count (100 Billion transistors)
– 30x raw capability
• plus 16x dram density,
• 32x disk density (60% per year)
• Network bandwidth, …
Performance Trends
[Figure: performance (log scale, 0.1 to 100) vs. year (1965 to 1995) for Supercomputers, Mainframes, Minicomputers, and Microprocessors; MIPS R3000 is labeled on the plot]
Processor Performance (1.35X before, 1.55X now)
[Figure: performance (0 to 1200) vs. year (87 to 97), with a 1.54X/yr trend line. Machines plotted: Sun-4/260, MIPS M/120, MIPS M/2000, IBM RS/6000, HP 9000/750, DEC AXP/500, IBM POWER 100, DEC Alpha 4/266, DEC Alpha 5/300, DEC Alpha 5/500, DEC Alpha 21164/600]
Definition: Performance
• Performance is in units of things per sec
– bigger is better
• If we are primarily concerned with response time
performance(x) = 1 / execution_time(x)
"X is n times faster than Y" means
n = Performance(X) / Performance(Y) = Execution_time(Y) / Execution_time(X)
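The definition translates directly into code. In this sketch execution times are assumed to be in seconds, and the function names are illustrative.

```python
def performance(execution_time):
    """Performance as the reciprocal of execution time (things per second)."""
    return 1.0 / execution_time


def times_faster(time_x, time_y):
    """n such that X is n times faster than Y."""
    return performance(time_x) / performance(time_y)


# Usage: X finishes in 10 s, Y finishes in 15 s
n = times_faster(10.0, 15.0)
```

Note that the ratio of performances is the inverse ratio of execution times, so `n` here equals 15 / 10.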
Metrics of Performance
Levels: Application, Programming Language, Compiler, ISA, Datapath, Control, Function Units, Transistors / Wires / Pins
Metrics, from the application level down:
– Answers per day/month
– (millions) of Instructions per second: MIPS
– (millions) of (FP) operations per second: MFLOP/s
– Megabytes per second
– Cycles per second (clock rate)