Princess Sumaya University for Technology
Computer Architecture
Dr. Esam Al_Qaralleh
Instruction Set Architecture (ISA)
Outline
- Introduction
- Classifying instruction set architectures
- Instruction set measurements
  • Memory addressing
  • Addressing modes for signal processing
  • Type and size of operands
  • Operations in the instruction set
  • Operations for media and signal processing
  • Instructions for control flow
  • Encoding an instruction set
- MIPS architecture
Instruction Set Principles and Examples
Basic Issues in Instruction Set Design
- What operations, and how many?
  • Load, store, increment, and branch are sufficient for any computation, but such a minimal set is impractical (programs become far too long).
- How many operands, and how are they specified?
  • Most operations are dyadic (e.g., A ← B + C); some are monadic (e.g., A ← B).
- How are they encoded into an instruction format?
  • Instructions should be a multiple of bytes in length.
- Typical instruction set
  • 32-bit word
  • Basic operand addresses are 32 bits long.
  • Basic operands (like integers) are 32 bits long.
  • In general, an instruction can refer to 3 operands (A ← B + C).
- Challenge: encode operations in a small number of bits.
Brief Introduction to ISA
- Instruction Set Architecture: a set of instructions
  • Each instruction is directly executed by the CPU's hardware.
- How is it represented?
  • In a binary format, since the hardware understands only bits. For example:
    opcode (6 bits) | rs (5 bits) | rt (5 bits) | Immediate (16 bits)
- Options: fixed- or variable-length formats
  • Fixed: each instruction is encoded in a field of the same size (typically 1 word).
  • Variable: half-word, whole-word, and multiple-word instructions are possible.
What Must be Specified?
- Instruction format (encoding)
  • How is it decoded?
- Location of operands and result
  • Where, other than memory?
  • How many explicit operands?
  • How are memory operands located?
- Data type and size
- Operations
  • Which are supported?
Example of Program Execution
- Opcodes used:
  • 1: Load AC from memory
  • 2: Store AC to memory
  • 5: Add to AC from memory
- Program: add the contents of memory location 940 to the contents of memory location 941 and store the result at 941.
- Each instruction goes through a fetch phase and an execution phase.
Classifying Instruction Set Architectures
Instruction Set Design
CPU_Time = IC × CPI × Cycle_time
The instruction set influences every term: the instruction count (IC), the cycles per instruction (CPI), and the cycle time.
Instruction Characteristics
- Usually a simple operation
- The operation is identified by the op-code field.
- But operations require operands: 0, 1, or 2
- To identify where they are, they must be addressed.
  • The address refers to some piece of storage.
  • Typical storage possibilities are main memory, registers, or a stack.
- Two options: explicit or implicit addressing
- Implicit: the op-code implies the address of the operands.
  • ADD on a stack machine pops the top 2 elements of the stack, then pushes the result.
  • HP calculators work this way.
- Explicit: the address is specified in some field of the instruction.
  • Note the potential for 3 addresses: 2 operands + the destination.
Classifying Instruction Set Architectures
Based on the CPU's internal storage options and the number of operands
These choices critically affect the number of instructions, the CPI, and the cycle time.
Operand Locations for Four ISA Classes
Code sequence for C = A + B:
Stack
- Push A
- Push B
- Add
  • Pops the top two values of the stack (A, B) and pushes the result onto the stack.
- Pop C
Accumulator (AC)
- Load A
- Add B
  • Adds B to AC (which holds A) and stores the result in AC.
- Store C
Register (register-memory)
- Load R1, A
- Add R3, R1, B
- Store R3, C
Register (load-store)
- Load R1, A
- Load R2, B
- Add R3, R1, R2
- Store R3, C
Modern Choice – Load-Store Register (GPR) Architecture
- Reasons for choosing a GPR (general-purpose register) architecture
  • Registers (like stacks and accumulators) are faster than memory.
  • Registers are easier and more effective for a compiler to use.
    – (A+B) - (C*D) - (E*F) may be evaluated in any order (e.g., for pipelining concerns).
    – But on a stack machine, it must be evaluated left to right.
  • Registers can be used to hold variables:
    – Reduces memory traffic
    – Speeds up programs
    – Improves code density (fewer bits are needed to name a register)
- Compiler writers prefer that all registers be equivalent and unreserved.
- The number of GPRs: at least 16
Characteristics That Divide GPR Architectures
- Number of operands
  • Three-operand: 1 result and 2 source operands
  • Two-operand: 1 operand is both a source and the result, plus 1 source
- How many operands are memory addresses: 0 to 3 (two sources + 1 result)
  • Load-store: 0
  • Register-memory: 1
  • Memory-memory: 2 or 3
Pros and Cons of the Three Most Common GPR Computers
(m, n): m memory operands out of n total operands
Register-register: (0, 3)
+ Simple, fixed-length instruction encoding
+ Simple code-generation model
+ Instructions take similar numbers of clocks to execute
- Higher instruction count
Memory-memory: (3, 3)
+ Most compact
- Large variation in instruction size
- Memory access bottleneck
Register-memory: (1, 2)
+ Data can be accessed without loading it first
+ Easy to encode, and yields good code density
- One source operand is destroyed in a binary operation
- Limited number of registers (the register specifier and the memory address share each instruction's encoding)
Memory Addressing
Memory Addressing Basics
- All architectures must address memory.
- What is accessed: a byte, a word, or multiple words?
  • Today's machines are byte addressable.
  • Main memory is organized in 32- to 64-byte lines.
  • Big-endian or little-endian addressing
- Byte addressing raises a natural alignment problem.
  • An access to an object of size s bytes at byte address A is aligned if A mod s = 0.
  • A misaligned access takes multiple aligned memory references.
- The memory addressing mode influences the instruction count (IC) and the clock cycles per instruction (CPI).
Byte Ordering
Idea
- Bytes in a long word are numbered 0 to 3.
- Which is most (least) significant?
- This can cause problems when exchanging binary data between machines.
Big Endian: byte 0 is most significant, byte 3 least
- IBM 360/370, Motorola 68K, SPARC
Little Endian: byte 0 is least significant, byte 3 most
- Intel x86, VAX
Alpha
- The chip can be configured to operate either way.
- DEC workstations are little endian.
- Cray T3E Alphas are big endian.
Byte Ordering Example
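union {
    unsigned char  c[8];
    unsigned short s[4];
    unsigned int   i[2];
    unsigned long  l[1];
} dw;

Byte offsets 0 to 7, viewed through each member:
c[0] c[1] c[2] c[3] c[4] c[5] c[6] c[7]
s[0]      s[1]      s[2]      s[3]
i[0]                i[1]
l[0] (all 8 bytes if long is 64 bits; only bytes 0 to 3 if long is 32 bits)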
Byte Ordering on Alpha
Little Endian (64-bit long)
Bytes 0xf0 to 0xf7 are stored in c[0] to c[7]. In every s[], i[], and l[] element, the byte at the lowest address is the LSB and the byte at the highest address is the MSB.
Output on Alpha:
Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]
Shorts 0-3 == [0xf1f0,0xf3f2,0xf5f4,0xf7f6]
Ints 0-1 == [0xf3f2f1f0,0xf7f6f5f4]
Long 0 == [0xf7f6f5f4f3f2f1f0]
Byte Ordering on x86
Little Endian (32-bit long)
Bytes 0xf0 to 0xf7 are stored in c[0] to c[7]. In every s[], i[], and l[] element, the byte at the lowest address is the LSB and the byte at the highest address is the MSB. Because long is only 4 bytes here, l[0] covers just c[0] to c[3].
Output on Pentium:
Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]
Shorts 0-3 == [0xf1f0,0xf3f2,0xf5f4,0xf7f6]
Ints 0-1 == [0xf3f2f1f0,0xf7f6f5f4]
Long 0 == [0xf3f2f1f0]
Byte Ordering on Sun
Big Endian (32-bit long)
Bytes 0xf0 to 0xf7 are stored in c[0] to c[7]. In every s[], i[], and l[] element, the byte at the lowest address is the MSB and the byte at the highest address is the LSB.
Output on Sun:
Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]
Shorts 0-3 == [0xf0f1,0xf2f3,0xf4f5,0xf6f7]
Ints 0-1 == [0xf0f1f2f3,0xf4f5f6f7]
Long 0 == [0xf0f1f2f3]
Addressing Modes
Immediate
- Add R4, #3
- Regs[R4] ← Regs[R4] + 3
- The operand (3) is in the instruction itself.
Register
- Add R4, R3
- Regs[R4] ← Regs[R4] + Regs[R3]
- The operand is in register R3.
Register Indirect
- Add R4, (R1)
- Regs[R4] ← Regs[R4] + Mem[Regs[R1]]
- Register R1 holds the memory address of the operand.
Addressing Modes(Cont.)
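Direct (absolute)
- Add R4, (1001)
- Regs[R4] ← Regs[R4] + Mem[1001]
- The operand is in memory at address 1001.
Memory Indirect
- Add R4, @(R3)
- Regs[R4] ← Regs[R4] + Mem[Mem[Regs[R3]]]
- Register R3 holds the address of a memory word, which in turn holds the address of the operand.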
Addressing Modes(Cont.)
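Displacement
- Add R4, 100(R1)
- Regs[R4] ← Regs[R4] + Mem[100 + Regs[R1]]
- The operand address is the sum of the displacement (100) and the contents of R1.
Scaled
- Add R1, 100(R2)[R3]
- Regs[R1] ← Regs[R1] + Mem[100 + Regs[R2] + Regs[R3] * d]
- The index in R3 is scaled by d, the size of the data element in bytes.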
Typical Address Modes (I)
Typical Address Modes (II)
Use of Memory Addressing Mode (Figure 2.7)
Measured on a VAX, which supported all of these modes
Register mode (about 50% of all operand references) is not counted
Displacement Address Size
Average of 5 programs from SPECint92 and SPECfp92
- Only 1% of addresses need more than 16 bits.
(Figure: percentage of displacements (0% to 30%) versus number of displacement bits (0 to 14), for the integer and FP averages.)
Immediate Addressing Mode
10 programs from SPECint92 and SPECfp92
(Figure: percentage of operations using an immediate operand, for loads, ALU operations, compares, and all instructions, shown separately for integer and FP programs.)
Immediate Addressing Mode
- 50% to 60% fit within 8 bits
- 75% to 80% fit within 16 bits
(Figure: distribution of immediate sizes, 0 to 32 bits, for gcc, spice, and TeX.)
Short Summary – Memory Addressing
- Need to support at least three addressing modes:
  • Displacement, immediate, and register indirect (register deferred), plus register mode
  • They represent 75% to 99% of the addressing modes used in the benchmarks.
- The displacement field should be at least 12 to 16 bits (covering 75% to 99% of displacements).
- The immediate field should be at least 8 to 16 bits (covering 50% to 80% of immediates).
Operand Type & Size
Typical types, assuming a 32-bit word:
- Character: 1 byte, ASCII or EBCDIC (IBM); 4 per word
- Short integer: 2 bytes, 2's complement
- Integer: one word, 2's complement
- Float: one word, usually IEEE 754 these days
- Double-precision float: 2 words, IEEE 754
- BCD or packed decimal: 4-bit values, packed 8 per word
Data Access Patterns
Short Summary – Type and Size of Operand
- The future: as we move to 64-bit machines, larger offsets, immediates, etc. are likely.
- Usage of 64- and 128-bit values will increase.
- DSPs need accumulating registers wider than the data size in memory, to aid accuracy in fixed-point arithmetic.
ALU Operations
What Operations are Needed
- Arithmetic and logical
  • Integer arithmetic: ADD, SUB, MULT, DIV, SHIFT
  • Logical operations: AND, OR, XOR, NOT
- Data transfer: copy, load, store
- Control: branch, jump, call, return, trap
- System: OS and memory management
  • We'll ignore these for now, but remember they are needed.
- Floating point
  • Same as arithmetic, but usually takes bigger operands
- Decimal
- String: move, compare, search
- Graphics: pixel and vertex operations, compression/decompression
Top 10 Instructions for 80x86
- load: 22%
- conditional branch: 20%
- compare: 16%
- store: 12%
- add: 8%
- and: 6%
- sub: 5%
- move register-register: 4%
- call: 1%
- return: 1%
The most widely executed instructions are the simple operations of an instruction set. The top 10 instructions for the 80x86 account for 96% of instructions executed. Make them fast, as they are the common case.
Control Instructions are a Big Deal
Jumps: unconditional transfer
Conditional branches
- How is the condition set? By a condition flag, or as part of the instruction
- How is the target specified? How far away is it?
Calls
- How is the target specified? How far away is it?
- Where is the return address kept?
- How are the arguments passed? Callee-save vs. caller-save registers
Returns
- Where is the return address? How far away is it?
- How are the results passed?
Breakdown of Control Flows
Call/return: integer 19%, FP 8%
Jump: integer 6%, FP 10%
Conditional branch: integer 75%, FP 82%
Branch Address Specification
- For unconditional and conditional branches, the target is known at compile time, so it is specified in the instruction:
  • As a register containing the target address
  • As a PC-relative offset
- Consider word-length addresses, registers, and instructions:
  • Want the full address? Then pick the register option.
    – But the setup and effective-address computation will take longer.
  • If a smaller offset suffices, PC-relative addressing works.
    – PC-relative code is also position independent, which simplifies the linker's job.
Returns and Indirect Jumps
- The branch target is not known at compile time.
- A way to specify the target dynamically is needed:
  • Use a register
  • Permit any addressing mode, e.g., Regs[R4] ← Regs[R4] + Mem[Regs[R1]]
- Also useful for:
  • case or switch statements
  • Dynamically shared libraries
  • Higher-order functions or function pointers
Branch Stats - 90% are PC Relative
Call/return: TeX 16%, Spice 13%, GCC 10%
Jump: TeX 18%, Spice 12%, GCC 12%
Conditional: TeX 66%, Spice 75%, GCC 78%
Branch Distances
Condition Testing Options
PSW: Program Status Word
What kinds of compares do Branches Use?
A large fraction of comparisons are tests against zero
Direction, Frequency, and Real Change
Key points: 75% of branches are forward branches
- Most backward branches are loops, taken about 90% of the time.
- Branch statistics are both compiler and application dependent.
- Any loop optimizations may have a large effect.
Short Summary – Operations in the Instruction Set
- Branch addressing should be able to reach about 100+ instructions above or below the branch.
  • This implies a PC-relative branch displacement of at least 8 bits.
- Jump instructions need register-indirect and PC-relative addressing, to support returns as well as many other features of current systems (e.g., dynamic allocation).
Encoding an Instruction Set
Encoding the ISA
- Encode instructions into a binary representation for execution by the CPU.
- You can pick anything, but the encoding:
  • Affects the size of the code, so it should be tight
  • Affects the CPU design, in particular the instruction decode
    – So it may have a big influence on the CPI or cycle time
- Must balance several competing forces:
  • Desire for lots of addressing modes and registers
  • Desire to make the average program size compact
  • Desire to have instructions encoded into lengths that are easy to handle in a pipelined implementation (multiples of bytes)
3 Popular Encoding Choices
- Variable (compact code, but difficult to decode)
  • The primary opcode is fixed in size, but opcode modifiers may exist.
  • The opcode specifies the number of arguments, each used as an address field.
  • Best when there are many addressing modes and operations.
  • Uses as few bits as possible, but individual instructions can vary widely in length.
  • e.g., VAX: integer ADD versions vary between 3 and 19 bytes.
- Fixed (easy to decode, but lengthy code)
  • Every instruction looks the same; some fields may be interpreted differently.
  • The operation and the addressing mode are combined into the opcode.
  • e.g., all modern RISC machines
- Hybrid
  • A set of fixed formats
  • e.g., IBM 360 and Intel 80x86
Trade-off: size of program vs. ease of decoding
3 Popular Encoding Choices (Cont.)
An Example of Variable Encoding -- VAX
- addl3 r1, 737(r2), (r3): a 32-bit integer add instruction with 3 operands, which needs 6 bytes to represent:
  • Opcode for addl3: 1 byte
  • A VAX address specifier is 1 byte (4 bits: addressing mode, 4 bits: register)
    – r1: 1 byte (register addressing mode + r1)
    – 737(r2): 1 byte for the address specifier (displacement addressing + r2) and 2 bytes for the displacement 737
    – (r3): 1 byte for the address specifier (register indirect + r3)
- Length of VAX instructions: 1 to 53 bytes
Short Summary – Encoding the Instruction Set
- Choice between variable and fixed instruction encoding
  • Code size more important than performance ⇒ variable encoding
  • Performance more important than code size ⇒ fixed encoding
Role of Compilers
- Critical goals in the ISA from the compiler's viewpoint:
  • What features will lead to high-quality code?
  • What makes it easy to write efficient compilers for an architecture?
Compiler and ISA
- ISA decisions are no longer made to ease assembly-language (AL) programming.
- Because programs are written in high-level languages (HLLs), the ISA today is primarily a compiler target.
- The performance of a computer is significantly affected by the compiler.
- Understanding compiler technology today is critical to designing and efficiently implementing an instruction set.
- The choice of architecture affects the quality of the code and the complexity of building a compiler for it.
Goal of the Compiler
- Primary goal is correctness
- Second goal is speed of the object code
- Others:
  • Speed of the compilation
  • Ease of providing debug support
  • Interoperability among languages
  • Flexibility of the implementation: languages may not change much, but they do evolve (e.g., Fortran 66 to HPF)
- Make the frequent case fast and the rare case correct
Optimization Observations
- Branches are hard to reduce.
- The biggest reduction is often in memory references.
- Some reduction of ALU operations happens, but it is usually a few percent.
- Implication:
  • Branch, call, and return become a larger relative percentage of the instruction mix.
  • Control instructions are among the hardest to speed up.
How Can Architects Help Compiler Writers?
- Provide regularity
  • Address modes, operations, and data types should be orthogonal (independent) of each other.
    – Simplifies code generation, especially multi-pass
    – Counterexample: restricting which registers can be used for certain classes of instructions
- Provide primitives, not solutions
  • Special features that match an HLL construct are often unusable.
  • What works in one language may be detrimental to others.
How Can Architects Help Compiler Writers? (Cont.)
- Simplify trade-offs among alternatives
  • How to write good code? What is good code?
    – The old metric, IC or code size, is no longer sufficient because of caches and pipelining.
  • Anything that makes code sequence performance obvious is a definite win!
    – e.g., how many times a variable should be referenced before it is cheaper to load it into a register
- Provide instructions that bind the quantities known at compile time as constants
  • Don't hide compile-time constants.
    – Instructions that work off something the compiler thinks could be a run-time value handcuff the optimizer.
Short Summary – Compilers
- Provide at least 16 GPRs (not counting FP registers) to simplify register allocation using graph coloring.
- Orthogonality suggests that all supported addressing modes apply to all instructions that transfer data.
- Simplicity: understand that less is more in ISA design
  • Provide primitives instead of solutions
  • Simplify trade-offs between alternatives
  • Don't bind constants at run time
- Counterexample: lack of compiler support for multimedia instructions
The MIPS Architecture
Expectations for a New ISA
- Use general-purpose registers with a load-store architecture.
- Support displacement (offset size 12 to 16 bits), immediate (size 8 to 16 bits), and register-indirect addressing.
- Support 8-, 16-, 32-, and 64-bit integers and 64-bit IEEE 754 floating-point numbers.
- Support the following simple instructions: load, store, add, subtract, move register-register, and, shift, compare equal, compare not equal, branch (with a PC-relative address at least 8 bits long), jump, call, and return.
- Use fixed instruction encoding if interested in performance, and variable instruction encoding if interested in code size.
- Provide at least 16 general-purpose registers (GPRs) plus separate floating-point registers, be sure all addressing modes apply to all data transfer instructions, and aim for a minimalist instruction set.
MIPS
- Simple load-store ISA
- Enables efficient pipeline implementation
- Fixed instruction set encoding
- Efficiency as a compiler target
- The MIPS64 variant is discussed here
Registers for MIPS
- 32 64-bit integer GPRs: R0, R1, ..., R31
  • R0 = 0 always
- 32 FPRs (F0, F1, ..., F31), used for single or double precision
  • Each FPR is 64 bits, so it holds either one double-precision value or one single-precision value (with the other half unused)
- Extra status registers, moved via GPRs
- Instructions for moving between an FPR and a GPR
Data Types for MIPS
- 8-bit bytes, 16-bit half words, 32-bit words, and 64-bit double words for integer data
- 32-bit single precision and 64-bit double precision for FP
- MIPS64 operations work on 64-bit integers and 32- or 64-bit floating point
  • Bytes, half words, and words are loaded into the GPRs with zeros or the sign bit replicated to fill the 64 bits of the GPRs
- All references between memory and either GPRs or FPRs are through loads or stores
Addressing Modes for MIPS
- Data addressing: immediate and displacement (16 bits)
  • Displacement: LD R4, 100(R1) (Regs[R4] ← Mem[100 + Regs[R1]])
  • Register indirect: place 0 in the displacement field
    – LD R4, 0(R1) (Regs[R4] ← Mem[Regs[R1]])
  • Absolute addressing (16 bits): use R0 as the base register
    – LD R4, 1001(R0) (Regs[R4] ← Mem[1001])
- Byte addressable with 64-bit addresses
  • Mode selection for big endian or little endian
MIPS Instruction Format
- The addressing mode is encoded into the opcode.
- All instructions are 32 bits with a 6-bit primary opcode.
MIPS Instruction Format (Cont.)
I-Type Instruction
opcode (6) | rs (5) | rt (5) | Immediate (16)
- Loads and stores
  • LW R1, 30(R2); S.S F0, 40(R4)
- ALU ops on immediates
  • DADDIU R1, R2, #3
  • rt ← rs op immediate
- Conditional branches
  • BEQZ R3, offset
  • rs is the register checked, rt is unused, and the immediate specifies the offset.
- Jump register, jump and link register
  • JR R3
  • rs is the target register; rt and immediate are unused (set to 0).
MIPS Instruction Format (Cont.)
R-Type Instruction
opcode (6) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6)
- Register-register ALU operations: rd ← rs funct rt, e.g., DADDU R1, R2, R3
  • The funct field encodes the data path operation: add, sub, ...
- Read/write special registers
- Moves
J-Type Instruction: jump, jump and link, trap, and return from exception
opcode (6) | offset added to PC (26)
MIPS instruction MIX
SPECint2000
MIPS instruction MIX (Cont.)
SPECfp2000