ppt - Computer Science and Engineering

Introduction to Computer Architecture
Course number: CS141
Who? Tarun Soni ( tsoni@cs.ucsd.edu )
TA: Wenjing Rao (wrao@cs) and Eric Liu (xeliu@cs)
Where? CENTR: 119
When? M,W @ 6-8:50pm
Textbook: Patterson and Hennessy, Computer Organization & Design: The Hardware/Software Interface, 2nd edition.
Web-page: http://www-cse.ucsd.edu/users/tsoni/cse141
(slides, homework questions, other pointers and information)
Office hours:
Tarun: Mon. 4pm-6pm: AP&M 3151
Yang Yu and Wenjing Rao: TBD, look on the webpage
CS141-L1-1
Tarun Soni, Summer ‘03
Today's Agenda
• Administrivia
• Technology trends
• Computer organization: concept of abstraction
• Instruction Set Architectures: Definition, types, examples
• Instruction formats: operands, addressing modes
• Operations: load, store, arithmetic, logical
• Control instructions: branch, jump, procedures
• Stacks
• Examples: in-line code, procedure, nested-procedures
• Other architectures

Schedule (sort of)
Lecture  Date  Topic
1        6/30  Intro., Technology, ISA
2        7/2   Performance, Cost, Arithmetic
3        7/7   Multiply, Divide?, FP numbers
4        7/9   Single cycle: Datapath, Control
5        7/14  Multiple Cycle CPU, Microprogramming
6        7/16  Mid-term quiz
7        7/21  Pipelining: intro, control, exceptions
8        7/23  Memory systems, Cache, Virtual memory
9        7/28  I/O Devices
10       7/30  Superscalars, Parallel machines
11       ??    Overview, wrapup, catchup ..
**       ??    Final, 7-10 pm, Friday

Grading
• Grade breakdown
– Mid-term (1.5 hours): 30%
– Final (3 hours): 40%
– Pop-quizzes (3, 45 min each; only the 2 highest scores count): 30%
– Class participation: extras??
• Can’t make exams: tell us early and we will work something out
• Homework does not need to be turned in. However, pop-quizzes will be based on the homework.
• What is cheating?
– Studying together in groups is encouraged
– Work must be your own
– Common examples of cheating: copying an exam question from other material or another person...
– Better off to skip the question (small fraction of grade)
• Written/email request for changes to grades
– average grade will be a B or B+; set expectations accordingly

Why?
• You may become a practitioner someday ?
• Keeper of Moore’s law
• Architecture concepts are core to other sub-systems
• Video-processors
• Security engines
• Routing/Networking etc.
• Even if you become a software geek?
• Architecture enables a way of thinking
• Understanding leads to breadth and better
implementation of software
“Computer” of the day
Jacquard loom, late 1700’s, for weaving silk
• “Program” on punch cards
• “Microcode”: each hole lifts a set of threads
• “Or gate”: thread lifted if any controlling hole punched

Trends: Moore’s law
Trends: $1000 will buy you…
Trends: Densities
Technology
Source: Intel Journal, May 2002

Other technology trends
• Processor
Physics advancement
– logic capacity: about 30% per year
Architecture advancement
– clock rate: about 20% per year
• Memory
– DRAM capacity: about 60% per year (4x every 3 years)
– Memory speed: about 10% per year
– Cost per bit: about 25% per year
• Disk
– capacity: about 60% per year
[Figure: two log-scale plots. Capacity: CPU logic capacity, DRAM capacity, Disk capacity. Speed: CPU Speed, DRAM Speed.]

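The growth rates above are compound annual rates, so the "4x every 3 years" figure for DRAM can be checked directly. The following is a minimal Python sketch of that arithmetic; the function names are illustrative and not part of the course material.

```python
import math


def growth_factor(annual_rate, years):
    """Total improvement after compounding annual_rate for the given number of years."""
    return (1.0 + annual_rate) ** years


def doubling_time(annual_rate):
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2.0) / math.log(1.0 + annual_rate)


if __name__ == "__main__":
    # DRAM capacity: about 60% per year, compounded over 3 years
    print(f"60%/yr over 3 years: {growth_factor(0.60, 3):.2f}x")
    # Clock rate: about 20% per year
    print(f"Doubling time at 20%/yr: {doubling_time(0.20):.1f} years")
```

Compounding 60% per year for three years gives 1.6^3, which is just over 4x and agrees with the slide.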
SPEC Performance
[Figure: performance (0 to 350) vs. year, 1982 to 1995; curves labeled RISC and Intel x86 (35%/yr), with the RISC introduction marked]
Performance now improves 50% per year (about 2x every 1.7 years)

Organization: A Basic Computer
Every computer has 5 basic components:
• Control
• Datapath
• Memory
• Input
• Output

Organization: A Basic Computer
• Not all “memory” is created equal
– Cache: fast (expensive) memory is placed closer to the processor
– Main memory: less expensive memory, so we can have more
[Figure: Proc, Caches, Busses, adapters, Memory, Controllers, and I/O Devices (Disks, Displays, Keyboards, Networks)]
• Input and output (I/O) devices have the messiest organization
– Wide range of speed: graphics vs. keyboard
– Wide range of requirements: speed, standard, cost ...
– Least amount of research (so far)

What is “Computer Architecture”?
Computer Architecture = Instruction Set Architecture + Machine Organization
• Instruction Set Architecture: how you talk to the machine
• Machine Organization: what the machine looks like
Computer Architecture and Engineering
• Instruction Set Design: Interfaces; Compiler/System View
• Computer Organization: Hardware Components; Logic Designer’s View

Architecture?
[Figure: levels of abstraction, top to bottom]
Application
Operating System
Compiler, Firmware
Instruction Set Architecture (Instr. Set Proc., I/O system)
Datapath & Control
Digital Design
Circuit Design
Layout
• Coordination of many levels of abstraction
• Under a rapidly changing set of forces
• Design, Measurement, and Evaluation

Levels of abstraction?
High Level Language Program:
temp = v[k];
v[k] = v[k+1];
v[k+1] = temp;
(Compiler)
Assembly Language Program:
lw $15, 0($2)
lw $16, 4($2)
sw $16, 0($2)
sw $15, 4($2)
(Assembler)
Machine Language Program:
0000 1001 1100 0110 1010 1111 0101 1000
1010 1111 0101 1000 0000 1001 1100 0110
1100 0110 1010 1111 0101 1000 0000 1001
0101 1000 0000 1001 1100 0110 1010 1111
(Machine Interpretation)
Control Signal Specification:
ALUOP[0:3] <= InstReg[9:11] & MASK

Instruction Set Architecture
ISA is the agreed-upon interface between all the software
that runs on the machine and the hardware that executes it.
software
instruction set
hardware
Example ISAs
• IBM360, VAX etc.
• Digital Alpha (v1, v3): 1992-97
• HP PA-RISC (v1.1, v2.0): 1986-96
• Sun Sparc (v8, v9): 1987-95
• SGI MIPS (MIPS I, II, III, IV, V): 1986-96
• Intel (8086, 80286, 80386, 80486, Pentium, MMX, ...): 1978-96
• ARM (ARM7, 8, StrongARM): 1995-
Digital Signal Processors also have an ISA: TMS320, Motorola, OAK etc.

ISAs
Instruction Set Architecture
“How to talk to computers if you
aren’t in Star Trek”
ISAs
• Language of the Machine
• More primitive than higher level languages
e.g., no sophisticated control flow
• Very restrictive
e.g., MIPS Arithmetic Instructions
• We’ll be working with the MIPS instruction set architecture
– similar to other architectures developed since the 1980's
– used by NEC, Nintendo, Silicon Graphics, Sony
Design goals: maximize performance and minimize cost, reduce design time

ISAs
Ideally the only part of the machine visible to the programmer/compiler
• Available instructions (Opcodes)
• Formats
• Registers, number and type
• Addressing modes, access mechanisms
• Exception conditions etc.

Instruction Set Architecture: What Must be Specified?
[Figure: execution cycle: Instruction Fetch, Instruction Decode, Operand Fetch, Execute, Result Store, Next Instruction]
° Instruction Format or Encoding
– how is it decoded?
° Location of operands and result
– where other than memory?
– how many explicit operands?
– how are memory operands located?
– which can or cannot be in memory?
° Data type and Size
° Operations
– what are supported
° Successor instruction
– jumps, conditions, branches
fetch-decode-execute is implicit!

Vocabulary
• superscalar processor -- can execute more than one instruction per cycle.
• cycle -- smallest unit of time in a processor.
• parallelism -- the ability to do more than one thing at once.
• pipelining -- overlapping parts of a large task to increase throughput without decreasing latency

ISA Decisions
• operations
– how many?
– which ones
• operands
– how many?
– location
– types
– how to specify?
• instruction format
– size
– how many formats?
Example: y = x + b, i.e. add r1, r2, r5 (add is the operation; r1 is the destination operand)
How does the computer know what 0001 0100 1101 1111 means?

Crafting an ISA
• We’ll look at some of the decisions facing an instruction set architect, and how those decisions were made in the design of the MIPS instruction set.
• MIPS, like SPARC, PowerPC, and Alpha AXP, is a RISC (Reduced Instruction Set Computer) ISA.
– fixed instruction length
– few instruction formats
– load/store architecture
• RISC architectures worked because they enabled pipelining. They continue to thrive because they enable parallelism.

Basic types of ISAs
Accumulator (1 register):
1 address      add A           acc ← acc + mem[A]
1+x address    addx A          acc ← acc + mem[A + x]
Stack:
0 address      add             tos ← tos + next
General Purpose Register:
2 address      add A B         EA(A) ← EA(A) + EA(B)
3 address      add A B C       EA(A) ← EA(B) + EA(C)
Load/Store:
3 address      add Ra Rb Rc    Ra ← Rb + Rc
               load Ra Rb      Ra ← mem[Rb]
               store Ra Rb     mem[Rb] ← Ra
Comparison:
Bytes per instruction? Number of Instructions? Cycles per instruction?

Instruction Count
C = A+B
Accumulator (1 register):
Load A
Add B
Store C
Stack:
Push A
Push B
Add
Pop C
General Purpose Register (Register-Memory):
Load R1,A
Add R1,B
Store C,R1
Load/Store:
Load R1,A
Load R2,B
Add R3,R1,R2
Store C,R3

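To make the instruction-count comparison concrete, here is a small Python sketch that interprets the accumulator, stack, and load/store sequences for C = A+B and returns how many instructions each one executes. The tuple encoding of instructions and the function names are assumptions made for illustration; they do not model any real machine.

```python
def run_accumulator(program, mem):
    """One implicit register (acc); every instruction names one memory address."""
    acc = 0
    for op, addr in program:
        if op == "load":
            acc = mem[addr]
        elif op == "add":
            acc += mem[addr]
        elif op == "store":
            mem[addr] = acc
    return len(program)


def run_stack(program, mem):
    """Operands live on a stack; add takes no addresses at all."""
    stack = []
    for op, addr in program:
        if op == "push":
            stack.append(mem[addr])
        elif op == "add":
            stack.append(stack.pop() + stack.pop())
        elif op == "pop":
            mem[addr] = stack.pop()
    return len(program)


def run_load_store(program, mem):
    """Arithmetic touches registers only; memory is reached by load and store."""
    regs = {}
    for op, *args in program:
        if op == "load":
            regs[args[0]] = mem[args[1]]
        elif op == "add":
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "store":
            mem[args[0]] = regs[args[1]]
    return len(program)


ACC_PROGRAM = [("load", "A"), ("add", "B"), ("store", "C")]
STACK_PROGRAM = [("push", "A"), ("push", "B"), ("add", None), ("pop", "C")]
LOAD_STORE_PROGRAM = [
    ("load", "R1", "A"),
    ("load", "R2", "B"),
    ("add", "R3", "R1", "R2"),
    ("store", "C", "R3"),
]
```

The load/store version needs the most instructions, but each one is simple and regular, which is the trade-off the later slides return to.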
Instruction Length
[Figure: instruction layouts for Variable, Fixed, and Hybrid encodings]
MIPS Instructions
• All arithmetic instructions have 3 operands
• Operand order is fixed (destination first)
C code: A = B + C
MIPS code: add $s0, $s1, $s2 (registers are associated with variables by the compiler)

Instruction Length
• Variable-length instructions (Intel 80x86, VAX) require multi-step fetch and decode, but allow for a much more flexible and compact instruction set.
• Fixed-length instructions allow easy fetch and decode, and simplify pipelining and parallelism.
All MIPS instructions are 32 bits long.
– this decision impacts every other ISA decision we make because it makes instruction bits scarce.
• If code size is most important, use variable length instructions
• If performance is most important, use fixed length
• Recent embedded machines (ARM, MIPS) added an optional mode to execute a subset of 16-bit wide instructions (Thumb, MIPS16); choose performance or density per procedure

MIPS Instruction Format
R-type: OP (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | sa (5 bits) | funct (6 bits)
I-type: OP (6 bits) | rs (5 bits) | rt (5 bits) | immediate (16 bits)
J-type: OP (6 bits) | target (26 bits)
• the opcode tells the machine which format
• so add r1, r2, r3 has
– opcode=0, funct=32, rs=2, rt=3, rd=1, sa=0
– 000000 00010 00011 00001 00000 100000

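The field layout for the register format can be checked by packing the fields with shifts and ORs. This is a minimal Python sketch using the field widths from the slide (6, 5, 5, 5, 5, 6 bits); the helper names are illustrative only.

```python
def encode_r_type(opcode, rs, rt, rd, sa, funct):
    """Pack the six R-format fields into one 32-bit instruction word."""
    return (
        ((opcode & 0x3F) << 26)
        | ((rs & 0x1F) << 21)
        | ((rt & 0x1F) << 16)
        | ((rd & 0x1F) << 11)
        | ((sa & 0x1F) << 6)
        | (funct & 0x3F)
    )


def decode_r_type(word):
    """Split a 32-bit instruction word back into its R-format fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "sa": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


# add r1, r2, r3: opcode=0, rs=2, rt=3, rd=1, sa=0, funct=32
ADD_R1_R2_R3 = encode_r_type(opcode=0, rs=2, rt=3, rd=1, sa=0, funct=32)

if __name__ == "__main__":
    print(format(ADD_R1_R2_R3, "032b"))
```

Printing the word in binary reproduces the bit pattern shown on the slide, with the field boundaries removed.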
Operands
• operands are generally in one of two places:
– registers (32 int, 32 fp)
– memory (2^32 locations)
• registers are
– easy to specify
– close to the processor (fast access)
• the idea that we want to access registers whenever possible led to load-store architectures.
– normal arithmetic instructions only access registers
– only access memory with explicit loads and stores

Load Store Architectures
Load-store architectures
can do:
add r1=r2+r3
and
load r3, M(address)
can’t do:
add r1 = r2 + M(address)
⇒ forces heavy dependence on registers, which is exactly what you want in today’s CPUs
- more instructions
+ fast implementation (e.g., easy pipelining)
Expect new instruction set architectures to use general purpose registers
Pipelining => expect them to use the load-store variant of the GPR ISA

General Purpose Registers
° Advantages of registers
• registers are faster than memory
• registers are easier for a compiler to use
- e.g., (A*B) – (C*D) – (E*F) can do the multiplies in any order vs. stack
• registers can hold variables
- memory traffic is reduced, so program is sped up
- code density improves (since register named with fewer bits than memory location)
MIPS Registers
• Programmable storage
– 2^32 x bytes of memory
– 31 x 32-bit GPRs (R0 = 0)
– 32 x 32-bit FP regs (paired DP)
– HI, LO, PC
[Figure: register file r0 (always 0) through r31, plus PC, lo, hi]

Memory Organization
• Viewed as a large, single-dimension array, with an address.
• A memory address is an index into the array
• "Byte addressing" means that the index points to a byte of memory.
Address  Contents
0        8 bits of data
1        8 bits of data
2        8 bits of data
3        8 bits of data
4        8 bits of data
5        8 bits of data
6        8 bits of data
...

Memory Organization
• Bytes are nice, but most data items use larger "words"
• For MIPS, a word is 32 bits or 4 bytes.
Address  Contents
0        32 bits of data
4        32 bits of data
8        32 bits of data
12       32 bits of data
...
Registers hold 32 bits of data
• 2^32 bytes with byte addresses from 0 to 2^32-1
• 2^30 words with byte addresses 0, 4, 8, ... 2^32-4
• Words are aligned
i.e., what are the least 2 significant bits of a word address?

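The question about the low two bits of a word address can be explored with a short Python sketch. It assumes the 4-byte word and 2^32-byte address space stated above; the names `is_aligned` and `word_index` are illustrative.

```python
WORD_BYTES = 4                 # a MIPS word is 32 bits = 4 bytes
ADDRESS_SPACE_BYTES = 2 ** 32  # byte addresses 0 .. 2^32 - 1
NUM_WORDS = ADDRESS_SPACE_BYTES // WORD_BYTES


def is_aligned(address, size):
    """An object is aligned when its address is a multiple of its size."""
    return address % size == 0


def word_index(address):
    """Map an aligned byte address to the index of the word that holds it."""
    if not is_aligned(address, WORD_BYTES):
        raise ValueError(f"address {address} is not word aligned")
    return address >> 2  # drop the two low bits, which are always 00


if __name__ == "__main__":
    for addr in (0, 4, 8, 12):
        # Every aligned word address ends in binary 00
        print(addr, format(addr, "04b"), is_aligned(addr, WORD_BYTES))
```

Because every aligned word address is a multiple of 4, its two least significant bits are always zero.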
Data Types
Bit: 0, 1
Bit String: sequence of bits of a particular length
4 bits is a nibble
8 bits is a byte
16 bits is a half-word
32 bits is a word
64 bits is a double-word
Character: ASCII 7 bit code
Decimal: digits 0-9 encoded as 0000b thru 1001b; two decimal digits packed per 8 bit byte
Integers: 2's Complement
Floating Point: Single Precision, Double Precision, Extended Precision
M x R^E (M = mantissa, R = base, E = exponent)
How many +/- #'s?
Where is decimal pt?
How are +/- exponents represented?

Operand Usage
Frequency of reference by size:
Size        Int Avg.  FP Avg.
Doubleword  0%        69%
Word        74%       31%
Halfword    19%       0%
Byte        7%        0%
Support data sizes and types:
8-bit, 16-bit, 32-bit integers and 32-bit and 64-bit IEEE 754 floating point numbers

Addressing: Endian-ness and alignment
• Big Endian: address of most significant byte = word address (xx00 = Big End of word)
– IBM 360/370, Motorola 68k, MIPS, Sparc, HP PA
• Little Endian: address of least significant byte = word address (xx00 = Little End of word)
– Intel 80x86, DEC Vax, DEC Alpha (Windows NT)
Byte numbering within a word, from msb to lsb:
little endian: byte 3, 2, 1, 0 (byte 0 is at the lsb)
big endian: byte 0, 1, 2, 3 (byte 0 is at the msb)
Alignment: require that objects fall on an address that is a multiple of their size.
[Figure: Aligned and Not Aligned objects at byte offsets 0 to 3]

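Byte order is easy to see by laying out one 32-bit value both ways. The sketch below uses Python's standard `struct` module; the sample value 0x0A0B0C0D and the helper name are arbitrary choices for illustration.

```python
import struct

SAMPLE_WORD = 0x0A0B0C0D  # msb is 0x0A, lsb is 0x0D


def bytes_in_memory(word, big_endian):
    """Return the four bytes of a 32-bit word in increasing address order."""
    fmt = ">I" if big_endian else "<I"  # '>' big endian, '<' little endian
    return list(struct.pack(fmt, word))


if __name__ == "__main__":
    print("big endian   :", [hex(b) for b in bytes_in_memory(SAMPLE_WORD, True)])
    print("little endian:", [hex(b) for b in bytes_in_memory(SAMPLE_WORD, False)])
```

In the big-endian layout the byte at the word address is the most significant one; in the little-endian layout it is the least significant one.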
Addressing Modes
how do we specify the operand we want?
– Register direct      R3
– Immediate (literal)  #25
– Direct (absolute)    M[10000]
– Register indirect    M[R3]
– Base+Displacement    M[R3 + 10000]
  (if register is the program counter, this is PC-relative)
– Base+Index           M[R3 + R4]
– Scaled Index         M[R3 + R4*d + 10000]
– Autoincrement        M[R3++]
– Autodecrement        M[R3--]
– Memory Indirect      M[ M[R3] ]

Addressing Modes
Addressing mode     Example             Meaning
Register            Add R4,R3           R4 ← R4+R3
Immediate           Add R4,#3           R4 ← R4+3
Displacement        Add R4,100(R1)      R4 ← R4+Mem[100+R1]
Register indirect   Add R4,(R1)         R4 ← R4+Mem[R1]
Indexed / Base      Add R3,(R1+R2)      R3 ← R3+Mem[R1+R2]
Direct or absolute  Add R1,(1001)       R1 ← R1+Mem[1001]
Memory indirect     Add R1,@(R3)        R1 ← R1+Mem[Mem[R3]]
Auto-increment      Add R1,(R2)+        R1 ← R1+Mem[R2]; R2 ← R2+d
Auto-decrement      Add R1,–(R2)        R2 ← R2–d; R1 ← R1+Mem[R2]
Scaled              Add R1,100(R2)[R3]  R1 ← R1+Mem[100+R2+R3*d]

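The memory-referencing modes in the table differ only in how the effective address is formed. The following Python sketch computes that address for several of them; the mode names, the dictionary-based register file, and the parameter names are assumptions made for illustration.

```python
def effective_address(mode, regs, mem, reg=None, index=None, disp=0, scale=1):
    """Compute the memory address an operand refers to under a given mode.

    regs maps register names to values; mem maps addresses to stored words.
    """
    if mode == "register_indirect":   # Mem[R1]
        return regs[reg]
    if mode == "displacement":        # Mem[100 + R1]
        return disp + regs[reg]
    if mode == "indexed":             # Mem[R1 + R2]
        return regs[reg] + regs[index]
    if mode == "direct":              # Mem[1001]
        return disp
    if mode == "memory_indirect":     # Mem[Mem[R3]]: the address is itself loaded from memory
        return mem[regs[reg]]
    if mode == "scaled":              # Mem[100 + R2 + R3*d]
        return disp + regs[reg] + regs[index] * scale
    raise ValueError(f"unknown addressing mode: {mode}")
```

For example, with R2 = 8, R3 = 3, and d = 4, the scaled operand 100(R2)[R3] refers to address 100 + 8 + 12 = 120.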
Addressing Modes: Usage
3 programs measured on machine with all address modes (VAX)
--- Displacement: 42% avg, 32% to 55%
--- Immediate: 33% avg, 17% to 43%
--- Register deferred (indirect): 13% avg, 3% to 24%
--- Scaled: 7% avg, 0% to 16%
--- Memory indirect: 3% avg, 1% to 6%
--- Misc: 2% avg, 0% to 3%
75% displacement & immediate
88% displacement, immediate & register indirect
similar measurements:
- 16 bits is enough for the immediate 75 to 80% of the time
- 16 bits is enough for the displacement 99% of the time.

Addressing mode usage: Application Specific
Program  Base + Displacement  Immediate  Scaled Index  Memory Indirect  All Others
TEX      56%                  43%        0%            1%               0%
Spice    58%                  17%        16%           6%               3%
GCC      51%                  39%        6%            1%               3%

MIPS Addressing Modes
register direct: add $1, $2, $3
  OP | rs | rt | rd | sa | funct
immediate: addi $1, $2, 35
  OP | rs | rt | immediate
base + displacement: lw $1, disp($2)
  OP | rs | rt | immediate
  register indirect ⇒ disp = 0
  absolute ⇒ (rs) = 0

MIPS ISA-so far
• fixed 32-bit instructions
• 3 instruction formats
• 3-operand, load-store architecture
• 32 general-purpose registers (integer, floating point)
– R0 always equals 0.
• 2 special-purpose integer registers, HI and LO, because multiply and divide produce more than 32 bits.
• registers are 32-bits wide (word)
• register, immediate, and base+displacement addressing modes
But what about the actual instructions themselves ??

Typical Operations (little change since 1960)
Data Movement: Load (from memory); Store (to memory); memory-to-memory move; register-to-register move; input (from I/O device); output (to I/O device); push, pop (to/from stack)
Arithmetic: integer (binary + decimal) or FP; Add, Subtract, Multiply, Divide
Shift: shift left/right, rotate left/right
Logical: not, and, or, set, clear
Control (Jump/Branch): unconditional, conditional
Subroutine Linkage: call, return
Interrupt: trap, return
Synchronization: test & set (atomic r-m-w)
String: search, translate
Graphics (MMX): parallel subword ops (4 16bit add)

80x86 Instruction usage
Rank  Instruction             Integer average (percent of total executed)
1     load                    22%
2     conditional branch      20%
3     compare                 16%
4     store                   12%
5     add                     8%
6     and                     6%
7     sub                     5%
8     move register-register  4%
9     call                    1%
10    return                  1%
      Total                   96%
° Simple instructions dominate instruction frequency

Instruction usage
• Support the simple instructions, since they will dominate the number of instructions executed:
load, store, add, subtract, move register-register, and, shift, compare equal, compare not equal, branch, jump, call, return
Compiler Issues
• orthogonality: no special registers, few special cases, all operand modes available with any data type or instruction type
• completeness: support for a wide range of operations and target applications
• regularity: no overloading for the meanings of instruction fields
• streamlined: resource needs easily determined
Register Assignment is critical too
• Easier if lots of registers

MIPS Instructions
• arithmetic
– add, subtract, multiply, divide
• logical
– and, or, shift left, shift right
• data transfer
– load word, store word
• conditional Branch
• unconditional Jump

MIPS Instructions
• arithmetic
– add, subtract, multiply, divide
Instruction        Example          Meaning                         Comments
add                add $1,$2,$3     $1 = $2 + $3                    3 operands; exception possible
subtract           sub $1,$2,$3     $1 = $2 – $3                    3 operands; exception possible
add immediate      addi $1,$2,100   $1 = $2 + 100                   + constant; exception possible
add unsigned       addu $1,$2,$3    $1 = $2 + $3                    3 operands; no exceptions
subtract unsigned  subu $1,$2,$3    $1 = $2 – $3                    3 operands; no exceptions
add imm. unsign.   addiu $1,$2,100  $1 = $2 + 100                   + constant; no exceptions
multiply           mult $2,$3       Hi, Lo = $2 x $3                64-bit signed product
multiply unsigned  multu $2,$3      Hi, Lo = $2 x $3                64-bit unsigned product
divide             div $2,$3        Lo = $2 ÷ $3, Hi = $2 mod $3    Lo = quotient, Hi = remainder
divide unsigned    divu $2,$3       Lo = $2 ÷ $3, Hi = $2 mod $3    Unsigned quotient & remainder
move from Hi       mfhi $1          $1 = Hi                         Used to get copy of Hi
move from Lo       mflo $1          $1 = Lo                         Used to get copy of Lo

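The reason multiply and divide write Hi and Lo instead of a general register is that their results do not fit in 32 bits. The Python sketch below models that behavior on 32-bit register values; it is a simplified illustration, and the function names simply mirror the mnemonics.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(value):
    """Interpret a 32-bit pattern as a two's complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def mult(rs, rt):
    """Signed multiply: returns (hi, lo), the two halves of the 64-bit product."""
    product = (to_signed32(rs) * to_signed32(rt)) & MASK64
    return (product >> 32) & MASK32, product & MASK32


def multu(rs, rt):
    """Unsigned multiply: returns (hi, lo), the two halves of the 64-bit product."""
    product = (rs & MASK32) * (rt & MASK32)
    return (product >> 32) & MASK32, product & MASK32


def divu(rs, rt):
    """Unsigned divide: returns (hi, lo) = (remainder, quotient).

    Division by zero raises ZeroDivisionError in this sketch.
    """
    rs &= MASK32
    rt &= MASK32
    return rs % rt, rs // rt
```

The same bit pattern 0xFFFFFFFF is treated as -1 by `mult` and as 4294967295 by `multu`, which is why the two instructions produce different Hi values for it.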
MIPS Instructions
• logical
– and, or, shift left, shift right
Instruction          Example         Meaning          Comment
and                  and $1,$2,$3    $1 = $2 & $3     3 reg. operands; Logical AND
or                   or $1,$2,$3     $1 = $2 | $3     3 reg. operands; Logical OR
xor                  xor $1,$2,$3    $1 = $2 ⊕ $3     3 reg. operands; Logical XOR
nor                  nor $1,$2,$3    $1 = ~($2 | $3)  3 reg. operands; Logical NOR
and immediate        andi $1,$2,10   $1 = $2 & 10     Logical AND reg, constant
or immediate         ori $1,$2,10    $1 = $2 | 10     Logical OR reg, constant
xor immediate        xori $1,$2,10   $1 = $2 ⊕ 10     Logical XOR reg, constant
shift left logical   sll $1,$2,10    $1 = $2 << 10    Shift left by constant
shift right logical  srl $1,$2,10    $1 = $2 >> 10    Shift right by constant
shift right arithm.  sra $1,$2,10    $1 = $2 >> 10    Shift right (sign extend)
shift left logical   sllv $1,$2,$3   $1 = $2 << $3    Shift left by variable
shift right logical  srlv $1,$2,$3   $1 = $2 >> $3    Shift right by variable
shift right arithm.  srav $1,$2,$3   $1 = $2 >> $3    Shift right arith. by variable

MIPS Instructions
• data transfer
– load word, store word
Instruction       Comment
SW 500(R4), R3    Store word
SH 502(R2), R3    Store half
SB 41(R3), R2     Store byte
LW R1, 30(R2)     Load word
LH R1, 40(R3)     Load halfword
LHU R1, 40(R3)    Load halfword unsigned
LB R1, 40(R3)     Load byte
LBU R1, 40(R3)    Load byte unsigned
LUI R1, 40        Load Upper Immediate (16 bits shifted left by 16)
Why need LUI?
[Figure: LUI R5 places the 16-bit immediate in the upper half of R5; the lower half is 0000 … 0000]

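One answer to "Why need LUI?" is that a 32-bit instruction has room for only a 16-bit immediate, so a full 32-bit constant has to be built in two steps. The Python sketch below models the usual pairing of lui with ori; the function names mirror the mnemonics and `load_32bit_constant` is a hypothetical helper.

```python
def lui(imm16):
    """Load upper immediate: the 16-bit constant goes in bits 31..16, low half is zero."""
    return (imm16 & 0xFFFF) << 16


def ori(reg_value, imm16):
    """OR immediate: the 16-bit constant is zero-extended before the OR."""
    return reg_value | (imm16 & 0xFFFF)


def load_32bit_constant(value):
    """Build a 32-bit constant the way a two-instruction lui/ori sequence would."""
    upper = (value >> 16) & 0xFFFF
    lower = value & 0xFFFF
    return ori(lui(upper), lower)


if __name__ == "__main__":
    print(hex(lui(40)))                          # LUI R1, 40
    print(hex(load_32bit_constant(0x12345678)))
```

This is also why the register conventions reserve a register for the assembler: it needs a scratch register when it expands large constants into such sequences.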
MIPS Control Instructions
• How do you specify the destination of a branch/jump?
• studies show that almost all conditional branches go short distances from the current program counter (loops, if-then-else).
– we can specify a relative address in far fewer bits than an absolute address
– e.g., beq $1, $2, 100 => if ($1 == $2) PC = (PC + 4) + 100 * 4
• How do we specify the condition of the branch?
° Condition Codes
Processor status bits are set as a side-effect of arithmetic instructions (possibly on Moves) or explicitly by compare or test instructions.
add r1, r2, r3
bz label
° Condition Register
cmp r1, r2, r3
bgt r1, label
° Compare and Branch
bgt r1, r2, label

Conditional Branch Distance
[Figure: percentage of conditional branches (0% to 40%) vs. bits of branch displacement (0 to 15), for Int. Avg. and FP Avg.]

Conditional Branching
• PC-relative since most branches are relatively close to the current PC address
• At least 8 bits suggested (± 128 instructions)
• Compare Equal/Not Equal most important for integer programs (86%)
Frequency of comparison types in branches:
Comparison  Int Avg.  FP Avg.
LT/GE       7%        40%
GT/LE       7%        23%
EQ/NE       86%       37%

Conditional Branching
• Compare and Branch
– BEQ rs, rt, offset     if R[rs] == R[rt] then PC-relative branch
– BNE rs, rt, offset     <>
• Compare to zero and Branch
– BLEZ rs, offset        if R[rs] <= 0 then PC-relative branch
– BGTZ rs, offset        >
– BLTZ rs, offset        <
– BGEZ rs, offset        >=
– BLTZAL rs, offset      if R[rs] < 0 then branch and link (into R31)
– BGEZAL rs, offset      >=
• Remaining set of compare and branch take two instructions
• Almost all comparisons are against zero!
MIPS Branch Instructions
• beq, bne: beq r1, r2, addr => if (r1 == r2) goto addr
• slt $1, $2, $3 => if ($2 < $3) $1 = 1; else $1 = 0
• these, combined with $0, can implement all fundamental branch conditions
– Always, never, !=, ==, >, <=, >=, <, >(unsigned), <= (unsigned), ...

Jumps
• need to be able to jump to an absolute address sometimes
• need to be able to do procedure calls and returns
• jump -- j 10000 => PC = 10000
• jump and link -- jal 10000 => $31 = PC + 4; PC = 10000
– used for procedure calls
• jump register -- jr $31 => PC = $31
– used for returns, but can be useful for lots of other things.

Jumps
MIPS Instruction Formats
R: OP (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | sa (5 bits) | funct (6 bits)
I: OP (6 bits) | rs (5 bits) | rt (5 bits) | Immediate (16 bits)
J: OP (6 bits) | target (26 bits)
MIPS Addressing Formats: Branches and Jumps
• Branch (e.g., beq) uses PC-relative addressing mode (few bits if addr typically close); it uses base+displacement mode, with the PC being the base.
• Jump uses pseudo-direct addressing mode. 26 bits of the address are in the instruction, the rest is taken from the PC.
[Figure: the instruction (6-bit opcode, 26-bit target) and the program counter combine to form the jump destination address]

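Both target calculations can be written out in a few lines. The Python sketch below follows the standard MIPS rules as I understand them: a branch offset counts words relative to PC + 4, and a jump takes its top 4 bits from PC + 4 and appends the 26-bit target shifted left by 2. The function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def branch_target(pc, offset_words):
    """PC-relative branch: the 16-bit offset counts instructions (words) from PC + 4."""
    return (pc + 4 + (offset_words << 2)) & MASK32


def jump_target(pc, target26):
    """Pseudo-direct jump: upper 4 bits from PC + 4, then the 26-bit field, then 00."""
    upper_bits = (pc + 4) & 0xF0000000
    return upper_bits | ((target26 & 0x03FFFFFF) << 2)


if __name__ == "__main__":
    # An offset field of 25 moves 100 bytes past PC + 4
    print(hex(branch_target(0x1000, 25)))
    # A target field of 2500 corresponds to byte address 10000
    print(jump_target(0x00400000, 2500))
```

Shifting by 2 in both cases works because instructions are word aligned, so the two low address bits never need to be stored in the instruction.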
MIPS Branch & Jump Instructions
Instruction          Example          Meaning                         Comments
branch on equal      beq $1,$2,100    if ($1 == $2) go to PC+4+100    Equal test; PC relative branch
branch on not eq.    bne $1,$2,100    if ($1 != $2) go to PC+4+100    Not equal test; PC relative
set on less than     slt $1,$2,$3     if ($2 < $3) $1=1; else $1=0    Compare less than; 2’s comp.
set less than imm.   slti $1,$2,100   if ($2 < 100) $1=1; else $1=0   Compare < constant; 2’s comp.
set less than uns.   sltu $1,$2,$3    if ($2 < $3) $1=1; else $1=0    Compare less than; natural numbers
set l. t. imm. uns.  sltiu $1,$2,100  if ($2 < 100) $1=1; else $1=0   Compare < constant; natural numbers
jump                 j 10000          go to 10000                     Jump to target address
jump register        jr $31           go to $31                       For switch, procedure return
jump and link        jal 10000        $31 = PC + 4; go to 10000       For procedure call

Stacks
Stacking of Subroutine Calls & Returns and Environments:
A: CALL B
B: CALL C
C: RET
RET
[Figure: stack contents over time: A; A B; A B C; A B; A]
Some machines provide a memory stack as part of the architecture (e.g., VAX)
Sometimes stacks are implemented via software convention (e.g., MIPS)

Stacks
Useful for stacked environments/subroutine call & return even if operand stack not part of architecture
Stacks that Grow Up vs. Stacks that Grow Down:
[Figure: a stack holding a, b, c in memory between address 0 (Little) and inf. (Big); SP points at either the Last Full or the Next Empty slot; the stack either grows up or grows down]
Little --> Big / Last Full:
POP: Read from Mem(SP); Decrement SP
PUSH: Increment SP; Write to Mem(SP)
Little --> Big / Next Empty:
POP: Decrement SP; Read from Mem(SP)
PUSH: Write to Mem(SP); Increment SP

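The two stack-pointer conventions differ only in the order of the pointer update and the memory access. Here is a minimal Python sketch of both for a stack growing from little to big addresses; the class names are illustrative, a list stands in for memory, and there is no overflow or underflow checking.

```python
class LastFullStack:
    """SP points at the last item written (Little --> Big / Last Full)."""

    def __init__(self, size=16):
        self.mem = [None] * size
        self.sp = -1  # nothing written yet

    def push(self, value):
        self.sp += 1              # Increment SP
        self.mem[self.sp] = value  # Write to Mem(SP)

    def pop(self):
        value = self.mem[self.sp]  # Read from Mem(SP)
        self.sp -= 1               # Decrement SP
        return value


class NextEmptyStack:
    """SP points at the next free slot (Little --> Big / Next Empty)."""

    def __init__(self, size=16):
        self.mem = [None] * size
        self.sp = 0  # first free slot

    def push(self, value):
        self.mem[self.sp] = value  # Write to Mem(SP)
        self.sp += 1               # Increment SP

    def pop(self):
        self.sp -= 1               # Decrement SP
        return self.mem[self.sp]   # Read from Mem(SP)
```

Both classes return items in the same last-in, first-out order; only the value held in SP between operations differs.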
Stack Frames
[Figure: stack frame layout, from High Mem to Low Mem]
High Mem
ARGS
Callee Save Registers (old FP, RA)
Local Variables
Low Mem
FP: reference args and local variables at fixed (positive) offset from FP
SP: grows and shrinks during expression evaluation
• Many variations on stacks possible (up/down, last pushed / next )
• Block structured languages contain link to lexically enclosing frame
• Compilers normally keep scalar variables in registers, not memory!

MIPS Software Register Conventions
Number  Name   Use
0       zero   constant 0
1       at     reserved for assembler
2-3     v0-v1  expression evaluation & function results
4-7     a0-a3  arguments
8-15    t0-t7  temporary: caller saves (callee can clobber)
16-23   s0-s7  callee saves (caller can clobber)
24-25   t8-t9  temporary (cont’d)
26-27   k0-k1  reserved for OS kernel
28      gp     Pointer to global area
29      sp     Stack pointer
30      fp     frame pointer
31      ra     Return Address (HW)

MIPS Branch & Jump Instructions
MIPS operands
Name: 32 registers
Example: $s0-$s7, $t0-$t9, $zero, $a0-$a3, $v0-$v1, $gp, $fp, $sp, $ra, $at
Comments: Fast locations for data. In MIPS, data must be in registers to perform arithmetic. MIPS register $zero always equals 0. Register $at is reserved for the assembler to handle large constants.
Name: 2^30 memory words
Example: Memory[0], Memory[4], ..., Memory[4294967292]
Comments: Accessed only by data transfer instructions. MIPS uses byte addresses, so sequential words differ by 4. Memory holds data structures, such as arrays, and spilled registers, such as those saved on procedure calls.
MIPS assembly language
Category            Instruction             Example               Meaning                                   Comments
Arithmetic          add                     add $s1, $s2, $s3     $s1 = $s2 + $s3                           Three operands; data in registers
Arithmetic          subtract                sub $s1, $s2, $s3     $s1 = $s2 - $s3                           Three operands; data in registers
Arithmetic          add immediate           addi $s1, $s2, 100    $s1 = $s2 + 100                           Used to add constants
Data transfer       load word               lw $s1, 100($s2)      $s1 = Memory[$s2 + 100]                   Word from memory to register
Data transfer       store word              sw $s1, 100($s2)      Memory[$s2 + 100] = $s1                   Word from register to memory
Data transfer       load byte               lb $s1, 100($s2)      $s1 = Memory[$s2 + 100]                   Byte from memory to register
Data transfer       store byte              sb $s1, 100($s2)      Memory[$s2 + 100] = $s1                   Byte from register to memory
Data transfer       load upper immediate    lui $s1, 100          $s1 = 100 * 2^16                          Loads constant in upper 16 bits
Conditional branch  branch on equal         beq $s1, $s2, 25      if ($s1 == $s2) go to PC + 4 + 100        Equal test; PC-relative branch
Conditional branch  branch on not equal     bne $s1, $s2, 25      if ($s1 != $s2) go to PC + 4 + 100        Not equal test; PC-relative
Conditional branch  set on less than        slt $s1, $s2, $s3     if ($s2 < $s3) $s1 = 1; else $s1 = 0      Compare less than; for beq, bne
Conditional branch  set less than immediate slti $s1, $s2, 100    if ($s2 < 100) $s1 = 1; else $s1 = 0      Compare less than constant
Unconditional jump  jump                    j 2500                go to 10000                               Jump to target address
Unconditional jump  jump register           jr $ra                go to $ra                                 For switch, procedure return
Unconditional jump  jump and link           jal 2500              $ra = PC + 4; go to 10000                 For procedure call

Example: Swap()
•
swap(int v[], int k)
{ int temp;
  temp = v[k];
  v[k] = v[k+1];
  v[k+1] = temp;
}
Can we figure out the code?
swap:                  // $4=v, $5=k
  muli $2, $5, 4       // $2 = k*4
  add $2, $4, $2       // $2 = v+(4*k)
  lw $15, 0($2)        // $15=temp= *($2+0)=*(v+k)
  lw $16, 4($2)        // $16 = *($2+4) = *(v+k+1)
  sw $16, 0($2)        // *(v+k) = $16 = *(v+k+1)
  sw $15, 4($2)        // *(v+k+1) = $15 = temp
  jr $31               // return;

Example: Leaf_procedure()
• Procedures?
int PairDiff(int a, int b, int c, int d)
{ int temp;
  temp = (a+b)-(c+d);
  return temp;
}
Assume caller puts $a0-$a3 = a,b,c,d and wants result in $v0
PairDiff:
  sub $sp,$sp,12       // Make space for 3 temp locations
  sw $t1, 8($sp)       // save $t1 (optional if MIPS convention)
  sw $t0, 4($sp)       // save $t0 (optional if MIPS convention)
  sw $s0, 0($sp)       // save $s0
  add $t0,$a0,$a1      // (t0=a+b)
  add $t1,$a2,$a3      // (t1=c+d)
  sub $s0,$t0,$t1      // (s0=t0-t1)
  add $v0,$s0,$zero    // store return value in $v0
  lw $s0,0($sp)        // restore registers
  lw $t0,4($sp)        // (optional if MIPS convention)
  lw $t1,8($sp)        // (optional if MIPS convention)
  add $sp,$sp,12       // ‘pop’ the stack
  jr $ra               // The actual return to calling routine

Example: Nested_procedure()
• What about nested procedures? $ra ??
• Recursive procedures?
int fact(int n)
{
  if (n<1) return(1);
  else return (n*fact(n-1));
}
Assume $a0 = n
fact:
  sub $sp,$sp,8        // Make space for 2 temp locations
  sw $ra, 4($sp)       // save return address
  sw $a0, 0($sp)       // save argument n
  slti $t0,$a0,1       // test for n<1
  beq $t0,$zero, L1    // if (n>=1) goto L1
  add $v0,$zero,1      // (n<1) case: $v0=1
  add $sp,$sp,8        // ‘pop’ the stack
  jr $ra               // return
L1: sub $a0,$a0,1      // (n>=1) case: n--;
  jal fact             // call fact again.
  lw $a0,0($sp)        // fact() returns here. Restore n
  lw $ra,4($sp)        // restore return address
  add $sp,$sp,8        // ‘pop’ stack
  mul $v0,$a0,$v0      // $v0 = n*fact(n-1)
  jr $ra               // return to caller

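The role of the stack in the recursive example can be seen by replacing the recursion with an explicit stack of saved arguments. This Python sketch is an illustration of that idea rather than a translation of the assembly; return addresses are left implicit and the function name is hypothetical.

```python
def fact_with_explicit_stack(n):
    """Compute n! by saving each argument on a stack, as the nested calls would."""
    stack = []

    # "Call" phase: each nested call saves its own n before calling fact(n - 1)
    while n >= 1:
        stack.append(n)  # plays the role of saving $a0 in the frame
        n -= 1

    # Base case: n < 1 returns 1
    result = 1

    # "Return" phase: each frame restores its n and multiplies the returned value
    while stack:
        result = stack.pop() * result  # $v0 = n * fact(n - 1)

    return result


if __name__ == "__main__":
    print(fact_with_explicit_stack(5))
```

The stack depth equals the number of nested calls, which is why the assembly version must save both the argument and the return address in every frame.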
Other Architectures
• Design alternative:
– provide more powerful operations (e.g., DSP, Encryption engines, Java Processors)
– goal is to reduce number of instructions executed
– danger is a slower cycle time and/or a higher CPI
• Sometimes referred to as “RISC vs. CISC”
– virtually all new instruction sets since 1982 have been RISC
– VAX: minimize code size, make assembly language easy; instructions from 1 to 54 bytes long!
• We’ll look at PowerPC and 80x86

Power PC
• Indexed addressing
– example: lw $t1,$a0+$s3 // $t1=Memory[$a0+$s3]
– What do we have to do in MIPS?
• Update addressing
– update a register as part of load (for marching through arrays)
– example: lwu $t0,4($s3) // $t0=Memory[$s3+4];$s3=$s3+4
– What do we have to do in MIPS?
• Others:
– load multiple/store multiple
– a special counter register “bc Loop”: decrement counter, if not 0 goto loop

x86: Volume is beautiful
• 1978: The Intel 8086 is announced (16 bit architecture)
• 1980: The 8087 floating point coprocessor is added
• 1982: The 80286 increases address space to 24 bits, +instructions
• 1985: The 80386 extends to 32 bits, new addressing modes
• 1989-1995: The 80486, Pentium, Pentium Pro add a few instructions (mostly designed for higher performance)
• 1997: MMX is added
“This history illustrates the impact of the “golden handcuffs” of compatibility”
“adding new features as someone might add clothing to a packed bag”
“an architecture that is difficult to explain and impossible to love”
“what the 80x86 lacks in style is made up in quantity, making it beautiful from the right perspective”

x86: Complex Instruction Set
• See text for a detailed description….
• Complexity:
– Instructions from 1 to 17 bytes long
– one operand must act as both a source and destination
– one operand can come from memory
– complex addressing modes
e.g., “base or scaled index with 8 or 32 bit displacement”
• Saving grace:
– the most frequently used instructions are not too difficult to build
– compilers avoid the portions of the architecture that are slow

Comparing Instruction Set Architectures
Design-time metrics:
° Can it be implemented, in how long, at what cost?
° Can it be programmed? Ease of compilation?
Static Metrics:
° How many bytes does the program occupy in memory?
Dynamic Metrics:
° How many instructions are executed?
° How many bytes does the processor fetch to execute the program?
° How many clocks are required per instruction?
° How "lean" a clock is practical?
Best Metric: Time to execute the program!
[Figure: execution time is determined by Inst. Count, CPI, and Cycle Time]
This depends on
• instruction set,
• processor organization, and
• compilation techniques.

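The three quantities named on this slide combine in the standard way: execution time is instruction count times cycles per instruction times cycle time. The Python sketch below applies that product to two made-up machines; every number in it is hypothetical and chosen only to show the trade-off.

```python
def execution_time(instruction_count, cpi, cycle_time_seconds):
    """Time = Inst. Count x CPI x Cycle Time."""
    return instruction_count * cpi * cycle_time_seconds


# Hypothetical machine A: fewer, more powerful instructions, but higher CPI and slower clock
MACHINE_A = {"instruction_count": 800_000, "cpi": 4.0, "cycle_time_seconds": 2e-9}
# Hypothetical machine B: more, simpler instructions, but lower CPI and faster clock
MACHINE_B = {"instruction_count": 1_000_000, "cpi": 1.5, "cycle_time_seconds": 1e-9}

if __name__ == "__main__":
    for name, machine in (("A", MACHINE_A), ("B", MACHINE_B)):
        print(name, f"{execution_time(**machine) * 1e3:.2f} ms")
```

With these invented numbers, machine B finishes first despite executing more instructions, which mirrors the slide's point that a lower instruction count alone does not make a program faster.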
Instruction Set Architectures: What did we learn today?
• MIPS is a general-purpose register, load-store, fixed-instruction-length architecture.
• MIPS is optimized for fast pipelined performance, not for low instruction count
• Four principles of IS architecture
– simplicity favors regularity
– smaller is faster
– good design demands compromise
– make the common case fast
