IA-64 ISA
A Summary
JinLin Yang
Phil Varner
Shuoqi Li
Overview
• Summary of IA-64
• Register model
• Instruction format and support for explicit
parallelism
• Instruction set basics
• Predication and speculation support
• Conclusion
Summary of IA-64
• RISC-style
• Register-Register ISA
• Compiler-based ILP support (VLIW)
• Predication
• Memory-reference speculation
• Basis for Intel Itanium processor
The IA-64 Register Model
Components
• 128 64-bit general-purpose registers (actually
65 bits wide, counting the NaT bit);
• 128 82-bit floating point registers;
• 64 1-bit predicate registers;
• 8 64-bit branch registers;
• several registers used for system control,
memory mapping, performance counters, and
communication with the OS.
Register Stack Mechanism(1)
• This technique is used by integer registers to
accelerate procedure calls. (similar to register
windows in SPARC)
• Registers 0-31 are always accessible.
• Registers 32-127 are used as a register stack and
each procedure is allocated a set of registers for its
use.
Register Stack Mechanism(2)
CFM pointer
• CFM pointer points to the set of
registers to be used by a given
procedure
Register Stack Mechanism(3)
How does it work?
1) The new register stack frame is created by
register renaming, so the registers used by
a given procedure always start at R32.
2) The callee executes an alloc instruction
to allocate both the local and output
registers of its own frame.
Register Stack Mechanism(4)
3) The CFM pointer is updated, so R32 of the
called procedure points to the output
registers of the calling procedure.
(I think there is a typo in the text!)
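The renaming steps above can be sketched as a toy model. This is only an illustration: the class and method names (RegisterStack, call, alloc, ret) are hypothetical, the physical register file is treated as unbounded, and spilling to memory is not modeled.

```python
# Toy model of the register stack. All names here are hypothetical.
class RegisterStack:
    def __init__(self):
        self.phys = {}      # physical stacked registers, indexed from 0
        self.base = 0       # physical index that the current R32 maps to
        self.n_local = 0    # size of the local area of the current frame
        self.n_out = 0      # size of the output area of the current frame
        self.saved = []     # frame markers of the callers

    def _phys_index(self, r):
        if not 32 <= r < 32 + self.n_local + self.n_out:
            raise IndexError(f"r{r} is outside the current frame")
        return self.base + (r - 32)

    def read(self, r):
        return self.phys.get(self._phys_index(r), 0)

    def write(self, r, value):
        self.phys[self._phys_index(r)] = value

    def alloc(self, n_local, n_out):
        # Executed by the procedure that owns the frame.
        self.n_local, self.n_out = n_local, n_out

    def call(self):
        # The caller's output area becomes the start of the callee's frame,
        # so the callee sees the first output register as R32.
        self.saved.append((self.base, self.n_local, self.n_out))
        self.base += self.n_local
        self.n_local = 0

    def ret(self):
        self.base, self.n_local, self.n_out = self.saved.pop()


rs = RegisterStack()
rs.alloc(n_local=2, n_out=1)      # caller: R32-R33 local, R34 output
rs.write(34, 99)                  # argument goes in the caller's output register
rs.call()
rs.alloc(n_local=2, n_out=0)      # callee: R32 holds the incoming argument
arg_seen_by_callee = rs.read(32)
rs.write(33, 7)                   # callee-local value, invisible to the caller
rs.ret()
```

After the call, the callee's R32 and the caller's R34 name the same physical register, which is the point of step 3.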
Register Rotation
• Both the integer and floating point registers
support register rotation for registers 32-127
Benefits of register rotation
• Makes it easy to allocate registers in
software pipelined loops
• When combined with predication, it can
reduce the code expansion incurred by
using software pipelining.
• Makes software pipelining usable for loops with
a small number of iterations.
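A minimal sketch of why rotation helps a software-pipelined loop, assuming a rotating base that is decremented once per iteration so that a value written to R32 is visible as R33 in the next iteration. The names (RotatingRegs, pipelined_double) are hypothetical, and the two `if` tests stand in for stage predicates.

```python
# Toy model of rotating registers R32-R127. All names are hypothetical.
class RotatingRegs:
    def __init__(self, size=96, first=32):
        self.size = size
        self.first = first
        self.phys = [0] * size
        self.rrb = 0                      # rotating register base

    def _index(self, r):
        return (r - self.first + self.rrb) % self.size

    def read(self, r):
        return self.phys[self._index(r)]

    def write(self, r, value):
        self.phys[self._index(r)] = value

    def rotate(self):
        # After rotation, the register that was Rn is named R(n+1).
        self.rrb = (self.rrb - 1) % self.size


def pipelined_double(xs):
    """Two-stage pipeline: stage 1 loads x[i], stage 2 doubles x[i-1]."""
    regs = RotatingRegs()
    out = []
    n = len(xs)
    for i in range(n + 1):                # one extra iteration drains the pipe
        if i < n:                         # stage 1 (predicated on real hardware)
            regs.write(32, xs[i])
        if i >= 1:                        # stage 2 uses last iteration's load
            out.append(regs.read(33) * 2)
        regs.rotate()
    return out


doubled = pipelined_double([1, 2, 3])
```

The loop body always names R32 and R33, so no unrolling or register copying is needed to keep the two in-flight values apart.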
Instruction Format and Support
for Explicit Parallelism
IA-64 combines:
• Major benefits of VLIW-approach
• Greater Flexibility
Inherit major benefits of VLIW
• Implicit parallelism among operations in an
instruction
• Fixed formatting of the operation fields
• Relying on the compiler to detect ILP and
schedule insts into slots
Greater Flexibility
• Flexibility in formatting of instructions
• Allowing the compiler to indicate when an
inst cannot be executed in parallel with its
successors
Implicit Parallelism
• Placing instructions into instruction groups
Instruction group
• a sequence of consecutive instructions with
no register data dependency
• Instructions in the group can be executed in
parallel
• Instruction group can be arbitrarily long
• Compiler must explicitly indicate group
boundary by a “stop”
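A rough sketch of how a compiler might place stops, assuming the simple rule that an instruction starts a new group if it reads or writes a register already written in the current group. The function name and the (dest, srcs) tuple format are hypothetical, and real IA-64 dependency rules have more cases than this.

```python
def insert_stops(instrs):
    """Split a list of (dest, srcs) instructions into instruction groups.

    A stop is placed before any instruction that reads or writes a register
    written earlier in the current group (simplified rule, an assumption).
    """
    groups, current, written = [], [], set()
    for dest, srcs in instrs:
        if dest in written or any(s in written for s in srcs):
            groups.append(current)        # stop: close the current group
            current, written = [], set()
        current.append((dest, srcs))
        written.add(dest)
    if current:
        groups.append(current)
    return groups


program = [
    ("r1", ("r2", "r3")),
    ("r4", ("r5",)),        # independent of the first instruction
    ("r6", ("r1", "r4")),   # reads r1 and r4: needs a stop before it
    ("r7", ("r6",)),        # reads r6: needs another stop
]
groups = insert_stops(program)
group_sizes = [len(g) for g in groups]
```

Groups are independent of bundle boundaries: a group can end in the middle of a bundle or span several bundles.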
Fixed Formatting
• Instructions are encoded in bundles
Bundles
• 128-bit wide
• 5-bit template field -- specify exec unit type
needed by each inst in the bundle and possible
presence of stop
– I-unit, M-unit, F-unit, B-unit, L+X (used to encode 64-bit immediates and a few special instructions)
– One Execution Unit Slot can hold more than one type
of Instruction
• Three 41-bit instructions
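The arithmetic works out as 5 + 3 × 41 = 128 bits. A small sketch of packing and unpacking a bundle, assuming the template sits in the least significant 5 bits with the three slots above it in order; the function names are hypothetical and the slot values are arbitrary numbers, not real encodings.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack_bundle(template, slots):
    """Pack a 5-bit template and three 41-bit slots into one 128-bit integer."""
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    if len(slots) != 3 or any(not 0 <= s <= SLOT_MASK for s in slots):
        raise ValueError("need three slots of at most 41 bits each")
    value = template
    for i, slot in enumerate(slots):
        value |= slot << (TEMPLATE_BITS + SLOT_BITS * i)
    return value


def unpack_bundle(value):
    """Inverse of pack_bundle: return (template, (slot0, slot1, slot2))."""
    template = value & ((1 << TEMPLATE_BITS) - 1)
    slots = tuple(
        (value >> (TEMPLATE_BITS + SLOT_BITS * i)) & SLOT_MASK for i in range(3)
    )
    return template, slots


bundle = pack_bundle(0x10, (1, 2, 3))
fields = unpack_bundle(bundle)
```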
Different Code Scheduling
Algorithms
• Code scheduled to minimize the number of
bundles – more stalls between bundles due
to data dependency
• Code scheduled to minimize the number of
cycles – more empty slots
• The number of empty slots and the use of
bundles may lead to much larger code size
Instruction Set Basics
• Inst encoding
– Major opcode: the high-order 4 bits of the inst,
combined with the exec unit slot designation bits
from the bundle template
– Low-order 6 bits specify the predicate register
that guards the instruction
• The encoding strategy leads to various inst
formats for each inst type
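As an illustration of the two fields named above, the sketch below pulls the high-order 4 bits and the low-order 6 bits out of a 41-bit instruction. The function name is hypothetical and the middle 31 bits are ignored, since their meaning depends on the format.

```python
INSTR_BITS = 41


def decode_fields(instr):
    """Return (opcode_bits, qp) for a 41-bit instruction word.

    opcode_bits: the high-order 4 bits (bits 40..37)
    qp:          the low-order 6 bits, the guarding predicate register number
    """
    if not 0 <= instr < (1 << INSTR_BITS):
        raise ValueError("instruction must fit in 41 bits")
    opcode_bits = (instr >> (INSTR_BITS - 4)) & 0xF
    qp = instr & 0x3F
    return opcode_bits, qp


# Arbitrary example word: opcode bits 8, guarded by predicate register 5.
example = (0x8 << 37) | 5
decoded = decode_fields(example)
```

Six bits are exactly enough to name one of the 64 predicate registers.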
Predication and Speculation
• Nearly every instruction can be predicated
• Guarding predicate register is specified in
the lower six bits of each inst
• if-conversion and code motion have lower
overhead
• Conditional branch is just branch with
guarding predicate
Setting predicates
• Predicates set using compare or test
instructions
• compare
– 10 different tests
– two predicate register destinations
– written: result + complement or logical function
+ complement
• multiple comparisons
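A minimal sketch of if-conversion with a compare that writes a result and its complement into two predicate registers. The helper names (cmp_lt, mov, if_converted_min) are hypothetical, and p0 is modeled as always true.

```python
def cmp_lt(preds, p_true, p_false, a, b):
    """Compare writing two predicate destinations: result and complement."""
    result = a < b
    preds[p_true] = result
    preds[p_false] = not result


def mov(preds, qp, regs, dest, value):
    """Predicated move: takes effect only if the guarding predicate is true."""
    if preds[qp]:
        regs[dest] = value


def if_converted_min(a, b):
    # Branch-free form of:  if a < b: r8 = a  else: r8 = b
    preds = [True] + [False] * 63       # 64 one-bit predicates, p0 always true
    regs = {}
    cmp_lt(preds, 1, 2, a, b)           # p1 = (a < b), p2 = not (a < b)
    mov(preds, 1, regs, "r8", a)        # (p1) mov r8 = a
    mov(preds, 2, regs, "r8", b)        # (p2) mov r8 = b
    return regs["r8"]


smaller = if_converted_min(3, 5)
```

Both moves are issued, but only one takes effect, so the branch and its possible misprediction disappear.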
Speculation
• control speculation - inst speculatively moved
above a branch
• exception handling
• memory reference speculation
Deferred exception handling
• NaT - Not A Thing
– equivalent of poison bits
– make GPRs 65 bits wide
• NaTVal - FP registers - Not A Thing Value
– invalid IEEE FP value
– FP exceptions handled separately
Deferred exception handling II
• NaT is only generated by speculative loads
– other insts propagate it to their results
– nonspeculative insts cannot defer NaT
Deferred exception handling III
• If a non-speculated instruction gets a NaT: immediate and unrecoverable exception
• chk.s
– detect NaT or NaTVal
– branch to routine
• provides special instructions for storing
NaT and NaTVal registers for saving
processor state
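The deferred-exception flow can be sketched with a sentinel standing in for the NaT bit. All function names below are hypothetical stand-ins for the behavior described on these slides, and a missing memory address plays the role of a faulting load.

```python
NAT = object()          # sentinel standing in for the NaT bit


def ld_s(memory, addr):
    """Speculative load: a fault is deferred by returning NaT."""
    return memory.get(addr, NAT)


def add(a, b):
    """Ordinary arithmetic just propagates NaT to its result."""
    if a is NAT or b is NAT:
        return NAT
    return a + b


def chk_s(value, recovery):
    """Check: if the value is NaT, run the recovery routine instead."""
    return recovery() if value is NAT else value


def store(memory, addr, value):
    """Non-speculative use: a NaT operand raises immediately."""
    if value is NAT:
        raise RuntimeError("NaT consumed by non-speculative instruction")
    memory[addr] = value


memory = {0x10: 4}
good = add(ld_s(memory, 0x10), 1)        # load succeeds, result is usable
bad = add(ld_s(memory, 0x99), 1)         # load "faults", NaT propagates
recovered = chk_s(bad, lambda: -1)       # check detects NaT, runs recovery
```

The exception is only acted on if the speculated value turns out to be needed, which is when the check executes.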
Memory reference
• advanced loads - a load speculatively moved above
a store on which it may be dependent
• ld.a
• special entry in ALAT
– register destination of the load
– address of the accessed memory location
Memory reference II
• When a store is executed, the active ALAT
entries are looked up; any ALAT entry with the
same address is marked as invalid
• Any nonspeculative instruction must check the
ALAT before using the value from ld.a
– if the ALAT entry is valid, clear the ALAT entry
– if not,
• ld.c - reload from memory (only used with ld.a)
• chk.a - reload and execute "clean up" code
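A toy model of the ALAT protocol above, covering ld.a, a store that invalidates a matching entry, and ld.c. The Machine class and its method names are hypothetical, and invalidation is modeled by deleting the entry.

```python
class Machine:
    """Toy model of advanced loads checked through an ALAT."""

    def __init__(self, memory):
        self.memory = dict(memory)
        self.regs = {}
        self.alat = {}                  # destination register -> load address

    def ld_a(self, dest, addr):
        """Advanced load: load early and record (register, address)."""
        self.regs[dest] = self.memory[addr]
        self.alat[dest] = addr

    def store(self, addr, value):
        """A store invalidates every ALAT entry with the same address."""
        self.memory[addr] = value
        for reg in [r for r, a in self.alat.items() if a == addr]:
            del self.alat[reg]

    def ld_c(self, dest, addr):
        """Check load: reload only if the entry was invalidated.

        Returns True if a reload was needed.
        """
        if dest in self.alat:
            del self.alat[dest]         # entry valid: clear it, keep the value
            return False
        self.regs[dest] = self.memory[addr]
        return True


m = Machine({100: 1, 200: 0})
m.ld_a("r4", 100)
m.store(100, 2)                         # conflicting store
conflict_reloaded = m.ld_c("r4", 100)

m.ld_a("r5", 100)
m.store(200, 9)                         # unrelated store
clean_reloaded = m.ld_c("r5", 100)
```

In the common case the store does not alias the load, so the check costs almost nothing and the load latency is hidden.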
Conclusion
• Hits a lot of hot technologies
– RISC
– VLIW
– Predication
– Speculation
• Itanium will/may show viability of
approach