Itanium CSE 820

IA-64
Intel introduced a new ISA with no
backward compatibility with x86 (IA-32).
What do you get from a clean sheet?
Michigan State University
Computer Science and Engineering
IA-64
The first product line is the Itanium.
Status:
– NEC announced that a 32-processor,
Itanium 2-based server has achieved the
world's best TPC-C benchmark result
on a 32-processor SMP platform.
– 1GHz, 3MB tertiary cache, 512 GB RAM
SPEC (top in 3/03)
SPECint2000
• Pentium4 3GHz      1100
• IBM 690 1.3GHz      839
• Pentium4 2.2GHz     811
• Itanium 2 1GHz      810
SPECfp2000
• Itanium 2 1GHz     1431
• IBM 690 1.3GHz     1266
• Pentium4 3GHz      1090
Registers
• 128 @ 65-bit general-purpose registers
– 64-bit value + 1 NaT bit
• 128 @ 82-bit floating-point registers
– 2 extra exponent bits over IEEE 80-bit
• 64 @ 1-bit predicate registers
• 8 @ 64-bit branch registers
– for indirect branches
• Other registers for system control,
memory mapping, performance counters,
and communication with the OS
Integer Registers
• 0-31 general purpose
• 32-127 used as a register stack,
similar to SPARC register windows: registers
are renamed across function calls; the
current frame is tracked by a frame marker (CFM)
Also, special hardware handles stack
overflow
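A minimal sketch of the renaming idea, assuming a toy model in which a call simply advances the physical base past the caller's local registers so that the caller's output registers become the callee's r32 and up. The class and method names are hypothetical, and the overflow (spill/fill) hardware is not modeled.

```python
class RegisterStack:
    """Toy model: logical r32+ are renamed onto a physical register file."""

    def __init__(self, physical=96):
        self.phys = [0] * physical
        self.base = 0        # physical index that logical r32 maps to
        self.n_local = 0     # inputs + locals of the current frame
        self.saved = []      # caller frames, restored on return

    def alloc(self, n_local):
        # Declare how many registers of this frame are inputs/locals;
        # registers above them act as outputs passed to a callee.
        self.n_local = n_local

    def _index(self, logical):
        return self.base + (logical - 32)

    def write(self, logical, value):
        self.phys[self._index(logical)] = value

    def read(self, logical):
        return self.phys[self._index(logical)]

    def call(self):
        # The callee's r32 is the caller's first output register.
        self.saved.append((self.base, self.n_local))
        self.base += self.n_local
        self.n_local = 0

    def ret(self):
        self.base, self.n_local = self.saved.pop()


rs = RegisterStack()
rs.alloc(2)                        # caller: r32-r33 are locals, r34 is an output
rs.write(32, 111)                  # caller-private value
rs.write(34, 7)                    # argument for the callee
rs.call()
arg_seen_by_callee = rs.read(32)   # same physical register as the caller's r34
rs.write(32, 999)                  # callee writes its own r32
rs.ret()
caller_local_after = rs.read(32)   # caller's r32 was never touched
```

No values are copied on call or return in this model; only the base changes.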
Register Rotation
Rotation of registers 32-127
is used to allocate registers in
software-pipelined loops
When combined with predication, loops can be
unrolled without a separate prologue and
epilogue, which reduces the code expansion
overhead of loop unrolling
That is, the overhead cost of loop unrolling is
reduced so smaller loops can be unrolled.
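As a rough illustration, the sketch below models a rotating region with a single renaming base (called `rrb` here) that is decremented once per loop iteration, so a value written to logical r32 in one iteration is visible as r33 in the next. The class name, the region size of 8, and the single-base simplification are assumptions for illustration only.

```python
class RotatingRegisters:
    """Toy model of a rotating register region starting at logical r32."""

    def __init__(self, size=8, first=32):
        self.size = size            # number of rotating registers (assumed)
        self.first = first          # first logical register in the region
        self.rrb = 0                # renaming base, adjusted on each rotation
        self.phys = [None] * size   # physical storage

    def _index(self, logical):
        # Map a logical register number to a physical slot.
        return (logical - self.first + self.rrb) % self.size

    def write(self, logical, value):
        self.phys[self._index(logical)] = value

    def read(self, logical):
        return self.phys[self._index(logical)]

    def rotate(self):
        # Decrementing the base makes the value in logical rN
        # appear in logical rN+1 afterwards.
        self.rrb = (self.rrb - 1) % self.size


regs = RotatingRegisters()
history = []
for i in range(4):
    regs.write(32, i * 10)                            # this iteration's value
    history.append((regs.read(32), regs.read(33)))    # r33: previous iteration's value
    regs.rotate()
```

Each iteration uses the same register names, yet values from successive iterations live in different physical registers, which is what lets one loop body stand in for several overlapped stages.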
Explicit Parallelism
A central goal of IA-64 is to
let the compiler do more and
communicate more information
to the hardware.
In particular, the compiler can indicate
when an instruction cannot be executed
in parallel with its successors.
Group
A sequence of consecutive instructions with no
register data dependences among them.
All instructions in a group can be executed in
parallel if sufficient hardware exists and if
memory dependences are preserved.
A group can be arbitrarily long, but the compiler
must explicitly indicate the boundary between
groups with a stop.
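The grouping rule can be sketched as a simple pass over the instruction stream: a new group starts whenever an instruction reads or writes a register already written in the current group. The tuple representation and function name are hypothetical, and memory dependences are ignored in this sketch.

```python
def split_into_groups(instructions):
    """Split (dest, sources) tuples into groups free of register RAW/WAW dependences."""
    groups, current, written = [], [], set()
    for dest, sources in instructions:
        depends = dest in written or any(s in written for s in sources)
        if depends and current:
            groups.append(current)        # a stop would be placed here
            current, written = [], set()
        current.append((dest, sources))
        if dest is not None:
            written.add(dest)
    if current:
        groups.append(current)
    return groups


program = [
    ("r1", ("r2", "r3")),   # r1 = r2 + r3
    ("r4", ("r5", "r6")),   # independent of the first instruction
    ("r7", ("r1", "r4")),   # reads r1 and r4, so a stop must precede it
    ("r8", ("r2", "r5")),   # independent of r7, stays in the second group
]
groups = split_into_groups(program)
```

Here the four instructions fall into two groups of two, with one stop between them.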
Bundle
128-bit wide
– Three 41-bit instructions
• 4 MSB are the major opcode
• 6 LSB select the predicate register
– 5-bit template
• Encoded
• Specifies execution unit for each instruction
• Indicates “stops”
Opcode combines MSB 4 bits + template info
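The field widths above add up to 128 bits (5 + 3 × 41). A minimal decoding sketch follows, assuming the template sits in the five least significant bits with the three slots packed above it in order; that placement is an assumption to check against the architecture manual, and the helper names and example values are hypothetical.

```python
SLOT_BITS = 41
SLOT_MASK = (1 << SLOT_BITS) - 1


def decode_bundle(bundle):
    """Split a 128-bit bundle (held in an int) into its template and three slots."""
    template = bundle & 0x1F                     # 5-bit template
    slots = []
    for i in range(3):
        raw = (bundle >> (5 + i * SLOT_BITS)) & SLOT_MASK
        slots.append({
            "raw": raw,
            "major_opcode": raw >> 37,           # 4 most significant bits
            "predicate": raw & 0x3F,             # 6 least significant bits
        })
    return template, slots


def encode_bundle(template, raw_slots):
    """Inverse of decode_bundle, used here only to build a test value."""
    bundle = template & 0x1F
    for i, raw in enumerate(raw_slots):
        bundle |= (raw & SLOT_MASK) << (5 + i * SLOT_BITS)
    return bundle


# Arbitrary field values, not real instruction encodings.
example = encode_bundle(0x10, [(0x8 << 37) | 3, (0x1 << 37) | 0, (0x4 << 37) | 63])
template, slots = decode_bundle(example)
```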
Execution Slots
• I-unit: ALU ops, shifts, moves
• M-unit: ALU ops, loads, stores
• F-unit: FP ops
• B-unit: Branches
• L+X: Extended immediates, stops, NOP
– uses 2 instruction slots for 64-bit immediates
Predication
• Predicate registers are set using
compare or test instructions
– 10 comparison tests
– Each writes 2 predicate registers (result and its complement)
– Multiple comparisons can be handled in
one instruction
• A conditional branch is simply a
predicated branch
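A small sketch of how a compare that writes complementary predicates replaces an if/else: both arms are issued, and each takes effect only if its predicate is true. The helper names are hypothetical, and the model covers only the "less than" test.

```python
def cmp_lt(preds, p_true, p_false, a, b):
    """Compare: write two predicate registers with complementary values."""
    result = a < b
    preds[p_true] = result
    preds[p_false] = not result


def predicated(preds, qp, action):
    """Execute action only if the qualifying predicate qp is true."""
    if preds[qp]:
        action()


def abs_diff(a, b):
    preds = [False] * 64
    preds[0] = True                    # p0 always reads as true
    out = {}
    cmp_lt(preds, 1, 2, a, b)          # p1 = (a < b), p2 = not (a < b)
    predicated(preds, 1, lambda: out.update(r=b - a))   # (p1) r = b - a
    predicated(preds, 2, lambda: out.update(r=a - b))   # (p2) r = a - b
    return out["r"]
```

Calling `abs_diff(3, 10)` and `abs_diff(10, 3)` both give 7 without any branch in the modeled instruction stream.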
Deferred Exception Handling
Itanium uses poison bits:
NaT = “Not a Thing” (65th GPR bit)
NaTVal = “Not a Thing Value” (special FP value)
Generated by speculative loads
(all ops will propagate NaT and NaTVal)
There exist nonspeculative loads which do not
defer exceptions
FP exceptions are handled separately using special
FP status registers.
Deferred Exception Handling
If an instruction consumes a NaT (or NaTVal):
– if nonspeculative, e.g., a store, an immediate
exception is raised
– if chk.s, branch to a compiler-generated
routine to recover from the speculative op.
(special instructions exist so the O/S can save
registers with NaT on a context switch)
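The poison-bit behavior can be sketched as follows, using a sentinel object in place of a set NaT bit. The function names mirror the instructions loosely, a missing dictionary key stands in for a faulting address, and the recovery routine is a plain callback; all of these are illustrative assumptions.

```python
class NaTConsumption(Exception):
    """Raised when a nonspeculative instruction consumes a poisoned value."""


NAT = object()   # stand-in for a register whose NaT bit is set


def ld_s(memory, addr):
    """Speculative load: a faulting access yields NaT instead of an exception."""
    return memory.get(addr, NAT)


def add(a, b):
    """Ordinary operations propagate NaT from either source."""
    return NAT if a is NAT or b is NAT else a + b


def chk_s(value, recovery):
    """chk.s: if the value is poisoned, run the recovery routine instead."""
    return recovery() if value is NAT else value


def store(memory, addr, value):
    """Nonspeculative use: a poisoned value raises immediately."""
    if value is NAT:
        raise NaTConsumption(f"NaT stored to {addr:#x}")
    memory[addr] = value


memory = {0x10: 5}
good = add(ld_s(memory, 0x10), 1)      # normal path
bad = add(ld_s(memory, 0x99), 1)       # fault deferred and propagated
recovered = chk_s(bad, lambda: -1)     # recovery routine supplies a value
```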
Advanced Loads
Hoist loads above stores they may
depend on
The ld.a instruction creates an entry in the ALAT
(Advanced Load Address Table), which stores
the destination register and memory address.
On a store, the ALAT is searched by
memory address to check for a conflict.
If there is a conflict, the ALAT entry is marked invalid.
Advanced Load
Before any nonspeculative instruction (e.g., a store)
uses the value from an advanced load,
the ALAT is checked. If OK, clear the ALAT entry.
If not OK
– If ld.c, reexecute the load
– If chk.a, reexecute the load and any speculative
instructions which depend on the load
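The advanced-load flow (ld.a, intervening stores, then ld.c) can be sketched with a small table keyed by destination register. This is a simplified model under stated assumptions: exact address matching, no table capacity limit, and hypothetical method names.

```python
class ALAT:
    """Toy Advanced Load Address Table: destination register -> address, valid flag."""

    def __init__(self):
        self.entries = {}

    def ld_a(self, regs, memory, dest, addr):
        # Advanced load: perform the load early and record it.
        regs[dest] = memory[addr]
        self.entries[dest] = {"addr": addr, "valid": True}

    def store(self, memory, addr, value):
        # Every store checks the table by address and invalidates conflicts.
        memory[addr] = value
        for entry in self.entries.values():
            if entry["addr"] == addr:
                entry["valid"] = False

    def ld_c(self, regs, memory, dest, addr):
        # Check load: keep the early value if the entry survived, else reload.
        entry = self.entries.pop(dest, None)      # the entry is cleared either way
        if entry is not None and entry["valid"]:
            return False                          # no reload needed
        regs[dest] = memory[addr]
        return True                               # load was reexecuted


memory = {100: 1, 200: 2}
regs = {}
alat = ALAT()

alat.ld_a(regs, memory, "r1", 100)
alat.store(memory, 200, 9)                        # different address: no conflict
reloaded_a = alat.ld_c(regs, memory, "r1", 100)

alat.ld_a(regs, memory, "r2", 200)
alat.store(memory, 200, 42)                       # same address: conflict
reloaded_b = alat.ld_c(regs, memory, "r2", 200)
```

A chk.a would differ from the `ld_c` shown here by branching to recovery code that also reexecutes the instructions that consumed the stale value.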