Itanium
CSE 820

IA-64
Intel introduced a new ISA that is not backward compatible with x86 (IA-32).
What do you get from a clean sheet?

Michigan State University Computer Science and Engineering

IA-64
The first product line is the Itanium.
Status:
– NEC announced that a 32-processor, Itanium 2-based server has achieved the world's best TPC-C benchmark result on a 32-processor SMP platform.
– 1GHz, 3MB tertiary cache, 512 GB RAM

SPEC (top in 3/03)
SPECint2000
• Pentium4 3GHz: 1100
• IBM 690 1.3GHz: 839
• Pentium4 2.2GHz: 811
• Itanium 2 1GHz: 810
SPECfp2000
• Itanium 2 1GHz: 1431
• IBM 690 1.3GHz: 1266
• Pentium4 3GHz: 1090

Registers
• 128 @ 65-bit general-purpose registers
  – 64-bit + NaT
• 128 @ 82-bit floating-point registers
  – 2 extra exponent bits over IEEE 80-bit
• 64 @ 1-bit predicate registers
• 8 @ 64-bit branch registers
  – for indirect branches
• Other registers for system control, memory mapping, performance counters, and communication with the OS

Integer Registers
• 0-31 general purpose
• 32-127 used as a register stack similar to SPARC: renaming registers for function calls; includes a frame pointer (CFM)
Also, special hardware handles stack overflow

Register Rotation
Rotation of registers 32-127 is used to allocate registers in software-pipelined loops.
When combined with predication, a software-pipelined loop needs no separate prologue and epilogue, which reduces the code expansion that loop unrolling and software pipelining normally incur.
That is, the overhead is reduced enough that smaller loops become worth optimizing this way.

Explicit Parallelism
One important aspect of IA-64 is that it lets the compiler do more and communicate more information to the hardware.
In particular, the compiler can indicate when an instruction cannot be executed in parallel with its successors.

Group
A sequence of consecutive instructions with no register data dependences among them.
All instructions in a group can be executed in parallel, provided there is sufficient hardware and memory dependences are preserved.
A group can be arbitrarily long, but the compiler must explicitly mark the boundary between groups with a stop.

Bundle
128 bits wide
– Three 41-bit instructions
  • 4 MSB are opcode
  • 6 LSB specify the predicate register
– 5-bit template
  • Encoded
  • Specifies execution unit for each instruction
  • Indicates “stops”
Opcode combines MSB 4 bits + template info

Execution Slots
• I-unit: ALU ops, shifts, moves
• M-unit: ALU ops, loads, stores
• F-unit: FP ops
• B-unit: Branches
• L+X: Extended immediates, stops, NOP
  – uses 2 instruction slots for 64-bit immediates

Predication
• Predicate registers are set using compare or test instructions
  – 10 tests
  – Write 2 predicate registers (the result and its complement)
  – Multiple comparisons can be handled in one instruction
• A conditional branch is simply a predicated branch

Deferred Exception Handling
Itanium uses poison bits:
• NaT = “Not a Thing” (65th GPR bit)
• NaTVal = “Not a Value” (special FP value)
Generated by speculative loads (all ops propagate NaT and NaTVal)
Nonspeculative loads also exist; they do not defer exceptions
FP exceptions are handled separately using special FP status registers.

Deferred Exception Handling
When a NaT (or NaTVal) is consumed
– by a nonspeculative instruction (e.g., a store): an immediate exception is raised
– by chk.s: branch to a compiler-generated routine to recover from the speculative op.
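The NaT rules above can be sketched in a few lines of Python. This is a toy model for illustration, not Itanium semantics in full: the names `Reg`, `ld_s`, `add`, `chk_s`, `store`, `DeferredFault`, and the `MEMORY` dictionary are all hypothetical, and a "faulting" load is simply modeled as an access to an address missing from `MEMORY`.

```python
from dataclasses import dataclass


class DeferredFault(Exception):
    """Raised when a nonspeculative instruction consumes a NaT value."""


@dataclass(frozen=True)
class Reg:
    """Modeled 65-bit general register: a 64-bit value plus the NaT bit."""
    value: int = 0
    nat: bool = False


# Hypothetical memory with a single valid address (an assumption of this sketch).
MEMORY = {0x1000: 7}


def ld_s(addr: int) -> Reg:
    """Speculative load: a faulting access sets NaT instead of trapping."""
    if addr not in MEMORY:
        return Reg(0, nat=True)
    return Reg(MEMORY[addr])


def add(a: Reg, b: Reg) -> Reg:
    """Ordinary operation: NaT on either source propagates to the result."""
    if a.nat or b.nat:
        return Reg(0, nat=True)
    return Reg((a.value + b.value) & (2**64 - 1))


def chk_s(r: Reg) -> bool:
    """Speculation check: True means 'branch to the recovery routine'."""
    return r.nat


def store(addr: int, r: Reg) -> None:
    """Nonspeculative use: a NaT source raises an exception immediately."""
    if r.nat:
        raise DeferredFault(f"store of NaT to {addr:#x}")
    MEMORY[addr] = r.value
```

In this model a compiler would hoist `ld_s` above a branch, compute with the result freely, and place `chk_s` at the point where the value is actually needed; only then does a deferred fault become visible.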
(Special instructions exist so that the OS can save registers with NaT on a context switch.)

Advanced Loads
Hoist loads above stores they may depend on
The ld.a instruction creates an entry in the ALAT (advanced load address table) that records the destination register and memory address.
On a store, the ALAT is searched by memory address to check for a conflict.
If there is a conflict, the ALAT entry is marked invalid.

Advanced Load
Before any nonspeculative instruction (e.g., a store) uses the value from an advanced load, the ALAT is checked.
If OK, clear the ALAT entry.
If not OK
– If ld.c, reexecute the load
– If chk.a, reexecute the load and any speculative instructions that depend on the load
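The advanced-load protocol can be sketched as a small Python model. This is a simplified illustration under stated assumptions: the class `ALAT` and its method names are hypothetical, memory is a plain dictionary, addresses match only on exact equality, and the check always clears the entry. It models the ld.c case only (reload on failure), not the chk.a recovery path.

```python
class ALAT:
    """Toy advanced load address table, keyed by destination register."""

    def __init__(self):
        # destination register name -> [memory address, valid flag]
        self.entries = {}

    def ld_a(self, memory, dest, addr):
        """Advanced load: read memory and record (dest, addr) in the table."""
        self.entries[dest] = [addr, True]
        return memory[addr]

    def store(self, memory, addr, value):
        """Store: write memory, then invalidate entries with a matching address."""
        memory[addr] = value
        for entry in self.entries.values():
            if entry[0] == addr:
                entry[1] = False

    def ld_c(self, memory, dest, addr, current):
        """Check load: keep `current` if the entry is still valid, else reload.

        The entry is removed in both cases (a simplification of this sketch);
        a missing entry is treated the same as an invalid one.
        """
        entry = self.entries.pop(dest, None)
        if entry is not None and entry[1]:
            return current
        return memory[addr]
```

For example, with `memory = {0x10: 1, 0x20: 2}`, an `ld_a` from `0x10` followed by a store to `0x20` leaves the entry valid, so `ld_c` keeps the hoisted value; if the intervening store had gone to `0x10` instead, `ld_c` would reload from memory and return the stored value.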