with a focus on floating point

For floating point (i.e., real numbers), MASM supports:

  real4    single precision; IEEE standard; analogous to float
  real8    double precision; IEEE standard; analogous to double
  real10   double extended precision; not IEEE standard

NaN = Not a Number (see p. 4-14 of v1)

SSE2 supports 32- and 64-bit f.p. data.
x87 supports 32-, 64-, and 80-bit f.p. data.

Note: These are 24-bit binary numbers. Here they are in base 10:

  2.00000000000000
  1.99999988079071

SSE2 = Streaming SIMD Extensions 2
SIMD = Single Instruction, Multiple Data

SSE2 was introduced in 2000 on the Pentium 4 and Intel Xeon processors.

  1996  Intel MMX
  1998  AMD 3DNow!
  1999  Intel SSE on P3
  2000  Intel SSE2 on P4
  2003  Intel SSE3 (since Prescott P4)
  2006  Intel Supplemental SSE3 (since Woodcrest Xeons)
  2006  Intel SSE4 (4.1 and 4.2)
  2007  AMD SSE5 (proposed 2007, implemented 2011)
  2008  Intel AVX (proposed 2008, implemented 2011 in Intel Sandy Bridge and AMD Bulldozer)
        XMM registers go from 128 bit to 256 bit, called YMM.

1. You must use MASM v6.15 or newer for SIMD support. (MASM v6.15 is available from the course software web page.)

2. You must enable MASM support for these instructions with the following:

      .686                   ;instructions for Pentium Pro (or better)
      .xmm                   ;allow simd instructions
      .model flat, stdcall   ;no crazy segments!
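To make the two requirements above concrete, here is a minimal sketch of a complete source file, assuming 32-bit MASM v6.15 or newer. The labels (f32val, f64val, f80val, main) are hypothetical names chosen for illustration, and the movsd instruction is only a placeholder to show that the assembler now accepts XMM operands.

```asm
; Minimal sketch: enabling SSE2 in a 32-bit MASM source file.
; All labels below are hypothetical and exist only for illustration.
        .686                    ; Pentium Pro (or better) instruction set
        .xmm                    ; allow SIMD (XMM register) instructions
        .model flat, stdcall    ; flat memory model, stdcall convention

        .data
f32val  real4   1.5             ; 32-bit single precision (like float)
f64val  real8   1.5             ; 64-bit double precision (like double)
f80val  real10  1.5             ; 80-bit double extended (usable by x87 only)

        .code
main    proc
        movsd   xmm0, f64val    ; load the real8 into the low 64 bits of xmm0
        ret                     ; a full program would typically call ExitProcess
main    endp
        end     main
```

Without the .xmm directive the assembler should reject the XMM register operand, which is a quick way to check that SIMD support is actually enabled.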
Each one of the 8 128-bit registers (xmm0...xmm7) can hold:

  16 packed 1-byte integers
  8 packed word (2-byte) integers
  4 packed doubleword (4-byte) integers
  2 packed quadword (8-byte) integers
  1 double quadword (16 bytes)
  4 packed single-precision (4 bytes each) floating-point values
  2 packed double-precision (8 bytes each) floating-point values

IA32 Registers:

  8 32-bit GPRs        Integer only
  8 80-bit fp regs     Floating point only
  8 64-bit mmx regs    Integer only; re-uses fp regs
  8 128-bit xmm regs   Integer and fp

The xmm registers will be the focus of our discussion.

XMM register formats

The utilities.asm MASM code (on the course's software web page) contains a function that you can call to display (dump) the contents of the 8 xmm registers as pairs of 64-bit double-precision fp values.

      call dumpXmm64

  1. Data movement
  2. Arithmetic
  3. Comparison
  4. Conversion

1. Data movement

  movhpd   Move High Packed Double-Precision Floating-Point Value
  movlpd   Move Low Packed Double-Precision Floating-Point Value
  movsd    Move Scalar Double-Precision Floating-Point Value

movhpd - Move High Packed Double-Precision Floating-Point Value

  for memory to XMM move:
    DEST[127-64] ← SRC
    DEST[63-0] unchanged
    Ex. movhpd xmm0, m64

  for XMM to memory move:
    DEST ← SRC[127-64]
    Ex. movhpd m64, xmm2

movlpd - Move Low Packed Double-Precision Floating-Point Value

  for memory to XMM move:
    DEST[127-64] unchanged
    DEST[63-0] ← SRC
    Ex. movlpd xmm1, m64

  for XMM to memory move:
    DEST ← SRC[63-0]
    Ex. movlpd m64, xmm2

movsd - Move Scalar Double-Precision Floating-Point Value

  1. when source and destination operands are both XMM registers:
    DEST[63-0] ← SRC[63-0]
    DEST[127-64] remains unchanged
    Ex. movsd xmm1, xmm3

  2. when source operand is XMM register and destination operand is memory location:
    DEST ← SRC[63-0]
    Ex. movsd m64, xmm2

  3. when source operand is memory location and destination operand is XMM register:
    DEST[127-64] ← 0000000000000000H
    DEST[63-0] ← SRC
    Ex. movsd xmm1, m64

2. Arithmetic (scalar)

  addsd - Add Scalar Double-Precision Floating-Point Values
  subsd - Subtract Scalar Double-Precision Floating-Point Values
  mulsd - Multiply Scalar Double-Precision Floating-Point Values
  divsd - Divide Scalar Double-Precision Floating-Point Values

Also sqrtsd, but there are no sin or cos SSE2 instructions! We have to use the x87 instructions for those!

  addsd
    DEST[63-0] ← DEST[63-0] + SRC[63-0]
    DEST[127-64] remains unchanged

  subsd
    DEST[63-0] ← DEST[63-0] − SRC[63-0]
    DEST[127-64] remains unchanged

  mulsd
    DEST[63-0] ← DEST[63-0] * SRC[63-0]
    DEST[127-64] remains unchanged

  divsd
    DEST[63-0] ← DEST[63-0] / SRC[63-0]
    DEST[127-64] remains unchanged

2. Arithmetic (packed)

  addpd - Add Packed Double-Precision Floating-Point Values
    DEST[63-0] ← DEST[63-0] + SRC[63-0]
    DEST[127-64] ← DEST[127-64] + SRC[127-64]

  subpd - Subtract Packed Double-Precision Floating-Point Values
    DEST[63-0] ← DEST[63-0] − SRC[63-0]
    DEST[127-64] ← DEST[127-64] − SRC[127-64]

  mulpd - Multiply Packed Double-Precision Floating-Point Values
    DEST[63-0] ← DEST[63-0] * SRC[63-0]
    DEST[127-64] ← DEST[127-64] * SRC[127-64]

  divpd - Divide Packed Double-Precision Floating-Point Values
    DEST[63-0] ← DEST[63-0] / SRC[63-0]
    DEST[127-64] ← DEST[127-64] / SRC[127-64]

3. Comparison

  comisd - Compare Scalar Ordered Double-Precision Floating-Point Values and Set EFLAGS
    Compares DEST[63-0] with SRC[63-0] and sets ZF, PF, and CF accordingly.
    An unordered result (either operand is a NaN) sets all three flags.

4. Conversion

  cvtsd2si - Convert Scalar Double-Precision Floating-Point Value to Doubleword Integer
    DEST[31-0] ← Convert_Double_Precision_Floating_Point_To_Integer(SRC[63-0])

  cvtsi2sd - Convert Doubleword Integer to Scalar Double-Precision Floating-Point Value
    DEST[63-0] ← Convert_Integer_To_Double_Precision_Floating_Point(SRC[31-0])
    DEST[127-64] remains unchanged
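The four instruction categories above can be combined in one short routine. What follows is a sketch under stated assumptions, not course-provided code: it assumes 32-bit MASM v6.15 or newer, the labels (dFour, dLimit, dPair, iResult, below, main) are hypothetical, and the call to dumpXmm64 is left commented out because it needs the course's utilities.asm to be part of the build.

```asm
; Sketch only: every label here is hypothetical and chosen for illustration.
        .686
        .xmm
        .model flat, stdcall

        .data
dFour   real8   4.0
dLimit  real8   10.0
dPair   real8   3.0, 4.0            ; two adjacent doubles (16 bytes)
iResult dword   ?

        .code
main    proc
        ; conversion + scalar arithmetic: (7 + 12) / 4.0
        mov      eax, 7
        mov      ecx, 12
        cvtsi2sd xmm0, eax          ; xmm0[63-0] = 7.0
        cvtsi2sd xmm1, ecx          ; xmm1[63-0] = 12.0
        addsd    xmm0, xmm1         ; xmm0[63-0] = 19.0
        divsd    xmm0, dFour        ; xmm0[63-0] = 4.75

        ; comparison: comisd sets CF/ZF much like an unsigned integer compare
        comisd   xmm0, dLimit       ; compare 4.75 with 10.0
        jb       below              ; CF = 1: xmm0 < dLimit (also taken if unordered)
        movsd    xmm0, dLimit       ; otherwise clamp to the limit
below:
        cvtsd2si eax, xmm0          ; eax = 5 under the default round-to-nearest mode
        mov      iResult, eax

        ; data movement + packed arithmetic: square two doubles at once
        movlpd   xmm2, dPair        ; xmm2[63-0]   = 3.0
        movhpd   xmm2, dPair+8      ; xmm2[127-64] = 4.0
        mulpd    xmm2, xmm2         ; xmm2 = 9.0 (low), 16.0 (high)
        movlpd   dPair, xmm2        ; store the low half back to memory
        movhpd   dPair+8, xmm2      ; store the high half back to memory

        ; call   dumpXmm64          ; optional: course helper from utilities.asm
        ret                         ; a full program would typically call ExitProcess
main    endp
        end      main
```

The packed part loads each half with movlpd and movhpd and multiplies register to register. This avoids giving a packed instruction a 128-bit memory operand, which would have to be 16-byte aligned.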