f.p. & SSE SIMD

with a focus on floating point

For floating point (i.e., real numbers), MASM supports:

real4
 single precision; IEEE standard; analogous to float

real8
 double precision; IEEE standard; analogous to double

real10
 double extended precision (80 bits, the x87 format)
 Not one of the IEEE basic formats (single and double)

NaN = Not a Number (see p. 4-14 of v1)


SSE2 supports 32 and 64 bit f.p. data
x87 supports 32, 64, and 80 bit f.p. data
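As a minimal sketch of how the three MASM types map onto the two floating-point units, the program below declares one value of each size and loads it. The variable names (f32, f64, f80) and the entry-point name main are made up for illustration, and movss (the single-precision counterpart of movsd) and the x87 fld/fstp pair are not otherwise covered in these notes.

```masm
.686                        ; Pentium Pro (or better) instructions
.xmm                        ; enable SSE/SSE2 mnemonics and xmm registers
.model flat, stdcall

.data
f32 real4  1.5              ; 4 bytes, single precision (like float)
f64 real8  1.5              ; 8 bytes, double precision (like double)
f80 real10 1.5              ; 10 bytes, double extended precision

.code
main proc
    movss xmm0, f32         ; SSE: 32-bit value into the low 32 bits of xmm0
    movsd xmm1, f64         ; SSE2: 64-bit value into the low 64 bits of xmm1
    fld   f80               ; x87: the 80-bit format can only be loaded here
    fstp  st(0)             ; pop it again so the x87 stack is left empty
    ret
main endp
end main
```

There is no SSE2 instruction that accepts the real10 operand, which is why the 80-bit value goes through the x87 stack.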
Note: the two numbers shown on the slide (figure not reproduced) are 24-bit binary numbers.
Here they are in base 10:
 2.00000000000000
 1.99999988079071 (that is, 2 − 2^-23)

SSE2 = Streaming SIMD Extensions 2


SIMD = Single Instruction, Multiple Data
SSE2 was introduced in 2000 on Pentium 4 and Intel Xeon processors.









1996  Intel MMX
1998  AMD 3DNow!
1999  Intel SSE on P3
2001  Intel SSE2 on P4
2003  Intel SSE3 (since Prescott P4)
2006  Intel Supplemental SSE3 (since Woodcrest Xeons)
2006  Intel SSE4 (4.1 and 4.2)
2007  AMD SSE5 (proposed 2007, implemented in revised form in 2011)
2008  Intel AVX (proposed 2008, implemented 2011 in Intel Sandy Bridge and AMD Bulldozer)
       XMM registers go from 128 bit to 256 bit, called YMM.
1. You must use MASM v6.15 or newer for SIMD support. (MASM v6.15 is available from the course software web page.)
2. You must enable MASM support for these instructions with the following:
    .686                  ;instructions for Pentium Pro (or better)
    .xmm                  ;allow simd instructions
    .model flat, stdcall  ;no crazy segments!
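To show where these directives sit in a complete source file, here is a minimal sketch of a program that uses a single SSE2 instruction. The entry-point name main is an assumption, as is the use of xorpd (an SSE2 instruction not otherwise covered in these notes); assemble and link it the way the course toolchain expects.

```masm
.686                        ; instructions for Pentium Pro (or better)
.xmm                        ; allow SIMD instructions and xmm register names
.model flat, stdcall        ; flat 32-bit address space, stdcall convention

.code
main proc
    xorpd xmm0, xmm0        ; SSE2: clear all 128 bits of xmm0
    ret
main endp
end main
```

If the .xmm line is removed, the assembler should reject the xorpd line, which is a quick way to confirm that the directive is doing its job.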

Each one of the 8 128-bit registers (xmm0...xmm7) can hold:

 16 packed byte (1 byte) integers
 8 packed word (2 byte) integers
 4 packed doubleword (4 byte) integers
 2 packed quadword (8 byte) integers
 1 double quadword (16 byte)
 4 packed single precision (4 bytes each) floating point values
 2 packed double precision (8 bytes each) floating point values

IA32 Registers:

8 32-bit GPRs
 Integer only

8 80-bit fp regs
 Floating point only

8 64-bit mmx regs
 Integer only
 Re-uses fp regs

8 128-bit xmm regs
 Integer and fp
 These will be the focus of our discussion.
XMM register formats (figure not reproduced)

The utilities.asm MASM code (on the course’s software web page) contains a function, dumpXmm64, that you can call to display (dump) the contents of the 8 xmm registers as pairs of 64-bit double precision fp values:
    call dumpXmm64
1. Data movement
2. Arithmetic
3. Comparison
4. Conversion
1. Data movement

movhpd - Move High Packed Double-Precision Floating-Point Value
movlpd - Move Low Packed Double-Precision Floating-Point Value
movsd - Move Scalar Double-Precision Floating-Point Value

movhpd - Move High Packed Double-Precision Floating-Point Value

for memory to XMM move:
 DEST[127-64] ← SRC; DEST[63-0] unchanged
 Ex. movhpd xmm0, m64
for XMM to memory move:
 DEST ← SRC[127-64]
 Ex. movhpd m64, xmm2
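The two directions of movhpd can be sketched as follows. The variable names src and dst and the entry-point name main are hypothetical, and xorpd is used only to give xmm0 a known starting value.

```masm
.686
.xmm
.model flat, stdcall

.data
src real8 2.5               ; 64-bit value to place in the high half
dst real8 0.0               ; receives the high half again

.code
main proc
    xorpd  xmm0, xmm0       ; xmm0 = { high: 0.0, low: 0.0 }
    movhpd xmm0, src        ; memory to XMM: high = 2.5, low stays 0.0
    movhpd dst, xmm0        ; XMM to memory: dst = 2.5 (bits 127-64)
    ret
main endp
end main
```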

movlpd - Move Low Packed Double-Precision Floating-Point Value

for memory to XMM move:
 DEST[127-64] unchanged; DEST[63-0] ← SRC
 Ex. movlpd xmm1, m64
for XMM to memory move:
 DEST ← SRC[63-0]
 Ex. movlpd m64, xmm2
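Because movlpd and movhpd each leave the other half of the register alone, the pair can be used to assemble a packed value from two separate 64-bit loads. The sketch below assumes a hypothetical two-element array named pair and a hypothetical output variable lowOut.

```masm
.686
.xmm
.model flat, stdcall

.data
pair   real8 1.0, 2.0       ; pair[0] = 1.0, pair[1] = 2.0
lowOut real8 ?              ; receives the low half of xmm1

.code
main proc
    movlpd xmm1, pair       ; low half  = 1.0, high half untouched
    movhpd xmm1, pair+8     ; high half = 2.0, low half untouched
                            ; xmm1 = { high: 2.0, low: 1.0 }
    movlpd lowOut, xmm1     ; XMM to memory: lowOut = 1.0 (bits 63-0)
    ret
main endp
end main
```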

movsd - Move Scalar Double-Precision Floating-Point Value

1. when source and destination operands are both XMM registers:
 DEST[127-64] remains unchanged; DEST[63-0] ← SRC[63-0]
 Ex. movsd xmm1, xmm3
2. when source operand is XMM register and destination operand is memory location:
 DEST ← SRC[63-0]
 Ex. movsd m64, xmm2
3. when source operand is memory location and destination operand is XMM register:
 DEST[127-64] ← 0000000000000000H; DEST[63-0] ← SRC
 Ex. movsd xmm1, m64
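The difference between the three forms is easiest to see when the high half of the destination holds something recognizable. In this sketch the variable names (loVal, hiVal, other, stored) and the entry point main are made up for illustration.

```masm
.686
.xmm
.model flat, stdcall

.data
loVal  real8 3.0
hiVal  real8 9.0
other  real8 7.0
stored real8 ?

.code
main proc
    movlpd xmm1, loVal      ; xmm1 low  = 3.0
    movhpd xmm1, hiVal      ; xmm1 high = 9.0

    movsd  xmm3, other      ; form 3 (memory to XMM): xmm3 = { 0.0, 7.0 }
                            ;   the high half is zeroed
    movsd  xmm1, xmm3       ; form 1 (XMM to XMM):    xmm1 = { 9.0, 7.0 }
                            ;   the high half is left unchanged
    movsd  stored, xmm1     ; form 2 (XMM to memory): stored = 7.0
    ret
main endp
end main
```

The comments list each register as { high, low }.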
2. Arithmetic (scalar)





addsd - Add Scalar Double-Precision Floating-Point Values
subsd - Subtract Scalar Double-Precision Floating-Point Values
mulsd - Multiply Scalar Double-Precision Floating-Point Values
divsd - Divide Scalar Double-Precision Floating-Point Values
Also sqrtsd, but there are no sin or cos SSE2 instructions! We have to use the x87 instructions for that!

addsd
 DEST[63-0] ← DEST[63-0] + SRC[63-0]
 DEST[127-64] remains unchanged

subsd
 DEST[63-0] ← DEST[63-0] − SRC[63-0]
 DEST[127-64] remains unchanged

mulsd
 DEST[63-0] ← DEST[63-0] * SRC[63-0]
 DEST[127-64] remains unchanged

divsd
 DEST[63-0] ← DEST[63-0] / SRC[63-0]
 DEST[127-64] remains unchanged
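A short chain of scalar operations might look like the sketch below. The operand values are chosen so that every intermediate result is exact, and the variable names and the entry point main are hypothetical.

```masm
.686
.xmm
.model flat, stdcall

.data
start      real8 10.0
addend     real8 6.0
subtrahend real8 7.0
factor     real8 4.0
divisor    real8 9.0
root       real8 ?

.code
main proc
    movsd  xmm0, start      ; low = 10.0
    addsd  xmm0, addend     ; low = 10.0 + 6.0 = 16.0
    subsd  xmm0, subtrahend ; low = 16.0 - 7.0 = 9.0
    mulsd  xmm0, factor     ; low = 9.0 * 4.0  = 36.0
    divsd  xmm0, divisor    ; low = 36.0 / 9.0 = 4.0
    sqrtsd xmm0, xmm0       ; low = sqrt(4.0)  = 2.0
    movsd  root, xmm0       ; root = 2.0
    ret
main endp
end main
```

Each instruction touches only bits 63-0 of xmm0; the source may be either another xmm register or a 64-bit memory operand, as here.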
2. Arithmetic (packed)




addpd - Add Packed Double-Precision Floating-Point Values
subpd - Subtract Packed Double-Precision Floating-Point Values
mulpd - Multiply Packed Double-Precision Floating-Point Values
divpd - Divide Packed Double-Precision Floating-Point Values

addpd - Add Packed Double-Precision Floating-Point Values
 DEST[63-0] ← DEST[63-0] + SRC[63-0]
 DEST[127-64] ← DEST[127-64] + SRC[127-64]

subpd - Subtract Packed Double-Precision Floating-Point Values
 DEST[63-0] ← DEST[63-0] − SRC[63-0]
 DEST[127-64] ← DEST[127-64] − SRC[127-64]

mulpd - Multiply Packed Double-Precision Floating-Point Values
 DEST[63-0] ← DEST[63-0] * SRC[63-0]
 DEST[127-64] ← DEST[127-64] * SRC[127-64]

divpd - Divide Packed Double-Precision Floating-Point Values
 DEST[63-0] ← DEST[63-0] / SRC[63-0]
 DEST[127-64] ← DEST[127-64] / SRC[127-64]
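The packed forms operate on both halves at once. The sketch below builds two packed operands with movlpd/movhpd and applies each operation to a fresh copy. The arrays xs and ys and the entry point main are hypothetical, and movapd is used here only as a register-to-register copy of all 128 bits.

```masm
.686
.xmm
.model flat, stdcall

.data
xs real8 6.0, 8.0           ; xs[0] -> low half, xs[1] -> high half
ys real8 3.0, 2.0           ; ys[0] -> low half, ys[1] -> high half

.code
main proc
    movlpd xmm0, xs         ; xmm0 low  = 6.0
    movhpd xmm0, xs+8       ; xmm0 high = 8.0
    movlpd xmm1, ys         ; xmm1 low  = 3.0
    movhpd xmm1, ys+8       ; xmm1 high = 2.0

    movapd xmm2, xmm0       ; copy { 8.0, 6.0 }
    addpd  xmm2, xmm1       ; xmm2 = { 8.0+2.0, 6.0+3.0 } = { 10.0, 9.0 }

    movapd xmm3, xmm0
    subpd  xmm3, xmm1       ; xmm3 = { 8.0-2.0, 6.0-3.0 } = { 6.0, 3.0 }

    movapd xmm4, xmm0
    mulpd  xmm4, xmm1       ; xmm4 = { 8.0*2.0, 6.0*3.0 } = { 16.0, 18.0 }

    movapd xmm5, xmm0
    divpd  xmm5, xmm1       ; xmm5 = { 8.0/2.0, 6.0/3.0 } = { 4.0, 2.0 }
    ret
main endp
end main
```

The comments list each register as { high, low }. Compared with the scalar versions, the only change in the mnemonic is the suffix (pd instead of sd), but the high half now takes part in the computation.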
3. Comparison

comisd - Compare Scalar Ordered Double-Precision Floating-Point Values and Set EFLAGS
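comisd reports its result in ZF, PF, and CF, so the unsigned conditional jumps (ja, jb, je) apply, with PF = 1 signalling an unordered result (at least one operand is a NaN). A minimal sketch of a three-way compare follows; the variable and label names and the result encoding (-1, 0, 1, 2) are made up for illustration.

```masm
.686
.xmm
.model flat, stdcall

.data
lhs    real8  1.5
rhs    real8  2.5
result sdword ?             ; -1: lhs < rhs, 0: equal, 1: lhs > rhs, 2: NaN

.code
main proc
    movsd  xmm0, lhs
    comisd xmm0, rhs        ; compare bits 63-0 of xmm0 with rhs
    jp     isNaN            ; PF = 1: unordered
    ja     isGreater        ; CF = 0 and ZF = 0: lhs > rhs
    jb     isLess           ; CF = 1: lhs < rhs
    mov    result, 0        ; otherwise ZF = 1: lhs == rhs
    jmp    finish
isGreater:
    mov    result, 1
    jmp    finish
isLess:
    mov    result, -1
    jmp    finish
isNaN:
    mov    result, 2
finish:
    ret
main endp
end main
```

With the values shown, the jb branch is taken and result ends up as -1. The jp test comes first because an unordered compare also sets ZF and CF.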
4. Conversion

cvtsd2si - Convert Scalar Double-Precision Floating-Point Value to Doubleword Integer
cvtsi2sd - Convert Doubleword Integer to Scalar Double-Precision Floating-Point Value

cvtsd2si - Convert Scalar Double-Precision Floating-Point Value to Doubleword Integer
 DEST[31-0] ← Convert_Double_Precision_Floating_Point_To_Integer(SRC[63-0])

cvtsi2sd - Convert Doubleword Integer to Scalar Double-Precision Floating-Point Value
 DEST[63-0] ← Convert_Integer_To_Double_Precision_Floating_Point(SRC[31-0])
 DEST[127-64] remains unchanged
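A round trip through both conversions can be sketched as below. The variable names and the entry point main are hypothetical. The rounded result assumes the MXCSR rounding mode is still at its default (round to nearest, ties to even), and cvttsd2si, the truncating variant not covered above, is included for contrast.

```masm
.686
.xmm
.model flat, stdcall

.data
n         sdword 7
scale     real8  2.5
rounded   sdword ?
truncated sdword ?

.code
main proc
    cvtsi2sd  xmm0, n       ; integer 7 -> double 7.0 in bits 63-0
    mulsd     xmm0, scale   ; 7.0 * 2.5 = 17.5
    cvtsd2si  eax, xmm0     ; rounds per MXCSR; default mode gives 18
    mov       rounded, eax
    cvttsd2si eax, xmm0     ; always truncates toward zero: 17
    mov       truncated, eax
    ret
main endp
end main
```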