CSCI-365 Computer Organization
Lectures 9-11

Note: Some slides and/or pictures in the following are adapted from Computer Organization and Design, Patterson & Hennessy, ©2005. Some slides and/or pictures in the following are adapted from slides ©2008 UCB.

Number Representations

32-bit signed numbers (2's complement):

    0000 0000 0000 0000 0000 0000 0000 0000two = 0ten
    0000 0000 0000 0000 0000 0000 0000 0001two = +1ten
    ...
    0111 1111 1111 1111 1111 1111 1111 1110two = +2,147,483,646ten
    0111 1111 1111 1111 1111 1111 1111 1111two = +2,147,483,647ten   (maxint)
    1000 0000 0000 0000 0000 0000 0000 0000two = -2,147,483,648ten   (minint)
    1000 0000 0000 0000 0000 0000 0000 0001two = -2,147,483,647ten
    ...
    1111 1111 1111 1111 1111 1111 1111 1110two = -2ten
    1111 1111 1111 1111 1111 1111 1111 1111two = -1ten

The leftmost bit is the MSB (the sign bit). The rightmost bit is the LSB.

Converting <32-bit values into 32-bit values:
- Copy the most significant bit (the sign bit) into the "empty" bits:
      0010 -> 0000 0010
      1010 -> 1111 1010
- This is sign extension. Zero extension instead fills the empty bits with 0s. lb sign-extends the loaded byte; lbu zero-extends it.

CSE431 Chapter 3.2 Irwin, PSU, 2008

MIPS Arithmetic Logic Unit (ALU)

[Figure: 32-bit ALU with 32-bit inputs A and B and a 4-bit operation input m, producing a 32-bit result plus 1-bit zero and ovf outputs]

The ALU must support the arithmetic/logic operations of the ISA:

    add, addi, addiu, addu
    sub, subu
    mult, multu, div, divu
    sqrt
    and, andi, nor, or, ori, xor, xori
    beq, bne, slt, slti, sltiu, sltu

With special handling for:
- sign extend: addi, addiu, slti, sltiu
- zero extend: andi, ori, xori
- overflow detection: add, addi, sub

Dealing with Overflow

Overflow occurs when the result of an operation cannot be represented in 32 bits. At that point the sign bit holds a value bit of the result rather than the proper sign bit. Overflow can never occur when adding operands with different signs, or when subtracting operands with the same sign.

    Operation   Operand A   Operand B   Result indicating overflow
    A + B       ≥ 0         ≥ 0         < 0
    A + B       < 0         < 0         ≥ 0
    A - B       ≥ 0         < 0         < 0
    A - B       < 0         ≥ 0         ≥ 0

MIPS signals overflow with an exception (aka interrupt). This is an unscheduled procedure call in which the EPC contains the address of the instruction that caused the exception.

Two's Complement Arithmetic
- Addition: add the codes, ignoring any final carry
- Subtraction: change the sign of the subtrahend and add

Exercises: 16 + (-23) = ?    16 - (-23) = ?    -23 - (-16) = ?
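The sign-extension rule and the lb/lbu contrast can be sketched in a few lines of C. This is a minimal illustration, not a description of the MIPS datapath. The function names sign_extend and zero_extend are hypothetical, and a field-width parameter stands in for the byte that lb/lbu load.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of v to 32 bits (what lb does for a byte):
   copies of the field's sign bit fill every higher bit. */
int32_t sign_extend(uint32_t v, unsigned bits)
{
    assert(bits >= 1 && bits <= 32);
    uint32_t mask = (bits == 32) ? 0xFFFFFFFFu : ((1u << bits) - 1u);
    uint32_t sign = 1u << (bits - 1);
    v &= mask;                           /* keep only the field */
    /* Flipping the sign bit and then subtracting its weight leaves
       the upper bits equal to copies of the sign bit. */
    return (int32_t)((v ^ sign) - sign);
}

/* Zero-extend: the upper bits become 0 (what lbu does for a byte). */
uint32_t zero_extend(uint32_t v, unsigned bits)
{
    assert(bits >= 1 && bits <= 32);
    uint32_t mask = (bits == 32) ? 0xFFFFFFFFu : ((1u << bits) - 1u);
    return v & mask;
}
```

With these helpers, sign_extend(0xA, 4) returns -6, whose low byte is 1111 1010, matching the slide's 1010 -> 1111 1010. zero_extend(0xA, 4) returns 10.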
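The overflow table reduces to a test on sign bits. Below is a minimal C sketch that assumes 32-bit operands; add32 and sub32 are hypothetical names. Doing the arithmetic on unsigned values mirrors "adding the codes, ignoring any final carry", and it also avoids signed-overflow undefined behaviour in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A + B: add the bit patterns, ignore the carry out of bit 31.
   Overflow iff both operands have the same sign and the result's sign
   differs from it (rows 1-2 of the table). */
int32_t add32(int32_t a, int32_t b, bool *ovf)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t r = ua + ub;                         /* wraps mod 2^32 */
    *ovf = ((~(ua ^ ub) & (ua ^ r)) >> 31) != 0;
    return (int32_t)r;
}

/* A - B: negate B (invert and add 1) and add.
   Overflow iff the operands have different signs and the result's sign
   differs from A's (rows 3-4 of the table). */
int32_t sub32(int32_t a, int32_t b, bool *ovf)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t r = ua + (~ub + 1u);
    *ovf = (((ua ^ ub) & (ua ^ r)) >> 31) != 0;
    return (int32_t)r;
}
```

For example, add32(16, -23, &ovf) gives -7 with ovf false. add32(INT32_MAX, 1, &ovf) wraps to INT32_MIN and sets ovf.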
Hardware for Addition and Subtraction

[Figure: adder/subtractor hardware, not reproduced]

Multiply

Binary multiplication is just a sequence of shifts and adds. With an n-bit multiplicand and an n-bit multiplier, the rows of the partial product array can be formed in parallel and added in parallel for faster multiplication. The result is a double precision (2n-bit) product.

Multiplication Example

        1011         Multiplicand (11 dec)
      x 1101         Multiplier (13 dec)
    --------
        1011         Partial products: if the multiplier bit is 1,
       0000          copy the multiplicand (shifted to that bit's
      1011           place value); otherwise write zero
     1011
    --------
    10001111         Product (143 dec)

Note: the product needs a double-length result.

Add and Right Shift Multiplier Hardware

[Figure: the multiplicand register feeds a 32-bit ALU that adds into the upper half of the product register; the product register shifts right, its lower half initially holds the multiplier, and a control unit sequences the steps]

Example: multiplicand 0110 (= 6) times multiplier 0101 (= 5):

    Step                          Product register
    initial (multiplier in low)   0000 0101
    LSB = 1: add multiplicand     0110 0101
    shift right                   0011 0010
    LSB = 0: no add               0011 0010
    shift right                   0001 1001
    LSB = 1: add multiplicand     0111 1001
    shift right                   0011 1100
    LSB = 0: no add               0011 1100
    shift right                   0001 1110   (= 30)

Unsigned Binary Multiplication

[Figure: flowchart, not reproduced]

Execution of Example

[Figure: not reproduced]

Multiplying Negative Numbers

The shift-and-add method above does not work directly on two's complement operands!
Solution 1:
- Convert to positive if required
- Multiply as above
- If the signs were different, negate the answer

Solution 2:
- Booth's algorithm

Booth's Algorithm

[Figure: Booth's algorithm flowchart, not reproduced]

Example of Booth's Algorithm

[Figure: not reproduced]

MIPS Multiply Instruction

Multiply (mult and multu) produces a double precision product:

    mult $s0, $s1        # hi||lo = $s0 * $s1

    op   rs   rt   rd   shamt   funct
    0    16   17   0    0       0x18

- The low-order word of the product is left in processor register lo and the high-order word in register hi
- Instructions mfhi rd and mflo rd move the product to (user-accessible) registers in the register file

Multiplies are usually done by fast, dedicated hardware and are much more complex (and slower) than adders.

MIPS Multiplication
- Two 32-bit registers for the product:
  - HI: most-significant 32 bits
  - LO: least-significant 32 bits
- Instructions:
  - mult rs, rt / multu rs, rt: 64-bit product in HI/LO
  - mfhi rd / mflo rd: move from HI/LO to rd; test the HI value to see if the product overflows 32 bits
  - mul rd, rs, rt: least-significant 32 bits of the product -> rd

Division

Division is just a sequence of quotient-digit guesses, left shifts, and subtracts:

    dividend = quotient x divisor + remainder

[Figure: n-bit divisor and quotient, dividend, partial remainder array, and n-bit remainder, not reproduced]

Division of Unsigned Binary Integers

           00001101       Quotient
    1011 ) 10010011       Dividend (divisor = 1011)
            1011
           001110         Partial remainders
             1011
             001111
               1011
                100       Remainder

Left Shift and Subtract Division Hardware

[Figure: the divisor register feeds a 32-bit ALU that subtracts from the upper (remainder) half of a register; the register shifts left, the dividend starts in its lower half, and quotient bits enter from the right under a control unit]

Example: dividend 0110 (= 6) divided by divisor 0010 (= 2):

    Step                 Remainder | Dividend/quotient
    initial              0000 0110
    shift left           0000 1100
    subtract 0010        1110 1100   remainder negative, so quotient bit = 0
    restore remainder    0000 1100
    shift left           0001 1000
    subtract 0010        1111 1000   remainder negative, so quotient bit = 0
    restore remainder    0001 1000
    shift left           0011 0000
    subtract 0010        0001 0001   remainder positive, so quotient bit = 1
    shift left           0010 0010
    subtract 0010        0000 0011   remainder positive, so quotient bit = 1

Result: quotient 0011 = 3, with remainder 0.

Division of Signed Binary Integers

[Figures (three slides), not reproduced]

MIPS Divide Instruction

Divide (div and divu) leaves the remainder in hi and the quotient in lo:

    div $s0, $s1         # lo = $s0 / $s1
                         # hi = $s0 mod $s1

    op   rs   rt   rd   shamt   funct
    0    16   17   0    0       0x1A

- Instructions mfhi rd and mflo rd move the quotient and remainder to (user-accessible) registers in the register file

As with multiply, divide ignores overflow, so software must determine whether the quotient is too large. Software must also check the divisor to avoid division by 0.

MIPS Division
- Uses the HI/LO registers for the result:
  - HI: 32-bit remainder
  - LO: 32-bit quotient
- Instructions:
  - div rs, rt / divu rs, rt
  - No overflow or divide-by-0 checking; software must perform checks if required
  - Use mfhi, mflo to access the result

Representation of Fractions

A "binary point", like a decimal point, marks the boundary between the integer and fractional parts. Take the example 6-bit representation xx.yyyy, with place values \(2^{1}, 2^{0}, 2^{-1}, 2^{-2}, 2^{-3}, 2^{-4}\):

\[ 10.1010_2 = 1\times2^{1} + 1\times2^{-1} + 1\times2^{-3} = 2.625_{10} \]

If we assume a "fixed binary point", the 6-bit representations in this format range from 0 to 3.9375 (almost 4).

Fractional Powers of 2

    i     2^-i
    0     1.0                 1
    1     0.5                 1/2
    2     0.25                1/4
    3     0.125               1/8
    4     0.0625              1/16
    5     0.03125             1/32
    6     0.015625
    7     0.0078125
    8     0.00390625
    9     0.001953125
    10    0.0009765625
    11    0.00048828125
    12    0.000244140625
    13    0.0001220703125
    14    0.00006103515625
    15    0.000030517578125

Example: 0.828125 and 0.1640625 (done in class)

Representation of Fractions (cont'd)

So far, our examples have used a "fixed" binary point. What we really want is to "float" the binary point. Why?
A floating binary point makes the most effective use of our limited bits, and so gives more accuracy in our number representation.

Example: put 0.1640625 into binary and represent it in 5 bits, choosing where to put the binary point:

    … 000000.001010100000 …

Store the bits 10101 and keep track of the binary point, which sits 2 places to the left of the MSB. Any other choice would lose accuracy!

With a floating-point representation, each numeral carries an exponent field that records the whereabouts of its binary point. The binary point can lie outside the stored bits, so very large and very small numbers can be represented.

Scientific Notation (in Decimal)

\[ 6.02_{10} \times 10^{23} \]

Here 6.02 is the significand, 10 is the radix (base), and 23 is the exponent.

- Normalized form: no leading 0s (exactly one nonzero digit to the left of the decimal point)
- Alternatives to representing 1/1,000,000,000:
  - Normalized: \(1.0 \times 10^{-9}\)
  - Not normalized: \(0.1 \times 10^{-8}\), \(10.0 \times 10^{-10}\)

Scientific Notation (in Binary)

\[ 1.0_{two} \times 2^{-1} \]

Here 1.0 is the significand (with a "binary point"), 2 is the radix (base), and -1 is the exponent.

Computer arithmetic that supports this notation is called floating point, because it represents numbers in which the binary point is not fixed, as it is for integers.
- Declare such a variable in C as float

Floating Point Representation

Normal format: \(+1.xxxxxxxxxx_{two} \times 2^{yyyy_{two}}\)

32-bit version (C "float"):

    bit   31      30 ... 23    22 ... 0
          S       Exponent     Significand
          1 bit   8 bits       23 bits

- S represents the sign
- Exponent represents the y's
- Significand represents the x's

Floating Point Representation (cont'd)
- What if the result is too large? Overflow! The exponent is larger than the 8-bit Exponent field can represent
- What if the result is too small? Underflow! The negative exponent is larger in magnitude than the 8-bit Exponent field can represent

[Figure: number line marked -1, 0, 1, with overflow regions beyond the largest representable magnitudes on both sides and an underflow region around 0]

What would help reduce the chances of overflow and/or underflow?

Double Precision Fl. Pt. Representation

64-bit version (C "double"):

    bit   31      30 ... 20    19 ... 0
          S       Exponent     Significand
          1 bit   11 bits      20 bits
          Significand (cont'd): 32 bits

- Double Precision (vs. Single Precision):
  - C variable declared as double
  - Primary advantage is greater accuracy, due to the larger significand

QUAD Precision Fl. Pt. Representation
- Next multiple of word size (128 bits)
  - Unbelievable range of numbers
  - Unbelievable precision (accuracy)
- Being worked on at the time of these slides (IEEE 754r, since published as IEEE 754-2008)
  - Current version has 15 exponent bits and 112 significand bits (113 precision bits)
- Oct-precision?
  - Some have tried, no real traction so far
- Half-precision?
  - Yep, that's for a short (16 bit)

en.wikipedia.org/wiki/Quad_precision
en.wikipedia.org/wiki/Half_precision

IEEE 754 Floating Point Standard

Single precision (DP similar):

    bit   31      30 ... 23    22 ... 0
          S       Exponent     Significand
          1 bit   8 bits       23 bits

- Sign bit: 1 means negative, 0 means positive
- Significand:
  - To pack more bits, the leading 1 is implicit for normalized numbers
  - 1 + 23 bits single, 1 + 52 bits double
  - Always true: 0 ≤ Significand < 1 (for normalized numbers), where Significand is the stored fraction field
- Note: 0 has no leading 1, so exponent value 0 is reserved just for the number 0

IEEE 754 Floating Point Standard (cont'd)

IEEE 754 uses a "biased exponent" representation:
- Designers wanted FP numbers to be usable even without FP hardware; e.g., sorting records with FP numbers using integer compares
- They wanted a bigger (integer) exponent field to represent bigger numbers
- 2's complement poses a problem, because negative exponents have a leading 1 and so look bigger

Examples: \(1.0 \times 2^{-1}\) and \(1.0 \times 2^{1}\) (done in class)

IEEE 754 Floating Point Standard (cont'd)
- This is called Biased Notation: the bias is the number subtracted from the stored field to get the real exponent
  - IEEE 754 uses a bias of 127 for single precision
  - Subtract 127 from the Exponent field to get the actual value of the exponent
  - 1023 is the bias for double precision

Summary (single precision):

    bit   31      30 ... 23    22 ... 0
          S       Exponent     Significand
          1 bit   8 bits       23 bits

\[ (-1)^{S} \times (1 + \text{Significand}) \times 2^{(\text{Exponent} - 127)} \]

- Double precision is identical, except with an exponent bias of 1023 (half and quad are similar)

Single-Precision Range
- Exponents 00000000 and 11111111 are reserved
- Smallest value:
  - Exponent: 00000001, so actual exponent = 1 - 127 = -126
  - Fraction: 000…00, so significand = 1.0
  - \(\pm 1.0 \times 2^{-126} \approx \pm 1.2 \times 10^{-38}\)
- Largest value:
  - Exponent: 11111110, so actual exponent = 254 - 127 = +127
  - Fraction: 111…11, so significand ≈ 2.0
  - \(\pm 2.0 \times 2^{+127} \approx \pm 3.4 \times 10^{+38}\)

Double-Precision Range
- Exponents 0000…00 and 1111…11 are reserved
- Smallest value:
  - Exponent: 00000000001, so actual exponent = 1 - 1023 = -1022
  - Fraction: 000…00, so significand = 1.0
  - \(\pm 1.0 \times 2^{-1022} \approx \pm 2.2 \times 10^{-308}\)
- Largest value:
  - Exponent: 11111111110, so actual exponent = 2046 - 1023 = +1023
  - Fraction: 111…11, so significand ≈ 2.0
  - \(\pm 2.0 \times 2^{+1023} \approx \pm 1.8 \times 10^{+308}\)

Floating-Point Precision
- Relative precision: all fraction bits are significant
  - Single: approx \(2^{-23}\), equivalent to \(23 \times \log_{10}2 \approx 23 \times 0.3 \approx 6\) decimal digits of precision
  - Double: approx \(2^{-52}\), equivalent to \(52 \times \log_{10}2 \approx 52 \times 0.3 \approx 16\) decimal digits of precision

Floating-Point Example

Represent -0.75:
- \(-0.75 = (-1)^{1} \times 1.1_2 \times 2^{-1}\)
- S = 1
- Fraction = 1000…00two
- Exponent = -1 + Bias
  - Single: -1 + 127 = 126 = 01111110two
  - Double: -1 + 1023 = 1022 = 01111111110two
- Single: 1 01111110 1000…00
- Double: 1 01111111110 1000…00

Floating-Point Example

What number is represented by the single-precision float 1 10000001 01000…00?
- S = 1
- Fraction = 01000…00two
- Exponent = 10000001two = 129
- \(x = (-1)^{1} \times (1 + 0.01_2) \times 2^{(129 - 127)} = (-1) \times 1.25 \times 2^{2} = -5.0\)

Example: Converting Binary FP to Decimal

    0 0110 1000 101 0101 0100 0011 0100 0010   (done in class)

Example: Converting Decimal to FP

Convert \(-2.828125 \times 10^{1}\) (done in class).
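Returning to the integer multiplier: the add-and-right-shift hardware traced earlier can be modeled in C. This is a minimal sketch for 32-bit unsigned operands, whereas the trace used 4 bits. shift_add_multiply is a hypothetical name, and the variable carry stands in for the ALU's carry-out into the product register.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned add-and-right-shift multiply. The 64-bit product register
   starts with the multiplier in its low half and 0 in its high half. */
uint64_t shift_add_multiply(uint32_t multiplicand, uint32_t multiplier)
{
    uint64_t product = multiplier;
    for (int step = 0; step < 32; step++) {
        uint64_t carry = 0;
        if (product & 1u) {
            /* LSB = 1: add the multiplicand into the high half. The sum can
               need 33 bits, so keep its carry out separately. */
            uint64_t hi = (product >> 32) + multiplicand;
            carry = hi >> 32;
            product = (hi << 32) | (product & 0xFFFFFFFFu);
        }
        /* Shift the whole register right; the carry enters at bit 63. */
        product = (product >> 1) | (carry << 63);
    }
    return product;
}
```

As a check, shift_add_multiply(6, 5) returns 30 and shift_add_multiply(11, 13) returns 143, matching the two examples.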
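The Booth's algorithm slides earlier are figures only, so what follows is a sketch of standard radix-2 Booth recoding in C rather than a transcription of those slides. The names are hypothetical. The accumulator is held in an int64_t so that A - M cannot overflow when M is the most negative 32-bit value; a textbook n-bit accumulator needs one extra bit for that case.

```c
#include <assert.h>
#include <stdint.h>

/* Radix-2 Booth multiplication of two's complement operands.
   Registers: A (accumulator), Q (multiplier bits), q_1 (bit right of Q). */
int64_t booth_multiply(int32_t m, int32_t q)
{
    int64_t a = 0;
    uint32_t qbits = (uint32_t)q;
    unsigned q_1 = 0;
    for (int step = 0; step < 32; step++) {
        unsigned q0 = qbits & 1u;
        if (q0 == 1 && q_1 == 0)        /* 10: start of a run of 1s */
            a -= m;
        else if (q0 == 0 && q_1 == 1)   /* 01: end of a run of 1s */
            a += m;
        /* Arithmetic right shift of A:Q:q_1 by one bit. */
        q_1 = q0;
        qbits = (qbits >> 1) | ((uint32_t)(a & 1) << 31);
        a = (a - (a & 1)) / 2;          /* floor(a / 2), sign preserved */
    }
    /* Product = A (high 32 bits) followed by Q (low 32 bits). */
    return (int64_t)(((uint64_t)a << 32) | qbits);
}
```

No sign correction step is needed: booth_multiply(3, -2) returns -6 directly. That is the advantage of Solution 2 over Solution 1.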
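The left-shift-and-subtract division hardware traced earlier can also be modeled in C. This is a sketch for 32-bit unsigned operands; restoring_divide is a hypothetical name. It takes one liberty with the hardware. The hardware subtracts, tests the sign, and adds the divisor back ("restore remainder"). The code instead compares first and commits the subtraction only when the result would be non-negative, which yields the same quotient bits.

```c
#include <assert.h>
#include <stdint.h>

/* Restoring division: the partial remainder and the dividend/quotient
   shift left together; each step produces one quotient bit. */
void restoring_divide(uint32_t dividend, uint32_t divisor,
                      uint32_t *quotient, uint32_t *remainder)
{
    assert(divisor != 0);          /* the hardware itself does not check */
    uint64_t rem = 0;              /* needs 33 bits right after a shift */
    uint32_t q = dividend;         /* dividend bits out, quotient bits in */
    for (int step = 0; step < 32; step++) {
        rem = (rem << 1) | (q >> 31);   /* shift rem:q left by one */
        q <<= 1;
        if (rem >= divisor) {
            rem -= divisor;        /* remainder non-negative: quotient bit = 1 */
            q |= 1u;
        }
        /* else: remainder would go negative, so quotient bit = 0 and the
           old remainder is kept (the "restore" step) */
    }
    *quotient = q;
    *remainder = (uint32_t)rem;
}
```

restoring_divide(6, 2, &q, &r) gives q = 3 and r = 0. restoring_divide(147, 11, &q, &r) gives q = 13 and r = 4, matching the long-division example 10010011 / 1011.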
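mult and div leave their results in HI/LO, and div performs no checks. Software therefore has two jobs: test HI after a multiply, and screen the operands before a divide. Below is a C sketch of that bookkeeping. The HiLo struct and all function names are hypothetical. The sketch assumes that MIPS div truncates toward zero and gives the remainder the dividend's sign, as C's / and % do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint32_t hi, lo; } HiLo;   /* stand-in for HI/LO */

/* mult: signed 64-bit product, high word to HI, low word to LO. */
HiLo mips_mult(int32_t rs, int32_t rt)
{
    int64_t p = (int64_t)rs * rt;
    HiLo r = { (uint32_t)((uint64_t)p >> 32), (uint32_t)p };
    return r;
}

/* The product fits in 32 bits (what mul keeps) only if HI is just
   the sign extension of LO's bit 31. */
bool mult_fits_32(HiLo r)
{
    uint32_t expected_hi = (r.lo & 0x80000000u) ? 0xFFFFFFFFu : 0u;
    return r.hi == expected_hi;
}

/* div: quotient to LO, remainder to HI, after the software checks. */
bool checked_div(int32_t rs, int32_t rt, HiLo *out)
{
    if (rt == 0) return false;                      /* divide by zero */
    if (rs == INT32_MIN && rt == -1) return false;  /* quotient 2^31 too large */
    out->lo = (uint32_t)(rs / rt);
    out->hi = (uint32_t)(rs % rt);
    return true;
}
```

For example, mips_mult(100000, 100000) leaves HI = 2, so mult_fits_32 reports that the product does not fit in 32 bits.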
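One standard way to convert fractions such as 0.828125 and 0.1640625 to binary is repeated doubling. Each doubling moves the binary point one place right, and the integer part that appears is the next bit. Below is a C sketch; fraction_to_binary is a hypothetical helper. It is exact only for values a double represents exactly, which includes every entry in the powers-of-2 table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write "0." followed by up to max_bits binary digits of x (0 <= x < 1).
   Stops early once the remaining fraction is exactly 0. */
void fraction_to_binary(double x, int max_bits, char *buf, size_t buflen)
{
    assert(x >= 0.0 && x < 1.0);
    assert(max_bits >= 0 && buflen >= (size_t)max_bits + 3); /* "0." + digits + '\0' */
    strcpy(buf, "0.");
    size_t n = 2;
    for (int i = 0; i < max_bits && x != 0.0; i++) {
        x *= 2.0;                 /* move the binary point one place right */
        if (x >= 1.0) {
            buf[n++] = '1';       /* the 2^-(i+1) place is present */
            x -= 1.0;
        } else {
            buf[n++] = '0';
        }
    }
    buf[n] = '\0';
}
```

For instance, fraction_to_binary(0.625, 8, buf, sizeof buf) writes "0.101", which is the fractional part of 10.1010two = 2.625.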
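The summary formula \((-1)^{S} \times (1 + \text{Significand}) \times 2^{(\text{Exponent} - 127)}\) can be checked in C by pulling the three fields out of a float's bit pattern. This is a minimal sketch for normalized numbers only (exponent field 1 to 254), and the function names are hypothetical. memcpy is used to read the bits because it is the portable way to reinterpret a float in C.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Evaluate (-1)^S x (1 + Significand) x 2^(Exponent - 127). */
double decode_normalized_single(uint32_t bits)
{
    uint32_t s = bits >> 31;
    uint32_t e = (bits >> 23) & 0xFFu;     /* 8-bit biased exponent */
    uint32_t f = bits & 0x7FFFFFu;         /* 23-bit fraction field */
    assert(e >= 1 && e <= 254);            /* normalized numbers only */
    double value = 1.0 + f / 8388608.0;    /* restore hidden 1: 1 + f / 2^23 */
    int exp = (int)e - 127;                /* remove the bias */
    for (; exp > 0; exp--) value *= 2.0;   /* scale by 2^exp (exact) */
    for (; exp < 0; exp++) value /= 2.0;
    return s ? -value : value;
}

/* Read a float's bit pattern without undefined behaviour. */
uint32_t float_bits(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}
```

decode_normalized_single(0xBF400000) returns -0.75 and decode_normalized_single(0xC0A00000) returns -5.0, the two worked examples. float_bits(0.5f) returns 0x3F000000, whose exponent field 126 = -1 + 127 shows the bias at work.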
Representation for 0
- Represent 0?
  - Exponent all zeroes
  - Significand all zeroes
- What about the sign? Both cases are valid:

      +0: 0 00000000 00000000000000000000000
      -0: 1 00000000 00000000000000000000000

Special Numbers

What have we defined so far? (Single Precision)

    Exponent   Significand   Object
    0          0             0
    0          nonzero       ???
    1-254      anything      +/- fl. pt. #
    255        0             +/- ∞
    255        nonzero       ???

Representation for Not a Number
- What do I get if I calculate sqrt(-4.0) or 0/0?
  - If ∞ is not an error, these shouldn't be either
  - The result is called Not a Number (NaN)
  - Exponent = 255, Significand nonzero
- Why is this useful?
  - Hope NaNs help with debugging?
  - They contaminate: op(NaN, X) = NaN

Infinities and NaNs
- Exponent = 111...1, Fraction = 000...0
  - ±Infinity
  - Can be used in subsequent calculations, avoiding the need for an overflow check
- Exponent = 111...1, Fraction ≠ 000...0
  - Not-a-Number (NaN)
  - Indicates an illegal or undefined result, e.g., 0.0 / 0.0
  - Can be used in subsequent calculations

Representation for Denorms

Problem: there is a gap among representable FP numbers around 0. Normalization and the implicit 1 are to blame! (done in class)

[Figure: number line marked 0, a, b, showing the gaps between representable numbers near 0]

Representation for Denorms (cont'd)

Solution:
- We still haven't used Exponent = 0, Significand nonzero
- Denormalized number: no (implied) leading 1, implicit exponent = -126
- Smallest representable positive number: \(a = 2^{-149}\)
- Second smallest representable positive number: \(b = 2^{-148}\)

[Figure: number line from 0 showing the evenly spaced denorms]

Special Numbers

What have we defined so far? (Single Precision)

    Exponent   Significand   Object
    0          0             0
    0          nonzero       Denorm
    1-254      anything      +/- fl. pt. #
    255        0             +/- ∞
    255        nonzero       NaN

Floating-Point Addition

Consider a 4-digit decimal example: \(9.999 \times 10^{1} + 1.610 \times 10^{-1}\)
1. Align decimal points
   - Shift the number with the smaller exponent: \(9.999 \times 10^{1} + 0.016 \times 10^{1}\)
2. Add significands
   - \(9.999 \times 10^{1} + 0.016 \times 10^{1} = 10.015 \times 10^{1}\)
3. Normalize the result and check for over/underflow
   - \(1.0015 \times 10^{2}\)
4. Round and renormalize if necessary
   - \(1.002 \times 10^{2}\)

Floating Point Addition

Addition (and subtraction): \((F1 \times 2^{E1}) + (F2 \times 2^{E2}) = F3 \times 2^{E3}\)
- Step 0: Restore the hidden bit in F1 and in F2
- Step 1: Align fractions by right shifting F2 by E1 - E2 positions (assuming E1 ≥ E2), keeping track of (three of) the bits shifted out in G, R, and S
- Step 2: Add the resulting F2 to F1 to form F3
- Step 3: Normalize F3 (so it is in the form 1.XXXXX…)
  - If F1 and F2 have the same sign, F3 ∈ [1,4), so at most a 1-bit right shift of F3 and an increment of E3 are needed (check for overflow)
  - If F1 and F2 have different signs, F3 may require many left shifts, each time decrementing E3 (check for underflow)
- Step 4: Round F3 and possibly normalize F3 again
- Step 5: Rehide the most significant bit of F3 before storing the result

Floating Point Addition Example

Add \((0.5 = 1.0000 \times 2^{-1}) + (-0.4375 = -1.1100 \times 2^{-2})\)
- Step 0: Hidden bits are already restored in the representation above
- Step 1: Shift the significand with the smaller exponent (1.1100) right until its exponent matches the larger exponent (so once): \(-1.1100 \times 2^{-2} = -0.1110 \times 2^{-1}\)
- Step 2: Add significands: 1.0000 + (-0.1110) = 1.0000 - 0.1110 = 0.0010, giving \(0.001 \times 2^{-1}\)
- Step 3: Normalize the sum, checking for exponent over/underflow: \(0.001 \times 2^{-1} = 0.010 \times 2^{-2} = 0.100 \times 2^{-3} = 1.000 \times 2^{-4}\)
- Step 4: The sum is already rounded, so we're done
- Step 5: Rehide the hidden bit before storing

Floating Point Multiplication

Multiplication: \((F1 \times 2^{E1}) \times (F2 \times 2^{E2}) = F3 \times 2^{E3}\)
- Step 0: Restore the hidden bit in F1 and in F2
- Step 1: Add the two (biased) exponents and subtract the bias from the sum, so E1 + E2 - 127 = E3. Also determine the sign of the product, which depends on the signs of the operands (their most significant bits)
- Step 2: Multiply F1 by F2 to form a double precision F3
- Step 3: Normalize F3 (so it is in the form 1.XXXXX…)
  - Since F1 and F2 come in normalized, F3 ∈ [1,4), so at most a 1-bit right shift of F3 and an increment of E3 are needed
  - Check for overflow/underflow
- Step 4: Round F3 and possibly normalize F3 again
- Step 5: Rehide the most significant bit of F3 before storing the result

Floating Point Multiplication Example

Multiply \((0.5 = 1.0000 \times 2^{-1}) \times (-0.4375 = -1.1100 \times 2^{-2})\)
- Step 0: Hidden bits are already restored in the representation above
- Step 1: Add the exponents. Without the bias this is -1 + (-2) = -3. With the bias it is (-1 + 127) + (-2 + 127) - 127 = (-1 - 2) + (127 + 127 - 127) = -3 + 127 = 124. The operand signs differ, so the product is negative
- Step 2: Multiply the significands: 1.0000 x 1.1100 = 1.11000000
- Step 3: Normalize the product, checking for exponent over/underflow: \(1.11000000 \times 2^{-3}\) is already normalized
- Step 4: The product is already rounded, so we're done: \(-1.1100 \times 2^{-3} = -0.21875\)
- Step 5: Rehide the hidden bit before storing

Floating Point Examples
- Addition: (0.75) + (-0.375)
- Multiplication: (0.75) * (-0.375)

Accurate Arithmetic
- IEEE Std 754 specifies additional rounding control:
  - Extra bits of precision (guard, round, sticky)
  - Choice of rounding modes
  - Allows the programmer to fine-tune the numerical behavior of a computation
- Not all FP units implement all options
  - Most programming languages and FP libraries just use defaults
- Trade-off between hardware complexity, performance, and market requirements

MIPS Floating Point Instructions

MIPS has a separate floating point register file ($f0, $f1, …, $f31), whose registers are used in pairs for double precision values. Special instructions load to and store from them:

    lwc1  $f1, 54($s2)       # $f1 = Memory[$s2+54]
    swc1  $f1, 58($s4)       # Memory[$s4+58] = $f1

MIPS supports IEEE 754 single precision operations

    add.s $f2, $f4, $f6      # $f2 = $f4 + $f6

and double precision operations

    add.d $f2, $f4, $f6      # $f2||$f3 = $f4||$f5 + $f6||$f7

and similarly sub.s, sub.d, mul.s, mul.d, div.s, div.d.

MIPS Floating Point Instructions, Cont'd

Floating point single precision comparison operations (the comment shows x = lt):

    c.x.s $f2, $f4           # if ($f2 < $f4) cond = 1; else cond = 0

where x may be eq, neq, lt, le, gt, ge. Double precision comparison operations:

    c.x.d $f2, $f4           # if ($f2||$f3 < $f4||$f5) cond = 1; else cond = 0

Floating point branch operations (the offset is in words, so 25 words = 100 bytes):

    bc1t 25                  # if (cond == 1) go to PC+4+100
    bc1f 25                  # if (cond == 0) go to PC+4+100

FP Example: °F to °C

C code:

    float f2c (float fahr)
    {
        return ((5.0/9.0) * (fahr - 32.0));
    }

- fahr in $f12, result in $f0, literals in global memory space

Compiled MIPS code:

    f2c: lwc1  $f16, const5($gp)    # $f16 = 5.0
         lwc1  $f18, const9($gp)    # $f18 = 9.0
         div.s $f16, $f16, $f18     # $f16 = 5.0 / 9.0
         lwc1  $f18, const32($gp)   # $f18 = 32.0
         sub.s $f18, $f12, $f18     # $f18 = fahr - 32.0
         mul.s $f0,  $f16, $f18     # $f0 = (5.0 / 9.0) * (fahr - 32.0)
         jr    $ra                  # return

FP Example: Array Multiplication

X = X + Y × Z
- All 32 × 32 matrices, 64-bit double-precision elements

C code (C requires the inner array dimension in the parameter types):

    void mm (double x[][32], double y[][32], double z[][32])
    {
        int i, j, k;
        for (i = 0; i != 32; i = i + 1)
            for (j = 0; j != 32; j = j + 1)
                for (k = 0; k != 32; k = k + 1)
                    x[i][j] = x[i][j] + y[i][k] * z[k][j];
    }

- Addresses of x, y, z in $a0, $a1, $a2, and i, j, k in $s0, $s1, $s2

MIPS code:

        li    $t1, 32          # $t1 = 32 (row size/loop end)
        li    $s0, 0           # i = 0; initialize 1st for loop
    L1: li    $s1, 0           # j = 0; restart 2nd for loop
    L2: li    $s2, 0           # k = 0; restart 3rd for loop
        sll   $t2, $s0, 5      # $t2 = i * 32 (size of row of x)
        addu  $t2, $t2, $s1    # $t2 = i * size(row) + j
        sll   $t2, $t2, 3      # $t2 = byte offset of [i][j]
        addu  $t2, $a0, $t2    # $t2 = byte address of x[i][j]
        l.d   $f4, 0($t2)      # $f4 = 8 bytes of x[i][j]
    L3: sll   $t0, $s2, 5      # $t0 = k * 32 (size of row of z)
        addu  $t0, $t0, $s1    # $t0 = k * size(row) + j
        sll   $t0, $t0, 3      # $t0 = byte offset of [k][j]
        addu  $t0, $a2, $t0    # $t0 = byte address of z[k][j]
        l.d   $f16, 0($t0)     # $f16 = 8 bytes of z[k][j]
        sll   $t0, $s0, 5      # $t0 = i * 32 (size of row of y)
        addu  $t0, $t0, $s2    # $t0 = i * size(row) + k
        sll   $t0, $t0, 3      # $t0 = byte offset of [i][k]
        addu  $t0, $a1, $t0    # $t0 = byte address of y[i][k]
        l.d   $f18, 0($t0)     # $f18 = 8 bytes of y[i][k]
        mul.d $f16, $f18, $f16 # $f16 = y[i][k] * z[k][j]
        add.d $f4, $f4, $f16   # $f4 = x[i][j] + y[i][k] * z[k][j]
        addiu $s2, $s2, 1      # k = k + 1
        bne   $s2, $t1, L3     # if (k != 32) go to L3
        s.d   $f4, 0($t2)      # x[i][j] = $f4
        addiu $s1, $s1, 1      # j = j + 1
        bne   $s1, $t1, L2     # if (j != 32) go to L2
        addiu $s0, $s0, 1      # i = i + 1
        bne   $s0, $t1, L1     # if (i != 32) go to L1

Problem

Calculate the sum of A and B assuming the 16-bit NVIDIA format (1 sign bit, 5 exponent bits, and 10 significand bits), as well as 1 guard bit, 1 round bit, and 1 sticky bit.
- A = \(-1.278 \times 10^{3}\)
- B = \(-3.90625 \times 10^{-1}\)

Problem

Calculate the product of A and B assuming the 16-bit NVIDIA format (1 sign bit, 5 exponent bits, and 10 significand bits), as well as 1 guard bit, 1 round bit, and 1 sticky bit.
- A = \(5.66015625 \times 10^{0}\)
- B = \(8.59375 \times 10^{0}\)

Associativity
- Parallel programs may interleave operations in unexpected orders
  - Assumptions of associativity may fail

                   (x+y)+z      x+(y+z)
    x              -1.50E+38    -1.50E+38
    y               1.50E+38     1.50E+38
    z               1.0          1.0
    x+y             0.00E+00
    y+z                          1.50E+38
    result          1.00E+00     0.00E+00

Problem

Calculate (A+B)+C and A+(B+C) assuming the 16-bit NVIDIA format (1 sign bit, 5 exponent bits, and 10 significand bits), as well as 1 guard bit, 1 round bit, and 1 sticky bit.
- A = \(2.865625 \times 10^{1}\)
- B = \(4.140625 \times 10^{-1}\)
- C = \(1.2140625 \times 10^{1}\)
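The completed Special Numbers table from earlier in these notes translates into a small classifier over a single-precision bit pattern. The denorm value rule (no hidden 1, exponent fixed at -126) adds one more function. Below is a C sketch; the enum and function names are hypothetical. It assumes the platform keeps denorms rather than flushing them to zero, which is the default on mainstream compilers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Categories from the Special Numbers table (names are illustrative). */
typedef enum { SP_ZERO, SP_DENORM, SP_NORMAL, SP_INF, SP_NAN } SpClass;

SpClass classify_single(uint32_t bits)
{
    uint32_t e = (bits >> 23) & 0xFFu;   /* 8-bit exponent field */
    uint32_t f = bits & 0x7FFFFFu;       /* 23-bit significand field */
    if (e == 0)   return f == 0 ? SP_ZERO : SP_DENORM;
    if (e == 255) return f == 0 ? SP_INF : SP_NAN;
    return SP_NORMAL;
}

/* Denorm value: (-1)^S x 0.Significand x 2^-126 = (-1)^S x f x 2^-149. */
double decode_denorm_single(uint32_t bits)
{
    uint32_t f = bits & 0x7FFFFFu;
    assert(((bits >> 23) & 0xFFu) == 0 && f != 0);
    double v = (double)f;
    for (int i = 0; i < 149; i++)
        v /= 2.0;                        /* exact: scaling by a power of 2 */
    return (bits >> 31) ? -v : v;
}

/* Portable way to read a float's bit pattern. */
uint32_t bits_of(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}
```

Dividing a runtime 0.0f by itself and classifying the result gives SP_NAN. decode_denorm_single(0x00000001) equals \(2^{-149}\), the smallest positive denorm a.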
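The Floating Point Addition steps from earlier in these notes can be traced on a toy format with 4 fraction bits, like the 1.xxxx examples. This is a simplified C sketch and not IEEE-exact. The Toy struct, FRAC and the function names are hypothetical. Significands are stored with the hidden 1 already restored (Step 0), so Step 5 does not apply. Rounding (Step 4) is simplified to truncation; three extra low-order bits are kept during alignment, but without a sticky OR.

```c
#include <assert.h>
#include <stdlib.h>

#define FRAC 4   /* fraction bits kept, as in the 1.xxxx examples (an assumption) */

/* Toy float: value = (-1)^neg x (sig / 2^FRAC) x 2^exp, where sig
   includes the hidden 1, so 2^FRAC <= sig < 2^(FRAC+1). */
typedef struct { int neg; int exp; long sig; } Toy;

Toy toy_add(Toy a, Toy b)
{
    if (a.exp < b.exp) { Toy t = a; a = b; b = t; }   /* ensure E1 >= E2 */
    int d = a.exp - b.exp;
    long x = a.sig << 3;                              /* 3 extra low bits */
    long y = (d < 40) ? (b.sig << 3) >> d : 0;        /* Step 1: align F2 */
    long sum = (a.neg ? -x : x) + (b.neg ? -y : y);   /* Step 2: add */
    if (sum == 0) { Toy z = { 0, 0, 0 }; return z; }  /* exact zero (sig 0) */
    int neg = sum < 0;
    long mag = labs(sum);
    int exp = a.exp;
    /* Step 3: normalize to 2^(FRAC+3) <= mag < 2^(FRAC+4). */
    while (mag >= (2L << (FRAC + 3))) { mag >>= 1; exp++; }  /* same signs: at most once */
    while (mag <  (1L << (FRAC + 3))) { mag <<= 1; exp--; }  /* after cancellation */
    Toy r = { neg, exp, mag >> 3 };    /* Step 4, simplified: truncate */
    return r;
}

/* Convert a toy value to double for checking (exact for small exponents). */
double toy_value(Toy t)
{
    double v = (double)t.sig;
    for (int i = 0; i < FRAC; i++) v /= 2.0;
    for (int e = t.exp; e > 0; e--) v *= 2.0;
    for (int e = t.exp; e < 0; e++) v /= 2.0;
    return t.neg ? -v : v;
}
```

Adding 0.5 = {0, -1, 16} and -0.4375 = {1, -2, 28} returns {0, -4, 16}. That is \(1.0000 \times 2^{-4} = 0.0625\), the result of the worked example.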
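The Floating Point Multiplication steps can be sketched on the same kind of toy format. The Toy struct, FRAC and the function names are hypothetical, and rounding is simplified to truncation. Exponents are kept unbiased inside the struct. The separate biased_exp_sum function shows the E1 + E2 - 127 rule from Step 1 on stored single-precision fields.

```c
#include <assert.h>

#define FRAC 4   /* fraction bits kept (an assumption matching the examples) */

/* Toy float: value = (-1)^neg x (sig / 2^FRAC) x 2^exp, hidden 1 included. */
typedef struct { int neg; int exp; long sig; } Toy;

/* Step 1 on biased fields: E1 + E2 - 127 (single-precision bias). */
int biased_exp_sum(int e1, int e2) { return e1 + e2 - 127; }

Toy toy_mul(Toy a, Toy b)
{
    int neg = a.neg ^ b.neg;        /* Step 1: sign from the operands' signs */
    int exp = a.exp + b.exp;        /* Step 1: add (unbiased) exponents */
    long prod = a.sig * b.sig;      /* Step 2: double-width, 2*FRAC fraction bits */
    /* Step 3: [1,2) x [1,2) lies in [1,4), so at most one right shift. */
    if (prod >= (2L << (2 * FRAC))) { prod >>= 1; exp++; }
    Toy r = { neg, exp, prod >> FRAC };   /* Step 4, simplified: truncate */
    return r;
}

/* Convert a toy value to double for checking (exact for small exponents). */
double toy_value(Toy t)
{
    double v = (double)t.sig;
    for (int i = 0; i < FRAC; i++) v /= 2.0;
    for (int e = t.exp; e > 0; e--) v *= 2.0;
    for (int e = t.exp; e < 0; e++) v /= 2.0;
    return t.neg ? -v : v;
}
```

Multiplying 0.5 = {0, -1, 16} by -0.4375 = {1, -2, 28} returns {1, -3, 28}, i.e. \(-1.1100 \times 2^{-3} = -0.21875\). biased_exp_sum(126, 125) reproduces the 124 from Step 1 of the example.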
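The 16-bit problems keep 1 guard, 1 round and 1 sticky bit. The rounding step can be sketched as follows, assuming IEEE 754's default round-to-nearest, ties-to-even mode. Both function names are hypothetical. shift_right_sticky shows how the sticky bit collects every 1 shifted out during alignment, so that an exact tie can be told apart from "just above halfway".

```c
#include <assert.h>
#include <stdint.h>

/* Round a significand carrying 3 extra low-order bits: guard (G), round (R)
   and sticky (S). Returns the significand with those bits removed; the
   caller renormalizes if the increment carries out. */
uint32_t round_grs(uint32_t sig_with_grs)
{
    uint32_t sig = sig_with_grs >> 3;
    uint32_t g   = (sig_with_grs >> 2) & 1u;
    uint32_t rs  = sig_with_grs & 3u;   /* R and S together */
    /* G = 0: below halfway, truncate.
       G = 1 and R|S = 1: above halfway, round up.
       G = 1 and R = S = 0: exact tie, round up only if sig is odd. */
    if (g && (rs != 0 || (sig & 1u)))
        sig += 1;
    return sig;
}

/* Shift right by n, OR-ing every bit that falls off into bit 0 (sticky). */
uint32_t shift_right_sticky(uint32_t x, unsigned n)
{
    if (n == 0) return x;
    if (n >= 32) return x != 0;
    uint32_t lost = x & ((1u << n) - 1u);
    return (x >> n) | (lost != 0);
}
```

For example, 1011 with GRS = 100 is an exact tie on an odd significand, so round_grs returns 1100. 1010 with GRS = 100 stays 1010.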
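The associativity table can be reproduced with C floats. This is a minimal sketch assuming IEEE 754 single precision, and sum_left and sum_right are hypothetical names. Storing each intermediate in a float variable forces it to be rounded to single precision before it is reused.

```c
#include <assert.h>

/* (x+y)+z: the huge values cancel first, so z survives. */
float sum_left(float x, float y, float z)
{
    float t = x + y;
    return t + z;
}

/* x+(y+z): z is far below half an ulp of y, so y+z rounds back to y. */
float sum_right(float x, float y, float z)
{
    float t = y + z;
    return x + t;
}
```

sum_left(-1.5e38f, 1.5e38f, 1.0f) returns 1.0f, while sum_right on the same inputs returns 0.0f: the 1.0 added to 1.5E+38 is lost to rounding.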