Floating Point Numbers & Parallel Computing
3.141592653589793238462643383…

Outline
• Fixed-point Numbers
• Floating Point Numbers
• Superscalar Processors
• Multithreading
• Homogeneous Multiprocessing
• Heterogeneous Multiprocessing

Fixed-point Numbers
• How do we represent rational numbers in binary?
• One way: define a binary "point" between the integer and fraction bits
• Analogous to the decimal point between the integer and fraction parts of a number like 6.75
• The point's position is static (it cannot be changed)
• E.g., the point sits between bits 3 and 4 of a byte (numbering bits from 0 at the LSB): 0110.1100
  • 4 bits for the integer component, 4 bits for the fraction component
• Integer component: interpreted as before, with the LSB worth 2^0
  • 0110 = 2^2 + 2^1 = 4 + 2 = 6
• Fraction component: interpreted slightly differently, with the MSB worth 2^-1
  • .1100 = 2^-1 + 2^-2 = 0.5 + 0.25 = 0.75
• Together: 0110.1100 = 6 + 0.75 = 6.75
• How do we represent negative numbers? With 2's complement notation
• E.g., 1101.1010 represents -2.375:
  1. Invert the bits: 0010.0101
  2. Add 1: 0010.0110
  3. Convert to fixed-point decimal: 0010 = 2^1 = 2 and .0110 = 2^-2 + 2^-3 = 0.25 + 0.125 = 0.375, so 2.375
  4. Multiply by -1: -2.375

Floating Point Numbers
• Analogous to scientific notation, e.g., 4.1 × 10^3 = 4100
• Gets around the limitation of fixed integer and fraction sizes
• Allows representation of very small and very large numbers
• Just like scientific notation, a floating point number has:
  • a sign (±)
  • a mantissa (M)
  • a base (B)
  • an exponent (E)
  • For 4.1 × 10^3 = 4100: M = 4.1, B = 10, E = 3
• 32-bit binary layout: 1 sign bit, 8 exponent bits, 23 mantissa bits
• Example: convert 228 to floating point
  • 228 = 1110 0100 = 1.1100100 × 2^7
  • sign = positive (0), exponent = 7, mantissa = 1.1100100, base = 2 (implicit)
  • First attempt at an encoding: 0 | 0000 0111 | 1110 0100 0000 0000 0000 000
• In binary floating point, the MSB of the mantissa is always 1
  • No need to store it (the 1 is implied): the "implicit leading 1"
  • 0 | 0000 0111 | 1110 0100 0000 0000 0000 000 becomes 0 | 0000 0111 | 1100 1000 0000 0000 0000 000
• The exponent must represent both positive and negative values, so floating point uses a biased exponent
  • The stored value is the original exponent plus a constant bias; 32-bit floating point uses a bias of 127
  • E.g., exponent -4 (2^-4) is stored as -4 + 127 = 123 = 0111 1011
  • E.g., exponent 7 (2^7) is stored as 7 + 127 = 134 = 1000 0110
  • 0 | 0000 0111 | 1100 1000 … becomes 0 | 1000 0110 | 1100 1000 …
• Final result: 228 in floating point binary (IEEE 754 standard)
  • 0 | 1000 0110 | 1100 1000 0000 0000 0000 000
  • sign bit = 0 (positive)
  • 8-bit biased exponent: E = stored value - bias = 134 - 127 = 7
  • 23-bit mantissa, stored without the implicit leading 1
• Special cases: 0, ±∞, NaN

  value   sign bit   exponent   mantissa
  0       N/A        00000000   00…000
  +∞      0          11111111   00…000
  -∞      1          11111111   00…000
  NaN     N/A        11111111   non-zero

• Single versus double precision
  • Single: 32-bit float, range ±1.175494 × 10^-38 to ±3.402824 × 10^38
  • Double: 64-bit double, range ±2.22507385850720 × 10^-308 to ±1.79769313486232 × 10^308

  format   bits (total)   sign bits   exponent bits   mantissa bits
  float    32             1           8               23
  double   64             1           11              52

• Both the fixed-point and the floating point conversions above are checked with short code sketches below.
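A minimal Python sketch of the fixed-point scheme described earlier, assuming the 4-integer-bit / 4-fraction-bit byte layout from the examples; the helper names to_fixed and from_fixed are invented for illustration, not part of any standard API.

```python
# Sketch of the 4.4 fixed-point format (4 integer bits, 4 fraction bits,
# two's complement in an 8-bit byte). Illustrative only.

def to_fixed(value, frac_bits=4, total_bits=8):
    """Encode a number as a two's-complement fixed-point bit pattern."""
    scaled = round(value * (1 << frac_bits))     # shift the point 4 places right
    return scaled & ((1 << total_bits) - 1)      # wrap into 8 bits (two's complement)

def from_fixed(bits, frac_bits=4, total_bits=8):
    """Decode a two's-complement fixed-point bit pattern back to a number."""
    if bits & (1 << (total_bits - 1)):           # sign bit set: value is negative
        bits -= 1 << total_bits
    return bits / (1 << frac_bits)               # shift the point 4 places left

print(f"{to_fixed(6.75):08b}")    # 01101100, i.e. 0110.1100
print(f"{to_fixed(-2.375):08b}")  # 11011010, i.e. 1101.1010
print(from_fixed(0b01101100))     # 6.75
print(from_fixed(0b11011010))     # -2.375
```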
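The IEEE 754 encoding of 228 can likewise be verified by pulling the bit pattern out of a real float. The sketch below uses only Python's standard struct module; the bit masks simply mirror the 1 / 8 / 23 field split described above.

```python
# Check the worked example: 228 as a single-precision IEEE 754 float.
import struct

bits = int.from_bytes(struct.pack(">f", 228.0), "big")   # raw 32-bit pattern

sign     = bits >> 31                 # 1 sign bit
exponent = (bits >> 23) & 0xFF        # 8-bit biased exponent
mantissa = bits & 0x7FFFFF            # 23 mantissa bits (implicit leading 1 dropped)

print(f"{sign} {exponent:08b} {mantissa:023b}")
# 0 10000110 11001000000000000000000
print("unbiased exponent =", exponent - 127)   # 7, after removing the bias of 127
```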
Superscalar Processors
• Multiple hardwired copies of the datapath
• Allows multiple instructions to execute simultaneously
• E.g., a 2-way superscalar processor:
  • Fetches / executes 2 instructions per cycle
  • 2 ALUs
  • 2-port memory unit
  • 6-port register file (4 ports for source operands, 2 for write-back)
• [Figure: datapath for a 2-way superscalar processor, showing the 6-port register file, 2 ALUs, and 2-port memory unit]
• [Figure: pipeline for a 2-way superscalar processor, 2 instructions per cycle]
• Commercial processors can be 3-, 4-, or even 6-way superscalar
• It is very difficult to manage the resulting dependencies and hazards
• [Image: Intel Nehalem (6-way superscalar)]

Multithreading (Terms)
• Process: a program running on a computer
  • Multiple processes can run at the same time, e.g., music player, web browser, anti-virus, word processor
• Thread: each process has one or more threads that can run simultaneously
  • E.g., a word processor may have threads to read input, print, spell check, and auto-save
• Instruction-level parallelism (ILP): the number of instructions that can be executed simultaneously for a given program and microarchitecture
  • Practical processors rarely achieve an ILP greater than 2 or 3
• Thread-level parallelism (TLP): the degree to which a process can be split into threads

Multithreading
• Keeps a processor with many execution units busy, even if ILP is low or the program is stalled (e.g., waiting for memory)
• On a single-core processor, threads give the illusion of simultaneous execution
  • Threads take turns executing, as scheduled by the OS
  • The OS decides when a thread's turn begins and ends
• When one thread's turn ends:
  1. The OS saves the architectural state of the running thread
  2. The OS loads the architectural state of another thread
  3. The new thread begins executing
• This is called a context switch
• If context switches are fast enough, the user perceives the threads as running simultaneously, even on a single core
• Multithreading does NOT improve ILP, but it DOES improve processor throughput; threads use resources that would otherwise sit idle (see the sketch below)
• Multithreading is relatively inexpensive: only the PC and register file need to be saved
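To make the throughput point concrete, here is a small Python sketch; the half-second sleep is an invented stand-in for a stall such as waiting on memory or I/O, and the timings are illustrative rather than measurements of real hardware.

```python
# Sketch: threads improve throughput when work spends much of its time stalled.
import threading
import time

def task(name):
    time.sleep(0.5)            # simulated stall; the core would otherwise sit idle
    print(f"{name} finished")

start = time.time()
threads = [threading.Thread(target=task, args=(f"thread-{i}",)) for i in range(2)]
for t in threads:
    t.start()                  # both threads become runnable
for t in threads:
    t.join()                   # wait for both to complete
print(f"2 threads took {time.time() - start:.2f} s")   # about 0.5 s, not 1.0 s
```

While one thread is stalled the other runs, so the idle time is overlapped; no single instruction executes any faster, but more total work completes per unit time.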
Homogeneous Multiprocessing
• AKA symmetric multiprocessing (SMP)
• 2 or more identical processors with a single shared memory
• Easier to design than heterogeneous multiprocessing
• The cores can be on the same chip or on different chips
• Around 2005, mainstream processor architectures shifted to SMP
• Multiple cores can execute threads concurrently: true simultaneous execution
• Multi-threaded programming can be tricky
• [Figure: threads taking turns on a single core vs. threads running simultaneously on cores #1 through #4 of a multi-core processor]

Heterogeneous Multiprocessing
• AKA asymmetric multiprocessing (AMP)
• 2 (or more) different processors
• Specialized processors are used for specific tasks, e.g., graphics, floating point, FPGAs
• Adds complexity
• [Image: Nvidia GPU]
• Clustered:
  • Each processor has its own memory
  • E.g., PCs connected on a network
  • Memory is not shared, so information must be passed between nodes
  • Passing data between nodes can be costly (a toy message-passing sketch follows below)
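The clustered model, separate memories with explicit message passing, can be sketched in miniature with ordinary OS processes, which also have separate address spaces. The example below uses Python's standard multiprocessing module purely as an analogy; a real cluster would use a network library or MPI instead, and the worker function here is invented for illustration.

```python
# Sketch of the "no shared memory, pass messages" idea using two OS processes
# connected by a pipe. Each process has its own memory, so data is sent explicitly.
from multiprocessing import Process, Pipe

def worker(conn):
    data = conn.recv()                    # receive work from the other "node"
    conn.send(sum(x * x for x in data))   # send the result back as a message
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    p = Process(target=worker, args=(child_conn,))
    p.start()
    parent_conn.send(list(range(1000)))   # explicit transfer, not a shared variable
    print("result:", parent_conn.recv())
    p.join()
```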