Floating Point Numbers & Parallel Computing

Outline
• Fixed-point Numbers
• Floating Point Numbers
• Superscalar Processors
• Multithreading
• Homogeneous Multiprocessing
• Heterogeneous Multiprocessing
Fixed-point Numbers
• How to represent rational numbers in binary?
• One way: define binary “point” between integer and fraction
• Analogous to point between integer and fraction for decimal
numbers:
• E.g., 6.75: integer is 6, fraction is 75, with the point between them
Fixed-point Numbers
• Point’s position is static (cannot be changed)
• E.g., point goes between the 4th and 5th bits of a byte:
0110.1100
• 4 bits for integer component
• 4 bits for fraction component
Fixed-point Numbers
• Integer component: binary interpreted as before
• LSB is 2^0
• E.g., integer part of 0110.1100: 0110 = 2^2 + 2^1 = 4 + 2 = 6
Fixed-point Numbers
• Fraction component: binary interpreted slightly differently
• MSB is 2^-1
• E.g., fraction part of 0110.1100: 1100 = 2^-1 + 2^-2 = 0.5 + 0.25 = 0.75
Fixed-point Numbers
• Integer part: 0110 = 2^2 + 2^1 = 4 + 2 = 6
• Fraction part: 1100 = 2^-1 + 2^-2 = 0.5 + 0.25 = 0.75
• 0110.1100 = 6 + 0.75 = 6.75
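The bit weights above can be checked with a few lines of Python. This is only a minimal sketch: the function name fixed_point_to_decimal and the use of a string with a literal "." are assumptions made for illustration, not part of the slides.

```python
def fixed_point_to_decimal(bits):
    """Interpret an unsigned binary string such as '0110.1100' as fixed-point."""
    integer_bits, fraction_bits = bits.split(".")
    value = 0.0
    # Integer bits: the LSB (rightmost bit) has weight 2^0.
    for position, bit in enumerate(reversed(integer_bits)):
        value += int(bit) * 2 ** position
    # Fraction bits: the MSB (leftmost bit) has weight 2^-1.
    for position, bit in enumerate(fraction_bits, start=1):
        value += int(bit) * 2 ** -position
    return value


print(fixed_point_to_decimal("0110.1100"))  # 6.75
```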
Fixed-point Numbers
• How to represent negative numbers?
• 2’s complement notation
• E.g., -2.375 = 1101.1010
Fixed-point Numbers
To read 1101.1010 as a decimal number:
1. Invert bits: 1101.1010 → 0010.0101
2. Add 1 (to the LSB): 0010.0101 → 0010.0110
3. Convert to fixed-point decimal: 0010.0110 = 2^1 + (2^-2 + 2^-3) = 2 + (0.25 + 0.125) = 2 + 0.375 = 2.375
4. Multiply by -1: -2.375
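The four steps can be sketched in Python as follows. The function name and the string input format are hypothetical, chosen only for this illustration; the sketch assumes the leftmost bit is the sign bit of a 2's complement value.

```python
def twos_complement_fixed_to_decimal(bits):
    """Interpret a string such as '1101.1010' as 2's complement fixed-point."""
    integer_bits, fraction_bits = bits.split(".")
    raw = integer_bits + fraction_bits
    mask = (1 << len(raw)) - 1           # all ones, same width as the number
    scale = 2 ** len(fraction_bits)      # the point sits this many bits from the LSB
    value = int(raw, 2)
    if raw[0] == "0":                    # sign bit clear: already non-negative
        return value / scale
    inverted = value ^ mask              # step 1: invert bits
    magnitude = (inverted + 1) & mask    # step 2: add 1
    decimal = magnitude / scale          # step 3: convert to fixed-point decimal
    return -decimal                      # step 4: multiply by -1


print(twos_complement_fixed_to_decimal("1101.1010"))  # -2.375
```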
Floating Point Numbers
• Analogous to scientific notation
• E.g., 4.1 × 10^3 = 4100
• Gets around limitations of constant integer and fraction sizes
• Allows representation of very small and very large numbers
Floating Point Numbers
• Just like scientific notation, floating point numbers have:
• sign (±)
• mantissa (M)
• base (B)
• exponent (E)
• E.g., 4.1 × 10^3 = 4100: M = 4.1, B = 10, E = 3
Floating Point Numbers
• Floating point numbers in binary
• 32 bits: sign (1 bit) | exponent (8 bits) | mantissa (23 bits)
Floating Point Numbers
• Example: convert 228 to floating point
228 = 1110 0100 (binary) = 1.1100100 × 2^7
• sign = positive
• exponent = 7
• mantissa = 1.1100100
• base = 2 (implicit)
Floating Point Numbers
228 = 1110 0100 (binary) = 1.1100100 × 2^7
• sign = positive (0)
• exponent = 7
• mantissa = 1.1100100
• base = 2 (implicit)
Fields so far (sign | exponent | mantissa):
0 | 0000 0111 | 11100100000000000000000
Floating Point Numbers
• In normalized binary floating point, MSB of mantissa is always 1
• No need to store MSB of mantissa (1 is implied)
• Called the “implicit leading 1”
With leading 1 stored: 0 | 0000 0111 | 11100100000000000000000
With leading 1 implied: 0 | 0000 0111 | 11001000000000000000000
Floating Point Numbers
• Exponent must represent both positive and negative numbers
• Floating point uses biased exponent
• Original exponent plus a constant bias
• 32-bit floating point uses bias 127
• E.g., exponent -4 (2^-4) would be -4 + 127 = 123 = 0111 1011
• E.g., exponent 7 (2^7) would be 7 + 127 = 134 = 1000 0110
Unbiased exponent: 0 | 0000 0111 | 11001000000000000000000
Biased exponent: 0 | 1000 0110 | 11001000000000000000000
Floating Point Numbers
• E.g., 228 in floating point binary (IEEE 754 standard)
0 | 1000 0110 | 11001000000000000000000
• sign bit = 0 (positive)
• 8-bit biased exponent: E = number – bias, so E = 134 – 127 = 7
• 23-bit mantissa without implicit leading 1
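The encoding of 228 can be cross-checked in Python. The sketch below uses the standard struct module, whose ">f" format packs a value as a big-endian IEEE 754 single; the helper name float32_fields is an assumption made for this example.

```python
import struct


def float32_fields(x):
    """Return the (sign, exponent, mantissa) bit strings of x as a 32-bit float."""
    # Pack as an IEEE 754 single, then reread the same 4 bytes as an unsigned int.
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    bits = f"{raw:032b}"
    return bits[0], bits[1:9], bits[9:]


sign, exponent, mantissa = float32_fields(228.0)
print(sign, exponent, mantissa)  # 0 10000110 11001000000000000000000

BIAS = 127
print(int(exponent, 2) - BIAS)   # 7, the unbiased exponent
```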
Floating Point Numbers
• Special cases: 0, ±∞, NaN
value    sign bit   exponent   mantissa
0        N/A        00000000   00…000
+∞       0          11111111   00…000
-∞       1          11111111   00…000
NaN      N/A        11111111   non-zero
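The table can be read as a decision procedure on the three fields. A minimal sketch, assuming the 32-bit pattern is given as an integer; the function name classify_float32 and the label "other" (for every pattern the table does not list) are choices made for this illustration.

```python
def classify_float32(raw):
    """Classify a 32-bit pattern using the special-case table."""
    sign = raw >> 31                 # 1 bit
    exponent = (raw >> 23) & 0xFF    # 8 bits
    mantissa = raw & 0x7FFFFF        # 23 bits
    if exponent == 0x00 and mantissa == 0:
        return "0"                   # sign bit does not matter
    if exponent == 0xFF:
        if mantissa != 0:
            return "NaN"             # sign bit does not matter
        return "-inf" if sign else "+inf"
    return "other"                   # not a special case in the table


print(classify_float32(0x7F800000))  # +inf
print(classify_float32(0xFF800000))  # -inf
print(classify_float32(0x7FC00000))  # NaN
```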
Floating Point Numbers
• Single versus double precision
• Single: 32-bit float
• Range: ±1.175494 × 10^-38 to ±3.402823 × 10^38
• Double: 64-bit double
• Range: ±2.22507385850720 × 10^-308 to ±1.79769313486232 × 10^308
          # bits (total)   # sign bits   # exponent bits   # mantissa bits
float     32               1             8                 23
double    64               1             11                52
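The ranges follow from the bit counts in the table. The sketch below assumes the IEEE 754 conventions that the bias is 2^(exponent bits - 1) - 1 (127 for float, 1023 for double) and that the all-zeros and all-ones exponent codes are reserved for the special cases; the function name normalized_range is hypothetical.

```python
import sys


def normalized_range(exponent_bits, mantissa_bits):
    """Smallest and largest positive normalized values for a given format."""
    bias = 2 ** (exponent_bits - 1) - 1
    # Smallest: mantissa 1.000...0 with the lowest non-reserved exponent (1 - bias).
    smallest = 2.0 ** (1 - bias)
    # Largest: mantissa 1.111...1 with the highest non-reserved exponent (bias).
    largest = (2.0 - 2.0 ** -mantissa_bits) * 2.0 ** bias
    return smallest, largest


float_min, float_max = normalized_range(8, 23)
print(f"{float_min:.6e} {float_max:.6e}")  # 1.175494e-38 3.402823e+38

# Python's own float is a 64-bit double, so the double range can be compared directly.
print(normalized_range(11, 52) == (sys.float_info.min, sys.float_info.max))  # True
```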
Superscalar Processors
• Multiple hardwired copies of datapath
• Allows multiple instructions to execute simultaneously
• E.g., 2-way superscalar processor
• Fetches / executes 2 instructions per cycle
• 2 ALUs
• 2-port memory unit
• 6-port register file (4 source, 2 write back)
Superscalar Processors
• Datapath for 2-way superscalar processor
[Figure: datapath with 6-port register file, 2 ALUs, and 2-port memory unit]
Superscalar Processors
• Pipeline for 2-way superscalar processor
• 2 instructions per cycle
[Figure: pipeline diagram for 2-way superscalar processor]
Superscalar Processors
• Commercial processors can be 3, 4, or even 6-way superscalar
• Very difficult to manage dependencies and hazards
[Figure: Intel Nehalem (6-way superscalar)]
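To see why dependencies limit what a superscalar processor can do, consider a toy model of in-order, 2-way issue. It is a sketch under strong assumptions, not a description of any real processor: every result is available in the cycle after issue, only read-after-write dependencies are checked, and the instruction format (name, destination, sources) is invented for this example.

```python
def schedule_in_order(program, width=2):
    """Group instructions into issue cycles, at most `width` per cycle.

    An instruction cannot issue in the same cycle as an earlier instruction
    whose destination register it reads.
    """
    cycles = []
    current = []
    for name, dest, sources in program:
        written = {d for _, d, _ in current}
        depends = any(src in written for src in sources)
        if len(current) == width or depends:
            cycles.append(current)   # close this cycle, start the next one
            current = []
        current.append((name, dest, sources))
    if current:
        cycles.append(current)
    return [[name for name, _, _ in group] for group in cycles]


program = [
    ("add", "r1", ("r2", "r3")),
    ("sub", "r4", ("r5", "r6")),  # independent of add: issues with it
    ("and", "r7", ("r1", "r4")),  # reads r1 and r4: must wait a cycle
    ("or", "r8", ("r7", "r2")),   # reads r7: cannot pair with and
]
print(schedule_in_order(program))  # [['add', 'sub'], ['and'], ['or']]
```

Four instructions take three cycles here instead of the ideal two, because only the first pair is independent.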
Multithreading (Terms)
• Process: program running on a computer
• Can have multiple processes running at same time
• E.g., music player, web browser, anti-virus, word processor
• Thread: each process has one or more threads that can run
simultaneously
• E.g., word processor: threads to read input, print, spell check, auto-save
Multithreading (Terms)
• Instruction level parallelism (ILP): # of instructions that can be
executed simultaneously for program / microarchitecture
• Practical processors rarely achieve ILP greater than 2 or 3
• Thread level parallelism (TLP): degree to which a process can
be split into threads
Multithreading
• Keeps processor with many execution units busy
• Even if ILP is low or program is stalled (waiting for memory)
• For single-core processors, threads give illusion of
simultaneous execution
• Threads take turns executing (according to OS)
• OS decides when a thread’s turn begins / ends
Multithreading
• When one thread’s turn ends:
-- OS saves architectural state
-- OS loads architectural state of another thread
-- New thread begins executing
• This is called a context switch
• If context switch is fast enough, user perceives threads as
running simultaneously (even on single-core)
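A context switch can be imitated with a toy round-robin scheduler. This is a simplified sketch, not how an operating system is implemented: each "thread" is just a list of instruction names, the only architectural state modeled is the PC (registers are omitted), and the names run_round_robin and time_slice are assumptions for this example.

```python
from collections import deque


def run_round_robin(threads, time_slice=2):
    """Interleave threads on one core, switching every `time_slice` instructions."""
    saved_pc = {name: 0 for name in threads}   # saved architectural state
    ready = deque(threads)
    trace = []
    while ready:
        name = ready.popleft()
        pc = saved_pc[name]                    # load state of the next thread
        program = threads[name]
        for _ in range(time_slice):
            if pc == len(program):
                break
            trace.append(f"{name}:{program[pc]}")
            pc += 1
        saved_pc[name] = pc                    # save state when the turn ends
        if pc < len(program):
            ready.append(name)                 # thread gets another turn later
    return trace


threads = {"A": ["a0", "a1", "a2"], "B": ["b0", "b1", "b2"]}
print(run_round_robin(threads))
# ['A:a0', 'A:a1', 'B:b0', 'B:b1', 'A:a2', 'B:b2']
```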
Multithreading
• Multithreading does NOT improve ILP, but DOES improve
processor throughput
• Threads use resources that are otherwise idle
• Multithreading is relatively inexpensive
• Only need to save PC and register file
[Figure: idle vs. next task…]
Homogeneous Multiprocessing
• AKA symmetric multiprocessing (SMP)
• 2 or more identical processors with single shared memory
• Easier to design (than heterogeneous)
• Multiple cores on same (or different) chip(s)
• In 2005, architectures made the shift to SMP
Homogeneous Multiprocessing
• Multiple cores can execute threads concurrently
• True simultaneous execution
• Multi-threaded programming can be tricky
[Figure: threads w/ single-core vs. multi-core (core #1, core #2, core #3, core #4)]
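One reason multi-threaded programming is tricky: threads that share memory must coordinate their updates. The sketch below, with the hypothetical function count_with_threads, protects a shared counter with a lock so the final value is predictable. In standard CPython builds the global interpreter lock keeps these threads from running Python code truly in parallel, so treat this as an illustration of synchronization rather than of speedup.

```python
import threading


def count_with_threads(num_threads=4, increments=10_000):
    """Have several threads increment one shared counter, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            with lock:               # only one thread updates at a time
                counter += 1

    workers = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in workers:
        thread.start()
    for thread in workers:
        thread.join()                # wait for every thread to finish
    return counter


print(count_with_threads())  # 40000
```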
Heterogeneous Multiprocessing
• AKA asymmetric multiprocessing (AMP)
• 2 (or more) different processors
• Specialized processors used for specific tasks
• E.g., graphics, floating point, FPGAs
• Adds complexity
[Figure: Nvidia GPU]
Heterogeneous Multiprocessing
• Clustered:
• Each processor has its own memory
• E.g., PCs connected on a network
• Memory not shared, must pass information between nodes…
• Can be costly
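Passing information between nodes can be imitated on one machine. In this rough sketch, threads stand in for networked PCs and queues stand in for the network; by convention each "node" touches only the data sent to it. The names worker_node and cluster_sum are hypothetical, and a real cluster would use a network or message-passing library instead.

```python
import queue
import threading


def worker_node(inbox, outbox):
    """A node: receive a chunk of data, compute locally, send the result back."""
    chunk = inbox.get()              # message in: this node's share of the data
    outbox.put(sum(chunk))           # message out: the partial result


def cluster_sum(data, num_nodes=2):
    """Split data across nodes and combine the partial sums they send back."""
    outbox = queue.Queue()
    nodes = []
    for i in range(num_nodes):
        inbox = queue.Queue()
        node = threading.Thread(target=worker_node, args=(inbox, outbox))
        node.start()
        inbox.put(data[i::num_nodes])    # every num_nodes-th item goes to node i
        nodes.append(node)
    partial_sums = [outbox.get() for _ in nodes]
    for node in nodes:
        node.join()
    return sum(partial_sums)


print(cluster_sum(list(range(1, 101))))  # 5050
```

Every value a node works on has to be copied to it as a message, which is one reason this style of communication can be costly.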