2013-05-17

CS 134: Operating Systems
Multiprocessing

Overview
- Multiprocessing Designs
- OS Implications
- Programming Models
- Other Issues

Multiprocessing Designs

SIMD and MIMD
Multiple CPUs come in several flavors:
- SIMD: Single Instruction, Multiple Data
  - Also called a vector processor
  - Sample instruction: a[i] = b[i] + c[i] for i in a small range (e.g., 0-3)
  - Canonical example: GPUs
- MIMD: Multiple Instruction, Multiple Data
  - I.e., 2 or more (semi-)independent CPUs
We won't talk further about SIMD; from an OS point of view it's just another CPU.

MIMD Approaches
MIMD can be:
- Several chips or cores with (semi-)private memories, able to access each other's memory (NUMA: Non-Uniform Memory Access)
- Several chips or cores sharing one memory (SMP: Symmetric Multiprocessing)
- Several boxes (possibly each SMP or NUMA) connected by a network (distributed system)

OS Implications

NUMA Issues
NUMA means processes access local memory faster
⇒ Allocate process memory on the local CPU
⇒ Processes should have "CPU affinity"

SMP Issues
SMPs still have caches, which introduces cache coherency problems:
- Processor 0 uses compare-and-swap to set a lock nonzero
- The write goes into its local cache for speed
- Processor 1 reads the lock from its own cache and sees it's still zero...
Cure: hardware coherency guarantees
...but spinlocks now have super-high costs
- May be better to do a thread switch (also high cost, but possibly cheaper than spinning)

SMP Scheduling
Threads are often related
- Schedule them independently or together?
- Completely independent: job completion time is set by the slowest thread
- Together: some CPUs may be wasted waiting for events
- Always good to keep thread x on the same CPU (because its cache is already filled)
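The compare-and-swap spinlock from the SMP Issues slide can be sketched in Python. This is a model, not a real lock-free implementation: real hardware provides CAS as a single atomic instruction, while here a `threading.Lock` stands in for that atomicity, and the `CASLock` class and its names are invented for illustration.

```python
import threading

class CASLock:
    """Spinlock built on compare-and-swap. A threading.Lock models the
    atomicity that the hardware CAS instruction would provide."""

    def __init__(self):
        self._val = 0                      # 0 = unlocked, 1 = locked
        self._atomic = threading.Lock()    # stands in for hardware atomicity

    def _cas(self, expected, new):
        # Atomically: if value == expected, set it to new and report success.
        with self._atomic:
            if self._val == expected:
                self._val = new
                return True
            return False

    def acquire(self):
        # Spin until the swap from 0 (unlocked) to 1 (locked) succeeds.
        # On a real SMP each failed CAS can bounce the cache line between
        # cores, which is why spinning is so expensive; a kernel may prefer
        # a thread switch over a long spin.
        while not self._cas(0, 1):
            pass

    def release(self):
        self._cas(1, 0)

counter = 0
lock = CASLock()

def worker():
    global counter
    for _ in range(5000):
        lock.acquire()
        counter += 1                      # critical section
        lock.release()

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 10000: no increments were lost
```

Without the lock (or with a broken one, as in the stale-cache scenario above), concurrent increments could be lost.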
Distributed Systems
Many ways to communicate
Most important modern approach is... the Internet!
Communicating with skinny wires introduces new problems:
- Can't move a process to another machine (or must work hard to do so)
- Locking becomes really hard
- Programming multiprocessor systems is much harder
- ...and what if the network connection goes down?

Programming Models

RPC
Programming is hard, so we need abstractions that simplify things
Remote Procedure Call (RPC) makes a distant system look like a normal function:
1. Marshal the arguments (i.e., pack up and serialize)
2. Send the procedure ID and arguments to the remote system
3. Wait for the response
4. Deserialize the return value
Class Exercise: What are the advantages and disadvantages?
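The four RPC steps above can be sketched in miniature. Everything here is a hypothetical stand-in: `fake_server` plays the remote machine (a direct function call replaces the network), and `add` is an example procedure; a real RPC system would add sockets, timeouts, and error handling.

```python
import json

# Server-side table of remotely callable procedures ("add" is a
# hypothetical example procedure).
PROCEDURES = {"add": lambda a, b: a + b}

def fake_server(raw_request):
    # The remote end: deserialize, dispatch by procedure ID, serialize reply.
    msg = json.loads(raw_request.decode())
    result = PROCEDURES[msg["proc"]](*msg["args"])
    return json.dumps({"result": result}).encode()

def rpc_call(proc_id, *args):
    # 1. Marshal the arguments (pack up and serialize).
    request = json.dumps({"proc": proc_id, "args": list(args)}).encode()
    # 2. Send the procedure ID and arguments to the remote system
    #    (a direct function call stands in for the network here).
    # 3. Wait for the response.
    response = fake_server(request)
    # 4. Deserialize the return value.
    return json.loads(response.decode())["result"]

print(rpc_call("add", 2, 3))  # 5, as if add() had been called locally
```

The caller never sees the serialization: `rpc_call("add", 2, 3)` reads like a local call, which is exactly the abstraction RPC sells.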
DSM
RPC is nice, but it limits parallelism
SMPs can do cool things because memory is shared
So why not simulate shared memory across the network?
Teeny problem: it's hard to make it work fast ("hard" is a gross understatement)

Other Issues

Load Balancing
Suppose you have servers A, B, C, and D
A and B are currently overloaded; C and D are underloaded
A notices the situation and sends excess work to C and D
Simultaneously, B does the same! Now C and D are overloaded
The result can be thrashing
Common solution: have one front-end machine whose sole job is allocating load to the others
(Random assignment works surprisingly well.)
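A tiny simulation suggests why random assignment by a front end works so well: because the front end alone makes every placement decision, no two servers ever react to the same overload simultaneously, so the thrashing scenario above cannot arise. The names `front_end` and `SERVERS` are illustrative.

```python
import random

SERVERS = ["A", "B", "C", "D"]

def front_end(num_jobs, rng):
    """Front-end machine whose sole job is allocating load: it assigns
    each incoming job to a uniformly random back-end server."""
    load = {s: 0 for s in SERVERS}
    for _ in range(num_jobs):
        load[rng.choice(SERVERS)] += 1
    return load

load = front_end(10_000, random.Random(1))
print(load)
# Each server ends up near 2,500 jobs: none idle, none swamped, and no
# server-to-server work shedding was needed.
```

With 10,000 jobs the random spread stays within a few percent of perfectly even, which is usually good enough in practice.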
How Does Google Work?
Well, it's a secret...
But basically they use the front-end approach
Obvious problem: one front end can't handle millions of requests per second, even if it does almost nothing
Solution: DNS round robin tricks you into picking one of many dozens of front ends (roughly at random) to talk to

Example of Google's DNS Tricks
These commands were run within 15 seconds of each other:

bow:2:877> host www.google.com
www.google.com has address 74.125.224.241
www.google.com has address 74.125.224.242
www.google.com has address 74.125.224.243
www.google.com has address 74.125.224.244
www.google.com has address 74.125.224.240

bow:2:878> ssh lever.cs.ucla.edu host www.google.com
www.google.com has address 74.125.239.19
www.google.com has address 74.125.239.20
www.google.com has address 74.125.239.17
www.google.com has address 74.125.239.18
www.google.com has address 74.125.239.16
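The round-robin effect in the transcript can be modeled in a few lines. This is a toy: real DNS servers typically rotate the record order (and resolvers may cache answers) rather than shuffle on every query, but the load-spreading consequence is the same because clients usually connect to the first address in the answer.

```python
import random
from collections import Counter

# The five front-end addresses returned for www.google.com in the
# transcript above.
ADDRESSES = ["74.125.224.240", "74.125.224.241", "74.125.224.242",
             "74.125.224.243", "74.125.224.244"]

rng = random.Random(0)

def dns_answer():
    """Model of round-robin DNS: the server returns the same A records
    in a different order on each query (shuffled here for simplicity)."""
    records = ADDRESSES[:]
    rng.shuffle(records)
    return records

# Each client connects to the first address it receives, so across many
# independent queries the clients spread over all five front ends.
picks = Counter(dns_answer()[0] for _ in range(5000))
print(picks)
```

Over 5,000 simulated queries every front end receives close to a fifth of the clients, without any front end coordinating with the others.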