Multiprocessing and Distributed Systems
CS-502 Operating Systems, Fall 2007
(Slides include materials from Operating System Concepts, 7th ed., by Silberschatz, Galvin, & Gagne; Modern Operating Systems, 2nd ed., by Tanenbaum; and Operating Systems: Internals and Design Principles, 5th ed., by Stallings)
CS-502 Fall 2007 – Multi-processing and Distributed Systems

Multiprocessing / Distributed Computing (a spectrum)
• Many independent problems at the same time
  • Similar — e.g., banking & credit cards; airline reservations
  • Different — e.g., a university computer center; your own PC
• One very big problem, or a few (too big for one computer)
  • Weather modeling; finite element analysis; drug discovery; gene modeling; weapons simulation; etc.
• Computations that are physically separated
  • Client-server
  • Dispersed and/or peer-to-peer
    – Routing tables for the Internet
    – Electric power distribution
    – International banking
• Different kinds of computers and operating systems

Inherently Distributed Computation (example)
• Internet routing of packets
  • Network layer – send a packet to its IP address
• Problems
  • No central authority to manage routing in the Internet
  • Nodes and links come online or fail rapidly
  • Must be able to send a packet from any address to any address
• Solution
  • A distributed routing algorithm

Distributed routing algorithm (simplified example)
• Each node “knows” the networks and other nodes directly connected to it
• Each node maintains a table of distant networks
  • [network #, 1st hop, distance]
• Periodically, adjacent nodes exchange tables
• Update algorithm (for each network in the table):
  • If (neighbor’s distance to that network + my distance to the neighbor) < (my distance to that network), update my table entry for that network

Distributed routing algorithm (result)
• All nodes in the Internet have reasonably up-to-date routing tables
• Rapid response to changes in network topology, congestion, failures, etc.
• Very reliable, with no central management!
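The update rule on the routing slides above can be sketched in a few lines of Python. This is only an illustration of the distance-vector idea; the `Node` class and the network names are invented for the example, and a real protocol such as RIP adds timeouts, split horizon, and other safeguards.

```python
# Sketch of the distance-vector update rule from the slides above.
# Illustrative only -- not a real routing implementation.

INF = float("inf")

class Node:
    def __init__(self, name):
        self.name = name
        self.table = {}          # network -> (first hop, distance)

    def update_from_neighbor(self, neighbor, link_cost):
        """For each network in the neighbor's table: if the neighbor's
        distance plus my distance to the neighbor beats my current
        distance, adopt the neighbor as my first hop."""
        for net, (_, their_dist) in neighbor.table.items():
            my_dist = self.table.get(net, (None, INF))[1]
            if their_dist + link_cost < my_dist:
                self.table[net] = (neighbor.name, their_dist + link_cost)

# Tiny chain A --1-- B --1-- C, with network "N" attached directly to C.
a, b, c = Node("A"), Node("B"), Node("C")
c.table["N"] = ("direct", 0)
b.update_from_neighbor(c, 1)     # B learns: N is 1 away, via C
a.update_from_neighbor(b, 1)     # A learns: N is 2 away, via B
print(a.table["N"])              # -> ('B', 2)
```

Running the exchange repeatedly at every node is exactly the "periodic table exchange" of the slide: each pairwise update only uses local information, yet routes to every network eventually propagate through the whole graph.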
• Network management software is still wanted, for
  • Monitoring the health of the network (e.g., routing tables)
  • Identifying actual or incipient problems
  • Data and statistics for planning purposes

Summary
• Some algorithms are inherently distributed
  • They depend upon computations at physically separate places
  • The result is the cumulative result of the individual computations
• Not much impact on computer or OS architecture

Taxonomy of Parallel Processing

Client-Server Computations
• A very familiar model
• Part of the computation takes place on
  • the user’s PC, a credit-card terminal, etc.
• … and part takes place on a central server
  • e.g., updating a database, a search engine, etc.
• Central issues
  • Protocols for efficient communication
  • Where to partition the problem

Very Big Problems
• Lots and lots of calculations
• Highly repetitive
• Easily parallelizable into separate subparts
• (Usually) dependent upon adjacent subparts
• Arrays or clusters of computers
  • Independent memories
  • Very fast communication between elements – close coupling

Closely-coupled Systems
• Examples
  – Beowulf clusters – 32-256 PCs
  – ASCI Red – ~1000-4000 PC-like processors
  – BlueGene/L – up to 65,536 processing elements

Taxonomy of Parallel Processing

IBM BlueGene/L

Systems for Closely-Coupled Clusters
• Independent OS for each node
  • e.g., Linux in Beowulf
• Additional middleware for
  • Distributed process space
  • Synchronization among nodes (Silberschatz, Ch. 18)
  • Access to network and peripheral devices
  • Failure management
• Parallelizing compilers
  • For partitioning a problem into many processes

Cluster Computer Architecture

Interconnection Topologies

Goal
• Microsecond-level
  – messages
  – synchronization and barrier operations among processes

Questions?

Multiprocessing / Distributed Computing (a spectrum)
• Many independent problems at the same time
  • Similar — e.g., banking & credit cards; airline reservations
  • Different — e.g., a university computer center; your own PC
• One very big problem (or a few)
• Problems that are physically separated by their very nature
• Different kinds of computers and operating systems

Many Separate Computations
• Similar – banking & credit cards; airline reservations; …
• Transaction processing
  • Thousands of small transactions per second
  • Few (if any) communications among transactions
    – Exception: locking, mutual exclusion, serialization
  • Common database

Requirements
• Lots of disks, enough CPU power
• Accessible via multiple paths
• High availability
  • Fault-tolerant components & OS support
• Processor independence
  • Any transaction on any processor
  • Global file system
  • Load-balancing
  • Scalable

Options
• Cluster (of a somewhat different sort)
• Shared-memory multiprocessor

Taxonomy of Parallel Processing

Cluster Computing
• Scalable to a large number of processors
• Common I/O space
  – Multiple channels for disk access
  – Hyper-channel, Fibre Channel, etc.
• Multi-headed disks
  – At least two paths from any processor to any disk
• Remote Procedure Call for communication among processors
  – DCOM, CORBA, Java RMI, etc.

Shared-Memory Multiprocessor
• All processors can access all memory
  • Some memory may be faster than other memory
• Any process can run on any processor
• Multiple processors execute in the same address space concurrently
• Multiple paths to disks (as with a cluster)

Shared-Memory Multiprocessors (three bus-based organizations)
• Bus-based, no caching – bus contention limits the number of CPUs
• Bus-based with per-CPU caches – lower bus contention, but the caches need to be kept consistent (a big deal)
• Bus-based with caches and private memories – the compiler places data and text in private or shared memory

Multiprocessors (2) – Crossbar
• Can support a large number of CPUs
• Non-blocking network
• Cost/performance-effective up to about 100 CPUs – cost grows as n²

Multiprocessors (3) – Multistage Switching Networks
• Omega network – blocking
  – Lower cost, longer latency
  – For n CPUs and n memories: log₂ n stages of n/2 switches each

Types of Multiprocessors – UMA vs. NUMA
• UMA (Uniform Memory Access)
  – All memory is equal; symmetrical – no processor is different from the others
  – Familiar programming model
  – Number of processors is limited (about 8-32)
• NUMA (Non-Uniform Memory Access)
  – Single address space visible to all CPUs
  – Access to remote memory via ordinary LOAD & STORE instructions
  – Remote memory access is slower than local access

Common SMP Implementations
• Multiple processors on the same board or bus
• Dual-core and multi-core processors
  • i.e., two or more processors on the same chip
  • Share the same L2 cache
• Hyperthreading
  • One processor shares its circuitry among two threads or processes
  • Separate registers, PSW, etc.
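The interconnect cost claims a few slides back are easy to check numerically: a crossbar over n CPUs and n memories needs n × n crosspoint switches, while an omega network needs log₂ n stages of n/2 two-by-two switches each. This quick sketch (assuming n is a power of two) tabulates both counts:

```python
import math

def crossbar_switches(n):
    # Crossbar: one crosspoint per (CPU, memory) pair -> n^2 switches.
    return n * n

def omega_switches(n):
    # Omega network: log2(n) stages, each with n/2 2x2 switches.
    return int(math.log2(n)) * (n // 2)

for n in (8, 64, 1024):
    print(f"n={n:5d}: crossbar {crossbar_switches(n):8d}  omega {omega_switches(n):6d}")
# n=    8: crossbar       64  omega     12
# n=   64: crossbar     4096  omega    192
# n= 1024: crossbar  1048576  omega   5120
```

The n² vs. (n/2)·log₂ n growth is why crossbars stop being cost-effective around 100 CPUs while multistage networks scale further, at the price of blocking and longer latency.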
• Combinations of the above

Hardware Issue – Cache Synchronization
• Each processor’s cache monitors memory accesses on the bus (i.e., it “snoops”)
  – Memory updates are propagated to the other caches
  – Complex cache-management hardware
  – The order of operations may be ambiguous
• Some data must be marked as not cacheable
  – Visible to the programming model

Operating System Considerations
• Master-slave vs. full symmetry
• Explicit cache flushing
  • When a page is replaced or invalidated
• Processor affinity of processes
  • Context switches across processors can be expensive
• Synchronization across processors
• Interrupt handling

Multiprocessor OS – Master-Slave
• One CPU (the master) runs the OS and applies most policies
• The other CPUs run applications
  – Minimal OS, enough to acquire and terminate processes
• Relatively simple OS
• The master processor can become a bottleneck for a large number of slave processors

Multiprocessor OS – Symmetric Multi-Processor (SMP)
• Any processor can execute the OS and any application
• Synchronization within the OS is the issue
  1. Lock the whole OS – poor utilization; long queues waiting to use the OS
  2. OS critical regions – much preferred
     – Identify independent OS critical regions that can be executed independently – protect each with a mutex
     – Identify independent critical OS tables – protect access with a mutex
     – Design OS code to avoid deadlocks
       – The art of the OS designer
       – Maintenance requires great care

Multiprocessor OS – SMP (continued)
• Multiprocessor synchronization
  – Needs special instructions – e.g., test-and-set
  – Spinlocks are common
    • Can context-switch instead if the time spent in the critical region is greater than the context-switch time
  – The OS designer must understand the performance of OS critical regions
  – Context-switch time can be onerous
    • Data cached on one processor must be re-cached on another

Multiprocessor Scheduling
• When processes are independent (e.g., timesharing)
  – Allocate a CPU to the highest-priority process
  – Tweaks
    • For a process holding a spinlock, let it run until it releases the lock
    • To reduce TLB and memory-cache flushes, try to run a process on the same CPU each time it runs
• For groups of related processes
  – Attempt to allocate CPUs to all related processes simultaneously (space sharing)
  – Run all threads to termination or block
  – Gang scheduling – apply a scheduling policy to related processes together

Linux SMP Support
• Multi-threaded kernel
• Any processor can handle any interrupt
• Interrupts are rarely disabled, and only for the shortest time possible
• Multiple processors can be active in system calls concurrently
• Process vs.
Interrupt context
  • Process context – when the processor is acting on behalf of a specific process
  • Interrupt context – when the processor is acting on behalf of no particular process
• Extensive use of spinlocks
• Processor-affinity properties in task_struct

WPI Computing & Communications Center
• CCC cluster – 11 dual 2.4 GHz Xeon processors
• toth – 16-node 1.5 GHz Itanium2
• big16 – 8 dual-core 2.4 GHz Opterons
• toaster – NetApp 5.5-TByte Fibre Channel RAID NFS
• breadbox – NetApp 12-TByte RAID NFS
• Many others
• All connected together by very fast networks

Questions?