Introduction to Threads
• Overview
• Multithreading Models
• Thread Libraries
• Threading Issues
• Operating System Examples
• Windows XP Threads
• Linux Threads

4.1 Threads
• A thread is a sequence of instructions to execute.
• Threads share the same memory space as the other threads in the same application, so they automatically share data and variables.
• Threads can run on different cores of a multicore processor, which can make applications faster and more responsive.
• Even on a single-core processor, threads make an application more responsive: if one thread blocks while waiting for I/O, other threads can still run.
• Each process has its own virtual memory address space, and the OS takes much longer to switch between processes than between threads. Sharing data between processes requires additional steps, so in many applications processes carry much more overhead than threads.
• Most applications have one process with several threads.
• In C/C++, a thread typically runs the code in a C/C++ function, and a special API call starts up a new thread running that function.

4.2 Single and Multithreaded Processes

4.3 Benefits of Threads
• Responsiveness
• Applications can run up to N times faster on an N-core processor
• Resource Sharing
• Economy
• Scalability

4.4 Multicore Programming
• An application runs on only one processor core unless it uses multiple threads.
• Multicore systems are putting more pressure on programmers to use threads. Challenges in multithreaded applications include:
    • Dividing activities
    • Balancing the computational load
    • Data splitting
    • Data dependency
    • Testing and debugging

4.5 Concurrent Execution on a Single-core System
• The OS can time slice between the four threads T1…T4.

4.6 Parallel Execution on a Multicore System
• The OS can time slice the four threads T1…T4 on two processor cores.
• Two threads can run in parallel on different cores.
• The application could run up to twice as fast.
• Without threads, an application can run on only one core!
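
To make the thread-creation step from 4.1 and the speedup claim from 4.6 concrete, here is a minimal sketch that splits an array sum across several threads. It assumes C++11 `std::thread` rather than any of the libraries named in these notes, and the names `sum_range` and `parallel_sum` are hypothetical, chosen only for illustration. Each thread writes only to its own output slot, so this particular example needs no locking.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <thread>
#include <vector>

// Thread function (hypothetical name): sums data[begin, end) into *out.
// Each thread writes only its own output slot, so no lock is required.
void sum_range(const std::vector<int>& data, std::size_t begin,
               std::size_t end, long long* out) {
    assert(begin <= end && end <= data.size());
    *out = std::accumulate(data.begin() + begin, data.begin() + end, 0LL);
}

// Splits the array into num_threads chunks, sums each chunk in its own
// thread, then combines the partial sums after all threads have finished.
long long parallel_sum(const std::vector<int>& data, unsigned num_threads) {
    if (num_threads == 0) num_threads = 1;
    std::vector<long long> partial(num_threads, 0);
    std::vector<std::thread> workers;
    const std::size_t chunk = data.size() / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk;
        // The last thread also takes any leftover elements.
        const std::size_t end =
            (t + 1 == num_threads) ? data.size() : begin + chunk;
        // Constructing std::thread is the API call that starts a new
        // thread running sum_range with the given arguments.
        workers.emplace_back(sum_range, std::cref(data), begin, end,
                             &partial[t]);
    }
    for (auto& w : workers) w.join();  // wait for every thread to finish
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Calling `parallel_sum(values, std::thread::hardware_concurrency())` would use one thread per reported core. The actual speedup depends on the size of the array and on the cost of creating the threads, so "up to N times faster" should be read as an upper bound.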

4.7 User Threads
• Thread management is done by a user-level threads library.
• Three primary thread libraries:
    • POSIX Pthreads
    • Win32 threads
    • Java and C# threads
• A simplified thread library wrapper called GThreads will be used in the last lab on Jinx.

4.8 Thread Libraries
• A thread library provides the programmer with an API for creating and managing threads.
• Two primary ways of implementing a thread library:
    • Library entirely in user space
    • Kernel-level library supported by the OS

4.9 Pthreads
• A POSIX standard (IEEE 1003.1c) API for thread creation and synchronization.
• The API specifies the behavior of the thread library; the implementation is up to the developers of the library.
• Common in UNIX operating systems (Solaris, Linux, Mac OS X).
• Can also be added to Windows by installing the optional Pthreads library.

4.10 Java and C# Threads
• Thread support is built into these newer languages and their standard libraries, including keywords such as synchronized (Java) and lock (C#).
• Java threads are managed by the JVM.
• C# threads are managed by the .NET Common Language Runtime (CLR), the .NET counterpart of the JVM.
• Typically implemented using the threads model provided by the underlying OS.
• Java threads may be created by:
    • Extending the Thread class
    • Implementing the Runnable interface
• C# threads are created by passing a delegate to a Thread object; the C# Thread class is sealed and cannot be extended.

4.11 Threading Issues
• Semantics of fork() and exec() system calls
• Thread cancellation of target thread
    • Asynchronous or deferred
• Signal handling
• Thread pools
• Thread-specific data
• Scheduler activations

4.12 Thread Cancellation
• Terminating a thread before it has finished.
• Two general approaches:
    • Asynchronous cancellation terminates the target thread immediately.
    • Deferred cancellation allows the target thread to periodically check whether it should be cancelled.

4.13 Signal Handling
• Signals are used in UNIX systems to notify a process that a particular event has occurred.
• A signal handler is used to process signals:
    1. Signal is generated by a particular event
    2. Signal is delivered to a process
    3. Signal is handled
• Delivery options in a multithreaded process:
    • Deliver the signal to the thread to which the signal applies
    • Deliver the signal to every thread in the process
    • Deliver the signal to certain threads in the process
    • Assign a specific thread to receive all signals for the process

4.14 Thread Pools
• Create a number of threads in a pool where they await work.
• Advantages:
    • Usually slightly faster to service a request with an existing thread than to create a new thread
    • Allows the number of threads in the application(s) to be bounded by the size of the pool

4.15 Windows Threads
• Implements the one-to-one mapping, kernel-level.
• Each thread contains:
    • A thread id
    • Register set
    • Separate user and kernel stacks
    • Private data storage area
• The register set, stacks, and private storage area are known as the context of the thread.

4.16 Linux Threads
• Linux refers to them as tasks rather than threads.
• Thread creation is done through the clone() system call.
• clone() allows a child task to share the address space of the parent task (process).

4.17 Background on the Need for Synchronization
• Threads may need to wait for other threads to finish an operation.
• Additionally, concurrent access to shared data by multiple threads may result in data inconsistency (i.e., incorrect values).
• Maintaining data consistency requires mechanisms to ensure the orderly execution of cooperating processes (or threads).

Example Problem
• Suppose two threads share a common buffer array. The producer puts items in the buffer and the consumer removes them.
• A solution to the two-thread producer-consumer problem that can fill all the buffer space uses an integer count that keeps track of the number of full buffers. Initially, count is set to 0. It is incremented by the producer after it produces a new buffer and is decremented by the consumer after it consumes a buffer.

Producer
    while (true) {
        /* produce an item and put in nextProduced */
        while (count == BUFFER_SIZE)
            ; // do nothing: the buffer is full
        buffer[in] = nextProduced;
        in = (in + 1) % BUFFER_SIZE;
        count++;
    }

Consumer
    while (true) {
        while (count == 0)
            ; // do nothing: the buffer is empty
        nextConsumed = buffer[out];
        out = (out + 1) % BUFFER_SIZE;
        count--;
        /* consume the item in nextConsumed */
    }

Critical Section
• A code segment that reads and writes global data shared between threads or processes is called a “critical section”.
• Race condition bugs on global variable values are possible; an example follows.
• The OS synchronization API is used to solve this.
• Access to a critical section must be controlled with OS synchronization primitives, or hidden bugs will appear in the code.

Race Condition on Count
• count++ could be implemented as
    register1 = count
    register1 = register1 + 1
    count = register1
• count-- could be implemented as
    register2 = count
    register2 = register2 - 1
    count = register2
• Consider this execution interleaving with “count = 5” initially:
    S0: producer executes register1 = count          {register1 = 5}
    S1: producer executes register1 = register1 + 1  {register1 = 6}
    S2: consumer executes register2 = count          {register2 = 5}
    S3: consumer executes register2 = register2 - 1  {register2 = 4}
    S4: producer executes count = register1          {count = 6}
    S5: consumer executes count = register2          {count = 4}
• The final value of count is 4, although one increment and one decrement should leave it at 5. If S4 and S5 ran in the opposite order, the final value would be 6, which is also wrong.

Need an Atomic Operation
• The count++ and count-- code must run to completion before switching to the other thread to avoid bugs.
• An atomic operation here means a basic operation that cannot be stopped or interrupted in the middle to switch to another thread.
• Race conditions show up more often on systems with multiple processors, since threads really are running in parallel.

Solution to Critical-Section Problem
A solution must satisfy three requirements:
    1. Mutual Exclusion (Mutex) - If process Pi is executing in its critical section, then no other processes can be executing in their critical sections.
    2. Progress - If no process is executing in its critical section and there exist some processes that wish to enter their critical section, then the selection of the process that will enter the critical section next cannot be postponed indefinitely.
    3. Bounded Waiting - A bound must exist on the number of times that other processes are allowed to enter their critical sections after a process has made a request to enter its critical section and before that request is granted.
• Assume that each process executes at a nonzero speed.
• No assumption is made concerning the relative speed of the N processes.

Solution to Critical-section Problem Using Mutex Locks
    do {
        acquire lock
            critical section
        release lock
            remainder section
    } while (TRUE);

Deadlock and Starvation
• Deadlock: two or more processes or threads are waiting indefinitely for an event that can be caused by only one of the waiting processes.
• Let S and Q be two semaphores initialized to 1 (i.e., each acts as a mutual exclusion lock):

        P0              P1
    wait(S);        wait(Q);
    wait(Q);        wait(S);
      . . .           . . .
    signal(S);      signal(Q);
    signal(Q);      signal(S);

  If P0 completes wait(S) and P1 completes wait(Q) before either makes its second call, each process then waits forever for the semaphore held by the other.
• Starvation: indefinite blocking. A process may never be removed from the semaphore queue in which it is suspended.
• Priority Inversion: a scheduling problem that arises when a lower-priority process holds a lock needed by a higher-priority process. The lower-priority process may have to run first so that the higher-priority process can continue, which upsets the intended priority order.

Barriers for Thread Synchronization
Barriers allow defining synchronization points used to coordinate the execution of a team of threads. When a thread reaches a synchronization point, its execution is stopped until all other threads in the team reach the synchronization point.

Basic Barrier
A simple barrier is implemented using an atomic shared counter. The counter is incremented by each thread after entering the barrier. Threads wait at the barrier until the counter becomes equal to the number of threads.
This kind of barrier cannot be reused, because the counter cannot be reset safely. Resetting the counter to reuse the barrier can cause starvation, because storing 0 into the counter masks the old value: a thread that was suspended during the resetting phase, before it saw the counter reach the number of threads, will never leave the barrier.

Sense Reversing Barrier
Adding a sense flag allows a barrier to be reused many times. The barrier counter still keeps track of how many threads have reached the barrier, but the waiting phase is performed by spinning on a sense flag. Threads wait until the barrier sense flag matches their thread-private sense flag. The last thread to reach the barrier resets the counter and flips the barrier sense flag, and each thread flips its local sense flag before exiting the barrier. The sense flag distinguishes odd barrier phases from even ones. Resetting the counter is safe here because it does not interfere with the variable that waiting threads spin on, which is the sense flag.
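
The sense-reversing scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a production barrier: it assumes C++11 `std::atomic` with the default sequentially consistent ordering, and the names `SenseBarrier` and `run_phases` are hypothetical. Each thread's private flag must start as `true`, the opposite of the barrier's initial sense.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical reusable barrier following the sense-reversing scheme.
class SenseBarrier {
public:
    explicit SenseBarrier(int num_threads)
        : num_threads_(num_threads), count_(0), sense_(false) {
        assert(num_threads > 0);
    }

    // local_sense is the caller's thread-private flag; it must start as
    // true (the opposite of the barrier's initial sense).
    void wait(bool& local_sense) {
        if (count_.fetch_add(1) + 1 == num_threads_) {
            // Last arrival: reset the counter first, then flip the barrier
            // sense flag, which releases every spinning thread.
            count_.store(0);
            sense_.store(local_sense);
        } else {
            // Spin on the sense flag, never on the counter.
            while (sense_.load() != local_sense) {
                std::this_thread::yield();
            }
        }
        local_sense = !local_sense;  // flip the private flag before exiting
    }

private:
    const int num_threads_;
    std::atomic<int> count_;
    std::atomic<bool> sense_;
};

// Hypothetical demo: reuses one barrier for several phases. After each
// barrier, every thread checks that all threads completed the phase.
bool run_phases(int num_threads, int phases) {
    SenseBarrier barrier(num_threads);
    std::vector<std::atomic<int>> arrived(phases);
    for (auto& a : arrived) a.store(0);
    std::atomic<bool> ok(true);
    std::vector<std::thread> team;
    for (int t = 0; t < num_threads; ++t) {
        team.emplace_back([&] {
            bool local_sense = true;  // thread-private sense flag
            for (int p = 0; p < phases; ++p) {
                arrived[p].fetch_add(1);    // this thread's work for phase p
                barrier.wait(local_sense);  // wait for the whole team
                if (arrived[p].load() != num_threads) ok.store(false);
            }
        });
    }
    for (auto& th : team) th.join();
    return ok.load();
}
```

Because waiting threads watch only the sense flag, the last thread can store 0 into the counter without stranding a thread that was suspended before it left the previous phase. The spin loop calls `std::this_thread::yield()` so that the sketch still makes progress when there are more threads than cores.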