Chapter 4: Threads
Rev. by Kyungeun Park, 2015
Operating System Concepts Essentials – 8th Edition, Silberschatz, Galvin and Gagne ©2011

- Overview
- Multicore Programming
- Multithreading Models
- Thread Libraries
- Implicit Threading
- Threading Issues
- Operating System Examples

Objectives
- To introduce the notion of a thread, a fundamental unit of CPU utilization that forms the basis of multithreaded computer systems
- To discuss the APIs for the Pthreads, Windows, and Java thread libraries
- To explore several strategies that provide implicit threading
- To examine issues related to multithreaded programming
- To cover operating system support for threads in Windows and Linux

Motivation
- Most modern applications are multithreaded; the threads run within the application
- Multiple tasks within the application can be implemented by separate threads:
  - update the display
  - fetch data
  - spell checking
  - answer a network request
- Process creation is time-consuming, resource-intensive, and heavy-weight, while thread creation is light-weight
- Threads can simplify code and increase efficiency
- Most OS kernels are now generally multithreaded

Multithreaded Server Architecture
(Figure: a multithreaded Web server)

Benefits
- Responsiveness: may allow continued execution if part of the process is blocked; important for user interfaces and interactive applications
- Resource sharing: threads share the resources of their process, which is easier than shared memory or message passing
- Economy: cheaper than process creation (creating a process is roughly thirty times slower than creating a thread), and switching between threads has lower overhead than a full context switch
- Scalability: threads can run in parallel on different processors, so a process can take advantage of multiprocessor architectures

Classical Thread Model
- A process is an execution unit that groups related resources together
- Each process has an address space containing program text and data, as well as other resources (open files, child processes, pending alarms, signal handlers, accounting information, etc.)
- A process has a thread of execution, or simply a thread, with:
  - a program counter per thread
  - registers holding its current working variables
  - a stack containing its execution history
- Processes are used to group resources together; threads are the entities scheduled for execution on the CPU
- The thread model allows multiple (largely independent) executions to take place in the same process environment
- Threads are therefore called lightweight processes
- In a multithreaded process running on a single CPU, the threads take turns running
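To make the split concrete, here is a schematic C sketch of which items are per-process and which are per-thread in the classical thread model. The structure and field names are illustrative only, not any real kernel's layout.

    /* Schematic only: illustrative names, not a real kernel's data structures. */
    #include <stdio.h>

    struct process_items {        /* shared by all threads in the process */
        void *address_space;      /* program text, data, heap              */
        int   open_files;         /* open files, child processes, pending  */
        int   pending_alarms;     /*   alarms, signal handlers,            */
        long  accounting_info;    /*   accounting information, ...         */
    };

    struct thread_items {         /* private to each thread */
        void *program_counter;    /* where this thread is executing   */
        long  registers[16];      /* current working variables        */
        void *stack_pointer;      /* this thread's execution history  */
        int   state;              /* running, ready, blocked, ...     */
    };

    int main(void)
    {
        printf("per-process items: %zu bytes, per-thread items: %zu bytes\n",
               sizeof(struct process_items), sizeof(struct thread_items));
        return 0;
    }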
Processes vs. Threads
(Figure: multiprogramming vs. multithreading)
- Multiprogramming: several processes run on one computer, sharing its physical resources (memory, storage, devices)
- Multithreading: several threads run within one process, sharing its address space and resources (open files, child processes, alarms, signals)

Single and Multithreaded Processes
(Figure: a single-threaded process vs. a multithreaded process)

Multithreading
- A thread can be in any one of several states: running, waiting (blocked), ready, or terminated, just like a traditional process
- When multithreading is present, a process usually starts with a single thread
- This thread creates new threads by calling a library procedure such as thread_create
- A thread exits by calling thread_exit
- One thread can wait for a (specific) thread to exit by calling thread_join
- A thread voluntarily gives up the CPU to let another thread run by calling thread_yield

Thread Libraries
- A thread library provides the programmer with an API for creating and managing threads
- Two primary ways of implementing:
  - a library entirely in user space with a run-time system
  - a kernel-level library supported by the OS

Pthreads
- For portable threaded programs, IEEE has defined a standard for threads in IEEE 1003.1c; the thread package it defines, Pthreads, is the API for thread creation and synchronization
- Defines over 60 function calls
- A specification, not an implementation: the API specifies the behavior of the thread library; the implementation is up to the developers of the library
- Common in UNIX operating systems (Solaris, Linux, Mac OS X)
- May be provided as either a user-level or a kernel-level library

Pthreads function calls
- pthread_create: create a new thread
- pthread_exit: terminate the calling thread
- pthread_join: wait for a specific thread to exit
- pthread_yield: release the CPU to let another thread run
- pthread_attr_init: create and initialize a thread's attribute structure
- pthread_attr_destroy: remove a thread's attribute structure

Pthreads Example I
Pthreads Example I (Cont.)

Pthreads Example II

$ vi threadtest.c

    #include <stdio.h>
    #include <stdlib.h>
    #include <pthread.h>

    #define NUMBER_OF_THREADS 10

    void *print_hello_world(void *tid)
    {
        /* This function prints the thread's identifier and then exits. */
        printf("Hello World. Greetings from thread %ld\n", (long)tid);
        pthread_exit(NULL);
    }

    int main(int argc, char *argv[])
    {
        /* The main program creates 10 threads and then exits. */
        pthread_t threads[NUMBER_OF_THREADS];
        int status, i;

        for (i = 0; i < NUMBER_OF_THREADS; i++) {
            printf("Main here. Creating thread %d\n", i);
            status = pthread_create(&threads[i], NULL, print_hello_world,
                                    (void *)(long)i);
            if (status != 0) {
                printf("Oops. pthread_create returned error code %d\n", status);
                exit(-1);
            }
        }
        exit(0);
    }

$ gcc -o threadtest threadtest.c -lpthread

Pthreads Code for Joining 10 Threads
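A minimal sketch of the joining step, in the style of the threadtest.c example above (the worker body is a placeholder, not the textbook's exact listing). Each pthread_join() call blocks until the corresponding thread has terminated, so main() cannot exit before its workers.

    #include <stdio.h>
    #include <stdlib.h>
    #include <pthread.h>

    #define NUMBER_OF_THREADS 10

    void *worker(void *tid)
    {
        printf("Thread %ld running\n", (long)tid);
        pthread_exit(NULL);
    }

    int main(void)
    {
        pthread_t threads[NUMBER_OF_THREADS];
        int i, status;

        for (i = 0; i < NUMBER_OF_THREADS; i++) {
            status = pthread_create(&threads[i], NULL, worker, (void *)(long)i);
            if (status != 0) {
                fprintf(stderr, "pthread_create failed: %d\n", status);
                exit(1);
            }
        }

        /* Wait for all 10 threads before main() returns. */
        for (i = 0; i < NUMBER_OF_THREADS; i++)
            pthread_join(threads[i], NULL);

        return 0;
    }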
Win32 API Multithreaded C Program
Win32 API Multithreaded C Program (Cont.)
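Since the original listing is a separate figure, here is a minimal sketch of the usual Win32 pattern: CreateThread() starts a thread running a summation function, and WaitForSingleObject() waits for it to finish. The names Summation, Param, and Sum are illustrative.

    #include <windows.h>
    #include <stdio.h>

    DWORD Sum;  /* data shared by the threads */

    /* The created thread runs this function. */
    DWORD WINAPI Summation(LPVOID Param)
    {
        DWORD Upper = *(DWORD *)Param;
        for (DWORD i = 1; i <= Upper; i++)
            Sum += i;
        return 0;
    }

    int main(void)
    {
        DWORD  Param = 10;
        DWORD  ThreadId;
        HANDLE ThreadHandle;

        /* Create the thread with default security, stack size, and flags. */
        ThreadHandle = CreateThread(NULL, 0, Summation, &Param, 0, &ThreadId);
        if (ThreadHandle == NULL) {
            fprintf(stderr, "CreateThread failed\n");
            return 1;
        }

        WaitForSingleObject(ThreadHandle, INFINITE);  /* wait for the thread */
        CloseHandle(ThreadHandle);
        printf("sum = %lu\n", (unsigned long)Sum);
        return 0;
    }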
Java Threads
- Java threads are managed by the JVM
- Typically implemented using the thread model provided by the underlying OS
- Java threads may be created by:
  - extending the Thread class
  - implementing the Runnable interface

Java Multithreaded Program
Java Multithreaded Program (Cont.)

Multicore Programming
- Multicore system: multiple computing cores on a single chip
- Multithreaded programming makes more efficient use of the multiple cores and improves concurrency
- Multicore systems put pressure on programmers to make better use of the multiple computing cores
- Challenges in programming for multicore systems include:
  - dividing activities (identifying separate, concurrent tasks)
  - balance (tasks of equal work and equal value)
  - data splitting (separating the data accessed by the tasks)
  - data dependency (synchronization required)
  - testing and debugging (many possible execution paths)
- Types of parallelism:
  - Data parallelism – distributes subsets of the same data across multiple cores, with the same operation performed on each
  - Task parallelism – distributes threads across cores, each thread performing a unique operation

Concurrency vs. Parallelism
- Concurrent execution on a single-core system (without parallelism):
  - concurrency as an interleaving
  - only one thread runs at a time (the illusion of parallelism comes from rapidly switching between them)
- Parallelism on a multicore system:
  - threads can run in parallel
  - a separate thread can be assigned to each core

Amdahl's Law
- Identifies the performance gain from adding additional cores to an application that has both serial and parallel components
- With serial fraction S and N processing cores:
    speedup <= 1 / (S + (1 - S) / N)
- Example: if an application is 75% parallel and 25% serial, moving from 1 to 2 cores gives a speedup of 1 / (0.25 + 0.75/2) = 1.6
- As N approaches infinity, the speedup converges to 1 / S
- The serial portion of an application therefore has a disproportionate effect on the performance gained by adding additional cores
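A quick numeric check of the formula above; the function name and the chosen values of N are just for illustration.

    #include <stdio.h>

    /* Amdahl's law: speedup <= 1 / (S + (1 - S) / N), S = serial fraction. */
    static double amdahl_speedup(double S, double N)
    {
        return 1.0 / (S + (1.0 - S) / N);
    }

    int main(void)
    {
        printf("S = 0.25, N = 2:    %.2f\n", amdahl_speedup(0.25, 2));  /* 1.60 */
        printf("S = 0.25, N = 4:    %.2f\n", amdahl_speedup(0.25, 4));  /* 2.29 */
        printf("S = 0.25, N -> inf: %.2f\n", 1.0 / 0.25);               /* 4.00 */
        return 0;
    }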
User Threads and Kernel Threads
- User threads – managed by a user-level thread library
  - Three primary thread libraries: POSIX Pthreads, Win32 threads, Java threads
- Kernel threads – supported by the kernel (the operating system)
  - Examples: virtually all general-purpose operating systems, including Windows, Solaris, Linux, Tru64 UNIX, and Mac OS X

User Thread Implementation (1)
- In user space: put the threads package entirely in user space
- The kernel knows nothing about the threads and manages the process as an ordinary single-threaded process
- With this approach, threads are implemented by a library
- Each process needs its own private thread table to keep track of the threads in that process, holding each thread's program counter, stack pointer, registers, state, and so forth
- The run-time system is in charge of managing user threads and the thread table; the thread scheduler is just a local procedure
- Advantages:
  - a user-level threads package can be implemented even on an OS that does not support threads
  - each process can have its own customized scheduling algorithm
  - invoking the local thread scheduler is much more efficient than making a kernel system call
  - when switching threads, neither a trap nor a context switch is needed, and the memory cache need not be flushed

User Thread Implementation (2)
- Problems:
  - blocking system calls: letting a thread make a blocking system call (e.g. read()) is unacceptable, since it stops all the threads in the process
  - page faults within a thread: the kernel blocks the entire process until the disk I/O completes, without being aware of the existence of threads
  - possible solutions: convert blocking system calls to non-blocking ones (which also requires changes to many user programs), or tell in advance whether a call will block
  - threads running forever: a thread that starts running and never gives up the CPU prevents every other thread in the process from running; within a single process there are no clock interrupts, so no round-robin scheduling with a fixed time quantum is available

Kernel Thread Implementation
- In the kernel: the kernel knows about and manages the threads
- Neither a run-time system nor a thread table is needed in each process
- The kernel has a thread table that keeps track of all the (kernel) threads in the system, in addition to the traditional process table
- The kernel's thread table holds each thread's registers, state, and other information
- Thread management in the kernel:
  - no need for non-blocking system calls: a blocking call blocks only the calling thread
  - creating and destroying threads is more costly, which can be mitigated by recycling threads
  - switching between threads requires a general (kernel) context switch
- Problems:
  - the cost of system calls for thread creation and termination
  - when a multithreaded process creates a child: should the child have one thread or all of them?
  - signal processing: signals are sent to processes, not to threads, so which thread should get the signal?

Multithreading Models
- Describe the relationship between user threads and kernel threads:
  - Many-to-One
  - One-to-One
  - Many-to-Many

Many-to-One
- Many user-level threads are mapped to a single kernel thread
- Threads are managed by the thread library in user space
- One thread blocking (e.g. on a blocking system call) causes all to block
- Multiple threads cannot run in parallel on a multicore system because only one may be in the kernel at a time
- Few systems currently use this model
- Examples: Solaris Green Threads, GNU Portable Threads

One-to-One
- Each user-level thread maps to a kernel thread, so threads may run in parallel on multiprocessors
- Drawback: creating a user thread requires creating the corresponding kernel thread, so only a limited number of threads can be supported
- More concurrency than many-to-one
- The number of threads per process is sometimes restricted due to this overhead
- Examples: Windows NT/XP/2000, Linux, Solaris 9 and later

Many-to-Many Model
- Multiplexes many user-level threads onto a smaller or equal number of kernel threads
- Advantages:
  - a reasonable degree of concurrency is supported, unlike the many-to-one model
  - unlike the one-to-one model, the user does not need to worry about creating too many threads: the application can create as many user threads as necessary
  - allows the operating system to create a sufficient number of kernel threads
  - when a thread invokes a blocking system call, the kernel can schedule another thread for execution
- Example: Solaris prior to version 9

Two-level Model
- Similar to many-to-many, except that it also allows a user thread to be bound to a kernel thread
- Examples: IRIX, HP-UX, Tru64 UNIX, Solaris 8 and earlier

Implicit Threading
- Growing multicore processing results in applications containing hundreds, or even thousands, of threads; program correctness is harder to ensure with explicit threads
- One solution is to transfer the creation and management of threads from programmers to compilers and run-time libraries: implicit threading
- Three alternative approaches for designing multithreaded programs for multicore processors through implicit threading:
  - thread pools
  - OpenMP
  - Grand Central Dispatch
- Other methods include Intel Threading Building Blocks (TBB) and the java.util.concurrent package

Thread Pools
- Thread pool: create a number of threads at process startup and place them into a pool where they await work
- A request awakens a thread from the pool; on completion the thread returns to the pool
- Advantages:
  - usually slightly faster to service a request with an existing thread than to create a new thread
  - allows the number of threads in the application(s) to be bound to the size of the pool, since unlimited threads could exhaust system resources such as CPU time or memory
  - separating the task to be performed from the mechanics of creating the task allows different strategies for running the task (e.g. scheduling after a time delay or periodic scheduling)
- Windows API support for thread pools (a full example is sketched below):
  - declaring a thread function:
        DWORD WINAPI PoolFunction(PVOID Param)
        {
            /* this function runs as a separate thread */
        }
  - submitting PoolFunction() as a thread's task:
        QueueUserWorkItem(&PoolFunction, NULL, 0);
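Putting the two fragments together, the following sketch queues PoolFunction() to the Windows thread pool via QueueUserWorkItem() and uses an event to wait for it; the event is an addition for illustration only, not part of the original slide.

    #include <windows.h>
    #include <stdio.h>

    static HANDLE Done;  /* signaled when the work item has finished */

    /* Runs on a thread drawn from the system thread pool. */
    DWORD WINAPI PoolFunction(PVOID Param)
    {
        printf("work item running, param = %p\n", Param);
        SetEvent(Done);
        return 0;
    }

    int main(void)
    {
        Done = CreateEvent(NULL, FALSE, FALSE, NULL);  /* auto-reset, unsignaled */
        if (Done == NULL)
            return 1;

        if (!QueueUserWorkItem(PoolFunction, NULL, 0)) {   /* default flags */
            fprintf(stderr, "QueueUserWorkItem failed: %lu\n",
                    (unsigned long)GetLastError());
            return 1;
        }

        WaitForSingleObject(Done, INFINITE);  /* wait for the pooled thread */
        CloseHandle(Done);
        return 0;
    }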
OpenMP
- A set of compiler directives and an API for C, C++, and FORTRAN programs
- Provides support for parallel programming in shared-memory environments
- Identifies parallel regions as blocks of code that may run in parallel
- Lets the compiler generate a multithreaded parallel program from the compiler directives
- Allows developers to choose among several levels of parallelism
- Available on several compilers for Linux, Windows, and Mac OS X systems
- Examples:
        #pragma omp parallel
  creates as many threads as there are cores in the system, and
        #pragma omp parallel for
        for (i = 0; i < N; i++) {
            c[i] = a[i] + b[i];
        }
  runs the for loop in parallel

Grand Central Dispatch (GCD)
- GCD is a technology for Apple's Mac OS X and iOS operating systems
- Extensions to the C and C++ languages, an API, and a run-time library
- Allows developers to identify parallel sections; GCD manages most of the details of threading
- GCD's extensions to C and C++ are known as blocks; a block is enclosed in "^{ }":
        ^{ printf("I am a block"); }
- GCD schedules blocks for run-time execution by placing them on a dispatch queue; when a block is removed from the queue, it is assigned to an available thread from a thread pool
- Two types of dispatch queues:
  - serial queue: blocks are removed in FIFO order; each process has one such queue, called the main queue, and programmers can create additional serial queues within a program
  - concurrent queue: blocks are removed in FIFO order, but several may be removed at a time; there are three system-wide concurrent dispatch queues, distinguished by priority: low, default, and high

Threading Issues
- The traditional semantics of the fork() and exec() system calls change in a multithreaded program: does fork() duplicate only the calling thread or all threads of the process?
- Some systems therefore provide two versions of fork(): one that duplicates all threads and one that duplicates only the thread that invoked fork()
- Which version to use depends on the application:
  - if exec() is called immediately after forking, duplicate only the calling thread
  - if exec() is not called after forking, duplicate all threads

Signal Handling
- Signals are used in UNIX systems to notify a process that a particular event has occurred
- A signal may be received either synchronously or asynchronously, depending on the source of and the reason for the event
- All signals follow the same pattern:
  1. a signal is generated by a particular event
  2. the signal is delivered to a process
  3. once delivered, the signal must be handled by one of two signal handlers: the default signal handler, or a user-defined signal handler that overrides the default
- In a single-threaded program, signals are simply delivered to the process
- Options in multithreaded programs:
  - deliver the signal to the thread to which the signal applies
  - deliver the signal to every thread in the process
  - deliver the signal to certain threads in the process
  - assign a specific (master) thread to receive all signals for the process

Thread Cancellation
- Thread cancellation: terminating a thread (the target thread) before it has completed, e.g. stopping a Web browser while pages are still loading
- Asynchronous cancellation terminates the target thread immediately
- Deferred cancellation allows the target thread to periodically check whether it should be canceled (self-termination)
- Pthread code to create and cancel a thread (a sketch follows):
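A minimal sketch, assuming the deferred (default) cancellation type; the worker loop and sleep intervals are illustrative.

    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    void *worker(void *arg)
    {
        (void)arg;
        /* With deferred cancellation (the default), the thread is canceled
           only at cancellation points such as sleep() or pthread_testcancel(). */
        while (1) {
            pthread_testcancel();   /* explicit cancellation point */
            sleep(1);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t tid;

        pthread_create(&tid, NULL, worker, NULL);   /* create the target thread */
        sleep(2);
        pthread_cancel(tid);                        /* request cancellation     */
        pthread_join(tid, NULL);                    /* wait for it to terminate */
        printf("worker thread canceled\n");
        return 0;
    }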
Thread Cancellation (Cont.)
- Invoking pthread_cancel() only initiates cancellation; when the actual cancellation happens depends on the state (configuration) of the target thread
- If the thread has cancellation disabled, the cancellation remains pending until the thread enables it
- The default type is deferred cancellation:
  - cancellation occurs only when the thread reaches a cancellation point, e.g. pthread_testcancel()
  - the thread's cleanup handlers are then invoked
- On Linux systems, thread cancellation is handled through signals

Thread-Local Storage
- Thread-local storage (TLS) allows each thread to have its own copy of data
- Useful when you do not have control over the thread-creation process (e.g. when using a thread pool)
- Different from local variables:
  - local variables are visible only during a single function invocation
  - TLS is visible across function invocations
- Similar to static data, except that TLS is unique to each thread
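One way to get TLS in C is sketched below, using the C11 _Thread_local storage class (pthread_key_create() is the portable Pthreads alternative); the counter and thread numbering are illustrative.

    #include <pthread.h>
    #include <stdio.h>

    static _Thread_local int tls_counter = 0;   /* one copy per thread */

    void *worker(void *arg)
    {
        for (int i = 0; i < 3; i++)
            tls_counter++;                      /* touches only this thread's copy */
        printf("thread %ld: tls_counter = %d\n", (long)arg, tls_counter);
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;

        pthread_create(&t1, NULL, worker, (void *)1L);
        pthread_create(&t2, NULL, worker, (void *)2L);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);

        printf("main: tls_counter = %d\n", tls_counter);   /* still 0 in main */
        return 0;
    }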
Lightweight Processes
- A lightweight process (LWP) is an intermediate data structure between user and kernel threads
- To the user-thread library, the LWP appears to be a virtual processor on which the application can schedule a user thread to run
- Each LWP is attached to a kernel thread, and the operating system schedules kernel threads to run on physical processors
- If a kernel thread blocks, its LWP blocks as well, and so does the user-level thread attached to the LWP

Scheduler Activations
- Both the many-to-many and two-level models require communication between the kernel and the thread library to maintain the appropriate number of kernel threads allocated to the application
- Scheduler activation is one scheme for this communication between the user-thread library and the kernel:
  - the kernel provides the application with a set of virtual processors (LWPs)
  - the application can schedule user threads onto an available LWP
  - each LWP is attached to a kernel thread, which the OS schedules to run on a physical processor
- Scheduler activations provide upcalls, a communication mechanism from the kernel to the thread library
- An upcall is made by the kernel to inform the application (its run-time system) about certain events, e.g. that a thread is about to block because of a blocking system call or a page fault (identifying the specific thread), or that an I/O has completed, causing the thread library's scheduler to schedule another thread

Operating System Examples
- Windows threads
- Linux threads

Windows Threads
- Windows implements the one-to-one mapping at kernel level
- Each thread contains:
  - a thread id
  - a register set
  - separate user and kernel stacks
  - a private data storage area
- The register set, stacks, and private storage area are known as the context of the thread
- The primary data structures of a thread include:
  - ETHREAD (executive thread block)
  - KTHREAD (kernel thread block)
  - TEB (thread environment block)

Windows Threads Data Structures
(Figure: ETHREAD, KTHREAD, and TEB)

Linux Threads
- Linux refers to threads as tasks rather than threads
- A new task created with fork() generally copies all the associated data structures of the parent
- A new task (thread) can also be created through the clone() system call, with a set of flags indicating the level of sharing between the parent and child tasks
- The level of sharing varies with the flags passed to clone(): passing flags such as CLONE_VM, CLONE_FS, CLONE_FILES, and CLONE_SIGHAND (sharing the address space, file-system information, open files, and signal handlers) makes clone() act like ordinary thread creation, while passing none of them makes clone() behave much like fork()
- In either case, the new task's struct task_struct simply points to the process data structures, whether shared or unique
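A hedged sketch of thread-like task creation through the glibc clone() wrapper; the stack size, flag set, and child function are illustrative, not taken from the slides.

    #define _GNU_SOURCE
    #include <sched.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static int child_fn(void *arg)
    {
        (void)arg;
        printf("child task %d shares the parent's address space\n", getpid());
        return 0;
    }

    int main(void)
    {
        const size_t STACK_SIZE = 1024 * 1024;
        char *stack = malloc(STACK_SIZE);
        if (stack == NULL)
            return 1;

        /* Thread-like sharing: address space, file-system info, open files,
           and signal handlers are shared with the parent.  With none of the
           CLONE_* flags, clone() would behave much like fork(). */
        int flags = CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD;

        pid_t pid = clone(child_fn, stack + STACK_SIZE, flags, NULL);
        if (pid == -1) {
            perror("clone");
            return 1;
        }

        waitpid(pid, NULL, 0);   /* SIGCHLD lets the parent wait normally */
        free(stack);
        return 0;
    }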