Duke Systems
Servers and Threads
Jeff Chase
Duke University

Processes and threads

[Figure: a process = virtual address space + main thread (with stack) + other threads (optional).]

Each process has a virtual address space (VAS): a private name space for the virtual memory it uses. The VAS is both a “sandbox” and a “lockbox”: it limits what the process can see/do, and protects its data from others.

Each process has a thread bound to the VAS, with stacks (user and kernel). From now on, we suppose that a process could have additional threads. If we say a process does something, we really mean its thread does it. We are not concerned with how to implement threads, but we presume that they can all make system calls and block independently.

The kernel can suspend/restart the thread wherever and whenever it wants.

Threads: a familiar metaphor

1. Page links and back button navigate a “stack” of pages in each tab.
2. Each tab has its own stack. One tab is active at any given time. You create/destroy tabs as needed. You switch between tabs at your whim.
3. Similarly, each thread has a separate stack. The OS switches between threads at its whim. One thread is active per CPU core at any given time.

Threads

• A thread is a stream of control.
  – defined by CPU register context (PC, SP, …)
  – Note: process “context” is thread context plus protected registers defining current VAS, e.g., ASID or “page table base register(s)”.
  – Generally “context” is the register values and referenced memory state (stack, page tables)
• Multiple threads can execute independently:
  – They can run in parallel on multiple CPUs...
    • physical concurrency
  – …or arbitrarily interleaved on a single CPU.
    • logical concurrency
  – Each thread must have its own stack.
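To make the last points concrete, here is a minimal sketch, assuming POSIX threads (pthreads), of two threads in one address space. Each thread counts in a local variable on its own stack, while both update one shared global. The names `struct worker`, `run_worker`, and `run_two_threads` are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* One copy per process: every thread in the address space sees it. */
static long shared_total = 0;
static pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical per-thread record (illustration only). */
struct worker {
    int iterations;     /* input: how many times to count */
    long local_total;   /* output: copy of the thread's private count */
};

static void *run_worker(void *arg) {
    struct worker *w = arg;
    long local = 0;                 /* lives on this thread's own stack */
    for (int i = 0; i < w->iterations; i++) {
        local++;                    /* private: no lock needed */
        pthread_mutex_lock(&total_lock);
        shared_total++;             /* shared: the scheduler may interleave
                                       the threads arbitrarily, so lock */
        pthread_mutex_unlock(&total_lock);
    }
    w->local_total = local;
    return NULL;
}

/* Runs two threads to completion; returns the shared count, or -1 on error. */
long run_two_threads(struct worker *a, struct worker *b) {
    pthread_t ta, tb;
    assert(a != NULL && b != NULL);
    shared_total = 0;
    if (pthread_create(&ta, NULL, run_worker, a) != 0)
        return -1;
    if (pthread_create(&tb, NULL, run_worker, b) != 0) {
        pthread_join(ta, NULL);
        return -1;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return shared_total;
}
```

Calling `run_two_threads` with iteration counts of 1000 and 2000 should leave each `local_total` equal to its own count and return their sum, whatever interleaving the scheduler picks. On some systems the program must be built with `-pthread`.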
Two threads sharing a CPU

[Figure: concept vs. reality; in reality the two threads alternate on the CPU, separated by context switches.]

Two threads: closer look

[Figure: one address space (0 to high) holding program code, library and common runtime, data (x, y), and one stack per thread. The running thread has its registers (R0…Rn, PC, SP) loaded in the CPU (core); the other thread is “on deck” and ready to run, with its registers saved.]

Thread context switch

[Figure: switch out / switch in within the same address space. 1. save registers. 2. load registers.]

Thread states and transitions

[Figure: state diagram with states ready, running, blocked, and exited. Labeled transitions: sleep (running to blocked; triggered by wait, STOP, read, write, listen, receive, etc.), wakeup (blocked to ready), exit (running to exited).]

The kernel process/thread scheduler governs these transitions.

Sleep and wakeup are internal primitives. Wakeup adds a thread to the scheduler’s ready pool: a set of threads in the ready state.

CPU Scheduling 101

The OS scheduler makes a sequence of “moves”.
– Next move: if a CPU core is idle, pick a ready thread t from the ready pool and dispatch it (run it).
– Scheduler’s choice is “nondeterministic”
– Scheduler’s choice determines interleaving of execution

[Figure: blocked threads enter the ready pool on Wakeup; GetNextToRun picks a thread and SWITCH() dispatches it; a running thread leaves the CPU if its timer expires, or on wait/yield/terminate.]

Event-driven programming

• Some of the goals of threads can be met by using an event-driven programming model.
• An event-driven program executes a sequence of events. The program consists of a set of handlers for those events.
  – e.g., Unix signals
• The program executes sequentially (no concurrency). But the interleaving of handler executions is determined by the event order.
• Pure event-driven programming can simplify management of inherently concurrent activities.
  – E.g., I/O, user interaction, children, client requests
• Some of these needs can be met using either threads or event-driven programming. But often we need both.

Event-driven programming vs. threads

• Often we can choose between event-driven and threaded structures.
• So it has been common for academics and developers to argue the relative merits of “event-driven programming vs. threads”.
• But they are not mutually exclusive.
• In practice we need both: to get real parallelism on real systems (e.g., multicore), we need some kind of threads underneath.
• We often use event-driven programming built above threads and/or combined with threads in a hybrid model.
• For example, each thread may be event-driven, or multiple threads may rendezvous on a shared event queue.
• We illustrate the continuum by looking first at Android and then at concurrency management in servers (e.g., the Apache Web server).

Android app: main event loop

• The main thread of an Android app is called the Activity Thread.
• It receives a sequence of events and invokes their handlers.
• Also called the “UI thread” because it receives all User Interface events.
  – screen taps, clicks, swipes, etc.
  – All UI calls must be made by the UI thread: the UI lib is not thread-safe.
  – MS-Windows apps are similar.
• The UI thread must not block!
  – If it blocks, then the app becomes unresponsive to user input: bad.

Android event loop: a closer look

• The main thread delivers UI events and intents to Activity components.
• It also delivers events (broadcast intents) to Receiver components.
• Handlers defined for these components must not block.
• The handlers execute serially in event arrival order.
• Note: Service and ContentProvider components receive invocations from other apps (i.e., they are servers).
• These invocations run on different threads…more on that later.

[Figure: the main event loop receives UI clicks and intents and delivers them to Activity and Receiver components. Dispatch events by invoking component-defined handlers.]

Event-driven programming

• This “design pattern” is called event-driven (event-based) programming.
• In its pure form the thread never blocks, except to wait for the next event, whatever it is.
• We can think of the program as a set of handlers: the system upcalls a handler to dispatch each event.
• Note: here we are using the term “event” to refer to any notification:
  – arriving input
  – asynchronous I/O completion
  – subscribed events
  – child stop/exit, “signals”, etc.

[Figure: a stream of events. Dispatch events by invoking handlers (upcalls).]

Android event classes: some details

• Android defines a set of classes for event-driven programming in conjunction with threads.
• A thread may have at most one Looper bound to a MessageQueue.
• Each Looper has exactly one thread and exactly one MessageQueue.
• The Looper has an interface to register Handlers.
• There may be any number of Handlers registered per Looper.
• These classes are used for the UI thread, but have other uses as well.

[Figure: Looper, Message Queue, Message, Handler.]

[These Android details are provided for completeness.]

Android: adding services (simplified)

[Figure: the main/UI thread runs the main event loop, delivering UI clicks and intents to Activity and Receiver components; a binder thread pool delivers incoming binder messages to Service and Provider components.]

Pool of event-driven threads

• Android Binder receives a sequence of events (intents) in each process.
• They include incoming intents on provider and service components.
• Handlers for these intents may block. Therefore the app lib uses a pool of threads to invoke the Handlers for these incoming events.
• Many Android apps don’t have these kinds of components: those apps can use a simple event-driven programming model and don’t need to know about threads at all.
• But apps having these component types use a different design pattern: pool of event-driven threads.
• This pattern is also common in multi-threaded servers, which poll socket descriptors listening for new requests. Let’s take a look.

Multi-threaded RPC server
[OpenGroup, late 1980s]

Ideal event poll API

Poll()
1. Delivers: returns exactly one event (message or notification), in its entirety, ready for service (dispatch).
2. Idles: Blocks iff there is no event ready for dispatch.
3. Consumes: returns each posted event at most once.
4. Combines: any of many kinds of events (a poll set) may be returned through a single call to poll.
5. Synchronizes: may be shared by multiple processes or threads (handlers are thread-safe as well).

A look ahead

• Various systems use various combinations of threaded/blocking and event-driven models.
• Unix made some choices, and then more choices.
• These choices failed for networked servers, which require effective concurrent handling of requests.
• They failed because they violate each of the five properties for “ideal” event handling.
• There is a large body of work addressing the resulting problems. Servers mostly work now.
  – More about server performance and Unix/Linux later.
• The Android Binder model is closer to the ideal.

Classic Unix

• Single-threaded processes
• Blocking system calls
  – Synchronous I/O: calling process blocks until each I/O request is “complete”.
• Each blocking call waits for only a single kind of event on a single object.
  – Process or file descriptor (e.g., file or socket)
• Add signals when that model does not work.
• With sockets: add select system call to monitor I/O on sets of sockets or other file descriptors.
  – select was slow for large poll sets. Now we have various variants: poll, epoll, pollet, kqueue. None are ideal.

Inside your Web server

[Figure: server application (Apache, Tomcat/Java, etc.) above the kernel’s queues: accept queue, listen queue, packet queues, disk queue.]

Server operations
  create socket(s)
  bind to port number(s)
  listen to advertise port
  wait for client to arrive on port (select/poll/epoll of ports)
  accept client connection
  read or recv request
  write or send response
  close client socket

Accept loop

    /* Needs <sys/socket.h>, <stdlib.h>, <unistd.h>. sock is a listening
       socket; handle() is defined elsewhere. */
    while (1) {
        int acceptsock = accept(sock, NULL, NULL);
        if (acceptsock < 0)
            continue;                       /* accept failed: try again */
        char *input = malloc(1024);
        if (input == NULL) { close(acceptsock); continue; }
        ssize_t n = recv(acceptsock, input, 1023, 0);
        if (n > 0) {
            input[n] = '\0';                /* recv does not NUL-terminate */
            int is_html = 0;
            char *contents = handle(input, &is_html);
            …send response…
        }
        free(input);
        close(acceptsock);
    }

If a server is listening on only one port/socket (“listener”), then it can skip the select/poll/epoll.
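A server with more than one listener cannot simply block in accept on one of them; it has to wait on the whole set. The following is a minimal sketch of that step using POSIX `poll()`. The helper name `wait_for_readable` and the cap of 16 descriptors are assumptions made for illustration, not part of any server discussed here.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

#define MAX_WATCHED 16  /* arbitrary cap for this sketch */

/*
 * Waits up to timeout_ms milliseconds for any descriptor in fds[0..n-1]
 * to become readable. Returns the index of the first readable
 * descriptor, or -1 on timeout, error, or bad arguments.
 */
int wait_for_readable(const int *fds, int n, int timeout_ms) {
    struct pollfd set[MAX_WATCHED];
    if (fds == NULL || n < 1 || n > MAX_WATCHED)
        return -1;
    for (int i = 0; i < n; i++) {
        set[i].fd = fds[i];
        set[i].events = POLLIN;     /* interested in "data to read" */
        set[i].revents = 0;
    }
    /* Blocks until at least one descriptor is ready or the timeout expires. */
    int ready = poll(set, (nfds_t)n, timeout_ms);
    if (ready <= 0)
        return -1;
    for (int i = 0; i < n; i++) {
        if (set[i].revents & POLLIN)
            return i;
    }
    return -1;
}
```

Note how this differs from the ideal Poll(): `poll()` only reports that a descriptor is ready. It neither delivers the event nor consumes it, so the caller must still call accept, read, or recv, and a second `poll()` on an undrained descriptor reports it ready again.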
Handling a request

[Figure: steps in handling a request: Accept Client Connection, Read HTTP Request Header, Find File, Send HTTP Response Header, Read File, Send Data. Annotations: may block waiting on network; may block waiting on disk I/O.]

Want to be able to process requests concurrently.

Web server (serial process)

Option 1: could handle requests serially

[Figure: timeline for Client 1, WS, Client 2: R1 arrives; Receive R1; Disk request 1a; R2 arrives; 1a completes; R1 completes; Receive R2.]

Easy to program, but painfully slow (why?)

Web server (event-driven)

Option 2: use asynchronous I/O

[Figure: timeline for Client 1, Client 2, WS, Disk: R1 arrives; Receive R1; Disk request 1a (Start 1a); R2 arrives; Receive R2; 1a completes (Finish 1a); R1 completes.]

Fast, but hard to program (why?)

Web server (multi-process)

Option 3: assign one thread per request

[Figure: timeline for Client 1, WS1, WS2, Client 2: R1 arrives; Receive R1; Disk request 1a; R2 arrives; Receive R2; 1a completes; R1 completes.]

Where is each request’s state stored?

Concurrency and pipelining

[Figure: CPU, DISK, and NET activity, Before and After.]
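Returning to Option 3 and its question about request state: with one thread per request, the in-progress state of each request can live in that thread's local variables, i.e., on its stack. Below is a minimal sketch assuming pthreads; `struct request`, `serve`, `serve_all`, the response text, and the cap of 32 threads are all hypothetical, and a real server would carry a client socket in place of the `path` string.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define MAX_REQUESTS 32 /* arbitrary cap for this sketch */

/* Hypothetical request record; a real server would hold a socket here. */
struct request {
    int id;
    char path[64];
    char response[192];
};

/* Thread body: handles exactly one request from start to finish. */
static void *serve(void *arg) {
    struct request *req = arg;
    char line[96];      /* per-request scratch state on this thread's stack */
    snprintf(line, sizeof line, "GET %s", req->path);
    /* A real handler could block here on disk or network I/O without
       stalling the threads serving other requests. */
    snprintf(req->response, sizeof req->response,
             "200 OK for request %d (%s)", req->id, line);
    return NULL;
}

/* Starts one thread per request, waits for all; returns how many ran. */
int serve_all(struct request *reqs, int n) {
    pthread_t tids[MAX_REQUESTS];
    int started = 0;
    assert(reqs != NULL);
    if (n > MAX_REQUESTS)
        n = MAX_REQUESTS;
    for (int i = 0; i < n; i++) {
        if (pthread_create(&tids[started], NULL, serve, &reqs[i]) != 0)
            break;      /* stop at the first failure */
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return started;
}
```

Compare this with Option 2: an event-driven server has no per-request stack to lean on, so it would have to keep the equivalent of `line` in an explicit per-request record between events, which is part of why that style is harder to program.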