Adapting UNIX For A Multiprocessor
Environment Using Threads
Group A3
Jahanzeb Faizan
Jonathan Sippel
Ka Hou Wong
October 15, 2001
Introduction
The UNIX process model
Limitations of the UNIX process model
Improving the UNIX process model
Threads
Figure 1
Figure 2
Figure 3
Multithreaded systems
Kernel threads
Lightweight processes
Figure 4
User threads
Figure 5
Figure 6
SunOS 5.0: A case study
Figure 7
Summary
References
Group A3
i
3/6/16
Introduction
During the 1980s, the demand for processing power exceeded the capabilities of the
computer systems of the time. To satisfy the demand for increased computing power, systems were
developed with multiple processors. These systems typically shared the same system
memory and Input/Output (I/O) infrastructure, an advance that required operating
systems to change.
UNIX was selected for these new systems because it was originally designed as a
portable, general-purpose, multi-tasking operating system, which was continually being
adapted to new and more powerful computer architectures. These multiprocessor
computers were natural candidates for a UNIX-based operating system because the high-level management techniques UNIX used for files, I/O, memory, and processes were
efficient models for manageable operating system functions. Since the operating system
was originally developed to run on a single processor, there were a number of challenges
in porting this familiar and efficient environment to powerful multiprocessor machines.
Early implementations of multiprocessor UNIX operating systems were asymmetric in
nature. The kernel could run on only one processor at a time, while user processes could
be scheduled on any of the available processors. The implementation of asymmetric
multiprocessors was a move in the right direction, but the scalability declined rapidly as
additional processors were added [4]. This brought about the need to redesign the UNIX
operating system to better support multiprocessor systems.
In this paper we will describe the traditional UNIX process model, discuss its limitations,
and review how it has been redesigned to better support concurrency and parallelism in a
multiprocessor environment. Because the topic is broad, we do not
discuss the new synchronization methods and scheduling algorithms needed to support
multithreaded applications.
The UNIX process model
The UNIX application environment contains a fundamental abstraction: the process [7]. In
traditional UNIX systems, the process executes a single sequence of instructions in an
address space. The address space of a process is simply the set of memory locations that
the process may reference or access.
The UNIX system is a multitasking environment; that is, several processes are active in the
system concurrently. To these processes, the system provides some features of a virtual
machine. In a virtual machine architecture, the operating system gives each process the
illusion that it is the only process on the machine. The programmer writes an application
as if only its code were running on the system. Under UNIX, each process has its own
registers and memory, but must rely on the UNIX kernel for I/O and device control.
UNIX processes contend for the various resources of the system, such as the processor,
memory, and peripheral devices. The UNIX kernel must act as a resource manager,
distributing the system resources optimally. A process that cannot acquire a resource it
needs must block (suspend execution) until that resource becomes available. Since the
processor is one such resource, only one process can actually run at a time in a
single-processor system. The rest of the processes are blocked, waiting for either the processor
or other resources. The UNIX kernel provides an illusion of concurrency by allowing
one process to have the processor for a brief period of time, then switching to another. In
this way each process receives processor time and is allowed to make progress.
Limitations of the UNIX process model
The process model has two important limitations. First, many applications need
to perform several largely independent tasks that can run concurrently but must share a
common address space and other resources. Such applications are parallel in nature and
require a programming model that supports parallelism. On traditional UNIX systems,
these programs must spawn multiple processes.
Using multiple processes in an application has some disadvantages. Creating additional
processes adds substantial overhead, since creating a new process requires an expensive
system call. Processes must use interprocess communication facilities such as message
passing or shared memory to communicate because each process has its own address
space.
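The separation of address spaces can be seen in a minimal sketch. We use Python's os.fork() wrapper for brevity; the function name fork_demo and the variable counter are our own illustrative choices, and the example assumes a UNIX-like system. The child updates its own copy of a variable, and the only way the parent learns the new value is through a pipe, one of the interprocess communication facilities mentioned above.

```python
import os

def fork_demo():
    """Fork a child, let it modify a variable, and report back through a pipe."""
    counter = 0                      # lives in this process's address space
    read_fd, write_fd = os.pipe()    # kernel-provided IPC channel

    pid = os.fork()                  # create a second process (UNIX only)
    if pid == 0:
        # Child: works on its own copy of the address space.
        try:
            os.close(read_fd)
            counter += 100
            os.write(write_fd, str(counter).encode())
            os.close(write_fd)
        finally:
            os._exit(0)              # leave immediately, skipping parent cleanup

    # Parent: the child's update is invisible here; only the pipe carries it.
    os.close(write_fd)
    message = os.read(read_fd, 64).decode()
    os.close(read_fd)
    os.waitpid(pid, 0)               # reap the child
    return counter, int(message)

parent_value, child_value = fork_demo()
print(parent_value, child_value)
```

The parent still sees its original value, while the child's result arrives only as a message. Both the process creation and the pipe traffic go through the kernel, which is the overhead this section describes.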
Second, traditional processes cannot take advantage of multiprocessor architectures
because a process can use only one processor at a time. An application must create a
number of separate processes and dispatch them to the available processors. These
processes must find ways of sharing memory and resources, and synchronizing their tasks
with each other.
Improving the UNIX process model
A process is defined by the resources it uses and the location at which it is executing.
There are many instances where it would be useful for resources to be shared and
accessed concurrently. The situation resembles a fork() system call that, instead of
duplicating the address space, starts a new program counter, or thread of control,
executing within the same address space. Many UNIX variants now provide mechanisms to
support this through thread facilities.
Threads
A traditional UNIX process has a single thread of control. A thread of control, otherwise
known as a thread, is a sequence of instructions being executed in a program. Each
thread has a program counter (PC) and a stack to keep track of local variables and return
addresses. Threads share the process instructions and most of the process's data. If a
thread changes any shared data, the change can be seen by all other threads in the process.
In addition, threads share most of the operating system state of a process [5].
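As a small illustration of this sharing, the following sketch uses Python's threading module. The names shared, worker, and demo are hypothetical and chosen only for this example. The worker keeps a local variable on its own stack and also changes a piece of process-wide data, which the main thread then observes directly.

```python
import threading

shared = {"value": "set by main thread"}    # process data, visible to all threads

def worker(local_results, name):
    scratch = name.upper()                  # local variable on this thread's own stack
    local_results[name] = scratch
    shared["value"] = "set by " + name      # change to shared data

def demo():
    local_results = {}
    t = threading.Thread(target=worker, args=(local_results, "worker"))
    t.start()
    t.join()                                # wait until the worker has finished
    return shared["value"], local_results

print(demo())
```

No message passing is involved: the main thread reads the updated value from the same memory the worker wrote to.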
A multithreaded UNIX process is no longer a thread of control in itself; instead it is
associated with one or more threads. Each thread executes independently.
The advantages of multithreaded systems are most apparent when combined with
multiprocessor architectures. An application can achieve true parallelism by running
each thread of a multithreaded process on a different processor.
Figure 1
Figure 1 [7] shows a set of single-threaded processes executing on a system with a single
processor. It appears that the processes are running concurrently on the system because
each process is being executed for a brief period of time before switching to the next. In
this example the first three processes are associated with a server application. The server
program starts a new process for each active client. The server processes have nearly
identical address spaces and share information with one another using interprocess
communication mechanisms. The last two processes are running another server
application.
Figure 2
Figure 2 [7] shows two servers running in a multithreaded system. Each server runs as a
single process, with multiple threads sharing a single address space. Either the kernel or
a user threads library, depending on the operating system, handles interthread context
switching. Since all of the application threads share a common address space, they can
use efficient, lightweight, interthread communication and synchronization mechanisms,
which significantly reduce the demand on the memory subsystem.
There are potential disadvantages with this approach. For instance, a single-threaded
process does not have to protect its data from other processes. Multithreaded processes
must be concerned with all data in their address space. If more than one thread can
access a piece of data, the threads must use some form of synchronization to avoid data
corruption.
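One common form of synchronization is a mutual-exclusion lock around each update. The sketch below is illustrative only: the function count_with_lock and its parameters are our own names, and Python's threading.Lock stands in for whatever primitive a given thread facility provides.

```python
import threading

def count_with_lock(num_threads=4, increments=10_000):
    """Increment one shared counter from several threads, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def add():
        nonlocal counter
        for _ in range(increments):
            with lock:               # only one thread at a time runs the update
                counter += 1

    threads = [threading.Thread(target=add) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

print(count_with_lock())
```

With the lock in place, the final count always equals the number of threads multiplied by the number of increments. Without it, the read-modify-write sequences of different threads could interleave and lose updates.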
Figure 3
Figure 3 [7] shows two multithreaded processes running on a system with multiple
processors. All threads of one process share the same address space, but each runs on a
different processor. Therefore, the threads all run in parallel. This improves
performance considerably because many threads can make progress at the same time, but it also
complicates synchronization because only one thread should modify shared data at
a time.
Multithreaded systems
The parallelism of a multiprocessor application is the actual degree of parallel
execution it achieves, and is therefore limited by the number of physical processors
available to the application. Concurrency is the maximum parallelism the
application could achieve with an unlimited number of processors. It depends on how
the application is written, how many threads of control can execute at the same time, and
the availability of the resources they need.
The kernel recognizes multiple threads of control within a process, schedules them
independently, and multiplexes them onto the available processor(s) to provide system
concurrency. Both single-processor and multiprocessor applications can benefit from
system concurrency because the kernel is able to schedule another thread if one blocks on
an event or resource.
User-level thread libraries are used by applications to provide user concurrency. The
kernel does not recognize these user threads, or co-routines, so they must be scheduled
and managed by the applications themselves. True concurrency or parallelism is not
achieved since these co-routines cannot actually run in parallel. However, non-blocking
system calls can be used by an application to simultaneously maintain several interactions
in progress. User threads capture the state of these simultaneous interactions in per-thread local variables on the thread’s stack instead of using a global state table, which
simplifies programming.
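The idea of application-managed co-routines can be sketched with Python generators. This is an analogy under our own assumptions, not the interface of any particular threads library, and the names interaction and run_round_robin are hypothetical. Each co-routine keeps its progress in ordinary local variables, and a small scheduler inside the application decides which one runs next.

```python
from collections import deque

def interaction(name, steps, log):
    """One user-level thread; its progress is kept in ordinary local variables."""
    for step in range(1, steps + 1):
        log.append(f"{name}:{step}")
        yield                     # voluntarily hand control back to the scheduler

def run_round_robin(tasks):
    """Application-level scheduler: the kernel sees only one thread of control."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)            # resume the co-routine until its next yield
            ready.append(task)    # still has work: back of the ready queue
        except StopIteration:
            pass                  # co-routine finished; drop it

def demo():
    log = []
    run_round_robin([interaction("A", 2, log), interaction("B", 3, log)])
    return log

print(demo())
```

The two interactions interleave step by step, yet nothing runs in parallel, and the loop counter of each co-routine plays the role of the per-thread state that would otherwise live in a global table.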
Each concurrency model offers limited value by itself. Threads serve both as
organizational tools and as a way to exploit multiple processors. A kernel thread facility allows
parallel execution on multiple processors, but it is not suitable for structuring user
applications. On the other hand, a purely user-level facility is only useful for structuring
applications and does not allow parallel execution of code.
Many systems combine system and user concurrency to implement a dual concurrency
model. The kernel recognizes multiple threads in a process, and libraries add user threads
that are not seen by the kernel. User threads are desirable in systems with multithreaded
kernels because they allow synchronization between concurrent routines in a program
without the overhead of making system calls. Splitting the thread support functionality
between the kernel and the threads library also reduces the size and
responsibilities of the kernel.
Kernel threads
A kernel thread does not have to be associated with a user process. It is internally created
and destroyed when it is needed by the kernel and is responsible for executing a specific
function. It has its own kernel stack and shares the kernel text and global data. It can be
independently scheduled and uses the standard synchronization mechanisms of the kernel
(e.g., sleep() and wakeup()) [7].
Kernel threads are mostly used for performing operations like asynchronous I/O. The
kernel can create a new thread to handle each request instead of providing special
mechanisms to handle this. The thread handles the request synchronously, but the
operation appears to be asynchronous to the rest of the kernel. Kernel threads are also
used to handle interrupts.
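The thread-per-request pattern can be illustrated with a user-space analogy. The sketch below is not kernel code: it uses ordinary Python threads, the names handle_request and issue_requests are hypothetical, and a string stands in for the data a blocking read would return. Each request gets its own thread, which does its work synchronously, while the issuer continues without waiting.

```python
import queue
import threading

def handle_request(request_id, completed):
    """Runs in its own thread and performs the (stand-in) I/O synchronously."""
    data = f"data for request {request_id}"    # placeholder for a blocking read
    completed.put((request_id, data))          # signal completion to the issuer

def issue_requests(request_ids):
    completed = queue.Queue()
    workers = []
    for rid in request_ids:
        t = threading.Thread(target=handle_request, args=(rid, completed))
        t.start()                 # the issuer does not wait here
        workers.append(t)
    # The issuer is free to do other work at this point.
    for t in workers:
        t.join()
    return dict(completed.get() for _ in request_ids)

print(issue_requests([1, 2, 3]))
```

From the issuer's point of view the requests are asynchronous, even though each handler is written as simple sequential code.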
Kernel threads are inexpensive to create and use since they use limited resources. For
instance, they need only a kernel stack and an area to save the register context when they
are not running. Context switching between kernel threads is quick since no memory
mappings need to be flushed [7].
Lightweight processes
A lightweight process (LWP) is a kernel-supported user thread. A system must support
kernel threads before it can support LWPs. As seen in Figure 4 [7], a process may have
one or more LWPs, each supported by a separate kernel thread. The LWPs are
independently scheduled and share the process's address space and other resources.
They can make system calls and block for I/O or resources. True parallelism exists on a
multiprocessor system since each LWP can be sent to run on a different processor. There
are major advantages to using LWPs even on a single-processor system, since resource
and I/O waits block individual LWPs and not the entire process.
Figure 4
Besides the kernel stack and register context, an LWP also needs to maintain some user
state. This mostly includes the user register context, which must be saved when the LWP
is preempted. Each LWP is associated with a kernel thread, but some kernel threads
have no LWP and may be dedicated to system tasks.
These multithreaded processes are useful when each thread is relatively independent and
does not frequently interact with other threads. User code in these processes is fully
preemptible, and LWPs share a common address space. Any data that can be accessed at
the same time by multiple LWPs must be accessed in a synchronized manner. Shared
variables are protected by locking facilities that the kernel provides, and LWPs are blocked
from accessing the locked data.
User threads
Thread abstraction can be provided entirely at the user level without involvement from
the kernel. Library packages such as POSIX pthreads and Mach C-threads are used to
accomplish this [7]. These libraries provide all the functions for creating, synchronizing,
scheduling, and managing threads and do not require any special assistance from the
kernel, as illustrated in Figure 5 [7]. As a result, thread interactions are very fast.
Figure 5
In Figure 6 [7], the library acts as a miniature kernel for the threads it controls by combining
user threads and lightweight processes. This ultimately creates a very powerful
programming environment because the kernel recognizes, schedules, and manages the
LWPs. In addition, a user-level library multiplexes user threads on top of LWPs and
provides facilities for inter-thread scheduling, context switching, and synchronization
without involving the kernel [7].
Figure 6
Only the kernel has the ability to modify the memory management registers, so it retains
the responsibility for process switching. User threads are not really schedulable entities,
and the kernel has no knowledge of them. The kernel is responsible for scheduling the
underlying process or LWP, which will use library functions to schedule its threads. The
threads are preempted each time the process or LWP is preempted. Similarly, if a user
thread makes a blocking system call, it blocks the underlying LWP. If the process has
only one LWP, all its threads are blocked. The library also provides protection for shared
data structures using synchronization objects. These synchronization objects usually
contain a type of lock variable such as a semaphore and a queue of threads blocked on it.
Threads must acquire the lock before accessing the data structure. If the data is already
locked, the library blocks the thread by adding it to its blocked thread queue and
transferring control to another thread.
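A synchronization object of this kind can be sketched in a few lines. The code below is a simplified model under our own assumptions: generators stand in for user threads, a boolean stands in for the lock variable, and the names UserMutex, thread_body, and run are hypothetical rather than taken from any real threads library. A thread that finds the lock held is parked on the object's queue, and control passes to another ready thread.

```python
from collections import deque

class UserMutex:
    """Synchronization object: a lock variable plus a queue of blocked threads."""
    def __init__(self):
        self.locked = False
        self.blocked = deque()

def thread_body(name, mutex, shared, log):
    yield ("acquire", mutex)       # ask the library for the lock
    log.append(f"{name} enters")
    value = shared["count"]
    yield ("yield", None)          # switch away while still holding the lock
    shared["count"] = value + 1
    log.append(f"{name} leaves")
    yield ("release", mutex)

def run(threads):
    """Library scheduler: runs entirely at user level, no kernel involvement."""
    ready = deque(threads)
    while ready:
        thread = ready.popleft()
        try:
            op, mutex = next(thread)
        except StopIteration:
            continue                               # thread has finished
        if op == "acquire":
            if mutex.locked:
                mutex.blocked.append(thread)       # block: park on the mutex queue
                continue                           # transfer control to another thread
            mutex.locked = True
        elif op == "release":
            if mutex.blocked:
                ready.append(mutex.blocked.popleft())  # hand the lock to a waiter
            else:
                mutex.locked = False
        ready.append(thread)

def demo():
    mutex, shared, log = UserMutex(), {"count": 0}, []
    run([thread_body("T1", mutex, shared, log),
         thread_body("T2", mutex, shared, log)])
    return shared["count"], log

print(demo())
```

Although the first thread gives up the processor in the middle of its update, the second cannot enter until the lock is released, so both increments survive.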
Performance is the biggest advantage of user threads. They implement their
functionality at the user level without system calls, are very lightweight, and
consume no kernel resources except when bound to an LWP. This avoids the overhead of trap
processing and of moving parameters and data across protection boundaries [7].
SunOS 5.0: A case study
A prime example of a multithreaded UNIX operating system is SunOS 5.0, the operating
system component of the Solaris 2.0 operating environment. Until 1992, SunOS
supported only traditional UNIX processes. Then, in 1992 it was redesigned as a modern
operating system with support for symmetric multiprocessing.
SunOS 5.0 supports user threads through a library for their creation and scheduling, and the
kernel knows nothing of these threads. SunOS 5.0 expects potentially thousands of user-level threads to be vying for processor time.
SunOS 5.0 defines an intermediate level of threads as well. Between user threads and
kernel threads are lightweight processes. Each process contains at least one LWP. These
LWPs are manipulated by the thread library. The user threads are multiplexed on the
LWPs of the process, and only user threads currently connected to LWPs make progress.
The rest are either blocked or waiting for an LWP on which they can run.
Standard kernel threads execute all operations within the kernel. There is a kernel thread
for each LWP, and there are some kernel threads that run on the kernel’s behalf and have
no associated LWP (for instance, a thread to service disk requests). The SunOS 5.0
thread system is depicted in Figure 7. Kernel threads are the only objects scheduled
within the system. Some kernel threads are multiplexed on the processor(s) in the
system, whereas some are tied to a specific processor. For instance, the kernel thread
associated with a device driver for a device connected to a specific processor will run
only on that processor. By request, a thread can also be pinned to a processor. Only that
thread runs on the processor, with the processor allocated to only that thread (Figure 7).
Figure 7
Consider how the system operates. Any one process may have many user threads.
These user threads may be scheduled and switched among kernel-supported lightweight
processes without the intervention of the kernel. No kernel context switch is needed for one
user thread to block and another to start running, so user threads are extremely efficient.
Lightweight processes support these user threads. Each LWP is connected to exactly one
kernel thread, whereas each user thread is independent of the kernel. There may be many
LWPs in a process, but they are needed only when threads need to communicate with the
kernel. For instance, one LWP is needed for every thread that may block concurrently in
system calls. Consider five different file read requests that could be occurring
simultaneously. Then, five LWPs would be needed, because they could all be waiting for
I/O completion in the kernel. If a process had only four LWPs, then the fifth request
would have to wait for one of the LWPs to return from the kernel. Adding a sixth LWP
would gain us nothing if there were only enough work for five.
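The arithmetic of this example can be checked with a small simulation. It is a deliberately simplified model, not a description of the SunOS scheduler: we assume all requests are issued at time 0, that each one blocks its LWP for one time unit, and the function name finish_times is our own.

```python
import heapq

def finish_times(num_requests, num_lwps, io_time=1):
    """Return the time at which each blocking request completes."""
    free_at = [0] * num_lwps              # when each LWP next returns from the kernel
    heapq.heapify(free_at)
    finished = []
    for _ in range(num_requests):
        start = heapq.heappop(free_at)    # earliest LWP able to carry the call
        end = start + io_time             # the LWP stays blocked for the whole I/O
        finished.append(end)
        heapq.heappush(free_at, end)
    return finished

print(finish_times(5, 4))
print(finish_times(5, 5))
print(finish_times(5, 6))
```

With four LWPs the fifth request finishes one time unit later than the others. With five LWPs all requests finish together, and a sixth LWP changes nothing.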
The kernel threads are scheduled by the kernel’s scheduler and execute on the
processor(s) in the system. If a kernel thread blocks (usually waiting for an I/O operation
to complete), the processor is free to run another kernel thread. If the thread that blocked
was running on behalf of an LWP, the LWP blocks as well. Any user-level thread
currently attached to the LWP also blocks. If the process containing that thread has only
one LWP, the whole process blocks until the I/O completes.
With SunOS 5.0, a process no longer must block while waiting for I/O to complete. The
process may have multiple LWPs; if one blocks, the others can continue to execute
within the process.
Summary
Redesigning UNIX around threads has made it a much more efficient operating system.
Applications that need to perform several largely independent tasks concurrently, but
must share a common address space and other resources, can now take advantage of
thread facilities. It is no longer necessary for these applications to spawn multiple
processes, thus eliminating the overhead of expensive system calls and providing a more
efficient use of memory and resources.
By having multiple threads of control, a process is no longer limited to running on a
single processor. It can now take advantage of the parallelism that a multiprocessor
architecture provides.
References
1. Maurice J. Bach, The Design of the UNIX Operating System, Prentice Hall,
Englewood Cliffs, New Jersey, 1986.
2. J. R. Eykholt, S. R. Kleiman, S. Barton, R. Faulkner, A. Shivalingiah, M. Smith, D.
Stein, J. Voll, M. Weeks, D. Williams, Beyond Multiprocessing… Multithreading the
SunOS Kernel, USENIX Summer, 1992, San Antonio, Texas.
3. M. D. Janssens, J. K. Annot, and A. J. Van De Goor, Adapting UNIX for a
Multiprocessor Environment, Communications of the ACM, September, 1986, Vol.
29, no. 9, pp. 895 - 901.
4. Jim Mauro, Solaris Internals: Core Kernel Components, Sun Microsystems Press,
2001.
5. M. L. Powell, S. R. Kleiman, S. Barton, D. Shah, D. Stein, M. Weeks, SunOS
Multithread Architecture, USENIX Winter, 1991, Dallas, Texas.
6. Channing H. Russell and Pamela J. Waterman, Variations on UNIX for Parallel-processing Computers, Communications of the ACM, December, 1987, Vol. 30, no.
12, pp. 1048 - 1055.
7. Uresh Vahalia, UNIX Internals: The New Frontiers, Prentice Hall, Upper Saddle
River, New Jersey, 1996.