Unix Internals

advertisement
Unix Internals
3
Threads & lightweight Processes
process – program in execution
process control point
tracks execution sequence – program counter PC
single threaded process
single control point – single instruction sequence
multi-threaded process
multiple control points – multiple instruction sequences
thread -- instruction sequence, computational unit
n threads running on n processors
work completed in 1/nth time required for
single threaded process
thread -- context switch
 user-level threads library
 kernel
thread synchronization
 within process
 between processes
parallelism – actual degree of parallel execution achieved
≤ number of physical processors available
concurrency – maximum parallelism that could be achieved
with an unlimited number of processors
natural implementation model for concurrent applications
kernel provides system concurrency by
recognizing multiple threads of control within process
scheduling them independently
multiplexing threads onto available processors
natural programming model for concurrent applications
user processes provide user concurrency by
invoking the scheduling and multiplexing features of
user-level thread libraries
C. Robert Putnam
Threads & LWP 3
Page 1
3/7/2016
5:27 AM
dual concurrency model
 system concurrency -- threads seen by the kernel
 user concurrency – threads not seen by the kernel
o allows synchronization between concurrent routines
without the overhead of making system calls
process

set of threads
o thread is a dynamic object
 represents a control point in the process
 executes a sequence of instructions
o thread has private objects
 program counter
 stack
 register context
 …

collection of resources
o address space, open files, user credentials, quotas, …
o shared by all threads in process
Kernel Threads
 share
o kernel text
o global data
 have individual kernel stacks
 can be individually scheduled
 use standard (kernel) synchronization mechanisms
 are inexpensive to create and use
o resources used
 kernel stack
 area to save register context
 structure to hold
 scheduling information
 synchronization information
 context switch among kernel threads
o memory mappings do not need to be flushed
o common text area
o common global data
C. Robert Putnam
Threads & LWP 3
Page 2
3/7/2016
5:27 AM
Lightweight Processes

kernel-supported user thread


every active process has at least one LWP
LWP’s
o
are each supported by a separate kernel thread
o
are independently scheduled
o
share the processes address space
o
share other resources of process
o
can make system calls & block for resources
 resource waits may block individual LWP’s
but need not block the entire process
o
may be dispatched to execute on separate processors
(multiprocessor system)
o
require storage space for
 kernel stack
 register context
 user register context
LWP operations

creation

deletion

synchronization

blocking

scheduling
require expensive kernel system calls
consume significant kernel resources

each system call requires two mode switches
o user  kernel
o kernel  user

on each mode switch
o LWP crosses protection boundary
o kernel must
 copy the system call parameters
 user space  kernel space
 kernel space  user space
 validate against malicious processes
C. Robert Putnam
Threads & LWP 3
Page 3
3/7/2016
5:27 AM
User Threads
user-level thread libraries
 provide functions to
o create
o synchronize
o schedule
o terminate
o …
o manage threads
Mach C-threads
POSIX Pthreads
kernel does not recognize user threads
kernel schedules kernel threads for execution
in order to execute
user threads must be mapped to kernel threads
Solaris Thread Model
kernel ─ recognizes, schedules, and manages LWPs
user level library
 multiplexes : user threads  LWP
 provides facilities for
o inter-thread scheduling
o context switching synchronization
 acts as virtual kernel for user threads
user-level thread context – saved & restored without kernel intervention
user thread has private
 user stack
 area to save user-level register context
 signal masks
 …
kernel schedules LWP
LWP uses library functions to schedule user threads onto the LWP
LWP  preemted  user thread  preemted
user thread makes blocking system call  LWP blocked;
if process has only one LWP, then process is blocked
C. Robert Putnam
Threads & LWP 3
Page 4
3/7/2016
5:27 AM
asynchronous I/O mechanisms
perform I/O without blocking
SVR4 IO_SETSIG ioctl() function call
STREAMS device
process issues read / write function returns without blocking
I/O completion  SIGPOLL signal sent to process
library provides
 synchronous interface to the user
 asynchronous interface to the kernel
complexities of asynchronous operations are hidden in the library
user threads
 consume no kernel resources except when bound to LWP
 functionality implemented at user level without using
kernel system calls hence avoids the overhead of
o trap processing
o moving parameters & data across protection boundaries
split scheduling model
threads library -- schedules user threads
kernel -- schedules LWP
no communication between threads library & kernel wrt scheduling
kernel
o does not know relative priorities of user threads
o may preempt LWP running high priority thread to
schedule LWP running low priority thread
kernel threads are primitive objects not visible to applications
LWP are user-visible threads that are recognized by the kernel and are
implemented by linking them to kernel threads
user level threads are not visible to the kernel;
o they may be supported by LWP or
o they may be implemented by linking them to a standard process
C. Robert Putnam
Threads & LWP 3
Page 5
3/7/2016
5:27 AM
Fork
1
2
3
1 2
2
fork – exec
threads 1 or 3
lock resource
prior to fork()
after fork()
thread 2 precluded
from trying to acquire
lock on the same
resource due to the
potential for deadlock
1
2
3
3
before fork(), in parent
thread 1 blocks in parent 
thread 3 opens network connection
after fork(), in child
thread 1 state undefined in the child
thread 3 closes network connection
which may result in
unexpected messages being
sent to the remote node
Signals
Unix POSIX signals are
 directed to processes and
 handled at the process level
C. Robert Putnam
Threads & LWP 3
Page 6
3/7/2016
5:27 AM
Multithreaded Process
1. send signal to each thread – suspend all threads in process
2. appoint master thread in process to receive all signals – not favored
3. send signal to arbitrarily chosen thread
SIGTSTP (stop signal generated from terminal)
SIGINT (interrupt signal generated by external events)
4. heuristically determine the thread which gave rise to the signal
SIGSEGV (segmentation fault)
SIGILL (illegal exception)
5. create a new thread to handle each new signal – not favored
Signals are normally masked to protect critical code sections;
each thread should specify it’s own signal mask ; the overhead
of per-thread signal masks is relatively low & acceptable
Within a process, it is often desirable for LWP’s to know about each other;
many systems provide a special system call that allows one LWP to send
a signal to another specific LWP within the same process.
Stack Growth
SVR4 process overflows user stack  segmentation violation fault 
within a configurable limit, kernel extends stack;
does not signal process
Multithreaded Process
 threads are allocated at the user level by the threads library
 per-thread stacks are allocated by the stack allocator in the threads library
 kernel extensions of a thread stack would conflict with the stack allocator
Multithreaded Systems
 kernel has no knowledge of user stacks
 stacks are allocated from the heap area
by the stack allocator in the threads library
 thread specifies stack size
 library protects against overflow by allocating a write-protected page
just beyond the end of the stack which causes a protection fault
when the stack overflows; the kernel send SIGSEGV signal to
offending thread and it has the responsibility of extending the stack
or handling the overflow
C. Robert Putnam
Threads & LWP 3
Page 7
3/7/2016
5:27 AM
User Level Thread Libraries


C Threads – Mach
Pthreads – POSIX



kernel generally has no explicit knowledge of user threads
thread library may use system calls to implement thread functionality
thread priority
process-relative priority used by the threads scheduler
to select a thread to be attached to a LWP
unrelated to kernel scheduling priority which schedules LPW’s
to be attached to thread executing on a CPU
thread attached to LWP is in running state even if the LWP is blocked,
preempted, or waiting to be scheduled; thread is blocked when it tries
to acquire synchronization object locked by another thread

Implementation of Thread Libraries

traditional UNIX kernels
o threads library acts as a miniature kernel
maintaining the state information for each thread
and
handling all thread operations at the user level
o effectively serializes all processing
some measure of concurrency provided
by using
asynchronous I/O facilities of the system

multithreaded kernels
o bind each thread to a separate LWP
requires kernel involvement in all synchronization &
thread scheduling operations
o multiplex user threads on a (smaller) set of LWP’s
 scheduling of threads onto LWP’s independent of kernel
 LWP ‘s scheduled by kernel
 no way to guarantee resources to a particular thread
 all threads in the LWP are equivalent  good performance
o allow both bounded and unbounded threads in the same process
 multiplex unbounded threads on selected LWP’s within user process
 assign bounded threads exclusive access to selected LWP’s
 allows preferential handling of bounded thread by
□ increasing the priority of it’s LWP or by
□ assigning the LWP exclusive access to a processor
C. Robert Putnam
Threads & LWP 3
Page 8
3/7/2016
5:27 AM
Scheduler Activations






kernel controls processor allocation
processors are allocated to particular processes
threads library schedules threads on assigned processors
threads library informs kernel of events that affect processor allocation;
it may request additional processors or relinquish processors
kernel may preempt a processor and allocate it to another process;
before the kernel preempts processor,
it will inform the threads library;
the threads library will reassign the affected threads
thread blocks in kernel  process retains allocated processor;
kernel informs thread library of blockage;
thread library schedules another thread on the processor
upcall


call made by kernel to threads library
kernel  threads library
scheduler activation





execution context used to run a thread; similar to LWP
allocated its own kernel & user stacks
upcall : kernel passes activation to library, which uses activation to
o process event
o run thread
o invoke system call
process has exactly one activation for each assigned processor
kernel does not timeslice activations on processor
C. Robert Putnam
Threads & LWP 3
Page 9
3/7/2016
5:27 AM
blocking operations


user thread blocks in the kernel
o kernel creates new activation
o upcalls to library; passing new activation to process
o library saves thread state from old activation
o library relinquishes the old activation
o library schedules different user thread on new activation
when blocking operation is complete
o kernel makes upcall to notify library of event
o new activation passed to process
o kernel may
 assign new processor to run new activation

 preempt one of the current activations
► upcall notifies library of two events
● original thread may be resumed
● thread running on that processor has been preempted
► library places both threads on ready list
► library schedules selected thread
kernel informs the thread library of blocking & preemption events;
library can make intelligent scheduling and synchronization decisions
C. Robert Putnam
Threads & LWP 3
Page 10
3/7/2016
5:27 AM
Solaris Threads
(adopted by SVR4.2/MP)
threads library
 multiplexes user threads onto a smaller set of LWPs;
 controls the number of LWPs available to individual processes
 binds selected threads to individual LWPs
kernel threads
 lightweight objects which may be independently
scheduled & dispatched to run on one of the system processors
 do not need to be associated with a process
 created, ran, & destroyed by kernel to execute specific functions
 when switching between kernel threads,
kernel does not need to remap virtual address space
 context switch between kernel threads
is less expensive than
context switch between processes

kernel threads
o are fully preemptible
o may belong to any of the scheduling classes in the system
o use special versions of the synchronization primitives that
 prevent priority inversion
low priority thread
locks resource that is needed by a
higher priority thread
o are used to handle asynchronous activity
allows kernel to associate priority to each activity
and to
appropriately schedule each such activity
o used to support LWPs
Solaris Kernel
kernel thread resources
organized as a set of kernel threads
 stack
 data structure
o kernel registers contents
o priority & scheduling information
o pointers used to put the thread on scheduler queue or
if thread is blocked, on a resource wait queue
o pointer to the stack
o pointers to associated lwp structure and proc structure
(NULL if thread is not assigned to a LWP)
o pointers to maintain a queue of all threads in
 a process
 the system
o information regarding the associated LWP
C. Robert Putnam
Threads & LWP 3
Page 11
3/7/2016
5:27 AM
LWP Implementation
LWPs
 provide multiple threads of control within a single process
 are scheduled independently
 may execute in parallel on multiprocessors
each LWP is bound to its own kernel thread;
binding remains effective through out its lifetime
Unix traditional proc structures & u areas are inadequate to support LWPs
Solaris proc structure
contains the per-process data including the process-specific
portions of the traditional proc structure & traditional u area
Solaris lwp structure
contains the per-LWP portions of the
traditional proc structure & traditional u area
 user-level register contents -- when LWP is not running
 system call arguments, results, error codes
 signal handling information
 resource usage & profiling information
 virtual time alarms
 user time
 CPU usage
 pointer to the associated kernel thread
 pointer to proc structure
 lwp structure can be swapped out with LWP
information that must not be swapped out,
e.g., signal masks,
is kept in the associated thread structure
C. Robert Putnam
Threads & LWP 3
Page 12
3/7/2016
5:27 AM
LWP synchronization primitives
 mutex locks
 condition variables
 counting semaphores
 reader-writer locks








thread trying to acquire a mutex
held by another thread may
busy-wait (spin lock) or
block until mutex is released
all LWP in a process share a common set of signal handlers
each LWP may have its own signal mask;
thus may decide how to handle its own set of signals
each LWP may specify its own alternate stack for handling signals
traps are synchronous signals generated by the actions of the LWP;
they are delivered to the LWP that caused the signal
interrupt signals are asynchronous signals that are created by
events external to the process; they can be delivered to
any LWP that has not masked the signal
LWPs have no global namespace;
LWPs in a process are invisible to other processes
a process cannot direct a
o signal to a specific LWP in another process
o message to a specific LWP in another process
number of LWPs
o determines maximum parallelism achievable
o limits the number of blocking operations outstanding
User Threads





threads library provides facilities for creating, scheduling,
synchronizing, blocking, and destroying threads without kernel
intervention
library creates a pool of LWPs for the process and multiplexes the
user threads on selected LWPs activated from the pool
application may
o specify the number of LWPs to create
o require system to dedicate a LWP to a particular thread
thread types
o bound threads – dedicated to a particular LWP
o unbound threads – share the common LWP pool
pool size depends on number of processors and user threads
C. Robert Putnam
Threads & LWP 3
Page 13
3/7/2016
5:27 AM
User Thread Implementation
user thread information
 TID (thread identifier) – allows threads within a process to
communicate with signals, etc
 saved register state – program counter & stack pointer
 user stack – each thread has its own stack – unknown to kernel
 signal mask – each thread may have its own signal mask
 priority – user thread has process-relative priority
o used by threads scheduler;
o kernel unaware of this priority
o kernel schedules the underlying LWp
 thead local storage
o managed by the threads library
o used to support reentrant versions of the C library interfaces,
e.g., the error code variable errno which must be maintained
for each individual thread
Solaris allows threads in different processes to synchonize with each other;
using synchronization variables placed in sheared memory or placed in files
and accessed by mapping the file via mmap
Synchronization objects may have lifetimes beyond those of the creating
process and may be used to synchronize threads in different processes
Interrrupt Handling


interrupt handlers manipulate data shared by the kernel
kernel must synchronize access to the shared data
traditional Unix
kernel controls access to shared data by raising interrupt priority level (ipl) to
block relevant interrupts while executing the code that accesses such data
C. Robert Putnam
Threads & LWP 3
Page 14
3/7/2016
5:27 AM
Solaris Interrupt & Synchronization Model




kernel synchronization objects – mutex locks, semaphores, etc
interrupt threads
o dynamically created
o assigned higher priority than other types of threads
o use synchronization objects;
 block if they require resource held by another thread
kernel maintains pool of interrupt threads
o preallocated & partially initialized
o pool contains one thread per interrupt level for each CPU
o each thread requires 8KB for stack & thread data
single system-wide thread for the clock
Solaris Interrupt Handling


thread T1 executing on processor P1 when it receives interrupt;
interrupt handler
o raises ipl to prevent further interrupts of same or lower priority;
o allocates interrupt thread T2 from pool and switches context to
it; while T2 executes, T1 is pinned – it may not run on another
CPU; when T2 returns, it switches context back to T1 which
resumes execution
o interrupt thread T2 runs without being completely initialized; thus
it cannot be descheduled; if thread has reason to block, it saves
its state & becomes a completely initialized, independent thread;
capable of running on any CPU; if T2 blocks, it returns control to
T1, unpinning it
Solaris System Call Handling
fork() system call duplicates each LWP of parent in the child; any LWP in
the middle of a system call will return with an EINTR error;
fork1() system call duplicates only thread that invokes it
programmers may write applications using only threads; optimize process
execution by manipulating the underlying LWPs to provide maximal
concurrency
C. Robert Putnam
Threads & LWP 3
Page 15
3/7/2016
5:27 AM
Mach Threads
task
 static object consisting of
o address space
o collection of system resources called port rights
 not an executable entity
 environment in which one or more threads can execute




thread – fundamental execution unit; runs within context of task
task may contain zero or more threads
threads share the task resources
each thread
o has a kernel stack used for system call handling
o has its own computation state,
i.e., program counter, stack pointer, general registers, etc
o is independently scheduled by processor
 threads that belong to tasks are equivalent to LWPs
 pure kernel threads belong to the kernel task
available processors in the system can be divided into
mutually exclusive processor sets
each task &/or thread can be assigned to any processor set
(many processor set operations require superuser privileges)
task structure
 pointer to address map; describes virtual address space of task
 header of list of threads belonging to task
 pointer to processor set to which task is assigned
 pointer to utask structure
 ports
 IPC-related information
C. Robert Putnam
Threads & LWP 3
Page 16
3/7/2016
5:27 AM
thread structure
 links to place thread on scheduler or wait queue
 pointers to the task and processor set to which it belongs
 links to place thread on list of threads in the same task
 links to place thread on the list of threads in the same processor set
 pointer to the process control block (PCB); i.e., register context
 pointer to kernel stack
 scheduling state – runnable, suspended, etc
 scheduling information – priority, scheduling policy, CPU usage
 pointers to associated uthread & utask structures
 thread-specific IPC information
task owns the resources, including the address space;
resources are shared by all the threads, which execute the code;
multithreaded process consists of one task & more than one thread
Mach Thread/Task System Calls
task manipulation
task_create()
task_terminate()
task_suspend()
task_resume()
thread manipulation
thread_create()
thread_terminate(
)
thread_suspend()
thread_resume()
thread modification
thread_status()
thread_mutate()
list of task threads
task_threads()
Mach C-threads Library
cthread_fork( void * (*func)() )  creates new thread that executes func()
cthread_join(T)  suspends calling thread until thread T terminates
cthread_yield()  requests scheduler to preempt calling thread
mutexes & condition variables
C. Robert Putnam
Threads & LWP 3
Page 17
3/7/2016
5:27 AM
implementation options
 coroutine-based library
o multiplexes user threads onto a single-threaded kernel task,
i.e., traditional Unix kernel
o threads are nonpreemptive
o thread switching
■ synchronization procedures,
e.g., current thread blocks on mutex or semaphore
■ cthread_yield() invocation
 thread-based library
o each C-thread is assigned to a different Mach thread
o threads are preemptively scheduled
o threads may execute simultaneously on a multiprocessor
 task-based library
o each C-thread is assigned to a different Mach task
o uses Mach virtual memory primitives to share memory among threads
Digital Unix





based on the Mach operating system
Unix features implemented by Mach primitives
Unix process implemented on top of the task & thread abstractions of Mach
Unix process features not provided by Mach
o user credentials
o open file descriptors
o signal handling
o process groups
multithreaded processes are supported by
o kernel
o Posix-compliant threads library
Digital Unix Architecture





kernel based on Mach 2.5
interface ported from 4.3BSD compatibility layer in Mach 2.5
which was, in turn, ported from the native 4.3BSD implementation
device drivers ported from the BSD-based Digital Ultrix
u area replaced by
o single utask structure for the task
o one uthread structure for each thread in the task
utask structure & uthread structures are
o not at fixed addresses in the process
o not swapped out with the process
C. Robert Putnam
Threads & LWP 3
Page 18
3/7/2016
5:27 AM
task wide information
utask structure






pointers to vnodes of current & root directories
pointer to proc structure
signal handling structures
open file descriptors table
default file creation mask (cmask)
resource usage, quotas, profiling information
thread opens file file descriptor is shared by all threads in task;
all threads share a common current working directory
per-thread resources of Unix process
uthread structure




pointer to saved user-level registers
pathname traversal fields
current & pending signals
thread-specific signal handlers
references to the old u area redirected to utask or uthread fields
e.g.,
#define u_cmask utask -> uu_cmask
#define u_pcb
uthread -> uu_pcb
proc structure









links to place structure on the allocated, zombie, & free process lists
signal masks
kernel threads
pointer to credentials structure
 associated with the kernel task
identification & hierarchy information
 perform system functions
o PID
 does not have
o parent PID
o utask structure
o pointers to
o uthread structure
 parent
o proc structure
 siblings
 children
kernel task
process group & session information
 does not have
scheduling fields (unused)
o address space
status & resource usage fields
o utask structure
pointers to task & utask structures
o uthread structure
pointer to the first thread
o proc structure
C. Robert Putnam
Threads & LWP 3
Page 19
3/7/2016
5:27 AM
proc structure
utask structure
task
thread
uthread
structure
address map
thread
uthread
structure
thread
uthread
structure
system calls & signals
 fork() new process contains a single thread
 traps (synchronous signals)
o delivered to thread that caused event
o each thread is allowed to declare its own set of signal handlers
 interrupts (asynchronous signals)
o delivered to any thread that has enabled acceptance
o all threads share common set of handlers
 all threads in process share single set of signal masks;
stored in proc structure
Pthreads Library Implementation
one Mach thread associated to each Pthread
implements asynchronous I/O functions
provides signal handling synchronization primitives & scheduling
functions
synchronization between threads can be implemented at the user level
kernel must be involved if LWP needs to block
thread T1 calls Posix function aioread();
library creates new thread T2 to issue synchronous read;
when read completes,
kernel wakes blocked thread T2 which notifies calling thread T1 via signal
T1 aioread()
T2 I/O request; block
T1 continues with other work
I/O progresses to completion
wake T2; notify T1
T1 receives & processes notification
C. Robert Putnam
Threads & LWP 3
Page 20
3/7/2016
5:27 AM
Mach 3.0 Continuations
process model
 each tread has its own kernel stack,
used when it traps into kernel for system call or exception
 when a thread blocks in the kernel,
its stack contains its current execution state
 kernel threads can block without having to explicitly save state
 drawback excessive memory consumption
interrupt model



kernel treats system calls & exceptions as interrupts;
► it uses a single per-processor kernel stack for
all kernel operations;
► if thread needs to block while in kernel,
it must first explicitly save its state somewhere;
advantage  memory saved by having only one kernel stack
drawback  must explicitly save state for each blocking operation
continuation model

thread_block ( NULL);  traditional Mach3.0 blocking behavior

thread_block ( void ( *contfn )() );  continuation blocking behavior

operational sequence
o save thread state
o execute thread_block ( void ( *contfn )() );
o kernel blocks thread; removes stack from calling thread
o when thread resumes,
 kernel allocates new stack to thread
 invokes continuation function *contfn ()
o continuation function recovers state from storage location
syscall_1(arg1)
{
■
■
■
/* save arg1 & thread state */
thread_block (f2);
/* not reached*/
}
C. Robert Putnam
Threads & LWP 3
Page 21
f2()
{
/* restore arg1 & thread state */
■
■
■
thread_syscall_return (status);
}
3/7/2016
5:27 AM
Optimizations & Analysis
direct benefit of continuations 
reduce number of kernel stacks in the system
(2.2 kernel stacks per processor)
message transfer optimization
 mach_msg() system call used by
o client thread to send message & wait for reply
o server thread to send replies to clients and wait for next request
 messages are passed via a port, i.e., a protected queue of messages
 sending & receiving messages are independent of one another
 if receiver is not ready, kernel queues message on port
 if sender finds receiver is waiting,
it hands off its stack to receiver &
blocks itself with a mach_msg_continue() continuation system call;
the receiving thread resumes, using senders stack, which already contains
all information necessary regarding the message to be transferred;
 avoids overhead of queueing & dequeueing messages
client thread hands off its stack to server thread & resumes execution
server thread hands off its stack to the client thread & resumes execution
each operating system vendor provides a different set of system calls to
create & manage threads, making it difficult to write portable multithreaded
code that efficiently uses the system resources; the POSIX standard defines
the threads library functions but does not address the kernel interfaces not
implementation
C. Robert Putnam
Threads & LWP 3
Page 22
3/7/2016
5:27 AM
Download