Threads

Introduction to Threads
 Overview
 Multithreading Models
 Thread Libraries
 Threading Issues
 Operating System Examples
 Windows XP Threads
 Linux Threads
Threads
 A thread is just a sequence of instructions to execute
 Threads share the same memory space as other threads in the same application, so they automatically share data and variables
 Threads can run on different processor cores on a multicore processor, which makes applications faster and more responsive
 Even on a single-core processor, threads make an application more responsive: if one thread blocks waiting for I/O, other threads can still run
 Processes each have their own virtual memory address space, and the OS takes much longer to switch between processes than between threads. Sharing data across processes requires additional overhead and steps, so processes carry far more overhead than threads in many applications. Most applications have one process with several threads.
 In C/C++, a thread typically runs the code in a C/C++ function, and a special API call starts up a new thread running that function (see the sketch below)
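
As a minimal sketch of that pattern using the POSIX Pthreads API (covered later in these notes; compile with -pthread, and note the worker function and its argument are illustrative):

#include <pthread.h>
#include <stdio.h>

/* the function the new thread runs; it receives one void* argument */
void *worker(void *arg) {
    printf("hello from thread %d\n", *(int *)arg);
    return NULL;
}

int main(void) {
    pthread_t tid;
    int id = 1;
    pthread_create(&tid, NULL, worker, &id);  /* start a new thread running worker() */
    pthread_join(tid, NULL);                  /* wait for the thread to finish */
    return 0;
}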
Single and Multithreaded Processes
Benefits of Threads
 Responsiveness
 Applications can run up to N times faster on an N-core processor
 Resource Sharing
 Economy
 Scalability
Multicore Programming
 Applications run on only one processor core unless they use multiple threads
 Multicore systems are putting more pressure on programmers to use threads; multithreaded applications face challenges that include:
     Dividing activities
     Balancing the computational load
     Data splitting
     Data dependency
     Testing and debugging
Concurrent Execution on a Single-core System
The OS can time-slice among the four threads T1…T4
Parallel Execution on a Multicore System
The OS can time-slice the four threads T1…T4 across two processor cores. Two threads can run in parallel on different cores, so the application could run up to twice as fast. Without threads, an application can run on only one core!
User Threads
 Thread management is done by a user-level threads library
 Three primary thread libraries:
     POSIX Pthreads
     Win32 threads
     Java and C# threads
 A simplified thread-library wrapper called GThreads will be used in the last lab on Jinx
Thread Libraries
 A thread library provides the programmer with an API for creating and managing threads
 Two primary ways of implementing:
     Library entirely in user space
     Kernel-level library supported by the OS
Pthreads
 A POSIX standard (IEEE 1003.1c) API for thread creation and synchronization
 The API specifies the behavior of the thread library; implementation is up to the developers of the library
 Common in UNIX operating systems (Solaris, Linux, Mac OS X)
 Can also be added to Windows by installing an optional Pthreads library
Java and C# Threads
 Thread support is built into these newer languages with keywords
 Java threads are managed by the JVM
 C# thread support is in the .NET Framework (whose CLR plays the role the JVM plays for Java)
 Typically implemented using the threads model provided by the underlying OS
 Java and C# threads may be created by:
     Extending the Thread class
     Implementing the Runnable interface
Threading Issues
 Semantics of fork() and exec() system calls
 Thread cancellation of a target thread
     Asynchronous or deferred
 Signal handling
 Thread pools
 Thread-specific data
 Scheduler activations
Thread Cancellation
 Terminating a thread before it has finished
 Two general approaches (a Pthreads sketch of both follows below):
     Asynchronous cancellation terminates the target thread immediately
     Deferred cancellation allows the target thread to periodically check whether it should be cancelled
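
A sketch of both approaches with Pthreads (the work loop is hypothetical; pthread_cancel(), pthread_testcancel(), and pthread_setcanceltype() are the standard POSIX calls):

#include <pthread.h>

void *target(void *arg) {
    /* deferred cancellation (the Pthreads default): the thread is only
       cancelled at cancellation points such as pthread_testcancel() */
    for (;;) {
        /* ... do one unit of work ... */
        pthread_testcancel();  /* safe point to honor a pending cancel */
    }
    return NULL;
}

/* From another thread, request cancellation of the target thread:
       pthread_cancel(tid);
   For asynchronous cancellation (terminate immediately), the target
   thread itself would first call:
       pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL);        */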
Signal Handling
 Signals are used in UNIX systems to notify a process that a particular event has occurred
 A signal handler is used to process signals:
     1. Signal is generated by a particular event
     2. Signal is delivered to a process
     3. Signal is handled
 Options for delivering a signal in a multithreaded process:
     Deliver the signal to the thread to which the signal applies
     Deliver the signal to every thread in the process
     Deliver the signal to certain threads in the process (see the sketch below)
     Assign a specific thread to receive all signals for the process
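
As one concrete illustration of the third option, POSIX programs can direct a signal at a particular thread with pthread_kill(); a minimal sketch (the empty handler and the wrapper function are illustrative):

#include <pthread.h>
#include <signal.h>

static void handler(int sig) {
    /* runs in whichever thread the signal is delivered to;
       only async-signal-safe work belongs here */
    (void)sig;
}

static void notify_one_thread(pthread_t worker_tid) {
    struct sigaction sa = {0};
    sa.sa_handler = handler;
    sigaction(SIGUSR1, &sa, NULL);      /* install the handler (process-wide) */
    pthread_kill(worker_tid, SIGUSR1);  /* deliver SIGUSR1 to one chosen thread */
}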
Thread Pools
 Create a number of threads in a pool, where they await work (a minimal sketch follows below)
 Advantages:
     Usually slightly faster to service a request with an existing thread than to create a new thread
     Allows the number of threads in the application(s) to be bound by the size of the pool
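
A minimal fixed-size pool sketch with Pthreads (all names and sizes are illustrative; full-queue handling and shutdown are omitted for brevity):

#include <pthread.h>

#define POOL_SIZE  4
#define QUEUE_SIZE 16

typedef void (*task_fn)(void *);

static struct { task_fn fn; void *arg; } queue[QUEUE_SIZE];
static int head = 0, tail = 0, pending = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

/* each pool thread loops forever, pulling tasks off the shared queue */
static void *pool_worker(void *unused) {
    (void)unused;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (pending == 0)
            pthread_cond_wait(&not_empty, &lock);  /* sleep until work arrives */
        task_fn fn = queue[head].fn;
        void *arg = queue[head].arg;
        head = (head + 1) % QUEUE_SIZE;
        pending--;
        pthread_mutex_unlock(&lock);
        fn(arg);  /* run the task outside the lock */
    }
    return NULL;
}

void pool_submit(task_fn fn, void *arg) {
    pthread_mutex_lock(&lock);
    queue[tail].fn = fn;  /* assumes the queue is not full */
    queue[tail].arg = arg;
    tail = (tail + 1) % QUEUE_SIZE;
    pending++;
    pthread_cond_signal(&not_empty);  /* wake one waiting pool thread */
    pthread_mutex_unlock(&lock);
}

void pool_init(void) {
    pthread_t tid;
    for (int i = 0; i < POOL_SIZE; i++)
        pthread_create(&tid, NULL, pool_worker, NULL);  /* workers are never joined here */
}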
Windows Threads
 Implements the one-to-one mapping, kernel-level
 Each thread contains:
     A thread id
     A register set
     Separate user and kernel stacks
     A private data storage area
 The register set, stacks, and private storage area are known as the context of the thread
Linux Threads
 Linux refers to them as tasks rather than threads
 Thread creation is done through the clone() system call
 clone() allows a child task to share the address space of the parent task (process); a sketch follows below
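
A sketch of clone() through the glibc wrapper (the flag set and stack size are illustrative; real thread libraries pass additional flags such as CLONE_THREAD and manage stacks more carefully):

#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)

static int shared = 0;  /* visible to the child task because of CLONE_VM */

static int child_fn(void *arg) {
    (void)arg;
    shared = 42;  /* writes the parent's memory: the address space is shared */
    return 0;
}

int main(void) {
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return 1;
    /* CLONE_VM, CLONE_FS, CLONE_FILES, and CLONE_SIGHAND make the child
       task share the parent's address space, filesystem info, open files,
       and signal handlers, i.e., behave like a thread */
    clone(child_fn, stack + STACK_SIZE,  /* the stack grows downward on x86 */
          CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD,
          NULL);
    sleep(1);  /* crude: give the child time to run before the parent exits */
    printf("shared = %d\n", shared);
    return 0;
}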
Background on the Need for Synchronization
• Threads may need to wait for other threads to finish an operation
• Additionally, concurrent access to shared data by multiple threads may result in data inconsistency (i.e., incorrect values)
• Maintaining data consistency requires mechanisms to ensure the orderly execution of cooperating processes (or threads)
Example Problem
• Suppose two threads share a common buffer array. The producer puts items into the buffer and the consumer removes them.
• A solution to the two-thread producer-consumer problem that fills all the buffer space uses an integer count to keep track of the number of full buffers. Initially, count is set to 0. It is incremented by the producer after it produces a new buffer and decremented by the consumer after it consumes a buffer.
Shared data (declarations assumed by both code fragments):

int buffer[BUFFER_SIZE];
int in = 0, out = 0;   /* next free slot / next full slot */
int count = 0;         /* number of full buffers */

Producer
while (true) {
    /* produce an item and put it in nextProduced */
    while (count == BUFFER_SIZE)
        ;  /* do nothing: buffer is full */
    buffer[in] = nextProduced;
    in = (in + 1) % BUFFER_SIZE;
    count++;
}

Consumer
while (true) {
    while (count == 0)
        ;  /* do nothing: buffer is empty */
    nextConsumed = buffer[out];
    out = (out + 1) % BUFFER_SIZE;
    count--;
    /* consume the item in nextConsumed */
}
Critical Section
• The code segments that read and write global data shared between threads or processes are called "critical sections"
• Race condition bugs on global variable values are possible; an example follows
• The OS synchronization API is used to solve this
• You must be careful to use OS synchronization primitives to control access to a critical section, or hidden bugs will appear in code
Race Condition on count
• count++ could be implemented as
    register1 = count
    register1 = register1 + 1
    count = register1
• count-- could be implemented as
    register2 = count
    register2 = register2 - 1
    count = register2
• Consider this execution interleaving, with count = 5 initially:
    S0: producer executes register1 = count          {register1 = 5}
    S1: producer executes register1 = register1 + 1  {register1 = 6}
    S2: consumer executes register2 = count          {register2 = 5}
    S3: consumer executes register2 = register2 - 1  {register2 = 4}
    S4: producer executes count = register1          {count = 6}
    S5: consumer executes count = register2          {count = 4}
• One increment and one decrement should leave count at 5, but this interleaving ends with count = 4: the producer's update is lost
Need an Atomic Operation
• The count++ and count-- code must run to completion before switching to the other thread, to avoid bugs
• An atomic operation here means a basic operation that cannot be stopped or interrupted in the middle to switch to another thread (see the sketch below)
• Race conditions will occur faster on systems with multiple processors, since threads are running in parallel
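
As a sketch of what an atomic read-modify-write looks like in code (this uses C11 <stdatomic.h>, one way to obtain such operations; the slides themselves do not prescribe it):

#include <stdatomic.h>

atomic_int count = 0;

/* atomic_fetch_add/atomic_fetch_sub perform the load, the arithmetic, and
   the store as one indivisible step, so no interleaving can lose an update */
void producer_increment(void) { atomic_fetch_add(&count, 1); }  /* count++ */
void consumer_decrement(void) { atomic_fetch_sub(&count, 1); }  /* count-- */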
Solution to Critical-Section Problem
1. Mutual Exclusion (Mutex) - If process Pi is executing in its
critical section, then no other processes can be executing in
their critical sections
2. Progress - If no process is executing in its critical section and
there exist some processes that wish to enter their critical
section, then the selection of the processes that will enter
the critical section next cannot be postponed indefinitely
3. Bounded Waiting - A bound must exist on the number of
times that other processes are allowed to enter their critical
sections after a process has made a request to enter its
critical section and before that request is granted
 Assume that each process executes at a nonzero speed
 No assumption concerning relative speed of the N processes
Solution to Critical-Section Problem Using Mutex Locks
do {
    acquire lock
        critical section
    release lock
        remainder section
} while (TRUE);
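
With Pthreads, the abstract acquire/release above maps onto a mutex; a minimal sketch protecting the shared count from the producer-consumer example:

#include <pthread.h>

pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;
int count = 0;

void producer_update(void) {
    pthread_mutex_lock(&count_lock);    /* acquire lock */
    count++;                            /* critical section */
    pthread_mutex_unlock(&count_lock);  /* release lock */
    /* remainder section */
}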
Deadlock and Starvation
• Deadlock – two or more processes or threads are waiting indefinitely for
an event that can be caused by only one of the waiting processes
• Let S and Q be two semaphores initialized to 1 (i.e., each acts as a mutual-exclusion lock)

    P0                P1
    wait(S);          wait(Q);
    wait(Q);          wait(S);
      .                 .
      .                 .
      .                 .
    signal(S);        signal(Q);
    signal(Q);        signal(S);

• If P0 acquires S while P1 acquires Q, each then waits forever for the semaphore the other holds: deadlock
• Starvation – indefinite blocking: a process may never be removed from the semaphore queue in which it is suspended
• Priority Inversion – a scheduling problem that occurs when a lower-priority process holds a lock needed by a higher-priority process; the scheduler may have to run the lower-priority process first so the higher-priority one can continue, which subverts the intended priorities
Barriers for Thread Synchronization
Barriers define synchronization points used to coordinate the execution of a team of threads. When a thread reaches a synchronization point, its execution is stopped until all other threads in the team reach the synchronization point.
Basic Barrier
A simple barrier is implemented using an atomic shared counter. The
counter is incremented by each thread after entering the barrier.
Threads wait at the barrier until the counter becomes equal to the
number of threads.
This kind of barrier cannot be reused, because the counter is never reset safely. Reusing the barrier by resetting the counter risks starvation, because storing 0 into the counter masks the old value: if a thread is suspended during the resetting phase and misses the full count, it will never leave the barrier.
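
A sketch of this single-use barrier with C11 atomics (NUM_THREADS is an assumed compile-time constant):

#include <stdatomic.h>

#define NUM_THREADS 4

static atomic_int arrived = 0;

void basic_barrier(void) {
    atomic_fetch_add(&arrived, 1);  /* count this thread in */
    while (atomic_load(&arrived) < NUM_THREADS)
        ;  /* spin until every thread has arrived */
    /* not reusable: there is no safe point at which to reset 'arrived' */
}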
Sense Reversing Barrier
Adding a sense flag allows reuse of a barrier many times. The
barrier counter is used to keep track of how many threads
have reached the barrier, but the waiting phase is performed
by spinning on a sense flag. Threads wait until the barrier
sense flag matches the thread-private sense flag. The last
thread reaching the barrier resets both the counter and the
barrier sense flag, while each thread must reset its local sense
flag before exiting the barrier.
The sense flag distinguishes between odd and even barrier phases. Resetting the counter is safe here because threads wait on the sense flag, not on the counter, so the reset does not interfere with the barrier waiting variable.
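
A sketch of the sense-reversing barrier in the same style (each thread keeps a private sense flag, initially false, and passes it in; the names are illustrative):

#include <stdatomic.h>
#include <stdbool.h>

#define NUM_THREADS 4

static atomic_int count = 0;
static atomic_bool sense = false;  /* global barrier sense flag */

void barrier(bool *local_sense) {
    *local_sense = !*local_sense;  /* flip the private sense flag for this phase */
    if (atomic_fetch_add(&count, 1) == NUM_THREADS - 1) {
        /* last thread to arrive: reset the counter, then release the
           others by making the barrier sense match their private sense */
        atomic_store(&count, 0);
        atomic_store(&sense, *local_sense);
    } else {
        while (atomic_load(&sense) != *local_sense)
            ;  /* spin on the sense flag, not on the counter */
    }
}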