W4118 Operating Systems Instructor: Junfeng Yang

Logistics

Homework 2 update

Assembly code to call a hook function written in C
• syscall_fail(long syscall_nr)

Clarifications on sys_fail (int ith, int ncall, struct
syscall_failures calls)
• Only fail system calls from the current process
• Each fail() injects only one failure
• if a system call matches one of the system call numbers
specified in the calls argument, count it as one matching
call toward ith

Mac users: talk to TA to get access to VMware fusion
Last lecture

Processes in Linux



Context switch on x86
Kernel stack captures all state → switch stack = switch process
Threads: good at expressing concurrency
efficiently


Multithreading Models: different tradeoffs
Race conditions
Recall: banking example
#include <pthread.h>
#include <stdio.h>

int balance = 1000;

void* deposit(void *arg);
void* withdraw(void *arg);

int main()
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, deposit, (void*)1);
    pthread_create(&t2, NULL, withdraw, (void*)2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("all done: balance = %d\n", balance);
    return 0;
}

void* deposit(void *arg)
{
    int i;
    for(i = 0; i < 1e7; ++i)
        ++balance;
    return NULL;
}

void* withdraw(void *arg)
{
    int i;
    for(i = 0; i < 1e7; ++i)
        --balance;
    return NULL;
}
Recall: a closer look at the banking
example
$ objdump -d bank
…
08048464 <deposit>:
…
8048473: a1 80 97 04 08
8048478: 83 c0 01
804847b: a3 80 97 04 08
…
0804849b <withdraw>:
…
80484aa: a1 80 97 04 08
80484af: 83 e8 01
80484b2: a3 80 97 04 08
…
// ++ balance
mov 0x8049780,%eax
add $0x1,%eax
mov %eax,0x8049780
// -- balance
mov 0x8049780,%eax
sub $0x1,%eax
mov %eax,0x8049780
Avoiding Race Conditions


Race condition: a timing-dependent error involving shared state

Critical section: a segment of code that accesses shared variables (or resources) and must not be concurrently executed by more than one thread
// ++ balance
mov 0x8049780,%eax
add $0x1,%eax
mov %eax,0x8049780
…
// -- balance
mov 0x8049780,%eax
sub $0x1,%eax
mov %eax,0x8049780
…
How to implement critical sections?


Atomic operations: no other instructions can be interleaved; executed "as a unit", "all or none"; guaranteed by hardware
A possible solution: create a
super instruction that does
what we want atomically


add $0x1, 0x8049780
Problem


Can’t anticipate every possible
way we want atomicity
Increases hardware
complexity, slows down other
instructions
// ++ balance
mov 0x8049780,%eax
add $0x1,%eax
mov %eax,0x8049780
…
// -- balance
mov 0x8049780,%eax
sub $0x1,%eax
mov %eax,0x8049780
…
Layered approach to synchronization

Hardware provides simple low-level atomic
operations, upon which we can build high-level,
atomic operations, upon which we can
implement critical sections and build correct
multi-threaded/multi-process programs
Properly synchronized application
High-level synchronization primitives
Hardware-provided low-level atomic operations
Example low-level atomic operations and
high-level synchronization primitives

Low-level atomic operations



On uniprocessor, disable/enable interrupts
x86 load and store of words
Special instructions:
• Test-and-set

High-level synchronization primitives



Lock
Semaphore
Monitor
Will look at them all.
Start with lock.
Locks (or Mutex: Mutual exclusion)

Two common operations



lock(): acquire lock exclusively; wait if not available
unlock(): release exclusive access to lock
pthread example
pthread_mutex_t l = PTHREAD_MUTEX_INITIALIZER;

void* deposit(void *arg)
{
    int i;
    for(i = 0; i < 1e7; ++i) {
        pthread_mutex_lock(&l);
        ++balance;
        pthread_mutex_unlock(&l);
    }
}

void* withdraw(void *arg)
{
    int i;
    for(i = 0; i < 1e7; ++i) {
        pthread_mutex_lock(&l);
        --balance;
        pthread_mutex_unlock(&l);
    }
}
Critical Section Goals


Requirements

Safety (aka mutual exclusion): no more than one thread in critical section at a time

Liveness (aka progress):
• If multiple threads simultaneously request to enter critical section, must allow one to proceed
• Must not depend on threads outside critical section

Bounded waiting (aka starvation-free):
• Must eventually allow waiting thread to proceed
• However, assumes each thread makes progress

Makes no assumptions about the speed and number of CPUs
Desirable properties



Efficient: don’t consume too much resource while waiting
• Don’t busy wait (spin wait). Better to relinquish CPU and let other
thread run
Fair: don't make some threads wait longer than others. Hard to do efficiently
Simple: should be easy to use
Implementing Locks: version 1

Can cheat on uniprocessor: implement locks by disabling
and enabling interrupts
lock()
{
    disable_interrupt();
}

unlock()
{
    enable_interrupt();
}

Linux kernel heavily used this trick in single-core days:
cli(): __asm__ __volatile__("cli": : :"memory")
sti(): __asm__ __volatile__("sti": : :"memory")
Good: simple!
Bad:


Both operations are privileged; can't let user programs use them
Doesn’t work on multiprocessors
Implementing Locks: version 2



Peterson’s algorithm: software-based lock
implementation
Good: doesn’t require much from hardware
Only assumptions:



Loads and stores are atomic
They execute in order
Does not require special hardware instructions
Software-based lock: 1st attempt
// 0: lock is available, 1: lock is held by a thread
int flag = 0;
lock()
{
while (flag == 1)
; // spin wait
flag = 1;
}


unlock()
{
flag = 0;
}
Idea: use one flag; test, then set; if unavailable, spin-wait
(or busy-wait)

Why doesn't it work?


Not safe: both threads can be in critical section
Not efficient: busy wait, particularly bad on uniprocessor (will
solve this later)
Software-based locks: 2nd attempt
// 1: a thread wants to enter critical section, 0: it doesn’t
int flag[2] = {0, 0};
lock()
{
flag[self] = 1; // I need lock
while (flag[1- self] == 1)
; // spin wait
}


unlock()
{
// not any more
flag[self] = 0;
}
Idea: use per thread flags, set then test, to
achieve mutual exclusion
Why doesn't it work?

Not live: can deadlock
Software-based locks: 3rd attempt
// whose turn is it?
int turn = 0;
lock()
{
// wait for my turn
while (turn == 1 - self)
; // spin wait
}


unlock()
{
// I’m done. your turn
turn = 1 - self;
}
Idea: strict alternation to achieve mutual
exclusion
Why doesn't it work?

Not live: depends on threads outside critical section
Software-based locks: final attempt
(Peterson’s algorithm)
// whose turn is it?
int turn = 0;
// 1: a thread wants to enter critical section, 0: it doesn't
int flag[2] = {0, 0};

lock()
{
    flag[self] = 1;  // I need lock
    turn = 1 - self;
    // wait for my turn: spin while the other
    // thread has intent AND it is the other
    // thread's turn
    while (flag[1-self] == 1 && turn == 1 - self)
        ; // spin wait
}

unlock()
{
    // not any more
    flag[self] = 0;
}

 Why does it work?
 Safe?
 Live?
 Bounded wait?
Notes on Peterson’s algorithm

Great way to start thinking of multi-threaded
programming

Scheduler is “malicious”

The algorithm is useful in other contexts as well

Problem:

Doesn't work with N>2 threads
• The obvious extension (N flags, turn = 0,1,…,N-1) doesn't work
• Leslie Lamport's Bakery algorithm does

More importantly, doesn't really work on modern out-of-order processors

Next: implement locks with hardware support
Implementing locks: version 3
// 0: lock is available, 1: lock is held by a thread
int flag = 0;
lock()
{
    while (test_and_set(&flag))
        ; // spin wait
}

unlock()
{
    flag = 0;
}


Problem with the test-then-set approach: test and set
are not atomic
Special atomic operation



test_and_set address, register
• load *address to register, and set *address to 1

Why does it work?
test_and_set on x86



xchg reg, addr: atomically swaps *addr and reg
spin_lock in Linux can be implemented using this instruction
(include/asm-i386/spin_lock.h)
Another way: append a “lock prefix” before an instruction

Examples in include/asm-i386/atomic.h
long test_and_set(volatile long* lock)
{
    long old;
    asm("xchgl %0, %1"
        : "=r"(old), "+m"(*lock) // output
        : "0"(1)                 // input
        : "memory"               // can clobber anything in memory,
                                 // so gcc won't reorder this
                                 // statement with others
       );
    return old;
}
Spin-wait or block


So far the lock implementations we've seen are busy-wait or spin-wait locks: endlessly checking the lock flag without yielding the CPU
Problem: waste CPU cycles


On uniprocessor: should not use spin-locks
• Worst case: a thread holding a busy-wait lock gets preempted while other threads try to acquire the same lock
• Yield CPU when lock not available (need OS support)

On multiprocessor:
• Thread holding lock gets preempted → ???
• Correct action depends on how long before lock release
  - Lock released "quickly" → spin-wait
  - Lock released "slowly" → block
• "Quick" or "slow" is relative to context-switch overhead
• Good plan: spin a bit, then block
Problem with simple yield
lock()
{
    while (test_and_set(&flag))
        yield();
}

Problem:

Still a lot of context switches
Starvation possible
• Why? No control over who gets the lock next

Need explicit control over who gets the lock
Implementing locks: version 4
typedef struct __mutex_t {
    int flag;    // 0: mutex is available, 1: mutex is not available
    int guard;   // guard lock to internal mutex data structure
    queue_t *q;  // queue of waiting threads
} mutex_t;

void lock(mutex_t *m) {
    while (test_and_set(&m->guard))
        ; // acquire guard lock by spinning
    if (m->flag == 0) {
        m->flag = 1; // acquire mutex
        m->guard = 0;
    } else {
        enqueue(m->q, self);
        m->guard = 0;
        yield();
    }
}

void unlock(mutex_t *m) {
    while (test_and_set(&m->guard))
        ;
    if (queue_empty(m->q))
        // release mutex; no one wants mutex
        m->flag = 0;
    else
        // hold mutex (for next thread!)
        wakeup(dequeue(m->q));
    m->guard = 0;
}
Add thread to queue when lock unavailable
Semaphore



Synchronization tool that does not require busy waiting
Semaphore S – integer variable
Two standard operations modify S: acquire() and
release()

Originally called P() and V()
• from Dutch Proberen and Verhogen (Dijkstra)




Also called down() and up()
And even wait() and signal()
Higher-level abstraction, less complicated
Can only be accessed via two indivisible (atomic)
operations
P(S) {
    while (S ≤ 0)
        ;
    S--;
}

V(S) {
    S++;
}
Semaphore Types

Counting semaphore – integer value can
range over an unrestricted domain


Used for synchronization
Binary semaphore – integer value can range
only between 0 and 1; can be simpler to
implement

Used for mutual exclusion: same as mutex
Process i
P(S);
Critical Section
V(S);
Remainder Section
Semaphore Implementation


Must guarantee that no two processes can
execute P () / acquire () and V () / release () on
the same semaphore at the same time
Thus, the implementation of these operations becomes a critical section problem itself, where the acquire and release code form the critical section.

• Could now have busy waiting in the critical section implementation
• But if we know we can't acquire the semaphore, should we busy wait and burn up the CPU?
• Note that applications may spend lots of time in critical sections, so this is not a good solution
We’d like a semaphore that sleeps (or at least
lets someone else run)
Semaphore Implementation with No Busy Waiting

With each semaphore there is an associated waiting queue. Each entry in a waiting queue has two data items:
• value (of type integer)
• pointer to next record in the list

Two operations:
• block – place the process invoking the operation on the appropriate waiting queue
• wakeup – remove one of the processes in the waiting queue and place it in the ready queue

Potential queuing policies: FIFO, LIFO, undefined
Semaphore Implementation with No Busy Waiting (Cont.)

Implementation of acquire():

Implementation of release():
Producer-Consumer Problem

Bounded buffer: size ‘N’


Producer process writes data to buffer


Access entry 0… N-1, then “wrap around” to 0 again
Must not write more than ‘N’ items more than consumer “ate”
Consumer process reads data from buffer

Should not try to consume if there is no data
[Figure: circular buffer with slots 0 … N-1 and In/Out pointers]
Solving Producer-Consumer Problem

Solving with semaphores


We’ll use two kinds of semaphores
We’ll use counters to track how much data is in the
buffer
• One counter counts as we add data and stops the producer
if there are N objects in the buffer
• A second counter counts as we remove data and stops a
consumer if there are 0 in the buffer


Idea: since general semaphores can count for us, we don’t
need a separate counter variable
Why do we need a second kind of semaphore?

We’ll also need a mutex semaphore
Producer-Consumer Problem
Shared: Semaphores mutex, empty, full;
Init:   mutex = 1;  /* for mutual exclusion */
        empty = N;  /* number of empty buf entries */
        full  = 0;  /* number of full buf entries */
Producer
Consumer
do {
...
// produce an item in nextp
...
P(empty);
P(mutex);
...
// add nextp to buffer
...
V(mutex);
V(full);
} while (true);
do {
P(full);
P(mutex);
...
// remove item to nextc
...
V(mutex);
V(empty);
...
// consume item in nextc
...
} while (true);
Common Errors with Semaphores
Process i:
P(S);
CS
P(S);

A typo. Process i will get stuck (forever) the second time it does the P() operation. Moreover, every other process will freeze up too when trying to enter the critical section!

Process j:
V(S);
CS
V(S);

A typo. Process j won't respect mutual exclusion even if the other processes follow the rules correctly. Worse still, once we've done two "extra" V() operations this way, other processes might get into the CS inappropriately!

Process k:
P(S);
CS

Someone forgot to release the semaphore: whoever calls P() next will freeze up. The bug might be confusing because that other process could be perfectly correct code, yet that's the one you'll see hung when you use the debugger to look at its state!

Process l:
P(S);
if (something)
    return;
CS
V(S);

Someone forgot to release the semaphore before returning! The next caller of P() will get stuck.