Scheduler Activations: Effective Kernel Support for the User-Level Management of Parallelism

THOMAS E. ANDERSON, BRIAN N. BERSHAD, EDWARD D. LAZOWSKA, and HENRY M. LEVY
University of Washington
Threads are the vehicle for concurrency in many approaches to parallel programming. Threads can be supported either by the operating system kernel or by user-level library code in the application address space, but neither approach has been fully satisfactory.

This paper addresses this dilemma. First, we argue that the performance of kernel threads is inherently worse than that of user-level threads, rather than this being an artifact of existing implementations; managing parallelism at the user level is essential to high-performance parallel computing. Next, we argue that the problems encountered in integrating user-level threads with other system services are a consequence of the lack of kernel support for user-level threads provided by contemporary multiprocessor operating systems; kernel threads are the wrong abstraction on which to support user-level management of parallelism. Finally, we describe the design, implementation, and performance of a new kernel interface and user-level thread package that together provide the same functionality as kernel threads without compromising the performance and flexibility advantages of user-level management of parallelism.
Categories and Subject Descriptors: D.4.1 [Operating Systems]: Process Management; D.4.4 [Operating Systems]: Communications Management—input/output; D.4.7 [Operating Systems]: Organization and Design; D.4.8 [Operating Systems]: Performance

General Terms: Design, Measurement, Performance

Additional Key Words and Phrases: Multiprocessor, thread
1. INTRODUCTION

The effectiveness of parallel computing depends to a great extent on the performance of the primitives that are used to express and control the parallelism within programs.
This work was supported in part by the National Science Foundation (grants CCR-8619663, CCR-8703049, and CCR-8907666), the Washington Technology Center, and Digital Equipment Corporation (the Systems Research Center and the External Research Program). Anderson was supported by an IBM Graduate Fellowship and Bershad by an AT&T Ph.D. Scholarship. Anderson is now with the Computer Science Division, University of California at Berkeley; Bershad is now with the School of Computer Science, Carnegie Mellon University. Authors' address: Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195.
Even a coarse-grained parallel program can exhibit poor performance if the cost of creating and managing parallelism is high; even a fine-grained program can achieve good performance if the cost of creating and managing parallelism is low.

One way to construct a parallel program is to share memory between a collection of traditional UNIX-like processes, each consisting of a single address space and a single sequential execution stream within that address space. Unfortunately, because such processes were designed for multiprogramming in a uniprocessor environment, they are simply too inefficient for general-purpose parallel programming; they handle only coarse-grained parallelism well.

The shortcomings of traditional processes for general-purpose parallel programming have led to the use of threads. Threads separate the notion of a sequential execution stream from the other aspects of traditional processes such as address spaces and I/O descriptors. This separation of concerns yields a significant performance advantage relative to traditional processes.
1.1 The Problem

Threads can be supported either at user level or in the kernel. Neither approach has been fully satisfactory.

User-level threads are managed by runtime library routines linked into each application, so that thread management operations require no kernel intervention. The result can be excellent performance: in systems such as PCR [25] and FastThreads [2], the cost of user-level thread operations is within an order of magnitude of the cost of a procedure call. User-level threads are also flexible; they can be customized to the needs of the language or user without kernel modification.
User-level threads execute within the context of traditional processes; indeed, user-level thread systems are typically built without any modifications to the underlying operating system kernel. The thread package views each process as a "virtual processor," and treats it as a physical processor executing under its control; each virtual processor runs user-level code that pulls threads off the ready list and runs them. In reality, though, these virtual processors are being multiplexed across real, physical processors by the underlying kernel. "Real world" operating system activity, such as multiprogramming, I/O, and page faults, distorts the equivalence between virtual and physical processors; in the presence of these factors, user-level threads built on top of traditional processes can exhibit poor performance or even incorrect behavior.
Multiprocessor operating systems such as Mach [21], Topaz [22], and V [7] provide direct kernel support for multiple threads per address space. Programming with kernel threads avoids the system integration problems exhibited by user-level threads, because the kernel directly schedules each application's threads onto physical processors. Unfortunately, kernel threads, just like traditional UNIX processes, are too heavyweight for use in many parallel programs.
The performance of kernel threads, although typically an order of magnitude better than that of traditional processes, has been typically an order of magnitude worse than the best-case performance of user-level threads (e.g., in the absence of multiprogramming and I/O). As a result, user-level threads have ultimately been implemented on top of the kernel threads of both Mach (CThreads [8]) and Topaz (WorkCrews [24]). User-level threads are built on top of kernel threads exactly as they are built on top of traditional processes; they have exactly the same performance, and they suffer exactly the same problems.

The parallel programmer, then, has been faced with a difficult dilemma: employ user-level threads, which have good performance and correct behavior provided the application is uniprogrammed and does no I/O, or employ kernel threads, which have worse performance but are not as restricted.
1.2 The Goals of this Work

In this paper we address this dilemma. We describe a kernel interface and a user-level thread package that together combine the functionality of kernel threads with the performance and flexibility of user-level threads. Specifically,

—In the common case when thread operations do not need kernel intervention, our performance is essentially the same as that achieved by the best existing user-level thread management systems (which suffer from poor system integration).

—In the infrequent case when the kernel must be involved, such as on processor reallocation or I/O, our system can mimic the behavior of a kernel thread management system:

—No processor idles in the presence of ready threads.

—No high-priority thread waits for a processor while a low-priority thread runs.

—When a thread traps to the kernel to block (for example, on a page fault), the processor on which the thread was running can be used to run another thread from the same or from a different address space.

—The user-level part of our system is structured to simplify application-specific customization. It is easy to change the policy for scheduling an application's threads, or even to provide a different model of concurrency, such as workers [16], Actors [1], or Futures [10].

The difficulty in achieving these goals on a multiprogrammed multiprocessor is that the necessary control and scheduling information is distributed between the kernel and each application's address space. To be able to allocate processors among applications, the kernel needs access to user-level scheduling information (e.g., how much parallelism there is in each address space). To be able to manage the application's parallelism, the user-level support software needs to be aware of kernel events (e.g., processor reallocations and I/O request/completions) that are normally hidden from the application.
1.3 The Approach

Our approach provides each application with a virtual multiprocessor, an abstraction of a dedicated physical machine. Each application knows exactly how many (and which) processors have been allocated to it and has complete control over which of its threads are running on those processors. The operating system kernel has complete control over the allocation of processors among address spaces, including the ability to change the number of processors assigned to an application during its execution.
To achieve this, the kernel notifies the address space thread scheduler of every kernel event affecting the address space, allowing the application to have complete knowledge of its scheduling state. The thread system in each address space notifies the kernel of the subset of user-level thread operations that can affect processor allocation decisions, preserving good performance for the majority of operations that do not need to be reflected to the kernel.

The kernel mechanism that we use to realize these ideas is called scheduler activations. A scheduler activation vectors control from the kernel to the address space thread scheduler on a kernel event; the thread scheduler can use the activation to modify user-level thread data structures, to execute user-level threads, and to make requests of the kernel.
We have implemented a prototype of our design on the DEC SRC Firefly multiprocessor workstation [22]. While the differences between scheduler activations and kernel threads are crucial, the similarities are great enough that the kernel portion of our implementation required only relatively straightforward modifications to the kernel threads of Topaz, the native operating system on the Firefly. Similarly, the user-level portion of our implementation involved relatively straightforward modifications to FastThreads, a user-level thread system originally designed to run on top of Topaz kernel threads.

Since our goal is to demonstrate that the exact functionality of kernel threads can be provided at the user level, the presentation in this paper assumes that user-level threads are the concurrency model used by the programmer or compiler. We emphasize, however, that other concurrency models, when implemented at user level on top of kernel threads or processes, suffer from the same problems as user-level threads—problems that are solved by implementing them on top of scheduler activations.
2. USER-LEVEL THREADS: PERFORMANCE ADVANTAGES AND FUNCTIONALITY LIMITATIONS

In this section we motivate our work by describing the advantages that user-level threads offer relative to kernel threads, and the difficulties that arise when user-level threads are built on top of the interface provided by kernel threads or processes. We argue that the performance of user-level threads is inherently better than that of kernel threads, rather than this being an artifact of existing implementations. User-level threads have an additional advantage of flexibility with respect to programming models and environments. Further, we argue that the lack of system integration exhibited by user-level threads is not inherent in user-level threads themselves, but is a consequence of inadequate kernel support.
2.1 The Case for User-Level Thread Management

It is natural to believe that the performance optimizations found in user-level thread systems could be applied within the kernel, yielding kernel threads that are as efficient as user-level threads without compromising functionality. Unfortunately, there are significant inherent costs to managing threads in the kernel:

—The cost of accessing thread management operations: With kernel threads, the program must cross an extra protection boundary on every thread operation, even when the processor is being switched between threads in the same address space. This involves not only an extra kernel trap, but the kernel must also copy and check parameters in order to protect itself against buggy or malicious programs. By contrast, invoking user-level thread operations can be quite inexpensive, particularly when compiler techniques are used to expand code inline and perform sophisticated register allocation. Further, safety is not compromised: address space boundaries isolate misuse of a user-level thread system to the program in which it occurs.
—The cost of generality: With kernel thread management, a single underlying implementation is used by all applications. To be general-purpose, a kernel thread system must provide any feature needed by any reasonable application; this imposes overhead on those applications that do not use a particular feature. In contrast, the facilities provided by a user-level thread system can be closely matched to the specific needs of the applications that use it, since different applications can be linked with different user-level thread libraries. As an example, most kernel thread systems implement preemptive priority scheduling, even though many parallel applications can use a simpler policy such as first-in-first-out [24].

These factors would not be important if thread management operations were inherently expensive. Kernel trap overhead and priority scheduling, for instance, are not major contributors to the high cost of UNIX-like processes. However, the cost of thread operations can be within an order of magnitude of a procedure call. This implies that any overhead added by a kernel implementation, however small, will be significant, and a well-written user-level thread system will have significantly better performance than a well-written kernel-level thread system.
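To make the magnitude of the difference concrete, the following minimal sketch (our illustration, not FastThreads code) shows a user-level "thread switch" on a POSIX system using the ucontext routines: the switch is a register save/restore performed in user space, with no protection boundary to cross and no parameter checking. (The ucontext routines also save the signal mask, which on many systems costs a system call; a tuned package like FastThreads saves only the registers.)

    /* Sketch: a user-level thread switch via POSIX ucontext.
     * Names and structure are ours, for illustration only. */
    #include <stdio.h>
    #include <ucontext.h>

    static ucontext_t main_ctx, thread_ctx;
    static char stack[64 * 1024];

    static void thread_body(void) {
        printf("running in a user-level thread\n");
        /* "Yield" back: a register save/restore in user space. */
        swapcontext(&thread_ctx, &main_ctx);
    }

    int main(void) {
        getcontext(&thread_ctx);
        thread_ctx.uc_stack.ss_sp = stack;
        thread_ctx.uc_stack.ss_size = sizeof stack;
        thread_ctx.uc_link = &main_ctx;
        makecontext(&thread_ctx, thread_body, 0);
        swapcontext(&main_ctx, &thread_ctx);   /* switch to the thread */
        printf("back in main\n");
        return 0;
    }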
To illustrate this quantitatively, Table I shows the performance of example implementations of user-level threads, kernel threads, and UNIX-like processes, all running on similar hardware, a CVAX processor. FastThreads and Topaz kernel threads were measured on a CVAX Firefly; Ultrix (DEC's derivative of UNIX) was measured on a CVAX uniprocessor workstation. (Each of these implementations, while good, is not "optimal"; thus, our measurements are illustrative and not definitive.) The two benchmarks are Null Fork, the time to create, schedule, execute, and complete a process/thread that invokes the null procedure (in other words, the overhead of forking a process/thread), and Signal-Wait, the time for a process/thread to signal a waiting process/thread and then to wait on a condition (in other words, the overhead of synchronizing two processes/threads together). Each benchmark was executed on a single processor, and the results were averaged across multiple repetitions. For comparison, a procedure call on the Firefly takes about 7 μsec., while a kernel trap takes about 19 μsec.

Table I. Thread Operation Latencies (μsec.)

    Operation      FastThreads    Topaz threads    Ultrix processes
    Null Fork           34             948              11300
    Signal-Wait         37             441               1840

Table I shows that while there is an order of magnitude difference in cost between Ultrix process management and Topaz kernel thread management, there is yet another order of magnitude difference between Topaz threads and FastThreads. This is despite the fact that the Topaz thread code is highly tuned, with much of the critical path written in assembler.

Commonly, there is a tradeoff between performance and flexibility in choosing where to implement system services [26]. User-level threads avoid this tradeoff, however: they improve both performance and flexibility. Flexibility is particularly important in thread systems, since there are many parallel programming models, each of which may require specialized support within the thread system. With kernel threads, supporting multiple parallel programming models may require modifying the kernel, which increases complexity, overhead, and the likelihood of errors in the kernel.

2.2 Sources of Poor Integration in User-Level Threads Built on the Traditional Kernel Interface

Unfortunately, it has proven difficult to implement user-level threads that have the same level of integration with system services as is available with kernel threads. This is not inherent in managing parallelism at the user level, but rather is a consequence of the lack of kernel support in existing systems. Kernel threads are the wrong abstraction for supporting user-level thread management. There are two related characteristics of kernel threads that cause difficulty:
—Kernel threads block, resume, and are preempted without notification to the user level.

—Kernel threads are scheduled obliviously with respect to the user-level thread state.

These can cause problems even on a uniprogrammed system. A user-level thread system will often create as many kernel threads to serve as "virtual processors" as there are physical processors in the system; each will be used to run user-level threads. When a user-level thread makes a blocking I/O
request or takes a page fault, though, the kernel thread serving as its virtual processor also blocks. As a result, the physical processor is lost to the address space while the I/O is pending, because there is no kernel thread to run other user-level threads on the just-idled processor.

A plausible solution to this might be to create more kernel threads than physical processors; when one kernel thread blocks because its user-level thread blocks in the kernel, another kernel thread is available to run user-level threads on that processor. However, a difficulty occurs when the I/O completes or the page fault returns: there will be more runnable kernel threads than processors, each kernel thread in the middle of running a user-level thread.
In deciding which kernel threads are to be assigned processors, the operating system will implicitly choose which user-level threads are assigned processors.

In a traditional system, when there are more runnable threads than processors, the operating system could employ some kind of time-slicing to ensure each thread makes progress. When user-level threads are running on top of kernel threads, however, time-slicing can lead to problems. For example, a kernel thread could be preempted while its user-level thread is holding a spin-lock; any user-level threads accessing the lock will then spin-wait until the lock holder is rescheduled. Zahorjan et al. [28] have shown that time-slicing in the presence of spin-locks can result in poor performance. As another example, a kernel thread running a user-level thread could be preempted to allow another kernel thread to run that happens to be idling in its user-level scheduler. Or a kernel thread running a high-priority user-level thread could be rescheduled in favor of a kernel thread that happens to be running a low-priority user-level thread.

Exactly the same problems occur with multiprogramming as with I/O and page faults. If there is only one job in the system, it can receive all of the machine's processors; if another job enters the system, the operating system should preempt some of the first job's processors to give to the new job [23]. The kernel is then forced to choose which of the first job's kernel threads, and thus implicitly which user-level threads, to run on the remaining processors. The need to preempt processors from an address space also occurs due to variations in parallelism within jobs; Zahorjan and McCann [27] show that the dynamic reallocation of processors among address spaces in response to variations in parallelism is important to achieving high performance.
While a kernel interface can be designed to allow the user level to influence which kernel threads are scheduled when the kernel has a choice [5], this choice is intimately tied to the user-level thread state; the communication of this information between the kernel and the user level negates many of the performance and flexibility advantages of using user-level threads in the first place.

Finally, ensuring the logical correctness of a user-level thread system built on kernel threads can be difficult. Many applications, particularly those that require coordination among multiple address spaces, are free from deadlock based on the assumption that all runnable threads eventually receive processor time. When kernel threads are used directly by applications, the kernel satisfies this assumption by time-slicing the processors among all of the runnable threads. But when user-level threads are multiplexed across a fixed number of kernel threads, the assumption may no longer hold: because a kernel thread blocks when its user-level thread blocks, an application can run out of kernel threads to serve as execution contexts, even when there are runnable user-level threads and available processors.
3. EFFECTIVE KERNEL SUPPORT FOR THE USER-LEVEL MANAGEMENT OF PARALLELISM

Section 2 described the problems that arise when user-level threads are used by the programmer to express parallelism (poor performance and poor flexibility) and when user-level threads are built on top of kernel threads (poor behavior in the presence of multiprogramming and I/O). To address these problems, we have designed a new kernel interface and user-level thread system that together combine the functionality of kernel threads with the performance and flexibility of user-level threads.

The operating system kernel provides each user-level thread system with its own virtual multiprocessor, the abstraction of a dedicated physical machine, except that the kernel may change the number of processors in that machine during the execution of the program. There are several aspects to this abstraction:
—The kernel allocates processors to address spaces; the kernel has complete control over how many processors to give each address space's virtual multiprocessor.

—Each address space's user-level thread system has complete control over which threads to run on its allocated processors, as it would if the application were running on the bare physical machine.

—The kernel notifies the user-level thread system whenever the kernel changes the number of processors assigned to it; the kernel also notifies the thread system whenever a user-level thread blocks or wakes up in the kernel (e.g., on I/O or on a page fault). The kernel's role is to vector these events to the appropriate thread scheduler, rather than to interpret them on its own.
—The user-level thread system notifies the kernel when the application needs more or fewer processors. The kernel uses this information to allocate processors among address spaces. However, the user level notifies the kernel only on the subset of user-level thread operations that might affect processor allocation decisions. As a result, performance is not compromised; the majority of thread operations do not suffer the overhead of communication with the kernel.

—The application programmer sees no difference, except for performance, from programming directly with kernel threads. Our user-level thread system manages its virtual multiprocessor transparently to the programmer, providing programmers with a normal Topaz thread interface [4]. (The user-level runtime system could easily be adapted, though, to provide a different parallel programming model.)
In the remainder of this section we describe how kernel events are vectored to the user-level thread system, what information is provided by the application to allow the kernel to allocate processors among jobs, and how we handle user-level spin-locks.
3.1 Explicit Vectoring of Kernel Events to the User-Level Thread Scheduler

The communication between the kernel processor allocator and the user-level thread system is structured in terms of scheduler activations. The term "scheduler activation" was selected because each vectored event causes the user-level thread system to reconsider its scheduling decision of which threads to run on which processors.

A scheduler activation serves three roles:

—It serves as a vessel, or execution context, for running user-level threads, in exactly the same way that a kernel thread does.

—It notifies the user-level thread system of a kernel event.

—It provides space in the kernel for saving the processor context of the activation's current user-level thread, when the thread is stopped by the kernel (e.g., because the thread blocks in the kernel on I/O or the kernel preempts its processor).
A scheduler activation's data structures are quite similar to those of a traditional kernel thread. Each scheduler activation has two execution stacks—one mapped into the kernel and one mapped into the application address space. Each user-level thread is allocated its own user-level stack [2]; when a user-level thread calls into the kernel, it uses its activation's kernel stack. The user-level thread scheduler runs on the activation's user-level stack. In addition, the kernel maintains an activation control block (akin to a thread control block) to record the state of the scheduler activation's thread when it blocks in the kernel or is preempted; the user-level thread scheduler maintains a record of which user-level thread is running in each scheduler activation.
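The state just described might be organized as in the following sketch; the paper defines this structure only in prose, so the field names here are ours:

    /* Per-activation state, as described above; names are illustrative. */
    struct regs { unsigned long gpr[32]; unsigned long pc, sp; };
    struct thread;                     /* user-level thread, defined elsewhere */

    struct activation_kernel_state {   /* kept by the kernel                   */
        int          id;
        void        *kernel_stack;     /* used when the thread traps in        */
        struct regs  saved_state;      /* filled in when the kernel stops the
                                          activation's current user-level thread */
    };

    struct activation_user_state {     /* kept by the user-level scheduler     */
        int            id;
        void          *user_stack;     /* upcalls and the scheduler run here   */
        struct thread *running;        /* which user-level thread, if any      */
    };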
When a program is started, the kernel creates a scheduler activation, assigns it to a processor, and upcalls into the application address space at a fixed entry point. The user-level thread management system receives the upcall and uses that activation as the context in which to initialize itself and run the main application thread. As the first thread executes, it may create more user threads and request additional processors. In this case, the kernel will create an additional scheduler activation for each processor and use it to upcall into the user level to tell it that the new processor is available. The user level then selects and executes a thread in the context of that activation.

Similarly, when the kernel needs to notify the user level of an event, the kernel creates a scheduler activation, assigns it to a processor, and upcalls into the application address space. Once the upcall is started, the activation is similar to a traditional kernel thread—it can be used to process the event, run user-level threads, and trap into and block within the kernel.
Table II. Scheduler Activation Upcall Points

Add this processor (processor #)
    Execute a runnable user-level thread.

Processor has been preempted (preempted activation # and its machine state)
    Return to the ready list the user-level thread that was executing in the context of the preempted activation.

Scheduler activation has blocked (blocked activation #)
    The blocked scheduler activation is no longer using its processor.

Scheduler activation has unblocked (unblocked activation # and its machine state)
    Return to the ready list the user-level thread that was executing in the context of the blocked activation.
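The paper specifies these upcall points but not a concrete programming interface. The sketch below shows one plausible shape for the fixed entry point; the names are ours rather than Topaz's, the helper routines are assumed to exist in the user-level thread system, and for simplicity a single event is dispatched, whereas (as noted below) a real upcall may carry several.

    /* Hypothetical upcall entry point for the events of Table II. */
    typedef int activation_id;
    struct machine_state;            /* register state saved by the kernel */
    struct thread;

    enum upcall_event {
        ADD_THIS_PROCESSOR,          /* (processor #)                      */
        PROCESSOR_PREEMPTED,         /* (activation #, machine state)      */
        ACTIVATION_BLOCKED,          /* (activation #)                     */
        ACTIVATION_UNBLOCKED         /* (activation #, machine state)      */
    };

    extern struct thread *thread_in(activation_id a, struct machine_state *ms);
    extern void ready_list_put(struct thread *t);
    extern struct thread *ready_list_get(void);
    extern void run_thread(struct thread *t);        /* does not return    */
    extern void recycle_activation(activation_id a); /* tell kernel: reusable */
    extern void note_blocked(activation_id a);

    void upcall_entry(enum upcall_event ev, activation_id act,
                      struct machine_state *ms)
    {
        switch (ev) {
        case ADD_THIS_PROCESSOR:
            break;                       /* nothing to recover; just run   */
        case PROCESSOR_PREEMPTED:
        case ACTIVATION_UNBLOCKED:
            ready_list_put(thread_in(act, ms)); /* recover the stopped thread */
            recycle_activation(act);
            break;
        case ACTIVATION_BLOCKED:
            note_blocked(act);           /* its processor now runs this upcall */
            break;
        }
        run_thread(ready_list_get());    /* reconsider the scheduling decision */
    }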
The crucial distinction between scheduler activations and kernel threads is that once an activation's user-level thread is stopped by the kernel, the thread is never directly resumed by the kernel. Instead, a new scheduler activation is created to notify the user-level thread system that the thread has been stopped. The user-level thread system then removes the state of the thread from the old activation, tells the kernel that the old activation can be reused, and finally decides which thread to run on the processor. By contrast, in a traditional system, when the kernel stops a kernel thread, even one running a user-level thread in its context, the kernel never notifies the user level of the event. Later, the kernel directly resumes the kernel thread (and by implication, its user-level thread), again without notification. By using scheduler activations, the kernel is able to maintain the invariant that there are always exactly as many running scheduler activations (vessels for running user-level threads) as there are processors assigned to the address space.

Table II lists the events that the kernel vectors to the user level using scheduler activations; the parameters to each upcall are in parentheses, and the action taken by the user-level thread system is shown indented beneath each event. Note that events are vectored at exactly the points where the kernel would otherwise be forced to make a scheduling decision. In practice, these events occur in combinations; when this occurs, a single upcall is made that passes all of the events that need to be handled.
As one example of the use of scheduler activations, Figure 1 illustrates what happens on an I/O request/completion. Note that this is the uncommon case; in normal operation, threads can be created, run, and completed, all without kernel intervention. Each pane in Figure 1 reflects a different time step. Straight arrows represent scheduler activations, s-shaped arrows represent user-level threads, and the cluster of user-level threads to the right of each pane represents the ready list.

[Fig. 1. Example: I/O request/completion. Four panes (times T1-T4) each show the user program, the user-level runtime system, and the operating system kernel; at T2, A's thread has blocked; by T4, A's thread and B's thread can continue.]

At time T1, the kernel allocates the application two processors. On each processor, the kernel upcalls to user-level code that removes a thread from the ready list and starts running it. At time T2, one of the user-level threads (thread 1) blocks in the kernel. To notify the user level of this event, the kernel takes the processor that had been running thread 1 and performs an upcall in the context of a fresh scheduler activation. The user level can then use the processor to take another thread off the ready list and start running it.

At time T3, the I/O completes. Again, the kernel must notify the user-level thread system of the event, but this notification requires a processor. The kernel preempts one of the processors running in the address space and uses it to do the upcall. (If there are no processors assigned to the address space when the I/O completes, the upcall must wait until the kernel allocates one.) This upcall notifies the user level of two things: the I/O completion and the preemption. The upcall invokes code in the user-level thread system that (1) puts the thread that had been blocked on the ready list and (2) puts the thread that was preempted on the ready list. At this point, scheduler activations A and B can be discarded. Finally, at time T4, the upcall takes a thread off the ready list and starts running it.
When a user-level thread blocks in the kernel or is preempted, most of the state needed to resume it is already at the user level—namely, the thread's stack and control block. The thread's register state, however, is saved by low-level kernel routines, such as the interrupt and page fault handlers; the kernel passes this state to the user level as part of the upcall notifying the address space of the preemption and/or I/O completion.
We use exactly the same mechanism to reallocate a processor from one address space to another due to multiprogramming. For example, suppose the kernel decides to take a processor away from one address space and give it to another. The kernel does this by sending the processor an interrupt, stopping the old activation, and then using the processor to do an upcall into the new address space with a fresh activation. The kernel need not obtain permission in advance from the old address space to steal its processor; to do so would violate the semantics of address space priorities, since the new address space could have higher priority than the old one. However, the old address space must still be notified that the preemption occurred. The kernel does this by making a second preemption on a different processor still running in the old address space (assuming it still has one); that processor is used to do an upcall into the old address space with a fresh scheduler activation, notifying the user level that two user-level threads have been stopped. The user-level thread system then has full control over which threads to run on its remaining processors. (When the preempted processor was the last one assigned to the address space, the notification is delayed until the kernel eventually allocates the address space a new processor.) Notification of preemptions allows the user level to put the stopped threads back on the ready list, and it allows the user level to know explicitly which processors it is still running on, which can be useful in managing cache locality.
The above description is over-simplified in several minor respects. First, if user-level threads have priorities, an upcall may need to preempt a processor beyond the ones directly involved in the kernel event. In terms of the example in Figure 1, suppose thread 3 is lower priority than both thread 1 and thread 2. When the upcall notifies the user level that thread 1's I/O has completed, the user-level thread system can ask the kernel to preempt thread 3's processor; the subsequent upcall allows the user level to put thread 3 on the ready list and run thread 1 in its place.

Second, a page fault may occur when no user-level thread is running in an activation's context, for example, when the thread scheduler itself faults while running on the activation's own stack, such as in its idle loop. The kernel's page fault handler treats this in exactly the same way as a fault by a user-level thread: the activation's state is saved and an upcall is made on another processor using a new activation. The only additional work at the user level is that the upcall must check whether a user-level thread was running in the faulting activation and, if not, take no recovery action; the faulting activation is continued when the processing of the page fault completes. An activation that is stopped while it is itself in the middle of handling an earlier event is treated in the same way, so subsequent (reentrant) upcalls can be built on the same data structures.

Third, the kernel has no knowledge of the data structures used to represent parallelism at the user level; its behavior is the same whether an activation is running a user-level thread or the thread scheduler itself. This is part of what allows an application to build a concurrency model other than threads on top of scheduler activations.
Finally, a user-level thread that has blocked in the kernel may still need to execute further in kernel mode when the I/O completes. If so, the kernel resumes the thread temporarily, until it either blocks again or reaches the point where it would leave the kernel; the upcall that notifies the user level of the unblocking is made only at that point.

3.2 Notifying the Kernel of User-Level Events Affecting Processor Allocation

The mechanism described in the last subsection is independent of the policy used by the kernel for allocating processors among address spaces. Reasonable allocation policies, however, must be based on the available parallelism in each address space. In this subsection we show that this information can be efficiently communicated for allocation policies that both respect priorities and guarantee that no processor idles in the presence of runnable threads. These constraints are met by most kernel thread systems; as far as we know, they are not met by any user-level thread system built on top of kernel threads.

The key observation is that the user-level thread system need not tell the kernel about every thread operation, but only about the small subset that can affect the kernel's processor allocation decision: an address space needs more processors only when it has more runnable threads than processors, and it has a processor to give up only when it has more processors than runnable threads. Table III lists the notifications an address space makes to the kernel when it makes a transition between these states; as in Table II, parameters are in parentheses and the action taken is shown indented beneath each event.
Table III. Communication from the Address Space to the Kernel

Add more processors (additional # of processors needed)
    Allocate more processors to this address space and start them running scheduler activations.

This processor is idle ()
    Preempt this processor if another address space needs it.

These notifications are only hints. If, for example, the kernel gives an address space a processor that is no longer needed by the time it arrives, the thread system simply returns the processor to the kernel with the notification that the processor is idle. Similarly, an address space that has told the kernel that a processor is idle must be prepared to have that processor reassigned at any time; nothing is lost if the kernel instead leaves the processor where it is because no other address space needs it. The user-level thread system must serialize its notifications, since they are meaningful to the kernel only if they arrive in the order in which the state transitions occurred.
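The state transitions of Table III reduce to two comparisons in the thread system's bookkeeping. A minimal sketch follows, with sys_add_processors and sys_processor_idle as our stand-ins for the two notifications (the paper does not name the calls):

    /* Notify the kernel only on transitions between "more runnable threads
     * than processors" and "more processors than runnable threads".  The
     * counters are assumed to be updated under a lock, so notifications
     * reach the kernel in order. */
    extern void sys_add_processors(int additional);  /* Table III, row 1 */
    extern void sys_processor_idle(void);            /* Table III, row 2 */

    static int nrunnable;   /* runnable user-level threads        */
    static int nprocs;      /* processors currently assigned      */

    void on_thread_runnable(void)      /* e.g., after a fork or wakeup */
    {
        if (++nrunnable > nprocs)
            sys_add_processors(nrunnable - nprocs);
    }

    void on_processor_idle(void)       /* all ready lists are empty */
    {
        if (nprocs > nrunnable)
            sys_processor_idle();      /* kernel may reassign this processor */
    }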
One drawback of this approach is that it depends on applications being honest in reporting their parallelism to the operating system: an address space can acquire more than its fair share of processors simply by claiming that it has runnable threads when it does not. The problem is not unique to multiprocessors; on most operating systems, a dishonest or misbehaving program can consume an unfair share of resources. To encourage honest reporting, the processor allocator can favor address spaces that use fewer processors and penalize those that use more. This policy is similar in goal to multi-level feedback on uniprocessor operating systems, where average response time is improved by giving the best service to interactive jobs, the ones that use the least processor time; the policy costs little, since the processors given up by favored address spaces can be used to run other work.
3.3 Critical Sections

One issue we have not yet addressed is user-level spin-locks. When a user-level thread is blocked or preempted at an inopportune instant—while it is executing in a critical section, holding a lock—there can be two possible ill effects. The first is poor performance: other threads accessing the protected data (e.g., the ready list) will spin-wait until the preempted lock holder runs again [28]. The second is deadlock: for example, if the preempted thread holds the ready list lock, deadlock would occur if the upcall that reports the preemption itself attempted to place a thread onto the ready list.1

1 The need for critical sections would be avoided if we were to use wait-free synchronization [11]. Many commercial architectures, however, do not provide the required hardware support (we assume only an atomic test-and-set instruction); in addition, the overhead of wait-free synchronization can be prohibitive for protecting anything but very small data structures.

There are two possible approaches to dealing with this problem: prevention and recovery. With prevention, inopportune blocking and preemptions are avoided through the use of a scheduling and locking protocol between the kernel and the user level. Prevention has a number of serious drawbacks,
particularly in a multiprogrammed environment. Prevention requires the kernel to yield control over processor allocation (at least temporarily) to the user level, violating the semantics of address space priorities. Prevention is inconsistent with the efficient implementation of critical sections that we will describe in Section 4.3. Finally, in the presence of page faults, prevention requires "pinning" to physical memory all virtual pages that might be touched while in a critical section; identifying these pages can be cumbersome.
Instead, we adopt a solution based on recovery. When an upcall informs the user-level thread system that a thread has been preempted or unblocked, the thread system checks if the thread was executing in a critical section. (Of course, this check must be made before acquiring any locks.) If so, the thread is continued temporarily via a user-level context switch. When the continued thread exits the critical section, it relinquishes control back to the original upcall, again via a user-level context switch. At this point, it is safe to place the preempted thread back on the ready list. We use the same mechanism to continue an activation if it was preempted in the middle of processing a kernel event.
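In outline, the recovery path looks like the following sketch (the names are ours; Section 4.3 describes how the in-critical-section test and the temporary continuation are actually implemented):

    /* Upcall-time recovery for a preempted thread.  This must run before
     * the upcall acquires any user-level locks. */
    struct thread;
    extern int  in_critical_section(const struct thread *t); /* e.g., a PC test */
    extern void continue_to_section_exit(struct thread *t);  /* user-level
                                                                 context switch;
                                                                 returns when the
                                                                 thread yields   */
    extern void ready_list_put(struct thread *t);

    void recover_preempted(struct thread *t)
    {
        if (in_critical_section(t))
            continue_to_section_exit(t);  /* let it release its locks */
        ready_list_put(t);                /* now safe: no locks held  */
    }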
This technique is free from deadlock: by continuing the lock holder, we ensure that once a lock is acquired, it is always eventually released, even in the presence of processor preemptions or page faults. Although a thread spinning for a lock whose holder has been preempted wastes processor time until the holder is continued, correctness is not affected, and the technique supports arbitrary user-level spin-locks. When preemptions can be long, relinquishing the processor after spinning for a while may be a useful addition [4].
4. IMPLEMENTATION

We have implemented the design described in Section 3 by modifying Topaz, the native operating system for the DEC SRC Firefly multiprocessor workstation, and FastThreads, a user-level thread package.

We modified the Topaz kernel thread management routines to implement scheduler activations. Where Topaz formerly blocked, resumed, or preempted a thread, it now performs upcalls to allow the user level to take these actions (see Table II). In addition, we modified Topaz to do explicit allocation of processors to address spaces; formerly, Topaz scheduled threads obliviously to the address spaces to which they belonged. Object code compatibility is maintained; existing Topaz (and therefore UNIX) applications still run as before.

FastThreads was modified to process upcalls, to resume threads interrupted in critical sections, and to provide Topaz with the information needed for its processor allocation decisions (see Table III).

In all, we added a few hundred lines of code to FastThreads and about 1200 lines to Topaz. (For comparison, the original Topaz implementation of kernel threads was over 4000 lines of code.) The majority of the new Topaz code was concerned with implementing the processor allocation policy (discussed below), and not with scheduler activations per se.
Our design is "neutral" on the choice of policies for allocating processors to address spaces and for scheduling threads onto processors. Nevertheless, we had to select a pair of policies for our prototype implementation; we briefly describe these, and some of the considerations involved in choosing them, in the next two subsections.

4.1 Processor Allocation Policy

The processor allocation policy we selected is similar to the dynamic policy of Zahorjan and McCann [27]. The policy "space-shares" processors while respecting priorities and guaranteeing that no processor idles if there is work to do. Processors are divided evenly among the highest priority address spaces; if some address spaces do not need all of the processors in their share, the remainder are divided evenly among the rest. Processors are time-sliced only if the number of available processors is not an integer multiple of the number of address spaces (at the same priority) that want them. This makes it possible, for example, to run a (possibly sequential) high-priority application such as a debugger on some processors while the remaining processors run the application being debugged.

The kernel makes these allocation decisions using only the information in Table III: whether each address space could use more processors or has processors that are idle. The kernel needs no knowledge of an application's internal structure; address spaces that use scheduler activations compete for processors with address spaces that still use traditional Topaz kernel threads, and processors are handed to each as appropriate. As a result, there is no need for applications to use the same user-level thread system, or even to manage their parallelism with threads at all.

4.2 Thread Scheduling Policy

An important aspect of our design is that the kernel has no knowledge of an application's concurrency model or scheduling policy, or of the data structures used to manage parallelism at the user level. Each application is completely free to choose these as appropriate; they can be tuned to fit the application's needs. The default policy in FastThreads uses per-processor ready lists accessed by each processor in last-in-first-out order to improve cache locality; a processor scans the other ready lists for work if its own list is empty. This is essentially the scheduling policy used by Multilisp [10].

In addition, our implementation uses hysteresis to avoid unnecessary processor reallocations: an idle processor spins for a short period before notifying the kernel that it is available for reallocation.
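For a single priority level, the even-division rule of Section 4.1 can be sketched as follows (our code, not the Topaz allocator; priorities and the time-slicing of any remainder are omitted):

    /* Space-share P processors among n address spaces.  want[i] is the
     * parallelism reported by space i (Table III); give[i] is the output.
     * Spaces needing less than an even share get only what they need;
     * leftovers are redistributed.  Any final remainder (left > 0) would
     * be time-sliced or left idle, per policy. */
    void allocate(int P, int n, const int want[], int give[])
    {
        for (int i = 0; i < n; i++) give[i] = 0;
        int left = P;
        for (;;) {
            int unsatisfied = 0;
            for (int i = 0; i < n; i++)
                if (give[i] < want[i]) unsatisfied++;
            if (unsatisfied == 0 || left < unsatisfied)
                break;                          /* done, or remainder only */
            int share = left / unsatisfied;     /* even split of what's left */
            for (int i = 0; i < n; i++) {
                if (give[i] >= want[i]) continue;
                int add = want[i] - give[i];
                if (add > share) add = share;   /* cap at the even share */
                give[i] += add;
                left -= add;
            }
        }
    }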
4.3 Performance Enhancements

While the design as just described is sufficient to provide user-level functionality equivalent to that of kernel threads, there are some performance considerations that are important in practice. The most significant of these relates to the way we handle critical sections, described in Section 3.3. When a thread is preempted or unblocked, the user-level thread system must check whether the thread was executing in a critical section, so that it can be continued temporarily via a user-level context switch if it was. One way to make this check possible would be to set a flag on each lock acquisition and clear the flag on each release; the upcall would then test the flag. This makes the check at preemption time trivial, but it imposes overhead on every critical section in the common case, when no preemption occurs, even though preemptions are infrequent.

Instead, we adopt a solution based on a technique used in the Trellis/Owl garbage collector [17]. We make an exact copy of every low-level critical section. We do this by delimiting each critical section with special assembler labels and post-processing the compiler-generated assembly code; given compiler support, making the copies would be straightforward. The copy is placed at a known offset from the original, and at the end of the copy, instead of falling through, control relinquishes the processor back to the upcall. Normal execution uses the original code. At upcall time, the thread system checks the preempted thread's program counter to see if it was executing in one of these critical sections; if so, the thread is continued at the corresponding place in the copy. When the continued thread reaches the end of the copy, and thus the end of the critical section, it yields back to the upcall, which can then place it on the ready list. Because normal execution uses the original code, and this code is exactly the same as it would be if we were not concerned with preemptions, no overhead is imposed on the common case.
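The program counter test and the continuation can then be sketched as follows; the label symbols are illustrative stand-ins for the ones the post-processor would emit for each critical section:

    /* Map a preempted PC in the original critical section to the same
     * point in the relocated copy; the copy ends by yielding to the
     * upcall instead of falling through. */
    extern char cs_begin[], cs_end[];   /* labels around the original  */
    extern char cs_copy[];              /* start of the relocated copy */

    void *continuation_pc(void *pc)
    {
        char *p = pc;
        if (p >= cs_begin && p < cs_end)
            return cs_copy + (p - cs_begin);
        return 0;   /* not in this critical section */
    }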
A second significant enhancement relates to the management of scheduler activations. Logically, a new scheduler activation is created for each upcall. Creating an activation is not free, however, because its data structures must be allocated and initialized. Instead, discarded scheduler activations are cached for eventual reuse. The user level notifies the kernel that an activation can be reused either as part of a subsequent kernel call or by setting a flag that the kernel checks, so the notifications can be made in bulk; an occasional bulk deposit of discarded activations replaces a kernel boundary crossing per discard [13]. This parallels the way kernel threads themselves are managed in a traditional system; ignoring the occasional bulk deposits, our system makes the same number of application-kernel boundary crossings on an I/O as a traditional kernel thread system: one when the thread blocks, and one when the I/O completes.

4.4 Debugging Considerations

We have integrated scheduler activations with the Firefly Topaz debugger. There are two separate debugging needs: debugging the user-level thread system itself, and debugging application code running on top of the thread system.

Transparency is crucial to debugging: the debugger should have as little effect as possible on the code being debugged. When the thread system is being debugged, the kernel events caused by the debugger stopping or single-stepping an activation would be inappropriate to vector as upcalls into the very code being debugged; instead, the kernel informs the debugger of the state of each scheduler activation. Assuming the user-level thread system is working correctly, the debugger can then use the logical state maintained by the thread system to stop and examine the context of each user-level thread [18].
PERFORMANCE
The
goal
with
the
the
sequence
a scheduler
application
5.
to
on the
is being
Assuming
can
to
and
a time.
processor
thread
each
and
is crucial
described
being
user-level
or
our
activations
system
physical
activation
in
at
makes
1/0
is needed
environments,
thread
we have
of
collected
be
one
a kernel
In
occur
scheduler
as possible
each
can
implementa-
destroyed
system.
Transparency
support
thread
when
system
on
another
crossings
separate
thread
effect
kernel
cached
Considerations
user-level
of the
and
boundary
our
crossings
system.
1/0,
integrated
are
many
be
returned
of discards,
thread
an
Debugging
We
of being
boundary
to start
in
can
activations
instead
application-kernel
needed
created,
[131.
discarded
kernel
is used
once
creations
Further,
the
optimization
threads,
of our
user
research
and
level
each
flexibility
within
issues
performance,
thread
we
have
cost
fically,
flexibility
fork,
of upcalls)?
what
is
space.
the
The
previous
in
kernel
overall
effect
In
cost
system?
and
the
on
at
functionality
is the
our
threads
parallelism
sections.
what
yield)
the
of kernel
of managing
First,
and
between
Third,
in
questions.
block
of communication
functionality
address
addressed
three
(e, g.,
the
advantages
application
been
consider
operations
is the
is
performance
user
the
and
terms
of
of user-level
Second,
what
level
(speci-
performance
of
applications?
5.1
Thread
The
cost
of user-level
as those
—that
for
in
thread
FastThreads
running
on
integration.
original
tained
ACM
of the
is,
system
Performance
FastThreads,
Table
Transactions
I.
on
top
Table
Our
Computer
operations
package
of Topaz
IV
adds
Topaz
system
Systems,
in
kernel
the
kernel
10,
system
on the
threads,
performance
threads,
preserves
Vol.
our
running
the
No
order
1, February
is essentially
Firefly
with
the
of our
and
associated
system
Ultrix
of magnitude
1992
the
to our
prior
to the
processes
same
work
poor
data
con-
advantage
Scheduler Activations: Effective Kernel Support
Table
Signal- Wait
that
user-level
tion
in Null
and
the
kernel
on
threads
Fork
decrementing
must
inform
number
be notified.
(This
available).
There
the
(in
plus
which
an
when
a
48
lock
psec.
is
(The
than
5.2 Upcall Performance

Thread performance (Section 5.1) characterizes the frequent case in which kernel involvement is not needed; upcall performance characterizes the infrequent case in which it is, when a user-level thread blocks in the kernel or is preempted (see Section 4.3). Upcall performance is important for several reasons. First, it helps determine the "break-even" point, the ratio of thread operations that can be done at user level to those that require kernel intervention, at which user-level threads begin to outperform kernel threads. Second, if the cost of blocking or preempting a user-level thread using scheduler activations were significantly worse than the cost of blocking or preempting a kernel thread, scheduler activations could not be practical, even though such events are infrequent.

When we began the implementation, we expected our upcall performance to be commensurate with the overhead of Topaz kernel thread operations; instead, it is considerably worse. One way to measure upcall performance is the time for two user-level threads to signal and wait through the kernel; this is analogous to the Signal-Wait test in Table IV, except that the synchronization is forced to go through the kernel, which approximates what happens when a user-level thread blocks on an I/O request or takes a page fault. This signal-wait time in our system is 2.4 milliseconds, a factor of five worse than Topaz kernel threads.
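As a rough consistency check of that factor, using the Topaz Signal-Wait latency from Table IV (441 usec.):

    \[
      \frac{2400\ \mu\mathrm{s}}{441\ \mu\mathrm{s}} \approx 5.4
    \]

which is in line with the stated factor of five.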
We see two reasons for this difference. First, because we built scheduler activations as a quick modification to the existing Topaz kernel thread implementation, our code maintains more state, and thus has more overhead, than if we had designed that portion of the kernel from scratch. Second, and more importantly, the Topaz kernel thread management path is carefully tuned and written partly in assembler; our upcall processing is written entirely in Modula-2+. For comparison, Schroeder and Burrows [19] reduced the cost of SRC RPC processing on the Firefly by over a factor of four by recoding it in assembler. Thus, we expect that, if tuned, our upcall performance would be commensurate with that of Topaz kernel threads. As a result, the application performance measurements in the next section are somewhat worse than what might be achieved in a production scheduler activation implementation.
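To make the signal-wait test concrete, here is a sketch of such a ping-pong benchmark. It uses POSIX threads purely as a stand-in for the Topaz and FastThreads primitives that were actually measured, so its absolute numbers are not comparable to Table IV; only the structure of the test is the point:

    /* Two threads alternately signal and wait; the synchronization goes
     * through kernel-supported condition variables, as in the test above. */
    #include <pthread.h>
    #include <stdio.h>
    #include <time.h>

    #define ITERS 100000

    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  c = PTHREAD_COND_INITIALIZER;
    static int turn = 0;                   /* whose turn it is to run */

    static void *partner(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&m);
        for (int i = 0; i < ITERS; i++) {
            while (turn != 1) pthread_cond_wait(&c, &m);  /* wait */
            turn = 0;
            pthread_cond_signal(&c);                      /* signal back */
        }
        pthread_mutex_unlock(&m);
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        struct timespec t0, t1;
        pthread_create(&t, NULL, partner, NULL);
        clock_gettime(CLOCK_MONOTONIC, &t0);
        pthread_mutex_lock(&m);
        for (int i = 0; i < ITERS; i++) {
            turn = 1;
            pthread_cond_signal(&c);                      /* signal partner */
            while (turn != 0) pthread_cond_wait(&c, &m);  /* wait for reply */
        }
        pthread_mutex_unlock(&m);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        pthread_join(t, NULL);
        double us = (t1.tv_sec - t0.tv_sec) * 1e6 +
                    (t1.tv_nsec - t0.tv_nsec) / 1e3;
        printf("round trip (signal + wait each way): %.2f usec\n", us / ITERS);
        return 0;
    }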
5.3 Application Performance

To illustrate the effect of our system on the performance of applications, we measured the same parallel application using Topaz kernel threads, original FastThreads built on top of Topaz threads, and modified FastThreads running on scheduler activations. The application is an O(N log N) solution to the N-body problem [3]. The algorithm constructs a tree representing the center of mass of each portion of space and then traverses the tree to compute the force on each body: the force exerted by a cluster of distant masses can be approximated by the force they would exert if they were all at the center of mass of the cluster.

Depending on the relative speed of processor and memory, and on the cache miss ratio, the performance of this application can be limited either by the speed of the floating point computation or by the speed of memory access.
Our measurements are intended to be qualitatively illustrative: the Firefly's CVAX processors are orders of magnitude slower than current processors, and we chose a small problem size, taking the size of the Firefly's cache and physical memory into account. We modified the application to manage part of its memory explicitly as a buffer cache for its data; this allowed us to control the amount of memory available to the application and to force it to do I/O when its data did not fit. A reference to data not in the cache blocks the user-level thread while the data is read from disk; a disk access took 50 msec. (As a further simplification, we parallelized and measured the force computation portion of the application.) All measurements in this section were made on a six-processor CVAX Firefly.
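A sketch of such an application-managed buffer cache follows; the direct-mapped organization and all names are our assumptions for illustration, with read_block_from_disk standing in for the roughly 50 msec. disk access:

    /* The application sizes this cache by hand; shrinking nframes below the
     * working set forces misses, and each miss does a (slow) disk read. */
    #include <stdlib.h>

    #define BLOCK_SIZE 8192

    struct frame {
        long block;                    /* disk block held here; -1 if empty */
        char data[BLOCK_SIZE];
    };

    static struct frame *frames;
    static size_t nframes;             /* set from the % of memory simulated */

    extern void read_block_from_disk(long block, char *dst);  /* ~50 msec. */

    void cache_init(size_t n)
    {
        nframes = n;
        frames = calloc(n, sizeof *frames);
        for (size_t i = 0; i < n; i++)
            frames[i].block = -1;
    }

    /* All application data access goes through here. */
    char *cache_get(long block)
    {
        struct frame *f = &frames[(unsigned long)block % nframes];
        if (f->block != block) {       /* miss: evict and block for "I/O" */
            read_block_from_disk(block, f->data);
            f->block = block;
        }
        return f->data;
    }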
First, we demonstrate that when the application makes minimal use of kernel services, it runs as quickly on our system as on original FastThreads, and faster than on Topaz threads. For this test, the application's data structures fit entirely in physical memory, so there is negligible I/O. Figure 2 graphs the application's speedup as a function of the number of processors, for each of the three systems; speedup is relative to a sequential implementation of the algorithm.

[Figure 2. Speedup of the N-body application versus the number of processors, with 100% of memory available. Curves: Topaz threads, original FastThreads, new FastThreads.]
As processors are added, the speedup of the application initially improves and then flattens out. The application has enough parallelism to use all of the processors; it is the added contention for shared data, and not a lack of parallelism, that causes speedup to flatten at four or five processors. (We might be able to improve absolute performance by restructuring the application, but it is the relative performance of the three thread systems that concerns us here.) Both user-level thread systems attain good speedup because thread management operations execute at user level, without trapping to the kernel. User-level critical sections are short, so a thread that tries to acquire a busy lock simply spins until the lock is released: the lock holder will release it quickly, and brief spinning is cheaper than blocking [12]; these periods of busy-waiting were no bottleneck during our tests.
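The acquire path just described, spin briefly and fall back to the user-level scheduler only if the spin fails, can be sketched as follows; ult_block_on is a hypothetical hook into the user-level scheduler, and the spin limit is an assumed tuning constant in the spirit of competitive spinning [12]:

    #include <stdatomic.h>

    #define SPIN_LIMIT 1000            /* tuned so spinning costs at most
                                          about one block/unblock pair */

    typedef struct { atomic_flag held; } spinlock_t;
    /* initialize with:  spinlock_t l = { ATOMIC_FLAG_INIT }; */

    extern void ult_block_on(spinlock_t *l);   /* queue thread, switch away */

    void lock_acquire(spinlock_t *l)
    {
        for (;;) {
            for (int i = 0; i < SPIN_LIMIT; i++) {
                if (!atomic_flag_test_and_set_explicit(&l->held,
                                                       memory_order_acquire))
                    return;            /* got it: no kernel involvement */
            }
            ult_block_on(l);           /* still busy: run another user-level
                                          thread instead of trapping */
        }
    }

    void lock_release(spinlock_t *l)
    {
        atomic_flag_clear_explicit(&l->held, memory_order_release);
    }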
When the application tries to use all of the processors of the machine (in this case, six processors), the performance of original FastThreads and of our system diverges slightly. Topaz has several kernel daemon threads that wake up periodically, execute for a short time, and then go back to sleep. Because our system explicitly allocates processors to address spaces, these daemon threads cause preemptions only when there are no idle processors available; this is not true with the native Topaz scheduler, which controls the kernel threads used as virtual processors by original FastThreads. Thus, there were more preemptions with original FastThreads than with our system. (The preemptions have only a small impact on performance for both systems because of their short duration.)

Next, we show that when the application requires kernel involvement because it does I/O, our system performs better than either original FastThreads or Topaz threads.
[Figure 3. Execution time of the N-body application versus the amount of available memory (100% down to 40%), on 6 processors. Curves: Topaz threads, original FastThreads, new FastThreads.]
Figure 3 graphs the application's execution time on six processors as a function of the amount of available memory. For all three systems, performance degrades slowly at first, and then sharply once the application's working set does not fit in memory. However, application performance with original FastThreads degrades more quickly than with the other two systems. This is because when a user-level thread blocks in the kernel, the kernel thread serving as its virtual processor also blocks, and thus the application loses that physical processor for the duration of the I/O. The curves for modified FastThreads and for Topaz threads parallel each other because both systems are able to exploit the parallelism of the application to overlap some of the I/O latency with useful computation. As in Figure 2, though, application performance is better with modified FastThreads than with Topaz because most thread operations can be implemented without kernel involvement.
Finally, while Figure 3 shows the effect on performance of application-induced kernel events, multiprogramming causes system-induced kernel events that result in our system having better performance than either original FastThreads or Topaz threads. To test this, we ran two copies of the N-body application at the same time on a six processor Firefly and then averaged their execution times. Table V lists the resulting speedups for each system; note that a speedup of three would be the maximum possible. Table V shows that application performance with modified FastThreads is good even in a multiprogrammed environment; the speedup is within 5% of that obtained when the application ran uniprogrammed on three processors. This small degradation is about what we would expect from bus contention and the need to donate a processor periodically to run a kernel daemon thread. In contrast, multiprogrammed performance is much worse with either original FastThreads or Topaz threads, although for different reasons.
Table V. Speedup of N-Body Application (Multiprogramming Level = 2, 6 Processors, 100% of Memory Available)

    Topaz threads    Original FastThreads    New FastThreads
        1.29                1.26                  2.45
When applications using original FastThreads are multiprogrammed, the operating system time-slices the kernel threads serving as virtual processors; this can result in physical processors idling, waiting for a lock to be released, while the lock holder is rescheduled. Performance is worse with Topaz threads than with our system because common thread operations are more expensive. In addition, because Topaz does not do explicit processor allocation, it may end up scheduling more kernel threads from one address space than from the other; Figure 2 shows, however, that performance flattens out when more than three processors are assigned to the application.
While
the Firefly
is an excellent
vehicle
for constructing
proof-of-concept
prototypes,
its limited
number
of processors
makes
it less than
ideal for
experimenting
with significantly
parallel
applications
or with multiple,
multiprogrammed
parallel
applications.
For this reason, we are implementing
scheduler
activations
in C Threads
and Mach; we are also porting
Amber [6],
a programming
system
for
a network
of multiprocessors,
onto
our
Firefly
implementation.
6. RELATED IDEAS

The two systems with goals most closely related to our own—achieving properly integrated user-level threads through improved kernel support—are Psyche [20] and Symunix
[9]. Both have support
for NUMA
multiprocessors
as a primary
goal: Symunix
in a high-performance
parallel
UNIX
implementation,
and Psyche in the context of a new operating
system.
Psyche and Symunix provide "virtual processors" as described in Sections 1 and 2, and augment these virtual processors by defining software interrupts that notify the user level of some kernel events. (Software interrupts are like upcalls, except that all interrupts on the same processor use the same stack, and thus are not reentrant.) Psyche has also explored the notion of multimodal parallel programming, in which user-defined threads of various kinds, in different address spaces, can synchronize while sharing code and data.
While
Psyche, Symunix,
and our own work share similar
goals, the approaches
taken
to achieve
these goals differ
in several
important
ways.
Unlike
our work, neither
Psyche nor Symunix
provides
the exact functionality of kernel threads with respect to I/O, page faults and multiprogramming;
further,
the performance
of their user-level
thread operations
can be compromised. We discussed some of the reasons for this in Section 2: these systems notify the user level of some but not all of the kernel events that affect the
address space. For example, neither Psyche nor Symunix notifies the user level when a preempted virtual processor is rescheduled; as a result, the user-level thread system does not know how many processors it has or what user-level threads are running on those processors. Both Psyche and Symunix provide shared writable memory between the kernel and each application, but neither system provides an efficient mechanism for the user-level thread system to notify the kernel when its processor allocation needs to be reconsidered.
The number
of processors needed by each
application
could be written
into this shared memory,
but that would give no
efficient way for an application that needs more processors to know that some other application has idle processors.

Applications in both Psyche and Symunix share synchronization state with the kernel in order to avoid preemption at inopportune moments (e.g., while spin-locks
are being held). In Symunix,
the application
sets and later clears a
variable
shared with the kernel
to indicate
that it is in a critical
section; in
Psyche, the application
checks for an imminent
preemption
before starting
a
critical section. The setting, clearing, and checking of these bits adds to lock latency, which constitutes a large portion of the overhead when doing high-performance user-level thread management [2]. By contrast, our system has no effect on lock latency unless a preemption actually occurs.
Furthermore, in these other systems the kernel notifies the application of its intention to preempt a processor before the preemption actually occurs; based on this notification, the application
can choose to place a thread in a “safe”
state and
voluntarily
relinquish
a processor.
This mechanism
violates
the constraint
that higher priority
threads are always run in place of lower priority
threads.
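For concreteness, the Symunix-style protocol described above can be sketched as follows; the flag writes and the pending-preemption check sit on every critical-section entry and exit, which is exactly the lock-latency cost just discussed. All names are hypothetical:

    /* Per-processor state shared (mapped) between application and kernel. */
    struct kernel_shared {
        volatile int in_critical_section;   /* set/cleared by the application */
        volatile int preemption_pending;    /* set by the kernel */
    };

    extern struct kernel_shared *ks;        /* mapped shared with the kernel */
    extern void kernel_yield_processor(void);

    void enter_critical(void)
    {
        ks->in_critical_section = 1;        /* paid on every acquire */
    }

    void exit_critical(void)
    {
        ks->in_critical_section = 0;        /* paid on every release */
        if (ks->preemption_pending)         /* preemption was deferred for us: */
            kernel_yield_processor();       /* take it now, at a safe point */
    }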
Gupta et al. [9a] share our goal of maintaining a one-to-one correspondence between physical processors and execution contexts for running user-level threads. When a processor preemption or an I/O completion results in there being more contexts than processors, Gupta et al.'s kernel time-slices the contexts until the application reaches a point where it is safe to suspend a context. Our approach eliminates the need for kernel time-slicing by notifying the user-level thread system of the event while keeping the number of execution contexts constant.

Some systems provide asynchronous I/O as a mechanism to solve some of the problems of user-level thread management on multiprocessors [9, 25]. Indeed, our work has the flavor of an asynchronous I/O system: when
1/0 system: when
an 1/0 request
is made, the processor
is returned
to the application,
and
later,
when the 1/0 completes,
the application
is notified.
There
are two
major differences
between
our work and traditional
asynchronous
1/0 systems, though.
First,
and most important,
scheduler
activations
provide
a
single uniform
mechanism
to address the problems
of processor preemption,
1/0, and page faults.
Relative
to asynchronous
1/0, our approach
derives
conceptual
simplicity
from the fact that all interaction
with
the kernel
is
synchronous
from the perspective
of a single scheduler
activation.
A scheduler activation
that blocks in the kernel
is replaced
with a new scheduler
activation
when the awaited
event occurs. Second, while
asynchronous
1/0
schemes may require
significant
changes to both application
and kernel code,
ACM
Transactions
on
Computer
Systems,
Vol.
10,
No
1, February
1992
Scheduler Activations: Effective Kernel Support
.
77
our scheme leaves the structure
of both the user-level
thread
system and the
kernel
largely
unchanged.
Finally, parts of our scheme are related in some ways to Hydra [26], one of the earliest multiprocessor operating systems, in which scheduling policy was moved out of the kernel to a distinguished scheduling policy server. However, this separation of policy and mechanism came at a performance cost, because communicating a policy decision from the server back to the kernel required a context switch. In our system, an application can set its own policy for scheduling its threads onto its processors, and can implement this policy without trapping to the kernel. Longer-term processor allocation decisions in our system are, as in Hydra, the kernel's responsibility, although this responsibility could be delegated to a distinguished application-level server.
7. SUMMARY

Managing parallelism at the user level is essential to high-performance parallel computing, but kernel threads or processes, as provided in many operating systems, are a poor abstraction on which to support this. We have described the design, implementation, and performance of a new kernel interface and a user-level thread package that together combine the performance of user-level threads (in the common case of thread operations that can be implemented entirely at user level) with the functionality of kernel threads (correct behavior in the infrequent case when the kernel must be involved).

Our approach is based on providing each application address space with a virtual multiprocessor in which the application knows exactly how many processors it has and exactly which of its threads are running on those processors. Responsibilities are divided between the kernel and each application address space:

–Processor allocation (the allocation of processors to address spaces) is done by the kernel.
–Thread scheduling (the assignment of an address space's threads to its processors) is done by each address space.
–The kernel notifies the address space thread scheduler of every event affecting the address space.
–The address space notifies the kernel of the subset of user-level events that can affect processor allocation decisions.

The mechanism that we use to implement these ideas is called scheduler activations. A scheduler activation is the execution context for vectoring control from the kernel to the address space on a kernel event. The address space thread scheduler uses this context to handle the event, e.g., to modify user-level thread data structures, to execute user-level threads, and to make requests of the kernel. While our prototype implements threads as the concurrency abstraction supported at the user level, scheduler activations are not linked to any particular model; scheduler activations can support any user-level concurrency model, because the kernel has no knowledge of user-level data structures.
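This division of labor can be sketched as a C interface. The upcall and downcall points correspond to those the paper defines earlier (Table II); the C names and types here are merely illustrative:

    /* Upcalls: kernel -> address space.  Each is delivered by running the
     * user-level thread scheduler on a fresh scheduler activation. */
    typedef int sa_id_t;                          /* scheduler activation id */
    struct machine_state;                         /* saved registers, etc. */

    void add_this_processor(int processor);
    void processor_has_been_preempted(sa_id_t preempted,
                                      struct machine_state *state);
    void sa_has_blocked(sa_id_t blocked);
    void sa_has_unblocked(sa_id_t unblocked, struct machine_state *state);

    /* Downcalls: address space -> kernel.  Only the events that can change
     * the processor allocation decision need cross into the kernel. */
    void add_more_processors(int additional);
    void this_processor_is_idle(void);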
ACKNOWLEDGMENTS

We would like to thank Andrew Black, Mike Burrows, Jan Edler, Mike Jones, Butler Lampson, Tom LeBlanc, Kai Li, Brian Marsh, Sape Mullender, Dave Redell, Michael Scott, Garret Swart, and John Zahorjan for their helpful comments. We would also like to thank the DEC Systems Research Center for providing us with their Firefly hardware and software.
REFERENCES

1. AGHA, G. Actors: A Model of Concurrent Computation in Distributed Systems. MIT Press, Cambridge, Mass., 1986.
2. ANDERSON, T., LAZOWSKA, E., AND LEVY, H. The performance implications of thread management alternatives for shared-memory multiprocessors. IEEE Trans. Comput. 38, 12 (Dec. 1989), 1631-1644. Also appeared in Proceedings of the 1989 ACM SIGMETRICS and Performance '89 Conference on Measurement and Modeling of Computer Systems (Oakland, Calif., May 1989), pp. 49-60.
3. BARNES, J., AND HUT, P. A hierarchical O(N log N) force-calculation algorithm. Nature 324 (1986), 446-449.
4. BIRRELL, A., GUTTAG, J., HORNING, J., AND LEVIN, R. Synchronization primitives for a multiprocessor: A formal specification. In Proceedings of the 11th ACM Symposium on Operating Systems Principles (Austin, Tex., Nov. 1987), pp. 94-102.
5. BLACK, D. Scheduling support for concurrency and parallelism in the Mach operating system. IEEE Computer 23, 5 (May 1990), 35-43.
6. CHASE, J., AMADOR, F., LAZOWSKA, E., LEVY, H., AND LITTLEFIELD, R. The Amber system: Parallel programming on a network of multiprocessors. In Proceedings of the 12th ACM Symposium on Operating Systems Principles (Litchfield Park, Ariz., Dec. 1989), pp. 147-158.
7. CHERITON, D. The V distributed system. Commun. ACM 31, 3 (Mar. 1988), 314-333.
8. DRAVES, R., AND COOPER, E. C Threads. Tech. Rep. CMU-CS-88-154, School of Computer Science, Carnegie Mellon Univ., June 1988.
9. EDLER, J., LIPKIS, J., AND SCHONBERG, E. Process management for highly parallel UNIX systems. In Proceedings of the USENIX Workshop on UNIX and Supercomputers (Sept. 1988), pp. 1-17.
9a. GUPTA, A., TUCKER, A., AND STEVENS, L. Making effective use of shared-memory multiprocessors: The process control approach. Tech. Rep. CSL-TR-91-475A, Computer Systems Laboratory, Stanford Univ., July 1991.
10. HALSTEAD, R. Multilisp: A language for concurrent symbolic computation. ACM Trans. Program. Lang. Syst. 7, 4 (Oct. 1985), 501-538.
11. HERLIHY, M. A methodology for implementing highly concurrent data structures. In Proceedings of the 2nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (Seattle, Wash., Mar. 1990), pp. 197-206.
12. KARLIN, A., LI, K., MANASSE, M., AND OWICKI, S. Empirical studies of competitive spinning for a shared-memory multiprocessor. In Proceedings of the 13th ACM Symposium on Operating Systems Principles (Pacific Grove, Calif., Oct. 1991), pp. 41-55.
13. LAMPSON, B., AND REDELL, D. Experiences with processes and monitors in Mesa. Commun. ACM 23, 2 (Feb. 1980), 104-117.
14. LO, S.-P., AND GLIGOR, V. A comparative analysis of multiprocessor scheduling algorithms. In Proceedings of the 7th International Conference on Distributed Computing Systems (Sept. 1987), pp. 356-363.
15. MARSH, B., SCOTT, M., LEBLANC, T., AND MARKATOS, E. First-class user-level threads. In Proceedings of the 13th ACM Symposium on Operating Systems Principles (Pacific Grove, Calif., Oct. 1991), pp. 110-121.
16. MOELLER-NIELSEN, P., AND STAUNSTRUP, J. Problem-heap: A paradigm for multiprocessor algorithms. Parallel Comput. 4, 1 (Feb. 1987), 63-74.
17. MOSS, J., AND KOHLER, W. Concurrency features for the Trellis/Owl language. In Proceedings of the European Conference on Object-Oriented Programming 1987 (ECOOP 87) (June 1987), pp. 171-180.
18. REDELL, D. Experience with Topaz teledebugging. In Proceedings of the ACM SIGPLAN/SIGOPS Workshop on Parallel and Distributed Debugging (Madison, Wisc., May 1988), pp. 35-44.
19. SCHROEDER, M., AND BURROWS, M. Performance of Firefly RPC. ACM Trans. Comput. Syst. 8, 1 (Feb. 1990), 1-17.
20. SCOTT, M., LEBLANC, T., AND MARSH, B. Multi-model parallel programming in Psyche. In Proceedings of the 2nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (Mar. 1990), pp. 70-78.
21. TEVANIAN, A., RASHID, R., GOLUB, D., BLACK, D., COOPER, E., AND YOUNG, M. Mach Threads and the Unix Kernel: The battle for control. In Proceedings of the 1987 USENIX Summer Conference (1987), pp. 185-197.
22. THACKER, C., STEWART, L., AND SATTERTHWAITE, JR., E. Firefly: A multiprocessor workstation. IEEE Trans. Comput. 37, 8 (Aug. 1988), 909-920.
23. TUCKER, A., AND GUPTA, A. Process control and scheduling issues for multiprogrammed shared memory multiprocessors. In Proceedings of the 12th ACM Symposium on Operating Systems Principles (Litchfield Park, Ariz., Dec. 1989), pp. 159-166.
24. VANDEVOORDE, M., AND ROBERTS, E. WorkCrews: An abstraction for controlling parallelism. Int. J. Parallel Program. 17, 4 (Aug. 1988), 347-366.
25. WEISER, M., DEMERS, A., AND HAUSER, C. The portable common runtime approach to interoperability. In Proceedings of the 12th ACM Symposium on Operating Systems Principles (Litchfield Park, Ariz., Dec. 1989), pp. 114-122.
26. WULF, W., LEVIN, R., AND HARBISON, S. Hydra/C.mmp: An Experimental Computer System. McGraw-Hill, New York, 1981.
27. ZAHORJAN, J., AND MCCANN, C. Processor scheduling in shared memory multiprocessors. In Proceedings of the 1990 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems (Boulder, Colo., May 1990), pp. 214-225.
28. ZAHORJAN, J., LAZOWSKA, E., AND EAGER, D. The effect of scheduling discipline on spin overhead in shared memory multiprocessors. IEEE Trans. Parallel Distrib. Syst. 2, 2 (Apr. 1991), 180-198.
Received June 1991; revised August 1991; accepted September 1991