File-System Development with Stackable Layers

JOHN S. HEIDEMANN and GERALD J. POPEK
University of California, Los Angeles

Filing services have experienced a number of innovations in recent years, but many of these promising developments have failed to enter broad use. One reason is that current filing environments present several barriers to new development. For example, file systems today typically stand alone instead of building on the work of others, and support of new filing services often requires changes that invalidate existing work.

Stackable file-system design addresses these issues in several ways. Complex filing services are constructed from layer "building blocks," each of which may be provided by independent parties. There are no syntactic constraints on layer order, and layers can occupy different address spaces, allowing very flexible layer configuration. Independent layer evolution and development are supported by an extensible interface bounding each layer.

This paper discusses stackable layering in detail and presents design techniques it enables. We describe an implementation providing these facilities that exhibits very high performance. By lowering the barriers to new filing design, stackable layering offers the potential of broad third-party file-system development not feasible today.

Categories and Subject Descriptors: D.4.3 [Operating Systems]: File Systems Management—maintenance; D.4.7 [Operating Systems]: Organization and Design—hierarchical design; D.4.8 [Operating Systems]: Performance—measurements; E.5 [Data]: Files—organization/structure

General Terms: Design, Performance

Additional Key Words and Phrases: Composability, file-system design, operating-system structure, reuse
1. INTRODUCTION

File systems represent one of the most important aspects of operating-system services. Traditionally, the file system has been tightly integrated with the operating system proper. As a result, the evolution of filing services has been

This work was sponsored by the Advanced Research Projects Agency under contracts F29601-87-C-0072 and N00174-91-C-0107. Also, J. S. Heidemann was sponsored by a USENIX scholarship for the 1990–1991 academic year, and G. J. Popek is affiliated with Locus Computing Corporation.
Authors' address: Department of Computer Science, University of California, Los Angeles, CA 90024; email: johnh@cs.ucla.edu; popek@cs.ucla.edu.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
© 1994 ACM 0734-2071/94/0200-0058 $3.50
ACM Transactions on Computer Systems, Vol. 12, No. 1, February 1994, Pages 58-89
relatively slow. The MVS catalog, for example, is basically unchanged in over a decade and a half. The primary file system for UNIX® today, the Berkeley Fast file system (called UFS in this paper), was constructed almost a decade ago and has seen little change since; the situation in most other operating systems is similar. This is true despite the fact that, in recent years, a variety of functional improvements for filing services have been constructed in one context or another, many of which would be of real benefit to users.

There are several reasons why this state of affairs persists. First, the file system is such a key part of an operating system that a successful implementation is critical: the file manager is responsible for all of the persistent storage in the computer system, so an error in the filing system's logic can be quite devastating. Furthermore, the file service must interact with other core areas of the operating system, such as virtual-memory management (the file system and the virtual-memory manager typically arrange to share a cache of pages, the buffer pool), on an intimate basis; introducing a new file service into an existing operating system therefore requires considerable effort. In addition, despite the generality of interfaces like the vnode interface (the virtual file system, or VFS) in UNIX, the VFS is a coarse-grain interface: a developer who wishes to employ it must support it in its entirety, so even an incremental improvement requires constructing an entire file system, a far more daunting task than the improvement itself, and one very unappealing to would-be builders. In the face of these observations, it is not surprising that a file system, once constructed and deployed, is not changed a great deal.

Despite these problems, there is considerable benefit to be had if filing services could evolve more rapidly. Suppose it were possible to add file-system services to an existing system in a simple manner, analogous to the way a user at the user level merely acquires "shrink-wrap software" at a local store and installs it himself. In the personal-computer arena, a single service is often obtained not by a monolithic program but by a large number of individually developed packages, many of which interact with one another in a straightforward way; applications today can exchange data easily. The user may select from an explosion of services and install new function at a far faster rate than is possible when the operating-system vendor must provide it. This situation has provided considerably more benefit, at far less expense, than the process of adding services on mainframe systems, which is notoriously cumbersome.

For filing, however, this approach has not worked in practice. There is no well-defined, flexible interface by which independently developed filing services can be added, so most vendors of software that depends on rich filing semantics, such as database managers, have instead constructed their own filing services inside their applications, at considerable expense, storing data in proprietary and mutually incompatible ways, with a consequent inhibition of innovation.

®UNIX is a trademark of UNIX System Laboratories.
We believe that a similar situation would benefit filing services a great deal. Suppose it were just as straightforward to obtain and install independently constructed packages for such services as

—fast physical storage management,
—extended directory services,
—compression and decompression,
—remote-access services,
—selective file replication,
—automatic encryption and decryption,
—cache performance enhancement,
—undo and undelete, and
—transactions,

with the assurance that they would work together effectively. Then, we contend, filing services would be much more widely available, the set of services would be much richer, and the evolution of particular solutions would be far more rapid. In addition, the wide variety of expensive and confusing ad hoc solutions employed today would not occur.¹
The goal of the research
reported
in this paper is to contribute
toward
making
it as easy to add a function
to a file system in practice
as it is to
install
an application
on a personal computer. To do so, it is necessary to provide an environment in which useful file-system fragments can be easily built and effectively assembled, and in which the result makes no compromises in functionality or performance. The environment must be extensible, so that its useful lifetime spans multiple generations of computer systems. Continued successful operation of modules built in the past must be assured. Multiple third-party additions must be able to coexist in this environment, some of which may provide unforeseen services.
The impact
of such an environment
can be considerable.
For example,
many recognize
the utility
of microkernels
such as Mach or Chorus.
Such
systems,
however,
address
the structure
of about
15 percent
of a typical
operating
system, compared to the 40 percent of the operating system that the filing environment often represents.

This paper describes an interface and framework that support stackable file-system development. The framework allows new file-system development to build on the functionality of existing
services, while allowing
all parties
to gradually
grow
and evolve the interface
and set of services. Unique
characteristics
of this
framework
include
the ability
of layers in different
address spaces to interact
and for layers to react gracefully
in the face of change.
¹Consider, for example, the grouping facility in MS-Windows. It is, in effect, another directory service constructed on top of, and independent of, the underlying MS-DOS file system. It provides directory services that would not have been easy to incorporate into the underlying file system in an upward-compatible fashion.
Actual experience is an important test of the utility of these ideas. For this reason, this paper is accompanied by examples drawn from the considerable body of software in which these experiences were obtained. Our work draws on earlier research with file-system modularity [Kleiman 1986], stream protocols [Ritchie 1984], and object-oriented design, as well as fragments of ideas referenced in a variety of other environments (see Section 6.2). The experiences of applying these principles to a complex interface controlling access to persistent data differentiate this work from that which has come before.

1.1 Organization of the Paper

This paper first examines how direct support for stacking is essential to the evolution of file-system development. We then discuss the characteristics of a stackable file system that can enable new approaches to file-system design. An implementation of this framework is next examined, focusing on the elements of filing unique to stacking. We follow with an evaluation of this implementation, examining both performance and usability. Finally, we consider similar areas of operating-systems research to place this work in context.
2. FILE-SYSTEM DESIGN

Many new filing services have been suggested in recent years. The Introduction presented a certainly incomplete list of nine such services, each of which exists today in some form. This section compares alternative approaches to providing new services, examining characteristics of each that hinder or simplify development.

2.1 Traditional Design

A first approach to providing new filing capabilities might be to modify an existing file system. There are several widely available implementations, any of which can frequently be modified to add new features. Although beginning with an existing implementation can significantly speed initial development, "standard" file systems have evolved differently on each platform; supporting new functionality across a dozen widely different platforms may well mean a
dozen separate
sets of modifications,
each nearly as difficult
as the previous.
Furthermore,
it is often difficult
to localize
changes; correctness
verification
of the modified
file system will require
examination
of its entire implementation, not just the modifications.
Finally,
although
file systems may be widely
used, source code for a particular
system
may be expensive,
difficult,
or
impossible
to acquire. Even when source code is available,
it can be expected
to change frequently
as vendors update their system.
Standard
filing
interfaces,
such as the vnode interface
[Kleiman
1986]
and NFS [Sandberg
et al. 1985], address some of these issues. By defining
an
“airtight”
boundary
for change, such interfaces
avoid modification
of existing
services, preventing
introduction
of bugs outside new activity.
Yet, by employing existing
file systems for data storage, basic filing
services do not need to
be reimplemented. Use of standard interfaces such as NFS also allows file systems to be developed at the user level, simplifying development and reducing the impact of error.

The interface introduces problems of its own, however. An interface is either evolving or static. If, like the vnode interface, it evolves to provide new services and functionality, compatibility problems are introduced: a change to the vnode interface requires corresponding changes to each existing filing service. Compatibility problems can be avoided by keeping the interface static, as with NFS, but if protocol change is not allowed, new services cannot be provided cleanly; they must either overload existing operations or employ an interface in parallel with the current one. Finally, a naive NFS-like approach to user-level services burdens each implementation with the cost of process execution and protection, and with repeated data copies between the hardware and the user. Although performance can be improved by in-kernel caching [Steere et al. 1990], portability is reduced, and the result is not fully satisfactory. Even with careful caching, the cost of layer crossings is often orders of magnitude greater than subroutine calls, thus discouraging the use of layering for structuring and reuse.

2.2 Stackable Layering

Stackable file systems construct complex filing services from a number of independently developed layers, each potentially available only in binary form. New services are provided as separate layers; the interlayer interface provides a clear division between layers, so errors are isolated to a particular layer. Figures 1–5 (discussed in the following sections) illustrate how stacking can provide services in a variety of environments. ("Stacking" is actually somewhat of a misnomer, since nonlinear stacks such as those of Figures 3 and 4 should be common; we retain the term for historic reasons.)

Layers are joined by a symmetric interface, syntactically identical above and below. Because of this symmetry,
there are no syntactic
restrictions
in
the configuration
of layers. New layers can be easily added or removed from a
file-system
stack, much as Streams
modules
can be configured
onto a network
interface.
Such filing configuration is simple to do; no kernel changes are required, allowing easy experimentation with stack behavior. Because new services often employ or export additional functionality, the interlayer interface is extensible.
Any layer can add new operations;
existing
layers adapt automatically
to support
these operations.
If a layer encounters
an operation
it does not recognize,
the operation
can be forwarded
to a lower
layer in the stack for processing.
Since new file systems can support
operations not conceived
of by the original
operating-system
developer,
unconventional
file services can now be supported
as easily as standard
file systems.
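This forwarding behavior can be illustrated with a small C sketch. It is only an illustration of the idea, not the actual UCLA kernel code: the fixed-size vector and names such as `vop_bypass` and `vn_call` are our assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OPS 4                 /* toy bound on the global operation count */

struct vnode;
struct vop_args { int op; int arg; };
typedef int (*vop_t)(struct vnode *vp, struct vop_args *ap);

struct vnode {
    vop_t ops[MAX_OPS];           /* this layer's operations vector      */
    struct vnode *lower;          /* next layer down the stack, or NULL  */
};

/* Invoke operation `op` on vnode `vp` through its operations vector. */
static int vn_call(struct vnode *vp, int op, int arg) {
    struct vop_args ap = { op, arg };
    return vp->ops[op](vp, &ap);
}

/* Default slot: forward an unrecognized operation to the layer below. */
static int vop_bypass(struct vnode *vp, struct vop_args *ap) {
    return vn_call(vp->lower, ap->op, ap->arg);
}

/* Bottom layer: a toy "storage" layer that only knows op 0 (a read). */
static int storage_read(struct vnode *vp, struct vop_args *ap) {
    (void)vp;
    return 100 + ap->arg;         /* pretend to return stored data */
}

/* Upper layer: adds a new op 1 and bypasses everything else. */
static int upper_status(struct vnode *vp, struct vop_args *ap) {
    (void)vp; (void)ap;
    return 7;
}

static void make_stack(struct vnode *upper, struct vnode *storage) {
    for (int i = 0; i < MAX_OPS; i++) {
        storage->ops[i] = vop_bypass; /* a real bottom layer would error */
        upper->ops[i]   = vop_bypass;
    }
    storage->ops[0] = storage_read;
    storage->lower  = NULL;
    upper->ops[1]   = upper_status;
    upper->lower    = storage;
}
```

Because unknown operations default to the bypass routine, the upper layer continues to work unchanged when new operations are added below it.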
The goals of interface symmetry and extensibility may initially appear incompatible. Configuration is most flexible when all interfaces are literally
syntactically identical, that is, when all layers can take advantage of stacking with the same mechanisms. But extensibility implies that layers may be semantically distinct, restricting how layers can be combined. Semantic constraints do limit layer configurations, but Sections 4.2 and 4.4 discuss how layers provide reasonable default semantics in the face of extensibility.

Address-space independence is another important characteristic of stackable file-system development. Layers often execute in the same address space, but there is considerable advantage to making layers also able to run transparently in different address spaces. Whether or not layers execute in a single protection domain
should not affect stack behavior.
New layers can then be developed
at the
user level and later put into the kernel for maximum
performance.
A transport
layer bridges the gap between
address spaces, transferring
all operations and results back and forth. Like the interlayer interface, a transport layer must be extensible, to support new operations automatically. In this way, distributed filing implementations fit smoothly into the same framework, and layers in corresponding address spaces are afforded the same advantages. More importantly, the transport layer isolates the "distributed" component of the interface in a single layer. Stack calls between layers in the same address space, the statistically dominant case, can then operate in an extremely rapid manner, avoiding distributed protocols such as argument marshaling. The interface has been developed to enable an absolutely minimum crossing cost locally while still maintaining the structure necessary for address-space independence.
This stacking model of layers joined by a symmetric but extensible filing interface, constrained by the environment of binary-only kernels, the requirements of address-space independence, and the overriding demand for absolutely minimal performance impact, represents the primary contribution of this work. The next section discusses approaches to file-system design that are enabled or facilitated by the stacking model.

3. STACKABLE LAYERING TECHNIQUES

This section examines in detail a number of different file-system development techniques enabled or simplified by stackable layering.
3.1 Layer Composition
One goal of layered file-system design is the construction of complex filing services from a number of simple, independently developed layers. If file systems are to be constructed from multiple layers, one must decide how services should be decomposed to make individual components most reusable. Our experience shows that layers are most easily reusable and composable when each encompasses a single abstraction.
This experience
parallels
those
encountered
in designing
composable
network
protocols
in the x-kernel
[Hutchinson
et al. 1989] and tool development
with the UNIX
shells [Pike
and Kernighan
1984].
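The flavor of such composition can be sketched with a toy stack in C: a storage layer that only stores bytes, and a transform layer that only transforms them, joined through one symmetric interface. All identifiers here are hypothetical, and the XOR "compression" is merely a stand-in for a real transform.

```c
#include <assert.h>
#include <string.h>

struct layer;
typedef void (*xfer_t)(struct layer *l, unsigned char *buf, size_t n, int writing);

struct layer {
    xfer_t op;                     /* the layer's single transfer operation */
    struct layer *lower;           /* layer below, or NULL at the bottom    */
    unsigned char store[64];       /* used only by the bottom layer         */
};

/* Bottom layer: raw byte storage, one abstraction only. */
static void store_xfer(struct layer *l, unsigned char *buf, size_t n, int writing) {
    if (writing) memcpy(l->store, buf, n);
    else         memcpy(buf, l->store, n);
}

/* Middle layer: a trivial transform (XOR mask) applied on the way down
 * and reversed on the way up; storage is delegated to the layer below. */
static void xform_xfer(struct layer *l, unsigned char *buf, size_t n, int writing) {
    if (writing) {
        unsigned char tmp[64];
        for (size_t i = 0; i < n; i++) tmp[i] = buf[i] ^ 0x5a;
        l->lower->op(l->lower, tmp, n, 1);
    } else {
        l->lower->op(l->lower, buf, n, 0);
        for (size_t i = 0; i < n; i++) buf[i] ^= 0x5a;
    }
}
```

Because each layer implements exactly one abstraction, the transform layer composes over any layer that speaks the same interface.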
Fig. 1. Compression service stacked over a UNIX file system.
As an example of this problem in the context of file-system layering, consider the stack presented in Figure 1. A compression layer is stacked over a standard UNIX file system (UFS); the UFS handles file services, while the compression layer periodically compresses rarely used files.
A compression service provided above the UNIX directory abstraction has difficulty efficiently handling files with multiple names (hard links). This is because the UFS was not designed as a stackable layer; it encompasses several separate abstractions. Examining the UFS in more detail, we see at
least three basic abstractions: a disk partition, files referenced by fixed names (inode-level access), and a hierarchical directory service. Instead of a single "UFS service," the stack should be composed of directory, file, and disk layers. In this architecture the compression layer could be configured directly above the file layer. Multiply named files would no longer be a problem, because multiple names would be provided by a higher-level layer. One could also imagine reusing the directory layer over other low-level storage implementations. Stacks of this type, with a compression service over a layer providing arbitrary-length files, are shown in Figure 2.

3.2 Layer Substitution
Figure 2 also demonstrates
layer substitution.
Because the log-structured
file
system
and the UFS are semantically
similar,
the compression
layer can
stack equally
well over either.
Substitution
of one for the other is possible,
allowing
selection
of low-level
storage
to be independent
of higher-level
services.
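Substitution can be sketched as two interchangeable storage layers behind one interface. This is a toy illustration with hypothetical names; the "log-structured" stand-in here merely stores bytes reversed.

```c
#include <assert.h>
#include <string.h>

struct storage {
    void (*put)(struct storage *s, const char *data);
    void (*get)(struct storage *s, char *out);
    char buf[32];
};

/* "UFS-like" layout: store bytes exactly as given. */
static void flat_put(struct storage *s, const char *d) { strcpy(s->buf, d); }
static void flat_get(struct storage *s, char *o)       { strcpy(o, s->buf); }

/* "Log-structured-like" layout: a stand-in that stores bytes reversed. */
static void rev_put(struct storage *s, const char *d) {
    size_t n = strlen(d);
    for (size_t i = 0; i < n; i++) s->buf[i] = d[n - 1 - i];
    s->buf[n] = '\0';
}
static void rev_get(struct storage *s, char *o) {
    size_t n = strlen(s->buf);
    for (size_t i = 0; i < n; i++) o[i] = s->buf[n - 1 - i];
    o[n] = '\0';
}

/* A higher-level layer written only against the interface; either
 * storage layer can be substituted without changing this code. */
static void save_and_load(struct storage *s, const char *in, char *out) {
    s->put(s, in);
    s->get(s, out);
}
```

The higher-level layer observes identical behavior over either substitute, even though the on-"disk" formats differ.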
This ability to have "plug-compatible" layers not only supports higher-level services across a variety of vendor-customized storage facilities, but also supports the evolution and replacement of the lower layers as desired.

3.3 Nonlinear Stacking

File-system stacks are frequently linear; all access proceeds from the top down through each layer. However, there are also times when nonlinear stacks are desirable.
Fan-out
occurs when a layer references
“out” to multiple
layers beneath
it.
Figure
3 illustrates
how this structure
is used to provide
replication
in Ficus
[Guy et al. 1990].
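A minimal fan-out sketch in C follows (hypothetical names; a real replication layer such as Ficus's logical layer would add replica selection, failure handling, and reconciliation):

```c
#include <assert.h>

#define NREPLICAS 3

struct replica { int value; };          /* toy stand-in for a storage layer */

struct logical {
    struct replica *below[NREPLICAS];   /* fan-out: several lower layers */
};

/* A write is propagated to every replica beneath the logical layer. */
static void logical_write(struct logical *l, int v) {
    for (int i = 0; i < NREPLICAS; i++)
        l->below[i]->value = v;
}

/* A real layer would pick the best replica; here, simply the first. */
static int logical_read(struct logical *l) {
    return l->below[0]->value;
}
```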
Fig. 2. Compression layer configured with a modular physical-storage service. Each stack also uses a different file storage layer (UFS and log-structured layout).
Fig. 3. Cooperating Ficus layers. Fan-out allows the logical layer to identify several replicas, while a remote-access layer is inserted between cooperating Ficus layers as necessary.

Fan-in allows multiple clients access to a particular layer. If each stack layer is separately named, it is possible for knowledgeable programs to
choose to avoid upper stack layers. For example,
one would prefer to back up
compressed
or encrypted
data without
uncompressing
or decrypting
it to
conserve space and to preserve privacy (Figure 4). This could easily be done by direct access to the underlying storage layer. Network access of encrypted
data could also be provided by direct access to the encrypted storage, avoiding clear-text transfer.

Fig. 4. Access through the compression layer provides users transparently uncompressed data. Fan-in allows a backup program to access the compressed version directly.

3.4 Cooperating Layers

Layered design encourages the separation of file systems into small, reusable layers. Sometimes services that might be reusable are buried in the middle of an otherwise special-purpose file system. For example, a distributed file system may consist of a client and server portion, with a remote-access service in between. One can envision several possible remote-access services, ranging from a simple, stateless "RPC"-like data transfer, to one providing exact distributed UNIX semantics, or even file replication. But if the remote-access service is buried in the internals of a particular file system, it will be unavailable for reuse.

Cases such as these call for cooperating layers. A "semantics-free" remote-access service is provided as a reusable layer, and the remainder of each specific file system is split
into
two separate,
cooperating
layers. When the file-system
stack is composed, the
reusable
layer is placed between
the others. Because the reusable
portion
is
encapsulated
as a separate
layer, it is available
for use in other stacks. For
example,
a new secure remote
filing
service could be built
by configuring
encryption/decryption
layers around the basic transport
service.
An example of the use of cooperating layers in the Ficus replicated file system [Guy et al. 1990] is shown in Figure 3. The Ficus logical and physical layers correspond roughly to a client and server of a replicated service. A remote-access layer is placed between them when necessary.

3.5 Compatibility with Layers
The flexibility
stacking
provides
promotes
rapid interface
and layer evolution.
Unfortunately,
rapid
change often rapidly
results
in incompatibility.
Interface change and incompatibility
today often prevent
the use of existing
filing
abstractions
[Webber
1993]. A goal of our design is to provide
approaches
to
cope with interface
change in a binary-only
environment.
File-system
interface
evolution
takes a number
of forms. Third parties wish
to extend interfaces
to provide
new services. Operating-system
vendors must
change interfaces to evolve the operating system, but usually also wish to
maintain backward compatibility. Stackable layering provides a number of approaches to address the problems of interface evolution.

Extensibility is the primary tool of gradual interface evolution. Any party can add operations to the interface; such additions need not invalidate existing services. Third-party development of new filing services is facilitated, compatibility with existing operating-system services becomes possible, and the useful lifetime of a filing layer is greatly increased, protecting the investment in its construction.

Layer substitution (see Section 3.2) is another approach to address incompatibilities. Substitution of semantically similar layers allows easy adaption to differences in environments. For example, a low-level storage layer tied to particular hardware can be replaced by an alternate base layer on other machines. Resolution of more significant differences may also be possible with this approach: if two layers provide similar but not identical services, a thin compatibility layer may map between them. Although mapping between radically different semantics and operation sets is difficult, such a layer is quite useful in bridging limited incompatibilities, for example to make core filing services available on similar platforms shared by several parties, or to preserve backward compatibility after operating-system changes.

A still more significant barrier is posed by different operating systems. Layers providing identical services may exist on machines with different operating systems, but direct portability of layers across operating-system boundaries is usually not possible. A widely used transport layer, however, can bridge such boundaries. NFS can be thought of as such a transport layer, available in computing environments ranging from personal computers to mainframes. Although NFS provides access only to core filing services and is not extensible, it is still quite useful in this limited role. Section 5.3 describes how this approach is used to make Ficus replication services available on PCs.

3.6 User-Level Development

One advantage of the stackable framework is the ability to move large portions of file-system development outside of the kernel. Each layer can be thought of as a server, and each operation as an RPC message between servers; stackable layering in this way fits naturally with a microkernel design. With this framework, layers may be developed as user-level code, and in fact file-system development at UCLA usually takes this form. A transport layer (such as NFS) serves to move operations from the kernel to user-level servers (Figure 5), and another transport layer (the "u-to-k layer") allows user-level layers to invoke operations on vnodes that exist inside the kernel. New operations are supported naturally with this interface.

Although inter-address-space RPC has real cost, caching may provide reasonable performance for an out-of-kernel file system [Steere et al. 1990] in some cases, particularly if other characteristics of the filing service have inherently high latency (for example, hierarchical storage management).
Fig. 5. User-level layer development via transport layers.

Nevertheless, many filing services will find the cost of frequent RPCs overly expensive. Stackable layering offers valuable flexibility in this case. Because file-system layers each interact only through the layer interface, the transport layers can be removed from a configuration without affecting a layer's implementation. An appropriately constructed layer can then run in the kernel, avoiding all RPC overhead. Layers can be moved in and out of the kernel (or between different user-level servers) as usage requires. By separating the concepts of modularity and address-space protection, stackable layering permits the advantages of microkernel development and the efficiency of an integrated execution environment.
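The removable-transport idea can be sketched as follows. Here the "RPC" crossing is simulated by an ordinary function call, so the transport can be inserted or removed without changing results; all names and the message layout are illustrative assumptions.

```c
#include <assert.h>

struct msg { int op; int arg; };   /* a marshaled operation */

/* Server-side layer: the actual operation implementations. */
static int srv_dispatch(int op, int arg) {
    switch (op) {
    case 0:  return arg + 1;       /* toy "read"  */
    case 1:  return arg * 2;       /* toy "write" */
    default: return -1;            /* unknown operations get a defined answer */
    }
}

/* Transport, client half: marshal the call into a message. */
static struct msg marshal(int op, int arg) {
    struct msg m = { op, arg };
    return m;
}

/* Transport, server half: unmarshal and invoke.  In a real system the
 * message would cross an address-space or machine boundary here. */
static int transport_call(int op, int arg) {
    struct msg m = marshal(op, arg);
    return srv_dispatch(m.op, m.arg);
}
```

Because both paths invoke the same operation set, a layer developed against `transport_call` behaves identically when the transport is removed and `srv_dispatch` is called directly.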
4. IMPLEMENTATION

The UCLA stackable layers interface and its environment are the results of our efforts to tailor file-system development to the stackable model. Sun's vnode interface is extended to provide extensibility, stacking, and address-space independence. We describe this implementation here, beginning with a summary of the vnode interface and then examining important differences in our stackable interface.

4.1 Existing File System Interfaces
Sun’s vnode interface
is a good example
of several
“file-system
switches”
developed
for the UNIX
operating
system [Kleiman
1986; Rodriguez
1986].
All have the same goal, to support
multiple
file-system
types in the same
operating
system.
The vnode interface
has been quite
successful
in this
respect,
providing
dozens of different
filing
services
in several
versions
of
UNIX.
The vnode interface
is a method
of abstracting
the details
of a file-system
implementation from the majority
of the kernel.
The kernel views file access
through two abstract data types. A vnode identifies individual files.

Fig. 6. Namespace composed of two subtrees.
A small
set of file types
is supported,
including
regular
files,
which
provide
an
uninterpreted
array of bytes for user data, and directories,
which list other
files. Directories
include
references
to other directories,
forming
a hierarchy
of files. For implementation
reasons, the directory
portion
of this hierarchy
is
typically
limited
to a strict tree structure.
The other major data structure is the vfs, representing groups of files. For configuration purposes, sets of files are grouped into subtrees (traditionally referred to as file systems or disk partitions), each corresponding to one vfs. Subtrees are added to the file-system namespace by mounting. Mounting is the process of adding new collections of files into the global file-system namespace. Figure 6 shows two subtrees: the root subtree and another attached under /usr. Once a subtree is mounted, name translation proceeds automatically across subtree boundaries, presenting the user with an apparently seamless namespace.
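The mount-point crossing just described can be sketched in C. This is a toy model with hypothetical field names; real vnode/vfs implementations keep the association in mount tables rather than in the vnode itself.

```c
#include <assert.h>
#include <string.h>

struct vnode {
    const char  *name;
    struct vnode *mounted_here;   /* root of a subtree mounted on this
                                     directory, or NULL if none */
};

/* During name translation, lookup transparently continues at the root
 * of whatever subtree is mounted on the directory being crossed. */
static struct vnode *cross_mount(struct vnode *vp) {
    while (vp->mounted_here != NULL)
        vp = vp->mounted_here;
    return vp;
}
```

A path component that resolves to a mounted-on directory thus lands in the new subtree without the caller being aware a boundary was crossed.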
All files within a subtree typically have similar characteristics. Traditional UNIX disk partitions correspond one-to-one with subtrees. When NFS is employed, each collection of files from a remote machine is assigned a corresponding subtree on the local machine. Each subtree is allowed a completely separate implementation.

Data encapsulation requires that these abstract data types for files and subtrees be manipulated only by a restricted set of operations. The operations supported by vnodes, the abstract data type for "files," vary according to implementation (see Karels and McKusick [1986] and Kleiman [1986] for the semantics of typical operations).
To allow this generic
treatment
of vnodes, binding
of desired
function
to
correct implementation
is delayed
until
kernel
initialization.
This is implemented by associating
with each vnode type an operations
vector identifying
the correct implementation of each operation for that vnode type. Operations can then be invoked on a given vnode by looking up the correct operation in this vector (this mechanism is analogous to typical implementations of C++ virtual class method invocation).
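This dispatch mechanism can be sketched in C. It is a model of the idea only; the structure and function names are our illustrative assumptions, not Sun's actual declarations.

```c
#include <assert.h>

typedef int (*op_fn)(int);

struct vnodeops { op_fn getattr; op_fn setattr; };
struct vnode    { struct vnodeops *v_op; };  /* each vnode points at its
                                                type's operations vector */

static int ufs_getattr(int x) { return x + 1;   }
static int ufs_setattr(int x) { return x;       }
static int nfs_getattr(int x) { return x + 100; }
static int nfs_setattr(int x) { return -x;      }

/* One vector per file-system type, bound at "kernel initialization". */
static struct vnodeops ufs_ops = { ufs_getattr, ufs_setattr };
static struct vnodeops nfs_ops = { nfs_getattr, nfs_setattr };

/* Generic kernel code invokes operations without knowing the vnode's type. */
static int vn_getattr(struct vnode *vp, int x) { return vp->v_op->getattr(x); }
```

The caller of `vn_getattr` is identical for every file-system type; only the vector bound to the vnode differs.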
Limited file-system stacking is possible with the standard vnode interface using the mount mechanism. Sun Microsystems' NFS [Sandberg 1985], loopback, and translucent [Hendricks 1990] file systems take this approach. Information associated with the mount command identifies the existing stack layer and where the new layer should be attached into the filing namespace.

ACM Transactions on Computer Systems, Vol. 12, No. 1, February 1994.
4.2 Extensibility in the UCLA Interface

Incompatible change of an interface and the accommodation of new operations [Webber 1993] are serious concerns with file-system development today. The ability of developers to add to the set of supported operations without disrupting existing filing services is a requirement of diverse third-party file-system development; requiring existing file systems to adapt in lock-step with each interface change would greatly constrain evolution. The existing vnode interface addresses the problem of ensuring interface compatibility by fixing the formal definition of all operations at kernel compilation time; there is then no method to add operations before the next kernel release, a convention that prohibits third-party addition of new operations.

The UCLA interface instead maintains all interface information dynamically, delaying until kernel initialization the association of each operation with its implementation. Each file system provides a list of all the operations it supports. At kernel initialization, the union of these operations is taken, yielding the list of all operations supported by this kernel. This list is then used to construct the global operations vector and customized operations vectors for each file system; each vector caches sufficient information to select the correct implementation of each operation for a given vnode. New operations may be added by any layer, and new file systems with new operations can be added to an existing kernel, permitting very rapid interface evolution and reconfiguration.2

Because the interface does not define a fixed set of operations, a new layer must expect "unsupported" operations and accommodate them consistently. The UCLA interface requires a default routine that will be invoked for all operations not otherwise provided by a file system. File systems may simply return an "unsupported operation" error, but we expect most layers to pass unknown operations to a lower layer for processing.
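The union-then-customize construction described above can be sketched as follows. This is a toy model under assumed names (`register_op`, `build_vector`, `myfs_default`), not the SunOS implementation; it shows offsets assigned dynamically from the union of advertised operations, with unimplemented slots filled by the file system's default routine.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OPS 64

typedef int (*vop_t)(void *args);

struct op_desc { const char *name; int offset; };  /* offset set at init */

struct fs_entry { struct op_desc *op; vop_t impl; };

static int next_offset = 0;

/* Assign each named operation a vector offset the first time it is seen;
 * the union of all registered operations defines the vector length. */
static void register_op(struct op_desc *op) {
    if (op->offset < 0)
        op->offset = next_offset++;
}

/* Build one file system's customized vector: the default routine
 * everywhere, then overwrite the slots this file system implements. */
static void build_vector(vop_t *vec, vop_t deflt,
                         struct fs_entry *ents, int n) {
    for (int i = 0; i < next_offset; i++)
        vec[i] = deflt;
    for (int i = 0; i < n; i++)
        vec[ents[i].op->offset] = ents[i].impl;
}

/* Example: one standard operation, plus a new one added by some layer. */
struct op_desc op_read    = { "read",    -1 };
struct op_desc op_version = { "version", -1 };  /* added by a new layer */

static int myfs_read(void *a)    { (void)a; return 0; }
static int myfs_default(void *a) { (void)a; return -1; } /* "unsupported" */

vop_t myfs_vec[MAX_OPS];

void init_demo(void) {
    register_op(&op_read);
    register_op(&op_version);       /* union now includes the new op */
    struct fs_entry ents[] = { { &op_read, myfs_read } };
    build_vector(myfs_vec, myfs_default, ents, 1);
}
```

Invoking the unimplemented `version` slot reaches the default routine, which here rejects the call; a stackable layer would instead forward it downward.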
The new structure of the operations vector also requires a new method of operation invocation: rather than a fixed, compile-time offset into a static vector, the offset of each operation is computed dynamically when the vectors are constructed, and invocation then takes place through this dynamically computed offset. These changes have very little performance impact, an important consideration for a service that will be as frequently employed as an interlayer interface. Section 5.1 discusses the performance of stackable layering in detail.

2For simplicity, we ignore here the problem of adding new operations with new implementations while the system is running. An extension of the approach described here can support such "fully dynamic" addition of operations, although this extension is not part of current implementations.

Fig. 7. Mounting a UFS layer. The new layer is instantiated at /layer/ufs/crypt.raw from a disk device /dev/dsk0g.
4.3 Stack Creation

This section discusses how stacks are configured at the file-system granularity and how stacks are formed as required on a file-by-file basis.

4.3.1 Stack Configuration. Section 4.1 described how, in the prototype interface, a UNIX file system is built from a number of individual subtrees by mounting. Subtrees are the basic unit of file-system configuration; each is either mounted, making all of its files accessible, or unmounted and unavailable. We employ this same mechanism for layer construction.

Fundamentally, the UNIX mount mechanism has two purposes: it creates a new "subtree object" of the requested type, and it attaches this object into the file-system namespace for use. Frequently, creation of a subtree uses other objects in the file system. An example of this is shown in Figure 7, where a new UFS layer /layer/ufs/crypt.raw is instantiated from a disk device /dev/dsk0g. Configuration of layers requires the same basic steps of layer creation and naming, so we employ the same mount mechanism for layer construction.3

Layers are built at the subtree granularity, a mount command creating each layer of a stack. Typically, stacks are built bottom up. After a layer is mounted to a name, the next higher layer's mount command uses that name to identify its "lower-layer neighbor." Figure 8 continues the previous example by stacking an encryption layer over the UFS.

3Although mount is typically used today to provide "expensive" services, the mechanism is not inherently costly. Mount constructs an object and gives it a name; when object initialization is inexpensive, so is the corresponding mount.
Fig. 8. An encryption layer is stacked over a UFS. The encryption layer is instantiated at /usr/data from an existing UFS layer /layer/ufs/crypt.raw.

In this figure an encryption layer is created with a new name (/usr/data) after specifying the lower layer (/layer/ufs/crypt.raw). Alternatively, if no new name is necessary or desired, the new layer can be mounted to the same place in the namespace.4 Stacks with fan-out typically require that each lower layer be named when constructed.

Stack construction does not necessarily proceed from the bottom up. Sophisticated file systems may create lower layers on demand. The Ficus distributed file system takes this approach in its use of volumes. Each volume is a subtree storing related files. To ensure that all sites maintain a consistent view of the thousands of volumes in a large-scale distributed system, information about the location of each volume is maintained on disk at the volume mount location. When a volume mount point is encountered during path-name translation, the corresponding volume (and lower stack layers) is automatically located and mounted.
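The bottom-up construction described above can be modeled in a few lines: each mount creates a layer object, names it, and records the previously mounted lower-layer neighbor found by that name. This is a toy user-level model with hypothetical names (`mount_layer`, `struct layer`), not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_MOUNTS 8

struct layer {
    const char *name;     /* where this layer is mounted */
    struct layer *lower;  /* lower-layer neighbor, NULL at stack bottom */
};

static struct layer mounts[MAX_MOUNTS];
static int nmounts = 0;

static struct layer *lookup(const char *name) {
    for (int i = 0; i < nmounts; i++)
        if (strcmp(mounts[i].name, name) == 0)
            return &mounts[i];
    return NULL;
}

/* "mount": create a layer at `name`, stacked over `lower_name` if given.
 * The lower layer is identified by the name given at its own mount,
 * mirroring how a higher layer's mount command names its neighbor. */
struct layer *mount_layer(const char *name, const char *lower_name) {
    struct layer *l = &mounts[nmounts++];
    l->name = name;
    l->lower = lower_name ? lookup(lower_name) : NULL;
    return l;
}
```

Mounting `/layer/ufs/crypt.raw` with no lower layer and then `/usr/data` over it mirrors the configuration of Figures 7 and 8.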
4.3.2 File-Level Stacking. While stacks are configured at the subtree level, most user actions take place on individual files. Files are represented by vnodes, with one vnode per layer. When a user opens a new file in a stack, a vnode is constructed to represent each layer of the stack. User actions begin in the top stack layer and are then forwarded down the stack as required. If an action requires creation of a new vnode (such as referencing a new file), then as the action proceeds down the stack, each layer will build the appropriate vnode and return its reference to the layer above. The higher layer will then store this reference in the private data of the vnode it constructs. Should a layer employ fan-out, each of its vnodes will similarly reference several lower-level vnodes.

4Mounts to the same name are currently possible only in BSD 4.4-derived systems. If each layer is separately named, standard access control mechanisms can be used to mediate access to lower layers.

Since vnode references are used both to bind layers and to access files from the rest of the kernel, no special provision needs to be made to perform operations between layers. The same operations used by the general kernel can be used between layers; layers treat all incoming operations identically.

Although the current implementation does not explicitly support stack configuration at a per-file granularity, there is nothing in the model that prohibits finer configuration control. A typed-file layer, for example, could maintain layer configuration details for each file and automatically create the proper layers when the file is opened.

4.3.3 Stack Data Caching. Individual stack layers may independently cache data pages, both for performance and to support memory-mapped files. But if each layer may independently cache data, writes to different cached copies can cause cache aliasing problems and possible data loss.5 A cache manager coordinates page caching in the UCLA interface.

Before any stack layer may cache a page, it must acquire "ownership" of that page from the cache manager. The cache manager immediately grants ownership if the page was not previously cached. If the page is cached by another layer, that layer is first called upon to flush its cache, and ownership is then transferred. This approach is analogous to callbacks or token passing in a distributed file system.

A consistent page-naming policy is required to identify pages cached by different layers. The cache manager identifies pages by a pair: [stack identifier, file offset]. Stack layers that provide semantics that violate this naming convention are required to conceal this difference from their clients. A compression layer, for example, presents different data above and below itself, since file offsets have different meanings above and below the compression layer. The compression layer must therefore explicitly coordinate the caching of the layers above and below it, and will claim to the layers above to be the bottom of another stack. The replication service presents similar problems: it must coordinate caching between its clients, itself, and its several lower layers, providing a similar stack identifier to its clients while concealing the names of the replica storage. We are currently investigating solutions in these areas.
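The ownership protocol above can be sketched as a toy cache manager. The names (`acquire`, `flush_layer`, `struct page_id`) are hypothetical; a real manager would track dirty state and write data back, which the flush counter merely stands in for.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PAGES 16

/* Pages are named by [stack identifier, file offset]. */
struct page_id { int stack; long offset; };

struct owned_page {
    struct page_id id;
    int owner;                 /* layer currently caching this page */
};

static struct owned_page pages[MAX_PAGES];
static int npages = 0;
int flushes = 0;               /* counts flush callbacks issued */

static void flush_layer(int layer, struct page_id id) {
    (void)layer; (void)id;
    flushes++;                 /* stand-in for writing back cached data */
}

/* Acquire ownership of a page for `layer`; returns the new owner.
 * Uncached pages are granted immediately; otherwise the previous
 * owner is called back to flush before ownership transfers. */
int acquire(struct page_id id, int layer) {
    for (int i = 0; i < npages; i++) {
        struct owned_page *p = &pages[i];
        if (p->id.stack == id.stack && p->id.offset == id.offset) {
            if (p->owner != layer) {
                flush_layer(p->owner, id);  /* callback to old owner */
                p->owner = layer;
            }
            return p->owner;
        }
    }
    pages[npages].id = id;       /* not cached: grant immediately */
    pages[npages].owner = layer;
    npages++;
    return layer;
}
```

Because only one layer owns a page at a time, the aliasing scenario of footnote 5 (two layers modifying different bytes of the same page) cannot silently discard a modification.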
4.4 Stacking and Extensibility

One of the most powerful features of a stackable interface is that layers can be stacked together, each adding functionality to the whole. Often, layers in the middle of a stack will modify only a few operations, passing most to the next lower layer unchanged. For example, although an encryption layer would encrypt and decrypt all data accessed by read and write requests, it may not need to modify operations for directory manipulation. Since the interlayer interface is extensible and therefore new operations may always be added, an intermediate layer must be prepared to forward arbitrary new operations. One way to pass new operations to a lower layer would be to implement, for each new operation, a routine in the middle layer that explicitly invokes the same operation in the next lower layer. This approach would require modification of all existing layers when any layer adds a new operation; layers not so modified would fail to forward the new operation automatically. The development of new layers and new operations would be discouraged, and the creation of new stacks from unmodified third-party layers would be impossible.

5Imagine modifying the first byte of a page in one stack layer and the last byte of the same page in another layer. Without some cache consistency policy, one of these modifications will be lost.
What is needed is a single bypass routine that forwards new operations to a lower level. Default routines (discussed in Section 4.2) provide the capability to have a generic routine intercept unknown operations, but the standard vnode interface provides no way to process operations in a generic manner. To handle multiple operations, a bypass routine must be able to handle the variety of arguments used by different operations. It must also be possible to identify the operation taking place and to map any vnode arguments to their lower-level counterparts.6 Neither characteristic is possible when operations are implemented as standard function calls, as in existing interfaces. And, of course, support for these characteristics must have minimal performance impact.

The UCLA interface provides these characteristics by explicitly managing operations' arguments as collections. Associated with each collection are the operation's identity, argument types, and other pertinent metadata. Explicit argument management allows arguments to be manipulated in a generic fashion and efficiently forwarded between layers, usually with pointer manipulation, and the metadata makes it possible for a simple bypass routine to forward any operation to a lower layer. By convention, we expect most file-system layers to support such a bypass routine. More importantly, these changes to the interface have minimal impact on performance. For example, passing metadata requires only one additional argument to each operation. See Section 5.1 for a detailed analysis of performance.
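A single bypass routine built on packaged arguments can be sketched as follows. This is an illustrative model, not the UCLA code; the names (`op_args`, `op_desc`, `bypass`) and the fixed four-slot argument array are assumptions for the sketch.

```c
#include <assert.h>
#include <stddef.h>

struct vnode;
struct op_args;

typedef int (*vop_t)(struct op_args *ap);

/* Per-operation metadata: identity, vector offset, and which argument
 * slot holds the vnode that must be mapped to its lower counterpart. */
struct op_desc {
    const char *name;
    int offset;
    int vnode_arg;
};

/* Every operation is invoked with an argument collection whose header
 * identifies the operation. */
struct op_args {
    struct op_desc *desc;
    void *args[4];
};

struct vnode {
    vop_t *ops;                /* this layer's operations vector */
    struct vnode *lower;       /* next layer down (NULL at bottom) */
};

/* One file system's read operation, for demonstration. */
int ufs_read(struct op_args *ap) { (void)ap; return 42; }

/* The single bypass routine: using only the metadata, swap in the
 * lower-layer vnode, invoke the same operation one layer down, then
 * restore the argument.  No operation-specific code is needed. */
int bypass(struct op_args *ap) {
    struct vnode *vp = ap->args[ap->desc->vnode_arg];
    ap->args[ap->desc->vnode_arg] = vp->lower;
    int error = vp->lower->ops[ap->desc->offset](ap);
    ap->args[ap->desc->vnode_arg] = vp;
    return error;
}
```

Because the routine consults only the descriptor, it forwards operations invented long after the layer was written, which is exactly what explicit function-call interfaces cannot do.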
4.5 Intermachine Operation

A transport layer is a stackable layer that transfers operations from one address space to another. Because vnodes for both local and remote file systems accept the same operations, they may be used interchangeably, providing network transparency. Sections 3.5 and 3.6 describe some of the ways to configure the layers this transparency allows.

Providing a bridge between address spaces presents several potential problems. Different machines might have differently configured sets of operations.
6Vnode arguments change as an operation proceeds down the stack, much as protocol headers are stripped off as network messages are processed. No other argument processing is required to bypass an operation between two layers in the same address space.

Heterogeneity can make basic data types incompatible. Finally, methods to support variable-length and dynamically allocated data structures for traditional kernel interfaces do not always generalize when crossing address-space boundaries.
For two hosts to interoperate, it must be possible to identify each desired operation unambiguously. Well-defined RPC protocols, such as NFS, ensure compatibility by providing only a fixed set of operations. Since restricting the set of operations restricts extensibility and impedes innovation, each operation in the UCLA interface is assigned a universally unique identifier when it is defined.7 Intermachine communication uses these labels to identify operations and to reject locally unknown operations.

Transparent forwarding of operations across address-space boundaries requires not only that operations be identified consistently, but also that arguments be communicated correctly in spite of machine heterogeneity. Part of the metadata associated with each operation includes a complete type description of all arguments. With this information, an RPC protocol can marshal operation arguments and results between heterogeneous machines. Thus, a transport layer may be thought of as a semantics-free RPC protocol with a stylized method of marshaling and delivering arguments.
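Metadata-driven marshaling of this kind can be sketched generically. The names (`op_type`, `arg_desc`, `marshal`) and the example UUID string are hypothetical; a real transport layer would also handle byte order, alignment, and result unmarshaling.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum arg_type { ARG_I32, ARG_BYTES };

struct arg_desc { enum arg_type type; size_t len; };

/* Complete type description carried with each operation. */
struct op_type {
    const char *uuid;            /* universally unique operation label */
    int nargs;
    struct arg_desc args[4];
};

/* Generic marshal: walk the type description, copying each argument
 * into the message buffer.  No operation-specific code is required. */
size_t marshal(const struct op_type *t, void **argv, uint8_t *buf) {
    size_t off = 0;
    for (int i = 0; i < t->nargs; i++) {
        size_t len = (t->args[i].type == ARG_I32) ? 4 : t->args[i].len;
        memcpy(buf + off, argv[i], len);
        off += len;
    }
    return off;
}
```

Because the description travels with the operation definition, a transport layer can forward an operation it has never seen, which is what makes the "semantics-free RPC" view workable.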
NFS provides a good prototype transport layer. It stacks on top of existing local file systems, using the vnode interface above and below. But NFS was not designed as a transport layer; its implementations define particular caching semantics, and its supported operations are not extensible. We extend NFS to bypass new operations automatically. We have also prototyped a cache-consistency layer, providing a separate consistency policy.

In addition to the use of an NFS-like transport layer between kernels, we employ a more efficient inter-address-space transport layer operating between user level and the kernel. Such a transport layer provides a "system-call"-level interface to the UCLA interface, allowing user-level development of file-system layers and providing user-level access to new file-system functionality. Traditional user-to-kernel calls are not extensible, so a system-call-like access to the transport layer was chosen to extend the interface to the user. The desire to make this transport layer universal places one additional restriction on the UCLA interface: the client must provide space for all returned data. This restriction has not been serious. In practice, the client can often make a good estimate of storage requirements. If the client's first guess is wrong, information is returned, allowing the client to repeat the operation correctly.
4.6 Centralized Interface Definition

Several aspects of the UCLA interface require precise information about the operations taking place. Network transparency requires a complete definition of all operation types, and support of a bypass routine requires information to map vnodes from one layer to the next. The designer of a file system employing new operations must provide this information.

7Generation of operation identifiers must support fully distributed creation and assignment; we therefore employ the NCS UUID mechanism, which bases identifiers on host identifier and time-stamp.

Detailed interface information is needed at several different places throughout the layers. Rather than require that the interface designer keep this information consistent in several different places, operation definitions are combined into an interface definition. Similar to the data-description language used by RPC packages, this description lists each operation, its arguments, and the direction of data movement. An interface "compiler" translates
this into forms convenient for automatic manipulation.

4.7 Framework Portability

The UCLA interface has proved to be quite portable. Initially implemented under SunOS 4.0.3, it has since been ported to SunOS 4.1.1. In addition, the in-kernel stacking and extensibility portions of the interface have been ported to BSD 4.4. Although BSD's namei approach to path-name translation required some change, we are largely pleased with our framework's portability. Section 5.3 discusses the portability of a system of independently derived layers.

While the UCLA interface itself has proved to be portable, portability of individual layers is somewhat more difficult. None of the vnode interface implementations described have identical sets of vnode operations, and path-name translation approaches differ considerably between SunOS and BSD.

Fortunately, several aspects of the UCLA interface provide approaches to address layer portability. Extensibility allows layers with different sets of operations to coexist. In fact, interface additions from SunOS 4.0.3 to 4.1.1 required no changes to existing layers. When interface differences are significantly greater, a compatibility layer (see Section 3.5) provides an opportunity to run layers without change. Ultimately, adoption of a standard set of core operations (as well as other system services) is required for effortless layer portability.
5. PERFORMANCE AND EXPERIENCE

While a stackable file-system design offers numerous advantages, file-system layering will not be widely accepted if layer overhead is such that a monolithic file system performs significantly better than one formed from multiple layers. To verify layering performance, overhead was evaluated from several points of view.

If stackable layering is to encourage rapid advance in filing, not only must it have good performance, but it also must facilitate file-system development. Here, we also examine this aspect of "performance," first by comparing the development of similar file systems with and without the UCLA interface, and then by examining the development of layers in the new system. Finally, compatibility problems are one of the primary barriers to the use of current filing abstractions. We conclude by describing our experiences in applying stacking to resolve filing incompatibilities.

5.1 Layer Performance

To examine the performance of the UCLA interface, we consider several classes of benchmarks. First, we examine the costs of particular parts of this interface with "microbenchmarks."
We then consider how the interface affects overall system performance by comparing a stackable-layers kernel to an unmodified kernel. Finally, we evaluate the performance of multilayer file systems by determining the overhead as the number of layers changes.

The UCLA interface measured here was implemented as a modification of SunOS 4.0.3. All timing data were collected on a Sun-3/60 with 8 Mb of RAM and two 70-Mb Maxtor XT-1085 hard disks. The measurements in Section 5.1.2 use the new interface throughout the new kernel, while those in Section 5.1.3 use it only within file systems.

5.1.1 Microbenchmarks. The new interface changes the way every file-system operation is invoked. To minimize overhead, portions of the interface must be very inexpensive. Here, we discuss two parts of the interface: the method for calling an operation, and the bypass routine.
Cost of operation
invocation
is the key to performance,
since it is an unavoidable
cost of stacking
no
matter
how layers themselves
are constructed.
To evaluate
the performance
of these portions
of the interface,
we consider
the number
of assembly-language
instructions
generated
in the implementation. Although
this statistic
is only a very rough indication
of true cost,
it provides an order-of-magnitude comparison.8
We begin by considering the cost of invoking an operation in the vnode and the UCLA interfaces. On a Sun-3 platform, the original vnode calling sequence translates into four assembly-language instructions, while the UCLA interface sequence requires six instructions.9 We view this overhead as not significant with respect to the cost of most file-system operations.

We are also interested in the cost of the bypass routine. We envision many "filter" file-system layers, each adding new abilities to the file-system stack. Examples of such layers are a compression layer and a caching layer; these pass many operations directly to the next layer down, modifying the user's actions only where needed, for example to uncompress a compressed file or to bring a remote file into the local disk cache. For such layers to be practical, the bypass routine must be inexpensive. A complete bypass routine in our design amounts to about 54 assembly-language instructions.10 About one-third of these instructions are not in the main flow, being used only for uncommon argument combinations, reducing the cost of forwarding simple vnode operations to 34 instructions. Although this cost is significantly more than a simple subroutine call, it is not significant with respect to the cost of an average file-system operation. To investigate the effects of layering further, we examine the overall performance of a multilayered system.
8Factors such as machine architecture and the choice of compiler have a significant impact on these figures. Many architectures have instructions that are significantly slower than others. We claim only a rough comparison from these statistics.

9We found a similar ratio on SPARC-based architectures, where the old calling sequence required five instructions and the new required eight. In both cases these sequences do not include the code required to pass arguments of the operation.

10These figures were produced by the Free Software Foundation's gcc compiler; Sun's C compiler bundled with SunOS 4.0.3 produced 71 instructions.
Table I. Modified Andrew Benchmark Results Running on Kernels Using the Vnode and the UCLA Interfaces^a

                 Vnode interface     UCLA interface
Phase            Time     %RSD       Time     %RSD     Percent overhead
MakeDir           3.3     14.8        3.2     16.1         -3.03
Copy             18.8      4.7       19.1      5.0          1.60
ScanDir          17.3      5.1       17.8      7.9          2.89
ReadAll          28.2      1.8       28.8      2.0          2.13
Make            327.1      0.4      328.1      0.7          0.31
Overall         394.7      0.4      396.9      0.9          0.56

^a Time values (in seconds; timer granularity = 1 s) are the means of elapsed time from 29 sample runs. %RSD is the percent relative standard deviation of elapsed time. Overhead is the percent overhead of the new interface. High relative standard deviations for MakeDir are a result of poor timer granularity.

5.1.2 Interface Performance. The second step compares a kernel with a complete implementation of the UCLA interface against a standard kernel. To do so, we consider two benchmarks: the modified Andrew benchmark [Ousterhout 1990; Howard et al. 1988] and the recursive copy and removal of large directory trees.

The Andrew benchmark has several phases, each of which examines different file-system activities. Unfortunately, the brevity of the first four phases makes accurate evaluation of their results difficult: coarse timer granularity limits accuracy, and very short run times exaggerate the effect of small variations. In addition, the long compile phase dominates overall benchmark results, and since the compile performs fewer file-system operations per unit time, it shows a smaller effect of interface overhead. Nevertheless, taken as a whole, this benchmark probably characterizes "normal use" better than a file-system-intensive benchmark such as a recursive copy/remove. Results can be seen in Table I; overhead for the first four phases averages about 2 percent.

To exercise the interface more strenuously, the second benchmark employs two phases, a recursive copy and a recursive remove. Both phases operate on large amounts of data (a 4.8-Mb /usr/include tree) to extend the duration of the benchmark and improve timing accuracy. Because we knew all overhead occurred in the kernel, we measured system time (time spent in the kernel) instead of total elapsed time. This greatly exaggerates the impact of layering compared to
Table II. Recursive Copy and Remove Benchmark Results Running on Kernels Using the Vnode and UCLA Interfaces^a

                    Vnode interface     UCLA interface
Phase               Time     %RSD       Time     %RSD     Percent overhead
Recursive copy      51.57    1.28       52.55    1.11         1.90
Recursive remove    25.26    2.50       25.41    2.80         0.59
Overall             76.83    0.87       77.96    1.11         1.47

^a Time values (in seconds; timer granularity = 0.1 s) are the means of system time from 20 sample runs. %RSD indicates the percent relative standard deviation. Overhead is the percent overhead of the new interface.
the elapsed “wall-clock”
time a user actually
experiences.
As can be seen in
Table II, system-time
overhead
averages about 1.5 percent.
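The overhead columns in Tables I and II are simple relative differences between the two kernels' times, which can be checked directly (the function name here is ours, not the paper's):

```c
#include <assert.h>
#include <math.h>

/* Percent overhead of the UCLA interface relative to the vnode
 * interface, as reported in Tables I and II. */
double pct_overhead(double vnode_time, double ucla_time) {
    return 100.0 * (ucla_time - vnode_time) / vnode_time;
}
```

For example, the Table II recursive-copy row gives 100 * (52.55 - 51.57) / 51.57, which is approximately 1.90 percent.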
5.1.3 Multiple-Layer Performance. Since the stackable-layers design philosophy advocates using several layers to implement what has traditionally been provided by a monolithic module, the cost of layer transitions must be minimal if it is to be used for serious file-system implementations. To examine the overall performance impact of a multilayer file-system stack, we analyze the performance of a file-system stack as the number of layers changes.

To perform this experiment, we began with a kernel modified to support the UCLA interface within all file systems and the vnode interface throughout the rest of the kernel.11 At the base of the stack, we placed a Berkeley Fast File System modified to use the UCLA interface. Above this layer we mounted from zero to six null layers, each of which merely forwards all operations
to the next layer of the stack. We ran the benchmarks described in the previous section upon those file-system stacks. This test is by far the worst possible case for layering, since each added layer incurs full overhead without providing any additional functionality.

Figure 9 shows the results of this study. Performance varies nearly linearly with the number of layers used. The modified Andrew benchmark shows about a 0.3 percent elapsed-time overhead per layer. Alternate benchmarks such as the recursive copy and remove phases also show less than 0.25 percent overhead per layer.
To get a better feel for the costs of layering, we also measured system time, that is, time spent in the kernel on behalf of the process. Figure 10 compares recursive copy and remove system times (the modified Andrew benchmark does not report system-time statistics). Because all overhead is in the kernel and the total time spent in the kernel is only one-tenth of elapsed time, comparisons of system time indicate a higher overhead: about 2 percent per layer for recursive copy and remove. Slightly better performance for the case of one layer in Figure 10 results from a slight caching effect of the null layer compared to the standard UFS.
11To improve portability, we desired to modify as little of the kernel as possible. Mapping between interfaces occurs automatically upon first entry of a file-system layer.
Fig. 9. Elapsed time of recursive copy/remove and modified Andrew benchmarks as layers are added to a file-system stack. Each data point is the mean of four runs. [Plot: elapsed time (s) versus number of layers (0 to 6); curves: MAB Overall, cp real, rm real.]
Fig. 10. System time of recursive copy/remove benchmarks as layers are added to a file-system stack (the modified Andrew benchmark does not provide system time). Each data point is the mean of four runs. Measuring system time alone represents the worst possible layering overhead of a do-nothing layer. [Plot: system time (s) versus number of layers (0 to 6); curves: cp sys, rm sys.]

Differences in benchmark overheads are the result of differences in the ratio between the number of vnode operations and benchmark length.

We draw two conclusions from these figures. First, elapsed-time results indicate that under normal load usage, a layered file-system architecture will be virtually undetectable. Also, system-time costs imply that during heavy file-system use a small overhead will be incurred when numerous layers are involved.

5.2 Layer Implementation Effort
5.2 Layer Implementation
Effort
An important
goal of stackable
file
job of new file-system
development.
saves a significant
amount
of time
systems
Importing
and this interface
functional
with
in new development,
is to ease the
existing
layers
but this
be compared
to the effort required
to employ stackable
layers.
sections
compare
development
with
and without
the UCLA
examine
how layering
conclude
that
5.2.1 Simple
file-system
than
development
tion?
To evaluate
implemented
that
the
was
of existing
and
layer was chosen
vnode interface,
A first
concern
would
file
without
and small
and large
would
we chose
with
large
small
process
extensibility;
complexity,
both
both
Development.
was
faces do not support
through
traditional
simplifies
Layer
layers
the
can be used for both
layering
when
Most
services.
We
developing
other
the
the
interface.
size
new
complicated
kernel
complicate
to examine
for comparison:
and the null
filing
to be more
facility
UCLA
must
three
and
projects.
prove
systems.
this
savings
The next
interface,
inter-
implementa-
of similar
layers
A simple
“pass-
the loopback file system under the
layer under the UCLA
interface.lz
We performed
this comparison
for both the SunOS 4.0.3 and the BSD 4.4
implementations,
measuring
complexity
as numbers
of lines of comment-free
C code.13
Table III compares the code length of each service in the two operating systems. Closer examination reveals that the majority of code savings occurs in the implementation of individual vnode operations. The loopback file system must explicitly forward each operation, while the null layer services most vnode operations with a bypass routine provided by the UCLA interface. In spite of a smaller implementation, the null layer is also more general; the same implementation will support the addition of future operations.

For the example of a pass-through layer, use of the UCLA interface enables improved functionality with a smaller implementation. Although the relative difference in size would be less for single layers providing multiple services, a goal of stackable layers is to provide sophisticated services through multiple, reusable layers.
12 In SunOS the null layer was augmented to reproduce the semantics of the loopback layer exactly. This was not necessary in BSD UNIX.
13 Although well-commented code might be a better comparison, the null layer was quite heavily commented for pedagogical reasons, whereas the loopback layer had only sparse comments. We chose to eliminate this variable.
Table III. Number of Lines of Comment-Free Code Needed to Implement a Pass-Through Layer or File System in SunOS 4.0.3 and BSD 4.4

                  SunOS              BSD
  Loopback-fs     743 lines          1046 lines
  Null layer      632 lines          578 lines
  Difference      111 lines (15%)    468 lines (45%)

This goal requires that minimal layers be as simple as possible. We are currently pursuing this generality further, centralizing null-derived layer code.
5.2.2 Layer Development Experience. To gain perspective on layer development by other parties, we invited students in a graduate class at UCLA to design and develop new layers. Five groups of one or two students each were provided with a null layer and a development environment. All were proficient programmers, but their prior experience ranged from little to considerable, and none was familiar with kernel development or the vnode interface.

All projects succeeded in providing functioning prototype layers. Prototypes included a file-versioning layer, an encryption layer, a compression layer, a prototype of second-class replication, and a consistency layer developed as an optional enhancement to a user-level NFS service. Each was designed to stack over a standard UFS layer. Self-estimates of development time ranged from 40 to 60 person-hours; this figure includes time to become familiar with the environment, as well as layer design and implementation.

Review of the development of these layers suggested three primary contributions of stacking to this experiment. First, required filing services were largely provided by relying on a lower layer, making detailed understanding of these services unnecessary. Second, by beginning with a null layer, development focused on the new service being provided rather than on peripheral framework issues. Finally, the out-of-kernel implementation environment provided a convenient, familiar development platform compared to traditional kernel development.

We consider this experience a promising indication of the ease of development offered by stackable layers. Previously, new-file-system functionality required in-kernel modification of multi-thousand-line file systems, requiring knowledge of current file systems, low-level kernel debugging tools, and programming methodology. With stackable layers, students in the class were able to investigate significant new filing capabilities with only knowledge of the stackable interface.
5.2.3 Large-Scale Example. The previous section has discussed our experiences in stackable development of several prototype layers. This section concludes with the results of developing Ficus, a replicated file system suitable for daily use, over its three-year existence. Ficus is a "real" system, in terms of both its size and use. It is comparable in code size to other production file systems (12,000 lines of comment-free code for Ficus, compared to 7,000–8,000 lines for NFS or UFS), and it has seen extensive development. Its developers' computing environment (including Ficus development) is completely supported in Ficus, and it is now in use at various sites in the United States.
Stacking has been a part of Ficus from its very early development. Ficus has provided both a fertile source of layered development techniques, and a proving ground for what works and what does not work. Ficus makes good use of stackable concepts such as extensibility, cooperating layers, an extensible transport layer, and out-of-kernel development. Extensibility is widely used in Ficus to provide replication-specific operations. The concept of cooperating layers is fundamental to the Ficus architecture, where some services must be provided "close" to the user, whereas others must be close to data storage. Between the Ficus layers, the optional transport layer has provided easy access to any replica, leveraging location transparency as well. Finally, the out-of-kernel debugging environment has proved particularly important in early development, saving significant development time.

As a full-scale example of the use of stackable layering and the UCLA interface, Ficus illustrates the success of these tools for file-system development. Layered file systems can be robust enough for daily use, and the development process is suitable for long-term projects.
5.3 Compatibility Experiences

Extensibility and layering are powerful tools to address compatibility problems. Section 3.5 discusses several approaches to employ these tools; here we consider how effective these tools have proved to be in practice. Our experiences primarily concern the use and evolution of the Ficus layers, supporting different versions of NFS, the user-id mapping and null layers, and the stack-enabled UFS.

Extensibility has proved quite effective to address "third-party"-style change. The file-system layers developed at UCLA evolve independently of each other and of standard filing services. Operations are frequently added to the Ficus layers with minimal consequences on the other layers. We have encountered some cache-consistency problems resulting from extensibility and our transport layer. We are currently implementing cache-coherence protocols as discussed in Section 4.3.3 to address this issue. Without extensibility, each interface change would require changes to all other layers, greatly slowing progress.

We have had mixed experiences with portability between different operating systems. On the positive side, Ficus is currently accessible from PCs running MS-DOS.
Fig. 11. Access to Ficus from a PC running MS-DOS. The UNIX-based shrinkfid layer bridges minor internal interface differences; the NFS interface addresses operating-system differences. (Figure: a PC NFS client under MS-DOS communicates with a UNIX host whose stack contains an NFS server, shrinkfid, Ficus logical, Ficus physical, and UFS layers.)

A PC running MS-DOS communicates with a UNIX host running Ficus (see Figure 11). The PC runs an NFS implementation. Ficus requires more information to identify files than will fit in an NFS file identifier, so we employ an additional "shrinkfid" layer to map over this difference.
Actual portability of layers between the SunOS and BSD stacking implementations is more difficult. Each operating system has a radically different set of core vnode operations and related services. For this reason and because of licensing restrictions, we chose to reimplement the null and user-id mapping layers for the BSD port. Although we expect that a compatibility layer could mask interface differences, long-term interoperability requires not only a consistent stacking framework, but also a common set of core operations and related operating-system services.
Finally, we have been successful employing simple compatibility layers to map over minor interface differences. The shrinkfid and umap layers each correct deficiencies in interface or administrative configuration. We have also constructed a simple layer that passes additional state information (such as file closes) through extensible NFS as new operations.
6. RELATED WORK
Stackable filing environments build upon several bodies of existing work. UNIX shell programming, Streams, and the x-kernel present examples of stackable development, primarily applied to network protocols and terminal processing.
There is also a significant relationship between stacking and object-oriented design. Sun's vnode interface provides a basis for modular file systems. Finally, Rosenthal has presented a prototype stackable filing interface descended from these examples. We consider each of these in turn.

6.1 Other Stackable Systems
The key characteristics of a stackable file system are its symmetric interface and a flexible method of joining these layers. UNIX shell programming provides an early example of combining independently developed modules with a syntactically identical interface [Pike and Kernighan 1984].

Ritchie [1984] applied these principles to one kernel subsystem with the Streams device I/O system. Ritchie's system constructed terminal and network protocols by composing stackable modules that may be added and removed during operation. Ritchie concluded that Streams significantly reduce complexity and improve maintainability of this portion of the kernel. Since their development, Streams have been widely adopted.
The x-kernel is an operating-system nucleus designed to simplify protocol implementation [Hutchinson et al. 1989]. Key features are a uniform protocol interface, allowing arbitrary protocol composition; run-time choice of protocol stacks, allowing selection of protocols based on efficiency; and very inexpensive layer transition. By implementing all protocols as stackable layers, the x-kernel demonstrates the effectiveness of layering in the network environment, and that performance in new protocol development need not suffer.

Shell programming, Streams, and the x-kernel are all important examples of stackable environments.
They differ
from our work in stackable
file systems primarily
in the richness
of their
services
and in the level of performance demands.
The pipe mechanism provides only a simple byte-stream of data, leaving it to the application to impose structure. Both Streams and the x-kernel also place very few constraints or requirements on their interface, effectively annotating message streams with control information. A stackable file system, on the other hand, must provide the complete suite of expected filing operations under reasonably extreme performance requirements.
Caching of persistent data is another major difference between Streams-like approaches and stackable file systems. File systems store persistent data that may be repeatedly accessed, making caching of frequently accessed data both possible and necessary. Because of the performance differences between cached and noncached data, file caching is mandatory in production systems. Network protocols operate strictly with transient data, and so caching issues need not be addressed.

6.2 Object-Orientation and Stacking
Strong parallels exist between object-oriented design and stacking. Object-oriented design is frequently characterized by strong data encapsulation, late binding, and inheritance. Each of these techniques has a counterpart in stacking. Strong data encapsulation is required; without encapsulation one cannot manipulate layers as black boxes. Late binding is analogous to run-time stack configuration. Inheritance in an object-oriented system parallels a bypass routine; operations inherited by a layer are passed through the stack to the implementing layer.
Stacking differs from object-oriented design in two broad areas. First, object-orientation is often associated with a particular programming language. Such languages are typically general purpose, while stackable filing design can instead be tuned for much more specific requirements. For example, object-oriented languages usually employ similar mechanisms (compilers and linkers) to define a new class of objects and to instantiate individual objects. In a stackable filing environment, however, far more people will configure (instantiate) stacks than will define new layers. As a result, special tools exist to simplify this process.

A second difference concerns inheritance. Simple stackable layers can easily be described in object-oriented terms; a bypass routine, for example, would be described as inheritance. The compression stack of Figure 3 can be thought of as a compression subclass of normal files; similarly, a remote-access layer could be described as a subclass of "files." But with stacking it is not uncommon to employ multiple remote-access layers. It is less clear how to express this characteristic in traditional object-oriented terms.
6.3 Modular File Systems

Sun's vnode interface [Kleiman 1986] has been used to provide basic file-system modularity and stacking. We build upon its abstractions to provide stackable filing. Section 2 compares the standard vnode interface with our approach.

The standard vnode interface has served as a foundation for stackable work. Sun's loopback and translucent file systems [Hendricks 1990] and early versions of the Ficus file system were all built with a standard vnode interface. These implementations highlight the primary differences between the standard vnode interface and our stackable environment; with support for extensibility and explicit support for stacking, the UCLA interface is significantly easier to employ (see Section 5.2.1).
6.4 Rosenthal's Stackable Interface

Rosenthal [1990] also recently explored stackable filing. Although conceptually similar to our work, the approaches differ with regard to stack configuration, stack view consistency, and extensibility.

Stack configuration in Rosenthal's model is accomplished by two new operations, push and pop. Stacks are configured on a file-by-file basis with these operations, unlike our subtree-granularity configuration. Per-file configuration allows additional configuration flexibility, since arbitrary files can be independently configured. However, this flexibility complicates the task of maintaining this information; it is not clear how current tools can be applied to this problem. A second concern is that these new operations are specialized
for the construction of linear stacks. Push and pop do not support more general stack layers with fan-in and fan-out.

Rosenthal's stack model requires that all users see an identical view of the stack; dynamic changes of the stack by one client will be perceived by all other clients. As a result, it is possible to push a new layer on an existing stack and to have all clients immediately begin using the new layer. In principle, one might dynamically add and remove a measurements layer during file use. Mounts can also be employed to implement this facility, with the new layer pushed over the mount point. However, it is not clear that this facility is widely needed. A client will typically expect the semantic content of a file to remain unchanged during use: if a compression layer were used to write the file, a client needs the corresponding decompression layer to read the data. This suggests that global dynamic stack change may not be necessary and, to the extent that it adds complexity and overhead, may be undesirable.
In addition, ensuring that all stack clients agree on stack construction has a number of drawbacks. As discussed in Section 3.3, access to different layers is often useful for special tasks such as backup, debugging, and remote access. Such diverse stack access could be prohibited if only a common stack view were allowed. Ensuring a common stack top also requires very careful locking in a multiprocessor implementation, at some performance cost. Since the UCLA interface does not enforce atomic stack configuration, it does not share this overhead.

The most significant problem with Rosenthal's method of dynamic stacking is that for many stacks there is no well-defined notion of "top-of-stack." Stacks with fan-in have multiple stack tops; encryption, requiring fan-in with multiple stack "views" (see Section 3.3), is one such service. Rosenthal's model does not make sense for stacks with multiple tops. Furthermore, with a transport layer, one stack client could be in another address space, making it impossible to keep a top-of-stack pointer for all stack clients. For all of these reasons, our stack model explicitly permits different clients to access the stack at different layers.14
A final difference between Rosenthal's vnode interface and the UCLA interface concerns extensibility. Rosenthal discussed the use of versioning layers to map between different interfaces. While versioning layers work well to map between pairs of layers with conflicting semantics, the number of mappings required grows exponentially with the number of changes, making this approach unsuitable for wide-scale, third-party change. A more general solution to extensibility is preferable, in our view.
14 While Rosenthal's model can be extended to support nonlinear stacking [UNIX International Stackable Files Working Group 1992], the result is, in effect, two different "stacking" methods.

7. CONCLUSION

The focus of this work has been to improve the file-system development process; stacking approaches this problem in several ways. Stacking provides a framework allowing reuse of existing filing services. Higher-level services can be built by leveraging the body of existing file systems; improved low-level facilities can immediately have a wide-reaching impact by replacing existing lower layers of a sophisticated stack. Formal mechanisms for extensibility provide a consistent approach to export new services. When these facilities are provided in an address-space-independent manner, this framework enables a number of new development approaches.

Widespread adoption of a framework such as that described in this paper will permit independent development of filing services by many parties, while individual developers can benefit from the ability to leverage others' work and move forward quickly and independently. By opening this field, previously largely restricted to major operating-systems vendors, it is hoped that the industry as a whole can progress more rapidly.
ACKNOWLEDGMENTS

The authors thank Tom Page and Richard Guy for their many discussions regarding stackable file systems. We would also particularly like to thank Kirk McKusick for working with us to bring stacking to BSD 4.4. We would like to acknowledge Dieter Rothmeier, Wai Mak, David Ratner, Jeff Weidner, Peter Reiher, Steven Stovall, and Greg Skinner for their contributions to the Ficus file system, and Yu Guang Wu and Jan-Simon Pendry for their contributions to the null layer. Finally, we wish to thank the reviewers for their many helpful suggestions.
REFERENCES

GUY, R. G., HEIDEMANN, J. S., MAK, W., PAGE, T. W., JR., POPEK, G. J., AND ROTHMEIER, D. 1990. Implementation of the Ficus replicated file system. In USENIX Conference Proceedings (June). USENIX, Berkeley, Calif., 63-71.
HENDRICKS, D. 1990. A filesystem for software development. In USENIX Conference Proceedings (June). USENIX, Berkeley, Calif., 333-340.
HOWARD, J., KAZAR, M., MENEES, S., NICHOLS, D., SATYANARAYANAN, M., SIDEBOTHAM, R., AND WEST, M. 1988. Scale and performance in a distributed file system. ACM Trans. Comput. Syst. 6, 1 (Feb.), 51-81.
HUTCHINSON, N. C., PETERSON, L. L., ABBOTT, M. B., AND O'MALLEY, S. 1989. RPC in the x-kernel: Evaluating new design techniques. In Proceedings of the 12th Symposium on Operating Systems Principles (Dec.). ACM, New York, 91-101.
KARELS, M. J., AND MCKUSICK, M. K. 1986. Toward a compatible filesystem interface. In Proceedings of the European Unix User's Group (Sept.). EUUG, Buntingford, England, 15.
KLEIMAN, S. R. 1986. Vnodes: An architecture for multiple file system types in Sun UNIX. In USENIX Conference Proceedings (June). USENIX, Berkeley, Calif., 238-247.
OUSTERHOUT, J. K. 1990. Why aren't operating systems getting faster as fast as hardware? In USENIX Conference Proceedings (June). USENIX, Berkeley, Calif., 247-256.
PIKE, R., AND KERNIGHAN, B. 1984. Program design in the UNIX environment. AT&T Bell Lab. Tech. J. 63, 8 (Oct.), 1595-1605.
RITCHIE, D. M. 1984. A stream input-output system. AT&T Bell Lab. Tech. J. 63, 8 (Oct.), 1897-1910.
RODRIGUEZ, R., KOEHLER, M., AND HYDE, R. 1986. The generic file system. In USENIX Conference Proceedings (June). USENIX, Berkeley, Calif., 260-269.
ROSENTHAL, D. S. H. 1990. Evolving the vnode interface. In USENIX Conference Proceedings (June). USENIX, Berkeley, Calif., 107-118.
SANDBERG, R., GOLDBERG, D., KLEIMAN, S., WALSH, D., AND LYON, B. 1985. Design and implementation of the Sun Network File System. In USENIX Conference Proceedings (June). USENIX, Berkeley, Calif., 119-130.
STEERE, D. C., KISTLER, J. J., AND SATYANARAYANAN, M. 1990. Efficient user-level file cache management on the Sun vnode interface. In USENIX Conference Proceedings (June). USENIX, Berkeley, Calif., 325-332.
UNIX INTERNATIONAL STACKABLE FILES WORKING GROUP. 1992. Requirements for stackable files. Internal memo, UNIX International, Parsippany, N.J.
WEBBER, N. 1993. Operating system support for portable filesystem extensions. In USENIX Conference Proceedings (Jan.). USENIX, Berkeley, Calif., 219-228.

Received August 1992; revised April 1993; accepted June 1993