Chapter 6: File management
file => persistent
disk = block-oriented device
a program is also a file
a file is the only way to get data from disk and to put data on disk
the OS cannot interpret files (with a few exceptions: the OS itself does need certain files and is
then an app that uses files)
fig2: the minimal kernel contains no file management => if needed: implement it as a service at a
level above the kernel
concurrency control => locking service: for exclusive access (e.g. to write)
6.2 An overview of filing system functions
Storage service: clients do not need to know about physical characteristics of disks, or where files
have been stored on them
directory service: clients can give convenient text names to files and, by grouping them in directories,
show the relationships between them
fig: typical interaction between a file service and a client
(file directory service: checks whether the file still exists, whether
the user is allowed to open it and how the file is known in the
system (name))
a file is created with some text name
(1) the client calls an operation such as open-file with the text
name as an argument; the directory service carries out an access
check to ensure the client is authorized; the directory service is
responsible for translating the text name into a form (a system file identifier, SFID)
which enables the file storage service to locate the file on disk
(2) filing system is ready for client to use the file; it will have set up information about the file in its
tables in main memory. It returns a user-file-identifier (UFID) for the client to use in subsequent
requests to read or write the file => (3). (4) the storage service returns the portion of the file that was
requested at (3)
files can be shared => potential problem: concurrent requests for access to the same file
concurrency control => many clients can safely read a file at the same time but only one should be
allowed to have write access => a file is locked for reading or writing
locking service: a client can request a shared lock or an exclusive lock on a file and be told whether
locks have already been taken out and by whom
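The shared/exclusive rule above can be sketched as a tiny in-memory service. This is a minimal sketch, assuming whole-file locks keyed by an integer file identifier; `LockService`, `request` and `release` are hypothetical names, not an interface from the text.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class FileLocks:
    shared: set = field(default_factory=set)   # clients holding a shared (read) lock
    exclusive: Optional[str] = None            # client holding the exclusive (write) lock

class LockService:
    """Hypothetical whole-file locking service keyed by file identifier."""
    def __init__(self):
        self._locks: dict[int, FileLocks] = {}

    def request(self, file_id: int, client: str, exclusive: bool) -> tuple[bool, list[str]]:
        """Try to take a lock. Returns (granted, holders), where holders lists
        the clients that already held locks on the file before this request."""
        locks = self._locks.setdefault(file_id, FileLocks())
        holders = sorted(locks.shared | ({locks.exclusive} if locks.exclusive else set()))
        if exclusive:
            # a writer needs the file to be completely unlocked
            granted = locks.exclusive is None and not locks.shared
            if granted:
                locks.exclusive = client
        else:
            # readers may share with each other, but not with a writer
            granted = locks.exclusive is None
            if granted:
                locks.shared.add(client)
        return granted, holders

    def release(self, file_id: int, client: str) -> None:
        locks = self._locks.get(file_id)
        if locks is not None:
            locks.shared.discard(client)
            if locks.exclusive == client:
                locks.exclusive = None
```

A refused request still reports who holds the file, so the client can decide whether to wait or give up.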
6.3 File and directory structure
(directories in upper case, files (pathnames) in lower
case. Windows: one hierarchy per device;
unix: mounting => 1 large hierarchy made of all
the hierarchies of all devices
every box in the hierarchy is itself a file)
filing system must be able to identify each file uniquely
in the filing system => associate identifier with a given
file = system-file-identifier, SFID
6.3.1 Pathnames and working directories
hierarchical system: files and directories are named relative to the top-level root directory => full
name of each file and directory is pathname starting from the root
current working directory: names can be relative to this as well as full pathnames
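Full versus relative pathnames can be illustrated with a toy tree. This is a sketch under assumptions: a directory is modelled as a Python dict mapping component names to a sub-directory (dict) or to a file's SFID (int); `ROOT`, `resolve` and the sample names are hypothetical.

```python
from __future__ import annotations

# Hypothetical directory tree: dict = directory, int = SFID of a file.
ROOT = {
    "usr": {"bin": {"ls": 17}},
    "home": {"ann": {"notes.txt": 42}},
}

def resolve(path: str, cwd: list[str]) -> int:
    """Return the SFID named by path: a full pathname if it starts with '/',
    otherwise a name relative to the current working directory cwd
    (given as the list of components leading from the root)."""
    if path.startswith("/"):
        node, parts = ROOT, path.split("/")[1:]
    else:
        node = ROOT
        for name in cwd:              # walk from the root to the working directory
            node = node[name]
        parts = path.split("/")
    for name in parts:                # look up each component in turn
        if not isinstance(node, dict):
            raise NotADirectoryError(name)
        if name not in node:
            raise FileNotFoundError(name)
        node = node[name]
    if isinstance(node, dict):
        raise IsADirectoryError(path)
    return node
```

For example, with the working directory `home/ann`, the relative name `notes.txt` and the full pathname `/home/ann/notes.txt` name the same file.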
6.3.2 File Sharing: Access rights and links
alternative way to support sharing: allow new directory entries to be set up to point to existing
objects => links => an authorized sharer can give a new name to the object instead of remembering
the owner’s pathname
6.3.3 Existence control
an object should be kept in existence while there is a valid
pathname for it
request to delete an object => requester’s directory entry for
that object will be removed, but not necessarily the stored
object itself; if the last directory entry has been removed, the
file is inaccessible and has become garbage (figure (a)) =>
existence control
garbage collection: all objects that can be accessed from the
root of the filing system are marked during garbage
collection and unmarked objects can be deleted
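The mark-and-sweep idea can be sketched over a table of stored objects. This assumes each object is identified by an SFID and lists the SFIDs it names (empty for a plain file); `collect_garbage` is a hypothetical name.

```python
from __future__ import annotations

def collect_garbage(objects: dict[int, list[int]], root: int) -> set[int]:
    """Mark every object reachable from the root, then sweep (delete) the
    unmarked ones. Returns the set of deleted SFIDs."""
    marked = set()
    stack = [root]
    while stack:                          # mark phase: depth-first from the root
        sfid = stack.pop()
        if sfid in marked:
            continue                      # already visited (shared links, cycles)
        marked.add(sfid)
        stack.extend(objects[sfid])
    garbage = set(objects) - marked       # sweep phase: everything left unmarked
    for sfid in garbage:
        del objects[sfid]
    return garbage
```

Objects 5 and 6 in the test below refer to each other but cannot be reached from the root; marking still reclaims them, which a pure reference count would not.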
(the dashed lines are the links that have been set up
a file that has multiple links must not be deleted until its
reference count <= 1; links only between directory and file,
not between directories => otherwise the reference count becomes a mess (e.g. a cycle of directories
is never freed): confusing, complicated)
6.4 The filing system interface
Operations available to clients of the filing system:
(asymmetry: close file is handled by the storage service, whereas open file is handled by the directory service (a fairly complex operation, many
6.4.1 The directory service as type manager
example of directory with associated interface
operations. Creating a file or directory includes
making an entry for it in a superior directory as well
as allocating storage for it. Deleting a file or
directory includes removing a directory entry
(SFID numbers are crucial for the storage service
information about a file's location cannot be put in the file itself,
nor in the directory => it would be kept twice, and you must make sure everything is consistent at every moment
=> new structure: metadata table; the metadata table also has a fixed place on disk)
6.5 The filing system implementation
Filing systems must keep information on each file or directory
fig: typical information block
different filing systems may store this information in
different ways; if the information was kept with the
directory entry => would make directories large +
cause information to be replicated when links were
set up. The information is needed to locate the file or
directory on disk, so it cannot be stored with the file
use a table where each entry is a block of
information on a file or directory => metadata table
fig2: makes it clear that the files, directories and
metadata table recording information on them are all
stored permanently on disk, showing how the directory
service can build on the abstractions presented by the
storage service
fig1: metadata table
fig2: outlines data structures, typical of those used within filing systems that are held in main
memory when a file is in use. The case where two users have opened the same file is illustrated. User
A has UFID 3 allocated for the file and user B has UFID 5. The file has a single SFID and a single
metadata table entry. Two main memory filing system data structures are shown. The system open
file table has an entry for each sharer of the file. This allows for concurrent sharers of a file to have
their own position pointer into it for reading and writing. The active file information table contains
entries with information similar to that held in the metadata table.
fig3 expands on this. Additional information that is likely
to be held is the SFID and concurrency control
information, ie, whether the file is open for reading or
writing and by how many readers and/or writers
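The main-memory tables described above can be sketched with two users sharing one file. This is a sketch under assumptions: stored contents stand in for the disk, only the size is copied from the metadata entry, there is no close operation, and `FilingSystem`, `ActiveFile` and `OpenFile` are hypothetical names.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class ActiveFile:
    """Active file information table entry: one per open file (per SFID)."""
    sfid: int
    size: int          # copied from the file's metadata table entry
    readers: int = 0   # concurrency control information
    writers: int = 0

@dataclass
class OpenFile:
    """System open file table entry: one per sharer, with its own position."""
    active: ActiveFile
    mode: str          # "r" or "w"
    position: int = 0

class FilingSystem:
    def __init__(self, stored: dict[int, bytes]):
        self.stored = stored                          # SFID -> contents (stands in for disk)
        self.active: dict[int, ActiveFile] = {}       # SFID -> active file entry
        self.open_files: list[OpenFile] = []          # system open file table
        self.ufids: dict[tuple[str, int], int] = {}   # (user, UFID) -> open file table index

    def open(self, user: str, sfid: int, mode: str) -> int:
        act = self.active.get(sfid)
        if act is None:                               # first opener: load metadata
            act = ActiveFile(sfid, len(self.stored[sfid]))
            self.active[sfid] = act
        if mode == "w":
            act.writers += 1
        else:
            act.readers += 1
        self.open_files.append(OpenFile(act, mode))
        # next UFID for this user (valid because this sketch never closes files)
        ufid = sum(1 for (u, _) in self.ufids if u == user)
        self.ufids[(user, ufid)] = len(self.open_files) - 1
        return ufid

    def read(self, user: str, ufid: int, count: int) -> bytes:
        entry = self.open_files[self.ufids[(user, ufid)]]
        start = entry.position
        data = self.stored[entry.active.sfid][start:start + count]
        entry.position += len(data)                   # advance this sharer's pointer only
        return data
```

Both users end up with one active entry for the SFID but separate open file table entries, so their reads do not disturb each other's position. UFIDs are per user, so the numbers may coincide (the figure's 3 and 5 simply reflect other files already open).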
(fig 3: the top 2 lines have been added
the SFID no longer has a fixed position => so it must be stored in the entry, because otherwise we would not even know which
file it concerns
fig2: the buffer/cache gives a performance advantage
(improves response time); disadvantage: if the system fails, you lose the buffered updates
solutions: (1) do not simply buffer critical data but write it to disk, (2) use
NVRAM for the buffer => if the system then fails, everything that was in NVRAM is still there)
fig2 also shows buffer area for disk blocks; likely that this buffer area is also used as a cache. Disk
access is slow and the system will aim to satisfy as many read requests as possible from the cache.
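A block cache that tries to satisfy reads from memory can be sketched with least-recently-used replacement. LRU is an assumed policy here (the text only says the buffer acts as a cache), `BlockCache` is a hypothetical name, and writes are not modelled.

```python
from collections import OrderedDict

class BlockCache:
    """Hypothetical LRU cache of disk blocks held in main memory."""
    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk                 # block number -> bytes (stands in for the device)
        self.blocks = OrderedDict()      # cached blocks, least recently used first
        self.disk_reads = 0

    def read(self, block_no):
        if block_no in self.blocks:
            self.blocks.move_to_end(block_no)    # hit: now most recently used
            return self.blocks[block_no]
        self.disk_reads += 1                     # miss: slow disk access
        data = self.disk[block_no]
        self.blocks[block_no] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)      # evict the least recently used block
        return data
```

With room for two blocks, the request sequence 1, 2, 1, 3, 1 needs only three disk reads.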
6.5.1 Hard and symbolic links
link operation requests a new directory entry to be made for an existing file or directory
if access is allowed, new name is added with SFID of the existing file => hard link
symbolic link: involves entering the new name with a pathname for the existing object, instead of its SFID
(#172 is the index of the entry (the block at the end of the arrow) in the metadata table. With
hard links, changes to a file are fully transparent to the other user
e.g. if you delete the file in AX, it is still present in BX with hard links, whereas with soft links you are left in BX with
a pathname that is no longer valid; but soft links can be used in distributed
systems => you can refer to files stored on another system (another computer); with hard
links you can only refer to files in your own system)
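The AX/BX example can be sketched with a flat namespace. This assumes a hard entry stores an SFID and a symbolic entry stores a pathname; `entries`, `files` and `lookup` are hypothetical names, and 172 just echoes the figure's index.

```python
# Hypothetical flat namespace: a hard entry stores an SFID, a symbolic
# entry stores the pathname of another entry.
files = {172: b"report"}                 # SFID -> contents
entries = {
    "A/x": ("hard", 172),                # owner's original name
    "B/hard": ("hard", 172),             # hard link: a second name with the same SFID
    "B/soft": ("symbolic", "A/x"),       # symbolic link: stores a pathname
}

def lookup(name, depth=0):
    """Return the contents named by `name`, following symbolic links."""
    if depth > 8:                        # guard against symbolic-link loops
        raise OSError("too many levels of symbolic links")
    if name not in entries:
        raise FileNotFoundError(name)
    kind, value = entries[name]
    if kind == "hard":
        return files[value]              # go straight to the stored object
    return lookup(value, depth + 1)      # symbolic: resolve the stored pathname
```

After `del entries["A/x"]`, `lookup("B/hard")` still finds the file, while `lookup("B/soft")` fails because its stored pathname no longer resolves.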
6.5.2 Locating a file on disk
chaining in the media
metadata contains a pointer to first block of the file.
Each block starts with a pointer to the next block
the last block is indicated by some distinguished
pointer value. The free blocks may also be chained.
Acceptable for sequential access but bad for random access.
(random access is not possible, processing always remains sequential
=> I must always read in all the blocks,
even if I only want to modify the last block)
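Chaining in the media can be sketched to show why random access is costly. This assumes each disk block is a `(next_pointer, data)` pair and uses `END = -1` as the distinguished end value; `disk` and `read_block` are hypothetical names.

```python
END = -1   # hypothetical distinguished "no next block" pointer value

# Each disk block is (next_block_pointer, data); the metadata entry holds
# only the number of the file's first block (4 here).
disk = {
    4: (9, b"AAAA"),
    9: (2, b"BBBB"),
    2: (END, b"CC"),
}

def read_block(first, index):
    """Return (data of the index-th block of a chained file, number of disk
    blocks that had to be read to reach it)."""
    block, reads = first, 0
    while True:
        next_block, data = disk[block]    # one disk read per block visited
        reads += 1
        if index == 0:
            return data, reads
        if next_block == END:
            raise IndexError("past end of file")
        block, index = next_block, index - 1
```

Reaching the third block of the file starting at block 4 costs three disk reads, because each pointer is only known after its block has been read.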
chaining in a map
map of the file store is held in memory. It mirrors the
block structure of the disk and contains only pointers.
Disk blocks now contain only file information. Problems
arise from the size of the map
(the blocks of a disk (disk blocks) are also recorded compactly
in a table (map in memory)
the table contains an entry per block => it refers to the block on
disk; the entry also refers to the next entry, which
in turn refers to a block
=> the pointer to the next block is now kept in a separate table in main memory instead of in the block itself)
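Chaining in a map can be sketched by moving the next-pointers out of the blocks into an in-memory array. The 16-block disk, `END = -1`, and the names `block_map`, `write_chain` and `block_number` are assumptions.

```python
END = -1  # hypothetical end-of-chain marker

# The map mirrors the disk's block structure: entry i holds the number of
# the block that follows block i in its file. Disk blocks hold only data.
block_map = [END] * 16
disk = [b""] * 16

def write_chain(blocks, chunks):
    """Store a file's data in the given blocks and chain them in the map."""
    for block, chunk in zip(blocks, chunks):
        disk[block] = chunk
    for cur, nxt in zip(blocks, blocks[1:]):
        block_map[cur] = nxt
    block_map[blocks[-1]] = END

def block_number(first, index):
    """Follow the chain in memory (no disk reads) to find the disk block
    holding the index-th block of the file."""
    block = first
    for _ in range(index):
        block = block_map[block]
        if block == END:
            raise IndexError("past end of file")
    return block
```

Only the final block is fetched from disk. The cost is the map size: with one 4-byte entry per block, a disk of 2^20 blocks already needs a 4 MiB map.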
table of pointers
basic approach: keep a table of pointers for each file to
its blocks. Problem: such a table is of variable length and
becomes very large. Using a variable-length table clearly
conflicts with holding metadata as a series of fixed-length
records. A hierarchy of tables of pointers can be used to
solve this.
(pointer to a block that points to all the actual
blocks of a file => that block can in turn point to another block
(and so on), so that more actual blocks can
be referenced)
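A hierarchy of pointer tables keeps the metadata entry fixed-length. The layout below is an assumed UNIX-style one (10 direct pointers, then one single-indirect and one double-indirect table, 256 pointers per index block); `locate` is a hypothetical name.

```python
N_DIRECT = 10          # assumed number of direct pointers in a metadata entry
PTRS_PER_BLOCK = 256   # assumed pointers per index block (e.g. 1 KiB block / 4-byte pointer)

def locate(index):
    """Say how logical block `index` of a file is reached: which level of the
    table hierarchy is used, and the slot taken in each table on the way."""
    if index < N_DIRECT:
        return ("direct", [index])
    index -= N_DIRECT
    if index < PTRS_PER_BLOCK:
        return ("single indirect", [index])
    index -= PTRS_PER_BLOCK
    if index < PTRS_PER_BLOCK ** 2:
        return ("double indirect", [index // PTRS_PER_BLOCK, index % PTRS_PER_BLOCK])
    raise ValueError("file too large for this layout")

MAX_BLOCKS = N_DIRECT + PTRS_PER_BLOCK + PTRS_PER_BLOCK ** 2
```

With these assumed numbers a file can have at most 10 + 256 + 256² = 65,802 blocks, while small files need no index blocks at all.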
extent lists
extent = contiguous cluster of disk blocks
contiguous storage allows for efficient access and is
especially suitable for large files and continuous media
such as voice and video. An extent table is held per
process with each entry recording the start block and
number of blocks in the extent
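An extent table lookup can be sketched with one `(start_block, length)` pair per extent, in file order. The extent values and the name `block_for` are hypothetical.

```python
# Hypothetical extent table: (start_block, number_of_blocks) in file order.
extents = [(100, 8), (340, 4), (52, 16)]

def block_for(extents, index):
    """Map the index-th block of the file to a disk block number."""
    for start, length in extents:
        if index < length:
            return start + index      # contiguous within an extent
        index -= length
    raise IndexError("past end of file")
```

The lookup cost grows with the number of extents rather than the number of blocks, so a large file held in a few long extents stays cheap to map.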
6.5.3 Storing new media types
continuous media types: special requirements
=> quality of service (QoS) requirement: sustain high data rate and deliver data on regular basis
major difficulty: requirements for such streams arrive dynamically and there is a limit to the number
of simultaneous streams that can be supported
we can observe the following for a specialized integrated storage service:
* storage allocation should not be in terms of a fixed block size. Extent-based allocation allows for
both small conventional files and large continuous media files
* it should be allowed to pass blocks on at the agreed rate then reuse the buffer space rather than
risk being held up by lack of a handshake from the receiving component
* copying of huge files should be avoided. It should therefore be possible for one file to point to a
selected portion of another. Requirement to collect garbage: detect when there are no longer any
pointers to a file
* naïve mapping of a huge file into the virtual address space of a process is inappropriate
* if such a file visits the file system’s user-end cache, it should be possible to specify that access is
sequential; ie, the portion used most recently is not likely to be used again => least recently used
(LRU) algorithm is inappropriate
6.6 Modern file system design
6.6.1 Logical volume mgmt
logical volume: a file system is implemented over such a volume which, in turn, draws space from
partitions of physical disks. Each partition comprises an extent of disk blocks. Rudimentary
partitioning schemes allow a single disk to be divided into a series of sections, each of which holds a
single logical volume and, in turn, a separate file system
6.6.2 Striping and mirroring
striped logical volume = one that is built over a
number of partitions, placing the blocks of the
logical volume on each partition in turn. The
partitions are located on different disks
(a) volume striped over three partitions. If we
assume that a file is being read sequentially then
this organization means that groups of blocks
will be read from each disk. This can deliver an
aggregate throughput of three times that of a
single disk
although a striped volume is stored across a
number of disks it does not provide additional
resilience to disk failures: if any constituent disk becomes unavailable, then the entire file system is
lost. (partitions are often on physically different disks as well => performance advantage: I can carry out 3
write operations (I/O) at the same time)
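The placement of logical blocks in a striped volume can be sketched as round-robin arithmetic. This assumes stripe units of `unit` blocks (1 by default) over three disks as in figure (a); `stripe_location` is a hypothetical name.

```python
def stripe_location(logical_block, n_disks=3, unit=1):
    """Map a logical-volume block to (disk, block on that disk).
    Stripe units of `unit` blocks are placed on the partitions in turn."""
    unit_no, offset = divmod(logical_block, unit)   # which stripe unit, and where in it
    disk = unit_no % n_disks                          # round-robin over the disks
    block_on_disk = (unit_no // n_disks) * unit + offset
    return disk, block_on_disk
```

Consecutive logical blocks land on different disks, so a sequential read keeps all three disks busy. Nothing here is redundant: losing any one disk loses part of every file.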
(b) a mirrored volume does provide resilience to disk failures. It replicates data across a number of
disks so that each holds a complete copy of the logical volume. Mirroring can improve performance
in two ways. 1: groups of blocks can be read from each disk in turn improving throughput. 2: latency
of individual reads could be reduced by servicing them using whichever disk has its head closest to
the requested block. (everything is written several times => in case a disk crashes, there is
still a copy on another disk; disadvantage: more disk space needed)
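A mirrored volume can be sketched with writes going to every copy and each read served by the disk whose head is closest. Head positions are modelled as plain block numbers, which is a simplifying assumption; `MirroredVolume` is a hypothetical name.

```python
class MirroredVolume:
    """Hypothetical mirrored volume: every disk holds a full copy."""
    def __init__(self, n_disks):
        self.copies = [dict() for _ in range(n_disks)]  # per disk: block -> data
        self.heads = [0] * n_disks                      # assumed current head position
        self.failed = set()

    def write(self, block, data):
        for d, copy in enumerate(self.copies):
            if d not in self.failed:
                copy[block] = data        # every surviving disk gets the update

    def read(self, block):
        live = [d for d in range(len(self.copies)) if d not in self.failed]
        d = min(live, key=lambda disk: abs(self.heads[disk] - block))  # shortest seek
        self.heads[d] = block
        return d, self.copies[d][block]
```

When one disk fails, reads continue from the surviving copy, at the price of storing every block more than once.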
striping and mirroring are two modes of operating a Redundant Array of Inexpensive Disks (RAID)
striping => RAID-0; mirroring => RAID-1
(c) These are combined in RAID-5 to provide a degree of fault tolerance without the high overheads
that simple mirroring imposes. RAID-5 augments a striped volume with additional space to hold a
parity check for each position on the disks. Fig: case of three disks: for each block position two of the
disks will hold data and the third will hold the exclusive-or (XOR) of the other two’s contents. If any
one disk fails then its contents can be recovered by forming the XOR of the remaining two.
(the parity of the 2 other blocks is put on the third disk => like a sum of the two other blocks: you can recompute 1
block by "subtracting" the 2nd from the parity (with XOR, subtracting is again an XOR); double throughput)
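Parity and recovery for the three-disk case can be sketched directly. For simplicity the parity block sits on disk 2 here, whereas real RAID-5 rotates which disk holds the parity; `xor_blocks`, `write_stripe` and `recover` are hypothetical names.

```python
from functools import reduce

def xor_blocks(*blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

def write_stripe(d0, d1):
    """Return the three blocks stored at one position: two data blocks and
    their parity (fixed on disk 2 in this sketch)."""
    return [d0, d1, xor_blocks(d0, d1)]

def recover(stripe, lost):
    """Rebuild the block of the failed disk `lost` from the two survivors."""
    survivors = [blk for i, blk in enumerate(stripe) if i != lost]
    return xor_blocks(*survivors)
```

For example, the data blocks `0f f0` and `33 33` give the parity `3c c3`, and XOR-ing any two of the three blocks yields the third.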
6.6.3 Journaling and logging
journaling file system: decomposes each update that it makes into a series of steps. The first step writes a
non-volatile ‘journal entry’ setting out the various changes that are going to be made. The second step
proceeds to make these updates. The final step records, again in NV storage, that the journal entry has
been completed. => if the system fails in the 1st step: file system structures have not been modified at all and
are therefore still consistent; if it fails in the 2nd step: the journal describes the updates that were going to be made;
these operations can be performed by a recovery process when the system is restarted
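The three steps and the restart recovery can be sketched as follows. This assumes the Python dicts stand in for non-volatile storage, that appending a journal entry is atomic, and that a crash is simulated by raising part-way through step 2; `JournalingStore` is a hypothetical name.

```python
class JournalingStore:
    """Hypothetical journaling store; `blocks` and `journal` stand in for
    non-volatile storage."""
    def __init__(self):
        self.blocks = {}      # block number -> contents
        self.journal = []     # entries: {"updates": {...}, "done": bool}

    def update(self, changes, crash_after=None):
        entry = {"updates": dict(changes), "done": False}
        self.journal.append(entry)                   # step 1: record the intended changes
        for i, (block, data) in enumerate(changes.items()):
            if crash_after is not None and i == crash_after:
                raise RuntimeError("simulated crash")  # failure during step 2
            self.blocks[block] = data                # step 2: apply the changes
        entry["done"] = True                         # step 3: mark the entry completed

    def recover(self):
        """On restart, redo every journal entry not marked as completed.
        Redoing is safe because each update writes final values."""
        for entry in self.journal:
            if not entry["done"]:
                self.blocks.update(entry["updates"])
                entry["done"] = True
```

A crash half-way through an update leaves the blocks inconsistent until `recover` replays the unfinished entry.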
log-structured file system: updates are written to the log sequentially and read operations are
implemented by examining the log in reverse order to recover the most recent version of a required
file or directory
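The log-structured read rule can be sketched with an append-only list. `log`, `append_update` and `read_latest` are hypothetical names; real log-structured systems also keep an index so that reads need not scan the whole log.

```python
log = []   # hypothetical append-only log of (name, contents) records

def append_update(name, contents):
    log.append((name, contents))                 # every update goes to the end of the log

def read_latest(name):
    for entry_name, contents in reversed(log):   # newest records first
        if entry_name == name:
            return contents
    raise FileNotFoundError(name)
```

Writes are purely sequential; superseded versions stay in the log until space is reclaimed.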
6.7 Network-based file servers
(I make use of a file server on another node)
6.7.1 Open and closed storage architectures
(a) single file service comprising a directory
service and storage service is used
the only way to use the storage service is
through the directory service: we have a
closed storage service architecture. This
enforces a single naming convention and
access control policy on all clients.
Although the architecture is closed, the service components might be distributed.
(b) service interface at the level of the file storage service. The names used at the interface are SFIDs
and not pathnames. This allows a network-based file storage service to support different client OS,
each with its own directory service. It also allows direct use of the file storage service by specialized
Open architecture provides more general and flexible service than a closed one.
(a: very schematic representation of a closed system; b: open system; e.g. a mail service does not need a
directory service)
6.7.2 The storage service interface
interface should be open for general use
fig1: client interaction with a closed file service; we can see the implications of separating the
directory service from the storage service and of making the storage interface open. Roughly, we see
the directory service dealing with pathnames and the storage service dealing with SFIDs.
fig2: possible interface for a network storage service; interface that was previously invoked from
within a closed file service is now both remote and open to invocation across a network by any client
6.7.3 Location of function
naming and name resolution: to resolve a pathname the directory service will need to fetch each
component of the pathname in turn from the storage service in order to look up the SFID of the next
component in a directory.
existence control: storage service level sees only a flat SFID-to-file mapping and has no knowledge of
the internal structure of the files it stores. This appears to place the existence control function at the
directory service level; ie, in an open environment each client of the storage service would have to
carry out its own existence control.
It does not allow for the possibility that stored
objects might be shared by different kinds of
=> interface at storage service level and support
for existence control; asynchronous garbage
collection carried out periodically
an alternative approach is that the storage
service ages its objects and can delete (or
archive) an object when its time expires
concurrency control: directory service;
necessary for all clients which could access a file
to communicate to achieve concurrency control. The storage service might provide shared and
exclusive locks on files; if the storage service is stateless, it cannot provide this service
storage service is likely to provide concurrency control at the granularity of whole files. If a client of
the storage service requires arbitrarily fine-grained concurrency control, this must be provided above
the storage service level, within the client app or in some new service.
access control is carried out during pathname resolution
in an open system the storage service interface can be used directly by a number of different clients.
Access control is therefore needed at the storage service level
6.7.4 Stateless servers: NFS
the filing system maintains information on all the files that have been opened by its clients. If this is
done in a network filing system the server could crash, losing all this information, while its clients
continue to run. Clients may also crash, possibly while holding locks on shared files, and a stateful
server would have the overhead of detecting and dealing with this.
the alternative is to specify the server as stateless.
Sun Microsystems’ NFS
fig: NFS architecture
(stateless server => a server that keeps no state; e.g. opening a file implies keeping
state => NFS does not do this; consequently, files are not opened. E.g. HTTP is a stateless protocol
=> the web server keeps no state: it receives a request, answers it and forgets everything
= protection against crashes of nodes in a network
fig: NFS client and NFS server: strict relationship => closed system
the NFS server communicates with only 1 kind of client, namely the NFS client
to the client this appears as one whole: files stored on the server seem to be stored alongside the files on the client)
NFS server interface: fh indicates file handle, dirfh =
directory handle, operations are self-explanatory in
general. Note that hard and symbolic links are
supported. The cookie argument for readdir is
effectively a pointer for sequential reading over
successive calls; a value of zero indicates the first entry
(the server has no pathnames; the client must resolve them
itself => that is what lookup is for: looking up the name of a file
within a directory
the file is then cached => getattr then checks the
metadata etc. of the file to see whether the copy cached locally in the buffer is still valid)
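Statelessness can be sketched by making every request self-contained, loosely modelled on a read taking a file handle, an offset and a count. `StatelessServer` and `Client` are hypothetical names, and a dict of bytes stands in for stored files.

```python
class StatelessServer:
    """Hypothetical server: each request carries everything needed to serve
    it (file handle, offset, count); no open-file state is kept."""
    def __init__(self, files):
        self.files = files                       # file handle -> bytes

    def read(self, fh, offset, count):
        return self.files[fh][offset:offset + count]

class Client:
    """The client, not the server, remembers the position in the file."""
    def __init__(self, server, fh):
        self.server, self.fh, self.offset = server, fh, 0

    def read(self, count):
        data = self.server.read(self.fh, self.offset, count)
        self.offset += len(data)
        return data
```

Replacing the server object (a simulated crash and restart) between two client reads changes nothing, because the server never held anything the client depends on.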
6.7.6 Protection
authentication service so that the storage service
has secure knowledge of who is invoking it
fig: typical format for a capability: possession of
an identifier of a carefully designed form is taken
as proof of the right to access the file
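One way to make such an identifier hard to forge is a check field computed with a key known only to the storage service. The HMAC-based format below is an assumption, not necessarily the format in the figure; `SERVER_SECRET`, `make_capability` and `authorize` are hypothetical names, and rights are single-letter codes.

```python
from __future__ import annotations
import hashlib
import hmac
import os

SERVER_SECRET = os.urandom(32)   # hypothetical key known only to the storage service

def _check(sfid: int, rights: str) -> bytes:
    return hmac.new(SERVER_SECRET, f"{sfid}:{rights}".encode(), hashlib.sha256).digest()

def make_capability(sfid: int, rights: str) -> tuple[int, str, bytes]:
    """Capability = (SFID, rights, check field protecting both)."""
    return (sfid, rights, _check(sfid, rights))

def authorize(cap: tuple[int, str, bytes], wanted: str) -> bool:
    """Accept only an untampered capability that includes the wanted right."""
    sfid, rights, check = cap
    return hmac.compare_digest(check, _check(sfid, rights)) and wanted in rights
```

A client that edits the rights field of a read-only capability to add write access invalidates the check field, so the forged capability is rejected.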
6.8 Integrating virtual memory and storage