Chapter 6: File management

file => persistent
disk = block-oriented device
a program is also a file
a file is the only way to get data off the disk and to put data on the disk
the OS cannot interpret files (with a few exceptions: the OS itself needs certain files and is then an application that uses files)
fig2: the minimal kernel contains no file management => if needed, implement it as a service at a level above the kernel
concurrency control => locking service: for exclusive access (e.g. in order to write)

6.2 An overview of filing system functions

storage service: clients do not need to know about the physical characteristics of disks, or where files have been stored on them
directory service: clients can give convenient text names to files and, by grouping them in directories, show the relationships between them
fig: typical interaction between a file service and a client
(file directory service: checks whether the file still exists, whether the user is allowed to open it, and how the file is known in the system (its name))
a file is created with some text name
(1) the client calls an operation such as open-file with the text name as an argument; the directory service carries out an access check to ensure the client is authorized; the directory service is responsible for translating the text name into a form which enables the file storage service to locate the file on disk: name resolution
(2) the filing system is ready for the client to use the file; it will have set up information about the file in its tables in main memory. It returns a user-file-identifier (UFID) for the client to use in subsequent requests to read or write the file => (3).
(4) the storage service returns the portion of the file that was requested at (3)
files can be shared => potential problem: concurrent requests for access to the same file
concurrency control => many clients can safely read a file at the same time, but only one should be allowed to have write access => a file is locked for reading or writing
locking service: a client can request a shared lock or an exclusive lock on a file and be told whether locks have already been taken out and by whom

6.3 File and directory structure

(directories are written with a capital letter, files (pathnames) in lower case. Windows: one hierarchy per device; Unix: mounting => one large hierarchy made up of the hierarchies of all devices. All boxes in the hierarchy are themselves files)
the filing system must be able to identify each file uniquely => associate an identifier with a given file = system-file-identifier (SFID)

6.3.1 Pathnames and working directories
hierarchical system: files and directories are named relative to the top-level root directory => the full name of each file and directory is a pathname starting from the root
current working directory: names can be given relative to this, as well as as full pathnames

6.3.2 File sharing: access rights and links
alternative way to support sharing: allow new directory entries to be set up to point to existing objects => links => an authorized sharer can give a new name to the object instead of remembering the owner's pathname

6.3.3 Existence control
an object should be kept in existence while there is a valid pathname for it
request to delete an object => the requester's directory entry for that object is removed, but not necessarily the stored object itself; if the last directory entry has been removed, the file is inaccessible and has become garbage (figure (a)) => existence control
garbage collection: all objects that can be accessed from the root of the filing system are marked during garbage collection, and unmarked objects can be deleted
(the dashed lines are links that have been set up. A file that has multiple links may not be deleted until its reference counter is <= 1, i.e. until the entry being removed is the last one; links only between a directory and a file, not between directories => otherwise you get into trouble with the reference count: confusing and complicated)

6.4 The filing system interface

operations available to clients of the filing system:
(asymmetry: close file is handled by the storage service, open file by the directory service (a fairly complex operation with a lot of structure))

6.4.1 The directory service as type manager
example of a directory with associated interface operations. Creating a file or directory includes making an entry for it in a superior directory as well as allocating storage for it. Deleting a file or directory includes removing a directory entry
(SFID numbers are crucial for the storage service. Information about a file's location cannot be put in the file itself, nor in the directory => that would mean keeping it twice, and you would have to make sure that everything is consistent at every moment => new structure: the metadata table. The metadata table also has a fixed place on disk)

6.5 The filing system implementation

filing systems must keep information on each file or directory
fig: typical information block
different filing systems may store this information in different ways; if the information were kept with the directory entry => it would make directories large + cause information to be replicated when links were set up. The information is needed to locate the file or directory on disk, so it cannot be stored with the file
use a table where each entry is a block of information on a file or directory => metadata table (fig1)
fig2: makes it clear that the files, directories and the metadata table recording information on them are all stored permanently on disk, showing how the directory service can build on the abstractions presented by the storage service
fig1: metadata table
fig2: outlines data structures, typical of those used within filing systems, that are held in main memory when a file is in use.
The case where two users have opened the same file is illustrated. User A has UFID 3 allocated for the file and user B has UFID 5. The file has a single SFID and a single metadata table entry. Two main memory filing system data structures are shown. The system open file table has an entry for each sharer of the file. This allows concurrent sharers of a file to have their own position pointer into it for reading and writing. The active file information table contains entries with information similar to that held in the metadata table. fig3 expands on this. Additional information that is likely to be held is the SFID and concurrency control information, i.e. whether the file is open for reading or writing and by how many readers and/or writers
(fig3: the top two lines have been added. The SFID no longer has a fixed place => it must therefore be recorded in the entry, otherwise we would not even know which file the entry is about.
fig2: the buffer/cache gives a performance benefit (improves response time); drawback: if the system fails, you lose updates. Solutions: (1) do not merely buffer critical data but write it to disk, (2) use NVRAM for the buffer => if the system then fails, you still have everything that was in NVRAM)
fig2 also shows a buffer area for disk blocks; it is likely that this buffer area is also used as a cache. Disk access is slow and the system will aim to satisfy as many read requests as possible from the cache.

6.5.1 Hard and symbolic links
the link operation requests a new directory entry to be made for an existing file or directory
if access is allowed, the new name is added with the SFID of the existing file => hard link
symbolic link: involves entering the new name with a pathname for the existing object, instead of its SFID
(#172 is the index of the entry (the box at the end of the arrow) in the metadata table.
With hard links, changes to a file are fully transparent to the other user: e.g. if you remove the file in AX, it is still present in BX. With soft links you are left in BX with a pathname that is no longer valid, but soft links can be used in distributed systems => you can refer to files that reside on another system (another computer); with hard links you can only refer to files in your own system)

6.5.2 Locating a file on disk
chaining in the media: the metadata contains a pointer to the first block of the file. Each block starts with a pointer to the next block; the last block is indicated by some distinguished pointer value. The free blocks may also be chained. Acceptable for sequential access but bad for random access
(random access is not possible, processing always remains sequential => I always have to load all the blocks, even if I only want to modify the last block)
chaining in a map: a map of the file store is held in memory. It mirrors the block structure of the disk and contains only pointers. Disk blocks now contain only file information. Problems arise from the size of the map
(the blocks of a disk are additionally tracked in a table (the map in memory). The table contains an entry for each block => it refers to the block on disk; the entry also refers to the next entry, which in turn refers to a block => the reference to the next block is now kept in a separate table in main memory instead of in the block itself)
table of pointers: basic approach: keep a table of pointers for each file to its blocks. Problem: such a table is of variable length and can become very large. Using a variable-length table clearly conflicts with holding metadata as a series of fixed-length records. A hierarchy of tables of pointers can be used to solve this.
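The "chaining in a map" and "table of pointers" schemes can be illustrated with a minimal Python sketch. This is only a sketch under stated assumptions: the names `END`, `nth_block_chained` and `nth_block_indexed`, the block numbers, and the layout of two direct pointers plus one indirect block are all hypothetical, and Python lists stand in for what would be disk or main-memory structures in a real filing system.

```python
END = -1  # assumed distinguished pointer value marking the last block of a file


def nth_block_chained(block_map: list[int], first: int, n: int) -> int:
    """Chaining in a map: follow n 'next' pointers in the in-memory map.

    block_map[b] holds the number of the block that follows block b.
    Finding the n-th block (0-based) still costs n steps, but no disk
    block has to be read just to find its successor.
    """
    block = first
    for _ in range(n):
        block = block_map[block]
        if block == END:
            raise IndexError("file has fewer blocks than requested")
    return block


def nth_block_indexed(direct: list[int], indirect: list[int], n: int) -> int:
    """Table of pointers with one level of hierarchy.

    The fixed-length metadata entry holds a few direct pointers; further
    pointers live in a separate indirect block, so the entry itself
    never has to grow.
    """
    if n < len(direct):
        return direct[n]
    return indirect[n - len(direct)]


# Hypothetical file occupying disk blocks 2 -> 7 -> 4 -> 9, in that order.
block_map = [END] * 10
block_map[2], block_map[7], block_map[4] = 7, 4, 9

third_via_chain = nth_block_chained(block_map, first=2, n=2)
third_via_table = nth_block_indexed(direct=[2, 7], indirect=[4, 9], n=2)
```

Both lookups find the same third block; the chained version walks the map, while the pointer table reaches any block in a fixed number of steps, which is why it suits random access better.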
(a reference to a block that refers to all the actual blocks of a file => that block can in turn refer to another such block (and so on), so that more actual blocks can be referenced)
extent lists: extent = contiguous cluster of disk blocks. Contiguous storage allows for efficient access and is especially suitable for large files and continuous media such as voice and video. An extent table is held per file, with each entry recording the start block and the number of blocks in the extent

6.5.3 Storing new media types
continuous media types: special requirements => quality of service (QoS) requirement: sustain a high data rate and deliver data on a regular basis
major difficulty: requirements for such streams arrive dynamically and there is a limit to the number of simultaneous streams that can be supported
we can observe the following for a specialized integrated storage service:
* storage allocation should not be in terms of a fixed block size. Extent-based allocation allows for both small conventional files and large continuous media files
* the service should be allowed to pass blocks on at the agreed rate and then reuse the buffer space, rather than risk being held up by the lack of a handshake from the receiving component
* copying of huge files should be avoided. It should therefore be possible for one file to point to a selected portion of another. Requirement to collect garbage: detect when there are no longer any pointers to a file
* naïve mapping of a huge file into the virtual address space of a process is inappropriate
* if such a file visits the file system's user-end cache, it should be possible to specify that access is sequential; i.e. the portion used most recently is not likely to be used again => the least recently used (LRU) algorithm is inappropriate

6.6 Modern file system design

6.6.1 Logical volume management
logical volume: a file system is implemented over such a volume which, in turn, draws space from partitions of physical disks.
Each partition comprises an extent of disk blocks. Rudimentary partitioning schemes allow a single disk to be divided into a series of sections, each of which holds a single logical volume and, in turn, a separate file system

6.6.2 Striping and mirroring
striped logical volume = one that is built over a number of partitions, placing the blocks of the logical volume on each partition in turn. The partitions are located on different disks
(a) volume striped over three partitions. If we assume that a file is being read sequentially, this organization means that groups of blocks will be read from each disk. This can deliver an aggregate throughput of three times that of a single disk. Although a striped volume is stored across a number of disks, it does not provide additional resilience to disk failures: if any constituent disk becomes unavailable, the entire file system is lost.
(the partitions are often on physically different disks => performance benefit: I can carry out 3 write operations at the same time (I/O))
(b) a mirrored volume does provide resilience to disk failures. It replicates data across a number of disks so that each holds a complete copy of the logical volume. Mirroring can improve performance in two ways. 1: groups of blocks can be read from each disk in turn, improving throughput. 2: the latency of individual reads can be reduced by servicing them from whichever disk has its head closest to the requested block.
(everything is written several times => in case a disk crashes, there is still a copy on another disk; drawback: more disk space is needed)
striping and mirroring are two modes of operating a Redundant Array of Inexpensive Disks (RAID): striping => RAID-0; mirroring => RAID-1
(c) these are combined in RAID-5 to provide a degree of fault tolerance without the high overheads that simple mirroring imposes. RAID-5 augments a striped volume with additional space to hold a parity check for each position on the disks.
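As a rough illustration of striping and of the parity check mentioned for RAID-5, the following Python sketch maps a logical block number to a disk and an offset, and rebuilds a lost block from parity. It assumes simple round-robin placement and exclusive-or (XOR) parity over two data blocks; the function names and the sample bytes are hypothetical and not taken from any real volume manager.

```python
def stripe_location(logical_block: int, n_disks: int) -> tuple[int, int]:
    """Round-robin striping: return (disk index, block offset on that disk)."""
    return logical_block % n_disks, logical_block // n_disks


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same size")
    return bytes(x ^ y for x, y in zip(a, b))


# Logical blocks 0..5 of a volume striped over three disks.
layout = [stripe_location(b, 3) for b in range(6)]

# Parity for one block position: two data blocks and their XOR.
data0 = bytes([0x0F, 0xF0, 0xAA])
data1 = bytes([0x33, 0x55, 0x01])
parity = xor_blocks(data0, data1)

# If the disk holding data1 fails, XOR of the two survivors restores it.
recovered = xor_blocks(data0, parity)
```

Consecutive logical blocks land on different disks, which is what allows sequential reads to proceed in parallel; the parity block costs one extra block per position rather than a full copy of the data.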
Fig: case of three disks: for each block position two of the disks will hold data and the third will hold the exclusive-or (XOR) of the other two's contents. If any one disk fails, its contents can be recovered by forming the XOR of the remaining two.
(the parity of the 2 other blocks is placed on the third disk => a kind of sum of the two other blocks: you can recompute one block by "subtracting" the other from the parity; double throughput)

6.6.3 Journaling and logging
journaling file system: decomposes each update that it makes into a series of steps. The first step writes a non-volatile 'journal entry' setting out the various changes that are going to be made. The second step proceeds to make these updates. The final step records, again in non-volatile storage, that the journal entry has been completed.
=> system fails in the 1st step: the file system structures have not been modified at all and are therefore still consistent; system fails in the 2nd step: the journal describes the updates that were going to be made; these operations can be performed by a recovery process when the system is restarted
log-structured file system: updates are written to the log sequentially, and read operations are implemented by examining the log in reverse order to recover the most recent version of a required file or directory

6.7 Network-based file servers

(I call on a file server on another node)

6.7.1 Open and closed storage architectures
(a) a single file service comprising a directory service and a storage service is used; the only way to use the storage service is through the directory service: we have a closed storage service architecture. This enforces a single naming convention and access control policy on all clients. Although the architecture is closed, the service components might be distributed.
(b) service interface at the level of the file storage service. The names used at the interface are SFIDs and not pathnames.
This allows a network-based file storage service to support different client operating systems, each with its own directory service. It also allows direct use of the file storage service by specialized clients. An open architecture provides a more general and flexible service than a closed one.
(a: very schematic representation of a closed system; b: open system; e.g. a mail service does not need a directory service)

6.7.2 The storage service interface
the interface should be open for general use
fig1: client interaction with a closed file service; we can see the implications of separating the directory service from the storage service and of making the storage interface open. Roughly, we see the directory service dealing with pathnames and the storage service dealing with SFIDs.
fig2: possible interface for a network storage service; the interface that was previously invoked from within a closed file service is now both remote and open to invocation across a network by any client

6.7.3 Location of function
naming and name resolution: to resolve a pathname, the directory service will need to fetch each component of the pathname in turn from the storage service in order to look up the SFID of the next component in a directory.
existence control: the storage service level sees only a flat SFID-to-file mapping and has no knowledge of the internal structure of the files it stores. This appears to place the existence control function at the directory service level; i.e. in an open environment each client of the storage service would have to carry out its own existence control. It does not allow for the possibility that stored objects might be shared by different kinds of client.
=> interface at storage service level with support for existence control; asynchronous garbage collection carried out periodically. An alternative approach is that the storage service ages its objects and can delete (or archive) an object when its time expires
concurrency control: directory service; it would be necessary for all clients which could access a file to communicate in order to achieve concurrency control. The storage service might provide shared and exclusive locks on files; if the storage service is stateless, it cannot provide this service. The storage service is likely to provide concurrency control at the granularity of whole files. If a client of the storage service requires arbitrarily fine-grained concurrency control, this must be provided above the storage service level, within the client application or in some new service.
access control: access control is carried out during pathname resolution. In an open system the storage service interface can be used directly by a number of different clients. Access control is therefore needed at the storage service level

6.7.4 Stateless servers: NFS
the filing system maintains information on all the files that have been opened by its clients. If this is done in a network filing system, the server could crash, losing all this information, while its clients continue to run. Clients may also crash, possibly while holding locks on shared files, and a stateful server would have the overhead of detecting and dealing with this.
the alternative is to specify the server as stateless.
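The idea of a stateless server can be sketched in a few lines of Python. This is a minimal illustration and not the NFS protocol itself: the `FILES` dictionary, the handle value 17 and the `read` function are hypothetical, chosen only to show that every request carries all the information the server needs.

```python
# Hypothetical persistent store: file handle -> file contents.
FILES = {17: b"hello, stateless world"}


def read(fh: int, offset: int, count: int) -> bytes:
    """Serve one self-contained read request.

    The server keeps no open-file table and no per-client position
    pointer: the handle, the offset and the length all arrive with
    the request, so nothing is lost if the server restarts between calls.
    """
    data = FILES[fh]
    return data[offset:offset + count]


# The client, not the server, remembers how far it has read.
pos = 0
chunk1 = read(17, pos, 5)
pos += len(chunk1)
chunk2 = read(17, pos, 7)
pos += len(chunk2)
```

Because the position lives on the client, a repeated request after a lost reply returns the same bytes, which is what makes retrying after a server crash safe for reads.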
Sun Microsystems' NFS
fig: NFS architecture
(stateless server => a server that keeps no state. E.g. opening a file implies keeping state => NFS does not do this; consequently a file is not opened. E.g. an HTTP server is stateless => it keeps no state for a web page: it receives a request, answers it and forgets everything = protection against crashes of nodes in a network.
fig: NFS client and NFS server: strict relationship => closed system. The NFS server communicates with only one client, namely the NFS client. To the client this looks like a single whole: files that reside on the server simply appear to sit among the files on the client)
NFS server interface: fh indicates a file handle, dirfh a directory handle; the operations are self-explanatory in general. Note that hard and symbolic links are supported. The cookie argument for readdir is effectively a pointer for sequential reading over successive calls; a value of zero indicates the first entry
(there are no pathnames on the server, the client has to handle these itself => that is what lookup is for => looking up the name of a file within a directory. The file is then cached => getattr then checks the file's metadata etc. to see whether the copy cached locally in the buffer is still valid)

6.7.6 Protection
authentication service, so that the storage service has secure knowledge of who is invoking it
fig: typical format for a capability = possession of an identifier of a carefully designed form is taken as proof of the right to access the file

6.8 Integrating virtual memory and storage