Outline
• File Management
  – Structured files
  – Low-level file implementations

Operating System Components
5/29/2016 COP4610

Why Programmers Need Files
[Figure: an HTML editor writes foo.html (<head> … </head> <body> … </body>) through the File Manager, and a Web browser reads the same file through the File Manager.]
• Persistent storage
• Shared device
• Structured information
• Can be read by any application
• Accessibility
• Protocol

File system context

Fig 13-2: The External View of the File Manager
[Figure: an application program calls the File Mgr, which sits alongside the Memory Mgr, Process Mgr, and Device Mgr above the hardware.]
• Windows calls: CreateFile(), ReadFile(), WriteFile(), SetFilePointer(), CloseHandle()
• UNIX calls: open(), read(), write(), lseek(), close(), mount()

Levels in a file system

Information Structure

Logical structures in a file

Low-Level Files

File systems
• File system
  – A data structure on a disk that holds files
    • strictly speaking, a file system lives in a disk partition
    • this is a technical term, distinct from "file system" meaning the part of the OS that implements files
• File systems in different OSs have different internal structures

A file system layout

File system descriptor
• The data structure that defines the file system
• Typical fields
  – size of the file system (in blocks)
  – size of the file descriptor area
  – first block in the free block list
  – location of the file descriptor of the root directory of the file system
  – times the file system was created, last modified, and last used

File system layout variations
• MS-DOS uses a FAT (file allocation table) file system
  – so does the Macintosh OS (although the MacOS layout is different)
• Newer UNIX file systems use cylinder groups (mini-file systems) to achieve better locality of file data

Locating file data
• The logical file is divided into logical blocks
• Each logical block is mapped to a physical disk block
• The file descriptor contains data on how to perform this mapping
  – there are many methods for performing this mapping
  – we will look at several of them

Dividing a file into blocks

Contiguous Allocation
• Each file occupies a set of contiguous blocks on the disk
  – Simple: only the starting location and length are required
  – Random access
  – Wasteful of space (dynamic storage-allocation problem)
  – Files cannot grow
• Mapping from logical to physical: divide the logical address LA by the block size (512 bytes here), giving quotient Q and remainder R
  – Block to be accessed = Q + starting address
  – Displacement into block = R

A contiguous file

A contiguous file – cont.

Keeping a file in pieces
• We need a block pointer for each logical block, that is, an array of block pointers
  – block mapping indexes into this array
  – Each file is a linked list of disk blocks
• But where do we keep this array?
  – usually it is not kept as a contiguous array
  – the array of disk pointers is like a second related file (that is 1/1024 as big)

Block pointers in the file descriptor

Block pointers in contiguous disk blocks

Block pointers in the blocks

Block pointers in the blocks – cont.

Block pointers in an index block

Block pointers in an index block – cont.

Chained index blocks

Two-level index blocks

Two-level index blocks – cont.
[Figure labels: primary index, secondary index table, data blocks]

The UNIX hybrid method

The UNIX hybrid method – cont.
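The logical-to-physical mappings in the preceding slides can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any particular file system. The function names, the block numbers in the test values, and the hybrid parameters (12 direct pointers, 128 pointers per index block) are assumptions chosen for the example. The slides name the UNIX hybrid method but do not give its parameters.

```python
BLOCK_SIZE = 512  # bytes per block, as in the LA/512 mapping on the slide


def split_address(logical_address):
    """Return (Q, R): the logical block number and the displacement in it."""
    return divmod(logical_address, BLOCK_SIZE)


def contiguous_map(logical_address, start_block, length_blocks):
    """Contiguous allocation: physical block = Q + starting address."""
    q, r = split_address(logical_address)
    if not 0 <= q < length_blocks:
        raise ValueError("logical address is outside the file")
    return start_block + q, r


def indexed_map(logical_address, block_pointers):
    """File kept in pieces: Q indexes into the array of block pointers."""
    q, r = split_address(logical_address)
    if not 0 <= q < len(block_pointers):
        raise ValueError("logical address is outside the file")
    return block_pointers[q], r


def hybrid_level(logical_block, n_direct=12, ptrs_per_block=128):
    """Hybrid scheme (assumed parameters): say which pointer level covers
    a logical block, and the block's offset within that level."""
    remaining = logical_block
    levels = [
        ("direct", n_direct),
        ("single indirect", ptrs_per_block),
        ("double indirect", ptrs_per_block ** 2),
        ("triple indirect", ptrs_per_block ** 3),
    ]
    for name, capacity in levels:
        if remaining < capacity:
            return name, remaining
        remaining -= capacity
    raise ValueError("logical block is beyond the maximum file size")
```

For example, byte 1300 of a contiguous file that starts at block 100 lies in logical block 2 (1300 // 512), so it maps to physical block 102 at displacement 276. With an index array the same byte maps to whatever block the third pointer names.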

Inverted disk block index (FAT)

DOS FAT Files
[Figure: a file descriptor pointing to disk block 43, with disk blocks 43, 254, …, 107 shown first as a chain of blocks and then with the links held in the File Allocation Table (FAT).]

Free-Space Management
• Bit vector (n blocks), with bits numbered 0, 1, 2, …, n-1
    bit[i] = 1 if block[i] is free
    bit[i] = 0 if block[i] is occupied
• First free block number = (number of bits per word) * (number of 0-value words) + offset of first 1 bit

Free-Space Management - cont.
• Bit map requires extra space. Example:
    block size = 2^12 bytes
    disk size = 2^30 bytes (1 gigabyte)
    n = 2^30 / 2^12 = 2^18 bits (or 32K bytes)
• Easy to get contiguous files
• Linked list (free list)
  – Cannot get contiguous space easily
  – No waste of space

Free list organization

Free-Space Management - cont.
• Need to protect:
  – Pointer to free list
  – Bit map
    • Must be kept on disk
    • Copy in memory and copy on disk may differ
    • Cannot allow block[i] to be in a state where bit[i] = 0 in memory and bit[i] = 1 on disk
  – Solution:
    • Set bit[i] = 0 on disk
    • Allocate block[i]
    • Set bit[i] = 0 in memory

Implementing Low Level Files
• Secondary storage device contains:
  – Volume directory (sometimes a root directory for a file system)
  – External file descriptor for each file
  – The file contents
• Manages blocks
  – Assigns blocks to files (descriptor keeps track)
  – Keeps track of available blocks
• Maps to/from byte stream

Disk Organization
[Figure: boot sector and volume directory at the start of the disk, followed by blocks Blk 0 … Blk k-1 on track 0, cylinder 0, Blk k … Blk 2k-1 on track 0, cylinder 1, and so on through track N-1, cylinder M-1.]

Low-level File System Architecture
[Figure: blocks b0, b1, b2, b3, …, bn-1, starting at block 0, shown on a sequential device and on a randomly accessed device.]

File Descriptors
• External name
• Current state
• Sharable
• Owner
• User
• Locks
• Protection settings
• Length
• Time of creation
• Time of last modification
• Time of last access
• Reference count
• Storage device details

An open() Operation
• Locate the on-device (external) file descriptor
• Extract info needed to read/write the file
• Authenticate that the process can access the file
• Create an internal file descriptor in primary memory
• Create an entry in a "per process" open file status table
• Allocate resources, e.g., buffers, to support file usage

File Manager Data Structures
[Figure: External File Descriptor, Open File Descriptor, Process-File Session]
1 Copy info from the external to the open file descriptor
2 Keep the state of the process-file session
3 Return a reference to the data structure

Opening a UNIX File
fid = open("fileA", flags);
…
read(fid, buffer, len);
[Figure: the process's descriptor table (0 stdin, 1 stdout, 2 stderr, 3 ...), a file structure in the Open File Table, the Internal File Descriptor (inode), and the On-Device File Descriptor.]

Reading and Writing the Byte Stream
• Two stages
  – Reading bytes into or writing bytes out of the memory copy of the block
  – Reading the physical blocks into or writing them out of memory from/to storage devices
• Unpacking (unmarshalling) converts secondary storage blocks into a byte stream
• Packing (marshalling) converts a byte stream into blocks

Marshalling the Byte Stream
• Must read at least one buffer ahead on input
• Must write at least one buffer behind on output
• Seek requires flushing the current buffer and finding the correct one to load into memory
• Inserting/deleting bytes in the interior of the stream

Full Block Buffering
• Storage devices use block I/O
• Files place an explicit order on the bytes
• Therefore, it is possible to predict what is likely to be read after a byte
• When a file is opened, the file manager reads as many blocks ahead as feasible
• After a block is logically written, it is queued for writing behind, whenever the disk is available
• Buffer pool
  – usually variably sized, depending on virtual memory needs
  – Interaction with the device manager and memory manager

Supporting Other Storage Abstractions
• Low-level file systems avoid encoding record-level functionality
  – If applications use very large or very small records, a generic file manager may not be efficient
  – Some operating systems provide a higher-layer file system to support applications with large or small files
  – Database management systems and multimedia documents are examples

Structured Files

Record-Oriented Sequential Files

Electronic Mail Example

Indexed Sequential Files

Database Management Systems
• A database is a very highly structured set of information
  – Stored across different files
  – Optimized to minimize access time
• DBMS implementation
  – Some DBMSs use the normal files provided by the OS for generic use
  – Some manage their own storage device blocks

Disk compaction

Memory-mapped Files
• A file's contents are mapped directly into the virtual address space
  – Files can be read from or written to by referencing the corresponding virtual addresses
• Memory-mapped files are very useful when a file is shared or accessed repeatedly

Memory-mapped Files – cont.

Directories
• A directory is a set of logically associated files and other directories of files
  – Directories are the mechanism we use to organize files
• The file manager provides a set of commands to manage directories
  – Traverse a directory
  – Enumerate a list of all files and nested directories

Directory Structures
• How should files be organized within a directory?
  – Flat name space
    • All files appear in a single directory
  – Hierarchical name space
    • Directory contains files and subdirectories
    • Each file/directory appears as an entry in exactly one other directory -- a tree
    • Popular variant: all directories form a tree, but a file can have multiple parents

Directory Structures

Directory Structures – cont.

A directory tree

Directory Implementation
• Device Directory
  – A device can contain a collection of files
  – Easier to manage if there is a root for every file on the device -- the device root directory
• File Directory
  – Typical implementations have directories implemented as a file with a special format
  – Entries in a file directory are handles for other files (which can be files or subdirectories)

Directory Implementation
• Linear list of file names with pointers to the data blocks
  – simple to program
  – time-consuming to execute
• Hash table: linear list with a hash data structure
  – decreases directory search time
  – collisions: situations where two file names hash to the same location
  – fixed size

Mounting file systems
• Each file system has a root directory
• We can combine file systems by mounting
  – that is, link a directory in one file system to the root directory of another file system
• This allows us to build a single tree out of several file systems
• This can also be done across a network, mounting file systems on other machines

UNIX mount Command
[Figure: directory trees before and after "mount FS at foo". Names shown in the root tree: bin, usr, etc, foo, bill, nutt. Names shown in file system FS: abc, cde, xyz, blah.]

Mounting a file system

VFS-based File Manager
[Figure: the file-system-independent part of the file manager exports the OS-specific API and sits above the Virtual File System Switch, which connects to the MS-DOS, ISO 9660, …, and ext2 parts of the file manager.]

NFS Architecture

Summary of File Storage Methods
• Contiguous files
  – Interleaved files
• File pointers in the file descriptor
• Contiguous file pointers
• Chained data blocks
  – Chained single index blocks
  – Double index blocks
  – Triple index blocks
  – Hybrid solutions
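
Looking back at the mounting slides, the idea of linking a directory in one file system to the root directory of another can be sketched as a path lookup that switches file systems when it crosses a mount point. This is an illustrative Python model only, not how any real kernel implements mount. The `FileSystem` and `Namespace` classes are hypothetical. The directory names are borrowed from the mount figure, but the flat layout of the root tree and the placement of `blah` under `cde` are assumptions.

```python
def split_path(path):
    """Break '/a/b/c' into the tuple ('a', 'b', 'c')."""
    return tuple(part for part in path.split("/") if part)


class FileSystem:
    """A toy file system: nested dicts are directories, None marks a file."""

    def __init__(self, name, root):
        self.name = name
        self.root = root


class Namespace:
    """A single tree built from several file systems by mounting."""

    def __init__(self, root_fs):
        self.root_fs = root_fs
        self.mounts = {}  # absolute path tuple -> FileSystem mounted there

    def mount(self, mount_point, fs):
        """Link an existing directory to the root directory of fs."""
        _, node = self._walk(split_path(mount_point))
        if not isinstance(node, dict):
            raise NotADirectoryError(mount_point)
        self.mounts[split_path(mount_point)] = fs

    def _walk(self, parts):
        fs, node = self.root_fs, self.root_fs.root
        for i, name in enumerate(parts):
            if not isinstance(node, dict) or name not in node:
                raise FileNotFoundError("/" + "/".join(parts))
            node = node[name]
            mounted = self.mounts.get(parts[: i + 1])
            if mounted is not None:
                # Crossing a mount point: continue from the other root.
                fs, node = mounted, mounted.root
        return fs, node

    def resolve(self, path):
        """Return (name of the file system holding the path, its node)."""
        fs, node = self._walk(split_path(path))
        return fs.name, node


# Assumed layouts, using names from the mount figure.
root_fs = FileSystem("root", {"bin": {}, "usr": {}, "etc": {},
                              "bill": {}, "foo": {}, "nutt": {}})
other_fs = FileSystem("FS", {"abc": {}, "cde": {"blah": None}, "xyz": {}})

namespace = Namespace(root_fs)
namespace.mount("/foo", other_fs)  # "mount FS at foo"
```

After the mount, a lookup of /foo/cde/blah starts in the root file system and finishes in FS, while /bin still resolves inside the root file system. Keeping the mount table outside both file systems reflects the point on the slide that mounting combines file systems into one tree without changing their contents.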