Linköping University
Department of Computer and Information Science (IDA)
Concurrent programming, Operating systems and Real-time operating systems (TDDI04)
2009-05-18

Storage management assignments (lesson 5)

1. Describe the concept of a file.

2. Explain what we mean by a:
   a) Sequential device or file
   b) Random access device or file

3. Normally an application can store data as it chooses, but sometimes there is good reason to let the OS take control of certain file types. State whether this applies, and motivate why, for the following file types.
   a) Directories
   b) Photos
   c) Executable programs
   d) Icons
   e) Symbolic links
   f) Archives

4. A file consists of some metadata (such as file name, size and rights) and the file content (data). The tar archive format stores the metadata of the first file immediately followed by that file's content, then the metadata of the second file followed by the second file's content, and so on. The zip archive format stores the content of all files first, followed by an index of all files' metadata. Motivate, in terms of convenience and disk accesses, which format is most appropriate for the following operations.
   a) Sequential backup and restore operations
   b) Adding and removing files
   c) Extracting specific files

5. A typical OS stores file information in all the places below.
   - A list of files in each directory on disk.
   - A central list of all open files in kernel memory.
   - A list of files opened in each process.
   Explain for each:
   a) The content stored in each list
   b) How the lists are related/linked to each other
   c) When and how it is used

6. Why would it not be sufficient to simply store the file name just before the file content and then refer to the name each time a file should be read or written?

7. A brand new 4TB disk drive is set up as one volume covering the entire disk. The disk block size is set to 1kB to minimize internal fragmentation. The disk is used in a storage server (NAS) with 2GB RAM running a custom OS. The disk will store 3TB worth of movie files ranging from 350MB to 4GB each, and 1TB worth of MP3 files of approximately 5MB each. (For the sake of copyright regulations we must assume the content was legally purchased, or that the system is part of a setup by RIAA/MPAA to provide a long wanted new service.)
   a) Discuss how the OS should keep track of free space (linked? bit map? other?)
   b) Knowing more about disk management than the system builder seems to know, describe a more suitable setup.

8. A system using linked free space management suffers an unexpected power failure midway through an update of the free space start pointer. Only half written, the pointer becomes completely corrupted and unusable. Can the free space block list be recovered? If so, how?

9. A user performs the operations below. Which operations will fail? Explain why.
   1. Create the regular file "regular"
   2. Create the symbolic (soft) link "soft" referring to "regular"
   3. Rename (move) the file "regular" to "regular.moved"
   4. List the content of "soft"
   5. Create a hard link "hard" to "regular.moved"
   6. Rename (move) the file "regular.moved" back to "regular"
   7. List the content of "soft"
   8. Remove the file "regular"
   9. List the content of "hard"
10. Given an implementation with a single directory, disk blocks of 512 bytes and a file occupying 192 disk blocks, how many I/O operations are needed to:
   a) Move the file within the same partition?
   b) Move the file to a different partition?

11. Given an implementation with a single directory, disk blocks of 512 bytes, linked allocation and a file occupying 192 disk blocks, how many I/O operations are needed to:
   a) Sequentially read the entire file?
   b) Read the data at byte position 792600 (hint: 1548x512+24)?
   c) Add 843 bytes at the end of the file?

12. Given an implementation with a single directory, disk blocks of 512 bytes, indexed allocation with only 64 direct pointers and one single indirect pointer, and a file occupying 192 disk blocks, how many I/O operations are needed to:
   a) Sequentially read the entire file?
   b) Read the data at byte position 792600 (hint: 1548x512+24)?
   c) Add 843 bytes at the end of the file?

13. In the previous two questions, suggest improvements to the allocation strategy to solve some of the problems.

14. Do any of the files in the previous three questions suffer internal or external fragmentation? Why?

15. RAID can be classified according to data availability, ability to transfer large files, and ability to handle many data requests. In those terms, discuss:
   a) RAID 0
   b) RAID 1

16. The content (in arrival order) of a disk drive I/O queue references the following tracks:
   28 34 68 75 64 30 96 52 48
   and by coincidence the system happens to generate exactly one new request for every request handled, arriving in the following order:
   96 24 35 90 58 74 65 81 21
   Suggest a disk scheduling algorithm and motivate its use (in terms of head movement).

17. A certain disk has a rotation speed of 10000rpm, an average read seek time of 4.2ms, a track-to-track seek time of 0.7ms, 512 bytes per sector, and 1500 sectors per track. How long does it take to transfer:
   a) A 100MB contiguous file?
   b) A 100MB file with each 4kB block at a random location?

18. Disk storage is allocated in units of blocks. The file system may choose any block size larger than or equal to that of the physical device. Consider three different block sizes, 512B, 4kB and 32kB, in terms of transfer speed, efficiency (wasted space), and ease of update.

Answers

1. From the view of a process, a file is a sequentially numbered collection of bytes on secondary storage, regardless of the order and representation it has on the physical disk. A file can be read sequentially, one byte after the other; reading/writing one byte automatically moves to the next. Some files also allow the read/write position to be moved, to achieve random access.

2. a) A sequential device or file is read from start to end, one byte at a time. It is not possible to read any byte twice; once read, each byte is consumed. It may be possible to rewind some devices to the start.
   b) A random access file is a sequential file that can also be read at any position, by first selecting that position. A sketch of the difference follows below.
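As a minimal sketch of the difference, assuming a POSIX environment and a hypothetical file "example.dat" (the file name and positions are invented for illustration):

   #include <fcntl.h>
   #include <stdio.h>
   #include <unistd.h>

   int main(void)
   {
       char byte;
       int fd = open("example.dat", O_RDONLY);
       if (fd < 0) { perror("open"); return 1; }

       /* Sequential access: each read consumes bytes and the
          position advances automatically. */
       read(fd, &byte, 1);        /* reads byte 0, position is now 1 */
       read(fd, &byte, 1);        /* reads byte 1, position is now 2 */

       /* Random access: first select a position, then read there. */
       lseek(fd, 1024, SEEK_SET); /* move the position to byte 1024 */
       read(fd, &byte, 1);        /* reads byte 1024 */

       close(fd);
       return 0;
   }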
3. a) Directories are not general files but special "file containers" in the file system, and are in general controlled by the OS or system software. Special protection mechanisms may apply.
   b) Photos have no special relation to the OS and can be left entirely to applications.
   c) Executable program files are special in that they store not data but an "activity". They are normally specific to the execution environment (hardware and OS). Thus it is motivated that the OS recognizes them as such and treats them specially: allowing special actions (execute), requiring a special format, and possibly adding special protection features (virus scanning). Alternatively, the OS provides an interface that allows third party software or system tools to do this.
   d) Icon files are in some systems closely integrated in the user interface managed by the OS or system software. Keeping a format recognized by the OS is essential for them to be displayed correctly. In other systems this is left to applications.
   e) Symbolic links are special files that should be transparent to applications, thus the OS/file system must know about them.
   f) Archives may be provided by the OS for system backup and restore, but can in general be left to applications.

4. a) Tar archives are suited for sequential backup and restore, as each file can be read and its metadata and content stored without the need to remember any information. The store order is sequential, as is the restore order.
   b) Tar archives are not suited to remove operations, as the entire archive must be scanned to find the position of the file in the archive. Adding files at the end is easy. Zip files provide an index to quickly, with few disk accesses, find a specific file to extract or remove. Adding files requires the index to be moved to make space for the new file.
   c) See b).

5. a) The directory structure stores the file and directory names coupled with the position on disk, size, and other less essential attributes such as modification time and access rights. When a file is opened this information is cached for quick retrieval in the OS central list of open files (inodes). This avoids scanning the directory tree each time a file is used. A counter keeps track of the number of users of each inode. Each process has a table of open files that enables the process to know which files it has open, and also enables several processes to have the same file open several times simultaneously. This list stores, for each file, a link to the corresponding inode (position in the OS central list) and the current read/write position.
   b) The on-disk information provides permanent storage. The OS central list caches one copy of the disk info for all files currently in use; it knows where to find each file on disk. The process list stores information unique to each file instance opened by that process. Each open instance links to the common information in the OS table. A schematic sketch of the three levels follows below.
   c) See a) and b).

6. It would involve scanning the directory tree to find the disk position of the file at each access, which would be too slow. It would also not provide any way to synchronize and avoid inconsistency when the file is read/written by several processes simultaneously.
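As a schematic sketch of the three levels in answer 5 (all field and type names are invented for illustration; real systems differ in detail):

   /* On-disk directory entry: name coupled with disk position
      and attributes. */
   struct dir_entry {
       char          name[256];
       unsigned long start_block;  /* position on disk */
       unsigned long size;
       unsigned int  mode;         /* access rights */
   };

   /* OS central list of open files: one entry per opened file,
      cached from disk (roughly an in-memory inode). */
   struct open_file {
       struct dir_entry info;      /* cached copy of on-disk data */
       int              ref_count; /* number of users of this entry */
   };

   /* Per-process list of open files: each open instance has its
      own read/write position but links to the shared entry above. */
   struct fd_entry {
       struct open_file *file;     /* link into the OS central list */
       unsigned long     pos;      /* current read/write position */
   };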
7. a) A bitmap would require 4TB/1kB/8 = 512MB of memory. As this must be stored in RAM, it is infeasible on a server with only 2GB of RAM. A linked structure would be very long and tedious to set up. An alternative linked version that also keeps track of the number of consecutive blocks following each entry would shorten this chain.
   b) Knowing how the disk will be used allows us to make more intelligent choices. Clearly, storing only large files does not require a small block size. The internal fragmentation will be at most one block per file. For a 1MB block size and 350MB files this is at most 1MB/350MB = 0.29% of the disk space wasted. That can be deemed acceptable and yields a bitmap of free blocks of 4TB/1MB/8 = 512kB of memory. For MP3 files of approximately 5MB, however, 1MB blocks are not acceptable, as the space wasted on internal fragmentation amounts to up to 20%. This calls for two volumes (partitions), one for movies and one for music, each with a different block size. The proportions are suggested to match the space expected to be used by movies and music respectively. The block size for music could be, for example, 32kB, which would yield a free map of 1TB/32kB/8 = 4MB (at most 0.64% waste).

8. Assuming the directory structure keeping the list of files is intact, one can scan each file for the blocks it uses and build a list of occupied blocks. The blocks remaining after all files are parsed are free.

9. The fourth operation will fail, since it refers to the file "regular", which can no longer be found by that name. All other operations will succeed, including the seventh, as a file named "regular" exists at that point.

10. I assume N and P operations to read the source and destination directories, respectively.
   a) N operations to read the directory, 1 operation to write back the new name of the file. Moving a file in general only involves changing the name and/or place in the directory; the data on disk does not have to move.
   b) In this case, as different partitions do not share disk blocks, the data must also be moved: N operations to read the source directory, P operations to read the destination directory, 1 operation to update each, 192 to read the file, and 192 to write it on the new partition (N+P+2+192+192 operations in total).

11. I assume the file is already opened, thus there is no need to read the directory.
   a) 192 blocks must be read.
   b) Assuming the given size of 192 blocks is correct, that position is after the end of the file, so no blocks are read (or 192 to find the end of the file). Assuming an (intended) file size of 1MB (2048 blocks), 1549 blocks must be read, since with linked allocation every block up to and including block 1548 must be traversed.
   c) 192 blocks must be read and one or two written (one if at least 331 bytes in the last block are free).

12. I assume the file is already opened, thus there is no need to read the directory. I also assume the index blocks are cached after the first access, and 32-bit pointers, thus one block can store 128 pointers.
   a) 1 read of the direct index + 64 direct reads + 1 read of the indirect index + 128 indirect reads.
   b) The maximum file size supported by the described setup is 64+128 = 192 blocks, at most 96kB, so byte position 792600 is past the end of the file. Reading within the file requires at most one access to the index block to find the indirect block, one access to the indirect block to find the data block, and one access for the data block, thus at most 3 accesses.
   c) See b). If there is space left: 3 accesses to find and read the last block, one write to add new blocks to the index, and one or two accesses to write the data (one if at least 331 bytes are free in the last block). Finding and updating the list of free blocks is not counted here.

13. Clearly, the file size limit with indexed allocation is not acceptable. A solution is to add double and triple indirect blocks. With 64 direct pointers, 128 indirect, and 128*128 doubly indirect pointers the maximum file size is 8288kB. Adding a triple indirect block would add another 128*128*128*512B = 1024MB. The sketch below works through the pointer arithmetic.
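As a minimal sketch of the pointer arithmetic in answers 11-13 (the constants follow the setup above; the program itself is only for illustration):

   #include <stdio.h>

   #define BLOCK_SIZE   512
   #define DIRECT_PTRS  64
   #define PTRS_PER_BLK (BLOCK_SIZE / 4)    /* 32-bit pointers -> 128 */

   int main(void)
   {
       unsigned long pos   = 792600;
       unsigned long block = pos / BLOCK_SIZE;   /* 1548 */
       unsigned long off   = pos % BLOCK_SIZE;   /* 24   */
       printf("byte %lu = block %lu, offset %lu\n", pos, block, off);

       if (block < DIRECT_PTRS)
           printf("reached via direct pointer %lu\n", block);
       else if (block < DIRECT_PTRS + PTRS_PER_BLK)
           printf("reached via indirect slot %lu\n", block - DIRECT_PTRS);
       else
           printf("beyond the maximum file size of %d blocks (%d kB)\n",
                  DIRECT_PTRS + PTRS_PER_BLK,
                  (DIRECT_PTRS + PTRS_PER_BLK) * BLOCK_SIZE / 1024);

       /* Maximum size with a doubly indirect block added (answer 13):
          64 + 128 + 128*128 = 16576 blocks = 8288 kB. */
       unsigned long max_blocks = DIRECT_PTRS + PTRS_PER_BLK
                                + (unsigned long)PTRS_PER_BLK * PTRS_PER_BLK;
       printf("with double indirect: %lu blocks = %lu kB\n",
              max_blocks, max_blocks * BLOCK_SIZE / 1024);
       return 0;
   }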
14. Neither linked nor indexed allocation suffers from external fragmentation, since any block can be allocated without regard to its location on disk. Both, however, suffer internal fragmentation: linked allocation in the space not used in the last block at the end of the file, and indexed allocation also in the index blocks that are not fully used.

15. For simplicity I assume we discuss only two drives; two drives are enough to explain the points.
   a) RAID 0 uses two disks to multiplex read and write operations. Losing one drive will lose all data on both drives; data availability in case of a disk crash is thus low. Since the operations are multiplexed, the transfer rate for large data will at best be twice that of a single disk. Since data are multiplexed on the bit level, both disks must participate in each operation; thus we can NOT handle one request on disk one while handling a second request on disk two. The performance when handling many requests is thus comparable to that of a single disk.
   b) RAID 1 uses two disks to store all information on both disks. Losing one drive still leaves one copy of all data; availability is thus high. Writes must be done to both disks, thus write performance equals that of one disk. Reads can be interleaved on the two disks, thus at most double performance compared to one disk. With many requests, each disk CAN handle different read requests individually and simultaneously, since the data is cloned on both disks. Thus at most double performance for reads, while writes still have the performance of one disk.

16. Request execution order with three algorithms. Total head motion is estimated as the sum of the absolute differences between consecutive requests. (The original handout includes a plot of head position for the three schedules; only the orders and totals are reproduced here.)
   FIFO:  28 34 68 75 64 30 96 52 48 96 24 35 90 58 74 65 81 21  tot 525
   CSCAN: 28 30 34 35 48 52 58 64 65 68 74 75 81 90 96 96 21 24  tot 146
   SSTF:  28 30 34 35 24 48 52 58 64 65 68 74 75 81 90 96 96 21  tot 165
   The suggestion based on this result is to use CSCAN. There may, however, be better algorithms not investigated.

17. 100MB on this disk corresponds to 204800 blocks, or 136.5 tracks. One block passes under the head in 60s/10000/1500 = 4µs.
   a) The initial seek requires 4.2ms, the track-to-track seeks for the remaining 136 tracks take 0.7ms*136 = 95.2ms, and reading 204800 blocks requires 204800*4µs = 819.2ms. In total 918.6ms, or about a second.
   b) The average seeks for 100MB/4kB = 25600 blocks take 4.2ms*25600 = 107.52s. Reading all blocks takes 819.2ms, as in a). Thus in total about 108 seconds, or 1 minute and 48 seconds. (The arithmetic of both cases is reproduced in the sketch at the end of this document.)

18. The advantage of a large block size is that it is more efficient to transfer. Using 32kB blocks instead of 512B blocks replaces up to 64 seeks with a single one each time such a region is accessed. The disadvantage is that internal fragmentation increases: on average each file will waste 16kB instead of 256B. Updating a block involves reading the block, modifying the data, and writing it back; with a large block size more work is required for each update.
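As a minimal sketch reproducing the arithmetic of answer 17 (rotational latency is ignored, exactly as in the answer above):

   #include <stdio.h>

   int main(void)
   {
       double block_ms = 60000.0 / 10000 / 1500;      /* 0.004 ms per 512B block */
       double blocks   = 100.0 * 1024 * 1024 / 512;   /* 204800 blocks */

       /* a) contiguous file: one initial seek, then one
          track-to-track seek per remaining track. */
       double seq_ms = 4.2 + 0.7 * 136 + blocks * block_ms;
       printf("a) contiguous: %.1f ms\n", seq_ms);    /* 918.6 ms */

       /* b) 4kB pieces at random positions: one average
          seek per piece, plus the same transfer time. */
       double pieces = 100.0 * 1024 * 1024 / 4096;    /* 25600 pieces */
       double rnd_ms = pieces * 4.2 + blocks * block_ms;
       printf("b) random: %.1f s\n", rnd_ms / 1000);  /* ~108.3 s */
       return 0;
   }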