Linköping University
Department of Computer and Information Science (IDA)
Concurrent programming, Operating systems and Real-time operating systems (TDDI04)
2009-05-18
Storage management assignments (lesson 5)
1. Describe the concept of a file.
2. Explain what we mean by a:
a) Sequential device or file
b) Random access device or file
3. Normally an application can store data as it chooses, but sometimes it is motivated
to let the OS take control of certain file types. State if that is the case, and motivate
why, for the following file types.
a) Directories
b) Photos
c) Executable programs
d) Icons
e) Symbolic links
f) Archives
4. A file consists of some metadata (such as file name, size and access rights) and the
file content (data). The tar archive format stores the metadata of the first file
immediately followed by its content, then the metadata of the second file followed
by the second file's content, and so on. The zip archive format stores the content of
all files first, then an index with the metadata of all files. Motivate, in terms of
convenience and disk accesses, which format is most appropriate for the following
operations.
a) Sequential backup and restore operations
b) Adding and removing files
c) Extracting specific files
5. A typical OS stores file information in all places below.
- A list of files in each directory on disk.
- A central list of all open files in kernel memory.
- A list of files opened in each process.
Explain for each:
a) The content stored in each list
b) How the lists are related/linked to each other
c) When and how it is used
6. Why would it not be sufficient to simply store the file name just before the file content
and then refer to the name each time a file should be read or written?
7. A brand-new 4TB disk drive is set up as one volume covering the entire disk. The disk
block size is set to 1kB to minimize internal fragmentation. The disk is used in a
storage server (NAS) with 2GB RAM running a custom OS. The disk will store 3TB
worth of movie files ranging from 350MB to 4GB each, and 1TB worth of MP3 files
of approximately 5MB each. (For the sake of copyright regulations we must assume
the content was legally purchased, or that the system is part of a setup by the
RIAA/MPAA to provide a long-wanted new service.)
a) Discuss how the OS should keep track of free space (linked list? bitmap? other?)
b) Knowing more about disk management than the system builder seems to,
describe a more suitable setup.
8. A system using linked free-space management suffers an unexpected power failure
midway through an update of the free-space start pointer. Having been only half
written, the pointer becomes completely corrupted and unusable. Can the free-space
block list be recovered? If so, how?
9. A user performs the operations below. Which operations will fail? Explain why.
1. Create the regular file "regular"
2. Create the symbolic (soft) link "soft" referring to "regular"
3. Rename (move) the file "regular" to "regular.moved"
4. List the content of "soft"
5. Create a hard link "hard" to "regular.moved"
6. Rename (move) the file "regular.moved" back to "regular"
7. List the content of "soft"
8. Remove the file "regular"
9. List the content of "hard"
10. Given an implementation with a single directory, disk blocks of 512 bytes, and a file
occupying 192 disk blocks, how many I/O operations are needed to:
a) Move the file within the same partition?
b) Move the file to a different partition?
11. Given an implementation with a single directory, disk blocks of 512 bytes, linked
allocation, and a file occupying 192 disk blocks, how many I/O operations are needed
to:
a) Sequentially read the entire file?
b) Read the data at byte position 792600? (hint: 1548×512+24)
c) Add 843 bytes at the end of the file?
12. Given an implementation with a single directory, disk blocks of 512 bytes, indexed
allocation with only 64 direct pointers and one single indirect pointer, and a file
occupying 192 disk blocks, how many I/O operations are needed to:
a) Sequentially read the entire file?
b) Read the data at byte position 792600? (hint: 1548×512+24)
c) Add 843 bytes at the end of the file?
13. For the previous two questions, suggest improvements to the allocation strategy
that solve some of the problems.
14. Do any of the files in the previous three questions suffer internal or external
fragmentation? Why?
15. RAID levels can be classified according to data availability, ability to transfer large
files, and ability to handle many data requests. In those terms, discuss:
a) RAID 0
b) RAID 1
16. The content (in arrival order) of a disk drive's I/O queue references the following
tracks:
28 34 68 75 64 30 96 52 48
and by coincidence the system happens to generate exactly one new request for
every request handled, arriving in the following order:
96 24 35 90 58 74 65 81 21
Suggest a disk scheduling algorithm and motivate its use (in terms of head
movement).
17. A certain disk has a rotation speed of 10000 rpm, an average read seek time of
4.2ms, a track-to-track seek time of 0.7ms, 512 bytes per sector, and stores 1500
sectors per track. How long does it take to transfer:
a) A 100MB contiguous file?
b) A 100MB file with each 4kB block at a random location?
18. Disk storage is allocated in units of blocks. The file system may choose to use any
block size larger than or equal to that of the physical device. Consider three different
block sizes, 512B, 4kB and 32kB, in terms of transfer speed, efficiency (wasted
space), and ease of update.
Answers
1. A file is, from the view of a process, a sequentially numbered collection of bytes on
secondary storage, regardless of the order and representation it has on the physical
disk. A file can be read sequentially, one byte after the other; reading or writing one
byte automatically moves the position to the next. Some files also allow the read/write
position to be moved, to achieve random access.
2.
a) A sequential device or file is read from start to end, one byte at a time. It is not
possible to read any byte twice; once read, each byte is consumed. It may be
possible to rewind some devices to the start.
b) A random access file is a sequential file that can also be read at any position by
first selecting that position, as sketched below.
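A minimal C sketch of the difference (the file name "example.dat" is hypothetical):
sequential access simply reads byte after byte, while random access first repositions
with fseek.

    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("example.dat", "rb");   /* hypothetical file */
        if (!f) return 1;

        /* Sequential access: each read advances the position automatically. */
        unsigned char byte;
        while (fread(&byte, 1, 1, f) == 1) {
            /* ... consume the byte ... */
        }

        /* Random access: first select a position, then read. */
        if (fseek(f, 1024L, SEEK_SET) == 0)     /* jump to byte 1024 */
            fread(&byte, 1, 1, f);

        fclose(f);
        return 0;
    }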
3.
a) Directories are not general files but special "file containers" in the file system, and
are in general controlled by the OS or system software. Special protection
mechanisms may apply.
b) Photos have no special relation to the OS and can be left completely to
applications.
c) Executable program files are special in that they store not data but an "activity".
They are normally specific to the execution environment (hardware and OS).
Thus it is motivated that the OS recognizes them as such and treats them
specially: allowing special actions (execute), requiring a special format, and
possibly adding special protection features (virus scanning). Alternatively, the OS
may provide an interface that allows third-party software or system tools to do
this.
d) Icon files are in some systems closely integrated in the user interface managed
by the OS or system software. Keeping a format recognized by the OS is then
essential for them to be displayed correctly. In other systems this is left to
applications.
e) Symbolic links are special files that should be transparent to applications; thus
the OS/file system must know about them.
f) Archives may be provided by the OS for system backup and restore, but can in
general be left to applications.
4.
a) Tar archives are suited for sequential backup and restore, as each file can be
read and its metadata and content stored without the need to remember any
other information. The store order is sequential, as is the restore order.
b) Tar archives are not suited to remove operations, as the entire archive must be
scanned to find the position of the file within the archive. Adding files at the end
is easy. Zip files provide an index that finds a specific file to extract or remove
quickly, with few disk accesses. Adding files requires the index to be moved to
make space for the new file.
c) See b), and the layout sketch below.
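A loose sketch in C of the two layouts (field names and sizes are invented for
illustration; neither matches the real tar or zip specifications):

    /* Tar-like: [meta1][data1][meta2][data2]... on disk.
     * Sequential store and restore never needs to look back. */
    struct tar_like_header {
        char name[100];     /* file name                          */
        long size;          /* number of content bytes following  */
        /* ... 'size' bytes of file content follow immediately ... */
    };

    /* Zip-like: all content first, then one index at the end.
     * Extracting one file needs only the index and the content itself. */
    struct zip_like_index_entry {
        char name[100];     /* file name                 */
        long offset;        /* where the content starts  */
        long size;          /* length of the content     */
    };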
5.
a) The directory structure stores file and directory names coupled with the position
on disk, the size, and other less essential attributes such as modification time
and access rights. When a file is opened this information is cached for quick
retrieval in the OS's central list of open files (inodes). This avoids scanning the
directory tree each time a file is used. A counter keeps track of the number of
users of each inode. Each process has a table of open files that enables the
process to know which files it has open, and also enables several processes to
have the same file open several times simultaneously. For each open file this
table stores a link to the corresponding inode (its position in the OS's central list)
and the current read/write position.
b) The on-disk information provides permanent storage.
The OS's central list caches one copy of the disk information for each file
currently in use; it knows where to find each file on disk.
The process list stores information unique to each file instance opened by that
process; each open instance links to the common information in the OS table,
as sketched below.
c) See a) and b).
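A minimal C sketch of how the three lists could be linked (all struct and field names
are invented, not any real OS interface):

    /* On disk: one entry per file in each directory. */
    struct dir_entry {
        char name[256];          /* file name                          */
        long inode_no;           /* where the file's metadata resides  */
    };

    /* In kernel memory: the OS central list of open files (inodes). */
    struct open_file {
        long inode_no;           /* cached copy of the on-disk info    */
        long size;
        int  ref_count;          /* how many open instances use this   */
    };

    /* Per process: one entry per open instance of a file. */
    struct fd_entry {
        struct open_file *file;  /* link into the OS central list      */
        long pos;                /* this instance's read/write offset  */
    };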
6. It would involve scanning the directory tree to find the disk position of the file at each
access, which would be too slow. It would also not provide any means of
synchronization to avoid inconsistency when the file is read and written by several
processes simultaneously.
7.
a) A bitmap would require 4TB/1kB/8 = 512MB of memory. As this must be stored in
RAM it is infeasible. A linked structure would be very long and tedious to set up.
An alternative linked version that also keeps track of the number of consecutive
free blocks that follow would shorten this chain.
b) Knowing how the disk will be used allows us to make more intelligent choices.
Clearly, storing only large files does not require a small block size. The internal
fragmentation will be at most one block per file. For a 1MB block size and 350MB
files this is at most 1MB/350MB = 0.29% of the disk space wasted. That can be
deemed acceptable and will yield a bitmap of free blocks occupying
4TB/1MB/8 = 512kB of memory. For MP3 files of approximately 5MB, however,
1MB blocks will not be acceptable, as the space wasted on internal fragmentation
amounts to up to 20%. This calls for two volumes (partitions), one for movies and
one for music, each with a different block size. The proportions are suggested to
be the same as the space expected to be used by movies and music respectively.
The block size for music could be, for example, 32kB, which would yield a free
map of 1TB/32kB/8 = 4MB (at most 0.64% waste). The arithmetic is summarized
in the sketch below.
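The figures above can be reproduced with a few lines of C (the constants simply
restate the numbers from the answer):

    #include <stdio.h>

    int main(void)
    {
        double TB = 1024.0 * 1024 * 1024 * 1024;
        double MB = 1024.0 * 1024;
        double kB = 1024.0;

        /* Bitmap size = disk size / block size / 8 bits per byte. */
        printf("4TB at 1kB blocks : %.0f MB bitmap\n", 4 * TB / kB / 8 / MB);
        printf("4TB at 1MB blocks : %.0f kB bitmap\n", 4 * TB / MB / 8 / kB);
        printf("1TB at 32kB blocks: %.0f MB bitmap\n", TB / (32 * kB) / 8 / MB);

        /* Worst-case internal fragmentation: one block wasted per file. */
        printf("1MB block, 350MB movie: %.2f%% waste\n", 100 * MB / (350 * MB));
        printf("1MB block, 5MB song   : %.0f%% waste\n", 100 * MB / (5 * MB));
        printf("32kB block, 5MB song  : %.2f%% waste\n", 100 * 32 * kB / (5 * MB));
        return 0;
    }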
8. Assuming the directory structure keeping the list of files is intact, one can scan
each file for the sectors it uses and build a list of occupied sectors. The ones
remaining after all files are parsed are free; see the sketch below.
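A self-contained C sketch of the recovery, using a toy in-memory directory in place
of the real on-disk structures:

    #include <stdbool.h>
    #include <stdio.h>

    #define NUM_BLOCKS 16   /* toy volume size    */
    #define NUM_FILES   2   /* toy directory size */

    /* Toy directory: each file's chain of block numbers, -1 terminated.
     * In a real system these chains are followed on disk. */
    static const int file_chain[NUM_FILES][NUM_BLOCKS] = {
        { 3, 7, 2, -1 },
        { 5, 9, -1 },
    };

    int main(void)
    {
        bool is_free[NUM_BLOCKS];
        for (int b = 0; b < NUM_BLOCKS; b++)
            is_free[b] = true;                       /* assume all free ...  */

        for (int f = 0; f < NUM_FILES; f++)
            for (int i = 0; file_chain[f][i] != -1; i++)
                is_free[file_chain[f][i]] = false;   /* ... except used ones */

        for (int b = 0; b < NUM_BLOCKS; b++)
            if (is_free[b])
                printf("block %d is free\n", b);
        return 0;
    }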
9. The fourth operation will fail, since it refers to the file "regular", which can no longer
be found by that name. All other operations will succeed, including the seventh, as a
file named "regular" exists again at that point. The sequence is sketched below with
POSIX calls.
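A C sketch of the sequence using POSIX calls (error handling omitted; the comments
mark where the listed operations would fail or succeed):

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        FILE *f = fopen("regular", "w");        /* 1. create "regular"      */
        fclose(f);
        symlink("regular", "soft");             /* 2. soft -> "regular"     */
        rename("regular", "regular.moved");     /* 3. move the file         */
        /* 4. opening "soft" now fails: it points at the *name* "regular" */
        link("regular.moved", "hard");          /* 5. second hard link      */
        rename("regular.moved", "regular");     /* 6. move back             */
        /* 7. opening "soft" works again: the name "regular" exists       */
        unlink("regular");                      /* 8. remove one name       */
        /* 9. opening "hard" still works: the inode has another hard link */
        return 0;
    }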
10. I assume N and P operations are needed to read the source and destination
directories, respectively.
a) N operations to read the directory, plus 1 operation to write back the new name
of the file. Moving a file within a partition in general only involves changing its
name and/or place in the directory; the data on disk does not have to move.
b) In this case, as different partitions do not share disk blocks, the data must also
be moved: N operations to read the source directory, P operations to read the
destination directory, 1 operation to update each, 192 to read the file, and 192
to write it to the new partition (N+P+2+192+192 operations in total).
11. I assume the file is already opened, so there is no need to read the directory.
a) 192 blocks must be read.
b) Assuming the given size of 192 blocks is correct, that position is past the end of
the file, so no blocks are read (or 192 to find the end of the file). Assuming an
(intended) file size of 1MB (2048 blocks), 1549 blocks must be read, since with
linked allocation the chain must be followed from the start; see the sketch below.
c) 192 blocks must be read, and one or two blocks written (one if at least 331 bytes
in the last block are free).
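A small C sketch of why b) costs 1549 reads under linked allocation, assuming only
the address of the first block is known:

    #include <stdio.h>

    #define BLOCK_SIZE 512

    /* With linked allocation, block i can only be found by reading blocks
     * 0..i-1, since each block holds the pointer to the next. */
    long reads_to_reach(long byte_pos)
    {
        long block_index = byte_pos / BLOCK_SIZE;  /* block holding the byte */
        return block_index + 1;                    /* whole chain up to it   */
    }

    int main(void)
    {
        /* 792600 = 1548*512 + 24, so the byte is in block 1548. */
        printf("%ld reads\n", reads_to_reach(792600L));   /* prints 1549 */
        return 0;
    }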
12. I assume the file is already opened, so there is no need to read the directory. I also
assume the index blocks are cached after the first access, and that pointers are
32 bits, so one block can store 128 pointers.
a) 1 read of the direct index block + 64 direct data reads + 1 read of the indirect
index block + 128 indirect data reads.
b) The maximum file size supported by the described setup is 64+128 = 192 blocks,
at most 96kB, so byte position 792600 is past the end of the file. In general,
reading requires at most one access to the direct index block to find the indirect
block, one access to the indirect block to find the data block, and one access for
the data block itself: at most 3 accesses, as sketched below.
c) See b). If there is space left in the last block, 3 accesses to find and read it, one
write to add new blocks to the index, and one or two accesses to write the data
(one if at least 331 bytes are free in the last block). Finding and updating the list
of free blocks is not counted here.
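A C sketch of the per-read cost model in b), assuming nothing is cached for a single
random read and 32-bit pointers (128 pointers per 512B block):

    #include <stdio.h>

    #define BLOCK_SIZE      512
    #define DIRECT_PTRS      64
    #define PTRS_PER_BLOCK  128   /* 512B / 4B per 32-bit pointer */

    /* Disk accesses to read one byte: index block, possibly the indirect
     * block, then the data block. Returns -1 past the 96kB maximum size. */
    int accesses_to_read(long byte_pos)
    {
        long block = byte_pos / BLOCK_SIZE;
        if (block < DIRECT_PTRS)
            return 2;             /* index block + data block      */
        if (block < DIRECT_PTRS + PTRS_PER_BLOCK)
            return 3;             /* index + indirect + data block */
        return -1;                /* beyond 192 blocks = 96kB      */
    }

    int main(void)
    {
        printf("%d\n", accesses_to_read(40000L));    /* 3: indirect range    */
        printf("%d\n", accesses_to_read(792600L));   /* -1: past end of file */
        return 0;
    }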
13. Clearly, the file size limit with this indexed allocation is not acceptable. A solution is
to add double and triple indirect blocks. With 64 direct pointers, 128 single indirect
pointers, and 128×128 double indirect pointers the maximum file size is 8288kB.
Adding a triple indirect block would add another 128×128×128×512B = 1024MB.
The computation is sketched below.
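The limits can be checked with a few lines of C (a sketch restating the arithmetic
above):

    #include <stdio.h>

    int main(void)
    {
        long B = 512, P = 128;       /* block size, pointers per block */
        long direct = 64;
        long single = P, dbl = P * P, triple = P * P * P;

        printf("direct + single indirect : %ld kB\n",
               (direct + single) * B / 1024);                 /* 96 kB   */
        printf("+ double indirect        : %ld kB\n",
               (direct + single + dbl) * B / 1024);           /* 8288 kB */
        printf("triple indirect adds     : %ld MB\n",
               triple * B / (1024 * 1024));                   /* 1024 MB */
        return 0;
    }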
14. Neither linked nor indexed allocation suffers from external fragmentation, since any
block can be allocated without regard to its location on disk. Both however suffer
internal fragmentation: linked allocation in the space not used in the last block at the
end of the file, and indexed allocation additionally in index blocks that are not fully
used.
15. For simplicity I assume we discuss only two drives; two drives are enough to
explain the points.
a) RAID 0 uses two disks to multiplex read and write operations. Losing one drive
will lose all data on both drives, so data availability in case of a disk crash is low.
Since the operations are multiplexed, the transfer rate for large data will at best
be twice that of a single disk.
Since data are multiplexed at the bit level, both disks must participate in each
operation. Thus we can NOT handle one request on disk one while handling a
second request on disk two; the performance when handling many requests is
comparable to that of a single disk.
b) RAID 1 uses two disks to store all information on both disks. Losing one drive will
still leave one copy of all data, so availability is high.
Writes must be done to both disks, so write performance equals that of one disk.
Reads can be interleaved on the two disks, giving at most double the
performance of one disk.
With many requests, each disk CAN handle different read requests individually
and simultaneously, since the data is cloned on both disks. Thus at most double
performance for reads; writes still have the performance of one disk.
16. Request execution order with three algorithms; total head motion is estimated as
the sum of the absolute differences between consecutive requests. (The original
answer includes a plot of head position per serviced request for FIFO, CSCAN and
SSTF.) Based on these results the suggestion is to use CSCAN. There may
however be better algorithms not investigated.
FIFO: 28 34 68 75 64 30 96 52 48 96 24 35 90 58 74 65 81 21 (total 525)
CSCAN: 28 30 34 35 48 52 58 64 65 68 74 75 81 90 96 96 21 24 (total 146)
SSTF: 28 30 34 35 24 48 52 58 64 65 68 74 75 81 90 96 96 21 (total 165)
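The totals can be verified with a short C program (service orders copied from above):

    #include <stdio.h>
    #include <stdlib.h>

    /* Total head motion: sum of absolute differences between
     * consecutive serviced requests. */
    static long total_motion(const int *order, int n)
    {
        long sum = 0;
        for (int i = 1; i < n; i++)
            sum += labs((long)order[i] - order[i - 1]);
        return sum;
    }

    int main(void)
    {
        int fifo[]  = {28,34,68,75,64,30,96,52,48,96,24,35,90,58,74,65,81,21};
        int cscan[] = {28,30,34,35,48,52,58,64,65,68,74,75,81,90,96,96,21,24};
        int sstf[]  = {28,30,34,35,24,48,52,58,64,65,68,74,75,81,90,96,96,21};

        printf("FIFO : %ld\n", total_motion(fifo,  18));   /* 525 */
        printf("CSCAN: %ld\n", total_motion(cscan, 18));   /* 146 */
        printf("SSTF : %ld\n", total_motion(sstf,  18));   /* 165 */
        return 0;
    }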
17. 100MB on this disk corresponds to 204800 sectors, or about 136.5 tracks. One
sector is read in 60/10000/1500 s = 4µs.
a) The initial seek requires 4.2ms, the track-to-track seeks for the remaining 136
tracks take 0.7ms×136 = 95.2ms, and reading 204800 sectors requires
204800×4µs = 819.2ms. In total 918.6ms, or about a second.
b) The seek time for 100MB/4kB = 25600 blocks is 4.2ms×25600 = 107.52s.
Reading all sectors takes 819.2ms as in a). Thus in total about 108 seconds, or
1 minute and 48 seconds. Both computations are sketched below.
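The same arithmetic in C (rotational latency is ignored, as in the answer above):

    #include <stdio.h>

    int main(void)
    {
        double seek_avg = 4.2e-3;                      /* average seek, s    */
        double seek_trk = 0.7e-3;                      /* track-to-track, s  */
        double t_sector = 60.0 / 10000 / 1500;         /* 4 us per sector    */
        double sectors  = 100.0 * 1024 * 1024 / 512;   /* 204800 sectors     */

        /* a) contiguous: one initial seek, then track-to-track seeks. */
        double a = seek_avg + 136 * seek_trk + sectors * t_sector;
        printf("a) %.1f ms\n", a * 1000);              /* ~918.6 ms          */

        /* b) 4kB blocks at random places: one average seek per block. */
        double blocks = 100.0 * 1024 * 1024 / 4096;    /* 25600 blocks       */
        double b = blocks * seek_avg + sectors * t_sector;
        printf("b) %.1f s\n", b);                      /* ~108.3 s, 1min 48s */
        return 0;
    }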
18. The advantage of a large block size is that transfers are more efficient: with 32kB
blocks instead of 512B blocks, each access needs one seek instead of up to 64.
The disadvantage is that internal fragmentation increases; on average each file will
waste half a block, 16kB instead of 256B. Updating part of a block involves reading
the block, modifying the data, and writing it back; with a larger block size more data
must be transferred for each such update. A rough comparison is sketched below.
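A rough comparison in C, reusing the disk parameters from question 17 and assuming
one seek per block and an average waste of half a block per file (illustrative figures
only):

    #include <stdio.h>

    int main(void)
    {
        long sizes[] = { 512, 4096, 32768 };
        for (int i = 0; i < 3; i++) {
            long   b        = sizes[i];
            double transfer = b / 512.0 * 4e-6;    /* read time per block */
            double per_mb   = 1048576.0 / b * (4.2e-3 + transfer);
            printf("%6ld B blocks: ~%5.2f s per scattered MB, "
                   "~%5ld B average waste per file\n", b, per_mb, b / 2);
        }
        return 0;
    }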