Chapter 8 File Management

8.1 Introduction
• Data should be organized in a convenient and efficient manner. In particular, users should be able to:
– Put data into files
– Find and use files that have previously been
created
File System
• A set of OS services that provides Files and Directories to user applications
8.2 Files
• A file is simply a sequence of bytes that has been stored on a storage device of the computer
Files
• Those bytes will contain whatever data we would like to store
in the file such as:
– A text file just containing characters that we are interested in
– A word processing document file that also contains data about how to
format the text
– A database file that contains data organized in multiple tables.
• In general, the File Management system does not have any
knowledge about how the data in a file is organized. That is
the responsibility of the application programs that create and
use the file.
Permanent (non-volatile) Storage Devices
• Disk Drives
• Flash Memory (Memory stick)
• CDs and DVDs
• Magnetic tape drives
8.2.1 File Attributes
• Name
– Symbolic (Human-readable) name of the file
• Type
– Executable file, print file, etc.
• Location
– Where file is on disk
File Attributes
• Size
• Protection
– Who can read, write file, etc.
• Time, date
– When file was created, modified, accessed
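Most of these attributes can be inspected from a program. The sketch below calls Python's `os.stat` on a scratch file; it assumes a Unix-like system, where the i-node number (`st_ino`) stands in for the file's location on disk and `st_mtime`/`st_atime` give the modification and access times (a creation time is not reported portably, so it is not shown).

```python
import os
import stat
import tempfile
import time

# Create a small scratch file so there is something to inspect.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(b"hello, file system\n")
    path = tmp.name

info = os.stat(path)

name = os.path.basename(path)              # Name
is_regular = stat.S_ISREG(info.st_mode)    # Type: regular file (vs. directory, link, ...)
inode = info.st_ino                        # Location: i-node number, which records where the blocks are
size = info.st_size                        # Size in bytes
protection = stat.filemode(info.st_mode)   # Protection, e.g. '-rw-------'
modified = time.ctime(info.st_mtime)       # Time/date of last modification
accessed = time.ctime(info.st_atime)       # Time/date of last access

print(name, is_regular, inode, size, protection, modified, accessed)
os.remove(path)
```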
8.2.2 Folders
• An important attribute of folders is the Name
• Typically, a folder may contain Files and other
Folders (commonly called sub-folders or subdirectories)
• This results in a Tree Structure of Folders and Files.
Folder/Directory Tree Structure
8.2.3 Pathnames
• The pathname of a file specifies the sequence of
folders one must traverse to travel down the tree to
the file.
• This pathname actually describes the absolute path of
the file, which is the sequence of folders one must
travel from the root of the tree to the desired file.
• A relative path describes the sequence of Folders one must traverse starting from some intermediate folder on the absolute path (typically the current working directory).
• The Absolute path provides a unique identification for
a file. Two different files can have the same filename
as long as the resulting pathnames are unique.
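As a small illustration of absolute and relative paths, the sketch below uses Python's `pathlib.PurePosixPath`, which manipulates path strings without touching the disk. The folder and file names (`/home/alice`, `notes.txt`, and so on) are hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical absolute path: every folder from the root down to the file.
absolute = PurePosixPath("/home/alice/courses/os/notes.txt")

# Relative path: the same file described from an intermediate folder
# on the absolute path (here, /home/alice).
start = PurePosixPath("/home/alice")
relative = absolute.relative_to(start)
print(relative)                      # courses/os/notes.txt

# Joining the starting folder with the relative path recovers the absolute path.
print(start / relative == absolute)  # True

# The sequence of folders traversed from the root to the file.
print(absolute.parts)

# Same file name in different folders: the names collide, the pathnames do not.
other = PurePosixPath("/home/bob/notes.txt")
print(absolute.name == other.name, absolute == other)  # True False
```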
File Links
• Allow a directory entry to point to a file (or
entry) that is not directly below it in the tree
structure
– Unix: Symbolic Link
– Windows: Shortcut
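A symbolic link can be created and followed from a program. This minimal sketch assumes a Unix-like system and uses Python's `os.symlink` (on Windows, creating symbolic links may require extra privileges, and Windows shortcuts are a separate, shell-level mechanism). The folder and file names are hypothetical.

```python
import os
import shutil
import tempfile

root = tempfile.mkdtemp()
target_dir = os.path.join(root, "projects", "report")
os.makedirs(target_dir)
target = os.path.join(target_dir, "final.txt")
with open(target, "w") as f:
    f.write("quarterly numbers\n")

# A link entry placed directly under root that points into another branch of the tree.
link = os.path.join(root, "report-link.txt")
os.symlink(target, link)  # on Windows this may require extra privileges

is_link = os.path.islink(link)            # the entry itself is a link...
stored_target = os.readlink(link)         # ...that stores the target's pathname
with open(link) as f:                     # opening the link follows it to the target
    content = f.read()

shutil.rmtree(root)                       # removes the link, the file and the folders
print(is_link, stored_target == target, content, end="")
```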
Link in Directory Tree Structure
8.3 Access Methods
• An access method describes the manner and
mechanisms by which a process accesses the
data in a file.
• There are two common access methods:
– Sequential
– Random (or Direct)
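The difference between the two access methods shows up clearly with a file of fixed-size records. In this sketch the 8-byte record size and the record contents are assumptions chosen for illustration: sequential access reads the records one after another, while direct access computes the byte position of record k as k × record size and seeks straight there.

```python
import os
import tempfile

RECORD_SIZE = 8  # hypothetical fixed record size in bytes

# Build a scratch file of five records: b"rec0....", b"rec1....", ...
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    for i in range(5):
        tmp.write(f"rec{i}".encode().ljust(RECORD_SIZE, b"."))
    path = tmp.name

with open(path, "rb") as f:
    # Sequential access: each read continues where the previous one stopped.
    sequential = [f.read(RECORD_SIZE) for _ in range(5)]

    # Random (direct) access: jump to record k without reading records 0..k-1.
    k = 3
    f.seek(k * RECORD_SIZE)
    direct = f.read(RECORD_SIZE)

os.remove(path)
print(sequential[0], direct)  # b'rec0....' b'rec3....'
```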
File Operations
When a process needs to use a file, there are a number of operations it can perform:
• Open
• Close
• Read
• Write
Create File
• Allocate space for file
• Make entry for file in the Directory
8.3.1 Open File
• Makes the file accessible for read/write operations
• Locates the file in the Directory
• Returns an internal ID for the file
– Commonly called a Handle
– handle = open(filename, parameters)
File Open
8.3.2 Close File
• Makes file no longer accessible from
application
– Deletes the Handle created by Open
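On a POSIX system the handle returned by Open is a small integer called a file descriptor. This sketch uses Python's thin wrappers `os.open`, `os.read`, and `os.close` around the corresponding system calls; the scratch file and its contents are hypothetical.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, "data.bin")
with open(path, "wb") as f:
    f.write(b"0123456789")

# open() locates the file by name and returns a small integer handle
# (a POSIX file descriptor); later calls use the handle, not the name.
handle = os.open(path, os.O_RDONLY)
first = os.read(handle, 4)

# close() releases the handle; using it afterwards fails with an OSError.
os.close(handle)
try:
    os.read(handle, 4)
    rejected = False
except OSError:
    rejected = True

os.remove(path)
os.rmdir(folder)
print(first, rejected)   # b'0123' True
```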
File Close
8.3.3 Read File
• System call specifies:
– Handle from Open call
– Memory Location, length of information to be
read
– Possibly, location in the file where data is to be
read from
– read(file handle, buffer)
– read(file handle, buffer, length)
Read File
• Uses Handle to locate file on disk
• Uses file’s Read Pointer to determine the
position in the file to read from
• Updates the file’s Read Pointer
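The sketch below, assuming a POSIX system, reads a scratch file in two 3-byte pieces and queries the pointer with `lseek(fd, 0, SEEK_CUR)` after each read. Note that POSIX keeps a single offset per open file that serves both reading and writing, rather than separate Read and Write pointers.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()          # opened for reading and writing
os.write(fd, b"ABCDEFGHIJ")
os.lseek(fd, 0, os.SEEK_SET)           # rewind so reading starts at byte 0

chunk1 = os.read(fd, 3)                # reads bytes 0..2
pos1 = os.lseek(fd, 0, os.SEEK_CUR)    # query the pointer without moving it
chunk2 = os.read(fd, 3)                # continues at byte 3
pos2 = os.lseek(fd, 0, os.SEEK_CUR)

os.close(fd)
os.remove(path)
print(chunk1, pos1, chunk2, pos2)      # b'ABC' 3 b'DEF' 6
```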
8.3.4 Write File
• System call specifies:
– Handle from Open call
– Location, length of information to be written
– Possibly, location in the file where data is to be
written
– write(file handle,buffer,length)
Write File
• Use Handle to locate file on disk
• Use file’s Write pointer to determine the
position in the file to write to
• Update file’s Write Pointer
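Writing advances the same kind of pointer. This POSIX-only sketch also illustrates the optional explicit location: `os.pwrite` writes at a given offset without using or moving the file pointer (it is not available on Windows).

```python
import os
import tempfile

fd, path = tempfile.mkstemp()

os.write(fd, b"hello ")                  # written at offset 0; pointer -> 6
os.write(fd, b"world")                   # written at offset 6; pointer -> 11
after_writes = os.lseek(fd, 0, os.SEEK_CUR)

# Explicit location: overwrite bytes 0..4 without using or moving the pointer.
os.pwrite(fd, b"HELLO", 0)
after_pwrite = os.lseek(fd, 0, os.SEEK_CUR)

contents = os.pread(fd, 64, 0)           # read up to 64 bytes from offset 0
os.close(fd)
os.remove(path)
print(after_writes, after_pwrite, contents)  # 11 11 b'HELLO world'
```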
Delete File
• Deletes entry for file in Directory
• De-allocates disk space used by the file
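Creating and deleting a file both come down to directory operations. In this POSIX-oriented sketch, `O_CREAT | O_EXCL` asks the OS to create a new directory entry and to fail if the name already exists, and `os.remove` (the `unlink` system call) removes the entry; on Unix-like systems the disk space is actually freed only once no other links or open handles refer to the file. The file name is hypothetical.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, "new_file.txt")

# Create: O_CREAT makes a new directory entry; O_EXCL makes the call fail
# if an entry with that name already exists.
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
os.write(fd, b"some data")
os.close(fd)
created = "new_file.txt" in os.listdir(folder)

# Delete: removes the directory entry; the space is released once
# nothing else refers to the file.
os.remove(path)
deleted = "new_file.txt" not in os.listdir(folder)

os.rmdir(folder)
print(created, deleted)  # True True
```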
8.3.5 Sequential Access
• If the process has opened a file for sequential
access, the File Management subsystem will
keep track of the current file position for
reading and writing.
• To carry this out, the system will maintain a
file pointer that will be the position of the
next read or write.
File Pointer
The value of the file pointer will be initialized during
Open to one of two possible values
– Normally, this value will be set to 0 to start the reading or
writing at the beginning of the file.
– If the file is being opened to append data to the file, the File
Position pointer will be set to the current size of the file.
– After each read or write, the File Position Pointer will be
incremented by the amount of data that was read or written.
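Both initial values of the file pointer can be observed directly. The sketch assumes a POSIX system. Note that POSIX `O_APPEND` actually moves the pointer to the end of the file before every write, so the explicit `lseek(..., SEEK_END)` here models the "set to the current size at Open" behaviour described above.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"0123456789")                 # the file is now 10 bytes long
os.close(fd)

# Normal open: the pointer starts at 0.
fd = os.open(path, os.O_RDONLY)
start_normal = os.lseek(fd, 0, os.SEEK_CUR)
data = os.read(fd, 4)
after_read = os.lseek(fd, 0, os.SEEK_CUR)   # advanced by the 4 bytes read
os.close(fd)

# Append: place the pointer at the current file size before writing.
fd = os.open(path, os.O_WRONLY | os.O_APPEND)
start_append = os.lseek(fd, 0, os.SEEK_END)
os.write(fd, b"AB")
after_append = os.lseek(fd, 0, os.SEEK_CUR) # advanced by the 2 bytes written
os.close(fd)
os.remove(path)

print(start_normal, after_read, start_append, after_append)  # 0 4 10 12
```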
8.3.6 Streams, Pipes, and I/O Redirection
• A Stream is the flow of data bytes, one byte
after another, into the process (for reading) and
out of the process (for writing).
• This concept applies to Sequential Access and
was originally invented for network I/O, but
several modern programming environments
(e.g. Java, C#) have also incorporated it.
Standard I/O
• Standard Input
– Defaults to keyboard
• Standard Output
– Defaults to console
I/O Redirection
• Standard Input can come from a file
– app.exe < def.txt
• Standard Output can go to a file
– app.exe > def.txt
• Standard Output from one application can be Standard Input for another
– app1.exe | app2.exe
– This is called a Pipe
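The same redirection can be set up by a program that launches another one. In this sketch the "application" is a hypothetical one-line Python program that upper-cases its standard input; `subprocess.run` connects its standard input and output to open files, just as `<` and `>` do in a shell.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for app.exe: copies standard input to standard output in upper case.
APP = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"]

folder = tempfile.mkdtemp()
in_path = os.path.join(folder, "def.txt")
out_path = os.path.join(folder, "out.txt")
with open(in_path, "w") as f:
    f.write("redirected input\n")

# Equivalent of the shell command:  app < def.txt > out.txt
with open(in_path, "rb") as src, open(out_path, "wb") as dst:
    subprocess.run(APP, stdin=src, stdout=dst, check=True)

with open(out_path) as f:
    result = f.read()

os.remove(in_path)
os.remove(out_path)
os.rmdir(folder)
print(result, end="")  # REDIRECTED INPUT
```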
A Pipe
Pipe
• A Pipe is a connection that is dynamically established
between two processes.
• When a process reads data, the data will come from
another process rather than a file. Thus, a pipe has a
process at one end that is writing to the pipe and
another process reading data at the other end of the
pipe.
• Often, one process produces output that another process needs as input.
• Rather than having the first process write to a file and
the second process read that file, we can save time by
having each process communicate via a pipe.
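A two-stage pipeline can also be built programmatically. This sketch follows the pattern given in the Python `subprocess` documentation for replacing a shell pipeline; the producer and consumer are hypothetical one-line Python programs, and the data flows between them through an OS pipe rather than through an intermediate file.

```python
import subprocess
import sys

# Two hypothetical stand-in applications, run with the current Python interpreter.
producer = [sys.executable, "-c", "for i in range(5): print(i)"]
consumer = [sys.executable, "-c",
            "import sys; print(sum(int(line) for line in sys.stdin))"]

# Equivalent of the shell command:  producer | consumer
p1 = subprocess.Popen(producer, stdout=subprocess.PIPE)
p2 = subprocess.Popen(consumer, stdin=p1.stdout, stdout=subprocess.PIPE)
p1.stdout.close()   # p2 holds its own copy; lets p1 receive SIGPIPE if p2 exits early
output, _ = p2.communicate()
p1.wait()
print(output.decode().strip())  # 10
```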
Pipe and Performance
Using a pipe can improve system performance in two
ways:
• By avoiding an intermediate file, the applications save the time that would be spent on disk I/O.
• A pipe has the characteristic that the receiving
process can read whatever data has already been
written. Thus we do not need to wait until the first
process has written all of the data before we start
executing the second process. This creates a pipeline
similar to an automobile assembly line to speed up
overall performance.
8.4 Directory Functions
• Search for a file
• Create a file
• Delete a file
• List a directory
• Rename a file
• Traverse the file system
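Each of these directory functions has a direct counterpart in most system libraries. This sketch exercises them in Python on a throwaway directory tree; the folder and file names are hypothetical.

```python
import os
import shutil
import tempfile

root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "src", "lib"))

# Create files
for parts in (("README",), ("src", "main.c"), ("src", "lib", "util.c")):
    with open(os.path.join(root, *parts), "w"):
        pass

# List a directory
listing = sorted(os.listdir(os.path.join(root, "src")))

# Rename a file
os.rename(os.path.join(root, "README"), os.path.join(root, "README.txt"))

# Traverse the file system, searching for files whose name ends in ".c"
found = []
for dirpath, dirnames, filenames in os.walk(root):
    for name in filenames:
        if name.endswith(".c"):
            found.append(os.path.relpath(os.path.join(dirpath, name), root))

# Delete a file
os.remove(os.path.join(root, "src", "main.c"))
remaining = sorted(os.listdir(os.path.join(root, "src")))
top_level = sorted(os.listdir(root))

shutil.rmtree(root)
print(listing, sorted(found), remaining, top_level)
```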
8.5 File Space Allocation
• Contiguous
– Each file is allocated a single contiguous run of disk blocks
File System Implementation
A possible file system layout
A Master Boot Record (MBR) is a special type of boot sector at
the very beginning of partitioned computer mass storage devices.
The MBR holds the information on how the logical partitions,
containing file systems, are organized on that medium.
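The classic MBR layout is simple enough to decode by hand: the 512-byte sector ends with the signature bytes 0x55 0xAA, and a table of four 16-byte partition entries starts at byte offset 446. The sketch below parses a synthetic MBR built in memory; the single partition it contains (type 0x0C, FAT32 with LBA addressing, starting at sector 2048) is a made-up example, not data from a real disk.

```python
import struct

SECTOR = 512
# One 16-byte partition entry: status, CHS start, type, CHS end, first LBA, sector count.
ENTRY = struct.Struct("<B3sB3sII")

def parse_mbr(sector: bytes):
    """Return (status, type, first_lba, num_sectors) for each used partition entry."""
    if len(sector) != SECTOR or sector[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR")
    parts = []
    for i in range(4):
        status, _, ptype, _, lba, count = ENTRY.unpack_from(sector, 446 + 16 * i)
        if ptype != 0:  # type 0x00 marks an unused entry
            parts.append((status, ptype, lba, count))
    return parts

# Synthetic MBR: one active (0x80) partition of type 0x0C,
# starting at sector 2048 and 1,048,576 sectors long (512 MiB).
mbr = bytearray(SECTOR)
ENTRY.pack_into(mbr, 446, 0x80, b"\x00" * 3, 0x0C, b"\x00" * 3, 2048, 1_048_576)
mbr[510:512] = b"\x55\xaa"
print(parse_mbr(bytes(mbr)))  # [(128, 12, 2048, 1048576)]
```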
Implementing Files (1)
(a) Contiguous allocation of disk space for 7 files
(b) State of the disk after files D and E have been removed
Contiguous Allocation
• Advantages
– Simple to implement
– Good disk I/O performance
• Disadvantages
– Need to know max file size ahead of time
– Likely to waste disk space reserved for growth
– A large enough contiguous region of free space may not be available
Contiguous Allocation
Read/Write Disk Address Calculation
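With contiguous allocation the disk address of any byte in the file follows from simple arithmetic: if the file starts at disk block s and each block holds B bytes, byte offset p lies in disk block s + floor(p / B), at offset p mod B within that block. A minimal sketch, assuming 512-byte blocks and a hypothetical file occupying blocks 100 to 109:

```python
def contiguous_address(start_block: int, length_blocks: int,
                       byte_offset: int, block_size: int = 512):
    """Map a byte offset in a contiguously allocated file to (disk block, offset in block)."""
    logical_block, offset_in_block = divmod(byte_offset, block_size)
    if not 0 <= logical_block < length_blocks:
        raise ValueError("offset lies outside the file")
    return start_block + logical_block, offset_in_block

# Hypothetical file occupying disk blocks 100..109 (10 blocks of 512 bytes).
print(contiguous_address(100, 10, 0))     # (100, 0)
print(contiguous_address(100, 10, 1300))  # (102, 276)
```

No table lookup or pointer chasing is needed, which is one reason contiguous allocation gives good disk I/O performance.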
8.5.1 Cluster Allocation
• Cluster Allocation
– Disk space allocated in blocks
– Space allocated as needed
Cluster Allocation
Implementing Files (3)
Linked list allocation using a file allocation table in RAM
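Reading a file under FAT-style linked allocation means following the chain of cluster numbers stored in the table. In this sketch the table is a Python dictionary, the cluster numbers are illustrative, and `-1` is a hypothetical end-of-chain marker (real FAT variants reserve specific values for this).

```python
EOF = -1  # hypothetical end-of-chain marker

# FAT held in RAM: fat[c] is the number of the cluster that follows cluster c.
fat = {4: 7, 7: 2, 2: 10, 10: 12, 12: EOF,   # file A starts at cluster 4
       6: 3, 3: 11, 11: 14, 14: EOF}        # file B starts at cluster 6

def cluster_chain(fat, first):
    """Follow the FAT from a file's first cluster to the end-of-chain marker."""
    chain, seen = [], set()
    cluster = first
    while cluster != EOF:
        if cluster in seen:
            raise ValueError("loop in FAT chain")
        seen.add(cluster)
        chain.append(cluster)
        cluster = fat[cluster]
    return chain

print(cluster_chain(fat, 4))  # [4, 7, 2, 10, 12]
print(cluster_chain(fat, 6))  # [6, 3, 11, 14]
```

Because the whole table is in RAM, the chain can be followed without extra disk reads, but reaching the n-th cluster of a file still takes n table lookups.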
Implementing Files (4)
An example i-node
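An i-node locates a file's blocks through a set of direct pointers followed by single, double, and triple indirect blocks. The sketch below works out which pointer covers a given logical block, assuming an ext2-like layout of 12 direct pointers, 4 KB blocks, and 4-byte block numbers (so each indirect block holds 1,024 pointers); these parameters are assumptions chosen for illustration.

```python
BLOCK_SIZE = 4096              # assumed block size in bytes
PTRS = BLOCK_SIZE // 4         # 4-byte block numbers -> 1024 pointers per indirect block
N_DIRECT = 12                  # assumed number of direct pointers in the i-node

def inode_level(block_index: int):
    """Return which i-node pointer covers logical block `block_index`,
    plus the index path through the indirect blocks."""
    i = block_index
    if i < N_DIRECT:
        return "direct", (i,)
    i -= N_DIRECT
    if i < PTRS:
        return "single indirect", (i,)
    i -= PTRS
    if i < PTRS ** 2:
        return "double indirect", divmod(i, PTRS)
    i -= PTRS ** 2
    if i < PTRS ** 3:
        return "triple indirect", (i // PTRS ** 2, (i // PTRS) % PTRS, i % PTRS)
    raise ValueError("block index beyond maximum file size")

print(inode_level(5))      # ('direct', (5,))
print(inode_level(12))     # ('single indirect', (0,))
print(inode_level(1036))   # ('double indirect', (0, 0))
```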
Cluster Allocation
• Advantages
– Tends not to waste disk space
• Disadvantages
– Additional overhead to keep track of clusters
– Can cause poor disk I/O performance
– May limit maximum size of File System
Cluster Performance
• A file’s clusters tend to be scattered around the disk
– This is called External Fragmentation
– Can cause poor performance because the disk arm needs to move a lot
– May require a de-fragmentation utility
Cluster Performance
• Large clusters can reduce External Fragmentation
– But if there are many small files, space will be wasted inside each file’s last (often only) cluster
• This is called Internal Fragmentation
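The trade-off between cluster size and internal fragmentation is easy to quantify: a file of S bytes stored in clusters of C bytes occupies ceil(S / C) clusters, so ceil(S / C) × C - S bytes are wasted in its last cluster. The file sizes below are hypothetical.

```python
def wasted_bytes(file_sizes, cluster_size):
    """Total internal fragmentation: allocated-but-unused bytes in each file's last cluster."""
    total = 0
    for size in file_sizes:
        clusters = -(-size // cluster_size)        # ceiling division
        total += clusters * cluster_size - size    # slack in the last cluster
    return total

small_files = [1024] * 1000   # hypothetical: 1,000 files of 1 KB each
for cluster in (4 * 1024, 32 * 1024):
    print(cluster // 1024, "KB clusters:", wasted_bytes(small_files, cluster), "bytes wasted")
# 4 KB clusters: 3072000 bytes wasted
# 32 KB clusters: 31744000 bytes wasted
```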
Managing Cluster Allocation
• Linked
– Each cluster has a pointer to the next cluster
• Indexed
– Single table has pointers to each of the clusters
Linked Blocks
Index Block
8.6 Real-World Systems
• Microsoft FAT
• Microsoft NTFS
• Linux Ext2, Ext3
• Others
8.6.1 MS FAT System
• FAT16 (FAT: File Allocation Table)
– MS-DOS, Windows 95
– Maximum file system size of 2 GB
– Generally suffers from bad disk fragmentation
• FAT32
– Windows 95 OSR2, Windows 98
– Supported by Windows 2000, XP, 2003
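The 2 GB limit of FAT16 follows from its cluster numbers: they are 16 bits wide, so a volume can address at most about 2^16 clusters (a few values are reserved), and the largest standard FAT16 cluster size is 32 KB (2^15 bytes):

```latex
2^{16}\ \text{clusters} \times 2^{15}\ \tfrac{\text{bytes}}{\text{cluster}} = 2^{31}\ \text{bytes} = 2\ \text{GB}
```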
The MS-DOS File System (1)
The MS-DOS directory entry
The Windows 98 File System (1)
The extended MS-DOS directory entry used in Windows 98
Cluster Sizes of FAT16 and FAT32
Drive Size         Default FAT16 Cluster Size    Default FAT32 Cluster Size
260 MB–511 MB      8 KB                          Not supported
512 MB–1,023 MB    16 KB                         4 KB
1,024 MB–2 GB      32 KB                         4 KB
2 GB–8 GB          Not supported                 4 KB
8 GB–16 GB         Not supported                 8 KB
16 GB–32 GB        Not supported                 16 KB
> 32 GB            Not supported                 32 KB
Windows FAT Table
8.6.2 Windows NTFS File System
• The NTFS file system (New Technology File System) is based on a structure called the "master file table" or MFT, which holds detailed information on files. NTFS allows long file names and, unlike FAT32, can distinguish lower-case from upper-case letters in names, although Windows treats file names case-insensitively by default.
• Available on Windows 2000, XP, 2003
• Maintains a transaction log to recover after a crash
• Support for file protection
• Large (64-bit) cluster pointers
– Allow small clusters
– Reduce internal fragmentation
Windows NTFS File System
Master File Table: contains records about the files and directories of the partition. The first record, called a descriptor, contains information on the MFT itself (a copy of it is stored in the second record). The third record contains the log file, which records the actions performed on the partition. The following records, making up what is known as the core, reference each file and directory of the partition in the form of objects with assigned attributes.
File System Structure (1)
The NTFS master file table
File System Structure (2)
The attributes used in MFT records
File System Structure (3)
An MFT record for a three-run, nine-block file
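An MFT record describes where a file's data lives as a list of runs, each a starting disk block plus a count of consecutive blocks. Translating a block number within the file to a disk block means walking that list. In this sketch the three runs (4 + 2 + 3 = 9 blocks) are illustrative values.

```python
# Hypothetical run list: each run is (first disk block, number of consecutive blocks).
runs = [(20, 4), (64, 2), (80, 3)]

def file_block_to_disk(runs, file_block):
    """Translate a block number within the file to a disk block by walking the runs."""
    if file_block < 0:
        raise ValueError("negative block number")
    for start, length in runs:
        if file_block < length:
            return start + file_block
        file_block -= length
    raise ValueError("block beyond end of file")

print([file_block_to_disk(runs, b) for b in range(9)])
# [20, 21, 22, 23, 64, 65, 80, 81, 82]
```

A long contiguous file needs only one run, so the record stays small no matter how many blocks the file has.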
8.6.3 Linux Ext2 and Ext3 File System
Ext2
• Ext2 stands for second extended file system.
• Introduced in 1993; developed by Rémy Card.
• Maximum individual file size ranges from 16 GB to 2 TB, depending on the block size.
Ext3
• Ext3 stands for third extended file system.
• Introduced in 2001; developed by Stephen Tweedie.
• Available starting with Linux kernel 2.4.15.
• Maximum individual file size ranges from 16 GB to 2 TB, depending on the block size.
UNIX File System (1)
Disk layout in classical UNIX systems
UNIX File System (3)
The relation between the file descriptor table,
the open file description
UNIX File System (2)
Directory entry fields.
Structure of the i-node
The Linux File System
Layout of the Linux Ext2 file system.