File Systems

UNIX Internals – The New Frontiers
Chapters 8 & 9
File Systems
Contents
- The User Interface to Files
- File Systems
- File System Framework
- The Vnode/VFS Architecture
- Implementation Overview
- File-System-Dependent Objects
- Mounting a File System
- Operations on Files
- The System V File System (s5fs)
- s5fs Kernel Organization
8.2 The User Interface
Files, directories, file descriptors, file systems
Files & Directories
- File: logically a container for data
- A hierarchical, tree-structured name space
- Pathname: all the components in the path from the root to the node, separated by “/”
- “.” & “..”: the directory itself and its parent directory
- Link: a directory entry for a file
Directory tree
Operations on directories
dirp = opendir(const char *filename);
direntp = readdir(dirp);
rewinddir(dirp);
status = closedir(dirp);
struct dirent {
    ino_t d_ino;
    char d_name[NAME_MAX + 1];
};
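The calls above can be combined into a short directory scan. The following is a minimal sketch using the POSIX interface; the function name count_dir_entries is an illustrative choice, not something from the slides.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <stdio.h>

/* Walk a directory with opendir/readdir/closedir.
 * Returns the number of entries seen, or -1 if the directory
 * cannot be opened. If print is nonzero, each entry's inode
 * number and name are written to stdout. */
int count_dir_entries(const char *path, int print)
{
    DIR *dirp = opendir(path);
    if (dirp == NULL)
        return -1;

    int n = 0;
    struct dirent *direntp;
    while ((direntp = readdir(dirp)) != NULL) {
        if (print)
            printf("%lu %s\n", (unsigned long)direntp->d_ino,
                   direntp->d_name);
        n++;
    }

    closedir(dirp);
    return n;
}
```

Calling count_dir_entries(".", 1) from a program lists the current directory, typically including the “.” and “..” entries.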
File Attributes
Kept in the inode (index node)
File attributes:
- File type
- Number of hard links
- File size
- Device ID
- Inode number
- User and group IDs of the owner of the file
- Timestamps
- Permissions and mode flags
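User programs read these attributes with the stat system call. The sketch below prints the fields listed above for a given path; print_attributes is a hypothetical helper name, and the output format is arbitrary.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Print the inode attributes of a file.
 * Returns 0 on success, -1 if stat() fails. */
int print_attributes(const char *path)
{
    struct stat st;
    if (stat(path, &st) == -1)
        return -1;

    const char *type = S_ISDIR(st.st_mode) ? "directory"
                     : S_ISREG(st.st_mode) ? "regular file"
                     : "other";

    printf("type:         %s\n", type);
    printf("hard links:   %lu\n", (unsigned long)st.st_nlink);
    printf("size:         %lld bytes\n", (long long)st.st_size);
    printf("device ID:    %lu\n", (unsigned long)st.st_dev);
    printf("inode number: %lu\n", (unsigned long)st.st_ino);
    printf("owner:        uid %lu, gid %lu\n",
           (unsigned long)st.st_uid, (unsigned long)st.st_gid);
    printf("modified:     %lld\n", (long long)st.st_mtime);
    printf("permissions:  %04o\n", (unsigned)(st.st_mode & 07777));
    return 0;
}
```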
Permissions and mode flags
Owner, group, others (3 x 3 bits)
Read, write, execute (3 bits)
Mode flags - apply to executable files
- suid, sgid – set the process's effective UID or GID to that of the owner or group of the file,
- sticky – retain the program image in the swap area after execution
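As an illustration of the bit layout, the following sketch decodes the nine permission bits and the three mode flags into the familiar ls -l notation. The helper name mode_to_string is an assumption made for this example.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <string.h>
#include <sys/stat.h>

/* Render the low 12 mode bits in "ls -l" style, e.g. 0755 -> "rwxr-xr-x".
 * out must have room for 10 bytes (9 characters plus the terminator). */
void mode_to_string(mode_t mode, char out[10])
{
    static const mode_t bits[9] = {
        S_IRUSR, S_IWUSR, S_IXUSR,   /* owner  */
        S_IRGRP, S_IWGRP, S_IXGRP,   /* group  */
        S_IROTH, S_IWOTH, S_IXOTH    /* others */
    };

    for (int i = 0; i < 9; i++)
        out[i] = (mode & bits[i]) ? "rwx"[i % 3] : '-';

    /* The mode flags overlay the three execute positions. */
    if (mode & S_ISUID) out[2] = (mode & S_IXUSR) ? 's' : 'S';
    if (mode & S_ISGID) out[5] = (mode & S_IXGRP) ? 's' : 'S';
    if (mode & S_ISVTX) out[8] = (mode & S_IXOTH) ? 't' : 'T';
    out[9] = '\0';
}
```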
System calls
link, unlink – create and delete hard links,
utimes – change the access and modification timestamps,
chown – change the owner UID and GID,
chmod – change permissions and mode flags.
File Descriptors
fd = open(path, oflag, mode);
fd is a per-process object.
File descriptors
File I/O
- Random and sequential access
- lseek – random access
- nread = read(fd, buf, count);
- write has similar semantics
- Operations are serialized
- In append mode, the offset pointer is set to the end of the file before each write
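The interaction between the offset pointer, lseek and read can be sketched as follows. The example writes ten known bytes to a scratch file and then reads from an arbitrary offset; read_at_offset and the /tmp path template are assumptions made for the illustration.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write "0123456789" to a scratch file, move the offset pointer to
 * `offset` with lseek, and read up to `count` bytes into buf.
 * Returns the number of bytes read, or -1 on error. */
ssize_t read_at_offset(off_t offset, char *buf, size_t count)
{
    char path[] = "/tmp/fileio-XXXXXX";
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);            /* the file persists until fd is closed */

    ssize_t nread = -1;
    if (write(fd, "0123456789", 10) == 10 &&        /* offset is now 10 */
        lseek(fd, offset, SEEK_SET) != (off_t)-1)   /* random access    */
        nread = read(fd, buf, count);               /* sequential read  */

    close(fd);
    return nread;
}
```

A read that starts at or beyond the end of the file returns 0, and a read that crosses the end returns only the bytes that exist.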
Scatter-Gather I/O
nbytes = writev(fd, iov, iovcnt);
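A gather write collects data from several buffers into one system call. In this minimal sketch, a header and a body held in separate buffers are written together; write_record is a hypothetical helper name.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* Gather a header and a body from two separate buffers into a
 * single writev call. Returns the total bytes written, or -1. */
ssize_t write_record(int fd, const char *header, const char *body)
{
    struct iovec iov[2];

    iov[0].iov_base = (void *)header;   /* writev only reads the buffers */
    iov[0].iov_len  = strlen(header);
    iov[1].iov_base = (void *)body;
    iov[1].iov_len  = strlen(body);

    return writev(fd, iov, 2);
}
```

The alternative would be to copy both pieces into one contiguous buffer first, or to issue two write calls.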
File Locking
- Read and write are atomic.
- Advisory locks: protect only against cooperating processes; flock() in 4BSD
- SVR3: mandatory locking must first be enabled on the file through chmod
- SVR4: read/write locks
- Mandatory locks: enforced by the kernel
- C library function lockf
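Byte-range locks of the kind described above are available through fcntl. The sketch below places or removes an advisory lock without blocking; lock_range is a hypothetical wrapper written for this example.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Place (F_RDLCK or F_WRLCK) or remove (F_UNLCK) an advisory lock on
 * the byte range starting at `start`; len == 0 means "to end of file".
 * Returns 0 on success, -1 if the request cannot be granted. */
int lock_range(int fd, short type, off_t start, off_t len)
{
    struct flock fl = {
        .l_type   = type,
        .l_whence = SEEK_SET,
        .l_start  = start,
        .l_len    = len
    };

    /* F_SETLK fails immediately instead of waiting for the lock. */
    return fcntl(fd, F_SETLK, &fl);
}
```

Because the lock is advisory, a process that never calls fcntl can still read and write the locked range.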
8.3 File systems
Mount-on
- a directory is covered by the mounted file system.
- mount table (original) & vfs list (modern)
Restrictions
- a file cannot span file systems,
- each file system must reside on a single logical disk
Logical Disks
A logical disk is a storage abstraction that the kernel sees as a linear sequence of fixed-size, randomly accessible blocks.
newfs, mkfs – create a file system on a logical disk
Traditional: partition – physical storage of a file system
Modern configurations:
- Volume (several disks combined)
- Disk mirroring
- Stripe sets
- RAID (Redundant Array of Inexpensive Disks)
Special files
Generalization of the file abstraction to include all kinds of I/O-related objects, such as directories, symbolic links, hardware devices (disks, terminals, printers), pseudodevices such as the system memory, and communications abstractions such as pipes and sockets.
Problems with hard links – may not span file systems, links to directories can be created by the superuser only, ownership problems.
Special files
Symbolic links – a special file that points to another file (the linked-to file); the data portion of the file contains the pathname of the linked-to file, which may instead be stored in the inode of the symbolic link (more on this in Practical UNIX Programming, pp. 90-96).
Pipes – created by the pipe system call, deleted by the kernel automatically.
FIFOs – created by the mknod system call, must be explicitly deleted.
8.5 File System Framework
- Traditional UNIX cannot support more than one type of file system.
- New developments (DOS, file sharing, RFS, NFS) required the framework to change.
- AT&T: file system switch
- Sun Microsystems: vnode/vfs
- DEC: gnode
- SVR4: (AT&T + vnode/vfs + NFS) -> de facto standard
8.6 The Vnode/Vfs Architecture
Objectives
- Support several file system types simultaneously.
- Different disk partitions may contain different types of file systems.
- Support for sharing files over a network.
- Vendors should be able to create their own file system types and add them to the kernel.
Lessons from Device I/O
- Devices: block & character
- Character device switch:
struct cdevsw {
    int (*d_open)();
    int (*d_close)();
    int (*d_read)();
    int (*d_write)();
} cdevsw[];
Major device number: used as the index into cdevsw[]
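The switch-table idea can be modeled in user space. The following is only a sketch, not kernel code: the two drivers, their behavior, the argument lists of the entry points and the dev_read wrapper are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One switch entry per character driver (user-space model). */
struct cdevsw {
    int (*d_open)(int minor);
    int (*d_close)(int minor);
    int (*d_read)(int minor, char *buf, size_t n);
    int (*d_write)(int minor, const char *buf, size_t n);
};

/* Hypothetical "null" driver: reads hit end-of-file, writes discard. */
static int null_open(int minor)  { (void)minor; return 0; }
static int null_close(int minor) { (void)minor; return 0; }
static int null_read(int minor, char *buf, size_t n)
{ (void)minor; (void)buf; (void)n; return 0; }
static int null_write(int minor, const char *buf, size_t n)
{ (void)minor; (void)buf; return (int)n; }

/* Hypothetical "zero" driver: reads fill the buffer with zero bytes. */
static int zero_read(int minor, char *buf, size_t n)
{ (void)minor; memset(buf, 0, n); return (int)n; }

static struct cdevsw cdevsw[] = {
    { null_open, null_close, null_read, null_write },   /* major 0 */
    { null_open, null_close, zero_read, null_write },   /* major 1 */
};

#define NCHRDEV (sizeof cdevsw / sizeof cdevsw[0])

/* Device-independent code: index the switch by major number, then
 * call through the driver's function pointer. */
int dev_read(unsigned major, int minor, char *buf, size_t n)
{
    if (major >= NCHRDEV)
        return -1;                       /* no such device */
    return (*cdevsw[major].d_read)(minor, buf, n);
}
```

The caller of dev_read never names a driver routine directly, so adding a driver means adding a table entry rather than changing the caller.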
read system call (in traditional UNIX)
1) Use the file descriptor to get to the open file object;
2) Check the entry to see if the file is open for read;
3) Get the pointer to the in-core inode from this entry;
4) Lock the inode so as to serialize access to the file;
5) Check the inode mode field and find that the file is a character device file;
6) Use the major device number to index into a table of character devices and obtain the cdevsw entry for this device;
7) From the cdevsw, obtain the pointer to the d_read routine for this device;
8) Invoke the d_read operation to perform the device-specific processing of the read request;
9) Unlock the inode and return to the user.
Lessons from Device I/O
- It is necessary to separate the file subsystem code into file-system-independent code and file-system-dependent code.
- The interface between these two parts is defined by a set of generic functions that are called by the file-system-independent code.
Object-Oriented Design
Overview of the Vnode/Vfs Interface
- Vnode represents a file in the UNIX kernel.
- Vfs represents a file system.
Vnode base class: data and operations pointers
- v_data: inode (s5fs), rnode (NFS), tmpnode (tmpfs)
- v_op: vnodeops
Example: to close the file associated with the vnode
#define VOP_CLOSE(vp, …) (*((vp)->v_op->vop_close))(vp, …)
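The dispatch through the operations vector can be tried out in user space. This is a simplified sketch: the vnode is cut down to three fields, the macros take only the vnode argument, and the file system "toyfs" with its toynode private data is hypothetical.

```c
#include <assert.h>

struct vnode;

/* Operations vector: one instance per file system type. */
struct vnodeops {
    int (*vop_open)(struct vnode *vp);
    int (*vop_close)(struct vnode *vp);
};

/* Base class: file-system-independent fields plus two hooks. */
struct vnode {
    unsigned short   v_count;   /* reference count              */
    struct vnodeops *v_op;      /* the "virtual function table" */
    void            *v_data;    /* file-system-private object   */
};

#define VOP_OPEN(vp)  (*((vp)->v_op->vop_open))(vp)
#define VOP_CLOSE(vp) (*((vp)->v_op->vop_close))(vp)

/* Hypothetical file system "toyfs": its private data counts opens. */
struct toynode { int opens; };

static int toy_open(struct vnode *vp)
{ ((struct toynode *)vp->v_data)->opens++; return 0; }

static int toy_close(struct vnode *vp)
{ ((struct toynode *)vp->v_data)->opens--; return 0; }

static struct vnodeops toy_vnodeops = { toy_open, toy_close };

/* Build a vnode whose hooks point at toyfs. */
struct vnode toy_make_vnode(struct toynode *tp)
{
    struct vnode v = { 1, &toy_vnodeops, tp };
    return v;
}
```

Code that calls VOP_CLOSE(vp) needs no knowledge of toyfs: the vnode carries both the operations to use and the private data they act on.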
VFS base class
8.7 Implementation Overview
Objectives
- Each operation must be carried out on behalf of the current process.
- Certain operations may need to serialize access to the file.
- The interface must be stateless and reentrant.
- File system implementations should be allowed to use global resources, such as the buffer cache.
- The interface should be usable by the server side of a remote file system.
- The use of fixed-size static tables must be avoided.
Vnodes and Open Files
- The vnode is the fundamental abstraction that represents an active file in the kernel.
- Access to a vnode:
  - by a file descriptor
  - by file-system-dependent data structures
Data structures
Reference count
The Vnode
struct vnode {
    u_short v_flag;
    u_short v_count;
    struct vfs *v_vfsmountedhere;
    struct vnodeops *v_op;
    struct vfs *v_vfsp;
    …
};
// p242
Vnode Reference Count
It determines how long the vnode must remain in the kernel.
Reference versus lock: a reference (hold) only ensures that the vnode persists; a lock prevents others from accessing it.
Acquire a reference:
- Open a file
- A process holds a reference to its current directory.
- When a new file system is mounted
- Pathname traversal routine
An unlinked file is deleted physically only when its reference count becomes zero.
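Reference counting of this kind reduces to two small operations. The sketch below is a user-space model with a cut-down vnode; vn_hold, vn_rele and the inactive field are illustrative stand-ins rather than the kernel's definitions.

```c
#include <assert.h>

/* Simplified vnode: only what is needed to show reference counting. */
struct vnode {
    unsigned short v_count;    /* number of active references          */
    int            inactive;   /* set once the last reference is gone  */
};

/* Acquire a reference (a "hold"): the vnode must stay in memory. */
void vn_hold(struct vnode *vp)
{
    vp->v_count++;
}

/* Release a reference. On the last release, the file system would be
 * told that the vnode is inactive and may be reclaimed; here that
 * notification is modeled by setting a flag. */
void vn_rele(struct vnode *vp)
{
    assert(vp->v_count > 0);
    if (--vp->v_count == 0)
        vp->inactive = 1;
}
```

Holding a reference does not serialize access; two holders can still use the vnode at the same time unless they also take a lock.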
The Vfs Object
struct vfs {
    struct vfs *vfs_next;
    struct vfsops *vfs_op;
    struct vnode *vfs_vnodecovered;
    int vfs_fstype;
    caddr_t vfs_data;
    dev_t vfs_dev;
    …
};
// p243
8.8 File-System-Dependent Objects
- The Per-File Private Data
- Vnode is an abstract object.
The vnodeops Vector
struct vnodeops {
    int (*vop_open)();
    int (*vop_close)();
    …
}; // p245
For ufs:
struct vnodeops ufs_vnodeops = {
    ufs_open,
    ufs_close,
    …
}; // p246
File-System-Dependent Parts of the Vfs Layer
struct vfsops {
    int (*vfs_mount)();
    int (*vfs_unmount)();
    int (*vfs_root)();
    int (*vfs_statvfs)();
    int (*vfs_sync)();
    …
}; // p246
8.9 Mounting a File System
- mount(spec, dir, flags, type, dataptr, datalen) // SVR4
Virtual File System Switch - a global table containing one entry for each file system type.
struct vfssw {
    char *vsw_name;
    int (*vsw_init)();
    struct vfsops *vsw_vfsops;
    …
} vfssw[];
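Mount has to turn the type name it is given into a switch entry before it can reach the file system's operations. A minimal user-space sketch of that lookup follows; the two table entries, the stub mount routines and the find_vfssw helper are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct vfsops {
    int (*vfs_mount)(const char *spec);
};

struct vfssw {
    const char    *vsw_name;       /* file system type name        */
    int          (*vsw_init)(void);
    struct vfsops *vsw_vfsops;     /* operations vector for a type */
};

/* Stub mount routines that only report which type was selected. */
static int s5_mount(const char *spec)  { (void)spec; return 1; }
static int nfs_mount(const char *spec) { (void)spec; return 2; }

static struct vfsops s5_vfsops  = { s5_mount };
static struct vfsops nfs_vfsops = { nfs_mount };

static struct vfssw vfssw[] = {
    { "s5",  NULL, &s5_vfsops  },
    { "nfs", NULL, &nfs_vfsops },
};

/* Search the switch for a type name.
 * Returns the matching entry, or NULL for an unknown type. */
struct vfssw *find_vfssw(const char *type)
{
    for (size_t i = 0; i < sizeof vfssw / sizeof vfssw[0]; i++)
        if (strcmp(vfssw[i].vsw_name, type) == 0)
            return &vfssw[i];
    return NULL;
}
```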
mount Implementation
- Adds the vfs structure to the linked list headed by rootvfs.
- Sets the vfs_op field to the vfsops vector specified in the switch entry.
- Sets the vfs_vnodecovered field to point to the vnode of the mount point directory.
VFS_MOUNT processing
- Verify permissions for the operation.
- Allocate and initialize the private data object of the file system.
- Store a pointer to it in the vfs_data field of the vfs object.
- Access the root directory of the file system and initialize its vnode in memory.
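8.10 Operations on Files
Pathname Traversal
lookuppn(): starts from u_cdir (the current directory) or, for an absolute pathname, from the root directory; then, for each component:
1. Verify that v_type of the current vnode is a directory
2. “..” at the system root – move on to the next component
3. “..” at the root of a mounted file system – access the mount point
4. VOP_LOOKUP
5. Component not found: if it is the last one – success, else – error ENOENT
6. A mount point – go to the root of the mounted vfs
7. A symbolic link – translate it and append the rest of the pathname
8. Release the directory
9. Go back to the top of the loop
10. Terminate; do not release the reference to the final vnode
// p250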
Opening a file
fd = open(pathname, mode)
1. Allocate a descriptor
2. Allocate an open file object
3. Call lookuppn()
4. Check the vnode for permissions
5. Check that the requested operations are legal for the file
6. If the file does not exist: with O_CREAT, call VOP_CREATE; otherwise return ENOENT
7. VOP_OPEN
8. If O_TRUNC, VOP_SETATTR
9. Initialize the open file object
10. Return the index of the file descriptor
// p252
Other topics
- File I/O
- File attributes
- User credentials
- Analysis
- Drawbacks of the SVR4 Implementation
- The 4.4BSD Model
Chapter 9
File System Implementations
9.2 The System V File System (s5fs)
- The layout of an s5fs partition: boot area (B) | superblock (S) | inode list | data blocks
- Directories: an s5fs directory is a special file containing a list of files and subdirectories.
Inodes
- The inode contains administrative information, or metadata.
- The inode list contains all the inodes.
- On-disk inode – see Tab. 9-1
- In-core inodes have more fields
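Inode Fields
di_mode bit-fields
Block array of the inode: di_addr (assuming 1K-byte blocks and 256 block numbers per indirect block)
- direct blocks in the inode: 10 blocks, 10K bytes
- single indirect: 256 blocks, 256K bytes
- double indirect: 256*256 = 64K blocks, 64M bytes
- triple indirect: 256*256*256 = 16M blocks, 16G bytes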
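The arithmetic behind these figures can be checked with a short program. The sketch assumes 1024-byte blocks, 10 direct addresses and 256 block numbers per indirect block; the function names are illustrative.

```c
#include <assert.h>

#define BLOCK_SIZE 1024L   /* assumed block size in bytes                 */
#define NDIRECT    10L     /* direct block addresses in di_addr           */
#define NINDIRECT  256L    /* block numbers per indirect block (1024 / 4) */

/* How many levels of indirection are needed to reach the block that
 * holds byte `offset`: 0 = direct, 1 = single, 2 = double, 3 = triple,
 * -1 = beyond the largest file the block array can describe. */
int indirection_level(long long offset)
{
    long long lbn = offset / BLOCK_SIZE;    /* logical block number */

    if (lbn < NDIRECT)
        return 0;
    lbn -= NDIRECT;
    if (lbn < NINDIRECT)
        return 1;
    lbn -= NINDIRECT;
    if (lbn < NINDIRECT * NINDIRECT)
        return 2;
    lbn -= NINDIRECT * NINDIRECT;
    if (lbn < NINDIRECT * NINDIRECT * NINDIRECT)
        return 3;
    return -1;
}

/* Largest file size in bytes that the block array can address. */
long long max_file_size(void)
{
    long long blocks = NDIRECT
                     + NINDIRECT
                     + NINDIRECT * NINDIRECT
                     + NINDIRECT * NINDIRECT * NINDIRECT;
    return blocks * BLOCK_SIZE;
}
```

Under these assumptions the first 10K bytes of a file are reachable without reading any indirect block, which favors small files.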
The superblock
- Size in blocks of the file system
- Size in blocks of the inode list
- Number of free blocks and inodes
- Free block list
- Free inode list
Free block list
9.3 s5fs Kernel Organization
In-core Inodes
- The vnode
- Device ID
- Inode number of the file
- Flags for synchronization and cache management
- Pointers to keep the inode on a free list
- Pointers to keep the inode on a hash queue
- Block number of the last block read
Allocating and Reclaiming Inodes
- Inode table (LRU) containing the active inodes
- When the reference count of a vnode drops to 0, the inode is reclaimed as free
- iget() (allocating)
Inode lookup
s5lookup()
- Checks the directory name lookup cache
- On a cache miss, reads the directory one block at a time, searching the entries for the specified file name
- If the file is in the directory, gets the inode number and uses iget() to locate the inode
- If the inode is in the table, iget() returns it; otherwise it allocates a new inode, initializes it, copies in the on-disk inode, puts it on the hash queue, and also initializes the vnode (v_op, v_data, vfs pointer)
- Returns the pointer to the inode
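The hash-queue part of this lookup can be sketched in user space. The model below keeps only the identity and reference count of an inode; toy_iget, the hash function and the table size are assumptions, and the steps that read the on-disk inode and set up the vnode are left out.

```c
#include <assert.h>
#include <stdlib.h>

#define NHASH 64

/* Simplified in-core inode: identity, reference count, hash link. */
struct inode {
    int           i_dev;      /* device ID                          */
    unsigned      i_number;   /* inode number                       */
    int           i_count;    /* references held                    */
    struct inode *i_hnext;    /* next inode on the same hash queue  */
};

static struct inode *hash_queue[NHASH];

static unsigned ihash(int dev, unsigned ino)
{
    return ((unsigned)dev + ino) % NHASH;
}

/* Return the in-core inode for (dev, ino). On a hit the existing
 * inode gains a reference; on a miss a new one is allocated and
 * placed at the head of its hash queue. Returns NULL if out of memory. */
struct inode *toy_iget(int dev, unsigned ino)
{
    unsigned h = ihash(dev, ino);

    for (struct inode *ip = hash_queue[h]; ip != NULL; ip = ip->i_hnext) {
        if (ip->i_dev == dev && ip->i_number == ino) {
            ip->i_count++;
            return ip;
        }
    }

    struct inode *ip = calloc(1, sizeof *ip);
    if (ip == NULL)
        return NULL;
    ip->i_dev    = dev;
    ip->i_number = ino;
    ip->i_count  = 1;
    ip->i_hnext  = hash_queue[h];
    hash_queue[h] = ip;
    return ip;
}
```

Two lookups of the same (device, inode number) pair return the same in-core inode, which is what lets every user of a file share one copy of its state.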
File I/O (1)
- Read (to a user buffer address)
- fd -> the open file object, verify mode -> vnode -> get the rw-lock -> call s5read()
- Offset -> block number & the offset within the block -> uiomove() -> call copyout()
- The page is not in memory? Page fault -> the handler -> s5getpage() -> call bmap()
- Logical-to-physical mapping; search the vnode's page list; if the page is not there, allocate a free page and call the disk driver to read the data from disk
- Sleeps until the I/O completes. Before copying to user data space, verifies the user has access
- s5read() returns; unlock, advance the offset, return the number of bytes read
File I/O (2)
- Write:
  - Not immediately to disk
  - May increase the file size
  - May require the allocation of data blocks
  - For a partial-block write: read the entire block, modify the relevant data, write back the whole block
Allocating and reclaiming Inodes
- When the reference count drops to 0…
- When a file becomes inactive…
- It is better to reuse inodes…
Analysis of s5fs
- Reliability concern: the superblock (a single copy)
- Performance: 2 disk I/Os (the inode, then the data) to access a file
- Blocks randomly located
- Block size: 512 (SVR2), 1024 (SVR3)
- Name: 14 characters
- Inode limit: 65535
The Berkeley Fast File System
- Hard disk structure
- On-disk organization
  - Blocks and fragments
  - Allocation policy
- FFS functionality enhancements
  - long file names
  - symbolic links
  - other enhancements
- Analysis
Other file systems
- Temporary file systems (RAM disk, mfs, tmpfs)
- The Specfs File System
- The /proc File System
Linux Virtual File System
- Uniform file system interface to user processes
- Represents any conceivable file system's general features and behavior
- Assumes files are objects that share basic properties regardless of the target file system
Primary Objects in VFS
- Superblock object: represents a specific mounted file system
- Inode object: represents a specific file
- Dentry object: represents a specific directory entry
- File object: represents an open file associated with a process