Chapter 5 Files and Directories

Source: Robbins and Robbins, UNIX Systems Programming, Prentice Hall, 2003.

5.1 UNIX File System Navigation

Navigating the UNIX File System
• Operating systems organize physical disks into file systems to provide high-level logical access to the actual bytes of a file
• A file system is a collection of files and attributes such as location and name
  – The location is stated in terms of an offset, which the operating system translates into a physical location on a disk
• A directory is a file containing directory entries that associate a file name with the physical location of a file on disk
• Most file systems organize their directories in a tree structure (see the next slide)
  – The forward slash ( / ), when by itself or as the first character in a path name, designates the root directory of the file system; it is located at the top of the file system tree, and every file and subdirectory is located in a node somewhere underneath the root
  – dirA contains the files named my1.dat and my2.dat
  – dirA also contains a subdirectory named dirB
  – dirB contains the file named m3.dat
• A file or subdirectory can be specified by either an absolute path name or a relative path name
  – An absolute path name specifies all of the nodes in the path from the root node down to the file or subdirectory in the tree; each successive node name is separated by a slash
    • Example: /dirA/dirB/m3.dat
  – A relative path name is based on the current working directory (covered later)

Tree Structure of a File System
[Figure: Tree Structure of a File System]

Current Working Directory
• Every process has a current working directory associated with it
• This directory is used for path resolution for any relative path names
  – If a path name does not start with a slash, it is resolved as though the current working directory were prepended to it to form an absolute path
• In a directory listing, the dot ( . ) specifies the current directory and the dot dot ( ..
) specifies the parent of the current directory (see next slide)
  – The root directory has dot and dot-dot pointing to itself
• The cd command can be used in a command shell to set (i.e., change to) a new current working directory
  – Example: cd /dirC
• The PWD environment variable, which the shell maintains, holds the current working directory, as shown in the example below

    PWD=/home/smithj/COSC4153

Example Directory Contents

    uxb2% ls -fl
    total 10
    drwx--x--x   7 jjt107   faculty     1024 Nov  9 20:35 .
    dr-xr-xr-x   4 root     root           4 Nov  9 19:57 ..
    drwx------   2 jjt107   faculty     2048 Nov  9 15:28 Mail
    drwx------  14 jjt107   faculty     2048 Nov  8 16:17 Files
    -rw-------   2 jjt107   faculty       12 Jul 13 16:59 .mailboxlist
    drwx--x--x  21 jjt107   faculty     1024 Nov  6 12:26 http
    -rw-------   1 jjt107   faculty        0 Aug 28 11:20 .gopherrc
    drwxr-xr-x   2 jjt107   faculty       96 Jan 27  2005 .jpiu
    -rw-------   1 jjt107   mail         161 Aug 30 22:40 .procmailrc
    drwx--x--x   2 jjt107   faculty     1024 Nov  6 12:40 Articles

getcwd() Function
• The getcwd() function returns the path name of the current working directory of a process

    #include <unistd.h>
    char *getcwd(char *buffer, size_t size);

  – The buffer parameter represents a user-supplied buffer for holding the path name
  – The size parameter specifies the maximum length path name that the buffer can accommodate, including the trailing string terminator
• If successful, the function returns a pointer to buffer; otherwise, it returns NULL and sets errno

chdir() Function
• The chdir() function causes the directory specified by the path parameter to become the current working directory for the calling process

    #include <unistd.h>
    int chdir(const char *path);

• If successful, the function returns zero; otherwise, it returns -1 and sets errno

Example use of getcwd() and chdir()

    #include <stdio.h>
    #include <unistd.h>

    #define MAX_SIZE 500

    // ********************************************
    int main(int argc, char *argv[])
    {
        int status;
        char *pathPtr;
        char buffer[MAX_SIZE];

        if (argc != 2) {
            fprintf(stderr, "Usage: a.out directory_path_name\n");
            return 1;
        } // End if

        // Change to the directory named on the command line
        status = chdir(argv[1]);
        if (status == -1) {
            perror("Problem occurred in changing current working directory");
            return 1;
        } // End if

        // Ask the system for the resulting absolute path name
        pathPtr = getcwd(buffer, MAX_SIZE);
        if (pathPtr == NULL) {
            perror("Problem obtaining current working directory");
            return 1;
        } // End if

        printf("\nThe current working directory is now %s\n", buffer);
        return 0;
    } // End main

Search Path
• When the name of an executable program is entered in a command shell, the system looks for the location of the corresponding file
  – If only the name of the file is given, the command shell searches for the executable file in the directories listed in the PATH environment variable, in the order listed
  – The PATH variable contains the fully qualified path names of particular directories, separated by colons, as shown in the example below

    PATH=/opt/local/bin:/usr/openwin/bin:/opt/local/common/bin:/usr/bin:/usr/ccs/bin:/usr/ucb:/opt/local/common/lib/wp51/wpbin:/opt/local/common/lib/wp51/shbin:/opt/local/uxb1/bin:/opt/SUNWspro/bin:/opt/local/prog/bin:.:/opt/lpp/SPSS/bin:/opt/sas/utilities/bin

The which Command
• The which command can be used within a command shell
  – It takes a list of one or more names and looks for the files that would be executed had these names been given as commands
  – It does this by searching the directories listed in the PATH variable, as shown in the example below

    uxb3% which csh
    /usr/bin/csh
    uxb3% which xwd
    /usr/openwin/bin/xwd
    uxb3%

5.2a Accessing the Contents of a Directory

opendir() Function
• The contents of a directory are accessed through the use of three functions: opendir(), readdir(), and closedir()
• The opendir() function provides a handle of type DIR * to a directory stream that is positioned at the first entry in the directory

    #include <dirent.h>
    DIR *opendir(const char *directoryName);

• If
successful, the function returns a pointer to a directory object; otherwise, it returns a NULL pointer and sets errno
• The DIR type represents a directory stream, which is an ordered sequence (not necessarily alphabetical) of all of the directory entries in a particular directory

readdir() Function
• The readdir() function reads a directory by returning successive entries in the directory stream pointed to by directoryPtr

    #include <dirent.h>
    struct dirent *readdir(DIR *directoryPtr);

• The function returns a pointer to a struct dirent structure containing information about the next directory entry
• The function moves the stream to the next position (i.e., directory entry) after each call
• If successful, the function returns a pointer to a struct dirent; otherwise, it returns a NULL pointer and sets errno
• The function also returns NULL to indicate the end of the directory, but in this case it does not change the value of errno
  – A caller that needs to tell the two cases apart sets errno to zero before the call and checks it afterward

Conceptual View of Directory Entry Storage
[Figure: a DIR record, referenced by DIR *directoryPtr, leads to a chain of struct dirent records, each holding a d_name field; the chain ends with NULL]

struct dirent structure
The following are the fields in the struct dirent structure

    struct dirent
    {
        ino_t          d_ino;      // i-number
        off_t          d_off;      // Offset into directory file
        unsigned short d_reclen;   // Length of record
        char           d_name[1];  // Entry name (a null-terminated string stored in the record, not a char * pointer)
    };

closedir() and rewinddir() Functions
• The closedir() function closes a directory stream

    #include <dirent.h>
    int closedir(DIR *directoryPtr);

  – If successful, the function returns zero; otherwise, it returns -1 and sets errno
• The rewinddir() function repositions the directory stream at its beginning

    #include <dirent.h>
    void rewinddir(DIR *directoryPtr);

  – The function does not return a value and has no errors defined

Example use of Directory Functions

    #include <stdio.h>
    #include <dirent.h>

    // *******************************
    int main(int argc, char *argv[])
    {
        DIR *directoryPtr;
        struct dirent *entryPtr;

        if (argc != 2) {
            fprintf(stderr, "Usage: a.out directory_name\n");
            return 1;
        } // End if

        directoryPtr = opendir(argv[1]);
        if (directoryPtr == NULL) {
            perror("Failed to open directory");
            return 1;
        } // End if

        // Print the name of every entry, in directory-stream order
        entryPtr = readdir(directoryPtr);
        while (entryPtr != NULL) {
            printf("%s\n", entryPtr->d_name);
            entryPtr = readdir(directoryPtr);
        } // End while

        closedir(directoryPtr);
        return 0;
    } // End main

5.2b Accessing File Status Information

stat() Function
• The stat() function accesses a file by name and retrieves status information about the file

    #include <sys/stat.h>
    int stat(const char *path, struct stat *buffer);

• The path parameter contains the directory path and name of the file
• The buffer parameter points to a user-supplied buffer into which the function stores the status information
• If successful, the function returns zero; otherwise, it returns -1 and sets errno

struct stat structure
The following are the fields in the struct stat structure

    dev_t   st_dev;    // Device ID of device containing file
    ino_t   st_ino;    // File serial number
    mode_t  st_mode;   // File mode (access permissions and file type)
    nlink_t st_nlink;  // Number of hard links
    uid_t   st_uid;    // User ID of file
    gid_t   st_gid;    // Group ID of file
    off_t   st_size;   // File size in bytes
    time_t  st_atime;  // Time of last access
    time_t  st_mtime;  // Time of last data modification
    time_t  st_ctime;  // Time of last file status change

Example use of stat()

    #include <stdio.h>
    #include <time.h>
    #include <sys/stat.h>

    // *******************************
    int main(int argc, char *argv[])
    {
        struct stat statusBuffer;
        int status;

        if (argc != 2) {
            fprintf(stderr, "Usage: a.out file_name\n");
            return 1;
        } // End if

        status = stat(argv[1], &statusBuffer);
        if (status == -1) {
            perror("Failed to get file status");
            return 1;
        } // End if

        // st_size has type off_t, so it is cast to match the format
        printf("File size    : %ld\n", (long) statusBuffer.st_size);
        // The string returned by ctime() already ends with a newline
        printf("Last accessed: %s", ctime(&statusBuffer.st_atime));
        return 0;
    } // End main

fstat() Function
• The fstat() function reports status information of a file associated with the open file descriptor fileDescriptor

    #include <sys/stat.h>
    int fstat(int fileDescriptor, struct stat *buffer);

• The buffer parameter points to a user-supplied buffer into which the function stores the status information
• If successful, the function returns zero; otherwise, it returns -1 and sets errno

Example use of fstat()

    #include <stdio.h>
    #include <time.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>

    // *******************************
    int main(int argc, char *argv[])
    {
        struct stat statusBuffer;
        int status;
        int inFile;

        if (argc != 2) {
            fprintf(stderr, "Usage: a.out file_name\n");
            return 1;
        } // End if

        inFile = open(argv[1], O_RDONLY);
        if (inFile == -1) {
            perror("Failed to open file");
            return 1;
        } // End if

        status = fstat(inFile, &statusBuffer);
        if (status == -1) {
            perror("Failed to get file status");
            close(inFile);
            return 1;
        } // End if

        printf("File size    : %ld\n", (long) statusBuffer.st_size);
        printf("Last accessed: %s", ctime(&statusBuffer.st_atime));
        close(inFile);
        return 0;
    } // End main

Determining the File Type
• The st_mode field in the struct stat structure specifies the access permissions of the file and the type of the file
• The slides for Chapter 4 UNIX I/O cover the symbolic names (e.g., S_IRUSR) for the access permissions
• The next slide specifies the macros for testing the st_mode field for the type of the file
• A regular file is a randomly accessible sequence of bytes with no further structure imposed by the system
  – UNIX stores data and programs as regular files
• Directories are files that associate file names with locations
• Special files specify peripheral devices
  – Character special files represent devices such as terminals
  – Block special files represent disk devices
• The S_ISFIFO macro tests for pipes and FIFOs that are used for inter-process communication

Macros for Testing the File Type
    S_ISBLK(mode)     Block special file
    S_ISCHR(mode)     Character special file
    S_ISDIR(mode)     Directory
    S_ISFIFO(mode)    Pipe or FIFO special file
    S_ISLNK(mode)     Symbolic link
    S_ISREG(mode)     Regular file
    S_ISSOCK(mode)    Socket

• mode is of type mode_t
• Each macro returns a nonzero value if the test is true and zero otherwise

Example use of Macros

    #include <stdio.h>
    #include <sys/stat.h>

    // *******************************
    int main(int argc, char *argv[])
    {
        struct stat statusBuffer;
        int status;

        if (argc != 2) {
            fprintf(stderr, "Usage: a.out entry_name\n");
            return 1;
        } // End if

        status = stat(argv[1], &statusBuffer);
        if (status == -1) {
            perror("Failed to get entry status");
            return 1;
        } // End if

        // Test the file-type bits of st_mode
        if (S_ISDIR(statusBuffer.st_mode))
            printf("\n%s is a directory\n", argv[1]);
        else
            printf("\n%s is not a directory\n", argv[1]);

        return 0;
    } // End main

lstat() Function
• The lstat() function accesses a file by name and retrieves status information about the file; unlike stat(), it does not follow a symbolic link named by path

    #include <sys/stat.h>
    int lstat(const char *path, struct stat *buffer);

• The path parameter contains the directory path and name of the file
• The buffer parameter points to a user-supplied buffer into which the function stores the status information
• If successful, the function returns zero; otherwise, it returns -1 and sets errno
• If path does not correspond to a symbolic link, then the stat() and lstat() functions both return the same results
• When path is a symbolic link, the lstat() function returns information about the link itself whereas the stat() function returns information about the file referred to by the link

5.3 UNIX File System Implementation

UNIX File System
• Disk formatting divides a physical disk into regions called partitions
• Each partition can have its own file system associated with it
• A particular file system can be mounted at any node in the tree of another file system
• The topmost node in a file system is called the root (or the root directory) of the file system
• In
UNIX, a single slash ( / ) is used to denote the root directory
• The next slide shows the subdirectory layout in the top level of a typical UNIX file system
  – /dev holds specifications for the devices (i.e., special files) on the system
  – /etc holds files containing information regarding the network, accounts, and other databases that are specific to the computer
  – /home is the default directory containing the subdirectories for user accounts
  – /opt is a standard location for applications
  – /usr contains files shared among applications (e.g., /usr/include contains include files)
  – /var contains system files that vary and can grow arbitrarily large (e.g., log files, incoming mail)

Typical UNIX Directory Structure
[Figure: Typical UNIX Directory Structure]

Root Directory on uxb2

    lrwxrwxrwx   1 root     root           9 Jun  2  2004 bin -> ./usr/bin
    drwxr-xr-x   4 root     root         512 Jun  2  2004 cache
    drwxr-xr-x   2 root     other        512 Jun 18  2004 cdrom
    drwxr-xr-x  14 root     sys         4096 Apr 29 09:33 dev
    drwxr-xr-x   7 root     sys          512 Jun  2  2004 devices
    drwx------   4 root     root         512 Mar 17  2005 dns
    drwxr-xr-x  60 root     sys         3584 May 20 17:02 etc
    drwxr-xr-x   2 root     sys          512 Jun  2  2004 export
    drwxr-xr-x   2 root     other        512 Mar 11  2005 ftp
    dr-xr-xr-x   3 root     root           3 Jul 12 18:56 home
    drwxr-xr-x  10 root     sys          512 Apr  5  2005 kernel
    lrwxrwxrwx   1 root     root           9 Jun  2  2004 lib -> ./usr/lib
    drwx------   2 root     root        8192 Jun  2  2004 lost+found
    drwxr-xr-x   4 root     bin          512 Jun  2  2004 mail
    drwxr-xr-x   8 root     root        1024 Mar 17  2005 mnt
    drwxr-xr-x  10 root     sys          512 Jun 30  2004 opt
    drwx------   2 root     root         512 Jun  2  2004 patch
    drwxr-xr-x  43 root     sys         1536 Jun  2  2004 platform
    dr-xr-xr-x  71 root     root      480032 Jul 12 19:29 proc
    drwx------   2 root     root         512 Jun  2  2004 radmin
    drwxr-xr-x   2 root     sys         1024 Apr  5  2005 sbin
    drwxr-xr-x   2 root     other        512 Jun 21  2004 src
    drwxrwxrwt  21 root     sys         7680 Jul 12 19:29 tmp
    drwxr-xr-x  36 root     sys         1024 Jun 18  2004 usr
    drwxr-xr-x  37 root     sys         1024 Mar 11  2005 var
    dr-xr-xr-x   6 root     root         512 May 20 17:02 vol

UNIX Inode
• Traditionally, UNIX files have been implemented with a
modified tree structure
• Directory entries contain a file name and a reference to a fixed-length structure called an inode (see the figure on the next slide)
• The inode contains information about the file size, the file location, the owner of the file, the time of last access, the time of last data modification, the time of last file status change, permissions, etc.
• The inode also contains pointers to the first few data blocks of a file
  – If the file is large, the indirect pointer points to a block of pointers that point to additional data blocks
  – If the file is still larger, the double indirect pointer points to a block of indirect pointers
  – If the file is huge, the triple indirect pointer points to a block of double indirect pointers
• A block is the smallest unit of storage that a file system allocates; its size in bytes is a power of 2
• When a system administrator creates a file system on a physical disk partition, the raw bytes are organized into data blocks and inodes
  – Each partition has its own pool of inodes that are uniquely numbered
  – Files created on that partition use inodes from that partition's pool
  – The relative layout of the disk blocks and inodes has been optimized for performance

UNIX Inode Structure
[Figure: UNIX Inode Structure. A note in the figure lists the three times kept in the inode: time of last access, time of last data modification, and time of last file status change]

Directory Implementation
• A directory is a file containing a correspondence between file names and file locations
  – UNIX has traditionally implemented a file location as an inode number
• The inode itself does not contain the file name
  – When a program references a file by path name, the operating system traverses the file system tree to find the file name and inode number in the appropriate directory
  – Once it has obtained the inode number from the directory, the operating system can determine other information about the file by accessing the inode
• Advantages of this implementation
  – Changing the file name requires only changing
the directory entry
  – A file can be moved from one directory to another just by moving the directory entry, as long as the partition doesn't change
  – Only one physical copy of the file needs to exist on the disk; however, the file may have several names or the same name in different directories (all on the same partition)
  – Directory entries are of variable length because the file name is of variable length; manipulating small variable-length structures can be done efficiently
  – Directory entries are small, since most of the information about each file is kept in its inode

5.4a Links in Directories

Hard and Soft Links
• UNIX directories contain two types of links: links and symbolic links
• A link, often called a hard link, is a directory entry that associates a file name with a file location
• A symbolic link, sometimes called a soft link, is a file that stores a string used to modify the path name when it is encountered during path name resolution
• The behavioral differences between hard and soft links in practice are often not readily obvious
• A directory entry corresponds to a single link, but an inode may be the target of several of these links
• Each inode contains the count of the number of links to the inode (i.e., the total number of directory entries that contain the inode number)
• When a program uses open() to create a file, the operating system makes a new directory entry and assigns a free inode to represent the newly created file
• The figure on the next slide shows a directory entry for a file called name1 in the /dirA directory

Directory Entry, Inode, and Data Block
[Figure: Directory Entry, Inode, and Data Block]

Creating and Removing a Link
• A user can create additional links to a file by using the ln shell command or by calling the link() function from a program
• The creation of the new link allocates a new directory entry and increments the link count of the corresponding inode; apart from the directory entry, the link uses no additional disk space
• A user can delete a file by using the rm shell command or by
calling the unlink() function from a program
• When either approach is used, the operating system deletes the corresponding directory entry and decrements the link count in the inode
• It does not free the inode and the corresponding data blocks unless the operation causes the link count to be decremented to zero

Directory Entry, Inode, and Data Block
[Figure: Directory Entry, Inode, and Data Block]

link() and unlink() Functions
• The link() function creates a new directory entry, with the path name specified by newPath, for the existing file specified by currentPath

    #include <unistd.h>
    int link(const char *currentPath, const char *newPath);

  – If successful, the function returns zero; otherwise, it returns -1 and sets errno
• The unlink() function removes the directory entry specified by path

    #include <unistd.h>
    int unlink(const char *path);

  – If the removal drops the file's link count to zero and no process has the file open, the function frees the space occupied by the file (i.e., it deletes the file)
  – If successful, the function returns zero; otherwise, it returns -1 and sets errno

Example use of link() Function and ln Command
• The figure on the next slide shows the result of creating an entry called name2 in /dirB for the existing name1 entry in /dirA
• This can be done using the ln command as shown in the following command line

    ln /dirA/name1 /dirB/name2

• This can also be done using the link() function as shown in the following code segment

    #include <stdio.h>
    #include <unistd.h>
    // . . .
    int status;

    status = link("/dirA/name1", "/dirB/name2");
    if (status == -1)
        perror("Failed to make a new link in /dirB");

Two Links to the Same File
[Figure: Two Links to the Same File, with the existing entry labeled Current Path and the new entry labeled New Path]

Example use of link() and unlink()

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    // *******************************
    int main(int argc, char *argv[])
    {
        int status;

        if ( (argc == 4) && strcmp(argv[1], "L") == 0) {
            // L: create a new hard link to an existing file
            status = link(argv[2], argv[3]);
            if (status == -1)
                perror("Failed to link file");
            else
                printf("Successfully linked %s to %s\n", argv[2], argv[3]);
        }
        else if ( (argc == 3) && strcmp(argv[1], "U") == 0) {
            // U: remove a directory entry
            status = unlink(argv[2]);
            if (status == -1)
                perror("Failed to unlink file");
            else
                printf("Successfully unlinked %s\n", argv[2]);
        }
        else {
            fprintf(stderr, "Usage: a.out L current_path new_path\n");
            fprintf(stderr, "       a.out U current_path\n");
            return 1;
        } // End if

        return 0;
    } // End main

5.4b Symbolic Links in Directories

Creating and Removing a Symbolic Link
• A symbolic link is a file containing the name of another file or directory
• A reference to the name of a symbolic link causes the operating system to locate the inode corresponding to that link
• The operating system assumes that the data blocks of the corresponding inode contain another path name
• The operating system then locates the directory entry for that path name and continues to follow the chain until it finally encounters a hard link and a real file
• A user can create a symbolic link by using the ln shell command with the -s option or by calling the symlink() function from a program
• The path name stored in a symbolic link may be fully qualified or it may be relative to the directory that contains the link
• Unlike a hard link, which is limited to a single partition, a symbolic link can refer to a file on a different partition

symlink() Function
• The symlink() function creates a symbolic link in a directory
• path1 contains the string that will be the contents of the link and
path2 contains the path name of the link
• In other words, path2 is the newly created link and path1 is what the new link points to

    #include <unistd.h>
    int symlink(const char *path1, const char *path2);

• If successful, the function returns zero; otherwise, it returns -1 and sets errno
• Unlike a hard link, a symbolic link uses a new inode

Example use of symlink() Function and ln Command
• The figure on the next slide shows the result of creating a symbolic link called name2 in /dirB for the existing name1 entry in /dirA
• This can be done using the ln command as shown in the following command line

    ln -s /dirA/name1 /dirB/name2

• This can also be done using the symlink() function as shown in the following code segment

    #include <stdio.h>
    #include <unistd.h>
    // . . .
    int status;

    status = symlink("/dirA/name1", "/dirB/name2");
    if (status == -1)
        perror("Failed to create a symbolic link in /dirB");

Ordinary File with a Symbolic Link to it
[Figure: Ordinary File with a Symbolic Link to it, with the existing entry labeled Current Path1 and the new link labeled New Path2]
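
Checking Links from a Program (sketch)
The difference between a hard link and a symbolic link can also be observed from a program. The following is a minimal sketch and is not taken from the slides: the helper names entry_kind(), link_count(), same_file(), link_target(), and describe_entry() are hypothetical, and readlink() is a POSIX call that these slides do not cover. It assumes a POSIX system and combines lstat(), stat(), the S_ISLNK family of macros, and the st_nlink, st_dev, and st_ino fields described earlier.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: classify the entry itself. lstat() is used so a
   symbolic link is reported as a link, not as the file it refers to.
   Returns 'l', 'd', '-', or '?' (error or some other file type). */
char entry_kind(const char *path)
{
    struct stat statusBuffer;

    if (lstat(path, &statusBuffer) == -1)
        return '?';
    if (S_ISLNK(statusBuffer.st_mode))
        return 'l';
    if (S_ISDIR(statusBuffer.st_mode))
        return 'd';
    if (S_ISREG(statusBuffer.st_mode))
        return '-';
    return '?';
}

/* Hypothetical helper: number of hard links to the file that path
   resolves to (stat() follows symbolic links), or -1 on error. */
long link_count(const char *path)
{
    struct stat statusBuffer;

    if (stat(path, &statusBuffer) == -1)
        return -1;
    return (long) statusBuffer.st_nlink;
}

/* Hypothetical helper: 1 if both names resolve to the same inode on the
   same device, 0 if they do not, -1 on error. */
int same_file(const char *pathA, const char *pathB)
{
    struct stat a, b;

    if (stat(pathA, &a) == -1 || stat(pathB, &b) == -1)
        return -1;
    return (a.st_dev == b.st_dev && a.st_ino == b.st_ino) ? 1 : 0;
}

/* Hypothetical helper: copy the path name stored in a symbolic link into
   buffer. readlink() does not append a string terminator, so one is
   added here. Returns 0 on success, -1 on error. */
int link_target(const char *path, char *buffer, size_t size)
{
    ssize_t length;

    assert(size > 0);
    length = readlink(path, buffer, size - 1);
    if (length == -1)
        return -1;
    buffer[length] = '\0';
    return 0;
}

/* Hypothetical helper: print one line about the entry named by path. */
int describe_entry(const char *path)
{
    struct stat statusBuffer;
    char target[1024];

    if (lstat(path, &statusBuffer) == -1) {
        perror("Failed to get entry status");
        return -1;
    }
    printf("%s: kind '%c', %ld link(s)", path, entry_kind(path),
           (long) statusBuffer.st_nlink);
    if (S_ISLNK(statusBuffer.st_mode) &&
        link_target(path, target, sizeof target) == 0)
        printf(", points to %s", target);
    printf("\n");
    return 0;
}
```

Applied to the name1/name2 example, and assuming name1 is a regular file, entry_kind("/dirB/name2") would report 'l' after ln -s and '-' after a plain ln, while link_count("/dirA/name1") would rise from 1 to 2 only in the hard-link case. In both cases same_file() reports that the two names reach the same inode, because stat() follows the symbolic link.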