System Programming
Chapter 3: File I/O

Announcement
• The first exercise is due today. We will NOT accept late assignments this time.
  – The submission site is up.
  – Sign up for an account, activate it, and submit your exercise online if possible.
  – You can also hand in your exercise to the TAs.
• MP1 is out and is due on March 21st.

Fun Project
• Topobo (MIT Media Lab)
• A new way of “programming” without “writing programs”?

Tips for Writing Good Code – Design to Test
• Avoiding errors is much easier than fixing them.
• Tips:
  – Write accessible code: be prepared to give your code away from the moment you start.
    • Document your code.
    • Give variables good names, or things can get very confusing, very fast, as the cognitive dissonance interferes with your brain's normal processing.
      – InputFromFile is better than buf1/data_2.
      – validsize(InputNum) is better than checksize(InputNum).
  – Test your code automatically, not manually.
  – Write unit tests.

Writing Unit Tests
• Write a test module for your function.
• Example: double mySqrt(double num).
• What should you test?
  – Pass in a negative argument and ensure that it is rejected.
  – Pass in an argument of zero and ensure that it is accepted.
  – Pass in values between zero and the maximum expressible argument and verify that the difference between the square of the result and the original argument is less than some value epsilon.

    #include <assert.h>
    #include <math.h>

    extern const double epsilon;   /* tolerance, assumed to be defined by the test module */
    double mySqrt(double num);     /* function under test */

    void testValue(double num, double expected)
    {
        double result = mySqrt(num);
        if (num < 0) {
            /* Should be NaN if the argument is negative */
            assert(isnan(result));
        } else {
            /* Should be within tolerance otherwise */
            assert(fabs(expected - result) < epsilon);
        }
    }

Contents
1. Preface/Introduction
2. Standardization and Implementation
3. File I/O
4. Standard I/O Library
5. Files and Directories
6. System Data Files and Information
7. Environment of a Unix Process
8. Process Control
9. Signals
10. Inter-process Communication

File I/O
• Objective of this chapter:
  – Functions available for file I/O
  – Atomic operations in multiprogramming environments
• Unbuffered I/O
  – Popular functions: open, close, read, write, lseek, dup, fcntl, ioctl
  – Each read() and write() invokes a system call!

File I/O
• File
  – A sequence of bytes
• Directory
  – A file that includes info on how to find other files.

File I/O
• Path name
  – Absolute path name
    • Starts at the root / of the file system
    • /user/john/fileA
  – Relative path name
    • Starts at the “current directory”, which is an attribute of the process accessing the path name.
    • ./dirA/fileB
• Links (wait till later chapters)

File I/O
• File Descriptor
  – Non-negative integer returned by open() or creat(): 0 .. OPEN_MAX - 1
    • Virtually unbounded for SVR4 & 4.3+BSD
  – Allocated on a per-process basis
  – POSIX.1: 0: STDIN_FILENO, 1: STDOUT_FILENO, 2: STDERR_FILENO
    • Defined in <unistd.h>
    • Convention employed by the Unix shells and applications

File I/O - File Manipulation
• Operations
  – open, close, read, write, lseek, dup, fcntl, ioctl, trunc, rename, chmod, chown, mkdir, cd, opendir, readdir, closedir, etc.
• File descriptor
  [Figure: Read(4, …) goes through the tables of opened files (per process), the system open file table, and the in-core v-node list (i-nodes) to the data blocks; sync flushes data blocks to disk]

File I/O – open()
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    int open(const char *pathname, int oflag, ... /* mode_t mode */);
• File/Path Name
  – PATH_MAX, NAME_MAX
  – If _POSIX_NO_TRUNC is in effect, a name longer than NAME_MAX or PATH_MAX is an error (ENAMETOOLONG) instead of being silently truncated.
• O_RDONLY, O_WRONLY, O_RDWR
• O_APPEND, O_TRUNC, (O_NOCTTY)
• O_CREAT, O_EXCL -> (atomicity)
• O_NONBLOCK
• O_DSYNC (write data only), O_RSYNC (read waits for write completion), O_SYNC (write data & attr)

File I/O – creat() and close()
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    int creat(const char *pathname, mode_t mode);
• = open(pathname, O_WRONLY | O_CREAT | O_TRUNC, mode)
• Only for write access

    #include <unistd.h>
    int close(int filedes);
• All open files are automatically closed by the kernel when a process terminates.

Why Do We Need Atomicity?
    fd = open("foo", O_WRONLY | O_CREAT | O_EXCL, mode);
Without O_CREAT | O_EXCL, the same "create only if missing" logic has to be written as …
    if ((fd = open("foo", O_WRONLY)) < 0) {
        /* Multiprogramming can do something bad here: another process may
           create "foo" at this point, and creat() below would truncate it. */
        if (errno == ENOENT) {
            if ((fd = creat("foo", mode)) < 0) {
                err_sys("create error");
            }
        …..
• One complex operation made of multiple function calls -> either all steps are executed or none.

File I/O - lseek
    #include <sys/types.h>
    #include <unistd.h>
    off_t lseek(int filedes, off_t offset, int whence);
• Changes the current offset of an open file descriptor
• whence: SEEK_SET (beginning), SEEK_CUR (current position), SEEK_END (end of file)
• Example: fd = open(); lseek(fd, 100, SEEK_SET)
• (Say you lose track) how do you find the current offset?
    currpos = lseek(fd, 0, SEEK_CUR)
• off_t: typedef long off_t; /* 2^31 bytes */
  – or typedef longlong_t off_t; /* 2^63 bytes */
• No I/O takes place until the next read or write.

File I/O - lseek
• Program 3.1 – Page 52
  – Tests whether “standard input” is capable of seeking.
  – cat < /etc/motd | seek -> cannot seek: lseek() on a FIFO or pipe fails with ESPIPE
  – seek < /var/spool/cron/FIFO
• Program 3.2 – Page 53, hole creating!
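The hole-creating idea can be sketched as follows. This is a minimal illustration, not the textbook's Program 3.2: the helper name make_file_with_hole and its gap parameter are hypothetical, and the byte values are chosen only so that a gap of 2 produces the 6-byte file (a b c \0 \0 \n) dumped on this slide.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write "abc", skip `gap` bytes with lseek(), then
 * write a newline. The skipped range is a hole that reads back as zero
 * bytes. Returns 0 on success, -1 on error. */
int make_file_with_hole(const char *path, off_t gap)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    if (write(fd, "abc", 3) != 3)                /* offset is now 3 */
        goto fail;
    if (lseek(fd, gap, SEEK_CUR) == (off_t)-1)   /* offset is now 3 + gap, no I/O yet */
        goto fail;
    if (write(fd, "\n", 1) != 1)                 /* extends the file past the hole */
        goto fail;

    return close(fd);

fail:
    close(fd);
    return -1;
}
```

Calling make_file_with_hole("file.hole", 2) and then running od -c on the result is one way to inspect the zero-filled gap.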
  – od -c file.hole
      000000   a   b   c  \0  \0  \n
      000006

File I/O – read and write
    #include <unistd.h>
    ssize_t read(int filedes, void *buf, size_t nbytes);
• Fewer than nbytes bytes may be read:
  – EOF, terminal device (line input), network buffering, record-oriented devices (e.g., tape)
  – The offset is advanced on every read()
  – SSIZE_MAX: the maximum value of ssize_t

    #include <unistd.h>
    ssize_t write(int filedes, const void *buf, size_t nbytes);
  – Write errors are usually caused by a full disk or by exceeding the file-size limit.
  – When O_APPEND is set, the file offset is set to the end of the file before each write operation.

File I/O - Efficiency
• Program 3.3 – Page 56
  – No need to open/close standard input/output
  – Copy stdin to stdout (> /dev/null)
  – Try I/O redirection when reading a 1.4 MB file
•
    Buffer size   User CPU   System CPU   Clock     #loops
    1             23.8s      397.9s       423.4s    1468802
    64            0.3s       6.6s         7.0s      22951
    512           0.0s       1.0s         1.1s      2869
    1024          0.0s       0.6s         0.6s      1435
    8192          0.0s       0.3s         0.3s      180
    131072        0.0s       0.3s         0.3s      12

File I/O – Sharing
  [Figure: Read(4, …) goes through the tables of opened files (per process), the system open file table, and the in-core v-node list (i-nodes) to the data blocks; sync flushes data blocks to disk]
• Table per process: filedes flags (close-on-exec), a pointer to a system open file table entry
• System open file table: file status flags (open flags), offset, a v-node pointer
• V-node (V = virtual)
  – i-node: owner, file size, residing device, block pointers, ..

File I/O – Sharing
  [Figure: tables of opened files (per process): filedes flags; system open file table: file status flags / offset; in-core i-node list: file size]
• Each “independently opened file” has its own offset.
• Examples
  – write(): the offset is incremented!
  – O_APPEND: offset = current file size before each write
  – lseek() causes no I/O (it only updates the system open file table)
  – fork() & dup() cause the sharing of entries in the (system open) file table.
  – filedes flags versus file status flags

Why Do We Need Atomic Operations?
• Atomic Operation
  – Composed of multiple steps (function calls)?
• Say you want to append to the end of a file
• What can go wrong in multiple processes?
    if (lseek(fd, 0L, SEEK_END) < 0)
        err_sys("lseek err");
    /* Another process may append here; the write below would then overwrite its data. */
    if (write(fd, buf, 10) != 10)
        err_sys("wr err");
• Problem solved with O_APPEND
    fd = open(pathname, O_WRONLY | O_APPEND);
    write(fd, buf, 10);

File I/O – pread and pwrite
    #include <unistd.h>
    ssize_t pread(int filedes, void *buf, size_t nbytes, off_t offset);
    ssize_t pwrite(int filedes, const void *buf, size_t nbytes, off_t offset);
• Read from and write to a filedes at a “specified” offset.
• Why create them? (= lseek + read/write ?)
  • They do not update the file offset
  • The seek and the I/O are one atomic operation

File I/O – dup and dup2
    #include <unistd.h>
    int dup(int filedes);
    int dup2(int filedes, int newfiledes);
• dup(): duplicates filedes & returns the lowest available filedes.
• dup2(): duplicates filedes onto newfiledes
  – What if newfiledes is already open? It is closed first.
  – Like close(newfiledes); fcntl(filedes, F_DUPFD, newfiledes); but dup2() is atomic
  [Figure: tables of opened files (per process), system open file table, in-core i-node list]

File I/O: sync(), fsync(), fdatasync()
• The kernel maintains a buffer cache between apps & disks.
    #include <unistd.h>
    int fsync(int filedes);      /* data + attr sync */
    int fdatasync(int filedes);  /* data sync */
    void sync(void);             /* returns immediately */
  [Figure: Apps -> write() -> kernel buffer cache -> disk DD queue -> disks]

File I/O - fcntl
    #include <sys/types.h>
    #include <unistd.h>
    #include <fcntl.h>
    int fcntl(int filedes, int cmd, ... /* int arg */);
• Changes the properties of opened files
• F_DUPFD: duplicate an existing file descriptor (new descriptor >= arg).
  – FD_CLOEXEC is cleared (for exec()).
• F_GETFD, F_SETFD: file descriptor flags, e.g., FD_CLOEXEC
• F_GETFL, F_SETFL: file status flags
  – O_APPEND, O_NONBLOCK, O_SYNC, O_ASYNC, O_RDONLY, O_WRONLY …
      val = fcntl(fd, F_GETFL, 0); val |= flags; fcntl(fd, F_SETFL, val);
• F_GETOWN, F_SETOWN

File I/O - fcntl
• Figure 3.10 – Print file flags for a specified descriptor
• Figure 3.11 – Turn on one or more flags
• val &= ~flags: clear the flag.
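The get / modify / set pattern for file status flags can be sketched as below. This is a variant written for illustration, not the textbook's Figure 3.11: these versions of set_fl and clr_fl return an int status instead of calling err_sys, and clr_fl is a hypothetical companion name for the "val &= ~flags" case.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Turn on the given file status flags for fd.
 * Returns 0 on success, -1 on error. */
int set_fl(int fd, int flags)
{
    int val = fcntl(fd, F_GETFL, 0);   /* fetch the current status flags */
    if (val < 0)
        return -1;
    val |= flags;                      /* turn the requested flags on */
    return fcntl(fd, F_SETFL, val) < 0 ? -1 : 0;
}

/* Turn off the given file status flags for fd.
 * Returns 0 on success, -1 on error. */
int clr_fl(int fd, int flags)
{
    int val = fcntl(fd, F_GETFL, 0);
    if (val < 0)
        return -1;
    val &= ~flags;                     /* turn the requested flags off */
    return fcntl(fd, F_SETFL, val) < 0 ? -1 : 0;
}
```

Reading the flags first matters: calling F_SETFL with only the new flag would silently clear every other status flag already set on the descriptor. Which flags F_SETFL can actually change depends on the system.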
• set_fl(STDOUT_FILENO, O_SYNC);

Synchronization (Linux ext2)
    Operation                                                 Clock time
    Normal write() to disk                                    6.87
    write() to disk file with O_SYNC set                      6.83
    write() to disk followed by fdatasync()                   18.28
    write() to disk followed by fsync()                       17.95
    write() to disk with O_SYNC set followed by fsync()       17.95

Synchronization (Mac OS X)
    Operation                                                 Clock time
    Normal write() to disk                                    14.40
    write() to disk file with O_FSYNC set                     22.48
    write() to disk followed by fsync()                       14.12
    write() to disk with O_FSYNC set followed by fsync()      22.12

File I/O - ioctl
    #include <unistd.h>
    #include <sys/ioctl.h>
    int ioctl(int filedes, int request, ...);
• Catchall for I/O operations (not just disk I/O)
  – E.g., setting the size of a terminal's window.
• SVR4 prototype
• More headers could be required:
  – Disk labels (<disklabel.h>), file I/O (<ioctl.h>), mag tape (<mtio.h>), socket I/O (<ioctl.h>), terminal I/O (<ioctl.h>)

File I/O - /dev/fd
• /dev/fd/n
  – open("/dev/fd/n", mode) duplicates descriptor n (assuming that n is open)
    • open("/dev/fd/0", mode) is the same as fd = dup(0)
    • The new mode must be a subset of that of the referenced file.
  – Uniformity and Cleanliness!
    • cat /dev/fd/0

What did I learn today?
• What can you do with File I/O?
• Why is atomicity needed?
• What is the impact of sync on I/O performance?
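As a recap of the atomicity question, the two atomic idioms from this chapter (O_CREAT | O_EXCL for creation, O_APPEND for appending) might be wrapped as shown below. This is a minimal sketch under stated assumptions: create_exclusive and append_record are hypothetical helper names, and error handling is reduced to returning -1 with errno set.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create `path` only if it does not exist yet.
 * The existence check and the creation happen in one open() call, so two
 * processes cannot both succeed. Returns a write-only descriptor, or -1
 * (errno == EEXIST when the file is already there). */
int create_exclusive(const char *path, mode_t mode)
{
    return open(path, O_WRONLY | O_CREAT | O_EXCL, mode);
}

/* Hypothetical helper: append `len` bytes to an existing file.
 * With O_APPEND the kernel moves the offset to the end of the file as part
 * of each write(), so no separate lseek() is needed. Returns the number of
 * bytes written, or -1 on error. */
ssize_t append_record(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_APPEND);
    if (fd < 0)
        return -1;

    ssize_t n = write(fd, buf, len);
    int saved_errno = errno;   /* close() may overwrite errno */
    close(fd);
    errno = saved_errno;
    return n;
}
```

A second call to create_exclusive() on the same path is expected to fail with EEXIST, which is the signal a lock-file style protocol would rely on.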