Types of files
Command line arguments
File input and output functions
Binary files
Random access

Data stored in main memory does not persist
Most programs require that the user be able to store and retrieve data from previous sessions
  In persistent storage, such as a hard disk
There are many forms of persistent storage
  Which suggests that the low level processes for accessing them are different
A file is a high level representation that allows us to ignore low level details
Files have formats
  A set of rules that determine the meaning of a file's contents

To read a file
  Know (or find) its name
  Open it for reading
  Read in the data in the file
  Close it
There are similar processes for writing to files

A file represents a section of storage
Files are viewed as contiguous sequences of bytes which can be read individually
  In reality a file may not be stored sequentially and may not be read one byte at a time
  Such details are the responsibility of the OS
C has two ways to view files
  Text
  Binary

There is a distinction between text files and binary files
  Text files store all data as text whereas binary files store the underlying binary representation
In addition C allows for both text and binary views of files
  Usually the binary view is used with binary files

In addition to types of files and views of files C has a choice of I/O levels
  Low level I/O uses the fundamental I/O given by the OS
  Standard high level I/O uses a standard package of library functions
ANSI C only supports standard I/O since not all OS I/O can be represented by one low level model
  We will only look at standard I/O

C automatically assigns input and output to standard files for some I/O functions, e.g.
  getchar(), gets(), scanf(), printf(), puts()
There are three standard devices for I/O
  Standard input is set to the keyboard
  Standard output is set to the display
  Standard error is set to the display
Redirection causes other devices or files to be used for standard input or output

The input or output of a C program can be redirected to a file
  When the program is run at the command prompt
    ▪ < file redirects standard input to the named file
    ▪ > file redirects standard output to the named file
To redirect input for a program called a3 so that it is read from a file called a3test
    ./a3 < a3test
    ▪ ... which is what we are doing when marking some of your assignments ...

Command line arguments are additional input for programs
  For example, gcc takes a number of command line arguments, such as in
    gcc -o hello helloworld.c
Our C programs can also take command line arguments
  Another form of declaring the main function gives the main function two arguments

    int main(int argc, char* argv[])

  argc is the number of arguments, short for argument count
    The count is one more than the number of additional arguments, since the command name is included
  argv is an array of strings (the arguments)
    The first string is the name of the command, the remaining strings are the additional arguments

The value of argc is derived
  It does not have to be entered by the user
The first element of argv is the name of the executable (the program name)
  On most systems
The second and subsequent elements of argv are the arguments
  In the order in which they were entered

Write a program to count the number and types of characters in a file
  The file to be read will be given as a command line argument to the program
The program will exit under two conditions
  The wrong number of arguments is given to the program
  The file cannot be opened

First test to make sure the user has entered the command correctly, then attempt to open the file for reading

    #include <stdio.h>
    #include <stdlib.h>   // required for exit
    #include <ctype.h>    // required for character comparisons

    // Forward Declarations
    void countCharacters(FILE* fp, char* fName);

    int main(int argc, char* argv[])   // argc is the size of argv
    {
        FILE* fp;

        // Test the number of arguments i.e. command and one argument
        if(argc != 2){
            // argv[0] is printed because in Unix (or Linux) the program can be given different names
            printf("%s requires file name\n", argv[0]);
            exit(1);
        }

        // Attempt to open file; argv[1] is the argument, which should be a file name
        // fopen returns NULL if the file cannot be opened
        if((fp = fopen(argv[1], "r")) == NULL){
            printf("Cannot open %s\n", argv[1]);
            exit(1);   // 1 is the usual value of EXIT_FAILURE, note that exit
                       // will exit the program from any function
        }

        countCharacters(fp, argv[1]);   // processes the file
        fclose(fp);
        return 0;
    }

Documentation and variable declarations; note that the pre-condition is documented
Then go through the file one character at a time, and then print the number of characters

    // Prints the count of characters in a file, by:
    //   alpha
    //   digits
    //   whitespace
    //   other
    // PRE: fp can be opened and read
    // PARAM: fp is a pointer to a file to be read
    void countCharacters(FILE* fp, char* fName)
    {
        // Variable declarations
        int alpha = 0;
        int digits = 0;
        int white = 0;
        int other = 0;
        int total = 0;
        int ch;   // int rather than char so that EOF can be told apart from valid characters

        // Read file one character at a time, processing each character until end-of-file
        // It's an if ... else if ... else statement to minimize comparisons
        // and to ensure that other is counted correctly
        while((ch = getc(fp)) != EOF){
            if(isalpha(ch)){
                alpha++;
            }else if(isdigit(ch)){
                digits++;
            }else if(isspace(ch)){
                white++;
            }else{
                other++;
            }
        }
        total = alpha + digits + white + other;

        // Print number of characters, then the count of each type of character
        printf("%s contains %d characters\n", fName, total);
        printf("%d letters\n", alpha);
        printf("%d digits\n", digits);
        printf("%d whitespace\n", white);
        printf("%d other\n", other);
    }

Here is a sample run of the program (screenshot not reproduced); its callouts read:
  changes directory to the directory containing the .exe
  no such file
  it's a Word document
  no file name argument

There is no need to use command line arguments with the preceding program
  It's just an example of using them
A different version of the program could allow the user to process multiple files
  With a loop that ended the program when the user wanted
  In which case it would not make sense to have the file name as a command line argument

The fopen function is used to open files
  It returns a pointer to a FILE structure
  The FILE structure is defined in stdio.h and contains data about the file
  If the file cannot be opened fopen returns the null pointer
fopen takes two string arguments
  The name of the file to be opened
  The mode in which the file is to be opened

Mode    Meaning
"r"     opens text file for reading
"w"     opens text file for writing, overwrites existing files, creates new files
"a"     opens text file for writing, appends to the end of existing files, creates new files
"r+"    opens text file for update (both reading and writing)
"w+"    opens text file for update (both reading and writing), overwrites existing files, creates new files
"a+"    opens text file for update (both reading and writing), the whole file can be read but writing only appends to the end of the file
"rb", "wb", ...
        the same as the preceding modes except that binary rather than text mode is used

The functions getc and putc can be used for character based file I/O
  They are similar to getchar and putchar except that they require a file argument
The getc function will return the EOF value if it has reached the end of a file
  To avoid trying to process empty files check for EOF before processing the first character

Files should be closed when finished with
  Using the fclose function which takes a file pointer
The fclose function flushes buffers as required, and allows the file to be correctly opened again
The fclose function returns 0 if a file was closed successfully and EOF if it was not
  Files can be unsuccessfully closed if the disk is full or if their drive is removed

There are file I/O functions similar to the I/O functions we’ve been using
  Each function takes a FILE pointer
    ▪ Which could be stdin or stdout if input is to be from the keyboard, or output to the display
The fprintf and fscanf functions work just like printf and scanf except with files
  The file pointer is an additional first argument
    ▪ The file pointer is the last argument for putc
The rewind function moves the file position back to the start of the file

The fgets function is used for string input
  The first argument is the address of a string
  The second is the maximum length of the string, including the terminating null character
  The third is the file from which input is read
  fgets returns NULL when it encounters an EOF
The fputs function is used for string output
  It has arguments for a string and a file pointer
  It does not append a newline when it prints
    ▪ Unlike puts which does

This example opens the file for appending and reading (a+), adds words to the end of the file, then prints the entire contents of the file

    #include <stdio.h>
    #include <string.h>   // required for strcspn

    // Maximum lengths of file names and names
    #define FNAME_LEN 20
    #define NAME_LEN 40

    int main()
    {
        char fname[FNAME_LEN];
        char name[NAME_LEN];
        FILE* fp;

        printf("Enter the name of the file: ");
        // fgets is used rather than gets, which cannot limit the length of its input
        if(fgets(fname, FNAME_LEN, stdin) == NULL){
            return 1;
        }
        fname[strcspn(fname, "\n")] = '\0';   // remove the trailing newline

        // Open for append and read, will create a new file if fname does not exist
        if((fp = fopen(fname, "a+")) == NULL){
            printf("Cannot open %s\n", fname);
            return 1;
        }

        // Add words to the end of the file
        puts("Enter names to add to the file");   // puts prints a newline
        // The while loop continues until the user presses enter twice in sequence
        // (i.e. enters an empty line); fgets keeps the newline at the end of each name
        while(fgets(name, NAME_LEN, stdin) != NULL && name[0] != '\n'){
            fprintf(fp, "%s", name);   // similar to printf, can be used to format numeric values
        }

        // Then print the entire contents of the file
        puts("File contents\n");
        rewind(fp);   // goes back to the start of the file
        // fgets is used instead of fscanf since names consist of two words
        while(fgets(name, NAME_LEN, fp) != NULL){
            printf("%s", name);
        }
        fclose(fp);
        return 0;
    }

Here is a sample run of the program (output not reproduced)
  Note that the new names have been appended to the existing file rather than over-writing the file

All of the examples have involved string and character storage
Consider storing numeric data
  Storing integers is straightforward
  But what about storing floating point values?
We could use fprintf for floating point values
  e.g. fprintf(fp, "%f", num);
  But this entails making decisions about the format specifier

If fprintf stores numeric values they are converted to characters and stored as text
This may waste space if the number contains many digits (e.g.
1.0/3)
Or may lose precision if the format specifier is used to fix decimal places
    ▪ fprintf(fp, "%.2f", 1.0/3);
An alternative is to store the same pattern of bits used to represent the value

A binary file stores data using the same representation as a program
  Numeric data are not converted to strings
The functions fread and fwrite are used for binary I/O
  They are a little more complex than text file functions
  They require information about the size of data to be stored

    size_t fwrite(const void* ptr, size_t size, size_t nmemb, FILE* fp)

  ptr is the address of the first memory location to be written
  size is the size of the variables
  nmemb is the number of variables
  fp is the file pointer
  size_t is a type, defined in terms of other C standard types, and is an unsigned integer type (often unsigned int or unsigned long)
    size_t is the type returned by sizeof

The structure of fwrite allows it to store entire arrays in one function call

    double temperatures[365];
    fwrite(temperatures, sizeof(double), 365, fp);

The return value of fwrite is the number of items successfully written to the file
  This should equal the nmemb parameter
The fread function takes the same set of arguments as fwrite
  The ptr argument is the address in memory to read the data into
  fread should be used to read files that were written using fwrite

    double temperatures[365];
    fread(temperatures, sizeof(double), 365, fp);

It may be useful to move to a particular location in a file
  Without reading the preceding part of the file, much like indexing into an array
  This is known as random access
The fseek and ftell functions allow random access to files
  They are usually used with binary files

The fseek function has three arguments
  A file pointer to the file
  An offset indicating the distance to be moved from the starting point
  The mode which identifies the starting point
    ▪ SEEK_SET – the beginning of the file
    ▪ SEEK_CUR – the current position
    ▪ SEEK_END – the end
fseek returns 0 normally and a non-zero value (-1 on many systems) for an error
  Such as attempting to move to a position before the start of the file
The ftell function returns the current position in a
file, as a long
  The number of bytes from the start of the file
fseek and ftell may differ based on the OS
  Since the distance that fseek moves is measured in bytes they are normally used for binary files
ANSI C introduced fgetpos and fsetpos for use with larger file sizes

This example creates an array of random values and writes them to a binary file
  The user is then asked for an index value
  The program finds and prints the value with that index in the file using fseek and fread

    #include <stdio.h>
    #include <stdlib.h>   // rand and RAND_MAX are defined in stdlib.h

    #define ARR_SIZE 100   // length of the array

    int main()
    {
        // Declarations
        double numbers[ARR_SIZE];
        double value;
        int i;
        long pos;
        const char* fname = "numbers.dat";
        FILE* fp;

        // Create the array to be written to the file
        // This is probably unnecessarily complicated but it produces an ordered array
        // of doubles with digits to the right of the decimal point
        for(i = 0; i < ARR_SIZE; ++i){
            numbers[i] = i + (double)rand() / RAND_MAX;
        }

        // Open file for writing a binary file
        if((fp = fopen(fname, "wb")) == NULL){
            fprintf(stderr, "Could not open %s.\n", fname);
            exit(1);
        }

        // Write array in binary format
        // Arguments: the array, size of each value, number of values, file pointer
        fwrite(numbers, sizeof(double), ARR_SIZE, fp);
        fclose(fp);

        // Open file for reading a binary file
        if((fp = fopen(fname, "rb")) == NULL){
            fprintf(stderr, "Could not open %s.\n", fname);
            exit(1);
        }

        // Read array elements as requested
        printf("Enter index in range 0 to %d: ", ARR_SIZE-1);
        if(scanf("%d", &i) != 1){
            i = -1;   // input that is not a number is treated as a request to quit
        }
        while(i >= 0 && i < ARR_SIZE){
            pos = (long) i * sizeof(double);        // position in file to be read
            fseek(fp, pos, SEEK_SET);               // move to position
            fread(&value, sizeof(double), 1, fp);   // binary read
            printf("value at index %d = %.2f\n", i, value);
            printf("Enter index (out of range to quit): ");
            if(scanf("%d", &i) != 1){               // get next position
                i = -1;
            }
        }
        fclose(fp);
        return 0;
    }

Here is a sample run of the program (output not reproduced)
  Note that the binary file is not comprehensible by humans
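
The random access example does not check the values returned by fwrite, fseek and fread. The following is a minimal sketch of how those checks might look, and of how ftell could be used to find out how many values a binary file holds. The function names write_doubles, count_doubles and read_double_at are hypothetical and are not part of the original example. The sketch also assumes that seeking to SEEK_END works for binary files, which is the case on common systems although the C standard does not guarantee it.

```c
#include <stdio.h>

/* Hypothetical helper: writes n doubles to a binary file.
   Returns 0 on success and -1 on failure. */
int write_doubles(const char *fname, const double *data, size_t n)
{
    FILE *fp = fopen(fname, "wb");
    size_t written;

    if (fp == NULL) {
        return -1;
    }
    /* fwrite returns the number of items actually written */
    written = fwrite(data, sizeof(double), n, fp);
    /* fclose can fail too, for example if the final flush fails */
    if (fclose(fp) != 0 || written != n) {
        return -1;
    }
    return 0;
}

/* Hypothetical helper: returns the number of doubles stored in the
   file, or -1 on error. Assumes SEEK_END is supported for binary files. */
long count_doubles(const char *fname)
{
    FILE *fp = fopen(fname, "rb");
    long bytes;

    if (fp == NULL) {
        return -1;
    }
    if (fseek(fp, 0L, SEEK_END) != 0) {
        fclose(fp);
        return -1;
    }
    bytes = ftell(fp);   /* position at the end = size of the file in bytes */
    fclose(fp);
    if (bytes < 0) {
        return -1;
    }
    return bytes / (long)sizeof(double);
}

/* Hypothetical helper: reads the double stored at the given index.
   Returns 0 on success and -1 if the value could not be read. */
int read_double_at(const char *fname, long index, double *out)
{
    FILE *fp = fopen(fname, "rb");
    int ok;

    if (fp == NULL) {
        return -1;
    }
    /* fread returns the number of items read, so anything other
       than 1 means the value at that index was not available */
    ok = index >= 0
         && fseek(fp, index * (long)sizeof(double), SEEK_SET) == 0
         && fread(out, sizeof(double), 1, fp) == 1;
    fclose(fp);
    return ok ? 0 : -1;
}
```

For instance, once the preceding program has created numbers.dat, count_doubles("numbers.dat") would be expected to return 100, and read_double_at("numbers.dat", 3, &value) would be expected to store the fourth value in value. Reporting failure through the return value leaves the caller free to decide whether to exit or to ask the user for a different index.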