Files


Types of files
Command line arguments
File input and output functions
Binary files
Random access

Data stored in main memory does not persist
 Most programs require that the user be able to
store and retrieve data from previous sessions
 In persistent storage, such as a hard disk

There are many forms of persistent storage
 Which suggests that the low level processes for
accessing them are different
 A file is a high level representation that allows us
to ignore low level details

Files have formats
 A format is a set of rules that determines the
meaning of a file's contents

To read a file
 Know (or find) its name
 Open it for reading
 Read in the data in the file
 Close it

There are similar processes for writing to files


A file represents a section of storage
Files are viewed as contiguous sequences of
bytes which can be read individually
 In reality a file may not be stored sequentially and
may not be read one byte at a time
 Such details are the responsibility of the OS

C has two ways to view files
 Text
 Binary

There is a distinction between text files and
binary files
 Text files store all data as text whereas binary files
store the underlying binary representation

In addition C allows for both text and binary
views of files
 Usually the binary view is used with binary files



In addition to types of files and views of files
C has a choice of I/O levels
Low level I/O uses the fundamental I/O given
by the OS
Standard high level I/O uses a standard
package of library functions
 ANSI C only supports standard I/O since all OS I/O
cannot be represented by one low level model

We will only look at standard I/O

C automatically assigns input and output to
standard files for some I/O functions
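 e.g. getchar(), gets(), scanf(), printf(), puts()
▪ gets() cannot limit the length of its input and was removed from
the C standard in C11; fgets() is the usual replacement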

There are three standard devices for I/O
 Standard input is set to the keyboard
 Standard output is set to the display
 Standard error is set to the display

Redirection causes other devices or files to be
used for standard input or output

The input or output of a C program can be
redirected to a file
 When the program is run at the command prompt
▪ < file redirects standard input so that it is read from the named file
▪ > file redirects standard output so that it is written to the named file

To redirect input for a program called a3 so that it
is read from a file called a3test
 ./a3 < a3test
▪ ... which is what we are doing when marking some of your
assignments ...

Command line arguments are additional
input for programs
 For example, gcc takes a number of command line
arguments, such as in gcc -o hello helloworld.c

Our C programs can also take command line
arguments
 Another form of declaring the main function gives
the main function two arguments
int main(int argc, char* argv[])
▪ argc – number of arguments; short for argument count, the count
is one more than the number of additional arguments
▪ argv – array of strings (the arguments); the first string is the
name of the command, the remaining strings are the additional arguments

The value of argc is derived
 It does not have to be entered by the user

The first element of argv is the name of the
executable (the program name)
 On most systems

The second and subsequent elements of argv
are the arguments
 In the order in which they were entered

Write a program to count the number and
types of characters in a file
 The file to be read will be given as a command line
argument to the program

The program will exit under two conditions
 The wrong number of arguments is given to the
program
 The file cannot be opened

First test to make sure the user has entered the command correctly

#include <stdio.h>
#include <stdlib.h>  // required for exit
#include <ctype.h>   // required for character comparisons

// Forward Declarations
void countCharacters(FILE* fp, char* fName);

int main(int argc, char* argv[])  // argc is the size of argv
{
    FILE* fp;

    // Test the number of arguments, i.e. command and one argument
    if(argc != 2){
        // argv[0] is printed because in Unix (or Linux) the program
        // can be given different names
        printf("%s requires file name\n", argv[0]);
        exit(1);
    }

Then attempt to open the file for reading

int main(int argc, char* argv[])
{
    // ...
    // Attempt to open file; argv[1], the argument, should be a file name
    // fopen returns NULL if the file cannot be opened
    if((fp = fopen(argv[1], "r")) == NULL){
        printf("Cannot open %s\n", argv[1]);
        // 1 is the usual value of EXIT_FAILURE, note that exit
        // will exit the program from any function
        exit(1);
    }
    countCharacters(fp, argv[1]);  // processes the file
    fclose(fp);
    return 0;
}

Documentation and variable declarations

// Prints the count of characters in a file, by:
// alpha
// digits
// whitespace
// other
// PRE: fp can be opened and read (note the pre-condition is documented)
// PARAM: fp is a pointer to a file to be read
// PARAM: fName is the name of the file, used when printing
void countCharacters(FILE* fp, char* fName)
{
    // Variable declarations
    int alpha = 0;
    int digits = 0;
    int white = 0;
    int other = 0;
    int total = 0;
    int ch;  // int rather than char, so that EOF can be distinguished

Go through the file one character at a time

void countCharacters(FILE* fp, char* fName)
{
    // ...
    // Read file one character at a time,
    // processes each character until end-of-file
    while((ch = getc(fp)) != EOF){
        // It's an if ... else if ... else statement to minimize
        // comparisons and to ensure that other is counted correctly
        if(isalpha(ch)){
            alpha++;
        }else if(isdigit(ch)){
            digits++;
        }else if(isspace(ch)){
            white++;
        }else{
            other++;
        }
    }
    total = alpha + digits + white + other;

And then print the number of characters

void countCharacters(FILE* fp, char* fName)
{
    // ...
    // Print number of characters
    printf("%s contains %d characters\n", fName, total);
    // Print the count of each type of character
    printf("%d letters\n", alpha);
    printf("%d digits\n", digits);
    printf("%d whitespace\n", white);
    printf("%d other\n", other);
}

Sample run of the program (screenshot not reproduced); its annotations were
▪ changes directory to the directory containing the .exe
▪ no such file
▪ it’s a Word document
▪ no file name argument

There is no need to use command line
arguments with the preceding program
 It’s just an example of using them

A different version of the program could
allow the user to process multiple files
 With a loop that ended the program when the
user wanted
 In which case it would not make sense to have the
file name as a command line argument

The fopen function is used to open files
 It returns a pointer to a FILE structure
 The FILE structure is defined in stdio.h and
contains data about the file


If the file cannot be opened fopen returns the
null pointer
fopen takes two string arguments
 The name of the file to be opened
 The mode in which the file is to be opened
Mode              Meaning
"r"               opens text file for reading
"w"               opens text file for writing, overwrites existing files, creates new files
"a"               opens text file for writing, appends to the end of existing files
"r+"              opens text file for update (both reading and writing)
"w+"              opens text file for update (both reading and writing), overwrites existing files, creates new files
"a+"              opens text file for update (both reading and writing), the whole file can be read but writing only appends to the end of the file
"rb", "wb", ...   the same as the preceding modes except that binary rather than text mode is used

The functions getc and putc can be used for
character based file I/O
 They are similar to getchar and putchar except
that they require a file argument

The getc function will return the EOF value if
it has reached the end of a file
 To avoid trying to process empty files check for
EOF before processing the first character

Files should be closed when finished with
 Using the fclose function which takes a file pointer
 The fclose function flushes buffers as required,
and allows the file to be correctly opened again

The fclose function returns 0 if a file was
closed successfully and EOF if it was not
 Files can be unsuccessfully closed if the disk is full
or if their drive is removed

There are file I/O functions similar to the I/O
functions we’ve been using
 Each function takes a FILE pointer
▪ Which could be stdin or stdout if input is to be from the
keyboard, or output to the display

The fprintf and fscanf functions work just like
printf and scanf except with files
 The file pointer is an additional first argument
▪ By contrast, the file pointer is the last argument for putc

The rewind function moves the file position
back to the start of the file

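The fgets function is used for string input
 The first argument is the address of a string (a character array)
 The second is the size of that array; at most one fewer
characters are read, leaving room for the terminating null character
 The third is the file from which input is read
 fgets keeps the newline if one is read, and returns NULL when it
reaches end-of-file without reading any characters (or on a read error)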

The fputs function is used for string output
 It has arguments for a string and a file pointer
 It does not append a newline when it prints
▪ Unlike puts which does
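
Opens the file for appending and reading (a+)

#include <stdio.h>
#include <string.h>  // required for strcspn

// Maximum lengths of file names and names
const int FNAME_LEN = 20;
const int NAME_MAX = 40;

int main()
{
    char fname[FNAME_LEN];
    char name[NAME_MAX];
    FILE* fp;

    printf("Enter the name of the file: ");
    // fgets is used rather than gets, which was removed in C11
    if(fgets(fname, FNAME_LEN, stdin) == NULL){
        return 1;
    }
    fname[strcspn(fname, "\n")] = '\0';  // remove the newline kept by fgets
    // Open for append and read, will create a
    // new file if fname does not exist
    fp = fopen(fname, "a+");
    if(fp == NULL){
        printf("Cannot open %s\n", fname);
        return 1;
    }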

Add words to the end of the file

int main()
{
    // ...
    puts("Enter names to add to the file");  // puts prints a newline
    // The while loop continues until the user
    // presses enter twice in sequence (i.e. enters an empty line)
    while(fgets(name, NAME_MAX, stdin) != NULL && name[0] != '\n'){
        // fprintf is similar to printf, can be used to format numeric
        // values; name already ends with the newline kept by fgets
        fprintf(fp, "%s", name);
    }

Then print the entire contents of the file

int main()
{
    // ...
    puts("File contents\n");
    rewind(fp);  // goes back to the start of the file
    // fgets is used instead of fscanf
    // since names consist of two words
    while(fgets(name, NAME_MAX, fp) != NULL){
        printf("%s",name);
    }
    fclose(fp);
    return 0;
}

Sample run of the program (screenshot not reproduced)
 Note that the new names have been appended to
the existing file rather than over-writing the file

All of the examples have involved string and
character storage
 Consider storing numeric data

Storing integers is straightforward
 But what about storing floating point values?

We could use fprintf for floating point values
 e.g. fprintf(fp, "%f", num);
 But this entails making decisions about the format
specifier

If fprintf stores numeric values they are
converted to characters and stored as text
 This may waste space if the number contains
many digits (e.g. 1.0/3)
 Or may lose precision if the format specifier is
used to fix decimal places
▪ fprintf(fp, "%.2f", 1.0/3);

An alternative is to store the same pattern of
bits used to represent the value

A binary file stores data using the same
representation as a program
 Numeric data are not converted to strings

The functions fread and fwrite are used for
binary I/O
 They are a little more complex than text file
functions
 They require information about the size of data to
be stored
size_t fwrite(const void * ptr, size_t size, size_t nmemb, FILE* fp)
▪ ptr – address of the first memory location to be written
▪ size – the size of each of the variables, in bytes
▪ nmemb – the number of variables
▪ fp – file pointer
 size_t is a type, defined in terms of other C
standard types, and is an unsigned integer type
 size_t is the type returned by sizeof

The complex structure of fwrite allows it to
store entire arrays in one function call
 double temperatures[365];
 fwrite(temperatures, sizeof(double), 365, fp);

The return value of fwrite is the number of
items successfully written to the file
 This should equal the nmemb parameter

The fread function takes the same set of
arguments as fwrite
 The ptr argument is the address in memory to
read the data into

fread should be used to read files that were
written using fwrite
 double temperatures[365];
 fread(temperatures, sizeof(double), 365, fp);

It may be useful to move to a particular
location in a file
 Without reading the preceding part of the file, much
like accessing an array element by its index
 This is known as random access

The fseek and ftell functions allow random
access to files
 They are usually used with binary files

The fseek function has three arguments
 A file pointer to the file
 An offset indicating the distance to be moved
from the starting point
 The mode which identifies the starting point
▪ SEEK_SET – the beginning of the file
▪ SEEK_CUR – the current position
▪ SEEK_END – the end

fseek returns 0 normally and a nonzero value (typically -1) for an error
 Such as attempting to move to a position before the start of the file

The ftell function returns the current position
in a file, as a long
 The number of bytes from the start of the file

fseek and ftell may differ based on the OS
 Since the distance that fseek moves is measured
in bytes they are normally used for binary files

ANSI C introduced fgetpos and fsetpos for use
with larger file sizes


This example creates an array of random
values and writes them to a binary file
The user is then asked for an index value
 The program finds and prints the value with that
index in the file using fseek and fread

Declarations

#include <stdio.h>
#include <stdlib.h>

#define ARR_SIZE 100  // length of the array

int main()
{
    double numbers[ARR_SIZE];
    double value;
    int i;
    long pos;
    char* fname = "numbers.dat";
    FILE* fp;

Create the array to be written to the file

int main()
{
    // ...
    // Create a set of double values
    // (rand and RAND_MAX come from stdlib.h)
    for(i = 0; i < ARR_SIZE; ++i){
        numbers[i] = i + (double)rand() / RAND_MAX;
    }
    // This is probably unnecessarily complicated but it produces an
    // ordered array of doubles with digits to the right of the decimal point

Write the array to the file

int main()
{
    // ...
    // Open file for writing ("wb" is for writing a binary file)
    if((fp = fopen(fname, "wb")) == NULL){
        fprintf(stderr, "Could not open %s.\n", fname);
        exit(1);
    }
    // Write array in binary format: the array,
    // size of each value, number of values
    fwrite(numbers, sizeof(double), ARR_SIZE, fp);
    fclose(fp);

Open file for reading

int main()
{
    // ...
    // Open file for reading ("rb" is for reading a binary file)
    if((fp = fopen(fname, "rb")) == NULL){
        fprintf(stderr, "Could not open %s.\n", fname);
        exit(1);
    }

Read values from the file

int main()
{
    // ...
    // Read array elements as requested
    printf("Enter index in range 0 to %d: ", ARR_SIZE-1);
    scanf("%d", &i);
    while(i >= 0 && i < ARR_SIZE){
        pos = (long) i * sizeof(double);       // position in file to be read
        fseek(fp, pos, SEEK_SET);              // move to position
        fread(&value, sizeof(double), 1, fp);  // binary read
        printf("value at index %d = %.2f\n", i, value);
        printf("Enter index (out of range to quit): ");
        if(scanf("%d", &i) != 1){              // get next position
            break;                             // stop if input is not a number
        }
    }
    fclose(fp);
    return 0;
}

Sample run of the program (screenshot not reproduced)
 Note that the binary file is not comprehensible by humans