FILE I/O:
Low-level
1
The Big Picture
2
Reading vs. Writing
"Reading a file"
To obtain data from a file (typically on a disk)
and move a copy of it into RAM
"Writing a file"
To copy data from RAM into a file (typically on
a disk).
3
Why Use Files?
Volatile Memory
Computers today use “volatile RAM” – contents get erased when the
power goes out. Someday, computers will use “non-volatile RAM”
(e.g. flash RAM). Until then, we make use of “secondary storage”
(hard drive) to save information more permanently.
Packaging
Putting information in a file gets it organized and keeps it together
Too much data
Frequently we have more information available than we can
work with in main memory (RAM). So we store most of it on
the hard drive (secondary storage) and retrieve only what we
need.
4
File Applications
Databases
Logs – attendance, events, history
Journals / Diaries
Address books
Sensor data (e.g. Shuttle ECO sensor, automotive O2 sensor)
Documents
Almost every program uses files!
5
Low-Level, cont.

Some files have a mixed format that high-level functions such as xlsread() cannot read.

Since the data is not easily recognized by the high-level
function, every step to read the file requires a separate
MATLAB command:
1. Open a file, either to read from or to write to
2. Read or write data from/to the file with the specific delimiters
3. Close the file
6
Opening / Closing Files
"Open a file"
Program requests access to a file (from the
OS) for reading and/or writing data to/from
the file.
"Close a file"
Inform the OS that the program is finished
working with the file.
7
Opening and closing a file

Template to open a file
fh = fopen(<filename>,<mode>);



fh is known as a file handle or file identifier. It is used in future function calls
to identify this is the file to use. It is like a "nickname" and we will use it
instead of the filename when working with the file.
<filename> represents a string that is the name of the file with its
extension (the letters after the dot). It can be either hardcoded or stored
in a variable.
<mode> is a string specifying the purpose of opening the file – called the
"access mode". The most commonly used are





'r' to read only from a file – bring data into memory from the disk
'w' to write to a file – put data from memory onto the disk
'a' to append to a file (add data to the end of the file)
Add + to the string to combine reading and writing (e.g. 'r+', 'w+')
Template to close a file
fclose(fh);
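The templates above assume that fopen() succeeds. As a minimal sketch of what happens when it does not: fopen() returns -1 instead of a valid file handle, and its optional second output holds an error message. The file name below is hypothetical and chosen only so that the file is missing.

```matlab
% Hypothetical file name, used only for this illustration.
nameFile = fullfile(tempdir, 'no_such_file_for_demo.txt');
if exist(nameFile, 'file')
    delete(nameFile);               % make sure the file really is missing
end

[fh, msg] = fopen(nameFile, 'r');   % 'r' never creates a file
if fh == -1
    % fopen returns -1 on failure; msg holds the system's error text
    fprintf('Could not open %s: %s\n', nameFile, msg);
else
    fclose(fh);                     % only close a handle that is valid
end
```

Checking the handle before using it avoids errors in later calls such as fgetl() or fclose().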
8
Some examples…
% Example 1) open a file from which to read
fileGrades = fopen('grades.txt', 'r'); % hardcode the filename
<code block to be inserted here>
% close file: use the file handle, not the file name!
fclose(fileGrades);
% Example 2) ask user for a filename, then open it to read
nameFile = input('Name of file with grades? (e.g. grades.txt): ', 's');
fileGrades = fopen(nameFile, 'r'); % no quotes: nameFile is a variable
<code block to be inserted here>
% close file
fclose(fileGrades);
Notice that the file handle variable can be any acceptable variable name.
9
Closing Files
After working with a file, it is important to close
the file. Other than being good form, it is
critical when writing to the file.
When the OS is supposed to put information on
disk it frequently waits until it determines the
best time. This is known as "write caching".
10
Closing Files
You have seen write caching at work in the "safely remove" warning for USB
drives.
The OS may wait to write data. If your program finishes and the data hasn't
been written, it may never be written at all!
Close the file before finishing the program: this flushes any buffered data
so that it can be written to the disk.
11
More Examples…
% Open a file for reading and writing
fh = fopen('my_project.abc', 'r+');
Would 'w+' work also? Maybe – see the next slide…
File name: The file extension (.abc here) is used by Windows, but only to
tell it what program should be used if a Windows user wants to open the file.
You are free to use any extension you want with your data files. The only
impact will be that Windows may not know what program should use that file.
12
Opening Files:
Super-secret Access-mode Codes
The “access mode” codes indicate how you will be
using a file after you open it.
Since the operating system has permissions assigned
to files, when you request access to a file you must
tell the system in what mode you will be using the
file.
The codes used for this tell the OS what it needs to
know, and have an impact on how you will use the
file.
13
Opening Files:
File Position Pointer
When a file is opened, a “file position pointer” is
created. The system keeps track of the point in the
file to which your program has read or written.
Think of it like a cursor that moves as you read or write
the file.
The file position pointer is set initially to different
locations depending on the access mode
14
Opening Files:
Super-secret Access-mode Codes
Access Mode    Initial Position
r, r+          Top (beginning) of file
w, w+          Top (beginning) of file
a, a+          Bottom (end) of file
15
Opening Files:
Access Mode & File Existence
In addition to the file position pointer, the
system also has to decide what will happen if
the file does or does not exist when you try to
open it.
If it already exists, should the file be deleted?
If it doesn’t yet exist, should it be created?
16
Opening Files:
Access-mode & File Existence
Access Mode    Delete?    Create?
r, r+          No         No
w, w+          Yes        Yes
a, a+          No         Yes
You should be able to reason this out – memorization is not the key here!
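One way to reason it out is to try each mode on a scratch file. The sketch below is only an illustration; the file name comes from tempname and the variable names are made up for this example.

```matlab
nameFile = [tempname, '.txt'];      % hypothetical scratch file

fhMissing = fopen(nameFile, 'r');   % file does not exist yet: returns -1

fh = fopen(nameFile, 'w');          % 'w' creates the file
fprintf(fh, 'first');
fclose(fh);

fh = fopen(nameFile, 'a');          % 'a' keeps the contents, starts at the end
fprintf(fh, ' second');
fclose(fh);
afterAppend = fileread(nameFile);   % 'first second'

fh = fopen(nameFile, 'w');          % 'w' discards the old contents
fprintf(fh, 'third');
fclose(fh);
afterRewrite = fileread(nameFile);  % 'third'

delete(nameFile);                   % clean up the scratch file
```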
17
Opening Files:
Choosing an access mode
A “log file” is a file that keeps a history of
events. Many programs keep log files. They
help programmers see what occurred in the
past so that a problem can be fixed.
If your program is going to keep a log file, what
is the best mode to use when opening this
file? Why?
18
Opening Files:
Choosing an access mode
You are writing a program that will manage a
database. You will be accessing files at
different times within the program, so you
decide to close and reopen the file several
times. For each of these times, how should
you open the file?
1. User wants to view a record in the database
2. User wants to modify a record in the database
3. User wants to add a record to the database
19
Writing Text Files
fprintf(<file handle>, … The rest is as usual...);
Don’t forget the semi-colon!
fprintf() can return the number of bytes it wrote. If you capture that
value (e.g. n = fprintf(...)) and leave off the semi-colon, MATLAB
displays the number in the command window.
File handle – not the file name!
Example:
fh = fopen('log_file.txt', 'a');
fprintf(fh, 'Event #%d: \t%s\n', event_num, event_description);
20
MS Windows Text files
When writing to a text file, MATLAB will write only
a single newline character to the end of a line –
yet Windows requires two different characters
there. So if you open the file in Notepad, it will
not look like you expect:
21
MS Windows Text files
There is nothing wrong with this – unless you intend
to work with the file outside of your program (and
in Windows).
To make it Windows-ready, write both a carriage
return (\r) and a newline (\n):
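A minimal sketch of that idea, assuming a scratch file created with tempname and the plain 'w' mode used elsewhere in these slides. The file is read back as raw bytes only to confirm that each line ends with the two characters 13 (carriage return) and 10 (newline).

```matlab
nameFile = [tempname, '.txt'];      % hypothetical scratch file
fh = fopen(nameFile, 'w');
fprintf(fh, 'Line one\r\n');        % \r\n = carriage return + newline
fprintf(fh, 'Line two\r\n');
fclose(fh);

% Read the raw bytes back to inspect the line endings
fh = fopen(nameFile, 'r');
bytes = fread(fh)';                 % row vector of byte values
fclose(fh);
delete(nameFile);

lineEnd = bytes(9:10);              % the two bytes after 'Line one'
```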
22
Writing Text Files
Inserting data into the middle of a text file
Writing to text files is not like working in Word!
When you write to a text file, the data added to
the file will write over any existing data in the
file after the file position pointer – there is no
“insert mode”!
23
Writing Text Files
What we think should
happen…
24
Writing Text Files
What REALLY happens…
25
Writing Text Files
To avoid this problem…
You can’t.
You must write code that moves the existing file
data so that you can insert the new data.
This might mean copying to a new file, or
looping and overwriting the old data.
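As a rough sketch of one such workaround (not the only one): read the whole file into memory, splice the new text into the string, and rewrite the file. This assumes the file is small enough to hold in memory; insertAt and newText are hypothetical names chosen for the illustration.

```matlab
nameFile = [tempname, '.txt'];      % hypothetical scratch file
fh = fopen(nameFile, 'w');
fprintf(fh, 'Hello world');
fclose(fh);

insertAt = 6;                       % number of characters to keep in front
newText  = 'big ';                  % text to insert

oldText = fileread(nameFile);       % whole file as one character row
fh = fopen(nameFile, 'w');          % 'w' discards the old contents
fprintf(fh, '%s', [oldText(1:insertAt), newText, oldText(insertAt+1:end)]);
fclose(fh);

result = fileread(nameFile);        % 'Hello big world'
delete(nameFile);
```

For large files, copying to a new file in pieces avoids loading everything at once.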
26
Reading text files
Reading an entire line as a string
- including storing the new line character in the variable
  str = fgets(<file handle>);
- without storing the new line character in the variable
  str = fgetl(<file handle>);
Reading numeric data
  data = fscanf(<file handle>, <format>);
27
Using fgets()

Includes the new line character in your variable
Suppose we had this data file
And ran this program:
fh = fopen('testdata.txt', 'r');
x = fgets(fh);
fprintf('->%s<-', x);
28
Using fgets()
Notice there are TWO newlines in the variable:
29
Using fgets()
This is because Windows text files use two
characters to mark “end of line” (newline).
Most other systems only use one character.
MATLAB interprets both of these characters
as newlines.
Fortunately, it’s easy to fix.
30
Using fgets()
If you want to remove just one:
x = x(1:end-1);
If you want to get rid of BOTH characters:
x = x(1:end-2);
Or:
x = strcat(x);
Or…
31
Using fgetl()

Reads past the newline, but DOES NOT include the
newline character in your variable
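The difference between the two functions can be seen side by side. This sketch writes its own two-line scratch file (ending each line with a single newline character) so that it does not depend on any particular data file.

```matlab
nameFile = [tempname, '.txt'];      % hypothetical scratch file
fh = fopen(nameFile, 'w');
fprintf(fh, 'abc\ndef\n');
fclose(fh);

fh = fopen(nameFile, 'r');
withNewline    = fgets(fh);         % 'abc' plus the newline: 4 characters
withoutNewline = fgetl(fh);         % 'def' only: 3 characters
fclose(fh);
delete(nameFile);

fprintf('fgets kept %d characters, fgetl kept %d\n', ...
        numel(withNewline), numel(withoutNewline));
```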
32
Using fscanf()
fscanf() is like the reverse of fprintf(). You specify
the format you want to match and fscanf() will read
from the file as long as it can match that format.
fscanf() is not good for reading strings: %s skips over whitespace,
and when a format mixes text with numbers the characters are
saved as their ASCII equivalents.
33
Using fscanf()
Suppose we had this data file:
After opening the file, you could read the
contents using:
data = fscanf(fh, '%d\t%d')
34
Using fscanf()
However, the result would be:
This demonstrates that fscanf() reads the data in
line-order, but then stores it as a column. You can
change this format using one more argument on the
function call.
35
Using fscanf()
Change the function call to (adding a third argument):
data = fscanf(fh, '%d\t%d', [2, 3])
And you get out:
MATLAB is still reading the data in line-order, and still
storing the data in column-order. But we've now specified
how big the columns will be – two rows each.
36
Using fscanf()
But we may want the data to be in the form of the file. Unfortunately,
changing the third argument doesn’t help:
data = fscanf(fh, '%d\t%d', [3, 2])
Original file data:
This is because fscanf() is still filling the variable in “column-order” –
it fills a column first and then moves onto the next column.
37
Using fscanf()
To fix this, first read it in as a 2x3 matrix:
data = fscanf(fh, '%d\t%d', [2, 3])
Then transpose the matrix:
data = data'
38
Using fscanf()
But suppose we don’t know how many sets of data will be
in the file?
Use MATLAB’s inf constant. It means “as many as
needed”
data = fscanf(fh, '%d\t%d', [2, inf])
Now, if the data file gets larger, your program can still
handle it.
Will this work? data = fscanf(fh, '%d\t%d', [inf, 2])
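Putting the fscanf() steps together, here is a minimal sketch that creates its own tab-separated scratch file with three rows of two integers, reads it with [2, inf], and transposes the result so that it matches the layout of the file. The file contents are made up for the illustration.

```matlab
nameFile = [tempname, '.txt'];      % hypothetical scratch file
fh = fopen(nameFile, 'w');
% fprintf walks the matrix column by column, so transpose to write row by row
fprintf(fh, '%d\t%d\n', [1 2; 3 4; 5 6]');
fclose(fh);

fh = fopen(nameFile, 'r');
data = fscanf(fh, '%d\t%d', [2, inf]);  % 2 rows, as many columns as needed
fclose(fh);
delete(nameFile);

data = data';                       % now 3x2, same shape as the file
```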
39
Moving around within files
When reading and writing to files, the system
maintains a “file position pointer”. Think of it as a
cursor tracking your position in the file.
Every time you read from the file, the file position
pointer moves past all of the characters you have
read.
Every time you write to the file, the file position pointer
remains immediately after the last character you
wrote.
40
Moving around within files
fseek()
Move to specific byte position within the file
frewind()
Move to beginning of file
ftell()
Return a file position pointer’s byte position (number of bytes from
beginning of file)
feof()
Returns 1 (true) if the file position pointer is at the end of the file.
Note that the file position pointer must be past any non-visible
characters (newlines, tabs, spaces, etc) for this to occur.
41
Moving around within files
fseek():
fseek(fh, 10, 'bof');   % 10 BYTES (not characters!) after beginning-of-file
fseek(fh, -22, 'eof');  % 22 bytes BEFORE end-of-file
fseek(fh, 0, 'cof');    % 0 bytes from the current position
'bof' = Beginning of File
'eof' = End of File
'cof' = Current position of File
42
Moving around within files
fseek(fh, 0, 'cof');
Why would we want to move 0 bytes from the current
position?
Because there is a (frequently unmentioned) property of
files: You cannot read from and then write to a file (or
write to and then read from a file) without an intervening
setting of the file position pointer. The command above
sets the file position pointer without moving it.
43
Moving around within files
Example: Suppose testfile.txt exists already. We
want to find a location within the file, and then write to
the file.
Doesn’t work:
fh = fopen('testfile.txt', 'r+');
. . .
x = fgets(fh);
fprintf(fh, 'fred');
fclose(fh);
Works:
fh = fopen('testfile.txt', 'r+');
. . .
x = fgets(fh);
fseek(fh, 0, 'cof');
fprintf(fh, 'fred');
fclose(fh);
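A self-contained sketch of the working pattern, using a scratch file with made-up contents in place of testfile.txt. It reads the first line, resets the file position pointer without moving it, and then writes. Note that the write overwrites the start of the second line instead of inserting.

```matlab
nameFile = [tempname, '.txt'];      % hypothetical scratch file
fh = fopen(nameFile, 'w');
fprintf(fh, 'line one\nline two\n');
fclose(fh);

fh = fopen(nameFile, 'r+');         % read and write, keep existing contents
firstLine = fgets(fh);              % pointer is now at the start of line 2
fseek(fh, 0, 'cof');                % set the pointer without moving it
fprintf(fh, 'fred');                % overwrites 'line' in 'line two'
fclose(fh);

result = fileread(nameFile);        % 'line one', newline, 'fred two', newline
delete(nameFile);
```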
44
Moving around within files
frewind(fh)
Essentially the same as fseek(fh, 0, 'bof')
ftell(fh)
Returns the byte position within the file.
Example:
p = ftell(fh);
...
fseek(fh, p, 'bof')
CAUTION: Byte positions depend on the format of the file – do not assume
that a byte and a character are the same thing!
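As an illustration of using ftell() as a bookmark, the sketch below saves the byte position after the first line, reads the second line, jumps back, and reads the same line again. The scratch file holds plain ASCII text, so here one character happens to occupy one byte.

```matlab
nameFile = [tempname, '.txt'];      % hypothetical scratch file
fh = fopen(nameFile, 'w');
fprintf(fh, 'abc\ndefg\n');
fclose(fh);

fh = fopen(nameFile, 'r');
skipped = fgetl(fh);                % read past 'abc' and its newline
p = ftell(fh);                      % bookmark: 4 bytes from the beginning
first = fgetl(fh);                  % 'defg'
fseek(fh, p, 'bof');                % jump back to the bookmark
again = fgetl(fh);                  % 'defg' once more
fclose(fh);
delete(nameFile);
```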
45
Moving around within files
feof() – normally used as a loop condition
fh = fopen('datafile.txt', 'r');
data = [];
% The order of these boolean expressions is important: we want to test
% for a valid file handle (fh>0) before we use it in the feof() call
while (fh>0 && ~feof(fh))
    s = fgetl(fh);
    data = strvcat(data, s);
end
if fh>0
    fclose(fh); % only close a file that was actually opened
end
46
EXTRA: Binary Files
(not on any exam…)
Many programs today do not use ASCII text for
their files.
ASCII is great for being able to read the data
file, but it can make the file unwieldy.
As an alternative, files can be stored with
"binary data". The data stored is not
intended to be read as ASCII.
47
Binary Files
Example usage of binary files:
- Image formats
- Audio files
- Sensor data
- Real-time data processing
- Encrypted data
48
Binary Files
For example – part of a JPEG file (as viewed in
Notepad):
49
Binary Files
In order to work with binary files, a new tool is
handy: the "hex editor"
A hex editor will show you the binary values
stored in a file, but in a form humans find
usable.
50
Binary Files
51
Binary Files
Just as with ASCII files, the format of the file
must be known in order to work with it.
Once you know the format, you can read and
write to the file – but first you must open it in
"binary" mode. In Windows, just add a "b" to
the access mode:
fh = fopen('myfile.bin', 'rb+');
52
Binary Files
Reading from the file is a bit different.
fscanf() has no placeholder for binary
data!
So, we use the fread() function:
% Read 1000 bytes from the file
data = fread(fh, 1000)
53
Binary Files
Writing binary files uses the fwrite()
function:
Here's an example using hardcoded data:
fwrite(fh, [1, 2, 3; 4, 5, 6])
Usually, we will write data using variables.
54
Binary Files
As always…
F1
55