20130409_Files_Low_Level

FILE I/O: Low-level
1. General Ideas. High vs. low-level
2. Opening a file
3. Closing a file
4. Writing data to the file
5. Reading data from the file (numerical, strings, etc.)
1. High vs low level
High level
• “1 line of code opens the file, reads/writes the data, closes the file”
clc
clear
%upload data from file
data = dlmread('myfile.dat');
%ready to analyze the data..

Low level
• Requires at least 3 lines!
clc
clear
%open the file
__ = fopen(_________);
%grab the data 'properly'
______(a lot of options)_____
%close the file
fclose(_____);
%ready to analyze the data…
1. Low-Level, cont.
• Some files have a mixed format that high-level functions such as xlsread() and dlmread() cannot read!
• Since no high-level function recognizes the data, every step of reading the file requires a separate MATLAB command:
1. Open a file, either to read from or to write to
2. Read or write data from/to the file using its specific delimiters
3. Close the file
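A minimal sketch of the three steps, written so it runs on its own: it first writes a small hypothetical file (a one-line text header plus two rows of numbers) and then reads it back. The temporary file name from tempname and the header text are assumptions made for this sketch; the check on fid == -1 covers the case where fopen() cannot open the file.

```matlab
% Sketch of the three low-level steps (open, read/write, close).
% tempname gives a file name that does not clash with existing files.
fname = [tempname, '.txt'];

% 1. Open the file for writing ('w' creates it, or wipes it if it exists)
fid = fopen(fname, 'w');
if fid == -1
    error('Could not open %s for writing', fname);
end
% 2. Write a one-line text header, then two rows of numbers
fprintf(fid, 'Time Temp\n');
fprintf(fid, '%d %d\n', [1 20; 2 22]');   % transposed so each row prints on one line
% 3. Close the file so the data really reaches the disk
fclose(fid);

% The same three steps, this time to read the file back
fid = fopen(fname, 'r');
if fid == -1
    error('Could not open %s for reading', fname);
end
header = fgetl(fid);                      % first line, without its newline
values = fscanf(fid, '%d', [2, inf])';    % remaining numbers, one row per file line
fclose(fid);
delete(fname);                            % clean up the temporary file
```

Each fopen() is paired with exactly one fclose(), and every function in between receives the file identifier fid, never the file name.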
Today's order:
1. first, learn the fopen() command
2. then, learn the fclose() command
3. dump data to a file - fprintf()
4. read string data - fgets() and fgetl()
5. read numerical data - fscanf()
6. read a combination of numerical and string data - textscan()
2. Opening / Closing Files
• "Open a file"
– Program requests access to a file (from the OS) for reading and/or
writing data to/from the file.
• "Close a file"
– Inform the OS that the program is finished working with the file.
2. Opening a file
• There are many syntaxes possible (from the doc file):
• The most commonly used ones (in EGR115):
fileID = fopen(filename); %opens the file filename for read access, and returns an integer file identifier.
fileID = fopen(filename, permission); %opens the file with the specified permission.
2. Opening a file
• Syntax to open a file
fileID = fopen( filename , permission );
– fileID is known as a file identifier. It is like a "nickname" and is used
instead of the filename when actually working with the file.
– filename represents a string that is the name of the file with its extension (the letters after the dot). It can either be hardcoded or stored in a variable.
– permission is a string that describes the type of access for the file
>> why is this needed?
2. Opening Files: permission
• The “permission” indicates how MATLAB will use the file after
being opened.
• Since the operating system assigns permissions (read-only, write-only) to files, when you request access to a file you must tell the system in what mode you will be using the file.
• The strings used for this tell the OS what it needs to know, and they affect how MATLAB will use the file.
2. Opening Files: permission
• The most commonly used strings are
‘r’ Open file for reading (default).
‘w’ Open or create new file for writing. Discard existing contents, if
any.
‘a’ Open or create new file for writing. Append data to the end of
the file.
• There are MANY more
>> doc fopen <enter>
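To see the three permissions side by side, the sketch below (file name from tempname, contents hypothetical) tries 'r' on a file that does not exist yet, then uses 'w' and 'a', reading the whole file back with fileread() after each step.

```matlab
fname = [tempname, '.txt'];    % a file that does not exist yet

% 'r' does not create anything: fopen returns -1 for a missing file
fidRead = fopen(fname, 'r');

% 'w' creates the file
fid = fopen(fname, 'w');
fprintf(fid, 'first\n');
fclose(fid);

% 'a' keeps the existing content and adds to the end
fid = fopen(fname, 'a');
fprintf(fid, 'second\n');
fclose(fid);
afterAppend = fileread(fname);    % 'first' and 'second'

% 'w' again: the old content is discarded
fid = fopen(fname, 'w');
fprintf(fid, 'third\n');
fclose(fid);
afterWrite = fileread(fname);     % only 'third'

delete(fname);
```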
2. Opening Files: File Position Pointer
• What happens when MATLAB opens a file to read or write?
– When a file is opened, a “file position pointer” is created. The system
keeps track of the point in the file to which your program has read or
written.
– Think of it like a cursor that moves as you read or write the file.
– The file position pointer is set initially to different locations depending
on the permission granted.
2. Opening Files: File Position Pointer
• Depending on the access-mode, does a file get “wiped” or
not, “created” or not?
• You should be able to reason this out – memorization is not
the key here!
Access Mode    Delete Content?    Create File?
r              ______             ______
w              ______             ______
a              ______             ______
Only one of these is ‘tricky’. Think: ‘w’ = write = wipe.
2. Opening Files: Choosing a permission
• Trivia
A “log file” is a file that keeps a history of events. Many programs keep
log files. They help programmers see what occurred in the past so that a
problem can be fixed.
For example, swiping for attendance creates a log file.
If your program is going to keep a log file, what is the best mode to use when opening this file? Why?
2. Closing Files
• Syntax:
fclose(fid); %usually ignore the return value
• After working with a file, it is important to close the file. Other than
being good form, it is critical when writing to the file.
• Remember this?
"Safe remove" warning on USB drives.
– When the OS is supposed to put information on disk it frequently waits
until it determines the best time. This is known as "write caching".
– Windows may wait to write data. If your program finishes and Windows
hasn't written this data, it will not be written at all!
>> Closing the file forces Windows to write the data to the disk.
2. Examples of opening and closing
% Example 1) open a file from which to read
fileGrades = fopen('grades.txt', 'r'); %hardcode the filename
% <code block to be inserted here>
%close file: use the file handle – not the file name!
fclose(fileGrades);

% Example 2) ask user for a filename, then open it to read
nameFile = input('Name of file with grades? (e.g. grades.txt): ', 's');
fileGrades = fopen(nameFile, 'r'); %no quotes: a variable
% <code block to be inserted here>
%close file
fclose(fileGrades);
Notice that the file handle variable can be any acceptable variable name.
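fopen() returns -1 when it cannot open the file (for example, a mistyped name in Example 2), and its optional second output is an error message. A minimal sketch of checking this before reading, assuming a hypothetical file name that does not exist in the current folder:

```matlab
% Hypothetical file name, assumed NOT to exist in the current folder
nameFile = 'no_such_grades_file.txt';
[fileGrades, msg] = fopen(nameFile, 'r');
if fileGrades == -1
    % fopen failed: report why, and do not call fclose on -1
    fprintf('Could not open %s: %s\n', nameFile, msg);
else
    % ... read from fileGrades here ...
    fclose(fileGrades);
end
```

In a real program, error(msg) instead of fprintf() would stop the script right there.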
3. Writing to Text Files
fprintf(<file handle>, … The rest is as usual...);
Don’t forget the semicolon!
Otherwise, MATLAB displays a number in the command window: fprintf() returns how many bytes were printed.
Example:
fh = fopen('log_file.txt', 'a');
for k = 1:nbEvents
    fprintf(fh,'Event #%d: %15s %s\n',k,events{k,1},events{k,2});
end
fclose(fh);
File handle – not the file name!
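A self-contained version of the log-file example, assuming a hypothetical events cell array (column 1: event description, column 2: time stamp) and a temporary file instead of log_file.txt. Opening with 'a' means each run adds to the end of the log rather than replacing it.

```matlab
% Hypothetical events: column 1 is a description, column 2 a time stamp
events = {'door opened', '08:01'; 'door closed', '08:05'};
nbEvents = size(events, 1);
fname = [tempname, '.txt'];        % stands in for 'log_file.txt'

fh = fopen(fname, 'a');            % append: creates the file if needed
for k = 1:nbEvents
    % %15s right-aligns the description in a 15-character field
    fprintf(fh, 'Event #%d: %15s %s\n', k, events{k,1}, events{k,2});
end
fclose(fh);

logText = fileread(fname);         % whole file as one char row
delete(fname);
```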
3. MS Windows Text files
• When writing to a text file, MATLAB writes only a single newline character (\n) at the end of a line.
• Yet, Windows software such as Notepad expects two different characters at the end of a line.
• If you open the file in Windows-based software such as Notepad, it will not look like you expect:
• But if you open it with WordPad…
3. MS Windows Text files
• There is nothing wrong with this – unless you intend to work
with the file outside of your program (and in Windows).
• To make it Windows-ready, write both a carriage return (\r) and a newline (\n) at the end of each line, i.e. end each format string with \r\n.
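A minimal sketch that writes two Windows-style lines and then reads the raw bytes back with fread() to confirm each line ends with 13 (\r) followed by 10 (\n). The file name from tempname is an assumption for the sketch.

```matlab
fname = [tempname, '.txt'];
fh = fopen(fname, 'w');
fprintf(fh, 'line 1\r\n');    % carriage return + newline
fprintf(fh, 'line 2\r\n');
fclose(fh);

% Read every byte of the file as a number, to see the hidden characters
fh = fopen(fname, 'r');
bytes = fread(fh, inf, 'uint8')';    % row vector of byte values
fclose(fh);
delete(fname);
```

As an alternative, MATLAB's text-mode permissions (for example 'wt') insert the \r automatically on Windows systems; the explicit \r\n above produces the same bytes on every operating system.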
3. Writing Text Files
Inserting data into the middle of a text file
Writing to text files is not like working in Word!
When you write to a text file, the new data overwrites any existing data after the file's position pointer – there is no “insert mode”!
3. Writing Text Files
What we think should happen…
3. Writing Text Files
What REALLY happens…
3. Writing Text Files
There is no quick-fix to this problem.
You must write code that moves the existing file data so that you
can insert the new data. This might mean copying to a new
file, or looping and overwriting the old data.
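One way to "insert" a line, sketched under simple assumptions: copy the original file line by line into a second file, writing the new line at the chosen spot, then replace the original with the copy. The file contents and the rule "insert after line 1" are hypothetical.

```matlab
% Hypothetical original file: line B is missing between A and C
fname = [tempname, '.txt'];
fh = fopen(fname, 'w');
fprintf(fh, 'line A\nline C\n');
fclose(fh);

% Copy line by line into a second file, inserting the new line on the way
tmpName = [tempname, '.txt'];
fin  = fopen(fname, 'r');
fout = fopen(tmpName, 'w');
lineNb = 0;
while true
    str = fgetl(fin);
    if ~ischar(str)          % fgetl returns -1 at the end of the file
        break;
    end
    lineNb = lineNb + 1;
    fprintf(fout, '%s\n', str);
    if lineNb == 1           % hypothetical rule: insert after line 1
        fprintf(fout, 'line B\n');
    end
end
fclose(fin);
fclose(fout);

% Replace the original file with the new copy
delete(fname);
movefile(tmpName, fname);
result = fileread(fname);
delete(fname);
```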
4. Reading text files
• There are many ways to read from a file, because its content can take almost any form:
– Numbers? (Remember to use dlmread() if there are ONLY numbers!)
– Strings?
– Numbers and strings?
– A pattern?
– No pattern?
• Therefore, there are many built-in functions, and ALL of them can be used in combination, repeatedly, within a loop… to make it work!
4. Reading files
• Examples of files requiring low-level functions
• It’s all about moving that cursor from top to bottom, and grabbing the data as you go!
4. Reading text files - Strings
• Reading an entire line as a string
– including storing the new line character in the variable
str = fgets(<file handle>);
– without storing the new line character in the variable
str = fgetl(<file handle>);
>> Both function calls above move the cursor down to the next line (you can use this to “skip” lines by ignoring their return value!)
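A small sketch contrasting the two functions on a hypothetical three-line file: fgets() keeps the newline, a second fgets() call skips a line by ignoring its return value, and fgetl() returns the text without the newline.

```matlab
fname = [tempname, '.txt'];
fh = fopen(fname, 'w');
fprintf(fh, 'line1\nline2\nline3\n');
fclose(fh);

fh = fopen(fname, 'r');
withNewline = fgets(fh);    % 'line1' plus the newline character
fgets(fh);                  % skip line 2: return value ignored
noNewline = fgetl(fh);      % 'line3' without the newline
fclose(fh);
delete(fname);
```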
Using fgets()
• Includes the new line character in your variable
Suppose we had this data file (.txt file) and ran this program:
fh = fopen('testdata.txt', 'r');
x = fgets(fh);
fprintf('->%s<-', x); %old fprintf
Notice there are TWO end-of-line characters in the variable, a carriage return and a newline:
l i n e 1 \r \n
Using fgets()
• The important idea though is that MATLAB moved the cursor
past the first line. Since the file has not been closed, MATLAB
is ready to scan the second line!
• Use another fgets() to grab the next line..
• Or write a for loop to repeat the fgets()
How can we get rid of the \r\n ?
Using fgetl()
• Reads past the newline, but DOES NOT include the newline
character in your variable
Using the same data file, suppose we ran this program:
fh = fopen('testdata.txt', 'r');
x = fgetl(fh);
fprintf('->%s<-', x); %old fprintf
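Because fgetl() returns -1 (a number, not a string) once the cursor reaches the end of the file, a while loop can read every line without knowing in advance how many there are. A sketch, assuming a hypothetical three-line file:

```matlab
fname = [tempname, '.txt'];
fh = fopen(fname, 'w');
fprintf(fh, 'alpha\nbeta\ngamma\n');
fclose(fh);

allLines = {};               % grows by one cell per line read
fh = fopen(fname, 'r');
str = fgetl(fh);
while ischar(str)            % fgetl returns -1 (not a char) at end of file
    allLines{end+1} = str;
    str = fgetl(fh);
end
fclose(fh);
delete(fname);
```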
4. Reading text files – Numerical data
• Assume you’ve moved the cursor PAST the first line in the following file, using the fgets() function described earlier
Note: a plain dlmread() call fails here, since there are characters at the top!
• Which function can be used to read numerical data (easily)?
4. Reading text files – Numerical data
fscanf() is like the reverse of fprintf(). Specify the
format you want to match and fscanf() will read from the
file as long as it can match that format.
fscanf() is not good for reading strings because it saves the characters as their ASCII equivalents.
It returns 1 numerical array (2D if necessary). That’s why it doesn’t like strings, which usually have different lengths!
4. Using fscanf()
So:
After opening the file, and moving the cursor past the first line (using
fgets() for example), read the contents using:
%open file (read by default)
fh = fopen('example.txt');
%move cursor past first line (ignore return value)
fgets(fh);
%load the numerical data
data = fscanf(fh, '%d\t%d')
4. Using fscanf()
However, the result would be:
This demonstrates that fscanf() reads the data in line-order, but then stores it as a column. You can change this shape by adding one more argument to the function call.
4. Using fscanf()
Change the function call to:
data = fscanf(fh, '%d\t%d', [2, 3]) %add this 3rd argument
The return-value collected is now:
MATLAB is still reading the data in line-order, and still storing the data in column-order, but we've now specified how big the columns will be – two rows each.
4. Using fscanf()
We may want the data to be in the form of the file. Unfortunately, changing
the third argument doesn’t help:
data = fscanf(fh, '%d\t%d', [3, 2])
• Original file data:
• This is because fscanf() is still filling the variable in “column-order” –
it fills a column first and then moves onto the next column.
4. Using fscanf()
To fix this, first read it in as a 2x3 matrix:
data = fscanf(fh, '%d\t%d', [2, 3])
Then transpose the matrix:
data = data'
4. Using fscanf()
Usually, combine all in one line:
data = fscanf(fh, '%d\t%d', [2, 3])'
4. Using fscanf()
• Suppose the number of lines in the file is unknown, or, more importantly, is constantly updated!
• Use MATLAB’s inf constant (infinity). It means “as many as needed”:
data = fscanf(fh, '%d\t%d', [2, inf])';
• Now, if the data file gets larger, the program can still handle it.
Will this work? data = fscanf(fh, '%d\t%d', [inf, 2])
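Putting the fscanf() steps together on a hypothetical file with a text header and two tab-separated integer columns (three data rows). Because the size argument is [2, inf], the same code would keep working if more rows were added to the file.

```matlab
fname = [tempname, '.txt'];
fh = fopen(fname, 'w');
fprintf(fh, 'Age\tScore\n');                          % hypothetical text header
fprintf(fh, '%d\t%d\n', [20 85; 21 90; 19 78]');      % 3 data rows
fclose(fh);

fh = fopen(fname);            % read access by default
fgets(fh);                    % move the cursor past the header line
% Fill 2 rows per column (one column per file line), then transpose
data = fscanf(fh, '%d\t%d', [2, inf])';
fclose(fh);
delete(fname);
```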
fscanf() CAUTION
• There isn’t a way to stop reading at the last column you need. For example: there are 10 columns in the file, and all you need is columns 3 and 7. Too bad..
Don’t do this:
Data = fscanf(fid,'%f %d %f %f %d %d %d',[7, inf]);
That reads the first 7 columns, moves the cursor past the 7th column, then starts scanning from there again (column 8) with the same format!!
Do this:
Data = fscanf(fid,'%*f %*d %f %*f %*d %*d %d %*f %*f %*f',[2, inf]); %the * tells MATLAB to read past this part without storing it within Data.
BUT THERE ARE 10 PLACEHOLDERS: one per column in the file.
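The same skipping idea on a smaller, hypothetical 4-column file where only columns 2 and 4 are wanted: the format has one placeholder per column, and %*d reads a value without storing it.

```matlab
fname = [tempname, '.txt'];
fh = fopen(fname, 'w');
fprintf(fh, '%d %d %d %d\n', [1 2 3 4; 5 6 7 8]');   % 2 lines, 4 columns
fclose(fh);

fh = fopen(fname);
% 4 placeholders, one per column: columns 1 and 3 are read but not stored
data = fscanf(fh, '%*d %d %*d %d', [2, inf])';
fclose(fh);
delete(fname);
```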
5. Reading files – Strings & Numbers
• Assume the following file:
• Again, move past the first line using fgets() or fgetl(); then how should we grab both integers and strings?
– Remember, fscanf() is NOT string-friendly: it will return ASCII values!
5. Using textscan()
• textscan() is similar to fscanf() but handles both strings and numbers.
There is still 1 return value, but this time it is a cell array (capable of holding strings and numbers of any size!)
5. Using textscan()
• Assume this updated data file:
• After opening the file, and moving the cursor past the first
line (using fgets() for example), read the contents using:
data = textscan(fid,'%d %s %d')
5. Using textscan()
• The return value is not a 2D cell-array, but rather 1 single row:
• To extract the data, simply reference the one cell you’re
interested in, using { }. For example:
allNames = _________________
allGrades = _________________
allIDs = _________________
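A runnable sketch of textscan() on a hypothetical file whose columns are ID, name and grade (this column order is an assumption, not necessarily the slide's file). Each column of the file ends up in its own cell, which is then extracted with { }.

```matlab
fname = [tempname, '.txt'];
fh = fopen(fname, 'w');
fprintf(fh, 'ID Name Grade\n');               % hypothetical header line
fprintf(fh, '101 Ann 88\n102 Bob 92\n');
fclose(fh);

fid = fopen(fname);
fgetl(fid);                          % skip the header line
data = textscan(fid, '%d %s %d');    % 1x3 cell array, one cell per column
fclose(fid);
delete(fname);

ids    = data{1};    % int32 column vector
names  = data{2};    % cell array of strings
grades = data{3};    % int32 column vector
```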
5. Using textscan()
• CAUTION: fscanf() and textscan() are extremely
picky when it comes to the format string. Even 1 extra space
or a missing character could throw MATLAB off.
• Consider this file:
data=textscan(fid,'%s %d %d %d AM'); %would not work
data=textscan(fid,'%s %d %d:%d AM'); %would work
data=textscan(fid,'%s %d %d:%d %s'); %would work
5. Using textscan() – option1
• HOWEVER, YOU (programmer) have much more control over how the information is taken.
data=textscan(fid,'%s %d %d:%d AM')
4 placeholders = 4 columns
5. Using textscan() – option2
• Note: with low-level functions, YOU (programmer) have much more control over how the information is taken.
data=textscan(fid,'%s %d %d:%d %s')
5 placeholders = 5 columns
5. Using textscan() – option3
data=textscan(fid,'%s %d %s AM')
3 placeholders = 3 columns
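A sketch of option 2 on two hypothetical lines in a "name number hh:mm AM" layout: the ':' in the format string is matched literally, so the hour and the minutes land in separate columns.

```matlab
fname = [tempname, '.txt'];
fh = fopen(fname, 'w');
fprintf(fh, 'Ann 3 10:30 AM\nBob 5 11:15 AM\n');   % hypothetical lines
fclose(fh);

fid = fopen(fname);
% 5 placeholders = 5 cells; the ':' between the two %d is matched literally
data = textscan(fid, '%s %d %d:%d %s');
fclose(fid);
delete(fname);
```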
textscan() CAUTION
• There isn’t a way to stop reading at the last column you need. For example: there are 10 columns in the file, and all you need is columns 3 and 7. Too bad..
Don’t do this:
Data = textscan(fid,'%s %d %s %s %d %d %d');
That reads the first 7 columns, moves the cursor past the 7th column, then starts scanning from there again (column 8) with the same format!!
Do this:
Data = textscan(fid,'%*s %*d %s %*s %*d %*d %d %*f %*f %*f'); %the * tells MATLAB to read past this part without storing it within Data.
BUT THERE ARE 10 PLACEHOLDERS: one per column in the file.
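A small sketch of skipping with textscan() on a hypothetical 4-column file (name, ID, major, grade) where only the names and grades are kept: %*d and %*s read a value without storing it, so data holds only 2 cells.

```matlab
fname = [tempname, '.txt'];
fh = fopen(fname, 'w');
fprintf(fh, 'Ann 101 CS 88\nBob 102 ME 92\n');   % hypothetical rows
fclose(fh);

fid = fopen(fname);
% One placeholder per column; the starred ones are skipped
data = textscan(fid, '%s %*d %*s %d');
fclose(fid);
delete(fname);
```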
So far.. No loop
• fscanf() and textscan() scan and repeat the format
string. They stop when the pattern no longer matches.
• fgets() and fgetl() do not. They scan 1 line only, so you may have to write a loop to read several lines!
%move cursor past the 11th line (skip the first 11 lines)
for k = 1:11
    fgets(fid); %ignore string returned
end
Try all ideas at home!
• Assume this file, scan the data!
• Read the file and keep only the actual caption lines, then write those caption lines to a new, separate file!
Key Ideas: Low Level Functions
• used when a mixture of data is in the file
• always require the use of fopen() and fclose()
– fopen() has mainly 3 permission modes: read, write, append.
• There are many functions out there! Out of those seen:
– Some are good with strings: fgets(), fgetl()
– Some are good with numbers: fscanf() – returns a numerical array
– Some are good with strings & numbers: textscan() – returns a 1-row cell array
• Note: the actual file name is rarely used outside the fopen() call. All other functions require the file identifier.
Don’t learn these by heart
As always…
F1