FILE I/O: Low-level

The Big Picture

Reading vs. Writing
"Reading a file": to obtain data from a file (typically on a disk) and move a copy of it into RAM.
"Writing a file": to copy data from RAM into a file (typically on a disk).

Why Use Files?
Volatile memory: computers today use "volatile RAM", whose contents are erased when the power goes out. Someday, computers will use "non-volatile RAM" (e.g. flash RAM). Until then, we use "secondary storage" (the hard drive) to save information more permanently.
Packaging: putting information in a file organizes it and keeps it together.
Too much data: frequently we have more information available than we can work with in main memory (RAM). So we store most of it on the hard drive (secondary storage) and retrieve only what we need.

File Applications
    - Databases
    - Logs: attendance, events, history
    - Journals / diaries
    - Address books
    - Sensor data (e.g. Shuttle ECO sensor, automotive O2 sensor)
    - Documents
Almost every program uses files!

Low-Level, cont.
Some files have a mixed format that high-level functions such as xlsread() cannot read. Since the high-level function does not recognize the data, every step of reading the file requires a separate MATLAB command:
    1. Open the file, either to read from it or to write to it
    2. Read or write data from/to the file, using its specific delimiters
    3. Close the file

Opening / Closing Files
"Open a file": the program requests access to a file (from the OS) for reading and/or writing data.
"Close a file": the program informs the OS that it is finished working with the file.

Opening and closing a file
Template to open a file:
    fh = fopen(<filename>, <mode>);
fh is known as a file handle or file identifier. It is used in later function calls to identify which file to use. It is like a "nickname": we use it instead of the filename when working with the file. If the file cannot be opened, fopen returns -1 instead of a valid handle.
<filename> is a string containing the name of the file, including its extension (the letters after the dot).
It can either be hardcoded or stored in a variable.
<mode> is a string specifying the purpose of opening the file, called the "access mode". The most commonly used are:
    'r'   read only from a file: bring data into memory from the disk
    'w'   write to a file: put data from memory onto the disk
    'a'   append to a file: add data to the end of the file
Add + to the string to combine reading and writing (e.g. 'r+', 'w+').
Template to close a file:
    fclose(fh);

Some examples…
    % Example 1) open a file from which to read
    fileGrades = fopen('grades.txt', 'r');   % hardcoded filename
    % <code block to be inserted here>
    fclose(fileGrades);                      % close the file: use the file handle, not the file name!

    % Example 2) ask the user for a filename, then open it to read
    nameFile = input('Name of file with grades? (e.g. grades.txt): ', 's');
    fileGrades = fopen(nameFile, 'r');       % no quotes: nameFile is a variable
    % <code block to be inserted here>
    fclose(fileGrades);                      % close the file
Notice that the file handle variable can have any valid variable name.

Closing Files
After working with a file, it is important to close it. Besides being good form, this is critical when writing to the file. When the OS is supposed to put information on disk, it frequently waits until it determines the best time. This is known as "write caching".
You have seen write caching in the "safely remove" warning for USB drives: the OS may wait to write data. If your program finishes and the data hasn't been written, it may never be written at all! Close the file before the program finishes; this forces the OS to write the data to the disk.

More Examples…
    % Open a file for reading and writing
    fh = fopen('my_project.abc', 'r+');
Would 'w+' work also? Maybe; see the access-mode tables below…
File name: the file extension (.abc here) is used by Windows, but only to tell it which program to use if a Windows user wants to open the file. You are free to use any extension you want for your data files.
The only impact will be that Windows may not know which program should open that file.

Opening Files: Super-secret Access-mode Codes
The "access mode" codes indicate how you will use a file after you open it. Since the operating system assigns permissions to files, when you request access to a file you must tell the system in what mode you will be using it. These codes tell the OS what it needs to know, and they affect how you can use the file.

Opening Files: File Position Pointer
When a file is opened, a "file position pointer" is created. The system keeps track of the point in the file up to which your program has read or written. Think of it as a cursor that moves as you read or write the file. The pointer's initial location depends on the access mode:

    Access mode   Initial position
    r, r+         Top (beginning) of file
    w, w+         Top (beginning) of file
    a, a+         Bottom (end) of file

Opening Files: Access Mode & File Existence
In addition to the file position pointer, the system also has to decide what happens if the file does or does not exist when you try to open it. If it already exists, should its contents be deleted? If it doesn't exist yet, should it be created?

    Access mode   Delete existing contents?   Create if missing?
    r, r+         No                          No
    w, w+         Yes                         Yes
    a, a+         No                          Yes

You should be able to reason this out; memorization is not the key here!

Opening Files: Choosing an access mode
A "log file" is a file that keeps a history of events. Many programs keep log files; they help programmers see what occurred in the past so that a problem can be fixed. If your program is going to keep a log file, what is the best mode to use when opening it? Why?

You are writing a program that will manage a database.
You will be accessing files at different times within the program, so you decide to close and reopen the file several times. For each of these times, how should you open the file?
    1. User wants to view a record in the database
    2. User wants to modify a record in the database
    3. User wants to add a record to the database

Writing Text Files
    fprintf(<file handle>, ...the rest is as usual...);
Use the file handle, not the file name! And don't forget the semicolon; otherwise MATLAB displays a number in the Command Window, because fprintf() returns the number of bytes it wrote.
Example:
    fh = fopen('log_file.txt', 'a');
    fprintf(fh, 'Event #%d: \t%s\n', event_num, event_description);
    fclose(fh);

MS Windows Text files
When writing to a text file, MATLAB writes only a single newline character (\n) at the end of a line, yet Windows expects two different characters there. So if you open the file in (older versions of) Notepad, it will not look like you expect:
[Figure: the file as displayed in Notepad]
There is nothing wrong with this, unless you intend to work with the file outside of your program (and in Windows). To make it Windows-ready, write both a carriage return (\r) and a newline (\n):
    fprintf(fh, 'Event #%d: \t%s\r\n', event_num, event_description);

Writing Text Files: Inserting data into the middle of a text file
Writing to text files is not like working in Word! When you write to a text file, the new data overwrites any existing data after the file position pointer: there is no "insert mode"!
[Figure: what we think should happen…]
[Figure: what REALLY happens…]
To avoid this problem… you can't. You must write code that moves the existing file data so that you can insert the new data. This might mean copying to a new file, or looping and overwriting the old data.
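The "copy to a new file" approach can be sketched as follows. This is a minimal illustration under stated assumptions, not the course's reference solution: the file name notes.txt, the inserted text, and the variables insertAfter and newLine are hypothetical, and the loop uses fgetl(), which is covered in the reading section. It builds a small sample file, copies it line by line into a temporary file while writing the new line at the chosen position, and then replaces the original with the copy.

```matlab
% Minimal sketch: insert a line into the middle of a text file by copying.
% 'notes.txt', insertAfter and newLine are hypothetical placeholders.
srcName = 'notes.txt';
tmpName = [tempname, '.txt'];      % temporary file in the system temp folder

% Build a small sample file to work on ('w' creates it or erases its contents)
fh = fopen(srcName, 'w');
fprintf(fh, 'line one\nline two\nline three\n');
fclose(fh);

insertAfter = 1;                   % insert the new line after line 1
newLine = 'inserted line';

src = fopen(srcName, 'r');
dst = fopen(tmpName, 'w');
if src < 0 || dst < 0
    error('Could not open the source or temporary file.');
end

lineNum = 0;
while ~feof(src)
    s = fgetl(src);
    if ~ischar(s)                  % fgetl returns -1 when no line is left
        break;
    end
    lineNum = lineNum + 1;
    fprintf(dst, '%s\n', s);       % copy the existing line unchanged
    if lineNum == insertAfter
        fprintf(dst, '%s\n', newLine);   % write the new line right after it
    end
end
fclose(src);
fclose(dst);

% Replace the original file with the rebuilt copy ('f' forces overwrite)
movefile(tmpName, srcName, 'f');
```

Every existing line is read and rewritten once, which is the price of having no insert mode; the original file is only replaced after the copy is complete.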
Reading text files
To read an entire line as a string:
    str = fgets(<file handle>);   % keeps the newline character in str
    str = fgetl(<file handle>);   % does not keep the newline character
To read numeric data:
    data = fscanf(<file handle>, <format>);

Using fgets()
fgets() includes the newline character in your variable. Suppose we had this data file:
[Figure: contents of testdata.txt]
and ran this program:
    fh = fopen('testdata.txt', 'r');
    x = fgets(fh);
    fprintf('->%s<-', x);
Notice that the output shows TWO line breaks inside the variable:
[Figure: Command Window output]
This is because Windows text files use two characters to mark "end of line": a carriage return (\r) followed by a newline (\n). Most other systems use only one character. MATLAB displays both of these characters as line breaks. Fortunately, it's easy to fix:
    x = x(1:end-1);   % remove just the last character
    x = x(1:end-2);   % remove BOTH characters (\r and \n)
    x = strcat(x);    % strcat removes trailing whitespace from character arrays
    x = deblank(x);   % so does deblank

Using fgetl()
fgetl() reads past the newline, but does NOT include the newline character in your variable.

Using fscanf()
fscanf() is like the reverse of fprintf(). You specify the format you want to match, and fscanf() reads from the file as long as it can match that format. fscanf() is not good for reading strings: when the format mixes text and numeric conversions, it stores the characters as their ASCII codes.
Suppose we had this data file:
[Figure: data file with three lines of two tab-separated integers]
After opening the file, you could read the contents using:
    data = fscanf(fh, '%d\t%d')
However, the result is a single column:
[Figure: the column-vector result]
This demonstrates that fscanf() reads the data in line order, but stores it as a column. You can change this shape by adding one more argument to the function call:
    data = fscanf(fh, '%d\t%d', [2, 3])   % the size argument [2, 3] is new
[Figure: the 2x3 result]
MATLAB still reads the data in line order and still stores it in column order. But we have now specified how big the columns will be: two rows each.
But we may want the data to be in the form of the file.
Unfortunately, changing the size argument to [3, 2] doesn't help:
    data = fscanf(fh, '%d\t%d', [3, 2])
[Figure: original file data compared with the 3x2 result]
This is because fscanf() still fills the variable in "column order": it fills a column first and then moves on to the next column.
To fix this, first read the data in as a 2x3 matrix:
    data = fscanf(fh, '%d\t%d', [2, 3]);
Then transpose the matrix:
    data = data';
But suppose we don't know how many sets of data will be in the file? Use MATLAB's inf constant. It means "as many as needed":
    data = fscanf(fh, '%d\t%d', [2, inf]);
Now, if the data file gets larger, your program can still handle it. Will this work?
    data = fscanf(fh, '%d\t%d', [inf, 2])

Moving around within files
When reading and writing files, the system maintains a "file position pointer". Think of it as a cursor tracking your position in the file. Every time you read from the file, the pointer moves past all of the characters you have read. Every time you write to the file, the pointer is left immediately after the last character you wrote.
    fseek()    moves to a specific byte position within the file
    frewind()  moves to the beginning of the file
    ftell()    returns the pointer's byte position (number of bytes from the beginning of the file)
    feof()     returns 1 (true) if the pointer is at the end of the file. Note that the pointer must be past any non-visible characters (newlines, tabs, spaces, etc.) for this to occur.
fseek():
    fseek(fh, 10, 'bof');    % 10 bytes after the beginning of the file
    fseek(fh, -22, 'eof');   % 22 bytes BEFORE the end of the file
    fseek(fh, 0, 'cof');     % 0 bytes from the current position
The offset is a number of BYTES, not characters!
    'bof' = beginning of file
    'eof' = end of file
    'cof' = current position in file

    fseek(fh, 0, 'cof');
Why would we want to move 0 bytes from the current position?
Because of a (frequently unmentioned) property of files: you cannot read from and then write to a file (or write to and then read from a file) without setting the file position pointer in between. The command above sets the pointer without moving it.
Example: suppose testfile.txt already exists. We want to find a location within the file, and then write to the file.
Doesn't work:
    fh = fopen('testfile.txt', 'r+');
    % ... find the location ...
    x = fgets(fh);
    fprintf(fh, 'fred');
    fclose(fh);
Works:
    fh = fopen('testfile.txt', 'r+');
    % ... find the location ...
    x = fgets(fh);
    fseek(fh, 0, 'cof');   % set the pointer between the read and the write
    fprintf(fh, 'fred');
    fclose(fh);

frewind(fh) is essentially the same as fseek(fh, 0, 'bof').
ftell(fh) returns the byte position within the file. Example:
    p = ftell(fh);          % remember the current position
    % ...
    fseek(fh, p, 'bof');    % return to it later
CAUTION: byte positions depend on the format of the file; do not assume that a byte and a character are the same thing!

feof() is normally used as a condition:
    fh = fopen('datafile.txt', 'r');
    data = [];
    while (fh > 0 && ~feof(fh))
        s = fgetl(fh);
        if ~ischar(s)              % fgetl returns -1 if no line was left to read
            break;
        end
        data = strvcat(data, s);   % append s as a new row of a char matrix
    end
    if fh > 0
        fclose(fh);
    end
What does this condition mean? Note that the order of the Boolean expressions is important: we want to test for a valid file handle before we use it in the feof() call (&& stops as soon as the first test is false).

EXTRA: Binary Files (not on any exam…)
Many programs today do not use ASCII text for their files. ASCII is great for being able to read the data file, but it can make the file unwieldy. As an alternative, files can be stored as "binary data": the stored data is not intended to be read as ASCII.
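To make "unwieldy" concrete, here is a small sketch (not from the original slides) that stores the same three numbers once as ASCII text and once as binary int32 values, then compares the file sizes with dir(). The file names values.txt and values.bin are hypothetical, and the sketch borrows fwrite(), which is introduced later in this section. As text, each seven-digit value costs seven characters plus a newline; as int32 it always costs four bytes.

```matlab
% Sketch: the same numbers stored as ASCII text and as binary int32 values.
% 'values.txt' and 'values.bin' are hypothetical file names.
values = int32([1000000, 2000000, 3000000]);

% ASCII: one byte per digit, plus one newline byte per value
fh = fopen('values.txt', 'w');
fprintf(fh, '%d\n', values);
fclose(fh);

% Binary: every int32 value takes exactly 4 bytes, whatever its size
fh = fopen('values.bin', 'w');
fwrite(fh, values, 'int32');
fclose(fh);

txtInfo = dir('values.txt');
binInfo = dir('values.bin');
fprintf('ASCII file:  %d bytes\n', txtInfo.bytes);   % 3 * (7 + 1) = 24
fprintf('Binary file: %d bytes\n', binInfo.bytes);   % 3 * 4 = 12
```

The binary file is half the size here, but it can no longer be read in a text editor: that is the trade-off.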
Example uses of binary files:
    - Image formats
    - Audio files
    - Sensor data
    - Real-time data processing
    - Encrypted data

For example, part of a JPEG file (as viewed in Notepad):
[Figure: part of a JPEG file viewed in Notepad]

To work with binary files, a new tool is handy: the "hex editor". A hex editor shows you the binary values stored in a file, but in a form humans find usable.

Just as with ASCII files, you must know the format of the file in order to work with it. Once you know the format, you can read and write the file, but first you must open it in "binary" mode. In Windows, just add a "b" to the access mode:
    fh = fopen('myfile.bin', 'rb+');

Reading from the file is a bit different: fscanf() has no placeholder for binary data! So we use the fread() function:
    % Read up to 1000 bytes from the file
    data = fread(fh, 1000);

Writing binary files uses the fwrite() function. Here's an example using hardcoded data:
    fwrite(fh, [1, 2, 3; 4, 5, 6]);
Usually, we will write data using variables.

As always… F1
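A round trip through fwrite() and fread() can be sketched as below; matrix.bin is a hypothetical file name. Two details the slides leave implicit: without a precision argument, fwrite() stores each value as a single unsigned byte ('uint8'), so the sketch passes 'int16' explicitly; and fwrite() writes a matrix in column order, so what fread() returns is a column vector that must be reshaped. The frewind() call also satisfies the "set the pointer between writing and reading" rule from the file-position section.

```matlab
% Sketch: write a matrix to a binary file, then read it back.
% 'matrix.bin' is a hypothetical file name.
A = [1, 2, 3; 4, 5, 6];

fh = fopen('matrix.bin', 'w+');     % create/erase the file, allow reading too
if fh < 0
    error('Could not open matrix.bin');
end

count = fwrite(fh, A, 'int16');     % writes 6 values in column order: 1 4 2 5 3 6
frewind(fh);                        % set the pointer before switching to reading

raw = fread(fh, inf, 'int16');      % read every remaining value as a column of doubles
fclose(fh);

B = reshape(raw, size(A));          % rebuild the 2x3 shape (column order again)
disp(isequal(A, B));                % displays 1 (true)
```

Because the precision is fixed, each value occupies exactly 2 bytes, so the position of the k-th value is 2*(k-1) bytes from the start and can be reached directly with fseek().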