Computer Science 121
Scientific Computing
Winter 2012

Chapter 5
Files and Scripts

● File (non-technical): (Word) document, image, recording, video, etc.
● File (technical): a named collection of bytes on disk.

ASCII vs. Binary
● “ASCII file” means “file that can be viewed as text by a program (Notepad) that interprets each byte as an ASCII code”.
● Binary file is anything that cannot be viewed that way.
● “JPEG file” means “file that can be viewed as an image by using a program (Photoshop) that interprets the bytes as a JPEG-encoded image”.
● “MP3 file” means “file that can be heard as an audio recording by using a program that interprets the bytes as an MP3-encoded audio stream”.
● “Foo file” means “file whose contents can be experienced by using a program that interprets the bytes as a Foo encoding”.
● XML (eXtensible Markup Language) is an attempt to compromise between binary and ASCII: make all data human-readable.

5.1 Filenames
● General format: name . extension
● For historical reasons, extension is usually three characters.
● Extension tells OS what program to use to open file (MS Word, Excel, Matlab, ...)

Aside: File Deletion
● Q.: What happens when you “delete” a file?
[Figure: directory entries sort.m, foo.m, OMFG.jpg, hamlet.doc; data blocks on disk 011010, 110101, 000100, 111011]
(Drag OMFG.jpg to trash and empty trash…)
● A.: What appears to happen...
[Figure: directory entries foo.m, sort.m, hamlet.doc; data blocks 011010, 110101, 111011]
● A.: What actually happens...
[Figure: directory entries foo.m, sort.m, hamlet.doc; data blocks 011010, 110101, 000100, 111011 all still on disk]
● Then use WinUnDelete (e.g.) to get back OMFG.jpg

Directory Structure
● Directories (folders) are organized hierarchically (one inside another)
● So we are forced to choose a single organization method (like a library with a card catalog indexed only by author)
● But we can use links (shortcuts) to add additional organization, without copying files.
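To make the "named collection of bytes" definition concrete, here is a minimal sketch that writes three characters to disk and reads them back as numbers. The scratch file name ascii_demo.txt and its location in tempdir are assumptions made for this illustration; they do not come from the slides.

```matlab
% Hypothetical scratch file in the system temp directory (not from the slides).
fname = fullfile(tempdir, 'ascii_demo.txt');

% Write three characters to disk as raw bytes.
fid = fopen(fname, 'w');
fwrite(fid, 'foo');
fclose(fid);

% Read the same bytes back as numbers: these are the ASCII codes.
fid = fopen(fname, 'r');
bytes = fread(fid)';          % row vector of doubles, one per byte
fclose(fid);

disp(bytes)                   % 102 111 111
disp(char(bytes))             % foo

% Split the complete filename into folder, name and extension (Section 5.1).
[folder, name, ext] = fileparts(fname);
disp(ext)                     % .txt

delete(fname);                % tidy up the scratch file
```

The file holds the same three bytes either way. Whether we see numbers or text depends only on how the reading program chooses to interpret them.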
Pathnames
● Pathname is “full name” of directory in a linear form
  - e.g., C:\MyDocuments\cs121\myproj\new\
● Complete filename includes path
  - e.g., C:\MyDocuments\cs121\myproj\new\myprog.m
● This becomes important because of the ...

Working Directory
>> pwd   % print working directory
ans =
C:\MATLAB\work
● Without extra effort, we can only access files in our working directory
>> myprog   % run myprog.m script
ERROR: myprog? LOL!!
● Solutions
  - Make shortcuts from working directory (annoying)
  - cd to the script's directory (but files kept elsewhere are still not found):
    >> cd('C:\MyDocuments\cs121\myproj\new\')
    >> myprog
    ERROR: Can't find someOther.m… loser!
  - Use Matlab File menu to add paths: File / Set Path...

Set Path
[Figure: Set Path]

How Matlab Uses Paths
● When we type a name foo into the interpreter, Matlab follows this sequence:
1. Looks for foo as a variable. If not found, ...
2. Looks in the current directory for a file named foo.m. If not found, ...
3. Searches the directories on the MATLAB search path, in order, for foo.bi (built-in function) or foo.m. If not found, ...
4. Reports ERROR

5.2 File operators
● File write/read operators allow us to save/restore values from previous Matlab sessions.
● File / Save Workspace As... is simplest way to do this
  - saves everything to a .mat file
● If we want to save/restore specific variables, we can use the save and load commands:
>> a = 'foo'; b = 2; c = pi;
>> save myvariables a b
>> clear
>> load myvariables
>> who
Your variables are:
a  b
  - I never use the other syntax: >> save('myvariables', 'a', 'b')

5.3 Importing and Exporting Data
● Often want to get data from other programs (Excel, LabView, text editor) into Matlab, and save data in a format that other programs can read.
● Excel saves data in binary, proprietary (of course!) .xls format
● Generally, other formats will all be text-based (ASCII)
  - .csv : comma-delimited values (no commas in vals)
  - .dlm : other delimiter (allows commas in vals)
  - .xml : eXtensible Markup Language (newer)
● Spreadsheet data should have all cells filled (“flat format”), or Matlab will get confused:
[Figure: two example spreadsheets, labeled YES and NO]
● csvread operator allows us to read numerical data, but we need to cut off the header in the file:
  - Remove it by hand from the file:
    >> d = csvread('sunspots-noheader.csv');
  - Specify the number of lines to ignore in csvread:
    >> d = csvread('sunspots.csv', 1, 0);   % ignore first line (skip 1 row, 0 columns)
>> d = csvread('sunspots.csv', 1, 0)
d =
   1749   1   58
   1749   2   62.6
   1749   3   70
   etc.
● importdata command is useful for heterogeneous data.
● Returns a data structure:
>> d = importdata('sunspots.csv')
d =
         data: [2820x3 double]
     textdata: {'Year', 'Month', ...
   colheaders: {'Year', 'Month', ...

Non-numerical ASCII Files
● txt files : anything we want to treat as text (ASCII characters)
>> fid = fopen('mobydick.txt');
>> s = fread(fid);
>> fclose(fid);
>> s
s =
    32
    67
    97
    ...   % need to munge this
>> s = char(s')   % transpose, textify
s =
 Call me Ishmael. Some years ago - never mind how long precisely - having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world....
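The read-then-convert steps above can be tried without a copy of the novel. The sketch below first writes a one-sentence stand-in file, so the name ishmael_demo.txt and the use of tempdir are assumptions for illustration only. It also checks the fopen result, which the slide version leaves out.

```matlab
% Hypothetical scratch file standing in for mobydick.txt.
fname = fullfile(tempdir, 'ishmael_demo.txt');
fid = fopen(fname, 'w');
fwrite(fid, 'Call me Ishmael.');
fclose(fid);

% Read the raw bytes, as on the slide.
fid = fopen(fname, 'r');
if fid == -1
    error('Could not open %s', fname);   % fopen returns -1 on failure
end
codes = fread(fid);           % column vector of byte values (doubles)
fclose(fid);

str = char(codes');           % transpose to a row, then convert codes to characters
disp(str)                     % Call me Ishmael.

% Alternative: ask fread to return characters directly.
fid = fopen(fname, 'r');
str2 = fread(fid, Inf, '*char')';
fclose(fid);

delete(fname);                % tidy up the scratch file
```

Both routes give the same character row vector. The second simply moves the numeric-to-character conversion into the fread call.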
● textread does the reading and conversion for us, and tokenizes words into a cell array:
>> s = textread('mobydick.txt', '%s')   % '%s': treat as strings
s =
{'Call', 'me', 'Ishmael.', …

5.4 Scripts
● You know most of this stuff already ☺
● You can run a script (e.g., myprog.m) from the interpreter:
>> myprog
● Tips
  - Don't name any variables myprog
  - Don't use any blank spaces in script names
  - Re-read search path stuff from a few pages back

5.5 Scripts as Computations
● Scripts are (mostly) like typing directly into the interpreter, so variables can get overwritten
● This also means that a script has no return value (no ans):
>> x = myprog
ERROR: loser trying to execute SCRIPT myprog as a program.
● Nor can we pass arguments:
>> myprog(7)
ERROR: My name is Donnie, and you suck at Matlab.
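The overwriting behaviour from Section 5.5 can be seen in a small experiment. This is a sketch under stated assumptions: the script name overwrite_demo.m and its one-line contents are invented for the demo, and the file is created in tempdir and started with run so that nothing has to be added to the search path.

```matlab
% Hypothetical script name and contents, written to a scratch folder for the demo.
scriptFile = fullfile(tempdir, 'overwrite_demo.m');
fid = fopen(scriptFile, 'w');
fprintf(fid, 'x = 99;\n');    % the script's only statement
fclose(fid);

x = 1;                        % a variable already in our workspace
run(scriptFile);              % the script executes in this same workspace
disp(x)                       % 99: the script replaced our x

delete(scriptFile);           % tidy up the scratch file
```

Because the script shares our workspace, its assignment to x changes our x. This is also why a script has nothing to hand back as a result.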