Uploaded by Chen Xiang

KAUST ARAMCO summer2022 Lecture6 fileIO

Practical tools for Machine Learning
Programming with Python
Lecture 6: File I/O
Malek Smaoui
So far …
• All input is entered by a user on the keyboard
• Use input() function
• All output is displayed on the screen and can only be read by the user
at that point
• Use print() function
• Implications:
• Human-application interaction is significant
• Long term storage of results is not possible
More realistically …
• Consider an application performing geophysical simulation to detect the presence
of oil in the ground
• The application requires as input “imagery” data of multiple ground layers over large areas
• The application will produce an estimation of the distribution of oil in the different portions
• The current I/O mode is not sustainable for applications that require significant amounts of
input data and produce significant amounts of output
• Entering input data on the keyboard can take a very long time, or be impossible due to a tricky
format
• The data most probably already exists on the computer as a digital file
• The user may not be able to consult and use all the output in one sitting, as soon as it
appears on screen
• It is important to keep records of multiple simulations for comparison, archiving, etc …
Files
• Data can be stored “permanently” on secondary storage of computer
systems
• The basic unit of long term data storage is the file
• The file system is the part of the computer’s operating system
responsible for managing files on the different secondary storage
devices (local hard drive, network drive, removable drive, …)
File IO
• Programs can communicate with the file system to
• Create new files on a drive
• Read from existing files
• Modify existing files
• A program can:
• read input from one or multiple files
• Write output to one or multiple files
• Write to and read from temporary files at different stages of solving the
problem
• A set of objects and functions are available to embed these
operations in programs
Accessing a file via file object
• From the program’s point of view, a file can be abstracted as a stream of bytes
• Simplest files are (raw) text files where each byte is interpreted as a character
• The function open(filename[, mode, [buffering]]) returns an
object that is an abstraction (representation) of a file
• The file is designated by its name (if it is in the working directory) or path only at
opening
• Once the object is created, it can be used to:
• Read the existing stream of characters using for instance the read(…) method
• Write to the stream of characters at a specific position using for instance the
write(…) method
• “read” and “write” operations can only be done on a file object obtained
by an “open” operation
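The open, write, read cycle described above can be sketched as follows. This is a minimal illustration, not part of the original slides: the file name notes.txt and the temporary directory are assumptions chosen so that the example does not touch any real files.

```python
import os
import tempfile

# Hypothetical file name inside a temporary directory (an assumption for
# this sketch, so that no existing file is modified).
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "notes.txt")

# Open for writing: the file is created and write() sends characters to it.
f = open(path, "w")
f.write("first line\n")
f.close()

# Open again for reading: read() returns the stored characters as a string.
f = open(path, "r")
text = f.read()
f.close()

print(text)
```

Note that the path is only needed at opening; every later operation goes through the file object f.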
Opening a file
• file = open(filename[, mode, [buffering]])
• Filename: a string containing the name or path of the file
• Mode: an optional string specifying whether the file is to be
opened for reading, writing or appending
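• Buffering: an optional integer that specifies whether writing
happens immediately or is delayed until flushing or closing the
stream: 0 disables buffering (binary mode only), 1 selects line
buffering (text mode only); by default Python chooses a buffer size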
Open mode
Mode: description (open for)
Mode ‘r’ (‘rb’), the default mode:
• Reading: OK ; Writing => error
• File exists: open it ; File does not exist => error
• Initial I/O position: beginning
Mode ‘w’ (‘wb’):
• Reading => error ; Writing: OK
• File exists: open and empty ; File does not exist: create new empty
• Initial I/O position: 0 because file initially empty
Mode ‘a’ (‘ab’):
• Reading => error ; Writing: OK
• File exists: open as is ; File does not exist: create new empty
• Initial I/O position: end ; ALWAYS write at the end of the file (append)
Mode ‘[m]+’, where m is any of the above:
• Both reading and writing or appending
• File existence and initial I/O position rules apply as above
Suffix ‘b’: read/write in binary mode (not text)
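The behaviour of the modes above can be checked with a short experiment. This is only a sketch: the file name log.txt and the temporary directory are assumptions made for illustration.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "log.txt")  # hypothetical file name

# 'r' on a file that does not exist raises an error.
try:
    open(path, "r")
    missing_raised = False
except FileNotFoundError:
    missing_raised = True

# 'w' creates the file; opening it again with 'w' empties it first.
f = open(path, "w")
f.write("old content")
f.close()
f = open(path, "w")
f.write("new")
f.close()

# 'a' keeps the existing content and always writes at the end.
f = open(path, "a")
f.write(" + appended")
f.close()

# 'r+' allows both reading and writing on an existing file.
f = open(path, "r+")
content = f.read()
f.close()

print(missing_raised, content)
```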
Basic reading and writing
• s = file.read([size])
• Returns a string of length size read
from the file
• If the number of characters until
the end of the file n < size, then
the returned string is of length n.
• If size is not specified or negative,
returns a string with all the
characters in the file until the end-of-file (eof)
• file.write(content)
• content must be a string
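A small sketch of how the size argument of read behaves, assuming a hypothetical file sample.txt that holds eight characters:

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "sample.txt")  # hypothetical file name

f = open(path, "w")
written = f.write("abcdefgh")  # write() returns the number of characters written
f.close()

f = open(path, "r")
first = f.read(3)      # exactly 3 characters
rest = f.read(100)     # fewer than 100 characters remain, so a shorter string
at_eof = f.read()      # nothing left: an empty string
f.close()

print(written, repr(first), repr(rest), repr(at_eof))
```

An empty string returned by read is the usual sign that the end of the file has been reached.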
Closing a file
• Before a program terminates, or when no more reading or writing is
needed, the file should be closed using file.close()
• Closing the file is important: it makes sure that pending I/O operations
(such as buffered writes) complete safely before the program terminates.
• Once a file object is closed, the variable that referred to it can be
reused for another file by assigning it the result of a new open call
• Reusing the variable to open a new file, without closing the file
currently associated with it, results in:
• Losing access to the currently associated file
• Previous read/write operations may not complete properly
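The sketch below illustrates closing, again with an assumed file name (result.txt). It also shows the with statement, which is not covered in the slides above but is a common alternative that closes the file automatically.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "result.txt")  # hypothetical file name

f = open(path, "w")
f.write("42\n")
f.close()                       # buffered data is written out here
closed_after_close = f.closed   # the closed attribute reports the state

# Using a closed file object raises ValueError.
try:
    f.write("more")
    write_failed = False
except ValueError:
    write_failed = True

# Alternative (not in the slides): a with block closes the file
# automatically when the block ends.
with open(path, "r") as g:
    value = g.read()
closed_after_with = g.closed

print(closed_after_close, write_failed, value, closed_after_with)
```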
Exercises
1. Write a program where you copy the contents of “input.txt” in the
file “output.txt”
2. Write a program that reads a set of integers from a file, then
appends their sum to it
Accessing files on the file system
• Acquiring access to specific file
requires knowledge of its location
in the file system (disk)
• Most file systems organize files in a
hierarchy or tree
• The root directory of the tree is the
storage device
• Many branches / internal nodes
represent the different subdirectories
• Files are the leaves of the tree
• The location of a file is specified via
its path
Absolute vs relative path
• Absolute path: a slash separated sequence of directories starting at
the file system root and specifying the hierarchy of a given file (or
directory)
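• Working directory: the directory that relative paths start from; by default it is the directory from which the program is launched, which is often the directory where the module exists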
• Relative path: a slash separated sequence of directories starting at
the working directory and specifying the hierarchy to a given file
• Files in the working directory can be opened via their filename only
• The filename is indeed the relative path
• Files in different directories can be opened via their absolute path or
their relative path
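To make the distinction concrete, the sketch below opens the same file once through an absolute path and once through a relative path. The directory name data and the file name input.txt are assumptions, and os.chdir is used only to set a known working directory for the demonstration.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()                 # an absolute path
os.mkdir(os.path.join(workdir, "data"))      # hypothetical subdirectory

# Absolute path: starts at the file system root.
abs_path = os.path.join(workdir, "data", "input.txt")
f = open(abs_path, "w")
f.write("hello")
f.close()

# Make workdir the working directory so that relative paths start there.
previous_cwd = os.getcwd()
os.chdir(workdir)

# Relative path: starts at the working directory.
rel_path = os.path.join("data", "input.txt")
f = open(rel_path, "r")
via_relative = f.read()
f.close()

abs_is_absolute = os.path.isabs(abs_path)
rel_is_absolute = os.path.isabs(rel_path)
same_file = os.path.samefile(rel_path, abs_path)

os.chdir(previous_cwd)   # restore the original working directory
print(via_relative, abs_is_absolute, rel_is_absolute, same_file)
```

os.path.join is used instead of typing the slashes by hand so that the separator matches the operating system.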
Read/write cursor
• Determines at which character (position) the next read/write operation will start
• Is initially set to 0 (beginning of the file) when the file is opened in ‘r’ or ‘w’ mode, and
to the end-of-file when the file is opened in ‘a’ mode
• Is updated after each read/write operation to the position at which the operation has
ended
• file.tell(): returns the current read/write position
• file.seek(offset [, ref]): changes the read/write position by offset from
ref
• ref can be 0 (default) for the beginning of the file
• ref can be 1 for the current position, offset has to be 0 (for text files)
• ref can be 2 for the end-of-file, offset has to be 0 (for text files)
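A minimal sketch of tell and seek, assuming a hypothetical file digits.txt that contains only plain ASCII characters on a single line. With such content the positions reported by tell match character counts; for text files in general the value returned by tell should be treated as an opaque position.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "digits.txt")  # hypothetical file name

f = open(path, "w")
f.write("0123456789")
f.close()

f = open(path, "r")
start = f.tell()         # cursor starts at the beginning
f.read(4)                # reading moves the cursor forward
after_read = f.tell()

f.seek(7)                # jump to position 7 from the beginning (ref = 0)
from_seven = f.read(2)

f.seek(0, 2)             # jump to the end-of-file (ref = 2, offset 0)
size = f.tell()

f.seek(0)                # back to the beginning
again = f.read(3)
f.close()

print(start, after_read, from_seven, size, again)
```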
Exercise
• Write a program where you open the file twinkle.txt then read and
print to the screen exactly the ten characters at the middle of the file.
• You can only read ten characters
• Use the methods tell and seek to position the read/write cursor at the right
character, then read the ten characters using the read method (with its argument
set to 10)
Reading from a file
• s = file.read([size])
• If size is not specified or negative, returns a string with all the characters in
the file until the end-of-file (eof)
• s = file.readline([size])
• Returns a string containing one line (up to the new line character) from the
file with a maximum number of characters size
• L = file.readlines([size])
• Returns a list of strings where each is a line from the file
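• For read and readline, size is the maximum number of characters to be
read. If fewer characters are left until the end of the line or end of the file, then
shorter strings are returned. For readlines, size is only a hint: no more lines are
read once the total number of characters read so far exceeds it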
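The difference between readline and readlines can be seen in the following sketch, which assumes a hypothetical three-line file lines.txt:

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "lines.txt")  # hypothetical file name

f = open(path, "w")
f.write("one\ntwo\nthree\n")
f.close()

f = open(path, "r")
line1 = f.readline()       # a full line, including the newline character
part = f.readline(2)       # at most 2 characters of the next line
rest_of_line = f.readline()  # the remainder of that same line
remaining = f.readlines()  # a list with all the lines that are left
at_eof = f.readline()      # empty string at the end of the file
f.close()

print(repr(line1), repr(part), repr(rest_of_line), remaining, repr(at_eof))
```

Each returned line keeps its trailing newline character, which is why programs often strip it before further processing.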
Reading from file
• for line in file:
…
• Iterate through lines using a for loop
• All read operations start from the current cursor position.
• The cursor is incremented by the number of characters read
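As an illustration of iteration starting from the current cursor position, the sketch below reads one line with readline and then loops over what is left. The file name greek.txt is an assumption.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "greek.txt")  # hypothetical file name

f = open(path, "w")
f.write("alpha\nbeta\ngamma\n")
f.close()

f = open(path, "r")
first = f.readline()       # moves the cursor past the first line

remaining = []
for line in f:             # the loop continues from the current position
    remaining.append(line.rstrip("\n"))   # drop the trailing newline
f.close()

print(repr(first), remaining)
```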
Exercise
• Write a program where you print to the screen the lines of the file
twinkle.txt in reverse order:
• Output:
Like a teatray in the sky.
Up above the world you fly,
How I wonder what you're at!
Twinkle, twinkle, little bat!
Writing to a file
• file.write(content)
• content must be a string
• file.writelines(list_of_content)
• list_of_content must be a list of strings
• All write operations start from the current cursor position.
• The cursor is incremented by the number of characters written
• If the read/write cursor is not at the end-of-file, the write operation
overwrites the bytes at the cursor position
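The sketch below shows write, writelines, and overwriting at the cursor position. The file name greeting.txt is an assumption; the mode ‘r+’ is used for the overwrite because it keeps the existing content and starts at position 0.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "greeting.txt")  # hypothetical file name

f = open(path, "w")
f.write("Hello world\n")
# writelines does not add newline characters; each string must carry its own.
f.writelines(["second\n", "third\n"])
f.close()

# 'r+' keeps the content and places the cursor at the beginning,
# so this write overwrites the first character instead of inserting.
f = open(path, "r+")
f.write("J")
f.close()

f = open(path, "r")
content = f.read()
f.close()

print(content)
```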
Exercise
• Write a program where you write the lines of the file twinkle.txt in
reverse order to a new file
Files and functions
• A file can be opened, read or written to then closed in a function
• The file object is a local variable
• The function can obtain the file name/path as an argument
• A function can also perform operations on a file object provided as
argument
• Any file operations performed in the function will reflect on the file
• Changes to the read/write cursor position due to these operations will reflect on the
argument and should be taken into consideration in the calling code
• It is possible to check whether the file is readable() or writable() before
attempting the corresponding operation or use try-except to catch potential
exceptions
• Files opened in ‘a’ and ‘a+’ modes are writable, but the write position is always at the end.
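The two styles of function described above might look like the sketch below. The function names count_lines and read_header, and the file table.txt, are hypothetical and chosen only for illustration.

```python
import os
import tempfile


def count_lines(path):
    """Open the file at path, count its lines, and close it again."""
    f = open(path, "r")      # f is a local variable of the function
    n = len(f.readlines())
    f.close()
    return n


def read_header(f):
    """Read the first line from an already open file object."""
    if not f.readable():     # check before attempting the read
        return None
    return f.readline()      # this moves the caller's cursor as well


workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "table.txt")  # hypothetical file name

f = open(path, "w")
f.write("name,value\na,1\nb,2\n")
f.close()

total = count_lines(path)

f = open(path, "r")
header = read_header(f)
rest = f.read()              # continues after the header, not from 0
f.close()

g = open(path, "a")          # append mode is writable but not readable
header_from_append = read_header(g)
g.close()

print(total, repr(header), repr(rest), header_from_append)
```

Because read_header advances the cursor, the calling code only sees the data that follows the header when it reads next.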
Madlibs
• In the 1960s, entertainer Steve Allen often played a game called
madlibs as part of his comedy routine. Allen would ask the audience
to supply words that fit specific categories—a verb, an adjective, or a
plural noun, for example—and then use these words to fill in blanks in
a previously prepared text that he would then read back to the
audience. The results were usually nonsense, but often very funny
nonetheless.
• In this exercise, your task is to write a program that plays madlibs
with the user. The text for the story comes from a text file that
includes occasional placeholders enclosed in angle brackets. Suppose
the input file is the attached carroll.txt.
Madlibs
• Your program must prompt the user for the input file name (path),
read the file and prompt the user for words or phrases to fill in the
placeholders. The program then prints the resulting text (after
replacing the placeholders with the user input) to the screen and
stores it in an output file.
• Note that the number, location and content of the placeholders will vary from one input
file to another, i.e. the program has to extract the user prompts from the
input file
Madlibs
• Sample run based on carroll.txt