11/19 files in python

Files in Python
The Basics
Why use Files?
• Very small amounts of data – just hardcode
them into the program
• A few pieces of data – ask the user to input
them
• More than this, you need an external file
stored on secondary storage
External data files
• Handles large amounts of data
• Data is independent of program, so program can
change without changing data
• Easier to edit data in an editor, instead of during
run of program (can’t go back!)
• Use the same data for input to different programs
• Output files can be saved for later use
• Output of one program can be used for input of
another
Text files versus Binary files
• Text files are created by editors and stored as
character codes such as ASCII
• Binary files are stored as raw binary numbers
and have to be handled differently
• Text files are normally processed sequentially
• Binary files can be manipulated sequentially
or randomly (we will not do binary files in this
class)
Creating a text data file
• This is done just like creating any other text
file
• You can use Notepad
• You can use the editor of the IDE that you use
to write Python
• You can use a word processor like Word if you
are careful to save as plain text
• Store the text file in the same folder as you
put your source code
Delimiters
• The \n (newline) character is a very
important symbol in text files.
• It delimits what Python calls a ‘line’ in the file.
• It gets put into the file whenever you press Enter
at the end of a line
• A blank line is represented by two newlines
together \n\n
• It matters whether you press Enter at the end of
the last line of the file – some methods in Python
will treat the last line differently because of the
\n character
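To make the newline visible, here is a minimal sketch. The file name demo.txt and the temporary folder are assumptions made for this demo only. It writes a line, a blank line, and a last line with no Enter pressed after it, then shows the raw characters with repr().

```python
import os
import tempfile

# Hypothetical file name, created in a temporary folder for this demo.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# "red", then a blank line, then "blue" with NO Enter after it.
outfile = open(path, "w")
outfile.write("red\n\nblue")
outfile.close()

infile = open(path, "r")
contents = infile.read()
infile.close()

# repr() shows the \n characters instead of acting on them.
print(repr(contents))            # 'red\n\nblue'
print(contents.endswith("\n"))   # False: no Enter at the end of the last line
```

The two newlines in a row are the blank line, and the missing final \n is the "did you press Enter on the last line?" case mentioned above.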
Files in Python
Buffers
Why a buffer?
• Computer equipment runs at different speeds: the
hard drive, and secondary storage in general, is
MUCH slower than RAM and the CPU
• This creates a bottleneck, where the faster pieces
have to wait for the slower ones to deliver the
action, service, or data that is needed
• Buffers help this bottleneck! They let the OS
bring in a bigger chunk of data to RAM and hold it
until the program asks for it = not so many trips
to the hard drive
What’s a buffer?
• A buffer is an area of RAM allocated by the OS
to manage these bottlenecks
• Every file you open in your program will have
a buffer
• Also buffers for keyboard, screen, network
connections, modems, printers, etc.
• You the programmer do not have to worry
about this happening, it’s automatic!
Buffer for input file
• When you read from a file, the buffer associated
with the file is checked first – if there’s still data in
the buffer, that’s what your program gets
• If the buffer is exhausted, then the OS is told to
get some more data from the file on the HD and
put it in the buffer
• This process continues until the program ends or
until the file has no more data to read
• Think of a pantry in a house – it’s a buffer
between the people in the house and the
supermarket
Buffer for output file
• You write in your program to an output file
• The data does NOT go directly, immediately to
the hard drive, but to an output buffer
• The OS monitors this buffer – when it is full, it
is all written to the hard drive at one time
• Think of a garbage can in a house – it is a
buffer to hold trash until it can all be taken to
the landfill at one time
Why do I care about buffers?
• You can see most of the action on a buffer is
automatic from the point of view of most
programmers
• BUT! if you forget to close your file when you are
finished with it, the file can be left in an
“unfinished” state!
• Some operating systems do not clean up reliably
when your program is over: they should close all
files automatically, but sometimes they don't!
Why do I care?
• A file in an “unfinished” state may be one of
those files you run across after an application has
crashed. If you try to erase it, the OS says “no,
that file is still busy”, even though it’s not.
• Especially for output files, your file on the hard
drive may not get that last buffer of data that you
thought your program wrote to the file if you
forget to close the file! The file will be missing
data or possibly missing altogether if the file was
small.
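The effect of the output buffer can be sketched as follows. The file name buffered.txt is made up for this demo, and the size seen before the close depends on the Python version and the OS, so treat the first number as "probably 0" rather than guaranteed.

```python
import os
import tempfile

# Hypothetical output file in a temporary folder.
path = os.path.join(tempfile.mkdtemp(), "buffered.txt")

outfile = open(path, "w")
outfile.write("last line of results\n")

# The text is probably still sitting in the output buffer, so the file
# on disk is often still empty here (this is not guaranteed).
size_before_close = os.path.getsize(path)

outfile.close()   # the buffer is written out to the disk
size_after_close = os.path.getsize(path)

print("before close:", size_before_close, "bytes")
print("after close: ", size_after_close, "bytes")
```

If the program stopped before the close, the data counted in the second number might never reach the hard drive.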
[Figures: the file and its buffer before the open, after the open, after one readline(), and after two more readline() calls]
Don’t forget!
• Don’t forget to close your files!
– and the close statement must look like
– infile.close()
No arguments in the parentheses but they must be
there!
Files in Python
Opening and Closing
Big Picture
• To use a file in a programming language
– You have to open the file
– Then you process the data in the file
– Then you close the file when you are done with it
• This is true for input files or output files
Opening a file
• To use a file, you first have to open it
• In Python the syntax is
infile = open("xyz.txt", "r") # for input (read)
or
outfile = open("mydata.txt", "w") # for output (write)
It creates a link between that variable name in the
program and the file known to the OS
Processing in general
• Processing is a general term
• In Python there are at least 4 ways to read
from an input file
• And two ways to write to an output file
• They all use loops in one way or another
• See other talks for details
Closing a file
• When you are finished with the file (usually
when you are at the end of the data if it is
input)
• You close the file
• In Python the syntax is
infile.close()
Works for input or output files
Note: no arguments but you MUST have () !!
Otherwise the function is not actually called!
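The open, process, close pattern can be sketched like this. The set-up lines at the top exist only so the demo has an input file to read; the folder is a temporary one and the processing step (upper-casing the text) is just a stand-in.

```python
import os
import tempfile

# Set-up for the demo only: make a small input file to work with.
folder = tempfile.mkdtemp()
in_path = os.path.join(folder, "xyz.txt")
out_path = os.path.join(folder, "mydata.txt")
setup = open(in_path, "w")
setup.write("alpha\nbeta\n")
setup.close()

# 1. Open the files
infile = open(in_path, "r")
outfile = open(out_path, "w")

# 2. Process the data (here: copy it in upper case)
text = infile.read()
outfile.write(text.upper())

# 3. Close the files. The parentheses are required!
infile.close()
outfile.close()

print(infile.closed, outfile.closed)   # True True
```

Writing infile.close with no parentheses is legal Python but only names the method; the file would stay open.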
Files in Python
Input techniques
Input from a file
• The type of data you will get from a file is always
a string or a list of strings.
• There are two ways of reading that I call “bulk
reads” because with one statement they totally
exhaust the file. There is no more to read after
that!
• The other two ways read a line at a time from the
file
• Files are objects so most of these will be methods
called with the dot notation as usual
read()
• The read method is called like this
datastr = infile.read()
• What does it do? It reads the entire file of data into
one string variable
• The newlines and other whitespace in the file are
stored in the string like every other character
• Be aware if you are reading a LARGE file, this may take
some time and a lot of RAM!
• This is convenient if you do not care particularly where
the newlines are in the file
• This is a BULK read
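A small sketch of read(), assuming a made-up file numbers.txt that holds two lines of numbers. It also shows that a second read() finds nothing left.

```python
import os
import tempfile

# Set-up for the demo only: a hypothetical data file.
path = os.path.join(tempfile.mkdtemp(), "numbers.txt")
setup = open(path, "w")
setup.write("10 20\n30 40\n")
setup.close()

infile = open(path, "r")
datastr = infile.read()    # the whole file as ONE string
leftover = infile.read()   # the file is exhausted, so this is ""
infile.close()

print(repr(datastr))       # '10 20\n30 40\n'
print(repr(leftover))      # ''

# split() with no argument splits on any whitespace, newlines included,
# which is handy when the line breaks do not matter.
numbers = datastr.split()
print(numbers)             # ['10', '20', '30', '40']
```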
readlines()
• The syntax: datalst = infile.readlines()
• This method reads in ALL the data from the file
and uses the \n as a delimiter to break the data
into strings in a list
• There is nothing more to read in the file after you
execute one readlines call.
• This is convenient if you know the data in the file
is organized by lines, i.e. each line needs to be
processed by itself
• This is a BULK read
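A small sketch of readlines(), assuming a made-up file scores.txt with one name and one score per line. Each list entry is then processed by itself.

```python
import os
import tempfile

# Set-up for the demo only: a hypothetical data file.
path = os.path.join(tempfile.mkdtemp(), "scores.txt")
setup = open(path, "w")
setup.write("ann 90\nbob 85\n")
setup.close()

infile = open(path, "r")
datalst = infile.readlines()   # every line, each as its own string
infile.close()

print(datalst)                 # ['ann 90\n', 'bob 85\n']

total = 0
for line in datalst:
    name, score = line.split()   # split() also drops the trailing \n
    total = total + int(score)
print(total)                   # 175
```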
readline()
• Note that this is a different method from readlines – note the
s!
• syntax: datastr = infile.readline()
• Semantics: it reads in the next line of data from the file, up to
the next newline
• Returns a string which has the data and a \n character at the
end
• Useful when you don’t want to read in ALL the data at one
time, or when you have more data than RAM space to hold it
• Usually used inside a while loop
• Indicates the end of the data in the file by returning an empty
string. Note that this is different from having an empty or
blank line in the file – that is returned as “\n”
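The usual while loop around readline() might look like the sketch below. The file three.txt is made up for the demo and contains a blank line in the middle, to show that a blank line comes back as "\n" while the end of the file comes back as "".

```python
import os
import tempfile

# Set-up for the demo only: a hypothetical file with a blank second line.
path = os.path.join(tempfile.mkdtemp(), "three.txt")
setup = open(path, "w")
setup.write("first\n\nthird\n")
setup.close()

lines_seen = []
infile = open(path, "r")
line = infile.readline()        # read the first line before the loop
while line != "":               # "" means the end of the file
    lines_seen.append(line)
    line = infile.readline()    # read the next line at the bottom of the loop
infile.close()

print(lines_seen)               # ['first\n', '\n', 'third\n']
```

The blank line does not stop the loop, because "\n" is not the empty string.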
Files in Python
Caution about readlines vs. read and
split
You would think that
lines = infile.readlines()
and
line = infile.read()
lines = line.split('\n')
would give the same result in the variable lines,
that is, a list of strings from the file, delimited by
the newline characters.
You would be surprised!
• readlines() gives you a list of strings, each with a
\n at the end
• Except! if you did not press Enter on the last line
of the data file, the last string in the list will not
have a \n in it
• read() followed by split('\n') gives a list of strings,
yes, but none of them will have \n in them
(remember split removes the delimiters from its
results)
And another surprise!
• If you did press Enter on the last line of the data
file, readlines still works properly. The last string
in the list will have a \n character just like all the
others
• BUT the same file read with the read/split
combination will have one extra entry, an empty
string at the end of the list
• This is something you need to be aware of while
processing your data – many programs crash
because they assume that every string will be the
same length, for example.
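Both surprises can be reproduced with a short sketch. The helper both_ways is a name invented for this demo; it writes the given text to a temporary file and reads it back with each technique.

```python
import os
import tempfile

def both_ways(text):
    """Write text to a temporary file, then read it back both ways."""
    path = os.path.join(tempfile.mkdtemp(), "data.txt")
    outfile = open(path, "w")
    outfile.write(text)
    outfile.close()

    infile = open(path, "r")
    via_readlines = infile.readlines()
    infile.close()

    infile = open(path, "r")
    via_split = infile.read().split("\n")
    infile.close()
    return via_readlines, via_split

no_enter = both_ways("a\nb")       # Enter NOT pressed after the last line
with_enter = both_ways("a\nb\n")   # Enter pressed after the last line

print(no_enter)     # (['a\n', 'b'], ['a', 'b'])
print(with_enter)   # (['a\n', 'b\n'], ['a', 'b', ''])
```

One common way to sidestep the difference, offered here as a suggestion rather than part of the original material, is the string method splitlines(), which drops the newlines and does not produce the extra empty string.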
Files in Python
Output techniques
Outputting to a file
• There are two ways to do this in Python
– print (more familiar, more flexible)
– write (more restrictive)
Using print to output to a file
• You add one argument to the print function call.
At the end of the argument list, put file=
followed by the name of the file object you have
opened for output
• Example: print("hi", a, c*23, end="", file=outfile)
• You can use anything in this print that you would
in printing to the screen: end=, sep=, escaped
characters, etc.
• end= and sep= keep their usual defaults, so every
print ends with a newline unless you give end= a
different value
• Note it says file=outfile, NOT file="abc.txt"
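A short sketch of print with file=. The file name printed.txt and the values of a and c are made up for the demo; the file is read back at the end only to show what was written.

```python
import os
import tempfile

# Hypothetical output file and values for the demo.
path = os.path.join(tempfile.mkdtemp(), "printed.txt")
a = 7
c = 2

outfile = open(path, "w")
print("hi", a, c * 23, file=outfile)             # default sep=" " and end="\n"
print("x", "y", sep=",", end="", file=outfile)   # no newline after this one
print("z", file=outfile)                         # continues the same line
outfile.close()

infile = open(path, "r")
result = infile.read()
infile.close()
print(repr(result))   # 'hi 7 46\nx,yz\n'
```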
Using write to output to a file
• write is a method, like the methods of the Text
object in the graphics package
• it is called by the output file object (dot notation)
• It is allowed ONE and only one STRING argument,
so you have to convert numbers to strings and
concatenate strings together to make one
argument
• Example: outfile.write("hi"+str(ct)+"\n")
• Does NOT output a newline automatically, if you
want one, you have to put one in the string
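A short sketch of write. The file name written.txt and the value of ct are made up for the demo. It shows that a number must be converted with str(), and that a newline appears only where you put one.

```python
import os
import tempfile

# Hypothetical output file and counter for the demo.
path = os.path.join(tempfile.mkdtemp(), "written.txt")
ct = 3

outfile = open(path, "w")
outfile.write("hi" + str(ct) + "\n")   # one string argument, newline included
outfile.write("no newline here")
outfile.write(" so this continues the same line\n")

# Passing a number instead of a string is an error.
try:
    outfile.write(ct)
    number_accepted = True
except TypeError:
    number_accepted = False
outfile.close()

infile = open(path, "r")
result = infile.read()
infile.close()
print(repr(result))
# 'hi3\nno newline here so this continues the same line\n'
```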
Files in Python
When does it crash?
How a file can make a program crash
• For input files, there are several things that can
happen which can cause a program to crash
• Some are avoidable with some care, some are
not
– the file does not exist that you are trying to open
– trying to read past the end of the file (the read
returns an empty string, which the program may
not expect)
– the data in the file is not laid out as the program
expects
– the file exists but is empty
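Some of these situations can be guarded against. The sketch below is one possible approach, not the only one; the file names are made up, and try/except is used here even though it is not covered in these slides.

```python
import os
import tempfile

folder = tempfile.mkdtemp()

# 1. The file does not exist: open raises FileNotFoundError.
missing = os.path.join(folder, "no_such_file.txt")
try:
    infile = open(missing, "r")
    infile.close()
    opened = True
except FileNotFoundError:
    opened = False
    print("Cannot find", missing)

# 2. The file exists but is empty: the very first readline() returns "".
empty_path = os.path.join(folder, "empty.txt")
setup = open(empty_path, "w")
setup.close()
infile = open(empty_path, "r")
first = infile.readline()
infile.close()
if first == "":
    print("The file is empty, nothing to process")

# 3. The data is not laid out as expected: converting it fails.
try:
    value = int("12x")   # stands in for a badly formed line of data
except ValueError:
    value = None
    print("Bad data, expected a whole number")
```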
Output files
• An output file is constructive and destructive
– If the file you are opening to write to does NOT exist,
it is created
• Note that if you gave the path to the folder as part of the file
name, the open will NOT create folders!
• In other words,
outfile = open("c:\\My Documents\\cs115\\file1.txt", "w")
will only work if the path already exists and you have
permission to write to it
– If the file you are opening to write to DOES exist
already, all data is destroyed
• Opening for writing tells the OS to set the length of the file to zero bytes!
• If you try to write to a medium that is full, your
program will crash
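Both the destructive side and the missing-folder case can be seen in a small sketch. The folder and file names are made up for the demo and live in a temporary folder.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, "file1.txt")

# The file does not exist yet, so opening it for writing creates it.
outfile = open(path, "w")
outfile.write("important data\n")
outfile.close()
size_first = os.path.getsize(path)

# Opening the SAME file with "w" again destroys the old data,
# even though nothing new is written.
outfile = open(path, "w")
outfile.close()
size_after_reopen = os.path.getsize(path)
print(size_first > 0, size_after_reopen)   # True 0

# open will NOT create a missing folder.
bad_path = os.path.join(folder, "no_such_folder", "file1.txt")
try:
    outfile = open(bad_path, "w")
    outfile.close()
    created = True
except FileNotFoundError:
    created = False
print(created)   # False
```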