Transaction Files





Store details of transactions during a period
Period can be day/month/week
Transient
Once processed can be discarded
Master Files


"Permanent"
Kept up to date by applying transaction files
o Unchanging data, e.g. in a payroll file: name, address.
o Changing data, e.g. gross pay to date.
Types of File Organisation
Choice of file depends upon intended use:


what proportion of records to be processed each time file is
updated;
whether individual records need to be quickly accessible.
There are four main types of file organisation:
Serial
Sequential
Random
Indexed sequential
Serial




Records are stored one after another
Records stored in any order (e.g. as they're entered)
To find a record must read each record in turn from start of file
New records added at end of file
How do you think a record could be deleted from a serial file?
Copy to new file skipping record to be deleted, delete original file and
rename new file!
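The copy-and-rename approach can be sketched in Python as follows. This is
a minimal illustration, not a definitive implementation: it assumes a text
file with one record per line and the key as the first comma-separated
field, and the function name delete_record is made up for the example.

```python
import os
import tempfile

def delete_record(path, key):
    """Remove the record whose first comma-separated field equals `key`.

    Assumed layout: one record per line, key first, e.g. "3,Carol".
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    with os.fdopen(fd, "w") as new_file, open(path) as old_file:
        for line in old_file:
            # Copy every record except the one to be deleted.
            if line.rstrip("\n").split(",", 1)[0] != key:
                new_file.write(line)
    # Replace the original file with the new one (delete + rename).
    os.replace(tmp_path, path)
```

Every record has to be read and copied, even though only one is removed.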
Sequential



Records are stored one after another
Records are sorted into primary key sequence
To find a record must read each record in turn from start of file
A sequential file is particularly suitable when all records in a file need
to be processed (e.g. payroll file).
How do you think a record is added to a sequential file?
Copy to new file until the insertion point is reached, write the record to
be inserted, copy rest of file then delete original file and rename new
file!
How do you think a record is deleted from a sequential file?
The same way as a serial file!
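Insertion into a sequential file might look like the sketch below. It
assumes one record per line, each line ending in a newline, with an
integer primary key as the first comma-separated field; the name
insert_record is hypothetical.

```python
import os
import tempfile

def insert_record(path, key, data):
    """Insert "key,data" into a file kept in ascending integer key order."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    inserted = False
    with os.fdopen(fd, "w") as new_file, open(path) as old_file:
        for line in old_file:
            current_key = int(line.split(",", 1)[0])
            if not inserted and current_key > key:
                # Insertion point reached: write the new record first.
                new_file.write(f"{key},{data}\n")
                inserted = True
            new_file.write(line)  # copy the existing record
        if not inserted:
            # The new key is larger than every existing key.
            new_file.write(f"{key},{data}\n")
    os.replace(tmp_path, path)  # delete original and rename new file
```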
Random
Also known as Hash or Direct files.


Records are located by disc address or relative position in file
An algorithm is used to convert the primary key to the address
(the Hash function)
An example of a hash function is the division/remainder method. The
primary key is divided by the number of addresses in the file, and the
remainder of the division is the address of the record.
A problem with hashing is that more than one primary key may map to
the same address. This is called a collision (aka aliasing or a
synonym). A random file has to have a method of dealing with
collisions.
A good hash function should be designed so as:



to minimise collisions;
to be fast to calculate;
to generate any of the available addresses.
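The division/remainder method and one possible way of handling collisions
can be sketched as below. The notes do not say which collision method is
used; this sketch assumes the simplest one (try the next address along,
often called linear probing). The file is modelled as a Python list, and
the file size of 7 addresses is chosen only for illustration.

```python
NUM_ADDRESSES = 7  # hypothetical number of addresses in the file

def hash_address(primary_key, num_addresses=NUM_ADDRESSES):
    """Division/remainder method: the remainder is the record's address."""
    return primary_key % num_addresses

def store(table, primary_key, record):
    """Store a record at its hashed address, moving along on a collision."""
    address = hash_address(primary_key, len(table))
    for step in range(len(table)):
        slot = (address + step) % len(table)
        if table[slot] is None or table[slot][0] == primary_key:
            table[slot] = (primary_key, record)
            return slot
    raise RuntimeError("file is full")

def fetch(table, primary_key):
    """Look at the hashed address first, then follow the same probe path."""
    address = hash_address(primary_key, len(table))
    for step in range(len(table)):
        slot = (address + step) % len(table)
        if table[slot] is None:
            return None  # an empty slot means the key was never stored
        if table[slot][0] == primary_key:
            return table[slot][1]
    return None
```

For example, with 7 addresses the keys 15 and 22 both give remainder 1, so
they collide and the second record is placed at the next free address.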
Indexed Sequential
An index records the highest primary key stored in each block of
records. Within each block, the records are stored in sequence of
primary key. Space is provided in each block so that more records can
be added in the correct sequence.
An overflow area (usually at the end of the file) is used for records
which will not fit into the correct block. A pointer to the location of the
record in the overflow area is left in the block.
Because the records are indexed, but also stored sequentially, they
can be accessed either randomly or sequentially. The file combines the
advantages of random and sequential files.
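A rough in-memory sketch of the idea is given below. The blocks, keys and
names are invented for illustration, and the overflow area is left out to
keep the example short.

```python
from bisect import bisect_left

# Each block holds records in primary key order (toy data).
blocks = [
    [(3, "Ann"), (8, "Bob"), (12, "Cy")],
    [(15, "Di"), (21, "Ed")],
    [(30, "Flo"), (42, "Gus"), (57, "Hal")],
]

# The index records the highest primary key stored in each block.
index = [block[-1][0] for block in blocks]

def find(key):
    """Random access: use the index to pick one block, then search it."""
    block_no = bisect_left(index, key)  # first block whose highest key >= key
    if block_no == len(blocks):
        return None  # key is larger than every key in the file
    for record_key, data in blocks[block_no]:
        if record_key == key:
            return data
    return None

def read_all():
    """Sequential access: read the blocks in order."""
    return [record for block in blocks for record in block]
```

Only one block is searched for a single record, yet the whole file can
still be read in key order.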
File Roles



Master: contains permanent data, some of which is updated
regularly by transaction files.
Transaction: contains details of changes which occur during a
transaction period.
Reference: contains data updated infrequently, often from an
outside source.
Updating Files
A master file is updated by a transaction file. Each record to be
updated is read into memory, updated and then written back to the file.
A sequential master file cannot be updated in the same location. It
must be read, modified and written to a new file as described
previously. When a sequential master file is updated, the transaction
files are also sorted into sequence so that only one pass through the
files is needed. The steps are:
1. A record is read from the master file.
2. A record is read from the transaction file.
3. The primary keys are compared. If the transaction file key is
greater, the current master record does not need to be updated;
it is written unchanged to the new master file and the next
record is read from the master file. This step is then repeated.
4. If the keys are equal, the record is updated from the transaction
record and then written to the new master file. The operation
then repeats from step 1 until there are no further records in the
master file.
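The steps above can be sketched in Python using lists in place of files.
This is only an illustration under stated assumptions: both lists are
sorted by key, every transaction key exists in the master file, and the
"update" is simply adding an amount to a running total (as with gross pay
to date). The function name update_master is hypothetical.

```python
def update_master(master, transactions):
    """Apply sorted transactions to a sorted master, giving a new master.

    master:       list of (key, total) in key order
    transactions: list of (key, amount) in key order
    """
    new_master = []
    pending = iter(transactions)
    transaction = next(pending, None)
    for key, total in master:
        # Keys equal: update the record from the transaction record.
        while transaction is not None and transaction[0] == key:
            total += transaction[1]
            transaction = next(pending, None)
        # Transaction key greater (or none left): write the record as it is.
        new_master.append((key, total))
    return new_master
```

Because the result is built as a new list, the old master is left intact,
which is what makes keeping earlier generations possible.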
As each update by a transaction file produces a new master file, the
master file from before the update still exists, so there are two
versions. Usually two previous versions of the master file are kept as
backup, and these "generations" of master file are often known as
Grandfather (oldest), Father and Son (most recent).
If the master file is indexed sequential or random, it can be updated
without copying to a new master file. This is called update by overlay.
Criteria for Choice of File Organisation
The way a file is organised determines how it can be accessed.


A sequential file can only be accessed sequentially.
A random file can be accessed randomly. It could also be
accessed sequentially, but this would be unusual (and slower
than with a sequential file).

An indexed sequential file can be accessed either sequentially
or randomly.
The following factors should be considered when choosing file
organisation:






what response time is required;
must information be absolutely up to date;
can requests for information be batched and processed together;
is information required in sequence;
what is the most suitable storage medium;
what happens if data is lost?
Hit rate
This is the proportion of records accessed in any one pass through a
file. A high hit rate would suggest use of a sequential organisation;
a low hit rate would favour random organisation.
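As a small worked illustration (the figures are invented), hit rate can
be calculated like this:

```python
def hit_rate(records_accessed, total_records):
    """Proportion of the file's records accessed in one pass."""
    return records_accessed / total_records

# A payroll run that touches 950 of 1000 employee records.
payroll_run = hit_rate(950, 1000)

# An enquiry system that looks up 5 of 1000 records.
enquiries = hit_rate(5, 1000)
```

The first case has a high hit rate and the second a low one.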
Text Files
A text file consists of lines of alphanumeric characters. That's it! No
control codes (except carriage return and line feed), nothing else!
Examples of text files include:





Program source code
Simple text files
Script files
HTML source code
Configuration files
Non-text Files
Non-text files are often called binary files. They contain a sequence of
arbitrary codes. A binary file can contain anything (that can be
represented in binary code).
Examples of non-text files include:



Object code
Application specific data files
Image files
What sort of file do you think a Word ".doc" file is?
(A Non-Text file)
File Structure
A file can be considered as a collection of records. Each record
represents information about one object. E.g. consider a file being
used to store names and addresses, each name and associated
address would be one record.
A record can be further subdivided into fields. In the case of the
address book these may be Title, Forename etc.
Primary Key
Within a file, a record must have a unique identifier. This is usually
one field in the record which can be guaranteed to be unique. Where
the data in none of the fields can be guaranteed unique on its own, a
special field with a serial number is often added.
The unique field is called the primary key, and is required so that each
record can be located or selected unambiguously.
Secondary Key
Other fields in a record can be defined as secondary keys. These are
not unique, but may be used to quickly locate groups of records.
Record Types
The records in a file can be either fixed length or variable length.
Fixed Length Records


Number of fields is same in all records
Length of each field is same in all records
Advantages


file processing simple
easy to estimate file size
Disadvantages

inefficient use of storage space
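A sketch of fixed length records using Python's struct module is shown
below. The layout (a 4-byte key, a 20-byte name and a 30-byte address) is
an assumption made up for the example.

```python
import struct

# Hypothetical layout: 4-byte integer key, 20-byte name, 30-byte address.
RECORD = struct.Struct("<i20s30s")

def pack_record(key, name, address):
    # Short strings are padded with zero bytes; long ones are truncated.
    return RECORD.pack(key, name.encode("ascii"), address.encode("ascii"))

def unpack_record(raw):
    key, name, address = RECORD.unpack(raw)
    return (key,
            name.rstrip(b"\0").decode("ascii"),
            address.rstrip(b"\0").decode("ascii"))

def nth_record(data, n):
    """Record n starts at byte n * record size, so no scanning is needed."""
    start = n * RECORD.size
    return unpack_record(data[start:start + RECORD.size])
```

Every record takes 54 bytes here however short the name is, which shows
both the simple processing and the wasted space.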
Variable Length Records



Number of fields can vary from record to record
Length of each field can vary from record to record
End of each field and record needs to be indicated either by a
special character, or by a size stored at the start of each
field/record.
Advantages


compact, no waste of space
flexible, as many fields as needed
Disadvantages


file processing complex
hard to estimate file size
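The "size at the start" approach can be sketched as follows. The exact
format is an assumption for illustration: one byte for the number of
fields, then one length byte before each field, which limits each field
to 255 bytes.

```python
def encode_record(fields):
    """Encode a record as: field count, then (length byte + data) per field."""
    out = bytearray([len(fields)])
    for field in fields:
        raw = field.encode("utf-8")
        out.append(len(raw))  # size stored at the start of the field
        out += raw
    return bytes(out)

def decode_records(data):
    """Read records from the start; positions cannot be calculated directly."""
    records, pos = [], 0
    while pos < len(data):
        count = data[pos]
        pos += 1
        fields = []
        for _ in range(count):
            length = data[pos]
            pos += 1
            fields.append(data[pos:pos + length].decode("utf-8"))
            pos += length
        records.append(fields)
    return records
```

Records use only the space they need, but finding a given record means
stepping through all the records before it.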
File Size Estimation
In principle the size of a file of fixed size records can be estimated by
multiplying the expected number of records by the number of bytes in
each record. In practice it is not quite so simple.
Data is stored on disc or tape in blocks. Each block is of a fixed size
determined by the storage device. E.g. a disc drive may store data in
512 byte blocks.
When records are written to a storage device, only a whole number of
records can be put into each block of storage. Some space in each
block will be used for information about the size and number of
records, the rest will be wasted.
Therefore to estimate file size first calculate how many records fit in
each block. Then use this to calculate the number of blocks which will
be used on the storage device.
Example
A storage device has a block size of 512 bytes. A file of 1200 records
of size 108 bytes is to be stored. Estimate the file size.
1. Calculate number of records per block:
= 512 / 108
= 4 (rounded down, as only whole records fit in a block)
2. Calculate number of blocks:
= 1200 / 4
= 300
3. Multiply number of blocks by block size:
= 300 * 0.5 kB
= 150 kB
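The same calculation can be written as a short Python function. Like the
worked example, this sketch ignores any space in each block used for
information about the records; the function name is hypothetical.

```python
import math

def estimate_file_size(num_records, record_size, block_size):
    """Estimate file size in bytes for fixed length records stored in blocks."""
    # Only a whole number of records fits in each block.
    records_per_block = block_size // record_size
    # A partly filled final block still takes up a whole block.
    blocks = math.ceil(num_records / records_per_block)
    return blocks * block_size

size_bytes = estimate_file_size(1200, 108, 512)
size_kb = size_bytes / 1024
```

For the figures above this gives 153600 bytes, i.e. 150 kB.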