Data

Summary lecture notes on Data, Files and Records.
Data
These are raw facts and figures (using numeric, alphabetic and special character symbols)
about business activities, transactions, events and happenings. For example, data can be
recorded hours worked by an employee, invoices, receipts, orders, etc.
Data itself is meaningless unless manipulated and transformed (processed) into
meaningful forms (information).
Data processing stages
Data processing stages refer to the process of collecting data, from the point of activity up
to the time the data is transformed into information. Raw data is assumed to pass through 5
stages, as shown below.
1. DATA ORIGINATION
2. DATA PREPARATION (data verification, transcription)
3. DATA INPUT
4. DATA PROCESSING (software; 6 steps)
5. DATA OUTPUT
Operations listed in the figure:
i) classify
ii) sorting
iii) simplifying
iv) calculations
Sources of Data (origination)
Data can originate from the following sources
i) Business transactions: invoices, receipts, customer orders, etc.
ii) Census (counting of people and distribution of wealth)
iii) Observation (observing some happenings and recording the details)
iv) Experiments (observations and recordings about the progress of an experiment)
v) Market survey (sampling to test the reaction to a product or service being introduced)
Prepared by Sinkala Henry.
University of Lusaka
Verification
Data verification is the process of checking for errors in input data before it is carried
forward for processing. Data verification is carried out at the data preparation stage.
Verification is normally carried out by more than one person or machine. The versions of
the same data are compared character by character against one another. If there is any
difference, the machine halts (shows an error message) and allows the operator to make
corrections.
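The character-by-character comparison described above can be sketched in a few lines of Python. This is only an illustration: the function name verify and the sample invoice number are assumptions, not part of the notes.

```python
def verify(first_keying: str, second_keying: str) -> list[int]:
    """Compare two keyings of the same data; return the positions that differ."""
    mismatches = [
        position
        for position, (a, b) in enumerate(zip(first_keying, second_keying))
        if a != b
    ]
    # A difference in length means characters were omitted or added.
    shorter = min(len(first_keying), len(second_keying))
    longer = max(len(first_keying), len(second_keying))
    mismatches.extend(range(shorter, longer))
    return mismatches


first = "INV-10234"    # hypothetical value keyed by the first operator
second = "INV-10324"   # second operator switched the 2 and the 3
errors = verify(first, second)
if errors:
    print("Verification halted, check positions:", errors)
```

An empty list means the two keyings agree and the data can be carried forward.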
Types of Errors detected by verification
i. Transcription errors: characters are wrongly copied (the common one), e.g. the letter O
keyed for zero, or I for L.
ii. Transposition errors: characters are wrongly switched in order, e.g. 1234 keyed as 1324.
iii. Omission
iv. Double transposition
There are six (6) steps required to process a transaction:
i) data entry/capture
ii) validation
iii) data processing
iv) storage
v) output generation
vi) query support
Data entry/ capture
This is the collection of raw data from the outside world into an information system (it can
be done manually or by using data capture devices). Examples of data entry include:
- Entering hours worked from workers' timesheets in order to know how many hours each
person worked in that month
- Conducting a survey of customers' opinions and entering the data (in the form of a
questionnaire)
- Using a form on a website to collect visitors' opinions
- Entering students' records into an information system
Note the difference between capture and input.
Input simply means loading the acquired data into an information system, for example:
- Data keyed in from documents by keyboard operators in either off-line or on-line mode
- Directly typing the hours from the timecards/sheets into a spreadsheet
- Reading of source documents by automatic reading devices: MICR, OCR, OMR, bar code
reader, etc.
Other typical input devices include:
keyboards, mice, flatbed scanners, bar code readers, joysticks, digital data tablets (for
graphic drawing), electronic cash registers
Validation
Validation is ensuring that inputted data is of the right type and within reasonable limits.
Databases and spreadsheets can have validation rules built into data fields to reject
impossible entries.
Validation can include the checks listed below (see C.S. French 148).
A field is any item of data within a record. It is made up of a number of characters, e.g.
a name, a date, an amount, gender (sex), etc.
Types of validation checks
Size check. Fields are checked to ensure that they contain the right number of characters,
e.g. a customer number specified as 6 numeric characters. This means that the program will
validate for 6 numeric characters.
Presence check. Data items are checked to ensure that all the fields are present and have
been entered.
Range check (also called limit check). Numbers and codes are checked to ensure that they
are entered within the permissible range, e.g. an organization may decide to give a discount in
the range of 20% to 30%. This implies that no discount of less than 20% or more than 30% will
be accepted.
Character check. Fields are checked to ensure that they contain only characters of the correct
type, e.g. no letters in numeric fields.
Format check. Fields are checked to ensure that the correct format is followed, with letters
and numbers in the correct order, e.g. a date could be in either British or American format.
Reasonableness check. Product quantities are checked to ensure that they are not abnormally
high or low (the amount of goods ordered is reasonable).
Check digit verification (e.g. to validate a credit card number). A check digit is a means of
using an arithmetical relationship between the last digit of a number and the other digits of
the number. The check digit guards against transposition errors. When a number is input
into the computer, the validation program performs the same calculation on the number as was
performed when the check digit was generated in the first place.
Consistency check. Numeric fields contain numbers only; certain characters may be disallowed,
e.g. ?, *, $ in a name field.
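Some of the checks above can be sketched in Python as follows. The function names are illustrative assumptions. The notes do not say which check digit calculation is meant; the sketch uses the Luhn algorithm, which is the one commonly applied to credit card numbers.

```python
def size_check(value: str, length: int) -> bool:
    """Size check: the field has exactly the specified number of characters."""
    return len(value) == length


def presence_check(record: dict, required_fields: list[str]) -> bool:
    """Presence check: every required field exists and is not blank."""
    return all(str(record.get(name, "")).strip() != "" for name in required_fields)


def range_check(value: float, low: float, high: float) -> bool:
    """Range (limit) check: the value lies within the permitted limits."""
    return low <= value <= high


def character_check(value: str) -> bool:
    """Character check for a numeric field: digits 0-9 only."""
    return value.isascii() and value.isdigit()


def luhn_check(number: str) -> bool:
    """Check digit verification using the Luhn algorithm (an assumed choice)."""
    if not character_check(number):
        return False
    total = 0
    # Work from the right; double every second digit, subtracting 9 if over 9.
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(size_check("123456", 6))        # customer number of 6 characters
print(range_check(35, 20, 30))        # discount outside the 20% to 30% range
print(luhn_check("79927398713"))      # well-known valid Luhn test number
print(luhn_check("79927398731"))      # last two digits transposed
```

The last call shows the check digit catching a transposition: swapping two adjacent digits changes the calculated total, so the number is rejected.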
Processing (manipulation) stage
At this stage DATA is converted into INFORMATION.
i) Information can be described as processed raw facts and figures which have been
transformed into meaningful forms.
ii) Information is processed data. It is meaningful and allows an organization to
make decisions and solve problems, for example survey data being converted into graphs,
or wages being calculated from hours worked.
Typical processing software includes word processors, spreadsheets, databases, payroll
systems, etc.
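As a small illustration of data becoming information, the wages example can be sketched in Python. The employee numbers, hours and hourly rate below are made-up values for illustration only.

```python
# Raw data: hours worked per employee (hypothetical figures).
hours_worked = {"E001": 160, "E002": 172.5}
HOURLY_RATE = 50.0  # assumed flat rate, for illustration only


def calculate_wages(hours: dict, rate: float) -> dict:
    """Process raw hours into wages, the information a payroll clerk needs."""
    return {employee: worked * rate for employee, worked in hours.items()}


wages = calculate_wages(hours_worked, HOURLY_RATE)
print(wages)  # {'E001': 8000.0, 'E002': 8625.0}
```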
Storage
After data has been processed, it is vital that it is stored.
Typical storage devices (magnetic and optical) include:
- Hard disk (fast access, big capacity)
- Floppy disk (slow access, low capacity)
- Flash disk
- Tapes (QIC, quarter-inch cartridge tape)
- Optical discs
With storage, the main issues are speed, reliability and capacity.
Output
Output is the final generation of information to the outside world. Output generation is
facilitated by output devices, which include:
- CRT (cathode ray tube) monitors and LCD displays
- printers
- sound cards/speakers
- plotters
Query
At this stage information is queried for further verification or for conformity (agreeing
with certain accepted standards or norms).
General Qualities of good information:
For information to be useful, it must possess several characteristics (attributes). These
attributes add value or increase the potential of information.
i) Accurate
ii) Meaningful
iii) Reliable
iv) Easy to use
v) User targeted
vi) Relevant
vii) Timely
viii) Complete
ix) Error free
x) Verifiable
a. Accurate: for information to be useful, it must be exact and precise.
This helps the decision makers to make accurate decisions.
b. Meaningful: good information must be meaningful and easy to
understand, and must not be misleading.
c. Reliable: good information must accurately represent the events or
activities of an organization. In addition, it must be easy to follow and
retrievable.
d. Easy to use (understandable): good information must be user friendly.
Information must not be complicated, because it is of little use if
people do not really understand it.
e. User targeted: information must be specific, brief and aimed at its user.
f. Timely: good information should be provided in good time and received at
the right time.
g. Relevant: good information should make a difference to the decision
maker by reducing uncertainty or by adding knowledge or value.
h. Complete: it must include all relevant data or aspects of the data that are
required.
i. Error free: data sets must contain a minimum number of errors.
j. Verifiable: information generated by the same process must be the same
or within reasonable variance.
Task: value of information
Role of information in business
Data organization
Data organization refers to the way we store and arrange data in order to make storage,
manipulation and retrieval of information easy. Information is organized in units
according to the data hierarchy structure shown below.
FILE
    RECORDS
        FIELDS
            CHARACTERS
(A file contains a number of records, each record a number of fields, and each field a
number of characters.)
A file consists of a number of records. A file holds data that is required for providing
information. Some files are processed at regular intervals to provide information (e.g. a
payroll file) and others hold data that is required at regular intervals (e.g. a file of
item prices).
There are two common ways of viewing files:
a) Logical file. A logical file is a file viewed in terms of what data items its records
contain and what processing operations may be performed upon the file (i.e. entities and
attributes).
i) Entity: entities are real-world things (objects, people, events, etc.) about which there is
a need to record data (e.g. an item of stock, an employee, a transaction). Entities can be
tangible or intangible.
ii) Attribute: attributes are the individual properties/characteristics that describe and
identify an entity, e.g. the attributes of an invoice (name, address, customer number,
quantity, price, description).
b) Physical file. A physical file is a file viewed in terms of how the data is stored on a
storage device and how the processing operations are made possible (i.e. physical
records, fields and characters, a physical feature).
A character is the smallest element in a file and it can be alphabetic, numeric or special.
A field is any item of data within a record. It is made up of a number of characters, e.g.
a name, a date, an amount, gender (sex), etc.
A record is a collection of data pertaining to one entity. A record is made up of a number
of related fields, e.g. a customer's record or an employee's payroll record. For example, a
bank's customer file may contain one record per customer holding the account number,
branch number, name, address, phone number and current balance. A record can be recognized
or identified by the record KEY field.
A key field is a data element or field used to identify each record on a file. It is unique
and must not be duplicated. Examples include a bank account number, an
employee ID number, a student's ID number, an invoice number, etc.
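The relationship between file, record, field and key field can be sketched in Python. This is a minimal illustration: the class CustomerRecord, its fields and the function build_file are assumptions modelled on the bank example above, not a prescribed layout.

```python
from dataclasses import dataclass


@dataclass
class CustomerRecord:
    """One record: a set of related fields about a single entity."""
    account_number: str   # key field, must be unique within the file
    branch_number: str
    name: str
    balance: float


def build_file(records: list) -> dict:
    """Build a file of records indexed by key field, rejecting duplicate keys."""
    customer_file = {}
    for record in records:
        if record.account_number in customer_file:
            raise ValueError(f"duplicate key: {record.account_number}")
        customer_file[record.account_number] = record
    return customer_file


customer_file = build_file([
    CustomerRecord("1001", "B01", "A. Banda", 250.0),   # hypothetical data
    CustomerRecord("1002", "B01", "C. Phiri", 90.5),
])
print(customer_file["1002"].name)  # the key field identifies exactly one record
```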
Types of files
i) Master file
ii) Transaction file (movement file)
iii) Reference file
Other file types:
Archive file, Back-up file, Program file, Data file, Work file, Scratch file
i) Master files always contain data. They contain up-to-date information on a set of similar
entities. These files are fairly permanent in nature, e.g. an employee file, a customer
ledger, a payroll file. They are updated regularly to reflect what is happening in an
organization.
ii) Transaction (movement) files contain data that record events. Records in a transaction
file are placed in time order and are processed to update the related master file
records. An incoming delivery file is a transaction file and might be used to update the
company's master stock file. Similarly, a sales ledger will be updated from all the orders
received.
iii) Reference file is a file with a reasonable amount of permanency. It holds data used for
reference purposes, for example price lists, tables of rates of pay, VAT rates, names and
addresses of customers/business partners, etc. A temporary file, by contrast, is deleted
when the processing is complete.
Grand father-father-son method of updating
The method used to update a master file depends on its organization. In order to
update a sequential master file, the transactions must first be sorted into the same
sequence as the master file. A record from each file is then read and the record keys
compared. If the master record has no matching transaction, it is copied across unchanged
to the new master file and another master record is read into memory, until a match is
found. The master record can then be updated in memory and another transaction record
read. If it is for the same record, the master record in memory is updated again, and so on
until a transaction for a different record is read. Then the updated master record can be
written to the new file and another one read into memory and compared with the current
transaction.
This method of updating is known as the grandfather-father-son method, with a new
file being created each time an update is carried out. Normally at least three generations
of the master file are kept for back-up purposes. If the latest version of the master file is
corrupted or destroyed, it can be recreated by re-running the previous update, using the old
master file and the matching transactions.
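The update procedure above can be sketched in Python. This is a simplified illustration: each record is assumed to be a (key, amount) pair, both files are assumed to be sorted on the key already, and the function name update_master is made up for the example.

```python
def update_master(old_master: list, transactions: list) -> list:
    """Create a new master file from the old master and a sorted transaction file.

    The old master is left untouched, so it remains available as a back-up
    generation.
    """
    new_master = []
    t = 0  # position of the current transaction record
    for key, balance in old_master:
        # Apply every transaction whose key matches this master record.
        while t < len(transactions) and transactions[t][0] == key:
            balance += transactions[t][1]
            t += 1
        # With no match the record is copied unchanged; otherwise it is
        # written out after all its transactions have been applied.
        new_master.append((key, balance))
    if t < len(transactions):
        raise ValueError(f"no master record for key {transactions[t][0]}")
    return new_master


father = [(101, 50), (102, 20), (103, 0)]          # hypothetical old master
movements = [(101, 5), (101, -10), (103, 7)]       # sorted transaction file
son = update_master(father, movements)
print(son)     # [(101, 45), (102, 20), (103, 7)]
print(father)  # unchanged, kept as the previous generation
```

If the new file is lost, calling update_master again with the same old master and transactions recreates it, which is the point of keeping earlier generations.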
[Figure: grandfather-father-son updating]
Day 1: Transaction file + Old master file, Tape A (1st generation) -> Update -> New master
file, Tape B (2nd generation).
After the update on day 1, Tape A is the 'father' tape and Tape B is the 'son' tape.
Day 2: Old master file, Tape B (2nd generation) -> Update -> New master file, Tape C
(3rd generation).
After the update on day 2, Tape A is the 'grandfather' tape, Tape B is the 'father' tape and
Tape C is the 'son' tape.
File access
File access is the process of getting to a file by searching, locating and retrieving a
particular record from a file.
File access depends on the following:
a) Storage type – ( DASD or SASD)
b) The organization method (i.e file organization)
1. Direct Access Storage Device (DASD)
This type of device allows any particular item of data on the disk to be read directly,
without having to read all the rest of the recorded data. These storage devices are
based on spinning disks upon which data is recorded. Within each category of direct
access device, there is a great number of variations in design, performance, capacity
and cost.
These devices include:
-Magnetic disk drives
-Optical storage
A) Magnetic disk drives: this is the most common form of disk-based storage. All PCs rely
on this form. Two of the basic types of magnetic disk are:
i) Hard disk drives, also called Winchester drives
ii) Floppy disk drives: limited storage capacity (floppy 1.44 MB, ZIP disk 100-250 MB);
no longer in use
B) Optical drives:
Optical drives use storage techniques based on light instead of relying on the principles of
magnetism. They use reflected light to read data, based on the Compact Disc. Tiny pits
are burned or pressed into the thin coating of metal or other material deposited on a disc.
The pit patterns represent the stream of digital data used to encode images and
sounds. Optical discs can store audio, video, text and program instructions.
They include CDs and DVDs:
- CD-ROM (Compact Disc - Read-Only Memory)
- CD-R (Compact Disc - Recordable; data cannot be overwritten once it is recorded)
- CD-RW (Compact Disc - Rewritable; data can be overwritten time and again)
CDs have a maximum of 700 MB of data storage, or 80 minutes of audio play.
DVD (Digital Versatile Disc)
DVD discs store much more data than CD discs because both sides can be used,
along with sophisticated data compression technologies.
Standard DVD discs store from 4.7 GB to 9.4 GB of data.
However, with the advent of multimedia files with video, graphics and sound, new
optical discs are being developed, e.g. Blu-ray. Blu-ray discs can have a maximum
storage capacity of 50 GB (TDK) or 25 GB (Sony, Philips, Fujifilm, etc.).
Blu-ray Disc (also known as Blu-ray or BD) is an optical disc storage media format. Its
main uses are high-definition video and data storage. The disc has the same dimensions
as a standard DVD or CD.
The name Blu-ray Disc is derived from the blue laser (violet coloured) used to read and
write this type of disc. Because of its shorter wavelength (405 nm), substantially more
data can be stored on a Blu-ray Disc than on the DVD format, which uses a red (650 nm)
laser. A dual-layer Blu-ray Disc can store 50 GB, almost six times the capacity of a
dual-layer DVD.
Blu-ray Disc was developed by the Blu-ray Disc Association, a group of companies
representing consumer electronics, computer hardware, and motion picture production.
As of July 2, 2008, more than 650 Blu-ray Disc films have been commercially released
in the United States and more than 410 Blu-ray Disc titles have been released in
Japan.
During the high definition optical disc format war, Blu-ray Disc competed with the HD
DVD format. On February 19, 2008, Toshiba — the main company supporting HD
DVD — announced it would no longer develop, manufacture and market HD DVD
players and recorders, leading almost all other HD DVD supporters to follow suit,
effectively ending the format war.
2. Serial Access Storage Devices.
Serial access devices are those in which a particular item of data can only be read after
reading all the intervening items of data. The only important serial access storage device
in current use is magnetic tape. There are three basic types of magnetic tape device.
However, because of the huge demand for storage capacity, they are no longer used in most
organizations except in special cases.
- Reel-to-reel tape device
- Cartridge tape device (e.g. an 8 mm tape 112 m long can store up to 5 GB of data)
- Digital audio tape device
File Organization
File organization is a physical placement or arrangement of records within the file. The
way in which files are stored has a direct bearing on how quickly the data contained can
be accessed. Records on a file can be ordered or unordered.
File organization depends upon:
- The way in which the file is going to be used
- The number of records to be processed each time the file is updated
- Whether individual records need to be accessed quickly or not
- The type of storage device chosen
Ways in which files can be organized.
1. Serial File Organisation
There is no sequence or order to records that are stored in a serial file. They are stored in
the order they are received and new records are added at the end of the file.
In order to access a record in a serial file, the whole file has to be read from the beginning
until the desired record is located.
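A serial file can be sketched in Python as a simple list. The record layout (a dictionary with a "key" entry) and the function names are assumptions made for illustration.

```python
def serial_add(serial_file: list, record: dict) -> None:
    """New records are simply added at the end, in the order received."""
    serial_file.append(record)


def serial_search(serial_file: list, key):
    """Read from the beginning until the record is found.

    Returns the record (or None) and the number of records read.
    """
    reads = 0
    for record in serial_file:
        reads += 1
        if record["key"] == key:
            return record, reads
    return None, reads


orders = []
for key in (30, 10, 20):                 # arrival order, not key order
    serial_add(orders, {"key": key})

print(serial_search(orders, 20))         # found only after reading 3 records
print(serial_search(orders, 99))         # whole file read, nothing found
```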
2. Sequential File Organisation.
Sequential files are organized in such a way that the records are stored according to the
order of the values of a chosen record attribute (field). The records can be in ascending or
descending order, based upon the attribute value. E.g. records in your bank's customer
file may be stored in ascending order of customer account number, or a file of names may be
arranged in alphabetical order.
The field used for sequencing the records is mainly the primary key of the record (such as
the customer account number) or a combination of other fields.
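One practical consequence of key order can be sketched in Python: a search for a missing record can stop as soon as a higher key is read, instead of reading to the end of the file. The record layout and function name are illustrative assumptions, and ascending order is assumed.

```python
def sequential_search(sequential_file: list, key):
    """Search a file held in ascending key order.

    Returns the record (or None) and the number of records read.
    """
    reads = 0
    for record in sequential_file:
        reads += 1
        if record["key"] == key:
            return record, reads
        if record["key"] > key:
            break  # keys only get larger from here, so the record is absent
    return None, reads


accounts = sorted(
    [{"key": 40}, {"key": 10}, {"key": 30}, {"key": 20}],   # hypothetical records
    key=lambda record: record["key"],
)
print(sequential_search(accounts, 20))   # found after 2 reads
print(sequential_search(accounts, 25))   # abandoned after 3 reads, not 4
```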
3. Indexed Sequential File Organisation
As with sequential files, indexed sequential file records are stored in a sequence that
reflects the value in the key fields of each record. The file contains an index with pointers
to certain data records in the file. The index helps in locating a record in the data file.
Indexed sequential files are used where individual records need to be accessed very
quickly without having to start searching from the beginning. They reside on direct
access storage devices.
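The role of the index can be sketched in Python. In this illustration the file is assumed to be split into blocks of records in key order, and the index is assumed to hold the highest key in each block; real indexed sequential files may be laid out differently.

```python
import bisect


def build_index(blocks: list) -> list:
    """Index entry for each block: the highest key stored in that block."""
    return [block[-1][0] for block in blocks]


def indexed_lookup(blocks: list, index: list, key):
    """Use the index to jump to the one block that could hold the key."""
    position = bisect.bisect_left(index, key)  # first block whose top key >= key
    if position == len(blocks):
        return None                            # key is beyond the last block
    for record_key, data in blocks[position]:  # short search inside one block
        if record_key == key:
            return data
    return None


# Hypothetical file: three blocks of (key, data) records in key order.
blocks = [
    [(10, "A"), (20, "B")],
    [(30, "C"), (40, "D")],
    [(50, "E"), (60, "F")],
]
index = build_index(blocks)          # [20, 40, 60]
print(indexed_lookup(blocks, index, 40))
print(indexed_lookup(blocks, index, 35))
```

Only the index and one block are examined, which is why the search does not have to start from the beginning of the file.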
4. Direct File Organisation. (Random File Organisation)
The records are not stored in any particular sequence. Instead, a mathematical
relationship is established between a record key value and the address of its physical
location on the storage medium. Just as with indexed sequential files, direct files rely on
direct access storage devices. This is also known as random file organization.
Random file organization allows extremely fast access to individual records, but the file
cannot be processed sequentially. It is suitable for on-line enquiry systems where a fast
response is required.
- To access a record on a random file, its address is calculated from the record key
using the hashing algorithm. The record at that address is then read. If it is not
the required record, then the next record is read and examined until either the record
is found or a blank record is encountered.
- To add a record to a random file, its address is calculated using the hashing
algorithm and then the relevant block is read into memory. If the block is
empty, the record is written to the file. Otherwise, the next block is read and
examined until an empty space is found.
- To delete a record from a random file, the record must not be physically deleted,
because this would make any records which caused collisions
inaccessible. Therefore, the record is flagged as deleted by setting an extra field
(e.g. a Boolean field) in the record to indicate that this is a deleted record.
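The three operations above can be sketched in Python. The notes do not name a particular hashing algorithm, so this illustration assumes integer keys and the simple division-remainder method (key modulo file size); the class name RandomFile and the DELETED marker are also assumptions.

```python
EMPTY = None
DELETED = object()   # stands in for the "deleted" flag field


class RandomFile:
    def __init__(self, size: int):
        self.slots = [EMPTY] * size

    def _addresses(self, key: int):
        """Home address from the hashing algorithm, then each following slot."""
        home = key % len(self.slots)     # assumed division-remainder hashing
        for step in range(len(self.slots)):
            yield (home + step) % len(self.slots)

    def add(self, key: int, data) -> int:
        for address in self._addresses(key):
            if self.slots[address] is EMPTY:
                self.slots[address] = (key, data)
                return address
        raise RuntimeError("file is full")

    def find(self, key: int):
        for address in self._addresses(key):
            slot = self.slots[address]
            if slot is EMPTY:
                return None              # blank record reached: not on file
            if slot is not DELETED and slot[0] == key:
                return slot[1]
        return None

    def delete(self, key: int) -> bool:
        for address in self._addresses(key):
            slot = self.slots[address]
            if slot is EMPTY:
                return False
            if slot is not DELETED and slot[0] == key:
                self.slots[address] = DELETED   # flag it, do not blank it
                return True
        return False


stock = RandomFile(7)
stock.add(10, "bolts")     # 10 % 7 = 3
stock.add(17, "nuts")      # 17 % 7 = 3 as well: collision, stored in slot 4
stock.delete(10)
print(stock.find(17))      # still reachable because slot 3 is only flagged
```

Had slot 3 been blanked instead of flagged, the search for key 17 would have stopped at the blank slot and wrongly reported the record as missing.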
Factors which may influence the file organization
This refers to the various factors or circumstances which may have a bearing on the method
of file organization adopted.
The following are some of the factors:
- Storage medium (DASD or SASD)
- Processing method (e.g. real-time processing, where direct file organization is used)
- Volume of data
- Hit rate
- Type of file (transaction files are usually serially organized, while master files are
sequentially or directly organized)
Hit rate
Hit rate measures the proportion of records being accessed during a particular run. It is
calculated by dividing the number of records accessed by the total number of records on
the file, and multiplying by 100 to express the result as a percentage.
For example, if 270 employee records out of 300 on the file are accessed on a particular
payroll run, the hit rate is 270/300*100 = 90%.
In general, if the hit rate is high (say over 70%) whenever the file is processed, it is
efficient to use a sequential file organization, whereas if the hit rate is low, it is preferable
to use a randomly organized file where records can be updated in any sequence. If the hit
rate varies, being sometimes high and sometimes low, an indexed sequential file
organization is appropriate.
Hit rate = (number of transaction file records / number of master file records) x 100
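The calculation can be checked with a short Python sketch; the function name hit_rate is an assumption for illustration.

```python
def hit_rate(records_accessed: int, total_records: int) -> float:
    """Percentage of the records on the file that are accessed in one run."""
    if total_records <= 0:
        raise ValueError("the file must contain at least one record")
    return records_accessed * 100 / total_records


print(hit_rate(270, 300))   # 90.0, the payroll example above
```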
©2013
Prepared by Sinkala Henry.
University of Lusaka