Uploaded by William Abodunrin

file organization and processing

CSC 305- Introduction to
Database Systems
A.O. Afolabi
File organisation
• Storage of data
• Organization of data
• Access to data or information
File organization and processing
• Two levels
• Physical level
• Actual manner information is stored in the computer
• Has many different constructs
• Logical level
• An abstraction of how information is stored in the computer
• user’s perception of how information is stored
• Might be different from how it is physically stored
• Few constructs
File organization
• Physical level is a lower level for information representation than
logical level
• A DBMS transforms information from logical level to physical level
• DBMS – special-purpose software for data storage and manipulation
Physical file Vs Logical file
• Physical file – contains the original data.
• A physical file contains one record format.
• It describes how the data is to be displayed to or retrieved from a
program.
• Logical file – does not occupy memory space and does not contain
data.
• It can contain up to 32 record formats
• It cannot exist without a physical file.
• Has a description of the records found in one or multiple physical files.
Logical Level vs Physical Level
[Diagram: Logical Level, DBMS, Physical Level]
File structures
File organization
• Record = Field 1 | Field 2 | Field 3 | Field 4 | Field 5 | Field 6
• Student record = Reg. No | Name | DOB | Fac | Dept | Part
• File – consists of records of the same format
• Primary key = field or composite of several fields
• Primary key – distinguishes a record from all others
• Secondary keys or attributes – all the remaining fields
• Binary search – compares the key of the sought record with the key of
the middle record of the file
• Probe – an access to a distinct location
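The terms above can be sketched in a few lines of Python. This is only an illustration: the class name `StudentRecord`, the helper `find_by_primary_key`, and the sample values are hypothetical and do not come from the course material.

```python
from dataclasses import dataclass


@dataclass
class StudentRecord:
    reg_no: str  # primary key: assumed to be unique for every student
    name: str    # the remaining fields are secondary keys (attributes)
    dob: str
    fac: str
    dept: str
    part: int


# A "file" is modelled here as a list of records that share one format.
# The values below are made-up sample data.
student_file = [
    StudentRecord("CSC/001", "Ada", "2001-01-15", "Technology", "Computer Science", 3),
    StudentRecord("CSC/002", "Bayo", "2000-07-30", "Technology", "Computer Science", 3),
]


def find_by_primary_key(records, reg_no):
    """Return the record whose primary key equals reg_no, or None."""
    for record in records:
        if record.reg_no == reg_no:
            return record
    return None
```

Because the primary key distinguishes a record from all others, the lookup can stop at the first match.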
Probing a file
• Probing an unsorted file of size n (e.g. 10 records)
• Probing a sorted file (n/2)
• The challenges
• Use a file of size 10
• Illustrate the challenge: search for the 11th record (which does not exist)
File Organizations
• Three types
• Sequential organization – sequential access
• Access to multiple records, entire file
• Pre-defined order for access
• Direct organization – Direct access
• Aka random access
• Accessing a single record
• Indexed sequential – sequential and Direct
File organisation
• Effective organisation requires a match between the organisation type
and the intended access type
• Organisation might be ideal for one type of access but not suitable for
another
• Match use with structure
• Indexed sequential – for applications that require both types of
access
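One way to picture an organisation that supports both types of access is a file kept in key order together with an index from key to position. The sketch below is a deliberate simplification under that assumption (a dense in-memory index, unique keys); the class and method names are hypothetical, and real indexed sequential files are usually organised in blocks on disk.

```python
class IndexedSequentialFile:
    """Toy model: records kept in key order plus a key-to-position index."""

    def __init__(self, records):
        # records: iterable of (key, data) pairs with unique keys (assumed)
        self.records = sorted(records, key=lambda record: record[0])
        self.index = {key: pos for pos, (key, _) in enumerate(self.records)}

    def read_sequential(self):
        """Sequential access: every record, in key order."""
        return list(self.records)

    def read_direct(self, key):
        """Direct access: use the index to jump to a single record."""
        pos = self.index.get(key)
        return None if pos is None else self.records[pos]


f = IndexedSequentialFile([(30, "c"), (10, "a"), (20, "b")])
print(f.read_sequential())  # [(10, 'a'), (20, 'b'), (30, 'c')]
print(f.read_direct(20))    # (20, 'b')
```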
Sequential File organisation
• Organisation: records occupy consecutive positions 1, 2, ……, x, x+1, x+2
• Processing requires moving from one record to the next by incrementing the
address of the current record
• Record location is done by processing the records in a file in order of
occurrence until the target record is found or all records are processed
• Use the example of students records to illustrate this type of file
organisation
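The record-location procedure described above can be sketched as follows. The function name `sequential_search` and the ten sample keys are assumptions made for illustration; each record examined is counted as one probe.

```python
def sequential_search(keys, target):
    """Scan the keys in stored order; return (position, probes)."""
    probes = 0
    for position, key in enumerate(keys):
        probes += 1  # examining one record counts as one probe
        if key == target:
            return position, probes
    return None, probes  # all records were processed without a match


keys = [17, 4, 23, 8, 42, 15, 16, 31, 9, 2]  # a file of 10 records
print(sequential_search(keys, 42))  # (4, 5)
print(sequential_search(keys, 99))  # (None, 10)
```

Searching for a key that is not in the file is the costly case: all 10 records are probed before the search can report failure.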
Sequential file organisation continued
• Improving retrieval performance
• Sort the file on the records' key values
• key1 < key2 < key3 < … < keyx < … < keyn
• Using binary search – compares the key of the sought record with the
key of the middle record of the file
• The search range is reduced further at each step when binary search (BS)
is used on an ordered file
• If keysought < keymiddle
• The upper portion of the file is eliminated
• If keysought > keymiddle
• The lower portion of the file is eliminated, including the record compared against
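A minimal sketch of binary search on a file sorted in ascending key order, counting one probe per comparison with a middle record. The function name `binary_search` and the sample keys are illustrative assumptions.

```python
def binary_search(sorted_keys, target):
    """Search keys sorted in ascending order; return (position, probes)."""
    low, high = 0, len(sorted_keys) - 1
    probes = 0
    while low <= high:
        middle = (low + high) // 2
        probes += 1
        if sorted_keys[middle] == target:
            return middle, probes
        if target < sorted_keys[middle]:
            high = middle - 1  # eliminate the upper portion and the middle record
        else:
            low = middle + 1   # eliminate the lower portion and the middle record
    return None, probes


keys = [2, 4, 8, 9, 15, 16, 17, 23, 31, 42]  # 10 records, sorted
print(binary_search(keys, 23))  # (7, 2)
print(binary_search(keys, 99))  # (None, 4)
```

With these 10 records, a missing key is detected after 4 probes, compared with 10 for a scan of the whole file.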
Direct File Organisation
• People are impatient
• Rapid location of desired information required
• Desire to go directly to the address where record is stored
• Easier if key were the address – unique key
• Example students matric number
Location from a unique storage address
• Convert primary key to a unique storage address
• Location of A[i] = b + m × s
• Where
• A = array of any dimension
• i = index of the element sought
• b = location of the first element in the array
• m = number of elements preceding the ith element
• s = size (in address units) of an element of the array
• The formula helps to locate the element in one search
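The address formula can be checked with a small calculation. In this sketch the function name `element_address`, the `first_index` parameter, and the numbers used are assumptions chosen for illustration.

```python
def element_address(base, index, element_size, first_index=0):
    """Location of A[index] = b + m * s."""
    preceding = index - first_index  # m: number of elements before the i-th
    return base + preceding * element_size


# Array starting at address 1000, each element occupying 4 address units
print(element_address(1000, 5, 4))                 # 1020 (indices start at 0)
print(element_address(1000, 5, 4, first_index=1))  # 1016 (indices start at 1)
```

No searching is involved: the location follows from one multiplication and one addition.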
Retrieval time- Hashing
• Other methods to keep retrieval time to minimum
• Hashing
• Reducing the number of collisions (a collision occurs when two distinct
keys map to the same address)
• Hashing has two aspects in this instance
• The function
• The collision resolution method
• The collision resolution mechanism is important when the number of
records mapped to a given location exceeds its limit
Hashing functions
• F(Key) = key mod N, where
• N = table size
• Benefit: - returns values in the range of address space (e.g. 0 to N-1)
• 8 mod 5 = 3
• A variation
• F(Key) = key mod P, where
• P = the smallest prime number >= N
• P is then the new table size
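Both forms of the division hash can be sketched briefly. The helper names (`is_prime`, `smallest_prime_at_least`, `hash_mod`) are hypothetical, and integer keys are assumed.

```python
def is_prime(n):
    """Trial division; adequate for small table sizes."""
    if n < 2:
        return False
    divisor = 2
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 1
    return True


def smallest_prime_at_least(n):
    """Return P, the smallest prime number >= n."""
    candidate = max(n, 2)
    while not is_prime(candidate):
        candidate += 1
    return candidate


def hash_mod(key, table_size):
    """F(key) = key mod table_size; the result lies in 0 .. table_size - 1."""
    return key % table_size


print(hash_mod(8, 5))               # 3
p = smallest_prime_at_least(10)     # table size N = 10 becomes P = 11
print(p, hash_mod(25, p))           # 11 3
```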
Hashing functions continued
• Truncation or substring – a substring can be used as the address
• Example, student matric number
• Folding – two types
• Folding by boundary
• Folding by shifting
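Truncation can be sketched as taking a fixed substring of the key. Which part of the key is kept is a design choice; the version below assumes the last three digits are used, and the matric number shown is a made-up example.

```python
def truncation_hash(key, width=3):
    """Use the last `width` digits of the key as the address.

    Assumes the key contains at least one digit.
    """
    digits = "".join(ch for ch in str(key) if ch.isdigit())
    return int(digits[-width:])


print(truncation_hash("CSC/2019/0457"))  # 457
```

Keys that share the chosen substring collide, so the substring should be the part of the key that varies most from record to record.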
Folding by boundaries
• Demarcate key into boundaries
• Superimpose the digits
• Add without carry
• Obtain the probable address
• Example: key 123456789, parts 123 | 456 | 789, with the two outer parts reversed
   321
   456
  +987
   654
• The probable address is 654
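The folding-by-boundary steps can be sketched as below, using the key 123456789 split into three-digit parts: the two outer parts are reversed, giving 321 + 456 + 987, which adds to 654 when carries are dropped. The function names are hypothetical, and the sketch assumes the key length is a multiple of the part width.

```python
def add_without_carry(parts):
    """Add equal-width digit strings column by column, dropping carries."""
    width = len(parts[0])
    return "".join(
        str(sum(int(part[col]) for part in parts) % 10) for col in range(width)
    )


def fold_by_boundary(key, width=3):
    """Split the key into parts, reverse the two outer parts, then add."""
    parts = [key[i:i + width] for i in range(0, len(key), width)]
    if len(parts) > 1:
        parts[0] = parts[0][::-1]    # fold the first part inward
        parts[-1] = parts[-1][::-1]  # fold the last part inward
    return add_without_carry(parts)


print(fold_by_boundary("123456789"))  # 654
```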
Folding by shifting
• Slide the parts of the key over one another and add without carry
   123
   456
  +789
   258
• Polynomial hashing
• The key is divided by a polynomial
• Can be used to convert key to address
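Folding by shifting, as in the 123 + 456 + 789 example, can be sketched in the same way: the parts are aligned unchanged and added column by column with carries dropped. The function name `fold_by_shifting` is hypothetical, and the key length is assumed to be a multiple of the part width.

```python
def fold_by_shifting(key, width=3):
    """Split the key into equal parts and add them without carry."""
    parts = [key[i:i + width] for i in range(0, len(key), width)]
    digits = []
    for col in range(width):
        column_sum = sum(int(part[col]) for part in parts)
        digits.append(str(column_sum % 10))  # keep the unit digit only
    return "".join(digits)


print(fold_by_shifting("123456789"))  # 258
```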
Packing factor
• Packing factor
• Another method for reducing collision
• Packing factor of a file = ratio of number of items stored in the file to the
capacity of the file
Packing factor = number of records stored/total no of storage locations
• A measure of storage utilization
• Also called packing density or load factor
• The higher the packing factor, the higher the likelihood of collision
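The ratio is simple to compute. The function name `packing_factor` and the figures below are illustrative assumptions.

```python
def packing_factor(records_stored, storage_locations):
    """Packing factor = records stored / total number of storage locations."""
    return records_stored / storage_locations


print(packing_factor(7, 10))   # 0.7
print(packing_factor(10, 10))  # 1.0, every location is occupied
```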
Coalesced hashing
• Coalesced hashing
• Collision resolution method that uses pointers to link synonyms into a
chain
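A minimal sketch of coalesced hashing, under several assumptions that are not in the slides: integer keys, the hash function key mod table size, no duplicate keys, and empty slots taken from the bottom of the table upward. The class and attribute names are hypothetical.

```python
class CoalescedHashTable:
    """Synonyms are stored in the table itself and linked by pointers."""

    def __init__(self, size):
        self.size = size
        self.keys = [None] * size
        self.next = [None] * size  # pointer to the next slot in the chain
        self.free = size - 1       # empty slots are sought from the bottom up

    def insert(self, key):
        slot = key % self.size
        if self.keys[slot] is None:
            self.keys[slot] = key  # home address is empty: no collision
            return slot
        # Collision: walk to the end of the chain through the home address
        while self.next[slot] is not None:
            slot = self.next[slot]
        # Find an empty slot, scanning upward from the bottom of the table
        while self.free >= 0 and self.keys[self.free] is not None:
            self.free -= 1
        if self.free < 0:
            raise OverflowError("table is full")
        self.keys[self.free] = key
        self.next[slot] = self.free  # link the new record into the chain
        return self.free

    def find(self, key):
        slot = key % self.size
        while slot is not None and self.keys[slot] is not None:
            if self.keys[slot] == key:
                return slot
            slot = self.next[slot]
        return None


table = CoalescedHashTable(7)
print([table.insert(k) for k in (10, 17, 24, 6)])  # [3, 6, 5, 4]
print(table.find(6))                               # 4
```

In the usage example, key 6 hashes to slot 6, which already holds the synonym 17 from another home address, so the two chains merge. That merging is what "coalesced" refers to.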