CSC 305 – Introduction to Database Systems
A.O. Afolabi

File organisation
• Storage of data
• Organisation of data
• Access to data or information

File organisation and processing
• Two levels
  • Physical level
    • The actual manner in which information is stored in the computer
    • Has many different constructs
  • Logical level
    • An abstraction of how information is stored in the computer
    • The user's perception of how information is stored
    • Might be different from how it is physically stored
    • Has few constructs

File organisation
• The physical level is a lower level of information representation than the logical level
• A DBMS transforms information from the logical level to the physical level
• DBMS – special-purpose software for storing and manipulating data

Physical file vs logical file
• Physical file – contains the original data
  • A physical file contains one record format
  • Describes how the data is to be presented to or retrieved from a program
• Logical file – does not occupy storage space for data; it contains no data
  • It can contain up to 32 record formats
  • It cannot exist without a physical file
  • Holds a description of the records found in one or more physical files

Logical level vs physical level
[Figure: Logical level → DBMS → Physical level (file structures)]

File organisation
• Record = Field 1 | Field 2 | Field 3 | Field 4 | Field 5 | Field 6
• Student record = Reg. No | Name | DOB | Fac | Dept | Part
• File – consists of records of the same format
• Primary key – a field, or a composite of several fields, that distinguishes a record from all others
• Secondary keys or attributes – all the remaining fields
• Binary search – compares the key of the sought record with the key of the middle record of the file
• Probe – an access to a distinct location

Probing a file
• Probing an unsorted file of size n (e.g. 10 records): an unsuccessful search must probe all n records
• Probing a sorted file: about n/2 probes on average, because a sequential search can stop at the first key larger than the one sought
• The challenges
  • Use a file of size 10
  • Illustrate the challenge by searching for an 11th record that does not exist

File organisations
• Three types
  • Sequential organisation – sequential access
    • Access to multiple records, or to the entire file
    • Records are accessed in a predefined order
  • Direct organisation – direct access
    • Also known as random access
    • Accesses a single record
  • Indexed sequential – both sequential and direct access

File organisation
• Effective organisation requires a match between the organisation type and the intended access type
• An organisation might be ideal for one type of access but unsuitable for another
• Match the use to the structure
• Indexed sequential – for applications that require both types of access

Sequential file organisation
• Organisation: records occupy positions 1, 2, …, x, x+1, x+2, …
• Processing moves from one record to the next by incrementing the address of the current record
• A record is located by processing the records in order of occurrence until the target record is found or all records have been processed
• Use the example of student records to illustrate this type of file organisation

Sequential file organisation (continued)
• Improving retrieval performance
  • Sort the records on their key values: key1 < key2 < key3 < … < keyx < … < keyn
  • Use binary search – compare the key of the sought record with the key of the middle record of the file
  • On an ordered file, binary search reduces the remaining search space at every probe
    • If key sought < middle key, the upper (higher-key) portion of the file is eliminated
    • If key sought > middle key, the lower portion of the file is eliminated
    • In both cases the middle record just compared against is eliminated as well

Direct file organisation
• People are impatient
• Rapid location of the desired information is required
• The aim is to go directly to the address where the record is stored
• This is easiest if the key itself is the address, which requires a unique key
• Example: a student's matric number

Location from a unique storage address
• Convert the primary key to a unique storage address
• Location of A[i] = b + m × s
• Where
  • A = array of any dimension
  • i = index of the element sought
  • b = location of the first element
    in the array
  • m = number of elements preceding the ith element
  • s = size (in address units) of an element of the array
• The formula locates the element in a single probe, without searching

Retrieval time – hashing
• Hashing is another method of keeping retrieval time to a minimum
• A key goal is to reduce the number of collisions (a collision occurs when two distinct keys map to the same address)
• Hashing has two aspects here
  • The hash function
  • The collision resolution method
• A collision resolution mechanism is important when the number of records mapped to a given location exceeds its capacity

Hashing functions
• F(key) = key mod N, where
  • N = table size
• Benefit: always returns a value within the address space, 0 to N-1
  • Example: 8 mod 5 = 3
• A variation
  • F(key) = key mod P, where
  • P = the smallest prime number ≥ N
  • P then becomes the new table size

Hashing functions (continued)
• Truncation or substring – a substring of the key is used as the address
  • Example: part of a student's matric number
• Folding – two types
  • Folding by boundary
  • Folding by shifting

Folding by boundary
• Divide the key into parts at fixed boundaries
• Fold the outer parts over the middle part, which reverses their digits, and superimpose the digits
• Add digit by digit without carry
• The result is the probable address
• Example: key 123456789 divided into 123 | 456 | 789
      321   (123 reversed)
      456
    + 987   (789 reversed)
      ---
      654
• The probable address is 654

Folding by shifting
• Slide the parts over one another without reversing them, then add without carry
      123
      456
    + 789
      ---
      258
• Polynomial hashing
  • The key is divided by a polynomial
  • Can be used to convert a key to an address

Packing factor
• Keeping the packing factor low is another way of reducing collisions
• The packing factor of a file is the ratio of the number of items stored in the file to the capacity of the file:
  Packing factor = number of records stored / total number of storage locations
• A measure of storage utilisation
• Also called packing density or load factor
• The higher the packing factor, the higher the likelihood of collision

Coalesced hashing
• A collision resolution method that uses pointers to link synonyms (keys that hash to the same address) into a chain
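The probing discussion and the binary search described under sequential file organisation can be made concrete with a small sketch that counts probes. It assumes a file is modelled as an in-memory Python list of integer keys; the function names and sample keys are hypothetical. Searching for a key that is absent reproduces the "11th record" challenge.

```python
def linear_search(keys, target):
    """Sequential search of an unsorted file.

    Returns (index, probes); index is None when the key is absent.
    """
    probes = 0
    for index, key in enumerate(keys):
        probes += 1
        if key == target:
            return index, probes
    return None, probes  # an absent key costs a probe of every record


def sorted_linear_search(keys, target):
    """Sequential search of a file sorted in ascending key order.

    Stops at the first key larger than the target, since the target
    cannot occur later in the file.
    """
    probes = 0
    for index, key in enumerate(keys):
        probes += 1
        if key == target:
            return index, probes
        if key > target:
            break
    return None, probes


def binary_search(keys, target):
    """Binary search of a file sorted in ascending key order."""
    low, high = 0, len(keys) - 1
    probes = 0
    while low <= high:
        mid = (low + high) // 2
        probes += 1
        if target == keys[mid]:
            return mid, probes
        if target < keys[mid]:
            high = mid - 1  # discard the upper (higher-key) portion and the middle record
        else:
            low = mid + 1   # discard the lower portion and the middle record
    return None, probes


if __name__ == "__main__":
    student_keys = [42, 7, 19, 88, 3, 56, 21, 64, 30, 11]  # a file of 10 records
    ordered = sorted(student_keys)
    # Search for a key that is not in the file.
    print(linear_search(student_keys, 50))    # (None, 10)
    print(sorted_linear_search(ordered, 50))  # (None, 8)
    print(binary_search(ordered, 50))         # (None, 4)
```

On this 10-record file, confirming that key 50 is absent costs 10 probes unsorted, 8 with a sequential scan of the sorted file, and 4 with binary search.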
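The address formula Loc(A[i]) = b + m × s from the direct organisation slides can be checked with a short sketch. The helper names are hypothetical, and the two-dimensional case assumes row-major storage with zero-based indices, which the notes do not specify.

```python
def element_address(base, preceding, size):
    """Loc(A[i]) = b + m * s.

    base      -- b, location of the first element in the array
    preceding -- m, number of elements stored before the element sought
    size      -- s, size of one element in address units
    """
    return base + preceding * size


def preceding_row_major(row, col, n_cols):
    """m for A[row][col] when a 2-D array is stored row by row (assumed, 0-based)."""
    return row * n_cols + col


if __name__ == "__main__":
    # 1-D array starting at address 1000 with 4-unit elements:
    # the 5th element has 4 elements before it.
    print(element_address(1000, 4, 4))  # 1016
    # 3 x 4 array: A[2][1] has 2 full rows (8 elements) plus 1 element before it.
    print(element_address(1000, preceding_row_major(2, 1, 4), 4))  # 1036
```

Only m depends on the array's shape; once it is known, the address follows in one calculation, which is why a single probe suffices.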
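The division-remainder hash functions (key mod N and key mod P) and truncation described in the hashing slides can be sketched as follows. The prime search uses plain trial division, and which digits `truncate_key` keeps is a hypothetical choice for illustration, not a rule from the notes.

```python
def is_prime(n):
    """Trial-division primality test (adequate for table-sized numbers)."""
    if n < 2:
        return False
    divisor = 2
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 1
    return True


def smallest_prime_at_least(n):
    """P = the smallest prime number >= N."""
    candidate = max(n, 2)
    while not is_prime(candidate):
        candidate += 1
    return candidate


def hash_mod(key, table_size):
    """F(key) = key mod N; the result always lies in 0 .. N-1."""
    return key % table_size


def truncate_key(key, start, length):
    """Use a substring of the key's digits as the address (hypothetical helper)."""
    digits = str(key)
    return int(digits[start:start + length])


if __name__ == "__main__":
    print(hash_mod(8, 5))                 # 3
    p = smallest_prime_at_least(10)       # a table of 10 slots grows to 11
    print(p, hash_mod(123456789, p))      # 11 5
    print(truncate_key(123456789, 6, 3))  # 789 (last three digits)
```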
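Both folding methods reduce to splitting the key and adding the parts digit by digit without carry. The sketch below assumes a decimal integer key whose length is a multiple of the part length (3 digits here), and, following the notes' example, reverses only the first and last parts when folding by boundary. The function names are hypothetical.

```python
def split_key(key, part_len):
    """Divide the key's digits into equal-length parts at fixed boundaries."""
    digits = str(key)
    if len(digits) % part_len != 0:
        raise ValueError("this sketch expects the key length to be a multiple of part_len")
    return [digits[i:i + part_len] for i in range(0, len(digits), part_len)]


def add_without_carry(parts):
    """Add the parts column by column, keeping only the units digit of each column sum."""
    columns = zip(*parts)  # the i-th digit of every part
    return int("".join(str(sum(int(d) for d in column) % 10) for column in columns))


def fold_by_shifting(key, part_len=3):
    """Slide the parts over one another unchanged and add without carry."""
    return add_without_carry(split_key(key, part_len))


def fold_by_boundary(key, part_len=3):
    """Fold the outer parts over the middle (reversing their digits), then add without carry."""
    parts = split_key(key, part_len)
    parts[0] = parts[0][::-1]
    parts[-1] = parts[-1][::-1]
    return add_without_carry(parts)


if __name__ == "__main__":
    print(fold_by_boundary(123456789))  # 654  (321 + 456 + 987, no carry)
    print(fold_by_shifting(123456789))  # 258  (123 + 456 + 789, no carry)
```

Dropping the carries keeps the result the same width as one part, so the address stays inside a 3-digit address space.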
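The packing factor formula can be computed directly. To show the link with collisions, the sketch also counts how many keys land on an already occupied home address under key mod N. The keys are made up for illustration, and the result shows a tendency only: a lower packing factor makes collisions less likely, but does not guarantee fewer of them for every set of keys.

```python
def packing_factor(records_stored, storage_locations):
    """Packing factor = number of records stored / total number of storage locations."""
    if storage_locations <= 0:
        raise ValueError("storage_locations must be positive")
    return records_stored / storage_locations


def count_collisions(keys, table_size):
    """Count keys whose home address (key mod table_size) is already taken by an earlier key."""
    taken = set()
    collisions = 0
    for key in keys:
        home = key % table_size
        if home in taken:
            collisions += 1
        else:
            taken.add(home)
    return collisions


if __name__ == "__main__":
    keys = [12, 25, 37, 48, 51, 63, 74, 86]  # hypothetical record keys
    for size in (11, 29):
        print(size, round(packing_factor(len(keys), size), 2), count_collisions(keys, size))
    # 11 0.73 2
    # 29 0.28 0
```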
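A minimal sketch of coalesced hashing, assuming key mod N as the hash function and, as in one common variant (Knuth's), taking free slots from the highest address downward. The class and method names are hypothetical. Because overflow records occupy ordinary table slots, a key can find its home address already holding a record from another chain, which it then joins; this merging of chains is what "coalesced" refers to.

```python
class CoalescedHashTable:
    """Coalesced hashing: synonyms are chained by pointers stored in the table itself."""

    def __init__(self, size):
        self.size = size
        self.keys = [None] * size
        self.next = [None] * size  # pointer to the next record in the chain
        self.free = size - 1       # empty slots are taken from the bottom of the table

    def _free_slot(self):
        while self.free >= 0 and self.keys[self.free] is not None:
            self.free -= 1
        if self.free < 0:
            raise OverflowError("table is full")
        return self.free

    def insert(self, key):
        slot = key % self.size
        if self.keys[slot] is None:
            self.keys[slot] = key  # home address is free
            return slot
        # Walk to the end of the chain that passes through the home address.
        while True:
            if self.keys[slot] == key:
                return slot  # key already stored
            if self.next[slot] is None:
                break
            slot = self.next[slot]
        new_slot = self._free_slot()
        self.keys[new_slot] = key
        self.next[slot] = new_slot  # link the new record to the end of the chain
        return new_slot

    def search(self, key):
        """Return (slot, probes); slot is None when the key is absent."""
        slot = key % self.size
        probes = 1
        if self.keys[slot] is None:
            return None, probes
        while True:
            if self.keys[slot] == key:
                return slot, probes
            slot = self.next[slot]
            if slot is None:
                return None, probes
            probes += 1

    def packing_factor(self):
        return sum(k is not None for k in self.keys) / self.size


if __name__ == "__main__":
    table = CoalescedHashTable(7)
    for key in (10, 17, 24, 5, 12):
        table.insert(key)
    print(table.keys)        # [None, None, 12, 10, 5, 24, 17]
    print(table.next)        # [None, None, None, 6, 2, 4, 5]
    print(table.search(31))  # (None, 5)
```

Key 5 hashes to address 5, which already holds 24 (an overflow record from the chain starting at address 3), so the two chains merge. A failed search for 31 (home address 3) therefore walks all five records.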