
Topic 7: File Organization
Logical vs. Physical Organization of Data

logical organization
  the abstract way that a computer program accesses the data
  uses logical structures (e.g. linked lists)
physical organization
  the actual physical structure of the data in memory
  i.e. what the sequence of bits looks like in memory
Definitions

database
  collection of related files
file
  collection of related records
record
  collection of related fields (e.g. Name, Age)
key field
  uniquely identifies a record (e.g. UserID)
Basics (General Idea)





Records are stored at different places (different indices or locations)
To find a record, we need to know its location
We can:
  search for the record, OR
  jump to its location directly (if the location is known), OR
  use a combination of jumping and searching
Sequential File Organization

Records in a file are stored sequentially (in order) by some key field
  2480 Bob
  2569 Alice
  3020 Paul
Originally designed to operate on magnetic tapes
How do we find a record?
What happens when we try to add a new record? (It’s going to be bad…)
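The two questions above can be explored with a minimal Python sketch. It assumes the file is modelled as an in-memory list sorted by key; the function names and the extra record "Carol" are hypothetical and only serve the illustration.

```python
# Records kept in key order, using the sample rows from the slide.
records = [(2480, "Bob"), (2569, "Alice"), (3020, "Paul")]

def find_sequential(records, key):
    """Scan from the start; stop early once we pass where the key would be."""
    for position, (record_key, name) in enumerate(records):
        if record_key == key:
            return position, name
        if record_key > key:
            break  # sorted order means the key cannot appear later
    return None

def insert_sequential(records, key, name):
    """Insert in key order and report how many later records had to shift."""
    position = 0
    while position < len(records) and records[position][0] < key:
        position += 1
    records.insert(position, (key, name))
    return len(records) - 1 - position

demo = list(records)  # work on a copy so the sample data stays unchanged
print(find_sequential(demo, 3020))            # found only after passing every earlier record
print(insert_sequential(demo, 2500, "Carol"))  # hypothetical new record; later records shift
```

Finding a record means reading past every record before it, and adding one in the middle forces every later record to move, which is why insertion is costly on a tape-like medium.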
Partially-Indexed Sequential Files



File index (address) ~ index in a book
Partially index all the records
Key field has direct index to a section where the record of interest is located
  1. Sequential search for key field
  2. Directly link to section of records
  3. Sequential search for record of interest
Index
Key   Record Address
A     1
B     6
C     11
D     16

Record file (addresses)
1  2  3  4  5  6  7  8  9  10  11  12
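The three-step lookup can be sketched in Python as follows. The index keys and addresses (A at 1, B at 6, C at 11, D at 16) come from the table; everything else is an assumption for illustration: that each index entry marks the start of a section running up to the next entry's address, that record keys are names beginning with the section letter, and the contents of `data_file`.

```python
# Partial index from the table: section key -> address where the section starts.
index_keys = ["A", "B", "C", "D"]
index_addresses = [1, 6, 11, 16]

# Hypothetical data file: address -> (key, payload). Only a few slots are filled.
data_file = {
    1: ("Adams", "rec-1"), 2: ("Allen", "rec-2"),
    6: ("Baker", "rec-6"), 7: ("Brown", "rec-7"),
    11: ("Clark", "rec-11"),
    16: ("Davis", "rec-16"),
}

def find_partially_indexed(key):
    # Step 1: sequential search of the index for the last entry <= key.
    slot = -1
    for i, index_key in enumerate(index_keys):
        if index_key <= key:
            slot = i
        else:
            break
    if slot < 0:
        return None
    # Step 2: jump directly to the start of that section.
    start = index_addresses[slot]
    if slot + 1 < len(index_addresses):
        end = index_addresses[slot + 1]
    else:
        end = max(data_file) + 1  # last section runs to the end of the file
    # Step 3: sequential search, but only inside the section.
    for address in range(start, end):
        record = data_file.get(address)
        if record is not None and record[0] == key:
            return address, record[1]
    return None

print(find_partially_indexed("Brown"))
```

Only the short index and one section are scanned, rather than the whole file.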
Fully Indexed Files



Every record has an index (address)
Sequentially search through key field for
specific record address
Records may be accessed directly OR in
sequential order by address
Key   Record Address
a     4
b     7
c     5
d     3
e     12
m     9
n     10
p     2
s     11
t     6
z     1
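A small Python sketch of a full index, using the key and address pairs from the table. The function names are hypothetical, and the index is assumed to be held as a list sorted by key.

```python
# Full index from the table: every key has its own record address.
full_index = [
    ("a", 4), ("b", 7), ("c", 5), ("d", 3), ("e", 12), ("m", 9),
    ("n", 10), ("p", 2), ("s", 11), ("t", 6), ("z", 1),
]

def address_of(key):
    """Sequentially search the key column for the record's address."""
    for index_key, address in full_index:
        if index_key == key:
            return address
        if index_key > key:
            break  # index is sorted by key, so the key is absent
    return None

def keys_in_address_order():
    """Visit records in sequential order by address instead of by key."""
    return [key for key, _ in sorted(full_index, key=lambda entry: entry[1])]

print(address_of("m"))
print(keys_in_address_order())
```

The same index supports both access patterns from the slide: direct access by key, and a sequential pass in address order.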
Direct Access File Organization



Record address is derived/calculated with math
No need to search through an index
Example:
  Record Address = UserID MOD 8 + SSN MOD 3
  (in code: Record Address = UserID % 8 + SSN % 3)

This math operation is called “key hashing” or “hashing”
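The example formula can be tried out directly in Python. This is a sketch only: the `slots` list, the `store` and `fetch` helpers, and the sample SSN values are hypothetical. With this formula the address always falls between 0 and 9, since `UserID % 8` is at most 7 and `SSN % 3` is at most 2.

```python
def record_address(user_id, ssn):
    """Address computed from the slide's formula: UserID % 8 + SSN % 3."""
    return user_id % 8 + ssn % 3

# Hypothetical storage area: one slot per possible address (0 to 9).
slots = [None] * 10

def store(user_id, ssn, name):
    """Compute the address and write the record there, with no index lookup."""
    address = record_address(user_id, ssn)
    slots[address] = (user_id, ssn, name)
    return address

def fetch(user_id, ssn):
    """Recompute the address and read the record straight from that slot."""
    record = slots[record_address(user_id, ssn)]
    if record is not None and record[:2] == (user_id, ssn):
        return record
    return None

address = store(2569, 123456788, "Alice")  # made-up SSN for illustration
print(address, fetch(2569, 123456788))
```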
Fixed-length vs. Variable-length Records

Fixed-length
  each record is a set size
  can be used with direct access file organization
    access is based on math calculations, so records must be fixed in length
Variable-length
  each record is a variable size
  can be used with sequential file organization
    access goes through a search or an index rather than a calculated offset, so record size does not matter
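To illustrate why calculated access needs a fixed record size, here is a minimal Python sketch using the standard `struct` and `io` modules. The record layout (a 4-byte UserID followed by a 16-byte name) and the helper names are assumptions chosen for the example; an in-memory `BytesIO` buffer stands in for a file on disk.

```python
import io
import struct

# Hypothetical layout: 4-byte unsigned UserID + 16-byte name = 20 bytes per record.
RECORD = struct.Struct("<I16s")

def write_record(f, slot, user_id, name):
    """Place a record at byte offset slot * record size."""
    f.seek(slot * RECORD.size)
    f.write(RECORD.pack(user_id, name.encode("ascii")))

def read_record(f, slot):
    """Jump straight to the record; no earlier record is read."""
    f.seek(slot * RECORD.size)
    user_id, raw_name = RECORD.unpack(f.read(RECORD.size))
    return user_id, raw_name.rstrip(b"\x00").decode("ascii")

f = io.BytesIO()
for slot, (user_id, name) in enumerate([(2480, "Bob"), (2569, "Alice"), (3020, "Paul")]):
    write_record(f, slot, user_id, name)

print(read_record(f, 2))
```

Because every record occupies the same number of bytes, the offset of record `n` is simply `n` times the record size. With variable-length records that multiplication no longer works, so the file has to be scanned or an index consulted instead.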