CSC 213 – Large Scale Programming
LECTURE 11:
INDEXED FILES
Dictionaries in the Real World
Often need large database on many machines
Split search terms across machines
Updating & searching work split between machines
Database way too large for any single machine
If you think about it, this is incredibly common
Where?
Split Dictionaries
Splitting Keys From Values
In the real world, we often have many indices
Simple units (such as record positions) tell where values can be found
Values could be searched for in multiple ways
Index & Data Files
Split information into two (or more) files
Data file uses fixed-size records to store data
Index files contain search terms & data locations
Fixed-size records usually used in data file
Each record will use exactly that much space
Extra space wasted if the value is smaller
But limits data size, cannot get more space
Makes it far easier to reuse space & rebuild index
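A minimal Java sketch of one fixed-size record, assuming the data file is accessed through `java.io.RandomAccessFile` (the class that provides the `setLength` method mentioned later). The class `StockRecord` and its layout are hypothetical: a 50-character name, an `int` field, and a 4-character symbol, 112 bytes in total so that record offsets line up with the 0/112/224/336 positions in the figures. Short strings are padded (wasted space) and long strings are truncated (the size limit).

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical fixed-size layout (an assumption, not from the lecture):
// 50 UTF-16 chars of name (100 bytes) + 4-byte int + 4 chars of symbol (8 bytes) = 112 bytes.
class StockRecord {
    static final int NAME_CHARS = 50;
    static final int SYMBOL_CHARS = 4;
    static final int RECORD_SIZE = NAME_CHARS * 2 + Integer.BYTES + SYMBOL_CHARS * 2;

    final String name;
    final int price;
    final String symbol;

    StockRecord(String name, int price, String symbol) {
        this.name = name;
        this.price = price;
        this.symbol = symbol;
    }

    /** Writes exactly RECORD_SIZE bytes starting at the file's current position. */
    void write(RandomAccessFile file) throws IOException {
        writeFixed(file, name, NAME_CHARS);
        file.writeInt(price);
        writeFixed(file, symbol, SYMBOL_CHARS);
    }

    /** Reads exactly RECORD_SIZE bytes starting at the file's current position. */
    static StockRecord read(RandomAccessFile file) throws IOException {
        String name = readFixed(file, NAME_CHARS);
        int price = file.readInt();
        String symbol = readFixed(file, SYMBOL_CHARS);
        return new StockRecord(name, price, symbol);
    }

    // Pads short strings with '\0' (wasted space); silently truncates long ones (size limit).
    private static void writeFixed(RandomAccessFile file, String s, int width) throws IOException {
        for (int i = 0; i < width; i++) {
            file.writeChar(i < s.length() ? s.charAt(i) : '\0');
        }
    }

    // Always consumes `width` chars; the string ends at the first '\0' padding char.
    private static String readFixed(RandomAccessFile file, int width) throws IOException {
        char[] chars = new char[width];
        int length = width;
        for (int i = 0; i < width; i++) {
            chars[i] = file.readChar();
            if (chars[i] == '\0' && length == width) {
                length = i;
            }
        }
        return new String(chars, 0, length);
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("stocks", ".dat");
        file.deleteOnExit();
        try (RandomAccessFile data = new RandomAccessFile(file, "rw")) {
            new StockRecord("International Business Machines", 106, "IBM").write(data);
            new StockRecord("Ford Motorcars, Inc.", 2, "F").write(data);
            data.seek(RECORD_SIZE); // record #1 starts at 1 * 112
            StockRecord ford = read(data);
            System.out.println(ford.name + " " + ford.price + " " + ford.symbol);
        }
    }
}
```

Because every record is exactly `RECORD_SIZE` bytes, record n always starts at `n * RECORD_SIZE`, which is what makes reusing a slot or rebuilding an index straightforward.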
Index File Format
No standard format – depends on type of data
Often variable-sized, but this is not a requirement
Each entry in index file begins with exact search term
Followed by position containing matching data
As a result, index entries are often packed together with no padding
Can read indexes at start of program execution
Reasonably assumes index file smaller than data file
Changes to data file written immediately, however
When program starts, do NOT read data file
Never Read Data File
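One possible index-file format, sketched in Java under stated assumptions: each entry is the exact search term written with `DataOutputStream.writeUTF`, followed by its 8-byte data-file position from `writeLong`, with entries packed back-to-back. The names `IndexFile`, `load`, and `save` are hypothetical. `load` is meant to run at program start and reads only the index file; `save` is meant to run when the program completes.

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;

// Hypothetical index file: entries of (writeUTF term, writeLong position), no padding.
class IndexFile {
    /** Reads every (search term, position) entry; the data file is never opened here. */
    static Map<String, Long> load(File indexFile) throws IOException {
        Map<String, Long> index = new HashMap<>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(indexFile)))) {
            while (true) {
                String term;
                try {
                    term = in.readUTF();        // exact search term (variable length)
                } catch (EOFException endOfFile) {
                    break;                      // clean end: no more entries
                }
                index.put(term, in.readLong()); // position of matching record in data file
            }
        }
        return index;
    }

    /** Writes all entries back-to-back; intended to run when the program completes. */
    static void save(Map<String, Long> index, File indexFile) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(indexFile)))) {
            for (Map.Entry<String, Long> entry : index.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeLong(entry.getValue());
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, Long> index = new HashMap<>();
        index.put("IBM", 0L);
        index.put("T", 112L);
        index.put("F", 224L);
        File file = File.createTempFile("stocks", ".idx");
        file.deleteOnExit();
        save(index, file);              // at program completion
        System.out.println(load(file)); // at the next program start
    }
}
```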
Indexed Files
Enables splitting search terms across computers
Alphabetical split searches faster on many servers
(Figure: search terms split alphabetically across servers: A-C, D-E, F-H, I-P, Q-R, S-T, U-X, Y-Z)
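A sketch of routing a search term to the server that owns its alphabetical range, using the ranges from the figure. The `RangePartition` class and the use of range labels as server names are hypothetical; a real system would map each range to an actual host.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical routing table: key = first letter of a server's range, value = that server.
class RangePartition {
    private static final TreeMap<Character, String> SERVERS = new TreeMap<>();
    static {
        SERVERS.put('A', "A-C");
        SERVERS.put('D', "D-E");
        SERVERS.put('F', "F-H");
        SERVERS.put('I', "I-P");
        SERVERS.put('Q', "Q-R");
        SERVERS.put('S', "S-T");
        SERVERS.put('U', "U-X");
        SERVERS.put('Y', "Y-Z");
    }

    /** Returns the server whose alphabetical range contains this search term. */
    static String serverFor(String term) {
        char first = Character.toUpperCase(term.charAt(0));
        if (first < 'A' || first > 'Z') {
            throw new IllegalArgumentException("term must start with a letter: " + term);
        }
        // floorEntry finds the greatest range start that is <= the first letter
        Map.Entry<Character, String> entry = SERVERS.floorEntry(first);
        return entry.getValue();
    }

    public static void main(String[] args) {
        for (String term : new String[] {"Citibank", "Ford", "IBM", "T"}) {
            System.out.println(term + " -> " + serverFor(term));
        }
    }
}
```

Each server then searches only its own slice of the terms, so lookups for different letters can proceed in parallel on different machines.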
Create indexes for different types of searching
(Figure: one song data file with separate indexes by song name and by song length)
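A sketch of two in-memory indexes over the same song data file, one keyed by song name and one by length, both storing record positions. The class, method names, song titles, and positions are hypothetical. The length index is a `TreeMap` so range searches are cheap, and each length maps to a list because several songs can share one length.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Two hypothetical indexes over one song data file; values are record positions.
class SongIndexes {
    private final Map<String, Long> byName = new HashMap<>();              // exact-match search
    private final TreeMap<Integer, List<Long>> byLength = new TreeMap<>(); // lengths can repeat

    /** Records one song's data-file position under both search terms. */
    void add(String name, int lengthSeconds, long position) {
        byName.put(name, position);
        byLength.computeIfAbsent(lengthSeconds, k -> new ArrayList<>()).add(position);
    }

    /** Position of the song with this exact name, or null if it is not indexed. */
    Long positionOf(String name) {
        return byName.get(name);
    }

    /** Positions of all songs whose length lies in [minSeconds, maxSeconds]. */
    List<Long> positionsWithLengthBetween(int minSeconds, int maxSeconds) {
        List<Long> result = new ArrayList<>();
        for (List<Long> positions : byLength.subMap(minSeconds, true, maxSeconds, true).values()) {
            result.addAll(positions);
        }
        return result;
    }

    public static void main(String[] args) {
        SongIndexes indexes = new SongIndexes();
        indexes.add("Song A", 200, 0L);
        indexes.add("Song B", 300, 112L);
        indexes.add("Song C", 200, 224L);
        System.out.println(indexes.positionOf("Song B"));                 // 112
        System.out.println(indexes.positionsWithLengthBetween(180, 240)); // [0, 224]
    }
}
```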
How Does This Work?
Positions make index files simple to use
Look in index structure to find position of data in file
With this position can then seek to specific record
Create instance & initialize by reading data from file
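The lookup steps above, sketched in Java with `RandomAccessFile.seek`. The record layout (54 UTF-16 chars of name plus one `int`, 112 bytes per record) and the names `IndexedLookup`, `Stock`, `find`, and `put` are assumptions made for illustration. Only the one matching record is read from the data file.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

class IndexedLookup {
    static final int NAME_CHARS = 54;                             // assumed fixed name width
    static final int RECORD_SIZE = NAME_CHARS * 2 + Integer.BYTES; // 112 bytes per record

    /** Hypothetical value class rebuilt from one record. */
    static final class Stock {
        final String name;
        final int price;
        Stock(String name, int price) { this.name = name; this.price = price; }
    }

    /** Finds the position in the index, seeks there, and builds an instance from the record. */
    static Stock find(Map<String, Long> index, RandomAccessFile data, String key) throws IOException {
        Long position = index.get(key);
        if (position == null) {
            return null;                     // key not indexed: nothing to read
        }
        data.seek(position);                 // jump straight to the record
        char[] chars = new char[NAME_CHARS];
        int length = NAME_CHARS;
        for (int i = 0; i < NAME_CHARS; i++) {
            chars[i] = data.readChar();
            if (chars[i] == '\0' && length == NAME_CHARS) {
                length = i;                  // name ends at the first padding char
            }
        }
        int price = data.readInt();
        return new Stock(new String(chars, 0, length), price);
    }

    /** Helper for the demo: writes one padded record at the given position. */
    static void put(RandomAccessFile data, long position, String name, int price) throws IOException {
        data.seek(position);
        for (int i = 0; i < NAME_CHARS; i++) {
            data.writeChar(i < name.length() ? name.charAt(i) : '\0');
        }
        data.writeInt(price);
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("stocks", ".dat");
        file.deleteOnExit();
        try (RandomAccessFile data = new RandomAccessFile(file, "rw")) {
            put(data, 0, "International Business Machines", 106);
            put(data, RECORD_SIZE, "American Telephone & Telegraph", 23);
            Map<String, Long> index = new HashMap<>();
            index.put("IBM", 0L);
            index.put("T", (long) RECORD_SIZE);
            Stock t = find(index, data, "T");
            System.out.println(t.name + " " + t.price);
        }
    }
}
```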
Starting with Indexed Files
(Figure: data file holds American Telephone & Telegraph at 0, International Business Machines at 112, Ford Motorcars, Inc. at 224; index entries F 224, IBM 0, T 112; instances IBM 106 IBM, AT & T 23 T, Ford 2 F)
How Does This Work?
Adding new records takes only a few steps
Add space for record with setLength on data file
Update index structure(s) to include new record
Records in data file updated at each change
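A sketch of adding a record, assuming the same hypothetical 112-byte layout (54-char name plus one `int`). `RandomAccessFile.setLength` grows the data file by exactly one record, the record is written at once, and only the in-memory index changes; `IndexedAdd` and `add` are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

class IndexedAdd {
    static final int NAME_CHARS = 54;                             // assumed fixed name width
    static final int RECORD_SIZE = NAME_CHARS * 2 + Integer.BYTES; // 112 bytes per record

    /** Appends one record, indexes it, and returns the new record's position. */
    static long add(Map<String, Long> index, RandomAccessFile data,
                    String key, String name, int price) throws IOException {
        long position = data.length();          // new record starts at the current end
        data.setLength(position + RECORD_SIZE); // add space for exactly one record
        data.seek(position);
        for (int i = 0; i < NAME_CHARS; i++) {  // pad or truncate name to fixed width
            data.writeChar(i < name.length() ? name.charAt(i) : '\0');
        }
        data.writeInt(price);                   // data file updated immediately
        index.put(key, position);               // only the in-memory index changes now
        return position;
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("stocks", ".dat");
        file.deleteOnExit();
        Map<String, Long> index = new HashMap<>();
        try (RandomAccessFile data = new RandomAccessFile(file, "rw")) {
            add(index, data, "IBM", "International Business Machines", 106);
            add(index, data, "C", "Citibank", -2);
            System.out.println(index.get("C") + " of " + data.length() + " bytes"); // 112 of 224 bytes
        }
    }
}
```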
Adding New Data To The Files
(Figure: Citibank record added to the data file at 336; index gains C 336 alongside F 224, IBM 0, T 112; instances IBM 106 IBM, AT & T 23 T, Ford 2 F)
Adding New Data To The Files
(Figure: same data file and index; instance Citibank -2 C now created alongside IBM 106 IBM, AT & T 23 T, Ford 2 F)
How Does This Work?
Removing records even easier
To stop a record from being used, remove its entries from the indexes
Do NOT update index file(s) until program completes
Overwrite the record in data file with impossible "magic" values
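A sketch of removal under stated assumptions: the same hypothetical 112-byte layout, with `Integer.MIN_VALUE` chosen as the "impossible" magic value (this only works if no real record can ever hold it). The entry leaves the in-memory index immediately; the index file itself would be rewritten only when the program completes. `IndexedRemove`, `remove`, and `isDeleted` are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

class IndexedRemove {
    static final int NAME_CHARS = 54;                             // assumed fixed name width
    static final int RECORD_SIZE = NAME_CHARS * 2 + Integer.BYTES; // 112 bytes per record
    // Hypothetical "impossible" marker: assumes no real record ever stores this value.
    static final int DELETED = Integer.MIN_VALUE;

    /** Drops key from the in-memory index and marks its record unused; false if absent. */
    static boolean remove(Map<String, Long> index, RandomAccessFile data, String key) throws IOException {
        Long position = index.remove(key);      // record can no longer be found
        if (position == null) {
            return false;
        }
        data.seek(position);
        for (int i = 0; i < NAME_CHARS; i++) {  // blank out the name
            data.writeChar('\0');
        }
        data.writeInt(DELETED);                 // magic value flags the slot as free
        return true;
    }

    /** True if the record at this position holds the deleted marker. */
    static boolean isDeleted(RandomAccessFile data, long position) throws IOException {
        data.seek(position + NAME_CHARS * 2);   // skip the name to reach the int field
        return data.readInt() == DELETED;
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("stocks", ".dat");
        file.deleteOnExit();
        Map<String, Long> index = new HashMap<>();
        index.put("F", 2L * RECORD_SIZE);       // Ford's record at 224
        try (RandomAccessFile data = new RandomAccessFile(file, "rw")) {
            data.setLength(4L * RECORD_SIZE);   // room for four records
            remove(index, data, "F");
            System.out.println(index.containsKey("F") + " " + isDeleted(data, 224));
        }
    }
}
```

A later add could scan for slots where `isDeleted` is true and reuse them instead of growing the file.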
Removing Data As We Go
(Figure: starting state: data file holds American Telephone & Telegraph at 0, International Business Machines at 112, Ford Motorcars, Inc. at 224, Citibank at 336; index entries C 336, F 224, IBM 0, T 112; instances IBM 106 IBM, AT & T 23 T, Ford 2 F, Citibank -2 C)
Removing Data As We Go
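(Figure: after removing Ford: index entries C 336, IBM 0, T 112 remain with F 224 gone; data file lists American Telephone & Telegraph at 0, International Business Machines at 112, Citibank at 336; instances IBM 106 IBM, AT & T 23 T, Ford 0 Ø, Citibank -2 C)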
For Next Lecture
Weekly assignment still available online
Continues to be due Wednesday at 5PM
Ask me questions if you have trouble on a problem
Read Section 9.1 in the textbook about the Map ADT
How do we look up data?
What other ADTs are out there?
How could they relate to today's lecture?