Slides - Canisius College Computer Science

CSC 213 – Large Scale Programming

LECTURE 11: INDEXED FILES

Dictionaries in Real World

 Often need large database on many machines

 Split search terms across machines

 Updating & searching work split between machines

 Database way too large for any single machine

 If you think about it, this is incredibly common

 Where?

Split Dictionaries

Splitting Keys From Values

 In real world, we often have many indices

 Simple units measure where we can find values

 Values could be searched for in multiple ways

Index & Data Files

 Split information into two (or more) files

 Data file uses fixed-size records to store data

 Index files contain search terms & data locations

 Fixed-size records usually used in data file

 Each record will use exactly that much space

 Extra space wasted if the value is smaller

 But limits data size, cannot get more space

 Makes it far easier to reuse space & rebuild index
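The fixed-size trade-off can be sketched in Java. This is only an illustration: the class name `FixedRecord` and the field widths (100 bytes of name, a 4-byte price, 8 bytes of ticker) are assumptions, chosen so that each record takes 112 bytes like the positions 0, 112, 224 used in the later example slides.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical fixed-size record layout; the field widths are assumptions.
class FixedRecord {
    static final int NAME_BYTES = 100;
    static final int TICKER_BYTES = 8;
    static final int RECORD_SIZE = NAME_BYTES + Integer.BYTES + TICKER_BYTES; // 112

    // Pads short text with zero bytes (the wasted space) and rejects
    // text that does not fit (the size limit).
    static byte[] pad(String text, int width) {
        byte[] raw = text.getBytes(StandardCharsets.US_ASCII);
        if (raw.length > width) {
            throw new IllegalArgumentException("does not fit in " + width + " bytes: " + text);
        }
        return Arrays.copyOf(raw, width);
    }

    // Every record encodes to exactly RECORD_SIZE bytes.
    static byte[] encode(String name, int price, String ticker) {
        ByteBuffer buffer = ByteBuffer.allocate(RECORD_SIZE);
        buffer.put(pad(name, NAME_BYTES)).putInt(price).put(pad(ticker, TICKER_BYTES));
        return buffer.array();
    }

    // Because all records have the same size, a record's position is simple arithmetic.
    static long positionOf(int recordNumber) {
        return (long) recordNumber * RECORD_SIZE;
    }
}
```

Since every slot has the same size, a removed record's slot can hold any later record, which is what makes reusing space straightforward.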

Index File Format

No standard format – depends on type of data

 Often variable-sized, but this is not a specific requirement

 Each entry in index file begins with exact search term

 Followed by position containing matching data

 As a result, often find indexes smushed together

Can read indexes at start of program execution

 Reasonably assumes index file smaller than data file

 Changes written immediately, however

 When program starts, do NOT read data file
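Reading an index at startup might look like the sketch below. Since there is no standard index format, the one used here is an assumption: each entry is the search term written with `writeUTF`, followed by the position as a `long`. The class name `IndexLoader` is also hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical index file format: [search term][position], repeated until end of file.
class IndexLoader {
    // Writes every entry as the exact search term followed by its data position.
    static void save(Path indexFile, Map<String, Long> index) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(indexFile)))) {
            for (Map.Entry<String, Long> entry : index.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeLong(entry.getValue());
            }
        }
    }

    // Reads the whole index into memory; the data file is never opened here.
    static Map<String, Long> load(Path indexFile) throws IOException {
        Map<String, Long> index = new TreeMap<>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(indexFile)))) {
            while (true) {
                String term;
                try {
                    term = in.readUTF();
                } catch (EOFException endOfIndex) {
                    break; // no more entries
                }
                index.put(term, in.readLong());
            }
        }
        return index;
    }
}
```

Entries are variable-sized here because terms differ in length, which is fine for a file that is read front to back once.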

Never Read Data File

Indexed Files

 Enables splitting search terms across computers

 Alphabetical split makes searches faster across many servers

[Diagram: search terms split alphabetically across servers: A-C, D-E, F-H, I-P, Q-R, S-T, U-X, Y-Z]
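One way to route a search term to the server holding its range is sketched below. The ranges are the ones from the slide; the class `KeyRouter` and the idea of identifying a server by its range label are assumptions for illustration.

```java
import java.util.TreeMap;

// Hypothetical router: maps a search term to the alphabetical range that holds it.
class KeyRouter {
    // First letter of each range -> label of the server responsible for it.
    static final TreeMap<Character, String> RANGES = new TreeMap<>();
    static {
        RANGES.put('A', "A-C");
        RANGES.put('D', "D-E");
        RANGES.put('F', "F-H");
        RANGES.put('I', "I-P");
        RANGES.put('Q', "Q-R");
        RANGES.put('S', "S-T");
        RANGES.put('U', "U-X");
        RANGES.put('Y', "Y-Z");
    }

    static String serverFor(String searchTerm) {
        char first = Character.toUpperCase(searchTerm.charAt(0));
        if (first < 'A' || first > 'Z') {
            throw new IllegalArgumentException("term must start with a letter: " + searchTerm);
        }
        // floorEntry finds the range whose starting letter is closest at or below 'first'.
        return RANGES.floorEntry(first).getValue();
    }
}
```

Each search only touches one server, so different terms can be looked up on different machines at the same time.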

Indexed Files

 Enables splitting search terms across computers

 Create indexes for different types of searching

[Diagram labels: Song name, Song, Length]
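Several indexes over the same records could be kept as separate maps that all store positions in one data file. The sketch below assumes song names are unique while lengths are not; the class `SongIndexes` and its fields are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical pair of indexes over one data file of song records.
class SongIndexes {
    // Song name -> position of the record (names assumed unique).
    final TreeMap<String, Long> byName = new TreeMap<>();
    // Length in seconds -> positions of every record with that length.
    final TreeMap<Integer, List<Long>> byLength = new TreeMap<>();

    // One record is registered in every index, always under the same position.
    void add(String songName, int lengthSeconds, long position) {
        byName.put(songName, position);
        byLength.computeIfAbsent(lengthSeconds, k -> new ArrayList<>()).add(position);
    }
}
```

The record itself is stored once; only the small (term, position) entries are repeated per index.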

How Does This Work?

 Using index files is simplified by working with positions

Look in index structure to find position of data in file

With this position can then seek to specific record

 Create instance & initialize by reading data from file
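The lookup steps above can be sketched with `java.io.RandomAccessFile`. The `Stock` class and its 112-byte layout (100-byte name, 4-byte price, 8-byte ticker) are assumptions made for this example, not a required format.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;

// Hypothetical record type; the field widths are assumptions.
class Stock {
    static final int NAME_BYTES = 100;
    static final int TICKER_BYTES = 8;

    final String name;
    final int price;
    final String ticker;

    Stock(String name, int price, String ticker) {
        this.name = name;
        this.price = price;
        this.ticker = ticker;
    }

    // Writes one fixed-size record at the file's current position.
    // Arrays.copyOf pads with zero bytes (and would truncate text that is too long).
    void writeTo(RandomAccessFile data) throws IOException {
        data.write(Arrays.copyOf(name.getBytes(StandardCharsets.US_ASCII), NAME_BYTES));
        data.writeInt(price);
        data.write(Arrays.copyOf(ticker.getBytes(StandardCharsets.US_ASCII), TICKER_BYTES));
    }

    // Creates an instance and initializes it by reading from the file's current position.
    static Stock readFrom(RandomAccessFile data) throws IOException {
        byte[] name = new byte[NAME_BYTES];
        data.readFully(name);
        int price = data.readInt();
        byte[] ticker = new byte[TICKER_BYTES];
        data.readFully(ticker);
        return new Stock(text(name), price, text(ticker));
    }

    // Drops the zero-byte padding.
    private static String text(byte[] padded) {
        return new String(padded, StandardCharsets.US_ASCII).trim();
    }

    // Step 1: find the position in the index. Step 2: seek. Step 3: read the record.
    static Stock lookup(Map<String, Long> index, RandomAccessFile data, String term)
            throws IOException {
        Long position = index.get(term);
        if (position == null) {
            return null; // term is not in the index, so the data file is never touched
        }
        data.seek(position);
        return readFrom(data);
    }
}
```

Only the one matching record is read; the rest of the data file stays on disk.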

Starting with Indexed Files

[Diagram: two indexes pointing into one data file]
Name index: American Telephone & Telegraph 0; International Business Machines 112; Ford Motorcars, Inc. 224
Symbol index: F 224; IBM 0; T 112
Data file: IBM 106 IBM | AT & T 23 T | Ford 2 F

How Does This Work?

Adding new records takes only a few steps

 Add space for record with setLength on data file

Update index structure(s) to include new record

Records in data file updated at each change
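A minimal sketch of those steps, assuming 112-byte records that are already encoded as a byte array. The class `RecordAppender` and its method are hypothetical names.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Map;

// Hypothetical helper for appending a fixed-size record and indexing it.
class RecordAppender {
    static final int RECORD_SIZE = 112; // assumed record size

    // Returns the position at which the new record was stored.
    static long add(RandomAccessFile data, Map<String, Long> index, String term, byte[] record)
            throws IOException {
        if (record.length != RECORD_SIZE) {
            throw new IllegalArgumentException("record must be exactly " + RECORD_SIZE + " bytes");
        }
        long position = data.length();
        data.setLength(position + RECORD_SIZE); // add space for the record
        data.seek(position);
        data.write(record);                     // data file is updated right away
        index.put(term, position);              // in-memory index now includes the record
        return position;
    }
}
```

With more than one index, the same position would be put into each of them under that index's own search term.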

Adding New Data To The Files

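[Diagram: both indexes gain an entry for Citibank; the data file gains space for one record]
Name index: American Telephone & Telegraph 0; Citibank 336; International Business Machines 112; Ford Motorcars, Inc. 224
Symbol index: C 336; F 224; IBM 0; T 112
Data file: IBM 106 IBM | AT & T 23 T | Ford 2 F | 0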

Adding New Data To The Files

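[Diagram: the new record is written into the added space]
Name index: American Telephone & Telegraph 0; Citibank 336; International Business Machines 112; Ford Motorcars, Inc. 224
Symbol index: C 336; F 224; IBM 0; T 112
Data file: IBM 106 IBM | AT & T 23 T | Ford 2 F | Citibank -2 C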

How Does This Work?

 Removing records even easier

 To prevent using record, remove items from indexes

 Do NOT update index file(s) until program completes

 Use impossible magic numbers for record in data file
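Removal might be sketched as follows. The marker value `REMOVED`, and the assumption that each record begins with a 4-byte field that can never legitimately hold it, are choices made for this example; the class `RecordRemover` is hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.List;
import java.util.Map;

// Hypothetical helper for removing a record without rewriting any index file.
class RecordRemover {
    // Assumed "impossible" value for the first 4-byte field of a live record.
    static final int REMOVED = Integer.MIN_VALUE;

    static void remove(RandomAccessFile data, long position, List<Map<String, Long>> indexes)
            throws IOException {
        // Drop the record from every in-memory index so no search can reach it.
        // The index files themselves are left alone until the program completes.
        for (Map<String, Long> index : indexes) {
            index.values().remove(Long.valueOf(position));
        }
        // Mark the slot in the data file so it can be recognized as free later.
        data.seek(position);
        data.writeInt(REMOVED);
    }

    static boolean isRemoved(RandomAccessFile data, long position) throws IOException {
        data.seek(position);
        return data.readInt() == REMOVED;
    }
}
```

The data file keeps its length; the marked slot is simply skipped when an index is rebuilt, or handed to the next record that needs space.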

Removing Data As We Go

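[Diagram: state of the indexes and data file before removal]
Name index: American Telephone & Telegraph 0; Citibank 336; International Business Machines 112; Ford Motorcars, Inc. 224
Symbol index: C 336; F 224; IBM 0; T 112
Data file: IBM 106 IBM | AT & T 23 T | Ford 2 F | Citibank -2 C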

Removing Data As We Go

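[Diagram: Ford is gone from both indexes; its record in the data file is marked]
Name index: American Telephone & Telegraph 0; Citibank 336; International Business Machines 112
Symbol index: C 336; IBM 0; T 112
Data file: IBM 106 IBM | AT & T 23 T | Ford 0 Ø | Citibank -2 C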

For Next Lecture

 Weekly assignment still available online

 Continues to be due Wednesday at 5PM

 Ask me questions if you have trouble on a problem

 Reading Section 9.1 in textbook about Map ADT

 How do we look up data?

 What other ADTs are out there?

 How could they relate to today's lecture?
