Assignment 2: A Simple Address Book Andy Wang Data Structures, Algorithms, and

advertisement

Assignment 2:

A Simple Address Book

Andy Wang

Data Structures, Algorithms, and

Generic Programming

Learning Objectives

Gain further experience in C++

Manage project develop using make

Apply the concept of hash functions

Mission

Write a simple address book

Use hash functions to access a table

Insertion

Lookup

Deletion

Deliverables (Due 9/26/2003)

 makefile

 hashtable.h

 hashtable.cpp

 main.cpp

Development logs

More on Development Log

Due in each class

Cumulative

9/17, turn in a log that covers 9/15 – 9/17

9/22, turn in a log that covers 9/15 – 9/22

And so on…

Requirements

Create a proj2 directory

 makefile hashtable.h

 hashtable.cpp

main.cpp

hashtable.h

Interface of the class HashTable

Required public interface: bool Insert(const string &name, const string &addr); bool Lookup(const string &name, string *addr) const; string Remove(const string &name);

Constructor

 Hashtable(unsigned int size);

Default size: 5

Insert() bool Insert(const string &name, const string &addr);

• Return value

• true on success

• false on failure

• Duplicate name

Table is full

Insert()

Name: Michael Jackson

Address:

2

3

0

1

Hash Table

Insert()

Name: Michael Jackson

Address: Never-Never Land

Hash(name)

2

3

0

1

Hash Table

Insert()

Name: Michael Jackson

Address: Never-Never Land

Hash(name)

2

3

0

1 Michael Jackson Never-Never Land

Hash Table

Insert()

Name: Mickey Mouse

Address: Disneyland

2

3

0

1 Michael Jackson Never-Never Land

Hash Table

Insert()

Name: Mickey Mouse

Address: Disneyland

Hash(name)

Oh, Boy!

2

3

0

1 Michael Jackson Never-Never Land

Hash Table

One problem with hashing— collision

Collision handling techniques:

Increase the hash table size

Chaining

Linear probing

Quadratic probing

Double hashing

Increase the Hash Table Size

If collision, double the size, rehash all entries

+ Conceptually simple

-

Unbounded growth of table size

Birthday Paradox

 In a group of 60 people, you are very likely to find two people with the same birthday.

 The probability of collisions is higher than you think!

Chaining

Requires a data structure from a later lecture

Basic idea: each table entry is associated with multiple (key, value) pairs

Sequentially look through those pairs

+ incremental growth of the table

- poor performance when an entry is associated with too many (key, value) pairs

Linear Probing

Sequentially find the next empty table entry

(Hash(key) + 1) % size

(Hash(key) + 2) % size …

-

-

+ Simple

+ Use all table entries before size increase

Clustering (nonuniform distribution of table entries)

Can degenerate into sequential searches

Quadratic Probing

Try to avoid clustering

(Hash(key) + 1 2 ) % size

(Hash(key) + 2 2 ) % size…

+ Simple

- Secondary clustering (not as severe)

Double Hashing

Use another hash function to determine the skipping distance

If Hash

1

(key) points to an occupied entry

Use (Hash

1

(key) + Hash

2

(Hash

1

(key))) % size

(Hash

1

(key) + 2*Hash

2

(Hash

1

(key)) % size…

+ Avoids secondary clustering

+ Widely used in compilers

Lookup() bool Lookup(const string &name, string *address) const;

• Return value

• true if found

• address contains the address false if not found

Lookup()

Name: Michael Jackson

2

3

0

1 Michael Jackson Never-Never Land

Hash Table

Lookup()

Name: Michael Jackson Hash(name)

2

3

0

1 Michael Jackson Never-Never Land

Hash Table

Remove() string Remove(const string &name) const;

• Return value

The address associated with the name

Remove()

Name: Michael Jackson

2

3

0

1 Michael Jackson Never-Never Land

Hash Table

Remove()

Name: Michael Jackson Hash(name)

2

3

0

1 Michael Jackson Never-Never Land

Hash Table

Remove()

Name: Michael Jackson Hash(name)

2

3

0

1

Hash Table

Your Hash Function

 Hash(name) // may contain spaces

Develop/choose your own hash function(s)

Use all characters in the name

Use the ordering information of each character

Short names should not cluster in the table

Milestones

1. Create all files

Add #include into .cpp

files

Add #ifndef into .h

files

Add files names into the makefile make clean

 make

Milestones

2. Add the class definitions to the header files

Try to compile

3. Write the skeleton code in your .cpp files

Nothing but empty functions

Try to get the interface to compile

4. Write pseudo code (comments) in each skeleton function

Milestones

5. Implement the constructor/destructor routines

Add private functions as needed

Compile and test those routines

6. Implement a dump routine to visualize the hash table

Milestones

7. Implement the accessor routines that only read your private data members

Compile and test those routines

8. Implement the accessor routines that write to your private data members

Compile and test those routines

Milestones

9. Implement the Insert routine

Implement the associated hash function(s)

Test the hash function in isolation

Check for clustering

Check for duplicate names

10. Implement collision handling

Milestones

11. Implement the Lookup routine

Compile and test the routine

12. Implement the Remove routine. Compile and test the routine

Pace Yourself

Try to accomplish 1 to 2 milestones per day

Start your project early!

Download