Linked List Variants

A (dummy) head node is used so that every node has a predecessor, which
eliminates special cases for inserting and deleting.
[Figure: first → head node (data ?) → 9 → 17 → 22 → 26 → 34]
The data part of the head node might be used to store some information
about the list, e.g., the number of values in the list.
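To make the head-node idea concrete, here is a minimal C++ sketch, assuming a list of ints. The names Node, HeadList, insertAfter, eraseAfter, and toString are hypothetical, not from the original. The head node's data field stores the count, as suggested above, and inserting or deleting at the front uses exactly the same code as anywhere else, because the head node is always available as a predecessor.

```cpp
#include <string>

// Sketch of a singly linked list with a dummy head node.
// All names here are hypothetical, chosen for illustration.
struct Node {
    int data;
    Node* next;
    Node(int d, Node* n) : data(d), next(n) {}
};

class HeadList {
public:
    Node* first;  // always points to the dummy head node, never null

    // The head node's data field holds the number of values in the list.
    HeadList() : first(new Node(0, nullptr)) {}

    ~HeadList() {
        while (first != nullptr) {
            Node* p = first;
            first = first->next;
            delete p;
        }
    }
    HeadList(const HeadList&) = delete;             // keep the sketch simple
    HeadList& operator=(const HeadList&) = delete;

    // Insert item after predptr. Every real node has a predecessor
    // (possibly the head node), so there is no "insert at front" case.
    void insertAfter(Node* predptr, int item) {
        predptr->next = new Node(item, predptr->next);
        ++first->data;
    }

    // Erase the node after predptr (assumed to exist); again no special case.
    void eraseAfter(Node* predptr) {
        Node* ptr = predptr->next;
        predptr->next = ptr->next;
        delete ptr;
        --first->data;
    }

    int size() const { return first->data; }

    // Values in order, each followed by a space.
    std::string toString() const {
        std::string s;
        for (Node* p = first->next; p != nullptr; p = p->next)
            s += std::to_string(p->data) + " ";
        return s;
    }
};
```

For example, inserting 34, 26, 22, 17, 9 (in that order) after L.first yields the list 9 17 22 26 34 with size() equal to 5.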
A (dummy) trailer node can be used so that every node has a successor
[Figure: first → head node (?) → 9 → 17 → 22 → 26 → 34 → trailer node (?)]
If the data portion of an element is large, two or more lists can share the same trailer
node.
Circularly Linked Lists
Instead of containing a NULL pointer, the last node contains a pointer to the first node.
For such lists, one can use a single pointer to the last node in the list, because then one has
direct access to it and "almost-direct" access to the first node (through last->next).
[Figure: circular list 9 → 17 → 22 → 26 → 34 → back to 9, with last pointing to the node containing 34]
Each node in a circular linked list has a predecessor (and a successor), provided that the
list is nonempty.
Therefore, insertion and deletion do not require special consideration of the first node.
Circularly Linked Lists
For example, an item can be inserted after the node pointed to by predptr as follows:
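newptr = new Node(item, 0);       // new node holding item; link set below
if (first == 0)                   // list is empty
{
    newptr->next = newptr;
    first = newptr;
}
else                              // nonempty list
{
    newptr->next = predptr->next; // new node links to predptr's successor
    predptr->next = newptr;       // predptr links to the new node
}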
Note that a one-element circularly linked list points to itself.
Circularly Linked Lists
Traversal must be modified: there is no null pointer to signal the end of the list, so a loop
that looks for one never terminates; instead, stop when the traversal returns to its starting node.
Like other operations, deletion must also be slightly modified.
Deleting the only remaining node is detected when the node being deleted points to itself.
if (first == 0)                 // list is empty
{
    // signal that the list is empty
}
else
{
    ptr = predptr->next;        // hold node for deletion
    if (ptr == predptr)         // one-node list
        first = 0;
    else                        // list with 2 or more nodes
    {
        predptr->next = ptr->next;
        if (ptr == first)       // don't leave first pointing at the deleted node
            first = ptr->next;
    }
    delete ptr;
}
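The pieces above (insertion, deletion, and a traversal that cannot rely on a null pointer) fit together as in the following sketch. It assumes the single pointer to the last node mentioned earlier, so the first node is last->next; the names CNode, CircularList, pushBack, eraseFirst, and toString are hypothetical.

```cpp
#include <string>

// Sketch: singly-linked circular list reached through a pointer to its last
// node, so the first node is last->next. Names are hypothetical.
struct CNode {
    int data;
    CNode* next;
    CNode(int d, CNode* n) : data(d), next(n) {}
};

class CircularList {
public:
    CNode* last = nullptr;   // null when the list is empty

    CircularList() = default;
    CircularList(const CircularList&) = delete;
    CircularList& operator=(const CircularList&) = delete;
    ~CircularList() {
        while (last != nullptr) eraseFirst();
    }

    // Append item after the current last node.
    void pushBack(int item) {
        CNode* newptr = new CNode(item, nullptr);
        if (last == nullptr) {
            newptr->next = newptr;        // a one-element list points to itself
        } else {
            newptr->next = last->next;    // new node links to the first node
            last->next = newptr;
        }
        last = newptr;
    }

    // Remove the first node; a node that points to itself is the only one.
    void eraseFirst() {
        if (last == nullptr) return;      // empty list: nothing to delete
        CNode* ptr = last->next;
        if (ptr == last)
            last = nullptr;
        else
            last->next = ptr->next;
        delete ptr;
    }

    // Traversal cannot wait for a null pointer; it stops on returning
    // to the first node.
    std::string toString() const {
        std::string s;
        if (last == nullptr) return s;
        CNode* p = last->next;
        do {
            s += std::to_string(p->data) + " ";
            p = p->next;
        } while (p != last->next);
        return s;
    }
};
```

The do-while form guarantees the first node is visited once even though the loop's stopping condition is "back at the first node".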
Linked List Variants
Doubly-linked lists allow bidirectional movement.
Each node has two pointers: one to its successor (null if there is none) and one to its
predecessor (null if there is none).
[Figure: doubly-linked list L with data members first, last, and mySize = 5; nodes 9, 17, 22, 26, 34, each with prev and next pointers]
Doubly-linked lists give one more flexibility (one can move in either direction),
BUT at a significant cost:
DOUBLE the overhead for links
More complex code
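As a rough illustration of the "more complex code", the following sketch shows that each insertion or deletion must update links in both directions and handle the ends separately. DNode and DList are hypothetical names; the members first, last, and mySize are borrowed from the figure, and this version has no head node.

```cpp
#include <string>

// Sketch: doubly-linked list with first/last pointers and no head node.
// DNode and DList are hypothetical names.
struct DNode {
    int data;
    DNode* prev;
    DNode* next;
    explicit DNode(int d) : data(d), prev(nullptr), next(nullptr) {}
};

class DList {
public:
    DNode* first = nullptr;
    DNode* last = nullptr;
    int mySize = 0;

    DList() = default;
    DList(const DList&) = delete;
    DList& operator=(const DList&) = delete;
    ~DList() {
        while (first != nullptr) {
            DNode* p = first;
            first = first->next;
            delete p;
        }
    }

    // Insert item after node pos; pos == nullptr means insert at the front.
    void insertAfter(DNode* pos, int item) {
        DNode* n = new DNode(item);
        DNode* succ = (pos == nullptr) ? first : pos->next;
        n->prev = pos;
        n->next = succ;
        if (pos != nullptr) pos->next = n; else first = n;   // forward link
        if (succ != nullptr) succ->prev = n; else last = n;  // backward link
        ++mySize;
    }

    // Unlink and delete node pos (assumed to belong to this list).
    void erase(DNode* pos) {
        if (pos->prev != nullptr) pos->prev->next = pos->next; else first = pos->next;
        if (pos->next != nullptr) pos->next->prev = pos->prev; else last = pos->prev;
        delete pos;
        --mySize;
    }

    std::string forward() const {
        std::string s;
        for (DNode* p = first; p != nullptr; p = p->next) s += std::to_string(p->data) + " ";
        return s;
    }

    std::string backward() const {
        std::string s;
        for (DNode* p = last; p != nullptr; p = p->prev) s += std::to_string(p->data) + " ";
        return s;
    }
};
```

Appending 9, 17, 22, 26, 34 by calling insertAfter(L.last, v) for each value gives forward() == "9 17 22 26 34 " and backward() == "34 26 22 17 9 ".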
Linked List Variants
Linked lists can be both doubly-linked and circular:
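[Figure: circular doubly-linked list L with first, last, and mySize = 5; nodes 9, 17, 22, 26, 34, where the last node links forward to the first and the first links back to the last]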
Adding a head node gives the implementation used in the STL's list class.
STL list Class Template
list is a sequential container
optimized for insertion and erasure at arbitrary points in the sequence
Implementation: circular doubly-linked list with head node.
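[Figure: STL-style list L, a circular doubly-linked list with a head node; fields first, last, mySize = 5; nodes with prev, data, and next fields holding 9, 17, 22, 26, 34]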
Caveat:
The ease of use comes at the price of the significant overhead of doubly-linked lists.
Moral:
Be aware of what you are using and its costs/benefits.
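The standard std::list can be used as follows to insert and erase at arbitrary points. This sketch wraps the calls in a hypothetical function listDemo that returns the final contents as a string. Reaching a position still takes linear time; only the insert or erase itself is constant time.

```cpp
#include <iterator>
#include <list>
#include <string>

// Sketch using std::list: once an iterator to a position is available,
// insert and erase there take constant time.
std::string listDemo() {
    std::list<int> L = {9, 17, 22, 26, 34};

    auto it = std::next(L.begin(), 2);   // iterator to 22 (walking there is O(n))
    L.insert(it, 20);                    // inserts before 22: 9 17 20 22 26 34
    it = L.erase(it);                    // removes 22; it now refers to 26
    L.push_front(5);                     // 5 9 17 20 26 34
    L.push_back(40);                     // 5 9 17 20 26 34 40

    std::string s;
    for (int v : L) s += std::to_string(v) + " ";
    s += "| it -> " + std::to_string(*it);
    return s;
}
```

listDemo() returns "5 9 17 20 26 34 40 | it -> 26"; the iterator to 26 stays valid through the later push_front and push_back.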
Hash Tables
Linear search takes O(n)
Binary search takes O(log n)
Both depend on comparisons between the item sought and the elements in the container.
Hash tables place data so that the location of an item is determined
directly as a function of the item itself.
With a good hash function, searching a hash table takes O(1) time on average;
that is, the time is constant and does not depend on the number of items
stored.
One problem with hash tables is collisions: two or more items
mapping to the same location in the hash table.
The hash function and the data set determine the number of collisions, and
together with the strategy used to handle collisions, they determine the performance of
searching for an arbitrary item.
Hash Tables
Suppose up to 25 integers in the range 0 through 999 are to be stored in a
hash table.
This hash table can be implemented as an integer array table in which
each array element is initialized with some dummy value, such as -1.
If we use each integer i in the set as an index, that is, if we store i in
table[i], then to determine whether a particular integer number has been
stored, we need only check if table[number] is equal to number.
The hash function then is h(i) = i
The hash function determines the location of an item i in the hash table.
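A minimal sketch of this direct-address table, assuming values in 0 through 999 and the dummy value -1 for empty slots; the helper names store and contains are hypothetical.

```cpp
#include <vector>

// Sketch of the h(i) = i table: 1000 slots for values 0..999,
// each initialized to the dummy value EMPTY.
const int EMPTY = -1;

int h(int i) { return i; }   // identity hash: the value is its own index

void store(std::vector<int>& table, int i) { table[h(i)] = i; }

// A lookup examines exactly one location.
bool contains(const std::vector<int>& table, int number) {
    return table[h(number)] == number;
}
```

The dummy value works only because -1 can never be a legitimate item in the range 0 through 999.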
Hash Tables
The hash function in the previous example works perfectly because the
time required to search the table for a given value is constant;
only one location needs to be examined.
This hash function then is very time efficient,
but it is surely not space-efficient.
Only 25 of the 1000 available locations are used to store items, leaving
975 unused locations; only 2.5 percent of the available space is used,
and so 97.5 percent is wasted!
Because it is possible to store 25 values in 25 locations, we might try
improving space utilization by using an array table with capacity 25.
The modified hash function h(i) = i modulo 25 addresses the space problem:
// C++ syntax
int h(int i)
{ return i % 25;}
Hash Tables
For nonnegative i, this function always produces an integer in the range 0 through 24
(in C++, % yields a negative result for a negative left operand).
Thus 52 is stored in table[2], since h(52) = 52 % 25 = 2.
129, 500, 273, and 49 are stored in locations 4, 0, 23, and 24, respectively.
INDEX   VALUE
0       500
1       -1
2       52
3       -1
4       129
5       -1
…       …
23      273
24      49
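The following sketch builds the 25-slot table above with h(i) = i % 25. It does not handle collisions yet: the hypothetical helper storeIfFree reports one (returns false) instead of overwriting the slot. It assumes nonnegative values.

```cpp
#include <vector>

// Sketch: 25-slot table, h(i) = i % 25, EMPTY marks an unused slot.
const int CAPACITY = 25;
const int EMPTY = -1;

int h(int i) { return i % CAPACITY; }   // assumes i >= 0

// Stores i at its home slot; returns false on a collision with another value.
bool storeIfFree(std::vector<int>& table, int i) {
    int loc = h(i);
    if (table[loc] != EMPTY && table[loc] != i) return false;
    table[loc] = i;
    return true;
}
```

Storing 52, 129, 500, 273, and 49 reproduces the table above, and since h(77) is 2, storeIfFree(table, 77) returns false.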
Hash Tables
But what about placing 77? h(77) = 77 % 25 = 2, and table[2] already holds 52:
a collision!
Other values may also collide at a given position; for example, all integers of the
form 25k + 2 hash to location 2.
Some strategy is needed to resolve such collisions:
1. It must be possible to place an element when its mapped location is full.
2. It must be possible to retrieve an element that is not stored directly at the location
given by the hash function.
Collision Strategy
Linear probing: a linear search of the table from the location of the collision until an
empty slot is found in which the item can be stored.
When 77 collides with 52 at location 2, 77 is put in position 3.
INDEX   VALUE
0       500
1       -1
2       52
3       77
4       129
5       102
…       …
23      273
24      49
To insert 102, we follow the probe sequence consisting of locations 2, 3, 4,
and 5 to find the first available location and thus store 102 in table[5].
Collision Strategy
If the search reaches the bottom of the table, continue at the first location.
123 collides with 273 at location 23, and the probe sequence 23, 24, 0, 1
locates the first empty slot at position 1, so 123 is placed in position 1.
INDEX   VALUE
0       500
1       123
2       52
3       77
4       129
5       102
…       …
23      273
24      49
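Linear probing with wraparound can be sketched as follows, reusing h(i) = i % 25 and -1 as the empty marker. probeInsert is a hypothetical name; the sketch assumes nonnegative values and does not check for duplicates.

```cpp
#include <vector>

// Sketch of insertion with linear probing in a 25-slot table.
const int CAPACITY = 25;
const int EMPTY = -1;

int h(int i) { return i % CAPACITY; }   // assumes i >= 0

// Returns the index where item was stored, or -1 if the table is full.
int probeInsert(std::vector<int>& table, int item) {
    int start = h(item);
    int loc = start;
    do {
        if (table[loc] == EMPTY) {
            table[loc] = item;
            return loc;
        }
        loc = (loc + 1) % CAPACITY;    // past the bottom, continue at location 0
    } while (loc != start);
    return -1;                         // every slot is occupied
}
```

After storing 52, 129, 500, 273, and 49, inserting 77, 102, and 123 returns locations 3, 5, and 1, matching the tables above.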
Cost of Linear Probing
To determine whether a specified value is in this hash table,
apply the hash function to compute the location for this value:
1. If the location is empty, the value is not in the table.
2. If the location contains the specified value, the search is successful.
3. If the location contains a different value, a collision must be ruled out:
begin a “circular” linear search at this location and continue until either the
item is found, or an empty slot or the starting location is reached (item not in table).
Cases 1 and 2 take O(1) time; case 3 takes O(n) time in the worst case.
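The three search cases translate directly into a loop. This is a sketch under the same assumptions as before (25 slots, h(i) = i % 25, -1 for an empty slot), with the hypothetical name probeSearch.

```cpp
#include <vector>

const int CAPACITY = 25;
const int EMPTY = -1;

int h(int i) { return i % CAPACITY; }   // assumes i >= 0

// Search with linear probing, following the three cases in the text.
bool probeSearch(const std::vector<int>& table, int item) {
    int start = h(item);
    int loc = start;
    do {
        if (table[loc] == EMPTY) return false;   // case 1: empty slot, not present
        if (table[loc] == item) return true;     // case 2: found
        loc = (loc + 1) % CAPACITY;              // case 3: keep probing circularly
    } while (loc != start);
    return false;                                // back at the start: not present
}
```

On the table above, searching for 27 probes locations 2, 3, 4, 5 and stops at the empty slot 6, while 123 is found after probing 23, 24, 0, 1.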
Another Collision Strategy
Chaining: use a hash table that is an array (or vector) of linked lists to store
the items.
For example, to store names "alphabetically", use an array table of 26 linked
lists, initially empty, and the simple hash function h(name) = name[0] - ‘A’;
that is, h(name) is
0 if name[0] is ‘A’,
1 if name[0] is ‘B’, . . . ,
25 if name[0] is ‘Z’
Searching such a hash table is straightforward:
apply the hash function to the item sought and then use one of the
search algorithms for linked lists.
When a collision occurs, we simply insert the new item into the appropriate
linked list.
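A chained table for names can be sketched with the standard containers: a std::vector of 26 std::list buckets and h(name) = name[0] - 'A'. The class name ChainedTable and its members are hypothetical, and the sketch assumes every name begins with an uppercase letter A through Z.

```cpp
#include <algorithm>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Sketch of chaining: one linked list (bucket) per initial letter.
class ChainedTable {
public:
    ChainedTable() : table(26) {}

    static int h(const std::string& name) { return name[0] - 'A'; }

    // On a collision, the name is simply added to that bucket's list.
    void insert(const std::string& name) { table[h(name)].push_back(name); }

    // Hash to a bucket, then search that linked list linearly.
    bool contains(const std::string& name) const {
        const std::list<std::string>& bucket = table[h(name)];
        return std::find(bucket.begin(), bucket.end(), name) != bucket.end();
    }

    std::size_t bucketSize(char letter) const { return table[letter - 'A'].size(); }

private:
    std::vector<std::list<std::string>> table;
};
```

After inserting "Smith", "Sanchez", "Adams", and "Zhang", the 'S' bucket holds two names and the 'Z' bucket one.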
Performance of Hash Functions
The behavior of the hash function affects the frequency of collisions.
For example, the preceding hash function is not optimal because some
letters occur more frequently than others as the first letter of a name.
Consequently, the linked list of names beginning with ‘S’ tends to be much longer
than the one containing names that begin with ‘Z’.
This clustering effect results in longer search times for S-names than for Z-names.
A better hash function is needed to distribute names more uniformly in the hash
table.
The hash function must not, however, be so complex that the time required
to evaluate it makes the search time unacceptable.
Random Hashing
An ideal hash function is simple to evaluate and scatters items throughout the hash table,
thus minimizing the probability of collisions.
Random hashing uses a simple random-number generation technique
to scatter the items “randomly” throughout the hash table.
The item to store is first transformed into a large random integer and then
reduced modulo the table’s capacity to determine its location:
randomInt = ((MULTIPLIER * item) + ADDEND) % MODULUS;
location = randomInt % CAPACITY;
Random hashing can be used with any object that is first encoded as an integer.
For example, a name might be encoded as the sum of the
ASCII codes of some or all of its letters.
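Random hashing might look like this in C++. The text does not give values for MULTIPLIER, ADDEND, and MODULUS; the constants below are illustrative assumptions, as are the helper names encode and randomHash. unsigned long long arithmetic keeps MULTIPLIER * item from overflowing for moderate inputs.

```cpp
#include <string>

// Illustrative constants (assumed; not specified in the text).
const unsigned long long MULTIPLIER = 25173;
const unsigned long long ADDEND = 13849;
const unsigned long long MODULUS = 65536;

// Encode a name as the sum of the ASCII codes of all its characters.
unsigned long long encode(const std::string& name) {
    unsigned long long sum = 0;
    for (unsigned char c : name) sum += c;
    return sum;
}

// Transform the item into a "random" integer, then reduce it modulo
// the table's capacity to get a location in 0 .. capacity - 1.
unsigned long long randomHash(unsigned long long item, unsigned long long capacity) {
    unsigned long long randomInt = ((MULTIPLIER * item) + ADDEND) % MODULUS;
    return randomInt % capacity;
}
```

For instance, encode("AB") is 65 + 66 = 131, and randomHash(131, 25) computes ((25173 * 131) + 13849) % 65536 = 34712, then 34712 % 25 = 12. Because a plain sum of codes ignores letter order, anagrams such as "AB" and "BA" still collide.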
Iterators
An iterator is an abstraction of a pointer, hiding some of its details and eliminating
some of its hazards.
For example, a list<T>::iterator is a class within the list class
template that contains a data member node, which is a pointer to a
list_node (the struct used to describe nodes in the list class template):
template<typename T>
class list
{
public:
class iterator // ... simplified here ...
{
protected:
list_node * node; // ... and here ...
. . .
};
. . .
};
Iterators
The iterator class overloads operator*() to return the value of the
data member in the list_node pointed to by the iterator’s node member:
return node->data;
It also overloads operator++() to “increment” the iterator to the next
node in the list:
// prefix version: ++it
iterator & operator++()
{
    node = node->next;
    return *this;           // the iterator after moving
}
// postfix version: it++
iterator operator++(int)
{
    iterator tmp = *this;   // save the current position
    node = node->next;
    return tmp;             // the iterator before moving
}
Likewise, operator--() is overloaded to “decrement” the iterator to the
previous node in the list.
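Putting these operators together, here is a small, self-contained sketch of a list class template with a nested iterator. It is not the STL's actual implementation; MiniList and its members are hypothetical names. Like the STL list described earlier, it uses a circular doubly-linked list with a head node, so end() is simply an iterator to the head node.

```cpp
#include <string>

// Sketch: list class template with a nested iterator wrapping a node pointer.
template <typename T>
class MiniList {
    struct list_node {
        T data;
        list_node* prev;
        list_node* next;
    };
    list_node* head;   // dummy head node; its data is unused

public:
    class iterator {
    protected:
        list_node* node;
    public:
        explicit iterator(list_node* n) : node(n) {}
        T& operator*() const { return node->data; }
        iterator& operator++() { node = node->next; return *this; }                       // prefix
        iterator operator++(int) { iterator tmp = *this; node = node->next; return tmp; } // postfix
        iterator& operator--() { node = node->prev; return *this; }
        iterator operator--(int) { iterator tmp = *this; node = node->prev; return tmp; }
        bool operator==(const iterator& other) const { return node == other.node; }
        bool operator!=(const iterator& other) const { return node != other.node; }
    };

    MiniList() : head(new list_node{T(), nullptr, nullptr}) { head->prev = head->next = head; }
    ~MiniList() {
        list_node* p = head->next;
        while (p != head) { list_node* nx = p->next; delete p; p = nx; }
        delete head;
    }
    MiniList(const MiniList&) = delete;
    MiniList& operator=(const MiniList&) = delete;

    // Link a new node between the current last node and the head node.
    void push_back(const T& value) {
        list_node* n = new list_node{value, head->prev, head};
        head->prev->next = n;
        head->prev = n;
    }

    iterator begin() { return iterator(head->next); }
    iterator end() { return iterator(head); }   // one past the last value
};
```

Because the list is circular with a head node, decrementing end() lands on the last value with no special case.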
Iterators
Two important iterator-valued functions are
begin(), which returns an iterator to the first value in the list, and
end(), which returns an iterator that points just beyond the final value in
the list.
Iterators normally go from the first node to the last.
One can use a reverse_iterator to go from last to first.
list<int>::reverse_iterator it;
for (it = myList.rbegin(); it != myList.rend(); ++it)
    cout << *it << " ";
cout << endl;
cout << "that was my list contents displayed" << endl;
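A self-contained version of the loop above, plus the matching forward traversal, might look like this. The function names forwardString and reverseString are hypothetical, and the values are collected into strings rather than printed. On a const list, rbegin() and rend() return const_reverse_iterator.

```cpp
#include <list>
#include <string>

// Forward traversal with begin()/end().
std::string forwardString(const std::list<int>& myList) {
    std::string s;
    for (std::list<int>::const_iterator it = myList.begin(); it != myList.end(); ++it)
        s += std::to_string(*it) + " ";
    return s;
}

// Backward traversal with rbegin()/rend().
std::string reverseString(const std::list<int>& myList) {
    std::string s;
    for (std::list<int>::const_reverse_iterator it = myList.rbegin(); it != myList.rend(); ++it)
        s += std::to_string(*it) + " ";
    return s;
}
```

For a list holding 9, 17, 22, 26, 34, forwardString returns "9 17 22 26 34 " and reverseString returns "34 26 22 17 9 ".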