Linked List Variants

A (dummy) head node is used so that every node has a predecessor. This eliminates special cases for inserting and deleting.

[Figure: first points to a head node (data ?), followed by nodes 9, 17, 22, 26, 34]

The data part of the head node might be used to store some information about the list, e.g., the number of values in the list.

A (dummy) trailer node can be used so that every node has a successor.

[Figure: first points to a head node (data ?), followed by nodes 9, 17, 22, 26, 34 and a trailer node (data ?)]

If the data portion of an element is large, two or more lists can share the same trailer node.

Circularly Linked Lists

Instead of containing a NULL pointer, the last node contains a pointer to the first node.

For such lists, one can use a single pointer to the last node in the list, because then one has direct access to it and "almost-direct" access to the first node.

[Figure: last points to the final node of the circular list 9, 17, 22, 26, 34, whose next pointer leads back to 9]

Each node in a circular linked list has a predecessor (and a successor), provided that the list is nonempty. As a result, the link updates for insertion and deletion need no special case for the first node.

For example, item can be inserted after the node pointed to by predptr as follows:

    newptr = new Node(item, 0);
    if (first == 0)                    // list is empty
    {
        newptr->next = newptr;
        first = newptr;
    }
    else                               // nonempty list
    {
        newptr->next = predptr->next;
        predptr->next = newptr;
    }

Note that a one-element circularly linked list points to itself.

Traversal must be modified to avoid an infinite loop: the end of the list is no longer signalled by a null pointer.

Like other methods, deletion must also be slightly modified. Deleting the last remaining node is signalled when the node to be deleted points to itself.

    if (first == 0)                    // list is empty
    {
        // signal that the list is empty
    }
    else
    {
        ptr = predptr->next;           // hold node for deletion
        if (ptr == predptr)            // one-node list
            first = 0;
        else                           // list with 2 or more nodes
        {
            predptr->next = ptr->next;
            if (ptr == first)          // keep first valid if its node is removed
                first = ptr->next;
        }
        delete ptr;
    }

Linked List Variants

Doubly-linked list: bidirectional movement. Each node has two pointers: one to its successor (null if there is none) and one to its predecessor (null if there is none).
[Figure: doubly-linked list L with data members first, last, and mySize = 5; nodes 9, 17, 22, 26, 34, each with prev and next pointers]

Doubly-linked lists give one more flexibility (one can move in either direction), but at significant cost: double the overhead for links, and more complex code.

Linked lists can be both doubly-linked and circular:

[Figure: circular doubly-linked list L with first, last, and mySize = 5; nodes 9, 17, 22, 26, 34]

Add a head node to obtain the implementation used in STL's list class.

STL list Class Template

list is a sequential container optimized for insertion and erasure at arbitrary points in the sequence.

Implementation: circular doubly-linked list with head node.

[Figure: circular doubly-linked list L with head node; first, last, and mySize = 5; nodes 9, 17, 22, 26, 34, each with prev, data, and next fields]

Caveat: ease of use is traded for the significant overhead of doubly-linked lists.

Moral: be aware of what you are using and its costs/benefits.

Hash Tables

Linear search takes O(n) time and binary search takes O(log n) time. Both depend on comparisons of the item sought with elements in the container.

Hash tables place data so that the location of an item is determined directly as a function of the item itself.

With a good hash function, searching a hash table takes O(1) time on average; that is, it is constant and does not depend on the number of items stored.

One problem with hash tables is collisions: more than one item may map to the same location in the hash table. The hash function and the data set determine the number of collisions, and the strategy used to handle collisions affects the performance of searching for an arbitrary item.

Suppose up to 25 integers in the range 0 through 999 are to be stored in a hash table. This hash table can be implemented as an integer array table with 1000 elements, each initialized with some dummy value, such as -1.

If we use each integer i in the set as an index, that is, if we store i in table[i], then to determine whether a particular integer number has been stored, we need only check whether table[number] is equal to number.

The hash function then is

    h(i) = i

The hash function determines the location of an item i in the hash table.
The hash function in the previous example works perfectly in terms of time: the time required to search the table for a given value is constant, because only one location needs to be examined.

This hash function is therefore very time-efficient, but it is surely not space-efficient. Only 25 of the 1000 available locations are used to store items, leaving 975 unused locations; only 2.5 percent of the available space is used, and so 97.5 percent is wasted!

Because it is possible to store 25 values in 25 locations, we might try improving space utilization by using an array table with capacity 25. The modified hash function

    h(i) = i modulo 25

addresses the space problem. In C++ syntax:

    int h(int i) { return i % 25; }

For nonnegative i, this function always produces an integer in the range 0 through 24. Thus 52 is stored in table[2], since h(52) = 52 % 25 = 2. Similarly, 129, 500, 273, and 49 are stored in locations 4, 0, 23, and 24, respectively.

    INDEX   VALUE
      0      500
      1       -1
      2       52
      3       -1
      4      129
      5       -1
      …
     23      273
     24       49

But what about placing 77? h(77) = 77 % 25 = 2, which is a collision! Other values may also collide at a given position; for example, all integers of the form 25k + 2 hash to location 2.

Some strategy is needed to resolve such collisions:
1. We need to be able to place an element when its mapped location is full.
2. We need to be able to retrieve an element when it is not placed directly according to the hash function.

Collision Strategy

Linear probing: perform a linear search of the table from the location of the collision until an empty slot is found in which the item can be stored.

When 77 collides with 52 at location 2, put 77 in position 3.

    INDEX   VALUE
      0      500
      1       -1
      2       52
      3       77
      4      129
      5      102
      …
     23      273
     24       49

To insert 102, we follow the probe sequence consisting of locations 2, 3, 4, and 5 to find the first available location and thus store 102 in table[5].

If the search reaches the bottom of the table, continue at the first location.
123 collides with 273 at location 23, and the probe sequence 23, 24, 0, 1 locates the first empty slot at position 1, so 123 is placed in position 1.

    INDEX   VALUE
      0      500
      1      123
      2       52
      3       77
      4      129
      5      102
      …
     23      273
     24       49

Cost of Linear Probing

To determine if a specified value is in this hash table, apply the hash function to compute the location for this value:
1. If the location is empty, the value is not in the table.
2. If the location contains the specified value, the search is successful.
3. If the location contains a different value, a collision must be ruled out: begin a "circular" linear search at this location and continue until the item is found, or until an empty location or the starting location is reached (item not in table).

Cases 1 and 2 take O(1) time; case 3 takes O(n) time in the worst case.

Another Collision Strategy

Chaining: use a hash table that is an array (or vector) of linked lists to store the items.

For example, to store names "alphabetically", use an array table of 26 linked lists, initially empty, and the simple hash function

    h(name) = name[0] - 'A';

that is, h(name) is 0 if name[0] is 'A', 1 if name[0] is 'B', . . . , 25 if name[0] is 'Z'.

Searching such a hash table is straightforward: apply the hash function to the item sought and then use one of the search algorithms for linked lists.

When a collision occurs, we simply insert the new item into the appropriate linked list.

Performance of the Hash Function

The behavior of the hash function affects the frequency of collisions.

For example, the preceding hash function is not optimal because some letters occur more frequently than others. The linked list of names beginning with 'S' tends to be much longer than the one containing names that begin with 'Z'. This clustering effect results in longer search times for S-names than for Z-names.

We need a better hash function to distribute names more uniformly in the hash table. The hash function must not, however, be so complex that the time required to evaluate it makes the search time unacceptable.
Random Hashing

An ideal hash function is simple to evaluate and scatters items throughout the hash table, minimizing the probability of collisions.

Random hashing uses a simple random number generation technique to scatter the items "randomly" throughout the hash table. The item to store is first transformed into a large random integer and then reduced modulo the table's capacity to determine its location:

    randomInt = ((MULTIPLIER * item) + ADDEND) % MODULUS;
    location = randomInt % CAPACITY;

Random hashing can be used with any object that is first encoded as an integer. For example, a name might be encoded as the sum of the ASCII codes of some or all of its letters.

Iterators

An iterator is an abstraction of a pointer, hiding some of its details and eliminating some of its hazards.

For example, a list<T>::iterator is a class within the list class template that contains a data member node, which is a pointer to a list_node (the struct used to describe nodes in the list class template):

    template<typename T>
    class list
    {
    public:
        class iterator          // ... simplified here ...
        {
        protected:
            list_node * node;   // ... and here ...
            . . .
        };
        . . .
    };

The iterator class overloads operator*() to return the value of the data member in the list_node pointed to by the iterator's node member:

    return node->data;

It also overloads operator++() to "increment" the iterator to the next node in the list:

    // prefix version
    node = node->next;
    return *this;

    // postfix version
    iterator tmp = *this;       // copy of the iterator before advancing
    node = node->next;
    return tmp;

and overloads operator--() similarly to "decrement" the iterator to the previous node in the list.

Two important iterator-valued functions are begin(), which returns an iterator to the first value in the list, and end(), which returns an iterator that points beyond the final value in the list.

Iterators normally go from the first node to the last. One can use a reverse_iterator to go from last to first:
    list<int>::reverse_iterator it;
    for (it = myList.rbegin(); it != myList.rend(); ++it)
        cout << *it << " ";
    cout << endl;
    cout << "that was my list contents displayed" << endl;