CSE 250: Data Structures
Week 12: March 31 – April 4, 2008

Announcements
  Project 3 due date extended to April 7th
  Homework 4 due April 11th
  Homework 5 due April 18th
  Project 4 due April 25th
  Final Exam: May 1st, 11:45 – 2:45, Cooke 121

Week’s Topics

Hashing
  A hash table stores (key, value) pairs.
  From the key, retrieve the value by using a hash function to tell you where it is stored in your table.
  Hash functions should be simple to compute, send distinct keys to distinct locations as often as possible, and, ideally, distribute keys evenly among the slots of the table.

Collisions
  It is possible for two keys to hash to the same location in the table. This is called a collision.
  What do we do in this situation? Two main strategies:
    Separate Chaining
    Open Addressing

Separate Chaining
  Keep a list (think linked list) of all elements that hash to the same location.
  What are the potential shortcomings of this approach?

Open Addressing
  Keep looking (probing) until you find an open slot in which to put the element you are trying to store.
  The probing must be systematic so that you can find the inserted element at a later time using the same process.

Linear Probing
  If the slot you hashed to is already taken, move to the next one, and so on, wrapping around to the beginning of the table if necessary, until you find an available slot for insertion.
  Problem: primary clustering. What is it? Why does it happen?

Quadratic Probing
  Probe at quadratically growing distances from the original slot: first look one away, then four away, then nine away…
  Problems involve the load factor and the size of the table. When can quadratic probing go wrong?

Double Hashing
  Use a second hash function to control the probing distance, so that the step between probes depends on the key and is not necessarily linear or quadratic.

Deletion
  How do you delete from a hash table that uses probing?
  If you delete the element outright, you could be cutting off the search for elements that come after it in the same probe sequence. What is the strategy employed when deleting?
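The probing and deletion questions above are easier to reason about with something concrete in hand. The following is a minimal sketch, not the course's reference implementation: it assumes a fixed-size table, Python's built-in hash() as the hash function, and lazy deletion with a marker (often called a tombstone) as one common deletion strategy. The names LinearProbingTable, put, get, and remove are hypothetical and chosen only for illustration.

```python
# Sentinels (assumed for this sketch): a slot that was never used,
# and a slot whose element was deleted (a "tombstone").
_EMPTY = object()
_DELETED = object()


class LinearProbingTable:
    """Fixed-capacity open-addressing table using linear probing."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        self.slots = [_EMPTY] * capacity

    def _probe(self, key):
        # Start at the hashed slot, then step by one, wrapping around
        # to the beginning of the table; visit every slot at most once.
        start = hash(key) % self.capacity
        for step in range(self.capacity):
            yield (start + step) % self.capacity

    def put(self, key, value):
        first_free = None  # first reusable slot seen along the probe path
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is _EMPTY:
                if first_free is None:
                    first_free = i
                break  # the key cannot be stored beyond a never-used slot
            if slot is _DELETED:
                if first_free is None:
                    first_free = i
            elif slot[0] == key:
                self.slots[i] = (key, value)  # overwrite existing key
                return
        if first_free is None:
            raise RuntimeError("table is full")
        self.slots[first_free] = (key, value)

    def get(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is _EMPTY:
                break  # a never-used slot ends the search
            if slot is not _DELETED and slot[0] == key:
                return slot[1]
        raise KeyError(key)

    def remove(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is _EMPTY:
                break
            if slot is not _DELETED and slot[0] == key:
                # Leave a tombstone instead of emptying the slot, so that
                # searches for later elements keep probing past it.
                self.slots[i] = _DELETED
                return
        raise KeyError(key)
```

With a capacity of 8, the integer keys 1, 9, and 17 all hash to slot 1 and end up in consecutive slots. Removing 9 leaves a tombstone in the middle of that run, and a later search for 17 still succeeds because it probes past the tombstone rather than stopping there.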
Rehashing
  After the table gets too full (or there have been a significant number of deletes), we should resize the table.
  After resizing, the hash function must map keys onto the new table size, so every old element has to be re-hashed into its new place in the new, larger table.
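A small sketch can make the resizing step concrete. The version below is only an illustration under stated assumptions: it uses separate chaining (Python lists standing in for linked lists), Python's built-in hash() reduced modulo the table size, a load-factor threshold of 0.75, and doubling of the table on resize. None of these choices come from the slides, and the names ChainedTable, put, get, and _rehash are hypothetical.

```python
class ChainedTable:
    """Separate-chaining table that rehashes when it gets too full."""

    def __init__(self, capacity=4, max_load=0.75):
        self.buckets = [[] for _ in range(capacity)]
        self.size = 0
        self.max_load = max_load  # assumed threshold for "too full"

    def _index(self, key, capacity):
        # The slot depends on the table size, so it changes after a resize.
        return hash(key) % capacity

    def put(self, key, value):
        bucket = self.buckets[self._index(key, len(self.buckets))]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite existing key
                return
        bucket.append((key, value))
        self.size += 1
        # Load factor = stored elements / number of buckets.
        if self.size / len(self.buckets) > self.max_load:
            self._rehash(2 * len(self.buckets))

    def get(self, key):
        bucket = self.buckets[self._index(key, len(self.buckets))]
        for k, v in bucket:
            if k == key:
                return v
        raise KeyError(key)

    def _rehash(self, new_capacity):
        # Build a larger table and re-insert every old element at the
        # position computed for the new size.
        old_buckets = self.buckets
        self.buckets = [[] for _ in range(new_capacity)]
        for bucket in old_buckets:
            for key, value in bucket:
                self.buckets[self._index(key, new_capacity)].append((key, value))
```

Note that an element's slot usually changes during the resize. With integer keys, key 5 sits in bucket 1 of a 4-bucket table but moves to bucket 5 once the table has 8 or more buckets, which is why the old elements cannot simply be copied across position by position.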