Advanced Data Structures

Introduction
Brought to you by Max (ICQ:31252512 TEL:61337706)
February 5, 2005
Outline
• Review of some data structures
    - Array
    - Linked List
    - Sorted Array
• New stuff
    - 3 of the most important data structures in OI (and your own programming)
    - Binary Search Tree
    - Heap (Priority Queue)
    - Hash Table
Review
• How to measure the merits of a data structure?
• Time complexity of common operations
    - Function Find(T : DataType) : Element
    - Function Find_Min() : Element
    - Procedure Add(T : DataType)
    - Procedure Remove(E : Element)
    - Procedure Remove_Min()
Review - Array
• Here Element is simply the integer index of the array cell
• Find(T)
    - Must scan the whole array, O(N)
• Find_Min()
    - Also need to scan the whole array, O(N)
• Add(T)
    - Simply add it to the end of the array, O(1)
• Remove(E)
    - Deleting an element creates a hole
    - Copy the last element to fill the hole, O(1)
• Remove_Min()
    - Need to Find_Min() then Remove(), O(N)
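The array operations above can be sketched in Python as follows. This is only an illustration: a Python list stands in for the array, and the helper names remove_at and remove_min are made up here, not taken from the slides.

```python
def remove_at(cells, index):
    """Remove(E): fill the hole with the last element, O(1)."""
    last = cells.pop()          # take the last element off the end
    if index < len(cells):      # the hole was not the last cell itself
        cells[index] = last     # fill the hole; the order is not preserved


def remove_min(cells):
    """Remove_Min(): Find_Min() by a full scan, then Remove(), O(N)."""
    index = min(range(len(cells)), key=cells.__getitem__)
    value = cells[index]
    remove_at(cells, index)
    return value


data = [7, 3, 9, 1, 5]
data.append(4)                  # Add(T): append at the end, O(1)
smallest = remove_min(data)     # scans all six cells to find the 1
```

Note that filling the hole with the last element changes the order of the cells, which is acceptable only because the array is unsorted.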
Review - Linked List
• Element is a pointer to the object
• Find(T)
    - Scan the whole list, O(N)
• Find_Min()
    - Scan the whole list, O(N)
• Add(T)
    - Just add it to a convenient position (e.g. head), O(1)
• Remove(E)
    - With a suitable implementation (e.g. a doubly linked list), O(1)
• Remove_Min()
    - Need to Find_Min() then Remove(), O(N)
Review - Sorted Array
• Like array, Element is the integer index of the cell
• Find(T)
    - We can use binary search, O(logN)
• Find_Min()
    - The first element must be the minimum, O(1)
• Add(T)
    - First we need to find the correct place, O(logN)
    - Then we need to shift the rest of the array by 1 cell, O(N)
• Remove(E)
    - Deleting an element creates a hole
    - Need to shift the rest of the array by 1 cell, O(N)
• Remove_Min()
    - Can be O(1) or O(N) depending on choice of implementation (O(1) if we only advance a start index instead of shifting)
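A short Python sketch of Find and Add on a sorted array, assuming the standard bisect module does the binary search. The function names find and add are illustrative only.

```python
import bisect


def find(cells, value):
    """Binary search: return the index of value, or -1 if absent. O(logN)."""
    i = bisect.bisect_left(cells, value)
    return i if i < len(cells) and cells[i] == value else -1


def add(cells, value):
    """Locate the slot in O(logN), then shift the tail right by one cell: O(N)."""
    cells.insert(bisect.bisect_left(cells, value), value)


data = [2, 5, 8, 13]
add(data, 6)                    # data is now [2, 5, 6, 8, 13]
position = find(data, 8)        # index 3
minimum = data[0]               # Find_Min(): the first cell, O(1)
```

The binary search itself is cheap; it is the shifting done inside insert that makes Add linear.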
Review - Summary
              Array    Linked List    Sorted Array
Find          O(N)     O(N)           O(logN)
Find_Min      O(N)     O(N)           O(1)
Add           O(1)     O(1)           O(N)
Remove        O(1)     O(1)           O(N)
Remove_Min    O(N)     O(N)           O(1) or O(N)
• If we are going to perform a lot of these operations (e.g. N=100000), none of these is fast enough, because every one of them has at least one O(N) operation!
Binary Search Tree
What is a Binary Search Tree?
• Use a binary tree to store the data
• Maintain this property
    - Left Subtree < Node < Right Subtree
Binary Search Tree - Implementation
• Definition of a Node:
    Node = Record
        Left, Right : ^Node;
        Value : Integer;
    End;
• To search for a value (pseudocode)
    Node Find(Node N, Value V) :
        If (N.Value = V)
            Return N;
        Else If (V < N.Value) and (N.Left != NULL)
            Return Find(N.Left, V);
        Else If (V > N.Value) and (N.Right != NULL)
            Return Find(N.Right, V);
        Else
            Return NULL; // not found
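For readers who want something runnable, here is a minimal Python sketch of the same search, together with an Add that follows the same path. It assumes integer values and silently ignores duplicates; the class and function names are choices made for this sketch, not part of the original slides.

```python
class Node:
    """Mirrors the record above: a value plus left and right links."""
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def find(node, value):
    """Walk down from node; return the matching Node, or None if absent."""
    while node is not None and node.value != value:
        node = node.left if value < node.value else node.right
    return node


def add(root, value):
    """Insert value (duplicates are ignored, an assumption) and return the root."""
    if root is None:
        return Node(value)
    node = root
    while node.value != value:
        if value < node.value:
            if node.left is None:
                node.left = Node(value)   # free slot found on the left
            node = node.left
        else:
            if node.right is None:
                node.right = Node(value)  # free slot found on the right
            node = node.right
    return root


root = None
for v in [8, 3, 10, 1, 6]:
    root = add(root, v)
```

The loop form avoids recursion, which matters when the tree becomes very deep.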
Binary Search Tree - Remove
• Case I : Removing a leaf node
    - Easy
• Case II : Removing a node with a single child
    - Replace the removed node with its child
• Case III : Removing a node with 2 children
    - Replace the removed node with the minimum element in the right subtree (or maximum element in the left subtree)
    - This may create a hole again
    - Apply Case I or II
• Sometimes you can avoid this by using “Lazy Deletion”
    - Mark a node as removed instead of actually removing it
    - Less coding, performance hit not big if you are not doing this frequently (may even save time)
Binary Search Tree - Summary
• Add() is similar to Find()
• Find_Min()
    - Just walk to the left, easy
• Remove_Min()
    - Equivalent to Find_Min() then Remove()
• Summary
    - Find() : O(logN)
    - Find_Min() : O(logN)
    - Remove_Min() : O(logN)
    - Add() : O(logN)
    - Remove() : O(logN)
    - The BST is “supposed” to behave like that
Binary Search Tree - Problems
• In reality…
    - All these operations are O(logN) only if the tree is balanced
    - Inserting a sorted sequence makes the tree degenerate into a linked list
• The real upper bounds
    - Find() : O(N)
    - Find_Min() : O(N)
    - Remove_Min() : O(N)
    - Add() : O(N)
    - Remove() : O(N)
• Solution
    - AVL Tree, Red Black Tree
    - Use “rotations” to maintain balance
    - Both are difficult to implement, rarely used
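To see the degeneration concretely, the following Python sketch measures the height of a plain (unbalanced) BST after inserting the same values in sorted order and in shuffled order. The function name height_after_inserts, the size n = 1000 and the fixed random seed are arbitrary choices for this illustration.

```python
import random


class Node:
    def __init__(self, value):
        self.value, self.left, self.right = value, None, None


def height_after_inserts(values):
    """Insert distinct values into a plain BST; return its height in nodes."""
    root, height = None, 0
    for value in values:
        new, depth = Node(value), 1
        if root is None:
            root = new
        else:
            node = root
            while True:
                depth += 1                 # the new node sits one level below node
                if value < node.value:
                    if node.left is None:
                        node.left = new
                        break
                    node = node.left
                else:
                    if node.right is None:
                        node.right = new
                        break
                    node = node.right
        height = max(height, depth)
    return height


n = 1000
sorted_height = height_after_inserts(range(n))      # every node hangs to the right: height n
shuffled = list(range(n))
random.Random(1).shuffle(shuffled)
shuffled_height = height_after_inserts(shuffled)    # typically a small multiple of log2(n)
```

With sorted input the height equals N, so every operation walks a chain; with shuffled input the tree stays shallow, although nothing guarantees it without rebalancing.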
Heap (Priority Queue)
What is a Heap?
• A (usually) complete binary tree used to implement a Priority Queue
    - Enqueue = Add
    - Dequeue = Find_Min and Remove_Min
• Heap Property
    - Every node’s value is less than or equal to those of its descendants, so the minimum is always at the root
Heap - Implementation
• Usually we use an array to simulate a heap
• Assume nodes are indexed 1, 2, 3, ...
    - Parent = floor(Node / 2)
    - Left Child = Node*2
    - Right Child = Node*2 + 1
Heap - Add
• Append the new element at the end
• Shift it up until the heap property is restored
• Why does this always work?
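A minimal Python sketch of Add, assuming a min-heap (the smallest value at the root, as Find_Min requires) stored in a list whose cell 0 is an unused placeholder so that nodes are indexed 1, 2, 3, ... The name heap_add is illustrative.

```python
def heap_add(heap, value):
    """Append value, then move it up while it is smaller than its parent."""
    heap.append(value)                      # append the new element at the end
    node = len(heap) - 1
    while node > 1 and heap[node] < heap[node // 2]:
        parent = node // 2                  # Parent = floor(Node / 2)
        heap[node], heap[parent] = heap[parent], heap[node]
        node = parent


heap = [None]                               # cell 0 is never used (an assumption of this sketch)
for value in [5, 3, 8, 1, 4]:
    heap_add(heap, value)
# heap is now [None, 1, 3, 8, 5, 4]
```

Only the path from the new cell to the root is touched, which is why the cost is bounded by the height of the tree, O(logN).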
Heap - Remove_Min
• Replace the root with the last element
• Shift it down until the heap property is restored
• Again, why does it always work?
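Remove_Min can be sketched in the same style. The code assumes a non-empty min-heap stored in a list with an unused cell 0, so that nodes are indexed from 1; heap_remove_min is a name chosen for this illustration.

```python
def heap_remove_min(heap):
    """Remove and return the minimum of a non-empty 1-indexed min-heap."""
    smallest = heap[1]
    last = heap.pop()                       # detach the last element
    if len(heap) > 1:                       # something is left besides the placeholder
        heap[1] = last                      # replace the root with the last element
        node, size = 1, len(heap) - 1
        while 2 * node <= size:
            child = 2 * node                # left child
            if child < size and heap[child + 1] < heap[child]:
                child += 1                  # the right child is the smaller one
            if heap[node] <= heap[child]:
                break                       # heap property restored
            heap[node], heap[child] = heap[child], heap[node]
            node = child
    return smallest


heap = [None, 1, 3, 8, 5, 4]                # a valid min-heap, cell 0 unused
first = heap_remove_min(heap)               # returns 1, heap becomes [None, 3, 4, 8, 5]
```

Swapping with the smaller child is what keeps the heap property intact for the sibling subtree.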
Heap - Build_Heap
• There is a special operation called Build_Heap
    - Transform an ordinary array into a heap without using extra memory
• The Remove_Min operation has two steps
    - Replace the root with a leaf node
    - Restore the heap structure by shifting the node down
• The second step is called “Heapify”
• If we apply the Heapify step to ALL internal nodes, bottom-up, we get a heap
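A Python sketch of Build_Heap under the same assumptions as the array layout described earlier (min-heap, nodes indexed from 1, cell 0 unused). The function names heapify and build_heap are illustrative.

```python
def heapify(heap, node, size):
    """Shift heap[node] down until neither child is smaller than it."""
    while 2 * node <= size:
        child = 2 * node
        if child < size and heap[child + 1] < heap[child]:
            child += 1                      # pick the smaller child
        if heap[node] <= heap[child]:
            break
        heap[node], heap[child] = heap[child], heap[node]
        node = child


def build_heap(heap):
    """Turn heap[1..size] into a min-heap in place, without extra memory."""
    size = len(heap) - 1
    for node in range(size // 2, 0, -1):    # internal nodes only, bottom-up
        heapify(heap, node, size)


data = [None, 9, 4, 7, 1, 8, 2]             # cell 0 is an unused placeholder
build_heap(data)
# data is now [None, 1, 4, 2, 9, 8, 7]
```

Nodes after index size // 2 are leaves, which are already one-element heaps, so the loop can skip them.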
Heap - Summary
• Find() is usually not supported by a heap
    - You may scan the whole tree / array if you really want
• Remove() is similar to applying Remove_Min() on the subtree rooted at the removed node
    - Remember that any subtree of a heap is also a heap
    - In the array version the replacement element may also need to be shifted up
• Summary
    - Find() : O(N) // We usually don’t use Heap for this
    - Find_Min() : O(1)
    - Remove_Min() : O(logN)
    - Add() : O(logN)
    - Remove() : O(logN)
Hash Table
What is a Hash Table?
• Question
    - We have a Mark Six result (6 integers in the range 1..49)
    - We want to check if our bet matches it
    - What is the most efficient way?
• Answer
    - Use a boolean array with 49 cells
    - Checking a number is O(1)
• Problem
    - What if the range of numbers is very large?
    - What if we need to store strings?
• Solution
    - Use a “Hash Function” to compress the range of values
Hash Table
• Suppose we need to store values between 0 and 99, but only have an array with 10 cells
• We can map the values [0,99] to [0,9] by taking modulo 10. The result is the “Hash Value”
• Adding, finding and removing an element are O(1)
• It is even possible to map strings to integers, e.g. “ATE” to (1*26*26+20*26+5) mod 10
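The string example can be checked with a few lines of Python. The sketch assumes uppercase words with A=1, ..., Z=26, read as a base-26 number; string_hash is a name invented here.

```python
def string_hash(word, table_size):
    """Read an uppercase word as a base-26 number (A=1 ... Z=26), then take the modulo."""
    value = 0
    for letter in word:
        value = value * 26 + (ord(letter) - ord("A") + 1)
    return value % table_size


ate = string_hash("ATE", 10)    # (1*26*26 + 20*26 + 5) mod 10 = 1201 mod 10 = 1
number = 57 % 10                # an integer in [0,99] maps to cell 7
```

In a language with fixed-width integers it is common to take the modulo inside the loop so that long strings do not overflow.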
Hash Table - Collision
• But this approach has an inherent problem
    - What happens if two items have the same hash value?
• Two major methods to deal with this
    - Chaining (Also called Open Hashing)
    - Open Addressing (Also called Closed Hashing)
Hash Table - Chaining
• Keep a linked list at each hash table cell
• On average, Add / Find / Remove is O(1+a)
    - a = Load Factor = # of stored elements / # of cells
• If the hash function is “random” enough, we can usually achieve the average case
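A small Python sketch of chaining for non-negative integers. To stay short it uses a Python list for each chain instead of a hand-written linked list, and value mod number-of-cells as the hash function; the class name ChainedHashTable and its methods are assumptions of this sketch.

```python
class ChainedHashTable:
    def __init__(self, cells=10):
        self.buckets = [[] for _ in range(cells)]   # one chain per cell
        self.count = 0

    def _bucket(self, value):
        return self.buckets[value % len(self.buckets)]

    def add(self, value):
        bucket = self._bucket(value)
        if value not in bucket:                     # scans only this chain
            bucket.append(value)
            self.count += 1

    def find(self, value):
        return value in self._bucket(value)

    def remove(self, value):
        bucket = self._bucket(value)
        if value in bucket:
            bucket.remove(value)
            self.count -= 1

    def load_factor(self):
        return self.count / len(self.buckets)       # a = stored elements / cells


table = ChainedHashTable(10)
for v in [12, 22, 37]:
    table.add(v)                                    # 12 and 22 collide in cell 2
```

Each operation scans a single chain, whose expected length is the load factor a, which is where the O(1+a) bound comes from.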
Hash Table - Open Addressing
• If you don’t want to implement a linked list…
• An alternative is to skip a cell if it is occupied
• The following diagram illustrates “Linear Probing”
• Find() must continue until a blank cell is reached
• Remove() must use Lazy Deletion; otherwise the emptied cell would stop a later Find() too early and further operations may fail
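The interaction between Find() and Lazy Deletion is easier to see in code. Below is a Python sketch of linear probing for non-negative integers, with value mod number-of-cells as the hash function; the EMPTY and DELETED markers and the class name LinearProbingTable are inventions of this sketch.

```python
EMPTY, DELETED = object(), object()     # two distinct markers for blank and lazily deleted cells


class LinearProbingTable:
    def __init__(self, cells=10):
        self.cells = [EMPTY] * cells

    def _probe(self, value):
        """Yield the cells to inspect: the home cell, then each following cell."""
        start = value % len(self.cells)
        for step in range(len(self.cells)):
            yield (start + step) % len(self.cells)

    def find(self, value):
        for i in self._probe(value):
            if self.cells[i] is EMPTY:  # a blank cell ends the search
                return False
            if self.cells[i] == value:
                return True
        return False

    def add(self, value):
        if self.find(value):
            return
        for i in self._probe(value):
            if self.cells[i] is EMPTY or self.cells[i] is DELETED:
                self.cells[i] = value   # deleted cells can be reused
                return
        raise OverflowError("table is full")

    def remove(self, value):
        for i in self._probe(value):
            if self.cells[i] is EMPTY:
                return
            if self.cells[i] == value:
                self.cells[i] = DELETED # mark only, so later probes keep going
                return


table = LinearProbingTable(10)
for v in [12, 22, 32]:
    table.add(v)                        # all hash to cell 2: they land in cells 2, 3, 4
table.remove(22)                        # cell 3 becomes DELETED, not EMPTY
still_there = table.find(32)            # True: the probe passes over cell 3
```

Had cell 3 been reset to EMPTY, the search for 32 would have stopped there and reported it missing.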
Hash Table - Summary
• Find_Min() and Remove_Min() are usually not supported in a Hash Table
    - You may scan the whole table if you really want
• For Chaining
    - Find() : O(1+a)
    - Add() : O(1+a)
    - Remove() : O(1+a)
• For Open Addressing
    - Find() : O(1/(1-a))
    - Add() : O(1/(1-a))
    - Remove() : O(ln(1/(1-a))/a + 1/a)
• Both are close to O(1) if a is kept small (< 50%)
Miscellaneous Stuff
• Judge problems
    - 1020 – Left Join
    - 1021 – Inner Join
    - 1019 – Addition II
• Past contest problems
    - NOI2004 Day 1 – Cashier
    - Any more?
• Good place to find related information - Wikipedia
    - http://en.wikipedia.org/wiki/Binary_search_tree
    - http://en.wikipedia.org/wiki/Binary_heap
    - http://en.wikipedia.org/wiki/Hash_table