S6-Algorithm1

advertisement
Algorithm and data structure
Overview
Programming and problem solving, with
applications
Algorithm: method for solving a problem
Data structure: method to store information
CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton
Agenda
 Data types
 stack, queue, priority queue
 Sorting
 bubble sort, merge sort, quick sort, heap sort, radix sort
 Searching
 binary search tree, red black tree, hash table
 Graphs
Depth-first search, breadth-first search
CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton
Why study algorithms?
Their impact is broad and far reaching
•
•
•
Internet: Web search, packet routing, file sharing,…
Computers: Compilers, file system,…
Social media: News feeds, Advertisements,…
To solve problems
Example: Network connectivity problem
•
Find a path to get from one place to another when we
have a large set of pairwise connected places
CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton
Why study algorithms?
To become a proficient programmer
To understand the nature
•
•
•
Computational models are replacing mathematical models
Formula based  algorithm based
People are more and more developing computational
model and try to simulate what might be happening in
nature in order to understand it
For fun and profit
•
better chance at interview
CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton
Steps to developing a usable algorithm
 Model the problem
 Find an algorithm to solve it
 Fast enough? Fits in memory?
 If not, why?
 Address the problem
 Iterate until satisfied
CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton
Dynamic connectivity problem
Given a set of N objects
 Union command: Connect two objects
 Given two object, provide a connection between them
 Find/connected query: is there a path connecting
the two objects?
union(4,3)
0
1
union(3,8)
union(6,5)
union(9,4)
5
6
union(2,1)
Connected(0,7)
Connected(8,9)  indirect connection
2
3
4
7
8
9
CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton
Modeling the objects
 Pixels in digital photos
 Computers in network
 Friends in a social media
…
Convenient to name objects 0 to N-1

Use integers as array index
CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton
Modeling
 Assumptions
 “is connected to” is an equivalence relation:
 Reflexive: p is connected to p
 Symmetric: if p is connected to q, then q is connected to p.
 Transitive: if p is connected to q and q is connected to r,
then p is connected to r.
 Connected components
 Maximal set of objects that are mutually connected
 {1,2} {5,6} {3,4,8,9}
0
1
2
3
4
5
6
7
8
9
CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton
Implementation
 Find query: Check if two objects are in the same
component
 Union Command: Replace components containing
two objects with their union
0
1
2
3
4
5
6
7
8
9
Union(6,3)
{1,2} {5,6,3,4,8,9}
{1,2} {5,6} {3,4,8,9}
0
1
2
3
4
5
6
7
8
9
CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton
Specify union-find data type
Goal: design efficient data structure for union-find



Number of objects N can be huge
Number of operations M can be huge
Find queries and union commands may be intermixed
UnionFind
UnionFind(int N)
void union (int p, int q)
boolean connected(int p, int q)
int find (int p)
int count()
CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton
Testing
 read in number of objects N
 Repeat:
 read in pair of integers
 if not connected yet, connect them, and print out pair
Public static void main(String[] args){
int N = StdIn.readInt();
UnionFind uf = new UnionFind(N);
while(!StdIn.isEmpty()){
int p = StdIn.readInt();
int q = StdIn.readInt();
if(!uf.connected(p,q)){
uf.union(p,q);
StdOut.println(p + “ ” +q);
}
}
}
CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton
Quick-find
Data structure:




Integer array id[] of size N.
Interpretation: p and q are connected iff they have the
same id.
0
1
2
3
4
5
6
7
8
9
Id[] 0
1
2
3
4
5
6
7
8
9
0
1
2
3
4
5
6
7
8
9
Id[] 0
1
1
8
8
0
0
1
8
8
Find: Check if p and q have the same id
Id[6]=0 ; id[1]=1
6 and 1 are not connected
Union: To merge components containing p and q , change
all entries whose id equals id[p] to id[q]
Problem: Many values can change
CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton
QuickFind
public class QuickFind{
private int[] id;
public QuickFind(int N){
N Array accesses
id = new int[N];
for (int i=0; i<N; i++){
id[i]=i;
}
}
public boolean connected (int p, int q){
2 Array accesses
return id[p]==id[q];
}
public void union (int p, int q){
int pid = id[p];
At most 2N+2 Array access
int qid = id[q];
for (int i=0; i<id.length; i++)
if(id[i]==pid)
id[i]=qid;
}
CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton
Quick-find is too slow
Cost model: Number of array access (for read and write)
algorithm
initialize
union
find
cost
Quick-find
N
N
1
expensive
order of growth of number of array accesses
Quadratic
Takes N2 array access to process N union commands on N
objects
CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton
Quadratic algorithms don’t scale
 billion operations per second
 billion words of main memory
 Touch all words in about 1 sec
 Since 1950!
Example:
- 109 union command
- 109 objects
- Quick-find take 1018 operations
- +30 years
CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton
Quick-union (lazy approach)
Data structure:



Integer array id[] of size N.
Interpretation: id[i] is parent of i
Root of i is id[id[….id[i]…..]]]
 keep going until it doesn’t change, algorithm ensures no cycles
0
0
Id[]
1
2
3
4
5
6
7
8
1
9
6
9
0 1 9 4 9 6 6 7 8 9
2
4
5
3


Find: Check if p and q have the same root
Union: To merge components containing p and q , set the
id of p’s root to the id of q’s root.
0
Id[]
1
2
3
4
5
6
7
8
9
0 1 9 4 9 6 6 7 8 6
Only one value changes
CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton
7
8
demo
0
1
2
3
4
5
6
7
8
9
Id[]
0
1
2
3
4
5
6
7
8
9
union(4,3)
0
1
2
3
3
5
6
7
8
9
union(3,8)
0
1
2
8
3
5
6
7
8
9
union(6,5)
0
1
2
8
3
5
5
7
8
9
union(9,4)
0
1
2
8
3
5
5
7
8
8
Union(2,1)
0
1
1
8
3
5
5
7
8
8
connected(8,9)
0
1
1
8
3
5
5
7
8
8
connected(5,4)
0
1
1
8
3
5
5
7
8
8
CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton
QuickUnion
public class QuickUnion{
private int[] id
public QuickUnion(int N){
N Array accesses
id = new int[N];
for (int i=0; i<N; i++){
id[i]=i;
}
}
Depth of i array accesses
private int findRoot(int i){
while(i != id[i] ) I = id[i];
return i;
Depth of p and q array accesses
}
public boolean connected (int p, int q){
return findRoot(p)==findRoot(q);
}
public void union (int p, int q){
int i = findRoot(p);
Depth of p and q array accesses
int j = findRoot(q);
id[i] = j;
}
}
CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton
Quick-union is also too slow
Cost model: Number of array access (for read and write)
algorithm
initialize
union
find
cost
Quick-find
N
N
1
Expensive
Quick-union
N
N
N
Expensive
Includes cost of finding root
•
Tree can get really tall
•
Find too expensive (N array accesses)
CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton
Worse case
Improvements
Improvement 1: weighting
Weighted quick-union:



Avoid tall tree
Keep track of size of each tree (number of objects)
Balance by linking the root of smaller tree to the larger
tree
 In quick-union we might put the larger tree lower
 In weighted quick-union we always choose the better alternative
CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton
demo
0
1
2
3
4
5
6
7
8
9
Id[]
0
1
2
3
4
5
6
7
8
9
union(4,3)
0
1
2
4
4
5
6
7
8
9
union(3,8)
0
1
2
4
4
5
6
7
4
9
union(6,5)
0
1
2
4
4
6
6
7
4
9
union(9,4)
0
1
2
4
4
6
6
7
4
4
Union(2,1)
0
2
2
4
4
6
6
7
4
4
connected(8,9)
0
2
2
4
4
6
6
7
4
4
connected(5,4)
0
2
2
4
4
6
6
7
4
4
CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton
Weighted QuickUnion
public class QuickUnion{
private int[] id;
public QuickUnion(int N){
id = new int[N];
for (int i=0; i<N; i++){
id[i]=i;
}
}
private int findRoot(int i){
while(i != id[i] ) I = id[i];
return i;
}
public boolean connected (int p, int q){
return findRoot(p)==findRoot(q);
}
public void union (int p, int q){
int i = findRoot(p);
Maintain extra array sz[]
int j = findRoot(q);
if(sz[i]<sz[j] {id[i] = j; sz[j] +=sz[i];}
else
{id[j] = i; sz[i] +=sz[j];}
}
}
CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton
Weighted Quick-union analysis
Running time:


Find: takes time proportional to depth of p and q.
Union: takes constant time, given roots
Proposition. Depth of any node x is at most lg N.
Proof: when does depth of x increase?
Depth increase by 1 when tree T1 containing x is merged into
another tree T2.
• The size of the tree containing x at least double since |T2|
>= |T1|
• Size of tree containing x can double at most lg N times.
Why?
x
CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton
Weighted QU
algorithm
initialize
union
find
Quick-find
N
N
1
Quick-union
N
N
N
Weighted QU
N
Lg N
Lg N
Includes cost of finding root
CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton
Improvements
Improvement 2: path compression
Quick-union with path compression:

Just after computing the root of p, set the id of each
examined node to point to that root.
 Flatten the tree
private int findRoot(int i){
while(i != id[i] ){
id[i] = id[id[i]];
I = id[i];
}
return i;
}
CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton
Analysis
Proposition: Stating from an empty
data structure, any sequence of M
union-find operations on N objects
makes <= c(N+M lg* N) array
accesses.
 So close to be linear-time.
 Theory it is not
 Practice it is
N
Lg* N
1
0
2
1
4
2
16
3
6553
6
4
265536
5
Iterate log function
CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton
Summary
algorithm
find
Quick-find
MN
Quick-union
MN
Weighted QU
N+M Lg N
QU + path compression
N+M Lg N
Weighted QU + path compression
N+M Lg* N
M union-find operations on a set of N objects
Weighted QU + path compression reduces time from 30 years to 6 seconds
Supercomputer wont help much; good algorithm is the key solution
CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton
Union-find Applications
Dynamic connectivity
Image processing
Graph processing
Understanding physical phenomena
….
CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton
Percolation
A model of many physical systems:
•
N by N grid of sites
•
Each site is open with probability p (or blocked with
probability 1-p)
•
System percolates iff top and bottom are connected by open
sites
• Percolate
Blocked
site
does not percolate
Open site
Open site connected to top
No open site connected to top
CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton
Likelihood of Percolation
Depends on site vacancy probability p.



p low ( 0.4 ) doesn’t percolate p>p*
p high (0.8) percolates p<p*
p medium (0.5) ??
 So the question is how we know whether it percolates or
not.
 What is the value for p*
Percolation phase transition (when N is large sharp threshold
for p*)
1
P*
0
CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton
Monte Carlo Simulation
Initialized N by N grid to be blocked
Declare random sites open until top connected to
bottom
Vacancy percentage estimates p*
Every time we add an open site we check and see if it makes the
system percolate
CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton
Percolation threshold
Q: What is the value for p*?
A. About 0.592746
Fast Algorithm enables accurate answer to scientific
question.
CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton
Dynamic connectivity solutions
How to model opening a new site?
A. Connect newly opened site to all of its adjacent open site
Up to 4 calls to union()
Open this site
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
29
20
21
22
23
24
CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton
Dynamic connectivity solutions



0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
29
20
21
22
23
24
Create an object for each site and name then 0 to N2-1
Sites are in same component if connected by open sites
Percolates iff any site on bottom row is connected to site on top row
Brute-force algorithm: N2 calls to connected()
CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton
Dynamic connectivity solutions
Clever trick: Introduce 2 virtual sites (and connections to top and
bottom)
• Percolates iff virtual top site is connected to virtual bottom site
Only one 1 call to connected()
Virtual site
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
29
20
21
22
23
24
CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton
Summary
 Dynamic connectivity problem
 Modeled the problem to understand what kind of
data structure and algorithm needed to solve it
 Introduced few easy and inaccurate algorithms to
solve it
 Learned how to improve them to get efficient
algorithm
 Applications that could not be solved without
these efficient algorithms
CS2336: Computer Science II; K. Wayne & R. Sedgewick Uni. of Princeton
Download