Algorithms - TAMU Computer Science Faculty Pages

What is Computer Science
About?
Part 2: Algorithms
Design and Analysis of Algorithms
• Why we study algorithms:
– Many tasks can be reduced to abstract
problems
– If we can recognize them, we can use known
solutions
• Example: Graph Algorithms
– Graphs could represent friendships
among people, or adjacency of states on
a map, or links between web pages...
– Determining connected components
(reachability)
– Finding shortest path between two points
• MapQuest; Traveling Salesman Problem
– Finding cliques
• completely connected sub-graphs
– Uniquely matching up pairs of nodes
• e.g. a buddy system based on friendships
– Determining whether 2 graphs have
same connectivity (isomorphism)
– Finding a spanning tree (a connected, acyclic
subgraph that touches all nodes)
• minimal-cost communication networks
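As a small illustration of the connected-components (reachability) problem listed above, here is a minimal sketch using breadth-first search. The `friends` graph and the function name `connected_components` are made up for this example; the slides do not prescribe a particular method.

```python
from collections import deque

def connected_components(graph):
    """Return a list of components, each a sorted list of nodes.

    `graph` maps each node to the nodes adjacent to it (undirected).
    """
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(sorted(component))
    return components

# Hypothetical friendship graph: two separate groups of friends.
friends = {
    "ann": ["bob", "cat"],
    "bob": ["ann"],
    "cat": ["ann"],
    "dan": ["eve"],
    "eve": ["dan"],
}
print(connected_components(friends))
```

Each node and edge is visited a bounded number of times, so the number of steps grows linearly with the size of the graph.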
Kruskal’s Algorithm for Minimum Spanning Trees
// input: graph G with a set of vertices V
// and edges (u,v) with weights (lengths)
KRUSKAL(G):
  A = ∅
  foreach vi ∈ V: cluster[vi] ← i // singletons
  foreach edge (u,v) ordered by increasing weight:
    if cluster[u] ≠ cluster[v]:
      A = A ∪ {(u, v)}
      c ← cluster[u] // remember u's cluster before relabeling
      foreach w ∈ V:
        if cluster[w] = c:
          cluster[w] ← cluster[v] // merge
  return A // subset of edges
• It is greedy
• Is it correct? (always produce MST?)
• Is it optimal? (how long does it take?)
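The pseudocode above can be turned into a short runnable sketch. The edge format `(weight, u, v)` and the example graph are assumptions made for this illustration. Like the pseudocode, it relabels clusters with a scan over all vertices; faster implementations typically use a union-find structure instead.

```python
def kruskal(vertices, edges):
    """Greedy minimum spanning tree.

    `edges` is a list of (weight, u, v) tuples (format assumed here).
    Returns the chosen edges as (u, v, weight) tuples.
    """
    # Every vertex starts in its own singleton cluster.
    cluster = {v: i for i, v in enumerate(vertices)}
    tree = []
    for weight, u, v in sorted(edges):  # increasing weight
        if cluster[u] != cluster[v]:
            tree.append((u, v, weight))
            # Save u's label first: cluster[u] itself changes during the merge.
            old, new = cluster[u], cluster[v]
            for w in vertices:
                if cluster[w] == old:
                    cluster[w] = new
    return tree

# Hypothetical example graph.
vertices = ["a", "b", "c", "d"]
edges = [(1, "a", "b"), (4, "a", "c"), (3, "b", "c"), (2, "c", "d"), (5, "b", "d")]
mst = kruskal(vertices, edges)
print(mst, sum(weight for _, _, weight in mst))
```

With V vertices and E edges, sorting costs on the order of E log E steps and the relabeling scans add up to at most about V^2 steps, so the whole procedure is polynomial.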
Save the Gnomes!
Rules
• Gnomes stand on staircase
• Gnomes can see gnomes below them
• Gnomes may not speak unless asked to
• Gnomes can hear the response of others
• Gnomes cannot see their own hat
Game
• Start at top gnome
• Ask for the color of his hat (blue or red)
• If he gets it wrong, he dies
• Continue to next gnome
• Come up with a strategy that saves the
most gnomes
• What’s the expected number
of surviving gnomes?
• What’s the worst case?
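The slides leave the strategy as an exercise. One well-known candidate (an assumption here, not something the slides state) is a parity strategy: the top gnome says "red" if he sees an odd number of red hats and "blue" otherwise, and every later gnome deduces his own hat from that parity, the answers he has heard, and the hats he sees. The sketch below checks the expected and worst-case number of survivors by trying every hat assignment; the function names are hypothetical.

```python
from itertools import product

def play(hats):
    """hats[0] is the top gnome; gnome i sees hats[i+1:]. Returns survivors."""
    # The top gnome encodes the parity of the red hats below him.
    first = "red" if hats[1:].count("red") % 2 == 1 else "blue"
    answers = [first]
    parity = 1 if first == "red" else 0
    for i in range(1, len(hats)):
        seen = hats[i + 1:].count("red")      # red hats below gnome i
        heard = answers[1:i].count("red")     # red hats announced above him
        # Own hat is red exactly when it is needed to match the parity.
        own_red = (parity - seen - heard) % 2 == 1
        answers.append("red" if own_red else "blue")
    return sum(a == h for a, h in zip(answers, hats))

def analyze(n):
    """Return (worst case, average) survivors over all 2^n hat assignments."""
    results = [play(list(hats)) for hats in product(["red", "blue"], repeat=n)]
    return min(results), sum(results) / len(results)

print(analyze(6))
```

Under this strategy everyone except the top gnome is always saved, and the top gnome survives in exactly half of the assignments.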
• Characterize algorithms in terms of efficiency
– Note: we count number of steps, rather than seconds
• Time is dependent on machine, compiler, load, etc...
• Optimizations are important for real-time systems and games
– Are there faster ways to sort a list? invert a matrix? find a
completely connected sub-graph?
– Scalability for larger inputs (think: human genome): how
much more time/memory does the algorithm take?
– Polynomial vs. exponential run-time (in the worst case)
• Depends a lot on the data structure (representation)
– Hash tables, binary trees, etc. can help a lot
• Proofs of correctness
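To make "counting steps rather than seconds" concrete, the following sketch counts the loop iterations used by two ways of finding a value in a sorted list. The function names and the test data are chosen for this illustration only.

```python
def linear_search_steps(items, target):
    """Count loop iterations when scanning from the front."""
    steps = 0
    for item in items:
        steps += 1
        if item == target:
            break
    return steps

def binary_search_steps(sorted_items, target):
    """Count loop iterations when halving the search range each time."""
    steps = 0
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            break
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps

data = list(range(1_000_000))  # already sorted
print(linear_search_steps(data, 999_999))
print(binary_search_steps(data, 999_999))
```

The step counts are the same on any machine: the scan needs a million iterations for the last element, while the halving search never needs more than 20 on a list of this size.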
• Why do we care so much about polynomial run-time?
– Consider 2 programs that take an input of size n (e.g. length
of a string, number of nodes in graph, etc.)
– Run-time of one scales up as n^2 (polynomial), and the other
as 2^n (exponential)
[Figure: n^2 and 2^n plotted against n for n = 0 to 10; vertical axis from 0 to 1200]
– Exponential algorithms quickly become effectively “unusable” as n grows
(the 2^n curve leaves the plot around n ≈ 16), and computers 100 times
as fast would only raise the feasible n by about 7, since 2^7 = 128
[Figure: n^2 and 2^n plotted against n, once for n = 0 to 10 (vertical axis 0 to 1200) and once for n = 0 to 25 (vertical axis 0 to 100000), where 2^n forms a computational “cliff”]
n     n^2    2^n
1     1      2
2     4      4
3     9      8
4     16     16
5     25     32
6     36     64
7     49     128
8     64     256
9     81     512
10    100    1024
11    121    2048
12    144    4096
13    169    8192
14    196    16384
15    225    32768
16    256    65536
17    289    131072
18    324    262144
19    361    524288
20    400    1048576
Helpful rules of thumb:
2^10 ≈ 1 thousand (1,024) (1 KB)
2^20 ≈ 1 million (1,048,576) (1 MB)
2^30 ≈ 1 billion (1,073,741,824) (1 GB)
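The growth rates above can be turned into a quick calculation of how much a faster computer helps. This is a minimal sketch: the step budget of one million and the helper name `largest_feasible_n` are assumptions made for illustration.

```python
def largest_feasible_n(cost, budget):
    """Largest n such that cost(n) steps still fit within the budget."""
    n = 0
    while cost(n + 1) <= budget:
        n += 1
    return n

budget = 10**6  # hypothetical number of steps we can afford

for label, cost in [("n^2", lambda n: n**2), ("2^n", lambda n: 2**n)]:
    base = largest_feasible_n(cost, budget)
    faster = largest_feasible_n(cost, 100 * budget)  # machine 100x as fast
    print(label, base, faster)
# prints:
# n^2 1000 10000
# 2^n 19 26
```

A 100-fold speedup multiplies the feasible input size by 10 for the n^2 program, but only adds 7 to it for the 2^n program.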
Moore’s Law
(named after Gordon Moore, co-founder of Intel)
• Number of transistors on CPU chips appears to double about once every 18 months
• Similar statements hold for CPU speed, network bandwidth, disk capacity, etc.
• But waiting a couple years for computers to get faster is not an effective solution to NP-hard problems
[Figure: transistor counts of CPU chips over time, with labeled points including Motorola 6800, 80486, Pentium-4, and Dual Core Itanium; source: Wikipedia]
P vs. NP
(CSCE 411)
• Problems in “P”: solvable in polynomial time with
a deterministic algorithm
– Examples: sorting a list, inverting a matrix...
• Problems in “NP”: solvable in polynomial time with
a non-deterministic algorithm
– Given a “guess”, can check if it is a solution in
polynomial time
– For the hardest problems in NP (the NP-complete ones), no
polynomial-time algorithm is known; enumerating and trying all
the guesses takes exponential time in the worst case
– Example: given a set of k vertices in a graph, can check
if they form a completely connected clique; but there
are exponentially many possible sets to choose from
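The clique example above can be sketched in code: checking one guess is cheap, while the search tries candidate sets one by one. The adjacency-set representation and the function names are assumptions made for this illustration.

```python
from itertools import combinations

def is_clique(adjacent, nodes):
    """Check a guess: every pair of the given nodes must be adjacent.

    For k nodes this inspects k*(k-1)/2 pairs, which is polynomial in k.
    """
    return all(v in adjacent[u] for u, v in combinations(nodes, 2))

def find_clique(adjacent, k):
    """Brute force: try every set of k vertices until one is a clique."""
    for candidate in combinations(sorted(adjacent), k):
        if is_clique(adjacent, candidate):
            return candidate
    return None

# Hypothetical undirected graph given as adjacency sets.
adjacent = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
print(is_clique(adjacent, [1, 2, 4]))
print(find_clique(adjacent, 3))
print(find_clique(adjacent, 4))
```

The number of candidate sets is "n choose k", which grows exponentially when k grows in proportion to n; that enumeration, not the check, is what makes the brute-force search slow.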
P vs. NP
(CSCE 411)
• Most computer scientists believe P≠NP,
though it has yet to be rigorously proved
• What does this mean?
– If P≠NP, there are intrinsically “hard” problems for
which no polynomial-time algorithm exists
[Figure: nested complexity classes]
P: sorting a list, inverting a matrix, minimum-spanning tree...
NP: graph clique, subset cover, Traveling Salesman Problem, satisfiability of Boolean formulas, factoring of integers...
Beyond NP: even harder problems (complexity classes)
• Being able to recognize whether a problem is in
P or NP is fundamentally important to a computer
scientist
• Many combinatorial problems are in NP
– Knapsack problem (given n items with size w_i and value
v_i, choose a subset of items that fits into a knapsack with a
limited capacity of L and maximizes total value)
– Traveling salesman problem (shortest circuit visiting
every city)
– Scheduling – e.g. of machines in a shop to minimize the
duration of a manufacturing process
• Finding the shortest path in a graph between 2
nodes is in P
– There is an algorithm that scales up polynomially with
size of graph: Dijkstra’s algorithm
– However, finding the longest simple path is NP-hard!
• Applications to logistics, VLSI circuit layout...
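A minimal sketch of Dijkstra’s algorithm, mentioned above, using Python’s `heapq` module as the priority queue. The graph format (node mapped to a list of `(neighbor, weight)` pairs) and the example graph are assumptions for this illustration, and the method assumes non-negative edge weights.

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from `source` to every reachable node.

    `graph` maps a node to a list of (neighbor, weight) pairs;
    all weights are assumed to be non-negative.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry; a shorter route was already found
        for neighbor, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist

# Hypothetical directed road map.
roads = {
    "a": [("b", 4), ("c", 1)],
    "b": [("d", 5)],
    "c": [("b", 2)],
    "d": [],
}
print(dijkstra(roads, "a"))
```

Each edge causes at most one push onto the heap, so the number of steps grows roughly as E log E for E edges, well within polynomial time.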
• Not all hope is lost...
• Even if a problem is in NP, there might be an
approximation algorithm to solve it efficiently (in
polynomial time)
– However, it is important to determine the error
bounds.
– For example, an approx. alg. might find a subset
cover that is “no more than twice the optimal size”
– A simple greedy algorithm for the knapsack problem:
• Put in item with largest value-to-weight ratio first, then next
largest, and so on...
• Can show that taking the better of this packing and the single most
valuable item that fits achieves at least half the optimal value
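The greedy knapsack heuristic can be sketched and compared with an exhaustive search on a tiny instance. The `(value, weight)` item format and the example numbers are assumptions for illustration. The sketch includes the standard safeguard of also considering the single most valuable item that fits, which is what makes the factor-of-2 bound hold.

```python
from itertools import combinations

def greedy_knapsack(items, capacity):
    """Approximate 0/1 knapsack; items are (value, weight), weights > 0."""
    remaining = capacity
    total = 0
    # Highest value-to-weight ratio first; skip anything that no longer fits.
    for value, weight in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        if weight <= remaining:
            total += value
            remaining -= weight
    # Safeguard: one valuable item alone may beat the greedy packing.
    best_single = max((v for v, w in items if w <= capacity), default=0)
    return max(total, best_single)

def optimal_knapsack(items, capacity):
    """Exact answer by trying all 2^n subsets (only viable for tiny n)."""
    best = 0
    for r in range(len(items) + 1):
        for subset in combinations(items, r):
            if sum(w for _, w in subset) <= capacity:
                best = max(best, sum(v for v, _ in subset))
    return best

items = [(60, 10), (100, 20), (120, 30)]  # hypothetical (value, weight) pairs
print(greedy_knapsack(items, 50), optimal_knapsack(items, 50))
```

On this instance the greedy answer (160) is below the optimum (220) but within the factor-of-2 bound, and it is found without enumerating subsets.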