Computer Algorithms
Lecture 1
Introduction
Some of these slides are courtesy of D. Plaisted, UNC and M. Nicolescu, UNR
1
Class Information
• Instructor: Elena Filatova
• E-mail: filatova@fordham.edu
• Office: 334
• Office hours:
– Tuesday, Friday 12:30 – 2:00pm
– Thursday 4:00 – 5:00pm
– Additional office hours: by appointment
– E-mail: put "4080" at the beginning of the subject line
• Web page: Blackboard
– It is your responsibility to check the class Blackboard regularly
– Any questions related to course material should be posted on the Blackboard discussion board
• Main textbook: Introduction to Algorithms by Cormen et al. (3rd ed.)
2
Grading
• Homework assignments: 30%
– Electronic submission through blackboard
– Done individually
• Midterm: 25%
• Final: 35%
• In-class quizzes: 5%
– Without prior notification
– Based on the material from the previous class
– Absolutely no make-up quizzes
– Two worst scores will be dropped
• Class participation: 5%
– Attendance is mandatory
– No electronic devices are allowed in the classroom (unless with special permission)
3
Pre-requisites
• Programming (CS I)
– C
– C++
• Data Structures (2200)
• Discrete math: not necessary but very helpful
4
Approach
• Analytical
• Build a mathematical model of a computer
• Study properties of algorithms on this model
• Reason about algorithms
• Prove facts about the time taken by algorithms
5
Course Outline
Intro to algorithm design, analysis, and applications
• Algorithm Analysis
– Proof of algorithm correctness, Asymptotic Notation, Recurrence Relations, Probability & Combinatorics, Proof Techniques, Inherent Complexity
• Data Structures
– Lists, Heaps, Graphs, Trees, Balanced Trees
• Sorting & Ordering
– Mergesort, Heapsort, Quicksort, Linear-time Sorts (bucket, counting, radix sort), Selection, Other sorting methods
• Algorithmic Design Paradigms
– Divide and Conquer, Dynamic Programming, Greedy Algorithms, Graph Algorithms, Randomized Algorithms
6
Goals
• Be very familiar with a collection of core algorithms.
– CS classics
– A lot of examples on-line for most languages/data structures
• Be fluent in algorithm design paradigms: divide & conquer, greedy
algorithms, randomization, dynamic programming, approximation
methods.
• Be able to analyze the correctness and runtime performance of a
given algorithm.
• Be intimately familiar with basic data structures.
• Be able to apply techniques in practical problems.
7
Algorithms
• Informally,
– A tool for solving a well-specified computational problem.
– One formal definition ~ Turing Machine (4090 Theory of
Computation)
Input → Algorithm → Output
• Example: sorting
– input: a sequence of numbers
– output: an ordered permutation of the input
– issues: correctness, efficiency, storage, etc.
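As a concrete instance of this specification (a small sketch, not from the original slide; the variable names are mine):

```python
# One instance of the sorting problem: a sequence in, an ordered permutation out.
input_sequence = [5, 2, 4, 6, 1, 3]          # a sequence of numbers
output_sequence = sorted(input_sequence)     # an ordered permutation of the input
assert output_sequence == [1, 2, 3, 4, 5, 6]
```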
8
Why Study Algorithms?
• Necessary in any computer programming problem
– Improve algorithm efficiency: run faster, process more data, do
something that would otherwise be impossible
– Solve problems of significantly large size
– Technology only improves things by a constant factor
• Compare algorithms
• Algorithms as a field of study
– Learn about a standard set of algorithms
– New discoveries arise
– Numerous application areas
• Learn techniques of algorithm design and analysis
9
Roadmap
• Different problems
– Sorting
– Searching
– String processing
– Graph problems
– Geometric problems
– Numerical problems
• Different design paradigms
– Divide-and-conquer
– Incremental
– Dynamic programming
– Greedy algorithms
– Randomized/probabilistic
10
Analyzing Algorithms
• Predict the amount of resources required:
– memory: how much space is needed?
– computational time: how fast does the algorithm run?
• FACT: running time grows with the size of the input
• Input size (number of elements in the input)
– Size of an array, polynomial degree, # of elements in a matrix, # of bits in the binary representation of the input, vertices and edges in a graph
• Def: Running time = the number of primitive operations (steps) executed before termination
– Arithmetic operations (+, -, *), data movement, control, decision making (if, while), comparison
11
Algorithm Efficiency vs. Speed
E.g.: sorting n numbers. Sort 10^6 numbers (n = 10^6)!
Friend's computer = 10^9 instructions/second
Friend's algorithm = 2n^2 instructions
Your computer = 10^7 instructions/second
Your algorithm = 50 n lg n instructions

Your friend: (2 × (10^6)^2 instructions) / (10^9 instructions/second) = 2000 seconds
You: (50 × 10^6 × lg 10^6 instructions) / (10^7 instructions/second) ≈ 100 seconds

20 times better!!
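The comparison can be checked with a few lines of Python (a sketch that just reproduces the arithmetic above; the instruction counts and machine speeds are the slide's assumptions):

```python
import math

n = 10**6
friend_time = (2 * n**2) / 10**9              # 2n^2 instructions at 10^9 instr/s
your_time = (50 * n * math.log2(n)) / 10**7   # 50 n lg n instructions at 10^7 instr/s

print(friend_time)               # 2000.0 seconds
print(round(your_time))          # ~100 seconds
print(friend_time / your_time)   # roughly 20x faster, despite the slower machine
```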
12
Algorithm Analysis: Example
• Alg.: MIN (a[1], …, a[n])
m ← a[1];
for i ← 2 to n
if a[i] < m
then m ← a[i];
• Running time:
– the number of primitive operations (steps) executed
before termination
T(n) =1 [first step] + (n) [for loop] + (n-1) [if condition] +
(n-1) [the assignment in then] = 3n - 1
• Order (rate) of growth:
– The leading term of the formula
– Expresses the asymptotic behavior of the algorithm
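A direct translation of the MIN pseudo-code into runnable Python (a sketch; the name find_min and the 0-based indexing are my adaptations of the slide's 1-based convention):

```python
def find_min(a):
    """Return the minimum element of a non-empty list a (about 3n - 1 primitive steps)."""
    m = a[0]                 # 1 step: initial assignment
    for x in a[1:]:          # n - 1 loop iterations
        if x < m:            # n - 1 comparisons
            m = x            # at most n - 1 assignments
    return m

print(find_min([7, 3, 9, 1, 4]))   # 1
```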
13
Typical Running Time Functions
• 1 (constant running time):
– Instructions are executed once or a few times
• logN (logarithmic)
– A big problem is solved by cutting the original problem down by a constant fraction at each step
• N (linear)
– A small amount of processing is done on each input element
• N logN
– A problem is solved by dividing it into smaller problems, solving them independently, and combining the solutions
14
Typical Running Time Functions
• N^2 (quadratic)
– Typical for algorithms that process all pairs of data items (double
nested loops)
• N^3 (cubic)
– Processing of triples of data (triple nested loops)
• N^K (polynomial)
• 2^N (exponential)
– Few exponential algorithms are appropriate for practical use
15
Why Faster Algorithms?
[Plot: f(n) = log(n), f(n) = n, and f(n) = n log(n) versus problem size n]
16
Asymptotic Notations
• A way to describe behavior of functions in the limit
– Abstracts away low-order terms and constant factors
– How we indicate running times of algorithms
– Describe the running time of an algorithm as n grows to ∞
• O notation: asymptotic "less than or equal": f(n) "≤" g(n)
• Ω notation: asymptotic "greater than or equal": f(n) "≥" g(n)
• Θ notation: asymptotic "equality": f(n) "=" g(n)
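The informal "≤", "≥", "=" comparisons correspond to the standard set definitions from CLRS Chapter 3, written out here in LaTeX:

```latex
O(g(n))      = \{\, f(n) : \exists\, c > 0,\ n_0 > 0 \text{ s.t. } 0 \le f(n) \le c\,g(n) \ \forall n \ge n_0 \,\}
\Omega(g(n)) = \{\, f(n) : \exists\, c > 0,\ n_0 > 0 \text{ s.t. } 0 \le c\,g(n) \le f(n) \ \forall n \ge n_0 \,\}
\Theta(g(n)) = \{\, f(n) : \exists\, c_1, c_2 > 0,\ n_0 > 0 \text{ s.t. } 0 \le c_1 g(n) \le f(n) \le c_2 g(n) \ \forall n \ge n_0 \,\}
```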
17
Mathematical Induction
• Used to prove a sequence of statements (S(1), S(2), …
S(n)) indexed by positive integers.
n n  1
i

2
i 1
n
S(n):
• Proof:
– Basis step: prove that the statement is true for n = 1
– Inductive step: assume that S(n) is true and prove that S(n+1) is
true for all n ≥ 1
• The key to an induction proof is to find case n "within" case n+1
• Also used to prove the correctness of algorithms containing loops
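As a worked example of the two steps (my own write-up of the standard argument for the sum formula above):

```latex
\textbf{Basis:}\quad S(1):\ \sum_{i=1}^{1} i = 1 = \frac{1(1+1)}{2}.

\textbf{Inductive step:}\ \text{assume } S(n). \text{ Then}
\sum_{i=1}^{n+1} i = \Big(\sum_{i=1}^{n} i\Big) + (n+1)
                   = \frac{n(n+1)}{2} + (n+1)
                   = \frac{(n+1)(n+2)}{2},
\text{which is exactly } S(n+1).
```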
18
Recurrences
Def.: Recurrence = an equation or inequality that describes a
function in terms of its value on smaller inputs, and one or
more base cases
• E.g.: T(n) = T(n-1) + n
• Useful for analyzing recurrent algorithms
• Methods for solving recurrences
– Iteration method
– Substitution method
– Recursion tree method
– Master method
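For instance, the recurrence T(n) = T(n-1) + n from above unrolls by the iteration method into a closed form (a standard derivation, not from the slide):

```latex
T(n) = T(n-1) + n
     = T(n-2) + (n-1) + n
     = \cdots
     = T(1) + 2 + 3 + \cdots + n
     = T(1) + \frac{n(n+1)}{2} - 1 = \Theta(n^2).
```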
19
Sorting
Iterative methods:
• Insertion sort
• Bubble sort
• Selection sort
[Example: a sorted hand of cards: 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K, A]
Divide and conquer
• Merge sort
• Quicksort
Non-comparison methods
• Counting sort
• Radix sort
• Bucket sort
20
Types of Analysis
• Worst case
(e.g. cards reversely ordered)
– Provides an upper bound on running time
– An absolute guarantee that the algorithm would not run longer,
no matter what the inputs are
• Best case
(e.g., cards already ordered)
– Input is the one for which the algorithm runs the fastest
• Average case (general case)
– Provides a prediction about the running time
– Assumes that the input is random
21
Specialized Data Structures
• Problem:
– Schedule jobs in a computer
system
– Process the job with the highest
priority first
• Solution: HEAPS
– all levels are full, except possibly
the last one, which is filled from
left to right
– for any node x: Parent(x) ≥ x
• Operations:
– Build
– Insert
– Extract max
– Increase key
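A minimal sketch of the scheduling idea using Python's standard heapq module (heapq is a min-heap, so priorities are negated to get "highest priority first"; the job names here are made up):

```python
import heapq

# Process the job with the highest priority first, using a binary heap.
heap = []
for priority, job in [(3, "backup"), (10, "interrupt"), (5, "compile")]:
    heapq.heappush(heap, (-priority, job))   # negate: heapq is a min-heap

while heap:
    neg_priority, job = heapq.heappop(heap)  # Extract-Max via the negated key
    print(-neg_priority, job)                # 10 interrupt, 5 compile, 3 backup
```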
22
Graphs
• Applications that involve not only a set of items, but also the
connections between them
Maps
Schedules
Hypertext
Computer networks
Circuits
23
Searching in Graphs
• Graph searching = systematically follow the edges of the graph so
as to visit the vertices of the graph
[Figure: example graph with vertices u, v, w, x, y, z]
• Two basic graph methods:
– Breadth-first search
– Depth-first search
– The difference between them is in the order in which they explore the
unvisited edges of the graph
• Graph algorithms are typically elaborations of the basic graph-searching algorithms
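A compact breadth-first search sketch over an adjacency-list graph (my own illustration; the vertex labels u–z echo the figure placeholder above):

```python
from collections import deque

def bfs(graph, source):
    """Visit every vertex reachable from source in breadth-first order."""
    visited = {source}
    order = []
    queue = deque([source])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in graph[u]:              # explore the unvisited edges of u
            if v not in visited:
                visited.add(v)
                queue.append(v)
    return order

graph = {"u": ["v", "x"], "v": ["y"], "w": ["y", "z"],
         "x": ["v"], "y": ["x"], "z": ["z"]}
print(bfs(graph, "u"))   # ['u', 'v', 'x', 'y']
```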
24
Minimum Spanning Trees
• A connected, undirected graph:
– Vertices = houses, Edges = roads
• A weight w(u, v) on each edge (u, v) ∈ E
• Find T ⊆ E such that:
1. T connects all vertices
2. w(T) = Σ_{(u,v) ∈ T} w(u, v) is minimized
[Figure: example weighted undirected graph with vertices a–i and integer edge weights]
Algorithms: Kruskal and Prim
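The slide names Kruskal and Prim; below is a compact sketch of Kruskal's approach with a simplified union-find (my own code; the edge list is a small made-up graph, not the one in the figure):

```python
def kruskal(num_vertices, edges):
    """edges: list of (weight, u, v); returns a minimum spanning tree as an edge list."""
    parent = list(range(num_vertices))

    def find(x):                       # find set representative, with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):      # greedily consider edges by increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                   # adding this edge does not create a cycle
            parent[ru] = rv
            tree.append((u, v, w))
    return tree

print(kruskal(4, [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 2, 3), (5, 1, 3)]))
# [(0, 1, 1), (2, 3, 2), (1, 2, 3)]  -- total weight 6
```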
25
Shortest Path Problems
• Input:
– Directed graph G = (V, E)
– Weight function w : E → R
• Weight of path p = ⟨v_0, v_1, …, v_k⟩:
w(p) = \sum_{i=1}^{k} w(v_{i-1}, v_i)
• Shortest-path weight from u to v:
\delta(u, v) = \min \{\, w(p) : u \stackrel{p}{\leadsto} v \,\} if there exists a path from u to v, and \delta(u, v) = \infty otherwise
[Figure: example weighted directed graph]
26
Dynamic Programming
• An algorithm design technique (like divide and conquer)
– Richard Bellman, optimizing decision processes
– Applicable to problems with overlapping subproblems
E.g.: Fibonacci numbers:
• Recurrence: F(n) = F(n-1) + F(n-2)
• Boundary conditions: F(1) = 0, F(2) = 1
• Compute: F(5) = F(4) + F(3) = 2 + 1 = 3
• Solution: store the solutions to subproblems in a table
• Applications:
– Assembly line scheduling, matrix chain multiplication, longest common subsequence of two strings, 0-1 Knapsack problem
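A minimal sketch of the table idea for the Fibonacci recurrence above (my own code; it uses the slide's convention F(1) = 0, F(2) = 1):

```python
def fib(n, table={1: 0, 2: 1}):
    """Fibonacci with memoization: each subproblem is solved once and stored."""
    if n not in table:
        table[n] = fib(n - 1) + fib(n - 2)   # overlapping subproblems reuse the table
    return table[n]

print([fib(k) for k in range(1, 8)])   # [0, 1, 1, 2, 3, 5, 8]
```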
27
Greedy Algorithms
• Similar to dynamic programming, but simpler approach
– Also used for optimization problems
• Idea: When we have a choice to make, make the one
that looks best right now
– Make a locally optimal choice in hope of getting a globally
optimal solution
• Greedy algorithms don’t always yield an optimal solution
• Applications:
– Activity selection, fractional knapsack, Huffman codes
28
Greedy Algorithms
• Problem
– Schedule the largest possible set of non-overlapping
activities for B21
Activity | Start   | End     | Description                                  | Selected
1        | 8:00am  | 9:15am  | Database systems class                       | ✓
2        | 8:30am  | 10:30am | Movie presentation (refreshments served)     |
3        | 9:20am  | 11:00am | Data structures class                        | ✓
4        | 10:00am | noon    | Programming club mtg. (Pizza provided)       |
5        | 11:30am | 1:00pm  | Computer graphics class                      | ✓
6        | 1:05pm  | 2:15pm  | Analysis of algorithms class                 | ✓
7        | 2:30pm  | 3:00pm  | Computer security class                      | ✓
8        | noon    | 4:00pm  | Computer games contest (refreshments served) |
9        | 4:00pm  | 5:30pm  | Operating systems class                      | ✓
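The greedy rule "always take the compatible activity that finishes earliest" solves this; below is a sketch with the table's times encoded as minutes after midnight (the encoding and function name are mine):

```python
def select_activities(activities):
    """activities: list of (start, end, name); greedily pick non-overlapping ones."""
    chosen = []
    last_end = 0
    for start, end, name in sorted(activities, key=lambda a: a[1]):  # earliest finish first
        if start >= last_end:          # compatible with everything chosen so far
            chosen.append(name)
            last_end = end
    return chosen

activities = [(480, 555, 1), (510, 630, 2), (560, 660, 3), (600, 720, 4),
              (690, 780, 5), (785, 855, 6), (870, 900, 7), (720, 960, 8),
              (960, 1050, 9)]          # times in minutes after midnight
print(select_activities(activities))   # [1, 3, 5, 6, 7, 9]
```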
29
How to Succeed in this Course
• Start early on all assignments. Don't procrastinate
• Complete all reading before class
• Participate in class
• Review after each class
• Be formal and precise on all problem sets and in-class exams
30
Basics
• Introduction to algorithms, complexity, and proof of correctness.
(Chapters 1 & 2)
• Asymptotic Notation. (Chapter 3.1)
• Goals
– Know how to write formal problem specifications.
– Know about computational models.
– Know how to measure the efficiency of an algorithm.
– Know the difference between upper and lower bounds and what they convey.
– Be able to prove algorithms correct and establish computational
complexity.
31
Divide-and-Conquer
• Designing Algorithms. (Chapter 2.3)
• Recurrences. (Chapter 4)
• Quicksort. (Chapter 7)
• Divide-and-conquer and mathematical induction
• Goals
– Know when the divide-and-conquer paradigm is an appropriate one.
– Know the general structure of such algorithms.
– Express their complexity using recurrence relations.
– Determine the complexity using techniques for solving recurrences.
– Memorize the common-case solutions for recurrence relations.
32
Randomized Algorithms
• Probability & Combinatorics. (Chapter 5)
• Quicksort. (Chapter 7)
• Hash Tables. (Chapter 11)
• Goals
– Be thorough with basic probability theory and counting theory.
– Be able to apply the theory of probability to the following.
• Design and analysis of randomized algorithms and data structures.
• Average-case analysis of deterministic algorithms.
– Understand the difference between average-case and worst-case
runtime, esp. in sorting and hashing.
33
Sorting & Selection
• Heapsort (Chapter 6)
• Quicksort (Chapter 7)
• Bucket Sort, Radix Sort, etc. (Chapter 8)
• Selection (Chapter 9)
• Goals
– Know the performance characteristics of each sorting algorithm, when
they can be used, and practical coding issues.
– Know the applications of binary heaps.
– Know why sorting is important.
– Know why linear-time median finding is useful.
34
Search Trees
• Binary Search Trees – Not balanced (Chapter 12)
• Red-Black Trees – Balanced (Chapter 13)
• Goals
– Know the characteristics of the trees.
– Know the capabilities and limitations of simple binary search trees.
– Know why balancing heights is important.
– Know the fundamental ideas behind maintaining balance during insertions and deletions.
– Be able to apply these ideas to other balanced tree data structures.
35
Dynamic Programming
• Dynamic Programming (Chapter 15): an algorithm
design technique (like divide-and-conquer)
• Goals
– Know when to apply dynamic programming and how it
differs from divide and conquer.
– Be able to systematically move from one to the other.
36
Graph Algorithms
• Basic Graph Algorithms (Chapter 22)
• Goals
– Know how to represent graphs (adjacency matrix and edge-list
representations).
– Know the basic techniques for graph searching.
– Be able to devise other algorithms based on graph-searching
algorithms.
– Be able to “cut-and-paste” proof techniques as seen in the basic
algorithms.
37
Greedy Algorithms
• Greedy Algorithms (Chapter 16)
• Minimum Spanning Trees (Chapter 23)
• Shortest Paths (Chapter 24)
• Goals
– Know when to apply greedy algorithms and their characteristics.
– Be able to prove the correctness of a greedy algorithm in solving an
optimization problem.
– Understand where minimum spanning trees and shortest path
computations arise in practice.
38
Weekly Reading Assignment
Chapters 1, 2, and 3
Appendix A
(Textbook: CLRS)
39
Insertion Sort
• Good for sorting a small number of elements
• Works like sorting a hand of playing cards
– Start with an empty hand and the cards face down on the
table
– Then remove one card at a time from the table and insert it
into the correct position in the left hand
– To find the correct position for a card, compare it with each of
the cards already in the hand from right to left
– At all times, the cards already in the left hand are sorted; these were the cards originally on top of the pile
• Example
40
Pseudo-Code Conventions
• Indentation indicates block structure
• Loop and conditional constructs similar to those in Pascal or Java, such as while, for, repeat, if-then-else
• The symbol ► starts a comment line (no execution time)
• The assignment symbol ← is used instead of =, and chained assignment i ← j ← e is allowed
• Variables are local to the given procedure
• See page 19 of the textbook
41
Insertion Sort: Pseudo-Code
Definiteness: each instruction is clear and unambiguous
Visualization: University of San Francisco
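The slide's pseudo-code (CLRS INSERTION-SORT, Chapter 2.1) translates to the following runnable sketch (0-based indexing instead of the book's 1-based arrays):

```python
def insertion_sort(a):
    """Sort list a in place, following the CLRS INSERTION-SORT pseudo-code."""
    for j in range(1, len(a)):
        key = a[j]                 # next card to insert
        i = j - 1
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]        # shift larger elements one slot to the right
            i -= 1
        a[i + 1] = key             # drop the card into its correct position
    return a

print(insertion_sort([5, 2, 4, 6, 1, 3]))   # [1, 2, 3, 4, 5, 6]
```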
42
Algorithm Analysis
• Assumptions:
– Random-Access Machine (RAM)
• Operations are executed one after another
• No concurrent operations
• Only primitive instructions
– Arithmetic (+, -, /, *, floor, ceiling)
– Data movement (load, store, copy)
– Control operators (conditional/unconditional branch, subroutine call,
return)
• Primitive instructions take constant time
• Interested in time complexity: amount of time to
complete an algorithm
43
What to Measure? How to Measure?
• Input size: depends on the problem
– Number of items for sorting (3 or 1000)
• Even time for sorting sequences of the same size can vary
– Total number of bits for multiplying two integers
– Sometimes, more than one input
• Running time: number of primitive operations executed
– Machine-independent
– Each line of pseudo-code takes constant time
– The constant may vary from line to line
44
Analysis of Insertion Sort
45
Running Time
• Best-case: the input array is in the correct order
• Worst-case: the input array is in the reverse order
• Average-case
Insertion sort running time
•
•
•
•
Best-case: linear function
Worst-case: quadratic function
Average-case: best-case or worst-case??
Order (rate) of growth:
– The leading term of the formula
– Expresses the asymptotic behavior of the algorithm
• Given two algorithms (linear and quadratic) which one will you
choose?
46