Uploaded by Nelson Char

Data Structure and Algorithm Notes

Lecture 1
Correctness (to prove an algorithm correct, you must test ALL possible scenarios)
• Formal
  o Mathematically prove the concept correct
• Informal
  o Just run data through it and check that you get the correct value
Efficiency
• Speed efficiency (measure the number of basic steps)
  o 20 lines vs 5 lines
• Storage/space efficiency
  o Memory used at run time
Applicability
Class: 1 (Constant)
Comment: Short of best-case efficiencies, very few reasonable examples can be given, since an algorithm's running time typically goes to infinity when its input size grows infinitely large. Since a constant does not change, we use the value '1' to represent constant efficiency. In fact, any expression without 'n' (the number of data items, i.e. the input size) is considered constant.

Class: log log n (Log-logarithmic)
Comment: Log of a logarithm, e.g. log_a(log_a n).

Class: log n (Logarithmic)
Comment: Typically a result of cutting a problem's size by a constant factor on each iteration of the algorithm. Note that a logarithmic algorithm cannot take into account all its input, or even a fixed fraction of it: any algorithm that does so will have at least linear running time.

Class: (log n)^k (Polylogarithmic)
Comment: Any function which is the sum of constants times powers of a logarithm of the argument: f(x) = Σ_{i=0..k} c_i log^{p_i} x. In complexity theory, the measure of computation m(n) (usually execution time or memory space) is bounded by a polylogarithmic function of the problem size n. More formally, m(n) = O(log^k n).

Class: √n (Radicals/Roots)
Comment: A radical function involves roots. Radical functions have many interesting applications, are studied extensively in many mathematics courses, and are used often in science and engineering. Applications involving the calculation of the shortest distance between two places, or predicting how long a stairway is based upon the height it reaches, usually involve radical functions.

Class: n (Linear)
Comment: Algorithms that scan a list of size n, such as sequential search, belong to this class.

Class: n log n (Linearithmic/Linear-logarithmic)
Comment: A combination of linear and logarithmic.

Class: n^2 (Quadratic)
Comment: Typically characterizes the efficiency of algorithms with two embedded loops. Elementary sorting algorithms and certain operations on n × n matrices are standard examples.

Class: n^3 (Cubic)
Comment: Typically characterizes the efficiency of algorithms with three embedded loops. Several nontrivial algorithms from linear algebra fall into this class.

Class: 2^n (Exponential)
Comment: Typical for algorithms that generate all subsets of an n-element set. Often, the term "exponential" is used in a broader sense to include this and larger orders of growth as well.

Class: n! (Factorial)
Comment: Typical for algorithms that generate all permutations of an n-element set.
Lecture 2
Master Theorem
https://www.youtube.com/watch?v=NQMUQpmurFI
Sample of proving template:
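For recurrences of the form T(n) = aT(n/b) + f(n), with a >= 1 and b > 1, the master theorem states:

```latex
T(n) =
\begin{cases}
\Theta\left(n^{\log_b a}\right) & \text{if } f(n) = O\left(n^{\log_b a - \epsilon}\right) \text{ for some } \epsilon > 0 \\
\Theta\left(n^{\log_b a} \log n\right) & \text{if } f(n) = \Theta\left(n^{\log_b a}\right) \\
\Theta\left(f(n)\right) & \text{if } f(n) = \Omega\left(n^{\log_b a + \epsilon}\right) \text{ for some } \epsilon > 0
  \text{ and } a f(n/b) \le c f(n) \text{ for some } c < 1
\end{cases}
```

E.g. merge sort's recurrence T(n) = 2T(n/2) + Θ(n) falls in the second case, giving Θ(n log n).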
Lecture 3
Slide 10 & 11
Hash - maps many keys to one address (many-to-one, so collisions are possible)
Heap - max heap and min heap; get the highest or lowest value from the structure. Priority queue.
Searching function - trip?
Lower bound = index of the lowest element
Higher bound = index of the highest element
E.g. indices 0 1 2 3 4 5, so the lower bound is 0 and the higher bound is 5
Array
Arrays always start from index 0.
Stack
LIFO (LAST IN FIRST OUT)
• You can only insert and remove at the top, so if you want to remove the third element, you have to remove the first, then remove the second.
Queue
FIFO (FIRST IN FIRST OUT)
Enqueue/Add - only add at the back
Dequeue/Remove - only remove from the front
So after an enqueue and a dequeue, if I fetch array(1), will it return nothing since it's empty? Yes
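A minimal Python sketch of the two disciplines, using a list for the stack and collections.deque for the queue (an illustration, not course code):

```python
# Stack: LIFO - the last item pushed is the first one out.
# Queue: FIFO - the first item enqueued is the first one out.
from collections import deque

stack = []
stack.append(1)          # push
stack.append(2)
stack.append(3)
top = stack.pop()        # LIFO: 3 comes out first

queue = deque()
queue.append("a")        # enqueue at the back
queue.append("b")
front = queue.popleft()  # FIFO: "a" comes out first

print(top, front)  # 3 a
```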
Tree
To count the height, count the number of links/bridges along the longest/furthest path. E.g. AB, BD, DF, FG then GH, so a total of 5.
To count the depth, count the number of links/bridges from the root to that one node.
To be considered complete, the leaves in the last level must all be at the same level, and from left to right there must be no empty node.
This example is incomplete, as the third leaf in the last row is missing.
Heap
If the root value is the biggest, it is called a maximum heap.
If the root value is the smallest, it is called a minimum heap.
The ordering for a heap is top-down, not left-right. Hence, the value on the left can be bigger or smaller than the one on the right.
Associative Tables (Hash Table)
Hash Function Example: Hash(key) = key mod 8, in other words Hash(x) = x mod 8
Close Addressing = when you put the key into its hashed address, chaining on collision.
Address: 0   1   2   3   4   5   6   7
Key:                                 7
Chain:                               15
Hash(7) = 7 mod 8 = 7 (address 7, so put in 7)
Hash(15) = 15 mod 8 = 7 (address 7, but 7 is already taken, so put 15 in the chain)
This is how to solve a collision by the separate chaining method.
Load factor = number of keys in the hash table / size of the hash table
Load factor = 0 means the hash table is empty
Load factor = 1 means the hash table is full
Example (table of size 5, addresses 0 to 4): keys 7, 8 and 3 occupy three slots directly, while keys 9, 5, 3, 10, 4 and 6 are stored in the chains behind those slots.
To calculate the load factor in this scenario: 3/5 (keys in slots / table size), and NOT (chain + key) / addresses, which would be 9/5.
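A minimal Python sketch of separate chaining with the lecture's Hash(x) = x mod 8 (the list-of-lists layout is an illustration, not the lecturer's implementation):

```python
# Each address holds a chain (a Python list) of all keys that hashed there.
TABLE_SIZE = 8
table = [[] for _ in range(TABLE_SIZE)]

def insert(key):
    address = key % TABLE_SIZE   # Hash(key) = key mod 8
    table[address].append(key)   # a collision simply extends the chain

insert(7)    # 7 mod 8 = 7  -> address 7
insert(15)   # 15 mod 8 = 7 -> collides, chained behind 7

print(table[7])  # [7, 15]
```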
Formula (NOT REQUIRED TO REMEMBER), where a is the load factor.
Open Addressing = when you put the key into its hashed address, but the address is already taken, so you put it in another address.
Address: 0   1   2   3   4   5   6   7
Key:     15                          7
Hash(7) = 7 mod 8 = 7 (address 7, so put in 7)
Hash(15) = 15 mod 8 = 7 (address 7, but 7 is already taken, so put 15 in the next free slot, which wraps around to address 0)
This is how to solve a collision by open addressing's linear probing method:
Change the status of the pending deletion key to deleted instead of physically deleting it.
Formula (NOT REQUIRED TO REMEMBER), where a is the load factor.
Note: Assignment/Exam will not ask for successful probe, but will ask for empirical case instead
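A minimal Python sketch of linear probing with the same Hash(x) = x mod 8 (insertion only; deletion would mark the slot with a "deleted" status as noted above):

```python
# Open addressing with linear probing: if the slot is taken, step forward
# one slot at a time, wrapping around at the end of the table.
TABLE_SIZE = 8
EMPTY = None
table = [EMPTY] * TABLE_SIZE

def insert(key):
    address = key % TABLE_SIZE
    while table[address] is not EMPTY:        # slot taken: keep probing
        address = (address + 1) % TABLE_SIZE  # wrap around past the end
    table[address] = key

insert(7)    # 7 mod 8 = 7  -> address 7
insert(15)   # 15 mod 8 = 7 -> taken, probe wraps around to address 0

print(table[7], table[0])  # 7 15
```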
This is how to solve a collision by open addressing's quadratic probing method:
Hash(7310) = 7310 mod 7 = 2
But let's say that address 2 is occupied, so we use quadratic probing to get to a new address, e.g. 4.
This is how to solve a collision by open addressing's double hashing method:
Hash2(key) cannot be 0
Hash2(key) = (key mod s) + 1
where s = m - 1
Note: m is the size of the hash table
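A small sketch of the probe sequence double hashing produces, assuming table size m = 8 for illustration (the i-th probe is (Hash1 + i * Hash2) mod m):

```python
# Double hashing: the step size comes from a second hash function,
# Hash2(key) = (key mod s) + 1 with s = m - 1, so it is never 0.
m = 8           # table size (assumed for this illustration)
s = m - 1

def hash1(key):
    return key % m

def hash2(key):
    return (key % s) + 1   # never 0, so probing always advances

def probe_sequence(key, probes):
    # addresses visited on successive collisions
    return [(hash1(key) + i * hash2(key)) % m for i in range(probes)]

print(probe_sequence(15, 4))  # [7, 1, 3, 5]
```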
Rehash
For open addressing, rehash when the load factor reaches 0.5.
For close addressing, rehash when the load factor reaches 1.
A max heap stored in an array satisfies:
A[address] >= A[2 * address]
A[address] >= A[2 * address + 1]
E.g.
A[2] >= A[4]
A[2] >= A[5]
Example max-heap array (1-indexed):
Index: 1   2   3   4   5   6   7
Value: 22  14  17  13  1   4   5
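The array property can be checked mechanically; a short sketch using the example values (the 1-indexed layout of 22, 14, 17, 13, 1, 4, 5 is our reading of the notes' figure):

```python
# Check the heap property A[i] >= A[2i] and A[i] >= A[2i+1] for a
# 1-indexed array (index 0 is an unused placeholder).
A = [None, 22, 14, 17, 13, 1, 4, 5]

def is_max_heap(a):
    n = len(a) - 1
    for i in range(1, n + 1):
        for child in (2 * i, 2 * i + 1):
            if child <= n and a[i] < a[child]:
                return False      # a child is bigger than its parent
    return True

print(is_max_heap(A))  # True
```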
Lecture 4
Quick Sort
9 2 3 7 8 2 6
^Pivot value
After partitioning, values smaller than the pivot go to its left:
2 3 7 8 2 6 9
9 is now in its correct position.
2 3 7 8 2 6 9
^Pivot value
After partitioning, 2 is in its correct position.
Since 2 and 9 are already correct, we split there and don't touch them anymore -> Divide and Conquer
3 7 8 2 6
^Pivot value
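The trace above can be sketched as a short program, taking the first element as the pivot:

```python
# Quicksort: partition around the first element, then recurse on each side
# (divide and conquer).
def quicksort(data):
    if len(data) <= 1:
        return data
    pivot, rest = data[0], data[1:]
    left = [x for x in rest if x <= pivot]   # smaller (or equal) go left
    right = [x for x in rest if x > pivot]   # bigger go right
    return quicksort(left) + [pivot] + quicksort(right)

print(quicksort([9, 2, 3, 7, 8, 2, 6]))  # [2, 2, 3, 6, 7, 8, 9]
```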
Merge Sort
9 2 3 7 8 2 6
Split:
9 2 3 7 | 8 2 6
Split:
9 2 | 3 7 | 8 2 | 6
Split:
9 | 2 | 3 | 7 | 8 | 2 | 6
Then start to merge and sort at the same time:
2 9 | 3 7 | 2 8 | 6
Merge and sort:
2 3 7 9 | 2 6 8
Merge and sort the final two lists: compare the fronts, 2 and 2, and move the smaller one down:
2
Then compare 2 and 3, and move the smaller one down:
2 2
So on and so forth, until done:
2 2 3 6 7 8 9
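The split-then-merge trace can be sketched as:

```python
# Merge sort: split in half, sort each half, then merge by repeatedly
# taking the smaller front element of the two sorted halves.
def merge_sort(data):
    if len(data) <= 1:
        return data
    mid = len(data) // 2
    left = merge_sort(data[:mid])
    right = merge_sort(data[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:      # compare the two fronts,
            merged.append(left[i])   # move the smaller one down
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])          # one side may have leftovers
    merged.extend(right[j:])
    return merged

print(merge_sort([9, 2, 3, 7, 8, 2, 6]))  # [2, 2, 3, 6, 7, 8, 9]
```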
Lecture 5
Treaps
All the letters carry meaning as BST keys.
E.g. M is smaller than Q, so it goes on the left; Z is bigger than Q, so it goes on the right.
Exam Topics
Question 1
Topic: Complexity
Difficulty level: **
Can refer to Assignment 1 and Exercise 1
Question 2
Topic: Data Structure - Non-linear
Difficulty level: ***
Look at Binary Search Tree/AVL, Heap
Question 3
Topic: Algorithm Design and Complexity
Difficulty level: ****
Look at recursive algorithm
Question 4
Topic: Sorting
Difficulty level: ***
Describe sort algorithm (pseudo code), e.g. Given a set of unsorted data, apply the algorithm to sort
the data by describing the algorithm. Must know how to calculate runtime complexity of the
algorithm
Only 1 out of Insertion Sort, Selection Sort, Merge Sort, Quick Sort and Heap Sort will appear in exam
Question 5
Topic: Greedy
Difficulty level: **
Single source shortest path
Must know at least 2 out of 4 questions in Exercise 2.
Questions may use a form like n^(1/3), which is actually cube root of n (the radical class), when asking you to sort functions in order of growth.
Exam Question 1
Topic: Efficiency & Complexity
Difficulty level: **
Can refer to Assignment 1 and Exercise 1
Class: 1 (Constant)
Examples:
• n^0
• 2^1000

Class: log log n (Log-logarithmic)
Examples:
• lg(2^(lg lg n)) = lg(lg n)

Class: log n (Logarithmic)
Examples:
• (2/3) lg n^3 = (2/3) × 3 lg n = 2 lg n

Class: sqrt(n) (Radicals/Roots)
Examples:
• √5
• √n
• n^0.5

Class: log^k n (Polylogarithmic)
Examples:
• (lg n)^4 = lg^4 n

Class: n (Linear)
Examples:
• (4n^2)^0.5 = 2n
• 2^(lg n) = n
• 50n + 100 lg n
• 70n

Class: n log n (Linearithmic)
Examples:
• (n/3) lg n^3 = n lg n
• 10n lg n + 5n

Class: n^2 (Quadratic)
Examples:
• n(n + 3) = n^2 + 3n
• n^2 + 100n

Class: n^3 (Cubic)
Examples:
• 8^(lg n) = (2^3)^(lg n) = (2^(lg n))^3 = n^3

Class: 2^n (Exponential)
Examples:
• (3/2)^(2n)
• 2^n

Class: n! (Factorial)
Examples:
• (n^3 + 2)!
• 2!
Determine big O running time
Master theorem
Exam Question 2
Topic: Data Structure - Non-linear
Difficulty level: ***
Look at Binary Search Tree/AVL, Heap
Binary Tree
Type: Preorder/Prefix
Rule: Root-Left-Right
Algorithm:
Algorithm preOrder (val root <node pointer>)
  If (root is not null)
    Process (root)
    preOrder (root->leftsubtree)
    preOrder (root->rightsubtree)
  End if
  Return
End preOrder
Example: A, B, D, H, E, I, J, C, F, G

Type: Postorder/Postfix
Rule: Left-Right-Root
Algorithm:
Algorithm postOrder (val root <node pointer>)
  If (root is not null)
    postOrder (root->leftsubtree)
    postOrder (root->rightsubtree)
    Process (root)
  End if
  Return
End postOrder
Example: H, D, I, J, E, B, F, G, C, A

Type: Inorder/Infix
Rule: Left-Root-Right
Algorithm:
Algorithm inOrder (val root <node pointer>)
  If (root is not null)
    inOrder (root->leftsubtree)
    Process (root)
    inOrder (root->rightsubtree)
  End if
  Return
End inOrder
Example: H, D, B, I, E, J, A, F, C, G
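The three traversals are easy to verify in code; a sketch using a small dict-based tree reconstructed from the example orders:

```python
# Tree reconstructed from the example orders:
#        A
#      /   \
#     B     C
#    / \   / \
#   D   E F   G
#  /   / \
# H   I   J
tree = {"A": ("B", "C"), "B": ("D", "E"), "C": ("F", "G"),
        "D": ("H", None), "E": ("I", "J"), "F": (None, None),
        "G": (None, None), "H": (None, None), "I": (None, None),
        "J": (None, None)}

def preorder(node):
    if node is None:
        return []
    left, right = tree[node]
    return [node] + preorder(left) + preorder(right)       # Root-Left-Right

def postorder(node):
    if node is None:
        return []
    left, right = tree[node]
    return postorder(left) + postorder(right) + [node]     # Left-Right-Root

def inorder(node):
    if node is None:
        return []
    left, right = tree[node]
    return inorder(left) + [node] + inorder(right)         # Left-Root-Right

print(preorder("A"))   # ['A', 'B', 'D', 'H', 'E', 'I', 'J', 'C', 'F', 'G']
print(postorder("A"))  # ['H', 'D', 'I', 'J', 'E', 'B', 'F', 'G', 'C', 'A']
print(inorder("A"))    # ['H', 'D', 'B', 'I', 'E', 'J', 'A', 'F', 'C', 'G']
```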
Runtime Complexity of BST
Worst case: O(n)
Best case: O(lg n)
Average case: O(lg n)
AVL = Balanced/Self-Balancing Binary Search Tree
Balanced: |height of left subtree - height of right subtree| <= 1
Unbalanced: |height of left subtree - height of right subtree| > 1
Rotation of AVL
Type
Example
Single Rotation Left
Single Rotation Right
Double Rotation Left-Right
Double Rotation Right-Left
Complex Right Rotation
Complex Left Rotation
Complex Double Left-Right Rotation
Complex Double Right-Left Rotation
Insertion of AVL
Type
Example
Runtime Complexity of AVL: O(lg n) for search, insertion and deletion, since the tree stays balanced.
Heap
Heap must be a complete or nearly-complete binary tree
Heapify algorithm:
heapify(h, i)
Begin
  SET last TO length(h)
  SET k TO i
  REPEAT
    SET j TO k
    IF ( 2j <= last ) AND ( h[2j] < h[k] )
      SET k TO 2j
    ENDIF
    IF ( (2j + 1) <= last ) AND ( h[2j+1] < h[k] )
      SET k TO 2j + 1
    ENDIF
    Swap( h[j], h[k] )
  UNTIL j = k
END heapify
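A direct Python transcription of the heapify pseudocode (note it sifts toward the smaller child, i.e. it maintains a min-heap; index 0 is left unused so the children of i sit at 2i and 2i + 1):

```python
def heapify(h, i):
    # h is 1-indexed (h[0] unused); sift h[i] down until the min-heap
    # property holds, as in the lecture pseudocode.
    last = len(h) - 1
    k = i
    while True:
        j = k
        if 2 * j <= last and h[2 * j] < h[k]:
            k = 2 * j                 # left child is smaller
        if 2 * j + 1 <= last and h[2 * j + 1] < h[k]:
            k = 2 * j + 1             # right child is even smaller
        h[j], h[k] = h[k], h[j]
        if j == k:                    # no swap happened: done
            break

h = [None, 9, 2, 7, 4, 3]   # 9 violates the min-heap property at the root
heapify(h, 1)
print(h[1:])  # [2, 3, 7, 4, 9]
```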
Min Heap/ Max Heap
If the root value is the biggest, it is called a maximum heap.
A max-heap is a specialized tree such that
• Each element (item) must be >= all of its descendants.
• All levels are full, except possibly the last level. If the last level (bottom level) is not full, all of its nodes must be as far left as possible.
If the root value is the smallest, it is called a minimum heap.
A min-heap is a specialized tree such that
• Each element (item) must be <= all of its descendants.
• All levels are full, except possibly the last level. If the last level (bottom level) is not full, all of its nodes must be as far left as possible.
The ordering for a heap is top-down, not left-right. Hence, the value on the left can be bigger or smaller than the one on the right.
Add Node
Delete Node
Exam Question 3
Topic: Algorithm Design and Complexity
Difficulty level: ****
Look at recursive algorithm
Function: Returns the height of the Binary Tree
integer height(Node* subroot) {
  if (subroot == NULL)
    return 0; // Empty subtree
  else
    return 1 + max( height(subroot->left()), height(subroot->right()) );
}

Function: Count the number of leaf nodes in a Binary Tree
integer count(Node* subroot) {
  if (subroot == NULL)
    return 0; // Empty subtree
  else if (subroot->isLeaf())
    return 1; // A leaf
  else
    return ( count(subroot->left()) + count(subroot->right()) );
}

Function: Count the total number of nodes in the Binary Tree
integer count(Node* subroot) {
  if (subroot == NULL)
    return 0; // Empty subtree
  else if (subroot->isLeaf())
    return 1; // A leaf
  else
    return ( 1 + count(subroot->left()) + count(subroot->right()) );
}

Function: Fibonacci
Algorithm FibRec(n) {
  If (n = 0)
    Return 0
  If (n = 1)
    Return 1
  Return FibRec(n-1) + FibRec(n-2)
}

Function: Greatest Common Divisor (GCD) of int a and int b
(int a and int b must be non-negative, and a >= b)
Algorithm GCD(int a, int b) {
  Set r ← 0
  While b ≠ 0 do the following:
    Set r ← a mod b
    Set a ← b
    Set b ← r
  Return (a)
}

Recursive version:
Algorithm GCD(a, b) {
  Set r ← 0
  if (a mod b == 0)
    return b
  end if
  Set r ← a mod b
  return GCD(b, r)
}
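Both GCD versions are runnable almost verbatim in Python:

```python
# Euclid's algorithm, iterative and recursive, as in the pseudocode above.
def gcd_iter(a, b):
    while b != 0:
        r = a % b   # Set r <- a mod b
        a = b       # Set a <- b
        b = r       # Set b <- r
    return a

def gcd_rec(a, b):
    if a % b == 0:
        return b
    return gcd_rec(b, a % b)

print(gcd_iter(48, 18), gcd_rec(48, 18))  # 6 6
```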
Runtime Complexity
Exam Question 4
Topic: Sorting
Difficulty level: ***
Describe sort algorithm (pseudo code), e.g. Given a set of unsorted data, apply the algorithm to sort
the data by describing the algorithm. Must know how to calculate runtime complexity of the
algorithm
Only 1 out of Insertion Sort, Selection Sort, Merge Sort, Quick Sort and Heap Sort will appear in exam
Summary
Insertion Sort
• Split into sorted & unsorted. Take the first number from unsorted, put into sorted and place it
in the correct position
Selection Sort
• Select smallest number from the list then put at 1st position. Then put the next smallest
number at 2nd position. Repeat
Merge Sort
• Divide, divide, divide then sort and merge
Quick Sort
• Pick a pivot, then partition: smaller numbers go left, bigger numbers go right. Keep repeating
Heap Sort
• Exchange the position of first and last node then heapify until done
Insertion Sort
Take the first number from unsorted, put into sorted and place it in the correct position
Method:
1. Split the list into sorted and unsorted
2. Take the first number from unsorted, put it into sorted and place it in the correct position
Algorithm
Efficiency
•π‘› − 1 times through the outer loop.
•Variable times through each inner loop.
•On average 𝑛−𝑖 2 times through each inner loop.
• (𝑛 − 1) ((𝑛−1)/2) total inner loops.
•Algorithm is in Θ(𝑛2)
•Algorithm is in 𝑂(𝑛2)
•Algorithm is in Ω(𝑛2)
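A short runnable sketch of insertion sort as summarised above:

```python
# Insertion sort: keep a sorted prefix, take the next unsorted value,
# and shift it left into its correct position.
def insertion_sort(data):
    for i in range(1, len(data)):          # n - 1 passes of the outer loop
        value = data[i]
        j = i - 1
        while j >= 0 and data[j] > value:  # shift bigger values right
            data[j + 1] = data[j]
            j -= 1
        data[j + 1] = value                # drop value into its position
    return data

print(insertion_sort([9, 2, 3, 7, 8, 2, 6]))  # [2, 2, 3, 6, 7, 8, 9]
```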
Selection Sort
Select smallest number from the list then put at first position
Method:
1. Find the minimum value in the list
2. Swap it with the value in the first position
3. Repeat the steps above for the remainder of the list
Algorithm
Efficiency
•π‘› − 1 times through the outer loop.
•π‘› − 𝑖 times through each inner loop.
•On average, Τ π‘› 2 times through each inner loop.
•(π‘›× π‘›−1)/2 total inner loops.
•Algorithm is in Θ(𝑛2)
•Algorithm is in 𝑂(𝑛2)
•Algorithm is in Ω (𝑛2)
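A short runnable sketch of selection sort as summarised above:

```python
# Selection sort: find the smallest remaining value and swap it into the
# next position of the sorted prefix.
def selection_sort(data):
    for i in range(len(data) - 1):         # position being filled
        smallest = i
        for j in range(i + 1, len(data)):  # scan the remainder
            if data[j] < data[smallest]:
                smallest = j
        data[i], data[smallest] = data[smallest], data[i]
    return data

print(selection_sort([9, 2, 3, 7, 8, 2, 6]))  # [2, 2, 3, 6, 7, 8, 9]
```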
Merge Sort
Divide, divide, divide then sort and merge
Method:
Divide and conquer
1. Divide into 2 nearly equal sub-lists
2. Sort each sub-list
3. Merge back together
Algorithm
Efficiency
• Merging efficiency is in O(n).
• The merge operation is called O(lg n) times recursively.
• Hence, Mergesort complexity is in O(n lg n).
• Note: Mergesort uses an additional array x[1..n]. If x were local to merge, much more storage would be used because of the recursive calls.
Quick Sort
Pick a pivot, then partition: smaller numbers go left, bigger numbers go right. Keep repeating
Method:
Divide and conquer
1. Pick a pivot randomly (usually the 1st element)
2. Split the list into 2 using the pivot, so the numbers smaller than the pivot are on the left and the larger ones on the right
3. Continue picking pivots for the 2 lists, moving numbers smaller than the new pivot to the left and larger ones to the right. Repeat until the whole list is sorted
Algorithm
Efficiency
• On average, each partition halves the size of the array to be sorted.
• On average, each partition swaps half the elements.
• Algorithm is in O(n lg n) in the average case.
• Worst case, the algorithm is in O(n^2). This scenario occurs when the list is in descending order and it is to be sorted in ascending order, or vice versa.
Heap Sort
Method:
1. Turn the array to be sorted into a heap structure (we call this makeheap).
2. Exchange the root, which is the largest element in the heap, with the last element in the unsorted list, resulting in the largest element being added to the beginning of the sorted list.
3. Re-heap (we call this siftdown or heapify) the unsorted part.
4. Repeat steps 2 and 3 until the entire list is sorted.
Algorithm
Efficiency
• Makeheap is 𝑂(𝑛 lg 𝑛).
• Siftdown is 𝑂(lg 𝑛).
• Heapsort is 𝑂(𝑛 lg 𝑛) + (𝑛 − 1) 𝑂(lg 𝑛) = 𝑂(𝑛 lg 𝑛).
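A runnable sketch of the four steps, using a max-heap siftdown (the lecture's heapify with the comparison flipped for a max-heap, and 0-indexed children at 2i + 1 and 2i + 2):

```python
# Heap sort: makeheap, then repeatedly swap the root (largest) with the
# last unsorted element and re-heap the remainder.
def siftdown(h, i, last):
    while True:
        largest = i
        for child in (2 * i + 1, 2 * i + 2):   # 0-indexed children
            if child <= last and h[child] > h[largest]:
                largest = child
        if largest == i:
            return
        h[i], h[largest] = h[largest], h[i]
        i = largest

def heap_sort(data):
    last = len(data) - 1
    for i in range(last // 2, -1, -1):   # step 1: makeheap
        siftdown(data, i, last)
    for end in range(last, 0, -1):       # steps 2-4: swap root out, re-heap
        data[0], data[end] = data[end], data[0]
        siftdown(data, 0, end - 1)
    return data

print(heap_sort([9, 2, 3, 7, 8, 2, 6]))  # [2, 2, 3, 6, 7, 8, 9]
```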
Exam Question 5
Topic: Greedy
Difficulty level: **
Single source shortest path
Dijkstra
Knapsack
A*