Advanced Graph Algorithms (I)
• What we do not cover but you are expected to know
  – Mathematical induction, basic data structures, sorting, shortest paths, minimum spanning trees, dynamic programming, divide-and-conquer
  – Homework assignment
• Better if you know NP-completeness and NP-hardness
  – Pick them up yourself if not. They have become common knowledge in computer science.

Advanced Graph Algorithms (II)
• Who should take this course
  – Those interested in algorithms
  – Those interested in abstract thinking
  – Those building a foundation for future research work
  – Those interested in how theory is applied to practical problems
    • Bioinformatics, web search, natural language
    • Those interested in working at Google, Facebook, or Microsoft
• Course website (all announcements are posted here)
  – The teaching section under my web page at the institute
  – http://iasl.iis.sinica.edu.tw/hsu/#teach

Grading
• Quizzes (questions are drawn from the homework; homework need not be handed in)
• One written final exam
• One programming project
• End-of-term paper presentation (depending on enrollment)

Biography
許聞廉 Wen-Lian Hsu
• Distinguished Research Fellow, Institute of Information Science, Academia Sinica
• 1973: Bachelor's degree in Mathematics, National Taiwan University
• 1980: Ph.D. in Operations Research, Cornell University
• 1980-89: Department of Industrial Engineering, Northwestern University, USA
• 1989-: Institute of Information Science, Academia Sinica
  – Developed the 自然輸入法 (a natural Chinese input method) and the 許氏鍵盤 (Hsu's keyboard)
• Research interests:
  – Design of algorithms, artificial intelligence, natural language processing, bioinformatics, knowledge management

Scope
• Consecutive ones test
• PQ-trees and PC-trees
• Planar graphs
• Maximal Planar Subgraph Algorithm
• Chordal graphs
• Interval graphs
• Applications
  – Sequence assembly
  – Motif discovery
  – De novo sequencing
  – Protein structure prediction

A Few Examples of Mathematical Induction
• Most algorithms we design are "recursive."
• $N! = N \times (N-1)!$
• The theoretical basis is "Mathematical Induction."

A Hat Problem (I)
N prisoners are lined up in a row. Each one can see the hats of all the people in front of him. A person who guesses the color of his own hat correctly survives.
• No strategy: in the worst case, all men are shot.
• Strategy 1 (with collaboration): in the worst case, half of the men are shot.

A Hat Problem (II)
Strategy 1 (at least half survive; about 3/4 on average)
• Divide the men into two groups: odd-numbered and even-numbered.
• Each odd-numbered person calls out the color of the hat of the person directly in front of him (he can see it), so that person survives.
• As for the odd-numbered person himself, there is still a 1/2 chance that he survives.
Design a strategy so that as few men as possible die.

A Hat Problem (III)
Message Passing
Suppose we use 0 to indicate a white hat and 1 a black hat.
Let the original sequence be
  0 1 1 0 0 1 0 0 0 1 1 0 1 0 0 1 1 1
Then the sequences the first three men see are as follows:
    1 1 0 0 1 0 0 0 1 1 0 1 0 0 1 1 1
      1 0 0 1 0 0 0 1 1 0 1 0 0 1 1 1
        0 0 1 0 0 0 1 1 0 1 0 0 1 1 1
How can each man (except the first one) guess his color correctly?
Hint: use the parity (odd or even) of the number of 0's and 1's.

A Hat Problem (IV)
With the same sequences as above:
• If the current hat is 0, then moving to the next sequence changes only the parity of the number of 0's (the parity of the number of 1's stays the same).
• Everyone knows the parity of the number of 0's and 1's in the sequence in front of him.
• If the 1st person announces the parity of the number of 1's in his sequence (odd or even), then by checking whether that parity differs in his own sequence, the 2nd person knows his hat color.
• By induction, everyone afterward can compute his hat color, since he has heard all the earlier answers.

Marriage Theorem
There are n girls and n boys. Each girl has a list of boys she is willing to marry. Assume a boy never rejects a girl's offer. Under what condition can you find a perfect matching?
The following condition is both necessary and sufficient: every set of r girls, $1 \le r \le n$, likes at least r boys in total.
• Necessary: if there is a perfect matching, then the condition must hold.
• Sufficient: if the condition holds, then a perfect matching can be found. Prove this by induction (necessity is clear).
How do you reduce the problem size to a smaller one?
• Easy case: there is a subset of k girls, k < n, who like exactly k boys. By induction, these k girls can be matched with the k boys. Again by induction, the remaining n-k girls can be matched to the remaining n-k boys: any r of them together with the k girls like at least r + k boys, hence at least r boys outside the k.
• Otherwise: every set of r girls, $1 \le r < n$, likes more than r boys.
  Marry one girl to a boy she likes first. For the remaining n-1 girls and n-1 boys, the condition still holds (each set of r girls loses at most one boy), so induction applies.

Maximum Subsequence Sum Problem
• Given (possibly negative) integers $A_1, A_2, \ldots, A_N$, find the maximum value of $\sum_{k=i}^{j} A_k$ over all $1 \le i \le j \le N$
• For input -2, 11, -4, 13, -5, -2, the answer is 20 (11 - 4 + 13)

Algorithm 1: Brute-force method
• Given N integers $A_1, A_2, \ldots, A_N$, how many (contiguous) subsequences can you form?
• For example: 1, 2, 3, 4
  – The possible sums include:
    • 1, 1+2, 1+2+3, 1+2+3+4
    • 2, 2+3, 2+3+4
    • 3, 3+4
    • 4
  – Find the maximum of the above sums
  – An $O(N^3)$ solution

Algorithm 2: A Bit Clever Algorithm
• By noting $\sum_{k=i}^{j} A_k = A_j + \sum_{k=i}^{j-1} A_k$
• We can reuse the partial computation from previous steps

Algorithm 2: A Bit Clever Algorithm
• Sum(i, j) = 0 for all i, j
• For i = 1 to N
    For j = i to N
      Sum(i, j) ← Sum(i, j-1) + $A_j$
    end
  end
• An $O(N^2)$ algorithm

Algorithm 3: Divide and Conquer
• Divide Part:
  – Split the problem into two roughly equal subproblems, which are then solved recursively.
• Conquer Part:
  – Patch together the two solutions of the subproblems with a small amount of additional work.
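Before going further with divide and conquer, here is a minimal Python sketch of Algorithms 1 and 2. The function names are hypothetical, and the sketch assumes the empty subsequence is allowed, so the answer is 0 when every integer is negative. This matches the zero initialization used in Algorithm 4, but the problem statement does not say so explicitly.

```python
def max_subsequence_sum_cubic(a):
    """Algorithm 1 (brute force): for every pair (i, j), re-add a[i..j]."""
    best = 0  # assumption: the empty subsequence (sum 0) is allowed
    n = len(a)
    for i in range(n):
        for j in range(i, n):
            total = 0
            for k in range(i, j + 1):  # up to N additions per pair
                total += a[k]
            best = max(best, total)
    return best


def max_subsequence_sum_quadratic(a):
    """Algorithm 2: reuse Sum(i, j-1) when computing Sum(i, j)."""
    best = 0  # same assumption as above
    n = len(a)
    for i in range(n):
        running = 0  # plays the role of Sum(i, j-1)
        for j in range(i, n):
            running += a[j]  # Sum(i, j) = Sum(i, j-1) + A_j
            best = max(best, running)
    return best
```

Only the running total for the current i is kept, so the quadratic version does not need to store the full Sum(i, j) table.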
Algorithm 3: Divide and Conquer
• Maximum subsequence sum problem
  – Divide the sequence into two equal parts
  – The maximum subsequence sum can be found in one of three places:
    • Entirely in the left half of the input
    • Entirely in the right half of the input
    • Crossing the middle, with a part in each half

Algorithm 3: Divide and Conquer
  First Half        Second Half
  4  -3  5  -2      -1  2  6  -2
• In first half: 6 (4 - 3 + 5)
• In second half: 8 (2 + 6)
• Crosses middle: 11 (4 - 3 + 5 - 2 from the first half, plus -1 + 2 + 6 from the second half)

Algorithm 3: Divide and Conquer
• When the maximum subsequence sum crosses the middle
  – It is the maximum sum of a subsequence ending at the end of the first half plus the maximum sum of a subsequence starting at the beginning of the second half
• $T(n) = 2T(n/2) + 2 \cdot (n/2) = 2T(n/2) + n$
• An $O(n \log n)$ algorithm

Algorithm 4: The most clever one
• An improvement over Algorithm 2
• Clever observation:
  – No subsequence with a negative sum can be a prefix of the optimal subsequence

Algorithm 4: The most clever one
When does adding $A_n$ change the maximum sum $\text{maxSUM}_{n-1}$?
• Use induction. Keep two sums at each iteration and update them based on the three conditions below.
• Note that maxendSUM is the best subsequence sum ending at the right end of the sequence seen so far
[Figure: positions 1 to n-1 with $\text{maxSUM}_{n-1}$ and $\text{maxendSUM}_{n-1}$; positions 1 to n with $\text{maxSUM}_{n-1}$ and $\text{maxendSUM}_{n-1} + A_n$]
(a) If $\text{maxendSUM}_{n-1} + A_n \le 0$, then $\text{maxendSUM}_n = 0$
(b) If $\text{maxendSUM}_{n-1} + A_n > 0$, then $\text{maxendSUM}_n = \text{maxendSUM}_{n-1} + A_n$
(c) If $\text{maxendSUM}_{n-1} + A_n > \text{maxSUM}_{n-1}$, then $\text{maxSUM}_n = \text{maxendSUM}_{n-1} + A_n$; else $\text{maxSUM}_n = \text{maxSUM}_{n-1}$

Algorithm 4: The most clever one
• An $O(N)$ algorithm
  – It takes constant time to update the two sums at each iteration
• Example:
  Sequence: -2, 11, -4, 13, -5, -2
  Initially, $\text{maxendSUM}_0 = 0$, $\text{maxSUM}_0 = 0$
  i = 1: (a) $\text{maxendSUM}_1 = 0$, (c) $\text{maxSUM}_1 = 0$
  i = 2: (b) $\text{maxendSUM}_2 = 11$, (c) $\text{maxSUM}_2 = 11$
  i = 3: (b) $\text{maxendSUM}_3 = 7$, (c) $\text{maxSUM}_3 = 11$
  i = 4: (b) $\text{maxendSUM}_4 = 20$, (c) $\text{maxSUM}_4 = 20$
  i = 5: (b) $\text{maxendSUM}_5 = 15$, (c) $\text{maxSUM}_5 = 20$
  i = 6: (b) $\text{maxendSUM}_6 = 13$, (c) $\text{maxSUM}_6 = 20$

Common Computational Models
• Discrete algorithms
  – Probabilistic, approximation, on-line, randomized
• Non-linear programming (numerical)
• Statistical
  – Regression
  – Machine learning
    • Neural net, SVM, Hidden Markov Model, maximum entropy, conditional random fields
  – Evolutionary
    • Genetic algorithm, particle swarm
• Areas: NLP, ASR, IR, IE, DM
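As a recap of the maximum subsequence sum problem, the following is a minimal Python sketch of Algorithm 3 (divide and conquer) and Algorithm 4 (linear scan). The function names and the `lo`/`hi` parameters are hypothetical choices for illustration. As in the Algorithm 4 example, both sums start at 0, so the sketch assumes the empty subsequence (sum 0) is allowed.

```python
def max_sum_divide_and_conquer(a, lo=0, hi=None):
    """Algorithm 3 on a[lo:hi]: best of left half, right half, crossing sum."""
    if hi is None:
        hi = len(a)
    if hi - lo == 0:
        return 0  # assumption: the empty subsequence counts as sum 0
    if hi - lo == 1:
        return max(a[lo], 0)
    mid = (lo + hi) // 2
    best_left = max_sum_divide_and_conquer(a, lo, mid)
    best_right = max_sum_divide_and_conquer(a, mid, hi)

    # Best sum ending at the last element of the left half (0 if none is positive).
    total, left_border = 0, 0
    for k in range(mid - 1, lo - 1, -1):
        total += a[k]
        left_border = max(left_border, total)

    # Best sum starting at the first element of the right half (0 if none is positive).
    total, right_border = 0, 0
    for k in range(mid, hi):
        total += a[k]
        right_border = max(right_border, total)

    return max(best_left, best_right, left_border + right_border)


def max_sum_linear(a):
    """Algorithm 4: keep maxendSUM and maxSUM, update each in constant time."""
    max_end_sum = 0  # best sum of a subsequence ending at the current position
    max_sum = 0      # best sum seen anywhere so far
    for x in a:
        # Reset to 0 when extending the run would not give a positive sum.
        max_end_sum = max(max_end_sum + x, 0)
        # The overall maximum changes only if the current run beats it.
        max_sum = max(max_sum, max_end_sum)
    return max_sum
```

On the divide-and-conquer example 4, -3, 5, -2, -1, 2, 6, -2, the two border scans give 4 and 7, which is where the crossing sum of 11 comes from.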