Ph.D. Qualifying Exam in CSE, Fall 2006
October 30th, 2006, 9am-5pm

This exam is composed of four sections: numerical methods, discrete algorithms, modeling and simulation, and high-performance computing. Each section has five questions. You are expected to answer six questions from the two subareas you chose on the CSE Qualifying Exam Form (three from each subarea, or four from one and two from the other). This is an open-book, open-note exam, but you are not allowed to ask others for help. To save time, you need NOT type your solutions using a computer, especially for questions with complex equations. Please return your finished exam to Barbara Binder by 5pm. Good luck!

1 Numerical methods

1. Consider the linear system $Ax = b$, where $A$ is an $n \times n$ matrix with $A(i,i) = 1$ for all $i$, $A(1,j) = j$ for all $j$, and all other elements are zeros, and $b$ is an $n \times 1$ vector where $b(1) = n$ and $b(i) = 1$ for all $i \ge 2$.

(a) (25%) What is the condition number of $A$ in the $L_1$ norm?

(b) (30%) Consider the $n \times 1$ vector $\tilde{x}$ whose components are all 1's as an approximate solution to the above system. Calculate the residual norm $\|b - A\tilde{x}\|_1$. Use the condition number and this residual norm to give an upper bound for the norm $\|x - \tilde{x}\|_1 / \|x\|_1$, where $x$ is the exact solution of the system. You may not actually compute the solution $x$ to the above linear system in answering this question.

(c) (45%) Suppose we have the exact solution $x$ to the above system. Now, we have a new linear system $Cx = b$ where $b$ is the same as above and $C$ is an $n \times n$ matrix with $C(i,i) = 1$ for all $i$, $C(1,j) = j$ for $1 \le j \le n-1$, $C(j,n) = 1$ for all $j$, and all other elements are zeros. Present a fast algorithm for computing the solution of the new system $Cx = b$. What is the computational complexity of your algorithm? The faster your algorithm is, the better.

2.
Let
$$A = \begin{pmatrix} a^T \\ \hat{A} \end{pmatrix} \in \mathbb{R}^{m \times n}, \quad m > n, \quad \mathrm{rank}(A) = n.$$
Assume that we have the reduced QR decomposition $A = QR$, where
$$Q = \begin{pmatrix} q_1^T \\ Q_1 \end{pmatrix} \in \mathbb{R}^{m \times n}$$
has orthonormal columns and $R \in \mathbb{R}^{n \times n}$ is upper triangular. We want to compute the reduced QR decomposition of $\hat{A}$ as efficiently as possible using the $Q$ and $R$ factors of $A$ that we already have. We can do this by first orthogonalizing the unit vector
$$e_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$$
to the columns of $Q$.

(a) (20%) Suppose the reduced QR decomposition of $(Q \;\; e_1)$ is
$$\begin{pmatrix} q_1^T & 1 \\ Q_1 & 0 \end{pmatrix} = \begin{pmatrix} q_1^T & \gamma \\ Q_1 & h \end{pmatrix} \begin{pmatrix} I & x \\ 0 & \alpha \end{pmatrix}. \tag{1}$$
Express $x$ and $\alpha$ in terms of $Q_1$, $q_1$, $\gamma$, or $h$.

(b) (25%) How can the QR decomposition (1) be obtained most efficiently?

(c) (40%) After the QR decomposition of $(Q \;\; e_1)$ is obtained, how would you proceed to obtain the QR decomposition of $\hat{A}$? Present your algorithm and its computational complexity. The less computational complexity your algorithm requires, the better it is.

(d) (15%) Discuss what happens to your algorithm if the matrix $A$ does not have full rank.

3. Given an $n$-by-$n$ matrix $W = [w_{ij}]$, $i, j = 1, \ldots, n$, and $w_{ij} \ge 0$. For $x = [x_1, \ldots, x_n]^T$, define
$$f(x) = \sum_{i,j=1}^{n} w_{ij} (x_i - x_j)^2.$$
Consider the following optimization problem,
$$\min \left\{ f(x) \;\middle|\; \sum_{i=1}^{n} x_i = 0, \ \sum_{i=1}^{n} x_i^2 = 1 \right\}.$$

(a) (50%) Show that the above optimization problem is equivalent to finding the eigenvector corresponding to the second smallest eigenvalue of the following matrix
$$L = D - W,$$
where $D = \mathrm{diag}(d_1, \ldots, d_n)$ is a diagonal matrix with
$$d_i = \sum_{j=1}^{n} w_{ij}.$$
Hint: Use direct computation to rewrite $f(x)$ in terms of $L$ and $x$.

(b) (50%) Consider a graph $G$ with $n$ vertices $v_1, \ldots, v_n$, and there is an edge between $v_i$ and $v_j$ if and only if $w_{ij} \ne 0$. Show that the second smallest eigenvalue of $L$ is positive if and only if $G$ is connected.

4. Polynomial interpolation is equivalent to solving a linear system of equations. For example, if we use monomials, we end up with a Vandermonde coefficient matrix for the linear system, and if we use Lagrange polynomials, we end up with an identity coefficient matrix.
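The two coefficient matrices mentioned in the statement above can be illustrated with a minimal sketch. The interpolation nodes and the helper names (`monomial_matrix`, `lagrange_basis`, `lagrange_matrix`) are hypothetical choices made for illustration only; they are not part of the exam. The Newton basis is deliberately left out.

```python
from fractions import Fraction

def monomial_matrix(nodes):
    """Row i holds 1, x_i, x_i^2, ...: the Vandermonde matrix for the nodes."""
    n = len(nodes)
    return [[Fraction(x) ** j for j in range(n)] for x in nodes]

def lagrange_basis(nodes, j, x):
    """Evaluate the j-th Lagrange basis polynomial for the nodes at x."""
    value = Fraction(1)
    for m, xm in enumerate(nodes):
        if m != j:
            value *= Fraction(x - xm, nodes[j] - xm)
    return value

def lagrange_matrix(nodes):
    """Entry (i, j) is the j-th Lagrange basis polynomial evaluated at node i."""
    n = len(nodes)
    return [[lagrange_basis(nodes, j, x) for j in range(n)] for x in nodes]

# Hypothetical distinct integer nodes, chosen only for illustration.
nodes = [0, 1, 3]
V = monomial_matrix(nodes)   # dense Vandermonde matrix
L = lagrange_matrix(nodes)   # identity matrix

for row in V:
    print([str(entry) for entry in row])
for row in L:
    print([str(entry) for entry in row])
```

In both cases the interpolation conditions read (matrix) times (coefficients) equals (data values); only the basis, and therefore the matrix, changes.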
(a) (25%) Show that if we use the Newton polynomials, the coefficient matrix is lower triangular.

(b) (75%) Show that the divided difference method for computing Newton's interpolation formula is a special way of solving the lower triangular system referred to in (a).

5. The differential equation
$$y'(t) = \sqrt{y(t)}$$
with initial condition $y(0) = 0$ has the solution
$$y(t) = \tfrac{1}{4} t^2.$$
The Euler scheme for this equation,
$$Y_{k+1} - Y_k = h \sqrt{Y_k},$$
with initial condition $Y_0 = 0$ has the solution $Y_k = 0$ for all $k$. Discuss why the solution of the finite difference scheme does not converge to the given solution of the differential equation. Mention appropriate theorems that give convergence results for finite difference schemes applied to differential equations.

2 Discrete algorithms

1. In computational biology, DNA can be represented as a sequence of characters drawn from an alphabet of four letters, A, C, T, and G, representing the four nucleotides. Given two sequences $S_1$ and $S_2$ of $n$ and $m$ characters, respectively, describe what is meant by a local alignment. Given a similarity score of +2, a mismatch penalty of -1, and a gap score of 0, give an efficient sequential algorithm to compute the score of the best local alignment between $S_1$ and $S_2$. What is the asymptotic complexity of your algorithm? What are the space requirements? Suppose now that you are given a multi-core processor with $p$ cores (with $1 < p < \min(n, m)$). Design and analyze a multicore algorithm for the sequence similarity problem using local alignments that scales with the number of cores.

2. The problem of multiple sequence alignment on DNA sequences is that of finding the optimal alignment of a set of three or more sequences under the sum of pairs (SP) score scheme. Assume that the score is a metric (i.e., it obeys the triangle inequality) and prove that the multiple sequence alignment problem is NP-complete.

3.
A problem of significant importance to a network designer is finding the edges in the network whose removal causes the performance of network applications to degrade the most. A most vital edge is an edge which, if removed, causes the maximum change (increase) in the cost of the minimum spanning tree (MST(G)) of the graph $G$. Let $G = (V, E)$ be a weighted undirected graph with $n$ vertices and $m$ edges; each edge $e$ has a weight $w(e)$ assigned to it. Let $f(G)$ be the weight of a minimum spanning tree of $G$ if $G$ is connected; otherwise $f(G) = \infty$. The most precious edge of $G$ is an edge $e$ such that $f(G - e) \ge f(G - e')$ for every other edge $e'$ of $G$. Give the best known sequential algorithm for solving this problem, and analyze its running time in terms of the problem size.

4. Give an $O(\log n)$ time parallel algorithm for solving the most vital edge problem on the concurrent read, exclusive write PRAM model.

5. Although merge sort runs in $\Theta(n \lg n)$ worst-case time and insertion sort runs in $\Theta(n^2)$ worst-case time, the constant factors in insertion sort make it faster for small $n$. Thus, it makes sense to use insertion sort within merge sort when subproblems become sufficiently small. Consider a modification to merge sort in which $n/k$ sublists of length $k$ are sorted using insertion sort and then merged using the standard merging mechanism, where $k$ is a value to be determined.

(a) In the first part of the modified algorithm, the $n/k$ sublists, each of length $k$, can be sorted by insertion sort. Analyze the worst-case running time for this step.

(b) In the second part of the modified algorithm, the sublists can be merged together. Analyze the worst-case running time for this step.

(c) What is the largest asymptotic ($\Theta$-notation) value of $k$ as a function of $n$ for which the modified algorithm has the same asymptotic running time as standard merge sort?

(d) How should $k$ be chosen in practice?

3 Modeling and simulation

1.
Several algorithms have been proposed in the parallel discrete event simulation literature to relax message ordering in order to improve performance. One approach is to use time intervals rather than precise time stamps on events, to indicate that the event could happen at any time within the specified interval.

(a) Describe precisely the partial ordering that would apply using only time intervals to order events.

(b) Modify the Time Warp algorithm to work with time intervals, and specify the local control algorithm that would be used.

(c) Define global virtual time using time intervals, and give an algorithm for computing its value.

2. Design a synchronization protocol where the topology of logical processes is always organized as a tree, and events (messages) are only sent down the tree, i.e., from a parent to a child node. State all assumptions in your protocol. Assume the tree can have arbitrary fanout (i.e., a node can have any number of child nodes). Your solution to this problem must exploit the fact that the topology is a tree, i.e., you will receive zero credit if you simply use one of the standard conservative synchronization algorithms that works for arbitrary topologies. Suppose the topology of logical processes is acyclic, but not necessarily a tree. Does your algorithm still work? Explain why or why not.

3. Suppose you are given a trace of a parallel discrete event simulation program that indicates what events are executed on each logical process, and what events are scheduled by each event. How would you determine the minimum execution time of this program assuming an unlimited number of processors, and zero time overheads for communication, synchronization, etc.? Assume conservative synchronization is used. Write an algorithm for computing this lower bound.

4. Suppose that the altitude of the trajectory of a projectile is described by the second-order ODE $u'' = -4$.
Suppose that the projectile is fired from position $t = 0$ and height $u(0) = 1$ and is to strike a target at position $t = 1$, also of height $u(1) = 1$.

(a) Solve this problem by the shooting method. To determine the initial slope at $t = 0$ required to hit the desired target at $t = 1$, use the trapezoid rule with step size $h = 1$ to derive a system of two equations for the unknown initial slope $s_0 = u'(0)$ and final slope $s_1 = u'(1)$. What are the resulting values for the initial and final slope?

(b) Solve the same BVP again using collocation at the time $t = 0.5$, together with the boundary values, to determine a quadratic polynomial $u(t)$ approximating the solution. What is the resulting approximate height of the projectile at the point $t = 0.5$?

(c) Comment on the advantages and disadvantages of the two different methods.

5. The diffusion equation describes the change of density in a material undergoing diffusion. A particular form is the heat equation, generally written as
$$\frac{\partial \phi}{\partial t} = D \nabla^2 \phi(\vec{x}, t),$$
where $\nabla^2$ denotes the Laplace operator, and $D$ denotes the diffusion coefficient.

(a) Describe the discretization of the heat equation in two spatial dimensions using the Crank-Nicolson method, which uses centered differences in space and in time (at time $t + \tfrac{1}{2}\Delta t$, where $\Delta t$ denotes the time step).

(b) What is the order of accuracy of this discretization in time and space? Show the derivation of its accuracy in time.

(c) Under what time step is this method stable? Outline a brief argument in three or four sentences to explain how this stability limit can be proven (you do not need to give the full proof).

4 High-performance computing

1. Given an array $A$ of $n$ elements, we define the $n$ prefix-sums $B$ of $A$ as follows. Let $b_0 = a_0$. Let $b_i = b_{i-1} + a_i$, for $1 \le i < n$.

(a) Give an optimal RAM sequential algorithm to compute $B$ and analyze its asymptotic time complexity.
(b) Explain the difference in performance one would observe running this algorithm on two hypothetical computers, one with a reasonable-sized cache and the other without any caches. Assume that $A$ and $B$ fit within main memory.

(c) Explain the principle that caches are designed to exploit, and whether or not this code exhibits this principle.

(d) Assume now that $A$ exceeds the capacity of main memory and resides on tertiary storage (e.g., disk). Give an optimal external memory algorithm for computing the prefix-sums $B$. Estimate the ratio in performance between the in-memory and external memory approaches.

2. The design of microprocessors has abruptly switched to multi-core designs where two or more independent processing cores are packaged together on the same chip. We expect the number of cores per chip to grow to counts of 64 or more.

(a) Give a computational model that may be suitable for multi-core chips. Argue why this model is reasonable. Describe what critical aspects of design it encourages, and articulate any major issues it ignores.

(b) Design and analyze a scalable multi-core algorithm for computing optimal prefix-sums. Is this algorithm in your model? Why or why not?

3. Distributed memory machines, such as cluster computers, are often used to solve large-scale problems that require high-performance computing. In this problem, we are using a cluster of $p$ nodes, where communication between two nodes takes $t + lw$ time to send a point-to-point message of length $l$ between any pair of processors. Assume that the network has sufficient bandwidth such that no congestion occurs. We are given an array $A$ of size $n$, where $p$ evenly divides $n$. The array $A$ is stored in a block layout on the cluster such that the first $n/p$ elements of $A$ are on node $P_0$, the second $n/p$ elements of $A$ are on node $P_1$, and so on. Let $n > p^2$, and assume each node has $O(n/p)$ memory. You may assume that $n$ is a power of two. Design a message-passing algorithm to compute the prefix-sums of $A$.
Analyze the time complexity in terms of computation and communication. Derive the speedup compared with the best sequential algorithm. For what values of $n$ and $p$ can the maximum speedup be achieved?

4. Superlinear speedup is defined empirically as cases where a problem runs more than $p$ times faster on $p$ processors than it did on a single processor. Can a parallel algorithm have superlinear absolute speedup? Why or why not? Describe at least two common causes for reports of superlinear speedup in the literature.

5. Answer the following questions related to high-performance computer architecture.

(a) Identify the principal difference between a superscalar architecture and a VLIW architecture.

(b) Why does the addition of multithreading to a processor help that processor to tolerate memory and communication latencies?

(c) How does predication improve processor performance?

(d) Describe two redundancy schemes used to provide higher reliability in disk arrays.

(e) Compare and contrast a 128-processor cluster versus a 128-processor symmetric multiprocessor in terms of power, price, performance, applications, reliability, etc.
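The empirical definition of superlinear speedup used in question 4 of this section can be written down as a small sketch. The function names and the timing values are hypothetical, invented for illustration; they are not measurements and they do not address whether such a speedup can occur in an absolute sense.

```python
def speedup(t_serial, t_parallel):
    """Ratio of the single-processor run time to the p-processor run time."""
    return t_serial / t_parallel

def is_superlinear(t_serial, t_parallel, p):
    """True when the run on p processors is more than p times faster."""
    return speedup(t_serial, t_parallel) > p

# Hypothetical timings in seconds, made up for illustration only.
t1 = 120.0   # run time on one processor
t8 = 12.0    # run time on p = 8 processors

print(speedup(t1, t8))            # measured speedup
print(is_superlinear(t1, t8, 8))  # compares the speedup against p
```

Which single-processor run time serves as the baseline (the best sequential algorithm, or the parallel code run on one processor) is left open here, since that distinction is what the question asks about.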