COMP 382: Reasoning about Algorithms
Homework 3
Konstantinos Mamouras
Michael Burke
Due on September 26, 2023
(released on September 16, 2023)
1 String Merging [15 points]
Let X, Y, and Z be three strings, where |X| = m, |Y| = n, and |Z| = m + n. We say that Z
merges X and Y iff Z can be formed by interleaving the characters from X and Y in a way
that maintains the left-to-right ordering of the characters from each string. For example,
both abucvwx and uavwbxc merge the strings abc and uvwx.
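The definition can be checked directly with a short recursive function. The sketch below is only an illustration of the definition: the name `merges` is hypothetical, the check runs in exponential time in the worst case, and it is not the efficient dynamic programming algorithm that the questions ask for.

```python
def merges(x: str, y: str, z: str) -> bool:
    """Return True iff z interleaves x and y, keeping each string's order.

    Direct translation of the definition (exponential in the worst case).
    """
    if len(z) != len(x) + len(y):
        return False
    if not z:
        # All three strings are empty: the empty string merges two empty strings.
        return True
    # The first character of z must come from the front of x or the front of y.
    take_x = bool(x) and x[0] == z[0] and merges(x[1:], y, z[1:])
    take_y = bool(y) and y[0] == z[0] and merges(x, y[1:], z[1:])
    return take_x or take_y


print(merges("abc", "uvwx", "abucvwx"))  # first example from the text
print(merges("abc", "uvwx", "uavwbxc"))  # second example from the text
```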
(1) [5 points] Give an efficient dynamic programming algorithm that determines whether
Z merges X and Y. What is the time complexity of your algorithm?
(2) [10 points] We will now generalize the problem so that the input includes a number K
(in addition to the strings X, Y, Z). K is the maximum number of errors (substitutions)
that we can allow during the merging. That is, the computational problem is to determine
if there exists a string Z′ (of length m + n) such that (i) Z′ merges X and Y, and
(ii) Z and Z′ differ in at most K positions¹. Give an efficient dynamic programming
algorithm for this problem. What is the time complexity of your algorithm?
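Condition (ii) in part (2) bounds the Hamming distance between Z and Z′ (see footnote 1). As a minimal sketch of that condition alone, the hypothetical helpers `hamming` and `within_k_errors` below count differing positions; they do not search for a suitable Z′, which is the actual task.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(1 for p, q in zip(a, b) if p != q)


def within_k_errors(z: str, z_prime: str, k: int) -> bool:
    """Condition (ii): z and z_prime differ in at most k positions."""
    return hamming(z, z_prime) <= k


# "abucvwx" merges abc and uvwx; changing one character gives distance 1.
print(hamming("abucvwx", "abucvwy"))
print(within_k_errors("abucvwx", "abucvwy", 1))
```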
2 Inventory Planning [20 Points]
The Rinky Dink Company makes machines that resurface ice rinks. The demand for such
products varies from month to month, and so the company needs to develop a strategy to
plan its manufacturing given the fluctuating, but predictable, demand. The company wishes
to design a plan for the next n months. For each month i, the company knows the demand d_i,
that is, the number of machines that it will sell. Let $D = \sum_{i=1}^{n} d_i$ be the total demand over
the next n months. The company keeps a full-time staff who provide labor to manufacture
up to m machines per month. If the company needs to make more than m machines in a
given month, it can hire additional, part-time labor, at a cost that works out to c dollars
per machine. Furthermore, if, at the end of a month, the company is holding any unsold
machines, it must pay inventory costs. The cost for holding j machines is given as a function
h(j) for j = 1, 2, ..., D, where h(j) ≥ 0 for 1 ≤ j ≤ D and h(j) ≤ h(j + 1) for 1 ≤ j ≤ D − 1.
Give an algorithm that calculates a plan for the company that minimizes its costs while
fulfilling all the demand. The running time should be polynomial in n and D.

¹ This is the same as saying that the Hamming distance between Z and Z′ is at most K. The Hamming
distance between two strings A and B of the same length n is defined as
Ham(A, B) = |{i ∈ {1, ..., n} | A[i] ≠ B[i]}|.
In other words, Ham(A, B) is the number of positions at which A and B differ.
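To make the cost model of the inventory planning problem concrete, the following sketch evaluates the cost of one given production plan. It does not search for an optimal plan. The function name `plan_cost` and its parameters are hypothetical, and it assumes that the part-time cost c applies to each machine made beyond m in a month and that h is charged on the stock left at the end of each month.

```python
from typing import Callable, Sequence


def plan_cost(
    demand: Sequence[int],
    production: Sequence[int],
    m: int,
    c: float,
    h: Callable[[int], float],
) -> float:
    """Cost of producing production[i] machines in month i (assumed model)."""
    if len(demand) != len(production):
        raise ValueError("need one production figure per month")
    inventory = 0
    total = 0.0
    for d_i, p_i in zip(demand, production):
        # Part-time labor: c dollars for each machine beyond the staff's m.
        total += c * max(0, p_i - m)
        # Stock at the end of the month, after selling d_i machines.
        inventory += p_i - d_i
        if inventory < 0:
            raise ValueError("plan does not fulfil the demand")
        if inventory > 0:
            total += h(inventory)  # h(j) is only defined for j >= 1
    return total


# Two months with demands 2 and 5, staff capacity m = 3, c = 10, h(j) = 2j.
print(plan_cost([2, 5], [3, 4], m=3, c=10, h=lambda j: 2 * j))
print(plan_cost([2, 5], [2, 5], m=3, c=10, h=lambda j: 2 * j))
```

In this small instance, building one machine early and paying to store it is cheaper than making all five machines in the second month.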
3 Optimal Matching of Sequences [30 Points]
For an integer n ≥ 1, we define [n] = {1, ..., n}. Suppose we are given as input sequences
(i.e., arrays) of integers X = [x_1, ..., x_m] and Y = [y_1, ..., y_n]. A matching for (X, Y) is a
subset M ⊆ [m] × [n] (containing pairs of indexes) that satisfies the following properties:
(i) (Left covered) For every index i ∈ [m] there is some j ∈ [n] such that (i, j) ∈ M.
(ii) (Right covered) For every index j ∈ [n] there is some i ∈ [m] such that (i, j) ∈ M.
(iii) (No crossing) There are no indexes i, i′ ∈ [m] and j, j′ ∈ [n] with i < i′ and j < j′
such that (i, j′) ∈ M and (i′, j) ∈ M.
The cost of a matching M for (X, Y) is defined as follows:
$$\mathrm{cost}(M) = \sum_{(i,j) \in M} (x_i - y_j)^2.$$
We are interested in finding an optimal matching, that is, a matching with minimal cost.
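The three properties and the cost function can be checked mechanically for a candidate set of pairs. The sketch below only validates and scores a given M; it does not find an optimal matching. The names `is_matching` and `matching_cost` are hypothetical, and indexes are 1-based as in the text.

```python
from typing import Iterable, Sequence, Tuple

Pair = Tuple[int, int]


def is_matching(M: Iterable[Pair], m: int, n: int) -> bool:
    """Check properties (i)-(iii) for a set of 1-based index pairs."""
    pairs = set(M)
    if any(not (1 <= i <= m and 1 <= j <= n) for i, j in pairs):
        return False
    left_covered = {i for i, _ in pairs} == set(range(1, m + 1))
    right_covered = {j for _, j in pairs} == set(range(1, n + 1))
    # Two pairs cross when one has the smaller left index
    # but the larger right index.
    crossing = any(
        i < i2 and j > j2 for (i, j) in pairs for (i2, j2) in pairs
    )
    return left_covered and right_covered and not crossing


def matching_cost(M: Iterable[Pair], X: Sequence[int], Y: Sequence[int]) -> int:
    """Sum of squared differences (x_i - y_j)^2 over all pairs in M."""
    return sum((X[i - 1] - Y[j - 1]) ** 2 for i, j in set(M))


X, Y = [1, 3, 4], [1, 4]
M = {(1, 1), (2, 2), (3, 2)}
print(is_matching(M, len(X), len(Y)), matching_cost(M, X, Y))
```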
(1) [3 points] Prove that every matching contains the pairs (1, 1) and (m, n).
(2) [7 points] Let f(m, n) be the number of all possible matchings for (X, Y), where X =
[x_1, ..., x_m] and Y = [y_1, ..., y_n]. Give a recursive definition for the function f and
carefully explain how you obtained it.
(3) [20 points] Design an algorithm that computes a matching with minimal cost. Explain
why it is correct and discuss its time and space complexity.
4 Edit Distance & LCS [35 Points]
We will consider a generalization of the concept of edit distance, where the cost of an
insertion, deletion and substitution is c_i, c_d and c_s respectively. Each parameter can be
chosen to be any positive extended real number, i.e., an element of {x ∈ R | x > 0} ∪ {∞}.
Keep in mind the following properties of ∞:
∞ + ∞ = ∞, x + ∞ = ∞, and x < ∞
for every x ∈ R.
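In code, one common way to represent ∞ is the IEEE floating-point infinity. The sketch below assumes Python's `math.inf` as that representation and checks the three listed properties; the names `INF` and `x` are illustrative only.

```python
import math

INF = math.inf  # assumed stand-in for the extended real number ∞
x = 5.0         # an arbitrary finite real

print(INF + INF == INF)  # ∞ + ∞ = ∞
print(x + INF == INF)    # x + ∞ = ∞
print(x < INF)           # x < ∞
print(min(x, INF) == x)  # a minimum never prefers an infinite cost
```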
(1) [3 points] Provide pseudocode for the procedure EditDistance(X, Y, c_i, c_d, c_s), where
X[1..m] and Y[1..n] are strings and c_i, c_d, c_s are the edit operation costs. This procedure
should return the edit distance from X to Y, which we denote by D(X, Y). Recall that
D(X, Y ) = min{cost(W ) | edit sequence W for (X, Y )},
as we defined in the lectures.
(2) [12 points] Prove that D(xa, ya) = D(x, y) for all strings x, y and every letter a. [Note:
This is not a trivial claim. You should justify it with a careful proof using the definition
of edit distance.]
(3) [20 points] Let L(x, y) be the length of the longest common subsequence of strings x, y.
We know from CLRS (page 396) that L is given by the following recursive definition:
L(x, ε) = 0
L(ε, y) = 0
L(xa, ya) = L(x, y) + 1
L(xa, yb) = max(L(xa, y), L(x, yb)) when a ≠ b
for all strings x, y and letters a, b.
For strings X and Y, show how L(X, Y) can be computed in one line of code using only
one invocation of the procedure EditDistance(X, Y, c_i, c_d, c_s). Justify the correctness
of your approach with a careful proof.