free gen sets.tex compiled September 4, 2005 T. Neylon

1 The problem

In this note we will consider the following question: Suppose we are given a matrix B and we know that column vector x is in the kernel of B; that is, Bx = 0. What is the minimal information about x, in terms of coordinates, that we can obtain which determines the entire vector? Since the set {x | Bx = 0} = ker(B) is a vector space, and any vector space 𝒱 has a matrix B with ker(B) = 𝒱, we can simplify this problem to asking: what is the minimal amount of information about a vector v ∈ 𝒱 which uniquely determines v?

This relates to predicting time series in the following sense: Suppose the columns of matrix A are time series, so that the last row represents the most recent observations from each series. Suppose also that we have determined a null matrix N so that AN = 0 and we expect the new (as yet unknown) row a to follow the rule that aN = 0. Then we might be able to determine all of a using only a small subset of its values. We know that N^T a^T = 0, so the solution of the original problem also applies to this scenario, if we interpret B = N^T and x = a^T. Then we may find a minimal set of time series which must be known in order to determine the values of all other time series.

2 Definition

Let [n] denote the set {1, 2, ..., n}. For any vector v ∈ ℝ^n and set S ⊂ [n], let v(S) denote the projection of v onto its S-coordinates; that is, v(S) is the vector x ∈ ℝ^|S| such that x_i = v_{s_i}, where s_i is the ith smallest value in S (hence the sequence s_1, s_2, ..., s_|S| is the sorted version of S).

Call subset S ⊂ [n] a generating set of vector space 𝒱 iff there is a vector x ∈ ℝ^|S| so that the set 𝒱(S → x) := {v ∈ 𝒱 | v(S) = x} is a singleton. Call S ⊂ [n] a free generating set of 𝒱 iff for any x ∈ ℝ^|S|, the set 𝒱(S → x) is a singleton.
Intuitively, fixed values on a generating set uniquely determine a vector in the vector space 𝒱, and a free generating set may assume any values and still uniquely determine a vector in 𝒱.

3 Which sets are generating sets?

Several important facts about generating sets all follow from one crucial observation from elementary linear algebra. Given an m × n matrix V with rank(V) = n and rows v_1, ..., v_m, we can column reduce V to contain an identity matrix in rows S for some S ⊂ [m], |S| = n, iff the vectors in V(S) := {v_s | s ∈ S} are linearly independent.

Property 3.1 Suppose m × n matrix V has columns v^1, ..., v^n which form a basis of vector space 𝒱 ⊂ ℝ^m. Then S ⊂ [m] is a generating set of 𝒱 iff rank(V(S)) = n. Furthermore, such an S is a free generating set iff S is exactly of size n.

Proof. Let's make the preliminary observation that, given x ∈ ℝ^|S|, finding 𝒱(S → x) can be reduced to finding all solutions c to P_S V c = x, where P_S is the |S| × m matrix which projects onto the S-coordinates (so that the rows of P_S V are V(S)). This works because the columns of V are a basis of 𝒱, so every v ∈ 𝒱 equals V c for exactly one c.

First we show that V(S) contains n linearly independent rows ⇒ S is a generating set. Let T ⊂ S be of size n so that the rows of V(T) are linearly independent. In this case there must be a column reduction which transforms V in such a way that V(T) becomes the identity. We can write this as VR = U, where U(T) = I and R is the invertible matrix representing the column reduction of V. Now suppose we are given x ∈ ℝ^n and we want to find 𝒱(T → x). By the remark above, we want to solve P_T V c = x. Since P_T V = P_T U R^{-1} = I R^{-1}, the unique solution is c = Rx, and 𝒱(T → x) is the singleton containing v = VRx. To confirm that S is a generating set, let x̃ = v(S); then 𝒱(S → x̃) is indeed a singleton, because any element of it agrees with v on T ⊂ S and hence equals v. Further, if S were exactly of size n, then we would have T = S in the above, and since x was arbitrary, we see that S must then be a free generating set.
Next we see that S is a generating set ⇒ V(S) contains n linearly independent rows. Proceed by contradiction: suppose V(S) contains only k < n linearly independent rows. We will see that if any x ∈ ℝ^|S| has a nonempty set 𝒱(S → x), where 𝒱 is the vector space spanned by the columns of V, then that set has at least two elements. Clearly this is impossible for a generating set.

Find T ⊂ [m], |T| = n, so that rank(V(S ∩ T)) = |S ∩ T| = k and rank(V(T)) = n. As above, we can decompose VR = U so that R is invertible and U(T) = I. This means that the rows of U(T) are distinct canonical vectors spanning ℝ^n, so that there exist sets K ⊂ [n] and L = [n] − K such that, for any y ∈ ℝ^n, U(S ∩ T)y = y(K) and U(T − S)y = y(L).

Now suppose we have some x ∈ ℝ^|S| with nonempty 𝒱(S → x). Then there must be some c_1 ∈ ℝ^n with P_S V c_1 = x. Let y_1 = R^{-1} c_1, and choose some other y_2 so that y_1(K) = y_2(K) but y_1(L) ≠ y_2(L); this is possible because |K| = k < n, so L is nonempty. Finally, let c_2 = R y_2. We will see that this gives us two distinct vectors V c_1 ≠ V c_2 in 𝒱(S → x).

Indeed, since the rows of V(S − T) are dependent on the rows of V(S ∩ T), this relationship must still hold for U(S − T) and U(S ∩ T). We can write this as: there exists a matrix X such that U(S − T)y = X · (y(K)), using that U(S ∩ T)y = y(K). Now we may write that

\[
P_S V c_i = P_S U R^{-1} c_i = U(S)\, y_i
= \begin{pmatrix} U(S \cap T) \\ U(S - T) \end{pmatrix} y_i
= \begin{pmatrix} y_i(K) \\ X\, y_i(K) \end{pmatrix},
\]

modulo a row permutation. At the same time,

\[
P_T V c_i = P_T U R^{-1} c_i = U(T)\, y_i = y_i .
\]

These two equations show us that P_S V c_2 = P_S V c_1 = x, so that both V c_1 and V c_2 are in 𝒱(S → x); yet V c_1 ≠ V c_2, so that 𝒱(S → x) can't be a singleton.

Finally, we must see that S is a free generating set ⇒ S has exactly n elements. Indeed, if S is a free generating set, then the projection P_S is a bijective linear map P_S : 𝒱 → ℝ^|S|, which means |S| = dim(𝒱) = n. □

From this result we immediately have

Corollary 3.2 Any generating set T contains some S ⊂ T which is a free generating set.

Clearly, the free generating sets also form the bases of a matroid.
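The rank test of Property 3.1 and the reconstruction step of its proof can be sketched numerically. The following is a minimal illustration, assuming NumPy, 0-based coordinate indices (the note uses 1-based indices), and a basis matrix V with full column rank; the function names `is_generating_set`, `is_free_generating_set` and `reconstruct`, and the small example matrix, are hypothetical and not part of the note.

```python
import numpy as np

# Hypothetical helper names; indices are 0-based, unlike the 1-based [n] of the note.

def is_generating_set(V, S):
    """Property 3.1: S generates col(V) iff the rows V(S) have rank n.

    Assumes the columns of V are linearly independent (a basis matrix).
    """
    V = np.asarray(V, dtype=float)
    return np.linalg.matrix_rank(V[sorted(S), :]) == V.shape[1]

def is_free_generating_set(V, S):
    """A generating set is free iff it has exactly n elements."""
    V = np.asarray(V, dtype=float)
    return len(set(S)) == V.shape[1] and is_generating_set(V, S)

def reconstruct(V, S, x, tol=1e-9):
    """Return the unique v in col(V) with v(S) = x, or None if there is none.

    The entries of x are matched to S in sorted order, as in the definition of v(S).
    """
    V = np.asarray(V, dtype=float)
    x = np.asarray(x, dtype=float)
    S = sorted(S)
    if not is_generating_set(V, S):
        raise ValueError("S is not a generating set")
    # Solve P_S V c = x in the least-squares sense; V(S) has full column rank,
    # so the least-squares solution c is unique.
    c, *_ = np.linalg.lstsq(V[S, :], x, rcond=None)
    if not np.allclose(V[S, :] @ c, x, atol=tol):
        return None  # x is not attainable on S, so the solution set is empty
    return V @ c

# Example space (an assumption for illustration): all vectors (a, b, a + b).
V = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
v = reconstruct(V, [0, 2], [2.0, 5.0])  # a = 2 and a + b = 5
```

When S is a free generating set, the coefficient vector found here coincides with c = Rx from the proof, since VR = U with U(S) = I forces R to be the inverse of V(S). A general least-squares solve is used instead so that the same function also covers generating sets with more than n elements, where some values of x are not attainable.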
Returning briefly to the original problem, we may now begin with a matrix B whose kernel is the vector space 𝒱 = ker(B) and find a basis matrix V so that 𝒱 = col(V). In particular, let V be any full null matrix of B.

So if we return to the time series setting, then we have N as the full null matrix of some matrix A. In this case, we can find a subset S of the rows of A so that A(S)^T is a full null matrix of N^T. Any basis for row(A) will suffice. That is, the vector space is 𝒱 = row(A) and the basis matrix is V = A(S)^T. In this problem, the free generating sets correspond to the sets of columns of A which form a basis of col(A).
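The time series reading can be illustrated with a small numerical sketch. It assumes NumPy, 0-based column indices, and a toy matrix A invented for this example; the helper names `null_matrix`, `free_generating_columns` and `complete_row` are hypothetical, and the SVD and the greedy rank test are simply one convenient way to obtain a null matrix and a column basis, not a method prescribed by the note.

```python
import numpy as np

# Hypothetical helper names; column indices are 0-based.

def null_matrix(A, tol=1e-10):
    """Return N whose columns are an orthonormal basis of ker(A), so AN = 0."""
    A = np.asarray(A, dtype=float)
    _, s, vt = np.linalg.svd(A)
    rank = int(np.sum(s > tol * max(s.max(), 1.0)))
    return vt[rank:].T

def free_generating_columns(A):
    """Greedily pick columns of A that form a basis of col(A)."""
    A = np.asarray(A, dtype=float)
    chosen = []
    for j in range(A.shape[1]):
        # Keep column j only if it raises the rank of the columns kept so far.
        if np.linalg.matrix_rank(A[:, chosen + [j]]) > len(chosen):
            chosen.append(j)
    return chosen

def complete_row(A, S, known):
    """Fill in a new row a with aN = 0 from its values a(S) on the columns S.

    Assumes the columns S of A form a basis of col(A).
    """
    A = np.asarray(A, dtype=float)
    # Every column of A is a combination of the basis columns: A = A[:, S] @ C.
    C, *_ = np.linalg.lstsq(A[:, S], A, rcond=None)
    # Any a in row(A) then satisfies a = a(S) @ C.
    return np.asarray(known, dtype=float) @ C

# Toy data (an assumption): series 2 = series 0 + series 1, series 3 = 2 * series 0.
A = np.array([[1.0,  1.0,  2.0, 2.0],
              [2.0,  4.0,  6.0, 4.0],
              [3.0,  9.0, 12.0, 6.0],
              [4.0, 16.0, 20.0, 8.0]])
N = null_matrix(A)
S = free_generating_columns(A)           # columns 0 and 1
a = complete_row(A, S, [5.0, 25.0])      # approximately [5, 25, 30, 10]
```

In this toy case only two of the four series need to be observed in the new row; the remaining two values follow from the rule aN = 0.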