MATRIX RANK
gs2011

On the first pass through multiple regression, it’s convenient to assume that $\underset{n \times p}{X}$ has full column rank. Let’s see exactly what this means. Recall that we’ve assumed n > p; this means that X is a “tall skinny” matrix with more rows than columns. Also, we can write X in the form

\[ X = \begin{pmatrix} \mathbf{1} & x_1 & x_2 & x_3 & \cdots & x_K \end{pmatrix} \]

to identify its columns. We will use here the accounting p = K + 1.

The condition for full column rank is stated thus:

\[ \underset{n \times p}{X} \text{ is full column rank} \iff \Bigl\{ \underset{n \times p}{X}\,\underset{p \times 1}{a} = \underset{n \times 1}{0} \implies \underset{p \times 1}{a} = \underset{p \times 1}{0} \Bigr\} \]

The product $\underset{n \times p}{X}\,\underset{p \times 1}{a}$ is really a linear combination of the columns, as

\[ X a = \begin{pmatrix} \mathbf{1} & x_1 & x_2 & x_3 & \cdots & x_K \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_K \end{pmatrix} = \mathbf{1} a_0 + x_1 a_1 + x_2 a_2 + x_3 a_3 + \cdots + x_K a_K \]

The condition $\underset{n \times p}{X}\,\underset{p \times 1}{a} = \underset{n \times 1}{0}$ says that some linear combination of the columns is zero. The matrix X is said to be full column rank if and only if the only linear combination of its columns that is zero is formed by the vector of zero coefficients.

A number of observations should be made.

(1) We are illustrating these notions for regression design matrices, and these have an initial column $\mathbf{1}$. The definitions of rank have no particular relationship to this $\mathbf{1}$ column.

(2) Finding a non-zero a for which X a = 0 shows that X does not have full column rank.

(3) If n < p (a condition which we are disallowing) then X cannot have full column rank by this definition. As a quick illustration, suppose that

\[ X = \begin{pmatrix} 1 & 4 & 2 \\ 1 & 0 & 7 \end{pmatrix} \]

The product X a is

\[ \begin{pmatrix} 1 & 4 & 2 \\ 1 & 0 & 7 \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ a_2 \end{pmatrix} = \begin{pmatrix} a_0 + 4 a_1 + 2 a_2 \\ a_0 + 7 a_2 \end{pmatrix} \]

There are many choices for a for which this is 0. For example

\[ a = \begin{pmatrix} 7 \\ -1.25 \\ -1 \end{pmatrix} \]

will do.

(4) In the regression context, column rank deficiency is often detected easily.
Here is a prototype example:

\[ \begin{pmatrix}
1 & 13 & 1 & 0 \\
1 & 9 & 1 & 0 \\
1 & 10 & 0 & 1 \\
1 & 4 & 0 & 1 \\
1 & 2 & 1 & 0 \\
1 & 0 & 0 & 1 \\
1 & 8 & 1 & 0 \\
1 & 6 & 1 & 0 \\
1 & 3 & 1 & 0 \\
1 & 11 & 0 & 1
\end{pmatrix} \]

The last two columns might, for example, be gender indicators. Column $x_2$ (the third column) could be the dummy variable for male subjects and column $x_3$ (the final column) could be the dummy variable for female subjects. Note that $x_2 + x_3 = \mathbf{1}$.

(5) If X has a column rank deficiency, then some columns are exact linear combinations of other columns. Columns can be removed until the matrix that remains has full column rank. The column rank of X is defined as the maximum number of (selected) columns which, if considered a matrix by themselves, would have full column rank. The matrix

\[ \begin{pmatrix}
1 & 2 & 4 \\
1 & 3 & 6 \\
1 & 9 & 18 \\
1 & 0 & 0 \\
1 & 4 & 8
\end{pmatrix} \]

has column rank 2. If we eliminate either the right column or the middle column, the resulting matrix would have full column rank 2.

(6) The matrix

\[ \begin{pmatrix}
1 & 3 & 2 \\
1 & 3 & 2 \\
1 & 3 & 2 \\
1 & 3 & 2 \\
1 & 3 & 2
\end{pmatrix} \]

has column rank 1.

There is a related concept called row rank, and it concerns linear combinations of the form $\underset{1 \times n}{c}\,\underset{n \times p}{X}$. It can be shown that row rank is exactly equal to column rank. Moreover

\[ \operatorname{rank}\Bigl( \underset{a \times b}{M} \Bigr) \le \min(a, b). \]

In the context of regression independent variable matrices X, this is another way of saying that we don’t want to consider n < p. In such a case, the rank could be at most n, and if n < p, we could not have full column rank.

Here are some additional examples. These three matrices all have full column rank.

\[ \begin{pmatrix}
1 & 23 \\
1 & 19 \\
1 & 28 \\
1 & 20 \\
1 & 34 \\
1 & 17 \\
1 & 22
\end{pmatrix}
\qquad
\begin{pmatrix}
1 & 23 & 10.4 \\
1 & 19 & 13.2 \\
1 & 28 & 12.8 \\
1 & 20 & 16.6 \\
1 & 34 & 19.4 \\
1 & 17 & 17.0 \\
1 & 22 & 9.6
\end{pmatrix}
\qquad
\begin{pmatrix}
1 & 23 & 10.4 & -2.46 \\
1 & 19 & 13.2 & 0.23 \\
1 & 28 & 12.8 & -1.77 \\
1 & 20 & 16.6 & 1.28 \\
1 & 34 & 19.4 & 3.22 \\
1 & 17 & 17.0 & -2.40 \\
1 & 22 & 9.6 & 0.71
\end{pmatrix} \]

These do not have full column rank.
(Columns of the seven-row example matrices, one column per line, in printed order.)

(1, 1, 1, 1, 1, 1, 1)
(23, 19, 28, 20, 34, 17, 22)
(23, 19, 28, 20, 34, 17, 22)
(1, 1, 1, 1, 1, 1, 1)
(1, 1, 1, 1, 1, 1, 1)
(23, 19, 28, 20, 34, 17, 22)
(15, 11, 15, 14, 13, 11, 12)
(38, 30, 43, 34, 47, 28, 34)
(1, 1, 1, 1, 1, 1, 1)
(23, 19, 28, 20, 34, 17, 22)
(15, 11, 15, 14, 13, 11, 12)
(238, 330, 413, 374, 407, 328, 334)
(8.4, 6.6, 7.8, 7.2, 6.5, 6.0, 9.3)
(-2.46, 0.23, -1.77, 1.28, 3.22, -2.40, 0.71)
(23, 19, 28, 20, 34, 17, 22)
(46, 38, 56, 40, 68, 34, 44)
(10.4, 13.2, 12.8, 16.6, 19.4, 17.0, 9.6)
(1, 1, 1, 1, 1, 1, 1)
(1, 1, 1, 0, 0, 0, 0)
(0, 0, 0, 0, 0, 1, 1)
(65, 74, 58, 61, 70, 72, 66)
(0, 0, 0, 1, 1, 0, 0)
(0.31, 0.45, 0.42, 0.36, 0.49, 0.28, 0.33)
(1, 1, 1, 1, 1, 1, 1)
(1, 1, 1, 1, 1, 1, 1)
(23, 19, 28, 20, 34, 17, 22)
(6, 6, 6, 6, 6, 6, 6)
(1, 1, 1, 0, 0, 0, 0)
(0, 0, 0, 1, 1, 0, 0)
(0, 0, 0, 0, 0, 0, 0)
(7, -1, 4, 6, -2, 5, 3)

The final example is 7-by-9. It has full row rank 7 (and thus rank 7), but it does not have full column rank. We would not use it in a regression analysis.
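
The small examples in observations (3) through (6) can be checked numerically. The following is a minimal Python sketch, assuming NumPy is available. The helper name `has_full_column_rank` is hypothetical and not part of the handout. NumPy's `numpy.linalg.matrix_rank` estimates rank from singular values with a tolerance, so this is a numerical check rather than a proof.

```python
import numpy as np


def has_full_column_rank(X):
    """Return True when the numerical rank of X equals its number of columns."""
    X = np.asarray(X, dtype=float)
    return np.linalg.matrix_rank(X) == X.shape[1]


# Observation (3): a 2-by-3 matrix (n < p) and a non-zero a with X a = 0.
X3 = np.array([[1, 4, 2],
               [1, 0, 7]], dtype=float)
a = np.array([7.0, -1.25, -1.0])
product = X3 @ a  # both entries are zero

# Observation (4): two dummy columns that sum to the column of ones.
ones = np.ones(10)
x1 = np.array([13, 9, 10, 4, 2, 0, 8, 6, 3, 11], dtype=float)
x2 = np.array([1, 1, 0, 0, 1, 0, 1, 1, 1, 0], dtype=float)
x3 = np.array([0, 0, 1, 1, 0, 1, 0, 0, 0, 1], dtype=float)
X4 = np.column_stack([ones, x1, x2, x3])

# Observation (5): the third column is twice the second.
X5 = np.array([[1, 2, 4],
               [1, 3, 6],
               [1, 9, 18],
               [1, 0, 0],
               [1, 4, 8]], dtype=float)

# Observation (6): every column is a multiple of the column of ones.
X6 = np.tile(np.array([1.0, 3.0, 2.0]), (5, 1))

rank3 = np.linalg.matrix_rank(X3)
rank4 = np.linalg.matrix_rank(X4)
rank5 = np.linalg.matrix_rank(X5)
rank6 = np.linalg.matrix_rank(X6)

# Dropping the right-hand column of X5 leaves a full-column-rank matrix.
X5_reduced = X5[:, :2]

print("X3 @ a =", product)
print("ranks:", rank3, rank4, rank5, rank6)
print("X5 full column rank?", has_full_column_rank(X5))
print("X5 without last column full column rank?", has_full_column_rank(X5_reduced))
```

In practice, a rank smaller than the number of columns is the signal to drop or recode a column (for example, one of the two gender dummies) before fitting the regression.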