Statistics 562, Winter 2004

Weighted Least Squares for Categorical Data

INTRODUCTION

Maximum Likelihood Estimation has been used to estimate parameters for logit and cumulative models, among others. It is the most common approach for categorical data. We may want to use other approaches, such as modeling mean responses, proportions, and more complicated functions. Weighted Least Squares is an extension of ordinary least squares that allows for correlated responses and non-constant variance.

[Figure: Example of Nonconstant Variance. Residuals (vertical axis, -20000 to 20000) plotted against fitted values.]

Weighted Least Squares attempts to give each data point its proper amount of influence over the parameter estimates. It is not tied to a particular type of function describing the relationship between variables, and it can be used with functions that are either linear or nonlinear in the parameters. Note that Weighted Least Squares estimators have asymptotic behavior similar to that of Maximum Likelihood estimators.

THEORY

The underlying structure for weighted least squares for categorical data is a contingency table. Suppose we have a categorical response variable Y with J categories. Consider independent multinomial samples of sizes $n_{1+}, n_{2+}, \ldots, n_{I+}$ at I levels of an explanatory variable. Let $\pi = (\pi_1', \pi_2', \ldots, \pi_I')'$, where $\pi_i = (\pi_{1|i}, \pi_{2|i}, \ldots, \pi_{J|i})'$ with $\sum_j \pi_{j|i} = 1$ denotes the conditional distribution of Y at level i.

                     Response
Group      1      2      ...    J      Total
1          n11    n12    ...    n1J    n1+
2          n21    n22    ...    n2J    n2+
...        ...    ...    ...    ...    ...
I          nI1    nI2    ...    nIJ    nI+

We define the sample proportions $p_{ij} = n_{ij}/n_{i+}$ and let $p_i = (p_{i1}, p_{i2}, \ldots, p_{iJ})'$ be the vector of sample proportions at level i of the explanatory variable. When the samples are independent, the estimated covariance matrix of the proportions in the i-th row is

$$V_i = \frac{1}{n_{i+}} \begin{pmatrix} p_{i1}(1 - p_{i1}) & -p_{i1}p_{i2} & \cdots & -p_{i1}p_{iJ} \\ -p_{i2}p_{i1} & p_{i2}(1 - p_{i2}) & \cdots & -p_{i2}p_{iJ} \\ \vdots & \vdots & \ddots & \vdots \\ -p_{iJ}p_{i1} & -p_{iJ}p_{i2} & \cdots & p_{iJ}(1 - p_{iJ}) \end{pmatrix}$$

The covariance matrix for the entire table is block diagonal:

$$V = \begin{pmatrix} V_1 & 0 & \cdots & 0 \\ 0 & V_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & V_I \end{pmatrix}$$

MODEL ESTIMATION

Let $F(p)$ denote the vector of $u \le I(J - 1)$ sample response functions.
The corresponding population response functions are

$$F(\pi) = [F_1(\pi), F_2(\pi), \ldots, F_u(\pi)]'$$

These could represent mean scores, proportions, logits, and other functions. The choice of response function depends on what is being explored. A few common forms of F(p):

$F(p) = Ap$, where A is a matrix of known constants.

$F(p) = A \log(p)$, where log transforms a vector to the corresponding vector of natural logarithms and A is orthogonal to the vector $\mathbf{1}$. This is the loglinear model.

Recall the delta method for a vector of functions $g(t) = [g_1(t), g_2(t), \ldots, g_q(t)]'$ of an asymptotically normal random vector $T_n$. If $\sqrt{n}(T_n - \theta)$ converges in distribution to $N(0, \Sigma)$, then

$$\sqrt{n}\,[g(T_n) - g(\theta)] \xrightarrow{d} N\!\left(0,\ \frac{\partial g}{\partial \theta}\, \Sigma\, \left(\frac{\partial g}{\partial \theta}\right)'\right)$$

The asymptotic variance of F(p) depends on

$$Q = \left[\frac{\partial F_k(\pi)}{\partial \pi_{j|i}}\right],$$

a $u \times IJ$ matrix for k = 1, …, u and all IJ combinations (i, j).

The asymptotic covariance of F(p) is $V_F = QVQ'$. $V_F$ is the weight component of weighted least squares, and it depends on the form of the response functions. The sample version of $V_F$ is denoted $\hat{V}_F$; it is obtained by substituting the sample proportions into Q and V. $\hat{V}_F$ is usually nonsingular when the sample sizes $n_{i+}$ are sufficiently large ($n_{i+} \ge 25$).

For the model $F(\pi) = X\beta$, we obtain the parameter estimates by minimizing the quadratic form

$$[F(p) - X\beta]'\, \hat{V}_F^{-1}\, [F(p) - X\beta]$$

The Weighted Least Squares estimate is

$$b = (X' \hat{V}_F^{-1} X)^{-1} X' \hat{V}_F^{-1} F(p)$$

The Weighted Least Squares estimate has an asymptotic multivariate normal distribution with estimated covariance matrix

$$\widehat{\mathrm{Cov}}(b) = (X' \hat{V}_F^{-1} X)^{-1}$$

Recall that this method is similar to parameter estimation for Ordinary Least Squares, where we minimized $(y - X\beta)'(y - X\beta)$ and found $\hat{\beta} = (X'X)^{-1} X'y$. In fact, if $V_F$ is a constant multiple of the identity matrix I, the Weighted Least Squares estimate equals the Ordinary Least Squares estimate.

The estimated covariance matrix of the predicted value $\hat{F} = Xb$ is

$$\hat{V}_{\hat{F}} = X (X' \hat{V}_F^{-1} X)^{-1} X'$$

TESTING MODEL ADEQUACY

H0: $F(\pi) - X\beta = 0$

We test the model using the goodness-of-fit statistic

$$W = [F(p) - Xb]'\, \hat{V}_F^{-1}\, [F(p) - Xb]$$

W is asymptotically chi-squared for large sample sizes ($n_{i+} \ge 25$), with degrees of freedom equal to u minus the number of parameters in $\beta$.
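The estimation and goodness-of-fit formulas above can be sketched in a few lines of Python for the mean response function. This is an illustration under stated assumptions, not the procedure SAS uses internally: the helper names (`transpose`, `matmul`, `inverse`, `mean_response`, `wls_fit`), the integer scores, and the 3 x 3 table of counts are all hypothetical. For a mean response, $F_i(p) = \sum_j a_j p_{ij}$ with scores $a_j$, so $V_F = QVQ'$ is diagonal with entries $[\sum_j a_j^2 p_{ij} - (\sum_j a_j p_{ij})^2]/n_{i+}$.

```python
# Minimal WLS sketch for a mean response function (pure Python, no libraries).
# All function names and the example counts are hypothetical illustrations.

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def inverse(A):
    """Gauss-Jordan inverse with partial pivoting; intended for small matrices."""
    n = len(A)
    M = [[float(v) for v in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[piv][c]) < 1e-12:
            raise ValueError("matrix is singular")
        M[c], M[piv] = M[piv], M[c]
        d = M[c][c]
        M[c] = [v / d for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def mean_response(counts, scores):
    """Return F(p) (one mean score per row) and the diagonal matrix V_F."""
    F, variances = [], []
    for row in counts:
        n = sum(row)
        p = [c / n for c in row]
        mean = sum(s * pj for s, pj in zip(scores, p))
        second_moment = sum(s * s * pj for s, pj in zip(scores, p))
        F.append(mean)
        variances.append((second_moment - mean * mean) / n)
    u = len(F)
    VF = [[variances[i] if i == j else 0.0 for j in range(u)] for i in range(u)]
    return F, VF

def wls_fit(F, VF, X):
    """Return (b, Cov(b), W) for the model F(pi) = X beta."""
    Fcol = [[f] for f in F]
    Vinv = inverse(VF)
    XtVinv = matmul(transpose(X), Vinv)
    cov_b = inverse(matmul(XtVinv, X))           # (X' V^-1 X)^-1
    b = matmul(cov_b, matmul(XtVinv, Fcol))      # (X' V^-1 X)^-1 X' V^-1 F(p)
    fitted = matmul(X, b)
    resid = [[f[0] - xb[0]] for f, xb in zip(Fcol, fitted)]
    W = matmul(matmul(transpose(resid), Vinv), resid)[0][0]
    return [v[0] for v in b], cov_b, W

# Hypothetical table: 3 groups, 3 ordered response categories scored 1, 2, 3.
counts = [[10, 10, 10], [5, 10, 15], [5, 5, 20]]
scores = [1, 2, 3]
X = [[1, 0], [1, 1], [1, 2]]                     # intercept plus linear trend
F, VF = mean_response(counts, scores)
b, cov_b, W = wls_fit(F, VF, X)
print("b =", b, "W =", W, "df =", len(F) - len(X[0]))
```

With a saturated design matrix (here the 3 x 3 identity) the fit reproduces F(p) exactly and W is zero, and with $V_F$ proportional to the identity the estimate coincides with Ordinary Least Squares, as noted above.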
IN COMPARISON TO MAXIMUM LIKELIHOOD

In the past Weighted Least Squares was computationally simple in comparison with Maximum Likelihood. This is no longer true.

When a model holds, with large expected cell frequencies, Maximum Likelihood and Weighted Least Squares give similar results.

Maximum Likelihood is more efficient when zero cell counts are present.

Maximum Likelihood is more efficient when there are many explanatory variables.

Maximum Likelihood is more efficient when explanatory variables are continuous.

However, iteratively reweighted least squares is the method for finding the Maximum Likelihood estimates using Fisher scoring. (See Section 4.6.2 in Agresti)

WLS ESTIMATION FOR A MEAN RESPONSE

                                Job Satisfaction
Gender   Income ($)    Very Dis   Little Sat   Mod Sat   Very Sat
F        <5000             1          3           11         2
F        5k – 15k          2          3           17         3
F        15k – 25k         0          0            8         5
F        >25k              0          2            4         2
M        <5000             1          1            2         1
M        5k – 15k          0          3            5         1
M        15k – 25k         0          0            7         3
M        >25k              0          1            9         6

SAS Code for Mean Response

    data jobsat;
      input gender income satisf count;
      * add a small constant to each count, a common device for handling zero cells in WLS;
      count2 = count + .01;
      datalines;
    1 1 1 1
    …
    proc catmod;  * mean response model;
      weight count2;
      population gender income;
      response means;
      model satisf = income / WLS freq prob;
    run;

The saturated model gives the following:

Analysis of Variance

Source           DF   Chi-Square   Pr > ChiSq
Intercept         1      1323.78       <.0001
gender            1         0.00       0.9517
income            3        10.86       0.0125
gender*income     3         1.36       0.7160
Residual          0          .            .
Analysis of Weighted Least Squares Estimates

Parameter               Estimate   Standard Error   Chi-Square   Pr > ChiSq
Intercept                 2.9908       0.0822          1323.78       <.0001
gender          0         0.00498      0.0822             0.00       0.9517
income          1        -0.2798       0.1904             2.16       0.1418
                2        -0.1828       0.1223             2.23       0.1350
                3         0.2994       0.1122             7.12       0.0076
gender*income   0 1      -0.1168       0.1904             0.38       0.5398
                0 2      -0.0364       0.1223             0.09       0.7657
                0 3       0.00169      0.1122             0.00       0.9880

Reducing the model to include only gender and income gives:

Analysis of Variance

Source           DF   Chi-Square   Pr > ChiSq
Intercept         1      2137.02       <.0001
gender            1         0.08       0.7732
income            3        10.84       0.0126
Residual          3         1.36       0.7160

Analysis of Weighted Least Squares Estimates

Parameter               Estimate   Standard Error   Chi-Square   Pr > ChiSq
Intercept                 3.0365       0.0657          2137.02       <.0001
gender          0         0.0198       0.0688             0.08       0.7732
income          1        -0.2266       0.1374             2.72       0.0992
                2        -0.2107       0.1079             3.81       0.0510
                3         0.2527       0.1011             6.24       0.0125

Lastly, we reduce the model to include only income to give:

Analysis of Variance

Source           DF   Chi-Square   Pr > ChiSq
Intercept         1      2175.77       <.0001
income            3        13.13       0.0044
Residual          4         1.44       0.8375

Analysis of Weighted Least Squares Estimates

Parameter               Estimate   Standard Error   Chi-Square   Pr > ChiSq
Intercept                 3.0338       0.0650          2175.77       <.0001
income          1        -0.2389       0.1307             3.34       0.0676
                2        -0.2149       0.1069             4.04       0.0445
                3         0.2568       0.1001             6.58       0.0103

These estimates show that for the lowest income level (less than 5000) the fitted mean job satisfaction is 3.0338 – 0.2389 = 2.7949, and so on.

In comparison, when we model the mean job satisfaction using Maximum Likelihood, we find that both income and gender are significant to the model. The model obtained is

$$\hat{M} = 2.59 + 0.181x - 0.030g$$

where x represents income level (1, 2, 3, 4) and g represents gender (1 = females, 0 = males). Using this model we find that for the lowest income level (x = 1), the mean job satisfaction is 2.59 + 0.181 – 0.030 = 2.741 for females and 2.59 + 0.181 = 2.771 for males.

OTHER EXPLORATION TO BE DONE

Perform contrast tests to explore the possible differences in parameter estimates for different income levels.
H0: $C\beta = 0$

We test the parameters using the Wald statistic

$$W_C = (Cb)'\, [C (X' \hat{V}_F^{-1} X)^{-1} C']^{-1}\, (Cb)$$

$W_C$ is asymptotically chi-squared with degrees of freedom equal to the number of linearly independent rows in the contrast matrix C.

Use a data set that has larger cell frequencies and compare the results from Weighted Least Squares and Maximum Likelihood.

Use other response functions such as adjacent-categories logits and cumulative logits.
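The Wald contrast statistic above can be sketched in Python as follows. The function name `wald_contrast` and the helpers are hypothetical, and the sketch assumes C has full row rank so that the degrees of freedom equal the number of rows of C. The example takes the estimates and standard errors from the income-only output. Because the handout does not report the off-diagonal covariances of b, they are set to zero here as a placeholder, so the example is limited to a contrast that involves a single parameter.

```python
# Minimal sketch of the Wald contrast test W_C for H0: C beta = 0.
# Function names are hypothetical; only the Python standard library is used.

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def inverse(A):
    """Gauss-Jordan inverse with partial pivoting; intended for small matrices."""
    n = len(A)
    M = [[float(v) for v in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[piv][c]) < 1e-12:
            raise ValueError("matrix is singular")
        M[c], M[piv] = M[piv], M[c]
        d = M[c][c]
        M[c] = [v / d for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def wald_contrast(b, cov_b, C):
    """Return (W_C, df); assumes the rows of C are linearly independent."""
    Cb = matmul(C, [[v] for v in b])
    middle = inverse(matmul(matmul(C, cov_b), transpose(C)))  # [C Cov(b) C']^-1
    Wc = matmul(matmul(transpose(Cb), middle), Cb)[0][0]
    return Wc, len(C)

# Estimates and standard errors from the income-only model output.
b = [3.0338, -0.2389, -0.2149, 0.2568]
se = [0.0650, 0.1307, 0.1069, 0.1001]
# ASSUMPTION: off-diagonal covariances are not reported, so they are set to 0.
cov_b = [[se[i] ** 2 if i == j else 0.0 for j in range(4)] for i in range(4)]

C = [[0, 0, 0, 1]]              # tests H0: income level 3 effect = 0
Wc, df = wald_contrast(b, cov_b, C)
print(round(Wc, 2), df)         # prints 6.58 1
```

The result agrees with the chi-square of 6.58 reported for income level 3 in the income-only output. A contrast comparing two income levels, such as C = [[0, 1, -1, 0]], would need the full estimated covariance matrix $(X' \hat{V}_F^{-1} X)^{-1}$ rather than the zero placeholder used here.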