Statistics 562
Winter 2004
Weighted Least Squares for Categorical Data
INTRODUCTION

Maximum Likelihood Estimation has been used to estimate
parameters for models based on logit and cumulative models,
among others. It is the most common approach for categorical
data.

We may want to use approaches other than these, such as
modeling mean response, proportions, and more complicated
functions.

Weighted Least Squares is an extension of ordinary least
squares that allows for correlated responses and non-constant
variance.
[Figure: Example of Nonconstant Variance. Residuals (vertical axis, about -20000 to 20000) plotted against fitted values (horizontal axis).]

Weighted Least Squares attempts to give each data point its
proper amount of influence over parameter estimates.

Weighted Least Squares is not associated with a particular
type of function used to describe the relationship between
variables. It can be used with functions that are either linear or
nonlinear in the parameters.

Note that Weighted Least Squares estimators have asymptotic
behavior similar to that of Maximum Likelihood estimators.
THEORY
The underlying structure for weighted least squares for categorical
data is a contingency table.
Suppose we have a categorical response variable Y with
J categories.
Consider independent multinomial samples of sizes $n_{1+}, n_{2+}, \ldots, n_{I+}$ at the I levels of an
explanatory variable.
Let $\pi = [\pi_1', \pi_2', \ldots, \pi_I']'$, where $\pi_i = [\pi_{1|i}, \pi_{2|i}, \ldots, \pi_{J|i}]'$ with $\sum_j \pi_{j|i} = 1$ denotes
the conditional distribution of Y at level i.
                      Response
Group     1      2      …      J      Total
1         n11    n12    …      n1J    n1+
2         n21    n22    …      n2J    n2+
…         …      …      …      …      …
I         nI1    nI2    …      nIJ    nI+
We define the sample proportions $p_{ij} = n_{ij}/n_{i+}$ and let $p_i = (p_{i1}, p_{i2}, \ldots, p_{iJ})'$ be the
vector of sample proportions at level i of the explanatory variable.
When the samples are independent, the covariance matrix for the
proportions in the i-th row is
$$
V_i = \frac{1}{n_{i+}}
\begin{bmatrix}
p_{i1}(1 - p_{i1}) & -p_{i1}p_{i2} & \cdots & -p_{i1}p_{iJ} \\
-p_{i2}p_{i1} & p_{i2}(1 - p_{i2}) & \cdots & -p_{i2}p_{iJ} \\
\vdots & \vdots & \ddots & \vdots \\
-p_{iJ}p_{i1} & -p_{iJ}p_{i2} & \cdots & p_{iJ}(1 - p_{iJ})
\end{bmatrix}
$$
The covariance matrix for the entire table is
$$
V =
\begin{bmatrix}
V_1 & 0 & \cdots & 0 \\
0 & V_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & V_I
\end{bmatrix}
$$
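As a rough illustration of the two matrices above, the following Python sketch builds $V_i$ from one row of counts and stacks several rows into the block-diagonal V. The function names `multinomial_cov` and `block_diag` are hypothetical helpers introduced here, and the two rows of counts are the first two female rows of the job satisfaction table used in the example later in these notes.

```python
def multinomial_cov(counts):
    """Estimated covariance matrix V_i of the sample proportions in one row."""
    n = sum(counts)
    p = [c / n for c in counts]
    J = len(p)
    # Diagonal entries: p_j (1 - p_j) / n; off-diagonal entries: -p_j p_k / n.
    return [[(p[j] * (1 - p[j]) if j == k else -p[j] * p[k]) / n
             for k in range(J)] for j in range(J)]


def block_diag(blocks):
    """Place square blocks V_1, ..., V_I along the diagonal of a zero matrix."""
    size = sum(len(b) for b in blocks)
    V = [[0.0] * size for _ in range(size)]
    offset = 0
    for b in blocks:
        for r, row in enumerate(b):
            for c, val in enumerate(row):
                V[offset + r][offset + c] = val
        offset += len(b)
    return V


# Two rows of counts (females, income <5000 and 5k-15k).
rows = [[1, 3, 11, 2], [2, 3, 17, 3]]
V = block_diag([multinomial_cov(r) for r in rows])
```

Each row of a block sums to zero because the proportions in a row are constrained to sum to one, which is why $V_i$ itself is singular.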
MODEL ESTIMATION
Let F(p) denote the vector of $u \le I(J - 1)$ sample response functions, the sample version of
$$F(\pi) = [F_1(\pi), F_2(\pi), \ldots, F_u(\pi)]'$$
This could represent the mean scores, proportions, logits, and other
functions. The choice of sample response function depends on what is
being explored.
A few common choices of F(p):
F(p) = Ap, where A is a matrix of known constants.
F(p) = A log(p), where log(p) is the vector of natural logarithms
of the elements of p and the rows of A are orthogonal to the
vector 1 of ones. This is the loglinear model.
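Recall the delta method for a vector of functions $g(t) = [g_1(t), g_2(t), \ldots, g_q(t)]'$ of an
asymptotically normal random vector $T_n$: if $\sqrt{n}\,(T_n - \theta)$ is asymptotically $N(0, \Sigma)$, then
$$
\sqrt{n}\,[g(T_n) - g(\theta)] \xrightarrow{d} N\!\left(0,\ \left(\frac{\partial g}{\partial \theta}\right) \Sigma \left(\frac{\partial g}{\partial \theta}\right)'\right)
$$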
The asymptotic variance of F(p) depends on
$$Q = \left[\frac{\partial F_k(\pi)}{\partial \pi_{j|i}}\right],$$
a $u \times IJ$ matrix for k = 1, …, u and all IJ combinations (i, j).
The asymptotic covariance matrix of F(p) is $V_F = Q V Q'$.
$V_F$ is the weight component of weighted least squares, and it
depends on the form of the response functions.
The sample version of $V_F$ is denoted $\hat{V}_F$; it is obtained by
substituting the sample proportions into Q and V.
$\hat{V}_F$ is usually nonsingular when the sample sizes $n_{i+}$ are
sufficiently large ($n_{i+} \ge 25$).
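To make the role of Q concrete, here is a small Python sketch for a single mean-score response function $F(p) = a'p$. Because this function is linear, Q is simply the row vector $a'$, so $V_F = a' V a$. The scores 1 to 4 and the function name `mean_score_and_variance` are assumptions made for illustration; the counts are the first row of the job satisfaction table used later in these notes.

```python
def mean_score_and_variance(counts, scores):
    """Mean score F = a'p for one multinomial row and its variance Q V Q'."""
    n = sum(counts)
    p = [c / n for c in counts]
    J = len(p)
    # V: estimated multinomial covariance matrix of the sample proportions.
    V = [[(p[j] * (1 - p[j]) if j == k else -p[j] * p[k]) / n
          for k in range(J)] for j in range(J)]
    # F is linear in p, so Q = dF/dp is the row vector of scores a'.
    F = sum(a * pj for a, pj in zip(scores, p))
    VF = sum(scores[j] * V[j][k] * scores[k]
             for j in range(J) for k in range(J))
    return F, VF


# Assumed scores 1..4 for the four satisfaction categories.
F, VF = mean_score_and_variance([1, 3, 11, 2], [1, 2, 3, 4])
```

For a mean score the quadratic form reduces to the familiar variance of a sample mean, $(\sum_j a_j^2 p_j - F^2)/n$, which gives a quick way to check the matrix calculation.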
To obtain the parameter estimates for the model $F(\pi) = X\beta$ we minimize the quadratic form
$$[F(p) - X\beta]'\, \hat{V}_F^{-1}\, [F(p) - X\beta]$$
The parameter estimate for Weighted Least Squares is
$$b = (X' \hat{V}_F^{-1} X)^{-1} X' \hat{V}_F^{-1} F(p)$$
The Weighted Least Squares estimate has an asymptotic multivariate
normal distribution with estimated covariance matrix
$$\widehat{\mathrm{Cov}}(b) = (X' \hat{V}_F^{-1} X)^{-1}$$
Recall that this method is similar to parameter estimation for
Ordinary Least Squares, where we minimized $(y - X\beta)'(y - X\beta)$ and found $\hat{\beta} = (X'X)^{-1}X'y$.
In fact, if $\hat{V}_F$ is a constant multiple of the identity matrix,
then the Weighted Least Squares estimate is equal to the
Ordinary Least Squares estimate.
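The estimate and its covariance matrix can be sketched in a few lines of Python. This is a minimal illustration that assumes a diagonal covariance matrix for the response functions (as happens with one response function per independent group); the response values, variances, and the helper names `solve` and `wls` are all hypothetical.

```python
def solve(A, y):
    """Solve the square system A x = y by Gauss-Jordan elimination."""
    n = len(A)
    M = [A[i][:] + [y[i]] for i in range(n)]
    for c in range(n):
        # Partial pivoting: move the largest entry in column c to the diagonal.
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def wls(F, X, v):
    """Return b and Cov(b) when the weight matrix is diagonal with entries v."""
    u, k = len(X), len(X[0])
    w = [1.0 / vi for vi in v]
    XtWX = [[sum(w[i] * X[i][a] * X[i][c] for i in range(u))
             for c in range(k)] for a in range(k)]
    XtWF = [sum(w[i] * X[i][a] * F[i] for i in range(u)) for a in range(k)]
    b = solve(XtWX, XtWF)
    # Cov(b) is the inverse of X' V^-1 X, built column by column.
    cols = [solve(XtWX, [1.0 if i == a else 0.0 for i in range(k)])
            for a in range(k)]
    cov = [[cols[c][r] for c in range(k)] for r in range(k)]
    return b, cov


# Hypothetical response functions, variances, and a straight-line design.
F = [2.8, 3.0, 3.2, 3.3]
v = [0.03, 0.02, 0.04, 0.05]
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
b, cov = wls(F, X, v)
b_equal, _ = wls(F, X, [0.5] * 4)  # equal variances: same as OLS
```

With equal variances the weights cancel, so `b_equal` matches the ordinary least squares fit of F on X, in line with the remark above.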
The estimated covariance matrix of the predicted value $\hat{F} = Xb$ is
$$\hat{V}_{\hat{F}} = X (X' \hat{V}_F^{-1} X)^{-1} X'$$
TESTING MODEL ADEQUACY
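$H_0: F(\pi) - X\beta = 0$
We test the model using the goodness-of-fit statistic
$$W = [F(p) - Xb]'\, \hat{V}_F^{-1}\, [F(p) - Xb]$$
Under $H_0$, W is asymptotically chi-squared for large sample sizes ($n_{i+} \ge 25$), with degrees of freedom equal to u minus the number of parameters in $\beta$.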
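The residual statistic can be sketched as follows, again assuming a diagonal weight matrix so that W is a weighted sum of squared residuals. The response values, fitted values, and variances are hypothetical, and the tail-probability helper `chisq_sf_even_df` is an illustrative closed form that is valid only for even degrees of freedom.

```python
import math


def wls_residual_chisq(F, fitted, v):
    """W = sum of squared residuals, each divided by its variance."""
    return sum((f - m) ** 2 / vi for f, m, vi in zip(F, fitted, v))


def chisq_sf_even_df(x, df):
    """Upper-tail chi-squared probability; closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Hypothetical values: four response functions, two parameters, so df = 2.
F = [2.8, 3.0, 3.2, 3.3]
fitted = [2.82, 2.99, 3.16, 3.33]
v = [0.03, 0.02, 0.04, 0.05]
W = wls_residual_chisq(F, fitted, v)
p_value = chisq_sf_even_df(W, df=2)

# Residual chi-square 1.44 on 4 df, as reported for the income-only model.
p_income_only = chisq_sf_even_df(1.44, df=4)
```

The last line gives a tail probability close to the 0.8375 that SAS reports for the income-only model in the example; the small difference comes from the rounding of 1.44.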
IN COMPARISON TO MAXIMUM LIKELIHOOD

In the past, Weighted Least Squares was computationally
simpler than Maximum Likelihood. This is no
longer the case.

When a model holds, with large cell expected frequencies,
Maximum Likelihood and Weighted Least Squares give similar
results.

Maximum Likelihood is more efficient when zero cell counts are
present.

Maximum likelihood is more efficient when there are many
explanatory variables.

Maximum likelihood is more efficient when explanatory
variables are continuous.

However, the two approaches are connected: Fisher scoring finds
the Maximum Likelihood estimates by iteratively reweighted
least squares. (See Section 4.6.2 in Agresti)
WLS ESTIMATION FOR A MEAN RESPONSE
                               Job Satisfaction
Gender  Income ($)    Very Dis  Little Sat  Mod Sat  Very Sat
F       <5000             1         3         11        2
        5k – 15k          2         3         17        3
        15k – 25k         0         0          8        5
        >25k              0         2          4        2
M       <5000             1         1          2        1
        5k – 15k          0         3          5        1
        15k – 25k         0         0          7        3
        >25k              0         1          9        6
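SAS Code for Mean Response
data jobsat;
  input gender income satisf count;
  count2 = count + .01;        /* small constant added to each count (handles zero cells) */
  datalines;
1 1 1 1
…
;
proc catmod data=jobsat;       /* mean response model */
  weight count2;
  population gender income;    /* one population per gender-by-income group */
  response means;              /* response function: mean satisfaction score */
  model satisf = income / wls freq prob;
run;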
The saturated model gives the following:
Analysis of Variance
Source           DF   Chi-Square   Pr > ChiSq
Intercept         1      1323.78       <.0001
gender            1         0.00       0.9517
income            3        10.86       0.0125
gender*income     3         1.36       0.7160
Residual          0          .          .

Analysis of Weighted Least Squares Estimates
                                   Standard      Chi-
Parameter              Estimate       Error    Square   Pr > ChiSq
Intercept                2.9908      0.0822   1323.78       <.0001
gender         0        0.00498      0.0822      0.00       0.9517
income         1        -0.2798      0.1904      2.16       0.1418
               2        -0.1828      0.1223      2.23       0.1350
               3         0.2994      0.1122      7.12       0.0076
gender*income  0 1      -0.1168      0.1904      0.38       0.5398
               0 2      -0.0364      0.1223      0.09       0.7657
               0 3      0.00169      0.1122      0.00       0.9880
Reducing the model to include only gender and income gives:
Analysis of Variance
Source           DF   Chi-Square   Pr > ChiSq
Intercept         1      2137.02       <.0001
gender            1         0.08       0.7732
income            3        10.84       0.0126
Residual          3         1.36       0.7160

Analysis of Weighted Least Squares Estimates
                                   Standard      Chi-
Parameter              Estimate       Error    Square   Pr > ChiSq
Intercept                3.0365      0.0657   2137.02       <.0001
gender         0         0.0198      0.0688      0.08       0.7732
income         1        -0.2266      0.1374      2.72       0.0992
               2        -0.2107      0.1079      3.81       0.0510
               3         0.2527      0.1011      6.24       0.0125
Lastly, we reduce the model to include only income to give:
Analysis of Variance
Source           DF   Chi-Square   Pr > ChiSq
Intercept         1      2175.77       <.0001
income            3        13.13       0.0044
Residual          4         1.44       0.8375

Analysis of Weighted Least Squares Estimates
                                   Standard      Chi-
Parameter              Estimate       Error    Square   Pr > ChiSq
Intercept                3.0338      0.0650   2175.77       <.0001
income         1        -0.2389      0.1307      3.34       0.0676
               2        -0.2149      0.1069      4.04       0.0445
               3         0.2568      0.1001      6.58       0.0103
These parameter estimates show that for the lowest income level
(less than $5000) the estimated mean job satisfaction is 3.0338 – 0.2389 =
2.7949, and so on.
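The "and so on" can be spelled out with a short Python sketch. It assumes that the income effects use deviation-from-the-mean (effect) coding, so that the effect for the fourth income level is minus the sum of the three printed effects; that coding is an assumption here and is not stated in the output itself. The variable names are illustrative.

```python
# Estimates copied from the income-only model output.
intercept = 3.0338
income_effects = [-0.2389, -0.2149, 0.2568]  # income levels 1, 2, 3

# Assumption: effect coding, so the level-4 effect is minus the sum of the others.
last_effect = -sum(income_effects)

levels = ["<5000", "5k-15k", "15k-25k", ">25k"]
fitted_means = [intercept + e for e in income_effects + [last_effect]]

for level, mean in zip(levels, fitted_means):
    print(f"{level:>8}: {mean:.4f}")
```

Under that assumption the fitted mean satisfaction for the highest income group comes out as 3.0338 + 0.1970 = 3.2308.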
In comparison, when we model the mean job satisfaction using
Maximum Likelihood, we find that both income and gender are
significant to the model. The model obtained is
$$\hat{M} = 2.59 + 0.181x - 0.030g$$
where x represents income level (1, 2, 3, 4) and g represents gender
(1 = females, 0 = males).
Using this model we find that for the lowest income level, the mean
job satisfaction is 2.741 for females and 2.771 for males.
OTHER EXPLORATION TO BE DONE

Perform contrast tests to explore the possible differences in
parameter estimates for different income levels.
$H_0: C\beta = 0$
We test the parameters using the Wald statistic
$$W_C = (Cb)'\,[C (X' \hat{V}_F^{-1} X)^{-1} C']^{-1}\,(Cb)$$
Under $H_0$, $W_C$ is asymptotically chi-squared with degrees of freedom equal to the number
of linearly independent rows in the contrast matrix C.
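For a contrast matrix with a single row c, the Wald statistic reduces to the squared contrast estimate divided by its estimated variance, on 1 degree of freedom. The Python sketch below shows that special case only. The estimates and the covariance matrix are hypothetical values chosen for illustration (the SAS output above reports standard errors but not covariances), and `wald_single_contrast` is an illustrative name.

```python
import math


def wald_single_contrast(c, b, cov_b):
    """Wald statistic and p-value for H0: c'beta = 0 (one row, df = 1)."""
    k = len(b)
    est = sum(c[j] * b[j] for j in range(k))
    var = sum(c[j] * cov_b[j][m] * c[m] for j in range(k) for m in range(k))
    W = est ** 2 / var
    # Upper tail of chi-squared with 1 df: P(Z^2 > W) = erfc(sqrt(W / 2)).
    p = math.erfc(math.sqrt(W / 2.0))
    return W, p


# Hypothetical estimates and covariance matrix (not taken from the SAS output).
b = [3.0, -0.25, -0.20]
cov_b = [[0.004, 0.0, 0.0],
         [0.0, 0.017, -0.005],
         [0.0, -0.005, 0.011]]
c = [0.0, 1.0, -1.0]  # compares the first two income effects
W_c, p_c = wald_single_contrast(c, b, cov_b)
```

A contrast matrix with several rows needs the full matrix inverse in the formula above; the single-row case is shown because it needs no linear algebra beyond a quadratic form.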

Use a data set that has larger cell frequencies and compare the
results from Weighted Least Squares and Maximum Likelihood.

Use other response functions such as adjacent categories logit
and cumulative logit.