Latent class binary regression models – identification and estimation
Anders Holm (I) and Morten Pedersen (II)
Abstract. In this paper we analyse the identification of the latent class binary regression model, in which the latent classes are thought to represent unobserved heterogeneity. We show that ignoring unobserved heterogeneity may lead to severely biased results. We further illustrate that the model is well identified with panel data, but that identification is fragile with cross-sectional data. Based on insight into the model and on simulations, we propose a simplified model that may work almost as well for cross-sectional data. Finally, we illustrate the applicability and performance of the latent class logit regression model, as opposed to an ordinary logit regression model, in two different applications.
Keywords: Latent class, logit model, regression model, panel data, unobserved heterogeneity.
1. Introduction
In many social science applications of regression analysis, not all relevant independent variables are observed. In linear models, whether unobserved independent variables cause problems depends on whether they are correlated with the observed independent variables, see e.g. Ejernæs and Holm (2006). In contrast, in non-linear regression models it is well known that unobserved independent variables can bias the estimated effects of the observed independent variables even when the unobserved and observed variables are uncorrelated, see Cameron and Heckman (1998), Bretagnolle and Huber-Carol (1988) or Abramson et al. (2000). Consequently, ignoring omitted independent variables, even when they are uncorrelated with the observed ones, may lead to incorrect conclusions about the effects of the observed independent variables.
To illustrate how omitted independent variables can be taken into account in non-linear regression models, we propose a simple version of the latent class logistic regression model. We illustrate why, with this model, the observed data can reveal information on the effects of both the observed and the unobserved independent variables on the dependent variable. We also illustrate why this model is sometimes hard to estimate, especially with cross-sectional data, because of weak identification, and we propose a simple strategy to improve identification in this case.

I Danish School of Education, University of Aarhus, Tuborgvej 164, DK-2400 Copenhagen NV, phone +45 8888 9566, e-mail: ahol@dpu.dk.
II Department of Sociology, University of Copenhagen.
In our approach the latent classes need not represent any particular type of omitted variable; they can be seen as a non-parametric approximation to the unknown distribution of the omitted variables, whether these are continuous, discrete, or both.
The justification for our approach comes from Lindsay (1983a, 1983b), who showed that any mixing distribution representing unobserved heterogeneity can be approximated by a discrete latent class distribution with a finite number of classes. However, the number of classes needed may grow with the number of observations. Although this argument is intuitively appealing, it violates some of the regularity conditions of maximum likelihood theory (a finite parameter space with parameters in the interior of the parameter space), so the classical inference of maximum likelihood theory cannot be used for these types of models. However, by assuming that the number of latent classes is known in advance, one can use standard maximum likelihood for the parameters of the model. The number of latent classes can then be determined by goodness-of-fit measures such as the Bayesian information criterion (BIC), see e.g. Dayton (1999). Although this strategy tends to understate the standard errors of the parameter estimates, it has been shown to be a feasible solution in the applied literature, see Greene (2003). In practical applications, see Heckman and Singer (1984), Davies (1993) or Holm (2002), one often finds that a small number of latent classes is sufficient to capture the significant features of the distribution of the omitted variables.
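A minimal sketch of the BIC strategy, assuming the normalizations introduced in section 2 (η_1 = 0 and λ_1 = 0), so that a J-class model with K covariates has 1 + K + 2(J − 1) free parameters (J = 1 is the ordinary logit model). The function names are hypothetical, and the log-likelihood values in the usage lines are made-up placeholders, not results from this paper.

```python
import math


def n_params(num_classes: int, num_covariates: int) -> int:
    """Free parameters of a J-class latent class logit model.

    Constant alpha (1) + slopes beta (K) + class effects eta_2..eta_J (J - 1)
    + class weights lambda_2..lambda_J (J - 1); eta_1 = lambda_1 = 0 are fixed.
    """
    return 1 + num_covariates + 2 * (num_classes - 1)


def bic(log_lik: float, k: int, n: int) -> float:
    """Bayesian information criterion: -2 lnL + k ln n (smaller is better)."""
    return -2.0 * log_lik + k * math.log(n)


def choose_num_classes(log_liks: dict[int, float], num_covariates: int, n: int) -> int:
    """Return the J whose fitted model has the smallest BIC.

    log_liks maps J -> maximized log-likelihood of the J-class model.
    """
    return min(log_liks, key=lambda j: bic(log_liks[j], n_params(j, num_covariates), n))


# Hypothetical maximized log-likelihoods for J = 1..4 (illustration only)
example = {1: -1000.0, 2: -960.0, 3: -950.0, 4: -948.0}
print(choose_num_classes(example, num_covariates=3, n=500))
```

With these placeholder values the gain in ln L from a fourth class is too small to pay the penalty of two extra parameters, so J = 3 is selected.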
The remainder of the paper is organized as follows: section 2 introduces the model, section 3 discusses identification, section 4 presents simulation results, sections 5 and 6 contain two applications, and section 7 concludes.
2. The model
We analyze a latent class binary logit regression model. The dependent variable Y takes the values y = 0 and y = 1. We formulate the latent class logit regression model with J latent classes as:

$$P(Y=1\mid x)=\sum_{j=1}^{J}P(Y=1\mid x,\eta=\eta_j)\,P(\eta=\eta_j)=\sum_{j=1}^{J}\frac{\exp(\alpha+\beta x+\eta_j)}{1+\exp(\alpha+\beta x+\eta_j)}\,P(\eta=\eta_j) \qquad (1)$$
where α is a constant term, x is a vector of explanatory variables, β is a corresponding row vector of regression coefficients, η_j is the effect of the j'th latent class on the probability of observing Y = 1, and finally P(η = η_j) is the frequency of the j'th latent class in the population. The parameters of the model to be estimated are α, β, η_j and P(η = η_j), j = 1, …, J, where J is the number of latent classes.
This model takes into account unobserved heterogeneity arising from omitting independent
variables. The unobserved heterogeneity might either be thought of as the representation of a true
discrete distribution of unobserved heterogeneity or as an approximation to any unknown
distribution of unobserved heterogeneity, discrete or continuous.
The latent class frequencies, P(η = η_j), must satisfy the restrictions P(η = η_j) ≥ 0 and Σ_{j=1}^{J} P(η = η_j) = 1. Hence, the following re-parameterization is useful when estimating the frequencies:

$$P(\eta=\eta_j)=\frac{\exp(\lambda_j)}{\sum_{k=1}^{J}\exp(\lambda_k)}$$

where now λ_j, j = 1, …, J, are the parameters to be estimated. Dividing the numerator and the denominator by exp(λ_1), and setting λ_1 = 0, we get:

$$P(\eta=\eta_j)=\frac{\exp(\lambda_j-\lambda_1)}{1+\sum_{k=2}^{J}\exp(\lambda_k-\lambda_1)}=\frac{\exp(\lambda_j)}{1+\sum_{k=2}^{J}\exp(\lambda_k)}.$$
It follows that the number of identifiable parameters for the latent class frequencies is J − 1. Furthermore, for any constant c, redefining α* = α + c and η_j* = η_j − c leaves P(Y = 1 | x, η = η_j) unchanged for j = 1, …, J; hence a normalization of the effects of the latent classes is warranted. We follow the so-called dummy coding and normalize η_1 = 0.
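Model (1) and the re-parameterized class frequencies can be evaluated numerically as in the following sketch (Python with NumPy; the function names and parameter values are illustrative assumptions). The last lines check the location indeterminacy just described: adding a constant c to α and subtracting it from every η_j leaves P(Y = 1 | x) unchanged, which is why one class effect has to be fixed.

```python
import numpy as np


def class_weights(lam: np.ndarray) -> np.ndarray:
    """P(eta = eta_j) = exp(lam_j) / sum_k exp(lam_k); set lam[0] = 0 to identify."""
    z = np.exp(lam - lam.max())  # subtract the max for numerical stability
    return z / z.sum()


def prob_y1(x: np.ndarray, alpha: float, beta: np.ndarray,
            eta: np.ndarray, lam: np.ndarray) -> np.ndarray:
    """Marginal P(Y = 1 | x) of equation (1) for each row of x (shape n x K)."""
    index = alpha + x @ beta  # alpha + beta x, shape (n,)
    # Class-specific logit probabilities, shape (n, J)
    p_class = 1.0 / (1.0 + np.exp(-(index[:, None] + eta[None, :])))
    return p_class @ class_weights(lam)  # mix over the J classes


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))
beta = np.array([0.8, -0.5])
eta = np.array([0.0, 2.0, -1.5])   # dummy coding: eta_1 = 0
lam = np.array([0.0, 0.3, -0.7])   # lambda_1 = 0

p = prob_y1(x, 0.2, beta, eta, lam)
c = 1.7
p_shifted = prob_y1(x, 0.2 + c, beta, eta - c, lam)  # alpha + c, eta_j - c
print(np.allclose(p, p_shifted))
```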
As the purpose of this paper is to discuss the intuition behind identification rather than to give a fully rigorous proof of identification of the latent class model, we work with a simplified two-class model with one independent variable. The model is then written as:
j 2
exp(   x   j ) P(   j )
j 1
1  exp(   x   j )
P(Y  1 | x)  
(2)
where x is now a single continuous variable, β is a regression coefficient, η_1 = 0 and η_2 = γ. From (2) we construct the log-likelihood function for a sample of n independent observations:
i n
ln L   yi lnP(Y  1 | xi )  1  yi  ln 1  P(Y  1 | xi )  (3)
i 1
where
P(Y  1| xi )  pP0i  1  p  Pi
and
P0i 
exp    xi 
exp    xi   
, Pi 
1  exp    xi 
1  exp    xi   
and finally where P(  0)  p and P(   )  1  p . Note that we now implicitly use:
P(  1  0) 
exp  2 
1
; P (   2 ) 
1  exp  2 
1  exp  2 
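A direct transcription of the log-likelihood (3) is sketched below, with p = 1/(1 + exp(λ_2)) as above. The function name loglik_two_class is hypothetical, and the data and parameter values are arbitrary; they only serve to show a property used in section 3: when γ = 0, the value of ln L does not depend on λ_2 (and hence not on p).

```python
import numpy as np


def loglik_two_class(params, x, y):
    """ln L of equation (3); params = (alpha, beta, gamma, lambda_2)."""
    alpha, beta, gamma, lam2 = params
    p = 1.0 / (1.0 + np.exp(lam2))                           # P(eta = 0)
    p0 = 1.0 / (1.0 + np.exp(-(alpha + beta * x)))           # P_0i
    pg = 1.0 / (1.0 + np.exp(-(alpha + beta * x + gamma)))   # P_gamma,i
    pi = p * p0 + (1.0 - p) * pg                             # P(Y = 1 | x_i)
    return np.sum(y * np.log(pi) + (1 - y) * np.log(1.0 - pi))


rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = rng.integers(0, 2, size=200)

# With gamma = 0 the value is the same for any lambda_2 ...
ll_a = loglik_two_class((0.1, 0.5, 0.0, -2.0), x, y)
ll_b = loglik_two_class((0.1, 0.5, 0.0, 3.0), x, y)
# ... but with gamma != 0 it is not.
ll_c = loglik_two_class((0.1, 0.5, 1.5, -2.0), x, y)
ll_d = loglik_two_class((0.1, 0.5, 1.5, 3.0), x, y)
print(np.isclose(ll_a, ll_b), np.isclose(ll_c, ll_d))
```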
In the following example we illustrate how the latent class logistic regression model and the standard logistic regression model can lead to very different estimates of the effect of the observed independent variable. Consider the following two-way table:
- TABLE 1 HERE -
From the table we find the log-odds ratio to be roughly one. However, the table can be thought of as composed of the following two sub-tables:
- TABLE 2 HERE -
From Table 2 it is evident that the log-odds ratio is approximately two in both sub-samples. Hence, ignoring the grouping leads to a severely biased estimate of the log-odds ratio, β (roughly one instead of two). This is confirmed by the following ML estimates from an ordinary logistic regression model and a latent class model with two classes.
- TABLE 3 HERE -
The ordinary logistic regression model and the latent class model do not yield dramatically different fits to the data: the ratio of their log-likelihoods is 1.003, even though the estimates of β differ dramatically between the two models. To illustrate this, consider the following figure:
- FIGURE 1 HERE -
The figure shows observed and predicted probabilities of Y = 1. There are only small differences between the predicted probabilities of the logistic regression model and the latent class model (which in this case fits the data perfectly because it is a saturated model). In general, variation in x is likely to yield only minor discrepancies in predicted probabilities between the logistic regression model and the latent class model, and it will often be difficult to determine from these discrepancies whether they are due to non-linear effects of x on the log-odds of Y or to the presence of latent classes.[1]
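The attenuation in the Table 1 and Table 2 example can be reproduced with a small exact calculation. The numbers below are hypothetical and are not the counts behind Tables 1 and 2: two equally sized groups with the same distribution of x each have a log-odds ratio of exactly 2, but they differ in their baseline log-odds by γ. Collapsing over the groups gives a smaller log-odds ratio even though x is independent of group membership.

```python
import math


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


beta = 2.0    # within-group log-odds ratio for x = 1 vs x = 0 (hypothetical)
alpha = -1.0  # baseline log-odds in group 1 (hypothetical)
gamma = 2.0   # group-2 shift in log-odds, i.e. the latent class effect (hypothetical)
share = 0.5   # fraction of observations in group 1, the same for x = 0 and x = 1


def pooled_prob(x: int) -> float:
    """P(Y = 1 | x) after collapsing over the two groups."""
    return (share * logistic(alpha + beta * x)
            + (1 - share) * logistic(alpha + beta * x + gamma))


pooled_log_or = logit(pooled_prob(1)) - logit(pooled_prob(0))
print(round(pooled_log_or, 3))  # prints 1.672, below the within-group value of 2
```

Increasing gamma (a larger latent class effect) pushes the pooled log-odds ratio further below the within-group value.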
3. Identification
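Going back to the log-likelihood function, we find the following likelihood equations, where P_i ≡ P(Y = 1 | x_i):

$$\frac{\partial\ln L}{\partial\alpha}=\sum_{i}\left[y_i\,\frac{\operatorname{Var}(y_i)}{P_i}-(1-y_i)\,\frac{\operatorname{Var}(y_i)}{1-P_i}\right]$$

$$\frac{\partial\ln L}{\partial\beta}=\sum_{i}\left[y_i\,\frac{x_i\operatorname{Var}(y_i)}{P_i}-(1-y_i)\,\frac{x_i\operatorname{Var}(y_i)}{1-P_i}\right]$$

$$\frac{\partial\ln L}{\partial\gamma}=\sum_{i}\left[y_i\,\frac{(1-p)\operatorname{Var}(y_i\mid\eta=\gamma)}{P_i}-(1-y_i)\,\frac{(1-p)\operatorname{Var}(y_i\mid\eta=\gamma)}{1-P_i}\right]$$

$$\frac{\partial\ln L}{\partial p}=\sum_{i}\left[y_i\,\frac{P_{0i}-P_{\gamma i}}{P_i}-(1-y_i)\,\frac{P_{0i}-P_{\gamma i}}{1-P_i}\right]$$

where Var(y_i) = p P_0i(1 − P_0i) + (1 − p) P_γi(1 − P_γi) and Var(y_i | η = γ) = P_γi(1 − P_γi).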
From the likelihood equations we find that when γ = 0, P_0i = P_γi. This means that whenever γ = 0, ∂ln L/∂p = 0 for every value of p (equivalently, of λ_2), i.e. the data carry no information on the value of p. In this case the last equation becomes redundant and p is not identified. In practice this means that when γ is close to 0, the likelihood function may be nearly flat in p and identification may be problematic.
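The redundancy of the last likelihood equation at γ = 0 can be checked numerically. The sketch below codes the score ∂ln L/∂p as given above and compares it with a central finite difference of ln L; the helper names are hypothetical and the data and parameter values are arbitrary.

```python
import numpy as np


def probs(alpha, beta, gamma, x):
    p0 = 1.0 / (1.0 + np.exp(-(alpha + beta * x)))          # P_0i
    pg = 1.0 / (1.0 + np.exp(-(alpha + beta * x + gamma)))  # P_gamma,i
    return p0, pg


def loglik(alpha, beta, gamma, p, x, y):
    p0, pg = probs(alpha, beta, gamma, x)
    pi = p * p0 + (1 - p) * pg
    return np.sum(y * np.log(pi) + (1 - y) * np.log(1 - pi))


def score_p(alpha, beta, gamma, p, x, y):
    """d lnL / dp = sum_i (P_0i - P_gamma,i) [y_i / P_i - (1 - y_i) / (1 - P_i)]."""
    p0, pg = probs(alpha, beta, gamma, x)
    pi = p * p0 + (1 - p) * pg
    return np.sum((p0 - pg) * (y / pi - (1 - y) / (1 - pi)))


rng = np.random.default_rng(2)
x = rng.normal(size=300)
y = rng.integers(0, 2, size=300)
h = 1e-6
for gamma in (0.0, 1.2):
    analytic = score_p(0.3, -0.4, gamma, 0.35, x, y)
    numeric = (loglik(0.3, -0.4, gamma, 0.35 + h, x, y)
               - loglik(0.3, -0.4, gamma, 0.35 - h, x, y)) / (2 * h)
    print(gamma, analytic, numeric)  # analytic is exactly 0 when gamma = 0
```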
In order to study how the observed information (the distribution of Y and X) may or may not lead to identification of the distribution of the latent classes, we find the posterior distribution of η conditional on Y and X:

$$P(\eta=\gamma\mid Y=y,X=x)=\frac{P(Y=y\mid X=x,\eta=\gamma)\,(1-p)}{\sum_{j=1}^{2}P(Y=y\mid X=x,\eta=\eta_j)\,P(\eta=\eta_j)}=\frac{\dfrac{\exp(\alpha+\beta x+\gamma)^{y}}{1+\exp(\alpha+\beta x+\gamma)}\,(1-p)}{\dfrac{\exp(\alpha+\beta x)^{y}}{1+\exp(\alpha+\beta x)}\,p+\dfrac{\exp(\alpha+\beta x+\gamma)^{y}}{1+\exp(\alpha+\beta x+\gamma)}\,(1-p)}$$

$$=\frac{1}{1+\dfrac{p}{1-p}\,\dfrac{1+\exp(\alpha+\beta x+\gamma)}{\exp(\gamma y)\,\big(1+\exp(\alpha+\beta x)\big)}}$$
Now differentiate with respect to x and equate to zero to obtain:

$$\frac{\partial P(\eta=\gamma\mid Y=y,X=x)}{\partial x}=\frac{\beta\,p(1-p)\exp(\alpha+\beta x+\gamma y)\big(1-\exp(\gamma)\big)}{\Big[(1-p)\exp(\gamma y)\big(1+\exp(\alpha+\beta x)\big)+p\big(1+\exp(\alpha+\beta x+\gamma)\big)\Big]^{2}}=0\iff\gamma=0$$

(for β ≠ 0 and 0 < p < 1), as the denominator is always positive. This means that whenever x varies, so does the posterior probability of latent class membership, except when the latent class effect is zero.
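The posterior class probability and its x-derivative can be checked numerically with the sketch below, which uses P(η = 0) = p as in section 2. The helper names and parameter values are hypothetical; the finite-difference comparison at the end is only a check on the closed-form derivative.

```python
import numpy as np


def posterior_gamma(y, x, alpha, beta, gamma, p):
    """P(eta = gamma | Y = y, X = x) in the two-class model, with P(eta = 0) = p."""
    def lik(shift):
        pr = 1.0 / (1.0 + np.exp(-(alpha + beta * x + shift)))  # P(Y = 1 | x, eta)
        return pr**y * (1.0 - pr)**(1 - y)                      # P(Y = y | x, eta)
    num = (1.0 - p) * lik(gamma)
    return num / (num + p * lik(0.0))


def dposterior_dx(y, x, alpha, beta, gamma, p):
    """Closed-form derivative of the posterior with respect to x."""
    num = (beta * p * (1 - p) * np.exp(alpha + beta * x + gamma * y)
           * (1 - np.exp(gamma)))
    den = ((1 - p) * np.exp(gamma * y) * (1 + np.exp(alpha + beta * x))
           + p * (1 + np.exp(alpha + beta * x + gamma))) ** 2
    return num / den


xs = np.linspace(-2.0, 2.0, 5)
print(posterior_gamma(1, xs, 0.0, 1.0, 1.5, 0.6))  # changes with x
print(posterior_gamma(1, xs, 0.0, 1.0, 0.0, 0.6))  # equals 1 - p = 0.4 for every x

# Compare the closed-form derivative with a central finite difference at x = 0.3
h = 1e-6
for y in (0, 1):
    fd = (posterior_gamma(y, 0.3 + h, 0.0, 1.0, 1.5, 0.6)
          - posterior_gamma(y, 0.3 - h, 0.0, 1.0, 1.5, 0.6)) / (2 * h)
    print(y, fd, dposterior_dx(y, 0.3, 0.0, 1.0, 1.5, 0.6))
```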
If we have panel data, i.e. repeated observations of both Y and X, the same individual can be observed with different outcomes, and for a given x and η we have P(Y_1 = 1 | x, η) = 1 − P(Y_2 = 0 | x, η), where the subscript t = 1, 2 denotes the wave of the panel to which the observation belongs. Hence, changes not only in x but also in y along the panel may provide information on γ. This can be seen from:
$$P(\eta=\gamma\mid Y_1=1,X=x)-P(\eta=\gamma\mid Y_2=0,X=x)=0$$

$$\iff\frac{1}{1+\dfrac{p}{1-p}\,\dfrac{1+\exp(\alpha+\beta x+\gamma)}{\exp(\gamma)\big(1+\exp(\alpha+\beta x)\big)}}=\frac{1}{1+\dfrac{p}{1-p}\,\dfrac{1+\exp(\alpha+\beta x+\gamma)}{1+\exp(\alpha+\beta x)}}$$

$$\iff\frac{1+\exp(\alpha+\beta x+\gamma)}{\exp(\gamma)\big(1+\exp(\alpha+\beta x)\big)}=\frac{1+\exp(\alpha+\beta x+\gamma)}{1+\exp(\alpha+\beta x)}\iff\exp(\gamma)=1\iff\gamma=0$$
Hence, whenever y varies, so does the posterior probability of latent class membership, except when the latent class effect is zero. We also find that:

$$P(\eta=\gamma\mid Y_1=1,X=x)-P(\eta=\gamma\mid Y_2=0,X=x')=0\iff\gamma=0\quad\text{or}\quad\gamma=-(\alpha+\beta x')+\ln\!\left(-\frac{1+\exp(\alpha+\beta x')}{1+\exp(\alpha+\beta x)}\right)$$
As the term inside the logarithm is always negative, the second solution is not feasible. Hence, identification of γ improves when both Y and X vary. Finally, note that comparing P(η = γ | Y_1 = 1, X = x) with P(η = γ | Y_2 = 1, X = x') does not lead to any conclusion about the value of γ. That is, observations that only change the values of the independent variable do not contribute to the identification of the latent classes.
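In estimation, the class posterior for a panel individual conditions on the whole sequence of observed outcomes. Assuming, as in the model, that class membership is constant over the waves and that outcomes are independent given x and η, a minimal sketch (hypothetical function name, arbitrary parameter values) is:

```python
import numpy as np


def panel_posterior_gamma(ys, xs, alpha, beta, gamma, p):
    """P(eta = gamma | y_1..y_T, x_1..x_T) for one individual; P(eta = 0) = p."""
    ys = np.asarray(ys)
    xs = np.asarray(xs, dtype=float)

    def seq_lik(shift):
        pr = 1.0 / (1.0 + np.exp(-(alpha + beta * xs + shift)))
        # Product over waves: outcomes independent given x and the (fixed) class
        return np.prod(pr**ys * (1.0 - pr)**(1 - ys))

    num = (1.0 - p) * seq_lik(gamma)
    return num / (num + p * seq_lik(0.0))


same_x = [0.5, 0.5]
print(panel_posterior_gamma([1, 0], same_x, 0.0, 1.0, 2.0, 0.5))
print(panel_posterior_gamma([1, 1], same_x, 0.0, 1.0, 2.0, 0.5))
print(panel_posterior_gamma([1, 0], same_x, 0.0, 1.0, 0.0, 0.5))  # stays at 1 - p
```

With x held fixed, the sequences (1, 0) and (1, 1) lead to clearly different posteriors when γ ≠ 0, while with γ = 0 the posterior stays at the prior 1 − p whatever the outcomes.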
We may summarize these findings in the following proposition:

Proposition. If P(Y = 1 | x, η) = exp(α + βx + η)/(1 + exp(α + βx + η)), with α and β known (β ≠ 0), then

$$P(\eta=\gamma\mid Y=y,X=x)=P(\eta=\gamma\mid Y=y',X=x')$$

with either y ≠ y' (so that P(Y = y) = 1 − P(Y = y')) or x ≠ x' implies γ = 0.
Proof: see the appendix. The proposition states that if two posterior probabilities are equal for different values of x (the case of cross-sectional data) or of y and/or x (the case of panel data), this must be because the distribution of the latent classes is degenerate, at least for the observed information used in the comparison. Hence, this observed information is non-informative with respect to the distribution of the latent classes. Conversely, if the posterior probabilities differ for different observed (non-redundant) information, this information is informative on the distribution of the latent classes.
4. Some Simulations
To study identification of the latent class model with cross-sectional and with panel data, we run a number of simulations: 100 replications, each on a dataset with 500 observations, including repeated observations in panels. The simulations have varying degrees of identification in terms of the number of panels and the variation in x. The results are shown in Table 4 below.
- TABLE 4 HERE -
From Table 4 it is evident that the latent class model (LCM) with continuous x (infinitely many outcomes) and five panels yields estimates that are close to the true values and have a small root mean square error (RMSE). However, it is also clear that with only one panel (i.e. cross-sectional data) and two outcomes of x, the LCM performs poorly, although it still estimates the slope coefficient of x, β, with much less bias than the logit model. As the slope of x is our parameter of interest, we may try to improve the estimates by reducing the number of nuisance parameters. We therefore fix the parameter for the weight of the latent classes (the transformed probability of the latent classes, λ_2) to arrive at the latent class model with fixed weights (LCMFW).[2] To assess the impact of this in real applications, we have fixed λ_2 at a value different from the true value.[3] From Table 4 we find that in the weakly identified case this approach actually leads to better estimates, whereas it leads to worse results in the better identified cases.
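The kind of simulation described above can be sketched as follows. This is not the code behind Table 4: the data-generating values, the number of individuals and panels, the starting values, the use of SciPy's `minimize` with BFGS and the fixed value of λ_2 in the LCMFW are all assumptions made for illustration. Each individual keeps one latent class in all panels, and the likelihood integrates over class membership at the individual level.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)

# Hypothetical data-generating values (not those behind Table 4)
TRUE = dict(alpha=-1.0, beta=1.0, gamma=2.0, lam2=0.0)
N_IND, N_PANELS = 100, 5  # 100 individuals x 5 panels = 500 observations


def simulate(n, t, alpha, beta, gamma, lam2):
    """Panel data in which each individual keeps one latent class in all panels."""
    p = 1.0 / (1.0 + np.exp(lam2))                 # P(eta = 0)
    eta = np.where(rng.random(n) < p, 0.0, gamma)  # class effect per individual
    x = rng.normal(size=(n, t))                    # continuous x
    prob = 1.0 / (1.0 + np.exp(-(alpha + beta * x + eta[:, None])))
    y = (rng.random((n, t)) < prob).astype(float)
    return x, y


def neg_loglik(theta, x, y):
    """Minus the log-likelihood, integrating out class membership per individual."""
    alpha, beta, gamma, lam2 = theta
    log_p = -np.logaddexp(0.0, lam2)          # log P(eta = 0)
    log_1mp = lam2 - np.logaddexp(0.0, lam2)  # log P(eta = gamma)

    def log_seq_lik(shift):
        z = alpha + beta * x + shift
        # Sum over panels of log P(Y = y | x, eta) = y*z - log(1 + e^z)
        return np.sum(y * z - np.logaddexp(0.0, z), axis=1)

    ll = np.logaddexp(log_p + log_seq_lik(0.0), log_1mp + log_seq_lik(gamma))
    return -np.sum(ll)


def fit_lcm(x, y, start=(0.0, 0.5, 1.0, 0.0)):
    """LCM: alpha, beta, gamma and lambda_2 are all estimated."""
    res = minimize(neg_loglik, np.asarray(start, dtype=float), args=(x, y), method="BFGS")
    return res.x, res.fun


def fit_lcmfw(x, y, lam2_fixed, start=(0.0, 0.5, 1.0)):
    """LCMFW: lambda_2 is held at lam2_fixed; alpha, beta and gamma are estimated."""
    def objective(th):
        return neg_loglik(np.append(th, lam2_fixed), x, y)
    res = minimize(objective, np.asarray(start, dtype=float), method="BFGS")
    return res.x, res.fun


x, y = simulate(N_IND, N_PANELS, **TRUE)
theta_lcm, nll_lcm = fit_lcm(x, y)
theta_fw, nll_fw = fit_lcmfw(x, y, lam2_fixed=0.5)  # deliberately not the true 0.0
print("LCM   (alpha, beta, gamma, lambda_2):", np.round(theta_lcm, 2))
print("LCMFW (alpha, beta, gamma):          ", np.round(theta_fw, 2))
```

Repeating the last four lines over many replications, and setting N_PANELS = 1 with a two-valued x, would mimic the contrast between the well and the weakly identified designs.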
But why fix the weight in particular, and not any of the other parameters? First, we found from the likelihood equations that the equation for the weight becomes redundant as the effect of the latent class approaches zero. Hence, for some values of the other parameters, there is no information on how to choose a particular value of λ_2. Further, a principal component analysis (PCA) of the estimates from the simulations in Table 4 yields the following eigenvalues and eigenvectors:
- TABLE 5 HERE -
Comparing the two top panels of Table 5, representing PCA of the simulations on panel data, with the three lower panels, representing PCA on cross-sectional data, we find that the sum of the eigenvalues is much lower in the panel data simulations than in the cross-sectional ones. This reflects the greater accuracy of panel data estimation compared with cross-sectional estimation.
The first and largest eigenvalue corresponds to an eigenvector with large loadings on the constant term, α, and especially on the effect of the latent class, γ. Hence, a large part of the RMSE of these two parameters is due to the fact that their estimates are correlated. The second largest eigenvalue, which is still of considerable relative size in the cross-sectional simulations, pertains to an eigenvector with a large loading on the weight of the latent class, λ_2. We therefore conclude that a large part of the RMSE of this parameter is not correlated with any of the other parameters; in other words, on cross-sectional data λ_2 can take a wide range of values that leave the other parameters relatively unaffected. Hence, when identification is fragile, it seems relevant to fix λ_2 in estimation.
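Assuming the PCA is carried out on the covariance matrix of the parameter estimates across replications, it amounts to an eigen-decomposition of that matrix, and the sum of the eigenvalues equals the total variance of the estimates. In the sketch below the "estimates" are synthetic draws constructed to mimic the pattern described above (correlated α and γ, a noisy but uncorrelated λ_2); they are not the paper's simulation output.

```python
import numpy as np

rng = np.random.default_rng(4)
R = 100  # replications
names = ["alpha", "beta", "gamma", "lambda_2"]

# Synthetic estimates: alpha and gamma errors share a common component,
# lambda_2 has large independent noise (illustrative only).
common = rng.normal(size=R)
est = np.column_stack([
    -1.0 + 0.5 * common + 0.1 * rng.normal(size=R),  # alpha
    1.0 + 0.1 * rng.normal(size=R),                  # beta
    2.0 - 1.0 * common + 0.1 * rng.normal(size=R),   # gamma
    0.0 + 0.8 * rng.normal(size=R),                  # lambda_2
])

cov = np.cov(est, rowvar=False)       # 4 x 4 covariance across replications
eigval, eigvec = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
order = np.argsort(eigval)[::-1]      # reorder: largest first
eigval, eigvec = eigval[order], eigvec[:, order]

print("sum of eigenvalues:", eigval.sum(), "trace:", np.trace(cov))
for k in range(2):
    loadings = dict(zip(names, np.round(eigvec[:, k], 2)))
    print(f"component {k + 1}: eigenvalue {eigval[k]:.3f}, loadings {loadings}")
```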
In the simulations presented in Table 4 we fix the number of observations at 500. To investigate the impact of sample size on the estimates, we run simulations for different sample sizes, keeping the number of replications for each sample size at 100. In Figures 2a and 2b we show bias and RMSE for a panel model with continuous x, five panels and increasing sample size.
- FIGURE 2a + 2b HERE -
From the figures it is evident that both bias and RMSE decrease substantially for both the LCM and the LCMFW, even for relatively small samples. In no case does the bias of the LCM or the LCMFW exceed that of the logit model, and even in relatively small samples both the LCM and the LCMFW outperform the logit model in terms of RMSE. Hence, estimation of latent class models seems very feasible with panel data and rich variation in the independent variables. However, the RMSE suggests that in very small samples (fewer than 400 observations) the LCM is rather unstable. In this case reasonable results can be obtained with the LCMFW, which shows little bias and much better precision than the LCM, where all parameters are estimated. The LCMFW still outperforms the logit model in terms of both bias and RMSE.
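For reference, the bias and RMSE of an estimator over R replications can be computed as below, with bias = mean(β̂_r) − β and RMSE = sqrt(mean((β̂_r − β)²)), so that RMSE² = bias² + variance. The function name is hypothetical and the estimates are made-up numbers.

```python
import numpy as np


def bias_rmse(estimates, true_value):
    """Bias and root mean square error of a set of simulation estimates."""
    est = np.asarray(estimates, dtype=float)
    err = est - true_value
    bias = err.mean()
    rmse = np.sqrt(np.mean(err**2))
    return bias, rmse


est = [0.9, 1.1, 1.3, 0.7, 1.0]  # made-up estimates of beta
b, r = bias_rmse(est, true_value=1.0)
# RMSE^2 decomposes into bias^2 + variance (population variance, ddof = 0)
print(b, r, np.isclose(r**2, b**2 + np.var(est)))
```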
In Figures 3a and 3b we show the bias and RMSE for a cross-sectional simulation with limited variation in the independent variable (two levels).
- FIGURE 3a + 3b HERE -
From the figures it is evident that the LCM has considerable bias and a large RMSE. In fact, for small sample sizes it displays as much bias as the logit model, and its RMSE is huge compared with that of the logit model. Even for larger sample sizes the LCM has a much larger RMSE than the logit model. Hence, the LCM does not appear to be very feasible in this case. The LCMFW, however, performs much better: it has a much smaller bias than the logit model for all sample sizes, and also a smaller RMSE, at least for samples larger than 200 observations. It therefore seems relevant to use the LCMFW when identification of the LCM fails or is very weak.
5. First application – trust in the parliament
In this section we use a two-class latent class logistic regression model to analyse the relationship between trust in the parliament and position on a left/right political scale (see Arts and Gelissen, 2001), age and gender. We use data from the Danish part of the international value study, see Gundelach (2002). In the 1990 and 1999 waves, 640 respondents were interviewed in both, establishing a panel data set. After cleaning the data we are left with 484 individuals. In Table 6 we show descriptive statistics for both waves for the variables in the analysis.
- TABLE 6 HERE -
From the table we see that a little more than half of the respondents were male. This fraction does not change, as all individuals are in both waves. The average age is 40 in 1990 and hence 49 in 1999. More than two thirds live with a partner in both waves. Average household income is DKK 293,000 in 1990 and has risen to DKK 386,000 in 1999. Trust is a binary variable recoded from a four-category ordinal variable running from very much trust to very little trust in the parliament. In 1990, about 42 percent of the respondents express either very much or some trust in the parliament and are hence coded as one in this analysis. In 1999, average trust has increased to 50 percent. This increase in trust along the panel could indicate that trust increases with age.
In table 7 we show estimation results from applying a logistic regression model, the LCM and the
LCMFW on the entire panel.
- TABLE 7 HERE -
The table shows two important results. First, the coefficient of the only significant independent variable, household income, is estimated at very different values by the logit model on the one hand and by the LCM and LCMFW models on the other. Hence, taking unobserved heterogeneity into account in terms of two latent classes seems to be important. Second, the estimated class effect in the LCM model is very large and its standard error is also huge, indicating weak identification. In contrast, the class effect in the LCMFW model is much smaller and has a much lower standard error. Hence the LCMFW model is much better identified than the LCM model, and it yields very similar estimates of the effects of the variables of interest, namely the independent variables. The BIC of the LCMFW model is also lower than that of the LCM. This is due to the smaller number of parameters in the LCMFW model, as the two models have almost identical fit, reflected in the values of −2 ln L. The value of the weight parameter, λ_2, estimated in the LCM is very different from the value chosen by the grid search in the LCMFW. This also indicates why estimation of the LCM model is problematic: that two very different values of the weight parameter yield almost the same fit indicates a very flat likelihood function in the dimension of the weight parameter. Hence it seems sensible to fix it to improve identification of the remaining parameters in the model. This, of course, only makes sense if we obtain similar estimates of the parameters of interest, which is indicated by the simulations and is also the case in this application.
6. Second application - Unemployment and the dual labour market
In this application we look at the probability of being unemployed or employed in a sample of
Danish males. The theoretical background is the theory of dual labour markets, which argues that
some individuals have a strong attachment to the labour market and a low risk of unemployment,
and other individuals have much weaker ties to the labour market and face a high risk of
unemployment.
Individuals with low qualifications are more often employed on fixed term contracts or temporary
positions. Hence, they are more often at risk of being unemployed. As a consequence, the theory
hypothesizes that at least two distinct groups exist in the labour market: One group that has strong
labour market attachment and no or only little unemployment and a second group that experiences
most of the unemployment in the labour market.
Empirical evidence in favour of the dual labour market is mixed. Sakamoto and Chen (1991) find significant support for the dual labour market, whereas Launov (2004) uses a latent class count model to analyse turnover and finds no evidence to support the dual labour market hypothesis.
In our application we use register data from the Danish administrative registers. The data covers the
years 1980 to 1995 with yearly information on employment status, socio-economic information and
marital status. We confine the analysis to males who are either employed or unemployed throughout
their observation period (which might not cover the entire sample period). Summary statistics for
the data are shown in Table 8 below.
- TABLE 8 HERE -
From the table we find that respondents were on average unemployed 8 % of the time during the observation period and that 11 % had children aged zero to two years. The respondents were on average 38.14 years old, 30 % were living alone, 46 % had a vocational education and 18 % had further education, leaving 36 % unskilled. Note that some or all of these variables may change for an individual over the observation period.
One of the virtues of register data is that there is no missing data due to attrition and no measurement error due to recall bias. Hence, in these respects, the data are of very high quality compared with longitudinal survey data.
We study dual labour market theory by applying a latent class model and a logit model to analyse
whether the respondent is employed (Y = 1) or unemployed (Y = 0) conditional on a number of
independent variables. In the latent class model unobserved heterogeneity is also taken into account
by the latent classes. As we have panel data with multiple panels for most of the observations, we
expect that the latent class model can be reliably estimated. Table 9 below shows estimation results
for the two models.
- TABLE 9 HERE -
We find that a latent class model with three classes yields a reasonable fit to the data; BIC did not improve when more latent classes were added to the model. From −2 ln L and BIC we see that the latent class model provides a much better fit to the data than the logit model. We also find that the three latent classes represent three distinct groups in the labour market: one class with a very low unemployment probability (represented by a mass point equal to −3.236), a group with an intermediate unemployment risk (the baseline case, with a mass point normalized to 0) and a group with a very high unemployment risk (represented by a mass point equal to 2.045). With respect to the effects of the observed independent variables, we find that middle-aged individuals with vocational or further education who live with a spouse have a lower risk of unemployment than other individuals.
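Because the mass points enter the linear index additively, exp(η_j) is the factor by which membership of class j multiplies the odds of the modelled outcome relative to the baseline class (η = 0), holding the observed covariates fixed. A quick calculation with the two estimates reported above:

```python
import math

# Mass points from the three-class model; the baseline class is normalized to 0
mass_points = {"class with eta = -3.236": -3.236,
               "baseline class": 0.0,
               "class with eta = 2.045": 2.045}
multipliers = {label: math.exp(eta) for label, eta in mass_points.items()}
for label, m in multipliers.items():
    print(f"{label}: odds multiplier {m:.3f}")
```

The two non-baseline classes therefore differ in their odds by a factor of exp(2.045 + 3.236) = exp(5.281), roughly 197, for individuals with the same observed characteristics.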
In sum, we find that unemployment risks are very unevenly distributed in the population, according to both observed and unobserved variables. Furthermore, the latent class model yields quite different estimated effects of the observed independent variables compared with the estimates from the logit model. Hence, relying on the estimates from the logit model might result in incorrect inference on the effects of the observed variables on the risk of unemployment.
7. Conclusions
In this paper we have demonstrated how the latent class logistic regression model is identified, and we have shown how it performs with cross-sectional and panel data. Two sources underlie the identification of the model: variation in the independent variables and repeated measurement in terms of panel data. The latter source of identification is much more powerful than the former. When identification is “weak”, we suggest fixing the parameter for the weight of one or more of the latent classes. Simulation evidence shows that fixing the weights can be a feasible strategy when the LCM proves unreliable. Although it may lead to some bias, this strategy performs much better than the conventional logit model in terms of both bias and precision. Finally, we present two applications; in the second, on the dual labour market, we show that taking unobserved heterogeneity into account is important both for obtaining information on the dual labour market and for obtaining unbiased estimates of the parameters of the observed independent variables of the model.
Appendix. Proof of proposition.
Proof: let e = exp(α + βx) and e' = exp(α + βx'). Using the expression for the posterior probability derived in section 3, and 0 < p < 1,

$$P(\eta=\gamma\mid Y=y,X=x)=P(\eta=\gamma\mid Y=y',X=x')\iff\frac{1+e\,e^{\gamma}}{e^{\gamma y}(1+e)}=\frac{1+e'e^{\gamma}}{e^{\gamma y'}(1+e')}\iff e^{\gamma(y'-y)}\,(1+e\,e^{\gamma})(1+e')=(1+e'e^{\gamma})(1+e) \qquad (*)$$

If y = y' and x ≠ x' (cross-sectional data), (*) reduces to:

$$(e-e')(e^{\gamma}-1)=0\;\Rightarrow\;\gamma=0,$$

since e ≠ e' when β ≠ 0.

If y = 0, y' = 1 (panel data; the case y = 1, y' = 0 is similar), from (*) we get:

$$e(1+e')\,(e^{\gamma})^{2}+(1-e\,e')\,e^{\gamma}-(1+e)=0 \qquad (**)$$

This is a second-order polynomial in e^γ with roots −(1+e)/(e(1+e')) and 1. As the first root is negative, we can discard this solution and are left only with e^γ = 1 ⇒ γ = 0.
References
Abramson, C., R. L. Andrews, I. S. Currim and M. Jones (2000), Parameter Bias from Unobserved Effects in Multinomial Logit Model of Consumer Choice, Journal of Marketing Research, 37, 410-426.
Arts, W., & Gelissen, J. (2001). Welfare States, Solidarity and Justice Principles: Does the Type
Really Matter? Acta Sociologica, 44, 284-299.
Bretagnolle, J. and C. Huber-Carol (1988), Effects of Omitting Covariates in Cox's Models for Survival Data, Scandinavian Journal of Statistics, 15 (2), pp. 125-138.
Cameron, S. V. and J. J. Heckman (1998), Life Cycle Schooling and Dynamic Selection Bias: Evidence for Five Cohorts of American Males, Journal of Political Economy, 106 (2), pp. 262-333.
Davies, R. B. (1993) Nonparametric control for residual heterogeneity in modelling recurrent
behaviour, Computational Statistics & Data Analysis, 16 (2), pp. 143-160.
Dayton, M. C. (1999), Latent Class Scaling Analysis, Sage, series in quantitative applications in the
social sciences, 126.
Greene, W. (2003), A latent Class model for discrete choice analysis: contrast with mixed logit,
Transportation Research, Part B: Methodological, 37 (8), pp. 681-698.
Gundelach, P. (Ed.) (2002), Danskernes værdier 1981-1999. Copenhagen: Hans Reitzels Forlag.
Ejernaes, M. and A. Holm (2006) "Comparing fixed effect and covariance structure estimators",
Sociological Methods and Research, vol 35,
Heckman, J.J. and B. Singer (1984), A method for Minimizing the Impact of Distributional
Assumptions in Econometric Models for Duration Data, Econometrica, 52 (2), 271-320.
Holm, A (2002) "The Effect of Training on Search Durations; A Random Effects Approach",
Labour Economics, vol 9
Launov, A. (2004), An Alternative Approach to Testing Dual Labour Market Theory. IZA
Discussion Paper No. 1289.
Lindsay, B. G. (1983a), The geometry of mixture likelihoods: a general theory. Annals of statistics,
11, pp. 86-94.
Lindsay, B. G. (1983b) The geometry of mixture likelihoods, Part II: the exponential family. Annals
of statistics, 11, pp. 783-792.
Murphy, S. A. and A. W. van der Vaart (2000) On Profile Likelihood, Journal of the American
Statistical Association, Vol. 95, No. 450, pp. 449-465.
Sakamoto, Arthur and Meichu D. Chen (1991), Inequality and Attainment in a Dual Labour Market, American Sociological Review, Vol. 56, No. 3, pp. 295-308.
Acknowledgements
We gratefully acknowledge comments made by participants at the seminar in applied statistics held
at the University of Aarhus and from Jan Høgelund and Mads Meyer Jæger.
Anders Holm is professor at the School of Education, University of Aarhus, and has published in quantitative methods, social mobility and labour market research. Some of his previous papers have appeared in Social Science Research and Sociological Methods and Research.
Morten Pedersen is a research assistant at the Department of Sociology, University of Copenhagen, and has extensive programming knowledge in R, SAS, Gauss and SPSS.
[1] However, it may sometimes be conceivable that very erratic deviations from a smooth effect of x are due to unobserved effects. For example, if we observe clearly non-smooth effects of age, this is very likely not due to misspecification of the age effect but to something else, e.g. omitted independent variables. Identifying such effects, however, usually requires a lot of data. We shall return to this below.
[2] One could also pursue a profile likelihood approach, see Murphy and van der Vaart (2000), iterating between maximizing the likelihood with fixed weights and fixing the remaining parameters while estimating the weights. We tried simulations with this approach but found that it yields a likelihood function that behaves very similarly to that of the full model. Hence, in terms of obtaining a better-behaved likelihood function, fixing the weight completely (even at a “wrong” value) yields much more precision on the remaining parameters than both full maximum likelihood and the profile likelihood approach.