Research Journal of Applied Sciences, Engineering and Technology 4(14): 2024-2029, 2012
ISSN: 2040-7467
© Maxwell Scientific Organization, 2012
Submitted: October 15, 2011
Accepted: November 25, 2011
Published: July 15, 2012
Optimal Penalty Functions Based on MCMC for Testing Homogeneity of Mixture Models

¹Rahman Farnoosh, ²Morteza Ebrahimi and ³Arezoo Hajirajabi
¹Department of Mathematics, South Tehran Branch, Islamic Azad University, Tehran, Iran
²Department of Mathematics, Karaj Branch, Islamic Azad University, Karaj, Iran
³School of Mathematics, Iran University of Science and Technology, Narmak, Tehran 16846, Iran

Corresponding Author: Rahman Farnoosh, Department of Mathematics, South Tehran Branch, Islamic Azad University, Tehran, Iran
Abstract: This study provides an estimation of the penalty function for testing homogeneity of mixture models based on Markov chain Monte Carlo simulation. The penalty function is treated as a parametric function, and the parameter that determines its shape is estimated, together with the parameters of the mixture model, by a Bayesian approach. Different mixtures of uniform distributions are used as priors. Several simulation examples are performed to confirm the efficiency of the present work in comparison with previous approaches.

Key words: Bayesian analysis, expectation-maximization test, Markov chain Monte Carlo simulation, mixture distributions, modified likelihood ratio test
INTRODUCTION
There have been numerous applications of finite mixture models in various branches of science and engineering, such as medicine, biology and astronomy. They are used when the statistical population is heterogeneous and contains several subpopulations. The main problem in applications of finite mixture models is the estimation of their parameters. To date, various methods have been proposed for estimating the parameters of finite mixture models; some of the most important are the Bayesian method, the maximum likelihood method, the minimum-distance method and the method of moments (Lindsay and Basak, 1993; Diebolt and Robert, 1994; Lavine and West, 1992). After the parameters of a finite mixture model have been estimated, a particular statistical problem of interest is whether the data come from a mixture of two distributions or from a single distribution. This problem is called testing of homogeneity in these models. For testing homogeneity, the Likelihood Ratio Test (LRT), the most extensively used method for hypothesis testing problems, may be employed first. The LRT has a chi-squared null limiting distribution under the standard regularity conditions. Since mixture models do not satisfy these regularity conditions, the limiting distribution of the LRT is very complex (Liu and Shao, 2003). To overcome this weakness of the LRT, the Modified Likelihood Ratio Test (MLRT) was introduced by Chen (1998), Chen and Chen (2001) and Chen et al. (2004). In the MLRT a penalty is added to the log-likelihood function, and as a result the problem is handled in a simple and smooth manner. However, the dependence on several regularity conditions and on the chosen penalty function is the main limitation of the MLRT. Li et al. (2008) proposed the Expectation-Maximization (EM) test, based on another form of penalty function suggested previously by Chen et al. (2004). This test is independent of some conditions necessary for the MLRT. However, both the MLRT and the EM test are based on a penalized likelihood function; therefore the efficiency of these tests is influenced by the shape of the chosen penalty function, and neither test is generally optimal. To remove this disadvantage, in the present study we consider the penalty function as a parametric function and employ Metropolis-Hastings sampling, an MCMC method, to estimate the parameters of the mixture model together with the parameter that determines the shape of the penalty function. To the best of our knowledge the parametric penalty function has not been studied before, and we believe that MCMC estimation of the penalty function based on different priors for testing homogeneity of mixture models is investigated for the first time in the present study.
METHODOLOGY
The finite mixture models: A mixture distribution is a convex combination of standard distributions $f_j$:

$$\sum_{j=1}^{k} p_j f_j(y), \qquad \sum_{j=1}^{k} p_j = 1 \qquad (1)$$

In most cases the $f_j$ are from a parametric family with unknown parameter $\theta_j$, leading to the parametric mixture model:

$$\sum_{j=1}^{k} p_j f(y \mid \theta_j), \qquad \sum_{j=1}^{k} p_j = 1 \qquad (2)$$

where $\theta_j \in \Theta$ and $\Theta$ is a compact subset of the real line. When a sample $y = (y_1,\dots,y_n)$ from a mixture distribution is observed, it is not known which subpopulation each observation belongs to, so the observations are called incomplete data. In this case the likelihood function and the log-likelihood function of the incomplete data are, respectively:

$$L(\theta, P \mid y) = \prod_{i=1}^{n} \sum_{j=1}^{k} p_j f_j(y_i \mid \theta_j)$$

$$l(\theta, P \mid y) = \sum_{i=1}^{n} \log\Big(\sum_{j=1}^{k} p_j f_j(y_i \mid \theta_j)\Big)$$

where $\theta = (\theta_1,\dots,\theta_k)$ and $p = (p_1,\dots,p_k)$. The incomplete data can be converted to complete data by introducing a set of indicator variables $z = (z_1,\dots,z_n)$, $z_i = (z_{i1},\dots,z_{ik})$, which record the subpopulation each observation comes from:

$$y_i \mid z_{ij} = 1 \sim f(y_i \mid \theta_j), \qquad i = 1,\dots,n, \; j = 1,\dots,k$$

The likelihood function and the log-likelihood function of the complete data, which are usually more useful for inference, are then, respectively:

$$L_c(\theta, P \mid y, z) = \prod_{i=1}^{n} \prod_{j=1}^{k} \big(p_j f_j(y_i \mid \theta_j)\big)^{z_{ij}}$$

$$l_c(\theta, P \mid y, z) = \sum_{i=1}^{n} \sum_{j=1}^{k} z_{ij}\big\{\log p_j + \log f_j(y_i \mid \theta_j)\big\} \qquad (3)$$
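To make model (2) concrete, here is a minimal Python sketch (not from the paper) that simulates a two-component normal mixture with unit variances and evaluates the incomplete-data log-likelihood $l(\theta, P \mid y)$; the sample size and parameter values are illustrative, borrowed from Model 3 of Table 1 below.

```python
import numpy as np
from scipy.stats import norm

def mixture_loglik(y, p, theta):
    """Incomplete-data log-likelihood of a k-component normal mixture:
    l(theta, P | y) = sum_i log( sum_j p_j * f(y_i | theta_j) )."""
    y, p, theta = np.asarray(y), np.asarray(p), np.asarray(theta)
    dens = p * norm.pdf(y[:, None], loc=theta[None, :], scale=1.0)  # n x k
    return float(np.sum(np.log(dens.sum(axis=1))))

rng = np.random.default_rng(1)
# n = 200 draws from 0.25*N(-0.866, 1) + 0.75*N(0.289, 1) (illustrative values).
z = rng.random(200) < 0.25
y = np.where(z, rng.normal(-0.866, 1, 200), rng.normal(0.289, 1, 200))
print(mixture_loglik(y, [0.25, 0.75], [-0.866, 0.289]))
```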
The LRT and the MLRT: Suppose $Y_1,\dots,Y_n$ is a random sample from the mixture distribution:

$$p f(y \mid \theta_1) + (1-p) f(y \mid \theta_2) \qquad (4)$$

where $\theta_j \in \Theta$, $j = 1, 2$, and $\Theta$ is a compact subset of the real line. We wish to test:

$$H_0: p(1-p)(\theta_1 - \theta_2) = 0 \qquad \text{versus} \qquad H_1: p(1-p)(\theta_1 - \theta_2) \neq 0$$

where the null hypothesis means homogeneity of the population and the alternative hypothesis means that the population consists of two heterogeneous subpopulations, as in (1). The log-likelihood function can be written as:

$$l_n(p, \theta_1, \theta_2) = \sum_{i=1}^{n} \log\big\{p f(y_i \mid \theta_1) + (1-p) f(y_i \mid \theta_2)\big\} \qquad (5)$$

If $\hat\theta_0$ and $(\hat p, \hat\theta_1, \hat\theta_2)$ maximize the likelihood under the null and alternative hypotheses, respectively, then the LRT statistic is:

$$R_n = 2\big\{l_n(\hat p, \hat\theta_1, \hat\theta_2) - l_n(\hat p, \hat\theta_0, \hat\theta_0)\big\}$$

A large value of this statistic leads to rejection of the null hypothesis. Chen and Chen (2001) showed that two sources of non-regularity complicate the asymptotic null distribution of the LRT: first, the null hypothesis lies on the boundary of the parameter space ($p = 0$ or $p = 1$); second, the mixture model is not identifiable under the null model, since $p = 0$ or $p = 1$ and $\theta_1 = \theta_2$ are equivalent. To overcome the boundary and non-identifiability problems, Chen and Chen (2001) proposed adding to the log-likelihood a penalty function $T(p)$ such that:

$$\lim_{p \to 0 \text{ or } 1} T(p) = -\infty, \qquad \operatorname*{arg\,max}_{p \in [0,1]} T(p) = 0.5 \qquad (6)$$

$$Tl_n(p, \theta_1, \theta_2) = l_n(p, \theta_1, \theta_2) + T(p)$$

Use of this penalty function forces the fitted value of $p$ under the modified likelihood to be bounded away from 0 and 1. Chen and Chen (2001) showed that if some regularity conditions hold for the density kernel, the asymptotic null distribution of the MLRT statistic is the mixture of $\chi_1^2$ and $\chi_0^2$ with equal weights, i.e.,

$$0.5\chi_1^2 + 0.5\chi_0^2$$

where $\chi_0^2$ is a degenerate distribution with all its mass at 0.
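As a quick numerical illustration of condition (6), the following sketch (with the illustrative choice C = 1) evaluates the Chen and Chen (2001) penalty $T(p) = C\log(4p(1-p))$ on a grid and confirms that its maximizer is p = 0.5 and that it diverges to $-\infty$ at the boundary:

```python
import numpy as np

C = 1.0
p = np.linspace(1e-6, 1 - 1e-6, 100001)
T = C * np.log(4 * p * (1 - p))   # penalty of Chen and Chen (2001)
print(p[np.argmax(T)])            # 0.5, the argmax required by condition (6)
print(T[0], T[-1])                # large negative values as p approaches 0 or 1
```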
Parameters estimation based on EM algorithm: The EM algorithm is the most popular method for finding maximum likelihood estimates. The algorithm consists of two steps, E and M. In the E-step the conditional expectation of the complete-data log-likelihood, given the observed data and the current values of the unknown parameters, is computed; that is, each $z_{ij}$ is replaced by:

$$\frac{p_j f_j(y_i \mid \theta_j)}{\sum_{j=1}^{k} p_j f_j(y_i \mid \theta_j)}$$

In the M-step the parameter values that maximize the complete-data log-likelihood are computed. The E and M steps are repeated until a stopping condition is met and convergence is reached.
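A minimal sketch of this E-step replacement for a k-component normal mixture with unit variances (the parameter values below are illustrative assumptions):

```python
import numpy as np
from scipy.stats import norm

def e_step(y, p, theta):
    """Return the n x k matrix of E[z_ij | y, theta, p]:
    p_j f_j(y_i | theta_j) / sum_j p_j f_j(y_i | theta_j)."""
    y, p, theta = np.asarray(y), np.asarray(p), np.asarray(theta)
    w = p * norm.pdf(y[:, None], loc=theta[None, :], scale=1.0)
    return w / w.sum(axis=1, keepdims=True)

print(e_step([-1.0, 0.2, 2.0], [0.3, 0.7], [-0.9, 0.3]))
```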
Suppose $Y_1,\dots,Y_n$ is a random sample from the mixture distribution (4); then the log-likelihood function of the complete data is:

$$l_{cn}(\theta, p) = \sum_{i=1}^{n} \big[z_i\{\log p + \log f(y_i \mid \theta_1)\} + (1-z_i)\{\log(1-p) + \log f(y_i \mid \theta_2)\}\big] \qquad (7)$$

In the E-step, the conditional expectation is computed as:

$$z_{i1}^{(t)} = E\big(z_i \mid y_i; \theta^{(t-1)}, p^{(t-1)}\big) = \frac{p^{(t-1)} f(y_i \mid \theta_1^{(t-1)})}{p^{(t-1)} f(y_i \mid \theta_1^{(t-1)}) + (1-p^{(t-1)}) f(y_i \mid \theta_2^{(t-1)})}, \quad i = 1,\dots,n \qquad (8)$$

In the M-step, substituting the values from (8) into the log-likelihood function of the complete data and maximizing with respect to the model parameters gives:

$$\theta_1^{(t)} = \operatorname*{arg\,max}_{\theta_1 \in \Theta} \sum_{i=1}^{n} z_{i1}^{(t)} \log f(y_i \mid \theta_1), \qquad \theta_2^{(t)} = \operatorname*{arg\,max}_{\theta_2 \in \Theta} \sum_{i=1}^{n} \big(1 - z_{i1}^{(t)}\big) \log f(y_i \mid \theta_2)$$

The MLE is then obtained by iterating these steps until the algorithm converges. The estimate of $p$ depends on the chosen penalty function. Chen and Chen (2001) proposed the penalty function:

$$T(p) = C \log\big(4p(1-p)\big) \qquad (9)$$

where $C$ is a positive constant. Li et al. (2008) also used a penalty function of the form:

$$T(p) = C^* \log\big(1 - |1-2p|\big) \qquad (10)$$

Both penalty functions (9) and (10) satisfy condition (6). It is easily shown that, using penalty functions (9) and (10), the value of $p$ in the M-step of the $t$-th iteration of the EM algorithm is, respectively:

$$p^{(t)} = \frac{\sum_{i=1}^{n} z_{i1}^{(t)} + C}{2C + n}$$

and

$$p^{(t)} = \begin{cases} \min\Big\{\dfrac{\sum_{i=1}^{n} z_{i1}^{(t)} + C^*}{n + C^*},\; 0.5\Big\}, & \text{if } \tfrac{1}{n}\sum_{i=1}^{n} z_{i1}^{(t)} < 0.5 \\[2ex] 0.5, & \text{if } \tfrac{1}{n}\sum_{i=1}^{n} z_{i1}^{(t)} = 0.5 \\[2ex] \max\Big\{\dfrac{\sum_{i=1}^{n} z_{i1}^{(t)}}{n + C^*},\; 0.5\Big\}, & \text{if } \tfrac{1}{n}\sum_{i=1}^{n} z_{i1}^{(t)} > 0.5 \end{cases}$$
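For illustration, the following is a minimal sketch, not the authors' code, of the penalized EM iteration for the two-component normal mixture with known unit variances, using penalty (9) and its closed-form p-update given above; the constant C = 1, the starting values and the stopping tolerance are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def penalized_em(y, C=1.0, tol=1e-8, max_iter=500):
    """EM for p*N(mu1,1) + (1-p)*N(mu2,1), penalized by T(p) = C*log(4p(1-p))."""
    p, mu1, mu2 = 0.5, float(y.min()), float(y.max())   # illustrative start values
    for _ in range(max_iter):
        # E-step, Eq. (8): conditional expectation of the component indicator.
        w1 = p * norm.pdf(y, mu1, 1.0)
        w2 = (1.0 - p) * norm.pdf(y, mu2, 1.0)
        z = w1 / (w1 + w2)
        # M-step: weighted means update mu1, mu2; penalty (9) gives the
        # closed-form p-update (sum(z) + C) / (2C + n).
        new = (float((np.sum(z) + C) / (2 * C + y.size)),
               float(np.sum(z * y) / np.sum(z)),
               float(np.sum((1 - z) * y) / np.sum(1 - z)))
        done = max(abs(a - b) for a, b in zip(new, (p, mu1, mu2))) < tol
        p, mu1, mu2 = new
        if done:
            break
    return p, mu1, mu2

rng = np.random.default_rng(0)
n, p0, m1, m2 = 200, 0.25, -0.866, 0.289        # Model 3 of Table 1
z0 = rng.random(n) < p0
y = np.where(z0, rng.normal(m1, 1.0, n), rng.normal(m2, 1.0, n))
print(penalized_em(y))                           # estimates of (p, mu1, mu2)
```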
The choice of penalty function: Although the limiting distribution of the MLRT does not depend on the specific form of $T(p)$, the precision of the approximation and the power of the test do. A modified test obtained from penalty function (9) solves the problems of the LRT. However, penalty function (9) puts too much penalty on the mixing proportion when it is close to 0 or 1, so that even when a mixture is clearly observed the MLRT statistic may fail to reject the null hypothesis. Hence a more reasonable penalty function is needed, one for which the power of the MLRT increases even when the mixing proportion is close to 0 or 1. For this reason penalty function (10) was proposed; it satisfies:

$$\log\big(1 - |1-2p|\big) \leq \log\big(1 - (1-2p)^2\big) = \log\big(4p(1-p)\big)$$

For $p = 0.5$ this inequality becomes an equality. Consequently, penalty functions (9) and (10) are almost equivalent for values of $p$ close to 0.5, but for values of $p$ close to 0 or 1 there is a considerable difference between the two penalty functions. The penalty function suggested by Li et al. (2008) has advantages and increases the efficiency of the test, but there is no reason to expect that it removes the optimality problem of these penalty functions.
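A quick numerical check of this inequality on an illustrative grid:

```python
import numpy as np

p = np.linspace(0.001, 0.999, 999)
lhs = np.log(1 - np.abs(1 - 2 * p))   # penalty (10) with C* = 1
rhs = np.log(4 * p * (1 - p))         # penalty (9) with C = 1
assert np.all(lhs <= rhs + 1e-12)     # the inequality holds on the whole grid
print(np.abs(lhs - rhs).min())        # 0.0: equality is attained at p = 0.5
```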
Optimal proposed penalty function: In the present study the following penalty function, which is able to overcome the disadvantages of penalty functions (9) and (10), is proposed:

$$g(h, p) = C \log\big(1 - |1-2p|^h\big), \qquad 0 < h \leq 2 \qquad (11)$$

Clearly, penalty functions (9) and (10) are special cases of penalty function (11): they are obtained from (11) by substituting $h = 2$ and $h = 1$, respectively. The choice of the value of $h$ affects the shape of the penalty function and consequently the resulting inferences. In classical approaches, choosing another value of $h$ makes penalty function (11) complicated, which in turn makes the maximization in the M-step of the EM algorithm difficult. Because of this difficulty of the classical approach in determining the parameters of the model and of the penalty function, these parameters are estimated within a Bayesian paradigm: suitable prior distributions are first assigned to the parameters, and their estimates are then computed from the posterior. The range of the parameter $h$ in penalty function (11) is the interval (0, 2], but because values of $h$ in the interval (0, 1) may further improve the power of the MLRT, both the uniform distribution and a mixture of two uniform distributions are considered as prior distributions for $h$. In order to maintain impartiality between the null and alternative hypotheses, the uniform distribution is used as the prior distribution for $p$. The next section presents the MCMC algorithm used in the present study for estimating the penalty function.
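A small sketch (with the illustrative choice C = 1) confirming numerically that family (11) reproduces penalty (9) at h = 2 and penalty (10) at h = 1, and showing how smaller h penalizes a boundary value of p more heavily:

```python
import numpy as np

def g(h, p, C=1.0):
    """Proposed penalty g(h, p) = C * log(1 - |1 - 2p|**h), 0 < h <= 2."""
    return C * np.log(1.0 - np.abs(1.0 - 2.0 * p) ** h)

p = np.linspace(0.01, 0.99, 99)
assert np.allclose(g(2.0, p), np.log(4 * p * (1 - p)))        # h = 2 -> penalty (9)
assert np.allclose(g(1.0, p), np.log(1 - np.abs(1 - 2 * p)))  # h = 1 -> penalty (10)
for h in (0.3913, 0.7373, 1.0, 2.0):   # h values appearing in Fig. 1
    print(h, g(h, 0.05))               # smaller h penalizes p = 0.05 more heavily
```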
Fig. 1: Comparison of four penalty functions
Table 1: Mixture models from which the data sets are simulated

Model    p       µ1        µ2
1        0.05    -2.179    0.115
2        0.10    -1.500    0.167
3        0.25    -0.866    0.289
4        0.50    -0.500    0.500

MCMC algorithm: Suppose $(Y_1,\dots,Y_n)$ is a random sample from the mixture $pN(\mu_1, 1) + (1-p)N(\mu_2, 1)$ with an unknown mixing proportion $P = (p_1 = p,\, p_2 = 1-p)$. For the penalized likelihood function we can write:

$$L_c(\mu, P, h \mid z, y) = \prod_{i=1}^{n}\prod_{j=1}^{2}\big(p_j f(y_i \mid \mu_j)\big)^{z_{ij}}\big(1 - |1-2p|^h\big) \propto p^{n_1}(1-p)^{n_2}\prod_{j=1}^{2}\prod_{i:\, z_{ij}=1} e^{-0.5(y_i-\mu_j)^2}\,\big(1 - |1-2p|^h\big)$$

$$= p^{n_1}(1-p)^{n_2}\, e^{-\frac{n_1}{2}\overline{(y_1-\mu_1)^2}\, -\, \frac{n_2}{2}\overline{(y_2-\mu_2)^2}}\,\big(1 - |1-2p|^h\big)$$

where $\mu = (\mu_1, \mu_2)$, $n_j$ (for $j = 1, 2$) is the number of observations associated with the $j$-th component and $\overline{(y_j-\mu_j)^2} = n_j^{-1}\sum_{i:\, z_{ij}=1}(y_i-\mu_j)^2$. Now, by considering the normal prior distribution $N(a, 1/b)$, with $a \in \mathbb{R}$ and $b > 0$, for both $\mu_1$ and $\mu_2$, a Beta$(\alpha_1, \alpha_2)$ prior distribution for $p$ and, finally, the prior distributions $U(0, 1)$ and $wU(0, 0.5) + (1-w)U(0.5, 1)$ for $h$, we obtain:

$$\pi(\mu, P, h) \propto e^{-\frac{b}{2}(\mu_1-a)^2}\, e^{-\frac{b}{2}(\mu_2-a)^2}\, p^{\alpha_1-1}(1-p)^{\alpha_2-1}$$

and therefore:

$$\pi(\mu, P, h \mid y, z) = L_c(\mu, P, h \mid z, y)\,\pi(\mu, P, h) \propto e^{-\frac{n_1}{2}\overline{(y_1-\mu_1)^2} - \frac{b}{2}(\mu_1-a)^2}\, e^{-\frac{n_2}{2}\overline{(y_2-\mu_2)^2} - \frac{b}{2}(\mu_2-a)^2}\, p^{n_1+\alpha_1-1}(1-p)^{n_2+\alpha_2-1}\big(1 - |1-2p|^h\big)$$

Bayesian analysis of mixture models has many difficulties because of its natural complexity (Diebolt and Robert, 1994). In this case the posterior distribution does not have a closed form; hence, to carry out inference, an MCMC algorithm facilitates sampling from the posterior distribution.

Table 2: Mean square errors of the Bayesian and classical estimators, using U(0, 1) as the prior distribution for h

Model    Estimator    µ̂1         µ̂2        p̂         ĥ
1        θ̂ML         2.5230      0.0680     0.0570     -
         θ̂B          0.2430      0.0045     0.0044     0.0094
2        θ̂ML         0.1505      0.0210     0.0167     -
         θ̂B          0.0031      0.0002     0.0025     0.1943
3        θ̂ML         0.3212      0.0021     0.0610     -
         θ̂B          0.0081      0.0006     0.0022     0.0290
4        θ̂ML         0.00021     0.0380     0          -
         θ̂B          0.0004      0.0260     0.0009     0.0165
Table 3: Values and mean square errors (in parentheses) of the Bayesian and classical estimators, using 0.25U(0, 0.5) + 0.75U(0.5, 1) as the prior distribution for h

Model    Estimator    µ̂1                  µ̂2                 p̂                  ĥ
1        θ̂ML         -1.1262 (1.1104)     0.3013 (0.0348)     0.2194 (0.0289)     -
         θ̂B          -1.9087 (0.0763)     0.1003 (0.0002)     0.0372 (0.0058)     0.5445 (0.0126)
2        θ̂ML         -0.5419 (0.9290)     0.3861 (0.0488)     0.4885 (0.1518)     -
         θ̂B          -1.3100 (0.0362)     0.0778 (0.0095)     0.0896 (0.0031)     0.6002 (0.1005)
3        θ̂ML         -0.3870 (0.2299)     0.5022 (0.0456)     0.4976 (0.0614)     -
         θ̂B          -0.7564 (0.0122)     0.3153 (0.0007)     0.2497 (0.0013)     0.7373 (0.0185)
4        θ̂ML         -0.2310 (0.0725)     0.2444 (0.0655)     0.5 (0)             -
         θ̂B          -0.3671 (0.0175)     0.3908 (0.0119)     0.4961 (0.0007)     0.5555 (0.0111)

Table 4: Values and mean square errors (in parentheses) of the Bayesian and classical estimators, using 0.75U(0, 0.5) + 0.25U(0.5, 1) as the prior distribution for h

Model    Estimator    µ̂1                  µ̂2                 p̂                  ĥ
1        θ̂ML         -1.6041 (0.3314)     0.1871 (0.0053)     0.2010 (0.0230)     -
         θ̂B          -2.2611 (0.0082)     0.0970 (0.0002)     0.0584 (0.0052)     0.4302 (0.0034)
2        θ̂ML         -1.9715 (0.0984)     0.2922 (0.0162)     0.1881 (0.0086)     -
         θ̂B          -1.6471 (0.0226)     0.2157 (0.0023)     0.0979 (0.0032)     0.3919 (0.0039)
3        θ̂ML         -0.3041 (0.3164)     0.6712 (0.0332)     0.4978 (0.0615)     -
         θ̂B          -0.7539 (0.0127)     0.3521 (0.0040)     0.2371 (0.0017)     0.7787 (0.0597)
4        θ̂ML         -0.3205 (0.0341)     0.4122 (0.0079)     0.4253 (0.0063)     -
         θ̂B          -0.4044 (0.0092)     0.5178 (0.0003)     0.4631 (0.0026)     0.2117 (0.0168)
Typical use of MCMC sampling can only approximate the target distribution. Two mechanisms for generating such Markov chains are the Gibbs sampler and the Metropolis-Hastings algorithm. Since the Gibbs sampler may fail to escape the attraction of a local mode (Marin et al., 2005), the standard alternative, Metropolis-Hastings, is used here for sampling from the posterior distribution. This algorithm uses a proposal density which depends on the current state. The general Metropolis-Hastings sampler is as follows:

Initialization: Choose $P^{(0)}$, $\theta^{(0)}$ and $h^{(0)}$.

Step t: For $t = 1, 2, \dots$:

• Generate $(\tilde\theta, \tilde P, \tilde h)$ from $q\big(\theta, P, h \mid \theta^{(t-1)}, P^{(t-1)}, h^{(t-1)}\big)$
• Compute

$$r = \frac{\pi\big(\tilde\theta, \tilde p, \tilde h \mid y\big)\, q\big(\theta^{(t-1)}, P^{(t-1)}, h^{(t-1)} \mid \tilde\theta, \tilde p, \tilde h\big)}{\pi\big(\theta^{(t-1)}, P^{(t-1)}, h^{(t-1)} \mid y\big)\, q\big(\tilde\theta, \tilde p, \tilde h \mid \theta^{(t-1)}, P^{(t-1)}, h^{(t-1)}\big)}$$

• Generate $u \sim U[0, 1]$: if $u \leq r$, set $(\theta^{(t)}, p^{(t)}, h^{(t)}) = (\tilde\theta, \tilde p, \tilde h)$; else set $(\theta^{(t)}, p^{(t)}, h^{(t)}) = (\theta^{(t-1)}, p^{(t-1)}, h^{(t-1)})$

where $q$ is the proposal distribution, often taken to be a random walk (Marin et al., 2005).
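To make the procedure concrete, below is a minimal random-walk Metropolis-Hastings sketch targeting the posterior $\pi(\mu, P, h \mid y, z)$ derived above, treating the component memberships z as known, as in the complete-data formulation; the hyperparameters (a = 0, b = 0.01, α1 = α2 = 1, C = 1), the flat U(0, 1) prior on h, the proposal scales and the chain length are all illustrative assumptions rather than settings reported in the paper. Because the proposal is a symmetric random walk, the q terms cancel in the ratio r.

```python
import numpy as np

rng = np.random.default_rng(2)

# Complete data simulated from Model 3 of Table 1: 0.25*N(-0.866,1) + 0.75*N(0.289,1).
n = 200
z = rng.random(n) < 0.25                       # True -> observation from component 1
y = np.where(z, rng.normal(-0.866, 1.0, n), rng.normal(0.289, 1.0, n))
n1 = int(z.sum())

a, b, alpha1, alpha2 = 0.0, 0.01, 1.0, 1.0     # illustrative prior hyperparameters

def log_post(mu1, mu2, p, h):
    """log pi(mu, P, h | y, z) up to an additive constant, as derived above,
    with a flat U(0, 1) prior on h."""
    if not (0.0 < p < 1.0 and 0.0 < h < 1.0):
        return -np.inf
    loglik = (n1 * np.log(p) + (n - n1) * np.log(1.0 - p)
              - 0.5 * np.sum(np.where(z, (y - mu1) ** 2, (y - mu2) ** 2))
              + np.log(1.0 - np.abs(1.0 - 2.0 * p) ** h))   # penalty factor, C = 1
    logprior = (-0.5 * b * ((mu1 - a) ** 2 + (mu2 - a) ** 2)
                + (alpha1 - 1) * np.log(p) + (alpha2 - 1) * np.log(1.0 - p))
    return loglik + logprior

state = np.array([-1.0, 0.0, 0.3, 0.5])        # initial (mu1, mu2, p, h)
scales = np.array([0.15, 0.08, 0.04, 0.08])    # illustrative proposal std devs
draws = []
for t in range(20000):
    prop = state + scales * rng.normal(size=4) # symmetric random-walk proposal
    r = log_post(*prop) - log_post(*state)     # log acceptance ratio
    if np.log(rng.random()) < r:               # accept with probability min(1, e^r)
        state = prop
    draws.append(state.copy())
post = np.array(draws)[5000:]                  # discard burn-in
print(post.mean(axis=0))                       # posterior means of (mu1, mu2, p, h)
```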
SIMULATION STUDY AND DISCUSSION
Figure 1 exhibits the graphs of the two penalty functions (9) and (10) for $C^* = 1$, together with the proposed penalty function evaluated at the Bayesian estimates $h = 0.7373$ and $h = 0.3913$. It can be seen that penalty function (9) is almost flat around $p = 0.5$ and puts essentially no penalty there, while penalty function (10) and the penalty functions with $h = 0.7373$ and $h = 0.3913$ put more penalty on values of $p$ close to 0 or 1. The purpose of this section is to compare the estimation of the parameters using penalty function (10) with that using the penalty function obtained from Bayesian estimation. The simulation experiment was conducted under a normal kernel with known variance 1. The mean of the null distribution and of the alternative distributions is 0, and the variance of the alternative models is set to be 1.25 times the variance under the null model. Therefore we can consider:

$$E_{H_0}(y) = 0, \qquad Var_{H_0}(y) = 1$$

$$E_{H_1}(y) = p\mu_1 + (1-p)\mu_2 = 0$$
and

$$Var_{H_1}(y) = E_{H_1}(y^2) - E_{H_1}^2(y) = p(1-p)(\mu_1 - \mu_2)^2 + 1$$

Table 1 presents four alternative models under these conditions; a sample of size n = 200 is simulated from each model. Tables 2, 3 and 4 present the comparison between the classical and Bayesian estimates of the mixture model parameters; the distributions U(0, 1), 0.25U(0, 0.5) + 0.75U(0.5, 1) and 0.75U(0, 0.5) + 0.25U(0.5, 1) are used as the prior distribution for h in Tables 2, 3 and 4, respectively. The simulation results show that when p is close to 0 or 1, the Mean Square Error (MSE) of the Bayesian estimator is smaller than that of the classical estimator obtained via the EM algorithm. The µ1 and µ2 values in Table 1 follow from the two moment conditions above, as the sketch below illustrates.
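Solving $p\mu_1 + (1-p)\mu_2 = 0$ and $p(1-p)(\mu_1-\mu_2)^2 = 0.25$ gives $\mu_1 = -0.5\sqrt{(1-p)/p}$ and $\mu_2 = 0.5\sqrt{p/(1-p)}$; a few lines of illustrative code reproduce Table 1:

```python
import numpy as np

for p in (0.05, 0.10, 0.25, 0.5):
    mu1 = -0.5 * np.sqrt((1 - p) / p)       # from E[y] = 0 and the variance constraint
    mu2 = 0.5 * np.sqrt(p / (1 - p))
    print(p, round(mu1, 3), round(mu2, 3))  # matches the rows of Table 1
```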
CONCLUSION
In this study an optimal penalty function, relative to the well-known penalty functions, for testing homogeneity of mixture models is proposed, and a Bayesian approach is successfully employed to estimate the parameter of the penalty function together with the parameters of the mixture models. Simulation experiments show that the Bayesian approach is better than the classical approach, both in the estimation of the model parameters and in the determination of the optimal penalty function, in situations where the mixture model approaches non-identifiability.

REFERENCES
Chen, J., 1998. Penalized likelihood ratio test for finite
mixture models with multinomial observations. Can.
J. Stat., 26: 583-599.
Chen, H. and J. Chen, 2001. The likelihood ratio test for
homogeneity in the finite mixture models. Can. J.
Stat., 29: 201-215.
Chen, H., J. Chen and J.D. Kalbfleisch, 2004. Testing for
a finite mixture model with two components. J. R.
Stat. Soc. B., 66: 95-115.
Diebolt, J. and C.P. Robert, 1994. Estimation of finite
mixture distributions through Bayesian sampling. J.
R. Stat. Soc. B., 56: 363-375.
Lavine, M. and M. West, 1992. A Bayesian method for classification and discrimination. Can. J. Stat., 20: 451-461.
Li, P., J. Chen and P. Marriott, 2008. Non-finite Fisher information and homogeneity: The EM approach. Biometrika, 96: 411-426.
Lindsay, B.G. and E. Basak, 1993. Multivariate normal mixtures: A fast consistent method of moments. J. Am. Stat. Assoc., 88: 468-476.
Liu, X. and Y. Shao, 2003. Asymptotics for likelihood ratio tests under loss of identifiability. Ann. Stat., 31(3): 807-832.
Marin, J.M., K. Mengersen and C.P. Robert, 2005. Bayesian modelling and inference on mixtures of distributions. Handbook of Statistics, 25: 459-507.