Theoretical Mathematics & Applications, vol.4, no.1, 2014, 55-69
ISSN: 1792-9687 (print), 1792-9709 (online)
Scienpress Ltd, 2014
Sequential Estimation of the Mean of a Class
of Skewed Distributions
Mohamed Tahir 1
Abstract
In this paper, we propose a sequential procedure $(t, \hat{\mu}_t)$ for estimating the mean, $\mu$, of a class of skewed probability density functions, subject to the loss function $L_a = a^2(\hat{\mu}_t - \mu)^2 + t$, where $a$ is a given positive number, $t$ is a stopping time of the type proposed by Robbins (1959) and $\hat{\mu}_t$ is a bias-corrected estimator of $\mu$. We provide a second-order asymptotic expansion, as $a \to \infty$, for the regret with respect to the loss $L_a$. For the Pareto and Skew-uniform distributions, the proposed sequential procedure $(t, \hat{\mu}_t)$ performs better than the procedure $(t, \bar{X}_t)$, in the sense that it has a lower asymptotic regret. Moreover, the regret is negative for large values of $a$ under the Gamma, Pareto, Rayleigh and Skew-uniform distributions. Using the loss considered by Chow and Yu (1981) and Martinsek (1988), we propose a bias-corrected estimator of $\mu$ and provide a second-order asymptotic expansion, as $a \to \infty$, for the incurred regret.
1
Department of Mathematics, University of Sharjah, United Arab Emirates.
E-mail: mtahir@sharjah.ac.ae
Article Info: Received: January 3, 2014. Revised: March 1, 2014.
Published online : March 31, 2014.
Mathematics Subject Classification: 62F10; 62L10; 62L12
Keywords: Asymptotic expansion, bias-corrected estimator, excess over the
stopping boundary, loss function, regret, skewness, stopping time.
1 Introduction
Let $X_1, X_2, \ldots$ be independent random variables with common probability density function $f_\theta(x)$, where the value of $\theta$ is unknown, but lies in some interval $\Omega \subset (-\infty, \infty)$. Suppose that $X_1, X_2, \ldots$ are to be observed sequentially up to stage $n$ at a cost of one unit per observation and that, when observation is terminated, the population mean
$$\mu = \int_{-\infty}^{\infty} x f_\theta(x)\,dx$$
is estimated by an appropriate estimator, $\hat{\mu}_n$, and the loss incurred is of the form
$$L_a(\hat{\mu}_n, \theta) = a^2(\hat{\mu}_n - \mu)^2 + n, \tag{1}$$
where $a$ is a known positive number, determined by the cost of estimation relative to the cost of a single observation.
Robbins (1959) proposed the sequential procedure $(t, \bar{X}_t)$, which stops the sampling process after observing $X_1, \ldots, X_t$ and estimates $\mu$ by $\hat{\mu}_t = \bar{X}_t$, where
$$t = \inf\left\{ n \ge m_a : n > a \sqrt{\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X}_n)^2} \right\} \tag{2}$$
with $m_a$ being a positive integer.
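The stopping rule (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `robbins_stopping_time`, the default `m_a = 2` and the use of Welford's running update for the sum of squared deviations are choices made here, not part of the paper.

```python
import math
from typing import Iterable, Tuple


def robbins_stopping_time(xs: Iterable[float], a: float, m_a: int = 2) -> Tuple[int, float]:
    """Return (t, sample mean at t) for rule (2).

    Sampling stops at the first n >= m_a with n > a * s_n, where
    s_n^2 = (1/n) * sum_{i<=n} (x_i - mean_n)^2.
    The name and the default m_a are illustrative assumptions.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean (Welford)
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n >= m_a and n > a * math.sqrt(m2 / n):
            return n, mean
    raise ValueError("the data stream ended before the stopping rule was met")
```

For the alternating sequence 0, 2, 0, 2, ... with a = 5, the rule first holds at n = 5, where the sample mean is 0.8.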
Let $\mathcal{C}$ denote the class of skewed probability density functions $f_\theta(x)$, $\theta \in \Omega$, for which the skewness is independent of $\theta$. This class contains, among others, the density functions of the following distributions:
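1- GAMMA($\alpha$, $\theta$): the Gamma distribution with known shape parameter $\alpha$ and scale parameter $\beta = \theta$. Its density function is $f_\theta(x) = \dfrac{1}{\theta^{\alpha}\Gamma(\alpha)}\, x^{\alpha-1} e^{-x/\theta}$, $x > 0$, and its skewness is $\gamma = \dfrac{2}{\sqrt{\alpha}}$.
2- PARETO($\alpha$, $\theta$): the Pareto distribution with known shape parameter $\alpha > 0$ and scale parameter $\beta = \theta$. Its density function is $f_\theta(x) = \dfrac{\alpha\theta^{\alpha}}{x^{\alpha+1}}$, $x \ge \theta$, and its skewness is $\gamma = \dfrac{2(1+\alpha)}{\alpha-3}\sqrt{\dfrac{\alpha-2}{\alpha}}$ for $\alpha > 3$.
3- RAYLEIGH($\theta$): the Rayleigh distribution with scale parameter $\theta$. Its density function is $f_\theta(x) = \dfrac{x}{\theta^{2}}\, e^{-x^{2}/(2\theta^{2})}$, $x > 0$, and its skewness is $\gamma = \dfrac{2\sqrt{\pi}\,(\pi-3)}{(4-\pi)^{3/2}}$.
4- SKEWUNIFORM($\lambda$, $\theta$): the Skew-uniform distribution with known $\lambda$ and unknown $\theta$. Its density function is $f_\theta(x) = \dfrac{1}{\theta^{2}}\left[\max\{\min\{\lambda x, \theta\}, -\theta\} + \theta\right]$ for $-\theta < x < \theta$, and its skewness is $\gamma = \dfrac{2\lambda(5\lambda^{2}-9)}{5(3-\lambda^{2})^{3/2}}$ for $-\sqrt{3} < \lambda < \sqrt{3}$.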
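The four skewness formulas above depend only on the known parameters ($\alpha$ or $\lambda$), never on $\theta$, which is what places these families in the class. A small sketch that evaluates them numerically follows; the function names are hypothetical and chosen here for illustration only.

```python
import math


def gamma_skewness(alpha: float) -> float:
    """Skewness 2 / sqrt(alpha) of GAMMA(alpha, theta); free of theta."""
    return 2.0 / math.sqrt(alpha)


def pareto_skewness(alpha: float) -> float:
    """Skewness of PARETO(alpha, theta); the formula requires alpha > 3."""
    if alpha <= 3:
        raise ValueError("the skewness formula requires alpha > 3")
    return 2.0 * (1.0 + alpha) / (alpha - 3.0) * math.sqrt((alpha - 2.0) / alpha)


def rayleigh_skewness() -> float:
    """Skewness of RAYLEIGH(theta); a constant."""
    return 2.0 * math.sqrt(math.pi) * (math.pi - 3.0) / (4.0 - math.pi) ** 1.5


def skew_uniform_skewness(lam: float) -> float:
    """Skewness formula quoted for SKEWUNIFORM(lambda, theta), |lambda| < sqrt(3)."""
    if lam * lam >= 3.0:
        raise ValueError("the skewness formula requires |lambda| < sqrt(3)")
    return 2.0 * lam * (5.0 * lam ** 2 - 9.0) / (5.0 * (3.0 - lam ** 2) ** 1.5)
```

With these definitions, `pareto_skewness(5)` is about 4.6476 and `rayleigh_skewness()` is about 0.6311, the values used in the tables of Section 4.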
In this paper, we propose a bias-corrected estimator $\hat{\mu}_t$ of $\mu$ and provide a second-order asymptotic expansion, as $a \to \infty$, for the regret $r_a(t, \hat{\mu}_t)$ with respect to the loss defined by (1). It is seen that the asymptotic regret is negative for the Gamma, Pareto, Rayleigh and Skew-uniform distributions. We also provide a second-order asymptotic expansion, as $a \to \infty$, for the regret with respect to the more general loss function considered by Chow and Yu (1981) and Martinsek (1988).
In the Normal case, Starr and Woodroofe (1969) showed that $r_a(t, \bar{X}_t) = O(1)$ as $a \to \infty$. Woodroofe (1977) showed that $r_a(t, \bar{X}_t) = 0.5 + o(1)$ as $a \to \infty$ if $m_a \ge 4$. For the Gamma and Poisson cases, Starr and Woodroofe (1972) and Vardi (1979) obtained bounded regret using stopping times different from the one in (2). For the distribution-free case, Ghosh and Mukhopadhyay (1979) and Chow and Yu (1981) established asymptotic risk efficiency based on (2) under certain conditions. Tahir (1989) proposed a class of bias-reduction estimators of the mean of the one-parameter exponential family and provided a second-order approximation for the regret.
2 Preliminary Notes
Let $t$ be as in (2). Martinsek (1988) indicated that
$$E[\bar{X}_t] = \mu - \frac{\gamma}{2a} + o\!\left(\frac{1}{a}\right) \tag{3}$$
as $a \to \infty$, provided that $E[|X_1|^{8+p}] < \infty$ for some $p > 0$, where $\gamma$ denotes the population skewness; that is, $\gamma = \sigma^{-3} E[(X_1 - \mu)^3]$, where $\sigma$ is the population standard deviation. Thus, $\bar{X}_t$ is an asymptotically biased estimator of $\mu$ if $f_\theta(x) \in \mathcal{C}$. Consider the bias-corrected estimator
$$\hat{\mu}_n = \bar{X}_n + \frac{\gamma}{2a} \tag{4}$$
for $n \ge 1$. Then, $E[\hat{\mu}_t] = \mu + o(1)$ as $a \to \infty$, by (3).
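As a minimal sketch of the estimator (4), assuming the skewness $\gamma$ is known (as it is for every density in the class), the correction is a constant shift of the sample mean. The function name `bias_corrected_mean` is a hypothetical choice for this illustration.

```python
from typing import Sequence


def bias_corrected_mean(xs: Sequence[float], a: float, gamma: float) -> float:
    """Estimator (4): sample mean plus gamma / (2a).

    gamma is the known population skewness and a > 0 is the cost constant of loss (1).
    """
    if not xs:
        raise ValueError("at least one observation is required")
    return sum(xs) / len(xs) + gamma / (2.0 * a)
```

For example, with observations 1, 2, 3, a = 10 and $\gamma$ = 2, the sample mean 2 is shifted to 2.1.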
In order to define the regret incurred by the sequential procedure $(t, \hat{\mu}_t)$ under the loss (1), we first assume that $X_1, \ldots, X_n$ have been observed sequentially up to a predetermined stage $n$ from a population with density function $f_\theta(x) \in \mathcal{C}$. The risk incurred by estimating $\mu$ by (4), subject to the loss (1), is
$$\begin{aligned} R_a(n, \theta) &= E[L_a(\hat{\mu}_n, \theta)] \\ &= E[a^2(\bar{X}_n - \mu)^2] + a\gamma E[\bar{X}_n - \mu] + \frac{\gamma^2}{4} + n \\ &= \frac{a^2\sigma^2}{n} + \frac{\gamma^2}{4} + n. \end{aligned}$$
This risk is minimized with respect to $n$ by choosing $n$ as the greatest integer less than or equal to $n_a = a\sigma$.
The minimum risk is
$$R_a^*(\theta) = R_a(n_a, \theta) = 2a\sigma + \frac{\gamma^2}{4}$$
for $a > 0$. Since $\sigma$ is unknown, there is no fixed-sample-size procedure that attains the minimum risk in practice. Therefore, we propose to use the sequential procedure $(t, \hat{\mu}_t)$, where $t$ is as in (2). The performance of this procedure is measured by its regret, which is defined below.
Definition 2.1 The regret of the procedure $(t, \hat{\mu}_t)$ under the loss (1) is defined as
$$r_a(t, \hat{\mu}_t) = E[L_a(\hat{\mu}_t, \theta)] - R_a^*(\theta) = E[a^2(\hat{\mu}_t - \mu)^2 + t] - 2a\sigma - \frac{\gamma^2}{4} \tag{5}$$
for $a > 0$.
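The fixed-sample-size risk and its minimum can be checked numerically. The sketch below evaluates $a^2\sigma^2/n + \gamma^2/4 + n$ and the benchmark $2a\sigma + \gamma^2/4$ that the regret (5) is measured against; the function names and the numerical values in the usage note are illustrative assumptions.

```python
def fixed_sample_risk(n: int, a: float, sigma: float, gamma: float) -> float:
    """Risk of the bias-corrected estimator (4) at a fixed sample size n under loss (1)."""
    return a ** 2 * sigma ** 2 / n + gamma ** 2 / 4.0 + n


def minimum_risk(a: float, sigma: float, gamma: float) -> float:
    """Benchmark risk 2*a*sigma + gamma^2/4, attained at n_a = a*sigma."""
    return 2.0 * a * sigma + gamma ** 2 / 4.0
```

With a = 10, $\sigma$ = 2 and $\gamma$ = 1, the minimizer is $n_a$ = 20 and both functions return 40.25 there, while neighbouring sample sizes such as 19 or 21 give a slightly larger risk.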
The stopping time $t$ in (2) can be rewritten as
$$t = \inf\left\{ n \ge m_a : n\left(\frac{V_n}{n}\right)^{-1/2} > a \right\}, \quad \text{where} \quad V_n = \sum_{i=1}^{n} (X_i - \bar{X}_n)^2 \tag{6}$$
for $n \ge 1$. Let $U_a = t(V_t/t)^{-1/2} - a$ denote the excess over the stopping boundary. Chang and Hsiung (1979) showed that $U_a$ converges in distribution to a random variable $U$ as $a \to \infty$.
Lemma 2.2. Let $t$ be as in (2). Then, $t/a \to \sigma$ w.p.1 as $a \to \infty$. Moreover, if $E[|X_1|^{8+p}] < \infty$ for some $p > 0$, then
$$E[t] = a\sigma + \nu - 0.5 - \frac{3}{8}\sigma^{4}(\kappa - 1) + o(1)$$
as $a \to \infty$, where $\nu = E[U]$ is the asymptotic mean of the excess over the boundary and $\kappa = \sigma^{-4} E[(X_1 - \mu)^4]$ is the population kurtosis.
Proof: The first assertion follows from Lemma 1 of Chow and Robbins (1965). The second assertion is adopted from Chang and Hsiung (1979).
3 Main Results
3.1 Asymptotic regret under the loss (1)
Let $X_1, X_2, \ldots$ be as in Section 1. The following theorem provides a second-order asymptotic expansion for the regret in (5).
Theorem 3.1. Let $t$ be defined by (2) with $m_a$ being such that $\delta\sqrt{a} \le m_a = o(a)$ as $a \to \infty$ for some $\delta > 0$. For any probability density function $f_\theta(x) \in \mathcal{C}$ with respect to which $E[|X_1|^{8+p}] < \infty$ for some $p > 0$,
$$r_a(t, \hat{\mu}_t) = 2.75 - 0.75\kappa + 2\gamma^2 - \frac{\gamma}{2} + o(1)$$
as $a \to \infty$.
Proof: Substituting (4) in (5) yields
$$\begin{aligned} r_a(t, \hat{\mu}_t) &= E[a^2(\bar{X}_t - \mu)^2 + t - 2a\sigma] + a\gamma E[\bar{X}_t - \mu] \\ &= r_a(t, \bar{X}_t) + a\gamma E[\bar{X}_t - \mu] \end{aligned} \tag{7}$$
for $a > 0$. Moreover,
$$aE[\bar{X}_t - \mu] = -\frac{\gamma}{2} + o(1) \quad \text{and} \quad r_a(t, \bar{X}_t) = 2.75 - 0.75\kappa + 2\gamma^2 + o(1) \tag{8}$$
as $a \to \infty$, by (3) and Martinsek (1983). Take the limit as $a \to \infty$ in (7) and use (8) to complete the proof.
The distributions considered in Tables 1-5 in Section 4 below are positively skewed, except for the Skew-uniform distribution with $-\sqrt{3} < \lambda < -3/\sqrt{5}$ and the Skew-Laplace distribution with $\lambda = 0.5$. For Table 1, the minimum value of $\rho^*$ is $75/28 \approx 2.68$, which is attained when $\alpha = 49$. The tables show that
1- the sequential procedure $(t, \hat{\mu}_t)$ is a clear improvement over the procedure $(t, \bar{X}_t)$ since its asymptotic regret is lower, except for the Skew-uniform distribution with $\lambda = -1.4$.
2- the asymptotic regret of the procedure $(t, \hat{\mu}_t)$ under the PARETO(5, $\theta$) and SKEWUNIFORM($\lambda$, $\theta$) distributions is negative, which means that, for large values of $a$, the procedure $(t, \hat{\mu}_t)$ performs better for these distributions than the best fixed-sample-size procedure.
3.2 Asymptotic regret under a more general loss function
Let $X_1, X_2, \ldots$ be as in Section 1 and suppose that the loss function for estimating $\mu$ is of the form considered by Chow and Yu (1981) and Martinsek (1988); that is,
$$L_a(\mu_n^*, \theta) = a^2\sigma^{2\beta-2}(\mu_n^* - \mu)^2 + n \tag{9}$$
for $a > 0$, where $\beta$ is a given positive number and $\mu_n^*$ is an estimator of $\mu$. If $\mu$ is estimated by $\mu_n^* = \bar{X}_n$, Martinsek (1988) proposed to use the stopping time
$$T = \inf\left\{ n \ge m_a : n > a\left[\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X}_n)^2\right]^{\beta/2} \right\} \tag{10}$$
and showed that the regret of the procedure $(T, \bar{X}_T)$ under the loss (9) is
$$\begin{aligned} r_a^*(T, \bar{X}_T) &= E[a^2\sigma^{2\beta-2}(\bar{X}_T - \mu)^2 + T] - 2a\sigma^{\beta} \\ &= 3\beta - \frac{\beta^2}{4} + \left(\frac{\beta^2}{4} - \beta\right)\kappa + (\beta^2 + \beta)\gamma^2 + o(1) \end{aligned} \tag{11}$$
as $a \to \infty$, provided that $E[|X_1|^{8+p}] < \infty$ for some $p > 0$. Straightforward calculations yield that, for large values of $a$,
1) $r_a^*(T, \bar{X}_T)$ is negative under the Gamma distribution with $\alpha = 0.5$ if $0 < \beta < 0.1$.
2) $r_a^*(T, \bar{X}_T)$ is negative under the Pareto distribution with $\alpha = 5$ if $0 < \beta < 1.24$.
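The Pareto claim in item 2 can be checked by evaluating the leading term of (11) directly. The sketch below is an illustration only: the function name `general_regret` is hypothetical, and the inputs used in the usage note are the PARETO(5, $\theta$) skewness and kurtosis reported in Table 2.

```python
import math


def general_regret(beta: float, gamma: float, kappa: float) -> float:
    """Leading term of expansion (11) for the procedure (T, sample mean) under loss (9)."""
    return (3.0 * beta - beta ** 2 / 4.0
            + (beta ** 2 / 4.0 - beta) * kappa
            + (beta ** 2 + beta) * gamma ** 2)


# PARETO(5, theta): gamma = 6*sqrt(3/5), kappa = 73.8 (values from Table 2).
PARETO5_GAMMA = 6.0 * math.sqrt(0.6)
PARETO5_KAPPA = 73.8
```

For these inputs the expression reduces to $39.8\beta^2 - 49.2\beta$, which is negative for $0 < \beta < 49.2/39.8 \approx 1.24$; for instance it is negative at $\beta = 1.2$ and positive at $\beta = 1.3$.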
Martinsek (1988) also indicated that
$$E[\bar{X}_T] = \mu - \frac{\beta\gamma}{2a\sigma^{\beta-1}} + o\!\left(\frac{1}{a}\right) \tag{12}$$
as $a \to \infty$. Thus, if the distribution of $X_1$ is not symmetric, then $\bar{X}_T$ is biased for large values of $a$.
Proposition 3.2: Suppose that $\gamma$ does not depend on $\theta$ and let
$$\mu_n^* = \bar{X}_n + \frac{\beta\gamma}{2a^{1/\beta}\, n^{1-1/\beta}}$$
for $n \ge 1$, where $\beta > 1$. Let $T$ be defined by (10) with $m_a$ being such that $\delta\sqrt{a} \le m_a = o(a)$ as $a \to \infty$ for some $\delta > 0$. For any probability density function $f_\theta(x) \in \mathcal{C}$ with respect to which $E[|X_1|^{8+p}] < \infty$ for some $p > 0$, $E[\mu_T^*] = \mu + o(1)$ as $a \to \infty$.
Proof: For $a > 0$,
$$aE[\mu_T^* - \mu] = aE[\bar{X}_T - \mu] + \frac{\beta\gamma}{2} E\left[\left(\frac{T}{a}\right)^{-(1-1/\beta)}\right]. \tag{13}$$
Next, $E[(T/a)^{-(1-1/\beta)}] \to \sigma^{1-\beta}$ as $a \to \infty$ if $\beta > 1$, by the fact that $T/a \to \sigma^{\beta}$ w.p.1 as $a \to \infty$ and (2.2) of Martinsek (1983). Taking the limit as $a \to \infty$ in (13), using this fact and (12) yields the desired result.
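A minimal sketch of the stopping time (10) together with the corrected estimator of Proposition 3.2 is given below. The function names, the default `m_a = 2` and the running-variance update are assumptions made for illustration; the proposition itself requires $\beta > 1$, and $\beta = 1$ appears in the usage note only as a sanity check against rule (2) and estimator (4).

```python
from typing import Iterable, Tuple


def general_stopping_time(xs: Iterable[float], a: float, beta: float,
                          m_a: int = 2) -> Tuple[int, float]:
    """Return (T, sample mean at T) for rule (10).

    Sampling stops at the first n >= m_a with n > a * (V_n / n) ** (beta / 2),
    where V_n is the sum of squared deviations from the sample mean.
    """
    n = 0
    mean = 0.0
    v = 0.0  # running V_n, updated with Welford's recursion
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        v += delta * (x - mean)
        if n >= m_a and n > a * (v / n) ** (beta / 2.0):
            return n, mean
    raise ValueError("the data stream ended before the stopping rule was met")


def corrected_estimate(mean: float, n: int, a: float, beta: float, gamma: float) -> float:
    """Estimator of Proposition 3.2: mean + beta*gamma / (2 * a^(1/beta) * n^(1 - 1/beta))."""
    return mean + beta * gamma / (2.0 * a ** (1.0 / beta) * n ** (1.0 - 1.0 / beta))
```

On the alternating sequence 0, 2, 0, 2, ... with a = 5, both $\beta = 1$ and $\beta = 2$ stop at n = 5. With mean 1, n = 4, a = 4, $\beta$ = 2 and $\gamma$ = 1, the corrected estimate is 1.25.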
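Let $r_a^*(T, \mu_T^*)$ denote the regret of the bias-corrected procedure $(T, \mu_T^*)$ under the loss (9). Then,
$$\begin{aligned} r_a^*(T, \mu_T^*) &= E[a^2\sigma^{2\beta-2}(\bar{X}_T - \mu)^2 + T - 2a\sigma^{\beta}] + \beta\gamma\sigma^{2\beta-2} a^{2-1/\beta} E\left[\frac{1}{T^{1-1/\beta}}(\bar{X}_T - \mu)\right] + \frac{\gamma^2\sigma^{2\beta-2}}{4} E\left[\frac{a^{2-2/\beta}}{T^{2-2/\beta}}\right] \\ &= r_a^*(T, \bar{X}_T) + \beta\gamma\sigma^{2\beta-2} E\left[\frac{a^{1-1/\beta}}{T^{1-1/\beta}}\, a(\bar{X}_T - \mu)\right] + \frac{\gamma^2\sigma^{2\beta-2}}{4} E\left[\frac{a^{2-2/\beta}}{T^{2-2/\beta}}\right]. \end{aligned} \tag{14}$$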
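Lemma 3.3: Let $T$ be as in (10) with $\beta > 1$. If $E[|X_1|^{8+p}] < \infty$ for some $p > 0$, then
$$E\left[\frac{a^{1-1/\beta}}{T^{1-1/\beta}}\, a(\bar{X}_T - \mu)\right] = \frac{2(\beta - 1)}{\sigma^{2\beta+1}} - \frac{\beta\gamma}{2\sigma^{2(\beta-1)}} + o(1)$$
as $a \to \infty$.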
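Proof: First, observe that
$$E\left[\frac{a^{1-1/\beta}}{T^{1-1/\beta}}\, a(\bar{X}_T - \mu)\right] = E\left[\left(\frac{a^{1-1/\beta}}{T^{1-1/\beta}} - \frac{1}{\sigma^{\beta-1}}\right) a(\bar{X}_T - \mu)\right] + \frac{1}{\sigma^{\beta-1}}\, aE[\bar{X}_T - \mu] \tag{15}$$
for $a > 0$. Moreover,
$$aE[\bar{X}_T - \mu] = -\frac{\beta\gamma}{2\sigma^{\beta-1}} + o(1) \tag{16}$$
as $a \to \infty$, by (12). Next, expand $g(y) = 1/y^{1-1/\beta}$ at $y = \sigma^{\beta}$, substitute $y = T/a$ and multiply by $a(\bar{X}_T - \mu)$ to obtain
$$\left(\frac{a^{1-1/\beta}}{T^{1-1/\beta}} - \frac{1}{\sigma^{\beta-1}}\right) a(\bar{X}_T - \mu) = \left(\frac{1}{\beta} - 1\right) T_*^{1/\beta-2}\left(\frac{T}{a} - \sigma^{\beta}\right) a(\bar{X}_T - \mu), \tag{17}$$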
where $T_*$ is a random variable such that $|T_* - \sigma^{\beta}| \le |T/a - \sigma^{\beta}|$. Next, rewrite $T$ in (10) as $T = \inf\{ n \ge m_a : n(V_n/n)^{-\beta/2} > a \}$, where $V_n$ is as in (6), and let
$$U_a^* = T\left(\frac{V_T}{T}\right)^{-\beta/2} - a$$
denote the excess over the stopping boundary. Expanding $h(y) = y^{-\beta/2}$ at $y = \sigma^2$, substituting $y = V_T/T$ and multiplying by $T$ yields
$$T\left(\frac{V_T}{T}\right)^{-\beta/2} = \frac{T}{\sigma^{\beta}} - \frac{\beta}{2\sigma^{\beta+2}}(V_T - T\sigma^2) + \frac{\beta(\beta+2)}{8\lambda_T^{\beta/2+2}} \frac{(V_T - T\sigma^2)^2}{T}$$
for $a > 0$, where $\lambda_T$ is a random variable between $V_T/T$ and $\sigma^2$.
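Furthermore, write $V_T = \sum_{i=1}^{T}(X_i - \mu)^2 - T(\bar{X}_T - \mu)^2$ to obtain
$$U_a^* = \frac{T}{\sigma^{\beta}} - a - \frac{\beta}{2\sigma^{\beta+2}}(W_T - T\sigma^2) + \frac{\beta}{2\sigma^{\beta+2}}\, T(\bar{X}_T - \mu)^2 + \frac{\beta(\beta+2)}{8\lambda_T^{\beta/2+2}} \frac{(V_T - T\sigma^2)^2}{T}$$
for $a > 0$, where $W_T = \sum_{i=1}^{T}(X_i - \mu)^2$. It follows easily that
$$\frac{T}{a} - \sigma^{\beta} = \frac{\sigma^{\beta}}{a}(U_a^* - \xi_T) + \frac{\beta}{2a\sigma^2}(W_T - T\sigma^2) \tag{18}$$
for $a > 0$, where
$$\xi_T = \frac{\beta}{2\sigma^{\beta+2}}\, T(\bar{X}_T - \mu)^2 + \frac{\beta(\beta+2)}{8\lambda_T^{\beta/2+2}} \frac{(V_T - T\sigma^2)^2}{T}.$$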
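Substituting (18) in (17) yields
$$\begin{aligned} \left(\frac{a^{1-1/\beta}}{T^{1-1/\beta}} - \frac{1}{\sigma^{\beta-1}}\right) a(\bar{X}_T - \mu) &= \left(\frac{1}{\beta} - 1\right)\sigma^{\beta}\, T_*^{1/\beta-2}(U_a^* - \xi_T)(\bar{X}_T - \mu) \\ &\quad + \left(\frac{1}{\beta} - 1\right)\frac{\beta}{2\sigma^2}\, T_*^{1/\beta-2}(W_T - T\sigma^2)(\bar{X}_T - \mu) \\ &= \left(\frac{1}{\beta} - 1\right)\sigma^{\beta} I_1(a) + \frac{1-\beta}{2\sigma^2} I_2(a), \end{aligned} \tag{19}$$
say. Let $S_n = X_1 + \cdots + X_n$, $n \ge 1$. Then,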
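$$\begin{aligned} E[|I_1(a)|] &= E\left[\left| T_*^{1/\beta-2}(U_a^* - \xi_T)\frac{S_T - \mu T}{T} \right|\right] \\ &= \frac{\sigma^{\beta/2}}{\sqrt{a}}\, E\left[\left| (U_a^* - \xi_T)\, T_*^{1/\beta-2}\, \frac{a}{T}\, \frac{S_T - \mu T}{\sqrt{a\sigma^{\beta}}} \right|\right] \\ &\le \frac{\sigma^{\beta/2}}{\sqrt{a}} \sqrt{E[(U_a^* - \xi_T)^2]}\, \sqrt{E\left[ T_*^{2/\beta-4}\left(\frac{a}{T}\right)^2\left(\frac{S_T - \mu T}{\sqrt{a\sigma^{\beta}}}\right)^2 \right]} \\ &\le \frac{1}{\sqrt{a}} \sqrt{2\sigma^{\beta} E[(U_a^*)^2] + 2\sigma^{\beta} E[\xi_T^2]}\, \sqrt{E\left[ T_*^{2/\beta-4}\left(\frac{a}{T}\right)^2\left(\frac{S_T - \mu T}{\sqrt{a\sigma^{\beta}}}\right)^2 \right]} \to 0 \end{aligned} \tag{20}$$
as $a \to \infty$, by Hölder's inequality, the fact that $T_* \to \sigma^{\beta}$ w.p.1 ($|T_* - \sigma^{\beta}| \le |T/a - \sigma^{\beta}| \to 0$ w.p.1 since $T/a \to \sigma^{\beta}$, as in the first assertion of Lemma 2.2), the fact that $(S_T - \mu T)/\sqrt{a\sigma^{\beta}}$ converges in distribution to a Standard Normal random variable by Anscombe's theorem, the facts that $E[(U_a^*)^2] \to E[U^2] < \infty$ and $E[\xi_T^2] = O(1)$ as $a \to \infty$, and (2.3), (2.8) and (2.9) of Martinsek (1983).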
To evaluate $E[I_2(a)]$, observe that
$$\begin{aligned} I_2(a) &= 2\sigma^{\beta}\, \frac{a}{T}\, T_*^{1/\beta-2}\left(\frac{W_T - \sigma^2 T}{\sqrt{a\sigma^{\beta}}} + \frac{S_T - \mu T}{\sqrt{a\sigma^{\beta}}}\right)^2 - 2\sigma^{\beta}\, \frac{a}{T}\, T_*^{1/\beta-2}\left(\frac{W_T - \sigma^2 T}{\sqrt{a\sigma^{\beta}}}\right)^2 - 2\sigma^{\beta}\, \frac{a}{T}\, T_*^{1/\beta-2}\left(\frac{S_T - \mu T}{\sqrt{a\sigma^{\beta}}}\right)^2 \\ &\to 2\sigma^{1-2\beta}(2Z)^2 - 2\sigma^{1-2\beta}Z^2 - 2\sigma^{1-2\beta}Z^2 = 4\sigma^{1-2\beta}Z^2 \end{aligned} \tag{21}$$
in distribution as $a \to \infty$, by Anscombe's theorem and the fact that $T_* \to \sigma^{\beta}$ w.p.1 as $a \to \infty$, where $Z$ is a random variable having the Standard Normal distribution. Thus,
$$E[I_2(a)] = 4\sigma^{1-2\beta} + o(1) \tag{22}$$
as $a \to \infty$, by (21) and (2.3) and (2.4) of Martinsek (1983).
Taking expectation in (19) and using (20) and (22) yields
$$E\left[\left(\frac{a^{1-1/\beta}}{T^{1-1/\beta}} - \frac{1}{\sigma^{\beta-1}}\right) a(\bar{X}_T - \mu)\right] = \frac{2(1-\beta)}{\sigma^{2\beta+1}} + o(1) \tag{23}$$
as $a \to \infty$. The lemma follows by taking the limit, as $a \to \infty$, in (15) and using (23) and (16).
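Theorem 3.4: Let $T$ be defined by (10) with $m_a$ being such that $\delta\sqrt{a} \le m_a = o(a)$ as $a \to \infty$ for some $\delta > 0$ and $\beta > 1$. Then, for any probability density function $f_\theta(x) \in \mathcal{C}$ with respect to which $E[|X_1|^{8+p}] < \infty$ for some $p > 0$,
$$r_a^*(T, \mu_T^*) = 3\beta - \frac{\beta^2}{4} + \left(\frac{\beta^2}{4} - \beta\right)\kappa + (\beta^2 + \beta)\gamma^2 + \frac{2\beta(\beta-1)\gamma}{\sigma^3} - \frac{\beta^2\gamma^2}{2} + \frac{\gamma^2}{4} + o(1)$$
as $a \to \infty$.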
Proof: The theorem follows by taking the limit, as $a \to \infty$, in (14) and using (11), Lemma 3.3 and the fact that
$$E\left[\frac{a^{2-2/\beta}}{T^{2-2/\beta}}\right] = \frac{1}{\sigma^{2\beta-2}} + o(1)$$
as $a \to \infty$ if $\beta > 1$, by the fact that $T/a \to \sigma^{\beta}$ w.p.1 as $a \to \infty$ (see the first assertion of Lemma 2.2) and (2.2) of Martinsek (1983).
4 Tables
The tables below show the values of $\rho$ and $\rho^*$ for certain skewed distributions, where $\rho^* = \rho - \gamma/2$ is the asymptotic regret incurred by the procedure $(t, \hat{\mu}_t)$ and $\rho = 2.75 - 0.75\kappa + 2\gamma^2$ represents the asymptotic regret incurred by the procedure $(t, \bar{X}_t)$.
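Table 1: GAMMA($\alpha$, $\theta$) with known $\alpha$
| $\gamma$ | $\kappa$ | $\rho$ | $\rho^*$ |
| $2/\sqrt{\alpha}$ | $6/\alpha$ | $2.75 + 3.5/\alpha$ | $2.75 + 3.5/\alpha - 1/\sqrt{\alpha}$ |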
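Table 2: PARETO(5, $\theta$)
| $\gamma$ | $\kappa$ | $\rho$ | $\rho^*$ |
| $\dfrac{2(1+\alpha)}{\alpha-3}\sqrt{\dfrac{\alpha-2}{\alpha}} = 4.6476$ | $\dfrac{6(\alpha^3 + \alpha^2 - 6\alpha - 2)}{\alpha(\alpha-3)(\alpha-4)} + 3 = 73.8$ | -9.4 | -11.7238 |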
Table 3: RAYLEIGH($\theta$)
| $\gamma$ | $\kappa$ | $\rho$ | $\rho^*$ |
| $\dfrac{2\sqrt{\pi}(\pi-3)}{(4-\pi)^{3/2}} = 0.6311$ | $3 - \dfrac{6\pi^2 - 24\pi + 16}{(4-\pi)^2} = 3.2451$ | 1.11245 | 0.7969 |
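Table 4: SKEW-UNIFORM($\lambda$, $\theta$) with $\lambda = -1.4$ and $\lambda = 1.35$
$\gamma = \dfrac{2\lambda(5\lambda^2 - 9)}{5(3-\lambda^2)^{3/2}}$, with $\gamma < 0$ if $\lambda \in \left(-\sqrt{3}, -\dfrac{3}{\sqrt{5}}\right) \cup \left(0, \dfrac{3}{\sqrt{5}}\right)$ and $\gamma > 0$ if $-\dfrac{3}{\sqrt{5}} < \lambda < 0$
$\kappa = \dfrac{2\lambda(5\lambda^2 - 9)}{5(3-\lambda^2)^{3/2}}$, with $\kappa > 0$ if $-\sqrt{3} < \lambda < \sqrt{3}$
| $\lambda$ | $\rho$ | $\rho^*$ |
| -1.4 | -29.9109 | -29.6997 |
| 1.35 | -0.7671 | -0.7909 |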
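The entries of Tables 2 and 3 can be reproduced from the two formulas stated at the start of this section. The sketch below is an illustrative check only: the function names `rho` and `rho_star` are hypothetical, and the inputs in the usage note are the rounded $\gamma$ and $\kappa$ values printed in the tables, so agreement is expected only to about three decimal places.

```python
def rho(gamma: float, kappa: float) -> float:
    """Asymptotic regret of (t, sample mean): 2.75 - 0.75*kappa + 2*gamma^2."""
    return 2.75 - 0.75 * kappa + 2.0 * gamma ** 2


def rho_star(gamma: float, kappa: float) -> float:
    """Asymptotic regret of the bias-corrected procedure as tabulated: rho - gamma/2."""
    return rho(gamma, kappa) - gamma / 2.0
```

With $\gamma$ = 4.6476 and $\kappa$ = 73.8 (Table 2) these return approximately -9.4 and -11.72; with $\gamma$ = 0.6311 and $\kappa$ = 3.2451 (Table 3) they return approximately 1.11 and 0.80. In the Normal case ($\gamma$ = 0, $\kappa$ = 3) `rho` returns 0.5, matching the value attributed to Woodroofe (1977) in Section 1.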
5 Conclusion
We have proposed a bias-corrected estimator of the mean of a class of skewed probability density functions and provided a second-order asymptotic expansion for the regret under the squared error loss. The results indicate that the proposed procedure performs better than the best fixed-sample-size procedure when the observations are taken from the Gamma, Pareto, Rayleigh or Skew-uniform distribution. For a more general loss function, we have proposed a bias-corrected estimator of the mean and provided a second-order asymptotic expansion for the incurred regret.
References
[1] F. Anscombe, Large sample theory of sequential estimation, Proceedings
Cambridge Philos. Soc., 48, (1952), 600 – 607.
[2] C. Chang, A.K. Gupta and W.J. Huang, Some skew-symmetric models,
Random Operators and Stochastic Equations, 10, (2002), 133 – 140.
[3] I.S. Chang and C.A. Hsiung, Approximations to the expected sample size of
certain sequential procedures, Proceedings of the Conference on Recent
Developments in Statistical Methods and Applications, Taipei, December
1979, Inst. Of Math. Acad. Sinica, (1979), 71 – 82.
[4] Y.S. Chow and K.F.Yu, The performance of a sequential procedure for the
estimation of the mean, Annals of Statistics, 9, (1981), 189 – 198.
[5] M. Ghosh and N. Mukhopadhyay, Sequential point estimation of the mean
when the distribution is unspecified, Communications in Statistics - Theory
& Methods, A8, (1979), 637 – 652.
[6] A.T. Martinsek, Second order approximation to the risk of a sequential
procedure, Annals of Statistics, 11, (1983), 827 – 836.
[7] A.T. Martinsek, Negative regret, optional stopping and the elimination of
outliers, J. A. S. A., 83, (1988), 160 – 163.
[8] H. Robbins, Sequential estimation of the mean of a normal population,
Probability and Statistics, The Harold Cramer Volume, (1959), 235 – 245.
[9] N. Starr and M. Woodroofe, Remarks on sequential point estimation.
Proceedings Nat. Acad. Sci., 63, (1969), 285 – 288.
[10] M. Tahir, An asymptotic lower bound for the local minimax regret in
sequential point estimation, Annals of Statistics, 17, (1989), 1335 – 1346.
[11] Y. Vardi, Asymptotic optimal sequential estimation: the Poisson case. Annals
of Statistics, 7, (1979), 1040 – 1051.
[12] M. Woodroofe, Second order approximations for sequential point and
interval estimation, Annals of Statistics, 5, (1977), 984 – 995.
[13] M. Woodroofe, Non Linear Renewal Theory in Sequential Analysis, Society
for Industrial and Applied Mathematics, Philadelphia, 1982.