Application of new model-based and model-assisted methods for estimating the

advertisement
Application of new model-based and
model-assisted methods for estimating the
finite population mean of the IBEX’35 stock
market data
M. Rueda1 , I. Sánchez-Borrego1 , S. González2 , J.F. Muñoz3 , and S. Martı́nez4
1
2
3
4
Department of Statistics and Operational Research. University of Granada.
Campus de Fuentenueva s/n. 18071. Granada. España.
mrueda@ugr.es,ismasb@ugr.es
Department of Statistics and Operational Research. University of Jaén.
Department of Cuantitative Methods for the Economy and the Business.
University of Granada.
Department of Statistics and Applied Mathematics. University of Almerı́a.
Summary. In this paper new model-assisted and model-based estimators of the
finite population mean, based on local polynomial regression are applied to the
IBEX’35 Spanish stock market data from april 2004 to october 2005. This data set
has one discontinuity point in september 2004. We adapt local linear kernel regression
to discontinuities and propose two jump-preserving model-based and model-assisted
estimators. An empirical comparison between the methods is performed.
1 Introduction
Classical sampling estimators are usually constructed based only on the design,
making no assumption about the nature of the population under study. In many
sample surveys auxiliary information on the finite population is available, and this
could improve the precision of the estimators, compared to a purely design-based
estimator. To incorporate this information consider a superpopulation model for y
built on an unknown function of x:
Yj = m(xj ) + ej ,
j = 1, ..., N.
(1)
The ej , j = 1, . . . , N are independent and identically distributed with E(ej ) = 0
and V ar(ej ) = σ 2 for j = 1, ..., N .
Traditionally, methods that utilize such models are the parametric ones, in which
the working model (1) describing the relationship between the auxiliary variable x
and the study variable y is assumed. The regression function is selected from a particular class of functions, sometimes based in a prior knowledge of the relationship
1698
M. Rueda, I. Sánchez-Borrego, S. González, J.F. Muñoz, and S. Martı́nez
of the variables under study. When there is no such knowledge nonparametric methods are more appropriate, because only smoothing assumptions over the regression
function are made.
Kuo [Kuo88] adopted a nonparametric model-based approach, which does not
place any restrictions on the relationship between the variables under study. Other
important works in this area are given by Chambers et al. [Cha93], Dorfman [Dor92]
and Dorfman and Hall [DH93].
Under the model-assisted approach Breidt and Opsomer [BO00] relies on local
polynomial kernel regression for the estimation of the unknown regression function
m(·). Most traditional nonparametric regression methods work under the assumption that the regression function is smooth. However, there are plenty of examples
available and real-life data problems where the regression function is smooth but in a
finite number of points. For example, the economical impact in a region of a natural
disaster such as earthquake. A well-known application is the classical problem of the
annual volume of the river Nile, studied by Cobb [Cob78]. More applications can be
found in medicine, economics, quality control and so on.
Generally, because of the assumptions made over the regression function, local
polynomial kernel regression provide smooth estimations of the discontinuous regression functions, so that it will cause biased regression estimates. Therefore these
smoothers need to be adapted if discontinuities are present.
New model-based and model-assisted estimators are proposed, as well as their
corresponding jump-preserving counterparts.
The latter are the result of combining local polynomial regression, introduced
by Ruppert and Wand [RW94] and Fan and Gijbels [FG96], among others, together
with Wu and Chu’s [WC93] method of the projected observations. The projected
observations is a method of reusing the available data to increase the number of
observations in the jump point location region and contributing to improve the
estimation of the discontinuous regression function.
The methods are simple to implement and are based on local polynomial smoothing, that has many desirable properties at the nonparametric regression context,
including design adaptation, consistency and asymptotic unbiesedness.
The paper is organized as follows: Section 2 provides a brief description of the
proposed methods for estimating the population mean. Section 3 contains the application to the IBEX’35 Spanish stock market data.
2 Proposed methods
Consider U = {1, . . . , N } a population of N units from which a sample s of size n is
selected. The chosen sampling design determines inclusion probabilities πj . Let yj
be the value of the study variable y, and xj the corresponding value of an auxiliary
variable x.
The Horvitz-Thompson estimators based on sample s:
ȳHT =
1 X yj
,
N j∈s πj
(2)
is a design-unbiased estimator of Y which does not use the information of auxiliary
variable x.
Application of new model-based and new model-assisted methods
1699
It will be assumed in the following the model:
Yj = m(xj ) + ej ,
j = 1, . . . , N,
(3)
where ej are independent and identically distributed with E( ej ) = 0, and constant
variance σ 2 . The unknown regression function m is defined without loss of generality
on the interval [0, 1] (it can be extended to the whole R), and is smooth but in a
finite and unknown number q of jump points tk (k = 1, . . . , q).
Under the model-assisted approach, Breidt and Opsomer’s [BO00] first use the
local polynomial kernel regression in the survey sampling setting. We consider this
method for estimating the finite population mean as follows
Yb
MA
X yj − m
bj
1
=
N
πj
j∈s
+
X
!
m
bj
(4)
,
j∈U
where m
b i is the local linear kernel estimator (Ruppert and Wand [RW94] and Fan
and Gijbels [FG96]).
If discontinuities are present an adaptation of the nonparametric regression estimator is required. As a previous step, we need estimation of the jump points and
in a second step, the regression function is estimated using these jumps points. We
estimate jump points following Wu and Chu’s [WC93] method but relying on local
linear kernel smoothing instead of the traditional Nadaraya-Watson kernel estimator
(Nadaraya [?]).
To estimate the jump points we consider the estimator:
P
m(x)
b
=
{i:xi ∈[−1,2]}
P
Kh (x − xi ){sn,2 − (x − xi )sn,1 }yiP
{i:xi ∈[−1,2]}
sn,2 sn,0 − (sn,1 )2
,
(5)
where K is a kernel function, x is the point of estimation, x ∈ [0, 1] and xi is
the pseudo-point, location of the projected observation yiP introduced by Wu and
Chu [WC93]. The functions sn,r are given by
X
sn,r =
Kh (x − xi )(x − xi )r .
(6)
{i:xi ∈[−1,2]}
The jump points are estimated through differences of estimators of the type (5).
Let s ⊂ U and yi , i = 1, . . . , n, the projected observations for each [b
tk−1 , b
tk ] for
k = 1, . . . , q + 1 are given by
yipk = y2−i + 2m
b gL (xj )(xi − b
tk )
i = 2 − n, 2 − n + 1, . . . , 0,
b gR (xj )(xi − b
tk−1 )
yipk = y2n−i + 2m
(7)
i = n + 1, n + 2, . . . , 2n − 1,
and yipk = yi for i = 1, . . . , n. m
b gL and m
b gR are two kernel smoothers which involve
different kernel functions and g is a pilot bandwidth, whose choice is motivated by
Wu and Chu [WC93] under practical considerations.
Finally, for each k = 1, . . . , q + 1 and xj ∈ [b
tk−1 , b
tk ], m
b j estimates the values m(xj )
m
bj =
q+1
X
X
k=1 {i:xi ∈[2tbk−1 −tbk ,2tbk −tbk−1 ]}
Kh (xj − xi ){sn,2 − (xj − xi )sn,1 }yipk
,
sn,2 sn,0 − (sn,1 )2
(8)
1700
M. Rueda, I. Sánchez-Borrego, S. González, J.F. Muñoz, and S. Martı́nez
where yipk are the projected data obtained from the original observations yi at the
design points xi ∈ [b
tk−1 , b
tk ]. The sn,r are given by:
X
sn,r =
Kh (xj − xi )(xj − xi )r .
(9)
{i:xi ∈[2tbk−1 −tbk ,2tbk −tbk−1 ]}
Finally, the finite mean population is estimated by the model-assisted estimator,
b
y
JP MA
1
=
N
X yj − m
bj
πj
j∈s
+
X
!
m
bj
.
(10)
j∈U
A different approach to estimate the population mean is given by model-based estimators, which only predict the non-sampled values. We propose an alternative
model-based estimator for estimating the population mean based on a local linear kernel smoother, whose good properties are well-known at the nonparametric
regression context. The proposed estimator is given by
Yb
MB
X
1
=
N
Yj +
j∈s
X
!
m
bj
(11)
,
j∈U −s
being m
b j the local linear kernel smoother (Ruppert and Wand [RW94] and Fan and
Gijbels [FG96]).
Similarly, we propose the following model-based estimator
Yb J P M B =
1
N
X
j∈s
Yj +
X
!
m
bj
,
(12)
j∈U −s
b j the adapted version to discontinuities of the local linear kernel smoother
being m
given in (8).
3 Application to the IBEX’35 data set
We consider the population data set of the Spanish stock market IBEX’35, taken
from april 2004 to october 2005. It contains 401 units. Plot of this data set is given
in Figure 1.
Müller and Stadtmüller [?] introduced a nonparametric regression method to determine the number of jump points of the discontinuous regression function. That
method fixed that number for this data set in q = 1. This discontinuity point is
located at the end of september 2004.
We compare the performance of model-assisted estimator (MA), jump-preserving
estimator (JPMA), model-assisted estimator (MB), jump-preserving estimator
(JPMB), regression estimator (REG) (Singh [Sin03]) and the Horvitz-Thompson
estimator (HT) applied to the IBEX’35 data set.
A bandwidth is considered h = 0.1 for the nonparametric regression estimators.
To estimate the jump points of the discontinuous regression function, an equally
spaced grid of 402 points is considered. Samples are generated by simple random
Application of new model-based and new model-assisted methods
1701
IBEX’35
11100
10300
9500
8700
7900
7100
april 2004
oct 2004
april 2005
oct 2005
Fig. 1. Scatter plot of IBEX’35 population
sampling using sample sizes n = 50, n = 75 and n = 100. We perform 500 replications.
For each estimator we compute the relative absolute bias (RAB)
b =
RAB(θ)
R X
b i) − Y
θ(s
Y
i=1
,
(13)
where R is the number of replications and θb is the finite population mean estimator
considered. The relative efficiency respect to the Horvitz-Thompson estimator is
given by
PR
i=1
b = P
RE(θ)
R
i=1
b i) − Y
θ(s
2
yHT (si ) − Y
2 .
(14)
The calculations and all the estimators were obtained using the R program. Programming details are available from the authors.
Table 1 and Table 2 show the RE and RAB values for the IBEX’35 data set, and
Figure 2 reports line plots for the estimators at each sample size.
As sample size grows the bias of the estimators decreases. The application to
IBEX’35 data set shows the improvement provided by the jump-preserving methods
relative to their non jump-preserving counterparts. Moreover, better results in terms
of RE and RAB values are given by the model-assisted methods compared to the
model-based estimators.
1702
M. Rueda, I. Sánchez-Borrego, S. González, J.F. Muñoz, and S. Martı́nez
Table 1. Relative efficiency (RE) of IBEX’35 stock market data with bandwidth
h = 0.1
n
HT MA
50 1
75 1
100 1
JPMA MB
JPMB REG
0.04021 0.03465 0.16749 0.12868 0.13266
0.01968 0.01722 0.02721 0.01917 0.10599
0.01647 0.01341 0.02545 0.01733 0.10371
Table 2. Relative Absolute Bias (RAB) of IBEX’35 stock market data with bandwidth h = 0.1
n
HT
MA
JPMA MB
JPMB REG
50 0.01020 0.00106 0.00101 0.00289 0.00242 0.00372
75 0.00818 0.00061 0.00055 0.00087 0.00063 0.00269
100 0.00743 0.00047 0.00046 0.00077 0.00059 0.00237
References
[BO00]
[Cob78]
[Cha93]
[Dor92]
[DH93]
[FG96]
[Kuo88]
[RW94]
[Sin03]
[WC93]
Breidt, F.J., Opsomer, J.D. (2000) Local polynomial regression estimators
in survey sampling. The Annals of Statistics, 28(4), 1026–1053
Cobb, G.W. The problem of the Nile: conditional solution to a changepoint problem, Biometrika, 62, pp. 243–251 (1978)
Chambers et al. Bias robust estimation in ¯nite population using nonparametric calibration, Journal of American Statistical Association, 88,
268–277 (1993)
Dorfman, A.H. Nonparametric regression for estimating totals in ¯nite
populations, Proceedings of the Section on Survey Research Methods,
622–625. American Statistical Association, Alexandria, VA. (1992)
Dorfman, A.H. and Hall, P. Estimators of the ¯nite population distribution function using nonparametric regression. Annals of Statistics, 21
1452–1475 (1993)
Fan, J. and Gijbels, I. Local Polynomial Modelling and Its Applications.
Ed. Chapman and Hall (1996)
[MS99]MullMüller, H. G. and StadtmÄuller, U. Discontinuous versus
smooth regres- sion. Annals of Statistics, 27(1) 299–337 (1999)
[Nad64]NAdar Nadaraya, E.A. On estimating regression. Theory Probab.
Applic., 15, 134–137 (1964)
Kuo, L. (1988) Classical and Prediction Approaches to Estimating Distribution Functions from Survey Data. Proceeding of the Section on Survey
Researh Methods. American Statistical Association, 280–285
Ruppert, D. and Wand, M.P. Multivariate locally weighted least squares
regression. The Annals of Statistics, 22(3), 1346–1370 (1994)
Singh, S. Advanced sampling theory with applications: How Michael
”selected” Amy. Kluwer Academic Publisher. The Netherlands, 1–1247
(2003)
Wu, J.S. and Chu, C.K. Nonparametric function estimation and bandwidth selection for discontinuous regression functions, Statistica Sinica,
3, 557–576 (1993)
Application of new model-based and new model-assisted methods
RE
1703
RAB
0.15
0.003
0.10
0.002
0.05
0.001
0.00
0.000
50
75
50
100
75
100
n
n
MA
JPMA
MB
JPMB
REG
Fig. 2. (RE) and (RAB) for the model-assisted estimator (MA), jump-preserving
model-assisted (JPMA), model-based (MB), jump-preserving model-based (JPMB)
and the classical regression estimator (REG) and for bandwidth h = 0.1 and sample
sizes n = 50, n = 75 and n = 100
Download