Online Appendix
An introduction to the method of varying coefficient analysis with
non-parametric estimation
The varying coefficient method was first developed by Hastie and Tibshirani (1993).
It has been widely used in economics and biology, although it is not yet well known
in ecological research. The varying coefficient method with non-parametric estimation
does not pre-specify the functional forms of the variables involved, but uses
information from the data alone to estimate a curve or function. Such a method is
more flexible and robust in fitting data than a parametric method, because a
pre-assumed parametric structure carries the risk of a misspecified model and wrong
conclusions.
In a varying coefficient model, the factor that may cause the correlation coefficient
of species' quantitative characteristics to vary is described by a variable Z. A
regression model with functional coefficients can then be expressed as
$$Y = \alpha_0(Z) + \alpha_1(Z) X + \varepsilon, \qquad (2)$$
where $\varepsilon$ is a random error with mean zero that is independent of Z, and
$\alpha_0(Z)$ and $\alpha_1(Z)$ are the intercept and slope, respectively, both of
which are functions of Z. In model (2), if $\alpha_1(Z)$ does not depend on the third
variable Z (i.e. the slope is constant), the model reduces to the classical linear
regression model (1). If the effect of Z on the quantitative characteristics of the
interacting species is not constant, the model is called a varying coefficient
regression model (Hastie & Tibshirani 1993). Let $\rho_{XY|Z}$ denote the conditional
correlation coefficient of X and Y given Z. Then it is easy to see that
$$\alpha_1(Z) = \rho_{XY|Z} \left( \frac{\sigma_{Y|Z}}{\sigma_{X|Z}} \right), \qquad (3)$$
where $\sigma^2_{X|Z}$ and $\sigma^2_{Y|Z}$ are the conditional variances of X and Y
given Z, respectively. If $\sigma_{Y|Z} / \sigma_{X|Z}$ does not depend significantly
on Z, the correlation coefficient $\rho_{XY|Z}$, as a function of Z, is equivalent to
$\alpha_1(Z)$ up to a constant factor.
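Equation (3) follows in one step from model (2). The short derivation below is a sketch that assumes, in addition to what is stated above, that the error $\varepsilon$ is uncorrelated with X given Z; this assumption is not spelled out in the text.

```latex
\begin{align*}
\operatorname{Cov}(X, Y \mid Z)
  &= \operatorname{Cov}\bigl(X,\ \alpha_0(Z) + \alpha_1(Z) X + \varepsilon \mid Z\bigr)
   = \alpha_1(Z)\,\sigma^2_{X|Z}, \\
\alpha_1(Z)
  &= \frac{\operatorname{Cov}(X, Y \mid Z)}{\sigma^2_{X|Z}}
   = \frac{\rho_{XY|Z}\,\sigma_{X|Z}\,\sigma_{Y|Z}}{\sigma^2_{X|Z}}
   = \rho_{XY|Z}\,\frac{\sigma_{Y|Z}}{\sigma_{X|Z}} .
\end{align*}
```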
In model (2), the functional forms of $\alpha_0(Z)$ and $\alpha_1(Z)$ are not
pre-specified (they are unknown), and a general non-parametric smoothing technique
can be used to estimate them. The advantage of non-parametric estimation is that no
functional form needs to be pre-specified; the estimated curve or function is
obtained from the data alone. Such a model is more flexible and robust in fitting
data than parametric methods (Epanechnikov 1969), because a pre-assumed parametric
structure carries the risk of a misspecified model and wrong conclusions.
Let $\alpha(Z) = (\alpha_0(Z), \alpha_1(Z))'$ and $U = (1, X)'$, where the prime
denotes the transpose of a vector. When the matrix $E(UU' \mid Z)$ is invertible, a
least-squares-type estimator of $\alpha(Z)$ is uniquely defined by
$$\hat{\alpha}(Z) = E(UU' \mid Z)^{-1} E(UY \mid Z). \qquad (4)$$
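One way to see where (4) comes from is to minimise the conditional mean squared error over a coefficient vector. The sketch below uses a generic vector $a$ (a symbol introduced here for illustration, not taken from the text) and assumes that the expectations exist.

```latex
\begin{align*}
\alpha(Z) &= \arg\min_{a \in \mathbb{R}^2} E\bigl[(Y - U'a)^2 \mid Z\bigr], \\
0 &= -2\,E\bigl[U\,(Y - U'a) \mid Z\bigr]
  \;\Longrightarrow\; E(UU' \mid Z)\,a = E(UY \mid Z), \\
a &= E(UU' \mid Z)^{-1}\,E(UY \mid Z).
\end{align*}
```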
The conditional expectations above can be estimated by a non-parametric smoothing
method such as kernel estimation (see Epanechnikov 1969). Let
$\{(X_i, Y_i, Z_i),\ i = 1, 2, \ldots, n\}$ be a sample of observations. Then
$$E(UU' \mid Z) \approx \frac{1}{n} \sum_{i=1}^{n} w_{ni}(Z)\, U_i U_i', \qquad E(UY \mid Z) \approx \frac{1}{n} \sum_{i=1}^{n} w_{ni}(Z)\, U_i Y_i, \qquad (5)$$
where $U_i = (1, X_i)'$, $w_{ni}(Z) = K_h(Z - Z_i) / f_h(Z)$,
$f_h(Z) = n^{-1} \sum_{i=1}^{n} K_h(Z - Z_i)$ and $K_h(u) = h^{-1} K(u/h)$. The kernel
function $K(u)$ is a continuous, bounded and symmetric real function that integrates
to one (i.e. $\int K(u)\,du = 1$). The scalar h is called the bandwidth. A variety of
kernel functions are possible. Commonly used kernels include the Epanechnikov kernel
(Epanechnikov 1969), $K(u) = 0.75\,(1 - u^2)\, I(|u| \le 1)$, and the quartic
(biweight) kernel, $K(u) = \frac{15}{16}(1 - u^2)^2\, I(|u| \le 1)$ (see Haerdle 1994),
where $I(|u| \le 1)$ is an indicator function taking the value 1 when $|u| \le 1$ and
0 otherwise. The weight function $w_{ni}(Z)$ is therefore zero whenever
$|Z - Z_i| > h$. It has been shown that a large h leads to large bias and small
variance of the estimator (over-smoothing), whereas a small h leads to small bias and
large variance (under-smoothing). An appropriate bandwidth h can be selected by
leave-one-out cross-validation (Haerdle 1994). In applications one can also choose h
subjectively, to over-smooth or under-smooth the curve depending on the purpose
(Haerdle 1994, Ch. 5).
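The estimator in (4)-(5) and the leave-one-out idea can be sketched in a few lines of Matlab. This is an illustrative sketch, not the authors' implementation: the toy data, the variable names (epan, est, hGrid, cv, hBest) and the bandwidth grid are all assumptions made for the example, and the Epanechnikov kernel is used throughout.

```matlab
% Sketch only: kernel estimate of alpha(z) from equations (4)-(5), followed by
% a leave-one-out cross-validation score for a few candidate bandwidths.
% The toy data, the variable names and the bandwidth grid are assumptions.
rng(1);                                            % reproducible toy data
n = 300;
z = 8 * rand(n, 1);                                % Z uniform on [0, 8]
x = randn(n, 1);
y = 1 + (0.5 * z - 2) .* x + 0.2 * randn(n, 1);    % model (2), alpha_0 = 1
U = [ones(n, 1), x];                               % rows are U_i'
epan = @(u) 0.75 * (1 - u.^2) .* (abs(u) <= 1);    % Epanechnikov kernel K(u)

% Part 1: estimate alpha(z0) at a few evaluation points with h = 0.5
h  = 0.5;
z0 = (1:7)';
est = zeros(numel(z0), 2);
for k = 1:numel(z0)
    Kh  = epan((z0(k) - z) / h) / h;               % K_h(z0 - Z_i)
    w   = Kh / mean(Kh);                           % w_ni(z0) = K_h / f_h
    EUU = (U' * (U .* [w, w])) / n;                % (1/n) sum w_i U_i U_i'
    EUY = (U' * (w .* y)) / n;                     % (1/n) sum w_i U_i Y_i
    est(k, :) = (EUU \ EUY)';                      % equation (4)
end
disp([z0, est(:, 2), 0.5 * z0 - 2]);               % z0, estimated and true alpha_1

% Part 2: leave-one-out cross-validation over a small bandwidth grid
hGrid = [0.5 1 2 4];
cv = zeros(size(hGrid));
for g = 1:numel(hGrid)
    for i = 1:n
        w = epan((z(i) - z) / hGrid(g));
        w(i) = 0;                                  % leave observation i out
        a = (U' * (U .* [w, w])) \ (U' * (w .* y)); % normalising constants cancel
        cv(g) = cv(g) + (y(i) - U(i, :) * a)^2 / n; % squared prediction error
    end
end
[~, best] = min(cv);
hBest = hGrid(best);                               % bandwidth with the smallest CV score
```

Because the same weights appear on both sides of (4), any constant factor in the kernel (0.75, 1/h, 1/f_h) cancels, which is why the cross-validation loop can use the unnormalised kernel values directly.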
Although the fitted curves show how the regression coefficients vary with Z, the
functional forms of the curves are unknown. Whether $\alpha_1(Z)$ takes a given
functional form can be tested with a Nonparametric Monte Carlo Test (NMCT); readers
are referred to Zhu (2005).
In the varying coefficient regression model, the correlation coefficient
$\rho_{XY|Z}$ is well described by the regression coefficient $\alpha_1(Z)$ only when
the ratio $\sigma_{Y|Z} / \sigma_{X|Z}$ is constant. When this condition is not
satisfied, $\alpha_1(Z)$, as a function of Z, is not equivalent to $\rho_{XY|Z}$, and
a non-parametric estimate of $\rho_{XY|Z}$ is needed. This can be seen from the
definition
$$\rho_{XY|Z} = \frac{\mathrm{Cov}(X, Y \mid Z)}{\sqrt{\mathrm{Var}(X \mid Z)\,\mathrm{Var}(Y \mid Z)}} = \frac{\sigma_{XY|Z}}{\sigma_{X|Z}\,\sigma_{Y|Z}}. \qquad (6)$$
The conditional covariance of X and Y, and the conditional variances of X and of Y
given Z, can be computed by kernel estimation (Haerdle 1994). For example, in the
same notation, we have
$$\hat{\sigma}_{XY|Z} = \frac{1}{n} \sum_{i=1}^{n} w_{ni}(Z)\,\bigl(X_i - \hat{E}(X \mid Z)\bigr)\bigl(Y_i - \hat{E}(Y \mid Z)\bigr),$$
$$\hat{\sigma}^2_{X|Z} = \frac{1}{n} \sum_{i=1}^{n} w_{ni}(Z)\,\bigl(X_i - \hat{E}(X \mid Z)\bigr)^2,$$
$$\hat{\sigma}^2_{Y|Z} = \frac{1}{n} \sum_{i=1}^{n} w_{ni}(Z)\,\bigl(Y_i - \hat{E}(Y \mid Z)\bigr)^2,$$
where $\hat{E}(X \mid Z) = \frac{1}{n} \sum_{i=1}^{n} w_{ni}(Z) X_i$ and
$\hat{E}(Y \mid Z) = \frac{1}{n} \sum_{i=1}^{n} w_{ni}(Z) Y_i$. Even for such a complex
structure we can, as in the linear model, still use the correlation coefficient
$|\rho_{XY|Z}|$ to describe the relationship between the variables, although it is
now a function of another factor.
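A small worked example shows why $\rho_{XY|Z}$ can vary with Z even when the slope does not. The set-up is hypothetical and chosen only for illustration: a constant slope of 1, an error with unit variance that is uncorrelated with X given Z, and a conditional standard deviation $s(Z)$ of X that changes with Z.

```latex
\begin{align*}
&Y = \alpha_0 + X + \varepsilon, \quad \alpha_1(Z) \equiv 1, \quad
 \operatorname{Var}(X \mid Z) = s(Z)^2, \quad \operatorname{Var}(\varepsilon) = 1, \quad
 \operatorname{Cov}(X, \varepsilon \mid Z) = 0, \\
&\sigma_{XY|Z} = s(Z)^2, \qquad \sigma^2_{Y|Z} = s(Z)^2 + 1, \\
&\rho_{XY|Z} = \frac{s(Z)^2}{s(Z)\sqrt{s(Z)^2 + 1}} = \frac{s(Z)}{\sqrt{s(Z)^2 + 1}} .
\end{align*}
```

With $s(Z) = Z$, for instance, the correlation is about 0.45 at Z = 0.5 and about 0.89 at Z = 2, while the slope stays equal to 1. This is the situation in which a separate estimate of $\rho_{XY|Z}$ from (6) is informative.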
Simulation Study
The non-parametric estimation of the varying coefficients proposed in this paper has
been programmed in Matlab. The varying coefficients $\alpha_i(z)$ (i = 0, 1) are
estimated by the Matlab function vcm.m, and the correlation coefficient is estimated
by npcc.m. We now conduct a simulation study to examine the performance of the new
method for varying coefficient analysis.
In the simulations we consider different forms of the function $\alpha_1(z)$ and set
$\alpha_0(z)$ to a constant. We use a constant, a linear function and two other
special functions for $\alpha_1(z)$ to simulate data. The data are drawn from the
model $y = \alpha_0(z) + \alpha_1(z) x + \varepsilon$, where z is a random number from
0 to 8, x and $\varepsilon$ are normal random variables (their means and standard
deviations are given in the figure captions), and the sample size is 300. For each
$\alpha_1(z)$ we generate a dataset {(y_i, x_i, z_i): i = 1, ..., n} and then use our
Matlab function vcm.m to calculate the estimates $\hat{\alpha}_1(z)$, with bandwidth
h = 0.5 in all plots. The plots of y versus x and of $\hat{\alpha}_1(z)$ versus z
(together with $\alpha_1(z)$ versus z) are reported in Figures 1 and 2. When
$\alpha_1(z)$ is constant, y and x are linearly related, as shown in Figure 1(a), and
the estimated $\hat{\alpha}_1(z)$ is also approximately constant, as shown in
Figure 1(b). However, when $\alpha_1(z)$ is not a constant function of z but is linear
or of another specified form, the plots of y versus x in Figure 1(c), Figure 2(a) and
Figure 2(c) show that y and x no longer follow a linear relationship. Our method
still provides a good estimator $\hat{\alpha}_1(z)$ of $\alpha_1(z)$, which fits the
given functions very well, as shown in Figure 1(d), Figure 2(b) and Figure 2(d).
The simulation results suggest that when the regression coefficients are functions
of other factors, linear regression with constant coefficients is not appropriate,
whereas varying coefficient analysis works well in identifying the correlation
patterns behind the data.
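One run of such a simulation can be sketched as follows. This is a hedged illustration rather than the code used for the figures: the set-up is read from the caption of Fig. 1(c), the sign of the constant in $\alpha_1(z) = 0.5z - 2$ is inferred and should be treated as an assumption, the kernel weights follow (5) instead of calling vcm.m, and all variable names are illustrative.

```matlab
% Sketch of one simulation run; all variable names are illustrative.
% Assumed set-up (read from the caption of Fig. 1(c)): alpha_0 = 300,
% alpha_1(z) = 0.5 z - 2 (sign inferred), x ~ N(100, 12.5^2),
% error s.d. 0.2, n = 300, z uniform on [0, 8].
rng(2);                                   % reproducible draw
n = 300;
z = 8 * rand(n, 1);
x = 100 + 12.5 * randn(n, 1);
alpha1 = 0.5 * z - 2;                     % true varying slope
y = 300 + alpha1 .* x + 0.2 * randn(n, 1);
U = [ones(n, 1), x];                      % design matrix with rows (1, x_i)

% Constant-coefficient fit: one intercept and one slope for all z
bGlobal = U \ y;

% Kernel-weighted fit at a few values of z (Epanechnikov kernel, h = 0.5)
h  = 0.5;
z0 = (1:7)';
slopeLocal = zeros(size(z0));
for k = 1:numel(z0)
    u = (z0(k) - z) / h;
    w = 0.75 * (1 - u.^2) .* (abs(u) <= 1);          % kernel weights
    a = (U' * (U .* [w, w])) \ (U' * (w .* y));      % equation (4) with weights (5)
    slopeLocal(k) = a(2);
end

fprintf('slope of the constant-coefficient fit: %.2f\n', bGlobal(2));
disp([z0, slopeLocal, 0.5 * z0 - 2]);     % z0, local estimate, true alpha_1(z0)
```

The constant-coefficient fit returns a single slope, which cannot represent a coefficient that changes sign over the range of z, whereas the local estimates can be compared point by point with the true function.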
[Figure 1: four panels. (a) y versus x; (b) $\hat{\alpha}_1(z)$ versus z; (c) y versus x; (d) $\hat{\alpha}_1(z)$ versus z.]
Fig. 1: (a) Plot of y versus x for data simulated from the model
$y = 2 + \alpha_1(z) x + \varepsilon$, where $\alpha_1(z) = 3$, n = 300, x is a normal
random variable with mean 32 and standard deviation 4, z is a random number from 0
to 8, and $\varepsilon$ is a normal random error with zero mean and standard
deviation 1. This model has a constant slope. (b) Plot of $\hat{\alpha}_1(z)$ versus z
(dots) and $\alpha_1(z)$ versus z (line) for the data of (a); the estimated
$\hat{\alpha}_1(z)$ is approximately constant. (c) Plot of y versus x for data
simulated from the model $y = 300 + \alpha_1(z) x + \varepsilon$, where
$\alpha_1(z) = 0.5z - 2$, n = 300, x is a normal random variable with mean 100 and
standard deviation 12.5, z is a random number from 0 to 8, and $\varepsilon$ is a
normal random error with zero mean and standard deviation 0.2. In this model
$\alpha_1(z)$ is a linear function. (d) Plot of $\hat{\alpha}_1(z)$ versus z (dots) and
$\alpha_1(z)$ versus z (line) for the data of (c); the estimated $\hat{\alpha}_1(z)$
approximates the true function $\alpha_1(z)$ very well.
[Figure 2: four panels. (a) y versus x; (b) $\hat{\alpha}_1(z)$ versus z; (c) y versus x; (d) $\hat{\alpha}_1(z)$ versus z.]
Fig. 2: (a) Plot of y versus x for data simulated from the model
$y = 300 + \alpha_1(z) x + \varepsilon$, where
$\alpha_1(z) = \frac{1.5z - 1.9}{1 + z}$, n = 300, x is a normal random variable with
mean 100 and standard deviation 12.5, z is a random number from 0 to 8, and
$\varepsilon$ is a normal random error with zero mean and standard deviation 0.2.
(b) Plot of $\hat{\alpha}_1(z)$ versus z (dots) and $\alpha_1(z)$ versus z (line) for
the data of (a); the estimated $\hat{\alpha}_1(z)$ approximates $\alpha_1(z)$ very
well. (c) Plot of y versus x for data simulated from the model
$y = 250 + \alpha_1(z) x + \varepsilon$, where
$\alpha_1(z) = \frac{0.5 + 2z}{1 + 0.2z + 0.1z^2} - 2$, n = 300, and x, z and
$\varepsilon$ are the same as in (a). (d) Plot of $\hat{\alpha}_1(z)$ versus z (dots)
and $\alpha_1(z)$ versus z (line) for the data of (c); the estimated
$\hat{\alpha}_1(z)$ approximates the true function $\alpha_1(z)$ very well.
Examination of data of the fig and fig wasp
We first examine whether linear regression analysis can describe the interaction
between viable seeds and wasp offspring, using graphical evaluation methods such as
those of Sugihara & May (1990). Fig. 3(a) and 3(b) clearly show a good linear
relationship for the simulated data of Figure 1(a). Fig. 3(c) and 3(d) are for the
real data from the fig/fig wasp mutualism (Ficus racemosa); compared with Fig. 3(a)
and 3(b), they show that a linear regression with constant coefficients is not
appropriate. The p-value of the F-test in the linear regression model for these real
data is 0.501, which is clearly non-significant.
This examination shows that simple or generalized linear regression cannot simply be
used to analyse naturally collected data on figs and fig wasps.
[Figure 3: four panels. (a) y/x versus observation number; (b) y versus x; (c) y/x versus observation number; (d) y versus x.]
Fig. 3. Plots for the artificial data and for the real data of the fig/fig wasp
mutualism. (a) Plot of y/x versus observation number for the simulated data shown in
Figure 1(a); since y/x is approximately constant, a linear regression of y on x with
constant coefficients describes the data well. (b) Plot of $\Delta y$ versus x for the
simulated data, which shows a clear linear relationship. (c) Plot of y/x versus
observation number for the fig/fig wasp data; the ratio is unstable, which indicates
that a linear regression of y on x with constant coefficients is not appropriate.
(d) Plot of $\Delta y$ versus x for the fig/fig wasp data, which shows no clear linear
relationship. In (c) and (d), y = seeds and x = wasp offspring (galls).
Two Matlab functions
function coef=vcm(x,y,z,h)
%------------------------------------------------------------------
% nonparametric estimation of varying coefficients model
% model: y=\alpha_0(z)+\alpha_1(z) x + e
% input variables: x, y and z have the same length, say n.
%                  h is the bandwidth.
% output variable: coef holds the regression coefficient estimators.
%                  coef is an n x 2 matrix. The first column is the
%                  estimator of \alpha_0(z), and the second column is
%                  the estimator of \alpha_1(z).
% A plot of coef(:,1) and coef(:,2) versus z displays the trend
% of the estimators of \alpha_0(z) and \alpha_1(z) as functions of z.
%------------------------------------------------------------------
x=x(:); y=y(:); z=z(:);        % force column vectors
n=length(x);
X0=ones(n,1);
X1=x;
coef=zeros(n,2);               % preallocate the output
for j=1:n
    w1=1-(z-z(j)).^2/h^2;      % Epanechnikov-type weight, up to a constant
    keep=w1>0;                 % observations with |z-z(j)| < h
    w0=w1(keep);
    y0=y(keep);
    XX=[X0(keep) X1(keep)];
    XX=XX.*(w0*ones(1,2));     % scale each row of the design matrix by its weight
    % least squares on the scaled data, i.e. weighted least squares
    % with weights w0.^2
    coef(j,:)=(pinv(XX'*XX)*XX'*(y0.*w0))';
end
function rxy=npcc(x,y,z,h)
%------------------------------------------------------------------
% nonparametric estimation of Pearson correlation coefficients
% input variables: x and y are two variables, and z is a variable,
%                  h is the bandwidth. x, y and z have the same length n.
% output variable: rxy is the correlation coefficient between x and y.
%                  rxy is a vector of length n.
% A plot of rxy versus z displays the relationship of rxy with z.
%------------------------------------------------------------------
n=length(z);
rxy=zeros(1,n);                % preallocate the output
for j=1:n
    w1=1-(z-z(j)).^2/h^2;      % Epanechnikov-type weight, up to a constant
    keep=w1>0;                 % observations with |z-z(j)| < h
    x0=x(keep);
    y0=y(keep);
    w0=w1(keep).^2;            % squared weights: quartic kernel, up to a constant
    y0b=sum(y0.*w0)./sum(w0);  % weighted mean of y
    x0b=sum(x0.*w0)./sum(w0);  % weighted mean of x
    sigmaxy=sum((y0-y0b).*(x0-x0b).*w0)./sum(w0);  % weighted covariance
    sigmax=sum((x0-x0b).^2.*w0)./sum(w0);          % weighted variance of x
    sigmay=sum((y0-y0b).^2.*w0)./sum(w0);          % weighted variance of y
    rxy(j)=sigmaxy/sqrt(sigmax*sigmay);
end
Literature cited
Epanechnikov, V. (1969) Non-parametric estimation of a multivariate probability
density. Theory of Probability and Its Applications, 14, 153-158.
Haerdle, W. (1994) Applied nonparametric regression. Springer, Berlin.
Hastie, T. & Tibshirani, R. (1993) Varying-coefficient models (with discussion).
Journal of the Royal Statistical Society, Series B, 55, 757-796.
Sugihara, G. & May, R.M. (1990) Nonlinear forecasting as a way of distinguishing
chaos from measurement error in time series. Nature, 344, 734-741.
Zhu, L.X. (2005) Nonparametric Monte Carlo tests and their applications. Springer,
New York.