
Review of projects and contributions on statistical methods for spatial disaggregation and for integration of various kinds of geographical information and geo-referenced survey data

Introduction

Part 1 - Data interpolation;

1.1 Point data

1.1.1 exact methods (polynomials, splines, linear triangulation, proximation, distance weighting, finite difference methods)

1.1.2 approximation methods (power-series trend models, Fourier models, distance-weighted least squares, least-squares fitting with splines)

1.2 Areal data

1.2.1 From area to area (areal interpolation with ancillary data, areal interpolation using remote sensing as ancillary data, other areal interpolation methods using ancillary data, map overlay, pycnophylactic methods)

Part 2 - Data integration;

2.1 Introduction

2.2 Combination of data for mapping and area frames

2.3 Statistical analysis of integrated data

2.3.1 Change of support problems (MAUP and ecological fallacy problem)

2.3.2 Geostatistical tools and models to address the COSPs

2.3.2.1 Optimal zoning

2.3.2.2 Modelling with grouping variables, area level models

2.3.2.3 Geographically weighted regression and M-quantile regression

2.3.2.4 Block kriging and Co-kriging

2.3.2.5 Multiscale models

Part 3 - Data fusion (formerly Data aggregation)

3.1 Ad hoc methods

3.2 Statistical data fusion

3.2.1 Spatial statistics

3.2.1.1 Geographically weighted regression

3.2.1.2 Multiscale spatial tree models

3.2.1.3 Bayesian hierarchical models for multiscale processes

3.2.1.4 Geospatial and spatial random effect models

3.3 Image fusion: non statistical approaches

3.3.1 Traditional fusion algorithms

3.3.2 Multi-resolution analysis-based methods

3.3.3 Artificial neural network based fusion method

3.3.4 Dempster-Shafer evidence theory based fusion method

3.3.5 Multiple algorithm fusion

3.3.6 Classification

Part 4 - Data disaggregation

4.1 Mapping techniques

4.1.1 Simple area weighting method

4.1.2 Pycnophylactic interpolation methods

4.1.3 Dasymetric mapping

4.1.4 Regression models

4.1.5 The EM algorithm

4.2 Small area estimators

4.2.1 Model assisted estimators

4.2.2 Model based estimators

4.2.2.1 Area level models for small area estimation

4.2.2.2 Unit level mixed models for small area estimation

4.2.2.3 Unit level M-quantile models for small area estimation

4.2.2.4 Unit level nonparametric small area models

4.2.2.5 A note on small area estimation for out of sample areas

4.2.2.6 A note on Bayesian small area estimation methods

4.2.3 Small area model for binary and count data

4.3 Geostatistical methods

4.3.1 Geoadditive models

4.3.2 Area-to-point kriging

Conclusions and identified gaps

1. Data Interpolation

In the mathematical field of numerical analysis, interpolation is a method of constructing new data points within the range of a discrete set of known data points. In recent years, the increasing availability of spatial and spatio-temporal data has driven the development of many spatial interpolation methods, including geostatistics. Spatial interpolation comprises the formal techniques which study entities using their topological, geometric, or geographic properties.

In what follows we present a literature review of the available methods for spatial interpolation. In spatial interpolation the values of an attribute at unsampled points need to be estimated, so that interpolation from point data to a spatially continuous surface is necessary. It is also necessary when the discretised surface has a different level of resolution, cell size or orientation from that required; when a continuous surface is represented by a data model different from that required; and when the data available do not cover the domain of interest completely (Burrough and McDonnell, 1998). In such instances, spatial interpolation methods provide tools to fulfil this task by estimating the values of an environmental variable at unsampled sites using data from point observations within the same region.

Spatial data can be collected as discrete points or over sub-areas. We refer to the first case as point data and to the second case as areal data.

1.1 Point Data

Point interpolation deals with data collectable at a point, such as temperature readings or elevation.

Point or isometric methods can be subdivided into exact and approximate methods according to whether they preserve the original sample point values. Numerous algorithms for point interpolation have been developed, but none is superior to all others for all applications; the selection of an appropriate interpolation model depends largely on the type of data, the degree of accuracy desired, and the computational effort afforded. Even with high computing power available, some methods are too time-consuming and expensive to be justified for certain applications. In all cases, the fundamental problem underlying these interpolation models is that each is a sort of hypothesis about the surface, and that hypothesis may or may not be true.

The methods of the exact type include interpolating polynomials, most distance-weighting methods, Kriging, spline interpolation, finite difference methods, linear triangulation and proximation. The group of approximate methods includes power-series trend models, Fourier models, distance-weighted least-squares, and least-squares fitting with splines (Lam, 1983).

1.1.1 Exact Methods

Interpolating polynomial

Given a set of N data points, one of the simplest mathematical expressions for a continuous surface that intersects these points is the interpolating polynomial of the lowest order that passes through all data points. One common form of this polynomial is

$$f(x, y) = \sum_{i} \sum_{j} a_{ij}\, x^i y^j \qquad (1.1)$$

where the pair $(x, y)$ identifies a point in space and the $a_{ij}$ are the coefficients of the polynomial, determined by solving the set of equations

$$f(x_i, y_i) = z_i, \qquad i = 1, \ldots, N. \qquad (1.2)$$

The major deficiency of this exact polynomial fit is that, since the polynomial is entirely unconstrained except at the data points, the values attained between the points may be highly unreasonable and may differ drastically from those at nearby data points. This problem may be alleviated to a certain extent by employing lower-order piecewise polynomial surfaces to cover the area (Crain and Bhattacharyya, 1967). Other problems include the existence of multiple solutions for the same set of data (Schumaker, 1976) and the inaccuracy of the inverses of the large matrices of equation (1.2) for polynomials of order greater than 5 (Ralston, 1965). As a result, this exact polynomial interpolation method is not generally recommended, particularly when the number of data points is large.
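To make equations (1.1)-(1.2) concrete, here is a minimal Python sketch that fits the lowest-order bivariate polynomial exactly through N data points by solving the system (1.2) directly; the monomial ordering is one illustrative choice, and the conditioning comment mirrors the caveat above about orders greater than 5.

```python
import numpy as np

def exact_poly_fit(x, y, z):
    """Interpolating polynomial, equations (1.1)-(1.2): solve for the a_ij
    of the first N monomials (ordered by total degree) so that f(x, y)
    passes exactly through all N data points."""
    n = len(z)
    exps = [(i, s - i) for s in range(n) for i in range(s + 1)][:n]
    A = np.column_stack([x**i * y**j for (i, j) in exps])
    coef = np.linalg.solve(A, z)   # ill-conditioned for high orders (> 5)
    def f(xq, yq):
        return sum(a * xq**i * yq**j for a, (i, j) in zip(coef, exps))
    return f

# invented data points, for illustration only
x = np.array([0.0, 1.0, 2.0, 0.5])
y = np.array([0.0, 0.5, 1.5, 2.0])
z = np.array([1.0, 2.0, 0.5, 3.0])
f = exact_poly_fit(x, y, z)
print(f(x, y))   # reproduces z at the data points
```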

A method to fit piecewise polynomials through the z values of points irregularly distributed in the x-y plane is described by Akima (1978). By means of a Delaunay triangulation (Watson 1994), data points are connected by non-overlapping triangular facets. Through the vertices of each triangle a surface with continuous first derivatives is calculated; these derivatives are estimated from the values of the closest $n_c$ neighbours. A Delaunay triangulation for a set P of points in a plane is a triangulation DT(P) such that no point in P is inside the circumcircle of any triangle in DT(P). Delaunay triangulations maximize the minimum angle over all angles of the triangles in the triangulation; they tend to avoid skinny triangles. To apply the piecewise polynomial interpolation, Akima (1978) recommends three to five neighbours. Laslett et al. (1987) tested this technique using 24 neighbours and achieved unsatisfactory results, so we suggest following Akima's guidelines.

Distance weighting

The principle of distance weighting methods is to assign more weight to nearby points than to distant points. The usual expression is

$$f(x, y) = \frac{\sum_{i=1}^{N} w(d_i)\, z_i}{\sum_{i=1}^{N} w(d_i)} \qquad (1.3)$$

where $w(d)$ is the weighting function, $z_i$ is the data value at point $i$, and $d_i$ is the distance from point $i$ to $(x, y)$. Although weighting methods are often used as exact methods (Sampson 1978), they can also be approximate depending on the weighting function. For weighting functions with $w(0) = \infty$, such as $w(d) = d^{-1}$, the method returns the exact value at the original sample points. On the other hand, for a negative exponential weighting function the method only approximates the original values at the sample locations. The underlying assumption is that sampled points closer to the unsampled point are more similar in value to it than those further away. A popular choice is $w(d) = d^{-p}$, where $p$ is a power parameter; with this weighting function the method is known as inverse distance weighting (IDW). In this case weights diminish as distance increases, especially for large values of the power parameter, so nearby samples carry more weight and have more influence on the estimate, and the resulting spatial interpolation is local. There are several disadvantages to weighting methods.

First, the choice of a weighting function may introduce ambiguity, in particular when the characteristics of the underlying surface are not known. Second, the weighting methods are easily affected by uneven distributions of data points, since an equal weight is assigned to each point even if it belongs to a cluster. This problem has long been recognized (Delfiner and Delhomme 1975), and has been handled either by averaging the points or by selecting a single point to represent the cluster (Sampson 1978). Finally, the interpolated value at any point within the data set is bounded by $\min_j(z_j) \le f(x, y) \le \max_j(z_j)$ as long as $w(d_i) > 0$ (Crain and Bhattacharyya 1967). In other words, whatever weighting function is used, weighted average methods are essentially smoothing procedures. This is an important shortcoming because, to be useful, an interpolated surface, such as a contour map, should accurately predict certain important features of the original surface, such as the locations and magnitudes of maxima and minima, even when they are not included among the sample points. Nevertheless, the method has been widely used because of its simplicity.
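As a concrete illustration of equation (1.3) with $w(d) = d^{-p}$, the following minimal numpy sketch implements inverse distance weighting; the function name, the coincidence tolerance and the default power $p = 2$ are assumptions made for the example.

```python
import numpy as np

def idw(xy_known, z_known, xy_query, p=2.0):
    """Inverse distance weighting, equation (1.3) with w(d) = d**(-p)."""
    d = np.linalg.norm(xy_query[:, None, :] - xy_known[None, :, :], axis=2)
    exact = d < 1e-12                      # query coincides with a sample
    d = np.where(exact, 1.0, d)            # dummy value; fixed up below
    w = d ** (-p)
    est = (w * z_known).sum(axis=1) / w.sum(axis=1)
    # w(0) = infinity makes the method exact at the sample points:
    return np.where(exact.any(axis=1), z_known[exact.argmax(axis=1)], est)

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
vals = np.array([1.0, 2.0, 3.0])
print(idw(pts, vals, np.array([[0.5, 0.5], [0.0, 0.0]])))  # second is exact
```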

Kriging

In 1951, Krige, a mining engineer in South Africa, first developed the approach that now bears his name and applied it to gold mining valuation problems. Matheron (1963) describes Kriging as a method which can predict the grade of a panel by computing the weighted average of available samples, emphasizing that suitable weights should minimise the variance. Krige's own understanding of the term is that it is a multiple regression which is the best linear weighted moving average of the ore grade of an ore block of any size, obtained by assigning an optimum set of weights to all the available and relevant data inside and outside the ore block (Krige, 1978). In the Encyclopedia of Statistical Sciences, Ord (1983) states that Kriging is a method of interpolation for random spatial processes. According to Hemyari and Nofziger (1987), Kriging is a form of weighted average, where the weights depend upon the location and structure of the covariance or semivariogram of the observed points, and the choice of weights must make the prediction error smaller than that of any other linear sum. A semivariogram is a function used to indicate spatial correlation in observations measured at sample locations. An early comprehensive introduction to the origins of Kriging is given by Cressie (1990).

The applications of Kriging cover disciplines ranging from the classical fields of mining and geology to soil science, hydrology and meteorology, and more recently engineering design, cost estimation, wireless sensing and networks, simulation interpolation and evolutionary optimization. However, Kriging is applied mainly in geological settings. It is extensively used to produce contour maps (Dowd, 1985; Sabin, 1985), for example to predict the values of soil attributes at unsampled locations. In hydrology, Kriging has had wide application, and related review papers include Goovaerts (1999), Trangmar et al. (1985) and Vieira et al. (1983). Kriging is also applied in meteorology (De Iaco et al., 2002). In recent years it has been applied in the fields of optimization, engineering, cost estimation, economic sensitivity analysis and wireless wave propagation.

All kriging estimators are variants of the basic equation (1.4):

$$\hat{Z}(x_0) = \mu + \sum_{i=1}^{N} \lambda_i \left( Z(x_i) - \mu(x_0) \right) \qquad (1.4)$$

where $\mu$ is a known stationary mean, assumed to be constant over the whole domain and calculated as the average of the data (Wackernagel, 2003); $\lambda_i$ is the kriging weight; $N$ is the number of sampled points used to make the estimation, which depends on the size of the search window; and $\mu(x_0)$ is the mean of the samples within the search window.

The kriging weights are estimated by minimising the variance, as follows:

$$\operatorname{var}\left(\hat{Z}(x_0)\right) = E\left[\left\{\hat{Z}(x_0) - Z(x_0)\right\}^2\right] = \sum_{i=1}^{N} \sum_{j=1}^{N} \lambda_i \lambda_j\, C(x_i, x_j) + C(x_0, x_0) - 2 \sum_{i=1}^{N} \lambda_i\, C(x_i, x_0) \qquad (1.5)$$

where $Z(x_0)$ is the true value expected at point $x_0$, $N$ represents the number of observations included in the estimation, and $C(x_i, x_j) = \operatorname{Cov}[Z(x_i), Z(x_j)]$ (Isaaks and Srivastava, 1989).

Simple Kriging

The estimator of simple kriging (SK) is based on a slightly modified version of equation (1.4):

$$\hat{Z}(x_0) = \sum_{i=1}^{N} \lambda_i\, Z(x_i) + \left(1 - \sum_{i=1}^{N} \lambda_i\right)\mu \qquad (1.6)$$

where $\mu$ is a known stationary mean, assumed constant over the whole domain and calculated as the average of the data (Wackernagel, 2003). SK is used to estimate residuals from this reference value $\mu$, given a priori, and is therefore sometimes referred to as "kriging with known mean" (Wackernagel, 2003). The parameter $\mu(x_0)$ in equation (1.4) is replaced by the stationary mean $\mu$ in equation (1.6). The number of sampled points used to make the estimation in equation (1.6) is determined by the range of influence of the semivariogram (Burrough and McDonnell, 1998). Because SK has no non-bias condition, $1 - \sum_{i=1}^{N} \lambda_i$ is not necessarily 0; the greater the value of $1 - \sum_{i=1}^{N} \lambda_i$, the more the estimator is drawn toward the mean, and in general this value increases in relatively poorly sampled regions (Boufassa and Armstrong, 1989). SK assumes second-order stationarity, that is, constant mean, variance and covariance over the domain or region of interest (Wackernagel, 2003; Webster and Oliver, 2001). Because such an assumption is often too restrictive, ordinary kriging (with no a priori mean) is most often used (Burrough and McDonnell, 1998).

Ordinary Kriging

Ordinary kriging (OK) is similar to SK; the only difference is that OK estimates the value of the attribute using equation (1.4) with $\mu$ replaced by a local mean $\mu(x_0)$ (the mean of the samples within the search window), and with the constraint $1 - \sum_{i=1}^{N} \lambda_i = 0$, that is $\sum_{i=1}^{N} \lambda_i = 1$, plugged into equation (1.4) (Clark and Harper, 2001; Goovaerts, 1997). OK estimates the local constant mean, then performs SK on the corresponding residuals, and only requires stationarity of the mean within the local search window (Goovaerts, 1997).
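As an illustration, the following minimal numpy sketch solves the OK system at a single point under an assumed exponential covariance model; the covariance function, its parameters and the sample configuration are illustrative assumptions, not prescriptions from the text. The extra row and column of ones (the Lagrange multiplier) enforce the constraint that the weights sum to one.

```python
import numpy as np

def exp_cov(h, sill=1.0, rng=10.0):
    """Exponential covariance model (an assumed, illustrative choice)."""
    return sill * np.exp(-h / rng)

def ordinary_kriging(xy, z, x0, sill=1.0, rng=10.0):
    """Solve the OK system at point x0; a Lagrange multiplier enforces
    the non-bias condition sum(lambda_i) = 1."""
    n = len(z)
    h = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_cov(h, sill, rng)                            # C(x_i, x_j)
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = exp_cov(np.linalg.norm(xy - x0, axis=1), sill, rng)  # C(x_i, x_0)
    lam = np.linalg.solve(A, b)[:n]                              # kriging weights
    return lam @ z

xy = np.array([[0.0, 0.0], [3.0, 4.0], [10.0, 2.0]])
z = np.array([1.2, 2.3, 0.7])
print(ordinary_kriging(xy, z, np.array([2.0, 2.0])))
```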

Kriging with a Trend

Kriging with a trend (KT) is normally called universal kriging (UK), as proposed by Matheron (1969). It is an extension of OK that incorporates the local trend within the neighbourhood search window as a smoothly varying function of the coordinates. UK estimates the trend components within each search neighbourhood window and then performs SK on the corresponding residuals.

Block Kriging

Block kriging (BK) is a generic name for the estimation of average values of the primary variable over a segment, a surface, or a volume of any size or shape (Goovaerts, 1997). It is an extension of OK that estimates a block value instead of a point value by replacing the point-to-point covariance with the point-to-block covariance (Wackernagel, 2003). Essentially, BK is block OK, while OK is "point" OK.

Factorial Kriging

Factorial kriging (FK) is designed to determine the origins of the value of a continuous attribute (Goovaerts, 1997). It models the experimental semivariogram as a linear combination of a few basic structure models representing the different factors operating at different scales (e.g., local and regional scales). FK can decompose the kriging estimates into components such as nugget, short-range, long-range and trend, and such components can be filtered out in mapping if considered noise. For example, the nugget component at sampled points can be filtered to remove discontinuities (peaks) at the sampled points, while the long-range component can be filtered to enhance the short-range variability of the attribute. FK assumes that the noise and the underlying signal are additive and that the noise is homoscedastic.

Dual Kriging

Dual kriging (DuK) estimates the covariance values instead of the data values in order to elucidate the filtering properties of kriging (Goovaerts, 1997). It also reduces the computational cost of kriging when used with a global search neighbourhood. It includes dual SK, dual OK and dual FK. DuK has a restricted range of applications.

Simple Kriging with Varying Local Means

Simple kriging with varying local means (SKlm) is an extension of SK in which the stationary mean is replaced by known varying means at each point, derived from secondary information (Goovaerts, 1997). If the secondary variable is categorical, the primary local mean is the mean of the primary variable within a specific category of the secondary variable. If it is continuous, the primary local mean is a function of the secondary variable, or can be acquired by discretising it into classes. SK is then used to produce the weights and estimates.

Kriging with an External Drift

Kriging with an external drift (KED) is similar to UK, but it incorporates the local trend within the neighbourhood search window as a linear function of a smoothly varying secondary variable, instead of as a function of the spatial coordinates (Goovaerts, 1997). The trend of the primary variable must be linearly related to that of the secondary variable, and the secondary variable should vary smoothly in space and be measured at all primary data points and at all points being estimated. KED is also called UK or external drift kriging in Pebesma (2004).

Cokriging

Unlike SK within strata, SKlm and KED, which require the auxiliary information to be available at all points being estimated, cokriging (CK) uses non-exhaustive auxiliary information and explicitly accounts for the spatial cross-correlation between the target and auxiliary variables (Goovaerts, 1997). Equation (1.4) can be extended to incorporate the additional information, giving equation (1.7):

$$\hat{Z}_1(x_0) = \mu_1 + \sum_{i_1=1}^{N_1} \lambda_{i_1} \left( Z_1(x_{i_1}) - \mu_1(x_{i_1}) \right) + \sum_{j=2}^{N_v} \sum_{i_j=1}^{N_j} \lambda_{i_j} \left( Z_j(x_{i_j}) - \mu_j(x_{i_j}) \right) \qquad (1.7)$$

where $\mu_1$ is a known stationary mean of the target variable, $Z_1(x_{i_1})$ is the datum of the target variable at point $i_1$, $\mu_1(x_{i_1})$ is the mean of the samples within the search window, $N_1$ is the number of sampled points within the search window for point $x_0$ used to make the estimation, $\lambda_{i_1}$ is the weight selected to minimise the estimation variance of the target variable, $N_v$ is the number of auxiliary variables, $N_j$ is the number of points of the $j$-th auxiliary variable within the search window, $\lambda_{i_j}$ is the weight assigned to the $i_j$-th point of the $j$-th auxiliary variable, $Z_j(x_{i_j})$ is the datum at the $i_j$-th point of the $j$-th auxiliary variable, and $\mu_j(x_{i_j})$ is the mean of the samples of the $j$-th auxiliary variable within the search window.

Simple Cokriging

Replacing $\mu_1(x_{i_1})$ with the stationary mean $\mu_1$ of the target variable, and $\mu_j(x_{i_j})$ with the stationary means $\mu_j$ of the auxiliary variables, in equation (1.7) gives the linear estimator of simple cokriging (SCK) (Goovaerts, 1997):

$$\hat{Z}_1(x_0) = \sum_{i_1=1}^{N_1} \lambda_{i_1}\, Z_1(x_{i_1}) + \sum_{j=2}^{N_v} \sum_{i_j=1}^{N_j} \lambda_{i_j}\, Z_j(x_{i_j}) + \left(1 - \sum_{i_1=1}^{N_1} \lambda_{i_1}\right)\mu_1 - \sum_{j=2}^{N_v} \sum_{i_j=1}^{N_j} \lambda_{i_j}\, \mu_j \qquad (1.8)$$

If the target and auxiliary variables are not correlated, the SCK estimator reverts to the SK estimator (Goovaerts, 1997). The weights generally decrease as the corresponding data points get farther from the point of interest. When the point of interest is beyond the correlation range of both the target and the auxiliary data, the SCK estimator reverts to the stationary mean of the target variable. If all auxiliary variables are recorded at every sampled point, the situation is referred to as "equally sampled" or "isotopic". If the target variable is undersampled relative to the auxiliary variables, it is referred to as "undersampled" or "heterotopic". When the auxiliary variables are linearly dependent, one should be kept and the other correlated variables discarded; multivariate analysis such as principal component analysis (PCA) may be used to eliminate such dependency.

Ordinary Cokriging

Ordinary cokriging (OCK) is similar to SCK (Goovaerts, 1997). The only difference is that OCK estimates the value of the target variable using equation (1.8) with $\mu_1$ and $\mu_j$ replaced by local means $\mu_1(x_0)$ and $\mu_j(x_0)$ (i.e., the means of the samples within the search window), and with the constraints $\sum_{i_1=1}^{N_1} \lambda_{i_1} = 1$ and $\sum_{i_j=1}^{N_j} \lambda_{i_j} = 0$. These two constraints may result in negative and/or small weights. To reduce the occurrence of negative weights, the two constraints can be combined into the single constraint $\sum_{i_1=1}^{N_1} \lambda_{i_1} + \sum_{i_j=1}^{N_j} \lambda_{i_j} = 1$.

OCK amounts to estimating the local target and auxiliary means and applying the SCK estimator (equation (1.8)) with these estimates of the means rather than the stationary means (Goovaerts, 1997).

Standardised Ordinary Cokriging

OCK has two drawbacks stemming from the requirement that the auxiliary data weights sum to zero (Goovaerts, 1997). The first is that some of the weights are negative, increasing the risk of unacceptable estimates. The second is that most of the weights tend to be small, reducing the influence of the auxiliary data. To overcome these drawbacks, the standardised OCK (SOCK) estimator was introduced; it calls for knowledge of the stationary means of both the target and auxiliary variables, which can be estimated from the sample means. Like OCK, SOCK still accounts for local departures from the overall means.

Principal Component Kriging

Principal component kriging (PCK) applies PCA to a few ($n_v$) secondary variables to generate $n_v$ orthogonal (uncorrelated) principal components (Goovaerts, 1997). OK is then applied to each component to obtain principal component estimates. The final estimate is generated as a linear combination of the principal component estimates weighted by their loadings, plus the local attribute mean.

Colocated Cokriging

Colocated cokriging (CCK) is a variant of CK (Goovaerts, 1997). It uses only the single auxiliary datum of each given type closest to the point being estimated. Like CK, CCK has several variants, such as simple colocated cokriging (SCCK) and ordinary colocated cokriging (OCCK). CCK is proposed to overcome problems such as the screening effect of samples of the secondary variables close to, or colocated with, the point of interest; this situation arises when the sample densities of the secondary variables are much higher than that of the primary variable. OCCK is also the preferred method for categorical soft information.

Model-based Kriging

Model-based kriging (MBK) was developed by Diggle et al. (1998). This method embeds the linear kriging methodology within a more general distributional framework that is characteristically similar to the structure of a generalized linear model. A Bayesian approach is adopted and implemented via Markov chain Monte Carlo (MCMC) methodology to predict arbitrary functionals of an unobserved latent process, while making proper allowance for the uncertainty in the estimation of the model parameters (Moyeed and Papritz, 2002). The method is further illustrated in Diggle and Ribeiro Jr. (2007). Given its heavy computational demands (Moyeed and Papritz, 2002), it is not applicable to large datasets.

Spline Interpolation

In their traditional cartographic and mathematical representation, splines are a local interpolation procedure. Considering a one-dimensional spline, for a set of nodes $x_0 < x_1 < \ldots < x_N$ with associated values $z_0, z_1, \ldots, z_N$, a spline function $s(x)$ of degree $m$ is defined in each subinterval $(x_i, x_{i+1})$ to be a polynomial of degree $m$ (or less), having its first $m - 1$ derivatives piecewise continuous (De Boor 1978). Hence, $s(x)$ is a smooth function that passes through all control points. Extending splines to two-dimensional data is not straightforward because they are not simple cross-products of one-dimensional splines (Lam 1983). Since two-dimensional control points may be irregularly spaced, the surface is often subdivided into "patches" where the splines are to be fitted. There are a number of different criteria for determining the subdivision, and each may produce a different interpolating surface. Further, "sewing" the patches together to create an entire surface must be performed in such a way that spurious variations are not introduced at or near the patch boundaries (Burrough 1986).

Splines may also be used as a basis for global minimization of an objective function. A particular class of splines commonly used as basis functions for linear interpolation and approximation is the "thin-plate splines" (TPS). The name derives from the functions that solve a differential equation describing the bending energy of an infinite thin plate constrained by point loads at the control points (Wahba 1990). One implementation of thin-plate splines, a generalization of one-dimensional smoothing polynomial splines (Reinsch 1967), allows the resulting surface to smooth or approximate the data (Wahba 1981). As an approximation method, TPS is particularly advantageous when the data are known to contain errors or when small-scale features need to be eliminated (e.g., low-pass filtering of topographic variability). Methods for TPS interpolation and approximation are well developed (Wahba and Wendelberger 1980; Wahba 1981; Dubrule 1983, 1984; Wahba 1990), and a version of the spherical procedure of Wahba (1981) has been implemented by Burt (1991).

As described by Wahba (1981), the objective of TPS approximation on the sphere is to find the global function $u$ and smoothing parameter $\theta$ that minimize

$$\sum_{i=1}^{N} \left[ u(\phi_i, \lambda_i) - z_i \right]^2 + \theta\, J_m(u) \qquad (1.9)$$

where $\sum_i [u(\phi_i, \lambda_i) - z_i]^2$ is the $L_2$-norm (least-squares) measure of the difference between $u$ and the data $z_i$, and $J_m(u)$ is a penalty function that measures the smoothness of the spherical function $u$ (Wahba 1981). The subscript $m$ refers to the order of the partial derivatives in the penalty function.

The two competing terms in equation (1.9) balance the requirements of fidelity to the control points and smoothness of the approximating surface. Unless the control points are derived from an inherently smooth surface and contain no error, minimizing the residual term (i.e., making the surface pass very close to the data values) will increase $\theta\, J_m(u)$. Similarly, minimizing $\theta\, J_m(u)$ (i.e., making the surface very smooth) may cause $u$ to deviate from the $z_i$, increasing the residual term. Constraining $\theta$ to equal zero gives thin-plate spline interpolation, i.e., an exact fit through the control data. Typically, a linear combination of thin-plate spline basis functions approximates $u$ (Wahba 1990), while generalized cross validation (GCV) determines $\theta$ and $m$ (although $m = 2$ is commonly used). In contrast to ordinary cross validation, GCV implicitly removes one control point at a time and estimates the error in predicting the removed point (Golub et al. 1979).

Computational requirements for TPS are demanding. Very large design matrices (incorporating the evaluation of u, the penalty function, and the GCV function) are subjected to a series of decomposition methods (e.g., Cholesky, QR, and singular value decompositions). As a result, both large computer memory and fast processing capability are needed. Two alternatives are available to reduce computing requirements: a partial spline model and partitioning of the dataset.

A partial spline model uses a subset of the control points when forming the basis functions (Bates et al. 1986). All control values, however, enter into the design matrix, giving an overdetermined system of equations. By using a smaller set of basis functions, computational requirements are reduced while an attempt is made to maintain the representation of the surface that would result from using all the control points.

An alternative to the partial spline model is to partition the control points into subsets with splines fit over each (overlapping) partition. The partitions subsequently are woven together to form the entire interpolating surface (Mitasova and Mitas 1993). While all of the control points are used in this approach, substantial overlap between partitions may be needed in order to ensure that the entire surface can be pieced together seamlessly. Also, partitioning the data does reduce computer memory requirements, but may take more processing time due to the overlap and additional postprocessing.

Finally, the use of spline functions in spatial interpolation offers several advantages. They are piecewise, and hence involve relatively few points at a time, which should be closely related to the value being interpolated; they are analytic; and they are flexible. Splines of low degree, such as bicubic splines, are usually sufficient to interpolate the surface quite accurately (Lam 1983).
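For readers who want to experiment, thin-plate spline interpolation and smoothing are available through SciPy's RBFInterpolator, as in the sketch below, which contrasts an exact fit ($\theta = 0$, smoothing=0) with a smoothed one. Note two caveats: unlike Wahba's spherical procedure described above, this implementation works in the plane, and the smoothing parameter is set by hand here rather than by GCV; the test function and noise level are invented for illustration.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)
xy = rng.uniform(0, 1, (100, 2))                      # scattered control points
z = np.sin(4 * xy[:, 0]) + rng.normal(0, 0.05, 100)   # noisy observations

# smoothing=0 gives exact TPS interpolation (theta = 0 in equation 1.9);
# smoothing>0 trades fidelity for smoothness, as the penalty term does.
tps_exact = RBFInterpolator(xy, z, kernel='thin_plate_spline', smoothing=0.0)
tps_smooth = RBFInterpolator(xy, z, kernel='thin_plate_spline', smoothing=1.0)

grid = np.mgrid[0:1:50j, 0:1:50j].reshape(2, -1).T
z_hat = tps_smooth(grid)                              # surface on a 50x50 grid
print(np.abs(tps_exact(xy) - z).max())                # ~0: exact at the data
```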

Finite difference methods

The principle behind finite difference methods is the assumption that the desired surface obeys some differential equation, ordinary or partial. These equations are then approximated by finite differences and solved iteratively. For example, the problem may be to find a function $z$ such that

$$\frac{\partial^2 z}{\partial x^2} + \frac{\partial^2 z}{\partial y^2} = 0 \qquad (1.10)$$

inside the region, with $z(x_i, y_i) = 0$ on the boundary. This is the Laplace equation, and a finite difference approximation of it is

$$z_{ij} = \left( z_{i-1,j} + z_{i+1,j} + z_{i,j-1} + z_{i,j+1} \right) / 4 \qquad (1.11)$$

where $z_{ij}$ is the value in cell $ij$. This equation in effect requires that the value at a point be the average of its four neighbours, resulting in a smooth surface. For a smoother surface, other differential equations may be used. Also, the "boundary conditions" may be applied not only on the boundary but also within the region of interest (Briggs 1974; Swain 1976). This point interpolation technique has a striking similarity to the pycnophylactic areal interpolation method, which will be discussed later.
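A minimal sketch of the iterative scheme implied by equation (1.11) follows: unknown cells are repeatedly replaced by the average of their four neighbours while the data cells and the boundary stay fixed. The grid size, data points and iteration count are arbitrary illustrative choices (the edge wrap-around of np.roll is harmless here because boundary cells are held fixed).

```python
import numpy as np

def laplace_interpolate(grid, known, n_iter=5000):
    """Iterate equation (1.11): each unknown cell becomes the average of
    its four neighbours, while cells marked in `known` stay fixed."""
    z = grid.copy()
    for _ in range(n_iter):
        avg = 0.25 * (np.roll(z, 1, 0) + np.roll(z, -1, 0) +
                      np.roll(z, 1, 1) + np.roll(z, -1, 1))
        z = np.where(known, z, avg)        # update only the unknown cells
    return z

# toy example: a 20x20 grid with three fixed data points and zero boundary
z = np.zeros((20, 20))
known = np.zeros_like(z, dtype=bool)
known[[0, -1], :] = True                   # fixed boundary rows (z = 0)
known[:, [0, -1]] = True                   # fixed boundary columns (z = 0)
for (i, j, v) in [(5, 5, 1.0), (10, 14, 2.0), (15, 7, -1.0)]:
    z[i, j], known[i, j] = v, True
surface = laplace_interpolate(z, known)
```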

The principle involved in these finite difference methods is generally simple, though the solution of the set of difference equations is time-consuming. Yet the surface generated from these equations has no absolute or relative maxima or minima except at data points or on the boundary. In addition, interpolation beyond the neighbourhood of the data points is poor, and unnatural contouring can occur for certain types of data (Lam 1983). Moreover, in some cases there might be no value assigned to certain points.

Linear and non-linear triangulation

This method (TIN) uses a triangular tessellation of the given point data to derive a bivariate function for each triangle, which is then used to estimate the values at unsampled locations. Linear interpolation uses planar facets fitted to each triangle (Akima 1978; Krcho 1973). Non-linear blended functions (e.g., polynomials) use additional continuity conditions on the first-order, or both first- and second-order, derivatives, ensuring smooth connection of the triangles and differentiability of the resulting surface (Akima 1978; McCullagh 1988; Nielson 1983; Renka and Cline 1984). Because of their local nature, these methods are usually fast, and discontinuities and structural features are easily incorporated. An appropriate triangulation respecting the surface geometry is crucial (Weibel and Heller 1991). Extension to d-dimensional problems is more complex than for the distance-based methods (Nielson 1993).

While a TIN provides an effective representation of surfaces useful for various applications, such as dynamic visualisation and visibility analyses, interpolation based on a TIN, especially the simplest and most common linear version, is among the least accurate methods (Franke 1982; Nielson 1993; Renka and Cline 1984).

Proximation

Abruptly changing sample values may indicate that spatial dependence is low or nearly absent. Sample point values are then interpreted as reference values for the surrounding area, and no gradation across area boundaries is assumed. A Voronoi polygon (i.e., the geometric dual of a Delaunay triangulation), also called a Thiessen or Dirichlet polygon (Burrough 1992), can consequently be used to describe the area with the sample point in the middle. Alternatively, areas with specific boundaries can be used as mapping units. Until the 1970s the latter method was traditionally used to map soil properties using distinct regions, each describing one soil class having one value (Webster 1979). Van Kuilenburg et al. (1982) tested the accuracy of predicted values for soil moisture based both on Voronoi polygons and on a grid derived from conventional soil survey maps. In that study Voronoi polygons performed badly and soil class areas were found to be accurate predictors. On the other hand, Voltz and Webster (1990) found that splines or kriging performed better than the method of delineating soil classes, even with very distinct value differences at the boundaries of soil class areas.

The studies cited above demonstrate that grouping values by proximation, even if supplementary information on "clear" user-defined boundaries is added to the sample point information, is applicable only if one is sure of a lack of spatial dependence within the Voronoi or user-defined polygons. Spatial dependence can be examined using semivariogram analysis (Declercq 1996).
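Prediction with Voronoi (Thiessen) polygons reduces to nearest-neighbour assignment, as in the following sketch; the sample coordinates and values are invented for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree

# Thiessen/Voronoi prediction: every location takes the value of its
# nearest sample point, i.e. a constant surface within each Voronoi cell.
samples = np.array([[0.1, 0.2], [0.8, 0.3], [0.5, 0.9]])
values = np.array([3.0, 7.0, 5.0])

tree = cKDTree(samples)
query = np.array([[0.4, 0.4], [0.9, 0.8]])
_, nearest = tree.query(query)       # index of the closest sample point
print(values[nearest])               # predicted values at the query points
```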

1.1.2 Approximation Methods

The methods discussed in this section are concerned with determining a function f(x, y) which assumes values at the data points approximately, but not generally exactly, equal to the observed values. Thus, there is an "error" or residual at every data point. In order to obtain a good approximation, the errors must be kept within certain bounds by some error criterion (Lam 1983).

The methods shown below are variations of the least-squares method, the well-known method which minimizes the sum of squared residuals:

$$\sum_{i=1}^{N} e_i^2 = \sum_{i=1}^{N} \left( f(x_i, y_i) - z_i \right)^2 = \min \qquad (1.12)$$

Power-series trend models

A common approximation criterion is ordinary least-squares polynomial fitting. The general form of this model is the same as in equation (1.1), but with the number of terms in the polynomial, m, less than the total number of data points, N, and with the addition of an error term:

$$f(x, y) = \sum_{i} \sum_{j} a_{ij}\, x^i y^j + e \qquad (1.13)$$

These methods are also called power-series trend models, since they are often used to decompose the surface into a major trend and associated residuals. Since interpolation means prediction of function values at unknown points, and the trend is regarded as a simplified function describing the general behaviour of the surface, predicted values follow the trend. The problems associated with this class of interpolation models are evident. In the first place, the trend model assumes a distinction between a deterministic trend and a stochastic random surface (noise) for each phenomenon, which may be arbitrary in most cases. In most of the geosciences, the so-called trend may present the same stochastic character as the noise itself. Hence, the distinction between them is only a matter of scale, similar to the problem of distinguishing drift in Universal Kriging.

The estimation of values using trend models is highly affected by the extreme values and uneven distribution of data points (Krumbein 1959). The problem is further complicated by the fact that some of the data points are actually more informative than others. For example, in topographic maps, the data points taken from the peaks, pits, passes, and pales are more significant than the points taken from the slope or plain. Hence, the answer to how many data points are required for a reliable result is not known.

Compared with Kriging, the variance given by least-squares polynomials is the variance between the actual and the estimated values at sample points, which is generally less than the variance at points not belonging to the set of sample points (Matheron 1967). The mean-square error from the polynomial fit is not related to the estimation error, as illustrated clearly by Delfiner and Delhomme (1975).
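A least-squares trend surface of the form (1.13) can be fitted in a few lines; in this sketch the monomial basis is truncated at a chosen total degree, an illustrative way of selecting the m < N terms discussed above.

```python
import numpy as np

def trend_surface(x, y, z, degree=2):
    """Power-series trend model, equation (1.13): least-squares fit of
    f(x, y) = sum a_ij x**i y**j over all terms with i + j <= degree."""
    exps = [(i, s - i) for s in range(degree + 1) for i in range(s + 1)]
    A = np.column_stack([x**i * y**j for (i, j) in exps])
    coef, *_ = np.linalg.lstsq(A, z, rcond=None)
    residuals = z - A @ coef          # the "noise" left after the trend
    return coef, exps, residuals
```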

Fourier models

If there is some definite reason for assuming that the surface takes some recurring or cyclical form, then a Fourier series model may be the most applicable. The Fourier model basically takes the form

$$z = a_{00} + \sum_{i=1}^{M} \sum_{j=1}^{N} F(a_{ij}, p_i, q_j) + e \qquad (1.14)$$

where $p_i = 2\pi i x / M$ and $q_j = 2\pi j y / N$, and $M$ and $N$ are the fundamental wavelengths in the x and y directions. The Fourier series $F(a_{ij}, p_i, q_j)$ is usually defined as

$$F(a_{ij}, p_i, q_j) = cc_{ij}\cos(p_i)\cos(q_j) + cs_{ij}\cos(p_i)\sin(q_j) + sc_{ij}\sin(p_i)\cos(q_j) + ss_{ij}\sin(p_i)\sin(q_j) \qquad (1.15)$$

where $cc_{ij}$, $cs_{ij}$, $sc_{ij}$ and $ss_{ij}$ are the four Fourier coefficients for each $a_{ij}$ (Bassett 1972). With this equation a surface can be decomposed into periodic surfaces with different wavelengths. Curry (1966) and Casetti (1966) suggest that the model is particularly useful for studying the effects of areal aggregation on surface variability. It is possible to combine trend and Fourier models, so that a polynomial of low order extracts any large-scale trend and the residuals from this surface are analyzed by Fourier models (Bassett 1972).

Distance-weighted least-squares

Distance-weighted least-squares may be used to take the distance-decay effect into account (McLain 1974; Lancaster and Salkauskas 1975). In this approach, the influence of a data point on the coefficient values depends on its distance from the interpolated point. The error to be minimized becomes

$$\sum_{i=1}^{N} e_i^2 = \sum_{i=1}^{N} w(d_i) \left( f(x_i, y_i) - z_i \right)^2 \qquad (1.16)$$

where w is a weighting function. Its choice again has a serious effect on the interpolation results, and the computation time is increased by the evaluation of the weighting function.
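A local version of equation (1.16) can be implemented by solving a weighted least-squares problem at each interpolation point, as in this sketch; the weighting function 1/(d**p + eps), the regularising constant eps and the local polynomial degree are assumptions made for the example.

```python
import numpy as np

def dwls_estimate(xy, z, x0, degree=1, p=2.0, eps=1e-6):
    """Distance-weighted least squares at point x0, equation (1.16):
    fit a local polynomial with weights that decay with distance."""
    d = np.linalg.norm(xy - x0, axis=1)
    w = 1.0 / (d**p + eps)                 # assumed distance-decay weights
    exps = [(i, s - i) for s in range(degree + 1) for i in range(s + 1)]
    A = np.column_stack([xy[:, 0]**i * xy[:, 1]**j for (i, j) in exps])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(sw[:, None] * A, sw * z, rcond=None)
    return sum(c * x0[0]**i * x0[1]**j for c, (i, j) in zip(coef, exps))
```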

Least-squares fitting with splines

Although a number of authors have suggested that this method will yield adequate solutions for most problems (Hayes and Halliday 1974; Schumaker 1976; McLain 1980), it involves a number of technical difficulties such as the problem of rank-deficiency in matrix manipulations, the choice of knots for spline approximation, and problems associated with an uneven distribution of data points.

1.2 Areal Data

Areal interpolation is the process of estimating the values of one or more variables in a set of target polygons based on known values that exist in a set of source polygons (Hawley and Moellering 2005). The need for areal interpolation arises when data from different sources are collected on different areal units. In the United States, for example, spatial data collected in census zones such as block groups and tracts are very common. Moreover, a useful data source may be aggregated on natural rather than political boundaries. Because zones such as zip codes, service areas, census tracts and natural boundaries are incompatible with one another, areal interpolation is necessary to make use of all the data from these various sources.

There are many different methods of areal interpolation, each unique in its assumptions about the underlying distribution of the data. The more modern methods make use of auxiliary data, which can give insight into the underlying distribution of the variable. The choice of method may depend on various factors such as ease of implementation, accuracy, data availability and time. Two approaches, volume-preserving and non-volume-preserving, can be used to deal with the areal interpolation problem.

The non-volume-preserving approach generally proceeds by overlaying a grid on the map and assigning a control point to represent each source zone. Point interpolation schemes are then applied to interpolate values at each grid node. Finally, the estimates at the grid points are averaged within each target zone, yielding the final target-zone estimate. In this approach, the major variation among the numerous methods is the point interpolation model used in assigning values to grid points. There is evidence that this approach is poor practice (Porter 1958; Morrison 1971). First, the choice of a control point to represent the zone may involve errors. Secondly, ambiguity occurs in assigning values at the grid points under some conditions, particularly when opposite pairs of unconnected centres have similar values which contrast with other opposite pairs. Thirdly, this approach uses point interpolation methods and hence cannot avoid the fundamental problem associated with them, that is, the a priori assumption about the surface involved (Lam 1983). The most important problem of this approach, however, is that it does not conserve the total value within each zone.

The volume-preserving methods treat the preservation of volume as an essential requirement for accurate interpolation. Furthermore, the zone itself is used as the unit of operation instead of an arbitrarily assigned control point. Hence, no point interpolation process is required.

Areal interpolators can be further classified into simple interpolators and intelligent interpolators.

Simple areal interpolation methods transfer data from source zones to target zones without using auxiliary data (Okabe and Sadahiro 1997). Intelligent areal interpolation methods use some form of auxiliary data related to the interpolated attribute to improve estimation accuracy (Langford et al. 1991). Auxiliary data are used to infer the internal structure of the attribute distribution within source zones, such as land use patterns.

In what follows we focus on the volume-preserving methods. We refer to the geographic areas for which the study variable is known as source zones/areas, and to those for which the study variable is unknown as target zones/areas.

Areal interpolation without ancillary data

This paragraph focuses on areal interpolation methods that do not make use of ancillary data. The overlay method (Lam, 1983), also commonly referred to as the areal weighting method, interpolates a variable based on the area of intersection between the source and target zones. Intersection zones are created by the overlay of source and target zones. Target zone values are then estimated from the values of the source zones and the proportion of the intersection with each source zone by the following formula:

$$Z_t = \sum_{i=1}^{D} Z_i\, \frac{A_{it}}{A_i} \qquad (1.17)$$

where Z is, as usual, the value of the variable, A is the area, D is the number of source zones, t denotes a target zone, and $A_{it}$ is the area of intersection between source zone i and target zone t. Although this method does preserve volume, it assumes that the variable is homogeneously distributed within the source zones (Lam, 1983).
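Equation (1.17) is a single matrix operation once the intersection areas are known, as the toy sketch below shows; the zone geometry is reduced to invented areas, and in practice the intersection areas would come from a GIS overlay (e.g., with geopandas).

```python
import numpy as np

# Simple areal weighting, equation (1.17): redistribute a count variable
# from source zones to target zones in proportion to intersection areas.
Z_source = np.array([100.0, 60.0])            # counts in 2 source zones
A_source = np.array([10.0, 6.0])              # source zone areas
A_int = np.array([[4.0, 6.0],                 # A_int[i, t]: intersection of
                  [6.0, 0.0]])                # source i with target t
Z_target = (Z_source / A_source) @ A_int      # equation (1.17)
print(Z_target, Z_target.sum())               # total volume is preserved
```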

The pycnophylactic method "assumes the existence of a smooth density function which takes into account the effect of adjacent source zones" (Lam, 1983). The method was proposed by Tobler (1979). It initially assigns each grid cell the value of its source zone divided by the number of cells within that source zone. A new Z value is then computed for each cell as the average of its four neighbours:

$$Z_{ij} = \left( z_{i,j+1} + z_{i,j-1} + z_{i+1,j} + z_{i-1,j} \right) / 4. \qquad (1.18)$$

The predicted values in each source zone are then compared with the actual values and adjusted to meet the pycnophylactic condition, defined as follows:

$$\iint_{R_i} Z(x, y)\, dx\, dy = H_i \qquad (1.19)$$

where $R_i$ is the i-th region/zone, $H_i$ is the value of the target variable in region/zone i, and Z(x, y) is the density function. This is an iterative procedure that continues until either there is no significant difference between predicted and actual values within the source zones, or there has been no significant change in cell values from the previous iteration. The target zone values can then be interpolated as the sum of the values of the cells within each target zone.

Other methods of areal interpolation that do not make use of auxiliary data include the point-based areal interpolation approach (Lam, 1983). This is an interpolation technique in which the points are generally chosen as the centroids of the source polygons. The main criticism of these methods is that they are not volume preserving. Kyriakidis (2004) has been able to preserve the actual volume of the source zones using the geostatistical method of kriging. Other point-based methods include the point-in-polygon method (Okabe and Sadahiro, 1997).

Compared with the polygon overlay method, the pycnophylactic method represents a conceptual improvement, since the effects of neighbouring source zones are taken into account. Moreover, homogeneity within zones is not required. However, the smooth density function imposed is again only a hypothesis about the surface and does not necessarily apply in many real cases (Lam 1983).

Areal interpolation using remote sensing as ancillary data

In recent years, remote sensing data have become widely available in the public domain. Satellite imagery is very powerful in that its values in different wavelength bands allow one to classify the data into land use types. Land use types are able to better inform an analyst about the distribution of a variable such as population.

Wright (1936) developed a method to map population densities in his classic article using Cape Cod as the study area. Wright used topographic sheets as auxiliary data in order to delineate areas that are uninhabitable and areas of differing densities, the latter assigned by a principle that he terms "controlled guesswork." The general principles of dasymetric mapping can be applied to the areal interpolation problem.

Fisher and Langford (1995) were the first to publish results of areal interpolation using the dasymetric method. Their dasymetric method was a variant of Wright's (1936) method in that it uses a binary division of the land use types. Cockings et al. (1997) followed up on this work (Fisher and Langford 1995) by suggesting measures for the parameterization of areal interpolation errors.

There are several technique variations that produce a dasymetric map. Eicher and Brewer (2001) made use of three dasymetric mapping techniques for areal interpolation: the binary method, the three-class method and the limiting variable method. Other areal interpolation methods specifically using remotely sensed data are the regression methods of Langford et al. (1991). These use pixel counts of each land use type (defined by classification of a satellite image) and the population within each source zone to define the regression parameters, which can then be used to predict target zone populations. However, this method is not volume preserving and cannot be used to produce dasymetric maps.

In the binary method, 100% of the data in each zone is assigned to the cells of one of two classes derived from a raster land-use auxiliary dataset; no data are assigned to the other class, hence the label binary. The primary advantage of this method is its simplicity: it is only necessary to reclassify the land-use data into two classes. This subjective reclassification depends on the land-use classification scheme as well as on information known about the mapped area.
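The binary method amounts to masking and uniform redistribution, as in this toy sketch; the raster, zone layout and totals are invented for illustration.

```python
import numpy as np

# Binary dasymetric mapping (a sketch): all of a zone's count goes to its
# "inhabited" cells, derived from a reclassified land-use raster.
zone_id = np.array([[0, 0, 1, 1],
                    [0, 0, 1, 1]])
inhabited = np.array([[1, 0, 1, 1],
                      [1, 0, 0, 1]], dtype=bool)
H = {0: 90.0, 1: 120.0}                 # zone totals, e.g. population counts

density = np.zeros(zone_id.shape)
for k, total in H.items():
    mask = (zone_id == k) & inhabited   # only inhabited cells receive data
    density[mask] = total / mask.sum()  # uniform within the inhabited area
print(density)
```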

The polygon k-class method uses weights to assign target-variable data to k different land-use classes within each zone. For zones with all k land-use classes present, a percentage must be assigned to each class; the percentages can be selected using a priori knowledge. A major weakness of the k-class method is that it does not account for the area of each particular land use within each zone.

The limiting variable method was described by McCleary (1969); Wright (1936) also made use of this concept, as did Gerth (1993) and Charpentier (1997) in their GIS implementations. The first step in this method is to assign data by simple areal weighting to all polygons that show a given characteristic (e.g., inhabitable) in each zone. For the target variable (e.g., population count), data are assigned so that the k land-use classes (e.g., urban, forest, water polygons) within a zone have equal study-variable densities; at this step, some classes can be "limited" to zero density (e.g., water). Next, thresholds of maximum density are set for particular land uses and applied throughout the study area (e.g., forested areas are assigned a low threshold of 15 people per square kilometre when the population variable is mapped). The final step in the mapping process is the use of these threshold values to adjust the data distribution within each zone. If a polygon's density exceeds its threshold, it is assigned the threshold density, and the remaining data are removed from the area and distributed evenly among the remaining polygons in the zone (Gerth 1993). To decide the upper limits on the densities of the mapped variable for each of the k classes, it is useful to examine data available at the zone level. This approach can result in different thresholds for each variable, a systematic customization necessitated by differing magnitude ranges among variables.

Other areal interpolation methods using ancillary data

There exist a variety of areal interpolation methods that use auxiliary data other than remote sensing data (Hawley and Moellering 2005). In what follows we review some of these methods.

Flowerdew (1988) developed an areal interpolation method that operates as a Poisson process on a set of binary variables which determine the presence or absence of each variable. Flowerdew and Green (1991) expanded on this method by incorporating the Expectation-Maximization (EM) algorithm from statistics, which allows the areal interpolation problem to be treated as a missing data problem.

Goodchild et al. (1993) developed an alternative method that uses a third set of areal units known as control zones as ancillary data. The control zones should each have constant densities, and can be created by the analyst based on local knowledge of the area.

Another interesting set of areal interpolation methods are termed smart interpolation methods (Deichmann 1996). These methods make use of various auxiliary data sets such as water features, transportation structures, urban areas and parks. Turner and Openshaw (2001) expanded on smart interpolation by incorporating neural networks to estimate model parameters; they define neural networks as "universal approximators capable of learning how to represent virtually any function no matter how complex or discontinuous."

2. References

Akima, H. (1978). A method of bivariate interpolation and smooth surface fitting for irregularly distributed data points. ACM Transactions on Mathema&al Software 4 (2), 148-159.

Bassett, K. (1972). Numerical methods for map analysis. Progress in Geography, 4, 217-254.

Bates, D. M., Lindstrom, M. J., Wahba, G. and Yandell, B. S. (1986) . GCVPACK Routines for generalized cross validation .

Technical Report No. 775. Madison, Wis.: Department of Statistics ,

University of Wisconsin.

Boufassa, A. and Armstrong, M. (1989). Comparison between different kriging estimators,

Mathematical Geology , 21 (3), 331-345.

Briggs, I. C. (1974). Machine contouring using minimum curvature. Geophysics, 39, 39-48.

Burrough, P. A. (1986). Principles of geographical information systems for land resources assessment. Oxford: Oxford University Press.

Burrough, P. A. (1992). Principles of geographic information systems for land resources assessment. Oxford: Clarendon Press.

Burrough, P.A. and McDonnell, R.A. (1998). Principles of Geographical Information Systems .

Oxford University Press, Oxford.

Burt, J. E. (1991). DSSPGV, Routines for spherical thin-plate spline interpolation and approximation. Madison, Wis. Department of Geography , University of Wisconsin.

Casetti, E. (1966). Analysis of spatial association by trigonometric polynomials. Canadian

Geographer , 10, 199-204.

Charpentier, M. (1997). The dasymetric method with comments on its use in GIS. In: 1997 UCGIS

Annual Assembly and Summer Retreat. The University Consortium for Geographic Information

Science. June 15-20, Bar Harbor, Maine.

Clark, I. and Harper, W.V. (2001). Practical Geostatistics 2000 . Geostokos (Ecosse) Limited.

Cockings, S., Fisher, P. and Langford, M. (1997). Parameterization and visualization of the errors in areal interpolation. Geographical Analysis , 29, 4, 231-232.

Crain, I. K., and Bhattacharyya, B. K. (1967). Treatment of non-equispaced two-dimensional data with a digital computer. Geoexploration 5, 173-194.

Cressie, N. (1990) The Origins of Kriging, Mathematical Geology , 22 (3), 239-252.

Curry, L. (1966). A note on spatial association. Professional Geographer, 18, 97-99.

De Boor, C. (1978). A practical guide to splines. New York: Springer-Verlag.

Declercq, F. A. N. (1996). Interpolation Methods for Scattered Sample Data: Accuracy, Spatial

Patterns, Processing Time. Cartography and Geographic Information Systems , 23, 3, 128-144.

De Iacoa, S., Myers, D. E. and Posa, D. (2002). Space-time Variogram and a Functional Form for

Total Air Pollution Measurements, Computational Statistics and Data Analysis , 41, 311-328.

Deichmann, U. (1996). A review of spatial population database design and modelling. Technical

Report 96-3 for National Center for Geographic Information and Analysis , 1-62.

Delfiner, P., and Delhomme, J. P. (1975). Optimum interpolation by Kriging. In Display and analysis of spatial data, ed. J. C. Davis and M. J. McCullagh, pp. 97-114. Toronto: Wiley.

Diggle, P.J., Tawn, J.A. and Moyeed, R.A. (1998). Model-based geostatistics. Applied Statistics , 47

(3), 299-350.

Diggle, P.J. and Ribeiro Jr., P.J. (2007). Model-based Geostatistics . Springer, New York.

Dowd, P.A. (1985). A Review of Geostatistical Techniques for Contouring, Earnshaw, R. A. (eds),

Fundamental Algorithms for Computer Graphics , Springer-Verlag, Berlin, NATO ASI Series, Vol.

F17.

Dubrule, O. (1983). Two methods with different objectives: Splines and kriging. Mathematical

Geology, 15, 245-577.

Dubrule, O. (1984). Comparing splines and kriging. Computers and Geosciences, 101, 327-38.

Eicher, C. and Brewer, C. A. (2001). Dasymetric mapping and areal interpolation: implementation and evaluation. Cartography and Geographic Information Science, 28, 2, 125-138.

Fisher, P. and Langford, M. (1995). Modeling the errors in areal interpolation between the zonal systems by Monte Carlo simulation. Environment and Planning A, 27, 211-224.

Flowerdew, R. (1988). Statistical methods for areal interpolation: predicting count data from a binary variable. Newcastle upon Tyne, Northern Regional Research Laboratory, Research Report No. 15.

Flowerdew, R. and Green, M. (1991). Data integration: statistical methods for transferring data between zonal systems. In: Masser, I. and Blakemore, M. B. (eds), Handling Geographic Information: Methodology and Potential Applications. Longman Scientific and Technical, 38-53.

Franke, R. (1982). Scattered data interpolation: tests of some methods. Mathematics of Computation, 38, 181-200.

Gerth, J. (1993). Towards improved spatial analysis with areal units: the use of GIS to facilitate the creation of dasymetric maps. Masters Paper, Ohio State University.

Golub, G. H., Heath, M. and Wahba, G. (1979). Generalized cross validation as a method for choosing a good ridge parameter. Technometrics, 21, 215-223.

Goodchild, M., Anselin, L. and Deichmann, U. (1993). A framework for the areal interpolation of socioeconomic data. Environment and Planning A. London, U.K.: Taylor and Francis, 121-145.

Goovaerts, P. (1997). Geostatistics for Natural Resources Evaluation. Oxford University Press, New York.

Goovaerts, P. (1999). Geostatistics in soil science: state-of-the-art and perspectives. Geoderma, 89, 1-45.

Hawley, K. and Moellering, H. (2005). A comparative analysis of areal interpolation methods. Cartography and Geographic Information Science, 32, 4, 411-423.

Hayes, J. G. and Halliday, J. (1974). The least-squares fitting of cubic spline surfaces to general data sets. Journal of the Institute of Mathematics and its Applications, 14, 89-103.

Hemyari, P. and Nofziger, D. L. (1987). Analytical solution for punctual kriging in one dimension. Soil Science Society of America Journal, 51 (1), 268-269.

Isaaks, E. H. and Srivastava, R. M. (1989). Applied Geostatistics. Oxford University Press, New York.

Krcho, J. (1973). Morphometric analysis of relief on the basis of geometric aspect of field theory. Acta Geographica Universitae Comenianae, Geographica Physica, 1, Bratislava, SPN.

Krige, D. G. (1951). A statistical approach to some basic mine valuation problems on the Witwatersrand. Journal of the Chemical, Metallurgical and Mining Society of South Africa, 52, 119-139.

Krige, D. G. (1978). Lognormal-de Wijsian Geostatistics for Ore Evaluation. South African Institute of Mining and Metallurgy, Johannesburg.

Krumbein, W. C. (1959). Trend surface analysis of contour-type maps with irregular control-point spacing. Journal of Geophysical Research, 64, 823-834.

Kyriakidis, P. (2004). A geostatistical framework for area-to-point spatial interpolation. Geographical Analysis, 36, 259-289.

Lam, N. S. (1983). Spatial interpolation methods: a review. The American Cartographer, 10 (2), 129-150.

Lancaster, P. and Salkauskas, K. (1975). An introduction to curve and surface fitting. Unpublished manuscript, Division of Applied Mathematics, University of Calgary.

Langford, M., Maguire, D. J. and Unwin, D. J. (1991). The areal interpolation problem: estimating population using remote sensing in a GIS framework. In: Masser, I. and Blakemore, M. (eds), Handling Geographic Information: Methodology and Potential Applications. Longman Scientific and Technical, 55-77.

Laslett, G. M., McBratney, A. B., Pahl, P. J. and Hutchinson, M. F. (1987). Comparison of several spatial prediction methods for soil pH. Journal of Soil Science, 38, 325-341.

Matheron, G. (1963). Principles of geostatistics. Economic Geology, 58 (8), 1246-1266.

Matheron, G. (1967). Kriging, or polynomial interpolation procedures? Canadian Mining and Metallurgy Bulletin, 60, 665, 1041-1045.

Matheron, G. (1969). Le krigeage universel. Cahiers du Centre de Morphologie Mathematique, Ecole des Mines de Paris, Fontainebleau.

McCleary, G. F. Jr. (1969). The dasymetric method in thematic cartography. Ph.D. Dissertation, University of Wisconsin at Madison.

McCullagh, M. J. (1988). Terrain and surface modelling systems: theory and practice. Photogrammetric Record, 12, 747-779.

McLain, D. H. (1974). Drawing contours from arbitrary data points. The Computer Journal, 17, 318-324.

McLain, D. H. (1980). Interpolation methods for erroneous data. In Mathematical Methods in Computer Graphics and Design, ed. K. W. Brodlie, pp. 87-104. New York: Academic Press.

Mitášová, H. and Mitáš, L. (1993). Interpolation by regularized spline with tension: I. Theory and implementation. Mathematical Geology, 25, 641-655.

Morrison, J. (1971). Method-produced error in isarithmic mapping. Technical Monograph CA-5, first edition. Washington, D.C.: American Congress on Surveying and Mapping.

Moyeed, R. A. and Papritz, A. (2002). An empirical comparison of kriging methods for nonlinear spatial point prediction. Mathematical Geology, 34 (4), 365-386.

Nielson, G. (1983). A method for interpolating scattered data based upon a minimum norm network. Mathematics of Computation, 40, 253-271.

Nielson, G. M. (1993). Scattered data modeling. IEEE Computer Graphics and Applications, 13, 60-70.

Okabe, A. and Sadahiro, Y. (1997). Variation in count data transferred from a set of irregular zones to a set of regular zones through the point-in-polygon method. Geographical Information Science, 11, 1, 93-106.

Ord, J. K. (1983). Kriging. In Kotz, S. and Johnson, N. (eds), Encyclopedia of Statistical Sciences, John Wiley and Sons Inc., New York, Vol. 4, 411-413.

Pebesma, E. J. (2004). Multivariable geostatistics in S: the gstat package. Computers and Geosciences, 30, 683-691.

Porter, P. (1958). Putting the isopleth in its place. Proceedings, Minnesota Academy of Science, 25, 26, 372-384.

Ralston, A. (1965). A First Course in Numerical Analysis. New York: McGraw Hill.

Reinsch, C. H. (1967). Smoothing by spline functions. Numerische Mathematik, 10, 177-183.

Renka, R. J. and Cline, A. K. (1984). A triangle-based C1 interpolation method. Rocky Mountain Journal of Mathematics, 14, 223-237.

Sabin, M. A. (1985). Contouring – the state of the art. In Earnshaw, R. A. (ed.), Fundamental Algorithms for Computer Graphics. Springer-Verlag, NATO ASI Series, Vol. F17.

Sampson, R. J. (1978). Surface II graphics system. Lawrence: Kansas Geological Survey.

Schumaker, L. L. (1976). Fitting surfaces to scattered data. In Approximation Theory II, ed. G. Lorentz and others, pp. 203-267. New York: Academic Press.

Swain, C. (1976). A Fortran IV program for interpolating irregularly spaced data using difference equations for minimum curvature. Computers and Geosciences, 1, 231-240.

Tobler, W. (1979). Smooth pycnophylactic interpolation for geographical regions. Journal of the American Statistical Association, 74, 367, 519-530.

Trangmar, B. B., Yost, R. S. and Uehara, G. (1985). Application of geostatistics to spatial studies of soil properties. Advances in Agronomy, 38, 45-94.

Turner, A. and Openshaw, S. (2001). Disaggregative spatial interpolation. Paper presented at GISRUK 2001, Glamorgan, Wales.

Van Kuilenburg, J., De Gruijter, J. J., Marsman, B. A. and Bouma, J. (1982). Accuracy of spatial interpolation between point data on soil moisture supply capacity, compared with estimates from mapping units. Geoderma, 27, 311-325.

Voltz, M. and Webster, R. (1990). A comparison of kriging, cubic splines and classification for predicting soil properties from sample information. Journal of Soil Science, 41, 473-490.

Wackernagel, H. (2003). Multivariate Geostatistics: An Introduction with Applications. Springer, Berlin.

Wahba, G. and Wendelberger, J. (1980). Some new mathematical methods for variational objective analysis using splines and cross validation. Monthly Weather Review, 108, 1122-1143.

Wahba, G. (1981). Spline interpolation and smoothing on the sphere. SIAM Journal on Scientific and Statistical Computing, 2, 5-16.

Wahba, G. (1990). Spline Models for Observational Data. Philadelphia: SIAM.

Watson, D. F. (1994). Contouring: A Guide to the Analysis and Display of Spatial Data. Oxford: Elsevier.

Webster, R. (1979). Quantitative and Numerical Methods in Soil Classification and Survey. Oxford: Oxford University Press.

Webster, R. and Oliver, M. (2001). Geostatistics for Environmental Scientists. John Wiley and Sons, Chichester.

Weibel, R. and Heller, M. (1991). Digital terrain modelling. In Goodchild, M. F., Maguire, D. J. and Rhind, D. W. (eds), Geographical Information Systems: Principles and Applications. Harlow, Longman/New York, John Wiley and Sons Inc., 2, 269-297.

Wright, J. (1936). A method of mapping densities of population: with Cape Cod as an example. Geographical Review, 26, 1, 103-110.

2. Data integration

2.1 Introduction

Geospatial Data Integration 1 is considered here as the process and the result of geometrically combining two or more different sources of geospatial content to facilitate visualization and statistical analysis of the data. This process of integration has become increasingly widespread because of three recent developments in the field of information and communication technologies.

First, in the last decade global positioning systems (GPS) and geographical information systems (GIS) have been widely used to collect and synthesize spatial data from a variety of sources. Second, advances in satellite imagery and remote sensing now permit scientists to access spatial data at several different resolutions. Finally, the Internet facilitates fast and easy data acquisition: the growth of geospatial data on the web and the adoption of interoperability protocols have made it possible to access a wide variety of geospatial content.

However, challenges remain. Once a user accesses this abundance of data, how can all these data be combined in a meaningful way for agricultural statistics? How can one create valuable information from the multiple sources of information?

The picture is complex. In any one study on agriculture and land use, several different types of data may be collected at differing scales and resolutions, at different spatial locations, and in different dimensions. Moreover, the integration of multi-sourced datasets is not only a matter of matching datasets geometrically, topologically and in their attributes; it also requires the social, legal, institutional and policy mechanisms, together with the technical tools, that facilitate such integration. These last issues are beyond the scope of this chapter.

In this chapter the contributions on the combination of data for mapping are briefly recalled in section 2.2. The main focus of the chapter is on the many unsolved issues associated with combining such data for statistical analysis, especially modelling and inference, which are reviewed in section 2.3.

2.2 Combination of data for mapping and area frames

The focus of many contributions on spatial data integration for mapping is on the technical solutions for integrating different sources. The main issues recalled in the literature on spatial data integration are technical disparities, including scale, resolution, compilation standards, source accuracy, registration, sensor characteristics, currency, temporality and errors. Other significant problems concern differences in datum, projections, coordinate systems, data models, spatial and temporal resolution, precision and accuracy. Typical integration problems also include naming conflicts, scale conflicts, and precision and resolution conflicts (see Geospatial Data Integration, a project of the Information Integration Research Group, University of Southern California, http://www.isi.edu/integration/projects.html).

Attention has also been drawn to exploiting the dynamic aspects of land use systems while mapping land use, using crop calendar and crop pattern information as well as mobile GIS (De Bie, 2002).

In particular, the growth of spatial data on the web has promoted the study of the dynamic integration of structured data sources, such as text, databases, non-spatial imagery or XML streams. Much less progress has been made on the integration of geospatial content. In this case integration is more complex than for structured data sources, because geospatial data obtained from various sources show significant complexities and inconsistencies related to how the data were obtained, the expectations for their use, and the level of granularity and coverage of the data (Williamson et al., 2003). In

1 Spatial data integration is often referred to as data fusion. In spatial applications, there is often a need to combine diverse data sets into a unified (fused) data set, which includes all of the data points and time steps from the input data sets. The fused data set differs from a simple combined superset in that the points in the fused data set contain attributes and metadata which might not have been included for these points in the original data sets.

the case of agricultural-environmental data these difficulties have been faced and resolved to some extent by Mohammadi et al. (2008) and Rajabifard et al. (2002).

In the case of semantic integration, which may presuppose a common attribute data model, the previous difficulties can be addressed and solved prior to the integration itself.

Most of the previous issues are relevant and have to be solved in the construction and updating of area sampling frames in agricultural surveys. In fact, an area frame survey is defined by a cartographic representation of the territory and a rule that defines how it is divided into units (Gallego and Delince, 2010). An area frame could be a list, map, aerial photograph, satellite image, or any other collection of land units. The units of an area frame can be points, transects (lines of a certain length) or pieces of territory, often named segments.

Area segments in area frames provide better information for geometric co-registration with satellite images; they also give better information on plot structure and size, which can be useful for agri-environmental indicators, such as landscape indexes (Gallego and Delince, 2010). Segments are also better adapted to combination with satellite images through a regression estimator (Carfagna, 2007). Future improvements of the European Land Use and Cover Area-Frame Statistical Survey (LUCAS) 2 should come from stratification updating. More recent satellite images or ortho-photos should provide better stratification efficiency in LUCAS 2012 3. Some authors encourage comparing the approach of photo-interpretation by point, as conducted for LUCAS 2006, with the cheaper approach of simple overlay on standard land cover maps, such as CORINE Land Cover 2006 (EEA, 2007).

There are also techniques that allow on-demand integration, which can be attractive for the dissemination of spatial data by statistical agencies and national statistical institutes. On-demand integration means that spatial content can be combined from disparate sources as necessary, without complex requirements for manual conflation, pre-compiling or re-processing of the existing datasets. On-demand geospatial integration assumes that the content creators have no a priori knowledge of their content's eventual use. Such solutions provide the content integrator with greater flexibility and control over the data application, leading to user-pull models and products such as on-demand mapping and automosaicking.

The resulting reliance on metadata reflects the complex nature of the problem; even basic steps, such as understanding the geo-coordinates of a served map, may prevent the integration of two or more incompatible sources of spatial data (Vanloenen, 2003).

2.3 Statistical analysis of integrated data

In this section of the review we focus on the statistical issues and approaches that emerge when integrating disparate spatial data. In the field of agricultural statistics these have particular relevance, drawing on work from geography, ecology, geology and statistics. Emphasis is on the state of the art of possible statistical solutions to this complex and important problem.

Indeed, the answer to the question of how to create valuable information from multiple sources of spatial information opens many statistical issues. These are the issues encountered in the so-called change of support problems (COSPs). Spatial support is much more than the area or volume associated with the data; it also includes the shape and orientation of the spatial units being considered. The central issue in COSPs is the determination of the relationships between data at various scales or levels of aggregation (Gotway and Young, 2002).

2.3.1 Change of support problems (MAUP and ecological fallacy problem)

2 In this case the sampling frame is a representation of the EU in a Lambert azimuthal equal area projection. LUCAS is a point frame survey; LUCAS defines a point with a size of 3 m.

3 GISCO (Geographical Information System at the COmmission) is responsible for the management and dissemination of the geographical reference database of the European Commission. It produces maps and spatial analyses, promotes the geo-referencing of statistics and provides user support for Commission users of GIS. GISCO is one of the leaders of the INSPIRE initiative, supporting the implementation of the directive for the establishment of a European Spatial Data Infrastructure (see Inspire Conference 2013, Florence, 23-27 June 2013).

Spatial data are observations at specific locations or within specific regions – they contain information about locations and relative positions as well as measures of attributes. Three main types of spatial data can be identified: geostatistical, lattice and point pattern data.

Geostatistical data consist of measurements taken at fixed locations (e.g. rainfall measured at weather stations). Lattice or area data contain observations for regions, whether defined by a regular grid or by irregular zones (e.g. agri-environmental indicators per area). Point pattern data relate to situations where the locations themselves are of interest (e.g. farm addresses). Of these, area data are the most common type of spatial data published by statistical agencies and national statistical institutes.

However, except for individual geo-referenced records, there is no unique unit for spatial analysis. Areas over any continuous study region can be defined in a very large number of ways; in other words, the unit of spatial analysis is modifiable. This is potentially problematic, since the results of quantitative analysis applied to such data depend upon the specific geography employed.

As long as the results of quantitative analysis across space are used simply to describe the relationship among variables, the dependence of these results on the specific boundaries used for aggregation is simply a fact that needs to be taken into account when interpreting them. The problem appears when differences in the parameters from quantitative analyses used to make inferences lead to different, at times contradictory, findings. These findings can relate either to the refutation of certain theoretical models or to the identification of specific policy implications.

Fotheringham, Brunsdon and Charlton (2000) identify the modifiable areal unit problem (MAUP) as a key challenge in spatial data analysis. Its consequences are present in univariate, bivariate and multivariate analyses and could potentially affect the results obtained by all users of published area level agricultural data. The implications of the MAUP potentially affect any area level data, whether direct measures or complex model-based estimates (Dark and Bram, 2007; Arbia, 2013). Here are a few examples of situations where the MAUP is expected to make a difference.

1. The special case of the ecological fallacy is always present when Census area data are used to formulate and evaluate policies that address problems at farm/individual level, such as deprivation. Also, it is recognised that a potential source of error in the analysis of Census data is 'the arrangement of continuous space into defined regions for purposes of data reporting' (Amrhein, 1995).

2. The MAUP has an impact on indices derived from areal data, such as many of the agro-environmental indicators, which can change significantly as a result of using different geographical levels of analysis to derive composite measures.

3. The choice of boundaries for reporting ratios is not without consequences: when the areas are too small, the values estimated are unstable, while when the areas are too large, the values reported may be over-smoothed, i.e. meaningful variation may be lost (Nakaya, 2000).

Gotway and Young (2002) identify twelve other concepts interlinked with the MAUP, generalizing the problem as a change of support problem. Among these is the ecological fallacy, which arises when individual level characteristics and relationships are studied using area level data. The fallacy refers to drawing conclusions about individuals from area-level relationships that are significant only because of aggregation and not because of a real link. Robinson (1950) shows that correlations between two variables can be high at area level but may be very low at individual level. His conclusion is that area level correlations cannot be used as substitutes for individual correlations. Gotway and Young (2002) view the ecological fallacy as a special case of the MAUP, whereas King (1997) argues the reverse.

Whilst some analysts dismiss the MAUP as an insoluble problem, many assume its absence by taking the areas considered in the analysis as fixed or given. Most of those who recognise the validity of the problem approach it empirically and propose a variety of solutions: using individual level data; optimal zoning; modelling with grouping variables; using local rather than global analysis tools (Fotheringham, Brunsdon and Charlton, 2000); applying correction techniques or focusing on rates of change rather than levels (Fotheringham, 1989); modelling the expected value of the study variable at area level with covariates at the same area level; modelling quantiles of the study variable at individual level and defining the local level of analysis later; and using block kriging and co-kriging.

The only way to analyse spatial data without the MAUP is to use individual level data. While this is widely recognised (e.g. Fotheringham, Brunsdon and Charlton, 2000), the solution is of little practical relevance for most users of official statistics, due to confidentiality constraints.

Presenting a set of results together with their sensitivity to the MAUP is a widely recommended but little followed practice. Reporting the sensitivity of analytical results to scale and zoning effects has been done by several authors, who used results for a large number of arbitrary regions produced by Thiessen polygons or grids. Moellering and Tobler (1972) propose a technique that identifies and selects the appropriate set of boundaries on the basis of the principle that the level with the most (statistically) significant variance is the one where spatial processes are 'in action'. This solution, however, only deals with the scale effect of the MAUP.

2.3.2 Geostatistical tools and models to address the COSPs

The COSPs are far from being solved. Nevertheless, several improvements have been proposed to facilitate the analysis of spatial data that are difficult to integrate.

2.3.2.1 Optimal zoning

A solution to the MAUP is to analyse aggregate data specifying an optimality criterion for the aggregation. The boundaries of the zones of aggregation are obtained as the result of an optimization process. In other words, the scale and zoning aspects of the MAUP are considered as part of a problem of optimal design: some objective function is defined in relation to model performance and optimized subject to constraints (Openshaw, 1977).

This solution is fascinating but impractical, because it is conditioned on the chosen constraints, which yield analysis-specific boundaries. In addition, the solution implies that either the analyst has access to unit-level data, which can be aggregated to any boundaries desired, or that the data provider makes aggregates available at any conceivable set of boundaries. This is hardly the case for most applied studies in the agro-environmental field, where the boundaries are usually pre-specified as local administrative areas. Furthermore, regardless of the criterion, different variables may be optimally zoned to different sets of boundaries, complicating their joint modelling. It is unlikely that optimal zoning will lead to identical boundaries for different variables.

2.3.2.2 Modelling with grouping variables, area level models

A way of circumventing the MAUP is modelling with grouping variables, or directly modelling the area means, taking the areas as given.

The grouping variables are measured at individual level and are used to adjust the area level variance-covariance matrix and bring it closer to the unknown individual level variance-covariance matrix. This happens under the assumption of area homogeneity (Steel and Holt, 1996; Holt et al., 1996). There are two limitations to the approach: first, the assumption of area homogeneity is not easy to defend; second, the outcome is not really free of the MAUP, as the relationship between individuals and areas can change depending on the area definition used (i.e. zoning effects).

The most popular class of area level models (models for area means) is linear mixed models that include independent random area effects to account for between-area variation beyond that explained by auxiliary variables (Jiang and Lahiri, 2006). This model is also widely used in small area estimation, when the problem is the inverse one and the objective is to predict area means disaggregating existing areas (see Rao (2003, Chapters 6-7) for a detailed description). Petrucci and Salvati (2006) and Pratesi and Salvati (2008, p. 114) noted that area boundaries are generally defined according to administrative criteria, without considering the eventual spatial interaction of the variable of interest, and proposed abandoning independence by assuming that the random effects of neighbouring areas (defined, for example, by a contiguity criterion) are correlated, with the correlation decaying to zero as distance increases.
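In symbols, a minimal sketch of such an area level mixed model is the following (the notation is ours, not taken from the cited papers); a simultaneous autoregressive (SAR) specification is one common way to encode spatially correlated random effects:

```latex
\theta_i = \mathbf{x}_i^{\top} \boldsymbol{\beta} + u_i + e_i, \qquad i = 1, \dots, m,
\qquad
\mathbf{u} = \rho \mathbf{W} \mathbf{u} + \boldsymbol{\varepsilon},
\quad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma_u^2 \mathbf{I}),
```

where \theta_i is the mean of the study variable in area i, \mathbf{W} is a (row-standardized) contiguity matrix and \rho measures the strength of the spatial interaction among neighbouring areas; \rho = 0 recovers the usual model with independent random effects.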

2.3.2.3 Geographically weighted regression and M-quantile regression

Typically, random effects models assume independence of the random area effects. This independence assumption is also implicit in M-quantile small area models. An alternative approach to incorporating spatial information in the regression model is to assume that the regression coefficients vary spatially across the geography of interest. Geographically Weighted Regression (GWR) (see Brunsdon et al. (1996)) extends the traditional regression model by allowing local rather than global parameters to be estimated. There are also spatial extensions of linear M-quantile regression based on GWR. The advantage of M-quantile models in the MAUP context is that they do not depend on how areas are specified. The M-quantile GWR model is described in Salvati et al. (2008), where the authors propose an extension of the GWR model, i.e. a locally robust model for the M-quantiles of the conditional distribution of the outcome variable given the covariates.

Using local analysis tools, such as geographically weighted regression (Fotheringham, Brunsdon and Charlton, 2002), and especially M-quantile geographically weighted regression, may go some way towards limiting the global effects of the MAUP (see also Salvati et al., 2008). Also, semiparametric (via penalized splines) M-quantile regression, as introduced in Pratesi et al. (2006), can model spatial nonlinearities without depending on how the areas are specified, thus circumventing the COSPs. Results are promising, but there are still no extensions to the multivariate case or to dichotomous study variables.

2.3.2.4 Block kriging and Co-kriging

Block kriging is a kriging method in which the expected average value over an area (block) around an unsampled point is predicted, rather than the exact value at the unsampled point itself. Block kriging is commonly used to provide better variance estimates and smoother interpolated results, and in this sense provides a solution to the COSP. Many of the statistical solutions to the COSP can be traced back to Krige's "regression effect" (Krige, 1951). These ideas were developed more formally into the beginnings of the field of geostatistics by Matheron (1963). Point kriging is one solution to the point-to-point COSP. The basic geostatistical concepts of support and change of support are presented by Clark (1979) and Armstrong (1999).
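In formulas (standard geostatistical notation, ours rather than taken from a specific source), if Y(s) denotes the process at point s, block kriging predicts the block mean as a weighted sum of the point observations:

```latex
Y(B) = \frac{1}{|B|} \int_{B} Y(\mathbf{s}) \, d\mathbf{s},
\qquad
\hat{Y}(B) = \sum_{i=1}^{n} \lambda_i \, Z(\mathbf{s}_i),
```

where the weights \lambda_i solve the usual kriging system, with the point covariances on the right-hand side replaced by point-to-block average covariances \bar{C}(\mathbf{s}_i, B) = |B|^{-1} \int_B C(\mathbf{s}_i, \mathbf{u}) \, d\mathbf{u}.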

Cokriging is a form of kriging in which the distribution of a second, highly correlated variable (covariate) is used along with the primary variable to provide interpolation estimates. Cokriging can improve estimates if the primary variable is difficult, impossible, or expensive to measure, and the second variable is sampled more intensely than the primary variable. In the MAUP context, bivariate or multivariate spatial prediction, or cokriging, was developed to improve the prediction of an "undersampled" spatial variable by exploiting its spatial correlation with a related spatial variable that is more easily and extensively measured; see Journel and Huijbregts (1978), Chiles and Delfiner (1999), and Cressie (1993a, 1996).

2.3.2.5 Multiscale models

Studies at several scales are often needed to achieve an understanding of many complex spatial processes, and attention has recently focused on statistical methods for such multiscale processes. One such method is based on a scale-recursive algorithm defined on a multilevel tree. Each level of the tree corresponds to a different spatial scale, with the finest scale at the lowest level of the tree. The conditional specification of spatial tree models lends itself easily to a Bayesian approach to multiscale modelling.

In a Bayesian hierarchical framework, Wikle and Berliner (2005) show how the combination of multiscale information sources can be accomplished. The approach is targeted at settings in which various spatial scales arise. These scales may be dictated by the data collection methods, the availability of prior information, and/or the goals of the analysis. The approach restricts attention to a few essential scales, avoiding the challenging problem of constructing a model that can be used at all scales.

References

Amrhein (1995).

Armstrong (1999).

Chiles and Delfiner (1999).

Clark (1979).

Cressie (1993a, 1996).

Dark, S. J. and Bram, D. (2007). The modifiable areal unit problem in physical geography. Progress in Physical Geography, 31 (5), 471-479.

De Bie, C. A. J. M. (2002). Novel approaches to use RS-products for mapping and studying agricultural land use systems. ISPRS, Commission VII, Working Group VII/2.1 on Sustainable Agriculture, Invited Speaker, Hyderabad, India, 3-6 December 2002.

Fotheringham, Brunsdon and Charlton (2000).

Gallego, J. and Delince, J. (2010). The European land use and cover area-frame statistical survey. In: Benedetti, R., Bee, M., Espa, G. and Piersimoni, F. (eds), Agricultural Survey Methods. John Wiley & Sons, Ltd.

Gotway, C. A. and Young, L. J. (2002). Combining incompatible spatial data. Journal of the American Statistical Association, June 2002.

Gotway, C. A. and Young, L. J. (2007). A geostatistical approach to linking geographically aggregated data from different sources. Journal of Computational and Graphical Statistics, 16 (1).

Holt et al. (1996).

Jiang and Lahiri (2006).

Journel and Huijbregts (1978).

Krige (1951).

Matheron (1963).

Mohammadi, H., Rajabifard, A. and Williamson, I. (2008). SDI as holistic framework. GIM International, 22 (1).

Openshaw (1977).

Petrucci and Salvati (2006).

Pratesi et al. (2006).

Pratesi and Salvati (2008).

Rajabifard, A., Feeney, M.-E. and Williamson, I. (2002). Future directions for SDI development. International Journal of Applied Earth Observation and Geoinformation, 4 (1), 11-22.

Rao (2003).

Steel and Holt (1996).

Vanloenen, B. (2003). The impact of access policies on the development of a national GDI. Geodaten- und Geodienste-Infrastrukturen - von der Forschung zur praktischen Anwendung, Munster.

Wikle, C. K. and Berliner, L. M. (2005). Combining information across spatial scales. Technometrics, 47 (1).

Williamson, I. P., Rajabifard, A. and Feeney, M.-E. F. (2003). Developing Spatial Data Infrastructures: From Concept to Reality. Taylor and Francis, London.

3. Data Fusion

Data fusion is the process of combining information from heterogeneous sources into a single composite picture of the relevant process, such that the composite picture is generally more accurate and complete than that derived from any single source alone (Hall, 2004).

Data fusion first appeared in the literature in the 1960s, as mathematical models for data manipulation. It was implemented in the US in the 1970s in the fields of robotics and defence. In 1986 the US Department of Defense established the Data Fusion Sub-Panel of the Joint Directors of Laboratories (JDL) to address some of the main issues in data fusion and chart the new field, in an effort to unify terminology and procedures. The present applications of data fusion span a wide range of areas: maintenance engineering, robotics, pattern recognition and radar tracking, mine detection and other military applications, remote sensing, traffic control, aerospace systems, law enforcement, medicine, finance, metrology, and geo-science.

Interest in deriving fused information from disparate, partially overlapping datasets exists in many different domains, and a recurring theme is that underlying processes of interest are multivariate, hidden, and continuous. Constraints imposed by technology, time, and resources often cause data collection to be incomplete, sparse, and incompatible. Various data fusion techniques appear independently in many different discipline areas in order to make optimal use of such data.

This review considers data fusion methods specifically designed for spatial data with heterogeneous support. Such data are often encountered in remote sensing.

Depending on context, "data fusion" may or may not mean the same thing as information fusion, sensor fusion, or image fusion. Information fusion (also called information integration, duplication and referential integrity) is the merging of information from disparate sources with differing conceptual, contextual and typographical representations (Torra, 2003). Typically, information integration applies to textual representations of knowledge, which are considered unstructured since they cannot easily be represented by inventories of short symbols such as strings and numbers. Machine classification of news articles based on their content is a good example (Chee-Hong et al., 2001). Another is the automatic detection of speech events in recorded videos, where there is complementary audio and visual information (Asano et al., 2004).

Sensor fusion is the combination of data from different sensors such as radar, sonar or other acoustic technologies, infra-red or thermal imaging cameras, television cameras, sonobuoys, seismic sensors, and magnetic sensors. Objectives include object recognition, object identification, change detection, and tracking. A good example is the detection and reconstruction of seismic disturbances recorded by ground-based seismic sensors; these instruments tend to produce non-stationary and noisy signals (Ling et al., 2001). Approaches to sensor fusion are diverse, including, for instance, physical models, feature-based inference, information-theoretic inference and cognitive models (Lawrence, 2007).

Image fusion is the fusion of two or more images into a single, more complete or more useful picture. In some situations, analysis requires images with spatial and spectral resolution higher than that of any single data source. This often occurs in remote sensing, where many images of the same scene exist at different resolutions. For instance, land-use images may be collected from airplanes, where coverage is narrow and sparse, but resolution is very high. They might also come from satellites, where coverage is dense, but resolution is much coarser. Optimal inference of land use should combine the two data sources so that the resultant product makes the best use of each source's strengths (Sun et al., 2003). Many non-statistical methods exist to perform image fusion, including high-pass filtering, the discrete wavelet transform, the uniform rational filter bank, and the Laplacian pyramid. These approaches are described in the last section.
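As a concrete, hedged illustration of the high-pass filtering idea just mentioned, the following numpy sketch injects the fine detail of a high-resolution image into an upsampled coarse image of the same scene. The function names, the box blur and the nearest-neighbour upsampling are our illustrative choices, not a reference implementation.

```python
import numpy as np

def box_blur(img, k=5):
    """k x k mean filter (a simple low-pass filter), reflect-padded."""
    pad = k // 2
    p = np.pad(img, pad, mode="reflect")
    out = np.zeros(img.shape)
    for i in range(k):
        for j in range(k):
            out += p[i:i + img.shape[0], j:j + img.shape[1]]
    return out / (k * k)

def hpf_fusion(pan, coarse, scale):
    """High-pass-filter fusion: upsample the coarse image to the fine grid
    and add the high-frequency detail of the high-resolution image."""
    up = np.kron(coarse, np.ones((scale, scale)))  # nearest-neighbour upsampling
    detail = pan - box_blur(pan)                   # high-pass component
    return up + detail

# toy example: an 8x8 high-resolution image and a 2x2 coarse image
rng = np.random.default_rng(0)
pan = rng.random((8, 8))
coarse = rng.random((2, 2))
fused = hpf_fusion(pan, coarse, scale=4)
print(fused.shape)  # (8, 8)
```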

While it is relatively easy to define and classify types of data fusion, the same cannot be said for unifying different fusion methodologies in a comprehensive framework. The majority of fusion techniques are custom-designed for the problems they are supposed to solve. The wide scope of data fusion applications means that an enormous array of methodologies exists, each designed for specific problems with specific sets of assumptions about the underlying structure.

In the following section, we discuss those methods that are most relevant for remote sensing data in environmental studies.

3.1 Ad-hoc methods

Typical strategies in Earth and climate sciences are simple. The usual modus operandi is to interpolate the original datasets to some common grid, after which observations can be combined using weighted linear functions. The interpolation process could be simple smoothing, moving-window averaging, inverse distance weighting, k-nearest neighbor matchup, or more complex methods. The weights in the weighted linear functions are usually related to the "confidence" associated with each observation. In moving-window averaging, where each grid point is inferred as the average of the points falling within a neighborhood of size r centered at the grid location, the weights are functions of the number of points used to compute the average.
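A minimal sketch of this modus operandi, under our own illustrative assumptions (inverse distance weighting to a common grid, followed by a confidence-weighted average of the two gridded fields, with confidences proportional to sample sizes):

```python
import numpy as np

def idw(xy_obs, z_obs, xy_grid, power=2.0, eps=1e-12):
    """Inverse distance weighted interpolation of scattered points to grid nodes."""
    d = np.linalg.norm(xy_grid[:, None, :] - xy_obs[None, :, :], axis=2)
    w = 1.0 / (d ** power + eps)
    return (w @ z_obs) / w.sum(axis=1)

# two scattered datasets observing the same field
rng = np.random.default_rng(1)
xy_a, z_a = rng.random((30, 2)), rng.random(30)
xy_b, z_b = rng.random((50, 2)), rng.random(50)

# common 10 x 10 grid over the unit square
gx, gy = np.meshgrid(np.linspace(0, 1, 10), np.linspace(0, 1, 10))
grid = np.column_stack([gx.ravel(), gy.ravel()])

za, zb = idw(xy_a, z_a, grid), idw(xy_b, z_b, grid)

# "confidence" weights, here simply proportional to each dataset's size
wa, wb = len(z_a), len(z_b)
fused = (wa * za + wb * zb) / (wa + wb)
```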

Ad-hoc methods are used in many GIS packages to perform merging and other raster calculations. Most GIS applications store their data in either vector form, a scale-invariant collection of edges and vertices, or raster form, an array where each cell corresponds to area-support (Neteler and Mitasova, 2008). Operations such as union, intersection, zonal averaging, and pixel-by-pixel computations between two rasters with different supports are implicitly fusion operations.

These methods have the advantage of being very fast and scalable. However, they are designed as ad hoc solutions to specific problems, and thus cannot be said to provide optimal inference. There is no explicit treatment of the change of support problem, and there is ambiguity regarding the support of the output. Nor is there any measure of uncertainty associated with the input or the prediction.

We note that these ad hoc methods consist of two separate processes: interpolation and combination. The interpolation process, since it is essentially decoupled from the combination process, can utilize a large number of possible methodologies. Combining data points linearly using confidence weights associated with prediction locations is conceptually straightforward, but assumes that each prediction location is independent of the others. This independence assumption is usually not true. Most interpolation methods assume continuity of the underlying process, and therefore use information from the entire dataset, or a subset of it, to predict values at output locations. This makes the predictions inherently dependent on one another. Combining these predictions without accounting for this dependency can produce biased estimates.

Separating interpolation from combination makes for suboptimal inference. In general, we would like to combine the steps, ensuring that mean-squared prediction errors are minimized. In the following section we discuss data fusion from a statistical perspective, which treats the problem in the context of a formal inferential framework.

3.2 Statistical data fusion

Statistical data fusion is the process of combining statistically heterogeneous samples from marginal distributions in order to make inference about the unobserved joint distributions, or functions of them (Braverman, 2008). Data fusion, as a discipline in statistics, has followed two lines of progress. One arises out of business and marketing applications, where there is a need to merge data from different surveys with complementary information and overlapping common attributes. The second setting is spatial, where incongruent sampling and different spatial supports need to be reconciled.

Problems in marketing usually emerge when there are two or more surveys that need to be merged. These surveys typically have a few variables in common, such as age, gender, ethnicity, and education. Few individuals, if any, participate in both surveys. "Statistical matching" refers to the practice of combining the survey data so that the aggregated dataset can be considered a sample from the joint distribution of interest. "Statistical linkage" is a related practice that assumes the same individuals are in both datasets, and attempts to map identical units to each other across datasets (Braverman, 2008).

Statistical matching is closely related to the missing data problem, formalized by Little and Rubin (1987). Surveys may be concatenated to form a single dataset with complete data for the common variables and incomplete data for variables that do not exist in both. The incomplete information may be modelled with a random variable for missingness. Specification of the joint and conditional distributions between variables allows for the calculation of maximum likelihood estimates of the underlying parameters. If prior distributions are imposed on the parameters, then the process is Bayesian. Algorithms for producing fused data include the Expectation-Maximization (EM) algorithm and Markov Chain Monte Carlo (MCMC), which is popular for statistical matching problems in which the Bayesian posterior distribution cannot be derived analytically (Braverman, 2008).
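To make the EM idea concrete, here is a minimal numpy sketch under strong simplifying assumptions of ours: a concatenated dataset where (X, Y) is bivariate normal and Y is missing for the units contributed by the second source. This illustrates the mechanics only; it is not the method of any cited paper.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
true_mu = np.array([1.0, -1.0])
true_cov = np.array([[1.0, 0.6], [0.6, 2.0]])
data = rng.multivariate_normal(true_mu, true_cov, size=n)
miss = rng.random(n) < 0.4                   # Y missing for ~40% of units
x = data[:, 0]
y_obs = np.where(miss, np.nan, data[:, 1])

mu = np.array([0.0, 0.0])
cov = np.eye(2)
for _ in range(100):
    # E-step: conditional mean/variance of Y given X under current parameters
    beta = cov[0, 1] / cov[0, 0]
    ey = mu[1] + beta * (x - mu[0])
    vy = cov[1, 1] - beta * cov[0, 1]
    y_fill = np.where(miss, ey, y_obs)           # E[Y | observed data]
    yy = y_fill ** 2 + np.where(miss, vy, 0.0)   # E[Y^2 | observed data]
    # M-step: update parameters from the expected sufficient statistics
    mu = np.array([x.mean(), y_fill.mean()])
    cov = np.array([
        [np.mean(x ** 2) - mu[0] ** 2, np.mean(x * y_fill) - mu[0] * mu[1]],
        [0.0, np.mean(yy) - mu[1] ** 2],
    ])
    cov[1, 0] = cov[0, 1]

print(mu, cov)  # should approach the true mean and covariance
```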

Statistical matching assumes that observational units within any single dataset are independent of one another, an assumption that is obviously not true for spatial data. A wide body of methodology has been developed to account for covariance in spatial datasets; together these methods make up the discipline of spatial statistics.

3.2.1 Spatial statistics

Spatial statistical methods arose out of the recognition that classical regression is inadequate for use with spatial data. Standard regression assumes that observations are independent, and it is well-known that when this is not true, regression coefficients are unstable (Berk, 2004). Spatial data in general conform to Tobler's first law of geography: "Everything is related to everything else, but near things are more related than distant things" (Tobler, 1970). Spatial statistics explicitly accounts for spatial dependence, utilizing spatial covariance as a source of information to be exploited. We discuss some popular spatial statistical techniques that are sometimes used for fusion.

3.2.1.1 Geographically weighted regression

Geographically weighted regression (GWR) has its roots in the linear regression framework. Standard regression assumes that observations are independent, which is clearly not true for spatial data, where the defining characteristic is that nearby observations are more similar than those far apart. Another assumption in regression is that the parameters of the model remain constant over the domain; in other words, there is no local change in the parameter values (Fotheringham et al., 2002). To accommodate such spatial structure, geographically weighted regression assumes a linear model in which the regression parameters change as functions of the spatial coordinates. The parameters of the GWR model depend on a weight function, W(·), which is chosen so that points near the prediction locations have more influence than points far away. Some common weight functions are the bisquare and the Gaussian functions.
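A minimal sketch of the local weighted least squares fit behind GWR, using a Gaussian weight function; the function names, bandwidth and toy data are our illustrative choices, not taken from any of the cited implementations:

```python
import numpy as np

def gwr_fit(coords, X, y, s0, bandwidth=0.2):
    """Estimate local regression coefficients at location s0 via weighted
    least squares with a Gaussian distance kernel (one GWR evaluation)."""
    d = np.linalg.norm(coords - s0, axis=1)
    w = np.exp(-0.5 * (d / bandwidth) ** 2)   # Gaussian weight function
    Xw = X * w[:, None]
    # solve the local normal equations (X' W X) beta = X' W y
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)

# toy data: intercept + one covariate, slope drifting east to west
rng = np.random.default_rng(3)
coords = rng.random((200, 2))
X = np.column_stack([np.ones(200), rng.random(200)])
y = 2.0 + coords[:, 0] * X[:, 1] + rng.normal(0, 0.1, 200)

beta_west = gwr_fit(coords, X, y, s0=np.array([0.1, 0.5]))
beta_east = gwr_fit(coords, X, y, s0=np.array([0.9, 0.5]))
print(beta_west, beta_east)  # the local slope estimates differ across space
```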

GWR is a popular spatial interpolation method, designed for interpolation of a single dataset. There is no provision for incorporating multiple data sources, though such an extension might include additional equations for additional datasets in the model; the parameters would have to be identical across datasets. The method also assumes that data are at point-level support. Little work has been done to address change of support in GWR, though studies that apply GWR to modifiable areal unit problems, a class of change of support problem (COSP) where continuous spatial processes are aggregated into districts, found extreme variation in GWR regression parameters (Fotheringham and Wong, 1991).

To use GWR, it is necessary to estimate the parameters at a set of locations, typically the locations associated with the data themselves. The computational order of this process is usually O(N^3), where N is the number of data points. Therefore, GWR does not scale well with increases in data size (Grose et al., 2008). Modifications for large datasets include choosing a fixed number p of locations, p << N, at which the model parameters are evaluated. Another possible approach is to separate GWR into several non-interacting processes, which can be solved in parallel using grid computing methods (Grose et al., 2008).

3.2.1.2 Multiscale spatial tree models

Basseville et al. (1992) and Chou, Willsky and Nikoukhah (1994) developed a multiscale, recursive algorithm for spatial interpolation based on a nested tree structure. The model assumes that there are several levels in the tree, each corresponding to a different spatial scale. We assume that there is a hidden state process, X(s), from which noisy observations, Z(s), are generated. The data generation process is assumed to follow a linear model. The relationship across different spatial scales is assumed to be a function of the parent-child scale variation and a white noise process. Chou et al. (1994) generalized the Kalman filter to produce optimal estimates of the state vector for multiscale spatial tree models. The algorithm is fast, with an order of computation generally proportional to the number of leaves, making the methodology a good candidate for large datasets (Johannesson and Cressie, 2004).
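In symbols (our generic notation), such a tree model relates the hidden state at a node s to the state of its parent pa(s), and links the data to the states linearly:

```latex
X(s) = A(s) \, X(\mathrm{pa}(s)) + W(s),
\qquad
Z(s) = C(s) \, X(s) + V(s),
```

where W(s) and V(s) are independent zero-mean noise terms; a scale-recursive, Kalman-filter-type sweep up and then down the tree yields the optimal state estimates.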

Multiscale tree models work well with large datasets, and are flexible enough for a wide range of applications. Disadvantages include the fact that although the algorithm allows for change of scale, it does not explicitly account for the change in support that occurs with changes in resolution (Gotway and Young, 2002). It is unclear how the model would be affected if the spatial support on the same scale were irregular (i.e. satellite footprints having systematic shape distortion as a function of observing angle). The model does not explicitly account for the case where the observational units overlap, which is a serious concern with satellite data.

3.2.1.3 Bayesian hierarchical models for multiscale processes

A natural extension of multiscale tree models is Bayesian hierarchical modelling (BHM). When substantial prior information about the physics of the underlying field exists, Bayesian hierarchical modelling is a principled and efficient way to combine prior physical knowledge with the flexibility of spatial modelling.

When datasets exist at different supports, this model can be extended for data fusion (Wikle et al., 2001). Like multiscale tree modelling, this approach does not account for the change of support that results from a change in resolution. However, it is possible to completely resolve the COSPs by relating the processes to a continuous point-support process (Gotway and Young, 2002). With certain simplifying assumptions, it is possible to completely specify the point-point, block-point, and block-block prediction procedures with Bayesian hierarchical modelling.

BHMs allow for the incorporation of physical models within a statistical framework. However, since they involve computation of posterior distributions, they rely heavily on Gaussian assumptions so that computation of the posteriors is tractable. There are concerns about the choice of priors for many of the parameters, and convergence could be an issue for problems where Monte Carlo methods such as the Gibbs sampler are required. For large datasets, moreover, practical constraints require that the models be simple, and this may be a drawback for the application of MCMC techniques in remote sensing.

3.2.1.4 Geospatial and spatial random effect models

Geostatistics is a branch of statistics that deals specifically with geographic relationships. One major class of geostatistical methodology is kriging. Kriging has wide appeal because it is flexible enough for a large variety of applications, while its rigorous treatment of spatial correlation allows for the calculation of mean-squared prediction errors.

Kriging models the spatial correlation between data points with a covariance function C(s_1, s_2) = Cov(Y(s_1), Y(s_2)), where Y(s_i) denotes the value of the process at location s_i.

Point kriging is one solution to the point-point change of support problem, but the framework can readily accommodate more general COSPs (Cressie, 1993). Here we discuss the case where we need to infer point-level processes from areal-level data. When our data are at areal level, the covariance function cannot be estimated directly from the data. Cressie (1993) suggested assuming a parametric form for the covariance function, after which the theoretical covariance function can be equated to the empirical covariance to estimate the parameters. With an estimate of the covariance function at point support, we can predict at any aggregated scale using block kriging (Gotway and Young, 2002).
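The following numpy sketch shows ordinary point kriging under an assumed exponential covariance model; the covariance parameters and the toy data are illustrative only, and in practice the covariance model would be fitted to the data as described above.

```python
import numpy as np

def exp_cov(h, sill=1.0, range_=0.3):
    """Exponential covariance model C(h) = sill * exp(-h / range)."""
    return sill * np.exp(-h / range_)

def ordinary_kriging(xy, z, s0):
    """Ordinary kriging prediction at point s0, with a Lagrange multiplier
    enforcing the unbiasedness constraint sum(lambda) = 1."""
    n = len(z)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    c0 = exp_cov(np.linalg.norm(xy - s0, axis=1))
    # augmented kriging system [C 1; 1' 0] [lambda; mu] = [c0; 1]
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_cov(d)
    A[n, n] = 0.0
    sol = np.linalg.solve(A, np.append(c0, 1.0))
    lam, mu = sol[:n], sol[n]
    pred = lam @ z
    var = exp_cov(0.0) - lam @ c0 - mu   # kriging (prediction) variance
    return pred, var

rng = np.random.default_rng(4)
xy = rng.random((50, 2))
z = np.sin(3 * xy[:, 0]) + rng.normal(0, 0.05, 50)
print(ordinary_kriging(xy, z, s0=np.array([0.5, 0.5])))
```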

Geostatistics accounts for change of support; in fact, it was expressly designed to address the change of support problem. It also produces estimates of mean-squared prediction errors. A further extension, called cokriging, computes optimal estimates of a quantity by borrowing information from another related process with realizations over the same domain. Fuentes and Raftery (2005) demonstrate this change of support property by interpolating dry deposition pollution levels from point-level data and areal-level model outputs. They model the relationship between the unobserved field and the data with a mix of Bayesian modelling and kriging. To fit the non-stationary empirical covariance, they represent the process locally as a stationary isotropic random field with some parameters that describe the local spatial structure. This covariance model can reflect non-stationarity in the process, but at the cost of requiring Monte Carlo integration for calculating the correlation at any pair of locations. While this approach elegantly addresses change of support and non-stationarity, it requires intensive computations, and is not suitable for massive datasets like those in remote sensing.

Kriging has several disadvantages. As in the example of Fuentes and Raftery, parameter estimation can be challenging. For small datasets, it is often necessary to assume that the covariance structure is stationary. Isotropy, the requirement that the covariance structure is a function of the distance between locations and not of direction, is also a popular simplifying assumption. These assumptions likely do not apply to remote sensing datasets, whose domain spans regions with different geophysical properties. Relationships between nearby points for aerosol optical depth near the North Pole, for instance, may exhibit different characteristics than those at locations near the equator. Likewise, covariance functions for geophysical processes usually are not isotropic; for instance, it is well known that most geophysical processes exhibit markedly different behaviour along the longitudinal direction compared to the latitudinal direction.

The most pressing disadvantage of kriging, however, is its lack of scalability (computing the kriging coefficients requires inversion of the covariance matrix). Even on a high-end consumer computer, traditional kriging is too slow to be practical when the number of data points is on the order of thousands. For massive datasets, where the dimension could be on the order of hundreds of thousands of data points or more, traditional kriging is clearly out of the question.

There are a number of different approaches to making kriging feasible for large datasets. Ad hoc methods include kriging using only the data points in a local neighborhood of the prediction location (Goovaerts, 1997; Isaaks and Srivastava, 1989). Though this method has the potential to scale to very large datasets, it inherently assumes that the covariance function tapers off after a certain distance, an assumption that may be too restrictive for some applications. Another drawback is that the prediction mean and error surfaces could exhibit discontinuities as an artifact of the choice of neighbourhood. Another such approach limits the class of covariance functions to those that produce sparse matrices (i.e. spherical covariance functions), and solves the kriging equations with sparse matrix techniques (Barry and Pace, 1997).

Recently, a number of strategies have been proposed that approximate the kriging equation itself. One is to taper the covariance functions to approach zero for large distances (Furrer et al., 2006). Nychka (2000) treats the kriging surface as a linear combination of low order polynomials. Standard matrix decomposition methods can convert this into an orthogonal basis, and computational complexity can be managed by truncating the basis functions. Billings, Newsam and Beatson (2002) replace the direct matrix inversion step with iterative approximation methods such as conjugate gradients. Convergence may be hastened by preconditioning to cluster the eigenvalues of the interpolation matrix. Another approach is to replace the data locations with a space-filling set of locations (Nychka, 1998). The set of data locations, (s_1, ..., s_N), is replaced with a representative set of knots, (κ_1, ..., κ_K), where K << N. The knots can be obtained via an efficient space-filling algorithm, and the covariances between data locations are then approximated with the covariances between knots. Though these methods scale well with the number of data points, to use them in remote sensing we would need to quantify how close the approximated kriging predictors are to the theoretical true values.

Another option is to restrict the covariance functions to a class that can be inverted exactly. Hartman and Hössjer (2008) propose an exact kriging predictor based on a Gaussian Markov random field approximation. Computations are simplified and memory requirements are reduced by using a Gaussian Markov random field on a lattice, with a sparse precision matrix, as an approximation to the Gaussian field. Non-lattice data are converted to a lattice by applying bilinear interpolation at non-lattice locations. Computational speed is linear in the size of the data.

Johannesson and Cressie (2004) constructed multilevel tree models so that simple kriging can be done iteratively and rapidly, achieving eight orders of magnitude improvement in computational speed compared to directly inverting the kriging covariance matrix. The method has an order of complexity that is linear in data size. However, the implied spatial covariance is nonstationary and "blocky" (Johannesson and Cressie, 2004). While exactly invertible covariance methods avoid inverting a large covariance matrix directly, there are concerns about whether the specified class of exact covariance functions is flexible enough and how it can be fitted in practice (Cressie and Johannesson, 2008).

Cressie and Johannesson (2008) introduce a method called Fixed Rank Kriging (FRK), an approach based on covariance classes that can be inverted exactly. Using a spatial random effects model, they develop a family of non-stationary and multi-resolutional covariance functions that are flexible enough to fit a wide range of geophysical situations, and that can be inverted exactly.

Nguyen, Cressie and Braverman (2012) propose an optimal fusion methodology that scales linearly with data size and resolves change of support and biases through a spatial statistical framework. This methodology is based on Fixed Rank Kriging (FRK), a variant of kriging that uses a special class of covariance functions for spatial interpolation of a single, massive input dataset. This simplifies the computations needed to calculate the kriging means and prediction errors. The FRK framework is extended to the case of two or more massive input datasets. The methodology does not require assumptions of stationarity or isotropy, making it appropriate for a wide range of geophysical processes. The method also accounts for change of support, allowing estimation of the point-level covariance functions from aggregated data, and prediction at point-level locations.
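To fix ideas, what follows is a minimal numerical sketch of the reduced-rank idea behind FRK, not the authors' implementation: the Gaussian basis functions, knot placement, and all data below are invented assumptions. It illustrates the key computational point, namely that with $r \ll n$ basis functions the Sherman-Morrison-Woodbury identity reduces the kriging solve to $r \times r$ inversions, so the cost grows linearly in $n$.

# Minimal reduced-rank (FRK-style) kriging sketch on simulated 1-D data;
# basis functions, knots, and variances are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, r = 2000, 25                        # n observations, r << n basis functions
s = rng.uniform(0, 100, n)             # observation locations
z = np.sin(s / 8.0) + 0.3 * rng.standard_normal(n)   # noisy field

centers = np.linspace(0, 100, r)       # knot/basis centres (assumption)
def basis(locs):
    # Gaussian radial basis functions evaluated at locs -> (len(locs), r)
    return np.exp(-0.5 * ((locs[:, None] - centers[None, :]) / 6.0) ** 2)

S = basis(s)
K = np.eye(r)                          # prior covariance of random effects (assumption)
sig2 = 0.1                             # measurement-error variance (assumption)

# Sherman-Morrison-Woodbury: only r x r matrices are ever inverted,
# which is what makes the computation scale linearly in n.
A = np.linalg.inv(np.linalg.inv(K) + S.T @ S / sig2)   # r x r
def Sigma_inv_times(v):
    # (S K S' + sig2 I)^{-1} v without forming the n x n matrix
    return v / sig2 - S @ (A @ (S.T @ v)) / sig2**2

s0 = np.linspace(0, 100, 500)          # prediction locations
v = Sigma_inv_times(z)
z_hat = basis(s0) @ (K @ (S.T @ v))    # simple-kriging predictor (zero mean)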

3.3 Image fusion: non statistical approaches

Image fusion is a branch of data fusion where data appear in the form of arrays of numbers representing brightness, color, temperature, distance, and other scene properties. Such data can be two-dimensional (still images), three-dimensional (volumetric images or video sequences in the form of spatio-temporal volumes), or of higher dimensions.

Image fusion is the process of combining information from two or more images of a scene into a single composite image that is more informative and is more suitable for visual perception or computer processing. The objective in image fusion is to reduce uncertainty and minimize redundancy in the output while maximizing relevant information particular to an application or task.

Given the same set of input images, different fused images may be created depending on the specific application and what is considered relevant information. There are several benefits in using image fusion: wider spatial and temporal coverage, decreased uncertainty, improved reliability, and increased robustness of system performance.

Often a single sensor cannot produce a complete representation of a scene. Visible images provide spectral and spatial details, and if a target has the same color and spatial characteristics as its background, it cannot be distinguished from the background. If visible images are fused with thermal images, a target that is warmer or colder than its background can be easily identified, even when its color and spatial details are similar to those of its background. Fused images can provide information that sometimes cannot be observed in the individual input images. Successful image fusion significantly reduces the amount of data to be viewed or processed without significantly reducing the amount of relevant information.

In 1997, Hall and Llinas gave a general introduction to multi-sensor data fusion. Another in-depth review paper on multiple sensor data fusion techniques was published in 1998 (Pohl and Van Genderen, 1998).

Since then, image fusion has received increasing attention. Further scientific papers on image fusion have been published with an emphasis on improving fusion quality and finding more application areas. As a case in point, Simone et al. (2002) describe three typical applications of data fusion in remote sensing: obtaining elevation maps from synthetic aperture radar (SAR) interferometers, the fusion of multi-sensor and multi-temporal images, and the fusion of multi-frequency, multi-polarization and multi-resolution SAR images. Vijayaraj et al. (2006) presented the concepts of image fusion in remote sensing applications. Quite a few survey papers have been published recently, providing overviews of the history, developments, and current state of the art of image fusion in image-based application fields (Dasarathy, 2007; Smith and Heather, 2005; Blum and Liu, 2006), but recent developments of multi-sensor data fusion in remote sensing have not been discussed in detail.

Image fusion can be performed roughly at four different stages: signal level, pixel level, feature level, and decision level:

Signal level fusion. In signal-based fusion, signals from different sensors are combined to create a new signal with a better signal-to-noise ratio than the original signals (Richardson and Marsh, 1988).

Pixel level fusion. Pixel-based fusion is performed on a pixel-by-pixel basis. It generates a fused image in which the information associated with each pixel is determined from a set of pixels in the source images, to improve the performance of image processing tasks such as segmentation.

Feature level fusion. Fusion at feature level requires the extraction of objects recognized in the various data sources, i.e. of salient features such as pixel intensities, edges or textures, depending on their environment. These similar features from the input images are fused.

Decision-level fusion. Decision-level fusion merges information at a higher level of abstraction, combining the results from multiple algorithms to yield a final fused decision. Input images are processed individually for information extraction, and the obtained information is then combined by applying decision rules to reinforce the common interpretation.

During the past two decades, several fusion techniques have been proposed. Most of these techniques are based on a compromise between the desired spatial enhancement and spectral consistency. Among the hundreds of variations of image fusion techniques, widely used methods include, but are not limited to, intensity-hue-saturation (IHS), high-pass filtering, principal component analysis (PCA), arithmetic combinations (e.g. the Brovey transform), multi-resolution analysis-based methods (e.g. pyramid algorithms, the wavelet transform), and Artificial Neural Networks (ANNs).

We will discuss each of these approaches in the following sections. The best level and methodology for a given remote sensing application depends on several factors: the complexity of the classification problem, the available data set, and the goal of the analysis.

3.3.1 Traditional fusion algorithms

The PCA transform converts inter-correlated multi-spectral (MS) bands into a new set of uncorrelated components. In this approach, the principal components of the MS image bands are first computed. The first principal component, which contains most of the information in the image, is then substituted by the panchromatic image. Finally, the inverse principal component transform is applied to obtain the new RGB (Red, Green, and Blue) bands of the multi-spectral image from the principal components. The intensity-hue-saturation (IHS) fusion converts a color MS image from the RGB space into the IHS color space, where I, H and S stand for the intensity, hue and saturation components respectively, and R, G, B denote the Red, Green, and Blue bands of the multi-spectral image. Because the intensity (I) band resembles a panchromatic (PAN) image, it is replaced by a high-resolution PAN image in the fusion. A reverse IHS transform is then performed on the PAN band together with the hue (H) and saturation (S) bands, resulting in an IHS fused image. Different arithmetic combinations have also been developed for image fusion; the Brovey transform, Synthetic Variable Ratio (SVR), and Ratio Enhancement (RE) techniques are some successful examples (Blum and Liu, 2006). The basic procedure of the Brovey transform first multiplies each MS band by the high-resolution PAN band, and then divides each product by the sum of the MS bands.
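As a concrete illustration, here is a minimal sketch of the Brovey procedure just described; it assumes three co-registered MS bands already resampled to the PAN grid, and the random arrays merely stand in for real imagery.

# Minimal Brovey-transform fusion sketch (assumes co-registered inputs)
import numpy as np

def brovey_fusion(ms, pan, eps=1e-6):
    """ms: array of shape (3, H, W) with the MS bands; pan: (H, W)."""
    total = ms.sum(axis=0) + eps          # per-pixel sum of the MS bands
    return ms * pan[None, :, :] / total   # each band scaled by PAN / sum

rng = np.random.default_rng(1)
ms = rng.uniform(0, 1, (3, 128, 128))     # stand-in for real MS bands
pan = rng.uniform(0, 1, (128, 128))       # stand-in for the PAN band
fused = brovey_fusion(ms, pan)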

Traditional fusion algorithms mentioned above have been widely used for relatively simple and time-efficient fusion schemes. However, several problems must be considered before their application: (1) these fusion algorithms generate a fused image from a set of pixels in the various sources, and such pixel-level fusion methods are very sensitive to registration accuracy, so co-registration of the input images at sub-pixel level is required; (2) one of the main limitations of IHS and the Brovey transform is that the number of input spectral bands should be three or fewer at a time; (3) these image fusion methods are often successful at improving the spatial resolution, but they tend to distort the original spectral signatures to some extent (Yun, 2004; Pouran, 2005). More recently, new techniques such as the wavelet transform appear to reduce the color distortion problem and to keep the statistical parameters unchanged.

3.3.2 Multi-resolution analysis-based methods

Multi-resolution or multi-scale methods, such as pyramid transformation, have been adopted for data fusion since the early 1980s (Adelson and Bergen, 1984). Pyramid-based image fusion methods, including the Laplacian pyramid transform, were all developed from the Gaussian pyramid transform and have been modified and widely used (Miao and Wang, 2007; Xiang and Su, 2009). In 1989, Mallat put the methods of wavelet construction into the framework of functional analysis and described the fast wavelet transform algorithm and a general method of constructing wavelet orthonormal bases. On this basis, the wavelet transform can be applied to image decomposition and reconstruction (Mallat, 1989; Ganzalo and Jesus, 2004; Ma et al., 2005). Wavelet transforms provide a framework in which an image is decomposed, with each level corresponding to a coarser resolution band. For example, when fusing a MS image with a high-resolution PAN image by wavelet fusion, the PAN image is first decomposed into a set of low-resolution PAN images with corresponding wavelet coefficients (spatial details) for each level. Individual bands of the MS image then replace the low-resolution PAN at the resolution level of the original MS image. The high-resolution spatial detail is injected into each MS band by performing a reverse wavelet transform on each MS band together with the corresponding wavelet coefficients. In wavelet-based fusion schemes, detail information is extracted from the PAN image using wavelet transforms and injected into the MS image, and distortion of the spectral information is minimized compared to the standard methods (Krista et al., 2007).
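A minimal sketch of this substitution scheme, using the open-source PyWavelets package, might look as follows; the wavelet choice, decomposition level and random arrays are illustrative assumptions, and a real scheme would also histogram-match the PAN band to each MS band first.

# Minimal substitution-based wavelet fusion sketch using PyWavelets
import numpy as np
import pywt

def wavelet_fusion(ms_band, pan, wavelet="db2", level=2):
    """Inject PAN spatial detail into one (already upsampled) MS band."""
    coeffs_pan = pywt.wavedec2(pan, wavelet, level=level)
    coeffs_ms = pywt.wavedec2(ms_band, wavelet, level=level)
    # Keep the MS approximation (spectral content), take the PAN detail
    # coefficients (spatial content), then reconstruct.
    fused = [coeffs_ms[0]] + list(coeffs_pan[1:])
    return pywt.waverec2(fused, wavelet)

rng = np.random.default_rng(2)
pan = rng.uniform(0, 1, (256, 256))       # high-resolution PAN (stand-in)
ms_band = rng.uniform(0, 1, (256, 256))   # one MS band, upsampled to PAN grid
sharpened = wavelet_fusion(ms_band, pan)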

In order to achieve optimum fusion results, various wavelet-based fusion schemes have been tested by many researchers, and several new concepts and algorithms have been presented and discussed. Candès and Donoho provided a method for fusing SAR and visible MS images using the curvelet transformation, which proved more efficient than the wavelet transform for detecting edge information and for denoising (Candes and Donoho, 2000). Curvelet-based image fusion has been used to merge Landsat ETM+ panchromatic and multispectral images, simultaneously providing richer information in the spatial and spectral domains (Choi et al., 2005). Do and Vetterli presented a flexible multi-resolution, local, and directional image expansion using contour segments, the contourlet transform, to address the fact that the wavelet transform cannot efficiently represent linear and curved singularities in image processing (Do and Vetterli, 2003; Do and Vetterli, 2005). The contourlet transform provides a flexible number of directions and captures the intrinsic geometrical structure of images. In general, as a typical feature level fusion method, wavelet-based fusion performs evidently better than conventional methods in terms of minimizing color distortion and denoising effects. It has been one of the most popular fusion methods in remote sensing in recent years, and is a standard module in many commercial image processing software packages, such as ENVI, PCI and ERDAS. Problems and limitations include: (1) computational complexity compared to the standard methods; (2) the spectral content of small objects is often lost in the fused images; (3) the user is often required to determine appropriate values for certain parameters (such as thresholds). More sophisticated wavelet-family fusion algorithms (such as the ridgelet, curvelet, and contourlet transformations) can improve the performance, but these new schemes may bring greater complexity in the computation and in the setting of parameters.

3.3.3 Artificial neural network based fusion method

Artificial neural networks (ANNs) have proven to be a more powerful and self-adaptive method of pattern recognition than traditional linear and simple nonlinear analyses (Louis and Yan, 1998; Dong et al., 2004). The ANN-based method employs a nonlinear response function that iterates many times in a special network structure in order to learn the complex functional relationship between input and output training data.

Many multisensor studies have used ANNs because no specific assumptions about the underlying probability densities are needed (see e.g. Gong et al., 1996; Skidmore et al., 1997). Once trained, the ANN model can remember a functional relationship and be used for further calculations. For these reasons, the ANN concept has been adopted to develop nonlinear models for multiple sensor data fusion.

A drawback of ANNs in this respect is that they act like a black box, in that the user cannot control how the different data sources are used. It is also difficult to explicitly use a spatial model for neighbouring pixels (though one can extend the input vector from measurements from a single pixel to measurements from neighbouring pixels). Guan et al. (1997) utilized contextual information by using a network of neural networks with which they built a quadratic regularizer. Another drawback is that specifying a neural network architecture involves specifying a large number of parameters. A classification experiment should take care in choosing them and should compare different configurations, which makes the complete training process very time consuming (see Paola and Schowengerdt, 1997; Skidmore et al., 1996).

Hybrid approaches combining statistical methods and neural networks for data fusion have also been proposed. Benediktsson et al. (1997) apply a statistical model to each individual source, and use neural nets to reach a consensus decision. Most applications involving a neural net use a multilayer perceptron or a radial basis function network, but other neural network architectures can be used (see, e.g., Benediktsson et al., 1996; Carpenter et al., 1996; Wan and Fraser, 1994). Neural nets for data fusion can be applied at the pixel, feature, and decision levels. For pixel- and feature-level fusion, a single neural net is used to classify the joint feature vector or pixel measurement vector. For decision-level fusion, a multilayer perceptron neural net is first used to classify the images from each source separately; the outputs from the sensor-specific nets are then fused and weighted in a fusion network.

3.3.4 Dempster-Shafer evidence theory based fusion method

Dempster-Shafer decision theory is considered a generalized Bayesian theory, used when the data contributing to the analysis of the images are subject to uncertainty. It allows support to be assigned not only to a single proposition but also to unions of propositions that include it. Compared with Bayesian theory, the Dempster-Shafer theory of evidence is closer to human perception and reasoning processes. Its capability to assign uncertainty or ignorance to propositions is a powerful tool for dealing with a large range of problems that would otherwise seem intractable (Wu et al., 2002).
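The core computation is Dempster's rule of combination. The following is a minimal sketch with two mass functions over an invented two-class frame of discernment; note the mass placed on the union of classes, which expresses ignorance and has no direct Bayesian counterpart.

# Minimal sketch of Dempster's rule of combination for two basic
# probability assignments (mass functions) over frozensets of labels.
from itertools import product

def combine(m1, m2):
    """Combine two mass functions m1, m2: dict[frozenset, float]."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb           # mass assigned to the empty set
    # Normalize by 1 - K, where K is the total conflicting mass
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

# Two sensors rating the invented classes {water, crop}
water, crop = frozenset({"water"}), frozenset({"crop"})
m1 = {water: 0.6, crop: 0.1, water | crop: 0.3}
m2 = {water: 0.5, crop: 0.2, water | crop: 0.3}
print(combine(m1, m2))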

The Dempster-Shafer theory of evidence has been applied to image fusion using SPOT/HRV and NOAA/AVHRR images. The results show unambiguously the major improvement brought by such data fusion and the performance of the proposed method (Le Hégarat-Mascle et al., 2003). Borotschnig et al. (1999) compared three frameworks for information fusion and view-planning based on different uncertainty calculi: probability theory, possibility theory and the Dempster-Shafer theory of evidence. The results indicated that a sensor fusion method based on Dempster-Shafer decision theory achieves a much higher performance improvement, and that it provides estimates of the imprecision and uncertainty of the information derived from the different sources.

3.3.5 Multiple algorithm fusion

The combination of several different fusion schemes has proven to be a useful strategy that may achieve better quality results (Yun, 2004; Krista et al., 2007). As a case in point, quite a few researchers have focused on incorporating the traditional IHS method into wavelet transforms, since the IHS fusion method performs well spatially while the wavelet methods perform well spectrally (Krista et al., 2007; Le Hégarat-Mascle et al., 2003). However, the selection and arrangement of the candidate fusion schemes are quite arbitrary and often depend upon the user's experience.

3.3.6 Classification

Classification is one of the key tasks in remote sensing applications. The classification accuracy of remote sensing images is improved when multiple source image data are introduced into the processing (Pohl and Van Genderen, 1998). Images from microwave and optical sensors offer complementary information that helps in discriminating the different classes. In the work of Wang et al. (2007), a multi-sensor decision-level image fusion algorithm based on fuzzy theory is used to classify each sensor image, and the classification results are then fused by a fusion rule; interesting results were achieved, mainly in terms of high-speed classification and efficient fusion of complementary information (Wu and Yang, 2003). Land-use/land-cover classification has been improved using data fusion techniques such as ANNs and the Dempster-Shafer theory of evidence, with experimental results showing excellent classification performance compared to existing classification techniques (Sarkar et al., 2005; Liu et al., 2007). Image fusion methods will lead to strong advances in land-use/land-cover classification by exploiting the complementarity of data presenting either high spatial resolution or high temporal repetitiveness.

References

Adelson, C.H.; Bergen, J.R. (1984). Pyramid methods in image processing. RCA Engineer, Vol. 29, pp. 33-41.
Asano, F., Yamamoto, K., Hara, I., Ogata, J., Yoshimura, T., Motomura, Y., Ichimura, N., Asoh, H. (2004). Detection and separation of speech event using audio and video information fusion and its application to robust speech interface. EURASIP Journal on Applied Signal Processing, 2004:1727-1738.
Basseville, M., Benveniste, A., Chou, K.C., Golden, S.A., Nikoukhah, R., Willsky, A.S. (1992). Modeling and estimation of multiresolution stochastic processes. IEEE Transactions on Information Theory, 38:766-784.
Benediktsson, J.A., Sveinsson, J.R., Ersoy, O.K. (1996). Optimized combination of neural networks. In IEEE International Symposium on Circuits and Systems (ISCAS'96), pp. 535-538, Atlanta, Georgia, May 1996.
Benediktsson, J.A., Sveinsson, J.R., Swain, P.H. (1997). Hybrid consensus theoretic classification. IEEE Transactions on Geoscience and Remote Sensing, 35:833-843.
Berk, R. (2004). Regression Analysis: A Constructive Critique. Sage.
Billings, S.D., Newsam, G.N., Beatson, R.K. (2002). Smooth fitting of geophysical data using continuous global surfaces. Geophysics, 37:1823-1834.
Blum, R.S.; Liu, Z. (2006). Multi-Sensor Image Fusion and Its Applications. Special Series on Signal Processing and Communications. CRC Press, Boca Raton, FL, USA.
Borotschnig, H., Paletta, L., Prantl, M., Pinz, A. (1999). A comparison of probabilistic, possibilistic and evidence theoretic fusion schemes for active object recognition. Computing, Vol. 62, pp. 293-319.
Braverman, A. (2008). Data fusion. In Encyclopedia of Quantitative Risk Analysis and Assessment, Volume 2. Wiley.
Campbell, J. (2002). Introduction to Remote Sensing. The Guilford Press, New York.
Candes, E.J.; Donoho, D.L. (2000). Curvelets: a surprisingly effective nonadaptive representation for objects with edges. In Curves and Surfaces. Vanderbilt University Press, Nashville, TN, USA, pp. 105-120.
Carpenter, G.A., Gjaja, M.N., Gopal, S., Woodcock, C.E. (1996). ART neural networks for remote sensing: vegetation classification from Landsat TM and terrain data. In IEEE Symp. Geosc. Rem. Sens. (IGARSS), pp. 529-531, Lincoln, Nebraska, May 1996.
Chee-Hong, C., Aixin, S., Ee-peng, L. (2001). Automated online news classification with personalization. In 4th International Conference of Asian Digital Library.
Choi, M.; Kim, R.Y.; Nam, M.R. (2005). Fusion of multi-spectral and panchromatic satellite images using the curvelet transform. IEEE Geoscience and Remote Sensing Letters, Vol. 2, pp. 136-140, ISSN 0196-2892.
Chou, K.C., Willsky, A.S., Nikoukhah, R. (1994). Multiscale systems, Kalman filters, and Riccati equations. IEEE Transactions on Automatic Control, 39(3).
Cressie, N.A.C. (1993). Statistics for Spatial Data. Wiley-Interscience, New York, NY.
Cressie, N.A.C., Johannesson, G. (2008). Fixed rank kriging for very large spatial data sets. Journal of the Royal Statistical Society, 70(1).
Dasarathy, B.V. (2007). A special issue on image fusion: advances in the state of the art. Information Fusion, 8. doi:10.1016/j.inffus.2006.05.003.
Do, M.N., Vetterli, M. (2003). Contourlets. In Beyond Wavelets, G.V. Welland, Ed., pp. 83-105. Academic Press, New York, NY, USA. ISBN 978-0127432731.
Do, M.N.; Vetterli, M. (2005). The contourlet transform: an efficient directional multiresolution image representation. Available online: http://infoscience.epfl.ch/record/49842/files/DoV05.pdf?version=2 (accessed June 29, 2013).
Dong, J.; Zhuang, D.; Huang, Y.; Fu, J. (2009). Advances in multi-sensor data fusion: algorithms and applications. Sensors, 9, 7771-7784.
Dong, J.; Yang, X.; Clinton, N.; Wang, N. (2004). An artificial neural network model for estimating crop yields using remotely sensed information. International Journal of Remote Sensing, Vol. 25, pp. 1723-1732, ISSN 0143-1161.
Fotheringham, A.S., Wong, D.W.S. (1991). The modifiable areal unit problem in multivariate statistical analysis. Environment and Planning, 23(7).
Fotheringham, A.S. (2002). Geographically Weighted Regression: The Analysis of Spatially Varying Relationships. Wiley.
Fuentes, M., Raftery, A.E. (2005). Model evaluation and spatial interpolation by Bayesian combination of observations with outputs from numerical models. Biometrics, 61(1):36-45.
Furrer, R., Genton, M.G., Nychka, D. (2006). Covariance tapering for interpolation of large spatial datasets. Journal of Computational and Graphical Statistics, 15(3):502-523.
Galatsanos, N.P., Segall, C.A., Katsaggelos, A.K. (2003). Image Enhancement, Volume 2, pp. 388-402. Optical Engineering Press.
Ganzalo, P.; Jesus, M.A. (2004). Wavelet-based image fusion tutorial. Pattern Recognition, Vol. 37, pp. 1855-1872, ISSN 0031-3203.
Gong, P., Pu, R., Chen, J. (1996). Mapping ecological land systems and classification uncertainties from digital elevation and forest-cover data using neural networks. Photogrammetric Engineering & Remote Sensing, 62:1249-1260.
Goovaerts, P. (1997). Geostatistics for Natural Resources Evaluation. Oxford University Press, London.
Gotway, C.A., Young, L.J. (2002). Combining incompatible spatial data. Journal of the American Statistical Association, 97:632-648.
Grose, D.J., Harris, R., Brunsdon, C., Kilham, D. (2008). Grid enabling geographically weighted regression. Available online: http://www.merc.ac.uk/sites/default/files/events/conference/2007/papers/paper147.pdf
Guan, L., Anderson, J.A., Sutton, J.P. (1997). A network of networks processing model for image regularization. IEEE Transactions on Neural Networks, 8:169-174.
Hall, L., Llinas, J. (1997). An introduction to multisensor data fusion. Proceedings of the IEEE, 85:6-23.
Hall, D.L. (2004). Mathematical Techniques in Multisensor Data Fusion. Artech House, Norwood, MA, USA.
Hartman, L., Hössjer, O. (2008). Fast kriging of large data sets with Gaussian Markov random fields. Computational Statistics & Data Analysis, 52(5):2331-2349.
Hastie, T., Tibshirani, R., Friedman, J. (2008). The Elements of Statistical Learning. Springer Series in Statistics. Springer, New York, NY, USA.
Isaaks, E.H., Srivastava, R.M. (1989). Applied Geostatistics. Oxford University Press.
Johannesson, G., Cressie, N. (2008). Variance-Covariance Modeling and Estimation for Multi-Resolution Spatial Models, pp. 319-330. Kluwer Academic Publishers, Dordrecht.
Klein, L.A. (2004). Sensor and Data Fusion: A Tool for Information Assessment and Decision Making. SPIE-International Society for Optical Engineering.
Krista, A.; Yun, Z.; Peter, D. (2007). Wavelet based image fusion techniques: an introduction, review and comparison. ISPRS Journal of Photogrammetry and Remote Sensing, Vol. 62, pp. 249-263.
Le Hégarat-Mascle, S., Richard, D., Ottlé, C. (2003). Multi-scale data fusion using Dempster-Shafer evidence theory. Integrated Computer-Aided Engineering, Vol. 10, No. 1, pp. 9-22, ISSN 1875-8835.
Little, R.J.A., Rubin, D.B. (1987). Statistical Analysis with Missing Data. Wiley Series in Probability and Statistics. Wiley, New York, 1st edition.
Liu, C.P., Ma, X.H., Cui, Z.M. (2007). Multi-source remote sensing image fusion classification based on DS evidence theory. Proceedings of the Conference on Remote Sensing and GIS Data Processing and Applications; and Innovative Multispectral Technology and Applications; Wuhan, China, November 15-17, 2007, part 2.
Louis, E.K.; Yan, X.H. (1998). A neural network model for estimating sea surface chlorophyll and sediments from thematic mapper imagery. Remote Sensing of Environment, Vol. 66, pp. 153-165, ISSN 0034-4257.
Ma, H.; Jia, C.Y.; Liu, S. (2005). Multisource image fusion based on wavelet transform. International Journal of Information Technology, Vol. 11, pp. 81-91.
Mallat, S.G. (1989). A theory for multiresolution signal decomposition: the wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 11, pp. 674-693, ISSN 0162-8828.
Miao, Q.G.; Wang, B.S. (2007). Multi-sensor image fusion based on improved Laplacian pyramid transform. Acta Optica Sinica, Vol. 27, pp. 1605-1610, ISSN 1424-8220.
Neteler, M., Mitasova, H. (2008). Open Source GIS: A GRASS GIS Approach. Kluwer Academic Publishers/Springer, Boston, third edition.
Nguyen, H., Cressie, N.A., Braverman, A. (2012). Spatial statistical data fusion for remote sensing applications. Journal of the American Statistical Association, 107(499):1004-1018.
Nychka, D.W. (1999). Spatial Process Estimates as Smoothers, pp. 393-424. Wiley, New York.
Nychka, D., Saltzman, N. (1998). Design of air quality monitoring networks. Case Studies in Environmental Statistics.
Pace, R.K., Barry, R. (1997). Kriging with large data sets using sparse matrix techniques. Communications in Statistics: Computation and Simulation, 26(2).
Paola, J.D., Schowengerdt, R.A. (1997). The effect of neural-network structure on a multispectral land-use/land-cover classification. Photogrammetric Engineering & Remote Sensing, 63:535-544.
Pohl, C.; Van Genderen, J.L. (1998). Multisensor image fusion in remote sensing: concepts, methods and applications. International Journal of Remote Sensing, Vol. 19, pp. 823-854, ISSN 0143-1161.
Pouran, B. (2005). Comparison between four methods for data fusion of ETM+ multispectral and pan images. Geo-spatial Information Science, Vol. 8, pp. 112-122.
Sarkar, A., Banerjee, A., Banerjee, N., Brahma, S., Kartikeyan, B., Chakraborty, M., Majumder, K.L. (2005). Landcover classification in MRF context using Dempster-Shafer fusion for multisensor imagery. IEEE Transactions on Image Processing, 14:634-645.
Simone, G., Farina, A., Morabito, F.C., Serpico, S.B., Bruzzone, L. (2002). Image fusion techniques for remote sensing applications. Information Fusion, 3:3-15.
Skidmore, A.K., Turner, B.J., Brinkhof, W., Knowles, E. (1996). Performance of a neural network: mapping forests using GIS and remotely sensed data. Photogrammetric Engineering & Remote Sensing, 63:501-514.
Smith, M.I., Heather, J.P. (2005). Review of image fusion technology in 2005. Proceedings of the Defense and Security Symposium, Orlando, FL, USA.
Sun, W. (2004). Land-Use Classification Using High Resolution Satellite Imagery: A New Information Fusion Method. An Application in Landau, Germany. Johannes Gutenberg University Mainz, Germany.
Tobler, W. (1970). A computer movie simulating urban growth in the Detroit region. Economic Geography, 46(2):234-240.
Torra, V., ed. (2003). Information Fusion in Data Mining. Springer-Verlag, New York, NY, USA.
Vijayaraj, V., Younan, N., O'Hara, C. (2006). Concepts of image fusion in remote sensing applications. Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, Denver, CO, USA, July 31-August 4, 2006, pp. 3798-3801.
Wan, W., Fraser, D. (1994). A self-organizing map model for spatial and temporal contextual classification. In IEEE Symp. Geosc. Rem. Sens. (IGARSS), pp. 1867-1869, Pasadena, California, August 1994.
Wang, R., Bu, F.L., Jin, H., Li, L.H. (2007). A feature-level image fusion algorithm based on neural networks. Bioinformatics and Biomedical Engineering, 7:821-824.
Wikle, C.K., Milliff, R.F., Nychka, D., Berliner, L.M. (1999). Spatio-temporal hierarchical Bayesian modeling: tropical ocean surface winds. Journal of the American Statistical Association, 96:382-397.
Wu, H., Siegel, M., Stiefelhagen, R., Yang, J. (2002). Sensor fusion using Dempster-Shafer theory. IEEE Instrumentation and Measurement Technology Conference, Anchorage, AK, USA, 21-23 May 2002.
Wu, Y., Yang, W. (2003). Image fusion based on wavelet decomposition and evolutionary strategy. Acta Optica Sinica, 23:671-676.
Xiang, J.; Su, X. (2009). A pyramid transform of image denoising algorithm based on morphology. Acta Photonica Sinica, Vol. 38, pp. 89-103, ISSN 1000-7032.
Yun, Z. (2004). Understanding image fusion. Photogrammetric Engineering and Remote Sensing, Vol. 6, pp. 657-661.

4 Data disaggregation

Availability of high-precision maps is one of the most important factors in many decision-making processes addressing spatial problems. However, the data needed to produce such high-precision maps are often unavailable, since for confidentiality and other reasons census or survey data are released only for spatially coarser reporting units. Hence the need for spatial disaggregation techniques (Kim and Yao, 2010; Li et al., 2007).

The idea underlying spatial disaggregation techniques is to interpolate spatially aggregated data into a different spatial zoning system of higher spatial resolution. The original spatial units, with known data, are usually called source zones, while the final spatial units that describe the same region are called target zones (Lam, 1983). Spatial disaggregation methods are essentially based on estimation and data interpolation techniques (ref. Part 1) and can be classified according to several criteria, e.g. underlying assumptions, use of ancillary data, etc. (Wu et al., 2005).

Inevitably, all these spatial disaggregation techniques generate error: this can be caused by the assumptions about the spatial distribution of the objects (e.g. homogeneity in density) or by the spatial relationship imposed within the spatial disaggregation process (e.g. the size of the target zones) (Li et al., 2007).

4.1 Mapping techniques

4.1.1 Simple area weighting method

The simplest interpolation approach that can be used to disaggregate data is probably the simple area weighting method. It basically apportions the attribute of interest by area, given the geometric intersection of the source zones with the target zones. This method assumes that the attribute $y$ is uniformly distributed within each source zone: under this hypothesis, the value in each target zone can be estimated as

$$ \hat{y}_t = \sum_s \frac{A_{st}}{A_s}\, y_s \qquad (4.1.1) $$

where $\hat{y}_t$ is the estimated value of the target variable at target zone $t$, $y_s$ is the observed value of the target variable in source zone $s$, $A_s$ is the area of source zone $s$ and $A_{st}$ is the area of the intersection of source zone $s$ and target zone $t$. This method satisfies the so-called “pycnophylactic property” (or volume-preserving property), which requires the preservation of the initial data: the predicted value on source area $s$, obtained by aggregating the predicted values on the intersections with area $s$, coincides with the observed value on area $s$ (Do et al., 2013; Li et al., 2007). However, several studies have shown that the overall accuracy of simple area weighting is low when compared with that of other techniques (see, for example, Langford 2006; Gregory 2005; Reibel and Aditya 2006).
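A minimal sketch of formula (4.1.1) with made-up zones, assuming the source/target intersection areas have already been computed by a GIS overlay:

# Simple area weighting (formula 4.1.1) with invented zones
import numpy as np

y_s = np.array([100.0, 40.0])          # observed values in 2 source zones
A_s = np.array([10.0, 8.0])            # source zone areas
# A_st[s, t]: area of intersection of source zone s with target zone t
A_st = np.array([[6.0, 4.0, 0.0],
                 [0.0, 3.0, 5.0]])

y_t = (A_st / A_s[:, None] * y_s[:, None]).sum(axis=0)
print(y_t)                             # estimates for the 3 target zones

# Pycnophylactic check: re-aggregating to source zones returns y_s exactly
assert np.allclose((A_st / A_s[:, None] * y_s[:, None]).sum(axis=1), y_s)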

To overcome the hypothesis of homogeneous density of the simple area weighting method, a hypothesis that is almost never accurate, several approaches have been proposed. On one side, many studies in the literature aim to overcome the problem by using smooth density functions, such as kernel-based surface functions around area centroids and Tobler's (1979) pycnophylactic interpolation method (Kim and Yao, 2010).

4.1.2 Pycnophylactic interpolation methods

Tobler (1979) proposed the pycnophylactic interpolation method as an extension of simple area weighting to produce smooth population-density data from areally aggregated data. It calculates the target region values based on the values and distance-weighted contributions of neighbouring source regions, keeping volume consistency within the source regions. It uses the following algorithm (a sketch in code follows the list):

1) overlay a dense grid on the study region;

2) assign each grid cell a value using simple area weighting;

3) smooth the values of all the cells by replacing each cell value with the average of its neighbours;

4) calculate the value in each source region by summing all its cell values;

5) adjust the values of the cells in each source region proportionally, so that the source region totals are preserved;

6) repeat steps 3 to 5 until no change exceeds a pre-specified tolerance.
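As announced above, the following is a minimal sketch of the algorithm on a raster grid, assuming the source zones are encoded as an integer label array; the four-neighbour smoothing and the proportional rescaling are simple stand-ins for Tobler's original smoothing and volume-preserving adjustments.

# Minimal pycnophylactic interpolation sketch on a raster grid
import numpy as np

def pycnophylactic(zone, y_s, n_iter=200):
    """zone: (H, W) int array of source-zone labels; y_s: zone totals."""
    labels = np.unique(zone)
    dens = np.zeros(zone.shape, dtype=float)
    for k in labels:                       # step 2: area-weighting start
        dens[zone == k] = y_s[k] / (zone == k).sum()
    for _ in range(n_iter):
        # step 3: smooth with the mean of the 4 neighbours (edges padded)
        p = np.pad(dens, 1, mode="edge")
        dens = (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 4.0
        # steps 4-5: rescale within each zone so totals are preserved
        for k in labels:
            mask = zone == k
            dens[mask] *= y_s[k] / dens[mask].sum()
    return dens

zone = np.zeros((8, 8), dtype=int)
zone[:, 4:] = 1                            # two side-by-side source zones
surface = pycnophylactic(zone, np.array([100.0, 20.0]))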

In this approach, the choices of an appropriate smooth density function and of a search window size heavily depend on the characteristics of individual applications. The underlying assumption is that the value of a spatial variable in neighbouring target regions tends to be similar: Tobler’s first law of geography asserts that near things are more related than distant ones (Tobler, 1970). As an example, Comber et al. (2007) refer to an application of pycnophylactic interpolation to agricultural data to identify land use areas over aggregated agricultural census data.

4.1.3 Dasymetric mapping

A different approach to overcome the hypothesis of homogeneous density of the simple area weighting method is the dasymetric-mapping method (Wright, 1936; Mennis and Hultgren, 2006; Langford, 2003). To reflect density variation within source zones, this method uses other relevant and available information $x$ to distribute $y$ accordingly. That is, it uses additional relevant information to estimate the actual distribution of the aggregated data over the target units of analysis. This should help allocate $y$ to the small intersection zones within the sources, provided that the relationship between $x$ and $y$ is of a proportionality type with a strong enough correlation.

This means that this method replaces the homogeneity assumption of simple area weighting with the assumption that the data are proportional to the auxiliary information on any sub-region. Considering a quantitative variable $x$, the dasymetric-mapping method extends formula (4.1.1) by substituting $x$ for the area:

$$ \hat{y}_t = \sum_s \frac{x_{st}}{x_s}\, y_s \qquad (4.1.2) $$

The simplest scheme for implementing dasymetric mapping is to use a binary mask of land-cover types (Langford and Unwin, 1994; Langford and Fisher, 1996; Eicher and Brewer, 2001; Mennis and Hultgren, 2006); in this case the auxiliary information is categorical and its levels define the so-called control zones. The most classical case, called binary dasymetric mapping, is that of population estimation with two control zones: one known to be populated and the other unpopulated. It is assumed that the count density is uniform throughout the control zones. In this case formulas (4.1.1) and (4.1.2) become

$$ \hat{P}_t = \sum_s \frac{A_{tsp}}{A_{sp}}\, P_s \qquad (4.1.3) $$

where $\hat{P}_t$ is the estimated population at target zone $t$, $P_s$ is the total population in source zone $s$, $A_{sp}$ is the source zone area identified as populated and $A_{tsp}$ is the area of overlap between target zone $t$ and source zone $s$ having land cover identified as populated.

Several multi-class extensions of binary dasymetric mapping have been proposed (Kim et al., 2010; Mennis, 2003; Langford, 2006). Li et al. (2007) present a three-class dasymetric mapping for population estimation that takes advantage of binary dasymetric mapping and of a regression model with a limited number of ancillary class variables (i.e. non-urban, low-density residential and high-density residential) to represent a range of residential densities within each source zone. The technique is based on the more relaxed assumption of homogeneous density for each land class within each source zone:

$$ \hat{P}_t = \sum_s \sum_c A_{stc}\, \frac{P_s}{A_{sc}} = \sum_s \sum_c A_{stc}\, d_{sc}. \qquad (4.1.4) $$

Here $A_{stc}$ is the area of the intersection between target zone $t$ and source zone $s$ identified as land class $c$, and $A_{sc}$ is the area of source zone $s$ identified as land class $c$; thus, $d_{sc}$ represents the density estimate for class $c$ in zone $s$. These densities can be estimated under a regression model, as described below.

The dasymetric and pycnophylactic methods have complementary strengths and shortcomings for population estimation and target variable disaggregation. For this reason, several hybrid pycnophylactic-dasymetric methods have been proposed (Kim et al., 2010; Mohammed et al., 2012; Comber et al., 2007). All these methods use dasymetric mapping for a preliminary redistribution of the population or variable of interest, and an iterative pycnophylactic interpolation process to obtain a volume-preserving smoothed surface. In particular, Comber et al. (2007) use the hybrid method to disaggregate agricultural census data in order to obtain fine-grained (one km$^2$) maps of agricultural land use in the United Kingdom.

4.1.4 Regression models

The dasymetric weighting schemes have several restrictions: the assumption of proportionality between $y$ and $x$, the fact that the auxiliary information must be known at the intersection level, and the limitation to a single auxiliary variable. Spatial disaggregation techniques based on regression models can overcome these three constraints (Langford et al., 1991; Yuan et al., 1997; Shu and Lam, 2011).

Another characteristic of the dasymetric method is that, when predicting at the level of the $st$ intersection, only the areal value $y_s$ of the source zone within which the intersection is nested is used for prediction; this is not the case for regression. In general, the regression techniques involve a regression of the source level data of $y$ on the target or control values of $x$.

Generally speaking, regression models for population count estimation assume that the given source zone population may be expressed in terms of a set of densities related to the areas assigned to the different land classes. Other ancillary variables may be included alongside these area densities, but the basic model is

$$ P_s = \sum_c A_{sc}\, d_c + \varepsilon_s \qquad (4.1.5) $$

where $P_s$ is the total population count for source zone $s$, $c$ indexes the land cover classes, $A_{sc}$ is the area of each land class within source zone $s$, $d_c$ is a coefficient of the regression model and $\varepsilon_s$ is the random error. The output of the regression model is the estimate of the population densities $d_c$. A problem with this regression model is that the densities are derived in a global context: they remain spatially constant within each land class throughout the study area. It has therefore been suggested that the locally fitted approach used by the dasymetric method will always outperform the global fitting approach used by regression models (Li et al., 2007). To overcome this limit, locally fitted regression models have been proposed, where the globally estimated density for each land class is locally adjusted within each source zone by the ratio of the predicted population to the census counts. In this way a variation of the absolute value of the population densities is achieved, reflecting the differences in local population density between source zones.

These methods were developed initially to ensure that the populations reported within target zones were constrained to match the overall sum of the source zones (the pycnophylactic property).
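A minimal sketch of this chain with invented data: global class densities are fitted by least squares as in (4.1.5), locally rescaled so that each source zone total is preserved, and then allocated to target zones as in (4.1.4); the intersection areas are an arbitrary toy split.

# Regression-estimated densities (4.1.5) with local adjustment and
# dasymetric allocation (4.1.4); all data are made up.
import numpy as np

# A_sc[s, c]: area of land class c within source zone s (3 zones, 2 classes)
A_sc = np.array([[8.0, 2.0],
                 [5.0, 5.0],
                 [1.0, 9.0]])
P_s = np.array([260.0, 400.0, 480.0])    # observed source-zone populations

# Fit the global class densities d_c by least squares: P_s ~ A_sc @ d
d_c, *_ = np.linalg.lstsq(A_sc, P_s, rcond=None)

# Local adjustment: rescale densities within each zone so that predicted
# populations match the census counts (pycnophylactic property)
adj = P_s / (A_sc @ d_c)                 # one ratio per source zone
d_sc = adj[:, None] * d_c[None, :]       # zone- and class-specific densities

# Allocate to target zones given intersection areas A_stc[s, t, c]
A_stc = np.zeros((3, 2, 2))
A_stc[:, 0, :] = A_sc * 0.4              # toy split of every zone/class
A_stc[:, 1, :] = A_sc * 0.6
P_t = np.einsum("stc,sc->t", A_stc, d_sc)
print(d_c, P_t, P_t.sum(), P_s.sum())    # totals match by construction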

4.1.5 The EM algorithm

Another statistical approach in the same density-solution class as the regression model is the EM algorithm (Flowerdew and Green, 1992). Rather than using a regression approach, the interpolation problem is set up as a missing data problem: the intersection values of the target variable are considered unknown and the source values known, thereby allowing the EM algorithm to be used. This method is of particular interest when the variable of interest is not a count but can be assumed to follow a normal distribution. Let $y_{st}$ be the mean of the values of the variable of interest over the $n_{st}$ values in the intersection zone $st$, and assume that

$$ y_{st} \sim N\!\left( \mu_{st},\, \sigma^2 / n_{st} \right). \qquad (4.1.6) $$

The values $n_{st}$ are assumed known or are interpolated from $n_s$. We have that $y_s = \sum_t n_{st}\, y_{st} / n_s$ and

$$ \begin{pmatrix} y_{st} \\ y_s \end{pmatrix} \sim N\!\left( \begin{pmatrix} \mu_{st} \\ \mu_s \end{pmatrix},\; \begin{pmatrix} \sigma^2/n_{st} & \sigma^2/n_s \\ \sigma^2/n_s & \sigma^2/n_s \end{pmatrix} \right). $$

If the $y_{st}$ were known we would obtain $y_t$, the mean in target zone $t$, as $y_t = \sum_s n_{st}\, y_{st} / n_t$, with $n_t = \sum_s n_{st}$. Setting $y_{st} = y_s$ would give the simple areal weighting solution. With the EM algorithm, instead, the interpolated values $y_t$ can be obtained as follows.

E-step:

$$ \hat{y}_{st} = E\!\left( y_{st} \mid y_s \right) = \mu_{st} + \left( y_s - \mu_s \right), \qquad (4.1.7) $$

where $\mu_s = \sum_t n_{st}\, \mu_{st} / n_s$.

M-step: treat the $\hat{y}_{st}$ as a sample of independent observations with distribution

$$ \hat{y}_{st} \sim N\!\left( \mu_{st},\, \sigma^2 / n_{st} \right) \qquad (4.1.8) $$

and fit the model $\mu_{st} = \mathbf{x}_{st}^T \boldsymbol{\beta}$ by weighted least squares.

These steps are repeated until convergence, and then the interpolated $y_t$ are computed as a weighted mean of the $\hat{y}_{st}$ values from the E-step:

$$ y_t = \sum_s n_{st}\, \hat{y}_{st} / n_t. \qquad (4.1.9) $$

If convergence cannot be achieved, an alternative non-iterative scheme can be used (Flowerdew and Green, 1992).
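A minimal sketch of these E- and M-steps, assuming known intersection counts $n_{st}$ and a toy design matrix with an intercept and one invented covariate:

# EM interpolation sketch (Flowerdew-Green style) with invented data
import numpy as np

n_st = np.array([[20.0, 30.0],            # n_st[s, t], 2 sources x 2 targets
                 [40.0, 10.0]])
y_s = np.array([5.0, 8.0])                # observed source-zone means
x_st = np.array([[0.0, 1.0],              # one covariate per intersection
                 [1.0, 0.0]])
X = np.column_stack([np.ones(4), x_st.ravel()])   # design matrix, 4 rows
n = n_st.ravel()
n_s, n_t = n_st.sum(axis=1), n_st.sum(axis=0)

beta = np.zeros(2)
for _ in range(100):
    mu_st = (X @ beta).reshape(2, 2)
    mu_s = (n_st * mu_st).sum(axis=1) / n_s
    y_hat = mu_st + (y_s - mu_s)[:, None]          # E-step (4.1.7)
    # M-step (4.1.8): weighted least squares with weights n_st
    W = np.diag(n)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y_hat.ravel())

y_t = (n_st * y_hat).sum(axis=0) / n_t             # interpolated means (4.1.9)
print(y_t)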

Regression models can also be used to disaggregate count, binary and categorical data (Langford and Harvey, 2001; Tassone et al., 2010).

Small area estimation methods also use regression models to obtain estimates at a fine-grained scale, e.g. for areas or domains where the number of observations is not large enough to allow sufficiently precise direct estimation from the available survey data. These models can also account for specific characteristics of the data, e.g. through non-parametric specifications or methods robust to the presence of outliers. Moreover, they can directly incorporate geographic information referring to the areas of interest. Small area estimators are reviewed in paragraph 4.2. There are also alternative models that can directly incorporate geographic information when this refers directly to the units of interest: these are the geoadditive models, which fall in the class of geostatistical models (see paragraph 4.3).

4.2 Small area estimators

Sample surveys provide a cost-effective way of obtaining estimates for population characteristics of interest. On many occasions, however, the interest is in estimating parameters for domains that contain only a small number of data points. The term small areas is used to describe domains whose sample sizes are not large enough to allow sufficiently precise direct estimation. Design issues, such as the number of strata, construction of strata, sample allocation and selection probabilities, have been addressed over the past 60 years or so. In practice, it is not possible to anticipate and plan for all possible areas (or domains) and uses of survey data, as “the client will always require more than is specified at the design stage” (Fuller, 1999). When direct estimation is not possible, one has to rely on alternative, model-based methods for producing small area estimates. Such methods depend on the availability of population level auxiliary information related to the variable of interest; they use linear mixed models and are commonly referred to as indirect methods. For a detailed description of this theory see the monograph of Rao (2003), or the reviews of Ghosh and Rao (1994), Pfeffermann (2002) and, more recently, Jiang and Lahiri (2006a). For small area estimation with application to agriculture see Rao (2010).

However, it is important to consider design issues that have an impact on small area estimation, particularly in the context of planning and designing large-scale surveys. Rao (2003) presents a brief discussion of some of these design issues and refers to Singh, Gambino and Mantel (1994) for a more detailed discussion. Rao (2003) suggested: (i) minimization of clustering, (ii) replacing large strata by many small strata from which samples are drawn, (iii) adopting compromise sample allocations to satisfy reliability requirements at the small area level as well as the large area level, (iv) integration of surveys, (v) dual frame surveys and (vi) repeated surveys.

In general, considering the drawbacks of direct estimators for small areas, indirect estimators will always be needed in practice, and in recent years there have been a number of developments in the small area estimation (SAE) literature. These involve both extensions of the conventional small area model and the estimation of parameters other than averages and totals, for example quantities of the small area distribution function of the outcome of interest (Tzavidis et al. 2010) and complex indicators (Molina and Rao 2010; Marchetti et al. 2012). One research direction has focused on nonparametric versions of the random effects model (Opsomer et al. 2008), while a further research area that has attracted interest is the specification of models that borrow strength over space, either by specifying models with spatially correlated or nonstationary random effects (Salvati et al. 2012; Chandra et al. 2012). The issue of outlier robust small area estimation has also attracted a fair amount of interest, mainly because in many real data applications the Gaussian assumptions of the conventional random effects model are not satisfied. Two main approaches to outlier robust small area estimation have been proposed: the first is based on M-estimation of the unit level random effects model (Sinha and Rao 2009), while the second is based on the use of an M-quantile model under which area effects are estimated using a semi-parametric approach (Chambers and Tzavidis 2006).

Reliable small-area information on crop statistics is needed for formulating agricultural policies. Agriculture is a world in deep evolution: the focus is on the multifunctional nature of agriculture, the income of the agricultural household, food safety and quality production, agro-environmental issues and rural development, including rural areas. At the same time there is an increasing integration of environmental concerns into agricultural policy and the promotion of sustainable agriculture. For these reasons the study variable can be of different natures: it is generally continuous (for example, crop yield), but it can also be a binary response (whether a farm is multifunctional or not) or a count (the number of types of production for each farm). When the survey variables are categorical in nature, they are not suited to standard SAE methods based on linear mixed models. One option in such cases is to adopt an empirical best predictor based on generalized linear mixed models. In this literature review we briefly present the conventional and advanced models for small area estimation, with a focus on applications in agriculture.

4.2.1 Model assisted estimators

Let us suppose that a population $U$ of size $N$ is divided into $m$ non-overlapping subsets $U_i$ (domains of study, or areas) of size $N_i$, $i = 1, \ldots, m$. We index the population units by $j$ and the small areas by $i$. The population data consist of values $y_{ij}$ of the variable of interest and values $\mathbf{x}_{ij}$ of a vector of $p$ auxiliary variables; we assume that $\mathbf{x}_{ij}$ contains 1 as its first component. Suppose that a sample $s$ is drawn according to some, possibly complex, sampling design such that the inclusion probability of unit $j$ within area $i$ is $\pi_{ij}$, and that area-specific samples $s_i \subset U_i$ of size $n_i \geq 0$ are available for each area. Note that non-sampled areas have $n_i = 0$, in which case $s_i$ is the empty set. The set $r_i \subseteq U_i$ contains the $N_i - n_i$ indices of the non-sampled units in small area $i$. Values of $y_{ij}$ are known only for sampled units, while for the $p$-vector of auxiliary variables it is assumed that area level totals $\mathbf{X}_i$ or means $\bar{\mathbf{X}}_i$ are accurately known from external sources.

Provided that large enough domain-specific sample sizes are available, statistical agencies can perform domain estimation by using the same design-based methods as those used for the estimation of population level quantities. When, however, area sample sizes are not large enough to allow reliable direct estimation in all or most of the domains, there is a need to use small area estimation techniques.

The application of design-based estimators, namely Generalized Regression (GREG) estimators, in a small area setting was introduced by Sarndal (1984). The class of GREG estimators encompasses a wide range of model-assisted estimators characterized by asymptotic design unbiasedness and consistency. GREG estimators share the following structure:

$$ \hat{\theta}_i^{GREG} = \frac{1}{N_i} \left[ \sum_{j \in U_i} \hat{y}_{ij} + \sum_{j \in s_i} w_{ij} \left( y_{ij} - \hat{y}_{ij} \right) \right]. \qquad (4.2.1) $$

Different GREG estimators are obtained in association with the different models specified for assisting $y_{ij}$, $j \in U_i$. In the simplest case a fixed effects regression model is assumed, $E(y_{ij}) = \mathbf{x}_{ij}^T \boldsymbol{\beta}$, $\forall j \in U_i$, $\forall i$, where the expectation is taken with respect to the assisting model. If survey weights are used in the estimation process, this leads to the estimator

$$ \hat{\theta}_i^{GREG\text{-}S} = \frac{1}{N_i} \sum_{j \in s_i} w_{ij}\, y_{ij} + \left( \frac{1}{N_i} \sum_{j \in U_i} \mathbf{x}_{ij}^T - \frac{1}{N_i} \sum_{j \in s_i} w_{ij}\, \mathbf{x}_{ij}^T \right) \hat{\mathbf{B}}, \qquad (4.2.2) $$

where $w_{ij} = \pi_{ij}^{-1}$ and $\hat{\mathbf{B}} = \left( \sum_{j \in s} w_{ij}\, \mathbf{x}_{ij}\, \mathbf{x}_{ij}^T \right)^{-1} \left( \sum_{j \in s} w_{ij}\, \mathbf{x}_{ij}\, y_{ij} \right)$ (Rao 2003, Section 2.5). Note that in this case the regression coefficients are calculated on data from the whole sample and are not area-specific.
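A minimal numerical sketch of (4.2.2) for a single area with simulated data; for simplicity the whole sample coincides here with the area sample, and the design is simple random sampling, so that $w_{ij} = N_i / n_i$.

# Weighted GREG estimator (4.2.2) for one small area, simulated data
import numpy as np

rng = np.random.default_rng(3)
N_i, n_i = 500, 30
X_pop = np.column_stack([np.ones(N_i), rng.uniform(0, 10, N_i)])
y_pop = X_pop @ np.array([2.0, 1.5]) + rng.standard_normal(N_i)

idx = rng.choice(N_i, n_i, replace=False)      # SRS within the area
x_s, y_smp = X_pop[idx], y_pop[idx]
w = np.full(n_i, N_i / n_i)                    # w_ij = 1 / pi_ij

# B = (sum w x x')^{-1} sum w x y (whole sample = this area, for simplicity)
B = np.linalg.solve(x_s.T @ (w[:, None] * x_s), x_s.T @ (w * y_smp))

theta_greg = (np.sum(w * y_smp) / N_i
              + (X_pop.mean(axis=0) - (w[:, None] * x_s).sum(axis=0) / N_i) @ B)
print(theta_greg, y_pop.mean())                # estimate vs true area mean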

Lehtonen and Veijanen (1999) introduce an assisting two-level model with area-specific regression coefficients, $E(y_{ij}) = \mathbf{x}_{ij}^T (\boldsymbol{\beta} + \mathbf{u}_i)$. In practice, not all coefficients need to be random, and models with area-specific intercepts mimicking linear mixed models may be used (see Lehtonen et al., 2003). In this case the GREG estimator takes the form (4.2.1) with $\hat{y}_{ij} = \mathbf{x}_{ij}^T (\hat{\boldsymbol{\beta}} + \hat{\mathbf{u}}_i)$. The estimates $\hat{\boldsymbol{\beta}}$ and $\hat{\mathbf{u}}_i$ are obtained using generalized least squares and restricted maximum likelihood methods (see Lehtonen and Pahkinen, 2004, Section 6.3).

If the assisting model is a fixed effects linear model with common regression parameters (as the one reviewed in Rao 2003, Section 2.5), the resulting small area estimators overlook the so-called ‘area effects’, that is, the between-area variation beyond that accounted for by the model covariates, and may therefore be inefficient. For this reason, model dependent estimators that rely on mixed

(random) effects models gained popularity in the small area literature (see Rao 2003; Jiang and

Lahiri 2006a).

The reliability of these methods hinges on the validity of the model assumptions, a criticism often raised within the design-based research tradition (Estevao and Sarndal 2004). GREG estimators assisted with linear mixed models have recourse to model-based estimation of the model parameters; the efficiency of the resulting small area estimators relies on the validity of the model assumptions, typically on the normality of residuals.

Design consistency is a general purpose form of protection against model failures, as it guarantees that, at least for large domains, estimates make sense even if the assumed model completely fails.

Model-based estimators using unit level models such as the popular nested error regression (Battese et al. 1988) typically do not make use of survey weights and, in general, the derived estimators are not design consistent unless the sampling design is self-weighting within areas. Modifications of

Empirical Best Linear Unbiased Predictors (EBLUPs) aimed at achieving design consistency have been proposed by Kott (1989), Prasad and Rao (1999) and You and Rao (2002). Although design consistent, these predictors are model-based and their statistical properties such as the bias and the

Mean Square Errors (MSE) are evaluated with respect to the distribution induced by the data generating process and not randomization. Jiang and Lahiri (2006b) obtained design consistent predictors also for generalized linear models and evaluated their corresponding MSEs with respect to the joint randomization-model distribution.

4.2.2 Model based estimators

Small area model based estimators use linear mixed models (LMMs) when the response variable is continuous. LMMs handle data in which observations are not independent; they are generalizations of linear models that better support the analysis of a dependent variable. These models make it possible to incorporate:

1) Random effects: sometimes the number of levels of a categorical explanatory variable is so large (with respect to sample size) that introduction of fixed effects for its levels would lead to poor estimates of the model parameters. If this is the case, the explanatory variable should not be introduced in a linear model. Mixed models solve this problem by treating the levels of the categorical variable as random, and then predicting their values.

2) Hierarchical effects: response variables are often measured at more than one level, for example in nested territories in small area estimation problems. This situation can be naturally modelled by mixed models, which is an appealing property of them.

3) Repeated measures: when several observations are collected on the same individual then the corresponding measurements are likely to be correlated rather than independent. This happens in longitudinal studies, time series data or matched-pairs designs.

4) Spatial correlations: when there is correlation among clusters due to their location; for example, the correlation between nearby domains may give useful information to improve predictions.

5) Small area estimation: where the flexibility in effectively combining different sources of information and explaining different sources of errors is of great help. Mixed models typically incorporate area-specific random effects that explain the additional between area variation in the data that is not explained by the fixed part of the model.

Books dealing with LMMs include Searle, Casella and McCulloch (1992), Longford (1995), McCulloch and Searle (2001) and Demidenko (2004).

4.2.2.1 Area level models for small area estimation

Let $\boldsymbol{\theta}$ be the $m \times 1$ vector of the parameters of inferential interest (typically the small area totals $y_i$ or the small area means $\bar{y}_i$, $i = 1, \ldots, m$) and assume that the direct estimator $\hat{\boldsymbol{\theta}}$ is available and design unbiased, i.e.

$$ \hat{\boldsymbol{\theta}} = \boldsymbol{\theta} + \mathbf{e}, \qquad (4.2.3) $$

where $\mathbf{e}$ is a vector of independent sampling errors with mean vector $\mathbf{0}$ and known diagonal variance matrix $\mathbf{R} = \mathrm{diag}(\varphi_i)$, with $\varphi_i$ representing the sampling variance of the direct estimator of the parameter of interest for area $i$. Usually $\varphi_i$ is unknown and is estimated by a variety of methods, including ‘generalized variance functions’; see Wolter (1985) and Wang and Fuller (2003) for details.

The basic area level model assumes that an $m \times p$ matrix of area-specific auxiliary variables (including an intercept term), $\mathbf{X}$, is linearly related to $\boldsymbol{\theta}$ as

$$ \boldsymbol{\theta} = \mathbf{X}\boldsymbol{\beta} + \mathbf{u}, \qquad (4.2.4) $$

where $\boldsymbol{\beta}$ is the $p \times 1$ vector of regression parameters and $\mathbf{u}$ is the $m \times 1$ vector of independent random area-specific effects with zero mean and $m \times m$ covariance matrix $\sigma_u^2 \mathbf{I}_m$, with $\mathbf{I}_m$ being the $m \times m$ identity matrix. The combined model (Fay and Herriot, 1979) can be written as

, (4.2.5) and it is a special case of linear mixed model. Under this model, the Empirical Best Linear

Unbiased Predictor (EBLUP)

J ˆ i

FH

( s

2 u

) is extensively used to obtain model based indirect estimators of small area parameters and associated measures of variability. This approach allows a combination of the survey data with other data sources in a synthetic regression fitted using population area-level covariates. The EBLUP estimate of

J i is a composite estimate of the form

, (4.2.6) i

= s ˆ u

2

(

2 u

+ j i

)

and is the weighted least squares estimate of with weights

( s ˆ

2 u

+ j i

) -

1

obtained regressing

J ˆ i

on x i

and s ˆ 2 u

is an estimate of the variance component s

2 u

. The

EBLUP estimate gives more weight to the synthetic estimate when the sampling variance, j i

, is large (or s ˆ u

2

is small) and moves towards the direct estimate as j i

decreases (or s ˆ 2 u

increases). For the non-sampled areas, the EBLUP estimate is given by the regression synthetic estimate, , using the known covariates associated with the non-sampled areas. Fuller (1981) applied the arealevel model (4.2.5) to estimate mean soybean hectares per segment in 1978 at the county level.
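To fix ideas, the following minimal numerical sketch (Python; illustrative data, with $\hat{\sigma}^2_u$ assumed to be already estimated, e.g. by ML, REML or a moment method) computes the composite estimate (4.2.6):

import numpy as np

def fay_herriot_eblup(theta_direct, X, phi, sigma2_u):
    """theta_direct: m direct estimates; X: m x p auxiliary matrix;
    phi: m known sampling variances; sigma2_u: estimated model variance."""
    w = 1.0 / (sigma2_u + phi)                    # WLS weights
    XtW = X.T * w
    beta = np.linalg.solve(XtW @ X, XtW @ theta_direct)
    gamma = sigma2_u / (sigma2_u + phi)           # shrinkage factors
    # Composite estimate: more weight on the synthetic part when phi_i is large.
    return gamma * theta_direct + (1.0 - gamma) * (X @ beta)

rng = np.random.default_rng(1)
m = 10
X = np.column_stack([np.ones(m), rng.normal(size=m)])
phi = rng.uniform(0.2, 1.0, size=m)               # known sampling variances
theta_direct = X @ np.array([1.0, 2.0]) + rng.normal(0.0, 1.0, size=m)
print(fay_herriot_eblup(theta_direct, X, phi, sigma2_u=0.5))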

Jiang et al. (2011) derive the best predictive estimator (BPE) of the fixed parameters under the Fay–

Herriot model and the nested-error regression model. This leads to a new prediction procedure, called observed best prediction (OBP), which is different from the empirical best linear unbiased prediction (EBLUP). The authors show that BPE is more reasonable than the traditional estimators derived from estimation considerations, such as maximum likelihood (ML) and restricted maximum likelihood (REML), if the main interest is estimation of small area means, which is a mixed-model prediction problem.

A Geographic Information System (GIS) is an automated information system that is able to compile, store, retrieve, analyze, and display mapped data. Using a GIS, the available data-set for the study can be combined with the relative map of the study area. This makes the following steps possible:

1. the geocoding of the study area allows for the computation of the coordinates of the centroid of each small area, its geometric properties (extension, perimeter, etc.) and the neighbourhood structure;

2. the study variable and the potential explanatory variables can be referred to the centroid of each small area; the result is an irregular lattice (geocoding).

A method to take into account spatial information is to include in the model some geographic covariates for each small area, by considering data regarding the spatial location (e.g. the centroid coordinates) and/or other auxiliary geographical variables referred to the same area through the use of a Geographic Information System. We expect the inclusion of such covariates to account for spatial interaction when this interaction is due to the covariates themselves. In this case it is reasonable to assume that the random small area effects are independent and that the traditional EBLUP is still a valid predictor.

The explicit modelling of spatial effects becomes necessary when (1) we have no geographic covariates able to take into account the spatial interaction in the target variable, or (2) we have some geographic covariates, but the spatial interaction is so important that the small area random effects are presumably still correlated. In this case, taking advantage of the information from related areas appears to be the best solution.

Salvati (2004), Singh et al. (2005) and Petrucci and Salvati (2006) proposed the introduction of spatial autocorrelation in small area estimation under the Fay-Herriot model. The spatial dependence among small areas is introduced by specifying a linear mixed model with spatially correlated random area effects for $\boldsymbol{\vartheta}$, i.e.:

$$\boldsymbol{\vartheta} = \mathbf{X}\boldsymbol{\beta} + \mathbf{D}\mathbf{v}, \qquad (4.2.7)$$

where $\mathbf{D}$ is an $m \times m$ matrix of known positive constants and $\mathbf{v}$ is an $m \times 1$ vector of spatially correlated random area effects given by the following autoregressive process with spatial autoregressive coefficient $\rho$ and $m \times m$ spatial interaction matrix $\mathbf{W}$ (see Cressie 1993 and Anselin 1992):

$$\mathbf{v} = \rho \mathbf{W}\mathbf{v} + \mathbf{u} \;\Rightarrow\; \mathbf{v} = (\mathbf{I}_m - \rho \mathbf{W})^{-1}\mathbf{u}. \qquad (4.2.8)$$

The W matrix describes the spatial interaction structure of the small areas, usually defined through the neighbourhood relationship between areas; generally speaking, W has a value of 1 in row i and column j if areas i and j are neighbours.

The autoregressive coefficient $\rho$ defines the strength of the spatial relationship among the random effects associated with neighbouring areas. Generally, for ease of interpretation, the spatial interaction matrix is defined in row standardized form, in which the row elements sum to one; in this case $\rho$ is called a spatial autocorrelation parameter (Banerjee et al. 2004).

Combining (4.2.3) and (4.2.7), the estimator with spatially correlated errors can be written as:

$$\hat{\boldsymbol{\vartheta}} = \mathbf{X}\boldsymbol{\beta} + \mathbf{D}\mathbf{v} + \mathbf{e}. \qquad (4.2.9)$$

The error term $\mathbf{v}$ has the $m \times m$ Simultaneously Autoregressive (SAR) covariance matrix:

$$\mathbf{\Gamma}(\boldsymbol{\delta}) = \sigma^2_u \left[ (\mathbf{I}_m - \rho \mathbf{W})^T (\mathbf{I}_m - \rho \mathbf{W}) \right]^{-1},$$

where $\boldsymbol{\delta} = (\sigma^2_u, \rho)^T$, and the covariance matrix of $\hat{\boldsymbol{\vartheta}}$ is given by $\mathbf{V}(\boldsymbol{\delta}) = \mathbf{R} + \mathbf{D}\mathbf{\Gamma}\mathbf{D}^T$.
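A minimal sketch of this covariance structure (Python; a toy 0/1 contiguity matrix, row standardized as discussed above, and illustrative parameter values) is:

import numpy as np

def row_standardize(W01):
    # Divide each row of a 0/1 contiguity matrix by its row sum
    # (every area is assumed to have at least one neighbour here).
    return W01 / W01.sum(axis=1, keepdims=True)

def sar_covariance(W, rho, sigma2_u):
    # Gamma(delta) = sigma2_u * [(I - rho W)^T (I - rho W)]^{-1}
    m = W.shape[0]
    A = np.eye(m) - rho * W
    return sigma2_u * np.linalg.inv(A.T @ A)

# Toy contiguity for 4 areas on a line: neighbours are adjacent areas.
W01 = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
G = sar_covariance(row_standardize(W01), rho=0.6, sigma2_u=1.0)
print(np.round(G, 3))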

Under model (4.2.9), the Spatial Empirical Best Linear Unbiased Predictor (SEBLUP) estimator of $\vartheta_i$ is:

$$\hat{\vartheta}_i^{SEBLUP}(\hat{\boldsymbol{\delta}}) = \mathbf{x}_i^T \tilde{\boldsymbol{\beta}} + \mathbf{b}_i^T \mathbf{D}\hat{\mathbf{\Gamma}}\mathbf{D}^T \hat{\mathbf{V}}^{-1}\left(\hat{\boldsymbol{\vartheta}} - \mathbf{X}\tilde{\boldsymbol{\beta}}\right), \qquad (4.2.10)$$

where $\tilde{\boldsymbol{\beta}}$ is the generalized least squares estimator of $\boldsymbol{\beta}$ and $\mathbf{b}_i^T$ is a $1 \times m$ vector $(0, 0, \ldots, 0, 1, 0, \ldots, 0)$ with value 1 in the $i$-th position. The predictor is obtained from Henderson's (1975) results for general linear mixed models involving fixed and random effects.

In the SEBLUP estimator the value of $\boldsymbol{\delta} = (\sigma^2_u, \rho)^T$ is obtained either by Maximum Likelihood (ML) or Restricted Maximum Likelihood (REML) methods based on the normality assumption of the random effects. For more details see Singh et al. (2005) and Pratesi and Salvati (2008). Petrucci et al. (2005) applied the SEBLUP to estimate the average production of olives per farm in 42 Local Economic Systems of the Tuscany region (Italy). The authors remark that the introduction of geographic information improves the estimates obtained by SEBLUP by reducing the mean squared errors.

An alternative approach for introducing the spatial correlation in an area level model is proposed by Giusti et al. (2012) with a semiparametric specification of the Fay-Herriot model obtained by P-splines, which allows non-linearities in the relationship between the response variable and the auxiliary variables. A semiparametric additive model (referred to as the semiparametric model hereafter) with one covariate $x_1$ can be written as $f(x_1)$, where the function $f(\cdot)$ is unknown, but assumed to be sufficiently well approximated by the function:

$$f(x_1; \boldsymbol{\beta}, \boldsymbol{\gamma}) = \beta_0 + \beta_1 x_1 + \cdots + \beta_q x_1^q + \sum_{k=1}^{K} \gamma_k (x_1 - \kappa_k)_+^q, \qquad (4.2.11)$$

where $\boldsymbol{\beta} = (\beta_0, \ldots, \beta_q)^T$ is the $(q+1) \times 1$ vector of the coefficients of the polynomial function, $\boldsymbol{\gamma} = (\gamma_1, \ldots, \gamma_K)^T$ is the coefficient vector of the truncated polynomial spline basis (P-spline), $q$ is the degree of the spline and $(t)_+^q = t^q$ if $t > 0$, 0 otherwise. The latter portion of the model allows for handling departures from a $q$-degree polynomial in the structure of the relationship. In this portion $\kappa_k$ for $k = 1, \ldots, K$ is a set of fixed knots; if $K$ is sufficiently large, the class of functions in (4.2.11) is very large and can approximate most smooth functions. Details on the choice of bases and knots can be found in Ruppert et al. (2003).

Since a P-spline model can be viewed as a random effects model (Ruppert et al. 2003; Opsomer et al. 2008), it can be combined with the Fay-Herriot model for obtaining a semiparametric small area estimation framework based on linear mixed model regression.

Corresponding to the $\boldsymbol{\beta}$ and $\boldsymbol{\gamma}$ vectors define:

$$\mathbf{X}_1 = \begin{bmatrix} 1 & x_{11} & \cdots & x_{11}^q \\ \vdots & \vdots & & \vdots \\ 1 & x_{1m} & \cdots & x_{1m}^q \end{bmatrix}, \qquad \mathbf{Z} = \begin{bmatrix} (x_{11} - \kappa_1)_+^q & \cdots & (x_{11} - \kappa_K)_+^q \\ \vdots & & \vdots \\ (x_{1m} - \kappa_1)_+^q & \cdots & (x_{1m} - \kappa_K)_+^q \end{bmatrix}.$$
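The following minimal sketch (Python; knots placed at the sample quantiles of the covariate, one common choice) builds the $\mathbf{X}_1$ and $\mathbf{Z}$ design matrices of the truncated polynomial basis in (4.2.11):

import numpy as np

def pspline_design(x1, q=1, K=5):
    """X1: polynomial part (1, x, ..., x^q); Z: truncated power basis
    (x - kappa_k)_+^q with K interior knots at sample quantiles."""
    knots = np.quantile(x1, np.linspace(0, 1, K + 2)[1:-1])
    X1 = np.vander(x1, q + 1, increasing=True)
    Z = np.maximum(x1[:, None] - knots[None, :], 0.0) ** q
    return X1, Z, knots

x1 = np.linspace(0, 1, 12)
X1, Z, knots = pspline_design(x1, q=1, K=3)
print(X1.shape, Z.shape, knots)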

Following the same notation already introduced in the previous section, the mixed model representation of the semiparametric Fay-Herriot model (NPFH) can be written as:

$$\hat{\boldsymbol{\vartheta}} = \mathbf{X}_1 \boldsymbol{\beta} + \mathbf{Z}\boldsymbol{\gamma} + \mathbf{u} + \mathbf{e}. \qquad (4.2.12)$$

The $\mathbf{X}_1$ matrix of model (4.2.12) can be added to the $\mathbf{X}$ effect matrix, and model (4.2.12) becomes:

$$\hat{\boldsymbol{\vartheta}} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\boldsymbol{\gamma} + \mathbf{u} + \mathbf{e}, \qquad (4.2.13)$$

where $\boldsymbol{\beta}$ is a $(p + q + 1) \times 1$ vector of regression coefficients and the component $\boldsymbol{\gamma}$ can be treated as a $K \times 1$ vector of independent and identically distributed random variables with mean 0 and $K \times K$ variance matrix $\sigma^2_\gamma \mathbf{I}_K$. The covariance matrix of model (4.2.13) is $\mathbf{V}(\boldsymbol{\psi}) = \sigma^2_\gamma \mathbf{Z}\mathbf{Z}^T + \sigma^2_u \mathbf{I}_m + \mathbf{R}$, where $\boldsymbol{\psi} = (\sigma^2_\gamma, \sigma^2_u)$.

Model-based estimation of the small area parameters can be obtained by using the empirical best linear unbiased prediction (Henderson, 1975):

$$\hat{\vartheta}_i^{NPFH} = \mathbf{x}_i^T \hat{\boldsymbol{\beta}} + \mathbf{z}_i^T \hat{\boldsymbol{\gamma}} + \hat{u}_i, \qquad (4.2.14)$$

with $\mathbf{x}_i$ and $\mathbf{z}_i$ the $i$-th rows of $\mathbf{X}$ and $\mathbf{Z}$, and $\hat{\boldsymbol{\gamma}}$ and $\hat{u}_i$ the predicted random effects. Hereafter this estimator is called NPEBLUP. Assuming normality of the random effects, $\sigma^2_\gamma$ and $\sigma^2_u$ can be estimated by both the Maximum Likelihood and Restricted Maximum Likelihood procedures (Prasad and Rao 1990).

When geographically referenced responses play a central role in the analysis and need to be converted to maps, we can utilise bivariate smoothing. This is the case in environmental and agricultural application fields. P-splines rely on a set of basis functions to handle non-linear structures in the data, so bivariate basis functions are required for bivariate smoothing.

In practical applications it is important to complement the estimates obtained using the EBLUP, the Spatial EBLUP estimator and the Semiparametric Fay-Herriot estimator with an estimate of their variability. For these estimators an approximately unbiased analytical estimator of the MSE is:

$$mse\left[\hat{\vartheta}_i(\hat{\boldsymbol{\omega}})\right] = g_{1i}(\hat{\boldsymbol{\omega}}) + g_{2i}(\hat{\boldsymbol{\omega}}) + 2\,g_{3i}(\hat{\boldsymbol{\omega}}), \qquad (4.2.15)$$

where $\hat{\boldsymbol{\omega}}$ is equal to $\hat{\sigma}^2_u$ when the EBLUP $\hat{\vartheta}_i^{FH}$ is used, to $\hat{\boldsymbol{\delta}} = (\hat{\sigma}^2_u, \hat{\rho})^T$ when considering the SEBLUP estimator, or to $\hat{\boldsymbol{\psi}} = (\hat{\sigma}^2_\gamma, \hat{\sigma}^2_u)$ when the estimator is the semiparametric Fay-Herriot one. The MSE estimator (4.2.15) is of the same form as that derived by Prasad and Rao (1990); for more details on the specification of the $g$ components under both models see Pratesi and Salvati (2009) and Giusti et al. (2012). For a detailed discussion of the MSE and its estimation for the EBLUP based on the traditional Fay-Herriot model (section 2.1) see Rao (2003).

An alternative procedure for estimating the MSE of these estimators can be based on the bootstrap procedures proposed by Gonzalez-Manteiga et al. (2007), Molina et al. (2009) and Opsomer et al. (2008).

All the predictors above are useful for estimating the small area parameters efficiently when the model assumptions hold, but they can be very sensitive to 'representative' outliers or departures from the assumed normal distributions for the random effects in the model. Chambers (1986) defines a representative outlier as a "sample element with a value that has been correctly recorded and that cannot be regarded as unique. In particular, there is no reason to assume that there are no more similar outliers in the nonsampled part of the population." Welsh and Ronchetti (1998) regard representative outliers as "extreme relative to the bulk of the data." That is, the deviations from the underlying distributions or assumptions refer to the fact that a small proportion of the data may come from an arbitrary distribution rather than the underlying "true" distribution, which may result in outliers or influential observations in the data. For these reasons, Sinha and Rao (2009) and Schmid and Münnich (2013) propose robust versions of the EBLUP and the SEBLUP, respectively.

4.2.2.2 Unit level mixed models for small area estimation

Let $\mathbf{x}_{ij}$ denote a vector of $p$ auxiliary variables for each population unit $j$ in small area $i$ and assume that information for the variable of interest $y$ is available only from the sample. The target is to use the data to estimate various area-specific quantities. A popular approach for this purpose is to use mixed effects models with random area effects. A linear mixed effects model is

$$y_{ij} = \mathbf{x}_{ij}^T \boldsymbol{\beta} + d_{ij} u_i + e_{ij}, \qquad (4.2.16)$$

where $\boldsymbol{\beta}$ is the $p \times 1$ vector of regression coefficients, $u_i$ denotes a random area effect that characterizes differences in the conditional distribution of $y$ given $\mathbf{x}$ between the $m$ small areas, $d_{ij}$ is a constant whose value is known for all units in the population and $e_{ij}$ is the error term associated with the $j$-th unit within the $i$-th area. Conventionally, $u_i$ and $e_{ij}$ are assumed to be independent and normally distributed with mean zero and variances $\sigma^2_u$ and $\sigma^2_e$ respectively. The Empirical Best Linear Unbiased Predictor (EBLUP) of the mean for small area $i$ (Battese et al. 1988; Rao 2003) is then

$$\hat{\vartheta}_i^{MX}(\hat{\sigma}^2_u, \hat{\sigma}^2_e) = N_i^{-1} \left[ \sum_{j \in s_i} y_{ij} + \sum_{j \in r_i} \hat{y}_{ij} \right], \qquad (4.2.17)$$

where $\hat{y}_{ij} = \mathbf{x}_{ij}^T \hat{\boldsymbol{\beta}} + d_{ij}\hat{u}_i$, $s_i$ denotes the $n_i$ sampled units in area $i$, $r_i$ denotes the remaining $N_i - n_i$ units in area $i$, and $\hat{\boldsymbol{\beta}}$ and $\hat{u}_i$ are obtained by substituting an optimal estimate of the covariance matrix of the random effects in (4.2.16) into the best linear unbiased estimator of $\boldsymbol{\beta}$ and the best linear unbiased predictor of $u_i$ respectively. Battese et al. (1988) applied the nested error regression model (4.2.16) to estimate the area under corn and soybeans for each of $m = 12$ counties in North Central Iowa, using farm interview data in conjunction with LANDSAT satellite data.
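A minimal sketch of the predictor (4.2.17) (Python; it assumes $d_{ij} = 1$ and takes $\hat{\boldsymbol{\beta}}$ and $\hat{u}_i$ as given, since the fitting step is standard mixed model estimation) is:

import numpy as np

def eblup_area_mean(y_s, X_s, X_r, beta_hat, u_hat_i):
    """y_s: sampled y in area i; X_s/X_r: covariates of sampled and
    non-sampled units; beta_hat, u_hat_i: estimated fixed/random effects."""
    N_i = len(y_s) + X_r.shape[0]
    y_r_hat = X_r @ beta_hat + u_hat_i            # predictions for r_i
    return (y_s.sum() + y_r_hat.sum()) / N_i

rng = np.random.default_rng(2)
beta_hat, u_hat_i = np.array([1.0, 2.0]), 0.3
X_s = np.column_stack([np.ones(5), rng.normal(size=5)])     # n_i = 5 sampled
X_r = np.column_stack([np.ones(45), rng.normal(size=45)])   # N_i - n_i = 45
y_s = X_s @ beta_hat + u_hat_i + rng.normal(0.0, 1.0, size=5)
print(eblup_area_mean(y_s, X_s, X_r, beta_hat, u_hat_i))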

Sinha and Rao (2009) proposed a robust version of (4.2.17) that offers good performance in the presence of outlying values.

As in the area level models, model (4.2.16) can be extended to allow for correlated random area effects. Let the deviations $\mathbf{v}$ from the fixed part of the model be the result of an autoregressive process with parameter $\rho$ and proximity matrix $\mathbf{W}$ (Cressie 1993); then $\mathbf{v} = \rho \mathbf{W}\mathbf{v} + \mathbf{u} \Rightarrow \mathbf{v} = (\mathbf{I}_m - \rho \mathbf{W})^{-1}\mathbf{u}$. The matrix $(\mathbf{I}_m - \rho \mathbf{W})$ needs to be strictly positive definite to ensure the existence of $(\mathbf{I}_m - \rho \mathbf{W})^{-1}$. This happens if $\rho \in (1/\lambda_{\min}, 1/\lambda_{\max})$, where the $\lambda_i$'s are the eigenvalues of the matrix $\mathbf{W}$. The model with spatially correlated errors can be expressed as

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{D}\mathbf{v} + \mathbf{e}, \qquad (4.2.18)$$

with $\mathbf{e}$ independent of $\mathbf{v}$. Under (4.2.18), the Spatial Best Linear Unbiased Predictor (Spatial BLUP) of the small area mean and its empirical version (SEBLUP) are obtained following Henderson (1975). In particular, the SEBLUP of the small area mean is:

$$\hat{\vartheta}_i^{MX/SAR} = N_i^{-1} \left[ \sum_{j \in s_i} y_{ij} + \sum_{j \in r_i} \hat{y}_{ij} \right], \qquad (4.2.19)$$

where $\hat{y}_{ij} = \mathbf{x}_{ij}^T \hat{\boldsymbol{\beta}} + \hat{v}_i$, with $\hat{v}_i$ the predicted spatially correlated area effect, $\hat{\mathbf{V}} = \hat{\sigma}^2_e \mathbf{I}_n + \mathbf{D}\hat{\mathbf{G}}\mathbf{D}^T$, $\hat{\mathbf{G}} = \hat{\sigma}^2_u \left[ (\mathbf{I}_m - \hat{\rho}\mathbf{W})^T (\mathbf{I}_m - \hat{\rho}\mathbf{W}) \right]^{-1}$, $\mathbf{y}$ is the $n \times 1$ vector of sampled values, $(\hat{\sigma}^2_u, \hat{\sigma}^2_e, \hat{\rho})$ are asymptotically consistent estimators of the parameters obtained by Maximum Likelihood (ML) or Restricted Maximum Likelihood (REML) estimation, and $\mathbf{b}_i^T$ is the $1 \times m$ vector $(0, 0, \ldots, 1, \ldots, 0)$ with value 1 in the $i$-th position. For the MSE of the predictors (4.2.17) and (4.2.19) see Prasad and Rao (1990), Singh et al. (2005) and Pratesi and Salvati (2008). In sampling rare populations there are often many out of sample areas, that is, areas with no sampled units even though they contain population units with the characteristic of interest. For example, in the Farm Structure Survey in Italy it happens that, while in some municipalities there are multifunctional farms, there are no multifunctional farms among the sampled units in those municipalities.


The out of sample areas problem can be addressed by specifying a nested error unit level regression model with dependent area level random effects. Allowing area random effects to be spatially correlated, the Empirical Best Linear Unbiased Predictions for the area parameters can be computed, taking into account also the contribution of the random part of the model, for sampled areas as well as out of sample areas (Saei and Chambers 2005).

An alternative approach for incorporating the spatial information in the model is by assuming that the regression coefficients vary spatially across the geography of interest. Models of this type can be fitted using geographically weighted regression (GWR), and are suitable for modelling spatial nonstationarity (Brunsdon et al. 1998; Fotheringham et al. 2002). Chandra et al. (2012) proposed a geographically weighted empirical best linear unbiased predictor (GWEBLUP) for a small area average and an estimator of its conditional mean squared error.

4.2.2.3 Unit level M-quantile models for small area estimation

A recently proposed approach to small area estimation is based on the use of M-quantile models (Chambers and Tzavidis, 2006). A linear M-quantile regression model is one where the $q$-th M-quantile $Q_q(\mathbf{x}; \psi)$ of the conditional distribution of $y$ given $\mathbf{x}$ satisfies

$$Q_q(\mathbf{x}; \psi) = \mathbf{x}^T \boldsymbol{\beta}_\psi(q). \qquad (4.2.20)$$

Here $\psi$ denotes the influence function associated with the M-quantile. For specified $q$ and continuous $\psi$, an estimate $\hat{\boldsymbol{\beta}}_\psi(q)$ of $\boldsymbol{\beta}_\psi(q)$ is obtained via iterative weighted least squares, as sketched below.
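The following minimal sketch (Python; Huber influence function and MAD scale estimate, as is customary, with invented data) implements this iterative weighted least squares step:

import numpy as np

def huber_psi(t, c=1.345):
    # Huber influence function: psi(t) = t for |t| <= c, c*sgn(t) otherwise.
    return np.clip(t, -c, c)

def mquantile_fit(X, y, q=0.5, c=1.345, n_iter=50):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        r = y - X @ beta
        s = np.median(np.abs(r)) / 0.6745 + 1e-12   # MAD scale estimate
        t = r / s
        tilt = np.where(t > 0, q, 1 - q)            # asymmetric q-tilting
        t_safe = np.where(np.abs(t) < 1e-8, 1e-8, t)
        w = 2.0 * tilt * huber_psi(t_safe, c) / t_safe   # IRLS weights
        XtW = X.T * w
        beta = np.linalg.solve(XtW @ X, XtW @ y)
    return beta

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = X @ np.array([1.0, 2.0]) + rng.standard_t(df=3, size=200)
print(mquantile_fit(X, y, q=0.75))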

Following Chambers and Tzavidis (2006), an alternative to random effects for characterizing the variability across the population is to use the M-quantile coefficients of the population units. For unit $j$ in area $i$ with values $y_{ij}$ and $\mathbf{x}_{ij}$, this coefficient is the value $q_{ij}$ such that $Q_{q_{ij}}(\mathbf{x}_{ij}; \psi) = y_{ij}$. These authors observed that if a hierarchical structure does explain part of the variability in the population data, units within clusters (areas) defined by this hierarchy are expected to have similar M-quantile coefficients.

When the conditional M-quantiles are assumed to follow a linear model, with $\boldsymbol{\beta}_\psi(q)$ a sufficiently smooth function of $q$, this suggests a predictor of the small area parameter of the form

$$\hat{\vartheta}_i^{MQ} = N_i^{-1} \left[ \sum_{j \in s_i} y_{ij} + \sum_{j \in r_i} \mathbf{x}_{ij}^T \hat{\boldsymbol{\beta}}_\psi(\hat{q}_i) \right], \qquad (4.2.21)$$

where $\hat{q}_i$ is an estimate of the average value of the M-quantile coefficients of the units in area $i$. Typically this is the average of the estimated coefficients of the sample units in the area; these estimates are obtained by solving $\hat{Q}_{q_{ij}}(\mathbf{x}_{ij}; \psi) = y_{ij}$, with $\hat{Q}_q$ denoting the estimated value of (4.2.20) at $q$. When there is no sample in area $i$ then $\hat{q}_i = 0.5$.

Tzavidis et al. (2010) refer to (4.2.21) as the 'naïve' M-quantile predictor and note that this can be biased. To rectify this problem these authors propose a bias adjusted M-quantile predictor of the small area parameter that is derived as the mean functional of the Chambers and Dunstan (1986) (CD hereafter) estimator of the distribution function and is given by

$$\hat{\vartheta}_i^{MQ/CD} = N_i^{-1} \left[ \sum_{j \in s_i} y_{ij} + \sum_{j \in r_i} \hat{y}_{ij} + \frac{N_i - n_i}{n_i} \sum_{j \in s_i} (y_{ij} - \hat{y}_{ij}) \right], \qquad (4.2.22)$$

where $\hat{y}_{ij} = \mathbf{x}_{ij}^T \hat{\boldsymbol{\beta}}_\psi(\hat{q}_i)$.

Note that, under simple random sampling within the small areas, (4.2.22) is also derived from the expected value functional of the area i version of the Rao et al. (1990) distribution function estimator, which is a design-consistent and model-consistent estimator of the finite population distribution function.

SAR mixed models are global models, i.e. with such models we assume that the relationship we are modelling holds everywhere in the study area, and we allow for spatial correlation at different hierarchical levels in the error structure. One way of incorporating the spatial structure of the data in the M-quantile small area model is via an M-quantile GWR model (Salvati et al. 2012). Unlike SAR mixed models, M-quantile GWR models are local models that allow for a spatially nonstationary process in the mean structure of the model.

Given $n$ observations at a set of $L$ locations $\{h_l;\, l = 1, \ldots, L;\, L \le n\}$, with $n_l$ data values $\{(y_{jl}, \mathbf{x}_{jl});\, j = 1, \ldots, n_l\}$ observed at location $h_l$, an M-quantile GWR model is defined as

$$Q_q(\mathbf{x}_{jl}; \psi, h_l) = \mathbf{x}_{jl}^T \boldsymbol{\beta}_\psi(h_l; q), \qquad (4.2.23)$$

where $\boldsymbol{\beta}_\psi(h; q)$ now varies with $h$ as well as with $q$. The M-quantile GWR is a local model for the entire conditional distribution (not just the mean) of $y$ given $\mathbf{x}$. Estimates of $\boldsymbol{\beta}_\psi(h; q)$ in (4.2.23) can be obtained by solving

$$\sum_{l=1}^{L} \sum_{j=1}^{n_l} \psi_q \left\{ y_{jl} - \mathbf{x}_{jl}^T \boldsymbol{\beta}_\psi(h; q) \right\} w(h_l, h)\, \mathbf{x}_{jl} = \mathbf{0}, \qquad (4.2.24)$$

where $\psi_q(t) = 2\psi(s^{-1} t)\left\{ q I(t > 0) + (1 - q) I(t \le 0) \right\}$ and $s$ is a suitable robust estimate of scale, such as the median absolute deviation (MAD) estimate. It is also customary to assume a Huber type influence function, $\psi(t) = t I(-c \le t \le c) + c\, \mathrm{sgn}(t) I(|t| > c)$, although other influence functions are also possible. Provided $c$ is bounded away from zero, an iteratively reweighted least squares algorithm can then be used to solve (4.2.24), leading to estimates of the form

$$\hat{\boldsymbol{\beta}}_\psi(h; q) = \left\{ \mathbf{X}^T \mathbf{W}^*(h; q)\, \mathbf{X} \right\}^{-1} \mathbf{X}^T \mathbf{W}^*(h; q)\, \mathbf{y}. \qquad (4.2.25)$$

In (4.2.25), $\mathbf{y}$ is the vector of the $n$ sample $y$ values and $\mathbf{X}$ is the corresponding design matrix of order $n \times p$ of sample $\mathbf{x}$ values. The matrix $\mathbf{W}^*(h; q)$ is a diagonal matrix of order $n$ whose entry for a particular sample observation is the product of that observation's spatial weight, which depends on its distance from location $h$, with the weight that the observation has when the sample data are used to calculate the 'spatially stationary' M-quantile estimate.

At this point we should mention that the spatial weight is derived from a spatial weighting function whose value depends on the distance from sample location $h_l$ to $h$, such that sample observations with locations close to $h$ receive more weight than those further away. One popular approach to defining such a weighting function is to use

$$w(h_l, h) = \exp\left\{ -0.5 \left( d(h_l, h)/b \right)^2 \right\},$$

where $d(h_l, h)$ denotes the Euclidean distance between $h_l$ and $h$ and $b$ is the bandwidth, which can be optimally defined using a least squares criterion (Fotheringham et al. 2002). It should be noted, however, that alternative weighting functions, for example the bi-square function, can also be used.
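A minimal sketch of this Gaussian weighting function (Python; coordinates and bandwidth are illustrative) is:

import numpy as np

def gwr_weights(locations, h, bandwidth):
    """locations: L x 2 array of sample coordinates; h: target location.
    Returns w(h_l, h) = exp{-0.5 (d(h_l, h)/b)^2} for each sample location."""
    d = np.linalg.norm(locations - h, axis=1)     # Euclidean distances
    return np.exp(-0.5 * (d / bandwidth) ** 2)

rng = np.random.default_rng(4)
locs = rng.uniform(0, 10, size=(8, 2))
print(np.round(gwr_weights(locs, h=np.array([5.0, 5.0]), bandwidth=2.0), 3))

In practice the bandwidth would not be fixed as here but chosen by the least squares criterion mentioned above.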

Following Chambers and Tzavidis (2006), the estimation of small area parameters can be done by first estimating the M-quantile GWR coefficients $\{q_{ij};\, ij \in s\}$ of the sampled population units without reference to the small areas of interest. A grid-based interpolation procedure for doing this under (4.2.20) is described in Chambers and Tzavidis (2006) and can be directly used with the M-quantile GWR model.

In particular, we adapt this approach to M-quantile GWR models by first defining a fine grid of $q$ values over the interval $(0, 1)$ and then using the sample data to fit (4.2.23) for each distinct value of $q$ on this grid and at each sample location. The M-quantile GWR coefficient for unit $j$ in area $i$ with values $y_{ij}$ and $\mathbf{x}_{ij}$ at location $h_{ij}$ is computed by interpolating over this grid to find the value $q_{ij}$ such that $\hat{Q}_{q_{ij}}(\mathbf{x}_{ij}; \psi, h_{ij}) = y_{ij}$.

Provided there are sample observations in area $i$, an area-specific M-quantile GWR coefficient, $\hat{q}_i$, can be defined as the average value of the sample M-quantile GWR coefficients in area $i$.

Following Salvati et al. (2012), the bias-adjusted M-quantile GWR predictor of the mean in small area $i$ is

$$\hat{\vartheta}_i^{MQGWR/CD} = N_i^{-1} \left[ \sum_{j \in s_i} y_{ij} + \sum_{j \in r_i} \hat{Q}_{\hat{q}_i}(\mathbf{x}_{ij}; \psi, h_{ij}) + \frac{N_i - n_i}{n_i} \sum_{j \in s_i} \left\{ y_{ij} - \hat{Q}_{\hat{q}_i}(\mathbf{x}_{ij}; \psi, h_{ij}) \right\} \right], \qquad (4.2.26)$$

where $\hat{Q}_{\hat{q}_i}(\mathbf{x}_{ij}; \psi, h_{ij})$ is defined via the model (4.2.23). For details on the MSE estimator of the predictor (4.2.26) see Salvati et al. (2012). This MSE estimator is obtained following Chambers et al. (2011), who proposed a method of mean squared error (MSE) estimation for estimators of finite population domain means that can be expressed in pseudo-linear form, i.e. as weighted sums of sample values. In particular, it can be used for estimating the MSE of the empirical best linear unbiased predictor, the model-based direct estimator and the M-quantile predictor.

The proposed outlier robust small area estimators can be substantially biased when outliers are drawn from a distribution that has a different mean from that of the rest of the survey data. Chambers et al. (2013) proposed an outlier robust bias correction for these estimators. Moreover, the authors proposed two different analytical mean squared error estimators for the ensuing bias-corrected outlier robust estimators.

4.2.2.4 Unit level nonparametric small area models

Although very useful in many estimation contexts, linear mixed models depend on distributional assumptions for the random part of the model and do not easily allow for outlier robust inference.

In addition, the fixed part of the model may not be flexible enough to handle estimation contexts in which the relationship between the variable of interest and some covariates is more complex than a linear model. Opsomer et al. (2008) usefully extend model (4.2.16) to the case in which the small area random effects can be combined with a smooth, non-parametrically specified trend. In particular, in the simplest case

$$y_{ij} = m(x_{1,ij}) + d_{ij} u_i + e_{ij}, \qquad (4.2.27)$$

where $m(\cdot)$ is an unknown smooth function of the variable $x_1$. The estimator of the small area mean is

$$\hat{\vartheta}_i^{NPMX} = N_i^{-1} \left[ \sum_{j \in s_i} y_{ij} + \sum_{j \in r_i} \hat{y}_{ij} \right], \qquad (4.2.28)$$

as in (4.2.17), where now $\hat{y}_{ij} = \hat{m}(x_{1,ij}) + d_{ij}\hat{u}_i$. By using penalized splines as the representation for the nonparametric trend, Opsomer et al. (2008) express the non-parametric small area estimation problem as a mixed effect model regression, which can be easily extended to handle bivariate smoothing and additive modelling. Ugarte et al. (2009) proposed a penalized spline model to analyse trends in small areas and to forecast future values of the response, and derived the prediction mean squared error (MSE) for the fitted and the predicted values together with estimators for those quantities.

Pratesi et al. (2008) have extended this approach to the M-quantile method for the estimation of the small area parameters using a nonparametric specification of the conditional M-quantile of the response variable given the covariates. When the functional form of the relationship between the $q$-th M-quantile and the covariates deviates from the assumed one, the traditional M-quantile regression can lead to biased estimators of the small area parameters. Using P-splines for M-quantile regression, beyond having the properties of M-quantile models, allows for dealing with an undefined functional relationship that can be estimated from the data. When the relationship between the $q$-th M-quantile and the covariates is not linear, a P-splines M-quantile regression model may have significant advantages compared to the linear M-quantile model.

The small area estimator of the mean may be taken as in (4.2.21), where the unobserved value for population unit $j \in r_i$ is predicted using

$$\hat{y}_{ij} = \mathbf{x}_{ij}^T \hat{\boldsymbol{\beta}}_\psi(\hat{q}_i) + \mathbf{z}_{ij}^T \hat{\boldsymbol{\nu}}_\psi(\hat{q}_i),$$

where $\hat{\boldsymbol{\beta}}_\psi(\hat{q}_i)$ and $\hat{\boldsymbol{\nu}}_\psi(\hat{q}_i)$ are the coefficient vectors of the parametric and spline portion, respectively, of the fitted P-splines M-quantile regression function at $\hat{q}_i$. In the case of P-splines M-quantile regression models, the bias-adjusted estimator for the mean is given by

$$\hat{\vartheta}_i^{NPMQ/CD} = \frac{1}{N_i} \left\{ \sum_{j \in U_i} \hat{y}_{ij} + \frac{N_i}{n_i} \sum_{j \in s_i} (y_{ij} - \hat{y}_{ij}) \right\}, \qquad (4.2.29)$$

where $\hat{y}_{ij}$ denotes the predicted values for the population units in $s_i$ and in $U_i$. The use of bivariate P-spline approximations to fit nonparametric unit level nested error and M-quantile regression models allows for reflecting spatial variation in the data; these nonparametric models can then be used for small area estimation.

4.2.2.5 A note on small area estimation for out of sample areas

In some situations we are interested in estimating small area characteristics for domains (areas) with no sample observations. The conventional approach to estimating a small area characteristic, say the mean, in this case is synthetic estimation.

Under the mixed model (4.2.16) or the SAR mixed model (4.2.18), the synthetic mean predictor for out of sample area $i$ is

$$\hat{\vartheta}_i^{SYNTH} = N_i^{-1} \sum_{j \in U_i} \mathbf{x}_{ij}^T \hat{\boldsymbol{\beta}}, \qquad (4.2.30)$$

where $\hat{\boldsymbol{\beta}}$ is the estimated vector of regression coefficients. Under the M-quantile model (4.2.20), the synthetic mean predictor for out of sample area $i$ is

$$\hat{\vartheta}_i^{MQ/SYNTH} = N_i^{-1} \sum_{j \in U_i} \mathbf{x}_{ij}^T \hat{\boldsymbol{\beta}}_\psi(0.5). \qquad (4.2.31)$$

We note that with synthetic estimation all variation in the area-specific predictions comes from the area-specific auxiliary information. One way of potentially improving the conventional synthetic estimation for out of sample areas is by using a model that borrows strength over space, such as an M-quantile GWR model or the nonparametric unit level nested error and M-quantile regression models. In this case synthetic-type mean predictors for out of sample area $i$ are defined by

$$\hat{\vartheta}_i^{MQGWR/SYNTH} = N_i^{-1} \sum_{j \in U_i} \hat{Q}_{0.5}(\mathbf{x}_{ij}; \psi, h_{ij}), \qquad (4.2.32)$$

$$\hat{\vartheta}_i^{NPMX/SYNTH} = N_i^{-1} \sum_{j \in U_i} \hat{m}(x_{1,ij}), \qquad (4.2.33)$$

$$\hat{\vartheta}_i^{NPMQ/SYNTH} = N_i^{-1} \sum_{j \in U_i} \left[ \mathbf{x}_{ij}^T \hat{\boldsymbol{\beta}}_\psi(0.5) + \mathbf{z}_{ij}^T \hat{\boldsymbol{\nu}}_\psi(0.5) \right]. \qquad (4.2.34)$$

4.2.2.6 A note on Bayesian small area estimation methods

Bayesian alternatives of both the non-spatial and spatial mixed effects models for small area estimation have been proposed (see, for example, Datta and Ghosh 1991; Ghosh et al. 1998; Datta and Ghosh 2012; and Rao 2003 for a review). In particular, Bayesian small area spatial modelling has already been successful in other similar contexts, such as the estimation of the rate of disease in different geographic regions (Best et al. 2005). Complex mixed effects structures and correlation between areas can be easily handled and modelled hierarchically in different layers of the model.

Although implementation of complex Bayesian models requires computationally intensive Markov

Chain Monte Carlo simulation algorithms (Gilks et al. 1995), there are a number of potential benefits of the Bayesian approach for small area estimation. Gomez-Rubio et al. (2010) present these advantages:

1) It offers a coherent framework that can handle different types of target variable (e.g. continuous, dichotomous, categorical), different random effects structures (e.g. independent, spatially correlated), areas with no direct survey information, models to smooth the survey sample variance estimates, and so on, in a consistent way using the same computational methods and software whatever the model.

2) Uncertainty about all model parameters is automatically captured by the posterior distribution of the small area estimates and any functions of these (such as their rank), and by the predictive distribution of estimates for small areas not included in the survey sample.

3) Bayesian methods are particularly well suited to sparse data problems (for example, when the survey sample size per area is small) since Bayesian posterior inference is exact and

does not rely on asymptotic arguments.

4) The posterior distribution obtained from a Bayesian model also provides a much richer output than the traditional point and interval estimates from a corresponding likelihood-based model. In particular, the ability to make direct probability statements about unknown quantities (for example, the probability that the target variable exceeds some specified threshold in each area) and to quantify all sources of uncertainty in the model make Bayesian small area estimation well suited to informing and evaluating policy decisions.

4.2.3 Small area model for binary and count data

Let $y_{ij}$ be the value of the outcome of interest, a discrete or a categorical variable, for unit $j$ in area $i$, and let $\mathbf{x}_{ij}$ denote a $p \times 1$ vector of unit level covariates (including an intercept). Working within a frequentist paradigm, one can follow Jiang and Lahiri (2001), who propose an empirical best predictor (EBP) for a binary response, or Jiang (2003), who extends these results to generalized linear mixed models (GLMMs). Nevertheless, use of the EBP can be computationally challenging (Molina and Rao, 2010). Despite their attractive properties as far as modelling non-normal outcomes is concerned, fitting GLMMs requires numerical approximations. In particular, the likelihood function defined by a GLMM can involve high-dimensional integrals which cannot be evaluated analytically (see McCulloch 1994, 1997; Song et al. 2005). In such cases numerical approximations can be used, as for example in the R function glmer in the package lme4.

Alternatively, estimation of the model parameters can be obtained by using an iterative procedure that combines Maximum Penalized Quasi-Likelihood (MPQL) and REML estimation (Saei &

Chambers 2003). Furthermore, estimates of GLMM parameters can be very sensitive to outliers or departures from underlying distributional assumptions. Large deviations from the expected response as well as outlying points in the space of the explanatory variables are known to have a large influence on classical maximum likelihood inference based on generalized linear models (GLMs).

For discrete outcomes, model-based small area estimation conventionally employs a GLMM for $\mu_{ij} = E[y_{ij} \mid u_i]$ of the form

$$g(\mu_{ij}) = \eta_{ij} = \mathbf{x}_{ij}^T \boldsymbol{\beta} + \mathbf{z}_{ij}^T \mathbf{u}_i, \qquad (4.2.35)$$

where $g$ is a link function. When $y_{ij}$ is binary-valued, a popular choice for $g$ is the logistic link function, and the individual $y_{ij}$ values in area $i$ are taken to be independent Bernoulli outcomes with $\mu_{ij} = E[y_{ij} \mid u_i] = P(y_{ij} = 1 \mid u_i) = \exp\{\eta_{ij}\}\left(1 + \exp\{\eta_{ij}\}\right)^{-1}$ and $Var[y_{ij} \mid u_i] = \mu_{ij}(1 - \mu_{ij})$. When $y_{ij}$ is a count outcome, the logarithmic link function is commonly used and the individual $y_{ij}$ values in area $i$ are assumed to be independent Poisson random variables with $\mu_{ij} = E[y_{ij} \mid u_i] = \exp\{\eta_{ij}\}$ and $Var[y_{ij} \mid u_i] = \mu_{ij}$. The $q$-dimensional vector $\mathbf{u}_i$ is generally assumed to be independently distributed between areas according to a normal distribution with mean $\mathbf{0}$ and covariance matrix $\mathbf{\Sigma}_u$. This matrix depends on parameters which are referred to as the variance components, and $\boldsymbol{\beta}$ in (4.2.35) is the vector of fixed effects. If the target of inference is the small area $i$ mean (proportion), $\bar{y}_i = N_i^{-1} \sum_{j \in U_i} y_{ij}$, and the Poisson or Bernoulli GLMM is assumed, the approximation to the minimum mean squared error predictor of $\bar{y}_i$ is $N_i^{-1}\left[\sum_{j \in s_i} y_{ij} + \sum_{j \in r_i} \mu_{ij}\right]$. Since $\mu_{ij}$ depends on $\boldsymbol{\beta}$ and $\mathbf{u}_i$, a further stage of approximation is required, where unknown parameters are replaced by suitable estimates. This leads to the Conditional Expectation Predictor (CEP) for the area $i$ mean (proportion),

$$\hat{\vartheta}_i^{CEP} = N_i^{-1} \left\{ \sum_{j \in s_i} y_{ij} + \sum_{j \in r_i} \hat{\mu}_{ij} \right\}, \qquad (4.2.36)$$

where $\hat{\mu}_{ij} = \exp\{\hat{\eta}_{ij}\}$ under the logarithmic link or $\hat{\mu}_{ij} = \exp\{\hat{\eta}_{ij}\}\left(1 + \exp\{\hat{\eta}_{ij}\}\right)^{-1}$ under the logistic link, with $\hat{\eta}_{ij} = \mathbf{x}_{ij}^T \hat{\boldsymbol{\beta}} + \mathbf{z}_{ij}^T \hat{\mathbf{u}}_i$, where $\hat{\boldsymbol{\beta}}$ is the vector of the estimated fixed effects and $\hat{\mathbf{u}}_i$ denotes the vector of the predicted area-specific random effects. We refer to (4.2.36) in this case as a 'random intercepts' CEP. For more details on this predictor, including estimation of its MSE, see Saei and Chambers (2003), Jiang and Lahiri (2006a) and Gonzalez-Manteiga et al. (2007). Note, however, that (4.2.36) is not the proper Empirical Best Predictor of Jiang (2003). The proper EBP does not have a closed form and needs to be computed by numerical approximations. For this reason, the CEP version (4.2.36) is used in practice, as is the case with the small area estimates of Labour Force activity currently produced by the ONS in the UK.
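A minimal sketch of the CEP (4.2.36) for a binary outcome (Python; it takes plug-in estimates $\hat{\boldsymbol{\beta}}$ and $\hat{u}_i$ as given, e.g. from an MPQL/REML fit, and assumes a random intercept specification) is:

import numpy as np

def cep_proportion(y_s, X_r, beta_hat, u_hat_i):
    """Area i proportion: sampled y values plus predicted probabilities
    mu_ij = exp(eta)/(1 + exp(eta)) for the non-sampled units r_i."""
    eta_r = X_r @ beta_hat + u_hat_i
    mu_r = 1.0 / (1.0 + np.exp(-eta_r))           # logistic inverse link
    N_i = len(y_s) + X_r.shape[0]
    return (y_s.sum() + mu_r.sum()) / N_i

rng = np.random.default_rng(5)
beta_hat, u_hat_i = np.array([-0.5, 1.0]), 0.2
X_r = np.column_stack([np.ones(40), rng.normal(size=40)])
y_s = rng.binomial(1, 0.4, size=10)
print(cep_proportion(y_s, X_r, beta_hat, u_hat_i))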

4.3 Geostatistical methods

Geostatistics is concerned with the problem of producing a map of a quantity of interest over a particular geographical region based on, usually noisy, measurements taken at a set of locations in the region. The aim of such a map is to describe and analyze the geographical pattern of the phenomenon of interest. Geostatistical methodologies were born and are applied in areas such as environmental studies and epidemiology, where spatial information is traditionally recorded and available. In recent years the diffusion of spatially detailed statistical data has increased considerably, and these kinds of procedures - possibly with appropriate modifications - can be used as well in other fields of application, for example to study demographic and socio-economic characteristics of a population living in a certain region.

Basically, to obtain a surface estimate one can exploit the exact knowledge of the spatial coordinates (latitude and longitude) of the studied phenomenon by using bivariate smoothing techniques, such as kernel estimation or kriging. Bivariate smoothing deals with the flexible smoothing of point clouds to obtain surface estimates that can be used to produce maps. The geographical application, however, is not the only use of bivariate smoothing, as the method can be applied to handle the non-linear relation between any two continuous predictors and a response variable (Cressie, 1993; Ruppert et al., 2003). Kriging, a widely used method for interpolating or smoothing spatial data, also has a close connection with penalized spline smoothing: the goals of kriging sound very much like those of nonparametric regression, and the understanding of spatial estimates can be enriched through their interpretation as smoothing estimates (Nychka, 2000).

However, usually the spatial information alone does not properly explain the pattern of the response variable and we need to introduce some covariates in a more complex model.

4.3.1 Geoadditive models

Geoadditive models, introduced by Kammann and Wand (2003), answer this problem as they analyse the spatial distribution of the study variable while accounting for possible non-linear covariate effects through a linear mixed model representation. The first half of the model formulation involves a low rank mixed model representation of additive models; incorporation of the geographical component is then achieved by expressing kriging as a linear mixed model and merging it with the additive model to obtain a single mixed model, the geoadditive model (Kammann and Wand, 2003; on kriging see also Part 1 of this report).

The model is specified as

$$y_i = f(s_i) + g(t_i) + S(\mathbf{x}_i) + \varepsilon_i, \qquad (4.3.1)$$

where in the first part of the model $s_i$, $t_i$ and $y_i$ represent measurements on two predictors $s$ and $t$ and a response variable $y$ for unit $i$, and $f$ and $g$ are smooth, but otherwise unspecified, functions of $s$ and $t$ respectively; the second part of the model is the simple universal kriging model, with $\mathbf{x}_i$ representing the geographical location and $\{S(\mathbf{x}) : \mathbf{x} \in \mathbb{R}^2\}$ a stationary zero-mean stochastic process.

Since both the first and the second part of model (4.3.1) can be specified as linear mixed models, the whole model (4.3.1) can also be formulated as a single linear mixed model that can be fitted using standard mixed model software. Thus, in a geoadditive model the linear mixed model structure allows the inclusion of area-specific effects as additional random components. In particular, a geoadditive SAE model has two random effect components: the area-specific effects and the spatial effects (Bocci, 2009). See Kammann and Wand (2003) for more details on geoadditive model specifications. Having a mixed model specification, geoadditive models can be used to obtain small area estimators under a non-parametric approach (Opsomer et al., 2008; see also Part 4.2 of this report).
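As an illustration of how the kriging component can enter a mixed model, the following minimal sketch (Python; the exponential correlation function and the choice of knots among the observed sites are assumptions of the example, not the specification of Kammann and Wand, 2003) builds a low rank spatial random-effect design matrix:

import numpy as np

def krig_random_design(coords, knots, range_par):
    """Z[i, k] = exp(-||x_i - kappa_k|| / range_par): a radial basis over
    knot locations whose coefficients enter the model as random effects."""
    d = np.linalg.norm(coords[:, None, :] - knots[None, :, :], axis=2)
    return np.exp(-d / range_par)

rng = np.random.default_rng(6)
coords = rng.uniform(0, 1, size=(30, 2))          # observed site locations
knots = coords[rng.choice(30, size=8, replace=False)]
Z = krig_random_design(coords, knots, range_par=0.3)
print(Z.shape)                                    # 30 units x 8 spatial knots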

In this respect, Bocci et al. (2012) proposed a two-part geoadditive small area estimation model to estimate the per farm average grapevine production, specified as a semicontinuous skewed variable, at the Agrarian Region level, using data from the fifth Italian Agricultural Census. More in detail, the response variable, assumed to have a significant spatial pattern, has a semicontinuous structure, which means that the variable has a fraction of values equal to zero and a continuous skewed distribution among the remaining values; thus, the variable can be recorded as

$$I_{ij} = \begin{cases} 1 & \text{if } y_{ij} > 0 \\ 0 & \text{if } y_{ij} = 0 \end{cases} \qquad (4.3.2)$$

and

$$y'_{ij} = \begin{cases} y_{ij} & \text{if } y_{ij} > 0 \\ \text{irrelevant} & \text{if } y_{ij} = 0. \end{cases} \qquad (4.3.3)$$

For this variable, Bocci et al. (2012) specify two uncorrelated geoadditive small area models: one for the logit probability of a positive response, $I_{ij} = 1$, and one for the conditional mean of the logarithm of the response given $I_{ij} = 1$, $E[\log(y'_{ij}) \mid I_{ij} = 1]$.

Another extension of the work of Kammann and Wand (2003) is the geoadditive model proposed by Cafarelli and Castrignanò (2011), used to analyse the spatial distribution of grain weight, a commonly used indicator of wheat production, taking into account its nonlinear relations with other crop features.

4.3.2 Area-to-point kriging

Under the more general approach of change of support for spatial data, Kyriakidis (2004) proposed an area-to-point interpolation technique that is a special case of kriging. This technique is a geostatistical framework that can explicitly and consistently account for the support differences between the available areal data and the sought point predictions, and yields coherent (i.e. mass-preserving or pycnophylactic) predictions. Under this approach, a variable $d(\mathbf{u})$ at a point with coordinate vector $\mathbf{u}$ is considered a realization of a random variable $D(\mathbf{u})$. The collection of spatially correlated random variables $\{D(\mathbf{u}),\, \mathbf{u} \in A\}$, where $A$ denotes the study region, is termed a random function.

$D(\mathbf{u})$ consists of two parts: a deterministic component $m(\mathbf{u})$, indicating the geographical trend or drift, and a stochastic residual component $R(\mathbf{u})$, autocorrelated in space:

$$D(\mathbf{u}) = m(\mathbf{u}) + R(\mathbf{u}). \qquad (4.3.4)$$

Given a lag vector $\mathbf{h}$, the expected difference between $R(\mathbf{u})$ and $R(\mathbf{u} + \mathbf{h})$ is 0. The variance of the difference is

$$Var\left\{ R(\mathbf{u} + \mathbf{h}) - R(\mathbf{u}) \right\} = E\left\{ \left[ R(\mathbf{u} + \mathbf{h}) - R(\mathbf{u}) \right]^2 \right\} = 2\,\gamma_R(\mathbf{h}),$$

where $\gamma_R(\mathbf{h})$ is the so-called semivariogram of the residuals. Considering the variance $C_R(\mathbf{0}) = Var\{R(\mathbf{u})\}$, the covariance between $R(\mathbf{u})$ and $R(\mathbf{u} + \mathbf{h})$ is given by

$$C_R(\mathbf{h}) = E\left\{ R(\mathbf{u} + \mathbf{h}) \cdot R(\mathbf{u}) \right\} = C_R(\mathbf{0}) - \gamma_R(\mathbf{h}).$$
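A minimal sketch of the empirical counterpart of $\gamma_R(\mathbf{h})$ (Python; isotropic distance bins and toy residuals, for illustration only) is:

import numpy as np

def empirical_semivariogram(coords, r, bins):
    """Average of 0.5*(R(u+h) - R(u))^2 over point pairs grouped by
    distance bins (isotropic: only the length of the lag h is used)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    sq = 0.5 * (r[:, None] - r[None, :]) ** 2
    iu = np.triu_indices(len(r), k=1)             # each pair counted once
    d, sq = d[iu], sq[iu]
    idx = np.digitize(d, bins)
    return np.array([sq[idx == b].mean() if np.any(idx == b) else np.nan
                     for b in range(1, len(bins))])

rng = np.random.default_rng(7)
coords = rng.uniform(0, 10, size=(100, 2))
r = rng.normal(size=100)                          # toy residuals
print(empirical_semivariogram(coords, r, bins=np.linspace(0, 5, 6)))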

Kyriakidis (2004) considers kriging of two types of data pertaining to the same attribute: the target residual $r(\mathbf{u})$, of point support and partially sampled, and the source residual $r(v_u)$, providing ancillary information and defined on an areal support. This kriging is called area-to-point residual kriging. Under this approach, any source residual $r(v_u)$ is functionally linked to the point residuals within it as

$$r(v_u) = \sum_{i=1}^{N_u} w_{u_i}\, r(\mathbf{u}_i), \quad \mathbf{u}_i \in v_u, \qquad \text{with } \sum_{i=1}^{N_u} w_{u_i} = 1, \qquad (4.3.5)$$

where $N_u$ is the number of points within areal unit $v_u$ and $w_{u_i}$ is the weight associated with $r(\mathbf{u}_i)$, assumed to be known. On different specifications and applications of area-to-point kriging see also Yoo and Kyriakidis (2006), Liu et al. (2008), Yoo and Kyriakidis (2009) and Goovaerts (2010).
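A minimal sketch of the coherence constraint in (4.3.5) (Python; equal weights $w_{u_i} = 1/N_u$ are assumed for the example) is:

import numpy as np

def areal_residuals(point_r, membership):
    """Source residuals r(v_u) as weighted averages of the point residuals
    inside each areal unit, here with equal weights w = 1/N_u."""
    units = np.unique(membership)
    return np.array([point_r[membership == v].mean() for v in units])

rng = np.random.default_rng(8)
point_r = rng.normal(size=12)                     # toy point-support residuals
membership = np.repeat([0, 1, 2], 4)              # 3 areal units, 4 points each
print(areal_residuals(point_r, membership))       # one value per areal unit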

References

Anselin, L. (1992): Spatial Econometrics: Method and Models. Kluwer Academic Publishers,

Boston.

Battese, G., Harter, R. and Fuller, W. (1988): An Error-Components Model for Prediction of

County Crop Areas Using Survey and Satellite Data. Journal of the American Statistical

Association, 83, 28-36.

Banerjee, S., Carlin, B.P. and Gelfand, A.E. (2004): Hierarchical Modeling and Analysis for Spatial

Data. Chapman & Hall, New York.

Best, N., Richardson, S. and Thomson, A. (2005). A comparison of Bayesian spatial models for disease mapping. Statistical Methods in Medical Research 14(1), 35–59.

Bocci, C. (2009) Geoadditive models for data with spatial information. PhD Thesis, Department of

Statistics, University of Florence.

Bocci C., Petrucci A., Rocco E. (2012). Small Area Methods for Agricultural Data: A Two-Part

Geoadditive Model to Estimate the Agrarian Region Level Means of the Grapevines

Production in Tuscany. Journal of the Indian Society of Agricultural Statistics, 66(1), 135-144

Brunsdon, C., Fotheringham, A.S. and Charlton, M. (1998): Geographically weighted regression - modelling spatial non-stationarity. Journal of the Royal Statistical Society, Series D, 47(3), 431-443.

Cafarelli B., Castrignanò A. (2011). The use of geoadditive models to estimate the spatial distribution of grain weight in an agronomic field: a comparison with kriging with external drift , Environmetrics 22, 769-780.

Chambers, R. L. (1986): Outlier robust finite population estimation. J. Am. Statist. Ass., 81, 1063-

1069.

Chambers, R. and Dunstan, P. (1986): Estimating distribution function from survey data.

Biometrika, 73, 597-604.

Chambers, R. and Tzavidis, N. (2006): M-quantile models for small area estimation. Biometrika,

93, 255-268.

Chambers, R. Chandra, H. and Tzavidis, N. (2011): On bias-robust mean squared error estimation for pseudo-linear small area estimators. Survey Methodology, 37, 153-170.

Chambers, R. Chandra, H., Salvati, N. and Tzavidis, N. (2013): Outlier robust small area estimation. J.R. Statist. Soc. B, 75, 5, 1-28.

Chandra, H., Salvati, N., Chambers, R. and Tzavidis, N. (2012): Small area estimation under spatial nonstationarity. Computational Statistics and Data Analysis, 56, 2875–2888.

Comber, A.J., Proctor, P. and Anthony, S. (2007): A combined pycnophylactic-dasymetric method for disaggregating spatial data: the example of agricultural land use. Proceedings of the Geographical Information Science Research UK Conference (ed. Winstanley, A.C.), pp. 445-450, National Centre for Geocomputation, National University of Ireland, Maynooth, 11th-13th April.

Cressie, N. (1993): Statistics for spatial data, John Wiley & Sons, New York.

Datta, G. S. and Ghosh, M. (1991): Bayesian prediction in linear models: Appli- cations to small area estimation. The Annals of Statistics 19, 1748–1770.

Datta, G. and Ghosh, M. (2012) Small Area Shrinkage Estimation. Statist. Sci., 27, 95-114.

Do, V.H., Thomas-Agnan, C. and Vanhemsz, A. (2013): Spatial reallocation of areal data - a review. Toulouse School of Economics.

Eicher, C. L., Brewer, C. A. (2001): Dasymetric mapping and areal interpolation, implementation and evaluation, Cartography & Geographic Information Science, 28(2), 125-138

Estevao, V.M. and Sarndal, C.E (2004): Borrowing strength is not the best technique within a wide class of design-consistent domain estimators. Journal of Official Statistics, 20, 645-669.

European Commission (2013) Development of inventory datasets through remote sensing and direct observation data for earthquake loss estimation. Editors: Patrizia Tenerelli and Helen Crowley

Reviewers: Amir M. Kaynia. Publishing Editors: Fabio Taucer and Ufuk Hancilar

Fay, R.E. and Herriot, R.A. (1979): Estimates of income for small places: an application of James-

Stein procedures to census data. Journal of the American Statistical Association, 74, 269-277.

Flowerdew, R. and Green, M., 1992, Statistical methods for inference between incompatible zonal systems. In The Accuracy of Spatial Databases, M.F. Goodchild and S. Gopal (Eds), pp. 239–

247 (London, UK: Taylor and Francis).

Fotheringham, A.S., Brunsdon, C. and Charlton, M., (2002): Geographically Weighted Regression.

John Wiley & Sons, West Sussex.

Fuller, W.A. (1981) Regression estimation for small areas. In D.M. Gilford, G.L. Nelson and L.

Ingram (eds), Rural America in Passage: Statistics for Policy, pp. 572–586. Washington, DC:

National Academy Press.

Fuller, W.A. (1999): Environmental Surveys Over Time, Journal of Agricultural, Biological and

Environmental Statistics, 4, 331-345.

Ghosh, M. and Rao, J.N.K. (1994): Small area estimation: An appraisal (with discussion). Statist.

Sci., 9, 1, 55-93.

Ghosh, M., Natarajan, K., Stroud, T. W. F. and Carlin, B.P. (1998). Generalized linear models for small-area estimation. Journal of the American Statistical Association 93, 273–282.

Gilks, W.R., Richardson, S. and Spiegelhalter, D.J. (Eds.) (1995): Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, Boca Raton, Florida.

Giusti, C., Marchetti, S., Pratesi, M. and Salvati, N. (2012): Semiparametric Fay-Herriot model

using Penalized Splines. Journal of the Indian Society of Agricultural Statistics, 66, 1-14.

Gomez-Rubio, V., Best, N., Richardson, S., Li, G. and Clarke, P. (2010): Bayesian Statistics Small

Area Estimation. Technical Report. Imperial College London.

González-Manteiga, W., Lombarda, M., Molina, I., Morales, D. and Santamara, L. (2007):

Estimation of the mean squared error of predictors of small area linear parameters under logistic mixed model. Comp. Statist. Data Anal., 51, 2720-2733.

Goovaerts P. (2010) Combining Areal and Point Data in Geostatistical Interpolation: Applications to Soil Science and Medical Geography. Mathematical Geosciences, 42, 535–554

Gregory, I.N., Paul, S.E. (2005): Breaking the boundaries, geographical approaches to integrating

200 years of the census, Journal of the Royal Statistic Society, 168, 419- 437

Henderson, C. (1975): Best linear unbiased estimation and prediction under a selection model,

Biometrics, 31, 423-447.

Huang, Z., Ottens, H. and Masser, I. (2007): A Doubly Weighted Approach to Urban Data

Disaggregation in GIS: A Case Study of Wuhan, China. Transaction in GIS, 11, 197

Jiang, J. (2003): Empirical best prediction for small-area inference based on generalized linear mixed models. Journal of Statistical Planning and Inference 111, 117-127.

Jiang, J. and Lahiri, P. (2001): Empirical best prediction for small area inference with binary data.

Ann. Inst. Statist. Math. 53, 217–243.

Jiang, J. and Lahiri, P. (2006a): Mixed Model Prediction and Small Area Estimation. Test, 15, 1-96.

Jiang, J. and Lahiri, P. (2006b): Estimation of finite population domain means: a model-assisted empirical best prediction approach. Journal of the American Statistical Association, 101, 301–

311.

Jiang, J., Nguyen, T. and Rao J.S. (2011): Best Predictive Small Area Estimation, Journal of the

American Statistical Association, 106, 732-745.

Kammann, E.E. and Wand, M.P. (2003). Geoadditive models. Journal of Applied Statistics, 52, 1-

18

Kempen, M., Heckelei, T., Britz, W, Leip, A., Koeble, R. (2005) A Statistical Approach for Spatial

Disaggregation of Crop Production in the EU. Proceedings of the 89th EAAE Seminar,

Modelling agricultural policies: state of the art and new challenges. Faculty of Economics,

Parma.

Kim, H. and Yao, X. (2010) Pycnophylactic interpolation revisited: integration with the dasymetricmapping method. International Journal of Remote Sensing. Vol. 31, No. 21, 10 November

2010, 5657–5671

Knotters, M., Heuvelink, G.B.M., Hoogland, T. and Walvoort D.J.J. (2010): A disposition of interpolation techniques. Werkdocument 190. Wettelijke Onderzoekstaken Natuur & Milieu

Kott P. (1989): Robust small domain estimation using random effects modelling. Survey

Methodology, 15, 1-12.

Kyriakidis, P.C. (2004): A Geostatistical Framework for Area-to-Point Spatial Interpolation. Geographical Analysis, 36, 259-289.

Lam, N. (1983) Spatial interpolation methods: a review. American Cartographer 10,129-149.

Langford, M. (2006): Obtaining population estimations in non-census reporting zones, An evaluation of the three-class dasymetric method, Computers, Environment & Urban Systems,

30, 161-180

Langford, M. (2003): Refining methods for dasymetric mapping. In Remotely Sensed Cities, V.

Mesev (Ed.), pp. 181–205 (London, UK: Taylor & Francis).

Langford M. and Harvey J.T. (2001): The Use of Remotely Sensed Data for Spatial Disaggregation of Published Census Population Counts. IEEE/ISPRS Joint Workshop on Remote Sensing and Data Fusion over Urban Areas

Langford M., Maguire, D., Unwin, D. (1991): The areal interpolation problem: estimating population using remote sensing in a GIS framework. In Handling geographic information:

Methodology and potential applications edited by Masser E and Blakemore M, Longman, 55-

77.

Langford, M., Fisher, P. F. (1996): Modelling sensitivity to accuracy in classification imagery, a study of areal interpolation by dasymetric mapping, Professional Geographer, 48(3), 299-309

Langford, M., Unwin, D. J. (1994): Generating and mapping population density surfaces within a

GIS, The Cartographic Journal, 31, 21-26

Lehtonen, R. and Veijanen, A. (1999): Domain estimation with logistic generalized regression and related estimators. IASS Satellite Conference on Small Area Estimation. Riga: Latvian

Council of Science, 121-128.

Lehtonen, R., Sarndal, C.E. and Veijanen, A. (2003): The effect of model choice in estimation for domains, including small domains. Survey Methodology, 29, 33-44.

Lehtonen R. and Pahkinen, E. (2004): Practical methods for design and analysis of complex surveys. Wiley, New York.

Li, T., Pullar, D., Corcoran, J. & Stimson, R. (2007): A comparison of spatial disaggregation techniques as applied to population estimation for South East Queensland (SEQ), Australia,

Applied GIS, 3(9), 1-16

Li, J. and Heap, A.D. (2008): A Review of Spatial Interpolation Methods for Environmental

Scientists. Geoscience Australia, Record 2008/23, 137 pp.

Liu, X.H., Kyriakidis, P.C. and Goodchild M. F. (2008): Population-density estimation using regression and area-to-point residual kriging. International Journal of Geographical

Information Science, 22, 431–447

Longford, N.T. (1995): Random Coefficient Models. Clarendon Press, Oxford.

Marchetti, S., Tzavidis, N. and Pratesi, M. (2012): Non-parametric Bootstrap Mean Squared Error

Estimation for M-quantile Estimators of Small Area Averages, Quantiles and Poverty

Indicators., Computational Statistics & Data Analysis,vol. 56, 2889-2902.

Mehrotra, R. and Singh, R.D. (1998): Spatial disaggregation of rainfall data. Hydrological

Sciences—Journal des Sciences Hydrologiques, 43(1) February 1998

McBratney A.B. (1998) Some considerations on methods for spatially aggregating and disaggregating soil information. Nutrient Cycling in Agroecosystems 50: 51–62, 1998.

McCulloch, C.E. (1994): Maximum likelihood variance components estimation for binary data. Journal of the American Statistical Association, 89, 330-335.

McCulloch, C.E. (1997): Maximum likelihood algorithms for generalized linear mixed models. Journal of the American Statistical Association, 92, 162-170.

McCulloch, C.E. and Searle, S.R. (2001): Generalized, Linear, and Mixed Models. John Wiley, New York.

Mennis, J. (2003): Generating surface models of population using dasymetric mapping, The

Professional Geographer, 55(1), 31-42

Mennis, J. and Hultgren, T. (2006): Intelligent dasymetric mapping and its application to areal interpolation. Cartography and Geographic Information Science, 33, pp. 179–194.

Mohammed, J.I., Comber, A. and Brunsdon C. (2012): Population estimation in small areas: combining dasymetric mapping with pycnophylactic interpolation. GIS Research UK

(GISRUK) Conference, Lancaster University, 11th-13th April.

Molina, I., Salvati, N. and Pratesi, M. (2009): Bootstrap for estimating the MSE of the Spatial

EBLUP. Comp. Statist, 24, 441-458.

Molina, I. and Rao, J.N.K. (2010): Small area estimation of poverty indicators. The Canadian

Journal of Statistics, 38, 369-385.

Nychka, D. (2000): Spatial process estimates as smoothers. In Smoothing and Regression: Approaches, Computation and Application, ed. Schimek, M.G., Wiley, New York, 393-424.

Opsomer, J. D., Claeskens, G., Ranalli, M. G., Kauermann, G., and Breidt, F. J. (2008):

Nonparametric small area estimation using penalized spline regression. Journal of the Royal

Statistical Society, Series B, 70, 265-286.

Petrucci, A., Pratesi, M., and Salvati, N. (2005): Geographic Information in Small Area Estimation:

Small Area Models and Spatially Correlated Random Area Effects. Statistics in Transition, 7,

609-623.

Petrucci, A. and Salvati, N. (2006): Small area estimation for spatial correlation in watershed

erosion assessment, Journal of Agricultural, Biological, and Environmental Statistics, 11,

169-182.

Pfeffermann, D. (2002): Small area estimation - New developments and directions. Intern. Statist.

Rev., 70, 1, 125-143.

Prasad, N. and Rao, J. (1990): The estimation of the mean squared error of small-area estimators. Journal of the American Statistical Association, 85, 163-171.

Prasad, N.G.N. and Rao, J.N.K. (1999): On robust small area estimation using a simple random effects model. Survey Methodology, 25, 67-72.

Pratesi, M. and Salvati, N. (2008): Small Area Estimation: the EBLUP Estimator Based on

Spatially Correlated Random Area Effects. Statistical Methods & Applications, 17, 113-141.

Pratesi, M., Ranalli, M. G. and Salvati, N. (2008). Semiparametric M-quantile regression for estimating the proportion of acidic lakes in 8-digit HUCs of the Northeastern US.

Environmetrics 19, 687–701.

Pratesi, M. and Salvati, N. (2009): Small area estimation in the presence of correlated random area effects, Journal of Official Statistics, 25, 37-53.

R Development Core Team (2010): R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

Rao, J.N.K., Kovar, J.G. and Mantel, H.J. (1990). On estimating distribution functions and quantiles from survey data using auxiliary information. Biometrika 77, 365-375.

Rao, J.N.K. (2003): Small area estimation, John Wiley & Sons, New York.

Rao, J.N.K. (2010): Small-area estimation with applications to agriculture in Benedetti, R., Bee, M.,

Espa, G. and Piersimoni, F. (Eds): Agricultural Survey Methods. John Wiley and Sons, Ltd.,

Publication, UK.

Reibel, M., Aditya, A. (2006) - Land use weighted areal interpolation, GIS Planet 2005

International Conference, Estoril, Portugal

Ruppert, D., Wand, M.P. and Carroll, R. (2003): Semiparametric Regression. Cambridge University

Press, Cambridge, New York.

Saei, A. and Chambers, R. (2003): Small area estimation under linear and generalized linear mixed models with time and area effects. S3RI Methodology Working Papers, pp. 1-35, Southampton Statistical Sciences Research Institute, Southampton.

Saei, A. and Chambers, R. (2005): Empirical best linear unbiased prediction for out of sample areas. Working Paper M05/03, Southampton Statistical Sciences Research Institute, University of Southampton.

Salvati, N. (2004): Small area estimation by spatial models: the Spatial Empirical Best Linear Unbiased Prediction (Spatial EBLUP). Working Paper 2004/04, 'G. Parenti' Department of Statistics, University of Florence.

Salvati, N., Tzavidis, N., Pratesi, M. and Chambers, R. (2012): Small area estimation via M-quantile geographically weighted regression. TEST, 21, 1-28.

Särndal, C.E. (1984): Design-consistent versus model-dependent estimation for small domains. Journal of the American Statistical Association, 79, 624-631.

Searle, S.R., Casella, G. and McCulloch, C.E. (1992): Variance Components. John Wiley, New York.

Shu, Y. and Lam, N.S.N. (2011): Spatial disaggregation of carbon dioxide emissions from road traffic based on multiple linear regression model. Atmospheric Environment, 45, 634-640.

Singh, B.B., Shukla, G.K. and Kundu, D. (2005): Spatio-temporal models in small area estimation. Survey Methodology, 31, 183-195.

Schmid, T. and Münnich, R. (2013): Spatial robust small area estimation. Statistical Papers, DOI: 10.1007/s00362-013-0517-y.

Silvan-Cardenas, J.L., Wang, L., Rogerson, P., Wu, C., Feng, T. and Kamphaus, B.D. (2010): Assessing fine-spatial-resolution remote sensing for small-area population estimation. International Journal of Remote Sensing, 31(21), 5605-5634.

Singh, M.P., Gambino, J. and Mantel, H.J. (1994): Issues and strategies for small area data. Survey Methodology, 20, 3-22.

Sinha, S.K. and Rao, J.N.K. (2009): Robust small area estimation. The Canadian Journal of Statistics, 37, 381-399.

Song, P.X., Fan, Y. and Kalbfleisch, J. (2005): Maximization by parts in likelihood inference (with discussion). Journal of the American Statistical Association, 100, 1145-1158.

Tassone, E.C., Miranda, M.L. and Gelfand, A.E. (2010): Disaggregated spatial modelling for areal unit categorical data. Journal of the Royal Statistical Society, Series C, 59(1), 175-190.

Tobler, W. (1979): Smooth pycnophylactic interpolation for geographical regions. Journal of the American Statistical Association, 74, 519-529.

Turner, A. and Openshaw, S. (2001): Disaggregative spatial interpolation. Paper presented at the GISRUK 2001 Conference, Glamorgan, Wales, April 2001.

Tzavidis, N., Marchetti, S. and Chambers, R. (2010): Robust estimation of small area means and quantiles. Australian and New Zealand Journal of Statistics, 52, 167-186.

Ugarte, M.D., Goicoa, T., Militino, A.F. and Durban, M. (2009): Spline smoothing in small area trend estimation and forecasting. Computational Statistics & Data Analysis, 53, 3616-3629.

Wang, J. and Fuller, W.A. (2003): The mean squared error of small area predictors constructed with estimated area variances. Journal of the American Statistical Association, 98, 716-723.

Welsh, A.H. and Ronchetti, E. (1998): Bias-calibrated estimation from sample surveys containing outliers. Journal of the Royal Statistical Society, Series B, 60, 413-428.

Wolter, K. (1985): Introduction to Variance Estimation. Springer-Verlag, New York.

Wright, J.K. (1936): A method of mapping densities of population. The Geographical Review, 26, 103-110.

Wu, S., Qiu, X. and Wang, L. (2005): Population estimation methods in GIS and remote sensing: a review. GIScience and Remote Sensing, 42(1), 58-74.

Yoo, E.H. and Kyriakidis, P.C. (2006): Area-to-point Kriging with inequality-type data. Journal of Geographical Systems, 8, 357-390.

Yoo, E.H. and Kyriakidis, P.C. (2009): Area-to-point Kriging in spatial hedonic pricing models. Journal of Geographical Systems, 11, 381-406.

You, Y. and Rao, J.N.K. (2002): A pseudo-empirical best linear unbiased prediction approach to small area estimation using survey weights. The Canadian Journal of Statistics, 30, 431-439.

Yuan, Y., Smith, R.M. and Limp, W.F. (1997): Remodeling census population with spatial information from Landsat TM imagery. Computers, Environment and Urban Systems, 21, 245-258.

Conclusions and identified gaps

As we have seen in this report, integration, aggregation and disaggregation of multi-sourced spatial data and their interoperability have received much attention in the last few years.

Summarizing the main findings, we want to highlight three issues that emerge in many of the contributions analysed:

Statistical data integration, aggregation and disaggregation involve combining information from different administrative and/or survey sources to provide new datasets for statistical and research purposes.

Analysis of these datasets offers valuable opportunities to investigate more complex and expanded policy and research questions than would be possible using only separate, unlinked data sources. The process can yield new official statistics to inform society and provide evidence on agro-environmental phenomena, such as statistics based on the analysis of longitudinal and small area data obtained by exploiting spatial data.

Data integration can reduce the need for costly collections by better leveraging existing data to meet current and emerging information requirements. Maximising the use of existing data, rather than establishing new collections, avoids additional load on respondents, helps to ensure cost-effectiveness and can improve timeliness.

Assuring statistical quality of data integration, aggregation and disaggregation is therefore a key strategy for maximising governments’ investments in existing information assets.

However, there are problems to solve. First of all, we note that the whole issue requires more than technical tools and considerations: because of the diversity of data providers, institutional, social, legal and policy requirements must also be taken into account in order to achieve effective integration and interoperability of technical solutions. This is especially true in developing countries.

In this section we do not envision solutions to these legal and policy requirements. Our goal here is to identify the statistical problems emerging from the previous issues and to list the consequent gaps for possible methodological developments. Given the state of the art, we identify the following topics for further development:

1. Measurement errors in geostatistical models. As is well known, geostatistics is concerned with the problem of producing a map of a quantity of interest over a particular geographical region, based on (usually noisy) measurements taken at a set of locations in the region. Including a measurement error component in the auxiliary variable is a tool that can help inference from models for reported areas, also with regard to systematic bias in area measurement. Many of the models developed for integration and disaggregation (e.g. SAE models) have yet to be generalized to include possible measurement errors; the M-quantile regression models, in particular, still need this extension.
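To fix ideas, the following is a minimal sketch of an area level small area model with a covariate measured with error; the notation is purely illustrative and is not taken from any specific contribution reviewed above.

```latex
% Sketch: area level small area model with measurement error in the
% auxiliary variable (illustrative notation, for areas i = 1, ..., m).
\begin{align*}
  \hat{\theta}_i &= \theta_i + e_i,
    && e_i \sim N(0, \psi_i)
    && \text{(sampling error of the direct estimator)} \\
  \theta_i &= \mathbf{x}_i^{\top}\boldsymbol{\beta} + u_i,
    && u_i \sim N(0, \sigma_u^2)
    && \text{(area level random effect)} \\
  \hat{\mathbf{x}}_i &= \mathbf{x}_i + \boldsymbol{\eta}_i,
    && \boldsymbol{\eta}_i \sim N(\mathbf{0}, \mathbf{C}_i)
    && \text{(measurement error in the covariate)}
\end{align*}
```

Treating the error-prone \(\hat{\mathbf{x}}_i\) as if it were the true \(\mathbf{x}_i\) understates the uncertainty of the resulting small area predictor and can bias the estimates; this is the component that the M-quantile extension mentioned above would need to accommodate.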

2. Missing values in spatial data and in auxiliary variables. The patterns of missingness in spatial data (as collected by GPS-based or remote sensing methods) and the investigation of their implications for land productivity estimates and the inverse scale-land productivity relationship constitute a very important issue. Multiple Imputation (MI) can be a useful, and still not completely explored, tool for tackling this problem in agro-environmental studies (see the sketch below).
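The sketch below illustrates the MI workflow under simple assumptions: the toy data, the use of scikit-learn's IterativeImputer and all variable names are our own illustrative choices, not a method prescribed by the contributions reviewed here. The incomplete variable is imputed m times, the quantity of interest is estimated on each completed dataset, and the results are combined by Rubin's rules.

```python
# Sketch: multiple imputation for a missing auxiliary variable, with
# Rubin's rules for combining the m estimates (illustrative assumptions).
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)

# Toy data: one spatial covariate with ~20% missing values, plus two
# fully observed auxiliary variables used to drive the imputation.
n = 500
X = rng.normal(size=(n, 3))
X[rng.random(n) < 0.2, 0] = np.nan  # missingness in the first column

m = 10                      # number of imputed datasets
estimates, variances = [], []
for k in range(m):
    imputer = IterativeImputer(sample_posterior=True, random_state=k)
    completed = imputer.fit_transform(X)
    estimates.append(completed[:, 0].mean())            # quantity of interest
    variances.append(completed[:, 0].var(ddof=1) / n)   # its sampling variance

# Rubin's rules: total variance = within + (1 + 1/m) * between.
q_bar = np.mean(estimates)
w = np.mean(variances)
b = np.var(estimates, ddof=1)
total_var = w + (1 + 1 / m) * b
print(f"MI estimate: {q_bar:.3f}, standard error: {np.sqrt(total_var):.3f}")
```

The between-imputation variance term is what propagates the uncertainty due to missingness into the final standard error, which a single imputation would ignore.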

3. Developments in small area estimation models in agro-environmental studies. Small area estimation models can address many of the problems in data disaggregation. Particularly important is the strength borrowed from valuable auxiliary information obtained by exploiting spatial data and combining it with study variables coming from sample surveys and censuses [4]. We highlight the following enhancements:

Models for space (and time) varying coefficients, that is, models allowing the coefficients to vary as smooth functions of the geographic coordinates (see the sketch after this list). These could increase the efficiency of the SAE estimates by identifying local stationarity zones. Extensions to multivariate study variables are possible.

Models where the auxiliary variables are measured with error (see topic 1 above). This means trying to take this non-sampling error component into account when measuring the mean squared error of the area estimators, thereby improving the assessment of their accuracy.

Theory for "zero inflated" SAE models (excess zeros in the data that distort the estimated parameters), as this is a common situation in survey data in the agro-environmental field.

Benchmarking and neutral shrinkage of SAE models, that is, taking into account the survey weights (if any) in spatial SAE models so as to benchmark the estimates to known auxiliary totals.

Multiple frame SAE modelling. When auxiliary data come from several area or list frames and units appear in different frames, SAE modelling could take advantage of the multiple sources of information; in any case, it should take into consideration how the linkage of the information affects the accuracy of the estimates, in comparison with the alternative of using only separate, unlinked data sources.
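As anticipated in the first item above, a minimal sketch of a unit level model with spatially varying coefficients; the notation is our own illustration of the idea, not a model taken from a specific contribution reviewed here.

```latex
% Sketch: unit level small area model with spatially varying coefficients
% (illustrative notation): unit j in area i, with area centroid s_i.
\[
  y_{ij} = \mathbf{x}_{ij}^{\top}\,\boldsymbol{\beta}(s_i) + u_i + e_{ij},
  \qquad u_i \sim N(0, \sigma_u^2), \qquad e_{ij} \sim N(0, \sigma_e^2),
\]
```

where each component of \(\boldsymbol{\beta}(\cdot)\) is a smooth bivariate function of the coordinates (for instance a thin plate spline, or a kernel-weighted fit in the spirit of geographically weighted regression). A constant \(\boldsymbol{\beta}\) is recovered as the special case of global stationarity, and local stationarity zones correspond to regions over which \(\boldsymbol{\beta}(\cdot)\) is approximately flat.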

4. The statistical treatment of the so-called COSPs in the SAE context. Many of the issues interlinked with the modifiable areal unit problem and other change of support problems have yet to be resolved, and there is no agreement in the literature on the precise scope of their implications and their predictability in statistical inference. In the case of data disaggregation via SAE models, in particular, the problem has not yet been clearly disentangled.
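A small simulation makes the scale effect of the MAUP tangible; the data generating process, the grid sizes and all names below are arbitrary illustrative choices. The same point level data yield systematically different correlations once they are aggregated to zone means, and the distortion grows as the zoning gets coarser.

```python
# Sketch: the modifiable areal unit problem (MAUP). The correlation between
# two variables changes with the zoning used to aggregate the same points.
import numpy as np

rng = np.random.default_rng(1)

# Simulate point level data on the unit square with a weak x-y relationship.
n = 10_000
coords = rng.random((n, 2))
x = coords[:, 0] + rng.normal(scale=0.5, size=n)
y = 0.3 * x + rng.normal(scale=1.0, size=n)

def aggregated_corr(k):
    """Correlation between zone means of x and y on a k-by-k grid."""
    cells = (coords * k).astype(int)
    zone = cells[:, 0] * k + cells[:, 1]
    counts = np.bincount(zone, minlength=k * k)
    nz = counts > 0  # drop empty zones, if any
    x_means = np.bincount(zone, weights=x, minlength=k * k)[nz] / counts[nz]
    y_means = np.bincount(zone, weights=y, minlength=k * k)[nz] / counts[nz]
    return np.corrcoef(x_means, y_means)[0, 1]

print("point level correlation:", round(np.corrcoef(x, y)[0, 1], 3))
for k in (20, 10, 4):  # progressively coarser zonings
    print(f"{k}x{k} zones:", round(aggregated_corr(k), 3))
```

Running the script shows the zone-mean correlation drifting well above the point level value as the zones grow coarser, because within-zone noise is averaged out; this support-dependent behaviour is exactly what SAE-based disaggregation has to contend with.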

[4] For instance, the World Bank Living Standards Measurement Study - Integrated Surveys on Agriculture (LSMS-ISA) is a $19 million household survey project established by the Bill and Melinda Gates Foundation and implemented by the Living Standards Measurement Study (LSMS) team within the Development Research Group at the World Bank. The primary objective of the project is to foster innovation and efficiency in statistical research on the links between agriculture and poverty reduction in Sub-Saharan Africa.
