INTERPOLATION

A procedure to predict values of attributes at unsampled points within the sampled region.

Why?
- Cannot measure all locations:
  • Time
  • Money
  • Impossible (physically, legally)
- Changing cell size
- Missing/unsuitable data
- Past date (e.g. temperature)

Examples:
- temperature
- acid rain deposition
- soil characteristics
- mining: gold deposits

Spatial Sampling:
- Gather observations representative of the spatial distribution of the variable of interest.

Interpolation:
- Use those sample points to predict values of the variable of interest at all other unsampled locations.

Sampling methods evaluated here:
- Systematic Sampling
- Random Sampling
- Cluster Sampling
- Adaptive Sampling

Systematic Sampling
- Easy
- Samples spaced uniformly at fixed X, Y intervals
- Parallel lines
Advantages
- Easy to understand
Disadvantages
- All locations receive the same attention
- Difficult to stay on lines
- May be biased

Random Sampling
- Select random points
Advantages
- Less biased (unlikely to match a pattern in the landscape)
Disadvantages
- Does nothing to concentrate samples in areas of high variation
- Difficult to explain; the location of points may be a problem

Cluster Sampling
- Cluster centers are established (random or systematic)
- Samples are arranged around each center
Advantages
- Reduced travel time
- Less costly
Disadvantages
- Less representative sampling

Adaptive Sampling
- Higher-density sampling where the feature of interest is more variable
- Requires some method of estimating feature variation
Advantages
- Often efficient: large homogeneous areas get few samples, reserving more for areas with higher spatial variation
Disadvantages
- If there is no method of identifying where features are most variable, several sampling visits are needed; changes in sample density cannot be made on the spot

Spatial Sampling:
- Gather observations representative of the spatial distribution of the variable of interest.
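The sampling patterns described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the 0 to 100 square study region, and the Gaussian scatter around cluster centers are all hypothetical choices, not something the slides prescribe. Adaptive sampling is left out because it needs an estimate of local variability that is specific to the survey.

```python
import numpy as np

def systematic_sample(xmin, xmax, ymin, ymax, spacing):
    """Points on a regular grid with fixed X and Y intervals."""
    xs = np.arange(xmin, xmax + spacing / 2, spacing)
    ys = np.arange(ymin, ymax + spacing / 2, spacing)
    gx, gy = np.meshgrid(xs, ys)
    return np.column_stack([gx.ravel(), gy.ravel()])

def random_sample(xmin, xmax, ymin, ymax, n, rng):
    """n points drawn uniformly at random inside the region."""
    return np.column_stack([rng.uniform(xmin, xmax, n),
                            rng.uniform(ymin, ymax, n)])

def cluster_sample(centers, per_cluster, spread, rng):
    """Samples scattered around each cluster center.

    Assumption: Gaussian scatter with standard deviation `spread`;
    points near the edge may fall slightly outside the region.
    """
    centers = np.asarray(centers, dtype=float)
    offsets = rng.normal(0.0, spread, size=(len(centers), per_cluster, 2))
    return (centers[:, None, :] + offsets).reshape(-1, 2)

rng = np.random.default_rng(0)                      # fixed seed for repeatability
grid_pts = systematic_sample(0, 100, 0, 100, spacing=25)   # 5 x 5 grid
rand_pts = random_sample(0, 100, 0, 100, n=25, rng=rng)
centers = random_sample(0, 100, 0, 100, n=5, rng=rng)      # random cluster centers
clus_pts = cluster_sample(centers, per_cluster=5, spread=3.0, rng=rng)
```

All three designs here produce 25 sample locations, which makes it easier to compare how evenly each one covers the region.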
Interpolation:
- Use those sample points to predict values of the variable of interest at all other unsampled locations.

Interpolation methods evaluated here:
- Thiessen Polygons
- Fixed-Radius – Local Averaging
- Inverse Distance Weighted
- Trend Surface
- Splines
- Kriging

INTERPOLATION
- Many different methods
- All methods use the location and value at sampling locations to estimate the variable of interest at unmeasured locations
- Methods differ in weighting and in the number of observations used
- Each method produces different results (even with the same data)
- No method is best for every application
- Accuracy is often judged with withheld sample points (the difference between the measured and interpolated values)

INTERPOLATION
Usually used for point-to-raster data
- Some methods produce contour lines (vector lines of uniform value)

Raster surface
• Values are measured at a set of sample points
• Raster layer boundaries and cell dimensions are established
• The interpolation method estimates the value for the center of each unmeasured grid cell

Contour lines: iterative process
• From the sample points, estimate points of a given value (e.g. 10 °C)
• Connect these points to form a line
• Estimate the next value (e.g. 20 °C), creating another line, with the restriction that lines of different temperatures do not cross

Example base (figure): elevation contours, with sampled locations and values

INTERPOLATION Thiessen Polygon
- Assigns an interpolated value equal to the value found at the nearest sample location
- Conceptually the simplest method
- Only one point is used (the nearest)
- Often called nearest sample or nearest neighbor

Thiessen Polygon
1. Draw lines connecting the points to their nearest neighbors.
2. Find the perpendicular bisector of each line.
3.
Connect the bisectors of the lines and assign the resulting polygon the value of the center point.

(Figure: sampled locations and values; Thiessen polygons)

INTERPOLATION Thiessen Polygon
Advantages:
- Ease of application
- Appropriate for discrete (i.e., categorical) variables
Disadvantages:
- Accuracy depends largely on sampling density
- Boundaries are often oddly shaped at transitions
- Continuous variables are often not well represented

INTERPOLATION Fixed-Radius – Local Averaging
- More complex than nearest sample
- Cell values are estimated as the average of nearby samples
- The samples used depend on the search radius (any sample found inside the circle is used in the average; samples outside are ignored)
• Specify the output raster grid
• A fixed-radius circle is centered over a raster cell
- Circle radius typically equals several raster cell widths (causes neighboring cell values to be similar)
- Several sample points are used
- Some circles may contain no points
- Search radius is important; too large a radius may smooth the data too much

INTERPOLATION Inverse Distance Weighted (IDW)
- Estimates the values at unknown points using the distances to, and values of, nearby known points (IDW reduces the contribution of a known point to the interpolated value as its distance increases)
- The weight of each sample point is inversely proportional to its distance.
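The inverse-distance idea can be sketched as a weighted average. This is a minimal illustration, assuming weights of the form 1/d^p, where the exponent `power`, the optional neighbor count `k`, and the function name `idw` are hypothetical names chosen for this example.

```python
import numpy as np

def idw(sample_xy, sample_z, query_xy, power=2, k=None):
    """Inverse distance weighted estimate at each query point.

    power : exponent applied to distance (larger = nearer points dominate)
    k     : use only the k nearest samples (None = use all samples)
    """
    sample_xy = np.asarray(sample_xy, dtype=float)
    sample_z = np.asarray(sample_z, dtype=float)
    query_xy = np.asarray(query_xy, dtype=float)

    # Distance from every query point to every sample: shape (n_query, n_samples)
    d = np.linalg.norm(query_xy[:, None, :] - sample_xy[None, :, :], axis=2)

    out = np.empty(len(query_xy))
    for j, dj in enumerate(d):
        idx = np.argsort(dj)[:k]          # nearest first; [:None] keeps all
        dk, zk = dj[idx], sample_z[idx]
        if dk[0] == 0.0:
            # Query coincides with a sample: return the measured value
            out[j] = zk[0]
        else:
            w = 1.0 / dk ** power         # weight shrinks with distance
            out[j] = np.sum(w * zk) / np.sum(w)
    return out

samples = [(0.0, 0.0), (2.0, 0.0)]
values = [10.0, 20.0]
est = idw(samples, values, [(1.0, 0.0), (0.0, 0.0), (0.5, 0.0)], power=2)
# est is approximately [15.0, 10.0, 11.0]
```

The midpoint gets the plain average, the query at a sample location reproduces the measured value, and the point closer to the first sample is pulled toward 10.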
- The farther away the point, the less weight it has in defining the value at the unsampled location

INTERPOLATION Inverse Distance Weighted (IDW)

$$Z_j = \frac{\sum_i Z_i / D_{ij}^{\,n}}{\sum_i 1 / D_{ij}^{\,n}}$$

where
- Z_i is the value at known point i
- D_ij is the distance from known point i to the unknown point j
- Z_j is the estimated value at the unknown point j
- n is a user-selected exponent (often 1, 2 or 3)
Any number of points may be used, up to all points in the sample; typically 3 or more.

INTERPOLATION Inverse Distance Weighted (IDW)
Factors affecting the interpolated surface:
• The size of the exponent n affects the shape of the surface (a larger n means the closer points are more influential)
• A larger number of sample points results in a smoother surface

INTERPOLATION Trend Surface Interpolation
Fitting a statistical model, a trend surface (typically polynomial), through the measured points. For example, a second-order polynomial in the coordinates (x, y):

$$Z = a_0 + a_1 x + a_2 y + a_3 x^2 + a_4 y^2 + a_5 x y$$

where Z is the value at any point (x, y) and the a_i are coefficients estimated in a regression model.

INTERPOLATION Splines
- Name derived from the drafting tool, a flexible ruler that helps create smooth curves through several points
- Spline functions (also called splines) are used to interpolate along a smooth curve (similar to the flexible ruler)
- Force a smooth line to pass through a desired set of points
- Constructed from a set of joined polynomial functions

INTERPOLATION Kriging
A statistically based estimator of spatial variables
Components:
• Spatial trend (an increase/decrease in a variable that depends on direction, e.g.
temperature may decrease toward the northwest)
• Autocorrelation (the tendency for points near each other to have similar values)
• Random (statistically defined by a probability function)
Creates a mathematical model which is used to estimate values across the surface

Kriging: Concept of Lag Distance
Where:
- Z_i is a variable at a sample point
- h_ij is the distance between sample points i and j
Every possible pair Z_i, Z_j defines a distance h_ij, and the pair differs in value by the amount Z_i − Z_j. The distance h_ij is known as the lag distance between points i and j. There is also a subset of point pairs in a sample set that are a given lag distance apart.
(Figure: example with lag distance h = 4)

Kriging: Concept of Spatial Autocorrelation
Higher autocorrelation indicates that points near each other are alike. This provides substantial information about nearby locations.
(Figure: h = width of 1 cell)

INTERPOLATION Kriging: Concept of Semi-variance

$$\gamma(h) = \frac{1}{2n} \sum (Z_i - Z_j)^2$$

where
- Z_i is the measured variable at one point
- Z_j is the measured variable at another point a distance h away
- n is the number of pairs that are approximately h distance apart
Semi-variance may be calculated for any h. (When nearby points are similar, (Z_i − Z_j) is small, so the semi-variance is small. High spatial autocorrelation means points near each other have similar Z values.)

INTERPOLATION (cont.) Kriging
- When calculating the semi-variance for a particular h, a tolerance is often used (as few h values will be identical)
- Plot the semi-variance for a range of lag distances in a variogram

Variogram
Semi-variance is usually small at small lag distances and increases to a constant value as the lag distance h increases.
• The nugget is the initial semi-variance, at the smallest lag distances, where autocorrelation is typically highest
• The sill is the point where the variogram levels off; background noise; where there is little autocorrelation
• The range is the lag distance at which the sill is reached
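The semi-variance calculation with a lag tolerance can be sketched as follows. This is an illustration only, assuming the usual definition (half the mean squared difference over the n pairs that fall within the tolerance of a lag); the function name `empirical_semivariance` and its parameters are hypothetical.

```python
import numpy as np

def empirical_semivariance(xy, z, lags, tol):
    """Semi-variance for each lag distance, using pairs within +/- tol of the lag."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)

    # Every pair of sample points, counted once (i < j)
    i, j = np.triu_indices(len(z), k=1)
    h = np.linalg.norm(xy[i] - xy[j], axis=1)   # lag distance of each pair
    sq_diff = (z[i] - z[j]) ** 2                # squared value difference

    gamma = []
    for lag in lags:
        mask = np.abs(h - lag) <= tol           # pairs approximately `lag` apart
        n = mask.sum()
        # No pairs at this lag: semi-variance is undefined
        gamma.append(sq_diff[mask].sum() / (2 * n) if n > 0 else np.nan)
    return np.array(gamma)

# Four samples along a line, one unit apart
xy = [(0, 0), (1, 0), (2, 0), (3, 0)]
z = [1.0, 2.0, 4.0, 7.0]
gamma = empirical_semivariance(xy, z, lags=[1, 2, 3], tol=0.1)
# gamma is approximately [2.33, 8.5, 18.0]
```

Plotting `gamma` against the lags gives the points of an empirical variogram; in this toy transect the semi-variance keeps rising, so no sill is reached within the sampled distances.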
Kriging
• A set of sample points is used to estimate the shape of the variogram
• A variogram model is made (a line is fit through the set of semi-variance points)
• The variogram model is then used to interpolate the entire surface

INTERPOLATION (cont.) Kriging
- Similar to Inverse Distance Weighting (IDW)
- Kriging uses the minimum variance method to calculate the weights rather than applying an arbitrary or less precise weighting scheme

Interpolation in ArcGIS:
- Spatial Analyst
- Geostatistical Analyst
- arcscripts.esri.com

INTERPOLATION (cont.)
Exact/Non-Exact methods (Is there a difference at the sample locations?)
Exact:
- Thiessen
- IDW
Non-Exact:
- Fixed-Radius (averages several points near the sample location)
- Trend surface (surface typically does not pass through the measured points)
Spline
Kriging

Class Vote: Which method works best for this example?
(Figure panels: Original Surface; Systematic; Random; Cluster; Adaptive)

Class Vote: Which method works best for this example?
(Figure panels: Original Surface; Thiessen Polygons; Fixed-Radius – Local Averaging; IDW: squared, 12 nearest points; Trend Surface; Spline; Kriging)

Core Area Identification
• Commonly used when we have observations on a set of objects and want to identify regions of high density
• Crime, wildlife, pollutant detection
• Derive regions (territories) or density fields (rasters) from a set of sampling points.
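One simple way to derive a density field from observation points is to count points per raster cell. The sketch below assumes that approach; the function name `density_raster`, the cell size, and the threshold used to flag high-density cells are all hypothetical choices, and other density estimators could be used instead.

```python
import numpy as np

def density_raster(xy, xmin, xmax, ymin, ymax, cell):
    """Point density (count per unit area) on a regular grid of square cells.

    Rows of the result follow y, columns follow x.
    """
    xy = np.asarray(xy, dtype=float)
    nx = int(np.ceil((xmax - xmin) / cell))
    ny = int(np.ceil((ymax - ymin) / cell))
    counts, _, _ = np.histogram2d(
        xy[:, 1], xy[:, 0],                      # y first so rows correspond to y
        bins=[ny, nx],
        range=[[ymin, ymin + ny * cell], [xmin, xmin + nx * cell]],
    )
    return counts / cell ** 2

# Three observations in a 3 x 2 region with unit cells
obs = [(0.5, 0.5), (0.6, 0.4), (2.5, 1.5)]
dens = density_raster(obs, 0, 3, 0, 2, cell=1.0)

# Cells at or above a chosen threshold form a candidate core area
core_cells = np.argwhere(dens >= 2)              # (row, col) indices
```

Here the two nearby observations fall in the same cell, so that cell is the only one flagged as a core area at a threshold of 2 points per unit area.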