Ecological Modelling 196 (2006) 256–264
Available at www.sciencedirect.com
Journal homepage: www.elsevier.com/locate/ecolmodel

Methods for interpolating stream width, depth, and current velocity

Jud F. Kratzer a,∗, Daniel B. Hayes a, Bradley E. Thompson b

a Department of Fisheries and Wildlife, 12 Natural Resources Building, Michigan State University, East Lansing, MI 48824, USA
b Washington Department of Fish and Wildlife, 600 Capitol Way North, Olympia, WA 98501, USA

∗ Corresponding author. Tel.: +1 517 353 6697; fax: +1 517 432 1699. E-mail address: kratzer1@msu.edu (J.F. Kratzer).

Article info

Article history: Received 30 September 2004; received in revised form 23 January 2006; accepted 1 February 2006; published online 20 March 2006.

Keywords: Interpolation; Stream habitat; Loess; Kriging; Moving average

Abstract

Interpolation is a type of modeling that can be used to estimate habitat variables throughout a stream based on measurements distributed along the stream's length, but little guidance is available to select the best method of interpolation. Thus, we compared several methods to determine which produced the most accurate interpolation of width, depth, and current velocity, separately. We also determined whether interpolation should be performed using separate datasets for riffles, runs, and pools or unstratified datasets. We measured stream width, maximum depth, and mean current velocity in a northern Michigan watershed. We tested seven methods of interpolation: global average, linear regression, cubic spline, moving average, Lagrange polynomials, Kriging, and Loess smoother. Accuracy of the different methods was determined by comparing interpolated habitat conditions to actual values measured at points along the river. This study produced two main recommendations. First, when performing interpolations, data should be stratified by meso-habitat type (riffles, runs, and pools) only when habitat variables are different for each meso-habitat type and stratification does not increase distance between points such that interpolation accuracy is reduced. If habitat variables are similar for all meso-habitat types, knowing the meso-habitat type within which a point falls does not add information that will increase interpolation accuracy. Second, the Loess smoother with a smoothing parameter from 0.2 to 0.4 generally produced the most accurate interpolated values and is the method we recommend for similar situations.

© 2006 Elsevier B.V. All rights reserved.
0304-3800/$ – see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.ecolmodel.2006.02.004

1. Introduction

Knowing the habitat available in a stream is useful for modeling the distribution and production of fish. To completely describe available habitat, habitat conditions need to be measured throughout the stream. Because of the expense of sampling at a fine scale throughout an entire river system, two approaches are often used to sample stream fish habitat. In the representative reach approach, habitat conditions are measured at a fine scale (e.g., sample grid of 1–2 m), but sampling is limited to a relatively short reach (e.g., 1 km or less). This approach is limited because habitat conditions between selected reaches are unknown. Another approach is to sample at points spaced broadly along the length of a stream (e.g., 100 m), thereby covering greater lengths of stream. A limitation with this approach, however, is that habitat conditions are not directly sampled at a fine scale, thereby preventing development of detailed maps of stream habitat.
The goal of this study was to explore how interpolation could be used with this broad-scale sampling approach to estimate habitat conditions, in effect creating a one-dimensional map of stream habitat based on habitat measurements scattered along the stream's length. Interpolation does not describe the relationship between habitat variables such as width, depth, and velocity at a given point, but rather, it uses measurements at selected sampling points to estimate conditions at unknown points.

For example, in a study of the movement behavior of steelhead, Oncorhynchus mykiss, in the Pine River, Michigan, Thompson (2004) found it necessary to describe habitat conditions over large reaches of the stream. Stream width, maximum depth, and mean water velocity were collected at transects dispersed along the river reach (Fig. 1). Thus, habitat conditions were known at each sampling transect. Because fish movement depends on habitat conditions throughout their travel route, it was necessary to estimate conditions at a finer scale than the data were collected (Fig. 1), or would even be feasible to collect. The motivation for our study was how best to estimate conditions along the entire river reach. This is a common problem because there are always limitations on the number and scale of measurements that can be taken.

Fig. 1 – Example of sampling transects along a river reach, illustrating the need for interpolation to estimate conditions at intervening points.

Interpolation has been used to describe aquatic habitat and the distribution of aquatic species. Lehmann et al. (1997) interpolated texture, nutrients, and organic content from point sediment samples in the littoral zone of Lake Geneva, Switzerland. Battista (2001) used a GIS-based habitat suitability index to interpolate suitability of habitat for the eastern oyster, Crassostrea virginica, in Chesapeake Bay.
On a smaller scale, Beebe (1996) interpolated water velocities at channel edges, and around large woody debris (Beebe, 2001). The PHABSIM model also relies heavily on interpolation of points collected at a fine scale to estimate available habitat (Milhous et al., 1989). Hankin (1984) and Hankin and Reeves (1988) estimated fish abundance in reaches distributed along a stream in order to estimate total fish abundance. However, interpolation has not been widely used to describe the longitudinal distribution of habitat in rivers, and consequently, there are few guidelines for choosing the appropriate method and protocols for interpolating river habitat variables.

There are many methods that can be used to interpolate habitat variables in streams. Thus, our first objective was to determine which method produces the most accurate interpolation of several stream habitat variables. The second objective was to determine whether interpolation should be performed using separate datasets for riffles, runs, and pools or unstratified datasets. The third objective was to determine whether there were inherent differences in interpolation accuracy for the three habitat variables: width, depth, and velocity. The objective was not to describe relationships between habitat variables, but rather to create a separate one-dimensional map for each of the three habitat variables.

Fig. 2 – Pine River watershed location within Michigan.

2. Study area

The Pine River, a tributary to the AuSable River and Lake Huron, drains a 756 km² watershed located in the southeast quarter of Alcona County, Michigan, USA (Fig. 2). Backus Creek, and the East, West, and South branches converge to form the Main Branch of the Pine before emptying into Van Etten Lake, a 5.7 km² impoundment. The South Branch is the longest tributary of the Main Branch and contributes the largest amount of discharge relative to other branches (Table 1).
The majority of each sub-catchment is forested and contains minimal proportions of urban and agricultural areas. The majority of stream habitat is classified as run, with lesser amounts of riffle and pool habitat (Table 1). The river meanders freely, with a low gradient and surficial substrate composed mostly of sand and gravel. Summer stream temperatures of the major tributaries rarely exceed 23 °C and support a coldwater fish community dominated by juvenile steelhead (O. mykiss), brown trout (Salmo trutta), brook trout (Salvelinus fontinalis), mottled sculpin (Cottus bairdii), creek chub (Semotilus atromaculatus), white sucker (Catostomus commersoni), and northern brook lamprey (Ichthyomyzon fossor).

Table 1 – Mean width, maximum depth, velocity, and discharge; percent habitat composition; number of transects; total length surveyed in Backus Creek and the East, Main, South, and West Branches of the Pine River watershed

| Branch | Width (m) | Max. depth (cm) | Velocity (m/s) | Discharge (m³/s) | Pool | Riffle | Run | Number of transects | Total length (km) |
|---|---|---|---|---|---|---|---|---|---|
| Backus | 5.4 | 34.8 | 0.132 | 0.702 | 17% | 30% | 52% | 68 | 7.5 |
| East | 7.0 | 53.8 | 0.192 | 1.286 | 17% | 38% | 45% | 137 | 15.2 |
| Main | 14.3 | 68.0 | 0.152 | 2.476 | 22% | 19% | 58% | 87 | 14.5 |
| South | 8.6 | 52.7 | 0.199 | 2.328 | 27% | 25% | 49% | 183 | 20.2 |
| West | 7.7 | 42.9 | 0.141 | 1.211 | 26% | 24% | 50% | 146 | 16.5 |

3. Methods

3.1. Data collection

We measured stream width, maximum depth, and mean current velocity during summer base flow conditions in the East Branch, West Branch, South Branch, and Mainstem Pine River and Backus Creek, Alcona County, Michigan (Fig. 1). To collect habitat data, we walked upstream a randomly determined distance from the downstream end of each branch and measured hydraulic variables at random distances thereafter.
We drew each distance from a random number generator with a uniform distribution from 10 to 190 m for all branches but the mainstem, for which the distribution ranged from 10 to 290 m. We measured wetted width at the selected point. Depth (cm) and current velocity (m/s) were measured at 50 cm intervals in stream reaches <15 m wide and at 100 cm intervals in stream reaches >15 m wide. We also recorded the meso-habitat type as either riffle, run, or pool (Hicks and Watson, 1985). We then randomly selected another distance and repeated the process. Because these data were collected as part of a study on juvenile steelhead, we stopped moving upstream when the stream became <2 m wide or we reached a barrier to upstream fish movement.

3.2. Data preparation

For each branch, we divided data into four datasets. There were three "stratified" datasets, one for each meso-habitat (i.e., riffles, runs, and pools). The "unstratified" dataset consisted of data from all meso-habitat types combined. To assess the accuracy of interpolation, we subsampled each dataset by leaving out every other sample point. This allowed us to produce interpolation estimates that could be compared to known values at points deleted from the interpolation.

3.3. Interpolation methods

We tested seven methods of interpolation: global average, linear regression, cubic spline, moving average, Lagrange polynomials, Kriging, and Loess smoother. We did not describe relationships between the three habitat variables, but rather used width to interpolate width, depth to interpolate depth, and velocity to interpolate velocity.

The global average method simply used the overall mean value of observed data to estimate conditions at unknown points. We used linear regression to take into account longitudinal trends and make estimates of habitat variables at unknown points using regression predictions.
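The subsampling step and the two simplest methods above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the choice to keep even-indexed points as "observed", and the sample distances and widths are all hypothetical.

```python
from statistics import mean


def split_every_other(x, y):
    """Hold out every other point: even indices stay observed, odd are deleted."""
    return (x[0::2], y[0::2]), (x[1::2], y[1::2])


def global_average(obs_y, targets):
    """Predict the overall observed mean at every target location."""
    overall = mean(obs_y)
    return [overall for _ in targets]


def linear_regression(obs_x, obs_y, targets):
    """Least-squares line of the habitat variable on upstream distance."""
    x_bar, y_bar = mean(obs_x), mean(obs_y)
    sxx = sum((xi - x_bar) ** 2 for xi in obs_x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(obs_x, obs_y))
    slope = sxy / sxx
    return [y_bar + slope * (t - x_bar) for t in targets]


def mean_squared_error(actual, predicted):
    """Mean of squared differences between deleted and interpolated values."""
    return mean((a - p) ** 2 for a, p in zip(actual, predicted))


# Hypothetical transects: upstream distance (m) and wetted width (m).
distance = [0.0, 100.0, 200.0, 300.0, 400.0, 500.0, 600.0]
width = [5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0]

(obs_x, obs_y), (del_x, del_y) = split_every_other(distance, width)
mse_global = mean_squared_error(del_y, global_average(obs_y, del_x))
mse_linear = mean_squared_error(del_y, linear_regression(obs_x, obs_y, del_x))
print(mse_global, mse_linear)
```

Because the made-up widths increase perfectly linearly with distance, the regression recovers the deleted points almost exactly while the global average does not; real stream data would show far less trend.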
In the cubic spline interpolation method (Press et al., 1992), the first set of four points was used to make estimations for deleted points within the span of the data. The "window" was then advanced one point (i.e., the second through fifth points) and an estimation was made for the second deleted point. The cubic spline interpolation calculations were performed in Microsoft Excel using a macro found at www.srs1software.com.

We evaluated moving averages with varying windows to interpolate habitat variables. For a moving average with a window width of two, the average value of the two data points surrounding a deleted point was used to provide an interpolation estimate. For a width of four, four surrounding observed data points were used, two on each side. Widths ranged from 2 to 10 for the unstratified dataset, but small sample size of some meso-habitats reduced the largest possible moving average width for stratified data in some branches.

We used Lagrange polynomials ranging from linear to fourth order (Press et al., 1992). The general equation for an N − 1 order Lagrange polynomial interpolation is

$$P(x) = y_1 \frac{(x - x_2)(x - x_3)\cdots(x - x_N)}{(x_1 - x_2)(x_1 - x_3)\cdots(x_1 - x_N)} + y_2 \frac{(x - x_1)(x - x_3)\cdots(x - x_N)}{(x_2 - x_1)(x_2 - x_3)\cdots(x_2 - x_N)} + \cdots + y_N \frac{(x - x_1)(x - x_2)\cdots(x - x_{N-1})}{(x_N - x_1)(x_N - x_2)\cdots(x_N - x_{N-1})}$$

where y is the habitat variable, x (without a subscript) is the upstream distance for the point to be interpolated, and the subscripts 1 to N index the N observed data points used in the interpolation. As with cubic spline interpolation, sequential sets of observed points were used to interpolate for deleted points.

We performed Kriging using the KRIGE2D procedure in SAS. The KRIGE2D procedure requires that the shape, range, and scale of the sample variogram be estimated according to the procedure outlined in the SAS/STAT User's Guide (SAS Institute Inc., 2000).
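The Lagrange interpolation over sequential sets of observed points, described above ahead of the Kriging procedure, can be sketched as follows. This is an illustrative sketch only: the helper names, the rule used to centre the window on the target, and the sample data are assumptions rather than details given in the text.

```python
from bisect import bisect_left


def lagrange(xs, ys, x):
    """Evaluate the Lagrange polynomial through the points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def sliding_lagrange(obs_x, obs_y, target, n_points):
    """Interpolate at target from n_points consecutive observed points.

    Assumption: the window is centred on the target where possible and
    shifted inward near either end of the dataset. obs_x must be sorted.
    """
    k = bisect_left(obs_x, target)  # count of observed points below target
    start = max(0, min(k - n_points // 2, len(obs_x) - n_points))
    window = slice(start, start + n_points)
    return lagrange(obs_x[window], obs_y[window], target)


# Hypothetical observed points (distance in m, habitat value) with y = x**2.
obs_x = [0.0, 2.0, 4.0, 6.0, 8.0]
obs_y = [0.0, 4.0, 16.0, 36.0, 64.0]

print(sliding_lagrange(obs_x, obs_y, 3.0, 2))  # linear, two points
print(sliding_lagrange(obs_x, obs_y, 3.0, 3))  # quadratic, three points
```

Every observed value enters the polynomial with full weight, so a single extreme measurement inside the window moves the estimate directly; nothing in the formula averages it away.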
The KRIGE2D procedure performs two-dimensional Kriging and requires three values for every data point: an x-coordinate, a y-coordinate, and the variable of interest. Our stream habitat data were one-dimensional, so we simply assigned an x-coordinate value of zero to all data points. A Kriging interpolation is essentially a weighted average, with the weights assigned to observed data values according to their distance from the point of interest and their redundancy (Isaaks and Srivastava, 1989). Observed points that are closer to the deleted point and observed points that are more isolated from other observed points receive higher weights. Global Kriging makes use of all observed data points, but in local Kriging, the observed data points used in the interpolation are limited to points falling within a specified radius. For one-dimensional data, the radius is actually just a distance along the line. We performed global Kriging and local Kriging with radii of 100, 500, 1000, 2000, and 4000 m. In each local Kriging interpolation, the minimum number of points to include in the interpolation was set at two.

The final method of interpolation was the Loess smoother. We performed Loess interpolation using the LOESS procedure in SAS. In the LOESS procedure, the habitat variable at an unknown data point is estimated from linear regression of observed data points within a neighborhood of chosen size surrounding each unknown point. The smoothing parameter (s) is used to change the radius of the neighborhood used for local regression. When s < 1, the local neighborhood consists of the s fraction of the observed data closest to the given unknown point. When s ≥ 1, all observed data points are used in the local regression. We used smoothing parameters of 0.1, 0.2, 0.3, 0.4, 0.5, and 1 for the unstratified data.
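The local regression just described can be sketched for a single unknown point as below. This is a simplified stand-in for the SAS LOESS procedure, not a reproduction of it: it assumes tricube distance weights, rounds the neighborhood size up with a floor of two points, and falls back to equal weights when every neighbor is equally far from the target. The function name and test data are hypothetical.

```python
import math


def loess_point(obs_x, obs_y, target, s):
    """Estimate y at target by distance-weighted local linear regression."""
    n = len(obs_x)
    # Neighbourhood: the fraction s of observed points nearest the target
    # (rounding up and the minimum of two points are assumptions).
    k = n if s >= 1 else max(2, math.ceil(s * n))
    nearest = sorted(range(n), key=lambda i: abs(obs_x[i] - target))[:k]
    xs = [obs_x[i] for i in nearest]
    ys = [obs_y[i] for i in nearest]
    dists = [abs(x - target) for x in xs]
    q = max(dists)  # distance to the furthest point in the neighbourhood

    # Tricube weights; any constant multiplier cancels in the regression.
    weights = [(1 - (d / q) ** 3) ** 3 if q > 0 else 1.0 for d in dists]
    if sum(weights) == 0:          # all neighbours lie exactly at distance q
        weights = [1.0] * len(xs)  # assumption: fall back to equal weights

    w_sum = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, xs)) / w_sum
    y_bar = sum(w * y for w, y in zip(weights, ys)) / w_sum
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar)
              for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return y_bar + slope * (target - x_bar)


# Hypothetical observed points lying on the line y = 2x + 1.
obs_x = [10.0 * i for i in range(10)]
obs_y = [2.0 * x + 1.0 for x in obs_x]
print(loess_point(obs_x, obs_y, 35.0, 0.5))
```

With s = 0.5 and ten observed points, only the five nearest points enter the regression, and the one furthest from the target receives zero weight.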
Small sample sizes in some meso-habitats made small smoothing parameters impossible to use for the stratified data of some branches. Data points in a given local neighborhood are weighted based on their distance from the center of the neighborhood, with points further from the center receiving less weight in the local regression. The weight of a data point is given by

$$w_i = \frac{32}{5}\left(1 - \left(\frac{d_i}{Q}\right)^3\right)^3$$

where d_i is the distance of point i from the center of the neighborhood, and Q is the distance between the center of the neighborhood and the point furthest from the center (SAS Institute Inc., 2000). As s decreases, the w_i of a given point will tend to decrease because Q will decrease.

3.4. Comparison of methods

We used mean squared errors to determine which method provided the most accurate interpolations for each habitat variable. Squared error was calculated as the squared difference between the actual value of deleted data points and their interpolated values. To determine if interpolation accuracy was increased by stratifying data by meso-habitat types, we compared mean squared errors for interpolations using stratified and unstratified data for each interpolation method individually. The difference between interpolations using stratified and unstratified datasets is that in stratified interpolation, only riffle data were used to estimate habitat in riffles, only runs were used for runs, and only pools were used for pools, whereas unstratified interpolation used data from all meso-habitat types. In order to explain the differences between stratified and unstratified interpolations, we used ANOVA to test for differences in habitat variables among riffles, runs, and pools. We also used the coefficient of variation (CV, square root of mean squared error divided by the mean value of the habitat variable) as a dimensionless measure to compare interpolation accuracy for stream width, depth, and velocity.

4. Results and discussion

4.1. Stratification

Over all branches, stream width did not vary by meso-habitat type (P > 0.1), but velocity and water depth did (P < 0.01 in each case). Stream width averaged between eight and nine meters regardless of habitat type. As expected, riffles were shallower and had faster current than runs, which were shallower and faster than pools (Table 2).

Table 2 – Mean and standard deviation of width, depth, and current velocity in pools, riffles, and runs from all studied reaches of the Pine River system

| Variable | Pool (n = 143) Mean | Pool S.D. | Riffle (n = 190) Mean | Riffle S.D. | Run (n = 301) Mean | Run S.D. |
|---|---|---|---|---|---|---|
| Width (m) | 8.87 | 4.58 | 8.03 | 2.95 | 8.20 | 3.82 |
| Depth (cm) | 74.10 | 28.13 | 36.91 | 16.61 | 48.03 | 18.50 |
| Velocity (m/s) | 0.12 | 0.05 | 0.22 | 0.08 | 0.15 | 0.06 |

When interpolating stream width, it was generally better not to stratify the data by habitat type, whichever interpolation method was used (Table 3). In other words, knowing the width of the river at surrounding points was more important than knowing whether these points fell in riffles, runs, or pools. This result was not unexpected because riffles, runs, and pools are generally defined by their depths and current velocities but not by their widths. The South Branch was the only branch for which stratified data provided a more accurate interpolation (based on median mean squared error of the different methods), and this was also the only branch that had significantly different widths in riffles, runs, and pools (P = 0.04).

For water depth, stratified data provided more accurate interpolations for the East Branch, the South Branch, and Backus Creek, but unstratified data generally provided more accurate interpolations for the West Branch and the Mainstem (Table 4). Riffles, runs, and pools are partially defined by depth, so interpolation should intuitively be aided by stratifying the data.
For the West Branch and the Mainstem, however, depths in the area surrounding the estimated point, regardless of the habitat type, were better predictors of depth than depths in similar habitat types.

Stratified data generally provided more accurate interpolations of stream velocity. For the Mainstem, unstratified data provided the better interpolation (based on median mean squared errors of the different methods), and for the East Branch, stratified and unstratified interpolations performed similarly (Table 5). Mean velocity differed significantly among riffles, runs, and pools within all branches; thus, knowing that an unknown data point fell in a riffle, run, or pool generally increased the accuracy of interpolation.

4.2. Comparison of methods

There was no single method of interpolation that consistently produced the most accurate interpolations. However, three methods generally performed best: a moving average with a width of four to eight points, Kriging with a radius of 1000 to 4000 m, and the Loess smoother with a smoothing parameter of 0.2 to 0.4. These three methods are conceptually similar, as each one uses data within a predetermined distance from the point to be interpolated. They also statistically smooth the data because an interpolation of a sampled data point usually estimates a value that is not equal to the actual measured value.
They differ in that the moving average simply assigns the average of the surrounding data, while local Kriging assigns a weighted average of the surrounding data, and the Loess smoother assigns a value based on a weighted linear regression of the local data.

Table 3 – Mean squared errors for the different methods of interpolation for stream width (m)

| Interpolation | East U | East S | West U | West S | Mainstem U | Mainstem S | South U | South S | Backus U | Backus S | Average U | Average S |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Global average | 3.15 | 3.37 | 12.55 | 13.46 | 13.06 | 13.53 | 13.15 | 11.59 | 1.95 | 1.57 | 8.77 | 8.70 |
| Linear regression | 2.44 | 3.53 | 12.60 | 12.67 | 8.16 | 8.31 | 12.84 | 11.04 | 1.92 | 1.69 | 7.59 | 7.45 |
| Cubic spline | 3.51 | 4.00 | 13.52 | 14.89 | 8.25 | 17.63 | 10.95 | 9.24 | 2.68 | 2.10 | 7.78 | 9.57 |
| Moving average, width = 2 | 2.94 | 3.54 | 12.70 | 13.56 | 8.52 | 8.52 | 9.74 | 6.97 | 2.24 | 2.46 | 7.23 | 7.01 |
| Moving average, width = 4 | 2.51 | 3.20 | 11.61 | 12.75 | 7.72 | 6.88 | 10.33 | 8.33 | 1.94 | 1.94 | 6.82 | 6.62 |
| Moving average, width = 6 | 2.49 | 3.31 | 11.85 | 12.45 | 7.19 | 7.42 | 10.48 | 10.10 | 1.87 | NA | 6.78 | 8.32 |
| Moving average, width = 8 | 2.49 | 3.25 | 12.43 | 12.74 | 7.78 | 8.31 | 10.69 | 9.71 | 1.95 | NA | 7.07 | 8.50 |
| Moving average, width = 10 | 2.52 | NA | 13.07 | 12.75 | 7.79 | 9.44 | 10.79 | 9.50 | 1.99 | NA | 7.23 | 10.56 |
| Lagrange, linear | 3.08 | 3.44 | 12.41 | 12.75 | 7.49 | 8.11 | 9.87 | 5.31 | 2.40 | 1.93 | 7.05 | 6.31 |
| Lagrange, quadratic | 3.40 | 3.76 | 15.48 | 15.59 | 8.67 | 19.41 | 11.65 | 11.49 | 2.63 | 2.08 | 8.37 | 10.46 |
| Lagrange, 3rd order | 3.44 | 4.89 | 13.93 | 14.52 | 10.91 | 160.35 | 11.57 | 13.04 | 2.88 | 2.23 | 8.55 | 39.01 |
| Lagrange, 4th order | 4.20 | 9.78 | 14.44 | 38.13 | 13.45 | 474.42 | 13.83 | 332.74 | 3.29 | 3.44 | 9.84 | 171.70 |
| Kriging, global | 3.02 | 3.47 | 12.62 | 12.96 | 10.13 | 10.54 | 9.22 | 7.93 | 2.15 | 1.54 | 7.43 | 7.29 |
| Kriging, radius = 100 | 3.09 | 3.65 | 12.08 | 11.90 | 7.71 | 8.10 | 9.88 | 6.02 | 2.44 | 1.91 | 7.04 | 6.32 |
| Kriging, radius = 500 | 3.18 | 3.59 | 11.73 | 11.85 | 7.12 | 8.10 | 9.27 | 5.93 | 2.24 | 1.91 | 6.71 | 6.28 |
| Kriging, radius = 1000 | 3.04 | 3.55 | 12.39 | 12.22 | 6.57 | 7.81 | 8.98 | 6.63 | 2.18 | 1.86 | 6.63 | 6.41 |
| Kriging, radius = 2000 | 3.07 | 3.44 | 12.20 | 12.63 | 7.38 | 8.05 | 8.79 | 6.67 | 2.17 | 1.67 | 6.72 | 6.49 |
| Kriging, radius = 4000 | 3.04 | 3.44 | 12.42 | 12.74 | 7.55 | 8.05 | 8.84 | 6.46 | 2.13 | 1.58 | 6.80 | 6.45 |
| Loess, smooth = 0.1 | 2.46 | NA | 11.88 | NA | 7.44 | NA | 9.88 | 5.17 | 2.40 | NA | 6.81 | 5.17 |
| Loess, smooth = 0.2 | 2.36 | 3.07 | 12.40 | 13.23 | 7.28 | 7.71 | 10.64 | 6.87 | 1.96 | NA | 6.93 | 7.72 |
| Loess, smooth = 0.3 | 2.37 | 3.09 | 12.37 | 12.61 | 7.54 | 7.68 | 10.75 | 8.12 | 1.90 | NA | 6.99 | 7.87 |
| Loess, smooth = 0.4 | 2.42 | 3.06 | 12.38 | 13.14 | 7.92 | 8.12 | 10.99 | 9.09 | 1.94 | NA | 7.13 | 8.36 |
| Loess, smooth = 0.5 | 2.42 | 3.10 | 12.45 | 12.78 | 8.01 | 8.13 | 11.19 | 9.23 | 1.96 | 2.10 | 7.21 | 7.07 |
| Loess, smooth = 1 | 2.41 | 3.18 | 12.58 | 12.99 | 8.10 | 7.40 | 12.26 | 9.88 | 1.92 | 1.88 | 7.45 | 7.06 |
| Median | 2.98 | 3.44 | 12.43 | 12.75 | 7.79 | 8.12 | 10.66 | 8.71 | 2.14 | 1.91 | 7.10 | 7.37 |

The "U" column reports error when data from riffles, runs, and pools were combined in one dataset (unstratified), and the "S" column reports error when interpolation was performed using data from riffles, runs, and pools as separate datasets (stratified).

An advantage of the Loess smoother and Kriging over the moving average is that they can make interpolations regardless of how close the point to be estimated is to the edge of the dataset. For example, in a moving average with a width of six, three observed data points are required on each side of the point to be estimated. Therefore, a moving average of width six cannot be used when there are only two or fewer observed data points between the point to be estimated and the end of the dataset. To overcome this problem, we simply applied the value of the interpolation closest to the edge to all remaining points for which moving average interpolation was impossible.

One major drawback of Kriging is that it requires the estimation of more parameters than the moving average or the Loess smoother. The shape, range, and scale of the sample variogram must be estimated for each dataset. Also, Kriging was much more sensitive to the radius than the Loess smoother was to the smoothing parameter. Lower sensitivity to these parameters is desirable because it makes the outcome of the interpolation less dependent on the researcher's choice of the parameter. Because of these weaknesses of the moving average and Kriging methods, we feel that the Loess smoother is preferable.
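The edge handling described above for the moving average can be sketched as follows. This is a minimal illustration with hypothetical names and data; it assumes the observed and target distances are sorted, that no target coincides with an observed point, and that "closest to the edge" can be taken as the nearest target, in sequence, that did receive an estimate.

```python
from bisect import bisect_left
from statistics import mean


def moving_average(obs_x, obs_y, targets, width):
    """Average the width/2 observed points on each side of every target.

    Targets too close to either end of the data inherit the estimate of
    the nearest target for which the full window was available.
    """
    half = width // 2
    estimates = []
    for t in targets:
        k = bisect_left(obs_x, t)  # observed points downstream of t
        if k >= half and len(obs_x) - k >= half:
            estimates.append(mean(obs_y[k - half:k + half]))
        else:
            estimates.append(None)  # window runs off the end of the data

    valid = [i for i, e in enumerate(estimates) if e is not None]
    if not valid:
        return estimates  # window wider than the dataset allows anywhere
    for i, e in enumerate(estimates):
        if e is None:
            nearest = min(valid, key=lambda j: abs(j - i))
            estimates[i] = estimates[nearest]
    return estimates


# Hypothetical observed depths (cm) and deleted-point locations (m).
obs_x = [0.0, 200.0, 400.0, 600.0, 800.0]
obs_y = [4.0, 6.0, 8.0, 10.0, 12.0]
targets = [100.0, 300.0, 500.0, 700.0]

print(moving_average(obs_x, obs_y, targets, 2))
print(moving_average(obs_x, obs_y, targets, 4))
```

With a width of four, the first and last targets have only one observed point on their outer side, so they take the values computed for their inner neighbors.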
A Loess smoother with a smoothing parameter of 0.2 was generally one of the more accurate interpolation methods (Tables 3–5). For this interpolation, 20% of the observed data points centered around a given unknown data point were used to estimate habitat conditions at the unknown point. In a sense, the smoothing parameter determines how sensitive the interpolation is to extremes in the data. With a smoothing parameter of 0.1, the interpolation is based on only 10% of the total dataset, making the interpolation more responsive to peaks and valleys in the observed data (Fig. 3). As the smoothing parameter increases, the interpolation becomes less influenced by local highs and lows in the observed data. If autocorrelation is high and sampled points are close together, a smaller smoothing parameter may produce more accurate interpolations. The Loess interpolation was less accurate for depth than for width (Table 6). The lower accuracy of the depth interpolation was at least partially caused by the higher variability of maximum depth (Table 7). The coefficient of variation for depths of unknown points was consistently greater than that of widths in all branches. The difference in variability was not caused solely by differences in width and depth along the river, as differences in depth from one unknown data point to the next were also consistently greater than differences in width. 
The Loess interpolation of velocity was less accurate than that of width, but the accuracies of velocity and depth interpolations were more similar (Table 6). Unknown data points had similar coefficients of variation (CV) and point-to-point variation for velocity and depth, while the CV and point-to-point variation of width were much smaller (Table 7).

Table 4 – Mean squared errors for the different methods of interpolation for maximum water depth (cm)

| Interpolation | East U | East S | West U | West S | Mainstem U | Mainstem S | South U | South S | Backus U | Backus S | Average U | Average S |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Global average | 487 | 253 | 385 | 696 | 587 | 853 | 578 | 324 | 199 | 109 | 447 | 447 |
| Linear regression | 483 | 261 | 386 | 675 | 575 | 839 | 520 | 305 | 190 | 139 | 431 | 444 |
| Cubic spline | 500 | 384 | 919 | 880 | 1142 | 1831 | 646 | 438 | 248 | 173 | 691 | 741 |
| Moving average, width = 2 | 420 | 302 | 717 | 698 | 772 | 916 | 552 | 335 | 205 | 141 | 533 | 478 |
| Moving average, width = 4 | 391 | 243 | 534 | 722 | 549 | 855 | 494 | 294 | 178 | 93 | 429 | 441 |
| Moving average, width = 6 | 415 | 230 | 449 | 764 | 481 | 860 | 517 | 282 | 162 | NA | 405 | 534 |
| Moving average, width = 8 | 434 | 233 | 432 | 788 | 497 | 896 | 520 | 293 | 161 | NA | 409 | 553 |
| Moving average, width = 10 | 432 | 256 | 427 | 781 | 538 | 920 | 493 | 304 | 172 | NA | 412 | 565 |
| Lagrange, linear | 408 | 235 | 741 | 749 | 826 | 841 | 581 | 377 | 207 | 147 | 553 | 470 |
| Lagrange, quadratic | 531 | 401 | 1083 | 891 | 902 | 1890 | 624 | 429 | 246 | 173 | 677 | 757 |
| Lagrange, 3rd order | 505 | 729 | 1152 | 862 | 1387 | 25162 | 648 | 522 | 270 | 192 | 792 | 5494 |
| Lagrange, 4th order | 551 | 1009 | 1288 | 1817 | 2122 | 85908 | 768 | 1281 | 376 | 263 | 1021 | 18056 |
| Kriging, global | 423 | 227 | 363 | 719 | 549 | 871 | 571 | 336 | 168 | 97 | 415 | 450 |
| Kriging, radius = 100 | 408 | 261 | 748 | 717 | 771 | 915 | 583 | 368 | 208 | 135 | 544 | 479 |
| Kriging, radius = 500 | 434 | 258 | 541 | 717 | 747 | 915 | 555 | 362 | 181 | 137 | 492 | 478 |
| Kriging, radius = 1000 | 429 | 262 | 447 | 738 | 500 | 890 | 550 | 312 | 165 | 126 | 418 | 466 |
| Kriging, radius = 2000 | 424 | 247 | 388 | 728 | 525 | 861 | 544 | 293 | 165 | 111 | 409 | 448 |
| Kriging, radius = 4000 | 426 | 243 | 383 | 755 | 558 | 888 | 541 | 304 | 163 | 108 | 414 | 460 |
| Loess, smooth = 0.1 | 382 | NA | 481 | NA | 583 | NA | 510 | 349 | 207 | NA | 433 | 349 |
| Loess, smooth = 0.2 | 418 | 245 | 409 | 745 | 501 | 836 | 510 | 300 | 164 | NA | 400 | 531 |
| Loess, smooth = 0.3 | 435 | 242 | 391 | 661 | 513 | 830 | 518 | 273 | 161 | NA | 404 | 501 |
| Loess, smooth = 0.4 | 444 | 228 | 385 | 695 | 542 | 876 | 514 | 269 | 165 | NA | 410 | 517 |
| Loess, smooth = 0.5 | 454 | 234 | 386 | 730 | 555 | 855 | 513 | 271 | 172 | 117 | 416 | 441 |
| Loess, smooth = 1 | 487 | 264 | 393 | 723 | 567 | 835 | 515 | 285 | 189 | 121 | 430 | 445 |
| Median | 433 | 253 | 439 | 730 | 562 | 876 | 543 | 308 | 179 | 135 | 430 | 478 |

The "U" column reports error when data from riffles, runs, and pools were combined in one dataset (unstratified), and the "S" column reports error when interpolation was performed using data from riffles, runs, and pools as separate datasets (stratified).

The global average approach performed surprisingly well, presumably because there was relatively little longitudinal trend in width, depth, or velocity within the length of stream sampled in each branch. Slopes of regression lines were relatively shallow, ranging from −0.0014 to 0.0002 for width, depth, and velocity in all branches; thus linear regression performed similarly to the global average because slopes were near zero. In streams where stronger longitudinal trends are present, a local interpolation method such as the Loess smoother would be preferable.

Cubic spline and Lagrange polynomials were consistently poor methods of interpolation. These methods were too sensitive to extreme values in the observed data. In contrast to the Loess smoother, moving average, and Kriging, these methods do not statistically smooth the data.

Fig. 3 – Observed stream width used in interpolations and interpolated stream width using Loess smoother with smoothing parameters of 0.1 and 0.2.

4.3. Distances between points

The distance between sampled points will determine the accuracy of any interpolation method. We would expect to see smaller fluctuations in habitat conditions and more accurate interpolations between sampled points that are closer to each other.
Determining the ideal distance between sampled points was not a goal of this study, but it must be considered when planning any interpolation. Spacing of sampled points is probably site-specific, with more heterogeneous conditions requiring closer spacing. Limited funding will tend to favor more widely spaced points. Before interpolating stream habitat, a pilot study to determine the most appropriate spacing of sample points may be helpful. Because the goals (and thus acceptable precision) and funds available would be specific to the particular application, we suggest analyzing pilot-study data with the approach taken in this paper: collect data at a "reasonable" scale for the application, then estimate approximate precision by dropping out every other point.

Table 5 – Mean squared errors for the different methods of interpolation for average water velocity (m/s)

Interpolation       East Branch     West Branch     Mainstem        South Branch    Backus Creek    Average
                    U       S       U       S       U       S       U       S       U       S       U       S
Global average      0.0075  0.0058  0.0053  0.0028  0.0016  0.0017  0.0052  0.0039  0.0137  0.0012  0.0067  0.0031
Linear regression   0.0060  0.0054  0.0038  0.0020  0.0016  0.0017  0.0046  0.0030  0.0137  0.0012  0.0060  0.0027
Cubic spline        0.0068  0.0113  0.0062  0.0029  0.0029  0.0030  0.0069  0.0050  0.0142  0.0018  0.0074  0.0048
Moving average
  Width = 2         0.0059  0.0036  0.0045  0.0021  0.0019  0.0026  0.0048  0.0036  0.0140  0.0013  0.0062  0.0027
  Width = 4         0.0052  0.0033  0.0031  0.0017  0.0018  0.0021  0.0040  0.0033  0.0142  0.0010  0.0057  0.0023
  Width = 6         0.0053  0.0031  0.0033  0.0017  0.0014  0.0021  0.0042  0.0032  0.0138  NA      0.0056  0.0025
  Width = 8         0.0051  0.0031  0.0032  0.0017  0.0017  0.0019  0.0043  0.0029  0.0139  NA      0.0057  0.0024
  Width = 10        0.0048  0.0036  0.0035  0.0017  0.0018  0.0017  0.0039  0.0027  0.0139  NA      0.0056  0.0024
Lagrange
  Linear            0.0059  0.0037  0.0046  0.0023  0.0023  0.0028  0.0055  0.0041  0.0029  0.0013  0.0042  0.0028
  Quadratic         0.0067  0.0054  0.0055  0.0031  0.0028  0.0031  0.0064  0.0048  0.0180  0.0015  0.0079  0.0036
  3rd order         0.0070  0.0095  0.0072  0.0039  0.0037  0.0032  0.0071  0.0057  0.0129  0.0029  0.0076  0.0050
  4th order         0.0073  0.0220  0.0079  0.0094  0.0027  0.0038  0.0097  0.0097  0.0151  0.0112  0.0085  0.0112
Kriging
  Global            0.0061  0.0053  0.0047  0.0025  0.0020  0.0023  0.0045  0.0034  0.0137  0.0012  0.0062  0.0029
  Radius = 100      0.0059  0.0097  0.0045  0.0022  0.0023  0.0027  0.0056  0.0039  0.0139  0.0014  0.0064  0.0040
  Radius = 500      0.0061  0.0103  0.0033  0.0021  0.0021  0.0027  0.0051  0.0038  0.0137  0.0014  0.0061  0.0041
  Radius = 1000     0.0061  0.0105  0.0035  0.0019  0.0019  0.0026  0.0047  0.0037  0.0137  0.0012  0.0060  0.0040
  Radius = 2000     0.0061  0.0102  0.0035  0.0019  0.0020  0.0025  0.0047  0.0033  0.0137  0.0012  0.0060  0.0038
  Radius = 4000     0.0061  0.0099  0.0034  0.0019  0.0020  0.0025  0.0046  0.0029  0.0136  0.0013  0.0059  0.0037
Loess smoother
  Smooth = 0.1      0.0052  NA      0.0033  NA      0.0017  NA      0.0040  0.0040  0.0136  NA      0.0056  0.0040
  Smooth = 0.2      0.0051  0.0086  0.0034  0.0020  0.0015  0.0027  0.0041  0.0036  0.0138  NA      0.0056  0.0042
  Smooth = 0.3      0.0051  0.0070  0.0034  0.0018  0.0016  0.0026  0.0042  0.0033  0.0137  NA      0.0056  0.0037
  Smooth = 0.4      0.0052  0.0057  0.0034  0.0018  0.0017  0.0024  0.0042  0.0032  0.0137  NA      0.0056  0.0033
  Smooth = 0.5      0.0054  0.0052  0.0034  0.0017  0.0016  0.0022  0.0042  0.0031  0.0138  0.0010  0.0057  0.0026
  Smooth = 1        0.0059  0.0049  0.0035  0.0018  0.0016  0.0019  0.0046  0.0030  0.0138  0.0010  0.0059  0.0025
Median              0.0059  0.0057  0.0035  0.0020  0.0018  0.0025  0.0046  0.0035  0.0137  0.0013  0.0060  0.0034

The "U" column reports error when data from riffles, runs, and pools were combined in one dataset (unstratified), and the "S" column reports error when interpolation was performed using data from riffles, runs, and pools as separate datasets (stratified).

4.4. Falls River, Upper Peninsula, Michigan

In order to determine whether the results of this study would be applicable to streams in other geologic settings, we performed interpolations of stream width with the moving average, Kriging, and Loess methods, using unstratified data from the Falls River, which is in Michigan's Upper Peninsula (Bryan Burroughs, Department of Fisheries and Wildlife, Michigan State University, personal communication). The geology of this region differs markedly from that of the Pine River. While the Pine River has a low gradient and sandy bottom, the Falls River has a rocky bottom and several waterfalls.

Table 6 – Coefficient of variation (CV = square root of mean squared error/mean) for interpolations of the three habitat variables in each branch using a Loess smoother with a smoothing parameter of 0.2

Branch    Width                       Depth                       Velocity
          Unstratified  Stratified    Unstratified  Stratified    Unstratified  Stratified
East      0.23          0.24          0.39          0.33          0.36          0.44
West      0.44          0.43          0.48          0.57          0.43          0.30
Main      0.19          0.19          0.33          0.40          0.27          0.35
South     0.39          0.34          0.42          0.30          0.33          0.30
Backus    0.25                        0.35                        0.88
Average   0.30          0.30          0.39          0.40          0.45          0.35

Table 7 – Coefficient of variation (CV = standard deviation/mean) and mean point-to-point variation (ptp) of width, depth, and velocity of deleted data points for stratified and unstratified datasets

Branch    Width                         Depth                         Velocity
          Unstratified  Stratified      Unstratified  Stratified      Unstratified  Stratified
          CV     ptp    CV     ptp      CV     ptp    CV     ptp      CV     ptp    CV     ptp
East      0.26   0.23   0.24   0.21     0.41   0.38   0.42   0.34     0.43   0.35   0.43   0.31
West      0.26   0.26   0.21   0.23     0.71   0.48   0.46   0.31     0.57   0.42   0.55   0.26
Main      0.23   0.20   0.23   0.16     0.43   0.42   0.35   0.35     0.32   0.38   0.28   0.25
South     0.37   0.27   0.41   0.29     0.35   0.39   0.40   0.30     0.40   0.40   0.39   0.26
Backus    0.15   0.17                   0.40   0.41                   0.30   0.33
Average   0.25   0.23   0.27   0.22     0.46   0.42   0.41   0.33     0.40   0.38   0.41   0.27
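The error measures reported in Tables 6 and 7 can be illustrated with a minimal Python sketch of the drop-every-other-point check. This is not the implementation used in the study: `loess_predict` and `drop_every_other_point` are hypothetical names, the smoother is a simplified local linear fit with tricube weights standing in for a full Loess procedure, the "mean" in the Table 6 definition is assumed to be the mean of the held-out observations, and the widths in the demonstration are invented values rather than study data.

```python
import math
from statistics import mean, stdev


def loess_predict(xs, ys, x0, span=0.3):
    """Local linear fit with tricube weights, evaluated at x0.

    span is the fraction of retained points used in each local fit
    (the smoothing parameter); at least 3 points are always used.
    """
    k = min(len(xs), max(3, math.ceil(span * len(xs))))
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x0))[:k]
    h = max(abs(x - x0) for x, _ in nearest) or 1.0  # local bandwidth
    sw = swu = swuu = swy = swuy = 0.0
    for x, y in nearest:
        u = x - x0
        w = (1.0 - (abs(u) / h) ** 3) ** 3  # tricube weight
        sw += w
        swu += w * u
        swuu += w * u * u
        swy += w * y
        swuy += w * u * y
    det = sw * swuu - swu * swu
    if abs(det) < 1e-12:
        # Degenerate local fit: fall back to a (weighted) local mean.
        return swy / sw if sw > 0 else mean(y for _, y in nearest)
    # Intercept of the weighted least-squares line = fitted value at x0.
    return (swuu * swy - swu * swuy) / det


def drop_every_other_point(xs, ys, span=0.3):
    """Hold out every other point, interpolate it, and summarise the error."""
    kept_x, kept_y = xs[0::2], ys[0::2]
    held_x, held_y = xs[1::2], ys[1::2]
    preds = [loess_predict(kept_x, kept_y, x0, span) for x0 in held_x]
    mse = mean((p - y) ** 2 for p, y in zip(preds, held_y))
    return {
        "mse": mse,
        # CV as defined for Table 6: sqrt(MSE) / mean (mean of held-out data assumed).
        "cv_error": math.sqrt(mse) / mean(held_y),
        # CV as defined for Table 7: standard deviation / mean of deleted points.
        "cv_data": stdev(held_y) / mean(held_y),
    }


if __name__ == "__main__":
    # Invented widths (m) every 100 m along a stream; not data from the study.
    distance = [100.0 * i for i in range(20)]
    width = [8.0 + 0.002 * d + math.sin(d / 300.0) for d in distance]
    for span in (0.2, 0.3, 0.4):
        print(span, drop_every_other_point(distance, width, span))
```

Comparing `cv_error` with `cv_data` for the same held-out points indicates how much of the natural variability the interpolation leaves unexplained. If the Table 6 definition is applied to the Falls River values in Table 8 (mean squared error of 18.05 at a smoothing parameter of 0.5, mean width of 11.8 m), it gives sqrt(18.05)/11.8 ≈ 0.36.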
Despite the different geologies, the three methods performed similarly, with the Loess smoother with a smoothing parameter of 0.5 yielding the lowest mean squared error (Table 8).

Table 8 – Mean squared errors for the different methods of interpolation for stream width (m) in the Falls River, Upper Peninsula, Michigan. Mean width = 11.8 m

Interpolation       M.S.E.
Moving average
  Width = 4         19.09
  Width = 6         18.84
  Width = 8         19.08
  Width = 10        19.27
Kriging
  Radius = 500      18.70
  Radius = 1000     18.57
  Radius = 2000     18.67
  Radius = 4000     18.80
Loess smoother
  Smooth = 0.2      18.93
  Smooth = 0.3      18.50
  Smooth = 0.4      18.13
  Smooth = 0.5      18.05
  Smooth = 1        18.32

4.5. Relation to physical models

Many investigations of stream habitat rely on physical models (e.g., PHABSIM) to estimate conditions within a study reach. The focus of such models is on the interrelationship among variables such as stream width, depth, velocity, and substrate size. These models provide useful estimates of the multivariate structure of stream environments and are based on fundamental laws of physics. Although the relationship among these variables can be thought of as following deterministic physical laws, the actual conditions encountered at a small scale (e.g., local variation in substrate leading to differences in local stream slope, width, etc.) have a stochastic component, so input data must be available at an appropriate scale. In this study, we investigated methods for interpolating individual stream habitat variables that could then be used directly, or as inputs into physical models such as PHABSIM. Although the sampling scale, and thus the appropriate interpolation method, may vary across studies, the need for some application of interpolation is a consistent feature of such research.

5. Conclusions

This study produced two main recommendations. First, data should be stratified by meso-habitat type (riffles, runs, and pools) only when habitat variables are different for each meso-habitat type and stratification does not increase distance between points such that interpolation accuracy is significantly reduced. If habitat variables are similar for all meso-habitat types, knowing the meso-habitat type within which a point falls does not add information that will increase interpolation accuracy. Second, the Loess smoother with a smoothing parameter from 0.2 to 0.4 generally produced the most accurate interpolated values and is the method we recommend for similar situations.

Acknowledgement

We would like to thank the Michigan Department of Natural Resources and Michigan State University for funding this study.