Appendix 1.
Methodological details
Data sources
1. Melbourne Water (MW) macroinvertebrate database (unpublished, but
MW are aiming to make the data publicly available: A. Boscence, MW,
personal communication). This database contains the >6000
macroinvertebrate samples collected from 903 sites across the Greater
Melbourne region since 1992. Most samples in the dataset were collected
using rapid bioassessment methods (RBA: Anon. 1994), and only samples
collected using those methods were considered in this study. The primary
information used from the dataset was: unique sample and site codes,
easting, northing, sample date, habitat (RBA samples are collected from
riffles or pool edges), processing method, and the list of taxa collected in
each sample.
2. The Australian hydrologic geospatial fabric (hereafter termed the
geofabric: Bureau of Meteorology 2011). This dataset details spatial
relationships between water bodies across Australia. The surface
network and catchments datasets of the geofabric were delineated from a
national 9-second Digital Elevation Model (DEM), and include associated
look-up tables defining hydrologic and catchment environmental
variables, derived from a stream and nested catchment database (Stein
2006). Each reach of the surface network has a unique segment number
(SEGMENTNO) linking all look-up tables. We used the following
information: mean annual accumulated surface water surplus
(RUNANNMEAN, an approximation of mean annual stream discharge in
the absence of human impacts), a table of monthly accumulated surface
water surplus for each reach, catchment area (CATAREA) and proportion
of catchment area underlain by igneous geology (CAT_IGNEOUS).
3. A 10-m resolution DEM for the Greater Melbourne region, developed
using a combination of contours (rural areas) and LiDAR-based elevation
models (urban areas) for the region (J. Kunapo, personal communication).
4. A land-use dataset for the Greater Melbourne region, compiled by MW
from: a) the Department of Primary Industries’ Victorian land-use map
(Department of Primary Industries 2011), for most of the rural parts of
the region; and b) Victorian planning scheme zones for 2006 (Department
of Planning and Community Development 2010), for the metropolitan
area and two small rural catchments to the east of the region. Following
concerns over the accuracy of these sources, particularly the planning
scheme zones (MW, unpublished data), we checked and corrected the
entire dataset at small scales, by manually outlining forest cover using
2004 aerial imagery (20-cm resolution). This process was conducted over
the entire study area, with particular care taken to correctly classify
forests along drainage lines (Walsh and Webb 2013).
5. A nested dataset of 15,901 stream reaches and catchments in the Greater
Melbourne Region (DCI_All_Catchments—hereafter DCI—layer: Grace
Detailed-GIS Services 2012). This geographical information system (GIS)
layer is derived from a 1–3-m resolution DEM that incorporates flowpaths
through the urban stormwater drainage system and a high-resolution
map of impervious surface cover across the region. The delineation of
reaches and their subcatchments is at a substantially finer resolution than
the geofabric, and this network was used as the primary data source for
delineating catchment boundaries in our assessment of forest cover. We
used the following information from this dataset: unique identifying
codes for each subcatchment (SC_ID) and the next subcatchment
downstream (NextDownID), accumulated total catchment
imperviousness (TI) and attenuated imperviousness (DAI9 = the most
plausible weighted (attenuated) imperviousness measure (AI)
determined by Walsh and Kunapo 2009).
We also used the underlying map of impervious polygons used to create
these data (Impervious_areas_mosaic_with_dci_distances layer: Grace
Detailed-GIS Services 2012), in conjunction with the land-use dataset, in
the calculation of forest cover (see below).
The DCI data did not cover 3 rural catchments in the east of the region
(those with less dense stream networks in Fig. 1). Subcatchments for these
streams were delineated using the 10-m DEM. The only
substantial impervious coverage draining to streams of interest in this
region is in the town of Drouin and the major roads through the
catchments. We estimated proportional impervious cover manually for all
polygons in the land cover dataset, and used this to calculate total
imperviousness (TI) for each subcatchment not covered by the DCI layer.
As Drouin is >1 km from a perennial stream, and formal stormwater
drainage is absent elsewhere in this region, we set AI to zero for these subcatchments.
Compilation of the biological dataset and predictor variables requiring no further
As detailed in the methods section, data from pairs of samples were combined to
produce presence-absence data for sample-pairs (each pair is called a sample
throughout the manuscript). All taxa collected in each pair of samples were
combined into a single presence-absence list and SIGNAL score was calculated
using the grades of EPA Victoria (2003). This compilation process resulted in a
table of 1723 samples, each with a unique sample code, a sample date (the first
date was used for pairs of samples collected on different dates), a site code (572
sites), an easting, a northing, and three variables used as predictors:
1. number of spring samples (nspring: 0, 1 or 2),
2. number of riffle samples (nriff: 0, 1 or 2), and
3. the method by which the samples were sorted (process: field or lab).
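The compilation of sample-pairs described above can be sketched in R as follows. This is a minimal illustration on made-up data, not the authors' code: the object names (units, taxa, grades, pa, samples), the taxa and the grades are all hypothetical, and the SIGNAL score is assumed here to be the unweighted mean grade of the taxa present.

```r
# Two sample units per sample-pair; all values below are hypothetical
units <- data.frame(
  pair    = c("P1", "P1", "P2", "P2"),
  date    = as.Date(c("2005-10-12", "2006-04-03", "2007-03-20", "2007-03-20")),
  habitat = c("riffle", "edge", "edge", "edge"),
  season  = c("spring", "autumn", "autumn", "autumn"),
  stringsAsFactors = FALSE
)
taxa <- list(c("Baetidae", "Chironomidae"), c("Chironomidae", "Leptoceridae"),
             "Chironomidae", c("Chironomidae", "Physidae"))
grades <- c(Baetidae = 5, Chironomidae = 3, Leptoceridae = 6, Physidae = 1)  # hypothetical grades

# Union of the taxa recorded in the two units of each pair
present <- lapply(split(taxa, units$pair), function(tl) sort(unique(unlist(tl))))

# Presence-absence matrix: one row per sample-pair, one column per taxon
all.taxa <- sort(unique(unlist(taxa)))
pa <- t(sapply(present, function(p) as.integer(all.taxa %in% p)))
colnames(pa) <- all.taxa

# Predictors per sample-pair; SIGNAL assumed to be the mean grade of taxa present
samples <- data.frame(
  pair    = names(present),
  nspring = as.vector(tapply(units$season == "spring", units$pair, sum)),
  nriff   = as.vector(tapply(units$habitat == "riffle", units$pair, sum)),
  signal  = sapply(present, function(p) mean(grades[p]))
)
# The first date is used when the two units were collected on different dates
samples$date <- do.call(c, lapply(split(units$date, units$pair), min))
```

Each row of pa and samples then corresponds to one sample as the term is used in the manuscript.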
Two variables were extracted from the geofabric. Each site was mapped to a
reach in the geofabric stream network and assigned its segment number
(SEGMENTNO). Values associated with each SEGMENTNO were then assigned to
each site:
4. mean annual discharge depth (mm), meanQ, calculated as mean
annual accumulated surface water surplus (RUNANNMEAN, from the
"Run" table, an approximation of mean annual stream discharge in the
absence of human impacts) divided by catchment area of the reach
(CATAREA, from the "Terrain" table),
5. proportion of catchment area underlain by igneous geology
(CAT_IGNEOUS from the "substrate" table).
The 10-m-resolution DEM was used to assign:
6. elevation of each site (m).
Three variables were extracted from the DCI dataset. Each site was mapped to a
subcatchment and assigned its SC_ID. Values associated with each SC_ID were
assigned to each site:
7. catchment area (km²): the finer resolution of the DEM used for this
dataset, and its delineation of catchment boundaries based on the
stormwater drainage network, make it a more suitable estimate of
sampling site catchment area than the geofabric estimate. It was,
however, appropriate to use the geofabric estimate to convert mean
annual runoff to a depth, as RUNANNMEAN and CATAREA are
estimated for the same reach, which may not be an exact match for the
sampling site.
8. a) Total imperviousness (as described above), and b) attenuated
imperviousness (as described above).
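The conversion of RUNANNMEAN to a discharge depth (meanQ) by way of the shared SEGMENTNO can be sketched as below. The table contents are invented, and the units are an assumption made for the illustration (RUNANNMEAN in ML per year, CATAREA in km²); with those units the quotient is already in mm per year, but a different unit convention would need a scaling factor.

```r
# Hypothetical extracts of the geofabric look-up tables (assumed units: ML/yr and km2)
run     <- data.frame(SEGMENTNO = c(101, 102), RUNANNMEAN = c(12000, 450))
terrain <- data.frame(SEGMENTNO = c(101, 102), CATAREA = c(40, 3))

# Hypothetical sites, each already mapped to a geofabric reach
sites <- data.frame(site = c("A", "B", "C"), SEGMENTNO = c(102, 101, 102))

# Look up both values for the same reach, so volume and area refer to one catchment
i <- match(sites$SEGMENTNO, run$SEGMENTNO)
j <- match(sites$SEGMENTNO, terrain$SEGMENTNO)

# 1 ML spread over 1 km2 is 1 mm deep, so ML/yr divided by km2 gives mm/yr
sites$meanQ <- run$RUNANNMEAN[i] / terrain$CATAREA[j]
```

Because both values come from the same reach, the depth is internally consistent even where the reach catchment differs from the finer-scale DCI catchment of the sampling site.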
Compilation of variables for optimal weighting searches
a) Antecedent flow
We calculated unweighted and linearly weighted antecedent flow for x months,
where x = 6, 12, 24, 30, 36, 64, and 72. For cases in which the two sample units
constituting a sample were collected on different dates, the date of the first
sample was used. The calculation used the monthly discharge estimates from the
geofabric look-up table "run_mmmyy". Appendix 2A presents R code that was
used to perform the calculation.
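As a small worked illustration of the two schemes, the sketch below computes unweighted and linearly weighted antecedent flow for a toy series. The function name ante and the flow values are hypothetical. The window follows the indexing of the Appendix 2A code as we read it (the n months ending with the month before the sample month), and the weights rise linearly from 1/n for the oldest month to 1 for the most recent.

```r
# flows: monthly discharge in chronological order; i: index of the sample month
ante <- function(flows, i, n, weighted = FALSE) {
  window <- flows[(i - n):(i - 1)]                   # the n months before month i
  w <- if (weighted) seq_len(n) / n else rep(1, n)   # linear weights 1/n, 2/n, ..., 1
  sum(w * window)
}

flows <- c(10, 20, 30, 40, 50, 60)            # hypothetical monthly discharges
ante(flows, i = 6, n = 3)                     # 30 + 40 + 50 = 120
ante(flows, i = 6, n = 3, weighted = TRUE)    # 30/3 + 2*40/3 + 50 = 86.67 (rounded)
```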
b) Forest cover
Calculations for spatial weighting schemes of forest cover required the
compilation of five geospatially consistent, 10-m resolution rasters:
i) flow-distance to stream (dL, m, with streams defined as drainage lines
with >1 km² catchment area), derived from the 10-m-resolution DEM;
ii) flow-distance to bottom of catchment (d2boc, m), derived from the 10-m-resolution DEM;
iii) land-use (luno), with each gridcell one of 13 land use codes, including
forest cover, derived by rasterizing the land-use polygon dataset;
iv) impervious, with each gridcell either 1 (impervious) or 0 (not), derived by
rasterizing the impervious polygon map;
v) subcatID, with each gridcell given the value of the SC_ID in which it lay,
derived from the DCI layer.
Rasters were manipulated as vectors in R. Each vector was 2.74 × 10^8 long, and
was first reduced to a length of 1.33 × 10^8 by omitting missing values. Rasters
were later reconstituted for mapping by saving the indices of missing values.
The luno raster was amended to include impervious surfaces as a 14th land use.
All gridcells for which impervious = 1 were given the value of 14 in luno. This
was done because the impervious polygon layer was more accurate than
the land use polygon layer, and to ensure that there was no spurious
classification of gridcells as being both impervious and forest.
The resulting four rasters (dL, d2boc, luno and subcatID) were saved as 4 vectors
for subsequent manipulations, as this format is computationally quicker than
joining them in a table.
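The vector handling described above might look like the following on a toy 3 × 4 raster. This is only a sketch: the land-use codes and object names other than luno and impervious are hypothetical, and a logical mask of missing cells stands in for the saved indices.

```r
# Toy rasters flattened to vectors (NA = gridcell outside the study area)
luno       <- c(NA, 3, 3, 7, 7, NA, 1, 1, 3, NA, 7, 1)
impervious <- c(NA, 0, 1, 0, 1, NA, 0, 0, 0, NA, 1, 0)

# Record which gridcells are missing, then drop them to shorten the vectors
miss   <- is.na(luno)
luno.v <- luno[!miss]
imp.v  <- impervious[!miss]

# Impervious gridcells become a 14th land use, overriding the land-use polygon code
luno.v[imp.v == 1] <- 14

# Reconstitute a full-length vector (and a matrix) for mapping
luno.full <- rep(NA_real_, length(luno))
luno.full[!miss] <- luno.v
luno.map <- matrix(luno.full, nrow = 3)
```

Keeping the mask means every shortened vector can be mapped back to the same grid positions, provided all rasters share the same extent and missing cells.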
To associate the raster data with each of the 572 sampled sites:
i) We calculated the river distance upstream of the subcatchment outlet
(dus.sc) for the SC_ID associated with each site. For almost all urban sites,
the site was located at the bottom of the subcatchment (dus.sc = 0), but
this was not the case for many rural sites.
ii) We used the NextDownID field in the DCI data to compile a list of vectors
listing all upstream SC_IDs for every SC_ID in the network. See Appendix
2B for the R code used to construct this list.
iii) For each SC_ID in the network, we built a table of gridcells, by compiling
the values of dL, d2boc and luno for every gridcell with subcatID equal to
that SC_ID.
iv) The smallest d2boc value for each SC_ID was assigned to the DCI table as
subc.d2boc (i.e. the distance of the subcatchment outlet to the bottom of
the catchment).
v) For each of the 572 sites, a table of upstream gridcells was constructed by:
a) Copying the gridcell table for the SC_ID associated with the site, and
removing all gridcells for which (d2boc - dL) < (dus.sc + subc.d2boc) (i.e.
removing all gridcells that fall downstream of the sampling site).
b) Appending to that table gridcell tables for upstream SC_IDs, but only
those for which gridcell weightings will be significantly influential (to
reduce computing intensity for large catchments).
The criterion used for assessing which SC_ID tables to include depended on the
weighting model to be used, so this final step was conducted separately for each
weighting scheme.
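Step v(a), the removal of gridcells that drain to the stream downstream of the sampling site, can be illustrated with a toy gridcell table. All distances are invented, and dus.sc is used here as the name for the river distance from the subcatchment outlet up to the site, matching the term that appears in the inclusion criteria.

```r
# Gridcell table for the site's own subcatchment (distances in m, hypothetical)
cells <- data.frame(
  d2boc = c(5200, 5600, 6100, 6400),  # flow distance to the bottom of the catchment
  dL    = c( 100,  300,  200,  500),  # flow distance to the nearest stream
  luno  = c(   3,    1,    3,   14)   # land-use code
)
subc.d2boc <- 5000   # d2boc at the subcatchment outlet
dus.sc     <- 400    # river distance from the outlet up to the sampling site

# (d2boc - dL) is the d2boc of the point where a gridcell's flow path meets the stream
joins.at <- cells$d2boc - cells$dL

# Keep only gridcells whose flow joins the stream at or upstream of the site
upstream <- cells[joins.at >= dus.sc + subc.d2boc, ]
```

In this example the first two gridcells reach the stream below the site and are dropped, leaving the last two.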
As described in the methods, 3 distance-weighting models were applied to each
gridcell classified as forest cover:
1) exponential decay, f(d) = exp(-∝d)
2) linear decay to zero, f(d) = max(1 - ∝d, 0)
3) threshold, if d ≤ ∝, f(d) = 1, else f(d) = 0
where d = the flowpath distance (either to-stream or in-stream) and ∝ is the
weighting parameter. The decision
on inclusion of upstream SC_ID tables was based on in-stream flow distance only.
For threshold weighting, the gridcell table for SC_ID i was only included in the
upstream gridcell table for site x if:
(subc.d2boc_i - dus.sc_x - subc.d2boc_x) < ∝ (i.e. f(d) > 0)
For linear decay to zero, the criterion was:
(subc.d2boc_i - dus.sc_x - subc.d2boc_x) < 1/∝ (i.e. f(d) > 0)
For exponential decay, the criterion was:
(subc.d2boc_i - dus.sc_x - subc.d2boc_x) < 6.91/∝ (i.e. f(d) > 0.001)
This step excluded SC_IDs in which f(d) of all gridcells = 0 for threshold and
linear weighting and those in which f(d) of all gridcells < 0.001 for exponential
weighting.
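The three weighting models and their cut-off distances can be written as short R functions. This is a sketch under assumptions rather than the authors' code: the function names and parameter values are hypothetical, and the functional forms are those implied by the inclusion criteria (cut-offs of ∝, 1/∝ and 6.91/∝ for threshold, linear and exponential weighting respectively).

```r
# d: flowpath distance (m); a: weighting parameter
w.exp    <- function(d, a) exp(-a * d)           # exponential decay
w.linear <- function(d, a) pmax(1 - a * d, 0)    # linear decay to zero
w.thresh <- function(d, a) as.numeric(d <= a)    # threshold (a is a distance here)

a <- 0.01                   # hypothetical parameter for the two decay models (per m)
cut.linear <- 1 / a         # distance at which the linear weight reaches zero: 100 m
cut.exp    <- log(1000) / a # distance at which the exponential weight equals 0.001

# log(1000) is about 6.91, which is where the 6.91/a criterion comes from
w.exp(cut.exp, a)
w.linear(cut.linear, a)
w.thresh(c(50, 150), a = 100)
```

Subcatchments whose outlets lie beyond the relevant cut-off contribute nothing (or less than 0.001 per gridcell), which is why they can be skipped when assembling the upstream gridcell table.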
Once each upstream gridcell table was compiled, the in-stream flow distance to
the sampling site x for each gridcell i was calculated as:
dW_i = d2boc_i - dL_i - subc.d2boc_x - dus.sc_x
For each weighting model and each combination of ∝W and ∝L (the weighting
parameters applied to dW and dL, respectively), cumulative
distance-weighted forest cover, F(∝W, ∝L), was calculated for each site using the
dW and dL values from its upstream gridcell table as:
F(∝W, ∝L) = Σ_i [C_i f(dW_i, ∝W) f(dL_i, ∝L)] / Σ_i [f(dW_i, ∝W) f(dL_i, ∝L)]
where C_i = 1 if luno = forest, 0 if not; and f is one of the three weighting functions.
F was calculated for a wide range of ∝W and ∝L values, as described in the next section.
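A minimal R sketch of this calculation for a single site follows, using exponential weighting. The gridcell values, the forest land-use code (3), the function name forest.cover and the parameter names a.W and a.L (the parameters applied to in-stream and to-stream distance) are assumptions made for the illustration.

```r
w.exp <- function(d, a) exp(-a * d)   # exponential decay weighting

# Hypothetical upstream gridcell table for one site (distances in m)
up <- data.frame(
  d2boc = c(6100, 6400, 7000),
  dL    = c( 200,  500,    0),
  luno  = c(   3,   14,    3)
)
subc.d2boc <- 5000   # d2boc at the outlet of the site's subcatchment
dus.sc     <- 400    # river distance from that outlet up to the site
forest     <- 3      # hypothetical land-use code for forest

forest.cover <- function(up, a.W, a.L) {
  # In-stream flow distance from each gridcell's stream entry point to the site
  dW <- up$d2boc - up$dL - subc.d2boc - dus.sc
  C  <- as.numeric(up$luno == forest)
  w  <- w.exp(dW, a.W) * w.exp(up$dL, a.L)   # product of the two distance weights
  sum(C * w) / sum(w)
}

forest.cover(up, a.W = 0, a.L = 0)          # no decay: plain proportion of forest gridcells
forest.cover(up, a.W = 0.001, a.L = 0.01)   # forest near the stream and site counts for more
```

With both parameters at zero every weight is 1, so the function returns the unweighted proportion of forest gridcells, which is a convenient check on an implementation.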
Appendix 2. R scripts used in analyses
A. R code for calculating antecedent flow of different weighting schemes
antecedent.flow <- function(x, n.months, weight = FALSE, ...)
{
  #x = a vector consisting of an index value identifying the month
  #of the sample date, followed by a vector of monthly discharge values
  #ordered chronologically, as in the geofabric look-up table "run_mmmyy"
  #n.months = the number of months to be used in the calculation
  #weight: is the antecedent flow to be weighted?
  #... absorbs any extra arguments (e.g. na.rm) passed through apply; they are not used
  if(weight)
    sum(seq(1/n.months,1,1/n.months)*x[(x[1] - n.months + 1):(x[1])])
  else
    sum(x[(x[1] - n.months + 1):(x[1])])
}
#to use the function, create a data.frame with columns and names equal to the full series of monthly dates in
#the geofabric look-up table "run_mmmyy", and rows matching the list of SEGMENTNOs in the list of samples
#for which antecedent flow is to be calculated.
#In this example, cres = the look-up table "run_mmmyy" as a data.frame, and tm = the data.frame of samples,
#with columns:
#"segmentno" (corresponding to SEGMENTNO values in cres)
#"flowmonth" (which identifies the month of the sample in the same form as the field names of the
#monthly discharge fields in cres: i.e. run_mmmyy)
#"catarea" (CATAREA from the geofabric look-up table "Terrain")
#"meanQ" (calculated from the geofabric as described above)
tcres <- cres[match(tm$segmentno, cres$segmentno),c(grep("run", names(cres)))]
#field names in the run_mmmyy being non-numeric, need to be manually ordered chronologically
flowmonth.order <- data.frame(flowmonth = names(tcres[-1]),
year = as.numeric(substr(names(tcres[-1]),4,7)),
month = as.numeric(substr(names(tcres[-1]),9,10)))
flowmonth.order <- flowmonth.order[order(flowmonth.order$year + flowmonth.order$month/12),]
#add a segmentno field to the start of the tcres data.frame
tcres <- cbind(cres$segmentno[match(tm$segmentno, cres$segmentno)],
tcres[,match(flowmonth.order$flowmonth, names(tcres))])
names(tcres)[1] <- "segmentno"
#determine the index of the month of the sampling date
tcres.index <- match(tm$flowmonth, names(tcres)[-1])
#use the apply function to calculate antecedent flow for all samples in tm in one step: first unweighted
#48-month antecedent flow
#(the name of the month-count variable is missing from the source; nmon is a placeholder)
nmon <- 48
tm$anteflow48 <- 12*apply(cbind(tcres.index, tcres[-1]), 1, FUN = antecedent.flow, n.months = nmon,
  na.rm = TRUE)/((nmon*tm$catarea*tm$meanQ))
#then weighted antecedent flow
tm$anteflow48w <- 2*12*apply(cbind(tcres.index, tcres[-1]), 1, FUN = antecedent.flow, n.months = nmon,
  weight = TRUE, na.rm = TRUE)/((nmon*tm$catarea*tm$meanQ))
B. R code for deriving a list of vectors listing all upstream SC_IDs for each SC_ID.
allupstream <- function(hierarchy,catchname){
  #Function that uses the nextds field in a table 'hierarchy' to extract all 'subcatID's upstream of a site
  if(catchname %in% hierarchy$subcatID)
  {
    catchname <- as.vector(catchname)
    hierarchy$subcatID <- as.vector(hierarchy$subcatID)
    hierarchy$nextds <- as.vector(hierarchy$nextds)
    allsc <- as.vector(hierarchy$subcatID[hierarchy$nextds==catchname])
    allsc <- allsc[!is.na(allsc)]
    #subcatchments immediately upstream
    nbrnch <- end <- length(allsc)
    #number of branches immediately upstream
    start <- 1
    #step upstream one level at a time until no new branches are found
    while(nbrnch > 0)
    {
      for(i in start:end)
      {
        allsc <- c(allsc,as.vector(hierarchy$subcatID[hierarchy$nextds==allsc[i]]))
        allsc <- allsc[!is.na(allsc)]
      }
      start <- end + 1
      end <- length(allsc)
      nbrnch <- end - (start - 1)
    }
    allsc <- c(catchname,allsc)
    allsc
  } else
    cat(paste(catchname,"is not a subcatID listed in the hierarchy table","\n"))
}
#allsubcs is a data.frame loaded from the DCI table with fields subcatID and nextds (= SC_ID and
#NextDownID, respectively, both from the DCI table)
#(the name of the list object is missing from the source; allus.list is a placeholder)
allus.list <- list()
for(i in 1:length(allsubcs$subcatID))
{
  allus.list[[i]] <- allupstream(allsubcs, allsubcs$subcatID[i])
}
names(allus.list) <- allsubcs$subcatID
#the resulting list, allus.list, contains 15,901 vectors, each containing all SC_ID values upstream of
#(and including) the SC_ID identified as the name of each vector
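To show what such a list looks like, here is a compact, self-contained sketch on a toy five-subcatchment network. It is a simplified stand-in rather than the function above: upstream.of, hier and us.list are hypothetical names, and the traversal assumes the network has no loops.

```r
# Toy hierarchy: subcatchments 1 and 2 drain to 3; 3 and 4 drain to 5 (the outlet)
hier <- data.frame(subcatID = c(1, 2, 3, 4, 5), nextds = c(3, 3, 5, 5, NA))

upstream.of <- function(hier, id) {
  found    <- id   # the subcatchment itself is included
  frontier <- id   # subcatchments whose immediate upstream neighbours are still to be found
  while (length(frontier) > 0) {
    frontier <- hier$subcatID[!is.na(hier$nextds) & hier$nextds %in% frontier]
    found    <- c(found, frontier)
  }
  found
}

# One vector of upstream IDs per subcatchment, named by subcatchment ID
us.list <- lapply(hier$subcatID, function(id) upstream.of(hier, id))
names(us.list) <- hier$subcatID
```

Here us.list[["5"]] holds all five subcatchments, us.list[["3"]] holds 3, 1 and 2, and headwater subcatchments such as 1 hold only themselves.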
References
Anon. (1994) National River Processes and Management Program. Monitoring
River Health Initiative. River Bioassessment Manual. Version 1.0. Department of
the Environment, Sport and Territories; Land and Water Resources Research
and Development Corporation; Commonwealth Environment Protection
Authority, Canberra
Bureau of Meteorology (2011) Australian hydrological geospatial fabric
(geofabric) product guide. Version 2.0 – November 2011. Australian Government,
Bureau of Meteorology, Canberra
Department of Planning and Community Development (2010) Planning schemes
online: Victoria's planning schemes. The State of Victoria, Melbourne. Available
from (accessed January 2013)
Department of Primary Industries (2011) Victorian resources online: land use.
State of Victoria, Melbourne. Available from (accessed
January 2013)
EPA Victoria (2003) Rapid bioassessment methodology for rivers and streams.
Guideline for environmental management, Publication No. 604.1. Environment
Protection Authority Victoria, Melbourne, Australia
Grace Detailed-GIS Services (2012) Directly connected imperviousness
compilation for Melbourne Water selected catchments. Report prepared for
Melbourne Water. Melbourne
Stein JL (2006) A continental landscape framework for systematic conservation
planning for Australian rivers and streams. PhD, Australian National University
Walsh CJ, Kunapo J (2009) The importance of upland flow paths in determining
urban effects on stream ecosystems. J. N. Am. Benthol. Soc. 28(4):977–990
Walsh CJ, Webb JA (2013) Predicting stream macroinvertebrate assemblage
composition as a function of land use, physiography and climate: a guide for
strategic planning for river and water management in the Melbourne Water
region. Melbourne Waterway Protection and Restoration Science-Practice
Partnership Report 13-1. Department of Resource Management and Geography,
The University of Melbourne, Melbourne
