Pelagic Regionalisation: DRAFT Classification

Background
At the October 2004 meeting of the Bioregionalisation Working Group, CSIRO
presented preliminary classification results for Levels 1, 2 and 3 of the pelagic
environment using physical and chemical variables. Three key issues that
arose in the discussion of the results were:
1. The need to analyse and illustrate the 3D structure – in particular
depth variations of classified regions.
2. Analysis and depiction of the variability in the boundary of regions
due to seasonal shifts.
3. The classification at the Ocean Basin scale was deemed to be too
broadscale in extent, and without support of biological corroboration
it was to be described qualitatively.
To address the concerns regarding the 3D structure in a consistent manner,
and considering the extreme variation in structure evident in the ocean, an
attempt was made to extend Rick Smith’s depth-layered analyses, presented
at the last BWG workshop, to 3D. By implication, a depth-layered analysis
requires a-priori selection of depth layers or strategies for dealing with depth
variations in an aggregate analysis. This selection in turn will influence the
outcome of the classification analysis (and the subsequent analysis of the
depth structuring). Other considerations which guided our analyses included:
1. The volume of the data. Each variable was available as a NetCDF file
comprising 56 depth levels at the following depths (in metres):
0 10 20 30 40 50 60 70 75 80 90 100 110 125 150 175
200 225 250 275 300 350 400 450 500 550 600 650 700 750
800 850 900 950 1000 1100 1200 1300 1400 1500 1600 1750 2000
2250 2500 2750 3000 3250 3500 3750 4000 4250 4500 4750 5000
5500
Within each depth level, there were 601 grid points in latitude spanning
0-60°S and 901 grid points in longitude spanning 90-180°E, and each layer
had fields for mean and annual amplitude.
Even with the good capabilities of our current research computing
systems, it was not feasible to analyse such volumes of data
concurrently using available statistical software.
2. Variables available for analysis: temperature, salinity, nitrate,
oxygen, and silicate were of varying (underlying) spatial resolution and
there was high correlation amongst the variables. Visual analysis of the
data shows that silicate is strongly influenced by land masses and by
current systems that impinge on them. On this basis silicate was not
included in the analysis. Nitrate was highly correlated with oxygen and
other physical variables but the underlying resolution and/or variability
of nitrate caused the resultant distributions to contain what appeared to
be spatial artefacts from sampling. Thus, it was also excluded from
analysis. Oxygen likewise was of poor but acceptable resolution (as it
is now a relatively standard oceanographic measurement) and it was
retained with the standard variables of temperature and salinity.
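The data volume described in item 1 can be estimated with simple arithmetic. The sketch below is illustrative only: the grid dimensions come from the text, but the 8 bytes per value is our assumption (double-precision storage), and land cells are counted, so the result is an upper bound rather than a measured file size.

```python
# Grid dimensions quoted in the text.
N_DEPTHS = 56
N_LAT = 601
N_LON = 901
N_FIELDS = 2       # mean and annual amplitude
N_VARIABLES = 5    # temperature, salinity, nitrate, oxygen, silicate

BYTES_PER_VALUE = 8  # assumption: values held as double precision in memory


def grid_values(n_depths, n_lat=N_LAT, n_lon=N_LON):
    """Number of grid cells (land included) in a stack of depth layers."""
    return n_depths * n_lat * n_lon


cells_per_field = grid_values(N_DEPTHS)
total_values = cells_per_field * N_FIELDS * N_VARIABLES
total_gib = total_values * BYTES_PER_VALUE / 2**30

print(f"cells per field: {cells_per_field:,}")
print(f"values over all variables and fields: {total_values:,}")
print(f"approximate in-memory size: {total_gib:.2f} GiB")
```

Under these assumptions the full set of variables holds roughly 300 million values, which illustrates why a concurrent analysis was impractical.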
Analysis Strategy
Mean Analyses
To cope with the volume of data, a qualitative stratification was implemented
to subsample the data in latitude/longitude/depth space. The 56 levels of
depth were reduced down to 8 comprising:
0m 100m 250m 500m 1000m 2000m 3000m 4000m
These were assumed to capture the bulk of the variability expected to be
important at the Level 1/2 classification scale (broad zonal expanses of water
masses). For each variable (temperature, salinity, oxygen), data for all grid
cells at these levels were extracted, excluding those masked as land. These
data comprised the fundamental input for analysis.
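The depth subsampling above amounts to picking 8 of the 56 layer indices. A minimal sketch follows. The depth values are those listed in this report, while the names `DEPTHS_M`, `SELECTED_M` and `level_indices` are hypothetical and introduced only for illustration.

```python
# The 56 depth levels (metres) listed in the Background section.
DEPTHS_M = [
    0, 10, 20, 30, 40, 50, 60, 70, 75, 80, 90, 100, 110, 125, 150, 175,
    200, 225, 250, 275, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750,
    800, 850, 900, 950, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1750, 2000,
    2250, 2500, 2750, 3000, 3250, 3500, 3750, 4000, 4250, 4500, 4750, 5000,
    5500,
]

# The 8 levels retained for the mean analyses.
SELECTED_M = [0, 100, 250, 500, 1000, 2000, 3000, 4000]


def level_indices(all_depths, wanted):
    """Return the position of each wanted depth in the full depth list."""
    index_of = {depth: i for i, depth in enumerate(all_depths)}
    missing = [depth for depth in wanted if depth not in index_of]
    if missing:
        raise ValueError(f"depths not on the grid: {missing}")
    return [index_of[depth] for depth in wanted]


indices = level_indices(DEPTHS_M, SELECTED_M)
print(indices)
print(f"layers retained: {len(indices)} of {len(DEPTHS_M)}")
```

The resulting indices could then be used to slice the depth axis of each variable before land-masked cells are dropped.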
Given the volume of data, the classification algorithm chosen was the large
array algorithm “clara” (Clustering Large Applications), which uses subsamples
to build up its classes. As reported in past workshops, in order to neutralise
the adverse impact of frontal regions, the data were histogram-scaled. In
order to select an appropriate number of clusters, an initial run was conducted
with the number of clusters ranging from 2 to 35 (using a sample size of 500 and 2
samples). Silhouette plots were then examined to analyse the distinctiveness
of clusters. These showed local maxima of the mean silhouette value at 5 and
25 clusters. The 5-cluster solution gave very broad-scale ocean-basin
classes, so subsequent analyses concentrated on the 25-cluster solution.
Using a sample size of 1000, 10 samples were used in the clara
classification, which was then used to model the entire dataset with a
nearest-neighbour class selection procedure. This involved randomly selecting 100
training data records for each class which were then used in the fitting
process.
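The subsample-then-extend strategy described above can be sketched in a few lines. This is a simplified illustration rather than the clara implementation used for this report: the swap search is a basic k-medoids heuristic, the data are synthetic, and all function names and parameter values (3 clusters, 5 training records per class) are assumptions chosen to keep the example small.

```python
import math
import random


def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)


def pam(sample, k, rng):
    """Swap-based k-medoids on a small sample (a simplified PAM)."""
    medoids = rng.sample(sample, k)
    best = total_cost(sample, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for candidate in sample:
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                cost = total_cost(sample, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return medoids


def clara(points, k, sample_size, n_samples, seed=0):
    """Cluster several random subsamples; keep the medoids that give the
    lowest cost when measured on the full dataset."""
    rng = random.Random(seed)
    best_medoids, best_cost = None, math.inf
    for _ in range(n_samples):
        sample = rng.sample(points, min(sample_size, len(points)))
        medoids = pam(sample, k, rng)
        cost = total_cost(points, medoids)
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    return best_medoids


def nearest_medoid(points, medoids):
    """Class label = index of the closest medoid."""
    return [min(range(len(medoids)), key=lambda j: math.dist(p, medoids[j]))
            for p in points]


def pick_training(points, labels, per_class, rng):
    """Randomly choose up to `per_class` records from each class."""
    chosen_points, chosen_labels = [], []
    for c in sorted(set(labels)):
        members = [p for p, lab in zip(points, labels) if lab == c]
        for p in rng.sample(members, min(per_class, len(members))):
            chosen_points.append(p)
            chosen_labels.append(c)
    return chosen_points, chosen_labels


def one_nn(points, train_points, train_labels):
    """Label each point with the class of its nearest training record."""
    return [train_labels[min(range(len(train_points)),
                             key=lambda j: math.dist(p, train_points[j]))]
            for p in points]


# Synthetic toy data: three well-separated blobs of 30 points each.
rng = random.Random(1)
centres = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]
points = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
          for cx, cy in centres for _ in range(30)]

medoids = clara(points, k=3, sample_size=45, n_samples=3)
labels = nearest_medoid(points, medoids)
train_p, train_l = pick_training(points, labels, per_class=5, rng=rng)
nn_labels = one_nn(points, train_p, train_l)
print(sorted(set(nn_labels)))
```

The appeal of this design is that the expensive medoid search only ever sees a small sample, while the nearest-neighbour pass over the full dataset is a single linear scan against a small training set.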
Variability Analyses
Work on this aspect is still in progress. A number of preliminary analyses
conducted to date on the annual-amplitude data (a measure of
seasonal variability) show very patchy spatial patterns, which may
reflect the inherent sampling variability. Coherent large-scale patterns are
visible off eastern Australia, in the tropics and in the Southern Ocean, but
silhouette analyses do not show any obvious groupings apart from those at 2-4
classes. An example analysis for 10 classes, which typifies the problem, is
shown below.
Figure 1 Classification of annual amplitude for temperature and salinity using 10
classes.
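For readers unfamiliar with the silhouette diagnostic referred to in these analyses, a minimal sketch of the mean silhouette value is given here. It uses the standard definition (for each point, a is the mean distance to its own class, b is the lowest mean distance to any other class, and the score is (b - a) / max(a, b)). The function name and the four-point toy dataset are illustrative assumptions, not data from this study.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette value over all points (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two classes")

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for single-member classes
            continue
        # a: mean distance to the other members of the point's own class
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: lowest mean distance to the members of any other class
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy example: two tight pairs of points, 10 units apart.
toy = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = mean_silhouette(toy, [0, 0, 1, 1])  # classes match the pairs
bad = mean_silhouette(toy, [0, 1, 0, 1])   # classes cut across the pairs
print(f"good grouping: {good:.3f}, poor grouping: {bad:.3f}")
```

Values near 1 indicate compact, well-separated classes; values near or below 0 indicate classes that overlap or cut across the structure in the data, which is the pattern one would expect when no number of classes stands out.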
Results
In conjunction with this report, a zip file is included showing sections through
the classification at various depths and along longitude sections.
From a management perspective, a considerable volume of information
needs to be assimilated, but this simply reflects the nature of the problem
at hand. Aggregations of the information may be possible depending on the
intended use and, at this stage, it would be useful if the BWG could provide
some guidance on future work.