Lagrangian analysis by clustering · Joseph H. LaCasce Inga Monika Koszalka

Ocean Dynamics
DOI 10.1007/s10236-010-0306-2
Lagrangian analysis by clustering
Inga Monika Koszalka · Joseph H. LaCasce
Received: 15 February 2010 / Accepted: 21 May 2010
© Springer-Verlag 2010
Abstract We propose a new method for obtaining average velocities and eddy diffusivities from Lagrangian
data. Rather than grouping the drifter-derived velocities in geographical bins, we group them by nearest-neighbor distance using a clustering algorithm. This
yields sets with approximately the same number of
observations, covering unequal areas. A major advantage is that, because the number of observations is the
same for the clusters, the statistical accuracy is more
uniform than with geographical bins. We illustrate the
technique using synthetic data from a stochastic model,
employing a realistic mean flow. The latter represents
the surface currents in the Nordic Seas and is strongly
inhomogeneous in space. We use the clustering algorithm to extract the mean velocities and diffusivities
and compare the results with the corresponding quantities from the stochastic model. We perform a similar
comparison with the means and diffusivities obtained
with geographical bins. Clustering is more successful
at capturing the mean flow and improves convergence
in the eddy diffusivity estimates. We discuss both the
advantages and shortcomings of the new method.
Keywords Lagrangian analysis · Eddy diffusivity ·
Binning · Clustering
Responsible Editor: John Grue
I. M. Koszalka (B) · J. H. LaCasce
Department of Geosciences, University of Oslo,
P.O. Box 1022, Blindern, 0315 Oslo, Norway
e-mail: inga.koszalka@geo.uio.no
1 Introduction
Lagrangian instruments, surface drifters and subsurface
floats, are widely used for measuring oceanic velocities.
Their increased use in recent decades has resulted in
coverage over large parts of the world oceans (e.g.,
http://www.aoml.noaa.gov/phod/dac/gdp.html). Given
the amount of data being generated, it is important to
continually improve our analysis techniques, to extract
as much information as possible from that data.
There is a wide range of Lagrangian data analysis
techniques (LaCasce 2008). The most common technique involves estimating Eulerian mean velocities and
diffusivities. With these quantities, one can write an
advection-diffusion equation describing the evolution
of a tracer (Davis 1991):
$$ \frac{\partial \theta}{\partial t} + U \cdot \nabla \theta = \nabla \cdot \left( K \nabla \theta \right) \tag{1} $$
Lagrangian data can be used to determine U and K, the
time-mean velocity and the eddy diffusivity tensor, both
of which can vary in space.
The method for calculating U and K is described by
Davis (1991). Consider a data set covering a certain
region. The drifter trajectories are used to calculate
velocities along the drifter paths, by differencing. Then
these velocities are grouped in geographical bins of a
specified size to estimate the mean velocities in the
bins (Fig. 1a). The means pertain to the period spanned
by the data set. One assumes that the sampling in the
bins is sufficient to capture the actual Eulerian means
and that the statistics are stationary over this period.
Examples of such calculations are found in Rossby et al.
(1983), Owens (1991), Poulain et al. (1996), Swenson
and Niiler (1996), and Fratantoni (2001).
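The bin-averaging step described above can be sketched in a few lines. This is a minimal illustration, not the procedure of any of the cited studies: the function name `bin_mean_velocities`, the default bin size, and the sample values are all assumptions made for the example.

```python
import math
from collections import defaultdict

def bin_mean_velocities(lon, lat, u, v, dlon=2.0, dlat=1.0):
    """Average velocity observations in fixed (dlon x dlat) degree bins.

    Returns {(i, j): (mean_u, mean_v, n)}, where (i, j) indexes the bin whose
    south-west corner is at (i * dlon, j * dlat).
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for x, y, uu, vv in zip(lon, lat, u, v):
        key = (math.floor(x / dlon), math.floor(y / dlat))
        acc = sums[key]
        acc[0] += uu
        acc[1] += vv
        acc[2] += 1
    return {k: (su / n, sv / n, n) for k, (su, sv, n) in sums.items()}

# Four made-up observations: three fall in the bin with corner (0E, 64N),
# one falls in the bin with corner (2E, 64N).
means = bin_mean_velocities(
    lon=[0.5, 1.5, 1.9, 2.1],
    lat=[64.2, 64.8, 64.5, 64.5],
    u=[10.0, 20.0, 30.0, 5.0],
    v=[0.0, 0.0, 3.0, 1.0],
)
```

The returned count `n` differs from bin to bin, which is the source of the uneven statistical accuracy discussed in the text.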
Fig. 1 a A sketch showing Lagrangian observations grouped in
geographical bins. b Lagrangian data partitioned by the clustering
algorithm under the constraint of a prescribed number of members in a cluster
The diffusivity calculation stems from that of Taylor
(1921). For example, in the zonal direction, this is:
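$$ \kappa_{xx}(t) \equiv \frac{1}{2} \frac{d}{dt} \langle x_L^2(t) \rangle = \langle x_L(t)\, u_L(t) \rangle = \int_0^t \langle u_L(t)\, u_L(\tau) \rangle \, d\tau = \int_0^t P_{xx}(\tau) \, d\tau \tag{2} $$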
where xL is Lagrangian displacement, uL the Lagrangian
velocity, and P(τ ) the time-lagged Lagrangian velocity
covariance. Davis (1991) allows for the diffusivity to
also vary in space. To calculate this, one replaces the
velocities above with “residual velocities”, those with
the mean removed, and the same with the displacements. The diffusivities are obtained for each bin as
averages over all trajectories in the bin. As such, the
diffusivity is a mixed Eulerian–Lagrangian measure.
It is Lagrangian because it involves integrating along
particle paths, but it is Eulerian because the integral
occurs for drifters in a specified area and because it
involves subtracting the Eulerian mean.
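The last form of Eq. 2, the integral of the lagged velocity covariance, can be sketched numerically as follows. This is an illustrative sketch only: the function name, the array layout, and the use of a trapezoidal rule are assumptions, and the covariance is averaged over all start times in each track (i.e., stationarity is assumed).

```python
import numpy as np

def diffusivity_from_autocovariance(u_resid, dt):
    """Estimate kappa(t) by integrating the lagged covariance of residual velocities.

    u_resid : array (n_tracks, n_times) of velocities with the mean removed
    dt      : sampling interval
    Returns (lags, P, kappa), with kappa[k] the integral of P from 0 to lags[k].
    """
    u = np.asarray(u_resid, dtype=float)
    n_tracks, n_times = u.shape
    P = np.empty(n_times)
    for k in range(n_times):
        # Average u(t) u(t + k dt) over all start times and all tracks.
        P[k] = np.mean(u[:, : n_times - k] * u[:, k:])
    lags = dt * np.arange(n_times)
    # Cumulative trapezoidal integration of P over lag.
    kappa = np.concatenate(([0.0], np.cumsum(0.5 * (P[1:] + P[:-1]) * dt)))
    return lags, P, kappa
```

Note that the number of products contributing to `P[k]` shrinks as the lag grows, so the estimate becomes noisier at long lags.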
There are a number of practical issues with regard
to binning (e.g., Mariano and Ryan 2007). One concerns the bin size. The bins should be small enough
to resolve the mean flow but larger than the scale of
the energy-containing eddies. They should also be large
enough to yield a statistically significant estimate. The
latter necessarily varies between bins, as the amount of
data in each bin varies. Such variations can lead to bias
errors (Davis 1991).
The diffusivities are similarly affected by the bin size.
We assume that the diffusivity converges at long times,
i.e., κ(x, t) → κ^∞(x) as t → ∞. However, the integration time in Eq. 2 depends on the time a drifter spends
in the bin, and this will generally differ between individual drifters in the same bin. As such, the mean autocorrelation derives from segments of differing lengths,
and this can affect the convergence of the integral (see
below). Using larger bins improves this, by allowing for
longer individual segments, but some tracks will always
be shorter than others.
The binning technique has been widely applied to
ocean data, and different bin sizes and even different
bin shapes and orientations have been explored (e.g.,
Swenson and Niiler 1996; Falco et al. 2000; Poulain
2001; Jakobsen et al. 2003; Lumpkin and Garraffo
2005; Davis 1998; Thompson et al. 2009). Improvements such as fitting the binned velocities with cubic
splines (Bauer et al. 2002), using different sized bins
for the means and diffusivities (e.g., Poulain et al. 1996;
Swenson and Niiler 1996), using different asymptotic
limits for the diffusivity integration (e.g., Poulain et al.
1996; Brink et al. 2000; Thompson et al. 2009), and
using different equivalent formulations to Eq. 2 (e.g.,
Colin de Verdiere 1983; Zhurbas and Oh 2003) have all
been explored.
Hereafter, we examine an alternate idea. Rather
than grouping the velocities in bins of fixed size, we
group a specified number of nearest-neighbor realizations together (Fig. 1b) using a clustering algorithm.
Such algorithms are used in diverse fields, such as data
mining, image processing, and bioinformatics (Lloyd
1982; Kanungo et al. 2002; MacKay 2003). Specifying
the number of members in the cluster then determines
the number and spatial extent of the clusters for the
whole data set.
The resulting mean velocities are on a nonuniform
grid. However, the coverage is determined by the data;
we do not obtain estimates where there are few or no
measurements. A major advantage though is that there
are approximately the same number of realizations in
each cluster. As such, the standard error will depend
only on the standard deviation of the velocity rather
than also depending on the number of observations in
the bin.
The calculation of the diffusivities also differs. First,
we evaluate the velocity autocorrelation with Eq. 2 for
a chosen fixed period of time. We assign a position to
each autocorrelation (the midpoint along the trajectory
segment) and then cluster those positions. We then
average the autocorrelations in the cluster, with each
cluster having a prescribed number of segments. The
average is then integrated over the time interval equal
to the segment length to obtain an estimate of κ^∞(x).
The length and number of contributing trajectories are
thus the same, and these values can be adjusted to
improve convergence.
The method of calculating the diffusivity is similar to
that used previously by Garraffo et al. (2001), Lumpkin
and Flament (2001), Lumpkin et al. (2002), and Rupolo
(2007). These authors also used trajectory segments of
a fixed length in calculating the diffusivity. In contrast
though, most used mean velocities from individual trajectories rather than the interpolated Eulerian means
estimated from the entire data set. And their estimates
were grouped into geographical bins, yielding different
numbers of data points in each bin.
We illustrate the clustering method hereafter using
synthetic trajectories. The latter are generated with a
first-order stochastic model, using mean velocities representative of the surface currents in the Nordic Seas.
The result is a data set with known mean velocities
and diffusivities, allowing us to test the accuracy of
our estimates. In addition, we calculate corresponding
estimates using bins and compare the results. The currents in the Nordic Seas are narrow and strongly inhomogeneous, so this is a fairly strenuous test. Using
synthetic data also ensures that we are not limited by
the size of the data set.
Previous authors have used stochastic models for
Lagrangian analysis (e.g., Griffa 1996; Falco et al. 2000;
Garraffo et al. 2001; Veneziani et al. 2004; Rupolo
2007; Sallee et al. 2008). The goal in these studies was
to use the stochastic models to reproduce dispersion
characteristics in observations. We are treating the stochastic trajectories as the observations, as was done,
for example, by Bauer et al. (1998). Davis (1991) used
synthetic trajectories in this way, to evaluate estimation
errors under binning. However, he did not address the
dependence on bin size, an issue addressed here.
The paper is organized as follows: The study region
and simulated Lagrangian particles are described in
Section 2. In Section 3, we consider mean velocities, and
eddy diffusivities are addressed in Section 4. We discuss
the results in Section 5.
2 Data
For the synthetic trajectories, we employ a first-order
stochastic model (e.g., Griffa 1996), for which the particle positions are given by:

$$
\begin{aligned}
dx_i &= \left( u_i + U(x, y) \right) dt, & dy_i &= \left( v_i + V(x, y) \right) dt, \\
du_i &= -\frac{1}{T_L}\, u_i \, dt + \sqrt{\frac{2}{T_L}}\, \nu \, dw, & dv_i &= -\frac{1}{T_L}\, v_i \, dt + \sqrt{\frac{2}{T_L}}\, \nu \, dw.
\end{aligned}
\tag{3}
$$

The subscript refers to the particle, (U, V) is the background mean flow, ν is the square root of the eddy
velocity variance, T_L is the Lagrangian integral time
scale, and dw is a Wiener (normal) noise process. The
two components of the velocity u and v are assumed
independent. The velocity autocorrelation is given by:

$$ P(\tau) = \langle u(t)\, u(t + \tau) \rangle = \nu^2 e^{-\tau / T_L}. \tag{4} $$

From Eq. 2, the diffusivities have the asymptotic value
of κ^∞ = ν² T_L.
As noted, we use estimates of the surface currents
in the Nordic Seas for the mean velocities, (U, V).
The dominant feature here is the Norwegian Atlantic
Current, off the western Norwegian coast. This is 20–30 km
wide in its core, a distance somewhat larger
than the deformation-scale eddies (5–10 km) which
are ubiquitous here (Poulain et al. 1996; Skagseth and
Orvik 2002; LaCasce 2005; Koszalka et al. 2009). Our
representation derives from a 1-year simulation with
the 4-km MIPOM model of the Norwegian Meteorological Institute. This produces fairly realistic velocity
fields (LaCasce and Engedahl 2005). The velocities
were resampled on a regular grid of 0.25° × 0.25° and
are contoured in Fig. 2a. The means were then interpolated onto the particles' instantaneous positions for
advection.
The model also requires the root mean square (rms)
velocity, ν, and the Lagrangian integral time scale, T_L.
Based on earlier estimates (Poulain et al. 1996; LaCasce
2005; Andersson et al., submitted for publication), we
assign values of ν = 20 cm/s and T_L = 1 day.¹ This
yields an effective length scale L = νT_L = 18 km, comparable to the core width of the Norwegian Atlantic
Current. For simplicity, we assume that the eddy statistics are isotropic and homogeneous. Koszalka et al.
(2009) used a similar stochastic model for comparison
with drifter trajectories in the same region.
¹ Poulain et al. (1996) found T_L = 1–3 days here, while
Andersson et al. (submitted for publication) estimated T_L =
1.1 days. LaCasce (2005) found that the Eulerian integral time
is 1 to 2 days, which implies an equal or shorter Lagrangian time.
Two thousand particles were deployed on a regularly spaced grid and advected for 60 days, yielding ca.
105,000 drifter days. This is comparable to the number
of actual drifter days currently available in the Nordic
Seas; however, the areal coverage in the synthetic set
is much more uniform. Seeding on a uniform grid
also reduces the “array bias”, which can influence the
diffusivities (Davis 1991). Some particles collided with
the coast or islands, and we discarded the subsequent
portions of those trajectories.
The model time step was dt_o = 0.01 day, and the
data were saved with a time step of dt = 0.1 day (one
tenth of the integral time). The resulting trajectories
are plotted in Fig. 2b. For comparison, we ran an additional simulation with 2,000 stochastic particles with
zero mean flow (U = 0, V = 0), all other parameters
being the same.

Fig. 2 a Magnitude of the mean velocity field |U(x, y)| = √(U² + V²)
(centimeters per second) from a MIPOM model
simulation of the Nordic Seas used to advance stochastic
particles according to Eq. 3. b Trajectories from 2,000
synthetic particles evolved for 60 days with a first-order
stochastic model embedded in this mean flow.
Deployment positions are marked with circles
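The first-order stochastic model of Eq. 3 can be illustrated with a simple Euler-Maruyama integration. This is a sketch under stated assumptions rather than the code used for the paper: the function name, the optional `mean_flow` callable, the choice of units (km and days), the random seed, and the reduced particle count and run length are all hypothetical choices made to keep the example small.

```python
import numpy as np

def simulate_first_order(n_particles, n_steps, dt, nu, T_L, mean_flow=None, seed=0):
    """Euler-Maruyama integration of the first-order stochastic model (Eq. 3).

    mean_flow(x, y) -> (U, V) is a hypothetical callable; None means U = V = 0.
    Units must be consistent (e.g., nu in km/day, T_L and dt in days, x in km).
    Returns positions x, y and eddy velocities u, v, each (n_steps + 1, n_particles).
    """
    rng = np.random.default_rng(seed)
    x = np.zeros((n_steps + 1, n_particles))
    y = np.zeros_like(x)
    u = np.zeros_like(x)
    v = np.zeros_like(x)
    # Start the eddy velocities from their stationary distribution N(0, nu^2).
    u[0] = rng.normal(0.0, nu, n_particles)
    v[0] = rng.normal(0.0, nu, n_particles)
    amp = nu * np.sqrt(2.0 / T_L)
    for k in range(n_steps):
        U, V = mean_flow(x[k], y[k]) if mean_flow is not None else (0.0, 0.0)
        x[k + 1] = x[k] + (u[k] + U) * dt
        y[k + 1] = y[k] + (v[k] + V) * dt
        # Independent Wiener increments for u and v, each with variance dt.
        dwu = rng.normal(0.0, np.sqrt(dt), n_particles)
        dwv = rng.normal(0.0, np.sqrt(dt), n_particles)
        u[k + 1] = u[k] - u[k] / T_L * dt + amp * dwu
        v[k + 1] = v[k] - v[k] / T_L * dt + amp * dwv
    return x, y, u, v

# nu = 20 cm/s = 17.28 km/day, T_L = 1 day, step 0.01 day, 5 days, zero mean flow.
x, y, u, v = simulate_first_order(1000, 500, 0.01, nu=17.28, T_L=1.0)
```

With zero mean flow the eddy velocities should stay close to an rms value of ν throughout the run, which gives a quick sanity check on the noise amplitude.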
3 Mean velocities
We focus first on extracting the mean velocities from
the drifters. The resulting estimates will be compared
to the actual U, V values from the MIPOM simulation (used as input to generate the trajectories). We
have velocities with a time step of dt = TL /10, but we
use only a subset of these for calculating the means,
with dt = 2TL . Then each observation is treated as
independent.
3.1 Methods
For binning the velocities, we must first choose the bin
sizes. The bins should be small enough to resolve the
mean flow but larger than the eddy scale. They should
also be large enough to yield statistically significant
estimates. The Nordic Seas is problematic in this regard
because the mean and the eddy scales are comparable. Previous authors used (2° × 1°) bins in this region
(Poulain et al. 1996; Saetre 1999; Jakobsen et al. 2003).²
Such bins have a length scale of roughly 100 km. We
denote this as our “intermediate” bin size. In addition,
we examine larger and smaller bins, with dimensions
(4° × 2°) and (1° × 0.5°), respectively.
² The dimensions are listed (degrees longitude × degrees latitude). With (2° × 1°), the bins are close to square in the southern
part of the domain but are more rectangular in the north.
For the clustering, we employ the “k-means” clustering algorithm (Lloyd 1982). The algorithm partitions
the n_T observations (x_1, x_2, ..., x_n) into k subsets (clusters), S = {S_1, S_2, ..., S_k}, such that each observation is
assigned to the nearest cluster in a way that minimizes
the sum, over all clusters, of the squared distance between cluster members and the cluster center μ_i:

$$ \min_{S} \sum_{i=1}^{k} \sum_{x_j \in S_i} \lVert x_j - \mu_i \rVert^2 . \tag{5} $$

As the cluster centers themselves depend on the positions of the observations, this is necessarily done iteratively, in a two-step assignment/update process. In the
assignment step, each data point is assigned to the
nearest center. In the update step, cluster centers are
adjusted to match the sample means of their member
data points. This is repeated until the assignments are
unchanged. For more information on clustering algorithms, see, e.g., Kanungo et al. (2002) and MacKay
(2003).
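The two-step assignment/update iteration can be sketched as below. This is plain Lloyd-type k-means and does not include the modification that enforces approximately m members per cluster (that variant is described in the Appendix and is not reproduced here). The function name, the random initialization from the data, and the handling of empty clusters are assumptions made for the illustration.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: alternate assignment and update steps."""
    pts = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the centers with k distinct observations (a common, simple choice).
    centers = pts[rng.choice(len(pts), size=k, replace=False)]
    labels = np.full(len(pts), -1)
    for _ in range(n_iter):
        # Assignment step: each point goes to the nearest center.
        d2 = ((pts[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for i in range(k):
            members = pts[labels == i]
            if len(members) > 0:  # leave an empty cluster's center where it is
                centers[i] = members.mean(axis=0)
    return labels, centers

# Two well-separated groups of made-up positions.
pts = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centers = kmeans(pts, k=2)
```

Because the unmodified algorithm places no constraint on cluster size, the number of members per cluster can vary widely when the data density is uneven.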
The main parameter to be specified is k, the number
of clusters. If we wish to have clusters with m members,
then k = nT /m. As with the bins, we use three choices,
ranging from coarser to finer resolution. We chose m
so that the mean standard error among the clusters was
the same as that in the corresponding bins. The error is
defined:
$$ \langle \sigma \rangle = \left\langle \frac{\nu}{\sqrt{n}} \right\rangle , \tag{6} $$
where ν and n are the rms velocity and the number
of realizations in the bin/cluster and the brackets indicate an average over all the bins/clusters. Alternately,
we could have chosen m to match the mean number of observations in the bins, but the latter varies
widely among bins, as will be seen. Matching mean
errors yields clusters with m = 125, m = 75, and m = 45
members. To guarantee that all the clusters have approximately m observations, we modified the k-means
algorithm (as described in “Appendix”).
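The matching criterion of Eq. 6 can be evaluated for any grouping of the observations, whether bins or clusters. The sketch below is illustrative: the function name is hypothetical, and it assumes that ν in each group is the sample standard deviation about the group mean, which the text does not state explicitly.

```python
import numpy as np

def mean_standard_error(u, labels):
    """Average of nu / sqrt(n) over all groups (Eq. 6).

    u      : 1-D array of independent velocity observations
    labels : group index (bin or cluster) of each observation
    """
    u = np.asarray(u, dtype=float)
    labels = np.asarray(labels)
    errors = []
    for g in np.unique(labels):
        members = u[labels == g]
        n = len(members)
        if n < 2:
            continue  # a spread cannot be estimated from a single observation
        nu = members.std(ddof=1)  # assumed: sample standard deviation about the group mean
        errors.append(nu / np.sqrt(n))
    return float(np.mean(errors))

# Two groups of four made-up observations each.
sigma_mean = mean_standard_error([1, 3, 1, 3, 0, 4, 0, 4], [0, 0, 0, 0, 1, 1, 1, 1])
```

One way to use this is to compute the value for a given bin size and then adjust m until the clustered grouping returns the same number.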
The various parameters for the bins and clusters
are shown in Table 1. Note that the “coarse” bins are
roughly twice as large as the coarse clusters and have
nearly twice as many observations, on average. The
“fine” bins and clusters are more comparable in both
regards.
3.2 Results
Shown in Fig. 3 are the means obtained by bin-averaging (panels a–c) and by clustering (panels d–f).
In the lower panels, the clustered means are linearly
interpolated onto the same grid as for the input model
field (panels g–i), for comparison with the actual mean
flow, in Fig. 2a.
Consider the bins first (panels a–c). With the finest
resolution (1◦ × 0.5◦ ), the major structures in the surface current are recovered. These include the inflow
north of Iceland and the inner and outer branches of
the Norwegian Atlantic Current (e.g., Orvik and Niiler
2002). With the (2◦ × 1◦ ) bins, we observe where the
currents are stronger and weaker but lose much of the
finer structure. The currents with the (4◦ × 2◦ ) bins are
hard to recognize.
The results from clustering are shown in panels d–f.
With m = 45, the means are as well resolved as
those in the finest resolution bins, with the exception
of the currents along the northern periphery (which are
not resolved here but marginally seen with the binned
set). But the m = 75 and m = 125 clusters are also
fairly successful at capturing the mean flow structure.
The primary difference is that, with larger m, there are
fewer clusters.
Of course, part of the difference between the clustered means and the actual field (Fig. 2a) is due to
the uneven plotting with the former. Interpolating the
clustered means onto the same (0.25◦ × 0.25◦ ) grid as
for the input mean flow yields the fields in the lower
panels of Fig. 3. We see that the primary structures
are captured in the clustered means, even with m = 125
(Fig. 3g). Interpolating the binned means on the other
hand produces smoothed versions of those fields (not
shown) and produces results comparable to the input
field only with the (1◦ × 0.5◦ ) bins.
Figure 4 shows further how the statistics vary between the two methods. In panels a and b, we plot the
distributions of the number of observations in the bins
and clusters, respectively. While the largest bins have
many observations, the majority have far fewer. Thus,
the distributions are skewed and the mean number of
observations in the bins (Table 1) is not representative
of the majority. The clusters on the other hand have
nearly a delta-function distribution; all the clusters have
approximately m observations, by design.
A second difference is seen in Fig. 4c, which shows
the size of the bins and clusters as a function of the
mean standard error. For a given error, the average
bin covers a larger area than the corresponding cluster. Moreover, the area covered by the clusters is less
sensitive to the mean error than with the bins. Both
points follow from the differences in the numbers of
observations. Since the clusters have roughly the same
number of observations, it is easier to control the error.
But the errors in the bins vary widely, just as the
numbers of observations do. Since the clusters cover
smaller areas, they are more successful at capturing the
finer-scale structures in the mean.
The standard error determines the significance of the
means in the bins/clusters. In Fig. 5, we examine where
the calculated means differ significantly from the actual
means, averaged over the same areas. Bins in which the
means are not statistically different are shown in blue
while the purple bins indicate a significant difference
(panels a–c). Panels d–f show the corresponding fields
for the clusters.

Table 1 Parameters of the binning and clustering assignments

            Bins                                                                 Clusters
Resolution  Long × Lat (°)  Length scale (km)  No. of bins  <n>  <σ> (cm/s)     m    Nc     Dc (km)  <σ> (cm/s)
Coarse      4 × 2           186                61           865  1.8            125  452    80       1.9
Medium      2 × 1           93                 225          235  2.5            75   775    54       2.5
Fine        1 × 0.5         47                 839          63   3.5            45   1,356  34       3.3

Bin size (long × lat), bin length scale in kilometers (square root of the area covered by the bin), number of bins, average number of
observations in bins, mean standard error in the bins, number of members in cluster, number of clusters, mean cluster diameter, and
mean standard error in clusters

Fig. 3 Pseudo-Eulerian estimate of the mean speed |U(x, y)|
derived through averaging of the synthetic Lagrangian observations. Top: obtained by binning the data in grids with varying
bin size: 4° × 2° (a), 2° × 1° (b), and 1° × 0.5° (c). Bins with no
data are plotted in gray. Middle: obtained by clustering the data
with different numbers of members: m = 125 (d), m = 75 (e),
and m = 45 (f). Bottom: clustered estimates interpolated onto a
regular grid of (long × lat) = 0.25° × 0.25°: m = 125 (g), m = 75
(h), and m = 45 (i)
One might expect that, because increasing the bin/
cluster area increases the number of observations in
them, this would likewise reduce the errors. But the
percentage of rejected bins is actually greater with
larger bins and clusters than with smaller ones. The reason for this lies with the mean flow. Because the mean
is so inhomogeneous, using a larger bin involves averaging over a wider range of U, V values. The standard
error is smaller because the number of observations is
greater, making it less likely that the two estimates are
statistically the same. In a sense, the larger bins produce
a more certain answer of an incorrect velocity. With
smaller bins, the sampled mean is more homogeneous
and the error larger, increasing the probability of reproducing the mean flow fields correctly in the bin area.
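The trade-off described in this paragraph can be made concrete with a simple significance check. The text does not specify the exact test used, so the sketch below assumes a two-sided normal approximation with a critical value of 1.96; the function name and all numbers are illustrative.

```python
import math

def mean_differs(sample_mean, sample_std, n, true_mean, z_crit=1.96):
    """Return True if the sample mean differs from true_mean at the ~95% level.

    Assumes n independent observations and a normal approximation
    (z_crit = 1.96); a Student-t critical value would be slightly larger
    for small n.
    """
    std_err = sample_std / math.sqrt(n)
    return abs(sample_mean - true_mean) > z_crit * std_err

# Same 3 cm/s offset from the true mean and the same spread, different n.
small_group = mean_differs(13.0, 20.0, n=45, true_mean=10.0)   # std_err ~ 3.0 cm/s
large_group = mean_differs(13.0, 20.0, n=865, true_mean=10.0)  # std_err ~ 0.68 cm/s
```

The same offset is accepted in the small group and rejected in the large one, because the larger sample narrows the confidence interval around a mean that has been smeared over an inhomogeneous flow.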
A larger proportion of bins than clusters is rejected
for a given mean standard error (Fig. 5g). This is again
because the clusters cover smaller areas. The three
types of cluster used produce a rejection rate between
14% and 22%, while 25–60% of the bins are rejected.
Thus, the clusters are more successful at capturing the
actual means.

Fig. 4 a Distributions of the number of independent observations grouped in bins of different size. b Distributions of the
number of independent observations in clusters obtained with
different parameter m. c Mean length scale (square root of the
area covered by the bin and cluster diameter) vs. mean sampling
error for binning and clustering analyses

Fig. 5 Comparison of the means in the bins (a–c) and clusters (d–f)
with the actual means, U, V, used in generating the particle trajectories. Purple color codes bins/clusters that have means which
are different from the actual means at the 95% confidence level,
and blue color indicates means that are the same. g Percentage
of bins and clusters where the means are significantly different
(“rejected”)
We would obtain different results had we used a
different metric for comparing the bins and clusters.
For instance, if we match the mean number of observations, we obtain larger clusters and a less well-resolved mean. But, as noted earlier, the mean number
of observations is not representative for the bins, due
to their skewed distributions of observations. This is
because the bins, unlike the clusters, are not necessarily
where the data are.
4 Diffusivities
4.1 Diffusivities with zero mean flow
Now we turn to the eddy diffusivities. There are several technical issues to be addressed. The first is how the diffusivity is
actually calculated. Some compute it from the integral
of the ensemble-mean velocity autocorrelation (Eq. 2;
e.g., Poulain et al. 1996; Poulain 2001; Thompson et al.
2009). Others prefer the product of the residual velocity and displacement (e.g., Swenson and Niiler 1996;
Zhurbas and Oh 2003), while some compute half the
derivative of the mean absolute dispersion (Colin de
Verdiere 1983). The different approaches are not often compared (Zhurbas and Oh 2003). We will do
so briefly here, using the stochastic trajectories with
zero mean flow. Without a mean flow, the residual
velocities are the same as the particle velocities and are
homogeneous.
The diffusivity estimates, κ(t), from the three methods are shown in Fig. 6a. Also shown is the theoretical
curve, obtained by integrating the exponential autocorrelation for a first-order stochastic process:
$$ \kappa(t) = \nu^2 \int_{-t}^{0} \exp\left( -\frac{|t'|}{T_L} \right) dt' = \kappa^{\infty} \left[ 1 - \exp\left( -\frac{t}{T_L} \right) \right]. \tag{7} $$
With T_L = 1 day and ν = 20 cm/s, the asymptotic limit
is κ^∞ = ν² T_L = 3.46 × 10^7 cm²/s.
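As a quick numerical check of Eq. 7 and of the asymptotic value, the closed form can be compared with a direct integration of the exponential autocorrelation. The variable names and the integration grid below are choices made for this sketch.

```python
import numpy as np

nu = 20.0            # rms eddy velocity, cm/s
T_L = 86400.0        # Lagrangian integral time scale of 1 day, in seconds
kappa_inf = nu**2 * T_L   # asymptotic diffusivity, cm^2/s

def kappa_theory(t):
    """Closed form of Eq. 7 for lag t in seconds."""
    return kappa_inf * (1.0 - np.exp(-t / T_L))

# Numerical check: trapezoidal integral of the exponential autocorrelation.
t = np.linspace(0.0, 10 * T_L, 10001)
P = nu**2 * np.exp(-t / T_L)
kappa_num = np.concatenate(([0.0], np.cumsum(0.5 * (P[1:] + P[:-1]) * np.diff(t))))
```

Here `kappa_inf` evaluates to 3.456 × 10^7 cm²/s, and the closed form reaches about 95% of that value at t = 3 T_L and about 98% at t = 4 T_L.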
The derivative of the absolute dispersion and the
product of the velocity and displacement produce the
same result, within the errors. The diffusivities asymptote to the theoretical limit after 3 to 4 days but exhibit
significant oscillations thereafter. The integral of the
autocorrelation on the other hand yields a smoother
curve, and this lies near the theoretical curve. The reason this differs from the other two is that integrating
the mean autocorrelation is a smoothing operation.
So much of the variability seen in the other curves is
removed; Davis (1991) concluded the same. We will use
this method exclusively hereafter.
There are two additional points. First is that the
curves in Fig. 6a derive from 2,000 particles—an enormous number in relation to most observational studies. Such experiments typically have at best an order
of magnitude fewer, and this affects the convergence.
Examples with fewer particles, using the integrated
autocorrelation, are shown in Fig. 6b. With an ensemble
of 100 particles, the diffusivity estimate is within 10%
of the theoretical value. The asymptote can be approximately correct with fewer particles, but the errors are
larger.
Second, because the diffusivities should converge
after 3 to 4 days, we require track segments of at
least that length to obtain proper estimates. Shown in
Fig. 6c are the integrals obtained with 100 trajectories
of varying length. For tracks with five or fewer days,
the curves asymptote to values below the theoretical
limit. Evidently track lengths of 10 days, or ten times
the integral time, are required to obtain reasonable
estimates.
So even in the best case scenario with no mean flow,
a meaningful estimate of the eddy diffusivity requires
100 track segments of at least 10 TL duration. Knowing
this helps interpret the subsequent results with the
mean flow restored.
4.2 Diffusivities with a mean flow
Now consider the diffusivities with the mean flow present. We perform the calculations using the three bin
and cluster classes discussed previously. For the means,
we use averages obtained in the fine resolution cases,
i.e., from the (1◦ × 0.5◦ ) bins and from the m = 45
clusters. Although the mean standard errors are larger
with these cases, they best capture the detailed flow
structure (Fig. 3). We linearly interpolated those means
onto the instantaneous drifter positions to obtain the
residual velocities.
For the bins, we use only those segments of drifter
trajectories while the drifters were in the bins. These
were of varying length, as the drifters spent different
times in the bins. We averaged the autocorrelations
from the individual tracks to obtain the bin diffusivity
(Davis 1991). We did this for each of the three bin
classes (Table 1).

Fig. 6 a Diffusivity curves derived from 2,000 particles evolving
for 60 days in a zero mean flow from mean sequences of the time
derivative of the absolute dispersion (adisp), mean products of
the single-particle velocity and its displacement (ud), and integration of the ensemble averages of the autocorrelation sequences
(autocorr), compared to the theoretical value (theor). Estimation
errors δκ are derived from errors on the autocorrelation given by
the t-test at the 95% significance level. b Diffusivity curves from
the autocorrelation method computed with a varying number of
particles, each time series being 60 days long, compared to the
theoretical curve drawn in black. c Diffusivity curves from the
autocorrelation method computed for 100 particles with a varying
length of the time series. The theoretical curve is drawn in black
With the clusters, we essentially reverse the procedure. First, we break all trajectories into segments of
a chosen, uniform time length. Then we calculate the
autocorrelations for each segment. The segment is assigned to a position (the midpoint along the track), and
those positions are clustered as in Section 3.2. Then the
autocorrelations for all segments in the cluster are averaged and integrated. We chose the number of members
in the cluster to be 100. With 10-day segments, this
yielded 122 clusters, with a mean radius of 76 km. With
20-day segments, we obtained 62 clusters with a mean
radius of 90 km.
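The segment-based procedure can be sketched for a single trajectory as follows. The function names are hypothetical, the "midpoint" is taken here as the sample halfway through the segment in time (an assumption), and the clustering of the midpoints themselves is left out.

```python
import numpy as np

def segment_autocovariances(x, y, u_resid, seg_len):
    """Cut one trajectory into fixed-length segments.

    x, y    : positions along the trajectory (1-D arrays of equal length)
    u_resid : residual velocity along the same trajectory
    seg_len : number of samples per segment
    Returns (midpoints, acov): midpoints has shape (n_seg, 2) and
    acov[s, k] is the lag-k autocovariance of segment s.
    """
    x, y, u = (np.asarray(a, dtype=float) for a in (x, y, u_resid))
    n_seg = len(u) // seg_len          # any incomplete final segment is dropped
    midpoints = np.empty((n_seg, 2))
    acov = np.empty((n_seg, seg_len))
    for s in range(n_seg):
        sl = slice(s * seg_len, (s + 1) * seg_len)
        mid = s * seg_len + seg_len // 2
        midpoints[s] = (x[mid], y[mid])   # position assigned to the segment
        us = u[sl]
        for k in range(seg_len):
            acov[s, k] = np.mean(us[: seg_len - k] * us[k:])
    return midpoints, acov

def cluster_diffusivity(acov_members, dt):
    """Average the members' autocovariances, then integrate over lag."""
    P = np.mean(acov_members, axis=0)
    return np.sum(0.5 * (P[1:] + P[:-1])) * dt   # trapezoidal rule

# A made-up 10-sample track with unit residual velocity, cut into two segments.
midpoints, acov = segment_autocovariances(np.arange(10), np.zeros(10), np.ones(10), 5)
kappa = cluster_diffusivity(acov, dt=0.1)
```

In a full calculation the midpoints from all trajectories would be clustered first, and `cluster_diffusivity` would then be applied to the rows of `acov` belonging to each cluster.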
Thus, with the bins and clusters, we obtain time series of the diffusivities. The question then is how to estimate the asymptotic value, κ^∞. Ideally, we would take
the value as t approaches infinity. But this is impractical
because the particles leave the bin after a finite period
of time and also because the sampling error increases
Ocean Dynamics
as t1/2 (Davis 1991). A number of authors take the
first maximum value of the series, which is similar to
integrating the autocorrelation to the first zero-crossing
(e.g., Brink et al. 2000; Lumpkin et al. 2002; Rupolo
2007). However, the exponential autocorrelation ob-
a) 4 x 2
tained with the stochastic model theoretically has no
zero crossing at finite lag. So instead, we average the
diffusivity over a fixed period, from 4 to 8 days. If
the mean autocorrelation is shorter than 8 days in a
given bin, the integration terminates. If it is shorter than
b) 2 x 1
c) 1 x 0.5
76
76
76
74
74
74
72
72
72
70
70
70
68
68
68
66
66
66
64
64
64
62
62
62
−15 −10
−5
0
5
10
−15 −10
15
−5
0
5
d) 20 days
76
74
74
72
72
70
70
68
68
66
66
64
64
62
62
−5
0
5
−15 −10
15
−5
0
5
10
15
e) 10 days
76
−15 −10
10
10
15
−15 −10
Fig. 7 Maps of eddy diffusivity, scaled by the target theoretical value, derived from the synthetic particles obtained by the
binning method for different bin sizes—4◦ × 2◦ (a), 2◦ × 1◦ (b),
and 1◦ × 0.5◦ (c)—and by the clustering method for different
−5
0
5
10
15
segment lengths—20 days (d) and 10 days (e). All estimates were
interpolated onto a regular grid of (long × lat) = 0.25◦ × 0.25◦
prior to plotting
Ocean Dynamics
4 days, no estimate of κ ∞ is produced. The results do
not change qualitatively when using other choices for
the averaging period (e.g., from 5–10 days).
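The fixed-window rule described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the diffusivity series is taken to be sampled at a uniform lag step, and the function name and arguments are hypothetical.

```python
def asymptotic_diffusivity(kappa, dt_days, t_start=4.0, t_end=8.0):
    """Average kappa over lags between t_start and t_end (in days).

    kappa[i] is the diffusivity at lag i*dt_days. If the series ends before
    t_end, the average stops at the last available lag; if it ends before
    t_start, no estimate is returned (None).
    """
    t_max = (len(kappa) - 1) * dt_days
    if t_max < t_start:
        return None
    upper = min(t_end, t_max)
    window = [k for i, k in enumerate(kappa) if t_start <= i * dt_days <= upper]
    if not window:  # lag step too coarse to place a sample inside the window
        return None
    return sum(window) / len(window)
```

Changing `t_start` and `t_end` to 5 and 10 days reproduces the alternative averaging period mentioned in the text.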
The resulting estimates for κ∞ are mapped in
Fig. 7a–c for bins of different size and in panels d and e
for clusters with 100 track segments of 20 and 10 days,
respectively. We normalize κ∞ by its theoretical value.
For consistency, all estimates are interpolated onto a
regular grid of 0.25° × 0.25° and contoured with the
same range of values, from 0 to 1.5. The correct value is
1.0, which is contoured in yellow.
The normalized estimates with the (4° × 2°) bins span the range from near 0 to 1.3. Values that are too low are found near the borders of the domain, and values that are too large occur near the coasts. In the interior, the values are consistently low, with typical values of 0.8–0.9. With the (2° × 1°) bins, the diffusivity exhibits smaller-scale variations, and there are many regions in the interior where the values are too large. The variations are more marked with the (1° × 0.5°) bins, with pockets of high and low values.

Fig. 8 a A scatter plot showing the number of segments and mean segment length obtained in bins for different bin sizes, 4° × 2° (cyan), 2° × 1° (blue) and 1° × 0.5° (green), for the binning method, and for different prescribed segment lengths (10 and 20 days, red) for the clustering method. The mean values of these parameters over all bins/clusters are marked with rectangles and circles, respectively. The number of segments refers to τ = 0, and it falls off thereafter due to the variable length of the tracks that occur in the bin. b The spread of estimates of eddy diffusivity κ∞, scaled with the “target” theoretical value, in the binning and clustering assignments. c The error of the diffusivity estimate ⟨δκ⟩, averaged over all bins/clusters
The diffusivities with the clusters are more uniform,
both for the 20- and 10-day segments. The extreme low
estimates found with the bins do not occur. Instead, the
values vary between 0.8 and 1.2. There are larger values
along the periphery, but also in the interior.
A detailed comparison of the bin/cluster statistics is
shown in Fig. 8. Panel a is a histogram of the average
length of the segments used in calculating the autocorrelation for each cluster or bin. The clusters have
segment lengths of 10 and 20 days, by design. The bins
have a range of values, but in most cases, the average
length is below 7 days. None exceed 10 days. The mean
over all the bins is 5, 3, and 1.5 days, with decreasing
bin size.
The second point concerns the number of segments.
Again, the clusters have nearly the same number. There
are small variations, as the clustering procedure could
not always obtain 100 segments. Nevertheless, most
clusters have 80–120 segments. The bins, on the other
hand, exhibit a wide range of values. There are some
(4° × 2°) bins with over 700 segments and others with
fewer than 10. And there are some (1° × 0.5°) bins with
only two or three segments. The average number of
segments is 280, 131, and 64 for the bins, in order of
decreasing area.
Based on the findings in Section 4.1, we expect that
the binned estimates of κ∞ should vary more and be
biased low because the segments are generally too
short. This is the case. Shown in Fig. 8b are scatterplots
of the diffusivities for the five cases. The bin estimates
span the range from zero to 1.5 times the actual diffusivity. The spread is less with the larger bins but still
pronounced. In all cases, the diffusivities are skewed
toward low values. Thus, the average diffusivities for
all bins are also low.
The clusters on the other hand yield estimates from
0.8 to 1.2 times the actual diffusivity. The distributions
are not skewed, so the averages over all the clusters,
both with 10- and 20-day segments, agree with the
actual diffusivity.
In Fig. 8c, we plot the diffusivity errors. These
derive from Student's t-test at the 95% significance
level, averaged over 4–8 days and over all bins/clusters
and normalized by the theoretical value of κ∞. The
errors are the largest with the (1° × 0.5°) bins and
decrease with increasing bin size. However, both cluster
examples have significantly smaller errors. The mean
error is 0.36 times the actual diffusivity with 20-day
segments, as compared with 1.29 times the diffusivity
for the “best” binning case.
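The error formula itself is not written out in this section. As a hedged sketch, assuming the error at each lag is the half-width of a two-sided 95% Student's t confidence interval for the mean of N independent per-segment estimates, it could take the form below. The symbols N, s(t), t*, t_j, and J are introduced here for illustration and are not the authors' notation.

```latex
\delta\kappa(t) = t^{*}_{0.975,\,N-1}\,\frac{s(t)}{\sqrt{N}},
\qquad
\langle \delta\kappa \rangle_{4\text{--}8\,\mathrm{d}}
  = \frac{1}{\kappa_\infty^{\mathrm{theory}}}\,
    \frac{1}{J}\sum_{j=1}^{J}\delta\kappa(t_j),
\quad 4 \le t_j \le 8\ \text{days}
```

Here s(t) is the sample standard deviation of the N estimates at lag t and t* is the Student's t quantile with N − 1 degrees of freedom. A further average over all bins or clusters would give the single number plotted per case. Under this assumption the error scales as N^{-1/2}, which is consistent with the smaller errors found when each cluster holds roughly 100 segments.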
Other methods for determining the diffusivity yield
similar results. Using the zero-crossing method for estimating κ∞ yields a similar range of estimates, albeit
with slightly larger average diffusivities. The diffusivities are nevertheless skewed to smaller values.
The primary shortcoming with the binning calculation is that the segments are too short. With small bins,
there are few particles which remain in any bin for
periods longer than T_L. Thus, the mean autocorrelation
curves do not reach the asymptotic period (Fig. 6c).
An alternate approach, in line with that of Garraffo
et al. (2001), Lumpkin and Flament (2001), Lumpkin
et al. (2002), and Rupolo (2007), would be to break
the trajectories into uniform segments and regroup
them in bins. Then one could control the length of the
autocorrelations, just as we have done for the clusters.
But by grouping in bins, we would still obtain different
numbers of observations in different bins, as we found
with the mean velocities.
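The alternate approach just described can be sketched as follows, assuming uniformly sampled tracks given as lists of (lon, lat) pairs. The function names, the use of the middle sample as the segment position, and the bin origin arguments are all illustrative assumptions.

```python
import math


def split_into_segments(track, seg_len):
    """Cut a track into non-overlapping segments of seg_len samples.

    Any trailing remainder shorter than seg_len is dropped.
    """
    return [track[i:i + seg_len]
            for i in range(0, len(track) - seg_len + 1, seg_len)]


def midpoint(segment):
    """Position assigned to a segment: here simply its middle sample."""
    return segment[len(segment) // 2]


def group_segments_by_bin(tracks, seg_len, lon0, lat0, dlon, dlat):
    """Group fixed-length segments into geographical bins by midpoint.

    Returns a dict mapping (i_lon, i_lat) bin indices to lists of segments.
    """
    bins = {}
    for track in tracks:
        for seg in split_into_segments(track, seg_len):
            lon, lat = midpoint(seg)
            key = (math.floor((lon - lon0) / dlon),
                   math.floor((lat - lat0) / dlat))
            bins.setdefault(key, []).append(seg)
    return bins
```

Every segment now has the same length, but the number of segments per bin (`len(bins[key])`) is still set by where the tracks happen to go, which is the limitation noted above.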
5 Summary and discussion
We considered a new method for calculating pseudo-Eulerian mean velocities and eddy diffusivities from
Lagrangian data. This involves grouping a specified
amount of data into spatially localized subsets using a
“clustering” algorithm (e.g. Lloyd 1982; MacKay 2003).
This is in contrast to the commonly used method in
which the data is separated into geographical bins of
a specified size. We compared the two approaches by
analyzing a set of 2,000 trajectories generated with a
first-order stochastic model, with a mean velocity representative of that at the surface in the Nordic Seas and
with comparable eddy parameters.
Using bins yields Eulerian estimates on a uniform
grid. But as the number of observations varies greatly
from bin to bin, so does the statistical significance.
Clustering on the other hand produces sets with roughly
the same number of observations and trajectory segments of the same length. The resulting means and
diffusivities are not uniformly spaced but have much
more uniform statistics.
In terms of the mean velocities, clustering produces
regions of smaller areal extent than binning, for comparable mean standard errors. The bins have widely
different numbers of observations but the clusters have
nearly the same, allowing more control of the significance. With smaller areas, the clusters are better able to
resolve details of the mean flow. Further, the accuracy
is less dependent on the mean standard error with
clusters than it is with bins.
The means are more accurate with smaller bins,
despite the smaller numbers of observations. Binning
with a cell size of (2° × 1°), as done previously for
the Nordic Seas (Poulain et al. 1996; Saetre 1999;
Jakobsen et al. 2003), yields a smooth representation
of the mean. Using smaller bins, however, increases the
chances of individual bins being rejected for having too
few observations (e.g., Poulain et al. 1996; Falco et al.
2000; Thompson et al. 2009). Clustering provides a way
around this by allowing the number of observations to
be specified a priori.
Diffusivities are a more Lagrangian measure than
the means, involving an integral along drifter paths.
With bins, these segments are of varying length, which
impacts the averages. One often finds too many short
segments, and this leads to an underestimate of the
diffusivity. With clustering, one specifies a priori how
long and how many trajectory segments are used for
the averages. The resulting diffusivity estimates exhibit
less variation than with the bins and moreover are not
skewed toward low values.
Of course the mean and diffusivity calculations are
closely related because the means are subtracted from
the trajectories prior to calculating the diffusivities. If
the means are calculated with bins which are too large,
integrals of the resulting residual velocities may not
converge (Swenson and Niiler 1996). With clusters, the
areal coverage is typically less and the means apply
where the trajectories are, so the residual velocities are
better captured.
We clustered data according to nearest-neighbor distance, but other choices are also possible. One could
for instance group data according to distance along an
isopycnal or to position vis-à-vis topography (LaCasce
2000). In addition, we treat each observation equally,
but one can weight the observations, for instance with
regard to errors on individual positions. Such alterations in the k-means algorithm are straightforward.
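As an illustration of such a weighting, the centroid update of k-means can use a weighted mean in place of the plain mean. The sketch below assumes inverse-variance weights built from a hypothetical per-observation position error; neither the function names nor this weighting scheme come from the present study.

```python
def inverse_variance_weights(position_errors):
    """Weights proportional to 1/sigma^2 for each observation's position error."""
    return [1.0 / (sigma ** 2) for sigma in position_errors]


def weighted_centroid(points, weights):
    """Weighted mean position of a cluster's members.

    points are (x, y) pairs; weights are non-negative and not all zero.
    """
    total = sum(weights)
    x = sum(w * p[0] for p, w in zip(points, weights)) / total
    y = sum(w * p[1] for p, w in zip(points, weights)) / total
    return (x, y)
```

With equal weights this reduces to the ordinary centroid, so the unweighted algorithm is recovered as a special case.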
A related issue is that of “array bias”, in which
nonuniform deployments can produce errors in the
diffusivities (Davis 1991). While this is often less of
a problem than sampling error (Poulain et al. 1996;
Garraffo et al. 2001), it is nevertheless an issue with in
situ data. Here too, the clustering approach is preferable because diffusivities are determined locally, where
the trajectories are. We do not map onto a uniform grid,
introducing variations in coverage.
However, this mapping onto an irregular grid may
be seen as a shortcoming of the clustering approach. If
the means and diffusivities are to be used in a model,
they must necessarily be interpolated onto a regular
grid. In the present case, this interpolation produced
reasonable results (Fig. 3g–i) because the data coverage
was uniform (Fig. 2). But this is not usually the case
with in situ sets. Nevertheless, the procedure of mapping the nonuniform cluster averages onto a regular
grid reminds the user of where the data actually is. With
binned estimates, this can be less obvious.
A reviewer pointed out that we have avoided the
question of time dependency in the mean flow. Indeed,
the diffusivity is proportional to the lowest frequency
in the Lagrangian spectrum (e.g. LaCasce 2008), and
the mean velocity is ideally the component with zero
frequency. In regions with pronounced seasonal and/or
interannual variability, it is common to segregate the
data into climatological groups of several months or
years, often combined with filtering in the frequency
domain (e.g., Swenson and Niiler 1996; Jakobsen et al.
2003; Sallee et al. 2008). More sophisticated techniques
have also been proposed (e.g., Lumpkin 2003). Such
processing would in any case be done prior to the
proposed clustering, which is really a segregation in
space.
In a coming study, we apply the clustering method to
drifter data from the Nordic Seas. Preliminary calculations suggest that clustering yields a similarly improved
representation of the mean flow and the diffusivities.
The primary challenge with the in situ data, in comparison with the present stochastic set, is that the eddy
field is also strongly inhomogeneous. So more care is
required.
Acknowledgements The work is part of the Poleward project, funded by the Norwegian Research Council Norklima program (grant number 178559/S30). Details are found at http://www.iaoos.no/ and http://folk.uio.no/ingako/my_files/POLEWARD_WEBPAGE_MAIN.html. Harald Engedahl provided the MIPOM velocities. We appreciate useful comments from two anonymous reviewers.
Appendix: The clustering algorithm
We base our clustering procedure on a generalized
version of Lloyd's (1982) algorithm for the problem
described by Eq. 5. However, contrary to conventional
applications of k-means (MacKay 2003), in our problem, the number of clusters k does not need to be
guessed at but is deduced from the total amount of
data to match the desired number of cluster members
m. Hence, we have developed here a procedure to partition the data into clusters with the number of members being as close as possible to a prescribed value
m. This heuristic numerical solution is possibly not an
optimal one, but it performed well for the purpose
of this study. The implementation is done with the
MATLAB k-means toolbox, modified accordingly. The
steps of the algorithm are as follows:
• Choose the desired number of members in a cluster, m
• Given the total number of independent observations n and m, compute the target number of clusters, k = n/m
• Start k-means procedure (“batch phase”)
  – Seed a set of k cluster centers at random
  – Assign each point to the nearest cluster center, minimizing the squared Euclidean distance in geographical coordinates (Eq. 5)
  – Recompute the new cluster centers
  – The two previous steps continue until the convergence criterion is met (the assignment has not changed or the maximum number of iterations is reached, set to be 200 here)
  – The four previous steps are repeated 100 times (for 100 initial seedings, or “replicates”) and the “best solution” (global minimum, that is, the lowest value of the sum of within-cluster distances, summed over all clusters) is the output
• End k-means procedure
• Clusters with the desired number of members are removed from consideration and stored, while the entire clustering procedure is repeated on the smaller data set. The process continues until all the data are grouped in clusters whose member counts lie in (m − 5, m + 5), or until the maximum number of iterations, 400, is reached. The requirement was not met in some subsets, typically clusters peripheral to the data-covered area. These were still included in the further analysis, making the distribution curves in Fig. 4b differ from delta-functions.
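The steps above can be sketched in a few dozen lines. This is a simplified pure-Python illustration, not the modified MATLAB toolbox used in the study: seeding from the data points, the handling of empty clusters, and all function names are assumptions made here, and plain planar coordinates stand in for geographical ones.

```python
import random


def sq_dist(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans_once(points, k, rng, max_iter=200):
    """One batch k-means run from a random seeding; returns (assignment, cost)."""
    centers = rng.sample(points, k)
    assign = None
    for _ in range(max_iter):
        new_assign = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_assign == assign:  # converged: assignment unchanged
            break
        assign = new_assign
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    cost = sum(sq_dist(p, centers[a]) for p, a in zip(points, assign))
    return assign, cost


def kmeans_best(points, k, rng, replicates=100):
    """Repeat the run for several seedings and keep the lowest-cost assignment."""
    runs = [kmeans_once(points, k, rng) for _ in range(replicates)]
    return min(runs, key=lambda run: run[1])[0]


def cluster_fixed_size(points, m, tol=5, max_rounds=400, replicates=100, seed=0):
    """Peel off clusters whose size lies in (m - tol, m + tol), re-cluster the rest.

    Returns (accepted, rejected); rejected holds clusters from the final round
    that never met the size requirement.
    """
    rng = random.Random(seed)
    remaining, accepted, rejected = list(points), [], []
    for _ in range(max_rounds):
        if not remaining:
            break
        k = max(1, round(len(remaining) / m))  # target number of clusters
        assign = kmeans_best(remaining, k, rng, replicates)
        groups = [[p for p, a in zip(remaining, assign) if a == j]
                  for j in range(k)]
        groups = [g for g in groups if g]
        accepted += [g for g in groups if abs(len(g) - m) < tol]
        rejected = [g for g in groups if abs(len(g) - m) >= tol]
        remaining = [p for g in rejected for p in g]
    return accepted, rejected
```

Keeping the rejected clusters in the output mirrors the choice described above of retaining peripheral clusters that miss the size requirement. The nested repetition (rounds times replicates times iterations) also makes clear why the full procedure is computationally expensive.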
The large number of iterations and the requirement of
uniform splitting of the data make the analysis computationally intensive. For that reason, we do not perform
a check for a “local minimum” (in terms of Eq. 5) by a
series of reassignments of the points between clusters.
Nevertheless, we found that repeated runs of the entire
procedure described above led merely to a slightly
different arrangement of clusters, while the reported
results from the Z-test (Fig. 5) changed only within
±2%.
The running time of the entire procedure was ca. 6 h
on an x86_64 GNU/Linux machine with 32 GB RAM.
References
Bauer S, Swenson MS, Griffa A, Mariano AJ, Owens K (1998)
Eddy mean flow decomposition and eddy diffusivity estimates in the tropical Pacific Ocean. J Geophys Res
103(C13):30855–30871
Bauer S, Swenson MS, Griffa A (2002) Eddy mean flow
decomposition and eddy diffusivity estimates in the tropical
Pacific Ocean: 2. Results. J Geophys Res 107(C10):3154
Brink KH, Beardsley RC, Paduan J, Limeburner R, Caruso M,
Sires JG (2000) A view of the 1993–1994 California Current
based on surface drifters, floats, and remotely sensed data.
J Geophys Res 105(C4):8575–8604
Colin de Verdiere A (1983) Lagrangian eddy statistics from surface drifters in the eastern North Atlantic. J Mar Res 41:
375–398
Davis RE (1991) Observing the general circulation with floats.
Deep-Sea Res Suppl 38:S531–S571
Davis RE (1998) Preliminary results from directly measuring
mid-depth circulation in the Tropical and South Pacific.
J Geophys Res 103:24619–24639
Falco P, Griffa A, Poulain P-M, Zambianchi E (2000) Transport
properties in the Adriatic Sea as deduced from drifter data.
J Phys Oceanogr 30:2055–2071
Fratantoni DM (2001) North Atlantic surface circulation during
the 1990’s observed with satellite-tracked drifters. J Geophys Res 106(C10):22067–22093
Garraffo Z, Griffa A, Mariano AJ, Chassignet EP (2001)
Lagrangian data in a high-resolution numerical simulation
of the North Atlantic II. On the pseudo-Eulerian averaging
of Lagrangian data. J Mar Syst 29:177–200
Griffa A (1996) Applications of stochastic particle models to
oceanographical problems. In: Adler R, Muller P, Rozovskii
B (eds) Stochastic modelling in physical oceanography.
Birkhauser, Boston, pp 114–140
Jakobsen PK, Ribergaard MH, Quadfasel D, Schmith T, Hughes
CW (2003) Near-surface circulation in the northern North
Atlantic as inferred from Lagrangian drifters: variability
from the mesoscale to interannual. J Geophys Res 108(C5):
3251
Kanungo T, Mount DM, Netanyahu NS, Piatko CD, Silverman R, Wu AY (2002) An efficient k-means clustering
algorithm: analysis and implementation. IEEE Trans Pattern Anal Mach Intell 24(7):881–892
Koszalka I, LaCasce JH, Orvik KA (2009) Relative dispersion in
the Nordic Seas. J Mar Res 67:411–433
LaCasce J (2005) Statistics of low frequency currents over the
western Norwegian shelf and slope I: current meters. Ocean
Model 55:213–221
LaCasce J (2008) Statistics from Lagrangian observations. Prog
Oceanogr 77(1):1–29
LaCasce J, Engedahl H (2005) Statistics of low frequency currents over the western Norwegian shelf and slope II: model.
Ocean Model 55:222–237
LaCasce JH (2000) Floats and f/H. J Mar Res 58:61–95
Lloyd SP (1982) Least squares quantization in PCM. IEEE Trans
Inf Theory 28(2):129–137
Lumpkin R (2003) Decomposition of surface drifter observations in the Atlantic Ocean. Geophys Res Lett 30(14):
1753
Lumpkin R, Flament P (2001) Lagrangian statistics in the central
North Pacific. J Mar Syst 29:141–155
Lumpkin R, Garraffo Z (2005) Evaluating the decomposition
of Tropical Atlantic drifter observations. J Phys Oceanogr
22:1403–1415
Lumpkin R, Treguier A-M, Speer K (2002) Lagrangian eddy
scales in the Northern Atlantic Ocean. J Phys Oceanogr
32:2425–2440
MacKay DJC (2003) Information theory, inference, and learning
algorithms. Cambridge University Press, Cambridge
Mariano A, Ryan E (2007) Lagrangian analysis and prediction of coastal and ocean dynamics (LAPCOD review). In:
Griffa A, Kirwan AD, Mariano AJ, Ozgokmen T, Rossby
T (eds) Lagrangian analysis and prediction of coastal and
ocean dynamics, Chapter 13. Cambridge University Press,
Cambridge, pp 423–467
Orvik KA, Niiler P (2002) Major pathways of Atlantic Water in
the northern North Atlantic and Nordic Seas towards Arctic.
Geophys Res Lett 29(19):1896
Owens WB (1991) A statistical description of the mean circulation and eddy variability in the northwestern North Atlantic
using SOFAR floats. Prog Oceanogr 28:257–303
Poulain P-M (2001) Adriatic Sea surface circulation as derived
from drifter data between 1990 and 1999. J Mar Syst 29:3–32
Poulain P-M, Warn-Varnas A, Niiler PP (1996) Near-surface
circulation of the Nordic Seas as measured by Lagrangian
drifters. J Geophys Res 101:18237–18258
Rossby HT, Riser SC, Mariano AJ (1983) The western North
Atlantic—a Lagrangian viewpoint. In: Robinson AR (ed)
Eddies in marine science. Springer, Heidelberg, pp 66–91
Rupolo V (2007) Observing turbulence regimes and Lagrangian
dispersal properties in the oceans. In: Griffa A, Kirwan
AD, Mariano AJ, Ozgokmen T, Rossby T (eds) Lagrangian
analysis and prediction of coastal and ocean dynamics, Chapter 9. Cambridge University Press, Cambridge, pp 231–
274
Saetre R (1999) Features of the central Norwegian shelf circulation. Cont Shelf Res 19:1809–1831
Sallee JB, Speer K, Morrow R, Lumpkin R (2008) An estimate of
Lagrangian eddy statistics and diffusion in the mixed layer of
the Southern Ocean. J Mar Res 66:441–463
Skagseth Ø, Orvik KA (2002) Identifying fluctuations in the
Norwegian Atlantic Slope Current by means of empirical
orthogonal functions. Cont Shelf Res 22:547–563
Swenson MS, Niiler PP (1996) Statistical analysis of the surface circulation of the California Current. J Geophys Res
101(C10):22631–22645
Taylor GI (1921) Diffusion by continuous movements. Proc Lond
Math Soc 20:196–212
Thompson A, Heywood KJ, Thorpe SE, Renner AH, Trasvina
A (2009) Surface circulation at the tip of the Antarctic
Peninsula from drifters. J Phys Oceanogr 39:3–25
Veneziani M, Griffa A, Reynolds AM, Mariano AJ (2004)
Oceanic turbulence and stochastic models from subsurface
Lagrangian data for the Northwest Atlantic Ocean. J Phys
Oceanogr 34:1884–1906
Zhurbas V, Oh IS (2003) Lateral diffusivity and Lagrangian
scales in the Pacific Ocean as derived from drifter data.
J Geophys Res 108(C5):3141