Predicting the most likely state for a basic
geophysical flow: theoretical framework.
Expository paper submitted in
partial fulfillment for the degree of
Master of Science, Mathematics
by Rodrigo Duran¹
Major Advisor: Dr. Yevgeniy Kovchegov
Committee: Dr. Nathan L. Gibson and Dr. Radu Dascaliuc
Department of Mathematics
Oregon State University
Kidder Hall 368
Corvallis, OR 97331-5503 USA
Last revision:
January 9, 2015
1 rduran@coas.oregonstate.edu
Preface
During my PhD research (physical oceanography), I developed an interest in statistical mechanics applications to geophysical fluid dynamics. This interest motivated the topic for my expository paper as the final requisite to my master's in math. The underlying objective was writing a paper with which anyone with enough background may be introduced to the topic without necessarily knowing either fluid dynamics or statistical mechanics. To this effect I have added a number of appendix sections as well as references throughout the text. At the very least, I deem this was a great learning experience.
It is clear from the concept of an expository paper (also called a review paper) that no original ideas are to be credited to the author, but just to make sure: I claim no original work in this paper. My job has been to assimilate and put together the information needed to achieve the stated objective. This information has been mainly found in different journal articles, textbooks and talks with specialists. A number of proofs and explanations of details or parts thereof were not available in the references and have been worked out by the author under advisement of Dr. Yevgeniy Kovchegov, Dr. Radu Dascaliuc and Dr. Nathan Gibson.
I wish to thank my mathematics advisor, Dr. Yevgeniy Kovchegov, who has been an extraordinary professor both inside and outside classrooms. I have benefited much from his rigorous proofs as well as from his generous sharing of insights and meanings (the greater picture, as he calls it), all while under a constantly positive and constructive ambiance.¹ For all of this and more, I am indebted. I also thank my committee members, Dr. Radu Dascaliuc and Dr. Nathan Gibson from the Mathematics department, for their insightful help and comments that made this a better paper, as well as for the voluntary committee duty that they took on from their busy schedules for my benefit. They have been very generous in their availability and explanations, which directly resulted in my academic benefit. I also wish to thank my physical oceanography PhD advisor, Dr. Roger M. Samelson, for his moral support while completing my master's. I wish to acknowledge his rigorous, very positive and fruitful academic influence on my development as a graduate student.
I have undoubtedly improved much thanks to my advisors' and committee members' tutelage.
1 "Constantly positive and constructive ambiance" should be understood in a literal sense.
Contents
1 Introduction
2 Foundations for Geophysical Fluid Dynamics
    2.1 Conservation of Mass
    2.2 Boussinesq approximation
    2.3 Conservation of Energy
    2.4 Navier Stokes equations
    2.5 Euler equations
    2.6 Buoyancy Frequency
    2.7 Vorticity
        2.7.1 Potential vorticity
        2.7.2 Baroclinic vs barotropic fluid
    2.8 Shallow water equations
    2.9 Conservation of potential vorticity in shallow water equations
    2.10 A streamfunction for two-dimensional flow
        2.10.1 Solution to Poisson's equation
    2.11 Rossby number
    2.12 Quasi-geostrophy
    2.13 Barotropic QGE
    2.14 A basic solution to the barotropic QGE
    2.15 Non-linear stability of the barotropic QGE
3 Foundations of Statistical mechanics
    3.1 Mixing leads to ergodicity
    3.2 Statistical mechanics' main ingredient: Liouville property
    3.3 Conserved quantities and their ensemble averages
    3.4 Shannon's Entropy
    3.5 Casimir's conservation laws
4 Statistical theory for a simple Geophysical Flow
    4.1 Galerkin approximation
    4.2 Conserved quantities
    4.3 Non-linear stability of the exact solution to the truncated system
    4.4 Liouville property of the truncated system
    4.5 Statistical predictions of the truncated system
    4.6 The limit Λ → ∞
        4.6.1 Choosing µ = µΛ
        4.6.2 The limit of the mean states
        4.6.3 Choosing α = αΛ and the enstrophy constraint
        4.6.4 The energy constraint
5 References
A Existence and Uniqueness to Poisson's equation
    A.1 Green's identities
    A.2 Maximum-Minimum principle
    A.3 Existence and Uniqueness for Dirichlet and Robin Conditions in a bounded domain
    A.4 Existence and Uniqueness for Neumann Conditions
    A.5 Unbounded domains
B Dominated Convergence Theorem
    B.1 Bounded Convergence Theorem
C Ensemble average
D Flow map of a differential equation
E Entropy of a Measurable Partition
    E.1 Uniqueness of Shannon's entropy
    E.2 Motivation
F Lagrange multiplier
G Orthogonal projections
1 Introduction.
Statistical mechanics studies the probability that a system is in a certain state given one or more constraints, which are usually fixed conserved quantities. It is a particularly useful and powerful approach for problems with a large number of degrees of freedom, where complete knowledge of the system is not practical or even possible. By reducing the complexity of the system to a few parameters, statistical mechanics avoids the question 'what is the state of the system?' and asks instead 'what is the most likely state of the system given some known constraints?'.
Holloway (1986) has a review of successful applications of statistical mechanics to a variety of geophysical fluid dynamics (GFD) problems, including geostrophic turbulence over topography, two-dimensional turbulence on a plane and on a sphere, closed-basin circulation and Western intensification, the shape of a thermocline, baroclinic turbulence, eddy heat transport, predictability (i.e. sensitivity of flow evolution to perturbations in the initial conditions), stirring of tracer fields, internal gravity waves and buoyant turbulence, among others.
More recently, statistical mechanics has been successfully used to understand aspects of large-scale GFD. For example, the Robert-Sommeria-Miller (RSM) equilibrium statistical mechanics has been used to interpret rings and jets as statistical equilibria (Bouchet & Venaille 2012).
Statistical mechanics has also been successfully applied to numerical GFD, a subject of great interest for humanity not only for purely scientific reasons, but also for its large number of real-life applications. A sharp increase in the ability to numerically simulate oceans and atmospheres, as well as in the interest in projecting current states into the future, has fueled important developments in numerical GFD.
The rest of this paper is organized as follows. Section 2 introduces the equations of geophysical fluid dynamics and some simplifications; some details of the simplified equations, like the non-linear stability of the steady state solution, are also presented. Section 3 develops the theory of statistical mechanics. We obtain a probability function with which we can predict the most likely state of the system describing a generalized flow. In section 4 we predict the most likely state of a basic geophysical flow. The first step applies the theory of section 3 to a Galerkin approximation of the simplified equations from section 2. The final step extends the result to a continuum by taking the limit to infinity of the truncated system. An appendix includes material for some of the concepts we use.
2 Foundations for Geophysical Fluid Dynamics
In this section we introduce the equations used in GFD, from which the simplified models used to make different problems tractable are derived. The main objective is to introduce the physical principles at play as well as their mathematical expressions. Further information and rigorous derivations can be found in the references given throughout this section. We start by defining what kind of fluid dynamics is considered to be GFD and making some comments on the domain. The different sections are based on the classical GFD book by Pedlosky (1987) as well as on Cushman-Roisin & Beckers (2011), Kundu & Cohen (2008), Majda (2003) and others mentioned in the text.
Geophysical fluid dynamics is the study of fluid dynamics on a rotating sphere (our planet is close enough to be represented as a sphere). Rotation is relevant to the dynamics of fluid flow when a couple of inequalities constraining the time, length and velocity scales, T, L and U respectively, are met. Let Ω be the ambient rotation frequency defined as:
$$ \Omega = \frac{2\pi \ \text{radians}}{\text{time of one revolution}} $$
The criterion for geophysical fluid dynamics is that the time scale of the motion satisfies:
$$ T \geq \frac{1}{\Omega} $$
but to account for the fact that the time scale is related to the length scale through the velocity of fluid motion, the following inequality is usually used instead:
$$ \frac{U}{L} \leq \Omega \tag{1} $$
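As a rough numerical illustration of criterion (1), the following sketch compares U/L with Ω for two sets of scales. The sample values of U and L are illustrative assumptions that do not come from the text, and Ω is computed assuming one revolution per sidereal day.

```python
import math

# Ambient rotation rate: 2*pi radians per revolution.
# Assumption: one revolution takes one sidereal day (about 86164 s).
SIDEREAL_DAY_S = 86164.0
OMEGA = 2.0 * math.pi / SIDEREAL_DAY_S  # rad/s


def rotation_matters(U, L, omega=OMEGA):
    """Return True when U / L <= omega, i.e. when inequality (1) holds."""
    return U / L <= omega


# Hypothetical scales, assumed for illustration only.
ocean_current = rotation_matters(U=0.5, L=100e3)  # 0.5 m/s over 100 km
bathtub_vortex = rotation_matters(U=0.1, L=0.5)   # 0.1 m/s over 0.5 m

print(f"Omega = {OMEGA:.3e} rad/s")
print("ocean current satisfies (1):", ocean_current)
print("bathtub vortex satisfies (1):", bathtub_vortex)
```

Under these assumed scales the large, slow motion satisfies (1) while the small, fast one does not, which is the sense in which rotation is "relevant" for the former only.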
A natural coordinate system for the Earth is spherical coordinates. However, for many useful GFD applications it is convenient that the sphere may be locally represented by a tangent plane, such that near a fixed latitude φ0 the horizontal components of spherical coordinates (λ, φ) may be replaced by Cartesian coordinates (x1, x2) = Re (cos φ0 λ, φ − φ0), where Re is the radius of the Earth. This approximation is no longer valid when a typical horizontal length scale for the motion we are studying approaches the radius of the Earth (L ≈ Re).
The coordinate system, tangent to Earth, is rotating with an angular velocity Ω, which introduces the Coriolis parameter (or planetary vorticity). Locally, rotation is about the vertical axis and thus we define² f ≡ 2Ω · k̂ = 2Ω sin φ, which can be represented locally in Cartesian coordinates (again in a tangent plane approximation) as
$$ f \approx f_0 + \beta x_2 \tag{2} $$
where x2 = Re (φ − φ0), f0 = 2Ω sin φ0 and
$$ \beta \equiv \frac{1}{R_e} \left. \frac{df}{d\phi} \right|_{\phi_0} = \frac{2\Omega \cos\phi_0}{R_e} $$
These local Cartesian coordinates are known as the β-plane approximation. Typical values at mid-latitudes are f0 = 10⁻⁴ s⁻¹ and β = 2 × 10⁻¹¹ m⁻¹ s⁻¹ (Samelson 2011). For a complete derivation of the Coriolis term due to a rotating framework the interested reader is referred to chapter 2 of Cushman-Roisin & Beckers (2011) and references therein.
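The order of magnitude of the typical mid-latitude values can be checked directly from the definitions of f0 and β. The sketch below does this for a latitude of 45 degrees; the numerical values of Ω and Re, the choice of latitude, and the function names are assumptions made for illustration.

```python
import math

OMEGA = 7.2921e-5   # rad/s, Earth's rotation rate (assumed value)
R_EARTH = 6.371e6   # m, mean radius of the Earth (assumed value)


def beta_plane(lat_deg, omega=OMEGA, radius=R_EARTH):
    """Return (f0, beta) for a beta-plane centred at latitude lat_deg."""
    phi0 = math.radians(lat_deg)
    f0 = 2.0 * omega * math.sin(phi0)              # f0 = 2 Omega sin(phi0)
    beta = 2.0 * omega * math.cos(phi0) / radius   # beta = 2 Omega cos(phi0) / Re
    return f0, beta


def coriolis(x2, f0, beta):
    """Linearised Coriolis parameter f ~ f0 + beta * x2, as in (2)."""
    return f0 + beta * x2


f0, beta = beta_plane(45.0)
print(f"f0   = {f0:.3e} 1/s")
print(f"beta = {beta:.3e} 1/(m s)")
print(f"f 100 km north of the reference latitude = {coriolis(100e3, f0, beta):.3e} 1/s")
```

Both results are of the same order as the typical values quoted above.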
Fluid dynamics can be studied from an Eulerian perspective that treats the fluid as a field in which velocity and density are to be determined as a function of space and time, or from a Lagrangian perspective where the fluid is a continuous field of particles and their trajectories, velocity and mass densities are to be determined at each point in their trajectory as a function of the particles' identity (often its initial position). Although we shall work using the Eulerian perspective in this paper, we will allude to quantities that are conserved following the movement of a weightless (Lagrangian) particle passively moving with the fluid (being advected). A quantity that is conserved following the motion is said to be materially conserved. For further reading on the two perspectives of fluid flow the interested reader is referred to Price (2006) and Bennet (2006).
2 The symbol ≡ is used throughout to define, so that a ≡ b means a is defined as expression b.
2.1 Conservation of Mass
An important equation used in GFD is the differential form of conservation of mass when there are no sources or sinks:
$$ \frac{\partial \rho}{\partial t} + \nabla \cdot (\rho \mathbf{u}) = 0 \tag{3} $$
The first term on the left hand side accounts for the local change of density with respect to time; it is balanced by the divergence of the mass flux. For an incompressible or nearly incompressible fluid (like air under regular atmospheric conditions, or even more so like water in the ocean) the contribution due to the change of density with respect to time is negligible, and (3) can be written:
$$ \nabla \cdot \mathbf{u} = 0 \tag{4} $$
This does not necessarily imply that we are assuming ρ is a constant; rather, ∂ρ/∂t is given (as we will see) by (5), and it only vanishes identically when conductive or internal heating³ can be neglected, in other words when motion is adiabatic. The validity of (4) must be examined in each situation by systematic scaling arguments (Pedlosky 1986).
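The step from (3) to (4) can be made slightly more explicit by expanding the divergence with the product rule. The following is a standard identity, written with the material derivative D/Dt ≡ ∂/∂t + u · ∇ that is defined in (7); it is offered as a sketch of the reasoning rather than as the scaling argument itself.

```latex
\begin{aligned}
\frac{\partial \rho}{\partial t} + \nabla\cdot(\rho\,\mathbf{u})
  &= \frac{\partial \rho}{\partial t} + \mathbf{u}\cdot\nabla\rho + \rho\,\nabla\cdot\mathbf{u} \\
  &= \frac{D\rho}{Dt} + \rho\,\nabla\cdot\mathbf{u} = 0
  \quad\Longrightarrow\quad
  \nabla\cdot\mathbf{u} = -\frac{1}{\rho}\frac{D\rho}{Dt}
\end{aligned}
```

Read this way, (4) amounts to saying that the relative rate of change of density following a fluid parcel is negligible compared with the individual velocity gradients that make up ∇ · u.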
2.2 Boussinesq approximation
The basis for the Boussinesq approximation is noting that changes in density due to water pressure (compression), haline contraction and/or thermal expansion are very small compared to the mean value of water density (which is O(1000 kg/m³); Vallis 2006). Consequently we can define density as
$$ \tilde{\rho} \equiv \rho_0 + \bar{\rho}(x_3) + \rho'(\mathbf{x}, t) = \rho_0 + \rho(\mathbf{x}, t) $$
where ρ0 is a background or mean density, ρ̄(x3) is a background stratification, ρ′ is a density perturbation and
$$ |\bar{\rho}|, \ |\rho'|, \ |\rho| \ll \rho_0 $$
The Boussinesq approximation, which applies to a vast number of natural phenomena, is a commonly used series of simplifications under which (3) reduces to (4). It can be briefly explained as an approximation where variations in density are neglected except where multiplied by gravity. More details can be found in Vallis (2006) or Kundu & Cohen (2008). We will be using the Boussinesq approximation throughout this paper.
2.3 Conservation of Energy
Using the notation of the previous section, we present another equation needed to model fluid motion, namely conservation of energy. In approximate form (Pedlosky 1986, Majda 2003) it can be written as:
$$ \frac{\partial \rho'}{\partial t} + \mathbf{u}_H \cdot \nabla_H \rho' + u_3 \left( \frac{\partial \rho'}{\partial x_3} + \frac{d\bar{\rho}}{dx_3} \right) = \kappa \Delta_H \rho' + \kappa \frac{d^2 \bar{\rho}}{dx_3^2} \tag{5} $$
Here a vector or operator with subscript H includes the first two dimensions only, κ is the coefficient of thermal diffusivity and ∆ ≡ ∇ · ∇ represents the Laplace operator. It is important to notice that although they may seem similar, equations (3) and (5) describe two completely different physical principles.
2.4 Navier Stokes equations
We introduce the equations of motion for a fluid continuum. A suitable domain for these equations is D × [0, ∞), where D ⊂ Rⁿ, n ∈ {2, 3} (either two- or three-dimensional flow) and time t ∈ [0, ∞).
For a physical space rotating at a constant rate with angular velocity Ω (e.g. Earth) the Navier-Stokes equations of motion describing Newton's second law are:
$$ \rho_0 \left( \frac{D\mathbf{u}}{Dt} + 2\boldsymbol{\Omega} \times \mathbf{u} \right) = -\nabla p + \rho \nabla \phi + F(\mathbf{u}) \tag{6} $$
The left hand side of (6) is the background density ρ0 (mass per unit volume) multiplying:
1. the material (or total) derivative of the n-dimensional velocity field u = u (x, t), x ∈ Rⁿ.
2. the Coriolis acceleration term 2Ω × u that accounts for the rotation of the coordinate system.

3 The term for internal heating is not included in (5).
The material derivative is a linear operator given by
$$ \frac{Df}{Dt} \equiv \frac{\partial f}{\partial t} + \mathbf{u} \cdot \nabla f \tag{7} $$
It is the time derivative of some f = f (x(t), t) while following the motion of f, which is passively transported with the fluid's motion. The second term of the total derivative, known as the advective term, comes from the chain rule, due to the position vector (at each time) being given by a fluid element's path (trajectory) x = x (t). Thus we use the chain rule and equate dxi/dt = ui to obtain the expression for the material derivative. A comment on the notation: while f above is a scalar-valued function, in (6) we are applying the material derivative to a vector u. We will limit ourselves to suggesting how it can be thought of: for the i-th component of the vector equation, we are applying the material derivative to the i-th component of u, which indeed is a scalar. For a detailed clarification of the notation involving tensors, we refer the interested reader to the paper by DeCaria & Sikora (2010). When f = ui as in (6) it is momentum that is being transported or advected.
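A small numerical experiment may help to separate the two terms in (7). The sketch below approximates Df/Dt with central differences; the uniform current, the two scalar patterns and all function names are hypothetical choices made for illustration.

```python
import math


def material_derivative(f, u, x, t, h=1e-6):
    """Approximate Df/Dt = df/dt + u . grad(f) with central differences.

    f(x, t) is a scalar field, u(x, t) returns the velocity components,
    x is a list of coordinates and t is the time.
    """
    local_change = (f(x, t + h) - f(x, t - h)) / (2.0 * h)
    advection = 0.0
    for i, ui in enumerate(u(x, t)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        advection += ui * (f(xp, t) - f(xm, t)) / (2.0 * h)
    return local_change + advection


U0 = 0.3  # hypothetical uniform current along x1


def uniform_flow(x, t):
    return (U0, 0.0)


def carried_pattern(x, t):
    # Pattern that translates with the current, so it is materially conserved.
    return math.sin(x[0] - U0 * t)


def fixed_pattern(x, t):
    # Steady pattern: no local change, but the fluid moves through it.
    return math.sin(x[0])


point, time = [1.0, 2.0], 0.5
carried = material_derivative(carried_pattern, uniform_flow, point, time)
fixed = material_derivative(fixed_pattern, uniform_flow, point, time)
print(carried, fixed)
```

For the carried pattern the local and advective terms cancel, so the quantity is materially conserved even though it changes at a fixed point; for the steady pattern the whole material derivative comes from the advective term.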
Notice that the advective term makes these equations non-linear. An interesting (well known) observation made by Pedlosky (1987) is that in physical terms, the non-linearity usually implies interactions between motions of different length scales. Thus when numerically integrating (6) on a grid that does not resolve certain length scales (sub-grid motion), interaction between sub-grid motion and resolved motion will not be explicitly accounted for. Likewise with any analytical treatment that includes the advection of momentum, we can expect motions with different length scales to interact and cause the solution to change. This needs to be taken into account if the flow is projected onto a finite number of modes, each with a characteristic length scale.
On the right hand side of (6) the first term is the pressure gradient force. The second accounts for any and all body forces, represented by a potential φ multiplying the fluid perturbation density ρ = ρ (x, t). Forces like gravity are represented through this term by defining φ̃ = −gx3, where g represents the Newtonian gravitational acceleration. When the physical space is rotating it is common to combine the effect of gravity and that of the centrifugal force due to rotation into φ; in this case g is the effective gravity of the system. This is possible because the potential φ can be written as the sum of the gravitational and centrifugal potentials. An equipotential surface is then the surface perpendicular to k̂ = (0, 0, 1).
F represents any nonconservative force. Most importantly it represents the frictional forces, which are ultimately responsible for dissipation of kinetic energy, draining momentum (that would otherwise be conserved) into molecular disorder (i.e. heat). Frictional forces prevent geophysical flows from accelerating to infinity in the presence of a persistent external forcing (like solar heating). In large-scale flows however (here large is yet to be made unambiguous), molecular viscosity is often negligible in the force balance (especially away from boundaries). It is important to mention that friction at the ocean surface (boundary conditions) is often used as a forcing mechanism through which wind transmits momentum to the ocean.
The exact expression for F will vary depending on the properties of the fluid and is given by what is known as a constitutive equation, i.e. the relation between stress and deformation in a continuum. A fluid at rest only feels the influence of a normal stress which is isotropic (i.e. ambient pressure). A moving fluid feels additional components of stress due to viscosity. This last part, which is exclusively due to fluid motion, is a nonisotropic tensor called the deviatoric stress tensor, σ ≡ σij, and it depends on the velocity gradient tensor ∂ui/∂xj. Like any second-order tensor, the velocity gradient can be written as the sum of an antisymmetric and a symmetric part. The antisymmetric part, given by ½(∂ui/∂xj − ∂uj/∂xi), represents fluid rotation without deformation and does not, by itself, generate stress. For Newtonian fluids like water or air, the assumption (which works remarkably well) is that the deviatoric stress tensor is a linear function of the symmetric part. The symmetric part is known as the strain rate tensor eij and it is given by:
$$ e_{ij} = \frac{1}{2} \left( \frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i} \right) $$
The assumed linear relationship between the nonisotropic stress and the strain rate tensor can be written as:
$$ \sigma_{ij} = \sum_{p=1}^{n} \sum_{q=1}^{n} K_{ijpq} \, e_{pq} $$
F is then given by:
$$ F_i = \sum_{j=1}^{n} \frac{\partial \sigma_{ij}}{\partial x_j} $$
For an incompressible fluid, we can use (4) (i.e. ∑_{i=1}^{n} ∂ui/∂xi = 0) to simplify this last expression, and F can be written as:
$$ F_i = \mu \sum_{j=1}^{n} \frac{\partial^2 u_i}{\partial x_j^2} $$
Here µ is the only constant that survives after assuming that Kijpq is an isotropic tensor and exploiting the consequential symmetry (see e.g. Kundu & Cohen page 101). Also, to be able to exchange the order of differentiation, we have assumed u ∈ C²(D). Thus we write
$$ F(\mathbf{u}) = \mu \Delta \mathbf{u} $$
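For the reader who wants the intermediate step, here is a sketch of how the Laplacian arises. It assumes that the isotropy argument has already reduced the constitutive relation to the standard incompressible Newtonian form σij = 2µ eij (that reduction is the part delegated to Kundu & Cohen) and that µ is constant.

```latex
\begin{aligned}
F_i = \sum_{j=1}^{n} \frac{\partial \sigma_{ij}}{\partial x_j}
  &= \mu \sum_{j=1}^{n} \frac{\partial}{\partial x_j}
     \left( \frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i} \right) \\
  &= \mu \sum_{j=1}^{n} \frac{\partial^2 u_i}{\partial x_j^2}
   + \mu \, \frac{\partial}{\partial x_i}
     \left( \sum_{j=1}^{n} \frac{\partial u_j}{\partial x_j} \right)
   = \mu \, \Delta u_i
\end{aligned}
```

The second term vanishes by (4). Moving ∂/∂xi outside the sum is the exchange of the order of differentiation for which u ∈ C²(D) is needed.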
The ad hoc assumptions we have mentioned turn out to be a very accurate approximation for geophysical flows. For a physical intuition on the meaning of such viscous forces the interested reader is recommended section 3.4.4 of Price (2006).
Although the form above is by far the most common dissipation operator, other forms of F are also used in GFD. Some examples can be found in section 1.1 of Majda and Wang (2006). We will denote a general dissipation operator by F.
2.5 Euler equations
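With F = 0 and ∇φ = g, where g = (0, 0, g) is the gravitational acceleration, equation (6) becomes a special case known as the Euler equations (in a rotating frame), which for future reference we will write as:
$$ \frac{\partial \mathbf{u}}{\partial t} + \mathbf{u} \cdot \nabla \mathbf{u} + 2\boldsymbol{\Omega} \times \mathbf{u} = -\frac{1}{\rho_0} \nabla p + \frac{\rho}{\rho_0} \mathbf{g} \tag{8} $$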
2.6 Buoyancy Frequency
If we take dρ̄/dx3 in (5) to be a constant, and consider:
$$ \mathbf{u} = (0, 0, u_3(x_1, x_2, t)), \quad p = 0, \quad \rho' = \rho'(x_1, x_2, t) $$
as an elementary solution for equations (4), (5) and (6), we will find (Majda 2003) that both u3 and ρ′ satisfy the equation of a simple harmonic oscillator (for the case dρ̄/dx3 < 0) with frequency N, where:
$$ N^2 \equiv -\frac{g}{\rho_0} \frac{d\bar{\rho}}{dx_3} \tag{9} $$
N is the buoyancy frequency of the background stratification and has units of 1/time. A stable stratification happens when dρ̄/dx3 < 0, which implies that lighter fluid is above heavier fluid. If heavier fluid is above lighter fluid (dρ̄/dx3 > 0) the solution is unstable, and this is manifested in its most basic form as a solution with exponential growth (no longer a simple harmonic oscillator). Unstable solutions do not last long in the physical world.
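To get a feeling for the magnitudes involved in (9), the sketch below evaluates N for a stable and an unstable background gradient. The values of g, ρ0 and the density gradients are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

G = 9.81       # m/s^2, gravitational acceleration (assumed value)
RHO0 = 1025.0  # kg/m^3, background density (assumed value)


def buoyancy_frequency_squared(drho_dx3, rho0=RHO0, g=G):
    """N^2 = -(g / rho0) * d(rho_bar)/dx3, as in (9)."""
    return -(g / rho0) * drho_dx3


def describe(drho_dx3):
    """Classify the stratification and return (label, rate in 1/s)."""
    n2 = buoyancy_frequency_squared(drho_dx3)
    if n2 > 0:
        return "stable: oscillation with frequency N", math.sqrt(n2)
    if n2 < 0:
        return "unstable: exponential growth at rate sqrt(-N^2)", math.sqrt(-n2)
    return "neutral", 0.0


# Hypothetical gradients in kg/m^4, with x3 pointing upward.
stable_label, N = describe(-0.01)          # density decreases upward
unstable_label, growth = describe(+0.01)   # density increases upward

print(stable_label, f"N = {N:.3e} 1/s, period = {2 * math.pi / N / 60:.1f} min")
print(unstable_label, f"rate = {growth:.3e} 1/s")
```

With the sign of the gradient reversed, the same magnitude that set the oscillation frequency becomes the growth rate of the exponential solution.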
2.7 Vorticity
The curl of the velocity field
$$ \boldsymbol{\omega} \equiv \nabla \times \mathbf{u} \tag{10} $$
is a measure of the angular momentum of a small spherical particle of fluid about its center of mass. It is thus used as the basis upon which a conservation law may be sought in hopes of constraining fluid motion. Such a constraint facilitates the description and understanding of GFD.
2.7.1 Potential vorticity
The dynamics of geophysical flows can be studied through the equation of a scalar quantity derived from the principle of conservation of angular momentum. In particular, the equation of a scalar quantity derived from the vorticity is a prognostic equation from which we may know the evolution of all other variables for a prominent class of motion (e.g. geostrophy and quasi-geostrophy) which is contained in (8). This is the essential reason why potential vorticity (yet to be defined) is of central importance to our current understanding of GFD. Similar vorticity laws play fundamental roles in other types of fluid dynamics for the same reasons. We will come back to finding an equation for a scalar quantity derived from the vorticity vector and how this equation relates to fluid flow. An excellent introduction to vorticity can be found at: http://web.mit.edu/hml/ncfmf.html
In GFD the curl of the fluid's velocity (10) is known as the relative vorticity. A physical system like our planet, rotating with angular velocity Ω, causes the fluid on it (like the oceans) to rotate as a solid body (after a spin-up period), with a uniform angular velocity (planetary velocity) up ≡ Ω × r, where r is a position vector for a fluid particle. The planetary vorticity is therefore twice the rate of rotation: ωp ≡ ∇ × up = 2Ω.
Thus in a rotating frame the absolute vorticity is the sum of the relative and planetary vorticity: ωa = ω + ωp = ω + 2Ω.
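The statement that solid-body rotation has vorticity 2Ω is easy to verify numerically. The sketch below takes the curl of Ω × r with central differences; the rotation vector, the evaluation point and the helper names are assumptions made for illustration.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def curl(field, r, h=1e-3):
    """Central-difference curl of a vector field R^3 -> R^3 at the point r."""
    def d(component, axis):
        rp = list(r)
        rm = list(r)
        rp[axis] += h
        rm[axis] -= h
        return (field(rp)[component] - field(rm)[component]) / (2.0 * h)

    return (d(2, 1) - d(1, 2),
            d(0, 2) - d(2, 0),
            d(1, 0) - d(0, 1))


OMEGA_VEC = (0.0, 0.0, 7.2921e-5)  # rad/s, rotation about the x3 axis (assumed value)


def planetary_velocity(r):
    """Solid-body rotation u_p = Omega x r."""
    return cross(OMEGA_VEC, r)


omega_p = curl(planetary_velocity, [1.0e3, -2.0e3, 5.0e2])  # arbitrary point
print(omega_p)
```

Because the velocity of solid-body rotation is linear in r, the result does not depend on the point chosen, which is the sense in which the planetary vorticity is uniform.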
An equation for ωa can be found by taking the curl of the momentum equation (8) (see e.g. Pedlosky 1986 section 2.4). This equation would allow us to know how the vorticity vector evolves in time and space; for an incompressible fluid it has the form:
$$ \frac{D\boldsymbol{\omega}_a}{Dt} = \boldsymbol{\omega}_a \cdot \nabla \mathbf{u} - \boldsymbol{\omega}_a \, \nabla \cdot \mathbf{u} + \frac{\nabla \rho \times \nabla p}{\rho^2} $$
Thus we see that following fluid motion, vorticity can change due to (from left to right on the right hand side) vertical stretching/shrinking and twisting, changes of volume and finally baroclinicity (to be explained). However, as mentioned above, we would like to find a constraint on this evolution in hopes of facilitating our understanding of the flow; the above equation is really nothing more than a different way of writing (8).
One such constraint is the potential vorticity theorem due to Ertel (1942; see also section 2.5 of Pedlosky 1986). The strength of Ertel's theorem is that it gives an equation by which all dependent variables and their evolution are completely determined for a number of important GFD simplifications. This equation is in terms of a scalar named potential vorticity. Roughly, the vorticity is constrained by projecting it on a surface of constant density. Under the assumption of adiabatic motion, density is materially conserved (Dρ/Dt = 0) and projecting the absolute vorticity on a material density isosurface⁴ allows a generalization of Kelvin's theorem that may be applicable to oceans and atmospheres. In particular Ertel's potential vorticity allows the existence of a baroclinic fluid (which is of central importance for many types of realistic simulations; it is defined in the next subsection) while Kelvin's theorem requires the fluid to be barotropic. A general equation for potential vorticity in terms of any conserved quantity λ (i.e. Dλ/Dt = 0), not necessarily density although usually so, and assuming no frictional forces, is given by
$$ \frac{D}{Dt} \left( \frac{\boldsymbol{\omega}_a}{\rho} \cdot \nabla \lambda \right) = \nabla \lambda \cdot \frac{\nabla \rho \times \nabla p}{\rho^3} \tag{11} $$
The term on the right hand side includes the baroclinicity vector, which is introduced in the next section. The quantity Π ≡ (ωa/ρ) · ∇λ is called the potential vorticity.
Remark:
4 A material isosurface is an isosurface that is transported with the flow; it may be deformed but it always remains an isosurface as long as material conservation holds.
2.7.2 Baroclinic vs barotropic fluid
Vorticity may be produced through baroclinicity; the physical principle is a torque induced by a mismatch between the pressure and density isosurfaces (i.e. a mismatch of the centers of gravity that causes rotation). Baroclinicity is measured through the baroclinic vector. A fluid is baroclinic when the baroclinic vector is different from zero
$$ \nabla \rho \times \nabla p \neq 0 $$
and it is considered barotropic when it equals zero. A barotropic fluid is two-dimensional with no variation in the vertical coordinate except possibly the variation found in the pressure field; this happens if and only if density is a function of the pressure: ρ = ρ(p(x, y, z)). Note that density may be everywhere constant as well, in which case the density gradient is zero and the following proof is trivial.
Proposition 2.1 Suppose ∇p, ∇ρ ≠ 0, then
$$ \nabla \rho \times \nabla p = 0 \iff \rho = \rho(p(x, y, z)) $$
Proof (⇐)
Suppose ρ = ρ (p (x, y, z)); then ∇ρ = (∂ρ/∂p) ∇p, which implies ∇ρ × ∇p = (∂ρ/∂p) (∇p × ∇p) = 0.
(⇒)
∇ρ × ∇p = 0 ⇔ |∇ρ × ∇p| = 0 and, since ∇ρ, ∇p ≠ 0, |∇ρ × ∇p| = |∇ρ| |∇p| sin θ = 0 if and only if θ ∈ {0, π}. We are considering the angle between a surface of constant density and a surface of constant pressure, thus the cases θ ∈ {0, π} are equivalent to θ = 0, i.e. the isosurfaces are parallel. This in turn implies that we can write ∇ρ = f (x, y, z) (∇p/|∇p|) for some scalar-valued function f that at each (x, y, z) gives the magnitude of the vector ∇ρ. We can redefine f to include |∇p|⁻¹ (also scalar valued), so we arrive at ∇ρ = f (x, y, z) ∇p.
Write the gradients as the exterior derivative (a 1-form), e.g. in the case of density: ∇ρ = dρ = ∂ρ/∂x dx + ∂ρ/∂y dy + ∂ρ/∂z dz. Thus f = dρ/dp and ∇ρ = (dρ/dp) ∇p; integrating both sides we get that ρ = ρ (p (x, y, z)).
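Proposition 2.1 can be illustrated numerically by evaluating the baroclinic vector for one density field that depends on pressure alone and one that does not. The pressure and density fields below are hypothetical, in arbitrary units, and were chosen only to exercise the identity.

```python
def gradient(scalar, r, h=1e-5):
    """Central-difference gradient of a scalar field at the point r."""
    grad = []
    for axis in range(3):
        rp = list(r)
        rm = list(r)
        rp[axis] += h
        rm[axis] -= h
        grad.append((scalar(rp) - scalar(rm)) / (2.0 * h))
    return grad


def baroclinic_vector(rho, p, r):
    """Return grad(rho) x grad(p) at the point r."""
    a, b = gradient(rho, r), gradient(p, r)
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def pressure(r):
    x, y, z = r
    return 1.0 + 0.2 * x * y - 3.0 * z


def rho_barotropic(r):
    # Density depends on position only through the pressure: rho = rho(p).
    return 1.0 + 0.5 * pressure(r) ** 2


def rho_baroclinic(r):
    # Density isosurfaces are tilted relative to the pressure isosurfaces.
    x, y, z = r
    return 1.0 + 0.1 * x - 0.4 * z


point = [0.7, -1.3, 0.4]
barotropic = baroclinic_vector(rho_barotropic, pressure, point)
baroclinic = baroclinic_vector(rho_baroclinic, pressure, point)
print(barotropic)
print(baroclinic)
```

The first vector vanishes to within rounding error, as the (⇐) direction of the proof predicts, while the second does not.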
2.8 Shallow water equations
A common and very useful approximation to the Euler equations is called the shallow water equations (SWE). The following two sections follow Pedlosky's (1986) derivations of the SWE closely but not exclusively. The results will be used to illustrate a time-proven two-dimensional model that can be used to simulate a variety of geophysical flows.
To describe our domain let the vector g represent the body forces arising from the potential φ in (6); this of course includes (predominantly) gravitational acceleration. Let our space be R³. We will take the direction of g as our vertical axis, with unit vector k̂, and assume that this is indeed the rotation axis of the fluid, which means that the angular velocity simplifies to Ω = Ωk̂. Let us define the Coriolis parameter f ≡ 2Ω which is, under the present circumstances, twice the angular speed. Let the distance from the plane z = 0 to the fluid's free surface be denoted by h = h (x1, x2, t) and let the rigid bottom of our domain be given by the bathymetry (or bottom topography) z = hB (x1, x2). The three-dimensional velocity is given by u = (u1, u2, u3). We will take the density to be homogeneous. Clearly (3) then reduces to (4).⁵
Let D be a typical scale for the depth of the water column; it could be taken to be the mean depth, for example. Let L be a characteristic horizontal length scale for the motion. Then the meaning of "shallow" can be made precise with what Pedlosky calls the "fundamental parametric condition that characterizes shallow-water theory":
$$ \delta = \frac{D}{L} \ll 1 $$
5 The assumption of a constant density could be taken at first consideration as a serious drawback of the model: it will not be able to simulate stratified fluids, nor the stratification of fluids, as we usually encounter in oceans and atmosphere. However, it is possible to develop the theory for SWE under this assumption and then use a model of several, say N, shallow water layers stacked one above the other, each one with its own constant density. This makes it possible to simulate stratified fluids; two-layer models, for example, are often encountered. It is in fact possible to consider "continuously" stratified models by taking the limit N → ∞ and solving the equations by separation, with the vertical structure given through normal modes (see e.g. Shimizu 2011). More advanced mathematical methods allow for density anomalies to develop across the water column for models that include diffusion (Gardiner-Garden 1991).
Perhaps we still need a bit more precision: what exactly does "much less than one" mean? The answer is that
L should be at least 10 times bigger than D. The average depth of our planet's oceans is 4 km, so we should
expect that the SWE could apply to motions of order 40 km and bigger. As Pedlosky points out, the vertical
extent of the major currents in our oceans is usually much less than 4 km, yet their horizontal scale is often
hundreds or thousands of kilometers. Atmospheric motions similarly have a very small aspect ratio δ.
We can now look into what is big and what is small. Let us consider a characteristic magnitude U for
horizontal velocities and W for vertical velocities, and let T be a characteristic scale for time. The first two terms in
(4) are then O(U/L), so W needs to satisfy W/D ≤ O(U/L); that is, W is bounded above by O(δU). Typical
horizontal velocities for the oceans are usually less than 0.5 m/s, and a very energetic current (like the Gulf Stream)
is about 2 m/s, so we conclude that vertical motion is often negligible.
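The scale estimate above can be made concrete with a few lines of arithmetic. The following is a minimal sketch; the values of D, L and U are the illustrative ones quoted in the text (mean ocean depth, the smallest horizontal scale with δ = 0.1, and a typical horizontal speed), and the variable names are assumptions introduced here.

```python
# Illustrative scales taken from the discussion above (not measured data).
D = 4.0e3    # depth scale in metres (mean ocean depth)
L = 40.0e3   # horizontal length scale in metres
U = 0.5      # horizontal velocity scale in m/s

delta = D / L          # aspect ratio of the motion
W_bound = delta * U    # order-of-magnitude upper bound for the vertical velocity W

print(f"delta = {delta:.2f}; W is at most of order {W_bound:.3f} m/s")
```

For motions with larger horizontal scales δ shrinks further, and so does the bound on W.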
Having mentioned the above main assumptions, the interested reader is referred to section 3.3 of Pedlosky
(1986) for a formal dimensional analysis of equation (6) that leads to the shallow water equations.
Equivalently,
$$\frac{\partial p}{\partial x_3} = -\rho g + O\left(\delta^2\right) \tag{12}$$
also known as the hydrostatic approximation, can be taken as the definition of a shallow-water model. Integrating
(12) (which is the vertical equation of motion) from an arbitrary depth x3 to the ocean's surface x3 = h̃ and assuming
that pressure at the surface is constant for all time, i.e. p(x1, x2, h̃, t) = p0, we get that the pressure at any
given point is equal to the weight of the water column above that point and at that time:
$$p(x_1, x_2, x_3, t) = \rho g\left(\tilde{h} - x_3\right) + p_0 \tag{13}$$
where h̃ = h̃ (x1 , x2 , t) is the height of the ocean's surface above the x3 = 0 plane.
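As a quick numerical illustration of (13), the sketch below evaluates the hydrostatic pressure at a given height under a flat free surface. The function name and the default values for density, gravity and surface pressure (typical seawater and sea-level values) are assumptions chosen for illustration; they do not come from the text.

```python
def hydrostatic_pressure(x3, h_surface, rho=1025.0, g=9.81, p0=101325.0):
    """Pressure in Pa at height x3 (m) below a free surface at height h_surface (m), as in (13)."""
    if x3 > h_surface:
        raise ValueError("x3 must lie at or below the free surface")
    # Weight per unit area of the water column above x3, plus the surface pressure.
    return rho * g * (h_surface - x3) + p0

p_surface = hydrostatic_pressure(0.0, 0.0)    # at the surface: just p0
p_100m = hydrostatic_pressure(-100.0, 0.0)    # 100 m below the surface
print(p_surface, p_100m)
```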
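The horizontal momentum equations then become:
$$\begin{aligned}
\frac{\partial u_1}{\partial t} + u_1\frac{\partial u_1}{\partial x_1} + u_2\frac{\partial u_1}{\partial x_2} - f u_2 &= -g\frac{\partial \tilde{h}}{\partial x_1}\\
\frac{\partial u_2}{\partial t} + u_1\frac{\partial u_2}{\partial x_1} + u_2\frac{\partial u_2}{\partial x_2} + f u_1 &= -g\frac{\partial \tilde{h}}{\partial x_2}
\end{aligned} \tag{14}$$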
An important consequence of (13) is that the horizontal pressure gradient is independent of x3: (∂p/∂x1, ∂p/∂x2) =
ρg (∂h̃/∂x1, ∂h̃/∂x2). Thus horizontal accelerations are independent of x3. This means that if the horizontal
velocities are initially independent of x3, then they will remain so for all time. We will assume such an initial
condition in our work. If we then integrate (4) with respect to x3 and consider the proper boundary conditions
(see e.g. Pedlosky section 3.3), our conservation of mass equation becomes:
$$\frac{DH}{Dt} + H\left(\frac{\partial u_1}{\partial x_1} + \frac{\partial u_2}{\partial x_2}\right) = 0 \tag{15}$$
where H ≡ h̃ − hB has been defined, h̃ is the height of the ocean's surface and hB = hB(x1, x2) is the bathymetry (height
of the ocean's floor above the plane x3 = 0).
2.9 Conservation of potential vorticity in shallow water equations
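Under the shallow water assumption (i.e. (u1, u2) are independent of depth) and in $\mathbb{R}^3$ with its canonical
basis $\{\hat{\mathbf{i}}, \hat{\mathbf{j}}, \hat{\mathbf{k}}\}$, the relative vorticity components are:
$$\begin{aligned}
\hat{\mathbf{i}}\cdot\boldsymbol{\omega} &= \frac{\partial u_3}{\partial x_2} = O\left(\frac{W}{L}\right) = O\left(\delta\,\frac{U}{L}\right)\\
\hat{\mathbf{j}}\cdot\boldsymbol{\omega} &= -\frac{\partial u_3}{\partial x_1} = O\left(\frac{W}{L}\right) = O\left(\delta\,\frac{U}{L}\right)\\
\hat{\mathbf{k}}\cdot\boldsymbol{\omega} &= \frac{\partial u_2}{\partial x_1} - \frac{\partial u_1}{\partial x_2} = O\left(\frac{U}{L}\right)
\end{aligned}$$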
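And since by definition of shallow water δ ≪ 1 (in fact δ ≤ 0.1 is often satisfied), we have that only the
vertical component of the relative vorticity needs to be considered. So let
$$\zeta \equiv \hat{\mathbf{k}}\cdot\boldsymbol{\omega} = \frac{\partial u_2}{\partial x_1} - \frac{\partial u_1}{\partial x_2} \tag{16}$$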
By cross-differentiating (14), i.e. differentiating the first equation with respect to x2 and the second equation with respect
to x1, and subtracting, the fluid's free surface h̃ (assumed to be twice continuously differentiable) is eliminated and we get
an equation for the time evolution of ζ:
$$\frac{D\zeta}{Dt} = -\left(\zeta + f\right)\left(\frac{\partial u_1}{\partial x_1} + \frac{\partial u_2}{\partial x_2}\right) \tag{17}$$
Using (15) we can write (17) as
$$\frac{D\zeta}{Dt} = \frac{\zeta + f}{H}\,\frac{DH}{Dt} \tag{18}$$
So vorticity can increase due to column stretching, DH/Dt > 0, or decrease due to column shrinking, DH/Dt < 0.
As we have f constant (with respect to time) we can write (18) as:
$$\frac{D}{Dt}\left(\frac{\zeta + f}{H}\right) = 0 \tag{19}$$
so that Π ≡ (ζ + f)/H is conserved following fluid motion. Notice that if the depth increases (decreases) then the
absolute vorticity ζ + f must also increase (decrease), so that Π may remain constant for any given fluid parcel; this
agrees with the stretching argument given after (18). In the context of SWE (i.e. two-dimensional flow), Π is called the potential vorticity.
It can be shown that if we define λ ≡ (x3 − hB)/H and if the fluid is barotropic, then (11) coincides with (19). In the
case of (19) the conserved quantity λ is the ratio between the relative vertical position of a fluid particle in the
water column and the total depth; it is easy to show that it is indeed conserved following fluid motion (see e.g.
Cushman-Roisin & Beckers 2011, page 215).
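Conservation of Π = (ζ + f)/H gives a direct way of computing how the relative vorticity of a column changes when its depth changes. The sketch below is only an illustration: the function name is hypothetical, the rotation rate is the standard value for Earth, f = 2Ω follows the convention of section 2.8, and the depths are made-up numbers.

```python
OMEGA = 7.2921e-5  # Earth's rotation rate in rad/s

def relative_vorticity_after(zeta1, H1, H2, f=2.0 * OMEGA):
    """Relative vorticity after the depth changes from H1 to H2, keeping (zeta + f) / H fixed."""
    potential_vorticity = (zeta1 + f) / H1
    return potential_vorticity * H2 - f

f = 2.0 * OMEGA
# A column initially at rest relative to the rotating frame (zeta = 0).
zeta_stretched = relative_vorticity_after(0.0, 4000.0, 4400.0)  # column becomes 10% deeper
zeta_squashed = relative_vorticity_after(0.0, 4000.0, 3600.0)   # column becomes 10% shallower
print(zeta_stretched / f, zeta_squashed / f)
```

Stretching produces positive relative vorticity and squashing produces negative relative vorticity, in line with (18).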
2.10 A streamfunction for two-dimensional flow
An incompressible two-dimensional flow can be expressed entirely in terms of a scalar-valued function known
as the streamfunction. Following Samelson & Wiggins (2006), let u = (u1, u2) be the velocity and assume
incompressibility for a homogeneous fluid: ∂u1/∂x1 + ∂u2/∂x2 = 0. By integrating the divergence over a surface
and by using (the generalized) Stokes' theorem we can write:
$$\begin{aligned}
0 &= -\iint_R \left(\frac{\partial u_1}{\partial x_1} + \frac{\partial u_2}{\partial x_2}\right) dx_1\,dx_2\\
&= -\oint_C \mathbf{u}\cdot\mathbf{n}\,ds\\
&= \oint_C u_2\,dx_1 - u_1\,dx_2
\end{aligned}$$
where C is the boundary of the region R, with outward normal vector n. If the velocity field is time-dependent, t
is treated as a fixed variable and integration is done only with respect to the spatial variables. The integrand of
the last line integral can be represented as the (spatial) differential of a scalar-valued function ψ = ψ(x1, x2, t):
$$d\psi(x_1, x_2, t) = u_2(x_1, x_2, t)\,dx_1 - u_1(x_1, x_2, t)\,dx_2$$
Using the definition of the differential dψ (i.e. the differential of a 0-form ψ is a 1-form dψ):
$$d\psi = \frac{\partial\psi}{\partial x_1}(x_1, x_2, t)\,dx_1 + \frac{\partial\psi}{\partial x_2}(x_1, x_2, t)\,dx_2$$
we arrive at:
$$\begin{aligned}
\frac{\partial\psi}{\partial x_1}(x_1, x_2, t) &= u_2(x_1, x_2, t)\\
\frac{\partial\psi}{\partial x_2}(x_1, x_2, t) &= -u_1(x_1, x_2, t)
\end{aligned} \tag{20}$$
And so the streamfunction can be defined by
$$\nabla\psi \equiv \mathbf{u}^{\perp} \tag{21}$$
where u⊥ ≡ (u2, −u1). Because the line integral above is equal to zero, the sign of the streamfunction in (20) is
defined by convention. The convention used here corresponds to flow following the streamlines with higher
streamfunction values to the right.
It follows by using definition (16) that
$$\Delta\psi = \zeta \tag{22}$$
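Relations (20) and (22) can be checked numerically for any smooth streamfunction. The sketch below is a minimal illustration using central differences; the particular ψ, the step size and all function names are assumptions made for the example, not part of the original derivation.

```python
import math

def psi(x1, x2):
    # Hypothetical smooth, doubly periodic streamfunction chosen for illustration.
    return math.sin(x1) * math.cos(2.0 * x2)

EPS = 1e-4  # step used by the central differences

def d1(fun, x1, x2):
    """Central-difference approximation of the derivative with respect to x1."""
    return (fun(x1 + EPS, x2) - fun(x1 - EPS, x2)) / (2.0 * EPS)

def d2(fun, x1, x2):
    """Central-difference approximation of the derivative with respect to x2."""
    return (fun(x1, x2 + EPS) - fun(x1, x2 - EPS)) / (2.0 * EPS)

def u1(x1, x2):
    return -d2(psi, x1, x2)   # from (20): dpsi/dx2 = -u1

def u2(x1, x2):
    return d1(psi, x1, x2)    # from (20): dpsi/dx1 = u2

def divergence(x1, x2):
    return d1(u1, x1, x2) + d2(u2, x1, x2)

def vorticity(x1, x2):
    return d1(u2, x1, x2) - d2(u1, x1, x2)   # definition (16)

def laplacian_exact(x1, x2):
    return -5.0 * psi(x1, x2)  # analytic Laplacian of sin(x1) * cos(2 * x2)

point = (0.7, 0.3)
div_at_point = divergence(*point)
zeta_at_point = vorticity(*point)
lap_at_point = laplacian_exact(*point)
print(div_at_point, zeta_at_point, lap_at_point)
```

Up to discretization error, the divergence vanishes and the vorticity coincides with the Laplacian of ψ.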
2.10.1 Solution to Poisson's equation
By solving Poisson's equation (22) in terms of the known vorticity ζ, the velocity field is completely determined;
it is therefore essential to be able to invert the Laplacian. Existence and uniqueness for Poisson's equation are
briefly treated in Appendix A.
The fundamental solution of Poisson's equation in two dimensions is given by the Green function:
$$G(x, y) = \frac{1}{2\pi}\log\frac{1}{|x - y|} \tag{23}$$
where x, y ∈ R², and x is fixed.
2.11 Rossby number
The Rossby number compares the distance traveled horizontally by a fluid parcel during one revolution (U/2Ω)
with the length scale over which the parcel's motion takes place. Rotational effects are important when the former
is less than the latter. It is an equivalent way of writing (1):
$$Ro = \frac{U}{2\Omega L}$$
Rotational effects are dynamically important when the Rossby number is of the order of unity or less. The
Rossby number could also be thought of as a comparison of the advection term to the Coriolis acceleration. As
a general rule, the characteristics of geophysical flows vary greatly with the value of the Rossby number.
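To get a feeling for the numbers, the following sketch evaluates the Rossby number for two sets of scales. The function name and the example scales (a 100 km ocean current and a metre-sized laboratory-like flow) are assumptions chosen for illustration.

```python
OMEGA = 7.2921e-5  # Earth's rotation rate in rad/s

def rossby_number(U, L, omega=OMEGA):
    """Ro = U / (2 * Omega * L), with U in m/s and L in m."""
    return U / (2.0 * omega * L)

ro_ocean = rossby_number(U=0.5, L=100.0e3)   # mesoscale ocean current
ro_small = rossby_number(U=0.1, L=1.0)       # small, fast flow: rotation barely matters
print(f"Ro (ocean current) = {ro_ocean:.3f}, Ro (small-scale flow) = {ro_small:.0f}")
```

The first value is well below one, so rotation dominates; the second is several hundred, so rotation is dynamically irrelevant at that scale.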
2.12 Quasi-geostrophy
GFD is heavily constrained by Earth's rotation; this is the case when the Rossby number is small and advection
is negligible. The Coriolis acceleration 2Ω × u is a leading term in (6) or (8), and it is largely balanced by the
pressure gradient $-\frac{1}{\rho_0}\nabla p$. This is called geostrophic balance and it is often the leading-order balance found in
GFD.
Clearly there are no accelerations in this balance, which implies that some other terms must likewise
be relevant; otherwise, once geostrophic balance is achieved, fluid flow would be stationary for eternity, which we
know is not the case. When the timescale is longer than about a day, geophysical flows are usually in a nearly
geostrophic state but not identically so. An advantageous simplified dynamical formalism called quasi-geostrophy
captures the leading geostrophic balance plus second-order terms that allow one to emulate large-scale flow with great
accuracy. The underlying idea is an asymptotic expansion retaining the two leading terms. Better still,
quasi-geostrophy can be completely studied in terms of a suitable potential vorticity equation. Thus we
have a very good approximation to GFD which is amenable to analytical treatment.
In particular, for a barotropic fluid the SWE can be written in terms of a suitable potential vorticity. A
formal and rigorous derivation of the quasi-geostrophic equations (QGE) can be found in Majda (2003); it includes
convergence of the SWE to the QGE. Further information can be found in Pedlosky (1988), Cushman-Roisin &
Beckers (2011) and Vallis (2006). Here we will only state and introduce the barotropic quasi-geostrophic equations on
a β-plane for later use in section 4.
2.13 Barotropic QGE
In this section we describe the equations we will be using for an analytical application of statistical mechanics
to GFD. We will set the background density ρ0 = 1, as is often done. There is no loss of generality in doing so.
It will be convenient to write the velocity in terms of the streamfunction. Making sure we are consistent with
(21), we can define the velocity as the orthogonal gradient of the streamfunction:
$$\mathbf{u} = \nabla^{\perp}\psi = \begin{pmatrix} -\partial\psi/\partial x_2\\ \partial\psi/\partial x_1 \end{pmatrix}$$
Our domain is [0, 2π] × [0, 2π] and we will work with periodic boundary conditions in both spatial directions.
Periodic boundary conditions are not physically unreasonable unless we want to simulate flow near a boundary.
However, even in that case, generalizations to a number of other geometries like a channel (periodic in one
direction) or a closed basin are not difficult. Doubly periodic boundary conditions allow us to avoid generation
and dissipation of vorticity near boundaries; in terms of the Navier-Stokes equations we are comfortably allowed to
set F = 0.
We will include Earth's rotation by using a β-plane, but since (2) is not a periodic function, a large-scale
mean flow needs to be introduced to overcome the difficulty (as we will see in the next section). More precisely,
we introduce a non-periodic stream function ψ consisting of a non-periodic large-scale mean flow and a small-scale
periodic component ψ′:
$$\psi = -V(t)\,x_2 + \psi' \tag{24}$$
Thus the small-scale stream function satisfies ψ′(x1 + 2π, x2, t) = ψ′(x1, x2 + 2π, t) = ψ′(x1, x2, t). The velocity field
is given by:
$$\mathbf{u} = \nabla^{\perp}\psi = \begin{pmatrix} V(t)\\ 0 \end{pmatrix} + \nabla^{\perp}\psi' \tag{25}$$
The total energy, which is the squared L² norm of the velocity (divided by two), is given by
$$E \equiv \frac{1}{2}\int |\mathbf{u}|^2\,dx = \frac{(2\pi)^2}{2}\,V^2(t) + \frac{1}{2}\int |\nabla\psi'|^2\,dx \tag{26}$$
Moving on, we arrive at the potential vorticity, which for our case will include the relative and planetary
vorticity (as explained above) but also a topographic term h = h(x1, x2), usually called the bottom topography;
this term is equal to the fractional change in layer thickness divided by the Rossby number. Thus, our potential
vorticity is given by:
$$q = \Delta\psi + \beta x_2 + h \tag{27}$$
This form of potential vorticity arises from the asymptotic expansion of the SW potential vorticity (19); the
details can be found in Majda (2003).
The potential vorticity is also split into a small-scale component (q′) and a large-scale component (βx2):
$$q = q' + \beta x_2, \qquad q' = \Delta\psi' + h$$
The relative vorticity is given in terms of the small-scale stream function:
$$\zeta = \Delta\psi'$$
The conservation of potential vorticity is expressed as:
$$\frac{\partial q}{\partial t} + J(\psi, q) = 0 \tag{28}$$
where J(·, ·) is the Jacobian determinant; in our case we can write J(ψ, q) = ∇⊥ψ · ∇q.
We introduce the following notation for the normalized integral, for simplicity:
$$\fint_D d\lambda \equiv \frac{1}{\lambda(D)}\int_D d\lambda$$
Finally, we will include a term by which the small-scale motion and the large-scale motion interact, the
topographic stress:
$$\frac{d}{dt}V(t) = -\fint \frac{\partial h}{\partial x_1}\,\psi' \tag{29}$$
This term comes from the conservation of energy law. The total energy is given by the sum of
large-scale and small-scale contributions:
$$E_{Total} = E_{Large} + E_{Small}$$
and if we assume conservation of energy we can write the above expression as equal to a constant K:
$$E_{Total} = (2\pi)^2\,\frac{1}{2}V^2(t) + \frac{1}{2}\int |\nabla\psi'|^2 = K$$
which means
$$\frac{dE_{Total}}{dt} = (2\pi)^2\,V(t)\,\frac{dV(t)}{dt} + V(t)\int \frac{\partial h}{\partial x_1}\,\psi' = 0$$
from where (29) is obvious. To see why the time derivative of the small-scale energy is equal to
$V(t)\int \frac{\partial h}{\partial x_1}\psi'$, we first note that:
$$\begin{aligned}
E_{Small} &= \frac{1}{2}\int |\nabla\psi'|^2\\
&= \frac{1}{2}\int \nabla\psi'\cdot\nabla\psi'\\
&= -\frac{1}{2}\int \psi'\,\Delta\psi'
\end{aligned}$$
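We can then use (28) to write the time derivative. Since the Laplacian is symmetric under integration by parts over
periodic functions, the time derivative of ∫ψ′∆ψ′ equals 2∫ψ′ ∂(∆ψ′)/∂t, and since h and βx2 do not depend on time,
∂(∆ψ′)/∂t = ∂q/∂t. Hence
$$\begin{aligned}
\frac{dE_{Small}}{dt} &= -\int \psi'\,\frac{\partial q}{\partial t}\\
&= \underbrace{\int \psi'\,\nabla^{\perp}\psi'\cdot\nabla q}_{=\,-\int q\,\nabla\cdot\left(\psi'\nabla^{\perp}\psi'\right)\,=\,0} + V(t)\int \psi'\,\frac{\partial q}{\partial x_1}\\
&= V(t)\underbrace{\int \frac{\partial\,\Delta\psi'}{\partial x_1}\,\psi'}_{=\,-\int \frac{\partial}{\partial x_1}\left(\frac{1}{2}|\nabla\psi'|^2\right)\,=\,0} + V(t)\int \frac{\partial h}{\partial x_1}\,\psi'
\end{aligned}$$
Intermediate steps of the integrations by parts are indicated with underbraces. It is thus that we arrive at
$$\frac{dE_{Small}}{dt} = V(t)\int \frac{\partial h}{\partial x_1}\,\psi'$$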
To see why this term is called topographic stress, the integral in (29) can be written (integrating by parts over
periodic functions) as:
$$\fint \frac{\partial h}{\partial x_1}\,\psi' = -\fint h\,\frac{\partial\psi'}{\partial x_1}$$
which helps us visualize the reason for such a name: ∂ψ′/∂x1 is proportional to the pressure gradient, so that
(∂ψ′/∂x1) h can be named topographic stress: a linear representation of stress for the geostrophic velocity with
the proportionality coefficient given by the bottom topography h.
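Thus, our set of equations is:
$$\begin{aligned}
\frac{\partial q}{\partial t} + J(\psi, q) &= 0, & q &= \Delta\psi' + h + \beta x_2,\\
\frac{d}{dt}V(t) &= -\fint \frac{\partial h}{\partial x_1}\,\psi', & \psi &= -V(t)\,x_2 + \psi'
\end{aligned} \tag{30}$$
The periodic functions are h, ζ and ψ′.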
The set of equations (30) is close to being the simplest capable of meaningfully describing geophysical flows; we will use it as a test case for the predictions of statistical mechanics. In particular, we
are interested in simulating quasi-geostrophic turbulence, a common feature in GFD that forms long-lasting
quasi-stationary structures such as long-lived eddies. This means we need to verify that, in some sense,
our equations are stable; we make this precise in section 2.15. QG turbulence is a close relative
of two-dimensional turbulence (it conserves energy and enstrophy) but it is more realistic in allowing for the effects
of vortex stretching and a varying Coriolis parameter. It could also allow for vertical stratification, which is not
pursued in this paper.
2.14 A basic solution to the barotropic QGE
From here on we will assume that the solutions (and their perturbations, of course) live in the Sobolev space H².
We will also assume that any function with a Fourier expansion is at least piecewise C¹; whenever needed to
exchange the order of differentiation, functions are assumed to be in C². Consider the term J(ψ, q): if q = q(ψ) then
the Jacobian determinant vanishes (this statement is proven in proposition (2.1) under a different notation). A
common ansatz is a linear relationship q = µψ. It turns out this is often a good approximation (see e.g. Merryfield
et al. 2001, and references therein). Under these conditions the QG equation becomes:
$$\mu\,\frac{\partial\psi}{\partial t} = 0, \qquad \mu\psi = \Delta\psi' + \beta x_2 + h(x) \tag{31}$$
The beta-plane effect is eliminated from (31) by assuming a large-scale mean flow V0 = −β/µ, so that ψ =
−V0 x2 + ψ′ where
$$\mu\psi' = \Delta\psi' + h(x) \tag{32}$$
If there were no topographic effects, then µ would need to be an eigenvalue of the Laplacian with associated
eigenfunction given by the small-scale stream function ψ′. But with topography, (32) is solvable only if µ is not
an eigenvalue of the Laplacian, as we will verify through Fourier analysis. Assume for simplicity that h and ψ′
are periodic and have zero mean, with their Fourier expansions given by:
$$h = \sum_{|k|\neq 0} \hat{h}_k\,e^{ik\cdot x} + c.c., \qquad \psi' = \sum_{|k|\neq 0} \hat{\psi}'_k\,e^{ik\cdot x} + c.c. \tag{33}$$
where c.c. stands for complex conjugate (reality condition). Introducing (33) into (32) we get:
$$\left(\mu + |k|^2\right)\hat{\psi}'_k = \hat{h}_k \tag{34}$$
Clearly, to solve this for every k we need µ ≠ −|k|². From here we can get an expression for the stream function and
thus for the velocity and relative vorticity as well:
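$$\bar{\psi} = -V_0\,x_2 + \psi' = \frac{\beta}{\mu}\,x_2 + \sum_{|k|\neq 0} \frac{1}{|k|^2 + \mu}\,\hat{h}_k\,e^{ik\cdot x} + c.c. \tag{35}$$
$$\bar{\mathbf{u}} = \nabla^{\perp}\bar{\psi} = -\begin{pmatrix} \beta/\mu\\ 0 \end{pmatrix} + \sum_{|k|\neq 0} \frac{1}{|k|^2 + \mu}\,\hat{h}_k\,i\,e^{ik\cdot x}\begin{pmatrix} -k_2\\ k_1 \end{pmatrix} + c.c. \tag{36}$$
$$\bar{\zeta} = \Delta\bar{\psi} = -\sum_{|k|\neq 0} \frac{|k|^2}{|k|^2 + \mu}\,\hat{h}_k\,e^{ik\cdot x} + c.c. \tag{37}$$
To make it unambiguous in future references that this is a steady-state solution, we denote it with overbars:
$\bar{V} \equiv V_0$, $\bar{\psi}$ for the stream function and, in general, $\bar{q}$ for the potential vorticity. We now return to the stability of the steady-state solution.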
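The mode-by-mode solution (34) is easy to test on a single Fourier mode. The sketch below takes a topography with one cosine mode, builds ψ′ from (34) and checks that the residual of (32) vanishes up to discretization error. All numerical values (µ, β, the wavevector and the amplitude) and all function names are hypothetical choices made for illustration.

```python
import math

MU = 0.5        # hypothetical value of mu > 0
BETA = 1.0      # hypothetical value of beta
K = (1, 2)      # wavevector of the single topographic mode
H_AMP = 0.3     # amplitude of h = H_AMP * cos(k . x)

K2 = K[0] ** 2 + K[1] ** 2
PSI_AMP = H_AMP / (MU + K2)   # from (34), applied to the only non-zero mode
V0 = -BETA / MU               # large-scale mean flow that removes the beta term

def h(x1, x2):
    return H_AMP * math.cos(K[0] * x1 + K[1] * x2)

def psi_small(x1, x2):
    return PSI_AMP * math.cos(K[0] * x1 + K[1] * x2)

def laplacian(fun, x1, x2, eps=1e-3):
    """Five-point central-difference approximation of the Laplacian."""
    return (fun(x1 + eps, x2) + fun(x1 - eps, x2) + fun(x1, x2 + eps)
            + fun(x1, x2 - eps) - 4.0 * fun(x1, x2)) / eps ** 2

def residual_32(x1, x2):
    """mu * psi' - (Laplacian(psi') + h), which should vanish for a solution of (32)."""
    return MU * psi_small(x1, x2) - (laplacian(psi_small, x1, x2) + h(x1, x2))

max_residual = max(abs(residual_32(0.37 * i, 0.53 * j))
                   for i in range(10) for j in range(10))
print(V0, max_residual)
```

With µ > 0 the denominator µ + |k|² never vanishes, which is the solvability condition discussed after (34).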
2.15 Non-linear stability of the barotropic QGE
Non-linear stability analysis uses the full equation (without linearizing) by bounding perturbations and considering the evolution of small, finite perturbations. The intuitive idea is that a steady-state solution is non-linearly
stable if, for small initial perturbations, the resulting perturbations remain small for all time. We will use the
standard definition of inner product under the following notation:
$$\langle f\,g\rangle \equiv \fint f g\,dx$$
and we will measure perturbations using the L² norm:
$$\|(q, V)\|_0^2 = \|q\|_0^2 + \|V\|_0^2 = \fint |q|^2 + |V|^2$$
where the zero subscript refers to the above inner product.
Assume that a steady-state solution to (30) is given by $q = \bar{q}(x_1, x_2)$ and by $V(t) = \bar{V}$.
Definition 1 The steady-state solution $(\bar{q}, \bar{V})$ of the barotropic quasi-geostrophic equations (30) is non-linearly
stable in the L² sense if there are constants C > 0 and R > 0 such that, for any initial perturbation δq0 of $\bar{q}$
and any initial perturbation δV0 of $\bar{V}$ given by:
$$q|_{t=0} = \bar{q} + \delta q_0, \qquad V|_{t=0} = \bar{V} + \delta V_0$$
with ‖(δq0, δV0)‖₀ ≤ R, the resulting perturbed solution (q(x1, x2, t), V(t)) of (30) given by:
$$q(x_1, x_2, t) = \bar{q} + \delta q(x_1, x_2, t), \qquad V(t) = \bar{V} + \delta V(t)$$
has perturbations that satisfy:
$$\|(\delta q(t), \delta V(t))\|_0 \leq C\,\|(\delta q_0, \delta V_0)\|_0$$
for any time t > 0.
We will also need the following definition:
Definition 2 A non-linear functional W(δq, δV) is locally positive definite (with respect to the L² norm) provided there are R0 > 0 and C ≥ 1 such that if ‖(δq, δV)‖₀ ≤ R0 then
$$C^{-1}\,\|(\delta q, \delta V)\|_0^2 \leq W(\delta q, \delta V) \leq C\,\|(\delta q, \delta V)\|_0^2 \tag{38}$$
If we can construct such a conserved functional then we can prove that the steady state is non-linearly stable.
In particular,
Proposition 2.2 Assume that W(δq, δV) is a conserved functional for the quasi-geostrophic equations (30) with
the property that it is locally positive definite. Then the steady state $(\bar{q}(x_1, x_2), \bar{V})$ is non-linearly stable.
Proof Assume that the initial perturbation satisfies ‖(δq0, δV0)‖₀ ≤ R = R0/(2C). Suppose that the future perturbation (δq(t), δV(t)) satisfies ‖(δq(t), δV(t))‖₀ ≤ R0 for all t ≤ T. Since W(δq, δV) is a conserved quantity for
the flow and it satisfies the inequalities in (38), we then have that:
$$C^{-1}\,\|(\delta q(t), \delta V(t))\|_0^2 \leq W(\delta q(t), \delta V(t)) = W(\delta q_0, \delta V_0) \leq C\,\|(\delta q_0, \delta V_0)\|_0^2$$
from where, multiplying by C and taking square roots, we get that
$$\|(\delta q(t), \delta V(t))\|_0 \leq C\,\|(\delta q_0, \delta V_0)\|_0 \leq R_0/2$$
We conclude that (38) holds for all time and hence the steady state $(\bar{q}(x_1, x_2), \bar{V})$ is non-linearly stable.
To see that (38) holds for all time, suppose that at some time t ≤ T we get ‖(δq(t), δV(t))‖₀ = R0; this is a
contradiction, since it follows that ‖(δq(t), δV(t))‖₀ ≤ C‖(δq0, δV0)‖₀ ≤ R0/2. We can then choose a bigger T
and repeat the same reasoning ad infinitum.
The above proposition shows that to prove non-linear stability, it is enough to produce a conserved functional
W(δq, δV) satisfying (38). At this point we will assume that the total energy:
$$E(q(t), V(t)) \equiv \frac{1}{2}V^2(t) + \frac{1}{2}\fint \left|\nabla^{\perp}\psi'\right|^2 = \frac{1}{2}V^2(t) - \frac{1}{2}\fint \psi'\,\zeta \tag{39}$$
(where integration by parts of a periodic function has been used in the last step), as well as the large-scale
enstrophy:
$$\mathcal{E}(q(t), V(t)) \equiv \beta V(t) + \frac{1}{2}\fint |q'|^2 \tag{40}$$
are two conserved quantities. For now we will assume they are conserved; when we use this fact in our GFD
application we will actually prove that this assertion holds. It then becomes natural to try combining these two
conserved quantities to construct the needed conserved functional W for the perturbation (δq(t), δV(t)).
In section 2.14 we found a steady-state solution. Considering that in the northern hemisphere β > 0, and
considering that the large-scale mean velocity was $(\bar{V}, 0) = (-\beta/\mu, 0)$, we can see that µ > 0 corresponds to
a mean westward flow. We will now prove that for westward mean large-scale flow the steady-state solution is
non-linearly stable.
Theorem 2.1 The steady-state solution $(\bar{q}, \bar{V})$ given in section 2.14 is a non-linearly stable steady-state solution
of the quasi-geostrophic equations (30) provided that 0 < µ < ∞.
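Proof Towards constructing the functional W, we will expand the conserved quantities about a mean state: let
$q = \bar{q} + \delta q$ and $V = \bar{V} + \delta V$. Then the energy in (39) has the expansion:
$$E(\bar{q} + \delta q, \bar{V} + \delta V) = E(\bar{q}, \bar{V}) + \bar{V}\,\delta V - \fint \bar{\psi}\,\delta q + \frac{1}{2}\,\delta V^2 - \frac{1}{2}\fint \delta q\,\Delta^{-1}\delta q \tag{41}$$
and the enstrophy has the expansion:
$$\mathcal{E}(\bar{q} + \delta q, \bar{V} + \delta V) = \mathcal{E}(\bar{q}, \bar{V}) + \beta\,\delta V + \fint \bar{q}\,\delta q + \frac{1}{2}\fint |\delta q|^2 \tag{42}$$
We will form a linear combination of the energy and enstrophy expansions to build the functional W; notice
that we will get rid of the linear terms in the process:
$$\begin{aligned}
W_\mu(\delta q, \delta V) &= \left[\mathcal{E}(\bar{q} + \delta q, \bar{V} + \delta V) - \mathcal{E}(\bar{q}, \bar{V})\right] + \mu\left[E(\bar{q} + \delta q, \bar{V} + \delta V) - E(\bar{q}, \bar{V})\right]\\
&= \left(\beta + \mu\bar{V}\right)\delta V + \fint \left(\bar{q} - \mu\bar{\psi}\right)\delta q + \frac{\mu}{2}\,\delta V^2 + \frac{1}{2}\fint |\delta q|^2 - \frac{\mu}{2}\fint \delta q\,\Delta^{-1}\delta q
\end{aligned} \tag{43}$$
Since the steady state satisfies $\bar{q} = \mu\bar{\psi}$ and $\bar{V} = -\beta/\mu$, we have that (43) reduces to:
$$W_\mu(\delta q, \delta V) = \frac{\mu}{2}\,\delta V^2 + \frac{1}{2}\fint |\delta q|^2 - \frac{\mu}{2}\fint \delta q\,\Delta^{-1}\delta q \tag{44}$$
To verify that the functional Wµ is positive definite we use the following Fourier expansion:
$$\delta q = \sum_{k\neq 0} \widehat{\delta q}_k\,e^{ik\cdot x} \tag{45}$$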
Then Parseval's identity⁶ can be used to represent the functional as:
$$W_\mu(\delta q, \delta V) = \frac{\mu}{2}\,\delta V^2 + \frac{1}{2}\sum_{k\neq 0}\left(1 + \frac{\mu}{|k|^2}\right)\left|\widehat{\delta q}_k\right|^2 \tag{46}$$
We are looking for a constant Cµ that depends on µ such that the following two inequalities hold:
$$C_\mu^{-1} \leq \frac{1}{2}\left(1 + \frac{\mu}{s}\right) \leq C_\mu, \qquad C_\mu^{-1} \leq \frac{\mu}{2} \leq C_\mu \tag{47}$$
where 1 ≤ s < ∞ (s plays the role of |k|²) and 0 < µ < ∞. Then Cµ ≥ 1 can be given by:
$$C_\mu = \begin{cases} 2/\mu & \text{if } 0 < \mu < 1\\ \max\left\{(1 + \mu)/2,\ 2\right\} & \text{if } 1 \leq \mu < \infty \end{cases} \tag{48}$$
The fact that Cµ → ∞ as µ → 0 is consistent with the fact that the coefficient of the mean flow perturbation
goes to zero as well. With no mean flow the analysis is different from what is shown here; more details can be
found in section 4.2 of Majda & Wang (2003). Plugging (47) back into (46) gives the desired result:
$$C_\mu^{-1}\,\|(\delta q, \delta V)\|_0^2 \leq W_\mu(\delta q, \delta V) \leq C_\mu\,\|(\delta q, \delta V)\|_0^2$$
which proves that for 0 < µ < ∞ the steady-state solution of the QG equations (30) is non-linearly stable.
⁶ The infinite sum of the squares of the Fourier coefficients of a function equals the square of the L² norm of the same function. It
is a generalization to a Hilbert space of the Pythagorean theorem, which states that the sum of the squared components of a vector
equals the squared norm of the vector.
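The choice of Cµ in (48) can be sanity-checked by testing the two inequalities in (47) over a range of values of µ and s. The sketch below uses exact rational arithmetic to avoid rounding issues at the boundary cases; the function names and the sampled values of µ and s are assumptions made for the example, and a finite check of this kind is of course not a proof.

```python
from fractions import Fraction

def c_mu(mu):
    """Constant C_mu as given in (48), for mu > 0."""
    if mu <= 0:
        raise ValueError("mu must be positive")
    return 2 / mu if mu < 1 else max((1 + mu) / 2, Fraction(2))

def bounds_hold(mu, s):
    """Check both two-sided inequalities in (47), with s standing for |k|^2 >= 1."""
    c = c_mu(mu)
    enstrophy_coeff = Fraction(1, 2) * (1 + mu / s)   # coefficient of |dq_k|^2 in (46)
    mean_flow_coeff = mu / 2                          # coefficient of dV^2 in (46)
    return (1 / c <= enstrophy_coeff <= c) and (1 / c <= mean_flow_coeff <= c)

mus = [Fraction(n, 20) for n in range(1, 401)]   # mu = 0.05, 0.10, ..., 20
ss = [1, 2, 4, 5, 8, 9, 10, 100, 10 ** 6]        # sample values of |k|^2
all_ok = all(bounds_hold(mu, s) for mu in mus for s in ss)
print(all_ok)
```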
3 Foundations of Statistical mechanics
In the context of classical mechanics, the state of a system is completely described by a point in phase space, the space
formed by the coordinates (i.e. where the system is) and the momenta associated with those coordinates (i.e.
where the system is heading). An important concept for statistical mechanics is that there cannot be any loss
of information in phase space. Given a position at a certain time, we should be able to know where the particle
was at a prior time; this is essentially the Liouville property. Qualitatively, trajectories do not merge, nor do they
cross, nor do they approach each other asymptotically. If there is stretching in one direction there must be compression in
another to compensate; we equivalently say that the flow is incompressible. This is a crucial property that will
allow an initial probability distribution P0(x) in phase space to be carried to a probability distribution P(x, t) at
any future time (as will be shown). The Liouville property thus allows a probability measure over phase space at
any given time, with which we can define an information-theoretic entropy. Another necessary condition for the
statistical mechanics approach to work is the existence of conserved quantities, which impose constraints on the
probability distribution. With the Liouville property plus the existence of conserved quantities we can invoke the
maximum entropy principle to obtain the most likely probability distribution on phase space, i.e. the least biased
probability or Gibbs measure. It can be further proved that the Gibbs measure is an invariant measure of the flow
map. Thus statistical solutions like a mean flow and fluctuations about that mean can be estimated by using
the most likely probability distribution.
The fact that we can find the statistical equilibrium state (most probable state) does not necessarily imply that
this state will be reached. Additionally, we need our system to be non-linear, so that a healthy amount
of bending and twisting of trajectories can be achieved. A chaotic system will help ensure ergodicity; strong
mixing makes ergodicity more likely. Proving the convergence to statistical equilibrium is an open problem in
mathematics and will not be pursued, but it is good to know that numerical simulations as well as experience
confirm the statistical mechanics approach as a useful tool in predicting the most likely mean state of many a
system as well as its fluctuations about the mean. In the next subsection we present an argument showing
that ergodicity is indeed feasible.
But how does this relate to Geophysical Fluid Dynamics? Physical space is called the configuration space.
The properties of the phase space flow field that have just been mentioned are similar to those found in the
configuration space of many types of ocean flows which, in general and at large scales, tend to be chaotic,
quite nearly incompressible and nearly two-dimensional. These geophysical turbulent flows tend to form large-scale coherent⁷ structures (perhaps most notably eddies). The robustness and ubiquity of eddies (see e.g.
Chelton 2007, 2011) suggest that the large-scale coherent flow does not depend on the fine details of the dynamics
(in a sense we will not make precise!). As their name suggests, these structures last for relatively long periods of
time⁸; hence the importance that the equations we use to simulate the flow be stable in the sense of section 2.15.
Under these circumstances, using statistical mechanics to solve GFD problems becomes a natural alternative, the
ultimate goal being the prediction of these large-scale coherent features based on the bulk properties retained in a
theoretical formulation that avoids the tremendous number of degrees of freedom inherent to the original problem.
Onsager was the first to use statistical mechanics to explain two-dimensional turbulence, in 1949. A review of
Onsager's contributions to turbulence theory can be found in Eyink & Sreenivasan (2006). The Robert-Sommeria-Miller equilibrium statistical mechanics explains the spontaneous organization of unforced, undissipated geophysical
flows. An excellent as well as relevant introduction to quasi-geostrophic turbulence can be found in Salmon (1998,
Chapter 6); it includes references to a number of reviews on the topic. For further reading on applications of
statistical mechanics to GFD the books by Bouchet and Venaille (2011 and 2012) are recommended.
In this section we introduce the mathematical foundations of statistical theories for geophysical flows following
Majda and Wang (2006). We let our phase space be $\mathbb{R}^N$. We begin by revisiting the claim that mixing⁹ causes
ergodicity.
⁷ Coherent in GFD usually refers to a spatio-temporal structure persisting during the evolution of a system.
⁸ In GFD terms: long when compared to the eddy's turn-around time.
⁹ Mixing is meant in the probability sense of the word. In the context of GFD mixing has a different meaning; under the current
context and in GFD terminology it should be thought of as stirring.
3.1 Mixing leads to ergodicity
It was claimed in the introduction to this section that mixing makes ergodicity more likely; in this subsection
we make this statement precise. The reader not familiar with the problem of ergodicity is encouraged to read
appendix C first.
For the purpose of outlining a feasibility argument, we need the following mathematical definitions (Sarig
2006 and Durret 1996):
Definition 3 The orbit of a transformation Φ, denoted by $\{\Phi^n(X)\}_{n\in\mathbb{Z}}$, is a record of the time evolution of a
system originally in state X.
Remark: Here we used discrete time; it is straightforward to extend these concepts to their continuous
counterpart.
Definition 4 A measurable set A ∈ B is called an invariant set if Φ⁻¹(A) = A.
Definition 5 Let (D, B, P) be a probability space. A quartet (D, B, P, Φ) is called a measure preserving
transformation if
1. Φ is measurable: A ∈ B ⇒ Φ⁻¹A ∈ B, and
2. P is an invariant probability measure, i.e. P(Φ⁻¹A) = P(A), ∀A ∈ B.
Definition 6 A measure preserving transformation Φ on a probability space (D, B, P) is called mixing if for
all measurable sets A and B:
$$\lim_{n\to\infty} P\left(A \cap \Phi^{-n}B\right) = P(A)\,P(B) \tag{49}$$
Definition 7 A measure preserving transformation (D, B, P, Φ) is called ergodic if every invariant set A
satisfies P(A) = 0 or P(D\A) = 0. We say P is an ergodic probability measure.
Proposition 3.1 Mixing implies ergodicity.
Proof Take B = A to be an invariant set in (49) to get that P(A) = P(A)P(A). This can only hold if
P(A) ∈ {0, 1}, i.e. P is ergodic.
On the other hand, ergodicity implies that:
$$\frac{1}{n}\sum_{m=0}^{n-1} 1_B\left(\Phi^m X\right) \xrightarrow[n\to\infty]{} P(B) = E\left[1_B\right] \tag{50}$$
It is a basic fact of measure theory that measurable functions (continuous functions being a subset) can be
approximated by linear combinations of characteristic functions (i.e. simple functions); this allows (50) to be
generalized. In particular, for some probability measure P and some measurable function g, it becomes feasible
that we can write:
$$\frac{1}{T}\int_{T_0}^{T} g\left(X(t)\right)dt \xrightarrow[T\to\infty]{} \int_{\mathbb{R}^N} g(X)\,P(X)\,dX = E\left[g(X)\right]$$
where the notation X(t) is for a trajectory.
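The convergence of time averages in (50) can be observed numerically on a simple ergodic system. The sketch below uses the rotation of the circle [0, 1) by an irrational angle, which preserves length and is ergodic (although it is not mixing, so it illustrates only the statement (50), not Definition 6). The choice of map, the set B, the starting point and the function names are all assumptions made for the example.

```python
import math

ALPHA = math.sqrt(2.0) - 1.0   # irrational rotation angle, as a fraction of a full turn

def rotate(x):
    """One step of the circle rotation x -> x + ALPHA (mod 1), which preserves length."""
    return (x + ALPHA) % 1.0

def time_average_indicator(x0, a, b, n):
    """Fraction of the first n orbit points of x0 that fall in the set B = [a, b)."""
    x, hits = x0, 0
    for _ in range(n):
        if a <= x < b:
            hits += 1
        x = rotate(x)
    return hits / n

# B = [0.2, 0.45) has measure P(B) = 0.25.
estimate = time_average_indicator(x0=0.1, a=0.2, b=0.45, n=100_000)
print(estimate)
```

The time average along a single orbit approaches the measure of B, which is the space average of the indicator function.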
3.2 Statistical mechanics' main ingredient: Liouville property
Definition 8 A vector field F(X) satisfies the Liouville property if it is divergence free, i.e.
$$\nabla\cdot\mathbf{F} = \sum_{j=1}^{N} \frac{\partial F_j}{\partial X_j} = 0 \tag{51}$$
Let X ∈ $\mathbb{R}^N$ and F = (F1, ..., FN) with N ≫ 1. Consider a system of ordinary differential equations given by
$$\frac{d\mathbf{X}}{dt} = \mathbf{F}(\mathbf{X}), \qquad \mathbf{X}|_{t=0} = \mathbf{X}_0 \tag{52}$$
The Liouville property implies that the flow map associated with (52) is volume (or measure) preserving on phase
space, as we are about to see.
Definition 9 We define the time-t flow map Φᵗ associated with the finite-dimensional ODE system (52) by:
$$\frac{d\Phi^t(\mathbf{X})}{dt} = \mathbf{F}\left(\Phi^t(\mathbf{X})\right), \qquad \Phi^t(\mathbf{X})|_{t=0} = \mathbf{X} \tag{53}$$
We will also refer to it as the flow map for simplicity.
Section D in the appendix elaborates on the notation and meaning of the flow map.
An interesting analogy can be drawn here: if this flow map were over physical instead of phase space, Φᵗ(X)
would be the trajectory of fluid elements and F would be the velocity field, (53) would be a statement of the
Fundamental Theorem of Kinematics (see e.g. Price 2006), and (51) would be the condition of incompressibility
stated in (4).
The condition of incompressibility for the flow map is expressed through the Jacobian determinant $J(t) \equiv \det\left(\nabla_{\mathbf{X}}\Phi^t(\mathbf{X})\right)$
of the transformation that takes the initial position to the position at time t; thus to prove that the flow map is
volume (or measure) preserving we only need a calculus identity, which is proven in the following proposition:
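Proposition 3.2 Assume Φ ∈ C², then
$$\frac{dJ(t)}{dt} = \left(\nabla\cdot\mathbf{F}\right)\left(\Phi^t(\mathbf{X})\right)J(t) \tag{54}$$
where J(t) is the Jacobian determinant of the transformation X ↦ Φᵗ(X) for each t.
Proof We write the argument for N = 3 and use the notation $J(t) \equiv \frac{\partial\left(\Phi^t_1, \Phi^t_2, \Phi^t_3\right)}{\partial(X_1, X_2, X_3)}$.
The derivative of a Jacobian determinant is:
$$\frac{dJ(t)}{dt} = \frac{\partial\left(F_1, \Phi^t_2, \Phi^t_3\right)}{\partial(X_1, X_2, X_3)} + \frac{\partial\left(\Phi^t_1, F_2, \Phi^t_3\right)}{\partial(X_1, X_2, X_3)} + \frac{\partial\left(\Phi^t_1, \Phi^t_2, F_3\right)}{\partial(X_1, X_2, X_3)} \tag{55}$$
where we have used that dΦᵗᵢ/dt = Fᵢ(Φᵗ) for i = 1, 2, 3 by the definition of the flow map (53). It is straightforward
to see (by computation) that:
$$\begin{aligned}
\frac{\partial\left(F_1, \Phi^t_2, \Phi^t_3\right)}{\partial(X_1, X_2, X_3)} &= \left.\frac{\partial(F_1, Y_2, Y_3)}{\partial(Y_1, Y_2, Y_3)}\right|_{\mathbf{Y}=\Phi^t} \frac{\partial\left(\Phi^t_1, \Phi^t_2, \Phi^t_3\right)}{\partial(X_1, X_2, X_3)}\\
\frac{\partial\left(\Phi^t_1, F_2, \Phi^t_3\right)}{\partial(X_1, X_2, X_3)} &= \left.\frac{\partial(Y_1, F_2, Y_3)}{\partial(Y_1, Y_2, Y_3)}\right|_{\mathbf{Y}=\Phi^t} \frac{\partial\left(\Phi^t_1, \Phi^t_2, \Phi^t_3\right)}{\partial(X_1, X_2, X_3)}\\
\frac{\partial\left(\Phi^t_1, \Phi^t_2, F_3\right)}{\partial(X_1, X_2, X_3)} &= \left.\frac{\partial(Y_1, Y_2, F_3)}{\partial(Y_1, Y_2, Y_3)}\right|_{\mathbf{Y}=\Phi^t} \frac{\partial\left(\Phi^t_1, \Phi^t_2, \Phi^t_3\right)}{\partial(X_1, X_2, X_3)}
\end{aligned} \tag{56}$$
Each Jacobian in the Y variables reduces to a single partial derivative, for instance ∂(F1, Y2, Y3)/∂(Y1, Y2, Y3) = ∂F1/∂Y1,
so the three of them add up to ∇ · F evaluated at Φᵗ(X). It is then straightforward to see (again by computation)
that plugging (56) into (55) gives (54).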
We are now in a position to show that for an incompressible vector field, the associated flow map is volume (or measure) preserving, which is the main reason we need (51). We consider phase space in particular:
Proposition 3.3 $\Phi^t(\mathbf{X})$ is volume (or measure) preserving on phase space, that is:
\[
J(t) = 1, \quad \forall t \geq 0 \tag{57}
\]
Proof
We know that
\[
\frac{dJ(t)}{dt} = \nabla\cdot\mathbf{F}\left(\Phi^t(\mathbf{X})\right) J(t)
\]
Since $\nabla\cdot\mathbf{F} = 0$, we have that $J(t) = J(0)$, $\forall t \geq 0$. Using the initial value of the flow map given by (53) we get that $J(0)$ is the determinant of the identity matrix, thus $J(0) = 1$.
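The identity (54) and the conclusion (57) can be checked numerically in a simple setting. The sketch below is only an illustration and is not part of the original argument: it assumes a linear vector field F(X) = AX (the matrices are arbitrary choices), for which the Jacobian matrix of the flow map solves dM/dt = AM with M(0) = I, so that (54) predicts J(t) = exp(t tr A).

```python
import numpy as np

def jacobian_of_linear_flow(A, t, steps=2000):
    """Integrate dM/dt = A M, M(0) = I with the classical RK4 scheme.

    For the linear field F(X) = A X the flow map is X -> M(t) X, so M(t) is
    the Jacobian matrix of the flow map and det M(t) plays the role of J(t).
    """
    M = np.eye(A.shape[0])
    h = t / steps
    for _ in range(steps):
        k1 = A @ M
        k2 = A @ (M + 0.5 * h * k1)
        k3 = A @ (M + 0.5 * h * k2)
        k4 = A @ (M + h * k3)
        M = M + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return M

# Illustrative matrices (assumptions, not taken from the paper).
A_compressible = np.array([[0.3, 1.0], [-2.0, 0.1]])     # div F = trace = 0.4
A_incompressible = np.array([[0.5, 1.0], [-2.0, -0.5]])  # div F = trace = 0

t_final = 1.5
J_comp = np.linalg.det(jacobian_of_linear_flow(A_compressible, t_final))
J_incomp = np.linalg.det(jacobian_of_linear_flow(A_incompressible, t_final))

print("J(t) with div F = 0.4:", J_comp, " predicted:", np.exp(0.4 * t_final))
print("J(t) with div F = 0  :", J_incomp)
```

With a traceless matrix the divergence vanishes and the determinant stays at one up to integration error, which is the content of Proposition 3.3 in this special case.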
As mentioned in the introduction of this section the incompressibility condition assures no loss of information, in other words the flow map is invertible. Since we are interested in the behavior of an ensemble of solutions, it is natural to consider probability measures on phase space. Let a probability density function at time t = 0 be given by $P_0(\mathbf{X})$.
Definition 10 We define a probability density function $P(\mathbf{X}, t)$ by the pull-back of the initial probability density function with the flow map, that is:
\[
P(\mathbf{X}, t) \equiv P_0\left(\left(\Phi^t\right)^{-1}(\mathbf{X})\right) \tag{58}
\]
We shall now verify that this definition yields a probability measure. This will follow from the fact that the probability measure is transported by the vector field. More precisely we have:
Proposition 3.4 $P(\mathbf{X}, t) = P_0\left(\left(\Phi^t\right)^{-1}(\mathbf{X})\right)$ is transported by the vector field $\mathbf{F}$, or equivalently it satisfies the Liouville equation:
\[
\frac{\partial P}{\partial t} + \mathbf{F}\cdot\nabla_{\mathbf{X}} P = 0 \tag{59}
\]
and hence $P(\mathbf{X}, t)$ is a probability density function for all time.
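Proof We will use the transport theorem (Majda and Wang 2006), which states that for any function $f(\mathbf{X}, t) \in C^1[D]$ on a subset of phase space $D \subset \mathbb{R}^N$:
\[
\frac{\partial}{\partial t}\int_{\Phi^t(D)} f(\mathbf{X}, t)\, d\mathbf{X} = \int_{\Phi^t(D)} \left(\frac{\partial f}{\partial t} + \mathrm{div}(f\mathbf{F})\right) d\mathbf{X} \tag{60}
\]
By (51) this equation reduces to
\[
\frac{\partial}{\partial t}\int_{\Phi^t(D)} f(\mathbf{X}, t)\, d\mathbf{X} = \int_{\Phi^t(D)} \left(\frac{\partial f}{\partial t} + \mathbf{F}\cdot\nabla_{\mathbf{X}} f\right) d\mathbf{X} \tag{61}
\]
Let $f(\mathbf{X}, t) = P(\mathbf{X}, t)$ and use the definition (58):
\[
\begin{aligned}
\int_{\Phi^t(D)} P(\mathbf{X}, t)\, d\mathbf{X} &= \int_{\Phi^t(D)} P_0\left(\left(\Phi^t\right)^{-1}(\mathbf{X})\right) d\mathbf{X} \\
&= \int_{D} P_0(\mathbf{Y}) \left|\det \nabla_{\mathbf{Y}} \Phi^t(\mathbf{Y})\right| d\mathbf{Y} \\
&= \int_{D} P_0(\mathbf{Y})\, d\mathbf{Y}
\end{aligned} \tag{62}
\]
where we have used the substitution $\mathbf{X} = \Phi^t(\mathbf{Y})$ and (57) to replace the Jacobian of the substitution by one. The right-hand side of (62) does not depend on t, so by plugging (62) back into (61) we see that:
\[
0 = \frac{\partial}{\partial t}\int_{\Phi^t(D)} P(\mathbf{X}, t)\, d\mathbf{X} = \int_{\Phi^t(D)} \left(\frac{\partial P}{\partial t} + \mathbf{F}\cdot\nabla_{\mathbf{X}} P\right) d\mathbf{X} \tag{63}
\]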
Since D is arbitrary we conclude that
\[
\frac{\partial P}{\partial t} + \mathbf{F}\cdot\nabla_{\mathbf{X}} P = 0 \tag{64}
\]
Notice that implicit in definition (58) is the fact that $P(t)$ is a family of probability measures depending on the parameter t. To see that $P(t)$ is a probability density function for all time, we can use (58) to get that $P(\mathbf{X}, t) \geq 0$ and (62) with $D = \mathbb{R}^N$ to verify that
\[
\int_{\mathbb{R}^N} P(\mathbf{X}, t)\, d\mathbf{X} = \int_{\mathbb{R}^N} P_0(\mathbf{X})\, d\mathbf{X} = 1 \tag{65}
\]
is satisfied.
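As a concrete check of (58), (59) and (65), the following sketch evaluates the pull-back density on a grid for a field whose flow map is known in closed form. Everything specific here is an assumption made for illustration: the rotation field F = (X2, -X1), the Gaussian initial density, and the grid and step sizes.

```python
import numpy as np

SIGMA2 = 0.25  # variance of the illustrative initial Gaussian

def p0(y1, y2):
    """Initial density P0: an isotropic Gaussian centred at (1, 0)."""
    return np.exp(-((y1 - 1.0) ** 2 + y2 ** 2) / (2 * SIGMA2)) / (2 * np.pi * SIGMA2)

def density(x1, x2, t):
    """P(X, t) = P0((Phi^t)^{-1}(X)) for the rotation field F = (x2, -x1)."""
    c, s = np.cos(t), np.sin(t)
    # The inverse flow map rotates back by the angle t.
    y1 = c * x1 - s * x2
    y2 = s * x1 + c * x2
    return p0(y1, y2)

x = np.linspace(-5.0, 5.0, 801)
dx = x[1] - x[0]
X1, X2 = np.meshgrid(x, x, indexing="ij")

t, dt = 0.7, 1e-3
P = density(X1, X2, t)

# Normalisation (65): Riemann sum of P over the grid.
total_mass = P.sum() * dx * dx

# Liouville equation (59): dP/dt + F . grad P, by central differences.
dP_dt = (density(X1, X2, t + dt) - density(X1, X2, t - dt)) / (2 * dt)
dP_dx1, dP_dx2 = np.gradient(P, dx, dx)
transport = X2 * dP_dx1 - X1 * dP_dx2
max_residual = np.abs(dP_dt + transport).max()
max_transport = np.abs(transport).max()

print("total mass:", total_mass)
print("max |dP/dt + F.grad P|:", max_residual, " vs max |F.grad P|:", max_transport)
```

The residual of the Liouville equation is small compared with the size of the transport term itself; what remains is finite-difference error, not a failure of (59).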
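Corollary 3.1 If $G(P)$ is any differentiable function of the probability density P then
\[
\frac{\partial G(P)}{\partial t} + \mathbf{F}\cdot\nabla G(P) = 0 \tag{66}
\]
this further implies
\[
\frac{d}{dt}\int_{\mathbb{R}^N} G\left(P(\mathbf{X}, t)\right) d\mathbf{X} = 0 \tag{67}
\]
Proof
The total time derivative is
\[
\frac{dG(P)}{dt} = \frac{\partial G(P)}{\partial t} + \mathbf{F}\cdot\nabla G(P) = G'(P)\left(\frac{\partial P}{\partial t} + \mathbf{F}\cdot\nabla P\right) = 0 \tag{68}
\]
where we have used the chain rule as in (7) together with the Liouville equation (59); this proves (66). Integrating (68) over $\mathbb{R}^N$ and assuming that the integral and derivative can be exchanged (see footnote 10) we can write:
\[
\begin{aligned}
\frac{d}{dt}\int_{\mathbb{R}^N} G\left(P(\mathbf{X}, t)\right) d\mathbf{X} &= \int_{\mathbb{R}^N} \frac{\partial G(P)}{\partial t}\, d\mathbf{X} = -\int_{\mathbb{R}^N} \mathbf{F}\cdot\nabla G(P)\, d\mathbf{X} \\
&= -\int_{\mathbb{R}^N} \nabla\cdot\left[G(P)\mathbf{F}\right] d\mathbf{X} \\
&= -\lim_{r\to\infty}\oint_{\partial B_r(0)} \left[G(P)\mathbf{F}\right]\cdot d\mathbf{s} = 0
\end{aligned}
\]
where we have used (51) to get the second to last line and the divergence theorem to get the last one. Although we haven't said much on the precise nature of $G(P)\mathbf{F}$, we assume that it decays fast enough at infinity (faster than $r^{-(N-1)}$) for the surface integral over $\partial B_r(0)$ to vanish as $r \to \infty$.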
3.3 Conserved quantities and their ensemble averages
An essential ingredient for statistical mechanics is the existence of conserved quantities known as Casimirs; in the context of GFD they exist due to a relabeling symmetry of fluid mechanics (Salmon 1998). Here we will just assume our system has L conserved quantities E such that for $1 \leq l \leq L$:
\[
E_l\left(\Phi^t(\mathbf{X})\right) = E_l(\mathbf{X}) \tag{69}
\]
We will now show that the ensemble averages of these conserved quantities are conserved in time.
Footnote 10: A formal proof for a similar exchange between derivative and integral will follow in proposition 3.8.
Definition 11 We define the ensemble average of a conserved quantity as:
\[
\bar{E}_l = \langle E_l \rangle_P \equiv \int_{\mathbb{R}^N} E_l(\mathbf{X})\, P(\mathbf{X})\, d\mathbf{X} \tag{70}
\]
Remarks: Here $\bar{E}_l$ denotes a specific constant quantity; it will be used again when defining a set of probability measures that satisfy certain constraints (equation (73)).
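Proposition 3.5
\[
\langle E_l \rangle_{P(t)} = \langle E_l \rangle_{P_0}, \quad \forall t
\]
Proof
\[
\begin{aligned}
\langle E_l \rangle_{P(t)} &= \int_{\mathbb{R}^N} E_l(\mathbf{X})\, P(\mathbf{X}, t)\, d\mathbf{X} \\
&= \int_{\mathbb{R}^N} E_l(\mathbf{X})\, P_0\left(\left(\Phi^t\right)^{-1}(\mathbf{X})\right) d\mathbf{X} \\
&= \int_{\mathbb{R}^N} E_l\left(\Phi^t(\mathbf{Y})\right) P_0(\mathbf{Y})\, d\mathbf{Y} \\
&= \int_{\mathbb{R}^N} E_l(\mathbf{Y})\, P_0(\mathbf{Y})\, d\mathbf{Y} = \langle E_l \rangle_{P_0}
\end{aligned}
\]
where we have used property (57): the substitution $\mathbf{Y} = \left(\Phi^t\right)^{-1}(\mathbf{X})$ preserves volume, and that E is conserved in time as stated in (69).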
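Proposition 3.5 can be illustrated with a small Monte Carlo experiment. The sketch below is an assumption-laden toy example, not the geophysical system of this paper: it uses the rotation field F = (X2, -X1), whose exact flow map is a rotation, with conserved quantity E(X) = (X1^2 + X2^2)/2, and an arbitrary Gaussian ensemble for P0.

```python
import numpy as np

rng = np.random.default_rng(0)

def flow(X, t):
    """Exact flow map of dX/dt = F(X) = (X2, -X1): a rotation of phase space."""
    c, s = np.cos(t), np.sin(t)
    return np.stack([c * X[:, 0] + s * X[:, 1], -s * X[:, 0] + c * X[:, 1]], axis=1)

def energy(X):
    """Conserved quantity E(X) = (X1^2 + X2^2) / 2, so E(Phi^t(X)) = E(X)."""
    return 0.5 * (X ** 2).sum(axis=1)

# Ensemble drawn from an illustrative initial density P0.
X0 = rng.normal(loc=[1.0, 0.0], scale=0.5, size=(100_000, 2))

times = [0.0, 0.5, 1.0, 2.0]
mean_energy = [energy(flow(X0, t)).mean() for t in times]
mean_x1 = [flow(X0, t)[:, 0].mean() for t in times]  # not conserved, for contrast

for t, e, m in zip(times, mean_energy, mean_x1):
    print(f"t = {t:3.1f}   <E> = {e:.12f}   <X1> = {m:+.4f}")
```

The ensemble average of E is the same at every time up to rounding, while the average of the non-conserved coordinate X1 changes, which is the distinction the proposition draws.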
3.4 Shannon's Entropy
With a probability density function P on $\mathbb{R}^N$ we may use the definition of entropy.
Definition 12 The Shannon entropy S for the probability density function $P(\mathbf{X})$ on $\mathbb{R}^N$ that is absolutely continuous with respect to Lebesgue measure (see footnote 11) is defined by:
\[
S(P) \equiv -\int_{\mathbb{R}^N} P(\mathbf{X}) \ln P(\mathbf{X})\, d\mathbf{X} \tag{71}
\]
The entropy is a measure of the information we do not know (see Appendix E).
By setting $G(P) = -P \ln P$ in (67) we get that the Shannon entropy is conserved in time:
\[
S(P(t)) = S(P_0) \tag{72}
\]
The next step is to choose a probability measure that guarantees the least bias. The maximum entropy principle states that the probability density function with the least bias should be the one that maximizes the Shannon entropy (71), subject to the constraint set of conserved quantities as defined through (70). To this purpose we define the set:
\[
\mathcal{P} = \left\{ P \;\Big|\; P(\mathbf{X}) \geq 0, \; \int_{\mathbb{R}^N} P(\mathbf{X})\, d\mathbf{X} = 1, \; \langle E_l \rangle_P = \bar{E}_l, \; 1 \leq l \leq L \right\} \tag{73}
\]
Clearly, if we have an initial probability density function $P_0$, and we use it to define $\bar{E}_l$, then $\mathcal{P} \neq \emptyset$.
The maximum entropy principle predicts that the most probable probability density function $P^* \in \mathcal{P}$ is the one that satisfies:
\[
S(P^*) = \max_{P \in \mathcal{P}} S(P) \tag{74}
\]
Footnote 11: Let $(\Omega, \mathcal{A})$ be a measurable space and $\mu$ and $\nu$ measures on $\mathcal{A}$; then $\nu$ is absolutely continuous with respect to $\mu$ if $\mu(A) = 0 \Rightarrow \nu(A) = 0$. In this case Lebesgue measure zero implies that the probability measure P is also zero.
To solve this maximization problem we will use the Lagrange multiplier method (see Appendix F), for which we need to calculate the variational derivative of a functional. The definition of this derivative can be motivated by recalling the definition of the directional derivative of a function $f : \mathbb{R}^N \to \mathbb{R}$ in the direction of a vector $\nu$:
\[
\lim_{h\to 0} \frac{f(x + h\nu) - f(x)}{h} \equiv \nabla f(x)\cdot\nu \tag{75}
\]
Instead of the Euclidean inner product in (75), we will use the $L^2$ inner product between functions f and g over a domain D given by:
\[
\langle f | g \rangle \equiv \frac{1}{\lambda(D)}\int_D f\bar{g}\, d\lambda = \fint_D f\bar{g}\, d\lambda
\]
where the barred integral denotes the average over D. Let H be a Hilbert space, we define:
Definition 13 The variational derivative of a functional $F : H \to \mathbb{R}$ with respect to an infinitesimal continuous real-valued function $\delta u$ is given by the function $\delta F/\delta u$ that satisfies:
\[
\lim_{\epsilon\to 0} \frac{F(u + \epsilon\,\delta u) - F(u)}{\epsilon} = \left\langle \frac{\delta F}{\delta u} \,\Big|\, \delta u \right\rangle \tag{76}
\]
We notice that $\delta F/\delta u$ is analogous to the gradient of f in (75).
In general the variational derivative of $F(u) = \fint G(u)\, dx$ for some function $G \in C^1$ is given by:
\[
\lim_{\epsilon\to 0} \frac{F(u + \epsilon\,\delta u) - F(u)}{\epsilon} = \lim_{\epsilon\to 0} \fint \frac{G(u + \epsilon\,\delta u) - G(u)}{\epsilon}\, dx = \fint G'(u)\, \delta u\, dx \tag{77}
\]
(Exchanging the limit with the integral will be justified in proposition 3.8.) Thus we identify:
\[
\frac{\delta F}{\delta u} = G'(u) \tag{78}
\]
We can now use (78) with $G(P) = -P\ln P$ to calculate the variational derivative of (71) as:
\[
\frac{\delta S}{\delta P} = -\left(1 + \ln P\right) \tag{79}
\]
The constraints of this maximization problem are given in (73); in particular we calculate the variational derivative of the ensemble averages of the conserved quantities using (70):
\[
\frac{\delta \bar{E}_l}{\delta P} = E_l(\mathbf{X}) \tag{80}
\]
as well as the constraint that $P \in \mathcal{P}$ is a probability density over $\mathbb{R}^N$, which gives
\[
\frac{\delta}{\delta P}\int_{\mathbb{R}^N} P\, d\mathbf{X} = 1
\]
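The Lagrange multiplier method tells us that the probability density $P^*$ that maximizes the entropy S subject to the constraints $\bar{E}_l$ with $1 \leq l \leq L$, as well as the constraint that $P^*$ is a probability density function, is given by:
\[
-\left(1 + \ln P^*\right) = \theta_0 + \sum_{l=1}^{L} \theta_l E_l(\mathbf{X})
\]
Solving for $P^*$ we get:
\[
P^* = C \exp\left(-\sum_{l=1}^{L} \theta_l E_l(\mathbf{X})\right) \tag{81}
\]
Here $\theta_0$ has been absorbed into C. Assuming that $\exp\left(-\sum_{l=1}^{L} \theta_l E_l(\mathbf{X})\right)$ is integrable, we can write:
\[
C^{-1} \equiv I = \int_{\mathbb{R}^N} \exp\left(-\sum_{l=1}^{L} \theta_l E_l(\mathbf{X})\right) d\mathbf{X}
\]
With this definition for C (notice we chose $\theta_0 = \ln(I) - 1$ so that $C^{-1} = \exp(\theta_0 + 1) = I$) we call
\[
G_{\theta_l} \equiv P^* = C \exp\left(-\sum_{l=1}^{L} \theta_l E_l(\mathbf{X})\right) \tag{82}
\]
the Gibbs measure.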
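A discrete analogue makes the maximum entropy property of (82) easy to test. The sketch below is an illustration under stated assumptions, not the continuous problem treated above: phase space is replaced by six states with hypothetical energies, there is a single constraint, and the Gibbs weights are compared against randomly perturbed distributions that keep both the normalization and the mean energy fixed.

```python
import numpy as np

rng = np.random.default_rng(1)

energies = np.array([0.0, 0.5, 1.0, 1.5, 2.5, 4.0])  # hypothetical energy levels
theta = 0.7                                           # hypothetical multiplier

# Discrete analogue of (82): p_i = C exp(-theta * E_i).
gibbs = np.exp(-theta * energies)
gibbs /= gibbs.sum()

def entropy(p):
    """Discrete Shannon entropy -sum p ln p."""
    return -np.sum(p * np.log(p))

# Gradients of the two constraints: sum(p) = 1 and sum(p * E) fixed.
constraints = np.vstack([np.ones_like(energies), energies])

def feasible_perturbation(scale=0.01):
    """Random direction projected so that both constraints stay satisfied."""
    d = rng.normal(size=energies.size)
    coeffs = np.linalg.solve(constraints @ constraints.T, constraints @ d)
    d = d - constraints.T @ coeffs
    return scale * d / np.abs(d).max()

competitors = [gibbs + feasible_perturbation() for _ in range(1000)]
entropy_gap = [entropy(gibbs) - entropy(p) for p in competitors]

print("smallest entropy gap over 1000 competitors:", min(entropy_gap))
```

Every competitor that satisfies the same constraints has strictly lower entropy than the Gibbs weights, as expected from the strict concavity of the entropy.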
We will show that any smooth function of the conserved quantities $E_l$ of the ODE (52) is a steady state solution to the Liouville equation; in particular the Gibbs measure satisfies:
\[
\mathbf{F}\cdot\nabla_{\mathbf{X}} G_{\theta_l} = 0 \tag{83}
\]
Proposition 3.6 Let $E_j$ for $1 \leq j \leq J$ be conserved quantities of system (52). Any smooth function $G = G(E_1, ..., E_J)$ is a steady state solution of the Liouville equation.
Proof $E_j(\mathbf{X}(t))$ is conserved in time so for $1 \leq j \leq J$:
\[
0 = \frac{d}{dt} E_j(\mathbf{X}(t)) = \frac{\partial E_j}{\partial t} + \mathbf{F}\cdot\nabla_{\mathbf{X}} E_j = \mathbf{F}\cdot\nabla_{\mathbf{X}} E_j
\]
Using the chain rule on G we get the desired result:
\[
\mathbf{F}\cdot\nabla_{\mathbf{X}} G(E_1, ..., E_J) = \sum_{j=1}^{J} \frac{\partial G}{\partial E_j}\, \mathbf{F}\cdot\nabla_{\mathbf{X}} E_j = 0 \tag{84}
\]
Setting $G_{\theta_l} = G$ in (84) we get (83). We can use this result to show that any probability measure satisfying (83) is an invariant probability measure.
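Definition 14 A measure $\mu$ on $\mathbb{R}^N$ is said to be an invariant measure under the flow map $\Phi^t$ if
\[
\mu\left(\left(\Phi^t\right)^{-1}(D)\right) = \mu(D) \tag{85}
\]
for all measurable sets $D \subset \mathbb{R}^N$ and all time t.
Proposition 3.7 The Gibbs measure $G_{\theta_l}$ is an invariant measure of the system (52), in other words it is a stationary statistical solution to (52).
Proof
\[
\begin{aligned}
\frac{d}{dt}\int_{(\Phi^t)^{-1}D} G_{\theta_l}(\mathbf{X})\, d\mathbf{X} &= \frac{d}{dt}\int_{D} G_{\theta_l}\left(\left(\Phi^t\right)^{-1}(\mathbf{Y})\right) d\mathbf{Y} \\
&= -\int_{D} \left.\mathbf{F}\cdot\nabla_{\mathbf{X}} G_{\theta_l}(\mathbf{X})\right|_{\mathbf{X}=(\Phi^t)^{-1}(\mathbf{Y})} d\mathbf{Y} \\
&= 0
\end{aligned}
\]
We have used the substitution $\mathbf{X} = \left(\Phi^t\right)^{-1}(\mathbf{Y})$, the volume preserving property of the flow map, J = 1, and (83). The conclusion is that $\int_{(\Phi^t)^{-1}D} G_{\theta_l}(\mathbf{X})\, d\mathbf{X}$ is independent of time and therefore:
\[
\int_{(\Phi^t)^{-1}D} G_{\theta_l}(\mathbf{X})\, d\mathbf{X} = \int_{D} G_{\theta_l}(\mathbf{X})\, d\mathbf{X}
\]
which in turn implies that $G_{\theta_l}$ is an invariant probability measure of $\Phi^t$.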
3.5 Casimir's conservation laws.
Both the Euler and the quasi-geostrophic equations conserve an infinite number of functionals named Casimirs. They are of the form:
\[
C_s[q] = \int_D s(q)\, d\mathbf{x} \tag{86}
\]
where $q = q(\mathbf{x}, t)$ is either the vorticity or the potential vorticity, both of which are materially conserved, i.e. they satisfy Dq/Dt = 0. Throughout this paper we have exchanged a derivative with an integral without rigorous treatment. We will use this section to show how this can be done, while introducing Casimirs at the same time. Although we will not be referencing these conserved quantities any further, readers that follow up on the topic will invariably encounter them. Thus, in the author's mind at least, there is some added value in introducing them while we prove that the exchange of derivative and integral may be an acceptable practice.
Proposition 3.8 $C_s$ is materially conserved for any $s \in C^1[D]$. That is
\[
\frac{DC_s}{Dt} = 0
\]
Proof
We need to prove that
\[
\frac{D}{Dt} C_s[q] = \int_D s'(q) \underbrace{\left(\frac{\partial q}{\partial t} + \mathbf{u}\cdot\nabla q\right)}_{=0} d\mathbf{x} = 0 \tag{87}
\]
For this to be valid it is only necessary to justify the exchange of the time derivative with the integral over the fixed spatial domain D. The problem amounts to bringing the limit into the integral in:
\[
\frac{D}{Dt} C_s[q] = \lim_{h\to 0}\left[\int_D \frac{s(q(\mathbf{x}, t+h)) - s(q(\mathbf{x}, t))}{h}\, d\mathbf{x}\right] \tag{88}
\]
We would like to use the bounded convergence theorem (Appendix B.1). First we will assume that the domain is closed and bounded, but eventually we will relax this assumption. This will illustrate the differences between the two types of domain. A closed and bounded set $D \subset \mathbb{R}^n$ is compact, and any continuous function on a compact domain is uniformly bounded, thus s and Ds/Dt are uniformly bounded. Next define
\[
s_n(q(\mathbf{x}, t)) \equiv \frac{s\left(q(\mathbf{x}, t + \frac{1}{n})\right) - s(q(\mathbf{x}, t))}{\frac{1}{n}}
\]
From the Mean Value Theorem: $\forall n \in \mathbb{N}$ $\exists\, t^\star \in \left(t, t + \frac{1}{n}\right)$ such that $\left.Ds/Dt\right|_{(\mathbf{x}, t^\star)} = s_n(q(\mathbf{x}, t))$, therefore for some $0 < M \in \mathbb{R}$ we have $|s_n| < M$ $\forall n \in \mathbb{N}$. At this point the bounded convergence theorem applies and we can take the limit into the integral in (88) to get (87).
Remark: This proof can be extended to an unbounded (e.g. $D = \mathbb{R}^n$) or some other open domain by requiring that the derivative of the integrand is bounded and not only continuous. Next, the dominated convergence theorem could be used by finding a non-negative integrable function g such that $|s_n| \leq g$.
Remark: Usually $s(q) = q^n$, $n \in \mathbb{N}$, so the assumption $s \in C^1[D]$ implies no restriction.
4 Statistical theory for a simple Geophysical Flow.
We will now use the theory developed in section 3 to find the mean state of the Gibbs measure for a simple realistic barotropic flow with no forcing or dissipation as defined in section 2.12:
\[
\begin{aligned}
\frac{\partial q}{\partial t} + J(\psi, q) &= 0 \\
\frac{dV(t)}{dt} &= -\fint \frac{\partial h}{\partial x}\, \psi'
\end{aligned} \tag{89}
\]
where
\[
\begin{aligned}
q &= \Delta\psi' + h + \beta y, & \zeta &\equiv \Delta\psi' \\
\psi &= -V(t)\, y + \psi', & q' &\equiv \Delta\psi' + h \\
\mathbf{v} &= \nabla^{\perp}\psi = \begin{pmatrix} V(t) \\ 0 \end{pmatrix} + \nabla^{\perp}\psi'
\end{aligned}
\]
Remarks: We are using equation (22) for the relative vorticity $\zeta$. Here $\nabla^{\perp} = (-\partial/\partial x_2, \partial/\partial x_1)$. See section 2.13 for more details.
We can choose our units so that our domain is $D = [0, 2\pi] \times [0, 2\pi]$ with periodic boundary conditions.
We will use the statistical theory for the truncated quasi-geostrophic dynamics (89) originally developed by Kraichnan (1975), Salmon et al. (1976) as well as Carnevale and Frederiksen (1987): the first step is to write a Galerkin approximation to (89) where the equations are projected to a finite dimensional space. Periodic boundary conditions make Fourier series a natural option. Next we verify which quantities are conserved in the finite dimensional space and whether the truncated equations satisfy the Liouville property. These are the essential ingredients as we have seen in the previous section. Next we find the Gibbs measure for this system and compute its mean state. The last step is to take the limit as the number of dimensions of our space tends to infinity, thus obtaining the continuum limit.
4.1 Galerkin approximation
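We will use the truncated spatial Fourier series expansions of the small-scale stream function $\psi'$, the vorticity $\zeta$ and the topography h as spanned by the basis $B_\Lambda \equiv \left\{ e^{i\mathbf{k}\cdot\mathbf{x}} \;\big|\; 1 \leq |\mathbf{k}|^2 \leq \Lambda \right\}$ where $\mathbf{k} = (k_1, k_2) \in \mathbb{Z}^2$ so that:
\[
\begin{aligned}
\psi'_\Lambda &\equiv \sum_{1\leq|\mathbf{k}|^2\leq\Lambda} \hat{\psi}_{\mathbf{k}}(t)\, e^{i\mathbf{k}\cdot\mathbf{x}} = -\sum_{1\leq|\mathbf{k}|^2\leq\Lambda} \frac{1}{|\mathbf{k}|^2}\, \hat{\zeta}_{\mathbf{k}}(t)\, e^{i\mathbf{k}\cdot\mathbf{x}} \\
h_\Lambda &\equiv \sum_{1\leq|\mathbf{k}|^2\leq\Lambda} \hat{h}_{\mathbf{k}}\, e^{i\mathbf{k}\cdot\mathbf{x}} \\
\zeta_\Lambda &\equiv \sum_{1\leq|\mathbf{k}|^2\leq\Lambda} \hat{\zeta}_{\mathbf{k}}(t)\, e^{i\mathbf{k}\cdot\mathbf{x}} = \sum_{1\leq|\mathbf{k}|^2\leq\Lambda} -|\mathbf{k}|^2\, \hat{\psi}_{\mathbf{k}}\, e^{i\mathbf{k}\cdot\mathbf{x}}
\end{aligned} \tag{90}
\]
We note that the spatial Fourier transformation maps spatial derivatives to products, for example the Laplacian operator is mapped as $\Delta(\cdot) \mapsto -|\mathbf{k}|^2(\cdot)$ and $\partial/\partial x(\cdot) \mapsto ik_1(\cdot)$. Thus in the first and last line of (90) we have used (22).
For simplicity we have assumed zero mean for our functions. All the expanded functions are real-valued, i.e. their Fourier amplitudes satisfy $\hat{\psi}_{-\mathbf{k}} = \hat{\psi}^*_{\mathbf{k}}$, $\hat{h}_{-\mathbf{k}} = \hat{h}^*_{\mathbf{k}}$ and $\hat{\zeta}_{-\mathbf{k}} = \hat{\zeta}^*_{\mathbf{k}}$ where $(\cdot)^*$ denotes complex conjugation. The Fourier coefficients are defined in the usual way, for example:
\[
\hat{\psi}_{\mathbf{k}} \equiv \left\langle \psi \,\big|\, e^{i\mathbf{k}\cdot\mathbf{x}} \right\rangle = \fint_D \psi\, e^{-i\mathbf{k}\cdot\mathbf{x}}\, d\mathbf{x} = \frac{1}{4\pi^2}\int_0^{2\pi}\!\!\int_0^{2\pi} \psi\, e^{-i\mathbf{k}\cdot\mathbf{x}}\, dx_1\, dx_2
\]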
$B_\Lambda$ is an orthonormal basis under the above $L^2$ inner product, a basic result of Fourier analysis. Let $V_\Lambda = \mathrm{span}\{B_\Lambda\}$ and let $P_\Lambda$ denote the orthogonal projection onto $V_\Lambda$ (see footnote 12). We obtain the truncated dynamical equations for barotropic quasi-geostrophy by projecting (89) orthogonally onto $V_\Lambda$. This is a Galerkin approximation using a Fourier basis:
\[
\frac{\partial q'_\Lambda}{\partial t} + \beta\frac{\partial \psi'_\Lambda}{\partial x_1} + V\frac{\partial q'_\Lambda}{\partial x_1} + P_\Lambda\left(\nabla^{\perp}\psi'_\Lambda \cdot \nabla q'_\Lambda\right) = 0 \tag{91}
\]
\[
\frac{dV}{dt} - \fint h_\Lambda \frac{\partial \psi'_\Lambda}{\partial x_1} = 0 \tag{92}
\]
Remarks:
• $V_\Lambda$ is not closed under multiplication since for $|\mathbf{p}|^2 \leq \Lambda$, $|\mathbf{q}|^2 \leq \Lambda$ we may have $|\mathbf{p} + \mathbf{q}|^2 > \Lambda$ (note $\exp(i\mathbf{p}\cdot\mathbf{x}) \exp(i\mathbf{q}\cdot\mathbf{x}) = \exp(i(\mathbf{p} + \mathbf{q})\cdot\mathbf{x})$). Therefore the non-linearity in (91) needs special treatment when projecting the Fourier series onto $V_\Lambda$. This projection is made clear in the next set of equations when (91) and (92) are written in Fourier space.
• The integral in (92) is the result of integration by parts of (29).
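Plugging (90) into (91) and (92) we get an ODE system describing evolution in time. Thus we have that, for the Fourier coefficients satisfying $1 \leq |\mathbf{k}|^2 \leq \Lambda$:
\[
\frac{d\hat{\zeta}_{\mathbf{k}}}{dt} - \frac{i\beta k_1}{|\mathbf{k}|^2}\hat{\zeta}_{\mathbf{k}} + iVk_1\left(\hat{\zeta}_{\mathbf{k}} + \hat{h}_{\mathbf{k}}\right) - \sum_{\substack{\mathbf{p}+\mathbf{q}=\mathbf{k} \\ |\mathbf{p}|^2\leq\Lambda,\; |\mathbf{q}|^2\leq\Lambda}} \frac{\mathbf{p}^{\perp}\cdot\mathbf{q}}{|\mathbf{p}|^2}\, \hat{\zeta}_{\mathbf{p}}\left(\hat{\zeta}_{\mathbf{q}} + \hat{h}_{\mathbf{q}}\right) = 0 \tag{93}
\]
\[
\frac{dV}{dt} - i\sum_{1\leq|\mathbf{k}|^2\leq\Lambda} \frac{k_1}{|\mathbf{k}|^2}\, \hat{h}_{-\mathbf{k}}\, \hat{\zeta}_{\mathbf{k}} = 0 \tag{94}
\]
Remarks:
• The relations $\hat{q}_{\mathbf{k}} = \hat{\zeta}_{\mathbf{k}} + \hat{h}_{\mathbf{k}}$ and $\hat{\psi}_{\mathbf{k}} = -\hat{\zeta}_{\mathbf{k}}/|\mathbf{k}|^2$ have been used in (93).
• $\mathbf{q}$ and $\mathbf{p}$ are never zero since we have assumed zero mean for the Fourier expansion.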
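The spectral relations used above (the Laplacian acting as multiplication by -|k|^2, the reality condition on the amplitudes, and the truncation to 1 <= |k|^2 <= Λ) can be tried out with a discrete Fourier transform. The sketch below is only an illustration: the grid size and the test stream function are arbitrary choices, and `truncate` is a hypothetical helper name standing in for the projection P_Λ.

```python
import numpy as np

n = 32                                    # grid points per direction
x = 2 * np.pi * np.arange(n) / n          # periodic grid on [0, 2*pi)
X1, X2 = np.meshgrid(x, x, indexing="ij")

# Illustrative zero-mean stream function (not from the paper).
psi = np.sin(X1) * np.cos(2 * X2) + 0.5 * np.cos(3 * X1 + X2)

k = np.fft.fftfreq(n, d=1.0 / n)          # integer wavenumbers
K1, K2 = np.meshgrid(k, k, indexing="ij")
ksq = K1 ** 2 + K2 ** 2

# Coefficients normalised like (1/4 pi^2) * integral of psi * exp(-i k.x).
psi_hat = np.fft.fft2(psi) / n ** 2

# Laplacian in Fourier space: zeta_hat = -|k|^2 psi_hat.
zeta_hat = -ksq * psi_hat
zeta = np.real(np.fft.ifft2(zeta_hat * n ** 2))

# Inverse relation psi_hat = -zeta_hat / |k|^2 on the nonzero modes.
psi_hat_back = np.zeros_like(psi_hat)
nonzero = ksq > 0
psi_hat_back[nonzero] = -zeta_hat[nonzero] / ksq[nonzero]

def truncate(field_hat, Lam):
    """Projection onto V_Lambda: keep only the modes with 1 <= |k|^2 <= Lam."""
    return np.where((ksq >= 1) & (ksq <= Lam), field_hat, 0.0)

# Analytic Laplacian of the test function, for comparison.
zeta_exact = -5 * np.sin(X1) * np.cos(2 * X2) - 5 * np.cos(3 * X1 + X2)
print("max error in spectral Laplacian:", np.abs(zeta - zeta_exact).max())
```

With Λ = 5 the projection keeps the modes of sin(x1)cos(2 x2), for which |k|^2 = 5, and discards the mode k = (3, 1), for which |k|^2 = 10.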
4.2 Conserved quantities.
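As noted above the existence of conserved quantities is one of the basic ingredients needed for statistical mechanics. We will now prove that our truncated equations (91) and (92) indeed have two conserved quantities with which we can proceed. These quantities are the energy $E_\Lambda$ and the enstrophy $\mathcal{E}_\Lambda$; they were introduced as equations (39) and (40):
\[
E_\Lambda = \frac{1}{2}V^2 + \frac{1}{2}\fint \left|\nabla^{\perp}\psi'_\Lambda\right|^2 d\mathbf{x} = \frac{1}{2}V^2 + \frac{1}{2}\sum_{1\leq|\mathbf{k}|^2\leq\Lambda} |\mathbf{k}|^2 \left|\hat{\psi}_{\mathbf{k}}\right|^2 \tag{95}
\]
\[
\mathcal{E}_\Lambda = \beta V + \frac{1}{2}\fint \left(q'_\Lambda\right)^2 d\mathbf{x} = \beta V + \frac{1}{2}\sum_{1\leq|\mathbf{k}|^2\leq\Lambda} \left|-|\mathbf{k}|^2\hat{\psi}_{\mathbf{k}} + \hat{h}_{\mathbf{k}}\right|^2 \tag{96}
\]
Proposition 4.1 The truncated energy $E_\Lambda$ and enstrophy $\mathcal{E}_\Lambda$ are conserved in the finite-dimensional truncated dynamics.
Footnote 12: A brief introduction to orthogonal projections is given in Appendix G.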
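Proof
We need to prove that $\frac{dE_\Lambda}{dt} = 0$ and $\frac{d\mathcal{E}_\Lambda}{dt} = 0$ for all time. We start with energy:
\[
\begin{aligned}
\frac{dE_\Lambda}{dt} &= V\frac{\partial V}{\partial t} - \fint \psi'_\Lambda \frac{\partial \zeta_\Lambda}{\partial t}\, d\mathbf{x} \\
&= V\frac{\partial V}{\partial t} - \fint \psi'_\Lambda \frac{\partial q'_\Lambda}{\partial t}\, d\mathbf{x} \\
&= V\fint h_\Lambda \frac{\partial \psi'_\Lambda}{\partial x}\, d\mathbf{x} + \underbrace{\beta\fint \psi'_\Lambda \frac{\partial \psi'_\Lambda}{\partial x}\, d\mathbf{x}}_{=\frac{\beta}{2}\fint \frac{\partial}{\partial x}\left(\psi'_\Lambda\right)^2 d\mathbf{x} = 0} + V\fint \psi'_\Lambda \frac{\partial q'_\Lambda}{\partial x}\, d\mathbf{x} + \fint P_\Lambda\left(\nabla^{\perp}\psi'_\Lambda \cdot \nabla q'_\Lambda\right) \psi'_\Lambda\, d\mathbf{x} \\
&= \underbrace{V\fint h_\Lambda \frac{\partial \psi'_\Lambda}{\partial x}\, d\mathbf{x} + V\fint \psi'_\Lambda \frac{\partial h_\Lambda}{\partial x}\, d\mathbf{x}}_{=V\fint \frac{\partial}{\partial x}\left(h_\Lambda \psi'_\Lambda\right) d\mathbf{x} = 0} + \fint \left(\nabla^{\perp}\psi'_\Lambda \cdot \nabla q'_\Lambda\right) \psi'_\Lambda\, d\mathbf{x} \\
&= 0
\end{aligned}
\]
where we have used the relationships between our functions as described in (90), (91) and (92), as well as integration by parts. In the first line we used $\fint \partial\left(\tfrac{1}{2}\left|\nabla\psi'_\Lambda\right|^2\right)/\partial t = \fint \nabla\psi'_\Lambda \cdot \partial\nabla\psi'_\Lambda/\partial t = -\fint \psi'_\Lambda\, \partial\zeta_\Lambda/\partial t$. We have also used that the integral of the derivative of a periodic function is zero, as well as the product rule; in particular, in going from the third to the fourth line, $\fint \psi'_\Lambda\, \partial\zeta_\Lambda/\partial x\, d\mathbf{x} = -\tfrac{1}{2}\fint \partial\left|\nabla\psi'_\Lambda\right|^2/\partial x\, d\mathbf{x} = 0$, and the last term vanishes because $\left(\nabla^{\perp}\psi'_\Lambda \cdot \nabla q'_\Lambda\right)\psi'_\Lambda = \nabla\cdot\left(q'_\Lambda \nabla^{\perp}\tfrac{1}{2}\left(\psi'_\Lambda\right)^2\right)$ is the divergence of a periodic field. The orthogonal projection can be dropped under integration, since $\psi'_\Lambda \in V_\Lambda$, due to the orthogonality relation of the Fourier basis:
\[
\left\langle e^{i\mathbf{m}\cdot\mathbf{x}} \,\big|\, e^{i\mathbf{n}\cdot\mathbf{x}} \right\rangle = \begin{cases} 1 & \text{if } \mathbf{m} = \mathbf{n} \\ 0 & \text{if } \mathbf{m} \neq \mathbf{n} \end{cases}
\]
The same techniques used above will work on the time derivative of enstrophy to show it is zero.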
4.3 Non-linear stability of the exact solution to the truncated system
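Similar to what we did in section 2.15, it is easy to verify that the truncated equation has an exact solution when assuming a linear relationship
\[
\bar{q}_\Lambda = \mu\bar{\psi}_\Lambda \tag{97}
\]
where similar to section 2.15 this means we have:
\[
\Delta\bar{\psi}'_\Lambda + h_\Lambda = \mu\bar{\psi}'_\Lambda, \qquad \bar{V} = -\frac{\beta}{\mu} \tag{98}
\]
To assess non-linear stability, a positive quadratic form for the perturbations can be constructed from the truncated energy and enstrophy. Let $(\delta q_\Lambda, \delta V)$ be the perturbations; following section 2.15 and using (98) we have:
\[
\mu E_\Lambda(q_\Lambda, V) + \mathcal{E}_\Lambda(q_\Lambda, V) = \mu E_\Lambda\left(\bar{q}_\Lambda, \bar{V}\right) + \mathcal{E}_\Lambda\left(\bar{q}_\Lambda, \bar{V}\right) + W_\mu(\delta q_\Lambda, \delta V) \tag{99}
\]
where
\[
\begin{aligned}
W_\mu(\delta q_\Lambda, \delta V) &= \frac{\mu}{2}\delta V^2 + \frac{1}{2}\sum_{1\leq|\mathbf{k}|^2\leq\Lambda} \left(1 + \frac{\mu}{|\mathbf{k}|^2}\right) \left|\widehat{\delta q}_{\mathbf{k}}\right|^2 \\
&= \frac{\mu}{2}\left(V - \bar{V}\right)^2 + \frac{1}{2}\sum_{1\leq|\mathbf{k}|^2\leq\Lambda} |\mathbf{k}|^2\left(\mu + |\mathbf{k}|^2\right) \left|\hat{\psi}_{\mathbf{k}} - \hat{\bar{\psi}}_{\mathbf{k}}\right|^2
\end{aligned} \tag{100}
\]
Recall that $\delta q = q - \bar{q}$. The same arguments as in section 2.15 prove that the steady-state solution $\left(\bar{q}_\Lambda, \bar{V}\right)$ is non-linearly stable for $\mu > 0$.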
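4.4 Liouville property of the truncated system.
Choose a set $S = \{\mathbf{k}_1, ..., \mathbf{k}_M\}$ of the modes $\mathbf{k}$ satisfying $1 \leq |\mathbf{k}|^2 \leq \Lambda$ such that $\mathbf{k} \in S \Rightarrow -\mathbf{k} \notin S$ but $S \cup (-S) = \left\{\mathbf{k} : 1 \leq |\mathbf{k}|^2 \leq \Lambda\right\}$. Let $N = 2M + 1$ and define $\mathbf{X} \in \mathbb{R}^N$ by $\mathbf{X} \equiv \left(V, \mathrm{Re}\,\hat{\psi}_{\mathbf{k}_1}, \mathrm{Im}\,\hat{\psi}_{\mathbf{k}_1}, ..., \mathrm{Re}\,\hat{\psi}_{\mathbf{k}_M}, \mathrm{Im}\,\hat{\psi}_{\mathbf{k}_M}\right)$.
Now the entire state of the finite-dimensional system can be represented by a point $\mathbf{X} \in \mathbb{R}^N$ where $N \gg 1$. Equations (93) and (94) can be represented as:
\[
\frac{d\mathbf{X}}{dt} = \mathbf{F}(\mathbf{X}) \tag{101}
\]
with initial conditions $\mathbf{X}|_{t=0} = \mathbf{X}_0$. Under this form it becomes easy to prove that the Liouville property is satisfied since the vector field $\mathbf{F}$ satisfies:
\[
F_j(\mathbf{X}) = F_j(X_1, ..., X_{j-1}, X_{j+1}, ..., X_N) \tag{102}
\]
so that $\partial F_j/\partial X_j = 0$ for every j and hence $\nabla\cdot\mathbf{F} = 0$. In other words:
Proposition 4.2 $F_j$ does not depend on $X_j$ in (101).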
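Proof
Clearly $F_1$ is independent of $X_1 = V$. When thinking about $\mathbf{F}$, it is appropriate to remember the relationship $\hat{\zeta}_{\mathbf{k}} = -|\mathbf{k}|^2\hat{\psi}_{\mathbf{k}}$. Next notice that in (93), $F_2$ and $F_3$ correspond to $\mathrm{Re}\,\hat{\psi}_{\mathbf{k}_1}$ and $\mathrm{Im}\,\hat{\psi}_{\mathbf{k}_1}$ respectively. In general $F_{2j}$ and $F_{2j+1}$ correspond to $X_{2j} = \mathrm{Re}\,\hat{\psi}_{\mathbf{k}_j}$ and $X_{2j+1} = \mathrm{Im}\,\hat{\psi}_{\mathbf{k}_j}$ respectively. Clearly the linear terms in (101) are either independent of $\hat{\psi}_{\mathbf{k}_j}$ or only cause a rotation of $X_{2j}$ and $X_{2j+1}$ of the form:
\[
\frac{d}{dt}\begin{pmatrix} X_{2j} \\ X_{2j+1} \end{pmatrix} = C(\mathbf{k}_j)\begin{pmatrix} -X_{2j+1} \\ X_{2j} \end{pmatrix}
\]
where $C(\mathbf{k}_j)$ is a constant for each $\mathbf{k}_j$. Clearly, this rotation keeps $F_j$ from being dependent on $X_j$.
The nonlinear term in (93) has contribution zero from $\hat{\psi}_{\mathbf{k}_j}$ and $\hat{\psi}_{-\mathbf{k}_j}$ (i.e. the contribution from $X_{2j}$ and $X_{2j+1}$ is also zero) since the restriction on the summation implies that either $\mathbf{p} = 0$, $\mathbf{q} = \mathbf{k}_j$ or $\mathbf{q} = 0$, $\mathbf{p} = \mathbf{k}_j$ or $\mathbf{p} = 2\mathbf{k}_j$, $\mathbf{q} = -\mathbf{k}_j$ or $\mathbf{q} = 2\mathbf{k}_j$, $\mathbf{p} = -\mathbf{k}_j$; in any case $\mathbf{p}^{\perp}\cdot\mathbf{q} = 0$.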
4.5 Statistical predictions of the truncated system.
The truncated system has two conserved quantities, it satisfies the Liouville property and it is non-linearly stable to small perturbations. We can now use equation (82) for our least biased probability with the expressions for energy and enstrophy given by (95) and (96) to get that:
\[
G_{\alpha,\theta} = C\exp\left(-\alpha\left[\beta V + \frac{1}{2}\sum_{1\leq|\mathbf{k}|^2\leq\Lambda} \left|-|\mathbf{k}|^2\hat{\psi}_{\mathbf{k}} + \hat{h}_{\mathbf{k}}\right|^2\right] - \theta\left[\frac{1}{2}V^2 + \frac{1}{2}\sum_{1\leq|\mathbf{k}|^2\leq\Lambda} |\mathbf{k}|^2\left|\hat{\psi}_{\mathbf{k}}\right|^2\right]\right) \tag{103}
\]
where $\alpha$ and $\theta$ are the Lagrange multipliers for enstrophy and energy respectively; they are determined from the ensemble average enstrophy and energy constraints derived from (70). For (103) to be a probability measure some restrictions are needed; in particular, the coefficients of the quadratic terms in the exponent must be negative, which means that $\alpha|\mathbf{k}|^4 + \theta|\mathbf{k}|^2 > 0$ for each $\mathbf{k}$ satisfying $1 \leq |\mathbf{k}|^2 \leq \Lambda$. Additionally if $V \neq 0$ then $\theta > 0$ is also needed. Let
\[
\mu \equiv \frac{\theta}{\alpha}, \quad \alpha \neq 0 \tag{104}
\]
The above condition implies that either
\[
\alpha, \mu > 0 \tag{105}
\]
or
\[
V \equiv 0, \quad \alpha > 0, \quad \mu > -1 \tag{106}
\]
or
\[
\alpha < 0, \quad \mu < -\Lambda, \quad \theta > 0 \tag{107}
\]
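The last condition depends on the truncation so it is not physically relevant. Condition (106) is a possibility when there is no large scale zonal flow, but it will not be pursued in this paper. We focus on (105) under which we can use:
\[
\bar{V} = -\frac{\beta}{\mu}, \qquad \hat{\bar{\psi}}_{\mathbf{k}} = \frac{\hat{h}_{\mathbf{k}}}{\mu + |\mathbf{k}|^2} \tag{108}
\]
with
\[
\bar{\psi}'_\Lambda(\mathbf{x}, t) = \sum_{1\leq|\mathbf{k}|^2\leq\Lambda} \hat{\bar{\psi}}_{\mathbf{k}}\, e^{i\mathbf{k}\cdot\mathbf{x}} \tag{109}
\]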
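As we have seen in 4.3 the solution $\left(\bar{V}, \bar{\psi}'_\Lambda\right)$ under condition (105) is non-linearly stable. Thus we can write the Gibbs measure (103) in terms of (100), absorbing the constant $\mu E_\Lambda\left(\bar{q}_\Lambda, \bar{V}\right) + \mathcal{E}_\Lambda\left(\bar{q}_\Lambda, \bar{V}\right)$ from (99) into C, as:
\[
\begin{aligned}
G_{\alpha,\mu} &= C\exp\left(-\left(\alpha\mathcal{E}_\Lambda + \alpha\mu E_\Lambda\right)\right) \\
&= C\exp\left(-\alpha\left(\mathcal{E}_\Lambda + \mu E_\Lambda\right)\right) \\
&= C\exp\left(-\alpha W_\mu(\delta q, \delta V)\right) \\
&= C\exp\left(-\alpha\left[\frac{\mu}{2}\left(V - \bar{V}\right)^2 + \frac{1}{2}\sum_{1\leq|\mathbf{k}|^2\leq\Lambda} |\mathbf{k}|^2\left(|\mathbf{k}|^2 + \mu\right)\left|\hat{\psi}_{\mathbf{k}} - \hat{\bar{\psi}}_{\mathbf{k}}\right|^2\right]\right)
\end{aligned} \tag{110}
\]
or equivalently:
\[
G_{\alpha,\mu}(\mathbf{X}) = \prod_{j=1}^{N} G^j_{\alpha,\mu}(X_j)
\]
We see that the Gibbs measure for the dynamics is a product of Gaussian probabilities; the good news is that it is a measure with which we are well familiar. In fact calculating the ensemble average or mean state of $(V, \psi'_\Lambda)$ is straightforward, and it is given by:
\[
\langle\mathbf{X}\rangle = \int_{\mathbb{R}^N} \mathbf{X}\, G_{\alpha,\mu}(\mathbf{X})\, d\mathbf{X} = \left(\bar{V}, \hat{\bar{\psi}}_{\mathbf{k}_1}, \ldots, \hat{\bar{\psi}}_{\mathbf{k}_M}\right) \tag{111}
\]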
Thus, it turns out that the non-linearly stable steady state solution is in fact the ensemble average or mean state of the system. Assuming ergodicity, we can predict that for long-enough time T the time-average of the solutions converges to the non-linearly stable steady state solution of the QG truncated equations; more precisely we have:
\[
\lim_{T\to\infty} \frac{1}{T}\int_{T_0}^{T_0+T} \psi'_\Lambda(\mathbf{x}, t)\, dt = \bar{\psi}'_\Lambda \tag{112}
\]
\[
\lim_{T\to\infty} \frac{1}{T}\int_{T_0}^{T_0+T} V(t)\, dt = \bar{V} = -\frac{\beta}{\mu} \tag{113}
\]
Equations (112) and (113) show that, regardless of the initial conditions, a specific large-scale coherent mean flow will develop from the truncated equations (91), converging after a long enough time period to the most probable mean state, which is the non-linearly stable exact steady state solution. To see that this solution is a specific flow, recall that the steady state solution $\left(\bar{V}, \bar{\psi}'_\Lambda\right)$ is given by (108) and (109); for a given bottom topography h and a given constant $\mu$ the solution converges to a specific steady-state solution. The underlying physics explaining this is conservation of angular momentum: topography and vorticity are anti-correlated. More will be said about choosing $\mu$ in section 4.6.1.
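The mean state (108) is cheap to evaluate once a topography spectrum is given. The sketch below is a minimal illustration under stated assumptions: the values of Λ, µ and β and the topography coefficients are hypothetical, and `modes` and `mean_state` are helper names introduced here. It checks the mean field relation (98) mode by mode and the anti-correlation between the mean vorticity and the topography.

```python
import numpy as np

def modes(Lam):
    """All integer wavevectors k with 1 <= |k|^2 <= Lam."""
    K = int(np.floor(np.sqrt(Lam)))
    return [(k1, k2) for k1 in range(-K, K + 1) for k2 in range(-K, K + 1)
            if 1 <= k1 * k1 + k2 * k2 <= Lam]

def mean_state(h_hat, mu, beta):
    """Return (V_bar, psi_bar) from (108): V = -beta/mu, psi_k = h_k / (mu + |k|^2)."""
    V_bar = -beta / mu
    psi_bar = {k: h / (mu + k[0] ** 2 + k[1] ** 2) for k, h in h_hat.items()}
    return V_bar, psi_bar

# Hypothetical parameters and topography; h_hat[-k] = conj(h_hat[k]) keeps h real.
Lam, mu, beta = 8, 0.5, 1.0
h_hat = {k: (1.0 + 0.3j * k[0]) / (1 + k[0] ** 2 + k[1] ** 2) ** 2 for k in modes(Lam)}

V_bar, psi_bar = mean_state(h_hat, mu, beta)

# Mode-by-mode residual of (98): -|k|^2 psi_k + h_k - mu psi_k.
residual = max(abs(-(k[0] ** 2 + k[1] ** 2) * psi_bar[k] + h_hat[k] - mu * psi_bar[k])
               for k in h_hat)

# Correlation between mean vorticity zeta_k = -|k|^2 psi_k and topography.
correlation = sum((-(k[0] ** 2 + k[1] ** 2) * psi_bar[k] * np.conj(h_hat[k])).real
                  for k in h_hat)

print("V_bar =", V_bar, " residual of (98):", residual, " <zeta, h> =", correlation)
```

For µ > 0 the correlation is a sum of negative terms, which is the anti-correlation between vorticity and topography mentioned above.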
4.6 The limit Λ → ∞
It is when we take the limit $\Lambda \to \infty$ that equations (91) and (92) converge to (89), thus we are interested in investigating the asymptotic behavior of the invariant Gibbs measures as well as their mean states as $\Lambda$ increases; this is the continuum limit. How can the parameters $\alpha = \alpha_\Lambda$ and $\mu = \mu_\Lambda$ be chosen so that they satisfy the energy and enstrophy constraints as $\Lambda \to \infty$?
We begin by noticing that we do not need the truncated energy/enstrophy constraints to hold. It is in the limit of the cut-off wave number approaching infinity that the constraints need to be satisfied:
\[
\lim_{\Lambda\to\infty} \langle E_\Lambda \rangle = E_0, \qquad \lim_{\Lambda\to\infty} \langle \mathcal{E}_\Lambda \rangle = \mathcal{E}_0 \tag{114}
\]
Here it is possible to exchange the integral and the limit because $E_\Lambda \leq E$, $\forall\Lambda$ (the same holds for $\mathcal{E}$), and both the enstrophy and energy are bounded.
The second thing to notice is that the energy and enstrophy are related by an imposed constraint. The mean enstrophy $\mathcal{E}_0$ must be greater than the minimum enstrophy associated with a given energy level. To be precise:
\[
\mathcal{E}_0 > \min_{E(\psi)=E_0} \mathcal{E}(\psi) = \mathcal{E}_*(E_0) \tag{115}
\]
We only consider the case where the above inequality, as written, is strict. The reason is that when there is equality in (115) any state satisfying the energy/enstrophy constraint must be an enstrophy-minimizing (selective decay) state, which does not have much randomness and is therefore not of interest (see e.g. section 4.5 of Majda and Wang 2006 and references therein).
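Next we recall the definition of variance: $\mathrm{var}[x] \equiv \mathbb{E}\left[x^2\right] - \mathbb{E}[x]^2$. Because we know the Gaussian distribution well, it is straightforward to calculate that:
\[
\begin{aligned}
\int V^2\, G_{\alpha,\mu} &= (\alpha\mu)^{-1} + \frac{\beta^2}{\mu^2} \\
\int \left|\hat{\psi}_{\mathbf{k}}\right|^2 G_{\alpha,\mu} &= \left(\alpha|\mathbf{k}|^2\left(\mu + |\mathbf{k}|^2\right)\right)^{-1} + \left|\hat{\bar{\psi}}_{\mathbf{k}}\right|^2
\end{aligned} \tag{116}
\]
We can therefore observe that the ensemble average of the energy and enstrophy separate naturally into two parts: one that corresponds to the mean state and one that corresponds to the fluctuation part:
\[
\begin{aligned}
\langle E_\Lambda \rangle &= \langle E_\Lambda \rangle_G = \bar{E}_\Lambda + E'_\Lambda \\
\bar{E}_\Lambda &= \frac{1}{2}\left(\frac{\beta^2}{\mu^2} + \sum_{1\leq|\mathbf{k}|^2\leq\Lambda} \frac{|\mathbf{k}|^2\left|\hat{h}_{\mathbf{k}}\right|^2}{\left(\mu + |\mathbf{k}|^2\right)^2}\right) \\
E'_\Lambda &= \frac{1}{2\alpha}\left(\mu^{-1} + \sum_{1\leq|\mathbf{k}|^2\leq\Lambda} \frac{1}{\mu + |\mathbf{k}|^2}\right)
\end{aligned} \tag{117}
\]
while for the enstrophy
\[
\begin{aligned}
\langle \mathcal{E}_\Lambda \rangle &= \langle \mathcal{E}_\Lambda \rangle_G = \bar{\mathcal{E}}_\Lambda + \mathcal{E}'_\Lambda \\
\bar{\mathcal{E}}_\Lambda &= -\frac{\beta^2}{\mu} + \frac{1}{2}\sum_{1\leq|\mathbf{k}|^2\leq\Lambda} \frac{\mu^2\left|\hat{h}_{\mathbf{k}}\right|^2}{\left(\mu + |\mathbf{k}|^2\right)^2} \\
\mathcal{E}'_\Lambda &= \frac{1}{2\alpha}\sum_{1\leq|\mathbf{k}|^2\leq\Lambda} \frac{|\mathbf{k}|^2}{\mu + |\mathbf{k}|^2}
\end{aligned} \tag{118}
\]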
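The fluctuation part of the energy is isotropic and independent of the mean state. For large $\Lambda$ the summation may be exchanged with integration so that
\[
\begin{aligned}
E'_\Lambda &= \frac{1}{2\alpha\mu} + \frac{1}{2\alpha}\sum_{1\leq|\mathbf{k}|^2\leq\Lambda} \frac{1}{\mu + |\mathbf{k}|^2} \\
&\cong \frac{1}{2\alpha\mu} + \frac{2\pi}{2\alpha}\int_1^{\sqrt{\Lambda}} \frac{|\mathbf{k}|}{\mu + |\mathbf{k}|^2}\, d|\mathbf{k}| \\
&= \frac{1}{2\alpha\mu} + \frac{\pi}{2\alpha}\ln\left(\mu + |\mathbf{k}|^2\right)\Big|_1^{\sqrt{\Lambda}} \\
&= \frac{1}{2\alpha\mu} + \frac{\pi}{2\alpha}\ln\left(\frac{\mu + \Lambda}{\mu + 1}\right)
\end{aligned} \tag{119}
\]
where a transformation into polar coordinates has been done to carry out the double integration.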
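Likewise the fluctuation part of the total enstrophy can be estimated as:
\[
\begin{aligned}
\mathcal{E}'_\Lambda &= \frac{1}{2\alpha}\sum_{1\leq|\mathbf{k}|^2\leq\Lambda} \frac{|\mathbf{k}|^2}{\mu + |\mathbf{k}|^2} \\
&\cong \frac{2\pi}{2\alpha}\int_1^{\sqrt{\Lambda}} \frac{|\mathbf{k}|^3}{\mu + |\mathbf{k}|^2}\, d|\mathbf{k}| \\
&= \frac{\pi}{\alpha}\int_1^{\sqrt{\Lambda}} \frac{|\mathbf{k}|\left(\mu + |\mathbf{k}|^2\right) - \mu|\mathbf{k}|}{\mu + |\mathbf{k}|^2}\, d|\mathbf{k}| \\
&= \frac{\pi}{2\alpha}\left[\Lambda - 1 - \mu\ln\left(\frac{\mu + \Lambda}{\mu + 1}\right)\right]
\end{aligned} \tag{120}
\]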
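The replacement of the lattice sums by integrals in (119) and (120) is a leading-order statement for large Λ, and it can be probed directly. The sketch below compares the two sums with their integral estimates; the value of µ and the cut-offs are arbitrary choices, the common factor 1/(2α) is left out, and the function names are introduced here for illustration.

```python
import numpy as np

def lattice_ksq(Lam):
    """Values of |k|^2 over all integer k with 1 <= |k|^2 <= Lam."""
    K = int(np.floor(np.sqrt(Lam)))
    k = np.arange(-K, K + 1)
    K1, K2 = np.meshgrid(k, k, indexing="ij")
    ksq = (K1 ** 2 + K2 ** 2).ravel()
    return ksq[(ksq >= 1) & (ksq <= Lam)].astype(float)

def fluctuation_sums(mu, Lam):
    """Lattice sums appearing in (119) and (120), without the factor 1/(2 alpha)."""
    ksq = lattice_ksq(Lam)
    return np.sum(1.0 / (mu + ksq)), np.sum(ksq / (mu + ksq))

def fluctuation_integrals(mu, Lam):
    """Polar-coordinate estimates from (119) and (120), same normalisation."""
    log_term = np.log((mu + Lam) / (mu + 1.0))
    return np.pi * log_term, np.pi * (Lam - 1.0 - mu * log_term)

mu = 1.0  # illustrative value
ratios = {}
for Lam in (10 ** 2, 10 ** 3, 10 ** 4):
    sums = fluctuation_sums(mu, Lam)
    integrals = fluctuation_integrals(mu, Lam)
    ratios[Lam] = (sums[0] / integrals[0], sums[1] / integrals[1])
    print(f"Lambda = {Lam:6d}   energy ratio = {ratios[Lam][0]:.4f}"
          f"   enstrophy ratio = {ratios[Lam][1]:.6f}")
```

The enstrophy estimate, which grows like Λ, is accurate to a small fraction of a percent, while the energy estimate grows only like ln Λ and so differs from the lattice sum by a relative amount that shrinks slowly, roughly like 1/ln Λ. Both behaviours are consistent with the asymptotic use made of (119) and (120) in what follows.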
With this we can see what is needed for the parameters $\alpha$ and $\mu$. If they are bounded independently of $\Lambda$ then, as $\Lambda \to \infty$, the fluctuation parts of the energy and enstrophy will go to infinity, which contradicts the energy constraint. It can be shown that the parameter $\alpha$ must approach infinity, while $\mu$ should remain bounded since large $\mu$ implies small geophysical influence, which contradicts physical reality (recall Coriolis acceleration is often one of the terms in the leading geophysical balance). By the realizability condition (105) and the explicit formula for the energy ensemble average (117) we have that $E'_\Lambda > 0$; this further implies that for large $\Lambda$ the asymptotic behavior is:
\[
\frac{\beta^2}{2\mu_\Lambda^2} \leq \bar{E}_\Lambda \leq E_0 \tag{121}
\]
from where we can establish a lower bound on $\mu_\Lambda$:
\[
\mu_\Lambda \geq \frac{\beta}{\sqrt{2E_0}} \tag{122}
\]
Substituting into the mean-state part of the enstrophy in (118) we see that, again asymptotically for large $\Lambda$,
\[
\bar{\mathcal{E}}_\Lambda \geq -\frac{\beta^2}{\mu_\Lambda} \geq -\beta\sqrt{2E_0} \tag{123}
\]
With this, the fluctuation part of the ensemble enstrophy can be asymptotically bounded:
\[
\mathcal{E}'_\Lambda \leq \mathcal{E}_0 - \bar{\mathcal{E}}_\Lambda \leq \mathcal{E}_0 + \beta\sqrt{2E_0} \tag{124}
\]
Now that $\mu_\Lambda$ is bounded above and positive from the realizability condition (105), it follows that
\[
\lim_{\Lambda\to\infty} \frac{\ln\left(\frac{\mu_\Lambda + \Lambda}{\mu_\Lambda + 1}\right)}{\Lambda} = \lim_{\Lambda\to\infty} \frac{\ln\Lambda}{\Lambda} = 0 \tag{125}
\]
Combining (125), (124) and (120) we have that asymptotically for large $\Lambda$
\[
\alpha_\Lambda \geq \frac{\pi\Lambda}{2\left(\mathcal{E}_0 + \beta\sqrt{2E_0}\right)} \tag{126}
\]
Use (126), (122) and the boundedness of $\mu$ in (119) to deduce that
\[
E'_\Lambda \to 0, \quad \text{as } \Lambda \to \infty \tag{127}
\]
Thus we conclude that all energy resides in the mean state asymptotically for large $\Lambda$. We now investigate how to choose the parameters, and the asymptotic behaviour of the parameters as well as of the mean states.
4.6.1 Choosing µ = µΛ
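It is possible to find a unique $\mu = \mu_\Lambda > 0$ such that
\[
\bar{E}_\Lambda = E\left(\bar{\psi}'_{\mu_\Lambda}, \bar{V}_{\mu_\Lambda}\right) = E_0 \tag{128}
\]
Then $\left(\bar{\psi}'_{\mu_\Lambda}, \bar{V}_{\mu_\Lambda}\right)$ satisfy
\[
\Delta\bar{\psi}'_{\mu_\Lambda} + h_\Lambda = \mu_\Lambda\bar{\psi}'_{\mu_\Lambda}, \qquad \bar{V}_{\mu_\Lambda} = -\frac{\beta}{\mu_\Lambda} \tag{129}
\]
Clearly $\mu_\Lambda$ needs to be a non-decreasing function of $\Lambda$ since for fixed $\mu$ the mean state energy $\bar{E}_\Lambda$ in (117) is a monotonically increasing function of $\Lambda$, while for fixed $\Lambda$ it decreases with $\mu$. It is easy to check that there exists a $\mu = \mu_\infty > 0$ such that
\[
\lim_{\Lambda\to\infty} \mu_\Lambda = \mu \tag{130}
\]
in a monotonic (non-decreasing) fashion. In terms of statistical mechanics we then have a positive "temperature" for all energy levels.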
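Since the mean state energy in (117) decreases monotonically with µ, the value µ_Λ solving (128) can be found by bisection. The sketch below is a minimal illustration under stated assumptions: the topography spectrum, β and E0 are hypothetical choices, and the function names are introduced here. It solves (128) for several cut-offs and shows how µ_Λ behaves as Λ grows.

```python
import numpy as np

def lattice_ksq(Lam):
    """Values of |k|^2 over all integer k with 1 <= |k|^2 <= Lam."""
    K = int(np.floor(np.sqrt(Lam)))
    k = np.arange(-K, K + 1)
    K1, K2 = np.meshgrid(k, k, indexing="ij")
    ksq = (K1 ** 2 + K2 ** 2).ravel()
    return ksq[(ksq >= 1) & (ksq <= Lam)].astype(float)

def h_hat_abs2(ksq):
    """Hypothetical topography spectrum |h_k|^2, decaying with |k|."""
    return 1.0 / (1.0 + ksq) ** 4

def mean_energy(mu, Lam, beta):
    """Mean state energy from (117) for the hypothetical spectrum above."""
    ksq = lattice_ksq(Lam)
    return 0.5 * (beta ** 2 / mu ** 2 + np.sum(ksq * h_hat_abs2(ksq) / (mu + ksq) ** 2))

def solve_mu(E0, Lam, beta, lo=1e-6, hi=1e6, iterations=200):
    """Geometric bisection for mean_energy(mu) = E0; the energy decreases with mu."""
    for _ in range(iterations):
        mid = np.sqrt(lo * hi)
        if mean_energy(mid, Lam, beta) > E0:
            lo = mid   # energy too large: the root lies at larger mu
        else:
            hi = mid
    return np.sqrt(lo * hi)

beta, E0 = 1.0, 1.0
cutoffs = [1, 2, 5, 10, 50, 200]
mu_values = [solve_mu(E0, Lam, beta) for Lam in cutoffs]

for Lam, mu in zip(cutoffs, mu_values):
    print(f"Lambda = {Lam:4d}   mu_Lambda = {mu:.10f}")
print("lower bound beta / sqrt(2 E0) =", beta / np.sqrt(2 * E0))
```

In this example the computed values never decrease with Λ and stay above the lower bound (122); because the assumed spectrum decays quickly, they settle after the first few cut-offs.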
4.6.2 The limit of the mean states.
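The mean state also converges; it is easy to check using (108), (109) and (130) that
\[
\bar{\psi}'_{\mu_\Lambda} \to \bar{\psi}'_{\mu} = \sum \frac{\hat{h}_{\mathbf{k}}}{\mu + |\mathbf{k}|^2}\, e^{i\mathbf{k}\cdot\mathbf{x}}, \qquad \bar{V}_{\mu_\Lambda} \to \bar{V}_{\mu} = -\frac{\beta}{\mu} \tag{131}
\]
where $\mu = \mu_\infty > 0$ is the unique $\mu \in (0, \infty)$ such that the energy constraint is met by the limit mean state $\left(\bar{\psi}'_{\mu}, \bar{V}_{\mu}\right)$, that is
\[
E\left(\bar{\psi}'_{\mu}, \bar{V}_{\mu}\right) = E_0 \tag{132}
\]
It is also true that thanks to (131) and (132), the limit mean state satisfies the limit mean field equation
\[
\Delta\bar{\psi}'_{\mu} + h = \mu\bar{\psi}'_{\mu}, \qquad \bar{V}_{\mu} = -\frac{\beta}{\mu} \tag{133}
\]
which means it is non-linearly stable according to section 2.15 (see e.g. proposition 2.2).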
4.6.3 Choosing α = αΛ and the enstrophy constraint
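For a given energy $E_0$ there exists a minimal enstrophy, $\mathcal{E}_*(E_0)$, which is equal to $\mathcal{E}\left(\bar{\psi}'_{\mu}, \bar{V}_{\mu}\right)$ since $\left(\bar{\psi}'_{\mu}, \bar{V}_{\mu}\right)$ is the enstrophy-minimizing state (or selective decay state) with the topography h as well as with $\beta$. On the other hand thanks to (131) - (133)
\[
\lim_{\Lambda\to\infty} \bar{\mathcal{E}}_\Lambda = \mathcal{E}\left(\bar{\psi}'_{\mu}, \bar{V}_{\mu}\right) = \mathcal{E}_*(E_0) \tag{134}
\]
Combining (115) and (134) we have that for large $\Lambda$
\[
\mathcal{E}_0 > \bar{\mathcal{E}}_\Lambda = \mathcal{E}\left(\bar{\psi}'_{\mu_\Lambda}, \bar{V}_{\mu_\Lambda}\right) \tag{135}
\]
We now pick an $\alpha_\Lambda$ so that the enstrophy constraint is satisfied for all truncations, i.e.
\[
\mathcal{E}_0 = \langle \mathcal{E}_\Lambda \rangle = \bar{\mathcal{E}}_\Lambda + \mathcal{E}'_\Lambda \tag{136}
\]
which, by (120), amounts to requiring
\[
\alpha_\Lambda \cong \frac{\pi}{2}\frac{\Lambda}{\mathcal{E}_0 - \bar{\mathcal{E}}_\Lambda} \tag{137}
\]
for $\Lambda \gg 1$. Note that $\alpha_\Lambda \to \infty$ as $\Lambda \to \infty$.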
4.6.4 The energy constraint
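As mentioned the energy fluctuation goes to zero as $\Lambda \to \infty$. To be precise use (137) in (119) to get
\[
E'_\Lambda \cong \frac{\pi\ln\Lambda}{2\alpha_\Lambda} = \left(\mathcal{E}_0 - \bar{\mathcal{E}}_\Lambda\right)\frac{\ln\Lambda}{\Lambda} \to 0 \quad \text{as } \Lambda \to \infty \tag{138}
\]
Combining (138) with (128) and (117) we can conclude that the energy constraint is satisfied in the sense of (114).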
5 References
Bennett, A. (2006). Lagrangian Fluid Dynamics. Cambridge University Press.
Bouchet, F., A. Venaille (2011). Statistical mechanics of two-dimensional and geophysical flows. arXiv:1110.6245v1 [physics.flu-dyn]. http://arxiv.org/abs/1110.6245
Also published in Elsevier.
Bouchet, F., A. Venaille (2012). Applications of equilibrium statistical mechanics to atmospheres and oceans. See also http://perso.ens-lyon.fr/antoine.venaille/
Carnevale, G.F., Frederiksen, J.S. (1987). Nonlinear stability and statistical mechanics of flow over topography. J. Fluid. Mech. 175, 157-181.
Cushman-Roisin B., Beckers J. (2011). Introduction to Geophysical Fluid Dynamics. 2nd edition. Academic
Press.
DeCaria A. J. , T . D. Sikora (2010). Momentum Advection and the Gradient of a Vector Field: A Discussion
of Standard Notation. Journal of Atmospheric Sciences. DOI: 10.1175/2009JAS3393.1
Dubinkina S. (2010). Statistical Mechanics and Numerical Modeling of Geophysical Fluid Dynamics. PhD thesis
Universiteit van Amsterdam. Center for Computer Science and Mathematics (CWI). Thomas Stieltjes
Institute for Mathematics. Published in the Journal of Computational Physics.
Durrett, R. (1996). Probability: theory and examples. 2nd ed. Duxbury Press.
Ertel, H. (1942). Ein neuer hydrodynamischer Wirbelsatz. Meteorologische Zeitschrift, 59, 277-281.
Eyink G. L. and K. R. Sreenivasan (2006). Onsager and the theory of hydrodynamic turbulence, Rev. Mod.
Phys. 78, 87-135
Gardiner-Garden R. S. (1991). New vertical modes for Dissipative Stratified Circulations with applications to Coastal Upwelling. Jour. Geoph. Res. Vol. 96. No. C5. page 8811-8822.
Gluss, David and Weisstein, Eric W. "Lagrange Multiplier." From MathWorld, A Wolfram Web Resource. http://mathworld.wolfram.com/LagrangeMultiplier.html
Holloway, G. (1986). Eddies, waves, circulation and mixing: Statistical Geofluid Mechanics. Ann. Rev. Fluid Mech. 18: 91-147
Kundu P. K., I. M. Cohen (2008). Fluid Mechanics 4th Edition. Academic Press.
Kraichnan, R. H. (1975). Statistical dynamics of two-dimensional flow. J. Fluid Mech. 67, 155-175.
Majda A. J. (2003). Introduction to PDE's and Waves for the Atmosphere and Ocean. Courant lecture notes in
Mathematics. American Mathematical Society.
Majda A. J., X. Wang (2006). Nonlinear Dynamics and Statistical Theories for Basic Geophysical Flows.
Cambridge University Press.
Meyer, C. D. (2000). Matrix analysis and applied linear algebra. Society for Industrial and Applied Mathematics.
Merryfield W. J., P. F. Cummins & G. Holloway (2001). Equilibrium Statistical Mechanics of Barotropic Flow
over Finite Topography. Journal of Physical Oceanography. Vol. 31, 1880-1890.
McDonald, J.N., N.A. Weiss. A course in Real Analysis. Academic Press 1999.
Modern Physics Course: Statistical Mechanics. Stanford University Youtube Channel
http://youtu.be/H1Zbp6__uNw
Pedlosky J. (1987) Geophysical Fluid Dynamics. 2nd Edition. Springer.
Price, J. F. (2006). Lagrangian and Eulerian representation of Fluid Flow: Kinematics and the Equations of
Motion. Woods Hole Oceanographic Institution.
http://www.whoi.edu/science/PO/people/jprice
Randall, D. A., (2010). The Evolution of Complexity in General Circulation Models. In: The Development
of Atmospheric General Circulation Models: Complexity, Synthesis, and Computation, L. Donner, W.
Schubert, and R. C. J. Somerville, Eds. Cambridge University Press, 272 pp
Salmon, R., Holloway, G. and Hendershott, M. (1976). The equilibrium statistical mechanics of simple quasigeostrophic models. J. Fluid Mech. 75, 691-703.
Salmon, R. (1998) Lectures on Geophysical Fluid Dynamics, Oxford University Press.
Samelson, R. M., S. Wiggins (2006). Lagrangian transport in Geophysical Jets and Waves: the dynamical
systems approach. Springer.
Samelson, R. M. (2011). The theory of large-scale ocean circulation. Cambridge University Press.
Samelson, R. M. (2012). Lagrangian motion, coherent structures, and lines of persistent material strain. Ann.
Rev. Mar. Sci. Accepted for publication.
Sarig, O. (2008) Lecture Notes on Ergodic Theory - Graduate course at Penn State University.
http://www.math.psu.edu/sarig/506/ErgodicNotes.pdf
Shimizu, K., 2011: A Theory of Vertical Modes in Multilayer Stratified Fluids. J. Phys. Oceanogr., 41, 1694-1707.
doi: http://dx.doi.org/10.1175/2011JPO4546.1
Simić, S.N. (2005) Notes for Math 134. San Jose State University
http://www.math.sjsu.edu/~simic/Fall05/Math134/flows.pdf
Vallis, G. K. 2006 Atmospheric and Oceanic Fluid Dynamics. Cambridge University Press.
Venaille A., G. K. Vallis and S. M. Griffies (2012). The catalytic role of beta effect in barotropization processes.
arXiv:1201.0657v1 [physics.flu-dyn].
A Existence and Uniqueness to Poisson's equation
A.1 Green's identities
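Suppose that $\psi, \phi \in C^2(D) \cap C^1(\bar{D})$, where $D$ is a bounded normal domain with boundary $B$. Starting from
the Gauss Divergence theorem, the following identities, known as Green's first, second and third identities, are
obtained:
$$\int_D \left[\phi\Delta\psi + \nabla\phi\cdot\nabla\psi\right] dx = \int_B \phi\,\frac{\partial\psi}{\partial n}\, d\sigma \tag{139}$$
where $\frac{\partial\psi}{\partial n} \equiv \nabla\psi\cdot n$ is the outward normal derivative.
$$\int_D \left[\phi\Delta\psi - \psi\Delta\phi\right] dx = \int_B \left[\phi\,\frac{\partial\psi}{\partial n} - \psi\,\frac{\partial\phi}{\partial n}\right] d\sigma \tag{140}$$
$$\int_D \Delta\psi\, dx = \int_B \frac{\partial\psi}{\partial n}\, d\sigma \tag{141}$$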
A.2 Maximum-Minimum principle
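Let $D$ be a bounded domain and $\psi \in C^2(D) \cap C^1(\bar{D})$:
1. if $\Delta\psi \geq 0$ in $D$, then $\psi(x) \leq \max_{y\in B}\psi(y)$ for $x \in D$
2. if $\Delta\psi \leq 0$ in $D$, then $\psi(x) \geq \min_{y\in B}\psi(y)$ for $x \in D$
3. if $\Delta\psi = 0$ in $D$, then $\min_{y\in B}\psi(y) \leq \psi(x) \leq \max_{y\in B}\psi(y)$ for $x \in D$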
A.3 Existence and Uniqueness for Dirichlet and Robin Conditions in a bounded domain
Let $D$ be a bounded normal domain; its closure $\bar{D} \equiv D \cup B$ includes the boundary $B$. Consider the problem
$$\Delta\psi = \zeta, \quad x \in D$$
If $B$ is twice continuously differentiable then there is at most one function $\psi \in C^2(D) \cap C^1(\bar{D})$ that solves (22)
with either:
$$\psi = f(x), \quad x \in B \quad \text{(Dirichlet)}$$
or
$$\frac{\partial\psi}{\partial n} + \alpha(x)\psi = f(x), \quad x \in B \quad \text{(Robin)}$$
where $\zeta$ is the known vorticity in equation (22), $f(x)$ is a given function and $\alpha(x) \geq 0$ is bounded, continuous
and not identically zero.
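Proof: Let $\psi_1$ and $\psi_2$ be solutions to the Robin problem and define $\psi = \psi_1 - \psi_2$; then $\psi$ satisfies $\Delta\psi = 0$, $x \in D$
and $\partial\psi/\partial n + \alpha\psi = 0$, $x \in B$. Take $\phi = \psi$ in (139):
$$\int_D |\nabla\psi|^2\, dx = \int_B \psi\,\frac{\partial\psi}{\partial n}\, d\sigma = -\int_B \alpha\psi^2\, d\sigma$$
The first integral is non-negative and the last expression is non-positive, so both must vanish. Hence $\nabla\psi = 0$ in $D$,
so $\psi$ is constant, and since $\alpha \geq 0$ is continuous and not identically zero, $\int_B \alpha\psi^2\, d\sigma = 0$ forces that constant
to be zero; this implies $\psi = 0$. Similar reasoning proves uniqueness for the Dirichlet problem.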
A.4 Existence and Uniqueness for Neumann Conditions
Under the same conditions as above it can be shown that two solutions of the Neumann problem:
$$\Delta\psi = \zeta, \quad x \in D$$
$$\frac{\partial\psi}{\partial n} = f(x), \quad x \in B$$
differ at most by a constant. Because of the boundary condition, a further necessary condition for the existence of a
solution is that Green's third identity (141) holds, that is, $\int_D \zeta\, dx = \int_B f\, d\sigma$. In particular, when the normal
derivative is zero (no normal flow through the boundary), the integral of the Laplacian over the domain $D$ must be zero.
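As a numerical illustration of the compatibility condition, the following sketch checks Green's third identity (141) on the unit disk for the test function $\psi = x^3 + y^2$. The disk, the test function and the midpoint quadrature are assumptions chosen only for illustration and are not taken from the references; for this choice both sides of (141) equal $2\pi$.

```python
import math

def laplacian_psi(x, y):
    # Laplacian of the illustrative test function psi = x**3 + y**2
    return 6.0 * x + 2.0

def normal_derivative_psi(theta):
    # On the unit circle the outward normal is n = (cos(theta), sin(theta))
    # and grad(psi) = (3 x**2, 2 y), so dpsi/dn = grad(psi) . n
    x, y = math.cos(theta), math.sin(theta)
    return 3.0 * x * x * x + 2.0 * y * y

def integrate_disk(f, n_r=200, n_theta=400):
    # Midpoint rule in polar coordinates over the unit disk (area element r dr dtheta)
    dr = 1.0 / n_r
    dth = 2.0 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        for j in range(n_theta):
            th = (j + 0.5) * dth
            total += f(r * math.cos(th), r * math.sin(th)) * r * dr * dth
    return total

def integrate_circle(g, n_theta=400):
    # Midpoint rule over the unit circle (arc length element dtheta)
    dth = 2.0 * math.pi / n_theta
    return sum(g((j + 0.5) * dth) for j in range(n_theta)) * dth

lhs = integrate_disk(laplacian_psi)            # integral of the Laplacian over D
rhs = integrate_circle(normal_derivative_psi)  # integral of dpsi/dn over B
print(lhs, rhs, 2.0 * math.pi)
```

If the boundary data $f$ of a Neumann problem did not satisfy this balance with $\zeta$, no solution could exist.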
A.5 Unbounded domains
In a bounded domain we can easily strengthen the uniqueness of the Dirichlet problem (i.e. drop the need for
the domain to be normal) by using the Maximum-Minimum principle (see section A.2). The proof for Dirichlet is
straightforward: assume that there are two solutions $\psi_1$, $\psi_2$ of the problem and define $\psi = \psi_1 - \psi_2$, which
satisfies $\Delta\psi = 0$ in $D$ and $\psi = 0$ on $B$; by the Maximum-Minimum principle $\psi = 0$ and so there is
only one solution. Using the Maximum principle it is also possible to relax the conditions needed for the Robin
problem with $\alpha(x) > 0$ so that for $\psi \in C^2(D) \cap C^1(\bar{D})$, where $D$ is a bounded domain, there is at most one
solution. However, the reason that we used Green's identities in the previous section is that it aids in the
generalization to infinite domains. Further requirements for infinite domains are discussed next. We start with
some definitions:
$$w = O\left(|x|^{-p}\right) \quad \text{as } x \to \infty$$
means that there exist constants $M$ and $a$ such that
$$|w(x)| \leq M|x|^{-p} \quad \text{for } |x| \geq a$$
Also we will define an exterior domain $D$ as the complement of a bounded domain $D_1$ that includes the
origin; if the bounded domain is normal then we say that the complement is a normal exterior domain. $B$ is the
boundary of $D$. Also let $K_r(c) \equiv \{x : |x - c| < r\}$; the boundary of $K_r(c)$ is $S_r(c) \equiv \{x : |x - c| = r\}$.
It can be shown that if $\psi, \phi \in C^2(D) \cap C^1(\bar{D})$ and $\psi, \phi = O\left(|x|^{-1}\right)$ and $\nabla\psi, \nabla\phi = O\left(|x|^{-2}\right)$ as $|x| \to \infty$,
then the Green identities (139)-(141) hold on $D$.
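The exterior Dirichlet problem for $D$ is to find a function $\psi \in C^2(D) \cap C^1(\bar{D})$ that satisfies
$$\Delta\psi = h(x) \text{ in } D, \quad \psi = f(x) \text{ on } B, \quad \psi \to 0 \text{ uniformly at infinity}$$
The exterior Neumann problem for $D$ is to find a function $\psi \in C^2(D) \cap C^1(\bar{D})$ that satisfies:
$$\Delta\psi = h(x) \text{ in } D, \quad \frac{\partial\psi}{\partial n} = f(x) \text{ on } B, \quad \psi \to 0 \text{ uniformly at infinity}$$
The exterior Robin problem for $D$ is to find a function $\psi$ that satisfies Poisson's equation, vanishes at infinity
and satisfies:
$$\frac{\partial\psi}{\partial n} + \alpha(x)\psi = f(x) \text{ on } B$$
where $\alpha(x) \geq 0$ on $B$, $\alpha \not\equiv 0$.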
Proof of Uniqueness of the Dirichlet problem:
Let $\psi = \psi_1 - \psi_2$ where $\psi_1$, $\psi_2$ are solutions to the Dirichlet problem. Let $\epsilon > 0$ and fix a point $x_0 \in D$. Choose
$a > |x_0|$ large enough so that $D_1 \subset K_a(0)$ and $|\psi| < \epsilon$ on $S_a(0)$, which is possible due to the uniform
convergence $\psi \to 0$ as $x \to \infty$. Since $\psi = 0$ on $B$, the Maximum principle gives $|\psi| < \epsilon$ for $x \in D \cap K_a(0)$, in
particular $|\psi(x_0)| < \epsilon$, which holds for any $\epsilon > 0$, so we conclude that $\psi(x_0) = 0$. Since $x_0$ is an arbitrary point
in $D$, $\psi \equiv 0$. Thus, $\psi_1 = \psi_2$. The proof for the Neumann problem follows from using Green's identity (141) and
the fact that $\nabla\psi = 0$ due to the behavior of $\psi_1$, $\psi_2$ at infinity.
B Dominated Convergence Theorem
Let $(\Omega, \mathcal{A}, \mu)$ be a measure space. Suppose that $\{f_n\}_{n=1}^{\infty}$ is a sequence of complex-valued, $\mathcal{A}$-measurable functions
that converge $\mu$-ae. Further suppose that there is a nonnegative Lebesgue integrable function, $g$, such that
$|f_n| \leq g$ $\mu$-ae for each $n \in \mathbb{N}$. Then
$$\int_E \lim_{n\to\infty} f_n\, d\mu = \lim_{n\to\infty} \int_E f_n\, d\mu$$
for each $E \in \mathcal{A}$. Proof: page 197 of McDonald and Weiss (1999).
B.1 Bounded Convergence Theorem
Let $(\Omega, \mathcal{A}, \mu)$ be a finite measure space. Suppose that $\{f_n\}_{n=1}^{\infty}$ is a sequence of uniformly bounded, complex-valued,
$\mathcal{A}$-measurable functions that converge $\mu$-ae. Then
$$\int_E \lim_{n\to\infty} f_n\, d\mu = \lim_{n\to\infty} \int_E f_n\, d\mu \tag{142}$$
for each $E \in \mathcal{A}$. Proof: page 199 of McDonald and Weiss (1999).
C Ensemble average
Consider a canonical Hamiltonian system, let $\{y_i\}_{1\leq i\leq N}$ represent the generalized coordinates and $\{p_i\}_{1\leq i\leq N}$
the conjugate momenta, and $H(\{y_i, p_i\})$ the Hamiltonian. The phase space is the $2N$-dimensional space formed
by $\{y_i, p_i\}_{1\leq i\leq N}$ and each point $(\{y_i, p_i\})$ is called a microstate. The system is completely described if we know
the coordinates and momenta of all particles.
Define the average of a physical quantity $A$ by:
$$\bar{A} = \lim_{\Delta t\to\infty} \frac{1}{\Delta t} \int_{t_0}^{t_0+\Delta t} A\left[\{y_i(t), p_i(t)\}\right] dt$$
Often we do not know the system's exact location in phase space nor its trajectory. What we know is the
macroscopic state. It is therefore usual to investigate an ensemble of systems, a collection of all the microscopic
systems that could possibly belong to the same macroscopic state. This means that the definition of $\bar{A}$ above
needs to be reformulated in these terms. Instead of a time average over one system we calculate an average over
an ensemble of equivalent systems at a fixed time. Implicitly, the assumption is that a trajectory in phase space
will spend equal time intervals in all regions of a constant energy surface (i.e. an energy conservation constraint)
that are accessible from the initial configuration. This property is called ergodicity. And thus we write
$$\bar{A} = \langle A \rangle_P \equiv \int_{\mathcal{B}} A(\{y_i, p_i\})\, P(\{y_i, p_i\})\, dy\, dp$$
where $\mathcal{B}$ is the whole phase space and $P$ is the probability density that a unit volume in phase space is
occupied.
Ergodicity of a system tends to be a property which is extremely difficult to prove, but it is commonly believed
that the parts in phase space $\mathcal{B}$ in which motion is trapped (recirculating trajectories, i.e. spaces that are
inaccessible to trajectories outside that space) occupy an extremely small relative volume of $\mathcal{B}$. This seems to
be the common wisdom derived from the successfully passed empirical tests of a century of statistical mechanics
studies (Bouchet & Venaille 2011). Likewise numerical simulations are encouraging: even when it can be proved
that ergodicity does not hold strictly, statistical mechanics results in useful predictions as compared with the
numerical solution.
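The equality of time and ensemble averages can be illustrated on a system where ergodicity on the energy surface is elementary. The sketch below assumes a single harmonic oscillator with Hamiltonian $H = (p^2 + y^2)/2$, whose energy surface is a circle traversed at uniform speed; the oscillator, the energy value `E` and the observable $A = y^2$ are hypothetical choices made for illustration and do not appear in the references.

```python
import math

E = 1.5  # hypothetical energy level of the oscillator

def observable(y, p):
    # Physical quantity A evaluated at a microstate (y, p)
    return y * y

def microstate(angle):
    # Point on the energy surface H = (p**2 + y**2) / 2 = E; along the exact
    # trajectory the angle equals the time t
    amplitude = math.sqrt(2.0 * E)
    return amplitude * math.cos(angle), -amplitude * math.sin(angle)

def time_average(t0, window, n=200000):
    # (1 / window) * integral of A along one trajectory, midpoint rule
    h = window / n
    total = 0.0
    for k in range(n):
        total += observable(*microstate(t0 + (k + 0.5) * h))
    return total * h / window

def ensemble_average(n=2000):
    # Average of A against a uniform probability density on the energy
    # surface (assumed invariant measure for this oscillator)
    total = 0.0
    for k in range(n):
        total += observable(*microstate(2.0 * math.pi * (k + 0.5) / n))
    return total / n

print(time_average(0.3, 1000.0), ensemble_average(), E)
```

For this oscillator both averages approach $E$; the time average carries an error that decays like one over the length of the averaging window.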
D Flow map of a differential equation
From Simić (2005). Suppose $F : \mathbb{R}^n \to \mathbb{R}^n$ is a $C^1$ vector field, then for each $X_0 \in \mathbb{R}^n$ the ODE $dX/dt = F(X)$ has
a unique solution $X = X(t)$ with initial condition $X(0) = X_0$.
Definition 15 Let $\Omega \subset \mathbb{R}$, the flow $\Phi : \Omega \times \mathbb{R}^n \to \mathbb{R}^n$ of this ODE is defined by:
$$\Phi(t, X_0) = X(t)$$
Therefore the defining properties of the flow map are:
$$\Phi(0, X_0) = X_0$$
and
$$\frac{d\Phi(t, X_0)}{dt} = F(\Phi(t, X_0)), \quad \forall t$$
Definition 16 The time-$t$ map of the flow map is $\Phi_t : \mathbb{R}^n \to \mathbb{R}^n$ and it is defined by
$$\Phi_t(X_0) = \Phi(t, X_0)$$
It is the state of the system after $t$ units of time, a transformation of $\mathbb{R}^n$ to $\mathbb{R}^n$. Often notation is abused and
we say that the collection of time-$t$ maps $\Phi_t$ is the flow map.
From the uniqueness of the solutions we have that
$$\Phi_0 = \mathrm{id} \quad \text{and} \quad \Phi_{t+s} = \Phi_s \circ \Phi_t \tag{143}$$
Thus, the time-$t$ map is invertible and $\Phi_t^{-1} = \Phi_{-t}$. One of the conveniences of the definition of a flow map
is that instead of considering a particular solution $X(t)$, when we write $\Phi_t(X_0)$ we consider all the solutions
dependent on the initial condition, a more global point of view. This way the variable becomes $X_0$; to avoid
confusion we relabel the variable to a more commonly-used variable name, and we write $\Phi_t(X)$ for the solution
that starts at $X$ when $t = 0$.
Remark: Notice that implicit in (143) we find that composition of time-$t$ maps is commutative. Clearly, keeping
track of the initial conditions is necessary for this to work out.
Further reading can be found in Bennet (2006).
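The properties of the flow map can be checked on an example with a known closed-form solution. The sketch below assumes the linear vector field $F(x, y) = (-y, x)$ on $\mathbb{R}^2$, whose flow is rotation by the angle $t$; this vector field, the initial condition and the times are hypothetical choices for illustration only.

```python
import math

def F(X):
    # Illustrative C^1 vector field on R^2: dX/dt = F(X)
    x, y = X
    return (-y, x)

def flow(t, X0):
    # Exact time-t map for this F: rotation of X0 by the angle t
    x, y = X0
    c, s = math.cos(t), math.sin(t)
    return (c * x - s * y, s * x + c * y)

def close(U, V, tol=1e-6):
    # Componentwise comparison of two points of R^2
    return all(abs(u - v) < tol for u, v in zip(U, V))

X0 = (1.0, 0.5)
t, s = 0.7, 1.9

# Phi_0 = id
identity_ok = close(flow(0.0, X0), X0)
# Group property (143): Phi_{t+s} = Phi_s o Phi_t, in either order
group_ok = close(flow(t + s, X0), flow(s, flow(t, X0)))
commute_ok = close(flow(s, flow(t, X0)), flow(t, flow(s, X0)))
# Inverse: Phi_{-t} o Phi_t = id
inverse_ok = close(flow(-t, flow(t, X0)), X0)
# dPhi/dt = F(Phi), checked with a centered finite difference in t
h = 1e-5
derivative = tuple((a - b) / (2.0 * h)
                   for a, b in zip(flow(t + h, X0), flow(t - h, X0)))
ode_ok = close(derivative, F(flow(t, X0)))

print(identity_ok, group_ok, commute_ok, inverse_ok, ode_ok)
```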
E Entropy of a Measurable Partition
Definition 17 Let $P$ be a discrete probability measure on the sample space $A = \{a_1, \dots, a_n\}$
$$P = \sum_{i=1}^{n} P_i \delta_{a_i}, \quad P_i \geq 0, \quad \sum_{i=1}^{n} P_i = 1 \tag{144}$$
where $\delta_{a_i}$ is the delta function at the point $a_i$. The Shannon entropy $S(P)$ of the probability $P$ is defined as
$$S(P) = S(P_1, \dots, P_n) = -\sum_{i=1}^{n} P_i \ln P_i \tag{145}$$
The function $S$ is the information-theoretic entropy; it was used by Shannon to measure information. To recall
the intuition behind this equation consider a "word" message as a sequence of binary digits with length $n$, i.e.
we need $n$ digits to characterize it. The set $A_{2^n}$ of all words of length $n$ has $2^n = N$ elements and, clearly, the
amount of information needed to characterize one element is $n = \log_2 N$. The amount of information needed to
characterize an element of any set $A_N$ is $\log_2 N$ for general $N$. Let $A = A_{N_1} \cup \dots \cup A_{N_k}$, where the sets $A_{N_i}$ are
pairwise disjoint, and each set $A_{N_i}$ has $N_i$ elements. Let $P_i$ be given by $P_i = N_i/N$, where $N = \sum_i N_i$. If we
know that an element of $A$ belongs to some $A_{N_i}$, we then need $\log_2 N_i$ additional information to determine it
completely. Thus, the average amount of information we need to determine an element, provided that we already
know the $A_{N_i}$ to which it belongs, is given by
$$\sum_i \frac{N_i}{N} \log_2 N_i = \sum_i P_i \log_2 P_i + \log_2 N \tag{146}$$
Recall that $\log_2 N$ is the information needed to determine an element of $A$ if we do not know to which $A_{N_i}$ the
given element belongs. Thus the corresponding average lack of information is
$$-\sum_i P_i \log_2 P_i \tag{147}$$
so now we can see that (145) is a measure of the lack of information (up to the choice of logarithm base). It can be
shown that the Shannon entropy is unique; see Majda & Wang (2006, from which this discussion is taken), page 186, for details.
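The counting argument behind (146) and (147) can be reproduced with a few lines of arithmetic. The sketch below assumes a hypothetical splitting of $N = 16$ elements into subsets of sizes 8, 4 and 4; the sizes and the function name are illustrative only.

```python
import math

def shannon_entropy_bits(P):
    # Average lack of information (147), in bits; terms with P_i = 0 are skipped
    return -sum(p * math.log2(p) for p in P if p > 0)

sizes = [8, 4, 4]             # hypothetical subset sizes N_i
N = sum(sizes)                # total number of elements
P = [n / N for n in sizes]    # P_i = N_i / N

# Left- and right-hand sides of (146)
lhs = sum(p * math.log2(n) for p, n in zip(P, sizes))
rhs = sum(p * math.log2(p) for p in P) + math.log2(N)

# Information still missing once we know nothing about the subset
missing = math.log2(N) - lhs

print(lhs, rhs, missing, shannon_entropy_bits(P))
```

Here $\log_2 N = 4$ bits identify an element outright, 2.5 bits are needed on average once the subset is known, and the difference of 1.5 bits is the entropy (147).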
E.1 Uniqueness of Shannon's entropy
Shannon's entropy is the unique measure of the lack of information, up to a positive constant. This was originally
proved by Jaynes in 1957; please see Majda and Wang (2006), proposition 6.1, and corresponding references.
Proposition E.1 Let $H_n$ be a function defined on the space of discrete probability measures
$\mathcal{PM}_n(A) \equiv \{p = \sum_{i=1}^{n} p_i \delta_{a_i},\ p_i \geq 0,\ \sum_{i=1}^{n} p_i = 1\}$ over sample space $A$ and satisfying three properties:
• $H_n(p_1, \dots, p_n)$ is a continuous function.
• $A(n) = H_n(1/n, \dots, 1/n)$ is monotonic increasing in $n$, i.e. $H_n$ is monotonic increasing with increasing
uncertainty.
• (Composition law). If the sample space $A = \{a_1, \dots, a_n\}$ is divided into two sets $A_1 = \{a_1, \dots, a_k\}$ and
$A_2 = \{a_{k+1}, \dots, a_n\}$ with probabilities $w_1 = p_1 + \dots + p_k$ and $w_2 = p_{k+1} + \dots + p_n$ and conditional
probabilities $(p_1/w_1, \dots, p_k/w_1)$ and $(p_{k+1}/w_2, \dots, p_n/w_2)$ then the amount of uncertainty with the information
split in this way is the same as it was originally
$$H_n(p_1, \dots, p_n) = H_2(w_1, w_2) + w_1 H_k(p_1/w_1, \dots, p_k/w_1) + w_2 H_{n-k}(p_{k+1}/w_2, \dots, p_n/w_2)$$
Then $H_n$ is a positive multiple of the Shannon entropy
$$H_n(p_1, \dots, p_n) = K S(p_1, \dots, p_n) = -K \sum_{i=1}^{n} p_i \ln p_i$$
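That the Shannon entropy itself satisfies the composition law and the monotonicity property can be checked numerically. The sketch below assumes a hypothetical distribution on four points split after the second one; the probabilities and the split index `k` are illustrative only.

```python
import math

def S(p):
    # Shannon entropy (145) with natural logarithm; zero terms are skipped
    return -sum(x * math.log(x) for x in p if x > 0)

p = [0.1, 0.2, 0.3, 0.4]   # hypothetical probabilities p_1, ..., p_4
k = 2                      # split into {a_1, a_2} and {a_3, a_4}

w1 = sum(p[:k])
w2 = sum(p[k:])

# Composition law: uncertainty of the whole equals uncertainty of the split
# plus the weighted uncertainties inside each part
whole = S(p)
split = (S([w1, w2])
         + w1 * S([x / w1 for x in p[:k]])
         + w2 * S([x / w2 for x in p[k:]]))

# Monotonicity: A(n) = S(1/n, ..., 1/n) = ln(n) grows with n
uniform_entropies = [S([1.0 / n] * n) for n in range(1, 6)]

print(whole, split, uniform_entropies)
```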
E.2 Motivation
Consider a probability space (Ω, A, ν) and suppose that the probability of a particle position in Ω is given by
the probability measure ν . That is, for each A ∈ A the probability that the particle p is in A is equal to ν (A).
We would like to know the position of p as closely as possible, and for this purpose the concept of a measurable
partition is used.
Definition 18 Let $(\Omega, \mathcal{A})$ be a measurable space and $A \in \mathcal{A}$.
A finite sequence $\{A_k\}_{k=1}^{n}$ of subsets of $\Omega$ is said to be a measurable partition of $A$ if the $A_k$'s are $\mathcal{A}$-measurable,
pairwise disjoint and their union is $A$. That is,
1. $A_k \in \mathcal{A}$, $k = 1, 2, \dots, n$
2. $A_i \cap A_j = \emptyset$ for $i \neq j$
3. $\cup_{k=1}^{n} A_k = A$
Let B be a measurable partition of (Ω, A). Suppose we can extract information about the location of p by
answering the question "is p in A ?", for each A ∈ B. We are trying to ascertain which element of B contains p.
Some measurable partitions will give us more information than others. For example, if a partition reduces
the probability by half when one of the two elements of the partition is chosen (i.e. each partition element
has probability $\nu = 1/2$) then we will have more information than with a partition that also has two elements
but one has probability $\nu = 1/1000$ and the other element of the partition has the probability of the complement,
unless we are very very lucky.
We need to assign a number to the amount of information gained by a measurable partition to be able to
proceed rigorously. That number is called entropy.
Definition 19 Entropy of a measurable partition Let $(\Omega, \mathcal{A}, \nu)$ be a probability space and $\mathcal{B}$ a measurable
partition of $(\Omega, \mathcal{A})$. Then the entropy of $\mathcal{B}$, denoted $H(\mathcal{B})$, is defined by
$$H(\mathcal{B}) = -\sum_{A \in \mathcal{B}} \nu(A) \log \nu(A),$$
where the convention $0 \log 0 = 0$ has been adopted.
Here the sum over all the elements of the measurable partition can be thought of, in a more general sense, as an
integral with respect to the probability measure for that space. Notice that the negative sign is needed for the
entropy to be nonnegative since by definition $0 \leq \nu(A) \leq 1$, $\forall A \in \mathcal{B}$.
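The comparison between the two partitions in the motivating example can be made quantitative with this definition. The sketch below represents a partition only by the list of measures $\nu(A)$ of its elements, which is an assumption made for illustration; the function name is hypothetical.

```python
import math

def partition_entropy(measures):
    # H(B) = -sum of nu(A) log nu(A) over the partition elements,
    # with the convention 0 log 0 = 0
    return -sum(v * math.log(v) for v in measures if v > 0)

balanced = partition_entropy([0.5, 0.5])       # two elements of probability 1/2
lopsided = partition_entropy([0.001, 0.999])   # probabilities 1/1000 and 999/1000
trivial = partition_entropy([1.0, 0.0])        # one element carries all the mass

print(balanced, lopsided, trivial)
```

The balanced partition has entropy $\ln 2$, the largest possible for two elements, while the lopsided partition has entropy close to zero: on average, answering "is p in A?" tells us much less.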
F Lagrange multiplier
Theorem F.1 (Lagrange Multiplier theorem) Let $f : U \subset \mathbb{R}^n \to \mathbb{R}$ and $g : U \subset \mathbb{R}^n \to \mathbb{R}$ be given $C^1$
functions. Let $x_0 \in U$ and $g(x_0) = c$ and let $S$ be the level set for $g$ with value $c$, i.e. $S \equiv \{x \in \mathbb{R}^n : g(x) = c\}$.
Assume $\nabla g(x_0) \neq 0$.
If $f|_S$ has a maximum or minimum on $S$ at $x_0$ then there is a number $\Lambda$ such that:
$$\nabla f(x_0) = \Lambda \nabla g(x_0) \tag{148}$$
Remark: Together with the constraint $g(x_0) = c$, (148) gives $n+1$ equations (the constraint plus the $n$ components
of the gradient vectors) in $n+1$ unknowns (the Lagrange multiplier $\Lambda$ plus the $n$ components of $x_0 \in \mathbb{R}^n$).
Proof (A sketch). The tangent space of $S$ at $x_0$ is by definition the space orthogonal to $\nabla g(x_0)$. Consider a
path $\sigma(t)$ that lies on $S$ such that $\sigma(0) = x_0$, then $\sigma'(0)$ is a tangent vector to $S$ at $x_0$ but
$$\frac{d}{dt} g(\sigma(t)) = \frac{dc}{dt} = 0$$
and at the same time by the chain rule
$$\left.\frac{d}{dt} g(\sigma(t))\right|_{t=0} = \nabla g(x_0) \cdot \sigma'(0)$$
So that
$$\nabla g(x_0) \cdot \sigma'(0) = 0.$$
If $f|_S$ has a maximum at $x_0$, then clearly $f(\sigma(t))$ has a maximum at $t = 0$, therefore $\left. df(\sigma(t))/dt \right|_{t=0} = 0$,
and by the chain rule:
$$0 = \left.\frac{d}{dt} f(\sigma(t))\right|_{t=0} = \nabla f(x_0) \cdot \sigma'(0)$$
Thus $\nabla f(x_0)$ is perpendicular to the tangent of every curve on $S$ through $x_0$, and so it is also perpendicular to the
tangent space of $S$ at $x_0$. We now have that the gradients of $f$ and $g$ at $x_0$ are parallel, which is what (148) states.
It is straightforward to extend this theorem to the case with multiple constraints of the form $g_i(x_0) = c_i$ for
$1 \leq i \leq N$. In this case we need to solve:
$$\nabla f(x_0) = \sum_{i=1}^{N} \Lambda_i \nabla g_i(x_0)$$
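The statement (148) can be checked on a small example. The sketch below assumes the hypothetical problem of maximizing $f(x, y) = x + y$ on the unit circle $g(x, y) = x^2 + y^2 = 1$; it locates the constrained maximum by a brute-force scan of the circle (an illustrative device, not part of the theorem) and then verifies that the two gradients are parallel there.

```python
import math

def f(x, y):
    # Illustrative objective
    return x + y

def grad_f(x, y):
    return (1.0, 1.0)

def grad_g(x, y):
    # Gradient of the constraint function g(x, y) = x**2 + y**2
    return (2.0 * x, 2.0 * y)

# Brute-force search for the maximum of f on the level set g = 1
n = 100000
best_theta = max((2.0 * math.pi * k / n for k in range(n)),
                 key=lambda th: f(math.cos(th), math.sin(th)))
x0, y0 = math.cos(best_theta), math.sin(best_theta)

gf = grad_f(x0, y0)
gg = grad_g(x0, y0)

# Two vectors in the plane are parallel when their 2D cross product vanishes
cross = gf[0] * gg[1] - gf[1] * gg[0]

# Multiplier obtained by projecting grad f onto grad g
Lam = (gf[0] * gg[0] + gf[1] * gg[1]) / (gg[0] ** 2 + gg[1] ** 2)
residual = max(abs(gf[0] - Lam * gg[0]), abs(gf[1] - Lam * gg[1]))

print((x0, y0), cross, Lam, residual)
```

For this problem the maximum sits at $x_0 = (1/\sqrt{2}, 1/\sqrt{2})$ with $\Lambda = 1/\sqrt{2}$.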
G Orthogonal projections
In this section we include the basic notions of an orthogonal projection, further concepts and examples may be
found in the text by Meyer (2000) that we use as reference for this section.
First, what is a projector (also called a projection)?
Definition 20 If $P$ is idempotent (i.e. $P = P^2$) then $P$ is called a projector.
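We will also need the definition of an orthogonal complement.
Definition 21 The orthogonal complement $\mathcal{M}^{\perp}$ of a subset $\mathcal{M}$ of an inner-product space$^{13}$ $\mathcal{V}$ is defined as:
$$\mathcal{M}^{\perp} = \{x \in \mathcal{V} : \langle m | x \rangle = 0,\ \forall m \in \mathcal{M}\}$$
Let $\mathcal{M}$ be a subspace of a finite-dimensional inner-product space $\mathcal{V}$, then $\mathcal{V} = \mathcal{M} \oplus \mathcal{M}^{\perp}$, i.e. for every $v \in \mathcal{V}$
we have $v = m + n$ where $m \in \mathcal{M}$ and $n \in \mathcal{M}^{\perp}$ and $\mathcal{M} \cap \mathcal{M}^{\perp} = \{0\}$.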
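We call $m$ the orthogonal projection of $v$ onto $\mathcal{M}$. The projector $P_{\mathcal{M}}$ onto $\mathcal{M}$ along $\mathcal{M}^{\perp}$ is
called the orthogonal projector onto $\mathcal{M}$. $P_{\mathcal{M}}$ is the unique linear operator such that $P_{\mathcal{M}} v = m$. If the field
is $\mathbb{C}$, a formula for the orthogonal projector onto $\mathcal{M}$ is given by
$$P_{\mathcal{M}} = \mathbf{M} (\mathbf{M}^* \mathbf{M})^{-1} \mathbf{M}^*$$
where the columns of the matrix $\mathbf{M}$ are some (any) basis for $\mathcal{M}$ and $(\cdot)^*$ denotes the conjugate transpose. We note that
if $\dim \mathcal{M} = r$ then $\mathbf{M}^* \mathbf{M}$ is $r \times r$ and $\mathrm{rank}(\mathbf{M}^* \mathbf{M}) = \mathrm{rank}(\mathbf{M}) = r$; this shows that $\mathbf{M}^* \mathbf{M}$ is nonsingular. In the
particular case when the columns of $\mathbf{M}$ form an orthonormal basis, then $\mathbf{M}^* \mathbf{M} = \mathbf{I}$ and our formula reduces to
$$P_{\mathcal{M}} = \mathbf{M} \mathbf{M}^*$$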
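Moreover, if the columns of $\mathbf{M}$ and $\mathbf{N}$ constitute orthonormal bases for $\mathcal{M}$ and $\mathcal{M}^{\perp}$ respectively, then
$\mathbf{U} = (\mathbf{M}\ \mathbf{N})$ is a unitary matrix$^{14}$. In this case the formula for the orthogonal projector can be written as:
$$P_{\mathcal{M}} = \mathbf{U} \begin{pmatrix} \mathbf{I}_r & 0 \\ 0 & 0 \end{pmatrix} \mathbf{U}^*$$
where $\mathbf{I}_r$ is the $r \times r$ identity matrix. So the projector onto $\mathcal{M}$ made of orthonormal bases is similar to a diagonal
matrix with ones and zeros.
Whatever the formula we use for $P_{\mathcal{M}}$, the formula for $P_{\mathcal{M}^{\perp}}$ is given by:
$$P_{\mathcal{M}^{\perp}} = \mathbf{I} - P_{\mathcal{M}}$$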
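The projector formula can be tried out on a small real example, where the conjugate transpose reduces to the ordinary transpose. The sketch below assumes a hypothetical two-dimensional subspace of $\mathbb{R}^3$ spanned by $(1, 0, 1)$ and $(0, 1, 1)$, and uses plain Python lists with hand-written helper functions (all names are illustrative) rather than a linear algebra library.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def inverse_2x2(A):
    # Explicit inverse, valid here because M^T M is 2 x 2 and nonsingular
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

# Columns of M form a (non-orthonormal) basis of the subspace
M = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
Mt = transpose(M)

# P = M (M^T M)^{-1} M^T, the real version of the formula above
P = matmul(matmul(M, inverse_2x2(matmul(Mt, M))), Mt)

identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
P_perp = [[identity[i][j] - P[i][j] for j in range(3)] for i in range(3)]

idempotent_error = max_abs_diff(matmul(P, P), P)   # P^2 = P
symmetry_error = max_abs_diff(transpose(P), P)     # P^T = P
fixes_subspace_error = max_abs_diff(matmul(P, M), M)  # P leaves M's columns fixed
complement_error = max(abs(x) for row in matmul(P_perp, M) for x in row)

print(P)
print(idempotent_error, symmetry_error, fixes_subspace_error, complement_error)
```

All four errors vanish up to rounding, confirming that $P$ is an orthogonal projector and that $\mathbf{I} - P$ annihilates the subspace.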
13 A vector space with an inner product.
14 A unitary matrix $\mathbf{U}_{n \times n}$ is a complex matrix whose columns (or rows) constitute an orthonormal basis for $\mathbb{C}^n$. Unitary matrices
satisfy $\mathbf{U}^* \mathbf{U} = \mathbf{U} \mathbf{U}^* = \mathbf{I}$.