A Bayesian ANOVA scheme for calculating climate anomalies, with applications to the instrumental temperature record

Martin P. Tingley
National Center for Atmospheric Research, Boulder, Colorado
Corresponding author address: Martin P. Tingley, National Center for Atmospheric Research, 1850 Table Mesa Drive, Boulder, CO 80305 and Department of Earth and Planetary Sciences, Harvard University, 20 Oxford Street, Cambridge, MA 02138.
E-mail: tingley@fas.harvard.edu
ABSTRACT
Climate data sets with both spatial and temporal components are often studied after removing from each time series a temporal mean calculated over a common reference interval, which is generally shorter than the overall length of the data set. The use of a short reference interval affects the temporal properties of the variability across the records, by reducing the standard deviation within the reference interval and inflating it elsewhere. For an annually averaged version of the Climate Research Unit's (CRU) temperature anomaly product, the mean standard deviation is 0.67°C within the 1961–1990 reference interval, and 0.81°C elsewhere.
The calculation of anomalies can be interpreted in terms of a two-factor Analysis of Variance model. Within a Bayesian inference framework, any missing values are viewed as additional parameters, and the reference interval is specified as the full length of the data set. This Bayesian scheme is used to re-express the CRU data set as anomalies with respect to means calculated over the entire 1850–2009 interval spanned by the data set. The mean standard deviation is increased to 0.69°C within the original 1961–1990 reference interval, and reduced to 0.76°C elsewhere. The choice of reference interval thus has a predictable and demonstrable effect on the second spatial moment time series of the CRU data set. The spatial mean time series is in this case largely unaffected: the amplitude of spatial mean temperature change is reduced by 0.1°C when using the 1850–2009 reference interval, while the 90% uncertainty interval of (-0.03, 0.23) indicates that the reduction is not statistically significant.
1. Introduction
For the purposes of studying climate, space–time data sets are often analyzed after the mean over some specific time interval has been removed from each time series. For example, the gridded temperature anomaly compilation produced by the Climate Research Unit (CRU) is composed of monthly time series of anomalies from a 1961–1990 reference interval (Brohan et al. 2006), while the IPCC Fourth Assessment Report plots numerous millennial-scale climate reconstructions as anomalies from that same interval (Fig. 6.10, Jansen et al. 2007).
There are both technical and scientific reasons for analyzing temperature anomalies, rather than the actual values. Jones et al. (1999) argue that use of anomalies avoids a number of problems that can potentially arise when combining daily station data into monthly grid-box averages. The impacts of differences in station elevations, the timings of daily observations, and the methods used to calculate monthly means are minimized by considering anomalies, and the resulting data set is more homogeneous than the corresponding compilation of actual values (Jones et al. 1999).
In general, climate fields display complex spatial structures, such as a strong dependence on latitude, sharp gradients across land-sea boundaries, and elevation effects, all of which can be seen in the NCEP/NCAR reanalysis (Kalnay et al. 1996) annual mean temperature field for 1981 (Fig. 1a). Many of these spatial structures are relatively stable as a function of time, and are thus well estimated from the long-term temporal mean (Fig. 1b). There is generally less structure in the field of anomalies with respect to an estimated long-term mean (Fig. 1c), and the anomalies are generally representative of larger spatial areas (e.g., Hansen and Lebedeff 1987). A sensible analysis of the anomaly field likely requires fewer covariates, as the anomaly field is more likely to be spatially stationary (e.g., Banerjee et al. 2004). As scientific interest often lies in understanding changes in climate fields such as surface temperatures, rather than the details of the field itself, the analysis of anomalies allows for a simpler statistical model, which facilitates the identification of trends and patterns of change.
Given the underlying assumption that the climate field is changing in time, it is important to remove from each time series a mean calculated over a common reference interval, and this reference interval is often chosen as a sub-interval which minimizes the number of missing values (e.g., Brohan et al. 2006). Using a reference interval that is shorter than the full length of the data set leads to increased uncertainty in the estimation of the means (smaller sample size), and often does not entirely eliminate the missing data problem. In addition, using a short reference interval results in the variance across the estimated anomaly time series (spatial variance) being reduced within the reference interval, and inflated elsewhere (see Section 2). In the extreme case, the spatial variance within a one time-step reference interval is zero.
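The extreme case can be checked numerically. The sketch below is illustrative only: it assumes hypothetical white-noise series (not the CRU data), and the function name `anomalies` and all array sizes are choices made here rather than anything from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 2000 independent white-noise series (rows), 200 time steps (columns).
X = rng.normal(size=(2000, 200))

def anomalies(X, s1, s2):
    """Subtract from each row its mean over columns s1..s2 (0-based, inclusive)."""
    return X - X[:, s1:s2 + 1].mean(axis=1, keepdims=True)

Y_one = anomalies(X, 100, 100)     # one time-step reference interval
Y_short = anomalies(X, 85, 114)    # 30 time-step reference interval

sd_one = Y_one.std(axis=0)         # standard deviation across series at each time step
sd_short = Y_short.std(axis=0)

inside = np.zeros(X.shape[1], dtype=bool)
inside[85:115] = True

print(sd_one[100])                                        # zero at the reference step
print(sd_short[inside].mean(), sd_short[~inside].mean())  # reduced inside, inflated outside
```

With a one time-step interval every series is forced through zero at that step, so the spread across series vanishes there; the 30-step case shows the same effect in milder form.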
Any analysis of climate anomalies that depends on second- or higher-moment properties may therefore be affected by the choice of reference interval used to calculate the anomalies. For example, the frequency of climate events which are defined as threshold exceedances, such as heat waves or other extremes, changes if either the mean or the standard deviation changes (see IPCC 2001, Figure 4-1). As the standard deviation is lower within a short reference interval, any threshold is more likely to be exceeded outside of the reference interval used to calculate the anomalies, and this effect becomes more pronounced the more extreme the threshold.
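A rough illustration of the threshold argument, assuming zero-mean normally distributed anomalies (an assumption made here for simplicity). The two standard deviations are the values quoted in the abstract; the thresholds and the helper name `exceedance_prob` are hypothetical.

```python
from math import erfc, sqrt

def exceedance_prob(threshold, sd, mean=0.0):
    """P(Z > threshold) for Z ~ Normal(mean, sd**2)."""
    return 0.5 * erfc((threshold - mean) / (sd * sqrt(2.0)))

sd_inside, sd_outside = 0.67, 0.81    # degC, inside / outside the reference interval
thresholds = (1.0, 2.0, 3.0)          # degC, illustrative thresholds only

ratios = []
for t in thresholds:
    p_in = exceedance_prob(t, sd_inside)
    p_out = exceedance_prob(t, sd_outside)
    ratios.append(p_out / p_in)
    print(f"threshold {t:.1f}: inside {p_in:.2e}, outside {p_out:.2e}, ratio {p_out / p_in:.1f}")
```

The ratio of exceedance probabilities grows with the threshold, which is the sense in which the effect is more pronounced for more extreme events.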
The scientific interpretation of several figures from the IPCC Fourth Assessment Report (Trenberth et al. 2007) is influenced by the reference interval used to calculate climate anomalies. For example, Figure 3-5 plots zonally averaged temperature anomalies with respect to a 1961–1990 reference interval as a function of latitude and time, while Figure 3-15 depicts precipitation anomalies in the same manner. The choice of a short reference interval affects the temporal evolution of the spatial standard deviation, and this effect is clearly visible in the precipitation plot, which features markedly reduced variability within the 1961–1990 reference interval.
As a final example, paleoclimatic field reconstructions are generally calibrated against instrumental anomalies with respect to a reference interval that is a subset of the calibration interval. Mann et al. (2009) present a reconstruction, calibrated over the 1850–1995 interval, of surface temperature anomalies relative to a 1961–1990 reference interval. The spatial patterns of temperature anomalies for the Medieval Climate Anomaly and the Little Ice Age, which are shown in Fig. 2 of Mann et al. (2009), are influenced in part by the choice of reference interval, which affects the spatial variability as a function of time. In addition, the 1850–1995 calibration interval used in Mann et al. (2009) includes both the 1961–1990 reference interval used to calculate the instrumental anomalies and times outside of this interval. The reconstruction is thus calibrated against a data set with non-stationary statistical properties, and this temporal structure may introduce artifacts into the estimated relationship between the two data sets. Note that these issues are not unique to Mann et al. (2009); indeed all twelve large-scale temperature reconstructions depicted in Figure 6-10 from Jansen et al. (2007) are calibrated against instrumental temperature anomalies with respect to a 1961–1990 reference interval.
An alternative approach to the calculation of anomalies is to maximize the length of the reference interval, thereby mitigating the effects on the second moment properties of the data set, while accounting for the increased uncertainty that results from some observations being missing. We propose a two-factor Analysis of Variance (ANOVA) model for the calculation of climate anomalies, with the factors being location and year. The design is not balanced, as there is either one or zero observations at each combination of factor levels, and this complexity motivates a Bayesian inference scheme. The missing values in the data set are then treated as additional parameters, the reference interval is specified as the full length of the data set, and the uncertainty in the estimated anomalies accounts for the fact that the data set is incomplete.
Section 2 illustrates the effects of a short reference interval on the time series of spatial standard deviations using a simple example data set, Section 3 presents the ANOVA model for calculating anomalies and details a Bayesian approach to fitting this model in the presence of missing data, Section 4 uses the ANOVA model to re-express the annual mean CRU temperature product as anomalies from the 1850–2009 interval and investigates the effects of the change in reference interval on the time series of means and standard deviations, and Section 5 provides discussion and concluding remarks.
2. Biases produced by a short reference interval
This section will demonstrate that calculating anomalies from a short reference interval introduces spurious structures in the time series of the standard deviations across the series. In the climate context, using a short reference interval introduces temporal structure in estimates of the spatial standard deviation.
Consider a length $N$ univariate first order autoregressive [AR(1)] time series $X$, with AR(1) parameter $|\alpha| < 1$ and independent and identically distributed (IID) normal innovations with mean zero and variance $\sigma^2$. $X$ is multivariate normal,

$$X \sim N(0, \Omega), \qquad \Omega_{ij} = \frac{\sigma^2}{1-\alpha^2}\,\alpha^{|i-j|}, \tag{1}$$

and the diagonal elements of $\Omega$ are all equal, implying that the variance of $X$ is constant as a function of time (Fig. 2). Now consider a length $N^*$ reference interval that runs from time points $s_1$ to $s_2$, where $1 \le s_1 \le s_2 \le N$. The vector of anomalies from this reference interval, $Y$, can be written as a linear transformation of $X$,

$$Y = (I + A)\,X, \tag{2}$$

where $I$ is the $N$ by $N$ identity matrix and $A$ is an $N$ by $N$ matrix composed of columns of zeros outside of the reference interval, and columns of $-1/N^*$ inside the reference interval. The distribution of $Y$ is then,

$$Y \sim N(0, \Omega^*), \qquad \Omega^* = (I + A)\,\Omega\,(I + A)^T. \tag{3}$$

While no structure has been added to the mean vector, the diagonal elements of $\Omega^*$, which represent the variance at each time point, now vary as a function of time (Fig. 3). The diagonal entries of $\Omega^*$ can be expressed as

$$\begin{aligned}
\Omega^*_{ii} &= \mathrm{Var}\left[X_i - \frac{1}{N^*}\sum_{k=s_1}^{s_2} X_k\right] \\
&= \left(1 + \frac{1}{N^*}\right)\mathrm{Var}[X_i] + \frac{2}{(N^*)^2}\sum_{k=s_1,\,p>k}^{s_2}\mathrm{Cov}[X_k, X_p] - \frac{2}{N^*}\sum_{k=s_1}^{s_2}\mathrm{Cov}[X_i, X_k] \\
&= \left(1 + \frac{1}{N^*}\right)\frac{\sigma^2}{1-\alpha^2} + \frac{2\sigma^2}{1-\alpha^2}\,\frac{1}{(N^*)^2}\sum_{k=s_1,\,p>k}^{s_2}\alpha^{|k-p|} - \frac{2\sigma^2}{1-\alpha^2}\,\frac{1}{N^*}\sum_{k=s_1}^{s_2}\alpha^{|i-k|}.
\end{aligned} \tag{4}$$

In the second line, the first term is the sum of the variances, the second is the sum of the covariances within the reference interval, and the third term is the sum of the covariances between the $i$th time point and each time point within the reference interval. For points located far from the reference interval (i.e., $i \ll s_1$ or $i \gg s_2$), the second sum essentially drops out, all terms are positive, and $\Omega^*_{ii} > \Omega_{ii}$. For points within the reference interval, the second sum, proportional to $1/N^*$, dominates the first, which is proportional to $1/(N^*)^2$, so that $\Omega^*_{ii} < \Omega_{ii}$ (Fig. 3). As the two sums are partial geometric series, a closed-form expression can be derived for the diagonal elements $\Omega^*_{ii}$. The resulting expression, however, is no more informative than Eq. 4.
To explore the discrepancy between $\Omega^*_{ii}$ and $\Omega_{ii}$ as a function of the AR(1) coefficient and the relative length of the reference interval, define the scaled standard deviation range as,

$$\Delta(\alpha, N, N^*) = \frac{\max\left(\sqrt{\Omega^*_{ii}}\right) - \min\left(\sqrt{\Omega^*_{ii}}\right)}{\sqrt{\Omega_{11}}}. \tag{5}$$

Scaling by $\sqrt{\Omega_{11}}$ eliminates the dependence on $\sigma^2$, so that $\Delta(\alpha, N, N^*)$ is the range of standard deviation values induced in the anomalies as a proportion of the standard deviation of the original, stationary time series. The value of $\Delta(\alpha, N, N^*)$ for fixed $N$ is a decreasing function of $N^*$ and an increasing function of $\alpha$ (Fig. 4). As a specific example, $\Delta(0.34, 252, 30) = 0.068$ (see Figs. 2 & 3, and Section 4c), implying that the anomalies with respect to a 30 time-step reference interval feature a structure in the time series of standard deviations with an amplitude that is 6.8% of the original, stationary standard deviation.
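The quantities in Eqs. (1), (3) and (5) are straightforward to evaluate numerically. The following is a minimal sketch, not the author's code: the function names are hypothetical, and the reference interval is placed at time points 112 to 141 on the assumption (taken from the symmetric example described in Section 4a) that 111 points lie on either side of it.

```python
import numpy as np

def anomaly_covariance(alpha, sigma2, N, s1, s2):
    """Return (Omega, Omega_star) for a length-N AR(1) series and a reference
    interval covering time points s1..s2 (1-based, inclusive)."""
    idx = np.arange(N)
    Omega = sigma2 / (1.0 - alpha**2) * alpha ** np.abs(idx[:, None] - idx[None, :])
    n_star = s2 - s1 + 1
    A = np.zeros((N, N))
    A[:, s1 - 1:s2] = -1.0 / n_star      # columns inside the reference interval
    T = np.eye(N) + A
    return Omega, T @ Omega @ T.T

def scaled_sd_range(alpha, N, s1, s2, sigma2=1.0):
    """Scaled standard deviation range: (max sd - min sd) of the anomalies,
    divided by the stationary sd of the original series."""
    Omega, Omega_star = anomaly_covariance(alpha, sigma2, N, s1, s2)
    sd_star = np.sqrt(np.diag(Omega_star))
    return (sd_star.max() - sd_star.min()) / np.sqrt(Omega[0, 0])

# 111 points on either side of a 30-point reference interval (N = 252).
delta = scaled_sd_range(0.34, 252, 112, 141)
print(round(delta, 3))   # the text quotes 0.068 for this configuration
```

Because the result is a ratio of standard deviations, changing `sigma2` leaves it unchanged, which matches the remark that the scaling removes the dependence on the innovation variance.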
3. An ANOVA model for calculating climate anomalies
Let $X$ represent a matrix of observations of a climate variable, with the $M$ rows corresponding to locations and the $N$ columns to equally spaced time points. For example, in Section 4, $X$ will correspond to annual mean temperature anomalies at a number of spatial locations. In general, $X$ will feature missing values, as different locations have instrumental observations that cover different time periods. We express the elements of $X$ via a two-way ANOVA decomposition (e.g., Scheffé 1999; Zar 1999), where the factors are location and year:

$$X_{ij} = \gamma + \delta_i + \mu_j + \epsilon_{ij}. \tag{6}$$
Following standard ANOVA terminology, $\gamma$ represents the grand mean, while the elements of $\delta$ and $\mu$ correspond, respectively, to the $M$ location effects and $N$ year effects. Intuitively, each element of $\delta$ corresponds to the temporal mean (relative to $\gamma$) at a particular location, while each element of $\mu$ corresponds to the mean (relative to $\gamma$) across available observations at a particular time point.
To ensure identifiability of the parameters, the vectors of location and year effects are subject to sum-to-zero constraints,

$$\sum_{i=1}^{M}\delta_i = 0 \quad\text{and}\quad \sum_{j=1}^{N}\mu_j = 0. \tag{7}$$

As a result, the number of free parameters in each of $\delta$ and $\mu$ is one less than the length of that vector. The simplest choice for modeling the error terms $\epsilon_{ij}$ is to assume that they are IID normal, $\epsilon_{ij} \sim N(0, \sigma^2)$, and this choice is made below. While the assumption of IID errors is likely not correct, it simplifies calculations and is sufficient to demonstrate the effects of a short reference interval; alternatives are discussed in Section 5.
If the main goal of the analysis is to arrive at a better estimate of the anomalies, then interest lies primarily in inference on the location effects, $\delta$. Estimates of the anomalies, $Y_{ij}$, follow from removing the grand mean and location effect from each observation,

$$Y_{ij} = X_{ij} - \gamma - \delta_i = \mu_j + \epsilon_{ij}. \tag{8}$$
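For a complete data matrix (no missing values), the decomposition in Eq. (6) and the anomalies in Eq. (8) reduce to row, column and overall means. The sketch below covers only that balanced special case, using hypothetical synthetic data; it is not the Bayesian fit with missing data described next, and the function name is a choice made here.

```python
import numpy as np

def anova_decompose(X):
    """Two-way decomposition of a complete M x N matrix X into the grand mean,
    location effects (rows) and year effects (columns), plus the anomalies."""
    gamma = X.mean()
    delta = X.mean(axis=1) - gamma      # location effects; sum to zero
    mu = X.mean(axis=0) - gamma         # year effects; sum to zero
    Y = X - gamma - delta[:, None]      # anomalies: grand mean and location effect removed
    return gamma, delta, mu, Y

rng = np.random.default_rng(1)
# Hypothetical data: 5 locations with different long-term means, 40 years.
X = rng.normal(size=(5, 40)) + np.linspace(-2.0, 2.0, 5)[:, None]
gamma, delta, mu, Y = anova_decompose(X)

print(delta.sum(), mu.sum())    # both zero up to rounding error
print(Y.mean(axis=1))           # each anomaly series has zero mean over the full record
```

In this complete-data case the anomalies are simply each series minus its own full-length temporal mean, which is the sense in which the reference interval spans the whole record.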
Standard techniques for fitting ANOVA models (e.g., Scheffé 1999; Zar 1999) generally assume a balanced design, meaning that there are the same number of observations at each combination of factor levels. In particular, packaged ANOVA solutions are not designed to account for factor combinations for which there are no observations, as is the case if there are missing observations in the matrix $X$.
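a. Bayesian ANOVA with missing data
Fitting the proposed ANOVA model [Eq. (6)] involves parameter estimation in the presence of missing data, a situation amenable to Bayesian analysis. Within the Bayesian framework, the missing values are treated as additional parameters that must be estimated, while the posterior distributions of $\delta$, $\mu$ and $\gamma$ include the uncertainty introduced by the missing data. Let $X^m$ and $X^o$ represent the elements of $X$ that are missing and observed, respectively, where we require that no row or column be entirely missing. Similarly, let the vectors $X^o_{\cdot j}$ and $X^m_{\cdot j}$ represent the observed and missing elements for the $j$th year. We seek posterior inference on $X^m$, $\delta$, $\mu$, $\gamma$, and $\sigma^2$, conditional on $X^o$. Application of Bayes' rule yields,

$$\begin{aligned}
P\left(X^m, \delta, \mu, \gamma, \sigma^2 \mid X^o\right) &\propto P\left(X^o \mid X^m, \delta, \mu, \gamma, \sigma^2\right) \cdot P\left(X^m, \delta, \mu, \gamma, \sigma^2\right) && (9) \\
&= P\left(X^o \mid X^m, \delta, \mu, \gamma, \sigma^2\right) \cdot P\left(X^m \mid \delta, \mu, \gamma, \sigma^2\right) \cdot P\left(\delta, \mu, \gamma, \sigma^2\right) && (10) \\
&= \prod_{j=1}^{N} P\left(X^o_{\cdot j} \mid X^m_{\cdot j}, \delta, \mu_j, \sigma^2, \gamma\right) \cdot \prod_{j=1}^{N} P\left(X^m_{\cdot j} \mid \delta, \mu_j, \sigma^2, \gamma\right) \cdot P\left(\delta, \mu, \sigma^2, \gamma\right), && (11)
\end{aligned}$$

where the second line follows from the identity $P(A, B) = P(A \mid B)\,P(B)$, and the third from the assumption that the error vectors for each time point are independent.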
The first term on the right hand side of Eq. (11) is the likelihood of the data ($X^o$) given the unknowns. Under the assumption that the error elements $\epsilon_{ij}$ are IID normal, the off-diagonal elements of the covariance matrix of $X_{\cdot j}$ are zero. The likelihood thus does not depend on the missing values, and can be re-expressed as a double product of univariate normals:

$$\prod_{j=1}^{N} P\left(X^o_{\cdot j} \mid X^m_{\cdot j}, \delta, \mu_j, \sigma^2, \gamma\right) = \prod_{i,j=1}^{M,N} P\left(X^o_{ij} \mid \delta_i, \mu_j, \sigma^2, \gamma\right). \tag{12}$$

The second product of multivariate normals in Eq. (11) can likewise be expressed as a double product of univariate normals:

$$\prod_{j=1}^{N} P\left(X^m_{\cdot j} \mid \delta, \mu_j, \sigma^2, \gamma\right) = \prod_{i,j=1}^{M,N} P\left(X^m_{ij} \mid \delta_i, \mu_j, \sigma^2, \gamma\right). \tag{13}$$
The second term on the right hand side of Eq. (9) gives the joint prior for the unknowns, which is re-expressed in Eq. (11) as the conditional distribution of $X^m$ given the model parameters multiplied by the joint prior for $\delta$, $\mu$, $\gamma$, and $\sigma^2$. We specify independent priors for these parameters,

$$P\left(\delta, \mu, \sigma^2, \gamma\right) = P(\delta) \cdot P(\mu) \cdot P\left(\sigma^2\right) \cdot P(\gamma). \tag{14}$$

Details of the prior specifications, which must enforce the sum-to-zero constraints [Eq. (7)], and the sampling strategy are provided in Appendix A. The end result of the analysis is an ensemble of posterior draws of $X^m$, $\delta$, $\mu$, $\gamma$, and $\sigma^2$, conditional on the data, priors, and modeling assumptions. The posterior ensemble can be used to specify both point estimates and uncertainties for the anomalies, and for any other function of the unknowns.¹
An alternative inference strategy could make use of a variant of the Expectation-Maximization algorithm of Dempster et al. (1977). While such a frequentist approach can produce point estimates of the missing values $X^m_{\cdot j}$ and the vectors $\delta$ and $\mu$, as well as estimates of the associated uncertainty for these quantities, Bayesian inference is useful in what follows for two reasons. First, draws from the posterior allow for uncertainty estimation in quantities such as the time series of the change in standard deviations after expressing the data as anomalies from the longer interval (Fig. 9). Second, obvious extensions to models with spatially correlated location effects or temporally correlated year effects will be more tractable within a Bayesian framework (see Section 5).
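To make the Expectation-Maximization alternative concrete, the sketch below alternates between imputing the missing entries from the current additive fit and re-estimating the effects from the completed matrix. It is a simplified point-estimate illustration under the IID normal error assumption, not the Bayesian sampler of Appendix A; it produces no uncertainty estimates, and the function name, iteration limits and synthetic data are all assumptions made here.

```python
import numpy as np

def fit_additive_em(X, n_iter=1000, tol=1e-12):
    """EM-style fit of X_ij = gamma + delta_i + mu_j + noise.
    Missing entries of X are marked with NaN; no row or column may be entirely missing."""
    missing = np.isnan(X)
    filled = np.where(missing, np.nanmean(X), X)     # crude starting values
    for _ in range(n_iter):
        # M-step: effects from the completed matrix (means, as in the balanced case).
        gamma = filled.mean()
        delta = filled.mean(axis=1) - gamma
        mu = filled.mean(axis=0) - gamma
        # E-step: replace missing entries by their fitted values.
        fitted = gamma + delta[:, None] + mu[None, :]
        new_filled = np.where(missing, fitted, X)
        change = np.max(np.abs(new_filled - filled))
        filled = new_filled
        if change < tol:
            break
    return gamma, delta, mu, filled

# Hypothetical noise-free additive data with about 20% of entries removed.
rng = np.random.default_rng(2)
M, N = 20, 30
d_true = rng.normal(size=M); d_true -= d_true.mean()
m_true = rng.normal(size=N); m_true -= m_true.mean()
truth = 0.5 + d_true[:, None] + m_true[None, :]
X = truth.copy()
X[rng.random((M, N)) < 0.2] = np.nan

gamma_hat, delta_hat, mu_hat, filled = fit_additive_em(X)
print(np.max(np.abs(filled - truth)))    # imputation error on noise-free data
```

On noise-free additive data the imputed values converge to the true ones, which is a convenient sanity check; with noisy data the same loop returns least-squares estimates of the effects.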
4. CRU annual mean temperatures: anomalies from 1961–1990 and from 1850–2009
a. Data and basic results
We apply the Bayesian ANOVA model to an annually averaged version of the CRUTEM3 data set (Brohan et al. 2006) of land surface temperatures; results are qualitatively unchanged when using the variance-adjusted CRUTEM3v. The CRUTEM3 data set provides monthly mean anomalies with respect to a 1961–1990 reference, and we calculate annual anomalies by averaging all available monthly observations for years and locations for which there are at least 9 monthly observations. The spatial distribution of data availability indicates that the longer instrumental records are predominantly located in Europe and North America (Fig. 5). Forming the matrix $X$ from time series at each of the 839 locations for which there is at least one annual mean observation, 45% of the values are missing, with 79% of the missing values occurring in the first half (1850–1929) of the 1850–2009 interval spanned by the data set. The year 1850 is 111 time steps from the beginning of the 30 year reference interval, which motivates the symmetric example in Figs. 2 & 3, where each of the 459 (45% of 839) time series consists of 111 observations on either side of a 30 time-step reference interval.
¹ A Matlab code package and relevant data files are available at www.people.fas.harvard.edu/~tingley.
Results are based on 5000 samples from the posterior distributions of $\gamma$, $\delta$, $\mu$, and $\sigma^2$, after discarding 600 samples to allow the chain to reach convergence (e.g., Gelman et al. 2003). Details of the hyper-parameters used in the prior distributions for the unknown parameters can be found in Appendix A.
The elements of the location effects vector, $\delta$, are the temporal means of the time series relative to the grand mean $\gamma$ (Fig. 6). Point estimates and 90% credible intervals are formed from the median, and 5th and 95th percentiles of the posterior draws, respectively. Nine of the 839 location effects are greater (in magnitude) than 0.5°C, while 108 are greater than 0.25°C. The mean width of the 90% credible intervals is 0.25°C, and 318 of the 90% point-wise credible intervals do not cover zero. The considerable spatial structure in the location effects (Fig. 6) is somewhat surprising, given that the original CRU data set is already expressed as anomalies from the 1961–1990 mean, and that no spatial structure is assumed a priori for the vector $\delta$ (see Section 5). The correlation between the medians of the location effects (Fig. 6) and the number of observations at those locations (Fig. 5) is -0.06, indicating that on a global scale, there is essentially no correlation between data availability and location effect.
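The point estimates and credible intervals described above are simple functions of the posterior draws. A minimal sketch, assuming the draws are stored as an array with one row per draw and one column per parameter (a layout chosen here, with synthetic draws standing in for real output):

```python
import numpy as np

def summarize_draws(draws, level=0.90):
    """Point-wise median and central credible interval from posterior draws.
    draws has shape (n_draws, n_parameters)."""
    tail = 100.0 * (1.0 - level) / 2.0
    lower, median, upper = np.percentile(draws, [tail, 50.0, 100.0 - tail], axis=0)
    excludes_zero = (lower > 0.0) | (upper < 0.0)    # interval does not cover zero
    return median, lower, upper, excludes_zero

# Synthetic draws for two hypothetical location effects, centred on 0.0 and 1.0 degC.
rng = np.random.default_rng(3)
draws = rng.normal(loc=[0.0, 1.0], scale=0.1, size=(5000, 2))
median, lower, upper, excludes_zero = summarize_draws(draws)
print(median, upper - lower, excludes_zero)
```

Counting the entries of `excludes_zero` that are true gives the kind of tally reported in the text for intervals that do not cover zero.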
The posterior distribution of the year effects (Fig. 6) is not important in the context of calculating anomalies. Note that the estimated year effects should not be interpreted as an estimate of the temporal evolution of the spatial mean of the temperature field; the year effects $\mu$ are simply the time series that is most common to the data set, without regard to the spatial distribution of the observations or the pattern of missing data. Estimates of the spatial mean, which make use of simple assumptions about the space-time covariance of the temperature field and account for the temporally changing pattern of data availability, are discussed below (Section 4c).
The posterior histograms of the scalar parameters are in all cases sharply peaked relative to the priors, indicating that the posterior is dominated by the information from the data (Fig. 7). Posterior estimates of $\gamma$ are negative, as the original CRU reference interval of 1961–1990 is warm relative to the longer 1850–2009 interval. The parameters $\sigma^2_\delta$ and $\sigma^2_\mu$ are related to the variance of the location and year effects, under priors that enforce the sum-to-zero constraints of Eq. (7); details can be found in Appendix A.
b. Time series of simple means and standard deviations
For each posterior draw of $\gamma$ and $\delta$, we calculate the matrix of anomalies $Y$ via Eq. (8). Although the Bayesian algorithm imputes the missing values $X^m$ (and thus $Y^m$ can be calculated), for the sake of investigating the effects of changing the reference interval we first compare means and standard deviations calculated using $X^o$ (original CRU data; anomalies from 1961–1990) and $Y^o$ (adjusted data set; anomalies from 1850–2009). The time series formed by taking, at each year $j$, the mean or standard deviation of $X^o_{\cdot j}$ or $Y^o_{\cdot j}$ will be referred to simply as the mean or standard deviation time series for that data set.
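These per-year statistics over the available observations can be written compactly when missing entries are stored as NaN. The sketch below makes two assumptions not stated in the text: the NaN convention, and the population form of the standard deviation (divisor equal to the number of observations).

```python
import numpy as np

def spatial_moment_series(X):
    """Mean and standard deviation across locations (rows) for each year (column),
    using only the observed entries; missing entries are NaN."""
    return np.nanmean(X, axis=0), np.nanstd(X, axis=0)

# Tiny hypothetical example: 2 locations, 2 years, one missing value.
X_obs = np.array([[1.0, np.nan],
                  [3.0, 2.0]])
mean_series, sd_series = spatial_moment_series(X_obs)
print(mean_series, sd_series)
```

A year with a single observation yields a standard deviation of zero under this convention, so in practice such years would need separate handling.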
Section 2 demonstrated that, for independent AR(1) time series with no missing values, the choice of reference interval does not add temporal structure to the mean time series. Were the CRU data complete, the difference between the mean time series of $Y^o$ and $X^o$ would be constant as a function of time and given by the negative of the grand mean, $\gamma$ (Fig. 8). However, if the spatial distribution of data availability is correlated with the location effects, then the mean time series of $Y^o$ and $X^o$ could be very different. For example, if long records have generally positive location effects, then estimates of the mean time series during the early part of the record would be colder, in relation to the later part, after removing the location effects and grand mean from each series. For the CRU data set, the correlation between the number of observations and the location effects is -0.06, which explains why there is little temporal structure in the difference between the mean time series of $X^o$ and $Y^o$ (Fig. 8).
Extending the reference interval from 1961–1990 to 1850–2009 increases the standard deviation within the original 1961–1990 reference interval, and decreases the standard deviation elsewhere (Fig. 9). Such a result is to be expected, given the results from Section 2 which indicate that the standard deviation is reduced within a short reference interval and inflated elsewhere. Within (outside of) the original 1961–1990 reference interval, the mean of the standard deviation time series of $X^o$ is 0.67°C (0.81°C), while that of $Y^o$ is 0.69°C (0.76°C). Re-expressing the data set as anomalies with respect to the full 1850–2009 interval thus increases the mean standard deviation within the original 1961–1990 reference by about 0.02°C and decreases the mean standard deviation elsewhere by about 0.05°C. The total range (difference between highest and lowest value) of the standard deviation time series for $X^o$ is 0.76°C, while that for $Y^o$ is 0.66°C. Changing the reference interval thus reduces the total range of standard deviation values by about 13%.
An estimate of the scaled standard deviation range, $\Delta$ (see Section 2 and Eq. (5)), for the CRU data set requires estimates of the maximum and minimum of the standard deviation time series of $X^o$, and the common standard deviation of $Y^o$. As each time series of standard deviations is noisy, we estimate $\max\left(\sqrt{\Omega^*_{ii}}\right)$ and $\min\left(\sqrt{\Omega^*_{ii}}\right)$ as the mean value of the standard deviation time series of $X^o$ outside and inside of the 1961–1990 reference interval, respectively, and $\sqrt{\Omega_{11}}$ as the mean value of the standard deviation time series of $Y^o$, giving an estimate of $\Delta = 0.19$ (Fig. 4). Results are unchanged when forming these estimates as the square root of the mean value of the corresponding variance time series. For the CRU data, the anomalies with respect to the 1961–1990 reference interval feature a second moment structure with an amplitude that is 19% of the base-line standard deviation of the anomalies with respect to a reference interval that spans the entire length of the data set.
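The estimate can be roughly reproduced from the rounded values quoted in this section. The calculation below is only an approximate check: the paper presumably works from unrounded standard deviation series, and weighting the two quoted means for $Y^o$ by the number of years inside and outside 1961–1990 is an assumption made here.

```python
# Mean standard deviations (degC) quoted in the text.
sd_x_inside, sd_x_outside = 0.67, 0.81    # X^o, inside / outside 1961-1990
sd_y_inside, sd_y_outside = 0.69, 0.76    # Y^o, inside / outside 1961-1990
n_inside, n_outside = 30, 130             # years of 1850-2009 inside / outside the interval

# Assumption: overall mean for Y^o as the year-weighted average of the two quoted means.
sd_y = (n_inside * sd_y_inside + n_outside * sd_y_outside) / (n_inside + n_outside)

# Estimate of Eq. (5): max and min taken as the outside and inside means for X^o.
delta_hat = (sd_x_outside - sd_x_inside) / sd_y
print(round(delta_hat, 2))    # 0.19
```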
c. Time series of spatial means
The posterior distribution of $\mu$ (Fig. 6) gives an estimate of the distribution of the year effects, but should not be interpreted as an estimate of the temporal evolution of the spatial mean of the temperature field. It is, rather, an estimate of the annual effects that are most common to the particular time series under study. Likewise, the mean time series of $X^o$ and $Y^o$ (Fig. 8) do not take into account the spatial distribution of the observations, or the spatial and temporal covariance structure of the temperature field.
We estimate the spatial mean time series of the global (ex-Antarctica) land surface temperature anomalies using first the original $X^o$ (the anomalies from 1961–1990) and then the point-wise median of $Y^o$ (the anomalies from 1850–2009). To account for the pattern of data availability and the spatial and temporal covariance of the surface temperature anomaly process, we adopt a hierarchical approach (e.g., Gelman et al. 2003) to infer in each case the spatially and temporally complete field. The process level models the temperature anomaly field as first-order autoregressive in time, with spatially constant AR(1) parameter and innovations with covariance that decays exponentially as a function of spatial separation. The resulting space-time covariance form is separable [...] both space and time, and isotropic in space (e.g., Banerjee et al. 2004). At the data level, the observations are modeled as the true field plus IID normal observational errors. The process and data level specifications result in a model which is a special case (no proxy observations) of the BARCAST algorithm described in Tingley and Huybers (2010), which is used to infer both model parameters and the spatially complete temperature anomalies. To increase the speed of computations, temperatures are inferred at only those 5° by 5° grid boxes that contain a non-zero fraction of land according to a 0.5° by 0.5° land mask (Rodell et al. 2004). In effect, this decision eliminates a number of the CRU grid boxes that are primarily oceanic but contain small, remote islands. These grid boxes have a negligible effect on estimates of the spatial mean over land, as they contain very small areas of land and are generally isolated from other land masses.
In order to explore the impacts of changing the reference interval on the spatial mean time series, the analysis is conducted in two stages to isolate this effect. First, the 403 annual mean CRU series that are complete from 1950–2000 are used to estimate all parameters of the BARCAST model, for both the anomalies from 1961–1990 ($X^o$) and the point-wise median of the anomalies from 1850–2009 ($Y^o$). In each case, we find the posterior sample of the vector of scalar parameters (see Table 1 of Tingley and Huybers 2010) that is closest, according to the Mahalanobis distance, to the median of the ensemble of draws of these parameters. The posterior medians of the AR(1) and variance parameters for the anomalies from the longer 1850–2009 interval are, respectively, 0.34 and 0.47°C², and these values are used in the examples in Section 2.
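Selecting the draw closest to the ensemble median can be sketched as follows. The text does not say which covariance matrix defines the Mahalanobis distance; the sample covariance of the draws is assumed here, and the function name and toy data are hypothetical.

```python
import numpy as np

def closest_draw_to_median(draws):
    """Index of the posterior draw closest, in Mahalanobis distance, to the
    point-wise median of the draws. draws has shape (n_draws, n_parameters)."""
    center = np.median(draws, axis=0)
    cov = np.atleast_2d(np.cov(draws, rowvar=False))   # assumed: sample covariance of the draws
    diff = draws - center
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    return int(np.argmin(d2))

# Toy ensemble of five two-parameter draws; the first coincides with the median.
toy = np.array([[0.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
best = closest_draw_to_median(toy)
print(best, toy[best])
```

Using an actual draw rather than the vector of medians guarantees that the fixed parameter values form a combination that was jointly sampled.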
BARCAST is then applied to each anomaly data set ($X^o$ and $Y^o$), fixing all scalar parameters save the long-term mean [...] of BARCAST, to infer temperature anomalies at all nodes of the 5° by 5° grid that contain some fraction of land. The long-term mean parameter is allowed to vary in these second applications, as for both the original and adjusted CRU data, the mean calculated over 1950–2000 is different from that over 1850–2009. The two-stage application of BARCAST allows for much faster computation, as in the first application there is only one pattern of missing data, and in the second, only one parameter estimate is updated (see Tingley and Huybers 2010, for details). In effect, this is an empirical Bayes solution, as the uncertainty in the parameter estimates is not taken into account.
Results for each application of BARCAST are based on 2000 draws from the posterior distribution of the temperature process, after discarding 600 samples to allow the chain to reach convergence (e.g., Gelman et al. 2003). The spatial mean time series is calculated for each posterior draw by weighting each grid box by the area of land it contains. In order to compare the structure and amplitude of globally (ex-Antarctica) averaged land surface temperature changes inferred using $X^o$ and the point-wise median of $Y^o$, we remove from each draw of the spatial mean time series the corresponding draw of the BARCAST mean parameter. A point estimate of each spatial mean time series is then formed by taking the median of the posterior draws for each year, while 90% point-wise credible intervals are formed from the 5th and 95th percentiles (Fig. 10a). We also smooth each of the spatial average time series (after removing the corresponding draw of the BARCAST mean parameter) by a nine-point Hanning window, and calculate the median and 90% point-wise credible intervals (Fig. 10b).
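The area weighting and smoothing steps might look like the sketch below. Two choices here are assumptions rather than details from the text: the window is built with nine non-zero weights (NumPy's `np.hanning` places zeros at both ends, so an eleven-point window is trimmed), and the series ends are handled by returning only fully overlapped points.

```python
import numpy as np

def spatial_mean_series(field, land_area):
    """Land-area-weighted mean over grid boxes.
    field: (n_boxes, n_years); land_area: (n_boxes,)."""
    weights = land_area / land_area.sum()
    return weights @ field

def hanning_smooth(series, n_points=9):
    """Smooth with a normalized Hanning window having n_points non-zero weights.
    Only the fully overlapped part is returned, so the output is n_points - 1
    values shorter than the input."""
    window = np.hanning(n_points + 2)[1:-1]    # drop the two zero end weights
    window = window / window.sum()
    return np.convolve(series, window, mode="valid")

# Hypothetical two-box example and a linear test series.
field = np.array([[1.0, 2.0], [3.0, 4.0]])
area = np.array([1.0, 3.0])
print(spatial_mean_series(field, area))        # [2.5 3.5]
print(hanning_smooth(np.arange(20.0)))         # a linear trend passes through unchanged
```

Applying both functions to each posterior draw, and then taking percentiles across draws, would give smoothed point-wise credible intervals.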
To explore the influence of the reference interval on the rate of long-term temperature change, we take the difference between draws of the temporally smoothed spatial mean time series based on the respective analyses of $X^o$ and $Y^o$, and calculate the median and 90% point-wise credible intervals for the time series of differences (Fig. 10c). The smoothed spatial mean time series calculated using $X^o$ (anomalies from 1961–1990) is cooler in the earlier part of the record, and warmer in the later part, relative to the spatial mean time series calculated using the point-wise median of $Y^o$ (anomalies from 1850–2009).
The range of the time series of differences in the temporally smoothed spatial means (Fig. 10c) indicates the change in the amplitude of spatial mean temperatures that can be attributed to the choice of reference interval. The range in the point-wise posterior median is 0.1°C, with the maximum at 1866 and the minimum at 2005; the associated 90% credible interval is (-0.03, 0.23). In other words, changing the reference interval from 1961–1990 to 1850–2009 reduces the amplitude of the (smoothed) spatial mean time series by about 0.1°C, but as the associated 90% uncertainty interval covers zero, this result is not significant at the 90% level. Indeed, the very weak correlation between the location effects and number of observations at each location suggested that the change in reference interval would not have a large effect on the spatial mean time series.
5. Discussion and Conclusions
There are both technical and scientific reasons for analyzing climate data sets after removing from each time series a mean value calculated over a common reference interval. As interest in the temporal evolution of climate variables extends beyond changes in mean values, it is crucial to ensure that the method used to calculate anomalies does not add spurious structures to either the first-moment or higher-moment properties of the data set. The Bayesian two-factor ANOVA approach to calculating climate anomalies proposed here makes use of all available data, and calculates the location effects over a reference interval that is as long as possible, which avoids the introduction of non-climatic second moment structures into the anomalies. Bayesian inference treats the missing values as additional model parameters, and uncertainty estimates for the anomalies include the uncertainty that arises from the missing data.
Several generalizations to the basic analysis model (Section 3) are possible. The assumption that the only structure in the location effects is that introduced by the sum-to-zero constraint [Eq. (7)] is plausible for the analysis presented here, as the original CRU data are already expressed as anomalies from a common interval. However, there is clear spatial coherence to the estimated location effects (Fig. 6a), which could be accounted for in future work by modeling the location effects as a spatial process with a standard spatial covariance form (e.g., Banerjee et al. 2004), modified to account for the sum-to-zero constraint. When calculating anomalies from actual values (rather than adjusting the reference interval, as is done here), the model for the location effects should also take into account expected spatial structures by including latitude, elevation, and perhaps other variables as covariates in the expression for the mean of the location effects. The treatment of the year effects could likewise be generalized to include temporal trends in the mean structure and a covariance matrix that accounts for temporal autocorrelation. Finally, the assumption that the error terms are IID normal could be modified to account for any observed spatial or temporal patterns in the residuals, though care must be taken to ensure identifiability when adding structure to the errors as well as the location and year effects.
Using the basic Bayesian ANOVA scheme introduced here to re-express an annually
averaged version of the CRU's gridded temperature product as anomalies with respect to
means calculated over the entire 1850–2009 interval demonstrates the influence that the choice of
reference interval can have on the statistical properties of the anomaly data set. Relative to
the original anomalies with respect to 1961–1990, the anomalies with respect to the longer
interval display larger spatial variance within the original 1961–1990 reference interval, and
smaller spatial variance elsewhere. Any analysis of the original CRU data that depends on
second-moment properties, such as estimates of spatial patterns of variability, or the spatial
distribution of extreme values, will thus be affected by second-moment features that are
directly attributable to the choice of reference interval. Measured by an estimate of the
scaled standard deviation range (∆ from Eq. (5)), the anomalies with respect to the original
1961–1990 interval feature a second-moment structure with an amplitude of about 19% of the
magnitude of the mean standard deviation of the anomalies with respect to the longer
1850–2009 interval. Calculations for AR(1) time series with the AR(1) parameter estimated from the
CRU anomalies with respect to the longer 1850–2009 interval predicted a qualitatively similar result,
but a smaller scaled standard deviation range of about 6.8%. The larger value found in
practice could result from the spatial covariance of the CRU data, which was not accounted
for in the ANOVA model or the experiments with AR(1) data in Section 2.
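The qualitative AR(1) result can be reproduced with a short simulation. The following is a minimal sketch, not the calculation of Section 2: it assumes that the quoted value 0.47 is the innovation variance, places a 30-step reference interval at the centre of 252 time steps, and reports a simple relative gap between the mean standard deviation outside and inside the reference interval. That gap is an illustrative stand-in and is not the ∆ of Eq. (5). All function and variable names are hypothetical.

```python
import numpy as np

def ar1_series(n_series, n_steps, a, innov_var, rng):
    """Simulate independent stationary AR(1) series, shape (n_series, n_steps)."""
    x = np.empty((n_series, n_steps))
    # Start from the stationary distribution so no burn-in is needed.
    x[:, 0] = rng.normal(0.0, np.sqrt(innov_var / (1.0 - a**2)), n_series)
    eps = rng.normal(0.0, np.sqrt(innov_var), (n_series, n_steps))
    for t in range(1, n_steps):
        x[:, t] = a * x[:, t - 1] + eps[:, t]
    return x

def sd_inside_outside(x, ref):
    """Remove reference-interval means, then compare cross-series SDs."""
    anom = x - x[:, ref].mean(axis=1, keepdims=True)
    sd = anom.std(axis=0, ddof=1)            # SD across series at each time step
    inside = np.zeros(x.shape[1], dtype=bool)
    inside[ref] = True
    sd_in, sd_out = sd[inside].mean(), sd[~inside].mean()
    # Illustrative relative gap (assumption: NOT the paper's Eq. (5)).
    return sd_in, sd_out, (sd_out - sd_in) / sd.mean()

rng = np.random.default_rng(0)
x = ar1_series(n_series=459, n_steps=252, a=0.34, innov_var=0.47, rng=rng)
ref = slice(111, 141)                        # 30 steps centred on N/2 (assumed)
sd_in, sd_out, rel_gap = sd_inside_outside(x, ref)
print(f"mean SD inside: {sd_in:.3f}, outside: {sd_out:.3f}, relative gap: {rel_gap:.3f}")
```

Under these assumptions the standard deviation is lower inside the reference interval than outside it, which is the direction of the effect described above.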
For the CRU data, the location effects from the ANOVA analysis are essentially
uncorrelated with the number of observations at each location, and as a result, the choice of
reference interval has little effect on either the time series of simple means, or estimates of
the time series of spatial means. In terms of the spatial means, re-expressing the CRU data
as anomalies with respect to the longer 1850–2009 reference interval reduces the amplitude
of temperature change over the past 160 years by about 0.1°C, but as the 90% uncertainty
interval, (-0.03, 0.23), covers zero, this result is not statistically significant. It is
important to emphasize that for other climate data sets, where the location effects may be more
strongly correlated with data availability, the choice of reference interval used to calculate
the anomalies will influence both the first and second moment structures of the resulting
anomalies.
Acknowledgments.
The content and presentation of the article benefited from discussions with T. Greasby,
P. Huybers, D. Nychka, J. Rougier, S. Sain, B. Shaby, and from the comments of two
anonymous referees.
APPENDIX
Prior specification and posterior sampling
A Gibbs Sampler (e.g., Gelman et al. 2003) is used to sample from the posterior distribution
of the missing values, year and location effects, and scalar parameters of the ANOVA model.
We specify conjugate priors for all scalar parameters; given the structure of Eq. (10), it is
not necessary to specify priors for the values of the missing observations, $X^m$. We first
specify the functional forms of the priors and full conditional posteriors, and then discuss the
hyper-parameters used in the analysis of the CRU data. The notation $A \mid \cdot$ will denote
the distribution of the variable $A$ conditional on all other variables.
a. The missing values, $X^m_{i,j}$
No prior is necessary for the missing values; the full conditional posterior for each is
normal:
\[ X^m_{ij} \mid \cdot \sim N(\gamma + d_i + \mu_j,\; s^2). \tag{A1} \]
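b. Grand mean, $\gamma$
The normal distribution is the conjugate prior:
\[ \gamma \sim N(\mu_\gamma,\; s^2_\gamma). \tag{A2} \]
The conditional posterior is likewise normal,
\[ \gamma \mid \cdot \sim N(\Psi_\gamma V_\gamma,\; \Psi_\gamma), \tag{A3} \]
where
\[ V_\gamma = \frac{1}{s^2} \sum_{\text{all}} X_{ij} + \frac{\mu_\gamma}{s^2_\gamma}, \tag{A4} \]
and
\[ \Psi_\gamma^{-1} = \frac{N \cdot M}{s^2} + \frac{1}{s^2_\gamma}. \tag{A5} \]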
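c. Error variance, $s^2$
The inverse-gamma distribution is the conjugate prior:
\[ s^2 \sim \text{Inverse-Gamma}(\lambda, \nu), \quad \text{so that} \quad P(s^2) \propto (s^2)^{-(\lambda+1)} \cdot \exp\left(-\nu / s^2\right). \tag{A6} \]
Note that the inverse-gamma prior can be interpreted as $2\lambda$ prior observations with an
average squared deviation of $\nu/\lambda$ (e.g., Gelman et al. 2003). The conditional posterior is
likewise inverse-gamma,
\[ s^2 \mid \cdot \sim \text{Inverse-Gamma}\left( \frac{N \cdot M}{2} + \lambda,\; \frac{1}{2} \sum_{\text{all}} (X_{ij} - \gamma - d_i - \mu_j)^2 + \nu \right). \tag{A7} \]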
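The three full conditionals in Eqs. (A1), (A3)–(A5), and (A7) translate directly into sampling steps. The sketch below is one possible implementation under stated assumptions, not the code used for the analysis: the data are held in an M by N array (locations by years) with a boolean mask marking the missing entries, the inverse-gamma draw is obtained as the scale divided by a unit-scale gamma draw, and all function names and the synthetic test values are hypothetical.

```python
import numpy as np

def draw_inverse_gamma(shape, scale, rng):
    """Draw from Inverse-Gamma(shape, scale): density ∝ x^-(shape+1) exp(-scale/x)."""
    return scale / rng.gamma(shape)

def update_missing(X, missing, gamma, d, mu, s2, rng):
    """Eq. (A1): redraw each missing X_ij from N(gamma + d_i + mu_j, s2)."""
    mean = gamma + d[:, None] + mu[None, :]
    X = X.copy()
    X[missing] = rng.normal(mean[missing], np.sqrt(s2))
    return X

def update_grand_mean(X, s2, prior_mean, prior_var, rng):
    """Eqs. (A3)-(A5): normal full conditional for gamma (X has no gaps here)."""
    V = X.sum() / s2 + prior_mean / prior_var
    Psi = 1.0 / (X.size / s2 + 1.0 / prior_var)
    return rng.normal(Psi * V, np.sqrt(Psi))

def update_error_variance(X, gamma, d, mu, lam, nu, rng):
    """Eq. (A7): inverse-gamma full conditional for s2."""
    resid = X - gamma - d[:, None] - mu[None, :]
    return draw_inverse_gamma(X.size / 2.0 + lam, 0.5 * np.sum(resid**2) + nu, rng)

# Synthetic illustration (all values are assumptions, not CRU estimates).
rng = np.random.default_rng(1)
M, N = 6, 40
d_true = rng.normal(0.0, 0.3, M)
d_true -= d_true.mean()                       # enforce sum-to-zero
mu_true = rng.normal(0.0, 0.5, N)
mu_true -= mu_true.mean()
X_full = -0.1 + d_true[:, None] + mu_true[None, :] + rng.normal(0.0, 0.2, (M, N))
missing = rng.random((M, N)) < 0.2
X0 = np.where(missing, 0.0, X_full)           # crude fill before the first sweep

X1 = update_missing(X0, missing, -0.1, d_true, mu_true, 0.04, rng)
gamma1 = update_grand_mean(X1, 0.04, prior_mean=0.0, prior_var=1.0, rng=rng)
s2_1 = update_error_variance(X1, gamma1, d_true, mu_true, lam=0.5, nu=0.02, rng=rng)
```

In a full sampler these updates would be interleaved with the updates for the location and year effects and their variances, and repeated for many sweeps.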
d. The location and year effects, $d$ and $\mu$
The priors for $d$ and $\mu$ must take into account the sum-to-zero constraints of Eq. (7).
We follow Kaufman and Sain (2010), setting $d \sim N(0, S_d)$ and $\mu \sim N(0, S_\mu)$. Using $I_M$
to represent the $M$ by $M$ identity, and $J_M$ to represent the $M$ by $M$ matrix of ones, we specify
the prior covariances as,
\[ S_d = s^2_d \left( I_M - \frac{1}{M} J_M \right) \quad \text{and} \quad S_\mu = s^2_\mu \left( I_N - \frac{1}{N} J_N \right). \tag{A8} \]
In other words,
\[ (S_d)_{ij} = \begin{cases} \left(1 - \frac{1}{M}\right) s^2_d, & \text{if } i = j \\ -\frac{1}{M} s^2_d, & \text{if } i \neq j \end{cases} \tag{A9} \]
and similarly for $S_\mu$. The sum of the elements of $d$ is then distributed as
$N(0, 1^T S_d 1) = N(0, 0)$, which ensures that the sum-to-zero condition is enforced. Posterior
sampling is complicated by the singularity of the prior covariance matrices $S_d$ and $S_\mu$. Our
strategy for sampling the vectors of location and year effects, $d$ and $\mu$, which we detail for
$d$, is based on the presentation in Kaufman and Sain (2010).
The sum-to-zero constraint implies that there are only $M-1$ free parameters in the
length $M$ vector $d$. We therefore seek to express $d$ as a linear transform of an $M-1$
dimensional vector of IID normal variables. Letting $d^* \mid s^2_d \sim N(0_{M-1},\; s^2_d \cdot I_{M-1})$, we must
find an $M$ by $M-1$ matrix $Q_M$ such that $d = Q_M d^*$ has the required covariance form
[Eq. (A9)].
Following Kaufman and Sain (2010), define the $M$ by $M-1$ matrix $Q^*_M$ as columns of
Helmert contrasts, which compare the effect of one level of a factor to the mean of the
preceding levels (e.g., Ruberg 1989). As an example, for $M = 4$,
\[ Q^*_4 = \begin{pmatrix} 1 & 1 & 1 \\ -1 & 1 & 1 \\ 0 & -2 & 1 \\ 0 & 0 & -3 \end{pmatrix}. \tag{A10} \]
Now solve for an $M-1$ by $M-1$ diagonal matrix $R_M$, where $Q_M = Q^*_M R_M$, such that
$d = Q_M d^*$ has the correct covariance form. That is, solve for $R_M$ that satisfies,
\[ Q^*_M R_M R_M^T Q^{*T}_M = S_d / s^2_d = I_M - \frac{1}{M} J_M. \tag{A11} \]
Solving for $R_M$ yields,
\[ \begin{aligned} R_M &= \left[ \left( Q^{*T}_M Q^*_M \right)^{-1} Q^{*T}_M \left( I_M - \frac{1}{M} J_M \right) Q^*_M \left( Q^{*T}_M Q^*_M \right)^{-1} \right]^{1/2} \\ &= \left( Q^{*T}_M Q^*_M \right)^{-1/2}. \end{aligned} \tag{A12} \]
The first line follows from left-multiplying Eq. (A11) by $(Q^{*T}_M Q^*_M)^{-1} Q^{*T}_M$, right-multiplying
by $Q^*_M (Q^{*T}_M Q^*_M)^{-1}$, and then taking the matrix square root. The second line follows from
the fact that each column of $Q^*_M$ sums to zero, so that $Q^{*T}_M J_M = 0$. Plugging this form for
$R_M$ into the definition of $Q_M$ gives,
\[ Q_M = Q^*_M \left( Q^{*T}_M Q^*_M \right)^{-1/2}. \tag{A13} \]
Kaufman and Sain (2010) indicate that each column of a matrix of Helmert contrasts needs
to be scaled by some factor, and demonstrate for the $M = 3$ case, but do not provide the
general formulas derived here.
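The construction of $Q_M$ in Eqs. (A10)–(A13) is easy to check numerically. The sketch below builds the Helmert contrasts in the row ordering of Eq. (A10) and uses the fact that the Helmert columns are mutually orthogonal, so that $Q^{*T}_M Q^*_M$ is diagonal and its inverse square root simply rescales each column to unit length. The function names are hypothetical.

```python
import numpy as np

def helmert_contrasts(M):
    """M x (M-1) matrix Q*: column k compares level k+1 with the k levels before it."""
    Q_star = np.zeros((M, M - 1))
    for k in range(1, M):
        Q_star[:k, k - 1] = 1.0      # the k preceding levels
        Q_star[k, k - 1] = -k        # the level being compared
    return Q_star

def scaled_contrasts(M):
    """Q = Q* (Q*^T Q*)^(-1/2), Eq. (A13); the middle factor is diagonal here."""
    Q_star = helmert_contrasts(M)
    column_norms = np.sqrt(np.sum(Q_star**2, axis=0))
    return Q_star / column_norms

M = 5
Q = scaled_contrasts(M)
target = np.eye(M) - np.ones((M, M)) / M           # I_M - J_M / M
print(np.allclose(Q @ Q.T, target))                # covariance form of Eq. (A11)
print(np.allclose(Q.T @ Q, np.eye(M - 1)))         # columns are orthonormal
```

Because the columns of $Q_M$ sum to zero, any vector of the form $Q_M d^*$ satisfies the sum-to-zero constraint exactly, up to floating-point error.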
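To produce samples from the conditional posterior of $d$, we first draw from the conditional
posterior of $d^*$, and then transform using the expression for $Q_M$. The prior for $d^*$, given $s^2_d$,
is normal,
\[ d^* \mid s^2_d \sim N\left( 0_{M-1},\; s^2_d I_{M-1} \right). \tag{A14} \]
The full conditional posterior for $d^*$ is likewise normal,
\[ d^* \mid \cdot \sim N\left( \Psi_{d^*} V_{d^*},\; \Psi_{d^*} I_{M-1} \right), \tag{A15} \]
where
\[ V_{d^*} = \frac{1}{s^2} \cdot Q_M^T \cdot \sum_{j=1}^{N} \left( X_{\cdot j} - \gamma 1_M \right), \tag{A16} \]
and
\[ \Psi_{d^*} = \left( \frac{N}{s^2} + \frac{1}{s^2_d} \right)^{-1}. \tag{A17} \]
The calculation proceeds by substituting $Q_M d^*$ for $d$ in Eq. (11), and factoring the joint
distribution of the elements of $X$ as a product of $N$ multivariate normals, one for each time
interval.
The treatment of $\mu$ is equivalent to that for $d$. The prior for $\mu^*$, given $s^2_\mu$, is normal:
\[ \mu^* \mid s^2_\mu \sim N\left( 0_{N-1},\; s^2_\mu I_{N-1} \right). \tag{A18} \]
The full conditional posterior is likewise normal,
\[ \mu^* \mid \cdot \sim N\left( \Psi_{\mu^*} V_{\mu^*},\; \Psi_{\mu^*} I_{N-1} \right), \tag{A19} \]
where
\[ V_{\mu^*} = \frac{1}{s^2} \cdot Q_N^T \sum_{i=1}^{M} \left( X_{i \cdot} - \gamma 1_N \right), \tag{A20} \]
and
\[ \Psi_{\mu^*} = \left( \frac{M}{s^2} + \frac{1}{s^2_\mu} \right)^{-1}. \tag{A21} \]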
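A draw of the location effects can be sketched as a draw of the unconstrained vector followed by the linear transform, as in Eqs. (A14)–(A17). The code below is a minimal illustration under assumptions: the completed data are held in an M by N array (locations by years), the scaled Helmert matrix is rebuilt inside the helper so the block is self-contained, and the function names and synthetic values are hypothetical. Passing the transposed array and the year-effect variance gives the corresponding draw for the year effects, Eqs. (A18)–(A21).

```python
import numpy as np

def scaled_helmert(M):
    """Q_M of Eq. (A13): Helmert contrasts with each column scaled to unit length."""
    Q = np.zeros((M, M - 1))
    for k in range(1, M):
        Q[:k, k - 1] = 1.0
        Q[k, k - 1] = -k
    return Q / np.sqrt((Q**2).sum(axis=0))

def update_effects(X, gamma, s2, s2_effect, rng):
    """Draw the unconstrained vector (Eqs. A15-A17), then map it to sum-to-zero effects."""
    M, N = X.shape
    Q = scaled_helmert(M)
    V = Q.T @ (X - gamma).sum(axis=1) / s2         # Eq. (A16)
    Psi = 1.0 / (N / s2 + 1.0 / s2_effect)         # Eq. (A17)
    effect_star = rng.normal(Psi * V, np.sqrt(Psi))
    return Q @ effect_star, effect_star

# Synthetic illustration (values are assumptions, not CRU estimates).
rng = np.random.default_rng(2)
M, N = 8, 60
d_true = rng.normal(0.0, 0.3, M)
d_true -= d_true.mean()
X = -0.1 + d_true[:, None] + rng.normal(0.0, 0.2, (M, N))

d, d_star = update_effects(X, gamma=-0.1, s2=0.04, s2_effect=0.09, rng=rng)
mu, mu_star = update_effects(X.T, gamma=-0.1, s2=0.04, s2_effect=0.25, rng=rng)
print(d.sum(), mu.sum())   # both are zero up to floating-point error
```

The year effects do not need to be subtracted before forming the sum in Eq. (A16), because the columns of $Q_M$ sum to zero and any term proportional to $1_M$ is annihilated by $Q_M^T$.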
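e. Variances of the location and year effects, $s^2_d$ and $s^2_\mu$
The conjugate priors are inverse-gamma:
\[ s^2_d \sim \text{Inverse-Gamma}(\lambda_d, \nu_d), \qquad s^2_\mu \sim \text{Inverse-Gamma}(\lambda_\mu, \nu_\mu). \tag{A22} \]
The full conditional posteriors are likewise inverse-gamma:
\[ \begin{aligned} s^2_d \mid \cdot &\sim \text{Inverse-Gamma}\left( \frac{M-1}{2} + \lambda_d,\; \frac{d^{*T} d^*}{2} + \nu_d \right), \\ s^2_\mu \mid \cdot &\sim \text{Inverse-Gamma}\left( \frac{N-1}{2} + \lambda_\mu,\; \frac{\mu^{*T} \mu^*}{2} + \nu_\mu \right). \end{aligned} \tag{A23} \]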
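The variance updates of Eq. (A23) depend on the effects only through the unconstrained vectors. A minimal sketch follows, assuming the inverse-gamma draw is obtained as the scale divided by a unit-scale gamma draw; the function name and the stand-in values are hypothetical.

```python
import numpy as np

def update_effect_variance(effect_star, lam, nu, rng):
    """Eq. (A23): inverse-gamma full conditional for the variance of an effect vector."""
    shape = effect_star.size / 2.0 + lam           # (M-1)/2 + lambda
    scale = 0.5 * effect_star @ effect_star + nu   # d*^T d* / 2 + nu
    return scale / rng.gamma(shape)

# Stand-in for a current draw of the unconstrained location effects (M = 50).
rng = np.random.default_rng(3)
d_star = rng.normal(0.0, 0.3, 49)
draws = np.array([update_effect_variance(d_star, lam=0.25, nu=0.02, rng=rng)
                  for _ in range(20000)])
print(draws.mean())
```

For a shape parameter greater than one, the mean of the inverse-gamma distribution is the scale divided by the shape minus one, which gives a simple check on the draws.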
f. Hyper-parameters for the analysis of the CRU data
Preliminary analysis of the observed values is used to set the parameters of the prior
distributions for $\gamma$, $s^2$, $s^2_d$, and $s^2_\mu$. An initial estimate of the grand mean ($\gamma$) is formed as the
mean of the $X^o$, while the location effects are estimated as the temporal means of $X^o$ less the
estimated grand mean, and likewise for the year effects. An estimate of the error variance
can be formed by first estimating each element of $X^o$ as the sum of the grand mean and
corresponding location and year effects, taking the difference between these estimates and
the observed values, and then taking the variance of the resulting residuals. The
hyper-parameters are then set as follows:
• Grand mean, $\gamma$. Set the prior mean, $\mu_\gamma$, to the mean of all available observations, and
the prior variance $s^2_\gamma$ to 16 times the estimated variance.
• Prior for the error variance, $s^2$. Set $\lambda$ to 1/2, and $\nu$ to half the estimated residual
variance. These parameters correspond to one prior observation with an average
squared deviation given by the estimated residual variance.
• Factor variances, $s^2_d$ and $s^2_\mu$. Set $\lambda_{d,\mu}$ to 1/4 and set $\nu_{d,\mu}$ to one fourth the variance of
the estimated effect vectors. In each case, these parameters correspond to half a prior
observation with an average squared deviation given by the sample variance.
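The preliminary estimates and hyper-parameter settings listed above can be sketched as follows. This is an illustration under assumptions rather than the code used for the CRU analysis: the observations are held in an M by N array with NaN marking missing entries, every location and every year has at least one observation, and "16 times the estimated variance" is taken to mean 16 times the sample variance of the observed values, which the text does not state explicitly. The function name and dictionary keys are hypothetical.

```python
import numpy as np

def set_hyperparameters(X_obs):
    """Preliminary estimates from the observed values (NaN = missing)."""
    grand = np.nanmean(X_obs)                       # initial grand mean
    loc = np.nanmean(X_obs, axis=1) - grand         # temporal means less grand mean
    year = np.nanmean(X_obs, axis=0) - grand        # spatial means less grand mean
    resid = X_obs - (grand + loc[:, None] + year[None, :])
    resid_var = np.nanvar(resid, ddof=1)
    return {
        "mu_gamma": grand,
        # Assumption: the "estimated variance" is the variance of the observations.
        "s2_gamma": 16.0 * np.nanvar(X_obs, ddof=1),
        "lam": 0.5, "nu": 0.5 * resid_var,          # one prior observation
        "lam_d": 0.25, "nu_d": 0.25 * np.var(loc, ddof=1),     # half an observation
        "lam_mu": 0.25, "nu_mu": 0.25 * np.var(year, ddof=1),  # half an observation
    }

# Synthetic illustration with roughly 20% of the entries missing.
rng = np.random.default_rng(4)
X_obs = rng.normal(0.0, 0.7, (10, 30))
X_obs[rng.random(X_obs.shape) < 0.2] = np.nan
hp = set_hyperparameters(X_obs)
```

With these settings the ratio of each scale parameter to its shape parameter recovers the corresponding preliminary variance estimate, matching the interpretation of the inverse-gamma prior as $2\lambda$ prior observations with average squared deviation $\nu/\lambda$.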
REFERENCES
Banerjee, S., B. P. Carlin, and A. E. Gelfand, 2004: Hierarchical Modeling and Analysis for Spatial Statistics. Chapman & Hall/CRC, New York.
Brohan, P., J. J. Kennedy, I. Harris, S. F. B. Tett, and P. D. Jones, 2006: Uncertainty estimates in regional and global observed temperature changes: A new data set from 1850. Journal of Geophysical Research, 2, 99–113.
Dempster, A., N. Laird, D. Rubin, et al., 1977: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39 (1), 1–38.
Gelman, A., J. B. Carlin, H. S. Stern, and D. B. Rubin, 2003: Bayesian Data Analysis. 2d ed., Chapman & Hall/CRC, Boca Raton.
Hansen, J. and S. Lebedeff, 1987: Global trends of measured surface air temperature. Journal of Geophysical Research, 92 (13), 345–13.
IPCC, 2001: Climate Change 2001: Synthesis Report. A Contribution of Working Groups I, II, and III to the Third Assessment Report of the Intergovernmental Panel on Climate Change, R. Watson and the Core Writing Team, Eds., Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA.
Jansen, E., et al., 2007: Palaeoclimate. Climate Change 2007: The Physical Science Basis. Contribution of Working Group I to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change, S. Solomon, D. Qin, M. Manning, Z. Chen, M. Marquis, K. Averyt, M. Tignor, and H. Miller, Eds., Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, chap. 6.
Jones, P., M. New, D. Parker, S. Martin, and I. Rigor, 1999: Surface air temperature and its changes over the past 150 years. Reviews of Geophysics, 37 (2), 173–199.
Kalnay, E., et al., 1996: The NCEP/NCAR 40-year reanalysis project. Bulletin of the American Meteorological Society, 77 (3), 437–471, NCEP Reanalysis Derived data provided by the NOAA/OAR/ESRL PSD, Boulder, Colorado, USA, from their Web site at http://www.esrl.noaa.gov/psd/.
Kaufman, C. and S. Sain, 2010: Bayesian functional ANOVA modeling using Gaussian process prior distributions. Bayesian Analysis, 5 (1), 123–150.
Mann, M., et al., 2009: Global signatures and dynamical origins of the Little Ice Age and Medieval Climate Anomaly. Science, 326, 1256–1260.
Rodell, M., et al., 2004: The global land data assimilation system. Bulletin of the American Meteorological Society, 85 (3), 381–394, data available at http://ldas.gsfc.nasa.gov/gldas/GLDASvegetation.php.
Ruberg, S., 1989: Contrasts for identifying the minimum effective dose. Journal of the American Statistical Association, 84 (407), 816–822.
Scheffé, H., 1999: The analysis of variance. Wiley-Interscience.
Tingley, M. and P. Huybers, 2010: A Bayesian Algorithm for Reconstructing Climate Anomalies in Space and Time. Part 1: Development and applications to paleoclimate reconstruction problems. Journal of Climate, 23 (10), 2759–2781.
Trenberth, K., et al., 2007: Observations: Surface and atmospheric climate change. Climate Change 2007: The Physical Science Basis. Contribution of Working Group I to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change, S. Solomon, D. Qin, M. Manning, Z. Chen, M. Marquis, K. Averyt, M. Tignor, and H. Miller, Eds., Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, chap. 3.
Zar, J. H., 1999: Biostatistical Analysis. 4th ed., Pearson Education, Singapore.
List of Figures
1 NCEP reanalysis (Kalnay et al. 1996) for the annual mean temperature field. (a) Values for 1981. (b) The 1968–1996 long-term mean. (c) The 1981 anomalies from the long term mean.
2 Upper panel: 459 independent AR(1) time series ($a = 0.34$, $s^2 = 0.47$), box-plots at every fourth time step, and the sample mean at each time step. Lower panel: the sample standard deviation at each time step (black) and the population standard deviation (grey) calculated from Eq. (1). The number of time series, the relative length of the reference interval, and the values of $a$ and $s^2$ are chosen to correspond to estimates from the CRU data set analyzed in Section 4.
3 As in Fig. 2, but after removing from each time series the sample mean calculated over the 30 time-step shaded interval. In the lower panel, the grey line is the population standard deviation of the original series, and the blue line is the population standard deviation of the anomalies calculated from Eq. (4).
4 The scaled standard deviation range, $\Delta(a, N, N^*)$, as a function of both the AR(1) coefficient and the length of the reference interval, for AR(1) time series of length $N = 252$ with reference interval centered on $N/2$. The white plus sign demarcates $N^* = 30$ and $a = 0.34$, which correspond to the example in Figs. 2 & 3 and the parameter values obtained from the analysis of the CRU data in Section 4. The black contour corresponds to the actual value of $\Delta$ estimated from the CRU data set in Section 4.
5 The number of years in the 1850–2009 interval for which there is an annual mean temperature anomaly observation, as a function of spatial location, calculated from the CRUTEM3 monthly data set (Brohan et al. 2006).
6 (a) Posterior medians of the location effects, $d$. The mean width of the 90% point-wise credible intervals is 0.25°C, and hatching indicates locations where the corresponding 90% credible interval contains zero. (b) Posterior estimates of the year effects, $\mu$. The posterior median is shown in black, and 90% point-wise credible intervals in light grey.
7 Black: posterior histograms of the grand mean ($\gamma$), error variance ($s^2$), and the variances of the year and location effects ($s^2_\mu$ and $s^2_d$). Dashed grey: prior distributions for these parameters.
8 Upper panel: The mean time series of $X^o$, the original CRU temperature anomalies using a 1961–1990 reference interval (black) and the mean time series of $Y^o$, the adjusted anomalies using an 1850–2009 reference interval (red). Results for the 1850–2009 reference interval are formed by first calculating the anomalies $Y^o$ via Eq. (8), and then the average across these anomalies, for each posterior draw of $\gamma$ and $d$. Both the medians and the corresponding 90% point-wise credible intervals (light red shading; not readily discernible from the red medians) of the resulting distribution are plotted. Lower panel: Median (black) and 90% credible interval (shading) for the difference in the mean time series of each draw of $Y^o$ and the mean time series of the original $X^o$.
9 As in Figure 8, but for the standard deviation time series of $X^o$ and $Y^o$.
10 (a) Time series of spatially averaged global (ex-Antarctica) land surface temperature anomalies based on the annual mean CRU temperature anomalies from the original 1961–1990 reference interval (black) and from the longer 1850–2009 interval (red). 90% point-wise credible intervals are shown for the spatial average calculated using the anomalies from the longer 1850–2009 interval (light red shading), and the corresponding uncertainty intervals when using the shorter 1961–1990 interval are similar. In both cases, a reduced form of BARCAST (Tingley and Huybers 2010) was used to infer the missing temperature values in space and time, and the temporal mean has been removed from each draw of the spatial mean time series prior to calculating percentiles. (b) As in (a), but each draw of the spatial mean time series from BARCAST is smoothed using a nine-point Hanning window prior to calculating percentiles. (c) The median difference between smoothed draws of the global mean temperature series using the two reference intervals (black), and the 90% point-wise uncertainty (grey).
[Figure 1: three map panels titled (a) 1981 values, (b) 1968-1996 mean, and (c) 1981 anomalies; color scale from -25 to 25.]
Fig. 1. NCEP reanalysis (Kalnay et al. 1996) for the annual mean temperature field. (a) Values for 1981. (b) The 1968–1996 long-term mean. (c) The 1981 anomalies from the long term mean.
[Figure 2: upper panel "Time Series", lower panel "Standard Dev."; horizontal axis "Time step".]
Fig. 2. Upper panel: 459 independent AR(1) time series ($a = 0.34$, $s^2 = 0.47$), box-plots at every fourth time step, and the sample mean at each time step. Lower panel: the sample standard deviation at each time step (black) and the population standard deviation (grey) calculated from Eq. (1). The number of time series, the relative length of the reference interval, and the values of $a$ and $s^2$ are chosen to correspond to estimates from the CRU data set analyzed in Section 4.
[Figure 3: upper panel "Time Series", lower panel "Standard Dev."; horizontal axis "Time step".]
Fig. 3. As in Fig. 2, but after removing from each time series the sample mean calculated over the 30 time-step shaded interval. In the lower panel, the grey line is the population standard deviation of the original series, and the blue line is the population standard deviation of the anomalies calculated from Eq. (4).
[Figure 4: contour plot; horizontal axis "Length of the reference interval, N*", vertical axis "Value of the AR(1) coefficient, a".]
Fig. 4. The scaled standard deviation range, $\Delta(a, N, N^*)$, as a function of both the AR(1) coefficient and the length of the reference interval, for AR(1) time series of length $N = 252$ with reference interval centered on $N/2$. The white plus sign demarcates $N^* = 30$ and $a = 0.34$, which correspond to the example in Figs. 2 & 3 and the parameter values obtained from the analysis of the CRU data in Section 4. The black contour corresponds to the actual value of $\Delta$ estimated from the CRU data set in Section 4.
[Figure 5: global map of the number of annual observations at each location.]
Fig. 5. The number of years in the 1850–2009 interval for which there is an annual mean temperature anomaly observation, as a function of spatial location, calculated from the CRUTEM3 monthly data set (Brohan et al. 2006).
[Figure 6: (a) global map with color scale from -1 to 1 °C; (b) time series in °C against "Year", 1850 to 2000.]
Fig. 6. (a) Posterior medians of the location effects, $d$. The mean width of the 90% point-wise credible intervals is 0.25°C, and hatching indicates locations where the corresponding 90% credible interval contains zero. (b) Posterior estimates of the year effects, $\mu$. The posterior median is shown in black, and 90% point-wise credible intervals in light grey.
[Figure 7: four histogram panels, for $\gamma$, $s^2$, $s^2_\mu$, and $s^2_d$.]
Fig. 7. Black: posterior histograms of the grand mean ($\gamma$), error variance ($s^2$), and the variances of the year and location effects ($s^2_\mu$ and $s^2_d$). Dashed grey: prior distributions for these parameters.
[Figure 8: upper panel legend "CRU: Anomalies from 1961-1990", "Adjusted: Anomalies from 1850-2009", "90% Credible interval"; lower panel legend "Difference", "90% Credible interval", "Negative Grand Mean"; vertical axis "Mean in °C", horizontal axis "Years AD".]
Fig. 8. Upper panel: The mean time series of $X^o$, the original CRU temperature anomalies using a 1961–1990 reference interval (black) and the mean time series of $Y^o$, the adjusted anomalies using an 1850–2009 reference interval (red). Results for the 1850–2009 reference interval are formed by first calculating the anomalies $Y^o$ via Eq. (8), and then the average across these anomalies, for each posterior draw of $\gamma$ and $d$. Both the medians and the corresponding 90% point-wise credible intervals (light red shading; not readily discernible from the red medians) of the resulting distribution are plotted. Lower panel: Median (black) and 90% credible interval (shading) for the difference in the mean time series of each draw of $Y^o$ and the mean time series of the original $X^o$.
[Figure 9: upper panel legend "CRU: Anomalies from 1961-1990", "Adjusted: Anomalies from 1850-2009", "90% Credible interval"; lower panel legend "Difference", "90% Credible interval"; vertical axis "Standard dev. in °C", horizontal axis "Years AD".]
Fig. 9. As in Figure 8, but for the standard deviation time series of $X^o$ and $Y^o$.
[Figure 10: panels (a), (b), and (c); legend "Anomalies from 1961-1990", "Anomalies from 1850-2009", "90% Credible interval", and in panel (c) "Difference", "90% Credible interval"; vertical axes in °C, horizontal axis "Year".]
Fig. 10. (a) Time series of spatially averaged global (ex-Antarctica) land surface temperature anomalies based on the annual mean CRU temperature anomalies from the original 1961–1990 reference interval (black) and from the longer 1850–2009 interval (red). 90% point-wise credible intervals are shown for the spatial average calculated using the anomalies from the longer 1850–2009 interval (light red shading), and the corresponding uncertainty intervals when using the shorter 1961–1990 interval are similar. In both cases, a reduced form of BARCAST (Tingley and Huybers 2010) was used to infer the missing temperature values in space and time, and the temporal mean has been removed from each draw of the spatial mean time series prior to calculating percentiles. (b) As in (a), but each draw of the spatial mean time series from BARCAST is smoothed using a nine-point Hanning window prior to calculating percentiles. (c) The median difference between smoothed draws of the global mean temperature series using the two reference intervals (black), and the 90% point-wise uncertainty (grey).