Generated using version 3.0 of the official AMS LaTeX template

A Bayesian ANOVA scheme for calculating climate anomalies, with applications to the instrumental temperature record

Martin P. Tingley*
National Center for Atmospheric Research, Boulder, Colorado

*Corresponding author address: Martin P. Tingley, National Center for Atmospheric Research, 1850 Table Mesa Drive, Boulder, CO 80305 and Department of Earth and Planetary Sciences, Harvard University, 20 Oxford Street, Cambridge, MA 02138.
E-mail: tingley@fas.harvard.edu

ABSTRACT

Climate data sets with both spatial and temporal components are often studied after removing from each time series a temporal mean calculated over a common reference interval, which is generally shorter than the overall length of the data set. The use of a short reference interval affects the temporal properties of the variability across the records, by reducing the standard deviation within the reference interval and inflating it elsewhere. For an annually averaged version of the Climate Research Unit's (CRU) temperature anomaly product, the mean standard deviation is 0.67°C within the 1961–1990 reference interval, and 0.81°C elsewhere. The calculation of anomalies can be interpreted in terms of a two-factor Analysis of Variance model. Within a Bayesian inference framework, any missing values are viewed as additional parameters, and the reference interval is specified as the full length of the data set. This Bayesian scheme is used to re-express the CRU data set as anomalies with respect to means calculated over the entire 1850–2009 interval spanned by the data set. The mean standard deviation is increased to 0.69°C within the original 1961–1990 reference interval, and reduced to 0.76°C elsewhere. The choice of reference interval thus has a predictable and demonstrable effect on the second spatial moment time series of the CRU data set.
The spatial mean time series is in this case largely unaffected: the amplitude of spatial mean temperature change is reduced by 0.1°C when using the 1850–2009 reference interval, while the 90% uncertainty interval of (-0.03, 0.23) indicates that the reduction is not statistically significant.

1. Introduction

For the purposes of studying climate, space–time data sets are often analyzed after the mean over some specific time interval has been removed from each time series. For example, the gridded temperature anomaly compilation produced by the Climate Research Unit (CRU) is composed of monthly time series of anomalies from a 1961–1990 reference interval (Brohan et al. 2006), while the IPCC Fourth Assessment Report plots numerous millennial-scale climate reconstructions as anomalies from that same interval (Fig. 6.10, Jansen et al. 2007).

There are both technical and scientific reasons for analyzing temperature anomalies, rather than the actual values. Jones et al. (1999) argue that the use of anomalies avoids a number of problems that can arise when combining daily station data into monthly grid-box averages. The impacts of differences in station elevations, the timings of daily observations, and the methods used to calculate monthly means are minimized by considering anomalies, and the resulting data set is more homogeneous than the corresponding compilation of actual values (Jones et al. 1999). In general, climate fields display complex spatial structures, such as a strong dependence on latitude, sharp gradients across land-sea boundaries, and elevation effects, all of which can be seen in the NCEP/NCAR reanalysis (Kalnay et al. 1996) annual mean temperature field for 1981 (Fig. 1a). Many of these spatial structures are relatively stable as a function of time, and are thus well estimated from the long-term temporal mean (Fig. 1b). There is generally less structure in the field of anomalies with respect to an estimated long-term mean (Fig. 1c), and the anomalies are generally representative of larger spatial areas (e.g., Hansen and Lebedeff 1987).
A sensible analysis of the anomaly field likely requires fewer covariates, as the anomaly field is more likely to be spatially stationary (e.g., Banerjee et al. 2004). As scientific interest often lies in understanding changes in climate fields such as surface temperatures, rather than the details of the field itself, the analysis of anomalies allows for a simpler statistical model, which facilitates the identification of trends and patterns of change.

Given the underlying assumption that the climate field is changing in time, it is important to remove from each time series a mean calculated over a common reference interval, and this reference interval is often chosen as a sub-interval which minimizes the number of missing values (e.g., Brohan et al. 2006). Using a reference interval that is shorter than the full length of the data set leads to increased uncertainty in the estimation of the means (smaller sample size), and often does not entirely eliminate the missing data problem. In addition, using a short reference interval results in the variance across the estimated anomaly time series (spatial variance) being reduced within the reference interval, and inflated elsewhere (see Section 2). In the extreme case, the spatial variance within a one time-step reference interval is zero.

Any analysis of climate anomalies that depends on second- or higher-moment properties may therefore be affected by the choice of reference interval used to calculate the anomalies. For example, the frequency of climate events which are defined as threshold exceedances, such as heat waves or other extremes, changes if either the mean or the standard deviation changes (see IPCC 2001, Figure 4-1). As the standard deviation is lower within a short reference interval, any threshold is more likely to be exceeded outside of the reference interval used to calculate the anomalies, and this effect becomes more pronounced the more extreme the threshold.

The scientific interpretation of several figures from the IPCC Fourth Assessment Report (Trenberth et al. 2007) is influenced by the reference interval used to calculate climate anomalies.
For example, Figure 3-5 plots zonally averaged temperature anomalies with respect to a 1961–1990 reference interval as a function of latitude and time, while Figure 3-15 depicts precipitation anomalies in the same manner. The choice of a short reference interval affects the temporal evolution of the spatial standard deviation, and this effect is clearly visible in the precipitation plot, which features markedly reduced variability within the 1961–1990 reference interval.

As a final example, paleoclimatic field reconstructions are generally calibrated against instrumental anomalies with respect to a reference interval that is a subset of the calibration interval. Mann et al. (2009) present a reconstruction, calibrated over the 1850–1995 interval, of surface temperature anomalies relative to a 1961–1990 reference interval. The spatial patterns of temperature anomalies for the Medieval Climate Anomaly and the Little Ice Age, which are shown in Fig. 2 of Mann et al. (2009), are influenced in part by the choice of reference interval, which affects the spatial variability as a function of time. In addition, the 1850–1995 calibration interval used in Mann et al. (2009) includes both the 1961–1990 reference interval used to calculate the instrumental anomalies and times outside of this interval. The reconstruction is thus calibrated against a data set with non-stationary statistical properties, and this temporal structure may introduce artifacts into the estimated relationship between the two data sets. Note that these issues are not unique to Mann et al. (2009); indeed, all twelve large-scale temperature reconstructions depicted in Figure 6-10 from Jansen et al. (2007) are calibrated against instrumental temperature anomalies with respect to a 1961–1990 reference interval.

An alternative approach to the calculation of anomalies is to maximize the length of the reference interval, thereby mitigating the effects on the second moment properties of the data set, while accounting for the increased uncertainty that results from some observations being missing.
We propose a two-factor Analysis of Variance (ANOVA) model for the calculation of climate anomalies, with the factors being location and year. The design is not balanced, as there are either one or zero observations at each combination of factor levels, and this complexity motivates a Bayesian inference scheme. The missing values in the data set are then treated as additional parameters, the reference interval is specified as the full length of the data set, and the uncertainty in the estimated anomalies accounts for the fact that the data set is incomplete.

Section 2 illustrates the effects of a short reference interval on the time series of spatial standard deviations using a simple example data set, Section 3 presents the ANOVA model for calculating anomalies and details a Bayesian approach to fitting this model in the presence of missing data, Section 4 uses the ANOVA model to re-express the annual mean CRU temperature product as anomalies from the 1850–2009 interval and investigates the effects of the change in reference interval on the time series of means and standard deviations, and Section 5 provides discussion and concluding remarks.

2. Biases produced by a short reference interval

This section demonstrates that calculating anomalies from a short reference interval introduces spurious structures in the time series of the standard deviations across the series. In the climate context, using a short reference interval introduces temporal structure in estimates of the spatial standard deviation.

Consider a length $N$ univariate first-order autoregressive [AR(1)] time series $X$, with AR(1) parameter $|\alpha| < 1$ and independent and identically distributed (IID) normal innovations with mean zero and variance $\sigma^2$. $X$ is multivariate normal, $X \sim N(0, \Omega)$, with

$$\Omega_{ij} = \frac{\sigma^2}{1-\alpha^2}\,\alpha^{|i-j|}, \tag{1}$$

and the diagonal elements of $\Omega$ are all equal, implying that the variance of $X$ is constant as a function of time (Fig. 2). Now consider a length $N^*$ reference interval that runs from time points $s_1$ to $s_2$, where $1 \le s_1 \le s_2 \le N$, so that $N^* = s_2 - s_1 + 1$.
The vector of anomalies from this reference interval, $Y$, can be written as a linear transformation of $X$,

$$Y = (I + A)\,X, \tag{2}$$

where $I$ is the $N$ by $N$ identity matrix and $A$ is an $N$ by $N$ matrix composed of columns of zeros outside of the reference interval, and columns of $-1/N^*$ inside the reference interval. The distribution of $Y$ is then

$$Y \sim N(0, \Omega^*), \qquad \Omega^* = (I + A)\,\Omega\,(I + A)^T. \tag{3}$$

While no structure has been added to the mean vector, the diagonal elements of $\Omega^*$, which represent the variance at each time point, now vary as a function of time (Fig. 3). The diagonal entries of $\Omega^*$ can be expressed as

$$
\begin{aligned}
\Omega^*_{ii} &= \mathrm{Var}\Big[X_i - \frac{1}{N^*}\sum_{k=s_1}^{s_2} X_k\Big] \\
&= \Big(1 + \frac{1}{N^*}\Big)\mathrm{Var}[X_i] + \frac{2}{(N^*)^2}\sum_{k=s_1}^{s_2}\sum_{p=k+1}^{s_2}\mathrm{Cov}[X_k, X_p] - \frac{2}{N^*}\sum_{k=s_1}^{s_2}\mathrm{Cov}[X_i, X_k] \\
&= \frac{\sigma^2}{1-\alpha^2}\Big(1 + \frac{1}{N^*}\Big) + \frac{2\sigma^2}{(1-\alpha^2)(N^*)^2}\sum_{k=s_1}^{s_2}\sum_{p=k+1}^{s_2}\alpha^{|k-p|} - \frac{2\sigma^2}{(1-\alpha^2)\,N^*}\sum_{k=s_1}^{s_2}\alpha^{|i-k|}.
\end{aligned}
\tag{4}
$$

In the second line, the first term is the sum of the variances, the second is the sum of the covariances within the reference interval, and the third term is the sum of the covariances between the $i$th time point and each time point within the reference interval. For points located far from the reference interval (i.e., $i \ll s_1$ or $i \gg s_2$), the sum of covariances with the $i$th time point essentially drops out, all remaining terms are positive, and $\Omega^*_{ii} > \Omega_{ii}$. For points within the reference interval, the sum of covariances with the $i$th time point, with prefactor $2/N^*$, dominates the sum of covariances within the interval, with prefactor $2/(N^*)^2$, so that $\Omega^*_{ii} < \Omega_{ii}$ (Fig. 3). As the two sums are partial geometric series, a closed-form expression can be derived for the diagonal elements $\Omega^*_{ii}$. The resulting expression, however, is no more informative than Eq. (4).

To explore the discrepancy between $\Omega^*_{ii}$ and $\Omega_{ii}$ as a function of the AR(1) coefficient and the relative length of the reference interval, define the scaled standard deviation range as

$$\Delta(\alpha, N, N^*) = \frac{\max_i\big(\sqrt{\Omega^*_{ii}}\big) - \min_i\big(\sqrt{\Omega^*_{ii}}\big)}{\sqrt{\Omega_{11}}}. \tag{5}$$

Scaling by $\sqrt{\Omega_{11}}$ eliminates the dependence on $\sigma^2$, so that $\Delta(\alpha, N, N^*)$ is the range of standard deviation values induced in the anomalies as a proportion of the standard deviation of the original, stationary time series. The value of $\Delta(\alpha, N, N^*)$ for fixed $N$ is a decreasing function of $N^*$ and an increasing function of $\alpha$ (Fig. 4). As a specific example, $\Delta(0.34, 252, 30) = 0.068$ (see Figs. 2 & 3, and Section 4c), implying that the anomalies with respect to a 30 time-step reference interval feature a structure in the time series of standard deviations with an amplitude that is 6.8% of the original, stationary standard deviation.

3. An ANOVA model for calculating climate anomalies

Let $X$ represent a matrix of observations of a climate variable, with the $M$ rows corresponding to locations and the $N$ columns to equally spaced time points. For example, in Section 4, $X$ will correspond to annual mean temperature anomalies at a number of spatial locations. In general, $X$ will feature missing values, as different locations have instrumental observations that cover different time periods. We express the elements of $X$ via a two-way ANOVA decomposition (e.g., Scheffé 1999; Zar 1999), where the factors are location and year:

$$X_{ij} = \gamma + d_i + \mu_j + \epsilon_{ij}. \tag{6}$$

Following standard ANOVA terminology, $\gamma$ represents the grand mean, while the elements of $d$ and $\mu$ correspond, respectively, to the $M$ location effects and $N$ year effects. Intuitively, each element of $d$ corresponds to the temporal mean (relative to $\gamma$) at a particular location, while each element of $\mu$ corresponds to the mean (relative to $\gamma$) across available observations at a particular time point. To ensure identifiability of the parameters, the vectors of location and year effects are subject to sum-to-zero constraints,

$$\sum_{i=1}^{M} d_i = 0 \quad \text{and} \quad \sum_{j=1}^{N} \mu_j = 0. \tag{7}$$

As a result, the number of free parameters in each of $d$ and $\mu$ is one less than the length of that vector. The simplest choice for modeling the error terms $\epsilon_{ij}$ is to assume that they are IID normal, $\epsilon_{ij} \sim N(0, \sigma^2)$, and this choice is made below.
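Returning briefly to Section 2, Eqs. (1)–(5) can be checked numerically by forming $\Omega$, $A$ and $\Omega^*$ explicitly instead of summing the partial geometric series. The sketch below is an illustration in Python/NumPy, not code from the original analysis; the function names are hypothetical, and indices are 0-based, so the reference interval covers indices `s1` to `s2` inclusive. For the symmetric example (111 steps on either side of a 30-step interval, $\alpha = 0.34$) it should print a value close to the 0.068 quoted above.

```python
import numpy as np


def ar1_covariance(n, alpha, sigma2=1.0):
    """Eq. (1): Omega_ij = sigma2 / (1 - alpha^2) * alpha^|i - j|."""
    idx = np.arange(n)
    lags = np.abs(idx[:, None] - idx[None, :])
    return sigma2 / (1.0 - alpha**2) * alpha**lags


def anomaly_covariance(omega, s1, s2):
    """Eqs. (2)-(3): Omega* = (I + A) Omega (I + A)^T.

    The reference interval covers 0-based indices s1..s2 inclusive; every row
    of A holds -1/N* in the reference-interval columns and zeros elsewhere."""
    n = omega.shape[0]
    n_star = s2 - s1 + 1
    A = np.zeros((n, n))
    A[:, s1:s2 + 1] = -1.0 / n_star
    T = np.eye(n) + A
    return T @ omega @ T.T


def scaled_sd_range(alpha, n, s1, s2):
    """Eq. (5): (max_i sqrt(Omega*_ii) - min_i sqrt(Omega*_ii)) / sqrt(Omega_11)."""
    omega = ar1_covariance(n, alpha)  # sigma2 cancels in the ratio
    sd_star = np.sqrt(np.diag(anomaly_covariance(omega, s1, s2)))
    return (sd_star.max() - sd_star.min()) / np.sqrt(omega[0, 0])


# Symmetric example: 111 steps, a 30-step reference interval, 111 steps (N = 252).
delta = scaled_sd_range(alpha=0.34, n=252, s1=111, s2=140)
print(round(delta, 3))
```

Building the full $N \times N$ matrices is wasteful for long series, but it mirrors Eqs. (2)–(3) directly and makes the reduced variance inside the interval, and the inflated variance far from it, easy to inspect.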
While the assumption of IID errors is likely not correct, it simplifies calculations and is sufficient to demonstrate the effects of a short reference interval; alternatives are discussed in Section 5.

If the main goal of the analysis is to arrive at a better estimate of the anomalies, then interest lies primarily in inference on the location effects, $d$. Estimates of the anomalies, $Y_{ij}$, follow from removing the grand mean and location effect from each observation,

$$Y_{ij} = X_{ij} - \gamma - d_i = \mu_j + \epsilon_{ij}. \tag{8}$$

Standard techniques for fitting ANOVA models (e.g., Scheffé 1999; Zar 1999) generally assume a balanced design, meaning that there are the same number of observations at each combination of factor levels. In particular, packaged ANOVA solutions are not designed to account for factor combinations for which there are no observations, as is the case if there are missing observations in the matrix $X$.

a. Bayesian ANOVA with missing data

Fitting the proposed ANOVA model [Eq. (6)] involves parameter estimation in the presence of missing data, a situation amenable to Bayesian analysis. Within the Bayesian framework, the missing values are treated as additional parameters that must be estimated, while the posterior distributions of $d$, $\mu$ and $\gamma$ include the uncertainty introduced by the missing data. Let $X^m$ and $X^o$ represent the elements of $X$ that are missing and observed, respectively, where we require that no row or column be entirely missing. Similarly, let the vectors $X^o_{\cdot j}$ and $X^m_{\cdot j}$ represent the observed and missing elements for the $j$th year. We seek posterior inference on $X^m$, $d$, $\mu$, $\gamma$, and $\sigma^2$, conditional on $X^o$. Application of Bayes' rule yields

$$
\begin{aligned}
P\big(X^m, d, \mu, \gamma, \sigma^2 \mid X^o\big) &\propto P\big(X^o \mid X^m, d, \mu, \gamma, \sigma^2\big) \cdot P\big(X^m, d, \mu, \gamma, \sigma^2\big) && (9) \\
&= P\big(X^o \mid X^m, d, \mu, \gamma, \sigma^2\big) \cdot P\big(X^m \mid d, \mu, \gamma, \sigma^2\big) \cdot P\big(d, \mu, \gamma, \sigma^2\big) && (10) \\
&= \prod_{j=1}^{N} P\big(X^o_{\cdot j} \mid X^m_{\cdot j}, d, \mu_j, \sigma^2, \gamma\big) \cdot \prod_{j=1}^{N} P\big(X^m_{\cdot j} \mid d, \mu_j, \sigma^2, \gamma\big) \cdot P\big(d, \mu, \sigma^2, \gamma\big), && (11)
\end{aligned}
$$

where the second line follows from the identity $P(A, B) = P(A \mid B)\,P(B)$, and the third from the assumption that the error vectors for each time point are independent. The first term on the right-hand side of Eq. (11) is the likelihood of the data ($X^o$) given the unknowns.
Under the assumption that the error elements $\epsilon_{ij}$ are IID normal, the off-diagonal elements of the covariance matrix of $X_{\cdot j}$ are zero. The likelihood thus does not depend on the missing values, and can be re-expressed as a double product of univariate normals:

$$\prod_{j=1}^{N} P\big(X^o_{\cdot j} \mid X^m_{\cdot j}, d, \mu_j, \sigma^2, \gamma\big) = \prod_{i,j=1}^{M,N} P\big(X^o_{ij} \mid d_i, \mu_j, \sigma^2, \gamma\big), \tag{12}$$

where the product on the right runs over the observed elements only. The second product of multivariate normals in Eq. (11) can likewise be expressed as a double product of univariate normals, this time over the missing elements:

$$\prod_{j=1}^{N} P\big(X^m_{\cdot j} \mid d, \mu_j, \sigma^2, \gamma\big) = \prod_{i,j=1}^{M,N} P\big(X^m_{ij} \mid d_i, \mu_j, \sigma^2, \gamma\big). \tag{13}$$

The second term on the right-hand side of Eq. (9) gives the joint prior for the unknowns, which is re-expressed in Eq. (11) as the conditional distribution of $X^m$ given the model parameters multiplied by the joint prior for $d$, $\mu$, $\gamma$, and $\sigma^2$. We specify independent priors for these parameters,

$$P\big(d, \mu, \sigma^2, \gamma\big) = P(d) \cdot P(\mu) \cdot P(\sigma^2) \cdot P(\gamma). \tag{14}$$

Details of the prior specifications, which must enforce the sum-to-zero constraints [Eq. (7)], and the sampling strategy are provided in Appendix A. The end result of the analysis is an ensemble of posterior draws of $X^m$, $d$, $\mu$, $\gamma$, and $\sigma^2$, conditional on the data, priors, and modeling assumptions. The posterior ensemble can be used to specify both point estimates and uncertainties for the anomalies, and for any other function of the unknowns.¹

An alternative inference strategy could make use of a variant of the Expectation-Maximization algorithm of Dempster et al. (1977). While such a frequentist approach can produce point estimates of the missing values $X^m_{\cdot j}$ and the vectors $d$ and $\mu$, as well as estimates of the associated uncertainty for these quantities, Bayesian inference is useful in what follows for two reasons. First, draws from the posterior allow for uncertainty estimation in quantities such as the time series of the change in standard deviations after expressing the data as anomalies from the longer interval (Fig. 9). Second, obvious extensions to models with spatially correlated location effects or temporally correlated year effects will be more tractable within a Bayesian framework (see Section 5).

¹A Matlab code package and relevant data files are available at www.people.fas.harvard.edu/~tingley.

4. CRU annual mean temperatures: anomalies from 1961–1990 and from 1850–2009

a. Data and basic results

We apply the Bayesian ANOVA model to an annually averaged version of the CRUTEM3 data set (Brohan et al. 2006) of land surface temperatures; results are qualitatively unchanged when using the variance-adjusted CRUTEM3v. The CRUTEM3 data set provides monthly mean anomalies with respect to a 1961–1990 reference, and we calculate annual anomalies by averaging all available monthly observations for years and locations for which there are at least 9 monthly observations. The spatial distribution of data availability indicates that the longer instrumental records are predominantly located in Europe and North America (Fig. 5). Forming the matrix $X$ from time series at each of the 839 locations for which there is at least one annual mean observation, 45% of the values are missing, with 79% of the missing values occurring in the first half (1850–1929) of the 1850–2009 interval spanned by the data set. The year 1850 is 111 time steps from the beginning of the 30-year reference interval, which motivates the symmetric example in Figs. 2 & 3, where each of the 459 (about 55% of 839) time series consists of 111 observations on either side of a 30 time-step reference interval.

Results are based on 5000 samples from the posterior distributions of $\gamma$, $d$, $\mu$, and $\sigma^2$, after discarding 600 samples to allow the chain to reach convergence (e.g., Gelman et al. 2003). Details of the hyper-parameters used in the prior distributions for the unknown parameters can be found in Appendix A.

The elements of the location effects vector, $d$, are the temporal means of the time series relative to the grand mean $\gamma$ (Fig. 6). Point estimates and 90% credible intervals are formed from the median, and the 5th and 95th percentiles of the posterior draws, respectively. Nine of the 839 location effects are greater (in magnitude) than 0.5°C, while 108 are greater than 0.25°C. The mean width of the 90% credible intervals is 0.25°C, and 318 of the 90% point-wise credible intervals do not cover zero.
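To make the data-augmentation idea of Section 3a concrete, the sketch below alternates the imputation step of Eq. (A1) with draws of $\gamma$, $d$, $\mu$ and $\sigma^2$ from the complete-data model of Eq. (6). It is a minimal illustration in Python/NumPy, not the author's Matlab package, and it assumes simple priors (flat on $\gamma$, flat on $d$ and $\mu$ restricted by the sum-to-zero constraints of Eq. (7), and $p(\sigma^2) \propto 1/\sigma^2$) rather than the conjugate priors and hyper-parameters of Appendix A. The function names are hypothetical. Posterior summaries follow the convention above: medians as point estimates, 5th and 95th percentiles as 90% credible intervals.

```python
import numpy as np


def gibbs_anova(X, n_iter=2000, burn=600, seed=0):
    """Gibbs sampler for X_ij = gamma + d_i + mu_j + eps_ij, eps_ij ~ N(0, sigma2).

    X is an (M, N) array with NaN marking missing values; no row or column may be
    entirely missing. Assumed priors: flat on gamma, flat on d and mu restricted
    to sum(d) = sum(mu) = 0, and p(sigma2) proportional to 1/sigma2."""
    rng = np.random.default_rng(seed)
    M, N = X.shape
    miss = np.isnan(X)
    Xc = np.where(miss, np.nanmean(X), X)  # complete-data copy, crude initial fill
    gamma, sigma2 = np.nanmean(X), np.nanvar(X)
    d, mu = np.zeros(M), np.zeros(N)
    keep = {"gamma": [], "d": [], "mu": [], "sigma2": []}

    for it in range(n_iter):
        # Eq. (A1): impute each missing value from N(gamma + d_i + mu_j, sigma2).
        fit = gamma + d[:, None] + mu[None, :]
        Xc[miss] = fit[miss] + np.sqrt(sigma2) * rng.standard_normal(miss.sum())

        # Grand mean: d and mu sum to zero, so mean(Xc - d - mu) equals mean(Xc).
        gamma = rng.normal(Xc.mean(), np.sqrt(sigma2 / (M * N)))

        # Location effects: independent normal draws, then projection onto
        # sum(d) = 0, which gives the conditional distribution under the constraint.
        z = rng.normal((Xc - gamma - mu[None, :]).mean(axis=1), np.sqrt(sigma2 / N))
        d = z - z.mean()

        # Year effects: same construction, averaging over locations.
        z = rng.normal((Xc - gamma - d[:, None]).mean(axis=0), np.sqrt(sigma2 / M))
        mu = z - z.mean()

        # Error variance: inverse-gamma(MN/2, SSE/2), drawn as SSE / chi-square(MN).
        sse = np.sum((Xc - gamma - d[:, None] - mu[None, :]) ** 2)
        sigma2 = sse / rng.chisquare(M * N)

        if it >= burn:
            keep["gamma"].append(gamma)
            keep["d"].append(d)
            keep["mu"].append(mu)
            keep["sigma2"].append(sigma2)

    return {k: np.array(v) for k, v in keep.items()}


def summarize(draws):
    """Point-wise median and 90% credible interval (5th and 95th percentiles)."""
    lo, hi = np.percentile(draws, [5, 95], axis=0)
    return np.median(draws, axis=0), lo, hi


def anomalies(X, gamma, d):
    """Eq. (8) for a single posterior draw: Y_ij = X_ij - gamma - d_i (NaN stays NaN)."""
    return X - gamma - d[:, None]
```

Because the projection `z - z.mean()` is applied after every draw, each stored draw of $d$ and $\mu$ satisfies Eq. (7) exactly (up to rounding), so functions of the draws such as Eq. (8) need no further adjustment. A location effect whose interval `(lo, hi)` excludes zero is one of the intervals that "do not cover zero" in the sense used above.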
The considerable spatial structure in the location effects (Fig. 6) is somewhat surprising, given that the original CRU data set is already expressed as anomalies from the 1961–1990 mean, and that no spatial structure is assumed a priori for the vector $d$ (see Section 5). The correlation between the medians of the location effects (Fig. 6) and the number of observations at those locations (Fig. 5) is -0.06, indicating that, on a global scale, there is essentially no correlation between data availability and location effect.

The posterior distribution of the year effects (Fig. 6) is not important in the context of calculating anomalies. Note that the estimated year effects should not be interpreted as an estimate of the temporal evolution of the spatial mean of the temperature field; the year effects $\mu$ are simply the time series that is most common to the data set, without regard to the spatial distribution of the observations or the pattern of missing data. Estimates of the spatial mean, which make use of simple assumptions about the space-time covariance of the temperature field and account for the temporally changing pattern of data availability, are discussed below (Section 4c).

The posterior histograms of the scalar parameters are in all cases sharply peaked relative to the priors, indicating that the posterior is dominated by the information from the data (Fig. 7). Posterior estimates of $\gamma$ are negative, as the original CRU reference interval of 1961–1990 is warm relative to the longer 1850–2009 interval. The parameters $\sigma^2_\mu$ and $\sigma^2_d$ are related to the variances of the year and location effects, respectively, under priors that enforce the sum-to-zero constraints of Eq. (7); details can be found in Appendix A.

b. Time series of simple means and standard deviations

For each posterior draw of $\gamma$ and $d$, we calculate the matrix of anomalies $Y$ via Eq. (8). Although the Bayesian algorithm imputes the missing values $X^m$ (and thus $Y^m$ can be calculated), for the sake of investigating the effects of changing the reference interval we first compare means and standard deviations calculated using $X^o$ (original CRU data; anomalies from 1961–1990) and $Y^o$ (adjusted data set; anomalies from 1850–2009).
The time series formed by taking, at each year $j$, the mean or standard deviation of $X^o_{\cdot j}$ or $Y^o_{\cdot j}$ will be referred to simply as the mean or standard deviation time series for that data set.

Section 2 demonstrated that, for independent AR(1) time series with no missing values, the choice of reference interval does not add temporal structure to the mean time series. Were the CRU data complete, the difference between the mean time series of $X^o$ and $Y^o$ would be constant as a function of time: because the location effects sum to zero, the mean time series of $Y^o$ would equal that of $X^o$ shifted by the negative of the grand mean, $-\gamma$ (Fig. 8). However, if the spatial distribution of data availability is correlated with the location effects, then the mean time series of $Y^o$ and $X^o$ could be very different. For example, if long records have generally positive location effects, then estimates of the mean time series during the early part of the record would be colder, in relation to the later part, after removing the location effects and grand mean from each series. For the CRU data set, the correlation between the number of observations and the location effects is -0.06, which explains why there is little temporal structure in the difference between the mean time series of $X^o$ and $Y^o$ (Fig. 8).

Extending the reference interval from 1961–1990 to 1850–2009 increases the standard deviation within the original 1961–1990 reference interval, and decreases the standard deviation elsewhere (Fig. 9). Such a result is to be expected, given the results from Section 2, which indicate that the standard deviation is reduced within a short reference interval and inflated elsewhere. Within (outside of) the original 1961–1990 reference interval, the mean of the standard deviation time series of $X^o$ is 0.67°C (0.81°C), while that of $Y^o$ is 0.69°C (0.76°C). Re-expressing the data set as anomalies with respect to the full 1850–2009 interval thus increases the mean standard deviation within the original 1961–1990 reference interval by about 0.02°C and decreases the mean standard deviation elsewhere by about 0.05°C.
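The mean and standard deviation time series, the averages inside and outside the 1961–1990 interval, and the estimate of $\Delta$ described in the next paragraph reduce to NaN-aware column statistics. The sketch below assumes locations in rows, years in columns and NaN for missing values, and uses the sample (`ddof=1`) standard deviation; the text does not state which divisor was used, and the function names are hypothetical. The final lines are only an arithmetic illustration: they treat the standard deviation series as piecewise constant at the rounded values quoted above, rather than using the actual CRU series.

```python
import numpy as np


def mean_sd_series(Z):
    """Mean and standard deviation across locations (rows) for each year (column),
    using only the observed (non-NaN) values in that column."""
    return np.nanmean(Z, axis=0), np.nanstd(Z, axis=0, ddof=1)


def inside_outside_means(series, years, start=1961, end=1990):
    """Average of a yearly series inside and outside the reference interval."""
    inside = (years >= start) & (years <= end)
    return series[inside].mean(), series[~inside].mean()


def delta_hat(sd_x, sd_y, years, start=1961, end=1990):
    """Estimate of Eq. (5): mean sd of X^o outside the interval minus mean sd
    inside it, divided by the overall mean sd of Y^o."""
    inside_x, outside_x = inside_outside_means(sd_x, years, start, end)
    return (outside_x - inside_x) / sd_y.mean()


# Illustration only: piecewise-constant sd series at the rounded values in the text.
years = np.arange(1850, 2010)
ref = (years >= 1961) & (years <= 1990)
sd_x = np.where(ref, 0.67, 0.81)
sd_y = np.where(ref, 0.69, 0.76)
print(round(delta_hat(sd_x, sd_y, years), 3))
```

With these rounded inputs the ratio is $0.14 / 0.747 \approx 0.187$, in line with the estimate of 0.19 obtained from the full standard deviation time series.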
6 The ◦ totalrange(differencebetweenhighestandlowestvalue)ofthestandarddeviationtimeseoriesforX is0.76◦C,whilethatforYo is0.66◦C.Changingthereferenceintervalthusreduces th et ot al ra n g e of st a n d ar d d e vi at io n v al u e s b y a b o ut 1 3 % . A n e s t i m a t e o f t h e s c a l e d s t a n d a r d d e v i a t i o n r a n g e , ∆ ( s e e S e c t i o n 2 a n d E q . ( 5 ) ) , f o r the CR U dat a set req uire s esti mat es of the ma xim um and min imu m of the sta nda rd deov ia ti o n ti m e s er ie s of X, a n d th e c o m m o n st a n d ar d d e vi at io n of Y o. A s e a c h ti m e s er ie s ofstandarddeviationsisnoisy,weestimate oof max(Ω *ii )and the standard deviation time series of X min(Ω *ii )asthemeanvalue outside and inside of the 1961–1990 refer interval,respectively, and vΩ11 Y o asthemeanvalueofthestandarddeviationtimeseriesof , giving an estimate of ∆ = 0.19 (Fig. 4). Results are unchanged when forming thes estimates as the square root of the mean value of the corresponding variance time se FortheCRUdata,theanomalieswithrespecttothe1961–1990referenceintervalfeaturea secondmomentstructurewithanamplitudethatis19%ofthebase-linestandarddeviation oftheanomalieswithrespecttoareferenceintervalthatspanstheentirelengthofthedata set. 15 c. Timeseriesofspatialmeans Theposteriordistributionofµ(Fig.5)givesanestimateofthedistributionoftheyear effects,butshouldnotbeinterpretedasanestimateofthetemporalevolutionofthespatial meanofthetemperaturefield. Itis,rather,anestimateoftheannualeffectsthataremost ocommon to the particular time series under study. Likewise, the mean time series of X andYo (Fig.8)donottakeintoaccountthespatialdistributionoftheobservations,orthe spatialandtemporalcovariancestructureofthetemperaturefield. Weestimatethespatialmeantimeseriesoftheglobal(ex-Anarctica)landsurfacetemoperatureanomaliesusingfirsttheoriginalX –theanomaliesfrom1961–1990–andthenthe –theanomaliesfrom1850–2009. 
To account for the pattern of data availability and the spatial and temporal covariance of the surface temperature anomaly process, we adopt a hierarchical approach (e.g., Gelman et al. 2003) to infer in each case the spatially and temporally complete field. At the process level, the temperature anomaly field is modeled as first-order autoregressive in time, with a spatially constant AR(1) parameter and innovations with a covariance that decays exponentially as a function of spatial separation. The resulting space-time covariance form is separable in space and time, and isotropic in space (e.g., Banerjee et al. 2004). At the data level, the observations are modeled as the true field plus IID normal observational errors. These process and data level specifications result in a model which is a special case (no proxy observations) of the BARCAST algorithm described in Tingley and Huybers (2010), which is used to infer both the model parameters and the spatially complete temperature anomalies.

To increase the speed of computations, temperatures are inferred at only those 5° by 5° grid boxes that contain a non-zero fraction of land according to a 0.5° by 0.5° land mask (Rodell et al. 2004). In effect, this decision eliminates a number of the CRU grid boxes that are primarily oceanic but contain small, remote islands. These grid boxes have a negligible effect on estimates of the spatial mean over land, as they contain very small areas of land and are generally isolated from other land masses.

In order to isolate the impact of changing the reference interval on the spatial mean time series, the analysis is conducted in two stages. First, the 403 annual mean CRU series that are complete from 1950–2000 are used to estimate all parameters of the BARCAST model, for both the anomalies from 1961–1990 ($X^o$) and the point-wise median of the anomalies from 1850–2009 ($Y^o$). In each case, we find the posterior sample of the vector of scalar parameters (see Table 1 of Tingley and Huybers 2010) that is closest, according to the Mahalanobis distance, to the median of the ensemble of draws of these parameters.
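Selecting the single posterior draw closest to the ensemble median in the Mahalanobis sense can be sketched as below. The sketch assumes that "median" means the component-wise median of the draws and that the metric uses the sample covariance of the same draws; the text does not spell out either choice, and the function name is hypothetical.

```python
import numpy as np


def closest_draw_to_median(draws):
    """Index of the posterior draw (row) with the smallest Mahalanobis distance
    to the component-wise median of all draws.

    draws: (n_draws, n_params) array of scalar-parameter draws."""
    center = np.median(draws, axis=0)
    cov = np.cov(draws, rowvar=False)  # sample covariance of the parameters
    diff = draws - center
    # Squared distances diff_k^T cov^{-1} diff_k, computed with a linear solve
    # rather than an explicit matrix inverse.
    dist2 = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    return int(np.argmin(dist2))
```

Picking an actual draw, rather than plugging in the component-wise median itself, guarantees that the fixed parameter vector is one the sampler visited, so it respects any dependence between the scalar parameters.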
The posterior medians of the AR(1) and variance parameters for the anomalies from the longer 1850–2009 interval are, respectively, 0.34 and 0.47°C², and these values are used in the examples in Section 2.

BARCAST is then applied to each anomaly data set ($X^o$ and $Y^o$), fixing all scalar parameters save the long-term mean of BARCAST, to infer temperature anomalies at all nodes of the 5° by 5° grid that contain some fraction of land. The long-term mean parameter is allowed to vary in these second applications because, for both the original and adjusted CRU data, the mean calculated over 1950–2000 is different from that over 1850–2009. The two-stage application of BARCAST allows for much faster computation, as in the first application there is only one pattern of missing data, and in the second, only one parameter estimate is updated (see Tingley and Huybers 2010, for details). In effect, this is an empirical Bayes solution, as the uncertainty in the parameter estimates is not taken into account.

Results for each application of BARCAST are based on 2000 draws from the posterior distribution of the temperature process, after discarding 600 samples to allow the chain to reach convergence (e.g., Gelman et al. 2003). The spatial mean time series is calculated for each posterior draw by weighting each grid box by the area of land it contains. In order to compare the structure and amplitude of globally (ex-Antarctica) averaged land surface temperature changes inferred using $X^o$ and the point-wise median of $Y^o$, we remove from each draw of the spatial mean time series the corresponding draw of the BARCAST mean parameter. A point estimate of each spatial mean time series is then formed by taking the median of the posterior draws for each year, while 90% point-wise credible intervals are formed from the 5th and 95th percentiles (Fig. 10a). We also smooth each of the spatial average time series (after removing the corresponding draw of the BARCAST mean parameter) with a nine-point Hanning window, and calculate the median and 90% point-wise credible intervals (Fig. 10b).
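The post-processing of the BARCAST draws described above (land-area weighting, point-wise medians and 5th/95th percentiles, nine-point Hanning smoothing) and the amplitude comparison that follows can be sketched in NumPy. Array shapes, function names, the renormalization of the smoother near the ends of the record, and the window convention (all nine weights non-zero, taken as `np.hanning(11)` without its two zero endpoints, since `np.hanning(9)` itself has zero endpoints) are assumptions, not details given in the text. The 90% interval for the amplitude change is one plausible construction: the same maximum-minus-minimum contrast evaluated draw by draw at the years where the median peaks and troughs.

```python
import numpy as np


def land_weighted_mean(fields, land_area):
    """Area-weighted spatial mean for each draw and year.

    fields: (n_draws, n_years, n_boxes) posterior draws of the gridded field.
    land_area: (n_boxes,) land area contained in each grid box."""
    w = land_area / land_area.sum()
    return fields @ w  # shape (n_draws, n_years)


def hanning_smooth(series, width=9):
    """Smooth along the last axis with a normalized width-point Hanning window.

    Weights are renormalized near the ends, so a constant series is unchanged."""
    w = np.hanning(width + 2)[1:-1]  # drop the two zero-weight endpoints
    num = np.apply_along_axis(np.convolve, -1, series, w, mode="same")
    den = np.convolve(np.ones(series.shape[-1]), w, mode="same")
    return num / den


def pointwise_summary(draws):
    """Median and 90% point-wise credible interval for each year."""
    lo, hi = np.percentile(draws, [5, 95], axis=0)
    return np.median(draws, axis=0), lo, hi


def amplitude_change(diff_draws, years):
    """Range of the point-wise median of smoothed differences, a 90% interval for
    the same max-minus-min contrast computed per draw, and the two years used."""
    med = np.median(diff_draws, axis=0)
    i_max, i_min = int(np.argmax(med)), int(np.argmin(med))
    contrast = diff_draws[:, i_max] - diff_draws[:, i_min]
    lo, hi = np.percentile(contrast, [5, 95])
    return med[i_max] - med[i_min], (lo, hi), (years[i_max], years[i_min])


# Hypothetical usage, with mean_x and mean_y the matching draws of the BARCAST
# mean parameter (shape (n_draws,)) for the two analyses:
# smooth_x = hanning_smooth(land_weighted_mean(fields_x, area) - mean_x[:, None])
# smooth_y = hanning_smooth(land_weighted_mean(fields_y, area) - mean_y[:, None])
# size, interval, (year_max, year_min) = amplitude_change(smooth_y - smooth_x, years)
```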
To explore the influence of the reference interval on the rate of long-term temperature change, we take the difference between draws of the temporally smoothed spatial mean time series based on the respective analyses of $X^o$ and $Y^o$, and calculate the median and 90% point-wise credible intervals for the time series of differences (Fig. 10c). The smoothed spatial mean time series calculated using $X^o$ (anomalies from 1961–1990) is cooler in the earlier part of the record, and warmer in the later part, relative to the spatial mean time series calculated using the point-wise median of $Y^o$ (anomalies from 1850–2009).

The range of the time series of differences in the temporally smoothed spatial means (Fig. 10c) indicates the change in the amplitude of spatial mean temperatures that can be attributed to the choice of reference interval. The range in the point-wise posterior median is 0.1°C, with the maximum at 1866 and the minimum at 2005; the associated 90% credible interval is (-0.03, 0.23). In other words, changing the reference interval from 1961–1990 to 1850–2009 reduces the amplitude of the (smoothed) spatial mean time series by about 0.1°C, but as the associated 90% uncertainty interval covers zero, this result is not significant at the 90% level. Indeed, the very weak correlation between the location effects and the number of observations at each location suggested that the change in reference interval would not have a large effect on the spatial mean time series.

5. Discussion and Conclusions

There are both technical and scientific reasons for analyzing climate data sets after removing from each time series a mean value calculated over a common reference interval. As interest in the temporal evolution of climate variables extends beyond changes in mean values, it is crucial to ensure that the method used to calculate anomalies does not add spurious structures to either the first-moment or higher-moment properties of the data set. The Bayesian two-factor ANOVA approach to calculating climate anomalies proposed here makes use of all available data, and calculates the location effects over a reference interval that is as long as possible, which avoids the introduction of non-climatic second moment structures into the anomalies. Bayesian inference treats the missing values as additional model parameters, and uncertainty estimates for the anomalies include the uncertainty that arises from the missing data.

Several generalizations to the basic analysis model (Section 3) are possible. The assumption that the only structure in the location effects is that introduced by the sum-to-zero constraint [Eq. (7)] is plausible for the analysis presented here, as the original CRU data are already expressed as anomalies from a common interval. However, there is clear spatial coherence to the estimated location effects (Fig. 6a), which could be accounted for in future work by modeling the location effects as a spatial process with a standard spatial covariance form (e.g., Banerjee et al. 2004), modified to account for the sum-to-zero constraint. When calculating anomalies from actual values (rather than adjusting the reference interval, as is done here), the model for the location effects should also take into account expected spatial structures by including latitude, elevation, and perhaps other variables as covariates in the expression for the mean of the location effects. The treatment of the year effects could likewise be generalized to include temporal trends in the mean structure and a covariance matrix that accounts for temporal autocorrelation. Finally, the assumption that the error terms are IID normal could be modified to account for any observed spatial or temporal patterns in the residuals, though care must be taken to ensure identifiability when adding structure to the errors as well as to the location and year effects.
Using the basic Bayesian ANOVA scheme introduced here to re-express an annually averaged version of the CRU's gridded temperature product as anomalies with respect to means calculated over the entire 1850–2009 interval demonstrates the influence that the choice of reference interval can have on the statistical properties of the anomaly data set. Relative to the original anomalies with respect to 1961–1990, the anomalies with respect to the longer interval display larger spatial variance within the original 1961–1990 reference interval, and smaller spatial variance elsewhere. Any analysis of the original CRU data that depends on second-moment properties, such as estimates of spatial patterns of variability or the spatial distribution of extreme values, will thus be affected by second-moment features that are directly attributable to the choice of reference interval. Measured by an estimate of the scaled standard deviation range ($\Delta$ from Eq. (5)), the anomalies with respect to the original 1961–1990 interval feature a second-moment structure with an amplitude of about 19% of the mean standard deviation of the anomalies with respect to the longer 1850–2009 interval. Calculations for AR(1) time series, with the AR(1) parameter estimated from the CRU anomalies with respect to the longer 1850–2009 interval, predicted a qualitatively similar result, but a smaller scaled standard deviation range of about 6.8%. The larger value found in practice could result from the spatial covariance of the CRU data, which was not accounted for in the ANOVA model or in the experiments with AR(1) data in Section 2.

For the CRU data, the location effects from the ANOVA analysis are essentially uncorrelated with the number of observations at each location, and as a result, the choice of reference interval has little effect on either the time series of simple means or estimates of the time series of spatial means.
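The AR(1) comparison rests on the variance of an anomaly, $\mathrm{Var}(x_t - \bar{x}_R) = \mathrm{Var}(x_t) - 2\,\mathrm{Cov}(x_t, \bar{x}_R) + \mathrm{Var}(\bar{x}_R)$, where $\bar{x}_R$ is the mean over the reference interval and a stationary AR(1) process with coefficient $a$ has autocorrelation $a^{|t-s|}$. The sketch computes the resulting population standard deviation at every time step. Eq. (5) is not reproduced here, so `scaled_sd_range` is only an assumed analogue of $\Delta$ (the range of the short-reference profile divided by the mean of the full-length-reference profile); the function names and the marginal-variance parameterization are hypothetical.

```python
import math


def anomaly_sd_profile(a, sigma2, n, ref_start, ref_len):
    """Population standard deviation at each time step of a stationary AR(1)
    series (autocorrelation a**lag, marginal variance sigma2) after removing
    the mean over indices ref_start .. ref_start + ref_len - 1."""
    ref = range(ref_start, ref_start + ref_len)
    # Var of the reference mean, in units of sigma2: (1/N*^2) sum_{s,u} a^|s-u|
    var_mean = sum(a ** abs(s - u) for s in ref for u in ref) / ref_len ** 2
    sds = []
    for t in range(n):
        # Cov(x_t, reference mean), in units of sigma2
        cov_t_mean = sum(a ** abs(t - s) for s in ref) / ref_len
        sds.append(math.sqrt(sigma2 * (1.0 - 2.0 * cov_t_mean + var_mean)))
    return sds


def scaled_sd_range(a, sigma2, n, ref_start, ref_len):
    """Assumed analogue of Delta: range of the short-reference profile divided
    by the mean of the profile obtained with a full-length reference interval."""
    short = anomaly_sd_profile(a, sigma2, n, ref_start, ref_len)
    full = anomaly_sd_profile(a, sigma2, n, 0, n)
    return (max(short) - min(short)) / (sum(full) / n)
```

For example, `anomaly_sd_profile(0.34, 1.0, 160, 111, 30)` corresponds to 160 annual values with a 1961–1990 reference interval when index 0 is 1850; the profile dips inside the reference interval and rises outside it, the qualitative pattern discussed above.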
In terms of the spatial means, re-expressing the CRU data as anomalies with respect to the longer 1850–2009 reference interval reduces the amplitude of temperature change over the past 160 years by about 0.1°C, but as the 90% uncertainty interval, (-0.03, 0.23), covers zero, this result is not statistically significant. It is important to emphasize that for other climate data sets, where the location effects may be more strongly correlated with data availability, the choice of reference interval used to calculate the anomalies will influence both the first and second moment structures of the resulting anomalies.

Acknowledgments. The content and presentation of the article benefited from discussions with T. Greasby, P. Huybers, D. Nychka, J. Rougier, S. Sain, B. Shaby, and from the comments of two anonymous referees.

APPENDIX

Prior specification and posterior sampling

A Gibbs sampler (e.g., Gelman et al. 2003) is used to sample from the posterior distribution of the missing values, year and location effects, and scalar parameters of the ANOVA model. We specify conjugate priors for all scalar parameters; given the structure of Eq. (10), it is not necessary to specify priors for the values of the missing observations, $X^m$. We first specify the functional forms of the priors and full conditional posteriors, and then discuss the hyper-parameters used in the analysis of the CRU data. The notation $A \mid \cdot$ will denote the distribution of the variable $A$ conditional on all other variables.

a. The missing values, $X^m_{i,j}$

No prior is necessary for the missing values; the full conditional posterior for each is normal:

$$X^m_{ij} \mid \cdot \sim \mathrm{N}(\gamma + d_i + \mu_j,\ s^2). \qquad \text{(A1)}$$

b. Grand mean, $\gamma$
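The normal distribution is the conjugate prior:

$$\gamma \sim \mathrm{N}(\mu_\gamma,\ s_\gamma^2). \qquad \text{(A2)}$$

The conditional posterior is likewise normal,

$$\gamma \mid \cdot \sim \mathrm{N}(\Psi_\gamma V_\gamma,\ \Psi_\gamma), \qquad \text{(A3)}$$

where

$$V_\gamma = \frac{1}{s^2}\sum_{\text{all}} X_{ij} + \frac{\mu_\gamma}{s_\gamma^2} \qquad \text{(A4)}$$

and

$$\Psi_\gamma^{-1} = \frac{N \cdot M}{s^2} + \frac{1}{s_\gamma^2}. \qquad \text{(A5)}$$

c. Error variance, $s^2$

The inverse-gamma distribution is the conjugate prior:

$$s^2 \sim \text{Inverse-Gamma}(\lambda, \nu), \quad \text{so that} \quad P(s^2) \propto (s^2)^{-(\lambda+1)} \exp(-\nu/s^2). \qquad \text{(A6)}$$

Note that the inverse-gamma prior can be interpreted as $2\lambda$ prior observations with an average squared deviation of $\nu/\lambda$ (e.g., Gelman et al. 2003). The conditional posterior is likewise inverse-gamma,

$$s^2 \mid \cdot \sim \text{Inverse-Gamma}\left(\frac{N \cdot M}{2} + \lambda,\ \frac{1}{2}\sum_{\text{all}} (X_{ij} - \gamma - d_i - \mu_j)^2 + \nu\right). \qquad \text{(A7)}$$

d. The location and year effects, $d$ and $\mu$

The priors for $d$ and $\mu$ must take into account the sum-to-zero constraints of Eq. (7). We follow Kaufman and Sain (2010), setting $d \sim \mathrm{N}(0, S_d)$ and $\mu \sim \mathrm{N}(0, S_\mu)$. Using $I_M$ to represent the $M$ by $M$ identity, and $J_M$ to represent the $M$ by $M$ matrix of ones, we specify the prior covariances as

$$S_d = s_d^2\left(I_M - \frac{1}{M}J_M\right), \qquad S_\mu = s_\mu^2\left(I_N - \frac{1}{N}J_N\right). \qquad \text{(A8)}$$

In other words,

$$(S_d)_{ij} = \begin{cases} s_d^2\left(1 - \frac{1}{M}\right), & i = j,\\[2pt] -\frac{s_d^2}{M}, & i \neq j, \end{cases} \qquad \text{(A9)}$$

and similarly for $S_\mu$. The sum of the elements of $d$ is then distributed as $\mathrm{N}(0, 1^T S_d 1) = \mathrm{N}(0, 0)$, which ensures that the sum-to-zero condition is enforced. Posterior sampling is complicated by the singularity of the prior covariance matrices $S_d$ and $S_\mu$. Our strategy for sampling the vectors of location and year effects, $d$ and $\mu$, which we detail for $d$, is based on the presentation in Kaufman and Sain (2010).

The sum-to-zero constraint implies that there are only $M-1$ free parameters in the length $M$ vector $d$. We therefore seek to express $d$ as a linear transform of an $M-1$ dimensional vector of IID normal variables. Letting $d^* \mid s_d^2 \sim \mathrm{N}(0_{M-1},\ s_d^2 \cdot I_{M-1})$, we must find an $M$ by $M-1$ matrix $Q_M$ such that $d = Q_M d^*$ has the required covariance form [Eq. (A9)]. Following Kaufman and Sain (2010), define the $M$ by $M-1$ matrix $Q_M^*$ as columns of Helmert contrasts, each of which compares the effect of one level of a factor to the mean of the preceding levels (e.g., Ruberg 1989). As an example, for $M = 4$,

$$Q_4^* = \begin{pmatrix} 1 & 1 & 1\\ -1 & 1 & 1\\ 0 & -2 & 1\\ 0 & 0 & -3 \end{pmatrix}. \qquad \text{(A10)}$$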
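The pattern in Eq. (A10) extends to any $M$: column $k$ has ones in rows $1, \dots, k$ and $-k$ in row $k+1$. The short sketch below (function names are illustrative, not from the paper) builds $Q_M^*$ with this sign convention and forms $Q_M^{*T}Q_M^*$, which makes two facts used in the derivation that follows easy to check: every column sums to zero, and the columns are mutually orthogonal with squared norms $k(k+1)$.

```python
def helmert_contrasts(m):
    """M x (M-1) Helmert contrasts with the sign convention of Eq. (A10):
    column k (1-based) has ones in rows 1..k and -k in row k+1."""
    q = [[0] * (m - 1) for _ in range(m)]
    for k in range(1, m):
        for i in range(k):
            q[i][k - 1] = 1
        q[k][k - 1] = -k
    return q


def gram(q):
    # Q^T Q for a nested-list matrix q with M rows and M-1 columns.
    cols = len(q[0])
    return [[sum(row[a] * row[b] for row in q) for b in range(cols)] for a in range(cols)]
```

For $M = 4$, `gram(helmert_contrasts(4))` is the diagonal matrix with entries 2, 6, and 12.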
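Now solve for an $M-1$ by $M-1$ diagonal matrix $R_M$, where $Q_M = Q_M^* R_M$, such that $d = Q_M d^*$ has the correct covariance form. That is, solve for $R_M$ that satisfies

$$Q_M Q_M^T = Q_M^* R_M R_M^T Q_M^{*T} = \frac{S_d}{s_d^2} = I_M - \frac{1}{M}J_M. \qquad \text{(A11)}$$

Solving for $R_M$ yields

$$\begin{aligned} R_M &= \left[(Q_M^{*T}Q_M^*)^{-1} Q_M^{*T}\left(I_M - \tfrac{1}{M}J_M\right) Q_M^* (Q_M^{*T}Q_M^*)^{-1}\right]^{1/2}\\ &= (Q_M^{*T}Q_M^*)^{-1/2}. \end{aligned} \qquad \text{(A12)}$$

The first line follows from left-multiplying Eq. (A11) by $(Q_M^{*T}Q_M^*)^{-1} Q_M^{*T}$, right-multiplying by $Q_M^* (Q_M^{*T}Q_M^*)^{-1}$, and then taking the matrix square root. The second line follows from the fact that each column of $Q_M^*$ sums to zero, so that $Q_M^{*T} J_M = 0$. Because the columns of Helmert contrasts are mutually orthogonal, $Q_M^{*T}Q_M^*$ is diagonal, so $R_M$ is indeed diagonal. Plugging this form for $R_M$ into the definition of $Q_M$ gives

$$Q_M = Q_M^* \left(Q_M^{*T}Q_M^*\right)^{-1/2}. \qquad \text{(A13)}$$

Kaufman and Sain (2010) indicate that each column of a matrix of Helmert contrasts needs to be scaled by some factor, and demonstrate this for the $M = 3$ case, but do not provide the general formulas derived here.

To produce samples from the conditional posterior of $d$, we first draw from the conditional posterior of $d^*$, and then transform using the expression for $Q_M$. The prior for $d^*$, given $s_d^2$, is normal,

$$d^* \mid s_d^2 \sim \mathrm{N}\left(0_{M-1},\ s_d^2 I_{M-1}\right). \qquad \text{(A14)}$$

The full conditional posterior for $d^*$ is likewise normal,

$$d^* \mid \cdot \sim \mathrm{N}\left(\Psi_{d^*} V_{d^*},\ \Psi_{d^*} I_{M-1}\right), \qquad \text{(A15)}$$

where

$$V_{d^*} = \frac{1}{s^2}\sum_{j=1}^{N} Q_M^T \left(X_{\cdot j} - \gamma 1_M\right) \qquad \text{(A16)}$$

and

$$\Psi_{d^*} = \left(\frac{N}{s^2} + \frac{1}{s_d^2}\right)^{-1}. \qquad \text{(A17)}$$

The calculation proceeds by substituting $Q_M d^*$ for $d$ in Eq. (11), and factoring the joint distribution of the elements of $X$ as a product of $N$ multivariate normals, one for each time interval.

The treatment of $\mu$ is equivalent to that for $d$. The prior for $\mu^*$, given $s_\mu^2$, is normal:

$$\mu^* \mid s_\mu^2 \sim \mathrm{N}\left(0_{N-1},\ s_\mu^2 I_{N-1}\right). \qquad \text{(A18)}$$

The full conditional posterior is likewise normal,

$$\mu^* \mid \cdot \sim \mathrm{N}\left(\Psi_{\mu^*} V_{\mu^*},\ \Psi_{\mu^*} I_{N-1}\right), \qquad \text{(A19)}$$

where

$$V_{\mu^*} = \frac{1}{s^2}\sum_{i=1}^{M} Q_N^T \left(X_{i \cdot} - \gamma 1_N\right) \qquad \text{(A20)}$$

and

$$\Psi_{\mu^*} = \left(\frac{M}{s^2} + \frac{1}{s_\mu^2}\right)^{-1}. \qquad \text{(A21)}$$

e. Variances of the location and year effects, $s_d^2$ and $s_\mu^2$

The conjugate priors are inverse-gamma:

$$s_d^2 \sim \text{Inverse-Gamma}(\lambda_d, \nu_d), \qquad s_\mu^2 \sim \text{Inverse-Gamma}(\lambda_\mu, \nu_\mu). \qquad \text{(A22)}$$

The full conditional posteriors are likewise inverse-gamma:

$$s_d^2 \mid \cdot \sim \text{Inverse-Gamma}\left(\frac{M-1}{2} + \lambda_d,\ \frac{d^{*T}d^*}{2} + \nu_d\right), \qquad s_\mu^2 \mid \cdot \sim \text{Inverse-Gamma}\left(\frac{N-1}{2} + \lambda_\mu,\ \frac{\mu^{*T}\mu^*}{2} + \nu_\mu\right). \qquad \text{(A23)}$$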
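To make the sequence of full conditionals concrete, here is a minimal pure-Python sketch of one possible Gibbs sampler, assuming a small data matrix with rows as locations, columns as years, and `None` marking missing values. It is not the author's code: the function names, the dictionary keys in `hp`, the initial values, and the update order (missing values, $\gamma$, $s^2$, $d^*$, $\mu^*$, then $s_d^2$ and $s_\mu^2$) are assumptions. Because the Helmert columns are orthogonal with squared norm $k(k+1)$, Eq. (A13) reduces to normalizing each column. The helper `preliminary_hyper` follows the hyper-parameter recipe of section f, reading "the estimated variance" in the prior for $\gamma$ as the sample variance of the observations, which is also an assumption.

```python
import math
import random


def helmert_q(m):
    """Q_M of Eq. (A13): Helmert columns (ones in rows 1..k, -k in row k+1),
    each divided by its norm sqrt(k(k+1)), since Q*^T Q* is diagonal."""
    q = [[0.0] * (m - 1) for _ in range(m)]
    for k in range(1, m):
        c = 1.0 / math.sqrt(k * (k + 1))
        for i in range(k):
            q[i][k - 1] = c
        q[k][k - 1] = -k * c
    return q


def inv_gamma(shape, rate, rng):
    # If G ~ Gamma(shape, scale=1/rate), then 1/G ~ Inverse-Gamma(shape, rate).
    return 1.0 / rng.gammavariate(shape, 1.0 / rate)


def preliminary_hyper(x):
    """Hyper-parameters in the spirit of section f (hypothetical helper)."""
    def mean(vals):
        vals = [v for v in vals if v is not None]
        return sum(vals) / len(vals)

    def var(vals):
        c = mean(vals)
        return sum((v - c) ** 2 for v in vals) / len(vals)

    obs = [v for row in x for v in row if v is not None]
    g = mean(obs)
    d0 = [mean(row) - g for row in x]          # temporal means less grand mean
    mu0 = [mean(col) - g for col in zip(*x)]   # means across locations less grand mean
    resid = [v - g - d0[i] - mu0[j] for i, row in enumerate(x)
             for j, v in enumerate(row) if v is not None]
    return {"mu_gamma": g, "s2_gamma": 16 * var(obs),  # assumed reading of "estimated variance"
            "lam": 0.5, "nu": 0.5 * var(resid),
            "lam_d": 0.25, "nu_d": 0.25 * var(d0),
            "lam_mu": 0.25, "nu_mu": 0.25 * var(mu0)}


def gibbs_anova(x, hp, n_iter, seed=0):
    """x: M x N nested lists (rows = locations, columns = years), None = missing.
    Returns one dict of draws per sweep."""
    rng = random.Random(seed)
    m, n = len(x), len(x[0])
    qm, qn = helmert_q(m), helmert_q(n)
    miss = [(i, j) for i in range(m) for j in range(n) if x[i][j] is None]
    xf = [[0.0 if v is None else v for v in row] for row in x]
    gamma, s2, s2d, s2mu = hp["mu_gamma"], 1.0, 1.0, 1.0
    d, mu = [0.0] * m, [0.0] * n
    out = []
    for _ in range(n_iter):
        for i, j in miss:  # (A1): impute missing values
            xf[i][j] = rng.gauss(gamma + d[i] + mu[j], math.sqrt(s2))
        # (A3)-(A5): sum(d) = sum(mu) = 0, so only the sum of X enters V_gamma
        psi = 1.0 / (n * m / s2 + 1.0 / hp["s2_gamma"])
        v = sum(map(sum, xf)) / s2 + hp["mu_gamma"] / hp["s2_gamma"]
        gamma = rng.gauss(psi * v, math.sqrt(psi))
        # (A7): error variance
        ss = sum((xf[i][j] - gamma - d[i] - mu[j]) ** 2
                 for i in range(m) for j in range(n))
        s2 = inv_gamma(n * m / 2 + hp["lam"], ss / 2 + hp["nu"], rng)
        # (A15)-(A17): location effects via d = Q_M d*
        rows = [sum(xf[i][j] - gamma for j in range(n)) for i in range(m)]
        psi = 1.0 / (n / s2 + 1.0 / s2d)
        dstar = [rng.gauss(psi * sum(qm[i][k] * rows[i] for i in range(m)) / s2,
                           math.sqrt(psi)) for k in range(m - 1)]
        d = [sum(qm[i][k] * dstar[k] for k in range(m - 1)) for i in range(m)]
        # (A19)-(A21): year effects via mu = Q_N mu*
        cols = [sum(xf[i][j] - gamma for i in range(m)) for j in range(n)]
        psi = 1.0 / (m / s2 + 1.0 / s2mu)
        mustar = [rng.gauss(psi * sum(qn[j][k] * cols[j] for j in range(n)) / s2,
                            math.sqrt(psi)) for k in range(n - 1)]
        mu = [sum(qn[j][k] * mustar[k] for k in range(n - 1)) for j in range(n)]
        # (A23): variances of the location and year effects
        s2d = inv_gamma((m - 1) / 2 + hp["lam_d"],
                        sum(e * e for e in dstar) / 2 + hp["nu_d"], rng)
        s2mu = inv_gamma((n - 1) / 2 + hp["lam_mu"],
                         sum(e * e for e in mustar) / 2 + hp["nu_mu"], rng)
        out.append({"gamma": gamma, "s2": s2, "d": d, "mu": mu})
    return out
```

For example, `draws = gibbs_anova(x, preliminary_hyper(x), 2000)` returns one dictionary per sweep; early sweeps should be discarded as burn-in before summarizing. Every draw of `d` and `mu` sums to zero up to rounding because the columns of `helmert_q` do. The nested-list loops keep the sketch dependency-free but scale poorly; a grid of realistic size would call for vectorized linear algebra.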
f. Hyper-parameters for the analysis of the CRU data

Preliminary analysis of the observed values is used to set the parameters of the prior distributions for $\gamma$, $s^2$, $s_d^2$, and $s_\mu^2$. An initial estimate of the grand mean ($\gamma$) is formed as the mean of the $X^o$, while the location effects are estimated as the temporal means of $X^o$ less the estimated grand mean, and likewise for the year effects. An estimate of the error variance can be formed by first estimating each element of $X^o$ as the sum of the grand mean and the corresponding location and year effects, taking the difference between these estimates and the observed values, and then taking the variance of the resulting residuals. The hyper-parameters are then set as follows:

• Grand mean, $\gamma$. Set the prior mean, $\mu_\gamma$, to the mean of all available observations, and the prior variance $s_\gamma^2$ to 16 times the estimated variance.
• Prior for the error variance, $s^2$. Set $\lambda$ to 1/2, and $\nu$ to half the estimated residual variance. These parameters correspond to one prior observation with an average squared deviation given by the estimated residual variance.
• Factor variances, $s_d^2$ and $s_\mu^2$. Set $\lambda_{d,\mu}$ to 1/4 and set $\nu_{d,\mu}$ to one fourth the variance of the estimated effect vectors. In each case, these parameters correspond to half a prior observation with an average squared deviation given by the sample variance.

REFERENCES

Banerjee, S., B. P. Carlin, and A. E. Gelfand, 2004: Hierarchical Modeling and Analysis for Spatial Data. Chapman & Hall/CRC, New York.
Brohan, P., J. J. Kennedy, I. Harris, S. F. B. Tett, and P. D. Jones, 2006: Uncertainty estimates in regional and global observed temperature changes: A new data set from 1850. Journal of Geophysical Research, 111, D12106.
Dempster, A. P., N. M. Laird, and D. B. Rubin, 1977: Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39 (1), 1–38.
Gelman, A., J. B. Carlin, H. S. Stern, and D. B. Rubin, 2003: Bayesian Data Analysis. 2d ed., Chapman & Hall/CRC, Boca Raton.
Hansen, J. and S. Lebedeff, 1987: Global trends of measured surface air temperature. Journal of Geophysical Research, 92 (D11), 13 345–13 372.
IPCC, 2001: Climate Change 2001: Synthesis Report. A Contribution of Working Groups I, II, and III to the Third Assessment Report of the Intergovernmental Panel on Climate Change, R. Watson and the Core Writing Team, Eds., Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA.
Jansen, E., et al., 2007: Palaeoclimate. Climate Change 2007: The Physical Science Basis. Contribution of Working Group I to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change, S. Solomon, D. Qin, M. Manning, Z. Chen, M. Marquis, K. Averyt, M. Tignor, and H. Miller, Eds., Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, chap. 6.
Jones, P., M. New, D. Parker, S. Martin, and I. Rigor, 1999: Surface air temperature and its changes over the past 150 years. Reviews of Geophysics, 37 (2), 173–199.
Kalnay, E., et al., 1996: The NCEP/NCAR 40-year reanalysis project. Bulletin of the American Meteorological Society, 77 (3), 437–471, NCEP Reanalysis Derived data provided by the NOAA/OAR/ESRL PSD, Boulder, Colorado, USA, from their Web site at http://www.esrl.noaa.gov/psd/.
Kaufman, C. and S. Sain, 2010: Bayesian functional ANOVA modeling using Gaussian process prior distributions. Bayesian Analysis, 5 (1), 123–150.
Mann, M., et al., 2009: Global signatures and dynamical origins of the Little Ice Age and Medieval Climate Anomaly. Science, 326, 1256–1260.
Rodell, M., et al., 2004: The global land data assimilation system. Bulletin of the American Meteorological Society, 85 (3), 381–394, data available at http://ldas.gsfc.nasa.gov/gldas/GLDASvegetation.php.
Ruberg, S., 1989: Contrasts for identifying the minimum effective dose. Journal of the American Statistical Association, 84 (407), 816–822.
Scheffé, H., 1999: The Analysis of Variance. Wiley-Interscience.
Tingley, M. and P. Huybers, 2010: A Bayesian algorithm for reconstructing climate anomalies in space and time. Part I: Development and applications to paleoclimate reconstruction problems. Journal of Climate, 23 (10), 2759–2781.
Trenberth, K., et al., 2007: Observations: Surface and atmospheric climate change. Climate Change 2007: The Physical Science Basis. Contribution of Working Group I to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change, S. Solomon, D. Qin, M. Manning, Z. Chen, M. Marquis, K. Averyt, M. Tignor, and H. Miller, Eds., Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, chap. 3.
Zar, J. H., 1999: Biostatistical Analysis. 4th ed., Pearson Education, Singapore.

List of Figures

1. NCEP reanalysis (Kalnay et al. 1996) for the annual mean temperature field. (a) Values for 1981. (b) The 1968–1996 long-term mean. (c) The 1981 anomalies from the long-term mean.
2. Upper panel: 459 independent AR(1) time series ($a = 0.34$, $s^2 = 0.47$), box-plots at every fourth time step, and the sample mean at each time step. Lower panel: the sample standard deviation at each time step (black) and the population standard deviation (grey) calculated from Eq. (1). The number of time series, the relative length of the reference interval, and the values of $a$ and $s^2$ are chosen to correspond to estimates from the CRU data set analyzed in Section 4.
3. As in Fig. 2, but after removing from each time series the sample mean calculated over the 30 time-step shaded interval. In the lower panel, the grey line is the population standard deviation of the original series, and the blue line is the population standard deviation of the anomalies calculated from Eq. (4).
4. The scaled standard deviation range, $\Delta(a, N, N^*)$, as a function of both the AR(1) coefficient and the length of the reference interval, for AR(1) time series of length $N = 252$ with reference interval centered on $N/2$. The white plus sign demarcates $N^* = 30$ and $a = 0.34$, which correspond to the example in Figs. 2 & 3 and the parameter values obtained from the analysis of the CRU data in Section 4. The black contour corresponds to the actual value of $\Delta$ estimated from the CRU data set in Section 4.
5. The number of years in the 1850–2009 interval for which there is an annual mean temperature anomaly observation, as a function of spatial location, calculated from the CRUTEM3 monthly data set (Brohan et al. 2006).
6. (a) Posterior medians of the location effects, $d$. The mean width of the 90% point-wise credible intervals is 0.25°C, and hatching indicates locations where the corresponding 90% credible interval contains zero. (b) Posterior estimates of the year effects, $\mu$. The posterior median is shown in black, and 90% point-wise credible intervals in light grey.
7. Black: posterior histograms of the grand mean ($\gamma$), error variance ($s^2$), and the variances of the year and location effects ($s_\mu^2$ and $s_d^2$). Dashed grey: prior distributions for these parameters.
8. Upper panel: The mean time series of $X^o$, the original CRU temperature anomalies using a 1961–1990 reference interval (black), and the mean time series of $Y^o$, the adjusted anomalies using an 1850–2009 reference interval (red). Results for the 1850–2009 reference interval are formed by first calculating the anomalies $Y^o$ via Eq. (8), and then the average across these anomalies, for each posterior draw of $\gamma$ and $d$. Both the medians and the corresponding 90% point-wise credible intervals (light red shading; not readily discernible from the red medians) of the resulting distribution are plotted. Lower panel: Median (black) and 90% credible interval (shading) for the difference in the mean time series of each draw of $Y^o$ and the mean time series of the original $X^o$.
9. As in Fig. 8, but for the standard deviation time series of $X^o$ and $Y^o$.
10. (a) Time series of spatially averaged global (ex-Antarctica) land surface temperature anomalies based on the annual mean CRU temperature anomalies from the original 1961–1990 reference interval (black) and from the longer 1850–2009 interval (red). 90% point-wise credible intervals are shown for the spatial average calculated using the anomalies from the longer 1850–2009 interval (light red shading), and the corresponding uncertainty intervals when using the shorter 1961–1990 interval are similar. In both cases, a reduced form of BARCAST (Tingley and Huybers 2010) was used to infer the missing temperature values in space and time, and the temporal mean has been removed from each draw of the spatial mean time series prior to calculating percentiles. (b) As in (a), but each draw of the spatial mean time series from BARCAST is smoothed using a nine-point Hanning window prior to calculating percentiles. (c) The median difference between smoothed draws of the global mean temperature series using the two reference intervals (black), and the 90% point-wise uncertainty (grey).

[Figure 1: maps; panels (a) 1981 values, (b) 1968–1996 mean, (c) 1981 anomalies.]
[Figure 2: upper panel, "Time Series" against time step; lower panel, "Standard Dev." against "Time step".]
[Figure 3: same axes as Figure 2.]
[Figure 4: contour plot; y-axis "Value of the AR(1) coefficient, a"; x-axis "Length of the reference interval, N*".]
[Figure 5: map of the number of years with observations (color scale up to 160).]
[Figure 6: (a) map of location effects, color scale from -1 to 1 °C; (b) year effects in °C against "Year", 1850–2000.]
[Figure 7: histograms of $\gamma$, $s^2$, $s_\mu^2$, and $s_d^2$.]
[Figure 8: upper panel legend "CRU: Anomalies from 1961-1990", "Adjusted: Anomalies from 1850-2009", "90% Credible interval", y-axis °C; lower panel legend "Difference", "90% Credible interval", "Negative Grand Mean", y-axis "Mean in °C".]
[Figure 9: legends as in Figure 8; y-axis "Standard dev. in °C"; x-axis "Years AD".]
[Figure 10: panels (a)–(c); legend "Anomalies from 1961-1990", "Anomalies from 1850-2009", "90% Credible interval"; panel (c) legend "Difference", "90% Credible interval"; x-axis "Year"; y-axes in °C.]