Testing for equal variance

Scale family: $Y = sX$
$G(x) = P(sX \le x) = F(x/s)$
To compute the inverse, let $y = G(x) = F(x/s)$, so $x/s = F^{-1}(y)$ and
$x = G^{-1}(y) = s F^{-1}(y)$
$\Delta(x) = G^{-1}(F(x)) - x = s F^{-1}(F(x)) - x = (s-1)x$

[Figure: Shiftplot; horizontal axis x quantiles, vertical axis y quantiles]
Blue slopes (-0.65, -0.20)
CI for scale ratio (0.35, 0.80), since the slope is $s - 1$

Assumptions

Iid
Scale family
Need moderately large samples

Testing equal variance for distributions with equal locations

Ranking m X-values and n Y-values, the average rank is $(m+n+1)/2$.
If F is more spread out than G, and the locations are the same, the X-values would tend to have more large and small residuals from the mean rank.
One way to get at this is to assign rank 1 to the smallest and largest values, 2 to the second smallest and second largest, and continue in towards the middle.

The Ansari-Bradley test

Compute the sum of the X-ranks as
$W = \sum_{i=1}^{p} i \, 1_i^X + \sum_{i=p+1}^{m+n} (m+n+1-i) \, 1_i^X$
where $p = [(m+n+1)/2]$ and $1_i^X$ is the indicator that the $i$th observation in the combined ordered sample is an X.
Small values of W correspond to F being more dispersed.
In practice, align the locations first.

Null distribution

Let $f(w,m,n)$ be the number of arrangements of m ones and n zeros that yield the statistic value $W = w$.
Assume $m + n = 2N$ is even.
If we add one more Y, the middle observation of the combined sample, which gets rank $N+1$, is either a Y or an X.
If it is a Y there are $f(w,m,n)$ ways, while if it is an X there are $f(w-N-1,m-1,n+1)$ ways.
Thus we get the recursion
$f(w,m,n+1) = f(w,m,n) + f(w-N-1,m-1,n+1)$

Null distribution, cont.

Thus $E(W) = m(m+n+2)/4$ when $m+n$ is even.

R: ansari.test(x,y)
On the exponential samples, subtracting the median from each sample:
p = $5 \times 10^{-8}$
CI = (0.40, 0.60)
estimate 0.49

Assumptions

Iid
Known difference between locations

“No rank test (i.e., a test invariant under strictly increasing transformation of the scale) can hope to be a satisfactory test against dispersion alternatives without some sort of strong restrictions (e.g., equal or known medians) being placed on the class of admissible distribution pairs.
” (Moses, 1963)

Another rank test of variability

Siegel-Tukey ranks: 1 4 5 8 9 7 6 3 2
Sum of green ranks: 14
14 - 4 x 5/2 = 4
Compare to Mann-Whitney distribution
P-value 2 x 0.095 = 0.19
For exponential samples P-value is 0.0005

NOAA State of the Climate web site

State of the Climate 2008

Shen et al. (2012)

[Figure: Temperature anomaly (°C) by year, 1900-2000; annotations "1921, 4th warmest" and "2nd warmest to 14th warmest"]
So we don’t really know which is the fourth warmest year
But we have standard errors for each year
Can we use the standard errors to assess the uncertainty in ranks?

Simple approach

Draw independent normal random numbers with the right mean and sd for each year
Rank
Repeat to get an ensemble of paths.
R code:
http://www.statmos.washington.edu/wp/wpcontent/uploads/2012/10/Uncertainty-analysis.txt
[Figure: ensemble of simulated paths; Temperature anomaly (°C) by year, 1900-2000]

Rank distribution

[Figure: rank distribution]

But aren’t years dependent?

[Figure: two time series (horizontal axis Time) with their autocorrelation functions (horizontal axis Lag, vertical axis ACF)]

Autocorrelation = correlation with itself shifted over

[Figure: two further time series (horizontal axis Time) with their autocorrelation functions (horizontal axis Lag, vertical axis ACF)]

Lagged plots

[Figure: scatterplots of series[2:100] against series[1:99] for the series white, ar2, ma2 and arma11]

Autoregression

Idea: Predict the current value from previous values
k’th order autoregression: $X_t = \phi_1 X_{t-1} + \dots + \phi_k X_{t-k} + \varepsilon_t$
R commands
    library(forecast)
    acf(series)
    ar(series)

Moving average

Idea: Current value is obtained by weighted average of current and previous errors
Moving average of order k: $X_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_k \varepsilon_{t-k}$
    auto.arima(series)

ARIMA models

George Box (1919-2013) and Gwilym Jenkins (1932-1982)
We have already seen AR and MA
ARIMA(0,1,0): $X_t = X_{t-1} + \varepsilon_t$, or $\varepsilon_t = X_t - X_{t-1}$ (differencing)
Can be iterated.
ARIMA(p,1,q) has $\nabla X_t = X_t - X_{t-1}$ following an ARMA(p,q) model.

Why worry?

In climate contexts we are often interested in fitting trends. Here is a sequence of slope fits to US monthly average temperature:

    Model             Slope (°C/y)   sd
    OLS               0.0055         0.0012***
    WLS               0.0048         0.0014***
    GLS (AR(4))       0.0053         0.0026*
    GLS (ARMA(3,1))   0.0059         0.0032

The same data, increasingly realistic models. Significance disappears.

Does dependence matter?

[Figure: simulated paths under structure iid; Temperature anomaly (°C) by year, 1900-2000]
[Figure: simulated paths under structure ARMA(3,1); Temperature anomaly (°C) by year, 1900-2000]

[Figure: rank distributions for the years 1993-2008, one set of panels labeled Independent and one labeled Dependent; horizontal axis Rank, vertical axis Density or Frequency. Panel titles pair each year with a value: 0 for most years, 0.6159 and 0.6448 for 1998, 0.273 and 0.2874 for 2006.]

Effect of dependence

[Figure: Rank sd; horizontal axis Independent rank sd, vertical axis Dependent rank sd, both 0-12]

    Year   Independent model   Dependent model   Rank in Shen et al.
    1921   0.99                0.99              4
    1931   0.95                0.94              6
    1934   1.00                1.00              3
    1938   0.12                0.14
    1939   0.18                0.21
    1953   0.78                0.74
    1954   0.12                0.15
    1990   0.82                0.79              7
    1998   1.00                1.00              1
    1999   1.00                1.00              5
    2001   0.71                0.69              9
    2005   0.61                0.58              10
    2006   1.00                1.00              2
    2007   0.61                0.58              8

Back to State of the Climate

“2012 ... was the warmest year in the 1895-2012 period of record for the nation.”
Need to extrapolate standard error
[Figure: Standard error (°C) by year, 1900-2000]
se(2012) ≈ 0.08
anomaly(2012) = 1.7
anomaly(1998) = 1.2
0.5/0.08 ≈ 6 !!!

And the uncertainty in the ranking of 2012 is...

[Figure: rank distribution for 2012; panel title "2012 1.0"; horizontal axis Rank (0-120), vertical axis Density]

NOAA State of the Climate 2014

The probability that 2014 was...
Warmest year on record: 48.0%
One of the five warmest years: 90.4%
One of the 10 warmest years: 99.2%
One of the 20 warmest years: 100.0%
Warmer than the 20th century average: 100.0%
Warmer than the 1981-2010 average: 100.0%

IPCC report

The latest IPCC report claimed that the last three decades were the warmest on record, based on global decadal averages. Using the Hadley Center series, we investigate this claim.
[Figure: rank distributions for the sixteen decades from 1853-1862 to 2003-2012; horizontal axis Rank, vertical axis Density]

Last year warmest on record?

2015 was widely reported as the warmest year on record for annual global average temperature. We use the Hadley temperature series to investigate this claim.
Based on 100,000 simulations, 2015 is the warmest in all but 724, but it could be as low as the 6th warmest. Other candidates for warmest year are 2014, 2010, 2004 and 1997. No year before 1997 was ranked warmest.
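
A minimal R sketch of this kind of rank simulation, under stated assumptions. The anomalies and standard errors below are made-up placeholders, not the Hadley series, and the dependent case uses a simple AR(1) error with an assumed lag-one correlation of 0.5 rather than the ARMA(3,1) structure fitted earlier. The helper simulate_ranks is hypothetical, introduced only for illustration.

```r
# Rank-uncertainty simulation: draw a noisy path, rank the years, repeat.
# All data values here are illustrative placeholders.
set.seed(1)

years   <- 2006:2015
anomaly <- c(0.50, 0.49, 0.39, 0.50, 0.55, 0.42, 0.47, 0.51, 0.57, 0.75)
names(anomaly) <- years
se <- rep(0.05, length(years))

# Returns an n_sim x n matrix of ranks (rank 1 = warmest year in that path).
# rho is the lag-one autocorrelation of the standardized errors;
# rho = 0 gives the independent model.
simulate_ranks <- function(anomaly, se, n_sim = 10000, rho = 0) {
  n <- length(anomaly)
  ranks <- matrix(NA_real_, nrow = n_sim, ncol = n)
  for (i in seq_len(n_sim)) {
    z <- rnorm(n)
    if (rho != 0) {
      # Stationary AR(1) errors with unit marginal variance
      e <- numeric(n)
      e[1] <- z[1]
      for (k in 2:n) e[k] <- rho * e[k - 1] + sqrt(1 - rho^2) * z[k]
      z <- e
    }
    path <- anomaly + se * z
    ranks[i, ] <- rank(-path, ties.method = "first")
  }
  colnames(ranks) <- names(anomaly)
  ranks
}

ind <- simulate_ranks(anomaly, se, rho = 0)
dep <- simulate_ranks(anomaly, se, rho = 0.5)

# Estimated probability that each year is the warmest, under each model
p_warmest_ind <- colMeans(ind == 1)
p_warmest_dep <- colMeans(dep == 1)

# Lowest position (largest rank number) that 2015 takes in any simulated path
worst_2015 <- max(ind[, "2015"])
```

Comparing p_warmest_ind with p_warmest_dep, or the column-wise standard deviations of ind and dep, gives the same kind of independent-versus-dependent comparison as the rank sd plot, here only for the placeholder data.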