Slides 6

Testing for equal
variance
Scale family: $Y = sX$
$G(x) = P(sX \le x) = F(x/s)$
To compute the inverse, let $y = G(x) = F(x/s)$, so $x/s = F^{-1}(y)$ and
$x = G^{-1}(y) = s F^{-1}(y)$
$\Delta(x) = G^{-1}(F(x)) - x = s F^{-1}(F(x)) - x = (s-1)x$
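A quick numerical check of this derivation, as a sketch only: for a scale family the shift function is a straight line through the origin with slope s - 1. The exponential distribution, the value s = 0.5 and the sample sizes are assumptions chosen for illustration.

```r
set.seed(1)
s <- 0.5                                   # assumed scale factor
p <- seq(0.05, 0.95, by = 0.05)

# Exact quantiles: G^{-1}(p) = s * F^{-1}(p), so Delta = (s - 1) * F^{-1}(p)
qF <- qexp(p)
qG <- s * qexp(p)
exact_ok <- isTRUE(all.equal(qG - qF, (s - 1) * qF))

# Empirical version from two simulated samples
x <- rexp(500)                             # sample from F
y <- s * rexp(500)                         # sample from G
qx <- unname(quantile(x, p))
qy <- unname(quantile(y, p))
delta <- qy - qx                           # estimated shift at the x quantiles
slope <- unname(coef(lm(delta ~ 0 + qx)))  # least-squares line through the origin
```

The fitted slope estimates s - 1, so adding 1 to it gives an estimate of the scale ratio.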
[Figure: Shiftplot; axes: x Quantiles, y Quantiles]
Blue slopes (-0.65,-0.20)
CI for scale ratio (0.35,0.8)
Assumptions
Iid
Scale family
Need moderately large samples
Testing equal variance
for distributions with
equal locations
Ranking m X-values and n Y-values together, the average rank is
(n+m+1)/2
If F is more spread out than G, and
the locations are the same, the ranks
of the X-values would tend to lie
further from the mean rank, in both
directions.
One way to get at this is to assign
rank 1 to the smallest and largest
values, 2 to second smallest and
second largest, and continue in
towards the middle.
The Ansari-Bradley test
Compute the sum of the X-ranks
as
$W = \sum_{i=1}^{p} i\, 1_i^X + \sum_{i=p+1}^{m+n} (m+n+1-i)\, 1_i^X$
where $p = [(m+n+1)/2]$ and $1_i^X$ is the
indicator that the ith observation in
the combined ordered sample is
an X.
Small values of W correspond to F
being more dispersed.
In practice, align the locations
first.
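The scoring described above can be sketched in a few lines of R. The helper `ab_statistic` and the toy data are hypothetical; the comparison with `ansari.test` assumes no ties in the combined sample.

```r
ab_statistic <- function(x, y) {
  N <- length(x) + length(y)
  pos <- rank(c(x, y))                 # position in the combined ordered sample
  scores <- pmin(pos, N + 1 - pos)     # 1 for both extremes, growing towards the middle
  sum(scores[seq_along(x)])            # add up the scores of the X-values
}

# Hypothetical toy data with equal locations; X is the more dispersed sample
x <- c(-4, -2, 3, 5)
y <- c(-1, 0, 1, 2)
W <- ab_statistic(x, y)                # X holds the four extreme positions: 1 + 2 + 2 + 1
W_r <- unname(ansari.test(x, y)$statistic)
```

Here W takes its smallest possible value for m = 4 and n = 4, in line with small W indicating that F is more dispersed.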
Null distribution
Let f(w,m,n) be the number of
orderings of m ones (X) and n zeros (Y) that yield
the statistic value W=w.
Assume m+n=2N is even. If we
add one more observation, the middle
position N+1 (with score N+1) is held
by either an X or a Y. With m X's and
n+1 Y's in total: if it is a Y there are
f(w,m,n) ways, while if it is an X, there are
f(w-N-1,m-1,n+1) ways. Thus we
get the recursion
f(w,m,n+1) =
f(w,m,n) + f(w-N-1,m-1,n+1)
Null distribution, cont.
Thus, for m+n even,
E(W) = m(m+n+2)/4
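For small samples the null distribution can be checked by brute force. The sketch below enumerates every ordering, then checks the recursion and the mean for m = n = 3. The helper names `ab_values` and `f` are hypothetical, and enumeration is only feasible for small m + n.

```r
# W for every way of placing m X's among m + n ordered positions
ab_values <- function(m, n) {
  N <- m + n
  scores <- pmin(seq_len(N), N + 1 - seq_len(N))
  if (m == 0) return(0)
  combn(N, m, function(idx) sum(scores[idx]))
}

# f(w, m, n): number of orderings with statistic value w
f <- function(w, m, n) sum(ab_values(m, n) == w)

# Recursion with m = 3, n = 3, so that m + n = 2N with N = 3
N <- 3
recursion_ok <- all(sapply(0:20, function(w)
  f(w, 3, 4) == f(w, 3, 3) + f(w - N - 1, 2, 4)))

# Mean of W for m = n = 3, to compare with m(m+n+2)/4 = 6
mean_W <- mean(ab_values(3, 3))
```

Dividing the counts f(w,m,n) by choose(m+n, m) gives the null probabilities of W.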
R: ansari.test(x,y)
On the exponential samples,
subtracting the median from each
sample, $p = 5 \times 10^{-8}$
CI = (0.40,0.60)
estimate 0.49
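A sketch of how such a result could be obtained. The two samples below are simulated stand-ins (the original exponential samples are not reproduced here), so the numbers will differ from those quoted above.

```r
set.seed(10)
# Hypothetical stand-ins for the exponential samples
x <- rexp(100, rate = 2)               # scale 0.5
y <- rexp(100, rate = 1)               # scale 1

# Align the locations by subtracting each sample's median, then test
fit <- ansari.test(x - median(x), y - median(y), conf.int = TRUE)
fit$p.value                            # two-sided p-value
fit$conf.int                           # confidence interval for the ratio of scales
fit$estimate                           # point estimate of the ratio of scales
```

With `conf.int = TRUE`, `ansari.test` reports an interval and a point estimate for the ratio of scales.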
Assumptions
Iid
Known difference between
locations
“No rank test (i.e., a test invariant
under strictly increasing
transformation of the scale) can
hope to be a satisfactory test
against dispersion alternatives
without some sort of strong
restrictions (e.g., equal or known
medians) being placed on the
class of admissible distribution
pairs.” (Moses, 1963)
Another rank test
of variability
Siegel-Tukey:
[Figure: ordered combined sample of nine observations with Siegel-Tukey ranks 1 4 5 8 9 7 6 3 2]
Sum of green ranks: 14 - 4×5/2 = 4
Compare to Mann-Whitney
distribution
P-value 2 × 0.095 = 0.19
For exponential samples P-value
is 0.0005
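The Siegel-Tukey ranking and the small example can be sketched as follows. The function name is hypothetical. The sample sizes 4 (green) and 5 are inferred from the nine ranks and the term 4×5/2 on the slide, so treat them as an assumption.

```r
# Siegel-Tukey ranks for an ordered sample of size N:
# 1 to the smallest, 2 and 3 to the two largest, 4 and 5 to the next two smallest, ...
siegel_tukey_ranks <- function(N) {
  ranks <- integer(N)
  lo <- 1L
  hi <- as.integer(N)
  r <- 1L
  from_low <- TRUE
  take <- 1L                           # a single value first, pairs afterwards
  while (lo <= hi) {
    for (k in seq_len(take)) {
      if (lo > hi) break
      if (from_low) {
        ranks[lo] <- r
        lo <- lo + 1L
      } else {
        ranks[hi] <- r
        hi <- hi - 1L
      }
      r <- r + 1L
    }
    from_low <- !from_low
    take <- 2L
  }
  ranks
}

st <- siegel_tukey_ranks(9)            # 1 4 5 8 9 7 6 3 2
U <- 14 - 4 * 5 / 2                    # rank sum minus m(m+1)/2 with m = 4
p_two_sided <- 2 * pwilcox(U, 4, 5)    # Mann-Whitney null distribution, about 0.19
```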
NOAA State of the
Climate web site
State of the Climate
2008
Shen et al. (2012)
[Figure: Temperature anomaly (°C) by Year, 1900-2000; annotations: "1921, 4th warmest" and "2nd warmest – 14th warmest"]
So we don’t really know
which is the fourth warmest year
But we have standard errors for
each year
Can we use the standard errors to
assess the uncertainty in ranks?
Simple approach
Draw independent normal random
numbers with the right mean and
sd for each year
Rank
Repeat to get an ensemble of
paths. R code:
http://www.statmos.washington.edu/wp/wpcontent/uploads/2012/10/Uncertainty-analysis.txt
[Figure: Temperature anomaly (°C) by year, 1900-2000]
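A minimal sketch of the simple approach, not the linked script. The anomalies and standard errors are made-up values for ten hypothetical years, chosen only to show the mechanics.

```r
set.seed(20)
# Hypothetical anomalies (°C) and standard errors; not the Shen et al. data
years <- paste0("yr", 1:10)
anom  <- c(0.9, 0.3, 0.4, 0.2, 0.7, 1.0, 0.8, 0.1, 0.3, 0.5)
se    <- rep(0.1, length(years))
nsim  <- 5000

# One row per simulated path, one column per year
sims <- matrix(rnorm(nsim * length(years),
                     mean = rep(anom, each = nsim),
                     sd   = rep(se,   each = nsim)),
               nrow = nsim)

# Rank within each path; rank 1 is the warmest year
ranks <- t(apply(sims, 1, function(path) rank(-path)))
colnames(ranks) <- years

# Estimated probability that each year is the warmest, and the sd of its rank
p_warmest <- colMeans(ranks == 1)
rank_sd   <- apply(ranks, 2, sd)
```

A histogram of one column of `ranks` gives the simulated rank distribution for that year.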
Rank distribution
But aren’t years
dependent?
[Figure: two time series (Time, 0-100), each with its autocorrelation function (ACF against Lag, 0-20)]
Autocorrelation = correlation with itself shifted over
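To make the definition concrete, the sketch below compares the plain correlation between a series and its one-step shift with the lag-1 value from `acf`. The AR(2) coefficients are assumptions. The two estimates are close but not identical, because `acf` uses the overall mean and divides by n.

```r
set.seed(3)
n <- 100
white <- rnorm(n)                                            # no dependence
ar2 <- as.numeric(arima.sim(list(ar = c(0.6, 0.2)), n = n))  # assumed coefficients

# Correlation of the series with itself shifted by one step
lag1_cor <- function(z) cor(z[2:n], z[1:(n - 1)])

# Lag-1 value from acf(); the first element of $acf is lag 0
acf1 <- function(z) acf(z, plot = FALSE)$acf[2]

c(white = lag1_cor(white), ar2 = lag1_cor(ar2))
c(white = acf1(white), ar2 = acf1(ar2))
```

Plotting `z[2:n]` against `z[1:(n - 1)]` gives the corresponding lagged scatterplot.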
[Figure: two further time series (Time, 0-100), each with its ACF against Lag, 0-20]
Lagged plots
[Figure: four scatterplots of a series against itself one step earlier: white[2:100] vs white[1:99], ar2[2:100] vs ar2[1:99], ma2[2:100] vs ma2[1:99], arma11[2:100] vs arma11[1:99]]
Autoregression
Idea: Predict the current value
from previous values
k’th order autoregression: $X_t = \phi_1 X_{t-1} + \dots + \phi_k X_{t-k} + \epsilon_t$
R commands
library(forecast)
acf(series)
ar(series)
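A small sketch of these commands on a simulated series. The AR(2) coefficients and series length are assumptions. Only base R's `stats` functions are used here, so `library(forecast)` is not needed for this part.

```r
set.seed(4)
# Simulated second-order autoregression with assumed coefficients
series <- arima.sim(list(ar = c(0.6, 0.2)), n = 500)

fit <- ar(series)                      # order chosen by AIC, Yule-Walker estimates by default
fit$order                              # selected order k
fit$ar                                 # estimated autoregressive coefficients

# One-step-ahead prediction of the next value from the previous values
next_value <- predict(fit, n.ahead = 1)$pred
```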
Moving average
Idea: Current value is a
weighted average of the current and
previous errors
Moving average of order k: $X_t = \epsilon_t + \theta_1 \epsilon_{t-1} + \dots + \theta_k \epsilon_{t-k}$
auto.arima(series)
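As an illustration, a moving average of order 2 can be built directly from white-noise errors and then fitted. The weights in `theta` are assumptions, and base R's `arima` is used with a fixed order here instead of the automatic selection done by `auto.arima`.

```r
set.seed(5)
eps <- rnorm(502)                      # white-noise errors
theta <- c(0.5, 0.3)                   # assumed weights on the two previous errors

# X_t = eps_t + theta_1 * eps_{t-1} + theta_2 * eps_{t-2}
ma2 <- eps[3:502] + theta[1] * eps[2:501] + theta[2] * eps[1:500]

fit <- arima(ma2, order = c(0, 0, 2))  # fit an MA(2) with a mean term
coef(fit)                              # ma1, ma2 and intercept
```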
ARIMA models
George Box (1919-2013) and Gwilym Jenkins (1932-1982)
We have already seen AR and MA
ARIMA(0,1,0): $X_t = X_{t-1} + \epsilon_t$
or $\epsilon_t = X_t - X_{t-1}$, differencing
Can be iterated.
ARIMA(p,1,q) has the differences $X_t - X_{t-1}$ following an
ARMA(p,q) model.
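Differencing is easy to check numerically. In this sketch a random walk is built by cumulating simulated errors, and `diff` recovers them. The series length is arbitrary.

```r
set.seed(6)
eps <- rnorm(100)
x <- cumsum(eps)                       # random walk: X_t = X_{t-1} + eps_t

# First differences give back the errors (from the second time point on)
d1 <- diff(x)
same <- isTRUE(all.equal(d1, eps[-1]))

# Differencing can be iterated
d2 <- diff(x, differences = 2)
```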
Why worry?
In climate contexts we are often
interested in fitting trends. Here is a
sequence of slope fits to US monthly
average temperature:
Method            Slope        sd
OLS               0.0055°C/y   0.0012***
WLS               0.0048°C/y   0.0014***
GLS (AR(4))       0.0053°C/y   0.0026*
GLS (ARMA(3,1))   0.0059°C/y   0.0032
The same data, increasingly realistic
models. Significance disappears.
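The effect can be sketched on simulated data: fit the same trend by OLS and by a regression with autocorrelated errors, then compare the slope standard errors. The series, its AR(1) error structure and the slope are all assumptions. `arima` with `xreg` stands in here for the GLS fits, which are not reproduced.

```r
set.seed(7)
n <- 300
tt <- seq_len(n)
err <- as.numeric(arima.sim(list(ar = 0.7), n = n))  # assumed AR(1) errors
y <- 0.005 * tt + err                                # assumed true slope

# OLS treats the errors as independent
ols <- lm(y ~ tt)
se_ols <- summary(ols)$coefficients["tt", "Std. Error"]

# Regression with AR(1) errors, fitted by maximum likelihood
dep <- arima(y, order = c(1, 0, 0), xreg = cbind(trend = tt))
se_dep <- unname(sqrt(diag(dep$var.coef))["trend"])

c(ols = se_ols, ar1_errors = se_dep)
```

With positively correlated errors the OLS standard error is typically too small, which is how significance can disappear once dependence is modelled.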
Does dependence
matter?
[Figure: simulated temperature anomaly (°C) by Year, 1900-2000; panels: Structure iid, Structure ARMA(3,1)]
[Figure: rank distributions (histograms over Rank, 0-80) for each year 1993-2008, one set of panels labeled Independent and one labeled Dependent. Panel titles give a year and a value: 0 for all years except 1998 (0.6159, 0.6448), 1999 (0.0004, 0.0002) and 2006 (0.273, 0.2874).]
Effect of dependence
[Figure: scatterplot of Dependent rank sd against Independent rank sd, both axes 0-12]
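A sketch of how the two rank standard deviations could be compared by simulation. Everything here is assumed for illustration: the anomalies, the standard errors, and an AR(1) error structure used as a simpler stand-in for the ARMA(3,1) structure.

```r
set.seed(30)
n <- 30
nsim <- 2000
anom <- seq(-0.5, 0.5, length.out = n)    # hypothetical anomalies
se   <- rep(0.15, n)                      # hypothetical standard errors
phi  <- 0.6                               # assumed AR(1) coefficient

# sd of each year's rank, given an nsim x n matrix of unit-variance noise
rank_sd <- function(noise) {
  sims  <- sweep(sweep(noise, 2, se, "*"), 2, anom, "+")
  ranks <- t(apply(sims, 1, function(path) rank(-path)))
  apply(ranks, 2, sd)
}

iid_noise <- matrix(rnorm(nsim * n), nrow = nsim)
# AR(1) noise rescaled to unit variance so both models use the same se
ar_noise <- t(replicate(nsim,
  as.numeric(arima.sim(list(ar = phi), n = n)) * sqrt(1 - phi^2)))

sd_iid <- rank_sd(iid_noise)
sd_dep <- rank_sd(ar_noise)
```

Plotting `sd_dep` against `sd_iid` gives a scatterplot of the kind shown on this slide.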
Year   Independent model   Dependent model   Rank in Shen et al.
1921   0.99                0.99              4
1931   0.95                0.94              6
1934   1.00                1.00              3
1938   0.12                0.14
1939   0.18                0.21
1953   0.78                0.74
1954   0.12                0.15
1990   0.82                0.79              7
1998   1.00                1.00              1
1999   1.00                1.00              5
2001   0.71                0.69              9
2005   0.61                0.58              10
2006   1.00                1.00              2
2007   0.61                0.58              8
Back to
State of the Climate
“2012 ... was the warmest year in
the 1895-2012 period of record for
the nation.”
Need to extrapolate
standard error
[Figure: Standard error (°C) by Year, 1900-2000]
se(2012) ≈ 0.08
anomaly(2012) = 1.7
anomaly(1998) = 1.2
0.5/0.08 ≈ 6 !!!
And the uncertainty in
the ranking of 2012 is...
[Figure: rank distribution for 2012 (Density against Rank, 0-120); panel title: 2012 1.0]
NOAA State of the
Climate 2014
The probability that 2014 was...
Warmest year on record: 48.0%
One of the five warmest years:
90.4%
One of the 10 warmest years:
99.2%
One of the 20 warmest years:
100.0%
Warmer than the 20th century
average: 100.0%
Warmer than the 1981-2010
average: 100.0%
IPCC report
The latest IPCC report claimed
that the last three decades were
the warmest on record, based on
global decadal averages. Using
the Hadley Centre series, we
investigate this claim.
[Figure: rank distributions (Density against Rank, 0-12) of decadal averages, one panel per decade from 1853-1862 to 2003-2012]
Last year warmest
on record?
2015 was widely reported as the
warmest year on record for annual
global average temperature. We
use the Hadley temperature series
to investigate this claim.
Based on 100,000 simulations,
2015 is the warmest in all but 724,
but it could be as low as the 6th
warmest.
Other candidates for warmest year
are 2014, 2010, 2004 and 1997.
No year before 1997 was ranked
warmest.