SUFFICIENT STATISTICS OF ENSEMBLE
FOR
PROBABILISTIC WEATHER FORECASTING
By
Nah Youn Lee and Roman Krzysztofowicz
University of Virginia
Presented at the
Forecast Applications Branch
NOAA / OAR / ESRL / GSD
Boulder, Colorado
11 May 2010
Acknowledgments:
Work supported by the National Science Foundation
under Grant No. ATM–0641572.
Collaboration of Zoltan Toth, Forecast Applications Branch
NOAA / OAR / ESRL / GSD.
BAYESIAN PROCESSOR OF ENSEMBLE (BPE)
Represents the stochastic dependence between Y and W:
\[ \phi(w \mid y) = \frac{f(y \mid w)\, g(w)}{\kappa(y)}, \qquad y = (y_1, \ldots, y_J) \]
\[ \kappa(y) = \int f(y \mid w)\, g(w)\, dw \]
Calibrates ensemble against prior density function:
\[ E[\phi(w \mid Y)] = \int \phi(w \mid y)\, \kappa(y)\, dy = \int f(y \mid w)\, g(w)\, dy = g(w) \int f(y \mid w)\, dy = g(w) \]
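The calibration identity E[φ(w | Y)] = g(w) can be checked numerically. The sketch below is only an illustration under simplifying assumptions that are not in the slides: a single scalar forecast y instead of the full ensemble, a Gaussian prior N(M, S2), and a linear Gaussian likelihood y | w ~ N(a·w + b, sigma2). All function names and parameter values are hypothetical.

```python
import math
import random

def normal_pdf(v, mean, var):
    """Density of N(mean, var) evaluated at v."""
    return math.exp(-(v - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def posterior_pdf(w, y, a, b, sigma2, M, S2):
    """phi(w | y) for prior N(M, S2) and likelihood y | w ~ N(a*w + b, sigma2)."""
    d = a * a * S2 + sigma2
    mean = (a * S2 * (y - b) + M * sigma2) / d
    var = S2 * sigma2 / d
    return normal_pdf(w, mean, var)

def expected_posterior(w, a, b, sigma2, M, S2, n, seed):
    """Monte Carlo estimate of E[phi(w | Y)], with Y drawn from kappa(y)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        # Drawing w from the prior and then y from the likelihood samples y from kappa.
        w_true = rng.gauss(M, math.sqrt(S2))
        y = rng.gauss(a * w_true + b, math.sqrt(sigma2))
        total += posterior_pdf(w, y, a, b, sigma2, M, S2)
    return total / n

estimate = expected_posterior(0.5, a=0.7, b=0.4, sigma2=0.4, M=0.0, S2=1.0, n=50000, seed=1)
prior_value = normal_pdf(0.5, 0.0, 1.0)
print(estimate, prior_value)
```

The two printed values should agree up to Monte Carlo error, which is the sense in which the posterior is calibrated against the prior.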
SUFFICIENT STATISTICS OF ENSEMBLE
Reduce dimension without loss of information
W – predictand
Y – ensemble
Y = (Y1, …, YJ)
ξ : Y → (X, T)
X – predictor of central tendency of W
T – predictor of uncertainty about W
Objective: Find ξ such that for every (x, t) = ξ(y) and w
φ ( w | x, t ) = φ ( w | y )
⇒ (X, T) and Y are equally informative
BASIC BPE
Posterior Density
\[ \phi(w \mid x, t) = \frac{f(x \mid t, w)\, g(w)}{\kappa(x \mid t)} \]
Information Extraction: Parameter Estimation
Prior g(w) ← climatic sample from 40 years
Likelihood f(x | t, w) ← joint sample from 2 years
Parametric Models: Conjugate families of distributions
Gaussian–Gamma, e.g., W – temperature
Meta–Gaussian–Gamma, e.g., W – precipitation amount
PRIOR AND ENSEMBLE DATA
National Centers for Environmental Prediction (NCEP) at
National Weather Service (NWS)
Location: Savannah, GA
Predictand: 2m temperature at 12 UTC
Forecast time: 00 UTC
Lead times: 12, 60, 108, 156, 204, 252, 300, 348 h
Prior (Climatic) sample: 1959 – 1998 (40 years), re-analysis
Joint sample: 28 Mar. 2007 – 28 Feb. 2009 (~ 2 years)
– Ensemble forecast, low-resolution (20 members)
– High-resolution forecast (HR)
STANDARDIZATION
\[ w'(k) = \frac{w(k) - m_k}{s_k}, \qquad y_j'(k) = \frac{y_j(k) - m_k}{s_k}, \quad j = 1, \ldots, 20 \]
w(k) – realization of predictand
yj(k) – realization of ensemble member j
mk – climatic mean on day k of the year
sk – climatic standard deviation on day k of the year
mk and sk estimated from 40 years × 15 days = 600 points
Objective: Obtain margin-stationary and quasi-ergodic time series of the predictand and ensemble forecasts
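The standardization step can be sketched as follows. The slides state only that mk and sk come from 40 years × 15 days = 600 points; the centered 15-day window, the wrap-around at the year boundary, the 0-based day index, and the function names are assumptions made for this illustration.

```python
import math

def climatic_moments(climate, day, half_window=7):
    """Pool a centered window of days across all years (assumed layout:
    climate maps year -> list of daily values) and return (m_k, s_k, n)."""
    pooled = []
    for values in climate.values():
        n_days = len(values)
        for offset in range(-half_window, half_window + 1):
            # Wrap around the year boundary; this is a simplification.
            pooled.append(values[(day + offset) % n_days])
    m = sum(pooled) / len(pooled)
    s = math.sqrt(sum((v - m) ** 2 for v in pooled) / (len(pooled) - 1))
    return m, s, len(pooled)

def standardize(value, m, s):
    """Apply w'(k) = (w(k) - m_k) / s_k; the same map is used for each member."""
    return (value - m) / s

# Synthetic 40-year climate record, for illustration only.
climate = {1959 + i: [280.0 + (d % 7) + 0.01 * i for d in range(365)] for i in range(40)}
m_k, s_k, n_points = climatic_moments(climate, day=100)
members = [m_k + 2.0 * s_k, m_k, m_k - s_k]
print(n_points, [standardize(y, m_k, s_k) for y in members])
```

The predictand and every ensemble member are standardized with the same prior moments, so the transformation does not alter the relation between forecast and observation.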
PRIOR (CLIMATIC) MEAN AND STD. DEV.
[Figure: two panels for 2m temperature at 12 UTC, Savannah, GA, versus day of year k (0 to 350): climatic mean mk [K] and climatic standard deviation sk [K]. Each panel shows curves labeled "Sample" and "Fourier". Data: 40 years (1959 – 98), NCEP/NCAR Re-analysis.]
STANDARDIZATION OF ENSEMBLE
USING PRIOR MOMENTS
[Figure: two panels versus day of year k for 2008; 20 members, FT = 00 UTC, LT = 108 h. Original: temperature [K]. Standardized: standardized temperature.]
Standardized series is:
(1) Margin-stationary
(2) Quasi-ergodic
(3) Approx. Gaussian
PROPERTIES OF ENSEMBLE
Lead Time 108 h, Warm Season
(1) Approx. Identically Distributed:
[Figure: distribution functions P(Yj ≤ yj) of all 20 members; N = 366 × 20.]
(2) NOT Stochastically Independent:
0.83 < Cor (Yi, Yj) < 0.92
(3) NOT Conditionally Stochastically Independent:
0.69 < Cor (Yi, Yj | W = w) < 0.85
(4) Miscalibrated:
[Figure: distribution functions P(X ≤ x) for the analysis and for all 20 members; N = 366.]
⇒ Ensemble is NOT a random sample
PROPERTIES OF ENSEMBLE
Lead Time 108 h, Cool Season
(1) Approx. Identically Distributed:
[Figure: distribution functions P(Yj ≤ yj) of all 20 members; N = 334 × 20.]
(2) NOT Stochastically Independent:
0.83 < Cor (Yi, Yj) < 0.90
(3) NOT Conditionally Stochastically Independent:
0.53 < Cor (Yi, Yj | W = w) < 0.73
(4) Miscalibrated:
[Figure: distribution functions P(X ≤ x) for the analysis and for all 20 members; N = 334.]
⇒ Ensemble is NOT a random sample
PROPERTIES OF ENSEMBLE
Conditional Correlation: Cor (Yi, Yj | W = w)
[Figure: two panels (Warm Season, Cool Season) showing the Max, Average, and Min conditional correlation (0.0 to 1.0) versus lead time [h] (12 to 348 h).]
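One way to estimate Cor(Yi, Yj | W = w) from a joint sample is the partial correlation: regress each member on W and correlate the residuals. This is a sketch under the assumption of a linear, homoscedastic Gaussian relation between members and predictand; the slides do not state which estimator was used, and the function names are hypothetical.

```python
import math

def pearson(u, v):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    su = math.sqrt(sum((p - mu) ** 2 for p in u))
    sv = math.sqrt(sum((q - mv) ** 2 for q in v))
    return cov / (su * sv)

def residuals(y, w):
    """Residuals of the least-squares line y = a*w + b."""
    n = len(w)
    mw, my = sum(w) / n, sum(y) / n
    a = sum((p - mw) * (q - my) for p, q in zip(w, y)) / sum((p - mw) ** 2 for p in w)
    b = my - a * mw
    return [q - (a * p + b) for p, q in zip(w, y)]

def conditional_correlation(yi, yj, w):
    """Partial correlation of yi and yj given w (assumed linear Gaussian model)."""
    return pearson(residuals(yi, w), residuals(yj, w))
```

A conditional correlation well above zero means the members share errors beyond what the predictand explains, which is why the ensemble cannot be treated as a random sample.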
GAUSSIAN MODEL OF (X|W=w)
Aggregate Predictor:
\[ X = \sum_{i=1}^{I} c_i X_i \]
Predictive Regression:
\[ E(W \mid X = x) = \sum_{i=1}^{I} c_i x_i + c_0 \]
Likelihood Model:
\[ X = aw + b + \Theta, \qquad E(X \mid w) = aw + b, \qquad \mathrm{Var}(X \mid w) = \sigma^2 \]
Θ independent of W, Gaussian, with E(Θ) = 0 and Var(Θ) = σ²
Informativeness Score (IS):
\[ IS = \left( \frac{\sigma^2}{a^2} + 1 \right)^{-1} \]
Objective: Evaluate candidate statistics for X
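The likelihood parameters a, b, σ can be estimated from a joint sample of (w, x) pairs. The sketch below assumes ordinary least squares with an n − 2 divisor for the residual variance; the slides do not specify the estimator for this step, and the function name is hypothetical.

```python
import math

def fit_likelihood(w, x):
    """Least-squares estimates (a, b, sigma) for the model X = a*w + b + Theta."""
    n = len(w)
    mw, mx = sum(w) / n, sum(x) / n
    a = sum((p - mw) * (q - mx) for p, q in zip(w, x)) / sum((p - mw) ** 2 for p in w)
    b = mx - a * mw
    # Residual sum of squares around the fitted line.
    sse = sum((q - (a * p + b)) ** 2 for p, q in zip(w, x))
    sigma = math.sqrt(sse / (n - 2))  # assumed divisor
    return a, b, sigma
```

Each candidate statistic X would be fitted this way, and the resulting a and σ compared across candidates through the informativeness score.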
INFORMATIVENESS SCORE (IS)
Signal-to-noise ratio: |a| / σ
Standardized by prior variance S², then transformed:
\[ IS = \left[ \left( \frac{|a| / \sigma}{1 / S} \right)^{-2} + 1 \right]^{-1} \]
Prior variance S² = 1, so IS simplifies to:
\[ IS = \left( \frac{\sigma^2}{a^2} + 1 \right)^{-1} \]
Square of the Bayesian estimator of Cor(X, W), from two asymmetric samples (prior, joint), and therefore IS ≠ R²
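As a worked check of the formula, the sketch below evaluates IS for the parameter values reported later in the deck for member Y18 at lead time 108 h (a = 0.57, σ = 0.76 in the warm season; a = 0.74, σ = 0.62 in the cool season). The function name and the handling of a = 0 as the limiting value IS = 0 are choices made for this illustration.

```python
def informativeness_score(a, sigma, S=1.0):
    """IS = [((|a| / sigma) / (1 / S))**(-2) + 1]**(-1), with S the prior std. dev."""
    if a == 0:
        return 0.0  # limiting value: no signal, so an uninformative predictor
    snr = abs(a) / sigma
    return 1.0 / ((snr / (1.0 / S)) ** -2 + 1.0)

print(round(informativeness_score(0.57, 0.76), 2))  # warm season, Y18
print(round(informativeness_score(0.74, 0.62), 2))  # cool season, Y18
```

With S = 1 these evaluate to about 0.36 and 0.59, matching the IS values shown on the validation slides.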
VALIDATION OF GAUSSIAN MODEL OF (Yj, W)
Lead Time 108 h, Warm Season, N = 366
[Figure, three panels. Gaussian W: P(W ≤ w) versus w. Gaussian Y18: P(Y18 ≤ y18) versus y18. Linear and Homoscedastic Y18 | W = w: y18 versus w with the regression line.]
E(Y18 | w) = aw + b, Var(Θ) = σ²
a = 0.57, b = 0.50, σ = 0.76, IS = 0.36
VALIDATION OF GAUSSIAN MODEL OF (Yj, W)
Lead Time 108 h, Cool Season, N = 334
[Figure, three panels. Gaussian W: P(W ≤ w) versus w. Gaussian Y18: P(Y18 ≤ y18) versus y18. Linear and Homoscedastic Y18 | W = w: y18 versus w with the regression line.]
E(Y18 | w) = aw + b, Var(Θ) = σ²
a = 0.74, b = 0.45, σ = 0.62, IS = 0.59
PREDICTORS OF CENTRAL TENDENCY
Candidate Predictors (822)
Ensemble Members (20)
Ensemble Statistics (9): Mean, Median, Mode, Midrange, Upper Mean, Lower Mean, Mean of Majority, Mean of Minority, Coefficient of Skewness
Low Resolution Forecast
High Resolution Forecast
Pairs of Statistics (36)
HR + Statistic (9)
HR + Pair of Statistics (36)
Combination of Members (160)
Mean + Combination of Members (140)
HR + Combination of Members (180)
HR + Mean + Combination of Members (220)
Minimum, Maximum, Min + Max, Mean + Min, Mean + Max, Mean + Min + Max
Constructed Predictors (4): Modifications of mean of majority and mean of minority; modifications of upper mean and lower mean
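The candidate count can be tallied as a quick arithmetic check. The grouping below is an assumption about how the slide's numbers add up: the low-resolution and high-resolution forecasts count as one candidate each, and the six min/max entries as six candidates. The 36 pairs follow from choosing 2 of the 9 ensemble statistics.

```python
from itertools import combinations

STATISTICS = [
    "mean", "median", "mode", "midrange", "upper mean", "lower mean",
    "mean of majority", "mean of minority", "coefficient of skewness",
]

# All unordered pairs of the nine ensemble statistics.
pairs = list(combinations(STATISTICS, 2))

counts = {
    "ensemble members": 20,
    "ensemble statistics": len(STATISTICS),
    "low resolution forecast": 1,       # assumed to count as one candidate
    "high resolution forecast": 1,      # assumed to count as one candidate
    "pairs of statistics": len(pairs),
    "HR + statistic": len(STATISTICS),
    "HR + pair of statistics": len(pairs),
    "combination of members": 160,
    "mean + combination of members": 140,
    "HR + combination of members": 180,
    "HR + mean + combination of members": 220,
    "min/max predictors": 6,            # assumed: the six min/max entries
    "constructed predictors": 4,
}
total = sum(counts.values())
print(len(pairs), total)
```

Under this grouping the tally reproduces the 822 candidates quoted in the slide heading.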
INFORMATIVENESS OF ENSEMBLE
Warm Season
[Figure: Informativeness Score (IS), 0.0 to 1.0, versus lead time [h] (12 to 348 h) for the ensemble members, the ensemble mean, and the best combination of members.]
INFORMATIVENESS OF ENSEMBLE + HR
Warm Season
[Figure: Informativeness Score (IS), 0.0 to 1.0, versus lead time [h] (12 to 348 h) for the high resolution forecast (HR), HR + ensemble mean, and HR + members.]
INFORMATIVENESS OF ENSEMBLE
Cool Season
[Figure: Informativeness Score (IS), 0.0 to 1.0, versus lead time [h] (12 to 348 h) for the ensemble members, the ensemble mean, and the best combination of members.]
INFORMATIVENESS OF ENSEMBLE + HR
Cool Season
[Figure: Informativeness Score (IS), 0.0 to 1.0, versus lead time [h] (12 to 348 h) for the high resolution forecast (HR), HR + ensemble mean, and HR + members.]
INFORMATIVENESS SCORES
Warm Season
Lead Time   Mean   HR     Mean + HR   Members   Members + HR
12 h        0.79   0.90   0.91        0.79      0.91
60 h        0.54   0.72   0.72        0.55      0.73
108 h       0.43   0.48   0.51        0.45      0.53
156 h       0.31   0.29   0.35        0.33      0.36
204 h       0.20   0.13   0.22        0.22      0.23
252 h       0.16   0.08
300 h       0.13   0.02
348 h       0.10   0.02
Mean + HR, Members, Members + HR at 252 h to 348 h (source order, column alignment not recoverable): 0.18, 0.20, 0.17, -0.14, 0.08, -0.10, 0.06

Cool Season
Lead Time   Mean   HR     Mean + HR   Members   Members + HR
12 h        0.95   0.97   0.97        0.95      0.97
60 h        0.87   0.89   0.90        0.88      0.90
108 h       0.70   0.67   0.73        0.72      0.73
156 h       0.48   0.43   0.51        0.50      0.52
204 h       0.25   0.22   0.29        0.28      0.31
252 h       0.16   0.08
300 h       0.10   0.03
348 h       0.08   0.01
Mean + HR, Members, Members + HR at 252 h to 348 h (source order, column alignment not recoverable): 0.17, 0.18, 0.17, -0.14, 0.14, -0.09, 0.02
PROFILES OF MEMBER RANKS BY IS
Warm Season
[Figure: rank by IS (1 to 20) of each ensemble member y1, …, y20 versus lead time [h] (12 to 348 h).]
PROFILES OF MEMBER RANKS BY IS
Cool Season
[Figure: rank by IS (1 to 20) of each ensemble member y1, …, y20 versus lead time [h] (12 to 348 h).]
FREQUENCY OF MEMBER IN OPTIMAL COMBINATION
Warm, Cool Season
Lead Time: All Eight
[Figure: frequency (0 to 7) with which each ensemble member y1, …, y20 appears in the optimal combination.]
VALIDATION OF GAUSSIAN MODEL of (X |W = w)
x = ensemble mean, Lead Time 60 h, Warm Season, N = 368
[Figure, three panels. Linear: x versus w with E(X | w) = aw + b. Homoscedastic: θ versus w with Var(X | w) = σ². Gaussian: P(Θ ≤ θ) versus θ against N(0, σ²), σ = 0.65.]
a = 0.69, b = 0.47, σ = 0.65, IS = 0.54
VALIDATION OF GAUSSIAN MODEL of (X |W = w)
x = ensemble mean, Lead Time 252 h, Cool Season, N = 334
[Figure, three panels. Linear: x versus w with E(X | w) = aw + b. Homoscedastic: θ versus w with Var(X | w) = σ². Gaussian: P(Θ ≤ θ) versus θ against N(0, σ²), σ = 0.62.]
a = 0.27, b = 0.25, σ = 0.62, IS = 0.16
GAUSSIAN–GAMMA MODEL OF (X|T=t, W=w)
\[ X = aw + b + \Theta \]
Θ independent of W, Gaussian
Θ dependent on T
\[ E(\Theta \mid t) = 0, \qquad \mathrm{Var}(\Theta \mid t) = \nu^2 t \]
\[ E(X \mid t, w) = aw + b, \qquad \mathrm{Var}(X \mid t, w) = \nu^2 t \]
\[ \frac{1}{T} \sim \mathrm{Gamma}(\alpha, \beta), \qquad \alpha > 0, \ \beta > 1 \]
\[ \nu^2 = \alpha (\beta - 1) \sigma^2 \]
ν² – unit variance
Objective: Evaluate candidate statistics for T
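The relation ν² = α(β − 1)σ² can be checked against the values on the gamma validation slides (α = 0.60, β = 3.00, σ = 0.65 gives ν² ≈ 0.50; α = 0.14, β = 2.07, σ = 0.62 gives ν² ≈ 0.06). The sketch also simulates T under the assumption that α is the scale and β the shape of the gamma distribution of 1/T; that parameterization is inferred, not stated in the slides, and under it the average of ν²T recovers σ².

```python
import random

def nu_squared(alpha, beta, sigma):
    """nu^2 = alpha * (beta - 1) * sigma^2, as on the slide."""
    return alpha * (beta - 1.0) * sigma ** 2

def mean_conditional_variance(alpha, beta, sigma, n=100000, seed=0):
    """Monte Carlo estimate of E[nu^2 * T] when 1/T ~ Gamma(shape=beta, scale=alpha)."""
    rng = random.Random(seed)
    nu2 = nu_squared(alpha, beta, sigma)
    total = 0.0
    for _ in range(n):
        inv_t = rng.gammavariate(beta, alpha)  # (shape, scale): assumed parameterization
        total += nu2 / inv_t
    return total / n

print(nu_squared(0.60, 3.00, 0.65), nu_squared(0.14, 2.07, 0.62))
print(mean_conditional_variance(0.60, 3.00, 0.65), 0.65 ** 2)
```

If the assumed parameterization is right, the marginal variance of Θ equals σ², so the Gaussian–Gamma model stays consistent with the basic Gaussian model on average over T.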
PREDICTORS OF UNCERTAINTY
Candidate Predictors (16)
d – standard deviation (spread)
r – range
tp – credible interval (6), p = .9, .8, .7, .6, .5, .4
ck – coefficient of kurtosis
– transformations and combinations (7)
Best Predictor T: max Cor (Θ², T)

Lead Time        Warm Season        Cool Season
day    hours     T      Cor         T             Cor
1      12        r      0.19        r(ck − 1)     0.23
3      60        → r    0.29        t.9           0.11
5      108       r      0.25        t.9           0.19
7      156       r      0.07        r             0.17
9      204       r      0.26        ck − 1        0.07
11     252       r      0.20        → r(ck − 1)   0.32
13     300       t.7    0.11        ck − 1        0.15
15     348       t.5    0.16        ck − 1        0.20
Cor for d (spread), as annotated in the source (row alignment not recoverable): (.00 for d), (.03 for d), (-.01 for d), (-.05 for d)
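The selection rule "max Cor(Θ², T)" can be sketched as a small search over candidate statistics. Only two candidates (spread d and range r) are implemented here; the credible-interval and kurtosis candidates are omitted because their exact definitions are not given in the slides. Function names and the toy data are hypothetical.

```python
import math

def pearson(u, v):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    su = math.sqrt(sum((p - mu) ** 2 for p in u))
    sv = math.sqrt(sum((q - mv) ** 2 for q in v))
    return cov / (su * sv)

def candidate_predictors(ensemble):
    """Spread d (sample std. dev.) and range r of one ensemble forecast."""
    n = len(ensemble)
    mean = sum(ensemble) / n
    d = math.sqrt(sum((y - mean) ** 2 for y in ensemble) / (n - 1))
    r = max(ensemble) - min(ensemble)
    return {"d": d, "r": r}

def best_predictor(ensembles, theta):
    """Pick the candidate T with the largest Cor(theta^2, T) over the sample."""
    theta2 = [e * e for e in theta]
    stats = [candidate_predictors(e) for e in ensembles]
    scores = {name: pearson(theta2, [s[name] for s in stats]) for name in stats[0]}
    return max(scores, key=scores.get), scores

# Toy sample: squared residuals constructed to track the range exactly.
ensembles = [(-1, 0, 0, 1), (-1, -1, 1, 1), (-2, 0, 0, 2), (-3, -3, 3, 3)]
theta = [math.sqrt(max(e) - min(e)) for e in ensembles]
print(best_predictor(ensembles, theta))
```

In practice θ would be the residual x − (aw + b) from the fitted likelihood model for each day in the joint sample.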
VALIDATION OF GAMMA MODEL of T
t = range, x = ensemble mean, Lead Time 60 h, Warm Season, N = 368
[Figure, four panels. Gamma: P(1/T ≤ 1/t) versus 1/t against Gamma(α, β). Linear: θ² versus t with the LSQ regression E(Θ² | t) = ν²t. Independent: t versus w. Constant - Heteroscedastic: x versus t with E(X | t) = aM + b and bands of 1 Std. Dev. and 2 Std. Dev.]
Gamma (α, β): α = 0.60, β = 3.00
ν² = α(β − 1)σ² = 0.50
Cor(Θ², T) = 0.29
Cor(T, W) = -0.08
VALIDATION OF GAMMA MODEL of T
t = r (ck – 1), x = ensemble mean, Lead Time 252 h, Cool Season, N = 334
[Figure, four panels. Gamma: P(1/T ≤ 1/t) versus 1/t against Gamma(α, β). Linear: θ² versus t with the LSQ regression E(Θ² | t) = ν²t. Independent: t versus w. Constant - Heteroscedastic: x versus t with E(X | t) = aM + b and bands of 1 Std. Dev. and 2 Std. Dev.]
Gamma (α, β): α = 0.14, β = 2.07
ν² = α(β − 1)σ² = 0.06
Cor(Θ², T) = 0.32
Cor(T, W) = -0.05
CONCLUSIONS
1. Ensemble of 20 members has 1-7 approx. sufficient statistics; they vary with season and lead time
2. Predictors of Central Tendency
Combination of 2-5 Members – for sophisticated users
Mean – for mass processing
3. Predictors of Uncertainty
Range, Credible Interval, Coefficient of Kurtosis – better than Std. Dev. (Spread)
4. Combining Ensemble with HR Forecast
HR + Combination of 1-5 Members
HR + Mean
GAUSSIAN–GAMMA BPE: Synopsis
Ensemble Forecast: y = (y1, …, yJ)
Sufficient Statistics: (x, t), x = (x1, …, xI), I < J
Aggregate Predictor of Central Tendency:
\[ x = \sum_{i=1}^{I} c_i x_i \]
Predictor of Uncertainty: t
Likelihood Function:
\[ f(x \mid t, w) \sim N(aw + b, \ \nu^2 t) \]
Prior Density:
\[ g(w) \sim N(M, S^2) \]
Posterior Density:
\[ \phi(w \mid x, t) \sim N\big( E(W \mid x, t), \ \mathrm{Var}(W \mid x, t) \big) \]
\[ E(W \mid x, t) = \frac{a S^2}{a^2 S^2 + \nu^2 t} \sum_{i=1}^{I} c_i x_i + \frac{M \nu^2 t - a b S^2}{a^2 S^2 + \nu^2 t} \]
linear in xi, non-linear in t
\[ \mathrm{Var}(W \mid x, t) = \frac{S^2 \nu^2 t}{a^2 S^2 + \nu^2 t} \]
non-linear in t
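The posterior moments in the synopsis translate directly into a few lines of code. The sketch below takes the aggregate predictor x as already formed; the function name and the example parameter values (borrowed from the warm-season, 60 h validation slides, with standardized prior M = 0, S = 1) are illustrative.

```python
def posterior_moments(x, t, a, b, nu2, M=0.0, S=1.0):
    """Mean and variance of W given the aggregate predictor x and uncertainty predictor t.

    Likelihood: x | t, w ~ N(a*w + b, nu2 * t); prior: w ~ N(M, S**2).
    """
    denom = a * a * S * S + nu2 * t
    mean = (a * S * S / denom) * x + (M * nu2 * t - a * b * S * S) / denom
    var = S * S * nu2 * t / denom
    return mean, var

# Larger t (more ensemble uncertainty) pulls the mean toward M and widens the posterior.
for t in (0.5, 1.0, 2.0):
    print(t, posterior_moments(1.2, t, a=0.69, b=0.47, nu2=0.50))
```

As t grows without bound the posterior reverts to the prior N(M, S²), which is the sense in which an uninformative ensemble leaves the climatic forecast unchanged.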