SUFFICIENT STATISTICS OF ENSEMBLE FOR PROBABILISTIC WEATHER FORECASTING

By Nah Youn Lee and Roman Krzysztofowicz
University of Virginia

Presented at the Forecast Applications Branch, NOAA / OAR / ESRL / GSD
Boulder, Colorado, 11 May 2010

Acknowledgments: Work supported by the National Science Foundation under Grant No. ATM–0641572. Collaboration of Zoltan Toth, Forecast Applications Branch, NOAA / OAR / ESRL / GSD.


BAYESIAN PROCESSOR OF ENSEMBLE (BPE)

Represents the stochastic dependence between Y and W, with $y = (y_1, \ldots, y_J)$:

$$\phi(w \mid y) = \frac{f(y \mid w)\, g(w)}{\kappa(y)}, \qquad \kappa(y) = \int f(y \mid w)\, g(w)\, dw$$

Calibrates ensemble against prior density function:

$$E[\phi(w \mid Y)] = \int \phi(w \mid y)\, \kappa(y)\, dy = \int f(y \mid w)\, g(w)\, dy = g(w) \int f(y \mid w)\, dy = g(w)$$


SUFFICIENT STATISTICS OF ENSEMBLE

Reduce dimension without loss of information

W – predictand
Y – ensemble, $Y = (Y_1, \ldots, Y_J)$
X – predictor of central tendency of W
T – predictor of uncertainty about W

$$Y \xrightarrow{\ \xi\ } (X, T)$$

Objective: Find ξ such that for every (x, t) = ξ(y) and w

$$\phi(w \mid x, t) = \phi(w \mid y)$$

⇒ (X, T) and Y are equally informative


BASIC BPE

Posterior Density

$$\phi(w \mid x, t) = \frac{f(x \mid t, w)\, g(w)}{\kappa(x \mid t)}$$

Information Extraction: Parameter Estimation
– Prior g(w) ← climatic sample from 40 years
– Likelihood f(x | t, w) ← joint sample from 2 years

Parametric Models: Conjugate families of distributions
– Gaussian–Gamma, e.g., W – temperature
– Meta-Gaussian–Gamma, e.g., W – precip. amount


PRIOR AND ENSEMBLE DATA

National Centers for Environmental Prediction (NCEP) at National Weather Service (NWS)

Location: Savannah, GA
Predictand: 2m temperature at 12 UTC
Forecast time: 00 UTC
Lead times: 12, 60, 108, 156, 204, 252, 300, 348 h
Prior (Climatic) sample: 1959 – 1998 (40 years), re-analysis
Joint sample: 28 Mar. 2007 – 28 Feb. 2009 (~2 years)
– Ensemble forecast, low-resolution (20 members)
– High-resolution forecast (HR)


STANDARDIZATION

$$w'(k) = \frac{w(k) - m_k}{s_k}, \qquad y_j'(k) = \frac{y_j(k) - m_k}{s_k}, \quad j = 1, \ldots, 20$$

$w(k)$ – realization of predictand
$y_j(k)$ – realization of ensemble member j
$m_k$ – climatic mean on day k of the year
$s_k$ – climatic standard deviation on day k of the year
$m_k$ and $s_k$ estimated from 40 years × 15 days = 600 points

Objective: Obtain margin-stationary and quasi-ergodic time series of the predictand and ensemble forecasts


PRIOR (CLIMATIC) MEAN AND STD. DEV.

[Figure: 2m temperature at 12 UTC, Savannah, GA; 40 years (1959 – 98), NCEP/NCAR Re-analysis. Two panels versus day of year k: mean $m_k$ [K] and standard deviation $s_k$ [K]. Legend in each panel: Sample, Fourier.]


STANDARDIZATION OF ENSEMBLE USING PRIOR MOMENTS

[Figure: 20 members, FT = 00 UTC, LT = 108 h, 2008. Two panels versus day of year k: original temperature [K] and standardized temperature.]

Standardized series: (1) Margin-stationary, (2) Quasi-ergodic, (3) Approx. Gaussian


PROPERTIES OF ENSEMBLE

Lead time 108 h, warm season
(1) Approx. Identically Distributed: [Figure: $P(Y_j \le y_j)$ versus $y_j$, N = 366 × 20]
(2) NOT Stochastically Independent: $0.83 < \mathrm{Cor}(Y_i, Y_j) < 0.92$
(3) NOT Conditionally Stochastically Independent: $0.69 < \mathrm{Cor}(Y_i, Y_j \mid W = w) < 0.85$
(4) Miscalibrated: [Figure: $P(X \le x)$ versus x, N = 366. Legend: Analysis, All 20 Members.]
⇒ Ensemble is NOT a random sample


PROPERTIES OF ENSEMBLE

Lead time 108 h, cool season
(1) Approx. Identically Distributed: [Figure: $P(Y_j \le y_j)$ versus $y_j$, N = 334 × 20]
(2) NOT Stochastically Independent: $0.83 < \mathrm{Cor}(Y_i, Y_j) < 0.90$
(3) NOT Conditionally Stochastically Independent: $0.53 < \mathrm{Cor}(Y_i, Y_j \mid W = w) < 0.73$
(4) Miscalibrated: [Figure: $P(X \le x)$ versus x, N = 334. Legend: Analysis, All 20 Members.]
⇒ Ensemble is NOT a random sample


PROPERTIES OF ENSEMBLE

Conditional Correlation: $\mathrm{Cor}(Y_i, Y_j \mid W = w)$

[Figure: conditional correlation (0.0 to 1.0) versus lead time [h], 12 to 348 h. Two panels: Warm Season, Cool Season. Curves: Max, Average, Min.]


GAUSSIAN MODEL OF (X | W = w)

Predictive Regression

$$E(W \mid X = x) = \sum_{i=1}^{I} c_i x_i + c_0$$

Aggregate Predictor

$$X = \sum_{i=1}^{I} c_i X_i$$

Likelihood Model

$$X = aw + b + \Theta, \qquad E(X \mid w) = aw + b, \qquad \mathrm{Var}(X \mid w) = \sigma^2$$

Θ independent of W, Gaussian, with $E(\Theta) = 0$, $\mathrm{Var}(\Theta) = \sigma^2$

Informativeness Score (IS):

$$IS = \left( \frac{\sigma^2}{a^2} + 1 \right)^{-1}$$

Objective: Evaluate candidate statistics for X


INFORMATIVENESS SCORE (IS)

Signal-to-noise ratio: $|a| / \sigma$

Standardized by prior variance $S^2$, then transformed:

$$IS = \left[ \left( \frac{|a| / \sigma}{1 / S} \right)^{-2} + 1 \right]^{-1}$$

Prior variance $S^2 = 1$, so IS simplifies to:

$$IS = \left( \frac{\sigma^2}{a^2} + 1 \right)^{-1}$$

Square of the Bayesian estimator of Cor(X, W), from two asymmetric samples (prior, joint), and therefore $IS \ne R^2$


VALIDATION OF GAUSSIAN MODEL OF (Yj, W)

[Figure: lead time 108 h, warm season, N = 366. Three panels: Gaussian W, $P(W \le w)$ versus w; Gaussian Y18, $P(Y_{18} \le y_{18})$ versus $y_{18}$; Linear and Homoscedastic Y18 | W = w, $y_{18}$ versus w.]

$E(Y_{18} \mid w) = aw + b$, $\mathrm{Var}(\Theta) = \sigma^2$
a = 0.57, b = 0.50, σ = 0.76, IS = 0.36
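The informativeness score can be checked numerically from the fitted likelihood parameters. The following is a minimal sketch, not the authors' code: the function name `informativeness_score` and the argument `prior_sd` (standing for S) are assumptions made for illustration. It evaluates the general form with the standardized signal-to-noise ratio and is applied to the values of a and σ reported for member Y18 at lead time 108 h in the warm season.

```python
def informativeness_score(a: float, sigma: float, prior_sd: float = 1.0) -> float:
    """Return IS = [((|a| / sigma) / (1 / S)) ** -2 + 1] ** -1.

    With prior_sd (S) equal to 1 this reduces to (sigma**2 / a**2 + 1) ** -1.
    The name and signature are illustrative assumptions, not from the slides.
    """
    if sigma <= 0.0 or prior_sd <= 0.0:
        raise ValueError("sigma and prior_sd must be positive")
    if a == 0.0:
        # Limit of the formula as a -> 0: the predictor carries no signal.
        return 0.0
    standardized_snr = (abs(a) / sigma) / (1.0 / prior_sd)
    return 1.0 / (standardized_snr ** -2 + 1.0)


# Fitted values reported for member Y18, lead time 108 h, warm season.
is_warm = informativeness_score(a=0.57, sigma=0.76)
print(f"IS = {is_warm:.2f}")
```

Because the inputs are themselves rounded to two decimals, recomputed scores may differ from a reported IS in the last digit.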
VALIDATION OF GAUSSIAN MODEL OF (Yj, W)

[Figure: lead time 108 h, cool season, N = 334. Three panels: Gaussian W, $P(W \le w)$ versus w; Gaussian Y18, $P(Y_{18} \le y_{18})$ versus $y_{18}$; Linear and Homoscedastic Y18 | W = w, $y_{18}$ versus w.]

$E(Y_{18} \mid w) = aw + b$, $\mathrm{Var}(\Theta) = \sigma^2$
a = 0.74, b = 0.45, σ = 0.62, IS = 0.59


PREDICTORS OF CENTRAL TENDENCY

Candidate Predictors (822)
– Ensemble Members (20)
– Ensemble Statistics (9): Mean, Median, Mode, Midrange, Upper Mean, Lower Mean, Mean of Majority, Mean of Minority, Coefficient of Skewness
– Low Resolution Forecast
– High Resolution Forecast
– Pairs of Statistics (36)
– HR + Statistic (9)
– HR + Pair of Statistics (36)
– Combination of Members (160)
– Mean + Combination of Members (140)
– HR + Combination of Members (180)
– HR + Mean + Combination of Members (220)
– Minimum, Maximum, Min + Max, Mean + Min, Mean + Max, Mean + Min + Max
– Constructed Predictors (4): Modifications of mean of majority and mean of minority; Modifications of upper mean and lower mean


INFORMATIVENESS OF ENSEMBLE

Warm Season

[Figure: Informativeness Score (IS), 0.0 to 1.0, versus lead time [h], 12 to 348 h. Curves: Ensemble members, Ensemble mean, Best combination of members.]


INFORMATIVENESS OF ENSEMBLE + HR

Warm Season

[Figure: Informativeness Score (IS), 0.0 to 1.0, versus lead time [h], 12 to 348 h. Curves: High resolution forecast (HR), HR + ensemble mean, HR + members.]


INFORMATIVENESS OF ENSEMBLE

Cool Season

[Figure: Informativeness Score (IS), 0.0 to 1.0, versus lead time [h], 12 to 348 h. Curves: Ensemble members, Ensemble mean, Best combination of members.]


INFORMATIVENESS OF ENSEMBLE + HR

Cool Season

[Figure: Informativeness Score (IS), 0.0 to 1.0, versus lead time [h], 12 to 348 h. Curves: High resolution forecast (HR), HR + ensemble mean, HR + members.]


INFORMATIVENESS SCORES

Warm Season
Lead Time   Mean   HR     Mean + HR   Members   Members + HR
12 h        0.79   0.90   0.91        0.79      0.91
60 h        0.54   0.72   0.72        0.55      0.73
108 h       0.43   0.48   0.51        0.45      0.53
156 h       0.31   0.29   0.35        0.33      0.36
204 h       0.20   0.13   0.22        0.22      0.23
252 h       0.16   0.08   0.18        0.20      0.17
300 h       0.13   0.02   –           0.14      0.08
348 h       0.10   0.02   –           0.10      0.06

Cool Season
Lead Time   Mean   HR     Mean + HR   Members   Members + HR
12 h        0.95   0.97   0.97        0.95      0.97
60 h        0.87   0.89   0.90        0.88      0.90
108 h       0.70   0.67   0.73        0.72      0.73
156 h       0.48   0.43   0.51        0.50      0.52
204 h       0.25   0.22   0.29        0.28      0.31
252 h       0.16   0.08   0.17        0.18      0.17
300 h       0.10   0.03   –           0.14      0.14
348 h       0.08   0.01   –           0.09      0.02


PROFILES OF MEMBER RANKS BY IS

Warm Season

[Figure: rank by IS (1 to 20) of each ensemble member y1 to y20 versus lead time [h], 12 to 348 h.]


PROFILES OF MEMBER RANKS BY IS

Cool Season

[Figure: rank by IS (1 to 20) of each ensemble member y1 to y20 versus lead time [h], 12 to 348 h.]


FREQUENCY OF MEMBER IN OPTIMAL COMBINATION

Warm, Cool Season; Lead Time: All Eight

[Figure: frequency (0 to 7) versus ensemble member, y1 to y20.]


VALIDATION OF GAUSSIAN MODEL OF (X | W = w)

[Figure: x = ensemble mean, lead time 60 h, warm season, N = 368. Three panels: Linear, x versus w; Homoscedastic, θ versus w; Gaussian, $P(\Theta \le \theta)$ versus θ against $N(0, \sigma^2)$, σ = 0.65.]

$E(X \mid w) = aw + b$, $\mathrm{Var}(X \mid w) = \sigma^2$
a = 0.69, b = 0.47, σ = 0.65, IS = 0.54


VALIDATION OF GAUSSIAN MODEL OF (X | W = w)

[Figure: x = ensemble mean, lead time 252 h, cool season, N = 334. Three panels: Linear, x versus w; Homoscedastic, θ versus w; Gaussian, $P(\Theta \le \theta)$ versus θ against $N(0, \sigma^2)$, σ = 0.62.]

$E(X \mid w) = aw + b$, $\mathrm{Var}(X \mid w) = \sigma^2$
a = 0.27, b = 0.25, σ = 0.62, IS = 0.16


GAUSSIAN–GAMMA MODEL OF (X | T = t, W = w)

$$X = aw + b + \Theta$$

Θ independent of W, Gaussian
Θ dependent on T

$$E(\Theta \mid t) = 0, \qquad \mathrm{Var}(\Theta \mid t) = \nu^2 t$$

$$E(X \mid t, w) = aw + b, \qquad \mathrm{Var}(X \mid t, w) = \nu^2 t$$

$$\frac{1}{T} \sim \mathrm{Gamma}(\alpha, \beta), \qquad \alpha > 0, \ \beta > 1$$

$$\nu^2 = \alpha (\beta - 1) \sigma^2 \quad \text{(unit variance)}$$

Objective: Evaluate candidate statistics for T


PREDICTORS OF UNCERTAINTY

Candidate Predictors (16)
– d – standard deviation (spread)
– r – range
– $t_p$ – credible interval (6), p = .9, .8, .7, .6, .5, .4
– ck – coefficient of kurtosis
– transformations and combinations (7)

Best Predictor T: max $\mathrm{Cor}(\Theta^2, T)$

Lead Time        Warm Season          Cool Season
day    h         Cor    T             Cor    T
1      12        0.19   r             0.23   r(ck-1)
3      60        0.29   → r           0.11   t.9
5      108       0.25   r             0.19   t.9
7      156       0.07   r             0.17   r
9      204       0.26   r             0.07   ck-1
11     252       0.20   r             0.32   → r(ck-1)
13     300       0.11   t.7           0.15   ck-1
15     348       0.16   t.5           0.20   ck-1

Correlations for d noted beside the table (the rows they belong to are not identifiable in this text): (.00 for d), (.03 for d), (-.01 for d), (-.05 for d)


VALIDATION OF GAMMA MODEL OF T

[Figure: t = range, lead time 60 h, warm season, N = 368. Four panels: Gamma, $P(1/T \le 1/t)$ versus 1/t; Linear, $\theta^2$ versus t with LSQ regression; Independent, t versus w; Constant - Heteroscedastic, x = ensemble mean versus t, with $E(X \mid t) = aM + b$ and bands at 1 Std. Dev. and 2 Std. Dev.]

Gamma(α, β): α = 0.60, β = 3.00
$E(\Theta^2 \mid t) = \nu^2 t$, with $\nu^2 = \alpha(\beta - 1)\sigma^2 = 0.50$
$\mathrm{Cor}(\Theta^2, T) = 0.29$
$\mathrm{Cor}(T, W) = -0.08$
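The relation $\nu^2 = \alpha(\beta - 1)\sigma^2$ can be checked against the parameter values quoted on the two Gamma-model validation slides. This is a small sketch under stated assumptions: the function names `unit_variance` and `conditional_variance` are hypothetical, and the value t = 1.5 in the usage line is an arbitrary illustration. Since α, β and σ are rounded on the slides, the result should agree with the quoted $\nu^2$ only to within rounding.

```python
def unit_variance(alpha: float, beta: float, sigma: float) -> float:
    """nu^2 = alpha * (beta - 1) * sigma^2, for 1/T ~ Gamma(alpha, beta).

    Function name is an illustrative assumption, not from the slides.
    """
    if alpha <= 0.0 or beta <= 1.0:
        raise ValueError("requires alpha > 0 and beta > 1")
    return alpha * (beta - 1.0) * sigma ** 2


def conditional_variance(nu2: float, t: float) -> float:
    """Var(X | t, w) = Var(Theta | t) = nu^2 * t (linear in t)."""
    return nu2 * t


# Warm season, lead time 60 h, t = range (values quoted on the slide).
nu2_warm = unit_variance(alpha=0.60, beta=3.00, sigma=0.65)
# Cool season, lead time 252 h, t = r(ck - 1) (values quoted on the slide).
nu2_cool = unit_variance(alpha=0.14, beta=2.07, sigma=0.62)

print(f"warm nu^2 = {nu2_warm:.3f}, cool nu^2 = {nu2_cool:.3f}")
# Hypothetical value t = 1.5, chosen only to show the linear scaling.
print(f"Var(X | t=1.5, w) = {conditional_variance(nu2_warm, 1.5):.3f}")
```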


VALIDATION OF GAMMA MODEL OF T

[Figure: t = r(ck – 1), lead time 252 h, cool season, N = 334. Four panels: Gamma, $P(1/T \le 1/t)$ versus 1/t; Linear, $\theta^2$ versus t with LSQ regression; Independent, t versus w; Constant - Heteroscedastic, x = ensemble mean versus t, with $E(X \mid t) = aM + b$ and bands at 1 Std. Dev. and 2 Std. Dev.]

Gamma(α, β): α = 0.14, β = 2.07
$E(\Theta^2 \mid t) = \nu^2 t$, with $\nu^2 = \alpha(\beta - 1)\sigma^2 = 0.06$
$\mathrm{Cor}(\Theta^2, T) = 0.32$
$\mathrm{Cor}(T, W) = -0.05$


CONCLUSIONS

1. Ensemble of 20 members has 1–7 approx. sufficient statistics; they vary with season and lead time
2. Predictors of Central Tendency
   – Combination of 2–5 Members: for sophisticated users
   – Mean: for mass processing
3. Predictors of Uncertainty
   – Range, Credible Interval, Coefficient of Kurtosis: better than Std. Dev. (Spread)
4. Combining Ensemble with HR Forecast
   – HR + Combination of 1–5 Members
   – HR + Mean


GAUSSIAN–GAMMA BPE: Synopsis

Ensemble Forecast: $y = (y_1, \ldots, y_J)$
Sufficient Statistics: $(x, t)$, $x = (x_1, \ldots, x_I)$, $I < J$
Aggregate Predictor of Central Tendency: $x = \sum_{i=1}^{I} c_i x_i$
Predictor of Uncertainty: t

Likelihood Function: $f(x \mid t, w) \sim N(aw + b, \ \nu^2 t)$
Prior Density: $g(w) \sim N(M, S^2)$
Posterior Density: $\phi(w \mid x, t) \sim N(E(W \mid x, t), \ \mathrm{Var}(W \mid x, t))$

$$E(W \mid x, t) = \frac{a S^2}{a^2 S^2 + \nu^2 t} \sum_{i=1}^{I} c_i x_i + \frac{M \nu^2 t - a b S^2}{a^2 S^2 + \nu^2 t}$$

(linear in $x_i$, non-linear in t)

$$\mathrm{Var}(W \mid x, t) = \frac{S^2 \nu^2 t}{a^2 S^2 + \nu^2 t}$$

(non-linear in t)
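
The posterior moments in the synopsis can be evaluated directly once a, b, ν², M and S are fixed. The sketch below is illustrative only: the function names, the weights `c`, the member values and t = 1.5 are hypothetical, and M = 0 is an assumption tied to the standardized predictand (the slides state only S² = 1). The values a = 0.69, b = 0.47 and ν² = 0.50 are those quoted for the ensemble mean at lead time 60 h in the warm season.

```python
from typing import Sequence, Tuple


def aggregate_predictor(c: Sequence[float], x_members: Sequence[float]) -> float:
    """x = sum_i c_i * x_i, the aggregate predictor of central tendency."""
    if len(c) != len(x_members):
        raise ValueError("c and x_members must have the same length")
    return sum(ci * xi for ci, xi in zip(c, x_members))


def posterior_moments(x: float, t: float, a: float, b: float, nu2: float,
                      M: float = 0.0, S: float = 1.0) -> Tuple[float, float]:
    """Mean and variance of the Gaussian posterior phi(w | x, t).

    Likelihood: X | t, w ~ N(a*w + b, nu2 * t); prior: W ~ N(M, S^2).
    Defaults M = 0, S = 1 are an assumption for a standardized predictand.
    """
    if t <= 0.0 or nu2 <= 0.0 or S <= 0.0:
        raise ValueError("t, nu2 and S must be positive")
    denom = a * a * S * S + nu2 * t
    mean = (a * S * S * x + M * nu2 * t - a * b * S * S) / denom
    var = S * S * nu2 * t / denom
    return mean, var


# Hypothetical two-statistic predictor with equal weights (illustration only).
x = aggregate_predictor(c=[0.5, 0.5], x_members=[1.0, 1.4])
# a, b, nu2 as quoted for lead time 60 h, warm season; t = 1.5 is hypothetical.
mean, var = posterior_moments(x=x, t=1.5, a=0.69, b=0.47, nu2=0.50)
print(f"E(W | x, t) = {mean:.3f}, Var(W | x, t) = {var:.3f}")
```

As t grows, the posterior variance approaches the prior variance S² and the posterior mean approaches M, which is the sense in which both moments are non-linear in t.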