Some Statistical Issues in Climate Science Research

PSW Climate Sciences Workshop, Oct. 15-16, 2003, Albany
by Haiganoush K. Preisler, Environmental Statistics Unit

Common Data Types in Climate Research

1) Time Series: Y(t)

(a) Unequispaced series (most time series methods are for equispaced data)

[Figure: Temperature versus Time for an unequispaced series.]
Data from deep sea cores measured at various depths: Y(τ(d))

(b) How to estimate τ, the depth-time association (mapping)?
(c) Finding relationships between two time series

2) Extreme events

Highest Recorded Temperatures

Place                                                    Date             Degrees Fahrenheit   Degrees Celsius
World (Africa): El Azizia, Libya                         Sept. 13, 1922   136                  58
North America (U.S.): Death Valley, Calif.               July 10, 1913    134                  57
Asia: Tirat Tsvi, Israel; Dead Sea, Palestine            June 21, 1942    129                  54
Australia: Cloncurry, Queensland                         Jan. 16, 1889    128                  53
Europe: Seville, Spain                                   Aug. 4, 1881     122                  50
South America: Rivadavia, Argentina                      Dec. 11, 1905    120                  49
Canada: Midale and Yellow Grass, Saskatchewan, Canada    July 5, 1937     113                  45
Oceania: Tuguegarao, Philippines                         April 29, 1912   108                  45.6
Persian Gulf (sea-surface)                               Aug. 5, 1924     96                   36
Antarctica: Vanda Station, Scott Coast                   Jan. 5, 1974     59                   15
South Pole                                               Dec. 27, 1978    7.5                  -14

• Most statistical methods are for means.
• Poor prediction of extremes.
• Statistics of extremes (generalized extreme value distribution, Pareto, truncated Pareto, etc.)
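As an illustration of the generalized extreme value (GEV) distribution named among the statistics of extremes, the sketch below evaluates the GEV distribution function and the T-year return level (the level exceeded in any one year with probability 1/T). The function names and the parameter values (mu, sigma, xi) are hypothetical choices for illustration, not estimates from any data in this talk.

```python
import math


def gev_cdf(x, mu, sigma, xi):
    """P(X <= x) for a GEV with location mu, scale sigma > 0 and shape xi."""
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:
        # Shape zero is the Gumbel limit of the GEV family.
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        # x lies outside the support: below the lower bound when xi > 0,
        # above the upper bound when xi < 0.
        return 0.0 if xi > 0 else 1.0
    return math.exp(-(t ** (-1.0 / xi)))


def gev_return_level(T, mu, sigma, xi):
    """Level exceeded with probability 1/T in any one block (e.g. year)."""
    if T <= 1:
        raise ValueError("return period T must be greater than 1")
    y = -math.log(1.0 - 1.0 / T)
    if abs(xi) < 1e-12:
        return mu - sigma * math.log(y)
    return mu + (sigma / xi) * (y ** (-xi) - 1.0)


if __name__ == "__main__":
    # Hypothetical parameters for annual maximum temperature (degrees F).
    mu, sigma, xi = 100.0, 4.0, -0.1
    for T in (10, 50, 100):
        print(T, round(gev_return_level(T, mu, sigma, xi), 2))
```

A negative shape xi gives a bounded upper tail and a positive shape gives a heavy upper tail, which is why predictions of extremes depend strongly on how well xi is estimated.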
3) Spatial-temporal data

• Statistical downscaling and interpolation problems
• Spatial-temporal processes - examples
  a) Fire occurrence data: N(x,y,t), point process (= infinite series of random variables)
  b) Size or cost of fire: A(x,y,t), marked point process
  c) Height of tree: H(x,y,t), marked point process

An example from fire prediction/risk

Task: Characterize relationships (if any) between weather/fuel variables and probability of fire occurrence (and large fires)
Goal: Short and long-term predictions of fire season severity

• Fire occurrence is modeled as a spatial-temporal point process
  Assume a conditional intensity function
    λ(x, y, t | θ) = Prob{dN(x, y, t) = 1 | H_t} / (dx dy dt)
  N = number of fires in (x, x+dx] × (y, y+dy] × (t, t+dt]
  H_t = history up to time t
• Discrete case
  N = number of fires in (x, x+∆x] × (y, y+∆y] × (t, t+∆t]
  (∆x, ∆y, ∆t) = km × km × day voxel (volume-pixel)
  λ = probability of a fire in a km² area at (x, y) on day t

Data

• Number of fires in Zone W2
• Weather, fuel, topography at ???
• Values at nearest weather station? (elevation, slope, fuel type, temperature, …)
• Work at the km × km × day voxel level
• Add estimated probabilities at the voxel level to arrive at estimates for number of fires per zone per week (per month, etc.)

Explanatory variables are needed:
1. At all voxels with an observed fire occurrence.
2. At all voxels with no observed fire.

[Figure: Historic weekly number of fires in Oregon Federal lands. Number of fires versus Year.]

Oregon: federal lands, 8 years of data = 578,192,400 voxels
15,786 voxels (0.0027%) with fire

Sample (S)

Include all voxels with fire and a sample (π%) of voxels with no fire.
S for Oregon, 8 years:
  15,786 voxels with fire
  58,094 voxels with no fire (π = .012%)

λ_k = Prob{N_k = 1 | H_t}
γ_k = Prob{N_k = 1 | H_t, k ∈ S}

How to relate γ_k (a parameter estimable from the sampled data) to λ_k (the parameter of interest)?
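One standard way to relate the sampled-data probability γ to the population probability λ follows from Bayes' rule, under the assumption that every fire voxel enters S and each no-fire voxel enters S independently with the same probability π, regardless of its covariates: γ = λ / (λ + π(1 - λ)), or equivalently logit(γ) = logit(λ) - log(π). The slides pose this as a question and do not state which correction was used, so the sketch below is only one possible answer; all function names are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))


def gamma_from_lambda(lam, pi):
    """Fire probability within the sample S, given population probability lam
    and the sampling fraction pi (as a proportion) for no-fire voxels."""
    return lam / (lam + pi * (1.0 - lam))


def lambda_from_gamma(gam, pi):
    """Invert the relation: logit(lam) = logit(gam) + log(pi)."""
    return inv_logit(logit(gam) + math.log(pi))


def expected_count(gammas_all_voxels, pi):
    """Expected number of fires in a zone and period: the sum of corrected
    probabilities over every voxel in that zone and period (not only the
    sampled ones). Inputs are sample-scale fitted probabilities."""
    return sum(lambda_from_gamma(g, pi) for g in gammas_all_voxels)


if __name__ == "__main__":
    pi = 0.00012                      # pi = .012% written as a proportion
    lam_bar = 15786 / 578192400       # overall fire fraction from the slides
    gam_bar = gamma_from_lambda(lam_bar, pi)
    print(gam_bar, lambda_from_gamma(gam_bar, pi))
```

In a logistic regression fitted to S, this amounts to a constant shift of the intercept by log(π); the covariate effects are unchanged under the stated sampling assumption.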
Some results from the fire/risk problem

[Figure: Probability of fire occurrence. Map (Km by Km) of estimated odds ratios relative to 8 year average.]

[Figure: Conditional probability of large fire (> 100 acres). Map (Km by Km) of estimated odds ratios relative to 8 year average.]

[Figure: Probability of fire occurrence. Estimated mutual information by model: Fuel, DBT, RH, BI, ER, SC, FP, TH, ST, F.]

Model ST: spatial-temporal (i.e., lat, long, elevation, and day in year)
Model F: full model; includes lat, long, elevation, day in year, TH, RH, DBT

Summary

• Many 'interesting' statistical issues
• Identify areas where further statistical research is needed
• Collaboration between climate research scientists and statisticians