New techniques for detection and adjustment of shifts in daily precipitation series
Xiaolan L. Wang (1,2), H. Chen (3), Y. Wu (2), Y. Feng (1), and P. Qiang (2)
1. Climate Research Division, Science & Technology Branch, Environment Canada
2. Department of Mathematics & Statistics, York University, Toronto, Canada
3. Department of Mathematics & Statistics, Bowling Green State University, Ohio, USA
J. Appl. Meteor. Climatol. (accepted)
11IMSC, Edinburgh, UK, 12-16 July 2010

Background information: Our recent studies (Wang et al. 2007, Wang 2008a,b)
1. Propose two penalized tests, PMT and PMF, to even out the distribution of false alarm rates
2. Extend these penalized tests to account for the first-order autocorrelation (red noise)
3. Propose a stepwise testing algorithm for detecting multiple changepoints in a single series
RHtestsV3 software package (R and FORTRAN; 220+ users from 55+ countries so far) …
1. PMTred algorithm
- for detecting mean shifts in zero-trend series with independent or AR(1) Gaussian noise
- for use with reference series
2. PMFred algorithm
- for detecting mean shifts in constant-trend series with independent or AR(1) Gaussian noise
- can be used without a reference series
This study:
3. transPMFred algorithm for detecting changepoints
- in non-zero daily precipitation series (typically non-Gaussian data)
- for use without a reference series
* Quantile Matching (QM) algorithm for adjusting quantile-dependent artificial shifts
* RHtests_dlyPrcp software package for homogenization of daily precipitation series

The relevant model with Gaussian noise:
The PMF (TPR3) model for constant-trend series (Wang 2008a and 2003): to test
\[
H_0: X_i = \mu + \beta t_i + \varepsilon_i
\]
against
\[
H_a: X_i =
\begin{cases}
\mu_1 + \beta t_i + \varepsilon_i, & 1 \le i \le c \\
\mu_2 + \beta t_i + \varepsilon_i, & c+1 \le i \le N
\end{cases}
\]
$\varepsilon_i$ - independent or AR(1) Gaussian noise
$t_c$ - an unknown changepoint time
Also applicable to the TPR3b model (Solow 1987) for a trend change without an accompanying mean shift,
and to the TPR4 model (Lund & Reeves 2002) for a mean shift that may be accompanied by a trend change.
The test statistic for an unknown changepoint is a maximal F statistic, not a regular F statistic, because the most probable point of change has to be searched for in the time series.

Precipitation is typically not normally distributed; daily precipitation is not a continuous variable!
- Log transformation is often sufficient for monthly/annual total precipitation (Prcp) data series: we recommend using the RHtestsV3 functions to test a log-transformed monthly/annual Prcp series
- Homogenization of daily precipitation data is much more challenging, and yet much needed for characterizing extremes. Log transformation is often not good enough; a data-adaptive transformation procedure is needed.
- Integrating a Box-Cox transformation into the PMFred algorithm gives the transPMFred algorithm and the RHtests_dlyPrcp package for homogenization of daily precipitation data series; this alleviates the limitation of the normal-distribution assumption in the RHtestsV3 package

Box-Cox transformation:
\[
X_i = h(Y_i; \lambda) =
\begin{cases}
(Y_i^{\lambda} - 1)/\lambda, & \lambda \ne 0 \\
\log Y_i, & \lambda = 0
\end{cases}
\]
where $Y_i > 0$ $(i = 1, 2, \ldots, N)$ is a series of non-zero daily precipitation amounts. $Y_i$ can be other positive values, e.g., non-zero wind speeds.
The gist of the transPMFred algorithm:
- For a set of trial λ values, use the PMFred algorithm to test each transformed series $X_i$
- Use a profile log-likelihood statistic to find the best λ for the series being tested
This is a data-adaptive transformation, because different λ values (transformations) may be chosen for different series.

To assess the detection power of transPMFred (nominal significance level: 5%)
[Figure annotations: transPMFred above 95%; log-normal distribution below 70%]
Consider daily precipitation of 5 different distribution types (i.e., of different λ values: -0.2, -0.1, 0.0, 0.1, 0.2)
For each distribution type (each λ): block-bootstrap 1000 surrogate series of N=600 from a homogeneous real precipitation
series whose λ is one of the five values
► False Alarm Rates (FARs): apply the transPMFred to each of the homogeneous surrogate series
Results: FARs are around the nominal level (5% here)
► Hit Rates (HRs): insert, at a randomly chosen position K, one shift into each surrogate series of N=600, then apply the new and old methods to detect the inserted shift. Hit: $\hat{k} \in [K-10, K+10]$
Results:
- hit rates as a function of λ value (from shorter upper tail to longer upper tail): HRs are all above 95% except for very small shifts
- hit rates as a function of shift position K: HRs are basically independent of K
[Figures: transPMFred hit rates, one curve per shift size]

Quantile Matching (QM) algorithm: for adjusting quantile-dependent shifts,
- regime-dependent shifts
- seasonality of shifts, e.g., …
i.e., shifts that affect not only the mean, but also the entire distribution of the data.

Site moves at an Australian station caused quantile-dependent shifts: Lord Howe Island daily Tmin
Site moves in Jan 1955 and Dec 1988; larger effects on low extremes
[Figure panels: de-seasonalized daily Tmin; mean-adjusted daily Tmin; QM-adjusted daily Tmin. Annotations: different variances; var. diff. remains; largest diff. in the lowest 10% of daily Tmin]

Gist of QM adjustments:
- to match the distributions of different segments of the de-trended base series, i.e., to diminish differences in the distribution caused by non-climatic factors
- to preserve in the QM-adjusted series the linear trend estimated from a multi-phase regression fit: it is important not to remove the natural trend!

For daily precipitation, the QM adjustments are estimated this way:
Precipitation trend component:
\[
\hat{Y}_i^{tr} = h^{-1}(\hat{X}_i^{tr}; \lambda_b), \qquad \hat{X}_i^{tr} = \hat{\beta} t_i
\]
De-trended precipitation series:
\[
Y_i^{dtr} = Y_i + (\hat{Y}_{max}^{tr} - \hat{Y}_i^{tr}) > 0, \qquad
\hat{Y}_{max}^{tr} = \max_{1 \le i \le N} \hat{Y}_i^{tr}, \qquad
\hat{Y}_{max}^{tr} - \hat{Y}_i^{tr} \ge 0
\]
Estimate the probability distribution at Mq categories for each segment; take the between-segment differences for each category and interpolate them by fitting splines (Mq=8 here). Do these for each value to be adjusted.
[Figure labels: Adjust to Seg. 3; Adjustment if in Seg. 1]
[Figure labels: Adjustment if in Seg. 2; Seg. 3; x-axis: empirical cumulative frequency of the value to be adjusted]
Add these to the original series to make it homogeneous.
Different quantiles in the same segment could be adjusted differently.

Examples to show:
1. The proposed new algorithm works well in detecting changepoints in real daily P
2. Small P are harder to measure accurately than larger P (larger % error): discontinuities often exist in frequency series of measured small P (e.g., P < 1 mm)
3. In the presence of frequency discontinuity, any adjustment derived from the measured daily P (e.g., ratio-based, Quantile-Matching) is not good. One must address the issue of frequency discontinuity first! The RHtestsV3 functions can be used to detect frequency discontinuities

Examples of application
Daily precipitation recorded at The Pas (Manitoba, Canada) for 1 Jun 1910 to 31 Dec 2007
- snowfall water equivalent; rainfall adjusted for wetting losses and gauge undercatch (Mekis & Hogg 1999; and updates by Mekis)
- joining of two stations: 5052864 up to 31 Dec 1945, 5052880 from 1 Jan 1946 to 31 Dec 2007
Before including trace precipitation amounts, we have two Prcp data series for this site:
1. not adjusted for joining (noT_naJ)
2. adjusted for joining (noT_aJ); Vincent & Mekis (2009): ratio-based adjustments (used one rainfall ratio & one snowfall ratio for all data)
The same three changepoints are detected in both. Both series have a very significant changepoint near the time of joining of stations.
[Figure label: noT_naJ]
Results for the two series not including trace amounts (noT series):
1. noT_naJ (closest to original measurements)
2.
noT_aJ (aJ changed the mean shift size from -0.76 mm to -0.73 mm)
transPMFred detected the same 3 changepoints:
Type | Date | Documented date of change(s)
1 | 4 Jul 1938 | 9 Oct 1937 to 8 Aug 1938: changes in gauge type, rim height, observing frequency; poor gauge condition reported on 9 Oct 1937
1 | 24 Oct 1946 | 31 Dec 1945: joining of two nearby stations (5052864 + 5052880)
1 | 4 Oct 1976 | 16 Oct 1975 to 18 Oct 1977: gauge type change (standard at 12" rim height to Type B at 16" rim height)
Changes in the min. measurable amount (precision, unit)
[Figure annotations: -0.76 mm; changepoints at 1937-38, 1945-46 (joining), 1976-77]

The ratio-based adjustments for station joining failed to homogenize the series, because …
The discontinuities are mainly in the measurements of small precipitation (P ≤ 3 mm), especially in the frequency of measured small precipitation:
- Series of daily P > 3 mm: homogeneous!
[Figure labels: noT_naJ; noT_naJ > 3mm. Annotations: 0.21 mm from SWE; much fewer 0.5 ~ 1 mm until 1937; no P < 0.3 mm or 0.3 ~ 0.4 mm until 1945 (joining point); much fewer 0.4 < P ≤ 0.5 mm before 1976]
Any ratio-based adjustments for joining are not good in this case, because larger P are adjusted more than smaller P when they should not be adjusted at all!
The above frequency discontinuities largely remain:
[Figure labels: noT_aJ; noT_naJ > 3mm]

Homogenization of daily precipitation series: very challenging!!
a) Ratio-based adjustments: bad in the presence of frequency discontinuity
We also tried
b) IBC adjustments: the Inverse Box-Cox (IBC) transformation of the fitted multi-phase regression lines
[Figure labels: wT_naJ; Seg. 1; Seg. 2; homogeneous]
Happy? No! Because large P are adjusted similarly, while they should not be adjusted at all
c) QM adjustments, e.g., Quartile-Matching (4 categories); adjust to the last segment (Seg. 3)
[Figure labels: wT_naJ; Seg. 1; Seg. 2; Seg. 3; inhomogeneous]
This is worse than the simple IBC adjustments!
- still inhomogeneous;
- larger absolute adjustments made to larger P
Quantile matching algorithms work only if there is no discontinuity in the frequency, because they line up the adjustments by empirical frequency, implicitly assuming homogeneous frequencies. They should be used after all frequency discontinuities have been diminished!

How to address the issue of frequency discontinuity?
Apply a homogeneity test to the frequency series, and homogenize the series if necessary, e.g.:
The frequency of reported trace occurrence at station The Pas is not homogeneous!
[Figure: noT_naJ trace frequency series tested with the PMFred algorithm; no trend; changepoints at 1945-46 and 1955-56]
Adding a trace amount for T-flagged days is not good enough in this case.
Need to address the issue of frequency discontinuity! But how?
Flag more days with T? Which dates to flag? This needs observations of other variables, such as cloud, humidity…
In spite of the uncertainty in the date of trace Prcp, adding days of a trace amount to the series would help obtain more accurate adjustments for other discontinuities using quantile matching.
At least, monthly and annual total Prcp can be adjusted to account for the frequency discontinuities, e.g., adjust the total trace amount in each month to that month's current trace amount when there is no trend in trace frequency.

Concluding remarks
- the new method, transPMFred, works well for both simulated and real daily precipitation data
- homogenization of precipitation data, especially daily P, is very challenging
We would recommend:
1) use transPMFred to test series of P > Pmin with different Pmin values (e.g. 0.0, 0.3 mm, 0.4 mm, 0.5 mm, 1.0 mm, …); the Pmin values should reflect changes in measurement precision/unit
2) also test the frequency series of zero P and small P (e.g. Trace, ≤0.3 mm, 0.3-0.5 mm, …), e.g., using the PMFred algorithm
In the presence of frequency discontinuity, any adjustment derived from the measured daily P is not good, no matter how it was derived!
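The data preparation behind recommendations 1) and 2) can be sketched as follows. This is a minimal Python illustration, not part of the RHtests packages: the function names `series_above` and `annual_frequency` and the record layout `((year, month, day), amount_mm)` are assumptions made for this example. The resulting series would then be passed to a homogeneity test such as transPMFred or PMFred; those calls are not shown because their interface is not described here.

```python
from collections import defaultdict


def series_above(records, pmin):
    """Keep the (date, amount) pairs whose amount exceeds pmin (mm).

    `records` is a list of ((year, month, day), amount_mm) tuples;
    this layout is a hypothetical choice for the example.
    """
    return [(date, p) for date, p in records if p > pmin]


def annual_frequency(records, low, high):
    """Count, for each year, the days with low < amount <= high (mm).

    Years that have data but no day in the category get a count of 0,
    so the frequency series has no gaps.
    """
    counts = defaultdict(int)
    for (year, _month, _day), p in records:
        counts[year] += 0  # register the year even if nothing matches
        if low < p <= high:
            counts[year] += 1
    return dict(sorted(counts.items()))


# Tiny made-up record, for illustration only
records = [
    ((1936, 1, 1), 0.0),
    ((1936, 1, 2), 0.25),
    ((1936, 1, 3), 0.5),
    ((1937, 6, 1), 4.0),
    ((1937, 6, 2), 0.4),
]

# Recommendation 1: one candidate series per Pmin value
candidates = {pmin: series_above(records, pmin) for pmin in (0.0, 0.3, 0.5, 1.0)}

# Recommendation 2: frequency series of small-P categories
freq_le_03 = annual_frequency(records, 0.0, 0.3)
freq_03_05 = annual_frequency(records, 0.3, 0.5)
```

Counting per year is only one possible aggregation; monthly counts would follow the same pattern with `(year, month)` as the key.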
One must address the issue of frequency discontinuity before doing any adjustment (incl. QM)!
We shall aim to get better insight into the cause (metadata) and characteristics (e.g., frequency) of a discontinuity before any attempt to adjust daily precipitation data!

Thank you very much for your attention! Questions and/or comments?
The RHtestsV3 and RHtests_dlyPrcp software packages are available free of charge at http://cccma.seos.uvic.ca/ETCCDMI/software.shtml
- used by the WMO ETCCDI (Expert Team on Climate Change Detection and Indices) in 12 training workshops so far

References:
Wang, X. L., H. Chen, Y. Wu, Y. Feng, and Q. Pu, 2010: New techniques for detection and adjustment of shifts in daily precipitation data series. J. Appl. Meteor. Climatol., accepted subject to revision.
Wang, X. L., 2008a: Penalized maximal F test for detecting undocumented mean-shift without trend change. J. Atmos. Oceanic Technol., 25 (No. 3), 368-384. DOI:10.1175/2007JTECHA982.1
Wang, X. L., 2008b: Accounting for autocorrelation in detecting mean-shifts in climate data series using the penalized maximal t or F test. J. Appl. Meteor. Climatol., 47, 2423-2444.
Wang, X. L., Q. H. Wen, and Y. Wu, 2007: Penalized maximal t-test for detecting undocumented mean change in climate data series. J. Appl. Meteor. Climatol., 46 (No. 6), 916-931. DOI:10.1175/JAM2504.1
Wan, H., X. L. Wang, and V. R. Swail, 2007: A quality assurance system for Canadian hourly pressure data. J. Appl. Meteor. Climatol., 46 (No. 11), 1804-1817.
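Supplementary sketch: the Box-Cox step of transPMFred (transform the series for each trial λ, then pick the λ with the highest profile log-likelihood) can be illustrated in a few lines of Python. This is a deliberately simplified version and not the authors' implementation: it assumes a homogeneous series with a constant mean and independent Gaussian noise after transformation, whereas transPMFred evaluates the likelihood under the fitted regression with changepoints. The function names are hypothetical.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of positive values: (y**lam - 1)/lam, or log(y) if lam == 0."""
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]


def profile_loglik(y, lam):
    """Profile log-likelihood of lam, up to an additive constant.

    Simplifying assumption (not from the slides): the transformed data
    are i.i.d. Gaussian with a constant mean, so the residual variance
    is just the variance of the transformed series.
    """
    x = box_cox(y, lam)
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n  # maximum-likelihood variance
    jacobian = (lam - 1.0) * sum(math.log(v) for v in y)
    return -0.5 * n * math.log(var) + jacobian


def best_lambda(y, trial_lambdas):
    """Return the trial lambda with the largest profile log-likelihood."""
    return max(trial_lambdas, key=lambda lam: profile_loglik(y, lam))


# Made-up non-zero daily amounts (mm), for illustration only
amounts = [0.3, 0.5, 0.8, 1.2, 2.0, 3.5, 6.1, 10.4, 18.0, 0.4, 1.0, 2.6]
trials = [-0.2, -0.1, 0.0, 0.1, 0.2]
chosen = best_lambda(amounts, trials)
```

Because the choice is made per series, two stations can end up with different λ values, which is what makes the transformation data-adaptive.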