frequency discontinuity

New techniques for detection and adjustment of shifts in
daily precipitation series
Xiaolan L. Wang1,2, H. Chen3, Y. Wu2, Y. Feng1, and P. Qiang2
1. Climate Research Division, Science & Technology Branch, Environment Canada
2. Department of Mathematics & Statistics, York University, Toronto, Canada
3. Department of Mathematics & Statistics, Bowling Green State University, Ohio, USA
J. Appl. Meteor. Climatol. (accepted)
11IMSC, Edinburgh, UK, 12-16 July 2010
Background information:
Our recent studies (Wang et al. 2007, Wang 2008a,b)
1. Propose two penalized tests, PMT and PMF, to even out the distribution of false alarm rates
2. Extend these penalized tests to account for the first order autocorrelation (red noise)
3. Propose a stepwise testing algorithm for detecting multiple changepoints in a single series
RHtestsV3 software package (R and FORTRAN; 220+ users from 55+ countries so far)
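Point 2 above concerns first order autocorrelation (red noise). As a minimal sketch, assuming the usual sample estimator rather than the estimation procedure of Wang (2008b), the lag-1 autocorrelation of a series could be computed as follows. The function name `lag1_autocorrelation` is hypothetical and not part of RHtestsV3.

```python
def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation of a series (simple estimator, assumption)."""
    n = len(x)
    mean_x = sum(x) / n
    dev = [v - mean_x for v in x]
    numerator = sum(dev[i] * dev[i + 1] for i in range(n - 1))
    denominator = sum(d * d for d in dev)
    return numerator / denominator


# Hypothetical series that alternates in sign: strongly negative autocorrelation.
alternating = [1.0, -1.0] * 50
r1 = lag1_autocorrelation(alternating)
```

A value of `r1` far from zero indicates that the noise should not be treated as independent when testing for shifts.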
…
1. PMTred algorithm
- for detecting mean shifts in zero-trend series with independent or AR(1) Gaussian noise
- for use with reference series
2. PMFred algorithm
- for detecting mean shifts in constant trend series with independent or AR(1) Gaussian noise
- can be used without a reference series
This study: 3. transPMFred algorithm for detecting changepoints
- in non-zero daily precipitation series – typically non-Gaussian data
- for use without a reference series
* Quantile Matching (QM) algorithm for adjusting quantile-dependent artificial shifts
* RHtests_dlyPrcp software package for homogenization of daily precipitation series
The relevant model with Gaussian noise:
The PMF (TPR3) model for constant trend series (Wang 2008a and 2003):
to test
$H_0: X_i = \mu + \beta t_i + \epsilon_i$
against
$H_a: X_i = \mu_1 + \beta t_i + \epsilon_i$ for $1 \le i \le c$, and $X_i = \mu_2 + \beta t_i + \epsilon_i$ for $c+1 \le i \le N$
$\epsilon_i$ - independent or AR(1) Gaussian noise
$t_c$ - an unknown changepoint time
Also applicable to the TPR3b model (Solow 1987) for a trend-change without an accompanying mean shift,
and to the TPR4 model (Lund & Reeves 2002) for a mean shift that may be accompanied by a trend-change
The test statistic for an unknown changepoint is a maximal F statistic, not a regular F statistic,
because of the need to search for the most probable point of change in a time series
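The search over candidate changepoints can be sketched as follows. This is a plain (unpenalized) maximal F search for a mean shift with a common trend; it omits the penalty of the PMF test and the AR(1) treatment of PMFred. The function names, the minimum segment length `n_min` and the synthetic series are assumptions for illustration only.

```python
import random


def centered_sums(t, x):
    """Return (Stt, Stx, Sxx): sums of squares and cross-products about the means."""
    n = len(t)
    t_bar = sum(t) / n
    x_bar = sum(x) / n
    stt = sum((a - t_bar) ** 2 for a in t)
    stx = sum((a - t_bar) * (b - x_bar) for a, b in zip(t, x))
    sxx = sum((b - x_bar) ** 2 for b in x)
    return stt, stx, sxx


def max_f(x, n_min=5):
    """Search all candidate changepoints c and return (F_max, c_hat)."""
    n = len(x)
    t = list(range(1, n + 1))
    stt, stx, sxx = centered_sums(t, x)
    sse0 = sxx - stx ** 2 / stt  # H0: one intercept and one slope
    f_max, c_hat = float("-inf"), None
    for c in range(n_min, n - n_min + 1):
        # Ha: separate intercepts before/after c, common slope
        a = centered_sums(t[:c], x[:c])
        b = centered_sums(t[c:], x[c:])
        stt_w, stx_w, sxx_w = a[0] + b[0], a[1] + b[1], a[2] + b[2]
        sse_a = sxx_w - stx_w ** 2 / stt_w
        f_c = (sse0 - sse_a) / (sse_a / (n - 3))
        if f_c > f_max:
            f_max, c_hat = f_c, c
    return f_max, c_hat


# Synthetic example: weak trend, unit Gaussian noise, mean shift after point 60.
rng = random.Random(0)
n, c_true = 120, 60
series = [0.01 * i + (5.0 if i > c_true else 0.0) + rng.gauss(0.0, 1.0)
          for i in range(1, n + 1)]
f_max, c_hat = max_f(series)
```

Because `f_max` is the largest of many correlated F values, it has to be compared with critical values of the maximal F distribution, not with a standard F table.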
Precipitation is typically not normally distributed; daily precipitation is not a continuous variable!
- Log transformation is often sufficient for monthly/annual total precipitation (Prcp) data series
→ recommend using the RHtestsV3 functions to test a log-transformed monthly/annual Prcp series
- Homogenization of daily precipitation data is much more challenging, and yet much needed for
characterizing extremes
Log-transformation is often not good enough; a data-adaptive transformation procedure is needed.
- Integrate a Box-Cox transformation in the PMFred algorithm, developing the transPMFred algorithm
& RHtests_dlyPrcp package for homogenization of daily precipitation data series
→ alleviates the limitation of the assumption of normal distribution in the RHtestsV3 package
Box-Cox transformation:
$X_i = h(Y_i; \lambda) = (Y_i^{\lambda} - 1)/\lambda$ for $\lambda \ne 0$, and $X_i = \log Y_i$ for $\lambda = 0$,
where $Y_i > 0$ ($i = 1, 2, \ldots, N$) is a series of non-zero daily precipitation amounts
Yi can be other positive values, e.g., non-zero wind speeds
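A minimal sketch of the Box-Cox transformation and its inverse, assuming strictly positive input. The function names are hypothetical and are not those of the RHtests_dlyPrcp package.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of a positive value y for parameter lam."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0 (non-zero precipitation only)")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


def inverse_box_cox(x, lam):
    """Inverse transform; for lam != 0 it requires lam * x + 1 > 0."""
    if lam == 0:
        return math.exp(x)
    return (lam * x + 1.0) ** (1.0 / lam)


# Round trip for a hypothetical daily amount of 12.5 mm.
round_trip = {lam: inverse_box_cox(box_cox(12.5, lam), lam)
              for lam in (-0.2, 0.0, 0.2)}
```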
The gist of the transPMFred algorithm:
- For a set of trial λ values, use the PMFred algorithm to test each transformed series Xi
- Use a profile log-likelihood statistic to find the best λ for the series being tested
A data-adaptive transformation, because different λ values (transformations) may be chosen for different series
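The choice of λ by profile log-likelihood can be illustrated with a simplified sketch. It assumes a homogeneous series with a constant mean and independent Gaussian noise after transformation, whereas the transPMFred algorithm works with the fitted regression that includes the detected changepoints. The trial λ values and all names are assumptions for illustration.

```python
import math
from statistics import NormalDist


def box_cox(y, lam):
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam


def profile_loglik(y, lam):
    """Profile log-likelihood of lam (up to a constant) for a constant-mean model."""
    n = len(y)
    x = [box_cox(v, lam) for v in y]
    mean_x = sum(x) / n
    sigma2 = sum((v - mean_x) ** 2 for v in x) / n
    jacobian = (lam - 1.0) * sum(math.log(v) for v in y)
    return -0.5 * n * math.log(sigma2) + jacobian


def best_lambda(y, trial_lambdas):
    """Return the trial lam with the largest profile log-likelihood."""
    return max(trial_lambdas, key=lambda lam: profile_loglik(y, lam))


# Deterministic log-normal sample built from standard normal quantiles.
n = 500
z = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
y = [math.exp(1.0 + v) for v in z]
trial = [-0.2, -0.1, 0.0, 0.1, 0.2]
best = best_lambda(y, trial)
```

For log-normal data the log transformation (λ = 0) should be selected, which is what the example checks.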
To assess detection power of the transPMFred (nominal significance level: 5%)
Consider daily precipitation of 5 different distribution types
(i.e., of different λ values: -0.2, -0.1, 0.0, 0.1, 0.2; λ = 0.0 is the log-normal distribution)
For each distribution type (each λ):
Block bootstrap → 1000 surrogate series of N=600 from
a homogeneous real precip. series
whose λ is one of the five values
► False Alarm Rates (FARs) – apply the transPMFred
to each of the homogeneous surrogate series
Results: FARs are around the nominal level (5% here)
► Hit Rates (HRs) – insert, at a randomly chosen position K,
one shift to each surrogate series of N=600,
then apply the new and old methods to detect
the inserted shift
Hit: $\hat{k} \in [K-10, K+10]$
Results: hit rates as a function of λ value
and as a function of shift position K
HRs are all above 95%
except for very small shifts
HRs are basically independent of K
[Figure: hit rates of transPMFred for each shift size, by λ value and by shift position K; annotations: "transPMFred above 95%", "below 70%", "shorter upper tail", "longer upper tail"]
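The surrogate generation and the hit criterion can be sketched as follows. The slides state only "block bootstrap", so the moving-block variant, the block length of 30 and the function names used here are assumptions.

```python
import random


def block_bootstrap(series, n_out, block_len, rng):
    """Build a surrogate of length n_out from randomly chosen contiguous blocks."""
    n = len(series)
    out = []
    while len(out) < n_out:
        start = rng.randrange(0, n - block_len + 1)
        out.extend(series[start:start + block_len])
    return out[:n_out]


def is_hit(k_hat, k_true, tol=10):
    """A detection counts as a hit when k_hat lies in [K - tol, K + tol]."""
    return abs(k_hat - k_true) <= tol


# Hypothetical homogeneous base series of 1000 values.
base_series = [float(i) for i in range(1000)]
surrogate = block_bootstrap(base_series, 600, 30, random.Random(1))
```

Resampling whole blocks keeps the short-term dependence of the base series inside each block, which single-value resampling would destroy.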
Quantile Matching (QM) algorithm – for adjusting quantile-dependent shifts,
i.e. shifts that affect not only the mean, but also the entire distribution of the data:
- regime dependent shifts
- seasonality of shifts, e.g., …
Site moves at an Australian station → quantile-dependent shifts:
Lord Howe Island daily Tmin; site moves in Jan 1955 and Dec 1988
[Figure: de-seasonalized daily Tmin; annotations: "different variances", "Larger effects on low extremes"]
[Figure: mean-adjusted daily Tmin; annotation: "var. diff. remains"]
[Figure: QM-adjusted daily Tmin]
Largest diff in the lowest 10% of daily Tmin
Gist of QM adjustments:
- to match the distributions of different segments of the de-trended base series,
i.e., to diminish differences in the distribution caused by non-climatic factors
- to preserve in the QM-adjusted series the linear trend estimated from a multi-phase regression fit
(important not to remove the natural trend!)
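For daily precipitation, the QM adjustments are estimated this way:
Precipitation trend component: $\hat{Y}_i^{tr} = h^{-1}(\hat{X}_i^{tr}; \hat{\lambda})$, with $\hat{X}_i^{tr} = \hat{\beta} t_i$
De-trended precipitation series: $Y_i^{dtr} = Y_i + (\hat{Y}_{max}^{tr} - \hat{Y}_i^{tr}) > 0$,
where $\hat{Y}_{max}^{tr} = \max_{1 \le i \le N} \hat{Y}_i^{tr}$, so that $\hat{Y}_{max}^{tr} - \hat{Y}_i^{tr} \ge 0$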
Estimate the probability distribution at Mq categories for each segment
and the between-segment differences for each category,
& interpolate them by fitting splines (Mq=8 here)
Do these for each value to be adjusted:
- find the empirical cumulative frequency of the value to be adjusted
- take the interpolated adjustment for that frequency (adjustment if in Seg. 1; adjustment if in Seg. 2)
Add these to the original series to make it homogeneous (here: adjust to Seg. 3)
Different quantiles in the same segment could be adjusted differently
[Figure: adjustment curves for Seg. 1 and Seg. 2 against empirical cumulative frequency; Seg. 3 is the base segment]
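The quantile-matching steps above can be sketched for one segment and one base segment. This sketch assumes both inputs are already de-trended, uses linear interpolation where the slides use fitted splines, and holds the adjustment constant beyond the first and last category midpoints. All names are hypothetical.

```python
from bisect import bisect_right


def category_means(values, mq):
    """Mean of each of mq equal-frequency categories (needs len(values) >= mq)."""
    s = sorted(values)
    n = len(s)
    means = []
    for k in range(mq):
        chunk = s[k * n // mq:(k + 1) * n // mq]
        means.append(sum(chunk) / len(chunk))
    return means


def interpolate(x, xs, ys):
    """Piecewise-linear interpolation, constant outside [xs[0], xs[-1]]."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect_right(xs, x)
    w = (x - xs[j - 1]) / (xs[j] - xs[j - 1])
    return ys[j - 1] + w * (ys[j] - ys[j - 1])


def qm_adjust(segment, base, mq=8):
    """Adjust each value of segment towards the distribution of base."""
    diffs = [b - s for b, s in zip(category_means(base, mq),
                                   category_means(segment, mq))]
    mids = [(k + 0.5) / mq for k in range(mq)]  # category mid-frequencies
    ordered = sorted(segment)
    n = len(ordered)
    adjusted = []
    for v in segment:
        ecf = bisect_right(ordered, v) / n  # empirical cumulative frequency
        adjusted.append(v + interpolate(ecf, mids, diffs))
    return adjusted


# Hypothetical segments: the base is the segment shifted upwards by 2 units.
segment = [float(i) for i in range(1, 81)]
base = [v + 2.0 for v in segment]
adjusted = qm_adjust(segment, base)
```

When the two segments differ by more than a constant, the interpolated differences vary with frequency, so different quantiles receive different adjustments.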
Examples to show:
1. The proposed new algorithm works well in detecting changepoints in real daily P
2. Small P are harder to measure with accuracy than larger P (larger %error)
– discontinuities often exist in freq. series of measured small P (e.g., P < 1 mm)
3. In the presence of frequency discontinuity,
any adjustment derived from the measured daily P is not good.
(e.g., ratio-based, Quantile-Matching)
One must address the issue of freq. discontinuity first!
The RHtestsV3 functions can be used to detect frequency discontinuities
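A frequency series for such a test could be built as in the sketch below, which counts, for each year, the days with precipitation in a given small-amount category. The record layout `(year, amount)`, the category bounds and the function name are assumptions for illustration.

```python
from collections import Counter


def annual_frequency(records, low, high):
    """Count days per year with low < P <= high; years without such days get 0."""
    counts = Counter()
    years = set()
    for year, p in records:
        years.add(year)
        if low < p <= high:
            counts[year] += 1
    return {year: counts[year] for year in sorted(years)}


# Hypothetical daily records (year, precipitation in mm).
records = [(1936, 0.2), (1936, 0.8), (1936, 5.0),
           (1937, 0.0), (1937, 0.9), (1937, 0.6)]
freq_small = annual_frequency(records, 0.5, 1.0)
```

The resulting yearly counts form an ordinary time series that can then be tested for changepoints.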
Examples of application
Daily precipitation recorded at The Pas (Manitoba, Canada) for Jun 1st, 1910 to Dec 31st, 2007
- snowfall → water equivalent; rainfall adjusted for wetting losses and gauge undercatch
(Mekis & Hogg 1999; and updates by Mekis)
- joining of two stns: 5052864 for up to 31 Dec. 1945, 5052880 for 1 Jan 1946 to 31 Dec. 2007
Before including trace precipitation amounts, we have two Prcp data series for this site:
1. not adjusted for joining (noT_naJ)
2. has been adjusted for joining (noT_aJ)
Vincent & Mekis (2009): ratio-based adjustments
(used one rainfall ratio & one snowfall ratio for all data)
Same three changepoints detected
Both series have a very significant changepoint near the time of joining of stations
[Figure: noT_naJ]
Results for the two series not including trace amounts (noT series):
1. noT_naJ (closest to original measurements)
2. noT_aJ (aJ changed the mean shift size from -0.76 mm to -0.73 mm)
transPMFred detected the same 3 changepoints:

Type | Date        | Documented date of change(s)
1    | 4 Jul 1938  | 9 Oct 1937 to 8 Aug 1938: changes in gauge type, rim height, observing frequency; poor gauge condition reported on 9 Oct 1937
1    | 24 Oct 1946 | 31 Dec 1945: joining of two nearby stations (5052864 + 5052880)
1    | 4 Oct 1976  | 16 Oct 1975 to 18 Oct 1977: gauge type change (standard at 12” rim height to Type B at 16” rim height)

Changes in the min. measurable amount (precision, unit)
[Figure: series with detected shifts; annotations: "-0.76 mm", "1937-38", "1945-46 joining", "1976-77"]
The ratio-based adjustments for station joining failed to homogenize the series, because …
The discontinuities are mainly in the measurements of small precipitation (P ≤ 3 mm), especially in
the frequency of measured small precipitation:
- Much fewer 0.5 ~ 1 mm until 1937
- No P < 0.3 mm or 0.3 ~ 0.4 mm until 1945 - joining point
- Much fewer 0.4 < P ≤ 0.5 mm before 1976
Series of daily P > 3 mm – homogeneous!
[Figure: noT_naJ and noT_naJ > 3mm; annotation: "0.21 mm from SWE"]
Any ratio-based adjustments for joining are not good
in this case, because larger P are adjusted more than
smaller P when they should not be adjusted at all!
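A small worked example of why a single ratio adjusts larger amounts more in absolute terms. The ratio of 1.10 is hypothetical and is not the ratio used by Vincent & Mekis (2009).

```python
def ratio_adjust(p, ratio):
    """Ratio-based adjustment: every amount is multiplied by the same ratio."""
    return p * ratio


ratio = 1.10  # hypothetical ratio, for illustration only
increments = {p: ratio_adjust(p, ratio) - p for p in (0.5, 3.0, 30.0)}
```

With this ratio, a 0.5 mm day changes by about 0.05 mm while a 30 mm day changes by about 3 mm, although the series of P > 3 mm was found to be homogeneous.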
The above frequency discontinuities largely remain:
[Figure: noT_aJ and noT_naJ > 3mm]
Homogenization of daily precipitation series – very challenging!!
a) Ratio-based adjustments – bad in the presence of frequency discontinuity
We also tried
b) IBC adjustments → the Inverse Box-Cox (IBC) transformation of the fitted multi-phase regression lines
Happy? – No!
Because large P are adjusted similarly, while they should not be adjusted at all
[Figure: wT_naJ, Seg. 1 and Seg. 2; annotation: "homogeneous"]
c) QM adjustments
e.g., Quartile-Matching (4 categories); adjust to last Seg. - Seg. 3
This is worse than the simple IBC adjustments!
- still inhomogeneous;
- larger absolute adjustments made to larger P
[Figure: wT_naJ, Seg. 1, Seg. 2 and Seg. 3; annotation: "inhomogeneous"]
Quantile matching algorithms would work only if there is no discontinuity in the frequency, because
they line up the adjustments by empirical frequency, implicitly assuming homogeneous frequencies.
→ they should be used after all frequency discontinuities have been diminished!
How to address the issue of frequency discontinuity?
Apply a homogeneity test to the frequency series, and homogenize the series if necessary, e.g.:
The frequency of reported trace occurrence at station The Pas is not homogeneous!
[Figure: frequency series tested with the PMFred algorithm; annotations: "noT_naJ", "No trend", "1945-46", "1955-56"]
Adding a trace amount for T-flagged days is not good enough in this case
Need to address the issue of frequency discontinuity!
But how?
Flag more days with T? – which dates to flag? Needs obs’ of other variables, such as cloud, humidity…
In spite of the uncertainty in the date of trace Prcp, adding days of a trace amount in the series
would help obtain more accurate adjustments for other discontinuities using quantile-matching
At least, monthly and annual total Prcp can be adjusted to account for the frequency discontinuities,
e.g., adjust the total trace amount in each month to that month’s current trace amount when
no trend in trace frequency
Concluding remarks
- the new method, transPMFred, works well for both simulated and real daily precipitation data
- Homogenization of precipitation data, especially daily P, is very challenging
would recommend:
1) use transPMFred to test series of P > Pmin with different Pmin values
(e.g. 0.0, 0.3 mm, 0.4 mm, 0.5 mm, 1.0 mm, …);
the Pmin values should reflect changes in measurement precision/unit
2) also test the frequency series of zero P and small P (e.g. Trace, ≤0.3 mm, 0.3-0.5 mm, …)
(e.g., using the PMFred algorithm)
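Recommendation 1) amounts to testing several sub-series of the same record. A minimal sketch of how those sub-series could be formed is shown here; the record layout `(date, amount)` and the function name are hypothetical, and the thresholds are the example Pmin values listed above.

```python
def subseries_above(records, p_min):
    """Keep only the (date, P) pairs with P > p_min."""
    return [(day, p) for day, p in records if p > p_min]


# Hypothetical daily records (date label, precipitation in mm).
daily = [("d1", 0.0), ("d2", 0.2), ("d3", 0.4), ("d4", 0.8), ("d5", 5.0)]
thresholds = (0.0, 0.3, 0.4, 0.5, 1.0)
sizes = {p_min: len(subseries_above(daily, p_min)) for p_min in thresholds}
```

Each sub-series would then be passed to the changepoint test separately, so that shifts confined to small amounts can be told apart from shifts that affect the whole distribution.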
In the presence of frequency discontinuity,
any adjustment derived from the measured daily P is not good, no matter how it was derived!
One must address the issue of frequency discontinuity before doing any adjustment (incl. QM)!
Shall aim to get better insight into the cause (metadata) and characteristics of discontinuity
(e.g., freq.) before any attempt to adjust daily precipitation data!
Thank you very much for your attention!
Questions and/or comments?
The RHtestsV3 and RHtests_dlyPrcp software packages are available free of charge at
http://cccma.seos.uvic.ca/ETCCDMI/software.shtml
- used by WMO ETCCDI in 12 training workshops so far
(Expert Team on Climate Change Detection and Indices)
References:
Wang, X. L., H. Chen, Y. Wu, Y. Feng, and Q. Pu, 2010: New techniques for detection and adjustment of shifts
in daily precipitation data series. J. Appl. Meteor. Climatol., accepted subject to revision.
Wang, X. L., 2008a: Penalized maximal F test for detecting undocumented mean-shift without trend change.
J. Atmos. Oceanic Technol., 25 (No. 3), 368-384. DOI:10.1175/2007JTECHA982.1
Wang, X. L., 2008b: Accounting for autocorrelation in detecting mean-shifts in climate data series
using the penalized maximal t or F test. J. Appl. Meteor. Climatol., 47, 2423-2444.
Wang, X. L., Q. H. Wen, and Y. Wu, 2007: Penalized maximal t-test for detecting undocumented mean change
in climate data series. J. Appl. Meteor. Climatol., 46 (No. 6), 916-931. DOI:10.1175/JAM2504.1
Wan, H., X. L. Wang, and V. R. Swail, 2007: A quality assurance system for Canadian hourly pressure data.
J. Appl. Meteor. Climatol., 46 (No. 11), 1804-1817.