Quality control and homogenization of the COST benchmark dataset

Petr Štěpánek
Pavel Zahradníček
Czech Hydrometeorological Institute, regional office Brno
e-mail: [email protected]
[email protected]
Processing before any data analysis
Software: AnClim, ProClimDB
Data Quality Control
Finding Outliers
Two main approaches:
• using limits derived from interquartile ranges (time series)
• comparing values to values of neighbouring stations (spatial analysis)
[Figure: time series plot, 1950-2000; vertical axis from -4.0 to 10.0]
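The interquartile-range approach can be illustrated with a minimal sketch. The multiplier k = 3 and the function names are assumptions made for this illustration; the slides do not state the limits used in AnClim or ProClimDB.

```python
import numpy as np

def iqr_limits(values, k=3.0):
    """Return (lower, upper) limits: quartiles minus/plus k times the IQR.

    k is an assumed multiplier, not a value taken from the slides.
    """
    v = np.asarray(values, dtype=float)
    v = v[~np.isnan(v)]                      # ignore missing values
    q1, q3 = np.percentile(v, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_outliers(values, k=3.0):
    """Boolean mask marking values outside the IQR-based limits."""
    v = np.asarray(values, dtype=float)
    lower, upper = iqr_limits(v, k)
    return (v < lower) | (v > upper)

# Hypothetical monthly anomalies with one suspicious value
series = [1.2, 0.8, 1.5, 0.9, 1.1, 9.7, 1.0, 1.3]
flags = flag_outliers(series)
```

In practice the limits would probably be derived separately for each calendar month, since the spread of the values differs through the year.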
Creating Reference Series


• for monthly data
• weighted/unweighted mean from neighbouring stations
• power of the weight is 1 for temperature (1/d) and 3 for precipitation (1/d^3): inverse distance weighting (IDW)
• criteria used for station selection (or a combination of them):
  - best correlated / nearest neighbours (correlations computed from the first-differenced series)
  - limit correlation, limit distance
  - limit difference in altitudes
• neighbouring station series should be standardized to the test series: AVG and/or STD / altitude
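A reference series built as an inverse-distance-weighted mean of standardized neighbours might look like the sketch below. It assumes complete series without missing values and standardization to both the mean and the standard deviation of the test series; the function names are hypothetical and this is not the ProClimDB implementation.

```python
import numpy as np

def idw_weights(distances_km, power):
    """Normalized inverse-distance weights 1/d**power."""
    d = np.asarray(distances_km, dtype=float)
    w = 1.0 / d ** power
    return w / w.sum()

def reference_series(test, neighbours, distances_km, power=1.0):
    """Weighted mean of neighbours after standardizing them to the test series.

    test: shape (n_time,); neighbours: shape (n_stations, n_time).
    power=1 follows the slide for temperature, power=3 for precipitation.
    """
    test = np.asarray(test, dtype=float)
    nb = np.asarray(neighbours, dtype=float)
    # Bring every neighbour to the mean (AVG) and spread (STD) of the test series
    z = (nb - nb.mean(axis=1, keepdims=True)) / nb.std(axis=1, keepdims=True)
    nb_on_test_scale = z * test.std() + test.mean()
    return idw_weights(distances_km, power) @ nb_on_test_scale

# Hypothetical data: two neighbours at 10 km and 20 km
test = np.array([9.1, 10.4, 8.7, 11.2, 10.0, 9.5])
neighbours = np.vstack([test + 2.0, 2.0 * test - 1.0])
ref = reference_series(test, neighbours, [10.0, 20.0], power=1.0)
```

With power 1 the station at 10 km gets weight 2/3 and the one at 20 km gets 1/3; with power 3 the weights become 8/9 and 1/9, so the nearest stations dominate much more strongly.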
Comparison with the „expected“ value
(calculated as a weighted mean
of the standardized neighbours' values)
Example:
Proposed list of stations used for creating reference series
„Outliers“ in temperature, sur1, network 1
• detected 12 „outliers“
• 10 errors for station 150 (5 in year 1909)
• mean difference between measured outliers and expected values is about 6 °C
„Outliers“ in precipitation, sur1, network 1
• detected 8 „outliers“
• mean difference between measured outliers and expected values is about 180 mm
• max difference is 313 mm (station 4307012, 8/1971)
Data Processing (workflow diagram)
• Monthly, seasonal and annual averages
• Quality control, outliers: interquartile range, comparing to neighbours
• Combining near stations
• Reference series: from correlations, from distances
• Homogeneity testing: Alexandersson test, bivariate test, t-test, Mann-Whitney-Pettitt
• Months, seasons, year
• Homogeneity assessment: probability
• Adjusting data
• Filling missing values
• Several iterations
Creating Reference Series



for monthly,
• weighted/unweighted mean from neighbouring stations
• criteria used for station selection (or a combination of them):
  - best correlated / nearest neighbours (correlations computed from the first-differenced series)
  - limit correlation, limit distance
  - limit difference in altitudes
• neighbouring station series should be standardized to the test series: AVG and/or STD (temperature: elevation; precipitation: variance)
• missing data are then much less of a problem
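Selecting neighbours by correlations of the first-differenced series can be sketched as follows. The limit values (minimum correlation 0.7, maximum distance 300 km, at most 5 neighbours) and all names are assumptions for illustration only; the slides do not give the limits actually used.

```python
import numpy as np

def first_difference_correlation(a, b):
    """Pearson correlation of the first-differenced series."""
    da = np.diff(np.asarray(a, dtype=float))
    db = np.diff(np.asarray(b, dtype=float))
    return np.corrcoef(da, db)[0, 1]

def select_neighbours(test, candidates, distances_km,
                      min_corr=0.7, max_dist_km=300.0, max_n=5):
    """Return ids of the best correlated candidates within the limits.

    candidates: dict station id -> series; distances_km: dict id -> distance.
    All limit values are assumed, not taken from the slides.
    """
    scored = []
    for sid, series in candidates.items():
        if distances_km[sid] > max_dist_km:      # limit distance
            continue
        r = first_difference_correlation(test, series)
        if r >= min_corr:                        # limit correlation
            scored.append((r, sid))
    scored.sort(reverse=True)                    # best correlated first
    return [sid for _, sid in scored[:max_n]]

# Hypothetical stations: "a" follows the test series with a drift,
# "b" is anti-correlated, "c" is identical but too far away
test = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])
candidates = {"a": test + 0.1 * np.arange(6), "b": -test, "c": test.copy()}
distances = {"a": 50.0, "b": 60.0, "c": 500.0}
chosen = select_neighbours(test, candidates, distances)
```

Differencing removes slow drifts and step changes from the correlation estimate, which is why station "a" still correlates almost perfectly despite its trend.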
Relative homogeneity testing



• test series: 40 years
• longer series: divided into several sections with a 10-year overlap
• tests: SNHT, bivariate test, t-test
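As an illustration of the testing step, the sketch below computes the single-shift SNHT (Alexandersson) statistic on the difference between test and reference series and splits a long series into 40-year sections with a 10-year overlap. It is a simplified sketch under stated assumptions: differences are used (ratios would be the usual choice for precipitation), critical values are not included, and the function names are hypothetical.

```python
import numpy as np

def snht(test, reference):
    """Single-shift SNHT: return (T_max, k), with k values before the break."""
    q = np.asarray(test, dtype=float) - np.asarray(reference, dtype=float)
    z = (q - q.mean()) / q.std(ddof=1)           # standardized differences
    n = z.size
    k = np.arange(1, n)                          # candidate break positions
    csum = np.cumsum(z)[:-1]
    z1 = csum / k                                # mean of the first k values
    z2 = (z.sum() - csum) / (n - k)              # mean of the remaining values
    t = k * z1 ** 2 + (n - k) * z2 ** 2
    best = int(np.argmax(t))
    return float(t[best]), int(k[best])

def overlapping_sections(n_years, length=40, overlap=10):
    """Index ranges (start, end) of sections sharing `overlap` years."""
    step = length - overlap
    starts = range(0, max(n_years - overlap, 1), step)
    return [(s, min(s + length, n_years)) for s in starts]

# Hypothetical 40-year series with a 1 degree shift after year 20
reference = np.linspace(8.0, 9.0, 40)
shifted = reference + np.r_[np.zeros(20), np.ones(20)]
t_max, k_break = snht(shifted, reference)
sections = overlapping_sections(100)
```

For a 100-year series the sections are years 0-40, 30-70 and 60-100, so a break near the edge of one section falls well inside its neighbour.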
Example of detected breaks: temperature, sur1, network 1
• detected 63 breaks
[Figures: station no. 50, break 1928; station no. 50, break 1975; station no. 100, break 1983. Panels: test and reference series; difference between test and reference series; test statistics]
Example of detected breaks: precipitation, sur1, network 1
• detected 10 breaks
[Figures: station no. 4309900, break 1909; station no. 4311803, break 1991]
Adjusting monthly data




• using reference series based on distance
• power of the weight is 0.5 for temperature and 1 for precipitation
• adjustment: from differences/ratios 20 years before and after a change, monthly
• smoothing of monthly adjustments (low-pass filter for adjacent values)
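The adjustment step can be sketched as below for the additive (difference) case. The 20-year window follows the slide; the 1-2-1 filter weights, the cyclic treatment of December and January as adjacent months, and the function names are assumptions, since the slides do not specify the low-pass filter.

```python
import numpy as np

def monthly_adjustments(test, reference, break_idx, window=20):
    """Monthly adjustments for the part of the series before a break.

    test, reference: arrays of shape (n_years, 12);
    break_idx: index of the first year after the change.
    """
    diff = np.asarray(test, dtype=float) - np.asarray(reference, dtype=float)
    before = diff[max(break_idx - window, 0):break_idx]
    after = diff[break_idx:break_idx + window]
    # Value to add to the earlier segment so it matches the later one
    return after.mean(axis=0) - before.mean(axis=0)

def smooth_cyclic(adj, weights=(0.25, 0.5, 0.25)):
    """Low-pass filter over adjacent months (assumed 1-2-1 weights, cyclic)."""
    adj = np.asarray(adj, dtype=float)
    w_prev, w_mid, w_next = weights
    return (w_prev * np.roll(adj, 1) + w_mid * adj + w_next * np.roll(adj, -1))

# Hypothetical case: January readings 1 degree too low before year 30
reference = np.full((60, 12), 10.0)
test = reference.copy()
test[:30, 0] -= 1.0
adj = monthly_adjustments(test, reference, break_idx=30)
adj_smooth = smooth_cyclic(adj)
```

Smoothing spreads the January adjustment onto December and February, which avoids implausible jumps between adjacent months while keeping the annual sum of the adjustments unchanged.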
[Figures: monthly adjustments (ADJ) and smoothed monthly adjustments (ADJ SMOOTH) for months I-XII; station no. 100, break 1983, and station no. 50, break 1928]
Adjusting values: evaluation
• after adjustment the correlation must increase; if it does not, the series is not adjusted
[Figures: Temperature; Precipitation]
Absolute values of adjustment for temperature, surg1, network 1
Iterative homogeneity testing

• several iterations of testing and evaluation of results
• several iterations of homogeneity testing and series adjusting (3 iterations should be sufficient)
• the question of homogeneity of the reference series is thus solved:
  - possible inhomogeneities should be eliminated by using averages of several neighbouring stations
  - if this is not true: in the next iteration the neighbours should already be homogenized
Example: homogenized temperature series
[Figures: raw data and homogenized series, T [°C], 1900-1996; station no. 50 and station no. 100]
Example: homogenized precipitation series
[Figures: raw data and homogenized series, 1900-1996; station no. 4309900, break 1909, and station no. 4311803, break 1991]
http://www.climahom.eu