Introduction to homogeneity and quality control software

TUESDAY STREAM 2:
Quality control and homogeneity
Or, was the data entered correctly and does it reflect climate?
Nicholas Herold
ET-SCI workshop
Tuesday 8th December
Nadi, Fiji
Workflow for calculating extremes indices
1. Observations: Time-series of daily tmin, tmax and/or precipitation, formatted according to ET-SCI specifications.
2. Quality Control: Check for basic input errors, such as rounding bias or artificially extreme values. Conducted in ClimPACT2.
3. Homogenisation: Check for change points in the time-series, determine whether they are natural or artificial, and correct the artificial ones. RHtests is one tool to do this.
4. Calculate indices: Use ClimPACT2 to calculate the ET-SCI indices. Creates .csv and .jpg files for over 50 climate indices.
5. Sector analysis: Compare calculated indices with sector data (e.g. water, health, agriculture) to identify meaningful relationships between them.
6. Better decision making & Science!!
Introduction to ClimPACT2 and its Quality Control functionality
ClimPACT2 calculates ET-SCI indices
● Software package developed at UNSW using the R statistical programming language.
● R and ClimPACT2 are available for free.
● Both also run on Linux, Windows and MacOS.
● Similar to RClimdex from past workshops.
https://www.r-project.org/
https://github.com/ARCCSS-extremes/climpact2
Input data required by ClimPACT2
● Daily values of precipitation, maximum temperature and minimum temperature
● Column order: year month day precip tmax tmin
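As a rough illustration of the column layout above, the sketch below parses one line of an input file. It assumes the columns are separated by whitespace and that -99.9 marks a missing value (the code used later in this deck). The names `DailyRecord` and `parse_line` are hypothetical and are not part of ClimPACT2.

```python
from typing import NamedTuple, Optional

MISSING = -99.9  # assumed missing-value code, as used later in this deck


class DailyRecord(NamedTuple):
    year: int
    month: int
    day: int
    precip: Optional[float]
    tmax: Optional[float]
    tmin: Optional[float]


def _value(text: str) -> Optional[float]:
    """Convert a field to float, mapping the missing code to None."""
    number = float(text)
    return None if number == MISSING else number


def parse_line(line: str) -> DailyRecord:
    """Parse one 'year month day precip tmax tmin' line."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 columns, got {len(fields)}: {line!r}")
    year, month, day = (int(f) for f in fields[:3])
    precip, tmax, tmin = (_value(f) for f in fields[3:])
    return DailyRecord(year, month, day, precip, tmax, tmin)


# Made-up sample lines, for illustration only.
sample = ["1950 1 1 0.0 31.2 22.4", "1950 1 2 -99.9 30.8 -99.9"]
records = [parse_line(line) for line in sample]
print(records[1])
```

Mapping the missing code to `None` early means later checks cannot mistake -99.9 for a real temperature.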
Output from ClimPACT2
● With all three variables ClimPACT2 will produce over 140 files, including:
○ Plots of each index over time
○ Spreadsheet .csv files storing the data for each index
○ Trend and threshold information
○ Various diagnostic files allowing the user to quality control their data
ClimPACT2 QC functions by Enric Aguilar and Marc Prohom
Numerous files are produced in the /extraqc subdirectory.
These identify potential issues relating to:
● Rounding biases
● Unphysical values
● Unusually large values
● Runs of the same value
● Unusually large jumps between time-steps
Quality control examples: outliers 1
/extraqc
3 PDF files:
– Nadi_boxes.pdf
– Nadi_boxseries.pdf
– Nadi_rounding.pdf
8 text files:
– Nadi_duplicates.txt
– Nadi_outliers.txt
– Nadi_tmaxmin.txt
– Nadi_tx_flatline.txt
– Nadi_tn_flatline.txt
– Nadi_toolarge.txt
– Nadi_tx_jumps.txt
– Nadi_tn_jumps.txt
1 csv file:
– Nadi_temp_stddev_QC.csv
[Figure: box and whisker plots. Each box marks the 25th percentile, median and 75th percentile; outliers are recorded in a text file.]
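To make the box and whisker idea concrete, here is a minimal sketch that flags values lying far outside the interquartile range. The multiplier `k=1.5` is the common plotting convention and is only an assumption here; the deck does not state which whisker length ClimPACT2 uses. The function name `iqr_outliers` is hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Made-up daily maximum temperatures, for illustration only.
tmax = [28, 29, 30, 30, 31, 31, 32, 45]
outliers = iqr_outliers(tmax)
print(outliers)
```

A flagged value is only a candidate for inspection: a genuine heatwave would also show up here.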
Quality control examples: outliers 2
Quality control examples: Rounding biases
Nadi_rounding.pdf shows the frequency of decimal values. Here, people have been rounding to .0 and .5!
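A rounding bias can be checked by counting how often each decimal digit occurs. The sketch below assumes values are recorded to one decimal place and that missing values have already been removed. `decimal_digit_counts` is a hypothetical helper, not a ClimPACT2 function.

```python
from collections import Counter


def decimal_digit_counts(values):
    """Count how often each first decimal digit (0-9) occurs."""
    counts = Counter(round(abs(v) * 10) % 10 for v in values)
    return {digit: counts.get(digit, 0) for digit in range(10)}


# Made-up daily maximum temperatures, for illustration only.
tmax = [31.0, 30.5, 29.0, 31.5, 30.0, 28.5, 30.2, 31.0]
counts = decimal_digit_counts(tmax)
print(counts)
```

With unbiased readings each digit should appear roughly equally often; a pile-up at 0 and 5 suggests observers rounded to whole or half degrees.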
Quality control examples: Duplicates
Nadi_duplicates.txt identifies duplicated dates.
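A duplicate-date check can be sketched as below, assuming each record's date is held as a (year, month, day) tuple. `duplicated_dates` is a hypothetical name used for illustration.

```python
from collections import Counter


def duplicated_dates(dates):
    """Return, in sorted order, every date that appears more than once."""
    counts = Counter(dates)
    return sorted(date for date, n in counts.items() if n > 1)


# Made-up dates, for illustration only.
dates = [(1950, 1, 1), (1950, 1, 2), (1950, 1, 2), (1950, 1, 3)]
dupes = duplicated_dates(dates)
print(dupes)
```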
Quality control examples: Outliers
Nadi_outliers.txt records the outliers identified in the box and whisker plots.
Quality control examples: min T > max T
Nadi_tmaxmin.txt identifies where minimum temperature > maximum temperature.
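The consistency check itself is a single comparison per day. This sketch assumes records are (date, tmax, tmin) tuples with `None` for missing values; the function name is hypothetical.

```python
def tmin_exceeds_tmax(records):
    """Return dates where both temperatures exist and tmin > tmax."""
    return [
        date
        for date, tmax, tmin in records
        if tmax is not None and tmin is not None and tmin > tmax
    ]


# Made-up records, for illustration only.
records = [
    ("1950-01-01", 31.2, 22.4),
    ("1950-01-02", 21.0, 29.5),  # tmin > tmax: likely a data entry error
    ("1950-01-03", None, 23.0),  # missing tmax: cannot be checked
]
bad_dates = tmin_exceeds_tmax(records)
print(bad_dates)
```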
Quality control examples: Runs of the same value
Nadi_tx_flatline.txt and Nadi_tn_flatline.txt identify runs of 4 or more of the same value.
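A run of identical values can be found by grouping consecutive equal entries. The sketch below uses the threshold of 4 from the slide and assumes missing values are stored as `None` and should not count as a run. `flatlines` is a hypothetical name.

```python
from itertools import groupby


def flatlines(values, min_run=4):
    """Return (start_index, length, value) for each run of identical values."""
    runs = []
    index = 0
    for value, group in groupby(values):
        length = len(list(group))
        if value is not None and length >= min_run:
            runs.append((index, length, value))
        index += length
    return runs


# Made-up daily maximum temperatures, for illustration only.
tmax = [30.1, 29.5, 29.5, 29.5, 29.5, 30.0,
        None, None, None, None, 31.2, 31.2, 31.2]
runs = flatlines(tmax)
print(runs)
```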
Quality control examples: Large values
Nadi_toolarge.txt identifies when temperature > 50 degrees or precipitation > 200 mm.
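The two thresholds on this slide translate directly into a filter. The sketch assumes records are (date, precip, tmax, tmin) tuples with `None` for missing values and that temperatures are in degrees Celsius; `too_large` is a hypothetical name.

```python
TEMP_LIMIT = 50.0     # degrees, threshold from the slide
PRECIP_LIMIT = 200.0  # mm, threshold from the slide


def too_large(records):
    """Return (date, variable, value) for each value above its limit."""
    flagged = []
    for date, precip, tmax, tmin in records:
        if precip is not None and precip > PRECIP_LIMIT:
            flagged.append((date, "precip", precip))
        for name, temp in (("tmax", tmax), ("tmin", tmin)):
            if temp is not None and temp > TEMP_LIMIT:
                flagged.append((date, name, temp))
    return flagged


# Made-up records, for illustration only.
records = [
    ("1950-01-01", 12.0, 31.2, 22.4),
    ("1950-01-02", 250.5, 30.0, 23.1),
    ("1950-01-03", None, 55.0, 24.0),
]
flagged = too_large(records)
print(flagged)
```

Values flagged this way are not necessarily wrong: more than 200 mm of rain in a day is plausible during a tropical cyclone, so each flag still needs a human decision.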
Quality control examples: Jumps
Nadi_tx_jumps.txt and Nadi_tn_jumps.txt identify when |Δt| >= 20 degrees between consecutive time-steps.
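The jump check compares each value with the one before it. This sketch assumes a list of daily values in date order with `None` for missing days, and skips any pair that includes a missing value. `jumps` is a hypothetical name.

```python
def jumps(values, threshold=20.0):
    """Return (index, previous, current) where |current - previous| >= threshold."""
    flagged = []
    for i in range(1, len(values)):
        prev, curr = values[i - 1], values[i]
        if prev is None or curr is None:
            continue  # cannot compute a difference across a gap
        if abs(curr - prev) >= threshold:
            flagged.append((i, prev, curr))
    return flagged


# Made-up daily maximum temperatures, for illustration only.
tmax = [30.0, 31.0, 9.5, 30.5, None, 52.0, 32.0]
flagged = jumps(tmax)
print(flagged)
```

A single bad value usually produces two flags, one for the jump into it and one for the jump out, as with 9.5 in the sample.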
Quality control examples: more outliers!
Nadi_temp_stddev_QC.csv identifies dates where min or max temperature is over n standard deviations from the mean.
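A minimal sketch of a standard deviation check follows. It measures each value against the mean and standard deviation of the whole series, which is an assumption: the deck does not say which mean ClimPACT2 uses (for example, it may be computed per calendar day). The value of `n` is left to the caller, and `stddev_outliers` is a hypothetical name.

```python
import statistics


def stddev_outliers(values, n):
    """Return (index, value) where the value is over n std devs from the mean."""
    present = [v for v in values if v is not None]
    mean = statistics.mean(present)
    sd = statistics.stdev(present)
    return [
        (i, v)
        for i, v in enumerate(values)
        if v is not None and abs(v - mean) > n * sd
    ]


# Made-up daily minimum temperatures, for illustration only.
tmin = [22.0, 21.5, 22.5, 23.0, 21.0, 22.0, 22.5, 21.5, 22.0, 35.0]
flagged = stddev_outliers(tmin, n=2)
print(flagged)
```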
CliDE can help with QC
Hands on time
Actions to take with erroneous data
1. Run ClimPACT2 “STEP. 1” and examine each file in the /extraqc subdirectory.
2. Make a copy of your input file that will contain corrected values (e.g. Nadi.txt to Nadi.QC.txt).
3. In Nadi.QC.txt, make the following changes:
a. Change any negative precipitation values found to -99.9.
b. For any dates where tmin > tmax, change both tmin and tmax to -99.9.
c. For temperatures identified as outliers or as runs of identical values, many values may be valid. Change only the values that are obviously wrong to -99.9. If you are unsure, ask:
i. Does the date correspond to a known weather extreme?
ii. Do nearby stations record similar extremes on that date?
iii. Do other errors occur on that date (e.g. tmin > tmax)? If so, it is quite possible that all information for that date is erroneous.
IMPORTANT: keep a copy of both Nadi.txt and Nadi.QC.txt. In a spreadsheet, keep a note of each date that requires a change and record the reason for the change (e.g. negative precipitation). This will be important for determining consistency of treatment between stations.
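Steps 3a and 3b are mechanical and can be sketched in code, together with the change log the IMPORTANT note asks for. The sketch assumes each row is a dict with keys `date`, `precip`, `tmax` and `tmin`, and that -99.9 is the missing code. Step 3c is deliberately left out because it needs human judgement. `apply_basic_corrections` is a hypothetical name.

```python
MISSING = -99.9


def apply_basic_corrections(rows):
    """Return (corrected_rows, log) without modifying the input rows."""
    corrected, log = [], []
    for row in rows:
        new = dict(row)  # work on a copy so the original data is preserved
        # Step 3a: negative precipitation becomes missing.
        if new["precip"] != MISSING and new["precip"] < 0:
            new["precip"] = MISSING
            log.append((row["date"], "precip", "negative precipitation"))
        # Step 3b: where tmin > tmax, both temperatures become missing.
        if MISSING not in (new["tmax"], new["tmin"]) and new["tmin"] > new["tmax"]:
            new["tmax"] = new["tmin"] = MISSING
            log.append((row["date"], "tmax,tmin", "tmin > tmax"))
        corrected.append(new)
    return corrected, log


# Made-up rows, for illustration only.
rows = [
    {"date": "1950-01-01", "precip": 0.0, "tmax": 31.2, "tmin": 22.4},
    {"date": "1950-01-02", "precip": -3.0, "tmax": 30.8, "tmin": 23.0},
    {"date": "1950-01-03", "precip": 1.2, "tmax": 21.0, "tmin": 29.5},
    {"date": "1950-01-04", "precip": -99.9, "tmax": -99.9, "tmin": 22.0},
]
corrected, log = apply_basic_corrections(rows)
for entry in log:
    print(entry)
```

Returning a log alongside the corrected rows makes it easy to paste the list of changes and reasons into the spreadsheet.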
Now on to homogenisation
Homogeneity checking with RHtests
What is RHtests?
R software tool to identify inhomogeneities.
Who developed it?
Xiaolan Wang and Yang Feng at Environment Canada.
http://etccdi.pacificclimate.org/software.shtml
Homogenisation is difficult and time consuming. It should be done at home base with appropriate metadata.
RHtests converts ClimPACT2 file to multiple inputs
Africa.txt (the ClimPACT2 input file) is converted to:
– Africa_LogprcpMLY1mm.txt: Log transformed monthly precip >= 1mm
– Africa_LogprcpMLY.txt: Log transformed monthly precip
– Africa_tminDLY.txt: Daily minimum temperature
– Africa_tminMLY.txt: Monthly minimum temperature
– Africa_prcpDLY.txt: Daily precipitation
– Africa_prcpMLY1mm.txt: Monthly precipitation >= 1mm
– Africa_prcpMLY.txt: Monthly precipitation
– Africa_tmaxDLY.txt: Daily maximum temperature
– Africa_tmaxMLY.txt: Monthly maximum temperature
RHtests produces plots of change points in *U.pdf
[Figure: example *U.pdf plots]
Monthly Tmax is homogeneous.
Monthly Tmin has 3 change points.
RHtests stores change point information in *mCs.txt
[Figure: annotated example of a *mCs.txt file. Labelled fields:]
● Change point type (1 or 0)
0: change points that could be significant only if supported by metadata
1: change points that are significant even without a reference site
● Date
● Confidence level
● Test statistic
● 95% confidence interval of test statistic
● Whether change point is significant
RHtests stores final data in *U.dat
[Figure: annotated example of a *U.dat file. Labelled columns:]
● Original data
● Date
● Linear trend with mean shift
● Mean adjusted base series
● Multi-phase regression fit
● Base anomaly series
● Mean annual cycle with linear trend and mean shifts
● QM adjusted base series
What to do with flagged data?
There are two options:
1. When data is WRONG (i.e. unrealistic), change the value to -99.9.
2. Suspicious data is a subjective call. You may want to contact your home institution to check whether, for example, extremely high temperatures occurred on a particular day.