TUESDAY STREAM 2: Quality control and homogeneity
Or, was the data entered correctly and does it reflect climate?
Nicholas Herold
ET-SCI workshop, Tuesday 8th December, Nadi, Fiji

Workflow for calculating extremes indices
1. Observations: Time-series of daily tmin, tmax and/or precipitation. Formatted according to ET-SCI specifications.
2. Quality Control: Check for basic input errors, such as rounding bias or artificially extreme values. Conducted in ClimPACT2.
3. Homogenisation: Check for change points in the time-series, determine whether they are natural or artificial, and correct the artificial ones. RHtests is one tool to do this.
4. Calculate indices: Use ClimPACT2 to calculate the ET-SCI indices. Creates .csv and .jpg files for over 50 climate indices.
5. Sector analysis: Compare calculated indices with sector data (e.g. water, health, agriculture) to identify meaningful relationships between them.
Outcome: Better decision making & Science!!

Introduction to ClimPACT2 and its Quality Control functionality

ClimPACT2 calculates ET-SCI indices
● Software package developed at UNSW using the R statistical programming language.
● R and ClimPACT2 are available for free.
● Both run on Linux, Windows and MacOS.
● Similar to RClimdex from past workshops.
https://www.r-project.org/
https://github.com/ARCCSS-extremes/climpact2

Input data required by ClimPACT2
● Daily values of precipitation, maximum temperature and minimum temperature
● Columns: year month day precip tmax tmin

Output from ClimPACT2
● With all three variables ClimPACT2 will produce over 140 files, including:
○ Plots of each index over time
○ Spreadsheet .csv files storing the data for each index
○ Trend and threshold information
○ Various diagnostic files allowing the user to quality control their data

ClimPACT2 QC functions
by Enric Aguilar and Marc Prohom
Numerous files are produced in the /extraqc subdirectory.
These identify potential issues relating to:
● Rounding biases
● Unphysical values
● Unusually large values
● Runs of the same value
● Unusually large jumps between time-steps

Quality control examples: outliers 1
/extraqc
3 PDF files:
– Nadi_boxes.pdf
– Nadi_boxseries.pdf
– Nadi_rounding.pdf
8 text files:
– Nadi_duplicates.txt
– Nadi_outliers.txt
– Nadi_tmaxmin.txt
– Nadi_tx_flatline.txt
– Nadi_tn_flatline.txt
– Nadi_toolarge.txt
– Nadi_tx_jumps.txt
– Nadi_tn_jumps.txt
1 csv file:
– Nadi_temp_stddev_QC.csv
[Figure: box plot marking the 75th percentile, median and 25th percentile. Outliers are recorded in a text file.]

Quality control examples: outliers 2
[Figure]

Quality control examples: Rounding biases
[Figure: frequency of decimal values]
People have been rounding to .0 and .5!!
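
The rounding check can be reproduced by hand to see what the rounding plot is showing. The following is a minimal Python sketch, not ClimPACT2's own code: the function name decimal_digit_frequency and the sample values are made up for illustration, and it assumes values are recorded to one decimal place with -99.9 as the missing value code.

```python
def decimal_digit_frequency(values, missing=-99.9):
    """Count how often each first decimal digit (0-9) occurs.

    Hypothetical helper for illustration; missing values are skipped.
    """
    counts = {digit: 0 for digit in range(10)}
    for value in values:
        if value == missing:
            continue
        # Work in whole tenths so floating point error cannot shift the digit.
        tenths = round(abs(value) * 10)
        counts[tenths % 10] += 1
    return counts


# Made-up daily tmax values, recorded to one decimal place.
sample_tmax = [30.0, 31.5, 29.0, 30.5, 28.0, 31.0, 30.2, -99.9]
frequency = decimal_digit_frequency(sample_tmax)
for digit, count in frequency.items():
    print(f".{digit}: {count}")
```

If observers record values faithfully, each digit should appear roughly equally often. A pile-up on .0 and .5, as in the sample above, suggests rounding.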
Quality control examples: Duplicates
Identifies duplicated dates

Quality control examples: Outliers
Outliers identified in the box and whisker plots are recorded in a text file

Quality control examples: min T > max T
Identifies where minimum temperature > maximum temperature

Quality control examples: Runs of the same value
Identifies runs of 4 or more of the same value

Quality control examples: Large values
Identifies when temperature > 50 degrees or precipitation > 200 mm

Quality control examples: Jumps
Identifies when the temperature change between consecutive time-steps, |Δt|, is >= 20 degrees

Quality control examples: more outliers!
Identifies dates where min or max temperature is over n standard deviations from the mean.

CliDE can help with QC

Hands on time

Actions to take with erroneous data
1. Run ClimPACT2 “STEP. 1” and examine each file in the /extraqc subdirectory
2. Make a copy of your input file that will contain corrected values (e.g. Nadi.txt to Nadi.QC.txt)
3. In Nadi.QC.txt, make the following changes:
a. Change any negative precipitation values found to -99.9
b. For any dates where tmin > tmax, change both tmin and tmax to -99.9
c. For temperatures identified as outliers or as runs of identical values, many values may be valid. Change only values that are obviously wrong to -99.9. If you are unsure, ask:
i. Does the date correspond to a known weather extreme?
ii. Do nearby stations record similar extremes on that date?
iii. Do other errors occur on that date (e.g. tmin > tmax)? If so, it is quite possible that all information for that date is erroneous.
IMPORTANT: keep a copy of both Nadi.txt and Nadi.QC.txt. In a spreadsheet, keep a note of each date that requires a change and record the reason for the change (e.g. negative precipitation). This will be important for determining consistency of treatment between stations.
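
Steps 3a and 3b are mechanical, so they can be scripted, while step 3c needs human judgement. The following is a minimal Python sketch under stated assumptions, not part of ClimPACT2: the function names are hypothetical, each row is assumed to be a dict with the six input columns, and rows are assumed to be consecutive days. The thresholds are the ones quoted in the quality control examples above.

```python
MISSING = -99.9       # missing value code used in the actions above
TEMP_LIMIT = 50.0     # temperature > 50 degrees
PRECIP_LIMIT = 200.0  # precipitation > 200 mm
JUMP_LIMIT = 20.0     # |Δt| >= 20 degrees
RUN_LENGTH = 4        # runs of 4 or more of the same value


def row_date(row):
    return (row["year"], row["month"], row["day"])


def apply_clear_corrections(rows):
    """Apply steps 3a and 3b to a copy of the data and log every change."""
    corrected, log = [], []
    for row in rows:
        new_row = dict(row)  # the original rows are left untouched
        if row["precip"] != MISSING and row["precip"] < 0:
            new_row["precip"] = MISSING
            log.append((row_date(row), "negative precipitation"))
        both_present = row["tmin"] != MISSING and row["tmax"] != MISSING
        if both_present and row["tmin"] > row["tmax"]:
            new_row["tmin"] = new_row["tmax"] = MISSING
            log.append((row_date(row), "tmin > tmax"))
        corrected.append(new_row)
    return corrected, log


def flag_for_review(rows):
    """List values that need a human decision (step 3c); changes nothing."""
    flags = []
    for row in rows:
        if row["precip"] != MISSING and row["precip"] > PRECIP_LIMIT:
            flags.append((row_date(row), "precip too large"))
    for var in ("tmax", "tmin"):
        previous, run = None, 1
        for row in rows:
            value = row[var]
            if value == MISSING:
                previous, run = None, 1  # a gap breaks runs and jumps
                continue
            if value > TEMP_LIMIT:
                flags.append((row_date(row), var + " too large"))
            if previous is not None:
                if abs(value - previous) >= JUMP_LIMIT:
                    flags.append((row_date(row), var + " jump"))
                run = run + 1 if value == previous else 1
                if run == RUN_LENGTH:  # flag once, when the run reaches 4
                    flags.append((row_date(row), var + " flatline"))
            previous = value
    return flags
```

The log returned by apply_clear_corrections holds one (date, reason) pair per change, which is the information the spreadsheet note asks for. Flagged values are only listed, never changed, because many of them may be valid.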
Now on to homogenisation

Homogeneity checking with RHtests
What is RHtests? R software tool to identify inhomogeneities.
Who developed it? Xiaolan Wang and Yang Feng at Environment Canada.
http://etccdi.pacificclimate.org/software.shtml
Homogenisation is difficult and time consuming. It should be done at your home base with appropriate metadata.

RHtests converts ClimPACT2 file to multiple inputs
Input file: Africa.txt
– Africa_LogprcpMLY1mm.txt: Log transformed monthly precip >= 1mm
– Africa_LogprcpMLY.txt: Log transformed monthly precip
– Africa_tminDLY.txt: Daily minimum temperature
– Africa_tminMLY.txt: Monthly minimum temperature
– Africa_prcpDLY.txt: Daily precipitation
– Africa_prcpMLY1mm.txt: Monthly precipitation >= 1mm
– Africa_prcpMLY.txt: Monthly precipitation
– Africa_tmaxDLY.txt: Daily maximum temperature
– Africa_tmaxMLY.txt: Monthly maximum temperature

RHtests produces plots of change points in *U.pdf
[Figure: monthly Tmax is homogeneous]
[Figure: monthly Tmin has 3 change points]

RHtests stores change point information in *mCs.txt
● Change point type (1 or 0)
○ 0: change points that could be significant only if supported by metadata
○ 1: change points that are significant even without a reference site
● Whether change point is significant
● Date
● Confidence level
● Test statistic
● 95% confidence interval of test statistic

RHtests stores final data in *U.dat
● Date
● Original data
● Linear trend w/mean shift
● Mean adjusted base series
● Multi-phase regression fit
● Base anomaly series
● Mean annual cycle with linear trend and mean shifts
● QM adjusted base series

What to do with flagged data?
There are two options:
1. When data is WRONG (i.e. unrealistic), change the value to -99.9.
2. For suspicious data the decision is subjective. You may want to contact your home institution to check whether, for example, extremely high temperatures occurred on a particular day.

Quality control (QC) with ClimPACT2