EMODnet Thematic Lot n° 4 - Chemistry Data QA/QC and DIVA products for the North Sea
Martin M. Larsen (AU-DCE)
Date: 12/09/2014

EMODnet Thematic Lot n° 4 - Chemistry QA/QC and DIVA report

Contents
Introduction
1. Common methodology for data QA/QC
   Additional checks for nutrients
   Broad-range check values in the North Sea
2. Common rules for products generation
3. General guidelines for DIVA settings

Introduction
This data report describes the steps taken, based on the common methodology for data QA/QC adopted at the expert meeting in Paris on 3 September. The final dataset of 28 October 2014 was imported into ODV version 4.3.3 (PC version, manually updated) and aggregated. The final checks were done using ODV version 4.3.4 (mainly the Mac version).

1. Common methodology for data QA/QC
ODV data: 626529/626611
According to the EMODnet website: fertilizers (nutrients) 139464 results, silicates 101986, chlorophyll 103577.
No data with QF 6 were found, and no depths < 0 were found. A number of results equal to 0 or < 0 were found, because for some years Danish data below the detection limit (DL) were reported as 0 or even as negative values, following the biologists' idea that values < DL would then average out to 0.
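A minimal Python sketch of this first screening, together with the substitution of zero or negative values by half the detection limit (flagged QF 6) that this report applies to them, is shown below. It is not the ODV workflow used here: the `Record` structure and the function names are hypothetical, and the detection limits are the AU-DCE values suggested for the North Sea area in this report.

```python
from dataclasses import dataclass, replace

# AU-DCE detection limits (µM) suggested for the North Sea area in this report.
DETECTION_LIMITS = {"NO2": 0.04, "NO3": 0.1, "NOx": 0.1, "NH4": 0.3,
                    "TN": 1.0, "OP": 0.06, "TP": 0.1, "SiO4": 0.2}


@dataclass(frozen=True)
class Record:
    """One exported measurement (hypothetical structure, not the ODV format)."""
    parameter: str  # e.g. "NO3"
    depth: float    # sampling depth in m
    value: float    # concentration in µM
    qf: int         # SeaDataNet quality flag


def screen(records):
    """Count the anomalies looked for in the first QA/QC pass."""
    return {
        "qf6": sum(1 for r in records if r.qf == 6),
        "negative_depth": sum(1 for r in records if r.depth < 0),
        "zero_or_negative_value": sum(1 for r in records if r.value <= 0),
    }


def substitute_half_dl(record):
    """Replace a zero or negative value by DL/2 and flag it QF 6 (below DL)."""
    if record.value <= 0:
        half_dl = DETECTION_LIMITS[record.parameter] / 2
        return replace(record, value=half_dl, qf=6)
    return record


if __name__ == "__main__":
    data = [Record("NO3", 0.0, 1.2, 1), Record("TN", 5.0, 0.0, 1),
            Record("NH4", 10.0, -0.1, 2)]
    print(screen(data))  # {'qf6': 0, 'negative_depth': 0, 'zero_or_negative_value': 2}
    print([substitute_half_dl(r) for r in data])
```

Values that are already positive are returned unchanged, so the substitution can safely be applied to a whole export.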
These data were instead transformed to half-DL values.
The concentration ranges were checked on the full dataset, and the number of values above a "check value" (a reasonable highest concentration in coastal waters) was counted to see how much data would be lost.

Table 1: First check of concentration ranges
Parameter | Max found | Check value | No. above check value | Comment
PO4 | 110 µM | 50 µM | 2859 |
NO2 | 97 µM | 20 µM | 591 |
NO3 | 646 µM | 100 µM | 1004 |
NO23 | 658 µM | 100 µM | 11171 | Reported NO2+NO3 results only
NH4 | 878 µM | 200 µM | 409 |
Si | 696 µM | 200 µM | 4287 |

A duplicate check found 199583 duplicates in 85666 groups. Re-running it with depth as a distinguishing parameter, and exporting the EDMO code and local CDI number, resulted in 200341 duplicate stations in 86637 groups.

Additional checks for nutrients
The nutrient data were checked visually and all looked reasonable, with concentrations lowest below 100 m in most cases (PO4, NOx, NH4) and the opposite for silicate, with the highest concentrations in deep waters. Typically, the highest concentrations were found in the upper 5 m.
Only two PO4 values with QF 2 (probably OK) were found to be probably not OK; they showed jumps by a factor of 10 to 200, with lower concentrations both above and below the suspected outlier. For the other nutrients the concentration profiles looked reasonable, and no further exclusions were deemed necessary.

Table 2: PO4 results with QF changed to 4
Accession no. | "Jump" factor | Excluded value
180932 | 200 | PO4 = 200
180511 | 10-20 | PO4 = 100

Nitrite was found to be higher than nitrate in a few samples, mostly from the PHABMO II cruise (May 2003, EDMO 486 = Ifremer). For a few other results with a very high nitrite/nitrate ratio, the QF of the nitrite result was set to 4.

Table 3: NO2 results with QF changed to 4
Accession no. | EDMO code, cruise | Excluded value
180932 | 32, NMMP0105M |
206956 | 545, 77CB1997 |

One case of very high nitrite concentrations showed an increase with depth and was therefore kept, despite 3-6 µM nitrite against 0.1 µM nitrate in the deepest samples (cruise 77CB1992, accession 206454, Saltøfjord, EDMO 545).
The QF was not changed for results with nitrate ~0.02 and positive nitrite, as these results are close to the detection limit (DL); the DL for nitrate is higher than for nitrite, so the uncertainty in the low range is larger.

Inorganic and total N and P were compared, with the following suggested flag settings, where the ratio of inorganic to total is defined as:
RP = OP/TP (orthophosphate/total phosphorus)
RN = (NH4 + NOx [+ other measured N species such as urea])/TN

Table 4: Inorganic to total ratio checks
Ratio of inorganic to total | QF value | Description
1 - 1.15 | Not changed | < 15%* difference; data could be correct within the uncertainty of the measurements
1.15 - 2 | Probably 4 | Data are very probably incorrect for the inorganic or total measurement. Inspect profile data.
> 2 | 4 | Some of the data are likely incorrect

The PO4/TP ratio was found to be bad for several stations; each station with a ratio > 3 was inspected, and the following corrections were made (Table 5).

Table 5: Check of phosphorus ratios (RP)
Accession/EDMO | Cruise | Station | Ratio | Comment
629061/1181 | Haithabu 2004 | OM225019 (B) | 3.4; 36.4 | TP value (0.39) changed to QF 4, OP considered OK
 | | OM704 (B) | 6.6 | TP value (0.39) changed to QF 4, OP considered OK
 | | OM225003 (B) | 4.9 | TP value (0.39) changed to QF 4
62013/729 0 | | NOR5503 (B) | 8.4 | NH4 very high (main part of TN), suggesting bottom water affected by some source; TP QF changed to 4
633710/2537 | LLUR 2010 | 225004 (B) | 5.7 | No indication of higher P in bottom water; OP QF changed to 4
228680 | BE2009/20A | W10 (B) | 3.8 | Dissolved total P; TP set to PO4

An OP/TP ratio between 2 and 3 was found for only 9 further samples, and a ratio between 1.2 and 2 for 65 samples in all.
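The range check of Table 1 and the ratio rules of Table 4 reduce to simple comparisons. The Python sketch below is illustrative only, not the procedure run in ODV for this report; the function names and returned labels are hypothetical, and the numbers are copied from the two tables.

```python
# Check values (µM) from Table 1 (a reasonable highest concentration in coastal waters).
CHECK_VALUES = {"PO4": 50.0, "NO2": 20.0, "NO3": 100.0, "NO23": 100.0,
                "NH4": 200.0, "Si": 200.0}


def count_above_check(parameter, values):
    """Number of values above the check value for one parameter."""
    limit = CHECK_VALUES[parameter]
    return sum(1 for v in values if v > limit)


def inorganic_to_total_ratio(inorganic_parts, total):
    """RP = OP/TP, or RN = (NH4 + NOx [+ other N species])/TN."""
    if total <= 0:
        raise ValueError("total concentration must be positive")
    return sum(inorganic_parts) / total


def ratio_flag(ratio):
    """Suggested QF action for an inorganic-to-total ratio (Table 4).

    Ratios below 1 are expected, since the inorganic fraction is part of the
    total. Treating the 1.15 and 2 boundaries as inclusive is an assumption.
    """
    if ratio <= 1.15:
        return "not changed"
    if ratio <= 2.0:
        return "probably 4"  # inspect the profile before flagging
    return "4"


if __name__ == "__main__":
    print(count_above_check("PO4", [0.5, 60.0, 110.0]))  # 2
    rn = inorganic_to_total_ratio([5.0, 20.0], 10.0)     # (NH4 + NOx) / TN
    print(rn, ratio_flag(rn))                            # 2.5 4
```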
Nitrogen ratios were also checked: 9 cases with a ratio of 1.15-2 and three cases above 2 were found (Table 6).

Table 6: Check of nitrogen ratios (RN)
Accession/EDMO | Cruise | Station | Ratio | Comment
637642/1850 | Pelagia 64PE364 | 43 (B) | 12 | TN value at 11 m very low; TN result changed from 0.10 to 10
65155/729 0 | | NOR7715 (B) | 2.7 | TN value lower at 14 m, NO3 similar (higher); TN QF set to 4
626215/ | Celtic Explorer CE/11/010B | 41 (B) | 2.2 | TN value similar at 6 and x m, NO3 only at 6 m; NO3 QF set to 4

606004 stations were exported; 30995 duplicate stations in 28918 groups were found.

Values under detection limits and 0 values
Values under the detection limit (i.e. with QF = 6, or in Danish samples values < 0) that have an accompanying measured value of 0 are incorrect. These values need to be replaced by ½ of the detection limit of the technique.

Table 7: Detection limit suggestions from AU-DCE (accredited methods), valid for the North Sea area
Nutrient | Detection limit | Expected relative standard deviation
NO2 | 0.04 µM | 7%
NO3 | 0.1 µM | 7%
NOx | 0.1 µM | 7%
NH4 | 0.3 µM | 7%
TN | 1 µM | 12.5%
OP | 0.06 µM | 5%
TP | 0.1 µM | 10%
SiO4 | 0.2 µM | 4%

A general value of 0.05 µM could be applied for inorganic N species (0.5 µM for TN), and 0.03/0.05 µM for P.

Table 8: Detection limit suggestions from the OGS Laboratory of Marine Chemistry (accredited methods), valid for the Adriatic Sea
Nutrient | Detection limit | Expected relative standard deviation
NO2 | 0.0015 µM | 10%
NO3 | 0.01 µM | 3%
NOx | 0.01 µM | 7%
NH4 | 0.04 µM | 12%
TN | 1 µM | 1%
OP | 0.02 µM | 5%
TP | 0.02 µM | 8%
SiO4 | 0.016 µM | 3%

Measured values of 0 with QF = 1 or 2: if the detection limit is known, replace the value by ½ DL and set QF = 6; otherwise use the detection limits defined above, replace the value by ½ DL and set QF = 6. The substitution can be written as the expression:
#1 1 < 0.5 #1 IFTE   [if conc. < DL then DL/2 else conc.; in this case #1 = TN, DL = 1 µmol/l]

As a result, it is suggested that an internal regional expert review the aggregated data set. Finally, make an ODV regional collection (Export SDN ODV collection) as the basis for DIVA and other products, and provide a copy to Maris, including a detailed QC report. The same report should be sent to the originator of erroneous data together with the corrected data.

Broad-range check values in the North Sea
The data availability on a substance-by-substance basis:
Substances | No. of datapoints (total) | Range | Outlier range (typically QF 4) | Number of outliers
O2, Chlorophyll-a, Si | 4 911 756 662 107 5 646 601 | -500 -> 1000 µmol/l; 0 -> 692; 0 -> 695 | > 1000; > 700; > 700 | 1 159 102 983
Phosphate nutrients (PO4, Total P) | 716 463 153 409 | 0-155; 0-1000 | 200-1669; 1000-5071 | 82 174
Nitrogen nutrients (NO2, NO3, NO2+3, NH4, Total N) | 276 339 361 028 311 744 253 078 | 0-658; 0-1100; 0-644; 0-1000 | 1100-10367; 650-2502; 1000-1621 | 0 87 102 21

2. Common rules for products generation
I'm having some difficulties getting the dataset into DIVA in a way that makes sense. Despite the tricks learned at Stareso, I still get errors when doing the same exports as at Stareso using the new version of ODV, with "Depth" being changed to "dybde" (Danish for depth) where it should not be. I will contact Jean to solve this problem as soon as possible and then continue with the products. Martin

Timing: 10-year moving average from 1960 to 2014, by season and level.
Seasons: defined per region (check definition...).
Vertical layers: defined per region (the Baltic uses HELCOM standard depths and the Mediterranean uses IODE standard depths; are they the same? Please check).
Seasons as adopted in the Mediterranean: winter (January to March), spring (April to June), summer (July to September) and autumn (October to December).
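As an illustration of the timing rules above, the following Python sketch maps months to the seasons adopted in the Mediterranean and lists 10-year windows sliding by one year over 1960-2014. Reading "10-year" as ten inclusive calendar years, with the last window ending in 2014, is an assumption, and the function names are hypothetical.

```python
SEASONS = ("winter", "spring", "summer", "autumn")


def season(month):
    """Mediterranean season of a month 1-12: Jan-Mar winter, Apr-Jun spring, etc."""
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month: {month}")
    return SEASONS[(month - 1) // 3]


def decade_windows(first_year=1960, last_year=2014, length=10):
    """(start, end) year pairs, end inclusive, for windows sliding by one year."""
    return [(start, start + length - 1)
            for start in range(first_year, last_year - length + 2)]


if __name__ == "__main__":
    print(season(2), season(7))  # winter summer
    windows = decade_windows()
    print(windows[0], windows[-1], len(windows))  # (1960, 1969) (2005, 2014) 46
```

Each (window, season) pair then defines one subset of the data for a separate analysis at every standard level.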
IODE standard levels as adopted in the Mediterranean: 0, 5, 10, 20, 30, 40, 50, 75, 100, 125, 150, 200, 250, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1750, 2000, 2500, 3000, 3500, 4000, 4500, 5000. Please check whether they are the same in the 5 regions.
Variable list: already defined as NO3, NOx, total nitrogen, PO4, total phosphorus, SiO4 and NH4. Could we consider also adding dissolved oxygen and chlorophyll-a to complete the water column?
DIVA parameters/settings are defined in the following chapter.
Mask DIVA maps where the relative error field exceeds 0.3 and 0.5.
Use only data with QF = 1, 2 or 6 for DIVA.

3. General guidelines for DIVA settings
While performing a DIVA analysis, the following steps have to be followed.
1. Domain definition and topography: should be OK (check that the resolution is neither too fine nor too coarse). Masking by definition of regions, if any, should be left until the very end. Eliminate lowlands right from the start.
2. Output resolution: decades with a sliding window every year, by season. Regional definition of vertical levels and seasons.
   Seasons as adopted in the Mediterranean: winter (January to March), spring (April to June), summer (July to September) and autumn (October to December).
   IODE standard levels as adopted in the Mediterranean: 0, 5, 10, 20, 30, 40, 50, 75, 100, 125, 150, 200, 250, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1750, 2000, 2500, 3000, 3500, 4000, 4500, 5000. Please check whether they are the same in the 5 regions.
3. Data sets:
   Aggregated data: make sure data outside your region are eliminated (via an option in the driver).
   QF selection: make sure that the SeaDataNet QF scheme is used (DIVA is not aware of QF schemes; it only selects data from a list of QFs to be accepted). In the aggregated data set there should be no 0 flags, so only 1, 2 and 6 are retained; the others are too dangerous to use.
4. Background fields: make sure they have correct vertical coherence. Use a climatological average (all data for a given season, with large L, low SN and possibly detrending).
5. Statistical parameters: optimize SN and L, but take the results with a grain of salt and provide reasonable bounds. VERTICAL coherence via option -30 in the driver.
6. Outliers: use the outlier elimination function ONLY if you are very confident in the statistical parameters and the quality of your products (final fine-tuning), or if you see a few bullseyes in the analysis (too many bullseyes indicate a too high SN). In all cases, check that a reasonable amount of data is flagged as "outliers".
7. Error fields: always mask the results where the relative error field exceeds 0.3 and 0.5, using the same approach as in SDN and the EMODnet Pilot (zero means the analysis is expected to be perfect; 1 means the analysis has an expected error as large as that of your first guess, the reference field).
8. Advanced features: use advection if you have the information (provide velocity fields) or if you really have coastal currents (use the second parameter in the driver to create pseudo along-coast velocities). Detrending: use it if trends over the years are expected. Change of variables: especially for concentrations, apply a log or logit transformation.
9. Checking:
   • Work on the 4D netCDF file (the one that will be published and already includes the masked fields).
   • Check vertical coherence via vertical sections.
   • Check for bullseyes or other artifacts (too high SN or too small L, suspect data).
   • Verify the data coverage field to make sure you did not "lose" any data.
   • Look at Output/3Danalysis/Variablename.Metainfo.txt.

Discussion
Reiner mentioned the good practice of looking at residuals. ULg will add an automatic plot of the residuals and a global indicator of whether the residuals follow the expected distribution.
If not, a warning message will be issued.
Next DIVA workshop: 3-7 November, Calvi (intensive work). If you have special requests or questions, send them beforehand, in particular for features that presently cannot be exploited through the driver options.

Example of DIVA settings (fine-tuning when more or less satisfied):
Data extraction: 0 = do nothing
Boundary lines and coastlines generation: 0 = nothing
Cleaning data on mesh: 4 = 1 + outlier elimination
Minimal number of data in a layer: 0
Parameter estimation and vertical filtering: -30
Minimal L (larger than output grid spacing): 0.25
Maximal L (domain length): 10
Minimal SN: 0.1
Maximal SN: 3
Analysis and reference field: 1
Note: SN = signal-to-noise ratio; L = correlation length.
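Two of the DIVA guidelines in this report lend themselves to a short illustration: masking grid values where the relative error exceeds 0.3 or 0.5 (step 7), and a log change of variables for concentrations (step 8; the logit option is not shown). The Python sketch below works on plain lists standing in for a gridded field, not on DIVA's netCDF output; the function names and the optional offset are hypothetical.

```python
import math


def mask_by_relative_error(field, rel_error, threshold):
    """Return field values, replacing those whose relative error exceeds threshold by None.

    A relative error of 0 means the analysis is expected to be perfect; 1 means
    the expected error is as large as that of the reference (first-guess) field.
    """
    if len(field) != len(rel_error):
        raise ValueError("field and error grid must have the same length")
    return [v if e <= threshold else None for v, e in zip(field, rel_error)]


def log_transform(concentrations, offset=0.0):
    """Natural log before the analysis; the hypothetical offset keeps arguments > 0."""
    return [math.log(c + offset) for c in concentrations]


def back_transform(log_values, offset=0.0):
    """Inverse of log_transform, applied to the analysed field."""
    return [math.exp(v) - offset for v in log_values]


if __name__ == "__main__":
    field = [1.0, 2.0, 3.0]
    rel_error = [0.1, 0.4, 0.6]
    print(mask_by_relative_error(field, rel_error, 0.3))  # [1.0, None, None]
    print(mask_by_relative_error(field, rel_error, 0.5))  # [1.0, 2.0, None]
```

Producing both masked versions from the same analysis gives the 0.3 and 0.5 products side by side; the stricter threshold simply hides more of the poorly constrained areas.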