Baltic Marine Environment Protection Commission
Project on Development of a HELCOM Pollution Load User System
Helsinki, Finland, 26-27 February 2014
PLUS 5-2014, 2-3

Document title: Information on the OSPAR RID database development activities
Code: 2-3
Category: DEC
Agenda Item: 2 – Information by Secretariat and Contracting Parties
Submission date: 27.2.2014
Submitted by: Chair of HELCOM LOAD

Background

In summer 2013, OSPAR HASEC initiated a project to prepare requirements for a modernized OSPAR RID database, to propose how it could be developed, and to solve some issues with the present RID database. The enclosed Annex 1, “Developing the Database for the Comprehensive Study on Riverine Inputs and Direct Discharges (RID)”, is the result of the analysis by the consultant (QuoData) and contains a proposal for a solution for the RID database. Annex 1 also includes a description of the present RID data model and functionalities, the reporting templates, and the parameters that are reported.

The analysis assumes that the requirements selected as “need to have” by the OSPAR INPUT group should be included in a modernized RID database. The prioritized requirements are described and included in Annex 2 and can be divided into the following topics:
a. Data access
b. Data structure
c. Data import
d. Validation of imported data
e. Data export
f. General functionalities (filter, sort and search entries; customize the range of geographic areas; flag individual data with a short comment)

QuoData describes and evaluates five solution scenarios for the 2016 RID database:
S1: Improve existing solution. Only one organization hosts the one active (i.e. “production”) version of the database. The host handles tasks like import into the database.
S2: Improve existing solution. All CPs can import data, which is synchronized into the web version.
S3: Database is re-developed by means of a pure web solution, so that all functions can be used from a web browser.
S4: Close IT cooperation with HELCOM. One joint database and all data are shared.
S5: Close IT cooperation with HELCOM. There are two separate databases, data are not shared.

QuoData evaluates these scenarios on the assumption that the same nine recommended work packages should be carried out for each solution, and has estimated the cost of the five solutions with a precision of -30 to +60 %. The recommended work packages are:
g. WP 1: Confirmation of database model
h. WP 2: Choice of the database platform
i. WP 3: Definition of workflow roles and user responsibilities
j. WP 4: Concept for IT data integrity and security
k. WP 5: Development of quality assurance procedures for data validation
l. WP 6: Implementation of the functionalities
m. WP 7: Initial data migration and production environment
n. WP 8: Database testing and preparation for approval by OSPAR
o. WP 9: Preparing documentation

QuoData assumes that one man-month costs 13,600 € (160 hours per month at 85 € per hour). The total cost is estimated as:
S1: 23.8-27.3 man-months + maintenance for 5 years (102,400 €), in total 426,080-473,680 €
S2: 24.4-28.0 man-months + maintenance for 5 years (100,050 €), in total 431,890-480,850 €
S3: 40.4-42.3 man-months + maintenance for 5 years (94,610 €), in total 644,050-675,330 €
S4-S5: Not estimated, but at least as high as S3; some cost sharing with HELCOM should be expected, also regarding the maintenance cost.

In March 2014, OSPAR HASEC 2014 will consider how to proceed with the modernization of the RID database. A data task group (RID DTG) following the project gives the following advice to HASEC:
a. HASEC should support investigating the possibilities of cooperating with HELCOM, with the aim of finding a partly common solution.
b. HASEC should request RID DTG to describe how such an investigation could be implemented, including a road map.
c.
HASEC should indicate whether it considers it realistic to find at least the expected 250,000 € for developing a revised RID database solution, even when sharing the cost with HELCOM.
d. HASEC should not at present support scenarios S1 and S2.

Solutions S1 and S2 build on the existing RID system, will not change the data model and seem very expensive compared with what is gained. Only S3 will start from scratch and provide a new system. There is a strong need to search for an (at least partly) common solution with HELCOM.

The LOAD chair will inform further on progress with the RID modernization project.

Action required

The Meeting is invited to:
− take note of the provided information
− take into account the results and progress on the RID modernization project in relation to HELCOM PLUS
− consider possibilities for cooperation and common development between the HELCOM PLUS and OSPAR RID modernization projects.

Annex 1: Developing the Database for the Comprehensive Study on Riverine Inputs and Direct Discharges (RID)
Step 6 – Proposal for a solution for the RID Database
Draft report

Imprint
QuoData GmbH
Quality Management and Statistics
Prellerstr. 14
D-01309 Dresden
Germany
Phone: +49 (0) 351 40 28867 0
Fax: +49 (0) 351 40 28867 19
Email: info@quodata.de
Web: www.quodata.de

Authors
PD Dr. habil. Steffen Uhlig
Dipl.-Phys. Christian Bläul
Dipl.-Math. Henning Baldauf
Bertrand Colson

21.02.2014

Contents
1 Introduction
2 Description of data and workflow
3 Assessment of the 2008 RID database
   3.1 Database platform and software
   3.2 Database access
   3.3 Database structure
   3.4 Data import
   3.5 Data validation
   3.6 Data export
   3.7 Description of the database
4 Improvements to the RID database in 2014 – the 2014 RID database
   4.1 Background
   4.2 New tables
   4.3 Import module
   4.4 Export module
5 RID database user requirements
   5.1 General requirements
   5.2 Requirements of the OSPAR community
      5.2.1 Functionalities with high priority
      5.2.2 Functionalities with low priority
6 Recommended database model for the 2016 RID database
7 Solutions for the 2016 RID database
   7.1 Scenario overview
   7.2 Motivation for scenario choice and detailed scenario description
   7.3 Advantages and disadvantages of the scenarios
8 Recommended organisation of the implementation of the 2016 RID database
   8.1 WP 1: Confirmation of database model
   8.2 WP 2: Choice of the database platform
   8.3 WP 3: Definition of workflow roles and user responsibilities
   8.4 WP 4: IT data integrity and security
   8.5 WP 5: Development of quality assurance procedures for data validation
   8.6 WP 6: Implementation of the functionalities
   8.7 WP 7: Initial data migration and setup of production environment
   8.8 WP 8: Database testing and preparation for approval by OSPAR
   8.9 WP 9: Preparing documentation
   8.10 Project organisation
9 Budgeting of the implementation of the 2016 RID database
   9.1 Total costs
   9.2 Annual maintenance costs of the 2016 RID database
10 Time table of the implementation of the 2016 RID database
11 Recommended scenario
   11.1 Common solution with HELCOM PLC
   11.2 Independent, standalone RID database
A Annex
   A.1 Structure of 2008 RID database
   A.2 Description of 2008 RID database
      A.2.1 Structure of input tables
      A.2.2 Main Form
      A.2.3 Tables and Reports
      A.2.4 Export/Aggregation
      A.2.5 Import from Excel
      A.2.6 Log Book

1 Introduction

The Comprehensive Study on Riverine Inputs and Direct Discharges (RID) is an OSPAR monitoring programme designed to provide annual information on the pressures and development of selected pollutants in the OSPAR maritime area (North-East Atlantic). On a mandatory basis, the concentrations and loads of the determinands cadmium, copper, lead, mercury and zinc, nitrogen and phosphorus species, and suspended particulate matter are to be monitored and reported by each OSPAR Contracting Party.

RID data constitute an important part of the OSPAR Information System for the purpose of marine environmental assessment. RID data are also important for the implementation of the EU Marine Strategy Framework Directive (MSFD). The objectives of the RID study are set out in the RID Principles.

As a basis for the assessment of the RID data, the RID database was designed and implemented by QuoData in 2008. The aim of the RID database is the structured storage of the annual RID data and the harmonization of the national RID data. An overview of the 2008 database is given in section 3.
However, there is a need to improve the database structure due to some shortcomings of the 2008 RID database. These shortcomings result from the fact that this first database was conceived as the first step of a step-wise improvement process. While some improvements are currently being carried out (see section 4), further steps will be necessary. Suggestions can be found in this document.

This report describes and compares proposals for a long-term solution for the RID database. Required features and functionalities were determined on the basis of the evaluation of a questionnaire and prioritized by the RID Database Task Group (RID DTG). These required features and functionalities are summarized in section 5.

As completely new functionalities are desired, the database model needs to be expanded. Section 6 presents the recommended revision of the 2008 RID database model, even though the changes are minimal. Different solutions for the future RID database are compared in section 7. (In this report it is referred to as the “2016 RID database”, as it is assumed that the implementation project will start in 2015 and be completed in mid-2016.) Time and financial resources as well as specifications covering data submission, procedures for comprehensive quality assurance, data access, security and requirements for the web interface are taken into account. In section 8, a project plan for implementing the 2016 RID database is presented. A budget draft and a road map are given in sections 9 and 10.

2 Description of data and workflow

This section describes how RID data are generated and distributed from an organizational perspective. For a technical description, please refer to section 3.
All Contracting Parties (CPs) monitor the pollutants in their rivers, as specified in the RID Principles. For most CPs, monitoring is carried out at least on a yearly basis. The CPs calculate the input into a maritime area on the basis of individual water samples and flow measurements. Specifically, in the OSPAR context, the word “input” refers to the mass of a substance carried by water into the maritime area. All inputs are then aggregated, and an annual level is estimated and reported to the OSPAR Secretariat.

Each CP then submits a Text report containing metadata and a Data report consisting of Excel files with a fixed structure. These data are then imported into the RID database. This process takes place every year.

In order to fulfill data requests by so-called ‘data users’, data can be extracted from the database. The same is done to prepare the annual Data Report, which contains the input for one year as well as input time series.

3 Assessment of the 2008 RID database

The basic idea of the 2008 RID database was to create a database in order to manage the data from all Excel files of riverine inputs and direct discharges since 1989 in a consistent way.

3.1 Database platform and software

The 2008 database was built with Microsoft Access and with functions programmed in Visual Basic for Applications.

3.2 Database access

The data are only available upon request to the RID Data Centre. The database is operated by the Norwegian Institute for Agricultural and Environmental Research (Bioforsk). The national RID data of the OSPAR Contracting Parties are managed by Bioforsk, i.e. data import, quality assurance and data export are carried out by Bioforsk¹. A database directly accessible by the Contracting Parties was not within the scope in 2008.
3.3 Database structure

The data structure was conceived for storing data from the previously used Excel detailed sheets (Tables 5a-c, 6a-c, 7, 8, 9) and the annual overview tables (AA Tables 1a – 4b). An overview of the complete structure of the database can be found in Annex A.1.

A more general data structure was not requested in 2008. Accordingly, it was accepted that data e.g. from aquaculture discharges, urban runoffs as well as inputs from unmonitored areas would be taken into consideration only at a later stage.

Some very important technical improvements to the 2008 RID database structure are currently being implemented by QuoData. Detailed information on this work can be found in section 4.

3.4 Data import

Annual data can be imported to the Access database using Excel files. The database can create Excel templates² for data entry. The Excel files filled out by the Contracting Parties are then uploaded to the database.

3.5 Data validation

The OSPAR Commission’s intention was to implement quality assurance procedures at a later stage. Accordingly, automated validation tests identifying possible problems in the imported data are not part of the 2008 RID database. Thus, manual checks need to be carried out.

3.6 Data export

With the data export module, all tables can be exported. A selected set of raw³ data for one OSPAR Contracting Party can be exported into Excel. No further export options were requested.

¹ Contracting Parties don’t have direct access to the database, but infrequently, copies are sent to data providers.
² Files with the correct structure but no data content.

Depending on the different versions of Excel, Access and Windows used, substantial errors have occurred, e.g. data export failed altogether or exported values were erroneous.
These issues are addressed in Step 4 of this project and described in section 4.

To visualize and analyse selected data, export to RTrend was also implemented in the database. RTrend, developed by QuoData, is a software program that allows the adjustment and statistical evaluation of riverine loads.

³ Here raw means “as imported”, i.e. without further calculations.

3.7 Description of the database

A detailed description of the general functionalities of the 2008 database is given in Annex A.2.

4 Improvements to the RID database in 2014 – the 2014 RID database

4.1 Background

The RID Review Group recommended changing the 2008 RID database structure in order to harmonize the database⁴ and to improve data export possibilities. QuoData was contracted by the OSPAR Secretariat to implement the necessary changes in Step 4 of this project. It must be noted that the start of the implementation was delayed by the OSPAR Secretariat until the meeting of the Working Group on Inputs to the Marine Environment on 28-30 January 2014 (“INPUT meeting”) in order to make decisions with respect to possible consequences. Currently, QuoData is working on the implementation. The work will be completed at the end of March 2014.

The necessary changes include new tables and new import/export functionalities.

4.2 New tables

A set of new tables will be included in the RID database. In the first place, the restructuring addresses the reporting of direct discharges. The data of aquaculture discharges and other direct discharges will be added to the RID database. In addition, the restructuring addresses the reporting of riverine inputs. Inputs of all monitored rivers will be pooled into one table. Therefore, main and tributary rivers are no longer reported in separate tables.
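As a rough illustration of the pooling described in section 4.2, the following Python sketch merges the rows of the former main-river and tributary-river tables (6a and 6b) into a single list. It is only a sketch under stated assumptions: the field names (`country`, `river`, `year`, `load`), the added `former_table` column and the sample rows are all hypothetical, and the actual work is carried out in the Microsoft Access database rather than in Python.

```python
from typing import Dict, List

Row = Dict[str, object]


def pool_monitored_rivers(main_rivers: List[Row],
                          tributary_rivers: List[Row]) -> List[Row]:
    """Merge the former tables 6a and 6b into one list of monitored rivers.

    The 'former_table' key is a hypothetical helper column that records
    where each row came from, so that historical assignments stay traceable.
    """
    pooled: List[Row] = []
    for source, rows in (("6a main", main_rivers),
                         ("6b tributary", tributary_rivers)):
        for row in rows:
            # Copy the row so that the input tables are left unchanged.
            pooled.append({**row, "former_table": source})
    # Sort for a stable, readable order (country, then river, then year).
    return sorted(pooled, key=lambda r: (r["country"], r["river"], r["year"]))


# Made-up example rows, for illustration only.
main = [{"country": "XX", "river": "River A", "year": 2012, "load": 1.5}]
tributaries = [{"country": "XX", "river": "River B", "year": 2012, "load": 0.3}]
monitored = pool_monitored_rivers(main, tributaries)
```

Keeping a record of the original table is one possible way to support the restructuring of historical data mentioned in the report; whether such a column is wanted is a design decision for the database developers.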
The improved structure of the RID database is summarised in the following table.

Table   2008 RID database                            2014 RID database
5a      Sewage effluents                             Sewage effluents
5b      Industrial effluents                         Industrial effluents
5c      Total direct discharges                      Aquaculture discharges
5d      --                                           Other direct discharges
5e      --                                           Total direct discharges
6a      Main rivers                                  Monitored rivers
6b      Tributary rivers                             Unmonitored areas
6c      Total riverine inputs                        Total riverine inputs (monitored + unmonitored)
7       Contaminant concentrations                   Contaminant concentrations
8       Limits of detection and/or quantification    Limits of detection and/or quantification
9       Catchment-dependent information              Catchment-dependent information

Table 4-1: Improvements of RID database structure revised in consultation with RID Database Task Group

⁴ Removing the inconsistencies in data reporting across the CPs and increasing comparability of data from different countries.

In order to ensure consistency in the data series, it will be necessary to carry out some restructuring and assignment adjustments in the historical data. Previously, data reporting was not 100% harmonized. For instance, there were differences in the reporting of unmonitored areas. During the INPUT meeting, several Contracting Parties preferred a resubmission of national RID data in order to avoid wrong data assignments.

4.3 Import module

The import module will be adapted so that the reported subtotals and totals for all Contracting Parties can be imported. This adaptation improves data comparability and ensures that the exported data reflect the actual reported inputs into the sea.

4.4 Export module

The following export functions will be added or improved:
- Export of reported raw data of each Contracting Party
- Export of aggregated data per country or OSPAR Region
- Export of time series based on single data tables (e.g. for one constituent in one river for several years) or aggregated data (e.g. total loads to the sea from a Contracting Party or a maritime area)
- Export of annual overview tables generated on the basis of the actual reported totals.

5 RID database user requirements

5.1 General requirements

There are implicit requirements common to the majority of software projects. In particular, even though the following points were not explicitly mentioned in the questionnaire, it is understood that the RID database should
- be easy and intuitive to use,
- be flexible, i.e. it is easy to carry out enhancements,
- provide detailed error messages,
- be robust, i.e. perform well not only under ordinary conditions,
- be secure,
- be quick, i.e. all functionalities should work fast.

5.2 Requirements of the OSPAR community

To find out about the requirements of the potential end users (‘data providers’ as well as ‘data users’), a questionnaire was developed with the support of OSPAR in Step 1 of this project. Experts active in OSPAR Contracting Parties were asked to prioritize the different functionalities of an improved RID database. (For a detailed analysis and evaluation of the questionnaire results, see document INPUT 14/4/3-Add.1.)

The online questionnaires were sent to a total of 86 respondents in October 2013. The final deadline for filling it in was mid-November 2013. There were a total of 35 replies⁵, reflecting the expectations across the OSPAR community.

The prioritized functionalities were graded into the following three categories based on the frequency of answers in the questionnaire: "need to have", "nice to have" and "minority wish" (for detailed information see document INPUT 14/4/3-Add.2a). However, the RID Database Task Group (DTG) classified the "need to have" functionalities into basic and less basic ones. This classification leads to the final proposed prioritisation.
Priority   What functionalities belong to this category?
high       All basic functions
medium     All other functions which are not considered basic, but fairly easy to implement.
low        Functions which are not within the scope of RID and might increase the reporting burden on each OSPAR Contracting Party.

Table 5-1: Priority criteria for the functionalities of the RID database

On the basis of the questionnaire results and the DTG prioritisation, the high-, medium- and low-priority functionalities are summarized separately in the following subsections.

⁵ Replies from Belgium, Denmark, France, Germany, Ireland, The Netherlands, Norway, Spain, Sweden, the United Kingdom, and additionally from ICES and the OSPAR Secretariat.

5.2.1 Functionalities with high priority

5.2.1.1 Data access

All but one of the questionnaire respondents expressed the opinion that the database should be accessible by a web browser. Hence, the 2016 RID database needs to provide web-based access to the RID data. This web-based access does not imply a web-only solution, but it allows extending the Access database and making its data available for web browsers.

5.2.1.2 Data structure

As extracted from the replies to the questionnaire, the improved data structure (see section 4.2) needs to be extended in only two aspects. The first extension desired by the data users concerns the storage of the measurement uncertainty for the annually aggregated loads. The second concerns the storage of the limit of detection and limit of quantification for all reported and measured data (not only for the sewage effluents, industrial effluents and riverine inputs per year, catchment area and Contracting Party as before).

5.2.1.3 Data import

No significant changes are required regarding data submission.
The import of data will continue to take place on the basis of Excel files. Contracting Parties can alternatively use the CSV format for reporting. However, a new requirement is the possibility of importing partial RID datasets. When such incomplete datasets are submitted, the same validation rules should be applied as for a complete submission.

5.2.1.4 Validation of imported data

Quality control procedures should be implemented in the 2016 RID database in order to reduce import errors. After data import, automatic tests for the identification of
- missing values,
- invalid values (e.g. wrong units),
- suspicious values (e.g. outliers) and
- too many significant figures
should be carried out in the RID database. Subsequently, a status report of these quality control procedures should be provided. The validation should only issue warnings about potential problems, but not prevent users from importing values.

5.2.1.5 Data export

Data export to both Excel and CSV formats should be possible (the user chooses one of the formats).

5.2.1.6 General functionalities

There are three features in this questionnaire section that should be included in the 2016 RID database.

First, it should be possible to filter, sort and search entries (e.g. determinands, years, catchment areas) in the different parts of the RID tables.

Second, it should be possible to customize the range of geographic areas (e.g. across sub-regions) from which data are to be aggregated in data products.

Third, it should be possible to flag an individual data item with a short comment or an external reference.
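The validation requirements in section 5.2.1.4 can be made concrete with a small sketch. The Python code below runs the four kinds of test (missing, invalid, suspicious, too many significant figures) and returns a list of warnings as a status report, without rejecting any record. Everything specific in it is an assumption for illustration: the `Record` fields, the allowed units, the limit of three significant figures and the factor-of-ten outlier rule are hypothetical and are not defined in this report.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from statistics import median
from typing import Dict, List, Optional, Tuple

# Hypothetical settings: the report does not define units or thresholds.
ALLOWED_UNITS = {"t/a", "kg/a"}
MAX_SIGNIFICANT_FIGURES = 3
OUTLIER_FACTOR = 10  # flag values above 10x the median of their series


@dataclass
class Record:
    determinand: str
    year: int
    value: Optional[str]  # kept as text so the reported digits are preserved
    unit: str


def significant_figures(number: Decimal) -> int:
    """Count significant digits; trailing zeros of whole numbers are ignored."""
    _, digits, exponent = number.as_tuple()
    kept = list(digits)
    if exponent >= 0:
        while len(kept) > 1 and kept[-1] == 0:
            kept.pop()
    return len(kept)


def validate(records: List[Record]) -> List[str]:
    """Return warnings only; no record is changed or rejected."""
    warnings: List[str] = []
    series: Dict[str, List[Tuple[str, float]]] = {}
    for record in records:
        label = f"{record.determinand} {record.year}"
        if record.value is None or record.value.strip() == "":
            warnings.append(f"{label}: missing value")
            continue
        try:
            number = Decimal(record.value)
        except InvalidOperation:
            warnings.append(f"{label}: invalid value {record.value!r}")
            continue
        if not number.is_finite() or number < 0:
            warnings.append(f"{label}: invalid value {record.value!r}")
            continue
        if record.unit not in ALLOWED_UNITS:
            warnings.append(f"{label}: invalid unit {record.unit!r}")
        if significant_figures(number) > MAX_SIGNIFICANT_FIGURES:
            warnings.append(f"{label}: too many significant figures")
        series.setdefault(record.determinand, []).append((label, float(number)))
    # Simple outlier rule: compare each value with the median of its series.
    for values in series.values():
        if len(values) < 3:
            continue
        typical = median(v for _, v in values)
        for label, v in values:
            if typical > 0 and v > OUTLIER_FACTOR * typical:
                warnings.append(f"{label}: suspicious value (possible outlier)")
    return warnings
```

Because `validate` only collects messages, an import routine could store every record first and then show the warnings to the data provider, which matches the requirement that validation must not prevent users from importing values. The median-based rule is just one simple choice; a real quality assurance procedure would be defined in work package 5.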
5.2.1.7 Functionalities with medium priority

The RID Database Task Group considers two functionalities to be important but not essential:
(1) The storage of the measurement uncertainty at the level of load calculation should be included in the 2016 RID database. It must be taken into account that the quantification of measurement uncertainties for such environmental monitoring programs remains a difficult scientific task.
(2) It would be a nice feature if the RID database were able to link to a GIS system to create maps by means of unique georeferencing identifiers stored in the RID database.

5.2.2 Functionalities with low priority

Some functionalities requested by questionnaire respondents are not considered as prioritised tasks by the RID Database Task Group, e.g.:
(1) The storage of monitoring data used for RID calculations, as this lies outside the scope of RID.
(2) The possibility of data exchange between the RID database and other databases, as the differences with regard to database structure and data format of the different databases are too big.
(3) A separate module for trend analysis (e.g. RTrend software), as users should use their preferred trend analysis software.
(4) The possibility to link with the streamlined text reporting format or a summary abstract.

6 Recommended database model for the 2016 RID database

The database model, also called database structure, defines which tables the database uses internally and which fields they have. In layman’s terms, a field is a table column. The table contents are not part of the structure. Typically, the database structure is not directly visible to any user of the database.

The pre-2014 database structure is given in Annex A.1.
There were no structure changes required for Step 4 (immediate improvements to RID database, section 4) of this project.

There are two high-priority requirements listed in section 5 that require minor database structure changes:
1. Storage of the measurement uncertainty for the annual aggregated loads requires a new field of type float in “Table 5 and 6”.
2. The storage of the limit of detection and limit of quantification for all reported and measured data requires two new fields of type float in "Table 5 and 6".

There are also two medium-priority requirements listed in section 5 that require straightforward additions of new fields.

Later in this document, a scenario will be presented in which a re-development of the RID database is recommended. However, QuoData suggests keeping the 2008 database structure or using a very similar one (e.g. with minor field type changes), as it has proved to be useful. Typically, a developer has full control over the database structure. However, depending on the development tools or frameworks used for implementation, in some cases considerable changes to the structure might be necessary in order to speed up development and thus reduce costs. Accordingly, a call for tenders should not make this structure mandatory, but rather put it forward as an example.

7 Solutions for the 2016 RID database

In the following section, the scenarios for the 2016 RID database are laid out. First, an overview is given. Then the motivation for choosing these specific scenarios and excluding others is presented. Finally, advantages and disadvantages are discussed.

7.1 Scenario overview

The following five scenarios are considered for the 2016 RID database.
S1. Improve existing solution. Only one organization hosts the one active (i.e. “production”) version of the database.
The host handles tasks like import into the database. S2. Improve existing solution. All CPs can import data, which is synchronized into the web version. S3. Database is re-developed by means of a pure web solution, so that all functions can be used from a web browser. S4. Close IT cooperation with HELCOM. One joint database and all data are shared. S5. Close IT cooperation with HELCOM. There are two separate databases, data are not shared. 7.2 Motivation for scenario choice and detailed scenario description Scenarios S1 and S2 are based on the existing solution. Because it already contains mechanisms for data import and export, the amount of computer code to be written can be kept small if these mechanisms continue to be used. This implies that as few functionalities as possible are made available via a browser. Specifically, scenarios S1 and S2 do not allow the user to import data via the web, but do contain a table view and Excel/CSV export functionality. This implies that on the web there is only one user role: the ‘data user’. Users of this type cannot change the database. S3, on the other hand, allows the user to perform all tasks within the web browser. This allows for better accessibility, exposure to a wider audience and a longer expected lifetime6. In contrast to S1, only open-source software is considered to reduce OSPAR’s costs as well as dependence on vendors. In our opinion, the restriction to open source isn’t a factor in one-time costs, but closed-source components reduce options for future enhancements and potentially give rise to license costs. As a result, open source is cheaper to maintain. Scenarios S4 and S5 are motivated by keeping compatibility high and costs low, as many outcomes of the HELCOM PLC efforts are expected to be usable for OSPAR. At the moment of writing this document, the HELCOM database structure is not defined yet. 
Neither the likelihood of cooperation nor the resulting costs of cooperation between the two conventions can be assessed. In S4, only one joint database for both HELCOM and OSPAR exists, meaning both conventions import their data into the exact same database. Access rules could prevent read or write access by role. In S5, only structure and computer code is shared, but not the data. This results in much less need for 6 before the database is replaced because of IT reasons or immense cost increases due to scarcity of experts familiar with the IT environment QuoData GmbH Page 15 of 44 Developing the Database for the Comprehensive Study on Riverine Inputs and Direct Discharges (RID) Step 6 – Proposal for a solution for the RID Database requirements’ and definitions’ harmonization. Functions specific to HELCOM would be hidden in the OSPAR copy of the database. 7.3 Advantages and disadvantages of the scenarios The table on the next page lists the pros and cons for each of the five scenarios. The first column contains two types of identifiers: rows whose identifier starts with the letter M are about the maintenance aspects. Those starting with letter O are about one-time efforts. The abbreviation WP stands for work package. The nine proposed work packages are described in section 8. Some table cells contain Euro-based cost estimates, which will be broken down and explained in section 9. A recommendation on the scenario to choose is given in section 11. 
QuoData GmbH Page 16 of 44 Developing the Database for the Comprehensive Study on Riverine Inputs and Direct Discharges (RID) Step 6 – Proposal for a solution for the RID Database Scenario S1 S2 S3 S4 S5 high medium highest high WP # Requirements to be compared M1 Implementation costs for future high requirements Yearly data submission: reoccurring effort for CPs and M2 OSPAR secretary to keep data up to date; waiting time for feedback from the data validation high, due to email exchange and slow feedback (S1), need for synchronization (S2), both S1-S2: active host needed small, direct online submission, no active host for most data requests Required effort for planning O1 stage (everything that happens before the IT implementation starts) medium high Development and implementation work including O2 documentation (web server, user interface, database management system) high O3 One-time network security efforts for the database 74'800 € 80'240 € high little effort, because high only of only one webaccessible role: data user no no very small very small medium administrative expenditure to none harmonize DB O6 model/backend/frontend choice with HELCOM QuoData GmbH 76'160 € - 106'080 €81'600 € 112'880 € low-highest (depends on cooperation and cost-sharing 1-5 conditions) very high highest, but price shared with HELCOM 180'880 € 187'680 € 348'160 € 6-9 372'640 € 223'040 € 231'200 € Does reporting format for data no O4 suppliers change, do CPs need to adjust? 
Effort to plan and program O5 workflows, user rights and privileges medium none none high, but shared 4,6 maybe a little bit more than S3, but shared 3 high 1,2 Existing Access application can partly be reused (smaller O7 development, testing and documentation effort) O8 yes Functionality available to remote little users with only a web browser yes only data no model little all all a little more high, may be shared (depends on similarity of workflows) Expected effort for OSPAR to approve delivery, i.e. after IT O9 implementation and testing by contractor, with the aim to switch to production little little 5’250 € 10’500 € 5’500 € 10’850 € Effort for migration of existing data to the new database little O10 7’650 € 15’050 € medium medium-high 8 7

Table 7-1: Advantages and disadvantages of the five scenarios for the 2016 RID database

8 Recommended organisation of the implementation of the 2016 RID database

The complete project of the implementation of the 2016 RID database needs to be split up into different subprojects or work packages. Each work package (WP) covers a definite area of responsibilities associated with specific tasks. Each work package provides an output for the subsequent work package. While the details of the tasks themselves will differ from one scenario to the next, the overall list below is independent of the chosen scenario (see section 7).
In order to program the 2016 RID database, QuoData recommends the following nine work packages:

- WP 1: Confirmation of database model
- WP 2: Choice of the database platform
- WP 3: Definition of workflow roles and user responsibilities
- WP 4: Concept for IT data integrity and security
- WP 5: Development of quality assurance procedures for data validation
- WP 6: Implementation of the functionalities
- WP 7: Initial data migration and production environment
- WP 8: Database testing and preparation for approval by OSPAR
- WP 9: Preparing documentation

In the following, the separate work packages are described.

8.1 WP 1: Confirmation of database model

The aim of the first work package is to design the final database model. It will be necessary to compare the proposed solution for the RID database outlined in this report with the objectives at the time of the project implementation. It is not unlikely that the requirements concerning the RID program will change. It will be necessary to check, among other things, whether the user requirements have changed, whether new input tables or additional information are of interest to OSPAR, whether there is obsolete information stored in the RID database, and whether import/export functions should be expanded or additional functionalities are desired. If there are no changes or additions, the database model proposed in section 6 may be used. Otherwise the database model needs to be adapted accordingly.

8.2 WP 2: Choice of the database platform

This work package will deliver the common environment for the RID database and the decision on the frontend and backend software system used for the implementation of the 2016 RID database. The general term “database” can be split into the fields of data storage and user interaction.
The first is often called a backend or database management system (DBMS), and only requires attention from IT administrators during initial setup and during emergencies. The latter is the frontend, and typically the only part any user experiences. A typical frontend can work together with several backends and vice versa.

An appropriate backend and frontend need to be chosen on the basis of the requirements for the 2016 RID database, the specified database model (see WP 1), the scenario (S1 – S5), the available budget and the experience of the software development team. If scenario S1 or S2 (expansion of the 2008/2014 RID database) is chosen, MS Access remains the applicable backend. Otherwise it has to be decided which backend is the optimum one. This could be, for example:

- MySQL (widely-used open source)
- PostgreSQL (open source released by a community of developers called PostgreSQL Global Development Group)
- MS SQL Server (powerful backend produced by Microsoft)

Additionally, a framework or RAD environment has to be chosen for the frontend. For a detailed comparison of the different backend and frontend considerations, the reader is referred to the Step 2 report of this project.

8.3 WP 3: Definition of workflow roles and user responsibilities

This work package focuses primarily on the data access rights, also called user roles. It is important to have a clear understanding of all the tasks and processes that will play a role during the production stage of the RID database (i.e. once all one-time tasks such as implementation have been completed) and of which persons will be responsible for each task. The workflow proceeds as follows:

1. The input tables are filled in by entering data into the Excel templates (it is acknowledged that other formats might have to be supported as well, but Excel is used as an example format in this discussion).
2. The completed Excel templates are imported into the RID database.
3. The submitted RID data is released after a quality check of the data and corrections where required.
4. The released RID data can be exported, e.g. for preparing the annual Data Report or for assessing time series of inputs.

In this work package it is necessary to define:

- Who is allowed to import data?
- Who is responsible for the validation of the data to be imported?
- Who carries out the quality assurance procedures of the data to be imported?
- What is the communication process, e.g. if there are inconsistencies in the data to be imported? Who contacts the contracting parties, and in which situations?
- Who is responsible for the final release of imported data?
- Who is allowed to export which data?
- Who is allowed to read which data? Which data, if any, are public?

Based on the corresponding answers, user groups are assigned which have specific responsibilities and privileges. Moreover it is necessary to define:

- Who is responsible for the administration of content?
  o maintenance and management of the data tables,
  o adaptation of Excel templates,
  o data recovery etc.
- Who is responsible for the user management?
  o Who decides on “who can do what”?
  o Who sets up new users? Who changes user privileges?
- Who is responsible for the technical administration?
  o database maintenance (there are a lot of tools for an effective system and database maintenance),
  o data backup,
  o data security (protection from malicious actions),
  o emergency data recovery etc.

For example, there might be five user groups: OSPAR secretariat, RID Database Task Group, CPs, an external contractor for the technical administration and the public. The concrete responsibilities and tasks need to be set down in a written agreement.
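The answers to the questions above amount to a mapping from user groups to permitted actions. The following Python sketch shows one possible way to record such a mapping. The five group names follow the example in the text, but the action names and the assignment of actions to groups are purely hypothetical; the actual assignment is exactly what WP 3 has to decide.

```python
from enum import Enum, auto


class Action(Enum):
    """Workflow actions derived from the questions above (names are illustrative)."""
    IMPORT = auto()
    VALIDATE = auto()
    RELEASE = auto()
    EXPORT = auto()
    READ_RELEASED = auto()
    MANAGE_USERS = auto()
    BACKUP = auto()


# Hypothetical assignment of actions to the five example user groups.
PERMISSIONS = {
    "OSPAR secretariat": {Action.RELEASE, Action.EXPORT,
                          Action.READ_RELEASED, Action.MANAGE_USERS},
    "RID Database Task Group": {Action.VALIDATE, Action.EXPORT,
                                Action.READ_RELEASED},
    "CP": {Action.IMPORT, Action.EXPORT, Action.READ_RELEASED},
    "external contractor": {Action.BACKUP},
    "public": {Action.READ_RELEASED},
}


def is_allowed(group, action):
    """Return True if the user group may perform the action; unknown groups get nothing."""
    return action in PERMISSIONS.get(group, set())
```

Keeping the mapping in one table makes it easy to attach it to the written agreement and to check it in automated tests.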
8.4 WP 4: IT data integrity and security

The RID data might be accidentally deleted, lost due to hardware or software problems, or subjected to malicious actions. All of these pose a threat to the data integrity and security.

The objective of this work package is to analyse the requirements and possible methods to ensure the security and integrity of the RID data. It is necessary to define the procedures to maintain, back up, recover and secure the data stored in the RID database in order to guard against data loss and data abuse.

This work package also includes reaching a decision as to where the production server will be hosted (e.g. OSPAR secretariat, HELCOM secretariat in case of scenario S4 or S5, ICES, OSPAR Working group, EMEP, research institutes such as Bioforsk, external company).

8.5 WP 5: Development of quality assurance procedures for data validation

This work package deals with the validation of the data to be imported and the development of suitable quality assurance (QA) procedures in order to guarantee that assessment is carried out on the basis of reliable data. After all checks have been completed, the data can be released. Data can be read and aggregated results calculated only after the data has been released.

The completed Excel spreadsheets are electronically imported into the RID database by the responsible person. It goes without saying that the Contracting Parties check the data before import. Nevertheless, automatic QA tests need to be performed in order to identify possible problems and inconsistencies. These QA tests will include tests for the identification of

- missing values,
  A test is performed to determine that a complete set of data has been imported. Gaps or missing data need to be identified. The user must provide information as to whether the missing data will be submitted later, or are definitively not available.
- invalid values,
  For example a test checking for wrong units, or a test checking for negative entries.
- suspicious values,
  For example several statistical outlier tests are performed to identify values which are too low or too high.
- values with too few or too many significant figures,
  A test is performed to identify results with too few or too many significant figures, as such values can indicate incorrect precision. More importantly, it is necessary to decide how such values will be treated. In particular, the treatment of too many significant figures is important as rounding rules are necessary for display. It should be mentioned that the import file format may play a role here, as e.g. Excel uses both a display format and an internal float value. They will probably have to be used together at all times, as the internal value always has a large number of decimal places.
- duplicate values,
  A test is performed to verify that the same data were not imported twice, including for different determinands or years. Procedures to determine when it is permitted to overwrite existing data need to be defined.
- out-of-range values,
  For example a QA test checks whether the range of the loads is too wide or implausible given the LOD, i.e. whether the range of the load (upper value minus lower value) coincides with the limit of detection based on the run-off.
- inconsistent values.
  For example a QA test checks whether the concentration and load figures given in the database are sufficiently consistent, i.e. whether there is a clear relationship between the mean concentration and mean load based on the run-off.
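Some of the QA tests listed above can be sketched in a few lines of Python. The record layout, the sample values and the function names are hypothetical. The consistency test assumes that the mean load should roughly equal mean concentration times run-off in compatible units, with an arbitrary 20 % tolerance; the report itself does not specify the formula or the threshold.

```python
from collections import Counter

# Hypothetical record layout and made-up sample values, for illustration only.
records = [
    {"cp": "DE", "year": 2009, "water_body": "River A", "determinand": "Cd", "load": 1.4},
    {"cp": "DE", "year": 2009, "water_body": "River A", "determinand": "Pb", "load": None},
    {"cp": "DE", "year": 2009, "water_body": "River B", "determinand": "Cd", "load": -0.3},
    {"cp": "DE", "year": 2009, "water_body": "River A", "determinand": "Cd", "load": 1.4},
]


def record_key(record):
    """Identify a record by contracting party, year, water body and determinand."""
    return (record["cp"], record["year"], record["water_body"], record["determinand"])


def missing_values(rows):
    """Indices of rows without a reported load."""
    return [i for i, row in enumerate(rows) if row["load"] is None]


def invalid_values(rows):
    """Indices of rows with a negative load."""
    return [i for i, row in enumerate(rows) if row["load"] is not None and row["load"] < 0]


def duplicate_keys(rows):
    """Keys that occur more than once, i.e. data imported twice."""
    counts = Counter(record_key(row) for row in rows)
    return sorted(key for key, count in counts.items() if count > 1)


def is_consistent(mean_concentration, mean_load, run_off, tolerance=0.2):
    """Assumed rule: load is consistent if close to concentration times run-off."""
    expected = mean_concentration * run_off
    if expected == 0:
        return mean_load == 0
    return abs(mean_load - expected) / abs(expected) <= tolerance


status_report = {
    "missing": missing_values(records),
    "invalid": invalid_values(records),
    "duplicates": duplicate_keys(records),
}
```

The `status_report` dictionary corresponds to the kind of summary a status report would be built from; whether an import may proceed despite failed tests remains a policy decision.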
This work package requires the theoretical specification of the QA tests to be included. It is necessary to define which procedures will be applied and how the procedures will work. The output of all QA tests needs to be clarified, both in prose description and with examples.

A status report with detailed information on possible problems and inconsistencies needs to be drafted. This status report will summarize the results of the QA tests and, if necessary, describe any further steps necessary to eliminate the problems.

It has to be noted that, for all QA tests, it will be necessary to clarify whether the user is allowed to import despite failed tests.

8.6 WP 6: Implementation of the functionalities

This work package is concerned with the IT implementation of the 2016 RID database. Specifically, the following functions need to be implemented:

- Database structure (e.g. scheme, database tables, columns, keys and indexes)
- Data import
- QA tests
- Data export
- Functionalities desired by the users (e.g. searching, sorting, filtering)
- Warning and help messages
- Access facilities of the different user groups
- Automatic logging of the database accesses and changes
- Backup and recovery functions

Afterwards, the proper functioning must be extensively tested. Ideally, this is in part done via automated tests that can later be re-run whenever a new functionality is implemented.

8.7 WP 7: Initial data migration and setup of production environment

The main purpose of this work package is the transfer of all RID data (i.e. the database content) to the new platform.
This work package includes the installation of the production server, the transfer of data from the development environment (local computers) to the production server and the transfer of the existing RID data from the 2008/2014 Access database to the newly developed one. Possible data restructuring must be taken into account. Additionally, a backup of the “old” database needs to be performed.

8.8 WP 8: Database testing and preparation for approval by OSPAR

This work package aims to verify and validate the completely developed RID database. It will be necessary to evaluate whether the features and functionalities which have been implemented in the database meet the specified requirements as well as the users’ needs and whether they have been implemented correctly.

Whenever a shortcoming is found, it will need to be resolved by the software development team. The database will not be delivered until all shortcomings have been eliminated. However, the approval of the delivery of the software system also depends on the availability of documentation, as described in the next work package.

8.9 WP 9: Preparing documentation

The final work package is concerned with the production of detailed documentation. The documentation must include both a user’s manual and the program’s source code documentation. The former will comprehensively describe the usage of the 2016 RID database including all tasks and workflows defined in WP 3. The latter makes it possible to transfer the responsibility for the technical part of the database from the software development team to a different contractor. It also makes it possible to further modify and extend the RID database over many years independently of the developer.
8.10 Project organisation

It is a great advantage for the project’s success if there is a project manager overseeing the software developer’s side. This person plans and manages all work packages which are the responsibility of the software development team and coordinates the various project activities between OSPAR, the RID Database Task Group, and the software development team to ensure that WPs are completed on time, to specification and within budget. The budget for the project manager was increased in comparison to the Step 2 report, as we consider the RID DB to require slightly more managing than average software projects.

9 Budgeting of the implementation of the 2016 RID database

9.1 Total costs

For scenarios S1-S3, the costs based on the priorities suggested by the RID Database Task Group (RID DTG) are presented in the following table. The costs are set out for each work package, for the project management and for the maintenance. As a further classification, the costs are set out for the case that

a. Only high priority requirements are implemented.
b. High and medium priority requirements are implemented, and the option for adding low priority requirements at a later stage is considered.

Please note that the numbers below carry an uncertainty of -30% to +60%. The estimation was based on man months. To derive Euro numbers, it is assumed that one man month corresponds to 160 h at 85 €/hour. The hourly rate is an average based on the different salaries of the fields of expertise expected to be necessary; e.g. project management and IT expert oversight are more cost-intensive than basic IT tasks, document copy-editing and layout. Scenarios S4 and S5 are not considered in the table.
This is in part because of the temporal overlap between the HELCOM PLC and OSPAR RID decision-making and the current uncertainties regarding the HELCOM PLC project development. The costs of both S4 and S5 are expected to be at least as high as those of S3, but there is a saving potential as costs can be split between HELCOM and OSPAR. However, high harmonization costs are to be expected and their estimation carries a high uncertainty.

The estimation uses a top-down approach based on experience of similar projects and implementation tasks. The cost estimation presented here rests on the same foundation as the Step 2 report calculation. However, in Step 2 the IT security planning was not considered. Also, a different grouping was used in Step 2 than in this report. Specifically, to avoid misunderstandings, the presentation in this report is based on work packages defined in much more detail than the general categories used in Step 2.

Scenarios:
S1: Improve existing ACCESS solution and only one organization can import data
S2: Improve existing ACCESS solution and all CPs can import data
S3: Database is re-developed by means of a pure web solution

Cost item | Case | S1 | S2 | S3
WP 1: Confirmation of database model | a. | 0.6 man months = 8'160 € | 0.6 man months = 8'160 € | 0.6 man months = 8'160 €
WP 1: Confirmation of database model | b. | 0.6 man months = 8'160 € | 0.6 man months = 8'160 € | 0.6 man months = 8'160 €
WP 2: Choice of the database platform | | 0.2 man months = 2'720 € | 0.2 man months = 2'720 € | 0.6 man months = 8'160 €
WP 3: Definition of workflow roles and user responsibilities | | 0.5 man months = 6'800 € | 0.5 man months = 6'800 € | 0.9 man months = 12'240 €
WP 4: IT data integrity and security | | 1.8 man months = 24'480 € | 1.9 man months = 25'840 € | 2.8 man months = 38'080 €
WP 5: Development of QA procedures for data validation | a. | 2.4 man months = 32'640 € | 2.4 man months = 32'640 € | 2.9 man months = 39'440 €
WP 5: Development of QA procedures for data validation | b. | 2.8 man months = 38'080 € | 2.8 man months = 38'080 € | 3.4 man months = 46'240 €
WP 6: Implementation of the functionalities | a. | 8.1 man months = 110'160 € | 8.4 man months = 114'240 € | 16.1 man months = 218'960 €
WP 6: Implementation of the functionalities | b. | 10 man months = 136'000 € | 10.4 man months = 141'440 € | 17 man months = 231'200 €
WP 7: Initial data migration and setup of production env. | | 0.4 man months = 5'440 € | 0.4 man months = 5'440 € | 4 man months = 54'400 €
WP 8: Database testing and preparation for approval (contains both OSPAR and contractor costs) | a. | 2.4 man months = 32'640 € | 2.5 man months = 34'000 € | 3.5 man months = 47'600 €
WP 8: Database testing and preparation for approval (contains both OSPAR and contractor costs) | b. | 3 man months = 40'800 € | 3.1 man months = 42'160 € | 4.3 man months = 58'480 €
WP 9: Preparing documentation | a. | 2.4 man months = 32'640 € | 2.5 man months = 34'000 € | 2 man months = 27'200 €
WP 9: Preparing documentation | b. | 3 man months = 40'800 € | 3.1 man months = 42'160 € | 2.1 man months = 28'560 €
Project manager | | 5 man months = 68'000 € | 5 man months = 68'000 € | 7 man months = 95'200 €
Maintenance for 5 years | | 102'400 € | 100'050 € | 94'610 €
Total | a. | 23.8 man months = 426'080 € | 24.4 man months = 431'890 € | 40.4 man months = 644'050 €
Total | b. | 27.3 man months = 473'680 € | 28 man months = 480'850 € | 42.7 man months = 675'330 €

Table 9-1: Estimation of the costs of implementing the 2016 RID database

9.2 Annual maintenance costs of the 2016 RID database

In this section, a breakdown of the maintenance costs from the previous table is given.
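The totals in Table 9-1 can be cross-checked with a few lines of arithmetic. The Python sketch below assumes that each total equals the labour cost (man months × 160 h × 85 €/h) plus the five-year maintenance figure; this reading is inferred from the published numbers rather than stated explicitly in the report. Applying the -30% to +60% uncertainty to the grand total is likewise an assumption, and all names are illustrative.

```python
HOURS_PER_MAN_MONTH = 160
RATE_EUR_PER_HOUR = 85
EUR_PER_MAN_MONTH = HOURS_PER_MAN_MONTH * RATE_EUR_PER_HOUR  # 13'600 €


def labour_cost_eur(man_month_tenths):
    """Labour cost in EUR for an effort given in tenths of a man month.

    Integer arithmetic avoids floating-point rounding (e.g. 23.8 -> 238).
    """
    return man_month_tenths * EUR_PER_MAN_MONTH // 10


# (total effort in tenths of a man month, 5-year maintenance in EUR), from Table 9-1
TABLE_9_1 = {
    ("S1", "a"): (238, 102_400),
    ("S1", "b"): (273, 102_400),
    ("S2", "a"): (244, 100_050),
    ("S2", "b"): (280, 100_050),
    ("S3", "a"): (404, 94_610),
    ("S3", "b"): (427, 94_610),
}

# Assumed composition of the totals: labour cost plus maintenance.
totals = {
    key: labour_cost_eur(effort) + maintenance
    for key, (effort, maintenance) in TABLE_9_1.items()
}


def uncertainty_band(total_eur):
    """Lower and upper bound for the stated uncertainty of -30% to +60%."""
    return total_eur * 70 // 100, total_eur * 160 // 100
```

Under these assumptions the computed totals reproduce the six totals in the table, e.g. 426'080 € for S1 with only high priority requirements.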
Costs are given in Euros per year (€/y).

Cost item | S1 | S2 | S3
Backups (and in the unlikely case of a disaster: restore). In S1 and S3, this is done centrally. In S2, each CP holds a copy of the database and has to take care of backups | 1'320 €/y | 1'450 €/y | 2'380 €/y
Software security updates to all involved software systems | 3'300 €/y | 4'520 €/y | 5'490 €/y
Yearly data submission: recurring effort for CPs and OSPAR Secretariat to keep data up to date; in S1, this contains additional costs due to waiting time for feedback from the data validation; in S2 this contains the need for synchronization of the distributed copies | 13'210 €/y | 9'680 €/y | 7'940 €/y
Handle special data requests for data users. Done by the organization responsible for database content, e.g. OSPAR Secretariat or Bioforsk | 2'640 €/y | 2'640 €/y | 2'640 €/y
Database maintenance of user rights and privileges. In S3, this means create/remove users. This is likely done by CPs and/or OSPAR Secretariat | n/a | n/a | 472 €/y
Total | 20'480 €/y | 20'010 €/y | 18'922 €/y

Table 9-2: Estimation of the annual costs of maintenance

10 Time table of the implementation of the 2016 RID database

The following table presents an example of which tasks can be taken care of in parallel. Some tasks take less time than shown, but have to be done during the period indicated. No specific scenario is targeted, as the durations needed for each task depend more on factors such as contractor experience, manpower and requirement changes. However, in our opinion the call for tender should ask potential contractors for a more precise and date-bound schedule.
Rows of the table (the columns are months 1 to 18):

- Preparation of call for tender
- Collection of offers
- Decision and contract
- WP 1: Confirmation of database model
- WP 2: Choice of the database platform
- WP 3: Definition of workflow roles and user responsibilities
- WP 4: IT data integrity and security
- WP 5: Development of QA procedures for data validation
- WP 6: Implementation of the functionalities
- WP 7: Initial data migration and setup of production env.
- WP 8: Database testing and preparation for approval
- WP 9: Preparing documentation

Table 10-1: Proposed road map for the project of the implementation of the 2016 RID database

11 Recommended scenario

The recommended scenario depends on whether a common solution with the HELCOM PLC database (scenarios S4 or S5) or an independent, standalone RID database (S1 – S3) is preferred.

11.1 Common solution with HELCOM PLC

The OSPAR RID and HELCOM PLC databases have partly different requirements. If the requirements can be harmonized, an integrated common database could make sense. The advantages of such a common system for data users and data providers are obvious. As costs for developing and maintaining the database can be split between HELCOM and OSPAR, considerable cost reductions would be achieved. However, there are comparatively high harmonization costs. Agreements have to be reached, among other things, on

- Methodology of sampling and monitoring
- Definitions, e.g. unmonitored areas
- Methodology of verification of data
- Determination of measurement uncertainties
- Consideration of values below LOD/LOQ
- Methodology of the estimation of loads from unmonitored areas
- Methodology of trend analysis
- Principles of source apportionment
- Report structure

The administrative work involved in ensuring that neither OSPAR’s nor HELCOM’s requirements are given preference must also be taken into account. Thus, for small and customized applications, a scheme as in scenario S5 is recommended.
For a large-scale solution (e.g. long-term collaboration, expansion of the scope of the RID Programme or strong integration into 3rd-party databases), a common single database might be better suited, as in scenario S4.

11.2 Independent, standalone RID database

The complete re-development of the 2008/2014 RID database as a web-only solution (S3) gives the user the most functionality in the most easily accessible way, but the costs are about 1.5 times as high as those of an improvement of the existing Access-based database (S1 and S2).

The existing database represents a solid foundation for the 2016 RID database. The basic 2008 and improved 2014 database can be expanded with an acceptable amount of work to fulfil the users’ requirements such as providing

- an intuitive, flexible, robust, secure and quick application
- easy web-access to the RID data for data users
- easy import of RID data
- validated and quality-assured RID data, as well as
- flexible export of RID data.

Compared to scenario S1, scenario S2 offers the considerable advantage that the time-consuming import is done by the data providers themselves, so errors are found more quickly and feedback is more direct, as continuous communication between the central data host and the CPs is no longer required. Thus, QuoData recommends scenario S2.

A Annex

A.1 Structure of 2008 RID database

Figure 10-1: Implemented structure of the 2008 RID database

For additional information, the reader is referred to the RID database documentation delivered by QuoData in 2008.

A.2 Description of 2008 RID database

A.2.1 Structure of input tables

To cover the different input sources reported by the contracting parties, there are several input tables. The tables 5a to 5c contain the direct discharges to the maritime area (sewage effluents in table 5a, industrial effluents in table 5b and the totals of direct discharges in table 5c).
The tables 6a to 6c contain the riverine inputs to the maritime area (6a for main rivers, 6b for tributary rivers and 6c for the totals of riverine inputs).

There are three other tables. Table 7 contains information on the measured concentration of a contaminant for a selected combination of contracting party – year – catchment area. Table 8 contains the contaminant-specific detection limits of the sewage effluents, industrial effluents and riverine inputs for a selected combination of contracting party – year – catchment area. Specific information on the catchment area, e.g. flow rate, is contained in table 9.

A.2.2 Main Form

The whole data management of the 2008 RID database is conducted through the main form. This form consists of four tabs: Tables and Reports, Export/Aggregation, Import from Excel and Log Book.

A.2.3 Tables and Reports

The tab Tables and Reports is used for displaying various tables according to the selected criteria. Five possible overview reports can be created: Plausibility check, Annual summary, Structure of water bodies, Available tables, Annual data per year.

Figure 10-2: Main Form – Tables and Reports

Plausibility Check

In this section the data from each contracting party can be checked for plausibility. For a given contaminant, the plausibility check tests the non-aggregated raw data to determine (1) whether the range of the load (upper value minus lower value) coincides with the limit of detection based on the run-off (so-called “Noise Check”) and (2) whether there is a clear relationship between the mean concentration and mean load based on the run-off (so-called “Consistency Check”).

Annual Summary

In this section the following summary tables can be created for a selected year.
Table 1a: Information Convention Received on Inputs to the Maritime Area of the OSPAR Table 1b: Determinands Reported by Contracting Parties Table 2: Direct Discharges to the Maritime Area of the OSPAR Convention by Country Table 3: Riverine Inputs to the Maritime Area of the OSPAR Convention by Country Table 4: Sum of Direct Discharges (Table 2) and Riverine Inputs (Table 3) to the Maritime Area of the OSPAR Convention by Country Table 4b: Sum of Direct Discharges and Riverine Inputs to the Maritime Area of the OSPAR Convention by Sea Area Figure 10-3: Example: Table 1a from Tables and Reports – Annual Summary Structure of Water bodies An overview of the structure of all water bodies for each contracting party can be exported in this section. Figure 10-4: Structure of Water bodies for Germany Availability Tables In this section a separate Excel file containing an availability overview for the input tables 5a – 5c as well as 6a – 6c can be exported per each contracting party. This file indicates for which water body and which year the specified contracting party submitted data. Figure 10-5: Availability Tables for Germany – table 6a Annual Data per Year For each contracting party and year the data of the input tables (5a - 9) can be exported separately in an Excel file. Available for selection is either an overview of the number of existing values or an overview of the lower, upper and mean load/concentration. Figure 10-6: Annual Data per Year – Existing Values Figure 10-7: Annual Data per Year – Annual Data A.2.4 Export/Aggregation Via this tab RID data over several years can be exported to Excel or directly to RTrend to generate longterm trends. Figure 10-8: Main Form – Export to RTrend In order to guarantee a statistical reliable trend assessment there are certain rules on extrapolation and interpolation of missing data and the aggregation of data. For the data export, a contracting party, the time span and, if applicable, the determinand have to be specified. 
A.2.5 Import from Excel

Via this tab annual data can be imported into the RID database. For an easy import, Excel file templates can be generated for each contracting party and year. These templates correspond to the export files of the input tables (tables 5a to 9). After the data have been entered into the Excel files, the files can be uploaded.

Figure 10-9: Main Form – Import from Excel
Figure 10-10: Template for data input – CP Germany, year 2009, table 6a

A.2.6 Log Book

For traceability, actions on the database can be logged in a log book. All imports into the database are logged automatically, and the log book can be exported to an Excel file.

Figure 10-11: Main Form – Log Book

The following actions can be recorded:
• Data delivery: when data have been delivered;
• E-mail: when information has been sent or received via e-mail;
• Telephone call: when information has been sent or received via a telephone call.

Agenda Item 4
INPUT 14/4/3-Add.2a-E
English only

OSPAR Convention for the Protection of the Marine Environment of the North-East Atlantic
Meeting of the Working Group on Inputs to the Marine Environment (INPUT)
London (UK): 28-30 January 2014

Development of the RID Database - Step 2: Evaluation of database model and outline of overall requirements

Presented by the Secretariat and QuoData

Attached are the results of the work by QuoData on 'Step 2' of the project as reported on 10 January 2014.

Description of Step 2

Evaluate database model and outline overall requests on data submission, data access and database interface (from the contract between OSPAR Commission and QuoData)

Under this task the Contractor will evaluate the existing database model (structure) and assess whether it matches the database outputs required by users. This will require the Contractor to:
2A assess the existing database;
2B evaluate possible database platforms;
2C set minimum requirements;
2D identify pros and cons of different database designs.
2A Assessment of existing database

The Contractor will assess the existing database model. For this assessment, the summary of the questionnaire (see step 1) will be used. The highly prioritized needs will be given focus. The Contractor will consider all relevant wishes and questions for the optimal database structure. Based upon these preconditions, the Contractor will evaluate where the existing database does not meet the user requirements. This will be done in contact with the RID DTG.

2B Evaluation of possible database platforms

Based on the user requirements evaluated in step 1 and the minimum requirements (see next section) as well as other requirements (e.g. expenditures of time and costs), the Contractor will recommend the most appropriate technique to create the database and for OSPAR to run and maintain it. The Contractor will consider and offer a variety of platforms for the new database, in particular the database systems MS SQL, MySQL and MS Access, but also SAP Advantage Database Server and PostgreSQL. Regarding the frontend, the Contractor will also consider and offer a variety of options, such as Drupal, DaDaBIK or DevExpress' XAF, in order to generate user-friendly web user interfaces and guarantee filtering and sorting functions.

2C Minimum requirements

The Contractor will work towards a proposal that guarantees that the end user works with a user-friendly, intuitive interface. This could include a web interface, making the system requirements for the end user as minimal as possible and increasing the accessibility. To ensure the data is valid, there are different scenarios of user roles. These will be such that they can be implemented through a system of user groups with different rights.

2D Pros and cons of different database designs

The Contractor will evaluate the advantages and disadvantages of:
1. one commonly used database for OSPAR and HELCOM;
2. two separate databases with a common structure;
3. retaining the existing RID database and implementing a new interface upon it.

The evaluation will be based on different categories concerning the user needs. These will be specified after the results of the questionnaire are summarized. Other categories will be:
1. the security structure (different user roles and user groups or equivalent system);
2. the ability to change the database structure in the future;
3. the ability to generate reports and visualizations;
4. the security and consistency of the data;
and also expenditures of time and costs.

Developing the Database for the Comprehensive Study on Riverine Inputs and Direct Discharges (RID)
Evaluate database model and outline overall requirements on data submission, data access and database interface

Imprint
QuoData GmbH
Quality Management and Statistics
Kaitzer Str. 135
D-01187 Dresden
Germany
Phone: +49 (0) 351 40 28867 0
Fax: +49 (0) 351 40 28867 19
E-mail: info@quodata.de
Web: www.quodata.de

Authors
PD Dr. habil Steffen Uhlig
Dipl.-Phys. Christian Bläul
Dipl.-Math. Henning Baldauf
Dipl.-Math. Kirstin Frost

10.01.2014

Developing the Database for the Comprehensive Study on Riverine Inputs and Direct Discharges (RID)
Step 2 – Evaluate database model and outline overall requests on data submission, data access and database interface

Contents
1 Summary and introduction
2 Assessment of existing database (Step 2A)
  2.1 Prioritization of tasks
  2.2 Summary table
  2.3 Requirements basic to all software systems
3 Evaluation of possible database platforms (Step 2B)
  3.1 Considered backends
  3.2 Frontend options
4 Approach comparison
  4.1 Implementation effort
  4.2 Summary table
  4.3 Read-only web frontend for approach (A1) and (A2)
  4.4 Synchronisation methods for the distributed copies in approach (A2)
  4.5 Summary of effort by topic for the three approaches (Step 2D)
5 OSPAR and HELCOM: Pros and cons of different database designs (Step 2D)
A Appendices
  A.1 DBMS backend possibilities comparison
    A.1.1 Performance
    A.1.2 License costs
    A.1.3 Operating system requirements
    A.1.4 Administration effort
    A.1.5 Spatial Data (GIS)
  A.2 Web database vs. local database software
    A.2.1 Advantages of web applications
    A.2.2 Disadvantages of web applications
  A.3 Technical considerations on frontend frameworks using Drupal as an example

1 Summary and introduction

In this report, the current Access-based database as well as possible future web-based database solutions are checked with regard to the wishes stated in the questionnaire of Step 1 of this project [1]. Additionally, some implicit requirements, such as ease of use, an intuitive user interface and user-friendly error handling, are considered.

It was found that the current RID DB supports 7 of the 28 functions classified as "Need to have". None of the 8 "Nice to have" functions are currently available. This classification was based on the frequency of answers in the questionnaire (see Section 2.1).
From a technical perspective, all functions requested by users could be implemented in the existing Access-based solution. While every system has its advantages and caveats, based on the information obtained by QuoData until 6 January 2014, extending the Access database and making its data available for web browsers seems slightly more cost-efficient (see Section 4.3).

Three approaches are considered:

(A1) Access database with a read-only web frontend for data presentation and data downloads. Only one copy exists, into which only one organisation [2] imports data using a non-web import module. This approach costs about 18.4 ± 6 man months.

(A2) Access database with a read-only web frontend for data presentation and data downloads. All data providers have a copy of the database, which they can use for importing, reporting and GIS integration. All copies are synchronised by only one organisation [2]. This approach costs about 19.9 ± 6 man months.

(A3) Pure web solution. All functions are usable with a browser, including the data import by the CPs. This approach costs about 24.3 ± 10 man months.

While the web-based approach is more accessible and future-proof, the Access approaches pose less risk. More differences and implications are discussed in Sections 3 and 4, as well as in Appendix A.2.

The following sub-steps were defined in the contract between OSPAR and QuoData:

Step 2A – Assessment of the existing database

First of all, the existing database model is assessed against the responses to the questionnaire. Based on all relevant wishes of data providers and data users, it is analysed where the existing database does not meet the user requirements and, if possible, how these requirements can be implemented in the current RID database structure.
Step 2B – Evaluation of possible database platforms

The decision about the database management system should be based on the user requirements evaluated in step 1 and the resulting minimum requirements. Other requirements (e.g. expenditures of time and costs) are also considered. To determine which technique is most appropriate to create the database and for OSPAR to run and maintain it, the different possible solutions are assessed.

There is a variety of platforms for a new database. In particular, the database systems MS SQL, MySQL and MS Access, but also PostgreSQL, should be mentioned here. Regarding the frontend there is a further variety of options. Existing frameworks such as Drupal or DevExpress' XAF could be used to generate user-friendly web user interfaces and guarantee filtering and sorting functions. Each of these solutions comes with a large number of well-tested plugins and modules to allow for speedy, cost-efficient and reliable software development.

Step 2C – Minimum requirements

Whatever the database platform, it should always be guaranteed that the end user works with a user-friendly, intuitive interface. This could include a web interface, making the system requirements for the end user as minimal as possible and increasing the accessibility. MS SQL, MySQL or PostgreSQL and even MS Access databases can be maintained under a server infrastructure. This is an easy way to guarantee the availability for the end users, fulfil security requirements, and ensure proper data validation. Therefore, in this step, the different options are compared with regard to the overall minimum requirements for
• data submission,
• data validation,
• database interface, etc.

Step 2D – Pros and cons of different database designs

In this last step the advantages and disadvantages of
• one commonly used database for OSPAR and HELCOM,
• two separate databases with a common structure,
• keeping the existing RID database and implementing a new interface upon it
are evaluated. The evaluation is based on different categories concerning the user needs. These are specified based on the results of the questionnaire. Other categories are:
• the security structure (different user roles and user groups or equivalent system),
• the ability to change the database structure in the future,
• the ability to generate reports and visualizations,
• the security and consistency of the data, and also
• expenditures of time and costs.

[1] To find out about the requirements of the end users ('data providers' and 'data users'), a questionnaire to prioritize new features and improvements for the RID database was filled in by 35 participants, from 21st October to 12th November 2013. A report containing all questions and answers has been provided by QuoData GmbH.
[2] e.g. OSPAR Secretariat or Bioforsk

2 Assessment of existing database (Step 2A)

In this section, the current MS Access-based RID database, abbreviated as RID DB from now on, is examined. First, a summary table to ease decision-making is presented. For every wish stated, it records whether the function is already available to end users, in other words whether it is usable without any IT implementation effort. The estimated implementation efforts are spelled out in Section 4, where a table lists them for realization within the current RID DB and within the alternative proposal.
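The scores and priority categories shown in the summary tables of this section follow the averaging rule of Section 2.1. As an illustration only, a minimal Python sketch of that rule is given here. The function and variable names are hypothetical and not part of the RID DB; the answer values and thresholds are those of Section 2.1, except that the handling of a score of exactly -1 is an assumption, since the report leaves that boundary open.

```python
# Answer values as listed in Section 2.1 of the report.
LIKERT_VALUES = {
    "very important": 2,
    "fairly important": 1,
    "important": 0,
    "slightly important": -1,
    "not at all important": -2,
}
YES_NO_VALUES = {"yes": 2, "no": -2}


def score(counts: dict[str, int], values: dict[str, int]) -> float:
    """Average answer value, weighted by the number of respondents per answer."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no responses given")
    return sum(values[answer] * n for answer, n in counts.items()) / total


def prioritize(s: float) -> str:
    """Map a score to the priority categories of Section 2.1."""
    if s >= 0:
        return "Need to have"
    if s > -1:
        return "Nice to have"
    # The report defines 'Minority wish' as score < -1; assigning a score of
    # exactly -1 to this category is an assumption made for this sketch.
    return "Minority wish"


# Web browser access (Part 2, question 2): 33 of 34 answered "yes".
web = score({"yes": 33, "no": 1}, YES_NO_VALUES)
print(f"{web:.2f}", prioritize(web))  # 1.88 Need to have

# Smartphone or tablet access (Part 2, question 3): 6 of 32 answered "yes".
mobile = score({"yes": 6, "no": 26}, YES_NO_VALUES)
print(f"{mobile:.2f}", prioritize(mobile))  # -1.25 Minority wish
```

Items marked "n/a" in the score column are free-text questions; they were categorized by answer frequency and are not covered by this sketch.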
2.1 Prioritization of tasks

To prioritize the questionnaire items, the scoring system explained below was used to harmonize the different answer possibilities. It resulted in the following categories, which are used in this report:
1. 'Need to have': score of 0 or larger. This priority category also includes the requirements basic to all software systems: freedom from crashes, system availability, behaviour after the user makes an error, and intuitiveness.
2. 'Nice to have': score between -1 and 0 (0 > score > -1).
3. 'Minority wish': most respondents do not consider this important (score < -1).

To calculate the score, each possible answer was assigned a value. These values were averaged over all responses to create the score.

| Answer | Value |
|---|---|
| very important | +2 |
| fairly important | +1 |
| important | 0 |
| slightly important | -1 |
| not at all important | -2 |

| Answer | Value |
|---|---|
| yes | +2 |
| no | -2 |

2.2 Summary table

2.2.1 Part 1: Tables of RID database

| No. | Question | What do potential users prefer? | Score | Prioritization | Available in existing RID DB |
|---|---|---|---|---|---|
| 1 | Is it preferred to pool all monitored rivers in one comprehensive table? | yes: 28 of 33; no: 5 of 33 | 1.39 | Need to have | no |
| 2 | Should the set of separate RID tables be expanded? | yes: 10 of 29; no: 19 of 29 | -0.62 | Nice to have | -- |
| 3 | Which additional RID tables are needed? | many proposals; measurement uncertainty: 4 of 12; calculation methods: 3 of 12 | n/a | Nice to have | -- |
| 4 | Should monitoring data used for RID calculations be stored in the RID database? | very important: 16; fairly important: 5; important: 7; slightly important: 3; not at all important: 1 (of 32) | 1.00 | Need to have | no |
| 5 | Should RID load data at less than annual aggregation be stored in the RID database? | very important: 10; fairly important: 9; important: 7; slightly important: 1; not at all important: 5 (of 32) | 0.56 | Need to have | no |
|  | If less-than-annual values are to be stored in the RID database, what temporal aggregation should such data have? | monthly: 27 of 30; per quarter: 3 of 30; every six months: 0 of 30 | n/a | Need to have | no |

2.2.2 Part 2: Access to the RID data

| No. | Question | What do potential users prefer? | Score | Prioritization | Available to users of existing RID DB |
|---|---|---|---|---|---|
| 1 | Should the RID database have the functionality to exchange data with other databases? | yes: 25 of 32; no: 7 of 32 | 1.13 | Need to have | no |
|  | With which database should a data exchange be possible? | universal, easily accessible format that can be exchanged with other databases: 7 of 22 | n/a | Need to have | -- |
|  |  | WISE (EEA): 6 of 22 |  | Nice to have |  |
|  |  | HELCOM PLUS, EMEP, WFD, MSFD, CEMP, CAMP, ICES database: 2 of 22 |  | Minority wish |  |
|  |  | EMECO, Waterquality database NL, DONAR, GIS DB, GEMS/Water, DB on winter nutrient concentrations: 1 of 22 |  | Minority wish |  |
| 2 | Is there a need to allow easy access to data and charts by a web browser? | yes: 33 of 34; no: 1 of 34 | 1.88 | Need to have | no |
| 3 | Is there a need to allow easy access to data and charts by smartphone or tablet? | yes: 6 of 32; no: 26 of 32 | -1.25 | Minority wish | no |
| 4 | Which additional conditions should apply to access to the RID database? | data should be public after QA step: 2 of 5; original data should be attributed to original providers: 1 of 5; contact of data manager if a user finds unreasonable values: 1 of 5; password protected, users from OSPAR community only: 1 of 5 | n/a | Minority wish | no |

2.2.3 Part 3: Submission of data into the RID database

| No. | Question | What do potential users prefer? | Score | Prioritization | Available to users of existing RID DB |
|---|---|---|---|---|---|
| 1 | Submitting data via importing the data file? | very important: 11; fairly important: 4; important: 2; slightly important: 0; not at all important: 0 (of 17) | 1.53 | Need to have | yes |
|  | Submitting data via transferring the data by "copy & paste"? | very important: 1; fairly important: 1; important: 4; slightly important: 3; not at all important: 2 (of 11) | -0.36 | Nice to have | no |
| 2 | Excel file format for submitting data? | very important: 10; fairly important: 4; important: 3; slightly important: 1; not at all important: 0 (of 18) | 1.28 | Need to have | yes |
|  | Comma separated values (CSV) file format for submitting data? | very important: 3; fairly important: 2; important: 3; slightly important: 1; not at all important: 4 (of 13) | -0.08 | Nice to have | no |
|  | Which other formats should be used for data submission? | ACCESS: 2 of 9; XML: 2 of 9; ASCII: 1 of 9 | n/a | Minority wish | no |
| 3 | Should there be the opportunity to submit partial RID datasets? | yes: 14 of 17; no: 3 of 17 | 1.29 | Need to have | yes |
| 4 | Should there be a possibility to add information on the 'measure of uncertainty'? | very important: 6; fairly important: 7; important: 2; slightly important: 1; not at all important: 1 (of 17) | 0.94 | Need to have | no |
|  | Should there be a possibility to add additional comments on data (e.g. flagging when data is missing or suspicious)? | very important: 6; fairly important: 8; important: 3; slightly important: 1; not at all important: 0 (of 18) | 1.06 | Need to have | no |

2.2.4 Part 4: Validation of imported data

| No. | Question | What do potential users prefer? | Score | Prioritization | Available to users of existing RID DB |
|---|---|---|---|---|---|
| 1 | Covering quality control procedure for missing values? | very important: 14; fairly important: 1; important: 1; slightly important: 2; not at all important: 2 (of 20) | 1.15 | Need to have | no |
|  | Covering quality control procedure for invalid values (e.g. wrong units in concentration values)? | very important: 17; fairly important: 2; important: 0; slightly important: 1; not at all important: 0 (of 20) | 1.75 | Need to have | no |
|  | Covering quality control procedure for suspicious values? | very important: 10; fairly important: 5; important: 2; slightly important: 3; not at all important: 0 (of 20) | 1.10 | Need to have | no |
|  | Covering quality control procedure for too many significant figures? | very important: 4; fairly important: 5; important: 4; slightly important: 4; not at all important: 1 (of 18) | 0.39 | Need to have | no |
| 2 | Which further automatic tests should be taken into account? | double recordings: 3 of 12 | n/a | Nice to have | no |
| 3 | Who is responsible for the final approval of imported data? | Contracting Party only: 14 of 18; relevant OSPAR Committee: 2 of 18; somebody else: 2 of 18 | n/a | -- | n/a |
|  | Who should be responsible for the final approval of imported data instead of Contracting Parties or the relevant OSPAR Committee? | RID input data group or data manager: 4 of 18 | n/a | -- | n/a |

2.2.5 Part 5: Functionality

| No. | Question | What do potential users prefer? | Score | Prioritization | Available to users of existing RID DB |
|---|---|---|---|---|---|
| 1 | Should it be possible to filter, sort and search entries (for example select parameters, years, areas) from the different parts of the RID tables? | very important: 22; fairly important: 7; important: 0; slightly important: 2; not at all important: 0 (of 31) | 1.58 | Need to have | no |
| 2 | Should it be possible to customize e.g. the range of geographic areas (e.g. across sub-regions) from which data are to be aggregated in data products? | very important: 20; fairly important: 8; important: 1; slightly important: 1; not at all important: 1 (of 31) | 1.45 | Need to have | no |
| 3 | For which particular features would you like a 'customization' option? | subregional assessment: 11 of 20; total river basin inputs: 5 of 20; parts of sea area covered by OSPAR: 3 of 20 | n/a | Nice to have | no |
| 4 | Exporting data via Excel file? | very important: 17; fairly important: 8; important: 2; slightly important: 3; not at all important: 0 (of 30) | 1.30 | Need to have | yes |
|  | Exporting data via CSV file? | very important: 10; fairly important: 7; important: 3; slightly important: 3; not at all important: 0 (of 23) | 1.04 | Need to have | no |
|  | Exporting data via XML file? | very important: 5; fairly important: 3; important: 7; slightly important: 5; not at all important: 6 (of 26) | -0.15 | Nice to have | no |
|  | Which other formats should be used for data table export? | ACCESS: 2 of 11; ASCII: 3 of 11; shape file for graphs: 2 of 11; PDF: 1 of 11 | n/a | Minority wish | no |
| 5 | Should the RID database be able to link to a GIS system so as to create maps? | very important: 14; fairly important: 9; important: 5; slightly important: 3; not at all important: 0 (of 31) | 1.10 | Need to have | no |

2.2.6 Part 6: Additional features

| No. | Question | What do potential users prefer? | Score | Prioritization | Available to users of existing RID DB |
|---|---|---|---|---|---|
| 2 | Should LOD be included in the RID database? | very important: 14; fairly important: 6; important: 4; slightly important: 3; not at all important: 2 (of 29) | 0.93 | Need to have | yes |
|  | Should LOQ be included in the RID database? | very important: 15; fairly important: 7; important: 5; slightly important: 1; not at all important: 0 (of 28) | 1.29 | Need to have | no |
|  | Should flow measurements be included in the RID database? | very important: 22; fairly important: 5; important: 2; slightly important: 0; not at all important: 0 (of 29) | 1.69 | Need to have | yes |
|  | Should proportions of measurements above/below LOD and LOQ be included in the RID database? | very important: 14; fairly important: 6; important: 4; slightly important: 3; not at all important: 1 (of 28) | 1.04 | Need to have | no |
|  | Should level of the annual aggregated loads be included in the RID database? | very important: 11; fairly important: 7; important: 4; slightly important: 4; not at all important: 1 (of 27) | 0.85 | Need to have | no |
| 3 | Should a separate module for trend analysis (e.g. RTrend software) be part of the RID database capabilities? | yes: 24 of 32; no: 8 of 32 | 1.00 | Need to have | yes |
| 4 | Should some of the fields of the 'text reporting format' be incorporated into the RID data submission and the RID database? | yes: 19 of 29; no: 10 of 29 | 0.62 | Need to have | no |

2.3 Requirements basic to all software systems

The items mentioned below were not asked about in the questionnaire. Instead, they are implicit requirements of almost all software projects:

• Ease of use: describes whether users can get the results they need quickly. The current RID DB has a clearly defined feature set. The only part that consumes more user time than necessary is the data import, which is part of the improvements of Step 4 of this project. The actual functionality potential users would expect was discussed above.

• Intuitive user interface: describes how difficult it is for a user to guess or learn which functions are where, and how they work. As the current RID DB consists of just 4 tabs in a single window, as can be seen in Figure 1, learning to use it is easy.

• Error handling: describes what happens when the user tries something that is prevented by the software, or when the software needs more input than the user has given. Most error messages are currently very short. It is recommended to provide a higher level of detail. After an error occurs, the software remains usable. However, the RID database does not clearly communicate which contents have been updated. Improving this is also part of Step 4.

Figure 1: Main window of current RID DB

3 Evaluation of possible database platforms (Step 2B)

In the previous section, the currently existing Access solution was assessed.
This section will focus on possible alternatives for an entirely new development of the RID database (without using existing VBA source code). For a comparison between the two approaches (A1) and (A2) of enhancing the currently existing database and creating a new solution (A3) from scratch, the reader is referred to Section 4.

The general term "database" can be split into the realms of data storage and user interaction. The first is often called a backend or Database Management System (DBMS), and only requires attention from IT administrators during initial setup and during emergencies. The latter is the frontend, and typically the only part any user experiences. A typical frontend can work together with several backends and vice versa.

3.1 Considered backends

In this section, four different DBMS are analysed from different perspectives. To begin with, these systems are presented briefly.

• MySQL: MySQL is a widely used open source DBMS.
• MS SQL Server: MS SQL Server is a powerful DBMS produced by Microsoft.
• MS Access: MS Access is a software package produced by Microsoft which combines a DBMS (called Microsoft Jet Engine) with an integrated development environment.
• PostgreSQL: PostgreSQL is an open source DBMS released by a community of developers called the PostgreSQL Global Development Group.

MS Access differs from the other three DBMS analysed in that its main focus lies on desktop database applications. It offers a powerful graphical user interface which makes it comparatively easy to create and maintain databases. Database applications of higher complexity can be created with the development environment for Visual Basic for Applications (VBA), which is integrated in the software package. In a browser-based web application, Access can be used as the data storage, but not as a programming environment.
Alternatively, Access applications can be used over the web with the help of Microsoft Terminal Services and Remote Desktop Application in Windows Server 2008. This can be an appropriate solution if an existing local Access application should be extended to remote users. But compared to a browser-based web application, it does not perform as well in a multi-user scenario and is connected with higher licence costs.

The maximum size of an Access database is 2 GB, which could be a limiting factor in the long term, because of (a) the much faster growth of the database in multi-user environments due to its internal change tracking (footnote 7), and (b) the possible onset of sub-annual data collection.

As 33 out of 34 respondents of the questionnaire expressed the opinion that the database should be accessible by a web browser, using MS Access as the data storage for an entirely new development is an inferior choice. It is therefore not included in the following analysis. Please note that this statement is not about extending the existing RID DB, but about selecting a backend for an alternative solution.

The other 3 database systems, however, perform equally well on the expected data amounts and user numbers. The details that lead to this conclusion can be found in Appendix A.1. The following table summarises the information given there.
                      MySQL                   MS SQL              PostgreSQL
Performance           good                    excellent           good
License costs         free                    free (footnote 3)   free
Operating System      Windows, Linux, UNIX    Windows             Windows, Linux, UNIX
Administration effort low                     high                moderate
Spatial Data (GIS)    basic support           full support        full support

3.2 Frontend options

A frontend is what the user sees: it determines the graphical user interface, typically having buttons, input boxes, information displayed in tables etc. In the following paragraphs, three options are presented. They are not to be confused with the three approaches discussed elsewhere in this document. Here are the general options for frontend creation:

(1) A native application, typically a Windows program
(2) A web application
(3) A mixture of both

(1) In the most common scenario, a Windows program needs to be installed locally. This means that every user of the new RID DB has his or her own installation, and typically expects its data to be up to date. A Windows application could connect to a central server via the web to synchronize the local copy of the data with the central server. Well-designed native applications have speed benefits over pure web applications, but require more attention from the IT department within every institute the application is used in. This would limit the reach to users that depend on the database and might deter potential users.

(2) In a web application, everything is managed within a browser. Therefore, there is no installation effort for end users. This option is considered in approach (A3). It makes the database very easy to access, but might feel slower for certain data-centred tasks.
Footnote 3: for SQL Server Express Edition

(3) To relieve this problem and avoid the problems of the native application, one can split the user requirements: a core user group uses a native application (ideally with better speed), while occasional users or users within restrictive IT environments use a web application. Using this architecture, locally installed software could be used for importing and Windows-centric tasks like GIS integration. An additional web application would be specialized for data presentation, data downloads and reporting. The existing Access application could partly be reused and therefore the development effort could be reduced. On the other hand, two different systems would have to be maintained, which will increase the long-term costs. This option is used for approaches (A1) and (A2).

For a detailed list of pros and cons of web applications, the reader is referred to Appendix A.2. We believe that for a new development without the use of existing source code, option (2) carries the most benefits, the least risk and the smallest overall costs, especially when including the internal costs within the Contracting Party.

If a fresh start is chosen, it is therefore recommended to develop the database user interface as a complete web application. For this task, a content management system like Drupal (footnote 4) or a framework like DevExpress eXpressApp Framework (XAF) (footnote 5) could be used instead of starting from scratch. Drupal is free of charge and supports the three web DBMS backends mentioned above. More details on the development process of a future RID DB with Drupal can be found in Appendix A.3. XAF requires a one-time development licence (ca. 1604 €), and supports all above-mentioned database systems.
Instead of using a framework, a web application could also be developed from scratch, with much more effort but also more possibilities for customization. The best choice of programming language and framework depends on the experience of the software development team, but the situation is similar for other available frameworks and programming languages.

Footnote 4: http://drupal.org
Footnote 5: https://www.devexpress.com/products/net/application_framework

4 Approach comparison

In this section, technical explanations and effort estimations are presented for each question of Step 1. The two approaches (A1) and (A2) of extending the current Access database and approach (A3) of creating a new web application are compared.

4.1 Implementation effort

In the tables below, the effort column lists the estimated technical effort (time) needed to implement and test (footnote 6) the feature. It does not take into account the effort to create end-user and IT documentation, because the time needed for documentation is similar for most functions. Also, the effort column does not include how much work the change entails for the IT department of the Contracting Party.

Before the implementation can begin, it is recommended to craft a detailed requirements document. The effort for describing the feature in such a document is crucial for the success of the overall project, but is not part of the implementation estimate given below.

Here, only a qualitative statement is given due to the high uncertainty of individual tasks. The exact hours depend on factors such as past experience and organisational overhead (e.g. a project done by 1 person during 4 years can’t be sped up to be finished in one year by 4 people.
Instead, an overhead of 50-100% for design, internal communication needs and quality assurance is to be expected). Nevertheless, here are the approximate man months for the qualitative terms:

Effort statement   Man months without overhead
small              0.0 – 0.3
medium             0.3 – 1.2
high               1.2 – 1.8

In Section 4.3, the efforts for the questions are quantified by topic.

Footnote 6: Here, test refers to the individual function being tested by the developer or someone with a similar mindset. It ensures that the function works as the developer intends it to work. Only if proper documentation or communication exists will this also ensure that the function works the way the end user needs it. It does not refer to integration tests (all functions coming together) or user experience tests.

4.2 Summary table

The following headings group the assessment by questionnaire topics. Note that approaches (A1) and (A2) are not evaluated separately, because from the programming perspective they are almost identical, with the exception of an additional synchronization method for approach (A2).

4.2.1 Part 1: Tables of RID database

Question | Prioritization | Available in existing RID DB
1 Is it preferred to pool all monitored rivers in one comprehensive table? | Need to have | no
2 Should the set of separate RID tables be expanded? | Nice to have | --
3 Which additional RID tables are needed? | Nice to have | --
4 Should monitoring data used for RID calculations be stored in the RID database? | Need to have | no
Should RID load data at less than annual aggregation be stored in the RID database? | Need to have | no

What needs to be done to extend the Access RID DB?
Effort to extend Access DB Effort for new web application Within the RID DB, no distinction is made between main rivers, tributary rivers, areas or point sources. This distinction only exists for the import files. The import workflow including templates and its messages needs to be updated. The main work here would be for the data providers who have to change their reporting procedures and data collection systems. small included in minimal design Expanding the tables affects both the import and the export. It would also need comprehensive documentation for the data providers and split-up of existing tables. This task would likely also make it necessary to communicate with Bioforsk to make sure that the required accuracy is reached. medium less than Access The current DB design is not meant for less-than-annual data. Import, export and data storage would need to be changed. This change also implies a larger planning step to make sure that the frequency is either flexible or fixed correctly. medium about half of Access QuoData GmbH Developing the Database for the Comprehensive Study on Riverine Inputs and Direct Discharges (RID) Step 2 – Evaluate database model and outline overall requests on data submission, data access and database interface 4.2.2 Part 2: Access to the RID data Question Prioritization Available in existing RID DB 1 Should the RID database have the functionality to exchange data with other databases? Need to have no 2 Is there a need to allow easy access to data and charts by a webbrowser? Need to have 3 Is there a need to allow easy access to data and charts by smartphone or tablet? Minority wish QuoData GmbH What needs to be done to extend the Access RID DB? Effort to extend Access DB Effort for new web application A connection to another database cannot be assessed without proper documentation. It is likely that a complete link would be very hard to create based on the assumption that data is likely to be not fully compatible. 
Access can, in theory, connect to online databases to retrieve or submit data. This would require an internet connection for Access, which is sometimes not granted by the respective IT department. Currently, the RID DB supports neither remote data sources nor a generic import or export format. Indeed, for some data, no export functionality exists in the current RID DB.
Effort to extend Access DB: medium. Effort for new web application: medium.

Available in existing RID DB: no
Access doesn’t provide an easy way to make its interface available online. While a web frontend could be built on top of the current database, it would mean a tremendous effort, because the existing VBA code is meant for a locally installed Access. It would be almost impossible to use more than the idea and the structure of the existing code, because a web frontend would not use VBA (see Step 2B). Also, Access doesn’t make its VBA code directly available to other programming languages. The only sensible way would be to use intermediate communication tables that store VBA input or output temporarily.
Effort to extend Access DB: high. Effort for new web application: included in minimal design.

Available in existing RID DB: no
Smartphones and tablets can read Excel files, but not Access files. Thus, a data export would be needed before copying the data to the mobile device. This is, however, very impractical because a desktop Access installation would be needed. Alternatively, the web interface of question 2 would be accessible by a mobile device. Unless the web interface is intended to be used on mobile devices, it would be cumbersome to use.
Effort to extend Access DB: medium. Effort for new web application: medium.

4.2.3 Part 3: Submission of data into the RID database

Question | Prioritization | Available in existing RID DB
1 Submitting data via importing the data file? | Need to have | yes
This is currently implemented.
Submitting data via transferring the data by “copy & paste”?
Nice to have no Excel file format for submitting data? Need to have Comma separated values (CSV) file format for submitting data? Nice to have 2 Page 20 of 37 What needs to be done to extend the Access RID DB? Effort to extend Access DB Effort for new web application available see below The RID DB can produce empty template files. These files can be filled with copy and paste already, but no format detection or column assignment method exists at the moment. Implementing such a mechanism would reduce the import effort dramatically, but would not be sensible because it would imply that the data provider has a copy of the Access RID DB. This copy would then need to be synchronized with the main DB. The institution responsible of collecting data files, e.g. Bioforsk already receives files in the correct Excel format, ready for import. They would probably refuse undocumented formats to use such a smart “copy & paste” mechanism, as the Excel template already serves as both documentation of the data meaning and as import format. small small if file-based import exists, otherwise medium yes Currently, the RID DB only accepts files opened in Excel. The file format itself is merely a container. Excel files increase the interoperability because it avoids decimal separator conflicts. Also, the RID DB produces native Excel templates (empty files). For the RID DB to support CSV of the exactly same format as the Excel file, the Effort of implementing this task is estimated as 1. Another fundamental question is the content format detection and column assignment discussed above. available medium no Some participants suggested using a tab-separated format. From the implementation effort, it is no different to supporting CSV files. 
small small if file-based import exists, otherwise medium QuoData GmbH Developing the Database for the Comprehensive Study on Riverine Inputs and Direct Discharges (RID) Step 2 – Evaluate database model and outline overall requests on data submission, data access and database interface Question Prioritization Available in existing RID DB What needs to be done to extend the Access RID DB? Effort to extend Access DB Effort for new web application Which other formats should be used for data submission? Minority wish no Support of generic XML files would be very time-consuming, but supporting a well-specified format would be easily realizable. Some respondents asked for Access as an import format. Unless the DB format is well-specified, this would be effortful to implement. tabseparated: small; XML: high small if file-based import exists, otherwise medium 3 Should be given the opportunity to submit partial RID datasets? Need to have yes The current RID DB already supports partial imports. However, no feedback or warning mechanism exists for the user to see that not all areas, rivers, or substances have been imported. available no additional effort 4 Should there be a possibility to add information on the ‘measure of uncertainty’? Need to have no Supporting the uncertainty import would imply a change of the current reporting format, leading to time-consuming changes in the data provider’s data collection infrastructure and scientific underpinning/verification. From the IT perspective, a pure import and storage would not be a very difficult task. Measurement uncertainties should be importable as absolute and relative values. The reporting and export would also need to be changed, requiring a comprehensive planning stage. medium medium Should there be a possibility to add additional comments on data (e.g. flagging when data is missing or suspicious)? Need to have no The meta data is currently not stored in an accessible way. 
The main task would be defining which objects and data structures can be commented on, e.g. rivers, individual concentrations or loads, entire years or catchment areas. Another issue is whether comments have to be imported or can be inserted later. Who is allowed to comment? Where should the comments appear? After completion of the planning, the implementation is straight-forward. medium medium QuoData GmbH Page 21 of 37 Developing the Database for the Comprehensive Study on Riverine Inputs and Direct Discharges (RID) Step 2 – Evaluate database model and outline overall requests on data submission, data access and database interface 4.2.4 Part 4: Validation of imported data Question Prioritization Available in existing RID DB 1 Covering quality control procedure for missing values? Need to have no Covering quality control procedure for invalid values (e.g. wrong units in concentration values)? Need to have no Covering quality control procedure for suspicious values? Need to have Covering quality control procedure for too many significant figures? Which further automatic tests should be taken into account? 2 Page 22 of 37 What needs to be done to extend the Access RID DB? Effort to extend Access DB Effort for new web application small small small small no small small Need to have no smallmedium smallmedium Nice to have no medium per test medium per test The detection of these values is straightforward. To detect such values, one or more statistical models and tests are needed. Due to the nature of these tests, a probability of false alarm exists. Therefore, the test should merely be a flag, advising the user about the problem, but not stopping him from importing the data. 
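
The flag-only behaviour described for Part 4 can be sketched as follows. This is a minimal illustration, not the consultant's proposed implementation: the function name `validate_values`, the rule that negative values count as invalid, and the use of a median-based modified z-score with a threshold of 3.5 for suspicious values are all assumptions chosen for the example. The report itself only states that one or more statistical tests are needed and that a flag must not stop the import.

```python
from statistics import median

def validate_values(values, threshold=3.5):
    """Return a list of flags for every value; flags never block the import.

    Assumptions for this sketch: None marks a missing value, negative
    numbers are treated as invalid, and "suspicious" is decided with a
    modified z-score based on the median absolute deviation (MAD).
    """
    # Only plausible values feed the statistics, so that missing or
    # invalid entries cannot distort the outlier test.
    valid = [v for v in values if v is not None and v >= 0]
    med = median(valid) if valid else None
    mad = median([abs(v - med) for v in valid]) if valid else None

    flags = []
    for v in values:
        entry = []
        if v is None:
            entry.append("missing")
        elif v < 0:
            entry.append("invalid")
        elif mad and abs(0.6745 * (v - med) / mad) > threshold:
            # A false alarm is possible, so this is advice to the user only.
            entry.append("suspicious")
        flags.append(entry)
    return flags

if __name__ == "__main__":
    # Made-up demonstration values, not RID data.
    demo = [10.0, 11.0, 9.5, None, -1.0, 10.5, 250.0]
    for value, entry in zip(demo, validate_values(demo)):
        print(value, entry or "ok")
```

Because every value receives a (possibly empty) list of flags, the import can always proceed and the data provider decides how to react to each flag.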
QuoData GmbH Developing the Database for the Comprehensive Study on Riverine Inputs and Direct Discharges (RID) Step 2 – Evaluate database model and outline overall requests on data submission, data access and database interface 4.2.5 Part 5: Functionality Question Prioritizatio n Available in existing RID DB 1 Should it be possible to filter, sort and search entries (for example select parameters, years, areas) from the different parts of the RID tables? Need to have no 2 Should it be possible to customize e.g. the range of geographic areas (e.g. across sub-regions) from which data are to be aggregated in data products? Need to have no 3 For which particular features you would like a ‘customization’ option? Nice to have no 4 Exporting data via Excel file? Need to have yes Exporting data via CSV file? Need to have no Exporting data via XML file? Nice to have no Which other formats should be used for data table export? Minority wish no Should the RID database be able to link to a GIS system so as to create maps? Need to have no 5 QuoData GmbH What needs to be done to extend the Access RID DB? Effort to extend Access DB Effort for new web application Access provides functionalities to filter, to sort and to search the data records. These are only available direct in the database view or in the development view. Therefore it is better to let the end user directly use the database and not just the form frontend. smallmedium smallmedium Options for customizing data aggregation could be implemented in Access. Therefore it would be necessary to alter the frontend and the data queries in the backend. medium medium Access provides the export capability for Excel files and also CSV files. Therefore it is easy to implement features regarding the export that are not yet in the RID DB. These export functions are the easiest to implement. Export to the formats XML and ACSII is possible too, but not demanded by a majority of users. 
PDF export can be established through report functions in Access and some open-source software. available small small small medium small PDF: small PDF: small ArcGIS: small-high small-high By the use of OLE DB driver, Access can connect to ArcGIS. This can be used to create maps out of the tables, but would lead to additional license costs. Please note that the RID DB does currently not store geographic shapes. Page 23 of 37 Developing the Database for the Comprehensive Study on Riverine Inputs and Direct Discharges (RID) Step 2 – Evaluate database model and outline overall requests on data submission, data access and database interface 4.2.6 Part 6: Additional features Question Effort to extend Access DB Effort for new web application available small small small yes available small Need to have no small small Should level of the annual aggregated loads be included in the RID database? Need to have no small small 3 Should a separate module for trend analysis (e.g. RTrend software) be part of the RID database capabilities? Need to have yes A separate module for trend analysis could be implemented in the RID DB. This functionality already exists with the RTrend Software. However, there is only an interface and an indirect data exchange by opening an EXCEL file which was created beforehand in the RID DB in RTrend. small for testing and improving existing interface 4 Should some of the fields of the ‘text reporting format’ be incorporated into the RID data submission and the RID database? Need to have no Adding meta data to the RID DB could be managed. To add meta data it is necessary to extend the data model of the RID DB. This can be done with access internally. Type and coverage of the meta data stored in the RID DB have to be specified later. medium 2 Prioritization Available in existing RID DB Should LOD be included in the RID database? Need to have yes Should LOQ be included in the RID database? 
Need to have no
Should flow measurements be included in the RID database? Need to have
Should proportions of measurements above/below LOD and LOQ be included in the RID database?
What needs to be done to extend the Access RID DB?
The decision about the features also depends on the ability of the data providers to provide data for the requested parameters. Based upon this ability, these additional features can be implemented by expanding the data model of the RID DB and thus can be done within Access.
medium for RTrend or generic Excel file creation
medium for web trend analysis without external tools
medium

4.3 Read-only web frontend for approach (A1) and (A2)

97% of questionnaire respondents want browser-based access to “data and charts”. This is a vague statement. To be able to supply an effort estimate, the following functions are proposed:

• Table of loads or concentrations per source and year (filterable and searchable, so e.g. only one substance for all years and sources, or all substances for one year etc.)
• Table of inputs for an aggregate of sources (e.g. multiple rivers) or catchment area
• Time series charts per substance, with a selectable source or aggregate of sources
• Chart and table of trend assessment and load adjustment (based on RTrend)

When choosing approach (A1) or (A2), we propose to limit the web capabilities to read-only access, because

• only one role is needed, which makes implementation and administration easier
• import and data validation already exist in the Access DB, and can be extended more quickly
• quick growth of the Access DB is avoided (footnote 7).

The web frontend may use an Access DB copy as its data source to avoid adding complexity.
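
The first proposed function, a filterable table of loads served from a read-only database copy, might look roughly like the sketch below. It is only an illustration under stated assumptions: SQLite stands in for the Access DB copy, and the table `loads`, its columns, the helper names `build_demo_copy` and `query_loads`, and all demo values are hypothetical and not taken from the RID data model.

```python
import os
import sqlite3
import tempfile
from pathlib import Path

def build_demo_copy(path):
    """Create a tiny stand-in for the database copy (hypothetical schema, made-up values)."""
    con = sqlite3.connect(path)
    con.execute(
        "CREATE TABLE loads (source TEXT, substance TEXT, year INTEGER, load REAL)"
    )
    con.executemany(
        "INSERT INTO loads VALUES (?, ?, ?, ?)",
        [
            ("River A", "Cd", 2010, 1.2),
            ("River A", "Cd", 2011, 1.0),
            ("River B", "N", 2010, 300.0),
            ("River B", "N", 2011, 310.0),
        ],
    )
    con.commit()
    con.close()

def query_loads(path, substance=None, year=None, source=None):
    """Return load rows matching the given filters, using a read-only connection."""
    clauses, params = [], []
    # Column names come from this fixed tuple; only the values are user input.
    for column, value in (("substance", substance), ("year", year), ("source", source)):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    # mode=ro makes SQLite reject any write attempt through this connection.
    con = sqlite3.connect(Path(path).resolve().as_uri() + "?mode=ro", uri=True)
    try:
        sql = "SELECT source, substance, year, load FROM loads" + where
        return con.execute(sql + " ORDER BY source, year", params).fetchall()
    finally:
        con.close()

if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "rid_copy.db")
    build_demo_copy(demo_path)
    # One substance for all years and sources.
    print(query_loads(demo_path, substance="Cd"))
```

Opening the copy read-only mirrors the single-role design proposed above: the web layer can filter and display data, while import and validation stay in the desktop application.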
Footnote 7: Since Access is a file-based solution, it handles possible concurrent write operations by reserving space, which in practice means it grows quicker than TCP/IP-based databases.

4.4 Synchronisation methods for the distributed copies in approach (A2)

The motivations for using approach (A2) are the following:

• Existing parts of the Access database can continue to be used (no data migration, re-programming or testing).
• Currently, only one organisation has access to all functions of the database. If approach (A1) is taken, this might remain true, whereas approach (A2) makes the database available to all Contracting Parties. Thus, more users could benefit from existing and future functions.
• The time-consuming import is done by the data providers themselves, so errors are found quicker and feedback is more direct. This is done by distributing copies of the Access DB to the CPs.

The last point requires synchronisation of the database copies. Since old data is not modified and each CP only adds their own new data, programming the synchronisation can be done, for example, with Jet Replication. Additionally, the keeper of the master copy needs a way of checking which data was already imported, to avoid forgetting the data of one CP. This check can be done by a “close year for import” function, which also publishes the data on the read-only web frontend as official. This leads to the following workflow proposal:

1. Access database copies are sent out to the CPs after final approval of the development.
2. Each CP imports their data using their copy of the Access DB.
3. The modified copy is made available to the keeper of the master copy, who also runs the web frontend.
4.
The keeper imports data into the master copy.
5. The software will check if all data was imported when the keeper closes the year for import.
6. The master copy is again distributed to the CPs for usage and import of next year’s data.

4.5 Summary of effort by topic for the three approaches (Step 2D)

In the following tables, three approaches are compared:

(A1) Access database with a read-only web frontend for data presentation and data downloads. Only one copy exists, into which only one organisation (footnote 2) imports data using a non-web import module.
(A2) Access database with a read-only web frontend for data presentation and data downloads. All data providers have a copy of the database, which they can use for importing, reporting and GIS integration. All copies are synchronised by only one organisation (footnote 2).
(A3) Redevelopment for the web. All functions are usable with a browser, including the data import by the CPs.

When extending the Access DB, most user requirements would be implemented using VBA, which means they are not accessible from the Internet (see Section 3.1). Only the functions explicitly asking for web browser support (footnote 8) would be made available from the Internet using a non-VBA solution.

For approach (A3), developing a new web-based application, the estimates are based on using the XAF framework. This implies that all functions are usable from any browser via the Internet. Experts in web programming are already easier to find than VBA experts. This and other factors lead us to the conclusion that approach (A3) is more suitable for a long-term solution that can be adapted to future decisions and currently unknown needs with less effort.

The most important difference is how the data providers experience the RID DB: In case of approach (A1), the data import needs to be done by one host (that was the role of Bioforsk). Within approach (A2), a mechanism for synchronising distributed Access DB copies has to be tested, documented and executed by the CP.
While the former will likely result in slightly higher recurring costs, the latter is associated with a one-time effort of about 1-2 man months. With the web-based approach (A3), the CPs can import the data themselves, with immediate feedback (e.g. flagging of problematic/missing data) and thus less communication effort for synchronisation or Excel file exchange.

Footnote 8: e.g. question 2.2 from Step 1: “Is there a need to allow easy access to data and charts by a web-browser?”

Criterion | (A1) | (A2) | (A3)
Implementation costs for future requirements | higher | highest (footnote 9) | lower
Functionality exposed to remote users | little | little | all
Existing Access application could partly be reused (smaller development, testing and documentation effort) | yes | yes | no
Recurring effort for CPs and OSPAR Secretariat to keep data up to date | high | medium | small

The man months below are rough estimates, not precise quantities. For the estimations, the HELCOM efforts (e.g. cooperation or an interface) were ignored. The table below is based on the Step 1 questionnaire where indicated. Please note that the questionnaire was kept brief on purpose, and there is a chance that not all user requirements have been captured. Minority wishes and the comments of open (free-text) questions have not been included in the table below.
9 Several MS Access versions have to be supported, new software versions have to be distributed

QuoData GmbH Page 27 of 37

Developing the Database for the Comprehensive Study on Riverine Inputs and Direct Discharges (RID)
Step 2 – Evaluate database model and outline overall requests on data submission, data access and database interface

Topic                                                                     (A1)  (A2)  (A3)
Detailed specification of data model; recommended (additional)
IT documentation for status quo                                            0.2   0.2   0.2
Implement data model, restructure database, move to new platform,
move data to new database
  Initial implementation, migration of existing data                         -     -   2.5
  Need to have: questions 3.4a, 6.2, 6.4                                   1.2   1.2   1.3
  Nice to have: questions 1.2, 1.3                                         1.4   1.4   1.3
Synchronising distributed Access DB copies                                   -   1.5     -
Review, design and implement new reporting form(s)
  General planning and specification of import workflow                    0.4   0.4   0.3
  Need to have: questions 3.1, 3.2a, 3.2c                                  0.2   0.2   1.2
  Nice to have: question 3.2b                                              0.1   0.1   0.3
Set up quality assurance system for data reporting
  General planning and design of data validation including
  mathematical details                                                     0.6   0.6   0.6
  Need to have: questions 3.4b, 4.1                                        1.0   1.0   1.3
  Nice to have: question 4.2                                               0.4   0.4   0.4
Operationalize the web application for reporting and quality-checking
of national data
  Integration testing and adaptations                                      2.0   2.0   2.8
  Hosting, setup, approval testing by OSPAR                                1.5   1.5   2.5
  Documentation: maintenance and end-user                                  1.0   1.0   1.1
Set up public web application for users to view, graphically display
and download data
  Need to have: questions 2.2 = Read-only web interface                    1.5   1.5     -
  Need to have: questions 2.3, 5.1, 5.2, 5.4a-b, 5.4d, 6.3                 1.7   1.7   1.5
  Nice to have: question 5.4c (XML)                                        0.7   0.7   0.5
Other reporting and analysis tools not covered in the questionnaire          2     2     4
Integration in OSPAR information system/GIS etc.
  Questions 2.1, 5.5: minimal well-defined data link (does not cover
  creating a GIS system or map-based user interface)                       0.5   0.5   0.5
Organisational overhead (meetings, web conferences)                          2     2     2
Σ                                                                         18.4  19.9  24.3

5 OSPAR and HELCOM: Pros and cons of different database designs (Step 2D)

Below, advantages and disadvantages of the following database designs are discussed:
1. one commonly used database for OSPAR and HELCOM
2. two separate databases with a common structure
3. independent, standalone RID database, i.e. keeping the existing RID database and implementing a new user interface on top of it.

Requirements to be compared

Database content (main tables for content items and metadata)
  One commonly used database for OSPAR and HELCOM: same
    + consistent content between HELCOM and OSPAR
    + consistent metadata between HELCOM and OSPAR
    - needs to be clarified (between HELCOM and OSPAR) which content and metadata shall be of interest
    - needs to be clarified (between HELCOM and OSPAR) which relations between the tables shall exist
  Two separate databases with a common structure: same
  Independent, standalone RID DB: different
    - inconsistent content possible
    - inconsistent metadata possible
    + no administrative expenditure to harmonize the tables, their definitions and attributes
    + import format doesn’t change, CPs don’t need to adjust

Backend (part of application which defines functionality and logic and which is not seen by user)
  One commonly used database for OSPAR and HELCOM: same
    + one-time development work for backend (one web server, one database management system)
    + consistent tables, relations and procedures
    - needs to be clarified (between HELCOM and OSPAR) which backend shall be chosen
  Two separate databases with a common structure: same
  Independent, standalone RID DB: different
    - double development work for backend (two web servers, two database management systems)
    - inconsistent tables, relations and procedures possible
    + no administrative expenditure to harmonize backend choice

Frontend (graphical user interface)
  One commonly used database for OSPAR and HELCOM: same
    + one-time development work for frontend
    + consistent forms and reports
    + same data query
    - needs to be clarified (between HELCOM and OSPAR) which frontend shall be chosen
  Two separate databases with a common structure: different
  Independent, standalone RID DB: different
    - double development work for frontend
    - backend becomes costly if the functionality of the frontend is not clear
    - different forms and reports
    - different user interface
    + optical differentiation between different areas of responsibility of HELCOM and OSPAR
    + no administrative expenditure to harmonize frontend choice

Data submission/data entry
  One commonly used database for OSPAR and HELCOM: same
    + consistent import tables for HELCOM Plus and OSPAR RID
    + same procedure for Contracting Parties with Baltic coast and North-East Atlantic coast (no double expenditure due to only one data format)
    - needs to be clarified (between HELCOM and OSPAR) what the import tables and import designs look like
    - needs to be clarified (between HELCOM and OSPAR) what shall be imported (measurement uncertainty, monthly data, methodology of calculation, calculation factors, GIS data etc.)
    - needs to be clarified (between HELCOM and OSPAR) what the different definitions (e.g. sub-regions, areal and point source definitions) mean precisely
  Two separate databases with a common structure: same
  Independent, standalone RID DB: different
    - inconsistent import tables possible
    - double expenditure for Contracting Parties
    + no administrative expenditure to harmonize import tables and content of import tables

Data verification (quality assurance)
  One commonly used database for OSPAR and HELCOM: same
    + same automatic quality control
    + same verification tools (same algorithms for detection of format errors, missing values, suspicious values, invalid values or duplicates)
    + same statistical calculations (e.g. the detection of outliers highly depends on the statistical test applied)
    + same documentation (e.g. quality reports, messages, flagging of data)
    + same correction methods
    + comparability of HELCOM Plus data and OSPAR RID data
    - needs to be clarified (between HELCOM and OSPAR) which verification tools shall be used, which different kinds of documentation shall be inserted and which correction methods shall be carried out
  Two separate databases with a common structure: same
  Independent, standalone RID DB: different
    - likely different verification tools
    - data providers have to consider two different QA steps (verification tools, documentation systems and correction methods)
    - comparability of data can be slightly limited due to different QA
    + no administrative expenditure to harmonize QA

Data output (data analysis, graphs, trend charts, GIS maps, results tables, reports)
  One commonly used database for OSPAR and HELCOM: same
    + easy access to marine environmental data for institutes and authorities, Contracting Parties or scientists
    + compatibility of output data (same formats)
    + comparability of HELCOM Plus data and OSPAR RID data
    + corresponding tools need to be implemented just once
    - needs to be clarified (between HELCOM and OSPAR) which data shall be exported and how
  Two separate databases with a common structure: different
  Independent, standalone RID DB: different
    - users need to access two different systems
    - different output formats
    - user has to learn to read different graphs, charts and tables (e.g. different axes, legends, displays, colours)
    - corresponding tools need to be developed twice
    + no administrative expenditure to harmonize data export
    + specific requirements can be taken into account (e.g. different reporting or different GIS maps are required or other trend charts are of interest)

User rights
  One commonly used database for OSPAR and HELCOM: same
    + clear superior user groups with different rights, for instance data users, data providers, data managers and IT administrators
    + only one group for data managers and IT administrators
    + only one login for data users and data providers
    - needs to be clarified (between HELCOM and OSPAR) who is allowed to do what and who are data managers and who are IT administrators
  Two separate databases with a common structure: same
  Independent, standalone RID DB: different
    - different user groups possible
    - data manager and IT administrators for each system
    - double login data for data users
    + clear separation between HELCOM Plus and OSPAR RID data

Guidelines/Instructions
  One commonly used database for OSPAR and HELCOM: same
    + consistent documentation
    + only one documentation for users
    - consultations between HELCOM and OSPAR with regard to responsibilities in writing (administrative overhead)
  Two separate databases with a common structure: same/different
    + same backend manual
    - different frontend manual
  Independent, standalone RID DB: different
    - users need to read different instructions
    + no consultations necessary
    + specific features can be focused on (e.g. HELCOM features that are irrelevant to OSPAR)

OSPAR and HELCOM have partly different thematic focuses and also different requirements. This also applies to the application of the OSPAR RID database and the HELCOM PLC database.
If the focuses and requirements could be harmonized, an integrated common database would be an alternative. The advantages of such a common system for data users and data providers are obvious, but the comparatively high costs of harmonization have to be taken into account. Agreements have to be reached, among other things, on
• Methodology of sampling and monitoring
• Definitions, e.g. unmonitored areas
• Methodology of verification of data
• Consideration of LOD/LOQ, measurement uncertainty
• Methodology of the estimation of loads
• Methodology of the quantification of loads from unmonitored areas
• Methodology of trend analysis
• Principles of source apportionment
• Report structure
It also has to be taken into account what happens if the focuses of OSPAR and HELCOM diverge in 10 years. It always has to be ensured that neither OSPAR nor HELCOM is given preference. Thus, the administrative effort should not be underestimated.
For small and customized applications, integration is recommendable. For larger solutions, however, an adequate examination has to be carried out, which may result in two different databases with one common structure. The ICES Data Centre could also be a possibility as a web-based database platform.

A Appendices
A.1 DBMS backend possibilities comparison
A.1.1 Performance
If data is entered monthly instead of yearly, the size of the database will consequently grow faster. Therefore, the chosen DBMS should be able to handle several million datasets, as well as total database sizes of several gigabytes, without major impact on performance. All of the compared databases fulfil these requirements.
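As a rough illustration of the growth described above, the number of stored values scales linearly with the reporting frequency. The sketch below is a back-of-the-envelope estimate only: the function name and the station, parameter and year counts are hypothetical placeholders chosen for illustration, not figures taken from the RID database.

```python
def estimated_records(stations: int, parameters: int, years: int,
                      periods_per_year: int) -> int:
    """Number of stored values for a complete reporting history."""
    return stations * parameters * years * periods_per_year


# Hypothetical placeholder sizes (assumptions, NOT figures from the RID database).
STATIONS = 500
PARAMETERS = 20
YEARS = 25

yearly = estimated_records(STATIONS, PARAMETERS, YEARS, periods_per_year=1)
monthly = estimated_records(STATIONS, PARAMETERS, YEARS, periods_per_year=12)

print(f"yearly reporting:  {yearly:,} records")
print(f"monthly reporting: {monthly:,} records")
print(f"growth factor:     {monthly // yearly}x")
```

With these placeholder values, yearly reporting stays in the hundreds of thousands of records, while monthly reporting reaches the "several million" range mentioned above.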
While MS SQL exceeds both MySQL and PostgreSQL in performance, this advantage only starts to become important with databases which are much larger than the database in question.

A.1.2 License costs
MySQL:       free
MS SQL:      Express Edition: free, but imposes minor restrictions
             Standard Edition: ca. 1,300 € per core, or ca. 650 € per server + 150 € per CAL
             Business Intelligence Edition: ca. 6,300 € per server + 150 € per CAL
             Enterprise Edition: ca. 5,100 € per core
PostgreSQL:  free
As the table shows, the licensing fees for the products differ considerably. Both MySQL and PostgreSQL are completely free. MS SQL offers different pricing options and editions. The restrictions of the free Express Edition noted above are unlikely to be limiting for the project. The per-core pricing options require the purchase of a minimum of 4 core licenses per physical processor. The pricing options which include a CAL (Client Access License) mean that additional fees have to be paid according to the number of users who access the system. Therefore, this option is unsuitable for web applications which are accessed by more than a few dozen users.

A.1.3 Operating system requirements
While MS SQL only operates on Microsoft Windows systems, MySQL and PostgreSQL are available for many common operating systems. The monthly hosting costs for a Windows server are slightly higher than for a server working with Linux. All DBMS have modest hardware requirements.

A.1.4 Administration effort
The configuration and maintenance of MySQL is rather simple, and many tools and much documentation are available, as the system is commonly used. The configuration of a PostgreSQL database is slightly more complex, which increases the initial effort.
On the other hand, this provides more possibilities to optimize the performance. MS SQL is the most complex of the analysed DBMS. The administration effort is especially important for the long-term maintenance of a future RID DB, since OSPAR would have to budget for backup and security update tasks. From that perspective, MySQL should be the preferred DBMS.

A.1.5 Spatial Data (GIS)
Both MS SQL and PostgreSQL provide support for spatial data with a large number of built-in functions. Current versions of MySQL also support spatial data to a certain degree, but the set of available functions is much more limited. Therefore, the performance of requests which depend on geographical information could be worse than on the other systems. Even if GIS data plays an important role for a future RID DB, this is a very minor point, because all functions can be written manually during the web application development if the DBMS doesn’t support them.

A.2 Web database vs. local database software
In this section, the most important advantages and disadvantages of web applications in comparison with classical MS Office desktop applications (such as the current RID-DB) are listed. This elaboration shall provide a basis for deciding which kind of frontend could be realized.

A.2.1 Advantages of web applications
• No installation of software on the personal computers
  - Outdated software on a personal computer is not possible
  - Updates can be installed centrally during operation
  - No effort for local CP IT administrators
• Low costs per workstation
  - No license costs per workstation for specific software
  - Lower hardware requirements as complex calculations are carried out by the web server
• High data security
  - The storage of all data on the web server allows central backups and, if necessary, the central restoration of a backup
  - Web server can be protected by means of firewalls, access controls etc.
    in order to secure sensitive data
  - Data can easily be encrypted. If the data is stored locally, an encrypted infrastructure (e.g. for e-mails) is required, which would be technically much more complex.
  - Different user roles guarantee that each user can only edit the data intended for him or her. Breaches are less likely and can be detected with log files.
• Higher data quality
  - Reports can be created based on the current data due to the storage on the web server
  - Error-prone data transmission via e-mail is no longer needed
  - Format conversions are not required, unlike in the case of synchronization with locally stored data (master-client architecture)
• Easier maintenance of the master data
  - Master data are maintained and modified centrally in the web server database
• Location independence of users
  - Users can access the database from any computer
• Correct date and time specifications at all times
  - Because the date and time information of the web server is used, correct and consistent values for date and time are always ensured (record changes, log books)
• Low-priced development and maintenance
  - Know-how for web applications is currently less expensive than know-how for desktop applications
  - Simple changes can be implemented more quickly, as compilation of source code is usually not necessary
  - Integration with other web-based technologies is easier, e.g. links to other webpages (reference to help or external webpages) or links to other web servers or web-based databases

A.2.2 Disadvantages of web applications
• Internet connection is required for data exchange and data evaluation
  - An unreliable internet connection can affect the work. This can prove challenging during meetings at foreign/rented locations.
• An appropriate web server is required
  - the web server requires more computing power the more users use the application
  - the web server has stronger security requirements if it handles sensitive data
  - licenses might be required for server software unless open-source software packages (e.g. MySQL, Drupal, PHP) are used
• Introduction to working with the user interface is required (MS Office products are more familiar)
• Display of the application in the web browser depends on the browser used
  - the display of the application needs to be checked for each potential browser
  - complex applications occasionally enforce the use of specific browsers
  - possible difficulties with the IT administration of the user’s authority if specific browsers should be used
• Integration with local applications on the personal computer is more complex, for instance the data export to MS Outlook for mailing purposes

A.3 Technical considerations on frontend frameworks using Drupal as an example
Drupal supports the following database systems: MySQL and PostgreSQL, as well as MSSQL and Oracle with additional modules. Like other frameworks, Drupal natively supports a comprehensive user management system, and it is easy to present data in tables with sorting and filter options. There are also existing modules for importing and exporting data in different formats, including the ones asked for in the questionnaire. Common development tasks can be carried out without any programming. For more specialised functions, custom modules have to be written in PHP. The magnitude of the reduction in development time compared to creating a web application from scratch depends on the complexity of the system and how much of its functionality is covered by existing Drupal modules.
For very complex web applications or very inexperienced programmers, there is a small risk that the development time with Drupal exceeds the development time without it (application from scratch). The same applies to all other web application frameworks. If data is entered monthly instead of yearly, the size of the database will consequently grow faster. Therefore, the chosen framework should be able to handle several million datasets without major impact on performance. Drupal can handle this amount of data, but it uses a complex database structure which is highly flexible but affects performance negatively. To a certain degree, this can be compensated for with appropriate server hardware, which would increase the hosting costs. The server hardware should be chosen based on a limit for the user interface response time, measured with a sufficient amount of data.
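The license comparison in A.1.2 can be made concrete with a short calculation. The sketch below uses the approximate MS SQL Standard Edition prices quoted there and assumes, purely for illustration, a single server with one four-core processor and one CAL per user; the function and constant names are hypothetical and the result is only as reliable as the "ca." prices.

```python
# Approximate Standard Edition list prices quoted in A.1.2 (EUR).
PRICE_PER_CORE = 1300
PRICE_PER_SERVER = 650
PRICE_PER_CAL = 150
MIN_CORES_PER_PROCESSOR = 4  # minimum purchase per physical processor (A.1.2)


def per_core_cost(processors: int = 1, cores_per_processor: int = 4) -> int:
    """License cost when paying per core (hardware layout is an assumption)."""
    licensed_cores = processors * max(cores_per_processor, MIN_CORES_PER_PROCESSOR)
    return licensed_cores * PRICE_PER_CORE


def server_cal_cost(users: int, servers: int = 1) -> int:
    """License cost for Server + CAL, assuming one CAL per user."""
    return servers * PRICE_PER_SERVER + users * PRICE_PER_CAL


def break_even_users(processors: int = 1, cores_per_processor: int = 4,
                     servers: int = 1) -> int:
    """Smallest user count at which Server + CAL is strictly more expensive."""
    budget = per_core_cost(processors, cores_per_processor) - servers * PRICE_PER_SERVER
    return max(budget // PRICE_PER_CAL + 1, 0)


if __name__ == "__main__":
    print("per-core license:", per_core_cost(), "EUR")
    for users in (10, 30, 31, 100):
        print(f"{users:>3} users, Server + CAL:", server_cal_cost(users), "EUR")
    print("Server + CAL is more expensive from", break_even_users(), "users")
```

Under these assumptions, the Server + CAL option overtakes per-core licensing at 31 users, which is consistent with the remark in A.1.2 that CAL-based pricing is unsuitable for web applications accessed by more than a few dozen users.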