Data flows in a Shared Environmental Information System

Item: 1.1
7 November 2008
Source: IDS
Subject: On the evolution of data flows - Background

Introduction

In October 2007 the NRCs for Information Systems (part of the EIONET network) met at the EEA premises to discuss the future development of Reportnet. Our understanding of the messages from that meeting was that EEA should not focus on a new development of Reportnet: keep it mainly as it is, do not add new features, but go mainly for updating and refreshing, better processes etc. The Member States (MS) further told us that they were willing to provide access to web services for a number of suitable data flows in which the EEA is interested, if EEA (or any other player at the European level) comes forward with the standards. That message has also been voiced at a majority of the SEIS Country Visits that we have done so far.

The strategic perspective for SEIS is linked to the change in information demand that the European Union faces in the years ahead. From an Information Technology point of view (with reference to that description) SEIS is about moving from Acquis-driven policy demand, which resulted in Compliance assessments, via the more Sector-oriented environmental policy (in which environment was one sector together with Energy, Transport, Agriculture etc.), to an information outcome that responds to the Strategic policies that already exist in the 6th EAP, namely Integrated assessments that will use existing data from the countries as input to models that will provide the results.

Since the 1970s information technology has progressed enormously, and the type of open, integrated and technically interoperable approach that is needed to support the strategic policy demand that leads to integrated assessments is becoming increasingly feasible. But we should also bear in mind that SEIS is about creating a distributed environmental information system. The focus will be on how systems work together, i.e.
adapting the existing systems, and the information that they handle, in the MS to new demands, needs and standards.

Below is described, as a long term objective, a data flow that could be looked upon as the SEIS vision on the input side, i.e. deliveries of data based on SEIS principles. That can only happen from systems that are operating on line at the national level.

Kongens Nytorv 6, 1050 Copenhagen K, Denmark
Tel.: +45 33 36 71 00, Fax: +45 33 36 71 99
E-mail: eea@eea.europa.eu, Web: www.eea.europa.eu
D:\687322950.doc

Long term objective - Deliveries through SEIS on line systems

[Figure: 3b) Delivery happens near real time. Both systems are monitored. (SEIS). Elements shown: online systems on both sides, each with system maintenance and a data protocol; data flow consortium and conceptual data flow model (~6 months - 2 years); quality assurance, packaging, warehouse; near real time DB services.]

The national systems are linked to the European system on line. Both the national and the European (or global) systems are monitored, so when data is changed somewhere the updating will take place simultaneously (i.e. Near Real Time, NRT). Both sides have a team that maintains and manages the flows. The flow becomes automated through a SEIS-induced connection between those teams. We also propose a Data Flow Consortium, a body designed to oversee the model and protocols needed to sustain the flow.

Today this will only work with NRT systems. Systems with a lower flow rate would become more costly and might not have the right amount of resources available to maintain a fully developed system (that might however change in the future).

We also believe that the costs related to package reporting (as today) could be used for other purposes; that it will create possibilities for near real time European services and reporting; that it will make possible data flows between anybody (i.e.
between regions in different countries or towards international organisations); and that it might create a flow of more data than was originally requested in a Reporting Obligation (i.e. the sum of several obligations from different international organisations dealing with the same topic).

We know that countries have shown interest in participating, opening up their systems, for this kind of delivery. Ozone web could be seen as one of the first setups for EEA. Ozone web is now looking into how it can be opened to other international organisations and pass on the near real time data further. From the data flow point of view it would be ideal if that were done already at the source. Strong involvement from the providers will be needed to make this happen, and that could be accommodated by a data flow consortium. Its task is to discuss and decide on common technical solutions in order to facilitate and maintain the specified data flow.

Annexed is a catalogue of the types of data flows that we have identified so far.

How to get to the long term objective

Below we outline a process that is the result of a discussion at the EEA. As background to the process we set up the following principles:

The SEIS concept is about distributed systems, where the responsibility for the quality of data mainly lies with the data provider, i.e. the organisation / country providing the data. The data is stored and accessed as close to the source as possible.

EEA will work on a case by case basis with each data flow. The flows are different and will evolve differently.

The cost/benefit is also important. At all times we should ensure that the cost/benefit ratio is reasonable.

The focus in the proposed projects should be on how systems work together, i.e. adapting the existing systems both at the European and the national level, and the information they handle, to new demands, needs and standards.
We should be active in the SEIS NESIS project and use that project to get information on what kinds of systems exist in the countries, to define the state of play, and to learn more about the country perspective.

We are all equal partners; any organisation (provider or receiver of data flows) is responsible for its own system development and integration towards the other systems, irrespective of whether the flows go regional/regional, regional/national, regional/European or national/European.

EEA will continue to develop Reportnet, and the work will be focused on improving the performance of the present system for the traditional reporting that will continue to take place. Reportnet will also be extended with functionalities that support the SEIS concept of a distributed system, where data is stored as close to the source as possible. In particular this will apply to spatial data.

Taking into consideration the way data is produced at national level, the data flows that Reportnet will handle will be deliveries of data based on Questionnaires (including Compliance Reporting), data that are maintained in offline applications and have to be exported for delivery to Reportnet, and data that are maintained in online systems but, in the beginning, have to be exported for delivery in Reportnet.

In the long run the updated Reportnet should continue being a tool focused on data reporting for compliance assessments and for those data flows that, for cost/benefit reasons, will never evolve into online systems.

EEA is also carrying out an inventory in the countries, through the NESIS project (www.nesis.eu), of online systems that could provide deliverables to the European level. A change from the present situation will only make sense if most of the countries in one topic have their data in online systems.
We should also perform a pilot study for one (or several) specific data flows in order to gain experience and assess what kind of resources are needed to move from the present situation towards the long term objective.

Presently it is difficult to say what the above mentioned development will mean on a practical level. As an example, responsibilities such as quality checking or problem reporting might move towards the European level. There is also the question of resources. The implementation cost needs to be carefully looked at and secured by clear agreements. A small request or implementation from one party might have a serious impact on the other. The Data Flow Consortium could ensure that those issues are addressed correctly.

We propose to start with E-PRTR as a first pilot project.

Date: 29 May 2008
Bernt Röndell, IDS; Jan Bliki, IDS; Søren Rough, IDS

On the evolution of data flows - Annex - Catalogue of deliveries

I. Reflections on deliveries

Reportnet's focus today is mainly on compliance reporting and, to a small extent, on EEA’s voluntary data flows (see above). All datasets look the same from a Reportnet point of view. But if you take into consideration the way data is produced at national level, the picture will be different. Below we try to describe a number of “types” of data flows and at the same time to define in what way the EEA should operate / handle the future development of these types.

Many of the national organisations have integrated quality control and validation functions into their on line systems (described under C below). At the NRC IS meeting the NRCs stressed their interest in having connections directly to those systems instead of exporting data packages and manually cross-checking such deliveries.
Deliveries described under C below are the only type of deliveries that we can move to a higher level of automation, and also towards near real time, without heavy and costly investments.

SEIS is about getting access to and sharing data and information that is handled in information systems and in organisations all over Europe. The “Catalogue” below describes (under D and E) two closely related flows that could be looked upon as the SEIS vision on the input side. Deliveries can only be moved into these kinds of flows from existing on line systems at the national level.

The main reason for discussing the alternatives is the cost implied at national level in order to meet the minimum requirements specified in the legislation / reporting obligation.

II. Catalogue of Deliveries

A. Deliveries based on Questionnaires

[Figure: 1) Reportnet questionnaires. Countries fill in forms directly in Reportnet. Columns: National; Reporting / Reportnet; International. Elements shown: operator filling in questionnaires; forms; Directory, ROD, CDR, Data Dictionary, Conversion Services, DMM; Quality Assurance, Packaging, Warehouse.]

Description
The national level uses forms inside Reportnet to deliver data. In practice this means that when it is time to deliver the data / information, a person will log in to Reportnet and manually fill in a questionnaire. The result at international level is small datasets that serve the purpose of compliance reporting. But in some cases (OECD Questionnaire) it is also used for creating information. That procedure has however been questioned by the MS, who are asking for another kind of procedure.

Facts
- A small number of records from every national body.
- Very aggregated data.
- Slow data flows that take place at most once a year.
- The data has no further value beyond the assessments originally planned.
Example
Reporting obligation for Natura 2000 Standard Data Form (Habitat)

Impacts
- Countries only invest a small amount of resources to collect and report this data.
- Automated indicator assessment will not be possible. The level of detail of the data is very low and does not allow any further analysis.
- SEIS will have very little interest in those flows unless more detail can be provided. However, the Commission has a high interest.

Actions
Reportnet. User dialogue with the COM?

B. Deliveries based on offline applications

[Figure: 2) Reportnet data deliveries. Countries maintain an offline application and export for delivery in Reportnet. Columns: National; Reporting / Reportnet; International. Elements shown: offline systems, packaging, quality assurance; Directory, ROD, CDR, Data Dictionary, Conversion Services, DMM; Quality Assurance, Packaging, Warehouse.]

Description
The national level keeps a database designed for one or more international obligations. They maintain those databases offline at regular intervals. Those databases are then packaged and passed on to Reportnet. The main reason countries do not run these as online systems is in most cases that updates are relatively infrequent and that the gathering of the data from lower levels is done manually. For these systems the cost/benefit ratio is more favourable offline than online.

Facts
- Long manual process to produce the dataset.
- A small number of additional records added to a previously collected list (time series).
- The relatively low update frequency makes it possible to put one person on the job for a short time. (Example: one month of work to produce the dataset from other sources.)

Example
Corine land cover. A process that is based on manual interpretation of satellite data. This process runs at 5-year intervals and is project based. Project based means that a new team is put together to generate each new update.
Bathing water.
Most countries gather the data during the summer period, and only for the days people go bathing at beaches, lakes or rivers. Most information is initially gathered manually and at the end of the season entered as yearly averages into a database.

Impacts
- Countries' investment is higher than in case A, but because of its nature the delivery cannot be handled with the solution of case A.
- Automated indicator assessment could be possible if the indicator can live with slow data updates for this data flow.
- SEIS will have little impact on those flows. The potential to move into online systems is greater, but this needs to be checked case by case.

Actions
Reportnet.
Moving offline applications into online systems at national level is a slow process which will take several years and must be looked at on a case by case basis to ensure that it is possible and cost effective. However, EEA should investigate how to handle these flows when that happens. (Some MS will invest in online services for their public and EEA should find a way to harvest them.)
Create a process that makes data into information (IMS) at the European level (an information service).

C. Deliveries based on online systems

[Figure: 3a) Reportnet data deliveries. Countries maintain an online system and export for delivery in Reportnet. Elements shown: online systems with system maintenance, quality assurance, packaging; Directory, ROD, CDR, Data Dictionary, Conversion Services, DMM; Quality Assurance, Packaging, Warehouse.]

Description
The national level maintains an online system that either collects data directly from stations, or automatically collects data from regional systems, or is a central system where everybody logs in and makes their changes online. Online systems expose this information directly to the internet (open or secured). Those systems have a team of people who maintain the system. Typical behaviour of online systems is that the data can be changed at any time.

Facts
- Large amount of data. (Ex.
Monitoring Stations)
- Changes can be provided at any time. (Ex. Industry reporting to PRTR)
- Any system that is based on monitoring stations is most probably based on online systems. The only issue might be that they are not always connected to the internet.
- Any centralized national system could become, or already is, an online system.

Example
Ozone: All countries have an online system and EEA is using those inside Ozone web. This is the first of its kind, and to be SEIS compliant it should allow other international or national organisations to make use of that same setup. The concept proves that near real time is technologically possible.
PRTR: Most probably all countries have moved or are moving this into an online system, because the cost of maintaining a website and gathering the information will be lower than doing this offline.

Impacts
- High impact on resources. Most of the time we are talking about a team of people. The system is the cheapest solution for the necessary and required data flow.
- This is a technology-driven move from case B to C. The cost of implementing this in a concept like B would be much higher, for the simple reason that the update frequency is very high.
- SEIS will have a big impact on those flows, but the datasets could be very valuable in an assessment process when they are seen as constants, i.e. data that changes only slowly over time.

Actions
Move towards an on line delivery accessible for the European level (D1).
Develop Discovery services that are based on a metadata standard that “goes beyond Dublin Core” (INSPIRE metadata specifications).
Create a process that makes data into information (IMS) at the European level (an information service).

D. Deliveries through online system

3b) Delivery happens near real time. Both systems are monitored.
(SEIS)

[Figure: elements shown: online systems on both sides, each with system maintenance and a data protocol; data flow consortium and conceptual data flow model (~6 months - 2 years); quality assurance, packaging, warehouse; near real time DB services.]

Description
In this setup on line deliveries are replaced by live systems at both the national and the European level. Both sides have a team that maintains and manages those flows. Both flows can be automated only when there is a well established connection between those teams. Because of the statements above, near real time is the only way data can be transported. Systems with a lower flow rate would become more costly and might not have the right amount of resources available to maintain a fully developed system. In practice this means that those data flows that fit into this group will be packaged at international level, but also that the countries have their own packaging process in place.

Facts
- Near real time update between national and international databases.
- Reduced manual costs related to package reporting. This is only true when such national live systems exist.
- Possibilities for near real time European services and reporting.
- Possible data flows between anybody, including regional to regional between different countries.
- Possible return of European services to any national or regional system.
- Common flows towards other international organisations.
- More data than originally requested inside an obligation. (Could be the sum of several obligations from different international organisations.)

Countries have shown interest in participating, opening up their systems, for this kind of delivery.

Example
Ozone web could be seen as one of the first setups for EEA. Ozone web is now looking into how it can be opened to other international organisations and pass on the near real time data further. Ideally that would be done already at the source.
Strong involvement from the providers will be needed to make this happen, and that could be accommodated by a data flow consortium.

Impacts
- Higher resource impact at international level.
- Need for technical communication between the teams hosting online systems dealing with the same data flow. (Data flow consortium, see below.)

Actions
Start the project that moves processes from C towards D. (See below)
Create a process that makes data into information (IMS) at the European level (an information service).

The “Data flow Consortium” is needed when the data flows affect more international institutions than EEA. Its task is to discuss common technical solutions in order to facilitate and maintain the specified data flow.

E. Deliveries using Sensor Web

[Figure: 4) Sensor web. Sensors alert when a threshold is exceeded. Elements shown: online systems on both sides, each with system maintenance and a data protocol (~near real time; ~6 months - 2 years); quality assurance, packaging, warehouse; near real time DB services.]

Description
The future data flows based on monitoring stations could evolve into sensor web systems. Intelligent sensors only send messages when thresholds are exceeded. The European and the national level can use the same sensor network and maintain different threshold settings to accommodate their needs.

Facts
- Very large networks which only receive information above the preset threshold.
- Can only be based on online systems. An alert can happen at any time and in any place.
- A strong need for communication and standardisation between stations and online systems.

Example
- At the European level: forest monitoring and flooding.

Impacts
- Higher resource impact at international level than all other options.
- Strong need for communication and agreements with those who manage the sensor network.
- Excellent setup for alert systems.

Actions
Follow RTD projects.
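
The sensor web idea described under E, one shared sensor network with different threshold settings for the national and the European level, can be illustrated with a small sketch. This is only an assumption about how such filtering might look, not a description of an existing EEA or national system: the class names, the station identifier `ST01`, the parameter name `ozone` and the threshold values are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Reading:
    """One measurement from a monitoring station (hypothetical structure)."""
    station_id: str
    parameter: str
    value: float


@dataclass
class Subscriber:
    """A national or European system with its own threshold settings."""
    name: str
    thresholds: Dict[str, float]  # parameter -> alert threshold (illustrative values)
    alerts: List[Reading] = field(default_factory=list)

    def notify(self, reading: Reading) -> bool:
        """Keep the reading as an alert only if it exceeds this subscriber's threshold."""
        limit = self.thresholds.get(reading.parameter)
        if limit is not None and reading.value > limit:
            self.alerts.append(reading)
            return True
        return False


class SensorNetwork:
    """Shared network: one stream of readings, per-subscriber thresholds."""

    def __init__(self) -> None:
        self.subscribers: List[Subscriber] = []

    def subscribe(self, subscriber: Subscriber) -> None:
        self.subscribers.append(subscriber)

    def publish(self, reading: Reading) -> List[str]:
        """Offer a reading to every subscriber; return the names of those alerted."""
        return [s.name for s in self.subscribers if s.notify(reading)]


if __name__ == "__main__":
    network = SensorNetwork()
    # Threshold values below are assumptions, not taken from any obligation.
    network.subscribe(Subscriber("national", {"ozone": 120.0}))
    network.subscribe(Subscriber("european", {"ozone": 180.0}))
    for value in (100.0, 150.0, 200.0):
        print(value, network.publish(Reading("ST01", "ozone", value)))
```

In this sketch a reading below every threshold produces no message at all, which matches the statement that such networks only receive information above the preset threshold. A real setup would additionally need the agreed data protocol and the communication between teams that the catalogue lists under Impacts.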