MS251-specifications_for_the_portal_1.0

EU BON MS251: Specification for the
European Biodiversity Portal
Version 1.0
Milestone MS251 for Task 2.5 (European Biodiversity Portal) is due in month 27 (Feb 2015).
The release of a beta version of the portal is due in month 39 (Feb 2016).
Contents
1. Introduction
2. Functional specifications
   2.1 Functional specifications summary
   2.2 Search process [SPEC-01]
   2.3 Biodiversity networks integration [SPEC-02]
   2.4 Data supplying, consuming and visualization [SPEC-03]
   2.5 EU BON Taxonomic backbone integration [SPEC-04]
   2.6 Connection with GEOSS [SPEC-05]
   2.7 Integration with DataONE network [SPEC-06]
   2.8 Geospatial capabilities [SPEC-07]
   2.9 Potential users and roles [SPEC-08]
   2.10 Content managing [SPEC-09]
   2.11 Dataset quality specifications [SPEC-10]
3. Architectural Design
   3.1 Current architectural proposal
   3.2 Architectural revision using GI-cat
1. Introduction
Task 2.5 (European Biodiversity Portal) is defined as follows in the Description of Work
(DoW).
A European Biodiversity Portal (EBP) will be developed as the main GEO BON
information hub. It will link to relevant databases and information systems, policy
contacts and recommendations, and structured advice for assessing relevant
distributed information/datasets for different user groups, including contributions
from citizen science data gathering gateways. The EBP will technically integrate the
various data sources under one search facility and spatially/temporally oriented user
interface. The portal will build on the tools developed by task 2.3, functions developed
by task 2.4. It will provide access to full detailed data, geographic visualisation and
remotely sensed data. It will be closely linked to the GCI and GEO Portal, and access
layers and data from GEOSS sources. The portal would also act as showcase for the
products from the analytical and modelling activities of other WPs and support
workflows for building such products using the registered e-services. The portal will
also serve general dissemination functions for WP8. (Lead CSIC; UEF, GBIF, UnivLeeds,
Pensoft, FIN, Plazi, GlueCAD, NBIC; Months 1-54)
Person-month allocations for the participating partners are: CSIC (Lead, 18), UEF (7), GBIF
(3), UnivLeeds (3.9), MRAC (0.5), FIN (5), Plazi (1), GlueCAD (11), NBIC (2).
As described in the EU BON DoW, the European Biodiversity Portal (task 2.5) “will technically
integrate the various data sources under one search facility and spatially/temporally
oriented user interface. The portal will build on the tools developed in task 2.3, functions
developed by task 2.4. It will provide access to full detailed data, geographic visualisations
and remotely sensed data. It will be closely linked to the GCI and GEO Portal, and access
layers and data from GEOSS sources”.
The EU BON Portal’s first priority is to connect to and access data from GBIF, LTER, test site
databases and other data/metadata providers, allowing users to search biodiversity data
through a public web interface. The search engine will find this information by querying
each data provider or aggregator connected through a SOA brokering platform (Enterprise
Service Bus). As a network of networks, EU BON will not connect directly to the original
data sources when these are already available through existing aggregation services. Instead,
the EU BON Portal will use the aggregated data.
This document sets out the functional specifications of the portal. A summary list of
specifications is presented first, and each one is then analysed in more detail in the
following sections.
The last section analyses the current architectural proposal and revises it after the
inclusion of GEOSS GI-cat as the main data brokering tool.
2. Functional specifications
2.1 Functional specifications summary


- The main goal of the portal is to provide integration of biodiversity data, ecosystem
  data and genetic data, searchable through a common user interface.
  o The portal will provide a common search user interface as an input form.
    Search filters will include: taxa, geospatial coverage, temporal coverage, EBV
    and data providers.
  o The portal will obtain species information by querying the EU BON taxonomic
    backbone for the species information and their universal identifiers.
  o The search results will be presented as a list of datasets and their associated
    metadata.
    * Biodiversity and ecological metadata will be obtained by consuming
      web services (e.g. OGC CSW, GBIF API) or by EML parsing.
    * Genetic data will be represented as GenBank links and their related
      metadata.
- The portal will act as a network of networks. This implies:
  o It will use metadata only, providing links to the data.
  o The portal will not host any data.
  o If possible, it will request data from the data providers, but this will be very
    limited and dependent on each network's capabilities.
- The portal must be compatible with the DataONE network solutions, as a requirement to
  become a DataONE Member Node. In particular, it must provide an output REST API
  to be consumed by DataONE network services.
- The portal must provide advanced visualisation capabilities.
  o In particular, it must represent search occurrences in a map user interface.
  o It must be able to represent remote sensing data, i.e. consuming OGC-compliant
    services and representing their output layers.
  o It must be able to use the map as an input interface for the geographic
    coverage search filter.
- The portal will consider three main user roles: public, researcher and policy maker.
  The result details or visualisations may vary depending on each user role.
- The portal must provide a way to upload or link EU BON products, documents, tools,
  guidelines and other relevant information.
- The portal must provide an output interface to be consumed by the GEOSS GCI, that is,
  by the GEOSS Discovery and Access Broker.
The specifications are summarized in Table 1, each with an assigned code to ease
traceability with software requirements.
Specification Code   Description
SPEC-01              Provide a filtered search user interface
SPEC-02              Integrate biodiversity networks/providers/test sites by metadata
SPEC-03              Supply heterogeneous biodiversity related data
SPEC-04              Integrate the EU BON Taxonomic backbone
SPEC-05              Provide an interface to GEOSS
SPEC-06              Provide an interface to DataONE
SPEC-07              Provide geospatial capabilities for filtering and visualisation
SPEC-08              Provide different user interfaces for different user roles
SPEC-09              Web content management
SPEC-10              Data quality specifications
Table 1. EU BON Biodiversity Portal specifications summary
2.2 Search process [SPEC-01]
A simplified search use case, using the taxonomic backbone capabilities, could be as follows:
1. The user selects a particular species or habitat. The user can enter a species
scientific name, a species vernacular name or any habitat-related keyword.
2. The system will provide several filters.
3. The user selects any desired filter.
4. If species information has been entered as an input, the system will request
species information from the taxonomic backbone.
a. The taxonomic backbone will search for the species information on each
taxonomic provider accessible from the backbone. It will integrate the
information, translate the IDs if needed and return the compiled
information to the system.
5. The system will use the integrated species information and habitat-related keywords
as an input to search for occurrence/habitat/genetic data.
6. The system will return the list of search results to the portal component, presenting
the information on the web page.
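The use case above can be sketched as a small pipeline. This is only an illustrative sketch: `SearchRequest`, `resolve_taxon`, `search_datasets` and the in-memory `BACKBONE` and `CATALOGUE` data are hypothetical stand-ins for the taxonomic backbone and the brokered providers, not part of any EU BON API.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory stand-in for the taxonomic backbone (step 4).
BACKBONE = {
    "lynx pardinus": {"accepted": "Lynx pardinus", "synonyms": ["Felis pardina"]},
}

# Hypothetical dataset metadata records, as returned by brokered providers.
CATALOGUE = [
    {"title": "Iberian lynx occurrences", "keywords": ["Lynx pardinus"], "provider": "GBIF"},
    {"title": "Historic carnivore records", "keywords": ["Felis pardina"], "provider": "GBIF"},
    {"title": "Wetland habitat survey", "keywords": ["wetland"], "provider": "LTER"},
]


@dataclass
class SearchRequest:
    species: str = ""                                     # step 1: species name
    habitat_keywords: list = field(default_factory=list)  # step 1: habitat keywords
    providers: list = field(default_factory=list)         # step 3: empty means "all"


def resolve_taxon(name):
    """Step 4: return the accepted name plus synonyms, or the raw name if unknown."""
    entry = BACKBONE.get(name.strip().lower())
    if entry is None:
        return [name]
    return [entry["accepted"], *entry["synonyms"]]


def search_datasets(request):
    """Steps 5 and 6: match expanded names and keywords against dataset metadata."""
    terms = set(request.habitat_keywords)
    if request.species:
        terms.update(resolve_taxon(request.species))
    hits = []
    for record in CATALOGUE:
        if request.providers and record["provider"] not in request.providers:
            continue
        if terms & set(record["keywords"]):
            hits.append(record["title"])
    return hits
```

Expanding the input name through the backbone before searching is what lets the second record, which is indexed under a synonym, be found.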
The search process will use metadata harmonization to enable the lookup and identification
of the required information in each provider's datasets. In principle, at least one search
standard or protocol must be used as a common output, in order to provide a common API
for the search process, e.g. the GI-cat OpenSearch and OGC CSW interfaces.
Because the search is heuristic in nature, federated search is preferred over indexed search
or data caching. Nevertheless, GI-cat supports both federated search and hybrid search
(indexing + federating).
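A federated search sends the query to every provider at request time and merges the answers, instead of consulting a local index. The following is a minimal sketch of that idea, assuming each provider is wrapped in a plain Python callable; the function name `federated_search` and the record shape are hypothetical and do not describe how GI-cat implements it.

```python
from concurrent.futures import ThreadPoolExecutor


def federated_search(query, providers, timeout=10.0):
    """Send `query` to every provider in parallel and merge the results.

    `providers` maps a provider name to a callable that takes the query and
    returns a list of record dictionaries. A failing provider is reported in
    `errors` instead of aborting the whole search.
    """
    results, errors = [], {}
    with ThreadPoolExecutor(max_workers=max(1, len(providers))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in providers.items()}
        for name, future in futures.items():
            try:
                for record in future.result(timeout=timeout):
                    results.append({"provider": name, **record})
            except Exception as exc:  # a broken provider must not break the search
                errors[name] = str(exc)
    return results, errors
```

Collecting failures separately lets the portal show partial results while flagging which providers did not answer.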
The following search filters have been proposed:
- Species (taxa/vernacular name).
- Geospatial filter (bounding boxes / polygons / locations).
- Providers/testing sites.
- Date/time filter.
- Kind of data.
- Broad species traits.
- EBV class (topics).
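As an illustration, the proposed filters could be carried through the system as a single value object. This is a sketch only: the class `SearchFilters`, its field names and the parameter encoding are assumptions, not an agreed interface.

```python
from dataclasses import dataclass, asdict
from typing import Optional, Tuple


@dataclass
class SearchFilters:
    taxon: Optional[str] = None          # scientific or vernacular name
    bbox: Optional[Tuple[float, float, float, float]] = None  # west, south, east, north
    provider: Optional[str] = None       # provider or testing site
    date_from: Optional[str] = None      # ISO 8601 date, e.g. "2010-01-01"
    date_to: Optional[str] = None
    data_kind: Optional[str] = None      # kind of data
    trait: Optional[str] = None          # broad species trait
    ebv_class: Optional[str] = None      # EBV class (topic)

    def to_params(self):
        """Return only the filters that were set, as flat key/value pairs."""
        params = {k: v for k, v in asdict(self).items() if v is not None}
        if "bbox" in params:
            params["bbox"] = ",".join(str(c) for c in params["bbox"])
        return params
```

Dropping unset filters keeps the query sent to each provider as small as the user's actual selection.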
2.3 Biodiversity networks integration [SPEC-02]
The system will be able to search biodiversity data and metadata from a wide range of data
providers. Metadata-based information will be used to discover datasets, while data will be
offered as links to downloadable files in a standardised format whenever possible (e.g. Darwin
Core Archive).
The portal will integrate the providers harvested using the Registry and Catalogue
Specification (MS241, annex 1). During the first phase of the development, the system will
integrate a subset of the data providers, extending the subset in further releases:
- GBIF (through GBIF API).
- LTER Europe (DEIMS + GeoNetwork, using OGC-CSW web services or EML
harvesting).
As the genetic/genomic provider, the EBP will use GenBank, consuming its “Entrez” WSDL or
REST services (http://www.ncbi.nlm.nih.gov/genbank/).
In line with the service-oriented architectural pattern, there will be two possibilities for
connecting data providers to the portal:
- Direct connection to each provider through WSDL web services, OGC services or
REST API.
- EML harvesting, through the implementation of a harvester service.
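As an example of the first option, a direct REST connection to GBIF can start from the dataset search endpoint of the public GBIF API (`/v1/dataset/search`, with the `q`, `type` and `limit` parameters). The helper name `gbif_dataset_search_url` and the default values are assumptions made for this sketch; it only builds the URL and leaves the HTTP request to the caller.

```python
from urllib.parse import urlencode

GBIF_API = "https://api.gbif.org/v1"


def gbif_dataset_search_url(query, dataset_type="OCCURRENCE", limit=20):
    """Build a GBIF dataset search URL; the caller performs the HTTP GET."""
    params = {"q": query, "type": dataset_type, "limit": limit}
    return f"{GBIF_API}/dataset/search?{urlencode(params)}"
```

Keeping URL construction separate from the network call makes the connector easy to test without contacting the provider.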
Following the recommendation included in MS241, the EuroGEOSS broker, GI-cat, must be
assessed and integrated in the architecture as a specialised message broker, that is, the
system that will integrate input sources and generate a common set of outputs. A preliminary
assessment is included in section 3.2.
GI-cat provides a set of input interfaces (accessors), which translate the messages to a
common ISO 19115 data model.
- LTER CSW 2.0.2 services can be connected using the CSW accessor.
- GBIF requires a new accessor (currently in development).
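The accessor idea can be illustrated with a small adapter pattern: one class per input source, each translating native records into one shared structure. This is a conceptual sketch in Python, not GI-cat code (GI-cat itself is a JavaEE application). The class names and the reduced common model with `fileIdentifier`, `title` and `abstract` are assumptions; a real ISO 19115 record is far richer.

```python
from abc import ABC, abstractmethod


class Accessor(ABC):
    """Translates one provider's native record into the broker's common model."""

    @abstractmethod
    def to_common(self, record: dict) -> dict:
        ...


class GbifAccessor(Accessor):
    def to_common(self, record):
        # "key", "title" and "description" are fields of a GBIF dataset record.
        return {"fileIdentifier": record["key"],
                "title": record["title"],
                "abstract": record.get("description", "")}


class CswAccessor(Accessor):
    def to_common(self, record):
        # Dublin Core style keys, as found in a CSW csw:Record.
        return {"fileIdentifier": record["dc:identifier"],
                "title": record["dc:title"],
                "abstract": record.get("dct:abstract", "")}


def harmonise(records_by_source, accessors):
    """Apply the matching accessor to every record of every source."""
    return [accessors[source].to_common(record)
            for source, records in records_by_source.items()
            for record in records]
```

Adding a new provider then means writing one new accessor, without touching the search or output side.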
2.4 Data supplying, consuming and visualization [SPEC-03]
As stated in the DoW, EU BON will evaluate seven major biodiversity data types: 1) remote
sensing data; 2) products derived from remote sensing data (e.g., vegetation and habitat
maps); 3) taxonomic backbone data; 4) ecological data; 5) current and historical specimen
data from scientific collections; 6) species profile data; and 7) DNA sequence data.
The portal must act as a common showcase for data and metadata providers. The portal will
not include uploading capabilities: it will delegate this responsibility to different catalogue
and repository applications, installed on each test site or data provider (MS231, data sharing
tools).
Several data representation techniques must be supplied, depending on each user role:
- Grids, forms and maps.
- Charts, statistics and reports (these could be the only outputs for the “policy maker” user role).
The portal will be capable of executing pre-built, previously defined workflows:
- Based on available remote services.
- Using datasets exported by functions within the portal.
- Using background data forwarded from GEOSS services.
Particular workflows for running EBV estimations will be defined later on; these workflows
will use EBV variables for the estimation processes.
The main metadata standard for describing datasets will be EML (Ecological Metadata
Language). For collections, ABCD metadata translation will not be necessary, since collection
datasets are already indexed by GBIF.
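Since EML is the main metadata standard, the portal or a harvester will need to read at least the title and geographic coverage from EML documents. The sketch below uses standard EML element names (`dataset`, `title`, `coverage/geographicCoverage/boundingCoordinates`); the sample document, the function name `parse_eml` and the returned dictionary shape are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample document; element names follow the EML 2.1.1 schema.
SAMPLE_EML = """<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1"
                packageId="example.1.1" system="example">
  <dataset>
    <title>Example vegetation plots</title>
    <coverage>
      <geographicCoverage>
        <geographicDescription>Example site</geographicDescription>
        <boundingCoordinates>
          <westBoundingCoordinate>-6.5</westBoundingCoordinate>
          <eastBoundingCoordinate>-6.2</eastBoundingCoordinate>
          <northBoundingCoordinate>37.2</northBoundingCoordinate>
          <southBoundingCoordinate>36.9</southBoundingCoordinate>
        </boundingCoordinates>
      </geographicCoverage>
    </coverage>
  </dataset>
</eml:eml>"""


def parse_eml(xml_text):
    """Extract identifier, title and bounding box (west, south, east, north)."""
    root = ET.fromstring(xml_text)
    dataset = root.find("dataset")  # child elements of eml:eml are unqualified
    box = dataset.find("coverage/geographicCoverage/boundingCoordinates")
    bbox = None
    if box is not None:
        bbox = tuple(float(box.findtext(side + "BoundingCoordinate"))
                     for side in ("west", "south", "east", "north"))
    return {"id": root.get("packageId"),
            "title": dataset.findtext("title"),
            "bbox": bbox}
```

The resulting dictionary carries exactly the fields needed for the taxon-independent filters (title keywords and geographic coverage).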
2.5 EU BON Taxonomic backbone integration [SPEC-04]
A unified taxonomic information service has been developed within the scope of Task
1.2. This backbone has been released and is accessible through a set of REST services (RESTful
API), available at http://cybertaxonomy.eu/eu-bon/utis/
The backbone allows running a federated search on multiple European checklists and
returns a unified result set built from the individual responses. Currently, the checklists
include the Pan-European Species directories Infrastructure (PESI, EU-nomen), Catalogue of
Life (CoL) and World Register of Marine Species (WoRMS). It is planned to connect more
biodiversity catalogues, such as EUNIS and Natura 2000, as required by the INSPIRE directive.
The portal must use this backbone to obtain extended species information prior to
searching for datasets among biodiversity data providers.
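The "unified result set" can be pictured as one entry per name, carrying the identifier used by each checklist. The response shape below (a list of `name`/`id` dictionaries per checklist) and the function `unify` are assumptions made for this sketch and do not reproduce the actual backbone API response format.

```python
def unify(responses):
    """Merge per-checklist answers into one entry per scientific name.

    `responses` maps a checklist name to a list of {"name": ..., "id": ...}
    dictionaries. Each merged entry keeps the identifier from every checklist
    that knows the name, so identifiers can be translated between providers.
    """
    merged = {}
    for checklist, taxa in responses.items():
        for taxon in taxa:
            entry = merged.setdefault(taxon["name"], {"name": taxon["name"], "ids": {}})
            entry["ids"][checklist] = taxon["id"]
    return sorted(merged.values(), key=lambda e: e["name"])
```

With such a structure the portal can query each data provider using the identifier that provider understands.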
2.6 Connection with GEOSS [SPEC-05]
EU BON aims at providing European biodiversity data for GEO BON; therefore it will be
connected to the GEO Discovery and Access Broker (GEO DAB, http://www.eurogeoss-broker.eu).
That is, the EBP must implement several provider services according to the GEO
DAB API (http://api.eurogeoss-broker.eu/docs/index.html).
The GEO DAB is based on the message broker GI-cat. This broker is able to provide a
common output to be consumed by other GI-cat instances, thus constituting a possible
connection between the EU BON platform and GEOSS. Nevertheless, as recommended by
the INSPIRE directive, OGC-compliant outputs will be generated, in particular a common
OGC-CSW 2.0.2 output.
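A client of such a CSW 2.0.2 output would typically issue a GetRecords request. The sketch below builds the key-value-pair form of that request using the standard CSW 2.0.2 parameter names; the endpoint `https://example.org/csw` used in the usage note and the helper name `csw_getrecords_url` are placeholders, since the actual EU BON endpoint is not defined in this document.

```python
from urllib.parse import urlencode


def csw_getrecords_url(endpoint, any_text, max_records=10):
    """Build a CSW 2.0.2 GetRecords (KVP) URL for a full-text search."""
    escaped = any_text.replace("'", "''")  # escape single quotes for CQL
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "elementSetName": "summary",
        "resultType": "results",
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        "constraint": f"AnyText LIKE '%{escaped}%'",
        "maxRecords": max_records,
    }
    return f"{endpoint}?{urlencode(params)}"
```

For instance, `csw_getrecords_url("https://example.org/csw", "lynx")` yields a URL that any CSW-compliant catalogue should be able to answer.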
2.7 Integration with DataONE network [SPEC-06]
DataONE has established two ways to participate in its infrastructure: becoming a
Member Node or using the Investigator Toolkit. EU BON must be prepared to guarantee
future integration with the DataONE network by becoming a DataONE Member Node.
The integration as a Member Node (MN) requires the implementation of a set of APIs, depending
on the type of Member Node to implement:
- Tier 1: Read, public objects (MNCore and MNRead APIs)
  Provides read-only access to publicly available objects (science data, science
  metadata and resource maps), along with core system API calls for monitoring and
  logging.
- Tier 2: Access control (MNAuthorization API)
  Allows access to objects to be controlled via access control list (ACL) based
  authorization and certificate-based authentication.
- Tier 3: Write (MNStorage API)
  Provides write access (create, update and archive objects).
  Allows using DataONE interfaces to create and maintain objects on the MN.
- Tier 4: Replication target (MNReplication API)
  Allows the DataONE infrastructure to use available storage space on the MN for
  storing copies of objects that originate on other MNs, based on the Node Replication Policy.
Each tier implicitly includes all lower-numbered tiers. For instance, a Tier 3 MN must
implement the methods of tiers 1, 2 and 3.
The EU BON platform should implement at least Tier 1.
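Because the tiers are cumulative, the set of APIs to implement for a target tier follows mechanically from the list. A minimal sketch, using the API names from the DataONE architecture documentation (where the Tier 2 API is called MNAuthorization); the function name `required_apis` is hypothetical:

```python
# APIs introduced by each DataONE Member Node tier.
TIER_APIS = {
    1: ["MNCore", "MNRead"],
    2: ["MNAuthorization"],
    3: ["MNStorage"],
    4: ["MNReplication"],
}


def required_apis(tier):
    """Return every API a Member Node of the given tier must implement."""
    if tier not in TIER_APIS:
        raise ValueError(f"unknown tier: {tier!r}")
    # Each tier implicitly includes all lower-numbered tiers.
    return [api for level in range(1, tier + 1) for api in TIER_APIS[level]]
```

For the minimum target stated above, `required_apis(1)` gives only the MNCore and MNRead APIs.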
2.8 Geospatial capabilities [SPEC-07]
The portal will use geospatial representation both to visualize search results and as a
search filter. For filtering, the user will be able to draw bounding boxes or polygons on a
map.
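For the bounding box case, the filter reduces to a rectangle intersection test between the box drawn by the user and the geographic coverage declared in each dataset's metadata. This is a sketch under simplifying assumptions: boxes are `(west, south, east, north)` tuples in decimal degrees, none crosses the antimeridian, and the names `bbox_intersects` and `filter_by_bbox` are hypothetical.

```python
def bbox_intersects(a, b):
    """True if two (west, south, east, north) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def filter_by_bbox(datasets, query_box):
    """Keep the datasets whose declared coverage intersects the query box."""
    return [d for d in datasets if bbox_intersects(d["bbox"], query_box)]
```

Polygon filters would need a geometry library; the bounding box test can still serve as a cheap first pass.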
Following the INSPIRE recommendations, OGC and ISO standards will be used, e.g. the ISO 19115
geospatial metadata model and OGC-CSW services.
Map visualisation must include not only occurrences and geographical coverage, but also the
representation of remote sensing data layers. CartoDB must be assessed as the main GIS and
web mapping tool for the EU BON Biodiversity Portal. It provides advanced GIS data
visualisation, including observation occurrences, timelines and animations through time.
Nevertheless, other GIS tools may be included as advanced GIS visualisation alternatives.
2.9 Potential users and roles [SPEC-08]
Several user roles and their permissions have been differentiated so far:
- Public or anonymous user: simple search use cases, simple data visualization and
export functionality.
- Researcher: more detailed information, data analysis, charts, etc.
- Policy maker: analytical results, EBV estimation, charts, etc.
- Administrator: data provider administration, portal content administration, portal
and middleware analytics, etc.
Particular user interfaces and interaction capabilities for each user role will be defined in the
requirement analysis phase and improved on each portal release.
2.10 Content managing [SPEC-09]
As stated in the DoW, the EBP must give fast access to EU BON integrated data and
products. Since a portal is already deployed for publishing EU BON news and deliverables,
among other products, the EBP must facilitate uploading or linking those products,
documents, tools, guidelines and other relevant information. Therefore, the portal could act
as a content management system or at least be able to link to the EU BON portal
(www.eubon.eu).
2.11 Dataset quality specifications [SPEC-10]
The rapid growth of biodiversity data collected by volunteers and amateur naturalists,
an activity known as Citizen Science (CS), provides the chance to gain extensive (if not
huge) amounts of data about biodiversity. In parallel, it has created a pressing need
within the scientific community, which doubts the reliability and quality of publicly
collected data, to establish and provide methods for quality assurance (QA).
The 'Biodiversity Portal Specifications Questionnaire' shows the different approaches
and expectations regarding finding, using and annotating QA data.
Hence, within the context of the EU BON Portal specifications, the following topics
for discussion regarding quality assurance of observational data are proposed:
1. The convenience of annotating the data with quality-oriented metadata.
2. What information about data quality should be provided within (registered)
datasets: discoverable, identifiable, available, useful, filtered, acceptable, etc.
3. Whether there are feasible actions that can or should be taken by WP1 and
WP2 to make quality-controlled data better annotated and discoverable by the
portal, e.g. promoting, developing and recommending methods to enhance QA data.
4. Whether we need, or are expected, to recommend standards or strategies for
quality control and annotation, given the lack of standards and the range of
methods and tools (techniques and software) available to improve, evaluate,
validate and facilitate e.g. the accuracy of CS-based data.
3. Architectural Design
3.1 Current architectural proposal
In line with the principles and concepts of LifeWatch (D2.1, Recommendation 7), EU BON will
adopt Enterprise Application Integration within a service-oriented architectural approach.
The system will be centred on a common middleware layer that will connect heterogeneous
data providers, orchestrating the messages returned by each one.
The EU BON Biodiversity Portal will act as a client of a broader architecture that will connect
several heterogeneous providers: biodiversity data/metadata providers, genetic data
providers, taxonomic providers and so forth. The majority of them currently supply a web
service or REST interface.
Figure 1. Middleware-based architecture for EU BON
The middleware layer will be composed of an Enterprise Service Bus with data service
adapters and the message broker. In particular, the GI-cat message broker, developed by
EuroGEOSS, must be evaluated as a candidate for that purpose, due to its ability to integrate
disparate metadata catalogues through standardized interfaces and to establish a bridge
between EU BON and the GEOSS Common Infrastructure (GCI).
3.2 Architectural revision using GI-cat
GI-cat is a specialized brokering system developed for the GEOSS portal within the context of
the EuroGEOSS project. It comprises a JavaEE application that acts as a broker among
standardized sources and catalogues, and a set of libraries and connectors that provide
standardised input and output capabilities.
As a specialized broker, GI-cat can be configured as a direct-access mediation service or as a
metadata harvester.
GI-cat uses OGC-CSW services to distribute the functionality and return search results. The
interaction of EML with CSW services must be studied in detail before testing GI-cat as a
candidate for the EU BON message broker.
The GI-cat common data model is based on ISO 19115 plus extensions. The distributor translates
the provided data and metadata to this data model and exposes the information by translating
it again to several catalogue formats (e.g. CSW, OAI-PMH, OpenSearch). Consuming EML
interfaces directly may require the development of a new GI-cat accessor.
Figure 1. The GI-cat broker system featuring some catalogue query interfaces (right) and several backend
mediation components (source: Nativi et al. 2009).
After reviewing the recommendation to use GI-cat as the brokering system for EU BON, as
stated in the MS241 document “Specification for the registry and metadata catalogue”, we
have found that, as a specialized broker, GI-cat is a powerful solution for integrating
standardized sources and for providing an interface to GEOSS, but it lacks connectors for
common WSDL services and for specific input sources needed for EU BON purposes (e.g.
GenBank, EU-Nomen, WoRMS, etc.).
We therefore propose a revision of the architecture: a hybrid solution that integrates
GI-cat into a larger ESB-based service-oriented architecture.
Figure 2. EU BON middleware platform using GI-cat
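The hybrid proposal can be read as a routing rule inside the middleware: sources that speak a protocol GI-cat already has an accessor for are delegated to GI-cat, and everything else goes to a dedicated ESB adapter. The sketch below is purely illustrative; the function `route`, the source descriptions and the default protocol set (only CSW, the one accessor this document confirms) are assumptions.

```python
def route(source, gicat_protocols=frozenset({"CSW"})):
    """Decide which middleware component should handle a data source.

    `source` is a dictionary with at least a "protocol" key. Protocols listed
    in `gicat_protocols` are brokered by GI-cat; all others need an ESB adapter.
    """
    return "gi-cat" if source["protocol"] in gicat_protocols else "esb-adapter"


# Hypothetical source descriptions, for illustration only.
SOURCES = [
    {"name": "LTER Europe", "protocol": "CSW"},
    {"name": "GenBank", "protocol": "REST"},
    {"name": "WoRMS", "protocol": "WSDL"},
]

ROUTING = {s["name"]: route(s) for s in SOURCES}
```

Once a new GI-cat accessor becomes available (for example the GBIF accessor mentioned in section 2.3), its protocol would simply be added to `gicat_protocols`.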