Coordinating IT developments in on-going (EC-) projects: EPOS seismology
Date: February 8-10, 2012. Place/host: KNMI, De Bilt, The Netherlands
Invited projects: EPOS, NERA, VERCE, COMMIT, EUDAT, ENVRI, OpenSearch, EFFORT
IT coordination meeting attendance

Name | Institute | Project | Presentation
Alberto Michelini | INGV | EUDAT | use cases
Alessandro Spinuso | KNMI | VERCE/NERA | NERA/EPOS
Alex Hardisty | Cardiff University | ENVRI | (ENVRI) moderator
Andrea Bono | INGV | EIDA |
Andreas Rietbrock | Liverpool University | VERCE, NERA | use case
Chad Trabant | IRIS | IRIS-DMC | IRIS QC & webservices
Giuseppe Fiameni | CINECA | EUDAT | EUDAT
Jean-Pierre Vilotte | IPGP | VERCE/EPOS | VERCE / use cases
Jenny Zhang | CWI | COMMIT |
Joachim Saul | GFZ | GFZ, NERA |
Joachim Wassermann | University of Munich | SeisHub | SeisHub / use cases
Laurent Frobert | EMSC | EMSC/NERA/VERCE |
Luca Trani | KNMI | VERCE/NERA |
Marcelo Bianchi | GFZ | NERA | GFZ EIDA
Marek Simon | University of Munich | SeisHub | SeisHub
Martin Kersten | CWI | COMMIT | COMMIT
Michelle Galea | University of Edinburgh | VERCE |
Milene Ivanova | CWI | COMMIT |
Paul Martin | University of Edinburgh | VERCE/ENVRI | VERCE WP9
Pedro Gonçalves | Terradue | ENVRI | OpenSearch
Peter Wittenburg | Max Planck Institute | EUDAT | (EUDAT) moderator
Philip Kaestli | ETHZ | SHARE/GEM |
Reinoud Sleeman | KNMI/ORFEUS | NERA | ODC QC
Rosa Filgueira | University of Edinburgh | Effort, VERCE | Effort
Torild van Eck | KNMI/ORFEUS | NERA/VERCE/EPOS | introduction
Valentino Lauciani | INGV | EIDA |
Wim Som de Cerff | KNMI | INFRA |
Day 1 Wednesday, Feb 8:
Introduction, goal and context of this workshop (Torild van Eck – ORFEUS)
Structure of workshop:
- IRIS-DMC, ODC, GFZ-GEOFON: three large data centres operating on an international scale. EIDA = a concept for a distributed/federated data-centre approach. However, the data centres have to remain operational as well as evolving; it is not self-evident how to integrate new ideas and migrate towards future goals.
- Community software development: ObsPy, SeisHub, OpenSearch.
- Data users have use cases (e.g., Effort, WHISPER) and ongoing project developments (e.g., NERA, VERCE).
- On-going IT projects such as COMMIT for advanced data storage and analysis, and EPOS, EUDAT and ENVRI for multi-disciplinary interoperability.
Anticipated results (Torild's slides have the updated list):
- A set of presentations providing a discussion overview of current IT developments on databases and computational resources, and of the approach to integration with other disciplines' resources.
- Contribution to the EarthCube debate (COOPEUS). This includes bringing IT developers and earth scientists closer together. How is EarthCube tackling this?
- Minimise overlaps. Set development priorities that fit within the current on-going projects.
- Possible additional developments to be anticipated and requiring additional resources.
- Input to EPOS, EUDAT and ENVRI. A clear definition of the EPOS work within ENVRI, and input in the other direction as well.
NOTE: References to “=ENVRI problem no. X” are my annotations to highlight problem areas I heard about
during the workshop that may actually be common with those of other ESFRI RIs. I will take these back into
the ENVRI project for further consideration.
IRIS-DMC, datacenter presentation – developments
(Chad Trabant – IRIS/DMC)
Presentations\IRIS DMC Service Overview.pptx
A USA consortium. Activities: data services; instrumentation services, including the global seismograph network and instruments loaned out for temporary experiments; EarthScope (IRIS runs the seismology part of this); education and public outreach; and international development.
The DMC currently holds 155 TB (Jan 2012) – huge for open public data, but small compared to oil companies! Data is collected from 150-200 sources, and holdings go back to 1868. People are requesting data from across the whole archive all the time. The DMC shipped 85 TB in 2010 and 180 TB in 2011, in response to tens to hundreds of thousands of requests – currently around half a million requests per day, delivering 1 TB per day. The increase in requests is because the way data is used is changing, not because there is more data to request. Web services have had a huge impact on how users request data (in comparison with the older email-based mechanisms). Retiring the old mechanisms is difficult.
Currently using 4 Linux VMs in front of Isilon NAS disk arrays. Metadata is stored in Oracle, logs in Postgres. All fronted by a load balancer.
“Real-time” in seismology typically means 1-5 minutes of latency. mSEED (miniSEED) is the standard seismology data format.
SeedLink is the standard open streaming protocol; ArcLink is also used. Web services are the future, both at the external interface and internally – standard REST. Once switched over to web services, the back-end infrastructure can start to be changed.
See www.iris.edu/ws/ for data retrieval services and for data transformation / calculation services.
Users work with data at many levels. = Raw data needs to be processed into data products that match users' needs. There are also many kinds of users, from IT-illiterate novices through to professionals.
Also: the IRIS Web Services Java Library – beta released 7/2/12. It serves data without the user needing to know anything about what sits behind it. There is also a MatLab interface.
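
For illustration, a minimal sketch of what pulling data from these web services looks like with ObsPy's FDSN client (the network/station codes and time window are arbitrary examples; at the time of the workshop the equivalent interface lived in the obspy.iris module):

    from obspy import UTCDateTime
    from obspy.clients.fdsn import Client

    # Connect to the IRIS DMC web-services endpoint.
    client = Client("IRIS")

    # Request one hour of broadband vertical-component data from station ANMO.
    t = UTCDateTime("2012-01-01T00:00:00")
    st = client.get_waveforms(network="IU", station="ANMO", location="00",
                              channel="BHZ", starttime=t, endtime=t + 3600)
    print(st)  # miniSEED delivered over REST, parsed into Stream/Trace objects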
QC algorithms are applied to raw data to make it 'research ready', e.g., to calculate/know the signal-to-noise ratio, remove noise, etc.
GFZ, data center presentation – developments EIDA
(Marcelo Bianchi – GFZ/GEOFON)
Presentations\whatiseida.pdf
70 networks, 2800 stations and 15,000 channels feed GEOFON, amounting to 35 TB of data. In Europe there are many networks, many data centres, organisations and users: how to find the right data stream from the right place? EIDA creates a federation in which each data centre retains its own interests. ArcLink is the mechanism for federation. It delivers time-series data, or an inventory of holdings, in response to requests, and is an asynchronous mechanism.
**Look at the IRIS-DMC web services and the EIDA/ArcLink use cases as examples of the kinds of requests users want to make of federated data collections. Look at the set-up for seismic monitoring and data collection as a model example for biodiversity sensor data collection.
There were some challenging questions from the audience about the associated metadata that is needed, and about the efficiency of synchronising ArcLink nodes to maintain an accurate routing approach.
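
To make the federation idea concrete, a hedged sketch of what a routed request looks like to a user, using ObsPy's EIDA routing client (a later mechanism playing the role that ArcLink routing played at the time; the station codes are arbitrary examples and exact client options may differ):

    from obspy import UTCDateTime
    from obspy.clients.fdsn import RoutingClient

    # The routing client first resolves which EIDA node holds the requested
    # data, then fetches it from that node: federation with no central archive.
    client = RoutingClient("eida-routing")
    t = UTCDateTime("2012-02-08T00:00:00")
    st = client.get_waveforms(network="GE", station="APE", location="",
                              channel="BHZ", starttime=t, endtime=t + 600)
    print(st)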
ODC, data center presentation – developments
(Reinoud Sleeman – ODC/KNMI)
Presentations\IT-QC-ORFEUS.pdf
“Some considerations with respect to a seismic station”: a lot of QC work needs to be done to detect, for example, changes in the type of sensor, because the data centre has no control over the seismic stations.
IT challenges: a) an efficient algorithm for locating the highest-quality data in the archives; b) provenance; c) use of QC parameters in delivery services; d) a PQLX web service.
SOH = State of Health. N = north, E = east, Z = vertical channels.
ORFEUS will put an ADMIRE gateway in place.
Metadata changes are problematic. When metadata changes, should one go back and re-calculate all the QC metrics for data already held? The old, poor data needs to be retained too. Versioning? Provenance?
**Do the kinds of approaches to QC outlined in this presentation have transferable, more generic applicability in other areas? = ENVRI solution to ENVRI common problem no. 1
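
To make challenge (a) above and the re-calculation burden concrete, a small sketch of the kind of QC metric computation involved; it uses ObsPy's bundled example data in place of an archive day-file, and the metrics are illustrative rather than ODC's actual algorithms:

    import numpy as np
    from obspy import read

    st = read()  # no argument: loads ObsPy's bundled example stream
    gaps = st.get_gaps()  # each entry describes one gap/overlap between traces
    print("gaps/overlaps:", len(gaps))

    for tr in st:
        data = tr.data.astype(np.float64)
        rms = np.sqrt(np.mean(data ** 2))  # simple per-trace RMS metric
        print(tr.id, "RMS:", round(rms, 1))

    # Re-running computations like these over the whole archive whenever
    # station metadata changes is exactly the burden discussed above.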
OpenSearch
(Pedro Gonçalves, Terradue)
Presentations\T2-EC-GENESIDEC-HO-11-117 OpenSearch applied to Earth Science.ppt
OpenSearch is a descriptor for multiple search templates. It can be adopted across multiple sites to homogenise access to, and discovery of, distributed information through most of the already-available clients.
It provides RESTful search interfaces, and the output format can be selected from several options, as long as they can be specified through MIME types. The most commonly suggested format is Atom.
Is the EPOS community proposing to support OpenSearch queries on the data it holds, with geo, time and other potential extensions? Will the various data centres that have been described eventually want to support the OpenSearch capability? This is a common solution they could adopt, coming from ENVRI. How can OpenSearch sit on top of ArcLink?
Q. Can OpenSearch be used to solve some of the problems raised in the discussion about ArcLink, i.e., the need for ArcLink servers to synchronise with each other and to hold a copy of all routing information? Marcelo thinks it would be necessary to define some seismology extensions to make it possible to obtain the right granularity. An Atom feed can be returned with the matching objects (metadata), but that does not yet reach the data itself. The discussion can continue.
The checklist in the presentation lists what a data provider's search engine has to provide in order to be accessible by OpenSearch. www.opensearch.org describes the capabilities of a search engine. Geospatial and temporal extensions were added by GENESI.
Q. Does OpenSearch represent an alternative to the catalogue interface approach? A. It is the default binding in OGC CSW 3.0. See http://www.genesi-dec.eu/presentations/OGC_201009.pdf
Chad: what's the obvious advantage? A. It is much more useful than failed attempts to adopt ontologies. BUT Firefox can follow the link and still has no idea what to do with the data, because it doesn't understand the data format. A. The MIME type needs to be defined and registered so that universal readers start to appear, as has happened with netCDF, for example. Aggregators also need to be adopted and used, to help discover sources of results that will match queries.
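
A sketch of the mechanics under discussion: an OpenSearch description document advertises a URL template, and the client fills in the parameters. The endpoint below is invented for illustration, but the template syntax and the geo/time parameter names follow the extensions described on www.opensearch.org:

    import xml.etree.ElementTree as ET

    # Hypothetical description document for a seismology data centre.
    DESCRIPTION = """<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
      <ShortName>Example waveform search</ShortName>
      <Url type="application/atom+xml"
           template="http://data.example.org/search?q={searchTerms}&amp;bbox={geo:box}&amp;start={time:start}&amp;end={time:end}"/>
    </OpenSearchDescription>"""

    ns = {"os": "http://a9.com/-/spec/opensearch/1.1/"}
    template = ET.fromstring(DESCRIPTION).find("os:Url", ns).attrib["template"]

    # Fill in the template; a real client would URL-encode the values.
    query = (template.replace("{searchTerms}", "BHZ")
                     .replace("{geo:box}", "5,50,10,54")
                     .replace("{time:start}", "2012-01-01")
                     .replace("{time:end}", "2012-01-02"))
    print(query)  # the Atom feed returned carries metadata, not yet the data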
SeisHub
(Joachim Wassermann & Marek Simon, LMU)
Presentations\joachim.wasserman.pdf
ObsPy allows one to read, write, manipulate, process and visualise seismology data, including using old codes for analysis. = rapid application development for seismology. www.obspy.org. Very nice tutorials and installers, and a focus on students/young researchers, are some of the reasons why it is so widely used.
SeisHub is a small-institution or personal solution. It is only used by one institution, is unsustained, and seems unlikely to see more widespread adoption.
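
A minimal taste of why ObsPy enables such rapid application development; this is runnable as-is, since read() with no argument loads the example data shipped with ObsPy:

    from obspy import read

    st = read()                      # example seismogram bundled with ObsPy
    print(st)                        # one summary line per trace
    st.detrend("demean")             # a typical processing chain...
    st.filter("bandpass", freqmin=1.0, freqmax=10.0)
    st.plot()                        # ...down to visualisation, in a few lines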
Use cases
(Andreas Rietbrock, Liverpool /VERCE)
Pilot project RapidSeis: virtual computing through the NERIES data portal. The user can alter the functionality of the virtual application using a plugin; everything is done through a browser. Based on Edinburgh's RAPID portlet kit: an Editor portlet to create/compile scripts/jobs (using SDX) and an Executor portlet to parameterise them and run them on whatever resources are available. Used for examining waveform data and localising extracts from it.
..... ideas carried forward into the ADMIRE project .... and on to VERCE.
Seismic data sets are becoming denser and larger: greater coverage of an area, higher-frequency sampling. They are turning up in the local institutes where the data is collected/coordinated.
What is the role of the data centre in this scenario? Add computational capacity. This is the data-intensive research (DIR) problem. Does it require a data-staging approach, or does it require moving computation to where the data is? = ENVRI common problem no. 2
Local institutional infrastructure is needed to deal day-to-day with large datasets that may not be deposited in the data centres.
Once data has been moved, e.g., from a data centre or from a field expedition, into the institute, one wants to maximise its utilisation. J-P Vilotte suggests a hierarchical infrastructure is needed.
Use cases
(Alberto Michelini, INGV /EPOS/VERCE/EUDAT)
Presentations\Use_cases_in_Seismology.pdf
Make it feasible to do scientific calculations otherwise impossible on standard desktops, laptops or small clusters: large data volumes (e.g., data mining) and very CPU-intensive applications (e.g., forward modelling and inversion). = ENVRI common problem no. 3
Metadata definition and assignment, in a way that allows scientists to select the data they need, especially when they are accessing data that is not from their native domain of expertise. = ENVRI common problem no. 4
What are the requirements with respect to assigning metadata to real-time seismological data streams? What metadata has to be assigned? With what frequency? And how quickly? E.g., in order to satisfy the NERA and VERCE use cases for immediate and continuous analysis.
K. Jeffery has suggested a 3-level metadata scheme based on Dublin Core, CERIF, INSPIRE and other relevant standards.
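
For illustration only, what a flat, Dublin Core-like first level of such a scheme might look like for a single waveform chunk; the field choices are assumptions for the sketch, not an agreed community schema:

    # Illustrative first-level (discovery) record; CERIF/INSPIRE and
    # domain-specific layers would sit beneath it.
    record = {
        "identifier": "hdl:PREFIX/example-chunk-0001",  # a PID, see below
        "title": "Continuous waveform GE.APE..BHZ",
        "creator": "GEOFON data centre",
        "date": "2012-02-08T00:00:00Z/2012-02-09T00:00:00Z",
        "format": "application/vnd.fdsn.mseed",         # registered MIME type
        "coverage": "station coordinates / bounding box",
        "subject": "seismology; continuous waveform",
    }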
EPIC or DataCite are being proposed in EUDAT as persistent identifier mechanisms. PID scheme = ENVRI common problem no. 5. This needs coordination with the other cluster projects.
Data centres should adapt to processing incoming real-time data as it arrives: QC it, downsample it if needed, assign metadata and store it. Build a search engine over it. Do it like Google. = ENVRI common problem no. 6
Use cases
(Jean-Pierre Vilotte, IPGP /VERCE/EPOS)
VERCE: an e-science environment for data-intensive (seismological) research based on an extensive SOA. Continuous waveform datasets go back 20 years. The community is well structured around data infrastructures: EIDA, with international links to the USA and Japan.
Science drivers: Earth interior imaging and dynamics; natural hazards monitoring; interaction of the solid earth with the ocean and atmosphere.
WHISPER: new methods using seismic ambient noise for tomography and for monitoring slight changes of properties in the Earth.
Typical workflow steps:
1. Downloading waveforms (= gather the data). Continuous waveform data in mseed/sac/wav formats, plus its metadata; 1-100s of TB. Typically a large number of small data sets, from data centres and from local groups/institutes. Issues: gathering/aggregating a large number of datasets; ftp access and bandwidth; disk transfer and data ingestion to local storage capacity where maximal use can be made of it. Data has various lifecycle lengths.
2. Pre-processing (aligning waveforms, filtering and normalisation). Trace extraction, aggregation, alignment, processing, filtering, whitening, clipping, resampling. = low-level data processing. Highly parallel. Often messy to do on, e.g., 5 years of data; the result then has to be stored. Issues: different trace durations, frequencies and overlaps; metadata definition (currently manual, with no common understanding of what metadata is needed); trace format.
3. Computing correlations (see the sketch after this list). More HPC-intensive: compute the correlation of every pairing of traces, mainly FFTs. Complexity increases with the number of stations, and results have to be stored efficiently. 50x10^6 correlations of 5000 points each take around 22 hours; spectral whitening adds 40%. Can be done on Grids and clusters. Issues: variable time windows, stacking strategy, trace query and manipulation, storage of results, metadata definition.
4. Orchestrated applications:
   a. Measuring travel times and analysing travel-time variations.
   b. Tomographic inversions and spatial/temporal averaging, leading to images and models.
   Issues: orchestrated workflow between the data-intensive applications and the HPC-intensive applications; data movement and access across HPC and data-intensive infrastructures; heterogeneous access and data-management policies from one processing facility to the next.
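
A sketch of the numerical core of step 3 (plus the whitening from step 2) in plain NumPy; the production codes wrap windowing, stacking and storage logic around this:

    import numpy as np

    def whiten(trace):
        """Spectral whitening (step 2): flatten the amplitude spectrum."""
        spec = np.fft.rfft(trace)
        return np.fft.irfft(spec / (np.abs(spec) + 1e-12), len(trace))

    def cross_correlate(a, b):
        """Step 3: correlate one pair of equal-length traces via the FFT."""
        n = len(a)
        nfft = 1 << (2 * n - 1).bit_length()   # pad so circular = linear
        fa, fb = np.fft.rfft(a, nfft), np.fft.rfft(b, nfft)
        cc = np.fft.irfft(fa * np.conj(fb), nfft)
        # reorder so lags run from -(n-1) to +(n-1)
        return np.concatenate([cc[-(n - 1):], cc[:n]])

    # N stations give N*(N-1)/2 pairs, hence tens of millions of correlations:
    # embarrassingly parallel, but the results still have to be stored.
    a, b = np.random.randn(5000), np.random.randn(5000)
    cc = cross_correlate(whiten(a), whiten(b))
    print(cc.shape)  # (9999,)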
Reusable software libraries, workflows, and support for interaction and traceability are needed. Datasets derived during the processing have to be published so that they can be re-used for higher-level processing, and then archived after several months or years. What is the role of the data centres here, and what is the vision? = ENVRI problem no. 7
3 kinds of data lifecycle: persistent and resilient data as a public service; massive data processing pipelines;
community analysis of large data sets.
Use cases
(Rosa Filgueira, UEDIN /Effort)
Earthquake and failure forecasting (of rock) in real time, from controlled laboratory tests to volcanoes and earthquakes. Presentations\DIR-effort.pptx
Project VERCE
(Paul Martin, UEDIN / VERCE)
Presentations\paul.martin.pdf
Aim: create a good platform for DIR. Most of the expertise/experience so far comes from the ADMIRE project, whose gateway consumes workflows and executes them on the resources registered with it.
**How do VERCE and D4Science relate to one another?
**How do ADMIRE platform and Taverna Server relate to one another? What opportunities are there for
Taverna to generate DISPEL representations of its workflows?
Wittenburg: workflows can be talked about at different levels, e.g., workflows for researchers (using, e.g., Taverna) and workflows for data management (e.g., OGSA-DAI, iRODS).
A key issue is interfacing infrastructures like VERCE and others to PRACE and other compute infrastructures. The heterogeneities of access need to be hidden from the users.
Project COMMIT
(Martin Kersten, CWI / COMMIT)
Presentations\KNMIcommit.ppt
A €100m project in the Netherlands: a broad national activity running for 5 years. http://www.commit-nl.nl/
Database technology for event storage and processing – trajectory analysis. MonetDB competes with MySQL and Postgres, and has far better performance above 100 GB.
MonetDB, the SQL and SciQL query languages, and SciLens computational lenses.
SQL is based on the relational paradigm. SciQL changes that: a symbiosis of the relational and array paradigms, allowing array operations. Driven by the needs of astronomy, seismology and remote sensing. See the TELEIOS project (www.earthobservatory.eu) for an example.
Dream machine.
Today's scientific repositories are usually high-volume and file-based, with domain-specific standard formats (e.g., SEED). Locating data of interest is hard, and flexibility, scalability and performance are limited. Raw data and metadata are often held separately, the latter in an RDBMS, brought together through middleware and delivered to applications.
Why are DBMS techniques not better exploited in science?
- Desire to hold data more locally; incompatibility between local stores and centralised data centres.
- Schema mismatch between the relational data model and scientific models.
- .......
The MonetDB Data Vault offers a symbiosis between file repositories and databases, with transparent just-in-time access to external data of interest.
http://www.monetdb.org/Documentation/Cookbooks/SQLrecipies/DataVaults
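
By way of example, what querying such a store looks like from Python; pymonetdb is MonetDB's current Python client (the contemporary one was python-monetdb), and the qc_metrics table here is invented for illustration:

    import pymonetdb  # MonetDB's Python DB-API client

    conn = pymonetdb.connect(database="seis", hostname="localhost",
                             username="monetdb", password="monetdb")
    cur = conn.cursor()

    # Hypothetical per-station QC table; with the Data Vault the raw SEED
    # files would stay in the file repository and be loaded just in time.
    cur.execute("SELECT station, AVG(rms) FROM qc_metrics "
                "WHERE day >= '2011-01-01' GROUP BY station")
    for station, mean_rms in cur.fetchall():
        print(station, mean_rms)
    conn.close()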
Project EUDAT
(Giuseppe Fiameni, CINECA / EUDAT)
Presentations\Fiumeni KNMI_Coordination_Meeting.pptx
VPH has already implemented mechanisms for running multi-scale applications involving elements of PRACE and EGI.
Simple flat; structured; detailed: the 3 levels of metadata (Dublin Core, CERIF, and domain-specific). This is Keith Jeffery's proposal. **Adopt it in ENVRI?
How does EUDAT intend to cope with the deposition of real-time/streaming data? At present it only supports deposition of previously collected data sets. (This reveals a possible need for links between community-specific data stores (e.g., to collect sensor data) and the repository data stores of EUDAT (e.g., to deposit previously collected data objects).) Thus, communities require a staging area in which to collect real-time data and gather it up into digital objects that can be deposited into EUDAT. = ENVRI problem no. 8. In ENVRI, recognise this staging area as a high-level component in ODP, along with a EUDAT component.
Does the EPOS community have the notion of data objects? What are they?
**The Kahn model – consider adopting it in ENVRI?
Discussion
Set joint development priorities; minimise overlaps. Identify gaps. Establish the follow-up. Input to EPOS,
EUDAT and ENVRI.
Topics
- How to deal with the complexity? – separate components managed by recognised authorities, registries, loose coupling, standards
- PID implementation – EUDAT proposal? (EPIC)
- QC procedures implementations
- Metadata (mapping) organization and definitions
- Versioning and provenance and traceability
- Handling large datasets: archives, computational resources, processing
- Federated data center: most efficient organisation
- Archiving secondary data products (synthetics / correlation)
- Workflow / dataflow / processing flow
PID implementation – EUDAT proposal? (EPIC)
DONA – the Digital Object Numbering Authority – is being established. The (obvious) suggestion is to use DONA identifiers to persistently identify data objects; the community needs to define how to use them. See http://www.pidconsortium.eu/ and http://www.handle.net/. (Can't find a DONA website.)
There are multiple possibilities for how to define the allocation of PIDs to objects. The community needs to find the way most appropriate to its needs, and to support PIDs allocated to aggregations/collections of data.
Astronomers have got this cracked, for data and the whole processing workflow: why can't this be used in seismology? http://www.astro-wise.org/what.shtml cf. Bechhofer.
QC procedures implementations
n/a
Metadata (mapping) organization and definitions
What metadata has to be assigned? With what frequency? And how quickly?
The PID is a reference to an object, and it is used as part of the metadata. In theory PIDs are assigned to immutable objects: a modified object is a new object (with a new PID). The data centre deals with its private backyard; data already used have a state.
Discussion points over the PID:
o PID on data samples vs PID on chunks: even chunks are often too many.
o PID on a station configuration on a specific day: that does not help to replicate experiments using the data coming from a certain station.
o A PID could also be assigned to any webservice request, in a way that can be reproduced.
o PID per day, linking to the PIDs of the chunks (sketched below).
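
A toy sketch of that last option: one per-day PID whose record points at the chunk PIDs. mint_pid and all field names are invented for illustration; a real implementation would call the Handle/EPIC services instead:

    import uuid

    def mint_pid(prefix="PREFIX"):
        """Stand-in for a Handle/EPIC minting call; illustrative only."""
        return "%s/%s" % (prefix, uuid.uuid4())

    # 48 half-hour chunks for one station-day, each with its own PID...
    chunk_pids = [mint_pid() for _ in range(48)]

    # ...and one daily PID record aggregating them. Immutability rule: any
    # modified chunk gets a new chunk PID, and hence a new day record.
    day_record = {
        "pid": mint_pid(),
        "type": "daily-aggregation",
        "channel": "GE.APE..BHZ",
        "day": "2012-02-08",
        "parts": chunk_pids,
    }
    print(day_record["pid"], "->", len(day_record["parts"]), "chunks")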
NERA/VERCE/EPOS have to keep EUDAT in contact with the relevant people, to make sense of the PID system and to agree on the assignment policy (which might also be applicable to different schemas, if the community can't agree on one).
It should be possible to identify secondary data produced during the (VERCE) workflow process.
ISO 19115 is the metadata standard for geographic information.
From LifeWatch:
- Primary and derived information (including metadata) related to biodiversity research.
- Meta-information, that is: descriptive information about available information and resources with regard to a particular purpose (i.e., a particular mode of usage).
www.dataone.org/BestPractices
Versioning and provenance and traceability
An area to watch: research objects. There may be multiple approaches to the reproducibility problem; it is naive to think there is only one solution. The distributed computing and HPC worlds may require different solutions.
E.g., the way I have predicted seismograms for a global-scale model on PRACE is completely different from reproducing the workflow process for cross-correlation calculations. Each needs a different kind of information.
Handling large datasets: archives, computational resources, processing
A MapReduce approach in the future? Try a VERCE use case in MonetDB, as an exploratory exercise.
More people can use CPU cycles closer to the data, rather than moving data to where the cycles are.
The cloud is not a solution at present: it offers huge computing capacity, but with a bottleneck of data I/O. It is not obvious that cloud storage can solve the problem either; it is still a lot of data to move around.
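
A minimal sketch of the "cycles close to the data" idea in MapReduce style: map a QC/extraction function over day-files on the machines that already hold them, and reduce only the small summaries centrally. Here multiprocessing stands in for a cluster scheduler, and the file names are placeholders:

    from multiprocessing import Pool

    import numpy as np
    from obspy import read

    def day_metric(path):
        """Map step: runs where the file lives; returns a small summary."""
        st = read(path)
        return path, [float(np.sqrt(np.mean(tr.data.astype(np.float64) ** 2)))
                      for tr in st]

    if __name__ == "__main__":
        files = ["day001.mseed", "day002.mseed"]   # placeholders
        with Pool() as pool:
            results = pool.map(day_metric, files)  # reduce: collect summaries
        for path, rms in results:
            print(path, rms)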
Federated data centre concept (FDSN): most efficient organisation
A working system (EIDA) based on ArcLink exists. EUDAT is studying other approaches.
Aggregating metadata does not mean aggregating the data.
We need to explore the technology stack promoted by EUDAT, and to consider whether EUDAT will become a sustainable infrastructure.
Moving data around continues to be a problem; no-one knows the answer to this.
AAA – what are the security requirements and solutions? Probably minimal, until restricted access to data becomes necessary. Possibly use eduGAIN, although the relatively small scale of restriction may not make it worthwhile at this stage.
Archiving secondary data products (synthetics / correlation)
This starts with the community deciding what is important to archive, and costing and justifying it. The IT solution can't be defined until these fundamentals are known.
L0 – raw data; L1 – QC'd data; L2 – filtered data; L3 – research-level pre-processed data; L4 – research product. This categorisation comes from NASA remote sensing (I think).
**In ENVRI we need to think about how to handle each of these data products, vis-à-vis the cost of archiving vs the cost of regenerating when required again. Can we find common solutions here? = ENVRI common problem no. 9. Get the community closely involved in these decisions: they get to decide.
Workflow / dataflow / processing flow
?
Day 3 Friday, Feb 10:
9:00 – 12:00: reporting back from yesterday's sessions, followed by discussions
- Metadata definitions: how to start moving
- QC: establish QC standards; standardised QC services
- Projects: NERA, VERCE, EUDAT, ENVRI responsibilities
- Follow-up and actions
Metadata enabling data mining
Searching priorities (Chad):
- Location of the data (which federation data is where): a pointer that specifies where to go to fetch the data.
- The discussion has to start about the meta-information: what would you like to search on?
EUDAT will start its metadata task force soon, so input needs to be pushed to them once the metrics are defined. Laurent provided information about the same discussion within GEM:
http://www.nexus.globalquakemodel.org/gem-ontology-taxonomy/posts
Data characteristics: time window, location, QC parameters (integrity, presence of gaps, spectral density), RMS values (statistics on the waveform). Other examples discussed:
- Filtering on noise (currently possible)
- Mean (based on time window; defining the granules is an issue)
- IRIS calculates QC metrics at daily granularity (see Chad's slides)
- IRIS has to take its own way at the moment, despite the EU decisions on metadata metrics
- Network operators contribute by giving input metrics (IRIS won't ask everybody, though)
- SeisComP approach (timestamp configurable): Delay, Offset, RMS
- Similar metrics but different algorithms (further detail on the procedures is needed)
This discussion will receive a follow-up within the community. As a first step, an EIDA wiki will be created.
Metadata actions:
Peter Wittenburg: a group of people must get together to define and store the concepts and the terms, in order to define the metadata items. It has to be done within the FDSN. A useful starting point:
http://www.dataone.org/bestPractices
The EIDA wiki is a first step for our community.
QC Actions
- Test datasets are needed to check the validity of the packages and procedures.
- Prepare the SeisComP list of QC parameters as input for the MUSTANG design. The next step will be to have a closer look at the implementation of certain metrics.
- Marcelo will distribute the SeisComP QC list (first).
- ORFEUS will create an EIDA wiki for the definition of the QC metrics and the metadata discussions.
Projects Coordination
VERCE
VERCE: SA3 portal and centralised administration of the platform, in coordination with NERA development.
VERCE SA1-SA2 (through CINECA and UEDIN): data replication, AAI and resources management, in coordination with EUDAT. It's important to keep Amy (UEDIN) updated and actively involved in these activities.
VERCE JRA1-JRA2 and SA2: definition of use cases and workflow implementation.
Jean-Pierre's comment: having people in low-level discussions doesn't mean that the projects are coordinated; the coordination needs to be formalised. The architects in all involved projects and the project managers have to coordinate these cross-project activities. This will receive attention at the next VERCE meeting.
Use-case coordination is also needed between the two projects VERCE and EUDAT.
VERCE use cases: Reinoud Sleeman and Marek Simon will work out a use case using synthetics for quality control. Feedback to VERCE NA2.
EUDAT
Replication: PID production (needed also by the VERCE catalogues).
Collaboration with VERCE will focus on data staging (PRACE will also be involved).
AAI: EPOS is expected to take part in this discussion, which will also impact VERCE. We expect some clear directives within one year. The EPOS and VERCE communities should come up with requirements. Users of the VERCE use cases must be represented under the EPOS umbrella, in order to find a pragmatic solution together with EUDAT on the adopted standards. Responsibility for this coordination should be taken by the HPC participants in VERCE, in collaboration with EUDAT. We need to avoid technology-driven solutions; therefore requirements need to be provided from the use cases. The relevant partners need to be invited to the periodic video-conferences. CINECA might represent VERCE in this discussion related to AAI.
* Notes: based on Alex Hardisty's and Alessandro Spinuso's meeting notes; merged 20/2/2012
Background information and links (provided at the workshop):
EUDAT (www.eudat.eu) ; iRODS (www.irods.org)
VERCE (www.verce.eu) coordinator Jean-Pierre Vilotte
COMMIT (http://www.commit-nl.nl/Messiaen53/Update%2013%20mei%202011/Toplevel%20document%20COMMIT%20%20April%202011.pdf) (representative: Martin Kersten):
NERA (www.nera-eu.org) (representative: Torild van Eck):
- OGC web services: Web Notification Services; WMS/WFS for earthquake data visualisation with standard GIS tools.
- Analysis of large datasets: execution of predefined workflows designed to filter, downsample and normalise large datasets.
- Resource orientation: unique and persistent ids for all kinds of raw data, data products, user-composed collections, etc., including annotation.
- Seismic portal (www.seismicportal.eu)
ENVRI:
EPOS (www.epos-eu.org): WG7
Data centres:
ORFEUS (www.orfeus-eu.org),
IRIS-DMC (www.iris.edu),
GFZ - GEOFON (http://geofon.gfz-potsdam.de/)
EarthCube (http://www.nsf.gov/geo/earthcube/)
OpenSearch (http://www.opensearch.org)
SeisHub (www.seishub.org)
ObsPy (www.obspy.org)
WHISPER (http://whisper.obs.ujf-grenoble.fr)