PERICLES - Promoting and Enhancing Reuse of Information
throughout the Content Lifecycle taking account of Evolving
Semantics
[Digital Preservation]
DELIVERABLE 7.3.1
Initial Version of Test Bed Implementation
GRANT AGREEMENT: 601138
SCHEME FP7 ICT 2011.4.3
Start date of project: 1 February 2013
Duration: 48 months
Project co-funded by the European Commission within the Seventh Framework Programme (2007-2013)
Dissemination level
PU  Public (X)
PP  Restricted to other programme participants (including the Commission Services)
RE  Restricted to a group specified by the consortium (including the Commission Services)
CO  Confidential, only for members of the consortium (including the Commission Services)
Revision History

V#    Date      Description / Reason of change   Author
V0.1  05/05/14  Initial content table            Sven Bingert
V0.2  27/05/14  Basic content available          Sven Bingert
V0.3  23/06/14  Last version for review          Sven Bingert
V1.0  27/06/14  Final version                    Sven Bingert
Authors and Contributors

Authors
Partner     Name
UGOE        Sven Bingert, Philipp Wieder, Noa Campos López
DOT         George Antoniadis
UEDIN       Alistair Grant
ULIV        John Harrison
SpaceApps   Emanuele Milani

Contributors
Partner     Name
UEDIN       Rob Baxter, Amy Krause
SpaceApps   Rani Pinchuk
KCL         Simon Waddington, Mark Hedges, Christine Sauter (reviewers)
UGOE        Jens Ludwig
ULIV        Adil Hasan (reviewer)
DOT         Stavros Tekes (reviewer)
© PERICLES Consortium
Page 2 / 46
Table of Contents
1 Executive Summary .......................................................... 7
2 Introduction & Rationale ................................................... 8
  2.1 Context of this deliverable ............................................ 8
    2.1.1 PERICLES project ................................................... 8
  2.2 What to expect from this document ...................................... 8
  2.3 Document structure ..................................................... 9
3 User scenarios ............................................................ 10
  3.1 Arts & Media domain ................................................... 10
  3.2 Science domain ........................................................ 10
  3.3 Component mapping ..................................................... 11
4 Integration framework ..................................................... 13
  4.1 Integration framework architecture .................................... 13
    4.1.1 Workflow engine ................................................... 14
    4.1.2 Ingest ............................................................ 16
    4.1.3 Archival storage .................................................. 16
    4.1.4 Data management ................................................... 16
    4.1.5 Access ............................................................ 16
  4.2 Connections and connection types ...................................... 16
  4.3 Handlers .............................................................. 17
    4.3.1 What are they? .................................................... 17
    4.3.2 Why do we need them? .............................................. 17
    4.3.3 What they have to do .............................................. 17
    4.3.4 Anatomy of a handler and a component .............................. 18
    4.3.5 Handlers in a chain ............................................... 19
    4.3.6 Status responses .................................................. 21
    4.3.7 Payloads and workflows ............................................ 22
  4.4 Data object and data management ....................................... 23
    4.4.1 Information package versioning .................................... 23
    4.4.2 Local storage and version management for workflow ................. 24
  4.5 Integration and testing ............................................... 25
5 Test beds ................................................................. 27
  5.1 Common technologies ................................................... 27
    5.1.1 Test management: Jenkins .......................................... 27
    5.1.2 iRods ............................................................. 27
    5.1.3 Maven ............................................................. 28
    5.1.4 BagIt ............................................................. 28
    5.1.5 Topic Maps engine ................................................. 28
    5.1.6 Web service containers ............................................ 28
    5.1.7 Vagrant ........................................................... 28
    5.1.8 Workflow engine and management .................................... 29
    5.1.9 Development languages ............................................. 29
  5.2 Description of the arts & media test bed .............................. 29
    5.2.1 Structure of the test bed ......................................... 29
    5.2.2 Domain specific technologies ...................................... 30
    5.2.3 Current status and tests .......................................... 31
  5.3 Description of the science test bed ................................... 31
    5.3.1 Structure ......................................................... 31
    5.3.2 Domain specific technologies ...................................... 33
    5.3.3 Status and current tests .......................................... 33
6 Roadmap ................................................................... 35
  6.1 Media test bed ........................................................ 35
  6.2 Science test bed ...................................................... 36
  6.3 Integration of tools and components ................................... 36
7 Conclusion ................................................................ 38
List of Figures and Tables .................................................. 39
  List of Figures ........................................................... 39
  List of Tables ............................................................ 39
Bibliography ................................................................ 40
Appendix .................................................................... 42
Glossary

AIP: The Archival Information Package is the information stored by the archive.

API: Application Programming Interface.

Architecture: The architecture is an abstraction of a complex software system or program aiming to describe how the system will behave. It provides a high-level design of the system: the logic of the software elements, the connectors and the relations between them, rather than a description of the implementation and technical details.

Component: A component is a functional unit designed and developed to provide specific sets of operations and behaviors. Functional units can be models, workflows or software.

DIP: The Dissemination Information Package is the information sent to the user when requested.

Framework: A software framework provides generic functionality that can be augmented by user-written code to create an application for a specific purpose. A software framework is a reusable software platform to develop software applications, products and solutions. Software frameworks include support programs, compilers, code libraries, tool sets, and APIs (application programming interfaces) that bring together all the different components to enable development of a project or solution.

JSON: The JavaScript Object Notation is a standard and simple text-based message format designed to be human-readable. It uses text to transmit data objects consisting of attribute-value pairs. It is commonly used to transmit data between a server and a web application.

LRM: The LRM is an operational OWL ontology to be used to model the dependencies between digital resources handled by the PERICLES preservation ecosystems.

LTDP: Long-Term Data Preservation.

OAIS model: The Open Archival Information System model is a standard for digital repositories. The OAIS model specifies how digital assets should be preserved for a community of users, from the moment digital material is ingested into the storage area, through subsequent preservation strategies, to the creation of the package containing the information required for the end user. The OAIS reference model is a high-level reference model, which means it is flexible enough to use in a wide variety of environments.

REST: The REpresentational State Transfer describes a way to create, read, update or delete information on a server using simple HTTP calls. RESTful web services keep interactions between components simple. Clients only refer to the target resource and the actions; each request contains all the information required for execution. No client state is stored on the server between requests.

SIP: The Submission Information Package is the information sent from the producer to the archive.

Test bed: A test bed is a project development platform used for replicable and transparent testing of new tools and technologies. A test bed consists of real hardware and provides an environment for testing new software.

Test scenario: A test scenario is a subset of a user scenario and associated user scenario that allows enactable testing to be carried out. A test scenario should have clear criteria for testing; these can include pass/fail, performance and quality criteria. A test scenario can encompass an entire user scenario or a minor part of it. Test scenarios can enable the testing of requirements against the product.

UML: The Unified Modeling Language provides a standard way to visualize the design of a software system. UML diagrams represent in an easy and understandable way the behavior, actions and components of the system.

Use case: A use case is the list of interactions between actors to achieve a discrete and distinct goal. Actors can be human, internal or external systems and agents. A use case is a formalization of a path in a user scenario.

User scenario: A user scenario describes foreseeable sets of interactions between user roles and system agents. It describes, in plain language, the process by which a goal will be achieved by an initiating entity. The user scenario should define the interacting agents in the process and the time frame for the process to be enacted across. User scenarios should include scope definitions at both the equipment and organizational level.

VM: Virtual Machine.

Workflow: A workflow is a sequence of operations: a step-by-step description of a real piece of work.

xIP: Information Package of any type (SIP, AIP or DIP).
1 Executive Summary
The PERICLES project addresses the aspect of change within long-term preservation and its impact on the reuse of digital data over a prolonged period of time. The research focuses on developing models which will guide the development of services, guidelines, tools, applications and interfaces in support of managing change over time. Two case studies from outside the traditional archive, one from the media and art domain and one from the science domain, have been chosen to validate the outcome of the project and to direct the research into practice domains.
For the Arts & Media domain, several user scenarios describing the interactions between the user and the system were created, whereas for the Science domain workflow descriptions capturing the relevant process information were defined. Combining the information provided by the user scenarios, the workflow descriptions and the user requirements of both domains, we extract the system requirements, as well as the common and specific components and tools that cover the required functionality.
The current version of the integration framework, developed so far and described in deliverable 7.1.1 First version of integration framework and API implementation, allows integrating and testing the extracted components and tools. Its architecture is based on the OAIS model, so we find the common high-level components such as Ingest, Data Management, Archival Storage and Access, but also a Workflow Engine that is responsible for orchestrating the actions of preservation components. Another new concept was introduced: the handler. Handlers are the basis of the integration framework, in the sense that they act as the communication points for each entity within the system; it is the handlers, not the components, that deal with the rest of the PERICLES system and the outside world.
The functionality and behavior of the components mapped from the user scenarios and workflow descriptions need to be tested. To this end we developed test beds for the Arts & Media and Science domains. Both test beds are based on the integration framework and are coordinated and managed by Jenkins. They share some common components and tools, but they use different technologies. Tests were run successfully on the first version of the test beds and are described in this document.
2 Introduction & Rationale
2.1 Context of this deliverable
This document is the second deliverable from the PERICLES WP7 Integration and Test Beds. In the description for this deliverable it says: “Describes and documents the first iteration of the test beds for media and science”.
The deliverable describes the first version of the test bed, which is based on the initial version of the integration framework as described in D7.1.1 First version of integration framework and API implementation. It also marks the milestone MS3 Initial version of prototypes, which relates to the first iteration of the evaluation of the integrated prototype.
2.1.1 PERICLES project
Modern concepts of long-term archiving need to incorporate continually evolving environments. This includes not only changes in technology but also evolving semantics. To validate the approach and the proposed solutions, the research will apply the outcome to two distinct domains, both of which differ from the traditional library archive:
• Arts & Media domain, with a variety of complex, large-scale and dynamic media data from TATE, such as digital images and videos, born-digital data and software-based art installations;
• Science domain, with experimental and contextual scientific, operational, and engineering data from the European Space Agency (ESA) and International Space Station (ISS), like raw data, operation commands, calibration curves, etc.
Not only does the data of each case study differ, but also their respective stakeholders: artists, archivists, historians, researchers, scientists and engineers. The interest in the data depends on the stakeholder, and will evolve over time, as will the technologies and the semantics. The PERICLES project seeks to assure that the data generated today will be available and useful for the next generations of users. These challenges are germane to both cases, as shown in the following examples:
• In the media case: in order to ensure that a digital video artwork remains playable over the next hundred years without losing fidelity with its original, it is necessary to record different types of information: how the data was produced, its color, format and other properties, but also how to represent the format changes.
• In the science case: in order to preserve experimental data in a way that would allow future researchers to replicate or continue the original work, it would be necessary to record important experimental data like operation commands, ground and orbital conditions, etc., but also contextual data like algorithms, configurations and operation activities.
2.2 What to expect from this document
Deliverable 7.3.1 covers the first version of the test bed integration. The description of the current test beds includes the common technologies, the domain-specific technologies and the technical infrastructure. This document also provides information about tests performed on the test beds, and about the relations between the components and the functionality of the test beds. Furthermore, the basis of the test beds, the test scenarios, and their relation to the user scenarios and requirements are demonstrated. This document also gives an update of the integration framework defined in deliverable 7.1.1. This update includes new concepts for the communication between components, workflow management and data management. The document also describes the next steps as well as the roadmap for the whole cycle of iterative test bed implementations.
2.3 Document structure
In Chapter 3 a short summary of the user scenarios and the cross-reference to work package WP2 is presented. It also describes the concept of component mapping applied for the first versions of the test beds. The integration framework used for the test beds is described in Chapter 4. This chapter gives an update to deliverable D7.1.1 and introduces a new concept.
The current status of the test beds is explained in detail in Chapter 5. This chapter is divided into two parts, for the Arts & Media and the Science case.
In Chapter 6 we present a roadmap for the development of the test beds and the integration of tools and components over the complete course of the project.
We conclude in Chapter 7 and give the sources in the Bibliography and an example test plan in the Appendix.
3 User scenarios
PERICLES distinguishes, as mentioned above (see Section 2.1.1), two domains for which the project addresses the long-term preservation of digital data in evolving ecosystems: the arts & media domain and the science domain. User scenarios are stories describing the interactions between the user and the system for reaching a goal, and workflow descriptions capture relevant process information. User scenarios and workflow descriptions are the source of user requirements, which guide the architecture development, and will act as the basis for tests and evaluations. User scenarios from the arts & media domain and workflow descriptions from the science domain are used as input to WP6 to derive common components and tools through the methodology explained in Section 3.3. During the course of the project, scenarios and workflow descriptions will evolve through the addition of components and results from the PERICLES research.
3.1 Arts & Media domain
Within the arts & media domain, working groups specified and analysed four different sub-domains (as reported in D2.1 Requirements gathering plan and methodology):
● Born digital archive collections
● Digital video artworks
● Software-based artworks
● Media productions (Tate Media)
These sub-domains cover a wide range of long-term digital preservation challenges, and each of them represents specific issues regarding semantic change. Several scenarios, user roles and requirements have already been identified for these four sub-domains (D2.3.1 Media and science case study functional requirements and user descriptions). The main workflow areas of those scenarios are:
● Preservation planning & policies
● Appraisal & ingest
● Archive & data management
● Access & re-use
3.2 Science domain
Due to the different remits of the domains, the Space Science stakeholders are less engaged with effective and pervasive preservation in comparison to the Arts & Media stakeholders. Therefore, rapid case studies, which are very high-level user scenarios, were defined to bootstrap interviews with different types of stakeholders, in order to extract detailed data, process and workflow information. From this, user requirements could be extracted for the different user categories, in particular by analyzing their workflows with respect to dataset production and utilization.
The initial set of components reflects the gathered requirements and focuses on:
• continual ingest of data;
• relations and semantic extraction from data;
• information accessibility and presentation;
• creation of meaningful archival packages for atomic pieces of data.
3.3 Component mapping
A key aspect in developing the test bed is to match parts of the user scenarios and workflow descriptions to components and technologies. Some components had already been pointed out in the user requirements by the pilot users (Tate and B.USOC), and other components need to be developed in order to resolve technology gaps.
We established a methodology¹ to allow results from WP2 and WP6 to inform WPs 3-5. The objective is to track technological gaps detected with the help of the requirements. The solutions might either be provided by those components anticipated in the DoW, or encourage partners to develop specific components not anticipated in the DoW.
We applied the following methods:
a) Building a step-by-step visual flow of the user scenarios for discussions with WPs 3-5.
We decided to use Process Flows as UML diagrams (see Figure 1: Example of a process diagram with components mapping), which are straightforward diagrams showing how the steps in each process fit together. They facilitate the communication of how processes work and clearly document how a particular job gets done.
Within those diagrams, we tried to match components against most of the UML symbols (elongated circles). Partners from WPs 3-5 then tried to realize extended uses for the components they had already been building (described in the DoW), in order to fill the orphan component boxes in the diagrams. Moreover, partners sought to uncover areas that were missing a component allocation, which might prove useful for other potential Long-Term Digital Preservation (LTDP) cases. By following the above process we ended up with UML diagrams that had most of the symbols matched to components which are either already available on the market, currently under development, or anticipated to be created as part of the PERICLES project.
b) After this process had been carried out, some parts of the diagrams were not matched with any component. The corresponding activities in the workflow cannot be automated and need to be performed manually to continue the process. In the future such "gaps" may be filled by components built or made available by other projects.
c) Fill in component information in the registry document.
In month 7 the partners started to provide information about the components they were building or intended to build. These were referenced to very early versions of the user scenarios and could only provide very vague descriptions of the functionality they planned to cover. Once the scenarios evolved and reached a more mature stage, the component list was re-activated and updated.

¹ Described in more detail in deliverable D6.1 Specification of architecture, component and design characteristics (M20).
Figure 1: Example of a process diagram with components mapping
4 Integration framework
The framework for integrating and testing the components and principles of the PERICLES approach allows a tester or developer to monitor and initiate a test run at different levels. It was defined as a set of concepts and structures for testing combinations of software modules in a controlled and systematic manner, and provides the infrastructure and procedures to test large concepts and operations that span multiple systems.
4.1 Integration framework architecture
The basis of the implementation of the integration framework is the architecture (see Figure 2) as described in the upcoming deliverable D6.1 Specification of architecture, components and design characteristics from WP6.
Since the architecture is still being developed and refined, the following definitions serve only to provide a better understanding of the building blocks that form the integration framework, and they are subject to change.
The integration framework architecture is based on the OAIS model and is comprised of the following high-level architectural blocks:
1. Workflow Engine
2. Ingest
3. Archival Storage
4. Data Management
5. Access
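To make the relation between these blocks more concrete, the following is a minimal sketch of how a workflow engine might orchestrate the other four blocks during ingest and access. All class and method names here are our own illustrations, not part of the PERICLES codebase.

```python
# Illustrative sketch of the five OAIS-based blocks; names are hypothetical.

class Ingest:
    def receive(self, sip):
        # Validate and accept a Submission Information Package (SIP),
        # turning it into an Archival Information Package (AIP).
        return {"type": "AIP", "content": sip["content"]}

class ArchivalStorage:
    def __init__(self):
        self._store = {}
    def put(self, aip_id, aip):
        self._store[aip_id] = aip
    def get(self, aip_id):
        return self._store[aip_id]

class DataManagement:
    def __init__(self):
        self._index = {}
    def register(self, aip_id, metadata):
        self._index[aip_id] = metadata

class Access:
    def disseminate(self, storage, aip_id):
        # Build a Dissemination Information Package (DIP) on request.
        return {"type": "DIP", "content": storage.get(aip_id)["content"]}

class WorkflowEngine:
    """Coordinates the other blocks by running a fixed ingest workflow."""
    def __init__(self):
        self.ingest = Ingest()
        self.storage = ArchivalStorage()
        self.data_mgmt = DataManagement()
        self.access = Access()
    def run_ingest_workflow(self, aip_id, sip):
        aip = self.ingest.receive(sip)
        self.storage.put(aip_id, aip)
        self.data_mgmt.register(aip_id, {"source": "producer"})
        return aip_id

engine = WorkflowEngine()
engine.run_ingest_workflow("obj-1", {"type": "SIP", "content": "video.mp4"})
dip = engine.access.disseminate(engine.storage, "obj-1")
```

The point of the sketch is only the division of responsibilities: Ingest transforms the SIP, Archival Storage and Data Management hold the result, and Access builds a DIP on demand, with the Workflow Engine driving the sequence.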
Figure 2: Integration Framework Architecture
4.1.1 Workflow engine
At the center of the architecture is the workflow engine. It is responsible for orchestrating the actions of preservation components. A workflow is a sequence of actions that components need to take in order to ensure the preservation of digital objects that are ingested into the digital preservation system. The workflow engine achieves this by providing the necessary functionality to define, store, retrieve, update and enforce the workflows that govern the system. The Workflow Engine is comprised of a number of smaller functional blocks (see Table 1).
© PERICLES Consortium
Page 14 / 46
DELIVERABLE 7.3.1
INITIAL VERSION OF TEST
EST BED IMPLEMENTATION
IMPLEMENTATI
Table 1: Functional blocks of the Workflow Engine
Access Control List (ACL) Management: managing access rights for users and components to the digital material stored within the digital preservation system.
Workflow Management: enabling creation, update and validation of the workflows that govern the digital preservation system.
Workflow Repository: storing the workflows that govern the digital preservation system and delivering them to other functional blocks when requested.
Decision Point: determining when preservation actions are required, creating the necessary action chains (workflows) and triggering the Enforcement Point.
Enforcement Point: executing generated action chains (workflows), including managing errors etc. Execution happens by triggering component handlers.
Component Registry: identifying, describing and providing the location of available preservation component handlers.
The workflow engine as described by the integration framework is, like most parts of the PERICLES architecture, modular and allows blocks to be exchanged with other components or tools that perform the same functions; the workflow management, workflow repository and ACL management blocks are no exception. The use of a third-party workflow system is, and should remain, an option, provided a facade or other intermediate layer is added on top of the third-party system's API to conform to the interface of the integration framework's workflow engine.
As a result of preservation planning, workflows and policies are established to govern the digital preservation system in order to ensure continued access to the stored digital objects over time. Preservation events, such as the deposit of new material or an alteration of preservation workflows, are detected by the decision point, which determines whether the current workflow mandates any further actions. If so, the decision point attempts to determine the action or sequence of actions that need to be taken. It then uses the component registry to determine the appropriate components for the required actions, creates an action chain (a workflow of the necessary components) and forwards this to the enforcement point. The enforcement point is responsible for ensuring that the work of each component in the action chain is completed before forwarding the task to the next component in the chain, and so on. The way in which it does this is discussed in section 5.3. Once the action chain is complete, the enforcement point informs the decision point in order to determine whether further steps are required.
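The interplay between decision point, component registry and enforcement point described above can be sketched in Python as follows. This is a minimal illustration only: all class and function names (ComponentRegistry, decision_point, enforcement_point) and the example endpoints are assumptions, not part of the PERICLES codebase.

```python
# Illustrative sketch of the decision point / enforcement point cycle.
# All names here are hypothetical; the real design is defined in D6.1.

class ComponentRegistry:
    """Maps required preservation actions to available component handlers."""
    def __init__(self):
        self._handlers = {}  # action name -> handler endpoint

    def register(self, action, endpoint):
        self._handlers[action] = endpoint

    def lookup(self, action):
        return self._handlers[action]


def decision_point(event, workflow, registry):
    """Determine the actions mandated by the workflow for an event and
    build an action chain (an ordered list of handler endpoints)."""
    actions = workflow.get(event, [])
    return [registry.lookup(a) for a in actions]


def enforcement_point(chain, execute):
    """Trigger each handler in the chain, stopping at the first failure
    so that the decision point can be informed of the error."""
    completed = []
    for endpoint in chain:
        if not execute(endpoint):
            return completed, endpoint  # failed endpoint reported back
        completed.append(endpoint)
    return completed, None


# Example: a deposit event triggers virus-scan and checksum components.
registry = ComponentRegistry()
registry.register("virus-scan", "http://127.0.0.1:7000/handlers/scan")
registry.register("checksum", "http://127.0.0.1:7000/handlers/checksum")

workflow = {"deposit": ["virus-scan", "checksum"]}
chain = decision_point("deposit", workflow, registry)
done, failed = enforcement_point(chain, execute=lambda ep: True)
```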
4.1.2 Ingest
The ingest block is responsible for accepting a Submission Information Package (SIP) from a producer, and carrying out the steps necessary to create an Archival Information Package (AIP) suitable for storing in archival storage. This process is likely to be very use-case specific, involving a number of sub-components acting in concert. The way in which such components are chained together is discussed in section 4.3.
4.1.3 Archival storage
The archival storage block is responsible for the secure storage of the bit streams (the sequences of 1s and 0s) that encode the digital content of each object. It will need to have some form of internal cataloguing (that is, a mapping of the path or identifier of an object to the physical location of the data in the underlying storage device(s)) in order to provide ongoing access to the object, and will also be responsible on some level for allowing or denying access to the various digital objects through enforcement of the ACLs that are set and managed through the policy engine.
The archival storage block may be responsible for versioning of the digital objects stored within it. It may achieve secure storage through a combination of strategies, the simplest being replication, but alternative or complementary approaches may also involve RAID, erasure coding etc.
4.1.4 Data management
The data management block is responsible for ensuring the preservation of a digital object's function.
4.1.5 Access
The access block is responsible for handling a request for an object from a consumer. When such a request is made, the access block carries out the necessary steps in order to convert the AIP into a Dissemination Information Package (DIP) that is appropriate to the consumer who made the request. This process is also highly use-case specific, potentially involving activities such as format migration, information redaction etc.
4.2 Connections and connection types
Communication between components is performed through a communication layer in the form of an HTTP API. Components registered in the system need to conform to a specific interface that allows other components and the workflow engine to exchange information and data with them. Such an architectural approach allows for a separation of concerns, meaning the functional API can be layered with other services, and keeps interactions between components simple.
For the design of the API the RESTful architectural style has been adopted, using JSON as the method of transporting information. The actual data packages are provided either encoded or as URIs, depending on their size and use. Each component is able to provide a description of the actions it is able to perform, together with its inputs, outputs and parameters.
The API allows the components to accept action chains with the work they must perform, together with two additional actions or chains: one for when their work is completed successfully and one for error handling.
All components need to be registered on the workflow engine and provide a namespace/identifier as well as a list of actions/endpoints and input/output types.
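A registration record of the kind described above (namespace/identifier, actions/endpoints, input/output types, plus success and error continuations) might look as follows. The field names and validation rules here are illustrative assumptions, not a fixed PERICLES schema.

```python
# Hypothetical component registration record, as it might be sent to the
# workflow engine. Field names are assumptions for illustration only.
registration = {
    "namespace": "pericles.migration.tiff2png",   # unique identifier
    "endpoints": [
        {
            "action": "migrate",
            "url": "http://127.0.0.1:7000/handlers/tiff2png",
            "inputs": ["image/tiff"],
            "outputs": ["image/png"],
            "params": ["--in INFILE", "--out OUTFILE"],
        }
    ],
    # Continuation chains: one for success, one for error handling.
    "on_success": "http://127.0.0.1:7000/handlers/store",
    "on_error": "http://127.0.0.1:7000/handlers/report-error",
}

def is_valid_registration(record):
    """Minimal check that the mandatory registration fields are present."""
    if not record.get("namespace"):
        return False
    for ep in record.get("endpoints", []):
        if not all(k in ep for k in ("action", "url", "inputs", "outputs")):
            return False
    return bool(record.get("endpoints"))
```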
4.3 Handlers
The following sections describe the newly introduced concept of handlers, which are used for the communication of the components within the system.
4.3.1 What are they?
The handlers are the basis of the integration framework for PERICLES. They function as the communication points for each major entity within the system. The handlers deal with validating incoming requests, exercising the functionality of developed components, storing and transferring the results of component functions, and initiating the communication with further components to be used.
The handlers do not perform any operations that change or alter the data contained in xIPs; only components can alter and change data, but a component should not have knowledge outside itself. The components should in essence be highly functional 'dumb' pieces of software. This means they can do their function and only their function; the handlers deal with the rest of a PERICLES system and the outside world.
This simplifies the job of the component developers: they only have to provide a method, be it a shell script, executable or callable function, to the handler, which will be used to exercise the component functionality.
Additional functions can be added to the handler based on the component's needs, but these are designed to be plugins to the handler and are not mandatory. Such functions could include data validation, or checking whether the right data format was passed to the handler for the component to operate on.
4.3.2 Why do we need them?
In order to facilitate the creation and operation of situation-aware workflows within a system, the individual endpoint handlers for components must have the facilities described, so that an auditable and recoverable processing workflow trail can be created. The handlers described here would allow the overall system to track and adapt to changing circumstances, while providing the means to avoid repeating process steps in the event of a component failure.
The local storage of the handler would store processing workloads for a given period of time as defined by a workflow coordinator; these are likely to be deleted when final storage is achieved. At the same time, the prospect of component failure across any complex and multi-node system is an ever-present risk. The logging, and the ability to adapt the targets for next steps in the handler, allow this to be mitigated while providing a monitoring mechanism for the overall processing workload.
Providing the means to adapt workflows on the fly and to keep an audit trail with local backups of current processing will allow the overall architecture to respond better to individual component failure.
4.3.3 What they have to do
Table 2 summarises the operations that a handler must be able to perform:
Table 2: Operations of a handler
Get Incoming Request
Validate and Respond to Request
Pull and Validate Data
Exercise Component
Store to Local Storage
Get Next Target Handler
Perform Request
Accept Response
Error Handling:
• Local Response
• Central Response
Garbage Collect Local Data Store on Request
Provide Status of Handler and Component
4.3.4 Anatomy of a handler and a component
Table 3 gives a view of the different parts of a handler and its relationship to a functional component. The handlers are to provide the functions listed previously, and the anatomy below should be used as a template so that the handlers can have a plugin basis for adaptation to changing technologies and targets.
Table 3: Handler and Component Anatomy (green parts are framework-developed, purple parts are component-developed)
Incoming Request | Workflow Coordination | Outgoing Request
Communications | Communications | Communications
Validation Adaptor | Local Storage | Validation Adaptor
Component Adaptor
Component
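The anatomy in Table 3 could be realised as a handler class with pluggable adaptors around a developer-supplied component. The following is a sketch under assumed names (Handler, component_adaptor, validate_in/validate_out), not the framework implementation.

```python
# Sketch of the handler anatomy from Table 3: communications, validation
# adaptors and local storage are framework-provided; the component adaptor
# wraps the developer-supplied component. All names are illustrative.

class Handler:
    def __init__(self, component_adaptor, validate_in=None, validate_out=None):
        self.component_adaptor = component_adaptor           # component-developed
        self.validate_in = validate_in or (lambda d: True)   # optional plugin
        self.validate_out = validate_out or (lambda d: True) # optional plugin
        self.local_storage = []                              # local backup store

    def handle(self, payload):
        """Validate incoming data, exercise the component, store the result
        locally and return it for forwarding to the next handler."""
        if not self.validate_in(payload):
            raise ValueError("invalid incoming payload")
        result = self.component_adaptor(payload)             # exercise component
        if not self.validate_out(result):
            raise ValueError("invalid component output")
        self.local_storage.append(result)                    # keep local copy
        return result


# A 'dumb' component that only performs its own function:
uppercase = Handler(component_adaptor=lambda p: p.upper())
out = uppercase.handle("sip-001")
```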
4.3.5 Handlers in a chain
To function as part of a PERICLES system, the handlers must be connected in a chain (see Figure 3). These chains form a workflow within the overall archive system. The workflows need to be stored at central and/or local (distributed) levels; this requires a workflow repository. This would form part of the preservation and data management layers in the OAIS model. The workflows will be based upon the policies within the system as determined by the archive operators and, where possible, will be translatable into machine-executable workflows.
The need for a contactable mechanism for determining the next step in the workflow requires the use of a workflow manager. At present, the development of the test bed uses an incrementally developed prototype, though in future versions this could conceivably be replaced by a full workflow system such as Taverna.
Figure 3: Chained Handlers
Figure 3 shows the concept of chaining the handlers together into a workflow. The workflow management is where a handler can determine its next step in a workflow, i.e. where the data has to go. As for the actual mechanics of workflow management: it could be a centralized component, though this would introduce a single point of failure and possibly performance issues, or it could be distributed, with local cached copies of the part of a workflow relevant to a handler being stored. The latter would introduce the need for a refresh mechanism but would remove the single point of failure.
So far this introduces three major functional aspects to the framework:
• Components: These are the functional unit workers. They should only be focused on the task they were designed for and should not be required to communicate beyond themselves.
• Handlers: These are the wrappers and communication hubs that allow a workflow to be followed through the system. They should follow the required behaviors listed and have the ability to respond to status inquiries.
• Workflow Manager: This should store the standard workflows as templates, which can be instantiated with a workflow instance identifier; handlers can communicate with the manager to determine the next endpoint to target, including the parameters that are required to call that endpoint.
Communication in the system operates over HTTP using the CRUD actions GET, POST, PUT and DELETE. The standard method of calling another handler is the POST method with a JSON payload.
The handlers must provide at a minimum the GET and POST methods. The GET method (see Table 4) should provide, at a minimum, a status method that will return the current status of all local workloads at a given component. To extend this, the GET method should provide a means of getting the status of a specific workload.
Table 4: GET method for getting the status of a specific workload
GET Target | Results
/status | Returns a list of all current workloads and their status
/status/<ID> | Returns the status of the workload identified by the ID
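The GET targets in Table 4 can be sketched as a simple path resolver; a plain dictionary stands in for a live handler's workload list, and the routing and status codes shown are illustrative assumptions rather than the test bed implementation.

```python
# Sketch of the GET /status routing described in Table 4, using a plain
# dictionary of workloads instead of a real HTTP server. Illustrative only.

workloads = {"42": "Working", "43": "Pending"}

def get_status(path):
    """Resolve a GET target of the form /status or /status/<ID>."""
    parts = [p for p in path.split("/") if p]
    if parts == ["status"]:
        return 200, workloads                    # all current workloads
    if len(parts) == 2 and parts[0] == "status":
        if parts[1] in workloads:
            return 200, {parts[1]: workloads[parts[1]]}
        return 404, None                         # unknown workload ID
    return 400, None                             # malformed target
```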
The POST method (see Table 5) should allow for the transmission, reception and validation of a JSON payload containing the information required for the handler to exercise the component that it represents and then move the workload to the next handler.
Table 5: POST method provided by a handler
POST Target | Results
/<HANDLER> with JSON Payload | Returns a validation and response code:
• Valid Payload: HTTP 201 Created
• Invalid Payload: HTTP 400 Bad Request
The handlers should be able to query the workflow manager server to get a workflow response that allows them to determine the next handler to be contacted. This facility should be exposed by the server via a GET operation (see Table 6).
Table 6: GET method for getting the workflow
GET Target | Results
/workflows | Returns a list of current workflows
/workflows/<ID> | Returns the workflow identified by the ID
/workflows/<ID>/<STEP> | Returns the step <STEP> of workflow <ID>
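The workflow manager endpoints in Table 6 can be sketched the same way: a handler asks for step <STEP> of workflow <ID> to discover its next target. The in-memory workflow table and routing logic below are assumptions for illustration.

```python
# Sketch of the workflow manager GET endpoints from Table 6. A handler asks
# for step <STEP> of workflow <ID> to find its next target. Illustrative.

workflows = {
    "1": [
        {"id": "componentX", "url": "http://127.0.0.1:7000/handlers/x"},
        {"id": "componentY", "url": "http://127.0.0.1:7000/handlers/y"},
    ]
}

def get_workflow(path):
    """Resolve /workflows, /workflows/<ID> or /workflows/<ID>/<STEP>."""
    parts = [p for p in path.split("/") if p]
    if parts == ["workflows"]:
        return 200, list(workflows)                  # list of workflow IDs
    if len(parts) >= 2 and parts[0] == "workflows" and parts[1] in workflows:
        wf = workflows[parts[1]]
        if len(parts) == 2:
            return 200, wf                           # the whole workflow
        step = int(parts[2])
        if 0 <= step < len(wf):
            return 200, wf[step]                     # a single step
    return 404, None                                 # unknown workflow/step
```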
4.3.6 Status responses
In order to coordinate workflows by monitoring the readiness and activity of components in an archive system, the handlers will be required, as previously shown, to respond to status queries from another entity. To codify the possible responses and levels of response, the following types of status are proposed as an initial version. The handlers should respond to a status request with a reply composed of the elements detailed in Table 7.
Table 7: Status response of a handler
Response Level | Response Values | Meaning
One | Alive | Handler and component are active and will respond with further status information levels.
Two | Dead | Handler is not able to process requests.
Two | Idle | Handler has no active workloads.
Two | Queue | Handler has a queue of active workloads and continues with further status information.
Three | Pending | Workload is in queue awaiting processing.
Three | Working | Workload is actively being processed.
Three | Blocked | Component is not responding to handler.
Three | Error | Error has occurred during the processing of a workload.
Three | Finished | Workload has finished.
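The layered reply of Table 7 can be composed programmatically; the reply structure below (keys "one", "two", "three") is an illustrative assumption about how the three levels might be serialised.

```python
# Sketch of the layered status reply from Table 7: level-one liveness, a
# level-two queue state and, if there are workloads, level-three per-workload
# states. The reply structure is an assumption, not a fixed schema.

LEVEL_THREE = {"Pending", "Working", "Blocked", "Error", "Finished"}

def status_reply(alive, workload_states):
    """Compose a status reply for a handler with the given workloads."""
    if not alive:
        return {"one": None, "two": "Dead", "three": {}}
    two = "Queue" if workload_states else "Idle"
    assert all(s in LEVEL_THREE for s in workload_states.values())
    return {"one": "Alive", "two": two, "three": workload_states}

reply = status_reply(True, {"42": "Working", "43": "Pending"})
```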
4.3.7 Payloads and workflows
Workflows (Code 1 and Code 2) and payloads (Code 3 and Code 4) are described by JSON structures; these are lightweight, descriptive and readily interpreted by many different technologies.
4.3.7.1
{
"id":
"wf":
{
"id":
},
{
"id":
WORKFLOWS
"1",
[
"componentX",
"url":
"http://127.0.0.1:7000/handlers/x",
"params":
[
"--in
INFILE",
"
"--out
OUTFILE"
]
"componentY",
"url":
"http://127.0.0.1:7000/handlers/y
http://127.0.0.1:7000/handlers/y",
"params":
[
"--validate",
"--in
INFILE"
]}]}
Code 1: Example Workflow
A workflow, such as that in the above example, is composed of an identifier and a set of steps, which
identify component handlers and their inputs.
Below is the skeleton JSON:
{
  "id": <Unique Workflow Identifier>,
  "wf": [ <Set of Steps in the Workflow>
    {
      "id": <Component Identifier>,
      "url": <Component Endpoint>,
      "params": [<Parameter Array>]
    }
  ]
}
Code 2: Workflow JSON Skeleton
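A minimal validation of a workflow document against this skeleton can be sketched as follows; the example document mirrors Code 1, and the validation rules are assumptions rather than the framework's actual checks.

```python
import json

# Minimal validation of a workflow document against the Code 2 skeleton:
# an identifier plus a list of steps, each naming a component handler, its
# endpoint and a parameter array. A sketch, not the framework validator.

WORKFLOW_JSON = """
{"id": "1",
 "wf": [
   {"id": "componentX",
    "url": "http://127.0.0.1:7000/handlers/x",
    "params": ["--in INFILE", "--out OUTFILE"]},
   {"id": "componentY",
    "url": "http://127.0.0.1:7000/handlers/y",
    "params": ["--validate", "--in INFILE"]}
 ]}
"""

def is_valid_workflow(doc):
    """Check that the document has the shape of the Code 2 skeleton."""
    if not isinstance(doc.get("id"), str) or not isinstance(doc.get("wf"), list):
        return False
    return all(
        isinstance(step.get("id"), str)
        and isinstance(step.get("url"), str)
        and isinstance(step.get("params"), list)
        for step in doc["wf"]
    )

workflow = json.loads(WORKFLOW_JSON)
```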
4.3.7.2 PAYLOADS
Each POST operation to a handler must include, as part of the call, a JSON payload which identifies the data source, the xIP or workload identifier, the workflow instance identifier, the workflow template identifier and step, and the parameters for the handler to use with the underlying components.
The payload should be well-formed JSON; the handler can apply further validation to the parameters section and possibly the data source identifier.
{
  "payload": {
    "xid": "1",
    "xuri": "https://github.com/pericles-project/tests.git",
    "wid": "1/0",
    "wiid": "389982-1293081",
    "params": []
  }
}
Code 3: Example Payload
{
  "payload": {
    "xid": <Workload/xIP Identifier>,
    "xuri": <URI for Data Source>,
    "wid": <Workflow ID/Step>,
    "wiid": <Workflow Instance>,
    "params": <Parameters>
  }
}
Code 4: Payload Structure
The payload consists of five elements:
• xid: the identifier for the current xIP
• xuri: URI for the data source from which to obtain the xIP content to be operated on
• wid: workflow ID and the step the component is at
• wiid: unique operating instance ID of the workflow
• params: array of parameters for the underlying component
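The validation a handler might apply to this payload, together with the response codes from Table 5, can be sketched as below; the checks shown are assumptions, and real handlers may validate the params section and data source identifier further.

```python
# Sketch of payload validation for the five-element envelope of Code 4,
# returning the response codes of Table 5. Checks are illustrative only.

REQUIRED = ("xid", "xuri", "wid", "wiid", "params")

def validate_payload(envelope):
    """Return the HTTP status code a handler would answer with."""
    payload = envelope.get("payload")
    if not isinstance(payload, dict):
        return 400                      # Bad Request: malformed envelope
    if any(key not in payload for key in REQUIRED):
        return 400                      # Bad Request: missing element
    return 201                          # Created: workload accepted

good = {"payload": {"xid": "1",
                    "xuri": "https://github.com/pericles-project/tests.git",
                    "wid": "1/0", "wiid": "389982-1293081", "params": []}}
```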
4.4 Data object and data management
The basic unit handled by the major systems in an archive described by deliverable D7.1.1 is the transmitted data package. These packages are bags as defined by BagIt, a defined format that includes manifest and fixity-value validation. The intention is to use this as the basic interchange and storage format: individual components within the major system operate on the contents of bags but output a set of bags for handling by the integration framework for an archive. Having a defined interchange format allows the communications of the archive to be removed from individual components to another abstraction layer, thus allowing components to be tightly focused on a single purpose.
As part of using bags as the basic transfer and storage unit, a unique identifier is generated for each bag stored in the overall archive. This identifier must be unique only within a given archive and is intended for internal usage only. It allows artworks and materials to be associated with one another through the data management layer and via any storage metadata.
The use of the identifiers allows networks of AIPs to be followed, so that new information can be added to existing artworks and documentation. The intention is to allow any AIP to be related to another. It is important to keep the identifiers in the metadata in the storage, to allow reconstruction of AIP networks in the case of catastrophic failure of the data management layer. This provides a method to follow the evolution of an object within the archive.
For this functionality to be consistent, it is important that individual implementations of the storage and data management layers conform to consistent and uniform interfaces designed to expose the required functionality.
4.4.1 Information package versioning
One part of the data object management design has to cope with different versions of the xIP structures being transferred within the system. It can be envisioned that requirements for the information and structure will vary over time, and component needs may change over time as well. This could result in updates to how an xIP is structured and defined.
This is to be expected in a long-lived system. To help accommodate it, the components and handlers must be able to identify the xIP versions which can be received as input and transmitted as output. This information will be required by the workflow coordination and policy management components so that alternate paths for workflows can be defined. On the input side it is important to maintain, at a minimum, backwards compatibility for xIP versions.
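The version negotiation described above can be sketched as a simple compatibility check between the xIP versions one handler emits and those the next accepts; the version labels and record shapes are illustrative assumptions.

```python
# Sketch of xIP version negotiation: each handler advertises the xIP
# versions it accepts as input and emits as output, and the workflow
# coordination checks that adjacent steps are compatible. The version
# labels (xip-1.0 etc.) are assumptions for illustration.

def compatible(producer_out, consumer_in):
    """True if some version the producer emits is accepted by the consumer,
    i.e. the consumer maintains backwards compatibility with it."""
    return bool(set(producer_out) & set(consumer_in))

ingest = {"out": ["xip-1.0", "xip-1.1"]}
store = {"in": ["xip-1.0", "xip-1.1", "xip-2.0"]}  # backwards compatible
legacy = {"in": ["xip-0.9"]}                       # incompatible consumer
```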
4.4.2 Local storage and version management for workflow
A potential solution to the transfer and management of changes to data flowing through the archive system process is to give the handlers local storage under version management, which is used to store the data, changes and audit logs.
Figure 4: Data Pull
Figure 4 shows the path of data being pulled along a chain of handlers between the local data storage (yellow circles). The diagram shows a normal, fully functional case: once a handler receives a request (POST) from another handler or client, the targeted handler will respond with an okay (HTTP 200 OK) or resource created (HTTP 201 Created) response. Once the request is being processed, the new handler will pull data from the previous handler; this will include a change log and a file set in the form of a repository. This repository could be implemented via Git or another source control system. The logic behind this is to create a series of auditable workloads, which will allow processed work to be retained in local storage at components in the event of a failure in a workflow.
This need not be linked to any specific technology; the main criteria are:
• Pull and push mechanism for data transfer: this will allow data to be transferred between components.
• Delta storage log: the change log for the digital objects; this should allow the changes and processing of the objects to be tracked.
• Audit log creation: a record of who made changes and when.
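The three criteria above can be sketched as a toy in-memory store; a real handler might, as noted, delegate this to Git or another source control system, and the class and field names here are assumptions.

```python
from datetime import datetime, timezone

# Sketch of the local version-managed storage described above: each handler
# keeps the current data, a delta log of changes and an audit log of who
# changed what and when. A toy in-memory model; names are illustrative.

class LocalStore:
    def __init__(self):
        self.data = {}        # current file set
        self.deltas = []      # delta storage log (old/new per change)
        self.audit = []       # audit log: who, when, which paths

    def commit(self, actor, changes):
        """Apply a set of path -> content changes, recording deltas and
        an audit entry."""
        for path, content in changes.items():
            old = self.data.get(path)
            self.deltas.append({"path": path, "old": old, "new": content})
            self.data[path] = content
        self.audit.append({"who": actor,
                           "when": datetime.now(timezone.utc).isoformat(),
                           "paths": sorted(changes)})

store = LocalStore()
store.commit("handler-x", {"manifest.txt": "v1"})
store.commit("handler-y", {"manifest.txt": "v2"})
```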
There are potentially a few issues with this approach:
• Changes and tracking on binary files: current methods for tracking changes in binary files are not efficient and could cause performance or storage problems when binary files undergo many changes.
• Scheduled cleanups: how are the local repositories removed when no longer required by the system, and how do the handlers get informed to remove data?
• Workflow mapping: at what level should a component and its handler know about its place in a workflow? Can a balance be found such that a component can intelligently balance its workloads and be able to adapt to failures in upstream or downstream processing elements?
• What is being stored: what should be stored on the local storage? Should it be an entire repository or the subset of the repository that the component operates on?
• When and how transfers are effected: are the transfers synchronous with the requests, or queued, and what will be used to effect the transfer of content?
• What is being transferred: are changes and deltas for files being exchanged, or are entire repositories being transferred?
Version management is a common and standard practice in software development and, with some investigation, could be adapted for use in the handler layer of the framework.
This introduces an element of redundancy to the storage of processing data: if a component or chain fails, the data processed up to that point is still stored by other components and possibly a centralized backup (depending on policy). Due to delta storage, only specific artifacts need to be marked as origin or important copies to be stored in their entirety; other parts can be regenerated through the delta logs.
This mirrors the steps that the user scenarios are exploring to record information about the pre-ingest processes that create the data to be stored in an archive.
Using version management to form an AIP in the storage layer means the original data, the changes and additions to the data and, importantly, the record of how these changes were made and in what order can be stored as an atomic unit.
4.5 Integration and testing
The approach being taken, as laid out in deliverable D7.1.1, is a test-driven approach, which will allow for unit, integration and system testing by developers of components in the test bed, while allowing scenario and disaster testing to be implemented as higher-level automated testing. It is important to note that system testing and scenario testing are not the same: scenario testing is against specific criteria laid out by the stakeholder for a given set of circumstances, whereas system testing is more general operation and behaviour testing.
To accommodate the wide range of components and target parts of an archive system, the major infrastructure components in the test bed will be continuously operational. This reduces the potential scope for error and the complications that would arise if these components were to be redeployed for each test run. To ensure there is no carry-over between test runs, reset scripts will be employed to delete and recreate a basic data set in the components for the tests to operate on.
These tests will be coordinated and managed by the Jenkins continuous integration framework. Jenkins will obtain the source code from the repositories identified by the developers, test the structure of the component and the compilation of the component, and then start on the different levels of testing. Initially most components should be provided with a set of test cases and skeleton code; these test cases should fail if the code is incomplete. Note that non-compiling code will automatically fail all testing. Part of this will include the generation of documentation for a component; this includes source code documentation and user manuals where appropriate. An example test plan is located in the appendix.
Prior to code submission, the developers should make sure that the test bed has the compilers and system libraries required by their code. Should third-party libraries be required, a dependency management solution must be employed. Where such a system is used, the developers should contact the test bed administrator to ensure that it is installed and available to Jenkins.
The tests will report back on the success of compilation and the successes/failures of the tests implemented at that point in time. The reporting mechanism will be the dashboard on Jenkins (http://129.215.213.251/jenkins/). This will include a history of recent builds, execution and compilation times, and a link to a generated website if the build script and Jenkins provide support. It is the developer's responsibility to check the status of the components and the tests.
In D7.1.1 the main types of test were described; current and future work builds upon the definitions of unit, integration, system and scenario testing.
Unit testing of components is the responsibility of the component developers; these tests are about the functionality of the component. The framework components for communications and coordination will be unit tested by the framework developers. Both sets of unit tests should be available to the continuous integration testing management system as precursors to further tests.
Integration testing will fall into four main categories of inter-component testing within the framework. These categories are:
• Status Tests - These are common tests for ensuring the availability and monitoring capabilities of the handlers and infrastructure coordination.
• Communication Tests - These test the ability to move data and commands between the handlers and components in small-scale tests between isolated component groups.
• Behavior Tests - These test the functionality and consistency of the component groups under test conditions, which should include normal and error cases.
• Versioning Tests - Tests to cope with variability in the versions of components and transmitted data packet structures.
System testing will be focused around end-to-end general operation of components within the integrated system. These tests follow the pattern of the integration tests and add two more test categories:
• Replacement Testing - Tests to evaluate how the replacement of components can be accomplished and, where required, indicate the need for migration planning.
• Disaster Testing - Tests for archive system resilience to induced major component failure and data loss.
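As an illustration of the status-test category above, a minimal automated test of this kind might look as follows. The handler is stubbed in-process here and all names are assumptions; a real test run under Jenkins would issue an HTTP GET /status against a deployed handler endpoint.

```python
import unittest

# Sketch of a status integration test: check that a handler reports itself
# Alive and, when idle, reports no level-three workloads (see Table 7).

def stub_handler_status():
    """Stands in for GET /status on a live handler; illustrative only."""
    return {"one": "Alive", "two": "Idle", "three": {}}

class StatusTest(unittest.TestCase):
    def test_handler_is_alive(self):
        reply = stub_handler_status()
        self.assertEqual(reply["one"], "Alive")

    def test_idle_handler_has_no_workloads(self):
        reply = stub_handler_status()
        self.assertEqual(reply["two"], "Idle")
        self.assertEqual(reply["three"], {})

suite = unittest.TestLoader().loadTestsFromTestCase(StatusTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```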
The last category of tests is scenario tests as determined by the two case studies. These tests will
follow similar patterns to previous types but will be centred around specific actions and responses.
5 Test beds
The functionality and behavior of the components and tools mapped from the user scenario and workflow descriptions, and their requirements, should be tested. To this end we developed test beds for both domains, Arts & Media and Science.
A test bed is a project development platform that provides an environment for rigorous, replicable and transparent testing of new tools and technologies. It includes software, hardware and networking elements. The test beds are based on the integration framework previously detailed, allowing developers to test a particular component or tool in an isolated fashion, as the framework is implemented around the component. In that way, the component behaves as if included in the program/system to which it will later be added.
5.1 Common technologies
5.1.1 Test management: Jenkins
Jenkins (Jenkins, 2013) is an open source fork of the Hudson project and a continuous integration system for software testing. The system can execute local and remote test suites, monitoring the progress and status of execution, with reports being generated to a web interface.
The system can be administered through a web interface. The test system can interact with remote virtual machines, code repositories and build systems. The flexibility of the system in how reports are generated and how tests are developed and added to the execution plan is well documented and explained. Key factors that have helped Jenkins become an accepted system are its easy and flexible configuration, the plugin model for new tools and processes, the reporting mechanisms and the distributed build and execution system.
For PERICLES, Jenkins offers a good continuous integration tool, which can evolve as the project matures, removing concerns about the work expanding beyond the testing capabilities.
The Jenkins dashboard for the test status is accessible via the URL (A&M):
http://129.215.213.251/jenkins
The current tests being hosted by Jenkins are build and unit testing for the framework components that form connections between the ingest subsystem and the storage layers.
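The build-and-unit-test jobs that Jenkins executes reduce to ordinary test suites. As an illustration only, the following minimal sketch shows the kind of unit test a framework component might expose; the `payload_checksum` helper and the `ChecksumHandlerTest` name are hypothetical, not actual PERICLES components:

```python
import hashlib
import unittest

def payload_checksum(data: bytes) -> str:
    """Return the MD5 checksum a storage handler would record for a payload."""
    return hashlib.md5(data).hexdigest()

class ChecksumHandlerTest(unittest.TestCase):
    """A unit test of the kind Jenkins runs against framework components."""

    def test_known_payload(self):
        # MD5 of the empty payload is a well-known constant.
        self.assertEqual(payload_checksum(b""),
                         "d41d8cd98f00b204e9800998ecf8427e")

    def test_checksum_is_stable(self):
        data = b"example AIP payload"
        self.assertEqual(payload_checksum(data), payload_checksum(data))
```

Jenkins would run such a suite (e.g. via `python -m unittest`) as part of the build stage and surface the pass/fail report in its web interface.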
5.1.2 iRods
To quote the online documentation (iRods, 2013):
“iRODS is the integrated Rule-Oriented Data-management System, a community-driven, open source, data grid software solution.”
iRods provides the functionality to manage large file collections across disparate file systems and
storage technologies, providing facilities for data replication, integrity checks and policy based file
migrations.
Similar to Archivematica, iRods uses a micro-service architecture monitored and controlled by a configurable and extendable rules system. The micro-service architecture is well defined and has a clearly defined set of interfaces for micro-service components. This could be of use in PERICLES at multiple levels: iRods could function in the ingest, access or storage layers, or a combination of layers. iRods has a large user and development community, with interfaces to multiple languages and different approaches to interaction, including command line scripting and web services. iRods appears to be well placed for the near future in terms of stability, and any cessation of activity in the software will likely be announced well in advance.
(A&M) iRods: hosted on an internal virtual machine. It acts as the storage layer in the current test bed.
(Sc) An iRODS installation, finally, provides the archival storage component for archival packages.
5.1.3 Maven
Maven is a build automation tool used to manage the software build, packaging and deployment
process, initially for Java project but has been expanded to support other languages. This framework
supports dependency management and a plugin behavior support to include new and different
functions. The plugin nature of Maven is done via an exception model, where the default Maven
option is used unless an explicit setting is described.
5.1.4 BagIt
BagIt is a file packaging structure for storing and transferring digital content. The structure (or bags) mandates a payload (the content to be stored) and tags (metadata about the file structure). The tags are important as they contribute to the validation and verification of bag contents through checksums and file listings. BagIt is intended to be cross-platform and is supported via libraries in several programming languages. The format has wide adoption in digital libraries.
5.1.5 Topic Maps engine
The Topic Maps engine provides the ability to store and work with both data and metadata (relationships among pieces of data); that is, it is suitable for implementing the data management component. Once an ontology has been defined, data and metadata can be organized in a semantic model by means of the Topic Maps engine. The functionalities to query and edit the semantic model are exposed by implementing TMAPI, a standard Java interface.
5.1.6 Web service containers
Where required, the virtual machines run a web service container which hosts the web services fronting each major component.
5.1.7 Vagrant
Vagrant is free and open-source software that provides a reproducible, easy to configure virtual development environment. It works with several virtualization products, such as VirtualBox and VMware.
Vagrant allows all members of a team to run code in the same environment, against the same dependencies, all configured the same way, independently of the platform (Linux, Mac OS X or Windows). Vagrant makes code development, testing, versioning and revision control easier, as well as managing virtual machines via a simple command line interface.
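As an illustration, a minimal Vagrantfile of the kind that would describe a test bed VM is shown below; the box name and inline provisioning command are placeholders, not the project's actual configuration:

```ruby
# Vagrantfile: describes a reproducible Ubuntu VM for a test bed component.
Vagrant.configure("2") do |config|
  # Base image; any Ubuntu box from the public catalogue would do (placeholder).
  config.vm.box = "hashicorp/precise64"
  # Bootstrap step installing the component under test (placeholder command).
  config.vm.provision "shell", inline: "apt-get update"
end
```

Because this is a small text file, it can be change-tracked in revision control alongside the component sources, in contrast to full VM images.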
5.1.8 Workflow engine and management
The basic concept in the test bed is the use of workflows, and to this end the handlers are intended to facilitate communication on this basis. It has been noted that, if this is the intended approach, existing workflow engines and management systems may provide much if not all of the required functionality. To this end, existing workflow engines will be examined for either replacing the handlers or augmenting them. Two of the initial engines to be examined are Taverna and Kepler.
Taverna Workbench is to be examined for this purpose. Developed by myGrid, actively developed with an extensible design philosophy, and used in commercial and academic organisations, it is a prime candidate for workflow design and execution.
Kepler is another workflow design and execution engine that, like Taverna, presents workflows as directed graphs, with nodes being the execution and decision points and edges being communication channels. Kepler was initially developed in 2002 and has been used in many scientific fields for workflow development, including hydrology, physics and computer science.
5.1.9 Development languages
The following languages have been identified as being useful in the development of the test bed and
integration framework components:
• Python - a general purpose, high level programming language designed to emphasise code readability and brevity. Python supports multiple styles of programming and includes dynamic typing and automatic memory management.
• Java - a general purpose, high level, object oriented programming language intended to support multiple platforms in a "write once, run anywhere" model. This is accomplished via compilation to byte-code, with Java virtual machines for different platforms interpreting the byte-code.
• Javascript - a prototype-based scripting language using dynamic typing with first class functions. Originally for client side scripting in web browsers, Javascript and frameworks based on it have been adopted for other purposes, such as GNOME Shell using it as a default programming language.
• Coffeescript - a language that compiles to Javascript but improves the development process by adopting ideas from other languages into its syntax. This improves the readability and maintainability of programs developed using it.
It should be noted that this list is not exhaustive and only reflects current work in this area; it will be added to as required in the lifetime of the project.
5.2 Description of the arts & media test bed
5.2.1 Structure of the test bed
The media test bed is based on the application framework presented in the deliverable D7.1.1. The
current test bed implementation is based on the main requirements for components and
functionality required in the ingest and storage processes from the Arts & Media user scenarios. The
test bed is intended to have the main infrastructure components (including ingest engine, data management) set up on virtual machines as required by the test scenarios. Upon these virtual machines, instantiations of components under test in the current test scenario will be installed and operated.
Figure 5: Conceptual view of the Arts & Media Test Bed Instantiation
Figure 5 shows the current test bed approach in terms of an OAIS structure. In this form a test bed instantiation would be composed of five distinct entities:
• Ingest Engine - Archivematica
• Data Management - Mock-Up TMS
• Archival/Storage - iRods
• Access - Web Portal
• Test Management - Jenkins (not shown)
An instance of Jenkins is used to manage and coordinate the tests enacted upon the test bed. Jenkins is configured to pull source code from Git and SVN, and can use Maven 2/3 or other build systems for the build stage of tests and then to run the associated tests.
5.2.2 Domain specific technologies
The following software components are hosted across the virtual machine network:
• Archivematica: a software component required by the arts and media domain. This is hosted
on the externally visible system and serves as an entry point to the ingest process.
• MySQL: this is currently serving as a data management layer in place of proprietary systems like TMS (The Museum System, Tate's collection management system). The current instance mimics the data structures exposed via an example data set provided by Tate.
5.2.3 Current status and tests
The current test bed implementation is hosted by EPCC at the University of Edinburgh. This test bed is hosted as a series of Linux (Ubuntu) virtual machines (VMs). One virtual machine serves as an external entry point to the test bed system. This is an externally accessible system, with the other components of the test bed being hosted on internal virtual machines. The access layer is currently undergoing design and development and is not part of the test bed.
The main functionality currently under test is the use of BagIt and compression file conversions to move between the ingest, archival and data management layers. No research components are under test on the test bed at this stage; the only tests are of integration technologies. Future tests will include tests for integrating research components, third party software and robustness testing.
The current system uses preinstalled components on a number of VMs in Edinburgh. There will be an initialisation phase to set up the test bed before running the tests. It is not very feasible to store VM images in a revision control repository, so we propose to use Vagrant (http://www.vagrantup.com/) to create VMs for the test bed components. The configuration files ("Vagrant-files") contain descriptions of machine types and bootstrapping configurations for VMs. They are text files that can be easily change-tracked in revision control. The time it takes to create the VMs will have to be determined, in order to decide how often these can realistically be executed. The cloud at GWDG supports Vagrant.
5.3 Description of the science test bed
5.3.1 Structure
The Space Science test bed includes clones of tools in use at B.USOC, as well as additional software and models that implement the different components in the OAIS Model and unit tests. In this form a test bed instantiation would be composed of four distinct entities:
• Ingest Engine - YAMCS
• Data Management - Web Portal
• Archival/Storage - iRods
• Access - Web Portal
The diagram in Figure 6 illustrates such tools and components and their major interactions.
Additionally, a separate infrastructure (hosted by SpaceApps) is in place to support software
development, providing feature/bug tracking, planning and unit testing.
Figure 6: Conceptual view of a Science Test Bed Instantiation
5.3.2 Domain specific technologies
The cloned tools in use at B.USOC are:
• YAMCS, an open source mission control system; it manages the data coming from payloads on the ISS, including both experiment and "housekeeping" data;
• Predictor, an operation planning and reporting tool; operators use it to have a centralized overview of all the relevant information that affects operations;
• Alfresco, an open source document management system; at B.USOC it is used to store, catalogue and search different kinds of digital documents (reports, manuals, procedures, etc.).
The components developed within PERICLES are:
• Semantic model, a Topic Maps representation of the data and metadata in the Space Science case; the metadata, in particular, include the relations among different pieces of data.
• PERICLES Portal, the web front end and main user interface of the preservation system; it is meant to provide control of the system itself and access to its content. The presentation of information is focused on the relations modeled by the semantic model and is customizable by users.
• Packager, a tool to build archival packages out of selections of the information managed by the system (including relations).
5.3.3 Status and current tests
Each of the cloned tools in the Space Science test bed is hosted on Linux virtual machines within the B.USOC premises, and is populated with a representative sample of the real data stored at B.USOC. In this context, besides recreating the current existing environment, they play the role of data sources, that is, the locations data is pulled from during ingest.
The semantic model currently in use (for the very initial test bed implementation) is very limited, as shown in Figure 7. However, a much deeper and richer one is under development - a semantic model that already contains hundreds of types and relationships and covers the whole SOLAR experiment.
Figure 7: Example of the semantic model provided by Topic Maps Engine
The portal is at the moment focused on data access, as it presents the content of the Topic Maps engine via web pages, one for each atomic piece of information. Data presentation is customizable
through templates. The portal also provides automatically generated templates, which take advantage of the relations stored in Topic Maps.
The first prototype of the portal has been implemented and presented to the consortium. A
screenshot is provided below.
The packager is currently a stand-alone tool, which interfaces with the semantic model through the Topic Maps engine. It offers:
• preliminary support for different packaging standards (SAFE, BagIt, custom descriptors);
• configurable data selection based on the semantic model and original data sources;
• the ability to upload packages to, and browse the content of, the archival storage component (currently iRODS).
There is currently no complete component for automatic ingest from data sources. Instead, this operation is still carried out through a semi-automated procedure, which consists of:
• data source scanning and scraping;
• ad-hoc semantic extraction and relation inference (e.g. based on time information, or well-known identifiers);
• Topic Map population.
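The relation-inference step above can be illustrated with a small sketch: records scraped from the data sources are linked when their timestamps fall within a common window, and the resulting associations would then populate the Topic Map. The record identifiers and the 30-minute window here are illustrative assumptions, not the actual B.USOC procedure:

```python
from datetime import datetime, timedelta

def infer_relations(records, window=timedelta(minutes=30)):
    """Link records whose timestamps fall within a common window.

    records: list of dicts with 'id' and 'timestamp' (datetime) keys.
    Returns 'related-in-time' associations for Topic Map population.
    """
    associations = []
    ordered = sorted(records, key=lambda r: r["timestamp"])
    for i, rec in enumerate(ordered):
        for other in ordered[i + 1:]:
            if other["timestamp"] - rec["timestamp"] > window:
                break  # records are sorted, so no later record can match
            associations.append(("related-in-time", rec["id"], other["id"]))
    return associations

# Illustrative records: an operations report and two telemetry files.
records = [
    {"id": "report-42", "timestamp": datetime(2014, 5, 1, 10, 0)},
    {"id": "telemetry-a", "timestamp": datetime(2014, 5, 1, 10, 10)},
    {"id": "telemetry-b", "timestamp": datetime(2014, 5, 1, 14, 0)},
]
```

In the real procedure, well-known identifiers embedded in the records would complement such time-based inference before the topics and associations are written to the engine.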
The unit test and development infrastructure is hosted and managed by SpaceApps. A Jenkins installation is configured to pull sources from the code repository, build the projects and run unit tests. For building and unit testing, Jenkins relies on Maven and sbt, depending on the programming language used for each piece of software.
Integration tests are run on installations within B.USOC premises and carried out manually. They cover:
• relation extraction during ingestion (e.g. checking that the relations between reports, documents and experiment data are identified and are correct);
• browsability of relations in the PERICLES Portal (e.g. checking that all relations of a certain piece of data are shown, and that such links can actually be used to reach its neighbouring pieces of data);
• readability of automatically rendered pages in the PERICLES Portal (that is, an assessment of whether a page makes sense to humans);
• archival package creation and upload to the storage component (checking that the content of the packages matches the configuration and that the storage component does receive them).
6 Roadmap
The development of the test beds for both the Science and Arts & Media domains is a continuous process over the whole period of the PERICLES project. This deliverable describes the first implementations, in the awareness that the full functionality and documentation are not yet available.
The next important steps identified to increase the functionality in the test beds are:
• Refactoring and Extension of the Framework: the current framework needs to be refactored to better reflect the understanding of the requirements for the test cases. Additional functionality will be required to accommodate the range of tests and components under examination.
• Integration of Component Testing: the methods for testing need to be published to component developers. These need to state how the source should be exposed, how test data is used, and what types of tests should be carried out at what stage.
• Test Data Repository: a test data repository should be created to avoid the need for replication of test data within a set of scenarios. This would allow better sharing of resources and a common understanding of what is part of the test data body and why.
• Automated Scenario Testing: the current method of testing a scenario is manual execution. For further and repeatable testing this has to be replaced with scripted automatic tests.
• Document Generation: the intention is to provide a set of living documentation to reflect the current status of components, testing and future plans. This will be accomplished via the automated document generation facilities available in tools like Maven and Jenkins, given the appropriate source materials.
NOTE: Components should be put into the testing/build system as they come online. This means that as tests are developed for them they should be added, even when the tests fail.
NOTE: The timeframes below are for the components/tools coming online. For further development, where appropriate, tests should be continually updated, for example as new information becomes available.
6.1 Media test bed
Table 8: Arts & Media test bed roadmap

Month: July onwards
Task: Additional Technology
Description: This is an on-going task to be updated with the further development of the user scenarios: identify all current additional target technologies for use in the test bed which are not provided by project partners. This is based on the use case scenarios. Once identified, set up instances for use in future test beds.

Month: August
Task: Initial Test Scenarios and Data
Description: Identify a core set of test scenarios for use in the test bed and create the test plan. Collect and ensure test data is available.
Month: September-December
Task: Test Scenarios Implementation and Maintenance
Description: Implement the identified test scenarios, monitoring for additional required tests; document errors and failures. Fix components accordingly. Begin recording of success/failure metrics, and process and software quality metrics.

Month: January onwards
Task: Continuous Test Scenario Development
Description: Improve, augment and expand the test scenarios based on the user scenarios and the software development process.
6.2 Science test bed

Table 9: Science test bed roadmap

Month: July
Task: Automatic test bed deployment and integration test
Description: Automate the deployment of the test bed components within the B.USOC infrastructure. Automate the current manual integration test.

Month: August-September
Task: Integration of shared test bed architecture
Description: Implement the specifications, formats and interfaces of the shared test bed architecture in the current science data test bed.

Month: October
Task: Test case definition
Description: Preparation of test plans based on test scenarios and requirements. Preparation of datasets for automatic ingestion.

Month: November onwards
Task: Test case implementation
Description: Test case implementations based on the test plans. Improvement of test bed components, datasets, test plans and tests. Document test failures with per-failure dedicated tests. Begin recording of success/failure metrics, and process and software quality metrics.

Month: January onwards
Task: Continuous Test Scenario Development
Description: Improve, augment and expand the test scenarios based on the user scenarios and the software development process.
6.3 Integration of tools and components

Table 10: Integration of tools and components roadmap

Month: June-July
Task: Testing Tools Agreement
Description: Ensure all partners agree on testing tools, build strategy, code repositories, and associated topics.

Month: July
Task: Test Bed Analysis
Description: Analysis of existing test beds to look for commonalities and report back.
Month: July
Task: Test Planning
Description: Draft to the whole project about how testing is/will be done. This includes types of test, success/failure/error criteria, test layout and the recording of rationale.

Month: August
Task: Component Interface
Description: Draft proposal for interfaces for the different types of components and their target locations in OAIS. As a starting point, the OAIS diagram from ULIV can be used as a basis.

Month: September
Task: Handler Syntax
Description: Draft of how Handler actions will happen.

Month: September
Task: Test Bed Commonality
Description: Publish the test bed analysis. Set up common scripts if possible. Unified testing view over the test beds.

Month: October
Task: Common Tests Online
Description: Have heartbeat and common scenario testing online. Adding a new AIP and component change.

Month: October-November
Task: First Version of Handler and Component Interface
Description: First full versions of these definitions.

Month: November-January
Task: Automated Build Test Harness for Components
Description: Initial build test harness for components from WP3-5. Should report on compilation status and whether documentation is available. Note: this is the harness being ready - components may be unavailable.

Month: December-February
Task: Automated Unit and Integration Test Harness
Description: Components from WP3-5. Should report on failures and errors. NOTE: this is the harness being ready - components may be unavailable.

Month: November onwards
Task: Feedback and Update on Test Systems
Description: Improvements to the process and underlying structures.
7 Conclusion
This deliverable describes the first version of the test bed implementations for the two domains of the PERICLES project. The test beds use technologies and tools which are on the one hand found to be common components and on the other identified as domain specific. For both test beds the first tests have been successfully performed. The document describes in detail the technologies applied, but also hints at the current limitations. To overcome the latter, a clear roadmap with short- and long-term goals was defined and presented in this document. The current test bed implementations show a successful collaboration between the different working groups of the project to reach that goal.
List of Figures and Tables

List of Figures
Figure 1: Example of a process diagram with components mapping .......... 12
Figure 2: Integration Framework Architecture .......... 14
Figure 3: Chained Handlers .......... 19
Figure 4: Data Pull .......... 24
Figure 5: Conceptual view of the Arts & Media Test Bed Instantiation .......... 30
Figure 6: Conceptual view of a Science Test Bed Instantiation .......... 32
Figure 7: Example of the semantic model provided by Topic Maps Engine .......... 33

List of Tables
Table 1: Functional blocks of the Workflow Engine .......... 15
Table 2: Operations of a handler .......... 18
Table 3: Handler and Component Anatomy (Green are framework) .......... 18
Table 4: GET method for getting the status of a specific workload .......... 20
Table 5: POST method provided for a handler .......... 20
Table 6: GET method for getting the workflow .......... 21
Table 7: Status response of a handler .......... 21
Table 8: Arts & Media test bed roadmap .......... 35
Table 9: Science test bed roadmap .......... 36
Table 10: Integration of tools and components roadmap .......... 36
Table 11: Example test plan .......... 42
Bibliography
Alfresco - Open Source document management
http://www.alfresco.com

Archivematica (2013) - Archivematica Website, 2013, Tested Live - 16/01/2014
https://www.archivematica.org/wiki/Main_Page

CoffeeScript (2014) - Documentation, 2014, Tested Live - 29/01/2014
http://coffeescript.org/

Git (2014) - Git Documentation and Product, 2014, Tested Live - 23/05/2014
http://git-scm.com/

IETF (2013) - The BagIt File Packaging Format, IETF, 2013, Tested Live - 26/01/2014
http://tools.ietf.org/html/draft-kunze-bagit-09

iRODS (2013) - Documentation and Wiki, 2013, Tested Live - 16/01/2014
https://www.irods.org/index.php

Jenkins (2013) - Jenkins Continuous Integration Website, 2013, Tested Live - 16/01/2014
https://wiki.jenkins-ci.org/display/JENKINS/Home

JSON - JavaScript Object Notation
http://en.wikipedia.org/wiki/JSON

Kepler - Scientific workflow system
https://kepler-project.org

Maven (2014) - Maven Documentation and Product, 2014, Tested Live - 23/05/2014
http://maven.apache.org/

MySQL - Open-source relational database management system
http://www.mysql.com

Subversion (2014) - Subversion Documentation and Product, 2014, Tested Live - 23/05/2014
http://subversion.apache.org/

OAIS - Open Archival Information System
http://en.wikipedia.org/wiki/OAIS

REST - Representational State Transfer
http://en.wikipedia.org/wiki/REST

SAFE - Standard Archive Format for Europe
http://earth.esa.int/SAFE/

Taverna - An open source and domain-independent Workflow Management System
http://www.taverna.org.uk

Topic Maps - A standard for the representation and interchange of knowledge, with an emphasis on the findability of information.
http://en.wikipedia.org/wiki/Topic_Maps

YAMCS - A Mission Control System
http://www.busoc.be
Appendix
Table 11: Example test plan
Test Name: Handler for Conversion

Test Purpose: Confirm that the Handler will call the component for audio conversion, and confirm output/workflow/payloads.

Test Components: Handler; Conversion Component

Test Data: Source Audio File; Expected Audio File

Test Workflows:
Pre-Conditions:
- Initialised Git repository with Source Audio File
- Mock Workflow Server active
- Dummy Check Handler active
Post-Conditions:
- Git repository removed
- Mock Workflow Server removed
- Dummy Check Handler removed
Tests
1. Normal Behaviour - conversion worked, workflow valid, target contactable
2. Unavailable Payload - Handler active, payload non-existent
3. Invalid Params - Handler works and handles parameter errors
4. Component Failure - Handler operational, conversion component fails
5. Next Handler Down - cope with failure of the target handler
6. Mock Server Unavailable - handle failure of the server
Success Criteria
1. Pass - dummy handler receives the converted file with correct/expected parameters
   Fail - anything else
2. Pass - Handler produces an error that the payload is unavailable for processing, and terminates the chain
   Fail - Handler tries to pass the invalid payload to the component, or tries to follow the normal workflow
3. Pass - Handler produces an error that the parameters have been mishandled (malformed etc.)
   Fail - Handler validates the parameters and attempts to run the conversion
4. Pass - the component failure error is logged and the chain terminates at this point
   Fail - the next handler is contacted
5. Pass - Handler retries X times, produces an error and parks to the repository
   Fail - tries indefinitely, or tries and ignores the failure
6. Pass - Handler stops and logs the error
   Fail - anything else

What is Not Tested: full behaviour of the Conversion Component - only its interactions with the handler.
Test Steps
1. Kick off process - contact the handler under test with valid parameters and payload
   Monitor handler status - check working/finished status
   Monitor dummy status
   The Dummy, when activated, should do checks on:
   ● the converted file against the expected file
   ● whether the workflow is valid
   ● whether the parameters are valid
2. Kick off process - contact the handler under test with valid parameters and an invalid payload
   Monitor handler status - check error status
   Assert the error log has a valid error message
3. Kick off process - contact the handler under test with invalid parameters and a valid payload
   Monitor handler status - check error status
   Assert the error log has a valid error message
4. Kick off process - contact the handler under test with valid parameters and a valid payload
   Induce a Conversion Component failure
   Monitor handler status - check the status becomes blocked
   Assert the error log has a valid error message
5. Kick off process - contact the handler under test with valid parameters and a valid payload
   Disable the Dummy Handler
   Monitor handler status - check the status becomes error
   Assert the error log has a valid error message
   Check the repository
6. Kick off process - contact the handler under test with valid parameters and a valid payload
   Disable the server
   Monitor handler status - check the status becomes error
   Assert the error log has a valid error message
   Check the repository