Improving Dependability in Service Oriented Architectures using Ontologies and Fault Injection

Binka Gwynne
Jie Xu
School of Computing,
University of Leeds,
Leeds, LS2 9JT, UK
{binka, jxu}@comp.leeds.ac.uk
Abstract
Large distributed systems and computer grids are increasingly being used in science and in
business, with Service Oriented Architectures combined with Web services the current favoured
solutions to access these distributed, heterogeneous resources. However, service-based systems
have a high reliance on a middleware which is continuously evolving. This requires novel methods
of testing and evaluation to provide high dependability. Our objective is to introduce innovative
methods of testing SOA middleware with the use of experimental provenance and ontologically
supported software fault injection, and gain better understanding of the SOA dependability and fault
domains.
1. Introduction
Service Oriented Architectures in combination
with Web services are the current favoured
solutions to access distributed, heterogeneous
resources, yet service-based systems have a
high reliance on continuously evolving
middleware. However, this middleware requires
high levels of dependability in order to become
as widely adopted, ubiquitous and successful as
the established Internet technologies have
become. The Internet is predominantly used for
information retrieval displayed in a human
readable form; however, the emergence of P2P
and grid technologies is driving a revolution in
established distributed technologies, due to the
high demands of increasingly large scale
information processing, loosely coupled
services, and machine comprehension.
The objective of this paper is to introduce
innovative methods of testing SOA middleware
through the use of ontologically supported
software fault injection; this will allow us to
improve the evaluation of the dependability and
fault domains of SOA middleware, and to
cyclically enhance the quality of our testing
methods.
2. Background
2.1 Service Oriented Architecture
A Service Oriented Architecture (SOA) is an
architectural model that emphasises properties
of interoperability and location transparency. It
is essentially a collection of services, where
each service can be considered a resource that is
either provided or consumed. In an ideal world
the human elements of Service Oriented
Systems would be unaware of this transparent,
but in all probability complex, middleware that
provides their trustworthy access to resources.
Although Web services are the current de facto
standard middleware for SOAs, they are not the
only available middleware: an SOA is a high
level architectural model and Web services only
one means of its implementation. In addition,
Web services are still continuing to develop,
driven by ever increasing demands in
requirements. For example, SOAs and Web
services do not yet have full transactional
reliability, although implementations and
protocols are maturing to meet this requirement
[1].
2.2 Dependability
Dependability is a collective term that
encompasses reliability, performance,
maintainability, security, etc. in computing
systems: it includes everything that is needed to
make a computer system dependable.
Dependability is inextricably linked with the
notion of trust: “The ability to deliver service
that can justifiably be trusted” [2]. Reliability is
the part of dependability concerned with the
probability that a given system will behave
according to its requirements [3].
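As a hedged illustration (assuming a constant failure rate, which real middleware need not exhibit), reliability over a time interval can be written as the probability of failure-free operation:

```latex
% Reliability as the probability of failure-free operation over [0, t].
% Under the common constant-failure-rate assumption (rate \lambda):
R(t) = P(\text{no failure in } [0, t]) = e^{-\lambda t},
\qquad 0 \le R(t) \le 1
```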
Reliability was chosen for our test scenario as it
has a distinct numeric value between 0 and 1 for
a given time period. In consequence, reliability
can provide empirical results and an opportunity
to generate meaningful provenance data. The
concept of provenance is well established in the
real world and is a trustworthy, documented
history, including information on origin and
fabrication processes. Provenance has already
been applied in the virtual world, for example
where it has been used for the verification of
in-silico experiments, so that they are fully
documented and replicable [4] [5].
To improve reliability, it is necessary for a
system to decrease the number of its failures by
removing or surviving the occurrence of errors.
A system failure is an event that occurs when a
delivered service deviates from its correct
service [6]. Errors are inextricably linked to
failures, as an error is defined as present in a
system when that system’s state deviates in
some way from its expected state. In
consequence, violations of timing constraints
can also be viewed as reliability failures [7], as
can a task that completes earlier than planned.
According to Randell [8]:
“There will often be considerable subjective
judgement involved in identifying errors,
particularly errors due to design faults in
complex software.”
A paradox exists here: the concepts of the fault
and dependability domains are obscure, but we
try to provide machines, with their requirement
of explicitness, with comprehension of these
domains.
2.3 Ontologies
An ontology is an abstract, formal and explicit
description of a specified part of the world or
domain. Computing ontologies must follow
rules, have structure and lack ambiguities, so
that machines are able to process them and glean
the information they require from them. In
consequence, computing ontologies must be
formal and explicit: formal, following or being
in accord with accepted forms or regulations;
explicit, fully, clearly expressed and leaving
nothing implied. Although machines require
their information to be both formal and explicit,
there are difficulties in being explicit when
describing the nature of something, and at this
time machines still remain less successful
semantic processors than their human
counterparts.
Computing ontologies are used in order to glean
intelligence from data by the descriptions and
relations of element attributes within domains;
indeed they are central to the concept of the
semantic Web [9]. Ontologies can vary in scale
from flat lexicons with few relationships, to
very large, expressive ontologies attempting to
capture every possible aspect of a domain [10].
Detailed ontologies can be highly complex,
although languages and tools are available to
assist in their development, such as Protégé
[11].
Our ontologies will be developed using
information from experimental testing and
provenance data; this makes testing methods
replicable and should make them available to
heterogeneous
systems. The ontologies attempt to classify and
describe part of the fault and dependability
domains and provide information to assist in the
design of better tests and evaluation methods,
including conceptualisation of domains.
3. Experimental Methods
3.1 Testing Using Software Fault Injection
Middleware testing is being carried out using
software fault injection on Web services.
Software fault injection is the deliberate
insertion of faults into a computer system so
that the system will accelerate exhibition of its
behavior in the presence of faults: a faster and
more analytical method of testing than simple
observation methods such as logging [12]. Fault
injection is also a well proven method of
demonstrating dependability [13] [14]. As with
all other testing methods, fault injection
increases the chance of finding faults within a
system so that those faults can be removed or
tolerated, but it cannot guarantee that all the
faults that exist in that system’s fault domain
have been discovered, and can therefore never
give a 100% guarantee against all future failures. There
are two main concerns with fault injection: the
plausibility of the fault model to represent
actual faults (fault representativeness) [15]; the
possibility that the fault injection itself will
change conditions within the system which
could lead to the phenomenon of spawned
faults.
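As an illustration of the technique (a minimal sketch, not the authors' actual tooling; `inject_faults` and `fetch_resource` are hypothetical names), a software fault injector can wrap a service call and deliberately fail a fraction of invocations:

```python
import random

def inject_faults(fault_rate, fault_type=TimeoutError):
    """Wrap a callable so that a fraction of calls fail deliberately.

    With probability fault_rate the wrapped call raises fault_type
    instead of executing, accelerating the exhibition of the system's
    behaviour in the presence of faults.
    """
    def decorator(fn):
        def wrapper(*args, **kwargs):
            if random.random() < fault_rate:
                raise fault_type("injected fault")
            return fn(*args, **kwargs)
        return wrapper
    return decorator

@inject_faults(fault_rate=0.3)
def fetch_resource(name):
    # Stand-in for a real service invocation.
    return f"contents of {name}"

random.seed(42)  # reproducible injection campaign
failures = 0
for _ in range(1000):
    try:
        fetch_resource("service-registry")
    except TimeoutError:
        failures += 1
print(failures)  # close to the injected rate: roughly 300 of 1000
```

Observing how the surrounding system copes with these accelerated failures is faster than waiting for naturally occurring faults to surface in logs.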
3.2 Experimental Procedures
Experiments using fault injection techniques on
Web services protocols aim to generate data for
ontology development. Our experimental
requirements include trustworthy, proven
histories of our testing and evaluation methods
and this is to be provided by recording
experimental provenance.
Attempts to model large domains accurately and
completely can mean that ontologies can grow
very large, which may lead to unnecessary
problems where only a subset of information is
required. The solution to this situation is the use
of sub-ontologies, which can also lead to an
improvement in efficiency [16]. As the
dependability domain is large and with unclear
boundaries [17], the modeling of the
dependability domain will involve using sub-
ontologies that describe interrelated sub-
domains, and in consequence we have focused
our initial area of interest within the reliability
sub-domain. Experiments explore replacement
of parameters and latency issues; these are still
under development.
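These two fault classes can be sketched as follows (an illustrative example only, not our actual harness; the message content and function names are invented for the example):

```python
import random
import time
import xml.etree.ElementTree as ET

def replace_parameter(xml_message, tag, new_value):
    """Parameter-replacement fault: overwrite the value of one
    element in a (simplified, envelope-free) SOAP-style payload."""
    root = ET.fromstring(xml_message)
    for elem in root.iter(tag):
        elem.text = new_value
    return ET.tostring(root, encoding="unicode")

def inject_latency(send, max_delay=0.01):
    """Latency fault: delay delivery of each message by a random
    interval before handing it to the real transport."""
    def delayed(message):
        time.sleep(random.uniform(0, max_delay))
        return send(message)
    return delayed

request = "<getStock><symbol>IBM</symbol></getStock>"
faulty = replace_parameter(request, "symbol", "NOSUCH")
print(faulty)  # the symbol parameter is now corrupted

slow_send = inject_latency(lambda msg: msg)  # identity transport for the demo
print(slow_send(faulty) == faulty)  # message delivered intact, just late
```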
3.3 The Software Fault Domain Model
Models are abstractions able to portray the
essentials of complex problems and provide a
useful way of understanding those problems
relative to the real world.
The software fault domain model cannot be
described simply with its elements necessarily
occurring only once in a hierarchy and in
consequence an open world assumption is
required. For example, the concept of time
related faults occurs in numerous sub domains
of the software fault domain: latency may occur
due to resource management issues, or during
bit transmission within a communication sub-
domain, or during software aging. The
phenomenon of software aging is an
accumulation of errors during execution that
over time eventually results in software failure
[18]. Software aging can sometimes only be
evident from an observable, gradual increase in
resource requirements, but may occur with no
observable effects, so is a logical case for
experiments in fault injection.
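Software aging lends itself to a simple seeded-fault simulation (illustrative only; the leak below is injected deliberately to make the symptom visible):

```python
leaked = []

def handle_request(payload):
    """Service handler with a deliberately seeded aging fault:
    each call retains a buffer it should have released."""
    buffer = [0] * 1024
    leaked.append(buffer)  # injected leak: the reference is never dropped
    return sum(payload)

# The observable symptom of aging: a gradual, monotonic growth in
# resource usage, even though every individual call succeeds.
samples = []
for i in range(5000):
    handle_request([1, 2, 3])
    if i % 1000 == 0:
        samples.append(len(leaked))
print(samples)  # → [1, 1001, 2001, 3001, 4001]
```

Because every call still returns a correct result, only the resource trend betrays the fault, which is why aging is a logical target for injection experiments rather than observation alone.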
• Inherent design
• Latent design
• Resource management
• Communication-heterogeneity
• Communication-transmission
• Life Cycle
• Security
Figure 1: A High Level Model of Faults in the
Software Fault Domain
Provenance of testing includes details of the
type of test, system under test, timestamp, and
results log, so that revealed errors can be used to
describe their corresponding faults within the
system.
<test>
  <testID>testUniqueID</testID>
  <testerID>testerUniqueID</testerID>
  <testType>type</testType>
  <testSubject>subject</testSubject>
  <testDate>date</testDate>
  <startTestTimestamp>stTimestamp</startTestTimestamp>
  <provenanceLogID>logNameUniqueID</provenanceLogID>
  <testResults>
    <noError></noError>
    <error>
      <errorType>type</errorType>
      <errorTimestamp>Timestamp</errorTimestamp>
      <errorLocation>location</errorLocation>
      <errorMessage>message</errorMessage>
    </error>
  </testResults>
  <endTestTimestamp>endTimestamp</endTestTimestamp>
</test>
Figure 2: Example Test Data as Schema
XML is a simple, well documented,
straightforward data format that provides the
explicitness required by machines [19]. XML
can be used to describe the results of tests and to
build a model of the fault domain of a system.
Web services, the focus of these tests, are also
based on XML, as SOAP messages are well-
formed XML documents; the same techniques
can therefore be applied to SOAP messages.
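A record shaped like the schema in Figure 2 can be assembled with standard XML tooling (a sketch only; the element names follow the figure, while the function name and ID values are illustrative):

```python
import datetime
import xml.etree.ElementTree as ET

def build_test_record(test_id, tester_id, test_type, subject, errors):
    """Assemble a provenance record following the Figure 2 schema.
    errors is a list of (type, location, message) tuples; an empty
    list records a fault-free run."""
    test = ET.Element("test")
    ET.SubElement(test, "testID").text = test_id
    ET.SubElement(test, "testerID").text = tester_id
    ET.SubElement(test, "testType").text = test_type
    ET.SubElement(test, "testSubject").text = subject
    ET.SubElement(test, "startTestTimestamp").text = (
        datetime.datetime.now().isoformat())
    results = ET.SubElement(test, "testResults")
    if not errors:
        ET.SubElement(results, "noError")
    for err_type, location, message in errors:
        error = ET.SubElement(results, "error")
        ET.SubElement(error, "errorType").text = err_type
        ET.SubElement(error, "errorLocation").text = location
        ET.SubElement(error, "errorMessage").text = message
    ET.SubElement(test, "endTestTimestamp").text = (
        datetime.datetime.now().isoformat())
    return ET.tostring(test, encoding="unicode")

record = build_test_record(
    "t-001", "tester-01", "parameter-replacement", "StockQuoteService",
    [("SOAPFault", "getStock/symbol", "unknown symbol")])
print(record)
```")
</test>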
In further developments we intend to use
OWL [20], which is a descriptive language
built on top of the more basic language RDF
[21]. RDF is an XML based schema where
statements are represented as triples: subject,
predicate, object. For example: <Program>
<hasError> <arrayIndexOutOfBounds>
OWL consists of Individuals, Properties and
Classes: OWL Classes can intersect (AND),
union (OR), and complement (NOT). OWL’s
restrictions and properties can facilitate
automated reasoning.
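The flavour of such triples, and the reasoning they enable, can be sketched with a minimal in-memory store (pure illustration; the class names extend the example above and do not come from a published ontology):

```python
# RDF-style statements as (subject, predicate, object) triples,
# as in the <Program> <hasError> <arrayIndexOutOfBounds> example.
triples = {
    ("Program", "hasError", "arrayIndexOutOfBounds"),
    ("arrayIndexOutOfBounds", "isSubClassOf", "ResourceManagementFault"),
    ("ResourceManagementFault", "isSubClassOf", "SoftwareFault"),
}

def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is asserted."""
    return {o for s, p, o in triples if s == subject and p == predicate}

def ancestors(cls):
    """Transitive closure over isSubClassOf: a small taste of the
    automated reasoning that OWL class hierarchies make possible."""
    found = set()
    frontier = objects(cls, "isSubClassOf")
    while frontier:
        parent = frontier.pop()
        if parent not in found:
            found.add(parent)
            frontier |= objects(parent, "isSubClassOf")
    return found

print(ancestors("arrayIndexOutOfBounds"))
# contains both ResourceManagementFault and SoftwareFault
```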
Figure 3: The Ontology Engine (comprising the
Provenance Logger, Provenance Analyser,
Ontology Builder and Test Designer components)
4. Conclusion and Future Work
Our early results are encouraging, and
preliminary modeling of the fault and reliability
domains shows that it can prove useful in
assisting in the design of further tests and
evaluation methods. Provenance of all fault
injection testing and ontology creation will be
maintained, so that dependability levels of SOA
systems may be established through proven
records.
Future work aims to produce fault models that
are generic enough to match many eventualities.
5. Acknowledgements
This research is supported by the Engineering
and Physical Sciences Research Council
(EPSRC) and the Distributed Aircraft
Maintenance Environment (DAME) project.
6. References
[1] B. Sleeper (2004) “Piecing the SOA Puzzle
Together” InfoWorld, Issue 37, September 2004
[2] A. Avizienis, J. C. Laprie, B. Randell and
C. Landwehr (2004) “Basic Concepts and
Taxonomy of Dependable and Secure
Computing” IEEE Distributed Systems Online
(5) 2
[3] M. R. Lyu Editor in Chief (1995)
“Handbook of Software Reliability
Engineering” McGraw-Hill: Los Alamitos,
USA
[4] M. Greenwood, C. Goble, R. Stevens, J.
Zhao, M. Addis, D. Marvin, L. Mureau, and T.
Oinn (2003) “Provenance of e-Science
Experiments – Experience from Bioinformatics”
UK e-Science All-Hands Meeting 2003,
Nottingham, UK
[5] P. Groth, M. Luck, and L. Moreau (2004)
“Formalising a Protocol for Recording
Provenance in Grids”, UK e-Science All-Hands
Meeting 2004, Nottingham, UK
[6] B. Randell (1998) “Dependability – a
Unifying Concept” Technical Report, School of
Computing Science, University of Newcastle,
UK
[7] J. Xu, and B. Randell (1996) “Roll-Forward
Error Recovery in Embedded Real-Time
Systems” IEEE International Conference on
Parallel and Distributed Systems
[8] B. Randell (2003) “On Failures and Faults”
Technical Report, School of Computing
Science, University of Newcastle, UK
[9] T. Berners-Lee (2001) “The Semantic Web”
Scientific American, May 2001
[10] J. de Bruijn (2003) “Using Ontologies,
Enabling Knowledge Sharing and Reuse on the
Semantic Web” Technical Report, Digital
Enterprise Research Institute, Galway, Ireland
[11] Protégé: http://protege.stanford.edu/
[12] E. Marsden, J. C. Fabre, and J. Arlat (2002)
“Dependability of CORBA Systems: Service
Characterisation by Fault Injection” 21st
IEEE Symposium on Reliable Distributed
Systems 2002
[13] K. R. Joshi, R. M. Lever, W. H. Sanders,
and M. Cukier (2004) “Achieving Practical
Global-State-Based Fault Injection: Experiences
and Techniques” Technical Report: University
of Illinois, USA
[14] N. Looker, and J. Xu (2003)
“Dependability Assessment of an OGSA
Compliant Middleware Implementation by Fault
Injection” UK e-Science All-Hands Meeting
2003, Nottingham, UK
[15] J. Arlat, Y. Crouzet, J. Karlsson, P.
Folkesson, E. Fuchs, and G. H. Leber (2003)
“Comparison of Physical and Software-
Implemented Fault Injection Techniques” IEEE
Transactions on Computers, 52(9):1115-1133,
2003
[16] N. Looker, B. Gwynne, J. Xu, and M.
Munro (2005) "An Ontology-Based Approach
for Determining the Dependability of Service-
Oriented Architectures,” 10th IEEE
International Workshop on Object-oriented
Real-time Dependable Systems 2005, Sedona,
USA
[17] B. Randell (2004) “Dependability and
Security” IEE UK
http://www.iee.org/Policy/Areas/it/framework/BrianRandallIEEDependability.pdf
[18] S. Garg, A. van Moorsel, K. Vaidyanathan,
K. S. Trivedi (1998) “A Methodology for
Detection and Estimation of Software Aging”
9th International Symposium on Software
Reliability Engineering
[19] E. R. Harold and W. S. Means (2004) 3rd
Edition. “XML in a Nutshell” O’Reilly:
Sebastopol, USA
[20] OWL: http://www.w3.org/TR/owl-features/
[21] RDF: http://www.w3.org/RDF/