LCCI (Large-scale Complex Critical Infrastructures)
¡ LCCIs are Internet-scale constellations of heterogeneous systems glued together into a federated, open system by a data distribution middleware.
¡ The shift towards the Internet is considered a necessary step to overcome the limitations of the monolithic, closed architectures traditionally used to build critical systems (e.g., SCADA architectures).
¡ A real-world example is the novel framework for Air Traffic Management (ATM) that EuroCONTROL is developing within the SESAR EU Joint Undertaking.
LCCI (Large-scale Complex Critical Infrastructures)
¡ New challenges arise from LCCIs that push the frontiers of current technologies.
¡ The data distribution task becomes crucial and has to guarantee:
¡ Reliability: deliveries have to be guaranteed even when failures happen;
¡ Timeliness: messages must reach their destinations at the right time, without breaking temporal constraints;
¡ Scalability: performance is affected neither by time nor by the size of the LCCI.
¡ The challenge is to find the data distribution paradigm best able to meet the aforementioned requirements.
Outline of SWIM concept
¡ SWIM (System Wide Information Management) aims to establish seamless interoperability among heterogeneous ATM stakeholders:
¡ common data representation;
¡ coherent view of current ATM information (e.g., Flight Data, Aeronautical Data, Weather).
¡ It may be seen as a common data/service bus to which the systems that have to interoperate are “connected”.
¡ Close in spirit to a middleware solution for LCCIs.
SWIM prototype
¡ The prototype (named “SWIM-BOX”) has been conceived as a sort of “Gateway/Mediator” across legacy applications:
¡ Completely distributed architecture;
¡ Designed using a domain-based approach (Flight, Surveillance, etc.);
¡ Implemented using a standards-based approach;
¡ Well-known data and information models (e.g., ICOG2);
¡ Standard technologies (Web Services, EJB, DDS);
¡ DDS-compliant middleware for sharing data.
[Figure: two legacy sites, each hosting a legacy system (Legacy A, Legacy B) connected through an Adapter to its SWIM-BOX; the SWIM-BOXes communicate over the SWIM Network common infrastructure.]
Some challenges
¡ How do the subsystems (e.g., COTS components) involved in an LCCI impact its dependability?
¡ What are the effects on the LCCI if the DDS-compliant middleware is invoked with erroneous inputs?
¡ Robustness testing provides answers to these questions:
¡ Help vendors evaluate their implementations;
¡ Help clients select among several solutions.
¡ Test cost reduction → automating the test procedure.
¡ Automating the classification of test results.
Our goal
¡ Assessing the robustness of DDS-compliant middleware
¡ What does robustness mean?
“The degree to which a system operates correctly in the presence of exceptional inputs or stressful environmental conditions” [IEEE Std 610.12.1990].
“Dependability with respect to external faults, which characterizes a system reaction to a specific class of faults” [Avizienis 04].
¡ Robustness testing features:
¡ Only the system interface has to be known;
¡ Source code is not needed (black-box approach);
¡ Exceptional inputs are injected through the API;
¡ Internal data and structures are not altered;
¡ Inputs and stressful conditions are carefully selected so that they activate faults representative of actual situations.
Robustness Testing Approaches
¡ Robustness testing: stressing the public interface of the application/system/API with invalid and exceptional values:
¡ From the Application to the System Under Test (Top-Down);
¡ From the OS to the System Under Test (Bottom-Up).
[Figure: layered stack (Application, DDS Middleware, Operating System); Top-Down, the API is called with exceptional values; Bottom-Up, OS syscalls return exceptional values.]
Robustness Testing Approaches
¡ The workload is a set of valid calls, needed to stress each operation of the device under test.
¡ The fault model is a set of rules applied at the API to expose robustness problems.
¡ The failure mode classification characterizes the behavior of the system under test while executing the workload in the presence of the fault model.
Fault Injection: the WWW dilemma
¡ What to inject?
¡ Fault model → Fault list
¡ Where to inject?
¡ At the API interface level
¡ Methods with higher occurrences (Method list)
¡ When to inject?
¡ At only one invocation of the methods (Trigger list)
¡ The Fault, Method and Trigger lists define our Injection library.
Fault list
¡ The list of rules applied during API invocation:
¡ Each method input is tested with all the robustness values, one at a time.
¡ E.g., void replace(int a, String b).
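The rule above can be sketched as a small generator (a hypothetical illustration, not the paper's actual tool; the value sets are assumptions): for a signature such as void replace(int a, String b), every parameter position is paired with each exceptional value of its type, one substitution per test case.

```java
import java.util.*;

// Hypothetical sketch: build a fault list by pairing each parameter of a
// method signature with every exceptional ("robustness") value of its type.
public class FaultList {
    // Assumed exceptional values per type (illustrative, not exhaustive).
    static final Map<Class<?>, List<Object>> ROBUSTNESS_VALUES = Map.of(
        int.class, Arrays.asList(Integer.MIN_VALUE, Integer.MAX_VALUE, -1, 0),
        String.class, Arrays.asList(null, "", "\u0000", "x".repeat(65536))
    );

    // One test case = (parameter index, exceptional value); all the other
    // parameters keep valid workload values during the run.
    static List<Object[]> casesFor(Class<?>... paramTypes) {
        List<Object[]> cases = new ArrayList<>();
        for (int i = 0; i < paramTypes.length; i++)
            for (Object v : ROBUSTNESS_VALUES.getOrDefault(paramTypes[i], List.of()))
                cases.add(new Object[]{i, v});
        return cases;
    }

    public static void main(String[] args) {
        // void replace(int a, String b): 4 int faults + 4 String faults
        System.out.println(casesFor(int.class, String.class).size()); // prints 8
    }
}
```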
Method list
¡ Profiling of different applications using the DDS-compliant middleware product:
¡ Ping-pong application;
¡ Touchstone: a benchmarking framework for evaluating the performance of OMG DDS-compliant implementations;
¡ SWIM-BOX.
¡ The method occurrences have been measured for each application:
¡ Only a limited core set of all the available methods is invoked;
¡ The same occurrence distribution is observed across all applications.
¡ The Method list contains the methods with the highest occurrences.
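The profiling step above can be sketched as a frequency count over a call trace (a hypothetical illustration; the method names and the selection of the top k are assumptions, not the paper's procedure):

```java
import java.util.*;
import java.util.stream.*;

// Hypothetical sketch: count how often each DDS API method appears in an
// application trace and keep the k most frequent ones as the Method list.
public class MethodProfile {
    static List<String> topMethods(List<String> trace, int k) {
        // Count occurrences of each method name in the trace.
        Map<String, Long> counts = trace.stream()
            .collect(Collectors.groupingBy(m -> m, Collectors.counting()));
        // Keep the k methods with the highest occurrence counts.
        return counts.entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
            .limit(k)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> trace = List.of("write", "read", "write",
                                     "create_datawriter", "write", "read");
        System.out.println(topMethods(trace, 2)); // prints [write, read]
    }
}
```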
Failure mode classification
¡ The CRASH scale has been used to classify the robustness problems:
¡ Catastrophic: the node crashes or the OS hangs; the DDS provider does not deliver messages correctly.
¡ Restart: the DDS provider becomes unresponsive and must be terminated by force.
¡ Abort: abnormal termination when invoking the API.
¡ Silent: the faulty submitted value does not raise an exception, whether or not the message is transmitted.
¡ Hindering: the returned error code is incorrect.
¡ A further suitable level has been added:
¡ Non-conformity: the fault is not signaled as it should be.
¡ A DDS API analysis has been performed for results classification.
¡ A golden run has been executed for each injected value to understand the normal system behavior.
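The CRASH categories above can be expressed as a simple decision chain over the observed outcome of a test (a sketch only; the boolean observables are assumptions, not the paper's classifier):

```java
// Hypothetical sketch mapping an observed test outcome to a CRASH-scale
// category; the field names are illustrative, not from the paper's tool.
public class CrashScale {
    enum Category { CATASTROPHIC, RESTART, ABORT, SILENT, HINDERING, PASS }

    static Category classify(boolean nodeCrashed, boolean unresponsive,
                             boolean abnormalTermination, boolean exceptionRaised,
                             boolean errorCodeCorrect) {
        if (nodeCrashed) return Category.CATASTROPHIC;  // node crash / OS hang
        if (unresponsive) return Category.RESTART;      // must be killed by force
        if (abnormalTermination) return Category.ABORT; // abnormal API termination
        if (!exceptionRaised) return Category.SILENT;   // fault accepted silently
        if (!errorCodeCorrect) return Category.HINDERING; // wrong error code
        return Category.PASS;
    }

    public static void main(String[] args) {
        // An outcome like the Silent cases reported for the tested middleware:
        // no exception raised for an exceptional input.
        System.out.println(classify(false, false, false, false, true)); // prints SILENT
    }
}
```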
Test automation:
JFault Injection Tool (JFIT)
¡ Pros:
¡ Java-based implementation;
¡ No knowledge about the SUT is required;
¡ Run-time method interception and value mutation:
¡ Exploiting Java reflection;
¡ Monitoring the status and output of the SUT.
¡ Cons:
¡ Only methods with basic parameter types (e.g., String, int, …) are taken into account;
¡ Off-line, manual classification of the results.
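The reflection-based interception the slide mentions can be sketched with a JDK dynamic proxy (a minimal illustration under assumed names; JFIT's real mechanism is not detailed here): calls to the SUT interface are caught and one argument is replaced with an exceptional value before delegation.

```java
import java.lang.reflect.Proxy;

// Hypothetical sketch of run-time interception and value mutation using a
// JDK dynamic proxy; the Writer interface stands in for a DDS API method.
public class Interceptor {
    interface Writer { String write(String payload); }

    static Writer inject(Writer target, Object faultyValue) {
        return (Writer) Proxy.newProxyInstance(
            Writer.class.getClassLoader(), new Class<?>[]{Writer.class},
            (proxy, method, args) -> {
                args[0] = faultyValue;               // replace the first argument
                return method.invoke(target, args);  // delegate to the real SUT
            });
    }

    public static void main(String[] args) {
        Writer sut = p -> "sent:" + p;               // stand-in for the middleware
        Writer faulty = inject(sut, null);           // inject null as the fault
        System.out.println(faulty.write("valid"));   // prints sent:null
    }
}
```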
High-level architecture of JFIT
¡ All robustness tests are carried out according to the Injection library;
¡ The Controller is in charge of test management and runs the tests through the Activator;
¡ The Interceptor catches the method invocations to the SUT and, via the Injector, injects the faults one at a time;
¡ The Monitor records the output on both the publisher and subscriber sides.
[Figure: the Controller drives the Activator; the Interceptor and Injector sit between the tool and the System Under Test; the Monitor observes the SUT.]
Test execution stages
¡ Preliminary execution of the workload without faults (golden run, no faults injected):
¡ To understand the normal behavior.
¡ Then robustness testing starts:
¡ DDS initialization → Workload execution → Injection phase (one fault at a time) → Monitoring & Logging.
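The stages above can be sketched as a small campaign driver (an illustrative sketch; the sentinel and the toy workload are assumptions): one golden run without faults records the expected outcome, then the workload is re-run with one fault at a time and each outcome is compared against the golden result.

```java
import java.util.*;
import java.util.function.Function;

// Hypothetical sketch of the test execution stages: golden run first,
// then one-fault-at-a-time injection runs with outcome comparison.
public class TestCampaign {
    static final Object NO_FAULT = new Object();   // sentinel: "inject nothing"

    static List<String> run(Function<Object, String> workload, List<Object> faults) {
        String golden = workload.apply(NO_FAULT);  // golden run, no faults injected
        List<String> log = new ArrayList<>();
        for (Object fault : faults) {              // injection phase
            String outcome = workload.apply(fault);
            log.add(outcome.equals(golden) ? "PASS" : "DEVIATION");
        }
        return log;
    }

    public static void main(String[] args) {
        // Toy workload: behaves differently whenever any fault is injected.
        Function<Object, String> workload =
            f -> f == NO_FAULT ? "delivered" : "no-exception";
        System.out.println(run(workload, Arrays.asList(null, ""))); // prints [DEVIATION, DEVIATION]
    }
}
```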
Tests Results
¡ DDS middleware: the OpenSplice® implementation;
¡ No QoS features have been defined (Best Effort);
¡ According to the failure mode classification, the achieved results are as follows:
¡ No Catastrophic, Abort or Hindering problems have been evidenced:
¡ Neither node crashes nor OS hangs;
¡ No abnormal termination when invoking the API;
¡ No erroneous returned error codes.
¡ 13% of the robustness tests have shown Restart problems:
¡ The experiment does not respond and must be terminated by force.
¡ 45% of the robustness tests have raised Silent problems:
¡ No exception has been thrown by the DDS.
Tests Results
¡ Fault distribution between Silent and Restart.
[Charts: fault distribution by int fault types and by String fault types.]
Conclusions
¡ Our approach can automatically test the core set of DDS
methods;
¡ A significant fraction of the tests reveals robustness issues when exceptional values (e.g., large strings or big integers) are submitted to the OpenSplice® APIs;
¡ The ability to reach a consistent system state before
performing fault injection makes us confident of the results.
Ongoing activities
¡ Testing all parameters types and not only primitive types;
¡ Automating results classification;
¡ Running tests in the presence of quality of service mechanisms;
¡ Carrying out the same tests with other DDS-compliant
middleware.
References
[Avizienis 04] A. Avizienis, J.-C. Laprie, B. Randell, C. Landwehr. “Basic Concepts and Taxonomy of Dependable and Secure Computing.” IEEE Transactions on Dependable and Secure Computing, 2004.
[Koopman 02] P. Koopman. “What's Wrong with Fault Injection as a Benchmarking Tool?” In Proc. DSN 2002 Workshop on Dependability Benchmarking, pp. F-31-36, Washington, D.C., USA, 2002.
[Koopman 99] P. Koopman, J. DeVale. “Comparing the Robustness of POSIX Operating Systems.” In Proc. 29th Annual International Symposium on Fault-Tolerant Computing, 1999.
[Johansson 07] A. Johansson, N. Suri, B. Murphy. “On the Selection of Error Models for OS Robustness Evaluation.” In Proc. 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2007.
[Miller 95] B. P. Miller et al. “Fuzz Revisited: A Re-examination of the Reliability of UNIX Utilities.” Technical report, 1995.
Test Scenario: further details (JFIT)
[Figure: test setup with JFIT monitoring both sides; the API interceptor and injector sit between the applications and the middleware.]
¡ The Transmitter sends bursts of messages for a while, then terminates.
¡ A Receiver waits for messages.
¡ DDS middleware: the OpenSplice® implementation.
¡ No QoS features have been defined (Best Effort).
Pub/Sub paradigm
¡ Pub/Sub proves effective for federating heterogeneous systems:
¡ Space, time and synchronization decoupling enforce scalability;
¡ Asynchronous multi-point communication is well suited to devising cooperating systems.
¡ Many Pub/Sub alternatives exist (e.g., SIENA, GREEN, HERALD, CORBA NS, DREAM, JEDI, JMS, HERMES); among them, DDS exhibits better performance, higher scalability and a larger set of offered QoS policies.
¡ DDS is widely used in large-scope initiatives addressing wide-area scenarios:
¡ E.g., it has been investigated as the data distribution system of the SESAR project through the SWIM middleware infrastructure.
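The space decoupling mentioned above can be illustrated with a minimal topic-based bus (a toy sketch, not DDS: class and topic names are invented for illustration): publishers and subscribers only share a topic name, never direct references to each other.

```java
import java.util.*;
import java.util.function.Consumer;

// Minimal topic-based publish/subscribe sketch (illustrative only):
// the bus routes messages by topic name, decoupling both endpoints.
public class MiniPubSub {
    private final Map<String, List<Consumer<String>>> subs = new HashMap<>();

    void subscribe(String topic, Consumer<String> handler) {
        subs.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
    }

    void publish(String topic, String msg) {
        for (Consumer<String> h : subs.getOrDefault(topic, List.of()))
            h.accept(msg);   // deliver to every subscriber of this topic
    }

    public static void main(String[] args) {
        MiniPubSub bus = new MiniPubSub();
        List<String> received = new ArrayList<>();
        bus.subscribe("FlightData", received::add);   // knows only the topic name
        bus.publish("FlightData", "AZ123 departed");  // knows only the topic name
        System.out.println(received);                 // prints [AZ123 departed]
    }
}
```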