LCCI (Large-scale Complex Critical Infrastructures)

- LCCIs are Internet-scale constellations of heterogeneous systems glued together into a federated, open system by a data distribution middleware.
- The shift towards the Internet is considered a necessary step to overcome the limitations of the monolithic, closed architectures traditionally used to build critical systems (e.g., SCADA architectures).
- A real-world example is the novel framework for Air Traffic Management (ATM) that EUROCONTROL is developing within the SESAR EU Joint Undertaking.

LCCI (Large-scale Complex Critical Infrastructures)

- New challenges arise from LCCIs that push the frontiers of current technologies.
- The data distribution task becomes crucial and has to provide:
  - Reliability: deliveries have to be guaranteed despite the failures that may happen;
  - Timeliness: messages must reach their destinations at the right time, without breaking temporal constraints;
  - Scalability: performance is affected neither by time nor by the LCCI size.
- The challenge is to find the best data distribution paradigm able to meet the aforementioned requirements.

Outline of the SWIM concept

- SWIM (System Wide Information Management) aims to establish seamless interoperability among heterogeneous ATM stakeholders:
  - common data representation;
  - coherent view of current ATM information (e.g., flight data, aeronautical data, weather).
- It may be seen as a common data/service bus to which the systems that have to interoperate are "connected".
- Close in spirit to a middleware solution for LCCIs.

SWIM prototype

- The prototype (named "SWIM-BOX") has been conceived as a sort of gateway/mediator across legacy applications:
  - completely distributed architecture;
  - designed using a domain-based approach (Flight, Surveillance, etc.);
  - implemented using a standards-based approach:
    - well-known data and information models (e.g., ICOG2);
    - standard technologies (Web Services, EJB, DDS);
    - DDS-compliant middleware for sharing data.
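As a rough illustration of the gateway/mediator idea, the sketch below shows a legacy record being translated into a common representation before it is handed to a shared bus. All names here (DataBus, InMemoryBus, FlightDataAdapter) are invented for illustration and are not the actual SWIM-BOX API:

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the DDS-based data/service bus.
interface DataBus {
    void publish(String topic, String payload);
}

// Trivial bus implementation, just enough for the sketch.
class InMemoryBus implements DataBus {
    final List<String> delivered = new ArrayList<>();
    public void publish(String topic, String payload) {
        delivered.add(topic + ":" + payload);
    }
}

// Domain-based adapter (Flight domain): translates a legacy flight
// record into the common representation and publishes it on the bus.
class FlightDataAdapter {
    private final DataBus bus;
    FlightDataAdapter(DataBus bus) { this.bus = bus; }

    void onLegacyRecord(String callsign, int flightLevel) {
        bus.publish("FlightData", callsign + "/FL" + flightLevel);
    }
}

public class GatewaySketch {
    public static void main(String[] args) {
        InMemoryBus bus = new InMemoryBus();
        new FlightDataAdapter(bus).onLegacyRecord("AZ123", 350);
        System.out.println(bus.delivered.get(0)); // FlightData:AZ123/FL350
    }
}
```

The point of the sketch is the decoupling: the legacy side never talks to another legacy system directly, only to its adapter and the common bus.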
[Figure: two legacy sites, each connecting a legacy system (Legacy A, Legacy B) through its own adapter and SWIM-BOX to the common SWIM network infrastructure.]

Some challenges

- How do the subsystems (e.g., COTS components) involved in an LCCI impact its dependability?
- What are the effects on the LCCI if the DDS-compliant middleware is invoked with erroneous inputs?
- Robustness testing provides answers to these questions:
  - it helps vendors evaluate their implementations;
  - it helps clients choose among several solutions.
- Reducing test costs → automating the test procedure.
- Automating the classification of test results.

Our goal

- Assessing the robustness of DDS-compliant middleware.
- What does robustness mean?
  - "The degree to which a system operates correctly in the presence of exceptional inputs or stressful environmental conditions" [IEEE Std 610.12-1990].
  - "Dependability with respect to external faults, which characterizes a system reaction to a specific class of faults" [Avizienis 04].
- Robustness testing features:
  - only the system interface has to be known;
  - source code is not needed (black-box approach);
  - exceptional inputs are injected through the API;
  - internal data and structures are not altered;
  - inputs and stressful conditions are carefully selected so that they activate faults representative of actual situations.

Robustness Testing Approaches

- Robustness testing: stressing the public interface of the application/system/API with invalid and exceptional values:
  - from the application to the system under test (top-down);
  - from the OS to the system under test (bottom-up).

[Figure: in the top-down approach the application calls the DDS middleware API with exceptional values; in the bottom-up approach the operating system returns exceptional values to the middleware's syscalls.]
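The top-down approach can be sketched as driving one operation of the target with exceptional values and recording its immediate reaction. The target method below is a deliberately trivial stand-in, not a real DDS call:

```java
// Minimal sketch of top-down robustness probing: call the public API
// with exceptional values and record whether it returns normally or
// throws. Real harnesses also watch for hangs and crashes.
public class TopDownSketch {

    // Stand-in for an API operation of the system under test.
    static int parsePriority(String s) {
        return Integer.parseInt(s);   // throws on exceptional input
    }

    // Drive one call and classify the immediate reaction.
    static String probe(String input) {
        try {
            parsePriority(input);
            return "RETURNED";        // call completed normally
        } catch (RuntimeException e) {
            return "EXCEPTION:" + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(probe("7"));    // valid workload value
        System.out.println(probe(null));   // exceptional value
        System.out.println(probe("2^31")); // exceptional value
    }
}
```

A bottom-up harness would instead sit below the middleware and return exceptional values from intercepted system calls.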
Robustness Testing Approaches

- Workload: a set of valid calls, needed to stress each operation of the device under test.
- Fault model: a set of rules applied at the API to expose robustness problems.
- Failure mode classification: characterizes the behavior of the system under test while it executes the workload in the presence of the fault model.

Fault Injection: the WWW (What/Where/When) dilemma

- What to inject? The fault model yields a fault list.
- Where to inject? At the API interface level, in the methods with the highest occurrences (method list).
- When to inject? At only one invocation of each method (trigger list).
- The fault, method, and trigger lists define our injection library.

Fault list

- The list of rules applied during API invocation:
  - each method input is tested with all the robustness values, one at a time;
  - e.g., void replace(int a, String b).

Method list

- Different applications using DDS-compliant middleware products have been profiled:
  - a ping-pong application;
  - Touchstone: a benchmarking framework for evaluating the performance of OMG DDS-compliant implementations;
  - the SWIM-BOX.
- The method occurrences have been measured for each application:
  - only a limited core set of all the available methods is invoked;
  - the same occurrence distribution is observed across all applications.
- The method list comprises the methods with the highest occurrences.

Failure mode classification

- The CRASH scale has been used to classify the robustness problems:
  - Catastrophic: the node crashes or the OS hangs; the DDS provider does not deliver messages correctly.
  - Restart: the DDS provider becomes unresponsive and must be terminated by force.
  - Abort: abnormal termination when invoking the API.
  - Silent: the faulty submitted value does not raise an exception, whether or not the message is transmitted.
  - Hindering: the returned error code is incorrect.
- Further suitable levels have been added:
  - Non-conformity: the fault is not signaled as it should be.
- An analysis of the DDS API has been performed to classify the results.
- A golden run has been executed for each injected value to understand the system behavior.

Test automation: JFault Injection Tool (JFIT)

- Pros:
  - Java-based implementation;
  - no knowledge about the SUT is required;
  - run-time method mutation: interception and value injection;
  - exploits Java reflection;
  - monitors the status and output of the SUT.
- Cons:
  - only methods with primitive-like types (e.g., String, int, ...) are taken into account;
  - offline, manual classification of results.

High-level architecture of JFIT

- All robustness tests are carried out according to the injection library;
- the Controller is in charge of test management and runs the tests through the Activator;
- the Interceptor catches the method invocations to the SUT and, via the Injector, injects the faults one at a time;
- the Monitor records the output at both the publisher and the subscriber side.

[Figure: the Controller drives the Activator; the Interceptor and the Injector sit between the test driver and the System Under Test; the Monitor observes the SUT.]

Test execution stages

- Preliminary execution of the workload without faults (golden run, no faults injected) to understand the normal behavior.
- Then robustness testing starts: DDS initialization → workload execution → injection phase (one fault at a time) → monitoring & logging.

Tests Results

- DDS middleware: the OpenSplice® implementation;
- no QoS features have been defined (best effort);
- according to the failure mode classification, the results are as follows:
  - no Catastrophic, Abort, or Hindering problems have been evidenced:
    - neither node crashes nor OS hangs;
    - no abnormal termination when invoking the API;
    - no erroneous returned error codes.
- 13% of the robustness tests have shown Restart problems: the experiment does not respond and must be terminated by force.
- 45% of the robustness tests have raised Silent problems: no exception has been thrown by the DDS middleware.

Tests Results

- Fault distribution between Silent and Restart outcomes. [Figure: breakdown by int fault types and String fault types.]

Conclusions

- Our approach can automatically test the core set of DDS methods.
- A significant fraction of the tests shows robustness issues raised when exceptional values are submitted to the OpenSplice® APIs (e.g., large strings or big integers).
- The ability to reach a consistent system state before performing fault injection makes us confident in the results.

Ongoing activities

- Testing all parameter types, not only primitive types;
- automating the classification of results;
- running tests in the presence of quality-of-service mechanisms;
- carrying out the same tests on other DDS-compliant middleware.

References

[Avizienis 04] A. Avizienis, J.C. Laprie, B. Randell, C. Landwehr. "Basic Concepts and Taxonomy of Dependable and Secure Computing". IEEE Transactions on Dependable and Secure Computing, 2004.
[Koopman 02] P. Koopman. "What's Wrong With Fault Injection As A Benchmarking Tool?". In Proc. DSN 2002 Workshop on Dependability Benchmarking, pp. F-31–36, Washington, D.C., USA, 2002.
[Koopman 99] P. Koopman, J. DeVale. "Comparing the Robustness of POSIX Operating Systems". In Proc. 29th Annual International Symposium on Fault-Tolerant Computing, 1999.
[Johansson 07] A. Johansson, N. Suri, B. Murphy.
"On the Selection of Error Models for OS Robustness Evaluation". In Proc. 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2007.
[Miller 95] B.P. Miller et al. "Fuzz Revisited: A Re-examination of the Reliability of UNIX Utilities". Technical report, 1995.

Test Scenario: further details

[Figure: JFIT monitors both endpoints; the API interceptor and the API injector sit between the test application and the middleware.]

- The transmitter sends bursts of messages for a while, then terminates;
- a receiver is waiting for messages;
- DDS middleware: the OpenSplice® implementation;
- no QoS features have been defined (best effort).

Pub/Sub paradigm

- Pub/Sub proves effective for federating heterogeneous systems:
  - space, time, and synchronization decoupling enforce scalability;
  - asynchronous multi-point communication is well suited to building cooperating systems.
- Many Pub/Sub alternatives exist: SIENA, GREEN, HERALD, CORBA NS, DREAM, JEDI, JMS, HERMES.
- Among this plethora of alternatives, DDS exhibits better performance, higher scalability, and a larger set of offered QoS policies.
- It is widely used in large-scope initiatives addressing wide-area scenarios:
  - e.g., it has been investigated as the data distribution system of the SESAR project through the SWIM middleware infrastructure.
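The test scenario above (a transmitter publishing a burst and a decoupled receiver draining it) can be sketched with a blocking queue standing in for the best-effort DDS topic; this is an illustrative model, not OpenSplice code:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Sketch of the workload scenario: the transmitter sends a burst of
// messages and terminates; the receiver, decoupled in time and space,
// drains the "topic" and counts what it observed.
public class ScenarioSketch {

    static int runBurst(int burst) {
        BlockingQueue<String> topic = new LinkedBlockingQueue<>();
        Thread transmitter = new Thread(() -> {
            for (int i = 0; i < burst; i++) topic.add("msg-" + i);
        });
        transmitter.start();
        try {
            transmitter.join();
            int received = 0;
            // Poll until the topic stays empty for 50 ms.
            while (topic.poll(50, TimeUnit.MILLISECONDS) != null) received++;
            return received;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(runBurst(100)); // 100
    }
}
```

Under a real best-effort QoS, the received count may be lower than the burst size; comparing the two is exactly what the monitor logs for later classification.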
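The fault-list rule described earlier ("each method input is tested with all the robustness values, one at a time") can be sketched for the example signature void replace(int a, String b). The specific robustness values chosen below are illustrative, not the deck's actual fault model:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of fault-list generation: each parameter is substituted, one
// at a time, with every robustness value for its type while the other
// parameters keep valid values.
public class FaultListSketch {
    // Illustrative robustness values per type (assumed, not from the deck).
    static final Object[] INT_FAULTS = {Integer.MIN_VALUE, Integer.MAX_VALUE, -1, 0};
    static final Object[] STRING_FAULTS = {null, "", "\u0000", bigString()};

    static String bigString() {
        char[] c = new char[65536];
        Arrays.fill(c, 'x');
        return new String(c);
    }

    // Build the test cases: exactly one injected parameter per case.
    static List<Object[]> faultList(Object[] validArgs, Class<?>[] types) {
        List<Object[]> cases = new ArrayList<>();
        for (int i = 0; i < types.length; i++) {
            Object[] faults = (types[i] == int.class) ? INT_FAULTS : STRING_FAULTS;
            for (Object f : faults) {
                Object[] args = validArgs.clone();
                args[i] = f;              // inject one fault at a time
                cases.add(args);
            }
        }
        return cases;
    }

    public static void main(String[] args) {
        // For replace(int a, String b): 4 int faults + 4 String faults.
        List<Object[]> cases = faultList(
            new Object[]{42, "valid"},
            new Class<?>[]{int.class, String.class});
        System.out.println(cases.size()); // 8
    }
}
```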
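Finally, JFIT's run-time interception via Java reflection can be sketched with a dynamic proxy: calls to the SUT's interface are caught, one argument is overwritten with a fault value, and the call is forwarded. The interface and fault value here are invented for illustration and are not the real JFIT internals:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Sketch of reflection-based interception: a proxy wraps the target,
// replaces one parameter with a fault value, and forwards the call.
public class InterceptSketch {

    // Stand-in for a SUT operation (e.g., a DDS write call).
    interface Writer { String write(String topic, String data); }

    static class RealWriter implements Writer {
        public String write(String topic, String data) {
            return topic + "<-" + data;
        }
    }

    // Wrap the target so that parameter `argIndex` is replaced by `fault`.
    static Writer inject(Writer target, int argIndex, Object fault) {
        InvocationHandler h = (proxy, method, args) -> {
            args[argIndex] = fault;       // inject one fault at a time
            return method.invoke(target, args);
        };
        return (Writer) Proxy.newProxyInstance(
            Writer.class.getClassLoader(), new Class<?>[]{Writer.class}, h);
    }

    public static void main(String[] args) {
        Writer w = inject(new RealWriter(), 1, null);
        System.out.println(w.write("FlightData", "valid")); // FlightData<-null
    }
}
```

Because only the interface is needed to build the proxy, this matches the black-box constraint: no SUT source code is required.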