A Framework for the Assessment and Selection of Software Components and Connectors in COTS-based Architectures

Jesal Bhuta, Chris Mattmann
{jesal, mattmann}@usc.edu
USC Center for Systems & Software Engineering
http://csse.usc.edu
February 13, 2007

Outline
- Motivation and Context
- COTS Interoperability Evaluation Framework
- Demonstration
- Experimentation & Results
- Conclusion and Future Work

COTS-Based Applications Growth Trend
- The number of systems using OTS components is steadily increasing
  – USC e-Services projects show the share of CBAs rising from 28% in 1997 to 70% in 2002
  – The Standish Group's 2000 survey found similar results (54%) in industry [Standish 2001 - Extreme Chaos]
[Figure: CBA growth trend in USC e-Services projects – percentage of CBAs per year, 1997-2002]

COTS Integration: Issues
- COTS products are created with their own sets of assumptions, which are not always compatible
  – Example: integrating a Java-based Customer Relationship Management (CRM) system with Microsoft SQL Server: the CRM supports JDBC, while MS SQL Server supports ODBC
[Diagram: Java CRM – JDBC | ODBC – Microsoft SQL Server]
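One common workaround at the time was the JDBC-ODBC bridge driver that shipped with the JDK (it was removed in Java 8). The sketch below is illustrative only: "CrmDb" is a hypothetical ODBC data source name that would have to be configured on the host to point at the SQL Server instance, and the table and column names are made up.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CrmBridgeExample {
    public static void main(String[] args) throws Exception {
        // Load the JDBC-ODBC bridge driver bundled with JDK 7 and earlier.
        Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");

        // "CrmDb" is a hypothetical ODBC DSN pointing at Microsoft SQL Server.
        Connection conn = DriverManager.getConnection("jdbc:odbc:CrmDb");
        Statement stmt = conn.createStatement();
        // Hypothetical table and columns, purely for illustration.
        ResultSet rs = stmt.executeQuery("SELECT customer_id, name FROM customers");
        while (rs.next()) {
            System.out.println(rs.getInt(1) + ": " + rs.getString(2));
        }
        rs.close();
        stmt.close();
        conn.close();
    }
}
```

Even where such a bridge exists, it resolves only the protocol-level mismatch; the deeper assumption mismatches discussed next still have to be assessed.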
Case Study [Garlan et al. 1995]
- Goal: develop a software architecture toolkit
- COTS selected:
  – OBST, a public-domain object-oriented database
  – InterViews, a GUI toolkit
  – SoftBench, an event-based tool integration mechanism
  – Mach RPC interface generator, an RPC mechanism
- Estimated time to integrate: 6 months and 1 person-year
- Actual time to integrate: 2 years and 5 person-years

Problem: Reduced Trade-Off Space
- Detailed interoperability assessment is effort-intensive
  – It requires detailed analysis of interfaces and COTS characteristics, plus prototyping
- A large number of COTS products is available in the market
  – Over 100 CRM solutions and over 50 databases = 5,000 possible combinations
- As a result, interoperability assessment is often neglected until late in the development cycle
- Together these factors reduce the trade-off space, with medium- and low-priority requirements weighed against the cost of integrating COTS
[Diagram: a large pool of COTS choices is narrowed by high-priority functional criteria filtering, then by medium- and low-priority functional criteria filtering, across COTS product types A-D]

Statement of Purpose
To develop an efficient and effective COTS interoperability assessment framework by:
1. Utilizing existing research and observations to introduce concepts for representing COTS products
2. Developing rules that define when specific interoperability mismatches can occur
3. Synthesizing (1) and (2) to develop a comprehensive framework for performing interoperability assessment early (late inception) in the system development cycle
- Efficient: acting or producing effectively with a minimum of unnecessary effort
- Effective: producing the desired effect (effort reduction during COTS integration)

Proposed Framework: Scope
- Specifically addresses the problem of technical interoperability
- Does not address non-technical interoperability issues:
  – Human-computer interaction incompatibilities
  – Inter/intra-organization incompatibilities
[Diagram: development timeline (Inception, IRR; Elaboration, LCO; Construction, LCA, IOC) with Conceptualize Architecture and Identify COTS Software Products in Inception, Detailed Analysis & Prototyping in Elaboration, and Integration & Testing in Construction; the proposed framework is applied around IRR-LCO, the high return-on-investment area]
IRR – Inception Readiness Review; LCO – Life Cycle Objective Review; LCA – Life Cycle Architecture Review; IOC – Initial Operational Capability [Boehm 2000]

Motivating Example: Large-Scale Distributed Scenario
- Manage and disseminate digital content (planetary science data) and metadata
- Data is disseminated in multiple intervals
- Two user classes separated by geographically distributed networks (the Internet):
  – Scientists from the European Space Agency (ESA)
  – External users
[Diagram: the NASA JPL (Pasadena, USA) digital asset management system, with its data retrieval component and query manager, sends digital content and metadata over high-volume data connectors C1 and C2 to the ESA (Madrid, Spain) digital asset management system, and over connector C3 to external user systems; additional planetary data feeds both systems; legend: data flow, data store, custom/COTS components, organization intranet]

Interoperability Evaluation Framework
[Diagram: COTS components and the proposed system architecture are input to the COTS Interoperability Evaluator (StudioI), which draws on COTS representation attributes (interfaces) and integration rules and strategies; it produces a COTS interoperability analysis report for the developer and an estimate of lines of glue code, which the COCOTS glue-code estimation model [Abts 2002] turns into a cost and effort estimate for integrating the COTS products]

COTS Representation Attributes
- COTS general attributes (4): Name, Role*, Type, Version
- COTS dependency attributes* (6): Communication Dependency*, Communication Incompatibilities*, Deployment Language*, Execution Language Support*, Underlying Dependency*, Same-Node Incompatibilities*
- COTS interface attributes* (14): Binding*, Communication Language Support*, Control Inputs*, Control Outputs*, Control Protocols*, Error Handling Inputs*, Error Handling Outputs*, Extensions*, Data Inputs*, Data Outputs*, Data Protocols*, Data Format*, Data Representation*, Packaging*
- COTS internal assumption attributes (16): Backtracking, Control Unit, Component Priorities, Concurrency, Distribution, Dynamism, Encapsulation, Error Handling Mechanism, Implementation Language*, Layering, Preemption, Reconfiguration, Reentrant, Response Time, Synchronization, Triggering Capability
* indicates the attribute or attribute set can have multiple values

COTS Definition Example: Apache 2.0
- General attributes: Name: Apache; Role: Platform; Type: Third-party component; Version: 2.0
- Dependency attributes: Communication Dependency: None; Deployment Language: Binary; Execution Language Support: CGI; Underlying Dependencies: Linux, Unix, Windows, Solaris (OR)
- Interface attributes (Backend and Web interfaces):
  – Binding: Runtime dynamic; Topologically dynamic
  – Communication Language Support: C, C++
  – Control Inputs: Procedure call, Trigger
  – Control Outputs: Procedure call, Trigger, Spawn
  – Control Protocols: None
  – Error Handling Inputs: None
  – Error Handling Outputs: Logs; HTTP error codes
  – Extensions: Supports extensions
  – Data Inputs: Data access, Procedure call, Trigger
  – Data Outputs: Data access, Procedure call, Trigger
  – Data Protocols: HTTP
  – Data Format: N/A
  – Data Representation: ASCII, Unicode, Binary
  – Packaging: Executable program; Web service
- Internal assumption attributes: Backtracking: No; Control Unit: Central; Component Priorities: No; Concurrency: Multi-threaded; Distribution: Single-node; Dynamism: Dynamic; Encapsulation: Encapsulated; Error Handling Mechanism: Notification; Implementation Language: C++; Layering: None; Preemption: Yes; Reconfiguration: Offline; Reentrant: Yes; Response Time: Bounded; Synchronization: Asynchronous; Triggering Capability: Yes
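To make the representation concrete, here is a minimal Java sketch of how a COTS definition like the one above might be encoded. The class layout and field names are hypothetical and simplified, not StudioI's actual schema.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified encoding of a COTS definition; the fields mirror
// the four attribute groups above but are not the tool's real schema.
public class CotsDefinition {
    // General attributes
    String name;
    String role;
    String type;
    String version;

    // Dependency attributes (multi-valued where starred on the slide)
    List<String> executionLanguageSupport;
    List<String> underlyingDependencies;

    // Interface attributes: interface name -> (attribute -> values)
    Map<String, Map<String, List<String>>> interfaceAttributes;

    // Internal assumption attributes: attribute -> single value
    Map<String, String> internalAssumptions;

    // A fragment of the Apache 2.0 definition from the slide above.
    static CotsDefinition apache20() {
        CotsDefinition d = new CotsDefinition();
        d.name = "Apache";
        d.role = "Platform";
        d.type = "Third-party component";
        d.version = "2.0";
        d.executionLanguageSupport = Arrays.asList("CGI");
        d.underlyingDependencies = Arrays.asList("Linux", "Unix", "Windows", "Solaris");
        d.interfaceAttributes = Map.of(
            "Web", Map.of("Data Protocols", Arrays.asList("HTTP")));
        d.internalAssumptions = Map.of(
            "Concurrency", "Multi-threaded",
            "Error Handling Mechanism", "Notification");
        return d;
    }
}
```

Keying the interface attributes per named interface (Backend, Web) mirrors the two-interface layout of the Apache example.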
COTS Interoperability Evaluation Framework
[Diagram: the project analyst defines the architecture and COTS combinations through an architecting user-interface component; a COTS definition generator and a COTS selection framework populate a COTS definition repository; the COTS connector selector answers connector queries using level-of-service inputs and the connector selection framework; the integration analysis component combines the COTS definitions, the deployment architecture, the connector options, and integration rules from the integration rules repository to produce the COTS interoperability analysis report]

Integration Rules
- Interface analysis rules
  – Example: 'Failure due to incompatible error communication'
- Internal assumption analysis rules
  – Example: 'Data connectors connecting components that are not always active'
- Dependency analysis rules
  – Example: 'Parent node does not support dependencies required by the child components'
- Each rule includes pre-conditions and results (an illustrative encoding follows the three rule examples below)

Integration Rules: Interface Analysis
'Failure due to incompatible error communication'
- Pre-conditions:
  – Two components (A and B) communicate via data and/or control (bidirectional)
  – One component's (A) error handling mechanism is 'notify'
  – The two components have incompatible error output/error input methods
- Result:
  – A failure in component A will not be communicated to component B, causing a permanent block or failure in component B

Integration Rules: Internal Assumption Analysis
'Data connectors connecting components that are not always active'
- Pre-conditions:
  – Two components are connected via a data connector
  – One of the components does not have a central control unit
- Result:
  – Potential data loss
[Diagram: Component A → Pipe → Component B]

Integration Rules: Dependency Analysis
'Parent node does not support dependencies required by the child components'
- Pre-condition:
  – A component in the system requires one or more software components to function, and its parent (deployment) node does not provide them
- Result:
  – The component will not function as expected
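As promised above, here is a sketch of how such a rule might be mechanized as an executable predicate, using the interface-analysis rule as the example. The Component type and its field names are hypothetical; only the rule logic follows the stated pre-conditions.

```java
import java.util.Set;

// Illustrative encoding of the interface-analysis rule
// "Failure due to incompatible error communication".
public class ErrorCommunicationRule {

    // Hypothetical slice of a COTS definition, limited to error handling.
    static class Component {
        String errorHandlingMechanism;  // e.g. "notify"
        Set<String> errorOutputs;       // error communication methods offered
        Set<String> errorInputs;        // error communication methods accepted
    }

    /** True when the rule's pre-conditions hold for A notifying B. */
    static boolean mismatch(Component a, Component b) {
        if (!"notify".equals(a.errorHandlingMechanism)) {
            return false;  // pre-condition: A handles errors by notification
        }
        // Pre-condition: no error output of A is accepted as an error input by B.
        for (String method : a.errorOutputs) {
            if (b.errorInputs.contains(method)) {
                return false;  // at least one compatible error channel exists
            }
        }
        return true;  // result: A's failures never reach B -> potential block/failure
    }
}
```

A repository of such predicates, evaluated over the COTS definitions, is one plausible way the integration analysis component could automate mismatch detection.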
Voluminous Data-Intensive Interaction Analysis
- An extension point: an implementation of the level-of-service connector selector
- Distribution connector profiles (DCPs)
  – Data access, distribution, and streaming [Mehta et al. 2000] metadata captured for each profiled connector
  – Can be generated manually or through an automatic process
- Distribution scenarios
  – Constraint queries phrased against the architectural vocabulary of data distribution: total volume, number of users, number of user types, delivery intervals, data types, geographic distribution, access policies, performance requirements

Voluminous Data-Intensive Interaction Analysis
- We need to understand the relationship between the scenario dimensions and the connector metadata
  – If we understood the relationship, we would know which connectors to select for a given scenario
- The current approach allows both Bayesian inference and linear equations as means of relating the connector metadata to the scenario dimensions
- For our motivating example:
  – 3 connectors, C1-C3
  – Profiled 12 major OTS connector technologies, including bbFTP, GridFTP, UDP bursting technologies, FTP, etc.
  – Applied the selection framework to "rank" the most appropriate of the 12 OTS connector solutions for the given example scenarios (a minimal ranking sketch follows the next slide)

Voluminous Data-Intensive Interaction Analysis
- Precision-recall analysis
  – Evaluated the framework against 30 real-world data distribution scenarios: 10 high-volume, 9 medium-volume, and 11 low-volume
  – Used expert analysis to develop an "answer key" for the scenarios: a set of "right" connectors and a set of "wrong" connectors
- Applied the Bayesian and linear-programming connector selection algorithms
  – Clustered the ranked connector lists using k-means clustering (k = 2) to develop a comparable answer key for each algorithm
- Bayesian selection algorithm: 80% precision; linear programming: 48%
  – The Bayesian algorithm is more "white box", the linear algorithm more "black box", and white box is better
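The sketch below shows one naive-Bayes-flavored way the ranking step could work, assuming each profiled connector carries likelihoods for discrete scenario-dimension values. The connector names match the profiled technologies above, but the probability tables, key format, and smoothing constant are hypothetical stand-ins for the real DCP metadata.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical Bayesian-style ranking of connectors against a scenario.
public class ConnectorRanker {

    /**
     * For each connector, multiply P(dimension=value | connector) over the
     * scenario's dimensions, then rank connectors by descending score.
     */
    static List<String> rank(Map<String, Map<String, Double>> likelihoods,
                             Map<String, String> scenario) {
        Map<String, Double> score = new HashMap<>();
        for (String connector : likelihoods.keySet()) {
            double p = 1.0;
            for (Map.Entry<String, String> dim : scenario.entrySet()) {
                // Keys look like "totalVolume=high"; unseen values get a
                // small smoothing constant instead of zeroing the product.
                p *= likelihoods.get(connector)
                                .getOrDefault(dim.getKey() + "=" + dim.getValue(), 1e-6);
            }
            score.put(connector, p);
        }
        List<String> ranked = new ArrayList<>(score.keySet());
        ranked.sort((x, y) -> Double.compare(score.get(y), score.get(x)));
        return ranked;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> l = new HashMap<>();
        l.put("bbFTP", Map.of("totalVolume=high", 0.8, "numberOfUsers=many", 0.3));
        l.put("FTP",   Map.of("totalVolume=high", 0.1, "numberOfUsers=many", 0.6));
        Map<String, String> scenario =
            Map.of("totalVolume", "high", "numberOfUsers", "many");
        System.out.println(rank(l, scenario)); // [bbFTP, FTP]
    }
}
```

Clustering the resulting ranked list with k-means (k = 2), as described above, then separates the "right" connectors from the "wrong" ones for the precision-recall comparison.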
Demonstration

Experiment 1
- Conducted in a graduate software engineering course on 8 projects
  – 6 of the projects were COTS-based applications: 2 web-based (3-tier) projects, 1 shared-data project, 1 client-server project, 1 web-service interaction project, and 1 single-user system
  – Teams applied this framework before the RLCA* milestone on their respective projects
  – Data was collected using surveys, immediately after the interoperability assessment and again after the completion of the project
* Rebaselined Life Cycle Architecture

Experiment 1 Results
- Dependency accuracy*: pre-framework mean 79.3% (sdev 17.9); post-framework mean 100% (sdev 0); p = 0.017
- Interface accuracy**: pre-framework mean 76.9% (sdev 14.4); post-framework mean 100% (sdev 0); p = 0.0029
- Actual assessment effort: projects using this framework, 1.53 hrs (sdev 1.71); equivalent projects that did not, 5 hrs (sdev 3.46); p = 0.053
- Actual integration effort: projects using this framework, 9.5 hrs (sdev 2.17); equivalent projects that did not, 18.2 hrs (sdev 3.37); p = 0.0003
* Accuracy of dependency assessment: 1 - (number of unidentified dependencies / total number of dependencies)
** Accuracy of interface assessment: 1 - (number of unidentified interface interaction mismatches / total number of interface interactions)
Accuracy: a quantitative measure of the magnitude of error [IEEE 1990]

Experiment 2 – Controlled Experiment
- Number of students: treatment group 75, control group 81
- On-campus students: treatment 60, control 65
- DEN students: treatment 15, control 16
- Average experience: treatment 1.473 years, control 1.49 years
- Average on-campus experience: treatment 0.54 years, control 0.62 years
- Average DEN experience: treatment 5.12 years, control 5 years

Experiment 2 – Cumulative Results
- Hypothesis IH1, dependency accuracy: treatment group (75) mean 100% (sdev 0); control group (81) mean 72.5% (sdev 11.5); p < 0.0001 (t = 20.7; pooled sdev = 8.31; DOF = 154)
- Hypothesis IH2, interface accuracy: treatment group (75) mean 100% (sdev 0); control group (81) mean 80.5% (sdev 13.0); p < 0.0001 (t = 13.0; pooled sdev = 9.37; DOF = 154)
- Hypothesis IH3, actual assessment effort: treatment group (75) mean 72.8 min (sdev 28.8); control group (81) mean 185 min (sdev 104); p < 0.0001 (t = -9.04; pooled sdev = 77.5; DOF = 154)
(A sketch reproducing these t statistics appears after the Questions slide.)

Experiment 2 – On-Campus Results
- Hypothesis IH1, dependency accuracy: treatment group (60) mean 100% (sdev 0); control group (65) mean 72.6% (sdev 11.8); p < 0.0001 (t = 17.9; pooled sdev = 8.50; DOF = 123)
- Hypothesis IH2, interface accuracy: treatment group (60) mean 100% (sdev 0); control group (65) mean 80.4% (sdev 12.6); p < 0.0001 (t = 12.0; pooled sdev = 9.12; DOF = 123)
- Hypothesis IH3, actual assessment effort: treatment group (60) mean 67.1 min (sdev 23.1); control group (65) mean 183 min (sdev 100); p < 0.0001 (t = -8.75; pooled sdev = 74.2; DOF = 123)

Conclusion and Future Work
- Results (so far) indicate a "sweet spot" in small e-services projects
- The framework-based tool automates initial interoperability analysis: interface, internal assumption, and dependency mismatches
- Further experimental analysis is ongoing: different software development domains, and projects with greater COTS complexity
- Additional quality-of-service extensions

Questions
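Backup: as referenced on the cumulative-results slide, a small sketch for reproducing the reported t statistics. It assumes a pooled two-sample t-test, which is consistent with the reported degrees of freedom (75 + 81 - 2 = 154); the class and method names are ours.

```java
// Reproducing the pooled two-sample t statistic from the Experiment 2 slides.
public class PooledTTest {
    static double pooledT(double mean1, double sd1, int n1,
                          double mean2, double sd2, int n2) {
        // Pooled variance across both groups, DOF = n1 + n2 - 2.
        double pooledVar = ((n1 - 1) * sd1 * sd1 + (n2 - 1) * sd2 * sd2)
                           / (n1 + n2 - 2);
        double pooledSd = Math.sqrt(pooledVar);
        return (mean1 - mean2) / (pooledSd * Math.sqrt(1.0 / n1 + 1.0 / n2));
    }

    public static void main(String[] args) {
        // Dependency-accuracy figures from the cumulative-results slide:
        // treatment mean 100 (sdev 0, n=75), control mean 72.5 (sdev 11.5, n=81).
        // Yields pooled sdev ~8.3 and t ~20.7, matching the slide.
        System.out.println(pooledT(100, 0, 75, 72.5, 11.5, 81));
    }
}
```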