A Service-Oriented Componentization Framework for Java Software Systems
MASc Seminar
Shimin Li
Software Technologies Applied Research Lab
Department of Electrical & Computer Engineering
August 29, 2006

Outline
- Motivation
- Research Goals
- Proposed Framework (SOC4J)
- Architecture
- Processes
- Case Studies
- Thesis Contributions
- Future Work

Motivation
- Service-oriented computing has dramatically changed the way in which we develop software systems.
- Providing competitive services to the global market is critical to the success of businesses and organizations.
- Many competitive services have already been implemented in existing systems.
- Exposing all or part of an existing system as business services is one of the most effective ways to leverage the value of the system.
  - A business service of a software system is an abstract resource that represents a capability to perform tasks that together form a coherent functionality.
- To reuse business services, the service-oriented architecture suggests realizing them as self-contained components.
  - A self-contained component is a component that contains all the source code necessary to implement its services.

Research Questions
- How can we reuse an existing object-oriented software system?
  - Transform the functionality of the software system into services by identifying the critical business services embedded in the system.
  - Realize the identified services as self-contained components.
- How can we improve the maintainability of an existing object-oriented software system?
  - Transform the monolithic architecture of the existing system into a more flexible service-oriented architecture.
  - Reconstruct the existing system into a component-based system.

Research Goals
- To identify critical business services embedded in an existing Java system.
- To realize identified services as self-contained components.
- To reconstruct the existing system into a component-based system.
- To build a comprehensive framework addressing the above objectives, based on the following research areas:
  - Program Comprehension
  - Program Migration
  - Architecture Recovery
  - Software Reuse

Service-Oriented Componentization Framework for Java Software Systems (SOC4J)
[Figure: the SOC4J framework, which takes Java source code as input and produces a component-based system]
- Stage I: Architecture Recovery (AR). Source Code Modeling produces source code models; Architecture Modeling produces architectural models.
- Stage II: Service Identification (SI). Top-Level Service Identification produces top-level services; Low-Level Service Identification produces top-level services and their low-level services.
- Stage III: Component Generation (CG). Service Realization produces self-contained components, kept in the Self-Contained Component Repository.
- Stage IV: System Transformation (ST). Architecture Reconstruction produces the component-based system.

Stage I (AR): Source Code Modeling
Goal
- To build a complete set of data models for Java source code at different levels of abstraction, to support structural analysis and recovery.
Approach
[Figure: Java Source Code -> Interpreter -> Raw Data -> Model Generator -> Source Code Models (XML Doc); JavaCC (Java Compiler Compiler) generates the Interpreter from a JavaCC Grammar]
Source Code Models
- JPackage: to model Java packages
- JFile: to model Java source files
- JClass: to model Java classes and interfaces
- JMethod: to model Java methods and constructors
These models construct the Basic View (BView) of the system.

Stage I (AR): Architecture Modeling
Goal
- To establish a repository of relationships among classes and interfaces that can easily be queried in the service identification stage.
Approach
[Figure: Source Code Models (XML Doc) are processed by an XML Parser, Relationship Extractor, Metric Generator, Graph Generator, and Graph Transformer to produce the CIRG and the CIDG]
Architectural Models
- Class/Interface Relationship Graph (CIRG)
- Class/Interface Dependency Graph (CIDG)
These models build the Structure View (SView) of the system.

Class/Interface Relationship Graph (CIRG)
Purpose
- To capture different types of relationships among classes and interfaces
- To describe relationships as graph representations
Definitions
- A Labeled Directed Graph (LDG) is a tuple $\Gamma(V, E, L_V, L_E, l_V, l_E)$, where $V$ is a set of nodes (or vertices), $E$ is a set of edges (or arcs), $L_V$ is a set of node labels, $L_E$ is a set of edge labels, $l_V : V \to L_V$ is a label function that maps nodes to node labels, and $l_E : E \to L_E$ is a label function that maps edges to edge labels.
- The CIRG of an object-oriented system is an LDG, where $V$ is the set of all classes/interfaces of the system, $l_V(v)$ returns the full name (i.e., the package name concatenated with the class or interface name) of $v$ for any $v \in V$, $E = \{(v, w) \in V \times V \mid v \text{ references } w\}$, and $l_E(e)$ returns the types of relationships between the source node and target node of $e$ for any $e \in E$.
- In SOC4J, the type of a relationship is one of IN, RE, AS, AG, CO, and US, which represent inheritance, realization, association, aggregation, composition, and usage, respectively.

Class/Interface Dependency Graph (CIDG)
Purpose
- To capture the dependency relationship among classes and interfaces
- To represent the CIRG at different levels of abstraction
Definition
- The CIDG of an object-oriented system is an LDG, where $V$ is the set of all classes/interfaces of the system, $l_V(v)$ returns the full name (i.e., the package name concatenated with the class or interface name) of $v$ for any $v \in V$, $E = \{(v, w) \in V \times V \mid v \text{ depends on } w\}$, $L_E = \emptyset$, and hence $l_E(e)$ returns an empty label for any $e \in E$.
[Figure: an example CIRG over classes C1 to C5 (C3 is abstract) with edges labeled IN, RE, AG, CO, AS, and US, shown next to the corresponding CIDG]

Service Description in SOC4J
Classification
- Top-Level Service (TLS): a service that is not used by any other service of the system. It may contain a hierarchy of low-level services that further describes/modularizes the service.
- Low-Level Service (LLS): a service that sits underneath a top-level service and may be used by other low-level services.
Representation
A service is represented as a tuple $(name, C_F, SHG)$:
- $name$: the name of the service,
- $C_F$: the Façade Class Set of the service,
- $SHG$: the Service Hierarchy Graph associated with a top-level service. The SHG describes the structural relationships between a top-level service and its low-level services.
[Figure: An Example of SHG. Top-level service: MASc Seminar Arrangement; low-level services: Program Status Checking, Room Booking, Announcement, Database Connection]
All top-level services of a system and their SHGs build the Service View (ServView).
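As a concrete reading of the CIRG and CIDG definitions given before the service description, the following minimal sketch stores a CIRG as a labeled edge map and derives a CIDG from it by dropping the labels. It assumes that every CIRG relationship counts as a dependency, which the slides suggest but do not state outright. The class `CIRG`, the function `to_cidg`, and the example class names are hypothetical and are not part of SOC4J or JComp.

```python
from dataclasses import dataclass, field

# Relationship types named in the slides.
RELATION_TYPES = {"IN", "RE", "AS", "AG", "CO", "US"}


@dataclass
class CIRG:
    """Labeled directed graph: nodes are full class/interface names."""
    nodes: set = field(default_factory=set)
    # Maps an edge (source, target) to the set of relationship types on it.
    edges: dict = field(default_factory=dict)

    def add_relationship(self, source, target, kind):
        if kind not in RELATION_TYPES:
            raise ValueError(f"unknown relationship type: {kind}")
        self.nodes.update((source, target))
        self.edges.setdefault((source, target), set()).add(kind)


def to_cidg(cirg):
    """Assumption: any CIRG relationship is a dependency, so the CIDG
    keeps one unlabeled edge per related pair of nodes."""
    return set(cirg.nodes), set(cirg.edges)


# Hypothetical example classes, for illustration only.
cirg = CIRG()
cirg.add_relationship("app.OrderService", "app.Repository", "AS")
cirg.add_relationship("app.OrderService", "app.Repository", "US")
cirg.add_relationship("app.SqlRepository", "app.Repository", "RE")

cidg_nodes, cidg_edges = to_cidg(cirg)
```

Two relationship types between the same pair of classes collapse into a single CIDG edge, which is one way to read the CIDG as a more abstract view of the CIRG.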

Stage II (SI): Top-Level Service Identification
Goals
- To identify the top-level services embedded in an existing Java software system.
- To build an initial SHG for each identified top-level service.
  - Low-level services within the initial SHG are called atomic services.
  - An atomic service is a service provided by a single Java class or interface.
Rationale
- By definition, top-level services partition the system into independent parts.
- Each of these independent parts contains an entry point of the system.
- From the user's point of view, each of these independent parts represents a service (i.e., a top-level service) to the outside world.

Stage II (SI): Top-Level Service Identification (cont'd)
Approach
[Figure: Top-Level Service Identification Process. CIRG, CIDG -> CIDG Transformation -> MCIDGs -> Top-Level Service Candidate Generation -> top-level service candidates -> Service Validation -> validated top-level services and their atomic services (described in SHGs)]
- CIDG Transformation: decomposing the CIDG into a set of rooted components. A rooted component is called a Modularized CIDG (MCIDG). Each MCIDG is a subgraph of the CIDG and represents an independent part of the system.
- Top-Level Service Candidate Generation: generating top-level service candidates from the MCIDGs and describing each candidate as a tuple $(name, C_F, SHG)$.
- Service Validation: validating each candidate by examining the classes within its façade class set (these classes represent the functionality of the service) and assigning a meaningful name to each accepted service.

Stage II (SI): Low-Level Service Identification
Goal
- To identify reusable services underneath each top-level service.
Rationale
- The initial SHG built in the top-level service identification process is a rooted directed graph. It represents the structural dependency between a top-level service and its low-level services (atomic services).
- Atomic services (each provided by a single Java class) are very fine-grained and therefore have very limited reusability.
- Highly related atomic services can be clustered together to represent a new service. The newly identified service has a higher level of granularity and thus a higher potential for reuse.
- After service clustering, a new SHG can be built by introducing the newly identified services.

Stage II (SI): Low-Level Service Identification (cont'd)
Approach
- SHG Transformation: preprocessing the SHG, for example by collapsing cycles.
- Dominance Tree Generation: generating the dominance tree from the SHG.
- Dominance Tree Reduction: identifying highly related services and clustering them into a new service. The newly identified service has a higher level of granularity.
- SHG Reconstruction: reconstructing the SHG from the reduced dominance tree.
[Figure: Low-Level Service Identification. Input: a top-level service and its atomic services (described in an SHG). Service Aggregation loop: SHG Transformation -> SHG -> Dominance Tree Generation -> DTree of SHG -> Dominance Tree Reduction -> Reduced DTree -> SHG Reconstruction -> SHG -> Termination Criteria Satisfied? If no, repeat; if yes, output the input top-level service and its low-level services (described in the newly built SHG)]
Termination Criteria
(1) The top-level service has been well modularized by its low-level services.
(2) The low-level services present an appropriate level of granularity.
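The Dominance Tree Generation step above can be illustrated with the textbook iterative dominator computation on a rooted directed graph. This is a minimal sketch, not the algorithm used in SOC4J or JComp: the slides do not say which algorithm is used, and the reduction step is left out because its clustering rule is not given. The function names and the example SHG edges (including `RoomCatalog`) are hypothetical.

```python
def dominators(graph, root):
    """Return {node: set of its dominators} for nodes reachable from root."""
    # Collect the nodes reachable from the root.
    reachable, seen, stack = [], {root}, [root]
    while stack:
        node = stack.pop()
        reachable.append(node)
        for succ in graph.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                stack.append(succ)

    # Predecessor sets restricted to reachable nodes.
    preds = {n: set() for n in reachable}
    for n in reachable:
        for succ in graph.get(n, ()):
            preds[succ].add(n)

    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {n: set(reachable) for n in reachable}
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for n in reachable:
            if n == root:
                continue
            new = set.intersection(*(dom[p] for p in preds[n])) | {n}
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def dominance_tree(graph, root):
    """Return {node: immediate dominator}; the root maps to None."""
    dom = dominators(graph, root)
    idom = {root: None}
    for node, doms in dom.items():
        if node == root:
            continue
        strict = doms - {node}
        # Strict dominators form a chain; the closest one has the most dominators.
        idom[node] = max(strict, key=lambda d: len(dom[d]))
    return idom


# Hypothetical SHG edges, loosely based on the seminar example.
shg = {
    "SeminarArrangement": ["ProgramStatusChecking", "RoomBooking", "Announcement"],
    "ProgramStatusChecking": ["DatabaseConnection"],
    "RoomBooking": ["DatabaseConnection", "RoomCatalog"],
}
idom = dominance_tree(shg, "SeminarArrangement")
```

In this example `RoomCatalog` is dominated by `RoomBooking`, so the two are natural candidates for clustering into one coarser service, while `DatabaseConnection` is shared by two services and is dominated only by the root.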

Component Description in SOC4J
Classification
- Top-Level Component (TLC): a component that realizes a top-level service.
- Low-Level Component (LLC): a component that realizes a low-level service.
Representation
A component is described as a tuple $(name, if, C_F, C_C, CHG)$:
- $name$: the name of the component,
- $if$: the interface of the component,
- $C_F$: the Façade Class Set of the component,
- $C_C$: the Constituent Class Set of the component,
- $CHG$: the Component Hierarchy Graph associated with a top-level component. The CHG describes the structural relationships between a top-level component and its low-level components.
[Figure: An Example of CHG. Top-level component: MASc Seminar Arrangement; low-level components: Program Status Checking, Room Booking, Announcement, Database Connection]
All top-level components of a system and their CHGs build the Component View (CompView).

Component Reusability Model in SOC4J
[Figure: a hierarchy with the columns Characteristic, Quality Factor, Criteria, and Metric]
- Characteristic: Reusability
- Quality factors: Understandability, Adaptability, Portability
- Criteria and their metrics:
  - Complexity: RPD
  - Observability: RCO
  - Customizability: RCC
  - External Dependency: SCCr, SCCp
Metrics
- RPD (Reference Parameter Density) measures the occurrence of reference parameters in the interface of a component.
- RCO (Rate of Component Observability) measures the percentage of readable properties among all fields declared in the interface of a component.
- RCC (Rate of Component Customizability) measures the percentage of writable properties among all fields declared in the interface of a component.
- SCCr (Self-Completeness of Component's Return Values) measures the percentage of business methods without any return value among all business methods implemented in a component.
- SCCp (Self-Completeness of Component's Parameters) measures the percentage of business methods without any parameters among all business methods implemented in a component.
Based on the above metrics, reusability is formulated as a value in [-1, 1]. A higher value represents a higher level of reusability.

Stage III (CG): Service Realization
Goal
- To realize each identified service (both top-level and low-level) as a self-contained component.
Approach
- Name the component by copying its service's name.
- Compute the façade class set, $C_F$, by copying its service's façade class set.
- Extract the constituent class set, $C_C$, from the CIDG in order to make the component self-contained.
- Create a new interface, $if$, and modify the source code of the classes/interfaces in the façade class set so that the user can access all public methods and class fields defined in those classes through the newly created interface.
- Generate the CHG for a top-level component, based on the SHG of its service.
The above code modification is a kind of refactoring, because it does not change the observable behavior of the original system.
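The step that extracts the constituent class set from the CIDG can be read as a reachability computation. The sketch below assumes that "self-contained" means the component includes every class transitively reachable from its façade classes along CIDG edges. The slides do not spell out the exact rule, so this is an assumption, and `constituent_classes` and the example class names are hypothetical.

```python
from collections import deque


def constituent_classes(cidg, facade):
    """Breadth-first closure of the façade classes over CIDG edges.

    cidg maps each class name to the classes it depends on.
    Assumption: the constituent class set is the façade set plus
    everything it transitively depends on.
    """
    closure = set(facade)
    queue = deque(facade)
    while queue:
        cls = queue.popleft()
        for dep in cidg.get(cls, ()):
            if dep not in closure:
                closure.add(dep)
                queue.append(dep)
    return closure


# Hypothetical CIDG fragment, for illustration only.
cidg = {
    "svc.BookingFacade": ["svc.RoomFinder", "db.Connection"],
    "svc.RoomFinder": ["db.Connection", "model.Room"],
    "svc.Unrelated": ["model.Room"],
}
cc = constituent_classes(cidg, {"svc.BookingFacade"})
```

Classes that the façade never reaches, such as `svc.Unrelated` here, stay outside the component, which keeps the extracted component as small as the dependency structure allows.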

Stage IV (ST): Architecture Reconstruction
Goal
- To reconstruct an existing object-oriented system into a component-based system, based on the components extracted from the system.
Reference Model for Component-Based Systems
[Figure: UML containment relationships among the Target System (Component-Based System), Top-Level Component (JAR file), Low-Level Component (JAR file), and Class/Interface (Java file)]

Stage IV (ST): Architecture Reconstruction
Approach
We adopt a bottom-up integration technique that integrates the extracted components, starting with the components in the lowest position in the component hierarchy:

    for each top-level component t do
        while there exists a low-level component in t.CHG do
            find the component c that is in the lowest position in t.CHG;
            retrieve the parents of c in t.CHG;
            refactor the source code of each parent to access c through its interface;
            remove component c from t.CHG;
        end while
    end for

The above reconstruction process does not change the observable behavior of the original system.
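The bottom-up loop above can be sketched for a single top-level component as follows. The sketch only computes the order in which low-level components would be integrated and which parents would be refactored at each step; the refactoring itself is not modeled. It assumes that the CHG is acyclic and that "lowest position" means a component with no remaining children. The function name and the example graph are hypothetical.

```python
def bottom_up_integration(chg, root):
    """Return [(component, parents)] in bottom-up integration order.

    chg maps each component to the components directly beneath it.
    Assumptions: the CHG is acyclic, and the component in the lowest
    position is one with no remaining children.
    """
    # Working copy so that the caller's CHG is left untouched.
    children = {node: set(subs) for node, subs in chg.items()}
    for subs in chg.values():
        for sub in subs:
            children.setdefault(sub, set())
    children.setdefault(root, set())

    order = []
    while len(children) > 1:  # low-level components remain
        # Pick a non-root component with no children (sorted for determinism).
        leaf = next(n for n in sorted(children) if n != root and not children[n])
        parents = sorted(p for p, subs in children.items() if leaf in subs)
        # In the real process, each parent would now be refactored
        # to access `leaf` through its interface.
        order.append((leaf, parents))
        for parent in parents:
            children[parent].discard(leaf)
        del children[leaf]
    return order


# Hypothetical CHG of one top-level component T.
chg = {"T": ["A", "B"], "A": ["C"], "B": ["C"]}
steps = bottom_up_integration(chg, "T")
```

For this graph the shared component `C` is integrated first, with both `A` and `B` as the parents to refactor, and `A` and `B` follow once nothing remains beneath them.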

JComp: Java Componentization Kit
- JComp is a toolkit that implements the proposed SOC4J framework and provides an integrated workbench for componentizing Java software systems.
- It is built on top of the Eclipse Rich Client Platform (RCP) and is composed of a set of plug-ins:
  - Parser Plug-in: generating source code models (JPackage, JFile, JClass, and JMethod).
  - Modeler Plug-in: building architectural models (CIRG and CIDG).
  - Extractor Plug-in: identifying business services.
  - Generator Plug-in: generating a self-contained component for each service.
  - Transformer Plug-in: reconstructing an existing system into a component-based system.
[Figure: JComp Architecture. The JComp RCP Application (Parser, Modeler, Extractor, Generator, and Transformer plug-ins) sits on the Eclipse RCP, which consists of the Platform UI (Generic Workbench), JFace, SWT, the Resource Manager, and the Platform Runtime (OSGi)]

Case Studies
- Jetty: an open-source, standards-based, full-featured web server implemented entirely in Java.
- Apache Ant: a software tool for automating software build processes. It is similar to make, but it is written in Java and is primarily intended for use with Java.

Project      Version   LOC     Java Source Files   Packages   Classes   Interfaces
Jetty        5.1.10    44125   318                 25         273       47
Apache Ant   1.6.5     86468   690                 70         640       60

Obtained Results - Jetty
- 33 top-level service candidates were generated.
- 16 top-level services were accepted.
- The rejected candidates are dead code, debugging modules, or testing modules. For example, we found 8 dead classes in the org.mortbay.util package and a debugging module whose entry point is the class org.mortbay.servlet.ProxyServlet.
- The low-level services underneath each of the 16 top-level services were identified.

Business Services Identified from Jetty

ID    Top-Level Service               Classes and Interfaces   Low-Level Services
T1    Win32 Server                    248                      11
T2    Dynamic Servlet Invoker         207                      12
T3    Jetty Server MBean              126                      9
T4    Proxy Request Handler           113                      7
T5    XML Configuration MBean         87                       5
T6    Web Application MBean           86                       6
T7    Administration Servlet          56                       5
T8    CGI Servlet                     49                       5
T9    Host Socket Listener            46                       5
T10   Web Configuration               34                       3
T11   Authentication Access Handler   30                       3
T12   Servlet Response Wrapper        27                       2
T13   IP Access Handler               18                       0
T14   Multipart Form Data Filter      16                       2
T15   HTML Script Block               12                       1
T16   Applet Block                    9                        1

Low-Level Services within Win32 Server

Low-Level Service         Component Reusability
Jetty Server              0.9
Service Handlers          0.6
Resource Handler          0.7
Security Handler          0.7
Socket Listener           0.8
HTTP Connection           0.9
HTTP Request              0.7
HTTP Response             0.5
Web Application Context   0.6
Servlet                   0.7
Servlet Handler           0.8

Reusability of Components Extracted from Jetty
[Figure: chart with two series, "Reusability of Top-Level Components" and "Average Reusability of Low-Level Components in a Top-Level Component". x-axis: Top-Level Components C1 to C16; y-axis: Reusability, 0 to 1]

Time and Space Statistics of JComp

Measurement Item                                                Jetty   Apache Ant
Source Code Modeling Time (min:sec)                             2:18    5:20
Architecture Modeling Time (min:sec)                            4:19    9:15
Top-Level Service Candidate Identification Time (min:sec)       8:45    19:43
Average Low-Level Service Identification Time (min:sec)         1:06    0:54

Measurement Item                  Jetty   Apache Ant
Source Code Space (MB)            2.95    5.69
Source Code Model Space (MB)      1.43    3.34
Architectural Model Space (MB)    1.57    3.92

JComp was run on a Windows desktop with an Intel Pentium 4 3.4 GHz CPU and 2 GB of memory.

Thesis Contributions
- The design and implementation of comprehensive graph representations of an object-oriented system at different levels of abstraction.
- The design and implementation of an efficient and effective methodology for identifying and realizing critical business services embedded in an existing object-oriented system.
- The exploration of an incremental program comprehension approach. The BView, SView, ServView, and CompView built by the proposed framework help users understand the program.
- The design and implementation of a toolkit that provides an integrated workbench for componentizing Java software systems.

Future Work
- To apply dynamic analysis of system behavior within the first stage of the SOC4J framework to improve the detection of class relationships.
- To investigate algorithmic processes that can be used to automatically categorize the identified services and components.
- To improve the precision of service identification by considering design patterns, alternate implementations of the algorithms, and alternate definitions of the class relationships.
- To extend the SOC4J framework to other programming languages, for instance C++, or even C and COBOL.