CS 578 Software Architecture
Project Report
Grid Architecture Recovery for Pegasus & Sun Grid Engine
Group No. 15
Mark Shehata
Carl Hanks
Mayank Singh

Table of Contents:
1. Introduction
2. Grid technologies
   1. Pegasus
      A. DAX
      B. Catalogs
   2. Sun Grid Engine
      A. Hosts
      B. Daemons
      C. Queues
3. Architecture recovery Tool
4. Recovery Process
   1. For Pegasus
      A. Recovery and Rationale
      B. Components
      C. Connectors
      D. Style identification
      E. Identifying discrepancies
   2. For Sun Grid Engine
      A. Recovery and Rationale
      B. Components
      C. Connectors
      D. Style identification
      E. Identifying discrepancies
5. Difficulties faced
   1. Selection of right tool
   2. Complexity of diagram
   3. Issues with Pegasus
   4. Issues with Sun Grid Engine
6. Conclusions
7. References

1. Introduction:

“Architectural recovery is a process frequently used to cope with architectural erosion whereby the current, “as implemented” architecture of a software system is extracted from the system’s implementation.”[1]

The project involves studying the assigned Grid technologies, creating a class/component-level diagram from their source code using one of the many architecture recovery tools available, and then shoe-horning the resulting architecture (after applying FOCUS [2]) into the given 5-layer Grid architecture [3] to see where the recovered architecture departs from it. Since there is always a difference between the “as proposed” and the “as implemented” architecture, mismatches between the two are expected, and finding them is the whole point of this exercise. The way in which one chooses to apply the FOCUS technique also affects the architecture that results.

The recovery involves the following steps:
1. Recovery of major software components.
2. Recovery of major software connectors.
3. Recovery of two major architectural styles used in each Grid technology.
4. Shoe-horning of the recovered architecture into the 5-layer standard Grid architecture.
5. Discrepancy identification, which includes finding up-calls, calls that cross two layer boundaries, unspecified layer dependencies and other discrepancies.

2. Grid Technologies:

1. Pegasus (Planning for Execution in Grids):

Pegasus is a scientific workflow system that lets users design and plan complex workflows, maps them onto grid resources efficiently and automatically, and then executes and manages them. It maps the abstract, user-defined workflow onto the underlying distributed resources (data, computation or network) without requiring the user to deal with the low-level specifications demanded by middleware such as Condor or EC2.

Pegasus has the following 3 main components:
1. Pegasus Mapper: Produces an executable workflow from the abstract (user-provided) workflow by mapping it onto the grid resources; it performs the mapping only. Optimizing execution and managing data flow are also among its concerns.
2. Execution Engine (DAGMan): Uses the resources selected by the Mapper to execute the tasks in the workflow in the order required by their dependencies.
3. Task Manager (Condor Schedd): Manages the execution of individual tasks on the selected resources.

Pegasus allows workflows to be specified in a general XML format called DAX. It also uses an XML file, called the Site Catalog, which contains information about the available distributed resources. From the DAX and Site Catalog files, Pegasus generates an execution mapping of jobs onto the available resources.

1. Abstract Workflow (DAX):
The DAX file specifies the jobs in a workflow, the resources those jobs require and the dependencies those jobs have on one another. It is divided into 3 sections:
A. File Lists: Contains all input, output and executable files that are to be used at execution time.
B. Tasks: Specifies all the tasks, the resources they need and any profiles associated with them.
C. Dependencies: Lists all the dependencies between the jobs/tasks.

2. Catalogs:
Pegasus uses three kinds of catalogs:
A. Replica Catalog: Used as a lookup table for finding out where the input and output data is located.
B. Site Catalog: Used to keep track of the various sites across the grid.
C. Transformation Catalog: Used to store the location of all the executable resources on the grid.

Fig. 1 Pegasus Grid Structure

Fig. 2 Pegasus Workflow Chart

2. Sun Grid Engine:

Sun Grid Engine (SGE) is an open source project by Sun Microsystems (now Oracle Corp.). It is Distributed Resource Management (DRM) software: it aggregates the computing power of a large number of machines and puts it at the disposal of users for computation-intensive tasks. It provides the following 4 functions:
1. Load balancing, optimization and task scheduling on networked nodes.
2. Letting users queue and schedule tasks.
3. Controlling access to distributed resources centrally and ensuring that jobs execute according to priority.
4. Authorization and authentication.

SGE Components:

1. Hosts: There are the following 4 types:
A. Master Host: Runs the master daemon sge_qmaster and the scheduler daemon sge_schedd, which control all grid engine system components, such as queues and jobs. The daemons also maintain tables about the status of the components and about user access permissions.
B. Execution Hosts: Run the execution daemon sge_execd and have permission to execute jobs in the queue instances associated with them.
C. Administration Hosts: Carry out management activities on the grid system.
D. Submit Hosts: Allow users to submit and monitor batch jobs using QMON, the GUI.

2. Daemons: There are the following three types:
A. sge_qmaster (Master Daemon): Maintains tables about hosts, queues, jobs, system load and user permissions. It also coordinates with sge_schedd, receives scheduling decisions from it, and requests actions from sge_execd on the appropriate execution hosts.
B. sge_schedd (Scheduler Daemon): Maintains an up-to-date view of the cluster's status with the help of sge_qmaster. It makes the following scheduling decisions:
   a. Which jobs to schedule, when, and on which queues.
   b. How to prioritize and synchronize job execution.
C. sge_execd (Execution Daemon): Watches over the execution of jobs in its queues and reports the system load and current statistics to the master daemon.

3. Queues: A queue is an abstraction (container) for a class of jobs that are allowed to run on one or more hosts concurrently. It has job attributes associated with it, for example, whether a job can be migrated. The grid engine software automatically dispatches each job to a suitable queue on a suitable host with a light execution load.

Fig. 3 SGE

3. Architecture Recovery Tool:

Our group had difficulty selecting a tool for static analysis of the grid technologies we were given. Since one of them, Sun Grid Engine, uses a combination of C++ and Java, the tool needed solid support for both languages. Furthermore, it needed to be able to reverse engineer the architecture from source in both languages and import the results into the same diagram. The tool also needed to support diagrams with many classes without crashing itself or our systems.

We tried a variety of tools and in the end selected “SPARX Enterprise Architect”. One of the most important features behind this choice was “CodeEngineering”, which includes the ability to reverse engineer the code and extract a class/component model from it. It creates a UML diagram after examining the source code and libraries and keeps it synchronized with any change in the source. It also allows us to view the source and the diagram by “directory” or by “namespace/package”. Moreover, saving the diagram as an image file is easy.

4. Recovery Process:

1. For Pegasus:

Static analysis of Pegasus was performed using Sparx Enterprise Architect. Enterprise Architect parsed the 140,000 lines of code in Pegasus and recovered slightly over 700 classes. It generated a class diagram for every package in the Java software. These package-level diagrams were combined into one top-level diagram so that the software could be visualized as a whole.

A. Recovery and Rationale:

Navigating the top-level diagram of Pegasus was too cumbersome in Enterprise Architect, so the FOCUS rules were first applied to classes in the lowest-level packages of the software. To perform FOCUS on the lower-level packages, related external classes were brought into each diagram to ensure that relationships were not destroyed when combining classes. Once simplified, the lowest-level package diagrams were combined into diagrams at the level of their parent packages, and FOCUS was applied to these diagrams in turn. This combining operation was repeated until the highest-level packages were reached.

We had to make several decisions as to which pieces of the software should be combined into components and which should remain separate. To keep these decisions consistent, we followed a few principles:

Exception classes were removed from the diagrams entirely, as long as the class throwing the exception and the class receiving it had an association.

If two classes implemented the same interface but did not have any associations in common, they remained separate, and the interface that the two implemented was removed from the diagrams. We did this because Pegasus had a few interfaces that had no functionality and existed only for grouping.

Most components had associations with the Logging, PegasusProperties and PlannerOptions classes. In the end, we decided to remove all of these associations from the final diagrams to simplify the end product.
PegasusProperties and PlannerOptions are classes that consist of many static configuration variables and, as such, do not represent components. Logging is a component, but associations with it were removed to make the end product less complicated.

B. Components:

The recovered components of Pegasus can be seen in the /analysis/final/Pegasus/ folder. These components were then “shoehorned” into the five-layer grid architecture; the result is in the /analysis/shoehorn/Pegasus folder. The two layers that ended up with most of the components are the Application and Collective layers.

The Collective layer contains items that provide data replication, workload management, grid-enabled programming systems and directory services. These capabilities were present in components of our system such as SiteSelector, SiteCatalog, ReplicaCatalog and GridStart.

The Application layer consists of user applications and support for running applications on the grid. A large portion of Pegasus is concerned with representing applications that run on other grid infrastructures. We therefore placed the portions of Pegasus that support running other applications in the Application layer.

C. Connectors:

Pegasus contains many different connectors. Two of the most prevalent are remote procedure calls and streams.

Pegasus prepares submission files to execute jobs on other grid systems. The GridStart component invokes these grid systems through remote calls so that the jobs execute remotely.

Pegasus also relies heavily on stream-based connectors. In order to perform its planning, Pegasus reads in many configuration files. These can be streamed in from files or, in some cases, from other sources such as the network or a database.

D. Style Identification:

One of the prominent architectural styles in Pegasus is Pipe and Filter.
Pegasus relies heavily on parsing files that describe both the workflows to be executed and the resources available for their execution. Components such as DAX, SiteCatalog and Visualization parse input files or streams from other applications to retrieve the information they need. Pegasus thus depends on other programs or user-generated files to produce a stream of information about jobs and resources. The files or streams passed in are the pipes, and the Pegasus components they connect to act as filters.

Another architectural style present in Pegasus is Client-Server. Pegasus generates workflows and submits them for execution to other grid technologies such as Condor, using a remote procedure call over the network to execute each job. In this way Condor, the server, accepts requests from Pegasus, the client.

E. Identified Discrepancies:

After “shoehorning” Pegasus into the 5-layer grid architecture, it was clear that Pegasus departs from an ideal layered architecture in many places. There are at least 16 instances of a call crossing two layer boundaries. There are also 11 instances of a lower-layer component calling an upper-layer component, resulting in an up-call.

2. For Sun Grid Engine:

5. Difficulties faced:

5.1 Selection of right tool:

We tried many tools, including IBM Rational Rhapsody, IBM Rational Rose, SciTools Understand, PragSoft UML Studio, ArgoUML, Softwarenaut, Doxygen and IntensiVE. Of these, Doxygen and Understand seemed the most promising but were still too limited for what we wanted to achieve. None of these tools was a good fit for our group and grid technologies. The following is a list of some of the problems we encountered and the products we encountered them with.
1. Unable to install the program (IntensiVE). It prompted us to install “Smalltalk” first and then repeatedly gave “Peer/Network” error messages while trying to access an online repository.
2. Required version not available for trial/demo (Rational Rhapsody). There was also a license issue, which we discussed with IBM Support without getting any useful help.
3. The program crashed because it was unable to handle the large number of classes in the grid technology (Understand).
4. Inability to understand all relationships present in the code (Understand).
5. Limited support for one of the two languages (Rational Rose Java, ArgoUML).
6. Inability to view the architecture as a whole with all classes on one diagram (UML Studio).
7. Inability to parse some source files (Rational Rose Java, ArgoUML).
8. Insufficient documentation on how to use the program for static analysis (Softwarenaut, IntensiVE). This was one of the most frustrating problems, as all but a few of the tools had very little documentation.

5.2 The complexity of the diagram:

It was difficult to view the diagram on the screen as a whole and work with it. Saving it as an image file was also a problem, because the quality was reduced to the point where the component descriptions became unreadable. The sheer complexity of the diagram generated from the source code, which included a dozen packages and scores of classes and functions, made it even harder to work with.

5.3 Issues with Pegasus:

There were many difficulties in recovering the architecture of Pegasus. Though Enterprise Architect allowed us to reverse engineer the large code base easily, it had a few shortcomings that hampered our ability to recover the architecture properly.

The tool failed to identify many UML relationships properly. It generated a plethora of dependency relationships and did not identify any relationship as aggregation or composition, so the code had to be analyzed in Eclipse to determine what the actual relationship was. In some cases, a class had an internal class that it creates only for its own use.
This should be a classic composition relationship, but the tool labeled it as an association. The mislabeling could only be corrected by reading the source code manually and trying to determine the appropriate relationship. Because of this weakness in the tool, it is likely that we misinterpreted some relationships while performing our analysis.

Our group also had problems getting Enterprise Architect to render all of the diagrams appropriately. When more than one relationship existed between 2 classes, the tool drew the relationship lines separately even if they were the same kind of relationship. This added extra lines to the recovered UML diagrams, which made visual analysis of the diagrams more difficult.

During reverse engineering, the recovery software generated a class diagram for every package. To get a holistic view of the software, all of the diagrams were combined by copying elements and pasting them into a top-level diagram. When multiple elements were pasted into a diagram at once, the software did not place all of them reliably, so the top-level diagram ended up missing some classes. The incomplete diagrams also made visual analysis more difficult.

Unfortunately, we discovered the problems with Enterprise Architect only as we did the recovery. Some of them were not observed until we had invested a large amount of time applying FOCUS. Had we known about the tool's abilities and limitations sooner, it might have affected our decision to use Enterprise Architect, and we might have used a different tool entirely.

The software structure of Pegasus itself was another source of difficulty in recovering the architecture. The software contained many elements that shared the same name but had different functionality. There were many elements named just “Abstract,” so their true functionality could only be discovered in context.
Another source of confusion was the use of interfaces within the software. A few interfaces were created simply to group a set of elements. These interfaces provided no methods and, according to the code comments, exist for grouping purposes only. Several of the realizations of these interfaces seemed to have nothing in common, which led us not to combine them into one element.

Although the structure and naming of the Pegasus software caused some difficulty, the source code itself actually helped our recovery process. Roughly two-thirds of the source code was documented with JavaDoc-style comments. Since limitations in the analysis tool often required us to read the source code to determine relationships, the comments allowed for quicker analysis of the code. The source code was also already structured as an Eclipse project; opening it in Eclipse made determining relationships easier because the class hierarchy and the callers of each method could be looked up directly.

5.4 Issues with Sun Grid Engine:

The first big problem was that Enterprise Architect (EA) does not support importing the whole source code at once if the source is written in more than one programming language. For Java, it generates separate diagrams per package, while for C++ it generates them per namespace. Both kinds of diagrams could alternatively be generated per class or per directory.

EA did not detect duplicated classes, and many stand-alone classes were left dangling in the intermediate diagram.

Some classes have no comments or descriptions, and others have very vague names.

Many relationships were missing, especially those from C++ classes to Java classes, so the code and documentation were checked and analyzed repeatedly to identify the missing relationships.

Many relationships were identified incorrectly, i.e. they should have been generalizations but were shown as associations.
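
The duplicated and dangling classes mentioned above were not flagged by the tool. The following is a minimal sketch of how such a check could be automated, assuming the recovered model were available as a list of qualified class names and a list of (source, target, kind) relationships. That export format, the function name and every class name in the example are hypothetical; they are not part of Enterprise Architect or of Sun Grid Engine.

```python
from collections import defaultdict


def find_duplicates_and_dangling(classes, relationships):
    """Report classes sharing a simple name and classes with no relationships.

    classes: list of qualified class names (Java 'a.b.C' or C++ 'a::b::C').
    relationships: list of (source, target, kind) tuples over those names.
    Both inputs are an assumed export format, not a real tool's output.
    """
    # Group qualified names by their simple (unqualified) class name.
    by_simple_name = defaultdict(list)
    for qualified in classes:
        simple = qualified.replace("::", ".").rsplit(".", 1)[-1]
        by_simple_name[simple].append(qualified)

    # A simple name that maps to more than one qualified name is a candidate duplicate.
    duplicates = {
        simple: sorted(names)
        for simple, names in by_simple_name.items()
        if len(names) > 1
    }

    # A class that appears in no relationship at all is left dangling.
    connected = set()
    for source, target, _kind in relationships:
        connected.add(source)
        connected.add(target)
    dangling = sorted(name for name in classes if name not in connected)

    return duplicates, dangling


# Hypothetical example data, for illustration only.
example_classes = [
    "com.example.grid.JobQueue",
    "com.example.grid.Scheduler",
    "sge::JobQueue",
    "sge::Orphan",
]
example_relationships = [
    ("com.example.grid.Scheduler", "com.example.grid.JobQueue", "association"),
]

duplicates, dangling = find_duplicates_and_dangling(
    example_classes, example_relationships
)
print("Candidate duplicates:", duplicates)
print("Dangling classes:", dangling)
```

A check of this kind only narrows down the candidates. Two classes with the same simple name in different languages may be intentional counterparts, and a dangling class may simply be one whose relationships the tool failed to recover, so each hit would still need to be confirmed against the source code.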