CS 578 Software Architecture Project Report

Grid Architecture Recovery for Pegasus & Sun Grid Engine
Group No. 15
Mark Shehata
Carl Hanks
Mayank Singh
Table of Contents:
1. Introduction
2. Grid technologies
1. Pegasus
A. DAX
B. Catalogs
2. Sun Grid Engine
A. Hosts
B. Daemons
C. Queues
3. Architecture recovery Tool
4. Recovery Process
1. For Pegasus
A. Recovery and Rationale
B. Components
C. Connectors
D. Style identification
E. Identifying discrepancies
2. For Sun Grid Engine
A. Recovery and Rationale
B. Components
C. Connectors
D. Style identification
E. Identifying discrepancies
5. Difficulties faced
1. Selection of right tool
2. Complexity of diagram
3. Issues with Pegasus
4. Issues with Sun Grid Engine
6. Conclusions
7. References
1. Introduction:
“Architectural recovery is a process frequently used to cope with architectural erosion
whereby the current, “as implemented” architecture of a software system is extracted
from the system’s implementation.”[1]
The project involves studying the assigned Grid technologies, creating a
class/component-level diagram from their source code using one of the many
architecture recovery tools available, and then shoe-horning the resulting architecture
(after applying FOCUS [2]) into the given 5-layer Grid architecture [3] to see where the
recovered architecture deviates from it. Since the “as proposed” and “as implemented”
architectures always differ, mismatches between the two are expected, and finding them
is the whole point of this exercise. The way in which one chooses to apply the FOCUS
technique also affects the resulting architecture.
The recovery involves the following steps:
1. Recovery of major software components.
2. Recovery of major software connectors.
3. Recovery of two major architectural styles used in each Grid technology.
4. Shoe-horning of the recovered architecture into the 5-layer standard Grid
architecture.
5. Discrepancy identification, which includes finding up-calls, calls that cross two
layer boundaries, unspecified layer dependencies and other discrepancies.
2. Grid Technologies:
1. Pegasus (Planning for Execution in Grids):
Pegasus is a scientific workflow system that lets users design and plan complex
workflows, maps them onto grid resources efficiently and automatically, and then
executes and manages them. It maps the abstract, user-defined workflow onto the
underlying distributed resources (data, computation or network) without requiring the
user to deal with the low-level specifications required by middleware such as Condor
and EC2.
Pegasus has the following three main components:
1. Pegasus Mapper – Produces an executable workflow from the abstract (user-provided)
workflow by mapping it onto the grid resources; it performs the mapping only.
Optimizing execution and managing data flow are also among its concerns.
2. Execution Engine (DAGMan) – Uses the resources selected by the Mapper to execute
the tasks in the workflow in the order required by their dependencies.
3. Task Manager (Condor Schedd) – Manages the execution of individual tasks on the
selected resources.
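The dependency-ordered execution attributed to the Execution Engine can be illustrated
with a small sketch. This is not DAGMan's implementation; the job names and the
`execution_order` helper are hypothetical, and the ordering relies on Python's standard
`graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def execution_order(dependencies):
    """Return one valid execution order for a workflow.

    `dependencies` maps each job name to the set of jobs it depends on
    (its parents), mirroring the parent/child edges of a workflow DAG.
    A cyclic workflow raises graphlib.CycleError.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical four-job "diamond" workflow.
workflow = {
    "preprocess": set(),
    "analyze_a": {"preprocess"},
    "analyze_b": {"preprocess"},
    "merge": {"analyze_a", "analyze_b"},
}
order = execution_order(workflow)
```

Any order returned this way starts with `preprocess` and ends with `merge`; the relative
order of the two independent analysis jobs is not fixed.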
Pegasus allows workflows to be specified in a general XML format called DAX. It also
uses an XML file called the Site Catalog, which contains information about the
available distributed resources. From the DAX and Site Catalog files, Pegasus
generates an execution mapping of jobs onto the available resources.
1. Abstract Workflow (DAX): The DAX file specifies the jobs in a workflow, the
resources those jobs require and the dependencies among those jobs. It is divided
into three sections –
A. File Lists: Contains all input/output or executable files that are to be used at
execution time.
B. Tasks: Specifies all the tasks, the resources needed for them and any profiles
associated with the tasks.
C. Dependencies: Lists all the dependencies between the jobs/tasks.
2. Catalogs: The three kinds of catalogs used by Pegasus are –
A. Replica Catalog: Used as a lookup table for finding out where the input and
output data is located.
B. Site Catalog: Used to keep track of various sites across the grid.
C. Transformation Catalog: Used to store the location of all the executable
resources on the grid.
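How the three DAX sections and the three catalogs fit together can be sketched in a few
lines. This is a simplified illustration, not the Pegasus planner or the real DAX
schema: the `Job`, `AbstractWorkflow` and `plan` names, the dictionary layouts, the site
names and the file locations are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    transformation: str                           # logical name of the executable
    inputs: list = field(default_factory=list)    # logical file names
    outputs: list = field(default_factory=list)


@dataclass
class AbstractWorkflow:
    files: set          # section A: file list
    jobs: dict          # section B: tasks, keyed by job name
    dependencies: dict  # section C: job name -> set of parent job names


def plan(workflow, replica_catalog, transformation_catalog, site_catalog):
    """Map every job to a known site that has its executable installed."""
    mapping = {}
    for name, job in workflow.jobs.items():
        # Transformation Catalog: logical executable -> {site: path}
        installed = transformation_catalog[job.transformation]
        # Site Catalog: only sites known to the grid are candidates.
        candidates = sorted(site for site in installed if site in site_catalog)
        if not candidates:
            raise LookupError(f"no site can run job {name!r}")
        site = candidates[0]  # naive choice; a real planner would optimize
        mapping[name] = {
            "site": site,
            "executable": installed[site],
            # Replica Catalog: logical file name -> physical location, if known
            "inputs": {f: replica_catalog.get(f) for f in job.inputs},
        }
    return mapping


wf = AbstractWorkflow(
    files={"raw.dat", "clean.dat"},
    jobs={"clean": Job("clean", "cleaner", ["raw.dat"], ["clean.dat"])},
    dependencies={"clean": set()},
)
result = plan(
    wf,
    replica_catalog={"raw.dat": "gsiftp://siteA/data/raw.dat"},
    transformation_catalog={
        "cleaner": {"siteA": "/opt/bin/cleaner", "siteX": "/bin/cleaner"}
    },
    site_catalog={"siteA": {"slots": 8}},
)
```

Here `siteX` has the executable but is not listed in the site catalog, so the job is
mapped to `siteA`.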
Fig. 1 Pegasus Grid Structure
Fig. 2 Pegasus Workflow Chart
2. Sun Grid Engine:
Sun Grid Engine (SGE) is an open source project by Sun Microsystems (now Oracle
Corp.). It is DRM (Distributed Resource Management) software, meaning that it
aggregates the computing power of a large number of machines and puts it at the
disposal of users for computation-intensive tasks. It provides the following four
functionalities –
1. Load balancing, optimization and task scheduling on networked nodes.
2. Letting users queue and schedule tasks.
3. Controlling access to distributed resources centrally and ensuring execution of
jobs according to priority.
4. Authorization and authentication.
SGE Components:
1. Hosts: There are the following four types –
A. Master Host – Runs the master daemon sge_qmaster and the scheduler daemon
sge_schedd, which control all grid engine system components, such as queues and
jobs. The daemons also maintain tables about the status of the components and
about user access permissions.
B. Execution Hosts – Run the execution daemon sge_execd and have permission to
execute jobs in the queue instances associated with them.
C. Administration Hosts – Carry out management activities on the grid system.
D. Submit Hosts – Allow users to submit and monitor batch jobs using QMON, the
GUI.
2. Daemons: There are the following three types –
A. sge_qmaster (Master Daemon) – Maintains tables about hosts, queues, jobs,
system load, and user permissions. It also receives scheduling decisions from
sge_schedd and requests actions from sge_execd on the appropriate execution
hosts.
B. sge_schedd (Scheduler Daemon) – Maintains an up-to-date view of the cluster's
status with the help of sge_qmaster. It makes the following scheduling decisions:
a. Which jobs to dispatch, when, and to which queues.
b. How to prioritize and synchronize job execution.
C. sge_execd (Execution Daemon) – Watches over the execution of jobs in its
queues and reports the system load and current statistics to the master daemon.
3. Queues: A queue is an abstraction (container) for a class of jobs that are allowed to
run on one or more hosts concurrently. A queue has job attributes associated with it,
for example, whether a job can be migrated. The grid engine software automatically
dispatches each job to a suitable queue and a suitable host with a light execution
load.
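The idea of dispatching a job to a suitable queue and a lightly loaded host can be
sketched as follows. This is not the SGE scheduler; the `dispatch` function, the queue
dictionaries, the `runtime_min` attribute and the load values are hypothetical and only
illustrate the selection rule described above.

```python
def dispatch(job, queues, host_load):
    """Return (queue name, host) for a job, or None if no queue accepts it.

    A queue is "suitable" if its `accepts` predicate returns True for the job;
    among the hosts of all suitable queues, the least loaded one is chosen.
    """
    best = None
    for queue in queues:
        if not queue["accepts"](job):
            continue
        for host in queue["hosts"]:
            candidate = (host_load[host], queue["name"], host)
            if best is None or candidate < best:
                best = candidate
    return None if best is None else (best[1], best[2])


queues = [
    {"name": "short.q", "hosts": ["node1", "node2"],
     "accepts": lambda job: job["runtime_min"] <= 60},
    {"name": "long.q", "hosts": ["node3"],
     "accepts": lambda job: True},
]
host_load = {"node1": 0.9, "node2": 0.2, "node3": 0.5}

short_choice = dispatch({"runtime_min": 30}, queues, host_load)
long_choice = dispatch({"runtime_min": 120}, queues, host_load)
```

The short job lands on `node2` in `short.q`, the least loaded host overall, while the
long job is only accepted by `long.q` and therefore runs on `node3`.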
Fig. 3 SGE
3. Architecture Recovery Tool:
Our group had difficulty selecting a tool for static analysis of the grid technologies
that we were given. Since one of our systems, Sun Grid Engine, uses a combination of
C++ and Java, the tool needed solid support for both languages. Furthermore, it needed
to reverse engineer the architecture from source for both languages and import the
results into the same diagram. The tool also needed to handle diagrams with many
classes without crashing itself or our systems. We tried a variety of tools and in the
end selected Sparx “Enterprise Architect”.
One of the most important qualities that led us to this choice was its “Code
Engineering” feature, which includes the ability to reverse engineer code and extract a
class/component model from it. It creates a UML diagram after examining the source
code and libraries and keeps it synchronized with changes in the source. It also lets us
view the source and the diagram by “directory” or by “namespace/package”.
Moreover, saving a diagram as an image file is easy.
4. Recovery Process:
1. For Pegasus:
Static analysis of Pegasus was performed using Sparx Enterprise Architect.
Enterprise Architect parsed the 140,000 lines of code in Pegasus and recovered
slightly over 700 classes. It generated a class diagram for every package in the Java
software. These package-level diagrams were combined into one top-level diagram so
that the software could be visualized as a whole.
A. Recovery and Rationale: Navigating the top-level diagram of Pegasus in Enterprise
Architect was too cumbersome, so the FOCUS rules were first applied to classes in the
lowest-level packages of the software. To perform FOCUS on the lower-level packages,
related external classes were brought into the diagram to ensure that relationships
were not destroyed when combining classes. Once simplified, the lowest-level package
diagrams were combined into diagrams at the level of their parent packages, and
FOCUS was applied to these diagrams in turn. This combining operation was repeated
until the highest-level packages were reached.
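The bottom-up procedure described above can be sketched as a recursive walk over the
package tree. This is only an illustration: the tree layout, the class names and the
`merge_impl_pairs` rule are hypothetical, and `simplify` merely stands in for whatever
simplification is applied to a single diagram; it is not the FOCUS rule set itself.

```python
def recover_bottom_up(package, simplify):
    """Simplify a package tree from the leaves upward.

    `package` is {"classes": set of names, "children": [sub-packages]}.
    `simplify` takes the set of elements in one diagram and returns the
    reduced set.
    """
    elements = set(package["classes"])
    for child in package.get("children", []):
        # Child diagrams are simplified first, then folded into the parent.
        elements |= recover_bottom_up(child, simplify)
    return simplify(elements)


def merge_impl_pairs(elements):
    """Hypothetical stand-in rule: fold 'FooImpl' into 'Foo'."""
    return {name[:-4] if name.endswith("Impl") else name for name in elements}


tree = {
    "classes": {"Planner"},
    "children": [
        {"classes": {"Parser", "ParserImpl"}, "children": []},
        {"classes": {"Selector", "SelectorImpl"}, "children": []},
    ],
}
components = recover_bottom_up(tree, merge_impl_pairs)
```

Each sub-package is reduced before its elements reach the parent diagram, so the
top-level diagram never has to hold the full, unsimplified set of classes.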
We had to make several decisions as to which pieces of the software should be
combined into components and which should remain separate. To make these decisions
consistently, we followed a few principles:
1. Exception classes were removed from the diagrams entirely, as long as the
classes throwing and receiving the exception had an association.
2. If two classes were implementations of the same interface but did not have
any associations in common, they remained separate. The interface that
the two implemented was removed from the diagrams. We did this
because Pegasus had a few interfaces that had no functionality and existed
only for grouping.
Most components were associated with the Logging, PegasusProperties and
PlannerOptions classes. In the end, we decided to remove all these associations
from the final diagrams to simplify them. PegasusProperties and PlannerOptions are
classes that consist of many static configuration variables and, as such, do not
represent components. Logging is a component, but associations with it were
removed to make the final diagrams less complicated.
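The effect of these pruning decisions on a class graph can be sketched as follows. The
`prune` function and its data layout are assumptions made for illustration, and the
class names `Mapper`, `Parser` and `ParseException` are hypothetical; only Logging,
PegasusProperties and PlannerOptions are names taken from the recovered system.

```python
CROSS_CUTTING = {"Logging", "PegasusProperties", "PlannerOptions"}


def prune(classes, associations, exceptions):
    """Apply the pruning rules to a class graph.

    classes:      set of class names
    associations: set of (source, target) pairs
    exceptions:   dict mapping an exception class to a list of
                  (thrower, receiver) pairs
    """
    def linked(a, b):
        return (a, b) in associations or (b, a) in associations

    # An exception class is dropped only if every thrower/receiver pair
    # is already connected by an association.
    removable = {
        exc for exc, uses in exceptions.items()
        if all(linked(thrower, receiver) for thrower, receiver in uses)
    }
    kept_classes = classes - removable
    # Associations touching a cross-cutting class are dropped, but the
    # class itself (for example Logging) stays in the diagram.
    kept_associations = {
        (a, b) for a, b in associations
        if a in kept_classes and b in kept_classes
        and a not in CROSS_CUTTING and b not in CROSS_CUTTING
    }
    return kept_classes, kept_associations


kept_classes, kept_associations = prune(
    classes={"Mapper", "Parser", "ParseException", "Logging"},
    associations={
        ("Mapper", "Parser"),
        ("Mapper", "Logging"),
        ("Parser", "ParseException"),
        ("Mapper", "ParseException"),
    },
    exceptions={"ParseException": [("Parser", "Mapper")]},
)
```

In this example only the `Mapper`/`Parser` association survives, while `Logging` remains
as an unconnected component.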
B. Components:
The recovered components of Pegasus can be seen in the
/analysis/final/Pegasus/ folder. These components were then “shoehorned” into
the five-layer grid architecture, which is visible in the /analysis/shoehorn/Pegasus
folder. The two layers that ended up with most of the components are the
Application and Collective layers. The Collective layer contains items that provide
data replication, workload management, grid-enabled programming systems, and
directory services. These capabilities were present in components of our system
such as SiteSelector, SiteCatalog, ReplicaCatalog, and GridStart. The Application
layer consists of user applications and support for running applications on the grid.
A large portion of Pegasus is concerned with representing and running applications
on other grid infrastructures. Thus, we placed the portions of Pegasus that support
running other applications in the Application layer.
C. Connectors:
Pegasus contains a multitude of different connectors. Two of the most prevalent
are remote procedure calls and streams. Pegasus prepares submission files to
execute jobs on other grid systems. The GridStart component has other grid systems
execute the jobs remotely by making a remote call to the other grid technology.
Pegasus also relies heavily on stream-based connectors. In order to perform all of
its planning, Pegasus reads in many configuration files. These can be streamed in
from files or, in some cases, from other sources such as the network or a database.
D. Style Identification:
One of the prominent architectural styles in Pegasus is Pipe and Filter. Pegasus
relies heavily on parsing files that describe both the workflows to be executed and
the resources available for their execution. Components such as the DAX,
SiteCatalog and Visualization parse input files or streams from other applications
to retrieve the information they need. Thus Pegasus relies on other programs or
user-generated files to produce a stream of information about jobs and resources.
The files or streams passed in are the pipes, which connect to Pegasus
components that act as filters.
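A minimal pipe-and-filter sketch using Python generators makes the style concrete. The
two-column "site cpus" text format, the site names and both filter functions are
hypothetical; they are not the Site Catalog format or any Pegasus component.

```python
def read_records(lines):
    """Filter 1: turn raw 'site cpus' text lines into (site, cpus) tuples."""
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            site, cpus = line.split()
            yield site, int(cpus)


def only_usable(records, min_cpus):
    """Filter 2: pass through only the sites with enough CPUs."""
    for site, cpus in records:
        if cpus >= min_cpus:
            yield site


# The list stands in for a file or network stream (the pipe).
raw = ["# site cpus", "siteA 16", "siteB 2", "", "siteC 8"]
usable = list(only_usable(read_records(raw), min_cpus=4))
```

Each filter consumes one stream and produces another without knowing where its input
comes from, which is the property the style depends on.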
Another architectural style present in Pegasus is the Client-Server style. Pegasus
generates workflows and submits them for execution to other grid technologies such
as Condor, executing a remote procedure call over the network to run each job. In
this way, Condor, the server, accepts requests from the client, Pegasus.
E. Identified Discrepancies:
After “shoehorning” Pegasus into the 5-layered grid architecture, it was clearly
visible that Pegasus deviates in many places from an ideal layered architecture.
There are at least 16 instances of a call crossing two layer boundaries. There are
also 11 instances of lower-layer components calling upper-layer components,
resulting in up-calls.
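The two kinds of discrepancy counted above can be detected mechanically once every
component has been assigned to a layer. The sketch below assumes the five layers are
ordered Fabric, Connectivity, Resource, Collective, Application from bottom to top; the
`classify_calls` function, the call list and all component names other than
SiteSelector are hypothetical.

```python
LAYERS = ["Fabric", "Connectivity", "Resource", "Collective", "Application"]
LEVEL = {name: index for index, name in enumerate(LAYERS)}


def classify_calls(layer_of, calls):
    """Split calls into up-calls and calls that cross two or more boundaries.

    layer_of: dict mapping a component name to its layer name
    calls:    iterable of (caller, callee) pairs
    A single call can appear in both result lists.
    """
    up_calls, layer_skips = [], []
    for caller, callee in calls:
        delta = LEVEL[layer_of[callee]] - LEVEL[layer_of[caller]]
        if delta > 0:          # callee sits above the caller
            up_calls.append((caller, callee))
        if abs(delta) >= 2:    # at least one layer is bypassed
            layer_skips.append((caller, callee))
    return up_calls, layer_skips


layer_of = {
    "PlannerUI": "Application",
    "SiteSelector": "Collective",
    "Transfer": "Resource",
    "Socket": "Connectivity",
}
calls = [
    ("PlannerUI", "SiteSelector"),  # adjacent layers, downward: allowed
    ("PlannerUI", "Transfer"),      # Application -> Resource: skips Collective
    ("Transfer", "SiteSelector"),   # Resource -> Collective: up-call
    ("Socket", "PlannerUI"),        # up-call that also skips layers
]
up_calls, layer_skips = classify_calls(layer_of, calls)
```

Counting a call in both lists when it is an up-call that also bypasses layers is a
design choice of this sketch; the counts reported above may have been tallied
differently.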
2. For Sun Grid Engine :
5. Difficulties faced:
5.1 Selection of the right tool:
We tried many tools, including IBM Rational Rhapsody, IBM Rational Rose, SciTools
Understand, PragSoft UML Studio, ArgoUML, Softwarenaut, Doxygen and IntensiVE. Of
these, Doxygen and Understand seemed the most promising, but they were still limited
with respect to what we wanted to achieve with them.
None of these tools seemed to be a good fit for our group and grid technology. The
following is a list of some of the problems we encountered and the products we
encountered them with.
1. Unable to install the program (IntensiVE). It prompted us to install “Smalltalk”
first and then repeatedly gave “Peer/Network” error messages while trying to
access some online repository.
2. Required version not available for trial/demo (Rational Rhapsody). There was
also a license issue, about which we talked to IBM Support without getting any
useful help.
3. The program crashed because it was unable to handle the large number of classes
in the grid technology (Understand).
4. Inability to understand all relationships present in the code (Understand).
5. Limited support for one of the two languages (Rational Rose Java, ArgoUML).
6. Inability to view the architecture as a whole with all classes on one diagram
(UML Studio).
7. Inability to parse some source files (Rational Rose Java, ArgoUML).
8. Insufficient documentation on how to use the program for static analysis
(Softwarenaut, IntensiVE). This was one of the most confounding situations, as
documentation was scarce for all but a few of the tools.
5.2 The complexity of the diagram:
It was difficult to view the diagram on the screen as a whole and to work with it.
Saving it as an image file was also troublesome, as the quality was reduced to such
an extent that the components' descriptions became unreadable. The complexity of
the diagram generated from the source code, which included a dozen packages and
scores of classes and functions, made working with it even harder.
5.3 Issues with Pegasus:
There were many difficulties in recovering the architecture of Pegasus. Though
Enterprise Architect allowed us to reverse engineer the large code base easily, it
had a few shortcomings that hampered our ability to properly recover the
architecture. The tool failed to identify many relationships in UML properly. It
generated a plethora of dependency relationships and did not identify any
relationships as aggregation or composition, so the code was analyzed in Eclipse
in order to determine what the actual relationships were. In some cases, a class
had an internal class that it creates only for its own use. This should be a classic
composition relationship, but the tool labeled it as an association. The
mislabeling could only be corrected by manually looking at the source code and
trying to determine the appropriate relationship. Because of this limitation of the
tool, it is likely that we misinterpreted some relationships while performing our
analysis.
Our group also had problems getting Enterprise Architect to render all of the
diagrams appropriately. When more than one relationship existed between two
classes, the tool would draw the relationship lines separately even if they were
the same kind of relationship. This added extra relationship lines to the
recovered UML diagrams, which made visually analyzing the diagrams more
difficult. When performing the reverse engineering, the recovery software
generated a class diagram for every package. In order to get a holistic view of
the software, all of the diagrams were combined by copying elements and pasting
them into a top-level diagram. If multiple elements were pasted into a diagram,
the software would not place them all in the diagram reliably, resulting in a
top-level diagram that did not contain all of the classes. The incomplete
diagrams also made visual analysis more difficult. Unfortunately, we discovered
the problems with Enterprise Architect only as we did the recovery. Some of them
were not observed until we had invested a large amount of time applying FOCUS.
Had we known about the tool's capabilities and limitations sooner, it might have
affected our decision to use Enterprise Architect, and we might have used a
different tool entirely.
The software structure of Pegasus itself was another source of difficulty in
recovering the architecture. The software contained many elements that shared
the same name but had different functionality. There were many elements simply
named “Abstract,” so their true functionality could only be discovered in context.
Another source of confusion was the use of interfaces within the software. A
few interfaces were created simply to group a number of elements. These
interfaces provided no methods, and according to the code comments they exist
for grouping purposes. Several of the realizations of these interfaces seemed to
have nothing in common, which led us not to combine them into one element.
Although the structure and naming of the Pegasus software caused some
difficulty, the source code itself actually helped our recovery process. Roughly
two-thirds of the source code was documented with JavaDoc-style comments. Since
limitations of the software analysis tool often required us to look at the source
code to determine relationships, the comments allowed for quicker analysis of the
code. The source code was also already structured as an Eclipse project; opening
the project in Eclipse made determining relationships in the code easier because
it allowed us to find class hierarchies and method calls easily.
5.4 Issues with Sun Grid Engine:
The first big problem was that Enterprise Architect does not support importing the
whole source code at once if the source is written in more than one programming
language. For Java, it generates separate diagrams per package, while for C++ it
generates them per namespace. However, both sets of diagrams could also be
generated per class or per directory.
EA could not detect duplicated classes, and many stand-alone classes were left
dangling in the intermediate diagram. Some classes have no comments or
descriptions, and others have very vague names. Many relationships were missing,
especially those from C++ classes to Java classes, so the code and documentation
were checked and analyzed repeatedly to identify missing relationships. Many
relationships were also identified incorrectly, i.e. they should have been
generalizations but were shown as associations.