Dale Allen Fletter
B.A., Illinois Institute of Technology, 1990
Submitted in partial satisfaction of
the requirements for the degree of
© 2010
Dale Allen Fletter
A Project
Dale Allen Fletter
Approved by:
__________________________________, Committee Chair
Cui Zhang, Ph.D.
__________________________________, Second Reader
Robert Buckley, M.S.
Student: Dale Allen Fletter
I certify that this student has met the requirements for format contained in the University format
manual, and that this project is suitable for shelving in the Library and credit is to be awarded for
the project.
__________________________, Graduate Coordinator
Nikrouz Faroughi, Ph.D.
Department of Computer Science
Dale Allen Fletter
Production software systems often deviate from their intended architectures and sometimes completely lack a comprehensive, well-designed and documented architecture. Effective design methods exist to re-engineer an existing system for new or omitted functional and quality requirements
using the architecture. But without knowledge of the system's architecture, it is necessary to first
perform a software system architecture recovery to understand the as-built architecture before
applying these design methods. Most research in this area focuses on the creation of tools for use
in semi-automated architecture recovery but offers little to practitioners. This MS project developed a manual methodology that focuses on the specific activities of architecture recovery,
aiming at maximum efficiency. A method, called the Software Architecture Recovery
Method (SARM), is presented, accompanied by a case study of that methodology on a medium-sized (nearly 300 kLOC) website accessibility assessment tool that employs a heterogeneous code
base (OO Python, OTS components, Java).
I have made this letter rather long only because I have not had time to make it shorter.
Blaise Pascal (1623-1662)
Clarity is our only defense against the embarrassment felt on completion of a large project when it
is discovered that the wrong problem has been solved.
C.A.R. Hoare.
This project had its origins in the modest intent to create a database of web accessibility metrics
as part of a course on software metrics. It was in the literature research that I discovered the European Internet Accessibility Observatory (EIAO). Since their work was far more extensive than
my modest intent, there was no point in creating an inferior competitor. My next thought was to
port it to a California Internet Accessibility Observatory (CIAO). While the resulting information
may have been interesting, it did not serve my need to complete a project sufficient for my degree. Yet I did not want to abandon a piece of work that I admired and saw as an important contribution to internet accessibility.
My requirement for my graduation was to complete a project. The California Code of Regulations: Title 5 Education, Section 40510 defines a project as:
“…a significant undertaking appropriate to the fine and applied arts or to professional fields. It evidences originality and independent thinking, appropriate form and organization, and rationale. It
is described and summarized in a written abstract that includes the project’s significance, objectives, methodology, and a conclusion or recommendation.”
On the recommendation of my advisor I reviewed the system for its use in a demonstration of the
techniques of Carnegie Mellon’s Software Engineering Institute (SEI). Given the size and complexity of the EIAO system, even a minor enhancement to that system was a significant undertaking and worthy of a Master’s Project.
My original intent was to extend this work to make it applicable to the original intent of EIAO but
specifically for the needs of assessing California university web sites for web site accessibility.
No one is currently maintaining the EIAO implementation, and the primary architects and developers are now engaged with eGovMon, creating an opportunity to continue on this tack. But since
there were inadequacies in the EIAO system that needed to be resolved before eGovMon could
proceed, that team made significant improvements in the EIAO code base before beginning their
more ambitious project. I was advised by those researchers to use the eGovMon code base for
my purposes, and I began implementing a copy of the eGovMon code base in Spring 2010. The system was sufficiently complex, and insufficiently robust, that it could not be easily ported without considerable support from that team. With their help I was able to port it to my own platform but quickly
realized that extending it for the purpose of creating CIAO was a larger project than I wanted to
tackle, while also largely lacking the academic challenge that would justify a Master’s Project
(it consists of more than 236,000 lines of code). Even to do that I found it necessary to reverse engineer significant portions of the downloadable package in order to determine how to diagnose
errors that arose. In time understanding the underlying architecture became the focus of my work
and the subject of the case study.
In the literature the phrase “architecture reconstruction” is found more often than “architecture
recovery” although in context they mean the same thing. However for this project the phrase “architecture recovery” has been favored over “architecture reconstruction.” In his book, Software
Architecture in Practice [Bass 2003], Bass states “…every computing system with software has a
software architecture”. In keeping with this view, that every system has an architecture separate
from its representation, I believe the proper term for the activity which creates a representation of
this architecture is “recovery” and not “reconstruction,” since we are attempting to construct or reconstruct the representation, not the architecture. The architecture remains the same regardless
of the representation we may document. In deference to SEI, I have used “reconstruction” in reference to their work in this project.
Architecture recovery is not just architecture documentation. Documentation is the creation of a representation of the architecture when its structures are already known and understood in relation to the quality needs placed upon the designers and implementers. Architecture recovery
is the rediscovery of those structures, perhaps the original needs of the stakeholders, and the interrelationships between them. Since the architecture exists independently of its representation and
since every system has an architecture, this process of rediscovery makes it more challenging
than choosing the suitable representations. It requires the skills that were needed when the system
was designed and built since only someone who could have designed the system is capable of
understanding the choices faced by the designer and the tradeoffs that they made. To do this in
the absence of a written product beyond the source code is a severe challenge.
This project is primarily an elaboration of the methods of SEI for the reconstruction of the architectural views of a system under study with a case study based on that elaborated methodology.
The primary references in this project are two books published under the SEI Series in Software
Engineering: Software Architecture in Practice, 2nd ed [Bass 2003] and Documenting Software
Architectures: Views and Beyond [Clements 2003]. My intent was to create a case study of a significant but manageable system which had sufficient complexity to demonstrate the power of
these techniques in a practical setting.
To look at the finished products of these academic projects suggests that they primarily spring
forth from the minds of the authors. But to do one shows how interdependent the academic community is and how much of a contribution is made by people who receive minor credit or sometimes no credit at all. In my case any credit for this work must be shared with my advisor, Dr. Cui
Zhang, for without her encouragement and enthusiasm for the discipline of software engineering I
doubt I would have persevered. Robert Buckley also provided a great deal of inspiration during
my program and some perspiration in reviewing this manuscript. I must also provide a sincere
thank you to Morton Goodwin Olsen of the eGovMon project who was more than gracious in
helping me through the early stages of porting their system to my environment. Without his answers, this project would not have been possible. Of course no one with a spouse can ignore their
contributions of tolerance, support and encouragement through the many ups-and-downs of a project like this.
Preface
Acknowledgments
List of Figures
1. INTRODUCTION
2. BACKGROUND
3. LITERATURE REVIEW
5. THE EGOVMON CASE STUDY
6. CONCLUSIONS
Appendix A. A Partial Recovery of the eGovMon Software Architecture
Appendix B. Pre-Existing Architecture Documentation for EIAO
References
Figure 1 (Fig from O'Brien 2003-b, p2)
Figure 2 SARM Diagram
Figure 3 O'Brien's Element Types and Relations
Figure 4 Scope Diagram
Figure 5
Figure 6 The EIAO Architecture
Figure 7 Static relationships for Crawler
Chapter 1

INTRODUCTION
Some large software systems can remain in production for many years, even decades. We know
that maintenance and upgrades are a significant component of the total cost of ownership over the
product life-cycle. Given the amount of money spent on the re-engineering of these long-lived
systems, it is interesting to look at the tools we have to perform these activities effectively and
efficiently. We already know about the benefits of using an architecture-centric approach for the
initial design and implementation. But we often err in not following that approach, in not implementing according to that architecture, or in allowing the production system to depart from that architectural plan over time. When we start a significant re-engineering effort on a system where
the as-built architecture deviates from the as-designed architecture or where there is no comprehensive or well designed architecture at all, we cannot use an architecture-centric design approach for the re-engineering without first reconstructing (recovering) the as-built architecture of
that system. This project focuses on the methodology that a practitioner can use to methodically
recover that as-built architecture to enable the use of architecture-centric design during the reengineering of the system.
An architecture-centric design approach can ensure that the development can more predictably
meet its non-functional (quality) goals as well as the desired features and functions. [Bass 2003]
If a maintenance team has a well documented and accurate architecture for the system to be modified, they can approach the maintenance as an extension of the original development. When that
architecture is not available, the team must first recover the architecture of the system as-built.
Recovering the architecture from a production system that may be lacking in current architecture
documentation is challenging. Any team that will tackle this challenge must have an approach lest
they find themselves spending too much time on low-value activities. A clear plan and a well-understood methodology for how the project will be approached can dramatically improve the performance of the team. This project presents such a methodology to structure and coordinate the team's effort to achieve improved efficiency.
The term software architecture is new enough to have no generally agreed upon definition. Before
discussing any topic that involves software architecture in a rigorous manner, it is necessary to
define the term. The term methodology, while more widely understood, also needs to be defined
if we are to be clear about what we hope to achieve by creating one for this purpose.
These, and other topics, are elaborated in Chapter 2, Background, which should give the reader a
sufficient base to understand the remainder of the work in this project.
This project is not the first to look at the activities and methodology of software architecture recovery. Carnegie Mellon’s Software Engineering Institute has published many papers on the subject and included it in some of their most popular books [Bass 2003], [Clements 2003], [Ivers
2004], [Kazman 2003], [Nord 2009], [O’Brien 2002], [O’Brien 2003-a], [O’Brien 2003-b]. The
prior research has focused on the academic issues of attempting to automate the chore of extracting information from the source code of the system. Their work is discussed in some depth in
Chapter 3, Literature Review.
The intent of this project is to propose a novel methodology for software architecture recovery.
This methodology is presented in Chapter 4. How this methodology differs from the prior methodologies is explored. The activities, their relationships and the artifacts produced are described.
To show that this methodology has practical benefit, it is used in a case-study to recover the software architecture of a real system. The system under study is a medium-scale web site accessibility assessment system that was developed in Europe. The first implementation of the system was
published as EIAO (European Internet Accessibility Observatory). It was used to gather evidence
of website accessibility across the EU and various industry categories. [Nietzio 2006], [Ulltveit-Moe 2008] At the end of that project, the researchers shifted to a focus on measuring other aspects of government websites in addition to accessibility; these attributes include transparency, efficiency and impact. This new project was called eGovMon
(eGovernment Monitoring). The
starting point for their eGovMon project was the end-state of the EIAO project. The specific version that is used for this case study is a version from early in the development of the eGovMon
system, after some of the deficiencies of the EIAO system were addressed but before the
measures for transparency, efficiency and impact were added. The case study is presented in
Chapter 5 and the partial architecture documentation is contained in Appendix A.
Chapter 2

BACKGROUND
While the functionality, size and complexity of software systems continue to grow, so too do the
qualities we expect of those systems, qualities such as performance, usability, availability, testability, modifiability and security. Mere functionality is no longer sufficient, if it ever was. Many
academics have turned their attention to the challenge of engineering in the qualities that we need.
If a piece of embedded software is to become part of a fly-by-wire system in an airplane, it cannot
have the dependability common in consumer software products but must have a mean-time-to-failure rating that runs to many decades before we feel comfortable stepping onto
that plane.
Significant work in finding ways to engineer qualities into software systems has been done at
Carnegie Mellon's Software Engineering Institute (SEI) among others. [Bass 2003], [Clements
2003], [Stoemer 2007]. Their work depends upon the articulation of a system's architecture as an
early artifact of the system design process so that there is assurance in the design process that the
system will meet some understood level of quality in the final product. This saves the design team
from the fruitless effort of trying to tack some quality onto the final design. The articulation of a
well-designed and well-documented software system architecture provides the first tangible artifact of the system that can support reasoning about the design's ability to deliver the intended qualities.
SEI and others have found ways to incorporate the architecture into the overall design process
while a system is being built. But not every system is developed with this kind of forethought or
with a well-documented architecture. It is not so much that the system lacks an architecture (all
systems have an architecture, no matter how bad) but that it may never have been clearly documented or followed by the implementers. Many successful systems are running today that may
have little or no current documentation of the architectural decisions that were made when the
system was first built. Consequently the maintainers of those systems make decisions in a vacuum and often begin to deviate from what the original architect may have intended. There is even a
term for this form of entropy: architectural rot.
Now that we are seeing major systems that have lifetimes that are measured in decades, we are
faced with the dilemma of how to perform major overhauls of these systems while maintaining
the quality in the original product and improving other qualities while adding new features. Many
teams faced with this challenge have opted to ignore the current system and develop from scratch.
But as the scope and complexity of these systems grows, this becomes progressively more infeasible. Re-engineering and not re-building is becoming more common.
Where a development team has a system which adheres to a well designed and well documented
architecture, it is possible to step back into the development process that was used for the original
development. But where the architecture has been lost or when architectural rot has caused the as-built system to deviate seriously from the as-designed architecture, the team has no choice but to
either approach the project as new development or to find a way to recover the as-built architecture before proceeding. It is this second challenge, the recovery of the as-built architecture that
this project addresses. It looks at the ways that the development team can take the current system
and attempt to recover a usable interpretation of that system that can support its re-engineering
and avoid the process of developing the system again.
Chapter 3

LITERATURE REVIEW
The most visible voices in the area of software architecture can be found at Carnegie Mellon’s
Software Engineering Institute (SEI). A canonical text for software engineering is Software Architecture in Practice [Bass 2003]. In it, the authors give an overview of the topics which are of
interest to this project including documenting software quality requirements, the evaluation of
software architectures and the reconstruction of software architectures.
Quality Requirements
A significant concern of architects is ensuring that the implementations of their designs exhibit the
qualities desired by their clients. The basic thesis of SEI is that qualities, specifically the non-functional qualities, give shape to the architecture. Therefore an important consideration of the
design process is the capturing of the client's quality requirements in a way that supports the architecture design process as well as the subsequent design decisions.
The starting point for how to gather and document the quality requirements for a software product
is the Bass text [Bass 2003], chapter 4, titled Understanding Quality Attributes (co-authored with
Felix Bachmann and Mark Klein). In it they provide a way of documenting the quality requirements by documenting specific concrete Quality Attribute Scenarios. Each scenario is a quality-attribute-specific requirement and consists of these six parts:
source of stimulus (human, computer system, time event or other actuator)
stimulus (condition that needs to be considered)
environment (the overall state such as an overload condition, normal operation, shutdown, etc.)
artifact (that part of the system stimulated by the stimulus, usually some component of
the overall system)
response (the activity and perhaps message or action that results from that activity)
response measure (some appropriate metric for this scenario)
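As an illustration, a scenario in this form can be captured as a simple record. The following Python sketch is purely illustrative: the class and the example values are my own invention, not part of any SEI tooling, but they show how the six parts fit together for one concrete availability scenario.

```python
from dataclasses import dataclass

@dataclass
class QualityAttributeScenario:
    """One concrete, quality-attribute-specific requirement, following the
    six parts described by Bass et al. (illustrative sketch only)."""
    source: str            # source of stimulus, e.g. an end user or a timer
    stimulus: str          # the condition that needs to be considered
    environment: str       # overall system state, e.g. "normal operation"
    artifact: str          # the part of the system stimulated
    response: str          # the activity that results from the stimulus
    response_measure: str  # an appropriate metric for this scenario

# A hypothetical availability scenario for a web crawler, recorded in this form:
crawler_outage = QualityAttributeScenario(
    source="external website",
    stimulus="HTTP request times out",
    environment="normal operation",
    artifact="crawler component",
    response="log the failure and continue with the next URL",
    response_measure="no more than one sampling run lost per outage",
)
```

Because each scenario is a self-contained record, a set of them can be collected and reviewed without first agreeing on a quality attribute taxonomy, which is the advantage noted below.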
One advantage of this approach is that it avoids the discussion of quality attribute taxonomies. As
long as everyone agrees that the scenario captures the desired quality of the system, it doesn't
matter how it is categorized. The authors discuss general scenarios for the six quality attributes in
their own taxonomy: availability, modifiability, performance, security, testability, usability.
Architecture Evaluation
A primary motivation for articulating the software architecture as an early step in the design process is to enable reasoning about the qualities that the system implemented from this design will
exhibit. This architecture evaluation gives leverage over the later design decisions.
Bass [Bass 2003] presents a nine-step architecture analysis method that he calls the Architecture
Tradeoff Analysis Method (ATAM). Those steps are:
Step 1: Present the ATAM
Step 2: Present the Business Drivers
Step 3: Present Architecture
Step 4: Identify Architecture Approaches
Step 5: Generate Quality Attribute Utility Tree
Step 6: Analyze Architectural Approaches
Step 7: Brainstorm and Prioritize Scenarios
Step 8: Analyze Architectural Approaches (again)
Step 9: Present Results
In addition to Bass’ ATAM, a more recent doctoral dissertation by Christoph Hermann Stoermer
titled Software Quality Attribute Analysis by Architecture Reconstruction (SQUA3RE) [Stoemer
2007] offers an alternative to evaluating a designed architecture by performing quality analysis
in parallel with the reconstruction of a system’s architecture.
Documenting Software Architecture
To enable ATAM or any form of architecture analysis requires some artifact that can support the
analysis. The researchers at SEI are the most important voices on this topic and the book, Documenting Software Architectures by Clements et al [Clements 2003] is one of the dominant texts
to address this topic. It lays out the SEI recommendations for the presentation of software architecture.
As Clements argues, while a pictorial representation alone is insufficient to adequately communicate the full meaning of any given architectural view, it is often the primary presentation of that
view with text used to elaborate and extend the pictorial representation. The most dominant form
of graphic communication for software is the Unified Modeling Language, UML. While SEI does not
prescribe the way architecture information should be displayed graphically, it does require a legend when the notation is not a standard such as UML. UML provides all the support needed for all the module views except layers. But for the Component-and-Connector views UML has distinct disadvantages that must be overcome.

One problem in Component-and-Connector views is that the connector itself can have sophisticated behaviors. UML's native connector cannot itself hold the required information. Both Bass and Clements [Bass 2003], [Clements 2003] discuss some alternative ways of
using UML 1.0. Additional methods are presented in the paper Documenting Component-and-Connector Views with UML 2.0 [Ivers 2004]. The basic choice is to represent the connector as an association, a class or a stereotyped class in UML 2.0.
Review of Architecture Reconstruction Literature
Reconstructing architecture from an existing system can be a difficult process. The hazards and
barriers to be found in the process are named in The Perils of Reconstructing Architectures by
Carriere and Kazman [Carriere 1998]. One significant barrier is the potential for the actual architecture to differ from the intended architecture. The qualities intended by the architectural decisions are only achieved if the system is implemented as designed.
“Beyond its importance for measuring architectural conformance, software architecture
reconstruction also provides important leverage for the effective reuse of software assets.
The ability to identify the architecture of an existing system that has successfully met its
quality goals fosters reuse of the architecture in systems with similar goals; hence architectural reuse is the cornerstone practice of product line development.”
They also observe the insufficiency of static information derived from lexical analysis alone to
fully understand the architecture:
“Unfortunately, system models extracted using these techniques provide a minimum of
information to describe the run-time nature of the system. The primary factor contributing to this deficiency is the widespread use of programming language features, operating system primitives and middleware functionality that allow the specification of many
aspects of the system's topology to be deferred until run-time.”
These aspects include “polymorphism and first-class functions (including approximations such as those provided by C and C++); operating system features such as proprietary socket-based communication and message passing; middleware layers such as
CORBA.” [Carriere 1998]
They also mention some other common obstacles such as non-compilable code, missing source
elements, language dialects, and obscure or non-reproducible hardware platforms. All these factors contribute to the difficulty of a tool-assisted reconstruction. They suggest that ATAM can
serve as a structured way to elicit architectural information besides being a way to review and
analyze an architecture.
Guo, Atlee and Kazman wrote about a semi-automatic method to extract software architecture
from source material [Guo 1999] that used a tool called Dali. Guo says “An architecture recovery
method defines a series of steps, and the pre/post conditions for each step, to guide an analyst in
systematically applying existing reverse engineering tools to recover a system’s architecture.”
[Guo 1999, p7]
They noted that “it is difficult to use these methods to recover architectures that are designed and
implemented with design patterns. As design patterns are described as well-defined structures
with constraint rules, a pattern-oriented architecture recovery method must incorporate the design
pattern rules as well as structural information such as the system decomposition hierarchy.” They
propose the Architecture Reconstruction Method (ARM) “a semi-automatic analysis method for
reconstructing architectures based on the recognition of architectural patterns.” Their method
consists of four major phases:
Developing a concrete pattern recognition plan
Extracting a source model
Detecting and evaluating pattern instances
Reconstructing and analyzing the architecture.
The first phase, developing a concrete pattern recognition plan, consists of defining the patterns
that are expected (or hoped for) in the source material in a form that supports pattern recognition.
Since these researchers are using Dali, their patterns are specified using Rigi Standard Form.

The second phase is extracting the source model. Detecting and evaluating pattern instances is the mining of the database using the pattern templates to detect the source elements and relations that realize the patterns.
Reconstructing and analyzing the architecture is the process of providing the artifacts which express the design patterns found in the system, usually as a visual presentation using the Rigi tool.
In the paper Architecture Reconstruction Guidelines, Third Edition [Kazman 2003], the authors
not only elaborate on the text above but also discuss some of the tools that can, and usually
should, be employed when approaching a large-scale reconstruction project. For data extraction
these tools include parsers (Understand for C/C++/Java, Imagix, SNiFF+, C++ Information Abstractor aka CIA, rigiparse), abstract syntax tree (AST)-based analyzers (Gen++, Refine), lexical
analyzers (Lightweight Source Model Extractor), profilers (gprof), and ad hoc tools like grep and
Perl. The authors observe that these tools are limited to extracting static information about the
code. Because of late binding due to polymorphism, function pointers, runtime parameterization and other mechanisms, the dynamic structure of the system might not be reconstructable
without additional information that must come from other tools such as code instrumentation, and
such tools may not even be available in some situations such as embedded code.
The Kazman paper also discusses the burden of storing and organizing the large amount of information that can result from the information extraction phase of a large system. As they observe, “…with the large amount of software in most systems, it is nearly impossible to perform all
architecture reconstruction activities manually.” [ibid] In particular, the paper describes a tool
developed at SEI called Architecture Reconstruction and Mining (ARMIN), which was built to assist with the architecture reconstruction of large systems.
In chapter 10 of the Bass text [Bass 2003, pp 231-259] the authors address architecture reconstruction. In it, they present four activities that are executed iteratively:
Information extraction
Database construction
View fusion
Reconstruction
In their approach it is assumed that several tools are employed to extract the available relationships among the components in the source code. In this text they specifically mention the Dali
workbench that was developed at SEI.
Their next step includes the conversion of the information extracted by each of the tools into a
standard form such as Rigi Standard Form (a tuple-based data format of the form relationship
<entity1> <entity2>) and the storage of this information into a database for later retrieval and analysis.
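A minimal sketch of this step might look as follows, assuming the simple two-entity tuple form described above. The relation names, file names, and table schema here are hypothetical, invented for illustration, and are not taken from Dali or the Rigi toolset.

```python
import sqlite3

# Hypothetical extractor output, one RSF-style tuple per line:
# "relationship entity1 entity2".
rsf_lines = [
    "calls main.py crawler.py",
    "calls crawler.py parser.py",
    "imports crawler.py urllib",
]

# Store the tuples in a relational database for later retrieval and analysis.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE relations (relation TEXT, entity1 TEXT, entity2 TEXT)")
for line in rsf_lines:
    relation, e1, e2 = line.split()
    conn.execute("INSERT INTO relations VALUES (?, ?, ?)", (relation, e1, e2))

# Later retrieval: everything crawler.py depends on, in any relation.
deps = [row[0] for row in conn.execute(
    "SELECT entity2 FROM relations WHERE entity1 = ? ORDER BY entity2",
    ("crawler.py",))]
```

Once the tuples from every extraction tool are in one database, queries like the one above become the raw material for the fusion and reconstruction activities described next.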
Due to the inherent limitations of lexical analysis, the information extracted using different tools
will result in incomplete lists of the elements and relations. Therefore a needed activity is the
manual analysis, verification and reconciliation of these various “views” of the data. (The view of
the system as given by these tools should not be confused with what these authors define as architectural views earlier in the text.) The authors call this activity view fusion.
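In a much-simplified form, view fusion can be pictured as the set-wise merging of the extractors' outputs. In this hypothetical Python sketch the two "views" and their contents are invented for illustration; the real activity is largely manual analysis, for which the symmetric difference merely identifies candidates.

```python
# Two hypothetical extracted "views" of the same system: one from a static
# parser, one from a lexical scan. Each is a set of (relation, from, to) tuples.
parser_view = {("calls", "a.py", "b.py"), ("calls", "b.py", "c.py")}
lexical_view = {("calls", "a.py", "b.py"), ("imports", "b.py", "os")}

# Naive fusion: the union of the views gives the combined picture, while the
# symmetric difference lists tuples seen by only one extractor, which an
# analyst must verify and reconcile by hand.
fused = parser_view | lexical_view
needs_review = parser_view ^ lexical_view
```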
The final activity is called reconstruction. It takes the improved database from the view fusion
activity and seeks to abstract away the non-architectural information to support the expression of
the true architectural views which will document this architecture. The most important part of this
activity is the pattern matching wherein the people involved in the reconstruction map the elements and relations of the source material to the known and recognized patterns used to achieve
the desired qualities. Since their approach assumes the use of a workbench their reconstruction of
elements consists of language statements that define the elements in relation to the database. For
the reconstruction activity, they offer several practical guidelines. Briefly, these include iterating the reconstructed architecture with the system's architect, being judicious in what is brought into the architecture (do not list every source element), using naming conventions and directory structures to infer intended structure, and recognizing the need for personal familiarity with the product: "As reconstruction proceeds, information must be added to reintroduce the architectural decisions which introduces bias from the reconstructor and thus reinforces the need for a person knowledgeable in the architecture to be involved." [ibid]
Kazman, O'Brien and Verhoef present their guidelines for an architecture reconstruction project in Architecture Reconstruction Guidelines, Third Edition, published in Nov 2003 [Kazman 2003]. In it, the authors characterize software architecture reconstruction as "the process where 'as-built' architecture of an implemented system is obtained from an existing legacy system."
Those guidelines are:
articulate a goal which explains why the organization is performing the reconstruction
obtain a high-level architectural view of the system before beginning the detailed reconstruction process
use existing documentation
involve people familiar with the system
assign someone to work full-time on the project
“Use the ‘least effort’ extraction. Consider the kind of information that needs to be extracted from
a source corpus and choose the most appropriate tool. Is the information lexical in nature? Does it
require the comprehension of complex syntactic structures? Does it require some semantic analysis?” [Kazman 2003]
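To make that distinction concrete: when the needed information is purely lexical, for example which modules a Python file imports, a regular expression is enough and no syntactic or semantic analysis is required. The sketch below is a hypothetical illustration; the source text and module names are invented.

```python
import re

# A purely lexical extraction: find import dependencies in Python source.
# No parsing of syntax trees is needed for this kind of information.
# The source text below is an invented example.

source = """\
import os
import urllib.request
from sampler import select_pages
x = 1  # not an import
"""

# Match "import X" or "from X import ..." at the start of a line.
IMPORT_RE = re.compile(r"^\s*(?:import\s+([\w.]+)|from\s+([\w.]+)\s+import)", re.M)

def imported_modules(text):
    """Return the module names mentioned in import statements."""
    return [m.group(1) or m.group(2) for m in IMPORT_RE.finditer(text)]

print(imported_modules(source))
```

Information that requires understanding syntactic structure (for example, class inheritance) or semantics (for example, run-time dispatch) would need progressively heavier machinery, which is exactly the trade-off the "least effort" guideline asks the team to weigh.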
As Kazman observed “…with the large amount of software in most systems, it is nearly impossible to perform all architecture reconstruction activities manually.” [Kazman 2003] For even a
moderate system, the amount of data to be extracted and managed is significant.
In Architecture Reconstruction of J2EE Applications: Generating Views from the Module Viewtype, O'Brien et al. [O'Brien 2003-b] present the process diagrammatically, as shown in Figure 1.
Figure 1 (from O'Brien 2003-b, p. 2)
In Quality-driven software re-engineering, Tahvildari et al. [Tahvildari 2003] describe a model analysis phase which focuses on documenting and understanding the architecture and the functionality of the legacy system. In total, their re-engineering framework (life-cycle) consists of these phases:
requirements analysis
model analysis
source code analysis
Jansen et al. also address how architecture design decisions can be recovered after the fact (Jansen, et al., Aug 2007).
Chapter 4
For small systems, there is rarely a need for a distinct methodology to reconstruct the architecture. In those cases, direct examination of the system artifacts such as source code, operational
instructions and informal system descriptions is sufficient to gain an understanding of the intended architecture. But for large systems the amount of source code creates a burden on individuals.
For these larger systems it becomes necessary to approach the extraction of information in a methodical fashion and frequently with the use of some tool support.
By definition, a methodology lays out a series of steps which, while necessary, are not by themselves sufficient for the successful achievement of the task. The steps which must be accomplished to recover a software architecture are:
Gather Evidence
Analyze
Generate Views
Present
As shown in Figure 2, these steps are iterated until a satisfactory outcome is achieved. Also, it is
common for these steps to occur in parallel in the early iterations with shifting attention based on
the availability of the resources and schedules and reflecting the investigatory nature of the task.
In most cases, these activities will be performed iteratively until a satisfactory result is achieved.
Usually, the last iteration will perform these steps one last time in this order.
Figure 2 SARM Diagram
Like any project, initial planning is important. Also like most projects, things will happen that
will require an adjustment to that plan. This methodology does not require many artifacts from
the planning activity but it does require a few. There are two vital documents required for the initiation of this sub-project; the initiating project’s charter and the architecture drivers in the form
of quality requirements. In addition, there are some basic items that must be in place before the
main work of this effort should be allowed to start. These include:
identification of the stakeholders of this project
the scope of study
the results of investigation into those tools that could be used for the work
some ground rules for the decision making process that will conclude the project
A key decision point in the project regards the architecture documentation presented to the stakeholders. Those stakeholders must judge when the representation of the as-built architecture is sufficiently detailed, accurate, and complete to support the needs of the greater project. There
is no oracle possible for this decision; it must be the collective decision of the stakeholders.
Therefore knowing the decision makers and their needs is important if the recovery is to proceed
smoothly from initiation to conclusion.
The initiating project's charter and related documents should provide sufficient information to determine the proper scope for the recovery project. However, the architecture will serve needs of the stakeholders that go beyond the immediate need to re-engineer the system. The stakeholders may depend upon the architecture for many other purposes, such as long-range enterprise architecture planning or operational support. For these reasons and more, the scope of the recovery effort
should be considered separately from the initiating project.
As researchers have shown, the use of tools to assist with the static analysis can be helpful.
[O’Brien 2002][O’Brien 2003-a][O’Brien 2003-b] Appendix C shows some of the tools mentioned in papers that were reviewed for this project. But the choice of tools will depend upon several factors. Will the tool support the code base? How dependable are the results of the tool analysis? If the results will require a complete manual analysis to verify the tool results, little may be
gained. What are the costs of the tool in terms of both acquisition and training?
Gather Evidence
While academically less interesting, a key activity in the recovery of an architecture is the investigative work of finding helpful documents and collecting the recollections of people with knowledge of the system. People successful in this activity must possess both the skills of an experienced business analyst and knowledge of software architecture principles. The goal is to find all evidence that will illuminate the design choices made in the design and implementation of the system under study, and then document that evidence in a way that makes it available to the later activities.
This is a discovery activity and like any discovery is likely to proceed from the rapid and general
to the specific and labored during the course of the project. Some information, such as the source
code and any end-user documentation are likely to be easily secured. In some cases these may be
sufficient but often making the tie from the in-place system and the architecture patterns that had
been intended may prove elusive. This section of the methodology provides some practical advice
on avenues of investigation to consider.
In many cases the system in production was installed many years ago and personnel have
changed. Some may have been promoted to other positions in the company and others may have
left the organization. In some cases people remain in the same roles. In an ideal engagement the original architect is available to the team; that would be the starting point. In the planning step the team was introduced to the stakeholders. Identify which stakeholders were also stakeholders for the original project. Ask each stakeholder whether they have a copy of any presentation of the original architecture, or whether they can provide an introduction to whoever made that presentation.
An assumption of this methodology is that architecture recovery is never done in a vacuum. Any
effort to recover a software system architecture is driven by some larger purpose; most likely the
re-engineering of that system for new or altered requirements.
One motivation for the clear articulation of the architecture is to reason about the qualities of the
system implemented according to that architecture. These early design decisions ensure that focus
is placed on the most important qualities that must be achieved in the implementation and that the
proper tradeoffs are made in subsequent design decisions.
Even when little or no explicit documentation of the original quality requirements has been
found, there may be evidence which suggests the most valued quality requirements of the as-built
system. Regardless of whether the quality requirements are recovered from the original implementation or inferred from historic or contemporary sources, the recovery team should attempt to
capture them in a systematic way to aid in the documentation of the architecture. The justification
for capturing the qualities of the as-built system is to provide as sound a base as possible for the
subsequent analysis of that system.
Quality Requirements Documentation
The methodology used for capturing quality requirements is taken directly from Clements [Clements 2003]. In that methodology, the stakeholder quality requirements are documented using a six-part scenario which includes the source, stimulus, artifact, environment, response and response measure. Using these quality requirements, the architect can argue how the architecture supports the requirement.
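Such scenarios lend themselves to a simple structured record. The sketch below assumes the six parts named above; the example values are invented and do not come from the case study.

```python
from dataclasses import dataclass

# A minimal record for a quality-attribute scenario, following the
# source / stimulus / artifact / environment / response / response-measure
# structure described above. All example values are invented.

@dataclass
class QualityScenario:
    source: str
    stimulus: str
    artifact: str
    environment: str
    response: str
    response_measure: str

modifiability = QualityScenario(
    source="developer",
    stimulus="add a new accessibility metric",
    artifact="metric evaluation module",
    environment="design time",
    response="metric added without changing other modules",
    response_measure="under three person-days of effort",
)

print(modifiability.response_measure)
```

Capturing the scenarios in one uniform shape makes them easy to tabulate for the later ATAM-style presentation.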
The requirements document from the original development for this system may be difficult or
impossible to find. Ideally the architecture will make explicit the relationship between the quality
(non-functional) requirement and the architectural pattern employed. If explicit documentation of
the original project cannot be found, then the interviews with people should include questions
about the requirements of the prior system and current requirements. Each stakeholder can potentially introduce someone with data previously unseen. While it may be impossible to always separate what was a requirement at the time of the original implementation and the current needs
which are driving the re-engineering, it should be done as best as possible.
One of the goals of an architecture recovery is to enable reasoning about the qualities of the system, just as the point of such reasoning at design time was to determine whether a system built to the given architecture would achieve the intended qualities. Insofar as it is possible, a complete architecture recovery will capture the original architectural drivers and recreate the reasoning of how the (at the time) proposed architecture achieved the required qualities of the original stakeholders.
Analyze
Almost immediately following the collection of any documentation, the analysis of that documentation can begin. The overall objective of the analysis activity is to ensure that enough information has been gathered to support the generation of the views. There will be some overlap between Analyze and Generate Views since it is the views that enable some level of analysis of the system.
Analysis must verify that the elements and relationships in the system under study are collected.
The element types and relations between them in O’Brien’s [O’Brien 2002] work are shown in
Figure 3.
Relation Name | Source Element | Target Element
defines | class | function
contains | file | function
defines | file | class
defines | package | class
defines | file | global variable
defines | function | local variable
depends on | file | file
has member | class | member variable
Figure 3 O'Brien's Element Types and Relations
This list can be used, but at a minimum the decomposition of the modules, the is-a relationships of object-oriented systems, and the uses relationships must be extracted.
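For an object-oriented Python code base like the one in the case study, these minimum relations can be approximated with the standard-library ast module. The sketch below is illustrative only: the source text is invented, and the relation names follow Figure 3 loosely rather than reproducing O'Brien's tooling.

```python
import ast

# Extract "defines" and "is-a" relations from Python source using the
# standard-library ast module. The source text is an invented example.

source = """\
class Crawler:
    def fetch(self):
        pass

class SiteCrawler(Crawler):
    pass
"""

def extract_relations(text, filename="example.py"):
    """Return (relation, source_element, target_element) tuples."""
    relations = []
    tree = ast.parse(text)
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            relations.append(("defines", filename, node.name))
            for base in node.bases:           # is-a relationships
                if isinstance(base, ast.Name):
                    relations.append(("is-a", node.name, base.id))
            for item in node.body:            # methods defined by the class
                if isinstance(item, ast.FunctionDef):
                    relations.append(("defines", node.name, item.name))
    return relations

print(extract_relations(source))
```

A real extraction would also chase the uses relationships (imports and call sites), which require more care; this sketch only shows that the raw material is reachable without commercial tooling.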
During the course of analysis, it is possible that the team may discover discrepancies between the
as-designed architecture and the as-built architecture. There are many reasons why this might be
so but the reasons for the discrepancies are less important during recovery. The goal of the team
in documenting the architecture is to attempt to document as much as can be known both about
the as-designed and as-built architectures.
The first analysis done is the module level relationships. If good architecture documentation is
available on the system, this can proceed in a top-down fashion investigating the modules that
comprise a high-level module and the relationships that exist between those modules. When the
high-level modules have been documented to the extent possible, it will be necessary to proceed
bottom up.
In the absence of all other quality requirements, one that will be universally found is some level
of organization which enabled the system to be buildable. The development team needed to organize themselves to enable them to work together. Unless the number of modules was so small
that every member knew every module, they needed to be organized into a structure that enabled
each member to both work independently yet still be contributing to the overall design. The most
common way this is done is in a functionally hierarchical structure where the modules are leaves in a tree of some height. Each major branch represents some abstraction of the leaves or other branches attached to it.
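The inference from directory structure described above can be sketched as a small grouping routine. The file paths are invented examples, and the grouping rule (top-level directory as candidate module) is only a heuristic, not a guaranteed reflection of the intended design.

```python
# Infer a candidate module decomposition from directory structure,
# as suggested above. The file paths are invented examples.

paths = [
    "crawler/fetch.py",
    "crawler/robots.py",
    "sampler/select.py",
    "reporting/web/views.py",
]

def group_by_top_dir(paths):
    """Map each top-level directory (candidate module) to its files."""
    modules = {}
    for p in paths:
        top = p.split("/")[0]
        modules.setdefault(top, []).append(p)
    return modules

print(group_by_top_dir(paths))
```

The resulting grouping is a hypothesis about the tree of abstractions that the development team used, to be confirmed or refuted during the Analyze activity.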
In the O’Brien work, the need was only to reconstruct the static decomposition of the system under study. In this project the need is to reconstruct both the static and dynamic architecture. However the overall approach is still appropriate.
Generate Views
This activity is separated from Analyze since the creation of the formal views requires two skills
that are not needed in Analyze: visual communication and synthesis. This will work in concert
with Analyze since in performing the synthesis of the view questions will arise. If the information
cannot be found in the document repository then it may trigger further analysis or even more research for evidence. It is possible that there will be gaps which must then be noted in the final
version of the view.
Each view is introduced with a primary presentation. For most module and component-and-connector views this is a semi-formal diagrammatic presentation. Semi-formal means that every pictorial element on the diagram has a specific meaning. For the module diagrams UML works as well as any other notation. However, for component-and-connector views UML presents some challenges as a semi-formal notation. This methodology does not advocate any specific method of
depicting the component-and-connector views. In the associated case study, the reader can find a
discussion of how this was resolved in that case.
A necessary skill for the person responsible for creating the primary presentation diagrams, besides an understanding of the semi-formal notation being used, is the ability to lay out the visual elements in a way that emphasizes the underlying organization of the architecture pattern.
Present
During planning, the form of the presentation can be negotiated. However, the recommended going-in position should be SEI's ATAM, a well-articulated methodology in its own right. For an understanding of the ATAM the reader is directed to the Background section of this paper (Chapter 3, Literature Review for Architecture Evaluation). But before the ATAM is initiated with the stakeholders, the team should review the intended presentation to ensure it meets their internal quality standards. Better to cancel a planned presentation than alienate the stakeholders with a wasted meeting.
The primary reason for this presentation is to determine if the architecture is sufficient for this
project. As mentioned in the section on planning, the recovery of an architecture is purpose driven. It must support the needs of multiple stakeholders and they are the final arbiters of its success.
If the architecture fails to satisfy some stakeholder, the shortcoming must be noted and an understanding of what can be done to correct it must be discussed at the meeting.
If the decision is made that the architecture as presented is sufficient, the team must produce the
final version and distribute it.
The Document Repository
While not itself part of the methodology, the methodology cannot function without a document repository. The ultimate document will have many authors and include many independently produced exhibits.
Chapter 5
A Brief Overview of Website Accessibility
The basic paradigm of website accessibility is the concept of barriers to accessibility. The standards set by World Wide Web Consortium (W3C) in the Web Content Accessibility Guidelines
(WCAG), and as articulated in US law, list the barriers that are to be avoided in the creation of websites, but no quantitative method exists to unambiguously assess compliance with these guidelines since many require human judgment. The most often cited example is the descriptive text (alt-text) for an image. The guidelines recommend that this text provide a suitable description of the image to allow someone without access to the image to make use of the page, to the extent possible without it. The W3C has worked over the years to create the Unified Web Evaluation Methodology (UWEM), which has the objective of quantifying accessibility where possible as well as providing a recommended methodology for the evaluation of websites by different organizations.
Many organizations have an interest in accessibility for a variety of reasons. However governments in the US and abroad are intensely interested in assessing their own service to the public
for both humanitarian and legal reasons. In 2004 a project was co-funded by the European Commission (EIAO publishable final activity report, project no. 004526) to create a system which would apply UWEM to European websites to assess differences across national and industry segments. This effort was dubbed the European Internet Accessibility Observatory, or EIAO. The researchers on that project presented their work and went on to a new project called eGovMon, which seeks to expand the attributes of the website assessment to include impact, transparency, and efficiency as well as accessibility. But while the attributes to be assessed are expanded,
their focus is now on the information and services provided by municipalities in Norway. They
began with the EIAO system and addressed some of the most immediate issues facing that system
and then began their work on the eGovMon system in late 2009/early 2010. More information
about the eGovMon and its EIAO predecessor can be found in the partial eGovMon architecture
recovery in Appendix A. Specifically Section 2 of Appendix A will give the reader a background
in that project.
Architecture Recovery of the eGovMon System
As documented in Chapter 4, the methodology calls for an initial planning phase followed by iterations of Gather Evidence, Analyze, Generate Views, and Present. A part of that planning is to
understand the link between the architecture recovery effort and whatever motivated it. The intent
of the project which motivated this architecture recovery project is to use the eGovMon system to
measure the accessibility of university websites in California. Therefore the purpose of this architecture recovery is to prepare for a re-engineering effort of the existing eGovMon system for this
purpose. What comes from this recovery must allow the developers to reason about the qualities
that this system will exhibit, determine if they meet the needs of the re-engineered system and
guide the specification for changes to the system.
There are many different aspects of the system under study. There is the static structure of the
package that is used for installation, the dynamic behavior of the install process and the static
structures left on the object machine when it concludes, the dynamic behavior of the system as it
creates the run-time structures to provide the services and of course the run-time objects and their
behavior as they interact with the users’ requests. In the course of this project all of these required
some analysis. However the only views that will be developed will be the static structure of the
elements from which the servers are started and the dynamic structures and behaviors which support the services. The SARM recommends determining the decision making process for stakeholder acceptance. Since there were no other stakeholders for this project, that was not an issue.
SARM Planning - Tool Selection
A key decision in this project was to not attempt to use any form of tool support for the lexical analysis. Why forgo tool support? Cost, for one, both financial and in time. Understand for Java, for example, costs $995 for a single-use license and $1995 for a "floating" license. While many of the tools are free, there is still a cost of acquisition in the form of research time to find the appropriate package, integration time, learning time, etc.
Another reason is the limitations of the tools. The very best tools still suffer from the inherent
limitations of static analysis. All authors agree that tools can provide leverage but the output must
be carefully used lest inappropriate conclusions are drawn from faulty data.
As with any architectural analysis, learning the full lexical complexity of the code was not needed, although it remained a sub-goal for the larger vision. There was also a healthy skepticism that any tool can capture the more subtle semantic information embedded in the code that is needed to document the behavior in component-and-connector views.
Another reason a tool was not used is training. No tool is useful without understanding how to use
it. This requires an investment in training to ensure that the results from the tool are dependable and accurate. There is also the distraction from the purpose of the project. One does not engage in a re-engineering effort to establish infrastructure. While many organizations will allocate the budget
needed to create the workbench needed to perform this kind of analysis using the best available
tools, many other organizations will not. This has significant implications for the skills needed
and the tasks that must be undertaken.
The tools introduce an additional step. The objective is to reconstruct the architecture and an inherent part of that architecture is the graphic presentation. While tools such as Dali/ARMIN can
provide some graphic support, most authors setting out to create figures will only use the tool for
a draft of the figure and then recreate it using a more robust graphics tool.
While most other researchers have used automated tools to assist with the lexical analysis of the
source code, this project relied almost exclusively on manual interpretation with only some use of
editors to search for specific tokens. In the O'Brien work [O'Brien 2002][O'Brien 2003-a][O'Brien 2003-b] they comment that it was necessary to determine which view types were to be reconstructed before beginning the data extraction phase. Unlike the O'Brien work, this project set out to create three viewtypes (module, C&C, and implementation) for the system under study.
One tool that was indispensable was the one used to create the visual representations. Visual representation of the material is important and hand-drawn diagrams would not be acceptable. The tool
chosen was Microsoft’s Visio, supplemented by UML templates from the web.
SARM Planning - Scope
Like most major systems, the creation of the production environment and the installation process of the system is a non-trivial activity. Although this could reasonably be considered part of the system, particularly if the intent is to make the system widely available to others, it was excluded from the scope: the platform is likely to change, and the exclusion helped keep the scope within reasonable bounds. Likewise, that part of the system that was necessary to perform the start-up activities of loading the
servers and establishing the software environment was also excluded for this project. This is a
case study and the scope only needed to be as large as necessary to demonstrate the methodology.
While there are tools available to assist with architecture recovery, there were downsides to using
a tool. This project was done with manual code analysis since the use of any tool risked a distraction from the primary purpose of the project. The choice for the final form of the architecture documentation was from the Clements book, Documenting Software Architectures [Clements 2003].
The eGovMon system consists of a collection of hardware and software components. Not all of
them will be within the scope of analysis for this project. Figure 4 shows the major components
of the complete system. The scope of analysis focused on those pieces which are currently under
development and will largely ignore those components that are hardware or effectively commercial off-the-shelf components such as PostgreSQL, Linux, Java or Apache. There was some consideration given to the way in which the software components can be allocated to hardware later
in the project.
Figure 4: Scope Diagram
Figure 5
While the eGovMon Project has many components, there are only a few that fall within the scope
of study for this master’s project. Primarily they are:
Automated assessment
eGovMon database
There are many documents available from this website as well. But for the purposes of inferring
the architectural description of the system under study, the documents used are limited to:
D4.2.1 eGovMon System Design Specification [Goodwin-Olsen 2009]
Architecture for large-scale automatic web accessibility evaluation based on the
UWEM methodology [Ulltveit-Moe 2008]
A proposed architecture for large scale web accessibility assessment [Snaprud 2006]
Second version of ROBACC WAMs, D3.2.1 [Nietzio 2006]
These documents clearly describe the prior work of the European Internet Accessibility Observatory (EIAO).
The EIAO website offers an architecture diagram, reproduced in Figure 6.
Figure 6: The EIAO Architecture (from the EIAO website)
In their work they describe Figure 6 as follows:
Initially, an administrator populates a URL repository with web site URLs.
The crawler gets web site URLs from the URL repository and further populates the it
(sic) with individual web page URLs. The crawler extracts at most 6000 pages from each web site.
When the crawler is finished, the sampler selects 600 web pages from each site at random
making a near random uniform sampling.
Each of the 600 pages are (sic) evaluated to detect accessibility barriers by the WAM and
the results are stored in an RDF database.
The ETL extracts these results, transforms them and inserts the results for long storage in
a data warehouse.
When all scheduled sites have been crawler (sic), evaluated and loaded in the data warehouse,
the data in the data warehouse is organised to be available in the online reporting tool.
Users can then see the results from the online reporting tool.
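The data flow quoted above can be summarized as a linear pipeline. The sketch below mirrors only the ordering of stages from the EIAO description; every function name, and the tiny page counts, are invented placeholders and not the actual eGovMon interfaces.

```python
# Schematic stand-in for the EIAO data flow quoted above.
# Every function here is an invented placeholder, not eGovMon code.

def crawl(site_url):
    """Stand-in: expand a site URL into page URLs (at most 6000)."""
    return [f"{site_url}/page{i}" for i in range(3)]   # tiny demo

def sample(pages, n=2):
    """Stand-in for near-uniform random sampling of up to 600 pages."""
    return pages[:n]                                   # deterministic demo

def evaluate(page):
    """Stand-in for the WAM barrier evaluation of one page."""
    return {"page": page, "barriers": 0}

def etl(results):
    """Stand-in for extract-transform-load into the data warehouse."""
    return {"pages_evaluated": len(results)}

pages = crawl("http://example.org")
results = [evaluate(p) for p in sample(pages)]
report = etl(results)
print(report)
```

The point of the sketch is only the shape of the flow: crawl, sample, evaluate, then load, with the reporting tool reading from the warehouse at the end.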
Another important document that helps in the reconstruction is the EIAO documented source
code, Deliverable Number: D5.2.2.1-2. The pages of this document which address the architecture are in the appendix.
The scope of this project focused on the parts of this software system likely to be modified during research to extend eGovMon, and placed packaged functions out of scope. Clearly the OS, PostgreSQL, Python and Java are out of scope. But other components developed by others and used without modification are also out of scope. These include HarvestMan, Imergo, Tidy, Jython and many others. Statements about the boundaries between the elements within scope and out of scope are primarily found in the Architecture reconstruction chapter, in the Module decomposition section.
Many elements in eGovMon are built with extensive logic to aid in development and operational
debugging. Except for the section where the quality of modifiability is discussed, these features
will be kept out of scope. To do them justice would make an already complex system much more
difficult to understand. While the developers and maintainers of this system are primary stakeholders for the purpose of this project, they will already possess the skills needed to understand
this variability in use once they understand the architecture as presented here.
Relaxed, the HTML validator, is out of scope. Here is what the author, Petr Nalevka, says about it:
“ ‘Relaxed’ is my XML validation project and a part of my bachelor's thesis. It is an easy to use HTML validation application which is special in the sense it doesn't use the official W3C DTD's. It rather validates HTML documents using schema definitions written in Relax NG with embedded Schematron patterns. This is an extremely expressive combination of languages which enables validation of additional restrictions which can not be expressed using DTD. This includes most restrictions specified in the W3C HTML 4.01 and the W3C XHTML 1.0 recommendation and some restrictions from WAI WCAG 1.0.”
Besides limiting the scope of this project by functionality, the scope is limited along a temporal
axis as well. Some say that “writing down system installation procedures…is not architectural.”
(Clements, et al., 2003 p. 374) A sidebar in one of the references makes astute observations of the
various “times” that exist in a system (Clements, et al., 2003 pp. 213-215). In this spirit, this project will reduce the scope of the “times” of interest in the architectural design of this system. If
this is accepted, those parts of the eGovMon system that exist to install the correct platform for
the system fall outside the scope of analysis and architecture recovery. In addition, the eGovMon
system must perform significant logic to create the run-time environment that will provide the
services of this system. Yet many of these elements have no persistent run-time presence. For the
purposes of this project those start-up processes and modules will be largely excluded from the
analysis and presentation. While they do offer some interesting architectural issues, this is being
done to keep the scope of this project within reason.
SARM Planning - Final Form of Deliverable
SARM recommends that an early decision be made regarding the form the final deliverable will
take. This will guide the data that must be gathered, the analysis that must be done upon that evidence, the types of exhibits that must be created and the way the material must be presented. The
source material for the form of documentation chosen is Documenting Software Architectures by Clements [Clements 2003]. There are many alternative ways to document the architecture, but theirs was chosen; the styles presented in that text will serve to satisfy the stakeholders' initial needs.
SARM Planning - Document Repository
In this case study, it was not possible to envision at the outset the document repository that would be needed. The repository grew organically over the course of the project. In the end, it consisted of the collected UML diagrams together with the source documents. Since the intended form of the final deliverable was known from the beginning of the recovery, the documents that would become part of that deliverable were kept organized according to its structure. That organization can be seen in Appendix A. Many other documents would not be included in the final deliverable but served as important references during the recovery. One important aid to understanding was the source code itself, annotated over time with notes documenting key insights into the overall system.
SARM Planning - Stakeholder Needs/Concerns
Any architecture recovery project is driven by some need. To maximize the chances of success and minimize wasted effort, the team must understand those project drivers. All of the needs documented in the EIAO and eGovMon projects exist as drivers for this case study. The reason this system exists is to perform research into website metrics as defined by the Unified Web Evaluation Methodology (UWEM), an ongoing project to create a unified way to assess website accessibility, and to provide feedback to those researchers so UWEM might be
amended. In this capacity the system can be viewed from three very different perspectives. First, it is a framework for housing and executing metrics under evaluation: researchers posit metrics and plan experiments to determine whether a new metric has value over prior ones. Second, it is an autonomous device that applies these metrics to statistically significant sets of websites; as such it must be prepared to handle the widest possible set of exception conditions that a web crawler might encounter when crawling large portions of the web unattended. Third, it is a service for people who want to subject individual websites to analysis. It is from these three perspectives that the requirements of eGovMon are drawn, and they provide the framework for the stakeholders' needs of the system. To support the ATAM, these requirements are best documented using the multi-part scenario format given by Bass in Software Architecture in Practice [Bass 2003]. The requirements are cast into that form in the next section.
SARM Planning – Stakeholder Needs/Concerns - Need for Extensibility and Maintainability
The first perspective on eGovMon clearly shows the need for maintainability. The system's primary purpose is exploration and research, so a system that is difficult to modify to explore a new metric would be undesirable. This system therefore needs the quality attribute of modifiability. From Bass [Bass 2003, p74], the general quality attribute scenario for modifiability looks like this:
Source: Developer/Researcher
Stimulus: A proposed change to the metric calculation
Artifact: Code
Environment: Design Time
Response: Modification is made to eGovMon with no side effects
Response Measure: The target for work-hours and elapsed time to make the change
This general quality attribute scenario is difficult to translate into specific concrete scenarios because the stimulus cannot be fully enumerated. It is clear, however, that some architectures will enable more efficient modification of the system while others will make it more difficult. The objective is an architecture that minimizes the work-hours and elapsed time for the set of changes expected to be made to eGovMon over the foreseeable future. The focus for modifiability is therefore primarily within the WAMs module and secondarily in the Sampler module, because of its tight coupling with the WAMs module.
SARM Planning – Stakeholder Needs/Concerns - Need for Availability
From the perspective of a tool running unattended performing metric analysis on a large number
of websites and from the perspective of a casual user requesting the evaluation of a single website, there are some needs for availability. For the purpose of performing large-scale website
evaluation, there is a tradeoff between performance and availability. If large numbers of websites
can be evaluated in a brief period of time, it can justify a researcher giving the system undivided
attention during operation to respond immediately to errors. However, acquiring larger and faster machines, which simply causes the system to encounter failures sooner, is not consistent with the continual economic pressure usually present in a research project. Rather, the desired qualities are to increase the mean-time-to-failure and to reduce the mean-time-to-repair. Mean-time-to-failure is most directly improved by researching the known and expected exception conditions that will occur in operation and ensuring that each one is handled in the run-time behavior. There is also a clear relationship between the mean-time-to-repair and the requirement for modifiability.
The general quality attribute scenario for availability is:
Source: Internal, External
Stimulus: (Fault) Omission, Crash, Timing, Response
Artifact: Process, Storage, Processor, Communication
Environment: Normal Operation, Degraded Operation
Response: Record, Notify, Disable, Continue (Normal, Degraded), Become Unavailable
Response Measure: Repair Time, Availability/Available Time, Degraded Time Interval
There are many foreseeable fault events in operation, including website unavailability and communication failures. The statistical nature of the evaluation tolerates these conditions, up to but not including the complete loss of internet connectivity. Since any given testrun is likely to include a large number of websites, what is NOT desired is for the testrun to cease when it encounters a fault such that manual intervention is required to restart it. This gives the following concrete scenario:
Source: Internal or External
Stimulus: Fault
Artifact: Process
Environment: Normal Operation
Response: Record and/or Notify and Continue Normal Operation
Response Measure: A nominal increase in the processing time for the website that raised the fault is expected, and reduced functionality for that website's metrics is also permissible. However, handling the fault should not increase the processing time for the website by more than 50%.
In general, availability, α, can be measured as the Mean-Time-To-Failure (MTTF) divided by the sum of MTTF and the Mean-Time-To-Repair (MTTR): α = MTTF / (MTTF + MTTR).
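Though not part of eGovMon, the relationship can be sketched directly in Python; the numbers below are invented purely for illustration:

```python
def availability(mttf_hours, mttr_hours):
    """Steady-state availability from mean-time-to-failure and mean-time-to-repair."""
    return mttf_hours / (mttf_hours + mttr_hours)

# A system that runs 198 hours between failures and takes 2 hours to repair
# is available 99% of the time.
print(availability(198.0, 2.0))  # 0.99
```

The sketch makes the tradeoff plain: availability improves either by lengthening MTTF (handling more exception conditions) or by shortening MTTR (better documentation).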
MTTR can be improved with good documentation. This can be expressed with a loosely quantified concrete scenario as follows:
Source: Internal or External
Stimulus: Fault
Artifact: Process
Environment: Normal Operation
Response: Fault recognized, source of fault identified, code modified, and Normal Operation resumed
Response Measure: MTTR improved over EIAO levels
SARM Planning – Stakeholder Needs/Concerns - Need for Scalability and Performance
While performance is not a major concern in the early phases of development, the product is intended eventually to scale to handle the frequent evaluation of a large number of websites. Therefore some consideration must be given to the scalability of this system even if the implementation of that scalability is deferred. The modifiability requirements already give a functional organization to the system, and these functional boundaries can also serve as the initial basis for parallel operation by putting different functional components onto different machines. This will be analyzed later; the immediate task is to provide one or more concrete scenarios that capture this requirement.
The general scenario for performance is:
Source: One of a number of independent sources, possibly from within system
Stimulus: Periodic events arrive; sporadic events arrive; stochastic events arrive
Artifact: System
Environment: Normal mode; overload mode
Response: Processes stimuli; changes level of service
Response Measure: Latency, deadline, throughput, jitter, miss rate, data loss
As this applies to the need for scalability, one concrete scenario can be:
Source: stakeholder or funding agency
Stimulus: Additional funds for hardware are made available to the system to allow for
greater throughput
Artifact: The eGovMon System
Environment: Normal mode
Response: More hardware is made available and throughput is improved
Response Measure: While it is not necessary to specify an exact benchmark for the marginal improvement in throughput, there must be some curve of marginal improvement that clearly shows the architecture can be scaled as needed; i.e., the curve must be approximately linear and not exponential.
SARM Planning – Stakeholder Needs/Concerns - Need for Usability
While it must be possible for the researchers to use the system, there is currently little space between the developers and the research users. Therefore the primary persona for usability analysis
can be assumed to be one of the less experienced developers and someone familiar with Linux
and SQL. However it is also true that since the pool of workers on this project may be students
who are only on the project for a short time, the need to have people quickly understand and contribute to the project is important. Therefore the usability of this system is important to the project
in that the less senior member time spent training new members, the greater the productivity of
the entire team. One reasonable concrete scenario for this is:
Source: A Researcher or eGovernment advocate
Stimulus: Request for a new testrun
Artifact: eGovMon System
Environment: Normal Operation
Response: Testrun setup and initiated
Response Measure: For an advanced user who is familiar with the scripts and their use
this can currently be done in a few minutes. The goal is to make it as easy for a novice
user as for an experienced developer.
SARM First Iteration - Allocation Implementation View
The first iteration of the methodology called for documenting the Allocation Implementation
View. This view captures how the source code is stored in both the development environment and
production. For this project, it was only necessary to document the development environment
since the mapping from development to the production system is largely self-evident from the
development file system organization. This organization can be seen in Appendix A, Section 15.
The analysis in this iteration consisted of understanding the mapping from the Subversion tree to
the installed location on the disk of the target machine(s). This piece of documentation became an
important reference for the remainder of the recovery.
SARM First Iteration - Gather Evidence: eGovMon Source Materials
The primary source material for this project is the source code for the system as housed in a Subversion repository, which the eGovMon Project has made publicly available. In addition to the source code, the main web site for the project offers many other materials that are helpful for the architectural reconstruction. One such item is the diagram shown in Figure 5 on page 34, which presents the overall eGovMon structure. Some of the tasks performed while gathering evidence included:
Cataloging existing documentation and separating source documents from derivative works
Cloning the Subversion tree of source code for static analysis and review of comments
Installing and debugging an installation for dynamic analysis
Corresponding with developers to assist with installation
To avoid issues of duplicate element names, file elements are fully qualified with the directory path under which they run on the system, and classes and methods are fully qualified to the file that defines them (for example, /usr/bin/crawler:CrawlingServer.start).
SARM Second Iteration - Module Decomposition and Generalizes Views
The second iteration consisted of reconstructing some of the module views: the Decomposition view-type, the Generalization view-type, and the Layered view-type. These can be seen in Appendix A, Sections 8 and 10–11. The Generalization and Decomposition views were primarily a mechanical process of extracting the relationships between the modules and recording them. No tool was used to extract the modules and relationships from the source code, so for this case study the information was captured directly in the appropriate UML diagrams without building the database suggested by Kazman and others [Guo 1999] [Kazman 2003]. However, this required determining which modules to decompose. For this project the biggest challenge was determining where to draw the line between the system under study and its environment. As can be seen from the context diagram in Figure 3, there is a large amount of supporting software that could have been considered within the scope of the study.
The determination was made to keep the project within reasonable bounds and to focus on the
most interesting aspects of the system. A complete architectural study may have included all of
these components but the objectives of the re-engineering effort which motivated this recovery
did not require it. The final decision was to focus on the modules repeatedly mentioned in the architecture diagrams from the original developers: SiteURLServer, Crawler, Sampler, and the Web Access Metrics (WAMs).
One of the decisions the creator of the views must make is when to combine similar views to avoid needless duplication. Decomposition and Generalizes are two such views, since both are module views, and in finalizing the views to present they could have been combined. However, after preparing a couple of combined views, it was clear that combining them detracted from the visual communication of the structures without providing any new insight. It was therefore decided to keep the module views separate and not combine any of them.
The architecture of the main functionality of eGovMon did not exhibit any clear layered pattern.
However there was a significant amount of supporting software that needed to be in place for this
system to operate. The Layered view-type was used to capture information about the relationship
between the primary application code and the software needed to support it. The most difficult
view-type to construct of the Module view-types was the Uses view-type.
Since the primary language of the system under study was Python rather than Java, and since the intended reconstruction was to include both dynamic and static views, the set of element types and relations suggested in the literature was not sufficient. For static structure, the only element types that required capture were files, classes, and methods. The relations needed were generalizes, includes, contains, imports, and uses. A sample of the information collected is shown in Figure 7. However, this tabular capture was abandoned, since there was no benefit to recording the information both in tabular form and again in the graphic form of UML. In this case study, direct reference to the UML diagrams was just as helpful as a table lookup.
SARM Third Iteration - Module Uses View
The dynamic views would be captured primarily through direct examination of the code. The module view that would most directly support that analysis was the Uses view, which was the most difficult of the module views to capture.
While it is true that most uses are explicit calls, it is not true that all calls necessarily represent a
uses relationship nor that there must be a call for there to be a uses relationship. [Clements 2003,
p68] Therefore this analysis requires an understanding of the relationship between the modules.
The Uses view-type is given in Appendix A, Section 9. The analysis needed to determine the uses
relationship added significant understanding of the run-time behavior of the system and was invaluable in preparing the Component-and-Connector views. While a straw-man of the Uses view-
type was created in iteration 2, the finalization of the Uses view-type actually became a separate,
third, iteration in the case-study.
The second iteration documented the decomposition and generalization relationships. One relationship that was more challenging, and was deferred to this third iteration, was the uses relationship. Unlike decomposition and generalizes, uses cannot always be inferred lexically. The uses style “tells developers what other modules must exist in order for their portion of the system to work correctly.” [Clements 2003, p64] Clements details how the uses relationship is not a simple call, since one module can depend upon the proper functioning of another without an explicit call between them. In this system, some inter-module communication is mediated by the database, so the relationship between the modules is not directly available from lexical analysis. There were not many such cases in eGovMon, but the way analysis is started is one example.
In eGovMon, the SiteURLServer will not provide any of the siteURLs in a list for analysis until the user has communicated that the list is to be initiated through a call to the egovmondb.initiateTestrun method. This method puts the database into a state that allows the SiteURLServer to begin serving these URLs to the Crawler. The SiteURLServer therefore will not behave properly without this other module in place, and this cannot be seen except by analyzing what precondition the SiteURLServer depends upon and determining what creates that precondition. While it is trivially possible to stub out the initiateTestrun method by direct manipulation of the database, this case clearly shows the difficulty of depending solely on lexical analysis, even for the static module views.
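This database-mediated dependency can be illustrated with a minimal sketch. This is not eGovMon's actual code: an in-memory dict stands in for the database, and only the names initiateTestrun and SiteURLServer come from the system; every other name is hypothetical:

```python
# Sketch of a uses relationship mediated by shared database state:
# SiteURLServer depends on initiate_testrun having run, yet neither
# module ever calls the other directly, so lexical analysis misses it.

database = {"testrun_active": False, "site_urls": []}

def initiate_testrun(urls):
    """Analogous to egovmondb.initiateTestrun: flips the database state."""
    database["site_urls"] = list(urls)
    database["testrun_active"] = True

class SiteURLServer:
    def get_site_url(self):
        """Serves a URL only if the precondition set elsewhere holds."""
        if not database["testrun_active"]:
            return None          # no lexical link to initiate_testrun
        if database["site_urls"]:
            return database["site_urls"].pop(0)
        return None

server = SiteURLServer()
assert server.get_site_url() is None        # precondition not yet created
initiate_testrun(["http://example.org"])
assert server.get_site_url() == "http://example.org"
```

The uses relationship here is real, it is just invisible to a call-graph extractor, which is exactly the difficulty described above.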
Since the module views were directly constructed by manual analysis of the source code, the
Analyze and Prepare views activities occurred simultaneously. Of course there was no explicit
presentation of the views for iterations 1, 2 or 3 but instead this part of the iterative cycle was
used to assess and plan the next iteration.
The need for someone on the team who understands the language used in creating the system became clear in this iteration. While it is not necessary to understand every line of code, if there is
code that has architecture implications it cannot be ignored. For this case study it caused some
delays as pieces of code that were not immediately clear were researched.
Another challenge in this iteration was the different scales of time involved when the system is running. As the authors of [Clements 2003, p213] confess, it is easy to confuse what “time” it is when documenting systems. They also point out the importance of clear semantics when referring to compile-time and run-time constructs, advocating the use of the term “module” in the context of design and compile time while reserving the term “component” for run time. Their discussion of the reasons is found in [Clements 2003, p21].
For simple systems, keeping track of run time and compile time is trivial. But as the system gets bigger and more complex there are “times” between these two, and clearly differentiating and documenting them becomes challenging. In this case study that issue was seen in the multi-step process by which the system starts. The primary run-time structures are the servers, which persist until they are specifically stopped. But there are also run-time structures that exist only to set up the environment and then end. In this project those structures are ignored in the component-and-connector views.
After refining the scope to focus properly on the “times” of interest to the stakeholder, the module views were reviewed to capture specific information about each uses relationship between modules. As mentioned previously, the tabular form of data collection suggested by O’Brien and depicted in Figure 3 [O’Brien 2002] was not found generally helpful in this case study. But in collecting information about these uses relationships, that format was helpful in preparing for the creation of the final views found in the Appendix. Figure 7 is an example of the data collected when performing a uses analysis of the Crawler modules.
SARM Fourth Iteration: Dynamic Views
In the fourth iteration the primary source document was the Uses view from the previous iteration, supported by confirmation from examining the system in operation. Many of the processes vary in number at run time, and a key challenge was understanding the code constructs that enable this variability.
SARM Fourth Iteration - Gather Evidence
While the second and third iterations captured the basic static structures, this iteration provided greater understanding of the dynamic, or run-time, structures. Some of the modules exist only to establish the persistent run-time structures. For example, the module /usr/bin/crawlerserver (references to modules are given as references to the implementation view, which are references to the leaf within the Subversion tree) exists to initiate the crawlerserver and then exits. While it is no burden to document these modules in the module views, representing all of these initiating processes in the Component-and-Connector views was beyond the scope of the project. Yet it was necessary to analyze these modules to understand which of them initiated the persistent run-time processes.
In addition, like most web-centric servers, this one was multi-threaded. Modules like CrawlingServer spawned multiple threads. This is most clearly seen in the Component-and-Connector,
Communicating Processes View found in Appendix A, Section 14.
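The thread-spawning construct can be illustrated with a minimal sketch. The class name CrawlingServer echoes the system's naming, but the body is hypothetical and the crawl itself is a placeholder:

```python
import threading

class CrawlingServer:
    """Minimal sketch of a server that spawns one worker thread per site;
    not eGovMon's implementation, just the construct being described."""
    def __init__(self):
        self.results = []
        self.lock = threading.Lock()

    def crawl_site(self, site_url):
        # Placeholder for the real crawl; records which site was handled.
        with self.lock:
            self.results.append(site_url)

    def serve(self, site_urls):
        # One thread per requested site, as the multi-threaded design allows.
        threads = [threading.Thread(target=self.crawl_site, args=(u,))
                   for u in site_urls]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

server = CrawlingServer()
server.serve(["http://a.example", "http://b.example", "http://c.example"])
assert sorted(server.results) == sorted(
    ["http://a.example", "http://b.example", "http://c.example"])
```

Recognizing this construct in the code is what made the variable number of processes in the Communicating Processes view explainable.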
SARM Fourth Iteration - Analyze
The analysis in this iteration included the identification of architecture patterns and tactics. For example, the system makes extensive use of wrappers to incorporate pre-existing components. It also explicitly references servers in an obvious client-server pattern, which suggests deploying these servers to different machines to aid performance and capacity. This was further emphasized by the use of SOAP, which allows the servers to communicate among various machines on a shared network.
SARM Fourth Iteration - Generate Views
The challenge in this iteration was determining which views would meet the needs of this project. The advice Bass gives regarding which views should be generated [Bass 2003] is limited: he points out that the views must satisfy the needs the stakeholders have of the architecture documentation (p205) and that different views allow different types of reasoning about the system. Given that one conceptual view of this system is a linear process of accepting website addresses and generating an accessibility metric for each site, it makes sense to view the system as a pipe-and-filter. Yet one stipulation of a pure pipe-and-filter pattern is order preservation. While this is trivially true when only one website is presented for analysis, it is not true when the system is presented with many websites to evaluate simultaneously. In that case each website is analyzed in its own threads, and the time at which results are posted to the database depends on how many pages are at that site and the difficulty of analyzing them. The order in which results are posted cannot be predicted from the order in which the websites are presented to the system. Yet for throughput analysis the pipe-and-filter view still makes sense and is the most helpful grounding view of the system.
Once all the module views were documented, the next iteration was to recover the Component-and-Connector (run-time) views of the system. There was no straightforward way to perform this recovery or to decide the most important views in advance of creating them. The method chosen was to create all the view-types mentioned by Clements [Clements 2003] and then determine which ones captured and communicated the architecture most clearly. The Pipe-and-Filter view-type (Appendix A, Section 12) brought out the linear progression from the initial URL to the final analysis of that website. The Shared-Data view-type (Appendix A, Section 13) showed the central role played by the database in this design. In the end, these view-types were all kept, since each provided a helpful view of the design that aided understanding. It should be noted, however, that the Pipe-and-Filter view-type differs from a strict representation, since the multi-threaded nature of the design does not enforce the first-in-first-out behavior ordinarily associated with a Pipe-and-Filter pattern.
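Why the Pipe-and-Filter view is only approximate here can be shown with a small sketch, not eGovMon code: sites submitted in order may complete out of order when each runs in its own thread. The site names and "analysis times" are invented for illustration:

```python
import threading, time

completed = []
completed_lock = threading.Lock()

def analyze(site, seconds):
    time.sleep(seconds)              # stand-in for per-site analysis effort
    with completed_lock:
        completed.append(site)

# Submitted in order a, b, c -- but a is the slowest "site".
workers = [threading.Thread(target=analyze, args=(s, t))
           for s, t in [("a", 0.3), ("b", 0.1), ("c", 0.2)]]
for w in workers:
    w.start()
for w in workers:
    w.join()

# All results are present, but not in submission order.
assert sorted(completed) == ["a", "b", "c"]
assert completed != ["a", "b", "c"]   # FIFO order is not preserved
```

For throughput reasoning only the set of completed sites matters, which is why the Pipe-and-Filter view remains useful despite the ordering violation.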
The data gathering to determine the Component-and-Connector views was a combination of direct inference from reviewing the code and observation of the processes running on the machine. The starting place was the primary use-cases of the system as given by the operating instructions supplied by the developers. From the user actions specified in those instructions, it was possible to trace the causal relationships between the modules and reconcile them with the observed behavior. The Uses view-type (Appendix A, Section 9) gave easy reference to which modules were used at run time, and direct examination of the code revealed whether a given uses relationship was one in which multiple instances of a class were created or whether there was a single static module at run time. This was the longest and most difficult part of the architecture recovery, but once complete it provided enough evidence to construct all of the Component-and-Connector views directly.
The Analyze step of this iteration focused on how to represent the various connectors in each of the views. There is considerable latitude in what constitutes a “connector” in these views: a connector can be a pure communication component or a combination of communication components and the run-time counterparts of modules. Since SOAP (originally the Simple Object Access Protocol, a way to support inter-process communication) connectivity is implemented using various packages in Python, it became a challenge to determine exactly how to document these connectors. In the end, it became a matter of capturing enough information to suggest what was in these connectors without becoming overwhelmed with technical details. One important detail was which connectors were SOAP, since that had implications for variation in the Allocation Deployment view-type.
With the creation of the Component-and-Connector view-types, it became possible to perform some reasoning about the qualities of the system and the requirements those qualities satisfy. The documentation was not yet complete enough that a presentation of significant parts of the system architecture could have been made. For this iteration, the views were generated in parallel with the analysis, so there was no separate Prepare Views step.
SARM Fifth Iteration: Plus One Views
At the conclusion of iteration 4 the architecture became explainable to others, but it was not yet a complete deliverable. As with any document of this size and complexity, additional sections were needed to provide a complete and comprehensible document for final delivery. While this case study stopped short of a complete recovery of eGovMon, it was complete for its purpose. To ensure that no information was lost and that the document would be understandable to someone who had not worked on the system, a last iteration was needed to fill in the supporting sections. Those included a glossary, introductory material, and, insofar as it was possible, a discussion of how the architecture patterns contribute to, or inhibit, achieving the needs for the system.
The review of the artifacts of the system under study uncovered both explicit references to original requirements and clearly implied requirements. Where major architectural decisions could be associated with these requirements, this was done. There were, of course, a large number of specific requirements that had to be satisfied, but these are the major ones that had profound architectural influence. The connections between patterns and needs are captured in the document in Appendix A, in the views to which they apply; these sections are titled Architecture Background and numbered Section x.4.5.
The most obvious pattern that could be associated with a need was the client-server pattern. That, in conjunction with the use of SOAP, clearly showed the intent to satisfy the need for scalability and performance by enabling different components to be deployed to different machines. But in doing so, there was a tradeoff in operability and availability: having different components on different machines multiplies the places where problems can occur. When the throughput of the system is below expectations, the logs from each machine must be examined to find the bottleneck.
Another pattern used was the dynamic allocation of requests for WAMs. The system attempts to make requests of processes that appear able to handle additional load while ignoring processes that appear busy. This dynamic allocation maximizes capacity by using the processors efficiently while avoiding overloading the system and inducing thrashing.
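The tactic can be sketched as follows. This is an illustration, not eGovMon's actual allocation code; the worker names, load figures, and threshold are all invented:

```python
# Dynamic-allocation sketch: route each request to the worker that
# currently appears least loaded, skipping any worker whose load is at
# or above a busy threshold (to avoid overload and thrashing).

def pick_worker(loads, threshold):
    """Return the least-loaded worker under the threshold, or None."""
    candidates = {w: n for w, n in loads.items() if n < threshold}
    if not candidates:
        return None                   # everyone busy: hold the request
    return min(candidates, key=candidates.get)

loads = {"wam-1": 4, "wam-2": 1, "wam-3": 7}
worker = pick_worker(loads, threshold=5)
print(worker)  # wam-2: lightest load under the threshold
loads[worker] += 1                    # account for the dispatched request
```

Returning None when every worker is over the threshold is the part that prevents overload: the request waits rather than pushing a busy process into thrashing.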
Figure 7 Static relationships for Crawler
/etc/i ni t.d/cra wl ers erver
/us r/bi n/cra wl er
/us r/bi n/cra wl er
/us r/bi n/cra wl er
/us r/bi n/cra wl er
/us r/bi n/cra wl er
/us r/bi n/cra wl er
/us r/bi n/cra wl er
/us r/bi n/cra wl er
/us r/bi n/cra wl er:ma i n (method)
/us r/bi n/cra wl er:run-cra wl er (method)
/us r/bi n/cra wl er:Cra wl i ngServer (cl a s s )
/us r/bi n/cra wl er:Cra wl i ngServer.s tart (method)
/us r/bi n/cra wl er:Cra wl erServerT (cl a s s )
/us r/bi n/cra wl er:Cra wl erServerT.s tart (method)
/us r/bi n/cra wl er.Cra wl (method)
/us r/bi n/cra wl er.Cra wl (method)
/us r/bi n/cra wl erwra pper
/us r/bi n/cra wl erwra pper
/us r/bi n/cra wl erwra pper
/us r/bi n/cra wl erwra pper
/us r/bi n/cra wl erwra pper
/us r/bi n/cra wl erwra pper:s endToSa mpl er (method)
/us r/bi n/cra wl erwra pper:s endToSa mpl er (method)
/us r/bi n/cra wl erwra pper:s endToSa mpl er (method)
/us r/bi n/cra wl erwra pper:s endToSa mpl er (method)
/us r/bi n/cra wl erwra pper:s endToSa mpl er (method)
/us r/bi n/cra wl erwra pper:s endToSa mpl er (method)
/us r/bi n/cra wl erwra pper:s endToSa mpl er (method)
/us r/bi n/cra wl erwra pper:doCra wl (method)
/us r/bi n/cra wl erwra pper:doCra wl (method)
/us r/bi n/cra wl erwra pper:doCra wl (method)
/us r/bi n/cra wl erwra pper
/us r/bi n/cra wl erwra pper
/usr/lib/python2.5/site-packages/harvestmanklass
/usr/lib/python2.5/site-packages/harvestmanklass
uses
import
import
import
contains
contains
contains
contains
contains
uses
uses
contains
uses
contains
uses
uses
uses
contains
contains
contains
contains
contains
uses
uses
uses
uses
uses
uses
uses
uses
uses
uses
import
import
contains
contains
/usr/bin/crawler
os spawn start-stop-daemon --start --exec crawlerserver -- -d
/usr/bin/crawler:Log (class)
/usr/bin/crawler:KillingThread (class)
/usr/bin/crawler:CrawlingServerT (class)
/usr/bin/crawler:CrawlingServer (class)
/usr/bin/crawler:main (method)
/usr/bin/crawler:run-crawler (method)
/usr/bin/crawler:CrawlingServer (class)
/usr/bin/crawler:CrawlingServer.start (method)
/usr/bin/crawler:CrawlingServerT (class)
/usr/bin/crawler:CrawlingServerT.start (method)
start is inherited from superclass
/usr/bin/crawler:Crawli (method)
/usr/bin/siteurlserver.getSiteURL
SOAP call
/usr/bin/crawlerwrapper
os spawn Popen(crawlerwrapper…)
/usr/bin/crawlerwrapper:sendToSampler (method)
/usr/bin/crawlerwrapper:startCrawling (method)
/usr/bin/crawlerwrapper:siteSelection (method)
/usr/bin/crawlerwrapper:mkConfig (method)
/usr/bin/crawlerwrapper:doCrawl (method)
/usr/bin/crawlerwrapper:samplingserver.loadSample SOAP call
/usr/bin/eGovMonDB.getTestRunID (method)
/usr/bin/eGovMonDB.getSiteInTestrun (method)
/usr/bin/eGovMonDB.getSiteFinished (method)
/usr/bin/eGovMonDB.getPagesDownloaded (method)
/usr/bin/eGovMonDB.getAllStartURLsFromSite (method)
/usr/bin/eGovMonDB.getIsSmallSite
/??/harvestman.prepare
/usr/lib/python2.5/site-packages/harvestmanklass.eGovMonHarvestMan.setInstances (method)
/usr/lib/python2.5/site-packages/harvestmanklass.eGovMonHarvestMain (method)
/usr/lib/python2.5/site-packages/harvestman.*
/usr/lib/python2.5/site-packages/harvestmanklass.eGovMonHarvestMan
/usr/lib/python2.5/site-packages/harvestmanklass:HarvestMan (class)
/usr/lib/python2.5/site-packages/harvestmanklass:eGovMonHarvestMan (class)
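The spawning relations recorded above (start-stop-daemon launching the crawler server; /usr/bin/crawler spawning crawlerwrapper via Popen) can be sketched as follows. The program paths come from the recovered data; the argument lists are illustrative assumptions, and the dry_run flag avoids requiring the real binaries.

```python
# Hedged sketch of the recovered spawning relations; argument lists assumed.
import subprocess

def start_crawler_daemon(dry_run=True):
    """Launch the crawler server as a daemon, as the init script appears to."""
    cmd = ["start-stop-daemon", "--start", "--exec",
           "/usr/bin/crawlerserver", "--", "-d"]
    if dry_run:                      # return the command line instead of running it
        return " ".join(cmd)
    return subprocess.call(cmd)

def spawn_crawler_wrapper(site_url, dry_run=True):
    """Spawn one crawlerwrapper per site, as /usr/bin/crawler does via Popen."""
    cmd = ["/usr/bin/crawlerwrapper", site_url]
    if dry_run:
        return " ".join(cmd)
    return subprocess.Popen(cmd)

print(start_crawler_daemon())
print(spawn_crawler_wrapper("http://example.no"))
```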
Chapter 6
While the completely algorithmic recovery of a production system's architecture is highly unlikely, using even a simple methodology while performing the data gathering, analysis, view creation, and review provided structure to the effort that made it easier to assess progress, determine what still needed to be accomplished, and keep things organized. A good project manager will provide this structure, but will do so based on personal experience, since there is little available in the literature to guide them. What this methodology and its supporting case study suggest is that almost any architecture recovery effort will need to solve similar problems. While this methodology is not definitive, it gives a project manager a starting point for organizing such an effort.
This project suggests several areas of further research. While the case study provides anecdotal evidence that this methodology leads to a better outcome than would have occurred in its absence, this was not proven, and it is unlikely that a controlled experiment could ever be attempted to compare the outcomes of two teams, one using the methodology and one without. A single case study, however, does not verify the effectiveness of this methodology. At a minimum, there must be additional case studies that document the methodologies used by other teams that have accomplished architecture recovery.
Another possible direction for further study is the creation of a tool that incorporates the major elements of this methodology, with supporting features such as document management, alongside the kinds of tools already under development for the semi-automatic recovery of some system views.
Given the growth of social systems in the past few years and the inherently collaborative nature of a recovery effort, it would be interesting to attempt a distributed architecture recovery of an open source product that does not yet have a documented architecture.
As architecture becomes more commonly used in systems work, the need for architecture recovery will grow. Methodologies such as this one will surely receive further study and refinement in time.
A Partial Recovery of the eGovMon Software Architecture
This appendix contains a partial reconstruction of the eGovMon architecture as it existed at the
beginning of this project (March 2010). The format and much of the text in this appendix are taken directly from Appendix A of Documenting Software Architectures: Views and Beyond by
Paul Clements, Felix Bachmann, Len Bass, David Garlan, James Ivers, Reed Little, Robert Nord,
and Judith Stafford [Clements 2003]. Every attempt was made to present this architecture as the
authors of this text would have presented it. Note that much of the front matter in this appendix (Sections 1 – 3) is directly quoted from the Clements text [Clements 2003] to provide a
description of the organization in their words.
Section 1 Documentation Roadmap
Section 2 eGovMon Overview
Section 3 eGovMon Software Architecture View Template
Section 4 Mapping Between Views
Section 5 Directory
Section 6 Architecture Glossary and Acronym List
Section 7 Rationale, Background, and Design Constraints
Section 8 Module Decomposition View
8.1 Module Decomposition View Packet 1: The eGovMon System
8.2 Module Decomposition View Packet 2: The Crawler
8.3 Module Decomposition View Packet 3: The Sampler
8.4 Module Decomposition View Packet 4: The SiteURLServer
8.5 Module Decomposition View Packet 5: The WAMs
8.6 Module Decomposition View Packet 6: The eGovMonDB
Section 9 Module Uses View
9.1 Module Uses View Packet 1: The eGovMon System
Section 10 Module Generalization View
10.1 Module Generalization View Packet 1: The Crawler
10.2 Module Generalization View Packet 2: The Sampler
Section 11 Module Layered View
11.1 Module Layered View Packet 1: The eGovMon System
Section 12 Component and Connector Pipe-and-Filter View
12.1 Component and Connector Pipe-and-Filter View Packet 1: The eGovMon System
Section 13 Component and Connector Shared-Data View
13.1 Component and Connector Shared-Data View Packet 1: The eGovMon System
Section 14 Component and Connector Communicating-Processes View
14.1 C&C Communicating-Processes View Packet 1: eGovMon
Section 15 Allocation Implementation View
15.1 Allocation Implementation View Packet 1: Subversion Distribution
Section 16 Allocation Deployment View
16.1 Allocation Deployment View Packet: The eGovMon System
Section 17 Allocation Work Assignment View
Figure 1 eGovMon System Decomposition
Figure 2 Crawler Module Decomposition View
Figure 3 Sampler Module Decomposition
Figure 4 SiteURLServer Module Decomposition
Figure 5 WAMs Module Decomposition
Figure 6 eGovMonDB Module Decomposition
Figure 7 Crawler Module Generalization View
Figure 8 Sampler Module Generalization View
Figure 9 eGovMon System Component and Connector Pipe-and-Filter View
Figure 10 eGovMon System Component and Connector Shared-Data View
Figure 11 eGovMon Communicating Processes View
Section 1
1.1 Description of eGovMon Software Architecture Documentation Package
This section describes the structure and contents of the entire eGovMon software architecture
documentation package. This package contains the following Sections:
Section 1, eGovMon Software Architecture Documentation Roadmap, “lists and outlines the
contents of the overall documentation package and explains how stakeholder concerns can be
addressed by the individual parts. This is the first document that a new stakeholder should
read.” [Clements 2003, p 385]
Section 2, eGovMon System Overview, “gives a broad overview of the purpose and functionality of eGovMon. Architectural detail is purposely omitted from this overview; instead, the
emphasis is on the system’s background, its external interfaces, major constraints, and what
functions the system performs. The purpose is to help someone new to the project understand
what the architecture is trying to achieve.” [ibid.]
Section 3, View Template, explains the standard organization for the views. “The purpose is
to help a reader understand the information given in the views of this document.” [ibid.]
Section 4, Mapping Between Views, “draws comparisons between separate views of the
eGovMon architecture that are given. The mapping points out places where the views overlap
or have elements in common and resolves areas of apparent conflict between the views.”
Section 5, Directory, “is a look-up index of all the elements, relations, and properties in the
eGovMon architecture, listing where these items are defined and where they are used. The
purpose of the directory is to help a stakeholder quickly locate the definition of an architectural entity.” [ibid.]
Section 6, Architectural Glossary and Acronym List, “defines special terms and acronyms
used elsewhere in the architecture documentation package. This list is intended to supplement
the overall project glossary and acronym list.” [ibid. p386]
Section 7, Rationale, Background, and Design Constraints, would be where an architect
would explain the design rationale behind their architecture, including the most relevant
background information and imposed design constraints. Since the goal of this project is architecture reconstruction and not documentation, the sections which would ordinarily contain
the architecture background are left empty in this document. For a description of the inferred
architecture the reader is directed to the main document of which this is only an appendix.
Section 8, Module Decomposition View: “The module decomposition view shows how the system is decomposed into implementation units and, simultaneously, how the functionality of the system is allocated to those units. The elements of this view are modules. The relation is is-part-of.
“The decomposition view presents the functionality of a system in intellectually manageable pieces that are recursively refined to convey more and more details. Therefore, this style supports the learning process about a system. This view is a learning and navigation tool for newcomers in the project or other people who do not necessarily have the whole functional structure of the system memorized. The grouping of functionality shown in this view also builds a useful basis for defining configuration items within a configuration management framework.
The decomposition view is the basis for creating work assignments, mapping parts of a software
system onto the organizational units—teams—that will be given the responsibility for implementing and testing them. The module decomposition view also provides some support for analyzing
effects of changes at the software implementation level. But because this view does not show all
the dependencies among modules, you should not expect to do a complete impact analysis. Here,
views that elaborate the dependency relationships more thoroughly such as the module uses style
described later are required.” [ibid.]
Section 9, Module Uses View: “The uses view shows how modules are related to one another
by the uses relation, a kind of depends-on relation. A module uses another module if the correctness of the first depends on the correctness of the second. The view is used to help integrators and testers field incrementally larger subsets of the system.” [ibid. p 387]
Section 10, Module Generalization View: “The generalization view shows how classes, a
kind of module, are related to one another by inheritance, a kind of generalization/specialization relation.
“This view is used to show extension and evolution of architectures and individual elements. This
view is the predominant means for expressing the inheritance-based object-oriented design of
eGovMon. It shows where component and design reuse, or reuse with variation, occurs in the system. Like the decomposition view, the generalization view is also useful for analyzing the scope
of a change.” [ibid.]
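The kind of inheritance relation this view records can be sketched concretely. The class names HarvestMan and eGovMonHarvestMan, and the fact that start is inherited from the superclass, come from the recovered data; the method bodies below are illustrative assumptions, not recovered code.

```python
# Hedged sketch of the recovered generalization relation; bodies are assumed.
class HarvestMan(object):
    """Stand-in for the off-the-shelf HarvestMan crawler base class."""
    def start(self):
        return "crawl started"

class eGovMonHarvestMan(HarvestMan):
    """Project-specific specialization; start() is inherited unchanged."""
    def setInstances(self, n):   # method name recovered; body assumed
        self.instances = n
        return self.instances

bot = eGovMonHarvestMan()
print(bot.start())               # resolved via the superclass
```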
Section 11, Module Layered View: “The layered view shows how the“ eGovMon “software
is structured as a set of … abstract virtual machines. Lower layers provide abstract hardware,
network, and data transport facilities. Intermediate layers provide common facilities and object services. The highest layers encapsulate the application-dependent aspects of the system.
Elements of this view are layers, a kind of module. Layers are related by the allowed-to-use
relation, where use has the meaning given in the uses view.
“Layers are used to provide portability and modifiability. Specifically, the implementation of any
layer can be replaced without affecting other layers.” [ibid.]
Section 12, Component-and-Connector Pipe-and-Filter View: “The pipe-and-filter view
shows how data entering eGovMon flows through a series of transformations (that are not order preserving, in violation of a pure pipe-and-filter pattern) before being assigned to the appropriate data warehousing facilities. The pipe-and-filter view of eGovMon is a conceptual
view, meaning that in truth, the system is not structured as a series of pipes and filters in the
formal sense. In fact, the shared-data view is a higher fidelity picture of the system as it is
built. However, the pipe-and-filter view conveys a valuable conceptual picture to project
newcomers and users because it shows the transformation.
Elements are pipes and filters, both a kind of component. The relations shown in this view are
the attachments between pipes and filters.” [ibid.]
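The conceptual flow this view conveys can be sketched with Python generators, each stage consuming the previous stage's output. The stage names mirror the eGovMon subsystems, but the transformations themselves are placeholders, not recovered code.

```python
# Hedged sketch of the conceptual pipe-and-filter flow; transformations faked.
def crawl(urls):
    """Filter 1: discover pages reachable from each site URL (faked here)."""
    for url in urls:
        yield url, ["%s/page%d" % (url, i) for i in range(3)]

def sample(crawled):
    """Filter 2: select a subset of the discovered pages."""
    for url, pages in crawled:
        yield url, pages[:2]

def measure(sampled):
    """Filter 3: stand-in for the accessibility-metric computation."""
    for url, pages in sampled:
        yield url, len(pages)

pipeline = measure(sample(crawl(["http://example.no"])))
print(list(pipeline))            # [('http://example.no', 2)]
```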
Section 13, Component and Connector Shared-Data View: “The shared-data view shows the
system structured as a number of data accessors that, at runtime, read and write data in the
various eGovMon shared repositories. Elements are repositories, accessors, and the connectors between the two.
The shared-data view is used to help tune the system for performance to make sure that it can
handle the volume of ingested data, as well as service data processing and production requests in
a timely fashion. [ibid.]
Section 14, Component-and-Connector Communicating-Processes View: “The communicating-processes view represents the system as a set of concurrently executing units together
with their interactions. A concurrent unit is an abstraction of more concrete software platform
elements, such as tasks, processes, and threads. Any pair of concurrent units depicted in a
process style has the potential to execute concurrently, either logically on a single processor
or physically on multiple processors or distributed processors. Connectors enable data ex-
change between concurrent units and control of concurrent units, such as start, stop, synchronization, and so on.
“The communicating-processes view is used to perform concurrency-related analyses, including performance analysis and deadlock detection and prevention. It is also used as the basis
for allocation of the software to hardware processors in the deployment view. [ibid. p 388]
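The vocabulary of this view (concurrent units, a connector carrying data, explicit stop control) can be sketched minimally as two threads joined by a queue. This illustrates the style only; it assumes nothing about the real eGovMon process structure.

```python
# Hedged sketch of two concurrent units and a queue connector with a stop signal.
import threading
import queue

def producer(q):
    for i in range(3):
        q.put(i)
    q.put(None)                  # stop signal for the consumer

def consumer(q, results):
    while True:
        item = q.get()
        if item is None:
            break
        results.append(item * 2)

q = queue.Queue()
results = []
t1 = threading.Thread(target=producer, args=(q,))
t2 = threading.Thread(target=consumer, args=(q, results))
t1.start(); t2.start()
t1.join(); t2.join()
print(results)                   # [0, 2, 4]
```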
Section 15, Allocation Deployment View: “In the deployment view, processes and other
software elements are allocated to execution platforms, physical units that store, transmit, or
compute data. Physical units include processing nodes (CPUs), communication channels, and
data stores.
“The relation depicted in the deployment view is a special form of allocated to that shows on
which physical units the software elements reside. The allocation relation in” eGovMon “is static;
that is, it does not change at runtime.
“The view is used as the basis for performance analysis by, among other things, analyzing the volume and frequency of communication among software units on different processing elements. The view is also used to support memory capacity analysis; reliability analysis, by examining the effects of a failed processor or communication channel; and security analysis, by examining the way each platform is connected and potentially vulnerable to external threats.” [ibid.] It is also helpful in understanding the variability of deployment, since using multiple machines concurrently helps to improve the production system's throughput.
Section 16, Allocation Implementation View: “The implementation view shows how code
units, or modules, are mapped, or allocated, to the eGovMon development and implementation environment. It is used to help understand the current development environment and to
manage the day-to-day artifact production and storage process, including configuration management.” [ibid.]
1.1.1 How Stakeholders Can Use the Documentation
“This section lists the stakeholder roles of primary importance to” eGovMon “and how they
might use the documentation package to address their concerns.” [ibid. p389] At this time, the
stakeholders are developers and researchers seeking to extend the architecture to include new
metrics. Here are some suggested readings of this document for various roles:
A developer new to the project: Sections 1, 2, 3, 12, and 8-11
Section 2
2.1 Background
The home page for eGovMon (eGovernment Monitor) says:
“A massive digitalization of public services is underway. The main challenge in this development
is to ensure that the new online services effectively address the real needs of the citizens, businesses and governmental agencies. A system to monitor this development can give a better understanding of how to build good online service for citizens and enterprises.
The eGovMon project is developing methodology and software for quality evaluation of web services, in particular eGovernment services, concerning four areas:
Additionally eGovMon will provide a policy design tool based on simulation models.
“A set of well defined indicators are to be identified for each area, using a coherent assessment
methodology. Evaluation results will be gathered through automated tools when possible, and
supplemented by surveys, manual assessments and other sources as needed.
“eGovMon will deliver demonstrators for:
Tools for testing and improving selected parts of websites. The first release of the
eAccessibility Checker can be found at
A large scale, online demonstrator for benchmarking eGovernment services. An initial
version for accessibility evaluation is in experimental deployment.
An online simulation model to investigate the estimated impact of planned eGovernment
initiatives, and to see how the particular indicators impact the eGovernment status and
The demonstrators will be designed for the Norwegian context in a close collaboration with a
group of 20 selected Norwegian pilot municipalities.
“To draw on international results and to easily share project outcomes, eGovMon is built on an
open policy with open licensing for documents and for software; this way, the project can re-use
the results from other open projects and focus the overall resources more efficiently on the scientific issues. All software is open source and the project results are released under an open license.
“This openness can also foster synergy among related initiatives; in particular, interaction with the
eGovMoNet project is very useful for eGovMon methodology development.
“eGovMon is a user-driven innovation project co-funded by the Research Council of Norway, under the VERDIKT program focusing on innovation in the public sector (Project no.: Verdikt
183392/S10.) The project started in 2008 and will last for 3 years.”
(copied from August 7, 2010)
The primary researchers (A Nietzio, M H Snaprud, M G Olsen, N Ulltveit-Moe) on this project
had also worked on an earlier effort called the European Internet Accessibility Observatory
(EIAO). Here is what they said of EIAO:
“A demonstrator for large scale accessibility benchmarking - Access for all to the Information Society is a key goal for the European Union. With increasing use of the World Wide Web – in particular for government information and services, it is essential to secure access for all citizens. Yet
many people - especially those with a disability – meet barriers. Benchmarking can locate potential barriers and fuel the development towards Internet access for all.
“The European Internet Accessibility Observatory (EIAO) project has established a large scale accessibility benchmarking service. The distributed system consisting of 9 servers can process 100
web sites per day according to 26 of the automatable tests specified in Unified Web Evaluation
Methodology (UWEM 1.2).”
(copied from August 7, 2010)
The initial system for eGovMon was the end-state system for the EIAO project. The most notable
change between the end-state for EIAO and the intermediate version of eGovMon that is the subject of this architecture recovery is the porting of the application to the Debian environment and
the consolidation of the two databases into a single Postgres database.
Section 3
“Each eGovMon view is presented as a number of related view packets. A view packet is a small,
relatively self-contained bundle of information about the system or a particular part of the system,
rendered in the language—element and relation types—of the view to which it belongs. Two
view packets are related to each other as either parent/child—because one shows a refinement of
the information in the other—or as siblings—because both are children of another view packet.
This Section describes the seven-part standard organization that the documentation for view
packets—Sections 8 through 16—obeys:
1. “A primary presentation that shows the elements and their relationships that populate the
view packet. The primary presentation contains the information important to convey about
the system, in the vocabulary of that view, first. It includes the primary elements and relations
of the view packet but under some circumstances might not include all of them. For example,
the primary presentation may show the elements and relations that come into play during
normal operation, relegating error handling or exception processing to the supporting documentation.
“The primary presentation is usually graphical. If so, the presentation will include a key that
explains the meaning of every symbol used. The first part of the key identifies the notation: if
a defined notation is being used, the key will name it and cite the document that defines it or
defines the version of it being used. If the notation is informal, the key will say so and pro-
ceed to define the symbology and the meaning, if any, of colors, position, or other information carrying aspects of the diagram.
2. “Element catalog detailing at least those elements depicted in the primary presentation and
others that were omitted from the primary presentation. Specific parts of the catalog include
a. Elements and their properties. This section names each element in the view packet and lists the properties of that element. For example, elements in a module decomposition view have the property of “responsibility,” an explanation of each module’s role in the system, and elements in a communicating-processes view have timing parameters, among other things, as properties.
b. Relations and their properties. Each view has a specific type of relation that it depicts
among the elements in that view. However, if the primary presentation does not show
all the relations or if there are exceptions to what is depicted in the primary presentation, this section will record that information.
c. Element interface. An interface is a boundary across which elements interact or
communicate with each other. This section is where element interfaces are documented.
d. Element behavior. Some elements have complex interactions with their environment
and for purpose of understanding or analysis, the element’s behavior is documented.
For eGovMon, behavior of elements is specified in component-and-connector views
(Sections 12-14).
3. “Context diagram showing how the system depicted in the view packet relates to its environment.
4. “Variability guide showing how to exercise any variation points that are a part of the architecture shown in this view packet.
5. “Architecture background explaining why the design reflected in the view packet came to be.
As previously mentioned, this section will be omitted in this document since this is a reconstruction and therefore the original intention of the designer is unknown. If it were present,
the goal of this section would be to explain why the design is as it is and to provide a convincing argument that it is sound. Architecture background includes
a. Rationale. This explains why the design decisions reflected in the view packet were
made and gives a list of rejected alternatives and why they were rejected. This will
prevent future architects from pursuing dead ends in the face of required changes.
b. Analysis results. This documents the results of analyses that have been conducted,
such as the results of performance or security analyses, or a list of what would have
to change in the face of a particular kind of system modification.
c. Assumptions. This documents any assumptions the architect made when crafting the
design. Assumptions are generally about either environment or need.
“Environmental assumptions document what the architect assumes is available in the environment that can be used by the system being designed. They also include assumptions about invariants in the environment. For example, a navigation system architect might make assumptions
about the stability of Earth’s geographic and/or magnetic poles. Finally, assumptions about the
environment may be about the development environment: tool suites available or the skill levels
of the implementation teams, for example.
“Assumptions about need state why the design provided is sufficient for what’s needed. For example, if a navigation system’s software interface provides location information in a single geographic frame of reference, the architect is assuming that is sufficient, and that alternative frames of reference are not useful.
6. “Other information. This section includes non-architectural and organization-specific information.
7. “Related view packets. This section will name other view packets that are related to the one
being described in a parent/child or sibling capacity. [ibid. pp 397-399]
Section 4
Given the relatively small size of this system, there is little chance that the reader will fail to see immediately the relationships between the view packets. Had this been a larger system, this section would contain a cross-reference explaining the relationships between the views. Instead, I will offer a more casual discussion of these relationships.
There is no reason why all the module views could not be combined into one consolidated view using UML. However, such a consolidated view becomes very difficult to read given the abundance of relationships among these modules. The uses view by itself requires some reflection to fully comprehend and is best presented on its own.
Since the component and connector views represent run-time components rather than the static modules used to instantiate them, their relationship back to the module views is more complicated. In particular, the system is multi-threaded, so multiple threads are programmatically spawned based on static configuration parameters; the same is true of the crawlers. However, the diagrams use a conventional naming that includes the module name for the instantiated components, so little confusion should result.
The allocation views lack the work assignment view, since this is an architecture recovery and not a design with the intent of development. The deployment view documents how the components from the component and connector view type can be deployed to hardware, although only two alternatives are presented. The implementation view provides the strict hierarchical view of the modules as they are found in the Subversion directory.
Section 5
“The directory is an index of all the elements, relations and properties that appear in any of the
views. This index is intended to help a reader find all places where a particular kind of element,
relation, or property is used.” [ibid. p 403]
Section 6
W3C – World Wide Web Consortium
UWEM – Unified Web Evaluation Methodology
WCAG – Web Content Accessibility Guidelines
WAM – Web Accessibility Metric
Barrier – some aspect of a web page that makes the content inaccessible to a differently-abled user
HTML – Hypertext Markup Language
CSS – Cascading Style Sheet
C&C – Component and Connector
SOAP – This once stood for Simple Object Access Protocol. In this system it is the mechanism that allows different components on different machines to communicate.
URL – Uniform Resource Locator
EIAO – European Internet Accessibility Observatory
Section 7
When this document is produced as part of architecture documentation, this section contains narrative that explains the primary motivators behind the major architectural decisions. As part of an architecture recovery, however, this section makes less sense. Where I felt that the architecture choices were clearly motivated by requirements I could infer, I have documented them. The reader should therefore be aware that this section in no way represents what the original designer of this architecture may have had in mind; it is only my best guess.
Client-Server and SOAP
The clearest architecture decision was the use of a client-server pattern with SOAP. Since comments in some of the academic papers discuss the throughput of the system and its causes, it is safe to assume that capacity and performance were considerations. A client-server arrangement allows the various functional transformations, from the website URL to the final metric for that URL, to be deployed to different machines. In this architecture the Postgres database, the SiteURLServer, the Crawler, the Sampler, and the WAMs can be run on separate machines connected to the same IP subnet.
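As an illustration of what a SOAP connector reduces to on the wire, the sketch below hand-builds a minimal envelope for a call like siteurlserver.getSiteURL. The real system presumably used a SOAP library; the method name comes from the recovered data, while the parameter name and the bare envelope structure are assumptions.

```python
# Hedged sketch of a minimal SOAP 1.1 envelope; parameter names assumed.
def soap_envelope(method, **params):
    """Build the XML body a SOAP client would POST for a remote call."""
    body = "".join("<%s>%s</%s>" % (k, v, k) for k, v in sorted(params.items()))
    return (
        '<?xml version="1.0"?>'
        '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
        "<soap:Body><%s>%s</%s></soap:Body>"
        "</soap:Envelope>" % (method, body, method)
    )

envelope = soap_envelope("getSiteURL", testRunID=42)
print(envelope)
```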
Dynamic Allocation of Sites to Available WAM Servers
In the Sampler, random samples of website URLs are evaluated for their metric. A design choice was made to have many WAMs running simultaneously in order to provide greater capacity. The decision to assign each URL to an available WAM server represents a key design decision.
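This allocation decision can be sketched as a free-pool dispatcher: each sampled URL goes to whichever WAM server is next available, rather than being partitioned statically. The server names and the readiness model below are illustrative assumptions.

```python
# Hedged sketch of dynamic URL-to-WAM allocation via a FIFO pool of free servers.
from collections import deque

def allocate(urls, servers):
    """Assign each URL to the next available WAM server."""
    free = deque(servers)
    assignments = []
    for url in urls:
        server = free.popleft()  # take whichever server is free next
        assignments.append((url, server))
        free.append(server)      # it rejoins the pool when its work is done
    return assignments

print(allocate(["u1", "u2", "u3"], ["wam-a", "wam-b"]))
```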
Encapsulation of Previously Created Tools into Wrappers
Several other products appear to have existed prior to the creation of EIAO, or were possibly developed concurrently but independently. To use their functionality, these products are encapsulated in wrapper classes. In particular, HarvestMan is encapsulated in the crawler wrapper. Another is RelaxedWAM ( which was written using Java and is encapsulated in the relaxedwam module.
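The wrapper pattern described here can be sketched as follows. The wrapped class is a placeholder standing in for an OTS tool's idiosyncratic API (the real HarvestMan interface differs); only the shape of the pattern is asserted.

```python
# Hedged sketch of the wrapper pattern; the wrapped API is a placeholder.
class ThirdPartyCrawler(object):
    """Stand-in for an OTS component with its own configuration-driven API."""
    def run_with_config(self, config):
        return ["%s/p%d" % (config["start"], i) for i in range(config["limit"])]

class CrawlerWrapper(object):
    """Adapts the OTS tool to the interface the rest of the system expects."""
    def __init__(self):
        self._tool = ThirdPartyCrawler()

    def crawl(self, start_url, limit=2):
        return self._tool.run_with_config({"start": start_url, "limit": limit})

print(CrawlerWrapper().crawl("http://example.no"))
```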
Section 8
The module decomposition view consists of six view packets. View packet 1 shows the decomposition of the entire eGovMon system into a group of five subsystems and provides the context diagram for the subsequent view packets. Subsequent view packets (2-6) show the further decomposition of each of the subsystems.
8.1 Module Decomposition View Packet 1: The eGovMon System
8.1.1 Primary Presentation
Figure 8 eGovMon System Decomposition
8.1.2 Element Catalog
Elements and Their Properties
Properties of the eGovMon modules are
Name, given in the following table
Responsibility, given in the following table
Visibility; all elements are visible across the entire system unless otherwise noted
Implementation information: See the implementation view (Section 17)
Element Name
Element Responsibility
The Crawler
The Crawler is responsible for accepting a URL and recursively identifying all
referenced URLs accessible from this URL until 6000 URLs have been found.
The crawling can be constrained by inclusion and exclusion rules
The Sampler
The Sampler is responsible for randomly selecting URLs identified by the
Crawler, downloading the page and calling the WAMs to perform analysis of
that page. It stops once it has found and analyzed 600 pages.
The SiteURLServer
The SiteURLServer is responsible for buffering all the site URLs that will be processed in a testrun and passing them out to The Crawler as needed.
The WAMs
The WAMs (web accessibility metric calculators) are responsible for analysing the code of the web page and performing analysis on that code per the Unified Web Evaluation Methodology, as well as computing other metrics of interest to the project.
The eGovMon Database
The eGovMon Database is responsible for the persistent storage of the eGovMon data and functions related to the integrity and use of that data.
Relations and Their Properties
The relation type in this view is is-part-of. There are no exceptions or additions to the relations
shown in the primary presentation.
Element Interfaces
Element interfaces for these elements are given in subsequent decompositions.
Element Behavior
Not applicable.
8.1.3 Context Diagram
This diagram will be the context diagram for the subsequent module views.
8.1.4 Variability Guide
8.1.5 Architecture Background
8.1.6 Other Information
8.1.7 Related View Packets
Parent: none
Children:
Module Decomposition View Packet 2: The Crawler (8.2)
Module Decomposition View Packet 3: The Sampler (8.3)
Module Decomposition View Packet 4: The SiteURLServer (8.4)
Module Decomposition View Packet 5: The WAM (8.5)
Module Decomposition View Packet 6: The eGovMon Database (8.6)
Siblings: None in this view. View packets in other views that express the same scope as this
one—namely, the whole system—include
Component-and-Connector: Pipe-and-Filter, Shared Data, Communicating Processes
Allocation: Deploys, Implements
8.2 Module Decomposition View Packet 2: The Crawler
8.2.1 Primary Presentation
Legend: UML
Figure 9 Crawler Module Decomposition View
8.2.2 Element Catalog
Elements and Their Properties
Element Name
Element Responsibility
/etc/init.d/crawlerserver
The responsibility of this element is to pass messages to, and initiate when needed, the /usr/bin/crawlerserver element.
/usr/bin/crawlerserver
The responsibility of this element is to provide the persistent process which provides the crawler service to the system.
The responsibility of this class is to establish and maintain the crawler environment and interpret command line input.
The responsibility of this class is to provide one thread to initiate a crawlerwrapper.
The responsibility of this class is to
monitor the CrawlingServerT and ensure
that it is still active.
The responsibility of this class is to provide a log function to this element.
The responsibility of this script is to encapsulate the HarvestMan function. It builds the HarvestMan configuration file and then instantiates the Crawler object with this configuration file.
The responsibility of this class is to use the HarvestMan crawler to perform the crawl for the given URL.
Relations and Their Properties
The relation type in this view is is-part-of. There are no exceptions or additions to the relations shown in the primary presentation.
Element Interfaces
/etc/init.d/crawlerserver is the command line interface for this OS script that accepts command
line arguments.
Its syntax is
/etc/init.d/crawlerserver command
where command is (start | stop | restart)
Its meaning is to cause The Crawler to either begin operating, cease operating or to restart its operation depending upon the argument.
There are no restrictions on the issuance of the command.
There are no locally defined datatypes.
There is no error handling in this module.
There is no variability in this module.
There are no quality attributes attached to this module.
This module requires the existence of the /usr/bin/crawlerserver module.
(rationale)This module exists to provide an OS interface for the user to control the operation of
The Crawler.
The usage of this module is limited to these commands: start, stop, restart.
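The start | stop | restart contract documented above can be sketched as a small Python dispatcher. The real element is an OS init script; the toy state dictionary below stands in for actual process control.

```python
# Sketch of the start|stop|restart contract of /etc/init.d/crawlerserver.
# The state dict stands in for real process control in the init script.
def handle_command(command, state):
    """Apply a start|stop|restart command to a toy service-state dict."""
    if command == "start":
        state["running"] = True
    elif command == "stop":
        state["running"] = False
    elif command == "restart":
        state["running"] = False   # stop...
        state["running"] = True    # ...then start again
    else:
        raise ValueError(f"usage: crawlerserver (start|stop|restart), got {command!r}")
    return state
```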
/usr/bin/crawlerserver is the command line interface for this module (implemented as a Python script).
Interface identity: /usr/bin/crawlerserver command line
Resource provided: a crawler server
Resource Syntax: from command line, /usr/bin/crawlerserver options
where options can be (-d|--daemonise|-h|--help|--debug)
Resource semantics: causes the persistent crawlerserver to launch. The daemonise option causes
the service to operate as a daemon disconnected from a command line, the help option causes the
module to respond with the syntax for the command and the debug option sets the debug flag in
the module
Resource usage restrictions: none
Locally defined data types: none
Error handling: none
Variability provided: none
Quality attribute characteristics: none
What the element requires:
Rationale and design issues
Usage guide: the interface is designed to be run from the /etc/init.d/crawlerserver module, although there is nothing to prevent its execution directly from the command line.
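The documented option surface ( -d | --daemonise | -h | --help | --debug ) can be approximated with argparse; the module's actual parsing code was not recovered, so this is a sketch of the contract rather than the implementation. Note that argparse supplies -h/--help automatically.

```python
# Sketch of /usr/bin/crawlerserver's documented options; -h/--help is
# provided by argparse itself.
import argparse

def build_parser():
    parser = argparse.ArgumentParser(prog="/usr/bin/crawlerserver")
    parser.add_argument("-d", "--daemonise", action="store_true",
                        help="detach from the terminal and run as a daemon")
    parser.add_argument("--debug", action="store_true",
                        help="set the module's debug flag")
    return parser
```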
/usr/bin/crawlerwrapper is a command issued to the OS. One is spawned for each site which is to
be crawled.
Interface identity: /usr/bin/crawlerwrapper
Resource provided: command line options
Resource Syntax: site testrunid
Resource semantics: the site argument is the URL which represents the root for this site. The testrunid is a reference to the previously created testrun in the database.
Resource usage restrictions: none
Locally defined data types: none
Error handling: none
Variability provided: none
Quality attribute characteristics: none
What the element requires: no extra requirements for interface
Rationale and design issues: it allows each crawl to run in a separate OS process.
Usage guide: ordinarily this command is executed by the crawlerserver
Interface identity
Resource provided
Resource Syntax
Resource semantics
Resource usage restrictions
Locally defined data types
Error handling
Variability provided
Quality attribute characteristics
What the element requires
Rationale and design issues
Usage guide
Element Behavior
Not applicable.
8.2.3 Context Diagram
See Figure 1.
8.2.4 Variability Guide
8.2.5 Architecture Background
8.2.6 Other Information
8.2.7 Related View Packets
8.3 Module Decomposition View Packet 3: The Sampler
8.3.1 Primary Presentation
Legend: UML
Figure 10 Sampler Module Decomposition
8.3.2 Element Catalog
Elements and Their Properties
Element Name
Element Responsibility
/etc/init.d/samplingserver
The responsibility of this Unix shell script is to launch /usr/bin/samplingserver and pass it the command line arguments.
/usr/bin/samplingserver
The responsibility of this Python module is to act as the persistent process that provides the sampling service to the system.
The responsibility of this Python
module is to provide the sampling for
a specific web page.
The responsibility of this class is to
provide the supervisory functions for
the sampler including starting, stopping, and starting the threads.
The responsibility of this class is to
provide a single processing thread for
the sampling of web pages.
The responsibility of this class is to
download a webpage from the internet and place it into a queue.
The responsibility of this class is to
take the results of a sampling and
analysis and place it into the database.
The responsibility of this class is to introduce the randomness needed to ensure that the sampling is a valid random sampling of the crawled pages.
The responsibility of this class is to
monitor the various threads, take action when they go silent and terminate when a sufficient number of
samples have been taken.
The responsibility of this class is to take a webpage found in a queue, perform the analysis of that webpage and place the results in another queue.
Relations and Their Properties
The relation type in this view is is-part-of. There are no exceptions or additions to the relations shown in the primary presentation.
Element Interfaces
/etc/init.d/samplingserver is called from the Linux command line and accepts arguments…
Element Behavior
Not applicable.
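The sampler's thread pipeline catalogued above, a downloader feeding a queue, an analysis stage turning pages into results, and a writer draining results into storage, can be sketched as follows. The function names and the fake download/analysis/insert steps are illustrative, not recovered code.

```python
# Sketch of the sampler pipeline: download -> analyze -> store, each stage
# in its own thread, connected by queues, shut down by None sentinels.
import queue
import threading

def run_pipeline(urls):
    pages, results, stored = queue.Queue(), queue.Queue(), []

    def downloader():
        for url in urls:
            pages.put((url, f"<html>{url}</html>"))   # stand-in for an HTTP GET
        pages.put(None)                                # sentinel: no more pages

    def analyzer():
        while True:
            item = pages.get()
            if item is None:
                results.put(None)                      # propagate shutdown
                return
            url, body = item
            results.put((url, len(body)))              # stand-in for a WAM metric

    def writer():
        while True:
            item = results.get()
            if item is None:
                return
            stored.append(item)                        # stand-in for a DB insert

    threads = [threading.Thread(target=f) for f in (downloader, analyzer, writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stored
```

The queues decouple the stages, so a slow download does not block the database writer and vice versa.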
8.3.3 Context Diagram
See Figure 1.
8.3.4 Variability Guide
8.3.5 Architecture Background
8.3.6 Other Information
8.3.7 Related View Packets
8.4 Module Decomposition View Packet 4: The SiteURLServer
8.4.1 Primary Presentation
Legend: UML
Figure 11 SiteURLServer Module Decomposition
8.4.2 Element Catalog
Elements and Their Properties
/etc/init.d/siteurlserver
The responsibility of this Linux shell script is to launch /usr/bin/siteurlserver and pass it the command line arguments.
/usr/bin/siteurlserver
The responsibility of this Python module is to provide the persistent service for serving up individual URLs within the scope of the active testrun when requested.
The responsibility of this Python module
is to accept the arguments from the
command line for the urllistid, testrunid
and name of the responsible party for
this test run then initiate the processes
which will execute the test run.
The responsibility of this class is to provide logging for the module.
The responsibility of this class is to provide the main functionality for the persistent SiteURLServer.
Relations and Their Properties
The relation type in this view is is-part-of. There are no exceptions or additions to the relations shown in the primary presentation.
Element Interfaces
The /etc/init.d/siteurlserver module is called from the Linux command line using an argument
which provides a command for the SiteURLServer. That command may be start, stop, or restart.
Element Behavior
8.4.3 Context Diagram
See Figure 1.
8.4.4 Variability Guide
8.4.5 Architecture Background
8.4.6 Other Information
8.4.7 Related View Packets
8.5 Module Decomposition View Packet 5: The WAMs
8.5.1 Primary Presentation
Legend: UML
Figure 12 WAMs Module Decomposition
8.5.2 Element Catalog
Elements and Their Properties
Element Name
Element Responsibility
/etc/init.d/relaxedwam_0
This logical link provides a unique name that can be used by subsidiary modules to create multiple instances of the WAM.
The responsibility of this Linux script is to initiate /usr/bin/relaxedwam and pass its command line arguments.
The responsibility of this Python module is to verify the environment for the WamServer and then initiate it under Jython.
The responsibility of this Python module is to provide the persistent service for providing analysis of web pages per UWEM and returning the results of that analysis. It must run under Jython.
The responsibility of this class is to provide a log function to
this element.
The responsibility of this class is to retrieve the group and
user ids from a file and return them.
The responsibility of this class is to register the check function with the SOAP server, perform the individual WAM
calculations and then return those results.
Relations and Their Properties
The relation type in this view is is-part-of. There are no exceptions or additions to the relations shown in the primary presentation.
Element Interfaces
SiteURLServer: interfaces provided
startTestrun (testrunid): void
getSiteURL ():url:string,testrunid:string
ping (): boolean
keepAlive (pid) : void
getSilentSites () : [site,pid]
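The recovered SiteURLServer interface can be restated as a plain Python class. The method signatures follow the list above; the bodies, and the watch() helper, are illustrative stand-ins for the real SOAP-exposed implementation.

```python
# Toy restatement of the recovered SiteURLServer interface; bodies are
# illustrative, and watch() is a helper invented for this sketch.
class SiteURLServer:
    def __init__(self, urls):
        self._urls = list(urls)
        self._testrunid = None
        self._watched = {}    # pid -> site being crawled (sketch bookkeeping)
        self._alive = set()   # pids that have sent a keepAlive

    def startTestrun(self, testrunid):
        self._testrunid = testrunid

    def getSiteURL(self):
        # Returns (url, testrunid); None once the buffered URLs are exhausted.
        if not self._urls:
            return None
        return self._urls.pop(0), self._testrunid

    def ping(self):
        return True

    def keepAlive(self, pid):
        self._alive.add(pid)

    def watch(self, pid, site):
        # Sketch helper: the real server learns pids as crawls start.
        self._watched[pid] = site

    def getSilentSites(self):
        # Per the recovered signature, returns [site, pid] pairs for crawls
        # that have stopped reporting.
        return [(site, pid) for pid, site in self._watched.items()
                if pid not in self._alive]
```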
samplingServer: interfaces provided
WAMServer: interfaces provided
eGovMonDB: interfaces provided
getUnfinishedSites
Element Behavior
8.5.3 Context Diagram
See Figure 1.
8.5.4 Variability Guide
8.5.5 Architecture Background
Design Rationale
Results of Analysis
Assumptions
8.5.6 Other Information
8.5.7 Related View Packets
8.6 Module Decomposition View Packet 6: The eGovMonDB
8.6.1 Primary Presentation
Legend: UML
Figure 13 eGovMonDB Module Decomposition
8.6.2 Element Catalog
Elements and Their Properties
Element Name
Element Responsibility
The responsibility of this Python module is to accept the arguments from the command line for the urllistid and URL that
should be added to that list in the database.
The responsibility of this Python module is to create a new list
in the database and return that new listid.
This Python module provides the database access functions for the rest of the system. It is meant to be included by any other module.
Relations and Their Properties
The relation type in this view is is-part-of. There are no exceptions or additions to the relations shown in the primary presentation.
Element Interfaces
Element Behavior
8.6.3 Context Diagram
See Figure 1.
8.6.4 Variability Guide
8.6.5 Architecture Background
8.6.6 Other Information
8.6.7 Related View Packets
Section 9
9.1 Module Uses View Packet 1: The eGovMon System
9.1.1 Primary Presentation
9.1.2 Element Catalog
Elements and Their Properties
The elements in this view are the modules of the eGovMon system as defined in Section 8.
Relations and Their Properties
The relation type in this view is uses. An element uses another element if the correctness of the first depends on a correct implementation of the second being present. It is not enough for an element to invoke another element; if the behavior of the invoked element does not affect the correctness of the invoking element, it is not used. Similarly, an element may use another element that it does not invoke, if it relies on that element to take an action or to perform a function autonomously.
There are no exceptions or additional uses relations among the element in this view packet beyond those shown in the primary presentation.
“The uses relation resembles, but is decidedly not, the simple calls relation provided by
most programming languages…’uses’ is not ‘calls’ or ‘invokes.’ Likewise, ‘uses’ is different from other depends-on relations, such as includes or inherits-from. The includes relation deals with compilation dependencies but need not influence runtime correctness.
The inherits-from relation is also usually a preruntime dependency not necessarily related
to uses.”
[Clements 2003, p68]
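The distinction Clements draws can be made concrete in a few lines: a function may call a logger without using it (its result is correct even if logging does nothing), while it genuinely uses the parser whose output it depends on. All names here are illustrative.

```python
# 'uses' is not 'calls': double_from_text calls both helpers, but only
# parse_int affects its correctness.
def parse_int(text):
    return int(text)

def log(msg):
    pass  # imagine a file write; failures here would not change any result

def double_from_text(text):
    log(f"doubling {text}")       # calls, but does not 'use', the logger
    return 2 * parse_int(text)    # 'uses' parse_int: correctness depends on it
```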
“Some C programmers like to compare the Python module import operation to a C #include, but they really shouldn’t—in Python, imports are not just textual insertions of one file into another. They are really runtime operations that perform three distinct steps the first time a program imports a given file: Find the module’s file; Compile it to byte code (if needed); Run the module’s code to build the objects it defines.” [Lutz, Learning Python, O’Reilly, p. 533]
Element Interfaces
Interfaces for the elements shown in this view are specified under the corresponding elements in
the module decomposition view, Section 8. Element Behavior
Not applicable.
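The three import steps quoted from Lutz can be demonstrated directly: importing a freshly written module file executes its top-level code, which is why import is a runtime dependency here. The module name and path below are temporary and illustrative.

```python
# Demonstrates that a Python import runs the module's top-level code.
import importlib
import os
import sys
import tempfile

def import_runs_code():
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "egov_demo_mod.py"), "w") as f:
            f.write("SIDE_EFFECT = 6 * 7\n")   # executed at import time
        sys.path.insert(0, d)
        importlib.invalidate_caches()           # make sure the new path is seen
        try:
            return importlib.import_module("egov_demo_mod").SIDE_EFFECT
        finally:
            sys.path.remove(d)
            sys.modules.pop("egov_demo_mod", None)
```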
9.1.3 Context Diagram
See Figure 1.
9.1.4 Variability Guide
9.1.5 Architecture Background
9.1.6 Other Information
9.1.7 Related View Packets
Parent: None in this view. View packets in other views whose scope is the entire system include
Module Decomposition View Packet 1: The eGovMon System (Section 8, section 8.1)
Children: None
Siblings: None
Section 10
10.1 Module Generalization View Packet 1: The Crawler
10.1.1 Primary Presentation
Legend: UML
Figure 14 Crawler Module Generalization View
10.1.2 Element Catalog
There are no new elements introduced in this view packet. See Section 8 for the element catalog.
10.1.3 Context Diagram
10.1.4 Variability Guide
10.1.5 Architecture Background
10.1.6 Other Information
10.1.7 Related View Packets
10.2 Module Generalization View Packet 2: The Sampler
10.2.1 Primary Presentation
Legend: UML
Figure 15 Sampler Module Generalization View
10.2.2 Element Catalog
Elements and Their Properties
Relations and Their Properties
Element Interfaces
Element Behavior
10.2.3 Context Diagram
See Figure 1.
10.2.4 Variability Guide
10.2.5 Architecture Background
10.2.6 Other Information
10.2.7 Related View Packets
Section 11
11.1 Module Layered View Packet 1: The eGovMon System
11.1.1 Primary Presentation
Layer 0: Presentation Layer
Layer 1: eGovMon Application Domain
Layer 2: Supporting Projects (HarvestMan, Relaxed, CSSWam, W3CValidatorAWAM)
Layer 3: Packages (Jython, Python)
Layer 4: Middleware (SOAP)
Layer 5: Datalink/Physical (socket,psycopg)
Layer 6: Database (PostgreSQL)
Layer 7: Language (Java, Python)
Layer 8: OS (Debian)
11.1.2 Element Catalog
Elements and Their Properties
Common Facilities are defined as
those interfaces and packages shared
across applications. Common Facilities
includes routines to handle regular
expressions (re), parse HTML, URL,
SGML files, and other similar packages.
base64, binascii, common, cStringIO, md5, Queue, pickle, StringIO, urllib, urllib2, urlparse, urlparser, urlqueue, urltypes, utils
SOAP, Simple Object Access Protocol, provides communication services between processes. This middleware makes it possible to deploy services to different machines within the same IP subnet.
The Language Layer provides the basic
support for the languages used in this
system. This support includes access to
lower level functions from the OS Layer.
Java, Jython, Python, jythonrunner, os, sys, time, traceback, threading
Affiliated Projects groups all the functionality needed by eGovMon that was
developed by other project teams.
While nearly COTS, these projects are
open sourced which means that the
packaged version in eGovMon may
have some differences from the normative project release.
Relaxed, Tidy, HarvestMan, W3CValidatorAWAM,
The Database Layer contains the modules needed to provide a persistent
datastore to the system.
egovmondb, psycopg2
OS Layer
The OS Layer contains all modules of
the kernel and the distro. This includes
all networking and file access functions.
Debian 5.3 (Lenny)
Relations and Their Properties
The relation in the layered view is allowed-to-use. Software in a layer is allowed to use other software in the same layer. Software in a layer is allowed to use software in the layer immediately below, as shown in the primary presentation.
Element Interfaces
[omitted]
Element Behavior
Not applicable.
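The allowed-to-use rule stated above, with layer numbers increasing downward as in the primary presentation (Layer 0 Presentation through Layer 8 OS), can be sketched as a one-line check; the checker is illustrative, not part of the recovered system.

```python
# Sketch of the layered view's allowed-to-use rule. Layer numbers increase
# downward, so "immediately below" layer n is layer n + 1.
def allowed_to_use(user_layer, used_layer):
    """Layer n may use layer n (itself) or layer n+1 (the layer below)."""
    return used_layer in (user_layer, user_layer + 1)
```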
11.1.3 Context Diagram
See Figure 1.
11.1.4 Variability Guide
11.1.5 Architecture Background
11.1.6 Other Information
11.1.7 Related View Packets
Section 12
12.1 Component and Connector Pipe-and-Filter View Packet 1: The eGovMon System
12.1.1 Primary Presentation
Legend: UML
Figure 16 eGovMon System Component and Connector Pipe-and-Filter View
12.1.2 Element Catalog
Elements and Their Properties
Properties of eGovMon filter components are given in the following table.
Element Name
Element Responsibility
addOneSite
Before a testrun can be initiated, the URLs which represent the main access point to the websites must be added to a list. Both the list id (previously noted from either a new list or a prior list) and the URL to be added to the list are entered.
addURLList
Before a testrun can be initiated, a list must be created which will contain the site URLs which will be analyzed during the test.
startegovmon
A testrun is initiated by executing this procedure with the testrunid as an argument.
egovmondb
The database is the primary data repository. For this view it is treated as a series of filters which act more like pipes to move the data from one filter to another.
siteurlserver
When the startegovmon notifies the siteurlserver that a testrun is to start, the siteurlserver uses the list id from the startegovmon message, pulls all URLs from the database and then passes them out one at a time.
crawlerserver
The crawlerserver instantiates a set number of crawlerwrapper objects.
crawlerwrapper
Each crawlerwrapper has a port to which requests for a new URL are directed. It also has a port to which messages indicating the completion of a crawl are directed.
samplingserver
The samplingserver creates the multiple threads that enable the parallel sampling of multiple websites. Each thread will monitor an internal queue of sites to be sampled. When one is found it will spawn a sampler process for that site.
sampler
The sampler receives the site to be sampled via the argument from the command line. This site is used to query the database and get the list of URLs that were found in the crawl. Each of the URLs in this list is sampled until the needed number of samples have been completed. The sampler also selects a WamServer and requests it to perform an analysis to calculate the metrics for this site.
WamServer
When asked to evaluate a URL, the WamServer will download the webpage at that URL, perform the analysis per UWEM on this page and return these results to the sampler process.
egovmondb
This partition of the egovmondb process receives the insert commands from the sampler and updates the database with those website metrics for later access.
Relations and Their Properties
The relation of this component and connector view is attachment, dictating how components and
connectors are attached to each other. The relations are as shown in the primary presentation;
there are no additional ones. Those attachments to the eGovMon container represent external interfaces to the system. The attachments from the environment to the addOneSite, addURLList, and startegovmon processes are command line interfaces. The connector to the egovmondb process represents a SQL interface.
Element Interfaces
addOneSite is called from the Linux command line as a Python script (python
args). It accepts seven arguments. Argument 1 is a Boolean (true/false) which signals whether the
site being added is known to be a small site (fewer than xx pages). Argument 2 is a string which
will be used to label the site as a title. Argument 3 is the ISO code, as a string, which should be
used to categorize this website. Argument 4 is a string which should be the county name to be
associated with this website. Argument 5 is a string which should be the country name to be associated with this website. Argument 6 is a string which is the URL for this website starting without
the transport identifier (i.e., for example). Argument 7 must be a listid that had
been previously established by the system using the addURLList command. It will be the list to
which this URL will be associated. The response from this script will be a console message indicating a successful add.
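The seven-argument contract of addOneSite, as documented above, can be approximated with argparse. The argument names are invented here for readability, since the recovered script takes its arguments positionally.

```python
# Sketch of addOneSite's seven positional arguments; names are invented
# for this sketch, the real script takes them positionally.
import argparse

def build_add_one_site_parser():
    p = argparse.ArgumentParser(prog="addonesite")
    p.add_argument("small_site", choices=["true", "false"],
                   help="argument 1: whether the site is known to be small")
    p.add_argument("title", help="argument 2: label used as the site's title")
    p.add_argument("iso_code", help="argument 3: ISO code categorising the site")
    p.add_argument("county", help="argument 4: county name for the site")
    p.add_argument("country", help="argument 5: country name for the site")
    p.add_argument("url", help="argument 6: URL without the transport identifier")
    p.add_argument("listid", help="argument 7: listid created by addURLList")
    return p
```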
addURLList is called from the Linux command line as a Python script (python
args). It accepts two arguments. Argument 1 is a string which should contain the name of the
source for this list. Argument 2 is a string which will be labeled as the contact person (for the
source of the list? the researcher who created the list?). The response from this script will be a
console message indicating that the list has been created with the number for this list which is
referred to as the urllistid.
startegovmon is called from the Linux command line like a shell script (startegovmon args). Argument 1 is the urllistid as assigned by the addURLList command. Argument 2 is a string which
is to be the label for this testrun and is referred to as the testrunid. Argument 3 is a string which
should be the party responsible for this testrun. The process will respond with a message echoing
the information entered and confirming that the testrun has been started.
egovmondb is the database management system. It is accessed directly by researchers via its SQL interface.
Element Behavior
12.1.3 Context Diagram
12.1.4 Variability Guide
12.1.5 Architecture Background
12.1.6 Other Information
12.1.7 Related View Packets
Section 13
13.1 Component and Connector Shared-Data View Packet 1: The eGovMon System
The eGovMon System cannot be represented in a pure shared-data architecture style without abstracting away significant parts of its functionality, in particular the WAMs. However, its use of a database to house persistent data does cause significant portions of the functionality to be in this style.
13.1.1 Primary Presentation
Figure 17 eGovMon System Component and Connector Shared-Data View
13.1.2 Element Catalog
Elements and Their Properties
Relations and Their Properties
Element Interfaces
Element Behavior
13.1.3 Context Diagram
13.1.4 Variability Guide
13.1.5 Architecture Background
13.1.6 Other Information
13.1.7 Related View Packets
Section 14
14.1 C&C Communicating-Processes View Packet 1: eGovMon
In this view the eGovMon system is viewed as a collection of communicating processes.
14.1.1 Primary Presentation
Legend: UML
Figure 18 eGovMon Communicating Processes View
14.1.2 Element Catalog
The primary element catalog is found in the Module Decomposition viewtype in Section 8. The
following information elaborates on the connections between the modules.
User command line call with URL passed as parameter
User command line call with list-id passed as parameter
User command line call with new list-id returned
A DB call to add URL information to the database
A DB call to mark test-run as started in the database
A DB call to add list-id information to the database
A SOAP call to a method which will initiate the processing
A DB call to get the next site to crawl from the database
A SOAP call to get the next site to crawl
An OS call to spawn the new process and pass the site parameter
A SOAP call to signal that the site has been crawled and is ready for sampling
An OS call to spawn the new process and pass the site parameter to it
A SOAP call to an available WAMserver to request evaluation of a page and receive the result
A DB call to update the database with the result of the sampled evaluation
DB calls to retrieve the results of the evaluation (out of scope)
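The "OS call to spawn the new process and pass the site parameter" in the list above can be sketched with subprocess; the child command below is a stand-in for /usr/bin/crawlerwrapper receiving (site, testrunid).

```python
# Sketch of spawning one child process per site, as the crawlerserver does.
import subprocess
import sys

def spawn_crawl(site, testrunid):
    # The child here just echoes its arguments; it stands in for
    # /usr/bin/crawlerwrapper.
    proc = subprocess.run(
        [sys.executable, "-c",
         "import sys; print(sys.argv[1], sys.argv[2])", site, testrunid],
        capture_output=True, text=True, check=True)
    return proc.stdout.strip()
```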
14.1.3 Context Diagram
See Figure 1
14.1.4 Variability Guide
14.1.5 Architecture Background
14.1.6 Other Information
14.1.7 Related View Packets
Section 15
15.1 Allocation Implementation View Packet 1: Subversion Distribution
15.1.1 Primary Presentation
The primary presentation is the structure of the Subversion directory for eGovMon, revision 2053, copied March 23, 2010.
15.1.2 Element Catalog
15.1.3 Context Diagram
15.1.4 Variability Guide
15.1.5 Architecture Background
15.1.6 Other Information
15.1.7 Related View Packets
Section 16
16.1 Allocation Deployment View Packet: The eGovMon System
16.1.1 Primary Presentation
The primary presentation for the Allocation Deployment View matches Figure 1 when it is deployed to a single processor. See section below on variability for alternative deployments.
16.1.2 Element Catalog
16.1.3 Context Diagram
See Figure 1
16.1.4 Variability Guide
This system can be deployed to a single processor or to multiple processors depending upon the performance needs. For this project, deployment to a single box provided sufficient capacity. However, the following components can be deployed to separate processors if desired:
the database
the SiteURLServer
the Crawler
the Sampler
the WAMs
Section 17
This view is omitted for this project.
Pre-Existing Architecture Documentation for EIAO
[Bass 2003] Software Architecture in Practice, Second Edition, L Bass, P Clements, R Kazman,
ISBN: 0-321-15495-9, Addison-Wesley, 2003
[Clements 2003] Documenting Software Architectures: Views and Beyond, P Clements, F
Bachmann, L Bass, et al, ISBN: 0-201-70372-6, Addison-Wesley, 2003
[Carriere 1998] The Perils of Reconstructing Architectures, S. Jeromy Carriere, Rick Kazman, ACM, 1998, 1-58113-081-3/98/11
[Finnigan 1997] The Software Bookshelf, P J Finnigan, R C Holt, I Kalas, et al, IBM Systems
Journal, vol 36 no 4 pp 564-593, 1997
[Goodwin-Olsen 2009] eGovMon System Design Specification D4.2.1, M Goodwin-Olsen, final
version 11/25/2009, eGovMon Project no: Verdikt 183392/S10, copied from April 2010
[Guo 1999] Guo, G. Y., Atlee, J. M., and Kazman, R. 1999. A Software Architecture Reconstruction Method. In Proceedings of the Tc2 First Working IFIP Conference on Software Architecture (Wicsa1) (February 22 - 24, 1999). P. Donohoe, Ed. IFIP Conference Proceedings, vol.
140 pp. 15-34. Kluwer B.V., Deventer, the Netherlands, 15-34.
[Ivers 2004] Documenting Component and Connector Views with UML 2.0, James Ivers, Paul
Clements, David Garlan et al, Technical Report CMU/SEI-2004-TR-008, April 2004, Carnegie
Mellon Software Engineering Institute, Pittsburgh, PA
[Jansen 2007] Documenting after the fact: Recovering architecture design decisions, A Jansen, J Bosch, P Avgeriou, The Journal of Systems and Software vol 81 (2008) pp 536-557, Aug 2007
[Kazman 2003] Architecture Reconstruction Guidelines, Third Edition, R Kazman, L O’Brien, C
Verhoef, CMU/SEI-2002-TR-034, November 2003, Carnegie Mellon Software Engineering Institute, Pittsburgh, PA
[Nord 2009] A Structured Approach for Reviewing Architecture Documentation, Technical Note
CMU/SEI-2009-TN-030, Dec 2009, Carnegie Mellon Software Engineering Institute, Pittsburgh, PA
[Nietzio 2006] 2nd Version of ROBACC WAMs: Deliverable Number D.3.2.1, A Nietzio, EIAO,
accessed from February 2010, version 1.0,
Oct 13, 2006
[O’Brien 2002] Software Architecture Reconstruction: Practice Needs and Current Approaches,
L O’Brien, C Stoermer, C Verhoef, CMU/SEI-2002-TR-024, August 2002, Carnegie Mellon
Software Engineering Institute, Pittsburgh, PA
[O’Brien 2003-a] Architecture Reconstruction Case Study, L O’Brien, C Stoermer, CMU/SEI-2003-TN-008, April 2003, Carnegie Mellon Software Engineering Institute, Pittsburgh, PA
[O’Brien 2003-b] Architecture Reconstruction of J2EE Applications: Generating Views from the
Module Viewtype, L O’Brien, V Tamarree, CMU/SEI-2003-TN-028, November 2003, Carnegie Mellon Software Engineering Institute, Pittsburgh, PA
[Snaprud 2006] A proposed architecture for large scale web accessibility assessment, Mikael
Holmesland Snaprud, Nils Ulltveit-Moe, Anand Balachandran Pillai, Morten Goodwin Olsen,
2006, ICCHP 2006, LNCS 4061, pp 234-241, Springer-Verlag Berlin Heidelberg
[Stoermer 2007] Software Quality Attribute Analysis by Architecture Reconstruction (SQUA3RE), Christoph Hermann Stoermer, doctoral dissertation, Vrije Univ, 16 March 2007, accessed online March 18, 2010 (local filename SQUA3RE)
[Tahvildari 2003] Quality-driven software re-engineering, Ladan Tahvildari, Kostas Kontogiannis, John Mylopoulos, The Journal of Systems and Software 66 (2003) pp 225-239, accepted 15 May 2002, Elsevier Science Inc
[Ulltveit-Moe 2008] Architecture for large-scale automatic web accessibility evaluation based on
the UWEM methodology, Nils Ulltveit-Moe, Morten Goodwin Olsen, Anand B Pillai, Christian
Thomsen, Terje Gjoesaeter, Mikael Snaprud, 2008, presented at NIK 2008 conference, available at