Abstract
Parallel and distributed computing environments have gained popularity because such systems offer many advantages over centralized sequential systems. Reduced incremental cost, better reliability, extensibility, and better response and performance are among their potential advantages. However, due to their non-deterministic behaviour and huge size, understanding the execution behaviour of distributed systems is a major problem in developing such systems. Program visualization has proven to be an important aid in understanding, debugging, and performance tuning of distributed systems.
In this thesis project, program monitoring, visualization and debugging of distributed computing systems are presented; and a monitoring and visualization system is developed to
collect run-time information about the SOBER distributed system’s execution behaviour
and to present the information in a logical and meaningful way using 3D graphics. To capture the information necessary to drive the visualization, a monitor server is developed so
that every SOBER application task is virtually connected to and send a copy of all successfully sent or received messages via a separate virtual network dedicated to the monitoring.
The collected data is transmitted to the visualization modules for processing and displaying. Our visualization system reveals, among other information about the SOBER execution behaviour, the state of SOBER applications and statistical information about
interprocess communication and synchronization operations in the SOBER distributed system.
Acknowledgment
I would like to thank all who have supported me in the process of writing this thesis. I am
deeply indebted to my supervisors Mr. Rune Torkildsen and Prof. Sverre Storøy for their
invaluable criticisms, discussions, suggestions, and encouragements without which this
piece of work would not have been completed. I am also grateful to Christian Michelsen Research (CMR) for providing me with such conducive research conditions and the facilities and resources necessary for this project. The people in the Advanced Computing Section at CMR in general, and Mr. Kåre P. Villanger and Mr. Frode Oldervoll in particular, deserve special gratitude for their day-to-day encouragement and technical support.
Finally, I would like to thank my friends Mr. Shimelis Lemma and Mr. Esmael Musema
for devoting their precious time to proof-reading drafts of this thesis and for their invaluable comments and criticisms.
Table of Contents
Abstract ................................................................................................................................ I
Acknowledgment................................................................................................................ II
1 Introduction............................................................................................................... 1
1.1 The Problem ....................................................................................................... 1
1.2 Background ........................................................................................................ 2
1.3 Definitions and Abbreviations............................................................................ 4
2 Distributed Computing ............................................................................................ 5
2.1 Classification of Distributed Systems ................................................................ 6
2.2 Structure of Distributed Systems........................................................................ 7
2.2.1 Interconnection Networks ....................................................................... 7
2.2.2 Network Topologies ................................................................................ 8
2.2.2.1 Bus Topology ............................................................................ 9
2.2.2.2 Star Topology ............................................................................ 9
2.2.2.3 Ring Topology........................................................................... 9
2.3 Architectural Models ........................................................................................ 10
2.3.1 Client/Server Model .............................................................................. 10
2.3.2 Processor Pool Model............................................................................ 11
2.3.3 Integrated Model ................................................................................... 11
2.4 Communication and Synchronization .............................................................. 11
2.4.1 Communication Primitives.................................................................... 12
2.4.2 Synchronization Primitives ................................................................... 12
2.5 Clock Synchronization ..................................................................................... 13
3 Program Monitoring .............................................................................................. 16
3.1 Types of Program Monitoring Systems............................................................ 16
3.1.1 Software Monitoring Systems ............................................................... 16
3.1.2 Hardware Monitoring Systems.............................................................. 18
3.1.3 Hybrid Monitoring Systems .................................................................. 19
3.2 Program Monitoring Techniques...................................................................... 20
3.3 Monitoring Distributed Systems ...................................................................... 20
3.4 Abstraction Levels in Program Monitoring...................................................... 21
3.4.1 Process Level Monitoring ..................................................................... 22
3.4.2 Function Level Monitoring ................................................................... 24
3.5 Interference Due to Monitoring........................................................................ 26
3.6 Perturbation Analysis ....................................................................................... 27
4 Program Visualization............................................................................................ 28
4.1 Program Visualization Techniques .................................................................. 28
4.2 Statistical Displays ........................................................................................... 29
4.3 Communication Views ..................................................................................... 29
4.4 Animations ....................................................................................................... 30
4.5 Application-specific Visualization ................................................................... 30
5 Debugging and Testing........................................................................................... 32
5.1 Program Debugging ......................................................................................... 32
5.2 Program Debugging Techniques ....................................................................... 32
5.2.1 Static Analysis....................................................................................... 32
5.2.2 Dynamic Analysis ................................................................................. 33
5.2.2.1 Memory Dumps....................................................................... 33
5.2.2.2 Tracing..................................................................................... 33
5.2.2.3 Breakpoints.............................................................................. 34
5.3 Performance Measurement............................................................................... 34
5.4 Debugging Distributed Systems ....................................................................... 35
5.5 Chapters Review............................................................................................... 36
6 The SOBER Visualization System ........................................................................ 38
6.1 The SOBER System ......................................................................................... 38
6.1.1 Components of the SOBER System...................................................... 39
6.2 Monitoring Framework for SOBERvis ............................................................ 42
6.3 Visualization Framework for SOBERvis ......................................................... 43
6.3.1 Communication Displays and Statistical Displays................................ 44
6.3.2 On-line vs. Off-line Visualization Approach ........................................ 46
6.4 Design and Implementation of SOBERvis....................................................... 46
6.4.1 Graphical User Interfaces...................................................................... 46
6.4.2 Monitoring Routines.............................................................................. 52
6.4.3 Visualization Routines .......................................................................... 53
7 Summary and Conclusion...................................................................................... 54
8 Bibliography............................................................................................................ 56
9 Appendix.................................................................................................................. 60
1 Introduction
1.1 The Problem
The SOBER system is a Crisis Management Training System simulator developed at Christian Michelsen Research (CMR) in cooperation with the Norwegian Underwater Technology (Nutec) Training Centre and Siemens Nixdorf Information Systems [SAND95]. The SOBER system is currently installed on twelve networked Silicon Graphics Indigo2 workstations at Nutec in Bergen, Norway, and is being used to train offshore
and maritime personnel in handling emergency situations. In section 6.1 we present the
structure and components of the SOBER system in more detail.
The SOBER system is a distributed system. It is organized as a network of computing nodes, and the current topology of the underlying network is a single star: one central switching server node at the centre with several application nodes virtually connected to it. Currently, if a computing node in the SOBER system, either an application node or the server node, is suspended by accident or otherwise, the other nodes continue to send or receive messages, or wait for a response from the faulty node. This may cause some nodes to wait indefinitely, which in turn may result in system malfunction such as deadlock. One way to address this problem is to use a time-out technique, but the system may then suffer a certain execution speed penalty. If the communication operations are synchronous or blocking and a time-out mechanism is not used, the problem is aggravated. In the current implementation of the SOBER system, neither the instructor nor the trainee can detect such a faulty node until it is too late to recover from the error. If the faulty node is the switching node, the whole system crashes and must be restarted from scratch.
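One hedged illustration of such a time-out technique (this is not code from the SOBER system; the function name and parameters are our own) is a receive operation that gives up after a fixed interval instead of blocking forever:

```python
import socket

def recv_with_timeout(sock: socket.socket, n_bytes: int, timeout_s: float):
    """Receive up to n_bytes, giving up after timeout_s seconds.

    Returns the received bytes, or None when the peer did not respond
    in time, so the caller can flag the node as faulty rather than
    wait indefinitely.
    """
    sock.settimeout(timeout_s)
    try:
        return sock.recv(n_bytes)
    except socket.timeout:
        return None
```

A node whose messages repeatedly time out in this way could then be reported as faulty, at the cost of the execution speed penalty noted above.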
Developing a mechanism to detect and report a faulty node as early as possible by monitoring the SOBER system is a novel way to address this problem. The SOBER visualization system (SOBERvis) provides such support by drawing the graphical entity that represents a faulty node in a different colour from the one used for a properly functioning node. In the virtual network view of the SOBERvis system, a faulty SOBER node is drawn in red.
A mechanism to recover from such an error can also be integrated into a program visualization system. One hypothetical solution is to select, randomly or otherwise, one of the application nodes and appoint it to function as the central switching node. However, the issue of dynamically replacing a faulty computing node of the SOBER distributed system is outside the scope of this thesis project. The main objectives of this project are:
• to study distributed program monitoring and visualization techniques
• to develop a program monitoring and visualization system that collects run-time information and presents the information using 3D graphics
• to collect run-time information about the SOBER distributed system and to present the collected information using 3D graphical entities, in order to assist system programmers and users in understanding the execution behaviour of the system and in isolating communication bottlenecks.
1.2 Background
Distributed computing has provided effective solutions to many challenging problems in
recent years and it has evolved into a popular and effective mode of high-performance computing. Increased availability, performance, reliability, low cost, and high scalability are
among the potential benefits of distributed computing systems.
Distributed computing, however, is not without its share of obstacles [TOPO96]. In distributed computing, applications execute on workstations that have varying capabilities and configurations in terms of CPU speed, memory capacity, and local versus networked disks. This may have a negative effect on system performance, as the computing and storage capacity available to an application is bounded by that of the least capable workstation in the system. Moreover, if the environment is open, each workstation, as well as the network itself, is potentially subject to uncontrollable external load; this often results in load imbalances and dynamic fluctuations in delivered resources, which can be a major cause of performance degradation.
The architecture of distributed computing environments is different from that of sequential programs, and hence requires a different approach to measuring and characterizing performance, to monitoring application progress, and to understanding program execution behaviour. Source code browsing and tracing approaches to understanding distributed programs are tedious, often ineffective, and hence inapplicable.
Program visualization has been shown to be a novel and highly effective approach to assist program understanding, debugging, and performance testing [WILL93][TOPO96]. Extending and adapting program visualization to distributed computing systems can aid in understanding the complex communication and data flow among the components of distributed systems. Presently, however, visualization has seen only limited use in enhancing the design and development of distributed computing systems. Topol et al. [TOPO94] hypothesize that one of the primary reasons for this limited use is the difficulty of acquiring the information necessary to drive the visualization. To obtain these data, program visualization systems require a monitoring mechanism that collects run-time information from the target system. On the other hand, without visualization support, understanding the data derived by monitoring program execution is tedious and complex.
Since program monitoring and visualization are highly dependent on each other, in this thesis project we study different techniques for monitoring and visualizing distributed systems, and we develop a monitoring and visualization system to capture run-time information from the SOBER distributed system and to display it using 3D graphical entities. The
graphical displays convey information such as the virtual communication network topology of the SOBER distributed system, and communication statistics among its computing
components.
This report is organized into 9 chapters; the chapters address distinct but interrelated issues. In chapter 2 we introduce distributed computing environments and give an overview of architectural models and basic communication operations in distributed systems. In chapters 3, 4 and 5 we address program monitoring, visualization, and debugging concepts. We also discuss some general principles and techniques of monitoring, visualization, and debugging, and provide some typical examples of distributed monitoring and visualization systems. In chapter 6, we address our target system, the SOBER distributed system, and the SOBER visualization system, SOBERvis. We present the monitoring and visualization techniques employed and the frameworks developed in the SOBERvis system. In chapter 7, a short summary of this project, concluding remarks, and recommended future work are discussed. Chapter 8 contains a complete list of the references cited in this thesis. The source code for the SOBERvis system is appended to this report as chapter 9.
Before we begin our discussion of the major topics, we define some terms and concepts that will be encountered throughout the thesis. This is important, we believe, not only to avoid ambiguities about the concepts and terms but also to provide the tools needed to read through the thesis easily.
1.3 Definitions and Abbreviations
Target program is a program from which run-time information is collected and visualized; that is, a program to which a monitoring, visualization, or debugging system is applied.
Distributed computing system is a set of several processes running on different processors
working towards a specific functional requirement.
Program monitoring is a mechanism by which run-time information about the execution behaviour of a target program is collected.
Program visualization is a graphical presentation of the monitored data and the illustration
of run-time program behaviour.
Program debugging is the process of detecting, locating, analysing, isolating and correcting suspected system faults in the target program.
SOBER is an abbreviation for StatOil “Beredskapstrener” - a Norwegian term for Emergency Trainer.
2 Distributed Computing
“A person with one watch knows what time it is; a person
with two watches is never sure.” Anon, [MANB89]
In this chapter we discuss basic concepts of distributed computing environments to provide the background needed for the issues introduced later in this thesis. Some general properties of distributed computing environments, such as their structures and architectural models, and the classification of distributed systems, are discussed. Finally, the two most important operations in distributed computing systems, namely communication and synchronization, are also discussed in this chapter. For a more detailed discussion of distributed computing environments, see [COUL88], [SHAR87] and [TSAI96].
A distributed computing system has several processes running on different processors
working towards a specific functional requirement. The distributed processes are coordinated by an interprocess communication protocol and synchronization mechanisms.
The evolution of distributed computing environments on networked collections of computer systems into a popular and effective mode of high-performance computing is due to their potential advantages: increased performance, by executing several processes in parallel; increased availability, because a process is more likely to find a resource available if multiple copies exist; increased reliability, because the system can be designed to recover from failures; increased adaptability, because components can be added or removed easily; low cost, because expensive resources can be shared; and robust programming models and environments [TSAI96]. In distributed computing environments, the same programming models and methodologies can be used across a wide variety of platforms, ranging from stacks of headless workstations with high-speed interconnections, to collections of desktop systems, to geographically distributed hierarchies of machines of multiple architecture types [TOPO96].
The distributed computing environment also has some major drawbacks peculiar to it. Lars [LARS90] discusses two problems specifically related to distributed systems. Firstly, the size of a distributed computing system often becomes physically very large and logically complex, making it difficult to handle. The processors can be spread out over a large geographical area and, unless special hardware is used, communicate with each other by message passing. This makes controlling the processors more difficult. The size of the program is another problem factor: the program code is usually very large and difficult to manage, and puts high demands on the programming language and the programming tools. Secondly, distributed programs have non-deterministic behaviour, which makes their executions difficult to reproduce. Non-determinism is caused by system factors that cannot be directly foreseen and controlled by the programmer.
Another drawback of distributed computing systems is their loss of flexibility in the allocation of memory and processing resources. In centralized computer systems or in tightly
coupled multi-processor systems all of the processing and memory resources are available
for allocation by the operating system as required by the current workload. In distributed
systems, however, the processor and memory capacity of the workstations determine the
largest task that can be performed.
Data security is another problem in distributed computing systems. To achieve high extensibility, many of the software interfaces in distributed systems are made available to clients. Any client that has access to the basic communication services can also access the interfaces to servers. To protect the services against intentional and accidental violations of access control and privacy constraints, software security measures are needed. Recent work on software security, data encryption, and capability-based access control offers appropriate solutions.
2.1 Classification of Distributed Systems
Distributed systems can take a variety of forms, and different researchers classify them into different categories depending on different aspects of the systems. Sharp, for example, classifies distributed systems according to the degree of distribution in hardware, control, and data [SHAR87]. The hardware distribution can range from a single central processing unit (fully centralized) to multiple computers (fully decentralized); the control distribution can range from a single control unit to multiple control units that cooperate fully by message passing; and the data distribution can range from a single copy located at a central storage location to a distributed database with no central master file or directory. Sharp uses three axes to represent the different levels of decentralization of the three components, and defines a system with the highest degree of decentralization in all three components as a fully distributed system.
Tsai et al. classify distributed systems as homogeneous or heterogeneous depending on the architecture of their computing nodes [TSAI96]. In a homogeneous distributed system, all the computing nodes have the same architecture and supporting software. In contrast, the nodes in a heterogeneous distributed system may have different architectures and/or supporting software. Tsai et al. also classify distributed systems as centralized or decentralized based on the relationship among their computing nodes. In a centralized distributed system the distinct computing nodes have a workstation/server or client/server relationship, whereas in a decentralized distributed system each computing node is autonomous.
As there exists a wide variety of distributed systems, no single debugging technique is applicable to all systems with different architectures, though several debugging techniques can be applied to a wide range of distributed systems.
In light of the above classification of the distributed systems, the SOBER system can be
classified as a centralized heterogeneous distributed system. Each computing node in the
SOBER system is an autonomous workstation that is virtually connected to the central
switching server node and the message server node (see section 6.1.1).
2.2 Structure of Distributed Systems
2.2.1 Interconnection Networks
The performance and reliability of a distributed computing system are highly dependent on the performance and reliability of the underlying network [COUL88]. A failure of the underlying network interrupts service to users, and overloading of the network degrades the performance and responsiveness of the system. Thus, much effort is spent on designing reliable and fault-tolerant networks. Since network failures occur very infrequently in practice, this drawback remains largely theoretical.
In distributed computing systems, there are two main categories of interconnection networks: single connection path systems, such as a bus or a ring, and multiple connection path systems, such as a multiple bus, a star or a mesh. In broader terms, networks can be classified as either store-and-forward or broadcast. In a store-and-forward network, a message or packet is received in its entirety by a node, placed in a buffer, and, if the message is not addressed to that node, forwarded to an adjacent node. The computing nodes in a store-and-forward network are interconnected by independent point-to-point transmission lines, and store-and-forward networks are typically used in wide-area networks (WANs). In a broadcast network, all nodes are connected to a common transmission medium, so a single message transmitted by a given node will reach all the other nodes. Broadcast networks are used mostly in local-area networks (LANs).
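The store-and-forward rule just described can be sketched as a small simulation; the node names and line topology here are illustrative assumptions, not any particular network:

```python
def store_and_forward(path, dest):
    """Walk a packet along `path` (node names in line order).

    Each node receives the packet in its entirety into a buffer and,
    unless the packet is addressed to it, forwards it to the adjacent
    node. Returns the nodes that buffered the packet, ending at `dest`.
    """
    buffered_at = []
    for node in path:
        buffered_at.append(node)   # whole packet received and buffered here
        if node == dest:           # addressed to this node: do not forward
            break
    return buffered_at

print(store_and_forward(["A", "B", "C", "D"], "C"))  # ['A', 'B', 'C']
```

The packet is buffered at A and B before being consumed at C; node D never sees it, in contrast to a broadcast network where every node would.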
Another type of interconnection network is the terminal network, used to connect a variety of terminals and printers to a central computer. In this centralized, point-to-point, star-like network, the central computer communicates with each terminal over slow but cheap dedicated data transmission wires.
Several network technologies and architectures have emerged with adequate performance to support distributed systems. The most widely used local network technology for distributed systems is the Ethernet. Ethernet is based on broadcasting over a simple passive circuit, with a single high-speed cable linking all of the computers in the network. Another class of network technology is the slotted ring, in which all of the computers in the network are linked in a ring structure and data is transmitted in small fixed-size packets passed from node to node around the ring. Another ring network technology, known as token ring, can accommodate larger, variable-size packets.
The performance of the Ethernet and the ring networks is almost the same. The token ring has higher channel utilization under high loads and can provide a guarantee of service within a fixed time, whereas the Ethernet provides higher performance for the transmission of large volumes of data under light loads. In practice, both networks have been used in the construction of a variety of distributed computer systems, and the differences in architecture are not evident above the lowest levels of the network software.
Since all local networks are designed to provide direct communication between any two
hosts, the topology used has relatively little influence on system behaviour as seen by the
user. Virtually all successful high-speed local networks have been structured as either rings
or buses [COUL88].
2.2.2 Network Topologies
A network topology defines the interconnection structure of nodes and links. The network topology influences the incremental cost of adding another node, the ease of modifying the topology, the dependency on a single component of the network, the complexity of the protocols needed, the throughput and delays, and the ability to broadcast data [SLOM87]. In this subsection we discuss different computer network topologies.
2.2.2.1 Bus Topology
In networks with a bus topology there is a circuit composed of a single cable, or a set of connected cables, passing near all of the hosts on the network. When more than one cable is used, the connections are made by repeaters, simple amplifying and connecting units that have no effect on the timing or logical behaviour of the network. The cable is passive, and each host has a drop cable connected to the main cable by a T-connection or tap. Data is transmitted over the cable, to which all hosts have access. A limitation of buses is that they do not scale to a large number of processors, because the single bus forms a communication bottleneck.
Since there is no master node to arbitrate access to the bus, each node must listen to the bus before sending or receiving a message. To receive a message, a node looks for a message addressed to it. To send a message, a node listens to the bus to make sure that the bus is free. If two or more nodes have been waiting to send a message, a collision may occur. The transmitting nodes can detect the collision and attempt to re-send the message after a random period of time. An increase in the number of collisions degrades the throughput of the network.
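The "random period of time" can be sketched as follows. The truncated binary exponential backoff shown here is the scheme used by Ethernet; the function name is our own illustrative choice:

```python
import random

def backoff_slots(n_collisions: int) -> int:
    """Number of slot times to wait after the n-th successive collision.

    The wait is chosen uniformly from [0, 2**n - 1], with the range
    capped at 1023 slots, so repeated collisions spread competing
    retransmissions further and further apart in time.
    """
    return random.randint(0, min(2 ** n_collisions, 1024) - 1)
```

After the first collision a node waits 0 or 1 slot times; after the fourth, between 0 and 15; beyond the tenth, the range stays capped at 0 to 1023.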
2.2.2.2 Star Topology
In a network with a star topology, all nodes are connected via a single link to a central switching node. The star topology has a low expansion cost, simple table-lookup routing in the switching node, and a maximum delay of only one intermediate node. The star topology is commonly used for connecting terminals to a central computer.
The main drawback of the star topology is its poor reliability, because a failure of a link isolates a node. A failure of the central switching node stops all communication, and hence redundancy is sometimes provided at the switching node. The throughput of the network is bounded by that of the central switching node, which may become a bottleneck. Since the virtual network topology of the SOBER distributed system is a single star, the SOBER system may suffer from this limitation.
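The simple table-lookup routing mentioned above amounts to a single dictionary lookup in the switching node; the class and node names below are illustrative, not taken from the SOBER configuration:

```python
class StarSwitch:
    """Sketch of a central switching node in a star topology."""

    def __init__(self):
        self.links = {}                # node name -> outgoing link id

    def attach(self, node: str, link_id: int) -> None:
        self.links[node] = link_id

    def route(self, dest: str) -> int:
        """Return the link to forward on; raises KeyError for an
        unknown (e.g. detached) destination node."""
        return self.links[dest]

switch = StarSwitch()
switch.attach("app1", 1)
switch.attach("app2", 2)
print(switch.route("app2"))  # a message for app2 goes out on link 2
```

This lookup is also why the switch is a single point of failure: every route passes through the one table held by the central node.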
2.2.2.3 Ring Topology
In networks with a ring topology the cable is made up of separate links connecting adjacent nodes. Data is transmitted in one direction around the ring by signalling between nodes. Only the node that holds the token can send messages. The token is passed from node to node until a node that needs to transmit a message is reached. The communication software is simple since the routing is simple. The delays depend both on the number of nodes in the ring and on the number of bits buffered by each node, typically 1 to 16 bits.
An advantage of the ring topology is that there is no starvation and no deadlock: each node has its turn to hold the token, and only one node at a time is allowed to do so. Priority-based access can also be established. A disadvantage of the ring topology is the effort required to manage the token, since the disappearance of the token halts the whole network. Because the amount of time a node can hold the token is unbounded, a lost token cannot be detected unless time-outs are used. Another disadvantage of the ring is that if a single node fails, the entire ring fails. To detect a node failure, the node receiving the token should acknowledge its receipt; if a node does not receive an acknowledgement within a certain amount of time, a failure can be assumed.
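The token-passing and failure-detection rules can be sketched as a small simulation; the ring membership and the failed node are illustrative assumptions, and the time-out is modelled as an immediate check:

```python
def pass_token(ring, start, failed=frozenset()):
    """Pass the token once around the ring, starting after `start`.

    `ring` lists node names in ring order. A node in `failed` never
    acknowledges receipt, so the sender detects the failure (here an
    immediate check stands in for the acknowledgement time-out).
    Returns the list of nodes that acknowledged the token.
    """
    acked = []
    i = ring.index(start)
    for step in range(1, len(ring) + 1):
        nxt = ring[(i + step) % len(ring)]
        if nxt in failed:
            # no acknowledgement within the time-out: failure assumed
            break
        acked.append(nxt)
    return acked

print(pass_token(["A", "B", "C", "D"], "A"))         # full circuit back to A
print(pass_token(["A", "B", "C", "D"], "A", {"C"}))  # stops at the failed node
```

In the second call the token reaches B but C never acknowledges, so the circuit is broken exactly as described above: a single failed node stops the whole ring.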
2.3 Architectural Models
Knowing the architecture of a distributed system can aid in analysing the system, and in
making a good choice of a monitoring technique. Architectural models are also useful in
classifying distributed systems and in analysing their execution properties. Coulouris [COUL88] presents three architectural models of distributed systems: the client/server, processor pool, and integrated models. The majority of distributed systems are based
on the client/server model and so is the SOBER system (see section 6.1).
2.3.1 Client/Server Model
In the client/server model each user is provided with a single-user workstation, usually known
as a client. Application programs are executing on the users’ workstations. The need for
workstations is based primarily on user interface requirements in application tasks. Other
factors affecting the division of tasks include the need for sharing data between users and
applications, leading to a need for shared file servers and directory servers; for sharing expensive peripheral devices such as high-quality printers, and for specialized device servers.
The workstations may be of several different types, e.g. some standard workstations and
some high-performance workstations. They are integrated by the use of communication
software enabling them to access the same set of servers. The servers provide access to
shared devices, files and other networked resources. For example, an authentication service
is usually provided to validate user identities and to authorize them to use system resources
and a network gateway service is often available to offer access to wide-area networks
to all of the workstations on a local network.
2.3.2 Processor Pool Model
In the processor pool model, programs are executed on a set of computers managed as a shared processor pool. Users are connected to the network via terminal connectors and interact
with programs via a terminal access protocol.
The potential advantages of this model include efficient utilization of resources (only as many computers are needed as there are users simultaneously logged in), flexibility (the system can be expanded incrementally), compatibility, and the use of heterogeneous computers. A substantial drawback of the processor pool model is the restricted mode of user interaction imposed by the use of terminals rather than workstations. In particular, the model does not satisfy the needs of high-performance interactive programs, especially when graphics is used in the application
[COUL88]. Even when a terminal is connected to a host computer via a high-bandwidth
local network, the speed at which graphical data can be transferred to the screen is too low
for many interactive tasks.
A hybrid model includes some workstations for interactive use, some processors and a variety of servers. The hybrid model is based on the client/server model, but with the addition
of pool computers that can be allocated dynamically for tasks that are too large for workstations or tasks that require several computers concurrently.
2.3.3 Integrated Model
The integrated model brings many of the advantages of distributed systems to heterogeneous networks containing single-user and multi-user computers. Each computer is provided
with appropriate software to enable it to perform both the role of a server and an application
processor. The system software located in each computer is similar to an operating system
for a centralized multi-user system, with the addition of networking software.
2.4 Communication and Synchronization
A distributed system consists of a collection of distinct computers which are spatially separated, and connected by a network making it possible to exchange messages among the
processes running on the computers. Communication and synchronization allow a distributed system's processes to be coordinated [SLOM87]. Synchronization is a mechanism by
which two or more processes are coordinated with respect to time, for example, by sequencing events or by granting a process an exclusive access to a resource. Communication refers to an exchange of information among the processes and does not necessarily imply synchronization.
2.4.1 Communication Primitives
The most basic communication operations are the send and the receive operations. The
simplest receive operation blocks; that is, the receiving process waits until the message arrives. Blocking provides a synchronization mechanism, but a blocking receive can cause problems if the message never arrives. There are other receive operations with
time-out conditions, i.e. if the message is not available in a given time interval, then the receive operation is aborted and the next operation is executed.
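A receive with a time-out can be sketched as follows. This is our illustration, not SOBER's actual API: the function name `timed_receive` is invented, and we use the standard `select()` call to bound the wait on a file descriptor before completing the read.

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical sketch of a receive with a time-out.  Returns the number
 * of bytes read, 0 if the time-out expired before any data arrived
 * (the receive is aborted), or -1 on error. */
static ssize_t timed_receive(int fd, void *buf, size_t len, int timeout_ms)
{
    fd_set readfds;
    struct timeval tv;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec  = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int ready = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (ready < 0)
        return -1;              /* error in select() */
    if (ready == 0)
        return 0;               /* time-out: abort the receive */
    return read(fd, buf, len);  /* data available: complete the receive */
}
```

If the time-out expires, the caller simply proceeds to the next operation instead of blocking indefinitely.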
A send operation can be either asynchronous or synchronous. In an asynchronous send operation, the sending process sends a message and continues the execution of the next instruction and does not wait for an acknowledgement from the receiving process. In the synchronous send operation, the sending process waits for an acknowledgment from the
receiving process. Obviously, the synchronous send provides synchronization as well as
communication. Since the synchronous send is blocking, if the receiving process is delayed, the sending process will also be delayed. Worse, if the receiving process fails, then
the sending process will hang. Therefore, a mechanism to prevent these situations is necessary. Bidirectional transactions are frequently used in client/server communication.
2.4.2 Synchronization Primitives
Processes in a distributed system can communicate with each other synchronously or asynchronously. For synchronous communication, the sender and receiver must be synchronized. The sending process sends a message to the receiving process, and then waits for an
acknowledgement from the receiving process that the message has been received. In asynchronous communication the sending process does not wait for an acknowledgement from
the receiving node.
Processes on the same node can communicate via shared memory. A semaphore is an interprocess communication primitive that is intended to let multiple processes synchronize
their access to a shared memory segment. If one process is reading data into some shared
memory, for example, other processes must wait for the read operation to finish before
processing the data.
A binary semaphore is a semaphore whose value can be either zero or one. To
obtain a resource that is controlled by a semaphore, a process needs to test its current value,
and if the value is greater than zero, it decreases the value by one (the P operation). If the
current value is zero, the process must wait until the resource is released. To release a resource that is controlled by a semaphore, a process increases the semaphore value by one
(the V operation). Semaphores are discussed in more detail in [STEV90].
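The P and V semantics described above can be sketched as follows. This is an illustration of the value transitions only; a real implementation would use atomic kernel primitives such as the System V semaphore operations discussed in [STEV90], and the type and function names here are our own.

```c
#include <assert.h>

/* Illustrative binary semaphore: the value is either 0 or 1. */
typedef struct { int value; } binsem_t;

/* P: try to acquire the resource.  Returns 1 on success; a return of 0
 * means the value was zero and the caller must wait for a release. */
static int sem_P(binsem_t *s)
{
    if (s->value > 0) {
        s->value -= 1;   /* decrease the value by one */
        return 1;
    }
    return 0;            /* resource busy: caller waits */
}

/* V: release the resource by increasing the value by one. */
static void sem_V(binsem_t *s)
{
    s->value += 1;
}
```

Note that the test-and-decrement in `sem_P` must be atomic in a real multi-process setting, which is exactly what the kernel-level P operation guarantees.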
2.5 Clock Synchronization
In a single-processor or a tightly-coupled multiprocessor system, there is only one system
clock. Therefore, it is guaranteed that an event that is timestamped with an earlier time value occurred before an event that is timestamped with a later time value. However, in distributed systems, since each node has its own local clock which may have a different reading
from the clocks on the other nodes, for two events that occur on different nodes there is no
guarantee that an event with an earlier timestamp occurred before an event with a later
timestamp. Hence, to maintain causality relationship among the events in a distributed system, we need a mechanism to synchronize the clocks of the nodes.
For a meaningful visualization of program execution behaviour, the events’ timestamps
should be as accurate and consistent across the processors as possible. Since each node has
its own local clock, its own starting time, and its own execution rate, it is necessary to implement clock synchronization. Poor clock resolution or synchronization can lead to what
is called tachyons in the trace files: messages that appear to be received before they are
sent [HEAT91]. A tachyon is a hypothetical particle that travels faster than light.
Timestamping events by readings of the physical clock of each node totally orders the
events on each node. However, due to the drifting nature of quartz-controlled oscillators, no two physical clocks run at exactly the same rate. This means that a perfectly accurate global clock cannot be implemented without additional hardware support.
The lack of a global clock makes it impossible to establish the order of two events in a distributed system unless there is a causal relationship between them. Events in the same process form a sequence determining the order, usually known as partial ordering. For two
events that occurred on two different nodes to be ordered, an event involving both nodes must have occurred after one of the events and before the other. Examples of such events involving more than one node include process creation, process termination, and communication events [LAM78].
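The partial ordering induced by such events is exactly what Lamport's logical clocks [LAM78] capture. The following hedged sketch (not part of the SOBER implementation; the function names are ours) shows the standard update rules: local events and sends increment a per-process counter, and a receive advances the counter past the timestamp carried by the message.

```c
#include <assert.h>

/* Sketch of Lamport logical clocks.  Each process keeps one counter. */

/* A purely local event ticks the clock. */
static long lamport_local_event(long *clock)
{
    return ++*clock;
}

/* A send ticks the clock; the new value is the timestamp carried
 * by the outgoing message. */
static long lamport_send(long *clock)
{
    return ++*clock;
}

/* A receive first advances the clock past the message timestamp,
 * then ticks it, so the receive is ordered after the send. */
static long lamport_receive(long *clock, long msg_timestamp)
{
    if (msg_timestamp > *clock)
        *clock = msg_timestamp;
    return ++*clock;
}
```

With these rules, a causally later event always carries a larger timestamp, even though events on different nodes with no causal link remain unordered.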
In monitoring distributed computing systems, a monitor is attached to each node of the target system to detect occurrence of events of interest on that node and to record relevant
event data (see section 3.3). To reconstruct the global state of the target system, a global
time reference is required to timestamp the events with a global clock reading. That is, to
order all the events that occurred in the system, we need to timestamp them by a global
clock reading [TSAI96].
To cope with inconsistencies due to the lack of a central clock and global state, Haban and Wybranietz implemented two different versions of clock synchronization in the distributed test
methodology (DTM) system [HABA90]. The first version uses a central physical clock that
triggers the local time counters on each test and measurement processor (TMP). This version is only used if the global clock is very far away from the TMP nodes. The central clock
allows measurements such as transmission delay. The second version, the software solution, uses a central machine to synchronize all the clocks by running an algorithm similar
to the TEMPO algorithm, the distributed service that synchronizes the clocks of 4.3BSD
UNIX systems [GUSE89]: to initially align the first time interval, the central station polls
each TMP station to measure the clock difference between the central station and each local
TMP station using the following equation:
D = (D1 - D2) / 2
where D1 is the difference between the message reception time and the master timestamp in the received message, and D2 is the difference between the master node's acknowledgement reception time and the local timestamp in the acknowledgement. Each local TMP station stores its time
difference from the central station. When the central station sends a start time, each local
station computes the start time by adding the clock difference to the start time from the master.
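The offset computation and its use can be sketched as follows. This is our illustration of the scheme; the variable and function names are not [HABA90]'s. Note how the half difference cancels a network delay that is symmetric in both directions.

```c
#include <assert.h>

/* d1 = (local reception time) - (master timestamp in the message),
 *    i.e. one-way delay plus the local clock's offset from the master;
 * d2 = (master's ack-reception time) - (local timestamp in the ack),
 *    i.e. one-way delay minus that offset.
 * Their half difference is the estimated offset; the symmetric delay
 * term cancels. */
static double clock_offset(double d1, double d2)
{
    return (d1 - d2) / 2.0;
}

/* Each TMP station stores its offset and applies it to the start time
 * announced by the central station to obtain its local start time. */
static double local_start_time(double master_start, double offset)
{
    return master_start + offset;
}
```

For example, with d1 = 5 ms and d2 = 1 ms the estimated offset is 2 ms, so a master start time of 100 ms maps to a local start time of 102 ms.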
Topol et al. [TOPO95] propose a dual timestamping methodology that provides both primary and secondary timestamps in trace events. A primary timestamp is a logical timestamp that provides information about which events are concurrent and hence can be visualized in parallel, whereas a secondary timestamp provides normalized causality
preserving “wall clock” timestamps that are used in program performance visualization.
The dual timestamping methodology is the cornerstone in the development of PVaniM
[TOPO96], a visualization environment for the PVM network computing system.
In the SOBER distributed system, a timer mechanism is implemented to address the problem of clock synchronization. The clock synchronization mechanism involves three time
values: the system time, a reference time, and a relative time. The system time is retrieved
by the gettimeofday() system call and is equal to the reading of the system clock, which is running all the time. A reference time is an arbitrary time value used by an application and it
differs from application to application. A reference time value is sent to other applications
as a parameter to the clock synchronization operations. The relative time is equal to the reference time minus the system time.
There are four operations that are central to the implementation of clock synchronization
in the SOBER system. The start() operation initializes a reference time variable, gets the system time, computes the relative time, sets the clock status to “running”, and broadcasts a
synchronization message across the network. All the applications across the network start
their respective local clocks based on the synchronized time broadcast across the network. The
set() operation sets a reference time, gets the system time, and computes the relative time. If
the clock is not in “running” status, the relative time will be set to zero. The synchronization
message is broadcast across the network. The stop() operation sets the clock status to
“stopped”, the reference time to the current time, and the relative time to zero, and broadcasts
the synchronization message across the network. The broadcastClockEvent() operation is
used to broadcast a time synchronization message across the network.
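The relationship between the three time values can be sketched as follows. This is a simplified illustration, not SOBER's actual code: the type, field, and function names are invented, and the broadcast step is only indicated by a comment.

```c
#include <assert.h>

/* The relative time is defined as reference time minus system time;
 * adding it to a later system-clock reading recovers the application's
 * reference time scale. */
typedef struct {
    double relative;   /* reference time - system time at start()/set() */
    int    running;    /* clock status: 1 = "running", 0 = "stopped" */
} sober_clock_t;

static void clock_start(sober_clock_t *c, double reference, double system_now)
{
    c->relative = reference - system_now;
    c->running = 1;
    /* a real implementation would also broadcast a synchronization
     * message across the network here */
}

/* Reading the clock converts the current system time back to the
 * application's reference time scale. */
static double clock_read(const sober_clock_t *c, double system_now)
{
    return c->running ? system_now + c->relative : 0.0;
}
```

For instance, starting the clock with reference time 0 at system time 1000 yields a relative time of -1000, so a later system reading of 1005 maps to 5 units of elapsed reference time.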
3 Program Monitoring
In developing a program visualization system, a crucial step is to capture the information necessary to drive the visualization. In this chapter we address different program monitoring techniques for collecting run-time information about a given target system. First, we
present types of program monitoring systems and provide some examples of each type of
monitoring system. Then, we present general program monitoring approaches and techniques for monitoring distributed systems. Finally, interference due to program monitoring is discussed and perturbation analysis methods are presented.
Program monitoring enables us to capture run-time information about a target program that
cannot be obtained by merely studying program source code. The collected information can
be used for program testing and debugging, dynamic system safety checking, dynamic task
scheduling, performance analysis, and program optimization. Program monitoring is accomplished in two phases: a triggering phase and a recording phase. In the triggering phase, occurrences of pre-defined events of interest are detected, and collection of data
pertinent to the events is activated. In the recording phase, the data pertinent to the events
is collected and stored for postprocessing or is transmitted to a processing module for online processing, analysis and visualization. The recorded data provides a trace of events that
can be used to describe the execution behaviour of the monitored system.
The triggering and recording phases for program monitoring can be implemented in software, in hardware, or in both, resulting in software, hardware, and hybrid monitoring systems, respectively. In the remaining sections of this chapter, we discuss
types of monitoring systems, monitoring techniques, and perturbation analysis techniques.
3.1 Types of Program Monitoring Systems
3.1.1 Software Monitoring Systems
Software monitoring systems are implemented by inserting an extra set of instructions (usually known as instrumentation code) into the target system to cause data capture. Both the
triggering and recording phases of program monitoring are accomplished by executing the
inserted code, and the recorded data is often stored in the working memory of the target system. Since an execution of the instrumentation code uses the computing power and working
memory of the target system, software monitoring systems may result in an unacceptable
performance penalty for the target program and may also affect its execution behaviour. The interference due to monitoring can be measured by using perturbation analysis techniques to obtain the actual performance of the target system (see section 3.6).
The potential advantages of software monitoring systems are their flexibility, and that no
additional hardware is required for their implementation. Without using hardware support,
the dilemma of finding a balance between minimizing interference due to monitoring and
recording sufficient information about the execution behaviour of a target program always
exists. Limiting instrumentation on the one hand may provide inadequate measurement detail, whereas excessive instrumentation, on the other hand, may perturb the target system
to an unacceptable degree.
In program monitoring systems, the pre-defined events of interest are ordered according to
their time of occurrence and are replayed in the same order during the visualization stage. In order to timestamp the events, clock support is necessary. Since there is no hardware support for software monitoring systems, they rely on the target system's clock(s), and hence the instrumentation code must have access to the target system's clock to timestamp events with its readings.
Joyce et al. [JOYC87] propose a distributed software monitoring system to detect occurrences of events of interest and to collect information on the concurrent execution of interacting processes. In this system event detection is done inside the target processes. To allow
detection of interprocess events, programmers have to modify the target processes by loading them with a version of an interprocess communication protocol to incorporate the monitoring activity into the execution of the program. The events monitored in this system are
process operations that may have a direct effect on other processes: entering/leaving the
system, creating/killing a process, message sends, receives, and replies. These events
match the process level events we discuss in section 3.4.1. In Joyce’s monitoring system,
process state transitions cannot be monitored because an application process cannot detect
its own state changes. To monitor such kinds of events, the kernel needs to be instrumented
so that it sends transition events to the monitor. Joyce’s monitoring system is a typical software approach to program monitoring.
The software monitoring approach is suitable for monitoring the SOBER distributed system, mainly because no hardware support is required in the implementation of our monitoring and visualization system (see section 6.2).
3.1.2 Hardware Monitoring Systems
In hardware monitoring systems, a hardware device is attached to the bus(es) of the target system to passively snoop them and detect a set of pre-defined signals. Triggering takes
place on a specific combination of the pre-defined signals. Data recording is carried out by
hardware, and the recorded data is stored in a separate memory independent of the monitored system.
The primary advantage of hardware monitoring systems is that their interference with the
execution of the target system is minimal since the monitoring system shares no computing
resource of the target system. Although such devices can be designed to have minimal or
no perturbation effect on the target system, their main drawback is that they generally provide limited low-level information about the execution behaviour of the target system
[HABA90]. Simple snooping of system buses, or probes connected to the processor’s
memory ports or I/O channels, does not provide sufficient information about the target system. To collect valuable run-time information, hardware monitoring systems often use sophisticated hardware features. Another drawback of hardware monitoring systems is that
the desired signals may not be accessible as integrated-circuit technology advances and
more functions are built on chips [TSAI96].
Plattner [PLAT84] proposes a hardware monitoring system for monitoring single-processor real-time systems. In Plattner’s system, a hardware device called a listener is attached
to the bus of the target processor and a separate storage space called a phantom memory is
used to mirror the contents of the memory of the target system in real-time. A monitoring
process is employed to access all information from the phantom memory. This implementation of program monitoring obviously does not interfere with the execution of the target
system since it uses no resource of the target system. The main drawback of this system is
the extra cost of constructing the phantom memory.
Tsai et al. [TSAI96] extend Plattner’s monitoring system to monitor distributed real-time systems. This model assumes that each node of the distributed target system is a single-processor autonomous computer system with its own memory and I/O devices. To
monitor the target distributed system, a monitoring node is connected to the address, data,
and control buses of every node of the target distributed system. A module known as qualification control unit is used to detect occurrences of pre-defined conditions and to invoke
the corresponding recording-phase action, either a start or a stop action. The collected data is interpreted, analysed, and displayed by the module that drives the visualization. Issues such as a global time reference, which is crucial to monitoring real-time systems, are not elaborated in
this model.
3.1.3 Hybrid Monitoring Systems
Hybrid monitoring systems are an attractive compromise between the intrusive software monitoring systems and the expensive non-intrusive hardware monitoring systems. They utilize both software and hardware approaches to program monitoring, minimizing perturbation due to monitoring by allowing the hardware to perform the majority of the monitoring
task. Hybrid monitoring systems insert instrumentation code into the target system to detect
the occurrences of pre-defined events of interest. Data recording is carried out by hardware,
and the collected data is saved in a separate memory independent of the memory of the target system.
Hybrid monitoring systems use two different triggering approaches: memory mapped and
co-processor monitoring [TSAI96]. In memory-mapped monitoring, a set of pre-defined addresses is used to trigger data recording. The monitoring unit is mapped onto the memory addresses, with each address representing an event. In the co-processor monitoring approach, co-processor instructions are used to trigger event recording. The recording unit
acts as a co-processor that executes the monitoring instructions. To invoke recording of
data pertinent to events of interest, the co-processor instruction is sent by the target processor to the monitoring unit.
Haban and Wybranietz’s DTM (distributed test methodology) [HABA90] system uses the
hybrid monitoring approach to monitor program execution and to collect information pertinent to the events of interest. The main idea in the DTM monitoring system is that the target system detects significant events and these events are processed and displayed by dedicated hardware. The DTM monitoring system is a typical example of a hybrid monitoring system that employs the memory-mapped monitoring approach discussed in the previous paragraph.
3.2 Program Monitoring Techniques
In the program monitoring process, there are two fundamental techniques for collecting information: tracing and sampling [TOPO95][EILE93]. In the tracing technique, every occurrence of a pre-defined event is detected, and information about all the occurred events is collected continuously for a certain interval of time, typically the whole duration of an execution of the target system. Small pieces of code, usually known as sensors, are embedded in the target program and perform the desired recording of information. Sensors can be
developed in different ways. Since many distributed systems supply library routines for
communication, synchronization, and creating tasks, these integral events are traced by
providing macro wrappers that first perform the tracing operation and then call the desired
library routines. The pre-defined events of interest that are not related to any library routine
may be traced by providing the user with a function similar to printf() that allows events
with custom application-specific data to be recorded.
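A macro-wrapper sensor of the kind described above can be sketched as follows. This is a hedged illustration: the names (`TRACED_SEND`, `trace_event`, `lib_send`) are invented, `lib_send` stands in for a real communication library routine, and a real tracing library would also record timestamps and write to a trace buffer or file.

```c
#include <assert.h>
#include <string.h>

/* A minimal in-memory trace log for illustration. */
static int  trace_count = 0;
static char trace_log[64][16];

/* The sensor: record the event name into the trace log. */
static void trace_event(const char *name)
{
    strncpy(trace_log[trace_count], name, 15);
    trace_log[trace_count][15] = '\0';
    trace_count++;
}

/* Stand-in for a communication library routine. */
static int lib_send(int dest, const char *msg)
{
    (void)dest; (void)msg;
    return 0;   /* pretend the send succeeded */
}

/* The macro wrapper the application actually calls: it first performs
 * the tracing operation, then calls the desired library routine. */
#define TRACED_SEND(dest, msg) (trace_event("send"), lib_send((dest), (msg)))
```

The application is recompiled against the wrapper, so every send is logged without modifying the communication library itself.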
In the sampling technique, information about occurrences of pre-defined events is collected asynchronously, usually at a request from the monitor module. Sampling may be performed by sensors or in some cases by probes, which reside in the monitor module and have direct
access to the address space of the application [OLGE93]. The sampling approach is useful
especially when we are interested only in cumulative statistics such as the total number of
messages sent or received by a node at various stages of the execution of the target application. Utilizing probes can minimize the perturbation that would be incurred had sensors been utilized, because sensors execute continuously, whereas
probes are invoked after a given interval of time based on a sampling rate specified by the
user.
3.3 Monitoring Distributed Systems
In sequential programming, it is generally true that monitoring a program does
not alter the data values generated in connection with the events, and the order in which the
events are occurring. However, due to the non-deterministic behaviour of concurrency it is
generally impossible to monitor a distributed system without affecting its execution, and
hence the order of its events. The most we can do is to strive to minimize the probe effects.
A probe affects the distributed target program so that it may not present the same behaviour
as it did before the probe was attached. One method to maintain the ordering of events is to
predict the effect of monitoring, and to make necessary adjustments to reduce the interference effect (see section 3.6).
To monitor distributed systems, we need to monitor each computing node of the system by
attaching a monitor to the node. The monitor detects occurrences of pre-defined events and records event data generated by the node to which it is attached. The recorded
data can be either stored locally in the memory of the target node for postprocessing or
transmitted to the central node on which the visualization module is executing for on-line
processing. In case the collected data is not evenly distributed among the computing nodes,
too much data could be stored at one node, which would require building a sufficiently large data storage area for each node and is very expensive. To resolve this problem, we transmit the data recorded at each computing node to the memory of a central computing node.
In this case, the data storage of each node is replaced with a network interface that sends
data to the central storage location. To minimize perturbation due to monitoring, a separate
dedicated network is used for the transmission of the collected data to the central location
resulting in a need for extra hardware. The latter option is employed in the monitoring framework of our visualization system (see section 6.2).
If the target system is a distributed real-time system, an additional challenge is to minimize
the interference due to monitoring, because the level of perturbation attained by using a hybrid monitoring system may not be acceptable. Two approaches can be used to control the
effect of perturbation due to monitoring system: 1) hardware monitoring devices can be
used to reduce the interference due to monitoring; 2) perturbation analysis techniques are
used to predict the effect of monitoring, and make necessary adjustments to reduce the effect of interference.
3.4 Abstraction Levels in Program Monitoring
In testing and debugging a distributed system, different abstraction levels of the execution
information provide insight into the target system at different levels of detail [TSAI91].
Higher level information refers to data pertaining to events such as interprocess communication and synchronization, whereas lower level information refers to data pertaining to
events such as step-by-step execution trace of a target process. Based on the granularity of
the required information, program monitoring can be performed at two levels of abstraction: at the process level to collect higher-level information, or at the function level to collect more detailed lower-level information.
Run-time data collected by using process level monitoring includes information about
events such as process state transitions, communication and synchronization among the
software processes, and interactions between the software processes and the external processes. The execution data collected by monitoring at function level includes information
about events such as interaction among the functions and procedures that compose the
processes. Process level information is used to isolate faults within processes, whereas
function level information is used to isolate faults within functions and procedures. In the
rest of this section we identify events of interest in process level monitoring and function
level monitoring and we state triggering and stopping conditions for the data recording phase
of program monitoring.
3.4.1 Process Level Monitoring
The main reasons for monitoring and debugging at a process level are: 1) a process is the
minimum program unit that can exhibit non-deterministic behaviour, and hence, if we can
isolate faults to an individual process, we can possibly use the conventional cyclic debugging method for successive fault isolation at lower levels of abstraction; 2) we can reconstruct
the execution behaviour for interprocess communication and synchronization operations to
localize faults to an individual process [TSAI96].
In process level monitoring, a process is considered a ‘black box’ that can be in a running, ready, or waiting state. A process changes its state depending on its current state
and the event(s) that occurred in the system. We distinguish events that directly affect the program execution at the process level from those that affect the execution at a lower level.
Arithmetic operations, value assignment to variables, and procedure calls, for example, are
events that do not cause immediate state change of a process. Interprocess communication
and synchronization operations are among the events that may cause a change of process
state and affect execution behaviour.
To detect the occurrences of process level events and to record their key values, the monitoring module can be set to detect the interrupts from the I/O devices and the software traps
from the application processes that request services from the kernel. To collect the key values for an event with sub-events on two distinct nodes, such as remote process creation
events, the starting and ending conditions should include the interrupts from the interprocess communication devices.
The set of process level events includes among others: process creation, process termination, process synchronization, I/O operation, interprocess communication, wait child process, external interrupt, and process state change [TSAI96]. The process level events and
key values pertinent to them are summarized in Table 3.1.
To monitor these events and to collect information pertinent to them, Tsai et al. preset two
sets of conditions in the Quality Control Unit of the interface module of the monitoring system: one condition to trigger data recording, and the other to stop it. The
triggering condition that must be satisfied to start data recording is summarized as follows:
IF ((system call interrupt) AND (interrupt process-level related))
OR ((system call interrupt) AND (I/O request))
OR (I/O completion interrupt)
OR (external interrupt from IPC device)
OR (program error interrupt)
THEN
<trigger data recording process>;
After the kernel services system calls or interrupts, the kernel always switches the system
mode to the user mode and then returns control to an application process. Thus, the stop
condition for all the events can be stated as follows:
IF (instruction changes the system mode to user mode)
THEN
<stop data recording process>;
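The triggering condition above can be encoded as a simple boolean predicate. This is our sketch, not Tsai et al.'s implementation: the enum and function names are invented, and the hardware qualification logic is reduced to a switch over interrupt types.

```c
#include <assert.h>

/* Illustrative interrupt classification for the triggering condition. */
enum int_type {
    SYSCALL_INT,        /* system call interrupt */
    IO_COMPLETION_INT,  /* I/O completion interrupt */
    IPC_EXTERNAL_INT,   /* external interrupt from an IPC device */
    PROGRAM_ERROR_INT,  /* program error interrupt */
    OTHER_INT
};

/* Returns 1 if data recording should be triggered, 0 otherwise,
 * mirroring the IF condition stated in the text. */
static int should_trigger(enum int_type t, int process_level_related,
                          int io_request)
{
    switch (t) {
    case SYSCALL_INT:
        /* trigger only if the call is process-level related
         * or is an I/O request */
        return process_level_related || io_request;
    case IO_COMPLETION_INT:
    case IPC_EXTERNAL_INT:
    case PROGRAM_ERROR_INT:
        return 1;
    default:
        return 0;
    }
}
```

The matching stop condition would simply be a check that the executed instruction switches the system back to user mode.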
Information pertaining to the process level events is collected from the target system as a
block of data that contains the key values for the events. The collected data can be saved to
secondary storage for postprocessing, or sent directly to the visualization module for on-line interpretation, analysis, and display.
3.4.2
Function Level Monitoring
Information collected by monitoring at the process level may be too abstract for the programmer to locate and remove bugs. To identify faulty components at a lower level (i.e. faulty functions or procedures), we need to monitor the system at the function level. This can be done in two steps.
First, a set of faulty processes are identified by using process level monitoring, and then the
faulty processes are monitored at the function level to identify faulty functions using the
information collected in the first step.
The events that need to be monitored at the function level are function calls and function returns. The function level events and their key values are summarised in Table 3.2.
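In a high-level language the same call and return events, with the key values of Table 3.2, can be captured by wrapping the functions of interest. The decorator-based tracer below is only an illustration of the idea, not the kernel-level mechanism described above; the record layout is our own.

```python
# Illustrative function-level tracer: records the call and return events
# of Table 3.2 (calling/called function, parameters, time) for any
# function it wraps. Names and record layout are ours, for sketching.

import functools
import inspect
import time

TRACE = []  # collected function-level events

def traced(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        caller = inspect.stack()[1].function  # calling function ID
        TRACE.append(("call", caller, fn.__name__, args, time.time()))
        result = fn(*args, **kwargs)
        TRACE.append(("return", caller, fn.__name__, result, time.time()))
        return result
    return wrapper

@traced
def add(a, b):
    return a + b
```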
Event                      Key Values
-------------------------  ------------------------------------------------------
Process Creation           Parent Process ID, Create Call Time, Node ID;
                           Child Process ID, Creating Process Time, Node ID
Process Termination        Parent Process ID, Resuming Time, Node ID;
                           Child Process ID, Termination Time, Node ID
Process Synchronization    Process ID, Operation (P/V), Semaphore ID,
                           Value of the Semaphore, Time, Node ID
I/O Operation              Process ID, Operation (I/O), I/O Port ID,
                           Message (I/O buffer), Time, Node ID
Interprocess               Sending Process ID, Message, Node ID,
Communication              Send-Call Time, Receive-Acknowledgement Time;
                           Receiving Process ID, Message, Node ID,
                           Receive-Call Time, Receiving-Message Time
Wait Child Process         Parent Process ID, Child Process ID, Time, Node ID
External Interrupt         Interrupted Process ID, I/O Port ID,
                           Message (I/O buffer), Time, Node ID
Process State Change       Process ID, New State, Transition Time, Node ID

Table 3.1 Process-level events and their key values
Event                      Key Values
-------------------------  ------------------------------------------------------
Function Call              Calling Function ID, Called Function ID,
                           Passed-in Parameters, Time
Function Return            Calling Function ID, Called Function ID,
                           Returned Parameters, Time

Table 3.2 Function-level events and their key values

3.5
Interference Due to Monitoring
In monitoring distributed systems, the insertion of instrumentation code into a target system
affects the performance of the target system which in turn affects the ordering and timing
of events. The ordering of the events can be classified as a partial ordering or a total ordering. A partial ordering is a local sequence of events occurring within a node. The timing
of local events is referenced to the local clock of the node. Since the local clocks of different
nodes are not synchronized, the times recorded in one node cannot be compared to the times
recorded in the other nodes. In contrast, the total ordering is a global sequence of all events
occurring in the system. In this case the timing of all events is referenced to a single global
clock or to synchronized clocks [LAMP78]. Therefore, an event with an earlier global timestamp definitely occurred before an event with a later timestamp.
In sequential computing, since intra-process events have a total ordering, interference due to monitoring affects only the timing of the events, but not their order. In distributed
processing, however, delaying one of the processors may slow down or stop the execution
of another process thereby causing it to miss a deadline or alter the event ordering with respect to events on remote processors. To minimize the effect of the interference caused by
monitoring, two approaches are used: 1) a hardware monitoring device is used to reduce the interference; 2) a perturbation analysis technique is used to predict the effect of monitoring, and changes are made to reduce the interference. Perturbation analysis is discussed in the next section.
3.6
Perturbation Analysis
Adding instrumentation code to support program visualization invariably affects the performance of the target system [TOPO94]. Removing the instrumentation code after monitoring restores the performance of the target system; however, the target system with the instrumentation code removed may behave differently from the one with the instrumentation code inserted [TSAI96]. Thus, for the behaviour of the system to remain predictable, the instrumentation code should be kept in the target system permanently.
The “true” execution behaviour of the target system can be discovered by predicting and
removing the perturbation caused by monitoring.
Perturbation analysis techniques examine event ordering and timing in an attempt to find
ways to reduce the effects of monitoring interference by adjusting the event ordering and
timing. Event ordering is found by reconstructing the total ordering of the interprocess communication events based on knowledge gained from the system’s kernel and cross-compiler. The “true” event timings are found by measuring each event’s delay due to the execution of the instrumentation code.
Malony et al. [MALO92] present two perturbation analysis models. The first model predicts the “true” total execution time of a program from the collected event trace by removing the effect of executing the instrumentation code. The second model adjusts each individual event to its “true” time by removing the effect of the instrumentation code executed
before the events occur. In these models, it is assumed that the execution of instrumentation
code can be de-coupled from the execution of the target program, and indirect perturbation
such as register reference and cache reference patterns are neglected. Malony et al. conclude that with a proper perturbation model and analysis, the increase in execution time due
to perturbation can be reduced to less than 20% of the total execution time of the target program.
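Malony et al.'s second model can be caricatured in a few lines: if each probe costs a roughly constant overhead, an event's "true" time is its measured time minus the cost of all probes executed before it on the same processor. The constant-cost, single-processor assumption below is ours, for illustration; the real model also reconciles inter-process orderings.

```python
# Sketch of per-event perturbation removal (in the spirit of Malony et
# al.'s second model): subtract the accumulated instrumentation cost
# from each measured timestamp. Assumes a constant probe cost, one
# probe before each event, and a single processor.

def adjust_times(measured, probe_cost):
    """measured: event timestamps in occurrence order on one processor."""
    adjusted = []
    for i, t in enumerate(measured):
        # (i + 1) probes have executed before and including this event
        adjusted.append(t - (i + 1) * probe_cost)
    return adjusted
```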
Sute and Kang [SUTE94] propose a technique to preserve the execution behaviour of a target system by giving an equal delay time to each of the communication events involved. In order to
maintain the original order of events, dummy probes are inserted before some events to
make the delay time uniform. More accurate performance data is obtained by removing the
probe time which will be uniform for all events after the adjustments are made.
4
Program Visualization
In this chapter, we discuss program visualization techniques and the application of visualization to program debugging and performance evaluation. Program visualization is useful in
understanding, debugging and finding performance bottlenecks of a target program.
Through the use of various displays, erroneous program behaviour can easily be detected
and highlighted.
The raw data collected by monitoring an execution of a target program can be presented
directly as a sequence of data values. However, there is usually too much data for a user to
interpret and comprehend. The data needs to be transformed into another form that makes large amounts of data easily comprehensible. For instance, displaying the data in the form of a graph may allow a large amount of information to be understood at a glance.
In program visualization we display the execution information collected from the target
system in the monitoring phase in a systematic, meaningful and logical way. Program visualization has proven to be an efficient way to display and examine the dynamic behaviour of program execution. A well designed interactive graphical visualization can convey information about the behaviour of program execution much more effectively than textual representations. It also allows the user to control the level of abstraction at which the
available information is displayed.
In developing a program visualization system for displaying animation of program execution behaviour, two major components need to be developed. Firstly, a monitoring mechanism for extracting and formatting program event information needs to be developed. Secondly, a mechanism for mapping and restructuring the collected information as an input to
the visualization component to create animated graphical displays must be developed.
4.1
Program Visualization Techniques
Although program visualization is still in its infancy, some general distributed computing
visualization techniques are emerging [EILE93]. Statistical displays, communication
views, animations, and application-specific visualization are among the visualization techniques. In the SOBER visualization system - the SOBERvis - we employ statistical displays, and communication views of application-specific information. In the rest of this
chapter, we present some characteristics of different visualization approaches.
4.2
Statistical Displays
Many program performance visualization systems such as ParaGraph [HEAT91] heavily
rely on statistical displays for the presentation of performance data. Commonly used statistical displays include bar charts, Kiviat diagrams, and utilization Gantt charts. These displays provide insight into performance of the target distributed computing system, and due
to their performance oriented nature they rely heavily on real-time timestamps [TOPO95].
Thus, a visualization system should support a clock synchronization mechanism, or should have access to synchronized clocks.
4.3
Communication Views
Communication views are used to represent the message transmission among the nodes in
a distributed computing system. Typically, the topology of the processors and interconnection network that is displayed matches the users’ mental model of the topology of the target
distributed system. The ParaGraph [HEAT91] visualization system, for example, provides
a substantial set of topology-specific communication views. The Lamport view (usually
known as the space-time view) is one of the most popular communication views. In the Lamport view, process numbers are listed along the y-axis and time is displayed along the x-axis. Communication events such as send/receive pairs are represented as a line drawn between the sending and receiving processes. That is, the x-coordinates of the line are determined by the send time and the receive time, whereas its y-coordinates are determined by the process identities of the sending and receiving processes.
When the Lamport view uses real time, the view provides information about resource utilization and the general communication pattern. When the view uses Lamport logical time, it enforces a consistent ordering on a computation, in addition to displaying the communication pattern. Consistency is an important feature for testing and debugging; it is not achievable by using global real-time timestamps [FIDG94].
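The coordinate mapping just described reduces to one line segment per message. A sketch of the computation, using an event-record layout of our own (not the actual trace format of any of the systems cited):

```python
# Build the line segments of a Lamport space-time view: x = time,
# y = process number, one segment per (send, receive) pair. The
# (sender, send_time, receiver, recv_time) tuples are an illustrative
# trace format.

def spacetime_segments(messages):
    segments = []
    for sender, send_time, receiver, recv_time in messages:
        # Each message is drawn from (send_time, sender_row)
        # to (recv_time, receiver_row).
        segments.append(((send_time, sender), (recv_time, receiver)))
    return segments
```

A plotting library would then draw each segment over the horizontal process lines to obtain the familiar space-time diagram.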
4.4
Animations
Sophisticated graphical toolkits support the features necessary to animate concurrent events. This approach conveys critical information to the viewer; information that cannot be conveyed by a serialized view of a parallel application. The Conch message passing view [TOPO94] is a typical example of a concurrent animation. In this view,
processes are laid around a circle. When a process sends a message, a small filled circle
representing the message moves towards the centre of the circle in the general vicinity of
the process that will receive the message. When the message is received, it moves from its
intermediate position in the circle to the receiving process. This display clearly presents
message broadcasts and general message passing pattern.
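The motion of the message token in a Conch-style view is plain linear interpolation between positions on the circle. The sketch below shows the geometry; the layout, parameter names, and two-phase stepping are our own simplification, not Conch's actual implementation.

```python
# Animate a message token as in a Conch-style view: processes sit on a
# circle, and the token is linearly interpolated from the sender toward
# the centre, then out to the receiver.

import math

def process_position(index, n, radius=1.0):
    """Place process `index` of `n` evenly around a circle."""
    angle = 2 * math.pi * index / n
    return (radius * math.cos(angle), radius * math.sin(angle))

def token_position(src, dst, t):
    """t in [0, 1]: first half moves src -> centre, second half centre -> dst."""
    if t < 0.5:
        f = t / 0.5
        return (src[0] * (1 - f), src[1] * (1 - f))
    f = (t - 0.5) / 0.5
    return (dst[0] * f, dst[1] * f)
```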
4.5
Application-specific Visualization
Application-specific visualization is a program depiction that is developed specifically for
a particular application. This type of view illustrates the semantics of a program, its fundamental methodologies, and its inherent application domain [STAS92]. Visualization for correctness debugging is different from that for performance evaluation because debugging requires application-specific program views. An animation of a sorting algorithm, for example, should show the data values being exchanged, whereas a visualization of Gaussian
elimination should show the matrix of values as it is manipulated. In other words, an application-specific program visualization is recognized as presenting specific information
about the particular program or program class.
By presenting the execution of a distributed program in its inherent semantic format or application domain, a visualization system can provide programmers with an insight into the
program’s functionality. The same information could be acquired by examining the values of program trace variables throughout execution, but this type of tracing is much more deliberate and requires the programmer to associate the values of the variables with the program state
at a particular time.
Program performance visualization differs from application-specific program visualization
because performance views depict how efficiently a program is executing on a parallel or
distributed system. Performance views illustrate message passing, process utilization,
memory access, etc., and they are typically drawn from a library of graphical widgets,
gauges, x-y-z plots, and charts. Performance views can be reused for many different applications because they do not focus on the semantics of a particular program.
5
Debugging and Testing
Program debugging and performance testing are among the several areas in computing to which program monitoring and visualization are applicable. In this chapter we discuss some traditional debugging techniques for sequential programs and present debugging approaches for distributed systems.
5.1
Program Debugging
In program debugging, two different analysis techniques are used, namely static and dynamic analysis. Static analysis is used to analyse the design specification and the source code to detect program anomalies, whereas dynamic analysis is used to analyse the execution behaviour of the program. Static analysis systems are distinguished from dynamic analysis systems by not requiring program execution and by generally checking for structural faults instead of functional faults. That is, static analysis tools have no knowledge about the intended functionality of the target program; they simply identify program structures that are generally indicators of an error [CHAR89]. In the next sections, we investigate the two debugging techniques in more detail.
5.2
Program Debugging Techniques
5.2.1
Static Analysis
Static analysis tools entirely avoid the interference effect by not executing the programs.
They have the potential to identify a large class of program errors that are particularly difficult to find using the dynamic analysis technique. Static analysis is used to detect
two classes of errors in distributed programs: synchronization errors and data-usage errors. Synchronization errors include such bugs as deadlock and ‘wait-forever’. Data-usage
errors include the usual sequential errors, such as reading an uninitialized variable, and parallel errors typified by two processes simultaneously updating a shared variable.
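The data-usage class of errors can be illustrated with a toy static check: given a straight-line sequence of statements, flag any variable read before it is assigned. Real static analysers work over full control-flow graphs and interprocess interactions; the `(target, reads)` statement format below is an invented miniature IR, purely for illustration.

```python
# Toy static data-usage check: detect reads of uninitialized variables
# in a straight-line program. Each statement is a (target, reads)
# tuple: the variable being assigned and the variables it reads.
# This is an illustrative sketch, far simpler than real static analysis.

def uninitialized_reads(statements):
    assigned = set()
    errors = []
    for target, reads in statements:
        for var in reads:
            if var not in assigned:
                errors.append(var)   # read before any assignment: flag it
        assigned.add(target)
    return errors
```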
Static analysis uses formal specification and verification to locate erroneous code before
program execution. In general, static analysis is supported by a formal specification language that precisely specifies the system, a formal design method that is systematically
used to develop the system, and a formal verification method that logically proves the correctness of the developed system with respect to the specification [TSAI96].
The primary problem with most static analysis algorithms is that the set of examined
states is large and their worst-case computational complexity is often exponential. In addition, static analysis has inherent limitations in dealing with asynchronous interactions between processes. In other words, it is not possible to fully describe and model the behaviour
of distributed systems by using static analysis technique before program execution.
5.2.2
Dynamic Analysis
In the traditional dynamic method for debugging sequential software, the program is executed
until an error manifests itself; the programmer then stops the execution, examines the program status, inserts assertions, and re-executes the program in order to collect additional
information about the causes of the error. This style of debugging is called cyclical debugging. In cyclical debugging three approaches are used: memory dumps, tracing, and breakpoints.
5.2.2.1
Memory dumps
The memory dump approach provides the lowest level debugging information. Once the
system terminates abnormally or by request from the programmer, the program status, including program object code, register contents, and memory contents, is dumped into a
file. The advantage of this approach is that it provides sufficient information necessary to
locate an error. Its drawbacks are that it requires programmers to have a strong background
in low-level computer languages to examine the dumped code, and it is tedious and error
prone.
5.2.2.2
Tracing
The tracing approach utilizes special tracing facilities supplied by the operating system, a
compiler or programming environment to display selected information. The trace facility
continuously tracks every step of program execution including control flow, data flow, and
variable contents, and it reports relevant changes at defined times [CHEU90]. The advantage of the tracing approach is that the user can interactively suspend program execution to examine changes in program status at any time. In addition, the tracing approach makes output debugging and trace insertion easier.
The tracing approach completely relies on the programmer to specify appropriate actions.
If traces are enabled for multiple processors, the programmer or the debugger must assemble them to obtain a global trace. In any case, global timestamps (either real-time or logical
time) are necessary to timestamp the trace information. Hence clock synchronization support is required. Although it is possible to develop a clock synchronization facility, many distributed operating systems do not provide one. When no such facility is available, we can make
a selected processor responsible for generating the global trace according to the order in
which the trace messages are received from all other processors. However, because of variable communication delays and the non-determinism of processor scheduling, the trace
messages may not arrive at the selected processor in the order they were generated. The selected node may also become a bottleneck for the collection of trace information. Thus, a
better way to address this problem is to provide a clock synchronization support.
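With synchronized (or logical) global timestamps, assembling the global trace from locally ordered per-node traces is a k-way merge. A sketch, assuming `(timestamp, node, event)` records (a format of our own choosing):

```python
# Merge locally ordered per-node traces into one global trace by
# timestamp. Each trace is a list of (timestamp, node, event) records,
# already sorted locally; heapq.merge performs the k-way merge lazily.

import heapq

def global_trace(*node_traces):
    """Combine per-node traces into a single globally ordered trace."""
    return list(heapq.merge(*node_traces))
```

Note that this presupposes comparable global timestamps, which is exactly why the clock synchronization support discussed above is needed.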
5.2.2.3
Breakpoints
A breakpoint is a point in a program execution flow where normal execution is suspended
and relevant run-time information, such as variable values, stack counters, and register values, can be displayed. At a breakpoint the programmer can interactively examine and modify parts of the program status, or control later execution by requesting single-step execution
or setting further breakpoints.
The advantage of the breakpoint approach is that it requires no extra code in the program, and
hence avoids the effects of adding debugging probes to distributed programs. It also allows
the programmer to control distributed execution and to select display information interactively. Its disadvantage is that it requires strong knowledge of programming and debugging
to set the breakpoints at appropriate places in the program and to examine the relevant data.
Traditional distributed debuggers generally support the same type of breakpoints as those
found in sequential debuggers [CHEU90].
5.3
Performance Measurement
As multiprocessor systems proliferate in the market, there is an increasing need to evaluate
their relative performance when executing various applications. Performance measurement
is conducted on an existing computer system to identify current performance bottlenecks,
to correct them, and to prevent potential future performance problems. An advantage of
performance measurement is that the performance of the real system rather than that of the
model system is obtained (as opposed to performance modelling). Disadvantages of performance measurement, however, include the need for a real running system, and the necessary design of the measurement instrumentation.
5.4
Debugging Distributed Systems
The classical approach to debugging sequential programs involves repeatedly stopping a
program execution, examining program state, and then either continuing or reexecuting in
order to stop at an earlier point in the execution. Unfortunately, distributed programs are
not always reproducible because of their non-deterministic nature. Even when they are run
several times with the same input, their results can be radically different. These differences
are caused by races - a situation which occurs whenever two or more processes that are running in parallel are trying to use a resource simultaneously. For example, while two processes are running concurrently, one process may attempt to write a memory location while
the other process is reading the memory location. The behaviour of the second process depends on whether or not it reads the new value after the first process has completed writing
the memory.
The non-determinism arising from races is particularly difficult to deal with because the
programmer often has little or no control over it [CHAR89]. The resolution of a race may
depend on each CPU’s load, the network traffic, and non-determinism in the communication medium. The cyclic debugging approach often fails for distributed programs because
of their non-deterministic behaviour.
Another problem found in distributed systems is that the concept of “global state” is misleading or even non-existent [LAMP78]. Without a synchronized global clock, it may be
difficult to determine precisely the global order of events occurring in distinct, concurrently
executing processors.
The most straightforward approach to implementing distributed debugging is to associate
a sequential debugger with each target process and to collect information from each debugger. This implementation would be adequate if bugs occurred only inside the
nodes. However, processes executing on different nodes have interprocess relationships
that would not be detected with such an implementation.
Charles et al. [CHAR89] categorize dynamic analysis techniques for debugging concurrent
systems into two general categories: traditional parallel debuggers (sometimes called
“breakpoint” debuggers) and event-based debuggers. The traditional parallel debuggers are
the easiest to build and therefore provide an immediate partial solution. They provide some
control over program execution and state examination. Event-based debuggers provide better abstraction than that provided by traditional style debuggers. They also address the interference effect by permitting deterministic replay of non-deterministic programs.
Using the breakpoint technique to debug distributed computing systems raises three problems: 1) it is impossible to define a breakpoint in terms of a precise global state; 2) the semantics of single-step execution are no longer obvious (some researchers define it to be the execution of a single machine instruction or a statement of source code on a local processor,
others consider it to be a single statement on each processor involved, and still others treat
it as message transmission, reception, or process creation/termination); 3) there is a problem of halting a process cluster at a breakpoint or after a single step [CHEU90].
When a breakpoint is triggered, either all of the processes in the distributed program or only
the process encountering the breakpoint can be stopped. The former can be difficult to
achieve within a sufficiently small interval of time, and the latter can have a serious impact
on systems that contain mechanisms such as time-outs. To address this issue, Cooper
[COOP87] introduced a logical clock mechanism to maintain correct time-out intervals,
and thus provide transparent process halting.
Some debugging issues are very critical with respect to the current implementation of the
SOBER system. Determining the number, the types and the size of messages generated by
a given operation is, for example, very important for identifying communication bottlenecks. This same information can also be used in experimenting with the SOBER system using different network topologies (e.g. a ring or a hybrid of star and ring) in our future work to reduce message bandwidth. Detection of the failure of a computing node in general, and that of the central switching node in particular, is extremely useful for maintaining the faulty node dynamically in the future.
5.5
Chapters Review
In this and the previous chapters, i.e. chapters 2 - 5, we have discussed several general and
basic issues including structures and architectural models of distributed computing systems, general techniques for program monitoring, visualization and debugging and their
extension to distributed computing environments. Some of the concepts and principles discussed, especially monitoring and visualization techniques are directly employed in the development of our distributed visualization system, whereas some others are used in selecting a more suitable monitoring and visualization framework for our target distributed
system, and others are addressed mainly because of their theoretical significance and for
completeness of the report.
6
The SOBER Visualization System
In this chapter, we briefly discuss the design and implementation of the SOBERvis system
and we present its major components. In section 6.1, we present an overview of the SOBER
distributed system and its components at a fairly detailed level. In sections 6.2 and 6.3, we
discuss the monitoring and visualization frameworks we have chosen for the SOBERvis system
and their implementations. Finally, in section 6.4, we present the design and implementation of the components of the SOBERvis system.
The SOBER visualization (SOBERvis) system is a pure software visualization system that
is developed to visualize run-time information about the SOBER distributed system. Information such as virtual network topology, statistical information about interprocess communication, and synchronization are presented to assist both SOBER programmers and users
in understanding the execution behaviour of the system, in debugging, and in performance
tuning. SOBERvis consists of instrumentation code inserted into the target system to
detect the occurrences of the pre-defined events of interest and to collect the data pertinent
to the events, and modules that produce visual display of the collected data.
6.1
The SOBER System
The SOBER system is a distributed computing system developed to simulate offshore
emergency response training routines. It further approximates real-life situations, and offers an economically attractive alternative to full scale exercises [SAND95]. The SOBER
system focuses on some salient aspects of offshore accident management activities such as
communication, coordination and resource allocation, and develops vital skills among the
personnel and enables them to make the right decisions when the accident occurs.
The SOBER system is a fully distributed system that is implemented on a set of heterogeneous autonomous workstations that are interconnected by a network. That is, a computing
node in the SOBER distributed system is a single-processor computer with its own storage
memory and I/O devices. Different components of the SOBER system are run on a local-area or a wide-area network, in order to better utilize distributed computing resources, and
to enable several operators to interact simultaneously. Adding to the flexibility of the system, all communications are based upon standard industry protocols (TCP/IP sockets and
UNIX streams). Unlike many other simulators, the SOBER system uses only off-the-shelf
hardware, which makes the system scalable with respect to need and cost constraints.
As we mentioned earlier in chapter 1, the virtual network topology of the SOBER system
is a single-star: a central switching server node and several application nodes directly connected to the central node (see Fig. 6.1). The communication cost of a network with singlestar topology is a linear function of the number of nodes on the network since communication between two nodes requires at most two transfers. However, this data transfer scheme
may not ensure speed since the central switching node may be a communication bottleneck.
Another major drawback of this scheme is that a failure in the central server node completely partitions the network, since the server node is dedicated to the message switching task.
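The at-most-two-transfers property of the single-star topology is easy to state as a routing function. The sketch below uses our own node naming ("server" as the hub); it is only a restatement of the property, not SOBER's routing code.

```python
# Routing in a single-star topology: every message between application
# nodes goes through the central switching node, so any pair of nodes
# is at most two transfers apart. The hub name "server" is illustrative.

def route(src, dst, hub="server"):
    if src == hub or dst == hub:
        return [src, dst]          # one transfer: hub is an endpoint
    return [src, hub, dst]         # two transfers: src -> hub -> dst

def transfers(src, dst, hub="server"):
    """Number of message transfers between two nodes."""
    return len(route(src, dst, hub)) - 1
```

This also makes the two drawbacks concrete: every path includes the hub, so the hub is both the bottleneck and the single point of failure.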
6.1.1
Components of the SOBER System
The SOBER system is composed of several components or modules which serve to manage
and control objects in the shared virtual world. Combinations of these basic building blocks form application units, such as a 3-D virtual flight simulator and a radar system, with which
the user can navigate and interact with objects in the virtual world. Utilization of such an
object-oriented design methodology increases the independence and adaptability of the
modules.
An object server module executes on the central switching node and is dedicated to the message switching task and to connecting the individual application components, allowing them to communicate via a message-passing paradigm. The object server
module is also responsible for the management and maintenance of the virtual world, for
consistency and coordination of object interactions, for updating the state of the virtual
world after each simulation time-step, and for communicating the changes to the other SOBER applications to ensure that all participating applications are able to see the consequences of the others’ actions. This in turn enables course participants who are interacting
with the objects in the virtual world via the applications to see the consequences of the other
participants’ activities in the virtual world. The object server also coordinates communication and execution of other applications. To avoid data corruption in the virtual world, at
most one application has control over a given object in the virtual world at any given time
and all other applications can only hold a ‘proxy’ to the objects they do not control.
A Scenario module is the instructor’s game master used for designing and executing a list
of critical events - scenarios - to which the trainees must react. The scenario module also
enables the instructor to create and update the library of SOBER objects in the virtual world
and to decide the role of a trainee running a given application and which objects the application should control. A scenario clock emulates the time of the day which affects which
objects the participants can see at a given time.
In order to simulate real-life media channels, the SOBER system also integrates two other
communication networks and servers: a message server network and a radio communication network. The message server network provides multimedia capabilities by supporting distribution of sound, picture and text messages. The message server, for example, enables a
course instructor to send text information or alarms to the course participants in order to
guide them through a problematic phase or to indicate emergencies. A radio communication network is also a separate network that is intended to provide digital sound distribution
support. The Oil Simulator and the Weather Simulator components add to the level of realism of the SOBER system by providing realistic diffusion of oil spills from installations
or ships, taking into account weather conditions and the use of booms and skimmers to fight
the oil spill.
Fig. 6.1: Virtual Network Topology of the SOBER System
6.2
Monitoring Framework for SOBERvis
As we mentioned earlier, an implementation of a program visualization system has two distinct phases: a data collection or monitoring phase, and a visual display creation or visualization phase. These two phases can be implemented either as co-routines, in the case of an on-line visualization mode where the target system and the visualization system execute concurrently, or as distinct routines, in the case of an off-line visualization mode where the visualization routines start only after the monitoring routines and the target program have completed. The SOBERvis system employs the software monitoring technique discussed in section 3.1.1 to collect run-time information about the target SOBER system, and it shares the computing power and storage resources of the target system. The implementation of the SOBERvis monitoring supports both on-line and off-line modes of visualization.
The monitoring framework for SOBERvis is implemented by developing a monitor server to which a unique port number is assigned, so that the SOBER components - servers and applications - connect to it automatically when they are created. Fig. 6.2 shows the structure of the SOBERvis monitoring framework. The communication routines in the components are modified so that they send a copy of every successfully sent or received message and event to the monitor server via a separate virtual network intended for monitoring purposes. Based on the form and type of the events received, and on the visualization mode, the monitor server invokes a visualization routine to process the message received from the SOBER applications.
The set of events of interest in the SOBER distributed system includes, but is not limited to, the following:
• process creation and termination;
• interprocess communication (SEND and RECV events);
• SOBER object creation and deletion; and
• grabbing and releasing (i.e. taking control of or releasing) an object by an application.
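The event classes above can be modelled as a small tagged record type that the monitor filters on arrival. The following Python sketch is purely illustrative - the type names, fields, and filtering helper are hypothetical and do not appear in the SOBER sources:

```python
import time
from dataclasses import dataclass, field
from enum import Enum, auto

class EventType(Enum):
    PROC_CREATE = auto()     # process creation
    PROC_TERMINATE = auto()  # process termination
    SEND = auto()            # interprocess communication
    RECV = auto()
    OBJ_CREATE = auto()      # SOBER object creation/deletion
    OBJ_DELETE = auto()
    OBJ_GRAB = auto()        # taking control of an object
    OBJ_RELEASE = auto()     # releasing an object

@dataclass
class MonitorEvent:
    etype: EventType
    source: str              # name of the SOBER application task
    timestamp: float = field(default_factory=time.time)
    payload: bytes = b""

# Only events of interest are kept; anything else is discarded on arrival.
EVENTS_OF_INTEREST = set(EventType)

def is_of_interest(ev: MonitorEvent) -> bool:
    return ev.etype in EVENTS_OF_INTEREST
```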
The transfer and processing of the collected data can be performed in two different ways, depending on the visualization mode: 1) if the visualization mode is on-line, the collected data is transferred to the visualization modules and displayed in real time; 2) if the visualization mode is off-line, the collected data is transferred to the secondary storage of the node on which the visualization module executes, for postprocessing.
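The two transfer paths can be sketched as a single routing decision. The function and parameter names below are illustrative only, not taken from the SOBERvis implementation:

```python
import json, os, tempfile

def handle_event(event: dict, mode: str, trace_file, display) -> None:
    """Route a collected event according to the visualization mode:
    'online'  -> forward to the visualization module immediately;
    'offline' -> append to a trace file on secondary storage for postprocessing."""
    if mode == "online":
        display(event)                                  # rendered in real time
    else:
        trace_file.write(json.dumps(event) + "\n")      # replayed later

shown = []
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "trace.log")
    with open(path, "w") as f:
        handle_event({"type": "SEND", "src": "radar"}, "online", f, shown.append)
        handle_event({"type": "RECV", "src": "scenario"}, "offline", f, shown.append)
    with open(path) as f:
        stored = f.read().splitlines()
```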
6.3
Visualization Framework for SOBERvis
The SOBERvis system is an experimental visualization system we have developed to provide program monitoring and visualization support to programmers and users of the SOBER distributed computing system. The SOBERvis system supports two types of displays: the communication display focuses on large-grained events that are influenced by and related to the overall behaviour of the SOBER distributed system, whereas the statistical display provides more detailed information that is useful for program analysis and performance evaluation. Moreover, the SOBERvis system supports two visualization modes: an on-line and an off-line visualization mode. In the on-line visualization mode, run-time information about the target system is displayed in real time, whereas in the off-line visualization mode data relevant to the events of interest is recorded in secondary storage and replayed when the user interactively inputs the proper command. This two-mode visualization approach provides more insight into the execution behaviour, performance efficiency, load balance, and operations of the SOBER distributed system. The SOBERvis system provides its users with the opportunity to interactively determine the display type and visualization mode.
[Figure: the SOBER Distributed System network (Radar Station, Scenario, Weather Simulator, Flight Simulator, Object Server) connected to the SOBER Visualization System network (Graphical User Interface, Monitoring Routines, Visualization Routines)]
Fig. 6.2: Structure of SOBERvis System
6.3.1
Communication Displays and Statistical Displays
A virtual network topology display presents information such as: the SOBER applications connected to or disconnected from the switching server node; the name, type, and state of the applications running on a given computing node; the total number of objects in the virtual world and the number of objects controlled by a given application at a given time; and the proportion of messages communicated along a given communication link. For instance, to indicate the current status of a given SOBER application task we use the following colour coding: green=running, red=suspended. To maintain consistency in the use of colour codes in our visualization system, we use the same colour coding to display the length of time a computing node spends in either of these two states.
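The colour coding can be sketched as a simple state-to-colour mapping. The RGB triples and the fallback behaviour below are illustrative assumptions, not SOBERvis’s actual values:

```python
# Hypothetical mapping from application state to display colour,
# mirroring the coding described in the text (green=running, red=suspended).
STATE_COLOURS = {
    "running": (0.0, 1.0, 0.0),    # green
    "suspended": (1.0, 0.0, 0.0),  # red
}

def node_colour(state: str) -> tuple:
    # Unknown states fall back to red so that anomalies remain visible.
    return STATE_COLOURS.get(state, STATE_COLOURS["suspended"])
```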
In a communication display, a node of the SOBER system is represented by a sphere labeled with the name of the application running on the node. The colour of the sphere indicates the state of the application executing on that particular node. A communication link between two nodes is indicated by a line drawn between the entities representing the nodes. The radius of the sphere gives an indication of the number of SOBER objects in the virtual world that are controlled by the application at a given time, and the thickness of a communication link between two application nodes is proportional to the maximum size of message sent along the link.
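The two geometric encodings above can be sketched as follows. The base radius and scaling constants are hypothetical; the thesis does not state the values SOBERvis actually uses:

```python
def sphere_radius(objects_controlled: int,
                  base: float = 0.5, scale: float = 0.1) -> float:
    """Sphere radius grows with the number of SOBER objects the
    application controls (base/scale are illustrative constants)."""
    return base + scale * objects_controlled

def link_thickness(max_message_bytes: int,
                   bytes_per_unit: float = 1024.0) -> float:
    """Link thickness is proportional to the largest message sent on the link."""
    return max_message_bytes / bytes_per_unit
```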
In addition to revealing the underlying virtual network topology of the SOBER system, this display enables us to monitor the proper functioning of the nodes, and it marks a faulty node with an appropriate colour. For instance, if the object server has crashed, the system controller or a user is informed by displaying the sphere that represents the object server in red.
The aggregate run-time information collected from the SOBER system gives an idea of the global system behaviour, such as interprocess communication and synchronization. In contrast, detailed information about each component application reveals the run-time behaviour of that component. The statistical information is categorized into two groups, namely global and local, and contains the following information.
Global information:
• the number of client nodes in the network at a given time
• the number of clients in the running/ready/waiting state
• the total number of SOBER objects in the virtual world database
• the total number of events that have occurred in the system
Local information:
For each SOBER client/application we display:
• the name of the application
• the network address of the application
• the name and address of the host on which the application executes
• the total number of objects controlled by the application at a given time
• the number of communication events that occurred in the application
6.3.2
On-line vs. Off-line Visualization Approach
An important consideration in designing a program monitoring and visualization system is whether the information gathered is utilized in an on-line or an off-line visualization mode. This issue is worth considering because it is among the determining factors in selecting an appropriate monitoring technique. If the tracing monitoring technique is used for off-line visualization, for example, extensive buffering of the recorded data is possible since the analysis is deferred until the execution of the applications completes. But if the tracing monitoring technique is used for on-line visualization, the information must be processed in real time and extensive buffering is not applicable.
Some visualization environments use the same monitoring technique, namely tracing, to support both their on-line and off-line visualization modes. While such visualization systems are very appropriate for off-line program analysis, they require a large amount of bandwidth when used for on-line analysis. Other visualization systems distinguish between the two monitoring techniques (tracing and sampling), and between the graphical views used for on-line analysis and those used for detailed off-line statistical analysis. PVaniM [TOPO96], for example, uses buffered tracing for postmortem analysis, employing a buffering hierarchy to collect trace events. For its on-line graphical views, PVaniM uses periodic sampling of events with adjustable granularity; this requires substantially less bandwidth than event tracing, but the views are not as detailed.
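The tracing/sampling trade-off can be illustrated with two minimal observers. This is a schematic contrast, not PVaniM’s or SOBERvis’s actual code; a real sampler would be driven by wall-clock time, whereas here event arrivals stand in for time:

```python
class Tracer:
    """Tracing: every event is recorded - detailed but bandwidth-hungry."""
    def __init__(self):
        self.trace = []

    def observe(self, event):
        self.trace.append(event)

class Sampler:
    """Periodic sampling: only an aggregate is emitted every `period` events,
    so far less data crosses the network at the cost of detail."""
    def __init__(self, period: int):
        self.period = period
        self.count = 0
        self.samples = []

    def observe(self, event):
        self.count += 1
        if self.count % self.period == 0:
            self.samples.append({"events_so_far": self.count})

tracer, sampler = Tracer(), Sampler(period=10)
for i in range(100):
    tracer.observe({"seq": i})
    sampler.observe({"seq": i})
```

With 100 events, the tracer ships all 100 records while the sampler ships only 10 aggregates - the bandwidth asymmetry the text attributes to PVaniM.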
6.4
Design and Implementation of SOBERvis
The primary components of SOBERvis are its graphical user interface, the monitoring routines, and the visualization routines. Figure 6.2 shows the structure of the SOBERvis system. In the rest of this section, we provide implementation details of these primary components.
6.4.1
Graphical User Interfaces
The SOBERvis system has an interactive 3D graphical user interface which allows its users to interactively set up initial visualization parameters such as the display type and the visualization mode. A user can choose between the communication and statistical displays, and can decide to visualize the information in either an on-line or an off-line visualization mode. These visualization modes are discussed in more detail in section 6.3.2.
When the SOBERvis system is invoked, the first window that appears on the screen is the SOBERvis Start-up window. The start-up window contains a control panel which allows the user to interactively set the initial visualization values mentioned above with a mouse click on the appropriate button. Figure 6.3 shows a snapshot of the start-up window of SOBERvis. We can set up different combinations of displays and visualization modes and compare the resulting views. For the on-line visualization mode, we need to specify the sampling rate using the scale bar that appears on the bottom line of the on-line visualization window. Each initialization parameter assumes its default value if it is not set explicitly. The default visualization mode is the off-line mode, the default display type is the communication display, and the default sampling rate is 0.0 seconds, i.e. the tracing monitoring technique (see section 3.2). The user can alter these default values interactively at any time.
Fig. 6.3: Start-up Window for SOBERvis
After we set up the initial values, we press the StartSbvis button to begin the graphical visualization of the target system, based either on the default parameters or on the visualization parameters we specified in the initialization step. This in turn displays a user interface component with which a user can interact to refine the visualization by making further choices. For instance, in the window of the SOBER virtual network topology display, a snapshot of which is shown in figure 6.4, selecting an item from the scrolled list of application names on the left side of the window displays a detailed view of statistical information about the selected application.
The monitoring and visualization system can also be stopped at any time without affecting the execution of the target SOBER system. This can be accomplished either by a mouse click on the Quit button in the start-up window or by selecting Exit from the File menu of any SOBERvis window.
Fig. 6.4: Display of Virtual Network Topology of SOBER system
Fig. 6.5: Display of Virtual Network Topology of SOBER system
(the red colour indicates that the Object Server is suspended)
Fig. 6.6: Display of statistical information about SOBER system
6.4.2
Monitoring Routines
Typically, programmers must hand-annotate their code with print statements to produce an event log for visualization. This approach is error-prone and time consuming, and may not be able to produce an event trace of sufficient detail. Another problem with this approach involves trace events that are timestamped by readings of local clocks that are not accurately synchronized, leading to misleading visualizations. For example, we may discover a message receipt event with a timestamp earlier than that of the corresponding send event. Visualization systems that use event trace data filled with such causality violations are misleading. In this subsection, we discuss the modifications we have made to the communication routines of the SOBER system to provide support for a straightforward visualization of the SOBER system.
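One standard remedy for such causality violations - noted here for context, not as the technique SOBERvis itself adopts - is the logical clock of Lamport [LAMP78], under which a receive event can never carry a timestamp earlier than its matching send:

```python
class LamportClock:
    """Minimal logical clock in the style of Lamport [LAMP78]; a sketch only."""
    def __init__(self):
        self.time = 0

    def tick(self) -> int:
        # Any local event, including a send, advances the clock.
        self.time += 1
        return self.time

    def receive(self, sent_at: int) -> int:
        # Receiving advances the clock past the sender's timestamp,
        # so the receive is always ordered after the matching send.
        self.time = max(self.time, sent_at) + 1
        return self.time

a, b = LamportClock(), LamportClock()
send_ts = a.tick()           # process A sends
recv_ts = b.receive(send_ts) # process B receives
```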
To address this problem, we integrate the monitoring support required for our visualization system directly into the SOBER distributed system by modifying its communication primitives. Because the standard communication primitives, such as sendMessage(), are cognizant of the type, form, source, destination and size of a message, they can be modified to automatically produce the event trace information necessary for visualization purposes. In our case, we have modified the communication routines, namely sendMessage() and recvMessage(), so that they send a copy of all messages successfully sent or received by a SOBER application to the monitor server. In other words, our visualization system uses a tracing monitoring framework. A similar approach is employed, for example, in the implementation of POLKA [TOPO94].
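The instrumentation pattern - wrapping a communication primitive so that every successful operation is mirrored to the monitor - can be sketched as follows. The function names here are stand-ins for SOBER’s sendMessage() and the monitor-network call; the actual signatures are not shown in this thesis:

```python
def make_monitored_send(send_message, report_to_monitor):
    """Wrap a send primitive so every *successful* send is also reported
    to the monitor server over the separate monitoring network."""
    def monitored_send(dest: str, payload: bytes) -> bool:
        ok = send_message(dest, payload)
        if ok:  # only successfully sent messages are reported
            report_to_monitor({"op": "send", "dest": dest, "size": len(payload)})
        return ok
    return monitored_send

sent, reported = [], []
raw_send = lambda dest, payload: (sent.append((dest, payload)) or True)
send = make_monitored_send(raw_send, reported.append)
send("objectServer", b"hello")
```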
When the SOBERvis system is invoked, the monitor server initializes itself and waits for connection requests from the SOBER application tasks. The objectServer is the first SOBER application to be invoked and connected to the monitor server. The other application tasks are then invoked and connected to the monitor server only if they have successfully connected to the object server. The monitor server receives all the messages from the SOBER application tasks and parses them. The trace events are then filtered on arrival, and only events of interest are further processed or stored for postprocessing. The message header of the SOBER events is also modified so that it contains a flag which is used by the monitor server to categorize a message as a “send” or a “recv”. This flag is very useful for obtaining aggregate statistical information such as the total number of send and receive events that occurred in a given SOBER application task.
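The header flag and the per-task aggregation it enables can be sketched as follows. The field names ("flag", "task") and flag values ("S"/"R") are illustrative assumptions, not the actual SOBER header layout:

```python
from collections import Counter

def classify(header: dict) -> str:
    """Use the direction flag in the (modified) message header to
    categorize an event as a 'send' or a 'recv'."""
    return "send" if header.get("flag") == "S" else "recv"

def tally(headers) -> Counter:
    """Aggregate per-task send/receive counts for the statistical display."""
    counts = Counter()
    for h in headers:
        counts[(h["task"], classify(h))] += 1
    return counts

stats = tally([
    {"task": "radar", "flag": "S"},
    {"task": "radar", "flag": "R"},
    {"task": "radar", "flag": "S"},
])
```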
6.4.3
Visualization Routines
When the monitor server receives a message from a SOBER application task, it handles the message by invoking a relevant member function of the monitor server. The first message received by the monitor server from a SOBER application task is a request for connection to the server. When the connection request message is received, the newConnection() member function handles the request. If the connection is successful, information about the application is parsed from the connection information (CM_SAP - service access point), and an instance of a SOBER application node is created and appended to the global list of applications connected to the object server.
Once a connection is established between a SOBER application and the monitor server, a copy of every message that is sent or received by the application is reported to the monitor server. When such a message is received, the dispatchMessage() member function of the monitor server is automatically called to handle it. If the message received is among the events of interest, a relevant visualization routine is invoked to further process the event information on-line, or to store it for postprocessing, depending on the display and visualization mode.
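The dispatching described here can be sketched as a handler table. The member function names (newConnection, dispatchMessage, handleClose) follow the thesis; the class structure, message format, and handler bodies are illustrative only:

```python
class MonitorServer:
    """Skeletal dispatcher mirroring the handlers described in the text."""
    def __init__(self):
        self.apps = []       # global list of connected applications
        self.handled = []    # log of handled events, for illustration

    def newConnection(self, msg):
        # A successful connection appends the application to the global list.
        self.apps.append(msg["app"])
        self.handled.append(("connect", msg["app"]))

    def handleClose(self, msg):
        # A close event removes the application and records the disconnect.
        self.apps.remove(msg["app"])
        self.handled.append(("close", msg["app"]))

    def dispatchMessage(self, msg):
        handlers = {"connect": self.newConnection, "close": self.handleClose}
        handler = handlers.get(msg["type"])
        if handler:          # events outside the set of interest are dropped
            handler(msg)

server = MonitorServer()
server.dispatchMessage({"type": "connect", "app": "objectServer"})
server.dispatchMessage({"type": "connect", "app": "radar"})
server.dispatchMessage({"type": "close", "app": "radar"})
```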
If the monitor server receives a message indicating that a SOBER application has exited (either normally or due to an error), the handleClose() member function is automatically called to handle the connection close request. In addition to closing the application’s connection to the monitor server, the handleClose() function marks the application as disconnected, and the graphical entity that represents the application is redrawn in red to reveal this information. Because the current implementation of the SOBER distributed system does not distinguish between normal and abnormal exits of applications, the SOBERvis system does not distinguish between normal and abnormal close events.
7
Summary and Conclusion
The SOBER visualization system is a prototype monitoring and visualization system developed to aid users and programmers of the SOBER distributed system in understanding its execution behaviour and the environment in which it executes.
In the monitoring and visualization of the SOBER distributed system we are interested in higher-level information, and hence the abstraction level of the monitoring employed in SOBERvis is the application (process) level. This is mainly because low-level information is hidden from the SOBER system users, which in turn makes the collection of such information not only irrelevant but also difficult. The main reason behind this impediment, we believe, is that the SOBER distributed system was developed without a focus on subsequent monitoring and visualization of the system.
Our experience shows that the monitoring data necessary to produce a meaningful visualization is not only difficult to capture, but may also fail to fit into the general visualization framework. For example, in the implementation of the SOBER distributed system the notion of the three states of a process is not clearly identified: the “waiting” state of a process is not recognizable. Moreover, there is no distinction between the normal exit and the abnormal exit of a SOBER application task, and hence we are obliged to indicate both exits using the same colour code - red. In the case of the objectServer application task, the distinction between normal and abnormal exits may not be necessary because the consequences are almost the same - a crash of the system. However, for the other SOBER application tasks a clear distinction should be made between a normal exit and an exit caused by an error, as the latter may need maintenance. The temporary disconnection of the scenario application task is an issue worth mentioning in this respect. We conclude that support for program visualization, namely event tracing support, should not be an afterthought. Instead, it should be a vital design issue considered when developing a distributed system.
Several potential avenues for future work on both the SOBER and the SOBERvis systems exist. Extending the SOBERvis system so that it monitors and visualizes more detailed and extensive information about the execution is a natural expansion. For instance, visualizing information about the objects in the virtual world that are visible to a trainee running a given SOBER application would be very useful to a trainer, as it provides more information about each trainee and assists the trainer in better understanding and controlling the training sessions. The implementation of the SOBER distributed system also requires revision to make monitoring and visualization easier and to make execution information readily available to the SOBER visualization system; this is another potential avenue for future work.
As we mentioned earlier, the central issue we address in this thesis work is developing a program visualization system that enables us to monitor and control the execution of the SOBER distributed system in general, and that of the object server in particular, and to detect faulty computing nodes if there are any. A mechanism to dynamically recover such a faulty node can be integrated into the SOBERvis system, and is a potential issue to be addressed in the future.
8
Bibliography
[CHAR89] C. E. McDowell, D. P. Helmbold, “Debugging Concurrent Programs”, ACM Computing Surveys, 21(4):593-622, December 1989.
[CHEU90] W. H. Cheung, J. P. Black, E. Manning, “A Framework for Distributed Debugging”, IEEE Software, pp. 106-115, January 1990.
[COOP87] R. Cooper, “Pilgrim: A Debugger for Distributed Systems”, Proc. Seventh Int’l
Conf. Distributed Computing Systems, CS Press, Los Alamitos, Calif., 1987, pp. 458-465.
[COUL88] G. F. Coulouris, J. Dollimore, “Distributed Systems - Concepts and Design”,
Addison-Wesley, 1988.
[EILE93] E. Kraemer, J. T. Stasko, “The Visualization of Parallel Systems: An Overview”, Journal of Parallel and Distributed Computing, 18(2):105-117, June 1993.
[FIDG94] C. J. Fidge, “Fundamentals of Distributed Systems Observations”, Australian
Computer Science Communications, 16(1):399-408, January 1994.
[GUSE89] R. Gusella, S. Zatti, “The Accuracy of the Clock Synchronization Achieved by TEMPO in Berkeley UNIX 4.3BSD”, IEEE Trans. on Software Engineering, 16(7):847-853, July 1989.
[HABA90] D. Haban, D. Wybranietz, “A Hybrid Monitor for Behaviour and Performance
Analysis of Distributed Systems”, IEEE Trans. on Software Engineering, 16(2): 197-211,
February 1990.
[HEAT91] M. T. Heath, J. A. Etheridge, “Visualizing the Performance of Parallel Programs”, IEEE Software, pp. 29-39, September 1991.
[JOYC87] J. Joyce, G. Lomow, K. Slind, and B. Unger, “Monitoring Distributed Systems”, ACM Trans. on Computer Systems, 5(2):121-150, May 1987.
[LAMP78] L. Lamport, “Time, Clocks, and the Ordering of Events in a Distributed System”, Communications of the ACM, 21(7), July 1978, pp. 558-565.
[LARS90] S. Lars, “Postmortem Debugging of Distributed Systems”, Department of Computer and Information Science, Linköping University, Sweden, October 1990.
[MALO92] A. D. Malony, D. A. Reed, H. A. G. Wijshoff, “Performance Measurement
Intrusion and Perturbation Analysis”, IEEE Trans. on Parallel and Distributed Systems,
3(4): 433-450, July 1992.
[MNAB89] U. Manber, “Introduction to Algorithms: A Creative Approach”, Addison-Wesley, 1989.
[OGLE93] D. M. Ogle, K. Schwan, and R. Snodgrass, “The Dynamic Monitoring of Distributed and Parallel Systems”, IEEE Trans. on Parallel and Distributed Systems, 4(7):762-778, July 1993.
[PLAT84] B. Plattner, “Real-time Execution Monitoring”, IEEE Trans. on Software Engineering, SE-10(6):756-764, November 1984.
[SAND95] O. A. Sandvik, F. Oldervoll, R. Torkildsen, and K. P. Villanger, “Advanced
computer technologies improve crisis management training in the offshore industry”,
Exploration & Production Technology International, 1995, (also found on internet at
http://www.cmr.no/english/computer.html)
[SHAR87] “An Introduction to Distributed and Parallel Processing”, Blackwell Scientific, Oxford, 1987.
[SLOM87] M. Sloman, J. Kramer, “Distributed Systems and Computer Networks”, Prentice-Hall, London, 1987.
[STAS92] J. T. Stasko, E. Kraemer, “A Methodology for Building Application-Specific Visualization of Parallel Programs”, Graphics, Visualization, and Usability Centre, Georgia Institute of Technology, Atlanta, GA, Technical Report GIT-GVU-92-10, June 1992.
[STEV90] W. R. Stevens, “UNIX Network Programming”, Prentice Hall Software Series,
1990.
[SUTE94] S. Lei, K. Zhang, “Performance Visualisation of Message Passing Programs Using Relational Approach”, Proceedings of the ISCA 7th International Conference on Parallel and Distributed Computer Systems, Las Vegas, Nevada, 6-8 October 1994.
[TOPO94] B. Topol, J. T. Stasko, and V. S. Sunderam, “Integrating Visualization Support
into Distributed Computing Systems,” Graphics, Visualization, and Usability Centre,
Georgia Institute of Technology, Atlanta, GA, Technical Report GIT-GVU-94/38, October
1994.
[TOPO95] B. Topol, J. T. Stasko, and V. S. Sunderam, “The Dual Timestamping Methodology for Visualizing Distributed Applications”, College of Computing, Georgia Institute of Technology, Atlanta, GA, Technical Report GIT-CC-95-21, May 1995.
[TOPO96] B. Topol, J. T. Stasko, and V. S. Sunderam, “Monitoring and Visualization in Cluster Environments”, College of Computing, Georgia Institute of Technology, Atlanta, GA, Technical Report GIT-CC-96-10, March 1996.
[TSAI91] J. J. P. Tsai, K. Y. Fang, H. Y. Chen, Y. D. Bi, “A Non-interference Monitoring and Replay Mechanism for Real-Time Software Testing and Debugging”, IEEE Trans. on Software Engineering, 16(8), pp. 897-916, August 1991.
[TSAI96] J. J. P. Tsai, Y. Bi, S. J. H. Yang, R. A. W. Smith, “Distributed Real-Time Systems: Monitoring, Visualization, Debugging, and Analysis”, John Wiley & Sons, Inc., 1996.
[WILL93] W. F. Appelbe, J. T. Stasko, and E. Kraemer, “Applying Program Visualization Techniques to Aid Parallel and Distributed Program Development”, Graphics, Visualization, and Usability Centre, Georgia Institute of Technology, Atlanta, GA, Technical Report GIT-GVU-91-08, October 1993.
9
Appendix
The source code of the SOBER visualization system is appended to this report. The list of all source files and their starting page numbers is as follows: