Abstract
Parallel and distributed computing environments have gained popularity because such systems offer many advantages over centralized sequential systems. Reduced incremental cost, better reliability, extensibility, and better response and performance are among their potential advantages. However, due to their non-deterministic behaviour and huge size, understanding the execution behaviour of distributed systems is a major problem in developing such systems. Program visualization has proven to be an important aid in understanding, debugging, and performance tuning of distributed systems.
In this thesis project, program monitoring, visualization and debugging of distributed computing systems are presented; and a monitoring and visualization system is developed to
collect run-time information about the SOBER distributed system’s execution behaviour
and to present the information in a logical and meaningful way using 3D graphics. To capture the information necessary to drive the visualization, a monitor server is developed so
that every SOBER application task is virtually connected to and send a copy of all successfully sent or received messages via a separate virtual network dedicated to the monitoring.
The collected data is transmitted to the visualization modules for processing and displaying. Our visualization system reveals, among other information about the SOBER execution behaviour, the state of SOBER applications and statistical information about
interprocess communication and synchronization operations in the SOBER distributed system.
Acknowledgment
I would like to thank all who have supported me in the process of writing this thesis. I am
deeply indebted to my supervisors Mr. Rune Torkildsen and Prof. Sverre Storøy for their
invaluable criticisms, discussions, suggestions, and encouragements without which this
piece of work would not have been completed. I am also grateful to Christian Michelsen Research (CMR) for providing me with such conducive research conditions and the facilities and resources necessary for this project. The people in the Advanced Computing Section at CMR in general, and Mr. Kåre P. Villanger and Mr. Frode Oldervoll in particular, deserve special gratitude for their day-to-day encouragement and technical support.
Finally, I would like to thank my friends Mr. Shimelis Lemma and Mr. Esmael Musema
for devoting their precious time to proof-reading drafts of this thesis and for their invaluable comments and criticisms.
Table of Contents
Abstract ................................................................................................................................ I
Acknowledgment................................................................................................................ II
1 Introduction............................................................................................................... 1
1.1 The Problem ....................................................................................................... 1
1.2 Background ........................................................................................................ 2
1.3 Definitions and Abbreviations............................................................................ 4
2 Distributed Computing ............................................................................................ 5
2.1 Classification of Distributed Systems ................................................................ 6
2.2 Structure of Distributed Systems........................................................................ 7
2.2.1 Interconnection Networks ....................................................................... 7
2.2.2 Network Topologies ................................................................................ 8
2.2.2.1 Bus Topology ............................................................................ 9
2.2.2.2 Star Topology ............................................................................ 9
2.2.2.3 Ring Topology........................................................................... 9
2.3 Architectural Models ........................................................................................ 10
2.3.1 Client/Server Model .............................................................................. 10
2.3.2 Processor Pool Model............................................................................ 11
2.3.3 Integrated Model ................................................................................... 11
2.4 Communication and Synchronization .............................................................. 11
2.4.1 Communication Primitives.................................................................... 12
2.4.2 Synchronization Primitives ................................................................... 12
2.5 Clock Synchronization ..................................................................................... 13
3 Program Monitoring .............................................................................................. 16
3.1 Types of Program Monitoring Systems............................................................ 16
3.1.1 Software Monitoring Systems ............................................................... 16
3.1.2 Hardware Monitoring Systems.............................................................. 18
3.1.3 Hybrid Monitoring Systems .................................................................. 19
3.2 Program Monitoring Techniques...................................................................... 20
3.3 Monitoring Distributed Systems ...................................................................... 20
3.4 Abstraction Levels in Program Monitoring...................................................... 21
3.4.1 Process Level Monitoring ..................................................................... 22
3.4.2 Function Level Monitoring ................................................................... 24
3.5 Interference Due to Monitoring........................................................................ 26
3.6 Perturbation Analysis ....................................................................................... 27
4 Program Visualization............................................................................................ 28
4.1 Program Visualization Techniques .................................................................. 28
4.2 Statistical Displays ........................................................................................... 29
4.3 Communication Views ..................................................................................... 29
4.4 Animations ....................................................................................................... 30
4.5 Application-specific Visualization ................................................................... 30
5 Debugging and Testing........................................................................................... 32
5.1 Program Debugging ......................................................................................... 32
5.2 Program Debugging Techniques ....................................................................... 32
5.2.1 Static Analysis....................................................................................... 32
5.2.2 Dynamic Analysis ................................................................................. 33
5.2.2.1 Memory Dumps....................................................................... 33
5.2.2.2 Tracing..................................................................................... 33
5.2.2.3 Breakpoints.............................................................................. 34
5.3 Performance Measurement............................................................................... 34
5.4 Debugging Distributed Systems ....................................................................... 35
5.5 Chapters Review............................................................................................... 36
6 The SOBER Visualization System ........................................................................ 38
6.1 The SOBER System ......................................................................................... 38
6.1.1 Components of the SOBER System...................................................... 39
6.2 Monitoring Framework for SOBERvis ............................................................ 42
6.3 Visualization Framework for SOBERvis ......................................................... 43
6.3.1 Communication Displays and Statistical Displays................................ 44
6.3.2 On-line vs. Off-line Visualization Approach ........................................ 46
6.4 Design and Implementation of SOBERvis....................................................... 46
6.4.1 Graphical User Interfaces...................................................................... 46
6.4.2 Monitoring Routines.............................................................................. 52
6.4.3 Visualization Routines .......................................................................... 53
7 Summary and Conclusion...................................................................................... 54
8 Bibliography............................................................................................................ 56
9 Appendix.................................................................................................................. 60
1 Introduction
1.1 The Problem
The SOBER system is a Crisis Management Training System simulator developed at Christian Michelsen Research (CMR) in cooperation with the Norwegian Underwater Technology (Nutec) Training Centre and Siemens Nixdorf Information Systems [SAND95]. The SOBER system is currently installed on twelve networked Silicon Graphics Indigo2 workstations at Nutec in Bergen, Norway, and is being used to train offshore
and maritime personnel in handling emergency situations. In section 6.1 we present the
structure and components of the SOBER system in more detail.
The SOBER system is a distributed system. It is organized as a network of computing nodes, and the current topology of the underlying network is a single star: one central switching server node at the centre with several application nodes virtually connected to it. Currently, if a computing node in the SOBER system, either an application node or the server node, is suspended by accident or otherwise, the other nodes continue to send or receive messages, or wait for a response from the faulty node. This may cause some nodes to wait indefinitely, which in turn may result in system malfunction such as deadlock. One way to address this problem is to use a time-out technique, but the system may then suffer a certain execution speed penalty. If the communication operations are synchronous or blocking and a time-out mechanism is not used, the problem is aggravated. In the current implementation of the SOBER system, neither the instructor nor the trainee can detect such a faulty node until it is too late to recover from the error. If the faulty node is the switching node, the whole system crashes and must be restarted from scratch.
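One hedged illustration of such a time-out technique (this is not code from the SOBER system; the function name and parameters are our own) is a receive operation that gives up after a fixed interval instead of blocking forever:

```python
import socket

def recv_with_timeout(sock: socket.socket, n_bytes: int, timeout_s: float):
    """Receive up to n_bytes, giving up after timeout_s seconds.

    Returns the received bytes, or None when the peer did not respond
    in time, so the caller can flag the node as faulty rather than
    wait indefinitely.
    """
    sock.settimeout(timeout_s)
    try:
        return sock.recv(n_bytes)
    except socket.timeout:
        return None
```

A node whose messages repeatedly time out in this way could then be reported as faulty, at the cost of the execution speed penalty noted above.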
Developing a mechanism to detect and report a faulty node as early as possible by monitoring the SOBER system is a novel way to address this problem. The SOBER visualization system (SOBERvis) provides such support by drawing the graphical entity that represents a faulty node in a different colour from the one used for a properly functioning node. In the virtual network view of the SOBERvis system, a faulty SOBER node is drawn in red.
A mechanism to recover from such an error can also be integrated into a program visualization system. One hypothetical solution is to select, randomly or otherwise, one of the application nodes and appoint it to function as the central switching node. However, the issue of dynamically replacing a faulty computing node of the SOBER distributed system is outside the scope of this thesis project. The main objectives of this project are:
• to study distributed program monitoring and visualization techniques
• to develop a program monitoring and visualization system that collects run-time information and presents the information using 3D graphics
• to collect run-time information about the SOBER distributed system and to present the collected information using 3D graphical entities, in order to assist system programmers and users in understanding the execution behaviour of the system and in isolating communication bottlenecks.
1.2 Background
Distributed computing has provided effective solutions to many challenging problems in
recent years and it has evolved into a popular and effective mode of high-performance computing. Increased availability, performance, reliability, low cost, and high scalability are
among the potential benefits of distributed computing systems.
Distributed computing, however, is not without its share of obstacles [TOPO96]. In distributed computing, applications execute on workstations that have varying capabilities and configurations in terms of CPU speed, memory capacity, and local versus networked disks. This may have a negative effect on system performance, as the computing and storage capacity available to an application is bounded by that of the least capable workstation in the system. Moreover, if the environment is open, each workstation, as well as the network itself, is potentially subject to uncontrollable external load; this often results in load imbalances and dynamic fluctuations in delivered resources, which can be a major cause of performance degradation.
The architecture of distributed computing environments is different from that of sequential programs, and hence requires a different approach to measuring and characterizing performance, to monitoring application progress, and to understanding program execution behaviour. Source code browsing and tracing approaches to understanding distributed programs are tedious, often ineffective, and hence inapplicable.
Program visualization has been shown to be a novel and highly effective approach to assist program understanding, debugging, and performance testing [WILL93][TOPO96]. Extending and adapting program visualization to distributed computing systems can aid in understanding the complex communication and data flow among the components of distributed systems. Presently, however, visualization has seen only limited use in enhancing the design and development of distributed computing systems. Topol et al. [TOPO94] hypothesize that one of the primary reasons for this limited use is the difficulty of acquiring the information necessary to drive the visualization. To obtain these data, program visualization systems require a monitoring mechanism that collects run-time information from the target system. On the other hand, without visualization support, understanding the data derived by monitoring program execution is tedious and complex.
Since program monitoring and visualization are highly dependent on each other, in this thesis project we study different techniques for monitoring and visualizing distributed systems, and we develop a monitoring and visualization system to capture run-time information from the SOBER distributed system and to display it using 3D graphical entities. The
graphical displays convey information such as the virtual communication network topology of the SOBER distributed system, and communication statistics among its computing
components.
This report is organized into 9 chapters; the chapters address distinct but interrelated issues. In chapter 2 we introduce distributed computing environments and give an overview of architectural models and basic communication operations in distributed systems. In chapters 3, 4 and 5 we address program monitoring, visualization, and debugging concepts. We also discuss some general principles and techniques of monitoring, visualization, and debugging, and provide some typical examples of distributed monitoring and visualization systems. In chapter 6, we address our target system, the SOBER distributed system, and the SOBER visualization system, SOBERvis. We present the monitoring and visualization techniques employed and the frameworks developed in the SOBERvis system. In chapter 7, a short summary of this project, concluding remarks, and recommended future work are discussed. Chapter 8 contains a complete list of the references cited in this thesis. The source code for the SOBERvis system is appended to this report as chapter 9.
Before we begin our discussion of the major topics, we define some terms and concepts that will be encountered throughout the thesis. This is important, we believe, not only to avoid ambiguities about the concepts and terms but also to provide the tools needed to read through the thesis easily.
1.3 Definitions and Abbreviations
Target program is a program from which run-time information is collected and visualized; that is, a program to which a monitoring, visualization, or debugging system is applied.
Distributed computing system is a set of several processes running on different processors
working towards a specific functional requirement.
Program monitoring is a mechanism by which run-time information about the execution behaviour of a target program is collected.
Program visualization is a graphical presentation of the monitored data and the illustration
of run-time program behaviour.
Program debugging is the process of detecting, locating, analysing, isolating and correcting suspected system faults in the target program.
SOBER is an abbreviation for StatOil “Beredskapstrener” - a Norwegian term for Emergency Trainer.
2 Distributed Computing
“A person with one watch knows what time it is; a person
with two watches is never sure.” Anon, [MANB89]
In this chapter we discuss basic concepts of distributed computing environments to provide the background needed for the issues introduced later in this thesis. Some general properties of distributed computing environments, such as their structures and architectural models, and the classification of distributed systems, are discussed. Finally, the two most important operations in distributed computing systems, namely communication and synchronization, are also discussed in this chapter. For a more detailed discussion of distributed computing environments, see [COUL88], [SHAR87] and [TSAI96].
A distributed computing system has several processes running on different processors
working towards a specific functional requirement. The distributed processes are coordinated by an interprocess communication protocol and synchronization mechanisms.
The evolution of distributed computing environments on networked collections of computer systems into a popular and effective mode of high-performance computing is due to their potential advantages: increased performance, by executing several processes in parallel; increased availability, because a process is more likely to find a resource available if multiple copies exist; increased reliability, because the system can be designed to recover from failures; increased adaptability, because components can be added or removed easily; low cost, because expensive resources can be shared; and robust programming models and environments [TSAI96]. In distributed computing environments, the same programming models and methodologies can be used across a wide variety of platforms, ranging from stacks of headless workstations with high-speed interconnections, to collections of desktop systems, to geographically distributed hierarchies of machines of multiple architecture types [TOPO96].
The distributed computing environment also has some major drawbacks peculiar to it. Lars [LARS90] discusses two problems specifically related to distributed systems. Firstly, the size of a distributed computing system often becomes physically very large and logically complex, making it difficult to handle. The processors can be spread out over a large geographical area and, unless special hardware is used, communicate with each other by message passing. This makes controlling the processors more difficult. The size of the program is another problem factor: the program code is usually very large and difficult to manage, and puts high demands on the programming language and the programming tools. Secondly, distributed programs have non-deterministic behaviour, which makes their executions difficult to reproduce. Non-determinism is caused by system factors that cannot be directly foreseen and controlled by the programmer.
Another drawback of distributed computing systems is their loss of flexibility in the allocation of memory and processing resources. In centralized computer systems or in tightly
coupled multi-processor systems all of the processing and memory resources are available
for allocation by the operating system as required by the current workload. In distributed
systems, however, the processor and memory capacity of the workstations determine the
largest task that can be performed.
Data security is another problem in distributed computing systems. To achieve high extensibility, many of the software interfaces in distributed systems are made available to clients. Any client that has access to the basic communication services can also access the interfaces to servers. To protect the services against intentional and accidental violations of access control and privacy constraints, software security measures are needed. Recent work on software security, data encryption, and capability-based access control offers appropriate solutions.
2.1 Classification of Distributed Systems
Distributed systems can take a variety of forms, and different researchers classify them into different categories depending on different aspects of the systems. Sharp, for example, classifies distributed systems according to the degree of distribution in hardware, control, and data [SHAR87]. The hardware distribution can range from a single central processing unit (fully centralized) to multiple computers (fully decentralized); the control distribution can range from a single control unit to multiple control units that cooperate fully by message passing; and the data distribution can range from a single copy located at a central storage location to a distributed database with no central master file or directory. Sharp uses three axes to represent the different levels of decentralization of the three components, and defines a system with the highest degree of decentralization in all three components as a fully distributed system.
Tsai et al. classify distributed systems as homogeneous or heterogeneous depending on the architecture of their computing nodes [TSAI96]. In a homogeneous distributed system, all the computing nodes have the same architecture and supporting software. In contrast, the nodes in a heterogeneous distributed system may have different architectures and/or supporting software. Tsai et al. also classify distributed systems as centralized or decentralized based on the relationship among their computing nodes. In a centralized distributed system the distinct computing nodes have a workstation/server or client/server relationship, whereas in a decentralized distributed system each computing node is autonomous.
As there exists a wide variety of distributed systems, no single debugging technique is applicable to all systems with different architectures, though several debugging techniques can be applied to a wide range of distributed systems.
In light of the above classification of the distributed systems, the SOBER system can be
classified as a centralized heterogeneous distributed system. Each computing node in the
SOBER system is an autonomous workstation that is virtually connected to the central
switching server node and the message server node (see section 6.1.1).
2.2 Structure of Distributed Systems
2.2.1 Interconnection Networks
The performance and reliability of a distributed computing system are highly dependent on the performance and reliability of the underlying network [COUL88]. A failure of the underlying network interrupts service to users, and overloading of the network degrades the performance and responsiveness of the system. Thus, much effort is spent on designing reliable and fault-tolerant networks. Since network failures occur very infrequently in practice, this drawback remains largely theoretical.
In distributed computing systems, there are two main categories of interconnection networks: single connection path systems, such as a bus or a ring, and multiple connection path systems, such as a multiple bus, a star or a mesh. In broader terms, networks can be classified as either store-and-forward or broadcast. In a store-and-forward network, a message or packet is received in its entirety by a node, placed in a buffer, and, if the message is not addressed to that node, forwarded to an adjacent node. The computing nodes in a store-and-forward network are interconnected by independent point-to-point transmission lines, and store-and-forward networks are typically used in wide-area networks (WANs). In a broadcast network, all nodes are connected to a common transmission medium, so a single message transmitted by a given node will reach all the other nodes. Broadcast networks are used mostly in local-area networks (LANs).
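The store-and-forward rule just described can be sketched as a small simulation; the node names and line topology here are illustrative assumptions, not any particular network:

```python
def store_and_forward(path, dest):
    """Walk a packet along `path` (node names in line order).

    Each node receives the packet in its entirety into a buffer and,
    unless the packet is addressed to it, forwards it to the adjacent
    node. Returns the nodes that buffered the packet, ending at `dest`.
    """
    buffered_at = []
    for node in path:
        buffered_at.append(node)   # whole packet received and buffered here
        if node == dest:           # addressed to this node: do not forward
            break
    return buffered_at

print(store_and_forward(["A", "B", "C", "D"], "C"))  # ['A', 'B', 'C']
```

The packet is buffered at A and B before being consumed at C; node D never sees it, in contrast to a broadcast network where every node would.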
Another type of interconnection network is the terminal network, used to connect a variety of terminals and printers to a central computer. In this centralized, point-to-point, star-like network, the central computer communicates with each terminal over slow but cheap dedicated data transmission wires.
Several network technologies and architectures have emerged with adequate performance to support distributed systems. The most widely used local network technology for distributed systems is the Ethernet. Ethernet is based on broadcasting over a simple passive circuit, with a single high-speed cable linking all of the computers in the network. Another class of network technology is the slotted ring, in which all of the computers in the network are linked in a ring structure and data is transmitted in small fixed-size packets passed from node to node around the ring. Another ring network technology, known as token ring, can accommodate larger, variable-size packets.
The performance of the Ethernet and the ring networks is almost the same. The token ring has higher channel utilization under high loads and can provide a guarantee of service within a fixed time, whereas the Ethernet provides higher performance for the transmission of large volumes of data under light loads. In practice, both networks have been used in the construction of a variety of distributed computer systems, and the differences in architecture are not evident above the lowest levels of the network software.
Since all local networks are designed to provide direct communication between any two
hosts, the topology used has relatively little influence on system behaviour as seen by the
user. Virtually all successful high-speed local networks have been structured as either rings
or buses [COUL88].
2.2.2 Network Topologies
A network topology defines the interconnection structure of nodes and links. The network topology influences the incremental cost of adding another node, the ease of modifying the topology, the dependency on a single component of the network, the complexity of the protocols needed, the throughput and delays, and the ability to broadcast data [SLOM87]. In this subsection we discuss different computer network topologies.
2.2.2.1 Bus Topology
In networks with a bus topology there is a circuit composed of a single cable, or a set of connected cables, passing near all of the hosts on the network. When more than one cable is used, the connections are made by repeaters, simple amplifying and connecting units that have no effect on the timing or logical behaviour of the network. The cable is passive, and each host has a drop cable connected to the main cable by a T-connection or tap. Data is transmitted over the cable, to which all hosts have access. A limitation of buses is that they do not scale to a large number of processors, because the single bus forms a communication bottleneck.
Since there is no master node to arbitrate access to the bus, each node must listen to the bus before sending or receiving a message. To receive a message, a node looks for a message addressed to it. To send a message, a node listens to the bus to make sure that the bus is free. If two or more nodes have been waiting to send a message, a collision may occur. The transmitting nodes can detect the collision and attempt to re-send the message after a random period of time. An increase in the number of collisions degrades the throughput of the network.
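The "random period of time" can be sketched as follows. The truncated binary exponential backoff shown here is the scheme used by Ethernet; the function name is our own illustrative choice:

```python
import random

def backoff_slots(n_collisions: int) -> int:
    """Number of slot times to wait after the n-th successive collision.

    The wait is chosen uniformly from [0, 2**n - 1], with the range
    capped at 1023 slots, so repeated collisions spread competing
    retransmissions further and further apart in time.
    """
    return random.randint(0, min(2 ** n_collisions, 1024) - 1)
```

After the first collision a node waits 0 or 1 slot times; after the fourth, between 0 and 15; beyond the tenth, the range stays capped at 0 to 1023.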
2.2.2.2 Star Topology
In a network with a star topology, all nodes are connected via a single link to a central switching node. The star topology has a low expansion cost, simple table-lookup routing in the switching node, and a maximum delay of only one intermediate node. The star topology is commonly used for connecting terminals to a central computer.
The main drawback of the star topology is its poor reliability, because a failure of a link isolates a node. A failure of the central switching node stops all communication, and hence redundancy is sometimes provided at the switching node. The throughput of the network is bounded by that of the central switching node, which may become a bottleneck. Since the virtual network topology of the SOBER distributed system is a single star, the SOBER system may suffer from this limitation.
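The simple table-lookup routing mentioned above amounts to a single dictionary lookup in the switching node; the class and node names below are illustrative, not taken from the SOBER configuration:

```python
class StarSwitch:
    """Sketch of a central switching node in a star topology."""

    def __init__(self):
        self.links = {}                # node name -> outgoing link id

    def attach(self, node: str, link_id: int) -> None:
        self.links[node] = link_id

    def route(self, dest: str) -> int:
        """Return the link to forward on; raises KeyError for an
        unknown (e.g. detached) destination node."""
        return self.links[dest]

switch = StarSwitch()
switch.attach("app1", 1)
switch.attach("app2", 2)
print(switch.route("app2"))  # a message for app2 goes out on link 2
```

This lookup is also why the switch is a single point of failure: every route passes through the one table held by the central node.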
2.2.2.3 Ring Topology
In networks with a ring topology the cable is made up of separate links connecting adjacent nodes. Data is transmitted in one direction around the ring by signalling between nodes. Only the node that holds the token can send messages. The token is passed from node to node until a node that needs to transmit a message is reached. The communication software is simple since the routing is simple. The delays depend both on the number of nodes in the ring and on the number of bits buffered by each node, typically 1 to 16 bits.
An advantage of the ring topology is that there is no starvation and no deadlock: each node has its turn to hold the token, and only one node at a time is allowed to do so. Priority-based access can also be established. A disadvantage of the ring topology is the effort required to manage the token, since the disappearance of the token halts the whole network. Because the amount of time a node can hold the token is unbounded, a lost token cannot be detected unless time-outs are used. Another disadvantage of the ring is that if a single node fails, the entire ring fails. To detect a node failure, the node receiving the token should acknowledge its receipt; if a node does not receive an acknowledgement within a certain amount of time, a failure can be assumed.
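The token-passing and failure-detection rules can be sketched as a small simulation; the ring membership and the failed node are illustrative assumptions, and the time-out is modelled as an immediate check:

```python
def pass_token(ring, start, failed=frozenset()):
    """Pass the token once around the ring, starting after `start`.

    `ring` lists node names in ring order. A node in `failed` never
    acknowledges receipt, so the sender detects the failure (here an
    immediate check stands in for the acknowledgement time-out).
    Returns the list of nodes that acknowledged the token.
    """
    acked = []
    i = ring.index(start)
    for step in range(1, len(ring) + 1):
        nxt = ring[(i + step) % len(ring)]
        if nxt in failed:
            # no acknowledgement within the time-out: failure assumed
            break
        acked.append(nxt)
    return acked

print(pass_token(["A", "B", "C", "D"], "A"))         # full circuit back to A
print(pass_token(["A", "B", "C", "D"], "A", {"C"}))  # stops at the failed node
```

In the second call the token reaches B but C never acknowledges, so the circuit is broken exactly as described above: a single failed node stops the whole ring.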
2.3 Architectural Models
Knowing the architecture of a distributed system can aid in analysing the system, and in
making a good choice of a monitoring technique. Architectural models are also useful in
classifying distributed systems and in analysing their execution properties. Coulouris [COUL88] presents three architectural models of distributed systems: the client/server, processor pool, and integrated models. The majority of distributed systems are based
on the client/server model and so is the SOBER system (see section 6.1).
2.3.1 Client/Server Model
In the client/server model each user is provided with a single-user workstation, usually known
as a client. Application programs are executing on the users’ workstations. The need for
workstations is based primarily on user interface requirements in application tasks. Other
factors affecting the division of tasks include the need for sharing data between users and
applications, leading to a need for shared file servers and directory servers; for sharing expensive peripheral devices such as high-quality printers, and for specialized device servers.
The workstations may be of several different types, e.g. some standard workstations and
some high-performance workstations. They are integrated by the use of communication
software enabling them to access the same set of servers. The servers provide access to
shared devices, files and other networked resources. For example, an authentication service
is usually provided to validate user identities and to authorize them to use system resources
and a network gateway service is often available to offer access to wide-area networks
to all of the workstations on a local network.
2.3.2 Processor Pool Model
In the processor pool model, programs are executed on a set of computers managed as a shared processor pool. Users are connected to the network via terminal connectors and interact
with programs via a terminal access protocol.
The potential advantages of this model include efficient utilization of resources (only as many computers are needed as there are users simultaneously logged in), flexibility (the system can be expanded incrementally), compatibility, and the use of heterogeneous computers. A substantial drawback of the processor pool model is the restricted mode of user interaction imposed by the use of terminals rather than workstations. In particular, the model does not satisfy the needs of high-performance interactive programs, especially when graphics is used in the application
[COUL88]. Even when a terminal is connected to a host computer via a high-bandwidth
local network, the speed at which graphical data can be transferred to the screen is too low
for many interactive tasks.
A hybrid model includes some workstations for interactive use, some processors and a variety of servers. The hybrid model is based on the client/server model, but with the addition
of pool computers that can be allocated dynamically for tasks that are too large for workstations or tasks that require several computers concurrently.
2.3.3 Integrated Model
The integrated model brings many of the advantages of distributed systems to heterogeneous networks containing single-user and multi-user computers. Each computer is provided
with appropriate software to enable it to perform both the role of a server and an application
processor. The system software located in each computer is similar to an operating system
for a centralized multi-user system, with the addition of networking software.
2.4 Communication and Synchronization
A distributed system consists of a collection of distinct computers which are spatially separated, and connected by a network making it possible to exchange messages among the
processes running on the computers. Communication and synchronization allow a distributed system's processes to be coordinated [SLOM87]. Synchronization is a mechanism by
which two or more processes are coordinated with respect to time, for example, by sequencing events or by granting a process an exclusive access to a resource. Communication refers to an exchange of information among the processes and does not necessarily imply synchronization.
2.4.1 Communication Primitives
The most basic communication operations are the send and the receive operations. The
simplest receive operation blocks; that is, the receiving process waits until the message arrives. Blocking provides a synchronization mechanism, but a blocking receive can cause problems if the message never arrives. There are other receive operations with
time-out conditions, i.e. if the message is not available in a given time interval, then the receive operation is aborted and the next operation is executed.
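A receive with a time-out can be sketched as follows. This is our illustration, not SOBER's actual API: the function name `timed_receive` is invented, and we use the standard `select()` call to bound the wait on a file descriptor before completing the read.

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical sketch of a receive with a time-out.  Returns the number
 * of bytes read, 0 if the time-out expired before any data arrived
 * (the receive is aborted), or -1 on error. */
static ssize_t timed_receive(int fd, void *buf, size_t len, int timeout_ms)
{
    fd_set readfds;
    struct timeval tv;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec  = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int ready = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (ready < 0)
        return -1;              /* error in select() */
    if (ready == 0)
        return 0;               /* time-out: abort the receive */
    return read(fd, buf, len);  /* data available: complete the receive */
}
```

If the time-out expires, the caller simply proceeds to the next operation instead of blocking indefinitely.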
A send operation can be either asynchronous or synchronous. In an asynchronous send operation, the sending process sends a message and continues the execution of the next instruction and does not wait for an acknowledgement from the receiving process. In the synchronous send operation, the sending process waits for an acknowledgment from the
receiving process. Obviously, the synchronous send provides synchronization as well as
communication. Since the synchronous send is blocking, if the receiving process is delayed, the sending process will also be delayed. Worse, if the receiving process fails, then
the sending process will hang. Therefore, a mechanism to prevent these situations is necessary. Bidirectional transactions are frequently used in client/server communication.
2.4.2 Synchronization Primitives
Processes in a distributed system can communicate with each other synchronously or asynchronously. For synchronous communication, the sender and receiver must be synchronized. The sending process sends a message to the receiving process, and then waits for an
acknowledgement from the receiving process that the message has been received. In asynchronous communication the sending process does not wait for an acknowledgement from
the receiving node.
Processes on the same node can communicate via shared memory. A semaphore is an interprocess communication primitive that is intended to let multiple processes synchronize
their access to a shared memory segment. If one process is reading data into some shared
memory, for example, other processes must wait for the read operation to finish before
processing the data.
A binary semaphore is a semaphore whose value can be either zero or one. To
obtain a resource that is controlled by a semaphore, a process needs to test its current value,
and if the value is greater than zero, it decreases the value by one (the P operation). If the
current value is zero, the process must wait until the resource is released. To release a resource that is controlled by a semaphore, a process increases the semaphore value by one
(the V operation). Semaphores are discussed in more detail in [STEV90].
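The P and V semantics described above can be sketched as follows. This is an illustration of the value transitions only; a real implementation would use atomic kernel primitives such as the System V semaphore operations discussed in [STEV90], and the type and function names here are our own.

```c
#include <assert.h>

/* Illustrative binary semaphore: the value is either 0 or 1. */
typedef struct { int value; } binsem_t;

/* P: try to acquire the resource.  Returns 1 on success; a return of 0
 * means the value was zero and the caller must wait for a release. */
static int sem_P(binsem_t *s)
{
    if (s->value > 0) {
        s->value -= 1;   /* decrease the value by one */
        return 1;
    }
    return 0;            /* resource busy: caller waits */
}

/* V: release the resource by increasing the value by one. */
static void sem_V(binsem_t *s)
{
    s->value += 1;
}
```

Note that the test-and-decrement in `sem_P` must be atomic in a real multi-process setting, which is exactly what the kernel-level P operation guarantees.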
2.5 Clock Synchronization
In a single-processor or a tightly-coupled multiprocessor system, there is only one system
clock. Therefore, it is guaranteed that an event that is timestamped with an earlier time value occurred before an event that is timestamped with a later time value. However, in distributed systems, since each node has its own local clock which may have a different reading
from the clocks on the other nodes, for two events that occur on different nodes there is no
guarantee that an event with an earlier timestamp occurred before an event with a later
timestamp. Hence, to maintain causality relationship among the events in a distributed system, we need a mechanism to synchronize the clocks of the nodes.
For a meaningful visualization of program execution behaviour, the events’ timestamps
should be as accurate and consistent across the processors as possible. Since each node has
its own local clock, its own starting time, and its own execution rate, it is necessary to implement clock synchronization. Poor clock resolution or synchronization can lead to what
is called tachyons in the trace files: messages that appear to be received before they are
sent [HEAT91]. A tachyon is a hypothetical particle that travels faster than light.
Timestamping events by readings of the physical clock of each node totally orders the
events on each node. However, due to the drifting nature of quartz-controlled oscillators, no two physical clocks run at exactly the same rate. This means that a perfectly accurate global clock cannot be implemented without additional hardware support.
The lack of a global clock makes it impossible to establish the order of two events in a distributed system unless there is a causal relationship between them. Events in the same process form a sequence determining the order, usually known as partial ordering. For two
events that occurred on two different nodes to be ordered, an event involving both nodes must have occurred after one of the events and before the other. Examples of such events involving more than one node include process creation, process termination, and communication events [LAM78].
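The partial ordering induced by such events is exactly what Lamport's logical clocks [LAM78] capture. The following hedged sketch (not part of the SOBER implementation; the function names are ours) shows the standard update rules: local events and sends increment a per-process counter, and a receive advances the counter past the timestamp carried by the message.

```c
#include <assert.h>

/* Sketch of Lamport logical clocks.  Each process keeps one counter. */

/* A purely local event ticks the clock. */
static long lamport_local_event(long *clock)
{
    return ++*clock;
}

/* A send ticks the clock; the new value is the timestamp carried
 * by the outgoing message. */
static long lamport_send(long *clock)
{
    return ++*clock;
}

/* A receive first advances the clock past the message timestamp,
 * then ticks it, so the receive is ordered after the send. */
static long lamport_receive(long *clock, long msg_timestamp)
{
    if (msg_timestamp > *clock)
        *clock = msg_timestamp;
    return ++*clock;
}
```

With these rules, a causally later event always carries a larger timestamp, even though events on different nodes with no causal link remain unordered.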
In monitoring distributed computing systems, a monitor is attached to each node of the target system to detect occurrence of events of interest on that node and to record relevant
event data (see section 3.3). To reconstruct the global state of the target system, a global
time reference is required to timestamp the events with a global clock reading. That is, to
order all the events that occurred in the system, we need to timestamp them by a global
clock reading [TSAI96].
To cope with inconsistencies due to the lack of a central clock and global state, Haban and Wybranietz implemented two different versions of clock synchronization in the distributed test
methodology (DTM) system [HABA90]. The first version uses a central physical clock that
triggers the local time counters on each test and measurement processor (TMP). This version is only used if the global clock is very far away from the TMP nodes. The central clock
allows measurements such as transmission delay. The second version, the software solution, uses a central machine to synchronize all the clocks by running an algorithm similar
to the TEMPO algorithm, the distributed service that synchronizes the clocks of 4.3BSD
UNIX systems [GUSE89]: to initially align the first time interval, the central station polls
each TMP station to measure the clock difference between the central station and each local
TMP station using the following equation:
D = (D1 - D2) / 2
where D1 is the difference between the message reception time and the master timestamp in the received message, and D2 is the difference between the master node's acknowledgement reception time and the local timestamp in the acknowledgement. Each local TMP station stores its time
difference from the central station. When the central station sends a start time, each local
station computes the start time by adding the clock difference to the start time from the master.
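The offset computation and its use can be sketched as follows. This is our illustration of the scheme; the variable and function names are not [HABA90]'s. Note how the half difference cancels a network delay that is symmetric in both directions.

```c
#include <assert.h>

/* d1 = (local reception time) - (master timestamp in the message),
 *    i.e. one-way delay plus the local clock's offset from the master;
 * d2 = (master's ack-reception time) - (local timestamp in the ack),
 *    i.e. one-way delay minus that offset.
 * Their half difference is the estimated offset; the symmetric delay
 * term cancels. */
static double clock_offset(double d1, double d2)
{
    return (d1 - d2) / 2.0;
}

/* Each TMP station stores its offset and applies it to the start time
 * announced by the central station to obtain its local start time. */
static double local_start_time(double master_start, double offset)
{
    return master_start + offset;
}
```

For example, with d1 = 5 ms and d2 = 1 ms the estimated offset is 2 ms, so a master start time of 100 ms maps to a local start time of 102 ms.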
Topol et al. [TOPO95] propose a dual timestamping methodology that provides both primary and secondary timestamps in trace events. A primary timestamp is a logical timestamp that provides information about which events are concurrent and hence can be visualized in parallel, whereas a secondary timestamp provides normalized causality
preserving “wall clock” timestamps that are used in program performance visualization.
The dual timestamping methodology is the cornerstone in the development of PVaniM
[TOPO96], a visualization environment for the PVM network computing system.
In the SOBER distributed system, a timer mechanism is implemented to address the problem of clock synchronization. The clock synchronization mechanism involves three time
values: the system time, a reference time, and a relative time. The system time is retrieved
by the gettimeofday() system call and is equal to the reading of the system clock, which is running all the time. A reference time is an arbitrary time value used by an application and it
differs from application to application. A reference time value is sent to other applications
as a parameter to the clock synchronization operations. The relative time is equal to the reference time minus the system time.
There are four operations that are central to the implementation of clock synchronization
in the SOBER system. The start() operation initializes a reference time variable, gets the system time, computes the relative time, sets the clock status to “running”, and broadcasts a
synchronization message across the network. All the applications across the network start
their respective local clocks based on the synchronized time broadcast across the network. The
set() operation sets a reference time, gets the system time, and computes the relative time. If
the clock is not in “running” status, the relative time will be set to zero. The synchronization
message is broadcast across the network. The stop() operation sets the clock status to
“stopped”, the reference time to the current time, and the relative time to zero, and broadcasts
the synchronization message across the network. The broadcastClockEvent() operation is
used to broadcast a time synchronization message across the network.
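The relationship between the three time values can be sketched as follows. This is a simplified illustration, not SOBER's actual code: the type, field, and function names are invented, and the broadcast step is only indicated by a comment.

```c
#include <assert.h>

/* The relative time is defined as reference time minus system time;
 * adding it to a later system-clock reading recovers the application's
 * reference time scale. */
typedef struct {
    double relative;   /* reference time - system time at start()/set() */
    int    running;    /* clock status: 1 = "running", 0 = "stopped" */
} sober_clock_t;

static void clock_start(sober_clock_t *c, double reference, double system_now)
{
    c->relative = reference - system_now;
    c->running = 1;
    /* a real implementation would also broadcast a synchronization
     * message across the network here */
}

/* Reading the clock converts the current system time back to the
 * application's reference time scale. */
static double clock_read(const sober_clock_t *c, double system_now)
{
    return c->running ? system_now + c->relative : 0.0;
}
```

For instance, starting the clock with reference time 0 at system time 1000 yields a relative time of -1000, so a later system reading of 1005 maps to 5 units of elapsed reference time.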
3 Program Monitoring
In developing a program visualization system, a crucial step is to capture the information necessary to drive the visualization. In this chapter we address different program monitoring techniques for collecting run-time information about a given target system. First, we
present types of program monitoring systems and provide some examples of each type of
monitoring system. Then, we present general program monitoring approaches and techniques for monitoring distributed systems. Finally, interference due to program monitoring is discussed and perturbation analysis methods are presented.
Program monitoring enables us to capture run-time information about a target program that
cannot be obtained by merely studying program source code. The collected information can
be used for program testing and debugging, dynamic system safety checking, dynamic task
scheduling, performance analysis, and program optimization. Program monitoring is accomplished in two phases: a triggering phase and a recording phase. In the triggering phase, occurrences of pre-defined events of interest are detected, and collection of data
pertinent to the events is activated. In the recording phase, the data pertinent to the events
is collected and stored for postprocessing or is transmitted to a processing module for online processing, analysis and visualization. The recorded data provides a trace of events that
can be used to describe the execution behaviour of the monitored system.
The triggering and recording phases for program monitoring can be implemented in software, in hardware, or in both, resulting in software, hardware, and hybrid monitoring systems, respectively. In the remaining sections of this chapter, we discuss
types of monitoring systems, monitoring techniques, and perturbation analysis techniques.
3.1 Types of Program Monitoring Systems
3.1.1 Software Monitoring Systems
Software monitoring systems are implemented by inserting an extra set of instructions (usually known as instrumentation code) into the target system to cause data capture. Both the
triggering and recording phases of program monitoring are accomplished by executing the
inserted code, and the recorded data is often stored in the working memory of the target system. Since an execution of the instrumentation code uses the computing power and working
memory of the target system, software monitoring systems may result in an unacceptable
performance penalty for the target program and may also affect its execution behaviour. The interference due to monitoring can be measured by using perturbation analysis techniques to obtain the actual performance of the target system (see section 3.6).
The potential advantages of software monitoring systems are their flexibility, and that no
additional hardware is required for their implementation. Without using hardware support,
the dilemma of finding a balance between minimizing interference due to monitoring and
recording sufficient information about the execution behaviour of a target program always
exists. Limiting instrumentation on the one hand may provide inadequate measurement detail, whereas excessive instrumentation, on the other hand, may perturb the target system
to an unacceptable degree.
In program monitoring systems, the pre-defined events of interest are ordered according to
their time of occurrence and are replayed in the same order during the visualization stage. In order to timestamp the events, clock support is necessary. Since there is no hardware support for software monitoring systems, they rely on the target system's clock(s), and hence the instrumentation code must have access to the target system's clock to timestamp events with its readings.
Joyce et al. [JOYC87] propose a distributed software monitoring system to detect occurrences of events of interest and to collect information on the concurrent execution of interacting processes. In this system event detection is done inside the target processes. To allow
detection of interprocess events, programmers have to modify the target processes by loading them with a version of an interprocess communication protocol to incorporate the monitoring activity into the execution of the program. The events monitored in this system are
process operations that may have a direct effect on other processes: entering/leaving the
system, creating/killing a process, message sends, receives, and replies. These events
match the process level events we discuss in section 3.4.1. In Joyce’s monitoring system,
process state transitions cannot be monitored because an application process cannot detect
its own state changes. To monitor such kinds of events, the kernel needs to be instrumented
so that it sends transition events to the monitor. Joyce’s monitoring system is a typical software approach to program monitoring.
The software monitoring approach is suitable for monitoring the SOBER distributed system, mainly because no hardware support is required in the implementation of our monitoring and visualization system (see section 6.2).
3.1.2 Hardware Monitoring Systems
In hardware monitoring systems, a hardware device is attached to the bus(es) of the target system to passively snoop them and detect a set of pre-defined signals. Triggering takes
place on a specific combination of the pre-defined signals. Data recording is carried out by
hardware, and the recorded data is stored in a separate memory independent of the monitored system.
The primary advantage of hardware monitoring systems is that their interference with the
execution of the target system is minimal since the monitoring system shares no computing
resource of the target system. Although such devices can be designed to have minimal or
no perturbation effect on the target system, their main drawback is that they generally provide limited low-level information about the execution behaviour of the target system
[HABA90]. Simple snooping of system buses, or probes connected to the processor’s
memory ports or I/O channels, does not provide sufficient information about the target system. To collect valuable run-time information, hardware monitoring systems often use sophisticated hardware features. Another drawback of hardware monitoring systems is that
the desired signals may not be accessible as integrated-circuit technology advances and
more functions are built on chips [TSAI96].
Plattner [PLAT84] proposes a hardware monitoring system for monitoring single-processor real-time systems. In Plattner’s system, a hardware device called a listener is attached
to the bus of the target processor and a separate storage space called a phantom memory is
used to mirror the contents of the memory of the target system in real-time. A monitoring
process is employed to access all information from the phantom memory. This implementation of program monitoring obviously does not interfere with the execution of the target
system since it uses no resource of the target system. The main drawback of this system is
the extra cost of constructing the phantom memory.
Tsai et al. [TSAI96] extend Plattner’s monitoring system to monitor distributed real-time systems. This model assumes that each node of the distributed target system is a single-processor autonomous computer system with its own memory and I/O devices. To
monitor the target distributed system, a monitoring node is connected to the address, data,
and control buses of every node of the target distributed system. A module known as qualification control unit is used to detect occurrences of pre-defined conditions and to invoke
the corresponding recording-phase action, either a start or a stop action. The collected data is interpreted, analysed, and displayed by the module that drives the visualization. Issues such as a global time reference, which is crucial to monitoring real-time systems, are not elaborated in
this model.
3.1.3 Hybrid Monitoring Systems
Hybrid monitoring systems are an attractive compromise between the intrusive software monitoring systems and the expensive non-intrusive hardware monitoring systems. They utilize both software and hardware approaches to program monitoring, minimizing perturbation due to monitoring by allowing the hardware to perform the majority of the monitoring
task. Hybrid monitoring systems insert instrumentation code into the target system to detect
the occurrences of pre-defined events of interest. Data recording is carried out by hardware,
and the collected data is saved in a separate memory independent of the memory of the target system.
Hybrid monitoring systems use two different triggering approaches: memory mapped and
co-processor monitoring [TSAI96]. In memory-mapped monitoring, a set of pre-defined addresses is used to trigger data recording. The monitoring unit is mapped onto the memory addresses, with each address representing an event. In the co-processor monitoring approach, co-processor instructions are used to trigger event recording. The recording unit
acts as a co-processor that executes the monitoring instructions. To invoke recording of
data pertinent to events of interest, the co-processor instruction is sent by the target processor to the monitoring unit.
Haban and Wybranietz’s DTM (distributed test methodology) [HABA90] system uses the
hybrid monitoring approach to monitor program execution and to collect information pertinent to the events of interest. The main idea in the DTM monitoring system is that the target system detects significant events and these events are processed and displayed by dedicated hardware. The DTM monitoring system is a typical example of a hybrid monitoring system that employs the memory-mapped monitoring approach discussed in the previous paragraph.
3.2 Program Monitoring Techniques
In the program monitoring process, there are two fundamental techniques for collecting information: tracing and sampling [TOPO95][EILE93]. In the tracing technique, every occurrence of a pre-defined event is detected, and information about all the occurred events is collected continuously for a certain interval of time, typically the whole duration of an execution of the target system. Small pieces of code, usually known as sensors, are embedded in the target program and perform the desired recording of information. Sensors can be
developed in different ways. Since many distributed systems supply library routines for
communication, synchronization, and creating tasks, these integral events are traced by
providing macro wrappers that first perform the tracing operation and then call the desired
library routines. The pre-defined events of interest that are not related to any library routine
may be traced by providing the user with a function similar to printf() that allows events
with custom application-specific data to be recorded.
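A macro-wrapper sensor of the kind described above can be sketched as follows. This is a hedged illustration: the names (`TRACED_SEND`, `trace_event`, `lib_send`) are invented, `lib_send` stands in for a real communication library routine, and a real tracing library would also record timestamps and write to a trace buffer or file.

```c
#include <assert.h>
#include <string.h>

/* A minimal in-memory trace log for illustration. */
static int  trace_count = 0;
static char trace_log[64][16];

/* The sensor: record the event name into the trace log. */
static void trace_event(const char *name)
{
    strncpy(trace_log[trace_count], name, 15);
    trace_log[trace_count][15] = '\0';
    trace_count++;
}

/* Stand-in for a communication library routine. */
static int lib_send(int dest, const char *msg)
{
    (void)dest; (void)msg;
    return 0;   /* pretend the send succeeded */
}

/* The macro wrapper the application actually calls: it first performs
 * the tracing operation, then calls the desired library routine. */
#define TRACED_SEND(dest, msg) (trace_event("send"), lib_send((dest), (msg)))
```

The application is recompiled against the wrapper, so every send is logged without modifying the communication library itself.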
In the sampling technique, information about occurrences of pre-defined events is collected asynchronously, usually at a request from the monitor module. Sampling may be performed by sensors or in some cases by probes, which reside in the monitor module and have direct
access to the address space of the application [OLGE93]. The sampling approach is useful
especially when we are interested only in cumulative statistics such as the total number of
messages sent or received by a node at various stages of the execution of the target application. Utilizing probes can minimize the perturbation that would be incurred had sensors been utilized, because sensors execute continuously, whereas
probes are invoked after a given interval of time based on a sampling rate specified by the
user.
3.3 Monitoring Distributed Systems
In sequential programming, it is generally true that monitoring a program does
not alter the data values generated in connection with the events, and the order in which the
events are occurring. However, due to the non-deterministic behaviour of concurrency it is
generally impossible to monitor a distributed system without affecting its execution, and
hence the order of its events. The most we can do is to strive to minimize the probe effects.
A probe affects the distributed target program so that it may not present the same behaviour
as it did before the probe was attached. One method to maintain the ordering of events is to
predict the effect of monitoring, and to make necessary adjustments to reduce the interference effect (see section 3.6).
To monitor distributed systems, we need to monitor each computing node of the system by
attaching a monitor to the node. The monitor detects occurrences of pre-defined events and records event data generated by the node to which it is attached. The recorded
data can be either stored locally in the memory of the target node for postprocessing or
transmitted to the central node on which the visualization module is executing for on-line
processing. In case the collected data is not evenly distributed among the computing nodes,
too much data could be stored at one node, which would require building a sufficiently large data storage area for each node and is very expensive. To resolve this problem, we transmit the data recorded at each computing node to the memory of a central computing node.
In this case, the data storage of each node is replaced with a network interface that sends
data to the central storage location. To minimize perturbation due to monitoring, a separate
dedicated network is used for the transmission of the collected data to the central location
resulting in a need for extra hardware. The latter option is employed in the monitoring framework of our visualization system (see section 6.2).
If the target system is a distributed real-time system, an additional challenge is to minimize
the interference due to monitoring, because the level of perturbation attained by using a hybrid monitoring system may not be acceptable. Two approaches can be used to control the
effect of perturbation due to monitoring system: 1) hardware monitoring devices can be
used to reduce the interference due to monitoring; 2) perturbation analysis techniques are
used to predict the effect of monitoring, and make necessary adjustments to reduce the effect of interference.
3.4 Abstraction Levels in Program Monitoring
In testing and debugging a distributed system, different abstraction levels of the execution
information provide insight into the target system at different levels of detail [TSAI91].
Higher level information refers to data pertaining to events such as interprocess communication and synchronization, whereas lower level information refers to data pertaining to
events such as step-by-step execution trace of a target process. Based on the granularity of
the required information, program monitoring can be performed at two levels of abstraction: at the process level to collect higher-level information, or at the function level to collect more detailed lower-level information.
Run-time data collected by using process level monitoring includes information about
events such as process state transitions, communication and synchronization among the
software processes, and interactions between the software processes and the external processes. The execution data collected by monitoring at function level includes information
about events such as interaction among the functions and procedures that compose the
processes. Process level information is used to isolate faults within processes, whereas
function level information is used to isolate faults within functions and procedures. In the
rest of this section we identify events of interest in process level monitoring and function
level monitoring and we state triggering and stopping conditions for the data recording phase
of program monitoring.
3.4.1 Process Level Monitoring
The main reasons for monitoring and debugging at a process level are: 1) a process is the
minimum program unit that can exhibit non-deterministic behaviour, and hence, if we can
isolate faults to an individual process, we can possibly use the conventional cyclic debugging method for successive fault isolation at lower levels of abstraction; 2) we can reconstruct
the execution behaviour for interprocess communication and synchronization operations to
localize faults to an individual process [TSAI96].
In process level monitoring, a process is considered a ‘black box’ that can be in a running, ready, or waiting state. A process changes its state depending on its current state
and the event(s) that occurred in the system. We distinguish events that directly affect the program execution at the process level from those that affect the execution at a lower level.
Arithmetic operations, value assignment to variables, and procedure calls, for example, are
events that do not cause immediate state change of a process. Interprocess communication
and synchronization operations are among the events that may cause a change of process
state and affect execution behaviour.
To detect the occurrences of process level events and to record their key values, the monitoring module can be set to detect the interrupts from the I/O devices and the software traps
from the application processes that request services from the kernel. To collect the key values for an event with sub-events on two distinct nodes, such as remote process creation
events, the starting and ending conditions should include the interrupts from the interprocess communication devices.
The set of process level events includes among others: process creation, process termination, process synchronization, I/O operation, interprocess communication, wait child process, external interrupt, and process state change [TSAI96]. The process level events and
key values pertinent to them are summarized in Table 3.1.
To monitor these events and to collect information pertinent to them, Tsai et al. preset two
sets of conditions in the Quality Control Unit of the interface module of the monitoring system: one condition to trigger data recording, and the other to stop it. The
triggering condition that must be satisfied to start data recording is summarized as follows:
IF ((system call interrupt) AND (interrupt process-level related))
OR ((system call interrupt) AND (I/O request))
OR (I/O completion interrupt)
OR (external interrupt from IPC device)
OR (program error interrupt)
THEN
<trigger data recording process>;
After the kernel services system calls or interrupts, the kernel always switches the system
mode to the user mode and then returns control to an application process. Thus, the stop
condition for all the events can be stated as follows:
IF (instruction changes the system mode to user mode)
THEN
<stop data recording process>;
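The triggering condition above can be encoded as a simple boolean predicate. This is our sketch, not Tsai et al.'s implementation: the enum and function names are invented, and the hardware qualification logic is reduced to a switch over interrupt types.

```c
#include <assert.h>

/* Illustrative interrupt classification for the triggering condition. */
enum int_type {
    SYSCALL_INT,        /* system call interrupt */
    IO_COMPLETION_INT,  /* I/O completion interrupt */
    IPC_EXTERNAL_INT,   /* external interrupt from an IPC device */
    PROGRAM_ERROR_INT,  /* program error interrupt */
    OTHER_INT
};

/* Returns 1 if data recording should be triggered, 0 otherwise,
 * mirroring the IF condition stated in the text. */
static int should_trigger(enum int_type t, int process_level_related,
                          int io_request)
{
    switch (t) {
    case SYSCALL_INT:
        /* trigger only if the call is process-level related
         * or is an I/O request */
        return process_level_related || io_request;
    case IO_COMPLETION_INT:
    case IPC_EXTERNAL_INT:
    case PROGRAM_ERROR_INT:
        return 1;
    default:
        return 0;
    }
}
```

The matching stop condition would simply be a check that the executed instruction switches the system back to user mode.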
Information pertaining to the process level events is collected from the target system as a
block of data that contains the key values for the events. The collected data can be saved to
secondary storage for postprocessing, or sent directly to the visualization module for on-line interpretation, analysis, and display.
3.4.2
Function Level Monitoring
Information collected by monitoring at the process level may be too abstract for the programmer to locate and remove bugs. To identify faulty components at a lower level (i.e. faulty functions or procedures), we need to monitor the system at the function level. This can be done in two steps.
First, a set of faulty processes are identified by using process level monitoring, and then the
faulty processes are monitored at the function level to identify faulty functions using the
information collected in the first step.
The events that need to be monitored at the function level are function calls and function returns. The function level events and their key values are summarised in Table 3.2.
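In a high-level language the same call and return events, with the key values of Table 3.2, can be captured by wrapping the functions of interest. The decorator-based tracer below is only an illustration of the idea, not the kernel-level mechanism described above; the record layout is our own.

```python
# Illustrative function-level tracer: records the call and return events
# of Table 3.2 (calling/called function, parameters, time) for any
# function it wraps. Names and record layout are ours, for sketching.

import functools
import inspect
import time

TRACE = []  # collected function-level events

def traced(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        caller = inspect.stack()[1].function  # calling function ID
        TRACE.append(("call", caller, fn.__name__, args, time.time()))
        result = fn(*args, **kwargs)
        TRACE.append(("return", caller, fn.__name__, result, time.time()))
        return result
    return wrapper

@traced
def add(a, b):
    return a + b
```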
Event                      Key Values
-------------------------  ------------------------------------------------------
Process Creation           Parent Process ID, Create Call Time, Node ID;
                           Child Process ID, Creating Process Time, Node ID
Process Termination        Parent Process ID, Resuming Time, Node ID;
                           Child Process ID, Termination Time, Node ID
Process Synchronization    Process ID, Operation (P/V), Semaphore ID,
                           Value of the Semaphore, Time, Node ID
I/O Operation              Process ID, Operation (I/O), I/O Port ID,
                           Message (I/O buffer), Time, Node ID
Interprocess               Sending Process ID, Message, Node ID,
Communication              Send-Call Time, Receive-Acknowledgement Time;
                           Receiving Process ID, Message, Node ID,
                           Receive-Call Time, Receiving-Message Time
Wait Child Process         Parent Process ID, Child Process ID, Time, Node ID
External Interrupt         Interrupted Process ID, I/O Port ID,
                           Message (I/O buffer), Time, Node ID
Process State Change       Process ID, New State, Transition Time, Node ID

Table 3.1 Process-level events and their key values
Event                      Key Values
-------------------------  ------------------------------------------------------
Function Call              Calling Function ID, Called Function ID,
                           Passed-in Parameters, Time
Function Return            Calling Function ID, Called Function ID,
                           Returned Parameters, Time

Table 3.2 Function-level events and their key values

3.5
Interference Due to Monitoring
In monitoring distributed systems, the insertion of instrumentation code into a target system
affects the performance of the target system which in turn affects the ordering and timing
of events. The ordering of the events can be classified as a partial ordering or a total ordering. A partial ordering is a local sequence of events occurring within a node. The timing
of local events is referenced to the local clock of the node. Since the local clocks of different
nodes are not synchronized, the times recorded in one node cannot be compared to the times
recorded in the other nodes. In contrast, the total ordering is a global sequence of all events
occurring in the system. In this case the timing of all events is referenced to a single global
clock or to synchronized clocks [LAMP78]. Therefore, an event with an earlier global timestamp definitely occurred before an event with a later timestamp.
In sequential computing, since intra-process events have a total ordering, interference due to monitoring affects only the timing of the events, but not their order. In distributed
processing, however, delaying one of the processors may slow down or stop the execution
of another process thereby causing it to miss a deadline or alter the event ordering with respect to events on remote processors. To minimize the effect of the interference caused by
monitoring, two approaches are used: 1) a hardware monitoring device is used to reduce the interference; 2) a perturbation analysis technique is used to predict the effect of monitoring, and changes are made to reduce the interference. Perturbation analysis is discussed in the next section.
3.6
Perturbation Analysis
Adding instrumentation code to support program visualization invariably affects the performance of the target system [TOPO94]. Removing the instrumentation code after monitoring restores the performance of the target system; however, the target system with the instrumentation code removed may behave differently from the one with the instrumentation code inserted [TSAI96]. Thus, for the behaviour of the system to remain predictable, the instrumentation code should be kept in the target system permanently.
The “true” execution behaviour of the target system can be discovered by predicting and
removing the perturbation caused by monitoring.
Perturbation analysis techniques examine event ordering and timing in an attempt to find
ways to reduce the effects of monitoring interference by adjusting the event ordering and
timing. Event ordering is found by reconstructing the total ordering of the interprocess communication events based on knowledge gained from the system’s kernel and cross-compiler. The “true” event timings are found by measuring each event’s delay due to the execution of the instrumentation code.
Malony et al. [MALO92] present two perturbation analysis models. The first model predicts the “true” total execution time of a program from the collected event trace by removing the effect of executing the instrumentation code. The second model adjusts each individual event to its “true” time by removing the effect of the instrumentation code executed
before the events occur. In these models, it is assumed that the execution of instrumentation
code can be de-coupled from the execution of the target program, and indirect perturbation
such as register reference and cache reference patterns are neglected. Malony et al. conclude that with a proper perturbation model and analysis, the increase in execution time due
to perturbation can be reduced to less than 20% of the total execution time of the target program.
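Malony et al.'s second model can be caricatured in a few lines: if each probe costs a roughly constant overhead, an event's "true" time is its measured time minus the cost of all probes executed before it on the same processor. The constant-cost, single-processor assumption below is ours, for illustration; the real model also reconciles inter-process orderings.

```python
# Sketch of per-event perturbation removal (in the spirit of Malony et
# al.'s second model): subtract the accumulated instrumentation cost
# from each measured timestamp. Assumes a constant probe cost, one
# probe before each event, and a single processor.

def adjust_times(measured, probe_cost):
    """measured: event timestamps in occurrence order on one processor."""
    adjusted = []
    for i, t in enumerate(measured):
        # (i + 1) probes have executed before and including this event
        adjusted.append(t - (i + 1) * probe_cost)
    return adjusted
```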
Sute and Kang [SUTE94] propose a technique to preserve the execution behaviour of a target system by giving an equal delay time to each of the communication events involved. In order to
maintain the original order of events, dummy probes are inserted before some events to
make the delay time uniform. More accurate performance data is obtained by removing the
probe time which will be uniform for all events after the adjustments are made.
4
Program Visualization
In this chapter, we discuss program visualization techniques and the application of visualization to program debugging and performance evaluation. Program visualization is useful in
understanding, debugging and finding performance bottlenecks of a target program.
Through the use of various displays, erroneous program behaviour can easily be detected
and highlighted.
The raw data collected by monitoring an execution of a target program can be presented
directly as a sequence of data values. However, there is usually too much data for a user to
interpret and comprehend. The data needs to be transformed into another form that makes large amounts of data easily comprehensible. For instance, displaying the data in the form of a graph may allow a large amount of information to be understood at a glance.
In program visualization we display the execution information collected from the target
system in the monitoring phase in a systematic, meaningful and logical way. Program visualization has proven to be an efficient way to display and examine the dynamic behaviour of program execution. A well designed interactive graphical visualization can convey information about the behaviour of program execution much more effectively than textual representations. It also allows the user to control the level of abstraction at which the
available information is displayed.
In developing a program visualization system for displaying animation of program execution behaviour, two major components need to be developed. Firstly, a monitoring mechanism for extracting and formatting program event information needs to be developed. Secondly, a mechanism for mapping and restructuring the collected information as an input to
the visualization component to create animated graphical displays must be developed.
4.1
Program Visualization Techniques
Although program visualization is still in its infancy, some general distributed computing
visualization techniques are emerging [EILE93]. Statistical displays, communication
views, animations, and application-specific visualization are among the visualization techniques. In the SOBER visualization system - the SOBERvis - we employ statistical displays, and communication views of application-specific information. In the rest of this
chapter, we present some characteristics of different visualization approaches.
4.2
Statistical Displays
Many program performance visualization systems such as ParaGraph [HEAT91] heavily
rely on statistical displays for the presentation of performance data. Commonly used statistical displays include bar charts, Kiviat diagrams, and utilization Gantt charts. These displays provide insight into performance of the target distributed computing system, and due
to their performance oriented nature they rely heavily on real-time timestamps [TOPO95].
Thus, a visualization system should support a clock synchronization mechanism, or should have access to synchronized clocks.
4.3
Communication Views
Communication views are used to represent the message transmission among the nodes in
a distributed computing system. Typically, the topology of the processors and interconnection network that is displayed matches the users’ mental model of the topology of the target
distributed system. The ParaGraph [HEAT91] visualization system, for example, provides
a substantial set of topology-specific communication views. The Lamport view (usually
known as the space-time view) is one of the most popular communication views. In the Lamport view, process numbers are listed along the y-axis and time is displayed along the x-axis. Communication events such as send/receive pairs are represented as a line drawn between the sending and receiving processes. That is, the x-coordinates of the line are determined by the send time and the receive time, whereas its y-coordinates are determined by the process identities of the sending and receiving processes.
When the Lamport view uses real time, the view provides information about resource utilization and the general communication pattern. When the view uses Lamport logical time, it enforces a consistent ordering on a computation, in addition to displaying the communication pattern. Consistency is an important feature for testing and debugging; it is not achievable by using global real-time timestamps [FIDG94].
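The coordinate mapping just described reduces to one line segment per message. A sketch of the computation, using an event-record layout of our own (not the actual trace format of any of the systems cited):

```python
# Build the line segments of a Lamport space-time view: x = time,
# y = process number, one segment per (send, receive) pair. The
# (sender, send_time, receiver, recv_time) tuples are an illustrative
# trace format.

def spacetime_segments(messages):
    segments = []
    for sender, send_time, receiver, recv_time in messages:
        # Each message is drawn from (send_time, sender_row)
        # to (recv_time, receiver_row).
        segments.append(((send_time, sender), (recv_time, receiver)))
    return segments
```

A plotting library would then draw each segment over the horizontal process lines to obtain the familiar space-time diagram.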
4.4
Animations
Sophisticated graphical toolkits support the features necessary to animate concurrent events. This approach conveys critical information to the viewer; information that cannot be conveyed by a serialized view of a parallel application. The Conch message passing view [TOPO94] is a typical example of a concurrent animation. In this view,
processes are laid around a circle. When a process sends a message, a small filled circle
representing the message moves towards the centre of the circle in the general vicinity of
the process that will receive the message. When the message is received, it moves from its
intermediate position in the circle to the receiving process. This display clearly presents
message broadcasts and general message passing pattern.
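The motion of the message token in a Conch-style view is plain linear interpolation between positions on the circle. The sketch below shows the geometry; the layout, parameter names, and two-phase stepping are our own simplification, not Conch's actual implementation.

```python
# Animate a message token as in a Conch-style view: processes sit on a
# circle, and the token is linearly interpolated from the sender toward
# the centre, then out to the receiver.

import math

def process_position(index, n, radius=1.0):
    """Place process `index` of `n` evenly around a circle."""
    angle = 2 * math.pi * index / n
    return (radius * math.cos(angle), radius * math.sin(angle))

def token_position(src, dst, t):
    """t in [0, 1]: first half moves src -> centre, second half centre -> dst."""
    if t < 0.5:
        f = t / 0.5
        return (src[0] * (1 - f), src[1] * (1 - f))
    f = (t - 0.5) / 0.5
    return (dst[0] * f, dst[1] * f)
```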
4.5
Application-specific Visualization
Application-specific visualization is a program depiction that is developed specifically for
a particular application. This type of view illustrates the semantics of a program, its fundamental methodologies, and its inherent application domain [STAS92]. Visualization for correctness debugging is different from that for performance evaluation because debugging requires application-specific program views. An animation of a sorting algorithm, for example, should show the data values being exchanged, whereas a visualization of Gaussian
elimination should show the matrix of values as it is manipulated. In other words, an application-specific program visualization is recognized as presenting specific information
about the particular program or program class.
By presenting the execution of a distributed program in its inherent semantic format or application domain, a visualization system can provide programmers with an insight into the
program’s functionality. The same information could be acquired by examining the values of program trace variables throughout execution, but this type of tracing is much more deliberate and requires the programmer to associate the values of the variables with the program state
at a particular time.
Program performance visualization differs from application-specific program visualization
because performance views depict how efficiently a program is executing on a parallel or
distributed system. Performance views illustrate message passing, process utilization,
memory access, etc., and they are typically drawn from a library of graphical widgets,
gauges, x-y-z plots, and charts. Performance views can be reused for many different applications because they do not focus on the semantics of a particular program.
5
Debugging and Testing
Program debugging and performance testing are among the several areas in computing to which program monitoring and visualization are applicable. In this chapter we discuss some traditional debugging techniques for sequential programs and present debugging approaches for distributed systems.
5.1
Program Debugging
In program debugging, two different analysis techniques are used, namely static and dynamic analysis. Static analysis is used to analyse the design specification and the source code to detect program anomalies, whereas dynamic analysis is used to analyse the execution behaviour of the program. Static analysis systems are distinguished from dynamic analysis systems by not requiring program execution and by generally checking for structural faults instead of functional faults. That is, static analysis tools have no knowledge about the intended functionality of the target program; they simply identify program structures that are generally indicators of an error [CHAR89]. In the next sections, we investigate the two debugging techniques in more detail.
5.2
Program Debugging Techniques
5.2.1
Static Analysis
Static analysis tools entirely avoid the interference effect by not executing the programs.
They have the potential to identify a large class of program errors that are particularly difficult to find using the dynamic analysis technique. Static analysis is used to detect
two classes of errors in distributed programs: synchronization errors and data-usage errors. Synchronization errors include such bugs as deadlock and ‘wait-forever’. Data-usage
errors include the usual sequential errors, such as reading an uninitialized variable, and parallel errors typified by two processes simultaneously updating a shared variable.
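The data-usage class of errors can be illustrated with a toy static check: given a straight-line sequence of statements, flag any variable read before it is assigned. Real static analysers work over full control-flow graphs and interprocess interactions; the `(target, reads)` statement format below is an invented miniature IR, purely for illustration.

```python
# Toy static data-usage check: detect reads of uninitialized variables
# in a straight-line program. Each statement is a (target, reads)
# tuple: the variable being assigned and the variables it reads.
# This is an illustrative sketch, far simpler than real static analysis.

def uninitialized_reads(statements):
    assigned = set()
    errors = []
    for target, reads in statements:
        for var in reads:
            if var not in assigned:
                errors.append(var)   # read before any assignment: flag it
        assigned.add(target)
    return errors
```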
Static analysis uses formal specification and verification to locate erroneous code before
program execution. In general, static analysis is supported by a formal specification language that precisely specifies the system, a formal design method that is systematically
used to develop the system, and a formal verification method that logically proves the correctness of the developed system with respect to the specification [TSAI96].
The primary problem with most static analysis algorithms is that the set of examined
states is large and their worst-case computational complexity is often exponential. In addition, static analysis has inherent limitations in dealing with asynchronous interactions between processes. In other words, it is not possible to fully describe and model the behaviour
of distributed systems by using static analysis technique before program execution.
5.2.2
Dynamic Analysis
In the traditional dynamic method for debugging sequential software, the program is executed
until an error manifests itself; the programmer then stops the execution, examines the program status, inserts assertions, and re-executes the program in order to collect additional
information about the causes of the error. This style of debugging is called cyclical debugging. In cyclical debugging three approaches are used: memory dumps, tracing, and breakpoints.
5.2.2.1
Memory dumps
The memory dump approach provides the lowest level debugging information. Once the
system terminates abnormally or by request from the programmer, the program status, including program object code, register contents, and memory contents, is dumped into a
file. The advantage of this approach is that it provides sufficient information necessary to
locate an error. Its drawbacks are that it requires programmers to have a strong background
in low-level computer languages to examine the dumped code, and it is tedious and error
prone.
5.2.2.2
Tracing
The tracing approach utilizes special tracing facilities supplied by the operating system, a
compiler or programming environment to display selected information. The trace facility
continuously tracks every step of program execution including control flow, data flow, and
variable contents, and it reports relevant changes at defined times [CHEU90]. The advantage of the tracing approach is that the user can interactively suspend program execution to examine changes in program status at any time. In addition, the tracing approach makes output debugging and trace insertion easier.
The tracing approach completely relies on the programmer to specify appropriate actions.
If traces are enabled for multiple processors, the programmer or the debugger must assemble them to obtain a global trace. In any case, global timestamps (either real-time or logical
time) are necessary to timestamp the trace information. Hence clock synchronization support is required. Although it is possible to develop a clock synchronization facility, many distributed operating systems do not provide one. When no such facility is available, we can make
a selected processor responsible for generating the global trace according to the order in
which the trace messages are received from all other processors. However, because of variable communication delays and the non-determinism of processor scheduling, the trace
messages may not arrive at the selected processor in the order they were generated. The selected node may also become a bottleneck for the collection of trace information. Thus, a
better way to address this problem is to provide a clock synchronization support.
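With synchronized (or logical) global timestamps, assembling the global trace from locally ordered per-node traces is a k-way merge. A sketch, assuming `(timestamp, node, event)` records (a format of our own choosing):

```python
# Merge locally ordered per-node traces into one global trace by
# timestamp. Each trace is a list of (timestamp, node, event) records,
# already sorted locally; heapq.merge performs the k-way merge lazily.

import heapq

def global_trace(*node_traces):
    """Combine per-node traces into a single globally ordered trace."""
    return list(heapq.merge(*node_traces))
```

Note that this presupposes comparable global timestamps, which is exactly why the clock synchronization support discussed above is needed.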
5.2.2.3
Breakpoints
A breakpoint is a point in a program execution flow where normal execution is suspended
and relevant run-time information, such as variable values, stack counters, and register values, can be displayed. At a breakpoint the programmer can interactively examine and modify parts of the program status, or control later execution by requesting single-step execution
or setting further breakpoints.
The advantage of the breakpoint approach is that it requires no extra code in the program, and
hence avoids the effects of adding debugging probes to distributed programs. It also allows
the programmer to control distributed execution and to select display information interactively. Its disadvantage is that it requires strong knowledge of programming and debugging
to set the breakpoints at appropriate places in the program and to examine the relevant data.
Traditional distributed debuggers generally support the same type of breakpoints as those
found in sequential debuggers [CHEU90].
5.3
Performance Measurement
As multiprocessor systems proliferate in the market, there is an increasing need to evaluate
their relative performance when executing various applications. Performance measurement
is conducted on an existing computer system to identify current performance bottlenecks,
to correct them, and to prevent potential future performance problems. An advantage of
performance measurement is that the performance of the real system rather than that of the
model system is obtained (as opposed to performance modelling). Disadvantages of performance measurement, however, include the need for a real running system, and the necessary design of the measurement instrumentation.
5.4
Debugging Distributed Systems
The classical approach to debugging sequential programs involves repeatedly stopping a
program execution, examining program state, and then either continuing or reexecuting in
order to stop at an earlier point in the execution. Unfortunately, distributed programs are
not always reproducible because of their non-deterministic nature. Even when they are run
several times with the same input, their results can be radically different. These differences
are caused by races - a situation which occurs whenever two or more processes that are running in parallel are trying to use a resource simultaneously. For example, while two processes are running concurrently, one process may attempt to write a memory location while
the other process is reading the memory location. The behaviour of the second process depends on whether or not it reads the new value after the first process has completed writing
the memory.
The non-determinism arising from races is particularly difficult to deal with because the
programmer often has little or no control over it [CHAR89]. The resolution of a race may
depend on each CPU’s load, the network traffic, and non-determinism in the communication medium. The cyclic debugging approach often fails for distributed programs because
of their non-deterministic behaviour.
Another problem found in distributed systems is that the concept of “global state” is misleading or even non-existent [LAMP78]. Without a synchronized global clock, it may be
difficult to determine precisely the global order of events occurring in distinct, concurrently
executing processors.
The most straightforward approach to implementing distributed debugging is to associate
a sequential debugger with each target process and to collect information from each debugger. This implementation would be adequate if bugs occurred only inside the
nodes. However, processes executing on different nodes have interprocess relationships
that would not be detected with such an implementation.
Charles et al. [CHAR89] categorize dynamic analysis techniques for debugging concurrent
systems into two general categories: traditional parallel debuggers (sometimes called
“breakpoint” debuggers) and event-based debuggers. The traditional parallel debuggers are
the easiest to build and therefore provide an immediate partial solution. They provide some
control over program execution and state examination. Event-based debuggers provide better abstraction than that provided by traditional style debuggers. They also address the interference effect by permitting deterministic replay of non-deterministic programs.
Using the breakpoint technique to debug distributed computing systems raises three problems: 1) it is impossible to define a breakpoint in terms of a precise global state; 2) the semantics of single-step execution are no longer obvious (some researchers define it to be the execution of a single machine instruction or a statement of source code on a local processor,
others consider it to be a single statement on each processor involved, and still others treat
it as message transmission, reception, or process creation/termination); 3) there is a problem of halting a process cluster at a breakpoint or after a single step [CHEU90].
When a breakpoint is triggered, either all of the processes in the distributed program or only
the process encountering the breakpoint can be stopped. The former can be difficult to
achieve within a sufficiently small interval of time, and the latter can have a serious impact
on systems that contain mechanisms such as time-outs. To address this issue, Cooper
[COOP87] introduced a logical clock mechanism to maintain correct time-out intervals,
and thus provide transparent process halting.
Some debugging issues are very critical with respect to the current implementation of the
SOBER system. Determining the number, the types and the size of messages generated by
a given operation is, for example, very important for identifying communication bottlenecks. This same information can also be used in experimenting with the SOBER system using different network topologies (e.g. a ring or a hybrid of star and ring) in our future work to reduce message bandwidth. Detection of the failure of a computing node in general, and that of the central switching node in particular, is extremely useful for maintaining the faulty node dynamically in the future.
5.5
Chapters Review
In this and the previous chapters, i.e. chapters 2 - 5, we have discussed several general and
basic issues including structures and architectural models of distributed computing systems, general techniques for program monitoring, visualization and debugging and their
extension to distributed computing environments. Some of the concepts and principles discussed, especially monitoring and visualization techniques are directly employed in the development of our distributed visualization system, whereas some others are used in selecting a more suitable monitoring and visualization framework for our target distributed
system, and others are addressed mainly because of their theoretical significance and for
completeness of the report.
6
The SOBER Visualization System
In this chapter, we briefly discuss the design and implementation of the SOBERvis system
and we present its major components. In section 6.1, we present an overview of the SOBER
distributed system and its components at a fairly detailed level. In sections 6.2 and 6.3, we
discuss the monitoring and visualization frameworks we have chosen for the SOBERvis system
and their implementations. Finally, in section 6.4, we present the design and implementation of the components of the SOBERvis system.
The SOBER visualization (SOBERvis) system is a pure software visualization system that
is developed to visualize run-time information about the SOBER distributed system. Information such as virtual network topology, statistical information about interprocess communication, and synchronization are presented to assist both SOBER programmers and users
in understanding the execution behaviour of the system, in debugging, and in performance
tuning. SOBERvis consists of instrumentation code inserted into the target system to
detect the occurrences of the pre-defined events of interest and to collect the data pertinent
to the events, and modules that produce visual display of the collected data.
6.1
The SOBER System
The SOBER system is a distributed computing system developed to simulate offshore
emergency response training routines. It further approximates real-life situations, and offers an economically attractive alternative to full scale exercises [SAND95]. The SOBER
system focuses on some salient aspects of offshore accident management activities such as
communication, coordination and resource allocation, and develops vital skills among the
personnel and enables them to make the right decisions when the accident occurs.
The SOBER system is a fully distributed system that is implemented on a set of heterogeneous autonomous workstations that are interconnected by a network. That is, a computing
node in the SOBER distributed system is a single-processor computer with its own storage
memory and I/O devices. Different components of the SOBER system are run on a local-area or a wide-area network, in order to better utilize distributed computing resources, and
to enable several operators to interact simultaneously. Adding to the flexibility of the system, all communications are based upon standard industry protocols (TCP/IP sockets and
UNIX streams). Unlike many other simulators, the SOBER system uses only off-the-shelf
hardware, which makes the system scalable with respect to need and cost constraints.
As we mentioned earlier in chapter 1, the virtual network topology of the SOBER system
is a single-star: a central switching server node and several application nodes directly connected to the central node (see Fig. 6.1). The communication cost of a network with singlestar topology is a linear function of the number of nodes on the network since communication between two nodes requires at most two transfers. However, this data transfer scheme
may not ensure speed since the central switching node may be a communication bottleneck.
Another major drawback of this scheme is that a failure in the central server node completely partitions the network, since the server node is dedicated to the message switching task.
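The at-most-two-transfers property of the single-star topology is easy to state as a routing function. The sketch below uses our own node naming ("server" as the hub); it is only a restatement of the property, not SOBER's routing code.

```python
# Routing in a single-star topology: every message between application
# nodes goes through the central switching node, so any pair of nodes
# is at most two transfers apart. The hub name "server" is illustrative.

def route(src, dst, hub="server"):
    if src == hub or dst == hub:
        return [src, dst]          # one transfer: hub is an endpoint
    return [src, hub, dst]         # two transfers: src -> hub -> dst

def transfers(src, dst, hub="server"):
    """Number of message transfers between two nodes."""
    return len(route(src, dst, hub)) - 1
```

This also makes the two drawbacks concrete: every path includes the hub, so the hub is both the bottleneck and the single point of failure.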
6.1.1
Components of the SOBER System
The SOBER system is composed of several components or modules which serve to manage
and control objects in the shared virtual world. Combinations of these basic building blocks form application units, such as a 3-D virtual flight simulator and a radar system, with which
the user can navigate and interact with objects in the virtual world. Utilization of such an
object-oriented design methodology increases the independence and adaptability of the
modules.
An object server module executes on the central switching node and is dedicated to the message switching task and to connecting the individual application components, allowing them to communicate via a message-passing paradigm. The object server
module is also responsible for the management and maintenance of the virtual world, for
consistency and coordination of object interactions, for updating the state of the virtual
world after each simulation time-step, and for communicating the changes to the other SOBER applications to ensure that all participating applications are able to see the consequences of the others’ actions. This in turn enables course participants who are interacting
with the objects in the virtual world via the applications to see the consequences of the other
participants’ activities in the virtual world. The object server also coordinates communication and execution of other applications. To avoid data corruption in the virtual world, at
most one application has control over a given object in the virtual world at any given time
and all other applications can only hold a ‘proxy’ to the objects they do not control.
A Scenario module is the instructor’s game master used for designing and executing a list
of critical events - scenarios - to which the trainees must react. The scenario module also
enables the instructor to create and update the library of SOBER objects in the virtual world
and to decide the role of a trainee running a given application and which objects the application should control. A scenario clock emulates the time of the day which affects which
objects the participants can see at a given time.
In order to simulate real-life media channels, the SOBER system also integrates two other
communication networks and servers: a message server network and a radio communication network. The message server network provides multimedia capabilities by supporting distribution of sound, picture and text messages. The message server, for example, enables a
course instructor to send text information or alarms to the course participants in order to
guide them through a problematic phase or to indicate emergencies. A radio communication network is also a separate network that is intended to provide digital sound distribution
support. The Oil Simulator and the Weather Simulator components add to the level of realism of the SOBER system by providing realistic diffusion of oil spills from installations
or ships, taking into account weather conditions and the use of booms and skimmers to fight
the oil spill.
Fig. 6.1: Virtual Network Topology of the SOBER System
6.2
Monitoring Framework for SOBERvis
As we mentioned earlier, an implementation of a program visualization system has two distinct phases: a data collection or monitoring phase, and a visual display creation or visualization phase. These two phases can be implemented either as co-routines, in the case of an on-line visualization mode where the target system and the visualization system execute concurrently, or as distinct routines, in the case of an off-line visualization mode where the visualization routines start only after the monitoring routines and the target program have completed. The SOBERvis system employs the software monitoring technique discussed in section 3.1.1 to collect run-time information about the target SOBER system, and it shares the computing power and storage resources of the target system. The implementation of the SOBERvis monitoring supports both on-line and off-line modes of visualization.
The monitoring framework for SOBERvis is implemented by developing a monitor server to which a unique port number is assigned, so that the SOBER components - servers and applications - connect to it automatically when they are created. Fig. 6.2 shows the structure of the SOBERvis monitoring framework. The communication routines in the components are modified so that they send a copy of every successfully sent or received message and event to the monitor server via a separate virtual network intended for monitoring purposes. Based on the form and type of the events received, and on the visualization mode, the monitor server invokes a visualization routine to process the message received from the SOBER applications.
The set of events of interest in the SOBER distributed system includes, but is not limited to, the following:
• process creation and termination;
• interprocess communication (SEND and RECV events);
• SOBER object creation and deletion; and
• grabbing and releasing (i.e. taking control of or releasing) an object by an application.
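The event classes above can be modelled as a small tagged record type that the monitor filters on arrival. The following Python sketch is purely illustrative - the type names, fields, and filtering helper are hypothetical and do not appear in the SOBER sources:

```python
import time
from dataclasses import dataclass, field
from enum import Enum, auto

class EventType(Enum):
    PROC_CREATE = auto()     # process creation
    PROC_TERMINATE = auto()  # process termination
    SEND = auto()            # interprocess communication
    RECV = auto()
    OBJ_CREATE = auto()      # SOBER object creation/deletion
    OBJ_DELETE = auto()
    OBJ_GRAB = auto()        # taking control of an object
    OBJ_RELEASE = auto()     # releasing an object

@dataclass
class MonitorEvent:
    etype: EventType
    source: str              # name of the SOBER application task
    timestamp: float = field(default_factory=time.time)
    payload: bytes = b""

# Only events of interest are kept; anything else is discarded on arrival.
EVENTS_OF_INTEREST = set(EventType)

def is_of_interest(ev: MonitorEvent) -> bool:
    return ev.etype in EVENTS_OF_INTEREST
```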
The transfer and processing of the collected data can be performed in two different ways, depending on the visualization mode: 1) if the visualization mode is on-line, the collected data is transferred to the visualization modules and displayed in real time; 2) if the visualization mode is off-line, the collected data is transferred to the secondary storage of the node on which the visualization module executes, for postprocessing.
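The two transfer paths can be sketched as a single routing decision. The function and parameter names below are illustrative only, not taken from the SOBERvis implementation:

```python
import json, os, tempfile

def handle_event(event: dict, mode: str, trace_file, display) -> None:
    """Route a collected event according to the visualization mode:
    'online'  -> forward to the visualization module immediately;
    'offline' -> append to a trace file on secondary storage for postprocessing."""
    if mode == "online":
        display(event)                                  # rendered in real time
    else:
        trace_file.write(json.dumps(event) + "\n")      # replayed later

shown = []
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "trace.log")
    with open(path, "w") as f:
        handle_event({"type": "SEND", "src": "radar"}, "online", f, shown.append)
        handle_event({"type": "RECV", "src": "scenario"}, "offline", f, shown.append)
    with open(path) as f:
        stored = f.read().splitlines()
```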
6.3
Visualization Framework for SOBERvis
The SOBERvis system is an experimental visualization system we have developed to provide program monitoring and visualization support to programmers and users of the SOBER distributed computing system. The SOBERvis system supports two types of displays: the communication display focuses on large-grained events that are influenced by and related to the overall behaviour of the SOBER distributed system, whereas the statistical display provides more detailed information that is useful for program analysis and performance evaluation. Moreover, the SOBERvis system supports two visualization modes: an on-line and an off-line visualization mode. In the on-line visualization mode, run-time information about the target system is displayed in real time, whereas in the off-line visualization mode data relevant to the events of interest is recorded in secondary storage and replayed when the user interactively inputs the proper command. This two-mode visualization approach provides more insight into the execution behaviour, performance efficiency, load balance, and operations of the SOBER distributed system. The SOBERvis system provides its users with the opportunity to interactively determine the display type and visualization mode.
[Figure: the SOBER Distributed System network (Radar Station, Scenario, Weather Simulator, Flight Simulator, Object Server) connected to the SOBER Visualization System network (Graphical User Interface, Monitoring Routines, Visualization Routines)]
Fig. 6.2: Structure of SOBERvis System
6.3.1
Communication Displays and Statistical Displays
A virtual network topology display presents information such as: the SOBER applications connected to or disconnected from the switching server node; the name, type, and state of the applications running on a given computing node; the total number of objects in the virtual world and the number of objects controlled by a given application at a given time; and the proportion of messages communicated along a given communication link. For instance, to indicate the current status of a given SOBER application task we use the following colour coding: green=running, red=suspended. To maintain consistency in the use of colour codes in our visualization system, we use the same colour coding to display the length of time a computing node spends in either of these two states.
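The colour coding can be sketched as a simple state-to-colour mapping. The RGB triples and the fallback behaviour below are illustrative assumptions, not SOBERvis’s actual values:

```python
# Hypothetical mapping from application state to display colour,
# mirroring the coding described in the text (green=running, red=suspended).
STATE_COLOURS = {
    "running": (0.0, 1.0, 0.0),    # green
    "suspended": (1.0, 0.0, 0.0),  # red
}

def node_colour(state: str) -> tuple:
    # Unknown states fall back to red so that anomalies remain visible.
    return STATE_COLOURS.get(state, STATE_COLOURS["suspended"])
```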
In a communication display, a node of the SOBER system is represented by a sphere labeled with the name of the application running on the node. The colour of the sphere indicates the state of the application executing on that particular node. A communication link between two nodes is indicated by a line drawn between the entities representing the nodes. The radius of the sphere gives an indication of the number of SOBER objects in the virtual world that are controlled by the application at a given time, and the thickness of a communication link between two application nodes is proportional to the maximum size of message sent along the link.
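The two geometric encodings above can be sketched as follows. The base radius and scaling constants are hypothetical; the thesis does not state the values SOBERvis actually uses:

```python
def sphere_radius(objects_controlled: int,
                  base: float = 0.5, scale: float = 0.1) -> float:
    """Sphere radius grows with the number of SOBER objects the
    application controls (base/scale are illustrative constants)."""
    return base + scale * objects_controlled

def link_thickness(max_message_bytes: int,
                   bytes_per_unit: float = 1024.0) -> float:
    """Link thickness is proportional to the largest message sent on the link."""
    return max_message_bytes / bytes_per_unit
```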
In addition to revealing the underlying virtual network topology of the SOBER system, this display enables us to monitor the proper functioning of the nodes, and it marks a faulty node with an appropriate colour. For instance, if the object server has crashed, the system controller or a user is informed by displaying the sphere that represents the object server in red.
The aggregate run-time information collected from the SOBER system gives an idea of the global system behaviour, such as interprocess communication and synchronization. In contrast, detailed information about each component application reveals the run-time behaviour of that component. The statistical information is categorized into two groups, namely global and local, and contains the following information.
Global information:
• the number of client nodes in the network at a given time
• the number of clients in the running/ready/waiting state
• the total number of SOBER objects in the virtual world database
• the total number of events that have occurred in the system
Local information:
For each SOBER client/application we display:
• the name of the application
• the network address of the application
• the name and address of the host on which the application executes
• the total number of objects controlled by the application at a given time
• the number of communication events that occurred in the application
6.3.2
On-line vs. Off-line Visualization Approach
An important consideration in designing a program monitoring and visualization system is whether the information gathered is utilized in an on-line or an off-line visualization mode. This issue is worth considering because it is among the determining factors in selecting an appropriate monitoring technique. If the tracing monitoring technique is used for off-line visualization, for example, extensive buffering of the recorded data is possible since the analysis is deferred until the execution of the applications completes. But if the tracing monitoring technique is used for on-line visualization, the information must be processed in real time and extensive buffering is not applicable.
Some visualization environments use the same monitoring technique, namely tracing, to support both their on-line and off-line visualization modes. While such visualization systems are very appropriate for off-line program analysis, they require a large amount of bandwidth when used for on-line analysis. Other visualization systems distinguish between the two monitoring techniques (tracing and sampling), and between the graphical views used for on-line analysis and those used for detailed off-line statistical analysis. PVaniM [TOPO96], for example, uses buffered tracing for postmortem analysis, employing a buffering hierarchy to collect trace events. For its on-line graphical views, PVaniM uses periodic sampling of events with adjustable granularity; this requires substantially less bandwidth than event tracing, but the views are not as detailed.
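The tracing/sampling trade-off can be illustrated with two minimal observers. This is a schematic contrast, not PVaniM’s or SOBERvis’s actual code; a real sampler would be driven by wall-clock time, whereas here event arrivals stand in for time:

```python
class Tracer:
    """Tracing: every event is recorded - detailed but bandwidth-hungry."""
    def __init__(self):
        self.trace = []

    def observe(self, event):
        self.trace.append(event)

class Sampler:
    """Periodic sampling: only an aggregate is emitted every `period` events,
    so far less data crosses the network at the cost of detail."""
    def __init__(self, period: int):
        self.period = period
        self.count = 0
        self.samples = []

    def observe(self, event):
        self.count += 1
        if self.count % self.period == 0:
            self.samples.append({"events_so_far": self.count})

tracer, sampler = Tracer(), Sampler(period=10)
for i in range(100):
    tracer.observe({"seq": i})
    sampler.observe({"seq": i})
```

With 100 events, the tracer ships all 100 records while the sampler ships only 10 aggregates - the bandwidth asymmetry the text attributes to PVaniM.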
6.4
Design and Implementation of SOBERvis
The primary components of SOBERvis are its graphical user interface, the monitoring routines, and the visualization routines. Figure 6.2 shows the structure of the SOBERvis system. In the rest of this section, we provide implementation details of these primary components.
6.4.1
Graphical User Interfaces
The SOBERvis system has an interactive 3D graphical user interface which allows its users to interactively set up initial visualization parameters such as the display type and the visualization mode. A user can choose between the communication and statistical displays, and can decide to visualize the information in either an on-line or an off-line visualization mode. These visualization modes are discussed in more detail in section 6.3.2.
When the SOBERvis system is invoked, the first window that appears on the screen is the SOBERvis Start-up window. The start-up window contains a control panel which allows the user to interactively set the initial visualization values mentioned above with a mouse click on the appropriate button. Figure 6.3 shows a snapshot of the start-up window of SOBERvis. We can set up different combinations of displays and visualization modes and compare the resulting views. For the on-line visualization mode, we need to specify the sampling rate using the scale bar that appears on the bottom line of the on-line visualization window. Each initialization parameter assumes its default value if it is not set explicitly. The default visualization mode is the off-line mode, the default display type is the communication display, and the default sampling rate is 0.0 seconds, i.e. the tracing monitoring technique (see section 3.2). The user can alter these default values interactively at any time.
Fig. 6.3: Start-up Window for SOBERvis
After we set up the initial values, we press the StartSbvis button to begin the graphical visualization of the target system, based either on the default parameters or on the visualization parameters we specified in the initialization step. This in turn displays a user interface component with which a user can interact to refine the visualization by making further choices. For instance, in the window of the SOBER virtual network topology display, a snapshot of which is shown in figure 6.4, selecting an item from the scrolled list of application names on the left side of the window displays a detailed view of statistical information about the selected application.
The monitoring and visualization system can also be stopped at any time without affecting the execution of the target SOBER system. This can be accomplished either by a mouse click on the Quit button in the start-up window or by selecting Exit from the File menu of any SOBERvis window.
Fig. 6.4: Display of Virtual Network Topology of SOBER system
Fig. 6.5: Display of Virtual Network Topology of SOBER system
(the red colour indicates that the Object Server is suspended)
Fig. 6.6: Display of statistical information about SOBER system
6.4.2
Monitoring Routines
Typically, programmers must hand-annotate their code with print statements to produce an event log for visualization. This approach is error-prone and time consuming, and may not be able to produce an event trace of sufficient detail. Another problem with this approach involves trace events that are timestamped by readings of local clocks that are not accurately synchronized, leading to misleading visualizations. For example, we may discover a message receipt event with a timestamp earlier than that of the corresponding send event. Visualization systems that use event trace data filled with such causality violations are misleading. In this subsection, we discuss the modifications we have made to the communication routines of the SOBER system to provide support for a straightforward visualization of the SOBER system.
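One standard remedy for such causality violations - noted here for context, not as the technique SOBERvis itself adopts - is the logical clock of Lamport [LAMP78], under which a receive event can never carry a timestamp earlier than its matching send:

```python
class LamportClock:
    """Minimal logical clock in the style of Lamport [LAMP78]; a sketch only."""
    def __init__(self):
        self.time = 0

    def tick(self) -> int:
        # Any local event, including a send, advances the clock.
        self.time += 1
        return self.time

    def receive(self, sent_at: int) -> int:
        # Receiving advances the clock past the sender's timestamp,
        # so the receive is always ordered after the matching send.
        self.time = max(self.time, sent_at) + 1
        return self.time

a, b = LamportClock(), LamportClock()
send_ts = a.tick()           # process A sends
recv_ts = b.receive(send_ts) # process B receives
```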
To address this problem, we integrate the monitoring support required for our visualization system directly into the SOBER distributed system by modifying its communication primitives. Because the standard communication primitives, such as sendMessage(), are cognizant of the type, form, source, destination and size of a message, they can be modified to automatically produce the event trace information necessary for visualization purposes. In our case, we have modified the communication routines, namely sendMessage() and recvMessage(), so that they send a copy of all messages successfully sent or received by a SOBER application to the monitor server. In other words, our visualization system uses a tracing monitoring framework. A similar approach is employed, for example, in the implementation of POLKA [TOPO94].
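The instrumentation pattern - wrapping a communication primitive so that every successful operation is mirrored to the monitor - can be sketched as follows. The function names here are stand-ins for SOBER’s sendMessage() and the monitor-network call; the actual signatures are not shown in this thesis:

```python
def make_monitored_send(send_message, report_to_monitor):
    """Wrap a send primitive so every *successful* send is also reported
    to the monitor server over the separate monitoring network."""
    def monitored_send(dest: str, payload: bytes) -> bool:
        ok = send_message(dest, payload)
        if ok:  # only successfully sent messages are reported
            report_to_monitor({"op": "send", "dest": dest, "size": len(payload)})
        return ok
    return monitored_send

sent, reported = [], []
raw_send = lambda dest, payload: (sent.append((dest, payload)) or True)
send = make_monitored_send(raw_send, reported.append)
send("objectServer", b"hello")
```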
When the SOBERvis system is invoked, the monitor server initializes itself and waits for connection requests from the SOBER application tasks. The objectServer is the first SOBER application to be invoked and connected to the monitor server. The other application tasks are then invoked and connected to the monitor server only if they have successfully connected to the object server. The monitor server receives all the messages from the SOBER application tasks and parses them. The trace events are then filtered on arrival, and only events of interest are further processed or stored for postprocessing. The message header of the SOBER events is also modified so that it contains a flag which is used by the monitor server to categorize a message as a “send” or a “recv”. This flag is very useful for obtaining aggregate statistical information such as the total number of send and receive events that occurred in a given SOBER application task.
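The header flag and the per-task aggregation it enables can be sketched as follows. The field names ("flag", "task") and flag values ("S"/"R") are illustrative assumptions, not the actual SOBER header layout:

```python
from collections import Counter

def classify(header: dict) -> str:
    """Use the direction flag in the (modified) message header to
    categorize an event as a 'send' or a 'recv'."""
    return "send" if header.get("flag") == "S" else "recv"

def tally(headers) -> Counter:
    """Aggregate per-task send/receive counts for the statistical display."""
    counts = Counter()
    for h in headers:
        counts[(h["task"], classify(h))] += 1
    return counts

stats = tally([
    {"task": "radar", "flag": "S"},
    {"task": "radar", "flag": "R"},
    {"task": "radar", "flag": "S"},
])
```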
6.4.3
Visualization Routines
When the monitor server receives a message from a SOBER application task, it handles the message by invoking a relevant member function of the monitor server. The first message received by the monitor server from a SOBER application task is a request for connection to the server. When the connection request message is received, the newConnection() member function handles the request. If the connection is successful, information about the application is parsed from the connection information (CM_SAP - service access point), and an instance of a SOBER application node is created and appended to the global list of applications connected to the object server.
Once a connection is established between a SOBER application and the monitor server, a copy of every message that is sent or received by the application is reported to the monitor server. When such a message is received, the dispatchMessage() member function of the monitor server is automatically called to handle it. If the message received is among the events of interest, a relevant visualization routine is invoked to further process the event information on-line, or to store it for postprocessing, depending on the display and visualization mode.
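The dispatching described here can be sketched as a handler table. The member function names (newConnection, dispatchMessage, handleClose) follow the thesis; the class structure, message format, and handler bodies are illustrative only:

```python
class MonitorServer:
    """Skeletal dispatcher mirroring the handlers described in the text."""
    def __init__(self):
        self.apps = []       # global list of connected applications
        self.handled = []    # log of handled events, for illustration

    def newConnection(self, msg):
        # A successful connection appends the application to the global list.
        self.apps.append(msg["app"])
        self.handled.append(("connect", msg["app"]))

    def handleClose(self, msg):
        # A close event removes the application and records the disconnect.
        self.apps.remove(msg["app"])
        self.handled.append(("close", msg["app"]))

    def dispatchMessage(self, msg):
        handlers = {"connect": self.newConnection, "close": self.handleClose}
        handler = handlers.get(msg["type"])
        if handler:          # events outside the set of interest are dropped
            handler(msg)

server = MonitorServer()
server.dispatchMessage({"type": "connect", "app": "objectServer"})
server.dispatchMessage({"type": "connect", "app": "radar"})
server.dispatchMessage({"type": "close", "app": "radar"})
```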
If the monitor server receives a message indicating that a SOBER application has exited (either normally or due to an error), the handleClose() member function is automatically called to handle the connection close request. In addition to closing the application’s connection to the monitor server, the handleClose() function marks the application as disconnected, and the graphical entity that represents the application is redrawn in red to reveal this information. Because the current implementation of the SOBER distributed system does not distinguish between normal and abnormal exits of applications, the SOBERvis system does not distinguish between normal and abnormal close events.
7
Summary and Conclusion
The SOBER visualization system is a prototype monitoring and visualization system developed to aid users and programmers of the SOBER distributed system in understanding its execution behaviour and the environment in which it executes.
In the monitoring and visualization of the SOBER distributed system we are interested in higher-level information, and hence the abstraction level of the monitoring employed in SOBERvis is the application (process) level. This is mainly because low-level information is hidden from the SOBER system users, which in turn makes the collection of such information not only irrelevant but also difficult. The main reason behind this impediment, we believe, is that the SOBER distributed system was developed without a focus on subsequent monitoring and visualization of the system.
Our experience shows that the monitoring data necessary to produce a meaningful visualization is not only difficult to capture, but may also fail to fit into the general visualization framework. For example, in the implementation of the SOBER distributed system the notion of the three states of a process is not clearly identified: the “waiting” state of a process is not recognizable. Moreover, there is no distinction between the normal exit and the abnormal exit of a SOBER application task, and hence we are obliged to indicate both exits using the same colour code - red. In the case of the objectServer application task, the distinction between normal and abnormal exits may not be necessary because the consequences are almost the same - a crash of the system. However, for the other SOBER application tasks a clear distinction should be made between a normal exit and an exit caused by an error, as the latter may need maintenance. The temporary disconnection of the scenario application task is an issue worth mentioning in this respect. We conclude that support for program visualization, namely event tracing support, should not be an afterthought. Instead, it should be a vital design issue considered when developing a distributed system.
Several potential avenues for future work on both the SOBER and the SOBERvis systems exist. Extending the SOBERvis system so that it monitors and visualizes more detailed and extensive information about the execution is a natural expansion. For instance, visualizing information about the objects in the virtual world that are visible to a trainee running a given SOBER application would be very useful to a trainer, as it provides more information about each trainee and assists the trainer in better understanding and controlling the training sessions. The implementation of the SOBER distributed system also requires revision to make monitoring and visualization easier and to make execution information readily available to the SOBER visualization system; this is another potential avenue for future work.
As we mentioned earlier, the central issue we address in this thesis work is developing a program visualization system that enables us to monitor and control the execution of the SOBER distributed system in general, and that of the object server in particular, and to detect faulty computing nodes if there are any. A mechanism to dynamically recover such a faulty node can be integrated into the SOBERvis system, and is a potential issue to be addressed in the future.
8
Bibliography
[CHAR89] C. E. McDowell, D. P. Helmbold, “Debugging Concurrent Programs”, ACM Computing Surveys, 21(4):593-622, December 1989.
[CHEU90] W. H. Cheung, J. P. Black, E. Manning, “A Framework for Distributed Debugging”, IEEE Software, pp. 106-115, January 1990.
[COOP87] R. Cooper, “Pilgrim: A Debugger for Distributed Systems”, Proc. Seventh Int’l
Conf. Distributed Computing Systems, CS Press, Los Alamitos, Calif., 1987, pp. 458-465.
[COUL88] G. F. Coulouris, J. Dollimore, “Distributed Systems - Concepts and Design”,
Addison-Wesley, 1988.
[EILE93] E. Kraemer, J. T. Stasko, “The Visualization of Parallel Systems: An Overview”, Journal of Parallel and Distributed Computing, 18(2):105-117, June 1993.
[FIDG94] C. J. Fidge, “Fundamentals of Distributed Systems Observations”, Australian
Computer Science Communications, 16(1):399-408, January 1994.
[GUSE89] R. Gusella, S. Zatti, “The Accuracy of the Clock Synchronization Achieved by TEMPO in Berkeley UNIX 4.3BSD”, IEEE Trans. on Software Engineering, 16(7):847-853, July 1989.
[HABA90] D. Haban, D. Wybranietz, “A Hybrid Monitor for Behaviour and Performance
Analysis of Distributed Systems”, IEEE Trans. on Software Engineering, 16(2): 197-211,
February 1990.
[HEAT91] M. T. Heath, J. A. Etheridge, “Visualizing the Performance of Parallel Programs”, IEEE Software, pp. 29-39, September 1991.
[JOYC87] J. Joyce, G. Lomow, K. Slind, and B. Unger, “Monitoring Distributed Systems”, ACM Trans. on Computer Systems, 5(2):121-150, May 1987.
[LAMP78] L. Lamport, “Time, Clocks, and the Ordering of Events in a Distributed System”, Communications of the ACM, 21(7), July 1978, pp. 558-565.
[LARS90] S. Lars, “Postmortem Debugging of Distributed Systems”, Department of Computer and Information Science, Linköping University, Sweden, October 1990.
[MALO92] A. D. Malony, D. A. Reed, H. A. G. Wijshoff, “Performance Measurement
Intrusion and Perturbation Analysis”, IEEE Trans. on Parallel and Distributed Systems,
3(4): 433-450, July 1992.
[MNAB89] U. Manber, “Introduction to Algorithms: A Creative Approach”, Addison-Wesley, 1989.
[OGLE93] D. M. Ogle, K. Schwan, and R. Snodgrass, “The Dynamic Monitoring of Distributed and Parallel Systems”, IEEE Trans. on Parallel and Distributed Systems, 4(7):762-778, July 1993.
[PLAT84] B. Plattner, “Real-time Execution Monitoring”, IEEE Trans. on Software Engineering, SE-10(6):756-764, November 1984.
[SAND95] O. A. Sandvik, F. Oldervoll, R. Torkildsen, and K. P. Villanger, “Advanced
computer technologies improve crisis management training in the offshore industry”,
Exploration & Production Technology International, 1995, (also found on internet at
http://www.cmr.no/english/computer.html)
[SHAR87] “An Introduction to Distributed and Parallel Processing”, Blackwell Scientific, Oxford, 1987.
[SLOM87] M. Sloman, J. Kramer, “Distributed Systems and Computer Networks”, Prentice-Hall, London, 1987.
[STAS92] J. T. Stasko, E. Kraemer, “A Methodology for Building Application-Specific Visualization of Parallel Programs”, Graphics, Visualization, and Usability Centre, Georgia Institute of Technology, Atlanta, GA, Technical Report GIT-GVU-92-10, June 1992.
[STEV90] W. R. Stevens, “UNIX Network Programming”, Prentice Hall Software Series,
1990.
[SUTE94] S. Lei, K. Zhang, “Performance Visualisation of Message Passing Programs Using Relational Approach”, Proceedings of the ISCA 7th International Conference on Parallel and Distributed Computer Systems, Las Vegas, Nevada, 6-8 October 1994.
[TOPO94] B. Topol, J. T. Stasko, and V. S. Sunderam, “Integrating Visualization Support
into Distributed Computing Systems,” Graphics, Visualization, and Usability Centre,
Georgia Institute of Technology, Atlanta, GA, Technical Report GIT-GVU-94/38, October
1994.
[TOPO95] B. Topol, J. T. Stasko, and V. S. Sunderam, “The Dual Timestamping Methodology for Visualizing Distributed Applications”, College of Computing, Georgia Institute of Technology, Atlanta, GA, Technical Report GIT-CC-95-21, May 1995.
[TOPO96] B. Topol, J. T. Stasko, and V. S. Sunderam, “Monitoring and Visualization in Cluster Environments”, College of Computing, Georgia Institute of Technology, Atlanta, GA, Technical Report GIT-CC-96-10, March 1996.
[TSAI91] J. J. P. Tsai, K. Y. Fang, H. Y. Chen, Y. D. Bi, “A Non-interference Monitoring and Replay Mechanism for Real-Time Software Testing and Debugging”, IEEE Trans. on Software Engineering, 16(8), pp. 897-916, August 1991.
[TSAI96] J. J. P. Tsai, Y. Bi, S. J. H. Yang, R. A. W. Smith, “Distributed Real-Time Systems: Monitoring, Visualization, Debugging, and Analysis”, John Wiley & Sons, Inc., 1996.
[WILL93] W. F. Appelbe, J. T. Stasko, and E. Kraemer, “Applying Program Visualization Techniques to Aid Parallel and Distributed Program Development”, Graphics, Visualization, and Usability Centre, Georgia Institute of Technology, Atlanta, GA, Technical Report GIT-GVU-91-08, October 1993.
9
Appendix
The source code of the SOBER visualization system is appended to this report. The list of all source files and their starting page numbers is as follows: